Years ago, some leading researchers told us that their objective was AGI. Eager to hear a coherent definition, we naively asked, “How do you define AGI?” They paused, looked at each other tentatively, and then offered up what’s since become something of a mantra in the field of AI: “Well, we each kind of have our own definitions, but we’ll know it when we see it.”
This vignette typifies our quest for a concrete definition of AGI. It has proven elusive.
While the definition is elusive, the reality is not. AGI is here, now.
Coding agents are the first example. There are more on the way.
Long-horizon agents are functionally AGI, and 2026 will be their year.
Blissfully Unencumbered by the Details
Before we go any further, it’s worth acknowledging that we do not have the moral authority to propose a technical definition of AGI. We are investors. We study markets, founders, and the collision thereof: businesses.
Given that, ours is a functional definition, not a technical definition. New technical capabilities raise the Don Valentine question: so what?
The answer resides in real-world impact.
A Functional Definition of AGI
AGI is the ability to figure things out. That’s it.*

*We appreciate that such an imprecise definition will not settle any philosophical debates. Pragmatically speaking, what do you want if you’re trying to get something done? An AI that can just figure stuff out. How it happens is of less concern than the fact that it happens.
A human who can figure things out has some baseline knowledge, the ability to reason over that knowledge, and the ability to iterate their way to the answer.
An AI that can figure things out has some baseline knowledge (pre-training), the ability to reason over that knowledge (inference-time compute), and the ability to iterate its way to the answer (long-horizon agents).
The first ingredient (knowledge / pre-training) is what fueled the original ChatGPT moment in 2022. The second (reasoning / inference-time compute) came with the release of o1 in late 2024. The third (iteration / long-horizon agents) came in the last few weeks with Claude Code and other coding agents crossing a capability threshold.
Generally intelligent people can work autonomously for hours at a time, making and fixing their mistakes and figuring out what to do next without being told. Generally intelligent agents can do the same thing. This is new.
What Does It Mean to Figure Things Out?
A founder messages his agent: “I need a developer relations lead. Someone technical enough to earn respect from senior engineers, but who actually enjoys being on Twitter. We sell to platform teams. Go.”

The agent starts with the obvious: LinkedIn searches for “Developer Advocate” and “DevRel” at competing companies — Datadog, Temporal, Langchain. It finds hundreds of profiles. But job titles don’t reveal who’s actually good at this.
It pivots to signal over credentials. It searches YouTube for conference talks. It finds 50+ speakers, then filters for those with talks that have strong engagement.
It cross-references those speakers with Twitter. Half have inactive accounts or just retweet their employer’s blog posts. Not what we want. But a dozen have real followings — they post real opinions, reply to people, and get engagement from developers. And their posts have real taste.
The agent narrows further. It checks who’s been posting less frequently in the last three months. A drop in activity sometimes signals disengagement from their current role. Three names surface.
It researches those three. One just announced a new role — too late. One is a founder of a company that just raised funding — not leaving. The third is a senior DevRel at a Series D company that just did layoffs in marketing. Her last talk was about exactly the platform engineering space the startup targets. She has 14k Twitter followers and posts memes that actual engineers engage with. She hasn’t updated her LinkedIn in two months.
The agent drafts an email acknowledging her recent talk, the overlap with the startup’s ICP, and a specific note about the creative freedom a smaller team offers. It suggests a casual conversation, not a pitch.
Total time: 31 minutes. The founder has a shortlist of one instead of a JD posted to a job board.
This is what it means to figure things out. Navigating ambiguity to accomplish a goal – forming hypotheses, testing them, hitting dead ends, and pivoting until something clicks. The agent didn’t follow a script. It ran the same loop a great recruiter runs in their head, except it did it tirelessly in 31 minutes, without being told how.
To be clear: agents still fail. They hallucinate, lose context, and sometimes charge confidently down exactly the wrong path. But the trajectory is unmistakable, and the failures are increasingly fixable.
How Did We Get Here? From Reasoning Models to Long-Horizon Agents
In last year’s essay, we wrote about reasoning models as the most important new frontier for AI. Long-horizon agents push this paradigm further by allowing models to take actions and iterate over time.

Coaxing a model to think for longer is not trivial. On its own, a base reasoning model can think for only seconds or minutes.
Two different technical approaches both seem to be working and scaling well: reinforcement learning and agent harnesses. The former teaches a model intrinsically to stay on track for longer by poking and prodding it to maintain focus during the training process. The latter designs specific scaffolding around the known limitations of models (memory hand-offs, compaction, and more).
Scaling reinforcement learning is the domain of the research labs. They have made exceptional progress on this front, from multi-agent systems to reliable tool use.
Designing great agent harnesses is the domain of the application layer. Some of the most beloved products on the market today are known for their exceptionally engineered agent harnesses: Manus, Claude Code, Factory’s Droids, etc.
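The harness idea can be made concrete with a toy sketch. Everything here is invented for illustration — the `call_model` stub, the step format, the six-turn limit — and real harnesses like the ones above are far more sophisticated. But the shape is the same: an outer loop that compacts the context when it grows too long, so the agent’s progress survives the hand-off.

```python
import re

MAX_CONTEXT = 6  # pretend the model can only attend to this many turns

def call_model(context):
    # Stand-in for a real model call: it reads the latest step number
    # visible in its (possibly compacted) context, proposes the next
    # action, and declares the task finished after step 9.
    matches = (re.match(r"step (\d+)", c) for c in context)
    last = max((int(m.group(1)) for m in matches if m), default=0)
    return "done" if last >= 9 else f"step {last + 1}"

def compact(context):
    # Compaction / memory hand-off: keep the goal and the two most recent
    # turns, and collapse everything in between into a one-line summary.
    dropped = len(context) - 3
    return [context[0], f"[summary of {dropped} earlier turns]"] + context[-2:]

def run_agent(task):
    # The harness proper: an outer loop that lets the model keep working
    # long past its native context limit by compacting as it goes.
    context = [f"goal: {task}"]
    while True:
        if len(context) > MAX_CONTEXT:
            context = compact(context)
        action = call_model(context)
        context.append(action)
        if action == "done":
            return context

trace = run_agent("refactor billing module")
print(trace)
```

Because the most recent turns survive each compaction, the stand-in model can still see how far it has gotten — that is the property real compaction schemes have to preserve.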
If there’s one exponential curve to bet on, it’s the performance of long-horizon agents. METR has been meticulously tracking AI’s ability to complete long-horizon tasks. The rate of progress is exponential, doubling every ~7 months. If we trace out the exponential, agents should be able to work reliably to complete tasks that take human experts a full day by 2028, a full year by 2034, and a full century by 2037.
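The arithmetic behind that trace is simple compounding: reaching a target horizon takes log2(target / current) doublings, at roughly seven months each. A small illustrative calculation — the 30-minute baseline and the start date are our assumptions, not METR’s published figures:

```python
from datetime import date, timedelta
from math import log2

DOUBLING_MONTHS = 7  # METR's reported doubling time for the task horizon

def horizon_date(start: date, start_hours: float, target_hours: float) -> date:
    """Approximate date when the reliable task horizon reaches target_hours,
    assuming it keeps doubling every DOUBLING_MONTHS months."""
    doublings = log2(target_hours / start_hours)
    return start + timedelta(days=doublings * DOUBLING_MONTHS * 30.44)

# Assumed baseline (illustrative): a ~30-minute reliable horizon at the
# start of 2026. Targets are measured in working hours.
start = date(2026, 1, 1)
day = horizon_date(start, 0.5, 8)            # one working day
year = horizon_date(start, 0.5, 2_000)       # ~250 working days
century = horizon_date(start, 0.5, 200_000)  # ~100 working years
print(day, year, century)
```

The exact years are sensitive to the assumed starting point and to whether horizons are counted in working or calendar hours, which is why this sketch lands a year or two earlier than the essay’s 2034 and 2037 for the longer horizons.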
So What?
Soon you’ll be able to hire an agent. That’s one litmus test for AGI (h/t: Sarah Guo).

You can “hire” GPT-5.2 or Claude or Grok or Gemini today. More examples are on the way:
- Medicine: OpenEvidence’s Deep Consult functions as a specialist
- Law: Harvey’s agents function as associates
- Cybersecurity: XBOW functions as a pen-tester
- DevOps: Traversal’s agents function as an SRE
- GTM: Day AI functions as a BDR, SE, and Rev Ops leader
- Recruiting: Juicebox functions as a recruiter
- Math: Harmonic’s Aristotle functions as a mathematician
- Semiconductor Design: Ricursive’s agents function as chip designers
- AI Research: GPT-5.2 and Claude function as AI researchers
From Talkers to Doers: Implications for Founders
This has profound implications for founders.

The AI applications of 2023 and 2024 were talkers. Some were very sophisticated conversationalists! But their impact was limited.
The AI applications of 2026 and 2027 will be doers. They will feel like colleagues. Usage will go from a few times a day to all-day, every day, with multiple instances running in parallel. Users won’t save a few hours here and there – they’ll go from working as an IC to managing a team of agents.
Remember all that talk of selling work? Now it’s possible.
What work can you accomplish? The capabilities of a long-horizon agent are drastically different from those of a single forward pass of a model. What new capabilities do long-horizon agents unlock in your domain? What tasks require persistence, where sustained attention is the bottleneck?
How will you productize that work? How will your application interface evolve in your domain, as the UI of work grows from chatbot to agent delegation?
Can you do that work reliably? Are you obsessively improving your agent harness? Do you have a strong feedback loop?
How can you sell that work? Can you price and package to value and outcomes?
Saddle Up!
It’s time to ride the long-horizon agent exponential.

Today, your agents can probably work reliably for ~30 minutes. But they’ll be able to perform a day’s worth of work very soon – and a century’s worth of work eventually.
What can you achieve when your plans are measured in centuries? A century is 200,000 clinical trials no one’s cross-referenced. A century is every customer support ticket ever filed, finally mined for signal. A century is the entire U.S. tax code, refactored for coherence.
The ambitious version of your roadmap just became the realistic one.
Thanks to Dan Roberts, Harrison Chase, Noam Brown, Sholto Douglas, Isa Fulford, Ben Mann, Nick Turley, Phil Duan, Michelle Bailhe, and Romie Boyd for reviewing drafts of this post.
About the author(s)...
Pat Grady
- Grady is a partner at Sequoia, where he is co-captain of growth-stage investments. He is considered one of the firm's leading investors in artificial intelligence.
- Grady co-led Sequoia's investment in OpenAI along with Alfred Lin and Sonya Huang in 2021. OpenAI was last valued at $300 billion in a March 2025 round.
- Grady led investments for legal AI platform Harvey and serves on the company's board. Early this year, Harvey announced a Sequoia-led, Series D funding round of $300 million, valuing the company at $3 billion.
- He also led a $75 million Series A funding round for medical search platform Open Evidence in February 2025, raising the company's valuation to $1 billion.
- He co-led Sequoia's investment in cloud computing company Snowflake, which raised about $3.4 billion from its IPO in 2020. It was the biggest software IPO in U.S. history.
- Grady participated in the $100 million Series C of machine learning startup Hugging Face ($4.5 billion) and the $150 million Series D of data infrastructure company Cribl ($3.5 billion) in 2021.
- Other notable investments include Sumo Logic (IPO 2020), Okta (IPO 2017), HubSpot (IPO 2014), Zoom (IPO 2019) and Embark (IPO 2021).
My high school jobs were in construction. Compared to making $9 an hour putting down T-Lock shingles in 110-degree heat, studying for midterms was paradise. After college, I went into inside sales—50 dials a day yielded 200 conversations a month. At any point, you could hit a button and see how your metrics ranked. I almost killed myself making sure I was on top every day.
At Sequoia, we get to work for some of the best causes on the planet. That may sound like BS, but it’s something we really care about. The vast majority of the money we invest comes from universities, foundations and other nonprofits. I didn’t know it at the time, but my tuition at Boston College was funded in part by proceeds from Sequoia investments. Our returns make a real impact, which is a privilege and a tremendous responsibility—if we screw up, people lose out on scholarships and cancer research. Now that’s motivation.
Sonya Huang
Huang is a partner on the growth team at Sequoia, where she has co-led ten new investments including Attentive, Glossier, Gong and Tecton. The daughter of immigrant engineers and an entrepreneur father, Huang was part of the first machine learning undergraduate program at Princeton University and conducted applied AI research in both astrophysics and neuroscience. She previously worked at Goldman Sachs and TPG Capital and is a member of All Raise.
I’m a Silicon Valley kid at heart. I was born in Mountain View and grew up in the gravity well of innovation that was 1990s Silicon Valley. It was a fascinating time, from learning to code in third grade to witnessing blockbuster IPOs and the dotcom bubble burst.
Growing up in the Valley made me believe deeply in the American Dream and technology’s role in creating a better future. My parents are immigrant engineers, and technology and entrepreneurship gave them the vehicle to create something lasting. I see venture capital as a conduit to amplify the best founders and the best ideas and give them every possible unfair advantage.
I worked at Goldman Sachs and in private equity before joining Sequoia. But I don’t really think of myself as a “finance person.” I’m much more comfortable reading a technical paper or talking to a researcher than going to an industry conference. I like technical people who are precise and can ship.
I’ve always been very interested in AI, dating back to my college days where I trained computer vision neural nets on brain scans and astrophysics data. It’s awesome, exhilarating, and disorienting just how rapidly the field is accelerating. (PS, I’m very AGI-pilled).
I’m lucky to count some of the top founders and researchers in AI as friends. I’m documenting many of these conversations through Sequoia’s AI podcast (Training Data) and our annual AI event (AI Ascent). I believe we are living through a historical moment in time that deserves to be explored and preserved with great curiosity and care.