PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Register now
Data Services
Pricing
Login
Try Zyte APIContact Sales
  • Unblocking and Extraction

    Zyte API

    The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing

    Ban Handling

    Headless Browser

    AI Extraction

    SERP

    Enterprise

    DocumentationSupport

    Hosting and Deployment

    Scrapy Cloud

    Run, monitor, and control your Scrapy spiders however you want to.

    Coding Agent Add-Ons

    Agentic Web Data

    Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.

  • Data Services
  • Pricing
  • Browse

    • BlogArticles, podcasts, videos
    • Case studiesCustomer outcomes
    • White papersIn-depth reports
    • EventsConferences, webinars, recordings

    Subscribe

    • NewsletterSwiftly delivered
    • Discord communityExtract Data community
  • Product and E-commerce

    From e-commerce and online marketplaces

    Data for AI

    Collect and structure web data to feed AI

    Job Posting

    From job boards and recruitment websites

    Real Estate

    From Listings portals and specialist websites

    News and Article

    From online publishers and news websites

    Search

    Search engine results page data (SERP)

    Social Media

    From social media platforms online

  • Meet Zyte

    Our story, people and values

    Contact us

    Get in touch

    Support

    Knowledge base and raise support tickets

    Terms and Policies

    Accept our terms and policies

    Open Source

    Our open source projects and contributions

    Web Data Compliance

    Guidelines and resources for compliant web data collection

    Join the team building the future of web data
    We're Hiring
    Trust Center
    Security, compliance & certifications
Login
Try Zyte APIContact Sales

Zyte Developers

Coding tools & hacks straight to your inbox

Become part of the community and receive a bi-weekly dosage of all things code.

Join us
    • Zyte Data
    • News & Articles
    • Search
    • Social Media
    • Product
    • Data for AI
    • Job Posting
    • Real Estate
    • Zyte API - Ban Handling
    • Zyte API - Headless Browser
    • Zyte API - AI Extraction
    • Web Scraping Copilot
    • Zyte API Enterprise
    • Scrapy Cloud
    • Solution Overview
    • Blog
    • Webinars
    • Case Studies
    • White Papers
    • Documentation
    • Web Scraping Maturity Self-Assesment
    • Web Data compliance
    • Meet Zyte
    • Jobs
    • Terms and Policies
    • Trust Center
    • Support
    • Contact us
    • Pricing
    • Do not sell
    • Cookie settings
    • Sign up
    • Talk to us
    • Cost estimator
All articles
AI60, 60 articles
Data quality13, 13 articles
Developer interest57, 57 articles
Integration2, 2 articles
Open-source40, 40 articles
Proxies29, 29 articles
Scraping practice17, 17 articles
Scraping strategy26, 26 articles
Web data60, 60 articles
Web scraping APIs33, 33 articles
Zyte API59, 59 articles
Scrapy48, 48 articles
Scrapy Cloud10, 10 articles
Web Scraping Copilot12, 12 articles
AI & Machine Learning1, 1 articles
Automotive2, 2 articles
E-commerce & retail26, 26 articles
Entertainment & Streaming2, 2 articles
Financial Services8, 8 articles
Government2, 2 articles
Market Research & Intelligence3, 3 articles
Media & publishing8, 8 articles
Real Estate2, 2 articles
Recruitment & HR3, 3 articles
Transportation & Logistics2, 2 articles
Travel & hospitality2, 2 articles
Extract Summit25, 25 articles
PyCon1, 1 articles

Appearance

Discord Community
BlogAIWhy 10 million tokens won’t save your AI agent (and what will)
ArticleViewpointAI

Why 10 million tokens won’t save your AI agent (and what will)

New models can process larger inputs, and confuse themselves in the process. Context management techniques can solve the problem.

J

Joaquin Bonifacino

10 min read · May 8, 2026

Why 10 million tokens won’t save your AI agent (and what will)

We are currently living through the AI equivalent of a horsepower war.

Not so long ago, major labs were announcing a model with an astronomically larger context window every few months. We went from 8,000 tokens to 100,000, then to a million. Today, you can find experimental models like Meta’s Llama 4 Scout boasting a 10-million-token capacity.

The underlying assumption is simple, if somewhat brute-force: if we just give the AI a big enough brain to hold every piece of documentation, every line of code, and every chat history simultaneously, it will finally be able to execute complex, long-running tasks autonomously.

But, if you’ve actually tried to build an agent that runs for days, weeks, or months, you already know the dirty secret of the AI industry. Giving an agent a massive context window doesn't make it a genius. It makes it a digital hoarder.

Stop the context rot

When you stuff a prompt to the gills, you trigger a phenomenon we call “context rot”.

As the context window fills up, the model’s performance actively degrades. It starts hallucinating. It loses the plot. It forgets the original objective you gave it three days ago, stops working, and nervously asks you: "Should I keep going?"

Thanks to the “needle in a haystack” test, a way to test LLMs’ in-context retrieval capabilities, the community has realized that simply throwing more tokens at the problem is a dead end.

The future of autonomous AI isn't about building a bigger disembodied brain. It’s about building a better office for that brain to work in. We call this the "harness."

Stop managing the prompt, start engineering the environment

Think about how you - a human - execute a project that takes three weeks.

You don’t try to hold the entire codebase, all your Jira tickets, and every Slack message in your active working memory at the exact same time. You’d lose your mind.

Instead, you use your environment. You write things down. You put files in folders. You leave yourself sticky notes. You delegate tasks to coworkers.

We need to stop treating AI agents like isolated brains trapped in a chat box and start giving them the same environmental affordances we rely on.

The "harness" surrounds the LLM with the environmental tools - like file systems and memory backends - needed to execute long-horizon tasks.

It is the environment where the agent lives. It is the scaffolding, the tools, the permissions, and the sandboxes we wrap around the foundational model.

If we want an agent to autonomously crawl the web, write code, run tests, and fix its own bugs over a month-long horizon, we have to teach it how to ruthlessly manage its own context.

Here is how we actually make that happen.

The power of context offloading

The golden rule of long-running agents is simple: Do not load what you do not immediately need.

Imagine your agent needs to find a specific event date buried in a massive HTML file. The amateur approach is to dump the entire raw HTML into the context window. Congratulations, you’ve just burned 8,000 tokens, cluttered the agent’s working memory, and invited hallucinations.

The professional approach is “context offloading”. Instead of giving the agent the HTML, you give the agent a secure sandbox - a temporary, isolated workspace where it can read, write, and execute code. You give it a goal; it decides what to do and in what order, like downloading the HTML to a file in that sandbox and running a simple terminal command (like grep) to search for the date.

The agent gets the exact answer it needs. It uses four tokens instead of 8,000. And its mental whiteboard remains perfectly clean.

This sandbox approach changes everything. For agents that run for weeks, they can use their file system to save their own state. If an agent realizes its context window is hitting 80% capacity, it can proactively write a summary of its progress to a text file. When that specific instance of the agent hits its limit and dies, the next agent in the relay simply reads the summary file and picks up exactly where the last one left off. It’s an approach that has been popularized by Anthropic in Claude.

Mental housekeeping: Compaction and summarization

Developers are inherently lazy. When we use AI coding assistants, we rarely start fresh, clean chat threads. We just keep iterating in the same window.

For an agent, this is fatal. If you asked an agent to read five files 20 minutes ago, those files - and the long tool-call outputs associated with them - are still sitting in its context window, silently degrading its ability to reason about your current request, costing users time and money..

To keep agents running indefinitely, the harness must perform mental housekeeping. We do this through compaction and summarization.

Compaction is the act of automatically trimming the fat. Every few turns, the harness quietly reaches into the agent's context and deletes old, bulky tool responses, replacing them with a tiny note that says: "If you need this result again, it is saved in [File X]."

Notice how the context length drops sharply during regular “compaction” phases, and resets entirely during a “summarization”.

When compaction isn't enough and the context window inevitably fills up, we trigger summarization. The harness pauses the agent, hands its entire messy context to a secondary, cheaper model, and says something like: "Summarize what has been done and what needs to happen next." The harness then wipes the main agent’s memory entirely, inserts only that brief summary, and lets it start fresh.

Defending the context layer with subagents

A good manager doesn't get bogged down in the weeds of a menial task; they delegate it. Deep agents should do the same.

When a long-running agent encounters a massive, complex sub-task - like running an exploration algorithm or testing a new piece of code - it shouldn't do that work in its main reasoning loop. That will cause massive context rot.

Instead, the harness should allow the main agent to spawn a temporary copy of itself.

The main agent writes a specific prompt for this "subagent," spins it up in an isolated environment, and waits. The subagent does the heavy lifting, burns through thousands of tokens, finds the answer, returns only the final result to the main agent, and then terminates.

This is how you defend the primary context layer. You keep the main agent's mind clear, focused solely on high-level orchestration, while disposable subagents take the cognitive hit of the dirty work.

Build less, understand more

Recently, I was talking to a developer who worked on Manus AI (an autonomous system recently acquired by Meta). He summarized the future of this space perfectly: "Build less, understand more".

For a long time, the instinct in AI engineering has been to micromanage the models. We write labyrinthine, 5,000-word system prompts outlining every possible edge case, hoping to control the agent's behavior.

But long-running autonomy doesn't come from a perfectly engineered prompt. It comes from the harness.

If you want an agent that can work for 30 days straight, stop trying to shove the entire world into its context window. Give it a file system. Give it a terminal. Give it the ability to delegate to subagents, summarize its own thoughts, and offload its memory to a sandbox.

If you give an AI the right environment, you won't need 10 million tokens for it to change the world.

Try Zyte API

Build your first scraper in minutes

Free trial, no credit card. From a single request to production in an afternoon.

Get started
AI
J

Joaquin Bonifacino

More from this author

In this article

  • Stop the context rot
  • Stop managing the prompt, start engineering the environment
  • The power of context offloading
  • Mental housekeeping: Compaction and summarization
  • Defending the context layer with subagents
  • Build less, understand more

Follow

Get the latest

Zyte and the data web in your inbox — or wherever you already are.

Subscribe

Or follow elsewhere

The Community · Newsletter

The best of Zyte and the data web, in your inbox.

One curated edition — new articles, product updates, and the stories shaping the data web. No noise.

G2.com

Capterra.com

Proxyway.com

EWDCI logoMost loved workplace certificateZyte rewardISO 27001 iconG2 rewardG2 rewardG2 reward

© Zyte Group Limited 2026