PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Register now
Data Services
Pricing
Login
Try Zyte APIContact Sales
  • Unblocking and Extraction

    Zyte API

    The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing

    Ban Handling

    Headless Browser

    AI Extraction

    SERP

    Enterprise

    DocumentationSupport

    Hosting and Deployment

    Scrapy Cloud

    Run, monitor, and control your Scrapy spiders however you want to.

    Coding Agent Add-Ons

    Agentic Web Data

    Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.

  • Data Services
  • Pricing
  • Browse

    • BlogArticles, podcasts, videos
    • Case studiesCustomer outcomes
    • White papersIn-depth reports
    • EventsConferences, webinars, recordings

    Subscribe

    • NewsletterSwiftly delivered
    • Discord communityExtract Data community
  • Product and E-commerce

    From e-commerce and online marketplaces

    Data for AI

    Collect and structure web data to feed AI

    Job Posting

    From job boards and recruitment websites

    Real Estate

    From Listings portals and specialist websites

    News and Article

    From online publishers and news websites

    Search

    Search engine results page data (SERP)

    Social Media

    From social media platforms online

  • Meet Zyte

    Our story, people and values

    Contact us

    Get in touch

    Support

    Knowledge base and raise support tickets

    Terms and Policies

    Accept our terms and policies

    Open Source

    Our open source projects and contributions

    Web Data Compliance

    Guidelines and resources for compliant web data collection

    Join the team building the future of web data
    We're Hiring
    Trust Center
    Security, compliance & certifications
Login
Try Zyte APIContact Sales

Zyte Developers

Coding tools & hacks straight to your inbox

Become part of the community and receive a bi-weekly dosage of all things code.

Join us
    • Zyte Data
    • News & Articles
    • Search
    • Social Media
    • Product
    • Data for AI
    • Job Posting
    • Real Estate
    • Zyte API - Ban Handling
    • Zyte API - Headless Browser
    • Zyte API - AI Extraction
    • Web Scraping Copilot
    • Zyte API Enterprise
    • Scrapy Cloud
    • Solution Overview
    • Blog
    • Webinars
    • Case Studies
    • White Papers
    • Documentation
    • Web Scraping Maturity Self-Assesment
    • Web Data compliance
    • Meet Zyte
    • Jobs
    • Terms and Policies
    • Trust Center
    • Support
    • Contact us
    • Pricing
    • Do not sell
    • Cookie settings
    • Sign up
    • Talk to us
    • Cost estimator
All articles
AI60, 60 articles
Data quality13, 13 articles
Developer interest57, 57 articles
Integration2, 2 articles
Open-source40, 40 articles
Proxies29, 29 articles
Scraping practice17, 17 articles
Scraping strategy26, 26 articles
Web data60, 60 articles
Web scraping APIs33, 33 articles
Zyte API59, 59 articles
Scrapy48, 48 articles
Scrapy Cloud10, 10 articles
Web Scraping Copilot12, 12 articles
AI & Machine Learning1, 1 articles
Automotive2, 2 articles
E-commerce & retail26, 26 articles
Entertainment & Streaming2, 2 articles
Financial Services8, 8 articles
Government2, 2 articles
Market Research & Intelligence3, 3 articles
Media & publishing8, 8 articles
Real Estate2, 2 articles
Recruitment & HR3, 3 articles
Transportation & Logistics2, 2 articles
Travel & hospitality2, 2 articles
Extract Summit25, 25 articles
PyCon1, 1 articles

Appearance

Discord Community
BlogAI-assisted data extractionThe harness matters more than the model - Podcast EP07
ArticleInterviewAI-assisted data extractionLarge Language Models (LLMs)Developer interestData quality

The harness matters more than the model - Podcast EP07

"The model is the engine — but the harness is everything else." In Episode 7, we dig into why the infrastructure layer around your AI model matters more than the model itself, rank the best models available right now, and ask whether the open-weighted revolution is about to make frontier subscriptions obsolete.

John Rooney · Developer Engagement Manager

11 min read · June 27, 2026

The harness matters more than the model - Podcast EP07

The Harness Matters More Than the Model

Extract Data Podcast, Episode 7 — Reflections on models, system prompts, and the infrastructure layer nobody talks about enough

Apple podcasts link - Direct listen

There's a phrase from this week's conversation that I keep coming back to: "flirt with all the models, but marry one harness." It came from Ayan, and it probably captures the single most important practical takeaway from our discussion better than anything else I could say.

In Episode 7, Neha, Ayan, and I went deep on where the AI tooling landscape is actually heading — not the flashy benchmark numbers, but the quieter, more structural question of where value actually accumulates when you're building with these models day to day. We also got into our current model rankings, some honest reflections on context management, and the geopolitics of open-weighted models. A lot to unpack.

What Even Is a Harness?

We spent a good chunk of this episode making sure everyone — including Neha, who asked exactly the right beginner questions — understood what a harness actually is, because it's a term that gets thrown around without much explanation.

The short version: a harness is the environment from which you invoke a model. It's not the model itself. Claude Code is a harness. Codex is a harness. Open Code, Pear AI — these are all harnesses. The model is the engine; the harness is the chassis, the wheels, the dashboard, and the safety systems.

Ayan put it well: think of it as a small personal operating system for working with AI. When you spin up Claude Code, everything it knows about your project — the tools it can call, the MCP servers it has access to, the files it's allowed to read or modify — that's the harness doing its job. The model underneath is almost secondary to that configuration.

This framing matters more than it might seem, because most of the discussion in AI communities focuses almost entirely on the model layer. Which model is smarter? Which scored higher on some benchmark? But as the gap between models closes — and it's closing fast — the harness becomes the real differentiator.

System Prompts: The Invisible Guardrail

The harness conversation naturally led us into system prompts, which are worth understanding even if you never write one yourself.

When you make any API call to a language model, there are effectively two layers of instruction: the user prompt (what you're asking it to do right now) and the system prompt (a set of prior instructions that shape how the model behaves before it even sees your request). Harness providers write and maintain the system prompt on your behalf. When you're using Claude Code, for example, there's a substantial system prompt already in place — one that includes things like "you cannot write to a file without reading it first."

That might sound like a small constraint, but it's actually a meaningful safety net. Ayan made the point that when he's testing a brand-new model with zero track record — say, something that just appeared on Hugging Face — running it inside Claude Code's harness gives him some confidence that it won't do something catastrophic to his file system. The system prompt contains the guardrails that the model would otherwise have no knowledge of.

Compare that to making a raw API call with no system prompt at all. In that case, the model is fully stateless: it has no knowledge of your environment, your constraints, or your preferences. It'll do its best with what you give it, but you've removed all the scaffolding. That's fine in controlled situations — and sometimes it's exactly what you want — but it's worth understanding that the absence of a system prompt is itself a decision.

This came up in the context of Ayan's loop engineering work, where a repair agent is given a broken Scrapy spider and asked to fix it. Without a system prompt explicitly saying "fix the spider, not the HTML," there's a non-trivial chance the model decides the easiest path is to rewrite the HTML so the spider works — which is obviously useless for scraping a real website you don't control. The system prompt closes that ambiguity.

Harness Philosophy: Safety vs. Control

Ayan raised an interesting tension here. Claude Code's system prompt is famously large — people have complained about how much context window it consumes. But that size is the point: it's load-bearing. The guardrails are in there precisely because they need to be there.

Smaller, leaner harnesses like Pear AI trade some of that safety net for efficiency and flexibility. That's a legitimate trade-off depending on your use case, but it's worth going in with eyes open. The analogy I kept coming back to: it's like the difference between running a curated Linux distribution versus compiling everything from scratch. Full control is genuinely available; you just need to know what you're giving up.

There's also a broader architectural debate happening in the harness space right now. One school of thought favours multiple parallel agents running simultaneously. Another — and Ayan's firmly in this camp, with his background in Linux kernel development — favours sub-agents: spawning child processes that have their own independent context windows and report back only what's relevant. Parallel agents share a context window and get noisy; sub-agents stay scoped. It's forks versus daemons, essentially, just applied to AI orchestration.

The Model Rankings (As of Right Now)

With all that said about harnesses, we did still talk about the models themselves — because they're not irrelevant, just less decisive than the infrastructure around them.

Ayan's current ranking, based on his own testing:

  1. Fable 5 — his favourite, but unavailable to us without an NDA arrangement with Anthropic that isn't happening. A shame.
  2. GPT 5.5 — a significant leap from GPT 5.4, and accessible via Codex subscription without needing direct API spend. Surprisingly good value.
  3. Claude Opus 4.8 / GLM 5.2 — roughly equivalent at this level. GLM 5.2 is particularly strong for front-end and design work.
  4. Kimi 2.7, DeepSeek V4, Composer, Gemini 3.5 Flash — the tier below, each with specific strengths.

My own usage is a bit more prosaic. Day to day I lean on Sonnet 4.6 and Opus 4.8 because we have access to them. Outside of that, I've been using GLM 5.2 a fair amount for building and coding — it's thorough, sometimes almost too thorough, but very capable. Deep Seek V4 Pro is also a regular for me when I want a good price-to-performance ratio. And my Hermes agent runs on Deep Seek Flash, because all it needs to do is process text, respond, and save things — there's no point using a frontier model for that.

Neha's been spending most of her time in Codex lately, having drifted away from Claude Pro. Honestly, at the current price point, Codex is hard to argue with.

One genuinely fun data point: Ayan mentioned a site called inthewids.com that runs your name across multiple models to see which ones have training data about you. Deep Seek apparently knows quite a lot about me as a figure in the web scraping community. Nobody else got a hit on Deep Seek. Proof that lurking in the right corners of the internet does eventually compound.

The Token Cost of Being Thorough

One practical note that came out of both Ayan's and my experience with GLM 5.2: it's verbose. Extremely thorough, often to a degree beyond what's actually needed. There are tools like Caveman (which restricts output style to reduce token count) and Ponytail (which apparently adjusts both input and output for minimal, functional code) that can help with this.

The deeper point is that model cost isn't just a function of the model's base rate — it's also a function of how much output it generates. A cheaper model that produces twice the tokens can end up costing more than a pricier model that's more concise. Worth keeping in mind if you're running long agentic loops, which leads to...

Speed Doesn't Matter the Way You Think It Does

We started the episode by comparing GLM 5.2 to Claude Sonnet, and one of the first things I noticed was that GLM 5.2 felt slower. Ayan quickly pointed out that the slowness was almost certainly an Open Router routing issue — different providers offer wildly different inference speeds for the same model, and Open Router's default routing doesn't always prioritise speed.

But more interestingly: does it even matter? If you're using a model interactively, token-per-second speed is very real and very annoying when it's slow. But most of the work I'm building toward is automated — the model runs, does its thing, I come back to a result. In that context, a model that takes twice as long to run is almost never the bottleneck. This is especially true for the kind of multi-hour agentic workflows that Ayan mentioned are becoming more common, where a benchmark and testing mechanism just loops until a goal is reached.

Context Management, and Knowing When to Walk Away

Near the end we talked about context management hacks. I'll be honest: mine are non-existent. Heavy, inefficient context windows are my natural state. The closest thing I do is occasionally save a plan to a markdown file so I can use it as a clean handoff in a new session.

Ayan's approach is more systematic. He uses /compact and /clear to manage Claude Code sessions, and — importantly — asks the model to write a handoff message before clearing, summarising what was accomplished and what comes next. That message then seeds the next agent session. It's essentially a baton pass: no context is lost, but the window stays clean.

He also leans on sub-agents for research tasks. Rather than loading everything into one expanding context, he has the main agent spin up several Haiku or Sonnet sub-agents, each with their own independent context window, to fetch and process information in parallel. Only the relevant output comes back — not the noise.

Neha's answer was the most honest of all: she's at peak AI fatigue, wants to put down the tools, fill up her context window with actual humans and beaches, and remember what creativity felt like before it got outsourced. Which, honestly, is worth hearing. Even for those of us who are deep in this stuff, the signal-to-noise ratio of AI tooling can degrade your own thinking if you're not careful.

The Open-Weighted Question

There's a broader political dimension to all of this that we touched on: the US government apparently requested a staggered rollout of GPT 5.6, with individual user approvals. Ayan's quietly rooting for open-weighted models to win precisely because he doesn't want a government gatekeeping his access to inference.

The irony is real: for decades, open source was largely a Western project — Linux, Apache, the whole stack. Chinese labs were the closed ones. AI has inverted that completely. The best open-weighted models right now are coming out of Chinese labs (GLM, DeepSeek, Kimi), with permissive licenses and public weights. Meanwhile, frontier US models are getting more restricted, not less.

The gap is closing fast. Ayan's estimate: six months before open-weighted models are genuinely on par with state-of-the-art frontier models for most tasks. At that point, the question of self-hosting becomes much more interesting — even if true self-hosting (running a capable model on your own hardware without renting GPU time) is still a way off for most people.

His prediction for the near future: people will take these open-weighted base models and fine-tune them for specific use cases. GLM 5.2 for web scraping. GLM 5.2 for front-end design. Smaller, purpose-built variants that don't require racks of GPU memory to run. That's happening already on Hugging Face.

One Last Thing

We're going to be at EuroPython in Kraków. If you're there, come find us at the booth and mention the podcast — we'll sort you out with some swag. And the loop engineering meetup recording should be live in the end cards of this episode.

See you next week.

Extract Data is a weekly podcast on web scraping, data engineering, and the AI tools that are changing how we build. Hosted by John Rooney, Ayan Pahwa, and Neha Setia.

Try Zyte API

Build your first scraper in minutes

Free trial, no credit card. From a single request to production in an afternoon.

Get started
AI-assisted data extractionLarge Language Models (LLMs)Developer interestData quality

John Rooney

Developer Engagement Manager

More from this author

In this article

  • The Harness Matters More Than the Model
  • What Even Is a Harness?
  • System Prompts: The Invisible Guardrail
  • Harness Philosophy: Safety vs. Control
  • The Model Rankings (As of Right Now)
  • The Token Cost of Being Thorough
  • Speed Doesn't Matter the Way You Think It Does
  • Context Management, and Knowing When to Walk Away
  • The Open-Weighted Question
  • One Last Thing

Follow

Get the latest

Zyte and the data web in your inbox — or wherever you already are.

Subscribe

Or follow elsewhere

The Community · Newsletter

The best of Zyte and the data web, in your inbox.

One curated edition — new articles, product updates, and the stories shaping the data web. No noise.

G2.com

Capterra.com

Proxyway.com

EWDCI logoMost loved workplace certificateZyte rewardISO 27001 iconG2 rewardG2 rewardG2 reward

© Zyte Group Limited 2026