PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Login Try Zyte API Contact Sales

Unblocking and Extraction
Zyte API
The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing
Ban Handling
Headless Browser
AI Extraction
SERP
Enterprise
Documentation Support
Hosting and Deployment
Scrapy Cloud
Run, monitor, and control your Scrapy spiders however you want to.
Coding Agent Add-Ons
Agentic Web Data
Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.
Data Services
Pricing
Browse
Subscribe
- NewsletterSwiftly delivered
- Discord communityExtract Data community
Product and E-commerce
From e-commerce and online marketplaces
Data for AI
Collect and structure web data to feed AI
Job Posting
From job boards and recruitment websites
Real Estate
From Listings portals and specialist websites
News and Article
From online publishers and news websites
Search
Search engine results page data (SERP)
Social Media
From social media platforms online
Meet Zyte
Our story, people and values
Contact us
Get in touch
Support
Knowledge base and raise support tickets
Terms and Policies
Accept our terms and policies
Open Source
Our open source projects and contributions
Web Data Compliance
Guidelines and resources for compliant web data collection
Join the team building the future of web data
We're Hiring
Trust Center
Security, compliance & certifications

Login Try Zyte API Contact Sales

Search blog articles

AI66, 66 articles

Data quality13, 13 articles

Developer interest57, 57 articles

Integration2, 2 articles

Open-source41, 41 articles

Proxies29, 29 articles

Scraping practice19, 19 articles

Scraping strategy29, 29 articles

Web data60, 60 articles

Web scraping APIs36, 36 articles

Scrapy47, 47 articles

Scrapy Cloud14, 14 articles

Web Scraping Copilot11, 11 articles

Zyte API57, 57 articles

AI & Machine Learning3, 3 articles

Automotive2, 2 articles

E-commerce & retail27, 27 articles

Entertainment & Streaming2, 2 articles

Financial Services8, 8 articles

Government2, 2 articles

Market Research & Intelligence3, 3 articles

Media & publishing8, 8 articles

Real Estate2, 2 articles

Recruitment & HR3, 3 articles

Transportation & Logistics2, 2 articles

Travel & hospitality2, 2 articles

iPaaS2, 2 articles

Large language model24, 24 articles

MCP3, 3 articles

Python88, 88 articles

Web Scraping Industry Report14, 14 articles

Appearance

Discord Community

BlogAI-assisted data extractionThe harness matters more than the model - Podcast EP07

ArticleInterview, Q&AAI-assisted data extraction Large Language Models (LLMs)Developer interest Data quality

The harness matters more than the model - Podcast EP07

"The model is the engine — but the harness is everything else." In Episode 7, we dig into why the infrastructure layer around your AI model matters more than the model itself, rank the best models available right now, and ask whether the open-weighted revolution is about to make frontier subscriptions obsolete.

John Rooney · Developer Engagement Manager

11 min read · June 27, 2026

The harness matters more than the model - Podcast EP07

The Harness Matters More Than the Model

Extract Data Podcast, Episode 7 — Reflections on models, system prompts, and the infrastructure layer nobody talks about enough

Apple podcasts link - Direct listen

There's a phrase from this week's conversation that I keep coming back to: "flirt with all the models, but marry one harness." It came from Ayan, and it probably captures the single most important practical takeaway from our discussion better than anything else I could say.

In Episode 7, Neha, Ayan, and I went deep on where the AI tooling landscape is actually heading — not the flashy benchmark numbers, but the quieter, more structural question of where value actually accumulates when you're building with these models day to day. We also got into our current model rankings, some honest reflections on context management, and the geopolitics of open-weighted models. A lot to unpack.

What Even Is a Harness?

We spent a good chunk of this episode making sure everyone — including Neha, who asked exactly the right beginner questions — understood what a harness actually is, because it's a term that gets thrown around without much explanation.

The short version: a harness is the environment from which you invoke a model. It's not the model itself. Claude Code is a harness. Codex is a harness. Open Code, Pear AI — these are all harnesses. The model is the engine; the harness is the chassis, the wheels, the dashboard, and the safety systems.

Ayan put it well: think of it as a small personal operating system for working with AI. When you spin up Claude Code, everything it knows about your project — the tools it can call, the MCP servers it has access to, the files it's allowed to read or modify — that's the harness doing its job. The model underneath is almost secondary to that configuration.

This framing matters more than it might seem, because most of the discussion in AI communities focuses almost entirely on the model layer. Which model is smarter? Which scored higher on some benchmark? But as the gap between models closes — and it's closing fast — the harness becomes the real differentiator.

System Prompts: The Invisible Guardrail

The harness conversation naturally led us into system prompts, which are worth understanding even if you never write one yourself.

When you make any API call to a language model, there are effectively two layers of instruction: the user prompt (what you're asking it to do right now) and the system prompt (a set of prior instructions that shape how the model behaves before it even sees your request). Harness providers write and maintain the system prompt on your behalf. When you're using Claude Code, for example, there's a substantial system prompt already in place — one that includes things like "you cannot write to a file without reading it first."

That might sound like a small constraint, but it's actually a meaningful safety net. Ayan made the point that when he's testing a brand-new model with zero track record — say, something that just appeared on Hugging Face — running it inside Claude Code's harness gives him some confidence that it won't do something catastrophic to his file system. The system prompt contains the guardrails that the model would otherwise have no knowledge of.

Compare that to making a raw API call with no system prompt at all. In that case, the model is fully stateless: it has no knowledge of your environment, your constraints, or your preferences. It'll do its best with what you give it, but you've removed all the scaffolding. That's fine in controlled situations — and sometimes it's exactly what you want — but it's worth understanding that the absence of a system prompt is itself a decision.

This came up in the context of Ayan's loop engineering work, where a repair agent is given a broken Scrapy spider and asked to fix it. Without a system prompt explicitly saying "fix the spider, not the HTML," there's a non-trivial chance the model decides the easiest path is to rewrite the HTML so the spider works — which is obviously useless for scraping a real website you don't control. The system prompt closes that ambiguity.

Harness Philosophy: Safety vs. Control

Ayan raised an interesting tension here. Claude Code's system prompt is famously large — people have complained about how much context window it consumes. But that size is the point: it's load-bearing. The guardrails are in there precisely because they need to be there.

Smaller, leaner harnesses like Pear AI trade some of that safety net for efficiency and flexibility. That's a legitimate trade-off depending on your use case, but it's worth going in with eyes open. The analogy I kept coming back to: it's like the difference between running a curated Linux distribution versus compiling everything from scratch. Full control is genuinely available; you just need to know what you're giving up.

There's also a broader architectural debate happening in the harness space right now. One school of thought favours multiple parallel agents running simultaneously. Another — and Ayan's firmly in this camp, with his background in Linux kernel development — favours sub-agents: spawning child processes that have their own independent context windows and report back only what's relevant. Parallel agents share a context window and get noisy; sub-agents stay scoped. It's forks versus daemons, essentially, just applied to AI orchestration.

The Model Rankings (As of Right Now)

With all that said about harnesses, we did still talk about the models themselves — because they're not irrelevant, just less decisive than the infrastructure around them.

Ayan's current ranking, based on his own testing:

Fable 5 — his favourite, but unavailable to us without an NDA arrangement with Anthropic that isn't happening. A shame.
GPT 5.5 — a significant leap from GPT 5.4, and accessible via Codex subscription without needing direct API spend. Surprisingly good value.
Claude Opus 4.8 / GLM 5.2 — roughly equivalent at this level. GLM 5.2 is particularly strong for front-end and design work.
Kimi 2.7, DeepSeek V4, Composer, Gemini 3.5 Flash — the tier below, each with specific strengths.

My own usage is a bit more prosaic. Day to day I lean on Sonnet 4.6 and Opus 4.8 because we have access to them. Outside of that, I've been using GLM 5.2 a fair amount for building and coding — it's thorough, sometimes almost too thorough, but very capable. Deep Seek V4 Pro is also a regular for me when I want a good price-to-performance ratio. And my Hermes agent runs on Deep Seek Flash, because all it needs to do is process text, respond, and save things — there's no point using a frontier model for that.

Neha's been spending most of her time in Codex lately, having drifted away from Claude Pro. Honestly, at the current price point, Codex is hard to argue with.

One genuinely fun data point: Ayan mentioned a site called inthewids.com that runs your name across multiple models to see which ones have training data about you. Deep Seek apparently knows quite a lot about me as a figure in the web scraping community. Nobody else got a hit on Deep Seek. Proof that lurking in the right corners of the internet does eventually compound.

The Token Cost of Being Thorough

One practical note that came out of both Ayan's and my experience with GLM 5.2: it's verbose. Extremely thorough, often to a degree beyond what's actually needed. There are tools like Caveman (which restricts output style to reduce token count) and Ponytail (which apparently adjusts both input and output for minimal, functional code) that can help with this.

The deeper point is that model cost isn't just a function of the model's base rate — it's also a function of how much output it generates. A cheaper model that produces twice the tokens can end up costing more than a pricier model that's more concise. Worth keeping in mind if you're running long agentic loops, which leads to...

Speed Doesn't Matter the Way You Think It Does

We started the episode by comparing GLM 5.2 to Claude Sonnet, and one of the first things I noticed was that GLM 5.2 felt slower. Ayan quickly pointed out that the slowness was almost certainly an Open Router routing issue — different providers offer wildly different inference speeds for the same model, and Open Router's default routing doesn't always prioritise speed.

But more interestingly: does it even matter? If you're using a model interactively, token-per-second speed is very real and very annoying when it's slow. But most of the work I'm building toward is automated — the model runs, does its thing, I come back to a result. In that context, a model that takes twice as long to run is almost never the bottleneck. This is especially true for the kind of multi-hour agentic workflows that Ayan mentioned are becoming more common, where a benchmark and testing mechanism just loops until a goal is reached.

Context Management, and Knowing When to Walk Away

Near the end we talked about context management hacks. I'll be honest: mine are non-existent. Heavy, inefficient context windows are my natural state. The closest thing I do is occasionally save a plan to a markdown file so I can use it as a clean handoff in a new session.

Ayan's approach is more systematic. He uses /compact and /clear to manage Claude Code sessions, and — importantly — asks the model to write a handoff message before clearing, summarising what was accomplished and what comes next. That message then seeds the next agent session. It's essentially a baton pass: no context is lost, but the window stays clean.

He also leans on sub-agents for research tasks. Rather than loading everything into one expanding context, he has the main agent spin up several Haiku or Sonnet sub-agents, each with their own independent context window, to fetch and process information in parallel. Only the relevant output comes back — not the noise.

Neha's answer was the most honest of all: she's at peak AI fatigue, wants to put down the tools, fill up her context window with actual humans and beaches, and remember what creativity felt like before it got outsourced. Which, honestly, is worth hearing. Even for those of us who are deep in this stuff, the signal-to-noise ratio of AI tooling can degrade your own thinking if you're not careful.

The Open-Weighted Question

There's a broader political dimension to all of this that we touched on: the US government apparently requested a staggered rollout of GPT 5.6, with individual user approvals. Ayan's quietly rooting for open-weighted models to win precisely because he doesn't want a government gatekeeping his access to inference.

The irony is real: for decades, open source was largely a Western project — Linux, Apache, the whole stack. Chinese labs were the closed ones. AI has inverted that completely. The best open-weighted models right now are coming out of Chinese labs (GLM, DeepSeek, Kimi), with permissive licenses and public weights. Meanwhile, frontier US models are getting more restricted, not less.

The gap is closing fast. Ayan's estimate: six months before open-weighted models are genuinely on par with state-of-the-art frontier models for most tasks. At that point, the question of self-hosting becomes much more interesting — even if true self-hosting (running a capable model on your own hardware without renting GPU time) is still a way off for most people.

His prediction for the near future: people will take these open-weighted base models and fine-tune them for specific use cases. GLM 5.2 for web scraping. GLM 5.2 for front-end design. Smaller, purpose-built variants that don't require racks of GPU memory to run. That's happening already on Hugging Face.

One Last Thing

We're going to be at EuroPython in Kraków. If you're there, come find us at the booth and mention the podcast — we'll sort you out with some swag. And the loop engineering meetup recording should be live in the end cards of this episode.

See you next week.

Extract Data is a weekly podcast on web scraping, data engineering, and the AI tools that are changing how we build. Hosted by John Rooney, Ayan Pahwa, and Neha Setia.

Try Zyte API

Build your first scraper in minutes

Free trial, no credit card. From a single request to production in an afternoon.

AI-assisted data extraction Large Language Models (LLMs)Developer interest Data quality

Developer Engagement Manager

John is the Developer Engagement Manager at Zyte, working closely with the community, creating content and helping developers learn web scraping, Zyte products an much more. He has spoken at Extract Summit's and also creates the workshop's for the events.

X (Twitter)
LinkedIn

More from this author

In this article

The Harness Matters More Than the Model
What Even Is a Harness?
System Prompts: The Invisible Guardrail
Harness Philosophy: Safety vs. Control
The Model Rankings (As of Right Now)
The Token Cost of Being Thorough
Speed Doesn't Matter the Way You Think It Does
Context Management, and Knowing When to Walk Away
The Open-Weighted Question
One Last Thing

Follow

Get the latest

Zyte and the data web in your inbox — or wherever you already are.

Or follow elsewhere

The Community · Newsletter

The best of Zyte and the data web, in your inbox.

One curated edition — new articles, product updates, and the stories shaping the data web. No noise.

Services

Zyte Data

Coding tools & hacks straight to your inbox. Bi-weekly dosage of all things code.

Web Scraping API

Zyte API

Coding tools & hacks straight to your inbox. Bi-weekly dosage of all things code.

Developers

Zyte Developers

Coding tools & hacks straight to your inbox. Bi-weekly dosage of all things code.

Product & E-commerce
Data for AI
Job Posting
Real Estate
News & Articles
Search
Social Media

Blog
Learn
Case Studies
Webinars
White Papers
Join our community
Documentation

Meet Zyte
Contact us
Jobs
Support
Terms and Policies
Trust Center
Do not sell
Cookie settings

Web Data Compliance
Open Source
What is Web Scraping
Web Scraping in Python: Ultimate Guide
Stop getting blocked, start scraping

Most loved workplace certificate

Zyte reward

G2 reward

G2 reward

G2 reward

X Facebook Instagram YouTube LinkedIn Discord

© Zyte Group Limited 2026