PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Register now
Data Services
Pricing
Login
Try Zyte APIContact Sales
  • Unblocking and Extraction

    Zyte API

    The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing

    Ban Handling

    Headless Browser

    AI Extraction

    SERP

    Enterprise

    DocumentationSupport

    Hosting and Deployment

    Scrapy Cloud

    Run, monitor, and control your Scrapy spiders however you want to.

    Coding Agent Add-Ons

    Agentic Web Data

    Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.

  • Data Services
  • Pricing
  • Blog

    Learn

    Case Studies

    Webinars

    Videos

    White Papers

    Join our Community

    Featured Posts

    Building superior AI models with quality web data
    Blog Post
    Powerful new spending controls and usage insights for Zyte API
    Blog Post
  • Product and E-commerce

    From e-commerce and online marketplaces

    Data for AI

    Collect and structure web data to feed AI

    Job Posting

    From job boards and recruitment websites

    Real Estate

    From Listings portals and specialist websites

    News and Article

    From online publishers and news websites

    Search

    Search engine results page data (SERP)

    Social Media

    From social media platforms online

  • Meet Zyte

    Our story, people and values

    Contact us

    Get in touch

    Support

    Knowledge base and raise support tickets

    Terms and Policies

    Accept our terms and policies

    Open Source

    Our open source projects and contributions

    Web Data Compliance

    Guidelines and resources for compliant web data collection

    Join the team building the future of web data
    We're Hiring
    Trust Center
    Security, compliance & certifications
Login
Try Zyte APIContact Sales

Zyte Developers

Coding tools & hacks straight to your inbox

Become part of the community and receive a bi-weekly dosage of all things code.

Join us
    • Zyte Data
    • News & Articles
    • Search
    • Social Media
    • Product
    • Data for AI
    • Job Posting
    • Real Estate
    • Zyte API - Ban Handling
    • Zyte API - Headless Browser
    • Zyte API - AI Extraction
    • Web Scraping Copilot
    • Zyte API Enterprise
    • Scrapy Cloud
    • Solution Overview
    • Blog
    • Webinars
    • Case Studies
    • White Papers
    • Documentation
    • Web Scraping Maturity Self-Assesment
    • Web Data compliance
    • Meet Zyte
    • Jobs
    • Terms and Policies
    • Trust Center
    • Support
    • Contact us
    • Pricing
    • Do not sell
    • Cookie settings
    • Sign up
    • Talk to us
    • Cost estimator
All articles
Discord Community

Why I'm adding GLM-5.2 to my agentic coding arsenal

Posted on June 19, 2026
Is GLM-5.2 really closing the gap to Anthropic - and at just a fraction of the cost - or is it just more AI hype? I think so, and let me show you why.
By
Ayan Pahwa

I have a habit that occasionally annoys the people who pair with me: I am always swapping models in and out of my stack. The setup I described in my agentic coding setup already mixes vendors freely, with Claude doing the heavy planning and execution while a separate orchestration runs DeepSeek as the coder and Qwen as the reviewer, and I keep an OpenRouter key with loaded credits around precisely so I can throw a new model at real work the week it lands without committing to anything. When z.ai open-sourced GLM-5.2 couple days ago, a wave of "this beats the frontier" posts went up across the forums, and rather than believing whether the headlines are true I decided to test it myself, because my workflow will have to outlive any new model and there has been a lot of such launches lately.

How I evaluate a new model

What I trust is putting the model on the work I genuinely do and watching it behave, which for me means web scraping: generating Scrapy spiders and selectors, extracting structured data from messy HTML, tool-calling and planning the shape of a scraping project before any code gets written.

I send the identical prompt to the new model so nothing is biased by wording, I price every response at that model's published per-token rate to get a directional cost picture rather than an accounting-grade one, and I reach the open model through OpenRouter because that is one of the easiest way to trial anything new while the Claude models run on my normal account. Where I can score a result objectively I do, checking whether generated code compiles, whether extracted JSON is valid, and whether it covers the schema I asked for, and then I read every single output myself, because the automated scores lie more often than you would like.

What GLM-5.2 is, sourced

GLM-5.2 is a 753-billion-parameter Mixture-of-Experts model with around 40 billion parameters active per token, a one-million-token context window, and an MIT license, per its model card. On the one independent yardstick available, Artificial Analysis's Intelligence Index, it scores 51, which makes it the top open-weight model and fourth overall behind Claude Fable 5, Opus 4.8, and GPT-5.5, and it posts about 78% on Terminal-Bench 2.1, as Artificial Analysis reported. Its own published benchmarks trail Opus 4.8 on essentially everything, usually by a narrow margin that widens on the hardest long-horizon tasks, and you should note that press coverage widely cites a 744-billion-parameter figure while the official card says 753 billion, a small discrepancy I am flagging rather than papering over.

You will also read that it was trained entirely on Huawei Ascend chips with zero NVIDIA hardware, which I would treat with care, because Z.ai never made that claim for GLM-5.2 specifically, there is no technical report, and the figure of 100,000 Ascend chips is inherited from the broader GLM-5 family and remains unaudited, with skeptics at The Register calling a related claim "sophistry." It is probably directionally true and it would matter a great deal if it were, but I am not going to assert it as an established fact.

In frontend benchmarking it even beat Opus and is just next to Fable5, but since I don’t do much frontend, I didn’t test it in that department.

What it did on my work

On execution, GLM-5.2 reached parity with Sonnet, and that genuinely surprised me. The extraction tasks were a dead heat, with both models hitting 100% schema coverage and correctly pulling every item from a listing page as valid JSON, and the spiders were all idiomatic and all compiled, using Scrapy conventions like response.follow for pagination alongside sensible selectors and regex price cleaning. For everyday scraping code I could not reliably tell which output came from which model without looking at the filename.

1import scrapy
2
3class BooksToScrapeSpider(scrapy.Spider):
4    name = "books_to_scrape"
5    allowed_domains = ["books.toscrape.com"]
6    start_urls = ["https://books.toscrape.com/"]
7
8    def parse(self, response):
9        for book in response.css("article.product_pod"):
10            yield {
11                "title": book.css("h3 a::attr(title)").get(),
12                "price": book.css("p.price_color::text").get(),
13                "rating": (book.css("p.star-rating::attr(class)").get() or "").split()[-1],
14            }
15        next_page = response.css("li.next a::attr(href)").get()
16        if next_page:
17            yield response.follow(next_page, callback=self.parse)
Copy

On planning, the part where I normally spend the expensive Opus tokens, it came in roughly 8.5 times cheaper, which is where the cost picture gets hard to ignore.

Task GLM-5.2 cost Current model GLM cheaper by
Project planning $0.013 Opus 4.8 ($0.110) 8.5×
Selector generation $0.004 Sonnet 4.6 ($0.015) 4.2×
Extraction to JSON $0.019 Sonnet 4.6 ($0.036) 1.9×
Simple spider $0.005 Sonnet 4.6 ($0.006) 1.1×

Across the whole suite the totals came to $0.068 for GLM-5.2, $0.122 for Sonnet, and $0.212 for Opus, which tells a clear story as long as you read the next section before

Where it loses, because it does

If I stopped at the cost table I would be selling you something, so here are the two real problems I hit. The first is that GLM-5.2 is verbose, and verbosity is not free, because it spent 1,144 output tokens writing a spider that Sonnet completed in 357, and it used 69 tokens simply to reply with the word "PONG." Its per-token price is low enough that it still wins on cost, but the verbosity erodes the headline advantage and makes the model noticeably slower, taking 24 to 55 seconds against Sonnet's 8 to 20. If you do adopt it, tame this at the source by turning its reasoning effort down and prompting for terse, code-only output, and although the various "talk like a caveman" skills such as caveman package the same terseness trick for Claude Code, it is worth understanding that they shape output style and will not trim the hidden reasoning tokens where much of GLM's bloat actually lives, which is exactly what the native reasoning-effort setting controls.

The second problem is that its knowledge of our own API was staler than Sonnet's. On the task that wired a spider to Zyte API, neither model reached for Zyte's current one-line addon setup and both fell back to the older manual middleware wiring, but Sonnet's configuration would actually run while GLM's was incomplete and would error on startup, because it omitted the required asyncio reactor and request-fingerprinter settings. The caveat that matters here is that I gave neither model web access in this test, so this measures training knowledge rather than live documentation lookup, and defaulting to an outdated pattern is the textbook symptom of training-data lag. Drive either model inside an agentic loop with documentation-fetching tools, which is how you would really run it in anger, and it can pull the current docs, a shift my colleague unpacks in what's becoming of web scraping developers in the age of AI agents. Stripped of tools, though, Sonnet was simply more current.

Running it inside a harness, which is where I actually work

Raw completions are only half the picture, because my real work happens inside Claude Code and OpenCode, where the model has to drive a loop rather than answer once: call a tool, read the result, decide the next step, and know when not to bother. So I ran a second probe aimed squarely at that, and GLM-5.2 handled everything I threw at it, emitting a correct web search call with valid arguments, choosing the file-reading tool over search when the task demanded it, continuing the conversation sensibly after I fed a tool result back, and showing the restraint to answer "what is two plus two" directly instead of reaching for a tool it did not need. Sonnet did the same, and the only visible difference was speed. That parity is the real headline for agentic use, because a model that writes beautiful code but bungles its tool calls is dead weight in a loop.

The cost gap held up inside that loop too, because running the whole tool-calling probe through OpenRouter brought GLM-5.2 in at roughly $0.002 against Sonnet's $0.013 at published rates, close to seven times cheaper, and the raw logs explain why: GLM stayed terse here, spending 20 to 30 output tokens per tool call rather than the rambling it produces on open-ended code, so its verbosity turns out to be task-dependent rather than constant. Two details in those logs are worth dwelling on. The Sonnet rows read $0.00 because I route Anthropic through my own key under OpenRouter's bring-your-own-key (BYOK) mode, which bills my Anthropic account directly rather than OpenRouter credits, so the honest comparison is the published-rate figure rather than the dashboard total. The speed column, meanwhile, is a lesson in provider routing, because GLM served from Z.ai's own infrastructure crawled along at three to five tokens per second while the same model served from a different provider ran at nearly 30, which means the latency I flagged earlier is as much about where you route the request as about the model itself.

OpenRouter request logs comparing Claude Sonnet 4.6 and GLM-5.2 on tool-calling requests, showing input and output token counts, cost, and tokens-per-second speed

Figure : My own OpenRouter logs for the tool-calling run. The Sonnet rows show $0.00 because they are billed through my own Anthropic key under BYOK, while GLM-5.2's genuine per-request cost and the wide speed gap between the Z.ai and third-party providers are both visible.

The practical surprise is that you do not have to leave your harness to use it, because Z.ai ships an Anthropic-compatible endpoint, so pointing Claude Code at GLM-5.2 is mostly a matter of environment variables, after which your MCP servers, skills, and hooks keep working unmodified, as Z.ai documents in its Claude Code guide. OpenCode, where I already run DeepSeek and Qwen, takes the same endpoint or an OpenRouter route through its provider config.

1export ANTHROPIC_BASE_URL="https://api.z.ai/api/coding/paas/v4"
2export ANTHROPIC_AUTH_TOKEN="your-z-ai-key"
3export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
4export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
5export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
Copy

That [1m] suffix points at the model's defining feature in a harness, a one-million-token context window that is five times larger than its predecessor and big enough to hold a sprawling codebase or a long, messy scraping-agent trajectory without the harness summarizing it away. The window changes how you manage context rather than freeing you from managing it: you raise the auto-compact threshold so Claude Code uses the full window before it starts summarizing, and you lengthen the request timeout, because a verbose reasoning model can think for a long time before the first token arrives and the default timeout will kill the call mid-flight. The flip side is that the window is not free, since a model this wordy fills context faster than a terse one and a near-full million-token prompt is real money even at $1.40 per million input tokens, so the right mental model is a capability to wield deliberately rather than an excuse to stop pruning context.

My verdict

Good at: structured extraction from messy HTML, where it matched Sonnet on accuracy and schema coverage, and idiomatic everyday Scrapy code, both produced at a fraction of the cost while comfortably handling enormous inputs thanks to the one-million-token context, and crucially it drives a tool-calling loop as reliably as Sonnet, which is what makes it usable in a harness at all.

Weak at: brevity and speed, since it burns tokens and wall-clock time, and recall of fast-moving library specifics from memory alone, where its training lag showed against a more current Sonnet.

Where I would use it: as a cost-sensitive workhorse for high-volume, routine scraping work, ideally inside an agentic loop that can fetch current documentation to cover the staleness, and self-hosted whenever data residency rules out a hosted API.

What it can drop in for: Sonnet on standard execution tasks today, and Opus on routine planning when you want the dramatic cost saving and can tolerate the slower, wordier output, though I would not yet hand it the gnarliest long-horizon planning where Opus still pulls ahead, and it slots straight into Claude Code or OpenCode through Z.ai's Anthropic-compatible endpoint so adopting it never means abandoning the harness you already use.

Hedging your agentic setup

There is a strategic reason to keep an open-weight model in the arsenal beyond cost, and GLM-5.2 is not even the only candidate, because Kimi K2.6 (Moonshot), DeepSeek V4 (MorphLLM), and Qwen 3.5 (review) each beat or matched a frontier model on something this year, and all of them ship downloadable weights. The value of holding two or three of these wired in became vivid on June 13, 2026, when the US Commerce Department barred Anthropic from serving its top Fable 5 and Mythos class models to any foreign national and Anthropic disabled them globally to comply, as Fortune reported, the first export action aimed at an AI model rather than a GPU or chipset, almost right after the moment I unpacked using it in - where Fable 5 and web scraping fit into loop engineering.

A closed model can be rolled back, repriced, or legislated out from under you with no notice, whereas MIT-licensed weights already sitting on your disk cannot be revoked, and self-hosting them is also the cleanest answer to routing your data from infra you own or trust.

What this leaves me thinking about

I am not ripping Opus and Sonnet out of my workflow, because they are faster and more current and I still want the best planner I can get for the hardest problems, yet the exercise quietly moved a belief I did not realize I was holding. Swapping a frontier model for an open one on a meaningful slice of my work turned out to be a twenty-minute test rather than a migration project, and once that is true, loyalty to any single model is mostly untested habit dressed up as a standard. The durable advantage was never the model, because the leaderboard will reshuffle again before this post is a month old; it is the small, unglamorous harness that lets me find out, in an afternoon, what should run where. So the question I would leave you with is the one this experiment really answered for me: if your default model vanished tomorrow, banned, deprecated, or simply beaten by something open and cheaper, how quickly could you prove what should take its place? If the honest answer is that you do not know, then the most valuable thing you can build this quarter is not another spider, it is the test that answers this question for you.

×

Try Zyte API

Zyte proxies and smart browser tech rolled into a single API.
Start FreeFind out more

Table of contents

Get the latest posts straight to your inbox

No matter what data type you're looking for, we've got you

G2.com

Capterra.com

Proxyway.com

EWDCI logoMost loved workplace certificateZyte rewardISO 27001 iconG2 rewardG2 rewardG2 reward

© Zyte Group Limited 2026