Teaching AI to scrape like a pro: how we measure LLMs’ data quality

AI-enabled code editors can now conjure scraping code on command. But is it any good? Here’s how Zyte re-engineered LLMs with Web Scraping Copilot to drive best-in-class output.

Theresia Tanzil · Content Writer

10 min read · February 23, 2026

In the past couple of years, AI coding assistants have gone from magic power to business-as-usual. You open your code editor, type a comment, and a Large Language Model (LLM) fills in the blanks.

But there's a problem. When you ask a general AI assistant to write code, it's pulling from billions of lines of examples. Very few of them show what good code looks like.

When it comes to specialist code cases, like web scraping, that’s problematic, because scraping code - especially high-quality scraping code - is relatively under-represented in the global training data.

So, how can AI help write good scraping code? This was a key design consideration when we built Web Scraping Copilot, an AI-powered Visual Studio Code extension that specializes in generating and managing web scraping code.

We wanted to help web scraping developers not just to code faster, but also to ship good scraping code, fast.

So, what does "good" scraping code actually look like, and how do you get it?

What makes scraping code ‘good’?

Before teaching Web Scraping Copilot how to generate quality code, we first had to define what constitutes quality in the specific case of web scraping.

The data industry has never really formalized such a standard, so Zyte needed to build a measurement system where none existed.

We turned to our team of hundreds of Scrapy experts, distilling its experience in creating accurate and maintainable Scrapy code into three quantifiable dimensions that describe data accuracy and code maintainability.

| Variable | Area | Measurement test |
| --- | --- | --- |
| ROUGE-1 F1 adj | Data accuracy | The code extracts the right data with the right values. |
| Source lines of code (SLOC) | Code complexity | The code is tight, with nothing superfluous. |
| Cyclomatic complexity | Code complexity | The logic is simple and understandable. |

If we could define what “accuracy” and “maintainability” actually mean, we could score them.

1. The accuracy challenge: Measuring messy data

Measuring the accuracy of extracted web data is trickier than it seems, because desired output is not always clear from AI prompt inputs.

So we adapted a metric from natural language processing called ROUGE-1 F1, which measures token-level overlap between texts, and extended it to handle structured web data.

This metric gives partial credit for values that differ in formatting but are semantically equivalent (like “24.99”, “$24.99”, and “24.99 USD”) – letting us score thousands of extraction attempts without penalizing harmless variations.

ROUGE-1 F1 scores fall on a scale from 0 to 1, with higher values indicating higher accuracy. With this benchmark in our toolset, we could be confident that we were not skipping relevant data.
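A minimal sketch of plain ROUGE-1 F1 may help make the scoring concrete. It does not reproduce Zyte's adjusted variant, whose extension for structured data is not specified in this article. The `tokenize` rule and the function names are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase words and decimal numbers, symbols dropped."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference value and an extracted value."""
    ref = Counter(tokenize(reference))
    cand = Counter(tokenize(candidate))
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Under this assumed tokenizer, `rouge1_f1("24.99", "$24.99")` is 1.0, while `rouge1_f1("24.99", "24.99 USD")` is about 0.67, because the extra `usd` token lowers precision. A mismatched price such as `"19.99"` scores 0.0.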

2. The maintainability challenge: The leaner, the better

Scraping code that picks up accurate data also needs to be easily understood and adapted as websites change.

The first signal we look at for maintainability is the length of the code generated.

We record the source lines of code (SLOC) in each generated spider.

Fewer lines generally mean less surface area for bugs and lower maintenance cost over time. Keeping SLOC low encourages spiders that are focused, declarative, and easier to reason about.
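As a rough illustration, SLOC can be approximated by counting lines that are neither blank nor comment-only. This is a simplified sketch, not necessarily the counting rule Zyte uses: it treats docstring lines as code, and the `count_sloc` name and sample source are hypothetical.

```python
def count_sloc(source):
    """Count lines that are neither blank nor comment-only.

    Simplification: docstrings and multi-line strings count as code here.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical callback body, for illustration only.
SAMPLE = '''
# Extract the page title.
def parse(response):
    title = response.css("h1::text").get()

    return {"title": title}
'''
```

Here `count_sloc(SAMPLE)` returns 3: the blank lines and the comment are ignored, leaving the `def` line and the two statements.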

3. The complexity challenge: Going deep without getting lost

Yet, code length alone doesn’t tell the full story. Two spiders with the same number of lines can vary in how easy they are to understand.

That’s where cyclomatic complexity comes in.

Cyclomatic complexity measures how many independent decision paths exist in a piece of code - essentially, how many branches, conditionals, and forks a reader has to keep in their head at once.

Lower values are generally better: they indicate linear, predictable logic that is easier to test and modify. Higher values suggest brittle code where small changes can have unintended side effects.
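One way to approximate this count for Python code is to walk the syntax tree and add one for every decision point. The sketch below uses the standard `ast` module; the set of node types counted is an assumption, and dedicated tools apply their own, more complete rules, so their scores may differ.

```python
import ast

# Node types treated as one extra decision path each (an assumed, partial list).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 plus the number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One path for the loop, one per "if" filter.
            score += 1 + len(node.ifs)
    return score


# Hypothetical helper, for illustration only.
EXAMPLE = '''
def parse_price(text):
    if not text:
        return None
    for symbol in ("$", "USD"):
        text = text.replace(symbol, "")
    return float(text) if text.strip() else None
'''
```

For `EXAMPLE`, the function returns 4: the base path plus one `if`, one `for`, and one conditional expression. Straight-line code with no branches scores 1.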

| Cyclomatic complexity score range | Interpretation |
| --- | --- |
| 1 - 10 | Simple to moderate complexity. Low risk. |
| 11 - 20 | Moderate. Careful review needed to justify. |
| 21 - 40 | Complex. Difficult to test and maintain. |
| Above 40 | Unmaintainable. |

A well-structured spider would typically land at around five to 15.

Bringing it together

Taken together, these metrics let us evaluate a scraper from multiple angles at once. Here’s what that looks like for a single spider:

| Scraper name | rouge1_f1_adj | SLOC | Complexity |
| --- | --- | --- | --- |
| Product scraper for website A | 0.7955 | 35 | 6.25 |

In this example:

  • A ROUGE-1 F1 adj score of ~0.8 indicates good extraction accuracy, with minor acceptable variations in formatting.

  • 35 source lines of code suggests the scraper is compact.

  • A cyclomatic complexity of 6.25 means the logic is straightforward, with intuitive branching.

Together, they give us a practical, repeatable way to judge whether a scraper has good quality.

Iterating toward production quality

With our scoring system in place, we could move toward building a Visual Studio Code extension that reliably produces good scraping code.

For Web Scraping Copilot, that meant perfecting our own extension code and crafting embedded prompts that it uses to turn mass-market LLMs into expert spider generators.

We used the following process to establish target thresholds for each score:

  • Data accuracy: The team produced a source-of-truth dataset - a pre-assembled list of 1,250 on-page data fields, from hundreds of URLs, that are known to be correct. By comparing output from our LLM-produced spider code against the values known to be correct, we could make changes to nudge that rouge1_f1_adj score ever closer to 1.

  • Code complexity: Zyte specialists reviewed the SLOC and cyclomatic complexity scores for LLM-produced spiders to assess whether the generated spiders met their expectations for clarity and structure.

After a couple of iterations, it became clear that good LLM-generated scraping code has, on average, a scorecard like this:

| rouge1_f1_adj | SLOC | Complexity |
| --- | --- | --- |
| 0.8 + | 30 to 40 | < 12 |
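To show how such a scorecard could be checked in code, here is a small sketch. Treating the average targets as per-spider pass/fail thresholds is an assumption made for illustration, as are the `Scorecard` and `meets_targets` names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    rouge1_f1_adj: float
    sloc: int
    complexity: float


def meets_targets(card):
    """Compare one scorecard against the targets, read here as strict thresholds."""
    return {
        "accuracy": card.rouge1_f1_adj >= 0.8,
        "sloc": 30 <= card.sloc <= 40,
        "complexity": card.complexity < 12,
    }


# Values taken from the single-spider example earlier in the article.
website_a = Scorecard(rouge1_f1_adj=0.7955, sloc=35, complexity=6.25)
result = meets_targets(website_a)
```

Applied strictly, the earlier example row passes the SLOC and complexity checks and sits just under the 0.8 accuracy line, which suggests the targets are better read as averages across many spiders than as hard gates on each one.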

With these targets in place, improvements followed a reliable process: adjust prompts or tooling, re-run code generation, and check whether changes moved quality in the right direction across all metrics.

Sometimes, gains were obvious. Other times, they revealed trade-offs: a change might reduce the number of generation attempts needed to produce working code (good), while slightly hurting extraction accuracy (bad). In those cases, we only accepted changes when the overall outcome clearly delivered more value than it cost.

AI code quality is real, today

Today, Web Scraping Copilot consistently generates scraping code that meets the quality bar we set during development and does so in a measurable, repeatable way.

Just as importantly, these scores are not treated as a one-time gate. They are monitored continuously. Every prompt change, tooling adjustment, or model upgrade is evaluated against the same metrics to ensure quality does not regress as the system evolves. When we see improvements, we raise expectations. When trade-offs appear, we consider them holistically.

Every iteration brings Web Scraping Copilot closer to thinking less like a generic AI coding assistant and more like a colleague who has spent years writing production scrapers.

And the beauty of scoring our own product’s output in this way is that we can apply the same approach to rating the relative quality of scraping code produced by any of the LLMs usable by the extension.

For instance, when Anthropic released Sonnet 4.6 in February 2026, Zyte’s research and development team was able to crunch the numbers to show how it beat all rival models in most of the score areas.

That is, at the time, Sonnet 4.6, when instructed by Web Scraping Copilot’s best-in-class, secret-sauce scraping know-how, produced the very best auto-generated scraping code.

We are excited to see where these scores go next, as frontier models get better and better.

Where general AI stops, Web Scraping Copilot begins

Most of today’s general-purpose AI coding assistants optimize for plausibility and speed, not for long-term accuracy or maintainability.

Zyte has “taught” the AI to code like our best scraping engineers by defining, measuring, and iteratively improving quality along the axes that matter most: accuracy and complexity.

We believe that gaining and maintaining access to web data should be hassle-free, no matter who, or what, is writing the code.
