The recipe for a request: Scaling data extraction through investigation

Learn how an investigative mindset helps scale data extraction from single requests to millions daily by building resilient, efficient scraping systems.

Kieron Spearing

5 min read · April 16, 2026

Cooking a delicious meal for your wife at the end of the night is super easy.

Now, try cooking 300 plates every day at a Michelin Star level. It's a completely different ball game.

The same is true when we talk about data extraction requests. Sending a single request so that it gets through without a problem is something anyone can do. But scaling that to over 1,000 requests every second changes the game completely.

At Centric Software, we operate at this level, running over 5,000 scrapers that send 130 million requests daily. At this scale, you simply cannot be fixing things every single day. You have to shift your attention from reactive fixes to perfecting the initial development process.

The chef's secret

I used to be a Michelin-trained chef, and I remember working with a colleague who had an amazing capability. He would walk in with a massive list of tasks, and it would just disappear in seconds. I had no idea how he did it. One day, over a beer, I asked him his secret. He told me, "Just take your time."

It sounds counterintuitive, doesn't it? If I work slower on one task, I'll take longer to get to the next. But eventually, it made sense. Every mistake you make is costly. Every time you rush and have to redo something, it takes time away from you. But if you take your time, understand the task, and do it correctly the first time, the net number of tasks you complete is far greater.

“Every minute you spend in the investigation is 10 times that saved in the implementation”

– Kieron Spearing, Data Collection Engineer, Centric Software

This is one of many lessons from the kitchen that can be directly translated to my work today as a Data Collection Engineer.

Adopting an investigative mindset

To build resilient scrapers that can handle thousands of requests per second, you need to adopt an investigative mindset. This is a methodical process for analyzing how a website works before you write a single line of code.

It can be broken down into three key phases:

  1. Learn how the website expects user interaction.

  2. Break down requests to their minimum requirements.

  3. Translate these discoveries into a resilient scraper.

This process ensures you understand the system deeply. As a saying often attributed to Albert Einstein goes:

“If you can’t explain it simply, you don’t understand it well enough.”

 – attributed to Albert Einstein

A practical investigation: ‘Go shopping’

Let's walk through an example. The first step is to simply "Go Shopping." Open the target website in a browser and use it. How is the data represented naturally? How is a user expected to search for and buy a product?

As you interact with the site, your goal is to find where the data is coming from. Using your browser’s developer tools, you can inspect the network traffic and identify the specific API request that fetches the data you need.

Once you’ve located the request, it’s time for experimentation. This is where the fun begins.

  • Take the cURL of the request. Use your browser's developer tools to copy the raw request as a cURL command.

  • Bit by bit, remove components with the intention of breaking the request. Remove headers, cookies, and parameters one by one.

  • Fix it and repeat. When the request fails, you’ve found an essential component. Add it back, document it, and continue removing other parts.
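The remove, break, restore loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: `find_essential_headers`, `fake_server_accepts`, the header values, and the `shop.example` URL are all hypothetical, and the fake server stands in for a real HTTP call so the sketch runs offline. Cookies and query parameters could be treated the same way.

```python
from typing import Callable, Dict, List

Headers = Dict[str, str]


def find_essential_headers(
    headers: Headers,
    request_succeeds: Callable[[Headers], bool],
) -> List[str]:
    """Return the header names whose removal breaks the request.

    Each header is removed in turn. If the request still works, the header
    stays out; if it breaks, the header is put back and recorded as essential.
    """
    if not request_succeeds(headers):
        raise ValueError("the full request must succeed before minimising it")

    current = dict(headers)
    essential: List[str] = []
    for name in list(headers):
        trial = {k: v for k, v in current.items() if k != name}
        if request_succeeds(trial):
            current = trial          # not needed: leave it out
        else:
            essential.append(name)   # needed: keep it and document it
    return essential


# Hypothetical stand-in for a real HTTP call. This pretend server only
# accepts requests that carry both a User-Agent and an X-Api-Key header.
def fake_server_accepts(headers: Headers) -> bool:
    return "User-Agent" in headers and "X-Api-Key" in headers


# Hypothetical headers, as they might be copied from the browser.
captured = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Accept-Language": "en-GB",
    "X-Api-Key": "abc123",
    "Referer": "https://shop.example/search",
}

essential_headers = find_essential_headers(captured, fake_server_accepts)
print(essential_headers)  # ['User-Agent', 'X-Api-Key']
```

In a real investigation, `request_succeeds` would send the request and check the response, and each essential component would be written down before moving on.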

The goal isn't just to get a working cURL command; it's to understand what each part of the request does, learning as you go.

This is why I avoid tools like Postman for this initial investigation, as they can modify the request in subtle ways. A better approach is to route the traffic through an intercepting proxy like mitmproxy, which shows you exactly what is being sent. For performing the investigations and documenting these requests, especially in a team environment, I recommend a tool like Bruno.

During this process, you should be able to answer several key questions:

  • Are cookies required?

  • Are any headers essential?

  • Where are dynamic values generated?

  • Is proxy quality important?

  • Is the proxy tied to the request?

  • Is the header order important?
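One way to keep the answers to these questions is a small structured record per request. The sketch below is an assumption about what such a record could look like, not a format the article or any tool prescribes: the `RequestRecipe` class, its field names, and the example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RequestRecipe:
    """Findings from one investigation, one field per question above."""

    url: str
    method: str = "GET"
    cookies_required: bool = False
    essential_headers: List[str] = field(default_factory=list)
    # Maps the name of a dynamic value to a note on where it is generated.
    dynamic_values: Dict[str, str] = field(default_factory=dict)
    proxy_quality_matters: bool = False
    proxy_tied_to_request: bool = False
    header_order_matters: bool = False
    notes: str = ""

    def to_json(self) -> str:
        """Serialise the findings so they can be shared or committed."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical findings for a hypothetical product API.
recipe = RequestRecipe(
    url="https://shop.example/api/products",
    essential_headers=["User-Agent", "X-Api-Key"],
    dynamic_values={"X-Api-Key": "embedded in the search page's inline script"},
    proxy_quality_matters=True,
    notes="Fails without X-Api-Key; cookies can be dropped.",
)
print(recipe.to_json())
```

Storing the record as JSON next to the scraper keeps the investigation and the code that depends on it in one place.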

Common pitfalls vs. winning strategies

This investigative process helps avoid common pitfalls that lead to brittle scrapers and technical debt. By shifting your strategy, you build for resilience and scale from the very beginning.

| Pitfalls (The Quick Way) | Strategies (The Resilient Way) |
| --- | --- |
| HTML-first approach: Scraping data directly from the HTML structure. | API-first approach: Finding the underlying API that populates the page. APIs are more stable and less likely to change than front-end layouts. |
| Gathering the entire feed at once: Creating a single, monolithic process to discover and collect all data. | Decouple discovery and collection: Use one process to find all the product URLs and a separate process to collect the data for each URL. This prevents a single failure from stopping the entire operation (avoiding cascading failure) and allows for targeted retries. |
| Hitting the website without regard to latency: Sending requests as fast as possible without any delays. | Retries with jitter and/or bounded exponential backoff: Be respectful. Implement intelligent delays and backoff strategies to avoid overwhelming the server, which also reduces the chance of getting blocked. |
| Quick-fix approach: Rushing to get a scraper working to meet a deadline, creating technical debt. | Well-documented investigations & sustainable mindset: Take your time. Thoroughly document your findings. This creates a sustainable system that requires far less maintenance in the long run. |
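To make the retry strategy above concrete, here is a minimal sketch of bounded exponential backoff with jitter. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and a capped, doubling ceiling; that choice, the function names, and the default values of `base` and `cap` are assumptions for illustration, not settings the article specifies.

```python
import random
from typing import List, Optional


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt but never exceeds `cap` (bounded),
    and the actual delay is drawn uniformly below that ceiling (jitter) so
    that many workers do not retry in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0, ceiling)


def backoff_schedule(retries: int, **kwargs) -> List[float]:
    """Delays for `retries` consecutive retries of the same request."""
    return [backoff_delay(attempt, **kwargs) for attempt in range(retries)]


# A seeded generator keeps this demonstration repeatable.
delays = backoff_schedule(8, rng=random.Random(42))
for attempt, delay in enumerate(delays):
    ceiling = min(30.0, 0.5 * 2 ** attempt)
    print(f"retry {attempt}: ceiling {ceiling:.1f}s, waiting {delay:.2f}s")
```

A scraper would sleep for the returned delay before each retry and give up after a fixed number of attempts.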

By documenting your discoveries, you create a blueprint for a robust scraper. You understand your framework's limitations and can build any necessary tools.
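The idea of decoupling discovery from collection, described above, can be sketched as two stages joined by a queue. This is an illustration under assumptions rather than a production design: `discover`, `collect`, `fake_fetch`, and the `shop.example` URLs are hypothetical, the queue lives in memory (a real system would more likely use a persistent one), and retry delays are left out to keep the sketch short.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Job = Tuple[str, int]  # (product URL, attempts made so far)


def discover(listing_pages: Iterable[List[str]]) -> Deque[Job]:
    """Stage 1: walk the listing pages and queue each product URL once."""
    queue: Deque[Job] = deque()
    seen = set()
    for page in listing_pages:
        for url in page:
            if url not in seen:
                seen.add(url)
                queue.append((url, 0))
    return queue


def collect(
    queue: Deque[Job],
    fetch: Callable[[str], dict],
    max_attempts: int = 3,
) -> Tuple[Dict[str, dict], List[str]]:
    """Stage 2: fetch each queued URL independently.

    A failure re-queues only the URL that failed, so one bad page never
    stops the rest of the run.
    """
    results: Dict[str, dict] = {}
    failed: List[str] = []
    while queue:
        url, attempts = queue.popleft()
        try:
            results[url] = fetch(url)
        except Exception:
            if attempts + 1 < max_attempts:
                queue.append((url, attempts + 1))  # targeted retry
            else:
                failed.append(url)                 # give up on this URL only
    return results, failed


# Hypothetical fetcher: /p/2 fails once, /p/3 always fails.
calls: Dict[str, int] = {}


def fake_fetch(url: str) -> dict:
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/p/2") and calls[url] == 1:
        raise ConnectionError("temporary failure")
    if url.endswith("/p/3"):
        raise ConnectionError("permanently broken")
    return {"url": url}


pages = [
    ["https://shop.example/p/1", "https://shop.example/p/2"],
    ["https://shop.example/p/2", "https://shop.example/p/3"],
]
results, failed = collect(discover(pages), fake_fetch)
print(sorted(results))  # ['https://shop.example/p/1', 'https://shop.example/p/2']
print(failed)           # ['https://shop.example/p/3']
```

Because each URL is its own unit of work, the permanently broken page ends up in `failed` while the other two are still collected.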

Ultimately, scale comes from building resilient systems.

So, how can we send 1,000 requests per second as easily as we send one? The answer lies in the methodical, investigative process. Every minute you spend in the investigation is 10 times that saved in the implementation.

Because every chef knows that thorough preparation is the key to fine food.

© Zyte Group Limited 2026