Learn web scraping

Guides, tutorials and courses on web scraping and data extraction — from your first request to production pipelines.

What is a residential proxy?

Learn what residential proxies are, how they compare to datacenter proxies, and why modern web scraping needs more than IP diversity.

Arnold Alexander · 10 min read · May 29, 2026

Zyte Blog — field notes from the world of data extraction

Use case

How much do rotating proxies cost?

Learn how much rotating proxies cost, what affects pricing, and why total web scraping costs often go beyond proxy subscriptions.

Arnold Alexander · 10 min read · May 29, 2026

Use case

How do rotating proxies work?

Learn how rotating proxies work, when to use them for web scraping, and why IP rotation alone is not enough for reliable data access.

Arnold Alexander · 10 min read · May 29, 2026

Use case

Are proxies legal for web scraping?

Learn whether proxies are legal for web scraping, what compliance factors matter, and why modern teams use automated unblocking beyond proxy management.

Arnold Alexander · 10 min read · May 29, 2026

AI-assisted data extraction

From screenshot to shopping list in 90 seconds

I built a mood board pipeline that starts with a screenshot. It uses Claude Skills and Zyte API for search, product extraction, and image embedding at any scale.

Neha Setia Nagpal · 10 min read · April 15, 2026

Use case

Should AI Companies Build Their Own Web Scraping Pipelines?

Should AI companies build their own web scraping pipelines? Learn when in-house scraping makes sense and when it becomes costly and hard to maintain at scale.

Arnold Alexander · 10 min read · April 13, 2026

Use case

What Is AI Data Provenance? Definition & Importance

Learn what AI data provenance is and why it matters. Understand data origin, collection methods, governance, and how provenance supports trust and compliance.

Arnold Alexander · 10 min read · April 13, 2026

Data quality

How to ensure data quality in your Scrapy web scraping projects using Spidermon and Claude Code

Spidermon is an open-source monitoring framework for Scrapy. You attach it to your spider, define what "success" looks like, and it automatically checks your crawl results after the spider closes, flagging anything that doesn't meet your standards.

Ayan Pahwa · 5 min read · April 10, 2026

Web data collection

Why your API responses look like gibberish: the gzip decompression trap

The script was working. Requests were going out, responses were coming back with HTTP 200. But the response body was unreadable noise: a wall of binary characters that crashed the JSON parser and left the script reporting "no data found". No error code, no timeout, no network failure; just garbage where structured data should be.

Ayan Pahwa · 6 min read · April 8, 2026

Scraping practice

How to parse HTML tables into structured data (CSV/Excel)

In this guide, you'll learn three things: how HTML tables are actually structured (so the parsing makes sense), how to extract clean tabular data using Python, and how to export it to CSV or Excel.

John Rooney · 7 min read · March 20, 2026

Use case

How to Test Web Scrapers During Development

Learn how to test web scrapers during development. Validate selectors, use HTML fixtures, and ensure reliable data extraction across changing websites.

Arnold Alexander · 10 min read · March 18, 2026

Use case

How Developers Debug Web Scraping Selectors

Learn how developers debug web scraping selectors. Discover common issues, testing techniques, and how to build reliable extraction logic for changing websites.

Arnold Alexander · 10 min read · March 18, 2026