Articles, interviews and analysis on how data is gathered, used and fought over — written by the people closest to it.

MongoDB was used early on at Zyte to store scraped data because it's convenient. Scraped data is represented as (possibly nested) records which can be

Introducing Zyte Smart Proxy Manager - Enhance your web scraping with Smart Proxy Manager. Explore its powerful features and benefits as a smart proxy manager.

Git Workflow for Scrapy Projects - Streamline your Scrapy projects with an efficient Git workflow. Improve collaboration and project management.

We often have to write spiders that need to fill in login forms on sites. Our customers provide us with the site, username and password, and we do the rest.

Spiders Activity Graphs - Visualize your spiders' performance with activity graphs. Optimize your web scraping process with actionable insights.

This post describes an approach to the problem of finding similar items among crawled items and how this was implemented at Zyte.

Scrapy 0.15 Dropping Support for Python 2.5 - Important update for Scrapy users! Discover the changes in the latest release and the end of Python 2.5 support.

AutoScraping: Casts a Wider Net - Learn how to maximize your web scraping efforts with AutoScraping. Find out how it can broaden your data collection horizon.