Open Source

Webinars

Sustainability in Open Source | Fireside Chat

Shane Evans

January 5, 2025

Learn how successful open-source projects balance community value with sustainable growth. Industry leaders share insights on monetization, maintenance, and building thriving communities.

Blog

A Deep Dive into Zyte's Open-Source Libraries

Neha Setia Nagpal

1 min

December 19, 2024

Discover how Zyte’s open-source libraries like ClearHTML, Extruct, Chomp.js, and more simplify web data extraction and processing.

Blog

Selenium, Puppeteer, Playwright: Which tool is right for web scraping at scale?

Neha Setia Nagpal

1 min

October 7, 2024

Discover the strengths and limitations of Selenium, Puppeteer, and Playwright for web scraping at scale.

Blog

4 essential Scrapy plugins for building efficient and effective spiders

Neha Setia Nagpal

1 min

August 15, 2024

Here are four essential Scrapy plugins we use to build efficient web crawlers for our customers.

Blog

Choosing Between Puppeteer vs. Selenium for Web Scraping

Karlo Jedud

8 mins

July 10, 2024

Web scraping tools save hours of work by automating data extraction, testing web applications, and performing repetitive tasks.

Blog

The Scraper’s System Part 2: Explorer’s Compass to analyze websites

Neha Setia Nagpal

8 mins

February 16, 2024

In the first part, we discussed a template for defining a clear purpose for your web scraping system, which can help you design better crawlers and prepare for the uncertainty involved in a large-scale web scraping project.

Blog

Dateparser: A Little But Powerful Date Parsing Library

Marc Hernandez Cabot

3 mins

May 6, 2021

Six years ago, Zyte released Dateparser, an open-source library that parses human-readable dates, and in October 2020 we released version 1.0.0, a major milestone.

Blog

Scrapy Update: Better Broad Crawl Performance

Nikita Vostretsov

3 mins

February 18, 2021

When crawling the web, there’s always a speed limit: a spider can't fetch pages faster than the host is willing to send them.

Blog

Building Spiders Made Easy | GUI For Scrapy Shell

Roy Healy

4 mins

March 3, 2020

As a Python developer at Zyte (formerly Scrapinghub), I spend a lot of time in the Scrapy shell.

Blog

ScrapyRT: Turn Websites into Real-Time APIs

Pawel Miech

4 mins

May 14, 2019

If you’ve been using Scrapy for any period of time, you know the capabilities a well-designed Scrapy spider can give you.

Blog

Spidermon: Zyte's secret to data quality

Ian Kerins

5 mins

March 5, 2019

If you know anything about Zyte, you know that we are obsessed with data quality and data reliability.

Blog

Meet Spidermon: Our battle tested spider monitoring library

Renne Rocha

6 mins

March 1, 2019

Absolutely not! Website changes (sometimes very subtly), anti-bot countermeasures, and temporary problems often reduce the quality and reliability of our data.
