Blog

How To Deploy Custom Docker Images For Your Web Crawlers

Valdir Stumm Junior

4 Mins

September 8, 2016

What if you could have complete control over your environment? Your crawling environment, that is...

Blog

Scraping Infinite Scrolling Pages

Valdir Stumm Junior

3 Mins

June 22, 2016

Blog

How To Debug Your Scrapy Spiders

Valdir Stumm Junior

5 Mins

May 18, 2016

Welcome to Scrapy Tips from the Pros! Every month we release a few tricks and hacks to help speed up your web scraping and data extraction activities.

Blog

Machine Learning With Web Scraping: New MonkeyLearn Addon

Cecilia Haynes

5 Mins

April 14, 2016

We deal in data. Vast amounts of it. But while we’ve traditionally been involved in providing you with the data that you need, we are now taking it a step further by helping you analyze it as well.

Blog

Scrapy Tips from the Pros (Part 1): Expert Advice for Better Scraping

Valdir Stumm Junior

5 Mins

January 19, 2016

Blog

Link Analysis Algorithms Explained

Valdir Stumm Junior

6 Mins

June 19, 2015

When scraping content from the web, you often crawl websites you have no prior knowledge of. Link analysis algorithms are incredibly useful in these scenarios to guide the crawler to relevant pages.

Blog

XPath Tips From The Web Scraping Trenches

Valdir Stumm Junior

3 Mins

July 17, 2014

In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors.

Blog

Extract Schema.Org Microdata with Scrapy Selectors

Valdir Stumm Junior

5 Mins

June 18, 2014

We have released an lxml-based version of this code as an open-source library called extruct. The source code is on GitHub, and the package is available on PyPI. Enjoy!

Blog

Optimizing Memory Usage Of Scikit-Learn Models Using Succinct Tries

Mikhail Korobov

7 Mins

March 26, 2014

We use the scikit-learn library for various machine-learning tasks at Zyte. For example, for text classification we'd typically build a statistical model using sklearn's Pipeline and FeatureUnion, a classifier (e.g. LinearSVC), and feature extraction and preprocessing classes.

Blog

Git Workflow For Scrapy Projects

Pablo Hoffman

2 Mins

March 6, 2013

Our customers often ask us what the best workflow is for working with Scrapy projects.

Blog

Spiders Activity Graphs

Pablo Hoffman

2 Mins

August 25, 2012

We often have to write spiders that need to log in to sites in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest.

Blog

Finding Similar Items

Shane Evans

6 Mins

July 23, 2012