Blog
Link Analysis Algorithms Explained
Valdir Stumm Junior
6 Mins
June 19, 2015
When scraping content from the web, you often crawl websites that you have no prior knowledge of. In these scenarios, link analysis algorithms are useful for guiding the crawler to relevant pages.
Blog
XPath Tips From The Web Scraping Trenches
Valdir Stumm Junior
3 Mins
July 17, 2014
In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors.
Blog
Extract Schema.Org Microdata with Scrapy Selectors
Valdir Stumm Junior
5 Mins
June 18, 2014
We have released an lxml-based version of this code as an open-source library called extruct. The source code is on GitHub, and the package is available on PyPI. Enjoy!
Blog
Optimizing Memory Usage Of Scikit-Learn Models Using Succinct Tries
Mikhail Korobov
7 Mins
March 26, 2014
We use the scikit-learn library for various machine-learning tasks at Zyte. For example, for text classification we'd typically build a statistical model using sklearn's Pipeline, FeatureUnion, some classifier (e.g. LinearSVC) + feature extraction and preprocessing classes.
Blog
Git Workflow For Scrapy Projects
Pablo Hoffman
2 Mins
March 6, 2013
Our customers often ask us what the best workflow is for working with Scrapy projects.
Blog
Spiders Activity Graphs
Pablo Hoffman
2 Mins
August 25, 2012
We often have to write spiders that need to log in to sites in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest.