Zyte Developers Community newsletter issue #3
Hi there,
If you have not already signed up for the Zyte Developers Community newsletter, you can sign up here.
In this issue:
- Scrapy 2.5.0 is out
- Recipe scraping app (with source code)
- Web scraping in Elixir
- Easy table scraping with R
Scrapy 2.5.0 is out
The first new Scrapy release of the year is here!
Highlights:
- Official Python 3.9 support
- Experimental HTTP/2 support
- New get_retry_request() function to retry requests from spider callbacks
- New headers_received signal that allows stopping downloads early
- New Response.protocol attribute
Recipe scraping app
@mango_mero, as part of the #100DaysOfCode challenge, created an awesome Django demo app that scrapes recipe information in real time using Beautiful Soup. The source code is available on GitHub.
Web scraping in Elixir
If you are using Elixir for web dev and considering a web scraping project, you might want to check out Crawly, a high-level web crawling & scraping framework for Elixir. Take a look at the documentation and the quickstart guide.
Easy table scraping with R
Extracting data from HTML tables can be messy. For one-off jobs, though, there's an easy alternative. If you're using RStudio, there's an addin that makes it easy to scrape tables: datapasta. You literally just copy the table from the page, paste it into the tool and you get the data in structured form. Here's a tutorial video.
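For comparison, here is roughly what the programmatic route looks like when the table has to be scraped in code rather than pasted. This is a minimal sketch using only Python's standard library, not how datapasta works. The TableParser class, the parse_table() helper and the sample table are all made up for illustration, and the sketch assumes a simple table with no nested tables or merged cells.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in `html` as a list of lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample table, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Framework</th><th>Language</th></tr>
  <tr><td>Scrapy</td><td>Python</td></tr>
  <tr><td>Crawly</td><td>Elixir</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Real-world tables with rowspan/colspan, nested tables or broken markup need considerably more handling than this, which is exactly why a copy-and-paste tool is attractive for one-off jobs.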