Today we released the latest version of Portia, which brings the ability to crawl pages that require JavaScript. To celebrate this release, we are making Splash available as a free trial to all Portia users so you can try it out with your projects.
This past year, we have been working on a distributed version of our crawl frontier framework, Frontera. This work was partially funded by DARPA and is included in the DARPA Open Catalog.
EuroPython 2015 is happening this week, and we're holding our largest company meetup yet alongside it, with more than 30 members of our fully remote team attending.
We’re proud to announce our new open source project, ScrapyRT! ScrapyRT, short for Scrapy Real Time, allows you to extract data from a single web page via an API using your existing Scrapy spiders.
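To make "via an API" concrete, here is a minimal client-side sketch. It assumes a ScrapyRT server running locally on its default port (9080) and uses its `crawl.json` endpoint, which takes the spider name and the page URL as query parameters and returns the scraped items in a JSON response. The spider name `quotes` is hypothetical and stands in for one of your existing Scrapy spiders.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: ScrapyRT is running locally on its default port (9080).
SCRAPYRT_BASE = "http://localhost:9080/crawl.json"


def build_crawl_url(spider_name, page_url, base=SCRAPYRT_BASE):
    """Build a GET URL asking ScrapyRT to run `spider_name` against `page_url`."""
    # urlencode percent-encodes the target URL so it is safe inside a query string.
    query = urlencode({"spider_name": spider_name, "url": page_url})
    return f"{base}?{query}"


def fetch_items(spider_name, page_url):
    """Call ScrapyRT and return the list of scraped items (needs a live server)."""
    with urllib.request.urlopen(build_crawl_url(spider_name, page_url)) as resp:
        payload = json.load(resp)
    # The scraped items are returned under the "items" key of the JSON response.
    return payload.get("items", [])
```

For example, `build_crawl_url("quotes", "http://example.com/page")` produces `http://localhost:9080/crawl.json?spider_name=quotes&url=http%3A%2F%2Fexample.com%2Fpage`. Keeping URL construction separate from the network call makes it easy to inspect or test the request without a running server.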
One of the most time-consuming parts of building a spider is reviewing the scraped data and making sure it meets the requirements and expectations of your client or team.
After a year of considering it, we have decided to drop support for Python 2.5 in Scrapy. Starting with 0.15, Scrapy will require Python 2.6 or above.