Absolutely not! Website changes (sometimes very subtly), anti-bot countermeasures, and temporary problems often reduce the quality and reliability of our data.
The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners.
We eat our own spider food since Scrapy is our go-to workhorse on a daily basis. However, there are certain situations where Scrapy can be overkill and that’s when we use Parsel.
Many governments worldwide have laws enforcing them to publish their expenses, contracts, decisions, and so forth, on the web.
Imagine your business depended heavily on a third party tool and one day that company decided to shut down its service with only 2 weeks notice. That, unfortunately, is what happened to users of Kimono Labs yesterday.
Support for JavaScript has been a much requested feature ever since Portia’s first release 2 years ago. The wait is nearly over and we are happy to inform you that we will be launching these changes in the very near future.
At Zyte we're always building and running large crawls–last year we had 11 billion requests made on Scrapy Cloud alone.
Imagine that you have a lot of samples for a certain kind of data in JSON format. Maybe you want to have a better feel of it, know which fields appear in all records, which appear only in some and what are their types. In other words, you want to know the schema for the data that you have.
Here at Zyte, we love open source. We love using and contributing to it. Over these years we have open sourced a few projects, that we keep using over and over, in the hope that it will make others lives easier.