This Month in Open Source at Zyte: June 2016

Read Time: 4 Mins
Posted on June 15, 2016
Open Source
By Cecilia Haynes

Welcome to This Month in Open Source at Zyte! In this regular column, we share all the latest updates on our open source projects including Scrapy, Splash, Portia, and Frontera.

If you’re interested in learning more or even becoming a contributor, reach out to us by email at opensource@zyte.com or on Twitter @Zyte.

Scrapy 1.1

For those who missed the big news, Scrapy 1.1 is live! It’s the first official release that comes with Python 3 support, so you can go ahead and move your stack over.

The major changes in this release since the RC1 we announced in February are improved HTTPS connections (with proxy support) and better handling of URLs with non-ASCII characters. Make sure you upgrade w3lib to 1.14.2.

We’re very grateful for the feedback we received during the release candidate phase. A huge thanks to all the reporters, reviewers and code/documentation contributors.

If you find anything that’s not working, please take a few minutes to report the issue(s) on GitHub.

Notable limitations still present in this release include:

  • Scrapy 1.1 doesn’t work on Windows under Python 3 (Twisted is not fully ported to Python 3 on Windows, but we’ll keep an eye out for updates to this situation).
  • Scrapy’s FTP support, Telnet console, and email sending do not work under Python 3.

Splash 2.1

Splash 2.1 now lets you:

  • Save large arguments to the Splash server so you don’t need to send them in every request. This is particularly useful for when you want to cache Lua scripts.
  • Take screenshots of page regions instead of the whole viewport.
  • Send mouse events without needing to write additional JavaScript.
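As a rough sketch of how these three features might be combined, the snippet below builds JSON payloads for Splash's `execute` endpoint. It assumes the `save_args` and `load_args` request arguments, `splash:mouse_click`, and the `region` option of `splash:png` behave as described in the Splash documentation; the coordinates, the helper names `first_request` and `later_request`, and the hash value are made up for illustration. Nothing here talks to a real Splash server.

```python
import json

# Lua script for Splash. The click coordinates and region are illustrative.
LUA_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))
    -- Send a mouse click without writing any page-side JavaScript.
    splash:mouse_click(splash.args.click_x, splash.args.click_y)
    assert(splash:wait(0.5))
    -- Capture only a region {left, top, right, bottom}, not the whole viewport.
    return splash:png{region={0, 0, 400, 300}}
end
"""


def first_request(url):
    """Payload that sends the full script and asks Splash to cache it."""
    return {
        "url": url,
        "click_x": 10,
        "click_y": 20,
        "lua_source": LUA_SCRIPT,
        "save_args": ["lua_source"],
    }


def later_request(url, lua_source_hash):
    """Payload that refers to the cached script by hash instead of resending it.

    The hash is assumed to come from the response to the first request.
    """
    return {
        "url": url,
        "click_x": 10,
        "click_y": 20,
        "load_args": {"lua_source": lua_source_hash},
    }


if __name__ == "__main__":
    print(json.dumps(later_request("http://example.com", "hypothetical-hash")))
```

The second payload is much smaller than the first because the Lua script is no longer part of it, which is the point of saving large arguments on the server.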

If you’re using the Scrapy-Splash plugin (formerly “scrapyjs”), we encourage you to upgrade to the latest version, v0.7. It includes many goodies that make integrating with Scrapy much easier. Check the latest README for details, especially the scrapy_splash.SplashRequest utility.

Google Summer of Code 2016

We’re thrilled to have 5 students this year:

  • Aron Bordin is working on supporting spiders in other programming languages with Scrapy.
  • Preetwinder Bath is porting Frontera to Python 3.
  • Tamer Tas is working on dockerization and orchestration of Frontera deployments.
  • Avishkar Gupta is replacing PyDispatcher to improve Scrapy’s signaling API performance.
  • Michael Manukyan is adding web scraping helpers for Splash.

We’d like to thank the Python Software Foundation for again having Zyte as a sub-org this year!

Libraries

cssselect maintenance

Scrapy relies on lxml and cssselect for all the XPath and CSS selection awesomeness that we use each and every day at Zyte. We learned that Simon Sapin, author of the cssselect package, was looking for new maintainers. So we put ourselves forward, and now cssselect is hosted under the Scrapy organization on GitHub. Don’t worry though, Simon is still involved! We’re planning on fixing a few corner cases and maybe working on CSS Selectors Level 4. We’ll definitely need assistance with this task, so please reach out if you’re interested in helping out!

Dateparser 0.3.5

We released Dateparser 0.3.5 with support for dates in Danish and Japanese. It also handles dates with accents much better, and the library now works with the latest version of python-dateutil.

Check the full release notes here.

js2xml

This side project of mine is now hosted under Zyte’s organization on GitHub. It’s a little helper library that converts JavaScript code into an XML tree. This means you can use XPath and CSS selectors to extract data (strings, objects, function arguments, etc.) from HTML-embedded JavaScript. Note that it only parses the JavaScript; it does not interpret it. You’d be amazed at how much valuable data is “hidden” in JavaScript inside web pages.

It’s on PyPI and is now Python 3-compatible.

Check this Jupyter/IPython notebook for an overview of what you can do with it, and make sure to let us know what you think.

w3lib 1.14.2

We updated our w3lib library to handle non-ASCII URLs better, as part of adding Python 3 support to Scrapy 1.1. We recommend that you upgrade to the latest 1.14.2 version.
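To illustrate what handling a non-ASCII URL involves, here is a minimal standard-library sketch. It is not w3lib's implementation (w3lib exposes its own helper for this in `w3lib.url`); the function name `escape_non_ascii_url` is hypothetical, and the sketch assumes UTF-8 for the path and query and IDNA for the host name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched; "%" is kept so existing escapes are not re-encoded.
_SAFE = "/%:@!$&'()*+,;=~-._"


def escape_non_ascii_url(url: str) -> str:
    """Return an ASCII-only version of `url` (illustrative, not w3lib's code).

    Simplification: any user:password part of the URL is dropped.
    """
    parts = urlsplit(url)
    # Host names use IDNA (punycode) rather than percent-encoding.
    netloc = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Path, query and fragment are percent-encoded as UTF-8 bytes.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(escape_non_ascii_url("http://example.com/café?q=naïve"))
```

In a real project you would normally rely on w3lib itself rather than a hand-rolled function like this one, since it covers more edge cases.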

parsel 1.0.2

If you’re using Scrapy 1.1, you’re using parsel under the hood. Parsel is Scrapy Selectors as an independent package. There’s a new release of parsel that fixes the hiding of XPath exceptions.

Portia

We’ve made some changes to Slybot, the Portia crawler, that include:

  • Re-added nested regions and text data annotations.
  • Selectors now handle comments correctly.
  • Added automatic link following seeded with start URLs and sample URLs.
  • Added the ability to adjust the Splash wait time.

For Portia itself:

  • New download API endpoint: GET portia/api/projects/PID/download[/SID]

Most of the recent developments have been taking place in the Portia beta.

The big changes include:

  • Clustering of pages during extraction to decide which sample to use for extraction.
  • Download Portia spider as Scrapy code: GET portia/api/projects/PID/download[/SID]?format=code
  • Uses a Django-style Storage object for accessing files.
  • More consistent database access for the MySQL backend.
  • Better element overlays; they can now be split across lines.
  • Re-added the toggle CSS option for samples, so you can now annotate hidden elements.
  • The UI is usable on low-resolution screens, thanks to smarter wrapping.
  • Informs the user of unpublished changes when using the Git backend.
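The two download endpoints listed above differ only in the optional spider ID and the `format=code` query parameter. A small helper for building them could look like the sketch below; the function name `portia_download_url` and the base URL in the example are assumptions, so substitute the address of your own Portia instance.

```python
from typing import Optional
from urllib.parse import quote


def portia_download_url(
    base_url: str,
    project_id: str,
    spider_id: Optional[str] = None,
    as_code: bool = False,
) -> str:
    """Build GET portia/api/projects/PID/download[/SID][?format=code]."""
    url = base_url.rstrip("/") + "/portia/api/projects/"
    url += quote(str(project_id), safe="") + "/download"
    if spider_id is not None:
        # Optional spider ID narrows the download to a single spider.
        url += "/" + quote(str(spider_id), safe="")
    if as_code:
        # Beta feature: download the spider as Scrapy code.
        url += "?format=code"
    return url


if __name__ == "__main__":
    # Hypothetical local instance and IDs.
    print(portia_download_url("http://localhost:9001", "42", "myspider", as_code=True))
```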

Try out the beta using the nui-develop branch.

Frontera

Frontera 0.5 introduces an improved crawling strategy, new logging, and better test coverage.

Mosquitera

Scrapy-mosquitera is a library that helps Scrapy spiders crawl more efficiently. In its basic form, it’s a collection of matchers and a mixin that narrow down the crawl to a specific date range. However, you can extend it to apply to any domain (URL paths, location filtering, etc.). You can find more details about how it works and how you can create your own matchers in the documentation.
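To give a feel for the idea of a date-range matcher, here is a minimal standard-library sketch. It does not use scrapy-mosquitera's API; the names `date_from_url` and `in_date_range` are hypothetical, and it assumes dates appear in URL paths as `/YYYY/MM/DD/`.

```python
import re
from datetime import date
from typing import Optional

# Assumed URL layout: .../2016/06/15/some-post
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url: str) -> Optional[date]:
    """Extract a date from the URL path, or None if there is no valid one."""
    match = DATE_IN_PATH.search(url)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits that look like a date but are not one, e.g. /2016/13/45/.
        return None


def in_date_range(url: str, after: date, before: date) -> bool:
    """Decide whether a URL is worth requesting for the given date range."""
    found = date_from_url(url)
    if found is None:
        # Design choice of this sketch: keep URLs whose date is unknown.
        return True
    return after <= found <= before


if __name__ == "__main__":
    june = (date(2016, 6, 1), date(2016, 6, 30))
    print(in_date_range("http://example.com/2016/06/15/post", *june))
```

A spider could call a check like this before yielding a request, so that pages outside the range are never downloaded.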

Wrap Up

This concludes the June edition of This Month in Open Source at Zyte. We’re always looking for new contributors, so if you’re interested, feel free to explore our GitHub.
