
This Month in Open Source at Zyte (June 2016): Community Highlights

This Month in Open Source at Zyte: June 2016 - Stay updated with the latest open-source initiatives at Zyte. Join our growing community of contributors.


Cecilia Haynes

4 min read · June 15, 2016


Welcome to This Month in Open Source at Zyte! In this regular column, we share all the latest updates on our open source projects including Scrapy, Splash, Portia, and Frontera.

If you’re interested in learning more or even becoming a contributor, reach out to us by email at opensource@zyte.com or on Twitter @Zyte.


Scrapy 1.1

For those who missed the big news, Scrapy 1.1 is live! It’s the first official release that comes with Python 3 support, so you can go ahead and move your stack over.

The major changes in this release since the RC1 we announced in February include improved HTTPS connections (with proxy support) and better handling of URLs with non-ASCII characters. Make sure you also upgrade w3lib to 1.14.2.

We’re very grateful for the feedback we received during the release candidate phase. A huge thanks to all the reporters, reviewers and code/documentation contributors.

If you find anything that’s not working, please take a few minutes to report the issue(s) on GitHub.

Notable limitations still present in this release include:

  • Scrapy 1.1 doesn’t work on Windows under Python 3 (Twisted is not fully ported to Python 3 on Windows, but we’ll keep an eye out for updates to this situation).
  • Scrapy’s FTP support, Telnet console, and email sending do not work in Python 3.

Splash 2.1

Splash 2.1 now lets you:

  • Save large arguments to the Splash server so you don’t need to send them in every request. This is particularly useful for caching Lua scripts.
  • Take screenshots of page regions instead of the whole viewport.
  • Send mouse events without needing to write additional JavaScript.
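Here is a minimal sketch of how the three features might be combined when talking to Splash’s `/execute` endpoint over HTTP. It assumes the `save_args`/`load_args` request parameters, the `X-Splash-Saved-Arguments` response header (`name=hash` pairs separated by `;`), and the Lua calls `splash:mouse_click` and `splash:png{region=...}` behave as described in the Splash documentation; the target URL, coordinates, and helper names are hypothetical. It only builds request bodies, so it can be tried without a running Splash server.

```python
import json
from typing import Dict

# Lua script for Splash's /execute endpoint (coordinates are illustrative).
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(1.0))
  -- Native mouse event: no extra JavaScript required.
  splash:mouse_click(100, 150)
  assert(splash:wait(0.5))
  -- Screenshot of a region {left, top, right, bottom} instead of the viewport.
  return splash:png{region={0, 0, 400, 300}}
end
"""


def first_request_body(url: str) -> str:
    """JSON body that sends the script once and asks Splash to store it."""
    return json.dumps({"url": url, "lua_source": LUA_SOURCE, "save_args": ["lua_source"]})


def parse_saved_args(header_value: str) -> Dict[str, str]:
    """Parse a header value such as 'lua_source=abc123;other=def456' into a dict."""
    saved: Dict[str, str] = {}
    for pair in header_value.split(";"):
        if pair.strip():
            name, _, key = pair.strip().partition("=")
            saved[name] = key
    return saved


def later_request_body(url: str, saved: Dict[str, str]) -> str:
    """JSON body that refers to the stored script by hash instead of resending it."""
    return json.dumps({"url": url, "load_args": saved})
```

If Splash has already evicted a stored argument, the Splash docs describe an HTTP 498 response; a client would then resend the full `first_request_body` payload.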

If you’re using the Scrapy-Splash plugin (formerly “scrapyjs”), we encourage you to upgrade to the latest version, v0.7. It includes many goodies that make integrating with Scrapy much easier. Check the latest README for details, especially the scrapy_splash.SplashRequest utility.

Google Summer of Code 2016

We’re thrilled to have 5 students this year:

  • Aron Bordin is working on supporting spiders in other programming languages with Scrapy.
  • Preetwinder Bath is porting Frontera to Python 3.
  • Tamer Tas is working on dockerization and orchestration of Frontera deployments.
  • Avishkar Gupta is replacing PyDispatcher to improve Scrapy’s signaling API performance.
  • Michael Manukyan is adding web scraping helpers for Splash.

We’d like to thank the Python Software Foundation for once again accepting Zyte as a sub-organization this year!

Libraries

cssselect maintenance

Scrapy relies on lxml and cssselect for all the XPath and CSS selection awesomeness that we use every day at Zyte. We learned that Simon Sapin, the author of the cssselect package, was looking for new maintainers, so we put ourselves forward, and cssselect is now hosted under the Scrapy organization on GitHub. Don’t worry though, Simon is still involved! We’re planning on fixing a few corner cases and maybe working on CSS Selectors Level 4. We’ll definitely need help with this, so please reach out if you’re interested!
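cssselect’s job is to turn CSS selectors into XPath 1.0 expressions that lxml can evaluate. The toy translator below is hypothetical and is not cssselect’s code: it only handles `tag`, `.class`, and `#id`, but it emits the same shape of expressions cssselect produces for those cases, including the whitespace-padded `@class` test that keeps `.content` from matching `class="main-content"`.

```python
import re


def css_to_xpath(selector: str) -> str:
    """Translate 'tag', 'tag.class', or 'tag#id' (tag optional) into XPath.

    A toy subset only; real cssselect also handles combinators,
    attribute selectors, pseudo-classes, and more.
    """
    selector = selector.strip()
    match = re.fullmatch(r"([A-Za-z][\w-]*|\*)?(?:([.#])([\w-]+))?", selector)
    if not selector or not match:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, kind, name = match.groups()
    xpath = f"descendant-or-self::{tag or '*'}"
    if kind == ".":
        # Pad with spaces so 'content' matches whole class tokens only.
        xpath += f"[@class and contains(concat(' ', normalize-space(@class), ' '), ' {name} ')]"
    elif kind == "#":
        xpath += f"[@id = '{name}']"
    return xpath
```

With the real library, the equivalent call is `cssselect.GenericTranslator().css_to_xpath("div.content")`.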

Dateparser 0.3.5

We released Dateparser 0.3.5 with support for dates in Danish and Japanese. It also handles dates with accents much better and now works with the latest version of python-dateutil.

Check the full release notes here.

js2xml

This side project of mine is now hosted under Zyte’s organization on GitHub. It’s a little helper library that converts JavaScript code into an XML tree. This means you can use XPath and CSS selectors to extract data (strings, objects, function arguments, etc.) from JavaScript embedded in HTML (it parses the JavaScript but does not execute it). You’d be amazed at how much valuable data is “hidden” in JavaScript inside web pages.

It’s on PyPI and is now Python 3-compatible.

Check out this Jupyter/IPython notebook for an overview of what you can do with it, and make sure to let us know what you think.

w3lib 1.14.2

We updated our w3lib library to handle non-ASCII URLs better, as part of adding Python 3 support to Scrapy 1.1. We recommend that you upgrade to the latest 1.14.2 version.
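To illustrate what “handling non-ASCII URLs” involves, here is a standard-library sketch of the general approach: IDNA-encode the hostname and percent-encode the path, query, and fragment as UTF-8. It is not w3lib’s implementation (w3lib’s own function for this is `w3lib.url.safe_url_string`), and the helper name `to_ascii_url` is hypothetical. For simplicity it drops any `user:password@` part of the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped; '%' is included so existing escapes are not double-encoded.
_SAFE_PATH = "/%:@!$&'()*+,;="
_SAFE_QUERY = _SAFE_PATH + "?"


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of a URL that may contain non-ASCII characters."""
    parts = urlsplit(url)
    # IDNA-encode the hostname, e.g. "café.example" -> "xn--caf-dma.example".
    # (.hostname is lowercased by urlsplit; userinfo is dropped in this sketch.)
    netloc = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Percent-encode the remaining components as UTF-8.
    path = quote(parts.path, safe=_SAFE_PATH)
    query = quote(parts.query, safe=_SAFE_QUERY)
    fragment = quote(parts.fragment, safe=_SAFE_QUERY)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

For example, `to_ascii_url("http://café.example/menü?q=crème")` gives `http://xn--caf-dma.example/men%C3%BC?q=cr%C3%A8me`.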

parsel 1.0.2

If you’re using Scrapy 1.1, you’re using parsel under the hood: parsel is Scrapy’s Selectors packaged as an independent library. The new release fixes a bug where XPath exceptions were being hidden.

Portia

We’ve made some changes to Slybot, the Portia crawler, that include:

  • Re-added nested regions and text data annotations.
  • Selectors now handle comments correctly.
  • Added automatic link following, seeded with start URLs and sample URLs.
  • Added the option to adjust the Splash wait time.

For Portia itself:

  • New download API endpoint: GET portia/api/projects/PID/download[/SID]

Most of the recent developments have been taking place in the Portia beta.

The big changes include:

  • Clustering of pages during extraction to decide which sample to use for extraction.
  • Download Portia spider as Scrapy code: GET portia/api/projects/PID/download[/SID]?format=code
  • Uses a Django-style Storage object for accessing files.
  • More consistent database access for the MySQL backend.
  • Better element overlays; they can now be split across lines.
  • Re-added the toggle CSS option for samples, so you can now annotate hidden elements.
  • UI usable on low resolution screens, thanks to smarter wrapping.
  • Inform user of unpublished changes when using Git backend.
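The two download endpoints above differ only in the optional spider ID and the beta-only `format=code` query parameter. A minimal sketch of building those URLs, assuming the `portia/api/...` path sits directly under the root of a Portia deployment; the helper name and the example base URL `http://localhost:9001` are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def portia_download_url(base_url: str, project_id: str,
                        spider_id: Optional[str] = None,
                        as_code: bool = False) -> str:
    """Build GET portia/api/projects/PID/download[/SID][?format=code]."""
    url = f"{base_url.rstrip('/')}/portia/api/projects/{quote(project_id, safe='')}/download"
    if spider_id:
        # Optional: download a single spider instead of the whole project.
        url += f"/{quote(spider_id, safe='')}"
    if as_code:
        # Portia beta: export the spider as Scrapy code.
        url += "?" + urlencode({"format": "code"})
    return url
```

Fetching the result is an ordinary GET, for example `urllib.request.urlopen(portia_download_url("http://localhost:9001", "my_project")).read()`.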

Try out the beta using the nui-develop branch.

Frontera

Frontera 0.5 introduces an improved crawling strategy, new logging, and better test coverage.

Mosquitera

Scrapy-mosquitera is a library that helps Scrapy spiders crawl more efficiently. In its basic form, it’s a collection of matchers and a mixin that narrow the crawl down to a specific date range. However, you can extend it to other kinds of filtering (URL paths, location, etc.). You can find more details about how it works and how to create your own matchers in the documentation.
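The matcher-plus-mixin idea can be sketched in plain Python. Everything below (`DateRangeMatcher`, `DateRangeCrawlMixin`, and their methods) is hypothetical and is not scrapy-mosquitera’s API; it assumes listing pages are sorted newest first, which is what lets the crawl stop early instead of paginating through old content.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class DateRangeMatcher:
    """Hypothetical matcher: accepts dates in the inclusive range [start, end]."""

    start: date
    end: date

    def matches(self, value: date) -> bool:
        return self.start <= value <= self.end


class DateRangeCrawlMixin:
    """Hypothetical mixin for spiders whose listing pages are sorted newest first."""

    matcher: DateRangeMatcher  # set by the spider class that uses the mixin

    def keep(self, item_dates: Iterable[date]) -> List[date]:
        """Drop items outside the date range."""
        return [d for d in item_dates if self.matcher.matches(d)]

    def should_follow_next_page(self, item_dates: Iterable[date]) -> bool:
        """Stop paginating once a page reaches dates older than the range.

        With newest-first ordering, every later page can only be older,
        so requesting it would be wasted work.
        """
        dates = list(item_dates)
        return bool(dates) and min(dates) >= self.matcher.start
```

In a Scrapy spider, `should_follow_next_page` would decide whether to yield the request for the next listing page, and `keep` would filter the items extracted from the current one.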

Wrap Up

This concludes the June edition of This Month in Open Source at Zyte. We’re always looking for new contributors, so if you’re interested, feel free to explore our GitHub.
