Scrapy in 2026: New release brings modern async crawling standards
The world’s most-used open source data extraction framework just rang in the new year with a new release that brings a big structural shift.


If you have been waiting for Scrapy to fully embrace modern Python async/await patterns, version 2.14.0 is the release for you.

You might not see flashy new scraping tools in this release. But you will see a framework that is significantly more robust, future-proof, and aligned with modern standards. Think of this as an infrastructure upgrade, one that swaps out aging copper wiring for fiber optics.

The ‘async’ revolution

For years, Scrapy has relied heavily on Twisted’s Deferred objects. While powerful, they predate modern Python’s native async capabilities. In 2.14.0, Scrapy replaces a huge chunk of these internals with native coroutines.
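
The shift is easier to picture in plain asyncio. The snippet below is a generic illustration of callback-chained code versus native coroutines, not Scrapy's actual internals; the function names are invented for the example:

```python
import asyncio

# Callback style: roughly how Twisted Deferred chains are wired.
# Each step registers a callback that fires when the previous
# result becomes available.
def fetch_with_callbacks(url, on_done):
    def got_response(response):
        on_done("parsed:" + response)
    # Simulate an async fetch completing immediately.
    got_response("response:" + url)

# Native coroutine style: the direction Scrapy 2.14.0 moves its
# internals toward. The same flow reads top to bottom.
async def simulate_fetch(url):
    await asyncio.sleep(0)  # yield control to the event loop
    return "response:" + url

async def fetch_with_await(url):
    response = await simulate_fetch(url)
    return "parsed:" + response

results = []
fetch_with_callbacks("https://example.com", results.append)
results.append(asyncio.run(fetch_with_await("https://example.com")))
print(results)
```

Both styles produce the same result, but the coroutine version composes cleanly with any other asyncio code, which is exactly the property the new internals give Scrapy.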


This release introduces AsyncCrawlerProcess and AsyncCrawlerRunner. These are counterparts to the standard runners you know, designed to offer coroutine-based APIs.


What does this mean for you? If you are running Scrapy from a script (common in production pipelines), AsyncCrawlerProcess allows your crawler to play much nicer with other asyncio libraries.


It looks remarkably similar to the CrawlerProcess you are used to. You don't need to rewrite your setup entirely, but under the hood you are now running on a modernized, coroutine-friendly foundation.

It is now easier to integrate Scrapy into broader asynchronous applications without fighting against conflicting event loops or legacy Deferred chains.

Smarter scheduling by default

If you run large-scale crawls, you know that managing concurrency is an art. In 2.14.0, the DownloaderAwarePriorityQueue is now the default priority queue.


Previously, Scrapy’s scheduler could be a bit "blind," pushing requests without fully understanding the downloader's current load for specific domains. The new default queue is "downloader aware": it manages request priorities more intelligently based on the downloader's per-domain state.


You don’t need to change a single line of code; your crawls should simply run smoother, with fewer bottlenecks when scraping multiple domains.
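
If you ever want to compare against the old behavior, the queue class is configurable via the SCHEDULER_PRIORITY_QUEUE setting in your project's settings.py; both class paths below come from Scrapy's scrapy.pqueues module:

```python
# settings.py

# Scrapy 2.14.0 default: downloader-aware scheduling.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"

# To restore the earlier default instead, use:
# SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.ScrapyPriorityQueue"
```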

Action required: Clean up your spiders

Scrapy is standardizing how spiders are configured, deprecating the use of class attributes for specific download settings in favor of the dictionary-based custom_settings.


The old way: If your spiders define download_timeout or user_agent directly as class attributes, you will start seeing warnings.

Other improvements

Beyond the headline changes, Scrapy 2.14.0 brings several smaller updates:


  • Automatic image rotation: The ImagesPipeline now automatically transposes images based on EXIF data. If you scrape mobile-uploaded content (like real estate or classifieds), this fixes those annoying "sideways" photos automatically.

  • Python 3.9 support dropped: Requirements have been updated. Scrapy 2.14.0 now requires Python 3.10+.


  • Better custom download handlers: For advanced users building custom protocol handlers, the API has been documented and improved with a new BaseDownloadHandler class, making it easier to extend Scrapy’s core capabilities.

Stack to the future

Scrapy 2.14.0 is about longevity.


By adopting async internals and modernizing the scheduling logic, the maintainers are ensuring Scrapy remains the go-to framework for serious web data extraction in 2026 and beyond.


Check out the full release notes or the Scrapy website for more information.

By Robert Andrews · Posted on January 12, 2026 · Open Source · 6 min read

Running a spider from a script with the new AsyncCrawlerProcess:

import scrapy
from scrapy.crawler import AsyncCrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

# The new AsyncCrawlerProcess works just like the classic one
process = AsyncCrawlerProcess(
    settings={
        "FEEDS": {
            "items.json": {"format": "json"},
        },
    }
)

process.crawl(MySpider)
process.start()  # script blocks here until crawling finishes
The deprecated class-attribute style that now triggers warnings:

class MySpider(scrapy.Spider):
    name = 'myspider'
    # Deprecated!
    download_timeout = 30
    user_agent = 'MyBot/1.0'

The new way: Move these configurations into custom_settings. It keeps your spider logic clean and your configuration centralized.

class MySpider(scrapy.Spider):
    name = 'myspider'
    # The Scrapy 2.14.0 standard
    custom_settings = {
        'DOWNLOAD_TIMEOUT': 30,
        'USER_AGENT': 'MyBot/1.0'
    }