Scrapy in 2026: New release brings modern async crawling standards
The world’s most-used open source data extraction framework just rang in the new year with a new release that brings a big structural shift.


If you have been waiting for Scrapy to fully embrace modern Python async/await patterns, version 2.14.0 is the release for you.

You might not see flashy new scraping tools in this release. But you will see a framework that is significantly more robust, future-proof, and aligned with modern standards. Think of this as an infrastructure upgrade, one that swaps out aging copper wiring for fiber optics.

The ‘async’ revolution

For years, Scrapy has relied heavily on Twisted’s Deferred objects. While powerful, they predate modern Python’s native async capabilities. In 2.14.0, Scrapy replaces a huge chunk of these internals with native coroutines.
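
The shift is easier to picture in plain asyncio. The snippet below is a generic illustration of callback-chained code versus native coroutines, not Scrapy's actual internals; the function names are invented for the example:

```python
import asyncio

# Callback style: roughly how Twisted Deferred chains are wired.
# Each step registers a callback that fires when the previous
# result becomes available.
def fetch_with_callbacks(url, on_done):
    def got_response(response):
        on_done("parsed:" + response)
    # Simulate an async fetch completing immediately.
    got_response("response:" + url)

# Native coroutine style: the direction Scrapy 2.14.0 moves its
# internals toward. The same flow reads top to bottom.
async def simulate_fetch(url):
    await asyncio.sleep(0)  # yield control to the event loop
    return "response:" + url

async def fetch_with_await(url):
    response = await simulate_fetch(url)
    return "parsed:" + response

results = []
fetch_with_callbacks("https://example.com", results.append)
results.append(asyncio.run(fetch_with_await("https://example.com")))
print(results)
```

Both styles produce the same result, but the coroutine version composes cleanly with any other asyncio code, which is exactly the property the new internals give Scrapy.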


This release introduces AsyncCrawlerProcess and AsyncCrawlerRunner. These are counterparts to the standard runners you know, designed to offer coroutine-based APIs.


What does this mean for you? If you are running Scrapy from a script (common in production pipelines), AsyncCrawlerProcess allows your crawler to play much nicer with other asyncio libraries.


It looks remarkably similar to the CrawlerProcess you are used to. You don't need to rewrite your setup entirely, but under the hood you are now running on a modernized, coroutine-friendly foundation.

It is now easier to integrate Scrapy into broader asynchronous applications without fighting against conflicting event loops or legacy Deferred chains.

Smarter scheduling by default

If you run large-scale crawls, you know that managing concurrency is an art. In 2.14.0, the DownloaderAwarePriorityQueue is now the default priority queue.


Previously, Scrapy’s scheduler could be a bit "blind," pushing requests without fully understanding the downloader's current load for specific domains. The new default queue is "downloader aware": it manages request priorities more intelligently based on the downloader's per-domain state.


You don’t need to change a single line of code; your crawls should simply run smoother, with fewer bottlenecks when scraping multiple domains.
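
If you ever want to compare against the old behavior, the queue class is configurable via the SCHEDULER_PRIORITY_QUEUE setting in your project's settings.py; both class paths below come from Scrapy's scrapy.pqueues module:

```python
# settings.py

# Scrapy 2.14.0 default: downloader-aware scheduling.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"

# To restore the earlier default instead, use:
# SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.ScrapyPriorityQueue"
```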

Action required: Clean up your spiders

Scrapy is standardizing how spiders are configured, deprecating the use of class attributes for specific download settings in favor of the dictionary-based custom_settings.


The old way: If your spiders define download_timeout or user_agent directly as class attributes, you will start seeing warnings.

Other improvements

Beyond the headline changes, Scrapy 2.14.0 brings several smaller updates:


  • Automatic image rotation: The ImagesPipeline now automatically transposes images based on EXIF data. If you scrape mobile-uploaded content (like real estate or classifieds), this fixes those annoying "sideways" photos automatically.

  • Python 3.9 support dropped: Requirements have been updated. Scrapy 2.14.0 now requires Python 3.10+.


  • Better custom download handlers: For advanced users building custom protocol handlers, the API has been documented and improved with a new BaseDownloadHandler class, making it easier to extend Scrapy’s core capabilities.

Stack to the future

Scrapy 2.14.0 is about longevity.


By adopting async internals and modernizing the scheduling logic, the maintainers are ensuring Scrapy remains the go-to framework for serious web data extraction in 2026 and beyond.


Check out the full release notes or the Scrapy website for more information.

By Robert Andrews · Posted on January 12, 2026 · Open Source · 6 min read

Running a spider from a script with the new AsyncCrawlerProcess:

import scrapy
from scrapy.crawler import AsyncCrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

# The new AsyncCrawlerProcess works just like the classic one
process = AsyncCrawlerProcess(
    settings={
        "FEEDS": {
            "items.json": {"format": "json"},
        },
    }
)

process.crawl(MySpider)
process.start()  # script blocks here until crawling finishes
The deprecated class-attribute style that now triggers warnings:

class MySpider(scrapy.Spider):
    name = 'myspider'
    # Deprecated!
    download_timeout = 30
    user_agent = 'MyBot/1.0'

The new way: Move these configurations into custom_settings. It keeps your spider logic clean and your configuration centralized.

class MySpider(scrapy.Spider):
    name = 'myspider'
    # The Scrapy 2.14.0 standard
    custom_settings = {
        'DOWNLOAD_TIMEOUT': 30,
        'USER_AGENT': 'MyBot/1.0'
    }