The Evolution of the Data Vendor Ecosystem
Responding to those demands, the data vendor landscape has evolved dramatically over the past decade, with several distinct phases of development:
The Institutional Players
The traditional data provider ecosystem was dominated by established, large players in specific verticals.
Institutional companies in the markets information space have long provided market data, pricing information, and financial analytics. Others aggregated consumer credit information, while market research agencies collected and sold consumer behavior data, and another class of provider offered information about companies.
These vendors typically operated with proprietary rather than public data, rigid delivery models, and price points so high that access was effectively limited to large enterprises.
The API Revolution
Around 2015, a new generation of data vendors emerged, leveraging APIs to make data more accessible and actionable.
Where data owners themselves did not provide API access, a new class of vendors built their entire business models around providing third-party data through developer-friendly APIs in areas like profile enrichment. Meanwhile, "alternative data providers" began aggregating non-traditional data sources for investment insights, available through structured feeds.
This phase marked a significant shift toward more flexible, programmatic access to data. But without the ability to choose target web data, it still left the majority of sources untapped.
The Developer Decision
To choose from the full menu of web data, it has always been necessary to scrape the web directly. Indeed, many companies have long employed in-house teams to build scraping pipelines using available frameworks.
Others, unwilling to take on the task in-house, have frequently enlisted all-purpose developer shops to custom-build scrapers for one-off data collection projects.
Outsourced developers can quickly and affordably deliver solutions tailored to immediate business needs. But many customers have discovered inconsistent data quality, compliance risks, and short-lived support, adding up to a fragile supply.
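Whether built in-house or outsourced, these pipelines pair fetching pages with parsing out structured records. A minimal sketch of the parsing step, using only the Python standard library (the page markup and class names below are invented for illustration, not taken from any real site):

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Extract (name, price) records from a hypothetical product listing."""

    def __init__(self):
        super().__init__()
        self.rows = []        # collected records
        self._field = None    # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "product-name" in classes:
            self._field = "name"
        elif "product-price" in classes:
            self._field = "price"

    def handle_data(self, data):
        if self._field == "name":
            self.rows.append({"name": data.strip()})
            self._field = None
        elif self._field == "price":
            if self.rows:
                self.rows[-1]["price"] = data.strip()
            self._field = None

# Illustrative markup; a real pipeline would fetch this over HTTP.
page = """
<div class="product"><span class="product-name">Widget</span>
<span class="product-price">$9.99</span></div>
<div class="product"><span class="product-name">Gadget</span>
<span class="product-price">$19.50</span></div>
"""

parser = PriceParser()
parser.feed(page)
print(parser.rows)
# → [{'name': 'Widget', 'price': '$9.99'}, {'name': 'Gadget', 'price': '$19.50'}]
```

In production, teams typically swap this hand-rolled parser for a framework, and the real work shifts to the parts this sketch omits: scheduling, retries, and keeping selectors current as target sites change, which is exactly where outsourced, one-off builds tend to break down.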
Rise of the Marketplace
To solve that problem, the industry spawned marketplaces for pre-scraped data.
Marketplaces aim to meet data buyers' need for immediacy and quality: the equivalent of a bazaar or emporium, offering a range of ready-made datasets.
While the ability to browse and buy those datasets can appeal to buyers with urgent needs for common datasets, companies with exacting target data requirements frequently find marketplace offerings too narrow, with limited choice, little customization, and variable underlying quality.
Dawn of the Managed Extraction Vendor
That is why we have recently seen the emergence of a novel category of data vendor, the managed extraction service.
Managed extraction vendors aim to provide a combined, integrated offering:
Targeting: Scraping from a buyer's bespoke list of target sites, including niche or long-tail sites, complex on-page content types and crawls for page discovery.
Done-for-you service: Setup and creation of scrapers for data acquisition.
Ongoing supply: Feeds of data from target sites, updated on customers' preferred cadence.
Human help: Expertise from scraping engineers and legal specialists to create and manage data pipelines, advise on compliance issues and monitor data quality.
Managed data extraction services tend to be found within multi-purpose scraping vendors - that is, those that offer a combination of software and services.
In fact, their emergence owes largely to the growth of all these distinct categories to the point where an integrated offering, drawing on each under one roof, has become possible.