How web data turns e-commerce listings into retail intelligence

How do you build an always-on retail intelligence engine when the shelf you are monitoring never stops moving?

For any retailer, mastery of product trends is growing ever more complicated, with 30.7 million e-commerce sites now operating worldwide and 23% of all retail expected to be online by 2027.

Enter digital shelf analytics (DSA) vendors. These platforms help brands and their retailers make sense of the digital shelf — tracking price, availability, search rankings, and promotions across tens of thousands of sites.

Such intelligence is in high demand – the DSA market was valued at $1.68 billion in 2025 and is projected to reach $4.48 billion by 2033.

But building the intelligence engine that powers these platforms comes with its own challenges.

The data stream that powers DSA solutions

The foundation of any DSA solution is a continuous, reliable stream of structured data from the open web.

In theory, this data is freely available. In practice, acquiring it at the speed, scale, and quality to power sound retail decisions is a constant operational battle.

Every DSA vendor approaches that challenge differently. In the end, those who solve it have a data advantage. To get there, they need an infrastructure advantage.

Four ways web data enables digital shelf analytics

DSA vendors are in a race to provide the most comprehensive and accurate view of the digital shelf.

1. Merchant expansion: Accelerate channel onboarding

Comprehensive coverage across the global e-commerce footprint is the foundational value proposition of most DSAs. Their customers want to know whether the platform covers the marketplaces, categories and locations they care about. Broad, fast coverage helps a sales team close a deal.

Of course, each retailer site is unique, with different infrastructure mechanisms, layouts, and geographical quirks.

A fashion retail intelligence platform needed to onboard 400+ new retailer sites. Its internal team estimated it would take 14 to 18 months. With Zyte Data, it took 90 days.

The DSA provided a list of target URLs and the data schema it needed; the Zyte Data team handled everything else – using AI-powered spider creation to generate and validate crawlers, followed by quality assurance and handoff of a live, monitored feed backed by a service level agreement.

Furthermore, the geolocation feature of Zyte API, which ensures requests are routed through in-country IP addresses so sites return the correct regional page, makes extracting data from regionally specific retailers more reliable.

This partnership accelerated time-to-data, enabling the DSA to receive structured product data immediately without worrying about creating or maintaining any code.
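The in-country routing described above can be sketched as a single request. This is a minimal illustration, not a definitive integration: the endpoint, the Basic-auth scheme, and the `httpResponseBody` and `geolocation` field names reflect our understanding of Zyte API's public documentation and should be checked against the current reference. The helper names `build_geo_request` and `fetch` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm against the current Zyte API reference.
ZYTE_API_URL = "https://api.zyte.com/v1/extract"


def build_geo_request(url: str, country_code: str) -> dict:
    """Build a request body asking for the page as served to one country."""
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("expected a two-letter country code, e.g. 'DE'")
    return {
        "url": url,
        "httpResponseBody": True,
        # Two-letter country code for the location requests should come from.
        "geolocation": country_code.upper(),
    }


def fetch(payload: dict, api_key: str) -> dict:
    """POST the payload with HTTP Basic auth (API key as the username)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        ZYTE_API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the payload needs no network access or credentials.
payload = build_geo_request("https://examplebrand.com/item-page/", "de")
```

Calling `fetch(payload, api_key)` with a valid key would send the request; the payload builder is kept separate so the regional settings can be reviewed and tested on their own.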

2. High-frequency data ingestion: Collecting real-time data at scale

For digital shelf analytics vendors, the modern e-commerce landscape demands up-to-the-minute visibility into products, price points, and changes. Prices, availability, and rankings all change continuously.

Real-time data means running an enormous volume of crawl requests. At that scale, IP blocking, rate limiting, browser fingerprinting, and CAPTCHA challenges from retailer sites can all stand in the way of high-frequency data-gathering.

For example, one digital shelf analytics provider needed to run two billion HTTP requests per day. But its in-house team was spending more time on ban management during data collection than on building game-changing analytics features on top of that data.
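To put that volume in perspective, a quick back-of-the-envelope calculation helps. The two billion figure comes from the example above; the 1% ban rate is a purely hypothetical number chosen for illustration, not a measured one.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 2_000_000_000  # figure quoted in the example above
requests_per_second = requests_per_day / SECONDS_PER_DAY

# Hypothetical ban rate, for illustration only.
assumed_ban_rate = 0.01
retries_per_day = int(requests_per_day * assumed_ban_rate)

print(f"{requests_per_second:,.0f} requests per second on average")
print(f"{retries_per_day:,} requests per day to retry at a 1% ban rate")
```

That is roughly 23,000 requests every second, around the clock. Even a small failure rate at that pace turns into millions of retries a day.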

At this scale, the main bottleneck is access. Zyte API’s automatic access management removes that bottleneck entirely. It handles IP rotation, adaptive throttling, browser fingerprinting, and CAPTCHA management without infrastructure burden on DSA vendors.
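For a sense of what that burden looks like, here is a minimal sketch of the kind of retry logic an in-house team might otherwise write and tune per site. It is a generic exponential backoff, not a description of how Zyte API works internally, and the function names and default values are assumptions made for illustration.

```python
import random

# Status codes commonly used to signal rate limiting or temporary refusal.
RETRYABLE_STATUS_CODES = {429, 503}


def should_retry(status_code: int) -> bool:
    """Decide whether a response looks like throttling worth retrying."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delays(attempts, base=1.0, factor=2.0, cap=60.0, jitter=0.0, rng=None):
    """Return the wait (in seconds) before each retry attempt.

    Delays grow geometrically and are capped; optional jitter adds up to
    `jitter` times the delay so that many workers do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        if jitter:
            delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Multiply this by proxy pools, per-site rate limits, and fingerprint handling, and it becomes clear why ban management can crowd out feature work.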

With access management handled, the DSA’s in-house engineering team could focus on what actually differentiates the platform: treating foundational e-commerce data as table stakes, building analytics features on top of it, and keeping its biggest clients' dashboards running without interruption.

3. Data collection continuity: Keeping the feeds flowing

Retailer sites are constantly introducing new pages, redesigning their layouts, and A/B testing shopping flows, while modern sites also increasingly rely on JavaScript, requiring browser rendering to fully present relevant information.

All these are potential breaking points for data scrapers built on hard-coded CSS or XPath selectors. The result can be a maintenance treadmill, with engineering teams reactively fixing broken scrapers in a scramble to ensure customers retain insights visibility at key moments, like the launch of a new-season collection.
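A toy example shows how little it takes. The sketch below uses only Python's standard library and a hypothetical page fragment; the class names `price` and `pdp-amount` are invented for illustration. The extractor works until a redesign renames one CSS class, then silently returns nothing.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class.

    A deliberately simple stand-in for a hard-coded selector; it does not
    handle unclosed void tags such as <br> inside the matched element.
    """

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside the matched element
        self.text = None    # stays None if the class is never found

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif self.text is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.text = ""

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text += data


def extract_price(html: str, css_class: str = "price"):
    """Return the price text, or None when the selector no longer matches."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.text.strip() if parser.text is not None else None


before = '<div class="product"><span class="price">£49.99</span></div>'
after = '<div class="product"><span class="pdp-amount">£49.99</span></div>'

print(extract_price(before))  # £49.99
print(extract_price(after))   # None
```

The price is still on the page after the redesign; only the markup around it changed. Nothing raises an error, which is why these failures often surface first as gaps in a customer's dashboard.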

To ensure the product data keeps flowing, DSAs are now embracing tools and practices designed by web data specialists to guarantee collection continuity.

For most product data extraction needs, Zyte API’s AI-powered automatic extraction features allow engineers to find and return on-page product data and productLists. These features are resilient to layout changes, meaning an engineer need not rewrite code when a seller redesigns its site.

For more advanced extraction tasks with bespoke data schemas and custom post-processing, Zyte Data, Zyte’s done-for-you expert scraping service, can be the technical partner DSA vendors need. Proactive monitoring detects structural changes immediately, and a dedicated engineering team adapts the scraper and restores the feed, often before the DSA's customers register the disruption. The operational burden is completely outsourced.

A feed that survives a retailer redesign without interruption removes the most common reason analytics customers start evaluating alternatives. Maintenance resilience turns a vendor relationship into a long-term partnership.

4. Product data normalization: Prompting analytics-ready feeds

Raw, unstructured web data is rarely ready for analytics prime-time.

DSAs aim to reliably group stock-keeping units (SKUs), including product variants, within and across retailer real estate.

But product titles may be inconsistent across retailers; attributes like size and color may be buried in unstructured text; variant grouping is unreliable. Product matching and attribute normalization across sites and channels is a hard data engineering problem, and manual quality assurance is difficult to sustain at scale.

Zyte API’s AI-powered automatic product extraction feature parses and returns price, currency, Global Trade Item Number (GTIN) and more in a consistent, dependable schema.

{
  "url": "https://examplebrand.com/item-page/",
  "statusCode": 200,
  "product": {
    "name": "Sony WH-1000XM5 Wireless Headphones",
    "price": "279.99",
    "regularPrice": "349.99",
    "currency": "USD",
    "availability": "InStock",
    "gtin": [{ "type": "UPC", "value": "027242920835" }],
    "brand": { "name": "Sony" },
    "sku": "WH1000XM5/B",
    "aggregateRating": { "ratingValue": 4.7, "reviewCount": 2341 },
    "images": ["https://..."],
    "description": "Industry-leading noise canceling...",
    "breadcrumbs": ["Electronics", "Headphones", "Over-Ear"]
  },
  ...
}
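Because every retailer's listing arrives in the same shape, downstream logic can be written once. As a minimal sketch, the snippet below derives a promotion depth from a trimmed copy of the record above. The helper name `discount_percent` is hypothetical, and the code assumes prices arrive as decimal strings, as they do in the sample.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample response shown above.
response_text = """
{
  "url": "https://examplebrand.com/item-page/",
  "statusCode": 200,
  "product": {
    "name": "Sony WH-1000XM5 Wireless Headphones",
    "price": "279.99",
    "regularPrice": "349.99",
    "currency": "USD",
    "availability": "InStock"
  }
}
"""


def discount_percent(product: dict):
    """Return the markdown from regularPrice to price, or None if not on sale."""
    price = product.get("price")
    regular = product.get("regularPrice")
    if price is None or regular is None:
        return None
    # Decimal avoids binary floating-point rounding on currency values.
    price, regular = Decimal(price), Decimal(regular)
    if regular <= 0 or price >= regular:
        return None
    return ((regular - price) / regular * 100).quantize(Decimal("0.1"))


product = json.loads(response_text)["product"]
print(discount_percent(product))  # 20.0
```

The same function runs unchanged against any retailer's record, which is the practical payoff of a consistent schema.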


Using the customAttributes feature, data teams instruct Zyte API’s AI engine to identify and return non-standard attributes, using only natural language input. This enables the easy collection and normalization of attributes like colors, fabrics, and styles directly from product listings within a specific target category.

{
  "product": {
    "name": "Men's Stretch Chino Trousers",
    "price": "49.99",
    "currency": "GBP",
    "availability": "InStock"
  },
  "customAttributes": {
    "values": {
      "fabric_composition": ["98% cotton", "2% elastane"],
      "fit_type": "tapered",
      "size_label_normalised": "EU 32"
    }
  }
}
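A request that could produce a response like the one above might look as follows. Treat this as a sketch under assumptions: the shape of the `customAttributes` request (attribute names mapped to a type and a natural-language description) reflects our reading of Zyte API's documentation and should be verified there, the descriptions are invented for illustration, and both helper functions are hypothetical.

```python
def build_custom_attributes_request(url: str) -> dict:
    """Ask for standard product data plus three category-specific attributes."""
    return {
        "url": url,
        "product": True,
        # Assumed schema shape: each attribute gets a type and a plain-English
        # description of what to look for on the page.
        "customAttributes": {
            "fabric_composition": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Each fabric and its percentage, e.g. '98% cotton'",
            },
            "fit_type": {
                "type": "string",
                "description": "The cut of the garment, e.g. slim, tapered, regular",
            },
            "size_label_normalised": {
                "type": "string",
                "description": "The selected size expressed as an EU size label",
            },
        },
    }


def flatten_record(response: dict) -> dict:
    """Merge standard and custom attributes into one flat record."""
    record = dict(response.get("product", {}))
    record.update(response.get("customAttributes", {}).get("values", {}))
    return record


# The sample response shown above, used here as input.
sample_response = {
    "product": {
        "name": "Men's Stretch Chino Trousers",
        "price": "49.99",
        "currency": "GBP",
        "availability": "InStock",
    },
    "customAttributes": {
        "values": {
            "fabric_composition": ["98% cotton", "2% elastane"],
            "fit_type": "tapered",
            "size_label_normalised": "EU 32",
        }
    },
}

request_body = build_custom_attributes_request("https://examplebrand.com/item-page/")
record = flatten_record(sample_response)
```

Flattening the two parts of the response gives analysts a single row per listing, with the bespoke attributes sitting alongside price and availability.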


The data is always delivered in a structured format specified by the user, allowing for custom schemas and seamless integration into the DSA vendor's analytics platforms.

In addition, the Zyte Data team works with the DSA to build higher-level data engineering logic that sits above the API's extraction layer, such as cross-retailer SKU matching. The same product could appear under different titles, GTINs, and attribute structures on different sites. Zyte Data can reconcile them into a single, consistent product record, tailored to the DSA's specific data model.
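One small piece of that reconciliation can be sketched in a few lines. The example below groups listings whose GTINs differ only in format, relying on the fact that a 12-digit UPC and a 13-digit EAN can both be left-padded with zeros to a 14-digit GTIN. It is an illustration of the idea rather than a description of Zyte Data's matching logic; the function names and the second retailer's listing are hypothetical.

```python
from collections import defaultdict


def to_gtin14(value: str):
    """Normalize a UPC, EAN, or GTIN string to 14 digits, or None if invalid."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if not digits or len(digits) > 14:
        return None
    return digits.zfill(14)


def group_by_gtin(records: list) -> dict:
    """Group product records by their first usable, normalized GTIN."""
    groups = defaultdict(list)
    for record in records:
        for gtin in record.get("gtin", []):
            key = to_gtin14(gtin.get("value", ""))
            if key:
                groups[key].append(record)
                break
    return dict(groups)


listings = [
    {
        "name": "Sony WH-1000XM5 Wireless Headphones",
        "gtin": [{"type": "UPC", "value": "027242920835"}],
    },
    {
        # Hypothetical second retailer: different title, same code as an EAN.
        "name": "SONY WH1000XM5 Noise Cancelling Over-Ear, Black",
        "gtin": [{"type": "EAN13", "value": "0027242920835"}],
    },
]

groups = group_by_gtin(listings)
print(len(groups))  # 1
```

This only handles formatting differences. Listings with missing or genuinely different GTINs need title and attribute matching on top, which is where the bespoke engineering work comes in.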

Vendors running on clean, analytics-ready data feeds are able to fast-forward the messy work of data cleaning and normalization, while rivals may still be elbows-deep in validation.

Staying ahead with always-on clean web data

Behind every reliable dashboard and every confident client recommendation is a stream of quality web data that must not be turned off.

The digital shelf is always moving. Keeping up with it at the speed, scale, and accuracy users demand is a continuous operation.

The platforms that get it right help their customers better understand their market opportunity, so that the platforms can grow their own.

Get clean product data

Build reliable product data feeds with easily-accessible web data
