Price Intelligence With Python: Scrapy, SQL, And Pandas

Price intelligence with Python: Scrapy, SQL, and Pandas

In this article, I will guide you through a web scraping and data visualization project. We will extract e-commerce data from real e-commerce websites then try to get some insights out of it. The goal of this article is to show you how to get product pricing data from the web and what are some ways to analyze pricing data. We will also look at how price intelligence makes a real difference for e-commerce companies for smarter price intelligence decisions.

This is the simple process we are going to follow for this article:

Identify data sources and fields
Extract and store data
Analyze data

Identify data sources and fields

Websites

In a real-life project, you’d probably know which websites you want to get data from. For this article, I’m choosing some popular European e-commerce stores.

Fields to scrape

When scraping product information we have an endless amount of data types we could get from an e-commerce site: product name, product-specific attributes, price, stock, reviews, category, etc. For now, we will focus on four fields that have the potential to give us the most interesting insights:

product name
price
stock
category

See how we at Zyte structure product data.

Ethical web scraping

Before we start writing code to extract data from any website, it’s important to make sure we are scraping ethically. First, we should check the robots.txt file and see if it allows us to visit the pages we want to get data from.

Example robots.txt:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

User-agent: *

Disallow: /

User-Agent: *

Disallow: /*.json

Disallow: /api

Disallow: /post

Disallow: /submit

Allow: /

User-agent: * Disallow: / User-Agent: * Disallow: /*.json Disallow: /api Disallow: /post Disallow: /submit Allow: /

1User-agent: \*
2Disallow: /
3
4User-Agent: \*
5Disallow: /\*.json
6Disallow: /api
7Disallow: /post
8Disallow: /submit
9Allow: /

Copy

Some things you could do to be compliant:

Respect the rules
Adjust crawling speed if needed
Identify yourself with a UserAgent
Do not harm the website

Extract and store data

This is the part where we extract the data from the website. We’re going to use several modules of the Scrapy framework like Item, ItemLoader, and pipeline. We want to make sure that the output is clean so we can insert it into a database for later analysis.

Installing Scrapy

We are using Scrapy, the web scraping framework for this project that will help you better understand price intelligence. It is recommended to install Scrapy in a virtual environment so it doesn’t conflict with other system packages.

Create a new folder and install virtualenv:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

mkdir ecommerce

cd ecommerce

pip install virtualenv

virtualenv env

source env/bin/activate

mkdir ecommerce cd ecommerce pip install virtualenv virtualenv env source env/bin/activate

1mkdir ecommerce
2cd ecommerce
3pip install virtualenv
4virtualenv env
5source env/bin/activate

Copy

Install Scrapy:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

pip install scrapy

1pip install scrapy

Copy

If you’re having trouble with installing scrapy check out the installation guide.

Create a new Scrapy project

Now that we have Scrapy installed in our environment we can create a new Scrapy project:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

scrapy startproject ecommerce

1scrapy startproject ecommerce

Copy

This will generate the file structure:

Items

Before we can write the spiders for each website, we have to create an item in the items file which contains the previously defined data fields. Remember, since this project is related to price intelligence then it is important how you define and represent your data fields. One item will represent one product and hold all its data.

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class ProductItem(Item):

product = Field()

price = Field()

category = Field()

stock = Field()

class ProductItem(Item): product = Field() price = Field() category = Field() stock = Field()

1class ProductItem(Item):
2   product = Field()
3   price = Field()
4   category = Field()
5   stock = Field()

Copy

Spider

Each of our spiders will look the same except the selectors of course.

To create a spider, first, we should look at the website and its source code, for example:

This is the HTML of one e-commerce website I’m scraping and this part contains the key ingredients that will follow for price intelligence analysis, which are the name and price information. One thing to look out for is the itemprop attribute. Many e-commerce sites use this schema. In the source code above we have itemprop=”name” which contains the product name and itemprop=”price” which contains the product price.

Selecting data fields based on itemprop attributes gives us a better chance that the scraper won’t break in the future when the website layout changes.

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class Ecommerce(Spider):

name = "ecommerce"

start_urls = ["example.com/products/1", "example.com/products/2", "example.com/products/3", ]

def parse(self, response):

item_loader = ItemLoader(item=ProductItem(), response=response)

item_loader.default_input_processor = MapCompose(remove_tags)

item_loader.add_css("product", "h1[itemprop='name']")

item_loader.add_css("price", "span[itemprop=price]")

item_loader.add_css("stock", "span[itemprop=’stock’]")

item_loader.add_css("category", "a[data-track='Breadcrumb']")

return item_loader.load_item()

class Ecommerce(Spider): name = "ecommerce" start_urls = ["example.com/products/1", "example.com/products/2", "example.com/products/3", ] def parse(self, response): item_loader = ItemLoader(item=ProductItem(), response=response) item_loader.default_input_processor = MapCompose(remove_tags) item_loader.add_css("product", "h1[itemprop='name']") item_loader.add_css("price", "span[itemprop=price]") item_loader.add_css("stock", "span[itemprop=’stock’]") item_loader.add_css("category", "a[data-track='Breadcrumb']") return item_loader.load_item()

1class Ecommerce(Spider):
2   name = "ecommerce"
3   start\_urls = \["example.com/products/1", "example.com/products/2", "example.com/products/3", \]
4
5   def parse(self, response):
6       item\_loader = ItemLoader(item=ProductItem(), response=response)
7       item\_loader.default\_input\_processor = MapCompose(remove\_tags)
8
9       item\_loader.add\_css("product", "h1\[itemprop='name'\]")
10       item\_loader.add\_css("price", "span\[itemprop=price\]")
11       item\_loader.add\_css("stock", "span\[itemprop=’stock’\]")
12       item\_loader.add\_css("category", "a\[data-track='Breadcrumb'\]")
13
14       return item\_loader.load\_item()

Copy

I’m using ItemLoader with a default input processor to remove HTML tags. As you can see, I’m selecting the category field from the breadcrumb.

Creating a data pipeline

If we want to run an analysis of our data we need to store it in some kind of database. For this price intelligence project, I’m using a MySQL database for storage. If you want to use MySQL as well you should install MySQL-python if it isn’t already installed:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

sudo pip install MySQL-python

1sudo pip install MySQL-python

Copy

Then in Scrapy, we create a new class called DatabasePipeline in the pipelines.py file:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class DatabasePipeline:

def process_item(self, item, spider):

return item

class DatabasePipeline: def process_item(self, item, spider): return item

1class DatabasePipeline:
2       def process\_item(self, item, spider):
3    	   return item

Copy

In this class we have several things to do:

Add database connection parameters in the constructor
Implement from_crawler method and get database connection info from settings.py
Connect to the database when the spider starts
Insert data records into the database (one item at a time)
When all done close the database connection

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class DatabasePipeline:

# Add database connection parameters in the constructor

def __init__(self, db, user, passwd, host):

self.db = db

self.user = user

self.passwd = passwd

self.host = host

# Implement from_crawler method and get database connection info from settings.py

@classmethod

def from_crawler(cls, crawler):

db_settings = crawler.settings.getdict("DB_SETTINGS")

if not db_settings:

raise NotConfigured

db = db_settings['db']

user = db_settings['user']

passwd = db_settings['passwd']

host = db_settings['host']

return cls(db, user, passwd, host)

# Connect to the database when the spider starts

def open_spider(self, spider):

self.conn = MySQLdb.connect(db=self.db,

user=self.user, passwd=self.passwd,

host=self.host,

charset='utf8', use_unicode=True)

self.cursor = self.conn.cursor()

# Insert data records into the database (one item at a time)

def process_item(self, item, spider):

sql = "INSERT INTO table (field1, field2, field3) VALUES (%s, %s, %s)"

self.cursor.execute(sql,

(

item.get("field1"),

item.get("field2"),

item.get("field3"),

)

self.conn.commit()

return item

# When all done close the database connection

def close_spider(self, spider):

self.conn.close()

class DatabasePipeline: # Add database connection parameters in the constructor def __init__(self, db, user, passwd, host): self.db = db self.user = user self.passwd = passwd self.host = host # Implement from_crawler method and get database connection info from settings.py @classmethod def from_crawler(cls, crawler): db_settings = crawler.settings.getdict("DB_SETTINGS") if not db_settings: raise NotConfigured db = db_settings['db'] user = db_settings['user'] passwd = db_settings['passwd'] host = db_settings['host'] return cls(db, user, passwd, host) # Connect to the database when the spider starts def open_spider(self, spider): self.conn = MySQLdb.connect(db=self.db, user=self.user, passwd=self.passwd, host=self.host, charset='utf8', use_unicode=True) self.cursor = self.conn.cursor() # Insert data records into the database (one item at a time) def process_item(self, item, spider): sql = "INSERT INTO table (field1, field2, field3) VALUES (%s, %s, %s)" self.cursor.execute(sql, ( item.get("field1"), item.get("field2"), item.get("field3"), ) ) self.conn.commit() return item # When all done close the database connection def close_spider(self, spider): self.conn.close()

1class DatabasePipeline:
2       # Add database connection parameters in the constructor
3       def \_\_init\_\_(self, db, user, passwd, host):
4    	      self.db = db
5    	      self.user = user
6    	      self.passwd = passwd
7    	      self.host = host
8       # Implement from\_crawler method and get database connection info from settings.py
9       @classmethod
10       def from\_crawler(cls, crawler):
11    	      db\_settings = crawler.settings.getdict("DB\_SETTINGS")
12    	      if not db\_settings:
13                  raise NotConfigured
14    	      db = db\_settings\['db'\]
15    	      user = db\_settings\['user'\]
16    	      passwd = db\_settings\['passwd'\]
17    	      host = db\_settings\['host'\]
18    	      return cls(db, user, passwd, host)
19       # Connect to the database when the spider starts
20       def open\_spider(self, spider):
21    	      self.conn = MySQLdb.connect(db=self.db,
22                            user=self.user, passwd=self.passwd,
23                            host=self.host,
24                            charset='utf8', use\_unicode=True)
25    	      self.cursor = self.conn.cursor()
26       # Insert data records into the database (one item at a time)
27       def process\_item(self, item, spider):
28    	      sql = "INSERT INTO table (field1, field2, field3) VALUES (%s, %s, %s)"
29    	      self.cursor.execute(sql,
30                             (
31                             item.get("field1"),
32                             item.get("field2"),
33                             item.get("field3"),
34                             )
35                             )
36    	      self.conn.commit()
37    	      return item
38	# When all done close the database connection
39        def close\_spider(self, spider):
40    	       self.conn.close()

Copy

For effective price intelligence, technically, you could also hardcode your database connection info in the pipeline but I suggest putting it into the settings file like this:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

DB_SETTINGS = {

'db': "my_db",

'user': 'root',

'passwd': 'my_pass',

'host': '0.0.0.0',

}

DB_SETTINGS = { 'db': "my_db", 'user': 'root', 'passwd': 'my_pass', 'host': '0.0.0.0', }

1DB\_SETTINGS = {
2 	'db': "my\_db",
3 	'user': 'root',
4 	'passwd': 'my\_pass',
5 	'host': '0.0.0.0',
6 }

Copy

Now we only have to activate this pipeline in the settings file:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

ITEM_PIPELINES = {

'ecommerce.pipelines.DatabasePipeline: 300,

}

ITEM_PIPELINES = { 'ecommerce.pipelines.DatabasePipeline: 300, }

1ITEM\_PIPELINES = {
2   'ecommerce.pipelines.DatabasePipeline: 300,
3}

Copy

Price intelligence

We have focused on how to extract e-commerce pricing data now let’s look at some basic ways you can analyze it and get actionable insights. This is essentially Price Intelligence.

In this section, I’m going to introduce some basic ways to analyze price data and how to get actionable insights from it. I’m using pandas and SQL queries on the backend to get the data from the database. To generate the charts on the front end, I’m using Google Charts.

Price history

When it comes to price intelligence, one important analysis is the price history of one product. It shows you how one product was priced in the past. This could be one way to help determine the pricing strategy of an e-commerce store. Obviously, for this, you need to scrape their data regularly for a longer time. But when you actually have access to the data you can see how their pricing has changed or not changed in the past. It’s also interesting to see what pricing strategy they use on important shopping days like Black Friday.

The following should give you a better understanding of what is meant by price intelligence.

If you look at the history chart above, you can have a good understanding of how you and your competitors set the prices for one product. Based on the past you could forecast how the competitors will change their prices in the future so you can adjust your strategy to prepare for it.

Stock

One of the key factors when shopping online is the availability of the chosen product. Maybe I’m willing to wait a week or two till the product I want to buy is in stock again but most of the time I want it in my hands as soon as possible and maybe even pay a little more as well just to get it faster.

To use the dynamic of smart price intelligence to our advantage, we can scrape the stock information from the product page and get alerted if all of our competitors are out of the given product so we can increase the price.

Price comparison

On a day-to-day basis, maybe the best insight price intelligence can give us is the overall view on the market and how our products fit in. Without web scraping, we would have a hard time knowing how our main competitors are pricing the same products as we sell.

Price position

On a higher level, we can analyze how many of our products are priced lower, the same, or higher than each of the competitors. On the chart below, we have 34 products that have a higher price than competitor3 and 9 products with lower price than competitor5. As you can see, this is a key component for price intelligence.

For example, we might want to position ourselves to have higher prices than one of our competitors or we want to be lower than another one. This kind of price intelligence analysis can help you adjust prices accordingly.

Category analysis

When we were writing the spider we extracted the category of the products as well. This way we can also group together products based on category.

This chart shows what our price position is compared to the competitors in each product category. As you can see, here we have 12 products in the “Cameras” category where we have the lowest price. In the other categories, we are either in the middle or highest price position.

Wrapping up

So this is the process of effective price intelligence so you can scrape e-commerce websites and get actionable insights from the data using python and some visualization. First, you plan what data fields you exactly need and from what websites. Second, you create web spiders to extract and store the data. (If you don’t want to struggle with selectors/XPath, use an AI-based web scraping tool, like Zyte Automatic Extraction (formerly AutoExtract)). Finally, you visualize the data to understand it and find business opportunities, which comes to the definition of price intelligence.

If you have an eCommerce data-driven product and you need help with data extraction, contact us here

Price intelligence with Python: Scrapy, SQL, and Pandas

This is the simple process we are going to follow for this article:

Identify data sources and fields
Extract and store data
Analyze data

Identify data sources and fields

Websites

In a real-life project, you’d probably know which websites you want to get data from. For this article, I’m choosing some popular European e-commerce stores.

Fields to scrape

product name
price
stock
category

See how we at Zyte structure product data.

Ethical web scraping

Example robots.txt:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

User-agent: *

Disallow: /

User-Agent: *

Disallow: /*.json

Disallow: /api

Disallow: /post

Disallow: /submit

Allow: /

User-agent: * Disallow: / User-Agent: * Disallow: /*.json Disallow: /api Disallow: /post Disallow: /submit Allow: /

1User-agent: \*
2Disallow: /
3
4User-Agent: \*
5Disallow: /\*.json
6Disallow: /api
7Disallow: /post
8Disallow: /submit
9Allow: /

Copy

Some things you could do to be compliant:

Respect the rules
Adjust crawling speed if needed
Identify yourself with a UserAgent
Do not harm the website

Extract and store data

Installing Scrapy

Create a new folder and install virtualenv:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

mkdir ecommerce

cd ecommerce

pip install virtualenv

virtualenv env

source env/bin/activate

mkdir ecommerce cd ecommerce pip install virtualenv virtualenv env source env/bin/activate

1mkdir ecommerce
2cd ecommerce
3pip install virtualenv
4virtualenv env
5source env/bin/activate

Copy

Install Scrapy:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

pip install scrapy

1pip install scrapy

Copy

If you’re having trouble with installing scrapy check out the installation guide.

Create a new Scrapy project

Now that we have Scrapy installed in our environment we can create a new Scrapy project:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

scrapy startproject ecommerce

1scrapy startproject ecommerce

Copy

This will generate the file structure:

Items

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class ProductItem(Item):

product = Field()

price = Field()

category = Field()

stock = Field()

class ProductItem(Item): product = Field() price = Field() category = Field() stock = Field()

1class ProductItem(Item):
2   product = Field()
3   price = Field()
4   category = Field()
5   stock = Field()

Copy

Spider

Each of our spiders will look the same except the selectors of course.

To create a spider, first, we should look at the website and its source code, for example:

Selecting data fields based on itemprop attributes gives us a better chance that the scraper won’t break in the future when the website layout changes.

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class Ecommerce(Spider):

name = "ecommerce"

start_urls = ["example.com/products/1", "example.com/products/2", "example.com/products/3", ]

def parse(self, response):

item_loader = ItemLoader(item=ProductItem(), response=response)

item_loader.default_input_processor = MapCompose(remove_tags)

item_loader.add_css("product", "h1[itemprop='name']")

item_loader.add_css("price", "span[itemprop=price]")

item_loader.add_css("stock", "span[itemprop=’stock’]")

item_loader.add_css("category", "a[data-track='Breadcrumb']")

return item_loader.load_item()

1class Ecommerce(Spider):
2   name = "ecommerce"
3   start\_urls = \["example.com/products/1", "example.com/products/2", "example.com/products/3", \]
4
5   def parse(self, response):
6       item\_loader = ItemLoader(item=ProductItem(), response=response)
7       item\_loader.default\_input\_processor = MapCompose(remove\_tags)
8
9       item\_loader.add\_css("product", "h1\[itemprop='name'\]")
10       item\_loader.add\_css("price", "span\[itemprop=price\]")
11       item\_loader.add\_css("stock", "span\[itemprop=’stock’\]")
12       item\_loader.add\_css("category", "a\[data-track='Breadcrumb'\]")
13
14       return item\_loader.load\_item()

Copy

I’m using ItemLoader with a default input processor to remove HTML tags. As you can see, I’m selecting the category field from the breadcrumb.

Creating a data pipeline

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

sudo pip install MySQL-python

1sudo pip install MySQL-python

Copy

Then in Scrapy, we create a new class called DatabasePipeline in the pipelines.py file:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class DatabasePipeline:

def process_item(self, item, spider):

return item

class DatabasePipeline: def process_item(self, item, spider): return item

1class DatabasePipeline:
2       def process\_item(self, item, spider):
3    	   return item

Copy

In this class we have several things to do:

Add database connection parameters in the constructor
Implement from_crawler method and get database connection info from settings.py
Connect to the database when the spider starts
Insert data records into the database (one item at a time)
When all done close the database connection

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

class DatabasePipeline:

# Add database connection parameters in the constructor

def __init__(self, db, user, passwd, host):

self.db = db

self.user = user

self.passwd = passwd

self.host = host

# Implement from_crawler method and get database connection info from settings.py

@classmethod

def from_crawler(cls, crawler):

db_settings = crawler.settings.getdict("DB_SETTINGS")

if not db_settings:

raise NotConfigured

db = db_settings['db']

user = db_settings['user']

passwd = db_settings['passwd']

host = db_settings['host']

return cls(db, user, passwd, host)

# Connect to the database when the spider starts

def open_spider(self, spider):

self.conn = MySQLdb.connect(db=self.db,

user=self.user, passwd=self.passwd,

host=self.host,

charset='utf8', use_unicode=True)

self.cursor = self.conn.cursor()

# Insert data records into the database (one item at a time)

def process_item(self, item, spider):

sql = "INSERT INTO table (field1, field2, field3) VALUES (%s, %s, %s)"

self.cursor.execute(sql,

(

item.get("field1"),

item.get("field2"),

item.get("field3"),

)

self.conn.commit()

return item

# When all done close the database connection

def close_spider(self, spider):

self.conn.close()

1class DatabasePipeline:
2       # Add database connection parameters in the constructor
3       def \_\_init\_\_(self, db, user, passwd, host):
4    	      self.db = db
5    	      self.user = user
6    	      self.passwd = passwd
7    	      self.host = host
8       # Implement from\_crawler method and get database connection info from settings.py
9       @classmethod
10       def from\_crawler(cls, crawler):
11    	      db\_settings = crawler.settings.getdict("DB\_SETTINGS")
12    	      if not db\_settings:
13                  raise NotConfigured
14    	      db = db\_settings\['db'\]
15    	      user = db\_settings\['user'\]
16    	      passwd = db\_settings\['passwd'\]
17    	      host = db\_settings\['host'\]
18    	      return cls(db, user, passwd, host)
19       # Connect to the database when the spider starts
20       def open\_spider(self, spider):
21    	      self.conn = MySQLdb.connect(db=self.db,
22                            user=self.user, passwd=self.passwd,
23                            host=self.host,
24                            charset='utf8', use\_unicode=True)
25    	      self.cursor = self.conn.cursor()
26       # Insert data records into the database (one item at a time)
27       def process\_item(self, item, spider):
28    	      sql = "INSERT INTO table (field1, field2, field3) VALUES (%s, %s, %s)"
29    	      self.cursor.execute(sql,
30                             (
31                             item.get("field1"),
32                             item.get("field2"),
33                             item.get("field3"),
34                             )
35                             )
36    	      self.conn.commit()
37    	      return item
38	# When all done close the database connection
39        def close\_spider(self, spider):
40    	       self.conn.close()

Copy

For effective price intelligence, technically, you could also hardcode your database connection info in the pipeline but I suggest putting it into the settings file like this:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

DB_SETTINGS = {

'db': "my_db",

'user': 'root',

'passwd': 'my_pass',

'host': '0.0.0.0',

}

DB_SETTINGS = { 'db': "my_db", 'user': 'root', 'passwd': 'my_pass', 'host': '0.0.0.0', }

1DB\_SETTINGS = {
2 	'db': "my\_db",
3 	'user': 'root',
4 	'passwd': 'my\_pass',
5 	'host': '0.0.0.0',
6 }

Copy

Now we only have to activate this pipeline in the settings file:

Plain text

Copy to clipboard

Open code in new window

EnlighterJS 3 Syntax Highlighter

ITEM_PIPELINES = {

'ecommerce.pipelines.DatabasePipeline: 300,

}

ITEM_PIPELINES = { 'ecommerce.pipelines.DatabasePipeline: 300, }

1ITEM\_PIPELINES = {
2   'ecommerce.pipelines.DatabasePipeline: 300,
3}

Copy

Price intelligence

We have focused on how to extract e-commerce pricing data now let’s look at some basic ways you can analyze it and get actionable insights. This is essentially Price Intelligence.

Price history

The following should give you a better understanding of what is meant by price intelligence.

Stock

Price comparison

Price position

Category analysis

When we were writing the spider we extracted the category of the products as well. This way we can also group together products based on category.

Wrapping up

If you have an eCommerce data-driven product and you need help with data extraction, contact us here

Identify data sources and fields

Websites

Fields to scrape

Ethical web scraping

Extract and store data

Installing Scrapy

Create a new Scrapy project

Items

Spider

Creating a data pipeline

Price intelligence

Price history

Stock

Price comparison

Price position

Category analysis

Wrapping up

Build your first scraper in minutes

Continue reading

Scraping Swiss Army Knife: My personal fix for web setup fatigue using Docker, Scrapy and Zyte

How I trade gold using e-ink, live data and an old Raspberry Pi

How price extraction is fuelling insights for modern retailers

The best of Zyte and the data web, in your inbox.

Identify data sources and fields

Websites

Fields to scrape

Ethical web scraping

Extract and store data

Installing Scrapy

Create a new Scrapy project

Items

Spider

Creating a data pipeline

Price intelligence

Price history

Stock

Price comparison

Price position

Category analysis

Wrapping up

Build your first scraper in minutes

Continue reading

Scraping Swiss Army Knife: My personal fix for web setup fatigue using Docker, Scrapy and Zyte

How I trade gold using e-ink, live data and an old Raspberry Pi

How price extraction is fuelling insights for modern retailers

The best of Zyte and the data web, in your inbox.