Extract JSONs Like A Pro With Chompjs And JMESPath

Easily get data from nested dictionaries within blocks of javascript code with these tips that facilitate how you extract JSONs.

Roy Healy

4 min read · June 3, 2021

Handling JavaScript objects is an important skill for any web data extraction developer.

At first, getting data out of nested dictionaries buried in blocks of JavaScript code can seem like a daunting task. However, I am going to introduce you to two packages I use that make it a breeze.

You might only start dipping your toes into this area when dealing with dynamic pages, but you will quickly come across script blocks like this one:

```javascript
__DATA__ = {"data":{"type":"@products", "products":[{"id":12345678, "name":"Bacon", "brand": "Some Brand", "price":2.50, "instock": false},{"id":12345679, "name":"Ham", "price":3.50, "instock": true},{"id":12345680, "name":"Beef", "price":1.50, "instock": false}]}};
some_javascript(data) {results = do_stuff(data); return results};
new beep_boop_js_var = some_javascript(__DATA__)
```

This text contains a lot of code you don’t want, but it also contains what looks a lot like a Python dictionary holding data about several products.

That’s a JavaScript object.


If it were clean JSON, you could turn it into a dictionary with a JSON parser such as Python’s built-in json module.

But what do you do when it isn’t clean JSON?
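To see why a plain JSON parser is not enough on its own, here is a minimal sketch using only Python’s standard-library json module: it rejects the full script text, but accepts the object once it has been cut out of the surrounding code. The slicing below is done by hand for this one sample, which is the fragile step a dedicated library saves you from.

```python
import json

script = """__DATA__ = {"data":{"type":"@products", "products":[{"id":12345678, "name":"Bacon", "brand": "Some Brand", "price":2.50, "instock": false},{"id":12345679, "name":"Ham", "price":3.50, "instock": true},{"id":12345680, "name":"Beef", "price":1.50, "instock": false}]}};
some_javascript(data) {results = do_stuff(data); return results};
new beep_boop_js_var = some_javascript(__DATA__)"""

# json.loads expects the whole string to be a single JSON value, so the
# leading "__DATA__ = " assignment makes it fail straight away.
try:
    json.loads(script)
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False

# Slicing out the object by hand (from the first "{" up to the "}" just
# before the first "};") leaves valid JSON that json.loads accepts.
start = script.index("{")
end = script.index("};") + 1
data = json.loads(script[start:end])

print(parsed_whole)                         # False
print(data["data"]["products"][0]["name"])  # Bacon
```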

Fellow Zytan Mariusz Obajtek made a package to help us in this situation: chompjs.

```python
from chompjs import parse_js_object

script = """__DATA__ = {"data":{"type":"@products", "products":[{"id":12345678, "name":"Bacon", "brand": "Some Brand", "price":2.50, "instock": false},{"id":12345679, "name":"Ham", "price":3.50, "instock": true},{"id":12345680, "name":"Beef", "price":1.50, "instock": false}]}};
some_javascript(data) {results = do_stuff(data); return results};
new beep_boop_js_var = some_javascript(__DATA__)"""

# Find the first JavaScript object in the script and turn it into a Python dict.
data = parse_js_object(script)
```

In this case, the parse_js_object function scans the script for the first JavaScript object, extracts it, and converts it into a Python dictionary.
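The core idea of "find the first object and parse it" can be approximated with the standard library alone. The sketch below is not how chompjs works internally: chompjs has its own parser and also copes with JavaScript that is not valid JSON, whereas this version only succeeds when the embedded object happens to be valid JSON. The function name first_json_object is hypothetical.

```python
import json


def first_json_object(text: str):
    """Return the first JSON object or array embedded in text.

    Hypothetical helper: tries to decode a JSON value at each "{" or "["
    and ignores whatever code follows the decoded value.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one value starting at index and returns
                # (value, end_index) without requiring the rest of the
                # string to be JSON.
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; keep scanning
    raise ValueError("no JSON object or array found")


script = """__DATA__ = {"data":{"type":"@products", "products":[{"id":12345678, "name":"Bacon", "brand": "Some Brand", "price":2.50, "instock": false},{"id":12345679, "name":"Ham", "price":3.50, "instock": true},{"id":12345680, "name":"Beef", "price":1.50, "instock": false}]}};
some_javascript(data) {results = do_stuff(data); return results};"""

data = first_json_object(script)
print(data["data"]["products"][1]["name"])  # Ham
```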

This is just the tip of the iceberg with chompjs: check out the examples on its GitHub page to see other, more difficult formats it can parse easily.

Extract data using JMESPath

So, now that you have your dictionary, what’s the best way to get your data out of it?

With nested dictionaries, it can be tedious to pick out the fields you need, but another package, JMESPath, makes it much easier.

For example, if you want to get the list of products from that dictionary, you can do it with a single function call:

```python
import jmespath

jmespath.search('data.products', data)
```
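For comparison, the same lookup in plain Python is a chain of key accesses. One practical difference is that jmespath.search returns None when a key along the path is missing, whereas chained indexing raises KeyError. The sketch below imitates that behaviour for simple dotted paths only; search_path is a hypothetical helper, not part of JMESPath.

```python
from typing import Any


def search_path(path: str, obj: Any) -> Any:
    """Hypothetical helper: follow a dotted path of dict keys.

    Mimics only plain JMESPath sub-expressions such as "data.products":
    returns None as soon as a key is missing or a value is not a dict.
    """
    for key in path.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


data = {"data": {"type": "@products", "products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}

products = data["data"]["products"]  # chained indexing: KeyError if a key is missing
print(search_path("data.products", data) is products)  # True
print(search_path("data.missing.name", data))          # None
```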

It doesn’t stop there. Let’s go one step further: say you want the names of the products. You can do:

```python
jmespath.search('data.products[].name', data)
```

The [] tells JMESPath to go through the products list and pull out each name field, which leaves me with a list of product names. While that is already very useful, we can go a bit deeper.
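In JMESPath terms, the [] creates a projection: the .name part is applied to every element of the list. For this sample, a plain-Python list comprehension gives the same result; it is shown only for comparison.

```python
data = {"data": {"products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}

# Plain-Python counterpart of jmespath.search('data.products[].name', data):
# apply the "name" lookup to every product in the list.
names = [product["name"] for product in data["data"]["products"]]
print(names)  # ['Bacon', 'Ham', 'Beef']
```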

Say you want the dict for only one of these products: the one called “Bacon”. We can put a filter expression inside the square brackets to narrow down the results:

```python
jmespath.search('data.products[?name==`Bacon`]', data)
```
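Note that a filter expression still returns a list (here, a list holding a single dict). In JMESPath you can take the first match by appending a pipe and an index, as in data.products[?name==`Bacon`] | [0]. A related detail: JMESPath expects the contents of backticks to be a JSON literal, so the strictly correct spelling of the string is "Bacon" in double quotes inside the backticks, or the raw string literal 'Bacon'; the Python jmespath package still accepts the bare form above as deprecated legacy syntax. The plain-Python sketch below mirrors the filter for comparison.

```python
data = {"data": {"products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}

# Plain-Python counterpart of the Bacon filter: keep matching elements,
# so the result is a list even when only one product matches.
bacon_matches = [p for p in data["data"]["products"] if p["name"] == "Bacon"]

# Take the first match to get the dict itself (JMESPath: append "| [0]").
bacon = bacon_matches[0] if bacon_matches else None
print(len(bacon_matches), bacon["price"])  # 1 2.5
```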

As before, we can also pull a specific field out:

```python
jmespath.search('data.products[?name==`Bacon`].price', data)
```

Now, let's do something a bit more interesting. Say I want to find products that are over a certain price. Well, I can do other sorts of conditionals in those brackets too:

```python
# Note the closing quote after the expression, which the original snippet was missing.
jmespath.search('data.products[?price>`2`]', data)
```
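The backticks around 2 mark a JSON literal, so JMESPath compares each price against the number 2. In plain Python the same filter is a comprehension with a numeric comparison, shown below for comparison only. One difference: the JMESPath filter quietly skips an element whose price is missing or not a number (the comparison evaluates to null), while this comprehension would raise KeyError on a missing price.

```python
data = {"data": {"products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}

# Plain-Python counterpart of the price filter: numeric comparison against 2.
pricey = [p for p in data["data"]["products"] if p["price"] > 2]
print([p["name"] for p in pricey])  # ['Bacon', 'Ham']
```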

You may have noticed that only one of the items has a brand, so if I were to do the following, it would give me just the brand name for that one product. Take care here: if some results are missing, you won’t know which of the dictionaries the remaining values actually came from:

```python
jmespath.search('data.products[].brand', data)
```
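The reason you lose track is that a missing key evaluates to null, and a JMESPath projection drops null results. One way to keep each brand tied to its product is a multiselect hash such as data.products[].{name: name, brand: brand}, which returns one object per product with a null brand where it is missing. The plain-Python sketch below reproduces both results for comparison.

```python
data = {"data": {"products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}
products = data["data"]["products"]

# Like 'data.products[].brand': missing brands become None and are dropped,
# so nothing tells you which product "Some Brand" belongs to.
brands = [p.get("brand") for p in products if p.get("brand") is not None]

# Like 'data.products[].{name: name, brand: brand}': one entry per product,
# with None where the brand is missing, so each brand stays next to its name.
name_brand = [{"name": p.get("name"), "brand": p.get("brand")} for p in products]

print(brands)         # ['Some Brand']
print(name_brand[1])  # {'name': 'Ham', 'brand': None}
```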

Finally, you may have noticed that the instock field in our sample holds a boolean value, so if we only want the names of the in-stock items, we can do it like so:

```python
jmespath.search('data.products[?instock].name', data)
```
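The plain-Python counterpart filters on the truthiness of instock, as sketched below for comparison. The two notions of truthiness only line up for values like booleans: JMESPath treats false, null, and empty strings, lists, and objects as false, but, unlike Python, it treats the number 0 as true.

```python
data = {"data": {"products": [
    {"id": 12345678, "name": "Bacon", "brand": "Some Brand", "price": 2.50, "instock": False},
    {"id": 12345679, "name": "Ham", "price": 3.50, "instock": True},
    {"id": 12345680, "name": "Beef", "price": 1.50, "instock": False},
]}}

# Plain-Python counterpart of 'data.products[?instock].name': keep products
# whose instock value is truthy, then take their names.
in_stock_names = [p["name"] for p in data["data"]["products"] if p.get("instock")]
print(in_stock_names)  # ['Ham']
```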

Conclusion to Extract JSONs

These are probably two of the most important packages I use when extracting web data.

JSON, or JavaScript Object Notation, is a readable, text-based format for structuring data. It is used primarily to transmit data between a server and a web application. JSON syntax is almost identical to JavaScript object literal syntax, which makes it easy to work with in JavaScript as well as in other programming languages.

Many sites embed standard JSON or JavaScript objects in their page source, which chompjs can extract for you. In these cases, and with most API responses, you will likely end up with nested dictionaries, which JMESPath makes a breeze to sift through.

Using these tools will enable you to level up your web scraping capabilities and get the data you need, minus the fuss.

Try our Smart Browser, a single API solution with browser and JavaScript rendering.

At Zyte, we know the value of data, and we want to empower you with the right tools to web scrape quickly and efficiently. 

If you require custom solutions, talk to our experts today to see what works best for your needs. 

Looking for additional resources?

Check out other useful open source packages for parsing HTML and extracting data:

  • Price-parser
  • Dateparser

© Zyte Group Limited 2026