PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Register now
Data Services
Pricing
Login
Try Zyte APIContact Sales
  • Unblocking and Extraction

    Zyte API

    The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing

    Ban Handling

    Headless Browser

    AI Extraction

    SERP

    Enterprise

    DocumentationSupport

    Hosting and Deployment

    Scrapy Cloud

    Run, monitor, and control your Scrapy spiders however you want to.

    Coding Agent Add-Ons

    Agentic Web Data

    Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.

  • Data Services
  • Pricing
  • Browse

    • BlogArticles, podcasts, videos
    • Case studiesCustomer outcomes
    • White papersIn-depth reports
    • EventsConferences, webinars, recordings

    Subscribe

    • NewsletterSwiftly delivered
    • Discord communityExtract Data community
  • Product and E-commerce

    From e-commerce and online marketplaces

    Data for AI

    Collect and structure web data to feed AI

    Job Posting

    From job boards and recruitment websites

    Real Estate

    From Listings portals and specialist websites

    News and Article

    From online publishers and news websites

    Search

    Search engine results page data (SERP)

    Social Media

    From social media platforms online

  • Meet Zyte

    Our story, people and values

    Contact us

    Get in touch

    Support

    Knowledge base and raise support tickets

    Terms and Policies

    Accept our terms and policies

    Open Source

    Our open source projects and contributions

    Web Data Compliance

    Guidelines and resources for compliant web data collection

    Join the team building the future of web data
    We're Hiring
    Trust Center
    Security, compliance & certifications
Login
Try Zyte APIContact Sales

Zyte Developers

Coding tools & hacks straight to your inbox

Become part of the community and receive a bi-weekly dosage of all things code.

Join us
    • Zyte Data
    • News & Articles
    • Search
    • Social Media
    • Product
    • Data for AI
    • Job Posting
    • Real Estate
    • Zyte API - Ban Handling
    • Zyte API - Headless Browser
    • Zyte API - AI Extraction
    • Web Scraping Copilot
    • Zyte API Enterprise
    • Scrapy Cloud
    • Solution Overview
    • Blog
    • Webinars
    • Case Studies
    • White Papers
    • Documentation
    • Web Scraping Maturity Self-Assesment
    • Web Data compliance
    • Meet Zyte
    • Jobs
    • Terms and Policies
    • Trust Center
    • Support
    • Contact us
    • Pricing
    • Do not sell
    • Cookie settings
    • Sign up
    • Talk to us
    • Cost estimator
All articles
AI60, 60 articles
Data quality13, 13 articles
Developer interest57, 57 articles
Integration2, 2 articles
Open-source40, 40 articles
Proxies29, 29 articles
Scraping practice17, 17 articles
Scraping strategy26, 26 articles
Web data60, 60 articles
Web scraping APIs33, 33 articles
Zyte API59, 59 articles
Scrapy48, 48 articles
Scrapy Cloud10, 10 articles
Web Scraping Copilot12, 12 articles
AI & Machine Learning1, 1 articles
Automotive2, 2 articles
E-commerce & retail26, 26 articles
Entertainment & Streaming2, 2 articles
Financial Services8, 8 articles
Government2, 2 articles
Market Research & Intelligence3, 3 articles
Media & publishing8, 8 articles
Real Estate2, 2 articles
Recruitment & HR3, 3 articles
Transportation & Logistics2, 2 articles
Travel & hospitality2, 2 articles
Extract Summit25, 25 articles
PyCon1, 1 articles

Appearance

Discord Community
BlogUse caseMapping Corruption In The Panama Papers With Open Data
ArticleUse case

Mapping Corruption In The Panama Papers With Open Data

We are at a point in the digital age where corruption is increasingly difficult to hide. Information leaks are abundant and shocking.

C

Cecilia Haynes

5 min read · April 6, 2016

Mapping Corruption In The Panama Papers With Open Data

Mapping corruption in the Panama papers with open data

We are at a point in the digital age where corruption is increasingly difficult to hide. Information leaks are abundant and shocking.

We rely on whistleblowers for many of these leaks. They have access to confidential information that’s impossible to obtain elsewhere. However, we also live in a time where data is more open and accessible than at any other point in history. With the rise of Open Data, people can no longer shred away their misdeeds. Nothing is ever truly deleted from the internet.

It might surprise you how many insights into corruption and graft are hiding in plain sight through openly available information. The only barriers are clunky websites, inexperience in data extraction, and unfamiliarity with data analysis tools.

We now collectively have the resources to produce our own Panama Papers. Not just as one offs, but as regular accountability checks to those in situations of power. This is especially the case if we combine our information to create further links.

One example of this democratization of information is a recent project in Peru called Manolo and its intersection with the Panama Papers. Manolo used the webscraping of open data to collect information on Peruvian government officials and lobbyists.

Manolo

Manolo is a web application that uses Scrapy to extract records (2.2 million so far) of the visitors frequenting several Peruvian state institutions. It then repackages the data into an easily searchable interface, unlike the government websites.

Peruvian journalists frequently use Manolo. It has even helped them uncover illegal lobbying by tracking the visits of construction company representatives who are currently under investigation to specific government officials.

Developed by Carlos Peña, a Zyte engineer, Manolo is a prime example of what a private citizen can accomplish. By opening access to the Peruvian media, this project has opened up a much needed conversation about transparency and accountability in Peru.

image02

Clunky government website

image01

Scrapy working its magic

image00

Extracted data in a structured format

image03

Final transformation into Manolo

Cross-Referencing Datasets

With leaks like the Panama Papers as a starting point, web scraping can be used to build datasets to discover wrongdoing and to call out corrupt officials.

For example, you could cross-reference names and facts from the Panama Papers with the data that you retrieve via web scraping. This would give you more context and could lead to you discovering more findings.

We actually tested this out ourselves with Manolo. One of the names found in the Panama Papers is Virgilio Acuña Peralta, currently a Peruvian congressman. We found his name in Manolo’s database since he visited the Ministry of Mining last year.

image04

According to the Peruvian news publication Ojo Público, Acuña wanted to use Mossack Fonseca to reactivate an offshore company that he could use to secure construction contracts with the Peruvian state. As a congressman, this is illegal. In Peru, there are efforts to investigate Virgilio Acuña and his brother, who recently ran for president, for money laundering.

image06

Virgilio Acuña, congressman (middle), on the right Cesar Acuña (campaigning for the Peruvian presidency last month) [Photo via @Ojo_Publico]

Another name reported in the Panama Papers by Ojo Público was Jaime Carbajal Peréz, a close associate of former Peruvian president Alan García.

Ojo Público states that in 2008, Carbajal, along with colleague Percy Uriarte and others, bought the offshore company Winscombe Management Corp. from Mossack Fonseca. Carbajal and Uriarte own a business that sells books to state-run schools. Further plot twist is that the third owner of the bookstore, José Antonio Chang, was the Minister of Education from 2006 to 2011 and the Prime Minister from 2010 to 2011.

A quick search of the Manolo database reveals that Percy Uriarte visited the Peruvian Ministry of Education 45 times between 2013 and 2015. IDL-Reporteros, another Peruvian news outlet, reported that the company led by Carbajal illegally sold books to the Peruvian government in 2010. He used a front company for these transactions since he was forbidden by law to engage in contracts with the state due to his close association with the former president.

image05

Data Mining

Data from previous leaks such as the Saudi Cables, Swiss Leaks and the Offshore Leaks have been released publicly. In many cases, the data has been indexed and catalogued, making it easy for you to navigate and search it. And now we’re just waiting on the full data dump from the Panama Papers leaks.

You can use information retrieval techniques to dig deeper into the leaks.You could find matching records, sift through the data by tagging specific words, parts of speech, or phrases, and identify different entities like institutions or people.

Creating Open Data

If the information is not readily available in a convenient database, then you can always explore open data yourself:

  • Identify sources of open data that have the information you need. These are often government websites and public records.
  • Scrape the data. You can do this in two ways: using a visual web scraper like Portia (no programming needed) or a framework like Scrapy that allows you to customize your code. Big bonus is that these are both open source tools and completely free.
  • Download your data and import into your favorite data analysis software. Then start digging!
  • Package your findings by creating visual representations of the data with tools like Tableau or Plot.ly.

We actually offer platform integrations with Machine Learning programs like BigML and MonkeyLearn. We’re looking into integrating more data tools later this year, so keep an eye out!

Wrap Up

Data and corruption are everywhere and they can both seem difficult to access.That’s where web scraping comes in. We are a point where citizens have the tools necessary to hold elected officials, businesses, and folks in power accountable for illegal and corrupt actions. By increasing transparency and creating further links between information leaks and readily available data, we can remove the loopholes where firms like Mossack Fonseca exist.

Frameworks like Scrapy are great for those who know how to code, but there shouldn’t be a barrier to acquiring data. Journalists who wish to take advantage of these vast sources of information can use visual web scrapers like Portia to get the data for current and future investigations.

Try Zyte API

Build your first scraper in minutes

Free trial, no credit card. From a single request to production in an afternoon.

Get started
Use case
C

Cecilia Haynes

More from this author

In this article

  • Manolo
  • Cross-Referencing Datasets
  • Data Mining
  • Creating Open Data
  • Wrap Up

Follow

Get the latest

Zyte and the data web in your inbox — or wherever you already are.

Subscribe

Or follow elsewhere

Continue reading

Scraping Swiss Army Knife: My personal fix for web setup fatigue using Docker, Scrapy and Zyte
Use case

Scraping Swiss Army Knife: My personal fix for web setup fatigue using Docker, Scrapy and Zyte

Tired of repeating web scraping setup? Learn how a multi-arch Docker container with Scrapy, Zyte, Requests, and Pandas speeds up exploration and debugging.

Ayan Pahwa·10 min·February 5, 2026
How I trade gold using e-ink, live data and an old Raspberry Pi
Use case

How I trade gold using e-ink, live data and an old Raspberry Pi

Track real-world gold and silver retail prices automatically using Zyte API, Python, and a Raspberry Pi with an e-ink display. Learn how to scrape rendered HTML, parse prices, and build an always-on trading dashboard.

Ayan Pahwa·10 min·February 2, 2026
How price extraction is fuelling insights for modern retailers
Use case

How price extraction is fuelling insights for modern retailers

Retail pricing has long combined data, experience, and instinct – but today’s market volatility demands a faster, smarter approach.

Theresia Tanzil·7 mins·July 23, 2025

The Community · Newsletter

The best of Zyte and the data web, in your inbox.

One curated edition — new articles, product updates, and the stories shaping the data web. No noise.

G2.com

Capterra.com

Proxyway.com

EWDCI logoMost loved workplace certificateZyte rewardISO 27001 iconG2 rewardG2 rewardG2 reward

© Zyte Group Limited 2026