How Web Scraping Reveals Lobbying and Corruption in Peru

Valdir Stumm Junior

4 min read · March 9, 2016

Update: With the release of the Panama Papers, a reliable means of exposing corruption and the methods of money laundering and tax evasion is now even more important. Web scraping provides a way to find, collate, and organize data without relying on information leaks.

Paid political influence and corruption continue to be major issues in a surprising number of countries throughout the world. The recent "Operation Lava Jato" in Brazil, in which officials from the state company Petrobras were accused of taking bribes from construction companies in exchange for contracts financed with taxpayer money, is a reminder of this. It has been suggested that unregulated lobbying can lead to this kind of corruption, and Transparency International, a non-governmental organization that monitors corruption, has strongly recommended that governments regulate lobbying.

I live in Peru and, with corruption scandals regularly making headlines, I was curious to see how Peruvian officials fared and to examine the role that lobbyists play in my government.

Is it possible to track lobbying in Peru?

The Peruvian Law of Transparency requires government institutions to maintain a website where they publish the list of people who visit their offices. Every visitor must register their ID document number, the reason for the visit, and the public servant they are visiting, along with the date of the visit and the times of entrance and exit. All of this information is published on the institutions’ websites.
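The registered fields map naturally onto a small record type. The following is a minimal sketch, not the schema of any institution's website: every field name is hypothetical, `visitor_name` is assumed to be present because the lists are browsed by person, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Optional


@dataclass(frozen=True)
class VisitRecord:
    """One row of an institution's visitor list (field names are illustrative)."""

    institution: str          # which public institution published the row
    visitor_name: str         # assumed field: the visitor's name
    id_document: str          # kept as text so leading zeros survive
    reason: str               # reason for the visit
    host: str                 # public servant being visited
    visit_date: date
    time_in: time
    time_out: Optional[time]  # assumption: the exit time may be missing

    def duration(self) -> Optional[timedelta]:
        """Length of the visit, or None when no exit time was recorded."""
        if self.time_out is None:
            return None
        start = datetime.combine(self.visit_date, self.time_in)
        end = datetime.combine(self.visit_date, self.time_out)
        return end - start


# Invented sample data, for illustration only.
example = VisitRecord(
    institution="Example Ministry",
    visitor_name="Jane Doe",
    id_document="01234567",
    reason="Meeting",
    host="John Roe",
    visit_date=date(2016, 1, 12),
    time_in=time(9, 15),
    time_out=time(10, 0),
)
print(example.duration())
```

Storing the ID number as a string rather than an integer is a deliberate choice here, since document numbers are identifiers, not quantities.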

The problem with open data

While almost all institutions publish their visitor lists online, the websites share the same broken user interface. A major problem is that you cannot search for visitors; you can only browse the list of people who visited on a particular day. If you’re looking for a known lobbyist, you have to visually scan several pages, as some institutions receive up to 400 visitors per day.

Obviously, this method of searching is time-consuming and tedious, and it’s easy to miss the person you are looking for. The problem is compounded when you want to find a particular person’s visits to more than one public institution.

The web scraping solution

To help journalists track lobbyists, I started the project Manolo. Manolo is a simple search tool that contains the visitor records of several government institutions in Peru. You can type any name and search for any individual across 14 public institutions in a matter of milliseconds.
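To illustrate the idea of searching one name across several institutions, here is a minimal in-memory sketch. It is not Manolo's implementation (Manolo relies on Elasticsearch for search); the record layout, the function names, and the sample data are all assumptions made for the example. Names are normalized by lowercasing and stripping accents, so that differently typed spellings of the same name still match.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase and strip accents so 'José PÉREZ' matches 'jose perez'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.lower()


def build_index(records):
    """Map each name token to the positions of the records that contain it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for token in normalize(record["visitor"]).split():
            index[token].add(position)
    return index


def search(index, records, query):
    """Return the records whose visitor name contains every query token."""
    tokens = normalize(query).split()
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return [records[i] for i in sorted(matches)]


# Invented sample data: the same person registered at two institutions.
records = [
    {"institution": "Institution A", "visitor": "José Pérez Gómez", "date": "2016-01-12"},
    {"institution": "Institution B", "visitor": "JOSE PEREZ GOMEZ", "date": "2016-02-03"},
    {"institution": "Institution A", "visitor": "María Quispe", "date": "2016-01-12"},
]
index = build_index(records)
hits = search(index, records, "perez jose")
print([hit["institution"] for hit in hits])  # ['Institution A', 'Institution B']
```

Because the index is keyed by token, a lookup does not scan every record, which is the property that makes a single search box practical over millions of rows.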

Under the hood

Unfortunately, Peruvian institutions do not provide their visitor data in a structured way. There’s no API to fetch this data in a machine-readable format, so the only way to get the information is by scraping their websites and then mining the data.

Manolo consists of a bunch of web scrapers written with Zyte’s popular Python framework Scrapy. The Scrapy spiders crawl the visitor lists once a week, extracting structured data from the HTML content and storing it in a PostgreSQL database. Elasticsearch indexes this data and users can then perform searches via a Django-based web UI, available on the Manolo website.
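The extract-and-store step can be sketched with the standard library alone. This is an assumption-heavy illustration and not Manolo's code: `html.parser` and `sqlite3` stand in for Scrapy and PostgreSQL so that the example has no dependencies, and the HTML layout, table schema, and sample rows are invented. Since the spiders recrawl the lists every week, the sketch also shows one way to keep a repeat crawl from inserting duplicate rows.

```python
import sqlite3
from html.parser import HTMLParser

# Invented page fragment; real visitor lists will use different markup.
SAMPLE_HTML = """
<table>
  <tr><th>Visitor</th><th>ID</th><th>Host</th><th>Date</th></tr>
  <tr><td>Jane Doe</td><td>01234567</td><td>John Roe</td><td>2016-01-12</td></tr>
  <tr><td>Ann Poe</td><td>07654321</td><td>John Roe</td><td>2016-01-12</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None


def store(rows, institution, conn):
    """Insert rows, skipping any that an earlier crawl already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        "institution TEXT, visitor TEXT, id_document TEXT, host TEXT, "
        "visit_date TEXT, "
        "UNIQUE (institution, id_document, host, visit_date))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO visits VALUES (?, ?, ?, ?, ?)",
        [(institution, *row) for row in rows],
    )
    conn.commit()


parser = TableRows()
parser.feed(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
store(parser.rows, "Example Ministry", conn)
store(parser.rows, "Example Ministry", conn)  # a repeat crawl adds nothing
count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(count)  # 2
```

The uniqueness key used here is a simplification; a real key would probably also include the entrance time, since the same person can visit the same official more than once in a day.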

So far, my friend @matiskay and I have created 14 spiders, which have scraped more than 2 million records. The source code for the spiders and the search tool is publicly available.

Who uses Manolo?

I have always aimed to provide this search tool online for anyone to use free of charge. At the beginning, I was just hoping that I could convince one or two journalists to use Manolo when searching for information.

You can imagine my surprise when two national newspapers (El Comercio and Diario Exitosa), one TV station (Latina), and two news web portals (Utero.pe and LaMula) became active users of Manolo.

One of the most impressive uses of Manolo was when a journalist used it to find out that the legal representative of a company that signed numerous and extremely profitable construction contracts with the government was a regular visitor to the President's residence building. According to the data scraped by Manolo, this representative visited some of the closest allies of the president 33 times. He was already under investigation because his company pocketed the money from the contracts and never built anything.

The discovery of the 33 visits made big headlines in four nationwide newspapers. Soon after, journalists from a variety of media outlets started using Manolo to find suspicious visits by known lobbyists to the ministries responsible for building public infrastructure. They even found that lobbyists involved in the famous "Lava Jato" corruption scandal were also regular visitors to Peruvian institutions.

"Chocherin Visited Government Palace 33 Times"

Wrap Up

Government corruption through lobbying is shockingly prevalent. Transparency is the solution, but even publicly available information can be convoluted and difficult to access. That’s where web scraping comes in. Open data is what we should all be striving towards, and hopefully others will follow Manolo’s lead.

Our use of Scrapy can easily be applied to any other situation where you are looking to scrape websites for information on government officials (or anyone else, for that matter).
