Alternative Data For Hedge Funds & Portfolio Management

Hedge funds are increasingly turning to alternative data to generate alpha. Buy-side spend has grown 160% in the last two years, but not all data is created equal.

John Campbell

7 min read · May 13, 2021


2020 and 2021 have so far been anything but business as usual. Last year began with the continuation of the longest bull market in modern economic history, which traces its beginnings to the market low of early March 2009 that followed the financial crisis triggered by sub-prime mortgages. The S&P 500 closed at 676.53 on 9th March 2009 and at 3,386.15 on 19th February 2020. That is a roughly fivefold increase (a gain of about 400%) in a little under eleven years, a run on a par with the 417% gain of the 1990s bull market.
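A quick check of the index arithmetic: the sketch below takes the two closing values quoted above and reports both the ending multiple and the percentage gain, two figures that are easy to conflate. The function name `total_return` is illustrative only.

```python
def total_return(start, end):
    """Return (ending multiple, percentage gain) between two index levels."""
    multiple = end / start
    pct_gain = (multiple - 1.0) * 100.0
    return multiple, pct_gain


# Closing levels quoted in the text: 9 March 2009 and 19 February 2020
multiple, pct_gain = total_return(676.53, 3386.15)
print(f"ending multiple: {multiple:.2f}x, gain: {pct_gain:.1f}%")
```

An index that ends at about five times its starting level has gained about 400%, not 500%, because the starting level itself accounts for the first 100%.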

Then came March 2020, and a disease once brushed off as a mild new flu confined to mainland China came knocking on everyone's doorstep. Most if not all financial markets came to an abrupt halt and have recovered only partially since, with some sectors faring better than others. While this remains true today, the quick transition to a largely socially distanced, 'virtual/online-first' world has accelerated a trend already visible over the past few years: buy-side spend on alternative data has grown by 160% since 2018, topping $1.7Bn in 2020 alone.

In the Alternative Investment Management Association (AIMA) report Casting the Net: How Hedge Funds Are Using Alternative Data, more than half the respondents said they actively used alternative data. Another 14% were exploring options.

While precise definitions vary, there is almost universal agreement that alternative data is data sourced from places other than the traditional ones. Traditional sources include SEC filings and financial performance reports, usually compiled by organizations such as Bloomberg and Refinitiv/Thomson Reuters, as well as internal proprietary portfolio performance data and other well-known sources. As I discussed in my latest article, alternative data is not just Facebook data! Datasets sourced exclusively from social media networks may well not be of sufficient quality, and they require deft preparation and parsing before they can be applied in the financial services space.

It is safe to say that throughout the history of modern capitalism, and even in antiquity (curious fact: in ancient Babylon, merchants used measurements of the Euphrates' depth to inform their commodity prices), businesses have sought to gain an advantage over their competitors, either by uncovering patterns and trends that were not readily available through conventional analysis or by exploiting ones their competitors had not leveraged. What is different this time around is the so-called data explosion we are currently experiencing. IDC estimates that 1.2 zettabytes of data were created in 2010 and that by 2025 this figure will balloon to 175 zettabytes. Add to that the advancement of data gathering, processing, and analysis technologies, such as sophisticated data extraction, knowledge graphs, natural language processing (NLP), and entity resolution, together with the continuous growth in raw compute power from the likes of Dell EMC, HPE, and AWS.

As Michael Megaw from SS&C recently put it, 'Alternative data has become a disruptor in the hedge fund industry.' Organisations that successfully transform to harness it, and that augment their existing research methodologies by applying it, are in prime position to reap far greater benefits than the laggards in the industry.

A word of caution. This wave of enthusiasm might give some the wrong idea about what this kind of data can do for organizations in the financial services space, be it capital markets, investment management, high street and commercial banking, or insurance. Alternative data does not replace existing datasets or sound quantitative research methodologies. It serves as a means to an end, significantly augmenting and enhancing traditional research methods with more profound, more comprehensive insights.

Market trends all point to continued growth in every aspect of alternative data: data available, sell-side data providers, adoption, and integration into existing workflows. As the eagerness to simply get your hands on this new 'miracle' data wanes and it becomes more pervasive across the entire capital markets and investment management industry, a shift towards data quality over quantity and diversity will be inevitable. The focus on value will, in turn, probably lead to consolidation among the players in this space, particularly when data quality issues are costing organizations in excess of $9.7 million a year on average.

All hedge funds are not created equal

First things first: all hedge funds are not created equal. A simple way to tell the big boys apart from the smaller, more niche players is to look at assets under management (AUM).

Common sense dictates that mid-sized and more prominent players, those with $5 billion or more and $10 billion or more under management, respectively, would have more resources at their disposal to harness and productize the 'promise' held by alternative datasets. This seems to be the case: EY's latest annual Global Alternative Fund Survey showed that while 44% of funds overall have dedicated FTEs to leverage this onslaught of alternative data, that figure jumps to 60% for funds with over $10 billion AUM. For smaller funds (less than or equal to $2 billion AUM), it drops to about 32%. That said, this still shows that smaller players are nimble enough to allocate resources, punch above their weight, and reap the benefits.

In terms of performance, particularly for alternative investment funds, which are even more likely to leverage alternative data, 58% of investors said that their managers met or exceeded performance expectations during the market volatility caused by the pandemic.

Top types of alternative data used by hedge funds

There are thousands of alternative datasets out there, pitched by various vendors. One leading vendor states that it has 1,500+ ready-to-consume datasets, so you can see why categorizing them becomes all the more important.

That said, most hedge funds that use alternative data regularly draw on at least one of the following types:

  • Web data 🌐
  • Transaction data/Consumer spending 💵
  • Social Media & related sentiment data 🔗
  • App usage 📱
  • Web traffic 📈
  • Geo location 📍
  • Satellite imagery 🛰️
  • Email receipt 📧

Generating Alpha & Risk Management

Delivering a return that can be attributed solely to the hedge fund manager's savviness and skill is the holy grail of generating alpha. To deliver it, fund managers need to identify and leverage an edge: spot opportunities that others have missed or underestimated and, vitally, give them the right weight in their portfolio strategy. Tricky, isn't it?

In a recent study by Eagle Alpha, Olga Kokareva from Quantstellation summed it up brilliantly, showing how hedge fund managers' use of alternative data differs significantly depending on whether they pursue a so-called fundamental or quantitative approach.

"It's important to understand that usage of alternative data by fundamental hedge fund managers and by quantitative hedge funds are two very different processes. Fundamental hedge fund managers normally use alternative data to reinforce their investment thesis that they derived from their regular research process. For example, a manager can hold a long position in a retailer, and they are thinking about closing it, but they are not sure. So, instead of waiting for the next quarterly report, they can start looking at foot traffic data or credit card data. If the sales numbers are indeed going down, they might close this position earlier." 

On the other hand, quantitative hedge funds derive their investment hypotheses purely from insights in the data available to them, more often than not by applying advanced machine learning models. This has been taking place for years, certainly since the late 1970s and 1980s. Leveraging alternative datasets to improve probabilistic models, for example, is therefore a natural extension of an established practice.

But alternative data usage is not confined to stock selection; it also extends to the broader discipline of risk management. After all, from a capital allocation standpoint, investment risk management is something most hedge fund managers are known for. The idea of risk-adjusted returns, spearheaded by the now-famous Sharpe ratio developed by William Sharpe, who went on to share the Nobel Prize in economics in 1990, has been paramount to sound portfolio management.
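For readers less familiar with the measure, the Sharpe ratio divides the average return in excess of a risk-free rate by the volatility of those excess returns. The sketch below is a minimal illustration, not a production implementation: the monthly returns are hypothetical, the function name and parameters are made up for this example, and annualizing by the square root of the number of periods per year is a common convention assumed here rather than something taken from the studies cited in this article.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from a list of periodic returns.

    risk_free_rate is expressed per period, in the same units as returns.
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    stdev_excess = statistics.stdev(excess)  # sample standard deviation
    if stdev_excess == 0:
        raise ValueError("returns have zero volatility; the ratio is undefined")
    # Assumed convention: scale by sqrt(periods per year) to annualize
    return (mean_excess / stdev_excess) * (periods_per_year ** 0.5)


# Hypothetical monthly returns, for illustration only
monthly_returns = [0.02, -0.01, 0.015, 0.03, -0.005, 0.01]
ratio = sharpe_ratio(monthly_returns, risk_free_rate=0.001)
print(round(ratio, 2))
```

Both inputs to the ratio are estimated from observed returns, so any risk that has not yet shown up in the return history is invisible to it.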

The problem, however, still lies in imperfect information, which renders mathematical constructs like the Sharpe ratio inherently flawed. Although perfect information is impossible in practice, alternative data promises to reveal hidden risks that can make a significant difference in risk-reward calculations. Insurance and lending organizations have already started layering alternative data on top of traditional datasets for this exact purpose. Hedge funds are lagging somewhat in applying alternative data to manage risk, with only 23% of market leaders using it to help improve risk management processes.

Risks & Challenges

Using alternative data efficiently and effectively, in other words using it to produce investment returns or generate operational efficiencies, requires five key components:

  1. Having adequate human capital
  2. Having the correct infrastructure
  3. Having the correct processes, e.g. master data management
  4. Navigating the regulatory environment that governs the collection, usage, and distribution of this data
  5. Being able to demonstrate ROI to investors

I don't intend to drill into each of these elements in detail (stay tuned for the next article), but I will highlight the main points regarding infrastructure, processes, and adequate human capital.

Respondents to the previously mentioned AIMA study (49% of 'market leaders' and 54% of the 'rest of the market') cited having the appropriate infrastructure and human capital as the key challenge to delivering on the promise of alternative data.

When asked to break this challenge down, 77% of market leaders and 54% of the rest of the market answered that the biggest challenge is the inability to back-test alternative data. Many of these ever-larger datasets simply do not go far enough back in time to contribute meaningfully to models built on historical data.
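To make the back-testing constraint concrete, here is a minimal sketch of the kind of screening a research team might run before evaluating a dataset: does its history span the lookback window the model needs? The dataset names, dates, and the ten-year requirement are all hypothetical and chosen only for illustration.

```python
from datetime import date


def covers_backtest_window(first_observation, last_observation, required_years):
    """Return True if the dataset history spans at least required_years."""
    span_years = (last_observation - first_observation).days / 365.25
    return span_years >= required_years


# Hypothetical datasets: name -> (first observation, last observation)
datasets = {
    "foot_traffic": (date(2018, 6, 1), date(2021, 4, 30)),
    "card_spend": (date(2010, 1, 1), date(2021, 4, 30)),
}
REQUIRED_YEARS = 10  # assumed lookback needed by a historical model

usable = {
    name: covers_backtest_window(start, end, REQUIRED_YEARS)
    for name, (start, end) in datasets.items()
}
print(usable)
```

In this made-up example only the longer-running dataset clears the bar, which is the situation survey respondents describe: newer sources can be rich but too short-lived to test against past market cycles.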

Another key insight is that over half of respondents in the 'market leaders' segment and more than 60% in the 'rest of the market' segment stressed the difficulty of sourcing quality datasets. Data quality is paramount to hedge fund portfolio management performance, and it also speaks to a wider issue that stems from the 'data explosion' phenomenon I alluded to earlier.

Organizations are now almost drowning in data; getting data is not the problem, deriving actionable insights from it is. To do that, data needs to be clean and insight-ready. Too often it is not, and that is costing organizations close to $10 million per year on average, according to Gartner. Almost five years ago, IBM estimated that the cost of bad data in the United States alone was a staggering $3 trillion per year.

Circling back to hedge funds in particular, even once the correct alternative dataset has been identified, the question of master data management comes to the fore. Data governance and stewardship, semantic consistency between databases and systems, permanency risk (how far into the future will the dataset be usable?), and data robustness and consistency (can it be mapped to fixed references such as CUSIPs in the US or SEDOLs in the UK and Ireland?) all need to be addressed.
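One small, concrete piece of that mapping problem is checking that the identifiers in a dataset are at least well-formed before trying to join them to a security master. The sketch below assumes the publicly documented CUSIP check-digit scheme (often described as "modulus 10 double-add-double"); the function names are illustrative. It only screens out malformed identifiers and does not confirm that a security exists in any reference database.

```python
def cusip_check_digit(base):
    """Compute the check digit for the first 8 characters of a CUSIP."""
    if len(base) != 8:
        raise ValueError("expected the first 8 characters of a CUSIP")
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for position, char in enumerate(base.upper(), start=1):
        if char in "0123456789":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10
        elif char in special:
            value = special[char]
        else:
            raise ValueError(f"invalid CUSIP character: {char!r}")
        if position % 2 == 0:
            value *= 2  # double every second character's value
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10


def is_valid_cusip(cusip):
    """Return True if a 9-character CUSIP has a consistent check digit."""
    cusip = cusip.strip().upper()
    if len(cusip) != 9 or cusip[-1] not in "0123456789":
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[-1])
    except ValueError:
        return False


# Example identifiers; swap in values from your own dataset
for candidate in ["037833100", "037833101", "not-a-cusip"]:
    print(candidate, is_valid_cusip(candidate))
```

Rows that fail a check like this can be set aside for manual review instead of silently dropping out of a join.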

This points to the need for organizations to have the right amount and type of human capital to make the most of alternative data. The fact that the number of full-time employees dedicated to managing these datasets has jumped by ~450% in the last couple of years is a testament to the race between organizations to get the best candidates on board, and fast.

Where to from here

If the recent AIMA study and the recently published EY 2020 Global Alternative Fund Survey are anything to go by, the pervasiveness of alternative data within the hedge fund space is only going to increase in the immediate future.

Although the number of datasets might continue to grow, there will undoubtedly be a flight to quality sooner rather than later, along with a consolidation of players. With big hitters such as Bloomberg announcing they are doubling down on alternative data and Refinitiv consolidating its own offering, we can say one thing: the era of investment strategies augmented by alternative data is here for the long run.

How can Zyte help

Here at Zyte, we specialize in two things: delivering custom data feeds optimized for you, taking unstructured data from the web and providing it in a structured format; and enabling organizations that have their own internal data collection teams with robust, resilient, always-on infrastructure designed for web data extraction.

We help some of the largest FSIs navigate the complexity of web data extraction for alternative data-related use cases, ensuring compliance standards are met and, equally importantly, ensuring a healthy data pipeline.

© Zyte Group Limited 2026