
The Fly, The Parrot & The Thinking Machine: The Rise of Reasoning LLMs

Read Time: 10 mins
Posted on: April 14, 2025
By Konstantin Lopukhin

What do a speeding train, a sleepy fly, and the future of artificial intelligence have in common? More than you might think.


For the last few years, the AI world has been wowed by Large Language Models (LLMs) that can generate impressively human-like text. But ask one of these models a classic riddle – a slight twist on the train-fly problem – and it would often get lost in a maze of digital confusion. It could parrot the form of reasoning, but it would lose the thread halfway through the problem.


That, however, is rapidly changing.


I have been closely tracking a revolution in the LLM landscape: the rise of reasoning models. These aren't your typical text generators. They're designed to tackle problems that require breaking down complexity, considering different angles, and even learning from their mistakes. And the implications are huge, especially for those of us working with the messy, unpredictable world of web data.

The trouble with reason


Let’s take this math problem. A train travels from point A to B, 45 miles away, at 90 mph. A fly starts from point A at the same time, flying at 180 mph, repeatedly shuttling back and forth between the oncoming train and point B, taking a 2-minute nap each time it reaches point B. This continues until the train reaches point B. What is the total distance covered by the fly?


It's not rocket science, but it requires a few logical steps. Unlike the original train-fly problem, you can't use a shortcut to compute the total time the fly spent flying. You need to consider the relative speeds, the nap times, and the fact that the distances change with each leg of the fly's journey. Older LLMs, trained primarily on predicting the next word in a sequence, could often generate plausible-sounding explanations, but they'd frequently get the answer wrong. They mimicked the style of a solution, not the underlying logic – parroting the fly's progress without understanding it.
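
If you want to check your own reasoning against the machine's, here is a quick brute-force simulation in Python. It is a sketch, not anyone's official solution: it simply walks the fly through each leg and nap until the train arrives.

```python
# A brute-force walk through the fly's legs and naps.
# Speeds are in mph, distances in miles, times in hours.
TRAIN_SPEED, FLY_SPEED, DISTANCE = 90.0, 180.0, 45.0
NAP = 2.0 / 60.0  # the 2-minute nap at point B
EPS = 1e-12

t_end = DISTANCE / TRAIN_SPEED  # the train reaches B after 0.5 h
t = train_pos = fly_distance = 0.0
fly_pos, heading_to_b = 0.0, True

while t < t_end - EPS:
    if heading_to_b:
        # The fly heads for B (or flies until the train arrives first).
        dt = min((DISTANCE - fly_pos) / FLY_SPEED, t_end - t)
        fly_pos += FLY_SPEED * dt
    else:
        # Fly and train close the gap at their combined speed.
        dt = min((fly_pos - train_pos) / (FLY_SPEED + TRAIN_SPEED), t_end - t)
        fly_pos -= FLY_SPEED * dt
    train_pos += TRAIN_SPEED * dt
    fly_distance += FLY_SPEED * dt
    t += dt
    if heading_to_b and fly_pos >= DISTANCE - EPS:
        # Reached B: nap for 2 minutes (or until the train arrives).
        nap = min(NAP, t_end - t)
        t += nap
        train_pos += TRAIN_SPEED * nap
    heading_to_b = not heading_to_b

print(f"Total distance covered by the fly: {fly_distance:.2f} miles")
# -> roughly 75.67 miles, the answer a correct chain of reasoning lands on
```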


Traditionally, LLMs are fine-tuned using a method called Supervised Fine-Tuning (SFT). It's like showing the model a textbook full of solved problems and saying: "Learn to imitate this." It works well for generating text that looks like the training data, but it doesn't teach the model to truly understand the problem-solving process, as the model has its hand held the whole way.
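
To make that concrete, here is an illustrative sketch of a single SFT training pair. The example is invented for this post, not drawn from any real dataset.

```python
# An invented SFT training pair, for illustration only. The model is
# trained to reproduce the completion token by token, which teaches it
# to imitate the style of a worked solution rather than the reasoning.
sft_example = {
    "prompt": (
        "A train travels 45 miles at 90 mph. "
        "How long does the trip take?"
    ),
    "completion": "time = distance / speed = 45 / 90 = 0.5 hours, i.e. 30 minutes.",
}
print(sft_example["completion"])
```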

Beyond parrot fashion: what makes a ‘reasoning’ model?


Reasoning models work very differently. They can:


  • Break down complex problems: They can take a multi-step problem (like our fly and train example) and deconstruct it into smaller, manageable parts.

  • Consider multiple perspectives: They can look at a problem from different angles, weighing options and possibilities.

  • Adapt and recover: If they make a mistake in their "thinking" process, they can often detect it and course-correct.


The reason “reasoning” models get to reason is a fundamentally different approach to training: reinforcement learning (RL).


Imagine it like this: instead of a textbook, you give the model a set of problems and a simple rule: "Answer only with a number and a unit." Then, you let it loose.


The model will generate all sorts of responses – some right, some wrong, some just plain nonsensical. But here's what is crucial:


  • We can automatically check if the final answer is correct.

  • We can reward the model for producing reasoning traces that lead to correct answers.

  • We can also reward good formatting, staying on topic, and sticking to a single language (a rule-based reward along these lines is sketched below).


This is like giving the model a biscuit for getting the right answer and for showing its work in a clear, logical way. Good dog!
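
To make the recipe concrete, here is a minimal sketch of what such a rule-based reward could look like. It is an assumption-laden illustration, not DeepSeek's actual reward function: the weights, the regex, and the reward helper are all invented here.

```python
import re

def reward(reasoning: str, answer: str, expected: float, unit: str) -> float:
    """An invented rule-based reward, not any lab's real one."""
    score = 0.0
    # Reward following the format rule: "answer only with a number and a unit".
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", answer)
    if match:
        score += 0.1
        value, given_unit = float(match.group(1)), match.group(2)
        # The key ingredient: the final answer can be checked automatically.
        if given_unit == unit and abs(value - expected) < 0.01:
            score += 1.0
    # A small bonus for producing a non-empty reasoning trace at all.
    if reasoning.strip():
        score += 0.05
    return score

print(reward("The fly is in the air for 25.22 minutes...",
             "75.67 miles", 75.67, "miles"))  # -> 1.15
```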


Over time, through countless trials and errors, the model learns not just to find the right answer, but to reason its way there. It learns to break down problems, consider different approaches, and even recover from dead ends.

DeepSeek R1: a turning point


The release of DeepSeek R1 in January 2025 was a watershed moment. It wasn't just another incrementally better model. It was a demonstration of the power of RL in creating truly reasoning LLMs, and its open-source release was remarkably generous.


DeepSeek R1 was notable because, at the time of its release, it was the first openly available model to achieve state-of-the-art reasoning performance, with its weights and most of its code open-sourced. What's more, it came with detailed papers for both R1 and V3, where competitors like Llama or Mistral released only technical reports, and its "Mixture of Experts" architecture was comparatively cheap to train and run.


The DeepSeek team didn't just serve up a three-course meal: they shared the recipe. This level of transparency is rare in the fast-paced world of AI, and it sparked a flurry of activity.


One of the most fascinating aspects of DeepSeek R1 is its "thinking" process. Unlike traditional LLMs that might spit out an answer in a few hundred tokens, DeepSeek R1 can generate thousands of tokens of reasoning – exploring different paths, reconsidering assumptions, and even correcting its own mistakes. It's like watching a digital mind at work – in fact, click here for a glimpse into R1's mind as it thinks through the train-fly problem.

The cost of intelligence


Of course, this extended reasoning comes with trade-offs.


  • Latency: Reasoning models take longer to generate answers. If you're generating 100 tokens per second, 8,000 reasoning tokens will take 80 seconds. That's a significant delay compared to traditional models.

  • Cost: Even if the model itself is cheap, you're still paying for those extra tokens; a quick back-of-the-envelope sketch follows below.
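
Here is that back-of-the-envelope calculation, using the numbers above plus an assumed price of $1 per million output tokens (an illustrative figure, not any provider's quote).

```python
# Back-of-the-envelope latency and cost for one long reasoning trace.
reasoning_tokens = 8_000
tokens_per_second = 100
price_per_token = 1.0 / 1_000_000  # assumed: $1 per million output tokens

latency_s = reasoning_tokens / tokens_per_second  # 80.0 seconds
cost_usd = reasoning_tokens * price_per_token     # $0.008 per answer
print(f"{latency_s:.0f} s, ${cost_usd:.4f} per answer")
```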


However, these costs are often outweighed by the benefits. On complex reasoning benchmarks like ARC-AGI-1, reasoning models dramatically outperform traditional models.


These leaps aren't just confined to research labs; they're already making a real-world impact. At Zyte, for instance, we're using these techniques to build smarter, more efficient web scraping tools.


By leveraging the power of LLMs to reason about web page structures and data relationships, we can automate tasks that previously required significant human intervention. This means faster, more reliable, and more cost-effective data extraction.
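
As a hedged illustration of the general idea (not Zyte's actual pipeline), here is a sketch in which a hypothetical call_llm helper stands in for whatever reasoning-model client you use; it is mocked here so the example runs end to end.

```python
# A sketch of the general idea, not Zyte's actual implementation.
def call_llm(prompt: str) -> str:
    # A real system would send the prompt to a reasoning model and
    # return its reply; this mock returns a canned answer.
    return '{"name": "Widget", "price": 9.99, "currency": "USD"}'

def extract_product(html: str) -> str:
    prompt = (
        "Reason step by step about the structure of this HTML, then "
        "return only a JSON object with the fields name, price, currency.\n\n"
        + html
    )
    return call_llm(prompt)

html = '<div class="p"><h1>Widget</h1><span class="prc">$9.99</span></div>'
print(extract_product(html))
# -> {"name": "Widget", "price": 9.99, "currency": "USD"}
```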

What's on the horizon?


The field is now moving at breakneck speed. Right now, we are seeing:


  • Open-source reasoning revolution: The success of DeepSeek R1 has inspired others to release their own open-source reasoning models, like Qwen QwQ-32B, fostering competition and collaboration.

  • More deep thinking: Apart from the DeepSeek R1 release, established LLM players such as OpenAI and Google have also injected deeper reasoning capabilities into their offerings.

  • Smaller, more efficient models: Researchers are actively working on distilling the reasoning capabilities of larger models into smaller, more practical ones.

  • Real-world applications: These models are being deployed in a growing number of applications, from solving complex mathematical problems to generating more robust and reliable code.


The age of truly reasoning AI is upon us. It's a world where models don't just predict the next word, but actively engage with problems, explore solutions, and, in a sense, think.


And that’s a good reason to be excited.
