
Apple’s New Research Shows That LLM Reasoning Is Completely Broken

A deep dive into Apple research that exposes the flawed thinking process in state-of-the-art Reasoning LLMs

By Dr. Ashish Bamania

Image generated by author using Google ImageFX

Large Reasoning Models (LRMs), often simply called Reasoning LLMs, are becoming quite popular.

These models are specifically trained to take their time and think before they answer, especially when solving tough problems.

Their thinking mechanism comes from generating a long Chain-of-Thought (CoT) and self-verifying it at inference (test) time before giving the final answer.
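
To make that mechanism concrete, here is a minimal sketch of the loop in Python: draft a long chain-of-thought, self-verify it at test time, and only then emit the final answer. The `call_model` function is a placeholder for whatever LLM API you use, not part of any specific framework.

```python
def call_model(prompt: str) -> str:
    # Placeholder: swap in a real LLM call (any chat/completions API).
    return "stub response"

def reason_and_answer(question: str, max_revisions: int = 3) -> str:
    # 1. Draft a long chain-of-thought before answering.
    thoughts = call_model(
        "Think step by step about the problem below. Write out your full "
        f"reasoning, but do not give a final answer yet.\n\n{question}"
    )

    # 2. Self-verify the draft reasoning at test time, revising if needed.
    for _ in range(max_revisions):
        verdict = call_model(
            f"Problem:\n{question}\n\nDraft reasoning:\n{thoughts}\n\n"
            "Check this reasoning for mistakes. Reply 'OK' if it is sound, "
            "otherwise rewrite the reasoning correctly."
        )
        if verdict.strip() == "OK":
            break
        thoughts = verdict  # use the revised chain-of-thought

    # 3. Produce the final answer conditioned on the verified reasoning.
    return call_model(
        f"Problem:\n{question}\n\nVerified reasoning:\n{thoughts}\n\n"
        "Give only the final answer."
    )

print(reason_and_answer("If a train leaves at 3 pm and travels for 2 hours, when does it arrive?"))
```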

The performance of these models on multiple reasoning benchmarks is very impressive, leading many to believe that we might achieve AGI in just the next few years.

This claim is far from the truth, though.

These are not merely my thoughts: a recent research paper from Apple confirms them (and this is not the first time Apple has exposed the exaggerated claims of the big tech giants).

The research shows that the best reasoning LLMs perform no better (or even worse) than their non-reasoning counterparts on low-complexity tasks and that their accuracy completely collapses beyond a certain problem complexity.
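
An evaluation like this amounts to sweeping problem complexity and comparing the accuracy of a reasoning model against a non-reasoning one at each level. The sketch below is a hypothetical illustration of that kind of sweep, not the paper's actual code: `make_task`, `solve_with_reasoning_model`, and `solve_with_standard_model` are stand-ins you would replace with a real task generator and real model calls.

```python
import random

def make_task(complexity: int) -> tuple[str, int]:
    # Toy stand-in task whose difficulty scales with `complexity`:
    # sum a list of `complexity` random single-digit integers.
    numbers = [random.randint(1, 9) for _ in range(complexity)]
    return f"What is the sum of {numbers}?", sum(numbers)

def solve_with_reasoning_model(question: str) -> str:
    return "0"  # placeholder: swap in a real reasoning-LLM call

def solve_with_standard_model(question: str) -> str:
    return "0"  # placeholder: swap in a real non-reasoning-LLM call

def accuracy(solver, complexity: int, trials: int = 20) -> float:
    # Fraction of trials at this complexity level answered exactly right.
    correct = 0
    for _ in range(trials):
        question, answer = make_task(complexity)
        if solver(question).strip() == str(answer):
            correct += 1
    return correct / trials

# Accuracy per complexity level for both model types; the paper's headline
# finding is that the reasoning model's curve collapses past some threshold.
for complexity in (2, 4, 8, 16, 32):
    print(
        complexity,
        accuracy(solve_with_reasoning_model, complexity),
        accuracy(solve_with_standard_model, complexity),
    )
```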
