Apple’s New Research Shows That LLM Reasoning Is Completely Broken
A deep dive into Apple research that exposes the flawed thinking process in state-of-the-art Reasoning LLMs
Large Reasoning Models (LRMs), often simply called Reasoning LLMs, are becoming quite popular.
These models are specifically trained to take their time and think before they answer, especially when solving tough problems.
Their thinking mechanism comes from generating a long Chain-of-Thought (CoT) and self-verifying it at inference (test) time before giving the final answer.
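To make this concrete, here is a minimal, hypothetical sketch of the generate-then-self-verify loop: the model drafts a chain of thought, a checker inspects it, and the model retries on failure. `call_model` and `verify` are illustrative stubs, not any real LLM API.

```python
# Sketch of test-time reasoning: draft a chain of thought,
# self-verify it, and retry until a draft passes the check.

def call_model(prompt: str, attempt: int) -> str:
    # Stub for an LLM call: the first draft contains an arithmetic
    # slip; a later draft gets it right.
    if attempt == 0:
        return "Step 1: 17 * 3 = 41\nAnswer: 41"
    return "Step 1: 17 * 3 = 51\nAnswer: 51"

def verify(chain_of_thought: str) -> bool:
    # Toy self-check: confirm the arithmetic step in the trace.
    return "17 * 3 = 51" in chain_of_thought

def answer_with_reasoning(question: str, max_attempts: int = 3) -> str:
    draft = ""
    for attempt in range(max_attempts):
        draft = call_model(question, attempt)
        if verify(draft):  # self-verification at test time
            break
    return draft.splitlines()[-1]  # final answer line

print(answer_with_reasoning("What is 17 * 3?"))  # → Answer: 51
```

The extra compute spent on drafting and re-checking is exactly what the benchmarks reward, and what Apple's paper stress-tests at higher problem complexity.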
The performance of these models on multiple reasoning benchmarks is very impressive, leading many to believe that we might achieve AGI in just the next few years.
This claim is far from the truth, though.
These are not merely my thoughts: a recent research paper from Apple has confirmed this (and this is not the first time Apple has exposed similar overblown claims from the big tech giants).
The research shows that the best reasoning LLMs perform no better (and sometimes worse) than their non-reasoning counterparts on low-complexity tasks, and that their accuracy completely collapses beyond a certain problem complexity.