Apple’s New Research Shows That LLM Reasoning Is Completely Broken
A deep dive into Apple research that exposes the flawed thinking process in state-of-the-art Reasoning LLMs
Large Reasoning Models (LRMs), often simply called Reasoning LLMs, are becoming quite popular.
These models are specifically trained to take their time and think before they answer, especially when solving tough problems.
Their thinking mechanism comes from generating a long Chain-of-Thought (CoT) and self-verifying it at inference (test) time before giving the final answer.
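To make this concrete, here is a minimal, hypothetical sketch of the generate-then-self-verify loop: the model drafts a chain of thought, a checker inspects it, and the model retries on failure. `call_model` and `verify` are illustrative stubs, not any real LLM API.

```python
# Sketch of test-time reasoning: draft a chain of thought,
# self-verify it, and retry until a draft passes the check.

def call_model(prompt: str, attempt: int) -> str:
    # Stub for an LLM call: the first draft contains an arithmetic
    # slip; a later draft gets it right.
    if attempt == 0:
        return "Step 1: 17 * 3 = 41\nAnswer: 41"
    return "Step 1: 17 * 3 = 51\nAnswer: 51"

def verify(chain_of_thought: str) -> bool:
    # Toy self-check: confirm the arithmetic step in the trace.
    return "17 * 3 = 51" in chain_of_thought

def answer_with_reasoning(question: str, max_attempts: int = 3) -> str:
    draft = ""
    for attempt in range(max_attempts):
        draft = call_model(question, attempt)
        if verify(draft):  # self-verification at test time
            break
    return draft.splitlines()[-1]  # final answer line

print(answer_with_reasoning("What is 17 * 3?"))  # → Answer: 51
```

The extra compute spent on drafting and re-checking is exactly what the benchmarks reward, and what Apple's paper stress-tests at higher problem complexity.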
The performance of these models on multiple reasoning benchmarks is very impressive, leading many to believe that we might achieve AGI in just the next few years.
This claim is far from the truth, though.
These are not merely my thoughts: a recent research paper from Apple has confirmed this (and this is not the first time Apple has exposed similar overblown claims from the big tech giants).
The research shows that the best reasoning LLMs perform no better (and sometimes worse) than their non-reasoning counterparts on low-complexity tasks, and that their accuracy completely collapses beyond a certain problem complexity.