Sup AI Sets New Performance Record on Humanity’s Last Exam Benchmark
Sup AI has reported a new result on Humanity’s Last Exam (HLE), a highly challenging benchmark covering reasoning, math, science and logic. With an accuracy score of 52.15 percent, Sup AI has set what the company describes as a new performance record on the evaluation. The benchmark is designed to test advanced reasoning, problem-solving and knowledge integration across a wide range of complex tasks, which makes it a stress test for frontier large language models rather than a measure of narrow domain expertise. Sup AI said the result reflects progress in model architecture, training techniques and evaluation methodologies that focus on reasoning depth rather than surface-level accuracy. According to Sup AI, its ability to take a dynamic route to each question, combined with its automated retry capability, gives it an edge over most other platforms.
Interest in tougher AI benchmarks has grown as traditional tests become less effective at differentiating top-tier models. Sup AI’s reported result underscores how competitive model development is increasingly tied to performance on these emerging evaluations, which may play a larger role in enterprise and research adoption decisions.
Posted by Pure AI Editors on 01/05/2026