NIXsolutions: OpenAI Leads in Humanity’s Last Exam

Less than two weeks ago, AI experts introduced Humanity’s Last Exam, a highly challenging test designed to evaluate advanced neural networks. Two OpenAI products, o3-mini and Deep Research, have emerged as the top performers.


Developed by experts worldwide, the benchmark includes exceptionally difficult questions on knowledge and reasoning, some of which even humans struggle to understand, let alone answer. Initially, the reasoning model DeepSeek R1 led with 9.4% correct answers. It was soon surpassed by OpenAI’s o3-mini, which scored 10.5%, and by o3-mini-high, which reached 13%; the latter is more capable but slower. Yet the most striking result came from OpenAI’s Deep Research AI agent, which scored 26.6%, more than double o3-mini-high’s result, all within less than ten days.

The Role of Information Search in AI Performance

As NIXsolutions notes, the comparison is not entirely fair: Deep Research can search for information, while standalone AI models cannot. This capability is particularly important for Humanity’s Last Exam, where some questions test knowledge rather than reasoning alone. Still, AI systems are improving rapidly, which raises the question of how soon one will achieve a perfect score.

OpenAI Deep Research is designed to act as a personal analyst: it can conduct research, compile reports, and produce answers that would take a human hours of work. As AI continues to advance, we’ll keep you updated on new developments in these cutting-edge systems.