Breaking New Thresholds: Even the Finest AI Systems Struggle to Surpass a Cutting-Edge Benchmark

2025-01-24

The Center for AI Safety (CAIS), in partnership with Scale AI, an AI development service provider, has unveiled a formidable new benchmark for advanced AI systems.

Named ‘Humanity’s Last Exam’, the benchmark comprises thousands of crowdsourced questions spanning fields from mathematics to the humanities and natural sciences. The questions come in a variety of formats, including visual ones, raising the bar for AI evaluation.

In preliminary testing, no current flagship AI system scored above 10% on ‘Humanity’s Last Exam’.

CAIS and Scale AI aim to open the benchmark to the global research community so that researchers can explore it in depth and assess emerging AI models.

Original source: Read the full article on TechCrunch
