OpenAI's GPT-5.4, released in early March 2026, currently leads the Humanity's Last Exam leaderboard with a score of 41.6% without tools. That is up sharply from GPT-5.2's 34.5% just two months earlier, suggesting accelerating reasoning gains. The benchmark comprises 2,500 expert-level questions spanning mathematics, the sciences, and the humanities, and was developed by the Center for AI Safety and Scale AI to counter saturation on easier evaluations. The result highlights frontier large language model progress amid intense competition from Google's Gemini 3 Pro (37.5%) and Anthropic's Claude Opus 4.6 (34.4%). Trader sentiment hinges on whether OpenAI ships GPT-5.5 or an equivalent model before June 30, with developer previews or funding announcements as likely catalysts. Meanwhile, tool-assisted scores (up to 58%) underscore how much results depend on evaluation setup and hint that raw, tool-free capability gains could lag.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in how this market resolves. · Updated
50%+
54%
$14,919 Vol.
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market Opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently Asked Questions