OpenAI's GPT-5.4, released in early March 2026, currently leads the Humanity's Last Exam leaderboard with a score of 41.6% without tools, up 7.1 percentage points from GPT-5.2's 34.5% two months earlier. The benchmark spans 2,500 expert-level questions in math, the sciences, and the humanities, and was developed by the Center for AI Safety and Scale AI to counter saturation on easier evals. Competition at the frontier is intense: Google's Gemini 3 Pro scores 37.5% and Anthropic's Claude Opus 4.6 scores 34.4%. Trader sentiment hinges on whether OpenAI ships GPT-5.5 or an equivalent before June 30, with upcoming developer previews or funding announcements as key catalysts. Tool-assisted scores reach as high as 58%, which shows how much results depend on the evaluation setup and leaves open that raw, no-tools capability may scale more slowly.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in how this market resolves.
50%+
54%
$14,919 volume
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Be careful with external links.
Frequently Asked Questions