Google's Gemini 3.1 Pro Preview model currently tops the Humanity's Last Exam leaderboard at 44.7%, ahead of OpenAI's GPT-5.4 (41.6%). The benchmark comprises 2,500 questions testing PhD-level reasoning across mathematics, the sciences, and the humanities; top scores remain far below human expert performance near 90% but are a leap from the sub-30% scores of late 2025. February's Gemini 3 Deep Think release briefly hit 48.4% in early tests, driving sentiment amid fierce competition from Anthropic's Claude series and xAI's Grok. Recent April updates such as the Gemini 3.1 Flash enhancements signal ongoing iteration, and Google I/O in May is poised for major announcements that could push scores toward the 50% threshold. Traders watch the official Scale AI or Artificial Analysis leaderboards for resolution by June 30.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in how this market resolves. · Updated
$305,941 volume
50%+: 40%
55%+: 20%
60%+: 10%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Frequently Asked Questions