Google's Gemini 3.1 Pro recently surpassed 50% accuracy on Humanity's Last Exam—a rigorous benchmark of 2,500 expert-vetted questions spanning mathematics, sciences, and humanities—according to the Stanford 2026 AI Index released April 15, fueling trader optimism for further gains by June 30. This milestone, up dramatically from 8.8% in 2025, underscores Google's scaling prowess in large language model reasoning, positioning it neck-and-neck with Anthropic's Claude Opus 4.6 (also over 50%, up to 56.8% reported) and OpenAI's GPT-5.4. While official leaderboards like Scale AI's lag at ~45-51% for Gemini variants, competitive pressures and potential announcements at Google I/O in May could drive scores toward 55-60% thresholds, though benchmark saturation risks and tool-free evaluation criteria add uncertainty to market-implied odds.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in the resolution of this market. · Updated
Google Gemini score on Humanity's Last Exam by June 30?
$305,966 Vol.

Market odds (threshold: implied probability):
- 50%+: 40%
- 55%+: 20%
- 60%+: 10%
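The listed prices are cumulative: each "X%+" contract prices the chance the score reaches at least X. Under the simplifying assumption that these prices can be read directly as probabilities (ignoring fees and spread), the implied probability of each disjoint score bracket is the difference between adjacent thresholds. A minimal sketch:

```python
# Implied bracket probabilities from the cumulative market odds above.
# Assumption: each "X%+" price is the market-implied probability that the
# leaderboard score is at least X by June 30 (ignores fees and spread).
cumulative = {50: 0.40, 55: 0.20, 60: 0.10}  # odds as listed on the market

thresholds = sorted(cumulative)
brackets = {}
# Probability of landing in [lo, hi) is P(>= lo) - P(>= hi).
for lo, hi in zip(thresholds, thresholds[1:]):
    brackets[f"{lo}-{hi}%"] = cumulative[lo] - cumulative[hi]
# Top bracket is open-ended; the remainder falls below the lowest threshold.
brackets[f"{thresholds[-1]}%+"] = cumulative[thresholds[-1]]
brackets["below 50%"] = 1 - cumulative[thresholds[0]]

for name, p in brackets.items():
    print(f"{name}: {p:.0%}")
```

So the market assigns roughly 20% to a 50-55% score, 10% to 55-60%, 10% to 60% or above, and 60% to staying below 50%.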
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 29, 2026, 12:50 PM ET
Resolver
0x65070BE91...
Be careful with external links.
Frequently asked questions