| GPT-5.2-Codex | 89.9 | GPQA Diamond 89.9 |
| GPT-5.1-Codex | 85.6 | MMLU 86, GPQA Diamond 86, HumanEval 84.9 |
| o3 Pro | 84.5 | GPQA Diamond 84.5 |
| GPT-5.1-Codex-Mini | 82.3 | MMLU 82, GPQA Diamond 81.3, HumanEval 83.6 |
| GPT-4o-mini (2024-07-18) | 82.0 | MMLU 82 |
| GPT-5 Codex | 81.4 | MMLU 86.5, GPQA Diamond 83.7, SWE-Bench Verified 74 |
| o4 Mini High | 77.6 | MMLU 83.2, GPQA Diamond 81.4, SWE-Bench Verified 68.1 |
| o3 Mini High | 76.7 | MMLU 79.1, GPQA Diamond 77, LiveCodeBench 74 |
| GPT-4.1 Mini | 76.2 | MMLU 78.1, GPQA Diamond 66.4, IFEval 84.1 |
| GPT-5.3-Codex | 73.4 | GPQA Diamond 91.5, AA Intelligence Index 54, SWE-Bench Verified 74.8 |
| GPT-3.5 Turbo 16k | 70.0 | MMLU 70 |
| GPT-5.2 Pro | 69.7 | MMLU 87.4, GPQA Diamond 90.3, FrontierMath Tier-4 31.3 |
| GPT-4.1 Nano | 65.2 | MMLU 80.1, GPQA Diamond 50.3 |
| GPT-4 | 63.0 | MMLU 86.4, HumanEval 67, GPQA Diamond 35.7 |
| GPT-4 Turbo Preview | 61.1 | MMLU 86.5, GPQA Diamond 35.7 |
| GPT-5.4 Pro | 58.8 | GPQA Diamond 94.6, FrontierMath Tier-4 37.5, Humanity's Last Exam 44.3 |
| o1 | 58.7 | MMLU 84.1, GPQA Diamond 74.7, LiveCodeBench 67.9, Humanity's Last Exam 8.0 |
| GPT-5.4 | 55.0 | GPQA Diamond 74.8, SWE-Bench Verified 80, AA Intelligence Index 57, FrontierMath Tier-4 27.1, Humanity's Last Exam 36.2 |
| GPT-5 | 53.9 | MMLU 80.6, GPQA Diamond 85, SWE-Bench Verified 74.9, AA Intelligence Index 45, FrontierMath Tier-4 12.5, Humanity's Last Exam 25.3 |
| GPT-5 Chat | 52.4 | MMLU 82, GPQA Diamond 68.6, SWE-Bench Verified 73.5, FrontierMath Tier-4 12.5, Humanity's Last Exam 25.3 |
| o4 Mini | 51.3 | MMLU 83.2, GPQA Diamond 78.4, SWE-Bench Verified 68.1, AIME 2024 92.7, FrontierMath Tier-4 2.1, SciPredict 16.2, Humanity's Last Exam 18.1 |
| GPT-4 Turbo | 50.3 | MMLU 86.5, GPQA Diamond 35.7, HumanEval 67, SWE-Bench Verified 12 |
| o3 Mini | 49.9 | MMLU 79.1, GPQA Diamond 74.8, LiveCodeBench 71.7, FrontierMath Tier-4 4.2, SciPredict 19.8 |
| GPT-5.2 | 49.2 | MMLU 81.4, GPQA Diamond 71.2, AA Intelligence Index 51, SWE-Bench Verified 73.8, FrontierMath Tier-4 18.8, SciPredict 20.6, Humanity's Last Exam 27.8 |
| GPT-4.1 | 49.1 | MMLU 80.6, GPQA Diamond 66.6, SWE-Bench Verified 54.6, IFEval 87.4, FrontierMath Tier-4 0.0, Humanity's Last Exam 5.4 |
| GPT-5 Mini | 49.1 | MMLU 82.8, GPQA Diamond 80.3, AA Intelligence Index 41, SWE-Bench Verified 64.7, FrontierMath Tier-4 6.3, Humanity's Last Exam 19.4 |
| GPT-3.5 Turbo (older v0613) | 49.0 | MMLU 70, GPQA Diamond 28.0 |
| GPT-5.1 | 48.0 | MMLU 80.1, GPQA Diamond 64.3, SWE-Bench Verified 68.0, FrontierMath Tier-4 4.2, Humanity's Last Exam 23.7 |
| GPT-5.2 Chat | 46.5 | SWE-Bench Verified 73.8, FrontierMath Tier-4 18.8, GPQA Diamond 91.4, SciPredict 20.6, Humanity's Last Exam 27.8 |
| o3 | 46.2 | MMLU 85.3, GPQA Diamond 82.7, SWE-Bench Verified 69.1, FrontierMath Tier-4 2.1, SciPredict 17.9, Humanity's Last Exam 20.3 |
| GPT-5.1 Chat | 45.2 | SWE-Bench Verified 68.0, FrontierMath Tier-4 4.2, GPQA Diamond 85.0, Humanity's Last Exam 23.7 |
| GPT-4o (2024-11-20) | 42.6 | MMLU 88.7, SWE-Bench Verified 31.0, GPQA Diamond 47.9, Humanity's Last Exam 2.7 |
| gpt-oss-120b | 37.0 | AA Intelligence Index 37 |
| GPT-4 Turbo (older v1106) | 35.7 | GPQA Diamond 35.7 |
| GPT-5 Nano | 32.8 | MMLU 55.6, GPQA Diamond 42.8, FrontierMath Tier-4 0.0 |
| GPT-5 Pro | 31.6 | Humanity's Last Exam 31.6 |
| GPT-5.4 Mini | 31.3 | GPQA Diamond 60.6, FrontierMath Tier-4 2.1 |
| GPT-5.4 Nano | 31.0 | GPQA Diamond 55.8, FrontierMath Tier-4 6.3 |
| GPT-3.5 Turbo Instruct | 28.0 | GPQA Diamond 28.0 |
| gpt-oss-20b | 22.0 | AA Intelligence Index 22 |
| o1-pro | 8.1 | Humanity's Last Exam 8.1 |