| GPT-5.2-Codex | 89.9 | GPQA Diamond: 89.9 |
| GPT-5.1-Codex | 85.6 | GPQA Diamond: 86, HumanEval: 84.9, MMLU: 86 |
| o3 Pro | 84.5 | GPQA Diamond: 84.5 |
| GPT-5.1-Codex-Mini | 82.3 | GPQA Diamond: 81.3, HumanEval: 83.6, MMLU: 82 |
| GPT-4o-mini (2024-07-18) | 82.0 | MMLU: 82 |
| GPT-5 Codex | 81.4 | GPQA Diamond: 83.7, MMLU: 86.5, SWE-Bench Verified: 74 |
| o4 Mini High | 77.6 | GPQA Diamond: 81.4, MMLU: 83.2, SWE-Bench Verified: 68.1 |
| o3 Mini High | 76.7 | GPQA Diamond: 77, LiveCodeBench: 74, MMLU: 79.1 |
| GPT-4.1 Mini | 76.2 | GPQA Diamond: 66.4, IFEval: 84.1, MMLU: 78.1 |
| GPT-5.3-Codex | 73.4 | AA Intelligence Index: 54, GPQA Diamond: 91.5, SWE-Bench Verified: 74.8 |
| GPT-3.5 Turbo 16k | 70.0 | MMLU: 70 |
| GPT-5.2 Pro | 69.7 | FrontierMath Tier-4: 31.3, GPQA Diamond: 90.3, MMLU: 87.4 |
| GPT-4.1 Nano | 65.2 | GPQA Diamond: 50.3, MMLU: 80.1 |
| GPT-4 | 63.0 | GPQA Diamond: 35.7, HumanEval: 67, MMLU: 86.4 |
| GPT-4 Turbo Preview | 61.1 | GPQA Diamond: 35.7, MMLU: 86.5 |
| GPT-5.4 Pro | 58.8 | FrontierMath Tier-4: 37.5, GPQA Diamond: 94.6, Humanity's Last Exam: 44.3 |
| o1 | 58.7 | GPQA Diamond: 74.7, Humanity's Last Exam: 8.0, LiveCodeBench: 67.9, MMLU: 84.1 |
| GPT-5.4 | 55.0 | AA Intelligence Index: 57, FrontierMath Tier-4: 27.1, GPQA Diamond: 74.8, Humanity's Last Exam: 36.2, SWE-Bench Verified: 80 |
| GPT-5 | 53.9 | AA Intelligence Index: 45, FrontierMath Tier-4: 12.5, GPQA Diamond: 85, Humanity's Last Exam: 25.3, MMLU: 80.6, SWE-Bench Verified: 74.9 |
| GPT-5 Chat | 52.4 | FrontierMath Tier-4: 12.5, GPQA Diamond: 68.6, Humanity's Last Exam: 25.3, MMLU: 82, SWE-Bench Verified: 73.5 |
| o4 Mini | 51.3 | AIME 2024: 92.7, FrontierMath Tier-4: 2.1, GPQA Diamond: 78.4, Humanity's Last Exam: 18.1, MMLU: 83.2, SciPredict: 16.2, SWE-Bench Verified: 68.1 |
| GPT-4 Turbo | 50.3 | GPQA Diamond: 35.7, HumanEval: 67, MMLU: 86.5, SWE-Bench Verified: 12 |
| o3 Mini | 49.9 | FrontierMath Tier-4: 4.2, GPQA Diamond: 74.8, LiveCodeBench: 71.7, MMLU: 79.1, SciPredict: 19.8 |
| GPT-5.2 | 49.2 | AA Intelligence Index: 51, FrontierMath Tier-4: 18.8, GPQA Diamond: 71.2, Humanity's Last Exam: 27.8, MMLU: 81.4, SciPredict: 20.6, SWE-Bench Verified: 73.8 |
| GPT-4.1 | 49.1 | FrontierMath Tier-4: 0.0%, GPQA Diamond: 66.6, Humanity's Last Exam: 5.4, IFEval: 87.4, MMLU: 80.6, SWE-Bench Verified: 54.6 |
| GPT-5 Mini | 49.1 | AA Intelligence Index: 41, FrontierMath Tier-4: 6.3, GPQA Diamond: 80.3, Humanity's Last Exam: 19.4, MMLU: 82.8, SWE-Bench Verified: 64.7 |
| GPT-3.5 Turbo (older v0613) | 49.0 | GPQA Diamond: 28.0, MMLU: 70 |
| GPT-5.1 | 48.0 | FrontierMath Tier-4: 4.2, GPQA Diamond: 64.3, Humanity's Last Exam: 23.7, MMLU: 80.1, SWE-Bench Verified: 68.0 |
| GPT-5.2 Chat | 46.5 | FrontierMath Tier-4: 18.8, GPQA Diamond: 91.4, Humanity's Last Exam: 27.8, SciPredict: 20.6, SWE-Bench Verified: 73.8 |
| o3 | 46.2 | FrontierMath Tier-4: 2.1, GPQA Diamond: 82.7, Humanity's Last Exam: 20.3, MMLU: 85.3, SciPredict: 17.9, SWE-Bench Verified: 69.1 |
| GPT-5.1 Chat | 45.2 | FrontierMath Tier-4: 4.2, GPQA Diamond: 85.0, Humanity's Last Exam: 23.7, SWE-Bench Verified: 68.0 |
| GPT-4o (2024-11-20) | 42.6 | GPQA Diamond: 47.9, Humanity's Last Exam: 2.7, MMLU: 88.7, SWE-Bench Verified: 31.0 |
| gpt-oss-120b | 37.0 | AA Intelligence Index: 37 |
| GPT-4 Turbo (older v1106) | 35.7 | GPQA Diamond: 35.7 |
| GPT-5 Nano | 32.8 | FrontierMath Tier-4: 0.0%, GPQA Diamond: 42.8, MMLU: 55.6 |
| GPT-5 Pro | 31.6 | Humanity's Last Exam: 31.6 |
| GPT-5.4 Mini | 31.3 | FrontierMath Tier-4: 2.1, GPQA Diamond: 60.6 |
| GPT-5.4 Nano | 31.0 | FrontierMath Tier-4: 6.3, GPQA Diamond: 55.8 |
| GPT-3.5 Turbo Instruct | 28.0 | GPQA Diamond: 28.0 |
| gpt-oss-20b | 22.0 | AA Intelligence Index: 22 |
| o1-pro | 8.1 | Humanity's Last Exam: 8.1 |