Overview
Grok 3 is xAI's frontier reasoning model, released in February 2025 after a rapid scaling push that leveraged xAI's Colossus supercluster — one of the largest AI training clusters ever assembled. The results were immediately apparent: Grok 3 achieved 93.3 on MMLU, 97.7 on MATH, and 84.6 on GPQA at launch, placing it at the very top of public benchmark rankings alongside the best models from OpenAI, Anthropic, and Google.
Colossus: The Training Infrastructure
The story behind Grok 3 begins with infrastructure. xAI built the Colossus supercluster in Memphis, Tennessee — a facility housing 100,000 NVIDIA H100 GPUs that came online in record time (reportedly assembled in 122 days from ground-up construction). This scale of compute enabled training runs that simply weren't possible on smaller clusters, which contributed directly to Grok 3's benchmark-leading performance.
The scale of Colossus demonstrates xAI's ambition and provides the infrastructure foundation for continued model improvement beyond Grok 3.
Benchmark Performance at Launch
Grok 3's benchmark scores at launch were among the highest reported for any publicly available model:
| Benchmark | Score | Context | |-----------|-------|---------| | MMLU | 93.3 | Top-tier broad knowledge | | MATH | 97.7 | Near-perfect mathematical reasoning | | GPQA | 84.6 | Graduate-level science, competitive with best models |
On AIME (American Invitational Mathematics Examination — a competition mathematics benchmark that separates truly strong reasoning models from the rest), Grok 3 posted competitive scores against o1 and DeepSeek R1, demonstrating that the reasoning capability is real and not just benchmark optimisation.
Think Mode: Extended Reasoning
Grok 3 includes a "Think" mode that enables extended chain-of-thought reasoning before producing a final answer. Like o1 and DeepSeek R1, this mode allows the model to:
- Break down complex problems into steps before attempting to solve them.
- Reconsider and backtrack when an approach isn't working.
- Verify intermediate results before proceeding.
- Produce more reliable answers on problems where quick intuition fails.
Think mode is particularly valuable for mathematical proofs, complex coding tasks, multi-step logical reasoning, and scientific problem-solving. Users can toggle between standard and Think mode depending on the task.
Integrated into Grok.com and X
Grok 3 powers the flagship Grok assistant available at Grok.com and within the X platform for Premium subscribers. This gives a large existing user base immediate access to frontier-model capability within a familiar interface. X Premium+ subscribers get access to Think mode for extended reasoning.
Real-Time X Data Access
Like Grok 2, Grok 3 retains access to real-time X/Twitter data, allowing it to answer questions about current events, trending topics, and live information — a capability that static-knowledge models lack regardless of their benchmark scores.
API Access
Available via the xAI API at $3 per million input tokens and $15 per million output tokens. The API is OpenAI-compatible, simplifying integration for developers already working with the OpenAI SDK.
Best Use Cases
- Competitive mathematics and science: Problems at or near competition level where extended reasoning and deep knowledge matter.
- Complex coding: Architecture design, algorithm optimisation, debugging subtle logical errors.
- Research assistance: Graduate-level reasoning across STEM domains.
- Real-time information tasks: Combining frontier intelligence with live X data access.
- Extended reasoning workflows: Multi-step problems where Think mode can explore and verify before committing to an answer.