Overview
DeepSeek R1 arrived in January 2025 and immediately shocked the AI world. The model matched OpenAI's o1 on mathematical reasoning benchmarks — scoring 97.3 on MATH and 90.8 on MMLU — while being released entirely open-weight under the MIT license. For the first time, a model at the frontier of reasoning capability was freely available to download, inspect, modify, and deploy without restriction. The implications were profound: every assumption about the cost and secrecy required to build frontier AI had to be revised.
o1-Level Reasoning, MIT License
Before DeepSeek R1, accessing reasoning-optimised models meant paying premium prices to closed API providers and accepting their terms of service. DeepSeek R1 changed that equation entirely:
- MIT license: No usage restrictions, no commercial fees, no attribution requirements.
- Full weights available: Download and self-host the complete model, including the 671B parameter full version.
- Inspect the training approach: DeepSeek published the full technical report, including training methodology and dataset details.
- Fine-tune and modify: Adapt the model for specific domains without asking anyone's permission.
This was not just a benchmark achievement — it was a demonstration that the reasoning capabilities that OpenAI had been treating as a premium, gated product could be replicated and openly distributed.
Visible Chain-of-Thought Reasoning
Like o1, DeepSeek R1 uses extended chain-of-thought reasoning to tackle hard problems. Unlike o1, the reasoning process is visible to users by default. Before producing a final answer, the model writes out its thinking — exploring approaches, catching mistakes, and verifying conclusions — and this scratchpad is returned as part of the response.
This transparency has practical benefits:
- Debugging model behaviour: When the answer is wrong, you can see exactly where the reasoning went astray.
- Building trust: Users and auditors can verify the logic, not just the conclusion.
- Educational value: The reasoning trace can teach problem-solving approaches.
- Prompt improvement: Understanding how the model thinks helps you write better prompts.
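Self-hosted R1 checkpoints conventionally emit the scratchpad inside `<think>…</think>` tags ahead of the final answer. A minimal sketch of separating the two, assuming that tag format (the helper name and toy response are illustrative, not part of any official SDK):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the chain-of-thought scratchpad from the final answer.

    Assumes the R1-style convention of wrapping the reasoning trace
    in <think>...</think> tags before the answer text.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()          # no visible trace found
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after the closing tag
    return reasoning, answer

# Toy example of an R1-style response:
raw = "<think>2 + 2: add the units digits to get 4.</think>The answer is 4."
trace, answer = split_reasoning(raw)
```

Keeping the trace and the answer as separate fields makes it easy to log the reasoning for debugging and auditing while showing end users only the answer.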
Benchmark Performance
| Benchmark | DeepSeek R1 | OpenAI o1 |
|-----------|-------------|-----------|
| MATH      | 97.3        | 96.4      |
| MMLU      | 90.8        | 91.8      |
| GPQA      | 71.5        | 77.3      |
On the MATH benchmark, DeepSeek R1 actually outperforms o1 — a result that generated significant attention in the research community. GPQA, which measures graduate-level scientific reasoning, remains stronger for o1, but the gap is far smaller than most observers expected.
Available at Multiple Quantisations
The full DeepSeek R1 model has 671 billion parameters — requiring substantial hardware for inference. However, DeepSeek and the community have released multiple quantised versions:
- R1 671B full: Requires ~1.3TB of GPU VRAM — dedicated cluster hardware.
- R1 671B Q4: ~350GB — feasible on a single node of 8× A100/H100 80GB GPUs.
- R1-Distill-Llama-70B: A 70B model distilled from R1's reasoning traces — fits on 2× A100 80GB. Retains much of the reasoning quality.
- R1-Distill-Qwen-32B: 32B distilled version — fits on a single A100 40GB or 2× RTX 4090.
- R1-Distill-Qwen-7B: 7B distilled version — runs on consumer hardware.
The distilled models are particularly significant: they bring reasoning-model capability within reach of teams that cannot afford datacenter-scale hardware.
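The VRAM figures above follow from simple arithmetic: parameter count times bits per parameter, divided by eight. A sketch of that calculation (weights only; KV cache and activations add a further, deployment-dependent margin on top):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the model weights, in gigabytes.

    Ignores KV cache and activation memory, which add extra headroom
    requirements on top of these figures.
    """
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9

# Full R1 at FP16 vs. 4-bit quantisation, and the 70B distill:
full_fp16 = weight_memory_gb(671e9, 16)   # ≈ 1342 GB — the ~1.3TB figure
full_q4 = weight_memory_gb(671e9, 4)      # ≈ 336 GB — the ~350GB figure
distill_70b = weight_memory_gb(70e9, 16)  # ≈ 140 GB — hence 2× A100 80GB
```

The same arithmetic explains why the 7B distill runs on consumer hardware: at 4-bit it needs only ~3.5GB for weights.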
DeepSeek API
Beyond self-hosting, DeepSeek offers a managed API at $0.55 per million input tokens and $2.19 per million output tokens — significantly cheaper than comparable reasoning models from OpenAI. This makes high-quality reasoning accessible for cost-sensitive production deployments.
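At those rates, per-request cost is straightforward to estimate. Note that for a reasoning model, the bill is typically dominated by output tokens, since the chain-of-thought trace is generated token by token. A sketch using the listed prices (the token counts in the example are illustrative assumptions):

```python
INPUT_PRICE = 0.55   # USD per million input tokens (listed rate)
OUTPUT_PRICE = 2.19  # USD per million output tokens (listed rate)

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one API call at the listed per-million-token rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1e6

# A reasoning-heavy call: short prompt, long generated trace plus answer.
cost = request_cost_usd(input_tokens=500, output_tokens=8000)  # ≈ $0.018
```

Even a call that generates 8,000 tokens of reasoning and answer costs under two cents at these rates, which is what makes R1 attractive for cost-sensitive workloads.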
Best Use Cases
- Mathematical problem solving: From algebra to competition mathematics, R1 sets the open-weight standard.
- Complex coding: Algorithm design, debugging difficult code, and code generation requiring deep reasoning.
- Scientific reasoning: Multi-step problems in physics, chemistry, and biology.
- Research assistance: Breaking down and solving problems that require sustained logical effort.
- Fine-tuning for reasoning: The open weights make R1 the starting point for domain-specific reasoning model fine-tuning.
- Cost-sensitive reasoning workloads: R1 via API is substantially cheaper than o1 for comparable tasks.