Overview
DeepSeek R1 arrived in January 2025 and immediately shocked the AI world. The model matched OpenAI's o1 on mathematical reasoning benchmarks — scoring 97.3 on MATH and 90.8 on MMLU — while being released entirely open-weight under the MIT license. For the first time, a model at the frontier of reasoning capability was freely available to download, inspect, modify, and deploy without restriction. The implications were profound: every assumption about the cost and secrecy required to build frontier AI had to be revised.
o1-Level Reasoning, MIT License
Before DeepSeek R1, accessing reasoning-optimised models meant paying premium prices to closed API providers and accepting their terms of service. DeepSeek R1 changed that equation entirely:
- MIT license: No usage restrictions, no commercial fees, no attribution requirements.
- Full weights available: Download and self-host the complete model, including the 671B parameter full version.
- Inspect the training approach: DeepSeek published the full technical report, including training methodology and dataset details.
- Fine-tune and modify: Adapt the model for specific domains without asking anyone's permission.
This was not just a benchmark achievement — it was a demonstration that the reasoning capabilities that OpenAI had been treating as a premium, gated product could be replicated and openly distributed.
Visible Chain-of-Thought Reasoning
Like o1, DeepSeek R1 uses extended chain-of-thought reasoning to tackle hard problems. Unlike o1, the reasoning process is visible to users by default. Before producing a final answer, the model writes out its thinking — exploring approaches, catching mistakes, and verifying conclusions — and this scratchpad is returned as part of the response.
This transparency has practical benefits:
- Debugging model behaviour: When the answer is wrong, you can see exactly where the reasoning went astray.
- Building trust: Users and auditors can verify the logic, not just the conclusion.
- Educational value: The reasoning trace can teach problem-solving approaches.
- Prompt improvement: Understanding how the model thinks helps you write better prompts.
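Self-hosted R1 checkpoints conventionally emit the scratchpad inside `<think>…</think>` tags ahead of the final answer. A minimal sketch of separating the two, assuming that tag format (the helper name and toy response are illustrative, not part of any official SDK):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the chain-of-thought scratchpad from the final answer.

    Assumes the R1-style convention of wrapping the reasoning trace
    in <think>...</think> tags before the answer text.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()          # no visible trace found
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after the closing tag
    return reasoning, answer

# Toy example of an R1-style response:
raw = "<think>2 + 2: add the units digits to get 4.</think>The answer is 4."
trace, answer = split_reasoning(raw)
```

Keeping the trace and the answer as separate fields makes it easy to log the reasoning for debugging and auditing while showing end users only the answer.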
Benchmark Performance
| Benchmark | DeepSeek R1 | OpenAI o1 |
|-----------|-------------|-----------|
| MATH      | 97.3        | 96.4      |
| MMLU      | 90.8        | 91.8      |
| GPQA      | 71.5        | 77.3      |
On the MATH benchmark, DeepSeek R1 actually outperforms o1 — a result that generated significant attention in the research community. GPQA, which measures graduate-level scientific reasoning, remains stronger for o1, but the gap is far smaller than most observers expected.
Available at Multiple Quantisations
The full DeepSeek R1 model has 671 billion parameters — requiring substantial hardware for inference. However, DeepSeek and the community have released multiple quantised versions:
- R1 671B full: Requires ~1.3TB of GPU VRAM — dedicated cluster hardware.
- R1 671B Q4: ~350GB — feasible on a single node of 8× A100/H100 80GB GPUs.
- R1-Distill-Llama-70B: A 70B model distilled from R1's reasoning traces — fits on 2× A100 80GB. Retains much of the reasoning quality.
- R1-Distill-Qwen-32B: 32B distilled version — fits on a single A100 40GB or 2× RTX 4090.
- R1-Distill-Qwen-7B: 7B distilled version — runs on consumer hardware.
The distilled models are particularly significant: they bring reasoning-model capability within reach of teams that cannot afford datacenter-scale hardware.
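The VRAM figures above follow from simple arithmetic: parameter count times bits per parameter, divided by eight. A sketch of that calculation (weights only; KV cache and activations add a further, deployment-dependent margin on top):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the model weights, in gigabytes.

    Ignores KV cache and activation memory, which add extra headroom
    requirements on top of these figures.
    """
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9

# Full R1 at FP16 vs. 4-bit quantisation, and the 70B distill:
full_fp16 = weight_memory_gb(671e9, 16)   # ≈ 1342 GB — the ~1.3TB figure
full_q4 = weight_memory_gb(671e9, 4)      # ≈ 336 GB — the ~350GB figure
distill_70b = weight_memory_gb(70e9, 16)  # ≈ 140 GB — hence 2× A100 80GB
```

The same arithmetic explains why the 7B distill runs on consumer hardware: at 4-bit it needs only ~3.5GB for weights.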
DeepSeek API
Beyond self-hosting, DeepSeek offers a managed API at $0.55 per million input tokens and $2.19 per million output tokens — significantly cheaper than comparable reasoning models from OpenAI. This makes high-quality reasoning accessible for cost-sensitive production deployments.
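At those rates, per-request cost is straightforward to estimate. Note that for a reasoning model, the bill is typically dominated by output tokens, since the chain-of-thought trace is generated token by token. A sketch using the listed prices (the token counts in the example are illustrative assumptions):

```python
INPUT_PRICE = 0.55   # USD per million input tokens (listed rate)
OUTPUT_PRICE = 2.19  # USD per million output tokens (listed rate)

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one API call at the listed per-million-token rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1e6

# A reasoning-heavy call: short prompt, long generated trace plus answer.
cost = request_cost_usd(input_tokens=500, output_tokens=8000)  # ≈ $0.018
```

Even a call that generates 8,000 tokens of reasoning and answer costs under two cents at these rates, which is what makes R1 attractive for cost-sensitive workloads.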
Best Use Cases
- Mathematical problem solving: From algebra to competition mathematics, R1 sets the open-weight standard.
- Complex coding: Algorithm design, debugging difficult code, and code generation requiring deep reasoning.
- Scientific reasoning: Multi-step problems in physics, chemistry, and biology.
- Research assistance: Breaking down and solving problems that require sustained logical effort.
- Fine-tuning for reasoning: The open weights make R1 the starting point for domain-specific reasoning model fine-tuning.
- Cost-sensitive reasoning workloads: R1 via API is substantially cheaper than o1 for comparable tasks.