Why it matters
- Salesforce Research's open publication and model release were foundational work for code AI, shaping many subsequent code language models.
- Full open-source access (model weights + code) enables research reproducibility, fine-tuning, and integration without API dependency.
- Range of model sizes (220M to 16B in the CodeT5+ family) allows deployment in resource-constrained environments where large models aren't feasible.
- Encoder-decoder architecture makes CodeT5 well suited for code translation, summarization, and code-to-description tasks, which decoder-only models must handle through prompting alone.
Key capabilities
- Code generation: Generate code from natural language descriptions and docstrings.
- Code completion: Auto-complete partial code with context-aware suggestions.
- Code summarization: Generate natural language descriptions of code functions and classes.
- Code translation: Convert code between programming languages (Python → Java, etc.).
- Bug detection: Identify bugs and suggest fixes in code.
- Multiple model sizes: 220M, 770M, 2B, 6B, and 16B parameters (the CodeT5+ family) to fit different resource budgets.
- Fine-tuning ready: Pre-trained on code; easily fine-tuned on domain-specific code with PEFT/LoRA.
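The generation and summarization capabilities above can be exercised through Hugging Face `transformers`. A minimal sketch, assuming the `Salesforce/codet5p-220m` checkpoint and the standard seq2seq API; note this checkpoint is pretrained rather than instruction-tuned, so raw generations vary in quality, and the prompt and generation settings here are illustrative:

```python
# Minimal sketch: load a small CodeT5+ checkpoint and generate text from a
# code snippet using the Hugging Face seq2seq API.
CHECKPOINT = "Salesforce/codet5p-220m"

def build_input(code: str) -> str:
    # CodeT5+ seq2seq checkpoints consume raw source text; we only normalize
    # surrounding whitespace here.
    return code.strip()

def summarize(code: str, max_new_tokens: int = 32) -> str:
    # Imported lazily so build_input stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
    inputs = tokenizer(build_input(code), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(summarize("def add(a, b):\n    return a + b"))
```

The same loading pattern applies to the larger CodeT5+ checkpoints, which typically need more memory and may require `trust_remote_code=True`.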
Technical notes
- Architecture: Encoder-decoder (T5-based); CodeT5+ adds a modular design that can also operate in encoder-only and decoder-only modes
- License: BSD-3-Clause (CodeT5); Apache 2.0 (CodeT5+)
- GitHub: github.com/salesforce/CodeT5
- Hugging Face: Salesforce/codet5p-(220m, 770m, 2b, 6b, 16b)
- Training data: CodeSearchNet plus additional code mined from GitHub (C/C# via BigQuery for CodeT5; a larger GitHub code corpus for CodeT5+)
- Languages: Python, JavaScript, Java, Go, Ruby, PHP, C/C++, C#, and more
- Creator: Salesforce Research (Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi)
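The PEFT/LoRA fine-tuning path mentioned under Key capabilities can be sketched with the `peft` library. The hyperparameters and target modules below are illustrative assumptions, not values from the CodeT5 papers; in `transformers`, T5-style attention projections are named `q`, `k`, `v`, and `o`:

```python
# Hedged sketch of a LoRA setup for fine-tuning a CodeT5+ checkpoint with peft.
def lora_hyperparams(r: int = 16, alpha: int = 32) -> dict:
    # Illustrative defaults: adapt only the query and value projections,
    # a common starting point for T5-family models.
    return {
        "r": r,                      # LoRA rank
        "lora_alpha": alpha,         # scaling factor
        "lora_dropout": 0.05,
        "target_modules": ["q", "v"],
        "task_type": "SEQ_2_SEQ_LM",
    }

def main():
    # Imported lazily so lora_hyperparams stays usable without peft installed.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5p-220m")
    peft_model = get_peft_model(model, LoraConfig(**lora_hyperparams()))
    peft_model.print_trainable_parameters()

if __name__ == "__main__":
    main()
```

Only the injected adapter weights are trained, which makes domain-specific fine-tuning feasible on a single GPU even for the mid-size checkpoints.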
Ideal for
- Researchers studying code language models who need open-weight models with documented training and architecture.
- Companies building proprietary code AI tools on top of a free, fine-tunable foundation model.
- Teams deploying code AI in environments where commercial model APIs aren't allowed (on-premise, air-gapped).
Not ideal for
- Production code generation where quality is the primary requirement — Code Llama 34B or DeepSeek-Coder typically outperform.
- Real-time code completion in IDEs — low-latency inference requires a GPU and a dedicated serving setup.
- Teams who want a managed API without hosting — use Together AI or Fireworks AI to serve open-source code models.
See also
- Code Llama — Meta's open-source code model family; stronger performance, larger sizes.
- StarCoder2 — code model from the BigCode project (Hugging Face, ServiceNow, NVIDIA); strong multi-language coverage.
- DeepSeek-Coder — Strong open-source coder from DeepSeek; competitive with GPT-3.5 on benchmarks.