Why it matters
- The $1.3B Databricks acquisition validated MosaicML as the leading independent ML training platform; its technology now powers Databricks' enterprise ML offerings.
- The open-source Composer framework's training efficiency techniques (FlashAttention, tensor parallelism, optimized data loading) established best practices for large-scale LLM training.
- The MPT models demonstrated that efficient training techniques could produce competitive models at a fraction of the compute cost of models like GPT-3.
- MosaicML's work on streaming datasets (the mosaicml-streaming library) enables training on datasets larger than RAM, a foundation for very large-scale training (see the sketch below).
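As a concrete illustration of the streaming approach, here is a minimal sketch assuming `pip install mosaicml-streaming`; the toy data, `/tmp` paths, and `pkl` column schema are placeholders, not a recommended production layout:

```python
import numpy as np
from torch.utils.data import DataLoader
from streaming import MDSWriter, StreamingDataset

# Write a toy dataset as MDS shards (the format mosaicml-streaming reads).
columns = {"x": "pkl", "y": "int"}
with MDSWriter(out="/tmp/mds-toy", columns=columns) as writer:
    for i in range(1000):
        writer.write({"x": np.random.randn(16).astype(np.float32), "y": i % 4})

# Stream the shards back through a standard PyTorch DataLoader. In production,
# a `remote="s3://..."` argument points at object storage and shards are fetched
# into the `local` cache on demand, so RAM never holds the full dataset.
dataset = StreamingDataset(local="/tmp/mds-toy", shuffle=True, batch_size=8)
loader = DataLoader(dataset, batch_size=8)
print(next(iter(loader))["y"])
```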
Key capabilities (Databricks integration)
- Managed LLM training: Fine-tune Llama, Mistral, and other models on Databricks with MosaicML's infrastructure.
- Composer framework: Open-source library of training efficiency techniques, including FlashAttention integration, gradient checkpointing, and data loading optimizations (see the Trainer sketch after this list).
- Foundation Model Training: Train models from scratch on proprietary data (enterprise, data-sensitive use cases).
- Model Serving: Deploy fine-tuned models as production APIs on Databricks infrastructure.
- Streaming datasets: The mosaicml-streaming library for efficient large-dataset training without loading the full dataset into memory.
- Multi-GPU training: Efficient sharded training across hundreds of A100/H100 GPUs.
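To give a feel for the Composer API, here is a minimal training sketch. The toy model and synthetic data are stand-ins, and the two algorithms shown (GradientClipping, LabelSmoothing) are merely examples of Composer's built-in techniques, not a tuned recipe:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.algorithms import GradientClipping, LabelSmoothing
from composer.models import ComposerClassifier

# Toy 4-class classifier and synthetic data stand in for a real model/dataset.
net = torch.nn.Linear(16, 4)
model = ComposerClassifier(net, num_classes=4)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
loader = DataLoader(data, batch_size=8)

trainer = Trainer(
    model=model,
    train_dataloader=loader,
    max_duration="2ep",  # Composer duration strings: epochs ("ep"), batches ("ba"), tokens ("tok")
    algorithms=[
        GradientClipping(clipping_type="norm", clipping_threshold=1.0),
        LabelSmoothing(smoothing=0.1),
    ],
    device="cpu",  # "gpu" on real hardware; multi-GPU runs use the same API
)
trainer.fit()
```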
Technical notes
- Status: Acquired by Databricks (June 2023, $1.3B); now part of the Databricks ML platform
- Open source: github.com/mosaicml/composer (Composer framework, Apache 2.0)
- MPT models: Available on Hugging Face (mosaicml/mpt-7b, mosaicml/mpt-30b); a loading example follows these notes
- Current home: databricks.com/product/machine-learning
- Access: Via Databricks platform; enterprise pricing
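For reference, the MPT checkpoints load through standard Hugging Face transformers. MPT ships custom modeling code on the Hub, so `trust_remote_code=True` is required; the prompt below is arbitrary:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

inputs = tokenizer("Efficient LLM training means", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```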
Ideal for
- Enterprises already using Databricks who need LLM training and fine-tuning capabilities integrated with their data platform.
- Research teams who want to use Composer's training efficiency techniques in custom training pipelines.
- Organizations training models from scratch on proprietary data where data privacy prevents using external APIs.
Not ideal for
- Teams without Databricks (the acquisition made MosaicML's managed service harder to access independently).
- Small teams who need simple LoRA fine-tuning; Predibase or Unsloth are simpler and cheaper.
- Real-time inference serving; Databricks' serving has higher latency than specialized inference providers.