Why it matters
- The simplest way to run a local LLM — download an app, pick a model, and start chatting in under 10 minutes.
- Apple Silicon support via Metal acceleration makes M1/M2/M3 Macs genuinely fast for 7B–13B models.
- Built-in OpenAI-compatible local server lets Continue.dev, Cline, and any other OpenAI-SDK tool use your local model — zero code changes needed.
- Privacy-first: all inference runs on your machine, nothing sent to any external server.
Key capabilities
- Model browser: Search and download GGUF models from HuggingFace, filtered by size and compatibility.
- Chat UI: Multi-turn chat with system prompt configuration, temperature, and context window controls.
- Local inference: GPU-accelerated inference via CUDA (NVIDIA), Metal (Apple Silicon), or CPU.
- OpenAI-compatible API server: Built-in server at localhost:1234; a drop-in replacement for OpenAI API calls.
- Multi-model management: Download and switch between multiple models; manage storage.
- Context window configuration: Adjust context size to balance memory vs. capability.
- GGUF quantization support: Q4_K_M, Q5_K_M, Q8_0 — balance between quality and memory usage.
- Preset system prompts: Save and reuse custom system prompts for different use cases.
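Because the built-in server speaks the OpenAI chat-completions protocol, you can exercise it with nothing but the standard library. A minimal sketch, assuming the server is running on the default port; the `"local-model"` name is a placeholder, since LM Studio serves whichever model is currently loaded:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

# Standard OpenAI-style chat-completions payload; "local-model" is a
# placeholder name, as LM Studio answers for whatever model is loaded.
payload = {
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The call only succeeds when LM Studio's local server is running,
# so guard it rather than assume a live endpoint.
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.loads(resp.read())["choices"][0]["message"]["content"]
        print(reply)
except OSError:
    print("LM Studio server not reachable at", BASE_URL)
```

The same request works unchanged against the real OpenAI API, which is the point of the compatibility layer: swap the base URL and nothing else.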
Technical notes
- Supported hardware: Apple Silicon (M1/M2/M3 via Metal), NVIDIA CUDA, AMD ROCm (experimental), CPU
- Platforms: macOS (10.14+), Windows (10/11), Linux
- Model format: GGUF (via llama.cpp); automatic download from HuggingFace Hub
- API: OpenAI-compatible REST API at http://localhost:1234/v1
- Pricing: Free for personal use; commercial license required for enterprise deployment
- Use with other tools: Works with Continue.dev, Cline (via custom base URL), Jan, and any OpenAI SDK
- Maintained by: LM Studio team; regularly updated with new model support
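Pointing a tool such as Continue.dev at the local server comes down to a custom base URL. One plausible shape for a Continue `config.json` entry is sketched below; the exact field names follow Continue's documented model config, but its schema evolves, so treat this as an illustration and check the current docs:

```json
{
  "models": [
    {
      "title": "LM Studio (local)",
      "provider": "lmstudio",
      "model": "local-model",
      "apiBase": "http://localhost:1234/v1"
    }
  ]
}
```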
Ideal for
- Developers and researchers who want a private, offline ChatGPT experience with frontier-quality open models.
- Teams building applications who want a local development LLM without API costs or internet dependency.
- Power users on Apple Silicon who want maximum local LLM performance in a polished desktop app.
Not ideal for
- Very large model deployments (70B+ without quantization), which require 64GB+ RAM and powerful GPUs.
- Server or headless deployments — LM Studio is a GUI app; use Ollama or vLLM for server/API serving.
- Teams needing enterprise support, SLAs, or auditing capabilities.
See also
- Ollama — Command-line tool for running local LLMs, simpler for scripting.
- Text Generation WebUI — More feature-rich local LLM UI with extension ecosystem.
- Open WebUI — Self-hosted ChatGPT-like UI designed to connect to Ollama backends.