Why it matters
- Developed by LangChain — the deepest observability integration for the LangChain ecosystem, with zero-config tracing for LangChain apps.
- Full-stack trace visibility: see the exact prompts, token counts, latency, and cost incurred at every step of a chain or agent run.
- Online evaluation lets you attach custom evaluators that automatically grade outputs on quality, accuracy, or custom criteria.
- Dataset management turns production traces into test suites — continuously test prompt changes against real production examples.
Key capabilities
- Full trace capture: Record every LLM call, chain step, tool use, and retrieval in a hierarchical trace tree.
- Latency and cost tracking: See token counts, latency, and model cost for every call — identify bottlenecks and expensive steps.
- Prompt versioning: Save and compare prompt templates; test different versions against the same dataset.
- Dataset curation: Create test datasets from production traces; annotate examples as ground truth.
- Automated evaluation: Define evaluators (LLM-based or custom Python) that grade output quality automatically.
- Human annotation: UI for human labelers to rate and annotate model outputs for feedback datasets.
- Regression testing: Run evaluation suites in CI/CD to catch prompt regressions before deployment.
- LangChain integration: Automatic tracing with one environment variable (LANGCHAIN_TRACING_V2=true).
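For a LangChain app, the zero-config tracing described above amounts to setting a couple of environment variables before any chains run. A minimal sketch (the API key and project name below are placeholders, not real values):

```python
import os

# Zero-config LangSmith tracing for a LangChain app: set these before
# constructing any chains. The key is a placeholder, not a real credential.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "lsv2_pt_..."   # placeholder key
os.environ["LANGCHAIN_PROJECT"] = "my-app"        # optional: groups traces by project

# From here on, every chain/agent run is traced to LangSmith automatically;
# no decorators or callbacks are needed in application code.
```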
Technical notes
- SDK: Python and JavaScript/TypeScript (pip install langsmith); @traceable decorator and Client API
- LangChain integration: Automatic with LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY
- Data storage: Hosted on LangChain's cloud (US); no self-hosted option (use LangFuse for that)
- Pricing: Free (5,000 traces/mo); Plus $39/mo; Teams $299/mo; Enterprise custom
- API: REST API for programmatic access to traces, datasets, and evaluations
- Founded: 2023 (LangChain Inc., spun out from Harrison Chase's work); San Francisco
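Outside of LangChain itself, the @traceable decorator from the Python SDK is the usual entry point for tracing arbitrary functions. A minimal sketch with a no-op fallback so it runs even where langsmith is not installed; the function body is illustrative, standing in for a real LLM call:

```python
# Sketch of the @traceable pattern from the langsmith Python SDK.
# In a real setup, `pip install langsmith` plus LANGCHAIN_API_KEY are required;
# here a no-op stand-in keeps the example self-contained.
try:
    from langsmith import traceable
except ImportError:
    def traceable(func=None, **kwargs):
        # No-op stand-in: behaves like the decorator but records nothing.
        if func is not None:
            return func
        return lambda f: f

@traceable(run_type="chain")
def summarize(text: str) -> str:
    # Placeholder for an LLM call; the decorated call appears as a
    # run in the LangSmith trace tree when tracing is enabled.
    return text[:40]

print(summarize("A short document to summarize."))
```

Each decorated function shows up as a node in the hierarchical trace tree, nesting under whatever traced function called it.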
Ideal for
- Teams using LangChain who want native observability with zero additional configuration.
- Production AI apps where debugging complex multi-step agent failures is a regular occurrence.
- ML teams building systematic evaluation pipelines with automated regression detection.
Not ideal for
- Organizations with strict data residency requirements — LangSmith is cloud-only (use LangFuse for self-hosted).
- Teams primarily using non-Python LLM stacks where LangSmith's SDK adds friction.
- Budget-conscious teams — LangFuse open-source offers comparable features for free.
See also
- LangFuse — Open-source LLM observability with self-hosted deployment option.
- Weights & Biases — ML experiment tracking platform with W&B Weave for LLM tracing.
- Helicone — Lightweight LLM proxy with observability (simpler than LangSmith).