Why it matters
- Eliminates JSON parsing boilerplate: no more json.loads(response.content) wrapped in try/except.
- Automatic retry with validation feedback dramatically improves structured extraction reliability vs. naive JSON prompting.
- Works with every major LLM API (OpenAI, Anthropic, Google, Cohere, Mistral) via provider-specific patches.
- Created by Jason Liu (jxnl), a respected practitioner in the LLM engineering community — built from real production patterns.
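To make the boilerplate claim concrete, here is a sketch of the hand-rolled pattern Instructor replaces (the parse_user helper and its field checks are illustrative, not from the library):

```python
import json

def parse_user(raw: str) -> dict:
    # Manual parsing with defensive error handling -- the kind of
    # boilerplate that response_model + auto-retry makes unnecessary.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # no retry, no feedback to the model -- just give up
    if not isinstance(data.get("age"), int):
        return {}  # hand-written validation, one field at a time
    return data
```

With Instructor, the schema lives in one Pydantic model and parsing, validation, and retry all happen behind the call.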
Key capabilities
- Pydantic integration: Define data models with Pydantic; receive typed, validated Python objects.
- Auto-retry: Validation errors are fed back to the LLM for automatic correction (configurable max_retries).
- Multi-provider: OpenAI, Anthropic Claude, Google Gemini, Cohere, Mistral, Groq, Ollama.
- Streaming: Stream structured outputs token-by-token with partial object construction.
- Nested models: Support for complex nested Pydantic schemas with relationships.
- Validators: Custom Pydantic validators run on extracted data; failures trigger retry.
- Async support: AsyncInstructor for async/await usage with async LLM clients.
- Hooks: Pre/post-processing hooks for logging, caching, and monitoring.
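The validator-driven retry capability can be sketched as a plain Pydantic model (the User model and age_must_be_plausible validator are illustrative; max_retries is Instructor's documented retry knob):

```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, v: int) -> int:
        # When this raises, Instructor feeds the error message back to
        # the LLM and retries the call, up to max_retries times.
        if not 0 <= v <= 130:
            raise ValueError("age must be between 0 and 130")
        return v
```

Passing this model as response_model=User together with max_retries=2 means a hallucinated age of, say, 999 triggers a corrective re-prompt instead of a silent bad value.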
Technical notes
- Install: pip install instructor
- License: MIT (open source)
- GitHub: github.com/jxnl/instructor (9K+ stars)
- Providers: OpenAI, Anthropic, Google, Cohere, Mistral, Groq, Ollama, Bedrock
- Python: 3.9+
- Created by: Jason Liu (jxnl)
Usage example
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
)
# user.name == "John", user.age == 30
```
Ideal for
- Building data extraction pipelines where LLM output must conform to specific schemas (invoices, entities, classifications).
- API developers who want typed responses from LLMs without building custom parsing and validation logic.
- Teams using Pydantic already who want to extend it to LLM responses naturally.
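For the invoice-style extraction pipelines above, the schema side is ordinary nested Pydantic. A sketch (LineItem, Invoice, and the total property are hypothetical names, not part of Instructor):

```python
from pydantic import BaseModel

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    vendor: str
    items: list[LineItem]

    @property
    def total(self) -> float:
        # Derived locally from validated fields, not asked of the LLM.
        return sum(i.quantity * i.unit_price for i in self.items)
```

Passed as response_model=Invoice, the nested items list is populated and validated per-item, so downstream code works with typed objects rather than raw JSON.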
Not ideal for
- Local model users who need guaranteed-valid output: use Outlines, which constrains decoding itself so generations always match the schema.
- Free-form text generation where structure isn't needed.
- Non-Python environments — Instructor is Python-only (JavaScript users can use Zod + Vercel AI SDK).
See also
- Outlines — Constrained generation library; mathematical guarantees for local models.
- Guidance — Microsoft's constrained generation framework; more complex but more powerful.
- Pydantic — The validation library Instructor is built on.