Why it matters
- Single endpoint for all LLM providers eliminates provider lock-in and enables A/B testing across GPT-4, Claude, Gemini, and others.
- Automatic fallbacks prevent production outages when a single provider experiences downtime or rate limits.
- Semantic caching can reduce LLM API costs by 20-40% for repeated or similar queries without any application changes.
- OpenAI SDK compatibility means zero code changes beyond the base URL — drop-in adoption for existing apps.
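The drop-in claim above reduces to one changed host and one extra header. A minimal sketch of the request an OpenAI-compatible client would send through the gateway; the x-portkey-api-key header name comes from the notes in this entry, while the exact /v1/chat/completions path is an assumption.

```python
# Sketch of the single-endpoint idea: the request body is ordinary
# OpenAI chat-completions JSON; only the host and one extra auth header
# differ from a direct provider call.
def build_gateway_request(provider_key: str, portkey_key: str,
                          model: str, prompt: str) -> dict:
    """Return the URL, headers, and body for a gateway-routed chat call."""
    return {
        "url": "https://api.portkey.ai/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {provider_key}",  # provider key, unchanged
            "x-portkey-api-key": portkey_key,           # gateway auth header
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_gateway_request("sk-...", "pk-...", "gpt-4o", "Hello")
```

Because everything except the URL and header matches the provider's native API, existing OpenAI SDK code adopts the gateway without touching request or response handling.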
Key capabilities
- Multi-provider routing: Route to OpenAI, Anthropic, Cohere, Mistral, Azure OpenAI, AWS Bedrock, and 20+ providers.
- Automatic fallbacks: Define fallback chains; switch providers on rate limits or errors without downtime.
- Load balancing: Distribute requests across multiple API keys or providers for higher throughput.
- Semantic caching: Cache LLM responses by embedding similarity; serve cached answers for semantically equivalent queries.
- Request logging: Log every LLM call with input, output, latency, cost, and model metadata.
- Prompt versioning: Version and deploy prompts with rollback capabilities.
- Guardrails: Detect and filter PII, harmful content, and off-topic requests before they reach the LLM.
- Analytics dashboard: Track spend, latency, token usage, and error rates across providers.
- SDK compatibility: Works with OpenAI Python/JS SDK, LangChain, LlamaIndex, and raw HTTP.
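The fallback behavior listed above boils down to trying providers in a configured order and advancing when a call hits a rate limit or transient error. An illustrative sketch, not Portkey's internal code; the provider callables and RateLimitError type here are stand-ins.

```python
# Illustrative fallback chain: call providers in order, moving to the
# next one when a call raises a rate-limit error.
class RateLimitError(Exception):
    pass

def call_with_fallback(providers, prompt):
    """providers: ordered list of (name, callable) pairs."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError as exc:
            errors.append((name, exc))  # record the failure, try the next
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-in providers: the first is rate-limited, so the request
# transparently lands on the second.
def flaky_openai(prompt):
    raise RateLimitError("429 Too Many Requests")

def healthy_claude(prompt):
    return f"claude says: {prompt}"

name, answer = call_with_fallback(
    [("openai", flaky_openai), ("anthropic", healthy_claude)], "Hello")
# name == "anthropic": the caller never sees the first provider's 429
```

The design point is that the chain lives in gateway configuration, so application code issues one call and never handles provider-specific outages itself.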
Technical notes
- Integration: change the base URL to api.portkey.ai and add an x-portkey-api-key header
- SDK support: OpenAI Python, OpenAI JS, LangChain, LlamaIndex, raw REST
- Providers: OpenAI, Anthropic, Cohere, Mistral, Azure, Bedrock, Vertex AI, Groq, 20+
- Caching: Semantic cache (embedding-based) + exact match cache
- Observability: Logs, traces, cost tracking, latency percentiles
- Hosting: Cloud (Portkey-managed); self-hosted (Enterprise)
- Pricing: Free (10K req/mo); Pro ~$49/mo; Enterprise custom
- Founded: 2022; San Francisco; YC W23
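The semantic cache noted above keys on embedding similarity rather than exact string match. A toy sketch of the mechanism under the assumption of a cosine-similarity threshold; the embed() function here is a bag-of-words stand-in, not a real embedding model.

```python
import math

# Toy semantic cache: serve a stored answer when a new query's vector is
# close enough (cosine similarity) to a cached one, skipping the LLM call.
def embed(text: str, dim: int = 64) -> list:
    """Stand-in embedding: hash words into a fixed-size count vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        qv = embed(query)
        for vec, answer in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return answer  # cache hit: semantically close enough
        return None  # cache miss: forward to the provider

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
hit = cache.get("the capital of France is what")  # same words, reordered: hit
miss = cache.get("how do i cook pasta")           # unrelated: miss
```

A production cache would use a learned embedding model and an approximate nearest-neighbor index instead of a linear scan, but the hit/miss decision is the same thresholded similarity test.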
Ideal for
- Teams running LLM apps in production who need reliability (fallbacks), cost control (caching), and visibility (logging).
- Organizations using multiple LLM providers who want a unified routing layer without custom middleware.
- Developers migrating between providers or A/B testing GPT-4 vs. Claude for quality and cost comparisons.
Not ideal for
- Local LLM deployments (Ollama, LocalAI) where the gateway adds unnecessary network hops.
- Simple single-model prototypes where the overhead of a gateway isn't justified.
- Teams who need full evaluation pipelines — Braintrust or LangSmith have stronger eval workflows.
See also
- Helicone — Simpler LLM observability proxy; logging-focused, less routing logic.
- LangSmith — LangChain-native observability with stronger trace visualization for complex chains.
- Braintrust — Enterprise eval platform with experiment management; less gateway, more eval.