GPT-4o ("o" for omni) is OpenAI's flagship model combining text, vision, and audio understanding in a single end-to-end model, rather than stitching separate specialized models together. Released in May 2024, it responds to audio input in roughly 320 ms on average, comparable to human conversational latency, and matches GPT-4 Turbo on text intelligence benchmarks while being 2× faster and 50% cheaper in the API.
Key capabilities
- Native multimodality — processes and generates text, images, and audio without pipeline handoffs
- 128K context window — fits large codebases, long documents, and extended multi-turn conversations
- Function calling & JSON mode — reliable structured outputs for agentic workflows
- Vision understanding — analyzes charts, diagrams, screenshots, and real-world photos
- Multilingual — improved performance across non-English languages versus prior models
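To make the function-calling capability concrete, here is a minimal sketch of what a Chat Completions request body with a tool definition looks like, and how the model's tool-call arguments are parsed. The `get_weather` tool and its schema are hypothetical examples for illustration, not part of any real API; only the payload shape follows the OpenAI Chat Completions format.

```python
import json

def build_request(user_message: str) -> dict:
    """Build a Chat Completions request body that offers the model one tool."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

def parse_tool_arguments(arguments_json: str) -> dict:
    """The model returns tool arguments as a JSON string; decode before dispatch."""
    return json.loads(arguments_json)

payload = build_request("What's the weather in Lisbon?")
args = parse_tool_arguments('{"city": "Lisbon"}')  # sample model output
print(payload["tools"][0]["function"]["name"], args["city"])
```

In a real application you would POST this payload to the API, check the response for a `tool_calls` entry, run the matching function with the parsed arguments, and send the result back in a follow-up message.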
What it's best for
GPT-4o excels at tasks requiring a blend of reasoning and speed: customer-facing chatbots, real-time voice assistants, code generation, document analysis, and complex instruction following. For extended reasoning chains (math, competitive coding), consider o1 or o3-mini.
Pricing
Available in ChatGPT (the Free tier has limited access; Plus and Pro tiers get higher usage limits) and via the OpenAI API at $2.50 per million input tokens and $10 per million output tokens.
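The per-million-token rates above translate directly into a cost estimate for a given request. A minimal sketch, assuming the listed rates and a hypothetical request size:

```python
# Listed API rates, expressed per token.
INPUT_RATE = 2.50 / 1_000_000    # USD per input token
OUTPUT_RATE = 10.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one API call from its token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a request with 10,000 input tokens and 1,000 output tokens:
cost = estimate_cost(10_000, 1_000)
print(f"${cost:.4f}")  # $0.0350
```

Note that output tokens cost 4× as much as input tokens, so long generations dominate the bill for summarization- or generation-heavy workloads.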