Overview
Gemini 1.5 Pro is Google's flagship long-context multimodal model, offering a context window of up to 1,048,576 tokens (2^20, roughly one million). This is a transformative capability: you can feed in an entire codebase, a full-length novel, hours of transcribed audio, or a lengthy collection of documents and have the model reason across all of it in a single request.
The 1 Million Token Window
The significance of a 1M token context window cannot be overstated. Where most models force you to chunk documents and manage retrieval separately, Gemini 1.5 Pro can hold everything in working memory at once. Practical implications include:
- Entire codebases: Drop in a large repository and ask architectural questions, trace bugs across files, or generate comprehensive documentation.
- Long documents: Analyse lengthy legal contracts, research papers, or technical manuals without losing context between sections.
- Extended conversations: Maintain coherent, in-depth dialogue sessions that span tens of thousands of words without summarisation artifacts.
- Multi-document synthesis: Feed in dozens of source documents simultaneously for cross-referencing and synthesis tasks.
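Before sending a large corpus, it helps to sanity-check that it actually fits. A rough sketch, assuming an average of about four characters per token for English text and code (a heuristic only; the helper names below are illustrative, and the API's token-counting endpoint gives exact figures):

```python
# Rough estimate of whether a set of documents fits in Gemini 1.5 Pro's
# 1,048,576-token context window. The 4-characters-per-token ratio is an
# approximation, not an official tokenizer.

CONTEXT_WINDOW = 1_048_576
CHARS_PER_TOKEN = 4  # rough average for English prose and code

def estimated_tokens(texts):
    """Crude token estimate for a list of document strings."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def fits_in_context(texts, reserve=8_192):
    """True if the documents likely fit, keeping `reserve` tokens for the reply."""
    return estimated_tokens(texts) <= CONTEXT_WINDOW - reserve

# Stand-in for real repository files:
docs = ["def main():\n    pass\n" * 1000]
print(estimated_tokens(docs), fits_in_context(docs))
```

For production use, prefer an exact count from the API over this heuristic; the estimate is only for quick go/no-go decisions.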
Multimodal Capabilities
Beyond text, Gemini 1.5 Pro processes a wide range of modalities natively:
- Video: Analyse video files up to approximately one hour in length (within the token budget). Ask questions about specific timestamps, summarise content, or extract information from visual sequences.
- Audio: Transcribe, summarise, and reason about audio files including speech, music, and environmental sounds.
- Images and documents: Interpret charts, diagrams, photographs, screenshots, and scanned PDFs with high accuracy.
- Code: Strong code generation, debugging, and review capabilities across dozens of programming languages.
Pricing
Gemini 1.5 Pro uses a tiered pricing model. Prompts up to 128K tokens are charged at $1.25 per million input tokens and $5.00 per million output tokens; prompts exceeding 128K tokens are billed at $2.50 per million input tokens and $10.00 per million output tokens.
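The tier logic is simple to encode. A minimal cost estimator, with rates hardcoded from the figures quoted here and the output rate assumed to follow the same 128K split as input; always verify against Google's current pricing page before budgeting:

```python
# Estimate Gemini 1.5 Pro request cost from the tiered rates quoted above.
# Rates are dollars per million tokens and are assumptions frozen at time
# of writing; check Google's pricing page before relying on them.

TIER_BOUNDARY = 128_000  # prompts above this size use the higher rates

RATES = {
    # (input $/M tokens, output $/M tokens)
    "short": (1.25, 5.00),   # prompts <= 128K tokens
    "long":  (2.50, 10.00),  # prompts  > 128K tokens (output tier assumed)
}

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request; the whole request is billed at one tier."""
    tier = "short" if input_tokens <= TIER_BOUNDARY else "long"
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 500K-token prompt (e.g. a large codebase) with a 2K-token answer:
print(f"${request_cost(500_000, 2_000):.4f}")  # → $1.2700
```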
A free tier is available through Google AI Studio with rate limits, making it accessible for experimentation before committing to production usage.
Best Use Cases
- Document analysis and Q&A: Legal review, financial report analysis, academic research summarisation.
- Long-context coding: Refactoring large codebases, understanding legacy systems, cross-file debugging.
- Video understanding: Content moderation, meeting summarisation, educational content review.
- Multi-step reasoning: Complex tasks that require holding a large body of evidence in context simultaneously.
- Enterprise RAG alternative: For many use cases, the 1M context window can replace a traditional retrieval pipeline entirely.
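In place of a retrieval pipeline, the simplest pattern is to concatenate all sources into one prompt and ask the question once. A sketch of the prompt assembly (the delimiter format and document names are illustrative; the finished prompt would be sent in a single request through whichever Gemini client you use):

```python
# Assemble many source documents into one long-context prompt, rather
# than retrieving top-k chunks. The delimiters are arbitrary; the model
# call itself is left out of this sketch.

def build_prompt(documents: dict[str, str], question: str) -> str:
    """Join named documents with clear boundaries, then append the question."""
    parts = []
    for name, text in documents.items():
        parts.append(
            f"--- BEGIN DOCUMENT: {name} ---\n{text}\n--- END DOCUMENT: {name} ---"
        )
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)

# Hypothetical inputs for illustration:
docs = {
    "contract_a.txt": "Party A shall deliver the goods by June 1.",
    "contract_b.txt": "Party B may terminate with 30 days notice.",
}
prompt = build_prompt(docs, "Which contract covers termination, and on what terms?")
print(len(prompt))
```

Clear per-document boundaries matter at this scale: they let the model attribute answers to specific sources, which a flat concatenation makes harder.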
Access
Gemini 1.5 Pro is available through Google AI Studio (free tier) and the Gemini API for production use. It is also accessible through Google Cloud Vertex AI for enterprise deployments with additional compliance and data-residency controls.
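A minimal call through the Gemini API looks like the following, based on the public google-generativeai Python SDK (the call shape is what that SDK documents; confirm against the current client library, and note the network call is skipped when no API key is set):

```python
# Minimal Gemini API call sketch using the google-generativeai SDK.
# Requires an API key from Google AI Studio in GOOGLE_API_KEY; without
# one, the function returns None instead of calling the network.
import os

def make_request(api_key):
    if not api_key:
        return None  # no key available: skip the network call
    import google.generativeai as genai  # pip install google-generativeai
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content("Summarise the attached documents.").text

print(make_request(os.environ.get("GOOGLE_API_KEY")))
```

The same model name is used on Vertex AI, though the client library and authentication flow differ there.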