Overview
Gemini 1.5 Flash is Google's speed-optimised multimodal model, designed for high-throughput production workloads where latency and cost matter as much as capability. It retains the full 1,048,576-token (1M) context window of Gemini 1.5 Pro while delivering responses significantly faster and at a fraction of the cost.
Speed and Cost
The numbers speak clearly: at $0.075 per million input tokens and $0.30 per million output tokens, Gemini 1.5 Flash is one of the most affordable large-context models available. For comparison, Gemini 1.5 Pro charges roughly 16× more per input token. When you're processing millions of requests per day, that difference is the gap between a viable product and a cost-prohibitive one.
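At these rates, a back-of-envelope calculation shows why the gap matters. The sketch below (the request volumes and document sizes are illustrative assumptions, not benchmarks) estimates daily spend for a Flash-based pipeline:

```python
# Back-of-envelope cost estimator for Gemini 1.5 Flash.
# Rates are the per-million-token prices quoted above; always check
# current pricing before budgeting.

FLASH_INPUT_PER_M = 0.075   # USD per 1M input tokens
FLASH_OUTPUT_PER_M = 0.30   # USD per 1M output tokens

def flash_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single Gemini 1.5 Flash request."""
    return (input_tokens / 1_000_000) * FLASH_INPUT_PER_M \
         + (output_tokens / 1_000_000) * FLASH_OUTPUT_PER_M

# Hypothetical workload: 10,000 requests/day, each sending a
# 50k-token document and receiving a 500-token answer.
daily = 10_000 * flash_cost(50_000, 500)
```

Under these assumptions the workload costs about $39 per day on Flash; the same input volume at 16× the input rate would run roughly $600 per day in input tokens alone.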
Latency improvements come from a distilled model architecture that sacrifices some accuracy at the margins to deliver substantially lower time-to-first-token and higher throughput — the right trade-off for most production systems.
Free API Tier
Gemini 1.5 Flash is available free of charge through the Gemini API, subject to rate limits on requests per minute and requests per day. This makes it an excellent starting point for:
- Prototyping and early-stage products
- Personal projects and research
- Learning and experimentation without billing setup
On the free tier, Google may use your prompts and responses to improve its models; paid usage is excluded from this. That trade-off is worth weighing before sending sensitive data.
1M Context at Production Cost
The combination of a 1M token window with Flash-level pricing opens up use cases that were previously cost-prohibitive:
- Bulk document classification: Process thousands of long documents per day without per-document retrieval overhead.
- High-volume extraction: Pull structured data from lengthy reports, forms, or transcripts at scale.
- Real-time summarisation pipelines: Summarise long meeting transcripts, articles, or conversations as they arrive.
- Multi-document comparison: Compare large sets of documents without chunking or summarisation artifacts.
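As a concrete sketch of the first item, here is a minimal bulk-classification call using the google-generativeai Python SDK. The label set, prompt wording, and environment variable are illustrative assumptions, not a prescribed pattern:

```python
# Sketch: classify long documents with Gemini 1.5 Flash.
# Assumes `pip install google-generativeai` and an API key in
# the GOOGLE_API_KEY environment variable (both assumptions of
# this sketch, not requirements stated in the text above).

LABELS = ["contract", "invoice", "report", "correspondence"]  # example labels

def build_classification_prompt(document_text: str, labels: list[str]) -> str:
    """Single-shot prompt: the whole document fits in the 1M-token
    window, so no chunking or retrieval step is needed."""
    return (
        "Classify the document below into exactly one of these labels: "
        + ", ".join(labels)
        + ". Reply with the label only.\n\n---\n"
        + document_text
    )

def classify(document_text: str) -> str:
    import os
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        build_classification_prompt(document_text, LABELS))
    return response.text.strip()
```

Because the prompt carries the entire document, per-document retrieval infrastructure drops out of the pipeline; only the prompt-builder and one API call remain.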
Multimodal on a Budget
Like its Pro sibling, Flash supports text, vision, audio, video, code, and function calling — all in the same model. This makes it practical to build multimodal pipelines (e.g., image classification, audio transcription, video description) at production scale without reaching for separate specialist models.
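A single-model multimodal pipeline can be sketched as below, again with the google-generativeai SDK. The prompt-per-media-type mapping is an assumption of this sketch; only the File API upload and the mixed text-plus-media call reflect the capability described above:

```python
# Sketch: one Flash model handling images, audio, and video.
# Assumes `google-generativeai` is installed and configured;
# the task prompts chosen per MIME type are illustrative.
import mimetypes

def describe_media(path: str) -> str:
    """Pick a task prompt from the file's MIME type -- image, audio,
    and video all go to the same model, no specialist services."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    if mime.startswith("image/"):
        return "Classify this image in one sentence."
    if mime.startswith("audio/"):
        return "Transcribe this audio."
    if mime.startswith("video/"):
        return "Describe what happens in this video."
    return "Summarise this file."

def analyse(path: str) -> str:
    import google.generativeai as genai
    media = genai.upload_file(path)            # File API upload
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content([media, describe_media(path)]).text
```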
Best Use Cases
- High-volume classification and extraction: Any task that needs to process many documents quickly and cheaply.
- Production chatbots: Fast, coherent conversational responses with large context for memory.
- Data pipelines: Structured extraction from unstructured documents at scale.
- Multimodal preprocessing: Cheap first-pass analysis before routing complex cases to a stronger model.
- Prototyping with real data: Test ideas using the full 1M context without incurring Pro-level costs.
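The "cheap first-pass before routing" pattern above can be sketched as a small router. The escalation heuristic here (hedged wording or an over-long answer) is a stand-in assumption; real pipelines might use a self-reported confidence score or task-specific checks instead:

```python
# Sketch: route easy cases to Flash, escalate hard ones to Pro.
# Model names match the two tiers discussed in this article; the
# escalation signal itself is an illustrative assumption.

FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"

def pick_model(first_pass_answer: str, max_words: int = 20) -> str:
    """Escalate to Pro when the cheap first pass hedges or rambles --
    a placeholder for whatever escalation signal your pipeline uses."""
    hedged = "not sure" in first_pass_answer.lower()
    too_long = len(first_pass_answer.split()) > max_words
    return PRO if (hedged or too_long) else FLASH
```

The economics follow from the pricing section: if most traffic resolves on the first pass, the stronger model's cost applies only to the minority of cases that need it.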
Access
Available via the Google Gemini API and Google AI Studio. Flash is also available on Google Cloud Vertex AI for enterprise deployments.