Released March 3, 2026

Gemini 3.1 Flash Lite

Google's fastest and most cost-efficient Gemini 3 model

86.9% GPQA Diamond • 363 tokens/sec • 1432 Arena Elo • Beats Gemini 2.5 Flash on every key benchmark

2.5×

Faster First Token

363

Tokens/sec

1M

Token Context

1/8×

Cost vs Pro

Gemini 3.1 Flash Lite Features

Optimized for speed, cost, and high-volume production workloads

2.5× Faster First Token

Gemini 3.1 Flash Lite reaches its first answer token 2.5× faster than Gemini 2.5 Flash, which suits real-time and interactive applications.

Extreme Cost Efficiency

Priced at one-eighth the cost of Gemini 3.1 Pro, making it the most affordable option for high-volume production deployments.

363 Tokens Per Second

Achieves 363 tokens/sec output speed — 45% faster than Gemini 2.5 Flash's 249 tokens/sec — while maintaining similar or better quality.
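
As a rough check on the throughput figures above, the sketch below computes the relative speedup and the time to stream a response of a given length. It assumes a constant decode rate and ignores time to first token; the function names are illustrative and not part of any Gemini SDK.

```python
# Output throughput figures quoted on this page (tokens per second).
FLASH_LITE_TPS = 363.0  # Gemini 3.1 Flash Lite
FLASH_25_TPS = 249.0    # Gemini 2.5 Flash


def relative_speedup(new_tps: float, old_tps: float) -> float:
    """Return the fractional speedup of new_tps over old_tps (0.45 means 45%)."""
    return new_tps / old_tps - 1.0


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Estimate decode time, assuming a constant rate and no first-token delay."""
    return num_tokens / tps


if __name__ == "__main__":
    speedup = relative_speedup(FLASH_LITE_TPS, FLASH_25_TPS)
    print(f"speedup: {speedup:.1%}")
    for name, tps in [("3.1 Flash Lite", FLASH_LITE_TPS), ("2.5 Flash", FLASH_25_TPS)]:
        print(f"{name}: {generation_seconds(1000, tps):.2f} s per 1,000 tokens")
```

Real-world latency also depends on network conditions, prompt length, and thinking level, so treat these numbers as an upper bound on what the raw decode rate can deliver.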

Full Multimodal Input

Supports text, image, video, audio, and PDF inputs with a 1M token context window. Outputs text only.

Dynamic Thinking Levels

Adaptive thinking that matches compute to task complexity — from instant responses to deeper reasoning when needed.

High-Volume Agentic Tasks

Purpose-built for agentic pipelines, simple data extraction, classification, and translation at massive scale.

Gemini 3.1 Flash Lite Performance

Fastest and most cost-efficient in the Gemini 3 series

Speed & Throughput

Time to First Token

2.5× faster than Gemini 2.5 Flash

2.5×

Output Speed

363 tokens/sec vs 249 tokens/sec

363/s

Speed Improvement

45% faster output than Gemini 2.5 Flash

+45%

Latency Profile

Optimized for low-latency applications

Low

Cost Efficiency

Cost vs Gemini 3.1 Pro

One-eighth the price

1/8×

Input Pricing

Extremely low cost per 1M tokens

Low

Scale Suitability

Designed for millions of requests/day

M+/day

Budget Optimization

Best cost-to-quality ratio in Gemini 3 series

Best

Context & Multimodal

Context Window

1,048,576 input tokens

Output Token Limit

65,536 tokens per response

65K

Input Types

Text, Image, Video, Audio, PDF

Batch API

Supported for bulk processing

✓
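
A pre-flight check against the limits above might look like the following sketch. It assumes you already have token counts from a tokenizer or token-counting call (not shown), and it treats the input and output limits as independent, which this page does not confirm. `check_budget` is a hypothetical helper, not an SDK function.

```python
# Limits quoted on this page.
MAX_INPUT_TOKENS = 1_048_576  # context window (input tokens)
MAX_OUTPUT_TOKENS = 65_536    # output tokens per response


def check_budget(input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems with the request; an empty list means it fits.

    Assumption: input and output limits are checked independently.
    """
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        over = input_tokens - MAX_INPUT_TOKENS
        problems.append(f"input exceeds the context window by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds the limit by {over} tokens")
    return problems


if __name__ == "__main__":
    # A large document with a modest answer budget fits.
    print(check_budget(900_000, 8_192))
    # An oversized input and output request reports both problems.
    print(check_budget(1_200_000, 100_000))
```

Running the check before sending a request lets a pipeline split or truncate oversized inputs instead of failing at the API.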

Supported Capabilities

Function Calling

Supported

✓

Search Grounding

Supported

✓

Code Execution

Supported

✓

Structured Outputs

Supported

✓

Gemini 3.1 Flash Lite Benchmark Results

Outperforms Gemini 2.5 Flash on every key benchmark — at lower cost

Benchmark	Score	Description
Arena Elo	1432	Human preference ranking (Arena.ai) — outperforms models in its weight class
Intelligence Index	34	Artificial Analysis Intelligence Index — +12 points over Gemini 2.5 Flash-Lite
GPQA Diamond	86.9%	PhD-level scientific knowledge — beats Gemini 2.5 Flash (82.8%), Claude 4.5 Haiku (73.0%), GPT-5 mini (82.3%)
MMMU-Pro	76.8%	Multimodal understanding & reasoning — beats Claude Opus 4.6, Kimi K2.5, and GPT-5 mini (74.1%)
Video-MMMU	84.8%	Knowledge acquisition from videos — outperforms GPT-5 mini (82.5%) and Grok 4.1 Fast (74.6%)
Humanity's Last Exam	16.0%	Academic reasoning across text & multimodal — comparable to GPT-5 mini (16.7%)
LiveCodeBench	72.0%	Code generation (Jan–May 2025); about 2.1× the score of Gemini 2.5 Flash-Lite (34.3%)
SimpleQA Verified	43.3%	Parametric knowledge accuracy; more than 4× the score of GPT-5 mini (9.5%) and nearly 8× that of Claude 4.5 Haiku (5.5%)
MMMLU (Multilingual)	88.9%	Multilingual Q&A — outperforms GPT-5 mini (84.9%), Claude 4.5 Haiku (83.0%), Grok 4.1 Fast (86.8%)
CharXiv Reasoning	73.2%	Information synthesis from complex charts — beats Gemini 2.5 Flash (63.7%) and Claude 4.5 Haiku (61.7%)
MRCR v2 (128k)	60.1%	Long context performance (8-needle, 128k avg) — outperforms GPT-5 mini (52.5%) and Claude 4.5 Haiku (35.3%)
Output Speed	363 tok/s	45% faster than Gemini 2.5 Flash (249 tok/s) — fastest in its price tier
Input Price	$0.25/1M	Per million input tokens; the same price as GPT-5 mini ($0.25/1M)
Output Price	$1.50/1M	Per million output tokens — significantly cheaper than Claude 4.5 Haiku ($5.00/1M)

Source: Artificial Analysis & Arena.ai Leaderboard
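
To see what the listed prices mean for a high-volume workload, the sketch below estimates per-request and daily cost from the input and output prices in the table. It uses list prices only and assumes no batch discounts, caching, or other billing adjustments; the workload numbers are made up for illustration.

```python
# Prices quoted in the table above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at list price."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


def daily_cost_usd(requests_per_day: int, avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Cost of a steady daily workload with the given average request size."""
    return requests_per_day * request_cost_usd(avg_input_tokens, avg_output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 1M requests/day, 2,000 input and 200 output tokens each.
    daily = daily_cost_usd(1_000_000, 2_000, 200)
    print(f"${daily:,.2f} per day")  # prints "$800.00 per day"
```

Because output tokens cost six times as much as input tokens at these prices, trimming response length usually reduces the bill more than trimming prompts of the same size.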

About Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google DeepMind's fastest and most cost-efficient model in the Gemini 3 series, launched on March 3, 2026. It scores 34 on the Artificial Analysis Intelligence Index — a 12-point jump over its predecessor Gemini 2.5 Flash-Lite. With 86.9% GPQA Diamond, 76.8% MMMU-Pro, 84.8% Video-MMMU, and 1432 Arena Elo, it outperforms models in its weight class and even surpasses previous-generation larger models like Gemini 2.5 Flash across reasoning, multimodal, and coding benchmarks.

Important Notice: Gemini3.us is an independent enthusiast community and developer platform. We are not affiliated with, endorsed by, or officially connected to Google LLC. We provide paid access to Google's official Gemini API services to support our infrastructure and operations.

Get Started with Gemini 3.1 Flash Lite

Experience the fastest and most cost-efficient Gemini 3 model