Gemini 3.1 Flash Lite
Google's fastest and most cost-efficient Gemini 3 model
86.9% GPQA Diamond • 363 tokens/sec • 1432 Arena Elo • Beats Gemini 2.5 Flash on every benchmark
Gemini 3.1 Flash Lite Features
Optimized for speed, cost, and high-volume production workloads
2.5× Faster First Token
Gemini 3.1 Flash Lite reaches its first answer token 2.5× faster than Gemini 2.5 Flash, which suits real-time and interactive applications.
Extreme Cost Efficiency
Priced at one-eighth the cost of Gemini 3.1 Pro, making it the most affordable option for high-volume production deployments.
363 Tokens Per Second
Achieves 363 tokens/sec output speed — 45% faster than Gemini 2.5 Flash's 249 tokens/sec — while maintaining similar or better quality.
Full Multimodal Input
Supports text, image, video, audio, and PDF inputs with a 1M token context window. Outputs text only.
Dynamic Thinking Levels
Adaptive thinking that matches compute to task complexity — from instant responses to deeper reasoning when needed.
High-Volume Agentic Tasks
Purpose-built for agentic pipelines, simple data extraction, classification, and translation at massive scale.
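To put the output-speed figures quoted above (363 tokens/sec for Gemini 3.1 Flash Lite, 249 tokens/sec for Gemini 2.5 Flash) in practical terms, here is a rough sketch that converts them into streaming time per response. It assumes a constant decode rate and ignores time to first token, network latency, and rate limits. The function and constant names are illustrative and not part of any Google SDK.

```python
# Output speeds quoted on this page, in tokens per second.
FLASH_LITE_TPS = 363.0  # Gemini 3.1 Flash Lite
FLASH_25_TPS = 249.0    # Gemini 2.5 Flash


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `output_tokens` at a constant decode rate.

    Assumption: ignores time to first token, network latency, and rate limits.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def relative_speedup(new_tps: float, old_tps: float) -> float:
    """Fractional throughput gain of `new_tps` over `old_tps` (0.5 = 50% faster)."""
    return (new_tps - old_tps) / old_tps


if __name__ == "__main__":
    # Compare a short, a medium, and a long response.
    for n in (200, 1_000, 8_000):
        lite = generation_seconds(n, FLASH_LITE_TPS)
        flash = generation_seconds(n, FLASH_25_TPS)
        print(f"{n:>5} tokens: {lite:6.2f}s vs {flash:6.2f}s")
    gain = relative_speedup(FLASH_LITE_TPS, FLASH_25_TPS)
    print(f"throughput gain: {gain:.1%}")
```

Under these assumptions, a 1,000-token reply streams in about 2.8 seconds instead of about 4.0 seconds. Real latency also depends on the thinking level selected and on prompt size.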
Gemini 3.1 Flash Lite Performance
Fastest and most cost-efficient in the Gemini 3 series
Panels: Speed & Throughput • Cost Efficiency • Context & Multimodal • Supported Capabilities
Gemini 3.1 Flash Lite Benchmark Results
Outperforms Gemini 2.5 Flash on every key benchmark — at lower cost
| Benchmark | Score | Description |
|---|---|---|
| Arena Elo | 1432 | Human preference ranking (Arena.ai) — outperforms models in its weight class |
| Intelligence Index | 34 | Artificial Analysis Intelligence Index — +12 points over Gemini 2.5 Flash-Lite |
| GPQA Diamond | 86.9% | PhD-level scientific knowledge — beats Gemini 2.5 Flash (82.8%), Claude 4.5 Haiku (73.0%), GPT-5 mini (82.3%) |
| MMMU-Pro | 76.8% | Multimodal understanding & reasoning — beats Claude Opus 4.6, Kimi K2.5, and GPT-5 mini (74.1%) |
| Video-MMMU | 84.8% | Knowledge acquisition from videos — outperforms GPT-5 mini (82.5%) and Grok 4.1 Fast (74.6%) |
| Humanity's Last Exam | 16.0% | Academic reasoning across text & multimodal — comparable to GPT-5 mini (16.7%) |
| LiveCodeBench | 72.0% | Code generation (Jan–May 2025): about 2.1× the score of Gemini 2.5 Flash-Lite (34.3%) |
| SimpleQA Verified | 43.3% | Parametric knowledge accuracy: about 4.6× the score of GPT-5 mini (9.5%) and 7.9× that of Claude 4.5 Haiku (5.5%) |
| MMMLU (Multilingual) | 88.9% | Multilingual Q&A — outperforms GPT-5 mini (84.9%), Claude 4.5 Haiku (83.0%), Grok 4.1 Fast (86.8%) |
| CharXiv Reasoning | 73.2% | Information synthesis from complex charts — beats Gemini 2.5 Flash (63.7%) and Claude 4.5 Haiku (61.7%) |
| MRCR v2 (128k) | 60.1% | Long context performance (8-needle, 128k avg) — outperforms GPT-5 mini (52.5%) and Claude 4.5 Haiku (35.3%) |
| Output Speed | 363 tok/s | 45% faster than Gemini 2.5 Flash (249 tok/s) — fastest in its price tier |
| Input Price | $0.25/1M | Per million input tokens: the same price as GPT-5 mini ($0.25/1M) |
| Output Price | $1.50/1M | Per million output tokens: 70% cheaper than Claude 4.5 Haiku ($5.00/1M) |
Source: Artificial Analysis & Arena.ai Leaderboard
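The list prices in the table ($0.25 per million input tokens, $1.50 per million output tokens) translate into workload cost with simple arithmetic. The sketch below is a back-of-the-envelope estimator, not a billing tool. It assumes flat per-token pricing and ignores anything the table does not state, such as caching discounts, separate rates for non-text inputs, how thinking tokens are billed, or any reseller markup. The names are hypothetical.

```python
# List prices from the table above, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token volumes.

    Assumption: flat per-token pricing with no discounts or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical classification job: one million requests, each with
    # 500 input tokens and 20 output tokens.
    requests = 1_000_000
    total = estimate_cost_usd(requests * 500, requests * 20)
    print(f"Estimated cost: ${total:,.2f}")
```

For the hypothetical job in the example, input accounts for $125 and output for $30, or $155 in total. For short-output tasks such as classification and extraction, the input price dominates the bill.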
About Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is Google DeepMind's fastest and most cost-efficient model in the Gemini 3 series, launched on March 3, 2026. It scores 34 on the Artificial Analysis Intelligence Index — a 12-point jump over its predecessor Gemini 2.5 Flash-Lite. With 86.9% GPQA Diamond, 76.8% MMMU-Pro, 84.8% Video-MMMU, and 1432 Arena Elo, it outperforms models in its weight class and even surpasses previous-generation larger models like Gemini 2.5 Flash across reasoning, multimodal, and coding benchmarks.
Important Notice: Gemini3.us is an independent enthusiast community and developer platform. We are not affiliated with, endorsed by, or officially connected to Google LLC. We provide paid access to Google's official Gemini API services to support our infrastructure and operations.