gemini3.us
Released March 3, 2026

Gemini 3.1 Flash Lite

Google's fastest and most cost-efficient Gemini 3 model

86.9% GPQA Diamond • 363 tokens/sec • 1432 Arena Elo • Beats Gemini 2.5 Flash on every benchmark

2.5× faster first token
363 tokens/sec
1M token context
1/8 the cost of Gemini 3.1 Pro

Gemini 3.1 Flash Lite Features

Optimized for speed, cost, and high-volume production workloads

2.5× Faster First Token

Gemini 3.1 Flash Lite delivers its first answer token 2.5× faster than Gemini 2.5 Flash, which suits real-time and interactive applications.

Extreme Cost Efficiency

Priced at one-eighth the cost of Gemini 3.1 Pro, it is the most affordable Gemini 3 model for high-volume production deployments.

363 Tokens Per Second

Generates output at 363 tokens/sec, about 45% faster than Gemini 2.5 Flash's 249 tokens/sec, with similar or better quality.

Full Multimodal Input

Supports text, image, video, audio, and PDF inputs with a 1M token context window. Outputs text only.

Dynamic Thinking Levels

Adaptive thinking that matches compute to task complexity — from instant responses to deeper reasoning when needed.

High-Volume Agentic Tasks

Purpose-built for agentic pipelines, simple data extraction, classification, and translation at massive scale.

Gemini 3.1 Flash Lite Performance

Fastest and most cost-efficient in the Gemini 3 series

Speed & Throughput

Time to First Token: 2.5× faster than Gemini 2.5 Flash
Output Speed: 363 tokens/sec vs 249 tokens/sec for Gemini 2.5 Flash
Speed Improvement: 45% faster output than Gemini 2.5 Flash
Latency Profile: Low; optimized for low-latency applications
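The throughput figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not an official benchmark script: the two rates come from this page, while the function names and the 1,000-token response size are illustrative assumptions. It ignores time to first token and network latency.

```python
# Throughput figures quoted on this page (tokens per second).
FLASH_LITE_TPS = 363.0   # Gemini 3.1 Flash Lite
FLASH_25_TPS = 249.0     # Gemini 2.5 Flash


def percent_faster(new_rate: float, old_rate: float) -> float:
    """Relative throughput gain of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to stream num_tokens at a steady output rate."""
    return num_tokens / tokens_per_sec


gain = percent_faster(FLASH_LITE_TPS, FLASH_25_TPS)
print(f"Throughput gain: {gain:.1f}%")

# Hypothetical 1,000-token response, chosen only for illustration.
for name, rate in [("3.1 Flash Lite", FLASH_LITE_TPS), ("2.5 Flash", FLASH_25_TPS)]:
    print(f"{name}: {generation_seconds(1000, rate):.2f} s per 1,000 tokens")
```

The computed gain lands between 45% and 46%, consistent with the rounded 45% figure quoted here.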

Cost Efficiency

Cost vs Gemini 3.1 Pro: One-eighth the price
Input Pricing: $0.25 per 1M input tokens
Scale Suitability: Designed for millions of requests per day
Budget Optimization: Best cost-to-quality ratio in the Gemini 3 series

Context & Multimodal

Context Window: 1,048,576 input tokens (1M)
Output Token Limit: 65,536 tokens per response (65K)
Input Types: Text, Image, Video, Audio, PDF (5 types)
Batch API: Supported for bulk processing
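For bulk jobs it can help to check token budgets before sending anything. The sketch below is one way to do that under stated assumptions: the two limits are the figures quoted above, the helper names `fits_limits` and `pack_documents` are hypothetical, and token counts are assumed to be measured elsewhere (for example with the provider's token-counting endpoint). It also assumes the input and output limits apply independently, which this page does not confirm.

```python
from typing import Iterable, List

MAX_INPUT_TOKENS = 1_048_576   # context window quoted on this page
MAX_OUTPUT_TOKENS = 65_536     # per-response output limit quoted on this page


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """True if a single request stays within both quoted limits."""
    return (0 < input_tokens <= MAX_INPUT_TOKENS
            and 0 < max_output_tokens <= MAX_OUTPUT_TOKENS)


def pack_documents(token_counts: Iterable[int],
                   budget: int = MAX_INPUT_TOKENS) -> List[List[int]]:
    """Greedily group documents (by token count) into requests under budget."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for count in token_counts:
        if count > budget:
            raise ValueError(f"document of {count} tokens exceeds the budget")
        if used + count > budget:
            # Current request is full; start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(count)
        used += count
    if current:
        batches.append(current)
    return batches


print(fits_limits(900_000, 8_192))
print(pack_documents([600_000, 500_000, 400_000]))
```

Greedy packing keeps documents in order and is simple to reason about; it is not guaranteed to use the fewest requests.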

Supported Capabilities

Function Calling: Supported
Search Grounding: Supported
Code Execution: Supported
Structured Outputs: Supported

Gemini 3.1 Flash Lite Benchmark Results

Outperforms Gemini 2.5 Flash on the key benchmarks where both are scored below, at lower cost

Benchmark | Score | Description
Arena Elo | 1432 | Human preference ranking (Arena.ai); outperforms models in its weight class
Intelligence Index | 34 | Artificial Analysis Intelligence Index; +12 points over Gemini 2.5 Flash-Lite
GPQA Diamond | 86.9% | PhD-level scientific knowledge; beats Gemini 2.5 Flash (82.8%), Claude 4.5 Haiku (73.0%), GPT-5 mini (82.3%)
MMMU-Pro | 76.8% | Multimodal understanding and reasoning; beats Claude Opus 4.6, Kimi K2.5, and GPT-5 mini (74.1%)
Video-MMMU | 84.8% | Knowledge acquisition from videos; outperforms GPT-5 mini (82.5%) and Grok 4.1 Fast (74.6%)
Humanity's Last Exam | 16.0% | Academic reasoning across text and multimodal; comparable to GPT-5 mini (16.7%)
LiveCodeBench | 72.0% | Code generation (Jan–May 2025); about 2× the score of Gemini 2.5 Flash-Lite (34.3%)
SimpleQA Verified | 43.3% | Parametric knowledge accuracy; more than 4× the score of GPT-5 mini (9.5%) and nearly 8× that of Claude 4.5 Haiku (5.5%)
MMMLU (Multilingual) | 88.9% | Multilingual Q&A; outperforms GPT-5 mini (84.9%), Claude 4.5 Haiku (83.0%), Grok 4.1 Fast (86.8%)
CharXiv Reasoning | 73.2% | Information synthesis from complex charts; beats Gemini 2.5 Flash (63.7%) and Claude 4.5 Haiku (61.7%)
MRCR v2 (128k) | 60.1% | Long context performance (8-needle, 128k avg); outperforms GPT-5 mini (52.5%) and Claude 4.5 Haiku (35.3%)
Output Speed | 363 tok/s | 45% faster than Gemini 2.5 Flash (249 tok/s); fastest in its price tier
Input Price | $0.25/1M | Per million input tokens; matches GPT-5 mini ($0.25/1M)
Output Price | $1.50/1M | Per million output tokens; 70% cheaper than Claude 4.5 Haiku ($5.00/1M)

Source: Artificial Analysis & Arena.ai Leaderboard
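The listed input and output prices make it straightforward to estimate spend for a given workload. The following is a rough sketch only: the two prices are the figures listed above, while the function names and the example workload (one million requests per day, 2,000 input tokens and 300 output tokens each) are hypothetical. It ignores any batch discounts, caching, or reseller markup.

```python
# Prices listed on this page, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)


def daily_cost(requests_per_day: int, avg_input_tokens: int,
               avg_output_tokens: int) -> float:
    """Estimated USD cost per day for a steady workload."""
    total_in = requests_per_day * avg_input_tokens
    total_out = requests_per_day * avg_output_tokens
    return request_cost(total_in, total_out)


# Hypothetical workload, chosen only for illustration.
estimate = daily_cost(1_000_000, 2_000, 300)
print(f"Estimated daily cost: ${estimate:,.2f}")
```

For this hypothetical workload the estimate is $950 per day: $500 for 2 billion input tokens plus $450 for 300 million output tokens.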

About Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google DeepMind's fastest and most cost-efficient model in the Gemini 3 series, launched on March 3, 2026. It scores 34 on the Artificial Analysis Intelligence Index, a 12-point jump over its predecessor, Gemini 2.5 Flash-Lite. With 86.9% on GPQA Diamond, 76.8% on MMMU-Pro, 84.8% on Video-MMMU, and an Arena Elo of 1432, it outperforms models in its weight class and even surpasses larger previous-generation models such as Gemini 2.5 Flash across reasoning, multimodal, and coding benchmarks.

Important Notice: Gemini3.us is an independent enthusiast community and developer platform. We are not affiliated with, endorsed by, or officially connected to Google LLC. We provide paid access to Google's official Gemini API services to support our infrastructure and operations.
