Google Launches Gemini 3 Flash as Default Model
Gemini 3 Flash now powers the main Gemini experience in the app and in many Google services that use AI mode. Users can still switch to Gemini 3 Pro for heavier tasks, but Flash is now the default model most people encounter.
Google positions Gemini 3 Flash as a frontier-level model that balances speed, cost, and intelligence, so it can handle both personal queries and production systems at scale.
Speed and Performance
Gemini 3 Flash is about three times faster than Gemini 2.5 Pro in independent benchmarks, with sub-second time to first token for typical prompts. It can stream about 218 tokens per second, much higher than the earlier Pro series, so responses feel almost instant in chat and apps.
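Taking the figures above at face value, a rough end-to-end latency estimate is time to first token plus streaming time. The sketch below is illustrative arithmetic only; the 0.8-second time to first token is an assumed value consistent with the "sub-second" claim, not an official number.

```python
def estimated_response_seconds(output_tokens, ttft_s=0.8, tokens_per_s=218):
    """Rough end-to-end latency: time to first token plus time to stream
    the remaining tokens at the quoted throughput."""
    return ttft_s + output_tokens / tokens_per_s

# A 500-token reply at the quoted 218 tokens per second:
print(round(estimated_response_seconds(500), 2))  # 3.09
```

Even a fairly long reply finishes streaming in a few seconds, which is why the model feels near-instant in interactive use.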
On reasoning tests like SWE-bench Verified for coding agents, Gemini 3 Flash reaches around 78 percent, beating both Gemini 2.5 models and even Gemini 3 Pro on that specific benchmark. Google also notes that it uses about 30 percent fewer tokens on average than 2.5 Pro for common tasks, which helps control cost at scale.
Technical Capabilities
Gemini 3 Flash shares the same core architecture as Gemini 3 Pro but is tuned for low latency and high throughput. It supports full multimodal input so it can work with text, images, audio, and video in the same prompt for tasks like video analysis and visual question answering.
The model is built for agentic workflows and tool use, which lets it call external tools or APIs, analyze large document sets, and drive coding agents inside integrated development environments. Google highlights strong performance in code generation and debugging, with Pro-grade coding quality in a model that stays responsive in tight feedback loops.
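The tool-use pattern described above generally works by having the model emit a structured tool call that the host application routes to a local function. A minimal, purely illustrative sketch of that dispatch step follows; the tool names, call format, and registry here are hypothetical and not the Gemini API's actual schema.

```python
# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch_tool_call(call):
    """Route a model-emitted tool call of the form
    {'name': <tool>, 'args': {<kwargs>}} to the matching local function."""
    fn = TOOLS[call["name"]]
    return fn(**call["args"])

# The model proposes a call; the application executes it and returns the result.
result = dispatch_tool_call({"name": "add", "args": {"a": 2, "b": 3}})
print(result)  # 5
```

In a real agent loop, the tool's return value is fed back to the model as the next turn, so the model can chain calls until the task is done.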
Pricing and Enterprise Features
Through the Gemini API and Vertex AI, Gemini 3 Flash is priced at about $0.50 per one million input tokens and $3 per one million output tokens, much cheaper than Gemini 3 Pro and the older 2.5 Pro series. Audio input is billed separately at around $1 per one million tokens.
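The quoted per-million-token rates make per-request costs easy to estimate. A small sketch, using the text-token prices stated above as defaults (verify current rates against the official pricing page before budgeting):

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_per_m=0.50, output_per_m=3.00):
    """Estimate the cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# A request with 10,000 input tokens and 1,000 output tokens:
print(f"${request_cost_usd(10_000, 1_000):.4f}")  # $0.0080
```

At these rates, even a million such requests per month stays in the thousands of dollars, which is the cost profile the article points to for high-volume workloads.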
Google also includes context caching features that can cut token costs by up to 90 percent for repeated context in enterprise apps, which matters for chatbots or agents that reuse the same long instructions or knowledge base. For large companies, this cost profile makes it easier to run high volume workloads like support agents, internal copilots, or analytics bots.
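To see why caching matters for agents that resend the same long instructions, compare input-token cost with and without the discount. This is back-of-the-envelope arithmetic assuming the "up to 90 percent" figure applies to cached reads of the repeated context; actual cached-token billing may differ.

```python
def repeated_context_cost_usd(context_tokens, requests,
                              input_per_m=0.50, cache_discount=0.90):
    """Input-token cost of resending the same context on every request,
    without caching vs. with the assumed caching discount applied."""
    full = requests * (context_tokens / 1_000_000) * input_per_m
    cached = full * (1 - cache_discount)
    return full, cached

# A 50,000-token system prompt reused across 1,000 requests:
full, cached = repeated_context_cost_usd(50_000, 1_000)
print(full, cached)  # 25.0 2.5
```

The repeated context drops from $25 to $2.50 in this example, which is where most of the savings come from for support agents and copilots built on a fixed knowledge base.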
Impact for Users and Businesses
For regular users of the Gemini app, the shift means faster replies, better multimodal understanding, and more capable planning features without needing to manually pick an advanced model. The upgrade is automatic, so anyone using Gemini now benefits from the new default model.
For developers and enterprises, Gemini 3 Flash offers near Pro level reasoning at much lower latency and price, which fits real time interfaces, production agents, and large batch processing. It effectively becomes the main workhorse model in the Gemini stack for most use cases where speed, scale, and cost matter more than absolute peak accuracy.
Disclaimer:
The information provided reflects the features and pricing of Gemini 3 Flash as of its launch. Prices and capabilities may vary over time. Always refer to official sources for the latest updates.