TechJun 1096% confidence

Google Introduces DiffusionGemma: Experimental Model Achieving 4x Faster Text Generation

Center 100%

3 sources

Google has released DiffusionGemma, an experimental 26-billion-parameter open-source language model that generates 256 tokens simultaneously using a diffusion-based approach rather than sequential token-by-token processing. The model, built on the Gemma 4 backbone and released under an Apache 2.0 license, achieves over 1,000 tokens per second on a single NVIDIA H100 GPU, compared to 200–450 tokens per second for comparable autoregressive Gemma 4 models. The release expands options for developers running local or low-concurrency AI applications, though Google acknowledges the model's output quality is lower than standard Gemma 4.

Google has introduced DiffusionGemma, an experimental open-source text generation model that applies diffusion techniques — previously used in image generation — to large-scale language modeling. Rather than predicting tokens sequentially left-to-right, the model starts with a 256-token block of random placeholders and iteratively refines the entire block in parallel, locking in high-confidence positions and re-evaluating uncertain ones across multiple passes. Benchmark data shows the FP8 version reaching approximately 1,008 tokens per second on a single H100 and 1,288 on an H200, representing roughly five to six times the speed of a standard autoregressive baseline under optimal single-user conditions, and approximately 4x faster than comparable Gemma 4 models in Google's own benchmarks. As a 26B Mixture of Experts model that activates only 3.8B parameters during inference, it fits within 18GB VRAM when quantized, making it compatible with consumer GPUs such as the NVIDIA RTX 4090 and 5090. The bidirectional attention architecture gives the model a structural advantage for constrained generation tasks — such as code infilling, structured data generation, and problems requiring context from later in a sequence — as demonstrated by a fine-tuned version achieving an 80% success rate on Sudoku puzzles that the base model could not solve at all. However, Google and independent researchers caution that the speed advantage diminishes in high-throughput cloud serving where autoregressive models already saturate compute through batching, and that overall output quality remains below standard Gemma 4 for open-ended generation tasks. The model is available now on Hugging Face and is the first diffusion language model natively supported in the open-source vLLM inference platform.

The numbersIntelligence vs. Latency: DiffusionGemma vs. Gemma 4 modelstok/s (x-axis), Avg. GPQA-Diamond LCB v6 % (y-axis)

Data: Article / DiffusionGemma

What's missing

The long-term roadmap for moving DiffusionGemma from experimental to production status, and whether Google plans to address the quality gap, is also not addressed.

What different sources said

Hacker NewsCenter
DiffusionGemma: 4x Faster Text Generation
VentureBeatCenter
Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes
Ars TechnicaCenter
Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Tech

Samsung Galaxy S25 and S25 FE See Significant Price Cuts

Samsung's Galaxy S25 and Galaxy S25 FE smartphones are currently available at notably reduced prices, with the S25 FE dropping $201 (33%) to $449 on Woot for a limited time. The price reductions come amid a competitive smartphone market and ahead of anticipated future Samsung releases. The discounts make previously premium-priced devices more accessible to budget-conscious consumers.

2 sourcesJun 16

Tech

Anthropic Disables Fable 5 and Mythos 5 AI Models Globally After US Government Export Control Order

Anthropic has suspended all public access to its two most advanced AI models, Fable 5 and Mythos 5, after the US Commerce Department issued an export control directive ordering the company to block foreign nationals from accessing them on national security grounds. The order came just three days after Fable 5's public launch and reportedly stems from government concerns about a potential jailbreak that could enable the models to assist with cyberattacks, though Anthropic says it received only verbal evidence of a narrow, non-universal vulnerability. The shutdown affects all customers globally — including enterprise users and Anthropic employees — and marks a significant escalation of US efforts to restrict foreign access to advanced AI models themselves, rather than just the chips that power them.

4 sourcesJun 16

Tech

Xbox Free Play Days Offers Three Games Free to Play June 11–14

Microsoft's Xbox Free Play Days program is offering Hell Let Loose, State of Decay 2: Juggernaut Edition, and Blasphemous 2 at no cost from June 11 to June 14. Hell Let Loose requires an Xbox Game Pass Ultimate, Premium, or Essential membership, while State of Decay 2 and Blasphemous 2 (via a five-hour timed trial) are accessible to all Xbox console owners. Players who wish to keep any of the games can purchase them at a limited-time discount and retain any achievements earned during the free period.

2 sourcesJun 13

Google Introduces DiffusionGemma: Experimental Model Achieving 4x Faster Text Generation

What's missing

What different sources said

Related

Samsung Galaxy S25 and S25 FE See Significant Price Cuts

Anthropic Disables Fable 5 and Mythos 5 AI Models Globally After US Government Export Control Order

Xbox Free Play Days Offers Three Games Free to Play June 11–14