Google Introduces DiffusionGemma, an Experimental Open Model for High-Speed AI Text Generation

Google has unveiled DiffusionGemma, an experimental open-source AI model designed to explore a fundamentally different approach to text generation using diffusion techniques. Released under the Apache 2.0 licence, the model aims to significantly accelerate text generation while enabling new types of interactive AI applications.

Unlike conventional autoregressive large language models (LLMs), which generate text sequentially one token at a time, DiffusionGemma produces entire blocks of text simultaneously. The approach, derived from Google's Gemini Diffusion research, generates text up to four times faster on GPUs than traditional autoregressive architectures.

Built on the Gemma 4 family of open models, DiffusionGemma employs a 26-billion-parameter Mixture of Experts (MoE) architecture, activating only 3.8 billion parameters during inference. This design allows the model to operate efficiently on high-end consumer hardware while maintaining rapid generation speeds.

According to Google, DiffusionGemma can generate more than 1,000 tokens per second on a single NVIDIA H100 GPU and over 700 tokens per second on an NVIDIA GeForce RTX 5090, making it particularly attractive for latency-sensitive applications.

Rethinking Text Generation

The model represents a departure from the dominant autoregressive paradigm that underpins most modern AI chatbots and text generation systems. By generating 256 tokens in parallel during each forward pass, DiffusionGemma enables every token to attend to all others within the generated block.

This bidirectional attention mechanism offers advantages for tasks that require understanding relationships across an entire sequence rather than processing information sequentially. Potential use cases include code infilling, document editing, mathematical graph generation, amino acid sequence modelling and other non-linear workflows.
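The difference between sequential and block-wide attention can be pictured with two visibility masks. The sketch below is only an illustration in plain Python: the function names are hypothetical, and a 4-token block stands in for the 256-token block described above. It is not taken from DiffusionGemma's code.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position in the block."""
    return [[True] * n for _ in range(n)]


def visible_counts(mask):
    """How many positions each token can see."""
    return [sum(row) for row in mask]


block = 4  # tiny stand-in for a 256-token block (illustrative assumption)
print(visible_counts(causal_mask(block)))         # [1, 2, 3, 4]
print(visible_counts(bidirectional_mask(block)))  # [4, 4, 4, 4]
```

Under the causal mask the first token sees only itself, while under the bidirectional mask every token sees the whole block, which is the property the infilling and editing use cases rely on.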

Google said the model's architecture also enables "intelligent self-correction," allowing DiffusionGemma to iteratively refine generated content and identify errors across the entire text block in real time.
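Iterative refinement of a block can be sketched with a toy loop. The example below uses confidence-based parallel unmasking, a scheme that appears in masked-diffusion research; the article does not say DiffusionGemma works this way. The "model" here is a hypothetical stand-in that simply looks up a known answer, and its confidence is assumed to be the number of already-filled neighbours. The loop only fills gaps and does not model revising tokens that have already been committed.

```python
MASK = "_"


def toy_denoiser(block, target):
    """Hypothetical stand-in for a model: proposes a token and a score for every position at once.

    The score is the number of already-revealed neighbours (0, 1 or 2).
    """
    proposals = []
    for i in range(len(block)):
        left = i > 0 and block[i - 1] != MASK
        right = i < len(block) - 1 and block[i + 1] != MASK
        proposals.append((target[i], int(left) + int(right)))
    return proposals


def refine(draft, target, per_step=2):
    """Fill masked positions over several passes, committing the best-scored ones each pass."""
    block = list(draft)
    history = ["".join(block)]
    while MASK in block:
        proposals = toy_denoiser(block, target)
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        # Highest score first; ties are broken by position.
        masked.sort(key=lambda i: (-proposals[i][1], i))
        for i in masked[:per_step]:
            block[i] = proposals[i][0]
        history.append("".join(block))
    return history


for state in refine("ab____gh", "abcdefgh"):
    print(state)
# ab____gh
# abc__fgh
# abcdefgh
```

In this toy run the four-character gap is filled in two passes, working inward from both edges, whereas a strictly left-to-right generator would need four steps and could not use the known suffix.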

Designed for Research and Interactive Workflows

While Google continues to position its autoregressive Gemma 4 models as the preferred choice for production deployments requiring maximum output quality, DiffusionGemma is targeted at researchers and developers experimenting with speed-critical applications.

The company highlighted use cases such as in-line editing, rapid content iteration and interactive local AI systems where inference speed is often a major constraint.

Another key advantage is accessibility. When quantised, DiffusionGemma can run within approximately 18GB of VRAM, making it feasible to deploy on advanced consumer-grade GPUs rather than requiring specialised data centre hardware.
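A rough back-of-envelope calculation shows why quantisation matters for a model of this size. The sketch below counts weight storage only, using the 26-billion and 3.8-billion parameter figures quoted earlier in the article. The bit widths are illustrative assumptions, since the article does not state which quantisation format yields the 18GB figure, and activations and other runtime buffers are ignored.

```python
TOTAL_PARAMS = 26_000_000_000   # total parameters, from the article
ACTIVE_PARAMS = 3_800_000_000   # parameters activated during inference, from the article


def weight_gb(params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):  # assumed bit widths, for illustration only
    print(f"{bits}-bit weights: {weight_gb(TOTAL_PARAMS, bits):.1f} GB")
# 16-bit weights: 52.0 GB
# 8-bit weights: 26.0 GB
# 4-bit weights: 13.0 GB

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction: {active_fraction:.1%}")
# active fraction: 14.6%
```

Memory is generally driven by the total parameter count, because all experts in a Mixture of Experts model normally have to be resident even though only a fraction is used for any given computation. The active count mainly affects compute per step, not the size of the weights.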

New Possibilities for Fine-Tuning

Google also emphasised the model's potential for specialised fine-tuning. In one demonstration, AI training platform Unsloth fine-tuned DiffusionGemma to solve Sudoku puzzles, a task that often challenges autoregressive models because each token may depend on future information.

The model's ability to process and reason across entire token blocks simultaneously makes it particularly well-suited to such structured reasoning problems.
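The dependence on "future" information is easy to see in a single Sudoku row. The sketch below is a toy illustration with a hypothetical helper, not the Unsloth fine-tuning setup: it compares what can be deduced about the first cell when only earlier cells are visible with what can be deduced when the whole row is visible.

```python
def candidates(row, index, visible):
    """Digits still possible at `index`, judging only by the positions listed in `visible`."""
    seen = {row[j] for j in visible if j != index and row[j] != 0}
    return sorted(set(range(1, 10)) - seen)


row = [0, 3, 7, 1, 9, 4, 8, 2, 6]  # 0 marks the empty first cell

# Left-to-right view: nothing precedes the first cell, so no digit can be ruled out.
print(candidates(row, 0, visible=range(0)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Whole-block view: the later cells pin the answer down.
print(candidates(row, 0, visible=range(9)))  # [5]
```

A generator that must commit to the first cell before seeing the rest of the row has nine options to choose from, while one that sees the whole row has exactly one.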

Expanding the Open AI Research Ecosystem

The release reflects growing interest in diffusion-based approaches beyond image generation and highlights ongoing efforts to develop alternative architectures capable of overcoming the speed and scalability limitations of traditional language models.

By making DiffusionGemma openly available under a permissive licence, Google aims to encourage researchers and developers to explore new possibilities for fast, interactive and non-linear AI applications, potentially shaping the next generation of language model architectures.

While still experimental, DiffusionGemma offers an early glimpse into how diffusion-based text generation could complement conventional LLMs and open new frontiers in AI performance and usability.
