Nvidia Brings Day One DiffusionGemma Support to RTX and DGX Systems

Nvidia has added day-one support for Google DeepMind’s DiffusionGemma open model across its RTX and DGX platforms, giving developers a fast local option for text generation on consumer GPUs, professional workstations, and deskside AI systems. The model is built to produce output faster than traditional autoregressive models, and Nvidia says its own hardware and software stack can push performance even further.

DiffusionGemma is an open-weight model that generates text with a diffusion-based approach. Instead of predicting one token at a time, it can denoise up to 256 tokens per step. That parallel generation is the main reason the model can produce output faster, especially in single-user local workloads, where token-by-token generation tends to be the bottleneck.
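To make the difference concrete, the following is a rough sketch that counts forward passes for the two approaches. The 256-token figure comes from the article; the number of denoising steps needed to finish each block is not stated, so `steps_per_block` is a hypothetical parameter chosen purely for illustration. A diffusion pass and an autoregressive pass also do not necessarily cost the same, so this is a count of passes, not a speed prediction.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """An autoregressive model runs one forward pass per generated token."""
    return num_tokens


def diffusion_passes(num_tokens: int, block_size: int = 256,
                     steps_per_block: int = 1) -> int:
    """Forward passes when each step denoises up to `block_size` tokens.

    `steps_per_block` is an assumption: the article does not say how many
    denoising steps a block needs before its tokens are final.
    """
    blocks = math.ceil(num_tokens / block_size)
    return blocks * steps_per_block


if __name__ == "__main__":
    tokens = 1024  # hypothetical response length
    print("autoregressive passes:", autoregressive_passes(tokens))
    print("diffusion passes (16 steps per block, assumed):",
          diffusion_passes(tokens, steps_per_block=16))
```

Under these assumed settings the diffusion model needs far fewer passes for the same response, which is the intuition behind the speed claim.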

The model is based on Gemma 4 and uses a mixture-of-experts design. It has 25.2 billion total parameters, but only 3.8 billion are active per step. Because each step touches only a fraction of the weights, per-step compute stays closer to that of a much smaller model, while the full parameter count still gives the model capacity for text generation tasks.
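A short calculation shows how small the active share is. The parameter counts are from the article; the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass and is an assumption here, not something Nvidia or Google has stated for this model.

```python
TOTAL_PARAMS = 25.2e9   # total parameters, from the article
ACTIVE_PARAMS = 3.8e9   # active parameters per step, from the article

# Assumption: roughly 2 FLOPs per active parameter per token in a forward pass.
FLOPS_PER_PARAM = 2


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in a single step."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb forward-pass cost per token for `params` active weights."""
    return FLOPS_PER_PARAM * params


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
sparse_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"Active share of weights: {fraction:.1%}")
print(f"Approx. FLOPs per token (3.8B active): {sparse_flops:.2e}")
print(f"Approx. FLOPs per token (25.2B dense): {dense_flops:.2e}")
```

All 25.2 billion parameters still have to be held in memory, so the saving is in compute per step rather than in footprint.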

DiffusionGemma focuses on faster local AI generation

DiffusionGemma is designed to run locally, which means developers and creators can use it without relying on cloud inference or paying per token. It is available under an Apache 2.0 license and is supported at launch by tools such as Hugging Face Transformers, vLLM, and Unsloth.

| Feature | DiffusionGemma |
| --- | --- |
| Model type | Open-weight diffusion model |
| Base architecture | Gemma 4 |
| Total parameters | 25.2 billion |
| Active parameters | 3.8 billion per step |
| Context length | Up to 256K tokens |
| Precision formats | BF16 and NVFP4 |
| Main use | Fast local text generation |
| Supported platforms | Nvidia RTX, RTX PRO, DGX Spark, DGX Station |
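The precision formats matter mostly for memory. As a back-of-the-envelope sketch, the weight footprint can be estimated from the parameter count and the bits stored per parameter. BF16 uses 16 bits per value. For NVFP4, the code assumes 4-bit values plus one 8-bit scale per 16-value block (about 4.5 bits per parameter on average); that overhead figure is an assumption, and real checkpoints may keep some layers in higher precision. The estimate covers weights only, not activations or cache.

```python
TOTAL_PARAMS_BILLION = 25.2  # from the article

# Average bits stored per parameter.
BITS_PER_PARAM = {
    "BF16": 16.0,
    # Assumption: 4-bit values plus an 8-bit scale shared by 16 values.
    "NVFP4": 4.0 + 8.0 / 16.0,
}


def weight_gigabytes(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8.0


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        size = weight_gigabytes(TOTAL_PARAMS_BILLION, bits)
        print(f"{name}: about {size:.1f} GB of weights")
```

Under these assumptions the BF16 weights come to roughly 50 GB and the NVFP4 weights to roughly 14 GB.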

The model supports both text and image modalities, giving it a broader role than a basic text-only model. Its main appeal, however, is fast local generation.

Nvidia says its platforms can run the model without extra tuning

Nvidia is supporting DiffusionGemma across GeForce RTX GPUs, RTX PRO workstations, DGX Spark systems, and DGX Station. The company says the model can run with its CUDA software stack and Tensor Core hardware without requiring extra user tuning.

That matters because open models often need careful setup to perform well. If Nvidia’s stack makes DiffusionGemma easier to deploy, more developers may be able to test it quickly on local systems.

DGX Spark appears to be one of the more interesting targets. It uses Nvidia’s GB10 Grace Blackwell Superchip, includes 128GB of unified memory, and is meant for local AI development, agents, research, prototyping, and fine-tuning.

DGX Spark reaches 150 tokens per second

Nvidia says DGX Spark can run DiffusionGemma at around 150 tokens per second. DGX Station goes much higher, with claims of up to 800 tokens per second, while H100 Tensor Core GPUs in DGX systems can reportedly reach around 1,000 tokens per second on a single GPU.

| Nvidia platform | Claimed DiffusionGemma performance or role |
| --- | --- |
| DGX Spark | Around 150 tokens per second |
| DGX Station | Up to 800 tokens per second |
| H100 Tensor Core GPU | Around 1,000 tokens per second |
| RTX PRO 6000 workstations | Local professional inference and agent workflows |
| GeForce RTX GPUs | Local desktop AI support, with llama.cpp support coming |
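To put those throughput claims in everyday terms, the sketch below converts them into the time needed to generate one response. The tokens-per-second values are Nvidia's claims as reported above ("around" or "up to", not measured here), and the 1,000-token response length is a hypothetical example. The calculation ignores prompt processing and any startup overhead.

```python
# Nvidia's claimed throughput, in tokens per second, as reported in the article.
CLAIMED_TOKENS_PER_SECOND = {
    "DGX Spark": 150,
    "DGX Station": 800,
    "H100 Tensor Core GPU": 1000,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit `num_tokens` at a steady `tokens_per_second`."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    for platform, rate in CLAIMED_TOKENS_PER_SECOND.items():
        seconds = generation_seconds(response_tokens, rate)
        print(f"{platform}: about {seconds:.2f} s for {response_tokens} tokens")
```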

The company says DiffusionGemma can be roughly four times faster than an equivalent autoregressive model in certain local generation scenarios. That could make it useful for workflows where speed and responsiveness matter more than cloud scale.

Local AI is becoming a bigger part of Nvidia’s RTX strategy

Nvidia has been pushing RTX hardware beyond gaming for several years, especially as local AI tools have become more common. DiffusionGemma fits neatly into that strategy because it gives RTX and DGX owners another model that can run on their own hardware.

For developers, local models are useful because they reduce cloud dependency and allow faster testing. For researchers, they make it easier to prototype agent workflows, experiment with model behavior, and fine-tune systems without sending everything to remote servers.

For creators and professionals, local inference can also help with privacy, latency, and cost control.

DiffusionGemma gives Nvidia another way to show its full AI stack

The bigger story is not only that Nvidia supports one new model. It is that Nvidia wants every major open model to work well on its hardware from day one.

That kind of support strengthens the value of RTX and DGX systems. If developers know new models will run quickly on Nvidia hardware, they have less reason to look elsewhere for local AI work.

DiffusionGemma is also a good example of where AI model design is heading. Faster generation is becoming a priority, and diffusion-based text models are one attempt to move beyond the slower one-token-at-a-time approach used by many existing systems.

For now, Nvidia’s day-one support gives DiffusionGemma a strong launch platform. RTX users, workstation owners, and DGX Spark buyers can try the model locally, while Nvidia gets another showcase for CUDA, Tensor Cores, and its wider AI software stack.
