NVIDIA Blackwell Tops Every MLPerf 6.0 Training Benchmark As GB300 Pulls Ahead Of GB200

NVIDIA has posted another dominant result in the latest MLPerf Training 6.0 benchmark round, with its Blackwell systems leading every test where results were submitted. The company’s GB300 NVL72 systems also showed a clear uplift over GB200, with NVIDIA claiming up to 60 percent faster performance in the same rack scale configuration.

The MLPerf Training suite is one of the most closely watched AI hardware benchmarks because it is open, peer reviewed, and designed to compare real training workloads across different systems. The latest version adds two new mixture of experts tests, DeepSeek V3 with 671 billion parameters and GPT OSS 20B with 21 billion parameters.

NVIDIA was the only platform with submissions across all seven benchmarks in this round. In several of the largest tests, there was no competing submission at all, which made NVIDIA’s lead look even stronger.

Blackwell leads across the full MLPerf 6.0 suite

NVIDIA’s strongest showing came from Blackwell NVL72 systems, including both GB200 and GB300 configurations. These systems combine GPUs, fast interconnects, and rack scale architecture designed for large AI training jobs.

In the new DeepSeek V3 671B benchmark, CoreWeave reached the target quality in 2.02 minutes using 8,192 GB300 GPUs. In Llama 3.1 405B, Microsoft Azure used 8,192 GB200 GPUs and reached the target in 7.07 minutes, the fastest reported time for that test.

Benchmark           NVIDIA result    Nearest alternative
DeepSeek V3 671B    2.02 minutes     No submission
GPT OSS 20B         7.43 minutes     No submission
Llama 3.1 405B      7.07 minutes     No submission
Llama 2 70B LoRA    0.40 minutes     8.27 minutes
Llama 3.1 8B        4.46 minutes     58.63 minutes
FLUX.1              17.1 minutes     74.44 minutes
DLRM DCNv2          0.67 minutes     No submission

The gap was especially large in Llama 3.1 8B, where NVIDIA completed the workload in 4.46 minutes while the nearest alternative took 58.63 minutes. That is roughly a 13x difference in time to train, on a benchmark that still matters to smaller and more flexible AI deployments.
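As a rough illustration of the gaps in the table above, the sketch below divides each nearest alternative time by the NVIDIA time. The names RESULTS, time_ratio, and time_ratios are illustrative only, and the ratios ignore any difference in GPU count between submissions, so they describe time to train rather than per GPU efficiency.

```python
# Time to train in minutes, copied from the table above.
# Each value is (NVIDIA result, nearest alternative); None means no submission.
RESULTS = {
    "DeepSeek V3 671B": (2.02, None),
    "GPT OSS 20B": (7.43, None),
    "Llama 3.1 405B": (7.07, None),
    "Llama 2 70B LoRA": (0.40, 8.27),
    "Llama 3.1 8B": (4.46, 58.63),
    "FLUX.1": (17.1, 74.44),
    "DLRM DCNv2": (0.67, None),
}


def time_ratio(nvidia_minutes, alternative_minutes):
    """Return alternative time divided by NVIDIA time, or None if uncontested."""
    if alternative_minutes is None:
        return None
    return alternative_minutes / nvidia_minutes


def time_ratios(results):
    """Map each benchmark name to its time ratio (None where uncontested)."""
    return {name: time_ratio(*times) for name, times in results.items()}


if __name__ == "__main__":
    for name, ratio in time_ratios(RESULTS).items():
        label = "no competing submission" if ratio is None else f"{ratio:.1f}x"
        print(f"{name}: {label}")
```

On these figures the contested benchmarks come out at roughly 21x for Llama 2 70B LoRA, 13x for Llama 3.1 8B, and 4x for FLUX.1. The four uncontested benchmarks have no ratio at all, which is why they are reported separately rather than as a lead.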

GB300 shows why NVIDIA keeps refreshing Blackwell

The results also show that NVIDIA is not standing still with Blackwell. GB200 was already a major platform for AI training, but GB300 improves performance through higher AI compute density and support for NVFP4.

In the same NVL72 class of system, NVIDIA says GB300 can run up to 60 percent faster than GB200. That matters because AI data centers are now judged not only on raw GPU count, but on how quickly each rack can complete useful training work.
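The "up to 60 percent faster" figure can be read in more than one way. Assuming it means 1.6x training throughput, which is an interpretation rather than something the claim spells out, the sketch below converts that into time to train. The function name and the 10 minute baseline are hypothetical and are not MLPerf results.

```python
def time_after_speedup(baseline_minutes, percent_faster):
    """Return time to train if throughput rises by percent_faster percent.

    Assumes "N percent faster" means throughput is multiplied by (1 + N/100).
    """
    return baseline_minutes / (1 + percent_faster / 100)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical GB200 time in minutes, not a measured result
    new_time = time_after_speedup(baseline, 60)
    saved = 100 * (1 - new_time / baseline)
    print(f"{new_time:.2f} minutes, {saved:.1f}% less time")
```

Under that reading, a 60 percent throughput gain shortens a run by 37.5 percent rather than 60 percent, so a 10 minute job would take about 6.25 minutes.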

For hyperscalers, faster training time can mean better cluster utilization, lower time to model readiness, and more output from the same physical footprint. That is why rack scale performance is becoming as important as individual accelerator performance.

Competitors had limited visibility in the largest tests

One of the most striking parts of MLPerf 6.0 is the lack of competing submissions in several major workloads. NVIDIA had entries for the new DeepSeek V3 and GPT OSS 20B tests, while alternatives did not appear in those categories.

AMD did submit results in some benchmarks with MI300, MI325, MI350, and MI355 series accelerators, but NVIDIA remained ahead in the listed comparisons. In FLUX.1, a 32 GPU GB300 configuration was shown ahead of much larger MI300X and MI325X submissions. In Llama 2 70B LoRA and Llama 3.1 8B, NVIDIA systems also finished faster at similar or larger scale.

That does not mean competitors have no viable AI hardware. It does mean NVIDIA continues to control the benchmark narrative in training, especially when the workload moves to very large model sizes and multi rack scale.

Scale remains one of NVIDIA’s biggest advantages

NVIDIA’s lead is not only about the GPU. The company’s advantage comes from the full platform, including CUDA, networking, system design, optimized libraries, and cluster level software.

The 8,192 GPU results matter because large AI labs and cloud providers care about how hardware behaves at scale. A chip can look good in a smaller test, but training frontier class models requires thousands of accelerators working together reliably.

Blackwell NVL72 systems and Spectrum X networking are designed for that environment. NVIDIA’s ability to show large scale submissions across multiple workloads gives it a strong position as customers plan next generation AI infrastructure.

Vera Rubin is next, but Blackwell still has room to run

NVIDIA is already preparing its Vera Rubin platform, but MLPerf 6.0 shows that Blackwell remains highly competitive before that next generation arrives. Software optimization, larger deployments, and GB300 upgrades are still increasing performance on the current architecture.

That is important for customers buying systems now. AI hardware roadmaps move quickly, but data center operators need platforms that keep improving after launch. NVIDIA’s MLPerf results suggest Blackwell is still gaining efficiency and scale benefits as deployments mature.

The latest benchmark round does not change the broader AI hardware race, but it reinforces where the market stands today. NVIDIA remains the dominant training platform, competitors are still trying to close the gap, and GB300 has strengthened Blackwell’s position before Vera Rubin enters the picture.
