AMD launches Instinct MI350P as a PCIe AI accelerator with 144GB HBM3E

AMD has introduced the Instinct MI350P, its first PCIe-based Instinct accelerator in years. The new card is built for enterprise AI workloads and is designed to fit into standard air-cooled servers without requiring a full infrastructure change.

The MI350P gives companies a simpler way to add AI compute to existing data centers. Instead of requiring a more complex platform built around large accelerator trays, the card uses a dual-slot PCIe design that can be deployed more easily in current server racks.

The accelerator is based on AMD’s CDNA 4 architecture and uses TSMC 3nm process technology for its compute dies. It has a four-XCD design, half the XCD count of the larger MI350X configuration. It also includes a single 6nm I/O die.

The card includes 128 compute units, 8,192 stream processors, 512 matrix cores, and peak clocks up to 2.2GHz. AMD says the chip contains 73 billion transistors.
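The unit counts above can be cross-checked with simple division. The following is a minimal sketch that derives per-unit ratios from the figures quoted in this article; the variable names are illustrative, the even split across XCDs is an assumption, and the derived ratios are arithmetic on the quoted numbers rather than separately published AMD specifications.

```python
# Figures quoted in the article for the MI350P.
compute_units = 128
stream_processors = 8_192
matrix_cores = 512
xcds = 4  # accelerator compute dies

# Derived ratios (plain arithmetic on the quoted figures).
sp_per_cu = stream_processors // compute_units  # stream processors per CU
mc_per_cu = matrix_cores // compute_units       # matrix cores per CU
cu_per_xcd = compute_units // xcds              # enabled CUs per XCD, assuming an even split

print(f"Stream processors per CU: {sp_per_cu}")
print(f"Matrix cores per CU:      {mc_per_cu}")
print(f"Enabled CUs per XCD:      {cu_per_xcd}")
```

The figures divide evenly, which suggests the quoted counts describe one consistent configuration.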

Memory is one of the biggest strengths of the MI350P. The card includes 144GB of HBM3E across a 4096-bit bus, delivering up to 4TB/s of bandwidth. It also includes 128MB of Infinity Cache. The larger MI350X doubles the memory to 288GB of HBM3E, but the MI350P is aimed at PCIe server deployments where easier installation matters.
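To put the memory figures in perspective, the quoted bandwidth and bus width imply a per-pin data rate, and the capacity and bandwidth together give the minimum time needed to read the entire memory once. The sketch below works this out, assuming decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes) and peak bandwidth; the variable names are illustrative and real workloads will not sustain the peak figure.

```python
# Figures quoted in the article (decimal units are an assumption).
bus_width_bits = 4096
bandwidth_bytes_per_s = 4e12   # "up to 4TB/s"
capacity_bytes = 144e9         # 144GB of HBM3E

# Implied data rate per bus pin, in gigabits per second.
per_pin_gbps = bandwidth_bytes_per_s * 8 / bus_width_bits / 1e9

# Minimum time to read every byte of memory once at peak bandwidth.
full_sweep_ms = capacity_bytes / bandwidth_bytes_per_s * 1e3

print(f"Implied per-pin rate: {per_pin_gbps:.2f} Gbps")
print(f"One full pass over memory: {full_sweep_ms:.0f} ms")
```

Under these assumptions the implied per-pin rate is about 7.8 Gbps, and a single pass over all 144GB takes at least 36 ms, which is a rough floor for any operation that must touch the whole memory.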

Here is a quick look at the main specs:

| Feature | AMD Instinct MI350P |
| --- | --- |
| Architecture | CDNA 4 |
| Process | TSMC 3nm compute dies, 6nm I/O die |
| Compute units | 128 |
| Stream processors | 8,192 |
| Matrix cores | 512 |
| Memory | 144GB HBM3E |
| Memory bandwidth | Up to 4TB/s |
| Infinity Cache | 128MB |
| Peak clock | Up to 2.2GHz |
| Power | 600W, configurable down to 450W |
| Form factor | Dual slot PCIe |
| Cooling | Passive server cooling |

For AI performance, AMD lists up to 4.6 PFLOPs with MXFP4 and MXFP6. The card also supports the other reduced-precision formats used in modern enterprise AI, including MXFP8, FP16, BF16, and INT8, with sparsity support for some of them.

| Precision | Claimed performance |
| --- | --- |
| MXFP4 | 4.6 PFLOPs |
| MXFP6 | 4.6 PFLOPs |
| MXFP8 | 2.3 PFLOPs |
| FP16 | 1.15 PFLOPs |
| FP16 with sparsity | 2.3 PFLOPs |
| BF16 | 1.15 PFLOPs |
| BF16 with sparsity | 2.3 PFLOPs |
| INT8 | 2.3 POPs |
| INT8 with sparsity | 4.6 POPs |
| FP32 | 72 TFLOPs |
| FP64 | 36 TFLOPs |
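The claimed figures follow a regular pattern, which a short calculation makes explicit. The sketch below normalizes the listed matrix-format rates against dense FP16 and computes the gain from sparsity; the dictionary names are illustrative, and the values are copied from the table above rather than measured.

```python
# Claimed peak rates from the table, in PFLOPs (INT8 is in POPs and is
# included only so its ratio can be compared with the other formats).
claimed = {
    "MXFP4": 4.6,
    "MXFP6": 4.6,
    "MXFP8": 2.3,
    "FP16": 1.15,
    "BF16": 1.15,
    "INT8": 2.3,
}
claimed_sparse = {"FP16": 2.3, "BF16": 2.3, "INT8": 4.6}

# Each format's rate relative to dense FP16.
baseline = claimed["FP16"]
relative = {fmt: round(rate / baseline, 2) for fmt, rate in claimed.items()}

# Ratio of the sparse rate to the dense rate for formats that list both.
sparsity_gain = {
    fmt: round(claimed_sparse[fmt] / claimed[fmt], 2) for fmt in claimed_sparse
}

for fmt, ratio in relative.items():
    print(f"{fmt}: {ratio:.0f}x dense FP16")
for fmt, gain in sparsity_gain.items():
    print(f"{fmt} sparsity gain: {gain:.0f}x")
```

On the listed numbers, the 8-bit formats run at twice the dense FP16 rate, MXFP4 and MXFP6 at four times, and sparsity doubles the rate for each format that lists it.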

The MI350P is positioned against NVIDIA’s PCIe data center options, especially the H200 NVL, which includes 141GB of HBM3E. NVIDIA also has the RTX PRO 6000 Blackwell Server Edition, but that card uses 96GB of GDDR7 rather than HBM.

AMD is also leaning on its open software ecosystem. The MI350P supports ROCm and is being offered through partners with an enterprise-ready AI software stack.

The launch shows AMD trying to fill an important gap in its AI lineup. Large-scale AI systems may use bigger accelerator platforms, but many companies still want PCIe cards that can be added to existing servers. With 144GB of HBM3E, strong low-precision AI support, and a standard PCIe form factor, the MI350P is aimed at that middle ground between easier deployment and serious AI performance.
