AMD has introduced the Instinct MI350P, its first PCIe-based Instinct accelerator in years. The new card is built for enterprise AI workloads and is designed to fit into standard air-cooled servers without requiring a full infrastructure change.
The MI350P gives companies a simpler way to add AI compute to existing data centers. Instead of requiring a more complex platform built around large accelerator trays, the card uses a dual-slot PCIe design that can be deployed more easily in current server racks.
The accelerator is based on AMD’s CDNA 4 architecture and uses TSMC 3nm process technology for its compute dies. It has four XCDs (Accelerator Complex Dies), half as many as the larger MI350X configuration. It also includes a single 6nm I/O die.

The card includes 128 compute units, 8,192 stream processors, 512 matrix cores, and peak clocks up to 2.2GHz. AMD says the chip contains 73 billion transistors.
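
Those headline counts are easy to sanity-check against one another. The short Python sketch below divides the totals quoted above, together with the four XCDs, to get per-XCD and per-CU figures. The helper name `per_unit` is purely illustrative, and the per-unit numbers are derived here from the article's totals rather than quoted by AMD.

```python
# Totals quoted in the article for the MI350P.
XCDS = 4
COMPUTE_UNITS = 128
STREAM_PROCESSORS = 8_192
MATRIX_CORES = 512


def per_unit(total: int, units: int) -> int:
    """Return total / units, raising if the split is not even."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} does not divide evenly across {units} units")
    return quotient


cus_per_xcd = per_unit(COMPUTE_UNITS, XCDS)
sps_per_cu = per_unit(STREAM_PROCESSORS, COMPUTE_UNITS)
matrix_cores_per_cu = per_unit(MATRIX_CORES, COMPUTE_UNITS)

print(f"Compute units per XCD:    {cus_per_xcd}")          # 32
print(f"Stream processors per CU: {sps_per_cu}")           # 64
print(f"Matrix cores per CU:      {matrix_cores_per_cu}")  # 4
```

The totals split evenly at every level, which is what you would expect if the card is a straightforward half-size version of the larger part.
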
Memory is one of the biggest strengths of the MI350P. The card includes 144GB of HBM3E on a 4,096-bit bus, delivering up to 4TB/s of bandwidth. It also includes 128MB of Infinity Cache. The larger MI350X doubles the memory to 288GB of HBM3E, but the MI350P is aimed at PCIe server deployments where easier installation matters.
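
To put the memory figures in perspective, the sketch below works out two derived numbers from the quoted bus width, bandwidth, and capacity: the per-pin data rate those figures imply, and how long one full pass over the 144GB would take at peak bandwidth. It assumes decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes) and treats "up to 4TB/s" as exactly 4TB/s; neither result is an AMD-published number.

```python
# Figures quoted in the article; decimal units are an assumption.
BUS_WIDTH_BITS = 4096
PEAK_BANDWIDTH_BYTES_PER_S = 4e12   # "up to 4TB/s"
CAPACITY_BYTES = 144e9              # 144GB of HBM3E


def implied_pin_rate_gbps(bandwidth_bytes_per_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gb/s) implied by total bandwidth and bus width."""
    bits_per_second = bandwidth_bytes_per_s * 8
    return bits_per_second / bus_width_bits / 1e9


def full_sweep_ms(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time (ms) to read the whole memory once at peak bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s * 1e3


pin_rate_gbps = implied_pin_rate_gbps(PEAK_BANDWIDTH_BYTES_PER_S, BUS_WIDTH_BITS)
sweep_ms = full_sweep_ms(CAPACITY_BYTES, PEAK_BANDWIDTH_BYTES_PER_S)

print(f"Implied data rate: {pin_rate_gbps:.2f} Gb/s per pin")  # about 7.81
print(f"Full memory sweep: {sweep_ms:.0f} ms")                  # about 36
```

The sweep time is a rough lower bound for any workload that has to touch every byte of memory per step, such as reading a model's full weights once per generated token.
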
Here is a quick look at the main specs:

| Feature | AMD Instinct MI350P |
|---|---|
| Architecture | CDNA 4 |
| Process | TSMC 3nm compute dies, 6nm I/O die |
| Compute units | 128 |
| Stream processors | 8,192 |
| Matrix cores | 512 |
| Memory | 144GB HBM3E |
| Memory bandwidth | Up to 4TB/s |
| Infinity Cache | 128MB |
| Peak clock | Up to 2.2GHz |
| Power | 600W, configurable down to 450W |
| Form factor | Dual slot PCIe |
| Cooling | Passive server cooling |

For AI performance, AMD lists up to 4.6 PFLOPs with MXFP4 and MXFP6. The card also supports the other formats common in modern enterprise AI, including MXFP8, FP16, BF16, and INT8, with sparsity figures listed for FP16, BF16, and INT8.

| Precision | Claimed performance |
|---|---|
| MXFP4 | 4.6 PFLOPs |
| MXFP6 | 4.6 PFLOPs |
| MXFP8 | 2.3 PFLOPs |
| FP16 | 1.15 PFLOPs |
| FP16 with sparsity | 2.3 PFLOPs |
| BF16 | 1.15 PFLOPs |
| BF16 with sparsity | 2.3 PFLOPs |
| INT8 | 2.3 POPs |
| INT8 with sparsity | 4.6 POPs |
| FP32 | 72 TFLOPs |
| FP64 | 36 TFLOPs |
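
Peak throughput only matters if the memory system can keep the math units fed. As a rough illustration, the sketch below applies a simple roofline model to the dense figures in the table and the quoted 4TB/s of bandwidth. The "ridge point" is the number of operations a kernel must perform per byte read from memory before it stops being bandwidth-bound. The roofline model is a simplification that ignores caches, and the function names are illustrative only; none of these derived values come from AMD.

```python
# Dense peak figures from the table above, in operations per second.
PEAK_OPS_PER_S = {
    "MXFP4": 4.6e15,
    "MXFP8": 2.3e15,
    "FP16": 1.15e15,
    "FP32": 72e12,
    "FP64": 36e12,
}
MEMORY_BANDWIDTH_BYTES_PER_S = 4e12  # "up to 4TB/s", decimal units assumed


def ridge_point(precision: str) -> float:
    """Ops per byte needed before a kernel becomes compute-bound."""
    return PEAK_OPS_PER_S[precision] / MEMORY_BANDWIDTH_BYTES_PER_S


def attainable_ops_per_s(precision: str, ops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute and memory ceilings."""
    memory_ceiling = MEMORY_BANDWIDTH_BYTES_PER_S * ops_per_byte
    return min(PEAK_OPS_PER_S[precision], memory_ceiling)


for name in PEAK_OPS_PER_S:
    print(f"{name:>6}: ridge point ~{ridge_point(name):.1f} ops/byte")

# Example: an FP16 kernel doing 100 ops per byte is still memory-bound.
fp16_at_100 = attainable_ops_per_s("FP16", 100)
print(f"FP16 at 100 ops/byte: {fp16_at_100 / 1e12:.0f} TFLOPs attainable")
```

Under these assumptions, the low-precision formats need hundreds of operations per byte to approach their quoted peaks, which is why such figures are best read as upper bounds rather than typical throughput.
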
The MI350P is positioned against NVIDIA’s PCIe data center options, especially the H200 NVL, which includes 141GB of HBM3E. NVIDIA also has the RTX PRO 6000 Blackwell Server Edition, but that card uses 96GB of GDDR7 rather than HBM.
AMD is also leaning on its open software ecosystem. The MI350P supports ROCm and is being offered through partners with an enterprise-ready AI software stack.
The launch shows AMD trying to fill an important gap in its AI lineup. Large-scale AI systems may use bigger accelerator platforms, but many companies still want PCIe cards that can be added to existing servers. With 144GB of HBM3E, strong low-precision AI support, and a standard PCIe form factor, the MI350P is aimed at that middle ground between easier deployment and serious AI performance.