New HPC / AI servers based on the AMD Instinct MI300 series

With the MI300X series, a new contender has entered the GPU market and is set to reshuffle the cards in the AI sector.

AMD Instinct™ MI300X GPU accelerators are designed for superior performance in generative AI workloads and HPC applications. AMD's CDNA 3 architecture is specifically optimised for high-performance computing (HPC) and data centres, which sets it apart from the RDNA architecture, which is aimed at gaming and consumer graphics. The MI300X offers 19,456 stream processors (cores) and 192 GB of ECC memory, and it has a significantly better price-performance ratio than the competition.

Gigabyte G593-ZX1 | Dual AMD EPYC 5U Mainstream HPC/AI Server

Supports AMD Instinct™ MI300X accelerators

Gigabyte G593-ZX2 | Dual AMD EPYC 5U Mainstream HPC/AI Server

Supports AMD Instinct™ MI300X accelerators

AMD MI300 compared to NVIDIA H100/H200

On paper

On raw numbers, the AMD MI300X outperforms the NVIDIA H100 with 30% more FP8 FLOPS, 60% more memory bandwidth and more than double the memory capacity. The fairer comparison, however, is with the newer NVIDIA H200, against which the MI300X's lead shrinks to single digits in memory bandwidth and to less than 40% in memory capacity.

In the real world - benchmarks*

Llama 2 70B Benchmark

This is a realistic inference benchmark for most use cases. AMD has a 40% latency advantage, which is plausible given its 60% bandwidth advantage over the H100, since inference of this kind is typically limited by memory bandwidth. As the yet-to-be-released H200 comes close to the AMD GPU in terms of bandwidth, it can be expected to perform similarly. However, a considerable price advantage in favour of the AMD architecture is also to be expected.
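The link between bandwidth and latency can be illustrated with a rough back-of-the-envelope model. The sketch below assumes that each generated token requires streaming all model weights from memory once, so that latency scales inversely with memory bandwidth. The bandwidth figures are assumed peak datasheet values and the function names are purely illustrative; none of them come from the benchmark itself, and real deployments shard a model of this size across several GPUs.

```python
# Peak memory bandwidth in TB/s. These are assumed datasheet values,
# not figures taken from the benchmark discussed above.
MI300X_BANDWIDTH_TBPS = 5.3
H100_BANDWIDTH_TBPS = 3.35


def decode_time_per_token(weight_bytes: float, bandwidth_tbps: float) -> float:
    """Lower bound (seconds) for one decode step that reads all weights once."""
    return weight_bytes / (bandwidth_tbps * 1e12)


def latency_reduction(fast_tbps: float, slow_tbps: float) -> float:
    """Fractional latency reduction of the faster device if purely bandwidth-bound."""
    return 1.0 - slow_tbps / fast_tbps


# Llama 2 70B with 16-bit weights: roughly 70e9 parameters * 2 bytes each.
weight_bytes = 70e9 * 2

t_mi300x = decode_time_per_token(weight_bytes, MI300X_BANDWIDTH_TBPS)
t_h100 = decode_time_per_token(weight_bytes, H100_BANDWIDTH_TBPS)
reduction = latency_reduction(MI300X_BANDWIDTH_TBPS, H100_BANDWIDTH_TBPS)

print(f"bandwidth advantage: {MI300X_BANDWIDTH_TBPS / H100_BANDWIDTH_TBPS - 1:.0%}")
print(f"latency reduction:   {reduction:.0%}")
```

Under these assumptions, a bandwidth lead of just under 60% translates into a latency reduction in the high 30% range, which is in the same ballpark as the reported figure. The model ignores compute time, interconnect traffic and software efficiency, so it should be read as a plausibility check only.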

Bloom

In this benchmark, the MI300X outperforms the H100 in throughput by a factor of 1.6. The result is impressive, but possibly misleading. The model used for this benchmark is very large and a long input sequence was used. The system with the smaller memory (H100) is forced to work with a much smaller batch size, as the KV cache fills the available memory. The system with the larger memory (MI300X) can use a larger batch size and thus make full use of its computing power. While this is a genuine advantage, and the throughput-centred scenario is real rather than theoretical, there are other scenarios where the performance gap is significantly smaller.

*The stated values and categorisations refer to data published by AMD.
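The memory effect described for the Bloom benchmark can be made concrete with a simple sizing sketch. It estimates how many sequences fit into GPU memory once the model weights are loaded, assuming 16-bit weights and cache values, standard multi-head attention, and an 8-GPU server for both vendors. The sequence length, the rounded parameter count and the helper names are assumptions for illustration; AMD's actual benchmark configuration is not given here.

```python
GB = 10**9  # decimal gigabytes, as used on datasheets


def kv_cache_bytes_per_token(n_layers: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes of keys plus values stored per token (standard multi-head attention)."""
    return 2 * n_layers * hidden_size * bytes_per_value


def max_batch_size(total_memory_gb: int, n_params: int, seq_len: int,
                   n_layers: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Largest number of sequences whose KV cache fits next to the weights.

    Ignores activations, fragmentation and framework overhead,
    so the result is an optimistic upper bound.
    """
    weight_bytes = n_params * bytes_per_value
    free_bytes = total_memory_gb * GB - weight_bytes
    per_sequence = seq_len * kv_cache_bytes_per_token(n_layers, hidden_size, bytes_per_value)
    return max(0, free_bytes // per_sequence)


# BLOOM-176B shape with a rounded parameter count.
BLOOM = dict(n_params=176 * 10**9, n_layers=70, hidden_size=14336)
SEQ_LEN = 2048  # hypothetical sequence length, not taken from the benchmark

# Assumed 8-GPU servers: 8 x 80 GB (H100) versus 8 x 192 GB (MI300X).
h100_batch = max_batch_size(8 * 80, seq_len=SEQ_LEN, **BLOOM)
mi300x_batch = max_batch_size(8 * 192, seq_len=SEQ_LEN, **BLOOM)

print(f"H100 x8:   up to {h100_batch} sequences")
print(f"MI300X x8: up to {mi300x_batch} sequences")
```

With these inputs, the larger memory pool leaves room for roughly four times as many concurrent sequences. This is why a throughput benchmark with a large model and long inputs favours the card with more memory, and why the gap narrows for smaller models or shorter sequences.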
