AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4
Each MCD has 16 MB of Infinity Cache (L3 cache); and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU’s 384-bit GDDR6 memory interface. In the scheme of things, the GPU has a contiguous and monolithic 384-bit wide memory bus, because every modern GPU uses multiple on-die memory controllers to achieve a wide memory bus. “Navi 31” hence has a total Infinity Cache size of 96 MB—which may be less in comparison to the 128 MB on “Navi 21,” but AMD has shored up cache sizes across the GPU. The L0 caches on the compute units is now increased numerically by 240%. The L1 caches by 300%, and the L2 cache shared among the shader engines, by 50%. The RX 7900 XTX is confined to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.
The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which work out to 1,024 stream processors. AMD claims to have doubled the IPC of these stream processors over RDNA2. The new RDNA3 ALUs also support BF16 instructions. The SIMD engine of “Navi 31” has an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of the “Navi 21.” The slide doesn’t quite detail the new Ray Tracing engine, but references new RT features, larger caches, 50% higher ray intersection rate, for an up to 1.8X RT performance increase at 2.505 GHz engine clocks; over the RX 6950 XT. There are other major upgrades to the GPU’s raster 3D capabilities, including a 50% increase in prim/clk rates, and 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock; and synchronous pixel-wait.