AMD Granite Ridge “Zen 5” Processor Annotated



High-resolution die-shots of the AMD “Zen 5” 8-core CCD were released and annotated by Nemez, Fritzchens Fritz, and HighYieldYT. They give a detailed view of the silicon and its components, particularly the new “Zen 5” CPU core with its 512-bit FPU. The “Granite Ridge” package looks similar to “Raphael,” with one or two 8-core CPU complex dies (CCDs) depending on the processor model, and a centrally located client I/O die (cIOD). The cIOD is carried over from “Raphael,” which minimizes AMD's product development costs, at least for the uncore portion of the processor. The “Zen 5” CCD is built on the TSMC N4P (4 nm) foundry node.

On the “Granite Ridge” package, the two CCD positions sit closer to each other than the “Zen 4” CCDs did on “Raphael.” In the picture above, you can see the pad of the absent CCD behind the solder mask of the fiberglass substrate, close to the CCD that is present. The CCD contains 8 full-sized “Zen 5” CPU cores, each with 1 MB of L2 cache, and a centrally located 32 MB L3 cache that is shared among all eight cores. The only other components are an SMU (system management unit) and the Infinity Fabric over Package (IFoP) PHYs, which connect the CCD to the cIOD.
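The cache figures above add up as follows. This is only a back-of-the-envelope sketch in Python using the numbers quoted in the text (8 cores, 1 MB L2 per core, 32 MB shared L3, one or two CCDs); the function and constant names are made up for illustration.

```python
# Figures taken from the article: 8 cores per CCD, 1 MB L2 per core,
# and one 32 MB L3 shared by all eight cores.
CORES_PER_CCD = 8
L2_PER_CORE_MB = 1
L3_PER_CCD_MB = 32


def ccd_cache_mb() -> dict:
    """Return L2, L3, and combined cache capacity (in MB) for one CCD."""
    l2_total = CORES_PER_CCD * L2_PER_CORE_MB
    return {"l2": l2_total, "l3": L3_PER_CCD_MB, "total": l2_total + L3_PER_CCD_MB}


def package_cache_mb(ccd_count: int) -> int:
    """Combined L2 + L3 capacity (in MB) for a package with 1 or 2 CCDs."""
    if ccd_count not in (1, 2):
        raise ValueError("the package described here carries one or two CCDs")
    return ccd_count * ccd_cache_mb()["total"]


print(ccd_cache_mb())        # per-CCD breakdown
print(package_cache_mb(1))   # single-CCD model
print(package_cache_mb(2))   # dual-CCD model
```

By this arithmetic a single CCD carries 40 MB of L2 plus L3, and a dual-CCD package carries 80 MB. The per-core L1 caches are left out of the sum.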

Each “Zen 5” CPU core is physically larger than the “Zen 4” core (built on the TSMC N5 process), due to its 512-bit floating-point data-path. The core's Vector Engine is pushed to the very edge of the core, which on the CCD should correspond to the edges of the die. FPUs tend to be the hottest components of a CPU core, so this placement makes sense. The innermost component (facing the shared L3 cache) is the 1 MB L2 cache. AMD has doubled the bandwidth and associativity of this L2 cache compared to the one on the “Zen 4” core.
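To put the 512-bit data-path in perspective, the short Python sketch below counts how many floating-point elements fit in one vector of that width. The element sizes are the standard IEEE 754 widths; the 256-bit row is included purely as a comparison point and is an assumption of this example, not a figure from the article.

```python
# Width of the "Zen 5" floating-point data-path, per the article.
ZEN5_DATAPATH_BITS = 512
# Comparison width chosen for illustration only (not stated in the article).
COMPARISON_DATAPATH_BITS = 256

# Standard IEEE 754 element widths in bits.
ELEMENT_BITS = {"fp64": 64, "fp32": 32, "fp16": 16}


def lanes(datapath_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector."""
    if datapath_bits % element_bits != 0:
        raise ValueError("element width must divide the data-path width")
    return datapath_bits // element_bits


for name, bits in ELEMENT_BITS.items():
    wide = lanes(ZEN5_DATAPATH_BITS, bits)
    narrow = lanes(COMPARISON_DATAPATH_BITS, bits)
    print(f"{name}: {wide} lanes at 512-bit, {narrow} lanes at 256-bit")
```

A 512-bit vector holds 16 single-precision or 8 double-precision values, twice what a 256-bit path handles per pass, which helps explain why the vector hardware takes up so much of the core.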

The central region of the “Zen 5” core holds the 32 KB L1I cache, the 48 KB L1D cache, the Integer Execution engine, and the all-important front-end of the processor, with its Instruction Fetch & Decode, Branch Prediction unit, micro-op cache, and Scheduler.

The 32 MB on-die L3 cache has rows of TSVs (through-silicon vias) that make provision for stacked 3D V-cache. The 64 MB L3D (L3 cache die) connects with the CCD's ring bus through these TSVs, making the 64 MB 3D V-cache contiguous with the 32 MB on-die L3 cache.
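Since the stacked cache is contiguous with the on-die L3, the capacities simply add. A minimal Python sketch using the two figures from the text (the names are illustrative, and it assumes a single L3D stacked on the CCD):

```python
# Capacities quoted in the article, in MB.
ON_DIE_L3_MB = 32
STACKED_L3D_MB = 64


def l3_capacity_mb(has_vcache: bool) -> int:
    """L3 capacity seen by the cores of one CCD, with or without a stacked L3D."""
    return ON_DIE_L3_MB + (STACKED_L3D_MB if has_vcache else 0)


plain = l3_capacity_mb(False)
stacked = l3_capacity_mb(True)
print(f"without 3D V-cache: {plain} MB")
print(f"with 3D V-cache:    {stacked} MB ({stacked // plain}x)")
```

On those numbers, a CCD with the L3D attached exposes 96 MB of L3, three times the on-die capacity.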

Lastly, there is the client I/O die (cIOD). There is nothing new to report here, as the chip is carried over from “Raphael.” It is built on the TSMC N6 (6 nm) node. Nearly a third of the die area is taken up by the iGPU and its allied components, such as the media acceleration engine and display engine. The iGPU is based on the RDNA 2 graphics architecture and has just one workgroup processor (WGP), which works out to two compute units (CUs), or 128 stream processors. Other key components on the cIOD are the 28-lane PCIe Gen 5 interface, the two IFoP ports for the CCDs, a fairly large SoC I/O block consisting of USB 3.x and legacy connectivity, and the all-important DDR5 memory controller with its dual-channel (four sub-channel) memory interface.
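The iGPU and memory-interface figures can be checked with a little arithmetic. The Python sketch below is illustrative only. The 64 stream processors per CU follows from the article's own numbers (128 across two CUs), while the 32-bit sub-channel width is an assumption based on standard non-ECC DDR5, not something the article states.

```python
# iGPU layout from the article: 1 WGP holding 2 CUs, 128 stream processors total.
WGP_COUNT = 1
CUS_PER_WGP = 2
SP_PER_CU = 64  # implied by 128 stream processors across 2 CUs

# Memory interface from the article: dual-channel, four sub-channels in total.
SUBCHANNEL_COUNT = 4
SUBCHANNEL_BITS = 32  # assumption: standard non-ECC DDR5 sub-channel data width


def stream_processors(wgps: int = WGP_COUNT) -> int:
    """Total stream processors for a given number of workgroup processors."""
    return wgps * CUS_PER_WGP * SP_PER_CU


def memory_bus_bits(subchannels: int = SUBCHANNEL_COUNT) -> int:
    """Aggregate data-bus width in bits across all DDR5 sub-channels."""
    return subchannels * SUBCHANNEL_BITS


print(stream_processors())  # stream processors in the iGPU
print(memory_bus_bits())    # total memory data-bus width in bits
```

This gives 128 stream processors and, under the stated sub-channel assumption, a 128-bit aggregate memory bus, the same total width as a conventional dual-channel desktop interface.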