December 3, 2021

Intel Golden Cove Core Architecture Deep Dive: vs Zen 3 and Sunny Cove

At its Architecture Day 2021, Intel unveiled its Golden Cove core, which represents the next step in its high-performance CPU architecture. Golden Cove succeeds Willow Cove and will compete against AMD’s Zen 3 and Zen 4-based processors. As such, we’ll be comparing it against these cores, as well as analyzing what has changed compared with its predecessors.

Golden Cove is based on Intel’s 10nm Enhanced SuperFin node (now renamed Intel 7) and powers the Alder Lake and Sapphire Rapids-SP lineups. It has undergone major changes compared with Willow Cove, most of which can be seen as a direct response to Apple’s competing Firestorm cores. With both Golden Cove and Gracemont, Intel has widened the back-end and front-end, improved the out-of-order (OoO) capabilities, and focused more on power efficiency and real-world performance.

Intel Golden Cove Core Architecture vs Zen 3 vs Sunny Cove

First up, we have the Golden Cove front-end. From the top down, the L1 instruction cache is unchanged at 32KB (the same as Zen 3 and Sunny Cove), but the associated instruction translation lookaside buffer (iTLB) has been upgraded. It has doubled from 128 to 256 entries for 4K pages, and from 16 to 32 entries for 2M/4M huge pages. The accompanying branch target buffer (BTB) has more than doubled, growing from 5K to 12K entries. In comparison, Zen 3 features 6.5K entries, and Sunny Cove is limited to just 5K.
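
One way to see what doubling the iTLB buys is its "reach": entries multiplied by the page size each entry covers. The sketch below is a back-of-the-envelope calculation using the entry counts quoted above; it assumes the large-page entries map 2 MiB pages (4 MiB pages would double that figure) and treats the 4K and large-page entries as separate arrays.

```python
# Minimal sketch: TLB reach = number of entries x page size per entry.
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space the TLB can map without a miss."""
    return entries * page_size


# Entry counts as quoted in the text; large-page entries assumed to map 2 MiB pages.
itlb = {
    "predecessor": {"4K": 128, "2M": 16},
    "Golden Cove": {"4K": 256, "2M": 32},
}

for core, entries in itlb.items():
    small_kib = tlb_reach(entries["4K"], 4 * KIB) // KIB
    large_mib = tlb_reach(entries["2M"], 2 * MIB) // MIB
    print(f"{core}: {small_kib} KiB with 4K pages, {large_mib} MiB with 2M pages")
```

Doubling the entries doubles the reach: from 512 KiB to 1 MiB of code with 4K pages, and from 32 MiB to 64 MiB with 2M pages.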

The reason for the increased focus on branching is rather simple: a wider front-end only pays off if branch prediction keeps it supplied with the right instructions, and the decoders are much wider on Golden Cove. It has a 6-wide decoder, the widest of any x86 core (Zen 3 is 4-wide), plus an additional 1:4 complex decoder, two more than Willow Cove. Wider decoders mean more power and a higher latency penalty, so Intel is leaning more heavily on the micro-op cache. It has nearly doubled, from 2.25K entries on Willow Cove to 4K entries, putting it on par with Zen 3. According to Intel, the decoders are clock-gated 80% of the time, with the back-end fed mostly from the micro-op cache instead.
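
Why a bigger micro-op cache lets the decoders sleep can be illustrated with a toy model. The sketch below is a hypothetical LRU cache keyed by instruction address (`ToyUopCache` and `decoder_active_fraction` are illustrative names, not Intel's design; real op caches are organized by fetch windows and ways). It counts how many instructions had to go through the legacy decoders: a hot loop that fits in the cache is decoded once, while one that does not fit misses on every pass.

```python
from collections import OrderedDict


class ToyUopCache:
    """Hypothetical LRU micro-op cache; models only hits and misses."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.decodes = 0  # instructions that had to use the legacy decoders

    def fetch(self, addr: int) -> None:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh LRU position
            self.hits += 1  # hit: decoders can stay clock-gated
            return
        self.decodes += 1  # miss: decode, then fill the cache
        self.lines[addr] = f"uops@{addr:#x}"
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used entry


def decoder_active_fraction(trace, capacity: int) -> float:
    cache = ToyUopCache(capacity)
    for addr in trace:
        cache.fetch(addr)
    return cache.decodes / len(trace)


# A hot loop of 20 instructions executed 10 times.
loop = [0x1000 + 4 * i for i in range(20)] * 10
print(decoder_active_fraction(loop, capacity=4096))  # 0.1: decoded only on the first pass
print(decoder_active_fraction(loop, capacity=16))  # 1.0: LRU thrashes, every fetch misses
```

The 80% clock-gating figure is Intel's; this model only shows the mechanism, not that number.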

To feed the wider decoder, instruction fetch has doubled from 16 bytes per cycle on Sunny Cove to 32 bytes on Golden Cove, once again putting it on par with Zen 3. The op cache can now deliver 8 decoded instructions per cycle to the micro-op queue, much like (yes) Zen 3, compared with 6 on Sunny Cove.
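
The need for a wider fetch follows from simple arithmetic. x86 instructions vary from 1 to 15 bytes, so the sketch below assumes a rough average of 4 bytes per instruction (an assumption, not a figure from Intel) and checks whether each fetch width can keep a 6-wide decoder busy.

```python
def instructions_per_fetch(fetch_bytes: int, avg_insn_bytes: float) -> float:
    """Upper bound on instructions one fetch block can supply per cycle."""
    return fetch_bytes / avg_insn_bytes


AVG_X86_INSN_BYTES = 4.0  # assumption: rough average; real code varies widely
DECODE_WIDTH = 6

for core, fetch in [("Sunny Cove", 16), ("Golden Cove", 32)]:
    ipf = instructions_per_fetch(fetch, AVG_X86_INSN_BYTES)
    verdict = "enough" if ipf >= DECODE_WIDTH else "not enough"
    print(f"{core}: {fetch} B/cycle -> ~{ipf:.0f} instructions, {verdict} for a 6-wide decoder")
```

Under that assumption, 16 bytes supplies only about 4 instructions per cycle, while 32 bytes supplies about 8, leaving headroom for longer instructions.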

Golden Cove frontend

To keep up with the wider decoder and the op cache, the micro-op queue has also been enlarged. For single-threaded applications not using Hyper-Threading (SMT), the uop queue has grown from 70 to 144 entries, as the lone thread can use the resources of both threads. With SMT enabled, the per-thread queue depth has increased from 70 on Sunny Cove to 72 on Golden Cove.
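
The 144 and 72 figures are consistent with an even static split of one 144-entry queue between two SMT threads. The sketch below models that policy; the even split and the helper name `uop_queue_entries` are assumptions for illustration, not a description of Intel's hardware.

```python
def uop_queue_entries(total_entries: int, active_threads: int) -> int:
    """Entries available to each thread under an assumed even static split.

    With one active thread, it gets the whole queue; with two SMT threads,
    each gets half.
    """
    if active_threads not in (1, 2):
        raise ValueError("a core runs one or two hardware threads")
    return total_entries // active_threads


GOLDEN_COVE_UOP_QUEUE = 144
print(uop_queue_entries(GOLDEN_COVE_UOP_QUEUE, 1))  # 144
print(uop_queue_entries(GOLDEN_COVE_UOP_QUEUE, 2))  # 72
```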

Golden Cove: Allocation, Scheduling and OoO Execution

Moving on to allocation and the scheduler, this is where things differ between Zen 3 and Golden Cove. Intel has widened its latest core architecture to 6-wide allocation and 12 execution ports. Sunny Cove had 5-wide allocation and 10 execution ports, whereas Zen 3 splits its FP and INT units (see the diagram above). The latter can schedule up to eight integer and six FP instructions simultaneously.

On the downside for AMD, its OoO capabilities are much less robust than Intel’s. While Zen 3 is limited to a 256-entry reorder buffer (ROB), Golden Cove gets a 512-entry ROB, a big increase over Willow Cove’s 352-entry buffer. A larger ROB usually comes with a notable increase in die area and power consumption, so it’s rather surprising that Intel managed to grow it without a node shrink.
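
A quick way to see why ROB size matters: if the oldest instruction stalls (for example on a cache miss), the core can keep allocating younger instructions only until the ROB fills. Dividing ROB entries by allocation width therefore gives a rough upper bound on how many cycles of stall the core can overlap with useful work. The sketch uses the ROB sizes above; Golden Cove's 6-wide and Sunny Cove's 5-wide allocation come from the text, while applying Sunny Cove's width to Willow Cove and assuming 6-wide dispatch for Zen 3 are labeled assumptions.

```python
def cycles_to_fill_rob(rob_entries: int, alloc_width: int) -> float:
    """Cycles until the ROB fills if the oldest instruction never retires."""
    return rob_entries / alloc_width


cores = {
    "Golden Cove": (512, 6),
    "Willow Cove": (352, 5),  # assumption: same 5-wide allocation as Sunny Cove
    "Zen 3": (256, 6),  # assumption: 6-wide dispatch
}

for name, (rob, width) in cores.items():
    print(f"{name}: up to ~{cycles_to_fill_rob(rob, width):.0f} cycles of stall covered")
```

This ignores other limits (load/store buffers, scheduler and register file sizes) that can fill before the ROB does, so it is an upper bound rather than a prediction.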

The changes on the back-end are less noticeable. On the FP side, we’re looking at two fast FADD units, which are, once again, a first for an x86 core. Both Zen 3 and Willow Cove lack these fast adders, which are supposed to be more power-efficient and faster than regular adders. The FMA units with FP16 support are limited to the server Sapphire Rapids core due to the lack of AVX-512 support on Alder Lake.
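
Faster adders matter most for dependent chains such as a naive running sum (`s += a[i]`), where each add must wait for the previous result, so total time scales with add latency rather than throughput. The sketch below uses hypothetical latencies purely for illustration; they are not measured Golden Cove, Willow Cove, or Zen 3 figures.

```python
def dependent_chain_cycles(n_adds: int, latency: int) -> int:
    """Each add waits for the previous result, so latencies add up."""
    return n_adds * latency


# Hypothetical latencies for illustration only; not measured figures.
SLOW_ADD_LATENCY = 4
FAST_ADD_LATENCY = 2

n = 1000  # e.g. a naive floating-point sum over 1000 elements
slow = dependent_chain_cycles(n, SLOW_ADD_LATENCY)
fast = dependent_chain_cycles(n, FAST_ADD_LATENCY)
print(slow, fast, slow / fast)  # 4000 2000 2.0
```

Compilers can break such chains with multiple accumulators, but only when reassociating floating-point adds is permitted, which is why add latency still shows up in real code.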

Integer execution (as seen in a recently leaked benchmark) is getting more attention. Golden Cove gains an additional port (Port 10) compared with Willow Cove. Furthermore, LEA (load effective address) instructions are now single-cycle across all five ALU ports. This puts Golden Cove on par with Zen 3 in terms of integer ALU execution ports, with the added advantage of faster shift/LEA instructions.
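
LEA computes an x86 effective address, `base + index*scale + displacement`, without accessing memory, which is why compilers also use it for cheap multiply-and-add arithmetic. Making it single-cycle on every ALU port therefore speeds up ordinary integer code, not just address math. The Python function below is a minimal model of that arithmetic (the name `lea` and its parameters are illustrative), with results wrapped to a 64-bit register.

```python
def lea(base: int, index: int = 0, scale: int = 1, disp: int = 0, bits: int = 64) -> int:
    """Model of x86 LEA: base + index*scale + disp, with no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    # Wrap the result like a fixed-width register.
    return (base + index * scale + disp) % (1 << bits)


x = 7
# lea rax, [rdi + rdi*4] computes x * 5 in one instruction.
print(lea(x, x, 4))  # 35
# Address of element 3 of a 64-bit array starting at 0x1000.
print(hex(lea(0x1000, 3, 8)))  # 0x1018
```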

Finally, in terms of the AGUs, Golden Cove gets an additional load port. Overall, Golden Cove can perform three 256-bit loads or two 512-bit loads, in addition to two 256-bit stores per cycle. That’s a fair bit more than Zen 3, which is capable of three loads (two if 256-bit) or two stores (one if 256-bit) per cycle. The Sunny Cove core can do 2x 256b loads and 2x 256b stores at the same time thanks to its two dedicated ports and the buffer width. It also has much larger load and store buffers, at 128 and 72 entries, respectively. Skylake, on the other hand, had 72 entries in the load buffer and 56 in the store buffer. Zen 3 is limited to just 72 load and 64 store entries.
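
Converting those per-cycle figures into bytes makes the gap easier to compare. The sketch below computes peak load and store bandwidth per cycle from the counts and widths quoted above, using the 256-bit case for Zen 3. It is a peak-rate calculation only and ignores whether a core can sustain loads and stores in the same cycle.

```python
def bytes_per_cycle(ops: int, width_bits: int) -> int:
    """Peak bytes moved per cycle by `ops` accesses of `width_bits` each."""
    return ops * width_bits // 8


# ((loads, load width), (stores, store width)) per cycle, as quoted in the text.
cores = {
    "Golden Cove (256-bit)": ((3, 256), (2, 256)),
    "Golden Cove (512-bit)": ((2, 512), (2, 256)),
    "Sunny Cove": ((2, 256), (2, 256)),
    "Zen 3": ((2, 256), (1, 256)),
}

for name, ((ld, ld_w), (st, st_w)) in cores.items():
    print(f"{name}: {bytes_per_cycle(ld, ld_w)} B/cycle load, "
          f"{bytes_per_cycle(st, st_w)} B/cycle store")
```

That works out to 96 (or 128 with 512-bit loads) bytes of loads per cycle for Golden Cove, versus 64 for Sunny Cove and Zen 3.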

Conclusion: 19% IPC Gain, But Will It Be Enough?

Intel promises a 19% IPC gain over Cypress Cove and Willow Cove. But given the sheer number of changes to the core architecture, this seems rather tame. More importantly, Golden Cove appears to be only just ahead of Zen 3 in raw performance. How it holds up against Zen 4 and, more importantly, Zen 3D is what will matter in the end. Both Ryzen 5000XT/6000 and Milan 3D are expected to launch in the coming months as a response to Sapphire Rapids and Alder Lake.
