With Intel’s foundry still trying to match the process and packaging offered by archrival Taiwan Semiconductor Manufacturing Co, Intel’s server CPU product line will have to make do with what the foundry has and create products that deliver the right mix of performance and price to compete with CPU rival AMD in the X86 space and with the Arm collective, which is creating a new CPU tier in the data center.
And so Intel has decided to split its product line into machines that use true Xeon cores – known as P-cores, short for performance cores – and machines that use Atom cores – known as E-cores, short for energy-efficient cores. This isn’t so much a new split of the Intel Xeon product line as a hardening of principles Intel has held for more than a decade. (The “Knights Landing” detour notwithstanding.)
We grew up in Appalachia, and we live in the mountains again after living in New York City for thirty years, and we understand that under the right circumstances – or rather, the wrong ones – a fork can be just as dangerous as a knife. (Look how well Aquaman does with a golden fork.) You have to sharpen a spoon on the stone wall for a long time, but you can make that dangerous, too. . . .
This time, Intel isn’t making a toy server CPU based on Atom-style cores and limiting main memory and I/O expansion, hoping that companies will buy a lot of them and cram them into racks like canned goods for the winter. Instead, Intel is bundling much larger numbers of Atom cores into a real server socket, with real memory and I/O capabilities, that plugs into standard Xeon server platforms to deliver excellent price/performance and thermal properties for high-throughput workloads where a standard Xeon P-core with HyperThreading just won’t cut it.
In the long term – meaning over the next five years or so – the market will decide whether having two radically different cores with virtually the same instruction set can compete with two more similar cores with different layouts and half the L3 cache per core. The latter is AMD’s strategy, which uses a more subtle distinction between its standard Zen cores, such as the Zen 4 cores used in the “Genoa” variants of the Epyc 9000 series, and the high core count “Bergamo” and low thermal “Siena” server CPUs based on the Zen 4c cores.
The thing to remember is that while AMD has a 33 percent share of X86 server CPUs today, as Lisa Su pointed out in her keynote address at Computex 2024 yesterday, Intel still has the remaining 67 percent – and that’s with one arm, its foundry branch, tied behind its back by its own choosing. But Intel is coming off the ropes.
Intel will have its foundry in order around 2025, and it has plenty of good architects who can deliver excellent CPU designs and perhaps even a competitive GPU with its “Falcon Shores” effort. The company is working on better yields for its packaging. Intel will compete, and life will become more difficult for AMD and the Arm collective. The two flavors of the Xeon 6 are steps toward closing the server CPU gap for Intel. A year and a half from now, this will be a real knife fight, and we expect market shares in the X86 space to be much more even. And it won’t be long before Arm has a 20 percent share of total server shipments and RISC-V starts to gain some adherents here and there.
This CPU battle in the data center is far from over.
Two goals, one architecture
Intel has been talking about this E-core and P-core strategy for a while, but it’s worth addressing some of the central tenets before we dive into the first batch of Sierra Forest chips that Intel is talking about. There will be others. Intel isn’t doing a big bang launch of its entire product line at once, and we suspect this reflects capacity limitations on the Intel 7 and Intel 3 processes used to make the Sierra Forest chips.
The above graph, an amalgam we made of two Intel graphs, says that the P-core variant of the Xeon 6 is aimed at AI workloads, but also at HPC simulation and modeling and indeed any kind of workload where a stronger core is a better option than a weaker core. AI is just one type of compute-intensive workload, and admittedly, it could be the most interesting one for companies considering taking pretrained generative AI models and retraining them with their own data to run on-premises AI workloads on their CPU fleets.
Since the E-core chips don’t have AVX-512 vector units or AMX matrix math units, they can’t really do much in the way of AI or HPC processing. They are actually designed for application, print, file and web serving, and in some cases the E-core variants can be good for other types of microservices applications where code chunks are quite modest. Video streaming, media transcoding and other forms of data streaming are ideal for the E-core machines, Intel says.
In both the E-core and P-core designs, the memory and I/O controllers and the UltraPath Interconnect (UPI) links for NUMA shared memory clustering of CPUs are separated from the cores, which are located on one, two, or three banks of chiplets. The “Sapphire Rapids” Xeon SP v4 launched in January 2023 had everything on each chiplet and integrated four of them to create a socket. With the “Emerald Rapids” Xeon SP v5 launched in December 2023, Intel went back to two chiplets with slightly more aggregate cores, but all controllers were still on the same chiplets as the cores. There were also monolithic, single-chiplet implementations of the Sapphire Rapids and Emerald Rapids chips for low and medium core count devices.
The core complexes of Sapphire Rapids and Emerald Rapids.
The Xeon 6 processors will be available in two package families, called the 6700 and the 6900, which will be further differentiated in the use of E-core and P-core tiles. There’s no Xeon 6 that will combine E-core and P-core chiplets in the same package, but presumably Intel would build it if someone wanted such a beast.
Here are the specifications of the 6700 series and the 6900 series:
Essentially, the 6700 series creates sockets with a “virtual” low core count (LCC), high core count (HCC), and extreme core count (XCC) chip, stitched together with EMIB packaging. There does not seem to be a medium core count (MCC) variant.
This is what the Xeon 6 6700 series die packages look like:
And this is what the 6900 series die packages look like:
The rollout of the Xeon 6 family of server CPUs will be staggered, and this is based on customer feedback, according to Intel. The lower core count Sierra Forest E-core chips come out first, followed by the higher-end Granite Rapids P-core chips in the third quarter:
The fatter Sierra Forest chips with up to 288 cores will be released in the first quarter of next year, as will the Granite Rapids chips in the 6300, 6500, and 6700 series. There will also be an SoC variant of the Granite Rapids chip, most likely for edge use cases where beefy cores and vector and matrix math units are needed for AI inference processing.
There has never been a high-performance Atom machine from Intel before, so it’s difficult to make comparisons with the current Xeon SP and future P-core Xeon 6 machines. In its presentations, Intel compares the Sierra Forest Xeon 6 6700 chips with the second generation “Cascade Lake” Xeon SP chips. Based on Intel’s benchmarks and our own analysis, we agree that the instructions per clock of the Atom-based E-core are about the same for integer work as those of the Cascade Lake Xeon SPs. If you do the math, an E-core in Sierra Forest provides about 65 percent of the performance of an Emerald Rapids P-core. It all checks out.
We’ll delve deeper into the architecture of the Xeon 6 6700E family, but in the meantime, here’s the rather modest SKU stack, which only has seven variants:
In the first quarter of 2025, Intel will double the performance of the Sierra Forest chips, using two compute tiles and two I/O and memory controller tiles to create the Xeon 6 6900E, known as the ZCC package, which will have up to 288 cores.
Of course, if you pay per core for your software, the E-core variants can be a difficult sell. But if you write your own microservices software or pay per socket, then software price is not an issue and an E-core Xeon 6 could be the solution when it comes to reducing thermal energy and costs while getting acceptable throughput.
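To make the per-core licensing point concrete, here is a back-of-the-envelope sketch in Python. The license fee and core counts are invented for illustration; the only figure taken from the article is the roughly 65 percent per-core throughput of an E-core versus a P-core.

```python
# Back-of-the-envelope sketch: per-core software licensing penalizes many
# weaker cores versus fewer stronger ones. All figures are hypothetical,
# except the ~65 percent E-core throughput ratio cited in the article.
license_per_core = 100.0                # hypothetical license fee per core

e_cores, e_core_rel_perf = 144, 0.65    # E-core at ~65 percent of a P-core
p_cores, p_core_rel_perf = 64, 1.00     # hypothetical P-core socket

# License cost divided by delivered throughput, for each socket
e_license_per_perf = (e_cores * license_per_core) / (e_cores * e_core_rel_perf)
p_license_per_perf = (p_cores * license_per_core) / (p_cores * p_core_rel_perf)

print(round(e_license_per_perf / p_license_per_perf, 2))  # 1.54
```

Note that the core counts cancel out: with per-core pricing, what matters is license fee per unit of throughput, and a core that delivers 65 percent of the work costs about 1/0.65 ≈ 1.54X as much per unit of work licensed.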
Here is our usual performance comparison and price chart, which provides a rough performance metric against the four-core “Nehalem” Xeon E5500 from March 2009. These performance metrics take into account cores, clocks, and IPC across generations.
The “performance general purpose” high-bin parts of the Emerald Rapids line range in price from $1,099 to $11,600 in trays of 1,000 units from Intel. The Sierra Forest chips don’t have HyperThreading and range from 64 to 144 cores (which means you only get 64 to 144 threads). Prices range from $2,749 to $11,350, and relative performance ranges from 22.89 to 47.20, meaning value for money is anywhere from 19 to 43 percent better. For a given wattage, the performance is twice as high, or for a given performance, the wattage is half. Very generally speaking, of course.
The comparison with the Cascade Lake Xeon SP v2 server CPUs is instructive. The top-end 2019 Cascade Lake had 56 P-cores and 112 threads at 2.6 GHz and delivered 21.69 units of oomph at a cost of over $946 per unit of performance. The low-end 2024 Sierra Forest CPU has 64 E-cores at 2.4 GHz and relative performance of 22.89, but the cost per unit of performance is only a little over $120. That’s a 7.9X improvement in price/performance over the past five years. That highest-bin Cascade Lake part consumed 400 watts, far more than the low-bin Xeon 6 6710E processor in the Sierra Forest series burns.
The Sierra Forest 6700E top bin part does more than twice the work of the low bin part, but its cost per unit of performance is also double, so the price/performance gap with the Cascade Lake top bin part is half as big. But even 3.95X is pretty good.
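Those ratios can be reproduced from the figures quoted above; a quick sanity check in Python follows. Our rounded inputs land at about 3.93X rather than 3.95X at the top bin, presumably because the quoted figures were computed from unrounded inputs.

```python
# Sanity check of the price/performance arithmetic, using the article's
# quoted figures (rounded inputs, so the top-bin ratio lands near 3.93X
# rather than the quoted 3.95X).
cascade_cost_per_perf = 946.0   # 2019 Cascade Lake top bin, $ per unit of performance

low_price, low_perf = 2749.0, 22.89    # Xeon 6 6710E, 64 E-cores
top_price, top_perf = 11350.0, 47.20   # top-bin Sierra Forest 6700E part

low_cost_per_perf = low_price / low_perf   # ~$120 per unit of performance
top_cost_per_perf = top_price / top_perf   # ~$240 per unit of performance

print(round(cascade_cost_per_perf / low_cost_per_perf, 1))   # 7.9
print(round(cascade_cost_per_perf / top_cost_per_perf, 2))   # 3.93
```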
Next, we’ll take a deeper architectural dive into Sierra Forest and what we can surmise about Granite Rapids.