December 3, 2021

Raja’s Chip Notes Lay Out Intel’s Path to Zettascale


Raja’s Chip Notes Intel’s Path To Zetta Scale

A few hours ago, I ran into Raja Koduri at a bar. For those who do not know, Raja is the SVP and GM of Intel’s Accelerated Computing Systems and Graphics (AXG) Group. Given all the Supercomputing conference activities we are covering this week, it was great to grab a beer (perhaps more than one) with Raja and go into Intel’s HPC strategy. Specifically, Raja detailed for me how Intel plans to go from ExaFLOPS in 2022 to ZettaFLOPS in 2027-2028. For some context, this is Intel’s pathway to roughly 1000x the performance of today’s systems in only 5-6 years.

Raja’s Chip Notes Lay Out Intel’s Path to Zettascale

What you are going to see as the artifacts of our discussion are simply a few points on an Office Depot pad. Just for some context, we managed to capture a picture while having this discussion.

Patrick Raja Chance Beer Encounter

Raja explained Intel’s path to Zettascale to me as a vast improvement over today’s systems, including the Aurora supercomputer slated for 2022. I asked, and Raja let me snap a photo of his “chip notes” after the discussion. For those wondering, the bar provided us each with small plates of Cool Ranch Doritos. It was a bit funny since we were there talking about chips. Hence, we are calling these “Raja’s Chip Notes.”

Raja’s Chip Notes Intel’s Path To Zetta Scale

What you can see above is a series of improvements Raja thinks Intel can achieve in order to get to Zettaflops, or roughly 500x the Aurora performance of >=2 Exaflops (more on that in a bit). One of the constraints here was working within a similar power footprint to Aurora, since it would be less of an achievement to say Zettaflops were achieved with a corresponding 500x increase in power consumption.

  • One of the big ones, and the first on that list, is “Architecture” with a 16x improvement. That 16x involves adopting similar math execution to what some others in the market are doing. That number is 16x, but Raja told me that Intel knows the architectural changes needed to scale well beyond that. The 16x is being used here because going well beyond it might turn future GPUs/ accelerators into double-precision LINPACK optimized chips instead of chips that perform well on other workloads.
    Raja noted that while Intel could focus on simple DP execution, one of the bigger problems is keeping all of the execution units fed with data and enough memory bandwidth. His position is that Intel would focus not just on the DP execution that gets Intel to the Zettaflop era, but also on AI math operations and, perhaps most importantly, on ensuring that memory bandwidth is plentiful and well-utilized. That approach may not give a 1000x improvement for every application, but it should help the Zettascale architecture provide huge gains for a wider variety of applications.
  • The next one is labeled as “power/ thermals” and is scoped with a 2x improvement. Since the Zettaflops goal is targeting the same or similar power as Aurora, one other way to get more performance is to do more with less power. Examples of this may be running chips at significantly lower voltages and introducing higher-end cooling. We are going to see the transition to liquid cooling happen, but more significant cooling may be required than just rear door heat exchangers.
  • “Data movement” is a 3x opportunity. This is an area where I gave some feedback to Raja, asking for more detail to be shared. Intel, as one can imagine, has tooling to analyze where power is spent in systems. These days a large amount of power, and it can be a majority of power, is spent moving data around in a system and package. As a result, things like having higher degrees of integration can make a major difference in terms of rebalancing the power that is devoted to actually performing computations versus moving data. For those following silicon photonics, this is coming, and we will cover that a bit more later.
  • The one that I think many folks focus on is process technology. Intel announced a fairly aggressive schedule for new process introduction. That is why there is a “Process” 5x note. One key item here is that, especially on the HPC/ GPU side, Intel is embracing multi-die or multi-tile design along with advanced packaging. This is specifically designed to allow different types of silicon to be integrated using the right process technology in order to limit risk as Intel moves forward to new generations.

Now 2EF x 16 x 2 x 3 x 5 is only 960EF; however, Aurora is listed as a >= 2EF peak system. My sense is that it will be well above that. In turn, that will allow a bit more margin in the individual items above (rough estimates themselves) to advance and still hit a Zettascale system, or roughly 1000x a current-generation 1 Exaflop system.
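
As a quick check of that arithmetic, here is a minimal Python sketch. The dictionary keys and the function name are illustrative labels, not Intel terminology; the multipliers and the 2EF baseline are simply the figures quoted from the notes.

```python
from math import prod

# Figures quoted from the notes; the key names are illustrative labels only.
AURORA_PEAK_EF = 2.0  # Aurora is listed as ">= 2EF", so this is a lower bound
GAINS = {
    "architecture": 16,
    "power_thermals": 2,
    "data_movement": 3,
    "process": 5,
}


def projected_peak_ef(base_ef, gains):
    """Multiply a baseline (in Exaflops) by every improvement factor."""
    return base_ef * prod(gains.values())


combined_gain = prod(GAINS.values())                      # 16 * 2 * 3 * 5 = 480
projected_ef = projected_peak_ef(AURORA_PEAK_EF, GAINS)   # 960.0 EF

# Baseline (in EF) needed for the same multipliers to reach 1 Zettaflop (1000 EF)
required_base_ef = 1000 / combined_gain                   # about 2.08 EF

print(combined_gain, projected_ef, round(required_base_ef, 2))
```

Under these assumptions, a baseline only slightly above 2EF closes the gap to 1000EF, which is consistent with the margin described above.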

Now let us get to Intel’s HPC Strategy page:

Raja Chip Notes Intel HPC Strategy 2022 2028

This is laid out in basically three phases. Each of these phases roughly matches the oneAPI versions that you will see on the left side. Raja stressed that taking on architectures as a company is not just about building hardware. It is, perhaps more significantly, also about maintaining and investing in a hardware-software contract for consistent order of magnitude performance gains.

Phase 1: 2022 – 2023 – Exascale

  • Exascale for Intel really starts with its 2022 lineup. This includes Sapphire Rapids and Ponte Vecchio, and we will see this in Aurora. Although these are 2022 products, there is a lot out there on them. I have personally seen several Sapphire Rapids systems between OCP Summit 2021 and SC21, so the industry has transitioned from talking about Sapphire Rapids as a far-out product to discussing the line more definitively at this point.
Aurora Specs Accessed 2021 11 18
  • The next generation Raja calls “optimizing Exascale.” It was getting late (around midnight) and neither of us could remember if Intel had disclosed Granite Rapids. I checked, and it was noted in the Intel Accelerated manufacturing disclosure, so one can read Xeon-Next here as Granite Rapids. PVC-Next is another bridge that Intel has not publicly disclosed. The overall message I took away from our talk is that this next generation is about improving the 2022 architectures.

Phase 2: 2024 – 2025 – Pre-Zetta

  • In the Pre-Zetta era, we get Falcon. Falcon is the Xeon + Xe combination that will be more equivalent to NVIDIA Grace. Something that will become increasingly important is integration. Removing more SerDes from systems saves a ton of power, and higher levels of integration mean that less power is spent on moving data around and more power can instead be spent on compute.
  • “Lightbender” is what we have all been waiting for. This is silicon photonics integrated into chips. I have some rough idea of the target specs, but since they did not make it to Raja’s Chip Notes, let me set it up differently. Intel has stated that it is moving to a chiplet/ tile architecture with increasingly sophisticated packaging. My sense is that this is a silicon photonics tile solution that will be fast enough to do things like move HBM or other types of memory off of GPU/ CPU packages. That opens up the ability for new system designs as well as the ability to easily vary capacity and potentially media types. A high-speed photonics interconnect also means that other devices such as processors can be physically more distant but still have a high-speed link to the GPUs/ accelerators. That will allow for better system design as well.

Phase 3: 2026 – 2028 – Zettascale

  • Since there is not a lot on the notepad for this one, this is the next step in refining all of the different components that Intel is building over the next 4-5 years. One way to think about it is a double-precision Zettaflop in something like a 50MW power envelope. The other way to think about this is that it would lead to the current 50MW Exaflop-class systems down-scaling to be only 50kW systems that can fit in a rack or a few racks. An impact of this is really democratizing large-scale supercomputing. That is what the line “Exascale on the Edge” is referring to on the first sheet.
  • Another important note here is that this is the timeframe when the architecture, power and thermals, data movement, and process technologies would need to move up the maturity curve. I asked Raja, and he is realistic. A lot of this technology Intel has line-of-sight to, but not everything has been invented yet. He acknowledged that there are risks to the 1000x figure, but he was walking me through the plan. My overarching sense, having spent some time with Raja, is that he feels some uncertainty and risk but also has a bit of buffer built into parts of the 1000x plan.
  • One will notice that the dates here are written a bit oddly with 26 – 27 – 28. I may have suggested adding “28” to give a bit more margin for future technologies.
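
The 50MW-to-50kW scaling mentioned in the first bullet of this phase follows directly from holding the Zettascale efficiency constant. Here is a minimal Python sketch of that arithmetic; the helper name is hypothetical, and the 50MW envelope is the rough figure from the discussion rather than a published specification.

```python
ZETTAFLOP = 1e21  # FLOPS
EXAFLOP = 1e18    # FLOPS
ENVELOPE_W = 50e6  # the rough 50MW power envelope discussed above


def flops_per_watt(flops, watts):
    """Energy efficiency as sustained FLOPS per watt."""
    return flops / watts


# Efficiency of a 1 Zettaflop system in 50MW: 2e13 FLOPS/W (20 TFLOPS per watt)
zetta_efficiency = flops_per_watt(ZETTAFLOP, ENVELOPE_W)

# Efficiency of a 1 Exaflop system in the same 50MW: 2e10 FLOPS/W
exa_efficiency = flops_per_watt(EXAFLOP, ENVELOPE_W)

# Power needed for 1 Exaflop at Zettascale-era efficiency: 5e4 W, i.e. 50kW
exaflop_power_w = EXAFLOP / zetta_efficiency

print(zetta_efficiency / exa_efficiency, exaflop_power_w / 1e3)
```

In other words, the same 1000x efficiency gain that fits a Zettaflop into 50MW is what shrinks an Exaflop to roughly 50kW.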

Now to the final sheet:

Raja’s Chip Notes oneAPI And Phi Or Phi22

Recently Raja posted this tweet:

For those who do not know, “Φ” is the 21st letter of the Greek alphabet. More importantly for context, it is also the name of the Xeon Phi line that Intel had in the HPC space for years. Raja noted that the Φ symbol looks somewhat like the O and I from oneAPI put together. My suggestion, given that achieving the 1000x will likely mean fewer piecemeal components and instead a higher level of integration, was to use years. So next year’s Sapphire Rapids plus Ponte Vecchio platform becomes Phi22 for Phi and 2022, the year it will debut. Most likely someone at Intel has already said this is a bad idea, but a few beers in, that was the suggestion.

Final Words

First off, I just wanted to say thank you to Raja for taking the time out of your evening (and early morning) to have a few beers and walk through this. When the initial 1000x Zettascale claims came out, many were very skeptical. Aurora is still not installed, and Intel is betting a lot on Ponte Vecchio and its completely new style of chipbuilding. Still, after chatting with Raja, it seems like Intel has what I would call its “Phi22” solution fairly well established and is now working out how to execute an aggressive plan for the future. Frankly, Intel needs to have an aggressive plan here because if it does not, other companies will. Raja acknowledged the risks, but at least has a plan where there are many pieces the company already has solutions for in order to get to 1000x. Personally, I cannot wait until STH is reviewing 1 Exaflop solutions in our lab at only 50kW.
