The Xilinx Alveo U55C marks a new push by the company to get into the HPC accelerator market, and with a fairly unique angle. Specifically, Xilinx has a device with networking, FPGA logic area, and HBM designed to accelerate some high-performance workloads. Let us get into this announcement.
Xilinx Alveo U55C Brings HBM FPGAs to the HPC Market
The Xilinx Alveo U55C is in some ways Xilinx’s route into a market currently dominated by NVIDIA. What Xilinx has here is basically a smaller version of NVIDIA’s vision for Grace. While that may seem far-fetched at first, the Alveo U55C has a high-speed network fabric, its own control processor, the ability to accelerate workloads using programmable logic, and high-bandwidth memory all in a single card. NVIDIA’s Grace is still years out, but this vision is here today with the Xilinx Alveo U55C (note that NVIDIA has the BlueField-2 A100 as its offering today, but it is not as generally available.)
The basic idea here is that Xilinx allows one to create customized accelerator logic on the card, attached to 16GB of HBM2. If data comes in off of a network interface, it does not have to go through the host system. The acceleration can be pipelined directly on the card.
Here are the key specs for the U55C. There are a few points worth noting. First, Xilinx is comparing this to the Alveo U280, but there are some major differences. The U55C doubles the HBM2 memory, but it loses the DDR4 memory that the U280 also carried. The other key difference is that the card is now a single-slot solution instead of a dual-slot solution. One other interesting item is that the typical power is up from 100W to 115W, but the maximum power is only 150W instead of 225W. That makes it much easier to integrate into systems, especially in power-constrained deployments.
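The differences described above can be tabulated in a short Python sketch. The figures are the ones quoted in this article. The U280's 8GB HBM2 value is inferred from the "doubles" statement rather than quoted directly, and the `CardSpec` class and its field names are purely illustrative, not anything Xilinx publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardSpec:
    """Illustrative container for the specs discussed in the article."""
    name: str
    hbm2_gb: int      # on-card HBM2 capacity
    has_ddr4: bool    # whether the card also carries DDR4
    slots: int        # PCIe slot width
    typical_w: int    # typical power draw in watts
    max_w: int        # maximum power draw in watts


# U280 HBM2 capacity is an inference: the U55C (16GB) "doubles" it.
U280 = CardSpec("Alveo U280", hbm2_gb=8, has_ddr4=True, slots=2,
                typical_w=100, max_w=225)
U55C = CardSpec("Alveo U55C", hbm2_gb=16, has_ddr4=False, slots=1,
                typical_w=115, max_w=150)


def compare(old: CardSpec, new: CardSpec) -> dict:
    """Return the generation-over-generation changes as simple numbers."""
    return {
        "hbm2_ratio": new.hbm2_gb / old.hbm2_gb,
        "typical_w_delta": new.typical_w - old.typical_w,
        "max_w_delta": new.max_w - old.max_w,
        "slots_saved": old.slots - new.slots,
    }


if __name__ == "__main__":
    for key, value in compare(U280, U55C).items():
        print(f"{key}: {value}")
```

The point of the comparison is the maximum power figure: typical draw rises by 15W, but the worst case a system designer has to budget for drops by 75W while the card shrinks by one slot.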
A bit of context here is also important. While one often thinks of HPC accelerators as the big 500W+ GPUs that sit in centralized supercomputers, there are a number of workloads that are more distributed.
A great example of this is the CSIRO case study. The way to think about this one is that these cards are deployed across a vast radio astronomy antenna array. The IT equipment is all solar powered, meaning that there are real power constraints, so these cards run below their 115W/150W ratings and are limited to 90W. Here, having the cards means that data can be ingested and processed via a custom pipeline in the FPGA fabric leveraging HBM2. Cards like the NVIDIA T4 and NVIDIA A2 do not have HBM onboard, nor do they have networking. So a single-slot card that integrates both means fewer boxes and lower overall power consumption.
Beyond these scientific HPC interests, Xilinx is discussing results of acceleration with LS-DYNA. This is an area where a large portion of the simulation can be done in custom logic on the FPGA fabric. Then the constraint becomes memory bandwidth, and that is where the HBM2 comes in. We will just quickly note that the comparison here is against the Intel Xeon Platinum 8260L, a 2019-era processor with 24 cores and 35.75MB of cache. We now have the AMD Milan-X with 64 cores and 0.75GB of L3 cache. That L3 cache is designed to provide a large speedup by avoiding the need to go to slower DDR4, much like HBM is used, although at different latency and capacity tiers.
We asked about pricing for the Xilinx-accelerated LS-DYNA but did not get an answer as to how it compares.
Beyond a single card, Xilinx is scaling out the architecture to multiple cards, so it has features like RoCE v2 and MPI capabilities. These are capabilities required for scale-out workloads.
Vitis is Xilinx’s software platform that allows developers to work within familiar frameworks and not have to place logic on FPGAs themselves. Xilinx has been putting a lot of effort into this to make it easier to use its products.
Here is an example of how this translates into HPC-style domains:
Of course, this is a newer entrant into many of these areas, so having easily accessible software tools is important.
It is refreshing to see something different. This is a lot of innovation, fitting this functionality into a 150W single-slot power envelope. At STH we have been reviewing servers with 4x 400W or 8x 300W GPUs over the past few days, and those then add another several hundred watts for CPUs and NICs. Having an integrated solution, especially with HBM2, is really something different.
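To put those power envelopes side by side, here is a minimal Python sketch of the arithmetic. It only uses the wattages quoted above, and it leaves out the "several hundred watts" for CPUs and NICs since no exact figure is given. The function names are illustrative, and this is a power-budget comparison only, not a statement about relative performance.

```python
U55C_MAX_W = 150  # maximum card power quoted in the article


def gpu_power(count: int, watts_each: int) -> int:
    """Total accelerator power for a GPU server, excluding CPUs and NICs."""
    return count * watts_each


def cards_in_budget(budget_w: int, card_w: int = U55C_MAX_W) -> int:
    """How many cards at card_w watts fit inside budget_w watts."""
    return budget_w // card_w


# The two GPU server configurations mentioned in the article.
configs = {
    "4x 400W": gpu_power(4, 400),
    "8x 300W": gpu_power(8, 300),
}

if __name__ == "__main__":
    for label, watts in configs.items():
        print(f"{label}: {watts}W of GPUs, "
              f"room for {cards_in_budget(watts)} cards at {U55C_MAX_W}W")
```

Even before counting host CPUs and NICs, the GPU power in one of those servers would cover ten or more cards at the U55C's maximum rating, which is why the single-slot 150W envelope stands out.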
For those wondering, you can buy the Xilinx Alveo U55C on Xilinx.com. Xilinx is also working on getting these cards into various partner clouds.
Still, this is one of the more unique and interesting solutions that we are seeing at SC21. There are a ton of domains where next-generation 500-600W accelerators are great, but there are others where they are simply not practical. The big question is really whether Vitis can unlock the power of the U55C over the next few quarters to help the cards gain adoption.