December 4, 2021

Entry-Level Ampere For Edge Inference


Alongside a slew of software-related announcements this morning from NVIDIA as part of their fall GTC, the company has also quietly announced a new server GPU product for the accelerator market: the NVIDIA A2. The new low-end member of the Ampere-based A-series accelerator family is designed for entry-level inference tasks, and thanks to its relatively small size and low power consumption, it is also being aimed at edge computing scenarios.

Along with serving as the low-end entry point into NVIDIA’s GPU accelerator product stack, the A2 seems intended to largely replace what was the last remaining member of NVIDIA’s previous-generation cards, the T4. Though a bit of a higher-end card, the T4 was designed for many of the same inference workloads, and came in the same HHHL single-slot form factor. So the release of the A2 finishes the Ampere-ficiation of NVIDIA’s accelerator lineup, giving NVIDIA’s server customers a fresh entry-level card.

NVIDIA ML Accelerator Specification Comparison
                        A100                 A30                                 A2
FP32 CUDA Cores         6912                 3584                                1280
Tensor Cores            432                  224                                 40
Boost Clock             1.41GHz              1.44GHz                             1.77GHz
Memory Clock            3.2Gbps HBM2e        2.4Gbps HBM2                        12.5Gbps GDDR6
Memory Bus Width        5120-bit             3072-bit                            128-bit
Memory Bandwidth        2.0TB/sec            933GB/sec                           200GB/sec
VRAM                    80GB                 24GB                                16GB
Single Precision        19.5 TFLOPS          10.3 TFLOPS                         4.5 TFLOPS
Double Precision        9.7 TFLOPS           5.2 TFLOPS                          0.14 TFLOPS
INT8 Tensor             624 TOPS             330 TOPS                            36 TOPS
FP16 Tensor             312 TFLOPS           165 TFLOPS                          18 TFLOPS
TF32 Tensor             156 TFLOPS           82 TFLOPS                           9 TFLOPS
Interconnect            NVLink 3 (12 Links)  PCIe 4.0 x16 + NVLink 3 (4 Links)   PCIe 4.0 x8
GPU                     GA100                GA100                               GA107
Transistor Count        54.2B                54.2B                               ?
TDP                     400W                 165W                                40W-60W
Manufacturing Process   TSMC 7N              TSMC 7N                             Samsung 8nm
Form Factor             SXM4                 SXM4                                HHHL-SS PCIe
Architecture            Ampere               Ampere                              Ampere

Going by NVIDIA’s official specifications, the A2 appears to be using a heavily cut-down version of their low-end GA107 GPU. With only 1280 CUDA cores (and 40 tensor cores), the A2 is only using about half of GA107’s capacity. But this is consistent with the size- and power-optimized goal of the card. The A2 only draws 60W out of the box, and can be configured to drop down even further, to 42W.
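The single-precision figures in the comparison table follow from core count and boost clock. As a rough check, the Python sketch below assumes each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock, which is the usual convention for quoting peak FP32 rates. The function name is purely illustrative and is not from NVIDIA.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumption: one FMA (2 floating-point operations) per CUDA core per clock.
    """
    flops_per_core_per_clock = 2
    # cores * ops/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return cuda_cores * flops_per_core_per_clock * boost_clock_ghz / 1000.0


# Core counts and boost clocks as listed in the comparison table.
for name, cores, clock_ghz in [("A100", 6912, 1.41), ("A30", 3584, 1.44), ("A2", 1280, 1.77)]:
    print(f"{name}: {peak_fp32_tflops(cores, clock_ghz):.1f} TFLOPS")
```

Under that assumption the three cards come out to 19.5, 10.3, and 4.5 TFLOPS, matching the Single Precision row.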

In contrast to its compute cores, NVIDIA is keeping GA107’s full memory bus for the A2 card. The 128-bit memory bus is paired with 16GB of GDDR6, which is clocked at a slightly unusual 12.5Gbps. This works out to a flat 200GB/second of memory bandwidth, so it would seem someone really wanted to have a nice, round number there.
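The round number is easy to verify: peak bandwidth is the per-pin data rate multiplied by the bus width, divided by 8 bits per byte. A minimal Python sketch follows. The function name is illustrative, and the 12Gbps and 14Gbps rates are assumed comparison points that are not part of the A2's specifications.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin data rate (Gbps)."""
    bits_per_byte = 8
    return bus_width_bits * data_rate_gbps / bits_per_byte


# The A2's configuration as given in the article: 128-bit bus at 12.5Gbps.
print(memory_bandwidth_gb_per_s(128, 12.5))  # 200.0

# Hypothetical comparison points on the same 128-bit bus (not A2 specs).
print(memory_bandwidth_gb_per_s(128, 12.0))  # 192.0
print(memory_bandwidth_gb_per_s(128, 14.0))  # 224.0
```

Neither of the assumed alternative data rates lands on a round figure, which fits the suspicion that 12.5Gbps was picked to hit exactly 200GB/second.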

Otherwise, as previously mentioned, this is a PCIe card in a half-height, half-length, single-slot (HHHL-SS) form factor. And like all of NVIDIA’s server cards, the A2 is passively cooled, relying on airflow from the host chassis. Speaking of the host, GA107 only offers 8 PCIe lanes, so the card gets a PCIe 4.0 x8 connection back to its host CPU.
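To put the x8 link in perspective, the sketch below estimates one-direction PCIe bandwidth. It relies on two figures that are not stated in the article: PCIe 4.0 signals at 16 GT/s per lane, and it uses 128b/130b line encoding. Packet and protocol overhead are ignored, so real-world throughput will be somewhat lower. The function name is illustrative.

```python
def pcie_bandwidth_gb_per_s(lanes: int, gt_per_s: float = 16.0) -> float:
    """Approximate one-direction PCIe link bandwidth in GB/s.

    Assumptions: 128b/130b line encoding (PCIe 3.0 and later) and a default
    of 16 GT/s per lane (PCIe 4.0). Protocol overhead is not modeled.
    """
    encoding_efficiency = 128 / 130
    bits_per_byte = 8
    return lanes * gt_per_s * encoding_efficiency / bits_per_byte


print(f"PCIe 4.0 x8:  {pcie_bandwidth_gb_per_s(8):.2f} GB/s")
print(f"PCIe 4.0 x16: {pcie_bandwidth_gb_per_s(16):.2f} GB/s")
```

On these assumptions an x8 link carries a little under 16GB/second in each direction, half of what an x16 link would provide and well below the card's 200GB/second of local memory bandwidth.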

Wrapping things up, according to NVIDIA the A2 is available immediately. NVIDIA does not provide public pricing for its server cards, but the new accelerator should be available through NVIDIA’s regular OEM partners.
