Graphcore Thinks It Can Get An AI Piece Of The HPC Exascale Pie – The Next Platform

For the last few years, Graphcore has primarily been focused on slinging its IPU chips for training and inference systems of varying sizes, but that is changing now as the six-year-old British chip designer is joining the conversation about the convergence of AI and high-performance computing.

There are now 168 supercomputers in the Top500, and quite a few more outside of that list, that use accelerators to power these increasingly converging workloads. Most of these systems are using Nvidia's GPUs, but the appearance of seven new systems with AMD's fresh Instinct MI250X GPUs, which include Oak Ridge National Laboratory's Frontier, the United States' first exascale system, shows there is an appetite to consider alternative architectures when they can provide an advantage.

Graphcore hopes it can soon get a slice of this action with its massively parallel processors.

Phil Brown, a Cray veteran who returned to Graphcore in May as vice president of scaled systems after a four-month stint at chip startup NextSilicon, tells The Next Platform that the IPU maker has recently seen significant, sustained interest from organizations that are considering deploying Graphcore's specialized silicon for these converged AI and HPC needs, and this includes large deployments.

"I think we're now at the point where there is going to be significant interest in doing large-scale deployments with the systems. The technology space and machine learning capability has evolved sufficiently that it can deliver significant value to the scientific organizations, and so I'm expecting those to follow quite rapidly in the future," he says.

Graphcore sees three key opportunities around the convergence of HPC and AI: using the IPU's class-leading performance for 32-bit floating point math to tackle HPC applications, training large foundation models like DeepMind's 280-billion-parameter language model, and using AI to complement and accelerate traditional HPC workloads to create a feedback loop of sorts.

It's the last of these areas that Brown says is likely the largest opportunity for Graphcore in HPC.

"This may be having surrogate models, elements of a traditional HPC simulation, replaced by a machine learning kernel parameterization in a weather forecast, for example," he says. Those simulation elements are computationally expensive, he added, so replacing them with machine learning models that are much cheaper but equally accurate can help reduce the overall cost of running simulations.

These opportunities are based on exploratory work Graphcore has conducted with partners that has yielded promising results. For instance, the company says its IPUs were used to train a gravity wave drag model for weather forecasting five times faster than Nvidia's V100. In another example, Hewlett Packard Enterprise trained a deep learning model for protein folding using Graphcore's IPU-M2000 system and found that the second-generation IPU was around three times faster than Nvidia's A100.

To help move the conversation forward, several government labs are in different stages of trying out Graphcore's IPUs to see if the processors hold promise for large systems in the future.

Most recently, this includes the US Department of Energy's Sandia National Laboratories and Argonne National Laboratory. Both are adding Graphcore's Bow IPU Pod systems to their AI hardware testbeds, and Argonne is doing so after reporting impressive results with Graphcore's first-generation IPU systems. These Bow Pods will use the chip designer's recently announced Bow IPU, which makes use of Taiwan Semiconductor Manufacturing Co's wafer-on-wafer 3D stacking technology to provide more performance while using less power compared to its second-generation IPU.

Michael Papka, director of the Argonne Leadership Computing Facility, says the addition of Graphcore's Bow IPU Pod supports the testbed's goal of understanding the role AI accelerators can play in advancing data-driven discoveries, and how these systems can be combined with supercomputers to scale to extremely large and complex science problems.

The University of Edinburgh's EPCC supercomputing center is also installing a Bow IPU Pod system, which it will use for a broad range of use cases as part of the multi-industry-supporting Data Driven Innovation Programme that is funded by the governments of Scotland and the United Kingdom. EPCC has expressed interest in Graphcore's in-development Good computer, which the company has promised will deliver more than 10 exaflops of AI floating point compute with next-generation IPUs.

If we were to travel 226 miles south of EPCC, we'd find support for Graphcore from England's Hartree Centre, which plans to access IPUs through cloud service provider G-Core Cloud to conduct research on fusion energy as part of a partnership with the UK Atomic Energy Authority.

While Graphcore is building its own exascale supercomputer for AI with the Good system, Brown says he believes the company's IPUs will be well-suited for other exascale supercomputers in the future, ranging from those that are very AI-focused to those running traditional simulation software that could benefit from performing such calculations at a lower precision on IPUs.

This means that, in Brown's mind, an exascale system could consist mostly of Graphcore IPUs or the processors could be a component of a larger heterogeneous system, a view he says is based on feedback he's heard from people in the HPC community.

"The message that we've been getting from them is that they're very interested in exploring exascale system architectures that include components of different types that give them a good balance of overall capability for their systems, because they recognize that the workloads are going to become more heterogeneous in terms of the space but also the performance and the value proposition you get from these heterogeneous processors is well worth the investment," he says.


