Google Targets Nvidia With Learning-Capable Cloud TPU – ExtremeTech

Only a week after Nvidia's new AI-focused Volta GPU architecture was announced, Google aims to steal some of its thunder with its new second-generation Tensor Processing Unit (TPU), which it calls a Cloud TPU. While its first-generation chip was only suitable for inference, and therefore didn't pose much of a threat to Nvidia's dominance in machine learning, the new version is equally at home with both training and running AI systems.

At 180 teraflops (trillion floating-point operations per second), Google's Cloud TPU packs more punch, at least by that one measure, than the Volta-powered Tesla V100 at 120 teraflops. However, until both chips are available, it won't be possible to make a real-world comparison. Much as Nvidia has built servers out of multiple V100s, Google has constructed TPU Pods that combine multiple TPUs to achieve 11.5 petaflops (11,500 teraflops) of performance.
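The pod figure is easy to sanity-check with back-of-the-envelope math, assuming a full pod of 64 TPUs (the size Google cites) at the quoted 180 teraflops per chip:

```python
# Back-of-the-envelope check of the TPU Pod aggregate throughput.
TFLOPS_PER_TPU = 180    # Google's quoted per-chip figure
TPUS_PER_POD = 64       # full pod size

pod_tflops = TFLOPS_PER_TPU * TPUS_PER_POD
print(pod_tflops)            # 11520 teraflops
print(pod_tflops / 1000)     # 11.52 petaflops, i.e. the ~11.5 PF Google quotes
```

So the headline 11.5 petaflops is simply 64 chips times 180 teraflops, rounded down slightly.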

For Google, this performance is already paying off. As one example, a Google model that required an entire day to train on a cluster of 32 high-end GPUs (probably Pascal) can be trained in an afternoon on one-eighth of a TPU Pod (a full pod is 64 TPUs, so that means on 8 TPUs). Of course, standard GPUs can be used for all sorts of other things, while the Google TPUs are limited to training and running models written using Google's tools.
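Treating "an afternoon" as roughly half a day (an assumption on my part; Google didn't give exact hours), the chip-for-chip comparison works out like this:

```python
# Rough chip-for-chip comparison. The hour counts are assumptions,
# not figures Google published.
gpu_count, gpu_hours = 32, 24   # "an entire day" on 32 high-end GPUs
tpu_count, tpu_hours = 8, 12    # "an afternoon" on 8 TPUs (1/8 of a pod)

gpu_chip_hours = gpu_count * gpu_hours   # 768 GPU-hours
tpu_chip_hours = tpu_count * tpu_hours   # 96 TPU-hours
print(gpu_chip_hours / tpu_chip_hours)   # 8.0
```

Under those assumptions, each TPU delivered about 8x the effective training throughput of each GPU in the cluster, which is why the comparison made Nvidia watchers sit up.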

Google is making its Cloud TPUs available as part of its Google Compute offering, and says they will be priced similarly to GPUs. That isn't enough information to say how they will compare in cost with renting time on an Nvidia V100, but I'd expect them to be very competitive. One drawback, though, is that the Google TPUs currently support only TensorFlow and Google's tools. As powerful as they are, many developers won't want to get locked into Google's machine learning framework.

While Google is making its Cloud TPU available as part of its Google Compute cloud, it hasn't said anything about making it available outside Google's own server farms. So it isn't competing with on-premise GPUs, and it certainly won't be available on competing clouds from Microsoft and Amazon. If anything, its absence there is likely to deepen those companies' partnerships with Nvidia.

The other company that should probably be worried is Intel. It has been woefully behind in GPUs, which means it hasn't made much of a dent in the rapidly growing market for GPGPU (general-purpose computing on GPUs), of which machine learning is a huge part. This is just one more way that chip dollars that could have gone to Intel won't.

Big picture, more machine learning applications will be moving to the cloud. In some cases, if you can tolerate being preempted, it is already less expensive to rent GPU clusters in the cloud than it is to power them locally. That equation is only going to get more lopsided as chips like the Volta and the new Google TPU are added to cloud servers. Google knows that the key to increasing its share of that market is having more leading-edge software running on its chips, so it is making 1,000 Cloud TPUs available for free to researchers willing to share the results of their work.
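As a sketch of that cost equation, here is a toy break-even comparison between renting preemptible cloud GPUs and buying and powering hardware locally. Every dollar figure and rate below is a hypothetical placeholder for illustration, not a quoted price, and the model ignores amortizing the hardware across future workloads:

```python
# Toy break-even model: cloud rental vs. on-premise GPU cluster.
# All numbers are hypothetical placeholders, not real prices.
cloud_rate = 0.50    # $/GPU-hour, preemptible rental (assumed)
hw_cost = 5000.0     # $ per GPU purchased up front (assumed)
power_rate = 0.05    # $/GPU-hour for local power and cooling (assumed)
hours = 2000         # GPU-hours of training the job needs (assumed)

cloud_total = cloud_rate * hours              # 1000.0
local_total = hw_cost + power_rate * hours    # 5100.0
print(cloud_total, local_total)   # renting wins at this scale
```

The point of the sketch is only that for intermittent workloads the up-front hardware cost dominates, which is exactly the dynamic that pushes training into the cloud.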

