NVIDIA is hooking up with four of the worlds largest original design manufacturers (ODMs) to help accelerate adoption of its GPUs into hyperscale datacenters. The new partner program would give Foxconn, Inventec, Quanta and Wistron early access to the HGX reference architecture, NVIDIAs server design for machine learning acceleration
Source: NVIDIA
HGX is an attempt to establish an industry-standard GPU box that maximizes computational density for machine learning workloads. It uses NVIDIAs most advanced GPUs, namely the Tesla P100, and soon, the Tesla V100. It glues eight of these into an NVLink cube mesh, and uses PCIe switching to allow CPUs to dynamically connect to them. Examples of this architecture include Microsoft's Project Olympus HGX-1 chassis, Facebook's Big Basin system, and NVIDIAs own DGX-1 server.
Facebooks Big Basin and Microsofts HGX-1 systems are GPU-only boxes, which rely on external CPU servers as hosts. Since the processor and co-processor are disaggregated, applications can fiddle with GPU-CPU ratio as needed. In most machine learning situations, you want a rather high ratio of GPUs to CPUs, since most of the processing ends up on the graphics chip. And in hyperscale/cloud datacenters, you also want the flexibility of allocating these resources dynamically as workloads shift around.
The DGX-1 server is a different animal altogether. Its a stand-alone machine learning appliance and includes two Xeon processors, along with the same eight-GPU NVLink mesh of its hyperscale cousins. As such, its not meant for cloud duty, but rather for businesses, research organizations, and software development firms that want an in-house machine learning box. SAP in the most prominent commercial buyer of the DGX-1, at least of those revealed publicly. But NVIDIA never intended to sell boatloads of these systems, especially since a lot of customers would prefer to rent machine learning cycles from cloud providers.
Thats why the ODM partnership could end up paying big dividends. These manufacturers already have the inside track with hyperscale customers, who have figured out that they can use these companies to get exactly the gear they want, and at sub-OEM pricing. ODMs are also more nimble than traditional server-makers, inasmuch that they can shorten the design-to-production timeline. That makes them better suited to the nearly continuous upgrade cycle of these mega-datacenters.
Given that the HGX-1 is manufactured by Foxconn subsidiary Ingrasys and the Big Basin system is built by Quanta, its a logical step for NVIDIA to include the other big ODMs, Inventec and Wistron, into the fold. The goal is to bring a wider range of HGX-type machinery to market and make them available to hyperscale customers other than just Microsoft and Facebook.
The other aspect of this is that NVIDIA would like to solidify its dominance with machine learning customers before Intel brings its AI-optimized silicon to market. Startup companies like Wave Computing and Graphcore also are threatening to challenge NVIDIA with their own custom chips. Establishing an industry-standard architecture before these competing solutions get market traction would help NVIDIA maintain its leadership.
To some extent, NVIDIA is also competing with some its biggest customers, like Google and Microsoft, both of which are building AI clouds based on their own technologies. In the case of Google, its their Tensor Processor Unit (TPU), which the search giant has upgraded for an expanded role that threatens NVIDIA directly. Meanwhile, Microsoft is filling out its AI infrastructure with an FPGA-based solution that, likewise, could sideline NVIDIA GPUs in Azure datacenters.
The prospect of using the future V100 Tesla GPUs in HPX platforms actually intensifies the competition, since these upcoming processors are built for both neural net training and inferencing. Although NVIDIA used to build its own inferencing-specific GPUs (the M4 and M40, followed by the P4 and P40), inferencing is also performed by regular CPUs and FPGAs, not to mention Googles TPUs, running in regular cloud servers.
Inferencing has somewhat different requirements than training, especially with regards to minimizing latency, but with the Volta architecture and the V100, NVIDIA thinks it has designed a solution that is capable of doing both, and doing so competitively. From a hyperscale companys point of view, there are some obvious advantages in separating inferencing, and certainly training infrastructure from the rest of the server farm not the least of which is being able to deploy and run machine learning gear in a more flexible manner. And since these upcoming V100 GPUs will be used by hyperscale companies for training, they are also likely to get a shot at some of those same companies inference workloads.
Finally, if NVIDIA manages to establish HGX as the standard GPU architecture for AI clouds, it makes its own recently announced GPU cloud platform more attractive. Since NVIDIAs cloud stack of machine learning libraries and frameworks runs on top of other peoples infrastructure, pushing its HGX architecture into the ecosystem would make NVIDIAs job of supporting the various hardware solutions that much simpler. It would also make it easier for customers to switch cloud providers without having to tweak their own software.
Well be able to tell if these ODM relationships pay off when we start seeing additional HGX solutions coming to market and being adopted by various cloud providers. As NVIDIA likes to remind us, its GPUs are used in the worlds top 10 hyperscale businesses today. If all goes as planned, someday it will be able to make the same claim for HGX.
Continue reading here:
NVIDIA Amps Up AI Cloud Strategy with ODM Partnerships - TOP500 News
- Setting up a Virtual Server on Ninefold - Video [Last Updated On: February 26th, 2012] [Originally Added On: February 26th, 2012]
- ScaleXtreme Automates Cloud-Based Patch Management For Virtual, Physical Servers [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- Secure Cloud Computing Software manages IT resources. [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- Dell unveils new servers, says not a PC company [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- Wyse to Launch Client Infrastructure Management Software as a Service, Enabling Simple and Secure Management of Any ... [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- As the App Culture Builds, Dell Accelerates its Shift to Services with New Line of Servers, Flash Capabilities [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- Terraria - Cloud In A Ballon - Video [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- Ethernet Alliance Interoperability Demo Showcases High-Speed Cloud Connections [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- RSA and Zscaler Teaming Up to Deliver Trusted Access for Cloud Computing [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- [NEC Report from MWC2012] NEC-Cloud-Marketplace - Video [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- IBM SmartCloud Virtualized Server Recovery - Video [Last Updated On: February 28th, 2012] [Originally Added On: February 28th, 2012]
- BeyondTrust Launches PowerBroker Servers Windows Edition [Last Updated On: February 29th, 2012] [Originally Added On: February 29th, 2012]
- Ericsson joins OpenStack cloud infrastructure community [Last Updated On: February 29th, 2012] [Originally Added On: February 29th, 2012]
- ScaleXtreme Cloud-Based Patch Management Open for New Customers [Last Updated On: March 1st, 2012] [Originally Added On: March 1st, 2012]
- RootAxcess - Getting Started - Video [Last Updated On: March 1st, 2012] [Originally Added On: March 1st, 2012]
- How to Create a Terraria Server 1.1.2 (All Links Provided) - Video [Last Updated On: March 1st, 2012] [Originally Added On: March 1st, 2012]
- Dell #1 in Hyperscale Servers (Steve Cumings) - Video [Last Updated On: March 1st, 2012] [Originally Added On: March 1st, 2012]
- Managing SAP on Power Systems with Cloud technologies delivers superior IT economics - Video [Last Updated On: March 1st, 2012] [Originally Added On: March 1st, 2012]
- AMD Acquires Cloud Server Maker SeaMicro for $334M USD [Last Updated On: March 3rd, 2012] [Originally Added On: March 3rd, 2012]
- Web Host 1&1 Provides More Flexibility with Dynamic Cloud Server [Last Updated On: March 3rd, 2012] [Originally Added On: March 3rd, 2012]
- Leap Day brings down Microsoft's Azure cloud service [Last Updated On: March 3rd, 2012] [Originally Added On: March 3rd, 2012]
- RightMobileApps White Label Program - Video [Last Updated On: March 3rd, 2012] [Originally Added On: March 3rd, 2012]
- bzst server ban #2 - Video [Last Updated On: March 3rd, 2012] [Originally Added On: March 3rd, 2012]
- “Cloud storage served from an array would cost $2 a gigabyte” [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- More Flexibility with the 1&1 Dynamic Cloud Server [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- Hub’s future jobs may be in cloud [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- Cloud computing growing jobs, says Microsoft [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- TurnKey Internet Launches WebMatrix, a New Application in Partnership with Microsoft [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- Cebit 2012: SAP Cloud Computing Strategy - Introduction - Video [Last Updated On: March 6th, 2012] [Originally Added On: March 6th, 2012]
- Dome9 Security Launches Industry's First Free Cloud Security for Unlimited Number of Servers [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- Servers Are Refreshed With Intel's New E5 Chips [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- Samsung's AllShare Play pushes pictures from phone to cloud and TV [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- Google drops the price of Cloud Storage service [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- New Intel Server Technology: Powering the Cloud to Handle 15 Billion Connected Devices [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- Swisscom IT Services Launches Cloud Storage Services Powered by CTERA Networks [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- KineticD Releases Suite of Cloud Backup Offerings for SMBs [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- First Look: Samsung Allshare Play - Video [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- Bill The Server Guy Introduces the New Intel XEON e5-2600 (Romley) Server CPU's - Video [Last Updated On: March 7th, 2012] [Originally Added On: March 7th, 2012]
- New Cisco servers have Intel Xeon E5 inside [Last Updated On: March 8th, 2012] [Originally Added On: March 8th, 2012]
- Cisco rolls out UCS servers with Intel Xeon E5 chips [Last Updated On: March 8th, 2012] [Originally Added On: March 8th, 2012]
- From scooters to servers: The best of Launch, Day One [Last Updated On: March 8th, 2012] [Originally Added On: March 8th, 2012]
- Computer Basics: What is the Cloud? - Video [Last Updated On: March 9th, 2012] [Originally Added On: March 9th, 2012]
- Could the digital 'cloud' crash? [Last Updated On: March 10th, 2012] [Originally Added On: March 10th, 2012]
- Dome9 Security Launches Free Cloud Security For Unlimited Number Of Servers [Last Updated On: March 10th, 2012] [Originally Added On: March 10th, 2012]
- Cloud computing 'made in Germany' stirs debate at CeBIT [Last Updated On: March 11th, 2012] [Originally Added On: March 11th, 2012]
- New Key Technology Simplifies Data Encryption in the Cloud [Last Updated On: March 11th, 2012] [Originally Added On: March 11th, 2012]
- Can a private cloud drive energy efficiency in datacentres? [Last Updated On: March 12th, 2012] [Originally Added On: March 12th, 2012]
- Porticor's new key technology simplifies data encryption in the cloud [Last Updated On: March 12th, 2012] [Originally Added On: March 12th, 2012]
- Borders + Gratehouse Adds Three New Clients in Cloud Sector [Last Updated On: March 12th, 2012] [Originally Added On: March 12th, 2012]
- Dell to invest $700 mn in R&D, unveils 12G servers [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- Defiant Kaleidescape To Keep Shipping Movie Servers [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- Data Centre Transformation Master Class 3: Cloud Architecture - Video [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- DotNetNuke Tutorial - Great hosting tool - PowerDNN Control Suite - part 1/3 - Video #310 - Video [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- Cloud Computing - 28/02/12 - Video [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- SYS-CON.tv @ 9th Cloud Expo | Nand Mulchandani, CEO and Co-Founder of ScaleXtreme - Video [Last Updated On: March 13th, 2012] [Originally Added On: March 13th, 2012]
- Oni Launches New Cloud Services for Enterprises Using CA Technologies Cloud Platform [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- SmartStyle Advanced Technology - Video [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- SmartStyle Infrastructure - Video [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- The Hidden Risk of a Meltdown in the Cloud [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- FireHost Launches Secure Cloud Data Center in Phoenix, Arizona [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- Panda Security Launches New Channel Partner Recruitment Campaign: "Security to the Power of the Cloud" [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- NetSTAR, Inc. Announces Safe and Secure Web Browsers for iPhones, iPads, and Android Devices [Last Updated On: March 14th, 2012] [Originally Added On: March 14th, 2012]
- Amazon Cloud Powered by 'Almost 500,000 Servers' [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- NetSTAR Announces Secure Web Browsers For iPhones, iPads, And Android Devices [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Be Prepared For When the Cloud Really Fails [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Dr. Cloud explains dinCloud's hosted virtual server solution - Video [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- New estimate pegs Amazon's cloud at nearly half a million servers [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Amazon’s Web Services Uses 450K Servers [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Saving File On Internet - Cloud Computing - Video [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- DotNetNuke Tutorial - Great hosting tool - PowerDNN Control Suite - part 2/3 - Video #311 - Video [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Linux servers keep growing, Windows & Unix keep shrinking [Last Updated On: March 15th, 2012] [Originally Added On: March 15th, 2012]
- Cloud Desktop from Compute Blocks - Video [Last Updated On: March 16th, 2012] [Originally Added On: March 16th, 2012]
- Amazon EC2 cloud is made up of almost half-a-million Linux servers [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- HP trots out new line of “self-sufficient” servers [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- Cloud Web Hosting Reviews - Australian Cloud Hosting Providers - Video [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- Using Porticor to protect data in a snapshot scenario in AWS - Video [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- CDW - Charles Barkley - New Office - Video [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- Nearly a Half Million Servers May Power Amazon Cloud [Last Updated On: March 17th, 2012] [Originally Added On: March 17th, 2012]
- Morphlabs CEO Winston Damarillo talks about their mCloud Rack - Video [Last Updated On: March 20th, 2012] [Originally Added On: March 20th, 2012]
- AMD reaches for the cloud with new server chips [Last Updated On: March 20th, 2012] [Originally Added On: March 20th, 2012]