A row of eight NVIDIA graphics processing units (GPUs) packed into a Big Sur machine learning server at Facebook's data center in Prineville, Oregon. (Photo: Rich Miller)
The data center team at eBay is plenty familiar with high density data centers. The e-commerce giant has been running racks with more than 30 kilowatts (kW) of power density at the SUPERNAP in Las Vegas, seeking to fill every available slot in racks whenever possible.
But as eBay has begun applying artificial intelligence (AI) to its IT operations, the company has deployed more servers using graphics processing units (GPUs) instead of traditional CPUs.
"From a data center power and cooling perspective, they're a real challenge," said Serena DeVito, an Advanced Data Center Engineer at eBay. "Most data centers are not ready for them. These are really power hungry little boxes."
The rise of artificial intelligence, and the GPU computing hardware that often supports it, is reshaping the data center industry's relationship with power density. New hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density (the amount of electricity used by servers and storage in a rack or cabinet) and the accompanying heat. The trend is challenging traditional practices in data center cooling, and prompting data center operators to adopt new strategies and designs.
All signs suggest that we are in the early phase of the adoption of AI hardware by data center users. For the moment, the trend is focused on hyperscale players, who are pursuing AI and machine learning at Internet scale. But soon there will be a larger group of companies and industries hoping to integrate AI into their products, and in many cases, their data centers.
Amazon Web Services, Microsoft Azure, Google Cloud Platform and IBM all offer GPU cloud servers. Facebook and Microsoft have each developed GPU-accelerated servers for their in-house machine learning operations, while Google went a step further, designing and building its own custom silicon for AI.
"AI is the fastest-growing segment of the data center, but it is still nascent," said Diane Bryant, the Executive VP and General Manager of Intel's Data Center Group. Bryant says that 7 percent of servers sold in 2016 were dedicated to AI workloads. While that is still a small percentage of its business, Intel has invested more than $32 billion in acquisitions of Altera, Nervana and MobilEye to prepare for a world in which specialized computing for AI workloads will become more important.
The appetite for accelerated computing shows up most clearly at NVIDIA, the market leader in GPU computing, which has seen its revenue from data center customers leap 205 percent over the past year. NVIDIA's prowess in parallel processing was seen first in supercomputing and high-performance computing (HPC), supported by facilities with specialized cooling using water or refrigerants. The arrival of HPC-style density in data centers is driven by the broad application of machine learning technologies.
"Deep learning on NVIDIA GPUs, a breakthrough approach to AI, is helping to tackle challenges such as self-driving cars, early cancer detection and weather prediction," said NVIDIA cofounder and CEO Jen-Hsun Huang. "We can now see that GPU-based deep learning will revolutionize major industries, from consumer internet and transportation to health care and manufacturing. The era of AI is upon us."
And with the dawn of the AI era comes a rise in rack density, first at the hyperscale players and soon at multi-tenant colocation centers.
How much density are we talking about? "A kilowatt per rack unit is common with these GPUs," said Peter Harrison, the co-founder and Chief Technical Officer at Colovore, a Silicon Valley colocation business that specializes in high-density hosting. "These are real deployments. These customers are pushing to the point where 30kW or 40kW loads (per cabinet) are easily possible today."
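To make Harrison's rule of thumb concrete, here is a minimal Python sketch that turns a per-rack-unit figure into a cabinet load and the heat that has to be removed. The function names are hypothetical, and the number of occupied rack units per cabinet is an assumption chosen to reproduce the 30kW and 40kW figures quoted above, not a measurement from any deployment. The kW-to-BTU conversion is the standard one.

```python
# Rough cabinet power and heat estimate from a per-rack-unit figure.
# Assumption: every occupied rack unit draws the same power (the
# "kilowatt per rack unit" rule of thumb quoted in the article).

BTU_PER_HOUR_PER_KW = 3412.14  # 1 kW of load is roughly 3,412 BTU/hr of heat


def cabinet_load_kw(occupied_rack_units: int, kw_per_rack_unit: float = 1.0) -> float:
    """Estimated electrical load of a cabinet, in kilowatts."""
    if occupied_rack_units < 0 or kw_per_rack_unit < 0:
        raise ValueError("inputs must be non-negative")
    return occupied_rack_units * kw_per_rack_unit


def heat_load_btu_per_hour(load_kw: float) -> float:
    """Heat the cooling system must remove, assuming all power becomes heat."""
    return load_kw * BTU_PER_HOUR_PER_KW


if __name__ == "__main__":
    # Hypothetical fill levels: 30U and 40U of GPU gear in one cabinet.
    for units in (30, 40):
        load = cabinet_load_kw(units)
        heat = heat_load_btu_per_hour(load)
        print(f"{units}U of GPU gear: {load:.0f} kW, about {heat:,.0f} BTU/hr")
```

Treating all electrical input as heat is a common simplification for IT equipment, which is why power density and cooling load rise together.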
A good example is Cirrascale, a service provider that specializes in GPU-powered cloud services for AI and machine learning. Cirrascale hosts some of its infrastructure in custom high-density cabinets at the ScaleMatrix data center in San Diego.
"These technologies are pushing the envelope," said Chris Orlando, the Chief Sales and Marketing Officer and a co-founder of ScaleMatrix. "We have people from around the country seeking us out because they have dense platforms that are pushing the limits of what their data centers can handle. With densities and workloads changing rapidly, it's hard to see the future."
Cirrascale, the successor to the Verari HPC business, operates several rows of cabinets at ScaleMatrix, which house between 11 and 14 GPU servers per cabinet, including some connecting eight NVIDIA GPUs using PCIe, a configuration also seen in Facebook's Big Sur AI appliance and the NVIDIA DGX-1 "supercomputer in a box."
Over the past decade, there have been numerous predictions of the imminent arrival of higher rack power densities. Yet extreme densities remain limited, primarily seen in HPC. The consensus view is that most data centers average 3kW to 6kW a rack, with hyperscale facilities running at about 10kW per rack.
Yet the interest in AI extends beyond the HPC environments at universities and research labs, bringing these workloads into cloud data centers. Service providers specializing in high-density computing have also seen growing business from machine learning and AI workloads. These companies use different strategies and designs to cool high-density cabinets.
A TSCIF aisle containment system inside the SUPERNAP campus in Las Vegas. (Photo: Switch)
The primary strategy is containment, which creates a physical separation between cold air and hot air in the data hall. One of the pioneers in containment has been Switch, whose SUPERNAP data centers use a hot-aisle containment system to handle workloads of 30kW a rack and beyond. This capability has won the business of many large customers, allowing them to pack more computing power into a smaller footprint. Prominent customers include eBay, with its historic focus on density, which hosts its GPU-powered AI hardware at the SUPERNAPs in Las Vegas.
For hyperscale operators, data center economics dictates a middle path on the density spectrum. Facebook, Google and Microsoft operate their data centers at higher temperatures, often above 80 degrees Fahrenheit in the cold aisle. This saves money on power and cooling, but those higher temperatures make it difficult to manage HPC-style density. Facebook, for example, seeks to keep racks around 10 kW, so it runs just four of its Big Sur and Big Basin AI servers in each rack. Each unit is 3U in height.
Facebook's machine learning servers feature eight NVIDIA GPUs, which the custom chassis design places directly in front of the cool air being drawn into the system, removing preheat from other components and improving the overall thermal efficiency. Microsoft's HGX-1 machine learning server, developed with NVIDIA and Ingrasys/Foxconn, also features eight GPUs.
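The Facebook figures imply a simple budget calculation, sketched below in Python. The function names are hypothetical, and the per-server number it produces is only the average ceiling implied by a 10 kW rack shared evenly among four servers. It is not a published power rating for Big Sur or Big Basin.

```python
# Back-of-the-envelope rack budgeting from the figures in the article:
# a rack target of about 10 kW, four AI servers per rack, 3U per server.


def per_server_budget_kw(rack_budget_kw: float, servers_per_rack: int) -> float:
    """Average power available to each server if the rack budget is split evenly."""
    if servers_per_rack <= 0:
        raise ValueError("servers_per_rack must be positive")
    return rack_budget_kw / servers_per_rack


def rack_units_used(servers_per_rack: int, units_per_server: int) -> int:
    """Rack units occupied by the servers alone (switches and power shelves excluded)."""
    return servers_per_rack * units_per_server


if __name__ == "__main__":
    budget = per_server_budget_kw(rack_budget_kw=10.0, servers_per_rack=4)
    used = rack_units_used(servers_per_rack=4, units_per_server=3)
    print(f"Implied ceiling per server: {budget} kW")
    print(f"Rack units occupied by AI servers: {used}U")
```

The point of the sketch is that the power budget, not physical space, sets the limit: four 3U servers fill only a fraction of a typical rack, yet they already consume the whole power allowance.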
A custom rack in a Google data center packed with Tensor Processing Unit hardware for machine learning. (Photo: Google)
While much of the AI density discussion has focused on NVIDIA gear, GPUs aren't the only hardware being adopted for artificial intelligence computing, and just about all of these chips result in higher power densities.
Google decided to design and build its own AI hardware centered on the Tensor Processing Unit (TPU), a custom ASIC tailored for Google's TensorFlow open source software library for machine learning. An ASIC (application-specific integrated circuit) is a chip that can be customized to perform a specific task, squeezing more operations per second into the silicon. A board with a TPU fits into a hard disk drive slot in a data center rack.
"Those TPUs are more energy dense than a traditional x86 server," said Joe Kava, the Vice President of Data Centers at Google. "If you have a full rack of TPUs, it will draw more power than a traditional rack. It hasn't really changed anything for us. We have the ability in our data center design to adapt for higher density. As a percentage of the total fleet, it's not a majority of our (hardware)."
Tomorrow: We look at data center service providers focused on GPU hosting, and how they are designing for extreme density.
See the rest here:
AI Boom Boosts GPU Adoption, High-Density Cooling - Data Center Frontier (blog)