Dynamic allocation of resources inside of a system, within a cluster, and across clusters is a bin-packing nightmare for hyperscalers and cloud builders. No two workloads need the same ratios of compute, memory, storage, and network, and yet these service providers need to present the illusion of configuration flexibility and vast capacity. But capacity inevitably ends up being stranded.
It is absolutely unavoidable.
But because main memory in systems is very expensive, and will continue to grow more expensive over time relative to the costs of other components in the system, the stranding of memory capacity has to be minimized. It is not as simple as just letting VMs grab the extra memory and hoping that the extra megabytes and gigabytes yield better performance when they are thrown at virtual machines running atop a hypervisor on a server. The number of moving parts here is high, but dynamically allocating resources like memory and trying to keep it from being stranded (meaning all of the cores in a machine have memory allocations and there is memory capacity left over that can't be used because there are no cores assigned to it) is far better than having a static configuration of memory per core. The most blunt such approach would be to take the memory capacity, divide it by the number of cores, and give each core the same sized piece.
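To make the definition of stranding concrete, here is a minimal Python sketch. The node size (48 cores, 384 GB) and the VM shapes are made-up numbers for illustration, and the function names are ours, not anything from Microsoft. It contrasts the blunt static split with what is left over when VMs of different shapes use up all of the cores.

```python
from dataclasses import dataclass


@dataclass
class VM:
    cores: int
    mem_gb: int


def static_share_gb(node_cores: int, node_mem_gb: int) -> float:
    """The blunt approach: every core gets the same sized slice of memory."""
    return node_mem_gb / node_cores


def stranded_memory_gb(node_cores: int, node_mem_gb: int, vms: list[VM]) -> int:
    """Memory left over that cannot be rented because no cores are left."""
    used_cores = sum(vm.cores for vm in vms)
    used_mem = sum(vm.mem_gb for vm in vms)
    if used_cores > node_cores or used_mem > node_mem_gb:
        raise ValueError("VMs do not fit on this node")
    free_mem = node_mem_gb - used_mem
    # Free memory only counts as stranded once every core has been handed out.
    return free_mem if used_cores == node_cores else 0


# Hypothetical node and VM mix, chosen only for illustration.
node_cores, node_mem_gb = 48, 384
vms = [VM(16, 64), VM(16, 128), VM(16, 96)]

print(static_share_gb(node_cores, node_mem_gb))          # GB per core, static split
print(stranded_memory_gb(node_cores, node_mem_gb, vms))  # GB stranded on this node
```

In this made-up case all 48 cores are allocated but only 288 GB of the 384 GB is, so 96 GB, or a quarter of the DRAM on the node, is stranded.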
If you like simplicity, that works. But we shudder to think of the performance implications that such a static linking of cores and memory might have. Memory pooling over CXL is taking off among the hyperscalers and cloud builders as they try to deploy that new protocol atop CPUs configured with PCI-Express 5.0 peripheral links. We covered Facebook's research and development recently as well as some other work being done at Pacific Northwest National Laboratory, and have discussed the prognostications about CXL memory from Intel and Marvell as well.
Microsoft's Azure cloud has also been working on CXL memory pooling as it tries to tackle stranded and frigid memory, the latter being a kind of stranded memory where there are no cores left on the hypervisor to tap into that memory and the former being a broader example of memory that is allocated by the hypervisor for VMs but is nonetheless never actually used by the operating system and applications running in the VM.
According to a recent paper published by Microsoft Azure, Microsoft Research, and Carnegie Mellon University, DRAM memory can account for more than 50 percent of the cost of building a server for Azure, which is a lot higher than the average of 30 percent to 35 percent that we cited last week when we walked the Marvell CXL memory roadmap into the future. But this may be more of a function of the deep discounting that hyperscalers and cloud builders can get in a competitive CPU market, with Intel and AMD slugging it out, and that DRAM memory for servers is much more constrained and that Micron Technology, Samsung, and SK Hynix as well as their downstream DIMM makers can charge what are outrageous prices compared to historical trends because there is more demand than supply. And when it comes to servers, we think the memory makers like it that way.
Memory stranding is a big issue because that capital expense for memory is huge. If a hyperscaler or cloud builder is spending tens of billions of dollars a year on IT infrastructure, then it is spending billions of dollars on memory, and driving up memory usage in any way has the potential to save that hyperscaler or cloud builder hundreds of millions of dollars a year.
How bad is the problem? Bad enough for Microsoft to cite a statistic from rival Google, which has said that the average utilization of the DRAM across its clusters is somewhere around 40 percent. That is, of course, terrible. Microsoft took measurements of 100 clusters running on the Azure cloud (that is clusters, not server nodes, and it did not specify the size of these clusters) over a 75-day period, and found out some surprising things.
First, somewhere around 50 percent of the VMs running on these Azure clusters never touch 50 percent of the memory that is configured to them when they are rented. The other interesting bit is that as more and more of the cores are allocated to VMs on a cluster, the share of the memory that becomes stranded rises. Like this:
To be specific, when 75 percent of cores in a cluster are allocated, 6 percent of the memory is stranded. This rises to 10 percent of memory when 85 percent of the cores are allocated to VMs and 13 percent at 90 percent of cores. At full loading of cores it can hit 25 percent, and outliers can push that to as high as 30 percent of DRAM capacity across the cluster being stranded. On the chart on the right above, the workload changed halfway through and there was a lot more memory stranding.
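For a rough feel of how fast stranding climbs, the four data points quoted above can be put into a small Python lookup. The straight-line interpolation between the points is purely our assumption for illustration; Microsoft gives the measured points, not a formula, and the function name is ours.

```python
from bisect import bisect_right

# (percent of cores allocated, percent of memory stranded), as quoted above.
POINTS = [(75, 6.0), (85, 10.0), (90, 13.0), (100, 25.0)]


def stranded_pct(core_alloc_pct: float) -> float:
    """Estimate stranded memory by linear interpolation (an assumption)."""
    xs = [x for x, _ in POINTS]
    if not xs[0] <= core_alloc_pct <= xs[-1]:
        raise ValueError("only defined between 75 and 100 percent of cores")
    # Pick the segment whose right endpoint is at or beyond the query.
    i = min(bisect_right(xs, core_alloc_pct), len(xs) - 1)
    (x0, y0), (x1, y1) = POINTS[i - 1], POINTS[i]
    return y0 + (y1 - y0) * (core_alloc_pct - x0) / (x1 - x0)


for pct in (75, 80, 90, 95, 100):
    print(pct, stranded_pct(pct))
```

The slope between 90 percent and 100 percent of cores is about four times the slope between 75 percent and 85 percent, which is the point: the last cores you sell are the ones that strand the most memory.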
The other neat thing Microsoft noticed on its Azure clusters (which again have VMs of all shapes and sizes running real-world workloads for both Microsoft itself and its cloud customers) is that almost all VMs that companies deploy fit within one NUMA region on a node within the cluster. This is very, very convenient because spanning NUMA regions really messes with VM performance. NUMA spanning happens on about 2 percent of VMs and on less than 1 percent of memory pages, and that is no accident because the Azure hypervisor tries to schedule VMs, both their cores and their memory, on a single NUMA node by intent.
The Azure cloud does not currently pool memory and share it across nodes in a cluster, but that stranded and frigid DRAM memory could be moved to a CXL memory pool without any impact to performance, and some of the allocated local memory on the VMs in a node could be allocated out to a CXL memory pool, which Microsoft calls a zNUMA pool because it is a zero-core virtual NUMA node, and one that Linux understands because it already supports CPU-less NUMA memory extensions in its kernel. This zNUMA software layer is clever in that it has statistical techniques to learn which workloads have memory latency sensitivity and which don't. So if workloads don't have such sensitivity, they get their memory allocated all or in part out to the DRAM pool over CXL, and if they do, then the software allocates memory locally on the node and also from that core-less frigid memory. Here is what the decision tree looks like to give you a taste:
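What follows is not Microsoft's decision tree, just a minimal Python sketch of the policy as the paragraph above describes it. Everything in it is an assumption for illustration: the `Placement` type, the `place_memory` function, and the two inputs (a yes/no latency sensitivity prediction and a predicted fraction of memory the VM will never touch) stand in for whatever the real statistical models produce. We also assume an insensitive VM goes entirely to the pool, and a sensitive VM sends only its predicted-untouched memory there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    local_gb: float  # DRAM on the node's own socket
    pool_gb: float   # DRAM in the CXL-attached zNUMA pool


def place_memory(request_gb: float,
                 latency_sensitive: bool,
                 untouched_fraction: float) -> Placement:
    """Split a VM's memory request between local DRAM and the pool.

    latency_sensitive and untouched_fraction are hypothetical model outputs.
    """
    if request_gb < 0 or not 0.0 <= untouched_fraction <= 1.0:
        raise ValueError("bad request size or fraction")
    if not latency_sensitive:
        # Assumption: insensitive workloads can live entirely in the pool.
        return Placement(local_gb=0.0, pool_gb=float(request_gb))
    # Assumption: sensitive workloads keep the memory they are expected to
    # touch local, and only the predicted-untouched part goes to the pool.
    pool_gb = request_gb * untouched_fraction
    return Placement(local_gb=request_gb - pool_gb, pool_gb=pool_gb)


print(place_memory(64, latency_sensitive=False, untouched_fraction=0.5))
print(place_memory(64, latency_sensitive=True, untouched_fraction=0.5))
```

The interesting engineering is all in how good those two predictions are, since a wrong guess on a latency-sensitive VM is what produces the slowdowns.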
This is a lot hairier than it sounds, as you will see from reading the paper, but the clever bit as far as we are concerned is that Microsoft has come up with a way to create CXL memory pools that doesn't mess with applications and operating systems, which it says is a key requirement for adding CXL extended memory to its Azure cloud. The Azure hypervisor did have to be tweaked to extend the API between the server nodes and the Autopilot Azure control plane to the zNUMA external memory controller, or EMC, which has four 80-bit DDR5 memory channels and multiple CXL ports running over PCI-Express 5.0 links that implement the CXL.mem load/store memory semantics protocol. (We wonder if this is a Tanzanite device, which we talked about recently after Marvell acquired the company.) Each CPU socket in the Azure cluster links to multiple EMCs and therefore multiple blocks of external DRAM that comprise the pool.
The servers used in the Microsoft test are nothing special. They are two-socket machines with a pair of 24-core Skylake Xeon SP-8157M processors. It looks like the researchers emulated a CPU with a CXL memory pool by disabling all of the cores in one socket and making all of its memory available to the first socket over UltraPath links. It is not at all clear how such vintage servers plug into the EMC device, but it must be a PCI-Express 3.0 link since that is all that Skylake Xeon SPs support. We find it peculiar that the zNUMA tests were not run with Ice Lake Xeon SP processors with DDR5 memory on the nodes and PCI-Express 5.0 ports.
The DRAM access time on the CPU socket in a node was measured at 78 nanoseconds and the bandwidth was over 80 GB/sec from the socket-local memory. The researchers say that when using only zNUMA memory the bandwidth is around 30 GB/sec, or about 75 percent of the bandwidth of a CXL x8 link, and it added another 67 nanoseconds to the latency.
Here is what the zNUMA setup looks like:
Microsoft says that a CXL x8 link matches the bandwidth of a DDR5 memory channel. In the simplest configuration, with four or eight total CPU sockets, each EMC can be directly connected to each socket in the pod and the cable lengths are short enough that the latency out to the zNUMA memory is an additional 67 nanoseconds. If you want to hook the zNUMA memory into a larger pool of servers (say, a total of 32 sockets) then you can lower the amount of overall memory that gets stranded, but you have to add retimers to extend the cable and that pushes the latency out to zNUMA memory to around 87 nanoseconds.
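The latency figures are easier to weigh when added up. This Python sketch uses only the numbers quoted in this article: 78 nanoseconds for socket-local DRAM, plus 67 nanoseconds for a directly cabled pool. We are assuming the 87 nanosecond figure for the retimed case is also an adder on top of local latency, and the blended-latency function is our own simplification (a plain weighted average by share of accesses), not a model from the paper.

```python
LOCAL_DRAM_NS = 78.0          # measured socket-local access time
POOL_ADDED_NS_DIRECT = 67.0   # extra latency, small pod, direct cabling
POOL_ADDED_NS_RETIMED = 87.0  # assumed to be an adder too, larger pod with retimers


def pool_latency_ns(added_ns: float, local_ns: float = LOCAL_DRAM_NS) -> float:
    """Total access time for memory that lives in the zNUMA pool."""
    return local_ns + added_ns


def blended_latency_ns(pool_access_fraction: float, added_ns: float,
                       local_ns: float = LOCAL_DRAM_NS) -> float:
    """Average latency if some share of accesses go to the pool (simplified)."""
    if not 0.0 <= pool_access_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return local_ns + pool_access_fraction * added_ns


for label, added in (("direct", POOL_ADDED_NS_DIRECT),
                     ("retimed", POOL_ADDED_NS_RETIMED)):
    total = pool_latency_ns(added)
    print(label, total, round(total / LOCAL_DRAM_NS, 2))

print(blended_latency_ns(0.25, POOL_ADDED_NS_DIRECT))
```

Under those assumptions, pool memory is a bit less than twice as slow as local DRAM in the small pod and a bit more than twice as slow in the retimed one, which is in the same ballpark as hopping to the other socket in a NUMA box.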
Unstranding the memory and driving up overall utilization of the memory is a big deal for Microsoft, but there are performance implications of using the zNUMA memory:
Of the 158 workloads tested above, 20 percent had no slowdown using CXL memory, and 23 percent had a slowdown of 5 percent or less. Which is good. But as you can see, some workloads were hit pretty hard. About a quarter of the workloads had a 20 percent or greater performance hit from using zNUMA memory for at least some of their capacity and 12 percent of the workloads had their performance cropped by 30 percent or more. Applications that are already NUMA aware have been tweaked so they understand memory and compute locality well, and we strongly suspect that workloads will have to be tweaked to use CXL memory and controllers like the EMC device.
And just because we think all memory will have CXL attachment in the server over time does not mean we think that all memory will be local and that CXL somehow makes latency issues disappear. It makes it a little more complicated than a big, fat NUMA box. But not impossibly more complicated, and that is why research like the zNUMA effort at Microsoft is so important. Such research points the way on how this can be done.
Here is the real point: Microsoft found that by pooling memory across 16 sockets and 32 sockets in a cluster, it could reduce the memory demand by 10 percent. That means cutting the cost of servers by 4 percent to 5 percent, and that is real money in the bank. Hundreds of millions of dollars a year per hyperscaler and cloud builder.
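The arithmetic behind that claim is simple enough to sketch in Python. The 10 percent memory reduction comes from Microsoft; the DRAM share of server cost (40 percent to 50 percent) is what the 4 percent to 5 percent figure implies, and the $3 billion annual memory spend is a hypothetical number we picked only to show the scale.

```python
def server_cost_saving(dram_share_of_cost: float, dram_reduction: float) -> float:
    """Fraction of total server cost saved by buying less DRAM."""
    return dram_share_of_cost * dram_reduction


DRAM_REDUCTION = 0.10  # from pooling across 16 to 32 sockets

for share in (0.40, 0.50):
    saving = server_cost_saving(share, DRAM_REDUCTION)
    print(f"DRAM at {share:.0%} of server cost -> {saving:.1%} cheaper servers")

# Hypothetical annual DRAM spend, for scale only.
annual_dram_spend_usd = 3e9
annual_saving_usd = annual_dram_spend_usd * DRAM_REDUCTION
print(f"${annual_saving_usd / 1e6:.0f} million per year")
```

Any buyer spending billions a year on DRAM gets to hundreds of millions in savings from a 10 percent cut, which is why this is worth the hypervisor work.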
We are counting on the people creating the PCI-Express 6.0 and 7.0 standards and the electronics implementing these protocols to push down to reduce latencies as much as they push up to increase bandwidth. Disaggregated memory and the emergence of CXL as a universal memory fabric will depend on this.
Link:
Microsoft Azure Blazes The Disaggregated Memory Trail With zNUMA - The Next Platform