Not so very long ago, distributed computing meant clustering together a bunch of cheap X86 servers and equipping them with some form of middleware that allowed work to be distributed across hundreds to thousands to sometimes tens of thousands of nodes. Such scale-out approaches, which added complexity to the software stack, were necessary because normal SMP and NUMA scale-up techniques, with very tightly coupled compute and shared memory across a dozen or two nodes, simply could not stretch any further.
These distributed systems, which were difficult enough to build, are child's play compared to what we at The Next Platform are starting to call hyperdistributed systems. These are evolving as disaggregation and composability have entered the imagination of system architects, at the same time as a wider and wider variety of compute, memory, storage, and networking components are available and are expected to be used in flexible rather than static ways.
The problem, say the co-founders of a stealth-mode startup called Enfabrica, is that this new hyperdistributed architecture has more bottlenecks than a well-stocked bar. And they say they have developed a combination of silicon, system hardware, and software that will create a new I/O architecture that better suits hyperdistributed systems. Enfabrica is not uncloaking from stealth mode just yet, but the company's founders reached out to us as they were securing their first round of funding, $50 million from Sutter Hill Ventures, and wanted to elaborate on the problems they see in modern distributed systems before they eventually disclose how they have solved those problems.
Enfabrica was formed in 2020 by Rochan Sankar, its chief executive officer, Shrijeet Mukherjee, its chief development officer, plus other founding engineers, and its founding advisor is Christos Kozyrakis, a professor of electrical engineering and computer science at Stanford University for the past two decades who got his PhD in computer science at the University of California at Berkeley with none other than David Patterson as his PhD advisor. Kozyrakis runs the Multiscale Architecture and Systems Team (MAST) at Stanford and has done research stints at Google and Intel, among other organizations; he has done extensive work on vector processors, operating systems, cluster managers for clouds, and transactional memory systems.
Sankar got his bachelor's in electrical engineering from the University of Toronto and an MBA from the Wharton School at the University of Pennsylvania. He spent seven years at Cypress Semiconductor as an application engineer and chip architect, and was notably the director of product marketing and management at Broadcom, where he drove five generations of its Trident and Tomahawk datacenter switching ASICs, which had over 300 million ports sold and generated billions of dollars in revenue for Broadcom.
Mukherjee got his master's at the University of Oregon and spent eight years at Silicon Graphics working on high-end graphics systems before joining Cisco Systems as a member of its technical staff and becoming a director of engineering on the groundbreaking "California" Unified Computing System converged server-network system, specifically working on the virtual interface card that is a predecessor to the DPUs we see emerging today. After that, Mukherjee spent nearly seven years at Cumulus Networks as vice president of software engineering, building the software team that created its open source switch software (now part of the Nvidia stack along with the switch ASICs, NICs, and switch operating systems from the $6.9 billion acquisition of Mellanox Technologies). When Nvidia bought Cumulus, Mukherjee did a two-year stint at Google working on network architecture and platforms, and he can't say much more about what he did there, as usual.
Sankar and Mukherjee got to know one another because it was natural for the leading merchant silicon supplier for hyperscaler and cloud builder switches to get to know the open source network operating system supplier. Cumulus needed Broadcom more than the other way around, of course. Mukherjee and Kozyrakis worked together during their stints at Google. The team they have assembled (the exact number is a secret) consists of system architects and distributed systems engineers that have deployed planetscale software, as Mukherjee put it, including people from Amazon Web Services, Broadcom, Cisco Systems, Digital Ocean, Facebook, Google, Intel, and Oracle.
"We jointly saw a massive transformation happening in distributed computing," Sankar tells The Next Platform. "And that is being keyed by the deceleration of Moore's Law and on the fact that Intel has lost the leadership role in setting the pace on server architecture iterations. It is no longer a tick-tock cycle, which then drove all of the corresponding silicon and operating system innovation. That has been completely disrupted by the hyperscalers and cloud builders. And we are now in a race with heterogeneous instances of compute, storage, and networking, where we see a diversity of solutions, cloud sourced processors, other ASICs, GPUs, transcoders, FPGAs, disaggregated flash, potentially disaggregated memory. What we saw happening at the datacenter level in terms of the disaggregation of the architecture and the need for interconnects at the datacenter level is now headed straight into the rack."
It is hard to argue with that, and we don't. We see the same thing happening, and the I/O is way far out of whack with the compute and the storage. Take AI as an example.
"AI chips are basically improving their processing capabilities by 10X to 100X, depending on who you believe," says Kozyrakis. "At the same time, systems are becoming bigger. If you look at just hyperscalers, it's an order of magnitude increase in the size of datacenters. So we have this massive increase in compute capacity. But we need to provide the 10X, the 100X, the 1000X really, in the I/O connectivity infrastructure. Otherwise, it will be very difficult to bring the benefits of this capacity to bear."
To put it bluntly, hyperscaling was relatively easy, if no less impressive for its time, but hyperdistribution is much more complex and it is never going to work without the right I/O. With hyperscaling, says Sankar, distributed systems were built with parent-child query architectures mapped onto homogeneous two-socket X86 server nodes with the same memory and storage and the same network interfaces. The hardware was essentially the same, and that made it all easy and drove volume economics to boot.
"Datacenters are evolving into data pipelines," Sankar explains. "The diversity of what is happening in the software layer with respect to how data is being processed is mapping into the infrastructure layers, and it is driving increasing heterogeneity in the server architectures to make them optimized. We firmly believe that the solutions that are being sketched out today suffer from problems with scalability and performance, and they suffer from the inability to be best of breed across a wide range of composable architectures."
And without really getting into specifics, Enfabrica says it is building the hardware and software that is going to glue all of this compute, storage, and networking together in a more scalable fashion. We strongly suspect that Enfabrica will borrow some ideas from fast networks and DPUs, but that this is also more than just having a DPU in every server and lashing them together. Pensando, Fungible, Nvidia, and Annapurna Labs within Amazon Web Services are already doing that. And to be frank, what those companies will tell you is that many of the ideas that are in those smart-NICs or DPUs came from the work that Mukherjee did on the virtual network and storage interfaces in the UCS platform. The work Mukherjee did with Cumulus has also figured prominently in the way certain hyperscalers do their networking today, by the way.
Without getting into specifics, since the company is still in stealth mode, Enfabrica thinks it has come up with a better idea for massively distributed I/O.
"If you look at all of these companies, they have built a product and now they are going to try to convince people to use them," says Mukherjee. "Whereas we assembled a team of people who know what the product needs to do and how it will actually fit into the lattice of compute, network, and storage that it needs to fit into. This difference actually changes how we emphasize what's hardware and what's software, and where you need to put effort in and where you don't. For example, to make a very illustrative point: how big should a table of something be? Hardware is always going to be limited, software will always want everything to be unlimited. How do you make those decisions and how do you partition? It requires people who have delivered these kinds of solutions because they understand where people are willing to take a cutback and where absolute line performance matters."
We realize that none of this tells you what Enfabrica is doing. But we can tell you how the company is thinking about I/O in the datacenter and the market sizes and players in these areas that it plans to disrupt. Take a look at this chart we have assembled:
This is what Sankar calls the $10 billion I/O problem that Enfabrica is trying to solve, and that is roughly the total addressable market of all of the silicon for interconnects shown above. This lays out all of the shortcomings of various layers of the interconnect stack.
Whatever Enfabrica is doing, we strongly suspect that it is going to disrupt each of these layers within the server, within the rack, across the rows, and within the walls of the datacenter. The company is still in stealth mode and is not saying, but we expect to hear more in 2021 and 2022 as it works to intercept a slew of different technologies and scale-out systems that are being architected for 2023 and beyond.
Read the original here:
Enfabrica Takes On Hyperdistributed I/O Bottlenecks - The Next Platform