Clockwork discovering wasted bandwidth between the nanoseconds – diginomica

If time is money, then what is the value of a nanosecond (billionth of a second)? Well, if you are building a large network of distributed applications, it could mean a ten percent improvement in performance or a ten percent reduction in cost for the same workload. It could also mean orders of magnitude fewer errors in transaction processing systems and databases.

At least that is according to Balaji Prabhakar, VMware Founders Professor of Computer Science at Stanford University, whose research team helped pioneer more efficient approaches for synchronizing clocks in distributed systems. He later co-founded TickTock, since renamed Clockwork, to commercialize the new technology, and serves as its CEO. He also previously co-founded Urban Engines, which developed algorithms for congestion tracking and was acquired by Google in 2016. He has spent decades designing algorithms to improve network performance.

The company initially focused on improving the fairness of order placement in financial exchanges. It has since started building out a suite of tools to synchronize cloud applications and enterprise networking infrastructure more broadly. Accurate clocks help networks and applications improve consistency, event ordering, and the scheduling of tasks and resources.

This is a big advantage over the quartz clocks underpinning most computer and network timing, which can drift badly enough to confound time-stamping in networks and transaction processing. Traditional network-based synchronization helps reduce this drift but suffers from path noise created by fluctuations in switching times, asymmetries in path lengths, and noisy clock time stamps.

Prabhakar says some customers are interested in cost conservation and want to right-size deployments and switch off virtual machines they no longer need. He notes:

So, if they save 10% or more, and we charge them just 2%, the remaining is just pure savings.

Others want a more performant infrastructure. Clockwork did one case study that found they could get seventy VMs to do the work of a hundred by running apps and infrastructure more efficiently.

It is important to point out that there are two levels of improvement in their new approach. The new protocols can achieve ten-nanosecond accuracy with direct access to networking hardware. In cloud scenarios mediated by virtual machines, the protocol can achieve accuracy within a few microseconds. However, that's still good enough to satisfy the new European MiFID II requirements for high-frequency trading and many other use cases. It is also helpful that the clock sync agent requires less than one percent of a single CPU core and less than 0.04% of the slowest cloud link's bandwidth, while saving 10% of bandwidth.

Perhaps the most important thing to consider is the impact it could have on the trend toward clockless design in distributed systems. Clockless designs help scale up new application and database architectures but make basic operations like consistency, event ordering, and snapshotting difficult.
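
To make the stakes concrete, here is a minimal Python sketch, with an assumed error bound and illustrative names, of how a distributed system can use synchronized clocks with a known uncertainty to order events or delay commits safely:

```python
import time

# Assumed bound: synchronization keeps every clock within +/- 5 microseconds.
CLOCK_ERROR_BOUND_S = 5e-6

def definitely_before(ts_a: float, ts_b: float) -> bool:
    """Event A certainly precedes event B only if their timestamps differ
    by more than the combined uncertainty of the two clocks."""
    return ts_b - ts_a > 2 * CLOCK_ERROR_BOUND_S

def commit_wait(commit_ts: float) -> None:
    """TrueTime-style trick: wait out the uncertainty window so that no
    other node can later assign an earlier timestamp to a newer write."""
    sleep_for = (commit_ts + CLOCK_ERROR_BOUND_S) - time.time()
    if sleep_for > 0:
        time.sleep(sleep_for)
```

The smaller the error bound, the less often event ordering is ambiguous and the shorter any commit wait has to be, which is where nanosecond-level synchronization pays off.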

The more accurate clock sync technology is already showing promise in improving tracing tools, mitigating network congestion, and improving the performance of distributed databases like CockroachDB. Over the last couple of years, Clockwork has been building out supporting infrastructure around its new protocol, HUYGENS, to improve cloud congestion control, create digital twins of virtual machine placement, and improve distributed database performance by ten to a hundred times. The protocol is named after Christiaan Huygens, who invented the pendulum clock in the 1600s; it remained the most accurate timekeeper until the commercialization of quartz clocks in the late 1960s.

The impact of synchronized time is increasingly important as the world transitions from dedicated networks and compute to various forms of statistical multiplexing. Networks have been moving away from dedicated circuits built on technologies like circuit switching and asynchronous transfer mode (ATM), which delivered predictable performance for each user but wasted unused bandwidth. As a result, the industry has been migrating to TCP/IP and wide-area Ethernet, which do a better job of sharing unused bandwidth but can get clogged up, causing delays when the load gets too high.

A similar thing has been happening with compute. Legacy enterprise systems built on dedicated hardware guarantee high performance. However, these struggle to reallocate compute across multiple applications with varying usage requirements or scale out across multiple servers. The move towards virtual machines, cloud architectures, and now containers helps enterprises gain the same economies for compute that TCP/IP brought to networking.

However, problems with statistical multiplexing approaches arise when too many users or apps hit the edges of performance. Packets get lost, and transactions don't get processed, resulting in increased delays and additional overhead as services try to make up for lost time with retries. More precise time synchronization helps networks, apps, and micro-services run closer to peak load and then gracefully back off when required, without wasting resources on packet retries or additional transaction processing.

Referring to the transition from dedicated compute and networks to modern approaches, Prabhakar says:

The trade-off cost us. In communication, we went from deterministic transit times to best-effort service. And computing went from centralized control of dedicated resources to highly variable runtimes and making us coordinate through consensus protocols.

To contextualize the field, the synchronization of mechanical clocks played an important role in improving efficiency and reducing railroad accidents in the 1840s. More recently, innovations in clocks built using quartz, rubidium, and cesium helped pave the way for more reliable and precise clocks. These led to more reliable networks, operations, and automation and played an essential role in the global positioning system (GPS) for accurate location tracking.

However, the inexpensive clocks built into standard computer and networking equipment tend to drift over time. Around 1980, computer scientists developed the network time protocol (NTP) to achieve millisecond (thousandth of a second) accuracy. Although the protocol's timestamps support roughly 200-picosecond (trillionths of a second) resolution, it loses accuracy to the varying delays in packet networks, called packet noise.
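
The gap between NTP's on-paper resolution and its practical accuracy is easy to check with a little arithmetic; the Python snippet below is just a back-of-the-envelope illustration:

```python
# An NTP timestamp carries a 32-bit fractional-second field,
# so its granularity is 2**-32 seconds.
resolution_s = 2 ** -32
print(f"NTP timestamp resolution: {resolution_s * 1e12:.0f} picoseconds")  # ~233 ps

# Accuracy over real packet networks is typically limited to around a
# millisecond by variable queuing delay ("packet noise").
typical_accuracy_s = 1e-3
print(f"Resolution vs. practical accuracy: {typical_accuracy_s / resolution_s:,.0f}x gap")
```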

One widely used NTP implementation, called chrony, combines advanced filtering and tracking algorithms to maintain tighter synchronization. Most cloud providers now recommend chrony and supply optimized configuration files for their VMs.
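
For reference, a typical cloud VM setup is only a few lines of chrony configuration. The example below uses the link-local time server that AWS documents for its instances; other providers publish their own endpoints, and the exact tuning values here are illustrative:

```
# /etc/chrony.conf (illustrative)
# Use the provider's local time source (AWS's documented link-local server here)
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
# Persist the estimated frequency error across reboots
driftfile /var/lib/chrony/drift
# Step the clock at startup if it is far off, then slew gradually afterwards
makestep 1.0 3
# Keep the hardware real-time clock in sync as well
rtcsync
```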

Various other techniques, such as precision time protocol (PTP), data center time protocol (DTP), and pulse per second (PPS), achieve accuracy in the tens of nanoseconds but require expensive hardware upgrades. They also sometimes require precisely measured cables in a data center between a mother clock on a central server and daughter clocks on distributed servers.

Clockwork's HUYGENS innovated on NTP with a pure software approach that can be enhanced by existing networking hardware. It uses coded time transmission signals that help identify and reject bad data caused by queuing delays, random jitter, and network card time stamp noise. It then processes the data using support vector machines to estimate one-way propagation times and achieve clock synchronization within 100 nanoseconds. Prior techniques relied on round-trip measurements, which suffer from differences between each packet's forward and return routes.
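
The filtering idea can be illustrated with a much cruder stand-in. The Python sketch below uses the classic NTP-style exchange and keeps only the least-delayed probe to screen out queuing noise; HUYGENS itself uses coded probes and a support vector machine rather than this simple minimum filter and round-trip formula:

```python
def estimate_offset(samples):
    """Estimate the clock offset between hosts A and B from many probe exchanges.

    Each sample is (t1, t2, t3, t4): send time at A, receive time at B,
    reply time at B, receive time back at A. Keeping only the exchange with
    the least total queuing delay screens out most packet noise.
    """
    best = min(samples, key=lambda s: (s[3] - s[0]) - (s[2] - s[1]))
    t1, t2, t3, t4 = best
    # Classic two-way estimate: it assumes the surviving sample saw symmetric
    # forward and reverse paths, an assumption HUYGENS avoids by estimating
    # one-way propagation times directly.
    return ((t2 - t1) + (t3 - t4)) / 2
```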

Another substantial difference is that HUYGENS trades timing data across a mesh to improve resolution instead of the client-server approach used with NTP. The agent on each machine periodically exchanges small packets with five to ten other machines to determine the clock drift for each server or virtual machine in a mesh. The agent, in turn, generates a multiplier for slowing or speeding up the clock as prescribed by the corrections.
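
A rough sketch of that rate adjustment, an assumed design for illustration rather than Clockwork's actual code, looks like this in Python:

```python
import time

class DisciplinedClock:
    """Applies a frequency multiplier instead of stepping the clock, so local
    time converges smoothly on the value agreed across the mesh."""

    def __init__(self, read_raw=time.monotonic):
        self.read_raw = read_raw      # raw, undisciplined hardware clock
        self.multiplier = 1.0         # >1 speeds the clock up, <1 slows it down
        self._last_raw = read_raw()
        self._adjusted = self._last_raw

    def now(self):
        raw = self.read_raw()
        self._adjusted += (raw - self._last_raw) * self.multiplier
        self._last_raw = raw
        return self._adjusted

    def apply_correction(self, drift_ppm):
        # A positive drift estimate means this clock runs fast relative to
        # its peers, so its rate is scaled down proportionally.
        self.multiplier = 1.0 - drift_ppm * 1e-6
```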

Ideally, all the computers would use the most advanced clocks available, but these are expensive and only practical for special applications. As a result, most modern clocks count the electrical vibrations of quartz crystals that resonate 32,768 times per second (32.768 kilohertz). They are roughly 100 times more accurate than mechanical approaches and inexpensive, but they can drift 6-10 microseconds per second unless temperature-controlled with more expensive hardware.
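
That drift adds up quickly, as a little arithmetic shows:

```python
# How a 6-10 parts-per-million (microseconds-per-second) quartz drift accumulates.
for drift_ppm in (6, 10):
    per_day_s = drift_ppm * 1e-6 * 86_400   # seconds gained or lost per day
    print(f"{drift_ppm} ppm drift: {per_day_s:.2f} s/day, ~{per_day_s * 30:.0f} s/month")
```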

Atomic clocks monitor the cadence of atoms oscillating between energy states. These clocks are so precise that in 1967, the second was redefined as 9.192 billion oscillations of a cesium atom. Rubidium is a cheaper secondary option that ticks at about 6.8 billion hertz. Current atomic clocks drift a second every hundred million years, although in practice they must be replaced every seven years. The most accurate timekeepers today, still confined to labs, use strontium ticking at hundreds of trillions of hertz. They drift only a second in 15 billion years and are used for precise gravity, motion, and magnetic field measurements.

It's important to note that the lack of precision in quartz arises from the lack of temperature controls. Prabhakar says:

If these [quartz] clocks were temperature controlled, you can get down to the parts per billion. So, it'll be some small number of nanoseconds per second. Now, those kinds of clocks and network interface cards could easily be in the few hundreds of dollars to possibly up to $1,000 on their own. And the next level is rubidium clocks, which are three to five grand, and then cesium. As you add these costs to the raw cost of a server, you're piling up the costs across a large data center. So, it'd be nice if we could do it without having to resort to that. And that's more or less what we do.

Understanding virtual infrastructure is a dark art since most cloud providers don't disclose physical placement. In theory, at least, each VM and networking connection is similar. In practice, it is not so simple. Clockwork has been developing a suite of tools to help analyze and optimize cloud infrastructure using the new protocol. One research project last year explored the nuances of VM colocation.

A simple analysis might suggest that two VMs running on the same server would have a better connection to each other since packets could flow over the faster internal bus. But Clockwork's research across the Google, Amazon, and Microsoft clouds revealed this is not necessarily the case. The fundamental issue is that the virtual networking service built into the hypervisors running these VMs creates a bottleneck. Sometimes the hypervisor even routes what should be local networking calls between co-located VMs through acceleration services on the much slower external network rather than over the much faster internal bus.

The problem is compounded when enterprises attempt to colocate multiple VMs running similar apps. For example, a business might have multiple instances of a front-end or business logic app all connected to a back-end database. But performance slows significantly during peak traffic when they are all trying to access the back-end server. In one instance, they found that four co-located VMs only saw a quarter of the expected bandwidth because of this competition. The fundamental problem, they surmised, was that the cloud providers were oversubscribing bandwidth in the belief that each VM would need peak networking at different times.

Although the technology could improve many aspects of distributed networking, Clockwork is focusing on the cloud for now because that is the biggest consolidated market. Prabhakar says:

Cloud is a nice place to sell because it's a place, and it's very big. I'm sure we could improve enterprise LANs and hotel Wi-Fi. But we started with the more consolidated, high-end crowd first and will then go from there.

I never really thought much about time synchronization until I heard about Clockwork a month ago. A few years ago, I was elated that Microsoft started using NTP to automatically tune my computer clock, which always seemed to drift a few minutes per month.

It seems like any protocol or tool that can automatically identify and reduce wasted bandwidth and computer resources could have a long shelf life and provide incredible value. The only concern is that HUYGENS is currently a proprietary protocol, which may limit its broader adoption as opposed to NTP, which became an Internet standard.

It is possible that Google, which bought Prabhakar's prior company and helped develop the technology, may ultimately buy Clockwork and restrict the technology to the Google cloud. That would be a loss for the industry as a whole, but it would serve as a competitive differentiator for Google's growing cloud ambitions. It could also go the other way, with Google releasing it as an open standard, as it has done with many other innovations.
