How Snap rebuilt the infrastructure that now supports 347 million daily users

In 2017, 95% of Snap's infrastructure was running on Google App Engine. Then came the Annihilate FSN project.

Snap, which launched in 2011, was built on GAE. FSN (Feelin-So-Nice) was the name for the original back-end system, and the majority of Snapchat's core functionality ran within a monolithic application on it. While the architecture was initially effective, Snap started encountering issues when it became too big for GAE to handle, according to Jerry Hunter, senior vice president of engineering at Snap, where he runs Snapchat, Spectacles and Bitmoji as well as all back-end or cloud-based infrastructure services.

"Google App Engine wasn't really designed to support really big implementations," Hunter, who joined the company in late 2016 from AWS, told Protocol. "We would find bugs or scaling challenges when we were in our high-scale periods like New Year's Eve. We would really work hard with Google to make sure that we were scaling it up appropriately, and sometimes it just would hit issues that they had not seen before, because we were scaling beyond what they had seen other customers use."

Today, less than 1.5% of Snap's infrastructure sits on GAE, a serverless platform for developing and hosting web applications, after the company broke apart its back end into microservices backed by other services inside of Google Cloud Platform (GCP) and added AWS as its second cloud computing provider. Snap now picks and chooses which workloads to place on AWS or GCP under its multicloud model, playing the competitive edge between them.

The Annihilate FSN project came with the recognition that microservices would provide a lot more reliability and control, especially from a cost and performance perspective.

"[We] basically tried to make the services be as narrow as possible and then backed by a cloud service or multiple cloud services, depending on what the service we were providing was," Hunter said.

Snapchat now has 347 million daily active users, who send billions of short videos and photos called Snaps or use its augmented-reality Lenses.

Its new architecture has resulted in a 65% reduction in compute costs, and Hunter said he has come to deeply understand the importance of having competitors in Snap's supply chain.

"I just believe that providers work better when they've got real competition," said Hunter, who left AWS as a vice president of infrastructure. "You just get better pricing, better features, better service. We're cloud-native, and we intend on staying that way, and it's a big expense for us. We save a lot of money by having two clouds."

The Annihilate FSN process wasn't without at least one failed hypothesis. Hunter mistakenly thought that Snap could write its applications on one layer and that layer would use the cloud provider that best fit a workload. That proved to be "way too hard," he said.

"The clouds are different enough in most of their services and changing rapidly enough that it would have taken a giant team to build something like that," he said. "And neither of the cloud providers were interested at all in us doing that, which makes sense."

Instead, Hunter said, there are three types of services that he looks at from the cloud.

"There's one which is cloud-agnostic," he said. "It's pretty much the same, regardless of where you go, like blob storage or [content-delivery networks] or raw compute on EC2 or GCP. There's a little bit of tuning if you're doing raw compute but, by and large, those services are all pretty much equal. Then there's sort of mixed things where it's mostly the same, but it really takes some engineering work to modify a service to run on one provider versus the other. And then there's things that are very cloud-specific, where only one cloud offers it and the other doesn't. We have to do this process of understanding where we're going to spend our engineering resources to make our services work on whichever cloud that it is."
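
Hunter's three buckets amount to a triage step before any migration work. A minimal sketch of that sorting logic in Python, where the service names and their bucket assignments are hypothetical illustrations rather than Snap's actual inventory:

```python
from enum import Enum

class Portability(Enum):
    AGNOSTIC = "agnostic"              # blob storage, CDNs, raw compute: near-equal anywhere
    MIXED = "mixed"                    # mostly the same, but needs per-provider engineering
    CLOUD_SPECIFIC = "cloud-specific"  # only one provider offers it

# Hypothetical service inventory mapped to Hunter's three buckets.
WORKLOADS = {
    "media-blob-store": Portability.AGNOSTIC,
    "push-notification-queue": Portability.MIXED,
    "managed-ml-pipeline": Portability.CLOUD_SPECIFIC,
}

def migration_effort(service: str) -> str:
    """Rough estimate of what moving a service to the other cloud would take."""
    bucket = WORKLOADS[service]
    if bucket is Portability.AGNOSTIC:
        return "lift and shift, minor tuning"
    if bucket is Portability.MIXED:
        return "port it, budget provider-specific engineering work"
    return "re-architect, or leave it where it is"

for name in WORKLOADS:
    print(f"{name}: {migration_effort(name)}")
```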

Snap's current architecture also has resulted in reduced latency for Snapchatters.

In its early days, Snap had its back-end monolith hosted in a single region in the middle of the United States, in Oklahoma, which hurt performance and users' ability to communicate instantly. If two people living a mile apart in Sydney, Australia, were sending Snaps to each other, for example, the video would have to traverse Australia's terrestrial network and an undersea cable to the United States, be deposited on a server in Oklahoma and then backtrack to Australia.

"If you and I are in a conversation with each other, and it's taking seconds or half a minute for that to happen, you're out of the conversation," Hunter said. "You might come back to it later, but you've missed that opportunity to communicate with a friend. Alternatively, if I have just the messaging stack sitting inside of the data center in Sydney, now you're traversing two miles of terrestrial cable to a data center that's practically right next to you, and the entire transaction is so much faster."
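
The physics alone makes his point. A back-of-the-envelope Python sketch, assuming a great-circle distance of roughly 13,500 km from Sydney to Oklahoma and light traveling at about two-thirds of its vacuum speed in fiber; real routes, hops and server work would only add to these floors:

```python
SPEED_IN_FIBER_KM_S = 200_000  # ~2/3 the speed of light in a vacuum

def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay in milliseconds."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_S * 1_000

# Sydney to Oklahoma is ~13,500 km as the crow flies; a Sydney-local
# data center might be ~3 km away.
print(f"via Oklahoma: >= {min_rtt_ms(13_500):.0f} ms round trip")  # ~135 ms floor
print(f"via Sydney:   >= {min_rtt_ms(3):.2f} ms round trip")       # effectively instant
```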

Snap wanted to regionalize its services where it made sense. "The only way to do that was by using microservices and understanding which services were useful to have close to the customer and which ones weren't," Hunter said.

Customers benefit from having data centers physically closer to them because performance is better, he said. CDNs can cover a lot of the broadcast content, but one-on-one communications (people sending Snaps and Snap videos to each other) involve big chunks of data to move through the network.

That ability to switch regions is one of the benefits of using cloud providers, Hunter said.

"If I want to experiment and move something to Sydney or Singapore or Tokyo, I can just do it," he said. "I'm just going to call them up and say, 'OK, we're going to put our messaging stack in Tokyo,' and the systems are all there, and we try it. If it turns out it doesn't actually make a difference, we turn that service off and move it to a cheaper location."

Snap has built more than 100 services for very specific functions, including Delta Force.

In 2016, any time a user opened the Snapchat app, it would download or redownload everything, including stories that a user had already looked at but that hadn't yet timed out in the app.

"It was a naive deployment of 'just download everything so that you don't miss anything,'" Hunter said. "Delta Force goes and looks at the client, finds out all the things that you've already downloaded and are still on your phone, and then only downloads the things that are net-new."
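
The article doesn't detail Delta Force's actual protocol, but what Hunter describes is classic delta sync: the client reports what it already holds, and the server returns only the difference. A minimal sketch under that assumption, with made-up content IDs:

```python
# Hypothetical server-side feed, keyed by content ID.
SERVER_FEED = {
    "story-101": b"<video bytes>",
    "story-102": b"<video bytes>",
    "snap-903": b"<photo bytes>",
}

def delta_fetch(client_cached_ids: set[str]) -> dict[str, bytes]:
    """Return only the content the client hasn't already downloaded."""
    return {
        content_id: payload
        for content_id, payload in SERVER_FEED.items()
        if content_id not in client_cached_ids
    }

# A client that already holds story-101 downloads two items instead of
# re-downloading all three on every app open.
new_items = delta_fetch({"story-101"})
print(sorted(new_items))  # ['snap-903', 'story-102']
```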

This approach had other benefits.

"Of course, that turns out to make the app faster," Hunter said. "It also costs us way less, so we reduced our costs enormously by implementing that single service."

Snap uses open-source software to create its infrastructure, including Kubernetes for service development, Spinnaker for its application team to deploy software, Spark for data processing and memcached/KeyDB for caching. "We have a process for looking at open source and making sure we're comfortable that it's safe and that it's not something that we wouldn't want to deploy in our infrastructure," Hunter said.

Snap also uses Envoy, an edge and service proxy and universal data plane designed for large, microservice service-mesh architectures.

"I actually feel like the way of the future is using a service mesh on top of your cloud to basically deploy all your security protocols and make sure that you've got the right logins and that people aren't getting access to it that shouldn't," Hunter said. "I'm happy with the Envoy implementation giving us a great way of managing load when we're moving between clouds."
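
Envoy itself is configured declaratively (weighted upstream clusters in its route configuration), not in application code, but the load-shifting idea Hunter describes can be sketched in a few lines of Python; the cluster names and the 90/10 split here are hypothetical:

```python
import random

# Hypothetical canary split while shifting a service between clouds.
CLUSTER_WEIGHTS = {
    "gcp-messaging": 90,  # current home keeps most traffic
    "aws-messaging": 10,  # new cloud gets a canary share
}

def pick_upstream() -> str:
    """Weighted random choice, the way a mesh proxy splits traffic."""
    clusters = list(CLUSTER_WEIGHTS)
    weights = list(CLUSTER_WEIGHTS.values())
    return random.choices(clusters, weights=weights, k=1)[0]

# Ramp the weights toward the new cloud as confidence grows, then retire
# the old cluster; the application code never changes.
print(pick_upstream())
```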

Hunter prefers using "primitives," or simple services, from AWS and Google Cloud rather than managed services. A philosophy that serves Snap well, he said, is the ability to move very fast.

"I don't expect my engineers to come back with perfectly efficient systems when we're launching a new feature that has a service as a back end," he said, noting many of his team members previously worked for Google or Amazon. "Do what you have to do to get it out there; let's move fast. Be smart, but don't spend a lot of time tuning and optimizing. If that service doesn't take off, and it doesn't get a lot of use, then leave it the way it is. If that service takes off, and we start to get a lot of use on it, then let's go back and start to tune it."

It's through that tuning process of understanding how a service operates that cycles of cloud usage can be reduced, resulting in immediate cost savings, according to Hunter.

"Our total compute cost is so large that little bits of tuning can have really large amounts of cost savings for us," he said. "If you're not making the sort of constant changes that we are, I think it's fine to use the managed services that Google or Amazon provide. But if you're in a world where we're constantly making changes, like daily changes, multiple-times-a-day changes, I think you want to have that technical expertise in house so that you can just really be on top of things."
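
The leverage is simple arithmetic. A toy illustration with an entirely hypothetical spend figure (the article cites a 65% compute-cost reduction but no absolute numbers):

```python
# Hypothetical annual compute bill; at this scale, fractional wins are large.
ANNUAL_COMPUTE_SPEND = 500_000_000  # made-up figure, dollars per year

for tuning_gain in (0.005, 0.01, 0.05):  # 0.5%, 1% and 5% efficiency wins
    savings = ANNUAL_COMPUTE_SPEND * tuning_gain
    print(f"a {tuning_gain:.1%} tuning gain saves ${savings:,.0f} per year")
```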

Three factors figure into Snap's ability to reap cost savings: the competition between AWS and Google Cloud, Snap's ability to tweeze out costs as a result of its own work, and going back to the cloud providers and looking at their new products and services.

"We're in a state of doing those three things all the time, and between those three, [we save] many tens of millions of dollars," Hunter said.

Snap holds a "cost camp" every year where it asks its engineers to find all the places where costs possibly could be reduced.

"We take that list and prioritize that list, and then I cut people loose to go and work on those things," he said. "On an annual basis, depending on the year, it's many tens of millions of dollars of cost savings."

Snap has considered adding a third cloud provider, and it could still happen some day, although the process is pretty challenging, according to Hunter.

"It's a big lift to move into another cloud, because you've got those three layers," he said. "The agnostic stuff is pretty straightforward, but then once you get to mixed and cloud-specific, you've got to go hire engineers that are good at that cloud, or you've got to go train your team up on the nuances of that cloud."

Enterprises considering adding another cloud provider need to make sure they have the engineering staff to pull it off: 20 to 30 dedicated cloud people as a starting point, Hunter said.

"It's not cheap, and second, that team has to be pretty sophisticated and technical," he said. "If you don't have a big deployment, it's probably not worth it. I think about a lot of the customers I used to serve when I was in AWS, and the vast majority of them, their implementations were serving their company's internal stuff, and it wasn't gigantic. If you're in that boat, it's probably not worth the extra work that it takes to do multicloud."
