This article discusses how Apache Pulsar handles storage and compares it to how other popular data processing technologies, such as Apache Kafka, deal with storage.
Apache Pulsar's multilayered architecture completely decouples the message serving layer from the message storage layer, allowing each to scale independently. Traditional distributed data processing technologies such as Hadoop and Spark have taken the approach of co-locating data processing and data storage on the same cluster nodes or instances. That design choice offered a simpler infrastructure and some possible performance benefits from reduced data transfer over the network, but it came with tradeoffs that affect scalability, resiliency and operations.
Pulsar's architecture takes a very different approach, one that's starting to gain traction in a number of cloud-native solutions: separation of compute and storage. This approach is made possible in part by the significant improvements in network bandwidth that are commonplace today. Pulsar decouples data serving and data storage into separate layers: data serving is handled by stateless "broker" nodes, while data storage is handled by "bookie" nodes, as shown in Figure 1.
This decoupling has many benefits. For one, it enables each layer to scale independently, providing elastic capacity that is practically unlimited. By leveraging the ability of elastic environments (such as cloud and containers) to automatically scale resources up and down, this architecture can dynamically adapt to traffic spikes. It also improves system availability and manageability by significantly reducing the complexity of cluster expansions and upgrades. Further, this design is container-friendly, making Pulsar well suited to hosting a cloud-native streaming system. Apache Pulsar is backed by a highly scalable, durable stream storage layer based on Apache BookKeeper that provides strong durability guarantees, distributed data storage and replication, and built-in geo-replication.
A natural extension of the multilayered approach is the concept of tiered storage, in which less frequently accessed data can be offloaded to a more cost-effective persistence store such as Amazon S3 or Azure Blob Storage. Pulsar provides the ability to configure the automated offloading of data from local disks in the storage layer to those popular cloud storage platforms. These offloads are triggered based upon a predefined storage size or time period and provide you with a safe backup of all your event data while simultaneously freeing up storage capacity on the local disk for incoming data.
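To make the size and time triggers concrete, here is a minimal Python sketch of an offload decision. It is a simplified model rather than Pulsar's actual implementation: the `Segment` class, the `select_for_offload` function and both threshold parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a closed segment held on local disk."""
    segment_id: int
    size_bytes: int
    age_seconds: float
    offloaded: bool = False


def select_for_offload(segments, size_threshold_bytes, max_age_seconds):
    """Return the segments to offload, oldest first.

    A segment is picked if it is older than max_age_seconds, or if the
    data remaining on local disk still exceeds size_threshold_bytes.
    """
    local = [s for s in segments if not s.offloaded]
    local_bytes = sum(s.size_bytes for s in local)
    picked = []
    for seg in sorted(local, key=lambda s: s.age_seconds, reverse=True):
        too_old = seg.age_seconds > max_age_seconds
        too_big = local_bytes > size_threshold_bytes
        if too_old or too_big:
            picked.append(seg)
            local_bytes -= seg.size_bytes
    return picked


segments = [
    Segment(0, 100, age_seconds=500),
    Segment(1, 100, age_seconds=300),
    Segment(2, 100, age_seconds=100),
]
to_offload = select_for_offload(segments, size_threshold_bytes=150, max_age_seconds=400)
print([s.segment_id for s in to_offload])  # [0, 1]
```

Segment 0 is offloaded because of its age, segment 1 because local storage is still above the size threshold, and segment 2 stays on local disk.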
Both Apache Kafka and Apache Pulsar have similar messaging concepts. Clients interact with both systems via topics that are logically separated into multiple partitions. When an unbounded data stream is written to a topic, it is often divided into a fixed number of equal-sized groupings known as partitions. This allows the data to be evenly distributed across the system and consumed by multiple clients concurrently.
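As a rough illustration of how a stream can be spread over a fixed number of partitions, the following Python sketch routes each message by hashing its key. The function name `partition_for` is hypothetical, and CRC32 is used only as a convenient stable hash; the real Kafka and Pulsar clients use their own hash functions and routing modes.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Group a small batch of keyed messages by the partition they land on.
messages = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
by_partition = defaultdict(list)
for key, payload in messages:
    by_partition[partition_for(key, num_partitions=4)].append(payload)

for partition, payloads in sorted(by_partition.items()):
    print(partition, payloads)
```

Because the hash is stable, every message with the same key lands on the same partition, which preserves per-key ordering while still letting several consumers read different partitions concurrently.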
The fundamental difference between Apache Pulsar and Apache Kafka is the underlying architectural approach each system takes to storing these partitions. Apache Kafka is a partition-centric pub/sub system that is designed to run as a monolithic architecture in which the serving and storage layers are located on the same node.
In Kafka, the partition data is stored as a single continuous piece of data on the leader node, and then replicated to a preconfigured number of replica nodes for redundancy. This design limits the capacity of the partition, and by extension the topic, in two ways. First, since the partition must be stored on local disk, the maximum size of the partition is that of the largest single disk on the host machine (approximately 4 TB in a "fresh" install scenario). Second, since the data must be replicated, the partition can only grow to the size of the smallest amount of disk space available on the replica nodes.
Let's consider a scenario in which you were fortunate enough to have your leader placed on a new node that can dedicate an entire 4 TB disk to the storage of the partition, while the two replica nodes each have only 1 TB of storage capacity. After you have published 1 TB of data to the topic, Kafka would detect that the replica nodes are unable to receive any more data, and all incoming messages on the topic would be halted until space is made available on the replica nodes, as shown in Figure 3. This scenario could potentially lead to data loss if you have producers that are unable to buffer the messages during this outage.
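The capacity limit in this scenario comes down to simple arithmetic: the partition can grow only as large as the smallest disk among the leader and its replicas. A minimal Python sketch, using a hypothetical helper name and the disk sizes from the scenario above:

```python
def partition_capacity_tb(leader_disk_tb, replica_disks_tb):
    """Usable partition size when every replica must hold a full copy."""
    return min([leader_disk_tb, *replica_disks_tb])


capacity = partition_capacity_tb(4.0, [1.0, 1.0])
print(capacity)  # 1.0
```

Even though the leader has 4 TB available, 3 TB of it can never be used by this partition.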
Once you have identified the issue, you have only two remedies. The first is to make more room on the existing replica nodes by deleting data from their disks, which will result in data loss, since that data belongs to other topics and most likely has not been consumed yet. The second is to add nodes to the Kafka cluster and "rebalance" the partition so that the newly added nodes serve as the replicas. Unfortunately, this requires recopying the entire 1 TB partition, which is an expensive, time-consuming and error-prone process that requires an enormous amount of network bandwidth and disk I/O. What's worse is that the entire partition is completely offline during this process, which is not an ideal situation for a production application that has stringent uptime SLAs.
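For a rough sense of what recopying 1 TB means, the Python sketch below estimates the transfer time over a single network link. The link speeds are assumptions chosen for illustration, not figures from this article, and the estimate ignores disk I/O limits and competing traffic, both of which would make the real number worse.

```python
def recopy_hours(partition_bytes, link_bits_per_second, utilization=1.0):
    """Lower-bound time to copy a partition over one network link."""
    seconds = partition_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 3600


ONE_TB = 10**12  # bytes, decimal terabyte

print(round(recopy_hours(ONE_TB, 10**9), 2))       # 1 Gbit/s link: 2.22 hours
print(round(recopy_hours(ONE_TB, 10 * 10**9), 2))  # 10 Gbit/s link: 0.22 hours
```

This is the cost for one replica. Each additional replica that has to be rebuilt requires another full copy of the partition.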
Unfortunately, recopying of partition data isn't limited to only cluster expansion scenarios in Kafka. Several other failures can trigger data recopying, including replica failures, disk failures or machine failures. This limitation is often missed until users experience a failure in a production scenario.
Within a segment-centric storage architecture, such as the one used by Apache Pulsar, partitions are further broken down into segments that are rolled over based on a preconfigured time or size limit. These segments are then evenly distributed across a number of bookies in the storage layer for redundancy and scale.
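The following Python sketch illustrates the idea of rolling a partition over into size-limited segments and spreading them across bookies. It is a simplified model with hypothetical function names: each segment is placed on a single bookie in round-robin order, whereas a real BookKeeper deployment replicates each segment across several bookies using its own placement policy.

```python
from itertools import cycle


def roll_segments(message_sizes, max_segment_bytes):
    """Group messages into segments, starting a new segment whenever
    adding the next message would exceed the size limit."""
    segments, current, current_bytes = [], [], 0
    for size in message_sizes:
        if current and current_bytes + size > max_segment_bytes:
            segments.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        segments.append(current)
    return segments


def place_round_robin(num_segments, bookies):
    """Assign segment indexes to bookies in round-robin order."""
    ring = cycle(bookies)
    return {segment: next(ring) for segment in range(num_segments)}


segments = roll_segments([40, 40, 40, 40, 40], max_segment_bytes=100)
print(len(segments))  # 3
print(place_round_robin(len(segments), ["bookie1", "bookie2"]))
# {0: 'bookie1', 1: 'bookie2', 2: 'bookie1'}
```

Because no single segment has to hold the whole partition, the partition's total size is no longer bounded by any one disk.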
Let's now look at how Apache Pulsar behaves in the equivalent of the Kafka scenario discussed above, in which the disk on one of the bookies fills up and can no longer accept incoming data. Since the partition is further broken down into small segments, there is no need to replicate the content of the entire bookie to the newly added bookie. Instead, Pulsar continues to write incoming message segments to the remaining bookies that have storage capacity until the new bookie is added. At that point, traffic instantly and automatically ramps up on the new nodes or new partitions, and old data doesn't have to be recopied.
As we can see in Figure 5, during the period when the fourth bookie stopped accepting segments, incoming data segments 4, 5, 6 and 7 were routed to the remaining active bookies. Once the new bookie was added, segments were routed to it automatically. During this entire process, Pulsar experienced no downtime and was able to continue serving producers and consumers. As you can see, Pulsar's storage system is more flexible and scalable in this type of situation.
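The behavior described above can be sketched in a few lines of Python. This is a toy placement policy that simply picks the bookie with the most free capacity; it is not BookKeeper's actual placement algorithm, and the function name, bookie names and capacity numbers are hypothetical.

```python
def place_segments(segment_ids, free_slots):
    """Assign each new segment to the bookie with the most free capacity.

    free_slots maps a bookie name to the number of segments it can still
    accept, and is updated in place as segments are assigned.
    """
    placement = {}
    for segment in segment_ids:
        bookie = max(free_slots, key=free_slots.get)
        if free_slots[bookie] <= 0:
            raise RuntimeError("no bookie has free capacity")
        placement[segment] = bookie
        free_slots[bookie] -= 1
    return placement


# bookie4 is full, so segments 4 through 7 go to the other three bookies.
free_slots = {"bookie1": 2, "bookie2": 2, "bookie3": 2, "bookie4": 0}
before = place_segments([4, 5, 6, 7], free_slots)
print(before)
# {4: 'bookie1', 5: 'bookie2', 6: 'bookie3', 7: 'bookie1'}

# A new bookie joins and immediately receives new segments;
# nothing that was already written has to move.
free_slots["bookie5"] = 4
after = place_segments([8, 9], free_slots)
print(after)
# {8: 'bookie5', 9: 'bookie5'}
```

The earlier placements are untouched when the new bookie arrives, which is the key difference from the partition-centric rebalance described for Kafka.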
About the author
David Kjerrumgaard is the director of solution architecture at Streamlio, and a contributor to the Apache Pulsar and Apache NiFi projects.
See more here:
Apache Pulsar vs. Kafka and other data processing technologies - TechTarget