Artificial intelligence (AI) and machine learning (ML) promise to transform whole areas of the economy and society, if they are not already doing so. From driverless cars to customer service bots, AI and ML-based systems are driving the next wave of business automation.
They are also massive consumers of data. After a decade or so of relatively steady growth, the data used by AI and ML models has grown exponentially as scientists and engineers strive to improve the accuracy of their systems. This puts new and sometimes extreme demands on IT systems, including storage.
AI, ML and analytics require large volumes of data, mostly in unstructured formats. "All these environments are leveraging vast amounts of unstructured data," says Patrick Smith, field CTO for Europe, the Middle East and Africa (EMEA) at supplier Pure Storage. "It is a world of unstructured data, not blocks or databases."
Training AI and ML models in particular uses larger datasets for more accurate predictions. As Vibin Vijay, an AI and ML specialist at OCF, points out, a basic proof-of-concept model on a single server might expect to be 80% accurate.
With training on a cluster of servers, this will move to 98% or even 99.99% accuracy. But this puts its own demands on IT infrastructure. Almost all developers work on the basis that more data is better, especially in the training phase. "This results in massive collections, at least petabytes, of data that the organisation is forced to manage," says Scott Baker, CMO at IBM Storage.
Storage systems can become a bottleneck. The latest advanced analytics applications make heavy use of CPUs and especially GPU clusters, connected via technology such as Nvidia InfiniBand. Developers are even looking at connecting storage directly to GPUs.
"In AI and ML workloads, the learning phase typically employs powerful GPUs that are expensive and in high demand," says Brad King, co-founder and field CTO at supplier Scality. "They can chew through massive volumes of data and can often wait idly for more data due to storage limitations."
Data volumes are generally large. Large is a relative term, of course, but in general, for extracting usable insights from data, the more pertinent data available, the better the insights.
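The idle-GPU problem King describes can be put into rough numbers. The following is a minimal sketch, assuming storage throughput is the only limit on the GPUs; the function name and all figures are hypothetical and chosen purely for illustration.

```python
def gpu_utilisation(required_gbps: float, delivered_gbps: float) -> float:
    """Fraction of time the GPUs stay busy if storage is the only bottleneck.

    required_gbps:  aggregate read rate the GPUs could consume (GB/s)
    delivered_gbps: read rate the storage system actually sustains (GB/s)
    """
    if required_gbps <= 0:
        raise ValueError("required_gbps must be positive")
    # GPUs cannot be more than 100% busy, so cap the ratio at 1.0.
    return min(1.0, max(0.0, delivered_gbps) / required_gbps)


# Hypothetical cluster: 8 GPUs that can each consume 2 GB/s,
# fed by storage that sustains 10 GB/s in total.
required = 8 * 2.0
utilisation = gpu_utilisation(required, 10.0)
print(f"GPU utilisation: {utilisation:.1%}")  # GPU utilisation: 62.5%
```

In this made-up case the GPUs would sit idle for more than a third of the time, which is the kind of gap that drives interest in faster storage tiers.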
The challenge is to provide high-performance storage at scale and within budget. As OCF's Vijay points out, designers might want all storage on high-performance tier 0 flash, but this is rarely, if ever, practical. And because of the way AI and ML work, especially in the training phases, it might not be needed.
Instead, organisations are deploying tiered storage, moving data up and down through the tiers all the way from flash to the cloud and even tape. "You're looking for the right data, in the right place, at the right cost," says Vijay.
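A tiering policy of this kind can be sketched as a simple rule that maps how recently data was accessed to a storage tier. This is an illustrative sketch only: the tier names follow the article, but the age thresholds and the function name are assumptions, and real products typically weigh more factors than access age.

```python
# Hypothetical policy: maximum days since last access for each tier.
# The thresholds are illustrative and would need tuning per workload.
TIERS = [
    (7, "flash"),    # hot data used by current training runs
    (90, "disk"),    # warm data that may be revisited soon
    (365, "cloud"),  # cool data kept for occasional retraining
]
ARCHIVE_TIER = "tape"  # everything older goes to long-term archive


def choose_tier(days_since_access: int) -> str:
    """Return the tier a dataset should live on under the policy above."""
    if days_since_access < 0:
        raise ValueError("days_since_access cannot be negative")
    for max_age_days, tier in TIERS:
        if days_since_access <= max_age_days:
            return tier
    return ARCHIVE_TIER


for age in (1, 30, 200, 1000):
    print(age, choose_tier(age))
```

Keeping the thresholds in one table makes the cost trade-off explicit: moving a boundary shifts data between cheaper and faster tiers without touching the rest of the logic.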
Firms also need to think about data retention. Data scientists cannot predict which information is needed for future models, and analytics improve with access to historical data. Cost-effective, long-term data archiving remains important.
There is no single option that meets all the storage needs for AI, ML and analytics. The conventional idea that analytics is a high-throughput, high-I/O workload best suited to block storage has to be balanced against data volumes, data types, the speed of decision-making and, of course, budgets. An AI training environment makes different demands to a web-based recommendation engine working in real time.
"Block storage has traditionally been well suited for high-throughput and high-I/O workloads, where low latency is important," says Tom Christensen, global technology adviser at Hitachi Vantara. "However, with the advent of modern data analytics workloads, including AI, ML and even data lakes, traditional block-based platforms have been found lacking in the ability to meet the scale-out demand that the computational side of these platforms create. As such, a file and object-based approach must be adopted to support these modern workloads."
Block-based systems retain the edge in raw performance, and support data centralisation and advanced features. According to IBM's Scott Baker, block storage arrays support application programming interfaces (APIs) that AI and ML developers can use to improve repeated operations or even offload storage-specific processing to the array. It would be wrong to rule out block storage completely, especially where the need is for high IOPS and low latency.
Against this, there is the need to build specific storage area networks for block storage (usually Fibre Channel) and the overheads that come with block storage relying on an off-array (host-based) file system. As Baker points out, this becomes even more difficult if an AI system uses more than one OS.
As a result, system architects favour file or object-based storage for AI and ML. Object storage is built with large, petabyte capacity in mind, and is built to scale. It is also designed to support applications such as the internet of things (IoT).
Erasure coding provides data protection, and the advanced metadata support in object systems can benefit AI and ML applications.
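The capacity cost of erasure coding is easy to work out. The sketch below assumes a maximum-distance-separable scheme such as Reed-Solomon, in which any k of the k + m shards are enough to rebuild an object; the function name and the 8 + 4 layout are hypothetical examples, not figures from any supplier.

```python
def erasure_coding_profile(data_shards: int, parity_shards: int) -> dict:
    """Summarise protection and capacity overhead for a k + m layout.

    Assumes an MDS code (for example Reed-Solomon), so the object
    survives the loss of up to `parity_shards` shards.
    """
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need at least one data shard and no negative parity")
    total = data_shards + parity_shards
    return {
        "tolerated_shard_losses": parity_shards,
        "raw_per_usable": total / data_shards,   # raw capacity per unit stored
        "usable_fraction": data_shards / total,  # share of raw capacity usable
    }


# Hypothetical 8 + 4 layout: survives 4 lost shards at 1.5x raw capacity.
print(erasure_coding_profile(8, 4))
```

For comparison, keeping three full replicas tolerates two losses at 3x raw capacity, which is why erasure coding is attractive at petabyte scale.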
Against this, object storage lags behind block systems for performance, although the gap is closing with newer, high-performance object technologies. And application support varies, with not all AI, ML or analytics tools supporting AWS's S3 interface, the de facto standard for object.
Cloud storage is largely object-based, but offers other advantages for AI and ML projects. Chief among these are flexibility and low up-front costs.
The principal disadvantages of cloud storage are latency, and potential data egress costs. Cloud storage is a good choice for cloud-based AI and ML systems, but it is harder to justify where data needs to be extracted and loaded onto local servers for processing, because this increases cost. But the cloud is economical for long-term data archiving.
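The egress penalty for pulling cloud data back to local servers can be estimated with simple arithmetic. This is a rough sketch: the per-gigabyte price and dataset size are hypothetical placeholders, not any provider's actual tariff, and real bills usually involve tiered pricing and request charges as well.

```python
def monthly_egress_cost(dataset_tb: float, pulls_per_month: int,
                        price_per_gb: float) -> float:
    """Estimate the monthly charge for repeatedly downloading a dataset.

    dataset_tb:      dataset size in decimal terabytes (1 TB = 1,000 GB)
    pulls_per_month: how many times the full dataset is extracted
    price_per_gb:    assumed flat egress price per gigabyte
    """
    if dataset_tb < 0 or pulls_per_month < 0 or price_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return dataset_tb * 1000 * pulls_per_month * price_per_gb


# Hypothetical: a 50 TB training set pulled on-site four times a month
# at an assumed flat price of 0.05 per GB.
cost = monthly_egress_cost(50, 4, 0.05)
print(f"Estimated monthly egress cost: {cost:,.0f}")
```

An archive that is written once and rarely read incurs almost none of this charge, which is consistent with the cloud being economical for long-term retention.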
Unsurprisingly, suppliers do not recommend a single solution for AI, ML or analytics: the number of applications is too broad. Instead, they recommend looking at the business requirements behind the project, as well as looking to the future.
"Understanding what outcomes or business purpose you need should always be your first thought when choosing how to manage and store your data," says Paul Brook, director of data analytics and AI for EMEA at Dell. "Sometimes the same data may be needed on different occasions and for different purposes."
Brook points to convergence between block and file storage in single appliances, and systems that can bridge the gap between file and object storage through a single file system. This will help AI and ML developers by providing more common storage architecture.
HPE, for example, recommends on-premises, cloud and hybrid options for AI, and sees convergence between AI and high-performance computing. NetApp promotes its cloud-connected, all-flash storage system ONTAP for AI.
At Cloudian, CTO Gary Ogasawara expects to see convergence between the high-performance batch processing of the data warehouse and streaming data processing architectures. This will push users toward object solutions.
"Block and file storage have architectural limitations that make scaling beyond a certain point cost-prohibitive," he says. "Object storage provides limitless, highly cost-effective scalability. Object storage's advanced metadata capabilities are another key advantage in supporting AI/ML workloads."
It is also vital to plan for storage at the outset, because without adequate storage, project performance will suffer.
"In order to successfully implement advanced AI and ML workloads, a proper storage strategy is as important as the advanced computation platform you choose," says Hitachi Vantara's Christensen. "Underpowering a complex distributed, and very expensive, computation platform will net lower performing results, diminishing the quality of your outcome, ultimately reducing the time to value."
Continued here:
Storage requirements for AI, ML and analytics in 2022 - ComputerWeekly.com