REPORT Data Storage Architectures for Machine Learning and Artificial Intelligence On-Premises and Public Cloud

ENRICO SIGNORETTI
TOPICS: ARTIFICIAL INTELLIGENCE, DATA STORAGE, MACHINE LEARNING

TABLE OF CONTENTS

1. Summary
2. Market Framework
3. Maturity of Categories
4. Considerations for Selecting ML/AI Storage
5. Vendors to Watch
6. Near-Term Outlook
7. Key Takeaways
8. About Enrico Signoretti
9. About GigaOm
10. Copyright

1. Summary

There is growing interest in machine learning (ML) and artificial intelligence (AI) in enterprise organizations. The market is quickly moving from infrastructures designed for research and development to turn-key solutions that respond quickly to new business requests. ML/AI are strategic technologies across all industries, improving business processes while enhancing the competitiveness of the entire organization.

ML/AI software tools are improving and becoming more user-friendly, making it easier to build new applications or reuse existing models for more use cases. As the ML/AI market matures, high-performance computing (HPC) vendors are now joined by traditional storage manufacturers, which are usually focused on enterprise workloads. Even though the requirements are similar to those of analytics workloads, the specific nature of ML/AI algorithms and GPU-based computing demands more attention to throughput and $/GB, primarily because of the sheer amount of data involved in most projects.

Depending on several factors, including the organization’s strategy, size, security needs, compliance, cost control, and flexibility, the infrastructure could be entirely on-premises, in the public cloud, or a combination of both (hybrid), as shown in Figure 1. The most flexible solutions are designed to run in all of these scenarios, giving organizations ample freedom of choice. In general, long-term, large-capacity projects run by skilled teams are more likely to be developed on-premises, while the public cloud is usually chosen by smaller teams, for its flexibility, and for less demanding projects.

ML/AI workloads require infrastructure efficiency to yield rapid results. With the exception of the initial data collection, many parts of the workflow are repeated over time, so managing latency and throughput is crucial for the entire process. The system must handle metadata quickly while maximizing throughput to ensure that GPUs are always fed at their utmost capacity.

A single modern GPU is a very expensive component, able to crunch data at 6 GB/s and more, and each compute node can have multiple GPUs installed. Additionally, CPU-storage proximity matters, which is why NVMe-based flash devices are usually selected for their parallelism and performance. What is more, the data sets require a huge amount of storage capacity to train the neural network. For this reason, scale-out object stores are usually preferred because of their scalability characteristics, rich metadata, and competitive cost.
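The arithmetic behind the "feed the GPUs" requirement can be sketched in a few lines. This is a back-of-envelope sizing exercise only: the per-GPU figure comes from the text above, while the node and cluster counts are illustrative assumptions, not vendor specifications.

```python
# Back-of-envelope sizing: aggregate read throughput the storage tier
# must sustain to keep every GPU in a training cluster fed.
# GPU_INGEST_GBPS is taken from the text; the other figures are
# hypothetical examples.

GPU_INGEST_GBPS = 6   # per-GPU data consumption, GB/s
GPUS_PER_NODE = 8     # e.g. a dense GPU server (assumed)
NODES = 4             # assumed cluster size

def required_throughput_gbps(gpus_per_node: int, nodes: int,
                             per_gpu_gbps: float) -> float:
    """Aggregate throughput needed so no GPU ever starves for data."""
    return gpus_per_node * nodes * per_gpu_gbps

total = required_throughput_gbps(GPUS_PER_NODE, NODES, GPU_INGEST_GBPS)
print(f"Storage must sustain ~{total:.0f} GB/s for {NODES * GPUS_PER_NODE} GPUs")
```

Even this small hypothetical cluster demands roughly 192 GB/s of sustained reads, which is why NVMe parallelism on the performance tier matters so much.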

Figure 1: Possible combination of storage and computing resources in ML/AI projects

In this report, we discuss the most recent storage architecture designs and innovative solutions deployed on-premises, cloud, and hybrid, aimed at supporting ML/AI workloads for enterprise organizations of all sizes.

Key findings:

• Enterprise organizations are aware of the strategic value of ML/AI for their business and are increasing investments in this area.

• End users are looking for turn-key solutions that are easy to implement and that deliver a quick ROI (Return on Investment).

• Many of the solutions available are based on a two-tier architecture, with a flash-based, parallel, scale-out file system for active data processing and an object store for capacity and long-term data retention. There are also some innovative solutions that take a different approach, integrating the two tiers in a single system.

2. Market Framework

There are several processes involved in an ML workflow and most of them need to be repeated several times. They all demonstrate different workload characteristics and need to encompass large data sets to return effective results.

Very briefly, an ML workflow (figure 2) can be summarized as follows:

• Data collection. Data is collected from one or multiple sources into a single repository. Capacity, scalability, and throughput are the most important metrics particularly with active data sources and non-historical data.

• Data preparation. The data set is analyzed and stripped of anomalies, out-of-range data, inconsistency, noise, errors, and any other exceptions that could compromise the outcome. Data is tagged and indexed, making it searchable and reusable. This section of the workflow is characterized by a large number of read and write operations with plenty of metadata updates. It is also interesting to note that some storage systems are now able to complete this operation during the ingestion phase thanks to the implementation of serverless computing mechanisms, accelerating the process while simplifying the infrastructure.

• Analysis and model selection. This part of the process is all about finding the right algorithms and data models for the training phase. The data scientist analyzes the data to find the training model that suits it best and repeats this step as many times as necessary on a small subset of the data. The storage system has to be fast to return results quickly, allowing several models to be compared before proceeding. With recent advancements in this field, new solutions are surfacing and AutoML products are becoming more common, helping end users with this task (for reference, see Andrew Brust’s GigaOm Market Landscape Report, AI Within Reach: AutoML Platforms for the Enterprise).

• Neural network training. This is the most compute and storage-intensive part of the entire workflow. It is where the training data set is passed to the selected algorithms to train the model.

• Production and evaluation. This last part is where the data scientist actually sees the results of the ML model. Storage is not accessed anymore but the data has to be preserved in case it is necessary to reassess and improve the model.
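The data-preparation step described above, stripping anomalies and then tagging the surviving records so they are searchable and reusable, can be sketched in a few lines. The field names, sentinel values, and valid range here are hypothetical placeholders, not part of any specific product:

```python
# Minimal sketch of a data-preparation pass: drop out-of-range and
# missing readings, then tag clean records with simple metadata.
# All record fields and thresholds are illustrative assumptions.

RAW_RECORDS = [
    {"sensor_id": "a1", "value": 21.4},
    {"sensor_id": "a1", "value": -999.0},  # out-of-range sentinel
    {"sensor_id": "a2", "value": None},    # missing reading
    {"sensor_id": "a2", "value": 19.8},
]

def prepare(records, lo=-50.0, hi=50.0):
    """Drop anomalies; attach tags that make records searchable."""
    clean = []
    for rec in records:
        v = rec["value"]
        if v is None or not (lo <= v <= hi):
            continue  # anomaly: discard it
        tagged = dict(rec, tags=["validated", f"sensor:{rec['sensor_id']}"])
        clean.append(tagged)
    return clean

print(prepare(RAW_RECORDS))  # only the two in-range records survive
```

As the text notes, this pass is read- and write-heavy with many metadata updates, which is why some storage systems now run it at ingestion time via serverless hooks.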

Figure 2. Typical ML/AI Workflow model

A storage infrastructure designed for ML/AI workloads has to provide performance, capacity, and cost-effectiveness. With this in mind, we can divide the market into two categories:

1. Two-Tier Infrastructure. The first tier is a fast scale-out file system for all active data and front-end operations. This is backed up by a second-tier object store for capacity. This flexible solution allows end-users to build an infrastructure that can be installed on-premises, in the cloud, or in a hybrid fashion (any combination of the two). This flexibility translates into cost savings but sacrifices some efficiency because of the backend data movements necessary to get the right data available where and when it is needed.

2. Single-System Architecture. A single system is much more efficient and provides top-notch performance by hiding data movements internally or making them unnecessary. This simplifies infrastructure and operations, contributing to a reduction in TCO (total cost of ownership). On the flip side, these systems are more difficult and expensive to implement in the cloud, especially at scale, limiting them to on-premises or small cloud installations.
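The backend data movement that a two-tier design implies can be sketched as a simple tiering policy: recently accessed objects stay on the fast file tier, cold objects are demoted to the S3 capacity tier. The threshold and the `demote` hook below are illustrative assumptions; a real system would replace the stub with a call to its object store's API.

```python
# Sketch of a two-tier demotion policy. The 14-day threshold and the
# demote() stub are hypothetical; a production system would copy the
# object to an S3-compatible capacity tier inside demote().

from dataclasses import dataclass

HOT_DAYS = 14  # keep data on the performance tier this long after last access

@dataclass
class StoredObject:
    key: str
    days_since_access: int
    tier: str = "performance"

def demote(obj: StoredObject) -> None:
    # Placeholder for the real copy-to-object-store operation.
    obj.tier = "capacity"

def apply_policy(objects):
    """Demote cold objects; return the keys that were moved."""
    moved = []
    for obj in objects:
        if obj.tier == "performance" and obj.days_since_access > HOT_DAYS:
            demote(obj)
            moved.append(obj.key)
    return moved
```

The latency cost mentioned above comes from the reverse path: promoting a demoted object back to the performance tier before a training run can use it.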

Major cloud providers offer managed services based on similar architecture designs, allowing the end-user to simplify resource provisioning and management. Unfortunately, this also creates vendor lock-in for large-scale projects and can become expensive.

3. Maturity of Categories

Enterprises adopt an architecture depending on the following factors:

• Number of current and future projects

• Size of the projects

• Size of the organization

• Performance needs

• Cost

If the organization’s strategy is to make a major investment in ML and AI, then it is highly likely that it will adopt an on-premises solution alongside the other resources to run it properly. This is necessary to maintain control over the entire process while containing costs.

Other evaluation criteria include security and maintaining the validity and the source of the original data. In fact, if the data sets include sensitive information and already belong to the company, moving or replicating them to external repositories adds complexity to data management, monitoring, and auditing processes. Data governance and stewardship are particularly important in highly regulated industries and for organizations that handle personally identifiable information (PII). For example, making copies of PII without the right tools to mask or remove sensitive information could compromise General Data Protection Regulation (GDPR) compliance.

One of the advantages of deploying a cloud-only or hybrid infrastructure is leveraging the tools made available by cloud providers. In fact, most cloud providers offer optimized VM instances with GPUs or other types of coprocessors designed for AI workloads, and they have also developed specific services, platforms, and frameworks to simplify application development. The trade-off is, as usual, the risk of vendor lock-in.

For on-premises installations, NVMe-based flash is becoming the most common option for the backend foundation of the access tier, usually presented through a high-performance scale-out file system, while hard disk drives are still the most cost-effective option for the capacity tier. It is interesting to note that some object stores are now focusing on high-performance workloads, and the S3 interface is becoming common among big data analytics applications and ML/AI frameworks.

Two-Tier Infrastructure

Two-tier infrastructure (Figure 3), whether deployed entirely on-premises, in the cloud, or in a hybrid fashion, allows an organization to optimize the infrastructure investment and take advantage of the flexibility provided by the cloud. The separation of performance from capacity optimizes cost but it introduces

additional complexity for system management, and retrieving data from the capacity tier always comes at a cost in terms of added latency. For this reason, several object storage vendors have been focusing their efforts on improving the performance of their products, in terms of higher throughputs and object operations per second.

Primary data storage vendors do offer the entire end-to-end solution from their product catalogs. This need not be considered lock-in, since the protocol usually preferred to move data across tiers is S3. Therefore, the capacity tier can be built out of commodity servers with hard disk drives (HDDs) and a software-defined storage (SDS) solution that exposes the S3 interface, or a public service with the right balance of performance and cost. At the same time, the front-end cluster can be built with physical servers or appliances for on-prem installations, or replaced by VMs equipped with flash storage.

This type of architecture is leveraged to simplify and improve data mobility. The performance tier, usually a scale-out parallel file system, can be quickly deployed on-premises or in the cloud, close to compute resources, for the time necessary to complete a task, while all data can be kept on an object store installed on-premises for security and compliance reasons. In general terms, two-tier architectures are chosen for large-scale projects in which data has to be preserved for long periods of time.

Figure 3: Two-tier infrastructure logical scheme

Single-System Architecture

Single-system architectures do not rely on a separate capacity tier because performance and capacity are integrated. They expose file protocols (usually NFS) or an object storage interface (S3) and, depending on the product architecture, the system could implement an internal tiering mechanism or take advantage of other techniques to provide both performance and capacity.

In some cases, the performance tier of the previous category could operate as a single system, allowing the end-user to start relatively small and expand over time to a two-tier architecture when capacity or other requirements arise. In any case, the size of this type of storage system is always considerable, ranging from a few hundred terabytes to multi-petabyte configurations.

Single-system architectures are easier to expand and manage. Overall system efficiency is another advantage, requiring less tuning and making it easier to get the best out of the system in terms of performance. The downside remains the difficulty of building hybrid cloud configurations. In fact, even though some systems can be replicated to the cloud using virtual appliance instances, a two-tier system can take better advantage of cloud resources by spanning the infrastructure between on-premises and cloud using native services.

4. Considerations for Selecting ML/AI Storage

In addition to considering organization size, regulations, security, and the speed of development, it is important to think about budget, data growth over time, and the increasing number of projects. Below are some questions to consider:

1. What is the size of the data sets involved in the ML/AI project?

2. Is there a plan to manage future data growth? Is it necessary to keep data for a long period of time once the selected model is trained?

3. How large is the organization in charge of these projects?

4. How will compute power for this project be purchased?

5. Is the infrastructure needed for a single project or will it be reused for future projects?

6. What are the security and regulatory requirements for the data collected for these projects?

7. What are the budget constraints for this type of project?

Cloud computing is considered a good solution for many projects, and major cloud providers offer pre-packaged tools, the right compute resources, and sufficient storage services. However, both large and small organizations, when developing their ML strategy, often adopt a totally different approach. When properly designed, a storage infrastructure built on-premises is less expensive at scale for data that has to be kept and reused over time. Compute resources can be a permanent part of this infrastructure as well, but some organizations take advantage of inexpensive compute power available from cloud providers to reduce both the time to get results and CAPEX (capital expenditure).

Small projects usually start with single-system architectures, which can evolve into two-tier architectures as data grows, with the second tier dedicated to storing data on a long-term basis (archiving). When designing, it is important to look at where compute resources will be purchased. If most of the compute will reside in the cloud, the storage solution must be flexible enough to move data quickly to where it is needed. The cost of renting CPU and GPU resources in the cloud for a long time can become high and unpredictable. For this reason, organizations that invest heavily in ML/AI will probably find they are more in control of their budget with an on-premises single-system storage architecture.
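The cost argument above can be made concrete with a simple break-even calculation: an on-premises system trades a large up-front CAPEX for lower monthly costs, so there is a point in time after which it becomes cheaper than renting cloud capacity. Every figure below is a hypothetical placeholder, not a quote from any provider or vendor.

```python
# Illustrative CAPEX-vs-OPEX break-even: after how many months does an
# on-premises storage investment undercut equivalent cloud spending?
# All prices are invented for the sake of the example.

def months_to_break_even(onprem_capex: float,
                         onprem_monthly_opex: float,
                         cloud_monthly_cost: float) -> float:
    """Months after which cumulative on-prem cost drops below cloud cost."""
    saving_per_month = cloud_monthly_cost - onprem_monthly_opex
    if saving_per_month <= 0:
        return float("inf")  # on-prem never catches up
    return onprem_capex / saving_per_month

# Hypothetical multi-PB project: $600k up front plus $5k/month on-prem,
# versus $30k/month of cloud storage and egress charges.
print(months_to_break_even(600_000, 5_000, 30_000))  # 24.0 months
```

Under these assumed numbers the on-premises system pays for itself in two years, which is consistent with the report's observation that long-lived, large-capacity projects tend to end up on-premises.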

Regulations and data security will become increasingly important for projects that leverage sensitive information across the organization. Data management and governance are key when data is distributed to multiple locations. The advantages of multi-tier architecture are diminished if it is harder to maintain control of data location, access, and security. In fact, there are few tools that allow users to maintain tight control over data that is distributed across multiple locations and accessed through different protocols; as a result, this process must be done manually and is more error-prone.

5. Vendors to Watch

The number of products optimized for ML/AI workloads is growing quickly, with vendors proposing single-system or two-tier architectures.

1. Performance tier: Dell EMC Isilon, DataDirect Networks (DDN) A3I, Qumulo, Quobyte, WekaIO, Pure Storage FlashBlade, VAST Data, and IBM Spectrum Scale, among others. All of these solutions could be considered for single-system architectures as well. It is also interesting to note that some object storage vendors have optimized their products to offer high performance through an S3 interface while combining NVMe devices with HDDs, as in the case of MinIO or NetApp StorageGRID.

2. Capacity tier: Scality RING, SwiftStack, MinIO, NetApp StorageGRID, Western Digital ActiveScale, Dell EMC Elastic Cloud Storage (ECS), IBM COS, OpenIO, and other S3-compatible systems enter the game as the secondary tier when the amount of data is considerably high, as in large multi-petabyte (PB) projects that also require long-term data retention and better $/GB figures.

Some vendors partner together to offer certified two-tier architectures, while others propose converged infrastructures that integrate storage and compute resources. In this section, we briefly describe some of the solutions in the market. The list of vendors is not exhaustive but provides ample visibility on what is happening in the market and the types of solutions that can be found, the basic characteristics of each one, and their differentiators.

Dell EMC Isilon and ECS

Dell EMC Isilon is a mature scale-out storage system for unstructured data designed around the robust OneFS file system, which is usually accessed through network file protocols like NFS and SMB. Available in all-flash, hybrid, and large HDD configurations, Dell EMC Isilon can serve several types of workloads including big data analytics, commercial HPC, genome sequencing, and ML/AI applications. The Isilon platform is available as pre-configured appliances, both hybrid and all-flash, and for the major clouds to improve data mobility between on-premises and cloud infrastructures.

Dell EMC Isilon offers several tools for policy-based automated tiering and cloud-archiving based on the S3 protocol, making it possible to take advantage of public or private cloud object-stores, like Dell EMC ECS, and combine the high-performance front-end with a cost-effective, high-capacity backend.

Dell EMC ECS provides an end-to-end solution from a single vendor, simplifying management, support, and the procurement process. In fact, Dell EMC ECS is a modern object-store available either as a pre-configured appliance or in a software-defined model. Dell EMC has focused ECS development on some key areas that include security and compliance, multi-tenancy, and multi-site/region deployments, offering a compelling solution to enterprise customers that are looking to implement large and secure distributed storage infrastructures.

A major advantage of Dell Technologies comes from its ability to offer turnkey full-stack solutions that

also include computing resources, software, and professional services. In this context, it is interesting to note that Dell Technologies is actively working with its product teams and partners to bring easy-to-use and efficient solutions. For example, when it comes to ML/AI, new features that will soon be available in VMware vSphere will allow sharing GPUs among VMs across the entire vSphere cluster, enabling end-users to maximize utilization of these expensive resources and allocate them to different applications at different times of the day (e.g. VDI during working hours, and machine learning or HPC at night).

Qumulo

Qumulo File System is a data-aware, scale-out file system solution for on-premises, cloud-only, and hybrid file storage environments, characterized by a modern and innovative cloud-native design built on a file system foundation. The product is available both as a software-only solution, to be installed on commodity x86 servers, and pre-installed on servers from primary hardware vendors. Furthermore, Qumulo FS is available on major public clouds (including Amazon AWS), enabling the creation of hybrid infrastructures and data movement across different environments.

Qumulo File System overcomes most of the limitations usually found in traditional scale-out solutions, with distributed metadata handling, native support for erasure coding, and other features usually found in enterprise NAS solutions (like efficient snapshots, multi-protocol support, and advanced remote replication). The platform is easy to deploy and manage; Qumulo offers real-time analytics with insightful dashboards showing performance metrics, system utilization, user activity, auditing, and chargeback. From this point of view, the solution has a smaller initial impact on the enterprise systems administrator than other scale-out file storage solutions, resembling an enterprise NAS and making its adoption easier. Data volumes can be concurrently exposed via NFS, SMB, FTP, and S3 protocols, while APIs are available for deeper application integration. Qumulo’s strength lies in an internal architecture design that results in high performance, efficiency, simplicity of management, and scalability, with most customers reporting a quick ROI and overall good TCO.

WekaIO and Scality

These two vendors worked together to propose an end-to-end solution that brings very high performance associated with large capacities in hybrid cloud environments at reasonable prices.

WekaIO’s file system, WekaFS™, is a scale-out parallel file system focused on high-performance workloads. It is available for on-premises installations as well as on the Amazon AWS marketplace. Scality Zenko and RING provide the object storage technology to build a backend to store long-term data, with the ability to distribute data between on-prem and cloud locations according to user-defined policies. Each vendor excels in its field, and the combination of the two products forms a multi-petabyte infrastructure for big data and ML/AI workloads that shows a good combination of performance and cost.

WekaFS is usually exposed to remote clients through a dedicated client for optimal performance and other options such as NFS or SMB are available as well. It is designed to take advantage of NVMe

devices installed in the storage nodes, with an extremely optimized data path and metadata handling to optimize a large number of workloads regardless of the file size. Distributed erasure coding-like data protection, fast failure detection, and quick rebuilding techniques enable WekaFS to keep data safe and performance at the highest levels even when cluster failures occur.

Scality RING is a market leader in large multi-PB storage infrastructures with installations in many industry verticals such as media & entertainment, healthcare, finance, manufacturing, ISPs, public administration, and so on. Scality RING8, the latest version of Scality’s scalable file & object storage software, introduced several new features aimed at improving security, multi-region, and multi-tenancy. This makes the product even more interesting for organizations that operate in highly regulated environments and want to develop ML/AI projects alongside other initiatives, or take advantage of the object stores for storing data coming from different sources and applications.

Other Noteworthy Solutions

DataDirect Networks (DDN)

DDN is a leader in HPC storage and ML/AI academic research. It has been expanding its product line to cover enterprise use cases for some time now and has recently launched its A3I family of products.

The A3I systems are preconfigured converged infrastructures that are easy to deploy and manage. They are based on NVMe-based parallel file system storage products and Nvidia DGX-1, DGX-2, or HPE Apollo 6500 servers. Already tested and optimized for the leading ML frameworks, A3I systems can start with small configurations and expand in the field, depending on user needs.

IBM

Thanks to its Spectrum Scale file system and COS object-store, IBM has a two-tier solution aimed at serving big data, HPC, and ML workloads. The former is a robust scale-out parallel file system already in use in several large installations around the globe, while the latter is an S3-compatible object storage platform capable of exascale deployments, with strong security features and a low $/GB thanks to its advanced distributed erasure coding algorithm.

It is a flexible solution and the user can choose between several options ranging from a complete on- premises installation with perpetual licenses to hybrid and IBM public cloud services.

MinIO

MinIO is an open-source object-store optimized for big data analytics, machine learning, and other workloads that require high throughput and fast access to objects at scale. Thanks to its design, and depending on the use case, MinIO can be deployed as a single-system solution or in a two-tier architecture.

It can be installed on all-flash, hybrid, and all-disk nodes. It can also be distributed in containers and VMs for on-premises and cloud installations. The growing solution ecosystem is backed by a strong open source community and some elegant integration capabilities with leading analytics tools like Apache Spark.

NetApp

NetApp has been working on integrating its storage systems following its Data Fabric vision for quite some time now. Its AFF and SolidFire all-flash systems can tier data to external S3-based object stores (like NetApp StorageGRID), and they have the performance capacity to sustain high-throughput workloads.

OpenIO

OpenIO is an object storage startup that also addresses big data analytics and ML workloads, adding integrations with several leading analytics tools in this area (Apache Spark, iRODS, HDFS/Hadoop). In a recent benchmark, the company has shown very high throughputs on a large production cluster of commodity x86 servers while, at the same time, proving excellent load balancing and linear scalability.

OpenIO is designed to be lightweight while providing strong consistency. The dynamic load balancing mechanism enables disparate hardware to be mixed in the same cluster, allowing the system administrator to match storage infrastructure to business needs.

Pavilion Data Systems

Pavilion Data Systems is an emerging startup with an innovative NVMe-oF array that combines the benefits and performance of direct-attached (DAS) NVMe-based flash memory with the flexibility, availability, and data services of storage area networks.

The Pavilion Hyperparallel Flash Array enables better performance and fast resource provisioning without the inefficiencies of local SSDs in terms of availability and redundancy, unused capacity, and failure management. It provides better overall performance and TCO when compared to local SSDs, making it the ideal companion to scale-out parallel file systems like IBM® Spectrum Scale™, regardless of the workload.

Pure Storage AIRI

Pure Storage’s AI Ready Infrastructure (AIRI) is a converged system jointly architected with NVIDIA, based on Pure Storage FlashBlade, a high-performance all-flash NAS system, and NVIDIA DGX scale-out GPU servers. This hardware and software integrated single-system solution delivers high performance in a compact, easy-to-install, and manageable package that is linearly scalable to 64 racks of DGX servers.

AIRI supports leading ML/AI frameworks and can be seamlessly integrated into an entire AI pipeline with Kubernetes orchestration. In addition, it can be configured to start small and grow with the business needs of an organization. The Pure Storage Pure-aaS subscription program (ES2) provides continuous upgrades to the FlashBlade system, allowing organizations to move seamlessly to an OPEX model for their infrastructure investments.

Quobyte

Quobyte is a parallel high-performance file system available for major Linux distributions. Quobyte offers many protocols, including a plug-in for TensorFlow (a common framework used in AI projects) which enables faster and more efficient access to data.

The file system is easy to install and manage, enabling multi-tier configurations of any scale and for all file sizes. Deployed in small and large multi-petabyte configurations, it is available both as a perpetual license or as a subscription on a capacity basis.

VAST Data

VAST Data recently presented its innovative scale-out Universal Storage platform. It is based on the latest QLC 3D NAND SSDs, Intel Optane memory, and advanced data footprint reduction techniques to lower costs while maintaining high performance and system reliability.

VAST Data achieves all of this on a single-system architecture without tiers, exposing data volumes via NFS (with extensions for RDMA) and S3 protocols. The system can scale capacity and performance separately and is targeted at workloads such as big data, genomic sequencing, media rendering, and machine learning.

6. Near-Term Outlook

ML/AI data storage solutions are new to the enterprise market but they are maturing very quickly. Most enterprises are at the beginning of their AI journey and are looking for solutions that are easy-to-use, scalable, and do not have lock-in risks. In many cases, the primary goal is to find solutions that fit with an existing hybrid/multi-cloud strategy:

• Data mobility will become more and more important in the future. With the increasing importance of hybrid cloud in an organization’s strategy, the ability to move data across on-premises and different clouds will yield better ML/AI results faster.

• Single-system architectures are easier to implement and manage and are the common choice for many enterprise organizations. Even though many organizations begin with a limited-investment cloud project, as capacity increases they often turn to on-prem solutions to optimize costs and achieve quicker results.

• Security is another important aspect. ML/AI projects often involve sensitive information, as in facial or document recognition. As data has to be kept for several years, regulation, compliance, access control, and auditing will become features requested by many organizations.

7. Key Takeaways

Data storage for ML/AI is different from what we are accustomed to for big data infrastructures. Computational power is ever more important, with data-hungry GPUs that have to be fed properly all the time. Capacity and low-latency throughput are the metrics to consider across the entire workflow, with complexity and cost control coming immediately after.

• Many organizations choose the public cloud for the initial development of ML/AI projects, but they switch to hybrid and on-premises solutions as soon as the size of the storage infrastructure grows to the multi-petabyte level, mostly to keep costs under control.

• Single-system architectures are easier to implement and manage and are the common choice for many enterprise organizations. It is important to note that as soon as the infrastructure grows, a two-tier architecture is more cost-effective.

• Hybrid solutions are of interest to many organizations that face increasing compute challenges, including peaks and lows.

8. About Enrico Signoretti

Enrico Signoretti has 25+ years of industry experience in technical product strategy and management roles. He has advised mid-market and large enterprises across numerous industries and software companies ranging from small ISVs to large providers.

Enrico is an internationally renowned visionary author, blogger, and speaker on the topic of data storage. He has tracked the changes in the storage industry as a GigaOm Research Analyst, independent analyst, and contributor to The Register.

9. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

10. Copyright

© Knowingly, Inc. 2019. "Data Storage Architectures for Machine Learning and Artificial Intelligence" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact [email protected].
