Introduction

Recent Wikibon research defined Software-led Infrastructure. One component of that vision is Software-led Storage, as presented in Figure 1.

Figure 1 – Software-led Storage, a component of Software-led Infrastructure
Source: Wikibon 2013 (updated January 2013)


The current storage topology is shown in the left-hand column of Figure 1. The application requests IO using either block or network file protocols. Each storage array is the source of software services, including replication, snapshots, clones, encryption, and data reduction. Each of these services then has to be managed separately across the infrastructure. Many, such as de-duplication, compression and encryption, need to be undone and redone (data rehydrated and dehydrated) when data is accessed or moved.
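As a toy illustration of why de-duplicated data must be rehydrated on access or movement, consider this minimal content-addressed store. The class and method names are hypothetical, chosen for illustration; no vendor's implementation works exactly this way:

```python
import hashlib

class DedupStore:
    """Toy content-addressed de-duplication store (illustrative only)."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes; each unique chunk stored once

    def dehydrate(self, data, chunk_size=4):
        """Split data into fixed-size chunks, store each unique chunk once,
        and return the 'recipe' (ordered list of chunk hashes)."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # duplicates are not stored again
            recipe.append(h)
        return recipe

    def rehydrate(self, recipe):
        """Reassemble the original bytes from the recipe; this work must be
        redone every time the de-duplicated data is read or moved."""
        return b"".join(self.chunks[h] for h in recipe)

store = DedupStore()
recipe = store.dehydrate(b"ABCDABCDABCD")        # three identical 4-byte chunks
assert len(store.chunks) == 1                     # stored only once
assert store.rehydrate(recipe) == b"ABCDABCDABCD"
```

When each array runs its own such service, data crossing array boundaries must be rehydrated by one system and dehydrated again by the next, which is the overhead the paragraph above describes.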

Constraints Of Current Storage Systems

The major constraints of current larger-scale storage infrastructures shown in Figure 1 affect equipment cost, IT productivity, system availability and system performance. They stem from:

  • Multiple Storage Array Platforms and Arrays
    • Multiple tiers and classes of storage (One Service & One SLA does not fit all)
    • Management and Services defined at the box level (within each array/array type)
    • Multiple expertise areas, and difficulty of promulgating storage information and storage problems to other parts of the system
    • Difficulty of automating, and reducing to a service, all aspects of storage administration
  • Slow Access Speed and Throughput of Disk Drives (a maximum of ~200 IOPS per hard disk drive)
    • Expensive technologies designed to mitigate the impact of slow disks (e.g., tiering and caching)
    • High skill levels and expensive technologies to avoid “hot-spots” of individual Drives
    • Application design constrained to accommodate disk performance characteristics
    • Bloated IO Protocols designed for Hard Disk Drive response times
    • Requirement to longitudinally stripe data across multiple disks; this is good for sequential reading and writing of data, but not optimal for random data access
  • Difficulty of migrating current management of data on DAS and SAN to more flexible topologies
    • Very difficult to integrate a distributed persistent flash storage topology at multiple levels of system components and multiple levels of data storage
    • Impossible to create end-to-end management systems that integrate distributed storage from server to SAN to archive to cloud to Iron Mountain
  • Difficulty in managing the volume, growth and complexity of unstructured data
    • Scale-out consolidation of NAS
    • Emerging ‘cloud’ object tier for archive and get/put operations
  • A fragmented and often fragile Data backup process
    • De-duplication technologies with different solutions at different data storage elements, which force data rehydration
  • Difficulty of exploiting NAND flash technologies at the optimal place in hierarchy
    • Storage arrays swamped by NAND flash drives
    • Fitting flash as a disk into the current array or DAS infrastructure makes it easy to implement initially, but does not allow the real value of flash to be realized
    • Flash-caching and tiering mechanisms are complex and just an extension of today's architectures; they only help well-mannered data environments with random reads and small working sets
    • File systems and storage management systems cannot exploit persistent storage at different places in the infrastructure (no data locality)
  • No metadata framework for exploiting the state data of storage and other components, and no real-time extended metadata framework for understanding what data could help with a business problem and where it is located

Software-Led Storage Framework

A framework for Software-led Storage, together with services from unified metadata about system, application, and storage, will allow much more flexible and manageable storage topologies to be created and maintained. Wikibon has consistently forecast that as the relative cost of flash continues to improve against HDDs, and the performance of flash blows past HDD performance for sequential data, all active storage in the larger-sized data centers will be held on Flash, as well as the metadata for all data.

The second column in Figure 1 shows improved native flash protocols for addressing flash. These protocols have emerged from companies such as Fusion-io and Virident and will be further refined. These allow consistent IO response time to go from the low milliseconds of current SSDs to sub-100 microseconds for PCIe attached storage. Improved PCIe functionality and erasure coding techniques across multiple processors will allow shared storage with no single point of failure at the server level.
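The simplest erasure code is single XOR parity (the RAID-5 scheme); the sketch below shows why losing any one server's block is survivable. Real distributed erasure coding of the kind described above uses stronger codes (e.g., Reed-Solomon) that tolerate multiple failures, so this is an assumption-laden minimal sketch, not a description of any product:

```python
from functools import reduce

def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of N equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """Recover a single lost block: XOR of the survivors plus the parity
    cancels everything except the missing block."""
    return xor_parity(surviving_blocks + [parity])

# Data blocks held on three servers, parity stored on a fourth:
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)

# Server holding data[1] fails; its block is recomputed from the rest.
assert rebuild([data[0], data[2]], parity) == data[1]
```

Spreading such parity (or more general erasure-coded fragments) across multiple servers over PCIe-attached flash is what removes the single point of failure at the server level.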

Column three in Figure 1 shows metadata systems becoming available as system-wide services. This improvement will lead to much improved file systems and database systems that are aware of the location of data. Applications, file systems and database systems will be able to update and utilize metadata by using system-wide metadata services.
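A system-wide metadata service of the kind described could be as simple, conceptually, as a shared registry that any application, file system or database can consult to find where data lives. The interface below is entirely hypothetical, a sketch of the idea rather than a real API:

```python
class MetadataService:
    """Hypothetical system-wide metadata service: maps data objects to
    their current location (node and storage tier) so callers can
    exploit data locality."""

    def __init__(self):
        self._locations = {}  # object id -> (node, tier)

    def record(self, obj_id, node, tier):
        """Register or update where an object currently resides."""
        self._locations[obj_id] = (node, tier)

    def locate(self, obj_id):
        """Return (node, tier) for an object, or None if unknown,
        so the caller can schedule work close to the data."""
        return self._locations.get(obj_id)

svc = MetadataService()
svc.record("table:orders/part-0001", node="server-07", tier="flash")
assert svc.locate("table:orders/part-0001") == ("server-07", "flash")
```

The point is that the mapping lives outside any single array or DAS box, so every layer of the stack can make placement and scheduling decisions from the same information.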

Column three also shows the storage services available to applications, file systems and databases that will allow end-to-end encryption and security, end-to-end de-duplication and compression, and system-wide management of data snapshots and clones. The storage management services will allow migration of data between active data on flash and passive data on magnetic media. The magnetic data could be on disk or tape.
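The flash-to-magnetic migration described above reduces, at its core, to a placement policy driven by data activity. The sketch below uses an assumed 30-day activity window and invented tier names purely for illustration; real policies would weigh access frequency, cost and SLAs:

```python
import time

FLASH, MAGNETIC = "flash", "magnetic"
ACTIVE_WINDOW = 30 * 24 * 3600  # assumed policy: data touched within 30 days is "active"

def choose_tier(last_access_ts, now=None):
    """Place recently accessed (active) data on flash and cold (passive)
    data on magnetic media, which could be disk or tape."""
    now = now if now is not None else time.time()
    return FLASH if now - last_access_ts < ACTIVE_WINDOW else MAGNETIC

now = time.time()
assert choose_tier(now - 3600, now) == FLASH                 # touched an hour ago
assert choose_tier(now - 90 * 24 * 3600, now) == MAGNETIC    # cold for 90 days
```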

The announcement of the LTO 6 specification in August 2012, and of the first LTO 6 support by IBM in October 2012, together with system-wide metadata on flash, could allow FLAPE (flash/tape) systems to evolve as very low-cost, low-energy storage systems.

Benefits Of Software-Led Storage

The most important areas of improvement will come from:

  1. The improvement in the ability to manage resources and SLAs from a system perspective;
    • The movement of system and metadata services from the storage array and DAS to system-wide services will allow optimization of storage and processing resources, and better data densities from the use of data reduction techniques.
  2. The ability to deploy existing applications, update existing applications, and write new applications with orders of magnitude more information available in real time.
    • This cannot be done in today's systems because of the constraints on IO performance of current storage systems. This leads to sequential data-flows and workflows across applications, where data flows from one system to another in complex and difficult-to-change ways.
    • Software-led Storage will allow 21st Century applications internally developed and from ISVs to:
      • Allow significant application simplicity to workflow and data-flow by application consolidation
      • Allow significant improvements in productivity from internal, partner and customer users because of the greater amount of information that can be processed in support of the users
      • Allow analytical systems to provide feedback in real-time to operational systems, and allow direct changes to be made to the operational systems in real-time. Real-time is defined as the time to lose a customer, supplier or partner.
  3. Improved agility, speed to deploy and speed to change;
    • This will be the first time that storage technologies contribute to improved end-user productivity rather than just a reduction in IT costs. Wikibon believes that this profound change will result in increased demand for IT as existing models of business are improved and completely new models of business are developed. This increased demand will lead to an increase in the percentage of revenue spent on IT.

Migration To Software-Led Storage

The migration to Software-led Infrastructure, and within that to Software-led Storage, has started, and the journey will continue over the next five to ten years. It is making its appearance in the IT giants of Google, Facebook, Federal Systems, Microsoft, Yahoo and Apple. It will initially be adopted by large service providers and by organizations that can take most advantage of increased access to data, the Negroponte bit-oriented organizations.