US007634497B2

(12) United States Patent    Passerini et al.
(10) Patent No.: US 7,634,497 B2
(45) Date of Patent: Dec. 15, 2009

(54) TECHNIQUE FOR IMPROVING SCALABILITY AND PORTABILITY OF A STORAGE MANAGEMENT SYSTEM

(75) Inventors: Ronald Peter Passerini, Somerville, MA (US); Robert Warren Perry, Leominster, MA (US); Christopher Angelo Rocca, Burlington, MA (US); Michael Daniel Anthony, Wilmington, MA (US)

(73) Assignee: Symantec Corporation, Cupertino, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 290 days.

(21) Appl. No.: 11/549,416

(22) Filed: Oct. 13, 2006

(65) Prior Publication Data: US 2007/0088768 A1, Apr. 19, 2007

Related U.S. Application Data

(60) Provisional application No. 60/726,186, filed on Oct. 14, 2005, provisional application No. 60/726,187, filed on Oct. 14, 2005, provisional application No. 60/726,192, filed on Oct. 14, 2005, and provisional application No. 60/726,193, filed on Oct. 14, 2005.

(51) Int. Cl.: G06F 7/00 (2006.01)

(52) U.S. Cl.: 707/102; 707/200; 711/162

(58) Field of Classification Search: 707/1, 3, 100, 102, 200; 711/149, 161, 162; 714/5, 42, 54, 769, 805. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,974,409 A * 10/1999 Sanu et al. ................ 707/3
2005/0066222 A1 3/2005 Rowan et al.
2005/0066225 A1* 3/2005 Rowan et al. ............. 714/5
2005/0193031 A1 9/2005 Midgley et al.

OTHER PUBLICATIONS
International Search Report mailed Mar. 6, 2008.
* cited by examiner

Primary Examiner: Fred I. Ehichioya
(74) Attorney, Agent, or Firm: Hunton & Williams LLP

(57) ABSTRACT

A technique for improving scalability and portability of a storage management system is disclosed. In one particular exemplary embodiment, the technique may be realized as a storage management system operatively coupled to a storage system. The storage management system may comprise a plurality of processor modules, wherein each processor module is capable of intercepting write commands directed to the storage system, backing up data associated with the write commands, and generating metadata having timestamps for the backup data. The storage management system may also comprise one or more indexing modules that create one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules are in communication with the processor modules and the storage system.

13 Claims, 9 Drawing Sheets

[Drawing Sheets 1-9: FIGS. 1-9. Legible sheet labels include "Current Store," "Time Store," "Storage Management System 106," and "Storage System 104" (FIG. 1); "TSD-Indexing API" (FIG. 6); "GigE Interface 2" (FIG. 7); and "Agents and Proxies 804" (FIG. 8).]

US 7,634,497 B2

TECHNIQUE FOR IMPROVING SCALABILITY AND PORTABILITY OF A STORAGE MANAGEMENT SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims priority to U.S. Provisional Patent Application Nos. 60/726,186, 60/726,187, 60/726,192, and 60/726,193, all of which were filed on Oct. 14, 2005. Each of these provisional applications is hereby incorporated by reference herein in its entirety.

This patent application is related to U.S. patent application Ser. No. 10/924,652, filed Aug. 24, 2004, which is a continuation-in-part of U.S. patent application Ser. No. 10/668,833, filed Sep. 23, 2003, each of which is hereby incorporated by reference herein in its entirety.

This patent application is also related to three co-pending patent applications, respectively entitled "Techniques for Time-Dependent Storage Management with a Portable Application Programming Interface," "Technique for Remapping Data in a Storage Management System," and "Technique for Timeline Compression in a Data Store," filed concurrently herewith, each of which is incorporated herein in its entirety.

FIELD OF THE DISCLOSURE

The present disclosure relates generally to data storage and, more particularly, to a technique for improving scalability and portability of a storage management system.

BACKGROUND OF THE DISCLOSURE

In related U.S. patent application Ser. No. 10/924,652 and U.S. patent application Ser. No. 10/668,833, a time-dependent data storage and recovery technique is disclosed. Embodiments of such a technique provide a solution for continuous data protection (CDP) wherein write commands directed to a storage system (or data store) are intercepted by a storage management system having a current store and a time store.

FIG. 1 shows an exemplary embodiment of a CDP system 100. A storage management system 106 may intercept write commands that are issued by a host 102 and directed to a storage system 104. In the storage management system 106, a current store 108 may maintain or have access to a current (or mirror) copy of the digital content of the storage system 104. A time store 110 may record information associated with each intercepted write command, such as new data in the write command's payload and/or old data to be overwritten in the current store in response to the write command. Recordation of the new or old data in response to a write command may be referred to as a copy-on-write (COW) operation, and the new or old data recorded may be referred to as COW data. The time store 110 may also record other information (i.e., metadata) associated with an intercepted write command and/or the corresponding COW operation, such as, for example, a timestamp, an original location in the current store where the old data are overwritten, and a destination location in the time store to which the COW data are copied. Each COW operation typically backs up one or more blocks of COW data, thereby creating one set of COW data and corresponding metadata. Over a period of time, multiple sets of COW data and corresponding metadata (including timestamps) may be accumulated as a collection of historical records of what have been written or overwritten in the current store 108 or the storage system 104. The content of the time store 110 may be indexed, for example, based on time and/or storage address to facilitate efficient access to the backup data.

With a current copy of the digital content of the storage system 104 in the current store 108 and the historical records in the time store 110, the storage management system 106 adds a new dimension, i.e., time, to the storage system 104. Assuming the storage management system 106 has been operatively coupled to the storage system 104 since a past time, the storage management system 106 may quickly and accurately restore any addressable content in the storage system 104 to any point in time between the past time and a present time.
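By way of illustration only, the following C++ sketch shows one possible shape of the per-write record described above, pairing one set of COW data with its metadata (a timestamp, the original location in the current store, and the destination location in the time store). The structure and function names are hypothetical and are not taken from the related applications.

    #include <cstdint>
    #include <ctime>
    #include <vector>

    // Hypothetical record pairing one set of COW data with its metadata.
    struct CowRecord {
        std::time_t timestamp;             // when the intercepted write was processed
        std::uint64_t currentStoreLba;     // original location overwritten in the current store
        std::uint64_t timeStoreLba;        // destination location of the COW data in the time store
        std::uint32_t blockCount;          // number of blocks backed up by this COW operation
        std::vector<std::uint8_t> cowData; // before image (or after image) payload
    };

    // Sketch of a copy-on-write capture for one intercepted write command.
    CowRecord captureCow(std::uint64_t lba, std::uint32_t blocks,
                         const std::vector<std::uint8_t>& oldData,
                         std::uint64_t timeStoreDestination) {
        CowRecord rec;
        rec.timestamp = std::time(nullptr);
        rec.currentStoreLba = lba;
        rec.timeStoreLba = timeStoreDestination;
        rec.blockCount = blocks;
        rec.cowData = oldData;  // old data to be overwritten, i.e., the before image
        return rec;
    }

In practice such records would be accumulated and indexed by time and storage address, as noted above.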
There are a wide variety of implementation options for the above-described CDP method. FIG. 2 shows one exemplary implementation wherein a storage management system 206 is operatively coupled to both a host 202 and a storage system 204. The storage management system 206 may or may not be positioned in a critical data path 20 between the host 202 and the storage system 204. If it is not in the critical data path 20, the storage management system 206 may be switched into a "capture mode" whenever it is desirable for it to intercept communications between the host 202 and the storage system 204. The storage management system 206 is typically implemented with one or more processor modules 208, wherein each processor module 208 performs a series of operations such as, for example, data interception, data replication, record creation, and metadata indexing.

FIG. 3 shows an exemplary implementation of a scalable storage management system 300. The storage management system 300 may comprise a plurality of processor modules 302 that are interconnected via an internal network (or backplane) 30. Each processor module 302 may comprise a central processing unit (CPU) 304 that is in communication with a target interface 306, a read-only memory (ROM) 308, a memory 310, an initiator interface 312, and an internal network interface 314. The CPU 304 may be implemented in one or more integrated circuits, and can include other "glue" logic (not shown) for interfacing with other integrated circuits, such as bus interfaces, clocks, and communications interfaces. The CPU 304 may implement software that is provided in the ROM 308 and also software in the memory 310, which software can be accessed, for example, over the internal network interface 314. The internal network interface 314 may connect the processor module 302 to the internal network 30, such that the processor module 302 may communicate with other processor modules. In one implementation, one or more sets of processor modules 302 are rack mounted within a storage management system, and the internal network 30 also connects each rack to the other racks within the storage management system. This distributed processing creates a system whose size (e.g., memory capacity, processing speed, etc.) may be scaled up or down to fit the desired capacity.

However, the above-described modularization of a storage management system is only a first step towards scalability. A number of limitations still restrict the environments in which the storage management system may be deployed. There are also platform-specific restrictions that limit the portability of the processor modules. In addition, the current architecture of the storage management system cannot take full advantage of emerging intelligent switch techniques.
In view of the foregoing, it would be desirable to provide a storage management architecture which overcomes the above-described inadequacies and shortcomings.

SUMMARY OF THE DISCLOSURE

A technique for improving scalability and portability of a storage management system is disclosed. In one particular exemplary embodiment, the technique may be realized as a storage management system operatively coupled to a storage system. The storage management system may comprise a plurality of processor modules, wherein each processor module is capable of intercepting write commands directed to the storage system, backing up data associated with the write commands, and generating metadata having timestamps for the backup data. The storage management system may also comprise one or more indexing modules that create one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules are in communication with the processor modules and the storage system.

In accordance with other aspects of this particular exemplary embodiment, the number of the plurality of processor modules may be scalable based on a desired capacity of the storage management system.

In accordance with further aspects of this particular exemplary embodiment, the plurality of processor modules may be configured with fault-tolerant redundancy.

In accordance with additional aspects of this particular exemplary embodiment, the plurality of processor modules may be coupled to the storage system via fiber connections.

In accordance with another aspect of this particular exemplary embodiment, each of the plurality of processor modules may comprise at least one target interface and at least one initiator interface.

In accordance with yet another aspect of this particular exemplary embodiment, the plurality of processor modules may be in communication with one another.

In accordance with still another aspect of this particular exemplary embodiment, the number of the one or more indexing modules in the blade farm may be scalable based on the number of processor modules supported by the blade farm.

In accordance with a further aspect of this particular exemplary embodiment, the one or more indexing modules may have access to one or more metadata storage devices in the storage system via fiber connections.

In accordance with a yet further aspect of this particular exemplary embodiment, the one or more indexing modules may communicate with the plurality of processor modules via one or more internal networks.

In accordance with a still further aspect of this particular exemplary embodiment, the one or more indexing modules may communicate with the plurality of processor modules through a plurality of agents and proxies.

In accordance with another aspect of this particular exemplary embodiment, the one or more indexing modules may further perform one or more functions selected from a group consisting of blade configuration, remap engine, global database, production restore, timeline compression, indexing database interface, metadata space management, and vendor multipathing.
In another particular exemplary embodiment, the techniques may be realized as a method for improving portability and scalability of a storage management system operatively coupled to a storage system. The method may comprise coupling a plurality of processor modules to the storage system, wherein each processor module is capable of intercepting write commands directed to the storage system, backing up data associated with the write commands, and generating metadata having timestamps for the backup data. The method may also comprise coupling a blade farm having one or more indexing modules to the storage system. The method may further comprise causing the blade farm to communicate with the plurality of processor modules via one or more internal networks, wherein the one or more indexing modules create one or more indexing tables for the backup data based on the metadata.

In yet another particular exemplary embodiment, the techniques may be realized as at least one signal embodied in at least one carrier wave for transmitting a computer program of instructions configured to be readable by at least one processor for instructing the at least one processor to execute a computer process for performing the method as recited above.

In still another particular exemplary embodiment, the techniques may be realized as at least one processor readable carrier for storing a computer program of instructions configured to be readable by at least one processor for instructing the at least one processor to execute a computer process for performing the method as recited above.

In a further particular exemplary embodiment, the techniques may be realized as a method for improving portability and scalability of a storage management system operatively coupled to a storage system. The method may comprise intercepting, at a plurality of processor modules, write commands directed to the storage system. The method may also comprise backing up data associated with the write commands. The method may further comprise generating metadata having timestamps for the backup data. The method may additionally comprise creating, at one or more indexing modules, one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules are in communication with the processor modules and the storage system.

The present disclosure will now be described in more detail with reference to exemplary embodiments thereof as shown in the accompanying drawings. While the present disclosure is described below with reference to exemplary embodiments, it should be understood that the present disclosure is not limited thereto. Those of ordinary skill in the art having access to the teachings herein will recognize additional implementations, modifications, and embodiments, as well as other fields of use, which are within the scope of the present disclosure as described herein, and with respect to which the present disclosure may be of significant utility.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.

FIG. 1 shows an exemplary embodiment of a continuous data protection system.

FIG. 2 shows an exemplary implementation for continuous data protection.

FIG. 3 shows an exemplary implementation of a scalable storage management system.

FIG. 4 shows an exemplary implementation of a storage management system in accordance with an embodiment of the present disclosure.
FIG. 5 shows a diagram illustrating an exemplary TSD blade in accordance with an embodiment of the present disclosure.

FIG. 6 shows exemplary software components in a TSD blade in accordance with embodiments of the present disclosure.

FIG. 7 shows a block diagram illustrating an exemplary indexing blade in accordance with an embodiment of the present disclosure.

FIG. 8 shows exemplary software components in an indexing blade in accordance with embodiments of the present disclosure.

FIG. 9 shows a block diagram illustrating exemplary connection handlers in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

As used herein, "backup data" refers generally to data that have been recorded and/or organized with a purpose of restoring or recovering digital content of a storage system. "Copy-on-write data" (or "COW data") refers to substantive data (e.g., new data to be written or old data to be overwritten in response to a write command) that have been recorded in a copy-on-write operation. New data to be written in response to a write command are sometimes referred to as "after image data" or "after image," while old data to be overwritten in response to a write command are sometimes referred to as "before image data" or "before image."

"Metadata" refers to informational data (e.g., timestamps) regarding the corresponding COW data in a copy-on-write operation. Typically, one copy-on-write operation causes one set of COW data and corresponding metadata to be created. Despite their correlation, COW data and corresponding metadata may be stored in separate storage devices or segments. In a time store, COW data may be organized in one or more timestamped "data chunks."

A typical "storage system" may comprise one or more storage devices which may be physical, virtual or logical devices or a combination thereof. According to one embodiment, a storage system may comprise a storage area network (SAN) having one or more datasets, wherein each dataset may comprise one or more nodes, and wherein one or more logical units (LUs) may be coupled to each node. Hereinafter, for ease of illustration, the term "storage system" may refer to an entire storage system or a portion (e.g., dataset, node or LU) thereof.

Embodiments of the present disclosure provide an improved architecture of a storage management system that is more scalable and/or more portable. In functionalities related to a time store, input/output (I/O) processing may be physically separated from indexing functions. I/O processing may be implemented with one or more I/O processing modules known as "Time Store Daemon (TSD) blades," while indexing functions may be performed by one or more indexing modules known as "indexing blades." The indexing blades may be grouped into an indexing blade farm that supports one or more sets of TSD blades.

Referring to FIG. 4, there is shown an exemplary implementation of a storage management system 400 in accordance with an embodiment of the present disclosure. The storage management system 400 may comprise one or more storage management appliances (e.g., 404 and 406) that are coupled to a storage area network (SAN) 402 via fiber connections (e.g., 41 and 42). The storage management appliance 404 may comprise one or more TSD blades 405, and the storage management appliance 406 may comprise one or more TSD blades 407. The fiber connection 41 may have at least two channels, one for an initiator mode and the other for a target mode. The same may be true with the fiber connection 42. Each TSD blade may perform time store functionalities (some locally and others remotely) to back up digital content in the SAN 402. A TSD blade may be configured as a modular component on a hardware platform similar to Revivio, Inc.'s CPS1200 Continuous Protection System, or may be embedded in an intelligent switch or some other type of hardware. It is preferable that each storage management appliance (e.g., 404 or 406) includes at least two TSD blades to achieve fault tolerance. Each storage management appliance may be coupled, for example, via connections 45 and 46 or 47 and 48, to two internal subnets, Internal Subnet 1 and Internal Subnet 2. The storage management appliances may also be coupled to a local area network (LAN).

The storage management system 400 may also comprise an indexing blade farm 408 that is coupled to the SAN 402 via fiber connections 43 and 44. The indexing blade farm 408 may comprise a plurality of indexing blades 409 which may be in communication with metadata storage devices in the SAN 402 via the fiber connections 43 and 44. The indexing blades 409 may also be in communication with each of the storage management appliances 404 and 406 via redundant connections 49 and 50 through Internal Subnet 1 and Internal Subnet 2. With the indexing functionality physically separated from the TSD blades, the indexing blade farm 408 may support multiple storage management appliances and accommodate the scaling of the storage management appliances and/or the SAN 402. The capacity of the storage management appliances may be scaled up or down by adding or removing TSD blades and/or by increasing or decreasing the number of the storage management appliances. The capacity of the indexing blade farm 408 may also be scaled up or down by adding or removing the indexing blades 409.
According to embodiments of the present disclosure, the indexing blade farm 408 may be a scalable, loosely coupled set of indexing blades 409 running a base set of indexing software components which support basic indexing storage and retrieval along with value-added features such as production restore, timeline compression (or timeline rollup), and tiered storage services. The indexing blades 409 typically do not participate in a workload/work-unit configuration. Instead, logical unit number (LUN) assignments may be handled dynamically. Agents and proxies may be responsible for heart-beating connections, and, if a blade (either TSD or indexing) goes away, appropriate reconfiguration may be performed with help from blade configuration managers as will be described in detail below.

Taking advantage of separate TSD and indexing blades, commits of before image table inserts may be batched to optimize performance. In configurations where the indexing operations may be running on the same blade as the I/O processing, replication of indexing batches between blades may take place to ensure that commit batching may still take place.

According to one embodiment, each indexing blade 409 in the indexing blade farm 408 may require fiber connections to the same set of metadata LUs. Metadata LUs may be used as raw devices utilizing a Metadata I/O Manager to ensure that no two indexing blades write to the same region. In another embodiment, to utilize an existing I/O interface, indexing data may be stored in Berkeley databases. Initially, a Structured Query Language (SQL) database may be used for "global" databases.
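The following C++ sketch illustrates, under stated assumptions, how a metadata I/O manager of the kind mentioned above might hand out non-overlapping regions of a shared raw metadata LU to individual indexing blades. The class and member names are hypothetical and do not describe the actual implementation.

    #include <cstdint>
    #include <map>
    #include <mutex>
    #include <optional>
    #include <string>

    // Hypothetical manager ensuring that no two indexing blades write to the
    // same region of a shared raw metadata LU.
    class MetadataIoManager {
    public:
        struct Region { std::uint64_t offset; std::uint64_t length; };

        explicit MetadataIoManager(std::uint64_t luSizeBytes) : luSize_(luSizeBytes) {}

        // Reserve the next free region of the given size for a blade; returns
        // std::nullopt if the LU is exhausted.
        std::optional<Region> reserve(const std::string& bladeId, std::uint64_t length) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (nextFree_ + length > luSize_) return std::nullopt;
            Region r{nextFree_, length};
            nextFree_ += length;
            reservations_.emplace(bladeId, r);
            return r;
        }

    private:
        std::uint64_t luSize_;
        std::uint64_t nextFree_ = 0;                       // simple bump allocator for illustration
        std::multimap<std::string, Region> reservations_;  // blade id -> reserved regions
        std::mutex mutex_;
    };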
The versatility of managing metadata LUs as an inter-blade shared set of raw devices means that file systems no longer have to be consciously managed, and indexing of data for current store LUs need not be restricted to the blades hosting them. This allows current store LUN assignments to be completely dynamic with no persistent configuration requirements, although workloads may still be relevant on the TSD blade.

A remap engine technique was disclosed in U.S. Provisional Application No. 60/726,192, filed on Oct. 14, 2005, which is hereby incorporated herein in its entirety. The remap engine technique may further enhance the blade farm versatility. One of the goals for the remap engine may be to provide a "generic" interface for performing remap reads and writes that don't require the remap engine to know or maintain state about user created objects such as time images.

Having separate TSD and indexing blades may have the additional advantage of allocating more hardware resources to indexing. As such, there may always be available CPU cycles for performing feature-related indexing blade tasks, such as, for example, timeline compression as disclosed in U.S. Provisional Application No. 60/726,187, filed on Oct. 14, 2005, which is hereby incorporated herein in its entirety. Furthermore, there may no longer be any hard restriction on the platform that the indexing components live on, leaving a system designer free to explore, for example, 64-bit architectures, blade centers, up-to-date versions of the LINUX operating system, etc.

FIG. 5 shows a block diagram illustrating an exemplary TSD blade 500 in accordance with an embodiment of the present disclosure. This block diagram only shows major components in the TSD blade 500. The TSD blade 500 may comprise a CPU 502 that is coupled to a memory 504, a target interface 506, an initiator interface 508, and a network interface 510. The memory 504 may preferably have a capacity of no less than one gigabyte (GB). The target interface 506 and the initiator interface 508 may each support a fiber channel for communication with an associated storage system or SAN. The network interface 510 may support a number of communication ports, such as, for example, at least two gigabit Ethernet (GigE) ports (e.g., GigE Port 1 and GigE Port 2) for communication with internal subnets, a management port 512, and one or more ports (not shown) for internal communications with other TSD blades.

FIG. 6 shows exemplary software components in a TSD blade in accordance with embodiments of the present disclosure. The major software components for the TSD blade may include functional modules such as, for example, Portable TSD APIs 602, TSD-Indexing API 604, Agents and Proxies 606, and Blade Configuration Manager 608.

The Portable TSD APIs 602 may support all of the external interfaces as disclosed in U.S. Provisional Application No. 60/726,193, filed on Oct. 14, 2005, which is hereby incorporated herein in its entirety. Otherwise, most of the current TSD implementation may remain intact with the exception that all interactions with the indexing layer are preferably made through the TSD-Indexing API 604. Other refinements may be made to take advantage of separate indexing blades wherever possible. For example, with indexing data in two places, indexing operations initiated by TSD as part of an I/O event chain only have to wait for the successful copy to the indexing blade instead of a database commit (disk I/O).

The TSD-Indexing API 604 may be designed and implemented to provide a clean separation of components in a storage management system. Using the TSD-Indexing API 604, it may be possible for TSD to interface with indexing services either locally (co-resident) on the platform, or remotely via a transport.
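As a sketch only, the clean separation described above could be expressed as an abstract indexing interface that TSD calls without knowing whether the implementation is co-resident or reached over a transport. The C++ interface below is hypothetical; its names, types, and signatures are assumptions and are not the actual TSD-Indexing API.

    #include <cstdint>
    #include <functional>
    #include <vector>

    // Hypothetical request/acknowledgment types for a batch of indexing operations.
    struct IndexRequest { std::uint64_t lun; std::uint64_t lba; std::uint64_t timestamp; };
    using Completion = std::function<void(bool ok)>;

    // Abstract boundary between TSD and the indexing services.  TSD codes only
    // against this interface; the concrete object may run locally (co-resident)
    // or forward the calls to an indexing blade over a transport.
    class TsdIndexingApi {
    public:
        virtual ~TsdIndexingApi() = default;
        virtual void indexBatch(const std::vector<IndexRequest>& batch, Completion done) = 0;
        virtual void commitOutstanding(Completion done) = 0;
    };

    // Local (co-resident) sketch: completes once the batch is handed to the
    // local indexing layer.
    class LocalIndexing : public TsdIndexingApi {
    public:
        void indexBatch(const std::vector<IndexRequest>&, Completion done) override { done(true); }
        void commitOutstanding(Completion done) override { done(true); }
    };

A remote implementation would marshal the same calls into messages destined for an indexing blade, as discussed for the agents and proxies below.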
The Agents and Proxies 606 may include proxies that serve as the interfaces to agents running on indexing blades. Common responsibilities of these proxies may include, for example, establishing and maintaining connections to required indexing blades, receiving input from the TSD-Indexing API 604 (converting calls to messages), and providing asynchronous (callback) messaging. According to one embodiment, one or more of the following proxies may be implemented: an Allocator Proxy, an Indexer Proxy, and a Remap Proxy. The Allocator Proxy may be responsible for communication with an Indexing Allocator Agent to allocate time store space as required. The Allocator Proxy may also include local caching of data chunks. The Indexer Proxy may be responsible for forwarding batches of indexing requests to indexer agents running on indexing blades. In a typical operation, the indexing batch may be considered complete when it has been successfully replicated to the indexing blade farm. The Remap Proxy may be responsible for forwarding requests to remap engine agent(s) running on indexing blades.

The Agents and Proxies 606 may include agents that serve requests from proxies running on indexing blades. One such agent may be an I/O Agent. The I/O Agent may listen for batched I/O requests coming from indexing blades that are currently processing production restores or timeline compression, and may forward the I/O requests to the appropriate TSD blade interface. The I/O Agent may be responsible for tracking the status of each request and responding appropriately to the indexing blade when requests complete.
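For illustration only, the proxies described above might present interfaces along the following lines. The C++ names and signatures are assumptions rather than the actual implementation; they merely restate the responsibilities listed above.

    #include <cstdint>
    #include <functional>
    #include <vector>

    using Callback = std::function<void(bool ok)>;
    struct IndexEntry { std::uint64_t lun; std::uint64_t lba; std::uint64_t timestamp; };

    // Talks to an Indexing Allocator Agent to allocate time store space as
    // required; an implementation may also cache data chunks locally.
    class AllocatorProxy {
    public:
        virtual ~AllocatorProxy() = default;
        virtual void allocateChunk(std::uint64_t lun, Callback done) = 0;
    };

    // Forwards batches of indexing requests to indexer agents on indexing
    // blades; a batch is considered complete once it has been successfully
    // replicated to the indexing blade farm.
    class IndexerProxy {
    public:
        virtual ~IndexerProxy() = default;
        virtual void forwardBatch(const std::vector<IndexEntry>& batch, Callback replicated) = 0;
    };

    // Forwards remap requests to remap engine agents running on indexing blades.
    class RemapProxy {
    public:
        virtual ~RemapProxy() = default;
        virtual void remapRead(std::uint64_t lun, std::uint64_t lba,
                               std::uint64_t atTime, Callback done) = 0;
    };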
changes as required for Timeline Compression 812 and New The TSD-Indexing API 604 may be designed and imple Production Restore 810. mented to provide a clean separation of components in a New Production Restore 810 may be based on redesigned storage management system. Using the TSD-Indexing API production restore in order to eliminate its dependency on 604, it may be possible for TSD to interface with indexing 65 region maps as well as to increase its efficiency. services either locally (co-resident) on the platform, or Berkeley databases may be used for all indexing tables. For remotely via a transport. Indexing Database Interface 814, a Berkeley database inter US 7,634,497 B2 9 10 face library may be needed to wrap the various queries that In addition, a Timeline Rollup (TLR) Agent and a Product may be required. Additionally, a Suite of debug routines may Restore (PR) Agent may also be provided in Agents and be implemented to query the indexing data. Proxies 804. Metadata Space Manager 816 may be responsible for allo According to embodiments of the present disclosure, a cating and managing storage space for metadata. 5 number of objects may be provided to facilitate communica Vendor Multipathing Software 818 may be relied upon to tions and coordination among indexing blades and TSD provide a level of high availability (HA) needed for the meta blades. Exemplary objects may include, for example, inter data storage. face handlers (IfaceHalr), a blade message header (BladeMs Agents and Proxies 804 may include subcomponents that gHdr), and connection handlers (ConnHdlr). may be quite distinct from each other, each presenting a 10 The interface handlers may be an abstract base class that distinct set of API methods and responsible for a specific set defines the methods required for inter-blade communica of tasks. Primary and common responsibilities of these sub tions. The interface handlers may work with one or two physi components may include, for example, sending and receiving cal network connections. In the case of two physical network messages to/from a peer blade, handling faults related to loss connections being present, for each send, the interface layer of communication with a peer blade, registering callbacks 15 may randomly select one of the connections to transmit on in with an interface handler for unsolicited messages, and sta order to distribute network traffic on both physical networks. tistics gathering and reporting. A loss of one physical path may be tolerated. A loss of A base class may be created with common methods and connectivity on both networks may be considered a fault data. Agents and proxies both may be derived from this base event and may be handled by calling registered fault handling class. A brief discussion of some of the main proxies and 20 callbacks. agents follows. The interface handlers may support variable sized mes An Insert Proxy and Agent Subcomponent may be provided sages preferably having a common header. Receiver threads in Agents and Proxies 804. An Insert Proxy may receive may first drain the header portion of an incoming message batches of indexing requests for various slices from TSD, before processing the remaining portion of a message. Batch buffer them in buffers for the individual slices, and send the 25 ing of multiple messages may also be Supported. batch to a peer insert agent (i.e., Insert Agent on a peer All message sends may be asynchronous and may employ indexing blade). 
An Insert Agent may receive batches of indexing requests from a peer insert proxy (i.e., Insert Proxy on a peer indexing blade) and may issue inserts into the Indexing Database Interface 814.

A Remap Proxy and Agent subcomponent may be provided in Agents and Proxies 804. A Remap Agent may receive remap requests from a peer remap proxy (i.e., Remap Proxy on a peer indexing blade) and forward the requests to the Remap Engine 806. It may also forward the Remap Engine 806 results to the peer remap proxy.

An I/O Proxy subcomponent may be provided in Agents and Proxies 804. An I/O Proxy may forward batches of I/O sequences to an I/O Agent running on a TSD blade. Timeline rollups and production restores may both require I/O operations.

A TimeStore (TS) Allocation Agent subcomponent may be provided in Agents and Proxies 804. The TS Allocation Agent may run on each indexing blade. It may receive allocation requests from TS Allocation proxies running on the TSD blades.

In addition, a Timeline Rollup (TLR) Agent and a Production Restore (PR) Agent may also be provided in Agents and Proxies 804.

According to embodiments of the present disclosure, a number of objects may be provided to facilitate communications and coordination among indexing blades and TSD blades. Exemplary objects may include, for example, interface handlers (IfaceHdlr), a blade message header (BladeMsgHdr), and connection handlers (ConnHdlr).

The interface handlers may be an abstract base class that defines the methods required for inter-blade communications. The interface handlers may work with one or two physical network connections. In the case of two physical network connections being present, for each send, the interface layer may randomly select one of the connections to transmit on in order to distribute network traffic on both physical networks. A loss of one physical path may be tolerated. A loss of connectivity on both networks may be considered a fault event and may be handled by calling registered fault handling callbacks.

The interface handlers may support variable sized messages preferably having a common header. Receiver threads may first drain the header portion of an incoming message before processing the remaining portion of a message. Batching of multiple messages may also be supported.

All message sends may be asynchronous and may employ callbacks for response processing. The interface handlers may support "fire and forget" message sends as well as messages that require a response. For the latter, timeouts may be supported that may perform callbacks with an appropriate response status. Unsolicited messages received (those that are not in response to a message sent) may be handled by invoking registered callbacks.

The class may support connection-oriented as well as connectionless interface types. Connection-oriented interface handlers may be constructed as either "servers" or "clients"; this differentiation may affect the behavior of the connect() method and little else.

Listed below are a set of exemplary public methods that may be defined or employed by the interface handlers:

sendBcast(msg)
    Only valid for connectionless interface objects. May broadcast the message to all connected blades.
sendPriv(msg, addr)
    Only valid for connectionless interface objects. May send a private msg to the given address.
send(msg)
    Only valid for connection-oriented interfaces. Send the given msg to the peer blade.
connect()
    Only valid for connection-oriented interfaces. Synchronous call that may not return until a connection to the peer blade has been established.
registerCB(msgId, callback)
    Registers a callback for a specific msgId. Multiple callbacks may be registered for the same msgId, in which case each registered callback may be called.
getStats()
    Gets IfaceStats.
resetStats()
    Resets IfaceStats.

Initially, a connectionless interface using User Datagram Protocol (UDP) datagrams and a connection-oriented interface using Transmission Control Protocol (TCP) may be implemented.
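Purely as an illustrative sketch, the exemplary public methods listed above could be gathered into an abstract base class such as the following. The message, address, and statistics types shown are assumptions.

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    struct BladeMsg { std::uint32_t msgType; std::vector<std::uint8_t> body; };  // assumed
    struct IfaceStats { std::uint64_t sent = 0; std::uint64_t received = 0; };   // assumed
    using MsgCallback = std::function<void(const BladeMsg&)>;

    // Abstract base class defining the methods required for inter-blade
    // communications, per the list above.
    class IfaceHdlr {
    public:
        virtual ~IfaceHdlr() = default;

        // Connectionless interfaces only: broadcast to all connected blades.
        virtual void sendBcast(const BladeMsg& msg) = 0;
        // Connectionless interfaces only: send a private msg to the given address.
        virtual void sendPriv(const BladeMsg& msg, const std::string& addr) = 0;
        // Connection-oriented interfaces only: send the given msg to the peer blade.
        virtual void send(const BladeMsg& msg) = 0;
        // Connection-oriented interfaces only: synchronous; does not return until
        // a connection to the peer blade has been established.
        virtual void connect() = 0;
        // Register a callback for a specific msgId; multiple callbacks may be
        // registered for the same msgId, in which case each is called.
        virtual void registerCB(std::uint32_t msgId, MsgCallback callback) = 0;

        virtual IfaceStats getStats() const = 0;
        virtual void resetStats() = 0;
    };

Concrete UDP (connectionless) and TCP (connection-oriented) handlers would derive from this class, consistent with the interface types noted above.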
The blade message header may be a common header included in all messages. Each agent or proxy may define messages specific to its set of APIs, but all messages may share this common header. The common header may contain one or more of the following fields:

MagicNumber
MsgType
    A unique identifier for the message.
MsgGenNumber
    An incrementing number (unique to the proxy/agent) for each message.
RespGenNumber
    The MsgGenNumber that this response corresponds to. It may be set to zero if the msg is not a response.
MsgLength
    The length of the body of the message (excluding this header).
Spare fields
    A few spare fields for future use.

Conspicuously missing from this list is a version field. Whether to use a version field may depend on a choice between the approach of versioning messages and the approach of never modifying the contents of a MsgType but just creating new MsgTypes.
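For illustration, the common header fields listed above map naturally onto a plain structure like the following; the field widths are assumptions.

    #include <cstdint>

    // Hypothetical layout of the common blade message header.  Every message
    // defined by an agent or proxy would begin with this header.
    struct BladeMsgHdr {
        std::uint32_t magicNumber;   // MagicNumber: sanity marker for the stream
        std::uint32_t msgType;       // MsgType: unique identifier for the message
        std::uint64_t msgGenNumber;  // MsgGenNumber: incrementing, unique to the proxy/agent
        std::uint64_t respGenNumber; // RespGenNumber: MsgGenNumber this responds to; zero if not a response
        std::uint32_t msgLength;     // MsgLength: length of the body, excluding this header
        std::uint32_t spare[3];      // Spare fields for future use
    };

A receiver thread would first drain sizeof(BladeMsgHdr) bytes of an incoming message before processing the body, consistent with the behavior described above.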
The connection handlers may be a base class acting as a container object for proxies, agents and interface handlers. There may be a TSD Connection Handler (tsdConnHdlr) and an Index Connection Handler (idxConnHdlr) derived from the base class.

Each connection handler may represent a nexus between a TSD blade and an indexing blade. All proxies, agents and interface handlers needed for that nexus may be contained within that connection handler. Therefore, all LU- or slice-based interactions between TSD and indexing blades may occur through the appropriate connection handler.

Some flavor of LunConfig may maintain the relationship between connection handlers and LUN/slices. This may be the current implementation of LunConfig extended to include slices and the connection handler info.

Connection handlers may be instantiated by the blade configuration managers.

FIG. 9 shows a block diagram illustrating exemplary connection handlers in accordance with an embodiment of the present disclosure. A TSD Connection Handler 910 of a TSD blade (not shown) may be in communication with an Index Connection Handler 920 of an indexing blade (not shown). As shown, the TSD Connection Handler 910 acts as a container object for an Interface Handler 912, an Insert Proxy 914, a Remap Proxy 916, and an I/O Agent 918. Similarly, the Index Connection Handler 920 acts as a container object for an Interface Handler 922, an Insert Agent 924, a Remap Agent 926, and an I/O Proxy 928. The two interface handlers (912 and 922) are in communication with each other via Internal Subnet 1 and Internal Subnet 2, through which the messages between the corresponding agents and proxies are routed.

Separation of indexing functionalities from I/O processing may also require a management solution for both TSD and indexing blades. According to embodiments of the present disclosure, configuration of blades may be managed by a Blade Configuration Manager.

In a static blade configuration, a set of blade configuration data may be set at software installation or via a supplied utility, prior to a blade becoming fully operational. These configuration data may include, for example, Internet Protocol (IP) configuration for both subnets and blade node ID. The static blade configuration information may be persistent in a file on a local disk.

When blades are booted, they may broadcast their blade information to all blades that are already running and may in turn be sent, privately, the blade information of each running blade. In this way, the Blade Configuration Manager may have knowledge of every other blade's vital information. Exemplary fields in a bladeInfo struct may include:

bladeNodeId
    Unique ID for the blade, also used with the subnet address to form that node's full IP address.
bladeState
bladeType
    Either indexing or TSD blade type.
applianceID
    If the blade is a TSD blade, this may be the unique id of the appliance. Indexing blades may not use this field.
partitionID
    Indexing blade farms may be partitionable so that specific sets of blades may be provisioned to service LUNs from specific Revivio CDP appliances.
lunCount
    The number of LUN slices the blade may be currently servicing.
services
    A mask of services running on the blade; e.g., global database.
Global Database partition info
    In-memory copy of the on-disk global database partition info.

Additionally, anytime a blade's information changes, the blade may broadcast the changed bladeInfo to other blades and/or the indexing blade farm.

Blades may negotiate with one another to determine which blade may run specific services such as, for example, the global database server.

If a blade boots alone and there are no other blades running, it may wait for some predetermined amount of time before deciding that it should proceed and start services.
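A hedged sketch of the bladeInfo structure and its broadcast-on-change behavior follows; the field types, enumerations, and function name are assumptions for illustration only.

    #include <cstdint>
    #include <string>

    enum class BladeType { Indexing, Tsd };           // bladeType: indexing or TSD
    enum class BladeState { Booting, Running, Down }; // bladeState values assumed

    struct BladeInfo {
        std::uint32_t bladeNodeId;  // unique ID, combined with the subnet address for the full IP address
        BladeState    bladeState;
        BladeType     bladeType;
        std::uint32_t applianceId;  // unique id of the appliance (TSD blades only)
        std::uint32_t partitionId;  // indexing blade farm partition this blade serves
        std::uint32_t lunCount;     // number of LUN slices currently being serviced
        std::uint32_t services;     // mask of services running on the blade (e.g., global database)
        std::string   globalDbPartitionInfo; // in-memory copy of the on-disk partition info
    };

    // Whenever a blade's information changes, it broadcasts the changed
    // bladeInfo to the other blades (sketch; the transport is omitted here).
    void onBladeInfoChanged(const BladeInfo& info);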
Each indexing blade may have two fiber connection ports for metadata LUN access. All indexing blades in an indexing blade farm may be zoned to see the same set of metadata LUNs. Metadata LUN discovery may occur when an indexing blade is initially booting, and a background process may periodically run discovery to find LUNs that have been added to an already running indexing blade. It may then be determined as to which LUNs out of the set of LUNs discovered are to be used for metadata. One approach may be to allow the indexing blades to use any and all LUNs discovered. This approach may obviate the need to have an external management interface for metadata LUN assignment. An alternative approach may be to have a management interface for the indexing blade farm that may allow a user to assign LUNs for metadata.

When an indexing blade farm is booted for the first time, the indexing blade that wins the negotiation for running the global database service may be responsible for choosing a LUN on which to run the global database and to query all other indexing blades to ensure that this LUN is seen by all members of the indexing blade farm. It may then create a partition on the metadata LUN for the global database and start the service. The global database partition information may be stored in a local disk file and broadcast to the other indexing blades in the farm. Each indexing blade may persist the partition information, the information being sufficient for any indexing blade to mount the partition and start the global database service.

The indexing blade starting the service on a fresh indexing blade farm may also populate a table in a global database with the metadata LUNs that it has discovered. As other indexing blades gain access to the global database service (e.g., via the broadcast by the service owner), they may also populate the table with the set of metadata LUNs discovered. The goal may be to come up with a union set of LUNs that are seen by all indexing blades since it may be a requirement that all indexing blades see the same set of LUNs. Any LUNs that are seen by some, but not all of the indexing blades in the farm may be marked unavailable. The indexing blades may not enter a running state until a common set of LUNs has been determined.

When a late-comer indexing blade boots into a farm that has one or more indexing blades in a running state and it cannot see the same set of metadata LUNs that the running blades see, it may not continue booting up to a running state.

In the context of TSD LUN configuration, indexing blades may not be responsible for LUNs but rather uniquely identified LBA ranges given to the indexing blade as a globally unique LUN identifier. The memory units so identified may be slices. The unique identifier may allow different storage management appliances to present LUNs to the same indexing blade farm. Use of this new LUN identifier may have to be propagated throughout TSD and the indexing blade applications. Since the indexing blades are handed these pseudo-LUNs, they may be agnostic to slice sizes. The Portable TSD APIs may still require slice manager functionality to split indexing operations that span a slice boundary.

In the context of TSD LUN configuration, LUNs may not be assigned directly to indexing blades. Instead, LUN owners may be dynamically chosen when LUNs are put in capture mode by TSD. An exemplary sequence for a TSD blade and an indexing blade to determine LUN ownership is illustrated in the following table:

TABLE 1
Sequence for Determining LUN Ownership

    Step  TSD Blade                                Indexing Blade
    1     Broadcast LUN advertisement to
          indexing blade farm
    2                                              IF (local LUN count <= minimum LUN
                                                   count across entire set of indexing
                                                   blades) THEN send LUN ownership offer
    3     IF (first offer received) send LUN
          offer acceptance ELSE send LUN offer
          rejection
    4                                              Start idxConnHdlr; send ready
    5     Start tsdConnHdlr; update
          LunOwnership map
    6                                              Once the blade connection has been
                                                   established, broadcast an updated
                                                   blade configuration
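The following sketch restates the sequence of Table 1 in code form. It is illustrative only; the function names, the bookkeeping container, and the surrounding message handling are assumptions.

    #include <cstdint>
    #include <map>

    // Indexing blade side: decide whether to offer ownership for an advertised LUN.
    // Per Table 1, an offer is sent only if this blade's local LUN count is at or
    // below the minimum count across the set of indexing blades.
    bool shouldOfferOwnership(std::uint32_t localLunCount,
                              const std::map<std::uint32_t, std::uint32_t>& lunCountByBlade) {
        std::uint32_t minCount = localLunCount;
        for (const auto& entry : lunCountByBlade)
            if (entry.second < minCount) minCount = entry.second;
        return localLunCount <= minCount;
    }

    // TSD blade side: accept the first offer received and reject the rest, then
    // record the owner in the LunOwnership map once the connection is ready.
    class LunOwnershipMap {
    public:
        bool recordFirstOffer(std::uint64_t lunId, std::uint32_t bladeNodeId) {
            return owners_.emplace(lunId, bladeNodeId).second;  // true only for the first offer
        }
    private:
        std::map<std::uint64_t, std::uint32_t> owners_;
    };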
The Blade Configuration Manager may start a connectionless interface handler (datagram) and, apart from blade discovery and LUN ownership negotiation, may be responsible for services negotiation, connection handler creation, and fault handling coordination.

The APIs between a TSD blade and an indexing blade may be used for all communications between the two blades. The APIs may be embedded within respective proxies. Configuration APIs may be handled by blade configuration managers. A list of exemplary API calls, sorted by their related proxy, is provided below.

The Configuration APIs may involve configuration-specific information and may be handled via the blade configuration managers. The Configuration APIs may include the following:

AddLun
    Synchronous call that may be responsible for advertising a LUN to the indexing blade farm. It does not return until it has negotiated an owner for the LUN. Indexing blade LUN ownership may be dynamic, negating the need for a persistent configuration.
RemoveLun
    Synchronous call that may be responsible for removing a LUN from an indexing blade.
GetTime
    Synchronous call to get the current system time.

Insert Proxy APIs may include:

IndexBI
    Synchronous call that may be responsible for handling a "batch" of indexing operations. Upon return, the caller may safely assume that the indexing records are secure.
Commit
    Synchronous call that may be responsible for forcing the indexing blade to commit all outstanding index BIs.

Remap Proxy APIs may include:

RemapRead
    Asynchronous call that may be responsible for remapping an LBA range based on a given time.
RemapWrite
    Asynchronous call that may be responsible for remapping a TimeImage write.
UpdateMap
    Asynchronous call that may be responsible for updating after image maps (AI, BI, DW).

TS Alloc Proxy APIs may include:

AllocChunk
    Synchronous call that may be responsible for allocating a new TimeStore chunk for the given LUN.
SealChunk
    Asynchronous call that may be responsible for sealing a TimeStore chunk.
Add/RemoveLUN
    Synchronous call that may be responsible for adding or removing TimeStore LUNs.
Add/Remove/ModifyQuotaGroup
    Synchronous call that may be responsible for adding, removing, or modifying quota groups.
GetTimelineInfo
    Synchronous call that may be responsible for getting timeline info for a given LUN or quota group.

TLR Proxy APIs may include:

Add/Remove/ModifyProfile
    Synchronous call that may be responsible for adding, removing, or modifying timeline rollup profiles.
GetTLRInfo
    Synchronous call that may be responsible for getting timeline rollup information for a given LUN or quota group.

Batches of I/O requests may be forwarded from indexing blades to a TSD blade via an I/O Proxy.
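As an illustrative grouping only, several of the API calls listed above could be organized into per-proxy C++ interfaces such as these; the parameter types and names are assumptions rather than the actual call signatures.

    #include <cstdint>
    #include <ctime>
    #include <vector>

    struct LbaRange { std::uint64_t start; std::uint64_t length; };
    struct IndexBiRecord { std::uint64_t lun; std::uint64_t lba; std::uint64_t timestamp; };

    // Configuration APIs, handled via the blade configuration managers.
    class ConfigurationApi {
    public:
        virtual ~ConfigurationApi() = default;
        virtual void addLun(std::uint64_t lunId) = 0;     // synchronous; negotiates an owner
        virtual void removeLun(std::uint64_t lunId) = 0;  // synchronous
        virtual std::time_t getTime() = 0;                // current system time
    };

    // Insert Proxy APIs.
    class InsertProxyApi {
    public:
        virtual ~InsertProxyApi() = default;
        virtual void indexBI(const std::vector<IndexBiRecord>& batch) = 0; // records secure on return
        virtual void commit() = 0;  // commit all outstanding index BIs
    };

    // Remap Proxy APIs (asynchronous; completion callbacks omitted for brevity).
    class RemapProxyApi {
    public:
        virtual ~RemapProxyApi() = default;
        virtual void remapRead(const LbaRange& range, std::uint64_t atTime) = 0;
        virtual void remapWrite(const LbaRange& range) = 0;
        virtual void updateMap(std::uint64_t lunId) = 0;  // update after image maps (AI, BI, DW)
    };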
At this point it should be noted that the technique for implementing a scalable and/or portable storage management system in accordance with the present disclosure as described above typically involves the processing of input data and the generation of output data to some extent. This input data processing and output data generation may be implemented in hardware or software. For example, specific electronic components may be employed in a storage area network (SAN) or similar or related circuitry for implementing the functions associated with storage management system scalability and/or portability in accordance with the present disclosure as described above. Alternatively, one or more processors operating in accordance with stored instructions may implement the functions associated with storage management system scalability and/or portability in accordance with the present disclosure as described above. If such is the case, it is within the scope of the present disclosure that such instructions may be stored on one or more processor readable carriers (e.g., a magnetic disk), or transmitted to one or more processors via one or more signals.

The present disclosure is not to be limited in scope by the specific embodiments described herein. Indeed, other various embodiments of and modifications to the present disclosure, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such other embodiments and modifications are intended to fall within the scope of the present disclosure. Further, although the present disclosure has been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the present disclosure may be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the present disclosure as described herein.

The invention claimed is:

1. A storage management system operatively coupled to a storage system, the storage management system comprising:
a plurality of processor modules, each processor module to intercept write commands directed to the storage system, generate backup data based upon data associated with the write commands, and generate metadata having timestamps for the backup data; and
one or more indexing modules to create one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules are located separate from the plurality of processor modules to allow separate scalability of the plurality of processor modules and the one or more indexing modules, wherein the one or more indexing modules are in communication with the plurality of processor modules and the storage system, wherein the one or more indexing modules further perform one or more functions associated with one or more functional modules selected from the group consisting of a blade configuration manager, a remap engine, a global database, a production restore, a timeline compression, an indexing database interface, a metadata space manager, and vendor multipathing software.

2. The storage management system according to claim 1, wherein the number of the plurality of processor modules is scalable based on a desired capacity of the storage management system.

3. The storage management system according to claim 1, wherein the plurality of processor modules are configured with fault-tolerant redundancy.

4. The storage management system according to claim 1, wherein the plurality of processor modules are coupled to the storage system via fiber connections.
5. The storage management system according to claim 1, wherein each of the plurality of processor modules comprises at least one target interface and at least one initiator interface.

6. The storage management system according to claim 1, wherein the plurality of processor modules are in communication with one another.

7. The storage management system according to claim 1, wherein the number of the one or more indexing modules is scalable based on the number of the plurality of processor modules to be supported by the one or more indexing modules.

8. The storage management system according to claim 1, wherein the one or more indexing modules have direct access to one or more metadata storage devices in the storage system via fiber connections.

9. The storage management system according to claim 1, wherein the one or more indexing modules communicate with the plurality of processor modules via one or more internal networks.

10. The storage management system according to claim 1, wherein the one or more indexing modules communicate with the plurality of processor modules through a plurality of agents and proxies.

11. A method for improving portability and scalability of a storage management system operatively coupled to a storage system, the method comprising:
coupling a plurality of processor modules to the storage system, each processor module to intercept write commands directed to the storage system, generate backup data based upon data associated with the write commands, and generate metadata having timestamps for the backup data;
coupling a blade farm having one or more indexing modules to the storage system, wherein the blade farm is located separate from the plurality of processor modules to allow separate scalability of the plurality of processor modules and the one or more indexing modules, wherein the blade farm is in communication with the plurality of processor modules and the storage system; and
causing the blade farm to communicate with the plurality of processor modules via one or more internal networks, wherein the one or more indexing modules create one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules further perform one or more functions associated with one or more functional modules selected from the group consisting of a blade configuration manager, a remap engine, a global database, a production restore, a timeline compression, an indexing database interface, a metadata space manager, and vendor multipathing software.
12. At least one processor readable storage medium for storing a computer program of instructions configured to be readable by at least one processor for instructing the at least one processor to execute a computer process for performing the method as recited in claim 11.

13. A method for improving portability and scalability of a storage management system operatively coupled to a storage system, the method comprising:
intercepting, at a plurality of processor modules, write commands directed to the storage system;
generating backup data based upon data associated with the write commands;
generating metadata having timestamps for the backup data; and
creating, at one or more indexing modules, one or more indexing tables for the backup data based on the metadata, wherein the one or more indexing modules are located separate from the plurality of processor modules to allow separate scalability of the plurality of processor modules and the one or more indexing modules, wherein the one or more indexing modules are in communication with the plurality of processor modules and the storage system, wherein the one or more indexing modules further perform one or more functions associated with one or more functional modules selected from the group consisting of a blade configuration manager, a remap engine, a global database, a production restore, a timeline compression, an indexing database interface, a metadata space manager, and vendor multipathing software.