FalconStor Transcends both Physical and Virtual Boundaries with new VMware Initiative

VMware ESX Server and its latest vSphere software release have become instrumental in helping organizations tame server hardware costs within data centers while improving the ROI associated with existing and new projects. But virtualization brings pain points of its own, and managing virtual storage infrastructures is one of the biggest. To help combat these challenges, FalconStor Software today announced at VMworld 2009 a comprehensive VMware Initiative that will help organizations bridge their physical and virtual infrastructures and provide continuous availability in multi-vendor storage environments.

Many of the emerging pain points within virtual storage infrastructure concern the enablement of high availability in VMware environments. Despite VMware’s virtualization capabilities, many of VMware’s high availability features depend on a highly available storage infrastructure that VMware does not directly address.

Backup pains are exacerbated in growing virtual environments as physical and virtual server infrastructures often demand different backup techniques. As a result, they are managed separately with little commonality in their data protection or disaster recovery approaches.

However, this is not a long-term strategy, as a standardized, common interface to the storage infrastructure is becoming a necessity in order to effectively deploy server virtualization. Further, these standardized interfaces need to encompass remote and branch offices (ROBOs) as well as small and medium businesses (SMBs), as they are the ones that can least afford to make significant storage investments.

In light of these emerging enterprise, ROBO and SMB requirements, FalconStor today introduced a new VMware Initiative that focuses on four areas.

Continuous Availability. FalconStor extends VMware vCenter Site Recovery Manager services to the physical server infrastructure and provides automated failover and failback support for these physical servers whether they are located locally or remotely.

One of the greatest costs for disaster recovery at remote sites has to do with the build-out of an infrastructure that meets the same specs as an organization’s critical but more costly primary site.

Having parallel environments quickly becomes cost prohibitive, especially when an organization considers that its DR environment is something it does not use very often. It is basically a stand-by environment that provides failover only when required, and then only for a short period of time until the main infrastructure is brought back up and running.

FalconStor’s virtualization platform helps take care of some of the biggest challenges organizations face when they implement vCenter Site Recovery Manager. It eliminates the need for organizations to deploy the same storage infrastructure at both the production and recovery sites while also removing storage system vendor lock-in.

This is noteworthy since vCenter Site Recovery Manager support is limited to a relatively small number of disk vendors. Using FalconStor, organizations can use whatever storage they have available, locally or remotely. In so doing, it brings the management of these disparate storage infrastructures for physical and virtual environments under one umbrella.

The other vCenter Site Recovery Manager drawback that FalconStor addresses with this release is the difficulty of failing back after a failover. Currently, organizations have to redefine replication from their DR site back to the production site, create all of those protection groups at the remote site, institute recovery plans at the production site, fail over from the remote site to production, and then repeat the same steps from the production site to the remote site so that Site Recovery Manager once again protects the production site.

Not only can a proper Site Recovery Manager failback be very expensive in terms of resources and time, it is prone to human error. To simplify this, FalconStor now provides a plug-in to vCenter that scans vCenter and the storage server on the protected site and discovers the virtual machine, data store and storage replication configuration details. Once this is done, SRM is set up to provide automated failover and failback and can even migrate the recovery plan to the remote site. Most impressively, this can all be managed and accomplished completely within Site Recovery Manager.

Virtual Storage Management. FalconStor’s new plug-in to the vCenter console is a stroke of genius as it enables vCenter to manage both the server and the storage infrastructure from one console. Using vCenter, organizations can manage any FalconStor Networked Storage Server (NSS) infrastructure so they do not have to go to the NSS console to do routine storage tasks such as provisioning LUNs. This should further expedite virtual server deployments and improve their protection since the plug-in completely automates the setup process.

Virtual Appliance Solutions. One of the other slick moves that FalconStor has made is porting all of its solutions to virtual appliances to offer a cost-effective data services model to ROBOs and SMBs.

Lately FalconStor has ported its File-interface Deduplication System (FDS) to the VMware environment in the form of a virtual appliance, without requiring organizations to add networked storage to their infrastructure. This option becomes very attractive to ROBOs and SMBs as they can now introduce deduplication into their environment on their existing physical VMware ESX Server with no changes to their physical environment.

New Direct Storage Access. This new feature provides FalconStor Virtual Appliances a direct path to the network or storage card and completely bypasses the storage virtualization layer created by the hypervisor.

The FalconStor I/O device driver directly accesses the Fibre Channel card or 10 Gb Ethernet card sitting in that virtual machine, which should result in higher performance connectivity for storage services. In its internal testing, FalconStor has seen a 10-fold increase in performance and can nearly saturate the I/O throughput on an ESX server that has two NICs, assuming the attached storage device can support these high rates of I/O.

VMware plays an increasingly critical role in how organizations cut their operational costs while improving application availability and recoverability, but VMware can’t do it all. This new VMware Initiative from FalconStor does an admirable job of filling in some of the holes that VMware leaves behind while taking advantage of the virtual infrastructure that VMware has created. In so doing, FalconStor gives organizations a clear path for resolving their storage and data protection issues locally and remotely, regardless of whether they are using physical or virtual servers, and in such a way that it does not take a rocket scientist to implement and support.

Cloud Storage Poised to go Mainstream by 2012 but still a Marketing Ploy for now; NetApp Announcement Underwhelms

The last month or so I have spent a lot of time doing research on cloud storage. Its terminology, who the providers are, its maturity (or lack thereof) and who (if anyone) is taking advantage of and supporting it have all been questions I have been asking. “Why?” you may ask. Simple. A survey conducted by Applied Research at the behest of F5 Networks and released this past Monday finds that more than 80% of IT managers are discussing or implementing public or private cloud solutions. Now when’s the last time you recall seeing a statistic like that?

Now, granted, survey results like this from Applied Research no doubt work in F5 Networks’ favor. However, Applied Research does these surveys for a number of well-known high tech companies (Symantec specifically comes to mind) and, in looking at the profile of those companies it surveyed, the survey looks legitimate. The demographics of those surveyed were especially insightful.

It spoke with 250 companies that had at least 2,500 employees worldwide, with a median of 75,000 employees. The respondents included IT managers (37%), VPs (24%), IT directors (23%) and SVPs (16%). 46% manage IT departments, 41% work in IT departments and 13% have IT departments that report to them.

This communicates to me that cloud storage has a lot of momentum, and the fact that the answers were so consistent across the board indicates that whatever form cloud storage takes, it is going to be big and extremely disruptive.

This was confirmed by another conversation that I had this week with a consultant. He tells me that everyone he deals with, from CIOs to IT managers, is being pushed by their executive boards to take a look at cloud storage. In many cases, expending capital funds for IT hardware or software is coming off the table as an option. Instead they are being instructed, whether they like it or not, to look at cloud storage options and how they can pay for them out of their operational budgets.

Neither he nor I necessarily see that occurring this year or next (I’ll get to why I don’t in a moment) but this strikes me as eerily similar to what happened with server virtualization a few years ago. First there was interest in 2006/2007, then early adopters in 2008, and then this year (2009) entire organizations are going full steam ahead with server virtualization. So at the rate cloud storage is maturing, coupled with this level of interest at the highest levels of enterprise organizations, I would expect cloud storage to follow a similar path and start to go mainstream in 2011 and no later than 2012. So why don’t I see much activity occurring until 2012? This is somewhat based on a conversation I had this past week with Atempo. My interest in speaking to Atempo was that, among providers of data protection and management software, it has arguably been well ahead of the crowd in providing support for cloud storage.

As far back as last year it announced support for Nirvanix’s cloud storage REST API within its Digital Archive product and then this year at EMC World it announced support for EMC Atmos’s REST API as well. What I wanted to try to understand was, “What was Atempo’s motivation for supporting cloud storage so early in cloud storage’s life cycle?”

Mark Sutter, Atempo’s CTO and VP of Engineering, told me that Atempo’s primary purpose for making it available so soon was essentially a marketing ploy to help it garner some attention and press/media. In that respect, it succeeded. By providing support for Nirvanix’s REST API as soon as it did, it did result in some press coverage that captured the attention of executives at EMC Atmos. That led to conversations that eventually resulted in Atempo Digital Archive’s supporting the EMC Atmos cloud storage offering.

However, this does not mean customers should just run out and buy Digital Archive and assume it will automatically work with either Nirvanix’s or EMC Atmos’s cloud storage offering. He said Atempo does not have many customers using Digital Archive’s cloud storage interface (I got the impression it was in the single digits, if that many) and that most of the customers that Atempo is speaking to are still dipping their toes in the cloud storage waters.

The other question I posed to Mark was, “How difficult was it for Atempo to support the REST API?” The lack of a standard REST API interface among cloud storage providers is something I have brought out in a previous blog so I was curious how difficult it was for Atempo to provide REST API support for not one but two cloud storage providers.

According to Mark, not bad at all. He did say Atempo had to make a modest investment for its initial support of the Nirvanix REST API but that to add support for the EMC Atmos REST API to its Digital Archive product was not onerous. Atempo was able to leverage a lot of what it learned in developing its initial support for Nirvanix’s REST API to support EMC Atmos.

Following this conversation with Atempo, I also had a conversation with NetApp to get a few more details around its cloud storage announcement earlier this week. However, after speaking to NetApp’s Jeff O’Neil and Sandra Wu, I felt underwhelmed.

Maybe it was because the features of ONTAP 8 have been promised and alluded to by NetApp for so long that this announcement was anticlimactic. Maybe it was because I expected NetApp to announce a REST API so organizations could use NetApp FAS storage systems for either private or public clouds. Whatever the reason, this is one press release that, for all of the hoopla surrounding it, failed to impress me.

This is not to imply NetApp did not have some good stuff to say. Its secure multi-tenancy feature (which I want to do some further research on) is very compelling and should serve it well with those customers that want to use NetApp for private cloud deployments. The Performance Acceleration Module II, which complements multi-tenancy by optimizing its workloads, also merits mention as a key feature that helps to set NetApp apart from the crowd.

Also, its new Data Motion feature (though not available until early CY2010) will surely be a feature that private cloud storage providers will love. It allows them to do live data migrations so that normally burdensome tasks such as load balancing, system maintenance and storage system refreshes become routine storage admin tasks as opposed to major undertakings within organizations. This feature should come as a major relief to those individuals.

But I see all of these announcements as only important technical updates to the NetApp FAS storage systems and applicable primarily to those looking to build internal, private storage clouds. In that vein, I question NetApp’s wisdom in not discussing its plans for making a public cloud storage option available and even backing away from it when pressed on the conference call. While Sandra did tell me at the end of my briefing to stay tuned for more announcements in the coming few months, I do hope NetApp finds a way to fill this public storage cloud gap sooner rather than later. While not having a public cloud storage option via a REST API will probably not hurt it over the next 12 months, I can see NetApp’s lack of presence and direction in this area coming back to bite it.

That’s it for this week. Have a good weekend and be sure to check back in frequently next week. I will be at VMworld covering all things virtual and cloud related so look for more frequent posts during that period of time.

Iomega ix4-200d Doubles Storage Capacity, Triples Performance and Adds Replication; Not Just another SMB NAS Device

It was only back in February that Iomega, with great fanfare, released its StorCenter Pro ix4-100 targeted at the SMB market. Now, only 6 months later, Iomega announces an updated version of the ix4-100 appropriately named the StorCenter™ ix4-200d. The ix4-200d doubles the storage capacity and triples the processing power of the ix4-100, but it is the addition of replication to its EMC LifeLine software that really makes the ix4-200d stand out from its competitors.

The Hardware Basics of the Iomega ix4-200d

The four-drive ix4-200d (the ‘d’ in ix4-200d stands for ‘desktop’) is a souped-up version of the ix4-100 that supports up to 50 users, an increase of 25 users over the current ix4-100. To accommodate these additional users, the ix4-200d boasts both a faster processor and dual Ethernet ports.

The storage capacity of the new ix4-200d represents a two-fold increase over the ix4-100 (8 TB versus 4 TB) while its CPU is now 1.2 GHz versus the 400 MHz CPU found on the ix4-100. This should help the ix4-200d better respond to the workloads and additional storage requirements created by these additional users. The list prices are $699 (2 TB), $899 (4 TB) and $1,899 (8 TB), respectively.

Its dual Ethernet ports are a nice new touch as they can be configured together to provide high availability and fault tolerance. Each network interface can also be put on a separate network, which allows SMBs to place the device in a DMZ (demilitarized zone) where one port is exposed to the Internet for remote access and the other is used only for internal file sharing. This setup is great for those SMBs who have folks on the road that need to gain access to files back in the office. They can do this securely as file transfers are encrypted using the same RSA-based encryption technology found in current StorCenter NAS products. For those individuals in the office who are responsible for managing the ix4-200d, the ix4-200d now includes an LCD screen on the front of the device. The LCD display provides instant status updates of the unit’s total available storage capacity as well as statistics on its current network performance.

The lone downside I saw on the new ix4-200d was that it kept the 512 MB of RAM found on the current ix4-100. In speaking to Iomega about this, they said it was a design decision to help keep ix4-200d’s costs down. My only concern was that it could negatively impact those environments that have a lot of users concurrently reading and writing files to the ix4-200d.

Replication – An Awesome New Feature for an Affordable SMB NAS

Iomega has, over the last year, added iSCSI block-level storage support to its StorCenter family of NAS appliances, but the awesome new feature that it adds in this release is replication, which comes at no additional cost. Replication, now built into the EMC LifeLine OS, is a huge new feature for SMBs that need the ability to store mission critical data at an off-site location.

Using only a few clicks of the mouse, an SMB can quickly set up replication between its primary ix4-200d and another ix4-200d or to an externally attached USB drive. It even gives an SMB the option to copy data to another network target using rsync-powered replication. The idea of replicating data to an externally attached USB drive is not necessarily an earth-shattering advancement. However, for an SMB this can be a simple way of eliminating tape drives and cartridges and going to completely disk-based backups. A simple push of the QuikTransfer button on the front of the ix4-200d enables backups to be performed on demand to an attached USB drive. Once the “backup” is finished, the SMB can disconnect the USB drive and take it to an off-site location for safe keeping. This new feature nicely complements the unlimited client license EMC Retrospect Express backup software that already comes with the ix4-200d.
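For readers who have not seen rsync-based replication in action, here is a minimal sketch of the general technique of pushing changed files to an off-site target on a schedule. The host names, share paths and user account are hypothetical, and this illustrates rsync replication in general, not Iomega's LifeLine implementation.

```python
import subprocess

# Hypothetical share path and remote target used purely for illustration;
# they do not correspond to any actual ix4-200d configuration.
SOURCE_SHARE = "/mnt/primary-nas/shared/"
REMOTE_TARGET = "backupuser@dr-nas.example.com:/mnt/dr-nas/shared/"

def replicate(source: str, destination: str) -> None:
    """Push changed files from the source share to the remote target.

    rsync transfers only the data that has changed since the last run,
    which is what makes scheduled off-site copies over a WAN practical.
    """
    subprocess.run(
        [
            "rsync",
            "-az",       # archive mode (preserve permissions and times), compress in transit
            "--delete",  # mirror deletions so the target matches the source
            source,
            destination,
        ],
        check=True,  # raise an error if rsync reports a failure
    )

if __name__ == "__main__":
    replicate(SOURCE_SHARE, REMOTE_TARGET)
```

Scheduling a script like this nightly gives the same basic effect the ix4-200d packages up behind a few mouse clicks: only changed data crosses the wire, and the remote copy stays in sync.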

Other Hidden Gems in the ix4-200d

You’re probably thinking, what else could be hidden in this little gem? Well, Iomega hasn’t dropped any of the previous features that existed in the ix4-100. This includes Windows Active Directory support, print server and media server support (such as iTunes), RAID options for data protection redundancy, and folder quotas for handy capacity management.

If storage space does become a problem then the ix4-200d can be expanded by attaching up to 3 externally-attached USB drives. There are 2 USB ports available in the rear and 1 on the front of the unit. Further, it is one of the only dual NAS/iSCSI-based solutions that is VMware certified for ESX 3.5.

Iomega has been busy this past year and they are holding to their CEO’s pledge made earlier this year of providing new feature upgrades every 6 months. The ix4-200d delivers many of the features that an SMB would want and even a little bit more than one might expect from a storage appliance in this price range. So, if you have been putting off purchasing a NAS storage appliance or waiting for the right SMB NAS appliance for your business, the ix4-200d gives you some compelling reasons to act now.

New Use Cases Give CDP a Second Chance

In the last few years few technologies have experienced more ups and downs than continuous data protection (CDP). Initially hailed by some as a likely successor to backup software, CDP has yet to come close to fulfilling that original promise. However, recent changes in IT data center environments coupled with ongoing improvements in CDP are giving this technology a second chance.

Some may wonder why CDP did not do better out of the gate. While the reasons vary, there are three main reasons why CDP never took off as some initially forecast it might.

It never integrated with backup software. Users tend to have a love-hate relationship with their backup software, but replacing backup software with CDP was never really an option in most organizations. So when CDP vendors suggested that organizations replace their existing backup software with CDP software and offered no options for backup software integration, application support or tape management, the arguments for switching to CDP fell on deaf ears.

It solved too few problems in organizations. No backup windows, near zero data loss on recoveries, and roll forward and rollback recovery capabilities sound really cool to application owners and backup administrators. But how many applications really, really needed that level of functionality a few years ago and were willing to pay extra for it? It turned out, not enough. Organizations found that they could make do with existing and more affordable backup and snapshot technologies.

Organizations still had the time and resources to do backups and recoveries. Three to five years ago server virtualization was still on the horizon, not in the data center. As a result, organizations still had time to back up and recover the majority of applications using backup software agents on LAN-attached servers and various forms of snapshot technologies for mission critical applications.

What has changed over the past few years is that the arguments for not using CDP are less applicable. Many data center infrastructures either have or are deploying server virtualization. This has brought new requirements for less intrusive backup techniques with faster, more comprehensive recovery options. Also, as CDP software has matured, some of the original objections to its adoption are no longer valid. Consider the following new use cases:

Better integration with backup software. CDP providers now rightly recognize that users are not about to abandon their backup software anytime soon. While users want better options to protect and recover their data, they want to continue to use their backup software to set and administer these policies. In light of this, CDP providers are adapting.

For example, Symantec is now leveraging its NetBackup 6.5 software to set backup policies for either NetBackup or its NetBackup RealTime CDP software. Organizations can also leverage the NetBackup backup software agent to create application consistent recovery checkpoints within the NetBackup RealTime CDP solution. These application checkpoints can then be used to create snapshots which NetBackup can use as the source for long term backups to disk or tape.

No backup windows. Question: When is a good time to backup a physical server that hosts multiple virtual machines, even those not deemed mission critical? Answer: Never. When organizations consolidate multiple physical servers onto one, using traditional backups on each virtual machine puts additional overhead on the underlying physical server that can tax its memory, networking and processing resources. Using CDP to protect these servers eliminates this overhead associated with nightly backups and, in the case of products like NetBackup RealTime, actually offloads the backup workload to an appliance on the Fibre Channel network.

Standardized, shorter recovery periods for all application servers. As organizations consolidate servers while seeking to maintain or even reduce IT staffing levels, it can become harder to determine which applications are mission critical and which ones are not. CDP eliminates the need for organizations to try to make these sometimes arbitrary determinations by providing a consistently high level of recovery for all protected applications.

In the case of NetBackup RealTime, it virtualizes existing storage systems so that the application servers can run off of the NetBackup RealTime backup without needing to copy data back to the original or another location. This takes backup to a new level as it becomes more akin to a clustering solution since the application server can use the disk image behind the NetBackup RealTime solution to run instead of waiting for a recovery to occur.

New applications for testing and development. Production applications are not the only applications that can run on images created by CDP platforms such as NetBackup RealTime. These same copies of production application data can easily be presented to test and development servers. Since CDP can be used in these new roles, it gives organizations new flexibility in how they budget for CDP initiatives.

CDP will likely never replace backup software, but its role in organizations and the business case for it are becoming much more compelling. As these four new use cases illustrate, CDP is no longer a technology that can or should be summarily dismissed. Rather, it has matured and evolved to become a way for organizations to reduce the cost and effort associated with data protection while simultaneously creating entirely new possibilities for application development, testing and disaster recovery.

SEC Case Reveals Former AIG Execs Demanded Email Evidence Be Destroyed

When the wheels came off the American economy in the fall of 2008 there was a steady stream of companies lining up for a government bailout, and none were of a higher profile than American International Group (AIG). Over a chorus of jeers from the general public, the United States Government set out to rescue the “Too Big to Fail” company by setting up an $85 billion reserve in exchange for 79% ownership of the company. Emotions ran high during this time period and no matter which side of the aisle you were on in regards to the bailout of AIG, the current SEC complaint against AIG will make most any person angry.

The SEC has taken its lumps for its role, or lack of it, during the economic downturn. But, this case shows the SEC is paying attention and investigating those who play fast and loose with the rules. What this investigation highlights most of all is that AIG was routinely engaged in business practices designed to inflate the company’s worth and misstate earnings, costing investors millions of dollars. All of this was orchestrated through the use of shell companies and investors with the full knowledge of the former executives.

Maurice R. “Hank” Greenberg, former Chairman and CEO, and Howard I. Smith, Chief Financial Officer, settled with the SEC with payments of $15 million and $1.5 million, respectively, for their roles in this scandal. This is in addition to the fines levied in 2006 against AIG totaling around $800 million for securities fraud and improper accounting. The SEC’s release included this quote from Robert Khuzami, Director of the SEC’s Division of Enforcement: “Corporate leaders cannot avoid the truth and consequences of their company’s performance by using improper accounting gimmicks and signing off on distorted financial reports.”

While reviewing the SEC complaint, several interesting items stood out:

Greenberg made it clear, through conversations regarding the reduction in stock price due to inadequate loss reserves, that AIG was going to use “aggressive accounting techniques”. This was done through a transaction with GenRe, which was paid a $5 million fee and refunded a $10 million premium back to AIG through an off-shore company. This transaction was not in conformity with GAAP and was referred to as “reckless” by the SEC.

A “sham” transaction was done through what is called a round trip of cash. Basically, a transaction was done between two companies which had no economic substance, but was done for the sole purpose of manipulating AIG’s financial statements.

Materially false statements regarding loan loss reserves were given that were signed off on by Greenberg and Smith.

Losses were concealed through a shell company from Barbados called Capco, which AIG acquired through a subsidiary called AIRCO. Using this arrangement, Capco absorbed $210 million in AIG losses so investors would be willing to make capital investments and the stock price would not be affected. When AIRCO officials raised concerns regarding Howard Smith’s orders not to record an unrealized loss, Howard Smith admonished them for sending their concerns over e-mail and demanded that all evidence of the conversation be destroyed.

This SEC complaint reveals a pattern of sham investments, false investors, manipulation of financial statements and cover-up. These were all designed to hide losses that reached the hundreds of millions of dollars while financial statements were inflated and stock prices were manipulated. This is not to mention the damning evidence that a direct order was issued from AIG’s CFO to destroy all e-mail trails of AIG’s wrongdoings.

Technology such as Estorian’s LookingGlass allows companies to ensure that e-mail evidence isn’t destroyed. Shareholder value and company reputation should be closely guarded, and all communications regarding the financial viability of a company should be preserved in accordance with federal law.

AIG was front and center in the current financial crisis and only survived through a bailout funded with taxpayer dollars. But when you see this type of unethical and arguably criminal behavior alleged against Hank Greenberg and Howard Smith, it makes one pause and question whether AIG deserved to be saved, and it will undoubtedly make it harder to gain public support for further cash infusions.

Companies should be mindful of these types of executives and ensure that e-mail discussions of company financial data are preserved through the use of technology such as LookingGlass. As this case points out, executives who cook the books and destroy evidence will eventually be held accountable. The big difference today between AIG and you is that your company cannot count on a bailout if this happens inside your company.

Nirvanix Weighs in on Cloud Storage; ‘Harmonic Mean’ Now Part of the Data Deduplication Discussion

Over the last couple of weeks my weekly recap blogs touching on the subject of cloud storage have prompted a lot of emails and phone calls to me in the background to discuss this topic, so I wanted to touch on it again this week. In addition, I’ve also been doing a little research into some of Data Domain’s claims (and the counterclaims of its competitors) in regards to the advertised performance numbers on its new DD880 and under what conditions enterprise users might expect to achieve those numbers. Finally, I wanted to comment on some of the statements that I made last week about a CEO change and a corporate acquisition and end up with a new rumor that is circulating in the storage industry.

First, after my blog on cloud storage posted last week, I received a call from my friend Stephen Foskett. Anyone who knows Stephen knows that this man never sleeps and I secretly suspect he is hardwired into somebody’s storage matrix. Right now one of the many jobs that Stephen holds is that of Nirvanix’s Director of Consulting, and I’m sure Nirvanix’s absence from my previous couple of blogs about cloud storage did not go unnoticed by him. However, Stephen can rest easy this week as Nirvanix is now mentioned in the same sentence as cloud storage.

Much of our conversation focused on the differences between “private clouds” and “public clouds”. Previously I had brought out that most storage cloud providers are developing their own proprietary implementation of the REST API which essentially locks users into a specific cloud storage implementation. There was also another good entry earlier this month on ParaScale’s blog (another start-up storage cloud provider) on this exact topic. It points out how each application vendor has to write support for each cloud storage vendor’s implementation of the REST API. However ParaScale failed to mention in that particular blog what it is doing to make the situation any better.

Stephen pointed out to me that the story does not end there. Rather, storage cloud offerings are breaking into three distinct camps: those that just do private clouds, those that do public clouds, and those that can do both.

Private clouds essentially just provide NFS and CIFS interfaces. These are the easiest to deploy since many applications can already connect to these types of “clouds” since they use standard network protocols. The downside is that they negate some of the benefits of cloud storage since users have to first buy some of the storage infrastructure whereas public storage clouds do not require an upfront capital expenditure for hardware and software.

Conversely, public clouds are starting to adopt the increasingly popular but proprietary REST API implementations that are needed to navigate through corporate firewalls. In an article that I wrote for SearchStorage.com that appeared earlier this week, Jon Martin, director of product management at EMC’s Cloud Infrastructure group, even went so far as to say, “Eighty-five percent of the market is turning to REST while SOAP is fading away.”

However, what Stephen brought to my attention is that companies like EMC are making their cloud storage offerings available in two flavors: one as an internal cloud storage offering and the other as a public cloud storage offering. The internal one uses a CIFS/NFS interface while the public cloud uses a REST interface. So even if users want to go exclusively with EMC, they should not assume they can easily migrate data between the two of them.

This is how Nirvanix differentiates itself and represents a third class of storage cloud provider. It installs its own file system on servers and first caches writes intended for the cloud to local disk. The benefit of this approach is that whether users want to create an internal or external storage cloud, any application can take advantage of it, since Nirvanix masks the storage cloud (private or public) by presenting applications with a consistent file system interface that they understand and to which they can write data.

This does not come without some risk. Since Nirvanix caches the data to local disk, there is no 100% guarantee the data ever makes it to the cloud; if a connection to the cloud is never established or something catastrophic happens to that server’s disk cache, the data can be permanently lost. To get around this, Nirvanix encourages users to create disk caches on highly available disk systems for the purpose of caching writes before they go to the cloud. This is an interesting approach, but I only see it (as I see all of them) appealing to a certain segment of end-users.

Now I want to turn my attention to Data Domain. As many are aware, about a month ago Data Domain released its new DD880, which was specifically targeted at the enterprise backup space. I personally feel that this product achieved that end in terms of the performance and throughput benchmarks that it published and what real production environments actually require.

Of course, like every vendor’s published benchmarks, there are caveats to achieving those performance numbers. The two that Data Domain’s competitors most often like to point to are that, to achieve these benchmarks, users must implement Symantec’s OST API and use a 10 Gb Ethernet connection.

However NEC makes the case that in order for users to achieve these throughput numbers all backup data sent to the DD880 appliance must be a duplicate of data that is already deduplicated on the DD880. In this way, no new data actually needs to be stored to the DD880. Instead it only needs to write pointers to existing deduplicated data.

In an email I received from Gideon Senderov, NEC’s director of product management and technical marketing, he wrote the following:

The third criterion is also implied to some extent by Data Domain’s claim on the DD880 data sheet: “The high-throughput, inline deduplication data rate of the DD880 is a direct benefit of the Data Domain Stream-Informed Segment Layout (SISL) Scaling Architecture. A single DD880 system achieves single-stream throughput of up to 1.28 TB/hour, performance that is imperative for protecting large, business critical databases in the data center. This performance is achieved by a CPU-centric approach to deduplication, which minimizes the number of disk spindles required to achieve throughput needed for critical single-stream operations.”

The only way performance is CPU-centric and minimizes the number of spindles required to achieve throughput needed is when the amount of data actually committed to disk is so minimal that it is not gating the data flow that is being processed inline. This happens when the data is duplicative and only pointer information is committed to disk.

Over a year ago, we (NEC) actually conducted tests with Data Domain’s 5xx hardware (DD580) and verified that their maximum stated numbers were indeed *only* applicable for 100% duplicate data. For 0% duplicate data (i.e. all data actually written to disk), the max throughput numbers we measured with the same DD system were 45% lower. Note that at the time, their published numbers did not incorporate OST or 10GbE, which, coupled with some software enhancements for duplicate data, attributed to increasing their published numbers by 90% as specified in their press release a few months ago.

“While performance can increase on all systems across all protocols, large data centers using Data Domain’s flagship DD690 system with Veritas NetBackup OpenStorage (OST) by Symantec and 10 Gb Ethernet, can now support accelerated backup throughput of up to 750 MB/s, or 2.7 TB / hour. This is approximately 90% faster than the DD690’s best throughput when introduced in May 2008.”

If we apply the same ratios to the DD880 (dividing by 1.9 to account for the 90% mentioned in the press release and then multiplying by 55% to account for the 45% lower previously measured throughput for non-duplicate data), the published max number of 5.4 TB/hr, or 1,500 MB/s, with OST and 10GbE and 100% duplicate data suggests their max throughput for non NBU/OST applications with 1GbE connectivity and 0% duplicate data would be approximately 435 MB/second.
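For those who want to check NEC's math, here is the arithmetic spelled out in a short sketch; the 1.9 and 0.55 factors come directly from Gideon's email above, and the starting figure is the published DD880 number he cites.

```python
# Reproduce NEC's back-of-the-envelope estimate using the figures quoted above.
published_max_mb_s = 1500  # DD880 published max: 5.4 TB/hr, roughly 1,500 MB/s (OST, 10GbE, 100% duplicate data)

without_ost_10gbe = published_max_mb_s / 1.9   # back out the ~90% gain attributed to OST/10GbE and software updates
non_duplicate_rate = without_ost_10gbe * 0.55  # apply the 45% drop NEC measured with 0% duplicate data

print(f"Estimated max throughput, 1GbE, 0% duplicate data: {non_duplicate_rate:.0f} MB/s")
# Prints roughly 434 MB/s, in line with the ~435 MB/s figure above.
```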

I checked in with Data Domain’s Senior Director of Product Management, Ed Reidenbach, to get his response and to see if Data Domain achieves these benchmarks by sending a copy of data to the DD880 that is already deduplicated on the DD880. (In Ed’s defense, he did not see the entire text of Gideon’s message referenced above. Gideon and I discussed this topic at a high level on the phone before Gideon sent me this additional information via email a few days later.) This was Ed’s response to my initial, shorter inquiry:

“To get this number we (Data Domain) use the harmonic mean of 10 full backups to the DD880. Each Full after the first has some new data added, data changed and data deleted. The average global dedupe rate after doing daily fulls for a relatively long retention period (40 to 45 days), is about 15-18x, not the hundreds x that might be implied.“

The term “harmonic mean” was a new one even for me, so I did a little digging on the Internet to figure out what it meant. I found a layman’s definition on mathforum.org and sent Ed this response in an attempt to clarify his answer:

“To ensure I understand ‘harmonic mean’ correctly, this is how I am interpreting it based upon an explanation I found on the Internet:

If you want the harmonic mean of 10 and 20, you first take 1/10 and 1/20, find their average, which is 3/40, and then take the reciprocal of that which results in 40/3 or 13-1/3.

In algebra, the harmonic mean (h) of two numbers (a and b) is 1 / ((1/a + 1/b) / 2), or in other words 1/h = 1/2 (1/a + 1/b).

In the example you cite, instead of there just being two numbers used to determine the harmonic mean, you use 10 numbers. Is that correct?”

He responded to that email with the following:

“You are correct. The harmonic mean is more conservative. The harmonic mean is always less than the geometric mean, which is always less than the arithmetic mean. There would be a total of 9.“
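For anyone who wants to see the definition in action, here is a minimal sketch that computes a harmonic mean; the per-backup throughput figures are made up purely for illustration and are not Data Domain's test data.

```python
def harmonic_mean(values):
    """Harmonic mean: the reciprocal of the average of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)

# The worked example from the text: the harmonic mean of 10 and 20 is 40/3.
print(harmonic_mean([10, 20]))  # 13.333...

# Hypothetical throughput figures (MB/s) for ten full backups. A few slow runs
# drag the harmonic mean well below the arithmetic mean, which is why it is the
# more conservative way to summarize throughput.
throughput = [1500, 1450, 1400, 900, 850, 800, 750, 700, 650, 600]
print(f"harmonic mean:   {harmonic_mean(throughput):.0f} MB/s")         # ~864 MB/s
print(f"arithmetic mean: {sum(throughput) / len(throughput):.0f} MB/s")  # 960 MB/s
```

In other words, a harmonic mean gives the slow runs more weight than a simple average would, which is consistent with Ed's point that it is the more conservative number to report.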

At this point you can draw your own conclusions as to whether or not the high end of Data Domain’s benchmark numbers are achievable in most real world environments. It seems rather unlikely but then again, I have found very few vendors can ever deliver on their performance claims in real world environments. My thought is that most users are unlikely to achieve either Data Domain’s published high end numbers or bottom out at the low end numbers that NEC published in anything but highly unusual circumstances.

Finally, I wanted to turn my attention to a couple of items that I discussed in my blog last week. I mentioned that the CEO had stepped down at a company and that the company had confirmed and would issue a press release this past week to that effect. Of course, just to prove me wrong, the company did not do that. I checked in with them again this week and it now plans to issue a press release after it hires a new CEO. Once it does, I will mention the company in my blog. In the meantime, the company assures me it is doing fine and that it already has a candidate for the job.

Also, there are a few who privately contacted me and asked me if it was Copan Systems that I was referencing when I mentioned last week that I heard a company was going to be acquired. It is not Copan though I certainly can understand why based upon an article that Chris Mellor at The Register posted this past Wednesday. Unfortunately I have no news to share at this point in regards to that company.

In regards to vendor announcements that caught my eye this week (actually, it went out last week, I just didn’t see it until this week), Xiotech is offering a “Cash for Disk Clunkers” program. Xiotech is giving a $1000 credit for each terabyte of disk traded in towards the purchase of a new Xiotech Emprise storage system.

Finally, one rumor that I did pick up is that EMC may still be on the acquisition trail. Apparently it is on the hunt for an eDiscovery company but no one seems to know why or what company or companies it might be interested in. While I have not verified this with EMC, it wouldn’t surprise me if EMC purchased a company like Kazeon Systems if for no other reason just to further tweak NetApp.

Have a good weekend!

Is Your Data Protection Software FIPS 140-2 Compliant? If You are in Healthcare, It Better Be

The current recession’s wrath has spared few, and technology has seen its hard times just like all industry sectors, but one area that appears poised to be one of technology’s biggest beneficiaries is health care. When the Stimulus bill was passed, President Obama made it a point to bring health care technology front and center by providing $19 billion for the implementation of an electronic medical record (EMR). $19 billion certainly gets companies’ attention, and most are either positioning themselves or renewing their focus on healthcare to glean their share of this substantial investment.

Stimulus money aimed at EMR is a welcome investment for healthcare for varying reasons, but mostly because implementing this type of technology is costly. And it is not only costly up front; it is also difficult to implement. Add in the vagueness of key provisions written into the healthcare regulations of the Stimulus bill, referred to as HITECH (Health Information Technology for Economic and Clinical Health), and suddenly it is understandable why healthcare is cautiously wading into the EMR waters.

HITECH has a number of purposely ambiguous provisions and deadlines that allow the Department of Health and Human Services (HHS) to add clarification at a later date. Arguably the first area of clarification for healthcare is data breach notification.

The HITECH Act took the idea of data breach notification national after it was made popular by state laws such as California’s now infamous SB1386. Although HITECH doesn’t supersede state law if the state law is more restrictive, it does add a reputational risk that usually wasn’t part of state law.

HITECH mandates that when a data breach occurs that exposes over 500 patient records, prominent local media must be notified. Further, data breaches in this category must be posted to the HHS web site. Like SB1386 and other similar state laws, HITECH provides for “safe harbor” from the costs of patient notification as well as the reputational risk if data is protected from unauthorized access using encryption.

What HHS addressed is “data in motion”, or data that is moving through a network, including wireless networks. The approved encryption processes for claiming safe harbor are those that comply with the requirements of Federal Information Processing Standard (FIPS) 140-2. This cryptographic standard ensures that federal guidelines for the effectiveness of the encryption, the strength of the algorithm, and the security of the decryption key are met. HHS does not regard Electronic Protected Health Information (ePHI) as secure if the encryption key or the encrypting process has been breached.

If healthcare is going to invest in a software solution that moves ePHI across a network, then FIPS 140-2 certification becomes important. If ePHI is secured in accordance with the HHS guidance on FIPS 140-2, then unauthorized access to the ePHI does not trigger the HITECH data breach notification requirements. Without this FIPS certification, it will be difficult for a healthcare institution to invest in solutions that move ePHI across networks for backup and recovery, disaster recovery, data archiving or some other purpose.

DCIG has seen a renewed importance placed on the FIPS 140-2 standard, and it only stands to reason that a driving factor is the $19 billion being invested in healthcare technology and the subsequent HHS guidance. Being able to demonstrate that technology products adhere to this important security standard will be increasingly important in the future.

The high cost of data breach notification affects not only healthcare but all industries where instances of unauthorized access to data can be financially devastating. The Ponemon Institute estimates the cost of a data breach in 2008 was $202 per record, so ensuring data is protected while in motion is a critical aspect of any software solution.

The healthcare industry should ascertain whether the software used to move data across networks is FIPS compliant, and if the software solution isn’t compliant, then vendors should be able to provide a roadmap for compliance. If they aren’t FIPS compliant, then data breach risks are significantly higher and safe harbor will not apply in cases of unauthorized data exposure. If vendors are unwilling or unable to show a roadmap for compliance, a very cautious approach should be taken toward their products to minimize the exposure risk to the HITECH notification mandate.

Although encryption is only part of the data security equation, encryption currently offers the best solution for ensuring protection against unauthorized exposure. Seeing companies renew their interest in the FIPS 140-2 standard is most certainly a necessary and welcome step in improving privacy and data security.

If an “Approved” File Exception List for Keyword eDiscovery Searches Exists, I Couldn’t Find It

I was asked an interesting question by Jerome Wendt, DCIG President and Lead Consultant, a while back. Jerome inquired, “Is there a list of approved files organizations can exclude from their list of search candidates that are common files and will never contain any information relevant to a legal search?” This was an interesting question for several reasons, mostly because nothing immediately came to mind.

I then set out to try and determine if such a list existed and have come to the conclusion that if such a list does exist it isn’t widely known and I certainly could not find it. As I continued to try and answer this question, I realized that if I was having this much trouble, chances are that most people will have the same frustration.

One of the most significant areas of eDiscovery is performing a relevant keyword search of data to produce the proper documents as mandated by eDiscovery requests. This collection of ESI (electronically stored information) holds particular importance as produced documents will go through a review process prior to being turned over to opposing counsel. As data continues to grow within organizations, eDiscovery costs continue to rise; therefore it is extremely important to have a robust search that reduces non-relevant information.

Collection of electronic data should be comprehensive. But based on recent eDiscovery failures involving keyword searches, such as the recent case of Active Solutions, LLC and Southern Electronics Supply, Inc. v. Dell, Inc. highlighted by DCIG, this process is difficult to achieve even for large companies with greater resources such as Dell.

Companies must take a consistent approach to meeting these mandates, and a company’s keyword search process should not negatively impact the eDiscovery procedure as it moves forward. Dell’s case also highlighted the importance of a proper search of a company’s e-mail and the courts’ increasingly impatient stance toward inadequate searches of email during eDiscovery. So some areas to consider for keyword searches moving forward are:

Be specific in your search. A well thought out word search reduces the amount of irrelevant “hits.” For example, using short words such as “mark” will produce numerous hits on words such as “trademark,” “benchmark,” etc. If you want to eliminate this, then try more specific word searches or phrases.

If possible, avoid generic industry words. If you are doing a search at a hospital, words such as “patient,” “transfer” and “emergency” will be found in abundance and may hinder rather than help your search efforts.

Search for user-created file types. Search for user-created file types such as “rtf”, “doc”, “xls”, “pdf”, “txt”, “html”, etc. and avoid known files that are not relevant to your search (a simple sketch of this kind of filtering follows below). The National Software Reference Library is a huge repository of known, traceable software and can be very helpful in narrowing search results.
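As a rough illustration of the user-created file type point above, here is a minimal sketch that narrows a collection of candidate files down to common user-created formats before any keyword search runs. The extension list and collection path are assumptions for illustration, not an approved exclusion list of any kind.

```python
from pathlib import Path

# Extensions commonly associated with user-created content. This set is an
# assumption for illustration only, not an "approved" exception list of any kind.
USER_CREATED_TYPES = {".rtf", ".doc", ".xls", ".pdf", ".txt", ".html"}

def candidate_files(root: str):
    """Yield only the files whose extensions suggest user-created content,
    skipping everything else before the keyword search stage."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in USER_CREATED_TYPES:
            yield path

if __name__ == "__main__":
    # Hypothetical folder holding a custodian's collected data.
    for f in candidate_files("/data/custodian_exports"):
        print(f)
```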

Using email archive products such as Estorian’s LookingGlass can provide companies a means to index and search email and get accurate results. By accelerating the email search process and reducing a huge volume of data down to only relevant information, it significantly increases a company’s ability to perform early case assessments of the data, thus saving costs.

The importance of ensuring a proper approach to email keyword searches is clearly demonstrated in Dell’s case, but it applies to almost every aspect of eDiscovery. Email’s continued importance in eDiscovery provides a challenge that companies can overcome only through the use of technology such as LookingGlass and proper keyword search techniques. The courts are consistently showing their unwillingness to accept poor search results, and companies that present these types of unsatisfactory results run the risk of the courts demonstrating their frustration through fines and sanctions on the offending party.

Imation Gets Inside Track on LTO-5; Cloud Storage Comes with Strings Attached

Two topics – really on opposite ends of the storage spectrum – captured my attention this week. The first had to do with an announcement that Imation made this past Wednesday regarding it being the first and only company currently licensed to manufacture LTO-5 tape media. The other had to do with cloud storage and some of the conversations that I continued to have with various providers in terms of how ready (or not ready) cloud storage is for prime time.

First, I briefly spoke yesterday to Will Qualls, Imation’s Product Director for Tape Products, about its recent announcement as well as what Imation is experiencing in terms of momentum around tape in general and LTO technology specifically. The significance of the announcement is that so far Imation is the only storage vendor currently licensed to produce LTO-5 media in anticipation of LTO-5’s early 2010 launch. Since the specifications on how to manufacture LTO-5 tape media are not made public and are only released to manufacturers “approved” by the LTO consortium, this would seem to give Imation a significant head start on its competition. However, Qualls declined to comment as to exactly why Imation is the only manufacturer to receive this approval status.

Qualls and I also discussed the tape market in general and he made a couple of interesting comments in regards to tape’s growth. When asked how deduplication was affecting the tape market overall and LTO specifically, he said the tape industry has definitely seen a reduction in the number of units shipped, but whether that is due to increases in tape capacity or the introduction of deduplicating disk systems into backup processes, he could not really say for sure. However, Imation is definitely seeing tape increasingly used in an archival role, though Imation anticipates that deduplicating storage systems will show up there sooner rather than later.

While LTO has become the predominant tape format over the last few years, LTO is still looking to broaden its reach. Qualls mentioned the IBM 3590 tape format as a specific area where it has seen some success recently in displacing that format with LTO. I then asked him if LTO had any plans to support mainframe connectivity and on this point he hedged a bit. Since IBM is part of the LTO consortium and also controls the mainframe market, he currently is not aware of any plans to attach LTO to the mainframe.

I guess this makes sense from IBM’s perspective. After all, why would IBM want to undercut itself by giving its mainframe users a more economical option for tape backup? Let them use 3590 media and pay more. Sometimes I think it is a miracle the mainframe has survived as long as it has with that mentality.

In any case, back in the open systems side of the world, where it is a rarity for anyone to survive, I had a number of engaging conversations with EMC, NetApp and Mezeo Software, a relative newcomer to the storage scene, on the topic of cloud storage. Again, these interviews were done in conjunction with an article I am writing for SearchStorage.com, so I am only sharing a couple of tidbits of information I gathered that are not germane to the article:

It is important for anyone contemplating the use of cloud storage to understand that it uses object-based storage. This means that to access and retrieve data stored to the cloud, applications have to use web access protocols like SOAP and REST (a brief sketch of what this looks like follows this list). This may preclude some applications from using cloud storage as a storage option in the near term until they add support for these protocols. It also means some storage products will need software overlays in order to support these protocols.

I talked about this last week in a blog where Mezeo Software and Permabit were cooperating in such a fashion. In that vein, this past Wednesday they formalized their relationship with an announcement that they were entering into a partnership. Sources in the industry also tell me that Mezeo is having similar conversations with other competing archiving providers.

The upside of storing all data as objects is that it creates new possibilities for storing and accessing data. Mezeo tells me that this can eliminate current file system constraints, such as hierarchical file systems, and allows for the creation of large fields in which to store metadata. This will eventually allow applications to tag objects in many ways and potentially greatly improve the search capabilities of data archived to the cloud. Search engines can first search the metadata and never have to touch the actual archived data unless it needs to be retrieved to satisfy a retrieval request. Since many cloud service providers charge for bandwidth and access, this stands to significantly lower the ongoing operational costs associated with storing data in the cloud.
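To make the object-access point concrete, here is a minimal sketch of storing and retrieving an object over a REST-style HTTP interface. The endpoint, bucket name, token and metadata header convention are all hypothetical; every provider defines its own variation, which is exactly the lock-in problem described above.

```python
import requests

# Hypothetical endpoint and credentials. Each cloud storage provider defines its
# own URL scheme, authentication and metadata conventions, which is the source
# of the lock-in issue discussed above.
ENDPOINT = "https://storage.example-cloud.com/v1/my-bucket"
HEADERS = {"Authorization": "Bearer EXAMPLE-TOKEN"}

def put_object(name: str, data: bytes, metadata: dict) -> None:
    """Store an object; metadata travels as custom headers alongside the payload."""
    headers = dict(HEADERS)
    headers.update({f"x-meta-{key}": value for key, value in metadata.items()})
    response = requests.put(f"{ENDPOINT}/{name}", data=data, headers=headers)
    response.raise_for_status()

def get_object(name: str) -> bytes:
    """Retrieve an object's payload by name; there is no hierarchical file system to traverse."""
    response = requests.get(f"{ENDPOINT}/{name}", headers=HEADERS)
    response.raise_for_status()
    return response.content

if __name__ == "__main__":
    put_object("archive/report-2009.pdf", b"...", {"department": "finance", "retention": "7y"})
    print(len(get_object("archive/report-2009.pdf")))
```

Note how the metadata rides along with the object itself rather than living in a separate file system structure, which is what makes the metadata-first search approach described above possible.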

Finally, there are a couple of pieces of industry news that I picked up over the last few weeks but that I cannot yet publicly share. First, I was recently informed via an email from the CEO himself that the CEO of a company in this industry has stepped down. The company for which he used to work verbally confirmed his departure when I called them. I promised I would wait to blog about it until there was a press release on its website but, as of last night, there was no announcement.

The other tidbit of news I picked up is that a storage hardware company that has been struggling for some time and looking to sell its technology may have found a buyer. Again, this is not definitive but I may be able to share more next week as to who the company and its buyer are.

Are We Really Ready for Deduplication of Databases on Primary Storage? I Still Say No

In a recent blog post entitled Deduplication of Databases on Primary Storage Just Rubs Me the Wrong Way, I received some great comments, questions and even a ding. Because of the nature and depth of the comments and questions, however, I felt it only appropriate to produce a follow-up to that post and explain a few things brought up in those comments. In particular, I wanted to address questions raised by Matt Simmons and Mike Dutch.

Matt Simmons said: Please correct me if I’m wrong, but isn’t there a performance hit for doing random access on deduplicated data? I don’t see how that couldn’t be the case, since every data request would have to be looked up in a table. Even if the entire data store is stored in cache, that’s still a lot of latency just from lookups. Of course, I could be misinformed as to how dedupe works.

Koopmann says: Intuitively, we all know that adding additional code, instructions, or hardware will add processing overhead. What we all want to know is, “How much?” More often than not, we either ask ourselves that question or look for a bigger system that will hide the performance hit.

Since deduplication on primary storage is semi-new, performance numbers and real-life scenarios are hard to come by, and even a number of deduplication vendors are forthright in saying their deduplication is intended for backup environments. However, I think we can safely say that deduplication on primary storage is much more difficult than deduplication of backup data. So I’d like to leave a few quotes on what some of the experts in the field say:

In regard to deduplication of primary storage, I’d like to direct you to a quote I found from Larry Freeman, Senior Marketing Manager of Storage Efficiency Solutions at NetApp and Dr Dedup himself. On June 18, 2009, Dr Dedup said: “If your application or system is extremely performance-sensitive, don’t run dedupe.” Now I personally wouldn’t take such a hard and fast stance here, and I’d encourage you to read the rest of the thread as there is some good information on how NetApp’s deduplication works along with performance data.

In a Storage Switzerland, LLC Lab Report, Deduplication of Primary Storage, Senior Analyst George Crump states that “As I have stated in other articles, deduplication by itself has limited value on primary storage” and “works against active online and near line data (like databases), where the occurrence of duplicate data is unlikely. This is in contrast to backup data, where the same full backup runs every week and the chances of duplication are fairly high”.

In a SearchStorage.com article, Users turn data reduction focus to primary storage, Senior News Director Dave Raffo asked the question, “Is data deduplication a good fit for primary storage?” Data Domain CEO Frank Slootman said not only “We try to use the term ‘primary storage’ carefully,” but also “If you have data that is really hot, has an extremely high change rate, like transactional data, there’s no sense deduplicating it.”

The problem here isn’t really about how much of a performance impact you will see but more along the lines of, “Can you really measure the performance impact?” and “Does anyone really care or perceive it?” I’ve often found in the database shops where I have worked that very few people know how to relate storage performance to database performance. Just ask your DBAs if they know how many IOPS they are getting and whether they require additional spindles to help improve SQL performance. (More than likely they will roll their eyes.)
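
To make the lookup overhead Matt asks about concrete, here is a minimal sketch of the extra indirection a deduplicated read path carries. The structures are illustrative only; shipping products keep their indexes on disk, in NVRAM or in cache, which is exactly where the latency question gets interesting.

```python
# Minimal sketch of why random reads against deduplicated data cost an extra
# lookup: each logical block maps to a chunk fingerprint, which must be resolved
# through an index before the physical chunk can be read.
import hashlib

chunk_store = {}    # fingerprint -> chunk bytes (physical storage)
block_map = []      # logical block number -> fingerprint (the extra indirection)

def write_block(data: bytes):
    fp = hashlib.sha1(data).hexdigest()
    chunk_store.setdefault(fp, data)      # identical chunks are stored once
    block_map.append(fp)

def read_block(lbn: int) -> bytes:
    fp = block_map[lbn]                   # lookup #1: logical block -> fingerprint
    return chunk_store[fp]                # lookup #2: fingerprint -> chunk data

# A non-deduplicated read is a single seek; here every read pays for the map
# traversal, and in a real array that index may not fit entirely in cache.
for block in (b"A" * 4096, b"B" * 4096, b"A" * 4096):
    write_block(block)
assert read_block(0) == read_block(2)     # the duplicate blocks share one stored chunk
```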

Mike Dutch said: Isn’t database normalization/record linkage essentially concerned with “information deduplication” focused on ensuring “good” query results rather than “data deduplication” which is focused on improving storage and bandwidth consumption? I tend to think of “data deduplication” as being part of the data storage process as it is similar in many ways to “just another file system”… “go here to get the next set of bytes”.

Since most data deduplication procedures store data compressed, it must be uncompressed, and network bandwidth/latency issues may result in undesirable performance impacts for some data. However, this doesn’t mean that you shouldn’t use data dedupe on some databases (for example, copies of a database used for test purposes) or on other types of primary storage data that are not overly performance-sensitive.

Koopmann says: Yes, database normalization is concerned with developing structures that will not only help ensure data quality but also help ensure that an application is able to manipulate data easily. While a byproduct of normalization is the reduction of redundant data, let’s stop right there and also recognize that normalization does not eliminate all redundant data.

Again, in a purist sense, data modeling, when done effectively (or compulsively, depending on your perspective), will push all redundant data down to lookup tables, leaving UIDs as pointers to that lookup data. At this point duplicate data is so completely eliminated, except for the system-generated UIDs, that you would be hard pressed to realize any storage gains through deduplication.

Granted, most databases are not modeled this way, so looking at how your particular database is architected and works is key to breaking the deduplication mystery (with vendor help). For Oracle, database structures (table and index designs) will share the same blocks, but the actual data/records/rows that are stored in the data blocks are likely to be different.

But it does depend on the data model and transaction patterns (INSERTs, UPDATEs, DELETEs). This means deduplication at the file level surely won’t work, and deduplication on individual blocks probably won’t see duplicates. This leaves us with deduplication at the byte level within and across all data blocks in all SEGMENTs of data files. Depending on your dedupe vendor this approach may be beneficial, but you will certainly need to verify the solution.
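
As a rough illustration of why block-level matching tends to come up empty inside database files, consider the sketch below. It is not Oracle-specific; the point is simply that identical row data sitting at different offsets behind different block headers produces different whole-block fingerprints, so duplicates only become visible at a finer granularity.

```python
# Illustrative sketch: identical 64-byte rows, different block headers/offsets.
# Whole-block hashes never match even though the row content repeats 100 times.
import hashlib

BLOCK = 8192

def block_fingerprints(buf: bytes):
    return {hashlib.sha1(buf[i:i + BLOCK]).hexdigest()
            for i in range(0, len(buf), BLOCK)}

row = b"CUSTOMER|ACME CORP|OMAHA|NE|68102".ljust(64, b"\x00")
block_a = (b"\x01" * 128) + row * 100   # same rows behind one block header
block_b = (b"\x02" * 160) + row * 100   # same rows behind a different header

print(block_fingerprints(block_a) & block_fingerprints(block_b))   # empty set: no block-level match
print(block_a.count(row), block_b.count(row))   # yet the same 64-byte row appears 100x in each
```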

Surely you can use deduplication on some databases, as Mike suggests, and hit it square on the nose. The key term used here by Mike is “copies of”. Many databases have copies of data that may be used for backup, test, historical, or query-offload purposes. If your database has many of these “copies of” data, you could look at the benefits of deduplication, or perhaps consider non-duplicative “thin” snapshots (see the sketch below). Non-duplicative snapshots refer to the fact that changed data is never duplicated across a group of snapshots, versus copying changed data across all snapshot copies. This particular technology is well-proven in primary storage applications.
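
Here is a rough copy-on-write sketch of the non-duplicative snapshot idea; it is a conceptual illustration, not 3PAR’s or any other vendor’s actual implementation. When a block changes, the prior version is preserved once and shared by every snapshot that references it.

```python
# Conceptual copy-on-write model: a changed block's prior version is preserved
# exactly once and shared by all open snapshots, never copied into each one.
class ThinVolume:
    def __init__(self):
        self.blocks = {}          # live volume: block number -> data
        self.snapshots = []       # each snapshot: block number -> preserved data

    def snapshot(self):
        self.snapshots.append({})         # starts empty: shares everything with the live volume
        return len(self.snapshots) - 1

    def write(self, bno, data):
        old = self.blocks.get(bno)
        if old is not None:
            for snap in self.snapshots:
                snap.setdefault(bno, old)   # all open snapshots share one preserved copy
        self.blocks[bno] = data

    def read_snapshot(self, sid, bno):
        return self.snapshots[sid].get(bno, self.blocks.get(bno))

vol = ThinVolume()
vol.write(0, b"v1")
s1 = vol.snapshot()
s2 = vol.snapshot()
vol.write(0, b"v2")
assert vol.read_snapshot(s1, 0) is vol.read_snapshot(s2, 0)   # one preserved copy, two references
```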

Again, I have nothing against deduplication when used appropriately and other factors are uncontrollable. But in a properly architected database I am still skeptical about the fit. Databases are just too dynamic, with temporary sort, rollback, and redo areas and high transaction rates, which makes me question what there could possibly be to dedupe in the first place.

Secondly, many database practitioners are already so far removed from the performance implications that revolve around storage that they would certainly not be able to handle an additional layer of abstraction. So for now, I hold steady on my statement that a more viable way to reduce and stabilize storage acquisitions within database environments is to deploy thin provisioning.

Using proven and industry-leading thin provisioning and non-duplicative thin copy technologies from a vendor such as 3PAR allows databases to allocate just-enough and just-in-time storage, relieving IT from having to watch capacity and then add or remove storage. Thin provisioning and thin copy are data reduction technologies developed for primary storage applications, and therefore address the desire for capacity efficiency without the performance impact and deployment mystery associated with today’s storage deduplication technology.

Nexsan Dedupe SG Gives Resellers Much Needed New Choice in Mid-Market Deduplication Systems

Resellers that currently offer Data Domain but are affiliated with EMC’s competitors are in somewhat of a pickle right now. The data deduplication market is hot and getting hotter so resellers better have something in their solution portfolio to offer their customers. Unfortunately EMC has not yet publicly stated its intentions as to how it intends to treat current Data Domain partners and the last thing any reseller wants right now is to walk into a customer account without a deduplication solution.

Resellers that are already part of EMC’s channel are probably in better shape than those that are not. As an independent hardware provider, Data Domain’s product could be more readily resold by any reseller regardless of what other products it offered. Now that EMC has direct control of Data Domain, the future of Data Domain’s channel program is hazy at best.

As such, it is only logical to assume that EMC will bend towards favoring current and prospective EMC channel partners in whatever final arrangement is announced. So for Data Domain resellers that are aligned with EMC’s competitors, now is the time to explore other available options before EMC cuts them off and they no longer have a deduplication solution to sell to their customers.

In assessing what options are available to them, they should look for some of the same characteristics in a new solution that originally attracted them to Data Domain in the first place. These features include:

Turnkey deployments in customer environments. The appeal of data deduplication systems is that resellers can plug these systems directly into their customers’ existing networks. The systems should look and act like file servers so setup times are minimal and backup software knows how to manage them.

Mature data deduplication software. Neither you nor your customers are going to want a “new” data deduplication solution that is untested in the field. This prerequisite shortens the list of companies you can reasonably expect to look at as an alternative to Data Domain.

100% channel focused. Nothing is more frustrating than to go through the entire sales process and get close to closing the deal with a customer, only to have the manufacturer of the product bypass you and sell directly to the customer. Equally frustrating, you do not want a product that you are reselling to also be available from an online website at a cut-rate price.

A single point of contact. A key for any successful reseller is to avoid getting trapped in support issues. Selling a data deduplication solution that includes products from multiple vendors can put you on the spot to support it. To avoid that, you need a solution from a provider that can also meet your support needs.

It is for these four reasons that yesterday’s announcement of the new Nexsan DeDupe SG solution should particularly appeal to current resellers. As a company, Nexsan already has a solid reputation as a storage system provider, is experiencing steady growth and was on track to do an IPO last year before last fall’s stock market decline. But maybe more attractive to resellers, Nexsan sells 100% of its product through the channel.

The addition of DeDupe SG to Nexsan’s storage portfolio should also make Nexsan more interesting from a reseller perspective. Right now the sales cycles for selling deduplication systems are likely much shorter (and potentially more profitable) than those for production storage systems. As such, resellers can now lead with the DeDupe SG in customer accounts with a higher expectation of closing a sale in a shorter time period.

In terms of bringing the DeDupe SG to market, Nexsan smartly partnered with FalconStor. This negated the need for Nexsan to develop its own data deduplication software by leveraging software that FalconStor has sold for some time. While Nexsan is only including FalconStor’s file server version of the software on the current DeDupe SG release, there is nothing to prevent it from bundling other FalconStor software at some point in the future.

Maybe the most important move Nexsan made was to bundle support for the DeDupe SG’s hardware and software. So if there is an issue with the appliance, all support calls go first to Nexsan, which then manages the support of the issue regardless of whether it is the hardware or the software.

Nexsan also brings some features on the hardware side to the table that help differentiate it from being just a “me-too” storage solution on the backend. The “SG” in “DeDupe SG” stands for “Speed with Green”. While other hardware solutions can also arguably deliver speed, the “green” aspect is what makes Nexsan’s part of this solution stand out.

High density and the AutoMAID feature are included on all of Nexsan’s storage systems. The high density feature keeps data center footprints to a minimum from day one without compromising the reliability or integrity of the deduplicated data stored on them.

The AutoMAID feature comes into play since backups occur primarily at night and disk drives may be needed for only 8 hours a day or less. By turning on AutoMAID, drives can be spun down during off hours. This reduces power and cooling costs while extending the life of the disk drives in the Nexsan system.
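
A quick back-of-the-envelope calculation shows why this matters. All of the figures below (drive count, per-drive wattage, backup window and electricity rate) are assumptions for illustration only, not published Nexsan specifications.

```python
# Back-of-the-envelope sketch: if backup targets are busy only a few hours a day,
# spun-down drives save power the rest of the time. All figures are assumed.
DRIVES = 42                                 # drives in one dense enclosure (assumed)
ACTIVE_W, SPUN_DOWN_W = 12.0, 2.0           # watts per drive, spinning vs. spun down (assumed)
ACTIVE_HOURS = 8                            # nightly backup window
RATE = 0.10                                 # dollars per kWh (assumed)

always_on_kwh = DRIVES * ACTIVE_W * 24 * 365 / 1000
maid_kwh = DRIVES * (ACTIVE_W * ACTIVE_HOURS +
                     SPUN_DOWN_W * (24 - ACTIVE_HOURS)) * 365 / 1000

print(f"Always spinning : {always_on_kwh:,.0f} kWh/yr  (${always_on_kwh * RATE:,.0f})")
print(f"With spin-down  : {maid_kwh:,.0f} kWh/yr  (${maid_kwh * RATE:,.0f})")
# Cooling savings come on top of this, since less power in means less heat out.
```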

Data Domain’s recent acquisition by EMC has left a void that a number of resellers are anxious to fill in their product portfolios. However, rushing into a decision or making the wrong decision in regard to data deduplication is something they cannot afford either.

In that sense, the new co-branded DeDupe SG from Nexsan and FalconStor is clearly a “right time, right place” solution. Nexsan delivers a solution that has all of the characteristics of a “best-of-breed” solution without some of the typical drawbacks that accompany them. By bundling hardware, software and support and configuring the software in such a way that it meets the needs of the mid-market, the Nexsan DeDupe SG should be a logical choice for any reseller to consider on its short list as a data deduplication offering.

Admitted Bad Habits around Email Reinforce Why Companies Need to be Proactive in their Email Management Policies

Every now and then a study comes along in IT that makes you wonder if the public will ever listen to security alert messages, as some of these studies yield results that quite literally make you want to throw your hands up in frustration. A case in point is the recently released study by the Messaging Anti-Abuse Working Group (MAAWG) entitled “A Look at Consumers’ Awareness of Email Security and Practices.” However, it is the report’s subtitle, “Of Course, I Never Reply to Spam – Except Sometimes,” that gets to the heart of the matter and frustrates me, as it shows that email users do understand the risks of spam yet still click on the messages.

This report provided some interesting insight that reveals how pervasive email usage is in corporations and more importantly how users view email:

98% had both work and home email addresses
The 24-54 age group is more likely to access email at work than at home
The most important email function as identified by users was email from friends and family
1 in 6 people surveyed admitted to clicking on spam

After reading through these statistics (especially the last one) it became clear that even with all of the education and security alerts around email, current email usage policies coupled with virus and spam controls are not enough. Users continue to engage in unsafe email behavior and since most users rightly or wrongly view their work email address as their primary email address, the risks that email misuse presents to organizations are extensive.

Throughout my IT career I have seen the rise of anti-virus software and, more recently, anti-spam solutions aimed at the enterprise. For the most part organizations now see the risk/reward of installing these types of products, but the effectiveness of the solutions and the risks of not implementing the right ones vary widely from company to company.

This study highlights some of these risks which include:

What attachments are being sent or received? What are the chances that trade secrets or attachments with confidential or proprietary information are being sent from the company without its knowledge or permission?

Are users using email outside established policy, thereby putting your company at risk? The most important function of email for users, according to this survey, is communicating with friends and family. There is no problem with that, but how confident can organizations really be that their employees are using their email systems solely for work purposes? Further, are the emails being sent and received in violation of policy and do they, in a worst case scenario, present a legal risk to the organization?

Is a large volume of email being sent from a specific user? Can you identify a rapid acceleration of email being sent from a specific user? This type of email velocity could point to a compromised PC being used as a “bot”.

These risks beg the question “What else is going on?” as there are undoubtedly other risks to which companies are unknowingly exposed that are not being taken into consideration in this study. Statistics like these scream out the obvious: Companies need to take control of their email. To do so companies need tools to ensure their employees do not expose them to unnecessary risks.

New technologies now mitigate some of these risks. Products such as Estorian’s LookingGlass alleviate the risks outlined above by fully indexing incoming and outgoing emails and their attachments. In so doing, LookingGlass provides companies the assurance that all emails their employees send and receive adhere to existing policies in real time and, if they do not, that they are blocked and alerts are generated.

These notifications warn when a policy has been violated and send the information to the individuals responsible for taking action, such as HR or an internal security team. Email analytics features included in LookingGlass can track email by user, hour, day and beyond, which provides insight into items such as the number of emails sent and received. This can provide important quantitative information on email velocity and identify a small problem before it becomes a big one.
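
As a simple illustration of the velocity idea, the sketch below flags a sender whose hourly volume jumps far above that user’s own recent baseline. The threshold and data structures are my own illustrative choices, not LookingGlass internals.

```python
# Illustrative per-user email-velocity check: alert when the current hour's send
# count jumps far above the user's own recent baseline, a pattern consistent with
# a compromised PC acting as a spam bot.
from collections import defaultdict, deque
from statistics import mean

WINDOW_HOURS = 24 * 7          # one week of hourly counts as the baseline
SPIKE_FACTOR = 10              # alert when the current hour >= 10x the baseline average

history = defaultdict(lambda: deque(maxlen=WINDOW_HOURS))   # user -> hourly send counts

def record_hour(user: str, sent_this_hour: int, alert):
    counts = history[user]
    if counts and sent_this_hour >= SPIKE_FACTOR * max(mean(counts), 1):
        alert(user, sent_this_hour, mean(counts))
    counts.append(sent_this_hour)

def notify(user, now, baseline):
    print(f"ALERT: {user} sent {now} emails this hour (baseline ~{baseline:.0f})")

# Example: a user who normally sends ~20 emails an hour suddenly sends 900.
for hour in range(48):
    record_hour("jdoe", 20, notify)
record_hour("jdoe", 900, notify)
```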

The MAAWG study highlighted that employees do not always differentiate between home and work email. Because of this, organizations need some means of enforcing email policies to deter employees from jeopardizing the company through ill-advised email behavior. Taking control of email requires having tools that proactively monitor the information contained in emails so organizations are protected from the potentially abusive or dangerous content that is shared and sent in emails. Products such as Estorian LookingGlass provide a level of control that anti-spam, anti-virus and even other email management products still lack.

It is Too Soon to Proclaim Cloud Storage Ready for the Main Stream

Weekly I try to do a recap of what was on my mind during the past week, and this week cloud storage garnered my attention. Deduplication may be the BIG thing in storage right now, but cloud storage is rapidly gaining momentum and looks to be the next big thing in storage sooner rather than later. Recent blog entries by individuals like George Crump (Byte & Switch) and Stephen Foskett have addressed this topic, which indicates it is gaining mindshare among analysts, editors and journalists. But when I speak to cloud storage providers that are virtualizing cloud storage offerings from other providers, it tells me that cloud storage has a ways to go before it can be officially proclaimed ready for the main stream.

The reason for my heightened interest in cloud storage was a current assignment I am working on for SearchStorage.com that is tentatively slated for publication sometime in August or September. That article will provide some tips for users that are considering moving and storing their archival data in a public or private cloud, so my thoughts and suggestions on that topic will be reserved for the article. However, my research for that piece took me down an unexpected path as to how cloud storage is still maturing.

This last week I was speaking to Mike Ivanov, Permabit’s VP of Marketing, about this topic and he mentioned how Permabit was working with a company called Mezeo that just recently came out of stealth mode. Mezeo was cooperating with Permabit so Permabit could compete more effectively among public cloud storage providers like telcos and other managed service providers (MSPs).

However, that statement confused me. I’ve covered Permabit in many other blogs, so I know it can scale into the petabytes, uses a NAS interface and stores all files as objects. So I was unclear as to what Permabit hoped to gain by partnering with Mezeo.

To answer that question, I spoke to Mezeo to get an understanding of how its technology works. In brief, Mezeo produces software intended for telcos and MSPs that wish to provide their own public cloud storage offering so they can compete against the likes of Nirvanix and other established public cloud storage providers. To do this, Mezeo leverages an object-oriented file system that is specifically tuned for delivering secure file storage over the Internet using HTTP and REST APIs. (These are APIs that cloud storage products intended for private and even public storage clouds may not support.)

While that was interesting, it still did not explain how Mezeo was leveraging Permabit’s technology on the back end. Mezeo went on to explain that Permabit’s clustering architecture provides these cloud storage providers with improved levels of availability and reliability versus just using commodity storage, without needing to spend lots of money on Tier 1 or Tier 2 storage systems.

OK, that made sense, but it still failed to explain how Mezeo virtualized Permabit’s cloud storage solution. So I asked if it works in a manner similar to F5’s Acopia offering. The answer is yes and no. Yes, in the sense that Mezeo’s cloud storage software virtualizes the Permabit CIFS/NFS file system interface, which Mezeo then in turn presents to the Internet for secure file access and sharing using HTTP.

However, it does not do true file virtualization in the sense that Mezeo will not discover any existing objects or metadata stored on the Permabit cloud storage offering (or any other cloud storage offering). Rather, Mezeo assumes all storage it virtualizes is an empty storage system ready to be populated with the data that Mezeo sends to it. If data is already stored on an existing cloud storage product and an organization wishes to virtualize that solution using Mezeo’s software, it will first have to migrate the data into the Mezeo solution so Mezeo is aware of it and can manage it.
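
In practice that migration step amounts to re-ingesting existing files through the virtualization layer so it can build its own catalog and metadata for them. The sketch below shows the general shape of such a pass; the mount point, ingestion endpoint and upload call are hypothetical, not Mezeo’s actual API.

```python
# Rough sketch of re-ingesting existing data through a cloud storage front end so
# that front end catalogs it. The endpoint and upload call are hypothetical.
import os
import requests

SOURCE_MOUNT = "/mnt/existing_cifs_share"          # data the front end does not yet know about
INGEST_URL = "https://gateway.example.com/v1/files"  # hypothetical ingestion endpoint
AUTH = {"Authorization": "Bearer <token>"}

def migrate(root: str):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as fh:
                # Re-uploading through the front end is what gets the object cataloged;
                # copying it straight onto the back-end storage would leave it invisible.
                requests.post(INGEST_URL, headers=AUTH, files={"file": (rel, fh)})

migrate(SOURCE_MOUNT)
```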

So what does this tell me about cloud storage? Two things. First, cloud storage is maturing because deficiencies and shortcomings are being found in current products. As these problems are identified, new solutions are being introduced that compensate for them and even tip their hat to them, as Mezeo does by seeking to complement rather than compete with current cloud storage offerings.

Second, it demonstrates that cloud storage is moving from bleeding edge to leading edge. Companies with tech savvy people can probably now successfully implement the current generation of cloud storage technology. But as this mish-mash of technologies indicates, it is certainly not ready for the main stream as most organizations need turnkey solutions that they can rely upon the storage vendor to deliver and support.

A Better Archive and Backup Deduplication Fit for VAR Solution Portfolios

Most VARs who have had success selling Data Domain systems over the last couple of years are feeling a bit uncomfortable right now: EMC has announced its official take-over of Data Domain. VARs have made a good living on Data Domain, contributing to Data Domain’s success as having one of the best-selling, fastest-growing deduplication storage systems in the market. VARs are now feeling vulnerable to EMC’s goodwill – or probable lack thereof.

EMC’s goodwill is a commodity that is traditionally in short supply when it comes to its channel partners. As a result, more VARs believe they need an alternative solution to offer. Long perceived as a ‘me-too’ vendor in the backup deduplication space, FalconStor appears to be in the right place with its technology at the right time to capitalize on the gap in the channel market that EMC’s recent acquisition of Data Domain creates.

First, some background on the current situation with EMC and Data Domain: EMC announced on Monday, July 20, that it has effective control of over 80% of Data Domain stock. So with EMC now officially taking control of Data Domain, EMC started talking publicly last week about what it intends to do with the company. Its press release, press briefings, and blogs are full of its plans for Frank Slootman and team. Frank and Joe Tucci also published an open letter to customers and partners, which doesn’t say much beyond “stay tuned as we progress.” But anyone who reads these materials closely will note that one topic is notably missing: EMC’s intentions for Data Domain’s VARs.

So if history is any indicator, the likely outcome of this acquisition is that EMC will begin to escort Data Domain into its large enterprise accounts. The joint FAQ seems to indicate that this will happen. Meanwhile, VARs are on their own to decide what they will offer as a backup and archive deduplication solution for their clients.

Now, some details on why FalconStor might actually turn out to be an even BETTER fit for VAR solution portfolios than Data Domain: FalconStor has seen the light that Data Domain shone into the industry and is packaging its software deduplication technology into standard product configurations that are simpler and easier to deploy because they come prepackaged for servers and storage.

The FalconStor TOTALLY Open software offers more features and better scalability than Data Domain, but it has been harder to purchase and deploy than the Data Domain appliance approach. By combining FalconStor software with select hardware platforms – hopefully those which the VARs find most attractive to offer – FalconStor should be able to offer the best of all worlds.

Certainly, FalconStor’s recent announced partnership with immixGroup to add its TOTALLY Open solutions to the partner’s GSA schedule should encourage VARs selling to federal agencies, and to state and local government. The solutions added include FalconStor VTL with deduplication, CDP, and more.

Although packaged for hardware, FalconStor still offers customers the ability to purchase the servers and storage of their choice. This choice means that VARs need never again be vulnerable to the fate of a single storage vendor’s solution. FalconStor also works with virtually any backup or archive software, enabling VARs to position and sell any data management software with virtually any storage device in their portfolio. For VARs, this approach offers the ultimate mix and match of components from their portfolios, with at least the promise of better opportunities to maximize profit. VARs are also key to FalconStor’s strategy of simplification for their customers, as only the VARs can make the combination of the hardware and software appear truly seamless for their customers.

Talk About Remote Backups – A Perspective from Papua New Guinea

It’s easy for IT folks in the US to think we have problems. Whether it is worrying about our jobs, how we are going to stretch the budget to get everything done that needs to be done, or trying to decide if and when to innovate, our problems can pale in comparison to the stresses that individuals working abroad can face. This is especially applicable to individuals working for missionary organizations in remote parts of the world such as Papua New Guinea, who have huge technology needs but do not really have any viable, affordable data protection options available to them.

Recently I had the opportunity to meet one of these individuals who works in one of the most remote regions of the world: Papua New Guinea. This individual is Bob Lee, who works as a missionary in Papua New Guinea for Wycliffe Bible Translators and has recently retired to Omaha, NE. While Bob still goes to Papua New Guinea occasionally (and is there as I write this), he is now in Omaha more often than not, and I recently had a chance to meet him and his wife over dinner and talk about some of the challenges that they face in their ministry.

Papua New Guinea is unique in that it has one of the most diverse sets of languages of any country in the world. In a country that is a little larger than Texas but has only about 1/6 of Texas’s population (4.7 million versus over 30 million), it boasts over 800 known languages. Bob’s assignment while in Papua New Guinea was to translate the Bible into the native languages of these different people groups. Obviously this is in itself no small feat, as he and his staff need to learn each language and then translate the Bible into it: in some cases into the written word and, in others, into the spoken word, since some of the people groups they were trying to reach have no written language.

However an equally great challenge for Bob and Wycliffe Bible Translators is protecting the data both as the translation of the Bible progresses and then once it is complete. This is problematic in a couple of ways. First, Bob and his staff are working in remote areas of the island (according to Bob, it is a two hour drive on less than ideal roads just to get to where he resides in Papua New Guinea) so backing up the data is in itself a challenge no matter what method he uses. Power is unreliable, technology supplies are hard to come by and he has to manage and track all of the backups himself.

This leads to the second problem: even when he does complete a backup, he still needs to manage this data and get it offsite. Many of the Bible translations that Bob and his staff complete are irreplaceable and impossible to recreate. Since everything they do is for the most part stored digitally, if the data is ever lost or compromised it could quite possibly never be recreated by anyone. But even if he does get the backup shipped offsite, Wycliffe’s missionaries largely work independently. So unless someone at Wycliffe knows how the backup was done, what data is on it and how to recover it, there is a good possibility the data could never be recovered.

The purpose of this blog entry is not to make you feel sorry for Bob Lee or to ask for money to help Wycliffe in its Bible translation efforts (though you are certainly welcome to do so if you wish). Bob understood what he was getting himself into, has been doing it for years, and I’m sure he and Wycliffe will carry on. Rather, realize that there are individuals and organizations out there that really have no good options available to them at all, and that we should be thankful for the technologies to which we do have access, imperfect though they may be at times.