Demystifying Clouds
force ma·jeure \ˌfȯrs-mä-ˈzhər, -mə-\ noun
Etymology: French, superior force
Date: 1883
1 : superior or irresistible force
2 : an event or effect that cannot be reasonably anticipated or controlled — compare act of god

I'll admit it: I am caught up in the cloud hype. The caveat, however, is that I truly believe that this is disruptive technology. In this article, I am going to try to demystify some of the hype around utility cloud computing and focus on the companies that are providing cloud solutions and the technology components they are using. By no means am I professing to be an expert on this subject. My only intent is to render what I have learned thus far in my quest to understand cloud computing.

The Myths

Cloud computing will eliminate the need for IT personnel. Using my 30 years of experience in IT as empirical proof, I am going to go out on a limb and suggest that this is a false prophecy. One of my first big projects in IT was in the 1980s, when I was tasked to implement “Computer Automated Operations.” Everyone was certain that all computer operators would lose their jobs. In fact, one company I talked to said that its operators were thinking of starting a union to prevent automated operations. The fact was that no one really lost his or her job. The good computer operators became analysts, and the bad ones became tape operators.

There will only be five supercomputer-utility-like companies in the future. Again, I will rely on empirical data. I have been buying automobiles for as long as I have been grinding IT, and all one has to do is look at the automotive industry's history as a template to falsify this myth. Some clever person will always be in a back room somewhere with an idea for doing it better, faster, cheaper, and cleaner. In all likelihood, there will be a smaller number of mega-centers, but they will most likely be joined by a massive eco-grid of small-to-medium players interconnecting various cloud services.

The Facts

Since cloud computing is in a definite hype cycle, everyone is trying to catch the wave (myself included). Therefore, a lot of things you see will carry cloud annotations. Why not? When something is not clearly defined and mostly misunderstood, it becomes one of god's great gifts to marketers. I remember that, in the early days of IBM SOA talk, IBM was calling everything Tivoli an SOA. So I did a presentation at a Tivoli conference called “Explaining the 'S' in SOA and BSM.” Unfortunately, one of IBM's lead SOA architects, not Tivoli and not a marketer, was in my presentation and tore me a new one. I was playing their game, but I forgot that it was “Their Game.” Therefore, in this article I will try to minimize the hype and lay down some markers on the current variations of all things considered clouds.

Level 0

As flour is to a cookie, virtualization is to a cloud. People are always asking me (their first mistake) what the difference is between clouds and the “Grid” hype of the 1990s. My pat answer is “virtualization.” Virtualization is the secret sauce of a cloud. Like I said earlier, I am by no means an expert on cloud computing, but every cloud system that I have researched includes some form of a hypervisor. IMHO, virtualization is the differentiator between the old “Grid” computing and the new “Cloud” computing. Therefore, my “Level 0” definition for cloud providers is anyone who is piggybacking, intentionally or unintentionally, on cloud computing by means of virtualization.
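To make “some form of a hypervisor” a bit more concrete, here is a minimal sketch, assuming a Xen host with the libvirt Python bindings installed, of how a provider's tooling might take inventory of the guests a hypervisor is running. The connection URI is an illustrative assumption; a KVM shop would use a different one.

    # A minimal guest-inventory sketch, assuming the libvirt Python
    # bindings on a Xen host; the "xen:///" URI is an illustrative
    # assumption (a KVM host would use "qemu:///system" instead).
    import libvirt

    conn = libvirt.openReadOnly("xen:///")  # read-only is enough to inspect guests
    if conn is None:
        raise SystemExit("could not connect to the hypervisor")

    for dom_id in conn.listDomainsID():     # numeric IDs of the running domains
        dom = conn.lookupByID(dom_id)
        state, max_mem, mem, vcpus, cpu_time = dom.info()
        print("%-24s %2d vCPUs %10d KiB" % (dom.name(), vcpus, mem))

    conn.close()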
The first company that comes to mind is Rackspace, which recently announced that it is going to add hosted virtual servers to its service offering. In fact, its new offering will allow a company to move its current in-house VMware servers to a Rackspace glass house. A number of small players are producing some rain in this space; a quick search on Google will yield plans as low as $7 per month for Xen VPS hosting. It's only a matter of time before cloned Amazon EC2 providers start pronouncing themselves “Cloud Computing” because they host Xen services in their own glass houses. These services will all be terrific offerings and will probably reduce costs, but they will not quite be clouds, leaving them, alas, at “Level 0.”

Level 1

My definition of a “Level 1” cloud player is what I call a niche player, and “Level 1” actually has several sub-categories.

Service Providers

Level 1 service provider offerings are usually on-ramp implementations relying on Level 2 or Level 3 backbone providers. For example, a company called RightScale un-mangles Amazon's EC2 and S3 APIs and provides a dashboard and front-end hosting service for the Amazon Web Services (AWS) offering (i.e., EC2 and S3). AWS is what I consider a “Level 2” offering, which I will discuss later in this article.

Service Hybrids

Service hybrids are players like ENKI and Enomoly. Both companies offer services around backbone cloud providers in the form of services and software. In fact, I was baptized in the clouds by Reuven Cohen, the founder of Enomoly, on a plane ride from Austin to Chicago. I sat next to Reuven, and he was gracious enough to school me on Amazon's AWS. Enomoly offers services and software around Amazon's AWS, and they are clearly the go-to guys for EC2/S3. ENKI seems to be Enomoly's equivalent but with the 3Tera/AppLogic application. 3Tera is what I consider a “Level 3” technology, which I discuss below.

Pure Play Application Specific

This is where I will admit it gets a little “cloudy.” Seriously, companies such as Box.Net and EMC's latest implementation with Mozy are appearing as SaaS storage plays and piling onto the cloud wagon. I am almost certain that companies like SalesForce.com will be confused with, or will legitimately become, cloud plays. Probably the best example of a “Level 1 Pure Play” is EnterpriseDB's latest announcement of running its implementation of PostgreSQL on Amazon's EC2. There are also a few rumors of services that are trying to run MySQL on EC2, but most experts agree that this is a challenge on the EC2/S3 architecture. It will be interesting to see how Sun's cloud formations flow in regard to its recent acquisition of MySQL.

Pure Play Technology

Whenever you hear the terms MapReduce, Hadoop, and Google File System in regard to cloud computing, they primarily refer to “Cloud Storage” and the processing of large data sets. Cloud Storage relies on an array of virtual servers and programming techniques based on parallel computing. If things like “S(P) = P − α * (P − 1)” (Gustafson's law for scaled speedup) get you excited, then I suggest that you have a party here. Otherwise, I am not going anywhere near there. I will, however, try to take a crack at explaining MapReduce, Hadoop, and the Google File System. It is no wonder that the boys at Google started all of this back in 2004 with a paper describing a programming model called MapReduce, which is used for processing and generating large data sets across a number of distributed systems. In simplistic terms, MapReduce is made up of two functions: one maps key/value pairs, and another reduces and generates output values for each key. In the original Google paper, “MapReduce: Simplified Data Processing on Large Clusters,” a simple example of using grep to match URLs and output URL counts is used.
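To make the two-phase idea concrete, here is a toy, single-process sketch in Python of that URL-counting example. The function names, the regular expression, and the in-memory shuffle are my own illustrative assumptions; a real MapReduce runtime distributes these phases across many machines.

    # A toy, single-process sketch of the MapReduce idea: a map function
    # emits (key, value) pairs, a shuffle groups the values by key, and a
    # reduce function folds each group into an output value. Names here
    # are illustrative; real MapReduce scatters these steps over a cluster.
    import re
    from collections import defaultdict

    URL_RE = re.compile(r"https?://\S+")

    def map_phase(line):
        """Emit (url, 1) for every URL found in a line (the 'grep' step)."""
        for url in URL_RE.findall(line):
            yield url, 1

    def reduce_phase(url, counts):
        """Sum the occurrence counts for one URL."""
        return url, sum(counts)

    def mapreduce(lines):
        groups = defaultdict(list)
        for line in lines:                  # map
            for key, value in map_phase(line):
                groups[key].append(value)   # shuffle: group values by key
        return [reduce_phase(k, v) for k, v in groups.items()]  # reduce

    if __name__ == "__main__":
        logs = ["GET http://example.com/a", "GET http://example.com/a",
                "GET http://example.org/b"]
        for url, count in mapreduce(logs):
            print(url, count)

The thing to notice is that map_phase and reduce_phase never see more than one record or one key at a time, which is exactly what lets a framework scatter them across thousands of machines.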
Those Google boys and girls have come a long way since 2004, and certainly it is all much more complicated than I have described, but the real value of MapReduce is its ability to break code up into many small, distributed computations. Next in this little historical adventure, a gentleman named Doug Cutting implemented MapReduce in the Apache Lucene project, which later evolved into the now commonly known Hadoop. Hadoop is an open-source, Java-based framework that implements MapReduce using a special file system called the Hadoop Distributed File System (HDFS). The relationship between HDFS and the Google File System (GFS) is not exactly clear, but I do know that HDFS is open and that it is based on the concepts of GFS, which is proprietary and most likely very specific to Google's voracious appetite for crunching data. The bottom line is that a technology like Hadoop and all its sub-components allows IT operations to process millions of bytes of data per day (only kidding, I couldn't resist a quick Dr. Evil joke here: “Dr. Evil: I demand the sum... OF 1 MILLION DOLLARS”). Actually, what I meant to say was quintillions of bytes of data per day. Most of the experts with whom I have talked say that Hadoop is really only a technology that companies like Google and Yahoo can use. I found, however, a very recent blog post about how a Rackspace customer is using Hadoop to offer special services to its customers by processing massive amounts of mail server logs, reducing the wall time of its service analytics. Now you're talking my language.

Level 2

Level 2 cloud providers are basically the backbone providers that the Level 1 players build on. Amazon's AWS Elastic Compute Cloud (EC2) and Simple Storage Service (S3) are the leaders in this space at this time. My definition of a “Level 2” provider is a backbone hosting service that runs virtual images in a cloud of distributed computers.
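To give a feel for what these Level 2 backbones actually expose, and for what Level 1 front-ends like RightScale un-mangle, here is a minimal sketch of fetching an object over S3's original REST API. The bucket, key, and credentials are placeholders, and the HMAC-SHA1 signing shown is the classic S3 scheme of this era, not a current SDK call.

    # A minimal sketch of S3's classic REST authentication: HMAC-SHA1 over
    # a canonical string, sent in the Authorization header. The bucket,
    # key, and credentials below are placeholders for illustration only.
    import base64, hashlib, hmac
    import http.client
    from email.utils import formatdate

    ACCESS_KEY = "AKIAEXAMPLE"             # placeholder access key id
    SECRET_KEY = b"example-secret-key"     # placeholder secret key
    BUCKET, KEY = "my-bucket", "logs/sample.txt"  # hypothetical names

    date = formatdate(usegmt=True)         # RFC 1123 date header S3 expects
    # Canonical string for a simple GET: verb, MD5, type, date, resource.
    string_to_sign = "GET\n\n\n%s\n/%s/%s" % (date, BUCKET, KEY)
    signature = base64.b64encode(
        hmac.new(SECRET_KEY, string_to_sign.encode("utf-8"),
                 hashlib.sha1).digest()).decode("ascii")

    conn = http.client.HTTPSConnection("s3.amazonaws.com")
    conn.request("GET", "/%s/%s" % (BUCKET, KEY), headers={
        "Date": date,
        "Authorization": "AWS %s:%s" % (ACCESS_KEY, signature),
    })
    resp = conn.getresponse()
    print(resp.status, resp.reason)

Stitching together dozens of raw calls like this one, plus the EC2 equivalents, is exactly the tedium that a Level 1 dashboard sells back to you.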