HIGH AVAILABILITY

A Degree Thesis Submitted to the Faculty of the Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona Universitat Politècnica de Catalunya by Marius Oltean

In partial fulfilment of the requirements for the degree in TELEMATICS ENGINEERING

Advisor: José Luis Muñoz Tapia

Barcelona, October 2015

Abstract

Over the last few years, enterprise storage demand has grown exponentially, at a rate of 40 to 60 percent annually, and many companies are doubling their data footprint each year, reaching an estimated total of 8,591 exabytes of digital data worldwide by the end of 2014.

Traditional storage systems based on RAID or LVM, and connected to servers via SAN or NAS, have solved storage problems since the '80s, but when working with large amounts of data these monolithic structures have become unsuitable. For this reason I decided to investigate these technologies in depth and, using different virtualization tools (VirtualBox, LXC and Docker) on Linux, I have installed and studied one of the possible solutions, which is set to become the future of storage.

Worldwide storage therefore demands systems able to work with these large amounts of data; systems that are unified, distributed and reliable, with high performance and, most importantly, massively scalable up to the exabyte level and beyond. The technology investigated and described in this project is one of the possible solutions to this growing explosion of data worldwide: an open source, software-defined (SDS), distributed storage system called Ceph.

Resum

Durant els últims anys la demanda d'emmagatzematge de dades en les empreses ha crescut de manera exponencial, a un ritme de 40 a 60 per cent anual, i moltes empreses estan duplicant cada any la quantitat de dades que gestionen, arribant a la fi de 2014 a un volum estimat de 8,591 exabytes a tot el món. Els sistemes d'emmagatzematge tradicionals basats en RAID i LVM, i connectats als servidors a través de SAN o NAS, han resolt els problemes d'emmagatzematge des dels anys 80, però quan es tracta de treballar amb gran quantitat de dades, aquestes estructures monolítiques són inadequades. Per aquest motiu vaig decidir investigar totes aquestes tecnologies i, utilitzant diferents eines de virtualització (VirtualBox, LXC i Docker) en el sistema operatiu Linux, he instal·lat i estudiat una de les possibles solucions, que està cridada a ser el futur de l'emmagatzematge. Per tant, l'emmagatzematge en tot el món exigeix sistemes capaços de treballar amb aquestes grans quantitats de dades, que siguin unificats, distribuïts i fiables, amb un alt rendiment i, el més important, massivament escalables fins al nivell d'exabytes i més enllà. La tecnologia que he investigat i descrit en aquest projecte és una de les possibles solucions a aquesta creixent explosió de dades en tot el món. Es tracta d'un sistema d'emmagatzematge distribuït, definit per software (SDS) i de codi obert anomenat Ceph.

Resumen

Durante los últimos años la demanda de almacenamiento de datos en las empresas ha crecido de manera exponencial, a un ritmo de 40 a 60 por ciento anual, y muchas empresas están duplicando cada año la cantidad de datos que manejan, alcanzando a finales de 2014 un volumen estimado de 8,591 exabytes en todo el mundo. Los sistemas de almacenamiento tradicionales basados en RAID o LVM, y conectados a los servidores a través de SAN o NAS, han resuelto los problemas de almacenamiento desde los años 80, pero cuando se trata de trabajar con gran cantidad de datos, estas estructuras monolíticas son inadecuadas. Por este motivo decidí investigar más a fondo estas tecnologías y, utilizando diferentes herramientas de virtualización (VirtualBox, LXC y Docker) en el sistema operativo Linux, he instalado y estudiado una de las posibles soluciones, que está llamada a ser el futuro del almacenamiento. Por lo tanto, el almacenamiento en todo el mundo exige sistemas capaces de trabajar con estas grandes cantidades de datos, que sean unificados, distribuidos y fiables, con un alto rendimiento y, lo más importante, masivamente escalables hasta el nivel de exabytes y más allá. La tecnología que he investigado y descrito en este proyecto es una de las posibles soluciones a esta creciente explosión de datos en todo el mundo. Se trata de un sistema de almacenamiento distribuido, definido por software (SDS) y de código abierto, llamado Ceph.

Revision history and approval record

Revision Date Purpose

0 02/10/2015 Document creation

1 04/10/2015 Document revision

2 10/10/2015 Document revision

3 13/10/2015 Document revision

4 16/10/2015 Document final revision and approval

DOCUMENT DISTRIBUTION LIST

Name e-mail

Marius Oltean [email protected]

José Luis Muñoz Tapia [email protected]

Written by: Marius Oltean (Project Author), 02/10/2015

Reviewed and approved by: José Luis Muñoz Tapia (Project Supervisor), 16/10/2015

Table of contents

Abstract
Resum
Resumen
Revision history and approval record
Table of contents
List of Figures
List of Tables
1. Introduction
 1.1. Work Plan (Gantt, work packages and deviations)
2. State of the art of the technology used or applied in this thesis
 2.1. Linux Containers
 2.2. LXC Containers
 2.3. Docker
 2.4. Storage Technologies
  2.4.1. Introduction
3. Methodology / project development
4. Results
5. Budget
6. Conclusions and future development
Bibliography
Appendices
Glossary

List of Figures

1.1: Project Gantt Diagram

2.1: LXC vs VM

2.2: Docker vs VM

2.3: Docker architecture

2.4: LVM scheme

2.5: Example of Raid 10

2.6: File vs Block vs Object

2.7: Ceph logo

2.8: Ceph Architecture

2.9: CRUSH

List of Tables:

5.1: Project Budget

Appendix (Linux Storage Systems) 2.1.2: WWN

1. Introduction

We live in an era of technology that generates enormous amounts of data every second, and with time this growth will become incalculable. As data is the most critical element of any system that exists today, we need to store this ever-growing data in such a way that it remains secure, reliable and, of course, future-ready. But can traditional technologies like RAID or LVM, using storage system technologies like File Storage or Block Storage and storing data over the network with methods like DAS, SAN or NAS, manage all this amount of data and, above all, be robust, reliable and, if possible, economic? The answer is no, and we will see the reasons in the next chapters.

The main objectives of this project are: on one side, to analyze the different traditional storage technologies, describing and analyzing the most important concepts involved in their operation; on the other side, to perform an in-depth study of the open source software-defined storage technology called Ceph, explaining the main parts of its architecture and the multiple benefits it provides when managing large amounts of data. All of this has been achieved using three different virtualization technologies (VirtualBox, LXC and Docker) for simulations, on an Ubuntu 14.04 LTS (Trusty Tahr) release with a linux-3.16.0 kernel.

As the two main objectives of the project are quite broad issues that involve many concepts that until recently were unknown to me, this document is only a brief introduction to the most important concepts of the virtualization tools, the traditional storage technologies and the Ceph project. To give a broader explanation of all the concepts, I decided to create the "Linux Storage Systems" book that you can find in section 6.1 of the Appendix. There, I explain in detail the LXC and Docker virtualization tools (VirtualBox is widely known), the different traditional storage systems and, of course, the Ceph technology.

This project is not a continuation of another project, which has implied a time-consuming preliminary study of the existing bibliography on software storage systems and on the virtualization technologies needed to simulate the different scenarios. To carry out this project many skills have been needed, such as the ability to autonomously learn complex new concepts and tools (LXC, Docker, ceph-deploy, etc.), the ability to express the learned concepts in a schematic and clear way, and the creativity and knowledge to design and implement the scenarios needed to simulate physical storage disks, as in section "2.10.1. Ceph" of the "Linux Storage Systems" book in section 6.1 of the Appendix. It was also necessary to learn the Bash language and LaTeX in order to develop the documentation. The whole project has been developed using open source tools on a desktop computer, an i7 with 8 GB of RAM, running an Ubuntu 14.04 LTS (Trusty Tahr) release with a linux-3.16.0 kernel.

1.1. Work Plan (Gantt, work packages and deviations)

At first this project had a much more ambitious scope. The initial idea was to briefly explain all existing storage technologies so as to introduce the main objective of the project, which was the Linux-HA (High-Availability Linux) project, a high-availability (clustering) solution for Linux, FreeBSD, OpenBSD, Solaris and Mac OS X that promotes reliability, availability, and serviceability (RAS). But as I got further into the world of storage technologies, I discovered more interesting concepts and finally ran into the problem that large amounts of data pose for traditional systems. That led me to investigate a possible solution and, finally, to the interesting Ceph project. This project has also been carried out while working on other professional projects, which has strongly limited the number of hours available to invest in it.

Figure 1.1 Project Gantt Diagram

Project: Preliminary study and planning (WP ref: 1, Sheet 1 of 4)
Major constituent: Project plan, software setup and testing.
Short description: Project plan elaboration, software installation and testing, preliminary virtualization, LXC and storage overview.
Planned start date: 2/2/2015
Planned end date: 3/4/2015
Start event: First meeting.
End event: Project Plan approval.
Internal task T1: Project plan elaboration.
Internal task T2: Software setup.
Internal task T3: Preliminary virtualization.
Deliverables and dates: Project Plan, 25/2/2015.

Project: Virtualization Technologies (WP ref: 2, Sheet 2 of 4)
Major constituent: LXC and Docker study; Linux Containers chapter elaboration.
Short description: In-depth LXC study, containers concept and testing examples; in-depth Docker investigation and user manual elaboration.
Planned start date: 13/4/2015
Planned end date: 10/6/2015
Start event: Study material gathering.
End event: Documentation submission.
Internal task T1: LXC definition and virtualization.
Internal task T2: Docker definition and virtualization.
Internal task T3: LXC and Docker comparison.
Deliverables and dates: Virtualization technologies, 10/6/2015.

Project: Storage Systems (WP ref: 3, Sheet 3 of 4)
Major constituent: Storage systems investigation, chapter elaboration.
Short description: In-depth traditional storage systems study and chapter elaboration; in-depth hybrid storage systems and Ceph chapter elaboration.
Planned start date: 25/5/2015
Planned end date: 5/10/2015
Start event: Study material gathering.
End event: Documentation submission.
Internal task T1: LVM and RAID.
Internal task T2: File-systems.
Internal task T3: DAS, NAS and SAN.
Internal task T4: Object Storage.
Internal task T5: Hybrid Storage.
Deliverables and dates: Storage Systems, 5/10/2015.

Project: Book elaboration & project report (WP ref: 4, Sheet 4 of 4)
Major constituent: Book & report.
Short description: Linux Storage Systems book; project report writing.
Planned start date: 13/4/2015
Planned end date: 15/10/2015
Start event: Project report writing.
End event: Report submission.
Internal task T1: Project report writing.
Internal task T2: Linux Storage Systems book.
Deliverables and dates: Book & report, 15/10/2015.

2. State of the art of the technology used or applied in this thesis:

As explained in the Introduction section, this document is a summary of the most important concepts that I explain in the "Linux Storage Systems" book introduced in section 6.1 of the Appendix; therefore, if you want more details on any of the technologies outlined below, please consult that documentation. Next we explain the two virtualization tools used in our testing scenarios and then give an overview of the different storage technologies and their evolution.

2.1. Linux Containers

2.1.1. Introduction

With several functionalities added to the Linux kernel, it has become very easy to isolate Linux processes into their own little environments. Isolation tools allow us to build containers, which are a lightweight virtualization technology. While hardware virtualization or para-virtualization provides virtual machines, containers are an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a single host. With containers, a single Linux kernel is shared between the host and the virtual machines.

Containers can achieve higher densities of isolated environments than virtual machines.

Figure 2.1. LXC vs VM

This concept is not new, as it was implemented a few years ago in BSD jails, Solaris Zones and other open-source projects. In Linux, one of the most widely known container technologies has been OpenVZ (OVZ). What OVZ does, when installed, is load a specialized kernel that allows us to segment disk space, CPU and memory usage, as well as other resources such as bandwidth per container, and lock them to that specific area of the server. Many engineers from OVZ contributed to the official Linux kernel to build standard isolation tools. These tools are essentially namespaces, cgroups and special network interfaces. These are described below, as well as the two main approaches to building containers: LXC (LinuX Containers) and Docker.

2.1.2. Namespaces

The key tool to build LXC containers is kernel namespaces, a specific feature of the Linux kernel to isolate applications from each other. Kernel namespaces provide a way to have varying views of the system for different processes, which means having the capability to put applications in isolated environments with separate process lists, network devices, user lists and file-systems. Such functionality is implemented inside the kernel without the need to run hypervisors or virtualization. The OS needs to keep separate internal structures and make sure they remain isolated. Kernel namespaces are created and manipulated using 3 basic syscalls:

- clone(): used to create new child processes (this low-level function sits behind the well-known fork()).

- unshare(): allows a process to modify its execution context without spawning a new child process.

- setns(): changes the namespace of the calling process.

Each namespace is materialized by a special file in /proc/$PID/ns. When the last process within a namespace exits, the associated resources (network interfaces, etc.) are automatically reclaimed. It is also possible to "enter" a namespace by attaching a process to an existing namespace; this is generally used to run an arbitrary command within the namespace. Currently, there are six different namespaces:

- process (pid) namespace.

- mnt namespace.

- uts namespace.

- ipc namespace.

- user namespace.

- net namespace & veth interfaces.
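As noted above, each namespace is visible as a file under /proc/$PID/ns, so a quick, hedged way to see this from a shell is to compare our own namespaces with those of a container's init process (the container name "mycontainer" is just an example; the PID is the one reported by lxc-info):

ls -l /proc/$$/ns                # namespaces of the current shell
sudo lxc-info -n mycontainer -p  # print the PID of the container's init process
sudo ls -l /proc/<PID>/ns        # the container's namespace files differ from ours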

2.1.3. cgroups

Control groups, or “cgroups“, facilitate the management and administration of resources. They consist of a set of mechanisms to measure and limit resource usage for groups of processes. LXC relies on the Linux kernel cgroups, which are a feature to limit, control and isolate resource usage of process groups (CPU, memory, disk I/O,...).

Conceptually, cgroups work a bit like the ulimit shell command or the setrlimit system call, but instead of manipulating the resource limits for a single process, they allow setting them for groups of processes. These groups are hierarchical, beginning with a top group in which all processes are located unless set otherwise. We can then define groups, subgroups, sub-subgroups, etc. to suit our limitation needs. Note that, for now, network bandwidth cannot be limited with cgroups. Cgroups do not depend on namespaces, which means it is possible to build cgroups without namespace support in the kernel.
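As a small, hedged illustration of the idea (paths assume the cgroup v1 hierarchy mounted under /sys/fs/cgroup, as on Ubuntu 14.04; the group name "demo" is arbitrary), a memory limit can be applied to a whole group of processes like this:

sudo mkdir /sys/fs/cgroup/memory/demo
echo 300M | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/tasks   # the shell and its children now share the limit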

2.2. LXC Containers

2.2.1. Install

The standard tool for complete Linux containers is LXC. Complete containers are like virtual machines, with all their relevant environment isolated and the ability to execute multiple processes inside.

LXC can be installed as:
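On Ubuntu, for example, installing the package from the standard repositories should be enough (a hedged one-liner):

sudo apt-get install lxc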

2.2.2. Create

To create a container with LXC we just need to run the create command lxc-create.

By default, it will create a minimal Ubuntu install of the same release version and architecture as the local host. If we want, we can create Ubuntu containers of any arbitrary version by passing the release parameter.

For example, to create a Ubuntu 14.10 container:
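A possible invocation is sketched below; the container names are examples, and the ubuntu template accepts a --release option after the -- separator (Ubuntu 14.10 is "utopic"):

sudo lxc-create -t ubuntu -n mycontainer
sudo lxc-create -t ubuntu -n utopic01 -- --release utopic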

2.2.3. SELinux

Modern Linux distributions apply security mechanisms that enforce access-control security policies for applications, such as Security-Enhanced Linux (SELinux). Ubuntu's equivalent is called AppArmor. This can cause problems for LXC containers. To overcome this issue we have to add the following line to the container's config file (/var/lib/lxc/mycontainer/config):
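A commonly used line for this, which simply disables AppArmor confinement for that container (hedged, since the right profile depends on the setup), is:

lxc.aa_profile = unconfined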

2.2.4. Start/Stop

To start the container type the following command:
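sudo lxc-start -n mycontainer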

Once we do this we will be prompted with a login asking for our username and password; by default, both are ubuntu. In the console of the container, we can shut it down as on any other Linux box:
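sudo poweroff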

We can also shut down the container from the hypervisor:
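sudo lxc-stop -n mycontainer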

We can run the start command with the -d or --daemon option to "daemonize" the container:
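sudo lxc-start -n mycontainer -d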

To connect to a container that has been started in the background, we can use the following command:
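sudo lxc-console -n mycontainer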

We can exit the console of a container without shutting it down by typing Ctrl+a then q.

Finally, the command lxc-wait waits for a specific container state before exiting. This command is very useful in scripting. In this example the command will terminate when the specified container reaches the state RUNNING:
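sudo lxc-wait -n mycontainer -s RUNNING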

2.2.5. Destroy

The command lxc-destroy completely destroys the container and removes it from our system. The required argument is the container name but we can also pass along -f to tell LXC to force a deletion if the container is currently running (default is to abort/error out).

Example:
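sudo lxc-destroy -n mycontainer -f   # -f forces deletion even if the container is running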

2.2.8. Run a Command

The command lxc-attach lets us run a command in a running container. This command is mainly used when we want to quickly launch an application in an isolated environment or create some scripts.

Example: configuring the network of a container called "webserver".
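A hedged sketch (the interface name and address are assumptions, not taken from the original example):

sudo lxc-attach -n webserver -- ip addr add 10.0.3.50/24 dev eth0
sudo lxc-attach -n webserver -- ip link set eth0 up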

2.2.9. Limiting Resources

The command lxc-cgroup allows us to tune the inner workings of a specific container (these settings can also be written in the container's "config" file). The one that most people will probably find helpful is setting memory limits, which can be done like this (300M RAM limit):
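sudo lxc-cgroup -n mycontainer memory.limit_in_bytes 300M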

2.2.10. Devices

The command lxc-device lets us manage devices in running containers. At this point, only the add-device action is supported. For example, with the following command we can create a device in a container based on the matching device on the hypervisor; this way, a USB device will be available inside the container:
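For example (the device path is an assumption):

sudo lxc-device -n mycontainer add /dev/ttyUSB0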

2.2.11. Auto-start

By default a new container will not start if the host system is rebooted. If we want a container to start at system boot, we can configure that behavior in each container's configuration file. To do so, add the following lines to the config file:
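lxc.start.auto = 1
lxc.start.delay = 5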

With these parameters, the container will start when the host server boots, and the host system will wait 5 seconds before starting any other containers. We can also set a value for the lxc.group parameter to group containers. A container can belong to several groups by using this syntax:
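lxc.group = onboot,web   # group names are arbitrary examples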

Then, with groups, we can easily shutdown, reboot, kill or list all the containers that belong to a specific group:
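A hedged sketch with the lxc-autostart helper (group name is an example):

sudo lxc-autostart -g web -L    # list the containers in group "web"
sudo lxc-autostart -g web -r    # reboot them
sudo lxc-autostart -g web -s    # shut them down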

2.2.12. Networking

The default upstart configuration in Ubuntu is in the file: /etc/init/lxc-net.conf:

- By default, LXC creates a private network namespace for each container, which includes a layer 2 networking stack.

- LXC also automatically creates a network bridge called lxcbr0 that is started when the host is started.

- The bridge interface is NATed to enable the containers to access the outside world through a veth endpoint passed into the container. The NAT is created with iptables, with a rule that basically masquerades all the traffic leaving the containers bridged with lxcbr0.

- By default, the LXC installation also launches a dnsmasq process that acts as a DHCP server for the default Linux bridge lxcbr0. This DHCP server uses IPv4 addresses from 10.0.3.2 to 10.0.3.254.
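A hedged way to check this default setup from the host (bridge name and address range as described above; on Ubuntu the defaults live in /etc/default/lxc-net):

brctl show lxcbr0                       # the bridge and the veth interfaces attached to it
cat /etc/default/lxc-net                # bridge address and DHCP range used by dnsmasq
sudo iptables -t nat -L POSTROUTING -n  # the masquerade rule for 10.0.3.0/24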

2.3. Docker

2.3.1. Definition

Docker is an open-source project that automates the deployment of applications in containers: a container-based software framework for automating the deployment of applications. It is a very important tool for building software that runs in the backend.

Figure 2.2. Docker vs VM

With Docker we can easily create lightweight, portable and self-sufficient containers from any application, with better performance than virtual machines. Docker also allows us to deploy and scale more easily, because Docker containers can run almost everywhere: on desktops, physical servers, virtual machines, in data centers, and in public or private clouds.

Docker uses a client-server architecture where the Docker client and daemon communicate via sockets or through a RESTful API. The daemon builds, runs and distributes Docker containers, doing the heavy lifting. The Docker client is the primary user interface to Docker; it accepts commands from the user and communicates with the Docker daemon.

Figure 2.3 Docker architecture

2.3.2. Docker Components

A Docker project is formed by the following elements:

- Docker Images: read-only templates from which Docker containers are launched. They are very similar to default operating-system disk images which are used to run applications on servers or desktop computers. Docker provides a simple way to build new images or update existing images, or we can download Docker images that other people have already created. Docker images are the build component of Docker.

- Docker Registries: are public or private stores from which we upload or download images. Docker Hub is the public Docker registry, and it provides a huge collection of existing images. These images can be created by the user or can be images that others have previously created. Docker registries are the distribution component of Docker.

- Dockerfiles: are scripts containing a successive series of instructions, directions, and commands which must be executed to form a new docker image. The Dockerfiles replace the process of doing everything manually and repeatedly.

- Docker Containers: are similar to a directory, which can be packed like any other, then can be shared and run across various different machines and platforms (hosts). A Docker container holds everything that is needed for an application to run. Each container is created from a Docker image and is an isolated and secure application platform. Docker containers are the run component of Docker.
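To illustrate how these pieces fit together, a minimal, hypothetical Dockerfile (base image, package and names are only examples) can be built into an image and run as a container with the standard client commands:

# Dockerfile
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y nginx
CMD ["nginx", "-g", "daemon off;"]

# build an image from it and launch a container
sudo docker build -t myuser/webserver .
sudo docker run -d -p 80:80 myuser/webserver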

2.3.3. Installing Docker on Ubuntu

Ubuntu Trusty (14.04 LTS 64-bit) comes with a 3.13.0 Linux kernel and a docker.io package which installs all its prerequisites from Ubuntu's repository.

Docker can be installed by executing these commands on any Linux system that supports LXC:

First, we must ensure the list of available packages is up to date before installing anything new:
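sudo apt-get update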

Next, install the docker.io package:
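sudo apt-get install docker.io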

Link and fix paths with the following two commands:
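The commands usually suggested for the Trusty docker.io package are the following (hedged: the binary is installed as docker.io, so a docker alias and its bash completion are added; exact paths may differ on other setups):

sudo ln -sf /usr/bin/docker.io /usr/local/bin/docker
sudo sed -i '$acomplete -F _docker docker' /etc/bash_completion.d/docker.io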

Finally, if we want Docker to start when the server boots:
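sudo update-rc.d docker.io defaults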

Typing docker, we can check whether Docker was successfully installed.

Docker, like many Linux tools, does not have a graphical user interface. It is natively supported only on Linux platforms, but it can also be installed on OS X or Windows with some additional steps.

Next, if you want to create a container, you only need to follow the steps described in section 1.5.4 of the "Linux Storage Systems" book, in the Appendix.
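As a quick, hedged taste of what those steps look like (the image name is the stock Ubuntu image from Docker Hub):

sudo docker pull ubuntu:14.04
sudo docker run -i -t ubuntu:14.04 /bin/bash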

2.4. Storage Technologies

2.4.1. Introduction

Computer technology evolves rapidly, and this has contributed to the obligatory need to develop devices that can respond to the demand for more storage space, portability and durability, as well as to improve the various systems that manage our data. Starting with punch cards, and going through tapes, magnetic disks (floppy disks) and the various current hard disks, the evolution has always pursued larger storage capacity and higher access speeds. This evolution has taken us from counting our capacity in bits to having a current technology with almost unlimited space available.

The technologies explained in this document provide control, monitoring and visualization tools for all types of users, but they are more oriented to making life easier for professional administrators than for people responsible only for their home network. These technologies are some of the most commonly used methods (LVM, RAID, etc.) to provide flexibility and robustness in storage systems, which help achieve systems with higher performance and better protection. We also explain the different existing architectures for local or networked data storage (DAS, NAS, SAN), the advantages and disadvantages they provide and the storage system technologies they are based on (File Storage, Block Storage). Finally, we describe the evolution tendency of storage system technologies, explaining some new concepts like Object Storage and Hybrid Storage, and showing an example of the new open source distributed storage and file system solutions, the Ceph platform.

But before providing a deeper introduction to these different storage technologies, we give a brief description of some related concepts and technologies, like filesystems, the existing types, and the different techniques they use.

2.4.2. Important concepts

2.4.2.1 Filesystem

A filesystem (or file system) is the hierarchical way in which an operating system structures data in order to store or retrieve it on storage devices. It separates data into individual groups, giving them different names, so that an individual file can be located by describing the path to that file. Information that might describe a file and its contents, such as its owner, who can access the file, its size, etc., is conveniently stored as metadata in the file system.

There are many types of filesystems, and the main differences between them are the supported operating systems, access speeds, maximum supported sizes, CPU usage, and the different functions or techniques they implement. On Windows systems, some of the most common are FAT32, exFAT and NTFS. On Linux, the most used file systems are ext4, F2FS, XFS and, recently, Btrfs.

2.4.2.2 Block Devices

Data structures like cylinders, tracks and sectors, used by traditional disk drives, are not used anymore by the modern systems and subsystems that use disk drives. Cylinder, track and sector addresses have been replaced by a method called logical block addressing (LBA), which makes disks much easier to work with by presenting a single flat address space. With logical block addressing, the disk drive controller maintains the complete mapping of the location of all tracks, sectors and blocks in the disk drive. There is no way for an external entity like an operating system or subsystem controller to know which sector its data is being placed in by the disk drive. As there are always going to be bad sectors on any disk drive manufactured, disk manufacturers compensate for this by reserving spare sectors for remapping sectors that go bad.

On the disk, files are stored in blocks. A block is a chunk of data, and when the appropriate blocks are combined, they create a file. A block has an address, and the application retrieves a block by making a SCSI (or ATA) call to that address. It is a very microscopic way of controlling storage: how the blocks are combined or accessed is left up to the application. There is no storage-side metadata associated with the block except for its address, and the address says nothing about the block's content. In other words, a block is simply a chunk of data that has no description, no association and no owner. A block only takes on meaning when the application controlling it combines it with other blocks.

2.4.2.3 Journaling

Updating a filesystem usually requires several separate operations. This makes it possible to leave data structures in an invalid intermediate state if there is a system crash between these operations. For example, deleting a file involves two steps: (1) removing its directory entry and (2) marking the space for the file and its inode as free. If a crash occurs between steps (1) and (2), there will be an orphaned inode and hence a storage leak. On the other hand, if only step (2) is performed before the crash, the not-yet-deleted file will be marked free and possibly be overwritten by something else. Detecting and recovering from such inconsistencies normally requires a complete walk of the filesystem's data structures, for example by a tool such as fsck (the file system checker).

Journaling filesystems are designed to try to avoid disk corruption resulting from system crashes. Journaling is a feature that keeps track of the changes that will be made before committing them to the main file system. These changes are first written to the journal without changing the rest of the filesystem. Once all of those changes have been journaled, a “commit record” is added to the journal to indicate that everything else there is valid. Only after the journal transaction has been committed in this fashion can the kernel do the writes in the filesystem. If there is a system crash in the middle, the information needed to safely finish the job can be found in the journal. For this reason, such file systems are quicker to bring back online and less likely to become corrupted than filesystems without journaling. In general, it is not even necessary to run a filesystem integrity checker after a crash.

2.4.2.5 Thin provisioning

Thin provisioning is a new way of allocating and distributing disk storage resources. It involves using virtualization technology to give the appearance of having more physical resources than are actually available. Instead of reserving a portion of the disk storage space for each of the connected users, a device with thin provisioning defines a thin pool inside one of the large volume groups and defines thin volumes inside that thin pool. So every user has the space they require, but nothing is reserved for anybody in advance: capacity is delivered as each user requires it.

By means of thin provisioning, a common repository of information is defined and, for each application, the volumes and sizes it demands are defined, but the actual physical space is delivered only as information is actually written. This prevents wasting unused space.
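With LVM, for example, this maps to a thin pool carved out of a volume group and thin volumes created inside it; a hedged sketch (names and sizes are only an illustration):

sudo lvcreate -L 100G --thinpool tp0 vg0          # a 100 GB thin pool inside volume group vg0
sudo lvcreate -V 1T --thin -n app1_data vg0/tp0   # "1 TB" volumes that consume space only as data is written
sudo lvcreate -V 1T --thin -n app2_data vg0/tp0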

The benefits of this technology are multiple, but the most relevant are two. On one side, between 13% and 30% less storage could be acquired, which saves money. On the other side, and most importantly, storage administrators recover full control of storage. With traditional provisioning, capacity planning is in the hands of the owners of the applications, but with thin provisioning, the storage system administrators are the ones who define storage requirements, using monitoring tools and managing the free space of each application as common free space.

2.4.3. LVM

Logical Volume Manager (LVM) provides a layer of abstraction between the operating system and the disks/partitions it uses. In traditional disk management, the operating system looks for what disks are available (/dev/sda, /dev/sdb, etc.) and then looks at what partitions are available on those disks (/dev/sda1, /dev/sda2, etc.). With LVM, disks and partitions can be abstracted and combined into one device, and we can create logical partitions that span one or more physical hard drives. First, the hard drives are divided into physical volumes, then those physical volumes are combined to create a volume group, and finally the logical volumes are created from the volume group.
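A hedged sketch of that sequence (device and volume names are examples):

sudo pvcreate /dev/sdb /dev/sdc      # physical volumes
sudo vgcreate vg0 /dev/sdb /dev/sdc  # volume group
sudo lvcreate -L 20G -n data vg0     # logical volume
sudo mkfs.ext4 /dev/vg0/data         # put a filesystem on it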

Figure 2.4 LVM scheme

2.4.4. RAID

RAID (redundant array of independent disks) is a storage technology that combines multiple disk drive components (an array of disks) into a logical unit. The objective of this system is to protect data against hardware failure by adding redundancy. The general principle is that the data is stored across multiple physical disks instead of just one, giving the system a configurable level of redundancy. Depending on the amount of redundancy, in case of an unexpected disk failure we can reconstruct the data from the remaining disks. Data is distributed across the drives in one of several ways called "RAID levels", depending on the level of redundancy and performance required. RAID can be implemented by hardware or by software.
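On Linux, software RAID is typically built with mdadm; a hedged example of a RAID 10 array like the one in the figure below (disk names are examples):

sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
cat /proc/mdstat   # check the state of the array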

Figure 2.5 Example of Raid 10

Note: Visit section 2.3 of "Linux Storage Systems" in the Appendix to see the differences between the RAID levels and how to implement RAID in Linux.

2.4.5. LVM vs RAID

Both RAID and LVM provide distinct advantages compared to the simple case of a desktop computer with only one hard disk in which usage patterns do not change over time. But between the two systems, which one should we use?

Of course, the most appropriate answer depends on current and anticipated requirements. If the requirement is to protect data against hardware failure, obviously RAID is the answer, as LVM does not really solve this problem. On the other hand, if we need a flexible storage scheme in which volumes are independent of the physical layout of the disks, RAID is not very helpful and LVM is the natural choice.

Then, of course, there is the really interesting use case, in which the storage system must be resistant to hardware failures and also flexible in the allocation of volumes. Neither LVM nor RAID can satisfy both requirements by itself, so this is the situation in which we use both at the same time, or rather, one on top of the other. To do this, first we must ensure redundancy in the data, grouping disks into a few large RAID arrays, and then use these RAID arrays as LVM physical volumes. The strength of this configuration is that when a disk fails, only a small number of RAID arrays need to be reconstructed, thereby limiting the time the administrator spends on recovery.
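A hedged sketch of that layering, with an md array used as an LVM physical volume (device and volume names are examples):

sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
sudo pvcreate /dev/md0
sudo vgcreate vg_safe /dev/md0
sudo lvcreate -L 50G -n projects vg_safe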

2.4.6. SAN vs NAS

Greater support for applications. Some applications and application deployment scenarios either are not certified or simply cannot operate over NFS. Some of the limitations, like the lack of NFS support with Microsoft Exchange Server, will always exist with NFS. Microsoft Cluster Services (MCS) is one example where SCSI pass-through and SCSI-3 reservation support are required to deploy. While this capability is not in the current VVols beta, one should expect that VMware will look to provide it in a future release and in turn retire the need for RDMs.

Enhanced storage bandwidth capabilities. This is an area where SAN has been, and will continue to be, more capable than NAS. The VVols beta currently does not include NFS v4.1 with pNFS; the latter is required for link aggregation. As such, each VVol will be limited to a single link for I/O with NFS, versus multiple links with SAN via VMware's native or third-party multipathing software.

Enhanced storage I/O capabilities. Like the previous item, this is an area where SAN has been, and will continue to be, more capable than NAS. As the I/O to an NFS VVol is limited to a single Ethernet link, any congestion cannot be addressed by VMware's native or third-party multipathing software. Link congestion could manifest itself in a Storage vMotion operation, which will actually add load to the link, exacerbating the problem before it can shift I/O access for the VM to another link.

Although a SAN often is faster than a NAS, it is also less flexible. For example, the size or the filesystem of a NAS usually can be changed on the host system without the client system having to make any changes. With a SAN, because it is seen as a block device like a local disk, it is subject to a lot of the same rules as a local disk. So, if a client is running its /usr filesystem on an iSCSI device, the device would have to be taken off-line and modified not just on the server side but also on the client side: the client would have to grow the filesystem on top of the device.

There are some significant differences between a SAN volume and a local disk. A SAN volume can be shared between computers. Often this presents all kinds of locking problems, but with an application aware that its volume is shared out to multiple systems, this can be a powerful tool for failover, load balancing or communication. Many filesystems exist that are designed to be shared; GFS from Red Hat and OCFS from Oracle (both GPL) are great examples of these kinds of filesystems.

2.5. Object Storage

Today we have more and more digital data, and much of it is difficult to control because it is unstructured and comes in various formats such as emails, documents, images or videos. This growth of unstructured data is a challenge for storage managers, as companies require data to be accessible immediately to meet the requirements of their users.

Attempts have been made to solve this problem using traditional storage management systems like NAS, which manage data as a file hierarchy, or SAN (block storage), which manages data as blocks within sectors and tracks, but these are proving to be expensive and inadequate.

An emerging alternative, used for example by Quantum Lattus, Caringo Swarm, EMC Atmos, Hitachi HCP or OpenStack Swift, is Object Storage. Also called object-based storage, it is a new storage architecture that manages data as objects. Each object typically includes three things:

- The data itself. It can be anything that users want to store, from an image to an instruction manual.

- A variable amount of metadata. This metadata is defined by the creator of the object storage. It contains contextual information about what the data is, its confidentiality, its purpose and anything else that is relevant to the way in which the data must be used.

- A globally unique identifier. Each object has an address (identifier) so that the object can be found over a distributed system. This ID serves as a pointer to a specific object, much the same way a universal resource locator (URL) points to a specific file on a web server. This way, users are able to find the data without having to know its physical location.

Object storage can be implemented at multiple levels, including the device level (object storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that can be directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data management functions like data replication and data distribution at object level granularity.
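To make the "ID plus metadata over HTTP" idea concrete, a purely illustrative exchange against a hypothetical S3-style endpoint (host, bucket, key and metadata header are made up) could look like this:

curl -X PUT --data-binary @holiday.jpg -H "x-amz-meta-owner: marius" http://objects.example.com/photos/holiday.jpg
curl http://objects.example.com/photos/holiday.jpg -o holiday.jpg   # fetched later by its identifier, from anywhere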

Figure 2.6 File vs Block vs Object

2.5.1 Benefits versus previous technologies

As explained in previous sections, file storage is a type of storage that stores data in a hierarchical structure, with directories, sub-directories and files. In file storage, when a file is stored to disk it is split into thousands of pieces, each with its own address. When that file is needed, the user enters the server name, directory and file name, and the file system finds all of its pieces and reassembles them. This is great and works well when the number of files is not very large or when we know exactly where our files are stored, but when we manage large amounts of data, like servers with an astronomical number of files, this hierarchical structure makes no sense, and folders are just a waste of space and time.

Regarding block storage, we know that a block is a chunk of data, and when the appropriate blocks are combined, they create a file. With block storage, files are split into evenly sized blocks of data, each with its own address but with no additional information (metadata) to provide more context for what that block of data is.

Object storage, by contrast, doesn't split the file. The file is stored as a single object complete with metadata, is assigned an ID, and is stored as close to contiguously as possible. When a user needs content, they only need to present the ID to the system and the content will be fetched along with all the metadata, which can include security, authentication, etc. This can happen directly over the Web, eliminating the need for Web servers and load balancers.

In Object storage there is no limit on the type or amount of data and metadata, which makes Object storage powerful and customizable. Because Object storage technology treats all types of data, can save any of it as an object and can store multiple copies of it, often on different storage servers and drives, it is said that Object storage can replace RAID for data availability.

Note: To see all the characteristics that Object Storage provides, and how this technology scales in comparison with File Storage or Block Storage, please visit section 2.9.2 of the "Linux Storage Systems" book in the Appendix.

2.5. Hybrid Storage (CEPH)

Hybrid storage is a method combining different storage systems, intended for companies or users who cannot or do not want to completely change their traditional storage system. It is a storage architecture that provides block storage, file storage and object storage on the same network.

Currently there are several proprietary solutions for hybrid storage, but in this document we will focus on open source solutions such as the Ceph system, offered by Red Hat.

Before defining and explaining the Ceph storage system thoroughly, we must understand that, although Ceph can be used as a combination of the three storage systems discussed above, this will not always be the best option; it will always depend on the use case.

2.5.1. CEPH

Figure 2.7 Ceph logo

2.5.10. Introduction

Traditional storage systems based on RAID or LVM and connected to servers via SAN or NAS work fine in small and medium environments, because in these scenarios their inner limits are not reached; but with the incredible growth of managed data, as in Big Data or Cloud Computing, in several situations they are becoming unsuitable.

Firstly, when data grows, traditional storage systems run into computing-capacity problems, because these systems can usually only be expanded by adding new disks, and only a few models allow adding new controllers. Capacity growth also has the side effect of reducing performance, since a single controller has a well-defined maximum computing capacity with which it has to manage an ever larger amount of data.

Secondly, with data growth, RAID is becoming an inefficient redundancy technology. This is because, first of all, redundancy is achieved by sacrificing disk space. As an example, if we have a RAID 10 using 1 TB disks, a stripe of 20 TB is wasting 10 TB; with more data, for example a storage of 1 PB, 500 TB of space is wasted only for redundancy. If we choose RAID 5 or RAID 6, we reduce the disk waste, but on the other side we also lower performance, because the write penalty is higher on these kinds of RAID. With increasing data, we will also be penalized by the time needed to rebuild data, because during this time the RAID is running in a degraded state, which implies an additional performance impact.

So worldwide storage demands systems able to work with these large amounts of data; systems that are unified, distributed and reliable, with high performance and, most importantly, massively scalable up to the exabyte level and beyond. One of the available solutions to this worldwide growing data explosion is an open source distributed storage system called Ceph.

2.5.11 Overview

Figure 2.8 Ceph Architecture

2.5.11.1 Definition

Ceph is an open source distributed storage system with high performance and scalability, built on top of commodity components, delegating reliability to the software layer. We can say that Ceph is a scale-out, software-defined object store built on commodity hardware. Ceph storage software is one of the pillars of the OpenStack project, which involves developers around the world working on different components of open source cloud platforms.

Ceph was developed to make possible the standardized storage of objects, blocks and files on a distributed x86 cluster. Ceph has the ability to manage vast amounts of data and delivers extraordinary scalability: thousands of clients accessing petabytes to exabytes of data. A Ceph Node leverages commodity hardware and intelligent daemons, and a Ceph Storage Cluster accommodates large numbers of nodes, which communicate with each other to replicate and redistribute data dynamically.

2.5.11.2 Ceph technical terms

- RADOS:

The Ceph Storage Cluster is based on an open source object storage service called the Reliable Autonomic Distributed Object Store (RADOS). RADOS has the ability to scale to thousands of hardware devices by making use of management software that runs on each of the individual storage nodes. The software provides storage features such as thin provisioning, snapshots and replication. RADOS distributes objects across the storage cluster and replicates objects for fault tolerance.

- CRUSH

The Controlled Replication Under Scalable Hashing algorithm determines how the data is replicated and mapped to the individual nodes. CRUSH ensures that data is evenly distributed across the cluster and that all cluster nodes are able to retrieve data quickly without any centralized bottlenecks.

The algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs (explained below) directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

CRUSH requires a map of our cluster (containing a list of OSDs) and uses that CRUSH map to pseudo-randomly store and retrieve data on OSDs with a uniform distribution of data across the cluster.

Figure 2.9 CRUSH

- Ceph Node

Any single machine or server in a Ceph System.

- Ceph OSDs

A Ceph OSD Daemon stores data as objects on a storage node, handles data replication, recovery, backfilling and rebalancing, and provides some monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat. As Ceph OSDs run the RADOS service, calculate data placement with CRUSH and maintain their own copy of the cluster map, they should have a reasonable amount of processing power. A Ceph Storage Cluster requires at least two Ceph OSD Daemons to achieve an active + clean state when the cluster makes two copies of our data (the default).

- Ceph Monitor

Maintains a master copy of the cluster map (cluster state), including the monitor map, the OSD map, the PG map, and the CRUSH map. Ceph maintains a history (called an “epoch”) of each state change in the Ceph Monitors, Ceph OSD Daemons and PGs. They are not CPU intensive.

- Ceph Metadata Server

An MDS stores metadata on behalf of the Ceph Filesystem (i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). Ceph Metadata Servers make it feasible for Portable Operating System Interface (POSIX) file system users to execute basic commands like ls, find, etc. without placing an enormous burden on the Ceph Storage Cluster.

- Placement Group (PG)

Aggregates objects within a pool (logical partitions for storing objects) because tracking object placement and object metadata on a per-object basis is computationally expensive.

- Ceph Cluster Map

The set of maps comprising the monitor map, OSD map, PG map, MDS map and CRUSH map.
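Once a cluster is running, these concepts can be inspected from an admin node with the standard ceph command line tool (a hedged sketch; the output obviously depends on the cluster):

ceph -s          # overall status: monitors, OSDs and placement groups
ceph osd tree    # the CRUSH hierarchy of hosts and OSDs
ceph df          # how data is spread across pools
ceph mon stat    # monitor quorum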

2.5.12 Ceph portfolio

Ceph is an enterprise-ready storage system that offers multiple products and gives support to a wide range of protocols and accessibility methods. The unified Ceph storage system supports Block, File, and Object storage, and all these storage systems are deployments of the Ceph flagship product, the Ceph Storage Cluster.

However, the Ceph File System (CephFS) is currently under development because it lacks a robust 'fsck' check and repair function, but Ceph Block Storage and Ceph Object Storage are ready and recommended for production usage.

Ceph Block, Ceph File and Ceph Object storage all read data from and write data to the Ceph Storage Cluster, so they need access to a running Ceph Storage Cluster in order to work properly.
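For orientation, a heavily condensed and hedged sketch of how a minimal cluster is brought up with the ceph-deploy tool used in this project (hostnames and disk paths are examples; the full walk-throughs are in the Appendix, as the note below indicates):

ceph-deploy new mon1                                 # initial cluster configuration with one monitor
ceph-deploy install mon1 osd1 osd2                   # install Ceph on the nodes
ceph-deploy mon create-initial                       # deploy the monitor and gather keys
ceph-deploy osd create osd1:/dev/sdb osd2:/dev/sdb   # prepare and activate the OSDs
ceph-deploy admin mon1 osd1 osd2                     # push the config and admin keyring
ceph health                                          # should eventually report HEALTH_OK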

NOTE: To learn more about the Ceph project, and to see how to deploy a Ceph Storage Cluster, you can visit section 2.10.1 of the "Linux Storage Systems" book in the Appendix.

3. Methodology / project development:

As explained in previous sections, this project's methodology has four well-differentiated parts:

1. Linux virtualization technologies: First of all, a long and tedious process of selecting and going through information sources was carried out, reading, studying and comparing the bibliography so as to select the best tools to virtualize and try all the examples that you can find in the Appendix.

2. Scenario design, coding and study: Once I had selected the correct tools, I started to virtualize and try, step by step, all the technologies and examples that you can find in the Linux Storage Systems book. Each scenario is created and configured to behave as expected in order to put the mechanisms to the test and to observe and analyze them.

3. Storage systems analysis: Then, when I had practiced enough with the virtualization tools, I started to investigate all the available storage systems and how I could try them on Linux. For that purpose I used Linux LXC containers for the traditional technologies, and VirtualBox, LXC and Docker when I needed to simulate the Ceph Storage Cluster creation using the ceph-deploy tool.

NOTE: You can find the three examples of Ceph cluster creation in section 2.10.1.6 of the "Linux Storage Systems" book in the Appendix.

4. Linux Storage Systems book writing: Once almost all cases had been investigated and simulated, and the working scenarios were fully understood and the theory confirmed, the theory and scenarios were combined and documented in the textbook using LaTeX to create the Linux Storage Systems documentation.

4. Results

The main result of the project is the Linux Storage Systems documentation, which includes, on one hand, a study of two different Linux virtualization technologies, with installation examples and command output examples, and, on the other hand, a study of the traditional storage technologies, their limitations, and how new storage systems like Ceph can overcome these limitations. The other important result of the project is the documented example of the Ceph Storage Cluster installation using three different virtualization technologies, which can be found in section 2.10.1.6 of the "Linux Storage Systems" book. I must say that, as Docker and Ceph are relatively new technologies, when I tried to combine them by mounting a Ceph Storage Cluster using Docker, I found some performance problems that have not yet been solved by the Docker and Ceph communities and which prevented me from mounting a totally healthy Ceph Storage Cluster. These issues are explained in the "Linux Storage Systems" book.

5. Budget

This project has been carried out using open source tools on a desktop computer with an estimated cost of 1500€, and a Microsoft Office license has been needed in order to produce this document, adding 234€ to the total amount. Because of the speed at which technology evolves nowadays, an amortization period of 3 years is considered before the obsolescence of the hardware and software used. Eight months of part-time work by a junior engineer is considered, resulting in a wage of 10500€/year, or 853€/month after taxes.

Concept                   Price/month    Price/project
Desktop Computer          31,60€         343,25€
Microsoft Office 2013     --             234€
Salary                    853€           7564€
Total Price                              8141,25€

Table 5.1: Project Budget

6. Conclusions and future development:

The present project has given me the opportunity to investigate, study and discover different types of technologies, starting with virtualization technologies and ending with Ceph, the solution, evolution and future of storage, and it has allowed me to draw several conclusions.

On one hand, the project has allowed me to see that the LXC Linux Containers technology is in a really advanced state of development, allowing us to replace almost all solutions given by proprietary products such as VMware. Regarding Docker containers, I must say that even if Docker is not without its downsides, as I noted when trying to implement a Ceph Storage Cluster with it, I am sure that in a short period of time it will be fully ready to be adopted by companies, with significant benefits.

On the other hand, I have seen all the concepts related to traditional storage technologies: I have studied the differences between RAID and LVM, the ways we can store our data over the network with DAS, NAS or SAN, and the differences between File Storage, Block Storage and the new Object Storage technology. From all this I can say that, with the explosion in the amount of data that today's companies must manage, the traditional storage technologies have become unsuitable. Of course, RAID technology has been the fundamental building block of storage systems for many years, but all eras must come to an end and new ones must begin. As RAID-based storage systems have started to show their limitations and are incapable of delivering future storage needs, I think that the future of storage is based on Object Storage and, of course, on the Ceph storage system.

I think that Ceph is the future of storage because it was designed to have no single point of failure in its architecture. It also gives us the possibility to work with blocks, files or objects, and it is able to manage vast amounts of data, delivering extraordinary scalability: thousands of clients accessing petabytes to exabytes of data. The last reason, but not the least, is that Ceph is an open source solution, so it can be more economical than proprietary solutions, and if we need help we are not alone: we can turn to Red Hat.

Bibliography:

[1]. “Latex User Guide”. 2015. [Online]. Available: https://latex-project.org/guides.

[2]. "LxC Users Mailing List". [Online]. Available: https://lists.linuxcontainers.org/listinfo.

[3]. Pasztor, J. “LXC tutorial”. 2013. [Online].Available: http://www.janoszen.com/2013/05/14/lxc-tutorial.

[4]. “Use Linux containers to set up virtual networks”. [Online]. Available: http://www.nsnam.org/wiki/HOWTO_Use_Linux_Containers_to_set_up_virtual_networks.

[5]. Rosen, R. “Linux containers and the future cloud”. 2013. [Online]. Available: http://media.wix.com/ugd/295986_d5059f95a78e451db5de3d54f711e45d.pdf.

[6]. Rosen, R. “Resource management Linux kernel Namespaces and cgroups”. 2013. [Online]. Available: http://media.wix.com/ugd/295986_d73d8d6087ed430c34c21f90b0b607fd.pdf.

[7]. “Configuring LXC using libvirt”. 2013. [Online]. Available: http://kaivanov.blogspot.com.es/2013/01/configuring-lxc-using-libvirt.html.

[8]. “Setting up Linux cgroups – Control Groups”. 2012. [Online]. Available: http://kaivanov.blogspot.com.es/2012/07/setting-up-linux-cgroups-control-groups.html.

[9]. Margaret Bierman with Lenz Grimmer. “How I Got Started with the Btrfs File System for Oracle Linux”. 2014. [Online]. Available: http://www.oracle.com/technetwork/articles/servers-storage- admin/gettingstarted-btrfs-1695246.html.

[10]. “How to create a Linux loopback file system with a regular disk file”. 2007. [Online]. Available: http://www.walkernews.net/2007/07/01/create-linux-loopback-file-system-on-disk-file.

[11]. Alex Vaqué "Los principales sustitutos de EXT4 (ZFS, XFS, BRTFS)". 2014. [Online]. Available: http://www.cloudadmins.org/2014/06/los-principales-substitutos-de-ext4---brtfs.

[12]. “Understanding Docker”. 2013 [Online]. Available: https://docs.docker.com/introduction/understanding-docker.

[13]. Petazzoni, J. “Virtualization with Linux containers and Dockers”. 2013. [Online]. Available: https://tech.yandex.com/events/yac/2013/talks/14/.

[14]. ”Introducing Docker Online”. 2015 [Online]. Available: http://blog.docker.com/2014/03/

[15]. “Getting Started with Docker”. 2014. [Online]. Available: https://serversforhackers.com/getting-started-with-docker.

[16]. “Data structures on disk drives”. 2015. [Online]. Available: http://www.snia.org/education/storage_networking_primer/stor_devices/data_structure.

[17]. Jonathan Corbet: “Barriers and journaling filesystems”. 2008. [Online]. Available: http://lwn.net/Articles/283161.

[18]. Hellman, Brian; Haas, Florian; Reisner, Philipp; Ellenberg, Lars: “The DRBD User’s Guide”. 2008. [Online]. Available: https://drbd.linbit.com/users-guide.

[19]. ”Linux lvm - Logical Volume Manager”. 2015. [Online]. Available: http://linuxconfig.org/linux-lvm- logical-volume-manager#h8-extend-logical-volume.

[20]. “Raid”. 2015. [Online]. Available: http://www.acnc.com/raid.

[21]. “How to Manage Btrfs Storage Pools, Subvolumes And Snapshots on Linux”. 2014. [Online]. Available: http://www.linux.com/learn/tutorials.

[22]. “Setting up NFS. How To”. 2014. [Online]. Available: https://help.ubuntu.com/community/SettingUpNFSHowTo.

[23]. Mesnier, Mike; Gregory R. Ganger; Erik Riedel: “Object-Based Storage”. 2003. [PDF]. IEEE Communications Magazine: doi:10.1109/mcom.2003.1222722.

[24]. Singh, Karan: “Learning Ceph – A practical guide to designing, implementing, and managing your software-defined, massively scalable Ceph storage system”. 2015. [Packt Publishing]. 978-1- 78398-562-3.

[25]. “Red Hat Ceph Storage”. 2015. [Online]. Available: https://www.redhat.com/es/technologies/storage/ceph.

[26]. Inktank Storage. “Welcome to Ceph”. 2014. [Online]. Available: http://docs.ceph.com/docs/master.

Appendices:

6.1 Linux Storage Systems

Glossary

Ceph Project: The aggregate term for the people, software, mission and infrastructure of Ceph.

cephx: The Ceph authentication protocol. Cephx operates like Kerberos, but it has no single point of failure.

Ceph Platform: All Ceph software, which includes any piece of code hosted at http://github.com/ceph.

Ceph Stack: A collection of two or more components of Ceph.

Ceph Node: Any single machine or server in a Ceph System.

RADOS: Reliable Autonomic Distributed Object Store. The core set of storage software which stores the user’s data (MON+OSD).

Ceph Cluster Map: The set of maps comprising the monitor map, OSD map, PG map, MDS map and CRUSH map.

Ceph Object Storage: The object storage “product”, service or capabilities, which consists essentially of a Ceph Storage Cluster and a Ceph Object Gateway.

Ceph Object Gateway (RGW): The S3/Swift gateway component of Ceph.

Ceph Block Device (RBD): The block storage component of Ceph.

Ceph Block Storage: The block storage “product,” service or capabilities when used in conjunction with librbd, a hypervisor such as QEMU or Xen, and a hypervisor abstraction layer such as libvirt.

Cloud Stacks: Third party cloud provisioning platforms such as OpenStack, CloudStack, OpenNebula, ProxMox, etc.

Object Storage Device (OSD): A physical or logical storage unit (e.g., LUN). Sometimes, Ceph users use the term “OSD” to refer to the Ceph OSD Daemon, though the proper term is “Ceph OSD”.

Ceph OSD Daemon: The Ceph OSD software, which interacts with a logical disk (OSD). Sometimes, Ceph users use the term “OSD” to refer to “Ceph OSD Daemon”, though the proper term is “Ceph OSD”.

Ceph Monitor: The Ceph monitor software.

Ceph Metadata Server: The Ceph metadata software.

Ceph Client: The collection of Ceph components which can access a Ceph Storage Cluster. These include the Ceph Object Gateway, the Ceph Block Device, the Ceph Filesystem, and their corresponding libraries, kernel modules, and FUSEs.

Ceph Kernel Modules: The collection of kernel modules which can be used to interact with the Ceph System (e.g., ceph.ko, rbd.ko).

Ceph Client Libraries: The collection of libraries that can be used to interact with components of the Ceph System.

Ceph Release: Any distinct numbered version of Ceph.

Ceph Point Release: Any ad-hoc release that includes only bug or security fixes.

Ceph Interim Release: Versions of Ceph that have not yet been put through quality assurance testing, but may contain new features.

Ceph Release Candidate: A major version of Ceph that has undergone initial quality assurance testing and is ready for beta testers.

Ceph Stable Release: A major version of Ceph where all features from the preceding interim releases have been put through quality assurance testing successfully.

CRUSH: Controlled Replication Under Scalable Hashing. It is the algorithm Ceph uses to compute object storage locations.

Pools: Pools are logical partitions for storing objects.

...
