
Leaf Project - A book streaming platform: Infrastructure
Sebastián Galiano
KTH Information and Communication Technology
Degree project in Communication Systems, Second level, 30.0 HEC
Stockholm, Sweden
August 1, 2011

Abstract

During the 21st century, people have begun to read digital books and to use most of their information in digital form, be it images, sound, or printed material. Some of these efforts have had great success. For images, digital cameras brought many new features, such as reduced physical size, higher quality, and error correction. For sound, digital music enables digital distribution, ease of sharing, and large amounts of storage. But despite recent efforts by Internet companies and publishers, the book industry has not yet achieved comparable success in the digital era.

Today, new technologies enable us to distribute digital books. However, these digitised books are still presented in a traditional way: as text and static images. Our project is based on the belief that one's reading experience can evolve and benefit from current technology. The main objective of the "Leaf" project is to create an electronic publication distribution platform that can equally satisfy publishers, writers, and readers. The project will create an environment for distributing digital publications that supports more advanced technologies, providing the reader in the future with something beyond text and static images.

This document describes the different aspects of the design of an information technology infrastructure for streaming electronic publications. This part of the "Leaf" project aims to describe the information technology infrastructure needed for an electronic publication distribution platform. The main purpose of this infrastructure is to serve electronic publications online to readers; however, that is not its only objective.
It has to deliver a service that is highly scalable. The architecture should include database and directory servers to support the main services. All the auxiliary services should support the aim of the project, i.e., to serve large amounts of traffic in a scalable and highly manageable way while utilising open standards.

The first chapter reviews the state of the art in electronic publication. The document continues in chapter 3 with an analysis of each of the servers involved in the architecture. The Apache web server is used as the book streaming service, and OpenLDAP is used to implement the authentication server. The last section finishes with a discussion of the database server. Chapter 4 describes the performance tests that verify that the proposed solution meets the system's requirements.

The "Leaf" project infrastructure has been designed and all the services involved have been implemented successfully, delivering a possible future infrastructure for book "streaming" services. Moreover, the knowledge obtained will help to take the first steps in the project infrastructure. The performance tests reveal that the Varnish web accelerator is a cheap and powerful solution for improving web server performance.

Sammanfattning

During the 21st century, people have begun to read digital books and to use most of their information in digital form, be it images, sound, or printed material. Some of these efforts have had great success. For images, new digital cameras brought many new features, such as reduced physical size, higher quality, and error correction. For sound, digital music enables digital distribution, easy sharing, and large amounts of storage. But despite recent efforts by Internet companies and publishers, the book industry has not achieved success in the digital era. Today, new technology makes it possible to distribute digital books.
However, these digitised books are still presented in a traditional way: as text and static images. Our project is based on the belief that the reading experience can evolve and benefit from today's technology. The main goal of the "Leaf" project is to create an electronic publication distribution platform that can equally satisfy publishers, authors, and readers. The project will create an environment for distributing digital publications that supports more advanced technology, offering the reader something beyond text and static images in the future.

This document describes various aspects of the design of an IT infrastructure for streaming electronic publications. This part of the "Leaf" project aims to describe the IT infrastructure needed for an electronic publication distribution platform. The main purpose of this infrastructure is to serve electronic publications online to readers, but that is not its only goal. It has to deliver a service that is scalable. The architecture should include database and directory servers to support the main services. All supporting services should support the aim of the project, i.e., to serve large amounts of traffic in a scalable and highly manageable way while using open standards.

The first chapter reviews the state of the art in electronic publishing. The document continues in chapter 3 with an analysis of each of the servers involved in the architecture. The Apache web server is used as the book streaming service, and OpenLDAP is used to implement the authentication server. The last section concludes with a discussion of the database server. Chapter 4 describes the performance tests that verify that the proposed solution meets the system's requirements.
The "Leaf" project infrastructure has been designed and all the services involved have been implemented, delivering a possible future infrastructure for book "streaming" services. In addition, the knowledge gained will help to take the first steps in the project infrastructure. The performance tests reveal that the Varnish web accelerator is a cheap and powerful solution for improving web server performance.

Contents

List of Figures
List of Listings
List of Tables
List of Acronyms and Abbreviations
1 Introduction
  1.1 Current electronic publications platforms
  1.2 The Leaf Project
  1.3 The Information Technology infrastructure for the Leaf Project
2 Electronic Publications
  2.1 State of the art in the document formats
  2.2 EPUB File Format
    2.2.1 EPUB Publications 3.0
    2.2.2 EPUB Content Documents 3.0
    2.2.3 EPUB Open Container Format
    2.2.4 EPUB Media Overlays 3.0
3 Infrastructure Services
  3.1 High Availability and High Scalability
  3.2 Book Streaming Service
    3.2.1 Apache
    3.2.2 Dealing with the EPUB file format
    3.2.3 Improving Apache Performance
    3.2.4 Linux Virtual Server
  3.3 Directory and Authentication
    3.3.1 OpenLDAP Cluster Architecture
    3.3.2 OpenLDAP Configuration
  3.4 Database
    3.4.1 Configuration and Creation of the Database
    3.4.2 PostgreSQL and Load Balancing
    3.4.3 Load Balancing and Cluster Configuration
4 Performance test
  4.1 Jmeter
  4.2 Test and Results
    4.2.1 Test Bed Description
    4.2.2 Tests Description
    4.2.3 Apache Only Test
    4.2.4 Apache + Varnish Test
    4.2.5 Apache + CDN
  4.3 Performance test conclusion
5 Conclusions and Future work
  5.1 Conclusions
  5.2 Future Work
Appendices
  A. Apache Web Server Configuration File
  B. Ldirectord and HeartBeat Configuration Files
  C. NFS and NIS Configuration Files
  D. OpenLdap Configuration Files
  E. Postgres Configuration Files
  F. Performance test
    F.1. Results with Only Apache
    F.2. Results with Varnish
    F.3. Results with CDN

List of Figures

1 Virtual Server Architecture [1]
2 Leaf Project platform architecture
3 Web Server Load Balanced Architecture
4 Book streaming infrastructure
5 LDAP hierarchical tree architecture
6 OpenLDAP infrastructure
7 PostgreSQL infrastructure
8 Jmeter Test plan example [2]
9 Average Response Time for Apache Only Tests
10 Linear Representation of the Average Response Time for Apache Only Tests
11 Average Response Time for Apache + Varnish Tests
12 Linear Representation of the Average Response Time for Apache + Varnish Tests
13 Average Response Time for Apache + CDN Test
14 CPU Usage for Apache Only, 250 threads
  (a) Apache Only 250 threads CPU Usage trial 1
  (b) Apache Only 250 threads CPU Usage trial 2
  (c) Apache Only 250 threads CPU Usage trial 3
15 CPU Usage for Varnish mode, 250 threads
  (a) Varnish 250 threads CPU Usage trial 1
  (b) Varnish 250 threads CPU Usage trial 2
  (c) Varnish 250 threads CPU Usage trial 3
16 CPU Usage for Apache Only, 250 threads
  (a) Apache Only 50 threads CPU Usage trial 1
  (b) Apache Only 50 threads CPU Usage trial 2
  (c) Apache Only 50 threads CPU Usage trial 3
17 Load Balanced topology (Apache + Load Balancer + HeartBeat)
18 NFS and NIS Topology
19 OpenLDAP infrastructure
20 PostgreSQL Final infrastructure

List of Listings

1 Command line to activate the AVFS file system
2 Command line to mount the AVFS file system in the htdocs folder
3 Command line to access the content in an EPUB
4 Example of URL to access an EPUB
5 Ldirectord configuration file
6 Command line to activate IPv4 forwarding
7 Configuration of the virtual IP
8 High Availability Configuration
9 High Availability Resources Configuration
10 High Availability Authentication Configuration
11 Host File Configuration
12 NFS server exports configuration file
13 Passwd example configuration line
14 Groups example configuration line ..