EPUB: NEW OPEN STANDARD IN E-PUBLISHING

STUDENT NAME: Blazej Blazejewski STUDENT NUMBER: 04-215-125 COURSE NAME: E-Business DEPARTMENT: DIUF

SUPERVISORS: Andreas Meier / Luis Terán

DATE OF SUBMISSION: 17 05 2011

ABSTRACT

The aim of this paper is to analyze the EPUB file format used to produce and distribute e-books. The paper is divided into two parts. The first part is devoted to the technical aspects of EPUB files. After a brief historical introduction, the current 2.0.1 EPUB specification is discussed thoroughly, and a fictional, simplified EPUB file is described in detail as a practical example. The forthcoming 3.0 EPUB specification is then briefly presented, and the first part closes with a critical evaluation of the EPUB file format. The second part places the EPUB file format in the broader economic perspective of the e-publishing value chain. It therefore starts with a description of e-publishing and the e-publishing value chain. This introduction is followed by a presentation of the advantages of EPUB files over traditional printed books, with a special focus on production, distribution and pricing. Finally, the second part compares the EPUB file format with other popular file formats used in e-publishing.

Keywords: EPUB, PDF, e-books, publishing, e-publishing

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
FOREWORD
INTRODUCTION
    PROBLEM STATEMENT
    OBJECTIVES
    OUTLINE
CHAPTER ONE
    EPUB HISTORY
    CURRENT 2.0.1 EPUB SPECIFICATION
        Preliminary notes
        General structure
        MIME type
        META-INF directory
        Container file
        OPF package
        NCX file
        Data files
        ZIP file
        Final comments
    FUTURE 3.0 EPUB SPECIFICATION
        Structural modifications
        Content modifications
    EPUB FORMAT EVALUATION
        Web standards
        Open standard
        Multiple specifications
        External specifications
        Hardware independent
        Reflowable content
        Basic layout
        Easy automatization
CHAPTER TWO
    E-PUBLISHING
    E-PUBLISHING VALUE CHAIN
    EPUB FILES IN E-PUBLISHING VALUE CHAIN
        Production
        Distribution
        Pricing
    COMPARISON WITH OTHER FILE FORMATS
        EPUB and PDF
        EPUB and AZW
CONCLUSION
    MAIN FINDINGS
    CRITICAL ASSESSMENT
    OUTLOOK
REFERENCES

LIST OF FIGURES

Figure I. Structure of the helloworld.epub file.
Figure II. The content of the mimetype file.
Figure III. The content of the container.xml file.
Figure IV. The content of the content.opf file.
Figure V. The content of the toc.ncx file.
Figure VI. The content of the title.html file.
Figure VII. Publishing value chain according to [Hansen / Neumann 2005, p. 616].
Figure VIII. E-publishing value chain.
Figure IX. Data flow in e-publishing value chain.

LIST OF ABBREVIATIONS

ANSI      American National Standards Institute
ASCII     American Standard Code for Information Interchange
AZW       Amazon Kindle file format
CSS       Cascading Style Sheets
DAISY     Digital Accessible Information System
DOC       Microsoft Word file format
DOCX      Microsoft Word file format (Office Open XML)
DTBook    Digital Talking Book
DTD       Document Type Definition
EPUB      Open e-book file format
GIF       Graphics Interchange Format
HTML      Hypertext Markup Language
IDPF      International Digital Publishing Forum
ISO       International Organization for Standardization
IT        Information technology
JPEG      Joint Photographic Experts Group
MathML    Mathematical Markup Language
META-INF  Directory name in EPUB file structure
MIME      Multipurpose Internet Mail Extensions
MOBI      E-book file format developed by Mobipocket
NCX       Navigation Control file for XML applications
NISO      National Information Standards Organization
OASIS     Organization for the Advancement of Structured Information Standards
OeB       Open eBook
OEBPS     Directory name in EPUB file structure
OCF       Open Container Format
OPF       Open Packaging Format
OPS       Open Publication Structure
PDF       Portable Document Format
PNG       Portable Network Graphics
SVG       Scalable Vector Graphics
USA       United States of America
XHTML     Extensible Hypertext Markup Language
XML       Extensible Markup Language
ZIP       Open archiving file format

FOREWORD

This paper is a personal work of the author and expresses solely his private opinions. It in no way represents the views of the author's employer, Weblaw AG in Bern, or of his supervisor, Mr. Franz Kummer.

INTRODUCTION

Problem statement

E-books as such are not all that recent. Project Gutenberg started as early as 1971, and the combination of the PDF file format and the personal computer made reading large electronic documents widespread in the last decade of the 20th century [Gutenberg 2011; Lebert 2009]. But only at the beginning of the second decade of the 21st century are e-books really gaining enough momentum to approach critical mass and possibly become a serious alternative to traditional printed books.

This is already the case in the US e-book market. According to the Association of American Publishers [AAP 2011], e-book sales in 2010 rose by almost 165% compared with 2009 and represented more than 8% of the overall book market. These rocketing sales can be placed in a broader context. On the one hand, the booming e-reader [Hamblen 2010b] and tablet computer [Alpeyev / Miller 2011] industries are providing consumers with more and more mobile devices that are particularly well suited to reading e-books. On the other hand, e-book distribution is also expanding, with distributors ranging from standalone publishing houses such as O’Reilly [Hendrickson 2011], through established booksellers like Amazon [Carnoy 2010a; Carnoy 2010b] or Barnes and Noble [Carnoy 2011], to IT industry giants like Apple [Winograd 2010] and Google [Hamblen 2010a].

Given these facts, it is relatively safe to assume that e-books will also gain importance in Europe and in Switzerland. It is therefore important to understand what e-books are, how they are prepared and how they are distributed. To fully grasp these questions, however, many different dimensions have to be taken into account. Legal aspects, hardware availability and software systems are all equally important, to name only a few.

As far as the software dimension is concerned, the question of the data file format is essential. With so many hardware platforms and a huge variety of large and small distributors on the e-book market, the e-book exchange format is an important factor for e-book producers, distributors and users alike.

Objectives

The aim of this paper is to examine in detail one specific e-book data exchange format: the EPUB format. EPUB is an open data format proposed by an independent organization and is gradually gaining acceptance as the new standard for e-publishing.


The paper will pursue two main objectives. First, it will try to explain the technical side of EPUB files, thus providing guidance on working with this particular e-book file type. Second, it will discuss the e-publishing value chain and try to show how the EPUB file format can be used effectively in e-publishing. As such, the paper is aimed at all those who use e-books, be it as consumers, as writers or as publishers.

Outline

The structure of the paper will be twofold, following the two objectives stated above. The first chapter will cover the technical questions related to the EPUB file format. After a brief historical introduction, the current EPUB specification will be presented using a simple fictional example. The forthcoming specification will be presented next, followed by a critical evaluation of the EPUB file format as a whole.

The second chapter will put the EPUB file format into a broader perspective, showing its place in the e-publishing value chain. After an introduction to e-publishing and the e-publishing value chain, the optimal use of EPUB-formatted e-books will be discussed in detail. A comparison of the EPUB file format with other popular file formats used in e-publishing will close the second chapter.

CHAPTER ONE

EPUB history

This chapter presents the EPUB file format from a technical point of view. But before delving into the technical specification, it is worth looking at the process that led to the creation of the EPUB e-book exchange format. The complete history of e-books and their different formats is described in detail in [Lebert 2007; Lebert 2008; Lebert 2009].

When Project Gutenberg started in 1971, the first books were stored only as 7-bit ASCII text files. The ASCII standard (first published in 1963 and revised in 1968) could only represent the basic Latin letters used in English. In the following decades, electronic books were still stored as plain text files, later using the newer ISO 8859 or Unicode encodings, which were capable of representing many different alphabets.

A major step forward was the creation of the Portable Document Format (PDF) by Adobe. Available from 1993, the PDF format made it possible to create complex text documents with professional-grade software. Consumers could then view these documents with a free viewing tool (Adobe Acrobat Reader). A separate version of the free reading software, called Adobe eBook Reader, was created specifically for PDF e-books. This system was used at the very beginning of the 21st century to pioneer the commercial distribution of e-books on the Internet.

At the same time, however, several other companies launched their own software systems for creating and reading e-books. This could have led to a chaotic market, discouraging consumers from using this new book distribution channel. To address this standardization issue, the American publishing industry proposed the Open eBook (OeB) format in 1999 as a common industry standard. The Open eBook format was based on XML and specified the Open eBook Publication Structure for electronic texts. It was well received by the publishing community and was replaced in 2007 by the EPUB format, which further developed the electronic publication structure. The current stable EPUB version is 2.0.1, and a new version 3.0 is already in development.

Having created the new common standard, the American publishers also decided to found a common organization to promote and develop it. Shortly after the creation of the Open eBook format, the Open eBook Forum was founded in 2000. In 2005 the Open eBook Forum became the International Digital Publishing Forum (IDPF), an international publishing industry organization. According to its website [IDPF 2011], the IDPF aims to promote electronic publishing and to set common standards for electronic publications. The IDPF currently brings together not only well-known publishing houses (such as Cambridge University Press, Springer or McGraw-Hill) but also IT industry giants (such as Adobe, Apple, Google and Sony).

Current 2.0.1 EPUB specification

The current stable version of the EPUB file format is 2.0.1. This version is based on three separate specifications: the Open Publication Structure (OPS), the Open Packaging Format (OPF) and the Open Container Format (OCF). The OCF defines the general structure of an EPUB file, the OPF describes the metadata used by an EPUB file and the OPS regulates the actual data an EPUB file may contain. All three specifications are available on the IDPF website [IDPF 2011], and their current release dates from 4 September 2010.

Combined, the three specifications describe how a conforming EPUB file should be created. However, split into three distinct documents and written in highly technical language, they are not always easy for an inexperienced reader to understand. Surprisingly, there is little to no additional literature on the technical creation of EPUB files. Only a few online tutorials give a good general introduction to the topic and explain the technical creation process in an accessible way. The rest of this chapter will be based on the online tutorial by Daly [Daly 2011], which is by far the most comprehensive one.

Preliminary notes

The EPUB file format is actually built on other existing data formats. It combines the XML, XHTML and ZIP standards to create a new logical data structure. In fact, an EPUB file is simply a ZIP file containing XML files with metadata, XHTML files with the text content of the book and possibly files in other formats (for example image files with the book's illustrations).

Creating an EPUB file is therefore simply a matter of organizing existing data correctly according to a predefined pattern. The following sections describe in detail the general structure of an EPUB file and the specific requirements imposed on its components. To illustrate the whole process, the creation of a simple helloworld.epub e-book will be used as an example.

General structure

The structure of an EPUB file is defined by a mixture of obligatory rules and non-binding conventions.

First, each EPUB file must contain a mimetype file. Second, each EPUB file must have a META-INF directory at its root level, and this directory must contain an XML file named container.xml. Moreover, the EPUB file must contain an NCX file and exactly one XML OPF package document.

By convention, the OPF package document should have an .opf extension and its name should be content.opf. Also by convention, the remaining data files are stored in the OEBPS directory placed at the root level of the EPUB file.

The rest of the EPUB file structure, such as the names and the exact organization of data files, can be freely chosen by the creator of the file.

Some other general rules also apply to the content of EPUB files. The most important one is that all XML files included in an EPUB file must be well-formed according to the XML 1.0 standard and be encoded either in UTF-8 or in UTF-16.

mimetype
META-INF/
    container.xml
OEBPS/
    content.opf
    toc.ncx
    title.html
    chapter1.html
    stylesheet.css
    IMAGES/
        img1.jpg

Figure I. Structure of the helloworld.epub file.

Figure I shows the structure of the helloworld.epub file. The file contains all the obligatory elements and also follows the non-obligatory conventions. The data files with the actual content of the book (title page, chapter one, CSS stylesheet and a separate folder with an image) are placed in the OEBPS directory.
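The obligatory parts of this structure can be checked mechanically. The following Python sketch, offered as an illustration rather than as a validator, takes the list of member names of an archive and reports which obligatory entries are missing. The function name is hypothetical, and the sketch assumes that the conventional .opf and .ncx extensions are used to recognize the OPF package document and the NCX file.

```python
from typing import Iterable, List


def missing_required_entries(names: Iterable[str]) -> List[str]:
    """Return the obligatory EPUB 2.0.1 items missing from a list of archive member names.

    Assumes the conventional .opf and .ncx extensions identify the OPF
    package document and the NCX file.
    """
    names = list(names)
    missing = []
    if "mimetype" not in names:
        missing.append("mimetype")
    if "META-INF/container.xml" not in names:
        missing.append("META-INF/container.xml")
    # Exactly one OPF package document is allowed.
    if sum(1 for n in names if n.endswith(".opf")) != 1:
        missing.append("exactly one .opf package document")
    if not any(n.endswith(".ncx") for n in names):
        missing.append("an .ncx navigation file")
    return missing


# Member names of helloworld.epub as shown in Figure I.
HELLOWORLD = [
    "mimetype",
    "META-INF/container.xml",
    "OEBPS/content.opf",
    "OEBPS/toc.ncx",
    "OEBPS/title.html",
    "OEBPS/chapter1.html",
    "OEBPS/stylesheet.css",
    "OEBPS/IMAGES/img1.jpg",
]

print(missing_required_entries(HELLOWORLD))  # []
```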

MIME type

The mimetype file, which has no extension, contains the MIME type of the EPUB file.

As explained in [Sheldon 2001], MIME stands for Multipurpose Internet Mail Extensions and was originally developed to allow different file types to be sent by e-mail. Also known as Internet media types, MIME types have since become a common standard for describing the type of data files on the Internet. A MIME type is simply a predefined character string that allows any application to recognize properly the data it handles.

application/epub+zip

Figure II. The content of the mimetype file.

The mimetype file of each EPUB file must contain only the character string “application/epub+zip”, without any line feeds or carriage returns. Moreover, the mimetype file must be the first file in the zipped EPUB file and must not be compressed. In this way, any reading application can recognize an EPUB file as such.
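These three rules lend themselves to a simple check. The sketch below uses Python's standard zipfile module to inspect an archive held in memory. The helper name mimetype_is_valid is hypothetical, and the sketch inspects the entry order as recorded in the ZIP central directory, which is assumed to match the order of the files in the archive.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def mimetype_is_valid(epub_bytes: bytes) -> bool:
    """Check the mimetype rules: first archive entry, stored uncompressed, exact content."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        entries = zf.infolist()  # entries in central-directory order
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            # No trailing line feed or carriage return is allowed.
            and zf.read(first) == EPUB_MIMETYPE
        )


# Build a tiny in-memory archive that follows the rules.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    zf.writestr("META-INF/container.xml", "<container/>", compress_type=zipfile.ZIP_DEFLATED)

print(mimetype_is_valid(buffer.getvalue()))  # True
```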

META-INF directory

The META-INF directory of an EPUB file may only contain six specified XML files with metadata about the e-book publication. One of them, container.xml, is obligatory and is described in detail in the next section. The other five files, described below, are optional.

The manifest.xml file, if included, is the manifest schema file according to the Open Document specification. For more information see the OASIS Open Document format website [OASIS 2011].

The optional metadata.xml file is reserved for information that might be added in future specifications.

The signatures.xml file, if present, is used to store the digital signatures of the EPUB file.

The optional encryption.xml file, if used, contains information about the encryption of the content of the EPUB file. Even when encryption is used, however, the mimetype file, the content of the META-INF directory and the OPF package file must never be encrypted.

Finally, the rights.xml file, if included, is used to store the Digital Rights Management information.
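The rule stated in this section (one obligatory container.xml file and at most five optional, reserved files) can be expressed as a small check over archive member names. This is a hypothetical helper written for illustration; it follows the rule as described here rather than any official validation tool.

```python
# The six file names that, according to the text above, META-INF may contain.
RESERVED_META_INF = {
    "container.xml",   # obligatory
    "manifest.xml",    # OpenDocument manifest (optional)
    "metadata.xml",    # reserved for future use (optional)
    "signatures.xml",  # digital signatures (optional)
    "encryption.xml",  # encryption information (optional)
    "rights.xml",      # DRM information (optional)
}


def check_meta_inf(names):
    """Return (container_missing, unexpected_files) for the META-INF members of an archive."""
    prefix = "META-INF/"
    # Strip the directory prefix; ignore the bare directory entry itself.
    inside = {n[len(prefix):] for n in names if n.startswith(prefix) and n != prefix}
    container_missing = "container.xml" not in inside
    unexpected = sorted(inside - RESERVED_META_INF)
    return container_missing, unexpected


print(check_meta_inf(["META-INF/", "META-INF/container.xml", "META-INF/notes.txt"]))
# (False, ['notes.txt'])
```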

Container file

The container.xml file must be included in the META-INF directory of every EPUB file. Its role is to provide the location of the OPF package file within the EPUB file. The path is relative to the root of the EPUB file and is given in the “full-path” attribute of the “rootfile” element. The rest of the container.xml file is virtually identical for all EPUB files.

Figure III. The content of the container.xml file.
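Because the container.xml file is virtually constant, it can be generated from a single parameter, the path to the OPF package file. The sketch below builds the file and reads the path back with Python's xml.etree.ElementTree module. The namespace urn:oasis:names:tc:opendocument:xmlns:container and the media type application/oebps-package+xml are those used in OCF 2.0.1 container files; the function names are hypothetical, and the path is assumed not to contain characters that need XML escaping.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def build_container_xml(opf_path: str) -> str:
    """Return a container.xml whose rootfile points to opf_path (relative to the EPUB root).

    Assumes opf_path contains no characters that need XML escaping.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<container version="1.0" xmlns="{CONTAINER_NS}">\n'
        "  <rootfiles>\n"
        f'    <rootfile full-path="{opf_path}" media-type="{OPF_MEDIA_TYPE}"/>\n'
        "  </rootfiles>\n"
        "</container>\n"
    )


def find_opf_path(container_xml: str) -> str:
    """Read the full-path attribute of the first rootfile element."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
    if rootfile is None:
        raise ValueError("container.xml has no rootfile element")
    return rootfile.get("full-path")


xml_text = build_container_xml("OEBPS/content.opf")
print(find_opf_path(xml_text))  # OEBPS/content.opf
```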

OPF package

The OPF package file is simply an XML file constructed according to the OPF specification. By convention, however, it is given the .opf file extension in order to distinguish it easily from other files in an EPUB file. The content.opf file serves four specific purposes.

First, the OPF package file contains metadata about the e-book publication. This metadata is stored in the “metadata” element and should conform to the specification of the Dublin Core Metadata Initiative. According to its website [Dublin Core 2011], the Dublin Core provides a collection of metadata elements that can be used to describe a variety of resources such as books, music, video and text documents. The Dublin Core has achieved wide international acceptance, has become an ISO standard (ISO 15836:2009) and can be expressed in the Resource Description Framework proposed by the World Wide Web Consortium.

The Dublin Core defines fifteen metadata elements, ranging from standard information such as the language or the creator of a resource to more specific information such as its format or geographical coverage. However, only three elements are obligatory in an EPUB OPF package: the title, the language and the identifier of the book. The identifier must be unique (it may, for example, be the ISBN of the book) and must have an “id” attribute, which is referenced by the “unique-identifier” attribute of the OPF “package” element. Finally, the Dublin Core metadata elements should be identified by the Dublin Core namespace (“http://purl.org/dc/elements/1.1/”).
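The link between the identifier and the package element can be illustrated with a short ElementTree sketch. It builds a “package” element whose metadata holds Dublin Core elements in the namespace given above. The OPF namespace http://www.idpf.org/2007/opf, the id value “BookId”, the language code “en” and the function name are assumptions chosen for this example, not values taken from the original Figure IV.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", OPF_NS)   # OPF as the default namespace
ET.register_namespace("dc", DC_NS)  # Dublin Core elements use the dc: prefix


def build_package(title, identifier, language, creator=None, id_name="BookId"):
    """Build an OPF <package> whose <metadata> holds Dublin Core elements.

    The package's unique-identifier attribute names the id of dc:identifier.
    """
    package = ET.Element(f"{{{OPF_NS}}}package",
                         {"version": "2.0", "unique-identifier": id_name})
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = language
    ET.SubElement(metadata, f"{{{DC_NS}}}identifier", {"id": id_name}).text = identifier
    if creator:
        ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = creator
    return package


pkg = build_package("Hello World", "123myuniqueidentifier321", "en",
                    creator="Blazej Blazejewski")
print(ET.tostring(pkg, encoding="unicode"))
```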

The second function of the OPF package is to list all the files that make up the e-book contained in the EPUB file. The entire list is stored in the “manifest” element, which is composed of “item” elements that can be given in any order. Each “item” element must have the following three attributes: the “id” of the item, its MIME media type (“media-type”) and the access path of the file relative to the OPF package file (“href”). Just like the items themselves, the attributes of each item can be specified in any order.

The third main function of the OPF package file is to provide the reading order of the text documents that compose the e-book. This function is fulfilled by the “spine” element, composed of one or more “itemref” elements. The “itemref” elements should be listed in the order in which the e-book is meant to be read, and each should reference the corresponding file through its “idref” attribute, whose value is the “id” defined earlier in the manifest part of the OPF package. Lastly, the “spine” element must also have a “toc” attribute containing the “id” of the NCX file.
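The manifest and the spine can be sketched together, since the spine may only refer to ids declared in the manifest. In the following Python example the item ids are hypothetical, and the media types (application/xhtml+xml for XHTML, application/x-dtbncx+xml for the NCX file, text/css and image/jpeg) are the usual MIME types for these kinds of files.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def build_manifest_and_spine(items, reading_order, ncx_id):
    """items: (id, href, media_type) tuples; reading_order: manifest ids in reading order."""
    manifest = ET.Element(f"{{{OPF_NS}}}manifest")
    for item_id, href, media_type in items:
        # href is relative to the OPF package file.
        ET.SubElement(manifest, f"{{{OPF_NS}}}item",
                      {"id": item_id, "href": href, "media-type": media_type})
    known_ids = {item_id for item_id, _, _ in items}
    if ncx_id not in known_ids:
        raise ValueError(f"toc attribute refers to unknown id {ncx_id!r}")
    # The spine's toc attribute names the manifest id of the NCX file.
    spine = ET.Element(f"{{{OPF_NS}}}spine", {"toc": ncx_id})
    for idref in reading_order:
        if idref not in known_ids:
            raise ValueError(f"spine refers to unknown id {idref!r}")
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": idref})
    return manifest, spine


# Hypothetical ids for the helloworld.epub files (paths relative to OEBPS/content.opf).
HELLO_ITEMS = [
    ("ncx", "toc.ncx", "application/x-dtbncx+xml"),
    ("title", "title.html", "application/xhtml+xml"),
    ("chapter1", "chapter1.html", "application/xhtml+xml"),
    ("css", "stylesheet.css", "text/css"),
    ("img1", "IMAGES/img1.jpg", "image/jpeg"),
]
manifest, spine = build_manifest_and_spine(HELLO_ITEMS, ["title", "chapter1"], "ncx")
print([ref.get("idref") for ref in spine])  # ['title', 'chapter1']
```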

The fourth and final part of the OPF package file is optional. If included, the “guide” element contains semantic information about the e-book elements defined in the “manifest” part. The list of possible semantic descriptions is included in the OPF specification and makes it possible to identify standard book publication elements such as the cover page, foreword, index or glossary.
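A guide could be generated in the same way. The sketch below checks each reference type against a small subset of the predefined types (the full list is in the OPF specification) and accepts custom types beginning with “other.”, the prefix the OPF 2.0.1 specification reserves for types outside its list. The titles, file names and function name are illustrative.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
# A subset of the reference types predefined by the OPF 2.0.1 guide section.
GUIDE_TYPES = {"cover", "title-page", "toc", "index", "glossary",
               "foreword", "preface", "bibliography", "text"}


def build_guide(references):
    """references: (type, title, href) tuples naming semantic parts of the book."""
    guide = ET.Element(f"{{{OPF_NS}}}guide")
    for ref_type, title, href in references:
        # Types outside the predefined list must start with "other.".
        if ref_type not in GUIDE_TYPES and not ref_type.startswith("other."):
            raise ValueError(f"unknown guide type {ref_type!r}")
        ET.SubElement(guide, f"{{{OPF_NS}}}reference",
                      {"type": ref_type, "title": title, "href": href})
    return guide


guide = build_guide([("title-page", "Title page", "title.html"),
                     ("text", "Chapter 1", "chapter1.html")])
print(len(guide))  # 2
```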

Figure IV. The content of the content.opf file (creator: “Blazej Blazejewski”; title: “Hello World”; identifier: “123myuniqueidentifier321”).

The content of the content.opf file is presented in Figure IV. The metadata part contains the author, the title and a fictional identifier of the Hello World e-book. The manifest part lists all the files that contain data for the e-book, whereas the spine part gives the reading order of the two text documents contained in the book. Given the simplicity of the e-book, the guide part is not included.

NCX file

The NCX file contains the information needed to create a browsable table of contents for the e-book. NCX stands for Navigation Control file for XML applications and is actually a standard developed by the DAISY Consortium as part of a wider project. According to its website [DAISY 2011], the DAISY (Digital Accessible Information System) Consortium is a non-profit organization based in Zurich, Switzerland, which promotes access to information for people with reading disabilities. The NCX standard is part of a wider DAISY Consortium project called the Digital Talking Book and is included in the ANSI/NISO Z39.86-2005 specification (for more information see [NISO 2011]). The EPUB OPF specification uses this existing NCX standard to provide global navigation through the text content of the e-book.

Again, the NCX file is simply an XML file that must conform to the DAISY DTD (available at http://www.daisy.org/z3986/2005/ncx-2005-1.dtd) and must use the appropriate namespace (“http://www.daisy.org/z3986/2005/ncx/”). The NCX file is composed of three obligatory parts.

The first part is enclosed in the “head” element and contains metadata. The NISO DAISY standard requires five “meta” elements, but within an EPUB file only one of them is obligatory: the one containing the unique identifier of the book, which must be the same as the identifier in the OPF document. Two optional “meta” elements that can also be used in an EPUB file give the depth of the table of contents (i.e. the number of hierarchical levels) and the name of the software used to generate the NCX file. The remaining two elements mentioned in the NISO DAISY standard only apply to print books and can simply be omitted.
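Since the dtb:uid value must repeat the identifier of the OPF document, a tool that generates both files should check that they match. The sketch below builds the “head” element with the obligatory dtb:uid and the optional dtb:depth and dtb:generator “meta” elements, and compares the uid with an OPF identifier. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def build_ncx_head(uid, depth=None, generator=None):
    """Build the NCX <head>: dtb:uid is obligatory, depth and generator are optional."""
    head = ET.Element(f"{{{NCX_NS}}}head")
    metas = [("dtb:uid", uid)]
    if depth is not None:
        metas.append(("dtb:depth", str(depth)))
    if generator is not None:
        metas.append(("dtb:generator", generator))
    for name, content in metas:
        ET.SubElement(head, f"{{{NCX_NS}}}meta", {"name": name, "content": content})
    return head


def uid_matches_opf(head, opf_identifier):
    """True if the NCX dtb:uid equals the unique identifier from the OPF file."""
    for meta in head.iter(f"{{{NCX_NS}}}meta"):
        if meta.get("name") == "dtb:uid":
            return meta.get("content") == opf_identifier
    return False


head = build_ncx_head("123myuniqueidentifier321", depth=1)
print(uid_matches_opf(head, "123myuniqueidentifier321"))  # True
```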

The second part is the “docTitle” element, which contains the title of the e-book.

The third and most important part is the “navMap” element, which contains the actual navigation structure of the e-book. The navigation map is composed of navigation points. Each “navPoint” element has an “id” attribute, a “playOrder” attribute indicating its position in the reading order and a text label. In addition, each navigation point has a “content” element whose “src” attribute contains the access path to the target content. The path can point to an entire file or, using an XHTML anchor, to a specific part of a file.

The NCX file can also contain three further, optional parts. The optional “docAuthor” element can be used to store information about the author of the book. The optional “pageList” and “navList” elements are similar to the “navMap” part and can be used to provide additional navigation possibilities. Finally, it should be noted that some parts of the NISO DAISY specification (such as those concerning audio files) are never used in the context of an EPUB file.

Figure V. The content of the toc.ncx file (document title: “Hello World”; navigation labels: “Title” and “Chapter 1”).

Figure V presents the content of the toc.ncx file of the Hello World e-book. The NCX file is composed of the three obligatory parts only. The table of contents has one hierarchical level with two navigation points: the first one points to the title page of the book and the second one to chapter 1.
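An NCX file with this structure (three obligatory parts, one level, two navigation points) can be generated with the following sketch. The root attribute version="2005-1" matches the DTD cited above, while the navPoint id scheme (“navpoint-1”, “navpoint-2”) and the function names are hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)


def q(tag):
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def build_ncx(uid, title, entries):
    """Build a one-level NCX; entries are (label, src) pairs in reading order."""
    ncx = ET.Element(q("ncx"), {"version": "2005-1"})
    head = ET.SubElement(ncx, q("head"))
    ET.SubElement(head, q("meta"), {"name": "dtb:uid", "content": uid})
    doc_title = ET.SubElement(ncx, q("docTitle"))
    ET.SubElement(doc_title, q("text")).text = title
    nav_map = ET.SubElement(ncx, q("navMap"))
    for order, (label, src) in enumerate(entries, start=1):
        point = ET.SubElement(nav_map, q("navPoint"),
                              {"id": f"navpoint-{order}", "playOrder": str(order)})
        nav_label = ET.SubElement(point, q("navLabel"))
        ET.SubElement(nav_label, q("text")).text = label
        # src may also carry an anchor, for example "chapter1.html#section2".
        ET.SubElement(point, q("content"), {"src": src})
    return ncx


ncx = build_ncx("123myuniqueidentifier321", "Hello World",
                [("Title", "title.html"), ("Chapter 1", "chapter1.html")])
print(ET.tostring(ncx, encoding="unicode"))
```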

Data files

The actual data an EPUB file may contain is defined in the OPS specification, which allows the following four data types.

First, the XHTML standard is used to store the text content of EPUB e-books. The XHTML data should conform to version 1.1 of the XHTML language. However, the OPS specification imposes some minor modifications on XHTML usage. For example, the execution of scripts included in a “script” element is discouraged, and the “img” element should preferably refer to a restricted set of image types (GIF, PNG, JPEG or SVG).
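When the manifest is built automatically, the preferred image types also determine the “media-type” values of the corresponding items. The following sketch maps file extensions to the usual MIME types of these four image formats and returns None for anything else; the mapping and the function name are assumptions made for this illustration.

```python
from pathlib import PurePosixPath

# Preferred OPS image types (see the text above) and their MIME media types.
OPS_IMAGE_TYPES = {
    ".gif": "image/gif",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".svg": "image/svg+xml",
}


def image_media_type(path):
    """Return the manifest media type for an image, or None if it is not a preferred OPS type."""
    return OPS_IMAGE_TYPES.get(PurePosixPath(path).suffix.lower())


print(image_media_type("IMAGES/img1.jpg"))  # image/jpeg
```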

To facilitate the layout of the text included in XHTML files, the OPS specification allows the use of CSS2 cascading style sheets. In the EPUB context, the OPS specification calls them OPS Style Sheets and imposes some minor, specific modifications on the CSS2 web standard.

The third data type allowed by the OPS specification is SVG images. SVG stands for Scalable Vector Graphics and, just like XHTML, CSS2 and XML, is a web standard developed by the World Wide Web Consortium (for all these standards, see [W3C 2011a; W3C 2011b; W3C 2011c; W3C 2011d] respectively). The SVG format is used to store vector images that can be displayed at various sizes without loss of quality.

Finally, the OPS specification allows the use of DTBook, the Digital Talking Book standard defined by the DAISY Consortium and formalized as the ANSI/NISO Z39.86-2005 standard. The DTBook standard was designed to assist people with reading disabilities (for more information see [DAISY 2011; NISO 2011]).

Figure VI. The content of the title.html file (page title: “Hello World: title”; heading: “Hello World”; text: “by Blazej Blazejewski”).

Figure VI presents the content of the title.html data file. It shows that the actual content of the Hello World e-book is very much like a web page containing the text of the book. The content of the chapter1.html and stylesheet.css files is not presented, as these are simply typical XHTML and CSS files.
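The following sketch shows one way of producing a title page like the one in Figure VI as an XHTML 1.1 document. The visible texts are those of the figure, while the exact markup, the stylesheet link and the function name title_page are assumptions.

```python
from html import escape

XHTML_11_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)


def title_page(book_title, author, stylesheet="stylesheet.css"):
    """Return a minimal XHTML 1.1 title page as a string (to be saved as UTF-8)."""
    t, a = escape(book_title), escape(author)  # escape XML special characters
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"{XHTML_11_DOCTYPE}\n"
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        f"  <title>{t}: title</title>\n"
        f'  <link rel="stylesheet" type="text/css" href="{escape(stylesheet)}"/>\n'
        "</head>\n"
        "<body>\n"
        f"  <h1>{t}</h1>\n"
        f"  <p>by {a}</p>\n"
        "</body>\n"
        "</html>\n"
    )


print(title_page("Hello World", "Blazej Blazejewski"))
```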

ZIP file

Once the content of an EPUB e-book is prepared, all the data and metadata files should be compressed into a ZIP file. The ZIP data compression format was originally created in 1989 by Phil Katz and has subsequently become a common standard for data compression [PKWARE 2011].

The OCF specification includes some special rules concerning the packaging of an EPUB e-book into a ZIP file. The following two rules are the most important. First, the mimetype file must be the first file in the ZIP package and must not be compressed. Second, the ZIP file itself must not be encrypted (the content of the EPUB file may, however, be encrypted, which is declared in an encryption.xml file in the META-INF directory).
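The two packaging rules can also be followed without a separate archiving utility. The following sketch uses Python's standard zipfile module: it writes the mimetype file first and uncompressed, then deflates all the other files. Since the zipfile module cannot create encrypted archives, the second rule is met automatically. The function name and the in-memory mapping of file paths to contents are hypothetical.

```python
import io
import zipfile


def package_epub(files, out):
    """Write an EPUB archive: 'mimetype' first and stored, then all other files deflated.

    files: mapping of archive path -> bytes, excluding 'mimetype'.
    out: a file path or a writable binary file object.
    """
    with zipfile.ZipFile(out, "w") as zf:
        # Rule 1: mimetype comes first and is not compressed.
        zf.writestr("mimetype", b"application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # The remaining files may be compressed; zipfile never encrypts (rule 2).
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
package_epub({"META-INF/container.xml": b"<container/>",
              "OEBPS/title.html": b"<html/>"}, buf)
print(zipfile.ZipFile(io.BytesIO(buf.getvalue())).namelist()[0])  # mimetype
```

Passing a path such as "helloworld.epub" as out writes the archive directly under its final name, so the renaming step described below is not needed.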

Once the ZIP file has been created, all that remains is to change the file extension from ZIP to EPUB. The EPUB e-book is then ready and can be read by EPUB-compatible e-book readers.

Final comments

The example used in this chapter describes the creation of an EPUB file by manually editing each file that makes up the final EPUB e-book. This is the most straightforward approach, requiring only a text editor and an archiving utility that can add uncompressed files to a ZIP package. On a Windows machine, the standard Notepad application is sufficient to prepare the content of an EPUB e-book. The built-in Windows archiving utility, by contrast, is of no use, as it does not allow adding uncompressed files to ZIP files. A more advanced archiving utility is required, such as the freeware 7-Zip.

Editing EPUB files manually can become tedious as the size of the book increases. In this case, dedicated EPUB creation tools can be used to generate EPUB files automatically from other data files. A popular choice is the freeware Calibre, which can convert between many different data formats. In an enterprise environment, the professional-grade Adobe InDesign software can export documents to EPUB files.

Future 3.0 EPUB specification

While the 2.0.1 EPUB specification was released in September 2010, the original EPUB 2.0 version dates back to 2007. To further improve and develop the EPUB file format, a third major release of the EPUB specification is currently being prepared. Although version 3.0 is still a work in progress, drafts of the future specification are already available on the IDPF website (see [IDPF 2011]). This section gives a quick overview of the principal modifications that, according to the drafts, will be included in the new 3.0 EPUB specification.

Structural modifications

The new 3.0 EPUB standard will be based on four specifications. The Open Container Format 3.0, like the current OCF, will define the general structure of EPUB files. The Publications 3.0 specification will succeed the current OPF and will specify the metadata organization of EPUB files. The Content Documents 3.0 specification will replace the current OPS and will define the actual data content of an EPUB file. Finally, the new Media Overlays 3.0 specification will be used to synchronize text data with audio data.

The new 3.0 EPUB specification will also abandon NCX files as a means of providing in-document navigation information. The NCX solution will be replaced by EPUB Navigation Documents, defined in the new Content Documents 3.0 specification.
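For comparison with the NCX file, the sketch below produces a minimal navigation document in the form later standardized in EPUB 3: an XHTML file containing a “nav” element marked with epub:type="toc" and an ordered list of links. Since the drafts discussed here may still change, the exact markup should be treated as an assumption; the function name is hypothetical.

```python
from html import escape


def nav_document(entries, title="Contents"):
    """Return a minimal EPUB 3 navigation document; entries are (label, href) pairs."""
    items = "\n".join(
        f'      <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in entries
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        '  <nav epub:type="toc">\n'
        "    <ol>\n"
        f"{items}\n"
        "    </ol>\n"
        "  </nav>\n"
        "</body>\n"
        "</html>\n"
    )


print(nav_document([("Title", "title.html"), ("Chapter 1", "chapter1.html")]))
```

Unlike the NCX file, this navigation document is itself an ordinary XHTML page, which illustrates why the separate DAISY specification is no longer needed.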

Content modifications

The content of version 3.0 EPUB files will be based on the new HTML5 web standard. EPUB 3.0 files will therefore be able to include audio and video content through the “audio” and “video” elements introduced in HTML5. To facilitate the presentation of such multimedia content, the 3.0 EPUB draft specification allows the use of triggers and scripts to increase users’ interaction possibilities.

Moreover, the 3.0 EPUB specification will support the Mathematical Markup Language (MathML). MathML is an open W3C standard (see [W3C 2011e]) used to express complex mathematical notation. This move will foster the use of EPUB files by the scientific community. On the other hand, the new EPUB specification will drop support for the DAISY DTBook standard, as it overlaps with the possibilities provided by HTML5.

EPUB format evaluation

The EPUB file format has some interesting characteristics that have largely contributed to its growing popularity. This last part of the first chapter evaluates these characteristics critically, pointing out the main strengths and weaknesses of the format.

Web standards

EPUB files are entirely based on the well-known and successful web standards XML, XHTML and CSS. This makes EPUB files easy to create and to handle, as no additional languages have to be learned.

Open standard

The EPUB standard is an open standard developed by an independent organization. The complete specification of EPUB files is publicly available online at no cost. Moreover, all the other standards used by EPUB (i.e. XML, XHTML, CSS and ZIP) are also public and free to use. This means that the EPUB standard can be used by any interested company or individual without legal or economic obstacles.

Multiple specifications

The EPUB file format is formally regulated by three separate specifications. Each specification defines a different aspect of the EPUB standard, but all three are heavily interlinked without any formal hierarchy. Even though regulating different matters in different documents may be a good idea, this separation makes the EPUB specification as a whole much more difficult for new or inexperienced users to understand.

External specifications

Moreover, the EPUB specification relies in part on external specifications that were initially developed for other purposes. Again, while reusing existing specifications can be a good idea, these external references make the EPUB specification less clear and can create coherence problems with native EPUB rules. In this regard, the abandonment of the external NCX specification in the upcoming 3.0 EPUB version is a welcome improvement.

Hardware independent

The EPUB file format was not developed for a specific hardware device. On the contrary, it can be used on any computing device, mobile or stationary, and with any operating system for which EPUB reading software exists. This makes the EPUB file format particularly versatile and portable. A single EPUB file is enough to make an e-book accessible to consumers with a variety of reading devices. In addition, the very same EPUB file can still be used when a consumer decides to change his hardware device or his operating system.

Reflowable content

As EPUB files were not designed for a specific hardware device, they can be displayed on any screen of reasonable size. The text content of an EPUB file is reflowable: it adapts itself to the size of the screen it is displayed on. This is a very important feature, because it makes the very same EPUB file readable on a 20-inch widescreen desktop monitor, on a 10-inch tablet or on a 3-inch mobile phone screen.
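The idea of reflowable text can be illustrated with Python's standard `textwrap` module: the same paragraph is rewrapped for different line widths, so only the line breaks change while the words stay exactly the same. The widths in characters are arbitrary stand-ins for screen sizes; real reading systems also take fonts, margins and user settings into account.

```python
import textwrap

PARAGRAPH = (
    "The text content of an EPUB file is reflowable: it adapts itself "
    "to the size of the screen it is displayed on."
)


def reflow(text: str, width: int) -> list[str]:
    """Break text into lines of at most `width` characters (a stand-in for screen width)."""
    return textwrap.wrap(text, width=width)


for width in (80, 40, 20):  # hypothetical "desktop", "tablet" and "phone" widths
    lines = reflow(PARAGRAPH, width)
    print(f"--- {width} characters per line, {len(lines)} lines ---")
    print("\n".join(lines))
```

A fixed-layout format would instead keep the line breaks of the original page and scale the whole page, which is the behaviour discussed later for PDF.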

Basic layout

However, this versatility comes at a price. To make the text fit on different screen sizes, an EPUB file allows no fixed pagination or precise page layout. Just like on web pages, the text is only divided into headings and paragraphs that are displayed one after another within the space available on the screen. This makes EPUB files practically unsuitable for publications that require precise layout and image positioning, such as comics, albums and other richly illustrated books.

The actual rendering of the content of an EPUB file also depends on the software used by the reader. The same EPUB e-book on the same device might look quite different when viewed with two different reading applications. All this means that EPUB creators have only very limited control over the layout.

Easy automatization

Finally, as emphasized by [Daly 2011], EPUB files are not only user and reader friendly, but also creator and programmer friendly. Given that EPUB files rely on open, simple and well-known web standards, it is relatively easy for developers to create EPUB-specific software. Besides, the actual technical process of creating EPUB files can also be automated, for example when transforming other text files into EPUB e-books.
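As an illustration of how little is needed to automate EPUB creation, the following Python sketch packages plain-text chapters into an EPUB 2-style archive using only the standard library. It follows the OCF packaging rules (an uncompressed `mimetype` entry first, then `META-INF/container.xml` pointing to the OPF package file) and writes a minimal OPF and NCX. It is a simplified sketch, not a validated implementation: the folder name `OEBPS`, the file names, the function `build_epub` and the fixed language code `en` are assumptions made for this example, and a real tool should check its output with a validator such as EpubCheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title: str, book_id: str, chapters: list[tuple[str, str]]) -> bytes:
    """Package (chapter title, plain text) pairs into a minimal EPUB 2-style ZIP archive."""
    manifest, spine, nav_points = [], [], []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # OCF: "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)

        for i, (ch_title, text) in enumerate(chapters, start=1):
            href = f"chapter{i}.xhtml"
            # Blank lines in the input text are treated as paragraph breaks.
            paras = "".join(f"<p>{escape(p.strip())}</p>" for p in text.split("\n\n"))
            zf.writestr(
                f"OEBPS/{href}",
                '<?xml version="1.0" encoding="UTF-8"?>\n'
                '<html xmlns="http://www.w3.org/1999/xhtml">'
                f"<head><title>{escape(ch_title)}</title></head>"
                f"<body><h1>{escape(ch_title)}</h1>{paras}</body></html>",
            )
            manifest.append(f'<item id="ch{i}" href="{href}" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="ch{i}"/>')
            nav_points.append(
                f'<navPoint id="np{i}" playOrder="{i}"><navLabel><text>{escape(ch_title)}'
                f'</text></navLabel><content src="{href}"/></navPoint>'
            )

        manifest_xml, spine_xml, nav_xml = "".join(manifest), "".join(spine), "".join(nav_points)
        # OPF package file: Dublin Core metadata, manifest of files, reading order (spine).
        zf.writestr(
            "OEBPS/content.opf",
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">'
            '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
            f"<dc:title>{escape(title)}</dc:title><dc:language>en</dc:language>"
            f'<dc:identifier id="BookId">{escape(book_id)}</dc:identifier></metadata>'
            '<manifest><item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>'
            f'{manifest_xml}</manifest><spine toc="ncx">{spine_xml}</spine></package>',
        )
        # NCX navigation file, used by EPUB 2 for the table of contents.
        zf.writestr(
            "OEBPS/toc.ncx",
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">'
            f'<head><meta name="dtb:uid" content={quoteattr(book_id)}/></head>'
            f"<docTitle><text>{escape(title)}</text></docTitle>"
            f"<navMap>{nav_xml}</navMap></ncx>",
        )
    return buf.getvalue()
```

For example, `build_epub("Demo", "demo-book-1", [("Chapter 1", "First paragraph.\n\nSecond paragraph.")])` returns the bytes of a small archive that can be written to a file with the `.epub` extension. The same function could be called in a loop over many source texts, which is the kind of automation the paragraph above refers to.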

CHAPTER TWO

E-publishing

The first chapter of this paper presents the technical aspects of working with EPUB files. The aim of the second chapter is to put the usage of EPUB files into a broader economic context, showing the place of EPUB files in the e-publishing domain. Before analyzing the actual role of EPUB files in practical business conditions, it is necessary to explain what e-publishing actually is.

According to the online version of the Longman Dictionary of Contemporary English [LDOCE 2011], the noun "publishing" describes the business of producing books and magazines. Still according to [LDOCE 2011], "desktop publishing" is the work of arranging the writing and pictures for a magazine or a book using a computer and special software, whereas "e-publishing" is the business of producing books or magazines that are designed to be read using a computer.

These linguistic definitions require further commentary. While it would be difficult to question the definition of the publishing industry itself, the sharp division between desktop publishing and e-publishing is more problematic. It is true that computers were first used in the publishing industry simply to prepare the text for the subsequent printing process (see [Spring 1991, p. 42, 44]). It is also true that the digital display of longer text documents probably evolved from web services and is not directly linked with the software used to prepare printed books. However, modern publishing and text processing software can be used for both purposes, which makes the strict distinction between desktop publishing and e-publishing somewhat artificial.

For this reason and in accordance with [Spring 1991, p. 50], this paper will assume that e-publishing covers the whole process of preparing a publication for distribution that involves computers and computer software, takes digital input and produces printed or digital output, depending on the choice of the publishing house and / or the author.

E-publishing value chain

Following this working definition, e-publishing can be seen as a series of steps that begins with some digital text input and ends with a printed or electronic publication being delivered to the final customer. This whole publishing process is made of several consecutive actions that are all aimed at providing the reader with the publication written by the author. At each step of the production process additional value is added to the product, such that the final product can be sold to the customer at a price higher than the sum of the separate production inputs.

In economics in general, and in electronic business in particular, such a series of production steps that increase the value of the product is called a value chain. The notion of value chain was introduced by Michael Porter and is currently used in a wide variety of markets to describe and analyze the production process of goods and services (see [Porter 1999, p. 49; Hansen / Neumann 2005, p. 614, 615]).

The value chain in the publishing market has already been analyzed by [Hansen / Neumann 2005, p. 616] and is depicted in Figure VII.


Figure VII. Publishing value chain according to [Hansen / Neumann 2005, p. 616].

It can clearly be seen that the publishing value chain in Figure VII only takes into account printed books sold to the reader through a series of intermediaries. In order to adapt this classic view to the modern e-publishing value chain, the diagram should also include e-books and Internet distribution.


Figure VIII. E-publishing value chain.

Figure VIII presents the publishing value chain modified to depict modern e-publishing. Instead of a single output, two separate outputs are possible. The author and the publishing house can produce a printed version of the book, an electronic version of the book (e-book), or both simultaneously. Moreover, two new distribution channels are also available. Whereas classic bookstores can only sell printed books, online vendors can offer both printed books and e-books. Finally, the publishing house can have its own direct distribution, selling both printed and electronic books directly to final customers. This is so-called hybrid distribution, using both offline and online distribution channels (see [Meier / Stormer 2008, p. 134]).

In this modern e-publishing value chain, computers and software are used at virtually all production steps. But the most important function of computer software in e-publishing is to transform the author's text input into a final output ready to be printed or displayed on the user's screen. From this point of view, the data flow in the e-publishing value chain can be divided into three main parts.

Figure IX. Data flow in e-publishing value chain.

Figure IX presents the data flow in the e-publishing value chain as well as the corresponding data formats and / or activities. In the Upstream part of the data flow, the input data from the author and from other sources is collected and stored for further treatment. The input data usually comes in the form of word processing files or of previously published material that is reused for a new publication. In the Transformation part of the data flow, the input data is joined, enhanced and finally converted into the desired output. This output is then used in the Downstream part of the data flow, where the output data is delivered to the printer or directly to the final customer. Depending on the publishing house's choice, the data is transformed into a high quality file suitable for printing or into a data format that can be delivered directly to e-book readers.

EPUB files in e-publishing value chain

According to this presentation of the data flow in the e-publishing value chain, the EPUB file format clearly belongs to the Downstream part of the data treatment process. This means that EPUB files are produced from the input data during the Transformation phase and are then immediately ready to be delivered to final customers.

Given the characteristics of EPUB files presented in chapter one, it is now interesting to see how the potential of EPUB files can best be used in this specific context of the e-publishing value chain. In order to make the advantages of EPUB files explicit, the following analysis examines production, distribution and pricing in depth.

Production

Unlike printed books, an EPUB e-book is a so-called digital good [Hansen / Neumann 2005, p. 626]. As such, an EPUB file can be copied an unlimited number of times, each copy being identical to the original. From the production point of view, this means that the publishing house actually produces only one single EPUB file that can later be copied without limits, at practically no cost and in very little time. Economically speaking, the production of an additional EPUB e-book has practically no marginal cost: the production cost of ten EPUB copies is virtually equal to the production cost of ten thousand copies. This is in strong contrast to the production of printed books, where the variable cost rises with the number of printed books and can eventually compromise the possible profit. Consequently, the production planning of EPUB e-books is much simpler than that of printed books, because only the fixed cost of preparing the first e-book needs to be taken into account.
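The cost argument can be made concrete with a simple linear cost model: total cost = fixed cost + unit cost × number of copies. The figures below are purely hypothetical (a fixed preparation cost of 10,000 and a printing cost of 3 per copy), and the EPUB unit cost is set to zero, following the idealization in the text that additional copies cost nothing to produce.

```python
def total_cost(fixed: float, unit_cost: float, copies: int) -> float:
    """Linear cost model: fixed preparation cost plus a variable cost per copy."""
    return fixed + unit_cost * copies


def average_cost(fixed: float, unit_cost: float, copies: int) -> float:
    """Cost per copy; it falls towards unit_cost as the number of copies grows."""
    return total_cost(fixed, unit_cost, copies) / copies


FIXED = 10_000.0   # hypothetical cost of preparing the first copy
PRINT_UNIT = 3.0   # hypothetical printing cost per paper copy
EPUB_UNIT = 0.0    # idealization from the text: an extra EPUB copy costs nothing

for n in (10, 10_000):
    print(n, total_cost(FIXED, PRINT_UNIT, n), total_cost(FIXED, EPUB_UNIT, n))
```

With these numbers, 10 copies cost 10,030 in print and 10,000 as EPUB, while 10,000 copies cost 40,000 in print and still 10,000 as EPUB. Only the fixed cost has to be planned for the electronic edition.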

Moreover, the quantity of printed books has to be decided in advance, based on an uncertain and imperfect forecast of future demand. This is known as make to stock production, where the whole quantity of books is printed in advance, then stored and subsequently sold to the readers (see [Meier / Stormer 2008, p. 138; Jacobs et al. 2009, p. 165]). Make to stock production has a serious disadvantage: the actual demand for the printed books can turn out to be drastically different from what was expected, leading to shortages or to stocks of unsold books. In the case of EPUB e-books, the problem of production quantity does not even exist. The publishing house actually produces only the first e-book, and the following copies are made on the fly at the very moment readers buy the e-book.
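The risk of make to stock production can be expressed as a simple comparison between the quantity printed in advance and the demand that actually materializes. The sketch below uses hypothetical numbers and a hypothetical helper `stock_outcome`; for EPUB e-books made on the fly, the model simply sets the produced quantity equal to the realized demand.

```python
def stock_outcome(printed: int, demand: int) -> tuple[int, int]:
    """Return (unsold copies, unmet demand) for a quantity decided in advance."""
    unsold = max(printed - demand, 0)
    shortage = max(demand - printed, 0)
    return unsold, shortage


# Hypothetical print run of 5,000 copies against two possible demand outcomes.
print(stock_outcome(5_000, 3_200))  # -> (1800, 0): demand lower than expected
print(stock_outcome(5_000, 7_000))  # -> (0, 2000): demand higher than expected

# E-books copied at the moment of purchase: the "print run" equals the demand.
print(stock_outcome(3_200, 3_200))  # -> (0, 0)
```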

But the most impressive advantage of EPUB e-books is their flexibility and versatility compared to printed books. As a matter of fact, each publication created as a printed book comes in one single variant. This means that all customers receive the same content, the only possible option being the choice between paperback and hardcover. This is due to the fact that paper books have to be prepared and produced entirely in advance, trying to match the quantity with an unpredictable demand. This obviously prevents the publishing house from preparing different content versions of the same book, as doing so would greatly complicate production and quantity decisions.

As the variable cost and the quantity choice do not apply to EPUB e-books, EPUB files allow the creation of multiple variants of the same product without compromising the overall production costs. Coupled with the possible automation of EPUB file creation, this can lead to a highly customizable production process. In fact, it can easily be imagined that several different EPUB files could be made from the same input provided by the Upstream part of the e-publishing value chain. These versions could offer slightly different content and various functionalities; for example, the simplest version could only offer the plain text of the book, whereas the advanced version could also contain several indexes (names, places, keywords etc.) directly linked to the corresponding text fragments. In this manner, readers could easily choose the version that best suits their personal needs.
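Deriving several versions from the same Upstream input is straightforward to automate. The following sketch produces a plain version and an advanced version with a keyword index from the same list of paragraphs; the function name `build_variants`, the dictionary layout and the simple substring matching are illustrative assumptions, not a method prescribed by the EPUB specification.

```python
def build_variants(paragraphs: list[str], keywords: list[str]) -> dict[str, dict]:
    """Derive a plain and an indexed version of a book from the same source paragraphs."""
    # Keyword index: keyword -> numbers of the paragraphs mentioning it (case-insensitive).
    index = {
        kw: [i for i, p in enumerate(paragraphs, start=1) if kw.lower() in p.lower()]
        for kw in keywords
    }
    return {
        "basic": {"text": list(paragraphs)},
        "advanced": {"text": list(paragraphs), "index": index},
    }


source = [
    "Gutenberg introduced movable type in Mainz.",
    "The printing press spread across Europe.",
    "Mainz remained a centre of printing.",
]
variants = build_variants(source, ["Mainz", "printing"])
print(variants["advanced"]["index"])  # {'Mainz': [1, 3], 'printing': [2, 3]}
```

In an actual EPUB file, each index entry would become a hyperlink to an anchor in the corresponding XHTML paragraph; each variant would then be packaged as a separate EPUB file.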

An even further step would be to offer fully personalized EPUB e-book production on the client's demand. These customization possibilities could change the production process from the make to stock type to the make to order type, where a product can be uniquely designed to suit a single customer's needs (see [Meier / Stormer 2008, p. 138; Jacobs et al. 2009, p. 165]). This could be especially interesting in the case of compilations of already existing texts and publications, uniquely chosen and arranged by the client and possibly automatically indexed and linked within one single EPUB file. This approach could take e-publishing from the mass production of uncustomized printed books to mass customization, giving each reader a unique e-book of his own.
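A make to order compilation could be assembled in a similarly automated way: the client picks texts from an existing catalogue, and a numbered table of contents is generated from the chosen order. The catalogue structure, the identifiers and the function `compile_on_demand` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    title: str
    body: str


def compile_on_demand(catalogue: dict[str, Text], selection: list[str]) -> tuple[list[str], list[Text]]:
    """Assemble the texts chosen by one client, in the client's order, with a numbered TOC."""
    missing = [key for key in selection if key not in catalogue]
    if missing:
        raise KeyError(f"unknown catalogue entries: {missing}")
    texts = [catalogue[key] for key in selection]
    toc = [f"{n}. {t.title}" for n, t in enumerate(texts, start=1)]
    return toc, texts


catalogue = {
    "a1": Text("On Printing", "Text of the first essay."),
    "b7": Text("On Reading Devices", "Text of the second essay."),
    "c3": Text("On Libraries", "Text of the third essay."),
}
toc, texts = compile_on_demand(catalogue, ["c3", "a1"])
print(toc)  # ['1. On Libraries', '2. On Printing']
```

The resulting list of texts and table of contents would then be handed to whatever tool packages the final EPUB file.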

Distribution

The classic distribution of printed books relies on a series of intermediaries between the publishing house and the final customer. Each intermediary has its own stock of books that has to be managed and replenished depending on an uncertain future demand. Most intermediaries use the push replenishment method, i.e. they order a chosen number of books in advance, hoping that this quantity will correspond to the future demand. In this manner, the intermediaries run both the risk of being left with unsold books and the risk of not being able to satisfy the customers’ demand. An alternative solution is the pull replenishment method, in which a book is only ordered when it is actually bought by a customer (see [Jacobs et al. 2009, p. 404]). However, this second solution is not perfect either, because in this case it is the customer who takes the risk of waiting an excessive amount of time for the order to be completed.

In contrast, using EPUB files in the e-publishing value chain requires no inventory management at all. One single EPUB file stored by the publishing house can be copied and delivered via the Internet to any intermediary or directly to any customer, at any time and without specific transportation costs. Consequently, the distribution of EPUB files requires no inventory investment and no specific demand forecasting. Moreover, the distribution of EPUB files is independent of the geographical and transportation constraints present in the distribution of printed books. EPUB files can be downloaded via the Internet and made available in any market in the world, or accessed by users from any place, without the need to physically transport a printed copy. Finally, the distribution system of EPUB files is extremely flexible, as it allows publishing houses to take full advantage of modern online distribution. For instance, publishing houses can opt for their own direct distribution by means of disintermediation or can choose indirect distribution and provide their EPUB e-books to online aggregators (see [Meier / Stormer 2008, p. 31, 37]).

Pricing

Just like distribution, the pricing of EPUB e-books has several advantages compared to traditional printed books. First of all, setting the price of an EPUB e-book is significantly easier because, as stated above, e-books have no marginal cost. Consequently, the price of an EPUB e-book should simply aim to maximize revenue, without taking the variable production cost into account. This is a welcome simplification in comparison with the pricing of printed books, where the interaction between price, demand and marginal cost has to be taken into account explicitly.
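The effect of a zero marginal cost on pricing can be illustrated with the textbook case of a linear demand curve q = a − b·p. Setting the derivative of the margin (p − c)·(a − b·p) to zero gives the optimal price p* = (a + b·c) / (2b), which reduces to the revenue-maximizing price a / (2b) when the marginal cost c is zero. The linear demand curve and all parameter values are assumptions chosen for illustration only.

```python
def optimal_price(a: float, b: float, marginal_cost: float = 0.0) -> float:
    """Profit-maximizing price for linear demand q = a - b*p and a constant marginal cost.

    Setting d/dp [(p - c) * (a - b*p)] = 0 gives p* = (a + b*c) / (2*b).
    """
    return (a + b * marginal_cost) / (2 * b)


def demand(a: float, b: float, price: float) -> float:
    """Number of buyers at a given price under the linear demand assumption."""
    return max(a - b * price, 0.0)


# Hypothetical market: at most 1,000 buyers, 50 fewer buyers per extra currency unit.
A, B = 1_000.0, 50.0
p_ebook = optimal_price(A, B)         # zero marginal cost
p_print = optimal_price(A, B, 4.0)    # hypothetical printing cost of 4 per copy
print(p_ebook, demand(A, B, p_ebook))  # 10.0 500.0
print(p_print, demand(A, B, p_print))  # 12.0 400.0
```

In this model the e-book price depends only on the demand parameters, whereas the printed edition's price also has to reflect the cost of each additional copy.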

Furthermore, the possibility to produce different versions of one book allows the use of advanced pricing policies known as price differentiation. According to [Phillips 2005; Meier / Stormer 2008, p. 51], price differentiation consists of setting different prices for different versions of the same product in order to appeal to several customer segments with different price sensitivity and different willingness to pay. This approach has advantages for both publishing houses and e-book buyers. On the one hand, sales revenue increases because additional customer segments are now willing to buy e-books. On the other hand, customer satisfaction also increases because each client can choose the version whose price suits his willingness to pay. Consequently, selling different versions of EPUB e-books at different prices is a true win-win situation for publishing houses and readers.
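A small numerical example shows why versioning can raise revenue. The sketch assumes two hypothetical customer segments with different willingness to pay, and it assumes that each segment buys the version designed for it (for instance, because only the advanced version contains the indexes that professional readers need). Prices are set at each segment's maximum willingness to pay, which is an idealization.

```python
Segment = tuple[int, float]  # (number of customers, willingness to pay)


def best_single_price(segments: list[Segment]) -> tuple[float, float]:
    """Try each segment's willingness to pay as the one price; return (revenue, price)."""
    best = (0.0, 0.0)
    for _, price in segments:
        buyers = sum(n for n, wtp in segments if wtp >= price)
        best = max(best, (price * buyers, price))
    return best


def versioned_revenue(segments: list[Segment]) -> float:
    """Each segment buys its own version at a price equal to its willingness to pay."""
    return sum(n * wtp for n, wtp in segments)


# Hypothetical market: 700 students (basic version) and 300 professionals (advanced version).
market = [(700, 8.0), (300, 20.0)]
print(best_single_price(market))  # (8000.0, 8.0)
print(versioned_revenue(market))  # 11600.0
```

With a single price, the best option in this example is 8 for everyone (revenue 8,000); with two versions priced at 8 and 20, revenue rises to 11,600.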

Comparison with other file formats

The previous section has clearly shown that using EPUB e-books in the e-publishing value chain has several important advantages over traditional printed books. But it should be recalled that the EPUB file format is not the only e-book file format that can be used in the e-publishing value chain. Other e-book file formats exist, each one having its own specific advantages and being suitable for specific tasks. In order to complete the overview of the e-publishing value chain, it is thus useful to compare EPUB files directly with other popular file formats. This paper will provide a comparison of EPUB files with two file formats, namely PDF and AZW, but the following remarks can easily be extended to other file formats with similar characteristics.

EPUB and PDF

PDF, which stands for Portable Document Format, is a file format that was originally created by Adobe in 1993 as a proprietary file format. In 2007 Adobe decided to release the PDF specification for formal standardization, and in 2008 the PDF file format was published as the ISO 32000 standard. The PDF specification and further information on the PDF file format are available online at [Adobe 2011a; Adobe 2011b].

According to [Daly 2011], the main characteristics of the PDF file format are that it is page-oriented and that it provides very precise layout control. Moreover, the display of PDF files is software and hardware independent, in the sense that a PDF document looks the same in any compatible PDF reader on any hardware device. This is in strong contrast with the EPUB file format, which is reflowable, provides only limited layout possibilities and can be rendered differently on different reader devices. PDF files are therefore far better suited for delivering high quality publications that require very precise element spacing or include many image elements that have to be precisely positioned.

However, the downside of the PDF file format is that it is quite difficult to read and use on small to medium size screens, such as smartphones or very small tablet computers. The reason is that the size of text elements in PDF files is fixed relative to the page size of the document, not to the size of the screen of the reader device. The user is left with the difficult choice between viewing the whole page with very small text characters or viewing only a magnified part of the page that has to be moved around all the time. In this context, EPUB files are much more flexible and can be adapted to and comfortably viewed on any screen size. Furthermore, PDF files typically carry little information about the logical structure of their content, which makes them extremely difficult to convert to other file formats. In contrast, the content of EPUB files is structured and described by XML and XHTML tags, which makes it relatively easy to convert automatically.
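The claim that structured XHTML content is easy to convert can be illustrated with a short sketch that reads the headings and paragraphs of an XHTML chapter with Python's `xml.etree.ElementTree` and turns them into plain text with simple Markdown-style heading markers. The sample chapter, the output format and the function name `xhtml_to_text` are arbitrary choices for this example; a real converter would have to handle many more elements.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body><h1>Chapter 1</h1><p>First paragraph.</p><h2>A section</h2><p>Second paragraph.</p></body>
</html>"""


def xhtml_to_text(xhtml: str) -> str:
    """Convert the h1/h2/p children of an XHTML body into Markdown-like plain text."""
    body = ET.fromstring(xhtml).find(f"{XHTML_NS}body")
    if body is None:
        return ""
    out = []
    for el in body:
        tag = el.tag.removeprefix(XHTML_NS)
        text = "".join(el.itertext()).strip()
        if tag in ("h1", "h2"):
            out.append("#" * int(tag[1]) + " " + text)  # heading level taken from the tag
        elif tag == "p":
            out.append(text)
    return "\n\n".join(out)


print(xhtml_to_text(CHAPTER))
```

The conversion works because the tags already state which text is a heading and which is a paragraph; with a page-oriented format, this structure would usually have to be guessed from font sizes and positions on the page.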

One final difference is that PDF files, owing to their exact layout, can be used for high quality professional printing. This means that a single PDF file, once created in the e-publishing value chain, can be used both for printing and for display on computer screens. This is an important simplification, as it eliminates the need to create and convert multiple file formats of the same publication. But once again, this simplification comes at a price. The professional creation of PDF files usually requires professional-grade, often expensive software, whereas EPUB files can easily be created with open source tools or even self-made software solutions.

EPUB and AZW

AZW is the file format used by Amazon for its Kindle e-book reader. According to [Buchanan 2010], the AZW file format is based on the MOBI file format, originally developed by the Mobipocket company and itself similar to the Open eBook specification (see [Mobipocket 2008] for more technical details). The Mobipocket company was subsequently acquired by Amazon, and the MOBI file format was modified and transformed into the AZW file format. The AZW file format is now a proprietary format of Amazon and there is no public specification available.

This closed and proprietary character of AZW files is the main difference compared to EPUB files. As far as the technical dimension is concerned, both file types probably have quite a lot in common, as they both stem from the Open eBook specification. In practice, the difference is that EPUB files can be freely created by any interested person or company, whereas AZW files can only be made by Amazon. Moreover, EPUB files can be read on any device with any software that respects the open specification, while AZW files can only be read with hardware or software expressly provided by Amazon (such as the Kindle reader or the Kindle iPhone app).

The open or closed character of e-book files has some obvious implications for the creation and distribution of such e-books. From the author’s point of view, open e-books can be opened and read by any potential reader with any reading software, whereas closed e-books are available only to those who have previously acquired a corresponding dedicated reading device. From the publishing house’s point of view, open e-books can be prepared with any (possibly open source) software and can be distributed to any intermediary, whereas closed e-books require proprietary software and impose a very limited choice of compatible distributors. From the reader’s point of view, open e-books can be acquired from many independent sources and can freely be transferred between hardware devices or software readers. From the distributor’s point of view, however, closed e-books mean exclusivity of distributed titles and a very strong lock-in of clients who have already acquired a dedicated reading system.

It appears that open e-book files like EPUB are much easier to distribute and can reach a much wider population of clients. They are advantageous for authors, publishing houses and clients, giving them freedom of choice and independence from any particular commercial hardware or software system. Closed e-book files like AZW, on the contrary, are advantageous only to large online distributors and aggregators. Companies like Amazon can use closed file formats to ensure distribution exclusivity and to bind clients to their own hardware or software e-book readers.

CONCLUSION

Main findings

The EPUB file format is an open file format developed by an independent organization, the IDPF. This file format is based on three separate specifications (OPS, OPF and OCF) that, combined, describe the structure and the content of an EPUB file. In fact, an EPUB file is a ZIP file that contains the metadata and the data of an e-book. The metadata is stored in XML files constructed according to specified standards, whereas the actual text of the e-book is stored in XHTML files. Additional data (such as sound or images) can also be included. The EPUB structure is entirely based on open and well-known web standards, thus allowing easy and automated creation of EPUB e-books. Such e-books can be read on any operating system and any hardware device and can adapt to various screen sizes. However, EPUB e-books offer only limited layout possibilities and can be displayed differently on different hardware and software readers.

EPUB files are used in the e-publishing value chain, i.e. in a series of consecutive activities that lead to the creation of a printed or an electronic publication. The modern e-publishing value chain allows the parallel creation of both printed and electronic books that can be distributed through online or offline distribution channels. Due to their flexibility and easy automation, EPUB files have several important advantages over classical printed books. EPUB e-books are digital goods with no marginal production cost, and they require no inventory management. They can easily be produced in different versions with different prices, enabling the use of mass customization and revenue management in order to increase customer satisfaction and sales revenue. But the EPUB file format is not the only e-book file format. Other e-book file formats exist, each one having its own advantages. For instance, the PDF file format is more appropriate for high quality publications with complex layout and graphics, whereas the AZW file format is used by Amazon to increase the loyalty of its clients.

Critical assessment

The objective of this paper was to analyze one specific e-book file format and to show its place in the e-publishing value chain. But the e-book phenomenon is a very complex one and cannot be reduced to a single question of file formats. Other important dimensions, such as legal and economic aspects, sociological changes, environmental impact etc. should also be taken into account. All these complex topics should be treated in detail in order to fully apprehend the e-book market.

In addition, this paper has only analyzed the current 2.0.1 version of the EPUB specification. The next version of this specification, 3.0, is already being prepared and will probably replace the 2.0.1 specification in the near future. This will obviously make some technical parts of this paper obsolete.

Outlook

In order to fully understand the e-book phenomenon, a wide range of further specific studies would have to be undertaken. These future studies might, for example, analyze the Swiss and the European e-book market, legal and technical obstacles to the widespread adoption of e-books, sociological changes that can lead to a wider acceptance of e-books, or the environmental impact of shifting from printed to electronic books.

More specifically, this paper should be updated once the new 3.0 EPUB specification has been adopted by the IDPF.

REFERENCES

[AAP 2011] Association of American Publishers: AAP Publishers Report Strong Growth in Year-to-Year, Year-End Book Sales, available: http://www.publishers.org/press/24/, accessed 16th May 2011

[Adobe 2011a] Adobe Systems Incorporated: Adobe PDF 101 – Quick overview of PDF file format, available: http://partners.adobe.com/public/developer/tips/topic_tip31.html, accessed 7th May 2011

[Adobe 2011b] Adobe Systems Incorporated: PDF Reference and Adobe Extensions to the PDF Specification, available: http://www.adobe.com/devnet/pdf/pdf_reference.html, accessed 7th May 2011

[Alpeyev / Miller 2011] Alpeyev, Pavel; Miller, Hugo: Android Tablets Gain on Apple IPad in Fourth Quarter, version 31st January 2011, available: http://www.bloomberg.com/news/2011-01-31/android-tablets-gain-on-ipad-in-fourth-quarter-researcher-says.html, accessed 26th March 2011

[Buchanan 2010] Buchanan, Matt: Giz Explains: How You’re Gonna Get Screwed By Ebook Formats, version 10th March 2010, available: http://gizmodo.com/#!5478842/giz-explains-how-youre-gonna-get-screwed-by-ebook-formats, accessed 7th May 2011

[Carnoy 2010a] Carnoy, David: What Amazon didn’t say about e-books, version 20th July 2010, available: http://reviews.cnet.com/8301-18438_7-20011038-82.html, accessed 26th March 2011

[Carnoy 2010b] Carnoy, David: Amazon: we have 70-80 percent of e-book market, version 2nd August 2010, available: http://reviews.cnet.com/8301-18438_7-20012381-82.html, accessed 26th March 2011

[Carnoy 2011] Carnoy, David: B&N: Nook has 25 percent of U.S. e-book market, version 23rd February 2011, available: http://news.cnet.com/8301-17938_105-20035277-1.html, accessed 26th March 2011

[DAISY 2011] DAISY Consortium, available: http://www.daisy.org, accessed 9th April 2011

[Daly 2011] Daly, Liza: Build a digital book with EPUB, version 11th January 2011, available: http://www.ibm.com/developerworks/xml/tutorials/x-epubtut/index.html, accessed 16th March 2011

[Dublin Core 2011] Dublin Core Metadata Initiative, available: http://dublincore.org, accessed 6th April 2011

[Gutenberg 2011] Project Gutenberg, available: http://www.gutenberg.org, accessed 16th March 2011

[Hamblen 2010a] Hamblen, Matt: Google launches eBooks, eBookstore, version 6th December 2010, available: http://www.computerworld.com/s/article/9199599/Google_launches_eBooks_eBookstore, accessed 26th March 2011

[Hamblen 2010b] Hamblen, Matt: Hot e-reader sales will continue into 2011, Gartner says, version 8th December 2010, available: http://www.computerworld.com/s/article/9200525/Hot_e_reader_sales_will_continue_into_2011_Gartner_says, accessed 26th March 2011

[Hansen / Neumann 2005] Hansen, Hans Robert; Neumann, Gustaf: Wirtschaftsinformatik, 9th edition, Lucius & Lucius, Stuttgart 2005

[Hendrickson 2011] Hendrickson, Mike: 2010 State of the Computer Book Market, Post 5 – Wrap-Up and Digital, version 23rd February 2011, available: http://radar.oreilly.com/print/2011/02/2010-book-market-5.html, accessed 26th March 2011

[Jacobs et al. 2009] Jacobs, F. Robert; Chase, Richard B.; Aquilano, Nicholas J.: Operations & supply management, 12th edition, McGraw-Hill, Boston et al. 2009

[LDOCE 2011] Longman: Longman Dictionary of Contemporary English, available: http://www.ldoceonline.com, accessed 22nd April 2011

[Lebert 2009] Lebert, Marie: A Short History of eBooks, version 2009, available: http://www.etudes-francaises.net/dossiers/ebook.htm, accessed 26th March 2011

[Lebert 2008] Lebert, Marie: Technology and Books for All, version 2008, available: http://www.etudes-francaises.net/dossiers/technologies.htm, accessed 26th March 2011

[Lebert 2007] Lebert, Marie: Les mutations du livre à l’heure numérique, version September 2007, available: http://www.etudes-francaises.net/dossiers/mutations.htm, accessed 26th March 2011

[Meier / Stormer 2008] Meier, Andreas; Stormer, Henrik: eBusiness & eCommerce, 2nd edition, Springer, Berlin Heidelberg 2008

[Mobipocket 2008] Mobipocket: Welcome to Mobipocket Developer Center, version 24th April 2008, available: http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=mobiformat.htm, accessed 7th May 2011

[NISO 2011] NISO: The DAISY Standard, available: http://www.niso.org/workrooms/daisy/, accessed 9th April 2011

[OASIS 2011] OASIS: Open Document Format for Office Applications, available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office, accessed 3rd April 2011

[Phillips 2005] Phillips, Robert L.: Pricing and revenue optimization, Stanford University Press, Stanford 2005

[PKWARE 2011] PKWARE: Our Founder – Phil Katz, available: http://www.pkware.com/about-us/phil-katz, accessed 11th May 2011

[Porter 1999] Porter, Michael E.: L'avantage concurrentiel, Dunod, Paris 1999

[Sheldon 2001] Sheldon, Tom: McGraw-Hill Encyclopedia of networking & telecommunications, McGraw-Hill, New York et al. 2001

[Spring 1991] Spring, Michael B.: Electronic Printing and Publishing, Marcel Dekker, Inc., New York et al. 1991

[W3C 2011a] World Wide Web Consortium: XHTML™ 1.1 – Module-based XHTML – Second Edition, version 23rd November 2010, available: http://www.w3.org/TR/xhtml11/, accessed 11th April 2011

[W3C 2011b] World Wide Web Consortium: Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification, version 7th December 2010, available: http://www.w3.org/TR/CSS2/, accessed 11th April 2011

[W3C 2011c] World Wide Web Consortium: Scalable Vector Graphics (SVG), available: http://www.w3.org/Graphics/SVG/, accessed 11th April 2011

[W3C 2011d] World Wide Web Consortium: Extensible Markup Language (XML), available: http://www.w3.org/XML/, accessed 11th May 2011

[W3C 2011e] World Wide Web Consortium: Mathematical Markup Language (MathML) Version 3.0, version 21st October 2010, available: http://www.w3.org/TR/MathML/, accessed 16th April 2011

[Winograd 2010] Winograd, David: The iBookstore six months after launch: One big failure, version 14th October 2010, available: http://www.tuaw.com/2010/10/14/the-ibookstore-six-months-after-launch-one-big-failure/, accessed 26th March 2011