An Ontology-Based Approach to Designing Information Architecture of Websites


Fedor Bakalov

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Information Management

Examination Committee: Prof. Vilas Wuwongse (Chairperson), Dr. Vatcharaporn Esichaikul, Dr. Paul Janecek

Nationality: Kyrgyz
Previous Degree: Bachelor of Engineering and Technology in Information and Computer Technologies, Kyrgyz National University, Kyrgyzstan

Scholarship Donor: ADB-Japan Scholarship Program (ADB-JSP)

Asian Institute of Technology
School of Engineering and Technology
Thailand
August 2007

ACKNOWLEDGEMENTS

Many people have contributed to the existence of this paper. First, I would like to thank my advisors from the Asian Institute of Technology (AIT), Prof. Vilas Wuwongse, Dr. Vatcharaporn Esichaikul, and Dr. Paul Janecek, for their valuable help and support in conducting this research.

I would also like to express my gratitude to the Digital Enterprise Research Institute (DERI) in Innsbruck for providing me with wonderful facilities and a research environment during my thesis work. In particular, I want to thank Dr. Ying Ding and Prof. Martin Hepp for their helpful suggestions and comments.

I also wish to thank the Asian Development Bank for providing me with the scholarship that allowed me to study at AIT, and the EastWeb project for giving me the opportunity to conduct my research at DERI.

Finally, I am grateful to the administrative staff of both institutions, AIT and DERI, for assisting me in many different ways. I thank my family, friends, and colleagues for their emotional support and encouragement.

ABSTRACT

The number of websites on the Internet is growing tremendously, and so is the complexity of website structure and features. This affects the requirements we set for the organization and presentation of information on Web resources. Two new disciplines, Information Architecture and the Semantic Web, have recently emerged to address different aspects of resources on the Web. Information Architecture establishes the practices and guidelines for defining users and designing the information structure, navigation schemes, and appearance of websites. The Semantic Web has created instruments, ontologies, that enable Web developers to provide machine-readable semantics for website content.

In this paper, an ontology-based approach to designing information architecture of websites has been proposed. The approach harnesses the expertise of information architects in designing high quality websites as well as the new conception of defining machine-processable semantics of information resources using ontologies, which was formulated by the Semantic Web. The approach empowers Web developers with the methodology and instruments for conceptual modeling of the user aspects, information structure, navigation, and layout of Web resources as well as the mechanisms for automatic generation of websites based on the created models.

TABLE OF CONTENTS

Title Page
Acknowledgements
Abstract
Table of Contents
List of Figures

1 Introduction
1.1 Background
1.2 Statement of the Problem
1.3 Objectives
1.4 Scope of Study

2 Literature Review
2.1 Website Development Process
2.2 Information Architecture
2.3 Website Modeling Approaches
2.4 Ontologies
2.5 Ontology-Based Approaches for Website Design
2.6 Website Evaluation Methods
2.7 Summary

3 Methodology
3.1 Analysis Phase
3.2 Development Phase
3.3 Evaluation Phase

4 Implementation
4.1 Website Design Methodology
4.2 Upper-level Ontologies
4.3 Plone Website Generator
4.4 User-case for AIT Website

5 Evaluation
5.1 Evaluation Criteria
5.2 Comparison
5.3 Discussion of the comparison results

6 Conclusions and Future Research
6.1 Conclusions
6.2 Future Research

References

LIST OF FIGURES

Figure 2.2.1 RMM design methodology
Figure 2.2.2 Summary of the OOHDM methodology
Figure 2.2.3 Lifecycle of a Web application (Fraternali approach)
Figure 2.2.4 RMM primitives
Figure 2.2.5 Visual vocabulary
Figure 2.2.6 OntoWebber system architecture
Figure 2.2.7 OntoWebber site models and their relationships
Figure 2.2.8 WISE system architecture
Figure 4.1. Website design methodology
Figure 4.2. Hierarchy of the Presentation Upper-level Ontology
Figure 4.3. Composition of Pages
Figure 4.4. Architecture of Plone Website Generator
Figure 4.5. Structure of Plone product
Figure 4.6. Directory structure of Plone product
Figure 4.7. A user in the User-Task Model for the AIT website
Figure 4.8. An information class in the Information Model for the AIT website
Figure 4.9. A node in the Navigation Model for the AIT website
Figure 4.10. A path in Navigation Model for the AIT website
Figure 4.11. A view in the Presentation Model for the AIT website
Figure 4.12. Composition of the 'About Us' page
Figure 4.13. Composition of the 'main_template'
Figure 4.14. A CSS style definition in the Presentation Model for the AIT website
Figure 4.15. List of content types in the AIT website
Figure 4.16. A form for editing a document according to the ‘school’ content type
Figure 4.17. Website hierarchy

CHAPTER 1 INTRODUCTION

1.1. Background

Interest in developing information resources on the Web has grown dramatically over the past decade. In 1990, Tim Berners-Lee, a researcher at CERN, the European Laboratory for Particle Physics located in Switzerland, developed the first tool for creating documents for the Web [Berners-Lee, 2001]. The software, called World Wide Web (later renamed Nexus), provided a WYSIWYG interface for editing hypertext and browsing Web resources [Berners-Lee, 2004]. The application offered a limited number of simple functions for creating and editing hypertext documents, inserting links and anchors, and working with style sheets. However, it gave rise to one of the fastest growing fields in the modern world, Web development.

Nowadays, the number of websites on the Internet doubles every two years. According to the Netcraft Web Server Survey, in August 2006 the number of websites across all domains exceeded 92 million [Netcraft, 2006]. Moreover, we can observe steady growth in the complexity of website content, structure, and functionality. This growth has created new disciplines that address different aspects of website development. Web and Graphic Design, Content Management, Web Publishing, and Web Engineering are just a few examples.

One of the newest disciplines addressing the development of high quality websites is Information Architecture. The Information Architecture Institute defines Information Architecture (IA) as: “(1) The structural design of shared information environments. (2) The art and science of organizing and labeling Web sites, intranets, online communities and software to support usability and findability. (3) An emerging community of practice focused on bringing principles of design and architecture to the digital landscape.” [http://iainstitute.org].

Different approaches and technologies have been proposed to enable Web developers to build complex websites. The toolkit of today’s Web developer includes, but is not limited to, visual editors and site managers, Web-enabled hypermedia authoring tools, Web database and programming language integrators, Web form editors and report writers, and model-driven Web application generators [Fraternali, 1999]. Several approaches were developed to guide Web developers through a set of standardized steps and to provide them with common conventions that ease and speed up the website development process.

However, the Web is an ever-changing environment that never stops evolving. According to Tim Berners-Lee, the inventor of the WWW and still an active figure in the Internet arena, the first part of his dream about the Web as “...a common information space in which we communicate by sharing information” has been realized: there is an immense flood of information on the Internet [Berners-Lee, 2001]. To realize the second part of his dream, to make the Web a place wherein we can “work and play and socialize”, we need “information about information, to help us categorize, sort, pay for, own information”. This need “is driving the design of languages for the Web designed for processing by machines, rather than people”. This is the need that the Semantic Web addresses.

In the Semantic Web, information is given well-defined meaning so that it can be understood by both humans and computers [Berners-Lee et al., 2001]. However, in order to make the Semantic Web work, we have to structure our information collections and provide inference rules so that computers can reason automatically about the stored information. Therefore, the main challenge of the Semantic Web is to provide a language that can describe both the information and the rules for reasoning about it. We already have technologies for that: the eXtensible Markup Language (XML) and the Resource Description Framework (RDF) are two of them. XML enables us to structure the information on Web pages with XML tags. Given the meaning of each tag used in an XML document, a computer program can process the document and “understand” the meaning of the information in it. RDF, in turn, represents the meaning of information as sets of triples, each resembling the subject, verb, and object of a sentence. Using RDF triples, we can create structures that describe the relationships between pieces of information on the Web.
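To make the idea of triples concrete, the following minimal Python sketch stores statements as (subject, predicate, object) tuples and retrieves them by pattern. The identifiers with the `ex:` prefix are invented for illustration and do not belong to any real vocabulary; a real application would more likely rely on a dedicated RDF library than on plain tuples.

```python
# Each statement is a (subject, predicate, object) triple.
# All "ex:" identifiers are hypothetical examples.
TRIPLES = [
    ("ex:AIT", "ex:locatedIn", "ex:Thailand"),
    ("ex:AIT", "ex:offers", "ex:InformationManagement"),
    ("ex:InformationManagement", "ex:isA", "ex:DegreeProgram"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything stated about one resource.
about_ait = match(TRIPLES, subject="ex:AIT")

# Every resource declared to be a degree program.
programs = match(TRIPLES, predicate="ex:isA", obj="ex:DegreeProgram")
```

The pattern lookup is the simplest form of the structure-following that agents perform on such data; inference rules would add new triples derived from the stored ones.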

The third important tool of the Semantic Web is ontologies, mechanisms for specifying conventions for the sharing and reuse of information [Berners-Lee et al., 2001]. The term ontology derives from philosophy, where it denotes the study of the nature of existence [Gruber, 1993]. Researchers in Artificial Intelligence use an ontology as a formalism that defines the relationships between terms [Berners-Lee et al., 2001]. In the Semantic Web, ontologies can be used to create common agreements that computer programs, called agents, can use for inference.

1.2. Statement of the Problem

With the evolution of the Web, we can observe a shift in the requirements that website owners and users set for website structure, content, and features. Nowadays, many factors must be considered in order to develop an information resource or system on the Web that successfully serves its purpose and addresses the needs of its target audience. The majority of Web developers consider technology to be the main success factor of websites and ignore the structure, which is a serious mistake. “Focusing on technology alone is like preparing a meal by equipping the kitchen with cooking utensils, then ignoring the quality of the ingredients” [Evernded et al., 2003; Shedroff, 2001]. The structure and the other important factors of Web design (users, navigation, and representation) are addressed by the discipline of Information Architecture [Rosenfeld and Morville, 1997].

Information Architecture can improve the quality of websites from many perspectives. The discipline allows Web developers and content owners to better understand the goals of the website, its users, and their needs and characteristics. It selects an appropriate scheme for organizing information on websites and creates the information classification structures required by that scheme. IA designs the navigation mechanisms that allow users to orient themselves and navigate easily through the information resource. It also creates the layout and graphical design of the Web pages, which allows the content on them to flow in the best way.

However, Information Architecture is a relatively new field, and information architects are rare and expensive professionals. Therefore, only a few companies can afford to develop a successful website that satisfies all the requirements set from the perspective of Information Architecture. This raises the need for an approach that would empower Web developers with easy-to-use but powerful tools for building the information architecture of websites.

Another important aspect that today’s Web developers have to consider is the shift towards the Semantic Web. As we move to an information space where information must have defined semantics that can be understood by both humans and computers, we need to pay attention to ways of making the information on websites readable by computer programs. Therefore, the approach has to enable designers to build the information architecture using structures and a language that computer software can process in order to obtain the meaning of the information on websites.

1.3. Objectives

Considering the current demand for the tools and mechanisms for developing high quality websites, and the opportunities that Information Architecture and Semantic Web technologies bring to the Web space, an ontology-based approach to designing information architecture of websites has been proposed as the main objective of this study. The approach harnesses the expertise developed by the information architects in designing high quality websites as well as the new conception of organizing information using ontologies formulated by the Semantic Web. The approach empowers the Web developers with the methodology and instruments for designing the four conceptual models that constitute information architecture of websites: 1) user-task model, 2) information model, 3) navigation model, and 4) presentation model.

The main objective of this study was achieved by:

• Analyzing the existing website development approaches, revealing their strengths and weaknesses.

• Designing upper-level ontologies that define the general concepts and relations needed to model different aspects of information architecture.

• Developing software to convert individual model ontologies into a working website.

• Developing a use-case to test and demonstrate the potential of the approach.

• Evaluating the approach and comparing it with other website modeling approaches.

1.4. Scope of Study

Development of the ontology-based approach to designing information architecture of websites must consider many aspects from different disciplines. This research is based mainly on three areas: Web Development, Information Architecture, and the Semantic Web.

Web Development. In the context of this study, an extensive analysis and comparison of existing Web development approaches has been conducted. The aim of this analysis is to understand the strengths and weaknesses of the existing Web development approaches and to identify those that are appropriate for the proposed framework.

Information Architecture. One of the goals of this research is to identify all the aspects of building high quality websites developed by the professionals in the realm of Information Architecture. In particular, this includes the standards, guidelines, and best practices of defining users, organizing information on websites, creating navigation paths, and designing layout.

Semantic Web. As mentioned above, Semantic Web technologies use ontologies as a tool for giving meaning to the content on websites. Therefore, this study conducts extensive research on the potential of ontologies as a medium for storing the information structure, content, navigation paths, layout, and users of websites.

This study also touches on several aspects of Content Management. Content Management and Information Architecture are the “two sides of the same coin” [White, 2004] and therefore must go hand in hand. This research aims to find ways to convert the information architecture models designed according to the approach into a Content Management System.

CHAPTER 2 LITERATURE REVIEW

2.1. Website Development Process

Website development is a complex process that involves many activities and requires the collaboration of people with different skills. Websites can be developed using the traditional software development lifecycle, which guides the development team through a set of five phases: planning, analysis, design, implementation, and maintenance [Hoffer et al., 2005]. However, several aspects are specific to the development of websites. First, Web development projects usually involve people with different backgrounds: authors, librarians, content designers, artists, and programmers [Isakowitz, 1995]. Second, Web design must deal with information structures and schemes as well as aesthetic and cognitive aspects that traditional software engineering does not consider [Nanard and Nanard, 1995].

Several methodologies have been proposed to guide developers through the process of Web design. Relationship Management Methodology (RMM) and Object-Oriented Hypermedia Design Model (OOHDM) are two of them. Although these two approaches propose complex methodologies and formal languages for modeling hypertext resources, in this part of the study we consider them only from the perspective of the development process.

2.1.1. Relationship Management Methodology

RMM is a methodology for the design and construction of hypermedia applications that uses entity and relationship notation borrowed from the E-R notation of database design theory [Isakowitz et al., 1995]. RMM focuses on the design, development, and construction phases. Figure 2.2.1 shows the design process, which consists of seven steps:

• S1: E-R Design. Entity-Relationship diagrams are constructed in order to represent the conceptual model of the application.

• S2: Slice Design. The designer splits entities into slices, which represent how information in the entities will be presented to users, and organizes them into a network.

• S3: Navigation Design. The designer defines the navigation paths.

• S4: Conversion Protocol Design. Each element on the RMM diagrams is transformed into an object on the target platform.

• S5: User-Interface Design. The designer creates a screen layout for every object present on the RMM diagram.

• S6: Run-Time Behavior Design. The designer defines the behavior of the elements on the RMM diagram.

• S7: Construction and Testing. The target application is implemented and tested.

Figure 2.2.1 RMM design methodology
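Steps S1 to S3 can be illustrated with a small Python sketch in which an entity carries attributes, slices select the attributes shown together, and a link table connects the slices. This is only an illustration under stated assumptions: the class names, the `Faculty` entity, and its attribute values are hypothetical and are not taken from the RMM specification.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """S1: an entity from the E-R model with its attribute values."""
    name: str
    attributes: dict


@dataclass
class Slice:
    """S2: a named group of attributes presented to the user together."""
    name: str
    attribute_names: list


def render_slice(entity, slice_):
    """Return only the attributes of `entity` that belong to `slice_`."""
    return {a: entity.attributes[a] for a in slice_.attribute_names}


# Hypothetical example data.
faculty = Entity("Faculty", {
    "name": "J. Doe",
    "rank": "Professor",
    "email": "jdoe@example.org",
    "biography": "Short biography text.",
})
general = Slice("general", ["name", "rank", "email"])
biography = Slice("biography", ["name", "biography"])

# S3: navigation paths between the slices of the entity.
slice_links = {"general": ["biography"], "biography": ["general"]}

general_view = render_slice(faculty, general)
```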

2.1.2. Object-Oriented Hypermedia Design Model

The OOHDM methodology guides developers through a four-step process and supports prototyping and conceptual modeling [Schwabe and Rossi, 1995]. The development process, shown in Figure 2.2.2, consists of the following steps:

1. Domain Analysis. A conceptual model of the website is built using object-oriented modeling principles.

2. Navigation Design. The navigation structure of the site is designed using the notions of node, link, index, and guided tour.

3. Abstract Interface Design. The interface of the website is constructed using the primitive classes, such as text fields and buttons.

4. Implementation. Implementation objects are generated based on the conceptual model designed in the previous steps.

OOHDM differs from the RMM methodology in two respects [Schwabe, 1995]. First, OOHDM focuses on navigation and abstract interface design. Second, OOHDM uses objects for modeling rather than E-R diagrams.

Figure 2.2.2 Summary of the OOHDM methodology
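The navigation notions named in step 2 (node, link, index, and guided tour) can be sketched in Python as follows. The sketch assumes a simplified reading in which an index gives direct access to every node and a guided tour is a fixed linear path; the class, the function names, and the page titles are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A navigation node; `links` holds the titles of directly linked nodes."""
    title: str
    links: list = field(default_factory=list)


def build_index(nodes):
    """An index: an alphabetical list giving direct access to every node."""
    return sorted(node.title for node in nodes)


def guided_tour(nodes):
    """A guided tour: (current, next) steps through the nodes in given order."""
    titles = [node.title for node in nodes]
    return list(zip(titles, titles[1:]))


# Hypothetical nodes of a small site.
nodes = [
    Node("Schools", links=["Programs"]),
    Node("Programs", links=["Admissions"]),
    Node("Admissions"),
]

index = build_index(nodes)
tour = guided_tour(nodes)
```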

2.1.3. Fraternali Approach

Fraternali proposed his own methodology of website development by combining the OOHDM and RMM methodologies with the traditional software development lifecycle [Fraternali, 1999]. The process, shown in Figure 2.2.3, consists of the following activities:

• Requirements Analysis. Identification of prospective users and their information needs.

• Conceptualization. The designer develops a conceptual model that represents the main components of the target website.

• Prototyping and Validation. The designer develops a prototype of the application in order to obtain early feedback from the target users.

• Design. The designer develops conceptual models that convey the structure, navigation, and presentation of the application.

• Implementation. Web pages are generated based on the designed models.

• Evolution and Maintenance. Errors in the structure, navigation, and presentation are fixed if necessary.

Figure 2.2.3 Lifecycle of a Web application (Fraternali approach)

2.2. Information Architecture

The most important part of the Web development process, the design part, is captured by Information Architecture. IA covers all aspects of designing Web applications: selecting an appropriate scheme for organizing information on the website, creating the information structures for classifying the website content, designing the navigation paths, and defining the look and feel of the resource. Information Architecture produces a set of specifications that are converted, during the implementation phase, into the code, graphics, and other elements that constitute a website.

The information architecture of a website can be designed in two ways: Top-Down and Bottom-Up [Rosenfeld and Morville, 1997; Farnum, 2002]. The Top-Down approach designs the website from its main page down to the lowest-level pages. This can be done by developing a set of categories and sub-categories, or a taxonomy, that covers the content of the website. The Bottom-Up approach builds the website architecture by analyzing the smallest parts, so-called content chunks, and grouping them into categories. Although some may argue for the advantages of one approach over the other, experienced information architects suggest using both.

2.2.1. Information organization

The first step in building the information architecture of a website is deciding how information will be organized on it. Information organization defines which aspects of content should be used for grouping information on the resource. This is done by creating organization schemes [Rosenfeld and Morville, 1997].

An organization scheme represents a set of shared characteristics of the information on a website, which can be used for the logical grouping of content chunks. There are two kinds of organization schemes: exact and ambiguous. An exact organization scheme organizes information into a set of mutually exclusive categories, so that a page can be placed in only one category. There are three main exact organization schemes: alphabetical, chronological, and geographical [Duyne et al., 2002]. An ambiguous organization scheme creates categories for pages that do not have an exact meaning and can therefore belong to several sections. Examples of ambiguous schemes are organization by topic (subject), by task, and by target audience.
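The difference between the two kinds of scheme can be shown with a short Python sketch: an alphabetical (exact) grouping places each page in exactly one category, whereas a topical (ambiguous) grouping may place a page in several. The page titles and topic labels are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical pages, each with a title and one or more topics.
pages = [
    {"title": "Admissions", "topics": ["students", "applying"]},
    {"title": "Alumni", "topics": ["community"]},
    {"title": "Scholarships", "topics": ["students", "funding"]},
]


def group_alphabetically(pages):
    """Exact scheme: every page falls into exactly one letter category."""
    groups = defaultdict(list)
    for page in pages:
        groups[page["title"][0].upper()].append(page["title"])
    return dict(groups)


def group_by_topic(pages):
    """Ambiguous scheme: a page appears under each of its topics."""
    groups = defaultdict(list)
    for page in pages:
        for topic in page["topics"]:
            groups[topic].append(page["title"])
    return dict(groups)


by_letter = group_alphabetically(pages)
by_topic = group_by_topic(pages)
```

In `by_letter` each title occurs once, while in `by_topic` the page "Admissions" is listed under both "students" and "applying".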

2.2.2. Information classification

After the information organization scheme has been selected, the information architect has to create information structures that define vocabularies for information classification. These structures tell content managers which categories to assign to Web pages and help users navigate through the website [Rosenfeld and Morville, 1997]. Information architects can use several schemes for classifying information on websites: metadata, controlled vocabularies, taxonomies, thesauri, and facets.

Metadata in Information Architecture is information about content objects: Web pages, images, and video and sound files placed on websites [Garshol, 2004]. The best-known vocabulary for metadata is Dublin Core, a set of 15 properties that may be applied to information resources to describe them [Weibel et al., 1998]. These properties include “title”, “creator”, “subject”, “description”, “publisher”, “date”, “language”, etc. Dublin Core defines the meaning of each of its fifteen properties but does not prescribe how these properties are to be encoded, which makes DC metadata independent of technology.
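As an illustration of technology independence, the same Dublin Core record can be kept as a plain mapping and rendered in whatever encoding a site needs. The Python sketch below renders HTML `meta` elements with a `DC.` name prefix, which is one common convention and is assumed here rather than prescribed by the text; the function name is hypothetical and the `language` value is an assumption.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_meta_tags(record):
    """Render a Dublin Core record as HTML meta elements (one per property)."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


# Example record describing this thesis; "language" is an assumed value.
record = {
    "title": "An Ontology-Based Approach to Designing "
             "Information Architecture of Websites",
    "creator": "Fedor Bakalov",
    "date": "2007-08",
    "language": "en",
}

tags = dc_meta_tags(record)
```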

A controlled vocabulary is a set of unambiguous and non-redundant subject headings that have to be enumerated explicitly. This set is created and maintained by a registration authority [Pidcock, 2003]. The purpose of having an authority control the vocabulary is to prevent duplicate items, standardize terminology, and prevent authors from misspelling terms [Garshol, 2004].

A taxonomy is a hierarchical controlled vocabulary [Pidcock, 2003]. The terms in a taxonomy are organized as a tree that contains general terms on the first level of the hierarchy, which in turn are subdivided into narrower categories. The depth of the hierarchy is not limited. However, when architecting for the Web, developers must consider that website users may get lost in a hierarchy that is too deep.
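The concern about overly deep hierarchies can be checked mechanically. The following Python sketch represents a taxonomy as nested dictionaries, measures its depth, and lists the terms that lie deeper than a chosen limit. The terms and the limit of two levels are hypothetical choices for illustration.

```python
# A taxonomy as nested dictionaries: each key is a term,
# each value holds its narrower terms. All terms are hypothetical.
taxonomy = {
    "Academics": {
        "Schools": {"Engineering and Technology": {}},
        "Programs": {},
    },
    "About Us": {},
}


def depth(tree):
    """Number of levels in the hierarchy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def paths(tree, prefix=()):
    """Yield the path from a top-level term down to every term."""
    for term, children in tree.items():
        current = prefix + (term,)
        yield current
        yield from paths(children, current)


MAX_LEVELS = 2  # assumed limit, chosen only for this example
too_deep = [path for path in paths(taxonomy) if len(path) > MAX_LEVELS]
levels = depth(taxonomy)
```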

A thesaurus is a controlled vocabulary organized as a network of nodes which represent concepts [Pidcock, 2003]. The ISO 2788 standard defines the following properties of subjects in a thesaurus [Garshol, 2004]:

• BT (short for “broader term”) refers to the term which is the parent of the current term.

• SN (short for “scope note”) provides a textual description of the current term.

• USE refers to another term that is preferred over the current term.

• TT (short for “top term”) refers to the ancestor term located on the first level of the thesaurus.

• RT (short for “related term”) refers to a related term.
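The properties listed above can be sketched as a small Python data structure in which the top term is derived by following BT references and non-preferred terms are resolved through USE. The terms and the helper names `preferred` and `top_term` are hypothetical, and the layout is an illustration rather than an encoding defined by the standard.

```python
# A tiny thesaurus keyed by term. All terms are hypothetical.
thesaurus = {
    "institutions": {"BT": None, "RT": [], "SN": "Organized bodies."},
    "educational institutions": {
        "BT": "institutions", "RT": ["education"], "SN": "Places of study.",
    },
    "universities": {
        "BT": "educational institutions", "RT": [], "SN": "Degree-granting.",
    },
    "education": {"BT": None, "RT": ["educational institutions"], "SN": ""},
    "colleges": {"USE": "universities"},  # non-preferred term
}


def preferred(term):
    """Follow USE references until a preferred term is reached."""
    seen = set()
    while "USE" in thesaurus[term]:
        if term in seen:
            raise ValueError(f"circular USE reference at {term!r}")
        seen.add(term)
        term = thesaurus[term]["USE"]
    return term


def top_term(term):
    """Derive TT by climbing BT references from the preferred term."""
    term = preferred(term)
    while thesaurus[term].get("BT"):
        term = thesaurus[term]["BT"]
    return term


tt = top_term("colleges")
```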

A faceted classification uses explicitly defined, mutually exclusive, and collectively exhaustive aspects of subjects to classify them [KMC, 2004]. Such aspects are called facets of a subject. A faceted classification selects one term from each facet and uses these terms to describe the subject along all of its aspects. Ranganathan, the father of faceted classification, proposed the following five facets for classifying objects in an information collection [Garshol, 2004]:

• Personality refers to the primary subject of the document (main facet).

• Matter refers to the material of the document.

• Energy represents the process which the document describes.

• Space describes the location mentioned in the document.

• Time refers to the time period which the document describes.
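A brief Python sketch shows how the five facets can describe documents and how a collection can then be filtered along any combination of them. The documents, their facet values, and the function name are invented for illustration.

```python
# Ranganathan's five facets as listed above.
FACETS = ("personality", "matter", "energy", "space", "time")

# Hypothetical documents, each described by one term per facet.
documents = [
    {"id": "doc1", "personality": "websites", "matter": "ontologies",
     "energy": "design", "space": "Thailand", "time": "2007"},
    {"id": "doc2", "personality": "websites", "matter": "hypertext",
     "energy": "evaluation", "space": "Austria", "time": "2007"},
    {"id": "doc3", "personality": "libraries", "matter": "catalogues",
     "energy": "design", "space": "Thailand", "time": "1995"},
]


def filter_by_facets(documents, **selected):
    """Return ids of documents whose facet terms equal every selected term."""
    unknown = set(selected) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    return [
        doc["id"]
        for doc in documents
        if all(doc.get(facet) == term for facet, term in selected.items())
    ]


websites_2007 = filter_by_facets(documents, personality="websites", time="2007")
designed_in_thailand = filter_by_facets(documents, energy="design", space="Thailand")
```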

2.2.3. Navigation

Once you have selected the scheme for organizing the content on your website and developed the corresponding information classification structure, you can start designing the navigation model of the website. Navigation is the most important aspect of a website [Doss, 2002a]. Consistent and informative navigation allows users to move through the website easily and to orient themselves on it. Web designers can use several types of navigation to guide users: global, local, supplemental, and contextual navigation. Each of these types can use different navigation mechanisms, such as embedded links, navigation bars (top and side bars), breadcrumb trails, search bars, sitemaps, paging, and homing [Creative Commons].

A website's global navigation (aka universal navigation) is the major navigation roadmap, accessible from all sections of the website [Doss, 2002a]. It allows users to move through all parts of the resource and usually consists of a few broad categories [Timberlake, 2001]. Global navigation can be implemented as a top or side navigation bar or as tabs. However, experienced Web designers suggest using a top navigation bar for global navigation [Doss, 2002a].

Local navigation (aka sub navigation) is intended to guide users within a top-level category of the website. Local navigation is usually implemented as a side navigation bar [Doss, 2002a].

Contextual navigation allows users to move to the pages which are related to the content on the current page. The simplest example of contextual navigation is embedded links, the links that are present within a paragraph [Doss, 2002a]. Amazon.com demonstrates another example of contextual navigation, providing links to the books that were bought by the customers who bought this book, or linking to the customer and editorial reviews.

Supplemental navigation includes complementary navigation mechanisms such as breadcrumb trails, search bars, sitemaps and site indexes, paging, and homing [Doss, 2002a].
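Several of these mechanisms can be derived from the site hierarchy itself. The Python sketch below assumes a simple child-to-parent table and computes the global navigation (the top-level categories) and a breadcrumb trail for a page. The page names and the " > " separator are hypothetical choices.

```python
# Child -> parent table for a hypothetical site; the home page has no parent.
PARENT = {
    "Home": None,
    "About Us": "Home",
    "Academics": "Home",
    "Schools": "Academics",
    "School of Engineering and Technology": "Schools",
}


def global_navigation():
    """Top-level categories: the pages directly below the home page."""
    return [page for page, parent in PARENT.items() if parent == "Home"]


def breadcrumb(page):
    """Trail from the home page down to `page`, built by climbing parents."""
    trail = []
    while page is not None:
        trail.append(page)
        page = PARENT[page]
    return " > ".join(reversed(trail))


top_bar = global_navigation()
trail = breadcrumb("Schools")
```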

2.2.4. Page layout

Another important aspect of website design is creating an effective page layout. In practice, information architects use paper-based prototypes and wireframes to define schematically the grid layout of the pages on a website [Macdonald, 2003; Doss, 2002b]. Paper prototyping is an inexpensive and fast method of defining the interface of a website using hand-sketched diagrams. Wireframes express the general structure of pages. They define which sections a particular page consists of and which content should be displayed in these sections.

2.2.5. Look and feel

Users form their first impression of a website from the visual design of its pages. Therefore, we cannot neglect the graphic design and aesthetics of our Web pages. Although in many cases the graphic design of websites is done by artists, information architects can also do this part of Web design by developing style guides. A style guide is a document that defines the visual requirements for the website, including font classes, color palettes, typography, and many other aspects that make up the “look and feel” of a website [Doss, 2002b].

2.3. Website Modeling Approaches

In the early history of the Internet, designers developed Web applications by “simply building the solution”, without putting much stress on the conceptual modeling phase [Ceri et al.]. However, as the size and complexity of today’s Web applications grow, developers face more problems. Many of these problems are related to the inability to meet the users’ requirements within the time and budget limits. This happens because it is difficult to capture all the users’ requirements at an early stage, which forces developers to go back to the analysis phase, elicit the requirements again, and modify the application code. This created the need for tools that support information architects and Web developers in the conceptual modeling of Web applications.

There are many advantages of having tools and approaches for conceptual modeling of websites. They include, but are not limited to, the following [Garzotto and Paolini, 1993]:

• Improvement of communication. The conceptual modeling phase generates a set of specifications written in formal languages. These languages can be used as a medium of communication between the user and the analyst, between the analyst and the designer, and between the designer and the implementer.

• Development of design methodologies. The modeling approaches provide a set of guidelines that Web developers can follow in order to conceptualize, analyze, and design applications at an abstract level without going into the implementation phase.

• Reusability. Since the modeling tools define the semantics of Web applications at an abstract level, a model, once produced, can be reused in similar projects in the future.

• Providing consistency. The conceptual modeling approaches ensure consistency of all the elements within a website.

• Use by design tools. The modeling approaches can be implemented as a part of design tools. Therefore, the deliverables produced in the conceptual modeling phase can be automatically converted into code.

Several approaches have been proposed for modeling information architecture of the systems on the Web. These include specifications, languages, notations, and methodologies. In this study, four approaches for website conceptual modeling have been analyzed. These are Hypertext Design Model (HDM), Relationship Management Methodology (RMM), Web Modeling Language (WebML), and visual vocabularies.

2.3.1. Hypertext Design Model

The Hypertext Design Model (HDM) is a structural approach for “authoring-at-large” [Garzotto and Paolini, 1993]. Authoring-at-large enables designers to define the classes of all the information elements and navigation mechanisms that constitute the website, and to do so on a conceptual level without specifying the implementation details.

The HDM approach defines a set of primitives that the designers can use for specifying the structure and navigation of the information resource. These primitives include:

• Schema describes all the classes of information elements from the perspective of presentation, internal organizational structure, and the types of their mutual connections.

• Entity is an element that represents a single information chunk on the website. Entity is formed as a hierarchy of components.

• Entity type is a group of entities that share the same characteristics.

• Component is a group of units.

• Unit is an atomic information element.

• Structural link connects together components of the same entity.

• Perspective link connects together the different units of the same component.

• Application link defines the relationships between components and entities.

• Link type is a group of application links.

Using the above-mentioned primitives, the designer can describe a Web application on a conceptual level by creating HDM diagrams. These diagrams can be used during the design phases as the language to convey the semantics of the modeled applications. Once the design phase is completed, the created diagrams can be transformed into code.
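The HDM primitives listed above can be illustrated with a minimal Python sketch. It is not part of HDM itself: the class names mirror the primitives, while the fields, the helper `count_components`, and the example data are assumptions made for illustration. As a simplification, application links are modeled between entities only.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Atomic information element (hypothetical fields)."""
    perspective: str
    content: str


@dataclass
class Component:
    """A group of units; children stand for structural links."""
    name: str
    units: list[Unit] = field(default_factory=list)
    children: list["Component"] = field(default_factory=list)


@dataclass
class Entity:
    """A single information chunk, formed as a hierarchy of components."""
    entity_type: str
    root: Component


@dataclass
class ApplicationLink:
    """A typed relationship; simplified here to connect two entities."""
    link_type: str
    source: Entity
    target: Entity


def count_components(component: Component) -> int:
    """Count a component plus everything reachable via structural links."""
    return 1 + sum(count_components(child) for child in component.children)


# Hypothetical example data: a course entity linked to its lecturer.
syllabus = Component("syllabus", [Unit("text", "Week 1: Introduction")])
course = Entity("Course", Component("overview", [Unit("text", "IA basics")], [syllabus]))
lecturer = Entity("Person", Component("profile", [Unit("text", "Dr. Example")]))
taught_by = ApplicationLink("taught-by", course, lecturer)
```

In this sketch the entity type and link type are plain strings; a fuller model would declare them in a schema object so that diagrams can be validated against it.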

2.3.2. Relationship Management Methodology

Relationship Management Methodology (RMM) is an approach that provides a methodology for constructing hypermedia applications (described above) and defines a formal language for conceptual modeling [Isakowitz et al., 1995]. RMM utilizes the notion of entity-relationship diagrams derived from database design theory. Similar to the HDM approach, RMM uses a set of primitives (shown in Figure 2.4) for defining the structure of an application and the navigation through it.

Figure 2.4. RMM primitives

2.3.3. Web Modeling Language

Web Modeling Language (WebML) is the most recently proposed approach for website conceptual modeling [Ceri et al.]. WebML allows defining the main features of a Web application using a set of intuitive visual elements, so-called concepts. The WebML concepts can be used by any CASE tool to enable non-technical personnel to specify the website requirements. WebRatio is one of the available CASE tools that support the WebML notation. Another important advantage of WebML is that the approach supports XML syntax, which can be used by software generators for the implementation of a website.

WebML consists of four perspectives that describe different aspects of the website. These perspectives include: structural model, hypertext model (divided into two sub-models: composition and navigation models), presentation model, and personalization model.

The structural model uses the notions of entity and relationship to describe the data content of the website. As in the HDM and RMM approaches, an entity represents an information chunk. Entities can have attributes that denote properties of information objects. To define associations between two or more entities, WebML uses the notion of relationships.

The hypertext model describes the hypertext structures of the website. For this purpose, two sub-models are used: the composition and navigation models.

Composition model describes the pages that constitute the hypertext structure of the website as well as which content units compose these pages. There are six unit types that can be used to make up the hypertext:

• Data unit represents information about a single object.

• Multidata unit represents information about a set of objects.

• Index unit represents a list of objects without describing each object in detail.

• Scroller unit defines a command for accessing the elements in a set of objects.

• Filter unit represents an input field for entering search keywords.

• Direct unit expresses the connection to a single object semantically related to another.

Navigation model describes how pages and content units are connected within the hypertext structure. For this purpose, WebML uses the notion of link. There are two types of links:

• Contextual link denotes the semantic connection between pages.

• Non-contextual link associates pages in a free way.

Presentation model describes the layout and visual appearance of pages without specifying the output device and rendition language. For this purpose, WebML uses XML syntax. The presentation model can define the layout of each page separately or convey it for all the pages within the website.

The personalization model defines how the content will be presented to a particular user group. A single user is represented with the User primitive. A set of users that share common characteristics is represented using the notion of User Group.
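To make the composition and navigation models more concrete, the following Python sketch represents pages, content units, and links as plain data structures. It does not reproduce the WebML XML syntax; the class names, field names, and the example pages are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class UnitKind(Enum):
    """The six unit types of the composition model."""
    DATA = "data"
    MULTIDATA = "multidata"
    INDEX = "index"
    SCROLLER = "scroller"
    FILTER = "filter"
    DIRECT = "direct"


@dataclass
class ContentUnit:
    kind: UnitKind
    entity: str  # name of the structural-model entity the unit publishes


@dataclass
class Page:
    name: str
    units: list[ContentUnit] = field(default_factory=list)


@dataclass
class Link:
    source: str
    target: str
    contextual: bool  # True: semantic connection; False: free association


def contextual_targets(links: list[Link], page: str) -> list[str]:
    """Return the pages reachable from `page` through contextual links."""
    return [link.target for link in links if link.source == page and link.contextual]


# Hypothetical site fragment.
pages = [
    Page("ProgramList", [ContentUnit(UnitKind.FILTER, "Program"),
                         ContentUnit(UnitKind.INDEX, "Program")]),
    Page("ProgramDetail", [ContentUnit(UnitKind.DATA, "Program")]),
    Page("Home"),
]
links = [
    Link("ProgramList", "ProgramDetail", contextual=True),
    Link("ProgramList", "Home", contextual=False),
]
```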

2.3.4. Visual vocabularies

Jesse James Garrett [Garret, 2006] proposed a set of graphic shapes for constructing a conceptual model of a website. Information architects can use Garrett’s visual vocabulary to describe the information structure and navigation paths of a site. This vocabulary includes, but is not limited to, the following modeling shapes (see Figure 2.5):

• Simple rectangle denotes a page.

• Dog-eared document represents a file without navigation properties.

• Pagestack indicates a group of similar pages.

• Filestack indicates a group of similar files.

• Connector denotes an association between elements.

• Area represents a group of pages (depicted as a rounded-corner rectangle).

This vocabulary also includes the symbols for conveying the interaction of users with the website, including flows, decision points, exits, conditional connectors and branches, and clusters.

Figure 2.5. Visual vocabulary

2.4. Ontologies

Ontology is a discipline that originates from philosophy [Gruber, 1993a]. From the philosophical perspective, ontology is the science of existence. Researchers in Artificial Intelligence and Computer Science use ontologies as a tool for defining relationships between terms in a natural way. More specifically, ontologies can be used as a language for describing the structure of content elements on websites, their properties, and their relationships with other content objects in a structured and consistent manner.

One of the most cited definitions of ontologies is the one coined by [Gruber, 1993b], who defined ontology as follows: “an ontology is an explicit specification of a conceptualization”. This definition was later modified by [Borst, 1997] and explained by [Studer et al., 1998] as follows: “An ontology is a formal, explicit specification of a shared conceptualization. Conceptualization refers to an abstract model of some phenomenon in the world by having identified the relevant concepts of that phenomenon. Explicit means that the type of concepts used, and the constraints on their use are explicitly defined. Formal refers to the fact that the ontology should be machine-readable. Shared reflects the notion that an ontology captures consensual knowledge, that is, it is not private of some individual, but accepted by a group.”

Ontologies describe objects using individuals (instances), classes (concepts), attributes, and relationships [Onto]. Individuals are the basic components of an ontology. They include real objects, for example, people, animals, and planets. Classes are sets of individuals that share the same characteristics. Attributes are the characteristics of classes. Finally, relationships are the associations between classes. There are three types of relationships: subsumption defines objects as members of a class, hierarchy uses ‘child-of’ and ‘parent-of’ relationships to build a hierarchy of classes, and meronymy describes how objects are combined to build other objects.
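The building blocks described above can be sketched in Python as a small set of subject-predicate-object statements. This is an illustrative toy, not an ontology language: the predicate names (`instance_of`, `subclass_of`, `has_attribute`, `has_part`) and the example individuals are assumptions chosen for the sketch.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("Student", "subclass_of", "Person"),   # class hierarchy
    ("Person", "has_attribute", "name"),    # attribute of a class
    ("alice", "instance_of", "Student"),    # individual belonging to a class
    ("Thesis", "has_part", "Chapter"),      # meronymy (part-whole)
}


def superclasses(cls: str) -> set[str]:
    """Return all direct and inherited superclasses of a class."""
    found: set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subclass_of" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def classes_of(individual: str) -> set[str]:
    """Return every class an individual belongs to, including inherited ones."""
    direct = {o for s, p, o in triples if s == individual and p == "instance_of"}
    inherited: set[str] = set()
    for cls in direct:
        inherited |= superclasses(cls)
    return direct | inherited
```

For instance, `classes_of("alice")` follows the class hierarchy upward, so an individual declared as a `Student` is also recognized as a `Person`.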

2.4.1. Types of ontologies

Ontologies can be classified in two ways: according to the amount and type of structure of the conceptualization, and according to the subject of the conceptualization [Gomez-Perez et al., 2004]. Under the first categorization type, the following types of ontologies are distinguished: controlled vocabularies, glossaries, thesauri, informal is-a hierarchies, formal is-a hierarchies, frames, ontologies that express value restrictions, and ontologies that express general logical constraints.

In the second category, which distinguishes ontologies by the subject of conceptualization, the following types of ontologies are identified:

• Knowledge representation ontologies define the representation constructs that can be used to formalize knowledge according to a particular knowledge representation paradigm [van Heijst et al., 1997].

• General ontologies define primitives that can be used to represent common sense knowledge, which can be reused across different domains [van Heijst et al., 1997].

• Top-level ontologies or upper-level ontologies define very general concepts, to which all root elements in existing ontologies can be grounded [Gomez-Perez et al., 2004].

• Domain ontologies define concepts and relationships that can be used as vocabularies in a particular domain of knowledge, for example medicine, engineering, law, etc. [Mizoguchi et al., 1995].

• Task ontologies define vocabularies that are specific to a generic task or activity, for example, diagnosing, scheduling, selling, etc. [Mizoguchi et al., 1995].

• Application ontologies define concepts and relations that are required to model knowledge in a particular application [van Heijst et al., 1997].

2.4.2. Languages for building ontologies

Several languages have been proposed for designing ontologies. Examples include Ontology Inference Layer (OIL), DARPA Agent Markup Language (DAML), DAML+OIL, and Web Ontology Language (OWL). OWL is the most recent ontology language, which superseded DAML+OIL. Therefore, this study focuses on OWL only.

The Web Ontology Language (OWL) was designed by the World Wide Web Consortium as a part of the growing stack of technologies related to the Semantic Web [OWL]. OWL provides vocabulary and formal semantics to describe information so that it can be consumed by humans and also processed by computers. OWL consists of three sublanguages: OWL Lite, OWL DL, and OWL Full. Each of the sublanguages is an extension of its simpler predecessor. OWL Lite is targeted at those users who need a tool for building a classification hierarchy or defining simple constraints. OWL DL provides the maximum expressiveness that still retains decidability, and therefore imposes certain restrictions (e.g. a class cannot be an instance of another class). OWL Full provides the maximum expressiveness and does not limit the ways of defining associations between classes, but it offers no computational guarantees.
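As a small illustration of what an OWL class hierarchy looks like in RDF/XML, the Python sketch below builds a document fragment with the standard library only. The helper `build_owl` and the class names `Page` and `NewsPage` are hypothetical; the namespace URIs and the `owl:Class` and `rdfs:subClassOf` constructs are the standard ones. The fragment omits an ontology header and base URI, so it should be read as a sketch rather than a complete ontology.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Use the conventional prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def build_owl(hierarchy: dict[str, Optional[str]]) -> str:
    """Serialize {class_name: parent_or_None} as owl:Class declarations."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, parent in hierarchy.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
        if parent is not None:
            # Reference the parent class declared in the same document.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": f"#{parent}"})
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-class hierarchy: NewsPage is a kind of Page.
document = build_owl({"Page": None, "NewsPage": "Page"})
```

A hierarchy of this kind uses only constructs available in all three sublanguages, so it stays within OWL Lite.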

2.5. Ontology-Based Approaches for Web Design

The potential of ontologies to create “a shared and common understanding of a domain that can be communicated between people and application systems” [Davies et al., 2002] attracts more and more interest from information architects and Web designers. They regard ontologies as a tool capable of defining the machine-readable semantics of website content. A few approaches have been proposed recently to utilize this potential of ontologies. OntoWebber, OntoWeaver, and WISE are three examples of these approaches.

2.5.1. OntoWebber

OntoWebber is a system for managing information on the Web by utilizing formal semantics encoded with RDF [Jin et al., 2002]. OntoWebber uses a model-driven, ontology-based approach to provide support during the entire website development lifecycle, including design, generation, personalization, and maintenance.

The system architecture (shown in Figure 2.6) consists of four layers [Jin et al., 2002]:

• Integration layer deals with the problem of syntactical differences between heterogeneous data sources.

• Articulation layer resolves the semantic differences that remain between data sources after they have been converted into RDF format.

• Composition layer creates site view specifications based on the provided data source.

• Generation layer instantiates the created site view with the data from repository.

Figure 2.6. OntoWebber system architecture

The OntoWebber approach also defines a methodology for website design. The methodology is implemented as an iterative process which includes the following steps [Jin et al., 2002]:

1. Requirement analysis. During the initial phase, the detailed analysis of the website objectives and user requirements is conducted.

2. Domain ontology design. At this stage, the designers build the domain ontology which serves as the reference ontology for ontology articulation.

3. Site view design. A site-view graph is constructed at this step using the characteristics, preferences, and requirements of the target website users. The graph is a graphic representation of three aspects of the site view: navigation, content, and presentation.

4. Personalization design. Personalization elements are defined. They include categorical information about the user, such as age, browser type, etc.

5. Maintenance design. The maintenance aspects of the website are defined.

The OntoWebber approach uses the notion of a site model, which is the set of all the models used in the website modeling process [Jin et al., 2002]. There are six types of site models (see Figure 2.7):

• Domain model is an ontology that defines common concepts of the website.

• Navigation model describes the site-view graph which denotes how pages and elements on them (cards) are connected.

• Content model associates the design primitives with the site-view graph.

• Presentation model defines the look and feel of the Web pages of the site.

• Personalization model describes the website users by three properties (capacity, interest, and request) and how the website content should be delivered to these users.

• Maintenance model defines how the website is maintained and includes content and functional maintenance types.

Figure 2.7. OntoWebber site models and their relationships

The approach also has several limitations which can be resolved in the future. In particular, the developers need to decide how the different versions of ontologies can be managed, how to optimize the strategies of website generation, and how to utilize dynamic Web services [Jin et al., 2002].

2.5.2. OntoWeaver

OntoWeaver is an ontology-based approach for the design and development of data-intensive websites [Lei et al., 2005]. This approach uses ontologies for modeling three important aspects of websites: site structure, presentation, and personalization. These aspects are modeled using the site-view ontology, the presentation ontology, and the customization framework.

The site-view ontology describes the structural aspects of the website using three types of constructs. A navigation construct denotes a link that connects different content elements. An atomic user interface construct expresses the simple components of the user interface (input/output elements as well as command elements). A composite user interface construct denotes the complex elements of the user interface.

The presentation ontology allows specifying the layout and presentation styles of the website. For this purpose the approach uses templates designed as an ontology. These templates describe presentation styles (background, font color and size, etc.). References to user interface elements are implemented as Uniform Resource Identifiers (URIs).

Customization framework consists of the user ontology and customization rule model. The user ontology describes the website users and associated customization actions, while the customization rule model defines when and how to perform customization actions.
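The idea of a customization rule model, which states when and how a customization action is performed, can be sketched as condition-action pairs. The Python code below is an illustration only and does not reflect the actual OntoWeaver rule syntax; the `Rule` class, the `customize` function, and the user and page dictionaries are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # when to customize (tested on the user)
    action: Callable[[dict], dict]     # how to customize (returns a new page)


def customize(page: dict, user: dict, rules: list[Rule]) -> dict:
    """Apply every rule whose condition holds for the user."""
    result = dict(page)
    for rule in rules:
        if rule.condition(user):
            result = rule.action(result)
    return result


# Hypothetical rules: students get an extra menu entry,
# users with low vision get a larger font.
rules = [
    Rule(lambda u: u.get("role") == "student",
         lambda p: {**p, "menu": p["menu"] + ["Course registration"]}),
    Rule(lambda u: u.get("low_vision", False),
         lambda p: {**p, "font_size": "large"}),
]

page = {"menu": ["Home", "News"], "font_size": "medium"}
student_page = customize(page, {"role": "student"}, rules)
```

Keeping the user description separate from the rules mirrors the split between the user ontology and the customization rule model described above.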

At the moment, the OntoWeaver framework provides very limited critiquing functionality. In the future this functionality can be enhanced by defining complex constraints to verify the validity of complex website structures, and allowing the specification of critiquing rules. It is also planned to extend the customization facility of the framework by allowing the reuse of user profiles which originate from different customization technologies [Lei et al., 2005].

2.5.3. WISE

WISE (Web Information System auto-construction Environment) is the most recent proposal for ontology-driven development of Web information systems (WIS) [Tang et al., 2006]. This framework provides the following features:

• Utilization of WIS ontology in order to specify the user requirements.

• Providing graphical interface for ontology development.

• Automatic design of user interface and generation of the code from WIS ontology.

The system is developed as a prototype on the Eclipse platform. The system architecture (shown in Figure 2.8) consists of three tools:

• WISE Builder is a graphical tool used for constructing domain and behavior ontologies.

• WISE Mapper provides transformation between ontologies and relational database management systems.

• Code Generator generates the source code based on the designed ontologies.

Figure 2.8. WISE system architecture

As mentioned earlier, WISE is a prototype that provides a limited set of functionality. The system can be enhanced to a great extent by developing a tool for refining the generated code and mechanisms capable of defining workflows to control business processes [Tang et al., 2006].

2.6. Website Evaluation Methods

Several methods have been proposed to evaluate websites. However, as websites are designed to serve users’ needs, the best evaluation technique is to let users work with the website and provide feedback on it [Nanard and Nanard, 1995]. One possible way of collecting feedback from users is to invite them to think aloud about what they are doing [Shneiderman and Plaisant, 2005]. The evaluators can also record on video the users performing tasks, which can reveal to the designers the problems that users may face when using the website.

2.6.1. Heuristic evaluation

In order to elicit the opinions of experts regarding a website, the development team can use heuristic evaluation. This is an informal method of usability analysis wherein several experts are given an interface and asked to comment on it [Nielsen and Molich, 1990]. [Garzotto et al., 1995] proposed the following set of heuristics to evaluate a Web application:

• Richness examines how much information the website contains and the ways of getting to this information.

• Ease evaluates how easily information can be accessed and functions invoked.

• Consistency measures the regularity of different aspects in the applications, such as fonts, colors, labeling, etc.

• Self-evidence shows how well the user understands the meaning of content and functionalities.

• Predictability shows if the user can anticipate the output of the requested function.

• Readability measures the user’s overall feeling about the website.

• Reuse identifies if the website objects can be used in different contexts.

2.6.2. Architectural evaluation criteria

Another approach, which can be very useful for evaluating website information structure, navigation, and layout, was proposed by researchers from Yonsei University in Seoul, Korea [Hong and Kim, 2004]. They proposed a set of website evaluation criteria derived from the field of architecture, where they are used for evaluating buildings and other constructions.

The researchers found two similarities between websites and buildings. First, websites and buildings have similar objectives: buildings are constructed to provide a space for physical activities, while websites are designed to provide a space for virtual activities. We construct buildings for markets, schools, libraries, and post offices. We also design websites for online malls, e-learning, virtual libraries, and email. The second similarity is about how users perceive websites and buildings. For example, users expect both buildings and websites to be reliable and convenient.

Using the parallel with buildings, the researchers created the following categories of criteria for evaluating websites:

1. Robustness refers to the solidity of the system structure in overcoming all expected and unexpected threats. Robustness of a website can be evaluated by two criteria: internal reliability that measures the operational stability of websites, and external security that measures how safe the website is from external threats.

2. Functional Utility evaluates how appropriate the website is for usage by two criteria: content usefulness which denotes the quality of information provided in websites, and navigation usability that refers to the ease of navigation of websites.

3. Aesthetic Appeal expresses how enjoyable the website is and how pleasant a feeling it provides to its inhabitants (users). The appeal of websites can be evaluated by two criteria: system interface attractiveness, which concerns the pleasantness of the human-computer interface, and communication interface attractiveness, which refers to the pleasantness of the interfaces between users.
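The three categories and their six criteria form a simple two-level hierarchy, which the Python sketch below uses to aggregate ratings. The source does not prescribe how the criteria are combined, so the unweighted average, the 1-5 rating scale, and the example ratings are all assumptions made for illustration.

```python
# Categories and criteria as listed above.
CRITERIA = {
    "Robustness": ["internal reliability", "external security"],
    "Functional Utility": ["content usefulness", "navigation usability"],
    "Aesthetic Appeal": ["system interface attractiveness",
                         "communication interface attractiveness"],
}


def category_scores(ratings: dict[str, float]) -> dict[str, float]:
    """Average the ratings of the criteria in each category (assumed rule)."""
    return {
        category: sum(ratings[c] for c in criteria) / len(criteria)
        for category, criteria in CRITERIA.items()
    }


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "internal reliability": 4,
    "external security": 5,
    "content usefulness": 3,
    "navigation usability": 4,
    "system interface attractiveness": 2,
    "communication interface attractiveness": 4,
}
scores = category_scores(ratings)
```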

2.7. Summary

In this chapter, the core aspects of website design and the utilization of ontologies for Web development have been covered. The most important aspects that website designers have to address include the website development methodology, information architecture, modeling approaches, and website evaluation. Developers who are interested in utilizing ontologies in the design process have to consider the available ontology languages and the existing approaches that use ontologies for this purpose.

The development process aimed at producing a website can be realized as the traditional software development lifecycle (SDLC), which consists of five phases: planning, analysis, design, implementation, and maintenance. However, several methodologies have been proposed to address the issues specific to website design that the traditional SDLC does not consider. This study covers three of them: Relationship Management Methodology, Object-Oriented Hypermedia Design, and the most recent development methodology proposed by Fraternali.

The most important design issues that influence the success of websites are captured by Information Architecture (IA). IA defines the scheme of organizing information on websites, creates the structures for classifying content chunks, designs the navigation paths, and defines the look and feel of web pages. Information architects produce a set of deliverables which include, but are not limited to, sitemaps, wireframes, layouts, and style guides. These deliverables are used as tools to conceptualize the requirements and as guidelines for the implementers who construct the website.

As website structures become more complex, developers need tools and methodologies that support the website conceptual modeling process. This chapter covered four modeling approaches and tools. Hypertext Design Model (HDM) enables designers to define all the information classes and navigation elements that constitute a website. Relationship Management Methodology (RMM), together with the development process, specifies the primitives borrowed from Entity-Relationship diagrams that can be used for conceptual modeling of websites. Web Modeling Language (WebML) is a recent approach for conceptual modeling that utilizes ER notation and XML syntax for modeling the important aspects of websites. Visual vocabularies provide a set of graphic shapes that information architects can use to describe the information structure and navigation paths of a site.

Ontologies, one of the most recent and most promising topics on the Web, have the potential to improve the Web in many ways. For websites, ontologies can serve as the language for specifying the meaning of content, navigation mechanisms, and layout. First, software agents can “understand” the content of websites defined with ontologies and process it, for example, for indexing. Second, the content of websites can be presented in different ways depending on the user. Third, the website design process can be expedited by utilizing existing ontologies that define website structure, navigation, and layout. Three ontology-based systems for website development were reviewed in this chapter. These are OntoWebber, OntoWeaver, and WISE.

Finally, evaluation is the step that usually concludes the website development process. It aims to measure several aspects of the produced website: usability, robustness, security, aesthetic appeal, etc. Two approaches can be used for this purpose. Heuristic evaluation elicits the opinions of experts using a set of heuristics. Another approach, proposed by researchers from Yonsei University in Seoul, allows evaluating a website using a set of architectural criteria originally used for evaluating buildings.

CHAPTER 3 METHODOLOGY

The ontology-based approach to designing information architecture of websites consists of two core elements: upper-level ontologies and a website generator. The upper-level ontologies describe the general concepts and relations needed for designing the four conceptual models that constitute the information architecture of websites, namely the user-task, information, navigation, and presentation models. The website generator is the software that converts the conceptual models into a working website.

The objectives of this study were accomplished in a three-phase methodology, which includes analysis, development, and evaluation phases. In the first phase, a wide range of academic resources on Information Architecture and the Semantic Web were reviewed and best practices of Web Design were examined. During the second phase, the website design methodology, upper-level ontologies, and website generator were developed. Also during the second phase, the approach was tested using a case study for the website of the Asian Institute of Technology. Finally, during the evaluation phase, a set of evaluation criteria was developed and used to compare the approach with OntoWebber and WebML.

3.1. Analysis Phase

The target users of the approach proposed in this study are information architects and Web designers. Therefore, in order to develop an approach that would address the actual needs of the potential users, the current IA and website design issues were identified and analyzed. This was accomplished by the following:

• Reviewing professional literature. Professional periodicals, whitepapers, and portals related to Information Architecture and Web Design were reviewed. Current trends in the area of Information Architecture were identified and studied.

• Participating in website development projects. Collaborating on a website development project enhanced the practical understanding of the design issues and revealed the actual problems that Web development teams face.

• Studying the best practices of Web Design. Different approaches that Web designers apply to solve design problems were analyzed and compared. This helped to identify the best solutions to these problems and to implement them in the proposed approach.

3.2. Development Phase

3.2.1. Development of the website design methodology

During the first step of the development phase, the website design methodology was elaborated for the approach. Four iterative steps were identified in the methodology: user-task modeling, information modeling, navigation modeling, and presentation modeling. Additionally, for each of the four steps, a list of prerequisites and deliverables was formulated.

3.2.2. Development of the upper-level ontologies

Based on the website design methodology defined during the preceding step, a set of four upper-level ontologies was created. These ontologies describe the general concepts and relations that are needed in order to construct the conceptual models that constitute the information architecture of a website.

• User-Task Upper-level Ontology defines concepts that can be used by the designer to describe potential users of the website, their demographic characteristics, social aspects and expectations from the website.

• Information Upper-level Ontology contains constructs that are necessary to define the information classes of the resource.

• Navigation Upper-level Ontology defines concepts required to describe navigation schemes of the website.

• Presentation Upper-level Ontology contains the constructs, which can be used to define the composition of pages, their layout, and the overall look-and-feel of the resource.

3.2.3. Development of the website generator

During the next step, the Plone Website Generator was developed. The software converts the model ontologies into a product for the Plone Content Management System.

3.2.4. Testing of the approach

The approach was tested using the use case of the website of the Asian Institute of Technology. For this website, a set of four model ontologies was created using the corresponding upper-level ontologies. These ontologies were provided as input for the Plone Website Generator, which produced a product for Plone CMS. This product was installed in the CMS, which created a working website of the institute, including the content types and objects, templates, and CSS styles.

3.2.5. Development tools

The four most important issues related to the selection of tools for construction of the approach are: ontology representation language, ontology editor, programming language, and Content Management System.

• Ontology representation language. The Web Ontology Language (OWL) was used for development of the upper-level ontologies as well as individual website model ontologies.

• Ontology editor. Protégé 3.2.1 was used to develop and manage the ontologies.

• Programming language. The Plone Website Generator was written in the Java programming language. For processing the ontologies, the Protégé-OWL Java API was used. Plone products are generated in a combination of five languages: Python, XHTML, TAL, METAL, and CSS.

• Content Management System. Plone CMS was selected to demonstrate the automatic generation of a website based on the designed model ontologies.

3.3. Evaluation Phase

During the final phase, a set of evaluation criteria was created. These criteria cover the factors that reflect information architecture aspects, modeling language aspects, potential for reuse of existing models, tooling support, aspects of the Semantic Web, and user-friendliness. This set of evaluation criteria was used to compare the approach with OntoWebber and WebML.

CHAPTER 4 IMPLEMENTATION

4.1. Website Design Methodology

The design phase follows the analysis phase and precedes the implementation phase of the traditional software development life cycle. The main purpose of the design phase is to produce a set of models that reflect various architectural aspects of the resource. The aspects of information architecture of a website include: 1) potential users and their tasks, 2) information structure, 3) navigation schemes and mechanisms, and 4) presentation views.

The overall website design methodology for the approach consists of four iterative and interrelated activities (Figure 4.1): user-task modeling, information modeling, navigation modeling, and presentation modeling. At the end of each modeling activity, a corresponding IA model is produced. The model is represented as an ontology written in Web Ontology Language (OWL) and must be grounded to the corresponding upper-level ontology.

[Figure 4.1 shows four numbered steps: 1. User-Task Modeling, 2. Information Modeling, 3. Navigation Modeling, 4. Presentation Modeling]

Figure 4.1. Website design methodology

4.1.1. User-Task Modeling

The first step in the process of website design is user-task modeling. The purpose of this activity is to construct the User-Task Model, which defines the potential users of the website and describes the tasks that these users can carry out on the site. Each user or user category can be defined by its age, social life, occupation, location, computer experience, and expectations from the website. Each task is defined by its name, description, and type (view, edit, create, or delete). The User-Task Model is built as an OWL ontology and must be grounded to the User-Task Upper-level Ontology.
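The content of a User-Task Model can be pictured with a short Python sketch. In the approach itself the model is an OWL ontology; the data classes below only illustrate the information it captures. The field selection, the helper `tasks_of_type`, and the example user category are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskType(Enum):
    """The four task types named in the methodology."""
    VIEW = "view"
    EDIT = "edit"
    CREATE = "create"
    DELETE = "delete"


@dataclass
class Task:
    name: str
    description: str
    type: TaskType


@dataclass
class UserCategory:
    name: str
    occupation: str
    computer_experience: str
    expectations: list[str] = field(default_factory=list)
    tasks: list[Task] = field(default_factory=list)


def tasks_of_type(user: UserCategory, task_type: TaskType) -> list[str]:
    """Return the names of the user's tasks that have the given type."""
    return [task.name for task in user.tasks if task.type is task_type]


# Hypothetical user category for a university website.
applicant = UserCategory(
    name="Prospective student",
    occupation="graduate",
    computer_experience="intermediate",
    expectations=["find program information quickly"],
    tasks=[
        Task("Browse programs", "Look through the offered programs", TaskType.VIEW),
        Task("Submit application", "Fill in the admission form", TaskType.CREATE),
    ],
)
```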

4.1.2. Information Modeling During the information modeling step, the designer constructs a model that defines the information classes of the website and the relationships between these classes. Each information class is defined by its unique name, a textual description that explains its nature, attributes that represent different characteristics of the class, and its relationships with other classes.

The Information Model is represented as an OWL ontology and must be grounded to the Information Upper-level Ontology. In addition, the designer may ground the model to another upper-level ontology or link to an existing domain ontology. For instance, the designer can import one of the PROTON ontologies [PROTON]. PROTON ontologies contain about 300 classes and 100 properties, providing coverage of the general concepts necessary for a wide range of tasks, including semantic annotation, indexing, and retrieval of documents.

4.1.3. Navigation Modeling During the navigation modeling step, the designer defines the possible navigation paths that the user must take in order to access a page or perform a task on the website. This is done by identifying the nodes of the site and the paths between the nodes. A node represents a website content object, such as a page or folder. A path is a directed edge that represents the traversal from one node to another. The Navigation Model is represented as an OWL ontology and must be grounded to the Navigation Upper-level Ontology.

4.1.4. Presentation Modeling The purpose of the presentation modeling is to construct the Presentation Model, which defines composition, layout and look-and-feel of pages on the website. This is the most time-consuming step and includes the following activities:

• Definition of views. A view is associated with a particular information class (content type) defined in the Information Model and describes how the content objects of this type will be displayed on the website. The view is defined by constructing a sequence of elements that will constitute the webpage. Each element can be associated with a particular attribute of the corresponding information class or can contain a text string.

• Definition of styles. Styles define the aesthetic aspects and positioning of elements on pages. The designer can define styles for the standard HTML tags, specific elements, and common class selectors. Styles are defined using standard Cascading Style Sheets (CSS).

The Presentation Model is represented as an OWL ontology and must be grounded to the Presentation Upper-level Ontology, which contains all necessary constructs for defining the above-mentioned presentation aspects.

4.1.5. Website Generation After completion of the design phase, the constructed IA models can be converted into a working website in Plone CMS. The conversion is done by the Plone Website Generator software, which generates a Plone product. This product, when installed in Plone, creates the content types, content objects, skins, and CSS styles of the website.

4.2. Upper-level Ontologies As described in Section 4.1, a website model is produced at the end of each of the four modeling steps. These four models are represented as ontologies and are based on the corresponding upper-level ontologies. The upper-level ontologies describe general concepts and relations, to which all elements in the individual website models should be linked. The upper-level ontologies are developed in the OWL-DL variant of the Web Ontology Language. The Protégé software was used to construct these ontologies.

4.2.1. User-Task Upper-level Ontology The User-Task Upper-level Ontology defines two basic concepts that constitute the User-Task Model: User and Task. The User concept represents the generalized description of a potential website user or a user category. The user is described by specifying such data as the user’s age, social life, occupation, location, and computer experience. The Task concept is used to define the tasks that can be carried out on the website by the users. Each task is defined by the task name, description, type (view, edit, create, delete), and the users that can carry it out.
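The two concepts can be pictured outside OWL as plain data structures. The following Python sketch is only an illustration under stated assumptions: the field names mirror the properties listed above, while the class layout, the `tasks_for` helper, and any sample users or tasks passed to it are hypothetical and are not taken from the actual ontology.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskType(Enum):
    # The four task types named in the text.
    VIEW = "view"
    EDIT = "edit"
    CREATE = "create"
    DELETE = "delete"


@dataclass(frozen=True)
class User:
    # A potential website user or user category (field names are assumptions).
    name: str
    age: str = ""
    social_life: str = ""
    occupation: str = ""
    location: str = ""
    computer_experience: str = ""


@dataclass
class Task:
    # A task with its name, description, type, and the users allowed to do it.
    name: str
    description: str
    type: TaskType
    users: List[User] = field(default_factory=list)


def tasks_for(user: User, tasks: List[Task]) -> List[str]:
    """Return the names of the tasks that the given user can carry out."""
    return [task.name for task in tasks if user in task.users]
```

In this reading, the user expectations mentioned in Section 4.1.1 are expressed indirectly, through the set of tasks that list the user.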

4.2.2. Information Upper-level Ontology During the information modeling step, the information architect can define an arbitrary number of information classes, which will be transformed into website content types during the website generation process. One of the major benefits of the approach described in this paper is that the designer can reuse existing upper-level and domain ontologies for defining the information classes.

However, the classes defined in the Information Model may vary considerably in their degree of abstraction, and the designer may not want some abstract classes of the domain ontologies to be converted into website content types. Therefore, we need a way to mark the classes that have to be converted into content types. This is realized by the Information Upper-level Ontology, which defines the ContentType class. The ontology must be imported by the corresponding website model ontology and can (optionally) be combined with other upper-level and domain ontologies that fit the website domain. In order to specify that a particular information class must be converted into a website content type, the designer must define it as a subclass of the ContentType class. Information classes that are not subclasses of ContentType will not be converted.
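The selection rule can be sketched in a few lines of Python. This is an illustration only, not the generator's code: the class hierarchy is assumed to be given as a dictionary from class name to its direct superclasses, and the sketch assumes that indirect subclasses of ContentType also qualify, which the text does not state explicitly.

```python
from typing import Dict, List


def content_types(superclasses: Dict[str, List[str]],
                  root: str = "ContentType") -> List[str]:
    """Return the classes that are direct or indirect subclasses of `root`.

    `superclasses` maps each class name to the names of its direct
    superclasses (a hypothetical, simplified stand-in for the OWL model).
    """
    def descends(cls: str, seen: tuple = ()) -> bool:
        parents = superclasses.get(cls, [])
        if root in parents:
            return True
        # Follow superclass links, skipping classes already visited.
        return any(descends(parent, seen + (cls,))
                   for parent in parents if parent not in seen)

    return sorted(cls for cls in superclasses
                  if cls != root and descends(cls))
```

With a hypothetical hierarchy in which `School` has the superclasses `Organization` and `ContentType`, only `School` is selected; the abstract `Organization` class stays in the model but is not converted.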

4.2.3. Navigation Upper-level Ontology The Navigation Upper-level Ontology defines the classes and relations that can be used by the designer to associate the tasks of the User-Task Model with the content objects on the website and describe how these content objects are connected to each other. In addition, this ontology contains a mechanism to define which content object must be generated automatically.

There are two classes in the Navigation Upper-level Ontology: Node and Path. The Node class is used to define content objects, such as pages and folders. Each node is described by its unique identifier, its title, its association with a content type, and its association with a task defined in the User-Task Model. The Path class is used to define how the nodes are connected; in particular, it describes a path from one node to another node and defines the navigation type. There are two navigation types: global and local. The global navigation type defines the nodes that will be generated on the top level of the website hierarchy, while the local navigation type defines the hierarchy of nodes in a particular section of the website.
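As an informal illustration of how nodes and paths could yield a site hierarchy, consider the Python sketch below. It rests on assumptions that the text does not confirm: the target of a global path is treated as a top-level node, the target of a local path is placed beneath its source, and all field names and the `build_hierarchy` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    id: str            # unique identifier of the content object
    title: str
    content_type: str  # associated content type
    task: str          # associated task from the User-Task Model


@dataclass(frozen=True)
class Path:
    source: str           # id of the node the path starts from
    target: str           # id of the node the path leads to
    navigation_type: str  # "global" or "local"


def build_hierarchy(nodes: List[Node],
                    paths: List[Path]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (top-level node ids, mapping of node id to child node ids)."""
    known = {node.id for node in nodes}
    top_level: List[str] = []
    children: Dict[str, List[str]] = {}
    for path in paths:
        if path.source not in known or path.target not in known:
            raise ValueError(f"path refers to an unknown node: {path}")
        if path.navigation_type == "global":
            top_level.append(path.target)
        elif path.navigation_type == "local":
            children.setdefault(path.source, []).append(path.target)
        else:
            raise ValueError(f"unknown navigation type: {path.navigation_type}")
    return top_level, children
```

Rejecting paths that mention unknown nodes is a design choice of the sketch; it keeps a dangling reference in the model from silently producing an incomplete hierarchy.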

4.2.4. Presentation Upper-level Ontology The Presentation Upper-level Ontology contains the classes and relations needed to define the composition, layout, and look-and-feel of the pages on the website. The hierarchical structure of the ontology is shown in Figure 4.2.

The main class of the ontology is the View class, which is used to describe the view of a particular task defined in the User-Task Model. A view can have one or many screens defined as instances of the Screen class. A screen can be defined as an instance of the MasterScreen class (describes the composition of an entire page) or as an instance of the SlaveScreen class (defines the layout of a particular part of a page). Each screen has a sequence (an instance of the Sequence class), which defines the first container of the screen. The Container class is used to define a block of a page that contains one simple or composite element. Each instance of the Container class must be followed by another container; only instances of the LastContainer class do not specify a following container. The Element class is used to define the elements that constitute a webpage. There are two categories of elements: a simple element (an instance of the SimpleElement class) is an atomic element, and a composite element (an instance of the CompositeElement class) is an element that contains other elements. The relations between these classes that define the composition of pages are shown in Figure 4.3.
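The container chain described above behaves like a singly linked list. The Python sketch below shows one way to walk such a chain and list the atomic elements of a screen in display order. It is a simplified illustration: a LastContainer is modeled as a container whose `next` is `None`, a composite element holds its children in a plain list, and all names are hypothetical rather than taken from the ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str


@dataclass
class SimpleElement(Element):
    """An atomic element."""


@dataclass
class CompositeElement(Element):
    """An element that contains other elements."""
    children: List[Element] = field(default_factory=list)


@dataclass
class Container:
    element: Element
    next: Optional["Container"] = None  # None plays the role of LastContainer


def flatten(element: Element) -> List[str]:
    """Return the names of the atomic elements inside `element`, in order."""
    if isinstance(element, CompositeElement):
        names: List[str] = []
        for child in element.children:
            names.extend(flatten(child))
        return names
    return [element.name]


def screen_elements(first: Optional[Container]) -> List[str]:
    """Walk the container chain of a screen, starting at its first container."""
    names: List[str] = []
    container = first
    while container is not None:
        names.extend(flatten(container.element))
        container = container.next
    return names
```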

In addition, the ontology contains the CSS and HTML classes to define the look-and-feel of the page elements. The CSS class contains subclasses and properties that are used to define the styles of the CSS class selectors, HTML selectors, pseudo classes, and pseudo elements. The instances of the HTML class are used to enumerate the standard HTML tags, such as BODY, P, H1, etc.

Figure 4.2. Hierarchy of the Presentation Upper-level Ontology (class diagram showing subClassOf relationships among View, Screen, Sequence, Container, Element, the widget classes, and the HTML and CSS classes)

Figure 4.3. Composition of Pages (diagram: View hasScreens Screen; Screen hasSequence Sequence; Sequence startsWith Container; Container contains Element; SlaveScreen and MasterScreen are subclasses of Screen; LastContainer is a subclass of Container; SimpleElement and CompositeElement are subclasses of Element)

4.3. Plone Website Generator Plone Website Generator is a program that converts the information architecture models into a product for the Plone Content Management System. The four IA models, represented as OWL ontologies, are provided as input to the generator. The software, written in Java, uses the Protégé-OWL Java API to access and process these ontologies in order to generate a product for Plone CMS. In turn, the product, when installed in Plone, generates a browsable website.

4.3.1. System architecture The system architecture of the Plone Website Generator, shown in Figure 4.4, consists of the following modules:

Content Types Generation Module produces the definitions of the website content types. The definitions are generated based on the Information Model. For every class defined as a subclass of the ContentType class in the Information Model ontology, the module generates a description file. The description file defines the content type name, fields, field types, the widgets used for populating the fields with content, and the associated skin according to which the content objects of this type will be displayed.

Navigation Generation Module generates Python instructions, which, when executed in Plone, will generate the website hierarchy. The instructions are generated based on the Navigation Model ontology, which defines the pages and folders as nodes, and describes how these nodes are connected.

Presentation Generation Module produces the layout and look-and-feel of pages of the future website. The presentation aspects, defined in the Presentation Model ontology, are converted into skins, templates, portlets, and CSS styles.

File System Generation Module creates the directory structure of the product and writes the instructions generated by the other three modules into the corresponding files and folders.
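The division of labor between the four modules can be sketched as a small pipeline: three functions each return a mapping from relative file path to generated text, and a fourth writes everything into the product directory. The Python below is a schematic illustration only. The actual generator is written in Java, and every file name, input format, and generated string here is a hypothetical placeholder rather than real Plone code.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def generate_content_types(information_model: Dict[str, List[str]]) -> Dict[str, str]:
    # One (placeholder) description file per content type and its fields.
    return {
        f"content/{name}.py": f"# content type {name}: fields {', '.join(fields)}\n"
        for name, fields in information_model.items()
    }


def generate_navigation(navigation_model: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    # One (placeholder) instruction per node: (node id, content type).
    lines = [f"# create node {node_id} of type {content_type}"
             for node_id, content_type in navigation_model]
    return {"Install.py": "\n".join(lines) + "\n"}


def generate_presentation(presentation_model: Iterable[str]) -> Dict[str, str]:
    # One (placeholder) template per view.
    return {f"skins/{view}.pt": f"<!-- template for view {view} -->\n"
            for view in presentation_model}


def write_product(product_dir: str, *file_maps: Dict[str, str]) -> List[str]:
    """Create the directory structure and write all generated files."""
    written: List[str] = []
    for file_map in file_maps:
        for relative_path, text in file_map.items():
            target = Path(product_dir) / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
            written.append(relative_path)
    return sorted(written)
```

Keeping the three generation steps free of file access, as assumed here, makes each of them easy to check in isolation before anything is written to disk.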

Figure 4.4. Architecture of Plone Website Generator (inputs: User-Task Model, Information Model, Navigation Model, Presentation Model; modules: Content Types Generation Module, Navigation Generation Module, Presentation Generation Module, File System Generation Module; output: Plone Product)

4.3.2. Structure of generated Plone product A product generated by the Plone Website Generator is composed of a set of instructions written in a combination of five languages: Python, XHTML, METAL, TAL, and CSS. The product consists of the following parts (Figure 4.5):

• Content is a collection of content types. Content type is a schema that defines fields and methods for content objects. This container also contains the initialization file that defines which content types will be installed in Plone.

• Skins container is a collection of template layers for content types, system templates, portlets, and Cascading Style Sheets.

• Initialization module is the usual “Python package” initialization module.

• Configuration module defines the configuration variables for the product, such as product name, description, and contains the global variables used by the content types, such as vocabulary lists.

• Installation module is a set of Python instructions that install the content types, skins, and create the content objects.

Figure 4.5. Structure of Plone product

The product is generated as a directory and has the structure shown in Figure 4.6. This directory must be copied into the Plone Products directory in order to compile the Python files and install the product in Plone.

Figure 4.6. Directory structure of Plone product

4.4. Use-Case for AIT Website This use-case describes the application of the approach described in this paper to modeling the information architecture of the website of the Asian Institute of Technology. The Asian Institute of Technology (AIT) is an international institution for higher education in engineering, advanced technologies, and management and planning. AIT comprises three schools: the School of Engineering and Technology (SET), the School of Environment, Resources, and Development (SERD), and the School of Management (SOM). The institute operates as a self-contained international community at its campus located some 40 kilometers north of Bangkok, Thailand [AIT Website].

AIT operates its official website http://www.ait.ac.th, which is a fully-fledged academic institution web resource. In 2006, the institute administration decided to move the content of the website to the Plone CMS and to remodel the resource conceptually. In this section, the ontology-based approach to designing the information architecture of websites is demonstrated on the AIT website. For this website, four simplified information architecture models were constructed using the Protégé ontology editor. These models were converted into a Plone product by the Plone Website Generator. Upon installation in Plone, the product generated a prototype website of the institute.

4.4.1. User-Task Model The User-Task Model of the AIT website describes the categories of potential users of the website and the tasks that the users may want to carry out on the website (Figure 4.7). Both users and tasks are defined as instances of the corresponding classes of the User-Task Upper-level Ontology. Eight categories of users were defined in the model:

1. Prospective student

2. Current student

3. Faculty member

4. Staff member

5. Alumnus

6. Partner

7. Donor

8. Media

In addition, the following tasks were defined in the model:

• Home page (view/edit)

• General information about the institute (view/edit)

• Mission statement (view/edit)

• Information about campus (view/edit)

• Contact information (view/edit)

• Information how to get to the institute (view/edit)

• General information on admissions (view/edit)

• Information on general admission requirements (view/edit)

• Information on admission requirements to a specific program (view/edit)

• General application requirements (view/edit)

• Application forms (view/edit)

• Information on available scholarships and financial assistance (view/edit)

• General information on education (view/edit)

• Information about academic programs (view/edit)

• Information about schools (view/edit)

• Information about fields of study (view/edit)

• Academic regulations (view/edit)

• General information on research (view/edit)

• Information about research areas (view/edit)

• Information about research centers (view/edit)

• Information about research projects (view/edit)

Figure 4.7. A user in the User-Task Model for the AIT website

4.4.2. Information Model The Information Model defines the information classes that constitute the AIT website and the relationships between the classes (Figure 4.8). This model ontology is grounded to two imported ontologies: the Information Upper-level Ontology and the PROTON Upper-level Module [PROTON]. All information classes are defined as subclasses of classes in the PROTON ontology. In addition, the classes that should be transformed into content types of the future website were given the ContentType class as a direct superclass.

The following classes were identified in the model:

• Front page

• About statement

• Mission statement

• Campus information

• Contact information

• How to get there information

• Map

• General information on admissions

• Application requirement

• General admission requirement

• Program eligibility requirement

• Form

• General information on education

• Academic program

• School

• Field of study (FoS)

• General information on research

• Research area

• Research center

• Research project

Figure 4.8. An information class in the Information Model for the AIT website

4.4.3. Navigation Model The Navigation Model defines the nodes (folders and pages) on the institute website and establishes the relations between them. The nodes and the paths between them are defined as instances of the Node and Path classes respectively, which are defined in the imported Navigation Upper-level Ontology. Each node describes a particular object that will be generated on the website, such as the Home page, the About Us page, the Campus Information page, etc. For each node, the following information was specified (Figure 4.9):

• Unique ID of the node will be used as the unique identifier of the generated content object.

• Title will be used as the title of the generated content object.

• Association with content type defines the content type according to which the content object will be generated.

• Association with task relates the node to the corresponding task in the User-Task Model.

Figure 4.9. A node in the Navigation Model for the AIT website

The relationships between the nodes, which constitute the navigation scheme, are represented as instances of the Path class. Each path specifies the traversal from one page to another and assigns the corresponding navigation type (local or global). For example, the path ‘AboutUs_CampusInformation’, shown in Figure 4.10, defines that there is a path from the ‘About Us’ page to the ‘Campus Information’ page, available through the local navigation type (the local navigation menu).

Figure 4.10. A path in Navigation Model for the AIT website

4.4.4. Presentation Model The Presentation Model describes various presentation aspects of the AIT website. The composition of pages was defined using such constructs as View, Screen, Sequence, Container, and Element. For every task defined in the User-Task Model, a corresponding view was created. As the User-Task Model contains two types of tasks, view task and edit task, the Presentation Model contains views for viewing pages as well as views for editing pages. For example, for the task ‘ViewAboutStatement’ a corresponding view is created (Figure 4.11). This view has one screen ‘ViewAboutStatement_Screen’, which is a sequence of the following elements (Figure 4.12):

• Document title

• Brief description

• ‘Facts and Figures’ subtitle

• ‘Facts and Figures’ content

• ‘History’ subtitle

• ‘History’ content

Figure 4.11. A view in the Presentation Model for the AIT website

Figure 4.12. Composition of the 'About Us' page (blocks from top to bottom: Document title, Brief description, ‘Facts and Figures’ subtitle, ‘Facts and Figures’ content, ‘History’ subtitle, ‘History’ content)

The ‘ViewAboutStatement_Screen’ screen is defined as a slave screen, which defines the composition of the document body and has the master screen ‘main_template’. The ‘main_template’ screen defines the layout of the website pages. The layout is represented as a sequence of the following blocks (Figure 4.13):

• Website global header

• Global navigation menu

• Plone personal tools

• Breadcrumb trails

• Local navigation menu

• Content views

• Content actions

• Document body

• Website global footer

Figure 4.13. Composition of the 'main_template' (layout blocks: Website global header, Global navigation menu, Personal tools, Breadcrumb trails, Local navigation menu, Content views, Content actions, Document body, Website global footer)

The aesthetic aspects of the website pages are defined using the subclasses of the CSS class (Figure 4.14). The appearance of the standard HTML tags is defined by instances of the HTMLSelector class, while the look-and-feel of such elements as the header, global navigation bar, local navigation bar, and footer is defined by instances of the IDSelector class. The website's common classes are represented as instances of the ClassSelector class.

Figure 4.14. A CSS style definition in the Presentation Model for the AIT website
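To make the role of the three selector classes concrete, the Python sketch below turns a list of style definitions into CSS text. It is only an illustration: the tuple-based input format and the `render_css` function are assumptions, and the way the actual generator reads the selector instances from the ontology is not shown here.

```python
from typing import Dict, Iterable, Tuple

# CSS prefix for each selector class named in the text:
# HTML tags have no prefix, IDs use "#", and classes use ".".
SELECTOR_PREFIX = {
    "HTMLSelector": "",
    "IDSelector": "#",
    "ClassSelector": ".",
}


def render_css(styles: Iterable[Tuple[str, str, Dict[str, str]]]) -> str:
    """Render (selector class, name, properties) triples as CSS rules."""
    rules = []
    for selector_class, name, properties in styles:
        prefix = SELECTOR_PREFIX[selector_class]
        body = " ".join(f"{prop}: {value};" for prop, value in properties.items())
        rules.append(f"{prefix}{name} {{ {body} }}")
    return "\n".join(rules)
```

For example, a hypothetical `("IDSelector", "footer", {"font-size": "80%"})` entry is rendered as the rule `#footer { font-size: 80%; }`.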

4.4.5. Website generation The four IA models were provided to the Plone Website Generator software as input, which generated a product for the Plone CMS. Afterwards, the product was installed in Plone using the quick install method. The installation generated a working website in Plone, which consists of the content types, content objects, templates, and styles.

Content types. For each information class defined in the Information Model, a corresponding content type was created. The list of generated content types is displayed in the ‘Add Item’ menu (Figure 4.15). In order to create a document according to one of the created content types, the content manager has to navigate to the corresponding folder and select the required content type from the ‘Add Item’ menu. After that, the corresponding form is displayed. The form contains fields that correspond to the attributes specified for the information class. For example, in order to create a document that describes the School of Engineering and Technology, the user must navigate to the Schools folder of the website and then select the ‘school’ content type from the ‘Add Item’ menu. The form for editing a school will appear on the screen (Figure 4.16).

Figure 4.15. List of content types in the AIT website

Figure 4.16. A form for editing a document according to the ‘school’ content type

Content objects. The content objects, documents and folders, are generated based on the Navigation Model, which defines the position of objects in the website hierarchy. The structure of the AIT website consists of four sections, ‘About Us’, ‘Admissions’, ‘Education’, and ‘Research’, which are further subdivided into subsections. The overall website hierarchy is shown in Figure 4.17.

Templates and styles. For every view defined in the Presentation Model, a corresponding template was generated. Two types of templates were created: master templates and slave templates. The front page and the main template were defined as master templates, each of which describes a standalone page. The templates that define the presentation of content types were defined as slave templates. The latter define the presentation of the document body and inherit the overall layout of the main template. In addition, the website generator produced the definitions of the CSS styles, which define the appearance of pages.

Figure 4.17. Website hierarchy

CHAPTER 5 EVALUATION This chapter describes the evaluation of the ontology-based approach to modeling the information architecture of websites that was discussed in this paper. For the evaluation, a set of evaluation criteria was identified. These criteria were used to compare the approach discussed in this paper with two similar modeling approaches, WebML and OntoWebber, described in Section 2.3.3 and Section 2.5.1 respectively.

5.1. Evaluation Criteria The criteria for evaluation of website modeling approaches consist of the factors that reflect information architecture aspects, modeling language aspects, potential for reuse of existing models, availability of website generators, Semantic Web aspects, and the ease of use of the approaches and supporting tools.

Information architecture aspects cover the four important facets of information architecture: users, information structure, navigation systems, and presentation. This category describes the degree of support that the approaches provide for modeling these facets.

• User modeling describes the support that the approach provides for modeling the user aspects of websites. In particular, this criterion shows how effective the approach is in defining the users, their needs, characteristics, and expectations.

• Information modeling describes the schemes used by the approach for defining the information structure of the website and the information classes that will constitute the resource.

• Navigation modeling describes which navigation systems can be modeled by the approach and how they are modeled.

• Presentation modeling shows which presentation aspects can be captured by the modeling approach. This criterion defines whether the approach allows modeling the layout of pages and the look-and-feel of the entire website.

Modeling language aspects describe the visual vocabulary that can be used for modeling and the formats in which the models can be stored and exchanged.

• Visual vocabulary is the criterion that shows which visual modeling primitives can be used by the approach for designing the website.

• Storage and exchange formats are the formats that can be used to store and exchange the models designed by the approach.

Reuse of models covers such aspects as possibility to integrate the models produced by the approach with other models and the availability of models that can be reused by the modeling approach.

• Integration with other models is the criterion that shows whether the models designed by the approach can be integrated with models produced by other approaches or any other forms of conceptualization, such as ontologies.

• Availability of models is the criterion that describes the availability of the existing models that can be reused by the approach.

Website generation describes the tooling support of the approach for generating websites according to the designed models.

• Generators are the tools that can automatically generate working websites based on the models designed according to the approach.

• Integration with CMS is the criterion that shows the compatibility of the modeling approach with existing content management systems.

Semantic Web aspects cover how well the modeling approach fits the Semantic Web. In particular, this set of criteria shows which benefits of the Semantic Web are utilized by the approach.

User-friendliness criterion shows the ease of use of the modeling approach and describes the level of skills and qualifications that the website designer must possess in order to use the approach.

5.2. Comparison The criteria described in Section 5.1 were used to compare the ontology-based approach to designing the information architecture of websites described in this paper with two modeling approaches: OntoWebber and WebML. The results of the comparison are shown in Table 5.1.

Information architecture aspects

User modeling

Approach described in this paper:
• Definition of the user's demographic characteristics, computer experience, and social aspects
• Definition of the user's expectations by specifying which tasks the user wants to carry out

OntoWebber:
• Users are defined by the user ontology and the customization rule model
• The user ontology defines basic details about users, such as ID, password, group, device, and interests
• The customization rules define when and how to perform specific actions

WebML:
• Personalization model defines users and groups
• Each user or group is described by means of specific attributes

Information modeling

Approach described in this paper:
• Definition of the information classes, their properties, and relations using the ontological constructs

OntoWebber:
• The approach does not include information modeling
• It only accepts a domain ontology as input

WebML:
• Information structure is defined in terms of entities and relationships

Navigation modeling

Approach described in this paper:
• Definition of global and local navigation schemes
• Contextual navigation is generated automatically based on the information model
• Supplemental navigation mechanisms can be provided by the CMS and defined in the presentation model

OntoWebber:
• Navigation is defined as a site view ontology
• The ontology provides such constructs as static links, dynamic links, and contextual links

WebML:
• Navigation model expresses how pages and content units are linked
• The relations can be defined by two types of links: non-contextual and contextual

Presentation modeling

Approach described in this paper:
• Definition of the page composition at any level of granularity
• Definition of the element position and style
• Style definition of the standard element tags

OntoWebber:
• The presentation aspects are defined in the presentation ontology
• The ontology provides vocabulary to describe templates, presentation objects, and layout objects

WebML:
• Presentation model expresses the layout and graphic appearance of pages
• Allows definition of device-specific views

Modeling language aspects

Visual vocabulary

Approach described in this paper:
• UML Use-case diagrams can be used for user modeling
• UML Class diagrams and UML Package diagrams can be used for information modeling
• UML Activity diagrams can be used for navigation modeling
• UML Profile for GUI modeling can be used for presentation modeling

OntoWebber:
• No visual vocabulary is used

WebML:
• Information structure is defined by ER diagrams
• Users, navigation, and presentation aspects are modeled using the visual primitives developed as part of the WebML approach
• Provides support for UML Class diagrams

Storage and exchange formats

Approach described in this paper:
• All models are stored as ontologies in the OWL language

OntoWebber:
• All models are represented as ontologies in the RDF language

WebML:
• The models can be exported into XMI format

Reuse of models

Integration with other models

Approach described in this paper:
• The approach allows reusing other models represented as ontologies

OntoWebber:
• Domain ontologies can be provided as input

WebML:
• Existing data/information models represented as ER or UML class diagrams can be reused

Availability of models

Approach described in this paper:
• A large number of upper-level and domain ontologies are available and can be reused by the approach

OntoWebber:
• Only ontologies represented in RDF can be used

WebML:
• A large number of ER diagrams for different domains are available

Website generation

Generators

Approach described in this paper:
• The models designed with the approach can be converted into a working website in Plone CMS

OntoWebber:
• Not available

WebML:
• Models can be converted into a working website by the WebRatio CASE tool

Integration with CMS

Approach described in this paper:
• Plone CMS (see the Generators entry above)

OntoWebber:
• Not implemented

WebML:
• Not implemented

Other aspects

Semantic Web aspects

Approach described in this paper:
• Automatic generation of the semantically annotated content
• Semantic search and browsing of the content instances

OntoWebber:
• Provides basic semantic markup of web pages

WebML:
• Semantics of pages is not defined

User-friendliness

Approach described in this paper:
• Requires a high level of skills in information architecture and ontological engineering
• If combined with the UML diagrams, can be used by any website designer familiar with UML

OntoWebber:
• Requires a high level of skills in information architecture and ontological engineering

WebML:
• Due to the visual primitives implemented in the CASE tool, non-technical designers can easily construct website models

Table 5.1. Comparison of modeling approaches

5.3. Discussion of the comparison results The three modeling approaches were evaluated based on the evaluation criteria defined in Section 5.1. Table 5.1 shows the summarized comparison. In this section, the comparison results are discussed and the overall view of the comparison is presented.

5.3.1. Information architecture aspects User modeling is supported by all approaches. However, the OntoWebber approach provides the most comprehensive constructs for user modeling, which includes the complete definition of users and the description of the customization rules.

Information modeling is included in only two approaches: the approach described in this paper and the WebML approach. The OntoWebber approach does not include an information modeling step; it only allows existing domain ontologies to be provided as input.

All three approaches model navigation by defining links between the pages on the website. The contextual links are defined explicitly by OntoWebber and WebML. However, according to the approach described in this paper, the contextual navigation can be generated automatically based on the relations between information classes defined in the information model.
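The idea of deriving contextual navigation from the information model can be illustrated with a short Python sketch. It is a hypothetical reading of the mechanism, not the thesis implementation: class relations are given as a dictionary, content objects carry the identifiers of related objects, and the relation name `offeredBy` used in the usage note is invented for the example.

```python
from typing import Dict, List, Tuple

# class name -> {relation name: target class name}
ClassRelations = Dict[str, Dict[str, str]]
# object id -> (class name, {relation name: [ids of related objects]})
Objects = Dict[str, Tuple[str, Dict[str, List[str]]]]


def contextual_links(class_relations: ClassRelations,
                     objects: Objects) -> List[Tuple[str, str, str]]:
    """Derive (source id, relation, target id) links from class relations."""
    links = []
    for object_id, (cls, values) in objects.items():
        for relation, target_class in class_relations.get(cls, {}).items():
            for target_id in values.get(relation, []):
                # Keep only links whose target exists and has the expected class.
                if target_id in objects and objects[target_id][0] == target_class:
                    links.append((object_id, relation, target_id))
    return links
```

With a relation `offeredBy` from `AcademicProgram` to `School`, a program page whose `offeredBy` value names an existing school object would receive a contextual link to that school page, without the designer defining the link in the navigation model.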

Presentation modeling is implemented in all three approaches. The approach described in this paper provides mechanisms for defining the page composition at any level of granularity, the position and style of page elements, and the overall look-and-feel of pages. The WebML approach additionally allows definition of device-specific views.

5.3.2. Modeling language aspects

At present, only the WebML approach supports modeling with visual vocabularies. It uses ER diagrams for information modeling and its own constructs for modeling the other aspects of information architecture. However, the approach discussed in this paper can be combined with UML diagrams, which would ease the modeling task and allow designers without a background in ontology engineering to construct models.

For storage and exchange of models, the approaches use different formats. The approach discussed in this paper and the OntoWebber approach use languages that can define the semantics of the website content, OWL and RDF respectively. WebML models, in contrast, are exchanged in the XML Metadata Interchange (XMI) format, which does not define the meaning of models.

5.3.3. Reuse of models

All three approaches provide the benefits of reusing existing models. However, the approach discussed in this paper and the OntoWebber approach provide a more flexible mechanism for reuse. Their models are represented as ontologies, which enables the designer to import or extend existing domain or upper-level ontologies in order to construct a model that fits the requirements.

5.3.4. Website generation

Only two approaches, the approach discussed in this paper and WebML, make it possible to convert the designed models into a working website. The models designed according to the first approach can be converted into a product for the Plone CMS by the Plone Website Generator. This product, when installed in Plone, generates content types, content objects, navigation mechanisms, skins, and CSS styles. The WebML approach is also accompanied by a website generation tool, WebRatio. However, WebRatio can generate only the HTML and JSP code of a standalone website and is not compatible with any CMS.

5.3.5. Other aspects

The approach described in this paper provides the most sophisticated support for the Semantic Web technologies. The content of the web pages generated according to this approach can be annotated automatically using the information ontology. This enables semantic search and browsing, automatic composition of pages, and reuse of content in various software packages, such as address books, calendars, and integrated library systems. In OntoWebber, support for the Semantic Web is limited by the restrictions of RDF, and in WebML it is not implemented.
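Automatic annotation of content with the information ontology can be pictured as follows. This is a minimal sketch under stated assumptions: the namespace, the class Course, and the attribute names are hypothetical, and plain tuples stand in for RDF triples rather than the output of any particular tool.

```python
# Hypothetical namespace for the information ontology; not from the thesis.
NS = "http://example.org/info#"


def annotate(instance_id, info_class, attributes):
    """Return (subject, predicate, object) triples for one content object."""
    subject = NS + instance_id
    # The first triple states which information class the object belongs to.
    triples = [(subject, "rdf:type", NS + info_class)]
    # Each attribute of the object becomes one property triple.
    for name, value in sorted(attributes.items()):
        triples.append((subject, NS + name, value))
    return triples


TRIPLES = annotate("course-42", "Course", {"title": "Web Design", "credits": 3})
```

Because every content object is described with terms of the same ontology, a search tool or another software package could select objects by class or property instead of by matching text.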

The user-friendliness factor is determined by the availability of support for visual modeling. Only WebML provides visual modeling primitives. However, WebML uses standardized visual primitives, namely ER diagrams, only for information modeling. For modeling the other three aspects (users, navigation, and presentation), non-standardized, WebML-specific constructs are used. The designer must therefore become familiar with these primitives before he or she is able to design models. In contrast, the approach described in this paper can be combined with UML diagrams for designing all four conceptual models, which would allow any web designer familiar with UML to construct the models with ease.

CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH

6.1. Conclusions

With the growth of the size of websites and the increasing complexity of their features, the design of information resources for the World Wide Web has become a very complex task. Poorly designed websites may leave users frustrated at not finding the necessary information and cause them to leave the resource. Information Architecture is a recently coined discipline that aims to tackle this problem by establishing best practices for designing information resources in order to promote usability and findability. The discipline identifies the essential aspects of websites that must be captured by the designers. These aspects include the website users, its information structure and organization, navigation schemes, and presentation aspects.

However, designing a well-organized and aesthetically appealing website with a clear structure that can be understood by humans is not enough to say that the resource fully complies with the requirements of the current Web. According to Tim Berners-Lee, we are stepping into a new generation of the Web, where information can be consumed by machines; this is the generation of the Semantic Web. In the Semantic Web, information must be given a well-defined meaning that can be understood by software agents. Ontologies are the backbone technology of the Semantic Web, which can define machine-readable semantics of information resources.

In this paper, an ontology-based approach to designing the information architecture of websites has been presented. The approach brings together the recent practices of Information Architecture and the key technologies of the Semantic Web in order to solve the task of modeling information resources on the World Wide Web. Four conceptual models have been developed in order to define the key aspects of the information architecture of a website. The User-Task Model defines the potential users of the website, their demographic characteristics, social aspects, and expectations from the website. The Information Model identifies the information classes that will constitute the future resource, together with their attributes and relations. Navigation schemes are defined in the Navigation Model. Finally, the Presentation Model describes the composition of pages, their layout, and the overall look-and-feel of the resource. All four models are represented as ontologies, which provide machine-processable semantics of the models.

In addition, the approach is accompanied by the Plone Website Generator, a software package that converts the four conceptual models into a working website. The software accepts the model ontologies as input and transforms them into a product for the Plone Content Management System. This product, when installed in Plone, generates content types, content objects, templates, and Cascading Style Sheets of the modeled website. With this software, the approach becomes a framework for rapid development of well-structured websites whose information has defined machine-readable semantics.

The approach was evaluated and compared with similar approaches. For the evaluation, a set of evaluation criteria was developed. The set reflects such factors as information architecture and modeling language aspects, potential for reusing existing models, availability of tools for automatic website generation based on the models, Semantic Web aspects, and user-friendliness. These criteria were used to compare the approach with OntoWebber and WebML. The results of the comparison have shown that the approach described in this paper has made a significant contribution to the multidisciplinary research that aims to investigate the potential of ontologies for modeling the information architecture of websites. First, the ontologies provide a flexible medium for storage and exchange of models that define the information architecture of websites. Second, utilization of ontologies for modeling IA enables website designers to reuse existing domain and upper ontologies, which can improve the quality of models and speed up the modeling process. Third, the designed model ontologies can be converted into a working website by specially developed software, which allows the designers to concentrate on conceptual modeling of information resources rather than on coding.

6.2. Future research

Based on the evaluation results, the following improvements and directions for future research have been identified. Development of a modeling toolkit that enables the designers to construct the conceptual models using the Unified Modeling Language (UML) can considerably ease the modeling task and allow designers without a background in ontological engineering to create models of arbitrary complexity. In addition, the modeling toolkit can be enhanced by allowing automatic generation of the standard information architecture deliverables, such as persona descriptions, site maps, wireframes, and style guides.

Additionally, the inferencing capabilities of ontologies must be studied in order to identify the corresponding benefits that they can bring to the website modeling process. One of the possible benefits of these capabilities could be evaluation of conceptual models. The evaluation can be done automatically or semi-automatically by the modeling toolkit and can identify possible modeling flaws and generate recommendations for fixing them.
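As an illustration of what an automatic model check might look like, the sketch below flags pages that cannot be reached from the home page of a navigation model. It is only one conceivable kind of modeling flaw, chosen for illustration; the page names and the dictionary representation of the navigation model are assumptions, and no ontology reasoner is involved.

```python
from collections import deque


def unreachable_pages(links, start):
    """Return pages of the navigation model that cannot be reached from start."""
    seen = {start}
    queue = deque([start])
    # Breadth-first traversal over the navigation links.
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Every page mentioned as a link source or a link target.
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(all_pages - seen)


# Hypothetical navigation model: page -> pages it links to.
NAVIGATION = {
    "Home": ["Courses", "Staff"],
    "Courses": ["Home"],
    "Staff": [],
    "Archive": ["Home"],
}

FLAWS = unreachable_pages(NAVIGATION, "Home")
```

Here the page Archive links to Home but no page links to it, so a toolkit could report it and recommend adding a link from an existing page.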

Another important aspect of this approach, which has not been explored, is the potential of producing semantically enriched content. This aspect should be studied thoroughly in order to identify the syntactical factors of annotating and representing the machine-readable semantics of website content. It is also important to determine the appropriate level of granularity when describing the classes in the Information Model. Addressing these aspects would improve the quality of search and information retrieval, allow automatic content composition, and enable software agents to consume the content of the website.

Furthermore, the aspect of integration with Content Management Systems should also be explored. Apart from the generation of content types and objects, skins, and styles, it may be possible to generate user groups and permissions in the CMS based on the User-Task Model. The aspect of content workflow should also be taken into consideration. If the workflow is properly modeled, it will be possible to generate the workflow policies for the CMS, which will define the steps and permissions in the life cycle of website content.

REFERENCES

AIT Website. Official website of the Asian Institute of Technology. URL: .

Berners-Lee, T. (2001). The World Wide Web: A very short personal history. Retrieved on August 20, 2006 from .

Berners-Lee, T. (2004). The WorldWideWeb Browser. Retrieved on August 20, 2006 from .

Berners-Lee, T. (2006). Frequently Asked Questions by the press. Retrieved on August 20, 2006 from .

Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The Semantic Web. Scientific American, 5, 34.

Blueink (2005). Rapid Application Development. Retrieved on August 26, 2006 from .

Borst, W.N. (1997). Construction of Engineering Ontologies. Centre for Telematics and Information Technology, University of Twente. Enschede, Netherlands.

CASEMaker (2000). What is Rapid Application Development (RAD)? Retrieved on August 26, 2006 from .

Ceri, S., Fraternali, P., and Bongio, A. Web Modeling Language (WebML): a modeling language for designing Web sites. Retrieved on June 20, 2006 from .

Creative Commons. Navigation models. Retrieved on June 20, 2006 from .

DAML. DARPA Agent Markup Language (DAML). Retrieved on July 11, 2006 from .

DAML+OIL. Retrieved on July 11, 2006 from .

Davies, J., Fensel, D., and Harmelen, F. (2002). Towards the Semantic Web: Ontology-driven Approach for Knowledge Management. West Sussex, England: John Wiley & Sons.

Doss, G. (2002a). Designing Effective Web Navigation. Retrieved on June 20, 2006 from .

Doss, G. (2002b). Information Architecture Deliverables. Retrieved on June 20, 2006 from .

Duyne, D., Landay, J., and Hong, J. (2002). The Design of Sites: Patterns, Principles, and Processes for Crafting a Customer-Centered Web Experience. Boston, MA: Pearson Education.

Evernden, R. and Evernden, E. (2003). Third-Generation Information Architecture. Communications of the ACM, 5, 95.

Farnum, C. (2002). Information architecture: Five things information managers need to know. The Information Management Journal, 5, 33-40.

Fraternali, P. (1999). Tools and approaches for developing data-intensive web applications: a survey. ACM Computing Surveys, 3(31).

Garrett, J. (2006). A visual vocabulary for describing information architecture and interaction design. Retrieved on June 22, 2006 from .

Garshol, L.M. (2004). Metadata? Thesauri? Taxonomies? Topic Maps!: Making sense of it all. Retrieved on June 30, 2006 from .

Garzotto, F., Paolini, P., and Schwabe, D. (1993). HDM – A model-based approach to hypertext application design. ACM Transactions on Information Systems, 1, 1-26.

Gomez-Perez, A., Fernandez-Lopez, M., and Corcho, O. (2004). Ontological engineering: with examples from areas of knowledge management, e-commerce and the semantic web. London: Springer.

Gruber, T. R. (1993). Toward Principles for the Design of Ontologies Used for Knowledge Sharing, In Formal Ontology in Conceptual Analysis and Knowledge Representation, edited by Nicola Guarino and Roberto Poli, Kluwer Academic Publishers, in press.

Gruber, T. R. (1993b). Toward Principles for the Design of Ontologies Used for Knowledge Sharing, In Formal Ontology in Conceptual Analysis and Knowledge Representation, edited by Nicola Guarino and Roberto Poli, Kluwer Academic Publishers, in press.

Gruber, T.R. (1993a). A translation approach to portable ontology specification. Knowledge Acquisition, 5(2), 199-220.

Hoffer, J., George, J., and Valacich, J. (2005). Modern Systems Analysis and Design. 4th ed. Upper Saddle River, NJ: Pearson Prentice Hall.

Hong, S. and Kim, J. (2004). Architectural criteria for website evaluation – conceptual framework and empirical validation. Behaviour & Information Technology, 23(5), 337-357.

Isakowitz, T. (1997). What is RMM? Retrieved on June 22, 2006 from .

Isakowitz, T., Stohr, E., and Balasubramanian, P. (1995). RMM: A methodology for structured hypermedia design. Communications of the ACM, 8, 34-44.

Jin, Yu., Decker, S., and Wiederhold, G. (2001). OntoWebber: Model-driven ontology-based web site management. In proceedings of the first international semantic web working symposium (SWWS'01), Stanford University, Stanford, CA, 29 July-1 August.

Jin, Yu., Sichun, X., Decker, S., and Wiederhold, G. (2002). OntoWebber: A novel approach for managing data on the web. 18th International Conference on Data Engineering (ICDE'02).

KMC (2004). Faceted classification of information. The Knowledge Management Connections. Retrieved on July 3, 2006 from .

Lei, Yu., Motta, E., and Domingue, J. (2005). OntoWeaver: an ontology-based approach to the design of data-intensive web sites. Journal of Web Engineering, 4(3), 244-262.

Macdonald, N. (2003). What is web design? Singapore: RotoVision.

Mizoguchi, R., Vanwelkenhuysen, J., and Ikeda, M. (1995). Task ontology for reuse of problem solving knowledge. In: Mars, N. (ed) Towards very large knowledge bases: knowledge building and knowledge sharing (KBKS’95). University of Twente, Enschede, Netherlands: IOS Press.

Nanard, J. and Nanard, M. (1995). Hypertext Design Environments and the Hypertext Design Process. Communications of the ACM, 8, 49.

Nielsen, J. (2000). Is navigation useful? Retrieved on June 20, 2006 from .

Netcraft (2006). Netcraft: August 2006 Web Server Survey. Retrieved on August 20, 2006 from .

Nielsen, J. and Molich, R. (1990). Heuristic Evaluation of User Interfaces. CHI’90 Proceedings.

OIL. Description of OIL. Retrieved on July 11, 2006 from .

Onto. Ontology (Computer Science). Retrieved on July 11, 2006 from .

OWL. Web Ontology Language: Overview. Retrieved on July 11, 2006 from .

Pidcock, W. (2003). What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model? Retrieved on June 30, 2006 from .

PROTON. PROTON Ontologies website. Retrieved on July 4, 2007 from .

Rosenfeld, L. and Morville, P. (1997). Information architecture for the World Wide Web. Sebastopol, CA: O’Reilly & Associates.

Schwabe, D., and Rossi, G. (1995). The Object-Oriented Hypermedia Design Model. Communications of the ACM, 8, 45.

Shedroff, N. (2001). Experience design 1. Indianapolis, USA: New Riders.

Shneiderman, B. and Plaisant, C. (2005). Designing the User Interface. 4th ed. College Park, USA: Pearson Education.

SLAC (2006). The Early World Wide Web at SLAC: Documentation of the Early Web at SLAC (1991-1994). Retrieved on August 20, 2006 from .

Studer, R., Benjamins, V.R., and Fensel, D. (1998). Knowledge Engineering: Principles and Methods. Data & Knowledge Engineering, 25(1-2), 161-197.

Tang, L., Li, H., Qiu, B., Li, M., Wang, J., Wang, L., Zhou, B., et al. (2006). WISE: A Prototype for Ontology Driven Development of Web Information Systems. Lecture Notes in Computer Science, Volume 3841, 1163-1167.

Timberlake, S. (2001). The basics of navigation. Retrieved on June 15, 2006 from .

van Heijst, G., Schreiber, A.T., and Wielinga, B.J. (1997). Using explicit ontologies in KBS development. International Journal of Human-Computer Studies, 45, 183-292.

Weibel, S., Kunze, J., Lagoze, C., and Wolf, M. (1998). RFC 2413: Dublin Core Metadata for Resource Discovery. Retrieved on July 3, 2006 from .

White, M. (2004). Information architecture. The Electronic Library, 5, 218-219.