Catalog Converter

Irina Danilova

A Thesis in the Field of Information Technology for the Degree of Master of Liberal Arts in Extension Studies

Harvard University

June 2009

Abstract

The goal of this thesis is to create an easy-to-use tool for converting data from an online catalog of a biotechnology company into a customizable website that can be run locally and can be distributed on CD.

The creation of such a converter is dictated by two of the company’s needs. First, there is a need to have a backup of the online catalog accessible through means other than contacting the Internet service provider. Second, there are often requests from distributors to provide them with a version of the catalog that does not include prices, shows no logotype, or is otherwise different from the existing version. Given the cost of printing several versions of the same material, the choice of alternative media arises.

The Catalog Converter was created using a variety of tools. The catalog’s data was assembled, cleaned, and sorted in Microsoft Excel. The layout, originally created by a professional graphic designer, was first translated into HTML code with Adobe Dreamweaver. Afterwards, a similar design was achieved by modifying Apache Forrest’s (http://forrest.apache.org/) default template. Finally, the XML templates for webpage creation were generated using the Java programming language.

Acknowledgments

First of all, I would like to thank my advisor, David P. Heitmeyer, for his invaluable support, helpful suggestions, and wise advice during the creation of Catalog Converter, and especially during the most challenging first steps of choosing effective tools with which to work. I’m also deeply grateful for his teaching of two classes I took previously at Harvard Extension School. The skills in HTML and XML I gained in these courses became very useful at the thesis stage.

I would also like to thank the Master of Liberal Arts in Information Technology Thesis Advisors, Drs. Bill Robinson and Jeff Parker, and Assistant Director Stephen Blinn. All of them were extremely helpful and patient in answering my numerous questions regarding the administrative aspects of the thesis project. The workshop for thesis writers they organized gave a great start to planning the wrap-up of the process.

My gratitude goes to a wonderful graphic designer, Alexandra Tayts in San Francisco, who created a simple but elegant graphic design for this project.

And of course, a very special thank you goes to my mother, Nadia, who was constantly reminding me that I have to work on the thesis as hard and as much as possible.

Table of Contents

List of Figures
Chapter 1 Introduction
Chapter 2 Technology Used and Credits
Chapter 3 Catalog Design Stages
Chapter 4 Catalog Converter Implementation
    Microsoft Excel Stage
    Java Stage
        Data Parsing and Validation
        Creation of the File Structure
        Template Processing
    Apache Forrest Stage
        What is Apache Forrest?
        Why an Open Source Content Management System?
        Why Apache Forrest and not Programming Language?
        Behind the scenes
        Skin Configuration, Shell Elements
        Skin Configuration, Color Scheme and Cascading Style Sheets
        PDF
        Menu Creation
        Static pages
        Template Creation
        Build
Chapter 5 Testing
Chapter 6 Error Handling
Chapter 7 User Manual
    1. Tools and Prerequisites
    2. Apache Forrest Installation and Configuration
    3. Data Preparation
    4. Apache Forrest Skin Configuration
    5. Running Catalog Converter
Chapter 8 Future Development and Improvements
Chapter 9 Summary and Conclusions
References
Appendix 1 Application Code
    Java Code
        FileManager.java
        ParsingManager.java
        Product.java
        ProductRecorder.java
        ProductTemplateManager.java
    Compilation files
        Compile.bat
    Configuration files
        files_config.txt
    Fragments of the modified Apache Forrest files
        Skinconf.xml, skin colors and extra css parts only
        Skinconf.xml, extra css
    Templates for XML Templates
        directories_template.xml
        index_template.xml
        product_template.xml
        site_template.xml
    XML Templates
        index.xml (List of directories)
        HCV category index.xml
        101a.xml (example of the final XML product file)
        site.xml (modified site.xml file, part of Apache Forrest skin)
    Sample xml static page
        company/news.xml
    Sample log.txt entry
    Snippet of tab-delimited data file
    Testing data files
        test1.txt
        test2.txt
        test3.txt
        test4.txt
        test5.txt

List of Figures

Figure 1 Online catalog page
Figure 2 Page from the printed catalog
Figure 3 Redesigned website
Figure 4 Apache Forrest home page
Figure 5 Apache Forrest generated page
Figure 6 Components of Catalog Converter
Figure 7 Catalog Converter classes diagram
Figure 8 Interaction of Catalog Converter Managers
Figure 9 ParsingManager class
Figure 10 Parsing Stage Data Flow
Figure 11 FileManager class
Figure 12 FileManager operations
Figure 13 ProductTemplateManager class
Figure 14 ProductTemplateManager operations
Figure 15 FireBug analyses the hierarchy of CSS
Figure 16 About Us webpage
Figure 17 About Us file generated from the same XML files as About Us webpage
Figure 18 Unsupported attribute for the image element error
Figure 19 Successful validation of the site XML content
Figure 20 Sample run of the ProductRecorder
Figure 21 arrow_bckgs.psd
Figure 22 next_previous.psd
Figure 23 Catalogconverter directory
Figure 24 xdocs level of the catalog directory
Figure 25 Successful build
Figure 26 Invisible character ruins the build
Figure 27 The pdf document is not found
Figure 28 Another indication that something (image) is missing
Figure 29 Images directory

Chapter 1 Introduction

Catalog Converter is a tool for solving the problem of converting product data initially stored in a Microsoft Excel file into a collection of webpages ready to run locally or be published on a CD.

The main goal of this thesis is to create a website as close in design to the existing printed and online versions as possible, so as to give customers a feeling of familiarity and consistency. This is the main reason why commercially available products, such as MyBusinessCatalog or FlipAlbum, are not good enough. Other reasons for avoiding existing tools are maintaining the accuracy of the data and shortening the processing time of the conversion. All information about the company’s products would be requested from the web service provider and sent in spreadsheet format. Instead of reentering data in the format required by the existing tools and risking many errors, one may simply add some additional data without changing the already proven material and work with it directly.

The second goal, which emerged during the thesis process, is to use an existing open-source content management system (CMS). Utilizing a modular CMS like Apache Forrest has become an important skill in modern web technology. OpenSourceCMS (http://php.opensourcecms.com/) lists the commercial and free systems available, including wikis, the tools used for creating Wikipedia (http://www.wikipedia.org/), a well-known online encyclopedia. The Catalog Converter project both utilizes the power of the Apache Forrest engine and, more importantly, demonstrates a possible approach to preparing large amounts of text data for an XML-based content management system that generates HTML content. This integration of constantly developing technology with the automated filling of XML templates with validated data is what makes Catalog Converter a useful tool that may be further developed.

Chapter 2 Technology Used and Credits

All credit for the design of both the printed catalog and the CD-based version goes to Alexandra Tayts (San Francisco, CA).

There are several tools the author worked with during the creation of the Catalog Converter.

1. Apache Forrest

Apache Forrest was used for building the entire website based on the XML files generated by running the program written in Java.

2. Adobe Creative Suite 3 (CS3)

All the navigation elements except the transparent arrow and the company’s logo were created in Adobe Photoshop. The images used on static pages were bought at Fotolia.com. The prototype layout for the webpages was done in Adobe Dreamweaver. The icons used for tables and diagrams are free images found with Google (http://www.google.com).

3. FireBug by Mozilla

This extremely useful tool was used to work out the hierarchy of cascading style sheets in Apache Forrest’s default skin. This knowledge was later used to override the styles provided in skinconf.xml, the skin’s configuration file.

4. Java programming language

The choice of programming language was mostly based on the author’s desire to use this powerful language and to build something practical with it.

All the work on Catalog Converter was done in a Windows XP Professional environment. The Java portion (jre1.6.0_05) and the Apache Forrest (version 0.8) build process were tested on the author’s home and work desktops as well as on the Windows stations at the Harvard Science Center computer lab. The final website was checked with the latest versions of the Mozilla Firefox, Internet Explorer, and Netscape Navigator browsers.

Chapter 3 Catalog Design Stages

As was briefly mentioned in the introduction, the design of the catalog for distribution on CD must combine features from both the existing online version and the printed one.

Figure 1 represents an actual page of the company’s current online catalog (http://www.virogen.com) that was designed and is maintained by the company in California. The company’s products are grouped into categories and are divided further into subcategories. In some cases there is a link to the product’s specification datasheet and a thumbnail of the image that illustrates the results a customer may achieve when using this product. The thumbnail serves as a link to a full-size image that opens in a separate window when clicked. In order to make any content edits to the site, the user has to open the management section of the site and use several text fields and drop-down menus. There is no option for batch processing (for example, the price cannot be updated on a group of products with a single operation), so every time global changes are needed a request must be sent to the support team.

Figure 1 Online catalog page

Figure 2 Page from the printed catalog

Figure 2 shows the printed catalog. It has a clear and simple design. Each category is marked with a unique color. Since the same colors are used in the table of contents, it is easy to find the information the customer needs. There is no division into subcategories. All images and tables share a unifying design.

Figure 3 Redesigned website

Figure 3 shows the prototype of the CD version. It features the navigation familiar to the customer and is simpler to use, as well. The main menu is moved to the top of the page, leaving more space for content. As in the printed version, all categories are marked with their distinct color. This version therefore combines elements of both the printed and online catalogs.

Figure 4 shows the home page of the Apache Forrest website, which uses the default ‘pelt’ skin. The ‘pelt’ skin was customized to reflect the simple and clean layout of the graphic designer’s concept for use in the development of Catalog Converter.

Figure 4 Apache Forrest home page

Figure 5 shows the final page layout. Forrest’s default ‘pelt’ skin was used and customized. The colors, font families, and properties of some elements were adjusted. For instance, the width of the tables is now fixed. The new look of the page matches both the original skin and the graphic designer’s design closely, although not completely. The ‘breadcrumbs’ feature (tracking of the user’s location) and the search box are removed because Apache Forrest supports them only for sites that run from a server and not locally, as intended for this software. Products are divided only into categories, as in the printed version, and not into subcategories, since Apache Forrest does not support a second level of nesting for its menus. All the static pages (‘contact’, ‘news’, etc.), as well as the template table for the product pages, are first created in Adobe Dreamweaver. The HTML markup from the design is used to create XML-based templates for Forrest.

Templates are validated via the Apache Forrest validation process. Apache Forrest uses the “xdocs document v2.0” format, based on a subset of XHTML, and checks the validity of all elements in the content-related ‘documentation’ XML files against its Document Type Definitions (DTD). It also uses RELAX NG (RNG), a schema language for XML, to validate site.xml and other configuration files.

Figure 5 Apache Forrest generated page

Chapter 4 Catalog Converter Implementation

Figure 6 shows the three main stages involved in the Catalog Converter process.

Figure 6 Components of Catalog Converter

These steps are discussed in more detail below.

Microsoft Excel Stage

The first stage is the data preparation stage. All the information about the company’s products is stored in the Catalog.xls file. This file is an extended version of the spreadsheet that the company’s webhosting provider sends to its clients when they request a backup. This Excel stage is very important and requires close attention from the person preparing the data.

First, the data should be factually correct from the beginning, since it is easier for a reviewer to find errors in a single document than after the data has been published across the many pages of the site. Second, the data should be complete. Every field for every product should have some value. If there is no data, then ‘N/A’ (Not Applicable) must be used. The category and catalog number fields, however, must have valid entries, because both are required when the file structure is created. Any incomplete records will be ignored during the data validation phase of the process.

Fields containing data used for building links (file and directory names) must not contain the ‘/’ character, which the browser interprets as part of the path; the only exception is the ‘N/A’ value used when there is no information. This requirement is emphasized in the User Manual (see Chapter 7). Also, there should be no extra white space at the end of any data field. In the author’s experience, it may be translated into an invalid character upon conversion from the Excel file to the plain text format and create problems at the final build step. Finally, there should ideally be no records with the same catalog number; such duplicates will be rejected by the program.

After the data is verified, the product information is saved in a tab-delimited text file format. The author chose this format because there are commas and spaces in the data, so the comma- and space-delimited formats are not appropriate. In addition, entries containing commas and spaces may acquire double quotes upon conversion to the text format. These double quotes should be removed before proceeding, as they are artifacts of the conversion process, and the browser interprets double quotes as part of a link or a file name. The author decided to leave their removal as a preparation step and not a task of the program. Removing all double quotes automatically would be unwise, since some of them may be inserted intentionally, as part of a brand name or quotation, for example. Also, for the first version, the author left it to the person preparing the data to sort catalog items as desired. Categories and products will appear in the menus and lists in the order they appear in the data file.
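The preparation rules above lend themselves to a quick mechanical check before the file is handed to the program. The following is only a minimal sketch, assuming Java 8 or later; the class DataFieldCheck and its method are hypothetical and are not part of the Catalog Converter source. It reports empty fields, leading or trailing white space, and double quotes, and leaves the decision about each quote to the person preparing the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Catalog Converter source.
final class DataFieldCheck {

    // Returns human-readable problems found in one tab-delimited record.
    static List<String> problems(String line) {
        List<String> found = new ArrayList<>();
        // limit -1 keeps trailing empty fields so they can be reported
        String[] fields = line.split("\t", -1);
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i];
            if (f.isEmpty()) {
                found.add("field " + i + ": empty (use N/A)");
            }
            if (!f.equals(f.trim())) {
                found.add("field " + i + ": leading or trailing white space");
            }
            if (f.indexOf('"') >= 0) {
                found.add("field " + i + ": double quote, check whether it is intentional");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String p : problems("131-A\t\"ELISA, WB\"\tWet Ice ")) {
            System.out.println(p);
        }
    }
}
```

The check only reports; it does not strip quotes automatically, in line with the reasoning given above.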

Java Stage

At this stage, the XML files that will serve as the base for future webpages are created. They follow the layout that was designed with Adobe Dreamweaver (see more in the Apache Forrest Stage section). The importance of this stage cannot be overestimated. It is the basis for creating a verified corpus of data for the company’s whole catalog, which today includes over two hundred items and is still growing.

The input for this stage is a tab-delimited file created during the Excel Stage. The process of product data transformation is described below.

There are several sub-stages involved in this process:

- Data parsing and validation;
- Creation of the file structure;
- Template processing.

These tasks are performed by three separate “manager” classes: ParsingManager, FileManager, and ProductTemplateManager, called in sequence from the main() method of the ProductRecorder class (Figure 7).

Figure 7 Catalog Converter classes diagram (Product, ProductRecorder, ParsingManager, FileManager, ProductTemplateManager)

The Product class defines the company’s products. Its members are the product properties that are essential for a potential buyer to know:

- catalog number of the reagent (“131-A”);
- its name (“anti-RhoD human antigen IgG1”);
- the category it belongs to (“Blood Group Antibodies”);
- its type (“mAb” - monoclonal antibody);
- the volume per vial (“100 ug/vial”);
- temperature for short and long storage (“Long Term: -80C; Short Term: 4C”);
- price per unit (“210”);
- shipping conditions (“Wet Ice”);
- applications it is suited for (“ELISA, WB, IHC, FC, IP”);
- the biological targets it recognizes (“N/A”, Not Applicable);
- the subtype it belongs to (“AFR1”).

The Product class also stores the names of the specification sheets and images used to illustrate results of possible experiments either in vitro or in vivo. The only functions of this class are set() and get() methods for storing and retrieving the data.

The ProductRecorder class consists of a single method, the main() method. This method creates new instances of the manager classes, and each in turn does its part. The roles of the managers (parsing, creating the file hierarchy, and filling the templates with data) are clearly separate. The way the classes relate to each other is illustrated in the following diagram (Figure 8).

Figure 8 Interaction of Catalog Converter Managers

There is no interaction in the form of calling instances of one class from the methods of another, but each class uses the material created or modified by its predecessor. FileManager needs the product data cleaned and packed in the form of a HashMap collection in order to build the file system and to create various blank files. ProductTemplateManager in turn uses the newly created blank files, along with the HashMap collection, to generate the XML base for the website build with Apache Forrest. During program execution, the user receives information about important events and any problems that arise via messages in the command prompt window, via notes in the log file, or both.
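The hand-off between the stages can be pictured with a short sketch. It is an illustration only, assuming Java 8 or later: the three static methods merely stand in for ParsingManager, FileManager, and ProductTemplateManager, their bodies are placeholders, the key names CATALOG_NUMBER and CATEGORY are hypothetical, and a List of Maps is used where the thesis code passes a HashMap array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the three managers; bodies are placeholders.
final class PipelineSketch {

    // Stage 1: stands in for ParsingManager and returns cleaned records.
    static List<Map<String, String>> parse() {
        Map<String, String> product = new HashMap<>();
        product.put("CATALOG_NUMBER", "131-A");
        product.put("CATEGORY", "Blood Group Antibodies");
        List<Map<String, String>> records = new ArrayList<>();
        records.add(product);
        return records;
    }

    // Stage 2: stands in for FileManager and names the blank files it would create.
    static List<String> planFiles(List<Map<String, String>> records) {
        List<String> files = new ArrayList<>();
        for (Map<String, String> r : records) {
            files.add(r.get("CATEGORY") + "/" + r.get("CATALOG_NUMBER") + ".xml");
        }
        return files;
    }

    // Stage 3: stands in for ProductTemplateManager and counts file/record pairs to fill.
    static int fill(List<String> files, List<Map<String, String>> records) {
        return Math.min(files.size(), records.size());
    }

    public static void main(String[] args) {
        // Each stage consumes only what its predecessor produced.
        List<Map<String, String>> records = parse();
        List<String> files = planFiles(records);
        System.out.println(fill(files, records) + " file(s) ready: " + files);
    }
}
```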

Data Parsing and Validation

This part is managed by the ParsingManager class (Figure 9).

ParsingManager
Fields:
    FieldsToParse : String
    productCatalogNumber : String
    ProductName : String
    …
    hashMapCollection : HashMap []
Methods:
    ParseCatalogData () : HashMap []
    GetDataFileName : void
    ReadDataFile : void
    ParseTheSourceFile (String currentLine) : void

Figure 9 ParsingManager class

Figure 10 Parsing Stage Data Flow

Figure 10 illustrates data flow during the operation of ParsingManager. An instance of the ParsingManager class is created by ProductRecorder, and its task is to parse the catalog data and to return the full data structure. What the ParsingManager class does can be summarized by its core method, parseCatalogData().

    // parseCatalogData()
    // Performs the sequence of actions needed to parse the text file
    // with the products info and to populate the data structure
    // that will be further used to create the xml files for the html
    // and pdf files creation.
    public HashMap [] parseCatalogData(){
        getDataFileName();
        readDataFile();
        hashMapCollection = createHashMapCollection();
        hashMapCollection = validateData();
        cleanup();
        return hashMapCollection;
    }

Put into words, ParsingManager:

- Opens the configuration file and reads in the names of the data file and the log file;
- Opens the data and log files;
- Checks for the presence of all required fields;
- Creates an array of HashMap objects;
- Validates each HashMap object;
- Informs the user if any records were rejected (log.txt);
- Closes the data and log files;
- Returns an array of HashMaps.

The configFiles variable, which holds the path to the configuration file (config/files_config.txt), is hard-coded among the members of the class. This file contains references to two other file locations: one for the tab-delimited text data file and one for the log file that keeps track of miscellaneous events occurring during the data file parsing. If no log file is present at the specified location, it is created at this point.

The number of fields in the data file, along with their names, is checked against the hard-coded list. The names of the fields present in the data file are extracted from the first line that is neither blank nor starts with the comment sign ‘#’. If all the required fields are present (and there are no extra ones), a note is made in the log file:

    log.println ("All required fields are present.");
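The header check described above can be sketched as follows. This is not the ParsingManager code: the class HeaderCheck, the three field names in REQUIRED, and the choice to accept the fields in any order are all assumptions made for illustration, and Java 8 or later is assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the header check; not the ParsingManager source.
final class HeaderCheck {

    // Hypothetical required fields; the real list is hard-coded in ParsingManager.
    static final List<String> REQUIRED = Arrays.asList("CATALOG_NUMBER", "NAME", "CATEGORY");

    // Returns the first line that is neither blank nor a '#' comment, or null if none exists.
    static String headerLine(List<String> lines) {
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("#")) {
                return line;
            }
        }
        return null;
    }

    // True when the header holds exactly the required names (order is ignored here).
    static boolean hasRequiredFields(String header) {
        if (header == null) {
            return false;
        }
        List<String> names = Arrays.asList(header.trim().split("\t"));
        return names.size() == REQUIRED.size() && names.containsAll(REQUIRED);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# products", "", "CATEGORY\tCATALOG_NUMBER\tNAME");
        System.out.println(hasRequiredFields(headerLine(lines)));
    }
}
```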

The records are then tokenized and stored in the specially designated dataFieldsFromDataFile array, which is later used for creating the HashMaps. Along the way, ParsingManager keeps track of the lines processed.

Let’s assume that something unexpected happens and the data file is either empty or contains nothing but the line with the data fields and no actual data. The createHashMapCollection() method handles this situation by checking the number of lines that were processed. If the number of lines processed is zero (which means an empty data file) or there is only one non-blank line with no ‘#’ at the beginning (only the list of fields is present), an error message is displayed in the command prompt window and the program exits.

    log.println("Number of HashMaps " + (lineNumber-1));
    if (lineNumber-1 <=1) {
        System.out.println("No data in the HashMap Collection. Please check the data file and try again.");
        System.exit(1);
    }

If at least one valid product record was found, a HashMap (a collection of key-value pairs) is made for each entry. All created HashMaps are then placed into a one-dimensional array.
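How one record turns into a HashMap can be shown with a small sketch. It assumes Java 8 or later and that the field names come from the header line in the same order as the values; the class RecordMapper and the sample field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the createHashMapCollection() source.
final class RecordMapper {

    // Pairs each field name with the value at the same position in the record.
    static Map<String, String> toMap(String[] fieldNames, String recordLine) {
        // limit -1 keeps trailing empty values so the field count stays honest
        String[] values = recordLine.split("\t", -1);
        if (values.length != fieldNames.length) {
            throw new IllegalArgumentException(
                    "expected " + fieldNames.length + " fields, found " + values.length);
        }
        Map<String, String> record = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            record.put(fieldNames[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        String[] names = {"CATALOG_NUMBER", "PRICE", "SHIPPING"};
        System.out.println(toMap(names, "131-A\t210\tWet Ice").get("SHIPPING"));
    }
}
```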

Next, ParsingManager runs through each HashMap and validates the data based on several criteria. Entries that have missing information (blank fields) are rejected, removed from the HashMap array, and listed in the log.txt file.

    log.println("N/A or '/' in the product number, category name or the name of the image/specsheet in the product: " + productCatalogNumber + " . The record is rejected.");

Also rejected are lines with the ‘/’ character in the data used for creating navigation or links to files (names of files and directories). The only exception is made for fields containing ‘N/A’, provided they are not the category or catalog number fields.
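The rejection rules above can be condensed into one predicate. The sketch below follows the prose description rather than the actual validateData() code, which is not shown here; the class RecordValidator and the key names CATALOG_NUMBER, CATEGORY, IMAGE, and SPEC_SHEET are hypothetical, and Java 8 or later is assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the validation rules; not the validateData() source.
final class RecordValidator {

    // Hypothetical key names for fields used to build paths and links.
    static final String[] PATH_FIELDS = {"CATALOG_NUMBER", "CATEGORY", "IMAGE", "SPEC_SHEET"};
    static final String[] MANDATORY = {"CATALOG_NUMBER", "CATEGORY"};

    static boolean isValid(Map<String, String> record) {
        // A blank or missing value rejects the record.
        for (String value : record.values()) {
            if (value == null || value.trim().isEmpty()) {
                return false;
            }
        }
        // Mandatory fields must be present and may not be N/A.
        for (String key : MANDATORY) {
            String v = record.get(key);
            if (v == null || v.equals("N/A")) {
                return false;
            }
        }
        // Path-building fields may not contain '/', except for the literal N/A.
        for (String key : PATH_FIELDS) {
            String v = record.get(key);
            if (v != null && !v.equals("N/A") && v.contains("/")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> r = new HashMap<>();
        r.put("CATALOG_NUMBER", "131-A");
        r.put("CATEGORY", "Blood Group Antibodies");
        r.put("IMAGE", "N/A");
        System.out.println(isValid(r));
    }
}
```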

ParsingManager also keeps track of all the catalog numbers it has processed in a specially designated array. It compares the catalog number of each new record to the catalog numbers of previous records. If a duplicate is discovered, the later record is rejected and an error message is written both to the log file and the console.

    log.println("Duplicated product number: " + productCatalogNumber + " . The record was rejected.");

The duplicate record is removed from the HashMap array. All rejected records are listed in the log file.
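The same first-record-wins rule can be expressed compactly with a HashSet. Note that this is a substitution made for illustration: the thesis code compares against an array of previously seen catalog numbers, whereas the hypothetical DuplicateFilter below relies on Set.add() returning false for a value that is already present. Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the thesis code uses an array rather than a Set.
final class DuplicateFilter {

    // Keeps the first record seen for each catalog number; later repeats go to `rejected`.
    static List<String> keepFirst(List<String> catalogNumbers, List<String> rejected) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String number : catalogNumbers) {
            if (seen.add(number)) {   // add() returns false when the value is already present
                kept.add(number);
            } else {
                rejected.add(number);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rejected = new ArrayList<>();
        List<String> kept = keepFirst(Arrays.asList("101-A", "131-A", "101-A"), rejected);
        System.out.println("kept " + kept + ", rejected " + rejected);
    }
}
```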

The author chose HashMaps to store product information because the key-value mapping that HashMap implements is the most logical way to approach the catalog entries, since this is how the data is organized in the source Excel file (field name and field value for every property of the product). HashMap does not provide sorting (see http://java.sun.com for documentation on HashMaps), but that is not critical. The user may want categories to appear in some particular order, not necessarily alphabetical. The initial data is stored in the Excel file, and it is simple to sort it at that stage of the process in any convenient way. Also, at no point during program execution do multiple threads access this collection, so the fact that HashMap is not a synchronized implementation does not affect the process.

After all of the product data is stored in the HashMap array, both the data file and the log file are closed and the collection is returned to ProductRecorder.

While straightforward in the tasks it performs, ParsingManager is key in preparation for the next stage of the process.

Creation of the File Structure

After the data is validated and stored in the data structure, the next step is to create all the directories (folders for product categories), their indexes, and the product files. This is the main task of the FileManager class (Figure 11).

    // Create an index.xml file for each directory-category
    // and an xml file for each product.
    public void makeFileStructure(){

        makeProductFiles();
        makeCategoriesIndex();
        makeIndexFiles();
        makeSiteXML();

    }

The function names speak for themselves. FileManager (Figure 12):

- Creates a blank file named “’product catalog number’.xml” for every product in the HashMap array received from the ProductRecorder;
- Creates a blank index file for each product category;
- Creates a blank site.xml file.

FileManager
Fields:
    hashMapCollection : HashMap []
    nextKeyValuePair : String [][]
    partialPath : String
    fullCategoryPath : String
    siteXmlFile : String
    …
Methods:
    makeProductFiles () : void
    makeCategoriesIndex () : void
    makeIndexFiles () : void
    makeSiteXML () : void

Figure 11 FileManager class

Figure 12 FileManager operations

All the files created at this stage are blank text files with the “.xml” extension. They will later be filled with the content of the appropriate templates and the actual information about catalog items. FileManager starts by iterating through every HashMap in the array received from ProductRecorder and extracting the category name and catalog number for each product. It then uses this information to create a new blank file at the appropriate location in the “Products” folder. If there is any error during the process, it is displayed in the Command Prompt window. If the file already exists, the record is skipped.

After this, a blank index file is created at the top level of the “Products” directory. It will be used later for listing all the categories. For the product listings, index files are created in each category’s folder.

The last step is the creation of the blank site.xml file. Apache Forrest later uses it to create the index of categories in the “Products” menu item.
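The create-or-skip behavior for product files can be sketched as follows. This is not the FileManager code: the class BlankFileMaker is hypothetical, and the sketch uses the java.nio.file API (available from Java 7), which is newer than the Java 6 runtime the project was built on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch using java.nio.file; not the FileManager source.
final class BlankFileMaker {

    // Creates <root>/<category>/<catalogNumber>.xml as an empty file.
    // Returns false and leaves the file untouched when it already exists.
    static boolean makeProductFile(Path root, String category, String catalogNumber)
            throws IOException {
        Path dir = root.resolve(category);
        Files.createDirectories(dir);   // does nothing if the directory already exists
        Path file = dir.resolve(catalogNumber + ".xml");
        if (Files.exists(file)) {
            return false;
        }
        Files.createFile(file);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the "Products" folder.
        Path root = Files.createTempDirectory("Products");
        System.out.println(makeProductFile(root, "Blood Group Antibodies", "131-A"));
        System.out.println(makeProductFile(root, "Blood Group Antibodies", "131-A"));
    }
}
```

Because the category name becomes a directory name, a ‘/’ inside it would silently create an extra directory level, which is one reason the validation stage rejects such records.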

Template Processing

The most elaborate part of the process is filling the blank files with the actual content. This task is the responsibility of the ProductTemplateManager class (Figures 13, 14).

ProductTemplateManager
Fields:
    categoriesArray : String []
    productsArray : String []
    previousCategory : String
    nextCategory : String
    colorScheme : int []
    …
Methods:
    fillTemplates () : void
    createColorSchemes () : void
    createIndexOfCategories () : void

Figure 13 ProductTemplateManager class

Figure 14 ProductTemplateManager operations

The flow of events is reflected in the fillTemplates() method.

    // fillTemplates()
    // Copies content of the template file into the product file.
    public void fillTemplates(){

        // Create arrays of categories and product names
        // to be used for navigating
        fillCategoriesProductArray();

        // Create an array of the colors
        // to mark different categories
        colorScheme = createColorSchemes();

        // Make a directories listing
        createIndexOfCategories();

        // Create navigation (fill site.xml)
        createSiteXml();

        // Fill in product templates
        populateFiles();

        // Create category's index files
        for (int i = 0; i < catCount; i ++ ){
            productCategory = categoriesArray[i];
            fullCategoryPath = "" + partialPath + "/" + categoriesArray[i] + "/" + "index.xml";
            createIndexFiles(fullCategoryPath);
        }
    }

In order to fill the templates, some information in addition to the data about an individual product is needed. For example, in order to build an index of categories and to fill the list of menu items it is necessary to know the total number of categories. The total number of products in each category is needed to build the index for a particular category. In order to build navigation properly, one needs to know the previous and next category or product on the list. The author implements the following solution.

First, two arrays are created while iterating through all the HashMaps. The first array stores the names of categories and the second one contains product numbers for each category.
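The bookkeeping needed for indexes and previous/next navigation can be illustrated with a short sketch. It is a substitution, not the author’s implementation: instead of two parallel arrays, the hypothetical CategoryIndex class below uses a single LinkedHashMap, which keeps categories in the order they first appear in the data file. Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the thesis code keeps two arrays instead of one map.
final class CategoryIndex {

    // Groups catalog numbers by category, preserving data-file order.
    // Each input row is {category, catalogNumber}.
    static Map<String, List<String>> group(String[][] rows) {
        Map<String, List<String>> byCategory = new LinkedHashMap<>();
        for (String[] row : rows) {
            byCategory.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return byCategory;
    }

    // Returns the neighbour of items.get(i) at the given offset, or null at either end.
    static String neighbour(List<String> items, int i, int offset) {
        int j = i + offset;
        return (j >= 0 && j < items.size()) ? items.get(j) : null;
    }

    public static void main(String[] args) {
        String[][] rows = {{"HCV", "101-A"}, {"HCV", "102-A"}, {"HIV", "201-A"}};
        Map<String, List<String>> index = group(rows);
        System.out.println(index.size() + " categories: " + index);
        System.out.println("next after 101-A: " + neighbour(index.get("HCV"), 0, 1));
    }
}
```

The size of the map gives the total number of categories, the size of each list gives the number of products in a category, and neighbour() supplies the previous and next entries for navigation.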

As mentioned in Chapter 3, the design of the CD-based website requires pages that belong to a particular category to have a unique color for specific page elements, so products in one category are easily distinguished from products in other categories. To achieve this, each category receives a ‘color index’ that corresponds to the index of the category in the array (0, 1, 2…). In the list of styles (Apache Forrest’s skinconf.xml file) there are corresponding styles named “0i_back,” “1i_back,” etc. that define the color of the background elements and the navigation buttons for the particular directory. For now, the system supports up to 20 categories with different color schemes, and the order of colors is fixed. If there are more than 20 categories, the additional ones use a gray background (to be changed in future versions of the Converter).
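The mapping from a category’s position to its style name amounts to a few lines. In the sketch below the “Ni_back” naming and the limit of 20 come from the text above, while the class ColorStyles and the fallback style name gray_back are hypothetical, since the name of the gray style is not given.

```java
// Hypothetical sketch of the color-index rule; not the createColorSchemes() source.
final class ColorStyles {

    static final int MAX_SCHEMES = 20;
    // Hypothetical name for the gray fallback style; the source does not name it.
    static final String FALLBACK = "gray_back";

    // Maps a category's position in the array to its style name, e.g. 0 -> "0i_back".
    static String styleFor(int categoryIndex) {
        if (categoryIndex < 0) {
            throw new IllegalArgumentException("negative index");
        }
        return categoryIndex < MAX_SCHEMES ? categoryIndex + "i_back" : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(styleFor(0));
        System.out.println(styleFor(25));
    }
}
```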

The next step is the transformation of the data for the XML files. Several templates were made based on the prototype HTML files (see Appendix). In the future, the author wants them to follow XML conventions completely and to mark with tags all the places where content is dynamically created, but in this version placeholders like PREVIOUS_CAT, TD_CLASS, etc. are used.

The products are read one by one from the HashMap array. The product data is copied into variables for ease of use. The program then opens both the blank file created earlier and the corresponding template, and starts copying information from the template into the blank file, substituting placeholders with the actual information. Lines where no substitution is needed are simply copied. Lines containing placeholders, which start either with *** or with one of the placeholders, are processed. As already mentioned, this system, although it works for this particular version, is not sufficiently intuitive or effective and will be changed in a future version.
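One possible shape for placeholder substitution is sketched below. It deliberately differs from the scheme described above: rather than relying on lines that start with *** or with a placeholder, the hypothetical TemplateFiller class scans each line for upper-case tokens with a regular expression and replaces those that have a value, copying everything else unchanged. The sample template text is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based sketch; not the ProductTemplateManager source.
final class TemplateFiller {

    // An upper-case word with optional underscores, e.g. CATALOG_NUMBER.
    private static final Pattern TOKEN = Pattern.compile("\\b[A-Z][A-Z_]*\\b");

    // Replaces every token that has a value in `data`; other text is copied unchanged.
    static String fillLine(String templateLine, Map<String, String> data) {
        Matcher m = TOKEN.matcher(templateLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = data.get(m.group());
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the data from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("CATALOG_NUMBER", "131-A");
        data.put("PRICE", "$210");
        System.out.println(fillLine("Cat. No. CATALOG_NUMBER costs PRICE", data));
    }
}
```

A lookup table of this kind keeps the list of placeholders in data rather than in a chain of comparisons, which may make it easier to move toward the tag-based templates planned for a future version.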

One of ProductTemplateManager’s methods, fillTheFields (String currentToken, PrintWriter writer), does the substitution. Below is partial code for this function that illustrates some of the template-filling logic. The “…” marks omitted code similar to the line above.

// fillTheFields (String currentToken)
// @param String currentToken
// Populates the template file with the product data
public void fillTheFields (String currentToken, PrintWriter writer){

    // Product - related tokens
    if (currentToken.equals("CATALOG_NUMBER"))
        writer.print ("" + productCatalogNumber + " ");
    …
    else if (currentToken.equals("PRICE")) {
        if (! productPrice.equals("N/A"))
            writer.print (currency + productPrice);
        else
            writer.print ("N/A");
    }
    …
    // In case of images the path to them is used
    // rather than the original file names
    else if (currentToken.equals("THUMBNAIL_IMG"))
        writer.print ("");
    …
    // Previous and Next products part:
    else if (currentToken.equals("PREVIOUS_PRODUCT")){
        writer.print ("");
        writer.print ("");
    }
    else if (currentToken.equals("ALL_PRODUCTS")) {
        writer.print ("");
        writer.print ("");
    }
    …
    else
        writer.print ("" + currentToken + " ");
}

As can be seen from the code excerpt, when a placeholder stands in for product-related information such as a catalog number or application, it is simply replaced with the value of the corresponding variable. The field holding the price is handled somewhat differently. In the current version, Catalog Converter expects the cost of a product in the Excel file to be numeric, but there is no validation that the price field is indeed a numeric value or a combination of numbers, commas, or periods. The author plans to add a way for a user to specify which format is acceptable so the data in this field is treated accordingly. There are cases when a user (a distributor or the company owner) may want to enter a string instead of a number, such as “Call to discuss the pricing for bulk amount.” The program also adds a currency character (‘$’, since the company is located in the US) in front of the price value. The value of this character is specified among the class’s members. In a future version, the user will be able to specify the currency. Most probably a configuration file will be introduced containing all the variables that are hard-coded in this version, including the list of required data fields, partial paths to files, filenames, etc.
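The price validation planned for a future version could take a shape like the following. This is only a sketch of one possible approach, not part of the current Converter: the class name, the method names, and the exact pattern (digits separated by single commas or periods) are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the price check that the current version lacks.
final class PriceFormatSketch {
    // Digits, optionally separated by single commas or periods, e.g. 1,250.00
    private static final Pattern NUMERIC_PRICE = Pattern.compile("\\d+([.,]\\d+)*");

    static boolean isNumericPrice(String raw) {
        return raw != null && NUMERIC_PRICE.matcher(raw.trim()).matches();
    }

    // The currency sign is added only to numeric prices; free text such as
    // a "call for pricing" note is passed through unchanged.
    static String display(String raw, String currency) {
        if (raw == null || raw.trim().length() == 0) {
            return "N/A";
        }
        String value = raw.trim();
        return isNumericPrice(value) ? currency + value : value;
    }

    public static void main(String[] args) {
        System.out.println(display("1,250.00", "$"));
        System.out.println(display("Call to discuss the pricing for bulk amount.", "$"));
        System.out.println(display("N/A", "$"));
    }
}
```

With a check of this kind, the special case for "N/A" in fillTheFields would no longer be needed, because any non-numeric entry is printed without the currency sign.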

The author’s previous experience in the field of educational multimedia made her a big proponent of following well-thought-out standards from the initial steps of any project, especially ones that involve a large amount of varied content. While designing Catalog Converter, the author introduced a naming convention for the non-XML files that also convey product data, i.e. images, in both small and full versions, and specification sheets in PDF format. A filename is a combination of the slightly modified catalog number and an appropriate extension. For example, the specification sheet for 101-A is named 101a.pdf and the thumbnail and full-size images are both named 101a.jpg (since the small and full-size versions are located in different directories, identical names are acceptable). All the image files were standardized in size in Photoshop. The data coming from the various existing Word documents were also edited by the author, made as uniform as possible, and then saved page by page using Adobe Acrobat.
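The naming convention can be illustrated with a short sketch. The text only says that the catalog number is "slightly modified"; the rule used here (drop every character that is not a letter or digit, then convert to lower case) is an assumption that happens to reproduce the 101-A example, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of deriving asset file names from a catalog number.
final class AssetNameSketch {
    // Assumed rule: keep letters and digits only, in lower case.
    static String baseName(String catalogNumber) {
        return catalogNumber.replaceAll("[^A-Za-z0-9]", "").toLowerCase(Locale.ROOT);
    }

    static String fileName(String catalogNumber, String extension) {
        return baseName(catalogNumber) + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(fileName("101-A", "pdf")); // specification sheet
        System.out.println(fileName("101-A", "jpg")); // thumbnail and full-size image
    }
}
```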

To reference images and pdf files from the webpages, partial paths to the appropriate folders in the Apache Forrest ‘src’ directory are coded as class members.

String datasheetPartialPath = "../../../specsheets/";
String thumbnailPartialPath = "../resources/images/products/thumbnails/";

To reference an asset, the program must combine the proper path with the file name. The value of the product catalog number is used as the “alt” attribute (alternative text displayed if the image does not show up) of the “img” (image) HTML tag. If there is no image or datasheet file associated with the product, the code for its display is not written.

To build the navigation between products and within categories, the program looks at the previous and next elements in the categories array. If the current category is the first element in the array, it means that there is no previous category, and the corresponding button does not display. By analogy, if the category is the last, then there is no next element, and the code for the corresponding button is omitted. In all other cases, the link to the index file is built using the previous and next category names.

writer.print ("");

The same logic is used for building navigation between products in the current category. There is a mapping between the indexes of categories in the categories array and the elements in the two-dimensional array of products. If the current category is the element with index 1, then the list of corresponding products is stored in the [1][] element of the products array.

There is also a button that leads to the list of all products or categories and it is merely a link to the corresponding index file.
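The previous/next logic for both categories and products can be sketched as below. The class and method names and the sample data are hypothetical; the sketch only shows the boundary checks and the shared index between the categories array and the two-dimensional products array, not the markup the Converter writes.

```java
// Hypothetical sketch of finding the neighbors used for navigation buttons.
final class NeighborSketch {
    // Returns the previous element, or null when the current one is the first.
    static String previous(String[] items, int index) {
        return index > 0 ? items[index - 1] : null;
    }

    // Returns the next element, or null when the current one is the last.
    static String next(String[] items, int index) {
        return index < items.length - 1 ? items[index + 1] : null;
    }

    public static void main(String[] args) {
        // Sample data invented for the illustration.
        String[] categories = {"Antibodies", "Antigens"};
        String[][] products = {{"101-A", "102-B"}, {"201-A"}};

        int currentCategory = 1;
        // Products of a category live at the same index in the products array.
        String[] inCategory = products[currentCategory];

        System.out.println("previous category: " + previous(categories, currentCategory));
        System.out.println("next category: " + next(categories, currentCategory));
        System.out.println("next product: " + next(inCategory, 0));
    }
}
```

A null result corresponds to the case where the button is simply not written to the page.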

As was mentioned, the products that belong to the same category share elements of the color scheme that make them different from the products in other categories. These elements are constructed like this:

The colorScheme[currentCategory] value is an index into the colors array that determines which images will be used for this product display.

For the current version, all the buttons are made in Photoshop, which means that when an extra category is added, a new set of images must be created and saved in the appropriate directory. The User Manual describes what the user will need to do to add their own styles. A new entry must also be made manually in the external CSS portion of the skinconf.xml file to define the new style. In the future, a more dynamic approach will be introduced, with background colors listed in the configuration file and all buttons created during rendering. This will probably not eliminate the need to enter the values for prospective backgrounds by hand, since the colors should be of a particular tone to match the overall design.

In this way all the information about the products is translated from the data

structure into the XML files. The index files for categories and the index of categories are

filled along with the modified Apache Forrest site.xml file, used to build a “Products”

menu tab later. The author plans to make the whole class more modular in future

versions to make it easier to expand Catalog Converter functionality and to edit the layout

of webpages.

Apache Forrest Stage

What is Apache Forrest?

Apache Forrest is an open source web-publishing framework based on XML and

Apache Cocoon. Apache Forrest uses a variety of content sources transformed into XML to produce different forms of output, including but not limited to HTML and PDF, in a unified format.

Why an Open Source Content Management System?

Open source systems are based on software whose source code, usually reserved for copyright holders, is provided under a license that meets the Open Source Definition (OSD) or is in the public domain. The OSD means in particular that users can modify the code and redistribute it in modified form (see http://en.wikipedia.org/wiki/Open_Source_Definition for a complete description of the OSD).

Some of these systems were created to work with various digital content. These systems are called Content Management Systems (CMS). Wikipedia defines such a system as “a computer application used to create, edit, manage, search and publish various kinds of digital media and electronic text.”

More and more companies prefer open source technologies to commercial software for their web content management operations. First, this is because these systems are either free or their licenses have a low price point. The cost difference is especially important for non-profit or low-budget organizations, but companies and universities with complicated infrastructure like MIT and Harvard also gain from using open source solutions. A company’s internal documentation, payroll forms, advertising material, and other documentation need a systematic, unified approach.

Second, open source content management technologies develop quickly and some are comparable to their commercial counterparts in processing power. In addition, instead of starting with a large software package that may have many features a user will never need, Drupal or Apache Forrest users can start with just the basic modules for creating web content or PDF documents, and later incorporate mail forms, search boxes, forums, RSS feeds, etc. as needed by utilizing additional modules or plug-ins. Of course, an open source system must be set up, maintained, and upgraded, so there is still a requirement for a person or a team to service it. Nevertheless, it can be a cost-effective strategy. Finally, there is almost always a way to communicate with the constantly growing communities of open source systems’ contributors, who are eager to offer help and share their experience at every stage of development via meetings, forums, and mailing lists.

The Gartner Research website (http://www.gartner.com) indicates that open source CMS solutions are poised to play a significant role in modern web development. CMSs are now used on many widely used websites (including Wikipedia and Facebook). Educators utilize content management systems to build sites for primary schools and for distance education courses. Social networking developers often use these systems to build interactive portals. Designers, photographers, and journalists create blogs and galleries with the help of CMSs.

Why Apache Forrest and not a Programming Language?

In this project it is Apache Forrest that does the job of converting XML templates

into webpages. It is a new technology that is developing rapidly. There is a continually

growing list of websites based on Forrest displayed at the project’s official website.

There were several factors that made the author choose Apache Forrest instead of,

for example, writing the entire project in Java. First of all, Catalog Converter was

planned as a tool that even a non-programmer may use (and it hopefully will become

such a tool with the second version). Apache Cocoon, which underlies Apache Forrest, is

the perfect choice here, because it has a clear separation of responsibilities; the

programming logic, the content, and the presentation style are insulated from one

another. Also, according to the official website, Apache Forrest closely follows XML and

Java language standards, and it was the author’s deepest desire from the beginning to

improve her skills in these areas.

Unlike Lenya, another Cocoon-based framework, Apache Forrest doesn’t come with WYSIWYG (what you see is what you get) editors, but it offers a variety of skins to work with, and its developers are currently making the configuration of skins more

intuitive. Apache Forrest’s default ‘pelt’ skin appeared to be very close to the desired

layout.

The product of Catalog Converter is not only a website that may be distributed on

CD. The site can be served either through the static files produced by the build process or dynamically through the Apache Forrest servlet. The files may also be placed on a server. The search mechanism is already supplied by Apache Forrest, so the user simply needs to uncomment a line in the configuration file before the build and the search box will be conveniently displayed at the top right corner of every page. The only step left to

make this version of the website comparable to the company’s official website is to add

the functionality to sell products online. Other than this ability, the website produced by

Catalog Converter is arguably better than the current commercial product due to clearer

design and a simpler way to add new product information. It is also important to note that every page is ready to print via a PDF file generated automatically by Cocoon from the same XML template as the page itself. There is an option in the skinconf.xml file to create a PDF version of the whole website and to provide the user with a link to it at a specified location (in the case of Catalog Converter, the link was placed under the

“Company” menu tab). Thus, the user receives three products with one run of Catalog

Converter. In addition, Apache Forrest gives the developer an opportunity to build either a static version of the site or a Java servlet web application (see the Apache Forrest website for details).

Behind the scenes

All the background work for Apache Forrest is done by Apache Cocoon, a framework that uses a system of pipelines to serialize XML content into a variety of output formats. A pipeline is a system of components that pass along Simple API for XML (SAX) events. Cocoon serializes XML into a variety of formats, including HTML,

XHTML, PDF, SVG, Flash and more. See the Apache Cocoon official website

(http://cocoon.apache.org/) for details on Cocoon’s underlying mechanics. All the HTML

and PDF pages for Catalog Converter are generated by Cocoon.

Skin Configuration, Shell Elements

Figures 1, 3, and 4 show the current home page, the newly designed page and the

home page of the Apache Forrest project, which uses the default ‘pelt’ skin, respectively. There is

already much similarity between the desired design and predefined design. The locations

of the logo and search box suggested by ‘pelt’ match the desired positions; menu tabs fit

well to the desired functionality and shape. Thus, the ‘pelt’ skin was used as a base and

customized. The skin’s tables and notes change shape when a user changes the width of

the browser window. The Catalog Converter’s tables are of fixed size since the relative

positions are important. The breadcrumbs feature, which is supported by ‘pelt’ and is also a part of the design, is omitted from the CD-based version because it is not supported for locally run products, but it might easily be turned on by uncommenting the

corresponding code in the skinconf.xml file. It can also be easily moved from the default

top left position to the line below the menu, as the new design suggests. The search box

can be activated the same way if the website is run from a server. There is no need to write any additional code; all the activation or deactivation of the search, footnotes, and “pdf-print” button display may be done via properties of skinconf.xml.

Skin Configuration, Color Scheme and Cascading Style Sheets

There are two more steps the user needs to take in order to achieve the desired

page look. The first step is to customize the “colors” section of the skinconf.xml file. These settings modify the look of the “shell,” the background colors and font colors of various elements: menus, footnote with copyright, search box, breadcrumbs strip, etc. The author used a sample page image designed by a professional graphic designer as a guide. She obtained hexadecimal color values by using the Adobe Photoshop color picker tool.

The second step is to describe properties of page elements: fonts for paragraphs and headings, colors for visited and non-visited links, properties of tables, and other characteristics of text content. The author started by creating different types of pages in HTML format in Adobe Dreamweaver: a sample static page, an index page, and a product page. Different styles were applied until the desired look was achieved. Next, the styles were moved from the pages to the “extra-css” part of skinconf.xml. The only attributes left as a part of the XML markup were the ones supported by the Apache Forrest Document Type Definition (DTD) documentation (http://forrest.apache.org/dtdx/document-v20.dtdx.html#img). For example, all the characteristics of the tables, their rows, and their columns are now described as styles external to the pages. See the Appendix for a list of styles used in the project. Some of the entries start with # (#content table, #content th, etc.). These are the overridden ‘pelt’ skin properties.

There are several CSS files with styles that come with the Apache Forrest skins. Some of the user-defined styles get ignored because of them. It is not obvious at a glance what happens, since logic suggests that styles defined in the configuration file should override other ones, and this is not the case. Mozilla FireBug software was used to determine which properties were overridden. Figure 15 shows the analysis of the style of the header “Disclaimer.” The font size and margins are inherited from the skin’s basic.css external file, and the color is defined by the style in the profile.css file, the file to which Apache Forrest moves all the “extra-css” styles from skinconf.xml during the build process.


Figure 15 FireBug analyses the hierarchy of CSS

PDF

Apache Forrest gives its user the opportunity to simultaneously produce a webpage and its PDF version. The PDF file is linked to the page via a standard icon.

Cocoon does all the necessary transformations and serializations. Some of the parameters for the PDF file (letter format and orientation, page numbering, margins, copyright footer) may be specified by modifying the corresponding section of the Apache Forrest skinconf.xml file. Design and layout properties are inherited from a separate section of the same file. PDF files generated during the Apache Forrest build are ready for printing. Figures 16 and 17 demonstrate two files generated from the same source, the XML template of the company’s “About Us” page. Although a useful feature, the default PDF pages created by Apache Forrest still lag behind the ones produced with Adobe tools in terms of design flexibility and usability. There are numerous reports on the Forrest forums regarding problems with font styles, backgrounds, or images to be incorporated into documents. If the PDF file has links in the form of an image or in block format, the clickable area for the link is not defined precisely. PDF files generated for the Catalog Converter product pages demonstrate this beta-level implementation. For example, upon clicking the “Previous” button the user may be referred to the online catalog page instead because the clickable area is defined imprecisely. Nevertheless, the overall look of the PDF pages is clear and is in harmony with the website. Further, it is convenient to have a ready-to-print version of each page. There is also an option to create a PDF file for the whole site at once.


Figure 16 About Us webpage


Figure 17 About Us pdf file generated from the same XML files as About Us webpage

Menu Creation

Two Apache Forrest files are responsible for menu creation, tabs.xml and site.xml. The tabs.xml file was edited manually. This file describes the main menu tabs, the labels they use, and the pages they are linked to. The site.xml file defines submenus. The static pages are added manually, and ProductTemplateManager lists product directories dynamically. The look of the menu items, the width of the tabs, and other properties are customized through skinconf.xml.

Static pages

A catalog is not only a listing of products. It usually contains an introduction to the company, its history, its profile, its services and contact information. The author’s company catalog is no exception. The corresponding webpages are static for the first version of Catalog Converter.

Text content for the company’s static pages (About ViroGen, Contact Biogenetic, etc.) is taken from the company’s commercial website. The text is used as is but was formatted according to the new styles in HTML with Adobe Dreamweaver. The author also bought several images at the Fotolia.com photo stock website and added them to the text to illustrate concepts of the pages. The HTML markup for each page then was transformed into XML valid for use with Apache Forrest.

Template Creation

All the templates for the static and product pages and indexes are created with

Apache Forrest restrictions in mind. Some markup elements and attributes cannot be used directly in the tags because they are not supported by the Apache Forrest DTDs. Figure 18 illustrates an error generated during the process of content validation. The problematic line was:

Extra space around the image may still be added. To achieve this, one must create a style in skinconf.xml and apply it to the image.

Figure 18 Unsupported attribute for the image element error

Figure 19 shows a sample run of the successful validation process, when all files were found to conform to XML rules and the Apache Forrest DTD. For every type of page a prototype was first created using HTML. Then, the HTML markup was rewritten into XML. This was done in order to make sure that when the reverse process happens during the build and HTML is generated from the XML base, the pages look precisely as intended.


Figure 19 Successful validation of the site XML content

Build

The author creates a fresh “seed” Forrest directory “catalog” (a sample directory containing a simple website that is generated automatically by running the ‘forrest seed’ command at the desired location). She updates menus and configuration files to achieve the desired design and functionality, and adds XML templates for static pages and all the necessary images. This directory is placed inside the “catalogconverter” folder. The

“Products” folder created at the previous stage along with site.xml are relocated there.

Then the build is performed. During this process, Apache Cocoon works in the background on the XML files and transforms them into HTML and PDF files. It is a complex process that includes assembling the website by building and checking navigation between the pages, adding menus, verifying that all components are present, displaying images, and other tasks.

Chapter 5 Testing

Thorough testing is one of the most important parts of software development, and it is obviously much more than handling compile-time or run-time errors. Everyone has their own logic and looks for errors based on it. When testing a webpage, some will resize the window and go through the links, while others will put invalid addresses into forms and try to download content that was intended for online usage only.

Catalog Converter underwent testing at every stage of development.

The author’s supervisor at work, a person with deep knowledge of the company’s products, took on the task of verifying the correctness of the product data listed in the

Excel spreadsheet. The author herself verified names of all image files and specification sheets along with the links to the product pages at the company’s official site.

The author was the first tester of the final website. All the navigational links on the created pages were clicked through, as well as images and links to the official website and specification sheets.

The Java component performance was verified on the author’s home Windows XP Professional workstation as well as on similar workstations at the Harvard University Science Center and Harvard Extension School Church Street facilities. The configuration file and some other files (log.txt, catalog.txt, etc.) were temporarily removed one by one or in combination to see what the program would do if it were missing such vital components. Several test files with insufficient data were fed to the program instead of the valid data file, including a file with no data at all, a file with just a list of fields, and files with some missing or duplicated information (see the ‘testing data files’ section of Appendix 1 for details).

Different types of testing were performed not only upon completion of the project but at nearly every step. For example, to make sure all the data was read from the data file, the list of products with their properties was written back into a separate file and compared to the original. To ensure that the system of templates works, the simplest file, the categories index.xml, was created first and previewed with the Apache Forrest engine without a build, and only then were the other templates created.

To keep the user informed of the status of the program execution, messages are displayed in the command prompt window at key points. Figure 20 illustrates what the screen may look like during a typical run. Catalog Converter informs the user that it extracted the names of the data and log files, opened them, read in the data, found some problematic records and rejected them, made a note in the log file, closed both the data and log files, created directories, assigned colors to mark them, created blank product files, index files, and site.xml, and started populating templates. Finally, a list of successfully created and populated directories is displayed (not shown in the screenshot).


Figure 20 Sample run of the ProductRecorder

David Heitmeyer tested the final version of the Java code and the Apache Forrest build with the freshly produced files. The only problem he reported was a performance issue with the Apache Forrest 0.7 edition. Upon upgrading to the current (0.8) version the problem was resolved.

During the Apache Forrest portion of development, the author frequently checked how the final webpages look in most popular browsers. There were no major issues or drastically different looks. The final version of the website was checked on Windows and

Macintosh platforms using Mozilla Firefox, Internet Explorer and Netscape browsers.

The modified skinconf.xml file, all static pages and the prototypes for templates were validated with Apache Forrest’s internal mechanism.

A copy of the website was put on CD and has been sent to one of the company’s distributors to evaluate. The author hopes that this way not only possible technical errors may be revealed, but she may find out how flexible her code is in case some of the clients’ requirements change. This is especially important since one of the main goals of

Catalog Converter is to quickly provide a modified version where some information is removed or added.

Chapter 6 Error Handling

While developing Catalog Converter, the author kept in mind that the target users of the software are people who work on presenting and promoting biological products and who are not interested in technical details. They may or may not be concerned with the design elements; their main concern is content, namely a list of product characteristics presented fully and clearly. There is little chance that a user of this or future versions of Catalog Converter will be a person with a deep background in software engineering or any related field. Therefore, users are unlikely to understand a stack trace as an explanation for errors.

Whenever there is a possibility in the program flow of generating an error, whether a result of some input-output problem, non-existing data being processed, or a record being unacceptable for some reason, an explanatory message is generated and presented to the user. If an exception is thrown, the program displays such a message and exits. For example, this is what happens in case of an input/output exception:

catch(IOException e){
    System.out.println("\nProblem with opening or closing a datafile. Try again.");
    System.exit(1);
}

The author researched the Java documentation to find which methods need try-catch sections. It is mostly the task of the command prompt window to display messages about problems reported by the system, such as security issues, files that already exist, or problems with opening or closing files (see individual descriptions of classes for examples of possible problems). The log.txt file records the list of problems with the content of the data file.

Apache Forrest has its own mechanism for validating files against its internal rules. See the Apache Forrest Stage section of Chapter 4 for more information.

Chapter 7 User Manual

1. Tools and Prerequisites

To run Catalog Converter or to make changes in functionality you may need:

- Java environment 1.5 (there may be problems with version 1.6 during

Apache Forrest installation and run),

- Apache Forrest, version 0.8.

To make changes in design the following tools may be helpful:

- Mozilla FireBug,

- HTML editor (the developer recommends Dreamweaver MX or CS3),

- A text editor with spell-checking capabilities such as Microsoft Word,

- A graphic editor able to resize images and optimize them for web

publishing (any version of Adobe Photoshop or Adobe Elements or

similar).

Knowledge of HTML, CSS, and Java is required to modify existing design or functionality.

2. Apache Forrest Installation and Configuration

Go to the download page of the official Apache Forrest website (http://forrest.apache.org/mirrors.cgi) and follow the instructions for obtaining the current release. Catalog Converter was built with Forrest version 0.8. After the download is complete and the files are extracted from the archive, study the index.html file carefully. It contains important steps that you must follow in order to set the environment variables and generate your own site that can be run locally. After everything is installed, perform a test by creating a ‘seed’ directory, viewing it locally, and building a site from it. While working with Apache Forrest, you may need to set your JAVA_HOME environment variable to point to the Java 1.5 version. A typical indication that this must be done is when error messages similar to these are received:

validate-sitemap:
C:\bin\apache-forrest-0.8\main\webapp\resources\schema\relaxng\sitemap-v06.rng:72:31: error: datatype library "http://www.w3.org/2001/XMLSchema-datatypes" not recognized
C:\bin\apache-forrest-0.8\main\webapp\resources\schema\relaxng\sitemap-v06.rng:81:31: error: datatype library "http://www.w3.org/2001/XMLSchema-datatypes" not recognized

See discussion of the Apache Forrest incompatibility with Java 1.6 at the Forrest forum (http://www.mail-archive.com/[email protected]/msg02833.html).

Generally, the Apache Forrest forum is a good resource for deepening your knowledge of

Forrest, since the documentation is limited. The forum has two sections. The first section is for the users. Here questions may be asked or a search for similar questions already answered can be performed. The second section is for developers.

3. Data Preparation

Use the VirogenCatalog.xls file located in the 'documentation' directory to fill out all the fields provided in the template for all the products you would like to include in the catalog. You cannot remove any existing fields or add new ones. If any cell is left empty, the program will produce an error regarding the incomplete information, and the record will be rejected. The CATALOG_NUMBER and CATEGORY fields must have data assigned to them. Both of these fields can have alphanumeric values. If there is no information for any of the other fields, insert ‘N/A.’ Except for this single case, there may not be any ‘/’ characters in the text that is entered into the following fields:

THUMBNAIL_IMG,

FULL_SIZE_IMG,

DATASHEET.

The presence of the ‘/’ character will cause the browser to display directories incorrectly. Also, please make sure that there are no extra spaces at the end of each entry.

These may cause the browser to display items in incorrect locations.
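For readers who plan to modify the Java code, the data rules above can be summarized in a short sketch. It is not the Converter's own validation code: the class name, method name, and message wording are hypothetical, and a record is assumed to be available as a map from field name to cell text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of checking one catalog record against the data rules.
final class RecordCheckSketch {
    static final List<String> KEY_FIELDS = Arrays.asList("CATALOG_NUMBER", "CATEGORY");
    static final List<String> FILE_FIELDS =
            Arrays.asList("THUMBNAIL_IMG", "FULL_SIZE_IMG", "DATASHEET");

    // Returns a list of human-readable problems; an empty list means the record passes.
    static List<String> problems(Map<String, String> record) {
        List<String> found = new ArrayList<String>();
        for (String key : KEY_FIELDS) {
            if (!record.containsKey(key)) {
                found.add(key + ": field is missing");
            }
        }
        for (Map.Entry<String, String> cell : record.entrySet()) {
            String field = cell.getKey();
            String value = cell.getValue();
            if (value == null || value.trim().length() == 0) {
                found.add(field + ": empty cell");
                continue;
            }
            String trimmed = value.trim();
            if (!value.equals(trimmed)) {
                found.add(field + ": extra spaces around the entry");
            }
            if (KEY_FIELDS.contains(field) && trimmed.equals("N/A")) {
                found.add(field + ": a real value is required");
            }
            if (FILE_FIELDS.contains(field) && !trimmed.equals("N/A")
                    && trimmed.indexOf('/') >= 0) {
                found.add(field + ": the '/' character is not allowed");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<String, String>();
        record.put("CATALOG_NUMBER", "101-A");
        record.put("CATEGORY", "N/A");
        record.put("THUMBNAIL_IMG", "images/101a.jpg ");
        for (String problem : problems(record)) {
            System.out.println(problem);
        }
    }
}
```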

After data entry is complete, go to the 'File' menu and choose the 'Save As' option. Choose 'Text (Tab delimited)' from the 'Save as type:' drop-down menu. Name the file catalogdata.txt and save it in the 'data' directory. This will produce the file that will be used for the website creation. Before proceeding, open the file and check the information in it. Remove all extra quotation marks that saving to the plain text format may create for entries that have commas in them (such as lists of possible product applications).

There are two types of images: images on content pages (250 x 250 px maximum), and “product” images (200 x 200 px thumbnail, 400 x 400 px maximum full size).

If there are 20 or fewer categories in the catalog, all images for navigation purposes are included. However, if there are more than 20 categories, further action may be needed. By default, the program uses a neutral gray scheme for categories above 20. To specify a particular color, several things must be done, as follows.

Go to the ‘catalogconverter’ directory and open the ‘psd’ folder. There are two files there: arrow_bckgs.psd and next_previous.psd. You will need the first one to create the background images used in the indexes of products and categories (Figure 21).

The upper layer is a transparent image of an arrow. The other layers, named “0” through “19”, are the backgrounds used for the 20 categories that are now in ViroGen’s catalog. Twenty is the current limit; additional categories are not well supported. The width of the image depends on the width of the table on the index page where categories are listed. To add a background or to modify existing colors, add a new layer, make sure it is below “Layer 1” but above all other layers, name it, fill it with a color of your choice, make sure that the layer is visible, and save the image as a GIF file to the ‘bcks’ directory in the upper level of the software directory. The saved images should be named i0_bckg.gif, i1_bckg.gif, etc.


Figure 21 arrow_bckgs.psd

The other Photoshop file to modify is next_previous.psd, which is used to create buttons for navigating among product and category pages (Figure 22). Layers named 0, 1, 2, etc. are filled with the same colors as the corresponding layers in the arrow_bckgs.psd file. The ‘Next’ and ‘previous’ layers are transparent arrows that point to the right and to the left, respectively. The rest of the layers are designated for the text labels.

Next/previous buttons constructed from the proper background color, text, and corresponding arrow, along with the list of categories and list of products buttons bearing no arrows, should be saved in GIF format into the 'src\documentation\resources\images\navigation' directory.

Name the files as follows: i0_all.gif (list of categories), i0_all_products.gif (list of products), i0_next.gif (next category), i0_next_product.gif (next product), i0_pr_product.gif (previous product), i0_previous.gif (previous category).
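The naming scheme can be sketched in Java as a list of the files expected for one color scheme. The class and method names are hypothetical, and the assumption that other categories reuse the same suffixes with their own index (i1_, i2_, and so on) is inferred from the background image names rather than stated explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Catalog Converter source.
class NavigationImages {
    // Suffixes taken from the file names listed in the manual.
    static final String[] SUFFIXES = {
        "all", "all_products", "next", "next_product", "pr_product", "previous"
    };

    /** Returns the navigation image names for the given category index. */
    static List<String> expectedFiles(int categoryIndex) {
        List<String> names = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            names.add("i" + categoryIndex + "_" + suffix + ".gif");
        }
        return names;
    }
}
```

Such a list could be compared against the contents of the navigation directory to spot a missing button before running the build.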


Figure 22 next_previous.psd

If the total number of categories is over 20 and all necessary images have been created, go to the createColorSchemes() method of ProductTemplateManager class

(ProductTemplateManager.java file in the ‘catalogconverter/utilities’ directory).

Locate the following code:

for (int i = 0; i < scheme.length; i++) {
    if (i <= 19) scheme[i] = i;
    else if (i >= 20) scheme[i] = 20;
}

In the ‘if’ statement, change ‘19’ to the current number of categories minus one, and ‘20’ in the ‘else if’ statement to the current number of categories.
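One way to avoid editing two literals by hand is to derive both from a single number. The sketch below is not the author's createColorSchemes() method, whose signature is not shown here; the class name, method name, and parameters are assumptions made for illustration.

```java
// Hypothetical variant for illustration; not the original createColorSchemes().
class ColorSchemes {
    /**
     * Maps each category index to a color-scheme number.
     * Categories with their own images keep their index; all others
     * share the fallback scheme numbered schemesWithImages.
     */
    static int[] assign(int categories, int schemesWithImages) {
        int[] scheme = new int[categories];
        for (int i = 0; i < scheme.length; i++) {
            scheme[i] = (i < schemesWithImages) ? i : schemesWithImages;
        }
        return scheme;
    }
}
```

With assign(23, 20), categories 0 through 19 keep their own schemes and categories 20 through 22 all receive scheme 20, which matches the behavior of the original loop.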

4. Apache Forrest Skin Configuration

If you are familiar with HTML and CSS, feel free to modify the skinconf.xml file in the 'src\documentation' directory to produce the desired color scheme of the page elements. By default the software uses the 'pelt' skin. Some of the entries in the skinconf.xml file may be overridden by a skin’s styles. Check the Apache Forrest website for more information.

If any color changes were made to the existing images or new backgrounds were created, scroll to the … section and locate styles that define backgrounds.

.i0 { background-color: #F6946A; }

.i0_back { background-color: #F6946A; background-image: url(images/i0_bckg.gif); background-position: center; }

Modify existing styles or add new styles as needed following the example above.

Use the Photoshop color picker tool to determine the value of the background color.

5. Running Catalog Converter

To compile and run Catalog Converter, first open the Command Prompt and

switch to the “catalogconverter” directory. Figure 23 shows what it should look like.

Figure 23 Catalogconverter directory

Type: compile.bat.

A number of files will be generated. The “products” folder will be created with the subfolders named according to the categories listed in the data file. The folders will in turn contain xml files for each product in the directory along with the index.xml file. The index file that lists all the directories will be created as well. Also, this stage will result in dynamic modification of site.xml (Apache Forrest uses this file along with the tabs.xml that was already modified while building the menu).

Move the newly generated “products” folder to the “xdocs” directory of the folder “catalog” (Figure 24).


Figure 24 xdocs level of the catalog directory

Also replace the site.xml file in this directory with the freshly created copy

located in the subfolder “navigation” in “catalogconverter” directory.

In Command Prompt, switch to the “catalog” directory and type: forrest run.

This will start the process of site generation. Apache Forrest will first validate all the XML source files and all the skin files and will check if it can find all the necessary files and establish all the links.

The HTML pages should now be created. At the end you will get text that confirms the success of the build (Figure 25).


Figure 25 Successful build

In the case of a failed build there may be several causes. One source for error is incorrect data input, namely extra white space at the end of some entries. See Figure 26 for an example of such an error.

Figure 26 Invisible character ruins the build

Figures 27 and 28 show sample error messages that resulted from Apache Forrest not being able to find a specification sheet (‘BROKEN: No flow in sequence’) and from an image placed in the wrong directory (‘BROKEN: No pipeline match request:…’).


Figure 27 The pdf document is not found

Figure 28 Another indication that something (image) is missing

There may also be problems related to incorrect installation of Apache Forrest. Please see the documentation available on the official website and the users’ forum to resolve issues relating to Forrest installation.

Copy the images from “bckgs” folder in “catalogconverter” to the “images” folder of the skin inside the “build” directory (Figure 29).


Figure 29 Images directory

Notice the path to the required folder at the top of the figure. Apache Forrest expects the images to be in specific places, else they will not be found during the build.

This is the final step in the Catalog Converter installation process.

If you encounter issues not covered here and related to Catalog Converter itself, please inform the author. Any comments, suggestions, and other forms of feedback are welcome as well.

Chapter 8 Future Development and Improvements

The first version of Catalog Converter implements all the features described in the

proposal. It reads information from the text file, does some basic data verification, and

produces XML files ready to be parsed by Apache Forrest. The main course of action for

future versions is to make the system as flexible and user-friendly as possible.

First of all, the author wants it to be more modular, so it is possible to add new

functionality by using different sets of classes. One addition to Catalog Converter

features will allow it to work not only with hard-coded lists of the required data fields but

with any additional information about the product that a customer wants to use in the

catalog.

Another future improvement will introduce a way for users to easily change the color scheme and other design elements. Doing so in this version requires opening the Apache Forrest skinconf.xml (scheme configuration file), locating the stylesheet section, and editing it. Most importantly, to make these changes the user must know HTML and CSS. The original idea, however, was that even a non-technical person should be able to easily create a modified website. Catalog Converter would be much more user-friendly if there were a WYSIWYG interface allowing the user to change background colors or the order of entries in the tables. Another possible path of development might be to introduce several variations of templates with a way for the user to specify the desired one.

Cocoon’s capability to resize and manipulate images should be utilized to make the image standardization process more automated. Also, Apache Forrest’s PDF-generation capability may be used to automatically generate product specification sheets. Dynamic customization of configuration files and adding menu items are also among future improvements.

Also, the templates for XML creation, although sufficient for their task, are not very intuitive. The indications for inserting code for the backgrounds, images, etc. should be replaced with tags so that all the code follows XML convention and is clear to a developer not initially familiar with the program. One possible approach to this task is to use the syntax of XSL Transformations (XSLT), a stylesheet language for XML (see http://www.w3.org/TR/xslt#stylesheet-element for more information). In addition, instead of hard-coding the tags and attributes that will replace the placeholders, a separate data configuration file should be introduced. The more generic the templates are, the easier it will be to adjust the software functionality to accommodate page layout or style changes, for example.
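The idea of moving replacement values out of the code and into a configuration file can be sketched with java.util.Properties. This is an assumption-laden illustration: the ${name} placeholder syntax, the property keys, and the class TemplateFiller are all hypothetical and do not reflect the current templates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the Catalog Converter source.
class TemplateFiller {
    /**
     * Replaces every ${key} in the template with the value configured for key.
     * The configuration uses the standard key=value properties format.
     */
    static String fill(String template, String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String result = template;
        for (String key : props.stringPropertyNames()) {
            result = result.replace("${" + key + "}", props.getProperty(key));
        }
        return result;
    }
}
```

A layout or style change would then require editing only the properties text, not the Java classes that generate the pages.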

Yet another avenue for improvement would be to redesign ParsingManager so it can work not only with tab-delimited text files but also with other types of data sources.

The obvious addition is to allow Catalog Converter to communicate with a relational database management system such as Oracle. The Excel spreadsheet that is currently used resembles a relational table. Product data can be easily grouped by price, category, etc. Catalog numbers, being unique, serve the same function as primary keys for a database table. Every field in the Excel spreadsheet is of a particular type (string, int, etc.), as with fields in a relational database. Catalog Converter’s manipulation of data stored in the HashMap array is similar to the database query SELECT. Code that establishes a connection with a database, places the data returned by a query into the data structure, and closes the connection would perform a function analogous to parsing the Excel file. Since the ParsingManager validation mechanism and the FileManager and ProductTemplateManager classes do not care how the HashMaps are created, the rest of the program flow will not be affected.
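The separation described above could be expressed with a small interface so that a tab-delimited file and a database become interchangeable sources of records. The sketch below is only one possible shape: CatalogSource and TabDelimitedSource are hypothetical names, and a database-backed class would implement the same interface using JDBC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction for illustration; not part of the Catalog Converter source.
interface CatalogSource {
    /** Returns one field-name-to-value map per product record. */
    List<Map<String, String>> load();
}

// Reads records from tab-delimited lines; the first line holds the field names.
class TabDelimitedSource implements CatalogSource {
    private final List<String> lines;

    TabDelimitedSource(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public List<Map<String, String>> load() {
        List<Map<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records;
        }
        String[] header = lines.get(0).split("\t", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split("\t", -1);
            if (cells.length != header.length) {
                continue; // incomplete record: skipped, as the parser rejects such lines
            }
            Map<String, String> record = new HashMap<>();
            for (int i = 0; i < header.length; i++) {
                record.put(header[i], cells[i]);
            }
            records.add(record);
        }
        return records;
    }
}
```

Validation and file generation would then depend only on CatalogSource, so adding a database source would not touch them.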

The website produced by Catalog Converter in this version is a CD-based informational distribution and contains only the products’ data. In order to purchase a product, a user must go to the company’s commercial website and locate the desired product page that has a link to the shopping cart. In view of the relatively high cost of managing that website and the dubious level of support currently provided, a similar shopping engine may be added to the website generated by Catalog Converter; the generated website could then fully replace the existing official site.

Chapter 9 Summary and Conclusions

The path from the initial idea of creating Catalog Converter to the first properly displayed and functioning pages was enjoyable and challenging at the same time.

Even in its first version, Catalog Converter is a simple and effective automation system that meets the needs of the author’s company. A nicely designed website with a clear layout is created. It may be placed on CD for distribution or (with some configuration changes) run from the server. Every page of the site also has a link to its PDF version and may be printed. Although not everything was accomplished as intended, a working system was built that is a good base for further development (see Chapter 8 for a list of changes planned).

While working on the thesis, the author learned several lessons. First, the magnitude of the project taught her to be very detail-oriented. As the process showed, even such minor things as an extra space at the end of the text in the data document or a slash in the midst of the file name may cause unwelcome problems. Second, although there are times when one cannot produce exactly what was envisioned in the beginning, at least a base can be created to improve upon later and eventually attain the initial vision in full.

Finally, it was a great opportunity to get familiar with new technologies (Apache Forrest and FireBug) and to expand the author’s knowledge of the Java programming language beyond class assignments, as well as to build a useful tool. The processes of writing a technical paper in English and creating a user manual for the author’s own software were also new and demanding, but resulted in a significant personal accomplishment.

References

MyBusinessCatalog. http://www.freedownloadscenter.com/Business/Applications/MyBusinessCatalog_Gold.html, retrieved February 2009.

FlipAlbum. http://www.flipalbum.com/fahome/index.php, retrieved February 2009.

Content management system (CMS). http://en.wikipedia.org/wiki/Content_management_system, retrieved February 2009.

Apache Forrest. http://forrest.apache.org, retrieved February 2009.

Apache Cocoon. http://cocoon.apache.org/, retrieved February 2009.

Drupal. http://drupal.org/, retrieved February 2009.

Flex. http://www.adobe.com/products/flex/, retrieved February 2009.

FireBug. http://getfirebug.com/, retrieved February 2009.

OpenSourceCMS. http://php.opensourcecms.com/, retrieved March 2009.

Gartner Research. http://www.gartner.com, retrieved March 2009.

W3C. http://www.w3.org/TR/xslt#stylesheet-element, retrieved April 2009.

Appendix 1 Application Code

The files in this appendix are organized in the following order:

- Java code;

- compilation file;

- configuration files;

- fragments of the modified Apache Forrest files;

- XML templates;

- sample xml static page;

- sample log.txt entry;

- snippet of tab-delimited data file;

- testing data files (with intentional data errors).

Java Code

FileManager.java

/**
 * File: FileManager class
 * Contents: Reads categories and product numbers and
 * creates appropriate directories and xml files.
 *
 * @version 1.0
 * @author Irina L. Danilova. ALM in IT Thesis.
 **/
package catalogconverter.util;

import java.io.*;
import java.lang.*;
import java.lang.String;
import java.util.*;

public class FileManager{

    public HashMap [ ] hashMapCollection;
    String [ ][ ] nextKeyValuePair;
    String partialPath = "catalogconverter/products/categories";
    String catIndexFile = "catalogconverter/products/index.xml";
    String siteXmlFile = "catalogconverter/navigation/site.xml";
    String siteXmlPath = "catalogconverter/navigation";
    String fullCategoryPath;
    String fullFilePath;
    String categoryName;
    String fileName;

    //default constructor
    public FileManager () {
        System.out.println ("Created default File Manager");
    }

    // constructor
    // @param HashMap [ ] myCatalog
    public FileManager (HashMap [] myCatalog) {

        hashMapCollection = myCatalog;
        // The rest will be filled out while parsing collection
        // of Hash Maps
        String fullPath = "";
        String categoryName = "";
        String fileName = "";
    }

    // Create an index.xml file for each directory-category
    // and an xml file for each product.
    public void makeFileStructure(){

        makeProductFiles();
        makeCategoriesIndex();
        makeIndexFiles();
        makeSiteXML();

    }

    // makeCategoriesIndex()
    // Create an index file for listing of all the directories
    public void makeCategoriesIndex(){

        System.out.println ("Creating index of categories.");
        File productFile = new File (catIndexFile);

        // put a file there...
        // make a separate function that accomplish the task
        // of file creating - reusing of the same code
        // along the class...
        if (! productFile.exists()) {
            try {
                productFile.createNewFile();
            }
            catch (SecurityException e) {
                // Print out the exception that occurred
                System.out.println("Security Error when creating a categories index: " + e.getMessage());
            }
            catch (IOException e) {
                // Print out the exception that occurred
                System.out.println("I/O Error when creating a file: " + e.getMessage());
            }
        }

    }

    // makeSiteXML()
    // Creates site.xml file
    public void makeSiteXML(){

        System.out.println ("Creating site.xml.");

        File siteFile = new File (siteXmlFile);
        File directoryFile = new File (siteXmlPath);

        // create a directory
        if (!directoryFile.exists()){
            try {
                directoryFile.mkdirs();
            }
            catch (SecurityException e) {
                // Print out the exception that occurred
                System.out.println("Security Error when creating a 'navigation' directory: " + e.getMessage());
            }
        }

        if (! siteFile.exists()) {
            try {
                siteFile.createNewFile();
            }
            catch (SecurityException e) {
                // Print out the exception that occurred
                System.out.println("Security Error when creating a site.xml: " + e.getMessage());
            }
            catch (IOException e) {
                // Print out the exception that occurred
                System.out.println("I/O Error when creating a site.xml: " + e.getMessage());
            }
        }

    }

    // makeProductFiles()
    // Create directories and file structure. Place everything
    // in the products directory.
    public void makeProductFiles(){

        System.out.println ("Creating files and directories based on the products information.");

        int iter=0;

        while (hashMapCollection[iter].isEmpty() == false
                && hashMapCollection[iter].size() == 15 ){

            categoryName = "" + hashMapCollection[iter].get("CATEGORY");
            fileName = "" + hashMapCollection[iter].get("CATALOG_NUMBER");
            fullCategoryPath = "" + partialPath + "/" + categoryName;
            fullFilePath = "" + partialPath + "/" + categoryName + "/" + fileName + ".xml";

            // create a new file
            File directoryFile = new File (fullCategoryPath);
            File productFile = new File (fullFilePath);

            // create a directory
            if (!directoryFile.exists()){
                try {
                    directoryFile.mkdirs();
                }
                catch (SecurityException e) {
                    // Print out the exception that occurred
                    System.out.println("Security Error when creating a directory: " + categoryName + " " + e.getMessage());
                }
            }

            // put a file there
            if (!productFile.exists()) {
                try {
                    productFile.createNewFile();
                }
                catch (SecurityException e) {
                    // Print out the exception
                    // that occurred
                    System.out.println("Security Error when creating a file: " + fileName + " " + e.getMessage());
                }
                catch (IOException e) {
                    // Print out the exception
                    // that occurred
                    System.out.println("I/O Error when creating a file: " + fileName + " " + e.getMessage());
                }
            }

            iter++;
            if (iter >= hashMapCollection.length - 1 ) break;
        }

    }

    // makeIndexFiles()
    // Create index.xml file for each directory
    public void makeIndexFiles(){

        System.out.println ("Creating index files.");
        int iter=0;

        while (hashMapCollection[iter].isEmpty() == false
                && hashMapCollection[iter].size() == 15 ){

            categoryName = "" + hashMapCollection[iter].get("CATEGORY");
            fullCategoryPath = "" + partialPath + "/" + categoryName;
            fullFilePath = "" + partialPath + "/" + categoryName + "/" + "index.xml";

            // create a new file
            File directoryFile = new File (fullCategoryPath);
            File productFile = new File (fullFilePath);

            // create a directory
            if (!directoryFile.exists()){
                try {
                    directoryFile.mkdirs();
                }
                catch (SecurityException e) {
                    // Print out the exception that occurred
                    System.out.println("Security Error when creating a directory: " + categoryName + " " + e.getMessage());
                }
            }

            // put a file there
            if (!productFile.exists()) {
                try {
                    productFile.createNewFile();
                }
                catch (SecurityException e) {
                    // Print out the exception that occurred
                    System.out.println("Security Error when creating a file: " + fileName + " " + e.getMessage());
                }
                catch (IOException e) {
                    // Print out the exception that occurred
                    System.out.println("I/O Error when creating a file: " + fileName + " " + e.getMessage());
                }
            }

            iter++;
            if (iter >= hashMapCollection.length - 1 ) break;
        }

    }

}
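The comment in makeCategoriesIndex() suggests moving the repeated file-creation code into a separate function. A minimal sketch of such a helper follows; FileHelper and ensureFile are hypothetical names, and the helper is offered as one possible refactoring rather than as part of the submitted code.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical refactoring for illustration; not part of the Catalog Converter source.
class FileHelper {
    /**
     * Creates the file at the given path, together with any missing parent
     * directories. Returns true if the file exists when the method finishes.
     */
    static boolean ensureFile(String path) {
        File file = new File(path);
        File parent = file.getParentFile();
        try {
            if (parent != null && !parent.exists() && !parent.mkdirs()) {
                System.out.println("Could not create directory: " + parent);
                return false;
            }
            if (!file.exists()) {
                file.createNewFile();
            }
            return file.exists();
        } catch (SecurityException | IOException e) {
            System.out.println("Error when creating " + path + ": " + e.getMessage());
            return false;
        }
    }
}
```

Each of the four make...() methods could then build its path and call ensureFile once, which would also keep the error messages consistent.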

ParsingManager.java

/**
 * File: ParsingManager
 * Contents:
 * - Opens catlog.txt file and reads in the name of the data file
 * - Opens a text data file;
 * - Looks for the fields specified in the configuration files
 * - Validates data
 * - Returns a collection of HashMap objects
 * - Informs the user if any records were rejected (log.txt)
 *
 *
 * @version 1.0
 * @author Irina L. Danilova. ALM in IT Thesis.
 **/
package catalogconverter.util;

import java.io.*;
import java.lang.*;
import java.lang.String;
import java.util.*;

public class ParsingManager{

    private String [] fieldsToParse;
    private String [] requiredFields = {
        "CATALOG_NUMBER",
        "PRODUCT_NAME",
        "CATEGORY",
        "TYPE",
        "SIZE",
        "STORAGE",
        "PRICE",
        "SHIPPING",
        "APPLICATION",
        "RECOGNITION",
        "CLONE_SUBTYPE",
        "THUMBNAIL_IMG",
        "FULL_SIZE_IMG",
        "PRODUCT_ONLINE",
        "DATASHEET",
    };

    // Product data
    private String productCatalogNumber;
    private String productName;
    private String productCategory;
    private String productType;
    private String productSize;
    private String productStorage;
    private String productPrice;
    private String productShipping;
    private String productApplication;
    private String productRecognition;
    private String productCloneSubtype;
    private String productThumbnailImage;
    private String productFullImage;
    private String productOnline;
    private String productDataSheet;

    // Some arrays and counters for retrieving data
    // from the text file
    private String [][] dataFieldsFromDataFile;
    private String [][] parsedData;
    private int [] dataFieldsMap;
    private int indexOfFields;
    private int indexOfData;
    private int rejected;
    private int lineNumber;
    private int arrayLength;
    private int foundFields;
    private String [] catalogNumbers;
    private String [] rejectedLines;

    // These data are extracted from the files_config.txt file
    // and represent respectively:
    // - The file to read the catalog data from;
    // - The file to write the content of the data file to insure
    //   the correct reading (test purpose only)
    // - Log-file to track the errors.
    private String dataFileName = " ";
    private String logFile = " ";
    private String configFiles = "catalogconverter/config/files_config.txt";
    private char delim = '\t';

    // Tools for reading and tokenizing data file
    private BufferedReader in;
    private PrintWriter log;
    private StringTokenizer st;

    // Flags for file parsing
    private int reuiredFieldsArePresent = 0;
    private boolean fieldLine = false;
    private boolean allRequiredFieldsArePresent = true;
    private boolean invalidData = false;

    // Collections
    public HashMap [ ] hashMapCollection;
    private String [ ][ ] nextKeyValuePair;

/* Constructors */

    //DelimitedFileParser constructor
    public ParsingManager (){

        // Determine the length of array that was passed and make
        // an array of the same length
        arrayLength = requiredFields.length;
        fieldsToParse = new String[arrayLength];
        catalogNumbers = new String [1000];

        // Initialize index for the future array of fields
        dataFieldsFromDataFile = new String[1000][1000];
        foundFields = 0; // number of matching fields
        dataFieldsMap = new int[arrayLength];
        rejectedLines = new String [1000];
        hashMapCollection = new HashMap [1000];

        // counters
        indexOfFields = 0; // entries in the data file fields array
        indexOfData = 0;
        rejected = 0; // index of rejected lines array
        lineNumber = 0; // line of the text file

        // Copy the passed attributes into our parser's properties
        for (int i = 0; i

        // initialize array to keep the track of catalog numbers
        for (int j = 0; j < 1000; j ++)
            catalogNumbers[j] = " ";

    }

    // parseCatalogData()
    // Performs the sequence of actions needed to parse the text file
    // with the products info and to populate the data structure
    // that will farther used to create the xml files for the html
    // and PDF files creation.
    public HashMap [] parseCatalogData(){
        getDataFileName();
        readDataFile();
        hashMapCollection = createHashMapCollection();
        hashMapCollection = validateData();
        cleanup();
        return hashMapCollection;
    }

    // getDataFileName()
    public void getDataFileName(){

        System.out.println("\nRetrieving the data file name.");
        try{
            StringBuffer contents = new StringBuffer();
            try {
                in = new BufferedReader(new FileReader(configFiles));
                String line = null;
                while (( line = in.readLine()) != null)
                    parseTheCatalogTxt (line);

            }
            finally {
                in.close();
            }
        }
        catch (IOException ex){
            ex.printStackTrace();
        }

        System.out.println("\nData File: " + dataFileName);
    }

    // parseTheCatalogTxt(String currentLine)
    // Extract information from the configuration file: the name of
    // the data file, the log file, and the name of the file to write
    // the test output to.
    // In case the name of data file is not supplied, throw the
    // error and terminate the program;
    // @param currentLine line of data
    public void parseTheCatalogTxt(String currentLine){

        String[] result = currentLine.split("=");
        for (int x=0; x

    // public void readDataFile()
    // Open the data file and parse it line by line.
    public void readDataFile(){
        System.out.println("\nOpening data file...\n");
        String fullReaderFileName = "catalogconverter/data/" + dataFileName;
        String fullWriterFileName = "catalogconverter/data/" + logFile;
        try{
            in = new BufferedReader (
                new FileReader(fullReaderFileName));
            log = new PrintWriter (
                new BufferedWriter(
                    new FileWriter(fullWriterFileName)));
            String line;

            while ((line=in.readLine())!=null){
                if ((line.startsWith("#")) || (line.length()==0)){
                    log.println("BAD INPUT" );
                    rejectedLines[rejected]=line;
                    rejected++;
                }
                else
                    parseTheSourceFile(line);

            }
            if ( rejected > 0) {
                System.out.println ("\nSome of the records were rejected during parsing because of the missing field(s)");
                System.out.println ("\nCheck log.txt file located in the 'data' folder for the list of the products...");
                System.out.println("\nThe rest of the data will now undergo the validity test.");
            }
            else
                System.out.println ("\nAll the required fields for all products are present.");

            parsingTest();
            listRejectedLines();
        }

        // Exceptions handling
        catch(FileNotFoundException e){
            System.out.println("\nFile not found. Please, check the spelling and try again.");
            System.exit(1);
        }

        catch(IOException e){
            System.out.println("\nProblem with opening or closing a datafile. Try again.");
            System.exit(1);
        }
    }

    // parseTheSourceFile(String nextLine) method.
    // Performs data parsing. Assumption is the line shouldn't start
    // with " " unless it's a blank line (to be improved).
    // @param currentLine
    // line of data
    public void parseTheSourceFile(String currentLine){
        lineNumber++;
        if (fieldLine==false){

            fieldLine=true;
            int counter =0;
            st = new StringTokenizer (currentLine, "\t");

            while (st.hasMoreTokens()) {
                try {
                    dataFieldsFromDataFile[0][counter] = st.nextToken();
                    counter++;
                }
                catch(NoSuchElementException e) {
                    System.out.println( "\nNo token to process.");
                    System.exit(1);
                }
            }
            indexOfFields=counter;
            if (!(allRequiredFieldsArePresent =
                    checkIfAllRequiredFieldsArePresent( dataFieldsFromDataFile))){
                log.println("Data field is missing. The record is rejected.");
            }
            else
                log.println ("All required fields are present.");
        }
        // data lines
        else if (fieldLine==true){
            st = new StringTokenizer (currentLine, "\t");
            int numOfTokens = st.countTokens();
            if (numOfTokens < arrayLength) {
                rejectedLines[rejected] = currentLine;
                rejected++;
                lineNumber--;
            }
            // populate the line if all data is here
            else {
                while (st.hasMoreTokens()) {
                    dataFieldsFromDataFile[lineNumber-1][indexOfData] = st.nextToken();
                    indexOfData++;
                }
            }
            indexOfData=0;

        }
    }

    // checkIfAllRequiredFieldsArePresent( String
    // [][]fieldsInDataFile) method.
    // Compares the values in the fields line of the data file
    // with the list of the fields received from the CoirseCreator.
    // Returns true if all the required fields are present.
    public boolean checkIfAllRequiredFieldsArePresent(
            String [][] fieldsInDataFile){
        log.println("Checking presence of " + arrayLength + " required fields...");
        log.println("Index of fields: " + indexOfFields );

        for (int i=0; i

        if (foundFields < arrayLength) {
            return false;
        }

        return true;
    }

    // Returns the parsed data for developers check
    public void parsingTest(){
        parsedData = new String [lineNumber][arrayLength];
        for (int i=0; i

    // void cleanup() method. Closes the data and log files.
    public void cleanup(){
        try{
            in.close();
            System.out.println("\nData file successfully closed.");
            log.close();
            System.out.println("\nLog file successfully closed.");
        }
        catch(IOException e) {
            System.out.println("\nUnsuccessful in.close()");
            System.exit(1);
        }
    }

    // validateData()
    // Performs data check:
    // - rejects entries with '/' in: catalog number, category,
    //   images and datasheet names.
    // - excludes duplicated catalog numbers.
    public HashMap [ ] validateData(){
        // counter for the HashMaps in the Collection
        int iter = 0;
        int index = -1;
        boolean duplicate = false;
        int catnumindex = 0;

        while (hashMapCollection[iter].isEmpty() == false){

            productCatalogNumber = "" + hashMapCollection[iter].get("CATALOG_NUMBER");
            productCategory = "" + hashMapCollection[iter].get("CATEGORY");
            productThumbnailImage = "" + hashMapCollection[iter].get("THUMBNAIL_IMG");
            productFullImage = "" + hashMapCollection[iter].get("FULL_SIZE_IMG");
            productDataSheet = "" + hashMapCollection[iter].get("DATASHEET");

            // 1. Catalog Number should not be "N/A" or contain
            // '/'. Nor can it start/end with space.
            if ( (productCatalogNumber.equals("N/A")
                    || productCatalogNumber.indexOf('/', 0) != -1)
                || (productCategory.equals("N/A")
                    || productCategory.indexOf('/', 0) != -1)
                || ((! productThumbnailImage.equals("N/A"))
                    && productThumbnailImage.indexOf('/', 0) != -1 )
                || ((! productFullImage.equals("N/A"))
                    && productFullImage.indexOf('/', 0) != -1
                || ((! productDataSheet.equals("N/A"))
                    && productDataSheet.indexOf('/', 0) != -1 )) ) {

                System.out.println("N/A or '/' in the product number, category name or the name of the image/spec sheet. The record is rejected.");
                log.println("N/A or '/' in the product number, category name or the name of the image/spec sheet in the product: " + productCatalogNumber + " . The record is rejected.");
                hashMapCollection = removeEntry(iter);
            }

            // 2. Check if there is a duplicate catalog number.
            // If so, reject the entry.
            // If no, add the number to the array for tracking.
            for (int i=0; i

                if ( catalogNumbers[i].equals(productCatalogNumber)){
                    System.out.println("Duplicated product number: " + productCatalogNumber + " . The record was rejected.");
                    duplicate = true;
                }
            }

            if (duplicate == true) {
                log.println("Duplicated product number: " + productCatalogNumber + " . The record was rejected.");
                hashMapCollection = removeEntry(iter);
                duplicate=false;
            }

            else {

                // System.out.println("Adding to array of catnums: " + productCatalogNumber);
                catalogNumbers[catnumindex] = productCatalogNumber;
                catnumindex ++;
            }

            // advance the counter, reset the boolean
            iter ++;
            if (iter >= hashMapCollection.length - 1 ) break;

        }

        return hashMapCollection;
    }

    // removeEntry(int indexOfInvalidEntry)
    // @param int indexOfInvalidEntry
    // Removes an invalid entry from the array of HashMaps
    public HashMap [ ] removeEntry(int indexOfInvalidEntry) {

        HashMap [] tempCollection = new HashMap [hashMapCollection.length-1];

        int i = 0;
        int j = 0;
        int elementToDelete = indexOfInvalidEntry;
        while (i < hashMapCollection.length && j < tempCollection.length) {
            if (i == elementToDelete)
                i++;

            else {
                tempCollection[j] = hashMapCollection[i];
                i++;
                j++;
            }
        }

        return tempCollection;

    }

    // listRejectedLines() method.
    // Lists lines rejected by the parser.
    public void listRejectedLines(){
        log.println("\n------");
        log.println("\nPARSER-REJECTED LINES");
        log.println("\nThe following " + rejected + " lines were rejected by the parser");
        log.println("(they are comments, blank lines or lines with insufficient data):\n");

        for (int i=0; i

    // createHashMapCollection()
    // creates HashMapCollection
    public HashMap [] createHashMapCollection(){
        log.println("\nStarting creating an array of HashMaps...\n");
        log.println("\nHashMap objects passed to the Driver have following properties:\n");

        // go through the cleaned data array and extract
        // key (required field name) -
        // value pairs.
        HashMap [] myCollection = new HashMap [lineNumber];
        nextKeyValuePair = new String [arrayLength][lineNumber];

        log.println("Number of HashMaps " + (lineNumber-1));
        if (lineNumber-1 <=1) {
            System.out.println("No data in the HashMap Collection. Please check the data file and try again.");
            System.exit(1);
        }

        for (int i=0; i

        return myCollection;
    }

    // Map fill (Map m, String [][] data)
    // Fills HashMap with product information.
    // Exits the program if there are problems.
    public static Map fill(Map m, String[][] data) {

        for (int i = 0; i < data.length; i++) {
            try {
                m.put(data[i][0], data[i][1]);
            } catch (NullPointerException e) {
                System.out.println("\nNullPointerException while putting data into HashMap.");
                System.exit(1);
            } catch (UnsupportedOperationException e) {
                System.out.println("\nThe put operation is not supported by this map.");
                System.exit(1);
            } catch (ClassCastException e) {
                System.out.println("\nThe class of the specified key or value prevents it from being stored in this map.");
                System.exit(1);
            } catch (IllegalArgumentException e) {
                System.out.println("\nSome property of the specified key or value prevents it from being stored in this map.");
                System.exit(1);
            }
        }
        return m;
    }
}
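
The fill and removeEntry steps in the listing above could also be written with generic collections. The following is only a minimal sketch under stated assumptions: the class name CatalogRows and the method names fill and withoutEntry are hypothetical and do not appear in the Catalog Converter sources, and the sketch throws an exception on malformed input where the original calls System.exit(1).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the thesis code.
final class CatalogRows {

    // Copies {key, value} pairs into a typed map, keeping insertion order.
    // Rejects pairs that are missing a key or a value slot.
    static Map<String, String> fill(String[][] data) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] == null || data[i].length < 2 || data[i][0] == null) {
                throw new IllegalArgumentException(
                        "Malformed key-value pair at index " + i);
            }
            row.put(data[i][0], data[i][1]);
        }
        return row;
    }

    // Returns a copy of the rows without the entry at the given index.
    // The input list is left unchanged.
    static List<Map<String, String>> withoutEntry(
            List<Map<String, String>> rows, int index) {
        List<Map<String, String>> copy =
                new ArrayList<Map<String, String>>(rows);
        copy.remove(index);
        return copy;
    }
}
```

Returning a copy mirrors the behavior of removeEntry, which builds a new array rather than modifying the existing one.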


Product.java

/**
 * File: Product.java
 *
 * Contents: This file defines a Product class. Each product in the
 * catalog has to have some attributes, such as catalog number and short
 * description. The product also may have price and some additional
 * characteristics, like shipping, storage conditions, etc.
 *
 * This version will use some hard-coded variables.
 *
 * @version 1.0
 * @author Irina L. Danilova. ALM in IT Thesis.
 **/
package catalogconverter.source;

public class Product {

// Globals private String productCatalogNumber; private String productName; private String productCategory; private String productType; private String productSize; private String productStorage; private int productPrice; private String productShipping; private String productApplication; private String productRecognition; private String productCloneSubtype; private String productThumbnailImage; private String productFullImage; private String productDataSheet; private String productOnline;

// Default constructor public Product () {}

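    // Constructor: stores the supplied values in the corresponding fields
    public Product(String prCatalogNumber, String prName, String prCategory,
                   String prType, String prSize, String prStorage, int prPrice,
                   String prShipping, String prApplication, String prRecognition,
                   String prCloneSubtype, String prThumbnailImage,
                   String prFullImage, String prDataSheet) {
        productCatalogNumber = prCatalogNumber;
        productName = prName;
        productCategory = prCategory;
        productType = prType;
        productSize = prSize;
        productStorage = prStorage;
        productPrice = prPrice;
        productShipping = prShipping;
        productApplication = prApplication;
        productRecognition = prRecognition;
        productCloneSubtype = prCloneSubtype;
        productThumbnailImage = prThumbnailImage;
        productFullImage = prFullImage;
        productDataSheet = prDataSheet;
    }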

// SET methods of Product class:

// Set the product catalog number public String setCatalogNumber(String itemCatalogNumber) { productCatalogNumber = itemCatalogNumber; return productCatalogNumber; }

// Set the product's name public String setName(String itemName) { productName = itemName; return productName ; }

// Set the product's category public String setCategory(String itemCategory) { productCategory = itemCategory; return productCategory; }

// Set the product's type (Antibody or Antigen) public String setType(String itemType) { productType = itemType; return productType; }

// Set the product's size public String setSize(String itemSize) { productSize = itemSize; return productSize; }

// Set the product's storage conditions public String setStorage(String itemStorage) { productStorage = itemStorage; return productStorage; }

// Set the product's price public int setPrice(int itemPrice) { productPrice = itemPrice; return productPrice; }

// Set the product's shipping conditions public String setShipping(String itemShipping) { productShipping = itemShipping; return productShipping; }

    // Set the product's application
    public String setApplication(String itemApplication) {
        productApplication = itemApplication;
        return productApplication;
    }

// Set the product's recognition public String setRecognition(String itemRecognition) { productRecognition = itemRecognition; return productRecognition; }

// Set the product's clone subtype public String setCloneSubtype(String itemCloneSubtype) { productCloneSubtype = itemCloneSubtype; return productCloneSubtype; }

// Set the product's thumbnail image public String setThumbnailImage(String itemThumbnailImage) { productThumbnailImage = itemThumbnailImage; return productThumbnailImage; }

// Set the product's full image public String setFullImage(String itemFullImage) { productFullImage = itemFullImage; return productFullImage; }

// Set the link to the product's page on-line public String setProductOnline(String itemLink) { productOnline = itemLink; return productOnline; }

// Set the product's datasheet public String setDataSheet(String itemDataSheet) { productDataSheet = itemDataSheet; return productDataSheet; }

// GET methods of Product class:

// Get the product's catalog number public String getCatalogNumber () { return productCatalogNumber; }

// Get the product's name public String getName () { return productName; }

    // Get the product's category
    public String getCategory() {
        return productCategory;
    }

// Get the product's type (Antibody or Antigen) public String getType () { return productType; }

// Get the product's size public String getSize () { return productSize; }

// Get the product's storage conditions public String getStorage() { return productStorage; }

// Get the product's price public int getPrice() { return productPrice; }

// Get the product's shipping conditions public String getShipping() { return productShipping; }

// Get the product's application public String getApplication () { return productApplication; }

// Get the product's recognition public String getRecognition () { return productRecognition; }

// Get the product's clone subtype public String getCloneSubtype () { return productCloneSubtype; }

// Get the product's thumbnail image public String getThumbnailImage() { return productThumbnailImage; }

// Get the product's full image public String getFullImage() { return productFullImage; }

// Get the link to the product's page on-line public String getProductLink() { return productOnline; }


// Get the product's datasheet public String getDataSheet() { return productDataSheet; } }

ProductRecorder.java

/**
 * File: ProductRecorder.java
 *
 * Contents: This file defines a ProductRecorder class, which is
 * responsible for creating the parser object for traversing the data
 * file supplied by the user. The Recorder assumes that the data in the
 * input file are represented as tab-delimited entries.
 * This file is a part of the catalogconverter.source package.
 *
 * @version 1.0
 * @author Irina L. Danilova. ALM in IT Thesis.
 **/
package catalogconverter.source;

import catalogconverter.util.*;
import java.io.*;
import java.lang.*;
import java.util.*;

public class ProductRecorder {

static BufferedReader keyboardInput; static String delimiter = "\t"; static HashMap [] virogenCatalog; static boolean validData;

/* Methods */

// Default constructor public ProductRecorder ( ) {}

/** * Main () method. * @param none * @return a bunch of xml files to use with the Apache Forrest to * create a CD-based catalog. **/ public static void main(String args[]) {

        // Start with checking if the configuration and data files
        // are where they have to be. If they aren't, issue an
        // error and exit.

        try {
            // 1. Find the data file; make sure all the fields
            // are present; read in the data; perform some basic
            // data validation. If everything is ok,
            // proceed further.
            ParsingManager parsemanager = new ParsingManager();
            virogenCatalog = parsemanager.parseCatalogData();

            // 2. Based on the data, create all the directories
            // and files.
            FileManager filemanager = new FileManager(virogenCatalog);
            filemanager.makeFileStructure();

            // 3. Fill all the files with the content using
            // templates and the product data.
            ProductTemplateManager templatemanager = new ProductTemplateManager(virogenCatalog);
            templatemanager.fillTemplates();

        } catch (Exception e) {
            // Print out the exception that occurred
            System.out.println("Error: " + e.getMessage());
        }

} }

ProductTemplateManager.java

/** * File: ProductTemplateManager class * Contents: Fills the xml template with the products/categories data. * * @version 1.0 * @author Irina L. Danilova. ALM in IT Thesis. **/ package catalogconverter.util; import java.io.*; import java.lang.*; import java.lang.String; import java.util.*; public class ProductTemplateManager {

// Paths to file and directories. In the future make them all // to be read from the configuration file. String partialPath = "catalogconverter/products/categories"; String fullCategoryPath; String fullFilePath;

// Paths to index of categories and site.xml file String catIndexFile = "catalogconverter/products/index.xml"; String siteXmlFile = "catalogconverter/navigation/site.xml";

// The default currency is the US dollar. May be changed // if needed String currency="$";

// Product data private String productCatalogNumber; private String productName; private String productCategory; private String productType; private String productSize; private String productStorage; private String productPrice; private String productShipping; private String productApplication; private String productRecognition; private String productCloneSubtype; private String productThumbnailImage; private String productFullImage; private String productOnline; private String productDataSheet;

// Navigational globals int catCount;

    // Storage for categories' names and products' catalog numbers
    private String[] categoriesArray;
    private String productsArray[][];

    // Values for next and previous entries.
    private String previousCategory;
    private String nextCategory;

private String previousProduct; private String nextProduct;

private int [] colorScheme; private int currentCategory;

// Buffers for reading templates and writing the final xml files. private BufferedReader in; private PrintWriter product; // product files private PrintWriter category; // index of one directory private PrintWriter catindex; // index of directories private PrintWriter sitexml; // labels for site.xml private HashMap [ ] hashMapCollection; public static final int MAX_CATEGORIES = 20;

    // These are paths to (move to config file later):
    // product template file
    String productTemplateFile = "catalogconverter/templates/product_template.xml";
    // category index file
    String indexTemplateFile = "catalogconverter/templates/index_template.xml";
    // list of categories
    String catIndexTemplateFile = "catalogconverter/templates/directories_template.xml";
    // site.xml (navigation)
    String siteTemplateFile = "catalogconverter/templates/site_template.xml";
    // datasheets
    String datasheetPartialPath = "../../../specsheets/";
    String datasheetFullPath = " ";
    // images
    String thumbnailPartialPath = "../resources/images/products/thumbnails/";
    String thumbnailFullPath = " ";
    String fullSizePartialPath = "../resources/images/products/full_size/";
    String fullSizeFullPath = " ";

//default constructor public ProductTemplateManager () { System.out.println ("Created default Product Template Manager "); }

    // constructor
    // @param HashMap [ ] myCatalog
    public ProductTemplateManager(HashMap[] myCatalog) {

        hashMapCollection = myCatalog;

        categoriesArray = new String[hashMapCollection.length];
        productsArray = new String[hashMapCollection.length][hashMapCollection.length];

        for (int i = 0; i < hashMapCollection.length; i++) {
            categoriesArray[i] = " ";
        }

        for (int j = 0; j < hashMapCollection.length; j++) {
            for (int k = 0; k < hashMapCollection.length; k++) {
                productsArray[j][k] = " ";
            }
        }

        previousCategory = " ";
        nextCategory = " ";

        previousProduct = " ";
        nextProduct = " ";

        currentCategory = 0;
    }

// fillTemplates() // Copies content of the template file into the product file. public void fillTemplates(){

// Create arrays of categories and product names to be // used for navigating fillCategoriesProductArray();

// create an array of the colors to mark // different categories colorScheme = createColorSchemes();

        // Make a directories listing
        createIndexOfCategories();

// Create navigation (fill site.xml) createSiteXml();

// Fill in product templates populateFiles();

// Create category's index files

for (int i = 0; i < catCount; i ++ ){ productCategory = categoriesArray[i]; fullCategoryPath = "" + partialPath + "/" + categoriesArray[i] + "/" + "index.xml"; createIndexFiles(fullCategoryPath); } }

    // fillCategoriesProductArray()
    // Creates an array of the Categories and Product Files.
    // This will be used later to dynamically create navigation
    // between the pages.

    public void fillCategoriesProductArray() {

// counter for the HashMaps in the Collection int iter = 0;

// keeps track of the array cell int theCategory = 0; int theProduct = 0; int existingcat = 0;

// if the category already was put into array boolean present = false;

while (hashMapCollection[iter].isEmpty() == false){

// get just the category name and the product name productCatalogNumber = "" + hashMapCollection[iter].get("CATALOG_NUMBER"); productCategory = "" + hashMapCollection[iter].get("CATEGORY");

// If we are at the very first entry, // just add the Category and Product Number // to the proper arrays. Otherwise check if the // category already there for (int i = 0; i < categoriesArray.length; i ++ ){

if (categoriesArray[i].equals(productCategory)){ existingcat = i; present = true; } }

// Now, if we run through the array of Categories // and this Category is new, add it and the product. if (present == false) { theProduct = 0; categoriesArray [theCategory]=productCategory; productsArray [theCategory][theProduct] = productCatalogNumber; theProduct ++; theCategory ++; catCount ++; present = true; }

// If the category is there, just add the product. else if (present == true) { productsArray [existingcat][theProduct] = productCatalogNumber; theProduct ++; }


// advance the counter, reset the boolean iter ++; present = false;

if (iter >= hashMapCollection.length - 1 ) break; }

}

    // Test method to check if the categories and products were
    // added properly.
    public void testArrays() {

        // Arrays:
        for (int i = 0; i < catCount; i++) {
            System.out.println("Array : " + i + " " + categoriesArray[i]);
        }

        // Products: print every filled cell of every category row
        for (int j = 0; j < productsArray.length; j++) {
            for (int k = 0; k < productsArray[j].length; k++) {
                if (!(productsArray[j][k].equals(" ")))
                    System.out.println("Product k: " + k + " , " + productsArray[j][k]);
            }
        }
    }

// Creates an array with numbers from one to number of // categories - 1. Later the backgrounds and images will be used // to create a color scheme and navigation for the directories // products by adding those numbers to the names of classes and // files like bkg_1, prevprod_1 etc. public int [] createColorSchemes() {

System.out.println ("Creating the list of colors :)");

// Let's assume there are two options. The first one is // that every time the catalog is generated everything // should remain exactly as in the previous version (except // maybe price change etc.) // The second option is to generate a random sequence // of the colors to be used. // The choice of using either one should be defined // somewhere, say in the configuration file. // To be added if there is time.

// 1. Let's say we are keeping the colors in the same order // all the time. Then we just need an array with numbers 0 // through the number of categories - 1

        int[] scheme = new int[categoriesArray.length];

        for (int i = 0; i < scheme.length; i++) {
            if (i <= 19)
                scheme[i] = i;
            else if (i >= 20)
                scheme[i] = 20;
        }

        return scheme;
    }

// createIndexOfCategories() // Makes a listing of all categories. public void createIndexOfCategories(){

try{

in = new BufferedReader (new FileReader(catIndexTemplateFile)); catindex = new PrintWriter (new BufferedWriter( new FileWriter(catIndexFile))); String line; while ((line=in.readLine())!=null){

// System.out.println(line); parseTheCatsIndex (line);

} cleanup(in, catindex, fullCategoryPath); }

// Exceptions handling catch(FileNotFoundException e){ System.out.println("\nFile not found. Please, check the spelling and try again."); System.exit(1); }

catch(IOException e){ System.out.println("\nProblem with opening or closing a datafile. Try again."); System.exit(1); } }

    // createSiteXml()
    // Adds labels to the site.xml document
    // (part of the Apache Forrest engine).
    public void createSiteXml() {

try{

in = new BufferedReader ( new FileReader(siteTemplateFile)); sitexml = new PrintWriter (new BufferedWriter( new FileWriter(siteXmlFile))); String line;


while ((line=in.readLine())!=null){ parseTheSiteXml (line);

} cleanup(in, sitexml, fullCategoryPath); }

// Exceptions handling catch(FileNotFoundException e){ System.out.println("\nFile not found. Please, check the spelling and try again."); System.exit(1); }

catch(IOException e){ System.out.println("\nProblem with opening or closing a datafile. Try again."); System.exit(1); } }

// Create Categories index files. Go through the arrays of // categories and products and fill the template. public void createIndexFiles(String catPath) {

try{

            in = new BufferedReader(new FileReader(indexTemplateFile));
            category = new PrintWriter(new BufferedWriter(new FileWriter(catPath)));
            String line;
            while ((line = in.readLine()) != null) {
                parseTheIndexTemplate(line);
            }
            cleanup(in, category, catPath);
        }

// Exceptions handling catch(FileNotFoundException e){ System.out.println("\nFile not found. Please, check the spelling and try again."); System.exit(1); }

catch(IOException e){ System.out.println("\nProblem with opening or closing a datafile. Try again."); System.exit(1); } }

    // parseTheSiteXml (String currentLine)
    // @param String currentLine
    public void parseTheSiteXml(String currentLine) {

        // Copy everything as is except the labels line.
        if (currentLine.equals("LABELS"))
            fillLabels();
        else
            sitexml.println(currentLine);
    }

    // fillLabels()
    // Go through all the categories and create a label by
    // adding the letter "i" and array index to the word "label."
    public void fillLabels() {

String label = " "; System.out.println("Filling the site.xml with the labels.");

// Find index of category: for (int i = 0; i < catCount; i ++ ){ sitexml.print("\n"); } }

// parseTheCatsIndex(String currentLine) // @param String currentLine public void parseTheCatsIndex(String currentLine){

// System.out.println(currentLine); String[] changeparams = currentLine.split(" "); for (int i = 0 ; i < changeparams.length; i++) { fillTheFields(changeparams[i], catindex); } catindex.print("\n"); }

// parseTheIndexTemplate(String currentLine) // @param String currentLine public void parseTheIndexTemplate(String currentLine){

boolean process = true;

String[] changeparams = currentLine.split(" ");

// Title etc. if (changeparams[0].equals("***")) { changeparams[0]=" "; }

        // Previous category:
        else if (changeparams[0].equals("PREVIOUS_CAT")) {
            previousCategory = findPreviousCategory();
            if (previousCategory.equals("N/A"))
                process = false;
            else
                changeparams[0] = " ";
        }

        // Next category:
        else if (changeparams[0].equals("NEXT_CAT")) {
            nextCategory = findNextCategory();
            if (nextCategory.equals("N/A"))
                process = false;
            else
                changeparams[0] = " ";
        }

if (process == true) { for (int i = 0 ; i < changeparams.length; i++) {

fillTheFields(changeparams[i], category); } }

category.print("\n"); }

// listProducts() lists products in the current category public void listProducts(){

int catIndex=0; String prodName="";

// Find index of category: for (int i = 0; i < catCount; i ++ ){ if(categoriesArray[i].equals(productCategory)) catIndex=i; }

System.out.println("Creating list of products for: " + catIndex + " " + productCategory);

// List Products: for (int j = 0; j < productsArray[catIndex].length; j ++ ){ if (! (productsArray[catIndex][j].equals(" "))) category.print("

// counter for the HashMaps in the Collection int iter = 0;

while (hashMapCollection[iter].isEmpty() == false){

                    String test = "" + hashMapCollection[iter].get("CATALOG_NUMBER");
                    if (test.equals(productsArray[catIndex][j])) {
                        prodName = "" + hashMapCollection[iter].get("PRODUCT_NAME");

                        category.print(productsArray[catIndex][j] + ".html\"> "
                                + productsArray[catIndex][j] + ", " + prodName);

category.print("

" + "\n"); break;

                    } else
                        // advance the counter
                        iter++;
                    if (iter >= hashMapCollection.length - 1) break;
                }
        }
    }

// listCategories() // Lists all the categories public void listCategories(){

int catIndex=0; System.out.println("Creating list of all the directories in the catalog.");

// Find index of category: for (int i = 0; i < catCount; i ++ ){ catindex.print(" " + categoriesArray[i]); catindex.print(""); catindex.print("\n"); } }

// findPreviousCategory () // Go to array of categories and find the current one. // Look for the previous one. // If there is none (it is the first one), return "N/A," // If there is one, return it. public String findPreviousCategory () {

String prevCat = "N/A";

        for (int i = 0; i < categoriesArray.length; i++) {
            if (categoriesArray[i].equals(productCategory)) {
                if (i == 0)
                    prevCat = "N/A";
                else
                    prevCat = categoriesArray[i - 1];
            }
        }
        return prevCat;
    }

// findNextCategory () // Go to array of categories and find the current one. // Look for the next one. // If there is none (it is the last one), return "N/A," // If there is one, return it. public String findNextCategory () {

// System.out.println(productCategory); String nextCat = "N/A";

for (int i = 0; i < catCount; i ++ ){ if (categoriesArray[i].equals(productCategory)) { if ( i == (catCount - 1)) nextCat = "N/A"; else nextCat = categoriesArray[i+1]; } } return nextCat; }

// Iterates through the collection and inserts data in place // of Placeholders public void populateFiles(){

// initialize counters int iter=0;

// start process entry by entry while (hashMapCollection[iter].isEmpty() == false){

// Assign values to the globals extactTheData (iter); // Create paths to images and specs if there are any createPaths(); // Fill in the data fillProductTemplate();

// Move to the next product record iter++; if (iter >= hashMapCollection.length - 1 ) break; } }

    // extactTheData (int index)
    // Assigns globals product data
    // @param int index
    public void extactTheData(int index) {

// extract product details productCatalogNumber = "" + hashMapCollection[index].get("CATALOG_NUMBER"); productName = "" + hashMapCollection[index].get("PRODUCT_NAME"); productCategory = "" + hashMapCollection[index].get("CATEGORY"); productType = "" + hashMapCollection[index].get("TYPE"); productSize = "" + hashMapCollection[index].get("SIZE"); productStorage = "" + hashMapCollection[index].get("STORAGE"); productPrice = "" + hashMapCollection[index].get("PRICE"); productShipping = "" + hashMapCollection[index].get("SHIPPING"); productApplication = "" + hashMapCollection[index].get("APPLICATION"); productRecognition = "" + hashMapCollection[index].get("RECOGNITION"); productCloneSubtype = "" + hashMapCollection[index].get("CLONE_SUBTYPE"); productThumbnailImage = "" + hashMapCollection[index].get("THUMBNAIL_IMG"); productFullImage = "" + hashMapCollection[index].get("FULL_SIZE_IMG"); productDataSheet = "" + hashMapCollection[index].get("DATASHEET"); productOnline = "" + hashMapCollection[index].get("PRODUCT_ONLINE");

previousProduct = findPreviousProduct (); nextProduct = findNextProduct (); System.out.println("Creating " + productCatalogNumber + ".xml, Category: " + productCategory);

// choose the color scheme for (int i = 0; i < categoriesArray.length; i ++){ if (categoriesArray[i].equals(productCategory)){ currentCategory = i; } } }

// Builds paths to the specsheets and images public void createPaths() {

// create path to the file fullCategoryPath = "" + partialPath + "/" + productCategory + "/" + "index.xml"; fullFilePath = "" + partialPath + "/" + productCategory + "/" + productCatalogNumber + ".xml";

// locate the spec sheet and images if any if (! productDataSheet.equals("N/A")) datasheetFullPath = datasheetPartialPath + productDataSheet;


if (! productThumbnailImage.equals("N/A")) thumbnailFullPath = thumbnailPartialPath + productThumbnailImage;

if (! productFullImage.equals("N/A")) fullSizeFullPath = fullSizePartialPath + productFullImage; }

// fillProductTemplate() // Substitutes placeholders in the product template files // with the information contained in the myCatalog collection. public void fillProductTemplate(){

try{ in = new BufferedReader ( new FileReader(productTemplateFile)); product = new PrintWriter ( new BufferedWriter( new FileWriter(fullFilePath))); String line;

while ((line=in.readLine())!=null){ parseTheTemplate(line); } cleanup(in, product, fullFilePath); }

// Exceptions handling catch(FileNotFoundException e){ System.out.println("\nFile not found. Please, check the spelling and try again."); System.exit(1); }

catch(IOException e){ System.out.println("\nProblem with opening or closing a datafile. Try again."); System.exit(1); } }

// parseTheTemplate(String currentLine) // @param String currentLine // The function is pretty long since there are a bunch // of cases: IMAGE, URL etc. // To be made shorter later... public void parseTheTemplate(String currentLine){

boolean process = true; String[] changeparams = currentLine.split(" ");

// Title etc. if (changeparams[0].equals("***")) { changeparams[0]=" "; }


// Link to the on-line entry else if (changeparams[0].equals("URL")) { if (productOnline.equals("N/A")) process = false; else changeparams[0]=" "; }

// Datasheet else if (changeparams[0].equals("SPEC")) { if (productDataSheet.equals("N/A")) process = false; else changeparams[0]=" "; }

// Images: thumbnail and full size else if (changeparams[0].equals("IMAGE")) { if ( productThumbnailImage.equals("N/A") && (! productOnline.equals("N/A") || ! productDataSheet.equals("N/A")) ) { process = false; }

// if all the three buttons are not needed, // just put some empty space... else if (productOnline.equals("N/A") && productDataSheet.equals("N/A") && productThumbnailImage.equals("N/A")) { fillTheFields(" ", product); process = false; }

else if (! productThumbnailImage.equals("N/A")) changeparams[0]=" ";

}

// Previous product: else if (changeparams[0].equals("PREVIOUS_PR")) { if ( previousProduct.equals("N/A")){ process = false; } else changeparams[0]=" "; }

        // Next product:
        else if (changeparams[0].equals("NEXT_PR")) {
            if (nextProduct.equals("N/A")) {
                process = false;
            }
            else
                changeparams[0] = " ";
        }


if (process==true) { for (int i = 0 ; i < changeparams.length; i++) {

fillTheFields(changeparams[i], product); } } product.print("\n"); }

// findPreviousProduct () // Go to array of products and find the current one. // Check the previous one in the same "column." // If there is none (this is the first one), return "N/A," // If there is one, return it. public String findPreviousProduct () {

String previous = "N/A"; for (int i = 0; i < productsArray.length; i ++ ){ for (int j = 0; j < hashMapCollection.length; j ++ ){ if (productsArray[i][j]. equals(productCatalogNumber)) if (j==0) previous = "N/A"; else previous = productsArray[i][j-1]; } }

return previous;

}

    // findNextProduct ()
    // Go to array of products and find the current one.
    // Check the next one in the same "column."
    // If there is none (this is the last one), return "N/A,"
    // If there is one, return it.
    public String findNextProduct() {

        String nextPr = "N/A";
        for (int i = 0; i < productsArray.length; i++) {
            for (int j = 0; j < hashMapCollection.length; j++) {
                if (productsArray[i][j].equals(productCatalogNumber)) {
                    // Unused cells hold the " " filler set in the constructor,
                    // so a blank next cell means there is no next product.
                    if (j == hashMapCollection.length - 1
                            || productsArray[i][j + 1] == null
                            || productsArray[i][j + 1].trim().length() == 0)
                        nextPr = "N/A";
                    else
                        nextPr = productsArray[i][j + 1];
                }
            }
        }
        return nextPr;
    }


    // fillTheFields (String currentToken, PrintWriter writer)
    // @param String currentToken
    // @param PrintWriter writer
    // Populates the template file with the product data
    public void fillTheFields(String currentToken, PrintWriter writer) {

        // Product - related tokens
        if (currentToken.equals("CATALOG_NUMBER"))
            writer.print("" + productCatalogNumber + " ");
        else if (currentToken.equals("PRODUCT_NAME"))
            // write to the writer passed in, like every other token
            writer.print("" + productName + " ");
        else if (currentToken.equals("CATEGORY"))
            writer.print("" + productCategory + " ");
        else if (currentToken.equals("TYPE"))
            writer.print("" + productType + " ");
        else if (currentToken.equals("SIZE"))
            writer.print("" + productSize + " ");
        else if (currentToken.equals("STORAGE"))
            writer.print("" + productStorage + " ");
        else if (currentToken.equals("PRICE")) {

            if (!productPrice.equals("N/A"))
                writer.print(currency + productPrice);
            else
                writer.print("N/A");
        }
        else if (currentToken.equals("SHIPPING"))
            writer.print("" + productShipping + " ");
        else if (currentToken.equals("APPLICATION"))
            writer.print("" + productApplication + " ");
        else if (currentToken.equals("RECOGNITION"))
            writer.print("" + productRecognition + " ");
        else if (currentToken.equals("CLONE_SUBTYPE"))
            writer.print("" + productCloneSubtype + " ");
        // In case of images the path to them is used
        // rather than the original file names
        else if (currentToken.equals("THUMBNAIL_IMG"))
            writer.print("");
        else if (currentToken.equals("FULL_SIZE_IMG"))
            writer.print("");
        else if (currentToken.equals("PRODUCT_ONLINE"))
            writer.print("" + productOnline + " ");
        else if (currentToken.equals("DATASHEET"))
            writer.print("");
        // Previous and Next products part:
        else if (currentToken.equals("PREVIOUS_PRODUCT")) {
            writer.print("");
            writer.print("
                    + colorScheme[currentCategory] + "_pr_product.gif\" alt=\"previous product\" />");
        }

else if (currentToken.equals("ALL_PRODUCTS")) { writer.print (""); writer.print (""); }

else if (currentToken.equals("NEXT_PRODUCT")) { writer.print (""); writer.print (""); }

// Previous and Next category part: else if (currentToken.equals("PREVIOUS_CATEGORY")){ writer.print (""); writer.print (""); }

else if (currentToken.equals("ALL_CATEGORIES")) { writer.print (""); }

else if (currentToken.equals("NEXT_CATEGORY")) { writer.print (""); writer.print (""); } else if (currentToken.equals("PRODUCTS")) listProducts();

// Colors else if (currentToken.equals("TR_CLASS")) { writer.print (""); }


else if (currentToken.equals("TD_CLASS")) { writer.print (""); }

else if (currentToken.equals("CATEGORIES")) listCategories();

else writer.print ("" + currentToken + " "); }

    // void cleanup() method.
    // Closes the template and product files.
    public void cleanup(BufferedReader reader, PrintWriter writer, String myfile) {
        try {
            reader.close();
            writer.close();
        } catch (IOException e) {
            System.out.println("\nUnsuccessful close() of " + myfile);
            System.exit(1);
        }
    }

}
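
The category and product bookkeeping in ProductTemplateManager (the parallel categoriesArray and productsArray, together with the find-previous and find-next methods) could be expressed more compactly with an ordered map. The sketch below is an illustration under stated assumptions, not the thesis implementation: the class CatalogIndex and all of its method names are hypothetical, and only the "N/A" sentinel is taken from the listing above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the thesis code.
final class CatalogIndex {

    // Same sentinel the listing uses when there is no neighbor.
    static final String NONE = "N/A";

    // Category name -> catalog numbers, both kept in first-seen order.
    private final Map<String, List<String>> productsByCategory =
            new LinkedHashMap<String, List<String>>();

    // Registers a product under its category, creating the category if new.
    void add(String category, String catalogNumber) {
        List<String> products = productsByCategory.get(category);
        if (products == null) {
            products = new ArrayList<String>();
            productsByCategory.put(category, products);
        }
        products.add(catalogNumber);
    }

    List<String> categories() {
        return new ArrayList<String>(productsByCategory.keySet());
    }

    String previousCategory(String category) {
        return neighbor(categories(), category, -1);
    }

    String nextCategory(String category) {
        return neighbor(categories(), category, +1);
    }

    String previousProduct(String category, String catalogNumber) {
        return neighbor(productsIn(category), catalogNumber, -1);
    }

    String nextProduct(String category, String catalogNumber) {
        return neighbor(productsIn(category), catalogNumber, +1);
    }

    private List<String> productsIn(String category) {
        List<String> products = productsByCategory.get(category);
        return products == null ? new ArrayList<String>() : products;
    }

    // Returns the item `offset` positions away from `current`,
    // or "N/A" if `current` is unknown or the position is out of range.
    private static String neighbor(List<String> items, String current, int offset) {
        int index = items.indexOf(current);
        if (index < 0) return NONE;
        int target = index + offset;
        return (target >= 0 && target < items.size()) ? items.get(target) : NONE;
    }
}
```

Because each category owns its own list, this variant does not depend on the input rows being sorted by category, and it needs no blank filler values to mark unused cells.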

Compilation files

Compile.bat

javac -classpath . catalogconverter/util/*.java
javac -classpath . catalogconverter/source/*.java
java -cp . catalogconverter.source.ProductRecorder

Configuration files

files_config.txt

# In this file: 'input' - name of the file containing products info.
input=Catalog.txt
#input=test_files/test2.txt
test_output=readback.txt
log_file=log.txt
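
The key=value layout of files_config.txt, with # marking comment lines, matches the format read by java.util.Properties, so one possible way to read it is sketched below. This is an assumption about how such a file could be parsed, not a description of how the Catalog Converter actually reads it; the class FilesConfig and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the thesis code.
final class FilesConfig {

    // Reads key=value lines from any character source (for example a FileReader).
    // Lines whose first non-blank character is '#' are skipped as comments.
    static Properties load(Reader source) throws IOException {
        Properties config = new Properties();
        config.load(source);
        return config;
    }

    // Convenience wrapper for configuration text already held in memory.
    static Properties parse(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader over an in-memory string is not expected to fail.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "# name of the file containing products info\n"
                + "input=Catalog.txt\n"
                + "#input=test_files/test2.txt\n"
                + "test_output=readback.txt\n"
                + "log_file=log.txt\n";
        Properties config = parse(text);
        System.out.println(config.getProperty("input"));
        // The second argument is a fallback used when the key is absent.
        System.out.println(config.getProperty("log_file", "log.txt"));
    }
}
```

One caveat with this approach: Properties treats the backslash as an escape character, so Windows-style paths in the file would need doubled backslashes or forward slashes.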

Fragments of the modified Apache Forrest files

Skinconf.xml, skin colors and extra css parts only

Skinconf.xml, extra css

header, h1 { font-size: 16; font-family: Arial Black; font-weight: normal; font-style: all-caps; color: #754C29 }

p.center { text-align: center; }

p.left { text-align: left; }

p.right { text-align: right; }

title.subhead { font-size: 14; font-family: Arial; font-weight: normal; color: #754C29; }

td.orangeback { background-color: #fddb91; }

p.regulartext { font-size: 100%; font-family: Arial; font-weight: normal; padding-left: 5px; padding-right: 5px; }

p.subheader, div.subheader { font-size: 100%; font-weight: bold; color: #991D20; font-family: Arial; padding-left: 5px; padding-right: 5px; }


p.product_headers{ font-size: 100%; font-weight: bold; color: #9a1d20; font-family: Arial; padding-left: 5px; padding-right: 5px; } p.product_subheaders{ font-size: 85%; font-weight: normal; color: #9a1d20; font-family: Arial; padding-left: 5px; padding-right: 5px; text-align:center; }

p.product_desc{ font-size: 85%; font-weight: normal; color: #000000; font-family: Arial; padding-left: 5px; padding-right: 5px; text-align:center; }

p, ul { font-size: 100%; font-family: Arial; font-weight: normal; line-height:120%; margin-bottom:1em; margin-top:0.5em; text-align:left; }

p.introtext { font-size: 175%; font-family: Arial; font-weight: normal; padding-left: 5px; padding-right: 5px; }

p.regulartext_red, ul.regulartext_red { font-size: 100%; font-family: Arial; font-weight: bold; color: #991D20; text-align:left; padding-left: 5px; padding-right: 5px; }

p.red_footer { font-size: 100%; font-family: Arial; font-weight: bold; color: #991D20; text-align:center; text-transform: uppercase; }

quote { margin-left: 2em; padding: .5em; background-color: #f0f0f0; font-family: Arial; }

table.home_table { width: 750px; padding: 5px; border: 0 none; } table.product_table { width: 1000px; padding: 5px; border: 0 none; border-color: red; }

#content table { width: 750px; padding: 5px; border: 0 none;

}

#content th, #content td { padding: 5px; vertical-align: top; border: 0 none;

}

#menu .menupagetitle { background-color:#dfc388; color: #e38529; }

a { text-decoration: none }

a:visited { color:#939090; }

td.w250 { width: 250px; table-layout: fixed; }

td.w500 { width: 500px; table-layout: fixed; } td.w150 { width: 150px; table-layout: fixed; }

td.whalf { width: 50%; table-layout: fixed; } td.orangeback_w250 { width: 250px; table-layout: fixed; background-color: #fddb91; } blockquote { padding: 10px; } italic { font-style:italic; } blockquote_italic { padding: 10px; font-style:italic; }

no_border { border: none; }

td.w400 { width: 400px; table-layout: fixed; }

.yellow_cell { background-color: #FDDB91; }

.i0 { background-color: #F6946A; }

.i0_back { background-color: #F6946A; background-image: url(images/i0_bckg.gif); background-position: center; }

.i1 { background-color: #65BE8B; }

.i1_back { background-color: #65BE8B; background-image: url(images/i1_bckg.gif); background-position: center; }

.i2 { background-color: #F9B873; }

.i2_back { background-color: #F9B873; background-image: url(images/i2_bckg.gif); background-position: center; }

.i3 { background-color: #A7CEE3; }

.i3_back { background-color: #A7CEE3; background-image: url(images/i3_bckg.gif); background-position: center; }

.i4 { background-color: #F6D17C; }

.i4_back { background-color: #F6D17C; background-image: url(images/i4_bckg.gif); background-position: center; }

.i5 { background-color: #D99AC2; }

.i5_back { background-color: #D99AC2; background-image: url(images/i5_bckg.gif); background-position: center; }

.i6 { background-color: #95D3CF; }

.i6_back { background-color: #95D3CF; background-image: url(images/i6_bckg.gif); background-position: center; }

.i7 { background-color: #D8E086; }

.i7_back { background-color: #D8E086; background-image: url(images/i7_bckg.gif); background-position: center; }

.i8 { background-color: #B9BCDA; }

.i8_back { background-color: #B9BCDA; background-image: url(images/i8_bckg.gif); background-position: center; }

.i9 { background-color: #88C88B; }

.i9_back { background-color: #88C88B; background-image: url(images/i9_bckg.gif); background-position: center; }

.i10 { background-color: #F6B087; }

.i10_back { background-color: #F6B087; background-image: url(images/i10_bckg.gif); background-position: center; }

.i11 { background-color: #8DCCA5; }

.i11_back { background-color: #8DCCA5; background-image: url(images/i11_bckg.gif); background-position: center; }

.i12 { background-color: #CFE8F6; }

.i12_back { background-color: #CFE8F6; background-image: url(images/i12_bckg.gif); background-position: center; }

.i13 { background-color: #FBEDA1; }

.i13_back { background-color: #FBEDA1; background-image: url(images/i13_bckg.gif); background-position: center; }

.i14 { background-color: #F2C1D6; }

.i14_back { background-color: #F2C1D6; background-image: url(images/i14_bckg.gif); background-position: center; }

.i15 { background-color: #BAE2EC; }

.i15_back { background-color: #BAE2EC; background-image: url(images/i15_bckg.gif); background-position: center; }

.i16 { background-color: #FBD497; }

.i16_back { background-color: #FBD497; background-image: url(images/i16_bckg.gif); background-position: center; }

.i17 { background-color: #9BBD93; }

.i17_back { background-color: #9BBD93; background-image: url(images/i17_bckg.gif); background-position: center; }

.i18 { background-color: #55A496; }

.i18_back { background-color: #55A496; background-image: url(images/i18_bckg.gif); background-position: center; }

.i19 { background-color: #93C83D; }

.i19_back { background-color: #93C83D; background-image: url(images/i19_bckg.gif); background-position: center; }

.i20 { background-color: #808080; }

.i20_back { background-color: #808080; background-image: url(images/i20_bckg.gif); background-position: center; }

.golden_back { background-color: #ffdb91; width: 125px; }

#footer, #footer a, #feedback, #feedback #feedbackto { color: #991D20; }

#footer a { color: #0F3660; } #footer a:visited { color: #009999; }

Templates for XML Templates

directories_template.xml

Categories
CATEGORIES

index_template.xml

*** CATEGORY
*** PRODUCTS
PREVIOUS_CAT TD_CLASS PREVIOUS_CATEGORY *** TD_CLASS ALL_CATEGORIES NEXT_CAT TD_CLASS NEXT_CATEGORY

product_template.xml

*** CATEGORY > CATALOG_NUMBER ( PRODUCT_NAME )
*** TR_CLASS *** *** *** TD_CLASS *** TD_CLASS

PRODUCT_NAME

*** *** *** *** *** TR_CLASS *** *** ***

CATALOG_NUMBER

TYPE

Product Size

Storage

Price per 100 ug

Shipping

SIZE

STORAGE

PRICE

SHIPPING

Application:
APPLICATION

Recognition:
RECOGNITION

Clone Subtype:
CLONE_SUBTYPE

URL SPEC IMAGE

DATASHEET FULL_SIZE_IMG THUMBNAIL_IMG

Click on a picture to enlarge.

PREVIOUS_PR TD_CLASS PREVIOUS_PRODUCT *** TD_CLASS ALL_PRODUCTS NEXT_PR TD_CLASS NEXT_PRODUCT
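The upper-case tokens in the templates above (CATALOG_NUMBER, PRODUCT_NAME, PRICE and so on) mark the places where product data is written into each page. The sketch below illustrates one possible way to fill such tokens from a map of field values; it is an illustration under stated assumptions, not the converter's own code. The class name TemplateFillSketch, the method fill and the one-line sample template are hypothetical, and the XML markup of the real templates is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFillSketch {

    // A whole upper-case token such as CATALOG_NUMBER or FULL_SIZE_IMG.
    private static final Pattern TOKEN = Pattern.compile("\\b[A-Z][A-Z0-9_]*\\b");

    // Replaces every token that has an entry in 'values';
    // tokens without an entry are copied through unchanged.
    static String fill(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group());
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' in values such as "$210" literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> product = new LinkedHashMap<String, String>();
        product.put("CATEGORY", "Signal Transduction Antibodies");
        product.put("CATALOG_NUMBER", "101-A");
        product.put("PRODUCT_NAME", "anti-Glutathione monoclonal antibody IgG1");
        product.put("PRICE", "$210");

        String template = "CATEGORY > CATALOG_NUMBER ( PRODUCT_NAME ), PRICE, NEXT_CATEGORY";
        System.out.println(fill(template, product));
    }
}
```

Matching whole tokens rather than calling String.replace for each field avoids two pitfalls visible in the template names: CATEGORY is a substring of NEXT_CATEGORY and PREVIOUS_CATEGORY, and SIZE is a substring of FULL_SIZE_IMG. Because the underscore counts as a word character, the word-boundary pattern leaves those longer tokens intact.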

site_template.xml

Only the modified part of the file is included. For the rest, see the original Apache Forrest site.xml file.

LABELS

..

XML Templates

index.xml (List of directories)

Categories
Test
Blood Group Antibodies
CMV
Epstein-Barr Virus
HAV
HBV
HCV
HDV
HEV
HIV-1 (2)
HSV
HTLV
Rubella virus
SARS
Signal Transduction Antibodies
Syphilis
Tick-born Encephalitis Virus
Toxoplasma gondii
Varicella Zoster Virus
Viral antibodies
West Nile Virus

HCV category index.xml

HCV

00115-V, HCV core recombinant antigen
00115-V-B, HCV core recombinant antigen
00115-V-F, HCV core recombinant antigen
00115-V-R, HCV core recombinant antigen
00116-V, HCV NS4 recombinant antigen
00116-V-B, HCV NS4 recombinant antigen
00116-V-F, HCV NS4 recombinant antigen
00116-V-R, HCV NS4 recombinant antigen
00117-V, HCV NS3 recombinant antigen
00150-V, HCV core 2-119aa recombinant antigen.
00151-V, HCV core 24 antigen.
00152-V, HCV core recombinant antigen
00153-V, HCV NS3 1192-1456aa recombinant antigen
00154-V, HCV NS3 1359-1456aa antigen.
00155-V, HCV NS4 mosaic recombinant antigen
00156-V, HCV NS4 1916-1947aa recombinant antigen.
00157-V, HCV NS5 2061-2302aa recombinant antigen.
00158-V, HCV NS5 2212-2313aa recombinant antigen.

101a.xml (example of the final XML product file)

Signal Transduction Antibodies > 101-A ( anti-Glutathione monoclonal antibody IgG1 )

101-A

mAb

Product Size

Storage

Price per 100 ug

Shipping

anti-Glutathione monoclonal antibody IgG1

100 ug/vial

Long Term: -80C; Short Term: 4C

$210

Wet Ice

Application:
WB, ELISA, IP

Recognition:
N/A

Clone Subtype:
D8

Click on a picture to enlarge.

site.xml (modified site.xml file, part of Apache Forrest skin)

Sample XML static page

company/news.xml

ViroGen News

Virogen is proud to announce a new Rabbit polyclonal antibody designed specifically to react with HCV NS5a genotype 2A, catalog number 276-A.

It is offered at a special price of $195.00 per 100 ug vial until September 30th (regular price is $225/100 ug).

The anti-HCV NS3 monoclonal antibody, catalog number 217-A, has been shown to react with genotype 2A as well.

Please visit www.virogen.com for a complete HCV reagent listing.

As always, we hope to serve your research and diagnostic needs.

Sincerely,
Team ViroGen

Sample log.txt entry

Checking presence of 15 required fields...
Index of fields: 15
All required fields are present.
BAD INPUT

------

PARSER-REJECTED LINES

The following 2 lines were rejected by the parser (they are comments, blank lines or lines with insufficient data):

1 143-A anti-H inh human blood antigen IgM Blood Group Antibodies mAb 100 ug/vial Long Term: -80C; Short Term: 4C 210 ELISA, WB, IHC, FC, IP N/A 97-I N/A N/A http://www.virogen.com/commerce/catalog/product.jsp?product_id=1703 143a.pdf
2

------

Starting creating an array of HashMaps...

HashMap objects passed to the Driver have following properties:

Number of HashMaps 167
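The log above reports that the parser rejects comments, blank lines and lines with insufficient data, and that 15 fields are required. A minimal sketch of such a filter follows. It is an illustration only: the class name LineFilterSketch and the method isRejected are hypothetical, and treating '#' as the comment marker in the data file is an assumption borrowed from files_config.txt, since the log does not state the comment syntax.

```java
import java.util.Arrays;
import java.util.List;

class LineFilterSketch {

    // Number of required fields reported in the sample log.
    static final int REQUIRED_FIELDS = 15;

    // Returns true when a line would be rejected: it is blank, it is a
    // comment (assumed to start with '#'), or it has fewer than
    // REQUIRED_FIELDS tab-separated fields.
    static boolean isRejected(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return true;
        }
        // A limit of -1 keeps trailing empty fields, so they count as columns.
        return line.split("\t", -1).length < REQUIRED_FIELDS;
    }

    public static void main(String[] args) {
        String complete = "131-A\tanti-RhoD human antigen IgG1\tBlood Group Antibodies"
            + "\tmAb\t100 ug/vial\tLong Term: -80C; Short Term: 4C\t210\tWet Ice"
            + "\tELISA, WB, IHC, FC, IP\tN/A\tAFR1\tN/A\tN/A\tN/A\tN/A";
        // Dropping the SHIPPING field leaves 14 fields, as in the rejected log line.
        String noShipping = complete.replace("\tWet Ice", "");

        List<String> lines = Arrays.asList(complete, noShipping, "", "# a comment");
        for (String line : lines) {
            System.out.println((isRejected(line) ? "rejected: " : "accepted: ") + line);
        }
    }
}
```

The rejected record for 143-A in the log fits this rule: it carries no SHIPPING value, so it has only 14 fields.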

Snippet of tab-delimited data file

CATALOG_NUMBER	PRODUCT_NAME	CATEGORY	TYPE	SIZE	STORAGE	PRICE	SHIPPING	APPLICATION	RECOGNITION	CLONE_SUBTYPE	THUMBNAIL_IMG	FULL_SIZE_IMG	PRODUCT_ONLINE	DATASHEET
007-A	Test	Test	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	AFR1	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1691	131a.pdf
131-A	anti-RhoD human antigen IgG1	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	AFR1	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1691	131a.pdf
132-A	anti-A1, A2 human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2A	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1692	132a.pdf
133-A	anti-A1, A2, A3 human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2B-1	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1693	133a.pdf
134-A	anti-B human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z5H-2	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1694	134a.pdf
135-A	anti-AB human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z5H-2/Z2A	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1695	135a.pdf
136-A	anti-Rh(o)D human antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	55/2	N/A	N/A	http://www.virogen.com/commerce/catalog/product.jsp?product_id=1696	136a.pdf

Testing data files

test1.txt

(missing data and ‘N/A’ as a catalog number entry)

CATALOG_NUMBER	PRODUCT_NAME	CATEGORY	TYPE	SIZE	STORAGE	PRICE	SHIPPING	APPLICATION	RECOGNITION	CLONE_SUBTYPE	THUMBNAIL_IMG	FULL_SIZE_IMG	PRODUCT_ONLINE	DATASHEET
N/A	anti-RhoD human antigen IgG1	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	AFR1	N/A	N/A	N/A	N/A
132-A	anti-A1, A2 human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2A	N/A	N/A	N/A	N/A
133-A	anti-A1, A2, A3 human blood antigen IgM	N/A	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2B-1	N/A	N/A	N/A	N/A

test2.txt

(duplicated record)

CATALOG_NUMBER	PRODUCT_NAME	CATEGORY	TYPE	SIZE	STORAGE	PRICE	SHIPPING	APPLICATION	RECOGNITION	CLONE_SUBTYPE	THUMBNAIL_IMG	FULL_SIZE_IMG	PRODUCT_ONLINE	DATASHEET
131-A	anti-RhoD human antigen IgG1	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	AFR1	N/A	N/A	N/A	N/A
131-A	anti-A1, A2 human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2A	N/A	N/A	N/A	N/A
133-A	anti-A1, A2, A3 human blood antigen IgM	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice	ELISA, WB, IHC, FC, IP	N/A	Z2B-1	N/A	N/A	N/A	N/A
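The files test1.txt and test2.txt exercise two problems with the CATALOG_NUMBER field: an ‘N/A’ entry and a repeated value. The sketch below shows one way such records could be flagged. How the Catalog Converter itself responds to these inputs is not shown in this listing, so the class name CatalogNumberCheckSketch, the method findProblems and the message texts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CatalogNumberCheckSketch {

    // Returns one message for each record whose first tab-separated field
    // is empty, is "N/A", or repeats a catalog number seen earlier.
    static List<String> findProblems(List<String> records) {
        Set<String> seen = new HashSet<String>();
        List<String> problems = new ArrayList<String>();
        for (String record : records) {
            String catalogNumber = record.split("\t", -1)[0].trim();
            if (catalogNumber.isEmpty() || catalogNumber.equals("N/A")) {
                problems.add("missing catalog number");
            } else if (!seen.add(catalogNumber)) {
                // Set.add returns false when the value is already present.
                problems.add("duplicate catalog number: " + catalogNumber);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Records shortened to their first two fields for readability.
        List<String> records = Arrays.asList(
            "131-A\tanti-RhoD human antigen IgG1",
            "131-A\tanti-A1, A2 human blood antigen IgM",
            "N/A\tanti-RhoD human antigen IgG1",
            "133-A\tanti-A1, A2, A3 human blood antigen IgM");
        for (String problem : findProblems(records)) {
            System.out.println(problem);
        }
    }
}
```

The header line is assumed to have been removed before the records reach this check.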

test3.txt

(an empty text file)

test4.txt

(just a list of data fields)

CATALOG_NUMBER PRODUCT_NAME CATEGORY TYPE SIZE STORAGE PRICE SHIPPING APPLICATION RECOGNITION CLONE_SUBTYPE THUMBNAIL_IMG FULL_SIZE_IMG PRODUCT_ONLINE DATASHEET

test5.txt

(just one record)

CATALOG_NUMBER	PRODUCT_NAME	CATEGORY	TYPE	SIZE	STORAGE	PRICE	SHIPPING	APPLICATION	RECOGNITION	CLONE_SUBTYPE	THUMBNAIL_IMG	FULL_SIZE_IMG	PRODUCT_ONLINE	DATASHEET
131-A	anti-RhoD human antigen IgG1	Blood Group Antibodies	mAb	100 ug/vial	Long Term: -80C; Short Term: 4C	210	Wet Ice
