Appendix C: Technical Details (With Code)
Note: This appendix provides technical details about the eXist system. In some cases, the code used to create the applications is referenced. Because some of the code files are more than a few pages long, there are two versions of this appendix: one that references each file name and displays the code, and one that only lists the file name.

Table of Contents
1. System Information
2. Template Files
3. File Structure
4. Creating New Collections
5. Search Functionality
6. Search
7. Search Performance
8. Modifying Indexing Parameters
9. Uploading Files
10. Moving Files
11. Checksum Application
12. Stress Test Tool
13. Dublin Core
14. Exporting Methods
15. Migration to XML Formats
16. Role-Based Access Controls
17. Scalability

1. System Information
- eXist Database
  o Free download available on eXist's home page [1]
  o Instructions for loading the software [2]
  o Play with code in the XQuery Sandbox, which is preloaded with sample data sets [3]
- oXygen XML Editor
  o Download a free trial version or order a licensed product [4]
  o The Database Perspective communicates with the eXist database file structure [5]
- Mozilla Firefox
  o Download [6]

[1] eXist. Home Page. 2010. http://exist.sourceforge.net/
[2] Meier, Wolfgang M. Quick Start Guide. eXist. November 2009. http://exist.sourceforge.net/quickstart.html
[3] eXist. XQuery Sandbox. http://demo.exist-db.org/exist/sandbox/sandbox.xql
[4] oXygen. oXygen: XML Editor Home Page. 2010. http://www.oxygenxml.com/
[5] oXygen. XML Database Perspective. http://www.oxygenxml.com/xml_and_relational_database_perspective.html
[6] Mozilla. Firefox 3.6. 2010. http://www.mozilla.com/firefox/

2. Template Files
The following files were used as templates for creating new applications. For most new applications, only minor edits to each file were needed. The specific functionality of specialized applications required additional modifications or files.
- index.xq: Main landing page of the application.
- list-items.xq: XQuery that lists items, one item per row.
- view-item.xq: Transforms an XML document into XHTML for viewing.
- search-form.xq: XHTML search form.
- search.xq: Search service.
- edit.xq: XForms application for saving new data and changing existing data.
- new-instance.xml: Data for a new item (i.e., includes default values).
- next-id.xml: XML file that contains the next ID number to be assigned to a document.
- save-new.xq: Service that takes an HTTP POST (Save) from the edit form and saves it into the collection. It also assigns an ID to the document and increments the ID for the next save.
- update.xq: Updates an existing document from an edit POST.
- delete-confirm.xq: Asks the user to confirm that a delete should be performed.
- delete.xq: Deletes a document.
- metrics.xq: Counts the number of items in a collection.
- views collection: Files that define additional HTML views of the data, including various reports.
- scripts collection: XQuery scripts that can be run to import or clean up data sets, or to perform other functions.

[Sample Template Files]

3. File Structure
The eXist folder/file structure is very repetitive, which makes files easier to locate. The pilot project folder structure is shown below as it appears in oXygen's Database Perspective. This clip shows the applications (apps) associated with the Minnesota Historical Society (mhs). The folder structure is as follows:
- localhost (eXist)
  o db (eXist database)
    - cust (customers)
      o mhs (Minnesota Historical Society)
        - apps (the applications created for MHS)
          o the list of applications
Each application folder contains the same set of folders used to organize the information.
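Because every application shares this layout, the structure can also be checked from XQuery. The sketch below is an illustration, not part of the pilot project's code. It assumes the application root /db/cust/mhs/apps from the folder tree above and treats data, images, search and views as the expected folders. It uses eXist's built-in xmldb:get-child-collections and xmldb:get-child-resources functions. local:missing-folders is a hypothetical helper.

```xquery
xquery version "1.0";

declare option exist:serialize "method=xml indent=yes";

(: Hypothetical helper: returns the expected folder names that are absent
   from $children, in the order they appear in $expected. :)
declare function local:missing-folders($children as xs:string*) as xs:string* {
    let $expected := ('data', 'images', 'search', 'views')
    return $expected[not(. = $children)]
};

(: Assumed application root, taken from the folder tree above. :)
let $apps-root := '/db/cust/mhs/apps'
return
    <apps>{
        for $app in xmldb:get-child-collections($apps-root)
        let $path := concat($apps-root, '/', $app)
        let $children := xmldb:get-child-collections($path)
        order by $app
        return
            (: One element per application: missing folders and top-level files :)
            <app name="{$app}"
                 missing="{string-join(local:missing-folders($children), ' ')}"
                 resources="{string-join(xmldb:get-child-resources($path), ' ')}"/>
    }</apps>
```

You can run the script in the eXist sandbox, or store it as an .xq file and open it through its REST address. Either way it returns one app element per child collection of apps. A non-empty missing attribute marks an application that departs from the shared layout.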
In the oXygen view, the applications 'ca-bills', 'il-bills', and 'mn-bills' are all expanded and show that each of them has the following:
- A data folder (data)
- An image folder (images)
- A search folder (search)
- A view folder (views)
- A file for application information (app-info.xml)
- A file for the application's home page (index.xq)
This view expands the folders found under each collection. The data folder holds the data (in this case, one file); the images folder holds the images; the search folder holds the search forms required for the application; and the views folder holds the files and forms required for the various views.

By default, the folder structure also determines the web address used in the browser to view the database. The following addresses show the paths to selected individual files:
- Application Main Page:
  o http://localhost:8080/exist/rest/db/cust/mhs/apps/index.xq
- Glossary Main Page:
  o http://localhost:8080/exist/rest/db/cust/mhs/apps/glossary/index.xq
- California Bills Search Page:
  o http://localhost:8080/exist/rest/db/cust/mhs/apps/ca-bills/search/search-form-html.xq
- Minnesota Bills List View:
  o http://localhost:8080/exist/rest/db/cust/mhs/apps/mn-bills/views/list-items.xq
- Checksum Data File:
  o http://localhost:8080/exist/rest/db/cust/mhs/apps/check-sum/data/4.xml

4. Creating New Collections
The following describes the general steps for creating a new collection within the pilot project system. After these steps are taken, you should be able to view the new application's home page, view a list of items, view individual records, and search the collection.
1. A set of template files was copied, renamed, and placed into the system structure (localhost/db/cust/mhs/apps/[collection name]). The template files included the files contained in the data, images, search, and views folders, as well as the main application files described below.
   a. index.xq: the 'home page' for the collection. It shows the icon, the application title, and a list of what you can do with the collection (generally list items, search items, and show a count of items).
   b. app-info.xml: provides the icon and application title shown on the application home page, and gives background information on who created the application and with which versions.
2. Icons and other necessary collection images were uploaded into the 'images' folder for the collection.
3. Data was uploaded into the 'data' folder for the collection.
   a. The XML tags found within the data determine what is displayed when viewing a record and which tags are searched.
4. The following template files were modified so that the items in the collection can be viewed:
   a. views/list-items.xq: view the list of bills in the collection
   b. views/view-bill.xq: view an individual bill
      i. Note that the individual bill views include a transformation of XML strikeouts into HTML markup elements. This process uses XQuery 'typeswitch' transformations that are based on template rules.
5. The following template files were modified to search an individual collection:
   a. search/search-form-html.xq: provides a simple keyword search box for a collection
   b. search/search.xq: query instructions for the keyword search, displaying a list of results

Note: Because XML tag names differ between the state collections, a separate transformation had to be written for each of the document viewers. For the applications Syntactica developed, each transformation was approximately two pages of XQuery that used a typeswitch expression to translate each tag into the appropriate XHTML tag. If standard tags were used across collections, this step would not be necessary.

Note: Some form operations are much easier on XML data sets than on a relational database. For example, if a field has a one-to-many relationship, it can be added to any form without making any changes to the database or to the index configurations. Only a few lines of code were needed to add multiple categories to one item in selected collections.

5. Search Functionality
To allow users to search, a simple HTML search form is used (file = search-form-html.xq). The form displays a web page that contains a keyword search box and a search button. The entire form for searching Illinois bills is shown below:

[File: search-form-html.xq for Illinois bills.]

xquery version "1.0";
(: Shared CSS, header, breadcrumb, and footer come from the MHS style module :)
import module namespace style='http://www.mnhs.org/style' at '/db/cust/mhs/modules/style.xqm';
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

let $title := 'Search Illinois Bills'
return
<html>
    <head>
        <title>{$title}</title>
        {style:import-css()}
    </head>
    <body>
        {style:header()}
        {style:breadcrumb()}
        <h2>{$title}</h2>
        <div id="searchform">
            (: GET submits the text box to search.xq as ?q=... :)
            <form method="GET" action="search.xq">
                <p>
                    <strong>Search:</strong>
                    <input name="q" type="text" value="" size="60"/>
                    <input type="submit" value="Search"/>
                </p>
            </form>
        </div>
        {style:footer()}
    </body>
</html>

6. Search
The keywords entered into the search box are passed to a REST search interface that performs the search. Using reverse keyword indexes created by the Lucene full-text indexing library, the search.xq file tells the REST service which documents match and how to display the search results.
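The text does not show how search.xq receives the keywords. In eXist, a stored query can read the q parameter that the GET form above submits by calling request:get-parameter. The sketch below assumes that approach. local:keywords is a hypothetical helper that tidies the raw value. The results markup is only a placeholder for the FLWOR search described next.

```xquery
xquery version "1.0";

declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

(: Hypothetical helper: keep the first submitted value ('' if there is none)
   and collapse runs of whitespace, so "  health   care " becomes "health care". :)
declare function local:keywords($raw as xs:string*) as xs:string {
    normalize-space(($raw, '')[1])
};

(: The form submits its text box as ?q=...; '' is the default
   when the parameter is missing from the URL. :)
let $keywords := local:keywords(request:get-parameter('q', ''))
return
    <html>
        <head><title>Search Results</title></head>
        <body>{
            if ($keywords eq '')
            then <p>Please enter one or more keywords.</p>
            else <p>Results for: <strong>{$keywords}</strong></p>
        }</body>
    </html>
```

Guarding the empty case means a blank submission returns a prompt instead of being passed to the full-text search as an empty query.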
In general, the code for this service is a basic XQuery FLWOR [7] expression of the form:

for $hit in collection($my-collection)[ft:query(., $keywords)]
order by ft:score($hit) descending  (: highest scores first :)
return local:view-summary($hit)

where:
- $my-collection is the name of the collection to be searched
- $keywords is the string of keywords the user is searching for
- local:view-summary($hit) is a local XQuery function that displays a summary of each matching document

This service is invoked simply by adding the query parameter to the URL, as in the following:
SERVER/apps/mn-bills/search/search.xq?q=healthcare

In addition, all of the search applications use two simple functions from the Lucene full-text (ft) module:
- ft:query($context, $keywords): returns the nodes in $context that match the keywords; used inside a predicate, as above, it works as a true/false test of whether a document contains them
- ft:score($hit): a relative numeric score that allows the hits to be sorted from highest score to lowest score

A small XQuery transformation was used to transform each XML document into a full HTML document for viewing in a web browser.
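The bill viewers described in section 4 rely on typeswitch-based transformations of this kind. The sketch below shows the general technique rather than the project's actual code. The source element names (bill, title, section, para, strike, insert) are hypothetical, because each state's collection uses different tags. local:transform is an illustrative function name. Strikeouts map to the XHTML del element and insertions map to ins.

```xquery
xquery version "1.0";

declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

(: Recursive typeswitch transformation: each case acts like a template rule,
   mapping one (hypothetical) source tag to an XHTML tag and then
   processing the children of the matched element. :)
declare function local:transform($nodes as node()*) as node()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case document-node() return local:transform($node/node())
            case element(bill) return <div class="bill">{local:transform($node/node())}</div>
            case element(title) return <h2>{local:transform($node/node())}</h2>
            case element(section) return <div class="section">{local:transform($node/node())}</div>
            case element(para) return <p>{local:transform($node/node())}</p>
            case element(strike) return <del>{local:transform($node/node())}</del>
            case element(insert) return <ins>{local:transform($node/node())}</ins>
            (: Unrecognized elements: drop the tag but keep its content :)
            case element() return local:transform($node/node())
            (: Comments and processing instructions are discarded :)
            default return ()
};

(: Inline sample so the sketch runs without a stored document; in an
   application this would be a document from the collection's data folder. :)
let $bill :=
    <bill>
        <title>Sample Bill</title>
        <section>
            <para>The fee is <strike>ten</strike><insert>twenty</insert> dollars.</para>
        </section>
    </bill>
return
    <html>
        <head><title>{string($bill/title)}</title></head>
        <body>{local:transform($bill)}</body>
    </html>
```

Each state's viewer needs its own case clauses because the tag names differ. The recursive structure and the catch-all element() case stay the same.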