IST 516: Web and Information Retrieval, Fall 2013 / Dongwon Lee

Lab #1: XML Schema (TOTAL: 40 Points) (DUE: Sep. 22 SUN 11:55PM)

NOTE: This is an individual lab.

Task 1. Consider the following two XML files:

- http://pike.psu.edu/classes/ist516/latest/labs/xschema/letter.xml
- http://pike.psu.edu/classes/ist516/latest/labs/xschema/letter2.xml

letter.xml (text content): Dongwon Lee; Sylvie Example; Can you infer a DTD?

letter2.xml (text content): Dongwon Lee; Sylvie S. Example; I have a question. Can you infer a DTD?

Download both XML files to your local web folder (e.g., PASS). Then, using an XML editor, write a reasonably tight schema in DTD (letter.dtd) such that both letter.xml and letter2.xml are "valid" according to letter.dtd. You may have to modify the location information in the XML files (i.e., the system identifier in the DOCTYPE declaration).
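For a DTD, the location information lives in the system identifier of the document type declaration. The following is a minimal sketch of how one might inspect and repoint it with Python's standard library. The sample document, its root element name `letter`, and the URL `http://example.org/old/letter.dtd` are assumptions made for illustration, not the contents of the lab's files.

```python
from xml.dom import minidom

# Hypothetical sample; the real letter.xml may use different element names.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE letter SYSTEM "http://example.org/old/letter.dtd">
<letter><body>Can you infer a DTD?</body></letter>
"""

def doctype_info(xml_text):
    """Return (root name, system identifier) from the DOCTYPE, or None."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return doctype.name, doctype.systemId

def point_to_dtd(xml_text, new_location):
    """Replace the DOCTYPE system identifier with new_location."""
    info = doctype_info(xml_text)
    if info is None:
        raise ValueError("document has no DOCTYPE declaration")
    _, old_location = info
    # Replace only the first quoted occurrence, which is the DOCTYPE itself.
    return xml_text.replace(f'"{old_location}"', f'"{new_location}"', 1)

updated = point_to_dtd(SAMPLE, "letter.dtd")
print(doctype_info(SAMPLE))
print(doctype_info(updated))
```

Note that minidom only reads the declaration here; it does not fetch the DTD or validate against it, so a validating tool is still needed to confirm validity.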

A tight schema accepts what the given XML files contain, but not much more than that. For instance, suppose one needs to write a tight schema that accepts the following two XML snippets:

Then, either or would be a correct and tight schema (i.e., content model). However,

The Pennsylvania State University / College of Information Sciences and Technology

bar3?)> would still be able to accept the above two XML snippets, but also many other XML snippets like:

Therefore, would be a bit “too loose” for the two given snippets, and not the best schema.
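The idea of tightness can be sketched by treating a content model as a regular expression over the sequence of child element names. Everything in the sketch below is hypothetical: the element names `bar1`, `bar2`, `bar3`, the two sample child sequences, and both content models are assumptions chosen for illustration, and Python's `re` module only stands in for a real DTD validator.

```python
import re

def to_regex(content_model):
    """Translate a simple sequence model such as "bar1, bar2, bar3?"
    into a regex over space-terminated child element names."""
    parts = []
    for item in content_model.split(","):
        item = item.strip()
        if item[-1] in "?*+":
            name, occurrence = item[:-1], item[-1]
        else:
            name, occurrence = item, ""
        parts.append(f"(?:{re.escape(name)} ){occurrence}")
    return re.compile("".join(parts))

def accepts(content_model, children):
    """True if the list of child element names matches the content model."""
    sequence = "".join(child + " " for child in children)
    return to_regex(content_model).fullmatch(sequence) is not None

# Hypothetical child sequences that the schema must accept.
samples = [["bar1", "bar2"], ["bar1", "bar2", "bar3"]]

tight = "bar1, bar2, bar3?"    # accepts the samples and little else
loose = "bar1*, bar2*, bar3?"  # accepts the samples and much more

# A sequence that neither sample resembles.
unwanted = ["bar1", "bar1", "bar1", "bar2", "bar2"]

for model in (tight, loose):
    print(model,
          all(accepts(model, s) for s in samples),
          accepts(model, unwanted))
```

Both models accept the two samples, but only the loose one also accepts the unwanted sequence, which is what makes it a weaker answer under this lab's rubric.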

Task 2. Consider the following XML file:

- http://pike.psu.edu/classes/ist516/latest/labs/xschema/letter3.xml

letter3.xml

Dongwon Lee Sylvie S. Example I have a question. Can you infer an XML Schema?

Again, download letter3.xml to your local web folder (e.g., PASS). Then, using an XML editor, write a reasonably tight schema in XML Schema (letter.xsd) such that letter3.xml is "valid" according to letter.xsd. You may have to modify the namespace or location information in the XML file (i.e., xsi:schemaLocation="http://some-path-in-your-web-folder letter.xsd").
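The value of xsi:schemaLocation is a whitespace-separated list of pairs, each a namespace URI followed by the location of a schema for that namespace. The sketch below shows one way to read those pairs with Python's standard library and compare them with the namespace of the root element. The sample document, the root element name `letter`, and the namespace `http://example.org/letter` are assumptions for illustration, not the contents of letter3.xml.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical sample; the real letter3.xml may differ.
SAMPLE = """<letter xmlns="http://example.org/letter"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.org/letter letter.xsd">
  <body>Can you infer an XML Schema?</body>
</letter>"""

def schema_locations(xml_text):
    """Return the (namespace, location) pairs from xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return list(zip(tokens[0::2], tokens[1::2]))

def root_namespace(xml_text):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

pairs = schema_locations(SAMPLE)
print(pairs)
print(root_namespace(SAMPLE) in {namespace for namespace, _ in pairs})
```

For validation to succeed, the namespace in the pair also has to match the targetNamespace declared in letter.xsd; this sketch checks only the instance document.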

Task 3. Using any one of the XML schema validators available on the Web or as part of a software package (e.g., http://www.w3.org/2001/03/webdata/xsv, http://schneegans.de/sv/, XML Pad), verify one more time that the XML files are "valid" according to your letter.dtd or letter.xsd.

Grading Rubric (Total: 40 Points). Note that a very loose schema that accepts not only letter.xml, letter2.xml, and letter3.xml but also any other XML file is NOT a good answer. Therefore, design the content models of your schema, using regular expressions (RE), to be as tight as possible. As long as your schema is reasonably tight, you will get full credit.

1. 20 Points: Passing the validity check using your letter.dtd
2. 15 Points: Passing the validity check using your letter.xsd
3. 5 Points: Schemas are reasonably tight

Turn-In: By the due date, upload the following information to ANGEL:

1. URL of your public web folder where you upload both letter.dtd and letter.xsd files. The TA will visit each folder to check the correctness of both schema files. Make sure all files are accessible by the TA.
2. Screenshot of the result message by W3C's schema validator, showing that the XML files are VALID.