Using XML and PDF Together
why you don’t necessarily have to choose

Leonard Rosenthol
Director of Software Development
Appligent, Inc.
Copyright©1999-2001, Appligent, Inc.

Overview
· Introduction to XML
· Quick review of PDF
· XML & PDF meta-data
· XML & PDF forms
· XML & PDF structure
· XML & PDF content
· XML & PDF creation

You are here because...
· You’re interested in XML, XHTML & SVG and all the hype surrounding them
· You’re currently working with PDF and XML, but not together
· You were already awake, lunch isn’t till noon, and you had to find something to kill time
· You’re a friend of mine and wanted to heckle

How I do things
· You already have a draft version of this document in your proceedings, so you shouldn’t need to take too many notes
· A copy of the final document is on Appligent's website (<http://www.appligent.com>) for your downloading pleasure
· Although I’ve left time at the end for Q & A, I’m more than happy to take questions at any time.

XML
eXtensible Markup Language
· XML is really a specification that allows specific markup languages to be created for specific purposes, all within the same compatible syntax (but not the same tag set!)
· Although the name suggests otherwise, XML itself, unlike HTML, has no tags to learn.
· It provides a standard organization and set of rules for building any markup language
  · separation of data vs. presentation
  · hierarchical structure
  · human readable

XML History
How we got here
· SGML
  · Standard Generalized Markup Language
  · Developed in 1974 by Charles F. Goldfarb & others as a means to create a single basis for any type of markup language
  · ISO Standard as of 1986
· HTML
  · HyperText Markup Language
  · Developed by Tim Berners-Lee as part of a research project to enable sharing of data over the Internet.
  · Presentation, NOT data, oriented!
  · HTML has a fixed set of tags.
  · HTML 3.2 became a W3C Recommendation in January 1997

XML Specification
· The XML 1.0 specification can be found at <http://www.w3.org/TR/1998/REC-xml-19980210.html>.
· An even better version, the “Annotated XML Specification” by Tim Bray, can be found at <http://www.xml.com/axml/testaxml.htm>

Design Principles of XML
· XML should be straightforwardly usable over the Internet
· XML shall support a wide variety of applications
· XML shall be compatible with SGML
· It shall be easy to write programs that process XML documents
· The number of optional elements in XML is to be kept to an absolute minimum, ideally zero

Design Principles of XML (cont.)
· XML documents should be human readable and reasonably clear
· The XML design should be prepared quickly
· The design of XML shall be formal and concise
· XML documents shall be easy to create
· Terseness in XML markup is of minimal importance

HTML vs. XML
· No fixed tag set
· ALL tags have both a start and end
  · Can’t just do <P>; have to do <P></P>
  · Empty elements use the short form: <IMG xxx />
· Tags must be perfectly nested
  · <B><I>foo</B></I> - NOT!
· Capitalization of tags is significant
  · <BOLD> text </bold> - NOT!
· Whitespace is significant
  · Spaces, tabs, etc. are maintained!

XML Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT SYSTEM "customerDB.dtd">
<!-- Customer DataBase for some unnamed company -->
<DOCUMENT>
  <CUSTOMER>
    <NAME>
      <LASTNAME>Edwards</LASTNAME>
      <FIRSTNAME>Britta</FIRSTNAME>
    </NAME>
    <DATE>April 17, 1998</DATE>
    <ORDERS>
      <ITEM>
        <PRODUCT>Cucumber</PRODUCT>
        <NUMBER>5</NUMBER>
        <PRICE>1.25</PRICE>
      </ITEM>
      <ITEM>
        <PRODUCT>Lettuce</PRODUCT>
        <NUMBER>2</NUMBER>
        <PRICE>.98</PRICE>
      </ITEM>
    </ORDERS>
  </CUSTOMER>
</DOCUMENT>
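
The customer example and the HTML vs. XML rules can be tried out with any XML parser. What follows is a minimal sketch using Python's standard xml.etree.ElementTree module. It is an illustration and not part of the original talk. The helper names order_total and is_well_formed are hypothetical. The sample is written with a lowercase "xml" declaration, straight quotes, and a DOCTYPE name matching the root element, so that a conforming parser accepts it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The customer example from the slide, stored as bytes so the parser
# honours the encoding named in the XML declaration.
CUSTOMER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT SYSTEM "customerDB.dtd">
<!-- Customer DataBase for some unnamed company -->
<DOCUMENT>
  <CUSTOMER>
    <NAME>
      <LASTNAME>Edwards</LASTNAME>
      <FIRSTNAME>Britta</FIRSTNAME>
    </NAME>
    <DATE>April 17, 1998</DATE>
    <ORDERS>
      <ITEM>
        <PRODUCT>Cucumber</PRODUCT>
        <NUMBER>5</NUMBER>
        <PRICE>1.25</PRICE>
      </ITEM>
      <ITEM>
        <PRODUCT>Lettuce</PRODUCT>
        <NUMBER>2</NUMBER>
        <PRICE>.98</PRICE>
      </ITEM>
    </ORDERS>
  </CUSTOMER>
</DOCUMENT>
"""


def order_total(xml_bytes):
    """Sum NUMBER * PRICE over every ITEM element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    total = Decimal("0")
    for item in root.iter("ITEM"):
        quantity = int(item.findtext("NUMBER"))
        price = Decimal(item.findtext("PRICE"))
        total += quantity * price
    return total


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("order total:", order_total(CUSTOMER_XML))
    # The "NOT!" cases from the HTML vs. XML slide, plus two legal forms.
    for fragment in ("<P>", "<P></P>", "<IMG src='x.gif' />",
                     "<B><I>foo</B></I>", "<BOLD> text </bold>"):
        print(fragment, "->", is_well_formed(fragment))
```

The parser here is non-validating: it checks that the document is well formed but never reads customerDB.dtd. The tags mean nothing to the parser itself, so the program has to know that NUMBER and PRICE belong together. That is the separation of data from presentation described earlier.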

XML Example 2
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minneapolis Airline Schedule - January 3rd, 1998 -->
<SCHEDULE>
  <AIRLINE>NorthWest</AIRLINE>
  <FLIGHT>
    <NUMBER>449</NUMBER>
    <STATUS>Cancelled</STATUS>
  </FLIGHT>
  <FLIGHT>
    <NUMBER>640</NUMBER>
    <STATUS depart="0100">Delayed</STATUS>
  </FLIGHT>
  <AIRLINE>TWA</AIRLINE>
  <FLIGHT>
    <NUMBER>1010</NUMBER>
    <STATUS gate="17 Gold">On Time</STATUS>
  </FLIGHT>
</SCHEDULE>

PDF
“the reliable digital master”
· final form presentation
· container for associated materials
· multimedia
· interactivity
· security (encryption)
· authenticity (digital signatures)

Vive la Différence
· Content vs. Presentation
  · Although new XML grammars are appearing that start to move into PDF’s “space” (SVG, SMIL, XML-Sigs), XML’s true strength is in structure and data exchange.
· All in one
  · XML provides wonderful tools (XPath, XLink, XPointer) for linking content around the Web, but has no provision for “bundling it all together into a single package”
· Size does matter
  · PDF files will always be smaller than XML, given their ability to incorporate binary data and selective compression

Let’s work together
· XML & PDF forms
· XML & PDF meta-data
· XML & PDF structure & content
· XML & PDF creation

Forms
· FDF: Forms Data Format
· More details on FDF
· Sample FDF
· That’s just like XML
· Good-bye fair FDF, we knew you well
· XForms
· XForms Requirements
· Sample XForms
· XML form filling

FDF: Forms Data Format
· Documented in Appendix H of the PDF 1.3 Specification
· FDF is used when submitting Form data to a server, receiving the response, and incorporating it into the Form. It can also be used to generate (i.e.
“export”) stand-alone files containing Form data that can be stored, transmitted electronically (e.g., via Email), and imported back into the corresponding Form.
· FDF can also be used to control more of the document structure. That is, constructs within FDF allow it to control which Acrobat Forms are used in the creation of a new PDF document. This functionality can be used to create complex documents dynamically.
· FDF is also used to define a container for annotations that are separate from the PDF document to which the annotations apply.

More details on FDF
· FDF is based on PDF, and uses the same syntax and set of basic object types as PDF.
· FDF also has the same file structure as PDF, except that the cross-reference table is optional.
· The document structure is much simpler than PDF, since the body of an FDF document consists of only one required object.
· Objects in FDF can only be of generation 0; no two objects can have the same object number, and FDF files cannot have updates appended to them.
· The value of the Length attribute in the dictionary of any stream object appearing inside an FDF document must be a direct object.

Sample FDF
%FDF-1.2
1 0 obj
<< /FDF << /Fields 2 0 R >> >>
endobj
2 0 obj
[
<< /T (name) /V (Virginia Gavin) >>
<< /T (birth) /V (9/16/59) >>
<< /T (sex) /V (F) >>
<< /T (address) /V (215 E Providence Rd. Aldan, PA 19018 610-284-4006) >>
<< /T (ssnum) /V (555-222-1512) >>
<< /T (employer) /V (Digital Applications, Inc.) >>
]
endobj
trailer
<< /Root 1 0 R >>
%%EOF

That’s just like XML
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://www.adobe.com/std/schema/xfdf" xml:space="preserve">
  <fields>
    <field name="name">
      <value>Virginia Gavin</value>
    </field>
    <field name="birth">
      <value>9/16/59</value>
    </field>
    <field name="sex">
      <value>F</value>
    </field>
    <field name="address">
      <value>215 E Providence Rd.
Aldan, PA 19018 610-284-4006</value>
    </field>
    <field name="ssnum">
      <value>555-222-1512</value>
    </field>
    <field name="employer">
      <value>Digital Applications, Inc.</value>
    </field>
  </fields>
</xfdf>

Good-bye fair FDF, we knew you well
· Adobe has made it clear that the future of data exchange for their products will be an XML-based syntax
· Therefore it’s probably a good bet that future versions of Acrobat will use an XML-based grammar (syntax) rather than FDF to represent form data
· A number of limitations of FDF and Acrobat forms in general can be removed by moving to XML; not the least of the gains is the ability to have both form field names AND values in Unicode!

XForms
the future of Web Forms
· The replacement for the current HTML form technology.
· Based on XML and separates data, logic and presentation
· Currently a W3C recommendation as part of the XHTML working group.

XForms Requirements
· Defined in XML, usable in any XML grammar
· Migration from HTML 4
· Ease of Authoring
· Separate Purpose from Presentation
· Integrate with DOM
· Device and Application Independence
· Unicode, Internationalization, and Region Independence
· Modular Construction

Sample XForms
<XFA>
  <Subform Name="order">
    <Proto>
      <Format ID="telno">
        <Picture>999-999-9999</Picture>
      </Format>
    </Proto>
    <Subform Name="customer">
      <Field Name="name">
        <Caption>
          <Value>
            <Text>Your name:</Text>
          </Value>
        </Caption>
      </Field>
      <Field Name="street" W="40ch">
        <Caption>
          <Value>
            <Text>Street:</Text>
          </Value>
        </Caption>
      </Field>
      <Field Name="phone" W="40ch">
        <Caption>
          <Value>
            <Text>Phone:</Text>
          </Value>
        </Caption>
        <Format Use="#telno"/>
      </Field>
    </Subform>
    <!-- end customer subform -->
  </Subform>
</XFA>

Meta-data
· What is Meta-data?
· PDF’s “Info Dictionary”
· XML in the Info Dictionary

What is Meta-data?
· information contained in a document, separate from its content, that describes the document itself
  · document history & information
  · description, keywords
  · digital rights information
  · and just about anything else you can think of!

PDF’s “Info Dictionary”
· a set of defined entries
  · Author, Title, Keywords, CreationDate, etc.