AO HUPO workshop in Tsukuba (Dec. 19, 2002) An XML Format For Proteomics Database to Accelerate Collaboration Among Researchers
HUP-ML: Human Proteome Markup Language
K. Kamijo, H. Mizuguchi, A. Kenmochi, M. Sato, Y. Takaki and A. Tsugita Proteomics Res. Center, NEC Corp.
1
Overview (HUP-ML) z Introduction z XML features z XMLs in life science z HUP-ML concept z Example of HUP-ML z Details of HUP-ML z Demonstration of “HUP-ML Editor” 2 Introduction z Download entries from public DBs as a flat-file z easy for a person to read z different formats for every DB z sometimes needs special access methods and special applications for each format z Needs machine-readable formats for software tools z To boost studies by exchanging data among researchers Activates standardization
3
XML format
z XML (eXtensible Markup Language) W3C: World Wide Web Consortium (inception in 1996) Attribute Key Value Example Element Start Tag
z Highly readable for machine and person z Can represent information hierarchy and relationships z Details can be added right away z Convenient for exchanging data z Easy to translate to other formats 4 z Logical-check by a Document Type Definition (DTD) Overview (HUP-ML) z Introduction z XML features z XMLs in life science z HUP-ML concept z Example of HUP-ML z Details of HUP-ML z Demonstration of “HUP-ML Editor” 5
XML features z Exchangeable z Inter-operability: z OS (Operating System): Windows, Linux, etc. z Software Development Languages z Communication Protocols z Inter-national / Inter-business z W3C has supervised the XML specifications z XML is well prevailed internationally z XML supports multilingual character code sets W3C: World Wide Web Consortium (inception in 1996)
6 XML features z Extensible z W3C supervised ONLY the language specifications z Everyone could create a NEW language structure based on the specifications z Hierarchy structure z A change in a node does not affect data in other nodes z Easy to add new information z when they are available
7
XML features
z Retrievable with a computer z Easy to find target information in an XML document z By using ‘Tag’ element structure z It is difficult to retrieve them in a flat text file and a binary file. z Possible to find more specific information with ‘Attribute’ keywords step by step
XML document
9
XML features z A variety of XML tools are available for USERS z Standardization Tools z Retrieval Protocols: XPath z Data Transformation: XSLT (Extensible Stylesheet Language Transformations) z Easy to handle XML document by using such tools
10 XML features z A variety of related tools z XML database management system z XML formatter, XML Editor z Communication Protocols (ex. SOAP) z Major applications and tools tend to support XML document handling z RDB: ORACLE, SQL Server .... z Web browser: Microsoft Internet Explorer ... z Applications: Microsoft Office ...
11
XML features z Easy to combine XML applications with other functions z Data processing tools z Embedding link information (XLink, XPointer) z Electric authentication, Encryption z Some tools automatically bring their functions to applications
XML Secure simple link Encryption System XPointer XPath extended link
integrity message authentication XSLT XML Signature signer authentication XML Link 12 XML features z Easy to combine with other XML format documents z By using ‘NameSpaces’ technique, it is possible to combine them, integrate them and divide into original XML formats z Example: Combination of HTML and the following XML z Figures (SVG: Scalable Vector Graphics,Web3D etc.) z Formula (MathML: Mathematical Markup Language) z Chemical formula (CML: Chemical Markup Language)
xml:lang="en" dir="ltr"> :