Lightweight Formats for Product Model Data Exchange and Preservation

Alexander Ball (1), Lian Ding (2), Manjula Patel (1)

(1) UKOLN, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
EMail: [email protected]
EMail: [email protected]

(2) IdMRC, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
EMail: [email protected]

ABSTRACT

The designs for engineered products are increasingly defined not by technical drawings but by three-dimensional Computer Aided Design (CAD) models. With rapid turnover of computer hardware and CAD software, these models are in danger of becoming unreadable long before their usefulness has ended. One possible approach is to migrate the models into lightweight formats that are easier to preserve and from which it will be easier to recover information in the future. Such formats also have benefits for design collaboration and dissemination of product model information. Selecting the right lightweight format to use remains a problem, but considering matters of model fidelity, metadata support, security features, file size, software support, and openness, the difference between the formats is not as significant as their common advantages over full-featured, complex models.

Keywords: Product model data, lightweight representations, digital curation

INTRODUCTION

Since the turn of the millennium, the engineering sector has been undergoing a paradigm shift in the way that products are designed and manufactured or constructed. Formerly, Computer Aided Design (CAD) tools were used simply to generate blueprints and other two-dimensional technical drawings, so that the official description of the product could be set down on paper. Increasingly, though, three-dimensional (3D) CAD models are being integrated into the engineering workflow, being used as the basis of finite element analysis, stereolithographic prototyping, numerical control part programmes and product inspections, for example. Thus the 3D CAD models are taking over as the official record of a product’s design.

Long-term users of engineering product data (including maintenance engineers, accident investigators and designers working on similar products) face a significant challenge due to the ephemeral nature of CAD file formats and the applications that work with them. One way of dealing with this is to migrate the CAD information as soon as possible into lightweight formats that are easier to preserve and from which it will be easier to recover information in the future.

This approach may also have immediate benefits for collaboration and the dissemination of product model information. The complexity of full-featured CAD formats means that the file sizes of the models can be too large for comfortable transmission over the Internet, making distributed design work much harder. Lightweight representations, by contrast, can have much smaller file sizes. Furthermore, lightweight formats often have free viewers, enabling models to be disseminated, accessed and re-used (for example, in marketing documents) much more widely. Probably because of this, lightweight formats typically also have some intellectual property (IP) protection mechanism, whether by approximating the original model or restricting access to exact model data, allowing models to be shared with partner organizations without risking IP assets.

The possibilities afforded by lightweight representations are not lost on industrial technologists and software vendors, and in recent times a number of new lightweight formats have been developed, each competing for acceptance as a common exchange or dissemination format. These formats are not all equivalent in the information they communicate, and so the question is: which formats are most suitable for product model data exchange and preservation?

DESIDERATA FOR A LIGHTWEIGHT REPRESENTATION

There are a number of possible uses for lightweight representations of CAD models: communicating design information within an organization, communicating design information with partner organizations, promoting designs to customers, generating maintenance instructions, and preserving design information for future reference and re-use. Each of these uses puts a different set of demands on the format chosen to encode the representation. The most pertinent aspects of the formats to consider are: model fidelity, metadata support, security features, file size, software support, and openness.

Model fidelity

Modern CAD software uses a combination of different techniques for representing models [6]: boundary representations (B-Rep) that represent shapes using connecting faces, edges and vertices [2]; non-uniform rational B-splines (NURBS) or Bézier surfaces, or similar mathematical surface descriptions; and ‘features’, that is, generic parts (with known engineering significance) that can be adapted to fit a particular need through the specification of certain parameters, such as physical dimensions or the number of holes in the part.

In scenarios involving design re-use, the ideal would be a lightweight format that could handle (a version of) all these different techniques, allowing a model to be converted into and back out of that format without changing the geometry or losing any of the engineering significance. In contrast, in use cases involving dissemination outside the organization, such fidelity would be dangerous. In such cases, it is more advantageous for the lightweight format to encode geometry in an approximate way, using polygon meshes or simplified surfaces for example.

Metadata support

Most CAD software has the capability of recording more than just the geometry of a model: materials, finishes, tolerances, recommended machining techniques and so on. In use cases where such non-geometric information is important, it would be useful if a lightweight representation had a way either of embedding this information directly, or of providing links to allow the information to be stored in a separate file yet still related to the model.

In certain intra-organizational cases such as in-service maintenance, it may be useful for engineers to be able to mark up a copy of the lightweight representation with annotations, and use this as a way of feeding back field experience into the design. This is something a lightweight format may be more or less amenable to, but it is more obviously a software issue than a format issue.

From a preservation perspective it would be useful to record provenance information as a means of authenticating the model and checking its lineage.

Security features

There are two main approaches to protecting the design data within a lightweight representation. The more conservative approach is to encrypt the model data and either remove direct access to it entirely or introduce some password mechanism to defend it. The more destructive approach is to withhold some or all of the exact design data from the representation, perhaps by using tessellated polygons instead of exact geometry, by scaling the dimensions to use an arbitrary, unknown unit of measurement, or by removing any detail not required for the purpose in hand.

In cases where the exact detail of the full design is not needed (customer review, promotional materials, reference components for routine maintenance), the destructive approach is an unproblematic method of securing the IP of the design. A password-protected, exact representation would seem a sensible approach in cases where the design needs to be passed to a trusted third party, such as a regulatory body or a repository used for escrow-type deposits. The most significant drawback of the conservative approach is that it is only as secure as the encryption algorithm (or the password) used, but pragmatically the model data only needs to remain secure as long as the design data remains relevant to current design activity.

It should be noted that within design teams, the need for security is less, and security measures may hamper the ability of distributed design teams to collaborate.
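The destructive approach can be illustrated with a minimal Python sketch. It is not taken from any of the formats surveyed: the function names (`tessellate_circle`, `max_chord_error`, `rescale`), the 25 mm radius, the 12 segments and the scale factor 0.37 are all assumptions chosen for illustration. The sketch replaces an exact circle with a regular polygon, reports how far the polygon departs from the exact curve, and then re-expresses the coordinates in an arbitrary unit.

```python
import math

def tessellate_circle(radius, segments):
    """Approximate a circle of the given radius by a regular polygon."""
    return [
        (radius * math.cos(2 * math.pi * i / segments),
         radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]

def max_chord_error(radius, segments):
    """Largest gap between the exact circle and the polygon's edges."""
    return radius * (1 - math.cos(math.pi / segments))

def rescale(points, factor):
    """Re-express coordinates in an arbitrary unit via a withheld factor."""
    return [(x * factor, y * factor) for x, y in points]

exact_radius = 25.0  # millimetres in the hypothetical source model
polygon = tessellate_circle(exact_radius, 12)
error = max_chord_error(exact_radius, 12)
shared = rescale(polygon, 0.37)  # the recipient is not told the factor
```

With these assumed values the polygon's edges fall short of the true circle by a little under a millimetre at their midpoints, so the shared shape is recognizable but the exact geometry cannot be read back from it. Increasing the number of segments narrows the gap, which is the trade-off between fidelity and protection described above.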

File size

The primary advantage of representations with small file sizes is that they are inherently easier to move around computer networks and transfer over the Internet. This benefits distributed design teams in particular, as it enables them to transfer their designs between sites more quickly. Smaller file sizes also make it more practical to view the designs on smaller devices such as PDAs, which would be of particular use for maintenance engineers and inspectors.

From a curatorial perspective, smaller and simpler models are likely to be preserved more easily and more successfully than full-featured, complex models.

Software support

One of the benefits of using lightweight formats is that they allow people in the wider enterprise to view the design without the aid of expensive CAD packages. The availability of low-cost viewing and annotation software for a format makes that format more economical to implement across the enterprise. Similarly, the more software that is able to support a format, the lower the likelihood of interoperability failures and, in the short term at least, obsolescence of the format.

Openness

There is no commonly accepted definition of an open format, but at the very least it means that a complete specification of the format has been published [8], and may also imply that the specification may be read and implemented at zero or nominal cost, and/or that the format is democratically controlled by a group of representatives of interested parties [5, pp. 1-3]. The more open a format is, the easier (and cheaper) it is to write software to process it, meaning not only that this software is cheaper to adopt, but also that there is less chance of the format becoming obsolete: an important consideration for long-term archiving purposes.

A SURVEY OF LIGHTWEIGHT REPRESENTATIONS

Table 1: Summary of the characteristics of selected lightweight representations.

3D XML
  Model fidelity: Exact surfaces, polygon meshes
  Metadata support: None
  Security features: Data approximation
  File size: Reference-instance, instance modification, some compression
  Software support: Dassault Systèmes products, Lotus Notes, Microsoft Word/PowerPoint, Internet Explorer, free viewer
  Openness: Proprietary specification is cost-free to view

HSF
  Model fidelity: NURBS surfaces, polygon meshes
  Metadata support: Arbitrary user data, text objects
  Security features: Data approximation
  File size: Data compression, streaming
  Software support: Autodesk, Dassault Systèmes and PTC products
  Openness: Proprietary specification is cost-free to view and implement

JT
  Model fidelity: B-Rep, polygon meshes
  Metadata support: Arbitrary user data, PMI
  Security features: Data approximation
  File size: Reference-instance, data compression
  Software support: UGS products, Microsoft Word/Excel/PowerPoint, free viewer
  Openness: Proprietary specification is cost-free to view and implement, toolkit can be purchased

PLM XML
  Model fidelity: NURBS surfaces, 2D and 3D vector graphics, feature modelling
  Metadata support: Arbitrary user data, design or manufacturing notes, dimension information, surface finish information, mass and material information, text objects
  Security features: Data approximation, access restriction
  File size: Reference-instance
  Software support: UGS applications
  Openness: Proprietary schemata are free to view, implement and extend; toolkit can be purchased

U3D
  Model fidelity: NURBS surfaces, triangle meshes
  Metadata support: Arbitrary key/value data
  Security features: Data approximation
  File size: Reference-instance, some compression
  Software support: Adobe PDF software
  Openness: ECMA standard, cost-free to view

X3D
  Model fidelity: NURBS surfaces, polygon meshes, 2D and 3D vector graphics
  Metadata support: Arbitrary key/value data
  Security features: Data approximation
  File size: Reference-instance
  Software support: Various open source and proprietary viewers and processors, e.g. Xj3D, Flux, BS Contact
  Openness: ISO standard, cost-free to view, open source libraries

XGL/ZGL
  Model fidelity: Triangle meshes
  Metadata support: None
  Security features: Data approximation
  File size: Reference-instance, whole-file compression
  Software support: Autodesk, various minor CAD products
  Openness: Specification no longer maintained

3D XML

3D XML [3, 11] is an XML-based format for describing a model’s geometry, structure and visualization, and is optimized for interactivity and compactness. It can represent geometry using compact NURBS-like surface descriptions, XML polygon meshes and compact-syntax polygon meshes, but does not have any additional security features. File sizes are kept down by a reference-instance mechanism (allowing the same data to be re-used several times within a model), a modification mechanism (allowing an instance or reference object to build on the properties of another reference object) and raster graphic compression. Models may be expressed by a single file or split across several files. Native support for the format is largely restricted to Dassault Systèmes products, although free plugins are provided for Lotus Notes and Microsoft Word, PowerPoint and Internet Explorer, as well as a free standalone viewer. The format is owned and controlled by Dassault Systèmes; the specification for the format is available cost-free to those who register with the Dassault Systèmes website.
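The effect of a reference-instance mechanism on file size can be sketched in a few lines of Python. This is an illustration of the general idea only, not of the 3D XML syntax: the JSON layout, the key names (`references`, `instances`, `ref`) and the tiny `bolt_mesh` are hypothetical.

```python
import json

# One reference definition of a bolt (hypothetical, deliberately tiny mesh).
bolt_mesh = {"vertices": [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],
             "faces": [[0, 1, 2], [0, 2, 3]]}

positions = [[x * 10, 0, 0] for x in range(50)]

# Naive encoding: every bolt carries its own copy of the mesh.
naive = [{"mesh": bolt_mesh, "position": p} for p in positions]

# Reference-instance encoding: the mesh is stored once; instances point to it.
shared = {
    "references": {"bolt": bolt_mesh},
    "instances": [{"ref": "bolt", "position": p} for p in positions],
}

def expand(model):
    """Resolve every instance back to a full mesh-plus-position record."""
    return [{"mesh": model["references"][i["ref"]], "position": i["position"]}
            for i in model["instances"]]

naive_size = len(json.dumps(naive))
shared_size = len(json.dumps(shared))
```

Expanding the shared form reproduces the naive form exactly, so nothing is lost; the saving comes purely from storing the repeated geometry once. The more often a part recurs in an assembly, the larger the saving is likely to be.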

HOOPS Stream Format (HSF)

HOOPS Stream Format [7] is a binary format for encoding both 2D and 3D geometry using tessellating polygons and (since version 7) NURBS surfaces; it also supports arbitrary user data, text and, by means of an OpenHSF extension to the format, model structures. It does not have any in-built security features other than data approximation. The format permits streams within files to be zlib-compressed, and as the name suggests it can be streamed. It is supported by a number of CAD vendors including Autodesk, Dassault Systèmes and PTC. The format is owned by Tech Soft 3D, but the specification is freely accessible on the Web and the licence to implement it is free.

JT Format

JT Format [9] is a binary format for encoding product geometry using boundary representations and wireframes, and supports additional product manufacturing information and other metadata. It does not have any in-built security features other than approximating data using tessellating polygons. File sizes are kept down using a reference-instance mechanism, zlib compression of various data elements and datatype-specific compression using algorithms such as uniform data quantization, bitlength codec, Huffman codec, arithmetic codec, and Deering Normal codec. Models may be expressed by a single file or split across several files. Native support for the format is largely restricted to UGS products, although free plugins are available for Microsoft Word, Excel and PowerPoint, as well as a free standalone viewer. The format is owned by UGS, but the specification is freely accessible on the Web and blanket permission is given to implement it.
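Of the compression techniques listed above, uniform data quantization followed by a lossless zlib pass is the simplest to sketch. The Python below shows the general technique only and makes no claim to match the JT codecs: the helper names `quantize` and `dequantize`, the 16-bit width, the value range and the sample coordinates are all assumptions.

```python
import struct
import zlib

def quantize(values, lo, hi, bits):
    """Map floats in [lo, hi] onto integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return [round((v - lo) / (hi - lo) * levels) for v in values]

def dequantize(codes, lo, hi, bits):
    """Map integer codes back to approximate floats in [lo, hi]."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]

coords = [i * 0.125 for i in range(1000)]  # hypothetical coordinate values
lo, hi, bits = 0.0, 125.0, 16
codes = quantize(coords, lo, hi, bits)
restored = dequantize(codes, lo, hi, bits)

raw = struct.pack(f"<{len(coords)}d", *coords)   # 8 bytes per value
packed = struct.pack(f"<{len(codes)}H", *codes)  # 2 bytes per value
compressed = zlib.compress(packed)               # lossless second stage

step = (hi - lo) / ((1 << bits) - 1)
max_error = max(abs(a - b) for a, b in zip(coords, restored))
```

Quantization is the lossy step: each restored value differs from the original by at most half a quantization step, in exchange for storing two bytes per value instead of eight. The zlib stage is lossless, and how much further it shrinks the data depends on how repetitive the codes are.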

PLM XML

PLM XML [10] is a set of XML schemata for describing a model’s geometry, structure, features, ownership, and visualization. It is designed to be interoperable between a number of different tools from across the lifecycle of a product. The native schemata for representing geometry can support 2D and 3D vector graphics, NURBS surfaces and features, although non-native representations can also be used or referenced in a PLM XML document. It also allows for a single logical product model to have several different geometric representations, tailored to different purposes. Metadata of several different types (mass, material, texture, product manufacturing information, dimensions and tolerances, user markup, application-specific data) can be attached to logical parts of the model or specific geometric representations. File sizes can be reduced using a reference-instance mechanism and by splitting out various sections of data into separate files (so that data not needed for a particular purpose need not be transmitted). As well as approximating and subsetting data, PLM XML also supports mechanisms for restricting access to parts of the model data on the basis of person, organization or place. The format is used extensively by UGS products but is not widely supported otherwise. The format is owned and controlled by UGS; the XML schemata are freely accessible on the Web, but the software development kit must be purchased.

Universal 3D (U3D)

Universal 3D [4] is a binary format for encoding product geometry using sets of tessellating triangles and (from the 4th edition) NURBS surfaces. A mesh update mechanism allows meshes to be rendered progressively, providing basic streaming support. Metadata, stored as key/value pairs, may be attached to any node in the model tree. It does not have any in-built security features other than approximating the geometry. File sizes are kept down using a reference-instance mechanism and a bit compression algorithm on numeric data fields. The format is most notably supported as a native 3D model format within the Portable Document Format (PDF) specification from version 1.6 (corresponding to Adobe Acrobat 7), which adds some conservative security mechanisms [1]. It was developed by the 3D Industry Forum and is published and maintained as ECMA standard 363; the specification is freely available on the Web.

X3D

X3D [13–15] is an improved version of Virtual Reality Modeling Language (VRML); it is an XML format optimized for animation and interaction. It can represent 2D and 3D vector graphics, 3D tessellating polygon meshes, and NURBS surfaces as well as identifying bones and joints for human animation. Any node in the model tree may have metadata attached, in a format specifying a value (string or number), a metadata schema and a key. It does not have any in-built security features other than data approximation. X3D has a reference-instance mechanism and a relatively compact XML syntax, with coordinates expressed as space/comma delimited lists within attributes, rather than through a hierarchy of tags; a binary syntax is available that compresses field values according to Fast Infoset principles, using zlib compression, quantization of floating point number arrays, integer range reduction and conversion of absolute values to relative values. Open source libraries and viewers are available for processing and rendering X3D files. X3D was developed by the Web3D Consortium, and is published and maintained as ISO standards 19775, 19776 and 19777; these standards are freely available on the Web.
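The difference between a delimited attribute list and a hierarchy of tags can be made concrete with a short Python sketch using the standard library's `xml.etree.ElementTree`. The compact form follows the pattern described above, a `Coordinate` element with a delimited `point` attribute; the verbose form, with its `Points`, `Point`, `X`, `Y` and `Z` tags, is a hypothetical counter-example invented for comparison, and the four sample points are arbitrary.

```python
import xml.etree.ElementTree as ET

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

# Verbose form: one element per point and per coordinate (hypothetical tags).
verbose = ET.Element("Points")
for x, y, z in points:
    p = ET.SubElement(verbose, "Point")
    for name, value in (("X", x), ("Y", y), ("Z", z)):
        ET.SubElement(p, name).text = repr(value)

# Compact form: all coordinates in one space/comma delimited attribute value.
compact = ET.Element("Coordinate")
compact.set("point", ", ".join(f"{x} {y} {z}" for x, y, z in points))

def parse_compact(element):
    """Recover the list of (x, y, z) tuples from the delimited attribute."""
    return [tuple(float(n) for n in triple.split())
            for triple in element.get("point").split(",")]

verbose_xml = ET.tostring(verbose, encoding="unicode")
compact_xml = ET.tostring(compact, encoding="unicode")
```

Both forms carry the same four points, but the compact serialization is considerably shorter because the markup overhead is paid once per list rather than once per number. The cost is that a consumer must split the attribute value itself, as `parse_compact` does, instead of relying on the XML parser to separate the values.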

XGL/ZGL

XGL [12] is an XML-based encoding of the Open Graphics Library (OpenGL) application programming interface for rendering 2D and 3D computer graphics. When compressed it is known as ZGL. It uses tessellating triangles to encode geometry, and is optimized for display. It does not have any capabilities for storing metadata, nor does it have any in-built security features other than approximating the geometry. File sizes are kept small using a reference-instance mechanism and a relatively compact XML syntax, with vector coordinates expressed as comma delimited lists rather than through a hierarchy of tags. XGL is supported by Autodesk and a few smaller CAD vendors. It was developed by the XGL Working Group but no longer appears to be maintained; the specification of the format was once freely available on the Web, but now only appears in ‘unofficial’ locations.

CONCLUSION

There does not seem to be much difference between the formats surveyed in terms of the geometry they can express, with the exception that XGL/ZGL cannot express exact curved surfaces, and that PLM XML can additionally express the semantics of model features. PLM XML also appears to be the most expressive format when it comes to metadata, while 3D XML and XGL/ZGL are least expressive. In terms of software support and openness, both important aspects for the technical business of preservation, the leader would appear to be X3D, an open standard with a number of open source tools available.

While there is no one lightweight format that stands out as ideal in all scenarios, the very fact that multiple formats are competing to be simultaneously the most expressive and most interoperable is an encouraging sign. If nothing else, it raises hopes that employing lightweight formats in a data exchange and preservation strategy will help to keep at least the bare essentials of design work readable and usable into the future.

REFERENCES

[1] Adobe Systems: PDF Reference, Fifth Edition: Adobe Portable Document Format Version 1.6. (2004). URL: http://www.adobe.com/devnet/pdf/pdfs/PDFReference16.pdf (visited on 2007-08-20).
[2] I. C. Braid: Designing with volumes. PhD thesis. Cambridge University, (1974).
[3] Dassault Systèmes: 3D XML User’s Guide Version 2 Edition 0. (2006).
[4] ECMA-363: Universal 3D. 4th edition. (2007). URL: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-363%204th%20Edition.pdf (visited on 2007-08-20).
[5] N. S. Hoe: Free/Open Source Software. Open Standards. With a forew. by Peter J. Quinn. New Delhi: Elsevier, (2006). ISBN 81-312-0538-X. URL: http://www.iosn.net/open-standards/foss-open-standards-primer/foss-openstds-withcover.pdf (visited on 2007-08-20).
[6] C. McMahon, J. Browne: CADCAM: Principles, Practice and Manufacturing Management. 2nd ed. Harlow: Addison-Wesley, (1998). ISBN 0-201-1781-9.
[7] Open HSF Initiative: The HOOPS 3D Product Suite. URL: http://www.openhsf.org/docs hsf/index.html (visited on 2007-08-20).
[8] D. Taraborelli, B. Guerry, Y. Brailowsky: Open Versus Proprietary Formats. 16th Oct. 2005. URL: http://www.openformats.org/en1 (visited on 2007-08-20).
[9] UGS: JT File Format Reference Version 8.1. (2006). URL: http://www.jtopen.com/docs/JT File Format Reference.pdf (visited on 2007-08-20).
[10] UGS: Open Product Lifecycle Data Sharing Using XML. White Paper. (2005). URL: http://www.ugs.com/products/open/plmxml/docs/wp plm 14.pdf (visited on 2007-08-20).
[11] K. Versprille: Dassault Systèmes’ Strategic Initiative: 3D XML for Sharing Product Information. Technology Trends in PLM. Collaborative Product Development Associates, (2005). URL: http://www.3ds.com/uploads/tx user3dsplmxml/3DXML for sharing product information.pdf (visited on 2007-08-20).
[12] XGL Working Group: XGL File Format Specification. (2006). URL: http://web.archive.org/web/20060218/http://www.xglspec.org/ (visited on 2007-08-20).
[13] ISO/IEC 19775:2004: Information technology - Computer graphics and image processing - Extensible 3D (X3D). URL: http://www.web3d.org/x3d/specifications/ISO-IEC-19775-X3DAbstractSpecification/ (visited on 2007-08-20).
[14] ISO/IEC 19776:2005: Information technology - Computer graphics and image processing - Extensible 3D (X3D) encodings. URL: http://www.web3d.org/x3d/specifications/ISO-IEC-19776-X3DEncodings-XML-ClassicVRML/ (visited on 2007-08-20).
[15] ISO/IEC 19777:2006: Information technology - Computer graphics and image processing - Extensible 3D (X3D) language bindings. URL: http://www.web3d.org/x3d/specifications/ISO-IEC-19777-X3DLanguageBindings/ (visited on 2007-08-20).

This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) and the Economic and Social Research Council (ESRC) under Grant Numbers EP/C534220/1 and RES-331-27-0006.
