EROS: an open source database for museum conservation restoration
Geneviève Aitken*, Christian Lahanier, Ruven Pillay and Denis Pitzalis
Abstract
The EROS database was developed internally to manage all manner of digital documentation. It was designed to handle museum collection analytical data from the laboratory as well as from museum conservation/restoration workshops. The information is focused on scientific and technical data. This includes indexing vocabularies, study reports, restoration reports, digital data from quantitative analysis, spectra, graphs, chemical formulae, ultraviolet, infrared, raking light photography and scanning electron microscopy images. The database also includes administrative information such as inventory tracking and the restoration history of the works of art as well as periodic surveys of the collection. New features include automatic content recognition of objects, geographical location display, panoramic viewing, multi-spectral image and three-dimensional model display.
Keywords
database management system, multilingual database, conservation–restoration, documentation, conceptual model, high-resolution image viewing, three-dimensional object viewing, indexing, image pattern recognition
Centre de recherche et de restauration des musées de France Palais du Louvre Porte des Lions 14, quai François Mitterrand 75001 Paris cedex 01 France E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Ilenia Cassan 39 ile Fana Joinville Le Pont, 94340 France E-mail: [email protected]
Elena Kuzmina State Historical Museum 1/2 Red Square 113012 Moscow Russia E-mail: [email protected]
Junko Koga Codex Images International 17 Square Edouard VII 75009 Paris France E-mail: [email protected]
Rui Ferreira da Silva, Curvelo Alexandra, Escobar Nazaré and Matos Emília Instituto Portugues de Conservacao e Restauro Rua das Janelas Verdes 37 1249-018 Lisbon Portugal E-mail: [email protected]
Hsien-Min Hsiao 25–27, rue de Londres 67000, Strasbourg France
*Author to whom correspondence should be addressed
Introduction
This paper is dedicated to the memory of Michel Aubert, a CIDOC member who set up the Computer Department at the Ministry of Culture in France in 1970. He was a pioneer who instituted the first computer management systems for the museum collections of France. Thirty years on, the digitized inventory of art collections and monuments is well established. This task requires constant effort to keep up with developments in computer science, in particular with the Internet, new imaging technology, automatic content recognition and interoperability based on the ICOM-CIDOC Conceptual Reference Model (CRM) ontology. In 1990, the C2RMF built a multilingual database named NARCISSE to manage the scientific documentation on canvas paintings that had built up since the founding of the Louvre conservation–restoration department in 1931. The painting metadata standard of the JOCONDE painting database, developed at the Direction des Musées de France (the central French state museums directorate), was adopted. The NARCISSE project added new multilingual metadata (indexing vocabularies and their definitions in eight languages) for the management of the photographic and scientific report archives (Arquivos Nacionais/Torre do Tombo 1993a, 1993b, Lahanier et al. 1994, 1995, CRISTAL Conservation Restoration Institutions for Scientific Terminology dedicated to Art Learning Network 1999–2000). Conservation and research oriented, its aim was to manage high-definition digital images (6000 × 8000 pixels). In 2001, 10 years on from the NARCISSE project, an open source database system named EROS was conceived and constructed with the help of Hewlett-Packard (Lahanier et al. 2002, 2003, Aitken and Lahanier 2003).
It was designed for museum research laboratories and conservation centres to handle efficiently extremely large quantities of records, multilingual metadata (including languages such as Russian, Chinese, Arabic and Japanese) and ultra-high resolution colorimetric and multi-spectral imaging. It was also designed to handle new technologies such as automatic image content recognition and new kinds of searching methods. The ICOM Documentation Working Group can provide international support for the development of new information technologies, such as computer applications and digitization, and for collaboration between institutions or individuals. During this triennial period, a few people from several countries agreed to translate first the indexing vocabularies, and later the EROS database content, so that this large body of conservation information can be accessed in their own languages. This effort will continue through conventions signed with the C2RMF. Currently, over 300,000 photographic and radiographic images, 10,000 technical reports, 500 three-dimensional (3D) objects and 200,000 quantitative analyses related to 56,000 works of art are accessible online in digital form.
Database architecture and software
The EROS system is organized in several parts (Figure 1): the storage back-end, the relational database, the image server, the middleware and the web server.
Figure 1. EROS schema
The data are stored on 15 TB HP RAID hard disk racks managed by three servers:
• the metadata related to the works of art, the images, the reports, the analysis, the analytical reports, the restoration reports, the conservation surveys, the chemical, structural, isotopic and molecular quantitative and qualitative analytical results and published papers
• high-definition digital images (photographic films taken with different techniques such as infrared (IR), X-ray and ultraviolet (UV) light; detailed cross-sections; electron microscopy views; graphs; spectra; multi-spectral images; panoramic views and 3D models)
• feature vectors for 2D and 3D image content recognition for automatic classification.
The EROS system is entirely open source and available under the GNU General Public License (GPL). It is based on powerful and industry-leading free software such as Linux, Apache, MySQL and PHP. Web access is through a W3C standards-compliant browser such as Firefox. Features include:
• web-based and platform independent, allowing for internal use through an intranet, externally through an extranet and through the Internet for the general public
• flexibility, allowing easy customization for individual needs
• full and transparent multilingual access – searches on the data can be performed on all the information in any language or encoding
• standards compliant – the use of XML, and so on, allows complex data interaction and analysis to take place both within the server and by the client
• distributed systems in different institutions can be consulted simultaneously and the results aggregated
• integrated colour calibrated multi-resolution image viewer for flat, panoramic and multi-spectral images
• extensible, modular and able to evolve over time with the possibility of adding custom plug-ins.
Query interface and navigation mode
The user can select several query conditions in the main menu at the top of the interface (Figure 2) before making a query:
• the language (the system automatically detects the language in use, but this can be overridden if necessary)
• a domain or a category of objects
• a limit on the number of results per page from a query
• whether the results are limited to the works, the images, the reports, the conservation reports, or any combination of these
• the visualization mode of the results (all of the information related to a painting, only the images, an export file in XML format, a chronological axis to classify the objects by time, a map to locate the place of conservation, of discovery or of making, or a table containing all the metadata in HTML or text format for use in other software).
Figure 2. EROS search interface
To query, the user can select terms by clicking on the magnifying-glass icon at the end of each field, which opens a list of vocabulary per domain. Only vocabulary that is actually referenced in the database is displayed. A navigation bar allows rapid navigation within the list, or the user can perform a search to go directly to a specific term. If the user types something into the field directly, a similarity search is performed. The data within the main database sections (objects, images, documents, restoration histories and analyses) can be selected for a single or cross query. The tables in the database are organized as:
• administration, historical and material data related to the objects (39 fields)
• data related to photographic films and digital images (20 fields)
• data related to documents such as reports, articles, multimedia, and so on (20 fields)
• data related to quantitative and qualitative analyses of the material composition (19 fields).
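The similarity search performed on terms typed directly into a field can be illustrated with a minimal sketch. The text does not say which matching method EROS uses; the example below assumes a simple string-similarity ratio from the Python standard library, and the vocabulary and the function name `suggest_terms` are hypothetical.

```python
import difflib

# Hypothetical vocabulary for one field. In EROS only terms that are
# actually referenced in the database would be offered.
VOCABULARY = ["canvas", "panel", "paper"]

def suggest_terms(typed, vocabulary=VOCABULARY, limit=3, cutoff=0.6):
    """Return vocabulary terms that resemble what the user typed."""
    # Compare in lower case but return the terms as stored.
    lowered = {term.lower(): term for term in vocabulary}
    close = difflib.get_close_matches(
        typed.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[key] for key in close]

print(suggest_terms("canvs"))
```

A higher `cutoff` makes the suggestions stricter; a production system would more likely delegate this to the database or a search index.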
The number of results from a query appears at the top of the results page. The objects are ordered by their identification number. The images associated with each work follow, together with their metadata, ordered chronologically by the date the image was taken. Almost everything on the page is interactive. A system of dynamic links allows the user to search within a subset of the query, to see definitions, or to find websites related to the terms or the work. Clicking on an image thumbnail opens a viewer appropriate to the image format. Buttons under the thumbnails allow the user to download either a full-size image or a dynamically generated JPEG image at a size chosen by the user. In addition, it is possible to view the colour distribution within the image in a colour space diagram or to perform a search based on image similarity according to different criteria. The user also has access to a shopping-cart style storage space in which to save items of interest, such as works of art, individual images or documents, for future reference.
From the list of results, the user can select an object by clicking on its title to view its administrative, historical and material information. The user can view the various images available (for example, normal light, UV, IR, X-ray, raking light, cross-sections, scanning electron microscopy images, spectra, graphs, panoramic views, multi-spectral images and 3D models). Documents are available in numerous formats such as Word, HTML and PDF. The conservation reports for the object, the chemical analyses and the analytical reports are also accessible.
Modifying or entering new data
To enter new data (Figure 3) or to modify the database, the user has to log in with a password and is granted a set of modification rights. Each new entry requires a unique identifier, which can be chosen by the user or attributed automatically by the system. The username and date are recorded automatically. Metadata is selected from dictionary lists in the chosen language of the user. These are vocabulary lists that have been validated by the manager.
Figure 3. EROS edit interface
The thesaurus is organized as a set of hierarchical dictionaries for each translatable field and for each available language (currently French, Italian, English, Russian, Japanese, Portuguese, Arabic, Greek and Chinese). The data within the main database are stored in a compact, language-independent format as short codes. When the search results are presented, the information in the main database is translated via the thesaurus system into the appropriate language. The thesaurus is capable of handling not only a full lexical hierarchy, but also synonyms and complex character sets, such as Russian, Arabic, Japanese and Chinese, which are managed via Unicode (UTF-8) encoding. Definitions of the indexing vocabularies are accessible online in the different languages. The right to enter or modify data is managed per category of user (administrator, manager or user) and per category of metadata (objects, images, documents or analyses). The lists of terms in the query interface are updated dynamically every day and show the number of occurrences of each term currently in the database. A logging system keeps track of any changes made to the database, by whom and when.
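The principle of storing short, language-independent codes and translating them at display time can be sketched as follows. This is only an illustration under stated assumptions: the codes, field names and terms are invented, and the real EROS thesaurus (hierarchies, synonyms, MySQL storage) is not reproduced here.

```python
# Hypothetical thesaurus: each short code maps to one label per language.
THESAURUS = {
    "TEC001": {"en": "oil painting", "fr": "peinture à l'huile"},
    "SUP001": {"en": "canvas", "fr": "toile"},
}

def translate(code, lang, fallback="en"):
    """Return the label for a short code in the requested language."""
    labels = THESAURUS.get(code)
    if labels is None:
        return code  # unknown code: show it untranslated
    # Fall back to another language, then to the raw code.
    return labels.get(lang, labels.get(fallback, code))

def translate_record(record, lang):
    """Translate every coded field of a stored record for display."""
    return {field: translate(code, lang) for field, code in record.items()}

# The stored record holds only codes; labels are resolved when displayed.
stored = {"technique": "TEC001", "support": "SUP001"}
print(translate_record(stored, "fr"))
```

Because the record itself never contains language-specific text, adding a new language in this scheme only means adding labels to the thesaurus.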
Ultra-high resolution and high dynamic range image viewer
The database now contains ultra-high resolution images that can be up to several gigabytes in size. To handle, display and organize such large quantities of data, a special imaging system is required (Figure 4). Furthermore, the database now holds images produced by other advances in digital imaging such as accurate colorimetry, extended dynamic ranges and multi-spectral acquisition. The CRISATEL multi-spectral camera acquires images in 13 wavelength ranges and allows us to reconstruct the colour image with an extremely high level of accuracy. It is also possible to simulate colour appearance under almost any illuminant, something that, because of metamerism, is impossible with only three colour channels. The classic 8 bit RGB colour space most commonly used today is insufficient for the richness of information contained in colour images reconstructed in this way. Device-independent colour spaces such as CIELAB can represent colours outside the limited gamut available in RGB. Furthermore, a higher bit depth (16 bits) is required to store the full dynamic range of imaging detail that would be invisible or saturated using only 8 bits. To handle these extremely large high dynamic range images, the open source IIPImage system (IIPImage high resolution remote image viewing system: http://iipimage.sf.net) is used.
Figure 4. IIPImage JavaScript client showing ‘Portrait of a Naked Woman’ by Renoir. The viewer shows a 13 channel 16 bit multi-spectral image (9500 × 11,530 pixels) magnified by a factor of 15
IIPImage is a client–server system designed for the remote viewing of very high resolution images across an Internet connection, and to remain usable even over a slow dial-up connection. The server is a C++ plug-in that can work with Apache or any other FastCGI-enabled web server. Images can be viewed through a feature-rich Java client or through a JavaScript client embedded within a web page; alternatively, full JPEG images can be generated dynamically at the requested size. For the system to be as efficient as possible, the images are stored in a multi-resolution tiled format (Figure 5), which allows the server to extract regions of the full image at different resolutions very quickly with no processing overhead. The TIFF format is flexible enough to allow this kind of encoding. Multiple resolutions can be stored within a single file, and each image can be tiled and optionally compressed using either a lossless compression such as Deflate or LZW or a lossy compression such as JPEG. The system is fast because the client only needs to download the part of the image to be shown on screen at the viewing resolution. It is also bandwidth and memory efficient, as users do not need to store or handle massive images on their local machine. The image tiles are extracted by the server, dynamically compressed with JPEG and sent to the client. The compression level can be controlled by the client to optimize the transmission. This technique makes it possible to view extremely large images of several gigabytes in size in real time over the Internet.
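As an illustration of dynamically generated JPEG images, the sketch below builds a request URL using commands from the Internet Imaging Protocol as documented for the IIPImage server (`FIF` for the image path, `QLT` for JPEG quality, `WID` for output width, `CVT` for the export format). The server address and image path are hypothetical, and the exact commands supported by the EROS installation are an assumption.

```python
from urllib.parse import urlencode

def iip_jpeg_url(server, image_path, width, quality=None):
    """Build a URL asking an IIP server for a whole-image JPEG export."""
    params = [("FIF", image_path)]          # image file on the server
    if quality is not None:
        params.append(("QLT", quality))     # JPEG quality chosen by client
    params.append(("WID", width))           # requested width in pixels
    params.append(("CVT", "jpeg"))          # export command, placed last
    # Keep "/" unescaped so the path stays readable in the query string.
    return f"{server}?{urlencode(params, safe='/')}"

# Hypothetical server endpoint and image path.
print(iip_jpeg_url("http://example.org/fcgi-bin/iipsrv.fcgi",
                   "/images/example.tif", width=800, quality=80))
```

Lowering `quality` or `width` reduces the amount of data transferred, which is the trade-off the client controls over slow connections.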
Figure 5. Pyramidal image structure
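The size of such a pyramid can be estimated with a short calculation. The sketch below assumes that each level halves the resolution of the previous one and that tiles are 256 × 256 pixels; neither value is stated in the text. The image dimensions, channel count and bit depth are those given in the caption of Figure 4.

```python
import math

def raw_size_bytes(width, height, channels, bits_per_sample):
    """Uncompressed size of the full-resolution image in bytes."""
    return width * height * channels * bits_per_sample // 8

def pyramid_levels(width, height, tile=256):
    """List (width, height, tiles_across, tiles_down) per level,
    from full size down to the first level that fits in one tile."""
    levels = []
    w, h = width, height
    while True:
        levels.append((w, h, math.ceil(w / tile), math.ceil(h / tile)))
        if w <= tile and h <= tile:
            break
        # Halve each dimension, rounding up so no pixel row is lost.
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    return levels

# 13 channel, 16 bit image of 9500 x 11,530 pixels (Figure 4).
print(raw_size_bytes(9500, 11530, 13, 16))   # about 2.85 GB uncompressed
for level in pyramid_levels(9500, 11530):
    print(level)
```

Under these assumptions the full-resolution level has 38 × 46 tiles, yet a viewer showing a screen-sized region only ever requests the handful of tiles that intersect that region at the current level.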
The IIPImage system was initially designed to handle standard 8 bit RGB images, so it had to be upgraded to handle higher dynamic range 16 bit images as well as other colour spaces such as CIELAB and generic multi-spectral images. Standard monitors, however, are only able to display 8 bits of information per pixel per channel in RGB colour space. To visualize 16 bit or CIELAB images, the raw data must first be processed. In the case of 16 bit images, a contrast control allows the user to navigate within the extra, normally hidden data. For CIELAB images, a dynamic conversion is performed by the client into the calibrated RGB space of the monitor. In the case of multi-spectral images, the user is able to navigate through the different wavelengths and compare details.
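One simple way to realize such a contrast control is a linear display window that maps a chosen range of 16 bit values onto the 8 bits of the monitor. The following sketch assumes this linear mapping; the text does not specify the transfer function actually used by IIPImage, and the function name is hypothetical.

```python
def window_16_to_8(value, low, high):
    """Map a 16 bit sample to 8 bits using a [low, high] display window.

    Samples below `low` become 0 and samples above `high` become 255;
    values in between are scaled linearly.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    return round((clipped - low) * 255 / (high - low))

# A narrow window spreads a small range of dark values over the full
# 8 bit scale, revealing detail that a full-range mapping would hide.
print(window_16_to_8(1500, 0, 65535))    # full range: almost black
print(window_16_to_8(1500, 1000, 3000))  # narrow window: clearly visible
```

Moving `low` and `high` corresponds to the user exploring different parts of the extended dynamic range.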
3D object viewing
The IIPImage system is also able to handle the kind of panoramic 3D objects created during the ACOHIR project. The user is able to zoom into the image as well as rotate it. In addition, 3D model support has been added to the EROS system (Figure 6). These objects have been obtained by two different methods:
• laser acquisition: requires expensive hardware and special software
• reconstruction based on silhouettes: requires a lot of human intervention.
Figure 6. Panoramic view, laser acquisition and 3D model reconstruction
Figure 7. EROS full text search page
Figure 8. EROS timeline results
Full text document searching
Documents linked to paintings or objects are stored in the database in various formats, such as Word, HTML, XML and PDF. Software can be used to extract the textual contents from these formats and allow full text searches to be performed via the EROS search interface. Users are able to perform several kinds of search such as for an exact phrase or excluding a word or phrase, and so on.
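The two kinds of search mentioned, exact phrase and exclusion, can be sketched over already extracted text. This is a minimal illustration, not the EROS implementation: the document identifiers and texts are invented, and a real system would use a full-text index rather than scanning every document.

```python
import re

def contains(text, term):
    """True if `term` (a word or phrase) occurs on word boundaries."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def search(documents, phrase=None, exclude=()):
    """Return ids of documents containing `phrase` and no excluded term."""
    hits = []
    for doc_id, text in documents.items():
        if phrase and not contains(text, phrase):
            continue
        if any(contains(text, term) for term in exclude):
            continue
        hits.append(doc_id)
    return sorted(hits)

# Hypothetical extracted report texts.
documents = {
    "r1": "Restoration report: the varnish was removed in 1990.",
    "r2": "The varnish was removed, then the canvas was relined.",
    "r3": "Infrared study of the underdrawing.",
}
print(search(documents, phrase="varnish was removed", exclude=["relined"]))
```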
Chronological display
The chronological representation (Figure 8) displays a visual timeline of the results, allowing the user to see, for example, the evolution of a painter’s style over a period of time. The example shows the stylistic evolution of the painter J F Millet (1814–1875). The visualization clearly illustrates the different periods in the life of the artist, from his apprenticeship with Delaroche, where he started as a copyist doing simple academic work, through portrait and commercial phases, towards his most mature and Realist-influenced landscape works.
Cartography
The cartographical representation (Figure 9) displays a geographical map of the world with the locations of the search results superimposed on it. The map is produced with PHP and a graphics library that generates images dynamically. Users can zoom into a geographical region of interest as well as view the full set of results for a particular location. This allows users to discern possible clusters of results based on where the painting or object was made, was found or is now conserved. It is also possible to visualize related information on the objects and to give a historical view of the movement of an object from its place of manufacture to its current location. The example (Figure 9) shows that the works of the painter J B Corot (1796–1875) are located in Europe. Zooming in reveals that the majority of his works are located in France. Clicking on the city of Paris displays all of the paintings conserved there.
Figure 9. EROS cartography results
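Placing results on a world map requires converting geographic coordinates into pixel positions and grouping results by place. The sketch below assumes a plain equirectangular world map and is written in Python for illustration; it is not the PHP code used by EROS, and the projection, function names and sample results are assumptions.

```python
from collections import Counter

def to_pixel(lat, lon, width, height):
    """Convert latitude/longitude in degrees to pixel coordinates on an
    equirectangular world map of the given size (origin at top left)."""
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return round(x), round(y)

def count_by_place(results):
    """Count results per place so that clusters can be drawn as markers."""
    return Counter(place for _, place in results)

# Hypothetical search results: (object id, place of conservation).
results = [("work-1", "Paris"), ("work-2", "Paris"), ("work-3", "Lisbon")]
print(count_by_place(results))

# Approximate coordinates of Paris on a 3600 x 1800 pixel map.
print(to_pixel(48.8566, 2.3522, 3600, 1800))
```

Zooming then amounts to applying the same conversion to a smaller latitude and longitude range.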
XML data interchange
By using the XML standard, it is possible to easily interchange data with different platforms and applications.
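A minimal sketch of such an exchange is given below: a record is serialized to XML and read back by another application. The element and attribute names are hypothetical and do not reproduce the actual EROS export schema.

```python
import xml.etree.ElementTree as ET

def record_to_xml(record):
    """Serialize a record (id plus flat fields) to an XML string."""
    root = ET.Element("object", id=record["id"])
    for field, value in record["fields"].items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text):
    """Parse the XML produced above back into the same structure."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in root}
    return {"id": root.get("id"), "fields": fields}

# Hypothetical record; non-ASCII characters survive the round trip.
record = {
    "id": "obj-0001",
    "fields": {"artist": "Jean-François Millet", "technique": "huile sur toile"},
}
xml_text = record_to_xml(record)
print(xml_text)
print(xml_to_record(xml_text) == record)
```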
Pattern matching for paintings and objects
As part of the SCULPTEUR project, a series of algorithms has been developed to classify and retrieve images and 3D models in terms of shape, pattern, colour, texture, volume, and so on. This allows the user to perform sophisticated searches based on various similarity criteria. In addition, the PICTEUR graphical interface allows users to combine searches on image similarity with textual metadata. A dynamic graph representing the conceptual classes of knowledge based on the CIDOC-CRM allows the user to select works of art through a user-friendly database interface.
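Retrieval by similarity generally compares a feature vector computed from the query image with the vectors stored for the collection. The sketch below uses cosine similarity as a stand-in measure; it is not one of the SCULPTEUR algorithms, and the vectors and object names are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, library, top=3):
    """Return the names of the `top` stored vectors most similar to `query`."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in library.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top]]

# Hypothetical three-bin colour descriptors for three objects.
library = {
    "object-a": [1.0, 0.0, 0.0],
    "object-b": [0.0, 1.0, 0.0],
    "object-c": [0.9, 0.1, 0.0],
}
print(rank_by_similarity([1.0, 0.0, 0.0], library))
```

Combining such a ranking with a filter on textual metadata corresponds to the mixed searches described above.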
Conclusion
Information technology is allowing research to be conducted far more openly and internationally than ever before. Multilingual Internet access, the ontological classification of scientific vocabularies and remote image viewing allow research centres to work together on a global basis. To further this, interoperable standards and formats are essential. These include image formats (quality and compression), colour management (calibration, gamut mapping, colour transformations), archiving system security (watermarking, data hiding), computer management systems and applications, language exchange formats, multilingual vocabularies for indexing, and so on. Access to the Internet is also transforming relationships between people around the world. The cultural field must be open to the needs of users and contribute strongly to the education of new generations. Data entry and digitization within the C2RMF of the documentation produced over 70 years has required over 10 years of effort. The database is unique not only in the sheer volume of data (15 TB online), but also in the diversity of its content. It is currently available online through a high bandwidth line (via the 100 Mbps French academic network). A network of cooperation has established itself around EROS to develop new features. One such feature is the 3D visualization of paintings that have been digitized by laser in partnership with the Canadian National Research Centre (Art3D Project).
Acknowledgements
We thank: Jacques Misselis, Director of Education and Research at Hewlett Packard Europe, for the computer hardware; Dr Philippe Colantoni at the J M University in Saint-Etienne for the ColorSpace program; Sr Agnès-Mariam de la Croix, President of La Maison d’Antioche, for the Arabic translation; Alison Murray, Associate Professor at Queen’s University, for reviewing the English version; Myriam Serck-Dewaide, Executive Director, Institut Royal du Patrimoine Artistique, Dahlia Mees and Marjolein Debulpaep for their contribution to the EROS system; Benjamin Simon for his contribution to the Millet chronology; and the museums of Laval, Albi, Abbeville, Troyes and Cherbourg, among others, for their interest.
References
Aitken, G and Lahanier, C, 2003, ‘La base EROS, source de connaissance multilingue’, actes du Symposium Diderot, ‘cartographier la connaissance’, Langres, 14–17 April 2003.
Arquivos Nacionais/Torre do Tombo, 1993a, ‘NARCISSE Système documentaire des peintures et enluminures’, Lisboa, Arquivos Nacionais/Torre do Tombo, 353 p.
Arquivos Nacionais/Torre do Tombo, 1993b, ‘NARCISSE Glossaire multilangue’, CD-ROM, Lisboa, Arquivos Nacionais/Torre do Tombo, 278 p.
CRISTAL Conservation Restoration Institutions for Scientific Terminology dedicated to Art Learning Network, no. R99/DGX Bruxelles, programme Raphaël, June 1999–December 2000.
Lahanier, C, Meili, D and Aubert, M, 1994, ‘Art and science’, multilingual CD-ROM Intelligent Multimedia Information Retrieval Systems and Management, Rockefeller University, New York, 11–13 October.
Lahanier, C, Aitken, G and Aubert, M, 1995, ‘NARCISSE: une bonne résolution pour l’étude des peintures’, Techné no. 2, 178–190.
Lahanier, C, Aitken, G, Shindo, J, Pillay, R, Martinez, K and Lewis, P, 2002, ‘EROS: an open source, multilingual research system for image content retrieval dedicated to conservation–restoration exchange between cultural institutions’ in ICOM-CC 13th Triennial Meeting, Rio de Janeiro, 22–27 September 2002, vol. 1, 287–294.
Lahanier, C, Aitken, G and Pillay, R, 2003, ‘EROS: European Research Open System’ in ICHIM2003 Congress 10–12 September 2003, Paris.
