Irini FUNDULAKI Curriculum Vitæ

School of Informatics
University of Edinburgh
Appleton Tower, Crichton Street
Edinburgh, EH8 9LE
UK

[email protected]
~efountou
Tel: +44 1 31 651 3820
Tel: +44 1 31 651 3815

Research Interests

• XML Security and Access Control
• XML Query and Update Languages
• XML Full Text Search with a focus on Query Personalization
• XML Data Integration

Education

• Diplôme de Docteur en Informatique (January 2003)
  Conservatoire National des Arts et Métiers Paris (C.N.A.M.), Institut National de Recherche en Informatique et Automatique (I.N.R.I.A), France
  Dissertation: Querying and Integration of XML resources for Web Communities
  Grade: “High honorable degree” (“Mention très honorable”)
  Advisor: Prof. Michel Scholl (C.N.A.M., Paris, France)
  Co-Advisors: Prof. Bernd Amann (LIP 6 - Université de Pierre et Marie Curie, Paris, France) and Prof. Catriel Beeri (The Hebrew University of Jerusalem, Israel)
• M.Sc. in Computer Science (March 1997)
  Computer Science Department, University of Crete, Greece
  Thesis: A Diagnostic Support System for Engineering Applications
  Advisor: Prof. Panos Constantopoulos (University of Crete, Greece)
• B.Sc. in Computer Science (September 1994)
  Computer Science Department, University of Crete, Greece

Honors and Awards

• Postdoctoral Fellowship (January 2003 - January 2004)
  Institut National de Recherche en Informatique et Automatique (I.N.R.I.A)
• Teaching and Research Position (Attachée Temporaire d’Enseignement et de Recherche) (September 2000 - August 2002)
  French Ministry of Education
• PhD Fellowship (November 1998 - August 2000)
  Verso Group, Institut National de Recherche en Informatique et Automatique (I.N.R.I.A)
• ERCIM Fellowship (May 1998 - October 1998)
  Completed at the Verso Group, Institut National de Recherche en Informatique et Automatique (I.N.R.I.A)
• Graduate Scholarship (October 1994 - October 1996)
  Institute of Computer Science, Foundation for Research and Technology-Hellas (ICS-FORTH)
• Undergraduate Scholarship (September 1992 - September 1994)
  Institute of Computer Science, Foundation for Research and Technology-Hellas (ICS-FORTH)

Research Experience

• Research Fellow (February 2006 - present)
  Database Group, Laboratory for Foundations of Computer Science, School of Informatics, University of Edinburgh, UK
  Responsible: Prof. Peter Buneman
• Member of Technical Staff (June 2004 - January 2006)
  Network Data and Services Research Department, Bell Laboratories, Lucent Technologies, USA
  Responsible: Rick Hull
• Post Doc Employee (February 2003 - May 2004)
  Network Data and Services Research Department, Bell Laboratories, Lucent Technologies, USA
  Responsible: Rick Hull
• Visitor (November 2001)
  Database Group, The Hebrew University of Jerusalem, Israel
  Responsible: Prof. Catriel Beeri
• ERCIM Fellow (May 1998 - October 1998)
  Verso Group, Institut National de Recherche en Informatique et Automatique (I.N.R.I.A), France
  Responsibles: Prof. Michel Scholl, Prof. Bernd Amann and Anne-Marie Vercoustre (Director of Research at I.N.R.I.A.)
• Post-Graduate Employee (November 1996 - April 1998)
  Information Systems and Software Technology Division, Institute of Computer Science, Foundation for Research and Technology (FORTH), Greece
  Responsibles: Prof. Panos Constantopoulos, Prof. Vassilis Christophides and Dr. Martin Dörr
• M.Sc. Intern (October 1994 - October 1996)
  Information Systems and Software Technology Division, Institute of Computer Science, Foundation for Research and Technology (FORTH), Greece
  Responsible: Prof. Panos Constantopoulos
• B.Sc. Intern (September 1992 - September 1994)
  Information Systems and Software Technology Division, Institute of Computer Science, Foundation for Research and Technology (FORTH), Greece
  Responsible: Prof. Panos Constantopoulos

Publications

Refereed Journal Papers

[1] Bernd Amann, Irini Fundulaki, and Michel Scholl. Integrating ontologies and thesauri for RDF schema creation and metadata querying. Int’l Journal of Digital Libraries, 3(3):221–236, October 2000.

[2] Bernd Amann, Catriel Beeri, Irini Fundulaki, and Michel Scholl. Interrogation de Ressources XML Concernant un domaine d’intérêt. Technique et Science Informatiques (TSI), 22(10):1243–1270, 2003.

Refereed Conference Papers

[3] Sihem Amer-Yahia, Irini Fundulaki and Laks V. Lakshmanan. Personalizing XML Search in PimenTo. To appear in Proc. of the 23rd Int’l Conf. on Data Engineering (ICDE), Istanbul, Turkey, April 2007.

[4] Michael Benedikt and Irini Fundulaki. XML Subtree Queries: Specification and Composition. In Proc. of the 10th Int’l Symposium on Database Programming Languages (DBPL), Trondheim, Norway, September 2005 (held in conjunction with the 31st Int’l Conf. on Very Large Databases (VLDB)).

[5] Arnaud Sahuguet, Bogdan Alexe, Irini Fundulaki, Pierre-Yves Laligand, Abdullatif Shifka, and Antoine Arnail. User Profile Management in Converged Networks (Episode II): “Share your data, keep your secrets”. In Proc. of the Second Biennial Conf. on Innovative Data Systems Research (CIDR), Asilomar, California, USA, January 2005.

[6] Irini Fundulaki and Maarten Marx. Specifying Access Control Policies for XML Documents with XPath. In Proc. of the 9th ACM Symposium on Access Control Models and Technologies (SACMAT), Yorktown Heights, New York, USA, June 2004.

[7] Bernd Amann, Catriel Beeri, Irini Fundulaki, and Michel Scholl. Querying XML Sources Using an Ontology-Based Mediator. In Proc. of the Confederated Int’l Confs. DOA, CoopIS and ODBASE, Irvine, California, USA, November 2002. Appeared also in the Informal Proc. of the 18ièmes Journées Bases de Données Avancées (BDA), Paris, France, October 2002.

[8] Bernd Amann, Catriel Beeri, Irini Fundulaki, and Michel Scholl. Ontology-based Integration of XML Web Resources. In Proc. of the 1st Int’l Conf. on the Semantic Web, Sardinia, Italy, June 2002.

[9] Bernd Amann, Catriel Beeri, Irini Fundulaki, Michel Scholl, and Anne-Marie Vercoustre. Rewriting and Evaluating Tree Queries with XPath. In Informal Proc. of the 17ièmes Journées Bases de Données Avancées (BDA), Agadir, Morocco, November 2001.

[10] Bernd Amann and Irini Fundulaki. Integrating Ontologies and Thesauri to Build RDF Schemas. In Proc. of the 3rd European Conf. on Research and Advanced Technology for Digital Libraries (ECDL), Paris, France, September 1999.

[11] Martin Dörr and Irini Fundulaki. SIS-TMS, A Thesaurus Management System for Distributed Digital Collections. In Proc. of the 2nd European Conf. on Research and Advanced Technology for Digital Libraries (ECDL), Crete, Greece, September 1998.

[12] Vassilis Christophides, Martin Dörr, and Irini Fundulaki. The specialist seeks expert views: Managing digital folders in the AQUARELLE project. In Museums and the Web 1997: Selected Papers.

[13] Vassilis Christophides, Martin Dörr, and Irini Fundulaki. A Semantic Network Approach to Semi-Structured Documents Repositories. In Proc. of the 1st European Conf. on Research and Advanced Technology for Digital Libraries (ECDL), Pisa, Italy, September 1997.

Refereed Demonstration Papers

[14] Sihem Amer-Yahia, Irini Fundulaki, Prateek Jain and Laks Lakshmanan. Personalizing XML Text Search in PimenT. In Proc. of the 31st Int’l Conf. on Very Large Data Bases (VLDB), Trondheim, Norway, September 2005.

[15] Serge Abiteboul, Bogdan Alexe, Omar Benjelloun, Bogdan Cautis, Irini Fundulaki, Tova Milo, and Arnaud Sahuguet. An Electronic Patient Record “on Steroids”: Distributed, Peer-to-Peer, Secure and Privacy-conscious. In Proc. of the 30th Int’l Conf. on Very Large Data Bases (VLDB), Toronto, Canada, September 2004.

[16] Irini Fundulaki, Guillaume Giraud, Daniel Lieuwen, Nicola Onose, Nicolas Pombourq, and Arnaud Sahuguet. Share your data, keep your secrets. In Proc. of the ACM Int’l Conf. on Management of Data (SIGMOD), Paris, France, June 2004.

[17] Irini Fundulaki, Bernd Amann, Catriel Beeri, and Michel Scholl. STYX: Connecting the XML World to the World of Semantics. In Proc. of the Int’l Conf. on Extending Database Technology (EDBT), Prague, Czech Republic, March 2002.

Refereed Workshop Papers

[18] Irini Fundulaki and Maarten Marx. Mediation of XML data through Entity-Relationship Models. In Informal Proc. of the 1st Int’l Workshop on Semantic Web and Databases (SWDB), Berlin, Germany, September 2003 (in conjunction with the 29th Int’l Conf. on Very Large Data Bases (VLDB)).

[19] Bernd Amann, Catriel Beeri, Irini Fundulaki, Michel Scholl, and Anne-Marie Vercoustre. Mapping XML Fragments to Community Web Ontologies. In Informal Proc. of the Fourth Int’l Workshop on Web and Databases (WebDB), Santa Barbara, California, USA, May 2001 (in conjunction with the ACM Int’l Conf. on Management of Data (SIGMOD)).

[20] Sofia Alexaki, Vassilis Christophides, Greg Karvounarakis, Dimitris Plexousakis, Karsten Tolle, Bernd Amann, Irini Fundulaki, Michel Scholl, and Anne-Marie Vercoustre. Managing RDF Metadata for Community Webs. In Proc. of the 2nd Int’l Workshop on The World Wide Web and Conceptual Modelling, Salt Lake City, Utah, USA, October 2000 (in conjunction with the 19th Int’l Conf. on Conceptual Modelling (ER)).


[21] Irini Fundulaki, Rick Hull, Bharat Kumar, Daniel Lieuwen, and Arnaud Sahuguet. My Personal Web: A Seminar on Personalization and Privacy for Web and Converged Services. In Proc. of the Int’l Conf. on Data Engineering (ICDE), Boston, Massachusetts, USA, March 2004.

[22] Chris Clifton, Irini Fundulaki, Rick Hull, Bharat Kumar, Daniel Lieuwen, and Arnaud Sahuguet. Privacy-Enhanced Data Management for Next Generation e-Commerce. In Proc. of the 29th Int’l Conf. on Very Large Data Bases (VLDB), Berlin, Germany, September 2003 (co-presented with Arnaud Sahuguet).

Refereed National Conferences

[23] Peter Buneman, James Cheney, Carwyn Edwards and Irini Fundulaki. Preserving Scientific Data with XMLArch. In Proc. of the 5th e-Science All Hands Meeting, Nottingham, UK, September 2006 (Poster presentation).

[24] Wenfei Fan, Irini Fundulaki, Floris Geerts, Xibei Jia and Anastasios Kementsietsidis. A View Based Security Framework for XML. In Proc. of the 5th e-Science All Hands Meeting, Nottingham, UK, September 2006.

Unrefereed Technical Reports

[25] Irini Fundulaki and Avinash Vyas. Architectural approaches for blending Data Integration and Identity Management. Bell Labs Internal Technical Report, December 2004.

[26] Bernd Amann, V. Christophides, I. Fundulaki, M. Scholl and A.M. Vercoustre. Intelligent Mediation of Cultural Information Sources. In ERCIM News, No. 35, October 1998.

[27] V. Christophides, M. Dörr and I. Fundulaki. The Aquarelle Folder Server. In ERCIM News, No. 33, April 1998.

Drafts submitted or in preparation

[28] Irini Fundulaki and Sebastian Maneth. Formalizing XML Access Control Policies in the presence of updates. In Preparation, 2006.

[29] Chee-Yong Chan, Wenfei Fan, Irini Fundulaki, Floris Geerts, Xibei Jia and Anastasios Kementsietsidis. Secure XML Querying with Security Views. In Preparation, 2006.

Research Projects

Database Group, School of Informatics, University of Edinburgh and Digital Curation Center

• Revisiting XML Access Control in the presence of Updates (May 2006 - present)
  (Joint work with James Cheney, Database Group, University of Edinburgh and Sebastian Maneth, National ICT Australia)
  Although access control for read-only queries over XML data has been studied extensively, access control for update queries has not been sufficiently addressed. In this research project we study the problem of controlling access to XML documents in the presence of update operations. We consider the update operations proposed in the XQuery Update Facility document.


• Personalizing XML Full Text Search (April 2005 - present)
  (Joint work with Sihem Amer-Yahia, Yahoo! Research, USA and Laks Lakshmanan, University of British Columbia, Canada)
  XML search is becoming widely popular due to the increasing amount of XML data and to a growing interest in supporting accuracy in searches for both structured and unstructured content. This accuracy varies across systems and a lot of effort is being put into designing scoring functions tailored to specific datasets. In this research project we propose to incorporate user information or user preferences to customize query answers and improve search. We propose a user profile language that incorporates two kinds of preferences: i) scoping rules that are used to rewrite the user query by modifying the structural and/or full text predicates in order to restrict or expand the set of query answers and ii) ordering rules that are used to rank the query results in some order other than the one proposed by the query engine. We have designed algorithms to enforce the user profiles on the user query and achieve a user profile-aware query ranking to guarantee efficient query personalization. Our preliminary effectiveness experiments on INEX (INitiative for the Evaluation of XML Retrieval) datasets and queries show that enforcing user profiles achieves good precision and recall. In addition, the efficiency experiments on synthetic datasets show that personalization induces negligible processing overhead. A proof-of-concept prototype was described in [14] and demonstrated at the 31st Int’l Conference on Very Large Databases (VLDB) 2005. A detailed presentation of the user profile language, the query customization and ranking algorithms is given in [3].

Network Data and Services Research Department, Bell Laboratories, Lucent Technologies

• XML Subtree Queries Platform (June 2004 - January 2006)
  (Joint work with Michael Benedikt)
  The objective of this project was to provide a set of tools for the management of XML Subtree Queries. Such queries are useful in many aspects of XML data and document processing: in data integration applications that describe views of multiple (virtual) XML documents to be merged into a single document by specifying subtrees of the documents of each source to be merged; in access control applications that filter the inaccessible subdocuments of the XML documents that will eventually be delivered to the end-user. The main contributions of this work are:
  – an XML subtree query language (XSQL) that is parameterized by different fragments of XPath 1.0 and allows the natural specification of XML subtree queries
  – a set of composition algorithms for each variant of the subtree query language (depending on the XPath fragment), which allow us to compose subtree queries to form new subtree queries in the same language
  – a special purpose evaluator (XSQVL) that evaluates subtree queries efficiently
  – a set of simplification rules applicable to both XML subtree and XPath queries and
  – a set of translation algorithms that translate XML subtree queries to XQuery expressions and XSLT programs.
  Our extensive experiments showed that the composition algorithms, when used to compute a single query from a chain of queries (e.g., in the access control context), have negligible processing overhead. Efficiency experiments compared two strategies: (a) composing a chain of queries into a single query using the composition algorithms of XSQL, translating it into an XQuery expression and evaluating it with an off-the-shelf XQuery evaluator; and (b) translating the chain of subtree queries into a chain of XQuery expressions and evaluating them with an off-the-shelf XQuery evaluator. For a common class of subtree queries, strategy (a) improved execution time by up to 100% over strategy (b).
Results of this work were presented in [4].

• GUPster (February 2003-September 2004)
  (Joint work with Arnaud Sahuguet)
  The GUPster system aims at providing privacy-conscious management of distributed user-profile data. The main idea behind GUPster is to build an XML-based mediator that acts as a centralized meta-data manager to handle highly distributed user profile XML data. The mediator acts as the single point of access between data producers and data consumers. GUPster aims to be the broker for user profile components which are (a) distributed across networks and (b) distributed differently on a per-user basis. Its role is two-fold: data integration and access control. This is the major difference with traditional mediator-based systems that only address the former. The system follows the reference architecture, developed by the Bell Labs GUPster team, that is guiding the direction of the Generic User Profile (GUP) standards body. GUPster uses an XML mediator-based architecture where: (a) the mediator’s backbone is an XML schema agreed upon by standards bodies³; (b) the user profile document is distributed across disparate data and network sources; and (c) the access control rules are defined in terms of the XML schema and specify who can access what part of the user profile document. The outcome of this work was a prototype system that was shown to Bell Labs and Lucent Business Units and demonstrated to several Lucent customers. Results of this work were presented in [5], [15] and [16].
  In the context of this project, we also examined different approaches for identity management. This issue is particularly important in environments where a user can use different identities in order to access her data. The outcome of this work is presented in [25].

Conservatoire National des Arts et Métiers Paris (C.N.A.M.), Group Vertigo, Cedric Laboratory and Institut National de Recherche en Informatique et Automatique (I.N.R.I.A), France

• Project Community Webs (C-Web) (European Project) (September 1999-September 2000)
  C-Web was a European project, defined in the 5th Framework. The principal partners were ERCIM and EDW. The ERCIM organizations concerned were INRIA (projects: Verso and Acacia) and ICS-FORTH (Information Systems and Software Technology Division). The objective of the C-Web project was to provide a generic and open platform for describing, organizing and querying various XML resources. A C-Web is a distributed warehouse of XML resources which are used by a given community that shares the knowledge of a specific domain. The backbone of a C-Web is an RDF schema that is shared by the C-Web users and defined by the integration of extensive taxonomies of terms and ontologies. This schema captured the basic notions and relationships of the domain of interest and was used for expressing the semantics of the available sources. My work for C-Web included the definition and implementation of a tool for the creation of RDF schemas defined by the integration of domain-specific ontologies and structured vocabularies. Results of this work were published in [1], [10], [20] and [26]. C-Web was one of the first projects to deal with research issues concerning the Semantic Web.
• Inventaire (September 1999-September 2000)
  (Contract between the Conservatoire National des Arts et Métiers-Paris and the French Ministry of Culture)
  The objective of this project was the development of a methodology and the necessary tools for the exchange and integration of documents describing cultural artifacts. The archaeologists from the French Ministry of Culture were using terms from the MERIMEE thesaurus to describe cultural artifacts. The users were able to query the descriptions of the artifacts using the terms from MERIMEE, but were unable to perform structured searches that would allow them to explore relations other than the ones provided by the thesauri.
  The idea was to integrate the CIDOC/ICOM ontology with the MERIMEE thesaurus. The first provides a schema that captures the basic notions and their relationships in the cultural domain. The second provides a rich set of vocabulary terms to describe in a more precise manner the semantics of the domain. This rich schema allowed the users to provide more structured descriptions of the cultural artifacts.
  In this project I was responsible for the realization of the following two prototypes: in the first, I constructed a knowledge base by integrating an ontology (a conceptual schema defined independently of the schemas of the underlying sources) with structured vocabularies (thesauri), supplied by the Service de l’Inventaire. The second prototype is an extension of the first with the functionalities of description and querying of the documents of the Inventaire. More specifically, the knowledge base established in the first prototype is used in order to describe the contents of different documents in a precise way (using the terms of the chosen thesaurus), facilitating the exchange of documents without the detailed knowledge of their structure.

³ 3GPP GUP, The Third Generation Partnership Project in the telecommunications domain and Liberty Alliance

Institute of Computer Science, Foundation for Research and Technology (ICS-FORTH), Greece

• European Project “AQUARELLE” (Participant ICS-FORTH) (November 1996 - April 1998)
  (Responsibles: Prof. Panos Constantopoulos, Prof. Vassilis Christophides and Dr. Martin Dörr)
  The Aquarelle project aimed at designing and developing a distributed multimedia information system, offering access to information describing cultural heritage. The main objective of the Aquarelle project was to make cultural information available to a large community of users by: “providing a uniform access to heterogeneous distributed databases and collections of data, linking and commenting pieces of information belonging to different databases and, offering multilingual terminology support.” My tasks in the Aquarelle project involved i) the design and implementation of the conceptual model for the representation and the generic storage of SGML documents using the Semantic Index System (SIS) developed at ICS-FORTH and ii) the design and implementation of an Application Programming Interface (API) in C that provided the functionalities of acquisition and update of SGML documents stored in the SIS Knowledge Base. Results of this work were presented in [12], [13] and [27].
• SIS-Thesaurus Management System (SIS-TMS) (November 1996 - April 1998)
  (Responsible: Dr. Martin Dörr)
  The availability of central reference information such as thesauri is critical for correct intellectual access to distributed databases, in particular to digital collections in international networks. There is a continuous rise of interest in thesauri, and several thesaurus management systems have appeared on the market. SIS-TMS is a thesaurus management system built on top of the Semantic Index System of ICS-FORTH and provides the necessary tools for the management of monolingual and multilingual thesauri. I was responsible for the representation, storage and implementation of the functionalities for the management of monolingual and multilingual thesauri. SIS-TMS is an ICS-FORTH product.
  Later versions of SIS-TMS have been purchased by (a) the French Ministry of Culture (Direction de Patrimoine), (b) the Royal Commission of Historical Monuments of England, (c) the Museum Documentation Association and (d) the “Istituto Centrale per il Catalogo e la Documentazione” of the Italian Ministry of Culture. Results of this work were presented in [11].

Professional Service

• Program Chair with Angela Bonifati (ICAR-CNR, Italy) of the 8th ACM Int’l Workshop on Web Information and Data Management (WIDM 2006)
• Reviewer (Conferences and Workshops)
  – 33rd Int’l Conf. on Very Large Data Bases (VLDB 2007, Demonstration Track)
  – 10th Int’l Workshop on Web and Databases (WebDB 2007)
  – Conférence en Recherche d’Informations et Applications (CORIA 2006, 2007)
  – 15th Int’l Conf. on Information and Knowledge Management (CIKM 2006)
  – 4th Int’l XML Database Symposium (XSym 2006)
  – ACM Int’l Conf. on Management of Data (ACM SIGMOD 2005, Demonstration Track)
  – 7th ACM Int’l Workshop on Web Information and Data Management (WIDM 2005)
  – 6th Int’l Conf. on Web-Age Information Management (WAIM 2005)
  – 2nd Int’l Workshop on Semantic Web and Databases (SWDB 2004)
  – 15th Int’l Conf. on Database and Expert Systems Applications (DEXA 2004)
  – 1st Int’l XML Database Symposium (XSym 2003)

• Journal Reviewer

  – Int’l Journal on Very Large Databases 2007
  – ACM Transactions on Database Systems (TODS 2005)
  – IEEE Transactions on Knowledge and Data Engineering (TKDE 2002, 2004, 2005, 2006)
  – Knowledge and Information Systems (KAIS 2006)
  – Journal of Web Semantics (JWS 2005)

• External Reviewer (Conferences and Workshops)
  – 32nd Int’l Conf. on Very Large Databases (VLDB 2006)
  – EDBT-Workshop on Database Technologies for Handling XML Information on the Web (Data-X 2006)
  – Programming Language Technologies for XML (Plan-X 2005)
  – Int’l Workshop on Web and Databases (WEBDB 2005)
  – ACM Int’l Conf. on Management of Data (ACM SIGMOD 2001, 2004, 2007)
  – 16th Int’l Conf. on Data Engineering (ICDE 2000)
  – 8th Int’l Conf. on Information and Knowledge Management (CIKM 1999)
• Editor of the 2nd Int’l Workshop on Semantic Web and Databases (SWDB 2004), with Christoph Bussler and Val Tannen. Proceedings published in the Lecture Notes in Computer Science (LNCS) series, Springer-Verlag.

Student Supervision

• May 2005 - August 2005 (Bell Laboratories, Lucent Technologies)
  Supervision of the Diploma Thesis of Prateek Jain, 3rd year undergraduate student at the Indian Institute of Technology (IIT) Kanpur, Kanpur, India.
• April 2004 - August 2004 (Bell Laboratories, Lucent Technologies)
  Supervision of the 4th year project of Bogdan Alexe, École Polytechnique, Paris, France and Polytechnic University of Bucharest, Romania.
• April 2003 - July 2003 (Bell Laboratories, Lucent Technologies)
  Co-supervised with Arnaud Sahuguet the 4th year projects of Guillaume Giraud, Nicola Onose and Nicolas Pombourq, École Polytechnique, Paris, France. Nicolas Pombourq and Guillaume Giraud received the first prize of their class for their work (out of 60 students).
• January 2000 - September 2000 (Conservatoire National des Arts et Métiers Paris (C.N.A.M.))
  Co-supervised with Bernd Amann the Diploma Thesis of Stephane Radicevic.
  Dissertation: “Interfaces for conceptual schemas resulting from the integration of ontologies and thesauri”
• April 2000 - September 2000 (Conservatoire National des Arts et Métiers Paris (C.N.A.M.))
  Co-supervised with Michel Scholl and Bernd Amann the Master Thesis Project of Sandrine Lafois, SIR PE-CNAM-TELECOM, École doctorale EDITE.
  Dissertation: “Implementation and comparison of data structures for the querying of voluminous hierarchies”

Teaching Activities

• September 2001 - August 2002: Teaching and Research Position (Attachée Temporaire d’Enseignement et de Recherche)
  Conservatoire National des Arts et Métiers Paris (C.N.A.M.), France
  Responsible for the preparation and presentation of teaching assignments of the Database Systems course.
  Responsibles: Profs. Bernd Amann, Michel Scholl, and Dan Vodislav

• September 2000 - August 2001: Teaching and Research Position (Attachée Temporaire d’Enseignement et de Recherche)
  Institut d’Informatique d’Entreprise (IIE-CNAM), Paris, France

  – September 2000 - December 2000: Responsible for the preparation and presentation of teaching assignments of the Automated Information Systems course. Responsibles: Profs. A. Cabanes and X. Castellani
  – September 2000 - May 2001: Responsible for the preparation and presentation of teaching assignments of the Database Systems course. Responsible: Prof. Mireille Jouve
  – January 2001 - April 2001: Responsible for the preparation and presentation of teaching assignments of the Advanced Database Systems course. Responsible: Prof. Mireille Jouve
  – April and May 2001: Seminars on XML and related technologies. Responsible: Prof. Mireille Jouve

• February 2000 - June 2000: Teaching Assistant
  Conservatoire National des Arts et Métiers Paris (C.N.A.M.), France
  Responsible for the presentation of teaching assignments of the Database Systems course.
  Responsibles: Profs. Michel Scholl, Bernd Amann and Dan Vodislav
• January 1996 - June 1996: Teaching Assistant
  Computer Science Department, University of Crete, Greece
  Supervision of students’ projects in the Information Systems course (post-graduate course).
  Responsible: Prof. Panos Constantopoulos
• January 1995 - June 1995: Teaching Assistant
  Computer Science Department, University of Crete, Greece
  Supervision of students’ projects in the Information Systems course (post-graduate course).
  Responsible: Prof. Panos Constantopoulos
• September 1994 - December 1994: Teaching Assistant
  Computer Science Department, University of Crete, Greece
  Responsible for the presentation and correction of the teaching assignments for the Database Systems course (undergraduate course).
  Responsible: Prof. Sarantos Kapidakis

Technical Skills

• Operating Systems: Unix, Linux, Microsoft Windows.
• Query Languages: XQuery 1.0, XPath 1.0 and 2.0, XQuery Full-Text, RQL, OQL, SQL.
• Technologies: XML-Spy, Apache Software (Xerces, Xalan), Apache Web Server, Apache Tomcat.
• Data Modelling: UML, Description Logics, TELOS, ER.

• Programming Languages: C, Java, JavaCC (Java Compiler Compiler), O2C, Pascal, Fortran, Lex, Yacc, PHP (Hypertext Processor), LINDA (Parallel Programming).

• Databases: DB2, Oracle, MySQL, PostgreSQL, O2, TimesTen.

Languages

English (Fluent), French (Fluent), Greek (native speaker).

Citizenship

Greek (by birth).

Appendix

Ph.D. Thesis: Querying and Integration of XML resources for Web Communities

In this thesis we have studied the problem of querying and integration of heterogeneous and autonomous XML resources. Our contribution is twofold: first, we have examined the problem of the construction of metadata schemas by the integration of ontologies and structured vocabularies (thesauri). Second, we have elaborated a model for the integration and querying of XML resources using the mediator-wrapper architecture where the schema of the mediator is an ontology.

In the first part of our work we developed a methodology for the construction of metadata schemas by the integration of an ontology and of structured vocabularies (thesauri). Ontologies describe the generic structures in the domain of interest by concepts and roles. Thesauri are vocabularies of terms with precise semantics which are not well-structured. Ontologies have a double role in our model: they define a generic view of information and a structural interface over thesauri. The resulting metadata schema allows the description of a large number of different resources using the generic schema provided by the ontology and the precise semantics of thesaurus terms. The results of this research were validated by the prototype ELIOT developed in the context of a contract between CNAM-Paris and the Service de l’Inventaire of the French Ministry of Culture.

In the second part of our work, we have studied the integration and querying of heterogeneous and autonomous XML resources. Our approach, STYX, is based on the mediator-wrapper architecture where the global schema of the mediator is an ontology which is not materialized: the actual data resides in the sources. Our contributions in this context are multiple. First, we have defined a simple but expressive model for describing XML resources. The resources are described by means of mapping rules between XML fragments specified by XPath location paths and ontology paths. The use of an ontology at the mediator level (which can be perceived as a conceptual schema with symmetric and inheritance relations) and the XPath language allows one to represent a large number of XML resources. In addition, the approach of path-to-path mapping allows one to attribute specific semantics to the parent/child relationship between the nodes in an XML document.

We have developed a rewriting algorithm for tree queries, which transforms a user query formulated in terms of the ontology into one or more XQuery queries expressed in terms of the local source schemas. User queries are tree queries with no joins, restructuring or aggregation. The rewriting algorithm calculates the variable-to-rule bindings by examining each query variable in the context of its parent. The algorithm considers both the full bindings (complete answers) and the partial bindings (partial answers) between the query variables and the mapping rules. In the latter case, the query is decomposed recursively into a prefix query which is evaluated by the source and one or more suffix queries which are considered for evaluation by the remaining sources. The join operation is used to complete the partial answers obtained by the evaluation of these queries.

In this context we have addressed the problem of identification of information. The XML resources which we consider are heterogeneous and autonomous. Consequently, the assumption of persistent object identifiers is not realistic. To this end, we have introduced the notion of global keys at the ontology level: a key is a set of paths which identify the instances of a concept. Each source can in this way control the fragment identifiers exported by the mapping rules associated with the key paths. The STYX rewriting algorithm considers the presence of keys for the decomposition of the queries and the construction of the results.

Results of this work appeared in: [1,2,7,8,9,10,17,18,19,20].