A Solution to the Object-Relational Mismatch

CazDataProvider: A solution to the object-relational mismatch

Miguel Esteves
Universidade do Minho, Escola de Engenharia, Departamento de Informática
Master's dissertation, Master's in Informatics Engineering (Mestrado em Engenharia Informática)
Work carried out under the supervision of Professor José Creissac Campos
October 2012

To my parents...

"Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice."
Christopher Alexander

Abstract

Today, most software applications require mechanisms to store information persistently. For decades, Relational Database Management Systems (RDBMSs) have been the most common technology to provide efficient and reliable persistence. Due to the object-relational paradigm mismatch, object-oriented applications that store data in relational databases have to deal with Object Relational Mapping (ORM) problems. Since the emergence of new ORM frameworks, there has been an attempt to lure developers into a radical paradigm shift. However, developers still often have trouble finding the best persistence mechanism for their applications, especially when they have to cope with legacy database systems.

The aim of this dissertation is to discuss the persistence problem in object-oriented applications and find the best solutions. The main focus lies on the ORM limitations, patterns, technologies and alternatives.

The project supporting this dissertation was implemented at Cachapuz under the Project Global Weighting Solutions (GWS). Essentially, the objectives of GWS were centred on finding the optimal persistence layer for CazFramework, mostly providing database interoperability with close-to-Structured Query Language (SQL) querying. Therefore, this work provides analyses of ORM patterns, frameworks, and alternatives to ORM such as Object-Oriented Database Management Systems (OODBMSs). It also describes the implementation of CazDataProvider, a .NET library tool providing database interoperability and dynamic query features. In the end, there is a performance comparison of all the technologies debated in this dissertation. The result of this dissertation provides guidance for adopting the best persistence technology or implementing the most suitable ORM architectures.

Key Words: ORM, SQL, RDBMS, Domain Model, ADO.NET, NHibernate, Entity Framework (EF).

Resumo (Summary)

Today, most applications require mechanisms to store information persistently. For decades, RDBMSs have been the most common technology for providing efficient and reliable persistence. Due to the object-relational paradigm mismatch, object-oriented applications that store data in relational databases have to deal with the problems of ORM. Since the emergence of new ORM frameworks, there has been an attempt to attract programmers to a radical paradigm shift. However, they still often have difficulty finding the best persistence mechanism for their applications, especially when they have to deal with legacy databases.

The aim of this work is to discuss the persistence problem in object-oriented applications and find the best solutions. The main focus is on the limitations, patterns and technologies of ORM, as well as its alternatives.

The project supporting this dissertation was implemented at Cachapuz within the scope of the GWS Project. Essentially, the objectives of GWS were centred on finding the ideal persistence layer for CazFramework, mainly by providing database interoperability and SQL queries. Therefore, this work provides analyses of patterns, frameworks and alternatives to ORM such as OODBMSs. It also describes the implementation of CazDataProvider, a .NET library that provides database interoperability and dynamic queries. In the end, there is a performance comparison of all the technologies discussed in this dissertation. The result of this work provides guidance for adopting the best persistence technology or implementing the most suitable ORM architectures.

Key Words: ORM, SQL, RDBMS, Domain Model, ADO.NET, NHibernate, EF.

Acknowledgements

It is with immense gratitude that I acknowledge the support and help of Prof. Dr. José Creissac Campos, who managed to carefully review my dissertation even on a tight schedule. I also thank Prof. Dr. António Nestor Ribeiro for some solid and experienced advice.

To my parents and girlfriend, who have been very patient with me.

I would like to thank Cachapuz for the scholarship, and especially Eduardo Pereira for considering and helping the development of my work.

Finally, I thank my friends, particularly my colleague Ricardo Santos, for all the discussions and ideas we debated together at Cachapuz.

Contents

1 Introduction
  1.1 Context of Work
  1.2 Persistence Problem
  1.3 Objectives
  1.4 Structure of Dissertation
2 Object Relational Mapping Theory
  2.1 Object Paradigm
  2.2 Relational Paradigm
  2.3 ORM as a Paradigm
  2.4 The ORM Commitment
    2.4.1 Inheritance
      2.4.1.1 Table-per-class
      2.4.1.2 Table-per-concrete-class
      2.4.1.3 Table-per-class-family
    2.4.2 Associations
    2.4.3 Schema complications
    2.4.4 OID (object identity)
    2.4.5 Data retrieval
    2.4.6 Partial-Object dilemma and Load Time trap
    2.4.7 Transparent Persistence
  2.5 Alternatives to ORM
  2.6 Conclusions
3 Design Patterns for ORM
  3.1 Domain Logic Patterns
    3.1.1 Transaction Script
    3.1.2 Table Module
    3.1.3 Domain Model
    3.1.4 Making a Decision
  3.2 Data Source Architectural Patterns
    3.2.1 Table Data Gateway
    3.2.2 Active Record
    3.2.3 Data Mapper
  3.3 Object-Relational Behavioural Patterns
    3.3.1 Unit of Work
    3.3.2 Identity Map
    3.3.3 Lazy Load
  3.4 Object-Relational Metadata Mapping Patterns
    3.4.1 Metadata Mapping
    3.4.2 Query Object
    3.4.3 Repository
  3.5 Conclusions
4 Object Relational Mapping Frameworks
  4.1 Entity Framework
    4.1.1 Unit of Work
    4.1.2 Optimistic Locking
    4.1.3 Code Customization
    4.1.4 POCOs
    4.1.5 Testing EF
      4.1.5.1 Configuration and model testing
      4.1.5.2 Basic querying
      4.1.5.3 Eager and deferred load
    4.1.6 Dynamism in EF
  4.2 NHibernate
    4.2.1 Unit of Work
    4.2.2 Optimistic Locking
    4.2.3 Lazy Load
    4.2.4 Code Customization: Audit Logging
    4.2.5 Testing NHibernate
      4.2.5.1 Simple Load and Identity Map
      4.2.5.2 Linq-to-NHibernate join query examples
      4.2.5.3 Lazy Collections examples
      4.2.5.4 Cascading Delete operations
      4.2.5.5 HQL examples
      4.2.5.6 Dynamic LINQ
      4.2.5.7 Database Synchronization
    4.2.6 Dynamism in NHibernate
  4.3 Conclusions
5 Implementation of CazDataProvider
  5.1 Analysis of ClassBuilder
    5.1.1 Relational Domain Model
    5.1.2 Data Mapper
    5.1.3 Unit of Work and Optimistic Locking
    5.1.4 Audit Logging
    5.1.5 Conclusions
  5.2 .NET Data Providers
  5.3 Designing an Architecture
    5.3.1 Solution 1: Data Context Facade and Factory
    5.3.2 Solution 2: Provider Factory
    5.3.3 Solution 3: Abstract Provider Factory
    5.3.4 Solution 4: Provider Factory with Subclassing
    5.3.5 Solution 5: Provider Factory with Template Method
    5.3.6 Solution 6: Query Object
  5.4 Implementation