A Computational Framework for Identity and Its Web-Based
Total Page:16
File Type:pdf, Size:1020Kb
William Nick et al. MAICS 2017 pp. 33–40 A Computational Framework for Identity and Its Web-based Realization William Nick Emma Sloan Hannah Foster Albert Esterline Madison Mccotter Brown University James Mayes North Carolina A&T State U. Siobahn Day Greensboro, NC 27411 North Carolina A&T State U. Greensboro, NC 27411 Marguerite McDaniel [email protected] Greensboro, NC 27411 [email protected] North Carolina A&T State U. [email protected] Greensboro, NC 27411 [email protected] {wmnick, mumccott, scday, mamcdan2} @aggies.ncat.edu Abstract and Dempster-Shafer theory (used to handle uncertainty and collaborating and conflicting evidence). Section 3 presents This paper presents a computational framework for identity our running example, Section 4 presents our ontologies, and and is particularly focused on identifying the culprit in a crime-scene investigation. A case is conceptualized as a con- Section 5 summarizes the encoding of our examples in RDF. stellation of situations in the sense of Barwise’s situation the- Section 6 discusses the SWRL (Semantic Web Rule Lan- ory. Data on a case is stored as RDF triples in a triple store. guage) rules that complement our ontologies. Section 7 ad- Several relevant OWL ontologies have been developed and dresses evidence in the legal sense and the importance of supplemented with SWRL rules. Uncertainty and combin- certain objects, viz., biometric artifacts, that persist across ing levels of (possibly conflicting) evidence are handled with situations. Section 8 presents our application of Dempster- Dempster-Shafer theory. A webpage is being developed to Shafer theory, Section 9 outlines a functional design of our make available to students of criminal justice the results of webpage, and Section 10 sketches the ongoing implemen- our work. The user will be able to query about evidence and tation of our web-based system. Section 11 concludes and follow how it accrues to various hypotheses. suggests future work. Our framework is used in a way compatible with con- 1 Introduction temporary crime-scene investigation, with most information We are developing a computational framework for identity, manually encoded as RDF triples and possibly automated and, with our focus on criminal investigations, we are in- encoding of documents. We rely on human perception ex- tegrating ontologies and (to combine levels of evidence) cept for biometric matching. Ontologies, which capture ex- Dempster-Shafer theory into our system. A particularly im- pert knowledge and conventional practice (there is no ma- portant goal of this project is a web interface to this system chine learning), constrain the encoding and support infer- for learning purposes. This webpage will allow a student to ence. Dempster-Shafer theory reveals how evidence com- query information regarding criminal investigations and fol- bines and provides guidance even when evidence is weak. low how evidence accrues to various hypotheses about the For other recent presentations of our framework, see (Mc- identity of the culprit. Daniel et al. 2017a) and (McDaniel et al. 2017b) regarding The SuperIdentity project is the state-of-the-art in frame- ontologies and (Sloan et al. 2016) and (Sloan et al. 2017) works for identity (Creese et al. ). It starts with some known regarding application of Dempster-Shafer theory. information or element of identity, such as a username or email address, and transforms that element into others, e.g., 2 Background by looking up an email address to find the associated user- name. These elements are grouped by type (e.g., phone num- 2.1 Situation Theory ber) into characteristics, multisets of elements. The set of all Our computational framework is based on the situation the- characteristics is a person’s superidentity, a compilation of ory of Barwise and Perry (Barwise and Perry 1983), espe- all known information on them. Our framework covers all cially as systematized by Devlin (Devlin 1995). According aspects of the SuperIdentity framework but from a situation- to Barwise, “‘[s]ituation’ is our name for those portions of theory perspective. We assemble constellations of situations reality that agents find themselves in, and about which they (in the technical sense of Barwise and Perry (Barwise and exchange information" (Barwise 1989). A situation supports Perry 1983)) to produce a case as in the legal sense, provid- elementary items of information, called infons, each essen- ing more structure and provenance than provided by superi- tially a relation among objects at a time and place (or pos- dentities. sibly the lack of such a relation). A real situation supports The remainder of this paper is organized as follows. Sec- an indefinite number of infons. We generally work with ab- tion 2 presents background: situation theory (the theoreti- stract situations, each supporting a finite number of (possi- cal background for our representations), Semantic Web re- bly parameterized) infons. An abstract situation amounts to sources (our OWL ontologies serve as knowledge bases, and a type under which real situations are classified. data is stored as RDF triples conforming to our ontologies), There are constraints between situations, as expressed, for 33 A Computational Framework for Identity and Its Web-based Realization pp. 33–40 example, by “smoke means fire." By virtue of constraints, SPARQL (Pérez, Arenas, and Gutierrez 2006) is a SQL- one situation may carry information about another. By virtue like query language for triple stores. The WHERE clause con- of conventional (linguistic) constraints, an utterance situa- tains a pattern of triples that will be matched by the query tion, in which someone performs a (declarative) speech act, engine. Query output is what is bound to the variables in the carries information about a described situation. Whether the SELECT clause. Various applications allow one to infer new speech act is felicitous may depend on resource situations triples from those present in the triple store via connections related by conventions to the utterance situation; in a purely captured in the OWL ontologies. For additional inference linguistic setting, such situations typically support infons ex- patterns, the ontologies can be supplemented with rules in pressed by relative clauses. the Semantic Web Rule Language (SWRL). These are if- In our framework, an id-action takes place in what we then rules that use the concepts expressed in the ontologies. call an id-situation. Any id-action is considered an asser- tion of identity even if it is not verbal. So an id-situation 2.3 Dempster-Shafer Theory is an utterance situation, and the crime scene is the corre- Dempster-Shafer (DS) theory is a justification-based way of sponding described situation. Supporting situations essential distributing evidence (Halpern 2005). “Mass" of evidence to crime-scene evidence (e.g., those where suspects’ finger- distributes to sets of elements or outcomes, with unassigned prints were recorded) are resource situations: there are con- mass, representing ignorance, given to the set of all ele- ventional constraints requiring the existence of properly exe- ments, called the frame of discernment. This assigns masses cuted situations for the evidence to be admissible. Together, sum to 1.0. A focal element is a set with non-zero mass. A the id-situation, the described situation (crime scene), and refinement is the analysis on the frame of discernment to get the resource situations make an id-case. a more detailed frame of discernment. Given a mass function, the belief function for a set is 2.2 Semantic Web Resources the lower bound for likelihood, calculated by adding the The semantic web is built off of two W3C standards (Pan masses of all subsets of the set, while the plausibility is 2009): the resource description framework (RDF) [ref] and the upper bound for the likelihood, calculated by adding the resource description framework schema (RDFS). The the masses from all of the sets that overlap the set. In web ontology language (OWL) extends the expressiveness symbols, for a frame of discernment ⇥, a mass function of RDFS and allows for the creation of ontologies. “Ontol- m, and any subset ✓ of ⇥, Bel(✓)= ✓ ✓ m(✓⇤) and ⇤✓ ogy" is a term borrowed from philosophy, where it means Plaus(✓)= m(✓⇤). Thus, for any ✓ ⇥, ✓⇤ ✓= P ✓ the conceptualization of entities in the world and how they Bel(✓) Plaus(✓)\. 6 ; interact with each other, but, in computer science, it denotes DS theory allowsP for the combination of multiple mass a conceptualization of a domain. functions for different kinds of evidence to produce a new RDF is a W3C recommendation that allows for the anno- mass function that relates to the combined evidence. There tation of web resources. RDF statements (known as triples) are a number of combination rules that fit different types of are in the form of subject predicate object, where predicate data better. For example, Dempster’s rule combines pieces of is a binary relation. A resource (thing) is denoted in RDF by evidence that are equally reliable while preserving the uncer- a uniform resource identifiers (URI), a string unique across tainty inherent within each piece of evidence. Specifically, the web. A URI reference (URIref) is a URI with an optional Dempster’s rule calculates a measure, K, of conflict between fragment identifier at the end. A URIref is usually repre- the mass functions and divides that measure as mass among sented as a Qname, pre:lp, where pre is a URI (essentially the different focal elements, including the focal element a namespace pefix) and lp is the local part. A blank node that is the entire frame of discernment. In symbols, with (bnode) is a resource that is not identified by a URIref, func- K = m (B)m (C) for mass functions m and tioning like a pronoun. One RDF serialization defined by B C=✓ 1 2 1 m and focal\ 6 elements B and C.