Steven Minton, InferLink Corporation
Sofus Macskassy, Fetch Technologies
Peter LaMonica, Air Force Research Laboratories
Kane See, InferLink Corporation
Craig Knoblock, USC/Information Sciences Inst.
Greg Barish, Fetch Technologies
Matthew Michelson, Fetch Technologies
Ray Liuzzi, Raymond Technologies

Steve Minton, InferLink
Steven Minton, Stanford University
Stephen Minton, Brain Surgeon
Steve Minton, Fetch Technologies
Steven Minton, convicted felon
Steven Minton, Jonosboro High School
Steven Minton, JAIR

• Application domain: Arms trafficking
• Entity Intelligence Portal (ENTEL)
• Entity resolution process
• Mistakes: Maintaining referential integrity

AIJ, JAIR, ICML, AAAI, IEEE Intelligent Systems, Grants.gov

Web Monitoring System
NASA, US Forest Service, Twitter, National Interagency Fire Center, InciWeb.org

Web Monitoring System
Airliners.net, Banned Airlines, Aviation Week, Twitter, ATWonline, Air Cargo News, Aviation Safety Network

Web Monitoring System

Charged with conspiracy to support a terrorist organization, money laundering, ….,

Omega Aircompany, Irbis Air, Ishtar Airlines, Mega Airlines, Aerocom, WING AIR, Norse Air Charter, Air Foyle, Centrafricain Airlines, Air Bas, Air Pass, Anikay (Anikai) Airlines, Pietersberg Aviation Services Systems, Santa Cruz Imperial, Great Lake Business Company, Balkh Airlines, Air Zory, JetLine International, Flying Dolphin, Sitrat Air, MaxAvia, San Air General Trading, Air Mero, African Express, Air Leone, Inter Transavia

Registration  Aircraft Type  Construction Nbr  Previous Reg.  Sighting     Markings
UN-75002      Ilyushin 18E   185008603         3C-KKR         SHJ 11May03  no markings
                                                              SHJ 04Nov03  a/w, n/t
UN-75003      Ilyushin 18V   184006903         3C-KKJ         SHJ 12Oct03  blue tail, no m/s
UN-75004      Ilyushin 18D   186009202         3C-KKK         SHJ 14Sep02  green cheatline and blue tail
                                                              SHJ 04Nov03  No t/t, blue tail
                                                              SHJ 28Dec03  all white
UN-75005      Ilyushin 18D   187010204         3C-KKL         SHJ 04Nov03
UN-11007      Antonov 12B    9346509           3C-OOZ         SHJ Oct02    No m/s
                                                              SHJ 11May03  all white c/s
                                                              DXB 12Oct03  no titles

[From Ruudleeuw.com]

[Figure: system architecture. Components: Web sources (Source A, Source B, Source C, Source D); Fetch Agent Platform™ (web harvesting); Fact Extraction (entities, facts, relations from unstructured text; OpenCalais, Semantex); Entitybase™ (entity resolution); Content Store (entity-resolved text and facts); Analytics Engine and GUI (Social Network, WatchLists). Labeled flows: Text, Facts, Entity IDs.]

• Entity resolution: Link incoming records describing the same entity from multiple sources

    R. Landis, President, Fetch Technologies
    Robert Landes, CEO, Fetch Software
    R. Land, CEO, French Alliance Technologies

• Many “common sense” issues, for instance:
  ▪ Multiple formats for names, addresses, etc.
    ▪ R.L. Landes vs. Robert Landes
  ▪ Noisy, incorrect values
    ▪ Landes vs. Landis
  ▪ Multi-valued attributes
    ▪ Landes can be both President and CEO
  ▪ Aliases and deception

A cluster is a single entity, composed of multiple data records.

[Figure: a new record is compared against the existing clusters E1 to E7, subject to a confidence threshold]

New record: Robert Landes, CEO, Fetch Tech

Match against E1 (R. Landis, President, Fetch Tech). Transformations:
  Initial: Robert → R.
  Spelling: Landes → Landis
  Title alias: CEO → President

Match against E2 (R. Land, President, French Tech). Transformations:
  Spelling: Land → Landis
  Spelling: French → Fetch
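The transformations above (initial, spelling, title alias) can be illustrated with a minimal Python sketch. It is not the EntityBase implementation: the field names, the alias table, the 0.7 similarity cutoff, and the use of `difflib.SequenceMatcher` as a stand-in for a spelling model are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical alias table; the slides give only CEO -> President.
TITLE_ALIASES = {frozenset(("CEO", "President"))}


def is_initial(full: str, short: str) -> bool:
    """True if `short` is the initial of `full`, e.g. Robert -> R."""
    return len(short) == 2 and short[1] == "." and full.startswith(short[0])


def is_spelling_variant(a: str, b: str, threshold: float = 0.7) -> bool:
    """True if two strings are similar enough; the cutoff is an arbitrary choice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def transformations(new: Dict[str, str], old: Dict[str, str]) -> List[str]:
    """List the transformations that map fields of `new` onto fields of `old`."""
    found = []
    for field, a in new.items():
        b = old[field]
        if a == b:
            continue  # identical values need no transformation
        if field == "first" and is_initial(a, b):
            found.append(f"Initial: {a} -> {b}")
        elif field == "title" and frozenset((a, b)) in TITLE_ALIASES:
            found.append(f"Title alias: {a} -> {b}")
        elif is_spelling_variant(a, b):
            found.append(f"Spelling: {a} -> {b}")
        else:
            found.append(f"Unexplained: {a} vs {b}")
    return found


new_record = {"first": "Robert", "last": "Landes", "title": "CEO", "org": "Fetch Tech"}
e1_record = {"first": "R.", "last": "Landis", "title": "President", "org": "Fetch Tech"}
e2_record = {"first": "R.", "last": "Land", "title": "President", "org": "French Tech"}

print(transformations(new_record, e1_record))
print(transformations(new_record, e2_record))
```

The list of transformations needed to reach a cluster member is what a scoring model can then weigh: a match to E1 needs no change to the organization, while a match to E2 also needs a spelling change there.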

New record D: Robert Landes, CEO, Fetch Tech

E1 (R. Landis, President, Fetch Tech):
$$P(E_1 \mid D) = \frac{P(E_1)\, P(D \mid E_1)}{P(D)}$$

E2 (R. Land, President, French Tech):
$$P(E_2 \mid D) = \frac{P(E_2)\, P(D \mid E_2)}{P(D)}$$

Enew (a new entity)?
$$P(E_{new} \mid D) = \frac{P(E_{new})\, P(D \mid E_{new})}{P(D)}$$
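As a worked illustration of the Bayes-rule scoring above, the sketch below normalizes over three hypotheses (E1, E2, Enew) and applies a confidence threshold. All priors, likelihoods, and the 0.8 threshold are made-up numbers, and treating the three hypotheses as exhaustive (so that P(D) is the sum of the joint terms) is an assumption, not something the slides state.

```python
from typing import Dict


def posteriors(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(E|D) = P(E) P(D|E) / P(D), with P(D) = sum over E of P(E) P(D|E)."""
    joint = {e: priors[e] * likelihoods[e] for e in priors}
    evidence = sum(joint.values())  # P(D), assuming the hypotheses are exhaustive
    return {e: p / evidence for e, p in joint.items()}


# Illustrative numbers only; they do not come from the slides.
priors = {"E1": 0.4, "E2": 0.4, "Enew": 0.2}
likelihoods = {"E1": 0.30, "E2": 0.02, "Enew": 0.05}  # P(D | E)

post = posteriors(priors, likelihoods)
best = max(post, key=post.get)

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff
if post[best] >= CONFIDENCE_THRESHOLD:
    decision = f"assign record to {best}"
else:
    decision = "defer: no hypothesis is confident enough"

print({e: round(p, 3) for e, p in post.items()})
print(decision)
```

With these numbers the record is assigned to E1; lowering the likelihood for E1 or raising the prior for Enew would push the best posterior under the threshold and leave the record unassigned.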

• Merge example:
  - Air Cess and Air Bas aircraft
• Split example:
  - George H. W. Bush and George W. Bush

[Figure sequence: EntityBase clusters. First frame: E1, E2, E3, E4, E5, E6, E10. Second frame: E1 is gone and E2 is marked "?". Later frames: only E3, E4, E5, E6, E10 remain.]

[Figure: EntityBase clusters E3, E4, E5, E6, E10 shown with their member data records (D1 to D13), next to Analytics.]

[Figure: a WatchList in Analytics refers to EntityBase clusters through member records: Kartiga Air (D9), Merpati Airlines (D11), Air Cess (D138), ….]

[Figure, two frames: EntityBase publishes merges/splits to Analytics, where a “Social” Network is built over entity IDs (E1, E2, E9, E15, E34, E91, E200, …).]

• Two approaches:
  - Refer-by-Description
    ▪ Indirect reference: point to a cluster member
    ▪ Advantage: easy, no synchronization necessary
    ▪ …But limits the information that the client can cache
  - Refer-by-Identifier
    ▪ Direct reference: cluster ID
    ▪ Advantage: client can cache arbitrary information
    ▪ …But the client must sync with EntityBase and maintain consistency
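The difference between the two approaches can be sketched with a toy model. Everything here is hypothetical: the `EntityBase` class, its merge log, and the `IdentifierClient` are illustrative stand-ins rather than the real system's API, and only merges are handled (a split would need the service to say which records moved where).

```python
from typing import Dict, List, Set, Tuple


class EntityBase:
    """Toy stand-in for the entity-resolution service (hypothetical API)."""

    def __init__(self, clusters: Dict[str, Set[str]]) -> None:
        self.clusters = {cid: set(records) for cid, records in clusters.items()}
        self.merge_log: List[Tuple[str, str]] = []  # (absorbed ID, surviving ID)

    def cluster_of(self, record_id: str) -> str:
        """Refer-by-Description: find the cluster that currently holds a record."""
        for cid, records in self.clusters.items():
            if record_id in records:
                return cid
        raise KeyError(record_id)

    def merge(self, absorbed: str, survivor: str) -> None:
        """Fold one cluster into another and log the change for clients."""
        self.clusters[survivor] |= self.clusters.pop(absorbed)
        self.merge_log.append((absorbed, survivor))


class IdentifierClient:
    """Refer-by-Identifier: caches data under cluster IDs, so it must sync."""

    def __init__(self) -> None:
        self.cache: Dict[str, List[str]] = {}
        self.synced = 0  # number of merge-log entries already applied

    def sync(self, entity_base: EntityBase) -> None:
        """Re-key cached data for every merge published since the last sync."""
        for absorbed, survivor in entity_base.merge_log[self.synced:]:
            if absorbed in self.cache:
                self.cache.setdefault(survivor, []).extend(self.cache.pop(absorbed))
        self.synced = len(entity_base.merge_log)


eb = EntityBase({"E1": {"D1", "D2"}, "E2": {"D3"}})
watchlist_by_description = ["D1"]       # points at a cluster member
client = IdentifierClient()
client.cache["E1"] = ["on watchlist"]   # cached under the cluster ID

eb.merge("E1", "E2")                    # the service decides E1 and E2 are one entity

current = [eb.cluster_of(r) for r in watchlist_by_description]
stale = [cid for cid in client.cache if cid not in eb.clusters]
client.sync(eb)
print(current, stale, client.cache)
```

The description-based watchlist follows the merge with no extra work because it looks up the member record each time, whereas the identifier-based cache holds a dangling ID until `sync` replays the published merge.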

[Figure: an Entity Resolution Service connected to multiple Clients and Data Sources]

• Vision: Entity resolution in a decentralized world
• E.g., the Semantic Web (Glaser, Jaffri & Millard, 2009)
• Entity resolution can be hard: “AI Complete”
  - Arms trafficking domain
• Entity merges and splits will occur
• Entity resolution clients must be designed to deal with this
• Two strategies: Refer-by-Description and Refer-by-Identifier
• System status: Being evaluated by AF personnel