Steven Minton, InferLink Corporation
Sofus Macskassy, Fetch Technologies
Peter LaMonica, Air Force Research Laboratories
Kane See, InferLink Corporation
Craig Knoblock, USC/Information Sciences Inst.
Greg Barish, Fetch Technologies
Matthew Michelson, Fetch Technologies
Ray Liuzzi, Raymond Technologies

Steve Minton, InferLink
Steven Minton, Stanford University
Stephen Minton, Brain Surgeon
Steve Minton, Fetch Technologies
Steven Minton, convicted felon
Steven Minton, Jonosboro High School
Steven Minton, JAIR

• Application domain: Arms trafficking
• Entity Intelligence Portal (ENTEL)
• Entity resolution process
• Mistakes: Maintaining referential integrity

[Figure: sources feeding a Web Monitoring System: AIJ, JAIR, ICML, AAAI, IEEE Intelligent Systems, Grants.gov]

[Figure: sources feeding a Web Monitoring System: NASA, US Forest Service, Twitter, National Interagency Fire Center, InciWeb.org]

[Figure: sources feeding a Web Monitoring System: Airliners.net, Banned Airlines, Aviation Week, Twitter, ATWonline, Air Cargo News, Aviation Safety Network]

Charged with conspiracy to support a terrorist organization, money laundering, …

Omega Aircompany; Irbis Air; Ishtar Airlines; Mega Airlines; Aerocom; WING AIR; Norse Air Charter; Air Cess; Galaxy Air; Air Foyle; Centrafricain Airlines; Click Airways; Air Bas; Air Pass; Anikay (Anikai) Airlines; Pietersberg Aviation Services Systems; Santa Cruz Imperial; Great Lake Business Company; Balkh Airlines; Phoenix Aviation; Dolphin Air; Air Zory; JetLine International; Flying Dolphin; Sitrat Air; MaxAvia; San Air General Trading; Air Mero; African Express; Air Leone; Inter Transavia

Registration   Aircraft Type   Construction Nbr   Previous Reg.
UN-75002       Ilyushin 18E    185008603          3C-KKR
UN-75003       Ilyushin 18V    184006903          3C-KKJ
UN-75004       Ilyushin 18D    186009202          3C-KKK
UN-75005       Ilyushin 18D    187010204          3C-KKL
UN-11007       Antonov 12B     9346509            3C-OOZ

Sighting and Markings entries, in source order: SHJ 11May03, no markings; SHJ 04Nov03, a/w, n/t; SHJ 12Oct03, blue tail, no m/s; SHJ 14Sep02, green cheatline and blue tail; SHJ 04Nov03, No t/t, blue tail; SHJ 28Dec03, all white; SHJ 04Nov03; SHJ Oct02, No m/s; SHJ 11May03, all white c/s; DXB 12Oct03, no titles
[From Ruudleeuw.com]

[Figure: system diagram with five numbered components. Labels: Web; Source A, Source B, Source C, Source D; Fetch Agent Platform™ (web harvesting); Fact Extraction (entities, facts, relations from unstructured text), with OpenCalais and Semantex; Entitybase™ (entity resolution); Analytics Engine and Content Store, with Social Network and WatchLists; GUI. Data flows: Text, Facts, Entity IDs, Entity-Resolved Facts]

• Entity resolution: Link incoming records describing the same entity from multiple sources
  R. Landis, President, Fetch Technologies
  Robert Landes, CEO, Fetch Software
  R. Land, CEO, French Alliance Technologies
• Many “common sense” issues, for instance:
  ▪ Multiple formats for names, addresses, etc.
    ▪ R.L. Landes vs. Robert Landes
  ▪ Noisy, incorrect values
    ▪ Landes vs. Landis
  ▪ Multi-valued attributes
    ▪ Landes can be both President and CEO
  ▪ Aliases and Deception

[Figure: clusters E1 to E7 and a New Record. Labels: "Cluster is a single entity"; "Composed of multiple data records"; "Confidence Threshold"]

Incoming record: Robert Landes, CEO, Fetch Tech

E1: R. Landis, President, Fetch Tech
Transformations
  Initial: Robert → R.
  Spelling: Landes → Landis
  Title alias: CEO → President

E2: R. Land, President, French Tech
Transformations
  Spelling: Land → Landis
  Spelling: French → Fetch

For the record D (Robert Landes, CEO, Fetch Tech) and candidate E1 (R. Landis, President, Fetch Tech):

\[ P(E_1 \mid D) = \frac{P(E_1)\, P(D \mid E_1)}{P(D)} \]

For candidate E2 (R. Land, President, French Tech):

\[ P(E_2 \mid D) = \frac{P(E_2)\, P(D \mid E_2)}{P(D)} \]

For a new entity E_new:

\[ P(E_{new} \mid D) = \frac{P(E_{new})\, P(D \mid E_{new})}{P(D)} \quad ? \]

• Merge example:
  ▪ Air Cess and Air Bas aircraft
• Split example:
  ▪ George H. W. Bush and George W. Bush

[Figure sequence: EntityBase containing clusters E1, E2, E3, E4, E5, E6, E10; then E2 marked "?" with E1 no longer shown; then only E3, E4, E5, E6, E10; then the remaining clusters shown with their member records D1 to D13]

[Figure: EntityBase clusters and their records next to an Analytics WatchList whose entries point to records: Kartiga Air (D9), Merpati Airlines (D11), Air Cess (D138), ….]

[Figure sequence: EntityBase publishes Merges/Splits to an Analytics “Social” Network whose nodes are entity IDs, including E1, E2, E6, E9, E10, E15, E34, E91, E200]

• Two approaches:
  ▪ Refer-by-Description
    ▪ Indirect reference: Point to a cluster member
    ▪ Advantage: Easy, no synchronization necessary
    ▪ …But limits the information that the client can cache
  ▪ Refer-by-Identifier
    ▪ Direct reference: Cluster ID
    ▪ Advantage: Client can cache arbitrary information
    ▪ …But the client must sync with EntityBase and maintain consistency

[Figure: several Clients and Data Sources connected to an Entity Resolution Service]

• Vision: Entity Resolution in a decentralized world
• E.g., the Semantic Web (Glaser, Jaffri & Millard, 2009)

• Entity resolution can be hard: “AI Complete”
  ▪ Arms trafficking domain
• Entity merges and splits will occur
• Entity resolution clients must be designed to deal with this
• Two strategies: Refer-by-Description and Refer-by-Identifier
• System status: Being evaluated by AF personnel
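
The Bayesian comparison of an incoming record against candidate clusters can be illustrated with a small Python sketch. Everything numeric here is an assumption made for illustration: the transformation probabilities, the priors, the likelihood assigned to the "new entity" hypothesis, and the threshold value are invented, and the slides do not say how EntityBase estimates them. The sketch also assumes the listed hypotheses are exhaustive and mutually exclusive, so that P(D) is the sum of the joint terms, and that transformations on different attributes are independent.

```python
from math import prod

# Hypothetical probabilities that each kind of transformation occurs
# between a stored entity and an observed record (illustrative values).
TRANSFORM_PROB = {
    "exact": 0.9,
    "initial": 0.6,
    "spelling": 0.3,
    "title_alias": 0.5,
}


def likelihood(transforms):
    """P(D | E) as a product over per-attribute transformations (assumed independent)."""
    return prod(TRANSFORM_PROB[t] for t in transforms)


def posteriors(hypotheses):
    """Map name -> (prior, likelihood) to name -> P(E | D) via Bayes' rule."""
    joint = {name: prior * like for name, (prior, like) in hypotheses.items()}
    evidence = sum(joint.values())  # P(D), assuming the hypotheses are exhaustive
    return {name: j / evidence for name, j in joint.items()}


def assign(hypotheses, threshold):
    """Return the most probable hypothesis if it clears the confidence threshold."""
    post = posteriors(hypotheses)
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post


# Record D: "Robert Landes, CEO, Fetch Tech" (first name, surname, title, company).
hypotheses = {
    # E1: "R. Landis, President, Fetch Tech"
    "E1": (0.4, likelihood(["initial", "spelling", "title_alias", "exact"])),
    # E2: "R. Land, President, French Tech"
    "E2": (0.4, likelihood(["initial", "spelling", "title_alias", "spelling"])),
    # New entity: likelihood set by hand; how to estimate it is left open.
    "E_new": (0.2, 0.01),
}

choice, post = assign(hypotheses, threshold=0.7)
```

With these made-up numbers E1 is selected, because its single exact match on the company name outweighs the extra spelling change needed for E2. Returning `None` when no hypothesis clears the threshold is one possible policy (for example, deferring the decision); the slides only name the confidence threshold and do not specify what happens below it.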
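
The difference between Refer-by-Description and Refer-by-Identifier under a merge can be sketched in Python as follows. The class and method names (`EntityBase`, `resolve`, `merge`, `split`, `sync`) are hypothetical and are not the real EntityBase interface; record and cluster IDs are placeholders. The policy of copying cached data to both clusters after a split is also an assumption, chosen only to keep the example short.

```python
class EntityBase:
    """Toy stand-in for an entity resolution service (hypothetical API)."""

    def __init__(self):
        self.cluster_of = {}  # record ID -> cluster ID
        self.log = []         # published merge/split events

    def add(self, record_id, cluster_id):
        self.cluster_of[record_id] = cluster_id

    def resolve(self, record_id):
        return self.cluster_of[record_id]

    def merge(self, keep, drop):
        # Move every record of the dropped cluster into the kept one.
        for rec, cid in list(self.cluster_of.items()):
            if cid == drop:
                self.cluster_of[rec] = keep
        self.log.append(("merge", keep, drop))

    def split(self, old, new, moved_records):
        for rec in moved_records:
            self.cluster_of[rec] = new
        self.log.append(("split", old, new))


class DescriptionClient:
    """Refer-by-Description: store a cluster member, resolve it on demand."""

    def __init__(self, eb):
        self.eb = eb
        self.watch = []  # record IDs, never cluster IDs

    def watched_clusters(self):
        return {self.eb.resolve(r) for r in self.watch}


class IdentifierClient:
    """Refer-by-Identifier: cache data by cluster ID and replay published events."""

    def __init__(self, eb):
        self.eb = eb
        self.notes = {}  # cluster ID -> set of cached annotations
        self.seen = 0    # number of events already applied

    def sync(self):
        for kind, a, b in self.eb.log[self.seen:]:
            if kind == "merge":   # fold the dropped cluster's data into the kept one
                self.notes.setdefault(a, set()).update(self.notes.pop(b, set()))
            elif a in self.notes:  # split: assumed policy, copy data to both clusters
                self.notes.setdefault(b, set()).update(self.notes[a])
        self.seen = len(self.eb.log)


eb = EntityBase()
eb.add("D1", "E1")  # e.g. a record about one airline
eb.add("D2", "E2")  # e.g. a record about another airline

by_desc = DescriptionClient(eb)
by_desc.watch.append("D2")
by_id = IdentifierClient(eb)
by_id.notes["E2"] = {"on watchlist"}

eb.merge(keep="E1", drop="E2")

desc_result = by_desc.watched_clusters()  # follows the merge with no extra work
stale_before_sync = "E2" in by_id.notes   # cached key still names the dropped cluster
by_id.sync()
```

The description-based client needs no synchronization but holds nothing beyond the record reference, while the identifier-based client can cache arbitrary annotations at the cost of replaying every published merge and split.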