Discerning Intelligence from Text (at the UofA) Denilson Barbosa
[email protected]
Web search is changing
• … from IR-style document retrieval
• … to question-answering over entities extracted from the web
  which team did Lou Saban coach last?
  which was Lou Saban’s last team?
• The answer is the Chowan Braves, and it can be found in Lou Saban’s Wikipedia page (ranked #1), obituary (ranked #4), and so on…
in good company… An explicit answer
it is not all bad news…
Structured knowledge (harnessed from the Web)
Surface-level Relation Extraction
Example: After his departure from Buffalo, Saban returned to coach college football teams including Miami, Army, and UCF.
Pipeline: Documents → Recognize Entities → Resolve Coreferences → Split Sentences → Find Relations → Triple store
Extracted triples:
<“Lou Saban”, departed from, “Buffalo Bills”>
<“Lou Saban”, coach, “Miami Hurricanes”>
<“Lou Saban”, coach, “Army Black Knights football”>
<“Lou Saban”, coach, “University of Central Florida”>
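The extraction step on this slide can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `ALIASES` table, the `PERSON_MENTIONS` list, the function names, and the two regular-expression patterns are hypothetical stand-ins for real entity recognition, coreference resolution, and relation extraction components. It is not the system described in the talk, and it splits sentences first for simplicity.

```python
import re

# Hypothetical alias table standing in for entity recognition and linking.
ALIASES = {
    "Saban": "Lou Saban",
    "Buffalo": "Buffalo Bills",
    "Miami": "Miami Hurricanes",
    "Army": "Army Black Knights football",
    "UCF": "University of Central Florida",
}

# Hypothetical list of person mentions; stands in for coreference resolution
# (for example, linking "his" and "Saban" to the same person).
PERSON_MENTIONS = ("Saban",)


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def resolve(mention):
    """Map a surface mention to its canonical name (unchanged if unknown)."""
    mention = mention.strip()
    return ALIASES.get(mention, mention)


def find_subject(sentence):
    """Return the canonical name of the first known person in the sentence."""
    for mention in PERSON_MENTIONS:
        if re.search(rf"\b{re.escape(mention)}\b", sentence):
            return resolve(mention)
    return None


def find_relations(sentence, subject):
    """Extract <subject, predicate, object> triples with two toy patterns."""
    triples = []
    # Pattern 1: "departure from X" gives <subject, departed from, X>.
    m = re.search(r"departure from (\w+)", sentence)
    if m:
        triples.append((subject, "departed from", resolve(m.group(1))))
    # Pattern 2: "coach ... including A, B, and C" gives one triple per item.
    m = re.search(r"coach\b.*?including (.+?)\.?$", sentence)
    if m:
        for team in re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(1)):
            if team:
                triples.append((subject, "coach", resolve(team)))
    return triples


def extract(text):
    """Run the toy pipeline over a document and collect all triples."""
    triples = []
    for sentence in split_sentences(text):
        subject = find_subject(sentence)
        if subject is not None:
            triples.extend(find_relations(sentence, subject))
    return triples


SENTENCE = ("After his departure from Buffalo, Saban returned to coach "
            "college football teams including Miami, Army, and UCF.")
for triple in extract(SENTENCE):
    print(triple)
```

On the example sentence, the sketch yields the same four triples listed on the slide. Hand-written patterns like these do not generalize, which is the motivation for open relation extraction.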
From triples to a KB… ?????
• There is a very, very, very long way…
  § Predicate disambiguation into semantic “relations”…
  § Named entity disambiguation…
  § Assigning entities to classes…
  § Grouping classes into a hierarchy…
  § Ordering facts temporally…
• It would have been virtually impossible without Wikipedia
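Some of the steps listed above can be sketched in a few lines, assuming the hard parts are already solved and available as lookup tables. All names here are hypothetical: the `kb:` identifiers, the relation names `coachOf` and `leftTeam`, and the small class hierarchy are invented for illustration and do not come from any particular knowledge base. Temporal ordering is left out.

```python
# Hypothetical lookup tables; building these automatically is the hard part.
RELATION_OF = {"coach": "coachOf", "departed from": "leftTeam"}
ENTITY_ID = {
    "Lou Saban": "kb:Lou_Saban",
    "Buffalo Bills": "kb:Buffalo_Bills",
    "Miami Hurricanes": "kb:Miami_Hurricanes",
}
CLASS_OF = {
    "kb:Lou_Saban": "Coach",
    "kb:Buffalo_Bills": "FootballTeam",
    "kb:Miami_Hurricanes": "FootballTeam",
}
PARENT = {"Coach": "Person", "FootballTeam": "Organization"}


def ancestors(cls):
    """Return the class followed by its ancestors, most specific first."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def to_fact(triple):
    """Map a surface triple to a KB fact, or None if anything is unknown."""
    subj, pred, obj = triple
    try:
        return (ENTITY_ID[subj], RELATION_OF[pred], ENTITY_ID[obj])
    except KeyError:
        return None  # unresolved entity or predicate stays out of the KB


fact = to_fact(("Lou Saban", "coach", "Miami Hurricanes"))
print(fact)
print(ancestors(CLASS_OF[fact[0]]))
```

Returning `None` for unresolved triples is one possible design choice: it keeps the KB clean at the cost of recall.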
In this talk…
• Work on entity linking with random walks … [CIKM’2014]
• A bit of the work on open relation extraction – less on disambiguation
  § SONEX (clustering-based) [TWEB ’2012]
  §