IXA-Pipe) and Parsing (Alpino)

Details from a distance? A Dutch pipeline for event detection

CLIN26, Amsterdam

Chantal van Son, Marieke van Erp, Antske Fokkens, Paul Huygen, Ruben Izquierdo Bevia & Piek Vossen

NewsReader & BiographyNet

Close reading / Distant reading

Apply the detailed analyses typically associated with close (or at least non-distant) reading to large amounts of textual data

✤ NewsReader: (financial) news data
    ✤ Day-by-day processing of news; reconstructing a coherent story
    ✤ Four languages: English, Spanish, Italian, Dutch
✤ BiographyNet: biographical data
    ✤ Answering historical questions with digitalized data from Biography Portal of the Netherlands (www.biografischportaal.nl) using computational methods

http://www.newsreader-project.eu/
http://www.biographynet.nl/

Event extraction and representation

1. Intra-document processing: Dutch NLP pipeline generating NAF files
    ✤ pipeline includes tokenization, parsing, SRL, NERC, etc.
    ✤ NLP Annotation Format (NAF) is a multi-layered annotation format for representing linguistic annotations in complex NLP architectures
2. Cross-document processing: event coreference to convert NAF to RDF representation
3. Store the NAF and RDF in a KnowledgeStore

Dutch NLP pipeline (diagram, built up over several slides): a Document passes through the modules below, producing NLP Annotation Format (NAF) output

✤ Tokenization (IXA-pipe)
✤ PoS tagging, lemmatization and parsing (Alpino)
✤ NERC (IXA-pipe)
✤ NED (DBpedia Spotlight)
✤ Time expressions (HeidelTime)
✤ Opinions
✤ WSD (Cornetto/ODWN)
✤ Predicate Matrix tagging
✤ SRL (SoNaR SRL)
✤ FrameNet labeling
✤ Event coreference

Diagram legend: Developed by CLTL; Wrappers/reimplementations
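The idea of a pipeline whose modules each add one layer to a shared, multi-layered annotation document can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LayeredDoc`, the stage functions, the layer names and the example sentence are all hypothetical, the "lemmatizer" and "entity recognizer" are trivial placeholders, and none of it reflects the actual NAF schema or the interfaces of IXA-pipe, Alpino or the other tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LayeredDoc:
    """Toy stand-in for a layered annotation document (not the real NAF schema)."""
    raw: str
    layers: Dict[str, List[dict]] = field(default_factory=dict)


Stage = Callable[[LayeredDoc], LayeredDoc]


def tokenize(doc: LayeredDoc) -> LayeredDoc:
    # Placeholder tokenizer: plain whitespace split.
    doc.layers["text"] = [
        {"id": f"w{i}", "form": form}
        for i, form in enumerate(doc.raw.split(), start=1)
    ]
    return doc


def add_terms(doc: LayeredDoc) -> LayeredDoc:
    # Each term points back to a token id rather than copying the token,
    # so later layers can build on earlier ones. Lowercasing is a
    # placeholder for real lemmatization.
    doc.layers["terms"] = [
        {"id": f"t{i}", "span": [tok["id"]], "lemma": tok["form"].lower()}
        for i, tok in enumerate(doc.layers["text"], start=1)
    ]
    return doc


def add_entities(doc: LayeredDoc) -> LayeredDoc:
    # Placeholder heuristic: a term whose token is capitalized becomes an
    # entity mention that refers to the term id.
    forms = {tok["id"]: tok["form"] for tok in doc.layers["text"]}
    entities = []
    for term in doc.layers["terms"]:
        if forms[term["span"][0]][0].isupper():
            entities.append({"id": f"e{len(entities) + 1}", "span": [term["id"]]})
    doc.layers["entities"] = entities
    return doc


def run_pipeline(raw: str, stages: List[Stage]) -> LayeredDoc:
    # Apply the stages in order; each one reads earlier layers and adds its own.
    doc = LayeredDoc(raw)
    for stage in stages:
        doc = stage(doc)
    return doc


STAGES: List[Stage] = [tokenize, add_terms, add_entities]

if __name__ == "__main__":
    doc = run_pipeline("Philips opent een fabriek in Eindhoven", STAGES)
    for name, items in doc.layers.items():
        print(name, len(items))
```

Keeping every layer in one document and linking layers by id means a module only needs to know which earlier layers it reads, which is what lets independently developed tools be chained.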
Demo of the Dutch pipeline

http://kyoto.let.vu.nl/~huygen/test/test.php

Event extraction and representation

1. Intra-document processing: Dutch NLP pipeline generating NAF files
2. Cross-document processing: event coreference to convert NAF to RDF
    ✤ uses GAF to formally distinguish between mentions (represented in NAF) and instances (represented in RDF-based SEM)
    ✤ involves document clustering and linking textual expressions referring to the same entity or event
3. Store the NAF and RDF in a KnowledgeStore

Conclusion

✤ First full Dutch NLP pipeline providing rich information about events: WHAT happened WHERE and WHEN, and WHO was involved?
✤ With the recent addition of Open Dutch WordNet (ODWN), all software is fully open-source

References

✤ IXA-pipe: Agerri, R., Bermudez, J., and Rigau, G. (2014). IXA pipeline: Efficient and ready to use multilingual NLP tools. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014), pages 26–31. http://ixa2.si.ehu.es/ixa-pipes/
✤ Alpino: Van der Beek, L., Bouma, G., Malouf, R., and Van Noord, G. (2002). The Alpino dependency treebank. Language and Computers, 45(1):8–22. http://www.let.rug.nl/~vannoord/alp/Alpino/
✤ HeidelTime: Strötgen, J. and Gertz, M. (2010). HeidelTime: High quality rule-based extraction and normalization of temporal expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 321–324. https://github.com/HeidelTime/heideltime
✤ SoNaR Semantic Role Labeler: De Clercq, O., Hoste, V., and Monachesi, P. (2012). Evaluating automatic cross-domain semantic role annotation. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), pages 88–93.
✤ Predicate Matrix: Lopez de Lacalle, M., Laparra, E., and Rigau, G. (2014b). Predicate Matrix: extending SemLink through WordNet mappings. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014). http://adimen.si.ehu.es/web/PredicateMatrix
✤ Open Source Dutch WordNet: Postma, M. and Vossen, P. (2014). Open Source Dutch WordNet. Technical report, VU University Amsterdam. http://wordpress.let.vupr.nl/odwn/
✤ FrameNet labeling: Van Son, C.M. (2015). Towards a Dutch Frame-Semantic Parser. Master’s thesis, VU University Amsterdam.
✤ DBpedia Spotlight: Mendes, P. N., Jakob, M., García-Silva, A., and Bizer, C. (2011). DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1–8. https://github.com/dbpedia-spotlight
✤ Software presented today: https://github.com/cltl/, http://www.newsreader-project.eu/results/software/, www.cltl.nl
