Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity

Ivan Vulić♠, Language Technology Lab, University of Cambridge
[email protected]
Simon Baker♠, Language Technology Lab, University of Cambridge
[email protected]
Edoardo Maria Ponti♠, Language Technology Lab, University of Cambridge
[email protected]
Ulla Petti, Language Technology Lab, University of Cambridge
[email protected]
Ira Leviant, Faculty of Industrial Engineering and Management, Technion, IIT
[email protected]
Kelly Wing, Language Technology Lab, University of Cambridge
[email protected]
Olga Majewska, Language Technology Lab, University of Cambridge
[email protected]

All data are available at https://multisimlex.com/
♠ Equal contribution.
Submission received: 11 March 2020; revised version received: 17 July 2020; accepted for publication: 3 October 2020.
https://doi.org/10.1162/coli_a_00391
© 2020 Association for Computational Linguistics. Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
Computational Linguistics, Volume 46, Number 4

Eden Bar, Faculty of Industrial Engineering and Management, Technion, IIT
[email protected]
Matt Malone, Language Technology Lab, University of Cambridge
[email protected]
Thierry Poibeau, LATTICE Lab, CNRS and ENS/PSL and Univ. Sorbonne Nouvelle
[email protected]
Roi Reichart, Faculty of Industrial Engineering and Management, Technion, IIT
[email protected]
Anna Korhonen, Language Technology Lab, University of Cambridge
[email protected]

We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili).