Entity Linking in 100 Languages


Entity Linking in 100 Languages
Jan A. Botha, Zifei Shan, Daniel Gillick
Google Research
[email protected]

Abstract

We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset¹ matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.

¹http://goo.gle/mewsli-dataset

1 Introduction

Entity linking (EL) fulfils a key role in grounded language understanding: given an ungrounded entity mention in text, the task is to identify the entity's corresponding entry in a Knowledge Base (KB). In particular, EL provides grounding for applications like Question Answering (Févry et al., 2020b) (also via Semantic Parsing (Shaw et al., 2019)) and Text Generation (Puduppully et al., 2019); it is also an essential component in knowledge base population (Shen et al., 2014). Entities have played a growing role in representation learning. For example, entity mention masking led to greatly improved fact retention in large language models (Guu et al., 2020; Roberts et al., 2020).

But to date, the primary formulation of EL outside of the standard monolingual setting has been cross-lingual: link mentions expressed in one language to a KB expressed in another (McNamee et al., 2011; Tsai and Roth, 2016; Sil et al., 2018). The accompanying motivation is that KBs may only ever exist in some well-resourced languages, but that text in many different languages needs to be linked. Recent work in this direction features progress on low-resource languages (Zhou et al., 2020), zero-shot transfer (Sil and Florian, 2016; Rijhwani et al., 2019; Zhou et al., 2019) and scaling to many languages (Pan et al., 2017), but commonly assumes a single primary KB language and a limited KB, typically English Wikipedia.

We contend that this popular formulation limits the scope of EL in ways that are artificial and inequitable.

First, it artificially simplifies the task by restricting the set of viable entities and reducing the variety of mention ambiguities. Limiting the focus to entities that have English Wikipedia pages understates the real-world diversity of entities. Even within the Wikipedia ecosystem, many entities only have pages in languages other than English. These are often associated with locales that are already underrepresented on the global stage. By ignoring these entities and their mentions, most current modeling and evaluation work tends to side-step under-appreciated challenges faced in practical industrial applications, which often involve KBs much larger than English Wikipedia, with a much more significant zero- or few-shot inference problem.

Second, it entrenches an English bias in EL research that is out of step with the encouraging shift toward inherently multilingual approaches in natural language processing, enabled by advances in representation learning (Johnson et al., 2017; Pires et al., 2019; Conneau et al., 2020).

Third, much recent EL work has focused on models that rerank entity candidates retrieved by an alias table (Févry et al., 2020a), an approach that works well for English entities with many linked mentions, but less so for the long tail of entities and languages.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 7833-7845, November 16-20, 2020. © 2020 Association for Computational Linguistics.

To overcome these shortcomings, this work makes the following key contributions:

• Reformulate entity linking as inherently multilingual: link mentions in 104 languages to entities in WikiData, a language-agnostic KB.

• Advance prior dual encoder retrieval work with improved mention and entity encoder architecture and improved negative mining targeting.

• Establish new state-of-the-art performance relative to prior cross-lingual linking systems, with one model capable of linking 104 languages against 20 million WikiData entities.

• Introduce Mewsli-9, a large dataset with nearly 300,000 mentions across 9 diverse languages with links to WikiData. The dataset features many entities that lack English Wikipedia pages and which are thus inaccessible to many prior cross-lingual systems.

• Present frequency-bucketed evaluation that highlights zero- and few-shot challenges with clear headroom, implicitly including low-resource languages without enumerating results over a hundred languages.

2 Task Definition

Multilingual Entity Linking (MEL) is the task of linking an entity mention m in some context language lc to the corresponding entity e ∈ V in a language-agnostic KB. That is, while the KB may include textual information (names, descriptions, etc.) about each entity in one or more languages, we make no prior assumption about the relationship between these KB languages Lkb = {l1, …, lk} and the mention-side language: lc may or may not be in Lkb.

This is a generalization of cross-lingual EL (XEL), which is concerned with the case where Lkb = {l0} and lc ≠ l0. Commonly, l0 is English, and V is moreover limited to the set of entities that express features in l0.

2.1 MEL with WikiData and Wikipedia

As a concrete realization of the proposed task, we use WikiData (Vrandečić and Krötzsch, 2014) as our KB: it covers a large set of diverse entities, is broadly accessible and actively maintained, and it provides access to entity features in many languages. WikiData itself contains names and short descriptions, but through its close integration with all Wikipedia editions, it also connects entities to rich descriptions (and other features) drawn from the corresponding language-specific Wikipedia pages.

Basing entity representations on features of their Wikipedia pages has been a common approach in EL (e.g. Sil and Florian, 2016; Francis-Landau et al., 2016; Gillick et al., 2019; Wu et al., 2019), but we will need to generalize this to include multiple Wikipedia pages with possibly redundant features in many languages.

2.1.1 WikiData Entity Example

Consider the WikiData entity Q3511500, Sí Ràdio, a now defunct Valencian radio station. Its KB entry references Wikipedia pages in three languages, which contain the following descriptions:²

• (Catalan) Sí Ràdio fou una emissora de ràdio musical, la segona de Radio Autonomía Valenciana, S.A. pertanyent al grup Radiotelevisió Valenciana. [Sí Ràdio was a music radio station, the second of Radio Autonomía Valenciana, S.A., belonging to the Radiotelevisió Valenciana group.]

• (Spanish) Nou Si Ràdio (anteriormente conocido como Sí Ràdio) fue una cadena de radio de la Comunidad Valenciana y emisora hermana de Nou Ràdio perteneciente al grupo RTVV. [Nou Si Ràdio (formerly known as Sí Ràdio) was a radio network of the Valencian Community and sister station of Nou Ràdio, belonging to the RTVV group.]

• (French) Sí Ràdio est une station de radio publique espagnole appartenant au groupe Ràdio Televisió Valenciana, entreprise de radio-télévision dépendant de la Generalitat valencienne. [Sí Ràdio is a Spanish public radio station belonging to the Ràdio Televisió Valenciana group, a broadcasting company under the Generalitat Valenciana.]

Note that these Wikipedia descriptions are not direct translations of one another, and contain some name variations. We emphasize that this particular entity would have been completely out of scope in the standard cross-lingual task (Tsai and Roth, 2016), because it does not have an English Wikipedia page. In our analysis, there are millions of WikiData entities with this property, meaning the standard setting skips over the substantial challenges of modeling these (often rarer) entities, and disambiguating them in different language contexts. Our formulation seeks to address this.

²We refer to the first sentence of a Wikipedia page as a description because it follows a standardized format.

2.2 Knowledge Base Scope

Our modeling focus is on using unstructured textual information for entity linking, leaving other modalities or structured information as areas for future work. Accordingly, we narrow our KB to the subset of entities that have descriptive text available: we define our entity vocabulary V as all WikiData items that have an associated Wikipedia page in at least one language, independent of the languages we actually model.³ This gives 19,666,787 entities, substantially more than in any other task setting we have found: the KB accompanying the entrenched TAC-KBP 2010 benchmark (Ji et al., 2010) has less than a million entities, and although English Wikipedia continues to grow, recent work using it as a KB still only contends with roughly 6 million entities (Févry et al., 2020a; Zhou et al., 2020). Further, by employing a simple rule to determine the set of viable entities, we avoid potential selection bias based on our desired test sets or the

[…] six orthographies. Per-language statistics appear in Table 1.

Table 1: Corpus statistics for Mewsli-9, an evaluation set we introduce for multilingual entity linking against WikiData. Line en′ shows statistics for English WikiNews-2018, by Gillick et al. (2019).

Lang.  | Docs   | Mentions | Distinct entities | ∉ EnWiki
ja     | 3,410  | 34,463   | 13,663            | 3,384
de     | 13,703 | 65,592   | 23,086            | 3,054
es     | 10,284 | 56,716   | 22,077            | 1,805
ar     | 1,468  | 7,367    | 2,232             | 141
sr     | 15,011 | 35,669   | 4,332             | 269
tr     | 997    | 5,811    | 2,630             | 157
fa     | 165    | 535      | 385               | 12
ta     | 1,000  | 2,692    | 1,041             | 20
en     | 12,679 | 80,242   | 38,697            | 14
Total  | 58,717 | 289,087  | 82,162            | 8,807
en′    | 1,801  | 2,263    | 1,799             | 0
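The dual encoder retrieval setup the paper describes can be pictured in a few lines: a mention encoder and an entity encoder each produce a fixed-size vector, and linking reduces to nearest-neighbour search over all entity vectors. The sketch below shows only that retrieval mechanic, not the paper's architecture: the "encoders" are stand-in random vectors, and the mention vector is a noisy copy of one entity's vector, which retrieval should recover. All names (`link`, the noise scale, the dimensionality) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
dim, num_entities = 64, 1000

# Stand-in "entity encoder" output: one unit-normalized vector per entity.
entity_ids = [f"Q{i}" for i in range(num_entities)]
entity_vecs = rng.standard_normal((num_entities, dim))
entity_vecs /= np.linalg.norm(entity_vecs, axis=1, keepdims=True)

def link(mention_vec: np.ndarray, top_k: int = 5) -> list[str]:
    """Return ids of the top_k entities by cosine similarity to the mention."""
    mention_vec = mention_vec / np.linalg.norm(mention_vec)
    scores = entity_vecs @ mention_vec       # dot product == cosine (unit vectors)
    best = np.argsort(-scores)[:top_k]
    return [entity_ids[i] for i in best]

# Simulate a mention of entity Q42: its vector plus a little noise.
gold = entity_vecs[42]
mention = gold + 0.05 * rng.standard_normal(dim)
print(link(mention))  # Q42 should rank first
```

In a real system the entity table holds 20 million rows, so the brute-force dot product would be replaced by an approximate nearest-neighbour index, but the scoring function is the same.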
Recommended publications
  • Predicting the Correct Entities in Named-Entity Linking
    Separating the Signal from the Noise: Predicting the Correct Entities in Named-Entity Linking. Drew Perkins, Uppsala University, Department of Linguistics and Philology, Master Programme in Language Technology. Master's Thesis in Language Technology, 30 ECTS credits, June 9, 2020. Supervisors: Gongbo Tang, Uppsala University; Thorsten Jacobs, Seavus. Abstract: In this study, I constructed a named-entity linking system that maps between contextual word embeddings and knowledge graph embeddings to predict correct entities. To establish a named-entity linking system, I first applied named-entity recognition to identify the entities of interest. I then performed candidate generation via locality sensitive hashing (LSH), where a candidate group of potential entities was created for each identified entity. Afterwards, my named-entity disambiguation component was applied to select the most probable candidate. By concatenating contextual word embeddings and knowledge graph embeddings in my disambiguation component, I present a novel approach to named-entity linking. I conducted the experiments with the Kensho-Derived Wikimedia Dataset and the AIDA CoNLL-YAGO Dataset; the former dataset was used for deployment and the latter is a benchmark dataset for entity linking tasks. Three deep learning models were evaluated on the named-entity disambiguation component with different context embeddings. The evaluation was treated as a classification task, where I trained my models to select the correct entity from a list of candidates. By optimizing the named-entity linking through this methodology, this entire system can be used in recommendation engines, with a high F1 of 86% using the former dataset. With the benchmark dataset, the proposed method is able to achieve an F1 of 79%.
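The candidate-generation step via locality sensitive hashing that this abstract describes can be sketched with random-hyperplane hashing: entity embeddings that land in the same bucket as the mention embedding become its candidate group. The plane count, the made-up embeddings, and helper names here are illustrative assumptions, not details from the thesis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
dim, n_planes = 32, 8

# Random hyperplanes define the hash: each vector maps to the sign pattern
# of its projections, packed into a single integer bucket key.
planes = rng.standard_normal((n_planes, dim))

def lsh_key(vec: np.ndarray) -> int:
    bits = (planes @ vec) > 0
    return int(np.packbits(bits)[0])  # 8 sign bits -> one byte key

# Index some (made-up) entity embeddings into buckets.
entities = {f"E{i}": rng.standard_normal(dim) for i in range(200)}
buckets: dict[int, list[str]] = {}
for name, vec in entities.items():
    buckets.setdefault(lsh_key(vec), []).append(name)

def candidates(mention_vec: np.ndarray) -> list[str]:
    """Entities sharing the mention's bucket form the candidate group."""
    return buckets.get(lsh_key(mention_vec), [])

# An embedding hashes to its own bucket, so querying with E7's vector
# returns a candidate group that contains E7.
print("E7" in candidates(entities["E7"]))
```

Production LSH setups typically use several hash tables so that near neighbours that straddle one hyperplane are still recovered; a single table keeps the sketch short.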
  • Position Description Addenda
    POSITION DESCRIPTION, January 2014. Wikimedia Foundation Executive Director - Addenda. The Wikimedia Foundation is a radically transparent organization, and much information can be found at www.wikimediafoundation.org. That said, certain information might be particularly useful to nominators and prospective candidates, including: Announcements pertaining to the Wikimedia Foundation Executive Director Search: "Kicking off the search for our next Executive Director" by Former Wikimedia Foundation Board Chair Kat Walsh; "An announcement from Wikimedia Foundation ED Sue Gardner" by Wikimedia Executive Director Sue Gardner. Video Interviews on the Wikimedia Community and Foundation and Its History: Some of the values and experiences of the Wikimedia Community are best described directly by those who have been intimately involved in the organization's dramatic expansion. The following interviews are available for viewing through mOppenheim.TV: 2013 Interview with Former Wikimedia Board Chair Kat Walsh; 2013 Interview with Wikimedia Executive Director Sue Gardner; 2009 Interview with Wikimedia Executive Director Sue Gardner. Guiding Principles of the Wikimedia Foundation and the Wikimedia Community: The following article by Sue Gardner, the current Executive Director of the Wikimedia Foundation, has received broad distribution and summarizes some of the core cultural values shared by Wikimedia's staff, board and community.
Topics covered include: • Freedom and open source • Serving every human being • Transparency • Accountability • Stewardship • Shared power • Internationalism • Free speech • Independence More information can be found at: https://meta.wikimedia.org/wiki/User:Sue_Gardner/Wikimedia_Foundation_Guiding_Principles Wikimedia Policies The Wikimedia Foundation has an extensive list of policies and procedures available online at: http://wikimediafoundation.org/wiki/Policies Wikimedia Projects All major projects of the Wikimedia Foundation are collaboratively developed by users around the world using the MediaWiki software.
  • Three Birds (In the LLOD Cloud) with One Stone: BabelNet, Babelfy and the Wikipedia Bitaxonomy
    Three Birds (in the LLOD Cloud) with One Stone: BabelNet, Babelfy and the Wikipedia Bitaxonomy. Tiziano Flati and Roberto Navigli, Dipartimento di Informatica, Sapienza Università di Roma. Abstract: In this paper we present the current status of linguistic resources published as linked data and linguistic services in the LLOD cloud in our research group, namely BabelNet, Babelfy and the Wikipedia Bitaxonomy. We describe them in terms of their salient aspects and objectives and discuss the benefits that each of these potentially brings to the world of LLOD NLP-aware services. We also present public Web-based services which enable querying, exploring and exporting data into RDF format. 1 Introduction: Recent years have witnessed an upsurge in the amount of semantic information published on the Web. Indeed, the Web of Data has been increasing steadily in both volume and variety, transforming the Web into a global database in which resources are linked across sites. It is becoming increasingly critical that existing lexical resources be published as Linked Open Data (LOD), so as to foster integration, interoperability and reuse on the Semantic Web [5]. Thus, lexical resources provided in RDF format can contribute to the creation of the so-called Linguistic Linked Open Data (LLOD), a vision fostered by the Open Linguistic Working Group (OWLG), in which part of the Linked Open Data cloud is made up of interlinked linguistic resources [2]. The multilinguality aspect is key to this vision, in that it enables Natural Language Processing tasks which are not only cross-lingual, but also independent both of the language of the user input and of the linked data exploited to perform the task.
  • Cross Lingual Entity Linking with Bilingual Topic Model
    Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. Cross Lingual Entity Linking with Bilingual Topic Model. Tao Zhang, Kang Liu and Jun Zhao, Institute of Automation, Chinese Academy of Sciences, HaiDian District, Beijing, China. {tzhang, kliu, jzhao}@nlpr.ia.ac.cn. Abstract: Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for doing cross lingual entity linking by leveraging a large-scale and bilingual knowledge base, Wikipedia. We introduce a bilingual topic model that mines bilingual topics from this knowledge base, with the assumption that the same Wikipedia concept documents of two different languages share the same semantic topic distribution. […] billion web pages on the Internet and 31% of them are written in other languages. This report tells us that the percentage of web pages written in English is decreasing, which indicates that English pages are becoming less dominant compared with before. Therefore, it becomes more and more important to maintain the growing knowledge base by discovering and extracting information from all web contents written in different languages. Also, inserting new extracted knowledge derived from the information extraction system into a knowledge base inevitably needs a system to map the entity mentions in one language to their entries in a knowledge base which may be in another language. This task is known as cross-lingual entity linking. Intuitively, to resolve the cross-lingual entity linking problem, the key is how to define the similarity score between the context of the entity mention and the document of the candidate entity.
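The core operation this abstract describes, scoring a (mention context, candidate entity document) pair by comparing their distributions over a shared bilingual topic space, can be sketched as follows. The topic distributions are made-up stand-ins for what a trained bilingual topic model would infer, and the use of Jensen-Shannon divergence as the comparison is an illustrative choice, not necessarily the paper's exact measure.

```python
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (log base 2) between two topic distributions."""
    m = 0.5 * (p + q)
    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0                      # 0 * log(0) terms contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity(p: np.ndarray, q: np.ndarray) -> float:
    """Turn divergence into a similarity in [0, 1] (base-2 JSD is bounded by 1)."""
    return 1.0 - js_divergence(p, q)

# Example: a mention context in one language vs. two candidate entity
# documents in the other, all expressed in the same 4-topic space
# (the numbers are invented).
mention_ctx = np.array([0.70, 0.10, 0.10, 0.10])
candidate_a = np.array([0.65, 0.15, 0.10, 0.10])  # topically close
candidate_b = np.array([0.05, 0.05, 0.10, 0.80])  # topically distant
print(similarity(mention_ctx, candidate_a) > similarity(mention_ctx, candidate_b))
```

Because the topic space is shared across the two languages, the comparison needs no translation step, which is exactly what makes the bilingual topic model useful for this task.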
  • Babelfying the Multilingual Web: State-Of-The-Art Disambiguation and Entity Linking of Web Pages in 50 Languages
    Babelfying the Multilingual Web: state-of-the-art Disambiguation and Entity Linking of Web pages in 50 languages. Roberto Navigli ([email protected], http://lcl.uniroma1.it), joint work with Andrea Moro and Alessandro Raganato (2014.05.08). ERC Starting Grant: Multilingual Joint Word Sense Disambiguation (MultiJEDI), No. 259234; LIDER EU Project No. 610782. Reference: A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, 2014. Motivation: Domain-specific Web content is available in many languages; information should be extracted and processed independently of the source/target language; this could be done automatically by means of high-performance multilingual text understanding. BabelNet (http://babelnet.org) is a wide-coverage multilingual semantic network and encyclopedic dictionary in 50 languages, combining concepts from WordNet, named entities and specialized concepts from Wikipedia, and concepts integrated from both resources.
  • Named Entity Extraction for Knowledge Graphs: a Literature Overview
    Received December 12, 2019, accepted January 7, 2020, date of publication February 14, 2020, date of current version February 25, 2020. Digital Object Identifier 10.1109/ACCESS.2020.2973928 Named Entity Extraction for Knowledge Graphs: A Literature Overview TAREQ AL-MOSLMI , MARC GALLOFRÉ OCAÑA , ANDREAS L. OPDAHL, AND CSABA VERES Department of Information Science and Media Studies, University of Bergen, 5007 Bergen, Norway Corresponding author: Tareq Al-Moslmi ([email protected]) This work was supported by the News Angler Project through the Norwegian Research Council under Project 275872. ABSTRACT An enormous amount of digital information is expressed as natural-language (NL) text that is not easily processable by computers. Knowledge Graphs (KG) offer a widely used format for representing information in computer-processable form. Natural Language Processing (NLP) is therefore needed for mining (or lifting) knowledge graphs from NL texts. A central part of the problem is to extract the named entities in the text. The paper presents an overview of recent advances in this area, covering: Named Entity Recognition (NER), Named Entity Disambiguation (NED), and Named Entity Linking (NEL). We comment that many approaches to NED and NEL are based on older approaches to NER and need to leverage the outputs of state-of-the-art NER systems. There is also a need for standard methods to evaluate and compare named-entity extraction approaches. We observe that NEL has recently moved from being stepwise and isolated into an integrated process along two dimensions: the first is that previously sequential steps are now being integrated into end-to-end processes, and the second is that entities that were previously analysed in isolation are now being lifted in each other's context.
  • Entity Linking
    Entity Linking. Kjetil Møkkelgjerd, Master of Science in Computer Science, submission date June 2017. Supervisor: Herindrasana Ramampiaro, IDI, Norwegian University of Science and Technology, Department of Computer Science. Abstract: Entity linking may help to quickly supplement a reader with further information about entities within a text by providing links to a knowledge base containing more information about each entity, and may thus potentially enhance the reading experience. In this thesis, we look at existing solutions, and implement our own deterministic entity linking system based on our research, using our own approach to the problem. Our system extracts all entities within the input text, and then disambiguates each entity considering the context. The extraction step is handled by an external framework, while the disambiguation step focuses on entity similarities, where similarity is defined by how similar the entities' categories are, which we measure by using data from a structured knowledge base called DBpedia. We base this approach on the assumption that similar entities usually occur close to each other in written text, thus we select entities that appear to be similar to other nearby entities. Experiments show that our implementation is not as effective as some of the existing systems we use for reference, and that our approach has some weaknesses which should be addressed. For example, DBpedia is not as consistent a knowledge base as we would like, and the entity extraction framework often fails to reproduce the same entity set as the dataset we use for evaluation. However, our solution shows promising results in many cases.
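The disambiguation idea in this abstract, score each candidate by how similar its categories are to those of nearby entities, can be sketched in a few lines. Jaccard similarity over category sets is an assumed stand-in for the thesis's exact measure, and the entities and categories below are invented for illustration rather than drawn from DBpedia.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Category-set overlap: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def disambiguate(candidates: dict[str, set[str]],
                 context_entities: list[set[str]]) -> str:
    """Pick the candidate whose categories best match the nearby entities'."""
    def score(cats: set[str]) -> float:
        return sum(jaccard(cats, ctx) for ctx in context_entities)
    return max(candidates, key=lambda name: score(candidates[name]))

# "Paris" near mentions of France and the Eiffel Tower: the city should win
# over the figure from Greek mythology.
candidates = {
    "Paris_(city)":      {"Capitals_in_Europe", "Cities_in_France"},
    "Paris_(mythology)": {"Greek_mythology", "Trojans"},
}
context = [{"Countries_in_Europe", "Cities_in_France"},   # France
           {"Buildings_in_Paris", "Cities_in_France"}]    # Eiffel Tower
print(disambiguate(candidates, context))  # → Paris_(city)
```

This directly encodes the stated assumption that similar entities occur close together: the candidate sharing categories with its neighbours accumulates the highest score.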
  • Named-Entity Recognition and Linking with a Wikipedia-Based Knowledge Base Bachelor Thesis Presentation Niklas Baumert [[email protected]]
    Named-Entity Recognition and Linking with a Wikipedia-based Knowledge Base. Bachelor Thesis Presentation, Niklas Baumert [[email protected]], 2018-11-08. Contents: 1 Why and What? (Motivation: 1.1 What is named-entity recognition? 1.2 What is entity linking?) 2 How? (Implementation: 2.1 Wikipedia-backed Knowledge Base. 2.2 Entity Recognizer & Linker.) 3 Performance Evaluation. 4 Conclusion. The system was built to provide input files for QLever: a specialized use-case, with general application as an extra feature. It produces a wordsfile and a docsfile as output; the wordsfile is input for QLever. It supports Wikipedia page titles, Freebase IDs and Wikidata IDs, and can convert between those three freely afterwards.

    The docsfile is a list of all sentences with IDs:

    record id | text
    1         | Anarchism is a political philosophy that advocates [...]
    2         | These are often described as [...]

    The wordsfile contains each word of each sentence and marks entities and their likelihood:

    word         | entity? | record id | score
    Anarchism    | 0       | 1         | 1
    <Anarchism>  | 1       | 1         | 0.9727
    is           | 0       | 1         | 1
    a            | 0       | 1         | 1
    political    | 0       | 1         | 1
    philosophy   | 0       | 1         | 1
    <Philosophy> | 1       | 1         | 0.955

    Named-entity recognition (NER) identifies nouns in a given text: finding and categorizing entities [1][2]; (proper) nouns refer to entities. The problem: words are ambiguous [1]. Same word, different entities within a category ("John F. Kennedy": the president or his son?); same word, different entities in different categories ("JFK": person or airport?). Entity linking (EL) links entities to a database, resolving ambiguity by associating mentions with entities in a knowledge base.
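A wordsfile in the layout shown above can be consumed like this. The sketch assumes tab-separated columns in the order word, entity flag, record id, score; that separator and the helper names are assumptions beyond what the slides state.

```python
from typing import NamedTuple

class Row(NamedTuple):
    word: str
    is_entity: bool
    record_id: int
    score: float

def parse_wordsfile(lines: list[str]) -> list[Row]:
    """Parse tab-separated wordsfile lines into typed rows."""
    rows = []
    for line in lines:
        word, flag, rec, score = line.rstrip("\n").split("\t")
        rows.append(Row(word, flag == "1", int(rec), float(score)))
    return rows

def linked_entities(rows: list[Row]) -> list[tuple[str, float]]:
    """Return the recognized entities (flagged rows) with their scores."""
    return [(r.word, r.score) for r in rows if r.is_entity]

sample = [
    "Anarchism\t0\t1\t1",
    "<Anarchism>\t1\t1\t0.9727",
    "is\t0\t1\t1",
    "philosophy\t0\t1\t1",
    "<Philosophy>\t1\t1\t0.955",
]
print(linked_entities(parse_wordsfile(sample)))
# → [('<Anarchism>', 0.9727), ('<Philosophy>', 0.955)]
```

Note how the plain token and its angle-bracketed entity row share a record id, so a consumer can align each linked entity with the sentence it came from.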
  • Wikipedia and Other Wikimedia Projects
    Tomasz „Polimerek" Ganicz, Wikimedia Polska. Wikipedia and other Wikimedia projects. What is Wikipedia? "Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment." (Jimmy "Jimbo" Wales, founder of Wikipedia.) As defined by itself: Wikipedia is a free multilingual, open content encyclopedia project operated by the non-profit Wikimedia Foundation. Its name is a blend of the words wiki (a technology for creating collaborative websites) and encyclopedia. Launched in January 2001 by Jimmy Wales and Larry Sanger, it is the largest, fastest-growing and most popular general reference work currently available on the Internet. Open and free content. Richard Stallman's definition of free software: "The word 'free' in our name does not refer to price; it refers to freedom. First, the freedom to copy a program and redistribute it to your neighbors, so that they can use it as well as you. Second, the freedom to change a program, so that you can control it instead of it controlling you; for this, the source
  • Instructor Basics: How to Use Wikipedia as a Teaching Tool
    Instructor Basics: How to use Wikipedia as a teaching tool. Wiki Education Foundation. Wikipedia is the free online encyclopedia that anyone can edit. One of the most visited websites worldwide, Wikipedia is a resource used by most university students. Increasingly, many instructors around the world have used Wikipedia as a teaching tool in their university classrooms as well. In this brochure, we bring together their experiences to help you determine how to use Wikipedia in your classroom. We've organized the brochure into three parts. Assignment planning: learn key Wikipedia policies and get more information on designing assignments, with a focus on asking students to write Wikipedia articles for class. During the term: learn about the structure of a good Wikipedia article, the kinds of articles students should choose to improve, suggestions for what to cover in a Wikipedia lab session, and how to interact with the community of Wikipedia editors. After the term: see a sample assessment structure that's worked for other instructors. Assignment planning, understanding key policies: Since Wikipedia started in 2001, the community of volunteer editors ("Wikipedians") has developed several key policies designed to ensure Wikipedia is as reliable and useful as possible. Any assignment you integrate into your classroom must follow these policies. Understanding these cornerstone policies ensures that you develop an assignment that meets your learning objectives and improves Wikipedia at the same time. Free content: "The work students contribute to Wikipedia is free content and becomes [...]" Neutral point of view: "Everything on Wikipedia must be written from a neutral point of view.
  • Combining Wikidata with Other Linked Databases
    Combining Wikidata with other linked databases. Andra Waagmeester, Dragan Espenschied. Known variants in the CIViC database for genes reported in a WikiPathways pathway on Bladder Cancer. Primary Sources: Wikipathways (Q7999828), NCBI Gene (Q20641742), CIViCdb (Q27612411), Disease Ontology (Q5282129). Example 1: Wikidata contains public data. "All structured data from the main and property namespace is available under the Creative Commons CC0 License; text in the other namespaces is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy." Wikidata requirement for notability: An item is acceptable if and only if it meets at least one of the criteria below: it contains at least one valid sitelink to a page on Wikipedia, Wikivoyage, Wikisource, Wikiquote, Wikinews, Wikibooks, Wikidata, Wikispecies, Wikiversity, or Wikimedia Commons; it refers to an instance of a clearly identifiable conceptual or material entity, and it can be described using serious and publicly available references; or it fulfills some structural need. https://www.wikidata.org/wiki/Wikidata:Notability. Wikidata property proposals: "Before a new property is created, it has to be discussed here. When after some time there are some supporters, but no or very few opponents, the property is created by a property creator or an administrator. You can propose a property here or on one of the subject-specific pages listed
  • Interview with Sue Gardner, Executive Director WMF, October 1, 2009: 5-10 Years from Now, What Is Your Vision?
    Interview with Sue Gardner, Executive Director WMF, October 1, 2009. 5-10 years from now, what is your vision? What's different about Wikimedia? Personally, I would like to see Wikimedia in the top five for reach in every country. I want to see a broad, rich, deep encyclopedia that's demonstrably meeting people's needs, and is relevant and useful for people everywhere around the world. In order for that to happen, a lot of things would need to change. The community of editors would need to be healthier, more vibrant, more fun. Today, people get burned out. They get tired of hostility and endless debates. Working on Wikipedia is hard, and it does not offer many rewards. Editors have intrinsic motivation, not extrinsic, but even so, not much is done to affirm or thank or recognize them. We need to find ways to foster a community that is rich and diverse and friendly and fun to be a part of. That world would include more women, more newly-retired people, more teachers: all different kinds of people. There would be more ways to participate, more affirmation, more opportunities to be social and friendly. We need a lot of tools and features to help those people be more effective. Currently, there are tons of hacks and workarounds that experienced editors have developed over time, but which new editors don't know about, and would probably find difficult to use. We need to make those tools visible and easier to use, and we need to invent new ones where they are lacking.