From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Offline Product / Content Experience on the Ground Research Reports Money
Ofine context descriptive value web KA video content Lite/Kolibri/Learning Equality Internet in a product feedback Box video content device product feedback software compressed ZIM content product users product / content Orange product Foundation Medical practitioner local content money money (remote product users medical help clinic) users Patient product Wikifundi product Khan Academy content (remote product specs Wikimedicine clinic) money ? project content curation money distribution Columbia research research reports Kiwix / team Prisoners OpenZIM ZIM content build Kiwix app ofine editing capability product feedback compressed ZIM contentcompressed ZIM content software compressed ZIM content software Other Kiwix money RACHEL Kiwix content reusers research reports Rachel product and content Grant based money distributors (Gabriel compressedmoney ZIM content Thullen) money ofine product / content users Governments Experience App and users WMF Grants Play store content re sellers WMF Experience Phone Education Wikipedia App resalers WMF School training School administrator Partnerships teachers (low s (low ofine product / content resource resource WMF school) school) Phone Product manufacturer WMF ofine product / content s Comms app with content users product feedback users NGOs and online teaching tools Unicef content (wiki edu) product / content Wikipedia app user (Android) distribution Mobile network experience on the ground operators Students Other ofine (low Wikipedia Endless Wikipedia resource editors apps school) XOWA Egranary ofine product / content Content Wif access curators Refugees points (Wikipedia 1.0). -
A Tabulated Wikipedia Resource Using the Parquet Format Klang
WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format Klang, Marcus; Nugues, Pierre Published in: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) 2016 Link to publication Citation for published version (APA): Klang, M., & Nugues, P. (2016). WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) (pp. 4141-4148). European Language Resources Association. http://www.lrec- conf.org/proceedings/lrec2016/pdf/31_Paper.pdf Total number of authors: 2 General rights Unless other specific re-use rights are stated the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Read more about Creative commons licenses: https://creativecommons.org/licenses/ Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. LUND UNIVERSITY PO Box 117 221 00 Lund +46 46-222 00 00 WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format Marcus Klang, Pierre Nugues Lund University, Department of Computer science, Lund, Sweden [email protected], [email protected] Abstract Wikipedia has become one of the most popular resources in natural language processing and it is used in quantities of applications. -
Těžba Grafových Dat Z Wikipedie
Západočeská univerzita v Plzni Fakulta aplikovaných věd Katedra informatiky a výpočetní techniky Bakalářská práce Těžba grafových dat z Wikipedie Plzeň 2016 Martin Mach PROHLÁŠENÍ Prohlašuji, že jsem bakalářskou práci vypracoval samostatně a výhradně s použitím citovaných pramenů. V Plzni dne 4. května 2016.................................. Martin Mach ABSTRAKT Cílem této práce je vytvořit nástroj umožňující zpracování historických událostí z Wikipe- die do podoby grafu. Vzhledem k povaze zdroje dat a požadavku na interakci s uživatelem byl za platformu zvolen webový prohlížeč Google Chrome a jeho systém rozšíření. Vytvo- řený nástroj umožňuje výběr preferovaných článků, na základě nichž jsou nabízeny další články dle odhadu jejich relevance. Obsah článků je zpracováván a jsou z něj získávány informace specifické pro typ dané historické události. Mezi jednotlivými články jehledán vzájemný vztah, který potom, spolu s typem tohoto vztahu, tvoří hrany získaného grafu. Vytvořené řešení poskytuje uživateli možnost zpracovat historické události do formy re- šerše reprezentované grafem. Množství získávaných informací a jejich vzájemnou spojitost je možné ovlivnit pomocí systému modulů. Výsledek je poté možno zobrazit na časové ose, případně je možné upravit výstup aplikace tak, aby mohl být zobrazen nástrojem dle volby uživatele. KLÍČOVÁ SLOVA Wikipedie, těžba dat, graf, historická událost, časová osa, Google Chrome, rozšíření ABSTRACT The purpose of this thesis is to create a tool which would allow to process historical events from Wikipedia into the form of a graph. Due to the nature of the data source and the requi- rement for user interaction it was decided to choose Google Chrome and its extensions as a platfrom. The tool allows user to choose articles of his choice, based on which it will find and recommend other articles according to their estimated relevance. -
Wikimedia Free Videos Download
wikimedia free videos download Celebra los 20 años de Wikipedia y las personas que lo hacen posible → In this time of uncertainty, our enduring commitment is to provide reliable and neutral information for the world. The nonprofit Wikimedia Foundation provides the essential infrastructure for free knowledge. We host Wikipedia, the free online encyclopedia, created, edited, and verified by volunteers around the world, as well as many other vital community projects. All of which is made possible thanks to donations from individuals like you. We welcome anyone who shares our vision to join us in collecting and sharing knowledge that fully represents human diversity. (中文 (繁体中文) · Français (France) · Русский (Россия) · Español (España) · Deutsch (Deutschland · (اﻟﻌﺮﺑﯿﺔ (اﻟﻌﺎﻟﻢ · English 1 Wikimedia projects belong to everyone. You made it. It is yours to use. For free. That means you can use it, adapt it, or share what you find on Wikimedia sites. Just do not write your own bio, or copy/paste it into your homework. 2 We respect your data and privacy. We do not sell your email address or any of your personal information to third parties. More information about our privacy practices are available at the Wikimedia Foundation privacy policy, donor privacy policy, and data retention guidelines. 3 People like you keep Wikipedia accurate. Readers verify the facts. Articles are collaboratively created and edited by a community of volunteers using reliable sources, so no single person or company owns a Wikipedia article. The Wikimedia Foundation does not write or edit, but you and everyone you know can help. 4 Not all wikis are Wikimedia. -
Industry Map: Offline Wikimedia March 15, 2017 So What Is This?
Industry map: Offline Wikimedia March 15, 2017 So what is this? The New Readers team is working to address access as a barrier through supporting offline reading. The following is our understanding of how Wikipedia’s free content gets into the hands of readers who don’t always have an internet connection. That includes content packages, software, hardware, and distribution channels. We should continue to build on this industry analysis with a deeper understanding of use cases and strategic priorities. From there, we will be able to match solutions to opportunities. Wikipedia App WMF Servers Reader > content (prototype) How does content get to Raspberry Pi (online) readers? Kiwix apps Wikimed apps Smartphone 3rd party (see slide 5) Kiwix .ZIM Reader Desktop Kiwix software XOWA Proprietary XOWA data Device Endless EGranary ?? CD Pedia Legend Content 3rd Kiwix WMF Prosp- [unknown] party owned owned ective Kiwix distribution How do Kiwix’s content Wikipedia packages get to readers? Preload app (prototype) App store (requires internet) Reader P2P Wikimed apps Kiwix Kiwix apps ($275) Raspberry Pi Self install ReadersReaders Readers Institution Grantees Kiwix .ZIM (eg schools) Ted Whole projets Curated content 3rd party (see (pic or no pic) (eg. Wikimed) Project slide 5) Luxembourg Wikimedia Content Others Legend 3rd Kiwix WMF Prosp- Distribution party owned owned ective Other [known] players using Kiwix Cost Locations Device Comments Digital Doorway South Africa Solar powered Now deployed w/ terminal solar-powered container Rachel $99 - $399 Guatemala, Liberia, Content hot spot Sierra Leone (impact reports) Internet-in-a-box $100 + Proposed Wikimed Content hot spot (not yet deployed) Digisoft $800-1700 Africa Projector with offline Digital classroom system": content Projector with preloaded offline content (but can also go online). -
A Tabulated Wikipedia Resource Using the Parquet Format
WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format Marcus Klang, Pierre Nugues Lund University, Department of Computer science, Lund, Sweden [email protected], [email protected] Abstract Wikipedia has become one of the most popular resources in natural language processing and it is used in quantities of applications. However, Wikipedia requires a substantial pre-processing step before it can be used. For instance, its set of nonstandardized annotations, referred to as the wiki markup, is language-dependent and needs specific parsers from language to language, for English, French, Italian, etc. In addition, the intricacies of the different Wikipedia resources: main article text, categories, wikidata, infoboxes, scattered into the article document or in different files make it difficult to have global view of this outstanding resource. In this paper, we describe WikiParq, a unified format based on the Parquet standard to tabulate and package the Wikipedia corpora. In combination with Spark, a map-reduce computing framework, and the SQL query language, WikiParq makes it much easier to write database queries to extract specific information or subcorpora from Wikipedia, such as all the first paragraphs of the articles in French, or all the articles on persons in Spanish, or all the articles on persons that have versions in French, English, and Spanish. WikiParq is available in six language versions and is potentially extendible to all the languages of Wikipedia. The WikiParq files are downloadable as tarball archives from this location: http://semantica.cs.lth.se/wikiparq/. Keywords: Wikipedia, Parquet, query language. 1. Introduction to the Wikipedia original content, WikiParq makes it easy 1.1. -
Wikirevue 12.Pdf
ASSOCIATION POUR LE LIBRE PARTAGE DE LA CONNAISSANCE Wikimédia France est une association à but non lucratif de droit français (loi 1901) née en 2004, dont l’objectif est de soutenir en France la diffusion libre de connaissance et notamment les projets hébergés par la Wikimedia Foundation comme l’encyclopédie Wikipédia, la médiathèque Wikimedia Commons, le dictionnaire Wiktionnaire et plusieurs autres projets liés à la connaissance. ÉQUIPE SALARIÉE Nathalie Martin Pierre-Antoine Le Page DIRECTRICE EXÉCUTIVE CHARGÉ DE MISSION ORGANISATION Nadia Ayachi TERRITORIALE ATTACHÉE DE DIRECTION Anne-Laure Prévost Aujourd’hui, Sylvain Boissel CONSEILLÈRE SPÉCIALE PARTENARIATS ET RELATIONS INSTITUTIONNELLES 360 membres ADMINISTRATEUR SYSTÈMES ET RÉSEAUX s’impliquent dans les Cyrille Bertin Jean-Philippe Kmiec CHARGÉ DE MISSION COMMUNICATION projets de l’association. CONSEILLER SPÉCIAL FINANCEMENT ET ÉVÉNEMENTIEL ET PARTICIPATION C’est grâce à nos Mathieu Denel Jonathan Balima COMPTABLE membres de plus en plus CHARGÉ DE MISSION SENSIBILISATION nombreux et à vous, ET ÉVALUATION nos donateurs, que les actions de Wikimédia CONSEIL D’ADMINISTRATION France peuvent exister. Christophe Henner Jean-Frédéric Berthelot PRÉSIDENT SECRÉTAIRE-ADJOINT Merci ! Émeric Vallespi Édouard Hue VICE-PRÉSIDENT Guillaume Goursat Samuel Le Goff TRÉSORIER Sébastien Beyou Florian Pépellin TRÉSORIER-ADJOINT Pierre-Sélim Huard Benoît Prieur SECRÉTAIRE N’HÉSITEZ PAS À NOUS CONTACTER : Wikimédia France 40 rue de Cléry, 75002 Paris ET RETROUVEZ TOUTES NOS ACTUALITÉS : www.wikimedia.fr -
Building Knowledge Graphs: Marcus Klang
Building Knowledge Graphs: Processing Infrastructure and Named Entity Linking Marcus Klang Doctoral Dissertation, 2019 Department of Computer Science Lund University ISBN 978-91-7895-286-1 (printed version) ISBN 978-91-7895-287-8 (electronic version) ISSN 1404-1219 LU-CS-DISS: 2019-04 Dissertation 64, 2019 Department of Computer Science Lund University Box 118 SE-221 00 Lund Sweden Email: [email protected] WWW: http://cs.lth.se/marcus-klang Typeset using LATEX Printed in Sweden by Tryckeriet i E-huset, Lund, 2019 © 2019 Marcus Klang Abstract Things such as organizations, persons, or locations are ubiquitous in all texts cir- culating on the internet, particularly in the news, forum posts, and social media. Today, there is more written material than any single person can read through during a typical lifespan. Automatic systems can help us amplify our abilities to find relevant information, where, ideally, a system would learn knowledge from our combined written legacy. Ultimately, this would enable us, one day, to build automatic systems that have reasoning capabilities and can answer any question in any human language. In this work, I explore methods to represent linguistic structures in text, build processing infrastructures, and how they can be combined to process a compre- hensive collection of documents. The goal is to extract knowledge from text via things, entities. As text, I focused on encyclopedic resources such as Wikipedia. As knowledge representation, I chose to use graphs, where the entities corre- spond to graph nodes. To populate such graphs, I created a named entity linker that can find entities in multiple languages such as English, Spanish, and Chi- nese, and associate them to unique identifiers. -
Information for Labdooers (Volunteers) Labdoo Guide to Rescue A
Information for Labdooers (Volunteers) https://www.labdoo.org/de/book/export/html/35 Information for Labdooers (Volunteers) In this guide you will find useful information for Labdooers (Labdoo users/volunteers), including information about how to sanitize laptops, how to package them ready for traveling, or general tricks and tips among others. While this document attempts to be comprehensive, if you have any questions that do not get answered in this guide, please post them directly in any of the Labdoo Teams and one member of the Labdoo community will provide an answer. Tags: information labdooer Labdoo Guide to Rescue a Laptop There are several strategies to collect unused laptops from your local community. Many of the people you know have unused laptops sitting idly and the objective is to give them an option to repurpose them and convert them into powerful educational devices. Here are some ideas to help you find and rescue idle laptops. 1. Think of 5 people who will most likely donate their used laptops. They can be your parents, your co-workers, your relatives, your class-mates or your friends. If you are a student or a teacher in a school, you can write a letter to the students parents asking for unused laptops. Many parents work for companies that have unused laptops. 2. Pitch potential donors with the 3 favorite things you like about Labdoo. When talking with potential laptop donors, tell them about the Labdoo story. Tell them for instance about the notions of using collaboration to spread education around the world and without incurring any economic or environmental cost. -
Building Knowledge Graphs Processing Infrastructure and Named Entity Linking Klang, Marcus
Building Knowledge Graphs Processing Infrastructure and Named Entity Linking Klang, Marcus 2019 Document Version: Publisher's PDF, also known as Version of record Link to publication Citation for published version (APA): Klang, M. (2019). Building Knowledge Graphs: Processing Infrastructure and Named Entity Linking. Department of Computer Science, Lund University. Total number of authors: 1 General rights Unless other specific re-use rights are stated the following general rights apply: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Read more about Creative commons licenses: https://creativecommons.org/licenses/ Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. LUND UNIVERSITY PO Box 117 221 00 Lund +46 46-222 00 00 Building Knowledge Graphs: Processing Infrastructure and Named Entity Linking Marcus Klang Doctoral Dissertation, 2019 Department of Computer Science -
Illinois Masonic Academic Bowl Round # 1 2014 State Tournament 1 Section Toss-Up Questions Question #1: Literature – Br
Illinois Masonic Academic Bowl Round # 1 2014 State Tournament 1st Section Toss-up Questions Question #1: Literature – British Literature 10 points This poem describes stars that “threw down their spears, and “The Tyger” watered heaven with their tears.” In this poem, when the title creature’s heart began to beat, the speaker asked “what dread hand and what dread feet?” The speaker also asks if the maker of the lamb also made the title predator. Name this poem about an animal “burning bright in the forests of the night” that is in the collection Songs of Experience by William Blake. Question #2: Science – Astronomy 10 points One of the dark features on the surface of this object is Titan Ontario Lacus [LAH-koos], and some of its bright locations are Dilmun, Adiri, and Xanadu. The dark objects are believed to be hydrocarbon lakes, as opposed to the water ice which makes up much of this object. Its Xanadu region is close to where the Huygens [HOY-gens] probe landed in 2005. After our moon and the Galilean [gal-uh-LAY-un] moons, this was the first moon discovered, and this has the densest atmosphere of all satellites in our solar system. Name this large moon of Saturn. 1 Illinois Masonic Academic Bowl Round # 1 2014 State Tournament 1st Section Toss-up Questions Question #3: Miscellaneous – Pop Culture 10 points In one film, this actor delivered the line, “What we do in life Russell Crowe echoes in eternity.” This person starred as a programmed serial killer being pursued by Denzel Washington in Virtuosity. -
A Java Framework for Diachronic Content and Network Analysis of Mediawikis
WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis Rudiger¨ Gleim1, Alexander Mehler1, Sung Y. Song2 Text-Technology Lab, Goethe University1 Robert-Mayer-Straße 10, 60325 Frankfurt fgleim, [email protected], [email protected] Abstract We introduce WikiDragon, a Java Framework designed to give developers in computational linguistics an intuitive API to build, parse and analyze instances of MediaWikis such as Wikipedia, Wiktionary or WikiSource on their computers. It covers current versions of pages as well as the complete revision history, gives diachronic access to both page source code as well as accurately parsed HTML and supports the diachronic exploration of the page network. WikiDragon is self-enclosed and only requires an XML dump of the official Wikimedia Foundation website for import into an embedded database. No additional setup is required. We describe WikiDragon’s architecture and evaluate the framework based on the simple English Wikipedia with respect to the accuracy of link extraction, diachronic network analysis and the impact of using different Wikipedia frameworks to text analysis. Keywords: Wikipedia, Java Framework, Diachronic Text Analysis, Diachronic Network Analysis, Link Extraction 1. Introduction native approaches to work with wikis offline lack proper interfaces to explore the data or cover only certain aspects: Wikis such as Wikipedia, Wiktionary, WikiSource or for example by modeling only a subset of all namespaces WikiNews constitute a popular form of collaborative writ- (e.g. the main namespace and categories) or by developing ing and thus a valuable resource for computational lin- interfaces to special instances such as Wiktionary. guistics. This relates to research exploiting wiki data as a resource, e.g.