Urobe: a Prototype for Wiki Preservation


Niko Popitsch, Robert Mosser, Wolfgang Philipp
University of Vienna, Faculty of Computer Science

© 2010 Austrian Computer Society (OCG)

ABSTRACT

More and more of the information considered for digital long-term preservation is generated by Web 2.0 applications such as wikis, blogs or social networking tools. However, there is little support for the preservation of these data today. Currently they are preserved like regular Web sites, without taking the flexible, lightweight and mostly graph-based data models of the underlying Web 2.0 applications into consideration. By this, valuable information about the relations within these data and about links to other data is lost. Furthermore, information about the internal structure of the data, e.g., expressed by wiki markup languages, is not preserved entirely.

We argue that this currently neglected information is of high value in a long-term preservation strategy for Web 2.0 data and describe our approach for the preservation of wiki contents, which is based on Semantic Web technologies. In particular, we describe the distributed architecture of our wiki preservation prototype (Urobe), which implements a migration strategy for wiki contents and is based on Semantic Web and Linked Data principles. Further, we present a first vocabulary for the description of wiki core elements, derived from a combination of established vocabularies and standards from the Semantic Web and digital preservation domains, namely Dublin Core, SIOC, VoiD and PREMIS.

1 INTRODUCTION

Users of Web 2.0 applications like wikis, blogs or social networking tools generate highly interlinked data of public, corporate and personal interest that are increasingly considered for long-term digital preservation. The term Web 2.0 can be regarded as referring to "a class of Web-based applications that were recognized ex post facto to share certain design patterns", like being user-centered, collaborative and Web-based [6]. Web 2.0 applications are usually based on flexible, lightweight data models that interlink their core elements (e.g., users, wiki articles or blog posts) using hyperlinks and expose these data on the Web for human consumption as HTML. The current practice for the preservation of these data is to treat this layer like a regular Web site and to archive the HTML representations (usually by crawling them) rather than the core elements themselves. By this, some irrelevant information (e.g., automatically generated pages) is archived, while valuable information about the semantics of the relationships between these elements is lost or archived in a way that is not easily processable by machines (cf. http://jiscpowr.jiscinvolve.org/wp/2009/03/25/arch-wiki/). For example, a wiki article is authored by many different users, and the information about who authored what and when is reflected in the (simple) data model of the wiki software. This information is required to access and integrate these data with other data sets in the future. However, archiving only the HTML version of a history page in Wikipedia makes it hard to extract this information automatically.

Another issue is that the internal structure of particular core elements (e.g., wiki articles) is currently not preserved adequately. Wiki articles are authored using a particular wiki markup language. These simple description languages contain explicit information about the structure of the text (e.g., headings, emphasized phrases, lists and tables). This internal structure is lost to some extent if digital preservation strategies consider only the HTML version of such articles as rendered by a particular wiki software, because this rendering step is not entirely reversible in many cases.

In summary, the current practice for the preservation of Web 2.0 data preserves only one particular (HTML) representation of the considered data instead of preserving the core elements of the respective data models themselves. We consider these core elements and their relations crucial for future data migration and integration tasks. In the following we introduce our system Urobe, which is capable of preserving the core elements of data created with various wiki software.
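To make the authorship argument concrete: the revision metadata discussed above is directly exposed by a wiki's own data model, whereas a crawled HTML snapshot of a history page contains it only as rendered markup. The following minimal Python sketch (not part of Urobe; it assumes the public MediaWiki web API and the third-party requests library, and the article title is just an example) retrieves who edited a Wikipedia article, and when, as structured data:

    # Minimal sketch (assumptions: the MediaWiki API endpoint below and the
    # third-party "requests" library): fetch structured revision metadata,
    # i.e. who edited the article and when, directly from the wiki's data model.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "Digital preservation",        # example article
        "rvprop": "ids|timestamp|user|comment",  # revision id, date, author, summary
        "rvlimit": 10,
        "format": "json",
    }

    data = requests.get(API, params=params, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    for rev in page.get("revisions", []):
        print(rev["timestamp"], rev.get("user", "<anonymous>"), rev.get("comment", ""))

Recovering the same author/timestamp pairs from an archived HTML copy of the corresponding history page would require brittle screen scraping.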
2 UROBE: A WIKI PRESERVATION TOOL

We are currently developing a prototype (Urobe) for the long-term preservation of data created by wiki users. One particular problem when considering wiki preservation is that there is not one single wiki implementation but a large number of different ones (for example, the website http://www.wikimatrix.org/ lists over 100 popular wiki engines), each using its own wiki markup language. This is what makes a general emulation approach for preserving wiki contents infeasible, as it would require establishing appropriate emulation environments for each wiki software. After further analyzing several popular wiki engines, we have identified the following required components for implementing a long-term, migration-based wiki preservation strategy:

1. An abstract, semantic vocabulary/schema for the description of the core elements and their relations stored in a wiki, namely: users, articles and revisions, their contents, links, and embedded media.

2. Software components able to extract these data from wiki implementations.

3. A scalable infrastructure for harvesting and storing these data.

4. Migration services for migrating contents expressed in a wiki markup language into standardized formats (a toy illustration follows this list).

5. Migration services for the semantic transformation of the meta data stored in this system to newer formats (i.e., services for vocabulary evolution).

6. Software interfaces to existing digital preservation infrastructures using preservation meta data standards.

7. An effective user interface for accessing these data.
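As a toy illustration of component 4, the sketch below migrates a small, Creole-like subset of wiki markup (headings, bold, italics) into XHTML fragments. It is an assumption-laden stand-in, not Urobe's actual migration service, whose markup dialects and target formats are not detailed in this excerpt:

    # Toy example only: migrating a tiny, assumed wiki-markup subset
    # (Creole-like headings, bold, italics) into XHTML fragments.
    import re

    RULES = [
        (re.compile(r"^== (.+?) ==$", re.MULTILINE), r"<h2>\1</h2>"),  # == heading ==
        (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),        # **bold**
        (re.compile(r"//(.+?)//"), r"<em>\1</em>"),                    # //italics//
    ]

    def migrate(wiki_text: str) -> str:
        """Apply each markup rule in turn and return an XHTML fragment."""
        html = wiki_text
        for pattern, replacement in RULES:
            html = pattern.sub(replacement, html)
        return html

    print(migrate("== History ==\nThe **first** wiki went online in //1995//."))

A real migration service would rely on a full parser for the respective wiki markup dialect rather than on regular expressions.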
2.1 Benefits of Semantic Web Technologies

Urobe is implemented using Semantic Web technologies: it extracts the data stored in a wiki and archives them in the form of named RDF graphs [4]. The resources and properties in these graphs are described using a simple OWL (http://www.w3.org/2004/OWL/) vocabulary. Resources are highly interlinked with other resources due to structural relationships (e.g., article revisions are linked with the user that authored them) but also due to semantic relationships (e.g., user objects stemming from different wikis preserved by Urobe are automatically linked when they share the same e-mail address). We decided to make use of Semantic Web technologies for the representation of preserved data and meta data for the following reasons:

Flexibility. The data model for representing the preserved wiki contents is likely to change over time to meet new requirements, and it is not predictable at present how this data model will evolve in the future. In this context, modelling the data with the flexible, graph-based RDF data model seems a good choice to us: migrating to a newer data model can be seen as an ontology matching problem, for which tools and methods are constantly being developed in Semantic Web research [5, 11].

High semantic expressiveness. In order to read and interpret digital content in the future, it is necessary to preserve its semantics. As a consequence of the continuous evolution of data models, knowledge about data semantics disappears quickly if it is not specified explicitly [11]. To face this problem, we make use of well-defined, standardized Semantic Web vocabularies to define the semantics of our data explicitly.

Existing inference support. Inference makes it possible to find relations between items that were not specified explicitly. By this it is possible to generate additional knowledge about the preserved data that might improve future access to and migration of the data.

Expressive query language. One of the key goals of digital preservation systems is to enable users to re-find and access the data stored in such an archive. This often requires a preservation storage to support complex and sophisticated queries on its data. Data in RDF graphs can be queried using SPARQL, a rich, expressive, and standardized query language that meets these requirements [8].

Furthermore, we decided to publish the archived data as Linked Data [2] in order to exchange them between de-centralized components. Linked Data means that (i) resources are identified using HTTP URIs, (ii) de-referencing (i.e., accessing) a URI returns a meaningful representation of the respective resource (usually in RDF), and (iii) these representations include links to other related resources. Data published in this way can easily be accessed and integrated into existing Linked Data.

This highly flexible data representation can be accessed via the Urobe Web interface and can easily be converted to existing preservation meta data standards in order to integrate it with an existing preservation infrastructure.

2.2 A Vocabulary for Describing Wiki Contents

We have developed an OWL Lite vocabulary for the description of wiki core elements by analyzing popular wiki engines as well as common meta data standards from the digital preservation domain and vocabularies from the Semantic Web domain. The core terms of our vocabulary are depicted in Figure 1. Our vocabulary builds upon three common Semantic Web vocabularies: (i) DCTERMS for terms maintained by the Dublin Core Metadata Initiative (http://purl.org/dc/terms/), (ii) SIOC for describing online communities and their data (http://sioc-project.org/), and (iii) VoiD for describing datasets in a Web of Data (http://vocab.deri.ie/void/).

The vocabulary was designed to be directly mappable to the PREMIS Data Dictionary 2.0 [9]. We have implemented such a mapping and enable external tools to access the data stored in an Urobe archive as PREMIS XML descriptions (in compliance with the Linked Data recommendations, access to these representations is possible via content negotiation). This PREMIS/XML interface makes any Urobe instance a PREMIS-enabled storage that can easily be integrated into other PREMIS-compatible preservation infrastructures.
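To give a flavour of the archived representations and queries described in Sections 2.1 and 2.2, the following Python sketch uses the third-party rdflib library to describe one article revision with DCTERMS and SIOC terms and to retrieve it again with SPARQL. The example namespace and the exact modelling choices are invented for illustration and do not reproduce the published Urobe vocabulary; a single in-memory graph stands in for the named-graph store:

    # Sketch only: the EX namespace and the modelling below are assumptions
    # for illustration, not the published Urobe vocabulary.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DCTERMS, RDF, XSD

    SIOC = Namespace("http://rdfs.org/sioc/ns#")
    EX = Namespace("http://example.org/urobe/")  # hypothetical archive namespace

    g = Graph()
    user = EX["user/alice"]
    rev = EX["article/Main_Page/revision/42"]

    g.add((user, RDF.type, SIOC.UserAccount))
    g.add((user, SIOC.email, Literal("alice@example.org")))

    g.add((rev, RDF.type, SIOC.Post))          # one revision of a wiki article
    g.add((rev, SIOC.has_creator, user))       # who authored this revision
    g.add((rev, DCTERMS.created,
           Literal("2010-03-01T12:00:00", datatype=XSD.dateTime)))
    g.add((rev, SIOC.content,
           Literal("== History ==\nThe **first** wiki went online in 1995.")))

    # SPARQL: all revisions authored by accounts with a given e-mail address.
    q = """
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?rev ?created WHERE {
        ?rev  sioc:has_creator ?user ;
              dcterms:created  ?created .
        ?user sioc:email "alice@example.org" .
    }
    """
    for row in g.query(q):
        print(row.rev, row.created)

The query illustrates the cross-wiki linking mentioned in Section 2.1: matching on a shared e-mail address connects revisions to user accounts regardless of which wiki they were extracted from.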