Detecting Vandalism on Wikipedia Across Multiple Languages

Total Page:16

File Type:pdf, Size:1020Kb

Detecting Vandalism on Wikipedia Across Multiple Languages Detecting Vandalism on Wikipedia across Multiple Languages Khoi-Nguyen Dao Tran A thesis submitted for the degree of Doctor of Philosophy The Australian National University May 2015 c Khoi-Nguyen Dao Tran 2015 Except where otherwise indicated, this thesis is my own original work. Khoi-Nguyen Dao Tran May 5, 2015 To my family for your enduring love and support. To my friends for sharing your life and experiences with me. To my supervisors for your inspiration, guidance, patience, and feedback. To the virtual online worlds and gaming communities that are always welcoming with new friendships, experiences, and knowledge. And to Wikipedia and its supporters for providing the conditions and foundations that have made this thesis possible. Thank you :-) Acknowledgments I thank my family for their enduring love and support throughout my life. There are no words that can express my gratitude for Dat Tran (Dad), Phuong Dao (Mum), Thao Tran (Sister), and our loving dogs Wiggy and Pepper. I thank my friends for sharing their life, experiences, and time with me; and for helping me through my hardest times while studying for my bachelor and postgrad- uate degrees at our university. I have spent many hours interacting with friends in online communities and spent countless more with real life friends. Many friends have left parts of their personality with me, and have helped me grow and appreciate life for all its ups and downs. In particular, I thank Lachlan Horne, Jimmy Thomson, and Mayank Daswani for the uncountable number of hours we have spent talking and hanging out, and for the life lessons that I have learnt from each of them. I thank my main supervisor, Peter Christen, for his inspiration, guidance, pa- tience, and prompt feedback for all aspects of my research. I have been extremely lucky and grateful to have a supervisor who is passionate about his work and the work of his students. I cannot thank him enough for the time he has taken out of his busy schedule to meet, review, and discuss my written work, and to attend my presentations and provide feedback. My research work, skills, and confidence have improved immensely under his supervision. I also thank my co-supervisors Scott Sanner and Lexing Xie for their continued support and providing guidance when I felt lost in pursuing a solution. I thank them for attending my presentations, and making time for meetings and providing feedback on my written work despite their busy schedule. In addition, I thank my supervisors, Mamoun Alazab and Roderick Broadhurst, at the cybercrime laboratory for providing me with an opportunity to extend my research work and skills to cybercrime detection, making Chapter 8 possible. The re- search in Chapter 8 is funded by an ARC Discovery Grant on the Evolution of Cyber- crime (DP 1096833), the Australian Institute of Criminology (Grant CRG 13/12-13), and the support of the ANU Research School of Asia and the Pacific. We also thank the Australian Communications and Media Authority (ACMA) and the Computer Emergency Response Team (CERT) Australia for their assistance in the provision of data and support. I have been part of an amazing research group with many interesting research topics. I thank them for their friendship, encouragements, support, and participation in my presentations and research work. Finally, I thank the Australian National University (ANU) and the Research School of Computer Science for providing a welcoming environment for study and pursu- ing research. I have made many friends and have had many interesting, fun, and memorable experiences at the ANU. vii Abstract Vandalism, the malicious modification or editing of articles, is a serious problem for free and open access online encyclopedias such as Wikipedia. Over the 13 year lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more than 500 million revisions of over 9 million English articles, but smaller manually inspected sets of revisions for research show vandalism may appear in 7% to 11% of all revisions of English Wikipedia articles. The persistent threat of vandalism has led to the development of automated programs (bots) and editing assistance programs to help editors detect and repair vandalism. Research into improving vandalism detection through application of machine learning techniques have shown significant improvements to detection rates of a wider variety of vandalism. However, the focus of research is often only on the English Wikipedia, which has led us to develop a novel research area of cross-language vandalism detection (CLVD). CLVD provides a solution to detecting vandalism across several languages through the development of language-independent machine learning models. These models can identify undetected vandalism cases across languages that may have insufficient identified cases to build learning models. The two main challenges of CLVD are (1) identifying language-independent features of vandalism that are common to mul- tiple languages, and (2) extensibility of vandalism detection models trained in one language to other languages without significant loss in detection rate. In addition, other important challenges of vandalism detection are (3) high detection rate of a va- riety of known vandalism types, (4) scalability to the size of Wikipedia in the number of revisions, and (5) ability to incorporate and generate multiple types of data that characterise vandalism. In this thesis, we present our research into CLVD on Wikipedia, where we identify gaps and problems in existing vandalism detection techniques. To begin our thesis, we introduce the problem of vandalism on Wikipedia with motivating examples, and then present a review of the literature. From this review, we identify and address the following research gaps. First, we propose techniques for summarising the user ac- tivity of articles and comparing the knowledge coverage of articles across languages. Second, we investigate CLVD using the metadata of article revisions together with article views to learn vandalism models and classify incoming revisions. Third, we propose new text features that are more suitable for CLVD than text features from the literature. Fourth, we propose a novel context-aware vandalism detection tech- nique for sneaky types of vandalism that may not be detectable through constructing features. Finally, to show that our techniques of detecting malicious activities are not limited to Wikipedia, we apply our feature sets to detecting malicious attachments and URLs in spam emails. Overall, our ultimate aim is to build the next genera- tion of vandalism detection bots that can learn and detect vandalism from multiple languages and extend their usefulness to other language editions of Wikipedia. ix x List of Publications Below is our list of publications and their relevance to this thesis. We include ad- ditional information on the quality of the conferences and journals (or transactions) from their Web sites, email communications, and the Computing Research and Edu- cation Association (CORE) of Australasia1. CORE provides a ranking of publication venues based on feedback from research peers2. The rankings are A* (flagship conference), A (excellent conference), B (good conference), and C (other). 1. Khoi-Nguyen Tran and Peter Christen. Identifying Multilingual Wikipedia Arti- cles based on Cross Language Similarity and Activity. 2013. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Manage- ment (CIKM). CORE ranking of A3. Poster paper (acceptance rate of 37.7%). This paper describes measures that summarise cross-language knowledge cov- erage and within-language user activity of Wikipedia articles in different lan- guages and over time. This work is the basis of Chapter 4 with extensions for this thesis of additional measures, visualisations, and analyses. 2. Khoi-Nguyen Tran and Peter Christen. Cross-Language Prediction of Vandalism on Wikipedia Using Article Views and Revisions. 2013. In Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). CORE ranking of A4. Full paper (acceptance rate of 11.3%). This paper investigates cross-language vandalism detection using features de- rived from metadata, and compares performance of multiple classifiers. This work is the basis of Chapter 5. 3. Khoi-Nguyen Tran and Peter Christen. Cross-Language Learning from Bots and Users to detect Vandalism on Wikipedia. 2015. In IEEE Transactions of Knowledge and Data Engineering (TKDE). CORE ranking of A5. Full paper (acceptance rate in the first 10 months of 2014 is 13.2%; impact factor in April 2015 of 1.81). This paper investigates cross-language vandalism detection using features de- rived from text data of Wikipedia articles, and provides an analysis of the con- tributions of bots in vandalism detection, which is often ignored by related work. This work is the basis of Chapter 6. 1http://www.core.edu.au 2http://www.core.edu.au/index.php/conference-rankings 3http://103.1.187.206/core/25/ 4http://103.1.187.206/core/1667/ 5http://103.1.187.206/core/2064/ xi xii 4. Khoi-Nguyen Tran, Peter Christen, Scott Sanner, and Lexing Xie. Context-Aware Detection of Sneaky Vandalism on Wikipedia across Multiple Languages. 2015. In Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). CORE ranking of A6. Awarded “Best Student Pa- per” at the PAKDD conference. This paper proposes a novel context-aware detection technique for sneaky vandalism, which are more difficult or ambigu- ous types of vandalism. This work is the basis of Chapter 7. 5. Khoi-Nguyen Tran, Mamoun Alazab, and Roderic Broadhurst. Towards a Fea- ture Rich Model for Predicting Spam Emails containing Malicious Attachments and URLs. 2013. In Proceedings of the 11th Australasian Data Mining Conference (AusDM). CORE ranking of B7. Full paper (acceptance rate of 37%). This paper shows the extension of vandalism detection – the detection of ma- licious activity on Wikipedia – to the detection of malicious content in spam emails from only the email text.
Recommended publications
  • Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers Abstract
    THE YALE LAW JOURNAL FORUM O CTOBER 9 , 2017 Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers abstract. The problem of fake news impacts a massive online ecosystem of individuals and organizations creating, sharing, and disseminating content around the world. One effective ap- proach to addressing false information lies in monitoring such information through an active, engaged volunteer community. Wikipedia, as one of the largest online volunteer contributor communities, presents one example of this approach. This Essay argues that the existing legal framework protecting intermediary companies in the United States empowers the Wikipedia community to ensure that information is accurate and well-sourced. The Essay further argues that current legal efforts to weaken these protections, in response to the “fake news” problem, are likely to create perverse incentives that will harm volunteer engagement and confuse the public. Finally, the Essay offers suggestions for other intermediaries beyond Wikipedia to help monitor their content through user community engagement. introduction Wikipedia is well-known as a free online encyclopedia that covers nearly any topic, including both the popular and the incredibly obscure. It is also an encyclopedia that anyone can edit, an example of one of the largest crowd- sourced, user-generated content websites in the world. This user-generated model is supported by the Wikimedia Foundation, which relies on the robust intermediary liability immunity framework of U.S. law to allow the volunteer editor community to work independently. Volunteer engagement on Wikipedia provides an effective framework for combating fake news and false infor- mation. 358 wikipedia and intermediary immunity: supporting sturdy crowd systems for producing reliable information It is perhaps surprising that a project open to public editing could be highly reliable.
    [Show full text]
  • Decentralization in Wikipedia Governance
    Decentralization in Wikipedia Governance Andrea Forte1, Vanessa Larco2 and Amy Bruckman1 1GVU Center, College of Computing, Georgia Institute of Technology {aforte, asb}@cc.gatech.edu 2Microsoft [email protected] This is a preprint version of the journal article: Forte, Andrea, Vanessa Larco and Amy Bruckman. (2009) Decentralization in Wikipedia Governance. Journal of Management Information Systems. 26(1) pp 49-72. Publisher: M.E. Sharp www.mesharpe.com/journals.asp Abstract How does “self-governance” happen in Wikipedia? Through in-depth interviews with twenty individuals who have held a variety of responsibilities in the English-language Wikipedia, we obtained rich descriptions of how various forces produce and regulate social structures on the site. Our analysis describes Wikipedia as an organization with highly refined policies, norms, and a technological architecture that supports organizational ideals of consensus building and discussion. We describe how governance on the site is becoming increasingly decentralized as the community grows and how this is predicted by theories of commons-based governance developed in offline contexts. We also briefly examine local governance structures called WikiProjects through the example of WikiProject Military History, one of the oldest and most prolific projects on the site. 1. The Mechanisms of Self-Organization Should a picture of a big, hairy tarantula appear in an encyclopedia article about arachnophobia? Does it illustrate the point, or just frighten potential readers? Reasonable people might disagree on this question. In a freely editable site like Wikipedia, anyone can add the photo, and someone else can remove it. And someone can add it back, and the process continues.
    [Show full text]
  • Achila, Visigothic King, 34 Acisclus, Córdoban Martyr, 158 Adams
    Index ; Achila, Visigothic king, 34 Almodóvar del Río, Spain, 123–24 Acisclus, Córdoban martyr, 158 Almonacid de la Cuba, Spain, 150. See Adams, Robert, 21 also Dams Aemilian, St., 160 Alonso de la Sierra, Juan, 97 Aerial photography, 40, 82 Amalaric, Visigothic king, 29–30, 132, Aetius, Roman general, 173–75 157 Africa, 4, 21–23; and amphorae, 116, Amber, 114 137, 187, 196; and ARS, 46, 56, 90, Ammianus Marcellinus, Roman histo- 99, 187; and Byzantine reconquest, rian, 166, 168 30; and ‹shing, 103; and olive oil, Amphorae, 43, 80, 199–200; exported 88, 188; and Roman army, 114, 127, from Spain, 44, 97–98, 113, 115–16, 166; and trade, 105, 141; and Van- 172; kilns, 61–62, 87–90, 184; from dals, 27–28, 97, 127, 174 North Africa, 129, 187. See also African Red Slip (ARS) pottery, 101, Kilns 147, 186–87, 191, 197; de‹nition, 41, Anderson, Perry, 5 43, 44, 46; and site survival, 90, Andujar, Spain, 38, 47, 63 92–95, 98–99; and trade, 105–6, 110, Annales, 8, 12, 39 114, 116, 129, 183 Annona: disruption by Vandals, 97, Agde, council of, 29, 36, 41 174; to Roman army, 44, 81, 114–17; Agglomeration, 40–42, 59, 92 to Rome, 23, 27, 44, 81, 113; under Agila, Visigothic king, 158–59. See Ostrogoths, 29, 133. See also Army also Athanagild Antioch, Syria, 126 Agrippa, Roman general, 118 Anti-Semitism, 12, 33. See also Jews Alans, 24, 26, 27, 34, 126, 175 Antonine Itinerary, 152 Alaric, Visigothic king, 2, 5, 26–27 Apuleius, Roman writer, 75–76, 122 Alaric II, Visigothic king, 29–30 Aqueducts, 119, 130, 134, 174–75 Alcalá del Río, Spain, 40, 44, 93, 123, Aquitaine, France, 2, 27, 45, 102 148 Arabs, 33–34, 132–33, 137.
    [Show full text]
  • Initiativen Roger Cloes Infobrief
    Wissenschaftliche Dienste Deutscher Bundestag Infobrief Entwicklung und Bedeutung der im Internet ehrenamtlich eingestell- ten Wissensangebote insbesondere im Hinblick auf die Wiki- Initiativen Roger Cloes WD 10 - 3010 - 074/11 Wissenschaftliche Dienste Infobrief Seite 2 WD 10 - 3010 - 074/11 Entwicklung und Bedeutung der ehrenamtlich im Internet eingestellten Wissensangebote insbe- sondere im Hinblick auf die Wiki-Initiativen Verfasser: Dr. Roger Cloes / Tim Moritz Hector (Praktikant) Aktenzeichen: WD 10 - 3010 - 074/11 Abschluss der Arbeit: 1. Juli 2011 Fachbereich: WD 10: Kultur, Medien und Sport Ausarbeitungen und andere Informationsangebote der Wissenschaftlichen Dienste geben nicht die Auffassung des Deutschen Bundestages, eines seiner Organe oder der Bundestagsverwaltung wieder. Vielmehr liegen sie in der fachlichen Verantwortung der Verfasserinnen und Verfasser sowie der Fachbereichsleitung. Der Deutsche Bundestag behält sich die Rechte der Veröffentlichung und Verbreitung vor. Beides bedarf der Zustimmung der Leitung der Abteilung W, Platz der Republik 1, 11011 Berlin. Wissenschaftliche Dienste Infobrief Seite 3 WD 10 - 3010 - 074/11 Zusammenfassung Ehrenamtlich ins Internet eingestelltes Wissen spielt in Zeiten des sogenannten „Mitmachweb“ eine zunehmend herausgehobene Rolle. Vor allem Wikis und insbesondere Wikipedia sind ein nicht mehr wegzudenkendes Element des Internets. Keine anderen vergleichbaren Wissensange- bote im Internet, auch wenn sie zum freien Abruf eingestellt sind und mit Steuern, Gebühren, Werbeeinnahmen finanziert oder als Gratisproben im Rahmen von Geschäftsmodellen verschenkt werden, erreichen die Zugriffszahlen von Wikipedia. Das ist ein Phänomen, das in seiner Dimension vor dem Hintergrund der urheberrechtlichen Dis- kussion und der Begründung von staatlichem Schutz als Voraussetzung für die Schaffung von geistigen Gütern kaum Beachtung findet. Relativ niedrige Verbreitungskosten im Internet und geringe oder keine Erfordernisse an Kapitalinvestitionen begünstigen diese Entwicklung.
    [Show full text]
  • The History of the Crusades Podcast Presents Reconquista: the Rise of Al-Andalus and the Reconquest of Spain Episode 3 the Battle of 711
    The History of the Crusades Podcast presents Reconquista: The Rise of Al-Andalus and the Reconquest of Spain Episode 3 The Battle of 711 Hello again. Last time we saw a Muslim fighting force under the command of Tariq ibn Zayid leave Tangier and make its way to a place on the southern coast of Spain, which will later be called Gibraltar. When we left the last episode, Tariq had spread his army around the bay which lies adjacent to Gibraltar, while King Roderic had mustered his armies at Cordoba and was heading southwards towards Gibraltar. Now I would absolutely love to take you through a detailed blow-by-blow account of the massively significant battle which is about to take place, but unfortunately I can't. Just like we discovered when looking at the final years of the Visigoths in the last episode, there is an irritating lack of dependable source material about this event. In fact, even to this day, historians aren't able to agree on the rather basic fact of where exactly the battle took place. Most famous battles take their names from the place where the fighting played out, but by the fact that this battle is generally known only as “The Battle of 711”, you can see that even this is unclear. But luckily we do know some facts, and we can make some educated guesses about the rest. So with a huge disclaimer about the accuracy of all of this, here is a rundown of the Battle of 711. Now, as we saw in the last episode, King Roderic’s rise to the throne was opposed by Akhila, who had been promised a hereditary crown following the death of his father.
    [Show full text]
  • Wikipedia Is Under 'Siege'
    Wikipedia and Academic Research A guide on interacting with the free encyclopaedia. Greek Fire Catapult (Harper's Engraving) Public domain image from Wikicommons. Wikipedia is under ‘siege’. It may be helpful, in the first instance, to think of Wikipedia in terms of it being under siege. Whether this is from politicians, PR companies or from private individuals seeking to either gain an advantage in some way or denigrate a rival in some way, the point remains that the Wikipedia community is primed to protect the integrity of its main Open Knowledge project in order to hold on what is good about it. Wikipedia administrators will often immediately revert any changes they perceive as a threat as part of their default safety-first approach. Consequently, the advice from Wikimedia UK is to adopt a ‘softly, softly’ approach when making edits to Wikipedia articles i.e. not overloading the encyclopaedia with too many external links at one time without due consideration of the relevance of the link to the article or if it really adds anything. Protecting the integrity of Wikipedia Parliament WikiEdits Twitter account - Screengrab 28/03/2016 With preserving the integrity of Wikipedia in mind, those Wikipedia edits made from Parliamentary IP addresses are routinely monitored through the Parliament WikiEdits Twitter account (@parliamentedits). Did you know? There are PR companies out there offering ‘Wikipedia sanitisation’ as a service. Imagine the following situation: A new Wikipedia User account has appeared and its only activity is to add external links- all to the same site- to a lot of articles. To a Wikipedia editor this looks promotional and doesn’t look like building an encyclopaedia.
    [Show full text]
  • Lessons from Citizendium
    Lessons from Citizendium Wikimania 2009, Buenos Aires, 28 August 2009 HaeB [[de:Benutzer:HaeB]], [[en:User:HaeB]] Please don't take photos during this talk. Citizendium Timeline ● September 2006: Citizendium announced. Sole founder: Larry Sanger, known as former editor-in-chief of Nupedia, chief organizer of Wikipedia (2001-2002), and later as Wikipedia critic ● October 2006: Started non-public pilot phase ● January 2007: “Big Unfork”: All unmodified copies of Wikipedia articles deleted ● March 2007: Public launch ● December 2007: Decision to use CC-BY-3.0, after debate about commercial reuse and compatibility with Wikipedia ● Mid-2009: Sanger largely inactive on Citizendium, focuses on WatchKnow ● August 2009: Larry Sanger announces he will step down as editor-in-chief soon (as committed to in 2006) Citizendium and Wikipedia: Similarities and differences ● Encyclopedia ● Strict real names ● Free license policy ● ● Open (anyone can Special role for contribute) experts: “editors” can issue content ● Created by amateurs decisions, binding to ● MediaWiki-based non-editors collaboration ● Governance: Social ● Non-profit contract, elements of a constitutional republic Wikipedian views of Citizendium ● Competitor for readers, contributions ● Ally, common goal of creating free encyclopedic content ● “Who?” ● In this talk: A long-time experiment testing several fundamental policy changes, in a framework which is still similar enough to that of Wikipedia to generate valuable evidence as to what their effect might be on WP Active editors: Waiting to explode ● Sanger (October 2007): ”At some point, possibly very soon, the Citizendium will grow explosively-- say, quadruple the number of its active contributors, or even grow by an order of magnitude ....“ © Aleksander Stos, CC-BY 3.0 Number of users that made at least one edit in each month Article creation rate: Still muddling Sanger (October 2007): “It's still possible that the project will, from here until eternity, muddle on creating 14 articles per day.
    [Show full text]
  • Community Or Social Movement? Piotr Konieczny
    Wikipedia: Community or social movement? Piotr Konieczny To cite this version: Piotr Konieczny. Wikipedia: Community or social movement?. Interface: a journal for and about social movements, 2009. hal-01580966 HAL Id: hal-01580966 https://hal.archives-ouvertes.fr/hal-01580966 Submitted on 4 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Interface: a journal for and about social movements Article Volume 1 (2): 212 - 232 (November 2009) Konieczny, Wikipedia Wikipedia: community or social movement? Piotr Konieczny Abstract In recent years a new realm for study of political and sociological phenomena has appeared, the Internet, contributing to major changes in our societies during its relatively brief existence. Within cyberspace, organizations whose existence is increasingly tied to this virtual world are of interest to social scientists. This study will analyze the community of one of the largest online organizations, Wikipedia, the free encyclopedia with millions of volunteer members. Wikipedia was never meant to be a community, yet it most certainly has become one. This study asks whether it is something even more –whether it is an expression of online activism, and whether it can be seen as a social movement organization, related to one or more of the Internet-centered social movements industries (in particular, the free and open-source software movement industry).
    [Show full text]
  • Reliability of Wikipedia 1 Reliability of Wikipedia
    Reliability of Wikipedia 1 Reliability of Wikipedia The reliability of Wikipedia (primarily of the English language version), compared to other encyclopedias and more specialized sources, is assessed in many ways, including statistically, through comparative review, analysis of the historical patterns, and strengths and weaknesses inherent in the editing process unique to Wikipedia. [1] Because Wikipedia is open to anonymous and collaborative editing, assessments of its Vandalism of a Wikipedia article reliability usually include examinations of how quickly false or misleading information is removed. An early study conducted by IBM researchers in 2003—two years following Wikipedia's establishment—found that "vandalism is usually repaired extremely quickly — so quickly that most users will never see its effects"[2] and concluded that Wikipedia had "surprisingly effective self-healing capabilities".[3] A 2007 peer-reviewed study stated that "42% of damage is repaired almost immediately... Nonetheless, there are still hundreds of millions of damaged views."[4] Several studies have been done to assess the reliability of Wikipedia. A notable early study in the journal Nature suggested that in 2005, Wikipedia scientific articles came close to the level of accuracy in Encyclopædia Britannica and had a similar rate of "serious errors".[5] This study was disputed by Encyclopædia Britannica.[6] By 2010 reviewers in medical and scientific fields such as toxicology, cancer research and drug information reviewing Wikipedia against professional and peer-reviewed sources found that Wikipedia's depth and coverage were of a very high standard, often comparable in coverage to physician databases and considerably better than well known reputable national media outlets.
    [Show full text]
  • Basic Spanish for the Camino
    BASIC SPANISH FOR THE CAMINO A Pilgrim’s Introduction to the Spanish Language and Culture American Pilgrims on the Camino www.americanpilgrims.org Northern California Chapter [email protected] February 1, 2020 Bienvenido peregrino Leaving soon on your Camino and need to learn some Spanish basics? Or perhaps you already know some Spanish and just need a refresher and some practice? In any case, here is a great opportunity to increase your awareness of the Spanish language and to prepare for your Camino and the transition into Spanish culture. Our meetings will focus on the language challenges that, as a pilgrim, you are likely to encounter on the Camino. While we will talk about culture, history, food, wine and many other day-to-day aspects of Spanish life, our objective will be to increase your language skills. Familiarity with the Spanish spoken in Spain will make the cultural transition easier for you and ultimately pay off with more satisfying human interactions along the Camino. Our meetings will be informal, in a comfortable environment and geared to making the review of Spanish an enjoyable experience. Buen Camino Emilio Escudero Northern California Chapter American Pilgrims on the Camino www.americanpilgrims.org TABLE OF CONTENTS Day 1 Agenda . i Day 2 Agenda . ii Day 3 Agenda . iii 01 - The Communities, Provinces and Geography of Spain . 1 02 - Spain - A Brief Introduction . 3 03 - A Brief History of Spain and the Camino de Santiago . 5 04 - Holidays and Observances in Spain 2020 . 11 05 - An Overview of the Spanish Language . 12 06 - Arabic Words Incorporated into Spanish .
    [Show full text]
  • Towards Content-Driven Reputation for Collaborative Code Repositories
    University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science 8-28-2012 Towards Content-Driven Reputation for Collaborative Code Repositories Andrew G. West University of Pennsylvania, [email protected] Insup Lee University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/cis_papers Part of the Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, and the Software Engineering Commons Recommended Citation Andrew G. West and Insup Lee, "Towards Content-Driven Reputation for Collaborative Code Repositories", 8th International Symposium on Wikis and Open Collaboration (WikiSym '12) , 13.1-13.4. August 2012. http://dx.doi.org/10.1145/2462932.2462950 Eighth International Symposium on Wikis and Open Collaboration, Linz, Austria, August 2012. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_papers/750 For more information, please contact [email protected]. Towards Content-Driven Reputation for Collaborative Code Repositories Abstract As evidenced by SourceForge and GitHub, code repositories now integrate Web 2.0 functionality that enables global participation with minimal barriers-to-entry. To prevent detrimental contributions enabled by crowdsourcing, reputation is one proposed solution. Fortunately this is an issue that has been addressed in analogous version control systems such as the *wiki* for natural language content. The WikiTrust algorithm ("content-driven reputation"), while developed and evaluated in wiki environments operates under a possibly shared collaborative assumption: actions that "survive" subsequent edits are reflective of good authorship. In this paper we examine WikiTrust's ability to measure author quality in collaborative code development. We first define a mapping omfr repositories to wiki environments and use it to evaluate a production SVN repository with 92,000 updates.
    [Show full text]
  • Checkuser and Editing Patterns
    CheckUser and Editing Patterns Balancing privacy and accountability on Wikimedia projects Wikimania 2008, Alexandria, July 18, 2008 HaeB [[de:Benutzer:HaeB]], [[en:User:HaeB]] Please don't take photos during this talk. What are sockpuppets? ● Wikipedia and sister projects rely on being open: Anyone can edit, anyone can get an account. ● No ID control when registering (not even email address required) ● Many legitimate uses for multiple accounts ● “Sockpuppet” often implies deceptive intention, think ventriloquist What is the problem with sockpuppets? ● Ballot stuffing (some decision processes rely on voting, such as request for adminships and WMF board elections) ● “Dr Jekyll/Mr Hyde”: Carry out evil or controversial actions with a sockpuppet, such that the main account remains in good standing. E.g. trolling (actions intended to provoke adversive reactions and disrupt the community), or strawman accounts (putting the adversarial position in bad light) ● Artificial majorities in content disputes (useful if the wiki's culture values majority consensus over references and arguments), especially circumventing “three-revert-rule” ● Ban evasion What is the problem with sockpuppets? (cont'd) ● Newbies get bitten as a result of the possibility of ban evasion: ● Friedman and Resnick (The Social Cost of Cheap Pseudonyms, Journal of Economics and Management Strategy 2001): Proved in a game-theoretic model that the possibility of creating sockpuppets leads to distrust and discrimination against newcomers (if the community is otherwise successfully
    [Show full text]