Pretext Author's Guide

Total Page:16

File Type:pdf, Size:1020Kb

Pretext Author's Guide PreTeXt Author’s Guide PreTeXt Author’s Guide Robert A. Beezer University of Puget Sound DRAFT June 28, 2019 DRAFT ©2013–2016 Robert A. Beezer Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”. Preface This guide will help you author a PreTeXt document. So it serves as a description of the PreTeXt xml vocabulary, along with the mechanics of creating the source and common output formats. Note that this is different than publishing your document, which is described in the PreTeXt Publisher’s Guide. Even if you intend to distribute your document with an open license, and you are both author and publisher, it is still helpful and instructive to understand, and separate, the two different steps and roles. v Contents Preface v 1 Start Here 1 1.1 Philosophy..............................................1 1.2 Formatting Your Source.......................................2 1.3 Where Next?.............................................3 2 A Careful, Quick, Minimal Example4 2.1 Authoring...............................................4 2.2 Setup.................................................5 2.3 Processing...............................................5 2.4 Extending the Minimal Example..................................6 2.5 Where Next?.............................................8 3 Overview of Features 9 3.1 Structure...............................................9 3.2 Paragraphs..............................................9 3.3 Cross-References........................................... 10 3.4 Titles................................................. 10 3.5 Mathematics............................................. 11 3.6 Images................................................. 11 3.7 Lists.................................................. 12 3.8 Exercises............................................... 12 3.9 Worksheets.............................................. 13 3.10 References............................................... 13 3.11 Figures and Tables.......................................... 13 3.12 Programs and Consoles....................................... 13 3.13 Special Characters.......................................... 14 3.14 Verbatim and Literal Text...................................... 14 3.15 Sage.................................................. 14 3.16 Side-by-Side Panels.......................................... 15 3.17 Mathematical Results........................................ 15 3.18 Front Matter............................................. 15 3.19 Back Matter.............................................. 15 3.20 Index and Notation Entries..................................... 15 3.21 WeBWorK Exercises......................................... 16 3.22 URLs and External References................................... 16 3.23 Video................................................. 16 3.24 Scientific Units............................................ 16 3.25 Accessibility.............................................. 17 vi CONTENTS vii 4 Processing, Tools and Workflow 18 4.1 Basic Processing........................................... 18 4.2 Modular Source Files......................................... 18 4.3 Verifying your Source........................................ 19 4.4 Customizations, String Parameters................................. 20 4.5 Customizations, Thin XSL Stylesheets............................... 20 4.6 Images and the mbx Script...................................... 21 4.7 Author Tools............................................. 21 4.8 Keeping Your Source Up-to-Date.................................. 22 4.9 File Management........................................... 24 4.10 Testing HTML Output Locally................................... 25 4.11 Doctesting Sage Code........................................ 26 4.12 Building Output in CoCalc..................................... 26 5 PreTeXt Vocabulary Specification 27 5.1 RELAX-NG Schema......................................... 27 5.2 Schematron Rules........................................... 28 5.3 Versions of the Schema........................................ 28 5.4 Reading RELAX-NG Compact Syntax............................... 29 5.5 Validation............................................... 29 5.6 Schema Browser........................................... 30 5.7 Editor Support for Schema..................................... 30 6 (∗) Topics 31 6.1 Paragraphs.............................................. 31 6.2 (∗) Exercises, Inline and Divisional................................. 39 6.3 (∗) Worksheets............................................ 39 6.4 (∗) Lists of Works Cited (References)................................ 39 6.5 Verbatim and Literal Text...................................... 39 6.6 Cross-References and Citations................................... 40 6.7 Divisions................................................ 44 6.8 Mathematics............................................. 44 6.9 Lists.................................................. 48 6.10 Exercises and their Solutions.................................... 52 6.11 Images................................................. 53 6.12 (∗) Tables and Tabulars....................................... 55 6.13 (∗) Program Listings......................................... 55 6.14 Side-by-Side Panels.......................................... 55 6.15 (∗) Front and Back Matter..................................... 56 6.16 Index................................................. 56 6.17 (∗) Notation.............................................. 59 6.18 (∗) Automatic Lists.......................................... 59 6.19 URLs and External References................................... 59 6.20 Video................................................. 60 6.21 (∗) Music............................................... 63 6.22 (∗) Units of Measure......................................... 63 6.23 Unicode Characters.......................................... 63 6.24 (∗) Testing Sage Examples...................................... 64 6.25 Xinclude Modularization....................................... 64 6.26 Accessibility.............................................. 65 7 Authoring Advice 68 7.1 Writing Your Student-Friendly Math Textbook.......................... 68 CONTENTS viii 8 The mbx Script 70 8.1 Running mbx ............................................. 70 8.2 Example Use............................................. 70 8.3 Strategy................................................ 71 8.4 Debugging Image Generation.................................... 71 8.5 Restricting the Scope......................................... 71 8.6 Configuring External Helper Programs............................... 72 8.7 Output................................................ 72 8.8 mbx Capabilities............................................ 72 8.9 mbx on Windows........................................... 72 8.10 Python requests Library...................................... 73 9 WeBWorK Exercises 74 9.1 WeBWorK Problems......................................... 74 A Welcome to the PreTeXt Community 78 A.1 Help and Support........................................... 78 A.2 Feature Requests and Reporting Problems............................. 79 A.3 Contributing............................................. 79 A.4 Personal Email............................................ 79 B FAQ: Frequently Asked Questions 81 Glossary 85 C Best Practices 88 D Conversion from LATEX 89 E Text Editors, Spell Check 90 E.1 Sublime Text............................................. 90 E.2 Aspell................................................. 97 E.3 emacs................................................. 98 E.4 XML Copy Editor.......................................... 98 E.5 vi, vim................................................. 98 F Schema Tools 99 F.1 Jing and Trang............................................ 99 F.2 Schematron.............................................. 101 G Revision Control: git 102 H Windows Installation Notes 103 H.1 Setup................................................. 103 H.2 Installing xsltproc.......................................... 105 H.3 Installing git............................................. 107 H.4 Installing Anaconda......................................... 108 H.5 Installing ImageMagick....................................... 108 H.6 Installing Ghostscript........................................ 109 H.7 Installing pdf2svg .......................................... 109 H.8 Installing jing............................................. 110 H.9 What’s Missing............................................ 111 I Windows Subsystem for Linux 112 CONTENTS ix J The PreTeXt Vagrant box 114 K GNU Free Documentation License 117 References 123 Index 124 Chapter 1 Start Here Welcome to the Author’s Guide for PreTeXt. You are likely eager to get started, but familiarizing yourself with this chapter should save you a lot of time in the long run. We will try to keep it short and at the end of early chapters we will guide you on where to go next. Not everything we say here will make sense on your first reading, so come back after your first few trial runs. When you are ready to seek further help, or ask questions, please read the Welcome to
Recommended publications
  • Practice Tips for Open Source Licensing Adam Kubelka
    Santa Clara High Technology Law Journal Volume 22 | Issue 4 Article 4 2006 No Free Beer - Practice Tips for Open Source Licensing Adam Kubelka Matthew aF wcett Follow this and additional works at: http://digitalcommons.law.scu.edu/chtlj Part of the Law Commons Recommended Citation Adam Kubelka and Matthew Fawcett, No Free Beer - Practice Tips for Open Source Licensing, 22 Santa Clara High Tech. L.J. 797 (2005). Available at: http://digitalcommons.law.scu.edu/chtlj/vol22/iss4/4 This Article is brought to you for free and open access by the Journals at Santa Clara Law Digital Commons. It has been accepted for inclusion in Santa Clara High Technology Law Journal by an authorized administrator of Santa Clara Law Digital Commons. For more information, please contact [email protected]. ARTICLE NO FREE BEER - PRACTICE TIPS FOR OPEN SOURCE LICENSING Adam Kubelkat Matthew Fawcetttt I. INTRODUCTION Open source software is big business. According to research conducted by Optaros, Inc., and InformationWeek magazine, 87 percent of the 512 companies surveyed use open source software, with companies earning over $1 billion in annual revenue saving an average of $3.3 million by using open source software in 2004.1 Open source is not just staying in computer rooms either-it is increasingly grabbing intellectual property headlines and entering mainstream news on issues like the following: i. A $5 billion dollar legal dispute between SCO Group Inc. (SCO) and International Business Machines Corp. t Adam Kubelka is Corporate Counsel at JDS Uniphase Corporation, where he advises the company on matters related to the commercialization of its products.
    [Show full text]
  • Dockerdocker
    X86 Exagear Emulation • Android Gaming • Meta Package Installation Year Two Issue #14 Feb 2015 ODROIDMagazine DockerDocker OS Spotlight: Deploying ready-to-use Ubuntu Studio containers for running complex system environments • Interfacing ODROID-C1 with 16 Channel Relay Play with the Weather Board • ODROID-C1 Minimal Install • Device Configuration for Android Development • Remote Desktop using Guacamole What we stand for. We strive to symbolize the edge of technology, future, youth, humanity, and engineering. Our philosophy is based on Developers. And our efforts to keep close relationships with developers around the world. For that, you can always count on having the quality and sophistication that is the hallmark of our products. Simple, modern and distinctive. So you can have the best to accomplish everything you can dream of. We are now shipping the ODROID U3 devices to EU countries! Come and visit our online store to shop! Address: Max-Pollin-Straße 1 85104 Pförring Germany Telephone & Fax phone : +49 (0) 8403 / 920-920 email : [email protected] Our ODROID products can be found at http://bit.ly/1tXPXwe EDITORIAL ow that ODROID Magazine is in its second year, we’ve ex- panded into several social networks in order to make it Neasier for you to ask questions, suggest topics, send article submissions, and be notified whenever the latest issue has been posted. Check out our Google+ page at http://bit.ly/1D7ds9u, our Reddit forum at http://bit. ly/1DyClsP, and our Hardkernel subforum at http://bit.ly/1E66Tm6. If you’ve been following the recent Docker trends, you’ll be excited to find out about some of the pre-built Docker images available for the ODROID, detailed in the second part of our Docker series that began last month.
    [Show full text]
  • Specification for JSON Abstract Data Notation Version
    Standards Track Work Product Specification for JSON Abstract Data Notation (JADN) Version 1.0 Committee Specification 01 17 August 2021 This stage: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.html https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.pdf Previous stage: https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.html https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.pdf Latest stage: https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.html https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.pdf Technical Committee: OASIS Open Command and Control (OpenC2) TC Chair: Duncan Sparrell ([email protected]), sFractal Consulting LLC Editor: David Kemp ([email protected]), National Security Agency Additional artifacts: This prose specification is one component of a Work Product that also includes: JSON schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.json JADN schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.jadn Abstract: JSON Abstract Data Notation (JADN) is a UML-based information modeling language that defines data structure independently of data format.
    [Show full text]
  • An Introduction to Software Licensing
    An Introduction to Software Licensing James Willenbring Software Engineering and Research Department Center for Computing Research Sandia National Laboratories David Bernholdt Oak Ridge National Laboratory Please open the Q&A Google Doc so that I can ask you Michael Heroux some questions! Sandia National Laboratories http://bit.ly/IDEAS-licensing ATPESC 2019 Q Center, St. Charles, IL (USA) (And you’re welcome to ask See slide 2 for 8 August 2019 license details me questions too) exascaleproject.org Disclaimers, license, citation, and acknowledgements Disclaimers • This is not legal advice (TINLA). Consult with true experts before making any consequential decisions • Copyright laws differ by country. Some info may be US-centric License and Citation • This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). • Requested citation: James Willenbring, David Bernholdt and Michael Heroux, An Introduction to Software Licensing, tutorial, in Argonne Training Program on Extreme-Scale Computing (ATPESC) 2019. • An earlier presentation is archived at https://ideas-productivity.org/events/hpc-best-practices-webinars/#webinar024 Acknowledgements • This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. • This work was performed in part at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. • This work was performed in part at Sandia National Laboratories.
    [Show full text]
  • UNIVERSITY of CALIFORNIA SANTA CRUZ UNDERSTANDING and SIMULATING SOFTWARE EVOLUTION a Dissertation Submitted in Partial Satisfac
    UNIVERSITY OF CALIFORNIA SANTA CRUZ UNDERSTANDING AND SIMULATING SOFTWARE EVOLUTION A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in COMPUTER SCIENCE by Zhongpeng Lin December 2015 The Dissertation of Zhongpeng Lin is approved: Prof. E. James Whitehead, Jr., Chair Asst. Prof. Seshadhri Comandur Prof. Timothy J. Menzies Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright c by Zhongpeng Lin 2015 Table of Contents List of Figures v List of Tables vii Abstract ix Dedication xi Acknowledgments xii 1 Introduction 1 1.1 Emergent Phenomena in Software . 1 1.2 Simulation of Software Evolution . 3 1.3 Research Outline . 4 2 Power Law and Complex Networks 6 2.1 Power Law . 6 2.2 Complex Networks . 9 2.3 Empirical Studies of Software Evolution . 12 2.4 Summary . 17 3 Data Set and AST Differences 19 3.1 Data Set . 19 3.2 ChangeDistiller . 21 3.3 Data Collection Work Flow . 23 4 Change Size in Four Open Source Software Projects 24 4.1 Methodology . 25 4.2 Commit Size . 27 4.3 Monthly Change Size . 32 4.4 Summary . 36 iii 5 Generative Models for Power Law and Complex Networks 38 5.1 Generative Models for Power Law . 38 5.1.1 Preferential Attachment . 41 5.1.2 Self-organized Criticality . 42 5.2 Generative Models for Complex Networks . 50 6 Simulating SOC and Preferential Attachment in Software Evolution 53 6.1 Preferential Attachment . 54 6.2 Self-organized Criticality . 56 6.3 Simulation Model . 57 6.4 Experiment Setup .
    [Show full text]
  • ** OPEN SOURCE LIBRARIES USED in Tv.Verizon.Com/Watch
    ** OPEN SOURCE LIBRARIES USED IN tv.verizon.com/watch ------------------------------------------------------------ 02/27/2019 tv.verizon.com/watch uses Node.js 6.4 on the server side and React.js on the client- side. Both are Javascript frameworks. Below are the licenses and a list of the JS libraries being used. ** NODE.JS 6.4 ------------------------------------------------------------ https://github.com/nodejs/node/blob/master/LICENSE Node.js is licensed for use as follows: """ Copyright Node.js contributors. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. """ This license applies to parts of Node.js originating from the https://github.com/joyent/node repository: """ Copyright Joyent, Inc. and other Node contributors.
    [Show full text]
  • Darwin Information Typing Architecture (DITA) Version 1.3 Draft 29 May 2015
    Darwin Information Typing Architecture (DITA) Version 1.3 Draft 29 May 2015 Specification URIs This version: http://docs.oasis-open.org/dita/dita/v1.3/csd01/dita-v1.3-csd01.html (Authoritative version) http://docs.oasis-open.org/dita/dita/v1.3/csd01/dita-v1.3-csd01.pdf http://docs.oasis-open.org/dita/dita/v1.3/csd01/dita-v1.3-csd01-chm.zip http://docs.oasis-open.org/dita/dita/v1.3/csd01/dita-v1.3-csd01-xhtml.zip Previous version: Not applicable Latest version: http://docs.oasis-open.org/dita/dita/v1.3/dita-v1.3-csd01.html (Authoritative version) http://docs.oasis-open.org/dita/dita/v1.3/dita-v1.3-csd01.pdf http://docs.oasis-open.org/dita/dita/v1.3/dita-v1.3-csd01-chm.zip http://docs.oasis-open.org/dita/dita/v1.3/dita-v1.3-csd01-xhtml.zip Technical committee: OASIS Darwin Information Typing Architecture (DITA) TC Chair: Kristen James Eberlein ([email protected]), Eberlein Consulting LLC Editors: Robert D. Anderson ([email protected]), IBM Kristen James Eberlein ([email protected]), Eberlein Consulting LLC Additional artifacts: This prose specification is one component of a work product that also includes: OASIS DITA Version 1.3 RELAX NG: http://docs.oasis-open.org/dita/dita/v1.3/csd01/schemas/ DITA1.3-rng.zip OASIS DITA Version 1.3 DTDs: http://docs.oasis-open.org/dita/v1.3/os/DITA1.3-dtds.zip OASIS DITA Version 1.3 XML Schemas: http://docs.oasis-open.org/dita/v1.3/os/DITA1.3-xsds.zip DITA source that was used to generate this document: http://docs.oasis-open.org/dita/dita/v1.3/ csd01/source/DITA1.3-source.zip Abstract: The Darwin Information Typing Architecture (DITA) 1.3 specification defines both a) a set of document types for authoring and organizing topic-oriented information; and b) a set of mechanisms for combining, extending, and constraining document types.
    [Show full text]
  • Open Source Software Notice
    Open Source Software Notice This document describes open source software contained in LG Smart TV SDK. Introduction This chapter describes open source software contained in LG Smart TV SDK. Terms and Conditions of the Applicable Open Source Licenses Please be informed that the open source software is subject to the terms and conditions of the applicable open source licenses, which are described in this chapter. | 1 Contents Introduction............................................................................................................................................................................................. 4 Open Source Software Contained in LG Smart TV SDK ........................................................... 4 Revision History ........................................................................................................................ 5 Terms and Conditions of the Applicable Open Source Licenses..................................................................................... 6 GNU Lesser General Public License ......................................................................................... 6 GNU Lesser General Public License ....................................................................................... 11 Mozilla Public License 1.1 (MPL 1.1) ....................................................................................... 13 Common Public License Version v 1.0 .................................................................................... 18 Eclipse Public License Version
    [Show full text]
  • Accesso Alle Macchine Virtuali in Lab Vela
    Accesso alle Macchine Virtuali in Lab In tutti i Lab del camous esiste la possibilita' di usare: 1. Una macchina virtuale Linux Light Ubuntu 20.04.03, che sfrutta il disco locale del PC ed espone un solo utente: studente con password studente. Percio' tutti gli studenti che accedono ad un certo PC ed usano quella macchina virtuale hanno la stessa home directory e scrivono sugli stessi file che rimangono solo su quel PC. L'utente PUO' usare i diritti di amministratore di sistema mediante il comando sudo. 2. Una macchina virtuale Linux Light Ubuntu 20.04.03 personalizzata per ciascuno studente e la cui immagine e' salvata su un server di storage remoto. Quando un utente autenticato ([email protected]) fa partire questa macchina Virtuale LUbuntu, viene caricata dallo storage centrale un immagine del disco esclusivamente per quell'utente specifico. I file modificati dall'utente vengono salvati nella sua immagine sullo storage centrale. L'immagine per quell'utente viene utilizzata anche se l'utente usa un PC diverso. L'utente nella VM è studente con password studente ed HA i diritti di amministratore di sistema mediante il comando sudo. Entrambe le macchine virtuali usano, per ora, l'hypervisor vmware. • All'inizio useremo la macchina virtuale LUbuntu che salva i file sul disco locale, per poterla usare qualora accadesse un fault delle macchine virtuali personalizzate. • Dalla prossima lezione useremo la macchina virtuale LUbuntu che salva le immagini personalizzate in un server remoto. Avviare VM LUBUNTU in Locale (1) Se la macchina fisica è spenta occorre accenderla. Fatto il boot di windows occorre loggarsi sulla macchina fisica Windows usando la propria account istituzionale [email protected] Nel desktop Windows, aprire il File esplorer ed andare nella cartella C:\VM\LUbuntu Nella directory vedete un file LUbuntu.vmx Probabilmente l'estensione vmx non è visibile e ci sono molti file con lo stesso nome LUbuntu.
    [Show full text]
  • An Automated Approach to Grammar Recovery for a Dialect of the C++ Language
    An Automated Approach to Grammar Recovery for a Dialect of the C++ Language Edward B. Duffy and Brian A. Malloy School of Computing Clemson University Clemson, SC 29634, USA feduffy,[email protected] Abstract plications developed in a new or existing language [12, 15]. Frequently, the only representation of a dialect is the gram- In this paper we present the design and implementa- mar contained in the source code of a compiler; however, a tion of a fully automated technique for reverse engineering compiler grammar is difficult to comprehend since parser or recovering a grammar from existing language artifacts. generation algorithms place restrictions on the form of the The technique that we describe uses only test cases and a compiler grammar [16, page 10]. parse tree, and we apply the technique to a dialect of the Another problem with language dialects is that there C++ language. However, given test cases and a parse tree has been little research, to date, addressing the problem for a language or a dialect of a language, our technique of reverse engineering a grammar or language specifica- can be used to recover a grammar for the language, in- tion for a language dialect from existing language arti- cluding languages such as Java, C, Python or Ruby. facts. Lammel¨ and Verhoef have developed a technique that uses a language reference manual and test cases to recover a grammar for a language or dialect [16]. How- 1. Introduction ever, their technique requires user intervention along most of the stages of recovery and some of the recovery process The role of programming languages in software devel- is manual.
    [Show full text]
  • Are Projects Hosted on Sourceforge.Net Representative of the Population of Free/Open Source Software Projects?
    Are Projects Hosted on Sourceforge.net Representative of the Population of Free/Open Source Software Projects? Bob English February 5, 2008 Introduction We are researchers studying Free/Open Source Software (FOSS) projects . Because some of our work uses data gathered from Sourceforge.net (SF), a web site that hosts FOSS projects, we consider it important to understand how representative SF is of the entire population of FOSS projects. While SF is probably hosts the greatest number of FOSS projects, there are many other similar hosting sites. In addition, many FOSS projects maintain their own web sites and other project infrastructure on their own servers. Based on performing an Internet search and literature review (see Appendix for search procedures), it appears that no one knows how many Free/Open Source Software Projects exist in the world or how many people are working on them. Because the population of FOSS projects and developers is unknown, it is not surprising that we were not able to find any empirical analysis to assess how representative SF is of this unknown population. Although an extensive empirical analysis has not been performed, many researchers have considered the question or related questions. The next section of this paper reviews some of the FOSS literature that discusses the representativeness of SF. The third section of the paper describes some ideas for empirically exploring the representativeness of SF. The paper closes with some arguments for believing SF is the best choice for those interested in studying the population of FOSS projects. Literature Review In one of the earliest references to the representativeness of SF, Madley, Freeh and Tynan (2002) mention that they assumed that SF was representative because of its popularity and because of the number of projects hosted there, although they note that this assumption “needs to be confirmed” (p.
    [Show full text]
  • Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data?
    Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 6300–6309 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data? Kelly Dekker, Rob van der Goot University of Groningen, IT University of Copenhagen [email protected], [email protected] Abstract Social media is a valuable data resource for various natural language processing (NLP) tasks. However, standard NLP tools were often designed with standard texts in mind, and their performance decreases heavily when applied to social media data. One solution to this problem is to adapt the input text to a more standard form, a task also referred to as normalization. Automatic approaches to normalization have shown that they can be used to improve performance on a variety of NLP tasks. However, all of these systems are supervised, thereby being heavily dependent on the availability of training data for the correct language and domain. In this work, we attempt to overcome this dependence by automatically generating training data for lexical normalization. Starting with raw tweets, we attempt two directions, to insert non-standardness (noise) and to automatically normalize in an unsupervised setting. Our best results are achieved by automatically inserting noise. We evaluate our approaches by using an existing lexical normalization system; our best scores are achieved by custom error generation system, which makes use of some manually created datasets. With this system, we score 94.29 accuracy on the test data, compared to 95.22 when it is trained on human-annotated data.
    [Show full text]