R in a Nutshell
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
R-Based Strategies for DH in English Linguistics: a Case Study
R-based strategies for DH in English Linguistics: a case study Nicolas Ballier Paula Lissón Université Paris Diderot Université Paris Diderot UFR Études Anglophones UFR Études Anglophones CLILLAC-ARP (EA 3967) CLILLAC-ARP (EA 3967) nicolas.ballier@univ- [email protected] paris-diderot.fr paris-diderot.fr language for research in linguistics, both in a Abstract quantitative and qualitative approach. Developing a culture based on the program- This paper is a position statement advocating ming language R (R Core Team 2016) for NLP the implementation of the programming lan- among MA and PhD students doing English Lin- guage R in a curriculum of English Linguis- guistics is no easy task. Broadly speaking, most tics. This is an illustration of a possible strat- of the students have little background in mathe- egy for the requirements of Natural Language matics, statistics, or programming, and usually Processing (NLP) for Digital Humanities (DH) studies in an established curriculum. R feel reluctant to study any of these disciplines. plays the role of a Trojan Horse for NLP and While most PhD students in English linguistics statistics, while promoting the acquisition of are former students with a Baccalauréat in Sci- a programming language. We report an over- ences, some MA students pride themselves on view of existing practices implemented in an having radically opted out of Maths. However, MA and PhD programme at the University of we believe that students should be made aware of Paris Diderot in the recent years. We empha- the growing use of statistical and NLP methods size completed aspects of the curriculum and in linguistics and to be able to interpret and im- detail existing teaching strategies rather than plement these techniques. -
Fortran Programming Guide
Sun Studio 12: Fortran Programming Guide Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 819–5262 Copyright 2007 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other countries. U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and SunTM Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. -
R for Statistics: First Steps
R for statistics: first steps CAR meeting, Raleigh, Feb 26 2011 Peter Aldhous, San Francisco Bureau Chief [email protected] A note of caution before using any stats package: Beware running with scissors! You can bamboozle yourself, and your readers, by misusing statistics. Make sure you understand the methods you are using, in particular the assumptions that must be met for them to be valid (e.g. normal distribution for many common tests). So before rushing in, consult: • IRE tipsheets (e.g. Donald & LaFleur, #2752; Donald & Hacker, #2731) • Statistical textbooks (e.g. http://www.statsoft.com/textbook/ is free online) • Experts who can provide a reality check on your analysis! Getting started: Download R, instructions at: http://www.r-project.org/ Start the program: Prepare the data: 1) Save as a csv file 2) Point R at the data First steps in the R command line: Load and examine the data: View a basic summary: What is the mean age and salary for CEOs in each sector? An alternative, computing mean and standard deviation in one go: Is there a correlation between age and salary? Draw a graph to explore the correlation analysis: Does mean CEO age differ significantly across sectors? Does mean CEO salary differ significantly across sectors? Draw a graph to explore the distribution of salary by sector: Moving beyond the basic functions: R packages The R community has written 2800+ extensions to R’s basic statistical and graphical functions, available from the CRAN repository: http://cran.r-project.org/web/packages/ Follow the links to find a PDF Reference Manual for each package. -
Creating Telephony Applications for Both Windows® and Linux
Application Note Dialogic Research, Inc. Dual OS Applications Creating Telephony Applications for Both Windows® and Linux: Principles and Practice Application Note Creating Telephony Applications for Both Windows® and Linux: Principles and Practice Executive Summary To help architects and programmers who only have experience in a Windows® environment move their telephony applications to Linux, this application note provides information that aims to make the transition easier. At the same time, it takes into account that the original code for Windows may not be abandoned. The ultimate goal is to demonstrate how to create flexible OS-agnostic telephony applications that are easy to build, deploy, and maintain in either environment. Creating Telephony Applications for Both Windows® and Linux: Principles and Practice Application Note Table of Contents Introduction .......................................................................................................... 2 Moving to a Dual Operating System Environment ................................................. 2 CMAKE ......................................................................................................... 2 Boost Jam .................................................................................................... 2 Eclipse .......................................................................................................... 3 Visual SlickEdit ............................................................................................. 3 Using Open Source Portable -
Eviews 9 Command and Programming Reference Eviews 9 Command and Programming Reference Copyright © 1994–2015 IHS Global Inc
EViews 9 Command and Programming Reference EViews 9 Command and Programming Reference Copyright © 1994–2015 IHS Global Inc. All Rights Reserved ISBN: 978-1-880411-29-2 This software product, including program code and manual, is copyrighted, and all rights are reserved by IHS Global Inc. The distribution and sale of this product are intended for the use of the original purchaser only. Except as permitted under the United States Copyright Act of 1976, no part of this product may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of IHS Global Inc. Disclaimer The authors and IHS Global Inc. assume no responsibility for any errors that may appear in this manual or the EViews program. The user assumes all responsibility for the selection of the pro- gram to achieve intended results, and for the installation, use, and results obtained from the pro- gram. Trademarks EViews® is a registered trademark of IHS Global Inc. Windows, Excel, PowerPoint, and Access are registered trademarks of Microsoft Corporation. PostScript is a trademark of Adobe Corpora- tion. X11.2 and X12-ARIMA Version 0.2.7, and X-13ARIMA-SEATS are seasonal adjustment pro- grams developed by the U. S. Census Bureau. Tramo/Seats is copyright by Agustin Maravall and Victor Gomez. Info-ZIP is provided by the persons listed in the infozip_license.txt file. Please refer to this file in the EViews directory for more information on Info-ZIP. Zlib was written by Jean-loup Gailly and Mark Adler. -
An Introduction to R Notes on R: a Programming Environment for Data Analysis and Graphics Version 4.1.1 (2021-08-10)
An Introduction to R Notes on R: A Programming Environment for Data Analysis and Graphics Version 4.1.1 (2021-08-10) W. N. Venables, D. M. Smith and the R Core Team This manual is for R, version 4.1.1 (2021-08-10). Copyright c 1990 W. N. Venables Copyright c 1992 W. N. Venables & D. M. Smith Copyright c 1997 R. Gentleman & R. Ihaka Copyright c 1997, 1998 M. Maechler Copyright c 1999{2021 R Core Team Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into an- other language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team. i Table of Contents Preface :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 1 1 Introduction and preliminaries :::::::::::::::::::::::::::::::: 2 1.1 The R environment :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 2 1.2 Related software and documentation ::::::::::::::::::::::::::::::::::::::::::::::: 2 1.3 R and statistics :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 2 1.4 R and the window system :::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -
Geoptimization API
Geoptimization API GEOCONCEPT SAS Copyright © 2020 Geoconcept This manual is Geoconcept property 1 Geoptimization API Introduction ...................................................................................................................................... 3 What new features are available? ............................................................................................. 4 Geoptimization JavaScript API .......................................................................................................... 6 Load the JavaScript API ........................................................................................................... 6 Specify your credentials ............................................................................................................ 6 Map ......................................................................................................................................... 6 Map Control ........................................................................................................................... 10 Map Overlay ........................................................................................................................... 12 Map Mode .............................................................................................................................. 21 Layer ...................................................................................................................................... 24 Geolocation ........................................................................................................................... -
Practical Perl Tools: Date: January 9, 2008 2:10:14 PM EST Subject: Re: [SAGE] Crontabs Vs /Etc/ Back in Timeline Cron.[Daily,Hourly,*] Vs /Etc/Cron.D
yo u k n o W, I Wa s j u s t m I n d I n g my own business, reading my email and stuff, d Av I d n . B l A n K- e d e l MA n when the following message from the SAGE mailing list came on my screen (slightly ex- cerpted but reprinted with permission): From: [email protected] practical Perl tools: Date: January 9, 2008 2:10:14 PM EST Subject: Re: [SAGE] crontabs vs /etc/ back in timeline cron.[daily,hourly,*] vs /etc/cron.d/ David N. Blank-Edelman is the Director of Technology On a more specific aspect of this (without at the Northeastern University College of Computer regard to best practice), does anyone know and Information Science and the author of the O’Reilly book Perl for System Administration. He has spent the of a tool that converts crontabs into Gantt past 22+ years as a system/network administrator in charts? I’ve always wanted to visualize how large multiplatform environments, including Brandeis University, Cambridge Technology Group, and the MIT the crontab jobs (on a set of machines) line Media Laboratory. He was the program chair of the up in time. Each entry would need to be LISA ’05 conference and one of the LISA ’06 Invited Talks co-chairs. supplemented with an estimate of the dura- [email protected]@ccs.neu.edu tion of the job (3 minutes vs 3 hours). JM I just love sysadmin-related visualization ideas. This also seemed like a fun project with some good discrete parts well-suited to a column. -
The Use of Statistical Software to Teach Nonparametric Curve Estimation: from Excel to R
ICOTS8 (2010) Invited Paper Refereed Cao & Naya THE USE OF STATISTICAL SOFTWARE TO TEACH NONPARAMETRIC CURVE ESTIMATION: FROM EXCEL TO R Ricardo Cao and Salvador Naya Research Group MODES, Department of Mathematics, University of A Coruña, Spain [email protected] The advantages of using R and Excel for teaching nonparametric curve estimation are presented in this paper. The use of these two tools for teaching nonparametric curve estimation is illustrated by means of several well-known data sets. Computation of histogram and kernel density estimators as well as kernel and local polynomial regression estimators is presented using Excel and R. Interactive changes in the sample and the smoothing parameter are illustrated using both tools. R incorporates sophisticated routines for crucial issues in nonparametric curve estimation, as smoothing parameter selection. The paper concludes summarizing the relative merits of these two tools for teaching nonparametric curve estimation and presenting RExcel, a free add-in for Excel that can be downloaded from the R distribution network. INTRODUCTION There has been an enormous expansion, over the past few years, on the use of computer and communication technologies for teaching statistics at different levels. Microsoft Excel is the most popular spreadsheet program that is used to store information in columns and rows, which can then be organized and/or processed. Many authors consider Microsoft Excel as an excellent tool for statistical education (see, for example, Giles, 2002). On the other hand, the use of free software is also one of the most interesting available tools for teaching statistics. Universal access to the Internet enables easy installation of free software. -
Appendix C Some Details of Matrix.Xla(M)
Appendix C Some details of Matrix.xla(m) C.1 Matrix nomenclature For the sake of notational compactness, we will denote a square diagonal matrix by D with elements dii, a square tridiagonal matrix by T with elements tij where | j – i | ≤ 1, most other square matrices by S, rectangular matrices by R, and all matrix elements by mij. A vector will be shown as v, with elements vi, and a scalar as s. Particular values are denoted by x when real, and by z when complex. All optional pa- rameters are shown in straight brackets, [ ]. All matrices, vectors, and scalars are assumed to be real, ex- cept when specified otherwise. All matrices are restricted to two dimensions, and vectors to one dimen- sion. Table C.1 briefly explains some matrix terms that will be used in subsequent tables. With some functions, the user is given the integer option Int of applying integer arithmetic. When a matrix only contains integer elements, selecting integer arithmetic may avoid most round-off problems. On the other hand, the range of integer arithmetic is limited, so that overflow errors may result if the ma- trix is large and/or contains large numbers. Another common option it Tiny, which defines the absolute value of quantities that can be regarded as most likely resulting from round-off errors, and are therefore set to zero. When not activated, the routine will use its user-definable default value. Condition of a matrix: ratio of its largest to smallest singular value Diagonal of a square matrix: the set of terms mij where i = j Diagonal matrix D square matrix with mij = 0 for all off-diagonal elements i ≠ j. -
R Course for the Nsos in the Arab Countries Part 3: R Data Management
R Course for the NSOs in the Arab countries Part 3: R Data Management Valentin Todorov1 1United Nations Industrial Development Organization, Vienna 18-20 May 2015 Todorov (UNIDO) R Course for the NSOs in the Arab countriesPart 3: R Data Management18-20 May 2015 1 / 41 Outline 1 Motivation 2 Data exchange with other statistical tools 3 Reading and Writing in Excel Format 4 Reading Data in SDMX Format 5 R data base interfaces and Relational DBMSs 6 Case study: UNIDO database 7 R packages for database access 8 Exercise: The Data Expo 2006 9 Accessing international statistical databases 10 Summary and conclusions Todorov (UNIDO) R Course for the NSOs in the Arab countriesPart 3: R Data Management18-20 May 2015 2 / 41 Motivation Motivation 1. A number of statistical software tools and other programs are in use in a statistical organization 2. The data exchange between such systems (SAS, SPSS, EViews, Stata, Excel, Matlab, Octave, etc) is essential 3. Reading and writing data from/to Excel is very important due to its extreme popularity 4. Often data are stored in relational databases (MS Access, MySql, DB2, MS SQL server, Sybase, etc.) and the size do not allow to extract them into flat files before analysis 5. Using SDMX for data and metadata exchange becomes more and more important Todorov (UNIDO) R Course for the NSOs in the Arab countriesPart 3: R Data Management18-20 May 2015 3 / 41 Data exchange with other statistical tools R as a mediator Todorov (UNIDO) R Course for the NSOs in the Arab countriesPart 3: R Data Management18-20 May 2015 4 / 41 Data exchange with other statistical tools Package foreign • Package foreign reads data stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase and others and writes in the format of some of these. -
Warning: Discarded Datum Unknown:” What Does It Mean?
1 “Warning: Discarded datum Unknown:” What Does it Mean? Richard E. Plant February 21, 2021 Additional topic to accompany Spatial Data Analysis in Ecology and Agriculture using R, Second Edition http://psfaculty.plantsciences.ucdavis.edu/plant/sda2.htm Chapter and section references are contained in that text, which is referred to as SDA2. 1. Introduction The reason that I felt the need to write this Additional Topic is shown in the following code sequence, taken from SDA2 Section 1.3. The objective is to create the image shown in Fig. 1, which is Fig. 1.5 in SDA2.The code is from the file 1.3.1.r on the SDA2 website. Figure 1. Reproduction of Figure 1.5 in SDA2. 2 The first step is to read the shapefile containing the data. > library(rgdal) > library(sf) > data.Set1.sf <- st_read("set1\\landcover.shp") Reading layer `landcover' from data source `C:\Data\Set1\landcover.shp' using driver `ESRI Shapefile' Simple feature collection with 1811 features and 6 fields geometry type: POLYGON dimension: XY bbox: xmin: 576446.9 ymin: 4396700 xmax: 589578.4 ymax: 4420790 CRS: NA Both of the library() function calls generate output that we will skip over for now but come back to in a moment. After reading it, we convert the sf object data.Set1.sf to an sp object and assign a projection. Here something unexpected happens. > data.Set1.cover <- as(data.Set1.sf, "Spatial") > proj4string(data.Set1.cover) <- CRS("+proj=utm +zone=10 +ellps=WGS84") Warning message: In showSRID(uprojargs, format = "PROJ", multiline = "NO") : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition SDA2 was written using R version 3.4.3, and there was no warning message.