Data Quality Management in Large-Scale Cyber-Physical Systems

Total Page:16

File Type:pdf, Size:1020Kb

Data Quality Management in Large-Scale Cyber-Physical Systems Data Quality Management in Large-Scale Cyber-Physical Systems Ahmed Abdulhasan Alwan School of Architecture, Computing and Engineering University of East London A thesis presented for the degree of Doctor of Philosophy July 19, 2021 Abstract Cyber-Physical Systems (CPSs) are cross-domain, multi-model, advance informa- tion systems that play a significant role in many large-scale infrastructure sectors of smart cities public services such as traffic control, smart transportation control, and environmental and noise monitoring systems. Such systems, typically, involve a substantial number of sensor nodes and other devices that stream and exchange data in real-time and usually are deployed in uncontrolled, broad environments. Thus, unexpected measurements may occur due to several internal and external factors, including noise, communication errors, and hardware failures, which may compromise these systems quality of data and raise serious concerns related to safety, reliability, performance, and security. In all cases, these unexpected measurements need to be carefully interpreted and managed based on domain knowledge and computational models. Therefore, in this research, data quality challenges were investigated, and a com- prehensive, proof of concept, data quality management system was developed to tackle unaddressed data quality challenges in large-scale CPSs. The data quality management system was designed to address data quality challenges associated with detecting: sensor nodes measurement errors, sensor nodes hardware failures, and mismatches in sensor nodes spatial and temporal contextual attributes. De- tecting sensor nodes measurement errors associated with the primary data quality dimensions of accuracy, timeliness, completeness, and consistency in large-scale CPSs were investigated using predictive and anomaly analysis models via utilising statistical and machine-learning techniques. Time-series clustering techniques were investigated as a feasible mean for detecting long-segmental outliers as an indicator of sensor nodes’ continuous halting and incipient hardware failures. Fur- thermore, the quality of the spatial and temporal contextual attributes of sensor nodes observations was investigated using timestamp analysis techniques. The different components of the data quality management system were tested and calibrated using benchmark time-series collected from a high-quality, temperature sensor network deployed at the University of East London. Furthermore, the effectiveness of the proposed data quality management system was evaluated using a real-world, large-scale environmental monitoring network consisting of more than 200 temperature sensor nodes distributed around London. i The data quality management system achieved high accuracy detection rate us- ing LSTM predictive analysis technique and anomaly detection associated with DBSCAN. It successfully identified timeliness and completeness errors in sensor nodes’ measurements using periodicity analysis combined with a rule engine. It achieved up to 100% accuracy in detecting potentially failed sensor nodes using the characteristic-based time-series clustering technique when applied to two days or longer time-series window. Timestamp analysis was adopted effectively for evaluating the quality of temporal and spatial contextual attributes of sensor nodes observations, but only within CPS applications in which using gateway modules is possible. ii Acknowledgements I would like to express my heartfelt appreciation to Professor Allan J Brimicombe, Dr. Mihaela Anca Ciupala, Dr. Paolo Falcarin and Dr. Andres Baravalle for their professional guidance and support throughout this research. This research was sponsored by the University of East London (UEL) under the PhD scholarship scheme 2017 and through the UEL Research Internship pro- gramme 2019. Therefore, I am very grateful for the generous sponsorship and kind support from UEL staff, in particular Richard Bottoms and Charlotte Forbes. I would like to thank all my friends and staff in the Ministry of Planning - Iraq for their continuous support. I am deeply grateful for the collaboration with Thingful. Ltd, UK, in particular, I thank Usman Haque for his insight and expertise that assisted the research. iii To Suad, My parents and Rawia iv Contents Abstract i Acknowledgements iii Table of Contents iv List of Figures xi List of Tables xxi List of Abbreviations xxv Author’s declaration xxvii 1 Introduction 1 1.1 Cyber-Physical Systems . 1 1.2 Smart Cities as Large-Scale CPSs . 6 1.3 Research Motivation . 11 1.4 The Research Aim . 15 1.5 The Research Questions and Objectives . 16 1.6 Novelty of Research . 18 1.7 Research Boundaries . 19 1.8 Research Structure . 20 2 Literature Review 22 2.1 Data Quality Concepts and Terminology . 23 2.1.1 Accuracy . 24 v 2.1.2 Time-Related Accuracy (Timeliness) . 25 2.1.3 Completeness (Completability) . 28 2.1.4 Consistency . 30 2.2 Data Quality Challenges in Large-Scale CPSs, a Systematic Litera- ture Review . 31 2.2.1 Review Motivation / Introduction . 32 2.2.2 Review Process and Methodology . 34 2.2.3 Review Conduct and Primary Studies Selection . 40 2.2.4 RQ1: Data Quality Challenges in Large-Scale CPSs. 57 2.2.5 RQ2: Data Mining and Data Quality Management in Large- Scale CPSs. 61 2.2.6 RQ3: Unaddressed Data Quality Management Challenges in Large-Scale CPSs and The Research Gap. 67 2.3 The Research Questions . 70 2.4 Summary . 71 3 System Design (Methodology) 72 3.1 Overview of Research Paradigms . 72 3.2 Empirical Research Methods . 74 3.2.1 Quantitative . 74 3.2.2 Qualitative . 74 3.2.3 Mixed-Method (Triangulation) . 75 3.3 Empirical Research Strategy . 76 3.3.1 Experimental . 76 3.3.2 Case Study . 77 3.3.3 Choosing the Research Methodology . 77 3.4 System Design and Development Phases . 81 3.4.1 System Analysis and Design . 85 3.4.2 Data Acquisition Unit . 87 3.4.3 Data Quality Assessment Unit . 91 3.4.4 Selecting Data Analysing Methods . 96 3.4.5 Time-Series Decomposition . 96 vi 3.5 Online Mode - Predictive Analysis Models . 99 3.5.1 Simple Forecasting Methods . 101 3.5.2 Holt-Winters Seasonal . 102 3.5.3 ARMA, ARIMA and Seasonal ARIMA Models . 104 3.5.4 Gaussian Process Regression . 110 3.5.5 Long Short-Term Memory Networks . 112 3.6 Online Mode – Anomaly Analysis Models . 114 3.6.1 Distance-Based Spatial Clustering (K-means) . 121 3.6.2 Density-Based Spatial Clustering (DBSCAN) . 123 3.7 Online Mode - Timestamp Analysis (Temporal Consistency) . 127 3.8 Offline Mode - Time-series Clustering . 130 3.8.1 Dynamic Time Warping . 134 3.8.2 K-Shape . 135 3.8.3 Characteristic-Based Time-Series Clustering . 135 3.9 Offline Mode – Timestamp Analysis (Spatial Attributes Consistency)136 3.10 Summary . 142 4 Implementation and Results 143 4.1 Data Acquisition and Data Process . 144 4.1.1 Sensor Node Networks . 144 4.1.2 Datasets . 150 4.1.3 Software Framework . 156 4.2 Online-Mode Data Quality Assessment . 164 4.2.1 Predictive Analysis Models . 165 4.2.2 Anomaly Analysis Models . 207 4.2.3 Timestamp Analysis (Temporal Consistency) . 220 4.3 Offline-Mode Data Quality Assessment . 223 4.3.1 Time-Series Clustering Models . 224 4.3.2 Timestamp Analysis Model (Spatial Attributes Consistency) 235 4.4 Discussion and Summary . 243 4.4.1 The Data Acquisition Unit (Layers 1 and 2) . 243 vii 4.4.2 The Data Quality Assessment Unit . 244 5 Conclusions and Future Work 253 5.1 Revisiting the Research Questions and Objectives . 254 5.1.1 Review Question-1: . 254 5.1.2 Review Question-2: . 260 5.1.3 Review Question-3: . 260 5.2 Contribution to Knowledge . 262 5.3 Conclusions . 263 5.4 Future Work . 266 References 267 Appendices 291 A Technical and Implementation Details 292 A.1 Sensor Nodes Anatomy and Data Quality . 292 A.1.1 Hardware Components . 293 A.1.2 Operating System . 294 A.2 The Technical Details of the Local Sensor Node Network . 296 A.3 Technical Details of The Data Acquisition Unit . 301 A.3.1 JSON Parsing and Duplication Prevention Trigger . 307 A.3.2 The Periodicity Analysis Rule Engine. 308 A.4 Configuration and programming details . 310 A.4.1 Holt-Winters predictive model . 310 A.4.2 ARMA and ARIMA predictive models . 313 A.4.3 SARIMA predictive model . 320 A.4.4 GPR predictive model . 322 A.4.5 LSTM predictive model . 326 A.4.6 K-means and DBSCAN Partitioning Models . 329 A.4.7 DTW and K-Shape Models . 335 A.4.8 Characteristics (features)-based Model . 338 viii B Research Integrity and Technology RI Completion Certificate 342 ix x List of Figures 1.1 Cyber-Physical Systems structure diagram. 3 2.1 A holistic overview of the processes adopted in this systematic literature review. 35 2.2 The number of SLR primary studies by the year of publication, (October 2020). 42 2.3 The number of SLR primary studies by the country of publication. 43 2.4 The ratio of the data quality dimensions addressed by the SLR primary studies. 59 2.5 The key data quality challenges in large-scale CPSs. 60 2.6 The most popular data quality assessment/management methods or techniques in large-scale CPS applications based on the number (left) and the ratio (right) of the SLR studies. 62 2.7 The most popular data mining techniques in large-scale CPSs based on the No. of the SLR studies utilising these techniques. 63 2.8 The key data mining techniques used to assess the main data quality dimensions in large-scale CPSs. 63 2.9 A holistic diagram of the main data quality management/assess- ment methods and techniques and their associated data quality dimensions that these techniques are addressing, based on the SLR results. 64 3.1 The main processes of the experimental research approach. 77 3.2 The main processes of the case study research approach.
Recommended publications
  • PDF Download
    Table of Contents About ICSEC 2019 ................................................................................................................... I Message from President of Prince of Songkla University .................................................. III Message from Rector of University North, Croatia ............................................................ IV Message from General Chair ................................................................................................ VI Keynote Speakers ................................................................................................................. VII Organizing Committee .......................................................................................................... XI Conference Venue ............................................................................................................... XIII Program at a Glance ........................................................................................................... XVI Technical Sessions Main Track ............................................................................................................................. 1 Special Session on Advanced Digital Media ....................................................................... 33 Special Session on Future SDN: Security, Virtualization, Systems and Architectures ....... 35 Author Index .......................................................................................................................... 36 List of Reviewers
    [Show full text]
  • Cyphyhouse: a Programming, Simulation, and Deployment Toolchain for Heterogeneous Distributed Coordination
    2020 IEEE International Conference on Robotics and Automation (ICRA) 31 May - 31 August, 2020. Paris, France CyPhyHouse: A programming, simulation, and deployment toolchain for heterogeneous distributed coordination Ritwika Ghosh1, Joao P. Jansch-Porto2, Chiao Hsieh1, Amelia Gosse2, Minghao Jiang3, Hebron Taylor2, Peter Du3, Sayan Mitra3, Geir Dullerud2 Abstract— Programming languages, libraries, and develop- With the aim of simplifying application development for ment tools have transformed the application development distributed and heterogeneous systems, we introduce CyPhy- processes for mobile computing and machine learning. This House1—an open source software toolchain for program- paper introduces CyPhyHouse—a toolchain that aims to provide similar programming, debugging, and deployment benefits for ming, simulating, and deploying mobile robotic applications. distributed mobile robotic applications. Users can develop In this work, we target distributed coordination applica- hardware-agnostic, distributed applications using the high-level, tions such as collaborative mapping [8], surveillance, de- event driven Koord programming language, without requiring livery, formation-flight, etc. with aerial drones and ground expertise in controller design or distributed network protocols. vehicles. We believe that for these applications, low-level The modular, platform-independent middleware of CyPhyHouse implements these functionalities using standard algorithms for motion control for the individual robots is standard but path planning (RRT), control (MPC), mutual exclusion, etc. tedious, and coordination across distributed (and possibly A high-fidelity, scalable, multi-threaded simulator for Koord heterogeneous) robots is particularly difficult and error- applications is developed to simulate the same application code prone. This motivates the two key abstractions provided for dozens of heterogeneous agents. The same compiled code by CyPhyHouse: (a) portability of high-level coordination can also be deployed on heterogeneous mobile platforms.
    [Show full text]
  • Behavioral Types in Programming Languages
    Behavioral Types in Programming Languages Behavioral Types in Programming Languages iv Davide Ancona, DIBRIS, Università di Genova, Italy Viviana Bono, Dipartimento di Informatica, Università di Torino, Italy Mario Bravetti, Università di Bologna, Italy / INRIA, France Joana Campos, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal Giuseppe Castagna, CNRS, IRIF, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France Pierre-Malo Deniélou, Royal Holloway, University of London, UK Simon J. Gay, School of Computing Science, University of Glasgow, UK Nils Gesbert, Université Grenoble Alpes, France Elena Giachino, Università di Bologna, Italy / INRIA, France Raymond Hu, Department of Computing, Imperial College London, UK Einar Broch Johnsen, Institutt for informatikk, Universitetet i Oslo, Norway Francisco Martins, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal Viviana Mascardi, DIBRIS, Università di Genova, Italy Fabrizio Montesi, University of Southern Denmark Rumyana Neykova, Department of Computing, Imperial College London, UK Nicholas Ng, Department of Computing, Imperial College London, UK Luca Padovani, Dipartimento di Informatica, Università di Torino, Italy Vasco T. Vasconcelos, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal Nobuko Yoshida, Department of Computing, Imperial College London, UK Boston — Delft Foundations and Trends R in Programming Languages Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 United States Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is D. Ancona et al.. Behavioral Types in Programming Languages. Foundations and Trends R in Programming Languages, vol. 3, no.
    [Show full text]
  • Insight MFR By
    Manufacturers, Publishers and Suppliers by Product Category 11/6/2017 10/100 Hubs & Switches ASCEND COMMUNICATIONS CIS SECURE COMPUTING INC DIGIUM GEAR HEAD 1 TRIPPLITE ASUS Cisco Press D‐LINK SYSTEMS GEFEN 1VISION SOFTWARE ATEN TECHNOLOGY CISCO SYSTEMS DUALCOMM TECHNOLOGY, INC. GEIST 3COM ATLAS SOUND CLEAR CUBE DYCONN GEOVISION INC. 4XEM CORP. ATLONA CLEARSOUNDS DYNEX PRODUCTS GIGAFAST 8E6 TECHNOLOGIES ATTO TECHNOLOGY CNET TECHNOLOGY EATON GIGAMON SYSTEMS LLC AAXEON TECHNOLOGIES LLC. AUDIOCODES, INC. CODE GREEN NETWORKS E‐CORPORATEGIFTS.COM, INC. GLOBAL MARKETING ACCELL AUDIOVOX CODI INC EDGECORE GOLDENRAM ACCELLION AVAYA COMMAND COMMUNICATIONS EDITSHARE LLC GREAT BAY SOFTWARE INC. ACER AMERICA AVENVIEW CORP COMMUNICATION DEVICES INC. EMC GRIFFIN TECHNOLOGY ACTI CORPORATION AVOCENT COMNET ENDACE USA H3C Technology ADAPTEC AVOCENT‐EMERSON COMPELLENT ENGENIUS HALL RESEARCH ADC KENTROX AVTECH CORPORATION COMPREHENSIVE CABLE ENTERASYS NETWORKS HAVIS SHIELD ADC TELECOMMUNICATIONS AXIOM MEMORY COMPU‐CALL, INC EPIPHAN SYSTEMS HAWKING TECHNOLOGY ADDERTECHNOLOGY AXIS COMMUNICATIONS COMPUTER LAB EQUINOX SYSTEMS HERITAGE TRAVELWARE ADD‐ON COMPUTER PERIPHERALS AZIO CORPORATION COMPUTERLINKS ETHERNET DIRECT HEWLETT PACKARD ENTERPRISE ADDON STORE B & B ELECTRONICS COMTROL ETHERWAN HIKVISION DIGITAL TECHNOLOGY CO. LT ADESSO BELDEN CONNECTGEAR EVANS CONSOLES HITACHI ADTRAN BELKIN COMPONENTS CONNECTPRO EVGA.COM HITACHI DATA SYSTEMS ADVANTECH AUTOMATION CORP. BIDUL & CO CONSTANT TECHNOLOGIES INC Exablaze HOO TOO INC AEROHIVE NETWORKS BLACK BOX COOL GEAR EXACQ TECHNOLOGIES INC HP AJA VIDEO SYSTEMS BLACKMAGIC DESIGN USA CP TECHNOLOGIES EXFO INC HP INC ALCATEL BLADE NETWORK TECHNOLOGIES CPS EXTREME NETWORKS HUAWEI ALCATEL LUCENT BLONDER TONGUE LABORATORIES CREATIVE LABS EXTRON HUAWEI SYMANTEC TECHNOLOGIES ALLIED TELESIS BLUE COAT SYSTEMS CRESTRON ELECTRONICS F5 NETWORKS IBM ALLOY COMPUTER PRODUCTS LLC BOSCH SECURITY CTC UNION TECHNOLOGIES CO FELLOWES ICOMTECH INC ALTINEX, INC.
    [Show full text]
  • Products Newsletter Emergiing Trends, Cuttiing-Edge Technollogiies, and Research Breakthroughs Iin Power Ellectroniics
    Products Newsletter Emergiing Trends, Cuttiing-Edge Technollogiies, and Research Breakthroughs iin Power Ellectroniics ​ ​ ​ November 20, 2020 | Issue 7 IEEE Power Electronics Magazine To empower next-generation engineers with hands-on skills in control and power electronics, Qing- Chang Zhong, Yeqin Wang, Yiting Dong, Beibei Ren, and Mohammad Amin have developed a versatile experimental tool that lowers the barriers to go real from simulations to experiments for various power electronic systems. Their Smart Grid Research and Educational Kit, which is a reconfigurable, open-source, multifunctional power electronic converter with the capability of directly downloading codes from MATLAB/Simulink. Besides minimizing the time, cost, and efforts needed to develop hardware systems, it removes the burden of coding. Read more in the September 2020 issue of IEEE Power Electronics Magazine! IEEE Transactions on Power Electronics (TPEL) The December 2020 Issue presents 86 papers with the latest research in power electronics! December Highlighted Papers: On Beat Frequency Oscillation of Two-Stage Wireless Power Receivers Kerui Li, Siew-Chong Tan, and Ron Shu Yuen Hui Novel iGSE-C Loss Modeling of X7R Ceramic Capacitors David Menzi, Dominik Bortis, Grayson Zulauf, Morris Heller, and Johann W. Kolar December Papers with Active/Multimedia Content Analysis and Design of the LLC LED Driver Based on State-Space Representation Direct Time- Domain Solution by Maikel Menke, João P. Duranti, Leandro Roggia, Fábio E. Bisogno, Rodrigo V. Tambara, Álysson R. Seidel provides a PowerPoint Presentation elucidating each step development of the proposed design procedure during the design of the LLC LED driver. In addition, the authors offer three Wolfram Mathematica notebook script files.
    [Show full text]
  • Homomorphic Encryption: Working and Analytical Assessment DGHV, Helib, Paillier, FHEW and HE in Cloud Security
    Master of Science in Computer Science February 2017 Homomorphic Encryption: Working and Analytical Assessment DGHV, HElib, Paillier, FHEW and HE in cloud security Srinivas Divya Papisetty Faculty of Computing Blekinge Institute of Technology SE-371 79 Karlskrona Sweden This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies. Contact Information: Author(s): Srinivas Divya Papisetty E-mail: [email protected] University advisor: Dr. Emiliano Casalicchio Dept. of Computer Science & Engineering Faculty of Computing Internet : www.bth.se Blekinge Institute of Technology Phone : +46 455 38 50 00 SE-371 79 Karlskrona, Sweden Fax : +46 455 38 50 57 i i ABSTRACT Context. Secrecy has kept researchers spanning over centuries engaged in the creation of data protection techniques. With the growing rate of data breach and intervention of adversaries in confidential data storage and communication, efficient data protection has found to be a challenge. Homomorphic encryption is one such data protection technique in the cryptographic domain which can perform arbitrary computations on the enciphered data without disclosing the original plaintext or message. The first working fully homomorphic encryption scheme was proposed in the year 2009 and since then there has been a tremendous increase in the development of homomorphic encryption schemes such that they can be applied to a wide range of data services that demand security. All homomorphic encryption schemes can be categorized as partially homomorphic (PHE), somewhat homomorphic (SHE), leveled homomorphic (LHE), and fully homomorphic encryption (FHE).
    [Show full text]
  • How Machine Learning Has Been Applied in Software Engineering?
    How Machine Learning Has Been Applied in Software Engineering? Olimar Teixeira Borges a, Julia Colleoni Couto b, Duncan Dubugras A. Ruiz c and Rafael Prikladnicki1 d School of Technology, PUCRS, Porto Alegre, Brazil Keywords: Software Engineering, Machine Learning, Mapping Study. Abstract: Machine Learning (ML) environments are composed of a set of techniques and tools, which can help in solving problems in a diversity of areas, including Software Engineering (SE). However, due to a large number of possible configurations, it is a challenge to select the ML environment to be used for a specific SE domain issue. Helping software engineers choose the most suitable ML environment according to their needs would be very helpful. For instance, it is possible to automate software tests using ML models, where the model learns software behavior and predicts possible problems in the code. In this paper, we present a mapping study that categorizes the ML techniques and tools reported as useful to solve SE domain issues. We found that the most used algorithm is Na¨ıve Bayes and that WEKA is the tool most SE researchers use to perform ML experiments related to SE. We also identified that most papers use ML to solve problems related to SE quality. We propose a categorization of the ML techniques and tools that are applied in SE problem solving, linking with the Software Engineering Body of Knowledge (SWEBOK) knowledge areas. 1 INTRODUCTION automated to reduce human effort and project cost. However, to automate these tasks using ML, we need Machine Learning (ML) is an Artificial Intelligence to have the SE project’s data to explore.
    [Show full text]
  • Failure of the Point Blinding Countermeasure Against Fault Attack in Pairing-Based Cryptography Nadia El Mrabet, Emmanuel Fouotsa
    Failure of the Point Blinding Countermeasure Against Fault Attack in Pairing-Based Cryptography Nadia El Mrabet, Emmanuel Fouotsa To cite this version: Nadia El Mrabet, Emmanuel Fouotsa. Failure of the Point Blinding Countermeasure Against Fault Attack in Pairing-Based Cryptography. 2015, 10.1007/978-3-319-18681-8_21. hal-01197148 HAL Id: hal-01197148 https://hal.archives-ouvertes.fr/hal-01197148 Submitted on 11 Sep 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Failure of the Point Blinding Countermeasure against Fault Attack in Pairing-Based Cryptography ? Nadia EL MRABET1 and Emmanuel FOUOTSA2 (1) LIASD − Universit´eParis 8 -France - [email protected] (1) SAS - CMP Gardanne - France - [email protected] (2) Dep of Mathematics, Higher Teacher's Training College, University of Bamenda - Cameroun (2) LMNO − Universit´ede Caen - France - [email protected] Abstract. Pairings are mathematical tools that have been proven to be very useful in the con- struction of many cryptographic protocols. Some of these protocols are suitable for implementation on power constrained devices such as smart cards or smartphone which are subject to side channel attacks. In this paper, we analyse the efficiency of the point blinding countermeasure in pairing based cryptography against side channel attacks.
    [Show full text]
  • Comparison of Three Common Statistical Programs Available to Washington State County Assessors: SAS, SPSS and NCSS
    Washington State Department of Revenue Comparison of Three Common Statistical Programs Available to Washington State County Assessors: SAS, SPSS and NCSS February 2008 Abstract: This summary compares three common statistical software packages available to county assessors in Washington State. This includes SAS, SPSS and NCSS. The majority of the summary is formatted in tables which allow the reader to more easily make comparisons. Information was collected from various sources and in some cases includes opinions on software performance, features and their strengths and weaknesses. This summary was written for Department of Revenue employees and county assessors to compare statistical software packages used by county assessors and as a result should not be used as a general comparison of the software packages. Information not included in this summary includes the support infrastructure, some detailed features (that are not described in this summary) and types of users. Contents General software information.............................................................................................. 3 Statistics and Procedures Components................................................................................ 8 Cost Estimates ................................................................................................................... 10 Other Statistics Software ................................................................................................... 13 General software information Information in this section was
    [Show full text]
  • STATA January 1993 TECHNICAL STB-11 BULLETIN a Publication to Promote Communication Among Stata Users
    STATA January 1993 TECHNICAL STB-11 BULLETIN A publication to promote communication among Stata users Editor Associate Editors Joseph Hilbe J. Theodore Anagnoson, Cal. State Univ., LA Department of Sociology Richard DeLeon, San Francisco State Univ. Arizona State University Paul Geiger, USC School of Medicine Tempe, Arizona 85287 Lawrence C. Hamilton, Univ. of New Hampshire 602-860-1446 FAX Stewart West, Baylor College of Medicine [email protected] EMAIL Subscriptions are available from Stata Corporation, email [email protected], telephone 979-696-4600 or 800-STATAPC, fax 979-696-4601. Current subscription prices are posted at www.stata.com/bookstore/stb.html. Previous Issues are available individually from StataCorp. See www.stata.com/bookstore/stbj.html for details. Submissions to the STB, including submissions to the supporting files (programs, datasets, and help files), are on a nonexclusive, free-use basis. In particular, the author grants to StataCorp the nonexclusive right to copyright and distribute the material in accordance with the Copyright Statement below. The author also grants to StataCorp the right to freely use the ideas, including communication of the ideas to other parties, even if the material is never published in the STB. Submissions should be addressed to the Editor. Submission guidelines can be obtained from either the editor or StataCorp. Copyright Statement. The Stata Technical Bulletin (STB) and the contents of the supporting files (programs, datasets, and help files) are copyright c by StataCorp. The contents of the supporting files (programs, datasets, and help files), may be copied or reproduced by any means whatsoever, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the STB.
    [Show full text]
  • Reproducible Research in Computational Harmonic Analysis
    R EPRODUCIBLE RESEARCH Reproducible Research in Computational Harmonic Analysis Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis. Reproducible computational research, in which all details of computations—code and data—are made conveniently available to others, is a necessary response to this crisis. The authors review their approach to reproducible research and describe how it has evolved over time, discussing the arguments for and against working reproducibly. assive computation is transform- academic scientist likely has more in common ing science, as researchers from with a large corporation’s information technol- numerous fields launch ambitious ogy manager than with a philosophy or English projects involving large-scale com- professor at the same university. Mputations. Emblems of our age include A rapid transition is now under way—visible particularly over the past two decades—that will • data mining for subtle patterns in vast data- finish with computation as absolutely central to bases; and scientific enterprise. However, the transition is • massive simulations of a physical system’s com- very far from completion. In fact, we believe that plete evolution repeated numerous times, as the dominant mode of scientific computing has al- simulation parameters vary systematically. ready brought us to a state of crisis. The prevalence of very relaxed attitudes about communicating ex- The traditional image of the scientist as a soli- perimental details and validating results is causing tary person working in a laboratory with beakers a large and growing credibility gap. It’s impossible and test tubes is long obsolete.
    [Show full text]
  • On Cross Joining De Bruijn Sequences
    On cross joining de Bruijn sequences Johannes Mykkeltveit and Janusz Szmidt Abstract. We explain the origins of Boolean feedback functions of nonlinear feedback shift registers (NLFSRs) of fixed order n generating de Bruijn binary sequences. They all come into existence by cross joining operations starting from one maximum period feedback shift register, e.g., a linear one which always exists for any order n. The result obtained yields some constructions of NLFSRs generating maximum period 2n − 1 binary sequences. 1. Introduction The task of this note is to get insight into construction of Boolean feedback functions of NLFSRs generating maximum period binary sequences. At the Inter- national Workshop on Coding and Cryptography 2013 in Bergen we discussed the problem whether it was true or not that an arbitrary de Bruijn sequence could be obtained by applying cross-join pair operations to a given one. It seems that this has been a several decades old unsolved problem. In this note we solve this prob- lem in the affirmative. In terms of Boolean feedback functions this result indicates which algebraic operations must be applied to get all feedback functions generating de Bruijn sequences starting from a given one. The same results are true when one deals with modified de Bruijn sequences of period 2n − 1 and the corresponding feedback functions. Feedback shift registers (FSRs) are useful in generating periodic sequences, and that is the task for which they are mostly used in communication and cryptographic systems. Linear feedback shift registers (LFSRs) and NLFSRs are the main buiding blocks of many stream ciphers.
    [Show full text]