A Comparison of Standard Spell Checking Algorithms and a Novel Binary Neural Approach


This is a repository copy of A comparison of standard spell checking algorithms and a novel binary neural approach.

White Rose Research Online URL for this paper: https://eprints.whiterose.ac.uk/689/

Article: Hodge, V.J. orcid.org/0000-0002-2469-0224 and Austin, J. orcid.org/0000-0001-5762-8614 (2003) A comparison of standard spell checking algorithms and a novel binary neural approach. IEEE Transactions on Knowledge and Data Engineering. pp. 1073-1081. https://doi.org/10.1109/TKDE.2003.1232265

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 5, SEPTEMBER/OCTOBER 2003

Victoria J. Hodge and Jim Austin

Abstract—In this paper, we propose a simple, flexible, and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning, and associative matching in the AURA neural system. We integrate Hamming Distance and n-gram algorithms that have high recall for typing errors and a phonetic spell-checking algorithm in a single novel architecture.
Our approach is suitable for any spell checking application though aimed toward isolated word error correction, particularly spell checking user queries in a search engine. We use a novel scoring scheme to integrate the retrieved words from each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. In this paper, we evaluate our approach against several benchmark spell checking algorithms for recall accuracy. Our proposed hybrid methodology has the highest recall rate of the techniques evaluated. The method has a high recall rate and low computational cost.

Index Terms—Binary neural spell checker, integrated modular spell checker, associative matching.

The authors are with the Department of Computer Science, University of York, UK YO10 5DD. E-Mail: {austin, vicky}@cs.york.ac.uk. Manuscript received 30 Oct. 2000; revised 25 May 2001; accepted 27 July 2001. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 113081. 1041-4347/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

Errors, particularly spelling and typing errors, are abundant in human generated electronic text. For example, Internet search engines are criticized for their inability to spell check the user’s query which would prevent many futile searches where the user has incorrectly spelled one or more query terms. An approximate word-matching algorithm is required to identify errors in queries where little or no contextual information is available and, using some measure of similarity, recommend words that are most similar to each misspelled word. This error checking would prevent wasted computational processing, prevent wasted user time, and make any system more robust as spelling and typing errors can prevent the system identifying the required information.

We describe an interactive spell checker that performs a presence check on words against a stored lexicon, identifies spelling errors, and recommends alternative spellings [9]. The basis of the system is the AURA modular neural network [3] described later. The spell checker uses a hybrid approach to overcome phonetic spelling errors and the four main forms of typing errors: insertion, deletion, substitution, and transposition (double substitution). We use a Soundex-type coding approach (see Kukich [10]) coupled with transformation rules to overcome phonetic spelling errors. “Phonetic spelling errors are the most difficult to detect and correct” [10] as they distort the spelling more than the other error types so a phonetic component is essential. We use an n-gram approach [13] to overcome the first two forms of typing errors and integrate a Hamming Distance approach to overcome substitution and transposition errors. N-gram approaches match small character subsets of the query term. They incorporate statistical correlations between adjacent letters and are able to accommodate letter omissions or insertions. Hamming Distance matches words by left aligning them and matching letter for letter. Hamming Distance does not work well for insertion and deletion where the error prevents the letters aligning with the correct spelling, but works well for transposition and substitutions where most characters are still aligned. We have developed a novel scoring system for our spell checker. We separate the Hamming Distance and n-gram scores so the hybrid system can utilize the best match from either and overcome all four typing-error types. We add the Soundex score to the two separate scores to produce two word scores. The overall word score is the maximum of these two values.

Our approximate matching approach is simple and flexible. We assume the query words and lexicon words comprise sequences of characters from a finite set of 30 characters (26 alphabetical and four punctuation characters). The approach maps characters onto binary vectors and two storage-efficient binary matrices (Correlation Matrix Memories described in Section 2.3) that represent the lexicon. The system is not language-specific so may be used on other languages; the phonetic codes and transformation rules would just need to be adapted to the new language. Our spell checker aims for high recall¹ accuracy possibly at the expense of precision.² However, the scoring allows us to rank the retrieved matches so we can limit the number of possibilities suggested to the user to the top 10 matches, giving both high recall and precision.

¹ The percentage of correct spellings retrieved.
² The percentage of words other than the correct spelling retrieved.

Some alternative spelling approaches include the Levenshtein Edit distance [10], Agrep [15], [14], aspell [2], and the two benchmark approaches MS Word 97 and MS Word 2000. We evaluate our approach against these alternatives. The reader is referred to Kukich [10] for a thorough treatise of spell checking techniques. We compare our hybrid system with all of the aforementioned approaches for quality of retrieval—the percentage of correct words retrieved from 600 misspelled words giving a figure for the recall accuracy with noisy inputs (misspellings).

The paper is organized into the following sections: In Section 2, we provide a brief description of the alternative techniques used for the empirical comparison against our technique. Section 2.3 is a detailed description of the AURA neural architecture and our hybrid spelling technique using integrated AURA modules. In Section 3, we provide a qualitative comparison of the recall accuracy of our technique and those alternative approaches described in Section 2.
Section 4 provides our analysis of the recall results and Section 5 contains our conclusions and suggestions for possible further developments for our technique.

2 SOME ALTERNATIVE SPELL CHECKING APPROACHES

2.1 Levenshtein Edit Distance

Levenshtein edit distance produces a similarity score for the query term against each lexicon word in turn. The score is the number of single character insertions, deletions, or substitutions required to alter the query term to produce the lexicon word, for example, to go from “him” to “ham” is one substitution or “ham” to “harm” is one insertion. The word with the lowest score is deemed the best match.

$$f(0, 0) = 0 \tag{1}$$

$$f(i, j) = \min[f(i-1, j) + 1,\ f(i, j-1) + 1,\ f(i-1, j-1) + d(q_i, l_j)] \tag{2}$$

where $d(q_i, l_j) = 0$ if $q_i = l_j$, else $d(q_i, l_j) = 1$.

For all words, a function $f(0, 0)$ is set to 0 and then a function $f(i, j)$ is calculated for all query letters and all lexicon-word letters, iteratively counting the string difference between the query $q_1 q_2 \ldots q_i$ and the lexicon word $l_1 l_2 \ldots l_j$. Each insertion, deletion, or substitution is awarded a score of 1, see (2). Edit distance is $O(mn)$ for retrieval as it performs a brute force comparison with every character (all $m$ characters) of every word (all $n$ words) in the lexicon and, therefore, can be slow when using large dictionaries.

Fig. 1. Diagram of the hybrid spell checker as implemented in the AURA modular system.

[...] in the matching process of each word and $R_{j+1}$ the next step. RShift is a logical right shift, AND and OR denote logical AND and OR, respectively, and $S_c$ is a bit vector representing the character being compared $c$. The following two equations describe the matching process for up to $k$ errors, $0 < d \le k$.

$$R_0^d = 11\ldots100\ldots000 \quad \text{with } d \text{ bits set} \tag{3}$$

$$R_{j+1}^d = \mathrm{Rshift}[R_j^d]\ \mathrm{AND}\ S_c\ \mathrm{OR}\ \mathrm{Rshift}[R_j^{d-1}]\ \mathrm{OR}\ \mathrm{Rshift}[R_{j+1}^{d-1}]\ \mathrm{OR}\ R_j^{d-1} \tag{4}$$

For a search with up to $k$ errors permitted, there are $k + 1$ arrays as stated previously. There are two shifts, one AND, and three OR operations for each character comparison (see [15]), so Wu and Manber quote the running time as $O((k + 1)n)$ for an $n$ word lexicon.

2.3 Our Integrated Hybrid Modular Approach

We implement all spelling modules in AURA. AURA [3] is
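
As an illustration of the Levenshtein recurrence in (1) and (2) from Section 2.1, the following is a minimal Python sketch, not the implementation evaluated in the paper. The function names `edit_distance` and `best_matches` and the small lexicon in the usage note are hypothetical. The paper states only f(0, 0) = 0; the sketch assumes the usual boundary values f(i, 0) = i and f(0, j) = j, which follow from applying (2) with the out-of-range terms dropped.

```python
def edit_distance(query: str, word: str) -> int:
    """Levenshtein distance between a query and a lexicon word, per (1) and (2)."""
    m, n = len(query), len(word)
    # prev[j] holds f(i-1, j). Row 0 is f(0, j) = j (assumed boundary: j insertions).
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # f(i, 0) = i (assumed boundary: i deletions).
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            # d(q_i, l_j) is 0 when the letters match, else 1.
            d = 0 if query[i - 1] == word[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,      # f(i-1, j) + 1
                curr[j - 1] + 1,  # f(i, j-1) + 1
                prev[j - 1] + d,  # f(i-1, j-1) + d(q_i, l_j)
            )
        prev = curr
    return prev[n]


def best_matches(query: str, lexicon: list[str], top: int = 10) -> list[str]:
    """Brute-force scan: score every lexicon word and return the lowest scores first.

    Ties are broken alphabetically, which is an arbitrary choice made for this sketch.
    """
    return sorted(lexicon, key=lambda w: (edit_distance(query, w), w))[:top]
```

For example, `edit_distance("him", "ham")` and `edit_distance("ham", "harm")` both evaluate to 1, matching the one-substitution and one-insertion examples in the text. Because `best_matches` scores the query against every word in the lexicon, its cost grows with the lexicon size, which is the brute-force behaviour that Section 2.1 notes can be slow for large dictionaries.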