Old Software Treasuries Please Interrupt Me at Any Time in Case of Questions

Total Page:16

File Type:pdf, Size:1020Kb

Old Software Treasuries Please Interrupt Me at Any Time in Case of Questions this talk • show the historical background • introduce, explain and demonstrate ed, sed, awk • solve real life problems Old Software Treasuries please interrupt me at any time in case of questions markus schnalke <[email protected]> please tell me what I should demonstrate goal: motivate you to learn and use these programs old software treasuries origin • older than I am • classic Unix tools • standardized • from Bell Labs • various implementations • representations of the Unix Philosophy • freely available • good software historical background historical background 1968: 1970: • MULTICS fails • Kernighan suggests ‘‘Unix’’ • the last people working on it were Thompson, Ritchie, • deal: a PDP-11 for a document preparation system McIlroy, Ossanna 1973: 1969: • rewrite of Unix in C • they still think about operating systems 1974: • pushing idea: a file system (mainly Thompson) • first publication • in search for hardware: a rarely used PDP-7 computer hardware back then impact on software • computers are as large as closets • don’t waste characters • users work on terminals • be quiet • terminals are: a keyboard and a line printer • nothing like ‘‘screen-orientation’’ • line printers have a speed of 10–15 chars/s <picture of PDP-11> ed • by Ken Thompson in early 70s, based on qed • the Unix text editor • the standard text editor! • Unix was written with ed ed • first implementation of Regular Expressions ed: old-fashioned ed: reasons • already old-fashioned in 1984 • the standard (text editor) •‘‘So why are we spending time on such a old-fashioned • available as /bin/ed program?’’ • line oriented → works on any terminal •‘‘Although many readers will prefer some other editor for • needs few bandwidth ed daily use, is universally available, efficient and effec- • good for automated editing (scripts) tive.’’ • good for presentations (?) it’s not so much about using it, but about knowing it ed: overview ed: commands • usage: ed [-s] [file] i insert p print • commands: [a1[,a2]] cmd [params] a append n print with line number • default addresses, default params c change l list characters • success → no feedback d delete e edit file • problems → ‘‘?’’ m move w write file • about 30 commands, half of them important t transfer (copy) q quit s substitute u undo g global command ed: addresses 4 4th line $ last line . current line + next line (.+1) -- the same as .-2 <ed demo> /RE/ next line matching RE ?RE? previous line matching RE 1,$ all lines -,/foo/ from previous line to next /foo/ ed: derived software grep ed ex vi sed sed sed sed: differences to ed • by Lee McMahon in 1973 • line processing cycle •‘‘stream editor’’ • no forward/backward references • started as a one-night hack on ed (AFAIR) • hold space (g,G,h,H,x) • makes ed suitable for pipeline processing • labels and branches • irony: today, people use sed -i for in-place editing • bad i and a syntax <sed demo> awk awk awk: purpose • by Aho, Weinberger, Kernighan • for text processing • oawk in 1977, nawk in 1985 • avoids complex constructs in sh • name is abbreviation of surnames (but also ‘‘awkward’’) • floating point arithmetic • the second scripting language on Unix (besides sh) • ‘‘ sed meets C’’ awk: usage awk: features • awk ’commands’ <in >out • variables, assoc arrays • awk -f cmdfile <in >out • functions • program flow like in sed • pattern matching (EREs) • input automatically split in records and fields • dynamic typing • program is a list of blocks: cond { commands } • more high-level: no pointers awk: statements awk: functions much like C arithmetic much like in C • if, but no switch-case I/O getline(), system() • while, do-while string sub(), gsub() • for (;;) {} like in C substr(), index(), match(), • for (i in array) {} foreach split(), sprintf() • break, continue, exit, return tolower(), toupper(), • print and printf are statements! redirection possi- length(), int() ble with >, >>, | awk: variables awk: conditions RS, FS record/field separator examples: NR number of current record • NR == 1 {...} NF $1 $NF number of fields ( ... ) • NR == 1, NR == 5 {...} ARGC, ARGV like in C • /RE/ {...} FILENAME name of current input file • $1 == "string" {...} OFMT output format for numbers • $1 ˜ /RE/ {...} ORS, OFS output record/field separator SUBSEP a[1,2] equals a[1 SUBSEP 2] • BEGIN {...} • END {...} <awk demo> real life examples literature • ‘‘The Unix Programming Environment’’ by Kernighan this talk was prepared using tools of the Heirloom project: and Pike is highly recommended to support this talk http://heirloom.sf.net • it’s one of my favorite computer books the slides macros were taken from • you really should read it! http://repo.cat-v.org/troff-slider/ all editing was done with ed, of course :-) • ‘‘SED & AWK ge-packt’’ by Stephan Thesing • if you want a german book about sed and awk then get the slides and examples are available on my website this one http://marmaro.de/docs and on • you can get it for about 3 Euro http://ulm.ccc.de/ChaosSeminar/2009/10_Softwareschaetze 2009-10-12.
Recommended publications
  • QED: the Edinburgh TREC-2003 Question Answering System
    QED: The Edinburgh TREC-2003 Question Answering System Jochen L. Leidner Johan Bos Tiphaine Dalmas James R. Curran Stephen Clark Colin J. Bannard Bonnie Webber Mark Steedman School of Informatics, University of Edinburgh 2 Buccleuch Place, Edinburgh, EH8 9LW, UK. [email protected] Abstract We also use additional state-of-the-art text processing tools, including maximum entropy taggers for POS tag- This report describes a new open-domain an- ging and named entity (NE) recognition. POS tags and swer retrieval system developed at the Uni- NE-tags are used during the construction of the semantic versity of Edinburgh and gives results for the representation. Section 2 describes each component of TREC-12 question answering track. Phrasal an- the system in detail. swers are identified by increasingly narrowing The main characteristics of the system architecture are down the search space from a large text col- the use of the Open Agent Architecture (OAA) and a par- lection to a single phrase. The system uses allel design which allows multiple questions to be an- document retrieval, query-based passage seg- swered simultaneously on a Beowulf cluster. The archi- mentation and ranking, semantic analysis from tecture is shown in Figure 1. a wide-coverage parser, and a unification-like matching procedure to extract potential an- 2 Component Description swers. A simple Web-based answer validation stage is also applied. The system is based on 2.1 Pre-processing and Indexing the Open Agent Architecture and has a paral- The ACQUAINT document collection which forms the ba- lel design so that multiple questions can be an- sis for TREC-2003 was pre-processed with a set of Perl swered simultaneously on a Beowulf cluster.
    [Show full text]
  • A Survey of Engineering of Formally Verified Software
    The version of record is available at: http://dx.doi.org/10.1561/2500000045 QED at Large: A Survey of Engineering of Formally Verified Software Talia Ringer Karl Palmskog University of Washington University of Texas at Austin [email protected] [email protected] Ilya Sergey Milos Gligoric Yale-NUS College University of Texas at Austin and [email protected] National University of Singapore [email protected] Zachary Tatlock University of Washington [email protected] arXiv:2003.06458v1 [cs.LO] 13 Mar 2020 Errata for the present version of this paper may be found online at https://proofengineering.org/qed_errata.html The version of record is available at: http://dx.doi.org/10.1561/2500000045 Contents 1 Introduction 103 1.1 Challenges at Scale . 104 1.2 Scope: Domain and Literature . 105 1.3 Overview . 106 1.4 Reading Guide . 106 2 Proof Engineering by Example 108 3 Why Proof Engineering Matters 111 3.1 Proof Engineering for Program Verification . 112 3.2 Proof Engineering for Other Domains . 117 3.3 Practical Impact . 124 4 Foundations and Trusted Bases 126 4.1 Proof Assistant Pre-History . 126 4.2 Proof Assistant Early History . 129 4.3 Proof Assistant Foundations . 130 4.4 Trusted Computing Bases of Proofs and Programs . 137 5 Between the Engineer and the Kernel: Languages and Automation 141 5.1 Styles of Automation . 142 5.2 Automation in Practice . 156 The version of record is available at: http://dx.doi.org/10.1561/2500000045 6 Proof Organization and Scalability 162 6.1 Property Specification and Encodings .
    [Show full text]
  • The UNIX Time- Sharing System
    1. Introduction There have been three versions of UNIX. The earliest version (circa 1969–70) ran on the Digital Equipment Cor- poration PDP-7 and -9 computers. The second version ran on the unprotected PDP-11/20 computer. This paper describes only the PDP-11/40 and /45 [l] system since it is The UNIX Time- more modern and many of the differences between it and older UNIX systems result from redesign of features found Sharing System to be deficient or lacking. Since PDP-11 UNIX became operational in February Dennis M. Ritchie and Ken Thompson 1971, about 40 installations have been put into service; they Bell Laboratories are generally smaller than the system described here. Most of them are engaged in applications such as the preparation and formatting of patent applications and other textual material, the collection and processing of trouble data from various switching machines within the Bell System, and recording and checking telephone service orders. Our own installation is used mainly for research in operating sys- tems, languages, computer networks, and other topics in computer science, and also for document preparation. UNIX is a general-purpose, multi-user, interactive Perhaps the most important achievement of UNIX is to operating system for the Digital Equipment Corpora- demonstrate that a powerful operating system for interac- tion PDP-11/40 and 11/45 computers. It offers a number tive use need not be expensive either in equipment or in of features seldom found even in larger operating sys- human effort: UNIX can run on hardware costing as little as tems, including: (1) a hierarchical file system incorpo- $40,000, and less than two man years were spent on the rating demountable volumes; (2) compatible file, device, main system software.
    [Show full text]
  • Regular Expression Matching Can Be Simple A
    Regular Expression Matching Can Be Simple A... https://swtch.com/~rsc/regexp/regexp1.html Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) Russ Cox [email protected] January 2007 Introduction This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics: Time to match a?nan against an Let's use superscripts to denote string repetition, so that a?3a3 is shorthand for a?a?a?aaa. The two graphs plot the time required by each approach to match the regular expression a?nan against the string an. Notice that Perl requires over sixty seconds to match a 29-character string. The other approach, labeled Thompson NFA for reasons that will be explained later, requires twenty microseconds to match the string. That's not a typo. The Perl graph plots time in seconds, while the Thompson NFA graph plots time in microseconds: the Thompson NFA implementation is a million times faster than Perl when running on a miniscule 29-character string. The trends shown in the graph continue: the Thompson NFA handles a 100-character string in under 200 microseconds, while Perl would require over 1015 years. (Perl is only the most conspicuous example of a large number of popular programs that use the same algorithm; the above graph could have been Python, or PHP, or Ruby, or many other languages.
    [Show full text]
  • Introducing Regular Expressions
    Introducing Regular Expressions wnload from Wow! eBook <www.wowebook.com> o D Michael Fitzgerald Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo Introducing Regular Expressions by Michael Fitzgerald Copyright © 2012 Michael Fitzgerald. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Simon St. Laurent Indexer: Lucie Haskins Production Editor: Holly Bauer Cover Designer: Karen Montgomery Proofreader: Julie Van Keuren Interior Designer: David Futato Illustrator: Rebecca Demarest July 2012: First Edition. Revision History for the First Edition: 2012-07-10 First release See http://oreilly.com/catalog/errata.csp?isbn=9781449392680 for release details. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Introducing Regular Expressions, the image of a fruit bat, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information con- tained herein.
    [Show full text]
  • Contract Number: PH65783 QED, Inc. Appendix D: Pricing Schedules For
    Contract Number: PH65783 QED, Inc. Appendix D: Pricing Schedules Agreement between the New York State Office of General Services and QED, Inc. for Hourly Based IT Services (HBITS) Contract Number: PH65783 Appendix D: Pricing Schedules Effective for Task Orders posted on or after November 1, 2016 Table of Contents Worksheet #1: Region 1 Rates Worksheet #2: Region 2 Rates (Mid-Hudson) Worksheet #3: Region 3 Rates (NYC Metro) Worksheet #4: Region Definitions Worksheet #5: Job Title Definitions Worksheet #6: Skill Level Definitions Worksheet #7: Skill Demand Definitions For Task Orders issued on/after: 11/1/16 Contract Number: PH65783 QED, Inc. Appendix D: Pricing Schedules -----------------------------------------------------------------------------------------------------------------------------REGION 1----------------------------------------------------------------------------------------------------------------------------- Service Group 1. STANDARD TITLES Service Group 2. SOFTWARE/HARDWARE SPECIFIC TITLES Standard Title Markup 30.00% Software/Hardware Specific Markup 30.00% *Note: Please enter the markup as a percentage (i.e.: if the Markup is *Note: Please enter the markup as a percentage (i.e.: if the Markup is 20%, please enter 20 into the cell) 20%, please enter 20 into the cell) Skill Demand Normal High 5% Slight 5% Slight 5% Slight Hourly Hourly Bill Deviation: Hourly Hourly Bill Deviation: Level Deviation: Hourly Wage Rate Rate Hourly Wage Rate Rate Hourly Hourly Hourly Wage Rate Job Title Wage Rate Job Title Wage Rate Wage Rate
    [Show full text]
  • QED at Large: a Survey of Engineering of Formally Verified
    The version of record is available at: http://dx.doi.org/10.1561/2500000045 QED at Large: A Survey of Engineering of Formally Verified Software Talia Ringer Karl Palmskog University of Washington University of Texas at Austin [email protected] [email protected] Ilya Sergey Milos Gligoric Yale-NUS College University of Texas at Austin and [email protected] National University of Singapore [email protected] Zachary Tatlock University of Washington [email protected] Errata for the present version of this paper may be found online at https://proofengineering.org/qed_errata.html The version of record is available at: http://dx.doi.org/10.1561/2500000045 Contents 1 Introduction 103 1.1 Challenges at Scale..................... 104 1.2 Scope: Domain and Literature............... 105 1.3 Overview........................... 105 1.4 Reading Guide........................ 106 2 Proof Engineering by Example 108 3 Why Proof Engineering Matters 111 3.1 Proof Engineering for Program Verification......... 112 3.2 Proof Engineering for Other Domains........... 117 3.3 Practical Impact....................... 124 4 Foundations and Trusted Bases 126 4.1 Proof Assistant Pre-History................. 126 4.2 Proof Assistant Early History................ 129 4.3 Proof Assistant Foundations................ 130 4.4 Trusted Computing Bases of Proofs and Programs.... 137 5 Between the Engineer and the Kernel: Languages and Automation 141 5.1 Styles of Automation.................... 142 5.2 Automation in Practice................... 155 The version of record is available at: http://dx.doi.org/10.1561/2500000045 6 Proof Organization and Scalability 161 6.1 Property Specification and Encodings........... 161 6.2 Proof Design Principles..................
    [Show full text]
  • Compatible Time-Sharing System (1961-1973) Fiftieth Anniversary
    Compatible Time-Sharing System (1961-1973) Fiftieth Anniversary Commemorative Overview The Compatible Time Sharing System (1961–1973) Fiftieth Anniversary Commemorative Overview The design of the cover, half-title page (reverse side of this page), and main title page mimics the design of the 1963 book The Compatible Time-Sharing System: A Programmer’s Guide from the MIT Press. The Compatible Time Sharing System (1961–1973) Fiftieth Anniversary Commemorative Overview Edited by David Walden and Tom Van Vleck With contributions by Fernando Corbató Marjorie Daggett Robert Daley Peter Denning David Alan Grier Richard Mills Roger Roach Allan Scherr Copyright © 2011 David Walden and Tom Van Vleck All rights reserved. Single copies may be printed for personal use. Write to us at [email protected] for a PDF suitable for two-sided printing of multiple copies for non-profit academic or scholarly use. IEEE Computer Society 2001 L Street N.W., Suite 700 Washington, DC 20036-4928 USA First print edition June 2011 (web revision 03) The web edition at http://www.computer.org/portal/web/volunteercenter/history is being updated from the print edition to correct mistakes and add new information. The change history is on page 50. To Fernando Corbató and his MIT collaborators in creating CTSS Contents Preface . ix Acknowledgments . ix MIT nomenclature and abbreviations . x 1 Early history of CTSS . 1 2 The IBM 7094 and CTSS at MIT . 5 3 Uses . 17 4 Memories and views of CTSS . 21 Fernando Corbató . 21 Marjorie Daggett . 22 Robert Daley . 24 Richard Mills . 26 Tom Van Vleck . 31 Roger Roach .
    [Show full text]
  • Vis Editor Combining Modal Editing with Structural Regular Expressions
    Vis Editor Combining modal editing with structural regular expressions Marc Andr´eTanner CoSin 2017, 17. May Editor Lineage1 1966 QED 1971 ed 1974 sed 1975 em 1976 ex 1979 vi 1987 stevie sam 1990 elvis 1991 vim 1994 nvi acme 2010 scintillua 2011 kakoune sandy 2014 neovim vis ... 1Wikipedia, vi family tree, editor history project TL;DR: What is this all about? Demo I https://asciinema.org/a/41361 I asciinema play ... Modal Editing I Most editors are optimized for insertion I What about: navigation, repeated changes, reformatting etc. I Different modes, reuse same keys for different functions The vi(m) Grammar Editing language based on: I Operators (delete, change, yank, put, shift, ...) I Counts I Motions (start/end of next/previous word, ...) I Text Objects (word, sentence, paragraph, block, ...) I Registers Demo: Motions and Text Objects Motions I Search regex /pattern I Find character f{char} I Match brace % Text Objects I Block a{ I Indentation i<Tab> I Lexer Token ii Structural Regular Expressions The current UNIX® text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the `shape' of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell
    [Show full text]
  • Regex Pattern Matching in Programming Languages
    Regex Pattern Matching in Programming Languages By Xingyu Wang Outline q Definition of Regular Expressions q Brief History of Development of Regular Expressions q Introduction to Regular Expressions in Perl What a Regular Expression Is? A regular expression is a sequence of characters that forms a pattern which can be matched by a certain class of strings. A character in a regular expression is either a normal character with its literal meaning, or a metacharacter with a special meaning. Some commonly- used metacharacters include: . ? + * () [] ^ $ For many versions of regular expressions, one can escape a metacharacter by preceding it by \ to get its literal meaning. For different versions of regular expressions, metacharacters may have different meanings. Brief History ❏ It was originated in 1956, when the mathematician Stephen Kleene described regular languages using his mathematical notations called regular sets. ❏ Around 1968, Ken Thompson built Kleene’s notations into the text editor called QED (“quick editor”), which formed the basis for the UNIX editor ed. ❏ One of ed’s commands, “g/regular expression/p” (“global regular expression print”), which does a global search in the given text file and prints the lines matching the regular expression provided, was made into its own utility grep. ❏ Compared to grep, egrep, which was developed by Alfred Aho, provided a richer set of metacharacters. For example, + and ? were added and they could be applied to parenthesized expressions. Alternation | was added as well. The line anchors ^ and $ could be used almost anywhere in a regular expression. ❏ At the same time (1970s), many other programs associated with regular expressions were created and evolved at their own pace, such as AWK, lex, and sed.
    [Show full text]
  • Using Regular Expressions in Intersystems Caché
    Using Regular Expressions in InterSystems Caché Michael Brösdorf, July 2016 Contents 1. About this document ........................................................................................................................... 1 2. Some history (and some trivia) ........................................................................................................... 2 3. Regex 101 ............................................................................................................................................. 3 3.1. Components of regular expressions ......................................................................................... 3 3.1.1. Regex meta characters ................................................................................................. 3 3.1.2. Literals ........................................................................................................................... 3 3.1.3. Anchors ......................................................................................................................... 3 3.1.4. Quantifiers..................................................................................................................... 4 3.1.5. Character classes (ranges) ............................................................................................ 4 3.1.6. Groups ........................................................................................................................... 5 3.1.7. Alternations ..................................................................................................................
    [Show full text]
  • Stream Processing Using Grammars and Regular Expressions
    Stream Processing Using Grammars and Regular Expressions Ulrik Terp Rasmussen DIKU, Department of Computer Science University of Copenhagen, Denmark May 1, 2017 arXiv:1704.08820v1 [cs.FL] 28 Apr 2017 PhD Thesis This thesis has been submitted to the PhD School of the Faculty of Science, University of Copenhagen, Denmark i Abstract In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high- performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action ma- chines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear- time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex.
    [Show full text]