CS252 Introduction to Unix

December 16, 2019 CS252 Outline

Spring 2020

1 Preamble


Below are the modules that comprise the course content.

Each module includes a mixture of lecture notes for you to read (required) and relevant textbook chapters (optional). Many of the lecture notes include “Try This” activities for you to perform while logged in to one of our servers.

Most modules include at least one assignment that you should attempt to complete before moving on.

KEYS TO SUCCESS IN THIS COURSE:

1. READ THE SYLLABUS

The syllabus lays out the basic course policies. It tells you what you need to do to earn a passing grade. It tells you when you need to have done that by. It tells you how to get in touch with me if you run into problems.

2. HAVE A SCHEDULE

You have the freedom to schedule your own time in this course, but you DO need to set up a schedule. Don’t forget that this course exists and that you are registered for it. Don’t think you can repeatedly set it aside for weeks at a time and make up the time later.

There are 14 assignments in the course. There are approximately 14 weeks in a Fall or Spring semester (12 in summer). You can easily figure out what kind of pace you should be setting if you want to complete this course.

3. IF YOU DON'T UNDERSTAND SOMETHING, ASK QUESTIONS

In a web course, my role as Instructor changes from “lecturer” to “tutor”. You can ask questions in the course Forums. You can send me email. You can also contact me during office hours. You’ll find more information on these options in the syllabus and other documents on the Course Policies page.

Some people are too shy to ask questions. Some are too proud to ask questions. My advice to both groups is to get over it! Part of being educated is knowing how to exploit your available information resources. In this course, I am one of those resources.

4. READ THE LECTURE NOTES. DO THE “TRY THIS” EXERCISES.

As a general rule, everything you need to complete the assignments and the final exam is contained in the lecture notes and is practiced in the “Try This” exercises.

The listed textbook readings are optional. Hunting the internet for additional info is possible, but can often lead to more confusion than clarity, so you do that at your own risk. When you read, read attentively. When you do the Try This exercises, be observant of the results you are getting and make sure that you understand them.

If you consistently find yourself starting the assignments thinking that you are prepared, then get stuck with no idea how to proceed, that’s a good sign that you were not really giving the lecture notes or Try This exercises enough attention.

1 Getting Started

Objectives

Upon completing this section, a student should…

Be familiar with the course layout and policies.
Understand the differences between local and remote command sessions, and between text-mode and graphics-mode sessions.
Have set up a CS network account.
Be able to log in to a remote text-mode session on the CS Dept Linux machines.

1. Read: Welcome to CS 252
2. Read: CS 252 Syllabus - Spring 2020
3. Read: Communications
4. Why Unix?
5. Unix account setup
6. Peek: Preface, Ch 1
7. Logging In
8. See website for assignment.

2 The Basics: Working in Text Mode

2.1 Files and Directories

Objectives

Upon completing this section, a student should…

Understand the hierarchical structure of a typical file system.
Understand how paths identify the location of a file.
Recognize the various ways to name their own home directory and those of other account holders.
Understand the difference between relative and absolute paths.
Write both absolute and relative paths to a desired file.
Be able to issue basic commands for creating and listing directories, and for copying, moving, and viewing the contents of files.
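The path and file-management ideas above can be sketched in a short shell session (all directory and file names here are invented for illustration):

```shell
# Run in a fresh scratch directory so nothing real is touched.
cd "$(mktemp -d)"

mkdir -p projects/hw1             # create a directory (and its parent)
echo "hello" > projects/hw1/notes.txt

# Relative path: interpreted starting from the current directory.
cat projects/hw1/notes.txt

# Absolute path: starts at the root, /, and works from anywhere.
cat "$(pwd)/projects/hw1/notes.txt"

cp projects/hw1/notes.txt notes-copy.txt   # copy a file
mv notes-copy.txt projects/                # move it into a directory
ls projects                                # list: hw1  notes-copy.txt
```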

1. Working in a Text-Based Interface
2. Peek: Ch 3
3. Taylor, ch 3, 4, 6
4. The Unix File System
5. Some Basic Unix Commands
6. See website for assignment.
7. Getting Help

2.2 The Elements of Unix Commands

Objectives

Upon completing this section, a student should be able to…

Understand the common special characters available in the command shell.
Use special characters to speed up and simplify the typing of commands.
Use the three forms of quoting (single quotes, double quotes, and backslashes) to suppress special-character actions.
Be able to use wildcard patterns to describe lists of multiple files.
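A few of these special characters and the three quoting forms in action (the file names are invented for the example):

```shell
# Work in a scratch directory with some sample files.
cd "$(mktemp -d)"
touch notes.txt notes.bak report.txt 'my file.txt'

ls *.txt          # wildcard: expands to every name ending in .txt
ls notes.*        # matches notes.txt and notes.bak

# Quoting suppresses special-character handling:
echo "$HOME"      # double quotes: variables still expand
echo '$HOME'      # single quotes: everything is literal, prints $HOME
echo \$HOME       # backslash: escapes just the next character
cat 'my file.txt' # quotes keep the space from splitting the name
```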

1. Typing Unix Commands
2. Patterns for File Names: Wildcards
3. Peek: Ch 4
4. Quoting Special Characters
5. See website for assignment.

2.3 Editing Files

Objectives

Upon completing this section, a student should be able to…

Discuss the relative merits of the nano, emacs, and vim editors. Use at least two of those editors to create and modify text files.

1. Editing in Text Mode
2. Sobell, ch 6,7
3. See website for assignment.

2.4 Protection

Objectives

Upon completing this section, a student should be able to…

Understand the Unix file permissions model.
List the permissions granted by a file.
Change the permissions granted by a file.
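A minimal sketch of listing and changing permissions, using both the octal and symbolic forms of chmod (file name invented):

```shell
cd "$(mktemp -d)"
touch secret.txt

chmod 600 secret.txt        # octal: rw for owner, nothing for group/others
ls -l secret.txt            # first column shows -rw-------

chmod u+x,g+r secret.txt    # symbolic: add owner execute, group read
chmod a-w secret.txt        # remove write permission for everyone
ls -l secret.txt            # now shows -r-xr-----
```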

1. Tansley, ch 1
2. File Protection
3. See website for assignment.

2.5 File Transfer

Objectives

Upon completing this section, a student should be able to…

Identify the common protocols used to transfer files from one computer to another over a network.
Understand the difference in ASCII text file format between Windows and other operating systems (Unix, Android, OS/X).
Transform ASCII text files from Windows format to Unix and vice versa.
Use SFTP to transfer files between their own PC and the CS Dept servers.
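The Windows-vs-Unix line-ending difference can be demonstrated with standard tools alone (the lecture notes may recommend a dedicated utility instead; this is just a portable sketch):

```shell
cd "$(mktemp -d)"
# Make a "Windows-style" file: lines end in carriage-return + linefeed.
printf 'one\r\ntwo\r\n' > win.txt

# Windows -> Unix: delete the carriage returns.
tr -d '\r' < win.txt > unix.txt

# Unix -> Windows: put the carriage returns back.
awk '{printf "%s\r\n", $0}' unix.txt > win2.txt

od -c unix.txt    # shows plain \n line endings, no \r
```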

1. Peek: Ch 6
2. File Transfer
3. See website for assignment.

2.6 Regular Expressions

Objectives

Upon completing this section, a student should be able to…

Write regular expression patterns to describe desired text during search operations.
Use common commands for searching through the contents of files and for doing simple text replacements within a file.
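Both objectives in miniature, using grep for searching and sed for a simple replacement (the sample data is invented):

```shell
cd "$(mktemp -d)"
cat > phones.txt <<'EOF'
alice 555-1234
bob   555-9876
carol x3141
EOF

# grep prints the lines that match a regular expression.
grep '^[a-z]* *555-[0-9]*$' phones.txt     # lines holding a 555 number

# sed performs a substitution on every matching line.
sed 's/555-/757-/' phones.txt              # change the prefix
```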

1. Patterns for Text: Regular Expressions
2. Tansley, ch 7-8, 10
3. See website for assignment.

2.7 Modifying and Combining Commands

Objectives

Upon completing this section, a student should be able to…

Use input redirection to send the contents of a file to the input of a command that is expecting standard (keyboard) input.
Use output redirection to send the output of a command into a file rather than to the screen.
Employ pipes to use the output of one command as the input of another.
Use the find and xargs commands to search for and operate on groups of files.
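Each of the four objectives above in one short session (file names invented):

```shell
cd "$(mktemp -d)"
printf 'pear\napple\npear\n' > fruit.txt

sort < fruit.txt                 # input redirection: file feeds stdin
sort fruit.txt > sorted.txt      # output redirection: stdout into a file
sort fruit.txt | uniq -c         # pipe: sort's output becomes uniq's input

# find locates files; xargs hands the found names to another command.
touch a.cpp b.cpp notes.txt
find . -name '*.cpp' | xargs ls -l
```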

1. Peek: Ch 5
2. Venkateshmurthy, Ch. 4
3. Redirection and Pipes
4. Commands That Launch Other Commands
5. See website for assignment.

3 Program Development in Text-Mode

3.1 Compiling

Objectives

Upon completing this section, a student should be able to…

Issue appropriate commands to compile simple C++, C, and Java programs.
Issue the sequence of steps required to compile programs consisting of multiple compilation units.
Capture lengthy lists of error messages for later examination.
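A sketch of the multiple-compilation-unit workflow with g++, including capturing error output (the two source files are hypothetical examples):

```shell
cd "$(mktemp -d)"
# Two-file C++ program, written here for self-containment.
cat > util.cpp <<'EOF'
int twice(int x) { return 2 * x; }
EOF
cat > main.cpp <<'EOF'
#include <iostream>
int twice(int x);
int main() { std::cout << twice(21) << "\n"; }
EOF

g++ -c util.cpp            # compile each unit to a .o object file
g++ -c main.cpp
g++ -o prog util.o main.o  # link the pieces into an executable
./prog                     # prints 42

# Capture error messages for later study (2> redirects stderr):
g++ -c main.cpp 2> errors.log
```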

1. Compiling and Executing Programs
2. Dealing with Error Messages
3. See website for assignment.

3.2 Project Management (make)

Objectives

Upon completing this section, a student should be able to…

Use the `make` program to automate a series of project build steps.
Write make files describing the automation of a typical programming project.

1. Project Management with Make

2. See website for assignment.

3. Compiling in Editors

4 Working in Graphics Mode

4.1 The X Window System

Objectives

Upon completing this section, a student should be able to…

Launch a remote graphics-based session using a compressed X protocol (X2Go).
Launch X-based programs for operation from their own PC.
Discuss the relative merits of graphical editors (such as emacs and vim) in a graphics-mode session.

1. Peek: Ch 7

2. The X Window System

3. Editing under X
4. Troubleshooting X
5. See website for assignment.

4.2 Program Development: IDEs

Objectives

Upon completing this section, a student should be able to…

Identify the components comprising a typical IDE. Employ the IDEs available on the CS Linux servers (emacs, Code::Blocks, and Eclipse) to create and compile C++ programs.

1. IDEs for Compiling under X
2. See website for assignment.

4.3 Debugging

Objectives

Upon completing this section, a student should be able to…

Understand the value and basic operations of an automated debugger.
Employ a debugger to step through code, set breakpoints, examine the values of program variables, and examine the call stack.
Perform each of the above operations in nemiver, Code::Blocks, and Eclipse.

1. Debugging
2. Debugging under X
3. See website for assignment.

5 Scripting

5.1 Environment Variables

Objectives

Upon completing this section, a student should be able to…

Understand how environment variables affect shell commands.
Set and examine environment variables.
Use backticks to capture command output in an environment variable.
Understand the role of the PATH variable.
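These ideas in a short bash session (the variable names are invented):

```shell
# Setting and examining shell/environment variables (bash syntax).
COURSE=cs252            # shell variable: visible to this shell only
export COURSE           # export: now visible to child processes too
echo "$COURSE"

# Backticks (or the newer $(...) form) capture a command's output:
YEAR=`date +%Y`
echo "the year is $YEAR"

# PATH is the colon-separated list of directories searched for commands:
echo "$PATH"
which ls                # shows which PATH directory ls was found in
```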

1. Tansley, ch 16-18, 20
2. Shore Are a Lot of Shells!
3. Shell and Environment Variables
4. Customizing Your Unix Environment

5.2 Shell Scripts

Objectives

Upon completing this section, a student should be able to…

Understand the concept of a script.
Write simple scripts.
Use control-flow features of the scripting language to modify the order in which script commands are issued.
Pass command-line parameters to a script and manipulate those within the script’s commands.
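A small script touching all of the objectives above: it loops (control flow) over its command-line parameters and acts on each (the script and file names are invented):

```shell
cd "$(mktemp -d)"
# Write a script that reports whether each parameter names an existing file.
cat > checkfiles.sh <<'EOF'
#!/bin/bash
for f in "$@"; do            # "$@" expands to all command-line parameters
    if [ -e "$f" ]; then     # control flow: branch on a file test
        echo "$f: exists"
    else
        echo "$f: missing"
    fi
done
EOF
chmod +x checkfiles.sh       # make the script executable

touch here.txt
./checkfiles.sh here.txt gone.txt
```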

1. Venkateshmurthy, Ch. 8
2. Scripts
3. See website for assignment.

6 End of Semester

1. All assignments for the semester are due by 11:59:59PM ET. Due: 12/06/2019
2. Final Exam (on Blackboard): 12/08/2019, 12:00AM EST - 12/10/2019, 11:59PM EST

© 2015-2020, Old Dominion Univ.

CS 252 Syllabus - Spring 2020

Steven Zeil

Last modified: Aug 2, 2018

Contents:
1 Course Description
2 Basic Information
2.1 Instructor
2.2 Location
2.3 Text
2.4 Course Prerequisites
2.5 Hardware and Software Requirements
3 Course Policies
3.1 Meeting Times
3.2 Computer Accounts
3.3 Communications
3.4 Academic Honesty
3.5 Grading
4 Getting Started

1 Course Description

CS 252 is an introduction to Unix with emphasis on the skills necessary to be productive in Unix, Linux, and related environments.

The focus of this course is on learning enough Unix for students to function productively in CS courses at the 300 level and beyond. Because working directly from a workstation console in a CS Dept lab is no longer the dominant mode of interacting with our Unix systems, this course will emphasize connecting via the Internet from a remote PC to our Unix systems. Both text-based (ssh/shell) and window-based (X) connections will be covered.

This is a self-paced course delivered via the Internet and may be taken for P/F grades only. There are no regularly-scheduled class meetings. Students will be able to work through the material at any time,1 including taking automatically graded assignments. At the end of the semester, a final exam will be issued. Students who have successfully completed a sufficient number of assignments and achieved a sufficient grade on the exam will be given a grade of P. More detailed information is in the Grading section of this document.

2 Basic Information

2.1 Instructor

Steven J. Zeil E&CS 3208 (757) 683-4928 Fax: (757) 683-4900 [email protected]

When sending e-mail to the instructor, please include the course number (“CS252”) as part of your Subject line. Messages with that in the subject are flagged by my email program for faster attention and are less likely to be lost amid my daily dose of spam messages.

2.1.1 Office Hours

Students may meet with the instructor in person, by telephone, or via internet-conferencing (Hangouts). A week-by-week schedule of available meeting times can be found by going to the instructor’s home page (http://www.cs.odu.edu/~zeil) and clicking on “Office Hours and Appointments”.

2.2 Location

This course is hosted on ODU’s Blackboard server.

2.3 Text

The readings for this course are available on-line.

The lecture notes, denoted as such in the course outline, are required readings.

All information necessary to do the assignments and complete the course is in the lecture notes.

Various textbook chapters are also listed in the course outline. These are optional readings.

If you are struggling with some of the ideas presented in the lecture notes, you may find that the texts provide alternative viewpoints or explanations.

These texts are available on-line via the ODU Library through the Virtual Library of Virginia (VIVA).

2.4 Course Prerequisites

CS 150 (Introduction to Programming), or an equivalent, or current registration in CS 333

Students are also expected to be familiar with the use of standard Internet-based tools including email and web browsers. 2.5 Hardware and Software Requirements

Because this course is hosted on the Internet, you will need to make sure that you have access to the appropriate computing equipment and software to participate in the course activities.

2.5.1 Computing Devices

You will not need your own access to a Unix or Linux machine. The CS Dept provides such machines, and learning how to use them from both on and off-campus locations is a major theme of the course.

You will need hands-on access to a PC of some kind. Windows (Windows 7 or later), Mac, or Linux boxes are all acceptable. These do not need to be particularly powerful machines, as you will be using them as entry points for remote connections to the CS Dept. machines on which you will be doing your “real” work.

You will need to install some software on your PC. All such software is available in free, open-source distributions and will be introduced as it becomes relevant during the course.

You should not plan on using a PC at your place of work, or in a library, or other locations where you do not have permission to install software.

In some cases, you may be able to install the software to a USB flashdrive and then use that on machines where you otherwise do not have permissions to install software. Be aware, however, that some sites that lock down their machines to prevent new software installation may also lock them down to prevent execution of programs from USB drives.

Software requirements are fairly relaxed. You will need a reasonably up-to-date version of the Edge, Firefox, or Chrome web browser. Other browsers or older versions of these may also work, but cannot be guaranteed to, because the course materials are not tested with them.

Neither Internet Explorer (Windows) nor Safari (Apple) is recommended. They might work. They might not.

2.5.2 Internet Access

You will need a good quality internet connection. Again, be wary of planning to use PCs at work, libraries, etc., with this course. Many of these places will run firewalls that heavily restrict access to network services.

If you are working from home, your home internet probably allows the access you need by default. If you need to check with the systems staff at some location, tell them that you need to make outgoing connections to

1. web servers using the http and https protocols. If you can view web pages like this one, you are probably OK.

2. secure shell servers using the ssh protocol (port 22). Although rare, some companies and public libraries (including at least one city library in the local Hampton Roads area) do block this.

3 Course Policies

3.1 Meeting Times

This is a self-paced Internet-delivered course. There are no regularly scheduled class meetings.

3.2 Computer Accounts

All students taking this course will need a login account on the CS Dept.’s Unix network. (This is distinct from any Midas or other account you may have from the general University computer center – the ODU ITS).

You may have a CS account already if you were registered for a CS class last semester. If not, you will need to create a new account. Instructions on how to do so are in the course materials for the first module of the course.

3.3 Communications

Because this course does not have traditional lectures, most communication between instructor and students will need to be conducted electronically. Options include email and Forum postings. Details can be found in the Communications Policy.

When sending email related to this course, please remember to include “CS252” as part of the email subject line. This will flag your email for my attention and may also help avoid its getting lost amid my daily spam.

As noted earlier, I will hold regular office hours. Off-campus students can contact the instructor by telephone or by network conferencing during these times.

3.4 Academic Honesty

Everything turned in for grading in this course must be your own work.

If you have questions about the readings or the general subject matter, you may ask me, your classmates, other students, tutors, or anyone else you think might be helpful. If you have questions about assignments (or any other graded activity), you may ask me, the course TA (if I have one – I usually do not have one for this course), or official tutors provided by the CS Dept. or ODU.

You may not discuss possible solutions to assignments or other graded activities with your classmates, other students, TAs for other courses (including TAs for CS 150, 250, or 333), tutors who you may have hired on your own, forums and help sites on the web, etc. Students who copy the bulk of an assignment from other students or from online sites will, at the very least, receive a zero on that assignment.

The instructor reserves the right to question a student orally or in writing and to use his evaluation of the student’s understanding of the assignment and of the submitted solution as evidence of cheating.

Violations will be reported to the Office of Student Conduct and Academic Integrity for consideration for punitive action.

Students who contribute to violations by sharing their code/designs with others are subject to the same penalties as those who misrepresent such work as their own.

3.5 Grading

This is a Pass/Fail course. No letter grades are assigned. The only possible grades are P, F, and WF. (This course does not affect your grade point average — only letter-graded courses can do that.)

3.5.1 Requirements

To obtain a pass (P) grade, students must accumulate 18 points out of a possible 24.

Points are awarded as follows:

One point for each assignment completed, up to a total of 14.

All assignments are automatically graded. Students can check their assignment status at any time by using the Grades button on the various course directory pages.

One point for each 10% correct on the final exam.

The final exam is a multiple-choice exam and will be available on line during exam week on dates listed in the course outline.

For example, a student who completes all 14 assignments need score only 40% on the final exam. A student who completes 12 assignments must score 60% on the final exam.
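The arithmetic in those examples can be sketched with shell arithmetic (the numbers below are the hypothetical 12-assignment case):

```shell
# Points = assignments completed + one point per 10% on the exam;
# 18 of the 24 possible points earns a P.
assignments=12
exam_pct=60
points=$(( assignments + exam_pct / 10 ))
echo "$points"            # 18: just enough to pass
```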

Students who fail to achieve the required 18 points will be given an F if they took the final exam, a WF if they did not attempt it.

It is worth noting that the course is designed with the assumption that students will make a serious attempt to complete all assignments, even though this is not strictly required. The assignments give you practice with and help to reinforce the lessons tested by the exam.

Historically, there is a strong correlation between the number of assignments completed and the chances of scoring high enough on the exam to pass the course:

# assignments completed    % of students passing the course
< 8                        mathematically impossible to pass
8                          0%
9                          0%
10                         10%
11                         33%
12                         60%
13                         85%
14                         95%

3.5.2 Due Dates

Assignments are due by the end of the day announced in the outline/schedule. Typically this is the final day of classes for the Fall and Spring semesters and the day prior to the opening of the final exam for the Summer semester.

The final exam will be available on-line. Refer to the outline/schedule for the dates. Detailed instructions will be posted in the Exams area on Blackboard.

3.5.3 Extensions, Exceptions, and Incomplete (I) Grades

Exceptions to the due dates or grading policy will only be granted under the conditions defined by the ODU policy on Incomplete (I) grades: “exceptional circumstances beyond the student’s control”. Except in such circumstances, students who fail to complete the course in the time allowed will not be permitted to resume the course without re-registering, and would then be expected to complete all assignments from the beginning of the course.

Reasons that are most likely to justify an exception include extended illness, military deployments, or job transfer/relocation, but you should be prepared to document these if requested.

The following are usually not valid reasons for an extension:

“I forgot that I was signed up for this course.” or “I didn’t know what the Grading policy was.”

This was not beyond your control.

“I have a part-time (full-time) job.”

This is not exceptional. Most of your classmates work, many of them full-time.

“I have a heavy course-load this semester.”

Neither exceptional nor outside your control.

“I’ve worked so hard on this course.”

I’m sure that your fellow students would not appreciate the implication that you believe they have been slacking off.

Fundamentally, though, this argument misses the whole point of grades in a course. They are not a reward for putting in time and effort. They are given, instead, to certify that you have demonstrated a certain level of knowledge and, in the case of a course like this that is a prerequisite to many other courses, to make sure that you are sufficiently prepared to succeed in those later courses.

“I got stuck on assignment X and was never able to catch up.”

Actually, this might qualify, but only if you made good use of email and/or my office hours to resolve your problems with that assignment in a timely fashion. Your chances of getting an exception in this case will also depend upon just how many assignments you have remaining to complete. You are far more likely to get a short period of time to complete one assignment than to get any extra time at all to complete 7 assignments.

"I had trouble completing some assignments and haven’t sent you an email or attended your office hours because I’m not the kind of person who likes to ask for help. Then you shouldn’t be the kind of person who asks for exceptions either. A significant part of a college-level education is learning to exploit the information resources available to you. Deliberately refusing to do so is not a behavior that I’m inclined to reward.

Requests for an “I” grade or extended time to complete the course should be made before the actual end of the semester, whenever possible. Requests made after grades have been submitted will need to include an explanation of why the request was delayed.

3.5.4 Fourth-Week Grade Report

University regulations require that all instructors of 100 or 200-level courses provide students with an interim grade report by the end of the 4th week of the semester. Obviously, such a report is of questionable utility in a self-paced course like this one.

Students may obtain this report from Leo Online (the same system used to retrieve end-of-semester grades). Students who, by the end of the 4th week of the semester, have completed at least 4 assignments are considered to be “on a pace” to successfully finish the course by the end of the semester. 4 Getting Started

A typical work session for this course starts by entering the course via the course Blackboard site (linked at the bottom of this page) to check for announcements. Then click on “Modules” to reach the course Outline page.

On the Outline page, you will see the list of topics, with on-line lecture notes, textbook readings, and assignments. You can then begin working through the course material, or pick up from wherever you last left off.

1: Although there are few deadlines associated with CS 252 itself, other CS courses may list CS 252 as a co-requisite, and instructors in those other courses may impose their own deadlines as to when they expect portions of CS 252 to have been completed.

For example, a CS 250 instructor may want to give an assignment on October 15 in which the g++ compiler will be used, and so may inform CS 250 students that they must have completed the CS 252 assignment on “compiling using g++” by October 7.

Course Policies

1 READ THIS!

Syllabus

Communications: where to ask questions and make comments, and what information to include when you do. Library

1 PDF Copies of Lecture Notes

For those who might want to read the lecture notes offline, each of the PDF documents below contains all lecture notes and assorted other documents for the course. Assignments are not included.

The content of each file is the same. Only the page sizes have been varied to facilitate reading on different types of devices.

A conventional letter size page.
A 4x3 landscape page for viewing on tablets with that ratio of width to height (usually larger tablets, 10-12in diagonals).
An 8x5 landscape page for viewing on tablets with that ratio of width to height (usually somewhat smaller tablets, 7-9in diagonals, designed for watching “wide-screen” videos).

This is something of an experimental feature. Let me know if you try it and find it useful.

2 Reference Material

Glossary of commands covered in this course.

emacs reference card

gcc/g++ compiler documentation

gdb debugger user manual.

ddd debugger documentation.

A complete guide to shell scripting, using Bash

3 Software Downloads

All software listed here is free unless explicitly stated otherwise. In several cases, I have noted that the software is “portable”, meaning that you can install it onto a flash drive and then run it on other Windows machines without first installing it on those machines. (This is useful if you are going to be working on computers at your job, at a library, at a friend’s house, etc.)

3.1 Secure Shell (ssh) clients

For Windows, I recommend PuTTY. It is fairly easy to use, and well-documented. PuTTY emulates a terminal.

You can also get “portable” versions of PuTTY here or as part of the Xming package (discussed below). This means you can drop a copy onto a USB flashdrive and run it on almost any Windows PC.

If you are on a Linux or Mac OSX machine, you probably already have the ssh command.

Many PCs in the CS Dept labs have the SSH Secure Shell client, which is also a perfectly good ssh client. Another option for Windows users is to install CygWin (see below) and use its command-line ssh.

3.2 FTP & SFTP

3.2.1 Command-Line clients

Windows comes with a command-line FTP client that can be run in any cmd window. However, it has problems with connections through a router or firewall. Since most people connecting from off-campus will fall into that category, you may want to go with one of the alternatives described below.

If you are on a Linux or Mac OSX machine, you probably already have ftp and sftp commands.

If you are on Windows but have installed the ssh support of CygWin (see below), you should have sftp as well and can install ftp.

The same folks who provide PuTTY also provide command-line versions of scp and sftp for Windows.

3.2.2 GUI-Based clients (for Windows)

WinSCP for Windows supports both FTP and SFTP. It’s free, easy to use, and has some useful options for synchronizing directories. A portable version that runs from a flash drive is available.

WinSCP is available on most CS Dept lab PCs.

FileZilla for Windows, macOS (OS/X), and Linux is also a good GUI for using FTP. It has added SFTP support as well, and so offers much the same capabilities as WinSCP. A portable version that runs from a flash drive is available.

Both WinSCP and FileZilla are solid programs, and the choice between them is largely a matter of personal preference.

3.3 X servers

For access from off-campus, I strongly recommend that you use an accelerated X package (X2Go). For programs with complicated interfaces, it can run 10 times faster than ordinary X servers.

Important: The server/client terminology can be a bit confusing.

You are installing an X server on your PC. If your PC is running Linux, you already have this.
If you use X2Go, you install an X2Go client on your PC.
If your PC runs Windows, the X2Go client includes an X server.
If your PC runs macOS (OS/X), you need to install an X server separately; the X2Go client will then work with it.
If your PC runs Linux, you already have an X server, and the X2Go client will work with it.

3.3.1 Accelerated X

X2Go is an enhanced version of the X protocol and is the recommended way to make X connections to our servers.

The X2Go client can be run on Windows, macOS (OS/X), or Linux, and is generally easier to install and use than “straight” X clients.

Windows and macOS (OS/X) users can get the X2Go client here. Note that macOS (OS/X) users should install XQuartz, the native Apple X server, first.

Linux users can probably get the X2Go client from their normal software distribution tool. It’s become popular enough that it is offered in the Ubuntu Software Center, the Mint and Debian apt-get repositories, etc.

If you can’t locate a version in your Linux distribution’s repository, you can get it from the X2Go site (http://wiki.x2go.org/doku.php/download:start).

The X2Go client is portable: You can install it on a USB flash drive and run it on most Windows PCs, even ones where you don’t have permission to install new software.

Pyhoca-GUI is an alternate interface to X2Go. The X2Go client is the same, only the interface is different. I actually use this on my own Windows and Linux PCs, because it offers some options that the standard X2Go client does not. There is no macOS version.

3.3.2 Ordinary X

When working from off-campus, I do not recommend an ordinary, unaccelerated X connection.

When you are on-campus, particularly if you are seated in a CS Dept lab or using ODU Wifi, a straight X client can be a reasonable choice.

I still recommend the X2Go client described above, though, because it’s generally easier to install and set up.

CygWin/X is an option for Windows users. It builds on the CygWin *nix emulation suite. It’s not the easiest setup. See the User’s Guide on that site for installation information.

VcXsrv is a native Windows port. It seems to work nicely with both PuTTY for connecting to remote servers and with Windows 10 bash to run GUI-based Linux applications.

Another package you might install on your own PCs is Xming, a free, open source X server.

From the Xming site, you will want to install the “Public Domain” releases of “Xming” and of the Xming-fonts.

Important: do not use XMing on the CS Lab PCs or in the CS Virtual PC lab.

All of the Linux options described below run X as their normal display. Booting Linux from a CD or flash drive is an easy way to get X, as discussed below. The disadvantage is that you can’t access your Windows programs at the same time.

XQuartz

Apple’s macOS is a Unix variant, so it’s not surprising that it would support X. Apple has flip-flopped on their attitude towards this. In some releases of OS/X, X was included by default. In other releases, it depended on what kind of hardware you were running. Currently, Apple does not include X in macOS, but does support the XQuartz project as a separate package. 3.4 Unix Emulation

The Cygwin environment provides a nearly complete Unix emulation that runs under Windows. You can pick and choose what packages you want, including the g++ compiler, telnet and ssh clients. This is also the basis for the Cygwin/X package, described above. Installation requires some Unix familiarity, so this might not be a good bet for people just starting CS 252, though by the time you finish this course you should certainly be up to it.

Virtualization is the practice of running entire emulated PCs in a software package. This technology has matured quite a bit, and emulated PCs can often run at speeds approaching the real thing.

You could install a free virtualizer such as VirtualBox or VMWare. Then choose a Linux distribution, download its installation CD image, and use it to create a virtual PC.

3.5 Linux

If you want to go even farther than just emulating Unix, consider getting a full Linux installation. Linux is actually far less demanding on your CPU and memory than an equivalent Windows installation, so it’s a great way to rejuvenate that old PC that isn’t quite up to Windows 8 (or even Windows 7).

If you’re curious and just want to give it a try, there are a number of Live CD packages that put an entire Linux installation on a single CD, allowing you to boot Linux from the CD without touching your Windows installation on the hard drive.

Most of these are intended as demos and may not allow you to write anything to your (Windows) hard drive, though they often will save files on USB flash or “thumb” drives. Keep in mind that running from the CD is usually slower than it would be from a hard drive.

Ubuntu Linux is the most popular Linux distribution in the world. It is easy to install and manage, and the major Linux software packages tend to try very hard to stay compatible with Ubuntu. Ubuntu can actually be installed as a Windows application that resides in a single large file on your hard drive - you don’t need to re-partition your hard drive. Google for “Ubuntu WUBI”.

I myself use Linux Mint, which offers the same software mix as Ubuntu, but has a more traditional desktop style (menu button, task bar, etc.). I find that style of desktop more comfortable for managing the rather large number of different applications that I employ on a regular basis.

Among the Live CD packages, I know of one that can be run entirely from the CD at no speed penalty and that allows you to save files in a special area on a Windows hard drive. Puppy Linux is, despite the cute name, a quite usable distribution. It is a little hard to add optional packages to (including compilers, etc.), but quite speedy after an initially slow boot.

Don’t want to carry a CD around? Pendrivelinux gives instructions on how to put a variety of Linux distributions onto a flash drive. Most of their instructions give you a flash drive that you use to reboot the PC into Linux, but there are also some ways to, at some cost in speed, run Linux as a Windows application without rebooting.

CS252: Frequently Asked Questions

Steven J. Zeil

Last modified: Nov 6, 2019

Contents:
1 General CS252 Questions
  1.1 I can’t log in to CS Dept servers
  1.2 I can’t log in to CS252 web pages
  1.3 Where can I get help on this course?
2 Common Unix Command Errors
  2.1 “Permission denied”
  2.2 “No such file or directory”
  2.3 Where do I find the file ~cs252/Assignments/fileAsst/foo.txt (…or some other lengthy path)?
  2.4 I get time-out errors when trying to connect to a server
  2.5 I have files in my directories that I don’t own and can’t delete.
3 C++ & General Programming Questions
  3.1 What’s all this “foo” and “bar” stuff?
  3.2 Why do compilers’ error messages often give the wrong line number, or even the wrong file?
  3.3 I’m getting “…undeclared…” or “No match for…” errors when I compile C++ code
  3.4 I’m getting “undefined reference to…” errors when I compile C++ code
4 Miscellaneous
  4.1 How do I type the M- key in emacs?
  4.2 In regular expressions, how can I match a string of exactly K characters?

This is a collection of questions (and answers!) that have arisen repeatedly in some of my past classes.

1 General CS252 Questions

1.1 I can’t log in to CS Dept servers

Common mistakes:

You don’t have an account on our network.
You forgot or mis-typed your password. (Some people will do this over and over again.)
You have the name of the server machine wrong.
You have the wrong settings in the client program you are using to make the connection.
You are using the wrong type of client program to make the connection (e.g., trying to make an ssh connection using a web browser).

1.2 I can’t log in to CS252 web pages

Try logging in to some of the CS Dept servers first. If you can’t do that, either, then see the question above.

Most common reasons why people can log in to CS Dept Servers but not CS252 pages:

You aren’t in the CS252 class for this semester. You can still read the lecture notes, but assignments and grade pages are limited to enrolled students.

You activated your CS network account within the past 24 hours. Just wait.

You are on the web site for a previous semester.

Look at the top of the outline page. It says what semester that site is for. Is it the semester you want? (I leave old sites up as a convenience to former students.)

If you are looking for the current semester site, find the link in Blackboard (if you are enrolled in the course) or get it from my home page.

1.3 Where can I get help on this course?

1. You can ask questions about the general course material in the Forums on Blackboard.

2. You can ask the instructor questions about general course material and about the assignments by email.

3. You can make use of the instructors’ office hours. See the syllabus for details.

4. Tutoring might be available from the Math & Science Resource Center.

Note that a reputable tutor will work with you on general course material, but should not be directly helping you to solve the assignments.

2 Common Unix Command Errors

2.1 “Permission denied”

In Unix, there are three basic kinds of permissions on every file:

1. Read permission: allows you (or the commands that you issue) to look at the contents of that file.

2. Write permission: allows you to alter the contents of a file.

3. Execute permission: allows you to execute or run the file as if it were a command or program.

(These same permissions also apply to directories, but the meanings are slightly different.)

So, in general, if you get a Permission denied error, it’s because you tried to read/write/execute a file for which you did not have the corresponding permission.
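You can see these permission bits for yourself with ls -l, and change them with chmod. A minimal sketch (the file name here is just an example):

```shell
# Create an ordinary (empty) data file and inspect its permissions.
touch demo.txt
ls -l demo.txt       # the first column shows something like -rw-r--r--
                     # (read/write for you, no execute permission anywhere)

# Ordinary data files start out without execute permission, which is
# why trying to run one produces "Permission denied".
chmod u+x demo.txt   # grant yourself (the "user") execute permission
ls -l demo.txt       # the first column now begins -rwx
```

The r, w, and x letters in that first column correspond directly to the read, write, and execute permissions described above.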

In CS252, there seem to be two very common reasons why people get this.

2.1.1 Common Reason 1: That’s not YOUR file

Many CS252 students will mistakenly try to create or edit files in one of my directories, e.g.:

cp foo.dat ~cs252/Assignments/wherever/

~cs252 is shorthand for /home/cs252, the home for the cs252 class account, which is my directory, not yours.

Your directories would be at paths starting with ~_yourLoginName_ or /home/_yourLoginName_ .

It should not surprise you to learn that I don’t allow you write permission on my files and directories.
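You can check what a tilde expands to with echo. (The cs252 account exists only on the course servers; on your own machine the second line would just print the unexpanded text.)

```shell
echo ~          # prints your own home directory, e.g. /home/yourLoginName
echo ~cs252     # on the CS servers: /home/cs252, the course account's directory
```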

2.1.2 Common Reason 2: That’s not a program or command

For ordinary data files that you create and work with, you will have read and write permission, but not execute permission, because you can’t execute ordinary data files. They’re data, not programs.

All Unix commands start with a command or program name, followed by zero or more parameters, most of which are going to be paths to files. Examples are:

cp /usr/include/math.h /usr/include/stdio.h ~/playing
mv ~/playing/math.h ~/playing/math0.h
g++ -o newProgram newProgram.cpp
./newProgram

But sometimes CS252 students forget to put in the command or program:

~/playing/math.h ~/playing/math0.h

Now, when I look at that, I wonder if the student wanted to copy the first file, rename it, delete the two files, open both files in an editor, or something else?

But when the command shell (the program that reads your typed commands and launches the actual program to carry out the command) looks at that, it assumes that you meant the first thing listed to be a command or program. So you wind up seeing something like:

~/playing/math.h ~/playing/math0.h
Permission denied

Why? Because the command shell assumed that you wanted it to execute ~/playing/math.h, and that’s an ordinary data file, so you don’t have Execute permission on it.

See also: Try This (bad commands)

2.2 “No such file or directory”

This is actually pretty self-explanatory. You just issued a command containing a path to some file, and the path does not actually match any existing file.

Usually, this is just a matter of a simple spelling mistake. But it throws some people for a loop, because they just can’t believe that they actually typed something wrong.

What can you do about it?

Look and see what the problem is. ls is one of the first Unix commands that you learned. Use it!

If you just typed something like

./myProgram ~/UnixCourse/asst2/dataFile.txt

and got a No such file or directory message, then check each of the components in the command. See if it exists and if you have spelled it correctly:

ls ~/UnixCourse/asst2/dataFile.txt

If ls finds that file, then the problem must be in ./myProgram. So do an ls on that. But if you got another No such file or directory, then try:

ls ~/UnixCourse/asst2/

If you get a list of files, look to see if dataFile.txt is in there. Or maybe something spelled similarly, such as datafile.txt or dataFile.dat. If, on the other hand, you still get No such file or directory, then try:

ls ~/UnixCourse/

and look for asst2 or something similar.

Diagnosing the reason for this error message should almost always be possible just by using the ls command.
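That top-down check can even be sketched as a little loop (the path here is just the example from above):

```shell
# Probe each prefix of the suspect path in turn; the first "missing"
# line shows the component where your spelling went wrong.
for probe in ~/UnixCourse ~/UnixCourse/asst2 ~/UnixCourse/asst2/dataFile.txt
do
    if ls -d "$probe" > /dev/null 2>&1
    then echo "OK:      $probe"
    else echo "missing: $probe"
    fi
done
```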

Avoid making the mistake in the first place.

No, I’m not being snarky. Remember that you can use the Tab key after typing a few characters of a file or directory name to ask the command shell to fill in as much of the remaining name as it can.

That’s a great way both to avoid misspelling things in the first place (because you aren’t typing as much) and also to catch misspellings as soon as you make them (because the command shell doesn’t find any valid file or directory names that start with the few letters you just typed).

2.3 Where do I find the file ~cs252/Assignments/fileAsst/foo.txt (…or some other lengthy path)?

How heavy is a 5 pound bag of flour?

2.4 I get time-out errors when trying to connect to a server

Several possibilities:

1. You are trying to connect to the wrong machine, or requesting a service of a machine that does not provide that service.

For example, if you try to make an ssh connection to vcportal.cs.odu.edu, the request will time out because, although that machine exists, it’s not an ssh server and isn’t even listening to the ports used for incoming ssh connections.

2. You aren’t waiting long enough. Look at the settings you are using in your client program for connecting to the machine. If you see a short time-out, you might want to try allowing more time.

This seems to be a common problem with the sftp clients FileZilla and WinSCP, which by default time out after a few seconds. Often, that’s just not long enough.

3. Your own network connection may be down. See if you can connect to other internet services such as your favorite websites.

4. The machine might be down. Not much you can do about that, except to alert the system staff, but be sure you have checked out the other three possibilities first.

2.5 I have files in my directories that I don’t own and can’t delete.

If an ls -l command reveals to you that you have some files or directories that belong to me, to ~cs252 or someone else in your account area, there’s a good chance that you won’t be able to delete them. As far as I know, the only way this happens is because you tried to copy files using “cp -a”. This may be a good time to remind you that:

As a general rule, everything you need to solve the assignments will be something that is covered in my lecture notes and that I have asked you to practice with in the “Try This” exercises. Avoid the temptation to go hunting up strange commands and command options on the internet or even in the Linux man pages in the belief that the magic answer is hidden out there somewhere.

The “-a” option of the “cp” command is definitely not something covered in the Lecture Notes nor that you have practiced with in the Try This exercises. It isn’t covered because it is rarely useful and often leads to exactly this sort of problem.

If you encounter this problem and can’t delete the errant files, the bad news is that “owner” of the files probably won’t be able to delete them either. That’s because although the owner likely has the permissions to manipulate those files, the owner probably does not have permission to navigate through your directories to where those files are located, and probably does not have permission to delete things from your directories either.

Here’s what you need to do:

1. Use the mv command to rename the file/directory to something that won’t interfere with your work, e.g., “garbage” or “deleteThis”.

2. Send email to [email protected] explaining the problem. Ask them to delete the file/directory. Make sure that you give them the absolute path to the renamed file or directory that you want removed.

3 C++ & General Programming Questions

3.1 What’s all this “foo” and “bar” stuff?

There is a long-standing tradition in computer science of using certain words as sample variable/function/whatever names. Just as a mathematician might use “x” or “y” whenever an arbitrary variable name is needed, computer scientists tend to use “foo”, “bar”, and “baz”, in that order. Check out this entry in the Hacker’s Dictionary for a discussion of the origin of these terms.

3.2 Why do compilers’ error messages often give the wrong line number, or even the wrong file?

A compiler can only report where it detected a problem. Where you actually committed a mistake may be someplace entirely different.

Let’s look at a simple example:

3.2.1 Example 1

Assume that the compiler has read part, but not all, of your program. The part that has just been read contains a syntax error. For the sake of example, let’s say you wrote:

x = y + 2 * x // missing semi-colon

Now, when the compiler has read only the first line, it can’t tell that anything is wrong. That’s because it is still possible, as far as the compiler knows, that the next line of source code will start with a “;” or some other valid expression. So the compiler will never complain about this line. If the compiler reads another line, and discovers that you had written:

x = y + 2 * x // missing semi-colon
++i;

then the compiler knows that you did something wrong. But it still won’t conclude that there’s a missing semi-colon. For all it knows, the “real” mistake might be that you meant to type “+” instead of “++”.

3.2.2 Example 2

Now, things can be much worse. Suppose that inside a file foo.h you write

class Foo {
    ⋮
    Foo();
    int f();
    // missing };

and inside another file, bar.cpp, you write

#include "foo.h"

int g() { ... }

void h(Foo) { ... }

int main() { ... }

Where will the error be reported? Probably on the very last line of bar.cpp! Why? Because until then, it’s still possible, as far as the compiler knows, for the missing “};” to come, in which case g, h, and main would just be additional member functions of the class Foo.

So, with most error messages, you know only that the real mistake occurred on the line reported or earlier, possibly even in an earlier-#include’d file.

3.3 I’m getting “…undeclared…” or “No match for…” errors when I compile C++ code

In C++, most things that you give names to (e.g., variables, functions, etc.) need to be both declared and defined.

You declare something by introducing its name and stating what type of thing it is. For example:

int foo (int x);

declares a function named “foo” and a parameter named “x”.

The rule in C++ is that you must declare names before you try to use them.

If you get a message that says that something is undeclared or is not a match for the name appearing in some line of code, then you have either

forgotten to declare it before using it, or
misspelled the name in the declaration, or
misspelled the name when you later used it.

3.4 I’m getting “undefined reference to…” errors when I compile C++ code

In C++, most things that you give names to (e.g., variables, functions, etc.) need to be both declared and defined.

You define something by supplying its name, description, and the initial value, function body, or other information that “completes” everything the compiler needs to know about that named thing. For example:

int foo (int x) { return x + 2; }

defines the function named “foo”.

The rule in C++ is that you must define things exactly once in all of the compilation units (.cpp files) that make up your program before producing your final executable.

If you are getting a message saying that some name is undefined, it means that the compiler/linker could not find that definition when it tried to generate your final executable program.

Most often, this seems to happen with functions. The possible causes are:

You have forgotten to supply a body for a function.
You misspelled the name of the function in the body, so the compiler thinks you are supplying a body for a completely different function.
You forgot to compile the .cpp file that has the function body or forgot to include the resulting .o file when you linked the rest of the program together.
You gave the wrong compilation command, and told g++ to treat a single .cpp file as the entire program even though there are multiple .cpp files making up the program.

4 Miscellaneous

4.1 How do I type the M- key in emacs?

This is covered in the emacs tutorial, but the basic answer is that it depends on what kind of PC you are on and on what ssh client or X server software you are using. But there are several options, and at least one of those is guaranteed to work:

"M-<chr> means hold the META or EDIT or ALT key down while typing <chr>. If there is no META, EDIT or ALT key, instead press and release the ESC key and then type <chr>. We write <ESC> for the ESC key."

Also take note that if you have Alt (or similar modifier) keys on both sides of the spacebar, some ssh clients and X servers will treat them differently, often reserving one for issuing commands to your PC’s operating system while using the other to send things through the ssh connection to the remote machine. So you may need to try both the left and the right keys separately.

When in doubt, however, the Esc key always works. But it’s not a modifier like Shift or Ctrl that gets held down while you type the other keys. You type Esc and release it before giving the rest of the character sequence.

4.2 In regular expressions, how can I match a string of exactly K characters?

K might be 3, 4, 5, or any fixed value.

1. Write a pattern that matches exactly K characters.
2. Use ^ to anchor the first character of your pattern to the beginning of the string.
3. Use $ to anchor the last character of your pattern to the end of the string.

Because if you match K characters somewhere in a string, and the first character you match is at the start of the string and the last character that you match is at the end of the string, then “obviously” the string can only be K characters long.

For example, if you wanted to match only strings of length 3 in which the first character is an upper-case letter and the last character is a lower-case letter, you could use the regular expression

^[A-Z].[a-z]$

because

[A-Z] matches one character that must be an upper-case letter,
. matches any one character,
[a-z] matches one character that must be a lower-case letter,
the ^ means that the upper-case character must be at the start of the string, and
the $ means that the lower-case character must be at the end of the string.
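You can try this out with grep -E, which prints only the input lines that match the pattern:

```shell
# Only the 3-character lines (upper-case, anything, lower-case) survive.
printf 'Abc\nAbcd\nXyz\nab\n' | grep -E '^[A-Z].[a-z]$'
# prints:
# Abc
# Xyz
```

Note that Abcd fails because the $ would have to match after the third character, and ab fails because it does not start with an upper-case letter.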

Surprisingly, a lot of people seem to get the idea of using the ^ or the $, but don’t think to use both.

Welcome to CS 252

Steven J Zeil

Last modified: Feb 22, 2018

You are reading the beginning of the lecture notes for CS 252, Introduction to Unix for Programmers, a course offered by the ODU Computer Science Department. The course is designed to introduce students to the basic Unix skills that they will need to work productively on the ODU CS Dept.’s network of Linux servers.

Whether you are enrolled in that course or not, you are welcome to peruse these notes.

This course is about working remotely, via the Internet, from your own PC, on Linux machines provided by the CS Dept. Your PC might be running Windows, OS/X or, yes, Linux. But…

You do not need to have a PC that runs Linux for this course.

You do not need to download and install Linux on your own PC for this course.

You will probably need to download and install some (free) communications software onto your PC to enable you to connect to those CS Linux machines over the Internet.

The necessary software will be introduced and discussed in the lecture notes as it becomes relevant. Links to download it will be available on the Library page.

CS 252 is likely to be very different from any course you have taken before. I hope it comes as no surprise to you to learn that CS 252 is a web-based course. (It really did say that in the on-line catalog description, but it’s easy to miss.)

This course does not meet at any scheduled time. It is a self-paced, work whenever-you-like experience. There are no deadlines directly associated with this course other than completing it by the end of the semester. Some students will complete the course in a couple of weeks. Others will stretch it out over the entire semester.

Note, however, that there may be indirect deadlines. If you are taking CS 250 or CS 333 this same semester, then those courses will require you to complete some parts of 252 in time to use that material on assignments for that other course.

You’ll find the CS 252 course itself, including the syllabus that provides details on the course policies, at http://www.cs.odu.edu/~zeil/cs252/s20/.

If you were enrolled in a CS Dept course in the prior semester and still have an active CS account, you may start work on the course at any time. If you do not have an active CS account, you can start reading course materials now, but will not be able to access the assignments until you have set up your CS account, probably a week or so before classes start.

KEYS TO SUCCESS IN THIS COURSE:

1. READ THE SYLLABUS

The syllabus lays out the basic course policies. It tells you what you need to do to earn a passing grade. It tells you when you need to have done that by. It tells you how to get in touch with me if you run into problems.

Every semester I get at least one student who fails the course because they did not read the syllabus but instead relied on some rumor that they heard from somewhere about what is required in this course.

2. HAVE A SCHEDULE

You have the freedom to schedule your own time in this course, but you DO need to set up a schedule. Don’t forget that this course exists and that you are registered for it. Don’t think you can repeatedly set it aside for weeks at a time.

There are 14 assignments in this course, and most semesters have 13-14 weeks (the summer semester is a bit shorter). That makes the pace you need to maintain fairly easy to gauge.

3. DO THE “TRY THIS” EXERCISES

As you read through the lecture notes, you will encounter numerous “Try This” exercises. These are mini-“laboratories”. I expect everyone to actually log in, try the commands that I suggest, observe the results, and to think about them.

It’s very rare for an assignment in this course to ask you to do something that you would not have practiced in one of the earlier “Try This”s. Before you go hunting through Unix documentation or websites for obscure commands or command options to solve an assignment, ask yourself if you’ve possibly forgotten or failed to recognize something from a “Try This”.

4. IF YOU DON’T UNDERSTAND SOMETHING, ASK QUESTIONS

You’ll find that nearly every page on the website has buttons that take you to a course Forum or allow you to email the instructor with the page URL already filled into the email. Whatever page you are on when you encounter something confusing, it’s easy to ask about it. You can also contact me during office hours, as described in the syllabus.

Some people are too shy to ask questions. Some are too proud to ask questions. My advice to both groups is to get over it! Part of a college education is learning to exploit your available information resources. In this course, I am one of those resources. (One of the big differences between teaching a webcourse as opposed to a regular lecture course is that I’m not spending my time preparing and giving lectures. So I fully expect to devote most of my time to answering one-on-one questions.)

Again, welcome to the course. I hope you find it a valuable and interesting experience.

Steven J Zeil

Communications

Steven Zeil

Last modified: Nov 3, 2019

Contents:
1 So many options
2 General Rules for On-line Communications
  2.1 Public and Private Communications
  2.2 Etiquette in Email and Other Written Communications
3 Asking Good Questions
  3.1 Identification
  3.2 You Have to Give Information to Get Information
  3.3 Thou Shalt Not Paraphrase
  3.4 “Copy and Paste” is Your Friend!
  3.5 No Screenshots
  3.6 If I Ask You a Question, Answer It

1 So many options

Communication is a major concern in any course. Web courses make this a little trickier by reducing the options for face-to-face discussion. So I try to open up a variety of options that you can use.

Options for communications in this course include:

Email

to the instructor. Email to the entire class should generally be avoided.

Forums

The forums for this course allow you to post messages that will be seen by the instructor and by all of your classmates. Sometimes this means that you can get a faster response to a question simply because more people are checking in during the day.

The forums are in your BlackBoard section for the course. There are two forums:

The Hallway: This is a place for questions and discussions about the general subject area of Unix/Linux and about the course contents.

Because this is a public forum, and everything that you post will be seen by your classmates, this is not the place to ask questions about how to answer/solve assignments, exam questions, or anything else you are being graded upon.

The Janitor’s Closet: Use this forum to report technical problems with the website: broken links, malformed HTML, etc.

Office Hours

Yes, given that this is a Distance Course, many of you can’t come to campus to see me face-to-face. But I also offer options of meeting by telephone or via network conferencing (Hangouts).

You can find my office hours (and instructions on using Hangouts) here.

2 General Rules for On-line Communications

2.1 Public and Private Communications

Choose a communications option that is appropriate to the nature of the discussion.

Some of the communications options that you have will open your discussion up to the entire class. Others will limit your discussion to the instructor or to your team on a group assignment.

In general, any conversation in which you discuss all or part of your solution to an individual assignment, even if you are only speculating on possible solutions, should be limited to you and the instructor. Sharing such information with other students is a violation of the course’s policy on Academic Honesty.

Use email or office hours for those kinds of questions.

Similarly, questions about your own grade on an assignment should be limited to a private communication with the instructor. In fact, if you were to try to ask such a question in a public forum, ODU privacy policies would prohibit the instructor from replying.

On the other hand, questions about the course subject matter or purely clarification questions about an assignment may be useful subjects for the entire class. These are good subjects for the Hallway forum.

The instructor may, if he feels it is appropriate, copy an e-mailed question to a Forum so that the answer becomes available to everyone.

2.2 Etiquette in Email and Other Written Communications

Students posting in the Forums or sending email to the instructor or to classmates are expected to conform to the norms for civility and respect for one’s classmates and instructors that are common to all on-campus speech and writing.

Students are also expected to conform to the norms of “netiquette”, for example, RFC 1855: Netiquette Guidelines. In particular:

Emotions are often hard to convey and easy to misunderstand in written text. Smileys and other emoticons can help (but don’t assume that attaching a :-) to an insult will make everything OK with the people reading your post).

DON’T WRITE IN ALL CAPITALS or in all bold or, even worse, IN ALL BOLD CAPITALS. This is considered to be shouting, and most people don’t like to be shouted at, whether in real life or on-line.

“Shooting the messenger” is seldom a good idea. In general, assume that people who take the time to reply to your posts are honestly trying to help. Getting mad at them and “flaming” back is counter-productive if you really want people to help you.

Replies to posts will often be short and to the point simply because the responder has limited time. Don’t mistake terseness for rudeness.

Many people who post questions and requests for help may have made very basic mistakes. If you omit the details of everything you thought of and checked before making your post, don’t be insulted if someone replies with a very basic suggestion or a link to something that you have already read.

Don’t “hijack” existing Discussions (threads) to talk about a topic different from the original poster’s topic. Start your own thread instead. In an ordinary conversation, no one appreciates the person who barges in and insists on changing the topic of discussion. And if two groups of people actually insist on trying to simultaneously carry on a discussion on two distinct topics within the same conversation, the result is usually confusing to everyone.

3 Asking Good Questions

Whether posted in Discussions or sent via email, a question is the beginning of a dialog. A well-prepared question will get you an informative answer quickly. A poorly-prepared one may get you irrelevant answers or may require several rounds of back-and-forth dialog, delaying your eventual answer by many hours or even days.

So it’s in your own self-interest to ask your question in a way that gets you the answer you need as quickly as possible.

3.1 Identification

Who are you? : If you are sending me email, make sure your course login name or your real name appears somewhere in the message. I hate getting mail from [email protected] saying “Why did I get such a low grade on question 5?” when I have no idea who this person is!

What course is this? : Again, if you are sending me a question via email, please remember to state which course you are asking about. I teach multiple courses most semesters, and having to go look up your name to see which of my courses you are talking about is annoying.

For email, please put the course number (CS252) in the subject line. I configure my email reader to flag such messages for priority handling, giving you less chance of being lost amid the daily flood of spam. But “CS252” should not be, by itself, the entirety of your subject line.

Use a clear and precise subject header: In Discussions, your subject header helps people decide if your post is worth reading. It also helps people find prior discussions that may have been relevant to later posts.

In e-mail, the subject line shares much the same purpose. Empty and short subject lines are also more likely to get tossed by automatic spam filters. Using the same subject line for every email in the entire semester makes it hard to refer back to previous messages.

3.2 You Have to Give Information to Get Information

When you ask a question, you usually want an appropriate answer that will let you get past whatever difficulty you are having. So that answer needs to be relevant to your particular difficulty and tailored to your understanding of the course material.

3.2.1 What’s the Problem?

I’m not psychic.

If you send me an email consisting of nothing more than “I’m stuck on this assignment.”, there’s really no way I can give you the kind of answer you are hoping for. You need to tell me:

What assignment are you stuck on?

What part of the assignment are you stuck on?

What have you tried so far? (Be specific. If you’ve tried lots of things, tell me about your best attempt so far.) What happened when you tried it?

If you have an issue with a webpage, tell me the URL. If a link is broken, tell me the URL of the page containing the link and describe the location of the link. Don’t just tell me “the link to assignment 2 is broken”. I probably have a dozen different links to that assignment in different pages.

3.2.2 What’s Your Environment?

Most of this course is devoted to the idea of working remotely: using your own PC to control a remote Linux server over the network. Some of you will be using Windows PCs to do this, some will use Apple OS/X, and a few may use Linux boxes. Each of these operating systems comes in different versions. Furthermore, you will often have multiple choices as to what software you use to communicate between your PC and the remote Linux server.

It is therefore often important that you tell me:

What is the operating system of the PC that you are working on?

Which remote Linux machine were you connected to?

What software are you using to connect the two?

Again, if you don’t tell me those things, I’m just going to have to ask for that information, delaying the process of getting you the answer you want. (Also, please keep in mind that in a typical semester I have between 120 and 180 students in this course. Just because you told me all this a week ago, don’t assume I will remember it when you have a question about a different assignment the following week.)

3.3 Thou Shalt Not Paraphrase

There’s nothing more frustrating than getting a question like

“When I try to compile my solution to the first assignment, I get an error message. What’s wrong?”

Grrr. What was the (exact) text of the error message? Was this on a Linux or Windows machine? What compiler were you using? What compiler options did you set? What did the code look like that was flagged by the message?

No, I’m not kidding. I get messages like this all the time. And it wastes my time as a question answerer to have to prompt for all the necessary information. It also means a significant delay to the student in getting an answer, because we have to go through multiple exchanges of messages before I even understand the question.

The single most important thing you can do to speed answers to your questions is to be specific. I’m not psychic. I can only respond to the information you provide to me.

Never, ever paraphrase an error message (“I got a message that said something about a bad type.”) Never, ever paraphrase a command that you typed in that gave unexpected results (“I tried lots of different compilation options but none of them worked.”, or, my personal favorite, “I tried everything.”) Never, ever paraphrase your source code (“I tried adding a loop, but it didn’t help.”) Never, ever paraphrase your test data (“My program works perfectly when I run it.”)

All of the above are real quotes. And they are not at all rare.

The problem with all of these is that they omit the details that would let me diagnose the problem.

3.4 “Copy and Paste” is Your Friend!

And it’s not all that hard to provide those details. Error messages can be copied and pasted into your message. The commands you typed and the responses you received can be copied-and-pasted from your ssh/xterm session into your message. Your source code can be copied-and-pasted or attached to the message.

To copy text from PuTTY, xterm, or from a Linux or OS/X terminal, just left-click and drag your mouse across the text to select it. Your selected text is automatically copied to the clipboard, and you can paste it into an e-mail message or Forum post. To paste text into one of these programs, try:

clicking the middle mouse button, if you have one.

The scroll wheel on many mice fills in for a middle mouse button. Some of these programs will allow you to simulate a middle click if you click both the left and right mouse buttons simultaneously.

holding the shift key and pressing the “Insert” key.

3.5 No Screenshots

Most of the time, the information you need to show me about the problems you are encountering will be plain text, and so you can easily copy and paste it directly into your messages.

Please don’t send me screen shots…

Screenshots are often hard to read and often do not allow me to make the fine distinctions I need to tell what is going on. Keep in mind that raster graphics formats (gif, jpg, png, etc.) often look very different when rendered on screens with different resolutions, or when rendered by different email programs.

Screenshots also tend to clog up my Inbox, and slow me down. I don’t want to wait for megabytes of graphics to download just to see a couple dozen characters, which I may or may not actually be able to read. I don’t like having to take the time to strip out graphics from emails before archiving them.

…unless you absolutely, positively, need to show me graphics.

Late in the course when we get to graphics-mode sessions, you may need to show me something that isn’t simply text. And then screenshots make sense.

But, please, take a shot of just the portion of the screen involved. Don’t grab a picture of your whole screen if you want my attention to focus on a few square inches.

3.6 If I Ask You a Question, Answer It

I often respond to a student’s question with further questions of my own.

Teachers since Socrates have always done this, and students have always been annoyed at it. But who are we to argue with history? Sometimes I do this to get more info I need, sometimes to guide the student towards an answer I think they should be able to find for themselves.

It’s surprising how often students ignore my questions and either never respond at all, respond as if my questions were rhetorical, or, if I have asked 2 or 3 questions, pick the one that’s easiest to answer and ignore the rest.

This pretty much guarantees that the dialog will grind to a halt, as I wind up repeating myself, asking the same questions as before, while some students go right on ignoring my questions…

Why Unix?

Steven Zeil

Last modified: Aug 29, 2019

Contents: 1 All in the *nix Family

We can actually interpret the title question in a number of different ways. Do we mean, “why does the CS department use Unix?”, or “why is this course about Unix?”, or “why was Unix invented?”, or even “why does Unix behave the way that it does?”

It’s actually the last of these interpretations that I want to address, although understanding the answer to that question will go a long way toward explaining why the CS department uses Unix in most of its courses and, therefore, the very reason for the existence of this course.

I think that, to really understand a number of the fundamental behaviors of Unix, it helps to consider how Unix differs from what is probably a much more familiar operating system to most of you, Microsoft Windows. Furthermore, to understand the differences between these operating systems, you need to look at the state of computer hardware and system software at the time when each of these operating systems was designed. In particular, I want to focus on three ideas: the evolution of CPUs and process support, the evolution of display technology, and the evolution of networking technology.

Computer historians are fond of pointing out that mainframe computers were huge behemoths, occupying massive rooms, drawing large amounts of electrical power for their operation, and often requiring cooling systems fully as large as the processor itself. For some time, processors continued to be physically large, although the processing power squeezed into that space grew tremendously.

On the early machines, only a single program could be run at any given time. As processors became more powerful, both hardware and system software evolved to permit more than one program to run simultaneously on the same processor. This is called multiprocessing. The initial reason for doing multiprocessing was to allow programs from many different users (programmers) to run at once. This, in turn, is called multiprogramming. At first, it was assumed that a single user had no need for more than one process at a time.

Interactive programs are characterized by long periods of idleness, in which they are awaiting the next input from the user. In an interactive environment, it becomes natural for users to switch attention from one process that is awaiting input or, in some cases, conducting a lengthy calculation, to another process that has become more interesting. For example, someone using a word processor might want to switch over to their calendar to look up an important date before returning to the word processor and typing that date into their document. Fortunately, once you have support for multiprogramming, you have most of what you need for combined multiprogramming and multiprocessing.

In fact, there is a definite advantage to having started with multiprogramming. In a multiprogramming environment, there is a great danger that one programmer’s buggy software could crash and, by rewriting portions of memory or resetting machine parameters, take down not only that programmer’s program but other programs that happened to be running on the machine at the same time. Consequently, multiprogramming systems place a heavy emphasis on security, erecting hardware and software barriers between processes that make it very difficult for one process to affect others in any way. Adding that kind of protection to an operating system that wasn’t designed for it in the first place is much harder.

The trend toward multiprogramming and multiprocessing persisted, not only across families of mainframe computers, but also across the increasing number of desk-size “minicomputers”. It is in this context that Unix was developed. From the very beginning, Unix was therefore envisioned as an operating system that would provide support for both multiprocessing and multiprogramming.

During the heyday of the mainframe, most data was entered on punch cards and most output went directly to a printer. Most of these systems had a “console” where commands could be entered directly from a keyboard, and output received directly on an electric typewriter-like printer, but such input was slow and inexact. Prior to multiprocessing, it would have been economic folly to tie up an expensive CPU waiting for someone to type commands and read output at merely human speeds. So the system console generally saw use only for booting up the system, running diagnostics when something was going wrong, or issuing commands to the computer center staff (e.g., “Please mount magnetic tape #4107 on drive 2.”).

The advent of multiprocessing and the subsequent rise of interactive computing applications meant that the single system console, hidden away in the computer room where only the computer center staff ever touched it, was replaced with a number of computer terminals accessible to the programmers and data entry staff. An early terminal was, basically, a keyboard for input and an electric typewriter for output.

Terminals were not cheap, but their lifetime cost was actually dominated by the amount of paper they consumed. In fairly short order, the typewriter output was replaced by a CRT screen. This opened up new possibilities in output. A CRT screen can be cleared, output to it can be written at different positions on the screen, and portions of the screen can be rewritten without rewriting the entire thing. These things can’t be done when you are printing directly onto a roll of paper. Terminal manufacturers began to add control sequences, combinations of character codes that, instead of printing directly, would instruct the terminal to shift the location where the next characters would appear, to change to bold-face or underlined characters, to clear the screen, etc. All of this wizardry was hard-wired – there were no integrated-circuit CPUs that could be embedded into the box and programmed to produce the desired results. Consequently, terminals were quite expensive (the fancier ones costing as much as a typical new car). Different manufacturers selected their control sequences as much based upon what they could wire in easily as upon any desire for uniformity or compatibility. Consequently, there were eventually hundreds of models of CRT-based computer terminals, all of which used incompatible sets of control sequences.

Embedded microprocessors eventually simplified the design of computer terminals considerably (sinking a number of companies along the way that had made their money leasing the older expensive models), and the capabilities of computer terminals began to grow, including graphics and color. Eventually, PCs became cheap enough that the whole idea of a dedicated box serving merely as a terminal came into question, and the computer terminal now exists as a separate entity only in very special circumstances, although there are periodic attempts to revive the idea (e.g., so-called Internet appliances).

Before there was a World-Wide Web, there was an Internet. The Internet grew out of a deliberate attempt to allow researchers all around the country access to the limited number of highly expensive mainframe CPUs. Internet traffic originally was dominated by telnet, a protocol for issuing text commands to a computer via the Internet, and FTP, a protocol for transferring files from machine to machine via the Internet. Email came along later.

In imitation of (and perhaps in jealousy of) the Internet, UseNet evolved as an anarchic collection of mainframe and minicomputers that each knew a handful of telephone numbers of other UseNet computers and could pass email and news (a.k.a. bulletin board) entries along those connections.

As the idea of long range networking took hold, more and more sites began installing local area networks to enable communication among their own machines.

Unix evolved for minicomputers in an historical context where

Multiprocessing was expected, and the hardware provided safeguards for protecting one running process from affecting or being affected by other processes on the same CPU.

The most common displays were computer terminals, which came in many different models, all of which used mutually incompatible control sequences. Most of these could display text only, or text with simple vertical & horizontal line graphics. “True” graphics terminals were not unknown, and were clearly on their way, but were so expensive as to be comparatively rare.

Networking was common. In fact, it was normal, perhaps even the rule, for users to be controlling, via the network, machines that were remote from the users’ actual location.

When personal computers (PCs) came on the scene, they represented a revolution in terms of both decreased size and decreased cost, but they represented a step backwards in terms of total computing power and in terms of the sophistication of the hardware support for many systems programming activities.

Oddly enough, PC systems seemed to recap the entire history of computing up till that time, though at a somewhat faster pace:

“One user – One CPU” was a rallying cry of the early PC proponents. They argued that, although an individual PC presented limited CPU power compared to mainframe or mini machines, the individual PC could still provide a single user with more CPU power than that person would receive as their share of a mainframe when split over a large number of simultaneous users. So early PC operating systems returned to the single-user, single-process model that had gone out of fashion decades before in the world of larger computers. (Because, after all, that single user didn’t really need to split those precious CPU cycles among more than one application at a time, right?)

Display technology reverted initially to the electric-typewriter style system console. This was quickly supplanted by a “dumb terminal” CRT-and-keyboard, though early printers were still electric typewriter based. (In fact, I recall seeing ads for a device consisting of a panel of solenoids and control circuits that could be placed over the keyboard of an electric typewriter. Send the panel the ASCII code for an “a”, and a solenoid “finger” would punch down right where the “a” key would be on a typewriter. Send it the ASCII code for “Z”, and a pair of solenoids would strike the typewriter’s “shift” and “z” keys.)

The very existence of integrated circuit CPUs, however, lowered the cost of CRT displays to the point where more elaborate, graphics-capable displays were soon available.

Network technology was initially spurned. “One user – One CPU”, remember? Why would anyone need access to other computers? Email and net news could be handled by a modem connection without full-fledged networking. It took a surprisingly long time before PC operating systems and applications began to acknowledge that not every bit of information and not every hardware/software resource could economically be replicated on every PC system.

MSDOS was developed in a context where

“One user – One CPU” was the rule. Multiple processes for a single user were not deemed necessary.

Most PCs had a CRT display with limited character and graphics capabilities.

Networking was deemed unnecessary.

As MSDOS evolved into Windows, it did so in response to changes in the HW/SW context:

“One user – One CPU” remained the rule, but a single user might have multiple processes.

PC displays could show characters in a variety of fonts, and graphics capabilities were more common.

Some people might want local networking, but it was supplied by third-party add-ons with minimal support from the operating system itself. As for the Internet, why would anyone with a PC want to communicate with all those mainframe dinosaurs?

Of course, both MS Windows and Unix continued to evolve past their earliest forms, but the contexts in which they have evolved helped establish their fundamental philosophy and continues to influence how they work today.

Unix users tend to do a lot more typing than Windows users. Graphics capabilities were rare when Unix was developed, so commands had to be typed out rather than always working through a window GUI, and that practice of typing commands continues to influence the “look and feel” of Unix. Unix does have a windowing GUI, but the Unix approach is to launch an application via a typed command, then let that application open up windows if it needs them. Compare to MS Windows, where most users never type a command, and may not even realize that many common Windows applications offer a variety of command line options. (Try, for example, creating shortcuts on your Windows desktop (right-click on the desktop and select New…shortcut) with the item:

C:\WINDOWS\EXPLORER.EXE

and another with the item:

C:\WINDOWS\EXPLORER.EXE /n,/e,c:\

Try them each and see what they do. The difference is potentially useful, but this possibility is unknown to most non-programmer Windows users.)

Because graphics capabilities were rare when Unix was developed, many Unix applications are text based. The canonical Unix application reads a stream of text in (from “standard input”) and produces a stream of text as output (on “standard output”). Because the output is often considered a slightly modified version of the input, such programs are called filters. Unix users are more likely to string together a bunch of filters that each do something simple than to hunt for a massive dedicated application that does the whole job at once. For example, if you wanted to know how many statements occurred in some file of C++ code, an MS Windows programmer would load that file into a visual C++ programming environment, then search through the menu bars for a “properties” or “number of statements” item that might convey the required information. A Unix programmer would, in a single line of text commands, feed that file into a program that extracted each line containing a “;” (because most C++ statements end with semicolons) and then feed that set of “;”-containing lines through a line-counting filter.
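That statement-counting pipeline can be sketched with standard Unix tools. The file name here is made up for the example; any C++ source file of your own works the same way:

```shell
# Create a small C++ file to work with (hypothetical example file).
cat > hello.cpp <<'EOF'
#include <iostream>
int main() {
    std::cout << "Hello";
    return 0;
}
EOF

# grep extracts the lines containing ';'; wc -l counts those lines.
grep ';' hello.cpp | wc -l    # prints 2 (the cout line and the return line)
```

The `|` connects the output of one filter to the input of the next, which is exactly the “string together a bunch of simple filters” style described above.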

“One user – One CPU” thinking dominates Windows applications even though networking is widely available. MS Windows has never offered the same level of protection against one process affecting others on the same machine. In Windows, it’s still all too common for a single crashing application to lock up the machine to the point where a reboot is required. In Unix, such reboots are quite rare.

In part, this is because Unix allows processes to interact in only two ways:

By having one process write out files that the other reads

By having one process write characters into a “pipeline” that is read by another process.

By contrast, MS Windows has a variety of communication mechanisms, some of which allow one process to directly manipulate the data of another. This allows different applications to work together in interesting and valuable ways (e.g., embedding charts from a spreadsheet directly inside a word-processor document), but at an inevitable cost to overall system security and stability.
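Both Unix interaction mechanisms can be illustrated with ordinary shell commands (the file names here are invented for the example):

```shell
# 1. Interaction via files: one process writes a file that another reads later.
printf 'pear\napple\npear\n' > fruit.txt
sort fruit.txt > sorted.txt   # process 1 writes sorted.txt
uniq sorted.txt               # process 2 reads it and drops adjacent duplicates

# 2. Interaction via a pipeline: the same result, with no intermediate file.
#    sort's standard output is connected directly to uniq's standard input.
printf 'pear\napple\npear\n' | sort | uniq
```

Both commands print the two distinct fruit names. Note that in the pipeline version neither process can touch the other’s memory; all they share is the stream of characters flowing between them.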

Unix applications are often designed to be run remotely, via a network. Even the Unix windowing GUI, called X, is based on the idea that an application can be run on whatever machine it is installed upon, but will actually display its windows on and accept its mouse clicks and other inputs from any machine on the network. MS Windows, on the other hand, assumes that the application is running on the machine where its outputs should be displayed. If you want to use, say, a paint program that isn’t installed on the PC you are sitting at but is installed on another PC in the same room, you can’t run that program without actually getting up and moving to the other PC.

Another instance of this bias, one that is, I suspect, much closer to the hearts and pocketbooks of Windows applications developers, can be seen in the deliberate ignorance of many applications to the realities of the networking world. Although Windows will provide support for different users logging in (one at a time!) to a given PC, the majority of application software that a user might install on that PC will assume that a single data area and a single set of option and preference settings are enough for that PC. Therefore every time you change a setting, you affect the behavior of that application not only for yourself, but for all other people who use that same machine. Unix applications, on the other hand, are far more likely to be distributed under site licenses permitting all users at a site equal access.

Indeed, Unix applications are far more likely to be distributed with source code. It’s worth noting that both the Free Software movement and the Open Source movement originated in the Unix community.

If you have PCs arranged on a network, so that they can access a common pool of disk drives, when you purchase a new Windows application, it is likely to try to install itself onto a single PC, not allowing itself to be run from the other PCs that can access the drive where you install the software. And even if you could run it from a different PC, you will often find that all of your preferences or options settings and all of your accumulated data are squirreled away in the registry or systems area of the PC on which the software was installed, making it unusable from other PCs.

All this may help to explain why the CS Dept. makes such heavy use of Unix. It’s not that we dislike MS Windows. But if we require a particular software package (say, a compiler) in our classes, under Unix we install it on a few Unix machines and let students run it from remote locations via the Internet. Under Windows we would have to install it on every CS Dept machine and ask that it be installed on every machine in the ODU laboratories (and at the distance learning sites). And when an updated version of that package comes out, we have to go through the entire process all over again. It’s just far, far easier to maintain a consistent working environment for all students under Unix.

1 All in the *nix Family

Is Unix still relevant? Will you really ever need to use or work in a Unix environment? Unix is an old operating system. Sometimes its age shows. Some Unix applications seem to use odd choices for interpreting special keys; some interpret mouse clicks in unexpected (to Windows users) ways; some have odd mechanisms for copying and pasting, and sometimes the windows look funny. Take something as simple as copy-and-paste. Windows users know that the keyboard shortcut to paste information is Control-v. This is pretty much universal among Windows programs. But in the Unix emacs editor, instead of “pasting”, you “yank” information with Control-y. Ask an emacs fan why emacs didn’t adopt the same “standard” keystroke as the rest of the world was already using, and you will be given a condescending smile while the fan explains that emacs adopted the Control-y yank long before Microsoft even existed, and that the real question is why the rest of the world agreed to the idea that the letter “v” was a logical choice to stand for “paste”.

Unix is a new operating system. It has been continually changed, updated, evolved, and used as the basis for new “Unix-like” operating systems. In fact, it really makes more sense to think of Unix as a family of operating systems (sometimes denoted as “*nix”). There is an actual standards process that allows an operating system to certify that it is a “real” Unix. But there may be even more machines out there running Unix-like operating systems, in other words, something in the *nix family.

You might well be using a Unix or Unix-like operating system without realizing it:

The Apple OS/X operating system is a certified, true Unix.

Linux systems come in many flavors (called “distributions”), most of them available for free. Linux is not certified as Unix, but is so close that it makes no difference to most users.

The Apple iOS used on iPhones and iPads is based on OS/X and so is in the Unix family.

Android, the chief competitor to iOS for portable devices, is based on a modified Linux kernel.

There are a whole host of consumer electronic devices (e.g., DVRs, smart TVs, automobile entertainment systems) that contain CPUs and that are running some version of Linux inside.

So there’s actually a pretty good chance that any computerized system that you lay your hands on that isn’t running Windows will actually be running something in the Unix family. Although some of these hide the operating system deeply enough that you won’t notice it when you use them, if your future profession should involve programming for any of them, you may find yourself working on a *nix system as the closest thing to the platform where your code will eventually run.

Unix account setup

Steven Zeil

Last modified: Sep 2, 2018

Contents: 1 Account Setup 2 If you have difficulties logging in

All students in this class must have an account on the CS Dept.’s Unix network. This is independent of any Midas/NetWare or other accounts that you might have through ITS, the University’s general computing system.

1 Account Setup

1. If you have had a CS Dept. account within the past year, it should be restored for you automatically, with your old login name and password.

2. If not, you can request a new account by going to the CS Dept home page and clicking on “Online Services” and then on “Account Creation”. You may need to wait until you have been registered for at least 24 hours before you can create an account. (It takes that long for the information to get to us.)

Prior to August 2014, everyone had separate passwords for their CS Unix and Windows network accounts. In August 2014, these were merged so that the same login name and password works on the entire CS network. If your account has been restored from a prior semester, you may need to try both of your old passwords to see which one was carried over to the new network configuration.

If you are registered for a future semester, you generally cannot create a new account until the week before classes for that semester start.

3. Once you have your account set up, including having replied to the confirmation email that you receive as part of the process, you should wait 24 hours, then return to this course and click here to try to view the “Log in” assignment. (You may not be ready to actually do that assignment yet, but we want to be sure that you can access it.)

2 If you have difficulties logging in

Do not return to the Account Creation page and create a second (or third or fourth…) account. This will only create problems identifying you to both this course and to many other CS courses in which you might enroll.

If you have already created multiple accounts, please email [email protected] and tell them the account names (if you remember them) and ask that they remove all but one.

1. You may have simply typed your login/user name or password incorrectly. Use your browser’s Back and Refresh buttons to try again.

Take note that, on our system, the user names use no upper-case letters. Also, when logging in to the course website or to one of the Linux servers, you do not put “CS\” in front of your login name, as you might do to log in to a CS Dept Windows PC. Similarly, do not add @odu.edu or any other portion of an email address to your login name.

2. If you are registered for a CS Dept course but do not yet have a CS account, follow the instructions above to request your account.

Is your Unix account active? The easiest way to check is to try logging in to one of our Linux machines or to use Remote Desktop to connect to our Virtual Windows PC lab.

3. If you are able to log in to a CS Dept machine but not to the protected content on this course website,

A. Have you waited at least 24 hours before trying to access protected content on the course website?

After your account has been activated, your new login name needs to be written into the course enrollment records before my course website will become aware of it. That process takes place overnight.

B. Are you really, really sure that you entered your login name and password correctly? These are taken directly from the same database used to authenticate your logins to the CS Dept Linux and Windows machines, so it is unlikely that they would be out of sync.

4. If you have forgotten your password, you should contact the CS Dept systems staff to have your password reset:

Go to the CS Dept home page and click the “Password Reset” link (under “Online Services”).

5. If you are unable to log in to either the Virtual PC lab or to the Linux servers, you will need to contact the CS Systems staff to resolve your problems.

If you are unable to log in to those services, you will almost certainly be unable to log in to the course website as well.

Logging In

Steven Zeil

Last modified: Jul 24, 2017

Contents: 1 Sessions 1.1 Local and Remote Sessions 1.2 Text-mode and Graphics-mode Sessions 1.3 Four Combinations 2 Starting a Local Unix Session 2.1 Local Sessions on a Linux Machine 2.2 Local Sessions on an OS/X Machine 2.3 Local Sessions on a Windows Machine 3 Starting a Remote, Text-Mode Unix Session 3.1 Working on the Network: Servers and Clients 3.2 SSH 3.3 Remote Text-Mode Sessions from a Windows Machine 3.4 Remote Text-Mode Sessions from a Linux, OS/X or Windows-with-Cygwin Machine 4 Logging In 5 Setting Your Terminal Type 6 Changing Your Password 7 Logging Out

1 Sessions

A session in Unix (or any other operating system) is a sequence of actions in which you

1. Log in to a computer.

2. Issue one or more commands, including launching various programs.

(We won’t make a big distinction in this course between issuing commands and launching programs. You launch a program by giving a command telling the operating system to run that program, and even the most trivial commands usually wind up launching a program to carry out your request.)

3. Log out of that computer.

All sessions have those characteristic three steps. If you are working on a PC where you are the only registered user, you might log in automatically when you power up the PC, and you might be logged out automatically when you tell the PC to shut down. On other computers, you might need to give a user name (login name) and a password. Once you are in a session, you might issue commands by typing on the keyboard or by using your mouse to select from a menu of available programs.

So there can be many different “flavors” of operating system sessions. In particular, a session can be either local or remote, and a session can be text-mode or graphics-mode.

1.1 Local and Remote Sessions

A local session is one in which any commands that you issue are used to control the very computer at which you are seated and to which your keyboard/mouse/screen are connected. Right now, if you are reading this while seated in front of a keyboard and screen, you are engaging in a local session with the PC at which you are seated.

A remote session is one in which any commands that you issue (via your local keyboard and mouse) are used to control some other computer, one at which you are not seated but that receives your commands via a network to which both your local computer and the remote one are connected.

As a general rule, if you want to engage in a remote session with some other computer, you will first need to start a local session on the machine you are seated at, then issue a command to that local machine telling it to connect to the remote one and to start a session for you there.

1.2 Text-mode and Graphics-mode Sessions

Another important distinction among different kinds of sessions is how your interaction with the session is presented.

A text-mode session allows you to type commands and to see textual responses in return. Your entire session can be seen as a kind of written dialog between you and the machine. Text-mode sessions are somewhat limited, and their appearance is generally less than exciting.

A graphics-mode session allows the operating system to show you things other than text. This includes not only pictures, charts, and other graphic media, but also windows, menus, toolbars, buttons, and all the other forms of controls that we are used to seeing drawn on our screens.

Graphics-mode sessions are often easier to look at and easier to control than text-mode sessions. The availability of the mouse or other pointing device permits us to enter information, such as positions in a picture, that would be difficult to express in text.

1.3 Four Combinations

That gives us four possible modes for a session with a Unix machine:

1. local text-mode,

2. remote text-mode,

3. local graphics-mode, and

4. remote graphics-mode.

All four combinations do get used in practice. Programmers are more likely to engage in a text-mode session than are ordinary users. There are several reasons why software developers need to be capable of working in text-mode:

They don’t place much strain on the computer or the network. In particular, text-mode sessions can be the most effective way to conduct a remote session over slow or overloaded networks.

Text-mode can offer more detailed control over the launching of programs that have a number of different options or inputs.

When working in text-mode, it is usually easier to prepare scripts of command sequences that can be saved and replayed in later sessions.

In fact, many of the icons, shortcuts, and menu entries that you might employ in a graphics-mode session actually run single lines or scripts of text-mode commands, so the software developer who prepares such things must understand the underlying commands.

2 Starting a Local Unix Session

The main focus of this course is on working remotely. However, if you are working on a computer that has a Unix variant available, then much of what you will be learning to do remotely can also be done on your local machine. So it’s worth looking, briefly, at the process of starting a local session. Also, depending on your local computer’s own operating system, you may need to start a local Unix session before you can start a remote one.

It all depends on your local machine:

2.1 Local Sessions on a Linux Machine

If your local computer runs Linux, then just follow your normal power-up and login procedure. Once you have logged in, you will probably see something roughly like this:

or this:

but there is a wide range of different appearances offered by different Linux distributions. You are, in fact, now in a local graphics-mode session.

To get from here into a local text-mode session, you need to find and launch a “terminal” or “xterm” program. Different Linux distributions will hide this in different places in the menu system, so you will need to hunt for it.

2.2 Local Sessions on an OS/X Machine

If your local computer runs Apple’s OS/X, then just follow your normal power-up and login procedure. Once you have logged in, you will see your usual OS/X desktop.

You are, in fact, now in a local graphics-mode UNIX session. (OS/X is a Unix operating system.)

To get from here into a local text-mode session, look in the Applications launcher for the “Terminal” program.

2.3 Local Sessions on a Windows Machine

If your local computer runs Microsoft Windows, then you are booting up into a local graphics-mode Windows session. You can get a local text-mode Windows session by running the cmd program. Some of the commands and techniques that we will study for Unix text-mode sessions can be applied, with minor changes, to a Windows text-mode session.

Of course, those are Windows sessions, not Unix sessions. Windows users will probably concentrate on doing Unix via remote sessions to Unix machines elsewhere on the network.

It is possible to do local *nix sessions on a Windows PC. For more than a decade, the CygWin project has been providing an extensive collection of (free) Unix-based tools that provide a POSIX1 layer allowing open-source Unix software to be compiled and run on Windows PCs. Personally, I just can’t stand trying to do “serious” work on a Windows machine that does not have CygWin installed.

3 Starting a Remote, Text-Mode Unix Session

In this section, we’ll concentrate on creating text-mode sessions with remote Unix machines. For CS Dept. students, these will be the Dept.’s Linux servers. Remote graphics-mode sessions will be introduced later.

3.1 Working on the Network: Servers and Clients

When we talk about working remotely, we are talking about communicating over a network, possibly a local network like the ODU network or your local WiFi environment, or possibly a combination of such local networks connected via the Internet.

Network services are almost universally organized as communications between servers and clients.

A server is a program that provides a service from some computer connected to the network. A client is a program, usually on a different machine, that requests such services from a server. The server will respond to some selected set of commands from clients that connect to it. The timing and format of those commands and of the server’s response to them are called a protocol.

Different protocols are designed for different purposes. You probably already make use of

protocol    server              client
HTTP        web servers         web browser
HTTPS       secure web servers  web browser
IMAP, POP   email servers       email programs (reading)
SMTP        email relays        email programs (sending)

Generally, a single server program will support just one (or two closely related) protocols. Similarly, a client program will also support just one (or a few closely related) protocols.

A typical web server provides HTTP or HTTPS. A few provide both.

But you don’t ask a web server to pass your email along to its intended destination.

A typical mail client will allow you to configure your email account to retrieve from mail servers that work via POP or IMAP, and to send mail out by communicating to an SMTP server.

But you don’t ask an email client to fetch or serve web pages.

In this course, you will eventually be working with servers and clients for a number of other protocols:

protocol  purpose
SSH       Issue commands to be run on the server.
FTP       Transfer files between a server and a client.
SFTP      Transfer files securely between a server and a client.
X         Display graphics on a server from a program running on a remote client.

In each case, you will need to install (or already have) an appropriate client or server program on your local PC, connect to a remote machine at ODU, and operate the local client or server to obtain the service you desire.

So, typically, the sequence is

1. Figure out what kind of service you need.

2. Identify which protocol(s) provide that service.

3. Start a client (or server) program supporting that protocol.

4. Connect to a remote server (or client) supporting that protocol.

5. Operate your local program to obtain the service you require.

If that sounds daunting, remember that you already do this for a limited set of services and protocols. When you want to visit a web page, you

1. Figure out what kind of service you need

— web browsing

2. Identify which protocol(s) provide that service — The URL of the page you want will provide this information in its first 4-5 characters, either HTTP or HTTPS. The URL also provides the Internet address of the web server that is going to supply the pages.

3. Start a client (or server) program supporting that protocol

— start your web browser.

4. Connect to a remote server (or client) supporting that protocol

— paste or type the URL of the desired starting page into the address bar, or select a bookmark with that URL.

5. Operate your local program to obtain the service you require

— wait for the page to display, and start scrolling and clicking away!

We’re just going to expand on the set of protocols, servers and clients that you will be working with.

And, right now, we are interested in sessions in which we issue commands to a remote machine. That need is fulfilled by the SSH protocol.

3.2 SSH

We will be accessing the remote machines via SSH (Secure SHell), an Internet protocol for issuing interactive text-mode commands to remote machines.

There is an older text-based protocol for giving commands to Unix called “telnet”. Telnet is still used for a variety of purposes, but has fallen out of favor because it sends everything (including your login name and password) in plain-text format, leaving you vulnerable to someone else on your network eavesdropping via “packet sniffers”.

By contrast, ssh encrypts all your communications, so you are safe even if someone is eavesdropping.

“SSH” allows a person connected to the Internet to log into other machines on the Internet and to issue commands to those machines.

To use ssh, you must have an ssh client program on the machine where you are seated, and you must know the name (or the IP address) of a machine elsewhere that is running a ssh server, the program that accepts logins and subsequent commands from the client.

A list of the available CS Linux servers is here. Pick one for your remote connection.

Use the machine name, e.g., atria.cs.odu.edu, rather than the numeric IP address. Although the IP address is, in a sense, the true identifier of a machine on the Internet, if a machine that provides a useful service breaks down or needs to be upgraded, it may be replaced by a machine that has a different IP address. That IP address will, however, then be assigned the familiar name that people have been using to connect to that service, so the substitution of machines will be invisible to people using the server name.

3.2.1 Login Accounts

You will, of course, only be able to connect to machines where you actually have a login account. If you are a student registered for a CS course at ODU, and have never had a Unix account on the CS Dept system, you will need to create an account.

If you have had an account in the recent past, it should be regenerated for you in any semester when you are registered for a CS course. Otherwise, you will need to contact your instructor or the CS systems staff to get your account.

3.3 Remote Text-Mode Sessions from a Windows Machine

On a Windows PC (without CygWin), you will need an ssh client program. If you are on your own PC, you can choose the client program to install and run. If you are on a lab PC, at work, or for some other reason using someone else’s machine, you either need to use whatever they have installed or bring your own client program, probably on a USB flash drive.

From your own Windows PC: Install PuTTY. Then proceed to Connecting via PuTTY.

From Windows PCs in CS Dept Labs: Most, if not all, will have PuTTY. Run it and proceed to Connecting via PuTTY. In some cases, you may find OpenSSH instead.

From Windows PCs in the ODU ITS Labs: ITS machines do not have an ssh client installed. Install Portable PuTTY on a USB flashdrive and proceed to Connecting via PuTTY.

3.3.1 Connecting via PuTTY

When you run PuTTY, it opens a session dialog box like the one shown here.

For the “Host Name”, fill in the name of the ssh server machine that you chose earlier.

You will need the full machine name, including the .cs.odu.edu part.

Use the symbolic name rather than the numeric IP address. IP addresses can change (e.g., if a machine breaks down and needs to be replaced.)

For the “Connection Type”, make sure that “SSH” is selected. Then click “Open”.

The first time you connect to any machine, you will get a rather intimidating warning that “The server’s host key is not cached in the registry” followed by a lot of details that uniquely identify the machine you are connecting to. This is a way ssh tries to protect you against people trying to capture your login info by “spoofing” or impersonating a legitimate machine. In practice, you’re not likely to know whether this info is correct or not, so click “Yes” to proceed with the connection.

If you get a similar message later when reconnecting to a machine you have previously used, that might be suspicious (or it may just mean that the original machine crashed and has been replaced by a different one that was given the same name on the network.)

You should now be prompted for your login name. You are ready to log in.

3.4 Remote Text-Mode Sessions from a Linux, OS/X or Windows-with-Cygwin Machine

All Linux and MacOS machines should include the ssh client program, ssh. On a Windows machine with CygWin, this is an optional package. If you have not installed it (or are not sure whether you did), run the CygWin installer and select it for installation from the “Network” section.

First, you will need to open a local text-mode session on your machine. Then from within that session, give the command ssh -l yourLoginName sshServerName filling in your CS Unix login name and the name of the ssh server machine you chose earlier (including the .cs.odu.edu ending).

Note that the -l option is a lower case “L” (for “Login name”), not numeric digit one.

An alternate form of the same command is

ssh yourLoginName@sshServerName

The first time you connect to any machine, you will get a rather intimidating warning such as “The authenticity of host … can’t be established” followed by a lot of details that uniquely identify the machine you are connecting to. This is a way ssh tries to protect you against people trying to capture your login info by “spoofing” or impersonating a legitimate machine. In practice, you’re not likely to know whether this info is correct or not, so reply “yes” to proceed with the connection.

If you get a similar message later when reconnecting to a machine you have previously used, that might be suspicious (or it may just mean that the original machine crashed and has been replaced by a different one that was given the same name on the network.)

You should now be prompted for your password. You are ready to log in.

4 Logging In

If you are prompted for your login name, enter it.

Don’t prepend a “CS" or ”CS/". That’s a Windows-specific convention.

At the “password:” prompt, enter your password.

Note that as you are typing your password, nothing will happen on your screen. Unix does not echo your password back to you. Most programs and operating systems will not do that with passwords, as a basic security precaution against someone snooping your password from over your shoulder or even electronically. What you may find unusual, however, is that Unix will not even echo back blank characters or *s to show how many characters you have typed. This seems to disturb some people, but it’s a further security precaution. Just knowing how many characters are in your password would be valuable information to someone trying to crack your account.

After a few moments, you should receive a command prompt. You are now ready to start issuing commands. For example, the “Hello World” of the Unix community might be:

who am i

5 Setting Your Terminal Type

Remember that Unix evolved in a time when many manufacturers made many different models of computer terminals, each with its own set of command codes for clearing the screen, moving the cursor to different screen positions, setting bold face, underline and other text characteristics, and so on. A typical Unix installation will be equipped to communicate with any of a few hundred different types of terminals.

Now, you’re not using a terminal, but you are using a program that simulates one. telnet was originally intended to allow terminals to issue commands to remote CPUs, and ssh is intended as a replacement for the older telnet protocol, so your ssh client program actually works by simulating an “old-fashioned” computer terminal. The authors of your client program chose one or more kinds of terminal that they would simulate. For Unix to manipulate your screen appropriately, it must know what kind of terminal command codes your ssh client program is prepared to accept.

Find out the kind of terminal being emulated by your ssh client program. You may need to consult the program documentation or help files for this. You may also be able to deduce this information from the program’s “Options” or “Preferences” menus.

Here are some common choices:

SSH client program                           TERM type       Comments
PuTTY                                        xterm
OpenSSH                                      xterm
Command-line ssh, basic form, Linux or OS/X  xterm or linux  Depends on what type of window you type the command in.
Command-line ssh, basic form, CygWin         xterm or rxvt   Depends on what type of window you type the command in.
Command-line ssh, xterm variant              xterm

Look at the messages that you received after logging in. If they include a line saying something like “Terminal type is…” and the terminal type that it names makes sense for your ssh client program, you’re all set and don’t need to change it. You can also check the terminal type by giving the command

echo $TERM

If the terminal type seems wrong, or you see a message indicating that you are on a “dumb terminal”2 , you need to tell the Unix system what kind of terminal you are really emulating. The command to do so is

export TERM=xxxx

where xxxx is the kind of terminal (e.g., export TERM=vt100).
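For instance, the following sketch (using xterm purely as an illustrative value; substitute whatever terminal your ssh client actually emulates) sets the terminal type and echoes it back to confirm the change:

```shell
# Set the terminal type for this session; "xterm" is just an example value.
export TERM=xterm
# Confirm the setting took effect.
echo "$TERM"
```

Because export affects only the current shell session, you would typically put such a line in a startup file if the wrong type is chosen every time you log in.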

Most “dumb” terminals provide for 24 lines of text. Your ssh client may have defaulted to this, but most Unix systems will allow their terminals to provide a larger text area. If you choose to use a larger text area (i.e., resize the window showing your terminal session), your ssh client may take care of notifying the remote machine of the change. If, however, you see evidence that it did not do so (e.g., it only scrolls 24 lines even when you are showing more), you can tell Unix how many lines you are using by giving the command

stty rows nn

where nn is the number of rows/lines.

6 Changing Your Password

Whether we like it or not, we need to worry about the security of our computing environment. There are people who would take advantage of this computer system if they had any, or more complete, access to it. This could range from the use of computer resources they have no right to, to the willful destruction and/or appropriation of the information we all have online. In order to maintain the level of security in our computing environment that we need, there are some things we all have to take responsibility for. Even though you may not feel like you personally have much to lose if someone had access to your account or files, you have to realize that as soon as someone gains ANY access to our system, it’s 100 times easier for them to gain access to ALL of it. So when you are lax with your own account, you are endangering the work and research of everyone else working here.

Your password is the fundamental element of security not only for your personal account, but for the whole UNIX system that we share. Without an account and password a person has NO access to our system. If someone discovers (or you tell someone) your password, not only will they have access to your personal files, but they will have a much better chance to launch attacks against the security of the entire system.

Your account password is the key to accessing and modifying all of your files. If another user discovers your password, he or she can delete all your files, modify important data, read your private correspondence, and send mail out in your name. You can lose much time and effort recovering from such an attack. If you practice the following suggestions, you can minimize the risk.

1. NEVER give another user your password. There is no reason to do this. You can change permissions and have groups set up if you need to share access with other individuals. Your account should be yours alone.

2. Never write down your password. Another person can read it from your blotter, calendar, etc. as easily as you can.

3. Never use passwords that can be easily guessed. Personal information about you (birth date, etc.) may be known to the attacker or may be recorded in on-line databases that the attacker has already obtained.

Passwords should not be single words (in any language) because on-line dictionaries are widely available for use in spelling checkers. A common approach to cracking passwords is to compile a set of such words and to run a program that tries each one on each account on the machine. Consider inserting punctuation and other “odd” characters into your password to foil such attacks.

Note, however, that so-called L33T substitutions, such as using “3” in place of “E”, do not improve your security. These are so widely known that password crackers routinely apply them to their word lists when trying to guess your password.

A person with local knowledge can also try your spouse’s name, pets’ names, etc. Your account is vulnerable to this type of cracking unless you choose your password carefully.

4. Change your password the very first time you log in, and every few months thereafter. Security problems are often traceable to stale passwords and accounts. These are accounts that have become inactive for one reason or another or the password has not changed for a long time. In our particular environment we have had break-ins via such stale accounts. A password that remains the same for a long time provides an intruder the opportunity to run much more advanced and longer running programs to break such passwords.

5. Vary the system by which you choose a password. For example, don’t repeatedly use combinations like BLUEgreen and REDyellow. If an intruder discovers your pattern, he or she can guess future passwords.

The command to change your password is

passwd

This command will first prompt you for your old password (just to check that you really are you!) and then will ask you to type your new password (twice, so that an inadvertent typing mistake won’t leave you with a password that even you don’t know!).

Note: As of Jan 13, 2017, there appears to be a problem with the passwd command configuration. Until it can be cleared up, students should change their passwords via the CS Windows network.

7 Logging Out

To leave our Unix systems, type

exit

(not logout, as indicated in some books.)

1: POSIX is a standard for making an operating system “look like” Unix both to the people using it and to the software running on it.

2: A dumb terminal is one that displays lines of text but has no command codes for moving things to specific locations on the screen or doing other basic operations. Many Unix applications, such as text editors, will not work with dumb terminals.

Working in a Text-Based Interface

Steven Zeil

Last modified: May 11, 2017

1 Why Text-Based Commands?

We begin actually working with Unix by learning how to issue commands from the keyboard, using a text-based command window.

If your prior experience in launching programs on computers has been limited to clicking or double-clicking on icons with a mouse, the idea of entering commands via the keyboard may seem primitive and awkward. But this course is “Unix for Programmers”, and, for programmers, the simple click-on-an-icon approach just won’t cut it in general:

Programmers often need more flexibility, including the ability to select different program options easily.

Programmers often need to repeat a long sequence of commands over and over without clicking to invoke each step, separately, one at a time.

And, ultimately, someone (i.e., a programmer) has to be able to tell the operating system just what program or command should be launched when one of those icons gets double-clicked.

By the way, I have several times now referred to launching programs or launching commands. In practice, there is little difference between running programs and commands. A “command” is just a program that comes with the operating system. When you write your own programs, they become new “commands” that you can start using.

There are a few exceptions to this general rule, but not many.

2 The Road Ahead

In the upcoming sections of this module,

You will learn a handful of Linux commands.

More importantly, you will learn how to type commands and how to modify those commands with a variety of options.

You will learn how to supply input to commands and how to send the output where you want it.

You will learn a number of shortcuts to make working in a text-based interface easier.

You will learn two powerful notations for describing “patterns” for searching and manipulating files and text.

The Unix File System

Steven Zeil

Last modified: Aug 26, 2019

Contents:

1 Files
 1.1 File Names
 1.2 Text Files
 1.3 Binary files
 1.4 What’s Text and What’s Binary?
2 Directories
3 Paths
 3.1 How do you give someone directions?
 3.2 File Paths
 3.3 Paths Supply Directions
 3.4 Absolute and Relative Paths
 3.5 Abbreviating Paths
4 File Systems on Other Operating Systems
 4.1 MacOs File Systems
 4.2 Windows File Systems

1 Files

What’s in a file? Broadly speaking, we can divide files into two categories: text files and binary files.

Text files are files that consist entirely of human-readable (more or less) text, while binary files are files that encode data in a fashion intended only for interpretation by a machine.

1.1 File Names

Unix file names can be almost any length and may contain almost any characters. As a practical matter, however, you should avoid using punctuation characters other than the hyphen, the underscore, and the period. Also avoid blanks and non-printable characters within file names. All of these have special meanings when you are typing commands and so would be very hard to enter within a file name.

Some things to keep in mind about Unix file names that may be different from other file systems you have used:

Unix file names are often very long so that they describe their contents.2 The rather perverse exception to this rule is that program/command names are, by tradition, very short, often confusingly so.

Upper and lower case letters are distinct in Unix file names. “MyFile” and “myfile” are different names.

Periods (“.”) are not treated by Unix as a special character. “This.Is.a.legal.name” is perfectly acceptable as a Unix file name. Many programs, however, expect names of their data files to end in a period followed by a short “standard” extension indicating the type of data in that file. Thus data files with names like “arglebargle.txt” for text files or “nonsense.cpp” for C++ source code are common.
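A quick way to see these naming rules in action (the file names below are made up for illustration):

```shell
# Work in a scratch directory so we don't clutter anything.
cd "$(mktemp -d)"
# Case matters, and periods are ordinary characters:
touch MyFile myfile This.Is.a.legal.name
# Lists three distinct files.
ls -1
```

Because “MyFile” and “myfile” are different names, both files exist side by side, which would be impossible on a case-insensitive file system.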

By convention, files containing executable programs (such as clpr and psnup in the above examples) generally do not receive such an extension.

1.2 Text Files

A lot of what we store in files is just text. Text is represented in files, much like it is stored in memory, by placing one character in each successive byte of the file. Of course, bytes actually hold numbers (in the range 0..255), so we use a character set mapping to assign each character a numeric value.

Since the mid-1960’s, the dominant character set has been ASCII. ASCII encodes 128 different characters. Technically, you can say that it wastes one bit of every 8-bit byte. The characters encoded are

“Control characters” (numbers 0..31), which do not print as glyphs on the screen/paper, but describe some other property of location or transmission. For example, the control characters include a “line feed” character used to mark the end of a line, a “form feed” character that marks the end of a page, a “carriage return” character that indicates that we want to return to the leftmost column, and a “tab” character to move some number of spaces to the right.

“Printable characters ” (numbers 32..126), each of which is rendered as a symbol on a page or screen. These include

Blank (number 32)

Numeric digits 0..9 (numbers 48..57)

Upper-case alphabetic letters A..Z (numbers 65..90)

Lower-case alphabetic letters a..z (numbers 97..122)

Various punctuation marks fill in the remaining slots. For example, the exclamation mark (“!”) is number 33.

The “del” character at 127 is sort of the odd man out. It’s not a printable character, but it’s not positioned with the “real” control characters either.
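You can verify these code values for yourself: POSIX printf treats an argument that begins with a quote character as a character constant and converts it to its numeric code.

```shell
# Print the ASCII code of a character.
# The leading single-quote in the argument is standard POSIX printf syntax.
printf '%d\n' "'A"    # 65
printf '%d\n' "'a"    # 97
printf '%d\n' "'0"    # 48
```

Try it with other characters to confirm the ranges listed above.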

So a data file containing a single line of text with the word “Hello” would actually be encoded in a file as

72 101 108 108 111 10

i.e., 72 is the ASCII code for ‘H’, 101 is the code for ‘e’, 108 the code for ‘l’, 111 the code for ‘o’, and 10 is the line feed (end of line) control character.
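You can reproduce this encoding with the standard od program, which here dumps each byte of the text as a decimal number:

```shell
# Show the decimal byte values of a one-line stream containing "Hello".
# -An suppresses the offset column; -td1 prints each byte as a decimal number.
printf 'Hello\n' | od -An -td1
```

Aside from column padding, the output is exactly the sequence 72 101 108 108 111 10 described above.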

Here is a dump of the opening bytes of the text file from which this particular document was generated.

Compare the ASCII codes you see here to the opening paragraphs of this document.

1.2.1 What Can You Do With ASCII Text Files?

Certainly we will be able to do everything to a text file that we can do with generic binary files: copying, renaming, etc.

In addition, we will learn that Linux has quite a few commands for working with text, including commands for viewing, changing, editing, and measuring properties of text.

1.2.2 Unicode

Now, the 128 characters defined in ASCII are good enough for basic purposes, particularly if you speak (and write in) English.

But before long, pressure built to expand the available characters. Some of this pressure came from specialized applications. For example, there are numerous symbols in mathematics and the sciences that aren’t in the ASCII code. Even outside of technical fields, typesetters might desire specialized symbols like ➀ or ✓.

For a while, developers tried to stem the tide by defining character sets that used all 256 possible values of a byte, but even that was little more than a temporary respite.

Even more pressure came from different languages. Anyone whose native language was Spanish, for example, would regret the absence of characters like á or ¿. But that’s only the tip of the iceberg. Greeks and Russians have their own entire alphabets. And once we get past Europe, there are entire families of alphabets for Asian, Middle Eastern, and African languages.

Unicode was introduced in the 1980's to provide a 16-bit character set, later expanded in the mid 1990's to 21 bits, which provides for over a million possible characters. Unicode now incorporates not only the traditional ASCII characters (as Unicode values 0..127, so an ASCII text file is also a valid Unicode text file), but also lots of specialized symbols, many different international alphabets, and, for better or worse, emoticons and emoji.

Complicating matters, Unicode allows for a variety of different ways to arrange the numbers within a stream of bytes. These are called character encoding schemes, or "encodings" for short. For example, one encoding, UTF-32, stores a single (21-bit) Unicode character in a block of 4 bytes (32 bits). This is fairly simple, but if 99.9% of your characters are actually drawn from the original ASCII set, then this wastes nearly 75% of the file storage. Another, more popular, scheme, UTF-8, stores all characters from the original ASCII set in a single byte, but inserts special values outside of the ASCII 0..127 range to signal that the next character is a non-ASCII Unicode symbol that will need 2, 3, or 4 bytes.
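You can see UTF-8's variable-width behavior with wc -c, which counts bytes rather than characters. A small sketch (the octal escapes spell out the two-byte UTF-8 encoding of 'é'):

```shell
# ASCII characters occupy exactly one byte each in UTF-8.
printf 'A' | wc -c          # 1
# The accented letter 'é' is encoded in UTF-8 as the two bytes
# 0xC3 0xA9 (octal 303 251), so it counts as two bytes.
printf '\303\251' | wc -c   # 2
```

A plain ASCII file is therefore byte-for-byte identical whether you call it "ASCII" or "UTF-8".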

1.2.3 What Can You Do With Unicode Text Files?

Certainly we will be able to do everything to a text file that we can do with generic binary files: copying, renaming, etc.

Some of the Linux commands for working with ASCII text files will also work with Unicode. Others will not. Some will work with Unicode files encoded in UTF-8 but not in other encodings. Presumably, more and more of the text processing commands will support Unicode in the future.

1.3 Binary files

A binary file is a sequence of bytes that can contain almost anything. In practice, some software developer working on a program defined a file format for holding the data needed by that program. The file format was probably designed to be compact and easily processed by that program.

Here is a dump of the opening bytes of the file containing the first picture in section 1 of this document. You'll notice that the ASCII column on the far right contains lots of '.' characters, which are actually used by the hexdump program to indicate a byte that contains a non-ASCII character value or an ASCII value that does not have a visible representation (e.g., line terminators). Where ASCII characters are displayed, they appear to be almost random. That's because, for the most part, they pretty much are. Any binary data file is bound to contain some bytes that just happen to match an ASCII character code.
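You can produce a dump like this yourself with hexdump -C. As a sketch, here is a tiny hand-made "binary" file: the eight-byte signature that begins every PNG image file.

```shell
# Build an 8-byte binary file containing the PNG file signature,
# then dump it. Non-printable bytes (0x89, 0x0D, 0x0A, 0x1A) show
# up as '.' in the right-hand column; the printable bytes show as PNG.
printf '\211PNG\r\n\032\n' > sample.bin
hexdump -C sample.bin
# the right-hand column shows |.PNG....|
```

Those "magic number" bytes at the start of a file are one of the clues the file command uses to guess a file's type.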

1.3.1 What Can You Do With Binary Files?

As with generic binary files, we can do operations that don't rely on interpreting or understanding the contents of the file: e.g., copying the file, renaming it, or moving it to a different directory.

Typically, though, the contents of that binary file can only be processed by that one program or by other programs written later with the specific goal of processing that same file format. So you can use binary files only as input to a program specifically designed to handle their file format.

For example, every operating system defines a file format for executable programs. (For all practical purposes, we “run” an executable program by supplying it as input to the operating system.) For the most part, a program is just a block of machine code instructions encoded in binary. Load that block into memory, point the CPU at the address where it was loaded, and it runs. But to facilitate the process of loading that block into memory, an operating system may specify that each executable program file starts with a “header” of information indicating where it can be loaded, what other resources need to be available, etc. Because these headers are operating-system specific, trying to execute a program designed for one operating system will usually result in a quick error message if you try to run it on a different operating system, because that operating system will quickly realize that the header is not in the proper format. (This is by no means the only reason why executables for one operating system won’t run under another. It does, however, account for the fact that if you try this, you will usually get stopped before doing any real damage.)

As another example, in 1987 Steve Wilhite, a developer at Compuserve, defined a new format for holding images called the Graphics Interchange Format, or GIF for short. Originally, the only programs that could interpret the GIF format were conversion programs provided by Compuserve for converting between GIF and older existing graphics file formats. Eventually, web browsers added code designed to interpret and render GIF, and now almost every program that deals in graphics includes code designed to handle GIF.

You cannot hand a GIF file to the operating system to be executed like a program, nor can you ask a web browser or graphics viewer to render a program as if it were an image. Doing so will result in an error message at best, garbage output if you are not so lucky, a hung system if you are still less lucky, or a corrupted file system if you are really having a bad day.

1.4 What's Text and What's Binary?

We’ll talk more about this in a later lesson, but some examples might be useful now.

When you write a program by typing in source code, you are working with text. Program source code files (e.g., .cpp and .h files) are text, and almost always ASCII text.

On the other hand, when you run that source code through a compiler and get an executable program, that executable is binary.

When you type in a word processor, you are certainly working with text. But the wide range of formatting options, e.g., bold, italic, underlines, font selection, etc., permitted by a word processor are not, in and of themselves, part of the actual text. Something more elaborate than plain text is needed to handle all of that. And every word processor defines its own distinct format for storing that information.

So when you save the output of your favorite word processor (e.g., Word), that file is binary.

On the other hand, you may occasionally work with a simpler text editor that provides none of those fancy formatting options (e.g., NotePad in Windows). Such programs provide text files as output.

Web pages, like the one you are reading now, offer nearly as many formatting options as a typical word processor. So you might expect that they are working from a binary file. In fact, however, web pages are text files, but use special commands embedded into the text via the Hyper Text Markup language (HTML) to indicate what formatting is needed.

If you are unfamiliar with HTML, try right-clicking on this page and selecting the option to "View page source", or something similar, to see the HTML of this page. Hit Ctrl-F to open a search box and look for the word "Hyper", and you should be able to find and recognize this paragraph.

Directories in Linux (and folders in Windows) are actually files. But they are in a binary format that is understood by the various navigation and file manipulation commands in Linux. That binary format tracks information such as the file name and, most importantly, the location of the file on the disk.

2 Directories

Files in Unix are organized by collecting them into directories. (In Windows these are more commonly known as “folders”.) Directories are themselves files, and so may appear within other directories.

All directories are files. (Binary files, to be specific). But, obviously, not all files are directories!

The result is a tree-like hierarchy. At the root of this tree is a directory known simply as "/".[1] This directory lists various others: The bin directory contains many of the programs for performing common Unix commands. The usr directory contains many of the data files that are required by those and other commands. Of particular interest, however, is the home directory, which contains all of the files associated with individual users like you and me. Each individual user gets a directory within home bearing their own login name. My login name is zeil.

We can expand our view of the Unix files then as:

cd and ls are two common Unix commands, as will be explained later.

Within my own home directory, I have a directory also named “bin”, containing my own personal programs. Two of these are called “clpr” and “psnup”. So these files are arranged as:

3 Paths

3.1 How do you give someone directions?

We've all done this from time to time – asked someone for directions on how to get to someplace. Some people are very good at giving directions, others not so much. Some people are good at following directions, others not so much.

How do I get to the White House?

Look for the Washington Monument. It should be easy to spot. From the Washington Monument, head north along the path until you come to a fork. Turn right, walk about 500 ft. then make a sharp left and head north towards the intersection of 15th St. and Constitution Ave. NW.

From that intersection, continue north along 15th St. until you reach Pennsylvania Ave. Turn left and proceed west along Pennsylvania Ave. until you reach the gate of the White House.

This is an example of absolute directions. They rely on your starting from a well-known, easily reached landmark and proceeding from there.

If you asked how to get to my office on the ODU campus, I might give you absolute directions by assuming that you knew how to start from Webb Center, or, more likely, from the abandoned monorail track that passes through much of the campus.

How do I get to the White House?

“Well, where are you now?”

“I’m in Lafayette Square.”

OK, walk along the southeast path until you come to Pennsylvania Ave NW.

Turn right and proceed west along Pennsylvania Ave. until you reach the gate of the White House.

This is an example of relative directions. Relative directions can often (though not always) be shorter and simpler than absolute directions.

Absolute directions remain correct no matter where you start from.

You could be starting from Norfolk, VA., and those absolute directions to the White House are still correct. It’s just more of a chore to get to the starting point.

Relative directions become useless once your starting position changes.

If you start in Norfolk (or, for that matter, anywhere south of the White House) and start walking to the southeast, you will never reach Pennsylvania Ave (or, at least, not the Pennsylvania Ave where the White House is located).

3.2 File Paths

The full “name” of any file is given by listing the entire path from the root of the directory tree down to the file itself, with “/” characters separating each directory from what follows. For example, the full names (paths) of the four programs in the above diagram are

/bin/cd
/bin/ls
/home/zeil/bin/clpr
/home/zeil/bin/psnup

3.3 Paths Supply Directions

It’s important to recognize that a path is a step-by-step set of instructions on how to find a specific file. For example, /home/zeil/bin/psnup means:

1. Start at the root of the Unix file system, /
2. There you should see a directory named home. Look in that directory.
3. In that directory, you should see a directory named zeil. Look in that directory.
4. In that directory, you should see a directory named bin. Look in that directory.
5. In that directory, you should see a file named psnup. That's the file you want.
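You can mimic those step-by-step directions one cd at a time. Here is a sketch using a throwaway directory tree under /tmp, so that it works on any system (the real /home/zeil may not exist on yours):

```shell
# Build a throwaway copy of the directory structure, then walk it
# exactly as the step-by-step directions describe.
mkdir -p /tmp/pathdemo/home/zeil/bin
touch /tmp/pathdemo/home/zeil/bin/psnup

cd /tmp/pathdemo    # step 1: start at the (pretend) root
cd home             # step 2: look in home
cd zeil             # step 3: look in zeil
cd bin              # step 4: look in bin
ls psnup            # step 5: there's the file -- prints: psnup
```

In practice you would simply hand the whole path to a command and let the operating system do the walking for you.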

In the assignments for this course, I will often give instructions like

Copy the file /home/zeil/bin/psnup into the directory...

and then will get email from students saying something like

I can't find the file /home/zeil/bin/psnup. Where do I find it?

which is rather like asking "What's the address of the house at 221B Baker St., London, England?" or "How heavy is a 5-pound bag of flour?"

The answer is literally right there in the question!

Now, when I say that a path is a step-by-step set of directions, understand that we seldom have to follow those directions step by step. Almost any time and any place I need to name a file in Unix, I can simply give a path to it and let the operating system follow those step-by-step directions.

Example 1: Try This

Throughout this course you will encounter sections labeled "Try This:". That means that I really want you to log in, try the commands or procedures I describe, and observe and think about the results. If you don't understand the output you receive, please post a question in the course Forum on Blackboard or send me your question via email.

Log in to your Unix account. Upon logging in, your working directory should be your home directory. The command pwd will print the working directory. Give the command

pwd

You should see something like

/home/yourname

This is a path. What does this path tell you?

Answer + You could get to your current location by

1. Starting at the file system root /.
2. In that directory, find a directory named home and descend into it.
3. In that home directory, find a file named with your login name.

The command file can tell you what kind of file you have. Give the command:

file /home/yourName

substituting your own login name for yourName.

Does the response you get from file make sense?

The command cd will let you change your current working directory.

Give the following commands and observe the results:

cd /
pwd
file /
cd /usr
pwd
file /usr

3.4 Absolute and Relative Paths

File paths give step-by-step directions on how to reach a file.

Just like when we give directions in the real world, we can give paths that are relative or absolute.

Absolute paths start from a “landmark”, namely the file system root /.

Relative paths start from "wherever we are now", our current working directory. In the Try This exercise earlier, you saw how to change your current working directory with the cd command and how to find out what it is with the pwd command.

If a path starts with a ‘/’, it is absolute.

Later we will see that an absolute path can also start with ‘~’.

If a path starts with anything besides ‘/’ (or ‘~’), it is relative.
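That rule is simple enough to express as a tiny shell function. Here is a sketch (path_kind is a made-up name for illustration, not a standard command):

```shell
# Classify a path as absolute or relative by looking at how it starts.
path_kind() {
  case "$1" in
    /*|'~'*) echo absolute ;;   # starts with / or ~
    *)       echo relative ;;   # anything else
  esac
}

path_kind /usr/include    # absolute
path_kind playing         # relative
path_kind ../playing      # relative
```

Note that the shell itself makes exactly this decision every time you give it a path: anything not anchored at / (or ~) is resolved against your current working directory.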

Most Unix commands and programs will work with one or more files. We tell them what files we want them to use by giving paths to those files. We can give those paths as absolute paths or relative paths, whichever is more convenient for us to type, because the commands really won't care. They will simply follow the paths we give them, step by step.

Example 2: Try This

Log in to your Unix account.

Give the commands:

cd /usr
pwd

Is that cd command using a relative or an absolute path?

Answer + Absolute: it begins with a ‘/’, telling us that the first step in following this path is to start from the file system root, /.

The command ls is used to list the files contained in a directory. With no path in the command, it lists the contents of the current working directory. We can also give it one or more paths and, if those paths lead to directories, it will list the contents of those directories.

Give the commands:

ls
ls /usr

Why do these commands produce identical output?

Answer + The first command lists the contents of our current working directory. However, our current working directory is /usr, the same as the path given in the second command.

So both commands are actually being asked to list the same directory.

If we are following directions in the real world, we often walk a little way, then stop and look around to be sure that we are in the right place, then follow a little more of the directions, stop and look around, and so on.

We often use relative paths in Unix commands to accomplish much the same thing.

Try the command:

file /usr/include/net/ethernet.h

Good enough, but how did I even know that file was there? For that matter, how did I know that the directory /usr/include/net was there? And how did I know that I was typing it correctly with no misspellings or other goofs?

In many cases, I would approach it step by step, using relative paths to move one directory at a time.

Try the commands:

cd /
pwd
ls

How do I know which of these are directories and which are ordinary files? We could use the file command, but there’s a nice shortcut available in ls. The -F option will attach a punctuation character to the end of “unusual” files to tell us what they are:

a '/' on the end of directories
a '*' on the end of commands and programs that can be executed
a '@' on the end of "symbolic links" – we won't use these in this course, but they are a kind of shortcut tunnel from one directory to another.

Just remember if you use this option that the appended punctuation is not really part of the file name!

Continuing on, try the commands:

ls -F
cd usr
pwd

Why does usr work in the cd command above?

Answer + usr does not start with ‘/’, so it is a relative path. So, from our current working directory /, we descend one step into the directory named “usr”.

Continuing on, try the commands:

ls -F
cd include
pwd
ls -F
cd net
pwd
ls -F
file ethernet.h
file /usr/include/net/ethernet.h

Notice how each cd command adds another link to our current working directory.

If we know the absolute path to a file, we can get there immediately. Otherwise we can step our way, one step at a time, until we get where we want.

Now, there’s lots of intermediate possibilities in between those two extremes. For example, try these commands:

cd /usr/include
ls -F
ls -F net
file net/ethernet.h

3.5 Abbreviating Paths

There are some common abbreviations that can be used to shorten paths.

You can refer to the home directory of someone with login name name as ~name

Similar to our earlier example, we can deconstruct the path ~cs252/Assignments/Asst1/foobar.txt

1. Start at the home directory of the cs252 account, ~cs252, also known as /home/cs252.
2. There you should see a directory named Assignments. Look in that directory.
3. In that directory, you should see a directory named Asst1. Look in that directory.
4. In that directory, you should see a file named foobar.txt. That's the file you want.

You can refer to your own home directory simply as ~

For example, you could refer to the file containing my clpr program as either /home/zeil/bin/clpr or ~zeil/bin/clpr.

When I myself am logged in, I can refer to this program by either of those two names, or simply as ~/bin/clpr.
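You can watch the shell carry out these tilde expansions with echo, which prints its arguments after the shell has already expanded them. A small sketch (the results shown in the comments assume your login name is yourname):

```shell
# The shell replaces a leading ~ with the path of your own home directory.
echo ~            # e.g., /home/yourname
echo ~/bin/clpr   # e.g., /home/yourname/bin/clpr
# ~zeil would expand to zeil's home directory, /home/zeil,
# provided an account named zeil exists on the system.
```

This expansion happens before the command ever runs, which is why commands like cp and ls never need to know anything about tildes themselves.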

There is a big difference between ~jones/ and ~/jones/

~jones/ means "the home directory of the jones account" and is a shorthand for /home/jones. ~/jones/ means "look in your own home directory for a directory named jones" and is an abbreviation for /home/whateverYourLoginNameIs/jones. And, honestly, it's pretty unlikely that you even have a directory named jones.

At all times when entering Unix commands, you have a "working" directory. If the file you want is within that directory (or within other directories contained in the working directory), the name of the working directory may be omitted from the start of the file name.

When you first log in, your home directory is your working directory. For example, when I have just logged in, I could refer to my program simply as bin/clpr, dropping the leading /home/zeil/ because that would be my working directory at that time.

The working directory itself can be referred to as simply “.”.

The “parent” of the working directory (i.e., the directory containing the working directory) can be referred to as “..”.

Example 3: Try This

Try the following commands. See if you can predict what each pwd command will print.

cd ~
pwd
ls -F
cd /usr
cd include
pwd
cd ..
pwd
ls -F
cd /usr/include/..
pwd
cd /usr/include/.
pwd
cd /usr/../home
pwd
ls -F
cd ~
pwd
cd ~cs252
pwd
cd ~
pwd
cd ../cs252
pwd
cd ../../usr/include
pwd
cd ./.
pwd

Any surprises? Any results that you just could not explain?

Some miscellaneous notes:

Most people do most of their work in their own directories. That makes the ~ shortcut particularly useful.

Sometimes you will need to access files and directories belonging to someone else. That’s where the ~otherPerson shortcut comes into play.

.. can go a long way towards making relative paths shorter and easier to type than a full absolute path. But I find that many students tend to forget it. . is not nearly as useful. It tends to crop up in a few specialized cases. For example, every now and then, we want to tell a command to do something to our current working directory, and so we give a path consisting of ".", all by itself.

4 File Systems on Other Operating Systems

Much of what we have covered here is actually applicable to other operating systems as well. All operating systems have files, both text and binary. All operating systems have directories, though they may be called “folders” instead.

And proficient use of any operating system will eventually require you to work with paths in that operating system.

4.1 macOS File Systems

If you do not have access to an Apple OS X or macOS computer, skip to the next section.

macOS is a Unix operating system, so it should not be surprising that almost everything we have looked at carries over directly to Macs.

The general organization of the file system, starting from the file system root /, is the same. The pwd, cd, and ls commands all work the same.

In fact, the only difference is that in most Unix systems, your home directory would be /home/yourLoginName/, but in macOS your home directory is /Users/yourLoginName/

Example 4: Try This

Open a Terminal window on your Mac. In that window, try the following commands. Try to predict what each pwd and ls is going to show you.

cd /
pwd
ls -F
cd usr
pwd
ls -F
cd /Users
pwd
ls -F
cd ~
pwd
ls -F

Now, it’s true that you can usually use the Finder to examine your directories and files with less effort than working through the command line.

But this course is "…for Programmers", and there are many tasks that programmers, unlike more casual Mac users, will need to perform that involve paths and other command line concepts.

4.2 Windows File Systems

If you do not have access to a Windows PC, skip this section. Windows is the only commonly used operating system today that is not a Unix variant, so it will have lots of differences from Linux. Still, it has files, directories (folders), and paths.

The major differences to watch out for:

Windows separates steps in a path with the backslash \ instead of the forward slash '/' used in Unix.

Instead of a single file system root, Windows has separate file trees rooted at each lettered drive: C:\, D:\, E:\, etc.

Windows ignores upper/lower case differences in file names and paths. In Unix, hello.cpp and Hello.CPP are different files and can co-exist in the same directory. In Windows, these are considered to be alternate spellings for the same file name.

Many of the commands have different names. cd with a directory name or path works much the same in Windows as in Unix, but if you give the cd command with no path, it behaves like the Unix pwd. The Windows command for listing the contents of a directory is dir, not ls.

Your home directory in most Windows installations is C:\users\yourLoginName\, where yourLoginName is the account name you use to log in to Windows.
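To see the Unix side of that case-sensitivity difference for yourself, you can create both spellings side by side. A quick sketch, run on our Linux server (not on Windows, where the second touch would just update the first file):

```shell
# In Unix, names differing only in letter case are distinct files,
# so both of these can coexist in the same directory.
mkdir -p /tmp/casedemo
cd /tmp/casedemo
touch hello.cpp Hello.CPP
ls    # lists two separate files: Hello.CPP and hello.cpp
```

This is a common source of confusion when moving projects between Windows and Unix systems.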

Example 5: Try This

Open a CMD window on your Windows PC. (Click the Start/Windows button on the left end of the task bar and type cmd, then hit enter.)

In that window, try the following commands. Try to predict what each cd and dir is going to show you.

cd
dir
cd \
cd
dir
cd \Windows\Help
cd
dir
cd ..
cd
dir
cd \Users
cd
dir
cd yourWindowsLoginName
cd
dir

Now, it’s true that you can usually use the File Explorer to examine your directories and files with less effort than working through the command line.

But this course is “…for Programmers”, and there are many tasks that programmers, unlike more casual Windows users, will need to perform that involve paths and other command line concepts.

1: It may be more precise to say that this directory’s name is the empty string "".

2: As we will see, one almost never needs to type an entire file name in a Unix command, so long file names are no harder to work with than short ones.

Some Basic Unix Commands

Last modified: Aug 17, 2019

Contents:
1 Getting Started: Navigating Your Directories
1.1 Basic Navigation: mkdir, cd, ls, pwd
1.2 Copying Files: cp
1.3 Making Mistakes
1.4 Getting More Information from ls
2 So Many Unix Commands

1 Getting Started: Navigating Your Directories

We've already looked at some basic navigation commands. Now, let's put them together with commands for setting up our own directories.

1.1 Basic Navigation: mkdir, cd, ls, pwd

If you have not yet done so, log in now so that you can work through the following commands.

Example 1: Try This

Again, when you encounter sections labeled "Try This:", I really want you to log in and try the commands or procedures I describe. Starting with this one, some of the "Try This:" exercises will set up files and directories that will be used in later exercises, so skipping these early examples may lead to problems later on. Some of the assignments will also check to be sure that you have done the Try This exercises before attempting the assignment. For the same reason, you should not delete the directories and files set up during the Try This exercises. You are likely to need them again later.

Now, let's make a place to play in. mkdir will make a new directory. Enter the command

cd ~
mkdir playing

to create a directory named "playing".

How does the mkdir command know where to put the new directory?

Answer + It really doesn't. "playing" is actually the path to the directory that we want to create.

playing is a relative path (it doesn’t start with “/” or “~”), which means that it starts from our current working directory.

In this case, the working directory is what we just printed a moment ago with the pwd command, so "playing" is actually an abbreviation for /home/yourname/playing.

Give the command

ls -F

and you should see playing listed. In fact, it may be the only thing listed.

Give the command sequence

pwd
cd playing
pwd
cd ..
pwd
cd ./playing
pwd

to review the use of relative paths.

The mkdir command can also be used to create a whole “chain” of new directories at once with the -p option. Suppose that we wanted to create a directory named a inside a directory named b inside a directory named c, and that none of those directories exist yet.

Here are three ways to do that:

mkdir c
cd c
mkdir b
cd b
mkdir a
cd ../..

or

mkdir c
mkdir c/b
mkdir c/b/a

or, quickest of all, just

mkdir -p c/b/a

1.2 Copying Files: cp

The cp command copies one or more files. You can give this command as cp file1 file2 to make a copy of file file1, the copy being named file2. Alternatively, you can copy one or more files into a directory by giving the command as

cp file1 file2 ... fileN directory

Example 2: Try This:

Now try the following:

ls /usr/include

You should see a large number of files, including many ending in “.h”. Copy two of these files and check to see that the copy was successful, as follows:

cp /usr/include/math.h /usr/include/stdio.h ~/playing
ls ~/playing
cd ~/playing
cp /usr/include/limits.h .
ls
ls .
ls ~/playing

Both cp commands above give a directory as the destination for the copy, so the new files should appear in that directory. The second one uses the abbreviation "." to denote your current working directory.

The three ls statements in a row actually all list the same directory. With no parameter, the first ls lists your current directory. The final two ls commands simply give different paths to that same directory.

Now try:

more ~/playing/math.h

The more command is used to page through a file, one screen at a time. (It gets its name from the "more" prompt at the bottom of the screen, letting you know that there are more pages to be displayed.)

Hit the space bar to page forward through the file. You can also use ‘b’ to move backwards and q to quit when you are done. (You can also type ‘/’, followed by a string of characters, to search the file for that string.)

Now that we have some files in our playing directory, we can do a little more practice with paths:

Example 3: Try This:

Each of the commands listed below contains a path.

Which commands are absolute and which are relative?

Answer +

cd ~/playing               # Absolute
ls -l ~/playing/math.h     # Absolute
ls -l math.h               # Relative
ls -l ../playing/math.h    # Relative
ls -l playing/math.h       # Relative
cd ~                       # Absolute
ls -l ~/playing/math.h     # Absolute
ls -l math.h               # Relative
ls -l ../playing/math.h    # Relative
ls -l playing/math.h       # Relative

Try the commands:

cd ~/playing
ls -l ~/playing/math.h
ls -l math.h
ls -l ../playing/math.h
ls -l playing/math.h
cd ~
ls -l ~/playing/math.h
ls -l math.h
ls -l ../playing/math.h
ls -l playing/math.h

Notice how the absolute path “works”, no matter where you have cd’d to.

The three relative paths, however, give results that depend very much on your current directory. If you don’t see why, use ls to explore your current directory and make sure you understand why some of these relative paths work and others fail.

1.3 Making Mistakes

Before we go too much further in exploring new commands, let’s take a slight diversion. Sooner or later you are going to make a mistake when typing out a command – you may already have done so.

Don’t panic! Read the error message that you get carefully. It really will help.

Let’s make some deliberate mistakes so that you can get familiar with what some of the more common error messages look like.

Example 4: Try This: (bad commands)

Let’s get back into our playing directory:

cd ~/playing

Now issue each of the following commands and observe the error messages you get:

noSuchCommand
list

The first response is obvious enough. Of course, it’s clear that noSuchCommand is not, in fact, the name of a valid command. You’re more likely to see that response, however, because you misspelled a command name.

The second command might be an example of such a misspelling, typing list instead of ls. The response is different, however, because list is close enough to some commands that could have been installed on our Linux server, but aren't.

Most "commands" are, ultimately, files that contain executable programs. So what happens if we just start naming some files that aren't executable programs? Try:

~/playing
~/playing/math.h

The response to the first one is clear enough. ~/playing is a directory name, not a command. The command shell that processes our typed input has no idea what we want to do with that directory. Do we want to list its contents (ls)? Do we want to step into it (cd)? Do we want to…? You can’t simply give a directory (or file) name and expect the operating system to guess what you want to do.

The response to the second command is a bit more cryptic. Why is it talking about "permissions"? We'll see in a later lesson that we can set permissions on what both we ourselves and other people can do with our files. One of the things that we can give ourselves or other people permission to do is to execute that file. Now, it only makes sense to execute a file if that file actually holds a program, not simply data of some kind.

How does the operating system know if a file contains a program? One important way is that it looks to see if we have given anyone permission to execute it. In this case, math.h does not contain a program, so it was not set up with "execute" permission. So when we try to execute it, the operating system refuses, telling us that we lack the (execute) permission.

Example 5: Try This: (bad parameters)

Try:

cd ~/playing
ls
cp math.hh mathcopy.h
ls

In the third line, we are deliberately misspelling the name of the file “math.h”. The “No such file” response should be obvious enough.

This is a very common error message, one that is pretty much self-explanatory, and yet one that I get a lot of email about. Now it's very easy to misspell a file name. It's also easy to forget where a file is or is supposed to be. But the appropriate response to this message is not to throw your hands up in the air and give up, nor is it to email me the message and stop working until you hear back.

Instead, look around. Use the ls, pwd, and cd commands that we have covered to look and see what has actually occurred.

The cp command can lead to lots of interesting errors and error messages. Earlier, for example, you gave the command cp /usr/include/math.h /usr/include/stdio.h ~/playing to copy two files at once into a directory. Let’s try a variation on that:

cp /usr/include/math.h /usr/include/stdio.h ~/playingzzz

If we are going to copy multiple files at once, we have to give cp the name of a directory in which to put them. In this example, we have (badly) misspelled the name of that directory.

Interestingly, we don’t get an error message from this:

cp /usr/include/math.h ~/playingzzz

To see what has happened, do

ls ~
more ~/playingzzz

When we are only copying a single file, cp gives us the option of either giving a destination directory (which must already exist) or a destination file (which will be created if it does not exist).
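Both single-file cases can be sketched with hypothetical names under /tmp/cpdemo rather than the course files:

```shell
mkdir -p /tmp/cpdemo/dest
echo hello > /tmp/cpdemo/a.txt

# Destination is an existing directory: the copy keeps its own name inside it.
cp /tmp/cpdemo/a.txt /tmp/cpdemo/dest
ls /tmp/cpdemo/dest        # a.txt

# Destination names a file that does not yet exist: cp creates it.
cp /tmp/cpdemo/a.txt /tmp/cpdemo/b.txt
ls /tmp/cpdemo             # a.txt  b.txt  dest
```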

Finally, try:

cp /usr/include/err.h /usr/include /usr/include/limits.h ~/playing
ls ~/playing

You’ll see that the first and third files have, indeed, been copied. But the second file in the list was not, and the error message tells you why. The cp command, under normal circumstances, copies only ordinary files and not directories. In section 2.2, you’ll see how to change this behavior, but because copying entire directory structures is somewhat risky (you can waste a lot of storage with some simple mistakes), it’s turned off by default.
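You can see the skipping behavior with any directory argument. This sketch uses throwaway names under /tmp/skipdemo:

```shell
mkdir -p /tmp/skipdemo/subdir /tmp/skipdemo/out
echo one > /tmp/skipdemo/f1.txt

# The directory argument is skipped (with an "omitting directory" complaint);
# the ordinary file is still copied. cp exits nonzero because of the skip.
cp /tmp/skipdemo/f1.txt /tmp/skipdemo/subdir /tmp/skipdemo/out || true

ls /tmp/skipdemo/out       # f1.txt only; subdir was not copied
```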

1.4 Getting More Information from ls

We’ve already seen the use of the -F option with ls to get a hint at what kinds of files we are looking at.

The ls command can give you still more detailed information about your files if you add the -l option. (Note, that is a lower-case letter ‘L’, not a numeric digit one.)

Example 6: Try This:

let’s add a little something extra to our playing directory:

mkdir ~/playing/around
cp /bin/which ~/playing
ls ~/playing

You should see the files that you copied earlier, plus the new which and around entries. But which of these are directories? You might remember that mkdir creates directories, and so guess that around is a directory, but there’s no clue to that in this listing, and if you have a lot of files in a listing, you might not know or remember how they were all created.

Now try:

ls -l ~/playing

Now you have some additional information. This includes the size of the files and the date (and possibly the time) at which these files were last modified. The rather cryptic “rwx-” characters at the beginning are something that will be explained in a later lesson. However, if you look at the very first character on each line, it tells you something very useful. It will be ‘d’ if the file is a directory and ‘-’ if it is an ordinary, non-directory, file.
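For instance, in a scratch directory with made-up names (the -d flag used below, which makes ls describe a directory itself rather than its contents, hasn’t been covered yet; it’s used here only to isolate one line of output):

```shell
mkdir -p /tmp/lsdemo/somedir
echo data > /tmp/lsdemo/somefile

ls -l /tmp/lsdemo                        # a 'd' line for somedir, a '-' line for somefile
ls -ld /tmp/lsdemo/somedir | cut -c1     # prints: d
ls -l  /tmp/lsdemo/somefile | cut -c1    # prints: -
```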

There are easier ways to find out if a file is a directory or not. The -F option asks ls to put a ‘/’ character at the end of any directory names and a ’*’ character at the end of any files that are executable programs/commands. Try the commands

ls -F ~
ls -F ~/playing

You should be able to easily identify directories and ordinary files. Remember that the characters added to the listing by the -F option are not actually part of these file names. Don’t type these characters when you try to work with these files.

/bin is where most of the basic Unix commands are stored, so it’s not surprising that a file copied from there would turn out to be an executable command. In fact, try:

ls -F /bin

Notice that the ls, pwd, and mkdir commands that you have been practicing with are all present in the directory listing for /bin. (You’ll also see some files listed with a ‘@’ character. These denote symbolic links, a special type of directory entry that we won’t deal with in this course.)

Similar to the cp command, the mv command can be used to move and/or rename a file.

Example 7: Try This:

Wait, if necessary, until at least one minute has passed since you did the prior Try This. Give the commands

ls -l ~/playing
mv ~/playing/around ~/playing/games
ls -l ~/playing
cp ~/playing/which ~/playing/games/why
mv ~/playing/which ~/playing/games/what
ls -l ~/playing
ls -l ~/playing/games

Notice that we are changing both names and locations of files with the mv command.

On the other hand, look at the dates on which the files which, what, and why were last changed in the various listings. The fact that the which and what dates are the same is a good indication that mv has actually moved the existing files and not made a new copy of them. By contrast, why, which was produced as a copy, should have a more recent date.
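You can verify this timestamp behavior directly. This sketch assumes GNU touch and stat (as found on our Linux servers) and uses throwaway file names:

```shell
mkdir -p /tmp/mvdemo
touch -d '2020-01-01 00:00' /tmp/mvdemo/orig   # back-date a file

cp /tmp/mvdemo/orig /tmp/mvdemo/copy   # cp writes a brand-new file (today's date)
mv /tmp/mvdemo/orig /tmp/mvdemo/moved  # mv just renames; the old date survives

stat -c '%y  %n' /tmp/mvdemo/moved /tmp/mvdemo/copy
```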

2 So Many Unix Commands

As we progress through this course, we will encounter quite a few commands. It’s not unusual to forget the name of a command that you have not used in a while.

You can find a glossary of the commands covered in this course in the Commands Glossary.

Getting Help

Last modified: Jun 29, 2015

Contents:

As you explore Unix, you are bound to have questions. Some ways to get answers include:

The entire Unix manual is on-line.

man command

displays the manual page for the given command.

man -k keyword

looks up the given keyword in an index and lists the commands that may be relevant.

The CS Department systems staff has collected a variety of additional help documents. You can find them by going to the Dept home page (http://www.cs.odu.edu/) and selecting “Frequently Asked Questions” under the “Systems Group” heading.

A staff member is generally on duty or on call in the public CS lab in Hughes Hall (on the Norfolk campus) whenever that room is open.

If none of the above help, then send e-mail to "[email protected]". This is also how you report bugs, machine failures, etc.

Typing Unix Commands

Last modified: Aug 17, 2019

Contents:
1 Command Arguments
2 Special Characters
2.1 Saving Yourself Some Typing
2.2 Special Characters that Say “I’m Done”
2.3 Special Characters for Editing Your Command

To run a Unix command (or any program, for that matter), you normally must type the name of the command/program file followed by any arguments. There is actually a program running that accepts your keystrokes and launches the appropriate program. The program that reads and interprets your keystrokes is called the shell. There are many shells available, all of which offer different features. The default shell for ODU CS is called tcsh, and we’ll concentrate on that.

The command/program name is usually not given as a full file path. Instead, certain directories, such as /bin, are automatically searched for a program of the appropriate name. This set of directories is referred to as your execution path. New accounts are set up so that the directories holding the most commonly used Unix commands and programs are already in the execution path. You can see your path by giving the command

echo $PATH

and you can modify your $PATH, if desired, to add additional directories.

Thus, one can invoke the ls command as

/bin/ls

but it’s usually simpler to say

ls

1 Command Arguments

Of course, most Unix commands consist not only of the program name but also require one or more command arguments. The command arguments indicate the details of the desired operation, including files to work with, text to use, etc.

Command arguments tend to come in four varieties:

1. Files: If a command needs to operate on one or more files, then we specify those files by giving a path to the file.

As we have already discussed, file paths may be absolute or relative.

2. Directories: these are also specified using absolute or relative paths. In fact, a directory in Unix is also a file – it’s just a binary file whose special contents include a list of other files and their locations on a disk drive. This means that most Unix commands that expect a file can be given a directory as well. But in many cases that won’t do what we want.

Example 1: Try This: Directories versus files in commands

Each of the Try This exercises in this lesson gives you a set of commands that you should enter in a Linux session, carefully observing the results. If you don’t understand the results that you get, you should ask (in the Forum or via email) before moving on to the assignments in this section.

Enter the following commands:

cd ~
more playing/math.h
more playing

The more command lists the contents of a file, a page at a time. Use the space bar to move forward a page, ‘b’ to move back a page, and ‘q’ to quit.

As you can see, trying to list the contents of a directory file is not useful.

There are a few commands that work only on directories. mkdir (create an empty directory) and rmdir (remove a directory, if empty) are the most common.

In other cases, commands may work with either files or directories, but have slightly different behaviors in the two cases. The cp command is a good example of this.

Example 2: Try This: Directories versus files in commands (2)

cd ~/playing
cp stdio.h g1
ls
cp math.h g1
ls
rm g1
mkdir g1
cp math.h g1
ls
ls g1

The three cp commands are almost identical.

In the first case, we copied a file to a path, but the destination path named a file (g1) that did not actually exist. So cp created a copy with that name.

The second cp copies to a path (g1) that now indicates an existing file. cp removes that existing file and then creates a new copy with that name.

The third cp, however, names an existing file that happens to be a directory. In this case, cp does not replace g1 but writes a copy (named math.h) inside the directory g1.

3. Plain text: Sometimes commands take ordinary text as parameters. A simple example of that is the command echo, which simply prints whatever parameters it is given:

Example 3: Try This: Text arguments

Try the following commands:

echo Hello world!
who
who am i
echo who am i
echo I am $USER

The who command is normally used to see who is currently logged in to the same machine as you. But the “am i” parameters change it to simply list your own session statistics.

The $USER is an example of an environment variable, a special placeholder that contains information about your current session on the machine. Environment variables are easily recognized by the opening $ in their name. We’ll learn more about these in a later lesson.
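A few more echo experiments along the same lines; $HOME and $SHELL are other standard environment variables that should exist in any session:

```shell
echo I am $USER                    # the shell substitutes your login name
echo My home directory is $HOME   # substituted before echo ever runs
echo My login shell is $SHELL
```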

A more interesting example is the grep command, which searches files for lines containing desired text. For example,

Example 4: Try This: Text arguments (searching)

more /usr/include/math.h
grep def /usr/include/math.h

The grep command lists all lines inside /usr/include/math.h containing the string “def”. We’ll shortly see ways to write more complex search patterns.

4. Flags: these are special arguments used to alter or control the behavior of a command. Flags for Unix commands are usually written beginning with a “-” or occasionally “--”.

Example 5: Try This: Flag arguments in commands

ls ~
ls -a ~
ls /usr/include
ls -l /usr/include

Take careful note of how the behavior of the ls command is altered by the “-a” and “-l” flags.

The “l” in “-l” stands for “long”, and provides a longer listing with more information about each file.

The “a” in “-a” stands for “all”. File names in Unix that begin with a “.” are hidden from normal listings. The “-a” option includes those normally hidden files in the ls output.
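Here is the effect in a scratch directory (made-up file names):

```shell
mkdir -p /tmp/hiddendemo
echo visible > /tmp/hiddendemo/shown.txt
echo secret  > /tmp/hiddendemo/.hidden

ls /tmp/hiddendemo      # lists only shown.txt
ls -a /tmp/hiddendemo   # also lists .hidden, plus the special . and .. entries
```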

How do you know what kind of arguments can be accepted by each command? You pretty much have to deal with this on an individual basis, though for any specific command you can consult the on-line manual for that command via man, e.g.

man ls
man grep

Many common Linux commands have a truly bewildering variety of possible flags. Usually, however, there is a small handful of commonly useful ones. I will try to point those out as we go along and will include them in the Try This exercises.

In the assignments for this course, I will always make sure that any command parameters and flags that you might need will have been covered and included in a Try This. Consequently, it’s generally a bad idea to go hunting up more obscure flags and options on the Internet or in the man pages for use in the assignments. It’s an unnecessary waste of your time, and may often distort the command outputs past the ability of my grading scripts to recognize what you are actually doing. Short version: If it’s not in the Lecture Notes and not something you practiced with in a Try This exercise, don’t use it.

2 Special Characters

Usually, the meaning of a character that you type is self-evident. If you type an “a”, it means a and you see the “a” appear on your screen. If you hold the shift key down and type “a”, it means A and you see that upper-case A appear on your screen.

But some characters that you type have a special meaning. The most obvious example would be the Enter or Return key, which sends a character that is interpreted to mean “I’m ending a line now”. Another example would be the $, which, as we have seen in an earlier Try This, can introduce an environment variable.

This section describes some of the special characters that you may need to use when entering Linux commands. Most of these special characters are entered by holding down the “Control” or “Ctrl” key while typing a letter. By convention, we designate this by placing the symbol “^” in front of the name of the second key. For example, if you have typed zero or more letters of a filename and want to see a list of what filenames begin with what you have typed, you could type ^D, i.e., hold down the “Control” key and type “d”.

2.1 Saving Yourself Some Typing

Many people get turned off by Linux and, more generally, by working at the command line because they think that typing long file paths is tedious and error-prone. While there’s a certain amount of truth in that, experienced programmers generally don’t spend nearly as much effort typing out long commands as you might think. Instead, they take advantage of shortcuts offered by the command shell.

2.1.1 Tab Completion

If you have entered the first few letters of a file name and hit the “Tab” key, the shell will examine what you have typed so far and attempt to complete the filename by filling in the remaining characters. If the shell is unable to complete the filename, a bell or beep sound will be given. Even in this case, the shell will fill in as many characters as it can.

Example 6: Try This: Tab Completion

Log in to a Linux server and enter

ls ~/pla

but do not hit the Enter key to end the line. Instead, type the Tab key, wait to see what happens, then hit Enter.

Now enter the following, using the Tab key where I write ⟨Tab⟩ and the Enter/Return key where I write ⟨Enter⟩. Pause a moment after each to observe what happens.

ls ~/pla⟨Tab⟩g⟨Tab⟩⟨Enter⟩

Not so bad, now, is it?

This all works nicely if you have entered enough characters to uniquely identify your file or directory name. What if you make a mistake? Try

ls ~/pla⟨Tab⟩x⟨Tab⟩

Because you do not have a file or directory name in ~/playing that starts with an ‘x’, this fails and no new characters are added by the shell. Depending on what ssh client you are running and what your settings are, you might hear a bell sound that signals that Tab completion was attempted but could not fill in the remainder.

What if there’s more than one file that matches what you have typed so far?

Hit Enter to finish the previous command and try this:

ls ~/pla⟨Tab⟩g⟨Tab⟩w⟨Tab⟩

The Tab after the ‘w’ fills in an ‘h’, giving you “wh”. That’s actually not a complete file name.

Hit Tab again. Nothing happens. That’s because there are actually two files in ~/playing/games/ whose names begin with “wh”.

Hit Tab a third time. This time the command shell decides that you need some help and lists the files in ~/playing/games/ whose names begin with “wh”. Pick one of those by typing the third letter in its name and hitting Tab again, and the entire file name gets filled in.

2.1.2 Repeating Prior Commands

Each time you enter a command, the command shell remembers the last several commands that you entered in a “history”.

You can review those commands, moving backwards in time by typing ^P (hold the Ctrl key and type ‘p’) or, in most cases, the up arrow on your keyboard. Once you have moved a few steps back in time, you can move forward to more recent commands by typing ^N or the down arrow.

You can remember the ‘P’ and ‘N’ in these as “Previous” and “Next”.

Example 7: Try This: Command History

If you are still logged in to a Linux server, log out (exit) and then log back in again.

Type a few ^P characters. The commands that you see should look familiar.

It’s interesting to see that the command history has been retained across separate login sessions.

Try your up arrow key and see if it does the same.

Try a few ^N or down arrow keys to reverse the progress.

Stop at any of your recent “ls” commands (from the previous Try This) and hit Enter. The old command from your history will be repeated.

2.1.3 Copy and Paste

You can copy and paste to and from an ssh session much like you would copy and paste in a word processor.

However, Unix was doing this long before it was a common word-processor operation, before the term “copy and paste” was even coined to describe it. Consequently, the now-almost-universal key sequences ^C and ^V are not used for copying and pasting in Linux. ^C, in particular, already had a long history of use as a “Cancel” command.

The closest early Linux equivalent terms, coined for the emacs editor that we will study in later lessons, were actually “copy” and “yank”. The operation of deleting text while saving it to be pasted later, now called “cut”, was earlier called “kill”, and what we now refer to as the “clipboard” was the “kill ring”.

Unlike the completion and history commands we have been discussing, the copy and paste functionality is provided by the ssh client or other terminal window that you are running. So exactly how you do copying and pasting depends on what you are running as your ssh client on your local machine:

Windows, using PuTTY

To copy text out of a PuTTY session, just use your left mouse button to drag the mouse across the text you want to copy. Then you can paste that text into an email, word processor document, or other ssh session by using their normal paste commands.

To paste text into a PuTTY session, click the middle mouse button, if you have one, or the scroll wheel if you do not.

Linux command windows, using ssh

Same as PuTTY, click-and-drag the left mouse button across text to copy. Use the middle mouse button to paste.

OS/X command window, using ssh

Select the desired text by clicking and dragging the mouse across the desired text. Then copy it with Command-C or by selecting “Copy” from the Edit menu. Paste with Command-V or by selecting “Paste” from the Edit menu.

Example 8: Try This: Copy and Paste

1. In a Linux session, give the following command by copying it from this web page and pasting it into your ssh session:

ls -ld /usr/include/net

Hit Enter to run the command.

2. Then copy, from your ssh session, the portion of that command up to the third ‘/’, and paste that back into your ssh session, then hit Enter to run that abbreviated command.

3. Now, open your favorite email program on your local machine. Start a new message to yourself. In the formatting options for the message body, look for a paragraph style “pre-formatted” or “Fixed width” and select that.

Copy the entire visible text of your ssh session.

Paste it into your email message and send it.

2.2 Special Characters that Say “I’m Done”

Of course, by now you are used to using the Enter key to indicate that you are done entering a line.

^D is used when a command/program is reading many lines of input from the keyboard and you want to signal the end of the input, i.e., that you are done entering lines of text to that command.

^C is used to “Cancel” or abort a program/command that is running too long or working incorrectly. Beware: aborting a program that is updating a file may leave garbage in that file.

You may think that Unix is being stubborn in not using ^C to mean “copy” as is done in Windows, but the “Cancel” meaning of ^C actually dates back to the definition of the ASCII character set and was well-established long before Unix existed.

In many programs, ^Z pauses the program and returns you temporarily to the shell. To return to the paused program, give the command:

fg

2.3 Special Characters for Editing Your Command

We all make mistakes when typing. To correct them:

^H, the “Backspace” key, and the “Delete” key all delete the most recently typed character.

^B moves the cursor Backwards over what you have just typed, without deleting those characters. This is useful in correcting typing mistakes. The “left” arrow on your keyboard may also do the same thing.

^F moves the cursor Forwards over what you have just typed, without deleting those characters. The “right” arrow on your keyboard may also do the same thing.

These can, by the way, often be combined with the History special characters discussed earlier to produce new commands that are similar to, but just slightly different from, earlier commands. For example, if I had a command that I wanted applied to file1.dat, file2.dat, … file5.dat, one way to handle it would be to type out the command for file1.dat and run it, then use ^P to retrieve that command from history, ^B to move back to the ‘1’, Backspace or Delete to remove the ‘1’, then type ‘2’ in its place and hit Enter. I could repeat this for each of the files I wanted. (For handling very long lists of files, we’ll see better strategies when we cover “scripting” in later lessons.)

Patterns for File Names: Wildcards

Last modified: Aug 2, 2018

Contents: 1 Wild Cards

In the command examples we have used so far, we have always written a single file path or a single text string. In many cases, however, we want to supply commands with a whole list of files or text strings. Typing out the whole list, one at a time, would be tedious, so we usually write some kind of pattern that describes multiple items instead.

Don’t forget, though, that you can use the Tab character to speed up typing of individual file names.

1 Wild Cards

Whenever we have a command that can take multiple filenames, we can often write a single pattern for several files. Patterns for file names use wildcard characters, the most common of which is "*", which tells the shell (the program that reads your keyboard input, determines what command or program you want to run, then launches that program) to substitute any combination of zero or more characters that results in an existing file name.
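The key point is that the expansion happens in the shell, before the command ever runs. A quick sketch with throwaway files:

```shell
mkdir -p /tmp/wilddemo
touch /tmp/wilddemo/alpha.h /tmp/wilddemo/beta.h /tmp/wilddemo/notes.txt

# The shell replaces the pattern with every matching name before ls runs,
# so ls actually receives two separate arguments here:
ls /tmp/wilddemo/*.h    # alpha.h and beta.h; notes.txt does not match
```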

Example 1: Try This: Wild cards in common commands

ls ~/playing
rm ~/playing/*
ls ~/playing

What files were matched by the wildcard pattern in the rm command?

ls /usr/include

Notice that there are a number of files ending with .h

cp /usr/include/m*.h /usr/include/s*.h ~/playing
ls ~/playing
ls ~/playing/se*
ls ~/playing/sq*.h
ls ~/playing/se* ~/playing/sq*.h

Again, note the use of the wildcard to form a pattern for multiple file names. In cases like this, where there are multiple possible matches, the shell forms a list of all the matches. So

* The earlier rm command actually saw a list of all the files in the ~/playing directory.
* The cp command saw all the files in the /usr/include directory whose names began with “m” or “s” and ended with “.h”.
* The various ls commands saw restricted sets of files based upon the non-special characters intermixed with the wildcards.

One good way to figure out what files will match a wildcard pattern is to use the echo command. echo simply prints out its arguments. But since the arguments in the command line are processed by the shell before invoking the echo program, any wildcard patterns will have already been expanded.

Example 2: Try This: Showing the effects of a wildcard pattern

ls /usr/include
echo /usr/include
echo /usr/include/*.*
echo /usr/include/*

The difference between the last two may be subtle. The “*.*” pattern will match only files that contain a “.”. Unlike Windows, Unix does not require file names to end with a period and a three-letter extension. Some sort of period and extension is common, but directory names and executable programs often have no extension and no period. (In Windows, you can create a file with an empty extension, but Windows insists on adding a period at the end.)

echo /usr/include/f*.*
echo /usr/include/*f*.*
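The distinction can be reproduced in a scratch directory, where one hypothetical name has no period at all:

```shell
mkdir -p /tmp/extdemo
touch /tmp/extdemo/prog /tmp/extdemo/data.txt   # 'prog' has no extension

echo /tmp/extdemo/*.*   # matches only data.txt
echo /tmp/extdemo/*     # matches both names
```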

Many of the commands that we have already looked at will allow you to specify multiple files to operate on at one time. The easiest way to give multiple files is to use wildcards.

grep is a program for searching files to find lines that match a certain pattern. We’ll look at how to write those patterns in a later lesson, but in the meantime we can make good use of grep to search for lines containing a specific text string. grep commands look like:

grep flags pattern one-or-more-file-paths

The flags are optional. The ones we will use in this course are

-i	When comparing the pattern to the lines from the files, ignore differences in upper/lower case characters.
-v	Instead of listing the lines of text that contain the pattern, list the ones that do not contain the pattern.
-l	Don’t list the lines that match the pattern, just list the names of the files containing at least one such match.

The pattern, for now, will just be any string made up of letters and numbers.

The list of file paths indicates which files to examine. Wildcards will come in very handy here.
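Putting the pieces together, here is a self-contained sketch with made-up files under /tmp/grepdemo (rather than the course’s ~cs252 files):

```shell
mkdir -p /tmp/grepdemo
printf 'A Title line\nnothing here\na title in lower case\n' > /tmp/grepdemo/one.txt
printf 'no matches in this file\n' > /tmp/grepdemo/two.txt

grep    title /tmp/grepdemo/*.txt   # only the lower-case "title" line
grep -i title /tmp/grepdemo/*.txt   # case-insensitive: both "Title" and "title"
grep -v title /tmp/grepdemo/one.txt # the lines NOT containing "title"
grep -l title /tmp/grepdemo/*.txt   # just the name(s) of matching files
```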

Example 3: Try This: Operating on multiple files at once

Do the following. Make sure that you understand what you are seeing in each case.

ls ~cs252/Assignments/emacsAsst/*.txt
grep Title ~cs252/Assignments/emacsAsst/*.txt
grep Author ~cs252/Assignments/emacsAsst/*.txt
grep title ~cs252/Assignments/emacsAsst/*.txt
grep -i title ~cs252/Assignments/emacsAsst/*.txt
grep -v title ~cs252/Assignments/emacsAsst/*.txt
grep -l constitution ~cs252/Assignments/emacsAsst/*.txt
grep constitution ~cs252/Assignments/emacsAsst/8*.txt

Each time you use a wildcard pattern, the command shell expands that to a list of files, separated by spaces:

echo ~cs252/Assignments/emacsAsst/*.txt
echo ~cs252/Assignments/emacsAsst/8*.txt

What do you get if you have a list of files separated by spaces, followed by another list of files separated by spaces? Answer: a longer list of files, separated by spaces.

echo ~cs252/Assignments/emacsAsst/*.txt /usr/include/*.h
grep -l large ~cs252/Assignments/emacsAsst/*.txt
grep -l large /usr/include/*.h
grep -l large /usr/include/*.h ~cs252/Assignments/emacsAsst/*.txt

Quoting Special Characters

Last modified: Apr 20, 2019

Contents:
1 Special Characters
2 Quoting – Three Ways
2.1 Examples: Unconventional File Names

1 Special Characters

In the grep examples of the previous section, I took care to enclose each regular expression inside ‘quotes’. That’s not some special way to indicate regular expressions, nor is it specific to text strings as arguments. Instead, it’s a way to be sure that the command shell did not treat the enclosed characters as special shell characters but would pass them, unchanged, to the grep command. Among the characters that the shell would tend to treat as “special” are blanks and / * < > | ? \ ; , ! $ ' "` We’ve already seen special uses for the first 3 of these, and will encounter some of the others in later lessons.

Because these characters tend to cause problems when we type them, we generally avoid unnecessary uses of them. For example, you’ll seldom see any of them as part of a file name. That’s not because it’s illegal to use them in file names. Unix is amazingly tolerant of what characters get put into file names, but only masochists would use these special characters because the command shell will interpret them as something else, making it difficult to type the file name. 2 Quoting – Three Ways

What do we do if we need to type one of these special characters into a command but not have it treated specially? For example, suppose that we had a file foo.txt and we wished to list all the lines in that file that contained a “<”. We can’t do

grep < foo.txt

because the < is a special (input redirection) character to the shell.

What we need to do is to quote that special character somehow to prevent the command shell from treating it specially. There are three ways we can do this:

1. We can place a backslash (\) in front of the special character:

grep \< foo.txt

Note that backslashes can quote themselves. So if we wanted to print on our terminal screen a backslash followed by an asterisk, we could write

echo \\\*

The first backslash quotes the second one. The third one quotes the asterisk.

2. We can enclose all or part of the argument in single quotation marks. This suppresses all special characters. Also, if the enclosed portion includes blanks, it combines what would otherwise have been seen as multiple parameters into a single parameter.

grep Hello there foo.txt

This would look in the files named “there” and “foo.txt” for any lines containing the word “Hello”.

grep 'Hello there' foo.txt

This would look in the file named “foo.txt” for any lines containing the phrase “Hello there”.

grep 'Hello there!' foo.txt

This would look in the file named “foo.txt” for any lines containing the phrase “Hello there!”. Note that, without the quotes, the “!” would have been treated as a special character.

3. Finally, we can enclose all or part of the argument in double quotation marks. This suppresses all special characters except $, and also gathers its contents into a single parameter.

Example 1: Try This: Single versus Double Quotes

echo $USER *
echo '$USER *'
echo "$USER *"

Can you see the differences in how the two special characters are treated in each case?
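The three quoting mechanisms can also be compared side by side in one short sketch (the unquoted form from the Try This above is omitted here, since its * expansion depends on what happens to be in your current directory):

```shell
echo \\\*        # backslashes quote one character each: prints \*
echo '$USER *'   # single quotes suppress everything:    prints $USER *
echo "$USER *"   # double quotes still expand $:         prints your login name, then *
```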

2.1 Examples: Unconventional File Names

One of the common uses of quoting is in dealing with file names containing spaces (or other unusual characters). Unix users tend to avoid file names with spaces, but they aren’t actually illegal. And if you are hopping back and forth between Unix and Windows systems, you will find that Windows users love to put blank spaces inside file names. (Unless they are Windows programmers who use the Windows cmd program to do text-based commands. They hate spaces inside file names, too.)

Example 2: Try This: Unusual Characters in File Names

Give the commands:

cd ~/playing
ls
cp math.h "ax bx cx.h"
ls

You now have a new file. Notice anything odd about it? Let’s look inside it.

Give the command

more cx.h

That doesn’t work, even though it would be easy to guess from the preceding ls listing that it would have worked.

Give the commands

more ax bx cx.h
more "ax bx cx".h
more ax\ bx\ cx.h

Which ones work, and why?

Now try this. Type more ax and then hit the Tab key to request automatic file name completion.

Notice how the automatic feature fills in appropriate quoting for you.
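The same experiment can be done self-contained, with a hypothetical file under /tmp/spacedemo (using cat rather than more so the output isn’t paged):

```shell
mkdir -p /tmp/spacedemo
echo contents > '/tmp/spacedemo/ax bx.h'

# Unquoted, the shell splits the name into separate arguments -- this fails:
cat /tmp/spacedemo/ax bx.h 2>/dev/null || echo "unquoted form failed"

# Quoting or backslash-escaping keeps the name in one piece:
cat '/tmp/spacedemo/ax bx.h'
cat /tmp/spacedemo/ax\ bx.h
```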

There are worse things than blanks that can legally occur in file names. In fact, just about any special character could be put into a file name if we were masochistic enough.

Example 3: Try This: Even More Unusual Characters in File Names

Give the commands:

cp "math.h" "ma*+;.h"
ls

Now suppose you want to access the new file. Try this:

echo ma*
echo ma**

Why did you get both files?

To get just the new one, again the answer is proper quoting. Try:

echo ma\**

Do you understand why the two asterisks are treated differently?

Why would

echo ma*\*

not do the same thing?

Editing in Text Mode

Steven Zeil

Last modified: Aug 17, 2019

Contents:
1 nano
2 emacs
2.1 Emacs Modes
2.2 The Mark and the Region
2.3 Where’s the Documentation?
3 vim
3.1 vim and C++

An editor is a program that allows you to easily create and alter text files.

Because we are still working via text-mode connections, our editors will need to run within our SSH terminal windows. That means they will not provide menus, buttons, mouse interactions, and many of the conveniences that we get when connected in graphics mode. Instead we will need to give all of our instructions via the keyboard. Usually that means using special key sequences using the Ctrl or Alt keys to accomplish things like loading and saving files, copying and pasting text, etc.

Novice Linux users will find nano to be a good starting point. Later, many will appreciate the special features offered by vim or emacs. We’ll discuss each of these here. The eventual choice, however, is up to you.

1 nano

nano is a very basic text editor. Its chief virtue is its ease of use: it pretty much tells you all the “special” commands available at any time.

You’ll find nano on almost all Linux machines. A few non-Linux Unix machines may have pico instead. pico was the original simple editor. nano was developed as an open-source look-alike.

Invoke nano as nano path-to-existing-file to edit an existing text file, or nano path-to-new-file to create and start editing a new, empty text file.

Example 1: Try This: Edit with nano

Start by copying a file into your directory to work with.

cd ~/playing
cp ~cs252/Assignments/textFiles/snark.txt .

Then edit that file:

nano snark.txt

You should see something like the picture shown here.

You can see the current cursor position in the upper left.

Try using your arrow keys and Page Up/Page Down keys to move around.

In the bottom two lines, you should see a list of special commands that you can issue. The convention for these is that

The caret (^) indicates a control character. Type this by holding down your Ctrl key and typing the character. Even though control characters are traditionally shown as upper-case characters (e.g., ^G), do not hold the shift key down when typing them.

M- indicates a key sequence that can be typed by typing Esc followed by the next character (again, unshifted). You might also be able to type these key sequences by holding down your Alt key while typing the next character.

1. For now, type ^G and read the help text. Return to your file via the “Exit” command (which will be listed at the bottom of the screen).

2. Use the “Where is” command to hunt for the word “muffins”.

3. Use the “Go to Line” command (note that this will require you to use the Shift key to get _ instead of -) to return to the first line of the file.

4. Use the “Replace” command to replace all occurrences of “snark” by “snipe”.

5. The “Undo” command in nano is M-U. Use it (repeatedly) to undo all of your replacements.

At some point this semester, we will be updating our Linux servers. Until then, this command does not work.

If you try this and get a message indicating that M-U is an “Unknown Command”, simply exit nano without saving your changes and re-enter using the same nano snark.txt command that you used earlier.

6. Use the “Go to Line” command again to return to the first line of the file.

7. Use the “Replace” command to replace all occurrences of “snark” by “snipe”, but this time, before giving your text to replace, use the “Case Sens” command to turn on case sensitivity. Notice the difference in behavior.

8. Use the “Write Out” command to save your changes as a new file snark2.txt.

9. Use the “Exit” command to leave nano.

10. To see that you actually made a change, use a grep command to look for the word “snipe”:

grep snipe snark2.txt

11. The diff command can be used to compare two text files and list the differences between them. Give the command

diff snark.txt snark2.txt

You should see that it lists a single line as having been changed, showing both the old and new form of the line.
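To see the shape of diff’s output on a tiny example, you can make two hypothetical throwaway files (not the snark files) that differ in one line:

```shell
printf 'one\ntwo\nthree\n' > old.txt
printf 'one\nTWO\nthree\n' > new.txt

# Expected output:
#   2c2       line 2 of the first file changed into line 2 of the second
#   < two     the old form of the line
#   ---
#   > TWO     the new form of the line
# diff exits with a non-zero status when the files differ, hence the || true
diff old.txt new.txt || true
```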

2 emacs

nano will suffice for basic editing, but I recommend that programmers also learn either emacs or vim because:

They work nicely in both text and graphics mode. In graphics mode they offer menus, respond to mouse clicks, and do all the things you would expect in a window-based GUI.

They offer a number of aids for programmers, including syntax-based highlighting of different programming languages (using different colors/fonts for reserved words, comments, strings, etc.), commands for compiling programs and collecting the error messages, and for stepping through those error messages, and, with emacs, for debugging the compiled code.

That said, there is an almost religious fervor separating the emacs and vim camps. I think that you can do well with either. Personally, I prefer emacs because

I think it’s easier to get started with. It has debugging support for programmers.

If you are going to do most of your programming work in IDEs, that may not be a big factor. In fact, I generally use emacs’ debugging support mainly for obscure programming languages where I don’t have a fancier IDE.

So, if you already know a little vi or vim and would prefer to stay in that world, or if you just think the name “emacs” is plain silly, you can skip down to the next section.

Like nano, you launch emacs by giving the command name (emacs) followed by the path to a file that you want to edit.

If you give a path to a file that does not exist yet, emacs will start you off with an empty file. The actual file will be created when you first save your work. emacs has a built-in tutorial, and you should begin that very shortly. But first, just a couple of notes:

The tutorial describes a number of commands like this: C-chr, which means to hold down the Ctrl (a.k.a, Cntrl, Control, etc.) key while typing the character chr. For example, C-c means “hold down Ctrl while typing ‘c’.”

You may have seen the notation ^chr (e.g., ^c) used other places to mean the same thing.

The tutorial describes a number of commands like this: M-chr, and says the “M-” means to “hold the META or EDIT or ALT key down while typing” the other character, and says that if you don’t have a META, EDIT, or ALT key, you can type (and release) the Escape key instead.

It’s worth noting that you may, indeed, have some of those keys on your keyboard, but your ssh client program might not pass them on. Some ssh programs may just ignore those keys. In other cases (especially the ALT keys), the Windows or other operating system may be jealously seizing upon those keys for its own purposes. In some cases, you may have 2 ALT keys, one on the left of the spacebar and one on the right. Sometimes one of these will work and the other will not, so try them both. But if nothing else works, the Escape key will always be there for you. (If you change ssh programs, though, or when you start using X to connect in graphics mode, you might want to check out those keys once again.)

Example 2: Try This: The emacs tutorial

Give the command

emacs -nw

to run emacs in a text-mode session.

Follow the directions given to bring up the tutorial (i.e., type ^h followed by “t”.).

Continue following the instructions to make your way through the tutorial.

Exit emacs when you have completed it.

Now let’s try actually using emacs to edit a file.

Example 3: Try This: Edit with emacs

You should still have a snark.txt file in your playing directory.

cd ~/playing
emacs -nw snark.txt

1. Try using your arrow keys and Page Up/Page Down keys to move around. These should work just as well as the control keys covered in the tutorial.

2. Use the search (C-s) command to hunt for the word “muffins”.

3. Use the M-< command to return to the first line of the file.

4. Use the replace (M-%) command to replace all occurrences of “snark” by “snipe”. Notice as you go how emacs treats the various upper/lower case variations of “snark”.

5. The “Undo” command in emacs is C-x u. Use it (repeatedly) to undo all of your replacements.

6. Use the M-< command again to return to the first line of the file.

7. Toggle the case sensitivity via the command M-x toggle-case-fold-search. You don’t need to type all of that out, however. Try doing M-x tog then hit the Tab key. emacs will attempt to complete the command, adding gle-.

It will stop there, because there are a lot of different commands that start with M-x toggle- and emacs doesn’t know which one you want. (You can hit Tab again to see what they are.) Type a c and hit Tab again to complete your choice. Then use Enter to run the command.

8. Use the replace (M-%) command to replace all occurrences of “snark” by “snipe”. Notice as you go how emacs treats the various upper/lower case variations of “snark” this time.

9. Use the C-x C-w command to save your changes as a new file snark3.txt.

10. Use the command C-x C-c to exit emacs.

11. To see that you actually made a change, use a grep command to look for the word “snipe”:

grep -i snipe snark3.txt

12. The diff command can be used to compare two text files and list the differences between them. Give the command

diff snark.txt snark3.txt

You should see your snark/snipe changes.

When you are done with the tutorial, here are a few extra things you should know about emacs:

2.1 Emacs Modes

emacs offers customized modes for different kinds of files that you might be editing. Some of these are chosen automatically for you depending upon the file name you give. Others can be chosen manually by giving the command M-x name-mode, where name indicates the desired mode. Some of the most popular modes are: text, html, c, and c++. The programming language modes generally offer automatic indenting at the end of each line, though you may have to end lines with the “Line feed” or “C-j” key rather than “Return” or “Enter” to get this.

Example 4: Try this: C++ in emacs

cd ~/playing
cp ~cs252/Assignments/textFiles/hello.cpp .
emacs -nw hello.cpp

emacs will take note of the .cpp file extension and start this file in C++ mode.

If your terminal program supports color, you may see colored “syntax highlighting” as emacs renders reserved words, string constants, and comments in different colors and/or fonts.

And there are little, more subtle things that you will come to appreciate over time. Try typing a for loop

for (int i = 1; i <= 10; ++i) {
    cout << i << endl;
}

Notice as you type the “)” and “}” in your code, the cursor briefly flashes back to the “(” or “{” that you just closed. Notice also how the indentation is supplied automatically as you end each line.

Move your cursor back up to the endl and delete that word. Now type just the e and then do M-/.

The command M-/ is a special friend to all programmers who use long variable names but hate to type them. Type a few letters of any word, then hit M-/. emacs will search backwards through what you have previously typed looking for a word beginning with those letters. When it finds one, it fills in the remaining letters. If that wasn’t the word you wanted, just hit M-/ again and emacs will search for a different word beginning with the same characters.

2.2 The Mark and the Region

emacs has a number of commands that work on an entire block of text at a time. For example, the emacs tutorial told you how to delete a line using C-k. But what if you wanted to delete everything from the middle of one line to the first word five lines away? There is a command (C-w) for killing an entire region of text, but to use it you must first tell emacs what region you want to kill.

The procedure for doing this is the same in all emacs commands that work on regions of text. The “current region” is the set of characters from the “mark” to the current cursor position. The “mark” is an imaginary position marker established by the set-mark-command. The keystrokes for that command are either C-[spc] (hold the control key and type a space) or C-@ (hold the control and shift keys and type ‘2’).

So to set up a region to operate on, you move the cursor to one end of the region, give the set-mark-command, then move the cursor to the other end of the region. Everything between the mark and the cursor constitutes the current region, and can be operated on by any region-based command. Some region commands of note are:

C-w
Kill the region, i.e., delete it but save the deleted text in the clipboard. You can “yank” (paste) the deleted text at the cursor position with C-y.

M-w
Copy the region, i.e., save a copy of the region’s text in the clipboard. You can “yank” (paste) the copied text at the cursor position with C-y.

C-x C-x
Exchanges the mark and the cursor. Hitting this repeatedly will flip you back and forth between the start and end of the current region. Although often useful in its own right, this command also provides a quick way to check and see if the region is really where you think it is.

C-c C-c
Comment out a region: in C++ and other programming modes, places comment markers in front of each line in the region.

Some ssh programs will not let you type the characters C-[spc] or C-@. These characters are the emacs keys used to set the “mark” when selecting a region of text. If yours is one of them, you have some options:

You can always run the set-mark command as M-x set-mark-command (but who really wants to type all that every time?)

You could just get a different ssh program. See the “Downloads” section on the CS 252 Library page for some recommendations.

You can bind a different key to set-mark-command that you can type from within Windows telnet. Add the line

(global-set-key "\M-\\" 'set-mark-command)

to your .emacs file, restart emacs, and you can set the mark with the key sequence M-\ (escape followed by a backslash). I’ve added that global-set-key to my own .emacs file, so if you’ve copied that into your home directory as described earlier, that key binding will already be defined.

M-x ispell-region or M-x flyspell-region
Run a spell check on all text in the region.

2.3 Where’s the Documentation?

It’s in emacs. The end of the tutorial discussed some of the built-in help features in emacs. One that isn’t mentioned is the way to get to the entire “reference manual” for emacs. The “info viewer” gives you access to extensive documents about emacs (and about a number of other programs as well, as the authors of many other programs have found the info viewer a convenient way to package on-line documentation). The commands C-h i or M-x info will launch the info viewer, and the first page of the viewer gives basic instructions on how to use it.

Finally, I have an emacs command reference sheet on the Library page.

3 vim

If you are happy with emacs, you can skip this section.

But if you develop an incurable allergy to emacs, there are other editors that offer reasonable support to programmers. Some of the textbooks discuss vi, a popular editor that does not offer much support for programming, but a reasonable option is vim (“vi improved”).

Like emacs, vim is available on many, but not all, Unix systems and, once you have learned it, you can use it over telnet via keyboard commands. To learn the basics of running vim, give the command vimtutor.

Example 5: Try This: The vim tutorial

Give the command

vimtutor

Follow the instructions given there to learn the basic operation of vim.

Now, let’s try it out.

Example 6: Try This: Edit with vim

You should still have a snark.txt file in your playing directory.

cd ~/playing
vim snark.txt

1. Try using your arrow keys and Page Up/Page Down keys to move around. These should work just as well as the movement keys covered in the tutorial.

2. Use the search (?) command to hunt for the word “muffins”.

3. Position your cursor just after that word. Type ‘i’ to enter “Insert mode”. Insert the text “ and jam”. Use the Esc key to exit Insert mode.

4. Use the gg command to return to the first line of the file.

5. Use the replace (:%s/old/new/igc) command to replace all occurrences of “snark” by “snipe”. Notice as you go how vim treats the various upper/lower case variations of “snark”.

6. The “Undo” command in vim is u. Use it (repeatedly) to undo all of your replacements.

7. Use the gg command again to return to the first line of the file.

8. Use the replace (:%s/old/new/gc) command to replace all occurrences of “snark” by “snipe”. Notice as you go how vim treats the various upper/lower case variations of “snark”.

9. Use the command :w snark3.txt to save your changes as a new file snark3.txt.

10. Use the command :q! to exit vim.

11. To see that you actually made a change, use a grep command to look for the word “snipe”:

grep -i snipe snark3.txt

12. The diff command can be used to compare two text files and list the differences between them. Give the command

diff snark.txt snark3.txt

You should see your snipe and jam changes.

3.1 vim and C++

Example 7: Try this: C++ in vim

cd ~/playing
cp ~cs252/Assignments/textFiles/hello.cpp .
vim hello.cpp

vim will take note of the .cpp file extension and start this file in C++ mode.

If your terminal program supports color, you may see colored “syntax highlighting” as vim renders reserved words, string constants, and comments in different colors and/or fonts.

And there are little, more subtle things that you will come to appreciate over time. Try typing a for loop

for (int i = 1; i <= 10; ++i) {
    cout << i << endl;
}

just after the existing cout <<… statement.

Notice as you type the “)” and “}” in your code, the cursor briefly flashes back to the “(” or “{” that you just closed.

Move your cursor back up to the endl and delete that word. Now type just the e and then, while still in Insert mode, do ^P.

The commands ^P and ^N are special friends to all programmers who use long variable names but hate to type them. Type a few letters of any word, then hit one of those control keys. vim will search backwards (^P) or forwards (^N) through what you have previously typed looking for a word beginning with those letters. When it finds one, it fills in the remaining letters. If that wasn’t the word you wanted, just hit the same key again and vim will search for a different word beginning with the same characters.

File Protection

Steven Zeil

Last modified: Apr 20, 2019

Contents: 1 Protection and Permissions 2 chmod 3 Beware the umask! 4 Planning for Protection

1 Protection and Permissions

Not every file on the system should be readable by everyone. Likewise, some files that everyone needs (such as the executables for commands like cp, mv, etc.) should not be subject to accidental deletion or alteration by ordinary users. This is where file permissions come into play.

Unix allows three forms of access to any file: read, write, and execute. For an ordinary file, if you have read (r) permission, you can use that file as input to any command/program. If you have write (w) permission, you can make changes to that file. If you have execute (x) permission, you can ask the shell to run that file as a program.

The owner of a file can decide to give any, all, or none of these permissions to each of three classes of people:

To the owner of the file him/herself

To members of a designated “group” established by the systems staff. Groups are generally set up for people who will be working together on a project and need to share files among the group members.

To anyone else in the world.

These three classes are abbreviated “u”, “g”, and “o”, respectively. The “u” is for “user”, “g” for “group”, and “o” is for “others”. Until you actually join a project that needs its own group, you will mainly be concerned with “u” and “o” classes.

The ls -l command will show the permissions granted to each class. For example, if you said

ls -l ~/playing

you might see the response

-rwxrwx--- 1 johndoe student 311296 Jul 21 09:17 a.out
-rw-rw---- 1 johndoe student     82 Jul 21 09:12 hello.c
-rw-rw---- 1 johndoe student     92 Jul 21 09:13 hello.cpp
-rw-rw---- 1 johndoe student     85 Jul 20 15:27 hello.wc

On the far right, you see the actual file names. In front of that you are shown the date and time on which that file was last modified. In front of the date is the size of the file (in bytes). The two columns near the middle that contain names indicate the owner of the file (in this case, the owner has login name johndoe) and the group to which that file is assigned (in this case, the group student). Some typical groups are “wheel”, “faculty”, “gradstud”, and “student”. “Wheel” has no members, but groups like “student” and “gradstud” have very broad membership, as their names imply.

To see what groups you yourself are a member of, give the command

groups

If you are a member of more than one group, you can assign your files to any one of those groups. The Unix command chgrp (“change group”) is used for this purpose.

Finally, look at the pattern of hyphens and letters at the far left of the ls output. The first character will be a “d” if the file is a directory, “-” if it is not. Obviously, none of these are directories. The next 3 positions indicate the owner’s (u) permissions. By default, you get read and write permission for your own files, so each file has an “r” and a “w”. a.out is an executable program, so the compiler makes sure that you get execute (x) permission on it. The other files can’t be executed, so they get no “x”. This way the shell will not even try to let you use hello.c or any of the other source code files as a program.

The next three character positions indicate the group permissions. In this case, the group permissions are the same as the owner’s permissions: all members of the student group can read or write these files and can execute the a.out program.

The final three character positions indicate the permissions given to the world (others). Note that in this case, people other than the owner or members of the same group cannot read, write, or execute any of these files.
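As a sketch, you can pull the three classes apart from a permission string with cut (the permission string below is a hypothetical ls -l field, not one of the files above):

```shell
perm='-rwxrw----'          # hypothetical first field of an ls -l line

echo "$perm" | cut -c1     # "-": an ordinary file, not a directory
echo "$perm" | cut -c2-4   # "rwx": the owner's (u) permissions
echo "$perm" | cut -c5-7   # "rw-": the group's (g) permissions
echo "$perm" | cut -c8-10  # "---": the world's (o) permissions
```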

Directories also can get the same rwx permissions, though the meaning is slightly different. If you have read permission on a directory, you can see the list of files in the directory via ls or other commands. If you have execute permission on a directory, then you can use that directory as one component of a path (e.g., ~yourName/directory-with-x-permission/foo.txt) to get at the files it contains. So, if you have execute permission but not read permission on a directory, you can use those files in the directory whose names you already know, but you cannot look to see what other files are in there. If you have write permission on a directory, you can change the contents of that directory (i.e., you can add or delete files).

2 chmod

The chmod command changes the permissions on files. The general pattern is

chmod class+permissions files

or

chmod class-permissions files

Use “+” to add a permission, “-” to remove it. For example, chmod o+x a.out gives everyone else (the “others” class) permission to execute a.out. chmod g-rwx hello.* denies members of your group permission to do anything at all with the “hello” program source code files.
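A quick sketch with a hypothetical scratch file shows the symbolic + and - forms changing the ls -l listing:

```shell
touch demo.sh          # a hypothetical scratch file for illustration
chmod 640 demo.sh      # start from rw-r-----

chmod o+x demo.sh      # add execute permission for others
ls -l demo.sh | cut -c1-10   # -rw-r----x

chmod g-r demo.sh      # take read permission away from the group
ls -l demo.sh | cut -c1-10   # -rw------x
```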

You can also add a -R option to chmod to make it “recursive” (i.e., when applied to any directories, it also applies to all files in the directory (and if any of those are directories, to the files inside them, and if…)). For example, if I discovered that I really did not want the group to have permission to write or execute my files in ~/playing, I could say:

chmod -R g-wx ~/playing

An alternate way of setting and removing permissions is to specify all 9 of the permissions (user, group, and world; read, write, and execute) at once by giving them as a three-digit number.¹ Each digit must be in the range 0-7, and the digits give the permissions for the owner, group, and world, in that order. The digits are computed as binary numbers with read permission in the 4’s position, write permission in the 2’s position, and execute permission in the 1’s position. Each bit in this number is 1 if permission is granted and zero if permission is denied.

So, for example, if we wanted to give the owner read and write permission but not execute permission, the digit would be computed as:

owner = 4 ∗ 1 + 2 ∗ 1 + 0 = 6

If we wanted to give the group read permission only, the digit would be computed as:

group = 4 ∗ 1 + 2 ∗ 0 + 0 = 4

and if we wanted to give no permissions at all to the world,

world = 4 ∗ 0 + 2 ∗ 0 + 0 = 0

We would then set the permissions for the file by giving these three digits in the chmod command:

chmod 640 hello.c
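The digit arithmetic above can be checked right in the shell with its $(( )) arithmetic syntax:

```shell
owner=$(( 4*1 + 2*1 + 1*0 ))   # read and write, but not execute
group=$(( 4*1 + 2*0 + 1*0 ))   # read only
world=$(( 4*0 + 2*0 + 1*0 ))   # no permissions at all

echo "${owner}${group}${world}"   # 640, the argument given to chmod above
```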

Example 1: Try This: Setting Permissions on a File

Start by copying a file into your directory to work with.

cd ~/playing
cp ~cs252/Assignments/textFiles/hello.sh .
ls -l hello.sh
cat hello.sh
./hello.sh
echo '#comment' >> hello.sh
cat hello.sh

All of the above commands should succeed.

The >> part of the echo command above uses a technique called redirection that we will study later. In this case, it allows us to add lines to the end of a text file.

You should see the change we have made in the second cat output.

Now let’s take away all permissions and try those things again.

chmod 000 hello.sh
ls -l hello.sh
cat hello.sh
./hello.sh
echo '#comment' >> hello.sh

With these permission settings, we can’t do anything at all with that file.

Let’s add read permission:

chmod 400 hello.sh
ls -l hello.sh
cat hello.sh
./hello.sh
echo '#comment' >> hello.sh
cat hello.sh

Let’s add execute permission:

chmod 700 hello.sh
ls -l hello.sh
cat hello.sh
./hello.sh
echo '#comment' >> hello.sh
cat hello.sh

We could also have written that chmod command in its symbolic form: chmod u+rwx hello.sh

It’s more difficult to demonstrate the effect of group and world permissions, because they affect how other people see your files, but we can do it using a file that does not belong to us.

ls -l ~cs252/Assignments/textFiles/text1.txt
ls -l ~cs252/Assignments/textFiles/text2.txt

Because you are neither the owner of those files, nor in the group they belong to, the “world” permissions control your attempts to use those files. See if you can predict, from the permissions you see, what the results of the next two commands will be before you execute them:

cat ~cs252/Assignments/textFiles/text1.txt
cat ~cs252/Assignments/textFiles/text2.txt

Example 2: Try This: Setting Permissions on a Directory

cd ~
chmod 700 ~/playing/hello.sh
ls -ld playing

The -d option for ls asks it to show us the info on the directory itself instead of, as it would usually do, the info on the files inside the directory.

Right now, everything should be good:

ls playing
cat playing/hello.sh
cp playing/hello.sh playing/hello1.sh

Let’s take away our permissions on the directory.

chmod 000 playing
ls -d playing

Take note of which commands work and which ones don’t.

ls playing
cat playing/hello.sh
cp playing/hello.sh playing/hello1.sh

Now, let’s give ourselves read permission on the directory:

chmod 400 playing
ls -d playing
ls playing
cat playing/hello.sh
cp playing/hello.sh playing/hello1.sh

Now, let’s change that to execute permission:

chmod 100 playing
ls -d playing
ls playing
cat playing/hello.sh
cp playing/hello.sh playing/hello1.sh

Finally, let’s give ourselves read, write, and execute permission:

chmod 700 playing
ls -d playing
ls playing
cat playing/hello.sh
cp playing/hello.sh playing/hello1.sh
ls playing

3 Beware the umask!

Suppose you never use the chmod command. What would be the protection levels on any files you created?

The answer depends upon the value of umask. Look in your ~/.bashrc file for a command by that name, and note the number that follows it. If you don’t have one, just give the command

umask

and note the number that it prints.

The umask number is a 3 digit (base 8) number, similar to the numeric form of the permissions in the chmod command. The first digit describes the default permissions for the owner (you), the second digit describes the default permissions for the group, and the final digit describes the default permissions for others. Each of these three numbers is, in turn, formed as a 3-digit binary number where the first digit is the read permission, the second is the write permission, and the third digit is the execute permission.

Unlike the chmod command, however, in each binary digit of the umask, a 0 indicates that the permission is given, a 1 that the permission is denied.

So if my umask is 027, that means that

I (the owner) have 000 — permission to read, write and execute my own files.

The group to which a file belongs has 010, permission to read, no permission to write, and permission to execute that file.

The rest of the world has 111, no permission to read, write or execute.

Of course, these permissions can be changed for individual files via the chmod command. The umask only sets the default permissions for cases where you don’t say chmod.
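A short sketch (run in a throwaway directory with a hypothetical scratch file) shows a umask of 027 shaping a new file’s permissions:

```shell
umask 027          # default: owner read/write, group read, world nothing
rm -f fresh.txt    # the umask applies only when a file is first created
touch fresh.txt
ls -l fresh.txt | cut -c1-10   # -rw-r-----
```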

If you want to change your default permissions, you do it via the umask command by giving it the appropriate 3-digit octal number for the new default permissions. Some common forms are:

umask 022
Owner has all permissions. Everyone else can read and execute, but not write.

umask 077
Owner has all permissions. Everyone else is prohibited from reading, writing, or executing.

Since the point of the umask command is to establish the default behavior for all your files, this command is normally placed within your .bashrc file.

4 Planning for Protection

At the very least, you will want to make sure that files that you are preparing to turn in for class assignments are protected from prying eyes. You need to do a little bit of planning to prepare for this. There are two plausible approaches:

Use a stringent enough umask (e.g., umask 077) so that everything is protected by default.

The only disadvantage is that files that you want to share (e.g., the files that make up your personal Web page) must be explicitly made world-readable (chmod go+r files).

Use a more relaxed umask (e.g., umask 022) so that your files are readable by default, but establish certain directories in which you carry out all your private work and protect those directories so that no one can access the files within them. For example, you might do

cd ~
mkdir Assignments
chmod go-rwx Assignments

Now you can put anything you want inside ~/Assignments, including subdirectories for specific courses, specific projects, etc. Even if the files inside ~/Assignments are themselves unprotected, other people will be unable to get into ~/Assignments to get at those files.

The one disadvantage to this approach is that it calls for discipline on your part. If you forget, and place your private files in another directory outside of ~/Assignments, then the relaxed umask means that those files will be readable by everyone!

1: Technically, we are giving a 3-digit Octal (base 8) number.

File Transfer

Steven Zeil

Last modified: Aug 2, 2018

Contents: 1 Before You Start: Binary and Text Files 1.1 Variations among Text Files 1.2 Identifying the File Contents 1.3 Converting Text Files 2 Transferring Files Across the ODU Local Network: Samba 3 Transferring Files Across the Internet: ftp 4 Secure File Transfer: sftp and scp 4.1 SFTP via a Text-based Interface 4.2 SFTP via GUI Interface 4.3 Try It Out 4.4 scp 5 Fetching Files from Across the Internet: wget

If you prepare files on one machine but want to use them on another, you need some means of transferring them. For example, if you edit files on your home PC, you may eventually need to get those files onto the CS Department network. On the other hand, you may want to take files your instructor has provided off of that network for use on your home PC. 1 Before You Start: Binary and Text Files

A complicating factor in transferring files from one computer to another is that you must decide whether the files you want to transfer should be treated as “text” or as “binary”. (We’ve talked a bit about this already, but it’s particularly important when transferring files.)

All files are containers of bytes. But in many files, these bytes are intended to represent text:

The numbers stored in individual bytes represent characters. The file is logically divided into lines. The end of each line is indicated by a special line terminator, a sequence of one or two characters reserved for that specific purpose.

Such files are referred to as text files. The way in which text is encoded as numbers in a text file is governed by various international standards, which we’ll look at in just a moment. Because so many programs observe these standards, you can safely manipulate text files with a wide range of different text-oriented programs, even ones that the original creator of the text file might not have intended or might not have known about.
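One way to see a text file’s bytes, including its line terminator, is the od (“octal dump”) command. A sketch with a hypothetical throwaway file:

```shell
printf 'hi\nho\n' > demo.txt   # two short lines

# -c asks od to display each byte as a character; the line terminators
# show up as \n, so the dump reads:  h   i  \n   h   o  \n
od -c demo.txt
```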

In other files, the numbers really are numbers. They encode data in a more complicated fashion, not line by line of text. These are called binary files. The way in which information is encoded into a binary file is entirely determined by the programs (and their programmers) intended for use with the file. Manipulating a binary file with any program not intended for that specific kind of encoding can be disastrous.

Some binary files can only be used on one specific operating system. Compiled, executable programs, for example can only be run on the operating system for which they were prepared. You cannot run a Windows executable on a Linux machine, nor run a Linux executable on a Windows machine.

Some binary files can be used on multiple operating systems. A Microsoft Word .doc file may start life on a Windows PC, but Word is also available on Apple OS/X and Android, and can also be processed by the OpenOffice and LibreOffice suites that run on all of those operating systems and on Linux as well. Similarly, .zip and .pdf files can be processed under almost any operating system, even though the exact programs that do so may vary.

1.1 Variations among Text Files

1.1.1 ASCII

Since the late 1960’s, most text files have been encoded using the ASCII character set. This encoding uses the numbers 0…127 (in hexadecimal, 0…7F) and so fits comfortably into an 8-bit byte (which can actually hold the numbers 0…255).

The numbers 32…126 (in hex, 20…7E) denote the printable ASCII characters, including the blank space, upper and lower-case letters, numbers, and punctuation characters.

Numbers 0…31 (0…1F) and 127 (7F) are used for various control characters.

Originally, these control characters were used to “control” output device behavior. For example, CR (carriage return) caused a printer to move its print head to the leftmost column of a page. LF (line feed) caused a printer to move one line down on the page.

In modern usage, only a few of these control characters will appear in a text file. TAB characters are common, and CR and LF are, as we shall see, used as line terminators. FF (form feed) characters were originally used to tell a printer to feed in a new form (page) and are still occasionally used to indicate the start of a new page.

(The Ctrl key label on most keyboards is an abbreviation for “control”. Most of the control characters can be typed by holding down the Ctrl key while typing the ASCII character 96 places higher in the ASCII character set. For example, the TAB or Horizontal Tab character has code 9. You can type it by holding down Ctrl while typing the key corresponding to ASCII code 9+96=105, the ‘i’. Of course, that is probably what you get by using the Tab key on the keyboard as well.)

One of the ways to tell if a file is intended to be an ASCII text file is to look for characters that fall outside the normal range of characters. If it has bytes containing numbers 128–255, it definitely is not ASCII text. If it has bytes containing numbers in the range 0–31, other than 9 (TAB), 10 (LF), 12 (FF), or 13 (CR), it probably is not ASCII text.
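As a quick illustration of that test from the command line (a sketch; the file names here are made up, and the octal escapes given to tr denote the “allowed” byte values just listed):

```shell
# Two sample files: one pure ASCII, one containing bytes above 127.
printf 'hello world\n' > ascii.txt
printf 'caf\303\251\n' > notascii.txt   # \303\251 are the two UTF-8 bytes of 'e-acute'

# tr -d deletes every byte we consider legitimate ASCII text:
# TAB (\11), LF (\12), FF (\14), CR (\15), and the printable range \40-\176.
# Whatever survives is outside the ASCII text range.
tr -d '\11\12\14\15\40-\176' < ascii.txt    | wc -c   # 0 suspicious bytes
tr -d '\11\12\14\15\40-\176' < notascii.txt | wc -c   # 2 suspicious bytes
```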

1.1.2 ASCII Line Termination

Even if you have ASCII text in a file, there is some variation on how to encode the file. Windows and other operating systems disagree on how to divide an ASCII text file into lines.

The end of a line in a Unix/Linux/Android/OSX ASCII text file is represented by a single character: the LF (line feed) character, a.k.a. ^J.

MS Windows uses a pair of characters at the end of each line: the CR (carriage return) or ^M followed by a LF. This means that ASCII text files created in Windows tend to look, to a Unix program, as if they have extra ^M characters near the end of each line. ASCII files created in Unix, on the other hand, tend to look to a Windows program as if the entire file contains only one line with odd ^J characters sprinkled inside.
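You can see this difference directly by dumping the bytes of two one-line files (a sketch; printf and od are standard commands on our Linux servers, and the file names are made up):

```shell
# The same one-line file with each style of line termination.
printf 'hi\n'   > unixstyle.txt   # ends with LF only  (3 bytes)
printf 'hi\r\n' > winstyle.txt    # ends with CR LF    (4 bytes)

# od -c displays each byte as a character or escape code.
od -c unixstyle.txt   # shows: h  i  \n
od -c winstyle.txt    # shows: h  i  \r  \n
```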

Example 1: Try This:

1. Download this file and this one onto a Windows PC. (Please note: you don’t download by left-clicking on a link. Right-click on it and use the “Save … As” option – the exact phrase depends on what browser you are using.)

2. Open them each in NotePad.

3. These two files are identical except for the line termination characters. Notice how differently they are treated by NotePad.

4. Log into one of our Linux servers. Use emacs to visit the files

/home/cs252/Assignments/textFiles/wintext.txt
/home/cs252/Assignments/textFiles/linuxtext.txt
/home/cs252/Assignments/textFiles/mixedtext.txt

As you do, pay attention to the emacs status line. emacs is happy to handle either Linux or Windows text files, but tries to alert you when you open the Windows text file. (“DOS” is, of course, the ancestor of Windows.)

5. Give the commands

file /home/cs252/Assignments/textFiles/wintext.txt
file /home/cs252/Assignments/textFiles/linuxtext.txt
file /home/cs252/Assignments/textFiles/mixedtext.txt

Observe how the file command tells you about the contents of each.

1.1.3 Unicode

Although ASCII has been queen of the text encoding world for most of the history of computing, it has limitations. Modern applications need many more than the 95 printable characters available in ASCII. Unicode is an international standard encoding that uses multiple bytes per character to extend ASCII (the 128 ASCII characters are preserved in Unicode at their original numeric values), adding characters from international alphabets, mathematical and musical symbols, simple graphics, and a variety of other “utility” characters.

Unicode actually has multiple ways of encoding this extensive character set. One Unicode encoding simply uses two bytes per character, for a total of 65536 possible characters. But since most text files are still heavily oriented towards ASCII (0–127), this doubles the size of the typical text file with a lot of zero bytes. So another popular encoding (called “UTF-8”) uses 1 byte to represent an ASCII character, with a special non-ASCII byte value used to signal that the next character coming will be a multi-byte Unicode value.
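You can observe UTF-8’s one-byte-versus-multi-byte behavior directly (a sketch; od -An -tx1 prints the raw bytes of its input in hexadecimal):

```shell
# An ASCII character occupies a single byte in UTF-8...
printf 'A' | od -An -tx1          # 41 : one byte

# ...while an accented letter such as e-acute takes two
# (written here via its UTF-8 byte values).
printf '\303\251' | od -An -tx1   # c3 a9 : two bytes
```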

Like ASCII text files, lines in Unicode text files can be terminated by LF or by a CR-LF sequence, depending on the operating system. Unicode also introduces its own optional non-ASCII control characters to signal the end of a line and the end of a paragraph. Given this many options, though, it’s generally safe to assume that any program sophisticated enough to handle Unicode will be able to cope with any of the multiple options for line termination a file might employ.

1.2 Identifying the File Contents

How can you tell if a file is text or binary?

Let’s get one thing out of the way right now: You cannot tell if a file is text or binary by double-clicking on it in an operating system window to open it up. Launching a file in this way simply runs whatever program the operating system believes is most appropriate to that file. That program may very well show you text, but that doesn’t mean the information was encoded as a text file. On the other hand, the program might show you graphics with no text at all, but that does not mean that the graphics were not drawn from a description written in ASCII text.

You can get a hint as to whether a file is text or binary by looking at the file extension (the 2 or 3 letters after the final ‘.’ in a file name).

These are usually text files:

*.txt and anything else that you produce with a text editor (but not a word processor)

Source code: *.h, *.cpp, *.java, makefile

Web pages and web-based data: *.html, *.htm, *.xml, *.xhtml

A few graphics formats: *.ps, *.eps, *.svg

These are almost always binary files:

Things produced by a compiler: *.o, *.class, *.a, *.lib, *.dll, *.exe

Formatted documents other than web pages: *.doc, *.docx, *.pdf

Most graphics formats: *.gif, *.jpeg, *.jpg, *.png, *.tiff, *.bmp

All music and video formats

Compressed files and archives: *.zip, *.tar, *.jar, *.7z, *.rar, *.gz, *.Z

These could be either

*.dat

Files whose names do not include an extension at the end

But that’s only a hint. There are so many programs in the world that some are bound to use the same extensions.

In Linux, the best way to see what kind of data is in a file is to use the file command, e.g.,

file mystery.dat

The file command will print a description of the file contents. This description can be a few lines long. If the file contains ASCII text or Unicode text, this will be stated explicitly as part of the description. If the file is ASCII text but with Windows-style line termination, it will state that as well.
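For example (a sketch; the file names are made up, and the exact wording of file’s descriptions varies somewhat between versions):

```shell
# Create one Unix-style and one Windows-style text file.
printf 'plain line\n' > sample1.txt
printf 'dos line\r\n' > sample2.txt

file sample1.txt   # e.g., "sample1.txt: ASCII text"
file sample2.txt   # e.g., "sample2.txt: ASCII text, with CRLF line terminators"
```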

In Windows, your best way to see if a file is text or binary is to open it in a text editor such as NotePad (but not in a word processor such as Word).

If the result is readable text, you have ASCII text with Windows-style line terminators.

If the result is readable text but with everything crammed into one line, you have ASCII text with *nix-style line terminators.

If the result has lots of white or black rectangles in the place of characters, the file is apparently binary.

1.3 Converting Text Files

If you find yourself with a Windows-style text file on a Linux machine, or if you have a Linux-style text file that you want to transfer to a Windows machine, you can convert from one form to the other.

1.3.1 Windows to Linux

To convert a Windows-style text file to Linux, use the command tr:

tr -d '\r' < file1 > file2

This produces a new file file2 from file1 by converting the line endings to the Unix format. (Note that file1 and file2 cannot be the same file.)
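The effect of that tr command can be verified with byte counts (a sketch; the file names here are made up for the demonstration):

```shell
# A two-line Windows-style file: each line ends with CR LF (10 bytes total).
printf 'one\r\ntwo\r\n' > winfile.txt

# Deleting the CR characters yields a Unix-style copy, one byte
# shorter per line.
tr -d '\r' < winfile.txt > unixfile.txt

wc -c winfile.txt unixfile.txt   # 10 bytes vs. 8 bytes
```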

On most (not all) Linux systems, you could also use the command dos2unix:

dos2unix file1

1.3.2 Linux to Windows

You can also prepare a text file for transfer to a Windows system with unix2dos:

unix2dos file1
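If unix2dos is not installed, GNU sed (the version found on Linux) can do the same conversion by appending a CR to the end of every line. A sketch, with file names made up for the demonstration:

```shell
# A Unix-style file (LF line endings only, 8 bytes).
printf 'one\ntwo\n' > unixcopy.txt

# GNU sed understands \r in the replacement text: insert a CR
# just before each line's terminating LF.
sed 's/$/\r/' unixcopy.txt > wincopy.txt

wc -c unixcopy.txt wincopy.txt   # wincopy.txt gains one byte per line
```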

Be sure to check your file with the file command before using dos2unix or unix2dos. If the file is not, in truth, ASCII text, these commands will likely leave you with a badly corrupted file.

2 Transferring Files Across the ODU Local Network: Samba

If you are sitting at a Windows PC that is part of either the CS Dept’s own local network (e.g., in the CS Dept. labs or connected to the CS Dept’s wireless network) or part of the ODU ITS local network (most ODU computer labs on the Norfolk, Virginia Beach, and Peninsula Graduate Center campuses), then you can access your Unix account directories directly from within Windows. This is because the CS Dept. Unix file servers run a service called “Samba”, a program that mediates file access between UNIX and other systems.

Again, let me emphasize that Samba only works on a local network. If you are connecting to the campus via the Internet, forget Samba.

To use Samba, you might not need to do anything at all. If you are logged in to a CS Dept PC and you have a Z: drive mapped, that is actually a Samba connection to your Unix home directory.

If you have no such drive, use the Windows “Start->Run” button to run

\\userdata.cs.odu.edu\undergrad\your-login-name

(Graduate students would use “grad” instead of “undergrad” in the path above.) You may be prompted for a password, or a login name and password. Supply your CS Unix login/password and, if all is well, a Windows Explorer window should open displaying the contents of your Unix home directory. You can now manipulate files in this window just as you would in any Windows directory/folder, but the changes are occurring in your Unix directory.

Now that we know that it works, we can make this whole process more convenient by mapping a Windows drive letter to your Unix account, giving you a “fake” disk drive that actually accesses your Unix files.

From inside any Windows Explorer or “My Computer” window, select “Tools” (or right-click on the “My Computer” icon) and select “Map Network Drive…”. Select an unused drive letter, and enter that same address/command string as in the last step for the “Folder”. Make sure the “Reconnect at logon” box is checked. Finally, if your login name for logging into the PC is different from your CS Unix login name, look for a “Connect using a different user name” link, click on that, and supply your Unix login information. Click on OK/Finish and within a few seconds, you should have a new drive available that actually maps onto your Unix account.

Two things to keep in mind when using Samba to access your Unix files from Windows:

Watch your file permissions! Samba tries to set reasonable permission levels to Unix-hosted files that you create or modify from a Windows program, but it may not always do what you wanted. Be prepared to use chmod on any files you transfer in this way to set the permissions what you really wanted.

Samba will transfer all of your files exactly, byte-by-byte. If you are working with text files, you may need to repair them before using them with a program on the other operating system.

3 Transferring Files Across the Internet: ftp

FTP (File Transfer Protocol) is one of the oldest mechanisms for transferring files over the Internet. Although most browsers provide some support for FTP, they usually only permit downloads (from the remote machine to your local PC) and usually only permit access to public repositories, not to password-protected accounts.

From a modern perspective, FTP has a number of limitations.

It has no encryption. Anyone on the network could, theoretically, see what you are transferring.

Even worse, if you are transferring to a machine that requires you to log in, FTP transmits your login name and password in plain-text, unencrypted form.

FTP struggles with modern firewalls and routers.

Consequently, FTP has been largely supplanted in practice by SFTP (the Secure File Transfer Protocol), which leverages the built-in encryption of SSH and a special ability of SSH to set up secure, router- and firewall-friendly “tunnels” through which other protocols can work.

Still, you may occasionally see a download link from a public (no-login) repository with an ftp://… URL. And many programs that support SFTP will also offer an FTP option that, in most cases, you should simply ignore.

4 Secure File Transfer: sftp and scp

SFTP (Secure File Transfer Protocol) is a more modern file transfer protocol that encrypts your entire session. It is built upon the secure ssh service, and therefore shares ssh’s ability to tunnel out through most reasonably configured firewalls.

Most SSH servers are also SFTP servers, so if you have an account on a machine that lets you use an SSH client program to open command sessions, you can probably use an SFTP client program to open file transfer sessions on that same server.

You will need an SFTP client program on your local machine to do this. Many systems have sftp commands that can be used to transfer files via a text-based interface.

Alternatively, some programs provide GUI-based interfaces to SFTP that allow you to initiate file transfers by a simple “drag and drop”.

We’ll look at both styles of SFTP client.

4.1 SFTP via a Text-based Interface

Text-based sftp clients are widely available. Linux and OS/X machines will have one as part of the standard OS distribution. Windows does not come with one, but the same people who provide the PuTTY ssh client also provide an sftp client.

Typically, a text-based client is launched from your local machine with the command

sftp yourLoginName@serverName

The one potentially confusing thing about working with the sftp command is that you need to be constantly aware of what directory you are working with on two different machines: the local machine you are issuing the commands from and the remote machine that you have connected to.

The commands provided by a typical text-based SFTP client are:

cd directory

Changes to a different directory on the remote machine.

lcd directory

Changes to a different directory on the local machine. (The “l” is for “local”.)

Tip: If you know that you are going to be transferring files from one specific directory on your local PC, it’s easiest to cd to that directory before you even run the sftp command. That way you won’t need to do any lcd commands because you will already be in the right place on your local machine.

pwd

Prints the current directory that you are working in on the remote machine.

Some clients (OpenSSH’s sftp, for example) provide an lpwd command to print the current working directory on your local machine; others have no equivalent.

ls

Lists the contents of the current working directory on the remote machine.

Some clients provide an lls command to list the current working directory’s contents on your local machine; others have no equivalent.

get filename

Download a file from the remote machine onto the local machine.

Most SFTP clients will allow wildcard patterns so that you can easily download multiple files at once.

put filename

Upload a file from your local machine to the remote machine.

Most SFTP clients will allow wildcard patterns so that you can easily upload multiple files at once.

quit

Just what it says. exit also works.

Here is a sample SFTP session:

The commands supplied by the user are shown like this.

sftp yourName@linux.cs.odu.edu ➀
Connected to linux.cs.odu.edu.
yourName@linux.cs.odu.edu's password:
sftp> lcd ~/temp
sftp> cd data
sftp> ls
file1.txt file2.dat file3.txt file4.dat ➁
sftp> get file1.txt ➂
Fetching /home/yourName/data/file1.txt to file1.txt
/home/yourName/data/file1.txt 100% 20KB 19.6KB/s 00:0
226 Transfer complete
sftp> get *.dat ➃
Fetching /home/yourName/data/file2.dat to file2.dat
/home/yourName/data/file2.dat 100% 1001 1.0KB/s 00:
Fetching /home/yourName/data/file4.dat to file4.dat
/home/yourName/data/file4.dat 100% 2123 2.1KB/s 0
226 Transfer complete
sftp> quit

➀ You would, of course, need to use your own login name instead of “yourName”

➁ Note that these are files on the remote server machine.

➂ Downloading one file.

➃ Using wildcards to download multiple files.

4.2 SFTP via GUI Interface

If the text-based interface does not appeal (or is not available on your local machine), you can get a free GUI-based SFTP client. See the Library page for suggestions.

GUI-based SFTP clients differ considerably in the details of how you operate them. If you use one of these, you will have to rely on its own built-in help or its source website to learn how to use it. Here, we will have to settle for an example of one such client.

One that I can recommend for Windows users is WinSCP, shown here. It is fairly typical of GUI-based SFTP clients, showing two window “panes”, side by side. One shows directories and files on your local machine. The other shows directories and files on a remote machine to which you have connected. You can transfer files from one machine to the other by dragging and dropping file icons from one pane to the other.

WinSCP is available on the CS Dept’s Windows PCs, including the Virtual PC lab.

Another option, FileZilla, is shown here. As is typical of such file transfer clients, the interface is dominated by two window “panes”, side by side. One shows directories and files on your local machine. The other shows directories and files on a remote machine to which you have connected. You can transfer files from one machine to the other by dragging and dropping file icons from one pane to the other.

FileZilla can be installed on Windows, OS/X, and Linux machines.

With both of these clients, you will want to be sure that you have selected SFTP and not the older FTP as your transfer mode. Again, consult their documentation and/or built-in help for details. Students have reported that FileZilla may time out when connecting to some ODU machines. (This can also happen with WinSCP, but is less common because WinSCP immediately offers to keep trying for another 60 seconds before giving up.)

This can be caused by…

1. an internal setting of FileZilla that sets the timeout to 20 seconds. You can change this from the FileZilla Settings... entry under the Edit menu. Try bumping this up to 60.

2. entering incorrect login or password information or misspelling the host/server name.

3. attempting to connect via the (default) FTP method rather than SFTP.

4.3 Try It Out

Install an sftp client on your local PC, if you don’t have one already. Consult the Library page for options.

Whichever form of sftp client you have decided to use, text-based or GUI, …

Example 2: Try This:

1. Using your SSH client, log in to one of our Unix servers. Create a ~/playing/transfers directory. Log off.

2. You should still have the files wintext.txt and linuxtext.txt on your local machine from the earlier Try This exercise. If not, download them again.

3. On your local PC, use a command line or GUI-based SFTP client to connect to one of our Linux servers. Transfer the wintext.txt and linuxtext.txt files from your local PC to your ~/playing/transfers directory on the Linux server.

4. Using your SSH client, log in to one of our Unix servers. cd to your ~/playing/transfers directory and examine its contents. You should find the two files you tried to transfer in there.

5. Give the commands

file wintext.txt
file linuxtext.txt

Take note of what this tells you about the line termination.

4.4 scp

Another secure transfer approach is SCP, which you can think of as an attempt to extend the normal Unix cp command to work across networks. SCP uses the same underlying protocol as SFTP, so usually any SSH server will support both SFTP and SCP. The clients, however, are usually quite different. SCP is, by its nature, generally done via a text-mode interface by issuing an scp command.

The basic format of an scp command is

scp loginName1@machine1:file1 loginName2@machine2:file2

to copy a file from one machine to another. If either file is on your local machine, you omit the “loginName@machine:” part. For local files, you may use relative paths to specify the file name. For remote files, you should use absolute paths.

If your login name on your local machine is identical to your login name on the remote machine, you can omit the “loginName@” part.

For example, from my home Linux machine, if I wanted to grab a copy of my .emacs file from my home directory on atria.cs.odu.edu , I might say:

scp [email protected]:/home/zeil/.emacs myAtria.emacs

Personally, I seldom use command-line scp because the paths on the remote machine tend to get long and, unlike paths on your local machine, you cannot use the Tab-key to complete file and directory names after typing the first few characters. I generally use sftp instead. Many command-line sftp clients do tab completion on the remote files. Even if they do not, the built-in ls command to list the current directory on the remote machine makes it easy to copy-and-paste long file names.

5 Fetching Files from Across the Internet: wget

Finally, there is a convenient way to get a copy of a file that is provided via a web server. The wget command accepts a URL (a web address), fetches the file at that URL, and deposits a copy of that file in your current directory.

Example 3: Try This:

Earlier, you downloaded a file linuxtext.txt onto a Windows PC and then transferred it to your Linux account.

Now let’s get one of them directly.

1. Log in to one of our Linux servers.

2. cd to any convenient directory that does not already contain that file.

3. Right-click on the linuxtext.txt link above and select the option to copy the link address to your clipboard.

4. In your Linux command session, type “wget”, a space, and then paste the URL you just copied. Hit Enter/Return to run the command.

5. Examine your directory. You should find a copy of linuxtext.txt.

Patterns for Text: Regular Expressions

Last modified: Nov 9, 2019

Contents:
1 Searching for Text
 1.1 Searching Lines with grep
 1.2 Rewriting lines with sed
2 Regular Expressions
3 Sed Redux

If wildcards provide a way to write patterns for file and directory paths, can we also write patterns for text strings? Yes, but this is not built into the shell for use by every command, the way that wildcards are.

Instead, most Unix programs and commands that do some kind of searching or matching for text will share a common notation for patterns of text to be matched. This notation is called regular expressions.

1 Searching for Text

For example, almost every text editor in any operating system will allow you to search a file for a given string. But most Unix text editors (including emacs and vim) will allow you to search for any string matching a regular expression “pattern”.

sed, a useful utility for doing simple changes to text files, is most often invoked to use its “substitute” command, which replaces any text matching a regular expression by some desired replacement text.

The csplit command splits a single file into multiple pieces, where the point of division is most often indicated via a regular expression.

Perl and awk, available on most Unix systems but not covered in this course, are scripting (programming) languages with a heavy emphasis on text manipulation, which is accomplished largely through matching on regular expressions.

1.1 Searching Lines with grep

In an earlier example, we saw that the program grep can be used to list all lines of a file that match a given string. For example,

grep 'def' /usr/include/math.h

would list all lines in the indicated file that contain the string “def”.

The first parameter (‘def’) is actually an example of a regular expression, a special notation for writing patterns for searching and matching text.

As we will see shortly, the notation for these patterns is such that a pattern that matches exactly one string (e.g., “def”) is written as that same string — “def” is both the pattern and the string that it matches.

But regular expressions are much more powerful than that. We can write a regular expression pattern to match a wide range of related strings.

First, though, let’s look at another command that makes heavy use of regular expressions.

1.2 Rewriting lines with sed

Many other commands besides grep will use regular expressions. sed, for example, allows you to enter a variety of editing commands that will be applied to every line of a file. A common use of sed is to scan each line of the file for a pattern and to replace that pattern, wherever it occurs, by some string. The sed command to do this is

sed s/pattern/replacement/g filename

where filename is the file whose contents we want to scan and replace, pattern is a regular expression describing the text to search for in each line, and replacement is the text by which we wish to replace anything that matches the pattern. (The ‘/’ characters are simply necessary to indicate the beginning and end of the pattern and replacement strings. They can be replaced by any character that does not appear in either the pattern or replacement strings.)

Example 1: Try This: Substitutions with sed

cd ~/playing
cp ~cs252/Assignments/ftpAsst/alas.txt .
more alas.txt

Now try operating on that file with sed:

sed s/o/X/g alas.txt

The ‘g’ at the end of the prior example indicates that the change should be applied every time a match is found (i.e., this is a global replacement). If the ‘g’ is dropped, only the first match in each line will be replaced.

Try the command

sed s/o/X/ alas.txt

to see the effect of dropping the ‘g’.

Neither the pattern nor the replacement are limited to single letters.

Try:

sed 's!I!you!g' alas.txt
sed 's@Horatio@George@g' alas.txt

By default, sed is case-sensitive. You can add an ‘i’ flag at the end of the substitution to change this. Compare the outputs of these commands:

sed 's/I/you/g' alas.txt
sed 's/I/you/gi' alas.txt

2 Regular Expressions

To write more powerful patterns, we need to understand how regular expressions are constructed.

A regular expression consisting of a single “non-special” character will match any string containing that character.

As it happens, none of the alphabetic and numeric characters are “special”, so the regular expression d, for example, would match any string containing a “d”.

Example 2: Try This: Regular expressions - basic characters

cd ~/playing
grep H alas.txt

Since grep works line by line, this would select every line containing an “H”.

If a set of regular expressions r1,r2,…,rk are concatenated together to form a single larger regular expression r1r2 …rk , it matches any string that contains a substring formed from a concatenation of strings s1s2 …sk , each of which matches the corresponding regular expression.

So when we write

grep def /usr/include/math.h

the def is actually the concatenation of three regular expressions d, e, and f, and matches any string that contains a substring matching d followed immediately by a substring matching e followed immediately by a substring matching f.

That may seem an unnecessarily complex way to get to the original idea of “matches the string `def’,” but this idea of concatenation is a general one that becomes more important as we consider other combinations of smaller regular expressions.

Example 3: Try This: Regular expressions - concatenation

Compare the results of

grep H ~/playing/alas.txt
grep or ~/playing/alas.txt
grep Hor ~/playing/alas.txt

The real power of regular expressions comes into play when we consider the various “special” characters that serve as regular expression operators.

Regular expressions can be grouped in parentheses, written as \(…\).

Without the backslash (\), the parentheses are just regular characters – they match a parenthesis in the lines of text.

The backslash is a special character, not only to grep and sed, but also to the command shell. So any command parameters that want to use it will need to be quoted.

Example 4: Try This: Regular expressions - parentheses

Based on the definition of parentheses, these two commands should do exactly the same thing:

grep 'def' /usr/include/math.h
grep '\(def\)' /usr/include/math.h

The introduction of the parentheses does not change what is matched. It does, however, group things together (just as parentheses do in conventional algebra), which we can take advantage of with the operators we will introduce shortly.

But compare also

grep 'x' /usr/include/math.h
grep '\(x\)' /usr/include/math.h
grep '(x)' /usr/include/math.h

The vertical line | separating two regular expressions means that a string matching either of those regular expressions would be accepted. Again, though, to get this special behavior, the | must be preceded by a backslash. This vertical bar is called the alternation operator. Alternation here means a choice (same root as “alternative”), not alternating from one thing to another and back again.

Example 5: Try This: Regular expressions - Alternation

Try:

grep '\(Y\|H\)' ~/playing/alas.txt
grep '\(Y\|H\)or' ~/playing/alas.txt
sed 's/\(Y\|H\)or/XXX/g' ~/playing/alas.txt

It is worth noting that, in sed, the expression that we match the text against in a substitution is a regular expression, but the replacement text is not. Try:

sed 's/\(Y\|H\)or/|||/g' ~/playing/alas.txt
sed 's/\(Y\|H\)or/\|\|\|/g' ~/playing/alas.txt

Square brackets [ ] containing any set of characters not beginning with ^ will match a string containing any one of those characters.

Example 6: Try This: Regular expressions - brackets

Try these commands:

grep Alas ~/playing/alas.txt
grep '[Alas]' ~/playing/alas.txt

Most Unix systems have a file /usr/share/dict/words, which is a “dictionary” used by spellcheck programs. It is not a dictionary in the sense of a list of words and definitions. It is simply a list of words, one per line, in alphabetical order.

Try:

more /usr/share/dict/words

(Remember you can quit with q). A typical version of this file will have nearly 100,000 words.

That words file can be used to search for words that match a selected criterion. For example, if you can’t remember whether a word is spelled ‘belief’ or ‘beleif’, try:

grep 'bel[ei][ei]f' /usr/share/dict/words
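A self-contained version of the same idea, using printf to supply a tiny made-up word list:

```shell
# [ei][ei] matches any two-character combination of e and i.
printf 'belief\nbeleif\nbelfry\n' | grep 'bel[ei][ei]f'
# belief and beleif both match; belfry does not.
```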

A particularly useful variant on the brackets is the use of character ranges. If you write two characters separated by a hyphen (-) inside brackets, e.g., [A-Z], then that is taken to mean all the characters starting at the first one and up to the last one, according to the usual ASCII character encoding rules.

Example 7: Try This: Regular expressions - brackets with character ranges

Try:

grep 'i[A-Z]' /usr/share/dict/words
sed 's/[A-Z]/*/g' ~/playing/alas.txt
sed 's/[a-z]/*/g' ~/playing/alas.txt
sed 's/[A-Hm-z]/*/g' ~/playing/alas.txt

This can be combined with ^:

sed 's/[^A-Z]/*/g' ~/playing/alas.txt


Example 8: Try This: grep and -v

The -v option causes grep to list only those lines that do not match the pattern.

Try:

grep -v a ~/playing/alas.txt
grep -v e ~/playing/alas.txt
grep -v i ~/playing/alas.txt
grep -v o ~/playing/alas.txt

And

grep the ~/playing/alas.txt
grep -v the ~/playing/alas.txt

Square brackets containing any set of characters beginning with ^ ([^…]) will match a string containing any character not in that set.

Example 9: Try This: Regular expressions - brackets with ^

Do you remember being taught in grade-school spelling class that the letter ‘q’ is always followed by a ‘u’ in English?

Let’s check:

grep 'q[^u]' /usr/share/dict/words

(Actually, ardent Scrabble players and crossword puzzlers will recognize that /usr/share/dict/words is missing the word “qat”.)

Or how about that rule “‘i’ before ‘e’ except after ‘c’”?

grep 'cie' /usr/share/dict/words
grep '[^c]ei' /usr/share/dict/words
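The behavior of [^u] is easy to check on a made-up list. One subtlety worth seeing: [^u] must match some character, so a ‘q’ at the very end of a line does not match at all:

```shell
printf 'quit\nqat\nIraq\n' | grep 'q[^u]'
# Only qat matches: quit has a u after its q, and Iraq has no character
# after its q for [^u] to match.
```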

. matches any single printable character, including blanks, but not including the end-of-line character.

Example 10: Try This: Regular Expressions - Matching any single character

grep 'd.f' /usr/include/math.h

Can you think of a word that has two ’z’s separated by a single letter?

grep 'z.z' /usr/share/dict/words

If r is a regular expression, then r* matches zero or more successive strings, each of which matches r.

Example 11: Try This: Regular Expressions - Zero or more repeats

grep 'ER*N' /usr/include/math.h
grep '#.*if' /usr/include/math.h

Note that the behavior of * is very different in regular expressions than it is in file path wildcards.

If r is a regular expression, then r\+ matches one or more successive strings, each of which matches r.

Example 12: Try This: Regular Expressions - One or more repeats

grep 'ERN' /usr/include/math.h
egrep 'ER+N' /usr/include/math.h
grep 'ER*N' /usr/include/math.h
grep 'ER\+N' /usr/include/math.h

If r is a regular expression, then r\? matches zero or one strings matching r.

Example 13: Try This: Regular Expressions - Optional occurrence

egrep 'def' /usr/include/math.h
egrep 'define' /usr/include/math.h
egrep 'def(ine)?' /usr/include/math.h
egrep 'def(ine)?d' /usr/include/math.h

(Note that egrep uses the extended syntax, in which the ? is written without a backslash; in plain grep you would write def\(ine\)\?d.)

In essence, the ? makes the preceding item optional. (So the first and third commands in this example are actually equivalent.)
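With a contrived input (the words below are invented), the effect of the optional group is easy to see:

```shell
printf 'def\ndefine\ndefined\ndefd\n' | egrep 'def(ine)?d'
# defined and defd match: the trailing d must be present,
# while the "ine" in the middle may or may not be.
```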

If r is a regular expression, then ^r matches any string that begins with a substring matching r.

The grep and sed programs both use an entire line of text as the string to be searched, so for those programs ^ matches the beginning of a line.

Example 14: Try This: Regular Expressions - Beginning of the string

grep 'ex' /usr/include/math.h
grep '^ex' /usr/include/math.h

and

grep '[nN]' ~/playing/alas.txt
grep '^[nN]' ~/playing/alas.txt
grep '^ *[nN]' ~/playing/alas.txt

If r is a regular expression, then r$ matches any string that ends with a substring matching r.

The grep and sed programs both use an entire line of text as the string to be searched, so for those programs $ matches the end of a line.

Example 15: Try This: Regular Expressions - End of the string

Try:

grep '_' /usr/include/math.h
grep '_$' /usr/include/math.h

With so many special characters, you might wonder just what you’re supposed to do if you really want to search for lines containing a “*”, or a “?”, or a … The answer is given in our final rule:

If c is a special character, then \c matches that character in a string.

Example 16: Try This: Regular Expressions - Quoting special characters

grep '.$' ~/playing/alas.txt
grep '\.$' ~/playing/alas.txt

grep '3.' /usr/include/math.h
grep '3\.' /usr/include/math.h

This is by no means an exhaustive list of all the regular expression operations, but it’s probably enough for most purposes.

3 Sed Redux

sed has some regular expression features not useful in grep. In particular, if you place part of a regular expression inside parentheses (written as \( and \) ), then in the replacement string you can refer to whatever got matched by the parenthesized part of the expression via a back reference. If you have just one parenthesized expression, the back reference is \1. If you add another parenthesized expression, you can refer back to it as \2, and so on.
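As a quick self-contained illustration (the name below is invented), back references can echo the matched pieces into the replacement in any order:

```shell
# \1 refers to the first parenthesized group, \2 to the second.
printf 'Ada Lovelace\n' | sed 's/\([A-Za-z]\+\) \(.*\)/\2, \1/'
# prints: Lovelace, Ada
```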

Example 17: Try This: Back References in sed

cd ~/playing
cp ~cs252/Assignments/ftpAsst/names.txt .
cat names.txt
sed 's/\([A-Za-z]\+\) /*/' names.txt
sed 's/\([A-Za-z]\+\) \(.*\)/*/' names.txt

The first parenthesized expression matches a block of one or more alphabetic characters (recall that \+ means “one or more”). That is followed by a blank, which matches a blank in the file. The second parenthesized expression then matches a block of zero or more of any character, in effect swallowing up the rest of the line.

sed 's/\([A-Za-z]\+\) \(.*\)/\1/' names.txt
sed 's/\([A-Za-z]\+\) \(.*\)/\2/' names.txt
sed 's/\([A-Za-z]\+\) \(.*\)/\2, \1/' names.txt

Note how, in the final replacement pattern, we reversed the order of the two matched blocks, as well as inserting a comma between them.

Redirection and Pipes

Steven Zeil

Last modified: Nov 8, 2017

Contents: 1 Redirection 2 Pipes

One of the interesting ideas that pervades Unix is that many, if not most, programs can be viewed as “filters” or “transforms” that take a stream of text as input and produce an altered stream of text as output. Many Unix commands are designed to perform relatively trivial tasks, perhaps not very useful by themselves, that can be chained together in interesting and useful ways.

The practical consequence of this is that Unix shells devote special attention to a standard input stream that forms the main input to most programs/commands, and to a standard output stream that forms the main output from most programs/commands.

The shell attempts to make it easy either to redirect one of these standard streams to a file or to pipe the standard output stream of one program into the standard input of another. (There is actually a second output stream supported by many programs, the standard error stream, used for writing error/debugging messages.)

1 Redirection

For example, the program wc (for word count) reads text from its input stream and produces as its output stream three numbers indicating the number of lines, words, and characters that it saw. You could invoke this directly:

wc
Hello.
How are you?
^D

in which case, you would see as output:

2 4 20

For this to be very useful, however, we need to make it accept a file as input. This is done by using the < operator in the shell. Think of the < as an arrow indicating data flowing towards the command from a filename: If hello.c is this file:

#include <stdio.h>

int main () {
    printf ("Hello from C!\n");
    return 0;
}

then the command

wc < hello.c

produces the output

6 13 80

On the output end, the shell operator > redirects the standard output into a file (again, think of this as an arrow indicating data flowing into a filename from the command):

wc < hello.c > hello.wc

produces no output on the screen, but creates a file called hello.wc. That file will contain the output

6 13 80

of the wc command.
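Here is a minimal sketch of both redirections together, run in a scratch directory so nothing of yours is overwritten (the file name is made up):

```shell
cd "$(mktemp -d)"               # a throwaway scratch directory
echo 'one two three' > demo.txt # > sends echo's output into a file
wc < demo.txt                   # < takes wc's input from that file
# wc reports 1 line, 3 words, 14 characters
```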

Example 1: Try This:

cd ~/playing
echo Hello > greeting.txt
more greeting.txt
echo Goodbye > greeting.txt
more greeting.txt
echo Farewell > greeting.txt
more greeting.txt

There are slight differences in how redirection works depending on what shell (the program that reads your commands and executes them) you are using. If you are running in an account that was created before late summer of 2016, you are probably running tcsh. Newer accounts are set up to use the more modern and popular bash shell. You can check to see which you have by running the command:

echo $SHELL

If you are running tcsh, the shell will not allow you to use ordinary redirection to overwrite an existing file. In other words, if you run

echo hello > foo.txt
echo goodbye > foo.txt

the second command will fail and foo.txt will still contain the string “hello”.

In tcsh, to overwrite an existing file, you must use >!. For example:

echo hello > bar.txt
echo goodbye >! bar.txt

will leave a file bar.txt containing the string “goodbye”.

One thing that does not work is to use redirection with the same file both as input to and output from a single command.

Example 2: Try This:

cd ~/playing
echo Hello > greeting.txt
sed s/el/o/ < greeting.txt > holo.txt
more greeting.txt
more holo.txt
sed s/el/o/ < greeting.txt > greeting.txt
more greeting.txt

The first sed command rewrites the contents of the input file, saving the rewritten content in the output file.

The second sed command trashes the input file contents. That’s because the first step in preparing a file for output is to empty out any prior contents, and that is done by the command shell program that interprets your keyboard command and launches the sed program. So the file gets emptied out before the sed command is actually launched. sed never gets a chance to read the input file’s contents.

The output redirection operator has an important variant. Sometimes we would like to add output to the end of an existing file instead of replacing that file. This is done with the operator >>. So the code sequence

wc < hello.c > hello.wc
wc < hello.c >> hello.wc

would result in a file hello.wc with contents

6 13 80
6 13 80

regardless of whether hello.wc had existed previously.
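A minimal sketch of the difference between > and >>, using a temporary file so nothing of yours is disturbed:

```shell
f=$(mktemp)           # a throwaway temporary file
echo first  > "$f"    # > replaces any previous contents
echo second >> "$f"   # >> adds to the end instead
cat "$f"
# prints:
# first
# second
rm "$f"
```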

Example 3: Try This:

cd ~/playing
echo Aloha >> greeting.txt
more greeting.txt

2 Pipes

To pipe the output of one command into the input of another, use the shell operator |. A common example of a pipe is to take a command that may have a large amount of output and to pipe it through more to facilitate viewing.

Example 4: Try This:

ls /bin | more

As you gain facility with a greater variety of Unix text manipulation commands, you will find that redirection and pipes can be a powerful combination. For example, suppose that you have written a program myprog that emits a great deal of output, among which might be some error messages starting with the phrase “ERROR:”. If you wanted to read only the error messages, you could, of course, just view all the output, watching for the occasional error message:

myprog | more

But if the program produces a lot of output, this will quickly become tedious. However, we previously encountered the program grep, which scans its input stream, printing only those lines matching a given regular expression. By piping the myprog output through grep, we can limit the output to the part we really want to see:

myprog | grep "ERROR:" | more
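Since myprog is hypothetical, you can see the same filtering effect by faking its output with printf (the messages below are invented):

```shell
printf 'starting up\nERROR: disk full\nstep 2 done\nERROR: no permission\n' \
    | grep 'ERROR:'
# prints only the two ERROR lines; the progress messages are filtered out
```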

Example 5: Try This:

You can use sed to reformat the output from some commands. Give the following commands:

cd ~/playing
cat greeting.txt
cat greeting.txt | sed -e 's/^/She said, "/' -e 's/$/"./'

(You should still have greeting.txt from the earlier Try This exercise in this lesson.)

The -e option to sed allows us to string together multiple substitution commands.

Commands That Launch Other Commands

Steve Zeil

Last modified: Dec 13, 2016

Contents: 1 Backticks 2 xargs 3 find

Redirection allows you to alter the behavior of individual commands by changing where they get their input and where they store their output. Pipes allow you to combine commands in interesting ways.

Next we will look at other ways to combine selective inputs and combinations of commands. In particular, we will look at two commands, xargs and find, that are interesting in part because they allow you to issue other commands from them. First, though, we will look at a technique by which the output from one command can be supplied as part of the parameter list to a second command.

1 Backticks

Earlier, we looked at different forms of quoting in shell commands. We used double-quotes (") and single quotes (') to tell the shell to suppress its usual treatment of special characters, making them “normal” characters instead. In essence, quoting suppresses the normal activity of the command shell.

If, however, we write something between a pair of backticks (`, often found just to the left of the “1” key), almost the opposite happens. Instead of treating the enclosed characters as plain text, the shell treats them as a command, runs that command, and replaces the whole `-surrounded string by the output of that command.

Example 1: Try This

Give the following commands, noting the difference between the effects of the forward and backward quotes.

date
echo Today is date.
echo Today is 'date'.
echo Today is `date`.
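The substitution works with any command, not just date. A small deterministic sketch:

```shell
# The backticked command runs first; its output replaces the
# backticked text before echo ever runs.
echo Shouting: `echo hello | tr a-z A-Z`
# prints: Shouting: HELLO
```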

As an example of where this might be useful, suppose that you had been working on a large programming project, and now wanted to start another, similar project. Let’s assume that the old project is in a directory named oldProject and that we have just created a directory newProject to hold the new one. Now, you might start by copying the files from the old project to the new one:

cp oldProject/* newProject/

but that would copy not only your program’s source code but also files that you don’t want carried over to the new project, the *.o files and executables produced when you compiled the old project.

Now, if we could produce a listing of the files that we wanted, we could use backticks to feed that list into the appropriate spot in a cp command. So, let’s build that list. If we start with ls oldProject/* we get a list of all the files in the old project. Let’s filter that by removing the .o files:

ls oldProject/* | grep -v '\.o'

The backslash in front of the period is needed because, normally, a period in a regular expression means “match any single character”, and we actually want to match only periods. The -v option tells grep to select every line that does not match our pattern. So this should remove all .o files from the list of old project files.

Now, executables in Unix usually have no extension at all. So, we’d like to remove from our list any files that have no periods in them at all:

ls oldProject/* | grep -v '\.o' | grep '\.'

That gives us the list of files we want to copy, so we can now feed that into a cp command:

cp `ls oldProject/* | grep -v '\.o' | grep '\.'` newProject/

Example 2: Try This

Let’s set up a simple version of the example that we just went through. Give these commands:

cd ~/playing
mkdir oldProject newProject
cd oldProject
date > a.h
date > a.cpp
date > b.h
date > b.cpp
date > a.o
date > b.o
date > c.o
date > exec
ls
cd ..

Now, we’ve set up a dummy project. Let’s see how the example works:

ls oldProject/*
ls oldProject/* | grep -v '\.o'
ls oldProject/* | grep -v '\.o' | grep '\.'
cp `ls oldProject/* | grep -v '\.o' | grep '\.'` newProject/
ls newProject

2 xargs

xargs reads a list of file names from the standard input and fills those file names in to a command of your choosing. In its simplest form, you can use xargs like this:

xargs partial-command

In this form xargs will read a list of file names (paths) from the standard input and will simply tack them on to the end of the partial command.

Where do the file names come from? You’re not likely to type them directly at the keyboard. (If you were going to do that, you would probably just have typed the whole command.) So, usually, you will run this with a list of files names that you have collected into a text file, or you may pipe into xargs the output of a command that lists the files you want.

Example 3: Try This:

ls /bin
ls /bin | xargs echo Issuing a command on

Most command shells will have limits on just how long a single command can get, and xargs tries to be smart about the way it constructs commands. If the standard input contains a very large number of files, xargs will break the list up into pieces. Look at the output above. Can you see evidence that xargs has split the list into separate commands?

You can control the maximum number of files that xargs will pack into one command using the -n flag.

Example 4: Try This:

ls /bin | xargs -n 4 echo Issuing a command on

The most common reason for doing this is because not all Unix commands work on arbitrarily long lists of files. Some work only on a single file, making -n 1 a useful option.

In this basic form, xargs tacks the file names onto the end of the generated command. But sometimes you might want the filenames placed into the middle of the command. The -i option permits that. If xargs is executed with a -i flag, then it looks in your partial command for the characters “{}” and places your files there (one at a time, as if you had said “-n 1”).

Example 5: Try This:

ls /bin | xargs -i echo Hello {} world
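(In modern GNU xargs, -i is deprecated; -I{} is the equivalent, more portable spelling of the same idea.) A self-contained sketch with invented input:

```shell
# Each input line is substituted for {} in turn, one command per line.
printf 'a\nb\n' | xargs -I{} echo item: {}
# prints:
# item: a
# item: b
```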

As an example of using xargs, suppose that you have a directory with a large number of data files. You want to copy those files into a new directory. However, you have edited many of these files, so the original directory is littered with backup files left by the editors. You would prefer not to copy those backups. The backups can be recognized because some end in “.bak” and some end in “~”.

Now, normally, you would copy files from one directory to another like this:

cp oldDirectory/* newDirectory/

But there is no obvious way to rewrite the wildcard pattern * to exclude the backup files (wildcards are great for including things, not so great for excluding them).

But a list of file names is nothing more than text, and we have some powerful tools like grep for editing and selecting text.

Example 6: Try This:

mkdir ~/xargs
cd ~cs252/Assignments/xargs
ls
ls | grep -v '\.bak$'
ls | grep -v '\.bak$' | grep -v '~$'
ls | grep -v '\.bak$' | grep -v '~$' | xargs -i cp {} ~/xargs
ls ~/xargs

3 find

Wildcards give us a way to describe a number of files at once. But wildcards have a limitation. They can only describe one “level” of directories at a time. You can write a wildcard expression to look at a variety of files in one directory, or at a variety of files in one or more subdirectories of that directory, or in one or more sub-subdirectories of that one. But you cannot write a wildcard expression that will simultaneously describe files in a directory and in its subdirectories.

Some commands will try to help with this. Both cp and rm, for example, offer a -r flag (“r” for “recursive”) that will descend into an arbitrary number of levels of subdirectories, but these are all-or-nothing selections. You can’t be very selective about what files get processed this way.

This is where find comes in. find is the Swiss army knife of Unix commands. It provides all kinds of ways to select files, no matter how deep they are in your directory structure. It provides a variety of things it can do with selected files, or it can fill their names into an arbitrary command in a manner similar to xargs -i.

The general form of a find command is

find list-of-files-and-directories list-of-actions

find looks at each file and directory given in the list. For directories, it also looks at all files and directories inside those, descending as far as it can from directory to directory.

The actions in the command are all given as flags (beginning with “-”). Some actions will “do something” to a file. Others are used to select which files will be passed on to the later commands in the list.

Example 7: Try This:

ls /usr/include
find /usr/include

The most common selection action is -name, which is given a wildcard expression to match file names against.

Example 8: Try This:

ls /usr/include/w*.h
ls /usr/include/*/w*.h
find /usr/include -name 'w*.h'

The wildcard expression for -name must be quoted, because you don’t want the command shell to expand it before it launches find.

Other useful ways to select files include -type, which chooses different “types” of files. Directories are type ‘d’ and ordinary files are type ‘f’.

Example 9: Try This:

find /usr/include -type d -name 'u*'
find /usr/include -type d -name 'u*' -ls

Note that all the files listed are directories.

find /usr/include -type f -name 'u*'

You can also select files based on how long ago they were modified.

Example 10: Try This:

find ~ -mtime +7 find ~ -mtime -7

One of these lists only files you have modified within the past 7 days. The other lists files whose last modification is more than 7 days in the past.

Not sure which is which? Try

find ~ -mtime +7 | xargs ls -ld
find ~ -mtime -7 | xargs ls -ld

What can find do to files it selects? The simplest possibility is to simply print the file name, which is done by the action “-print”. In fact, that’s the behavior we have seen in all the examples so far, because it’s the default if you don’t do anything else to the selected files. Sometimes, though, you use -print because you want to print a file name and do something else to a file.

You can get a bit more information than just the name:

Example 11: Try This:

find ~ -mtime +7 -ls

The most powerful use of find comes from the -exec action, which allows you to specify an arbitrary Unix command that you want applied to selected files. The command is terminated by a quoted semi-colon (“\;”) and should contain the characters “{}” at the point where you want to insert the file name.

Example 12: Try This:

cp ~cs252/Assignments/xargs/* ~/xargs
ls ~/xargs
find ~/xargs -name '*.bak' -print -exec rm {} \;
ls ~/xargs
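The same pattern can be tried safely in a throwaway directory (the file names below are invented):

```shell
d=$(mktemp -d)                         # scratch directory
touch "$d/keep.txt" "$d/junk.bak"      # one real file, one fake backup
find "$d" -name '*.bak' -exec rm {} \; # rm is run once per selected file
ls "$d"
# prints: keep.txt
```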

Closely related is -ok, which asks for permission before applying a command.

Example 13: Try This:

ls ~/xargs
find ~/xargs -name '*~' -ok rm {} \;
ls ~/xargs

Actually, because many commands, such as grep, can be used to test files for certain properties, -exec can actually be used to select files as well as to operate on them.

Example 14: Try This:

find /usr/include -type f -exec grep --quiet math {} \; -ls

There are many other possible actions as well. See the man page for details.

Compiling and Executing Programs

Steven J Zeil

Last modified: Aug 17, 2019

Contents: 1 Compiling C and C++ 1.1 The Structure of C++ and C Programs 1.2 Compiling a Program With Only One Compilation Unit 1.3 Compiling With Multiple Compilation Units 1.4 Some Useful Compiler Options 2 Compiling Java Programs

1 Compiling C and C++

Now that you know how to create and edit files, you can generate new programs. The most commonly used languages in the CS Department at the moment are C++, C, and Java.

The most popular C++ and C compilers are g++ and gcc. (Actually, gcc and g++ are aliases for the same compiler being invoked with slightly different options.)

1.1 The Structure of C++ and C Programs

Although not really a Unix-specific topic, it’s hard to discuss how to compile code under any operating system without a basic understanding how programs are put together.

The source code for a C++ (or C) program is contained in a number of text files called source files. Very simple programs might be contained within a single source file, but as our programs grow larger and more complicated, programmers try to keep things manageable by splitting the code into multiple source files, no one of which should be terribly long.

There are two different kinds of source files: header files and compilation units. Header files are generally given names ending in “.h”. Compilation unit files are generally given names ending in “.cpp” for C++ code and “.c” for C code.

There are variations of these file extensions, particularly for C++. Other less common endings accepted by some C++ compilers for non-header files include “.C” “.cc” and “.cxx”.

Header and non-header (compilation unit) files are treated differently when we build programs. Each compilation unit is compiled separately from the others (Figure 1). This helps keep the compilation times reasonable, particularly when we are fixing bugs in a program and may have changed only one or two non-header files. Only those changed files need to be recompiled.

Figure 1: Building one program from many files

Header files are not compiled directly. Instead, header files are included into other source files via #include. In fact, when you invoke a C/C++ compiler, before the “real” compiler starts, it runs a pre-processor whose job is to handle the special instructions that begin with #. In the case of #include statements, the pre-processor simply grabs the relevant header file and sticks its content into the program right at the spot of the #include.

This can result in a dramatic increase in the amount of code that actually gets processed. The code shown here, for example, is pretty basic:

#include <iostream>
#include <string>

using namespace std;

int main()
{
    string greeting = "Hello!";
    cout << greeting << endl;
    return 0;
}

But the #include statements bring in an entire library of I/O and string-related declarations from the C++ standard library. Here, for example, is the output of the pre-processor for one compiler for that small block of code. (If you look at the very end, you can recognize the main code for this program.)

A header file can be #included from any number of other header and non-header files. That is, in fact, the whole point of having header files.

Header files should contain declarations of things that need to be shared by multiple other source files.

Compilation unit files should declare only things that do not need to be shared. As we go through all the compilation steps required to build a program, anything that appears in a non-header file will be processed exactly once by the compiler. Anything that appears in a header file may be processed multiple times by the compiler.

1.1.1 What Goes Into a Header File? What Goes Into a Non-Header File?

The short answer is that a header file contains shared declarations, a non-header file contains definitions and local (non- shared) declarations.

1.1.2 What is the difference between a declaration and a definition?

Pretty much everything that has a “name” in C++ must be declared before you can use it. Many of these things must also be defined, but that can generally be done at a much later time.

You declare a name by saying what kind of thing it is:

extern const int MaxSize;       // declares a constant
extern int v;                   // declares a variable
void foo (int formalParam);     // declares a function (and a formal parameter)
class Bar {...};                // declares a class
typedef Bar* BarPointer;        // declares a type name

In most cases, once you have declared a name, you can write code that uses it. Furthermore, a program may declare the same thing any number of times, as long as it does so consistently. That’s why a single header file can be included by several different non-header files that make up a program - header files contain only declarations.

You define constants, variables, and functions as follows:

const int MaxSize = 1000;                    // defines a constant
int v;                                       // defines a variable
void foo (int formalParam) {++formalParam;}  // defines a function

A definition must be seen by the compiler once and only once in all the compilations that get linked together to form the final program. A definition is itself also a declaration (i.e., if you define something that hasn’t been declared yet, that’s OK. The definition will serve double duty as declaration and definition.).

When a compilation unit is compiled, we get an object-code file, usually ending in “.o”. These are binary files that are “almost” executable - for some variables and functions, instead of the actual address of that variable/function, they still have its name. This happens when the variable or function is declared but not defined in that compilation unit (after expansion of #includes by the pre-processor). That name will be assigned an address only when a file containing a definition of that name is compiled. And that address will only be recorded in the object code file corresponding to the compilation unit source file where the name was defined.

The complete executable program is then produced by linking all the object code files together. The job of the linker is to find, for each name appearing in the object code, the address that was eventually assigned to that name, make the substitution, and produce a true binary executable in which all names have been replaced by addresses.

Understanding this difference and how the entire compilation/build process works (Figure 1) can help to explain some common but confusingly similar error messages:

If the compiler says that a function is undeclared, it means that you tried to use it before presenting its declaration, or forgot to declare it at all.

The compiler never complains about definitions, because an apparently missing definition might just be in some other non-header file you are going to compile later. But when you try to produce the executable program by linking all the compiled object code files produced by the compiler, the linker may complain that a symbol is

undefined (none of the compiled files provided a definition) or is multiply defined (you provided two definitions for one name, or somehow compiled the same definition into more than one object-code file).

For example, if you forget a function body, the linker will eventually complain that the function is undefined. If you put a variable or function definition in a .h file and include that file from more than one place, the linker will complain that the name is multiply defined.

1.2 Compiling a Program With Only One Compilation Unit

The simplest case for each compiler involves compiling a single-file program or, in general, a program with one compilation unit and some headers.

Example 1: Try This

1. Use an editor (e.g., emacs) to prepare the following files:

hello.cpp

#include <iostream>

using namespace std;

int main () {
    cout << "Hello from C++ !" << endl;
    return 0;
}

hello.c

#include <stdio.h>

int main () {
    printf ("Hello from C!\n");
    return 0;
}

2. To compile and run these, give the commands:

g++ -g hello.cpp
ls

Notice that a file a.out has been created.

./a.out
gcc -g hello.c
./a.out

The compiler generates an executable program called a.out. If you don’t like that name, you can use the mv command to rename it.

3. Alternatively, use a -o option to specify the name you would like for the compiled program:

g++ -g -o hello1 hello.cpp
gcc -g -o hello2 hello.c
ls
./hello1
./hello2

In the example above, we placed “./” in front of the file name of our compiled program to run it. In general, running programs is no different from running ordinary Unix commands. You just type

pathToProgramOrCommand parameters

In fact, almost all of the “commands” that we have used in this course are actually programs that were compiled as part of the installation of the Unix operating system.

As we have noted earlier, we don’t usually give the command/program name as a lengthy file path. We say, for example, “ls” instead of “/bin/ls”. That works because certain directories, such as /bin, are automatically searched for a program of the appropriate name. This set of directories is referred to as your execution path. Your account was set up so that the directories holding the most commonly used Unix commands and programs are already in the execution path. (You can modify your execution path, if desired, to add additional directories.) You can see your path by giving the command

echo $PATH

One thing that you will likely find is that your $PATH probably does not include “.”, your current directory. Placing the current directory into the $PATH is considered a (minor) security risk, but that means that, if we had simply typed “a.out” or “hello1”, those programs would not have been found because the current directory is not in the search path. Hence, we gave the explicit path to the program files, “./a.out” and “./hello1”.

1.3 Compiling With Multiple Compilation Units

A typical program will consist of many .cpp files. (See Figure 1.) Usually, each class or group of utility functions will have their definitions in a separate .cpp file that defines everything declared in the corresponding .h file. The .h file can then be #included by many different parts of the program that use those classes or functions, and the .cpp file can be separately compiled once, then the resulting object code file is linked together with the object code from other .cpp files to form the complete program.

Splitting the program into pieces like this helps, among other things, divide the responsibility for who can change what and reduces the amount of compilation that must take place after a change to a function body.

When you have a program consisting of multiple files to be compiled separately, add a -c option to each compilation. This will cause the compiler to generate a .o object code file instead of an executable. Then invoke the compiler on all the .o files together without the -c to link them together and produce an executable:

g++ -g -c file1.cpp
g++ -g -c file2.cpp
g++ -g -c file3.cpp
g++ -g -o programName file1.o file2.o file3.o

(If there are no other .o files in that directory, the last command can often be abbreviated to “g++ -o programName -g *.o”.) The same procedure works for the gcc compiler as well.

Actually, you don’t have to type separate compilation commands for each file. You can do the whole thing in one step:

g++ -g -o programName file1.cpp file2.cpp file3.cpp

But the step-by-step procedure is a good habit to get into. As you begin debugging your code, you are likely to make changes to only one file at a time. If, for example, you find and fix a bug in file2.cpp, you only need to recompile that file and relink:

g++ -g -c file2.cpp
g++ -g -o programName file1.o file2.o file3.o

Example 2: Try This

1. Use an editor (e.g., emacs) to prepare the following files:

hellomain.cpp

#include <iostream>
#include "sayhello.h"

using namespace std;

int main ()
{
    sayHello();
    return 0;
}

sayhello.h

#ifndef SAYHELLO_H
#define SAYHELLO_H

void sayHello();

#endif

sayhello.cpp

#include <iostream>
#include "sayhello.h"

using namespace std;

void sayHello()
{
    cout << "Hello in 2 parts!" << endl;
}

2. To compile and run these, give the commands:

g++ -g -c sayhello.cpp
g++ -g -c hellomain.cpp
ls
g++ -g -o hello1 sayhello.o hellomain.o
ls
./hello1

Note, when you do the first ls, that the first two g++ invocations created some .o files.

3. Alternatively, you can compile these in one step. Give the command

rm hello1 *.o
ls

just to clean up after the previous steps, then try compiling this way:

g++ -g -o hello2 hellomain.cpp sayhello.cpp
ls
./hello2

An even better way to manage multiple source files is to use the make command.

1.4 Some Useful Compiler Options

Another useful option in these compilers is -D. If you add an option -Dname=value, then all occurrences of the identifier name in the program will be replaced by value. This can be useful as a way of customizing programs without editing them. If you use this option without a value, -Dname, then the compiler still notes that name has been “defined”. This is useful in conjunction with compiler directive #ifdef, which causes certain code to be compiled only if a particular name is defined. For example, many programmers will insert debugging output into their code this way:

   ⋮
x = f(x, y, z);
#ifdef DEBUG
cerr << "the value of X is: " << x << endl;
#endif
y = g(z,x);
   ⋮

The output statement in this code will be ignored by the compiler unless the option -DDEBUG is included in the command line when the compiler is run.

Zeil’s 1st Rule of Debugging: Never remove debugging output. Just make it conditional. If you remove it, you’re bound to want it again later.

Zeil’s 2nd Rule of Debugging: Never leave your debugging code active when you submit your programs for grading. If the grader is using an automatic program to check the correctness of the output, unexpected output will make your program fail the tests. On the other hand, if the grader is reading the output to check its correctness, wading through extra output really ticks the grader off!

Sometimes your program may need functions from a previously-compiled library. For example, the sqrt and other mathematical functions are kept in the “m” library (the filename is actually libm.a). To add functions from this library to your program, you would use the “-lm” option. (The “m” in “-lm” is the library name.) This is a linkage option, so it goes at the end of the command:

g++ -g -c file1.cpp
g++ -g -c file2.cpp
g++ -g -c file3.cpp
g++ -g -o programName file1.o file2.o file3.o -lm

The general form of gcc/g++ commands is

g++ compilation-option files linker-options

Here is a summary of the most commonly used options for gcc/g++:

Compilation Flags

-c                  compile only, do not link
-o filename         Use filename as the name of the compiled program
-Dsymbol=value      Define symbol during compilation.
-g                  Include debugging information in compiled code (required if you want to be able to run the gdb debugger).
-O                  Optimize the compiled code (produces smaller, faster programs but takes longer to compile). Different levels of optimization can be invoked as -O1, -O2.
-I directory        Add directory to the list of places searched when a “system” include (#include <...>) is encountered.

Linkage Flags

-L directory        Add directory to the list of places searched for pre-compiled libraries.
-llibname           Link with the precompiled library liblibname.a

2 Compiling Java Programs

Java programs get compiled into object code for an imaginary CPU called the “Java Virtual Machine” (JVM). Consequently, you can’t execute compiled Java code directly. You must run a program that simulates a JVM and let that simulated computer execute the Java code.

That may seem a little convoluted, but the JVM simulator is easier to write than a “true” compiler. Consequently, JVM simulators can be built into other programs (such as web browsers), allowing Java code compiled on one machine to be executed on almost any other machine. By contrast, a true native-code compiler (e.g., g++) produces executables that can only be run on a single kind of computer.

The command to compile Java code is “javac” (“c” for compiler) and the command to execute compiled Java code is “java”. So a typical sequence to compile and execute a single-file Java program would be

javac -g MyProgram.java java MyProgram

Unlike most programming languages, Java includes some important restrictions on the file names used to store source code.

Java source code is stored in files ending with the extension “.java”.

Each Java source code file must contain exactly one public class declaration.

The base name of the file (the part before the extension) must be the same (including upper/lower case characters) as the name of the public class it contains.

So the command

javac -g MyProgram.java

compiles a file that must contain the code:

public class MyProgram ...

The output of this compilation will be a file named MyProgram.class (and possibly some other .class files as well).

If we have a program that consists of multiple files, we can simply compile each file in turn:

javac -g MyProgram.java
javac -g MyADT.java

but this might not be necessary. If one Java file imports another, then the imported file will be automatically compiled if no .class file for it exists.

So, if the file MyProgram.java looked like this

import MyADT;

public class MyProgram ...

then compiling MyProgram.java would also compile MyADT.java.

Beware, however, after the first compilation. javac by default only checks to see if an appropriately named .class file exists. If you subsequently make changes to MyADT.java and then recompile MyProgram.java, the compiler will not realize that MyADT.java needs to be recompiled as well.

Java programs come in two forms: applets and applications. Applets are placed on web servers and can be launched from HTML web pages so that the Java code runs whenever someone browses that page. Applications, on the other hand, are more like traditional programs that get invoked directly by the user.

To run a Java application, we use java:

java MyProgram

which looks for a file named MyProgram.class. Within that file, it looks for a compiled version of a function main that must have been declared this way:

public static void main (java.lang.String[] args)
{
   ⋮
}

and executes that function.

As Java programs get larger, programmers usually begin to group their classes into packages. Packages can also be grouped (nested) inside other packages. Again, Java has rules that cause the program’s modular structure to be reflected in the file structure used to store it. If a class is declared to be inside one or more levels of nested packages, then each package name is used as a directory name when storing the source code file.

For example, if we had source code like this:

package Project.Utilities;

class MyADT {
   ⋮

and

package Project;

import Project.Utilities.MyADT;

class MyProgram {
   ⋮
   public static void main (java.lang.String[] args)
   {
     ⋮
   }
}

then we would need a directory/file structure as shown in Figure 2. Inside the Project directory would be the MyProgram.java file and another directory named “Utilities”, and the MyADT.java file goes inside that Utilities directory.

Figure 2: Sample Java Package Structure

Now, here’s the part that trips up many a Java programmer:

When you compile and execute code in Java packages, you must always do so from the directory at the top of the package structure.

The javac compilation command takes the name of the source code file.

The java execution command takes the name of the class containing the main function.

So the compilation and execution commands would be

javac -g Project/Utilities/MyADT.java javac -g Project/MyProgram.java java Project.MyProgram

For projects with many packages, the compiled code is often packaged up into a single file, called a jar, with a file extension of “.jar”. This makes it easy to distribute an entire program as a single file. A jar file is actually a conventional “zip” compressed archive file with a little bit of extra directory information written into a special file (called the manifest) included in the archive.

A jar file can contain multiple programs. Sometimes there is a single preferred program specified in the manifest. If so, you can execute this program by simply saying

java -jar pathToTheJarFile

If there is no preferred program or you want to execute a program other than the preferred one, you use the normal java command to name the class containing the desired main function, but use the -cp option (described below) to tell the VM to look inside the jar file, e.g.,

java -cp myLargeProgram.jar Project.MyProgram

As with g++, there are several options that you may choose to employ when compiling and executing Java code. Here is a summary of the most commonly used ones:

Compilation Flags

-cp pathlist       Add the directories and jar files named in the pathlist (multiple items may be separated by ‘:’) to the list of places searched when importing other Java source code.
-g                 Include debugging information in compiled code (required if you want to be able to run the jdb debugger).
-depend            Check each imported class to see if its source code has been changed since it was last compiled. If so, automatically recompile it.
-deprecation       Check the code for features that used to be legal in Java, but are expected to become illegal in the near future.
-O                 Optimize the compiled code (produces smaller, faster programs but takes longer to compile).

Execution Flags

-cp pathlist       Add the directories and jar files named in the pathlist (multiple items may be separated by ‘:’) to the list of places searched when importing other compiled Java .class files.

Dealing with Error Messages

Steven Zeil

Last modified: Aug 17, 2019

Contents:
1 Capturing Error Messages
1.1 Capturing Errors in the Shell
1.2 Capturing Messages in an IDE
1.3 Compiling with emacs
1.4 Compiling with vim
2 Understanding the Error Messages
2.1 Cascading
2.2 Backtracking
3 Common C++ Error Messages
3.1 “…undeclared…”, “No matching function…”, or “No match for…” errors
3.2 “undefined reference to…” errors
3.3 For more…

Unless you are a much better programmer than I am, you will almost certainly make some mistakes and get some error messages from the compiler.

This is likely to lead to two problems: capturing the messages, and understanding the messages.

1 Capturing Error Messages

When your programs contain mistakes, compiling them in the command shell can result in large numbers of error messages scrolling by faster than you can read them.

There are two basic ways to deal with this flood. You can use redirection and pipes to send the error messages somewhere more convenient, or you can use IDEs (Integrated Development Environments), programs that launch the compiler and try to capture all output from it.

1.1 Capturing Errors in the Shell

We’ve talked before about how many Unix commands are “filters”, working from a single input stream and producing a single output stream. Actually, there are 3 standard streams in most operating systems: standard input, standard output, and standard error. These generally default to the keyboard for standard input and the screen for the other two, unless either the program or the person running the program redirects one or more of these streams to a file or pipes the stream to/from another program.
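A short experiment (the file names out.txt and err.txt are arbitrary) makes the separation between the two output streams visible:

```shell
# Write one line to standard output and one to standard error,
# then send each stream to a different file.
{ echo "normal output"; echo "error output" >&2; } > out.txt 2> err.txt

cat out.txt    # contains only: normal output
cat err.txt    # contains only: error output
```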

1.1.1 Pipes and redirection

We introduced pipes and redirection earlier. The complicating factor here is that what you want to pipe or redirect is not the standard output stream, but the standard error stream. So, for example, doing something like

g++ myprogram.cpp > compilation.log

or

g++ myprogram.cpp | more

won’t work, because these commands are only redirecting the standard output stream. The error messages will continue to blow on by.

The sequence “2>&1” in a command means “force the standard error to go wherever the standard output is going”. So we can do any of the following:

g++ myprogram.cpp > compilation.log 2>&1
g++ myprogram.cpp 2>&1 | more
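Here is a small self-contained demonstration (the log file names are arbitrary). Listing a nonexistent file produces a message on standard error, and 2>&1 is what captures that message into the log:

```shell
# Without 2>&1 the log stays empty; the error still appears on the screen.
# (The || true just absorbs ls's failure exit status, which is the point here.)
ls /no/such/file > plain.log || true

# With 2>&1 the error message lands in the log instead
ls /no/such/file > captured.log 2>&1 || true
cat captured.log    # the "No such file or directory" message
```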

A useful program in this regard is tee, which copies its standard input both into the standard output and into a named file:

g++ myprogram.cpp 2>&1 | tee compilation.log

Example 1: Try This: Capturing error messages

1. Create a directory in which to practice compiling.

mkdir ~/playing/sieve
cd ~/playing/sieve

2. Place a file sieve.cpp in your ~/playing/sieve directory with the following contents.

sieve.cpp

#include <iostream>

using namespavce std;

/*
 * Find and print all of the prime numbers smaller than or
 * equal to upToN
 */
void findPrimes (int upToN)
{
    if (upToN < 2)
        return;
    int* sieve = new int[upToN];
    int numPrimes = 1;
    sieve[0] = 2;
    cout << 2 << endl;
    for (int i = 3; i <= upToN; i += 2) {
        bool isPrime = true;
        for (int j = 0; isPrime && j < numPrimes && sieve[j]*sieve[j] < i; j++) {
            isPrime = (i % sieve[j] != 0);
        }
        if (isPrime) {
            cout << i << endl;
            sieve[numPrimes] = i;
            ++numPrimes;
        }
    }
    delete [] sieve;
}

int main()
{
    cout << "Prime number generator" << endl;
    cout << "What is the largest number we should check to see if it is prime? "
         << flush;
    int max;
    cin >> max;
    cout << "\n\nPrimes up to " << max << endl;
    findPrimes(max);
    return 0;
}

Do not type out all of that code by hand. Use the link to download the code and transfer it to the directory, or open an editor editing sieve.cpp in that directory and copy and paste the code into that editor.

3. Compile the code with the command

g++ -g -o sieve sieve.cpp

You should see a lot of error messages flying by, too quickly for you to read.

When the command is done, you will be able to see only the last few error messages. That’s really a shame, because usually it’s the earliest messages that are most meaningful.

4. Compile the code with the command

g++ -g -o sieve sieve.cpp > errors.log 2>&1
more errors.log

Now you can see that the first error occurs quite early in the code. Most of the rest appear to have been caused by the compiler’s inability to recover from the first error.

5. Compile the code again with the command

g++ -g -o sieve sieve.cpp 2>&1 | tee errors2.log
more errors2.log

This is very similar to the previous command, but there is, perhaps, less surprise about what will be found in the log file.

1.1.2 Capturing a Script

Another way to capture errors at the command line is via a program called script.

“script” causes all output to your screen to be captured in a file. Just say

script log.txt

and all output to your screen will be copied into log.txt until you say

exit

script output can be kind of ugly, because it includes all the control characters that you type or that your programs use to control formatting on the screen, but it’s still useful. (In particular, if you ever want to capture both the output of some program AND the stuff you typed to produce that output (e.g., so you can send it in an e-mail to someone saying “what am I doing wrong here?”), then script is the way to go.)

1.2 Capturing Messages in an IDE

An IDE (Integrated Development Environment) is a program that assists programmers by combining an editor, a mechanism for launching a compiler and capturing error messages, and usually support for debugging as well.

Most IDEs would need to be run in a graphics-mode session, and we will look at some of those later. But there are two programs of note that can function as an IDE even in a text-mode connection: emacs and vim.

Do one or both of the following sections:

1.3 Compiling with emacs

Example 2: Try This:

1. Use emacs to edit your sieve.cpp program:

cd ~/playing/sieve emacs -nw sieve.cpp

2. Now give the emacs command: M-x compile. At the bottom of the screen, you will be asked for the compile command. emacs will suggest the command make -k, a suggestion that will make much more sense after we have looked at make files.

For now, use the backspace key to remove that suggestion, then type in the proper command just as if you were typing it into the shell.

In this case, delete the suggested make command and replace it with

g++ -g -o sieve sieve.cpp

emacs will invoke the compiler, showing its output in a window. Figure 1 shows a typical emacs session after such a compilation.

3. In this case, there should be one or more error messages. The emacs next-error command will move you to the source code location of the first error. That command is given as C-x ` – that’s the backtick or backwards apostrophe, usually found on the same key as ~, not the one usually found on the same key as the quotation mark.

Each subsequent use of C-x ` will move you to the next error location in turn, until all the reported error messages have been dealt with.

Use this command to step through the errors.

4. The very first error is the critical one. When typing my first draft of that code, my finger had slipped when reaching for a ‘c’ and I typed “vc” instead. Try fixing that, saving the file, and then repeating the compilation.

Figure 1: Compiling a program in emacs, with syntax errors

1.4 Compiling with vim

Example 3: Try This:

1. Use vim to edit your sieve.cpp program:

cd ~/playing/sieve vim sieve.cpp

2. Now give the vim command: :make sieve.o.

This will actually only do a partial compilation of the program. That will have to do until we learn about make files in a later lesson. But it’s enough to track down most of the errors.

3. You’ll see a listing of the errors scroll by, leaving you with the last few and a prompt to “Press Enter… to continue”.

Press Enter.

4. Now you will be back at your editor view showing the code. But the cursor will be placed at the location of the first error, which you can see summarized on the bottom line of the editor.

5. The vim next-error command is :cn. That will move you to the next error message. Use this command repeatedly to step through the errors. You will eventually notice that the cursor in the source code also moves along as well.

If you want to go back to a previous error message, use :cp.

6. The very first error is the critical one. When typing my first draft of that code, my finger had slipped when reaching for a ‘c’ and I typed “vc” instead. Try fixing that, saving the file, and then repeating the compilation.

Figure 2: Compiling in vim

2 Understanding the Error Messages

2.1 Cascading

One thing to keep in mind is that errors, especially errors in declarations, can cascade, with one “misunderstanding” by the compiler leading to a whole host of later messages. For example, if you meant to write

string s;

but instead wrote

strng s;

you will certainly get an error message for the unknown symbol strng. However, there’s also the factor that the compiler really doesn’t know what type s is supposed to be. Often the compiler will assume that any symbol of unknown type is supposed to be an int. So every time you subsequently use s in a “string-like” manner, e.g.,

s = s + "abcdef";

or

string t = s.substring(k, m);

the compiler will probably issue further complaints. Sometimes, therefore, it’s best to stop after fixing a few declaration errors and recompile to see how many of the other messages need to be taken seriously.

2.2 Backtracking

A compiler can only report where it detected a problem. Where you actually committed a mistake may be someplace entirely different.

The vast majority of error messages that C++ programmers will see are

syntax errors (missing brackets, semi-colons, etc.)

undeclared symbols

undefined symbols

type errors (usually “cannot find a matching function” complaints)

const errors

Let’s look at these from the point of view of the compiler.

2.2.1 Syntax errors

Assume that the compiler has read part, but not all, of your program. The part that has just been read contains a syntax error. For the sake of example, let’s say you wrote:

x = y + 2 * x // missing semi-colon

Now, when the compiler has read only the first line, it can’t tell that anything is wrong. That’s because it is still possible, as far as the compiler knows, that the next line of source code will start with a “;” or some other valid expression. So the compiler will never complain about this line.

If the compiler reads another line, and discovers that you had written:

x = y + 2 * x    // missing semi-colon
++i;

it still won’t conclude that there’s a missing semi-colon. For all it knows, the “real” mistake might be that you meant to type “+” instead of “++”.

Now, things can be much worse. Suppose that inside a file foo.h you write

class Foo {
   Foo();
   int f();
// missing };

and inside another file, bar.cpp, you write

#include "foo.h"

int g() {...}

void h(Foo) {...}

int main() {...}

Where will the error be reported? Probably on the very last line of bar.cpp! Why? Because until then, it’s still possible, as far as the compiler knows, for the missing “};” to come, in which case g, h, and main would just be additional member functions of the class Foo.

So, with syntax errors, you know only that the real mistake occurred on the line reported or earlier, possibly in an earlier-#include’d file.

2.2.2 undeclared and undefined symbols

When you forget to declare or define a type, variable, function, or other symbol, the compiler doesn’t discover that anything is wrong until you try to use the unknown symbol. That, of course, may be far removed from the place where you should have declared it.

2.2.3 type errors

When you use the wrong object in an expression or try to apply the wrong operator/function to an object, the compiler may detect this as a type mismatch between the function and the expression supplied as the parameter to that function. These messages seem to cause students the most grief, and yet the compiler is usually able to give very precise descriptions of what is going wrong. The line numbers are usually correct, and the compiler will often tell you exactly what is going wrong. That explanation, however, may be quite lengthy, for three reasons:

1. Type names, especially when templates are involved, can be very long and messy-looking.

2. Because C++ allows overloading, there may be many functions with the same name. The compiler will have to look at each of these to see if any one matches the parameter types you supplied. Some compilers report on each function tried, explaining why it didn’t match the parameters in the faulty call.

3. If the function call was itself produced by a template instantiation or an inline function, then the problem is detected at the function call (often inside a C++ standard library routine) but the actual problem lies at the place where the template was used/instantiated. So most compilers will list both the line where the error was detected and all the lines where templates were instantiated that led to the creation of the faulty call.

So, to deal with these, look at the error message on the faulty function call. Note what function/operator name is being complained about. Then look at the line where the faulty call occurred. If it’s inside a template or inline function that is not your own code, look back through the “instantiated from” or “called from” lines until you get back into your own code. That’s probably where the problem lies.

Here’s an example taken from a student’s code:

g++ -g -MMD -c testapq.cpp /usr/local/lib/gcc-lib/-sun-solaris2.7/2.95.2/../../../../include/g++-3/ stl_relops.h: In function `bool operator ><_Rb_tree_iterator,pair &,pair *> >(const _Rb_tree_iterator,pair &,pair *> &, const _Rb_tree_iterator,pair &,pair *> &)': adjpq.h:234: instantiated from `adjustable_priority_queue< PrioritizedNames,map >, ComparePriorities>::percolateDown(unsigned int)' adjpq.h:177: instantiated from `adjustable_priority_queue >, ComparePriorities>::makeHeap()' adjpq.h:84: instantiated from here /usr/local/lib/gcc-lib/sparc-sun-solaris2.7/2.95.2/../../../../include/ g++-3/stl_relops.h:43: no match for `const _Rb_tree_iterator,pair &,pair *> & < const _Rb_tree_iterator,pair &,pair *> &'

Now, that may look intimidating, but that’s mainly because of the long type names (due to template use) and the long path names to files from the C++ standard library. Let’s strip that down to the essentials:

g++ -g -MMD -c testapq.cpp
stl_relops.h: In function `bool operator >':
adjpq.h:234:   instantiated from `percolateDown(unsigned int)'
adjpq.h:177:   instantiated from `makeHeap()'
adjpq.h:84:   instantiated from here
stl_relops.h:43: no match for ... < ...

(This one is actually worse than most error messages, because it’s easy to miss the “<” operator amid all the <…> template markers.)

The problem is a “no match for” a less-than operator call in line 43 of a template within the standard library file stl_relops.h. But that template is instantiated from the student’s own code (adjpq.h) and so the thing to do is to look at those three lines (234, 177, and 84) for a data type that is supposed to support a less-than operator, but doesn’t.

2.2.4 const errors

Technically, “const”-ness is part of a type, so while sometimes these get special messages of their own, often they masquerade as ordinary type errors and must be interpreted in the same way.

3 Common C++ Error Messages

3.1 “…undeclared…”, “No matching function…”, or “No match for…” errors

In C++, most things that you give names to (e.g., variables, functions, etc.) need to be both declared and defined.

You declare something by introducing its name and stating what type of thing it is. For example:

int foo (int x);

declares a function named “foo” and a parameter named “x”.

The rule in C++ is that you must declare names before you try to use them.

If you get a message that says that something is undefined or is not a match for the name appearing in some line of code, then you have probably either

forgotten to declare it before using it, or

misspelled the name in the declaration, or

misspelled the name when you later used it, or

omitted to #include the header file where that thing is declared.

3.2 “undefined reference to…” errors

In C++, most things that you give names to (e.g., variables, functions, etc.) need to be both declared and defined.

You define something by supplying its name, description, and the initial value, function body, or other information that “completes” everything the compiler needs to know about that named thing. For example:

int foo (int x) { return x + 2; }

defines the function named “foo”.

The rule in C++ is that you must define things exactly once in all of the compilation units (.cpp files) that make up your program before producing your final executable.

If you are getting a message saying that some name is undefined, it means that the compiler/linker could not find that definition when it tried to generate your final executable program.

Most often, this seems to happen with functions. The possible causes are:

You have forgotten to supply a body for a function.

You misspelled the name of the function in the body, so the compiler thinks you are supplying a body for a completely different function.

You forgot to compile the .cpp file that has the function body or forgot to include the resulting .o file when you linked the rest of the program together.

You gave the wrong compilation command, and told g++ to treat a single .cpp file as the entire program even though there are multiple .cpp files making up the program.

3.3 For more…

See my C++ FAQ.

Project Management with Make

Steven Zeil

Last modified: Dec 7, 2017

Contents:
1 Makefiles
2 Symbols
3 Default Rules
4 Common Conventions
5 Creating a Makefile
5.1 Creating a makefile: Example
6 Is It Worth It?

When you begin to develop projects that involve multiple files that need to be compiled or otherwise processed, keeping them all up-to-date can be a problem. Even more of a problem is passing them on to someone else (e.g., your instructor) and expecting them to know what to do to build your project from the source code.

The Unix program make is designed to simplify such project management. In a makefile, you record the steps necessary to build both the final file (e.g., your executable program) and each intermediate file (e.g., the .o files produced by compiling a single source code file).

We say that a file file1 depends upon a second file file2 if file2 is used as input to some command used to produce file1.

When the make program is run, it then checks to be sure that all of the needed files exist, and that each needed file has been updated more recently than all of the files it depends upon.

1 Makefiles

The key bits of information in a makefile, therefore, are

For each file, a list of other files it depends upon, and

The command used to produce the dependent file from the files it depends upon.

A makefile may also include various macros/abbreviations designed to simplify the task of dealing with many instances of the same commands or files.

Suppose that we are engaged in a project to produce 2 programs, progA and progB. progA is produced by compiling files utilities.cpp, progA1.cpp, and progA2.cpp and linking together the resulting .o files. Program progB is produced by compiling file utilities.cpp and progB1.cpp and linking together the resulting .o files. All of the progA*.cpp files have an #include statement for a file progA1.h.

Here is a makefile for this project. This file should reside in the project directory, and should be called “Makefile” or “makefile”.

How do you create a Makefile? It’s just text, so fire up your favorite text editor (emacs).

progA: utilities.o progA1.o progA2.o
	g++ -g -DDEBUG utilities.o progA1.o progA2.o
	mv a.out progA

progB: utilities.o progB1.o
	g++ -g -DDEBUG utilities.o progB1.o
	mv a.out progB

utilities.o: utilities.cpp utilities.h
	g++ -g -DDEBUG -c utilities.cpp

progA1.o: progA1.cpp utilities.h progA1.h
	g++ -g -DDEBUG -c progA1.cpp

progA2.o: progA2.cpp utilities.h progA1.h
	g++ -g -DDEBUG -c progA2.cpp

progB1.o: progB1.cpp
	g++ -g -DDEBUG -c progB1.cpp

Do not, however, try to use a Windows editor to create Makefiles with the intention of transferring them over to the Unix system. There are too many things that can go wrong:

1. make is particularly sensitive to differences in the line endings used in Unix and Windows. If you fail to do the proper conversion, your make file will be broken.

2. Many Windows editors will play fast and loose with Tab characters. You may type a “Tab”, but the editor may replace it with multiple blank space characters. That will also break your make file.

The insidious part about both of these problems is that your make file will look just fine, because the flaw is in what are, in essence, “invisible” characters.

3. Many Windows programs will refuse to allow you to create a file named “Makefile” - they will insist upon adding a file extension such as “.txt” to the end. Even if you delete the “.txt” when you save the file, you will find when you transfer the file that Windows insisted upon calling it “Makefile.txt” instead of “Makefile”.

The key information in a makefile consists of a variety of rules for producing “target” files. Each target rule begins with a single line containing the name of the file to produce, a colon, and then a list of all files that serve as inputs to the commands that produce the file. Following that are any number of command lines that give the Unix commands to actually produce the file. Each command line starts with a “Tab” character (invisible in this listing).

A common mistake in preparing makefiles is to use ordinary spaces instead of a tab character in front of these command lines. The usual result of this mistake is the error message

Makefile:N *** missing separator

where N is the approximate line number where the error occurs. The “separator” here refers to the fact that make expects each rule to be separated from the others by one or more empty lines. A line that starts with a space (instead of a tab) is assumed to be a new rule. Since command lines are not separated from the rest of the rule, a command line starting with a blank instead of a tab appears to make as a new rule starting up without an empty line separating it from the previous rule.

Suppose that, with just this Makefile and the various source code files in your directory, you issued the command

make progB

make reads the Makefile and finds the rule for creating the file progB:

progB:➀ utilities.o progB1.o➁
        g++ -g -DDEBUG utilities.o progB1.o ➂
        mv a.out progB

➀ We (and the make command) know that this rule tells how to create the file progB because progB is listed as the target, to the left of the colon. You and I know, from the initial discussion, that progB is actually a program, but make does not know that and doesn’t care about that. make regards its task to be the creation of files, and it does not care what is in those files.

➁ From the dependency list to the right of the colon, make discovers that, in order to create progB, it will first need up-to-date copies of utilities.o and progB1.o.

➂ make also learns that, once it has up-to-date copies of utilities.o and progB1.o, it can then create progB by running the commands

g++ -g -DDEBUG utilities.o progB1.o
mv a.out progB

You can put any command lines into the commands section of a make rule that you can actually execute in Unix. This includes basic commands that we have studied previously (e.g., mv), invocations of the compiler (e.g., g++), and any other program (including ones that you yourself might have written).

In this case, make should realize that it cannot execute these two commands immediately because it does not have up-to-date copies of utilities.o and progB1.o. In fact, neither of these files exists. Therefore make sets out to create them, by looking for the appropriate rules for each of them. utilities.o depends upon utilities.cpp and utilities.h. Since these files exist and do not themselves depend upon anything else, make will issue the command to create utilities.o from them. This command is the “standard” command for making a .o file from a .cpp file:

g++ -g -DDEBUG -c utilities.cpp

Next make looks at progB1.o. It depends upon progB1.cpp which exists and does not depend upon anything else. So make uses the standard command for C++ files:

g++ -g -DDEBUG -c progB1.cpp

Now that both .o files have been created, make proceeds to build its main target, progB, using the command lines provided for that purpose:

g++ -g -DDEBUG utilities.o progB1.o
mv a.out progB

and the progB program has been created.

Example 1: Try This:

1. Download this zip file into a convenient directory and unpack it with the command

unzip makeTry.zip

2. cd into the resulting makeTry directory. Read through the files you find. You’ll see a copy of the makefile we have just been looking at, together with a collection of C++ source code files that implement a couple of rather simple programs.

3. Give the command

make progA

Look closely at the commands being run.

Do an ls and take note of the new files that have appeared.

Run the program with the command

./progA

4. Give the command

make progB

Look closely at the commands being run.

Do an ls and take note of the new files that have appeared. Run the program with the command

./progB

5. Give the command

make progA

again. Look closely at the commands being run.

Or, more precisely, the commands not being run. Because make is able to determine that all the necessary work to build progA has already been done, it is smart enough to not re-compile anything.

6. Give the commands

rm *.o progA
make

Because we removed the program and its components, make recompiles everything for us.

Note that it only rebuilds progA and not progB. That’s because, if we run make without specifying a target in the command, it defaults to building the target of the first rule in the makefile.

7. Edit progA1.cpp by adding a C++ comment.

Give the command

make progA

again. Look closely at the commands being run.

Notice that make is smart enough to re-compile the file we have edited, but knows that it doesn’t have to recompile utilities.cpp and progA2.cpp, because nothing we could have done to progA1.cpp would affect the compilation of the other .cpp files.

8. Give the command

make doesNotExist.dat

Take note of the response that you get.

This is a pretty common response from make and, to me, seems pretty self-explanatory. But for some reason many people seem to be very mystified when they see this response.

If you want to test your makefile without actually performing the commands, add a -n option to your command (e.g., make -n progB) and make will simply list the commands it would issue without actually doing any of them.

2 Symbols

Thinking ahead, we might realize that we won’t always want to compile with the flags “-g -DDEBUG” (the significance of which will be introduced in the debugging section).

We can make our makefile more flexible by gathering things that might need to be changed later into a symbol. make allows us to define symbols like this SymbolName=string and later use that symbol like this: $(SymbolName).
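As a quick sketch of the syntax (the file and target names here are hypothetical, and this assumes GNU make is available), you can watch a symbol being expanded by creating a tiny makefile and asking make for a dry run with -n:

```shell
# A two-rule makefile using a symbol; make -n prints the expanded
# commands without running them.  Note the Tab (printf's \t) in front
# of the command line.
cd "$(mktemp -d)"
printf 'CPPFLAGS=-g -DDEBUG\nfoo.o: foo.cpp\n\tg++ $(CPPFLAGS) -c foo.cpp\n' > Makefile
touch foo.cpp
make -n foo.o
```

make should echo the command with $(CPPFLAGS) replaced by -g -DDEBUG.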

So we could modify our makefile as follows:

# Macro definitions for "standard" language compilations
#
# First, define special compilation flags. These may change when
# we're done testing and debugging.
CPPFLAGS=-g -DDEBUG
#
#
# Targets:
#
progA: utilities.o progA1.o progA2.o
        g++ $(CPPFLAGS) utilities.o progA1.o progA2.o
        mv a.out progA

progB: utilities.o progB1.o
        g++ $(CPPFLAGS) utilities.o progB1.o
        mv a.out progB

utilities.o: utilities.cpp utilities.h
        g++ $(CPPFLAGS) -c utilities.cpp

progA1.o: progA1.cpp utilities.h progA1.h
        g++ $(CPPFLAGS) -c progA1.cpp

progA2.o: progA2.cpp utilities.h progA1.h
        g++ $(CPPFLAGS) -c progA2.cpp

progB1.o: progB1.cpp
        g++ $(CPPFLAGS) -c progB1.cpp

3 Default Rules

Makefiles can be simplified by introducing default rules for forming one kind of file from another. Here is an equivalent makefile that defines appropriate default rules:

# Macro definitions for "standard" language compilations
#
# First, define special compilation flags. These may change when
# we're done testing and debugging.
CPPFLAGS=-g -DDEBUG
#
# The following is "boilerplate" to set up the standard compilation
# commands:
.SUFFIXES:
.SUFFIXES: .cpp .c .h .o
.c.o: ; gcc $(CPPFLAGS) -c $*.c
.cpp.o: ; g++ $(CPPFLAGS) -c $*.cpp
#
# Targets:
#
progA: utilities.o progA1.o progA2.o
        g++ $(CPPFLAGS) utilities.o progA1.o progA2.o
        mv a.out progA

progB: utilities.o progB1.o
        g++ $(CPPFLAGS) utilities.o progB1.o
        mv a.out progB

utilities.o: utilities.cpp utilities.h

progA1.o: progA1.cpp utilities.h progA1.h

progA2.o: progA2.cpp utilities.h progA1.h

progB1.o: progB1.cpp

In the “SUFFIXES” area, standard commands are defined for producing a .o file from a .c or .cpp file. Of course, these standard commands simply invoke the C or C++ compilers. Command lines are not needed in a rule if the standard commands from the “SUFFIXES” area can be used to build the desired file.

4 Common Conventions

So far, we have talked about using make exclusively with compilation. But a makefile can control almost any sequence of operations that build one kind of file out of others.

Certain conventions have arisen that you will find useful in designing your own makefiles and in using the makefiles of others. Most of these involve certain “artificial” targets that you can use when issuing the make command.

These are just conventions. They don’t happen automatically, but most people who design makefiles set them up to work this way.

all - The target all compiles and builds everything (except, sometimes, documentation). So one of the more common ways to invoke make is to say

make all

The all target rule, if it is present, is always given as the first rule in the makefile. The make command, if not given any target, always goes to the first target rule. So, in most makefiles,

make

will also compile and build everything.

clean - The command make clean is often used to clean up a directory, deleting all the temporary files produced by make all, leaving only the original files (e.g., the source code) from which everything else can be rebuilt later, if desired.

install - In programs that must be “installed” by placing them in special directories, this target controls the commands necessary to do that installation.
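A toy example (invented file and target names; assumes GNU make is installed) shows the all and clean conventions, and the default-target behavior, in miniature:

```shell
# Conventional "all" and "clean" targets in a minimal makefile.
cd "$(mktemp -d)"
printf 'all: out.txt\n\nout.txt:\n\techo hello > out.txt\n\nclean:\n\trm -f out.txt\n' > Makefile
make                  # no target named: builds the first rule, "all"
ls out.txt            # the generated file is there
make clean            # deletes everything "all" built
ls out.txt 2>/dev/null || echo "out.txt removed"
```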

A common sequence for building and installing new Unix software is therefore:

make
make install
make clean

test - Less common; runs a test suite to see if the program built successfully.

doc (or docs) - May be used to build program documentation, user manuals, etc.

We can bring our sample makefile into conformity with these conventions as follows:

# Macro definitions for "standard" language compilations
#
# First, define special compilation flags. These may change when
# we're done testing and debugging.
CPPFLAGS=-g -DDEBUG
#
# The following is "boilerplate" to set up the standard compilation
# commands:
.SUFFIXES:
.SUFFIXES: .cpp .c .h .o
.c.o: ; gcc $(CPPFLAGS) -c $*.c
.cpp.o: ; g++ $(CPPFLAGS) -c $*.cpp
#
# Targets:
#
all: progA progB

clean:
        rm progA progB *.o

progA: utilities.o progA1.o progA2.o
        g++ $(CPPFLAGS) utilities.o progA1.o progA2.o
        mv a.out progA

progB: utilities.o progB1.o
        g++ $(CPPFLAGS) utilities.o progB1.o
        mv a.out progB

utilities.o: utilities.cpp utilities.h

progA1.o: progA1.cpp utilities.h progA1.h

progA2.o: progA2.cpp utilities.h progA1.h

progB1.o: progB1.cpp

Example 2: Try This:

1. Edit the makefile from the prior Try This to match the listing above.

Don’t forget to use tab characters and not spaces in front of the indented commands.

2. In that makeTry directory, give the commands:

make progA
ls
make progA
ls
make clean
ls
make all
ls
make clean
ls
make
ls

observing which commands are issued and which files are created at each step.

5 Creating a Makefile

When you are faced with the task of writing your own makefiles, the best way to start is to

1. Make a list of all files that will be involved in your project. This includes the file(s) that constitute the overall “goal” of the project, the files that you yourself will create “manually” using emacs or some other text editor, and all the intermediate files that will be produced by the build commands.

If you’re not sure what those intermediate files will be, write out a list of the commands that you would use to build the project if you were typing those commands, one at a time, into a command shell. Avoid compilation commands that perform multiple compile and link steps via a single command. Think instead of compiling each source code file separately and then separately linking the object files together. Every file mentioned in any of these commands should probably be in your list.

2. Now, divide the files in your list into two groups. The first group is the files that you create via a text editor or other interactive program. The second group consists of those files that will be created by running a non-interactive Unix command using other files (from either or both groups) as input.

How can you tell if a command is interactive or not?

It’s not enough to look and see if the file contains text, or even if it looks like program source code. By now you are familiar with lots of commands (e.g., grep and sed) that are used to generate text and that, with the aid of redirection, can put that text into a file. You’ve even seen examples of using such commands to alter and produce program source code. In fact, as you move to more advanced forms of programming, you will find that programs and commands that generate part of your program’s source code are quite common.

The safest way to tell if a command is interactive or not is to learn just what that command does. Another possibility is to try running the command and see if it actually stops and asks you for any additional input via the keyboard or mouse.

Finally, if you have no instructions at all on how to generate a particular text file, then odds are you are supposed to maintain that text via an (interactive) editor such as emacs.

3. For each file in the second group, write a makefile rule with that file as the target, the Unix commands used to produce it as the command part of the rule, and any files that will be read by those commands listed as the dependencies of the rule.

If the goal of the project is to produce a single file (e.g., a program executable), the rule with that file as the target should appear first. The other rules can occur in any order.

4. (Optional) Add symbols, artificial targets, and other refinements as desired to make your makefile simpler or more convenient to work with.

5.1 Creating a makefile: Example

Suppose that we are engaged in a project to produce 2 programs, progA and progB.

progA is produced by compiling files utilities.cpp, progA1.cpp, and progA2.cpp and linking together the resulting .o files.

Program progB is produced by compiling file utilities.cpp and progB1.cpp and linking together the resulting .o files.

The files utilities.cpp, progA1.cpp, and progA2.cpp each have an #include statement for a file utilities.h. Also, both of the progA*.cpp files have an #include statement for a file progA1.h.

Let’s step through our procedure for creating the makefile.

1. Make a list of all files that will be involved in your project.

The files involved will be: progA, progB, utilities.cpp, progA1.cpp, progA2.cpp, utilities.o, progA1.o, progA2.o, progB1.cpp, progB1.o, utilities.h, progA1.h.

2. Now, divide the files in your list into two groups: files created interactively and files created via non-interactive commands.

The .o files and the main programs are all produced non-interactively. The rest of the files are all program source code, typically (though not always) produced via an editor. So we get

Group 1: utilities.cpp, progA1.cpp, progA2.cpp, progB1.cpp, progA1.h, utilities.h.

Group 2: progA, progB, utilities.o, progA1.o, progA2.o, progB1.o.

3. For each file in the second group, write a makefile rule…

Since we have 6 files in group 2, we need to write 6 rules. For example, we will need a rule for progA1.o. That rule will have progA1.o as its target, the command used to produce it (g++ -g -DDEBUG -c progA1.cpp) in the command part of the rule, and any files that will be read by the g++ command in the inputs/dependencies part of the rule. Obviously, progA1.cpp is one of those inputs. But the description above tells us that progA1.cpp includes progA1.h and utilities.h, so those will also be read when we compile and need to be listed in the rule:

progA1.o: progA1.cpp utilities.h progA1.h
        g++ -g -DDEBUG -c progA1.cpp

Continuing on like that, we eventually get a full set of six rules:

progA: utilities.o progA1.o progA2.o
        g++ -g -DDEBUG utilities.o progA1.o progA2.o
        mv a.out progA

progB: utilities.o progB1.o
        g++ -g -DDEBUG utilities.o progB1.o
        mv a.out progB

utilities.o: utilities.cpp utilities.h
        g++ -g -DDEBUG -c utilities.cpp

progA1.o: progA1.cpp utilities.h progA1.h
        g++ -g -DDEBUG -c progA1.cpp

progA2.o: progA2.cpp utilities.h progA1.h
        g++ -g -DDEBUG -c progA2.cpp

progB1.o: progB1.cpp
        g++ -g -DDEBUG -c progB1.cpp

which is the example that we saw at the beginning of this lesson.

4. Add symbols, artificial targets, etc.

Remember that, if you invoke make without giving an explicit target, then by default it builds the first target. In this case, that would be progA.

We would probably prefer that, as a default, it built both programs in our project. So the single most useful thing we can add would be an artificial “all” target that builds both programs:

all: progA progB

progA: utilities.o progA1.o progA2.o
        g++ -g -DDEBUG utilities.o progA1.o progA2.o
        mv a.out progA

progB: utilities.o progB1.o
        g++ -g -DDEBUG utilities.o progB1.o
        mv a.out progB

utilities.o: utilities.cpp utilities.h
        g++ -g -DDEBUG -c utilities.cpp

progA1.o: progA1.cpp utilities.h progA1.h
        g++ -g -DDEBUG -c progA1.cpp

progA2.o: progA2.cpp utilities.h progA1.h
        g++ -g -DDEBUG -c progA2.cpp

progB1.o: progB1.cpp
        g++ -g -DDEBUG -c progB1.cpp

6 Is It Worth It?

Now, creating a makefile may seem like a lot of trouble the first time that you want to compile your program. The payoff comes while you are testing and debugging, and find yourself making changes to two or three files and then needing to recompile. Which files do you really need to recompile? It can be hard to remember sometimes, and the errors resulting from an incorrect guess may be hard to understand. make eliminates this problem (as well as just being easier to type than a whole series of recompilation commands). (This is why, when you give the M-x compile command in emacs, the default compilation command is “make” rather than a direct use of any particular compiler.)

Most of the details of generating a makefile can be automated. The gcc/g++ compiler, for example, will actually write out the makefile rules that would determine when a given .c or .cpp file needs to be recompiled. I’ve used this idea in this “self-constructing” Makefile. To use it, copy it into your working directory where you keep the source code files for any single program. Your copy must be named “Makefile”. Edit your copy of the file to supply the appropriate program name, list of source code files needed for that program, and to indicate whether the final step (linking) should be done with the C (gcc) or C++ (g++) compiler.

Now you can compile your program by saying make, and clean up afterwards with make clean.

As you continue to work with your code, just remember to keep the OBJS list in the Makefile up to date.

Compiling in Editors

Steven Zeil

Last modified: Aug 2, 2018

Contents: 1 Compiling with emacs 2 Compiling with vim

In an earlier lesson we saw that the editors emacs and vim can be used as programming tools to compile code and capture the inevitable compilation error messages.

In the exercises for that lesson, we had to work around the fact that

1. Both emacs and vim assume that you will use a make file to provide the compilation commands for your code.

2. We had not yet covered make files, and so had to work around that assumption to directly enter the compilation commands that we wanted.

It’s worth taking a moment, therefore, to practice with those editors again now that we know how to use make files.

Do one or both of the following sections:

1 Compiling with emacs

Example 1: Try This:

1. We’re going to start by getting a fresh copy of a slightly modified version of our “Sieve” program.

rm ~/playing/sieve/*
cd ~/playing/sieve
cp ~cs252/Assignments/sieve/* .
ls

You should have two .cpp files, one .h file, and one makefile.

emacs -nw makefile

2. Now give the emacs command: M-x compile. At the bottom of the screen, you will be asked for the compile command. emacs will suggest the command make -k. Since we have a makefile for this project, that suggestion is just fine, so hit Enter and let emacs invoke make for you.

You should see various compilation error messages accumulate in a separate pane.

3. Use the emacs next-error command, C-x` , to move through the errors. Notice that emacs loads the appropriate source code files for you at each step.

4. To go back to the beginning, switch your cursor into the buffer with the error messages (C-x o), move the cursor back to the top of the listing, and return to your source code pane (C-x o again).

5. Make a second pass through the errors, correcting them. Recompile the code to make sure you have fixed them.

6. Exit emacs and list your directory. You should find the executable program, sieve, left behind by your successful compilation.

2 Compiling with vim

Example 2: Try This:

1. We’re going to start by getting a fresh copy of a slightly modified version of our “Sieve” program.

rm ~/playing/sieve/*
cd ~/playing/sieve
cp ~cs252/Assignments/sieve/* .
ls

You should have two .cpp files, one .h file, and one makefile.

vim makefile

2. Now give the vim command: :make

This runs make, using the make file in the directory, and capturing the error messages.

3. You’ll see a listing of the errors scroll by, leaving you with the last few and a prompt to “Press Enter… to continue”.

Press Enter.

4. Now you will be back at your editor view showing the code. But the cursor will be placed at the location of the first error, which you can see summarized on the bottom line of the editor.

5. Use the vim next-error command, :cn, to move through the error messages. Notice that vim loads the appropriate source code file as you go.

Use :cp to move back to the start of the list.

6. Make a second pass through the errors, correcting them. Recompile the code to make sure you have fixed them.

7. Exit vim and list your directory. You should find the executable program, sieve, left behind by your successful compilation.

The X Window System

Last modified: Nov 27, 2018

Contents: 1 X - an Overview 1.1 Clients and Servers 1.2 X Window Managers 1.3 X Performance and Your Internet Connection 1.4 What Do You Need to Run X? 2 Running X - Accelerated X Variants 2.1 X2Go Client 2.2 Pyhoca-GUI Client 3 Running “Classic” X 3.1 “Native” X Servers on Linux or Mac OS/X 3.2 Launching a Client via an ssh command 3.3 StarNet X-Win32 3.4 Xming 3.5 Cygwin/X 4 Working in X 4.1 Process Control 5 Some X Applications 6 Appendix: Alternative & Related Technologies 6.1 Remote Desktop Protocol (RDP) 6.2 Virtual Network Computing (VNC)

X is the windowing system for Unix. By running X, you can have several windows on the screen open at once, each devoted to a different task. For example, you can be reading electronic mail in one window while a lengthy compilation is running in another. X also allows the display of graphics and of a variety of fonts.

1 X - an Overview

If you were seated at a Unix or Linux system, you would treat X much the same as you normally treat Microsoft Windows on most PCs - it’s there, it lets you launch your programs, and it makes things look pretty. Most of the time, you are less interested in Windows than in the programs that you are running under it.

The reason that X is of interest in this course is that X was designed, from its beginnings, for remote use. That means that, even if you are not seated at a Unix or Linux system, you can run programs on a remote Unix/Linux machine and you can see and interact with programs that are running on some other, remote machine somewhere else on the network.

Now, you can do remote access with Windows as well (“remote desktop”). But that gives you a separate “box” on your screen that often “feels” very unlike the local applications you might be running on your own machine. And, because you have to transmit an entire desktop worth of graphics, this can be slow and unresponsive if your network connection isn’t up to it. With X you can transmit an entire desktop if you wish, but for remote access you have the option of transmitting an application’s windows separately and having them look and behave like windows on your local machine’s operating system.

1.1 Clients and Servers

To use X, you need to run two different programs. One is an X server, the program that is responsible for controlling the display of your local machine. The server must run on the machine at which you are seated — its job is to actually draw things on your display and handle mouse and keyboard inputs. The other is an X client, which is simply an application program that was compiled to work with X. The client program might be running on your local machine (e.g., if your local machine runs Linux or OS/X) or on a remote machine somewhere on the Internet, such as one of our Dept. Linux machines.
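One concrete trace of this client/server split is the DISPLAY environment variable: X clients read it to learn which server to draw on. The value shown below is a made-up example of the form hostname:display-number.screen-number; tools that set up remote X connections normally set this variable for you.

```shell
# DISPLAY tells an X client where its X server is:
#   hostname:display-number.screen-number
# A hypothetical value, of the sort an X-forwarding login might set:
DISPLAY=localhost:10.0
export DISPLAY
echo "$DISPLAY"
```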

To run X, then you need to

1. Run an X server on your local PC.

2. Launch an X client application, probably on a remote machine.

This assumes, of course, that you have X server software installed on your local PC (or are carrying a portable X server on a flash drive). You may also need software to launch the remote application. Sometimes this comes with the server software. If not, almost any ssh software will do. If you’ve been using PuTTY, for example, for your telnet client, you can use it for ssh as well.

The course Library page has tips on what X servers and ssh programs are recommended, particularly for Windows PCs, and how to get them. 1.2 X Window Managers

X is a windowing system that can present a number of different appearances to the user. The appearance and behaviors that you actually see are controlled by a window manager, a program that is generally run as part of the X start-up procedure. A window manager controls not only cosmetic details like the colors used for window borders and other control elements, but also what elements actually appear on each window (e.g., does every window get a “close” button, what does that button look like, and where does it appear in the window?) and what menus appear in response to mouse clicks in various parts of the display. Figure 1, for example, shows X running twm (Tom’s Window Manager). Compare to Figure 2, which shows the same programs running under the ICE window manager.

Figure 1: Tom’s Window Manager

Figure 2: ICE Window Manager

Of course, if you are sitting at a machine that already runs a windowing operating system, then you can usually just use that as your X window manager. Figure 3 shows those programs running under a version of X that uses MS Windows as its window manager. That’s the approach we’re going to explore.

You can see a number of different window managers in action here.

Figure 3: XWin32 - X in Windows

1.3 X Performance and Your Internet Connection

One thing that should have been obvious from the examples above is that, in X, you are sending a lot more than just plain text from the remote machine to your local PC. Graphics will require a lot more bytes than text, and it takes time to transmit those bytes. Even “plain text” may require a lot more time than it would in an ordinary text-mode ssh session. To send a line of text under X, the remote machine may need to send information on what font to render the text in, and may in some cases even need to send the font itself to your PC.

If even ordinary text-mode connections seem slow to you, you will have real difficulties running X. Nonetheless, you can’t expect to get useful computing work done transferring only plain text in all circumstances, so trying to work out a connection scheme that you can live with is important.

When we talk about connection speed to the Internet, most Internet service providers (ISPs) only talk about how many bytes per second they can ship to you. For web browsing and audio/video streaming, that may be the most important factor. But programming activities tend to be highly interactive. Even in a telnet session, you type a command, you pause to see it echoed on your screen so you can make sure it’s what you wanted, you hit the Enter or Return key, you wait for a result to appear on your screen, then you start typing the next command. It’s that back-and-forth nature that we need to think about. For our purposes, then, what’s important is not only the bandwidth, the number of characters per second that can flow along a connection once the data stream has gotten started, but also the latency, the amount of time it takes for the last byte of a request to go from your machine to the remote machine and, assuming the remote machine responds immediately, for the first byte of its response to reach you.

Think of the Internet as a hose (or a series of tubes) carrying water. A fire hose is high-bandwidth - it’s nice and wide so that, once the water is flowing, you can get a tremendous amount of water, especially as compared to a narrow (low-bandwidth) garden hose. But if either kind of hose is very long, then there is a long wait from the moment when someone on one end turns the spigot on and when you see the water emerge from the open end - that’s high latency.

X dates back to a time when networks were far less reliable and packets of information were often lost during transmission. When X is displaying something complicated on your screen, there’s a lot of back-and-forth communication. The remote X client sends a packet of information, then waits for your X server to acknowledge that the packet was received and is correct before sending the next packet. That means that latency is at least as important as bandwidth in determining how well X will work for you.

Ideally, you would have an ISP that gives you both high bandwidth and low latency. Some cable services and DSL will give you that. Dial-up services are fairly low bandwidth, but the better ones offer very low latency and so may be OK with X.

Beware of “satellite broadband” ISPs. These bounce their data streams off a satellite in orbit. Even at the speed of light, this adds a lot of time to the delivery of each byte. Satellite broadband can offer high bandwidth, which is good, but they also frequently have very high latency. Even text-mode connections can be painful on a satellite broadband service. You hit Enter and wait…. Satellite broadband may be fine for web browsing, email and even for streaming audio or video because, once the data stream gets going, it just flies. But most programming tasks are interactive, so latency is more critical for programming students.

Luckily, there is an option for people with moderately high-latency connections. The CS Dept’s Linux ssh servers support NX, a variation on the X protocol that exploits data compression and caching to offer tremendous speed improvements in high-latency situations.

In the sections that follow, we will cover both ordinary X and NX. X may be your better choice when connecting from on campus. NX is usually a better bet when connecting from off-campus.

1.4 What Do You Need to Run X?

To recap, if you want to run an X application on a remote machine, then you need to

1. Start an X server on your local PC.

2. Launch the X client application on the remote machine.

Some of the X packages we’ll discuss combine these into a single step. Others keep them as two distinct steps, or give you an option of one step or two.

Often, the first remote client program you will launch is xterm, which provides you a window into which you can type commands, much like an ordinary ssh session. You can then use those commands to launch more visually interesting applications.

The course Library page has tips on what X servers packages are recommended, particularly for Windows PCs, and how to get them (for free!).

In the next two sections I’ll review several packages that you can use.

2 Running X - Accelerated X Variants

Earlier I discussed how X tends to be very sensitive to latency in your network connection to the remote machine. In practice, if you are running anything more visually complicated than emacs, this will be a real problem unless you are on the same local network as the remote machine. In other words, if you are connecting to a CS Dept machine from off-campus, you are going to find X frustratingly slow.

Luckily there are some “accelerated” X packages that speed up X, typically by reducing a lot of the “Did you get that last packet of information?”, “Yes, I got that last packet of information” back-and-forth acknowledgements and by using some modern compression techniques for sending graphics.

In practice, I have seen a more than ten-times speedup using accelerated X variants. Some students have reported 100- times speedups (that’s the difference between launching, say, a debugger and waiting 6 seconds for it to be drawn on your screen, versus waiting 10 minutes).

If you are (or are likely to be) connecting to our network from off-campus, use an accelerated X package.

There have been many attempts to define accelerated versions of X. The first to really see widespread use was NX 3, a protocol invented by the NoMachine corporation. NoMachine sold servers for NX 3, and made clients available for free. The protocol (and some of their server code) was released to the Open Source community, so other free servers and clients became available.

Others in the open source community have refined the NX 3 protocol further. One of the most popular derivatives from NX 3 is called X2Go.

The ODU CS Dept. runs servers supporting X2Go.

2.1 X2Go Client

The X2Go client is my recommended way for most people to use X on our servers. X2Go clients are available for Windows, OS/X, and Linux machines. Go to the Library page for information on how to obtain the X2Go client and install it on your local machine.

With X2Go you will be launching the X server on your local machine and a client program on the remote machine in one step.1

2.1.1 Setting Up an X2Go Session

The first time you run the client, you will see a main window looking like this. The left column is used to present information on your current remote connection. The right column shows the session presets that you have created and saved. Initially, both will be empty.

A “new session” dialog may pop up immediately on your first execution of the client. If not, find the new session button or the Session->new menu item to launch it.

To define a new session:

1. Fill in an appropriate name for your session.

2. Choose one of our Linux servers and fill its name in for the “Host:”.

3. Fill in your CS Dept. network login name, just as you normally type it to log in to our Linux servers, to the “Login” field.

4. Change the “Session type” to “Single application”.2

5. Type “xterm” for the command.

6. Click “OK” to save those settings.

You’ll be returned to the main window. Your new session settings will appear in the right column.

I recommend creating one session setup for each of our Linux servers, so that if one machine is down or overloaded, you can switch to the other.

If you later want to change some of the settings (or remove them), there is a small triangular button on the lower right of a settings box that drops down a menu allowing you to modify your settings.

2.1.2 Running an X2Go Session

From the main X2Go client window, click on a session settings box in the right column to open a connection to the remote machine.

Your session info moves over to the left column. Fill in your password and click OK to connect.

Within a few seconds, an xterm window should pop up. To close your session, simply exit from all programs you have launched from your xterm, and then type “exit” in the xterm (just as if you were logging out of a familiar ssh session).

If your session should become disconnected for any reason (e.g., a network glitch) while you still have programs running, then the next time that you connect via X2Go, you will be reconnected to your running programs.

You can take advantage of this deliberately to stop work for a little while by shutting down the X2Go client, then reconnect later and pick up from where you left off. But please don’t abuse this by leaving programs running in our shared environment for days or months at a time.

2.2 Pyhoca-GUI Client

Pyhoca-GUI is actually an alternate user interface for the same underlying X2Go library as used by the regular X2Go client. It is available for Windows and Linux only – no macOS (OS/X) version is available. A few students have had difficulties getting the X2Go client to run in Windows 10, but have been able to use Pyhoca-GUI instead.

Go to the Library page for information on how to obtain the Pyhoca-GUI client and install it on your local machine.

2.2.1 Setting Up a Pyhoca-GUI Session

When you launch Pyhoca-GUI, after a brief splash screen, you will get a small “ghost” icon in your system tray.

Right-click on that and select Profile Manager, then Add Profile.

The profile manager should pop up. Enter an appropriate name and then click on the Session tab.

In the Type: box, select “Custom command”. In the Custom Command: box, type “xterm”, if it’s not already in there.

Go to the Connection tab.

Enter your login name in the User: box and the name of your chosen SSH server in the Host: box.

Click Add to save your profile.

2.2.2 Running an X2Go Session via Pyhoca-GUI

1. In your system tray, left-click on the Pyhoca-GUI ghost icon, select Connect server, then the name of the profile you created.

You will be prompted for your password. After a few moments, you should be notified that a connection has been made.

2. While that connection is active, you can open as many xterm sessions as you like by left-clicking on the Pyhoca- GUI icon, selecting your profile/connection name, and then Start new session.

If your session should become disconnected for any reason (e.g., a network glitch) while you still have programs running, then the next time you select your profile/connection name, you will see that you have a session still running on the remote Linux machine and can select it to reconnect to that session.

3. To close your session, simply exit from all programs you have launched from your xterm, and then type “exit” in the xterm (just as if you were logging out of a familiar ssh session).

3 Running “Classic” X

Most people should use the accelerated X package, X2Go.

But, if you are on our local network and don’t see an accelerated package available, you might be able to use one of these unaccelerated X servers instead. You might also find this information useful if you yourself have two or more Linux or OS/X machines on your home network and want to remotely run programs from one on the other.

3.1 “Native” X Servers on Linux or Mac OS/X

If you are running a Linux PC and see windows, menus, etc., when you log in to your local machine, then you are already running an X server and don’t need to do anything special in this step. (Linux is an option for MS Windows users as well. You can get Linux distributions that can be booted and run from a CD or a flash drive. See the “Library” page for further information.)

Apple OS/X used to include a native X server. Later this became a package that was provided on some Mac models but not others. Now X is no longer provided on new Macs, but Apple supports the XQuartz X server project. Check the “Library” page for information on this. On these systems, your X server is always running. So you just need to launch a client program on a remote machine. You can do this via ssh.

3.2 Launching a Client via an ssh command

Open a terminal/console window where you can type commands.

Give the command

ssh -Y -f -l username machinename xterm

replacing “username” by your CS Unix login name and replacing “machinename” by the name of one of the CS Linux machines.

An xterm window should open up within a few seconds. Commands that you type into this new window are being run on the remote machine. For example, if you type

xterm &

you will see another new xterm window open up within a few seconds.

If you find these ssh commands a bit tedious to type out, consider adding some lines to your ~/.bashrc file on your local machine to add a simple alias for these ssh commands, similar to the aliases shown here. For example, I have

alias atria="ssh -Y -f -l zeil atria.cs.odu.edu xterm -title atria -sl 500 -sb "
alias sirius="ssh -Y -f -l zeil sirius.cs.odu.edu xterm -title sirius -sl 500 -sb "

for these two machines and similar aliases for all the remote machines I use on a regular basis. Then I can launch a remote xterm by simply typing the machine name.

3.3 StarNet X-Win32

StarNet X-Win32 is a commercial X server for Windows machines. It is installed on most of the CS Dept. lab machines.

From the “Start” button menu, find and run X-Config. A dialog box should come up listing known connections. If you are working on a lab machine, there may be some connections pre-defined. But these won’t necessarily be to machines that you want to connect to. If none of the connections are what you want, click the Wizard… button to start defining a new connection.

Give the connection a descriptive name and select “ssh” as the connection type.

Give the full name (including the “.cs.odu.edu”) of one of the Linux servers as the host.

Enter your login name, if you wish, but leave the password empty.

Select “Linux” from the commands list to get the appropriate xterm command into the command box.

You’ll be returned to the main X-Config window. Select your newly created connection, and click the Launch button. You’ll be prompted for your password, then an xterm window should open up shortly. You should also see a small blue X icon in your taskbar tray. If you want more applications on the same remote machine, launch them from that xterm.

If you exit from that xterm, you will see that even after it closes, X is still running — the X icon will still be in your taskbar tray. Right-clicking on this will bring up a menu. Under “My Connections”, you will find the connection that you have just created. You can launch a new xterm by simply selecting this.

In future sessions, if you rerun the X-Config program, you will find your new connection listed and you can simply select it and click the Launch button. Alternatively, you can run X-Win32 instead of X-Config, then right-click on the taskbar icon and select the saved connection from the menu.

3.4 Xming

xming is a free, open-source X server for Windows machines.

There are a few different ways to launch a client with xming. The easiest way is to use the XLaunch program (find it in the usual Start button menu) to start a wizard that will start Xming and a remote application at the same time. This is only good for starting your first remote application (usually xterm). If you want more applications on the same remote machine, launch them from that xterm.

Once you have started XLaunch, I recommend that you select “Multiple Windows”.

Because we’re trying to keep this a simple, one-step process, select “Start a Program”. (You would choose to start no client if you intended to later start a client in a separate step.)

For the program, type “xterm”. Select “Using PuTTY (plink.exe)”. For the name of the computer to connect to, use any of the CS Linux machines. Enter your login name and your password.

If you have a three-button mouse (the middle “button” is often a clickable scroll wheel), leave the “Additional parameters for xming” blank. If you have a two-button mouse, you might try adding the Xming parameter “-emulate3buttons 50”. This will allow you to simulate clicking a middle mouse button by pressing both mouse buttons at once.

3.5 Cygwin/X

CygWin is a free *nix emulation for Windows. Many open-source programs originally developed for Linux can be compiled, without any changes, in CygWin, yielding a Windows version of that Linux program.

If you have installed Cygwin/X, you may have a shortcut on your desktop or in your start menu to run it. Do so. You should see a small black “X” in the tray at the right end of your Windows task bar.

If you do not have such a shortcut, you can open a CygWin command window and give the command

startxwin

You should see the small black X mentioned above and will also probably see an xterm window open up in a few moments. This is an X application that is running on your local machine, not on a remote Unix box. This is your signal that the X server is running.

To launch a client program on a remote machine, follow the same instructions as given earlier for launching clients from Linux and OS/X machines using ssh commands.

4 Working in X

Example 1: Try This

Once you have X working, try opening up some additional windows with commands like:

xterm &
xterm -bg Yellow -fg Green -sl 1000 -sb -geometry 40x16 &
xclock &
emacs &
~cs252/public_html/clientserver.eps &

The exact appearance of what you will see will depend on the window manager used by your X server.

Most programs that run under X support a very simple “copy-and-paste” facility. Simply drag the mouse across a block of text in any window while holding down the left mouse button. Then position the mouse into a window where you would like that text to be “typed”. Click the middle mouse button, and the selected text will be sent to that window just as if you had typed it yourself.

Try typing one of the above commands (again) into one of your open xterm windows. Left-drag the mouse across that command. Left-click on the other xterm window to select it. Position your mouse near the cursor and click your middle mouse button.3 Hit Enter/Return if necessary, to complete the command.

A few things to keep in mind:

Some window managers will not immediately open new windows, but will first present you with an outline of the window that you must move with the mouse to the place on the screen you want, then click to lay the actual window down at that position.

Any time you enter commands in Unix, you can place an ampersand (“&”) at the end of the command to run that command in the background. This “disconnects” the command from your keyboard (in that window). You get a prompt immediately and can enter your next command even if the one you just launched has not yet completed.

Now this capability is not all that useful if you’re not running X. After all, if the program you are running needs input from you, it has been disconnected and can’t see your subsequent keystrokes. Also, if that program produces output, it will still appear, but will be intermingled with the outputs of any new commands you have entered in the meantime. So, if you’re not in X, the & is useful only for commands and programs that need no additional inputs and produce no additional outputs.
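The intermingling described above is easy to see for yourself. Here is a minimal sketch you can paste into any shell prompt; the echoed messages are invented for illustration:

```shell
#!/bin/sh
# A backgrounded command keeps writing to your terminal even after
# the shell has moved on to running your next command.
( sleep 1; echo "late output from the background job" ) &
echo "the shell moved on to this next command right away"
wait    # let the background job finish before the script exits
```

You will see the foreground message first and the background job's output arrive a second later, mixed into whatever you are doing by then.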

Under X, however, many useful programs open their own windows and direct their inputs and outputs through those new windows. For example, you would enter “emacs &” rather than “emacs”, and “firefox &” rather than “firefox”. Without the &, the window where you entered the command to launch a program would be useless to you until that program has finished. With the &, that program runs in its own window and the old window gets a new prompt and can still be used to issue more commands.

As noted above, most X programs support a simple copy-and-paste facility. When emacs is run under X, this cut-and-paste feature is supported, but in a different fashion. Text that has been selected in another window by dragging the mouse can be retrieved in emacs by the command C-Y (^Y). Text that has been “killed” in emacs by C-K, C-W, or M-W can be inserted into other windows by clicking the middle mouse button.

4.1 Process Control

Once you start using the & to run commands in the background, it’s easy to accumulate a large number of processes, all running simultaneously. It’s worthwhile at that point to learn a little bit about how to control your running processes.

Each time you type a command, a new process is launched to run that command. (That process may, in turn, launch other processes.)

When we talk about a process running in the foreground or in the background, we’re mainly talking about its relation to your keyboard. If a process is in the foreground, your command shell will not accept new keyboard input until the process finishes. If you do type additional characters, they will be passed on to the standard input of the foreground process (unless you have piped or redirected that input). If a process is in the background, the command shell will not wait on that process to complete before prompting you for another command.

You can run any command in the background by appending an ampersand (&) to the command line when you launch it. Now, it may not make a whole lot of sense to run many commands in the background. If they really need standard keyboard input or are going to be dumping a lot of text on your screen via standard output, you probably want to keep them in the foreground. But most X commands don’t use standard input and output. They communicate with you via the graphics interface. So you can, and probably should, launch most X commands in the background.
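If you want to experiment with this from a script or an xterm, the following sketch uses sleep as a stand-in for any long-running program; the special shell variable $! holds the PID of the most recently launched background job (the actual PID you see will of course differ):

```shell
#!/bin/sh
# & launches the job in the background; the shell records its
# process ID in the special variable $!.
sleep 30 &
bgpid=$!
echo "shell prompt is available again; background job PID is $bgpid"
kill "$bgpid"    # tidy up the example job
```

Knowing the PID this way saves you a trip through ps when you later need to signal the job.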

You can move a running command from the foreground to the background. Typing a ^z will ask the shell to pause the current foreground command. Then you can give the command

bg

(background) to move it into the background.

You can reverse this as well. The command

fg

(foreground) will move the most recently backgrounded command back to the foreground.
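Under the hood, ^z and bg work by sending stop and continue signals to the process, and you can send the equivalent signals by hand with kill, which is handy when the process you want to pause is not attached to your current keyboard. A sketch, again using sleep as a stand-in:

```shell
#!/bin/sh
# ^z asks the kernel to stop the foreground process; bg resumes it.
# kill can deliver the same stop/continue signals to any process you own.
sleep 30 &
pid=$!
kill -STOP "$pid"                        # pause it, roughly what ^z does
sleep 1                                  # give the state change a moment
echo "state: $(ps -o stat= -p "$pid")"   # a 'T' here means stopped
kill -CONT "$pid"                        # resume it, roughly what bg does
kill "$pid"                              # clean up the example job
```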

You can use the ps command to get a listing of the processes you have running. This command has a bewildering array of options, but the most useful is the -u option that allows you to request a list of all processes belonging to a particular user. (Even if you are running on your own private Linux box, you’ll find that there are dozens of processes belonging to “root” or other specialized accounts created for administrative purposes.)

For example:

> ps -u zeil
  PID TTY          TIME CMD
 6924 ?        00:00:00 sshd
 6925 ?        00:00:00 tcsh
 6927 ?        00:00:00 nxnode
 7273 ?        00:00:00 nxnode
 7276 ?        00:00:00 nxnode
 8047 ?        00:00:00 nxnode
 8050 ?        00:00:00 nxnode
 8052 ?        00:00:00 tee
 8054 ?        00:00:00 nxnode
 8059 ?        00:00:00 nxnode
 8094 ?        00:00:41 nxagent
 8381 ?        00:00:00 xterm
 8436 pts/59   00:00:00 tcsh
11619 pts/59   00:00:06 emacs
14827 pts/59   00:00:37 xxe
25369 pts/59   00:00:00 dbus-launch
25370 ?        00:00:00 dbus-daemon
25373 ?        00:00:00 gconfd-2
33127 ?        00:00:00 -settings-
79867 pts/59   00:00:00 ps

The output shows 4 columns. The PID column gives the unique process ID of each running process. The TTY column isn’t all that important. The TIME column shows how much CPU time has been consumed by this process. (If this number gets extremely large, it can indicate a program caught in an infinite loop.) The CMD column is perhaps the most interesting. It shows the name of the program running in that process. “sshd” is my ssh connection to this machine. “tcsh” is my command shell (running in the “xterm” shown later). Because I am using NX, there are several processes associated with that. You can also see that I am running emacs and a program named xxe, which is the editor that I use to prepare these lecture notes. At the very bottom of the list, you can see the process running the ps command that actually produced this output.

As noted earlier, it’s common for some processes to launch other processes. Adding the -f option to the ps command gives a great deal more information about the processes, including a PPID column that tells you the process ID of the “Parent” process that launched this one. For example,

> ps -fu zeil
UID        PID  PPID  C STIME TTY      TIME     CMD
zeil      1020  8436  0 09:50 pts/59   00:00:00 /bin/sh /home/zeil/usr/local/Doc
zeil      1024  1020  2 09:50 pts/59   00:00:52 java -Xss4m -Xmx512m -DXXE_GUI=
zeil      2585  8436  0 10:23 pts/59   00:00:00 ps -fu zeil
zeil      6924  6907  0 09:02 ?        00:00:00 sshd: zeil@notty
zeil      6925  6924  0 09:02 ?        00:00:00 tcsh -c /usr/lib/nx/nxnode --sla
zeil      6927  6925  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      7273  6927  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      7276  7273  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      8047  7276  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      8050  8047  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      8052  8047  0 09:02 ?        00:00:00 tee /home/zeil/.nx/C-atria-2056-
zeil      8054  8047  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      8059  7276  0 09:02 ?        00:00:00 /bin/bash /usr/lib/nx/nxnode --s
zeil      8094  8050  2 09:02 ?        00:01:58 /usr/bin/nxagent -persistent -R
zeil      8381  8059  0 09:02 ?        00:00:00 xterm
zeil      8436  8381  0 09:02 pts/59   00:00:00 tcsh
zeil     11619  8436  0 09:36 pts/59   00:00:08 emacs xwinlaunch/xwinlaunch.dbk
zeil     25369     1  0 09:06 pts/59   00:00:00 dbus-launch --autolaunch=d29af3f
zeil     25370     1  0 09:06 ?        00:00:00 //bin/dbus-daemon --fork --print
zeil     25373     1  0 09:06 ?        00:00:00 /usr/lib/libgconf2-4/gconfd-2
zeil     33127     1  0 09:24 ?        00:00:00 /usr/lib/gnome-settings-daemon/g

In this listing, you can see that the tcsh shell in process 8436 was launched by process 8381, which is an xterm that, in turn, was launched by one of the NX processes, 8059. Among the other information provided are the time at which each process started (STIME) and a more detailed view of the command actually running in the process.
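If you just want the parent of one particular process, ps can report a single column for a single PID rather than the full listing. In this sketch, $$ (the shell's own process ID) serves as the example target:

```shell
#!/bin/sh
# -o ppid= prints only the PPID column (the '=' suppresses the header);
# -p limits the report to one process, here the shell itself ($$).
parent=$(ps -o ppid= -p $$ | tr -d ' ')
echo "this shell is PID $$, launched by PID $parent"
```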

The ability to see process IDs can come in useful if you have a runaway process that will no longer respond to your keyboard. For example, if in your own programming, you accidentally create a program that goes into an infinite loop, the first thing that you would try is ^c to kill that process. But if that does not work (or if the process is in the background and won’t receive any such keyboard commands anyway), you can attempt to kill the process by command.

kill processID

will attempt to kill the indicated process “gently”, giving it a chance to save and close files and perform other housekeeping. If a subsequent check, after a few seconds, with the ps command shows that the errant process is still running, you can give the much more definitive “hard” kill:

kill -9 processID

which almost always does the job.

5 Some X Applications

Here’s some X-based programs you might want to try. Remember that help files on most programs can be accessed via the “man” command. If you can’t invoke them directly (e.g., programName), try /usr/X/bin/programName.

ddd	An interactive debugger that can “draw” pictures of your data.
eclipse	A powerful IDE (Integrated Development Environment) for Java and C++
evince	A viewer for Postscript and PDF files
gedit	An easy-to-use text editor.
	A graphics viewer and editor that rivals the best commercial products in the area.
gnuplot	Function/data graphing tool.
nemiver	A graphical front end to the gdb debugger.
	A drawing program for shape-based diagrams
xfontsel	A program that lets you explore the fonts available on the machine. (Use in conjunction with xlsfonts, which gives a simple listing of available fonts.)
xterm	Probably the most commonly used X application: opens a command window.

6 Appendix: Alternative & Related Technologies

X is one of the earliest protocols to support a GUI connection to remote machines, dating back to 1984. Not surprisingly, alternatives have arisen, though sometimes for slightly different purposes. In particular, conferencing systems allowing various degrees of application sharing or desktop sharing are a hot topic. In addition to those, a couple of desktop sharing systems are worth mentioning:

6.1 Remote Desktop Protocol (RDP)

Most Windows systems feature a remote desktop capability. You have to activate it to show anyone your desktop. Viewers (called RDP - Remote Desktop Protocol - viewers) are common even for Linux and other non-Windows systems. Some Linux distributions now offer RDP servers as well, allowing you to access their desktops from remote locations.

6.2 Virtual Network Computing (VNC)

VNC is a popular solution for small groups who want to view and/or manipulate a shared desktop. Windows, Linux, & Mac machines can all participate equally. You need someone to “host” the group by running a VNC server. The host machine’s screen will be visible to everyone who joins the session. Optionally, these remote participants may be able to supply mouse and keyboard input as well.

I recommend TightVNC, a free VNC package with a lot of nice features. One convenient one is that only the host absolutely needs to install the software. The server can generate a web page that runs the viewer as an applet. All the non-host participants simply direct their browsers to a special URL and the viewer software will be loaded and run automatically. (Of course, there is also a non-applet viewer that is a bit faster and offers more options/features.)

The trickiest part of hosting a VNC session is that everyone else must be able to connect to a specific port (or pair of ports) on the host machine. This may require opening a special hole in your firewall. It’s a reasonably safe thing to do, but the procedure varies according to the firewall/router you are using, and some people may stumble over this.

Another option would be to run a VNC server on one of the CS Dept Unix machines, then connect to it to view the “desktop” on that Unix machine. Since this “desktop” is actually an X session, VNC can actually be used as an indirect way of running X-based applications. Some people have reported that this approach works better than a direct connection if their Internet connection has high latency.

To do this, ssh or telnet in to one of the CS Unix machines. For the sake of this example, let’s say you chose tango.cs.odu.edu. On that machine, give the command

vncserver :NN -once

where NN is a two-digit number. I’d suggest starting around 20. If you get a message saying that there is already a vncserver on that port, try a different number. Eventually you should be asked to pick a password. Choose one. If you are using VNC just as an X alternative, this password prevents others from hijacking your X session. If you were using VNC as a way of collaborating with other people on a team project, you would let them know the NN you chose and the password via email, telephone, IM, etc. (This same password will be used for any future VNC sessions you create unless you delete the file ~/.vnc/passwd).

Now that a VNC server is running on your chosen machine, you need to run a VNC viewer (e.g., vncviewer.exe from the TightVNC package) on your local PC. When asked what machine you want to connect to, respond with tango.cs.odu.edu::59NN, replacing the NN by the two-digit number selected earlier. Of course, if you chose a machine other than tango, change that name too. Also note that those are two colons between the machine and number, not just one. You should then see a large rectangular window open up with a smaller window inside it (Figure 4). The outer rectangle is a “virtual desktop”. The inner window is an xterm running on the Unix machine. Try issuing a few commands in that xterm. Try moving it (click on the title bar and drag). Click on the open area of the desktop to open up a menu of basic desktop options.
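The “::59NN” in that viewer address is not arbitrary: VNC display :NN listens on TCP port 5900+NN, and the double colon tells the viewer you are giving a port number directly. A one-line sanity check (display number 20 is just the starting value suggested above):

```shell
#!/bin/sh
# VNC display :NN corresponds to TCP port 5900+NN,
# so display :20 is reached at port 5920.
NN=20
port=$((5900 + NN))
echo "connect your viewer to hostname::$port"
```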

When you are done, shut down the server by giving the command

vncserver -kill :NN

1: Yes, the client/server designation is a bit confusing. The X2Go client runs on your PC. It launches an X server on your machine, which waits for instructions on what to draw on your screen. The X2Go client then connects to an X2Go server on our network. That server allows you to run an X client program on our network, that connects to the X server on your PC and tells it what to draw. Luckily, it’s easier to use than to explain.

2: If you would like to experiment with running a full Linux desktop environment, you can select “MATE” instead. Then go to the Input/Output tab and set your desired “Display” size. I don’t recommend this on a regular basis, however, because lots of people running full desktops may load down the server to the point where it becomes unusable.

3: This assumes, of course, that you have a middle mouse button. If your mouse has two buttons with a clickable scroll wheel, the scroll wheel will function as a middle button. If you have a true two-button mouse, your X server may be configured to treat a simultaneous click of both mouse buttons as a middle-click. Give it a try.

Editing under X

Steven Zeil

Last modified: Aug 2, 2018

Contents:
1 gedit
2 emacs in Graphics Mode
2.1 Graphics-Mode Features
2.2 Enhanced Text Manipulation
3 vim in Graphics mode

Editing is one task that can benefit tremendously from the availability of a windowing system, enhancing the ability to work with more than one document at a time, adding support for menus, mouse-clicks to reposition the cursor, scroll bars, etc.

1 gedit

Every Linux distribution has at least one “vanilla” X-based editor, the graphics mode equivalent of nano or of Notepad in Windows.

On our Linux servers, this is gedit.

Example 1: Try This: gedit

1. Launch an xterm on one of the Dept Linux servers.

2. Copy the file ~cs252/Assignments/emacsxDemo/jabberwocky1.txt to ~/playing.

3. Launch gedit:

cd ~/playing
gedit jabberwocky1.txt &

4. In a few seconds, you should now be looking at a window showing the file jabberwocky1.txt.

Use the arrow keys and the mouse to move around in the file.

5. Explore the various menu options available to you.

6. Now use the Open menu item or toolbar button to open ~/playing/sieve/sieve.cpp, which should be left over from one of your earlier exercises in compiling.

Notice that, although gedit is a very basic editor, it still offers colored “syntax highlighting” for C++ (and other common programming languages).

7. Exit gedit when you are done.

Do one or both of the following sections:

2 emacs in Graphics Mode

If you have not already done so, try launching emacs from within an xterm. Notice that it pops up in a separate window. Compare this look …

… to emacs when run in an ssh text window. Note that you now have functioning menus and scroll bars. You may see new use of color. A little experimentation will show that you can, indeed, reposition the cursor with a simple mouse click, and you may find that your PageUp, PageDown and other special keys now behave in an intuitive manner.

Many people who dislike emacs complain about having to learn a large number of keyboard command sequences to get things done, like moving the cursor, scrolling the text, etc. What you should be able to see, now, is that such key sequences are only necessary for relatively uncommon actions, or if you are not running in a windowed environment. After all, if you are running in a text window, how can you expect a mouse click to do anything? In fact, emacs senses whether or not you are running in a windowing system, and takes advantage of it if you are, but leaves you with the key sequences as a backup mode when working under more primitive conditions.

2.1 Graphics-Mode Features

There are few emacs functions that can only be done in graphics mode. But there are also a lot of things that just get easier to work with once you have color and formatting options and menus.

Example 2: Try This: Menus in emacs

1. Launch an xterm on one of the Dept Linux servers.

2. Launch emacs:

cd ~/playing/sieve
make clean
emacs sieve.cpp &

When emacs comes up on your screen, you should see the various syntax elements of the C++ code appearing in different colors.

3. Click in various places in the C++ text. Notice that you can now position the cursor with a mouse click instead of the keyboard movement commands (although those are still available if you want them).

You should be able to select text by dragging the mouse as well.

You have a scroll bar, though it may not be visible at the moment because this file is so short.

4. You have already seen how emacs provides commands C-x 0, C-x 1, and C-x 2 to split the text into “windows” and recombine them again. If you have forgotten how these work, now might be a good time to refamiliarize yourself with them.

Of course, these aren’t “real” windows (or else those commands would be useless when working via text-mode ssh), but we have to cut emacs a little slack here. Its use of the term “window” to denote those text areas predates all window-based operating systems.

When running under a true windowing system like X, however, emacs does provide analogues of these commands that really do work with “real” windows (called “frames” by emacs and, for that matter, by most GUI programming libraries as well). These are produced by replacing the C-x by C-x 5. For example, C-x 5 2 will create a new frame.

Try this, and the other frame command analogues.

5. You should have a fully functional menu bar at the top of the emacs window. Click on a few of the menus just to see what’s there.

In the File menu, for example, you will find alternate ways to invoke the windows and frames commands that you were just experimenting with.

Note that, because you are currently editing a C++ file, you have a C++ menu of items specific to C++ editing. (Others that will be of interest for program development appear in the Tools menu, including entries for compiling your code and launching the debugger.)

Click on the Tools menu entry “Compile…”. You should see a familiar prompt appear in the status line as if you had used the M-x compile command. Go ahead and compile the code.

Even if you have no compilation errors, this will still split the display to show the compilation results. Notice that you can use your mouse now to move between the different windows instead of relying on C-x o.

6. Normally, after a successful compilation, we would want to get rid of the *compilation* window.

Do you remember the emacs command to make the sieve.cpp window fill the entire emacs frame?

Answer: Position the cursor anywhere inside the sieve.cpp window and give the command C-x 1.

Try that now.

7. Exit emacs. You can do this from the File menu or by the usual C-x C-c key sequence.

2.2 Enhanced Text Manipulation

It’s very common for both writers and programmers to wind up with multiple copies of files containing largely the same content. You might want to figure out what, if any, the actual differences are. For example, you might have an older and a newer version of a program’s source code and wonder just what changes you made to go from one version to another. In the Tools menu, you will find commands to Compare two files (or two buffers, if you have already loaded the files into emacs). This will show the two files, one beneath the other, with colored sections highlighting the differences and will allow you to step from one to the other.

Such comparisons are particularly useful when testing computer programs, where you often have a file indicating the expected output for a particular test, and would like to compare the actual output that you received by running your program to see if they match or, if they do not, exactly what the differences might be.

A related problem often faced by programmers is discovering that you have made some changes to a program in one file, but that you or a teammate have made other changes in a separate copy of that file. The Tools menu also includes commands to Merge two files (or two buffers). Like the Compare function, this highlights the differences between the two input files and allows you to step from one point of difference to the next. But the Merge function allows you to select which changes to copy into a third, merged copy that you can use to consolidate the separate changes.

Example 3: Try This: Compare and Merge in emacs

1. Launch an xterm on one of the Dept Linux servers.

2. Copy the file ~cs252/Assignments/emacsxDemo/jabberwocky1.txt to ~/playing.

3. Launch emacs:

cd ~/playing
emacs jabberwocky1.txt &

4. You should now be looking at a frame showing only the file jabberwocky1.txt. We’re going to make a copy of this file and make a few changes. Save this file as jabberwocky2.txt (use C-x C-w or choose “Save As” from the File menu).

Now, let’s invoke the emacs “batch” spell checker, used for checking an entire file at a time. M-x ispell-buffer is the command to launch this.

As you might suspect, the spell checker is going to have a field day with this poem.

For each misspelled word in the first and third lines of the poem (starting with the line beginning “Twas…”, not counting the title lines), if ispell suggests one or more replacement words, choose one of the suggested replacements by typing the indicated key. (For example, you should be able to replace “brillig” by “Broiling” by typing ‘4’.)

Leave all misspelled words in the 2nd line unchanged.

When the spell checker has moved past the first 3 lines of the poem, use q to quit the spell checker. Save this file.

5. The ispell spell checker actually is not restricted to use under a windowing system. It will work perfectly fine in a text-mode emacs session. But now let’s look at an alternative. Open up a new, blank file, called jabber.txt. Then give the command M-x flyspell-mode. (Don’t forget, by the way, that you can tap the space bar after the first few characters to attempt auto-completion of the command.) Then type in (not copy-and-paste) the following line of text:

The bandersnatch also appears in the pome "The Hunting of the Snark".

Notice that, in the on-the-fly spellcheck mode, misspellings are flagged immediately, just as in most commercial word processors. (And were you surprised that one of those words actually was in the checker’s dictionary?)

If you have a 3-button mouse, and your X server supports it, try middle-clicking on one of the misspelled words. If you have a two-button mouse, some servers will emulate the middle button if you press both buttons simultaneously. If this works for you, you should see a list of suggested corrections pop up.

Save that file, or dispose of it as you will. We’re done with that one.

6. A fairly common problem in program development is coming up with two versions of the same file and wondering if their contents are really different and, if so, what those differences are. Of course, we can use the Unix diff command to obtain this info, but it can be rather hard to interpret the differences from the diff output format.
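To see why the raw diff output takes some interpreting, here is a small sketch (with made-up file contents, not part of the assignment) showing diff’s default output format:

```shell
# Two hypothetical files that differ in a single line.
printf 'Twas brillig, and the slithy toves\nDid gyre and gimble in the wabe\n' > v1.txt
printf 'Twas broiling, and the slithy toves\nDid gyre and gimble in the wabe\n' > v2.txt

# diff's default output: "1c1" means line 1 of the first file was changed
# into line 1 of the second; "<" lines come from the first file and ">"
# lines from the second.  (diff exits with status 1 when the files differ.)
diff v1.txt v2.txt || true
```

The ediff facility in emacs presents exactly this same information, but highlighted in place in the two buffers, which is usually much easier to read.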

From the Tools menu, select “Compare (Ediff)” then “Two Files…”. Follow the prompts to select “jabberwocky1.txt” and “jabberwocky2.txt” as the files to be compared.

You should see a new, small window pop up momentarily. This is the control window for ediff.

The principal commands for ediff are n to go to the *n*ext difference, p to go back to the *p*revious difference, and q to *q*uit when you are done. Use n and p to navigate through the differences between the two files. You can also expand the command window to get a list of other ediff commands with ?. Another ? will close it again.

Quit ediff when you have viewed all the differences.

7. Sometimes when you have two versions of the same file, you find that you have gotten unlucky. Both versions contain changes that you want to keep. This can easily happen if you are working in teams on a project, or if you keep separate copies on your home PC and on the Unix network. emacs supports a merge facility that can let you selectively combine changes from two files.

Do a C-x 1. From the Tools menu, select “Merge” then “Files…”, and again select the two jabberwocky files.

Again, you can use n and p to navigate back and forth across the set of differences. In addition, however, the commands a and b will copy into the “merge” window the “A” and “B” versions of the differing text.

Using these commands, let the merged version of the files contain the “A” version of the first line of differing text but the “B” version of the 2nd difference. Then quit the merge process with q.

8. With the cursor positioned in the *ediff-merge* area, use C-x C-w to save this merged version as jabberwocky3.txt.

9. Exit emacs.

10. Back in your xterm window, do an ls to examine the files you have been working with. Give the following commands to compare the various jabberwocky files:

diff jabberwocky1.txt jabberwocky2.txt
diff jabberwocky2.txt jabberwocky3.txt
diff jabberwocky1.txt jabberwocky3.txt

Do the numbers of changed lines reported make sense, given the changes you actually made?

3 vim in Graphics Mode

vim also has an enhanced mode for graphics-mode connections, invoked either as gvim or as vim -g.

Example 4: Try This: gvim

1. Launch an xterm on one of the Dept Linux servers.

2. Launch gvim (or vim -g):

cd ~/playing/sieve
make clean
gvim sieve.cpp &

When vim comes up on your screen, you should see the various syntax elements of the C++ code appearing in different colors.

3. Click in various places in the C++ text. Notice that you can now position the cursor with a mouse click instead of the keyboard movement commands (although those are still available if you want them).

You should be able to select text by dragging the mouse as well.

You have a scroll bar, though it may not be visible at the moment because this file is so short.

4. Explore the various options available to you from the menu bar.

5. Try using the :make command to compile the code.

6. Exit vim.

Troubleshooting X

Steven Zeil

Last modified: Aug 2, 2018

Contents:
1 Common Problems
1.1 X Applications Seem to be Unreasonably Slow
1.2 “Cannot Open Display”
1.3 “connect localhost port 6000: Connection refused”
1.4 “connection … refused by server” or “Client is not authorized to connect to Server”
1.5 “xterm: unable to locate a suitable font”
1.6 Nothing at all seems to happen
1.7 If You Suspect Your Security Software is Blocking Your X Connections
2 X, Firewalls and Security Monitors
2.1 What Does a Firewall Do?
2.2 Problem: Displaying X is a Server’s Job
2.3 Tunneling X via ssh
2.4 So Much Security, So Little Safety

For some of you, an X server may be the most complicated piece of Windows software you have ever installed and run. If you encounter problems…

1 Common Problems

1.1 X Applications Seem to be Unreasonably Slow

1. Try using X2Go instead of a “pure” X server. Even if your local machine is a Linux or OS/X box with a built-in X server, you are likely to find X2Go faster for remote connections.

2. Choose your applications carefully. Applications with fancy tool bars and graphics may run slowly, or may take a long time to appear on your screen but then run at acceptable speeds. If my network connection is sluggish, I am, for example, more likely to run nemiver or gdb-mode within emacs than to use ddd.

3. You can reduce the amount of network traffic required by X by telling your X server software to always use a “backing store”. In X-Win32, this option is found by right-clicking on the small blue “X”, selecting X-config, and going to the “Display” tab. A backing store is simply a copy of the window’s graphics kept in your local memory. This can be used to refresh the screen when part of your X application gets momentarily covered up by some other window, or if you iconify the X application and then restore it from the task bar. Without a backing store, all the graphics may need to be retrieved from the remote client over your modem connection.

This won’t speed up all applications (the default setting of many X servers is to use a backing store only if the application program requests one, so some X applications may already be getting a backing store). But, for some, this might make a big difference.

4. Get out the old modem and find a cheap dial-up ISP. I have had students in remote, mountainous regions report that, although they would continue to use their satellite broadband (their only available broadband option) for web browsing, they got better responses by using dial-up for their programming work.

1.2 “Cannot Open Display”

This message is the most common problem that arises when trying to run X applications. Possible causes:

If you are also getting a message “connection … refused by server” and/or “Client is not authorized to connect to Server”, see 1.4 below.

You forgot to start the X server before trying to launch an application.

You made a text-mode connection to a remote machine (e.g., used PuTTY without setting X forwarding, or used an ssh command without the -Y option) and then tried to launch a graphics-mode X program from there.

You are running behind a firewall that is set to a particularly paranoid level of protection.

This can be a problem with ZoneAlarm and other firewalls that prompt you for permission every time a new program tries to access the network. It’s all too easy to answer “Block” just once and have that program be locked out forever afterwards.

It can also be a problem with some Linux distributions (e.g., some RedHat versions) that, by default, are configured with a firewall that blocks any X connections from other clients. See below.

1.3 “connect localhost port 6000: Connection refused”

The important thing about this message is that it indicates that the problem was with “localhost” (or “127.0.0.1”), which is the local machine that you are sitting at. The problem is not that some remote machine out on the Internet has refused you access, but that the machine you are sitting at has in fact refused to allow a connection from itself to itself.

This is pretty much an obvious sign of a firewall or other security program getting in the way. You might have gotten a pop-up window warning that a program (such as XWin.exe or ssh.exe) was trying to make or accept a connection on port 6000. If you told the firewall to block it, that’s the problem. If your firewall or security program is set to block unexpected connections without even asking you, that could be the problem.

The easiest way to check this would be to temporarily turn off your firewall and security software, then try launching an X client again. Unfortunately, I’ve seen some cases where even a switched-off firewall stays partly active and continues blocking things that it believes it has been previously told to block.

The exact fix for this depends a great deal on what firewall/security software you are running, but somehow you need to convince them to trust the X and ssh programs. Here are instructions for telling the Windows firewall to allow exceptions. Try adding those programs as exceptions. You might also want to try opening port 6000.

1.4 “connection … refused by server” or “Client is not authorized to connect to Server”

The “not authorized” and “refused by server” messages mean that you got at least part way through the process of forming an X connection. The X client (xterm) running on the remote machine was able to actually talk to your X server software (running on the PC you are seated at) but your X server software’s security settings shut it down.

This message generally comes from StarNet’s XWin32. I’ve never seen it reported from users of Cygwin/X.

Things to check:

If you are using ssh to connect to X-win32, and if you downloaded it from the ODU site, then it is important that you followed these instructions for installing it, rather than the instructions included in the downloaded package.

In particular, did you enter the OCCS registration key during installation instead of just “demo”? (XWin32 should warn you that you are in demo mode every time it starts up.) If you entered that code, you cannot use ssh to launch a session and therefore cannot use X-Win32 from behind a firewall/NAT server. You will need to uninstall X-Win32 and then reinstall using my directions.

Run the X-Win32 X-config program and check the “Security” tab. Make sure that “Access Control” and “Use XAuth” are both clear. Click OK in X-config and shut down XWin32 if you have it running. Restart XWin32 and try to connect again.

If you still get the “not authorized” message, then go back to that security tab again. This time, click “Access Control” and use the “Add…” button to add to the X-host list:

localhost 127.0.0.1

Again, click OK in X-config and shut down XWin32 if you have it running. Restart XWin32 and try to connect again via ssh.

1.5 “xterm: unable to locate a suitable font”

This message is usually caused by a font configuration problem on the Unix side. There is a simple workaround, however.

Log into your Unix account. Using emacs or some other editor, create a file named ~/.Xdefaults. In that file, place the following lines:

xterm*facename:dejavu sans mono:pixelsize=12
xterm*Geometry:80x36
xterm*foreground:midnight blue
xterm*background:white

Actually the first line is the critical one for this error - but you might find the effects of the other lines interesting. You may also want to play around with other combinations. To see what options you can put in for the facename, give the command

fc-list :scalable=true:spacing=mono: family

You might also experiment with changing “medium” to “bold” or changing the pixelsize.

1.6 Nothing at all seems to happen

If you are launching via ssh/PuTTY, check and make sure that you remembered to start your X server software and that it is running on your PC before you try to launch a client on the remote machine.

You may have an issue with your firewall or other security software.

1.7 If You Suspect Your Security Software is Blocking Your X Connections

If you are able to make an ordinary text-mode ssh connection, then leave your router firewall alone. No need to weaken it or to try turning it off. It’s not the problem.

(Windows, Mac) See if your X server will display clients running on your local machine. Start your X server software. Then

(Mac) Open a command/terminal window and give the command “xterm &”. You should see a window pop up that issues commands to your local Mac.

(Windows running Cygwin/X) Open a Cygwin command window and give the command “xterm &”. You should see a window pop up that issues commands to your local PC.

(Windows running Xming): Install the “Tools and Clients” option and try running xcalc or xclock.

If you are able to view an X client program running on your local PC, then you know at least that the X server software is working and the problem is definitely in your communications with the remote machine.

(Windows, Mac, Linux) If you have a router protecting you with its firewall, try disabling or turning off the software firewall on your PC. If that relieves the problem, then turn it back on, find the instructions for allowing specific programs to communicate through the firewall, and give your X server software that permission. If you are asked about specific port numbers, try 22 and 6000.

(Windows, Mac, Linux) If your PC is connected directly to your cable or DSL modem, I don’t recommend turning off your software firewall even momentarily. Find the instructions for allowing specific programs to communicate through the firewall, and give your X server software that permission. If you are asked about specific port numbers, try 22 and 6000.

(Windows, Mac, Linux) Find the instructions for modifying the settings of whatever security monitors you may be running. Look for any evidence that either your X server software or your ssh client is being blocked. Or try (temporarily) disabling them, one by one, to see if they are the problem, reactivating them if the disabling has no effect on your communications.

(Windows) Try running the X server as an administrator. (Right-click on a shortcut or on the program executable. Look for “advanced properties” and a checkbox to “run as administrator”.) This will probably cause the UAC to pop up every time you run the program to ask if you really want to do that, but you can ignore that warning.

2 X, Firewalls and Security Monitors

One serious barrier to using X on some machines is the action of other programs that deliberately block signals from the X client from reaching the local server software.

Firewalls are software security programs that many people use to protect their networks from outside hacking. One of the features of most firewalls is to block incoming communications to sockets other than the reserved socket numbers for email, http, or other services the corporation wants to support.

Virus scanners inspect files on your disk, files that you download and/or receive as email attachments, and files that you try to execute. The scanner compares these against a database of known malware, warning you and sometimes acting immediately to quarantine or delete the offending file.

Spyware scanners perform many of the same functions as virus scanners, but watch for a variety of less-dangerous malware. These include email filters that watch for phishing and hijacking attacks that display an apparently innocuous link that takes you to a site very different from what the visible content of the email would lead you to expect.

Security monitors watch for suspicious behaviors by running programs. Such behaviors can include opening communications channels to other machines or the overwriting of system files.

Many software packages that provide one of the above services will also provide others. For example, some virus scanners support a “heuristic” mode that tries to detect viruses by “patterns” of behavior rather than by matching its code against a database of known viruses. Such a mode is, in essence, a security monitor.

We’ll start this discussion by looking a bit more closely at what a firewall really does.

2.1 What Does a Firewall Do?

Communication of any kind of service or data over the Internet is addressed to a socket, a combination of a specific machine’s address (the IP address) and the number of a specific port (a communications channel). Some port numbers are reserved for specific network services: ssh, ftp, and email, for example, all have their own specific port numbers. But any program can try to accept communications from, or send communications to, any port number on itself or on some other machine.
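On most Unix systems, the reserved port numbers for these services are recorded in the file /etc/services, so you can look them up directly (a sketch; the presence and exact contents of that file vary by system):

```shell
# Look up the reserved port for the ssh service named above.
# On a typical Linux machine this shows a line such as "ssh  22/tcp".
grep -w '^ssh' /etc/services
```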

A firewall is a hardware or software system that restricts the ports that can actually be used to communicate with a particular machine. A firewall can be set to

1. allow both incoming and outgoing communications on any port (in effect, no firewall at all)

2. allow outgoing communications (from your PC) on any port to other machines, and replies from that machine on that same port. Incoming communications from other machines and on other ports are blocked.

3. allow only outgoing communications (from your PC) on selected ports (such as those for web browsing and email) to other machines, and replies from that machine on that same port.

4. allow no communications on any port, in either direction (in effect, unplugging from the network)

Most firewalls are designed to run at level 2 or 3 above. At level 2, your PC can initiate communications with other machines, and can get responses from them. But other machines cannot initiate contact with your PC. Another way of saying this is that your PC can be a client for services, but it cannot be a server taking requests for services from other machines. At level 3, your PC can only initiate communications for selected network services such as web browsing and email.

Many firewalls will allow you to choose the level you want when you install the firewall. The problem with level 3 is that it assumes that you can predict all of the future network services that you will want to use from your PC. Some firewalls try to “learn” whether new kinds of connections are legitimate by popping up a dialog window whenever a new program tries to open a new port number. The dialog asks you whether to allow or block the connection, and the firewall tries to remember your choice for future sessions. (And woe to you if you don’t understand the choice you’re being asked to make and select the wrong option the first time that you are asked.)

2.2 Problem: Displaying X is a Server’s Job

When you connect to a remote machine in text mode, using ssh or the older telnet protocol, your PC functions as a client that initiates communications with the server via a specific port number (port 22 for ssh, 23 for telnet). The server responds using that port.

Even if your PC is behind a firewall, such a communication is well within the guideline of “local PC can initiate contact with other machines and other machines can then reply on that port”. So this should get through a level 2 firewall. It will get through a level 3 firewall only if the ssh or telnet ports are specifically allowed, and it will get through a “learning” firewall only if you allow it the first time you try using your ssh or telnet client program.

What makes X a bit messy is that your local PC is actually supposed to behave as a server, not a client. When you start your X server software running, it is configured to accept graphics for display along a specific port, usually port 6000. (X servers generally refer to a “display number”, usually starting with display number 0, and the X server watches for incoming communications on port number 6000 + the display number.)
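The display-number-to-port rule can be sketched as simple arithmetic:

```shell
# An X server managing display number N listens on TCP port 6000 + N.
display=0
echo "display :$display -> port $((6000 + display))"   # port 6000

display=10
echo "display :$display -> port $((6000 + display))"   # port 6010
```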

An X connection begins with an ordinary ssh or telnet session. Eventually, you issue a command to run some X program such as xterm. That program, running on the remote machine, then tries to open a connection via socket 6000 (I’m assuming that you are using display number 0, as is usually the case) to your local PC. If you are running X server software on your local PC, it accepts the socket 6000 connection, and a new communications path is thereby established for X windowing information.

Now, throw in a firewall, and things get messy. From the viewpoint of the firewall, the socket 6000 connection is being initiated by an outside machine. Most firewalls will block any such attempt. The connection attempt fails. You may get a message saying the communication failed. You may see nothing at all happen.

Luckily, there’s a way around this problem that does not require you to poke holes in your firewall (which most firewalls will let you do - many gamers do so to play multi-player games over the Internet, but it can be risky). ssh can be used to tunnel communications to other ports within its own data stream.

2.3 Tunneling X via ssh

ssh connections start off much like Internet client connections. Your local PC requests an ssh service on port 22 from a remote server. Because the communication is initiated by the local PC, the firewall should let it through. But if X tunneling has been specified, and we send a command to the ssh server to start running an X program, things get interesting. The ssh server will launch X programs with a $DISPLAY value that maps right back to the ssh server, though usually on an unusually large display number. Therefore when the X program tries to open a connection to the X server that will display its windows, it connects to itself, on a socket number that is watched by the ssh server software. That ssh server software then sends a signal along the already-established ssh communications path indicating that an X connection was attempted. Back at your local PC, your ssh client software then opens a connection to socket 6000 on your local PC, where it gets picked up by your X server software.
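You can see the effect of the tunnel by examining $DISPLAY after logging in with X forwarding enabled. The value below is a typical (assumed) example; the parsing shows how the “unusually large display number” maps back onto a port that the ssh server watches:

```shell
# A typical $DISPLAY value set up by ssh X forwarding (an assumed example,
# not something every server will produce).
DISPLAY="localhost:10.0"

disp=${DISPLAY#*:}    # strip the host part   -> "10.0"
disp=${disp%%.*}      # strip the screen part -> "10"

# The ssh server watches port 6000 + 10 = 6010 and relays anything arriving
# there back through the already-established ssh connection.
echo "port $((6000 + disp))"    # prints "port 6010"
```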

It’s a neat dodge when it works. So what can possibly go wrong?

You have to remember to configure your ssh client to do X tunneling.

Luckily, if you are using an X package like X2Go or Xming that includes its own ssh client, the package handles this when you configure a new connection.

Although the connection on port 6000 is from a local PC to itself and should never even touch the firewall, software firewalls running on the same PC sometimes don’t recognize that this is a purely local connection and block it anyway.

Security monitors may consider the act of opening any strange socket, even from a PC to itself, to be suspicious and try to block it.

Other security monitors may consider the act of opening any port other than the standard ones for browsing, email, etc., to be a task requiring permission of an “administrator”.

2.4 So Much Security, So Little Safety

What we’ve learned about firewalls needs to now be examined in the context of an overall security configuration.

Most people actually connect to the Internet via a router that provides wired or wireless access to the Internet to one or more devices in their home. Even if a single PC is your only Internet-connected device, it’s likely that your cable or DSL provider will have placed a router between your PC and the cable modem or DSL modem that provides direct access to your service.

Some people may, however, have a PC connected directly to a cable or DSL modem.

What’s important about the difference between these two configurations is that the router provides a firewall. Level-2 firewall protection is actually “built in” to the fundamental behavior of the router, although some can be configured for even stronger protection. The router’s firewall is your first line of defense against anyone trying to exploit your computer from outside. And it’s a pretty good one. When you read about large “botnets” of security-breached computers being controlled remotely, it’s almost never because a router firewall was breached. Almost always, the breach started with someone downloading a virus-laden file, letting a malicious program onto the protected side of the firewall, from where that program can initiate new connections to the outside world without tripping the firewall’s normal protections. Preventing that sort of infection is, first and foremost, a human behavior issue. Catching and cleaning up the human lapses is a matter for virus and spyware scanners (which we won’t be discussing further) and for security monitors.

If your PC is connected directly to your cable/DSL modem with no router, you should be running a software firewall on that PC (e.g., the Windows firewall that comes with your operating system). Otherwise, you’re wide open to exploitation.

But what if you are behind a router? Most Windows PCs still will be running the Windows firewall. Why? Because Windows nags you if it detects that the firewall is switched off. It doesn’t know that your router is providing the same service. Is there any point to having a firewall behind a firewall? Well, it probably does not add any additional protection against outside intruders. It’s not a total waste, however. It does help provide protection in the event that one machine gets infected. If all your machines have their own firewalls, the infection might not spread to every other machine in your home.

Similar observations can be made about Macs and, with more variations, about Linux PCs. Most will come with a software firewall installed and running. In addition to firewalls, security is provided by scanners that you use to periodically sweep your disk to look for infected files, and by security monitors that watch for “suspicious” activities by running programs. Unfortunately, some of these monitors may regard any socket communications that their programmers didn’t anticipate as suspicious.

For Windows users (Vista, Win7, and later), the operating system provides the UAC (User Account Control), which is the monitor that asks you if you want to actually run a program when you try to launch something not recognized by Microsoft. UAC can be annoying, but it’s generally harmless because it has no memory. If it asks you whether you want to run your X server software and you say “No”, you can change your mind by launching the same program again and answering “Yes” next time.

Also, some third-party anti-virus or security suites may install security monitors of their own.

IDEs for Compiling under X

Steven Zeil

Last modified: Aug 17, 2019

Contents: 1 emacs and vim 2 Code::Blocks 3 Eclipse

In this lesson, we will look at Integrated Development Environments (IDEs) that manage the whole process from editing your code to compiling to debugging.

An Integrated Development Environment (IDE) is a software package that attempts to combine the primary programming activities of

1. project setup,
2. editing,
3. building (compiling), and
4. debugging

into one unified user interface. Some IDEs go even further, offering support for testing, version control, team collaboration, and other activities that are beyond the scope of this course.

In this section we will focus on support for project setup, editing, and compilation. We will consider debugging support separately, when we can compare the debugging support offered by the IDEs to that offered by some stand-alone debugging tools.

You may already have used some IDEs in Windows or OS X. Code::Blocks is an IDE used heavily in our beginning courses. Eclipse is another that is used in later courses. If you have taken programming courses in other universities or departments, you may have used Microsoft’s Visual IDE or others.

An IDE is not a compiler.

An IDE uses and supports a compiler. But the compiler is a separate program. We have seen how compilers can be invoked from the command line as separate entities.

Some IDEs are designed to work with a single compiler for a single programming language. Many are designed to work with a specific compiler or a specific family of compilers; Microsoft’s Visual suite is in this category. Other IDEs are designed to work with many families of compilers and many different programming languages.

Every IDE will provide a facility for editing your code, compiling it, and correcting errors. Most require you to start by defining a “project”, a kind of organizing unit that keeps track of which files are in your project and of any special settings or options you might need when building (compiling) your project.

Simple IDEs often assume that the only step involved in building a project is compiling the source code. More sophisticated ones may allow you to specify other steps that might be required, such as constructing data files that the program will require when run, packaging up the program and its data files into an archive file for easy delivery to others, or even running other programs that generate parts of the source code for your program. Beginning programmers might not need all that flexibility, and the best IDEs keep it out of your way until you actually need it.

1 emacs and vim

If we compare the basic function of an IDE to the support available in emacs and vim:

            project setup   editing   building (compiling)   debugging
   emacs    (via make)      yes       M-x compile            yes
   vim      (via make)      yes       :make                  no

we can see that the combination of emacs and make fulfills all of the basic functions of an IDE. (We’ll look at debugging support in a later lesson.) The combination of vim and make fulfills 75% of the basic functions of an IDE. emacs and vim may not be the most elegant of IDEs, but they are a solid fall-back for Unix programmers, in part because they work in both text-mode and graphics-mode sessions.

Neither directly supports the notion of a project. Rather, because their default “build” commands run make, they assume that you will use a make file as the basis of your project management.

2 Code::Blocks

Code::Blocks is a popular IDE for C++ programming on Windows platforms. It is actually, however, built upon a multi-platform library and therefore can be installed on many different operating systems. If it is available on a Linux system, it is generally run via the command

codeblocks &

Code::Blocks offers a simple editor for creating and changing code, a project manager, and an interface to a debugger.

Typically you begin work with Code::Blocks by creating a project.

If you already have some C++ code in a directory, you create the project in that directory (the project information is stored as a file within that directory). You then add your existing .cpp and .h files to the project. You can then edit these and/or use Code::Blocks to create more C++ code files.

The Code::Blocks editor can offer suggestions and possible continuations of long function names. Just hit Ctrl-space while typing your code to see its suggestions. When you are ready, you can try to “build” (compile and link) the project. If you have errors, clicking on the error message will take you to the line of code mentioned in that error message.

Once your program builds correctly, you can try to run it or to launch it via the debugger.

All of the commands are accessible via the menu. Many are also available via the buttons in the toolbar. If you hover your mouse over a button, a tool-tip will pop up to explain the function controlled by that button.

Example 1: Try This: Code::Blocks in Linux

1. Start an xterm on one of the Dept Linux servers.

2. In this example, we will work with yet another version of the prime number finding algorithm:

cd ~/playing/sieve
rm *
cp ~cs252/Assignments/progdevx/* .
ls

3. Now, run Code::Blocks:

codeblocks &

4. Select “Create a new project” from the “Start here” pane or select New->Project from the File menu.

Select “Console Application”, then C++. Give your project the title “sieve”, and for “Folder to create project in:”, browse to the folder just above (i.e., containing) your sieve directory. Click Next, and make sure that the “GNU GCC Compiler” is selected. Then click Finish.

5. In the Management pane, expand the Sources list for your new project. You will find that Code::Blocks has created a main.cpp file for you. Because we already have the files that make up our project, we don’t want this one. Right-click on it and select “Remove file from project”. You might even want to delete it (via your xterm window).

Instead, we need to tell Code::Blocks to use the files we have already placed in the directory. Right-click on the Project and select “Add files”. Select the .h and .cpp files you copied from the ~cs252/... directory (Ctrl-click to select multiple files at once). Do not select main.cpp.

Click Open, then OK to select the targets of Debug and Release versions of the code. You should now be able to expand the items in your management panel to see that findPrimes.cpp, sieve.cpp, and sieve.h are all part of your project.

6. Double-click on findPrimes.cpp to open it in the editor. Look for the line near the bottom containing a call to a function named “find”, and position your cursor just after the “d”.

Now pretend that you had just been typing “find”, then realized that you weren’t sure what the actual name of the function was that you wanted. Type Ctrl-space and a suggestion box will pop up. Unfortunately, there is a standard library function named “find”, and it’s not the one we want, so this suggestion is not very helpful. Type a “P” instead and hit Ctrl-space again. This time, the suggestion is “findPrimes”, the legitimate name of a function declared earlier in this file. Select it by double-clicking.

Code::Blocks inserts the rest of the function name. Unfortunately, it also adds a bogus set of parentheses “()” that we don’t need. Delete these and save the file.

7. Now try to build the project via the Build menu, or via the Build item in the toolbar (usually a gear-shaped object), or by right-clicking on the project in the Management pane and selecting “Build”.

You should see some compilation errors appear in the “Logs & others” pane. Click on different error messages and observe that the editor shifts to show you the relevant code. Then click on the first one, and change “integer” to “int”. Save the file, and build again.

This time you should succeed with no errors.

8. Run the program via the Run toolbar button (usually a green triangle) or via the Build menu. A console window will pop up with a prompt for you. Enter a moderately-sized integer (say something in the 3 or 4-digit range). You should see a list of prime numbers go scrolling by. Hit Enter to close that window.

9. In the xterm from which you launched Code::Blocks, use ls to look around your sieve directory. You’ll find that some directories have been added. You can find your compiled executable in bin/Debug. Assuming that you are still cd’d into your sieve directory, run the program from your xterm like this:

bin/Debug/sieve

The results should look familiar.

10. Some programs accept options from the command line instead of via direct prompts. This one can be run that way as well. Try running it as

bin/Debug/sieve 50

You can accomplish the same thing when launching programs from Code::Blocks as well. From the Project menu, select “Set program’s arguments…”. Select the Debug target and then enter 50 into the Program arguments box. Click OK and then run the program again from within Code::Blocks.

11. You can now exit from Code::Blocks. You are likely to get a lot of prompts about saving various configurations. It’s probably a good idea to agree to all of these.

3 Eclipse

Eclipse is another multi-platform IDE that is equally at home on both Windows and *nix systems. Eclipse may be the most popular IDE for use with Java programming. It has a flexible plug-in system that allows additions of support for many different programming tools that “real” programmers use “in the field”, including support for C++ programming. Where Code::Blocks is a good environment for students and casual programmers, Eclipse is an environment that can stand up to professional use.

If Eclipse is available on a Linux system, it can often be invoked via the command

eclipse &

Again, one typically begins work with Eclipse by creating a project. By default, Eclipse gathers all of your projects together under a common directory called your workspace. Each project occupies its own subdirectory within the workspace. However, if you prefer, you can override this default and keep your projects wherever you like. If you already have some source code files, you would probably set up the project where these reside. You can then edit existing files or create new ones. Like Code::Blocks, Eclipse can complete partially-typed names and offer suggestions as to what may legally be placed at the cursor position. Type Ctrl-space to see this help. It also may offer such help if you simply pause for a moment at a suggestive spot (e.g., immediately after a ‘.’).

The Eclipse editor has another feature that sets it above Code::Blocks. While you are typing your code, it is checking continuously for errors. Almost all Java compilation errors and many C++ errors will actually be flagged for you as you commit them, instead of waiting until you explicitly request a compilation/build of your project.

When you are ready, you can attempt a project build, via the menus or the toolbar buttons (hover your mouse over each button to discover what it does). If you have error messages, selecting one will take you to the relevant code location.

Once your errors have been dealt with, you can run the project or launch the debugger. You can do this via the menus, the toolbar buttons, or by right-clicking on the executable file in the Project Explorer and selecting the relevant action.

Eclipse has some truly interesting features, especially for Java. For example, it “knows” about many of the common transformations that programmers make to their code and can apply them in one step. These are called refactorings. A basic example of refactoring is renaming a variable or function - Eclipse not only changes the name where the variable/function is declared, but also all the places where it is used, and it is smart enough to distinguish between uses of different functions or variables that happen to have the same name. Another refactoring is to “encapsulate” a data member, making the data member private and creating get/set functions for public access to it, all in one step.

Eclipse also understands the basic structure of the language. If you select a variable or function name with the mouse, you can ask Eclipse to take you to the line of code where that name is declared. Or you can select a declaration and ask Eclipse to list all the places in your code where that name is used.

Example 2: Try This: Eclipse in Linux

1. Start an xterm on one of the Dept Linux servers.

2. Let’s get a fresh copy of our program:

rm -rf ~/playing/sieve/*
cd ~/playing/sieve
cp ~cs252/Assignments/progdevx/* .
ls

3. Now, run Eclipse:

eclipse &

You will be asked where you want to keep your workspace (the default area for all of your project settings). Accept the suggestion or browse to a more convenient directory.

If this is your first time running Eclipse, you will be taken to a Welcome page. Close that or click on the curved arrow on the right to enter your workbench.

4. Select “New->Project…” from the File menu or by right-clicking in the Package explorer pane on the left. Select “C++ Project” and click Next.

Give your project the name “sieve”, clear the “Use default location” box, and for “Folder to create project in:”, browse to your sieve directory. Make sure the “Linux GCC” toolchain is selected. Accept the rest of the defaults, clicking Next until you come to Finish, then click that.

A box will come up asking if you want to open the C++ perspective. Check the “Remember my decision” box and then click “Yes”. (You probably want to do this whenever you are asked about opening a new perspective.)

5. If you expand the listing in your Project Explorer pane, you will see all of the files and directories in your sieve directory. Eclipse has added a new directory, the Debug directory, where it will store your compiled binaries.

Eclipse has already found your .cpp and .h files. You don’t need to add them to the project. In fact, Eclipse may have already compiled them. You can expand the Errors item in the Problems pane to see the resulting messages. If you don’t see anything, compile your project using the “Build” or “Build All” buttons on the toolbar or the “Build All” entry in the Project menu.

Double-click on any message and the editor will show you the relevant code location.

Double-click on findPrimes.cpp to open it in the editor (if it’s not already open). Click on the lower of the two red rectangles on the right of the listing, or just look for the line near the bottom containing a call to a function named “find”. Hover your mouse pointer over the line, and the text of the error message will pop up. Position your cursor just after the “d” in “find”.

Now pretend that you had just been typing “find”, then realized that you weren’t sure what the actual name of the function was that you wanted. Type Ctrl-space and a suggestion box will pop up. Three functions are suggested: the standard library “find” and the functions findPrimes and findSomething, both of which are functions in this program. (Eclipse is generally smarter about its suggestions than is Code::Blocks.) Double-click on findPrimes to select it. Eclipse inserts the rest of the function name. Unfortunately, it also adds a bogus set of parentheses “()” that we don’t need. Delete these.

6. Now move down to the next line and delete the semicolon. Notice that an error marker pops up almost immediately on the left. Eclipse will actually find many C++ mistakes as you type them, before you actually run the compiler. Restore the semicolon and save the file.

7. Now try to build the project via the Project menu, or via the Build item in the toolbar (usually a hammer). You should see a shorter list of compilation errors in the “Problems” pane, and the red marker next to the “find…” line go away. Double-click on the first error message, which will take you to a line inside sieve.h, and change “integer” to “int”. Save the file, and build again.

This time you should succeed with no errors.

8. Run the program: Look under “Binaries” in the left column to see your executable file. Right-click and select “Run as…Local C/C++ Application”.

After this first time, you can re-run the same program via the Run toolbar button (usually a white triangle in a green circle) or via the Run menu.

You will see the prompt appear in the Console pane. Enter a moderately-sized integer (say, something in the 3 or 4-digit range). You should see a list of prime numbers go scrolling by. Hit Enter to close that window.

9. In the xterm from which you launched Eclipse, use “ls” to look around your sieve directory. You can find your compiled executable in Debug/. Assuming that you are still cd’d into your sieve directory, run the program from your xterm like this:

Debug/sieve

The results should look familiar.

10. Some programs accept options from the command line instead of via direct prompts. This one can be run that way as well. Try running it as

Debug/sieve 50

11. You can accomplish the same thing when launching programs from within Eclipse as well. Here is an example (video) of the procedure for doing so.

From the Run menu, select “Run Configurations…”. You should see that a C++ Application configuration named “sieve” is selected. On the right, click on the Arguments tab then enter 50 into the Program arguments box. Click Run. The arguments you have supplied will be used each time the program is run until you change them or create a separate run configuration.

12. You can now exit from Eclipse.

Debugging

Steven Zeil

Last modified: Feb 22, 2018

Contents: 1 Debugging Output 2 Assertions 3 Automated Debuggers 3.1 Working with a Debugger - General Strategies 4 emacs Debugging Mode

Your program is producing incorrect output, or perhaps has crashed with no output at all. How do you find out why?

There’s no easy answer to that. Debugging is hard work, and is as much an art as an engineering process. Basically, though, debugging requires that we reason backwards from the symptom (the incorrect output or behavior) towards possible causes in our code. This may require a chain of several steps:

Example

Hmmm. The program crashed near the end of this loop:

for (int i = 0; i < numElements; ++i) {
    cout << myArray[i] << endl;
}
myArray[0] = myArray[1];

Each time around the loop, the code prints myArray[i]. One possible reason for such a crash would be if i was outside the legal range of index values for that array. Now, what could cause that? Maybe the loop exit condition is incorrect. Maybe the array is too small. Maybe we counted the number of items in the array incorrectly.

As we work backwards, we form hypotheses about what might be going wrong, for example, “i was outside the legal range of index values for that array”.

An integral part of debugging is testing those hypotheses to see if we are on the right track. This often involves seeking additional information about the execution that was not evident from the original program output, such as how often the loop is executed, whether we actually exited the loop, what values of i or numElements were employed just before the crash.

An automated debugger can help us in these endeavors. Such a debugging tool typically allows us to set breakpoints at different locations in the code, so that if execution reaches one of those locations, the execution will pause. Debuggers also allow us to examine the program state, printing out the values of variables and the history of function calls that brought us to the current location. Debuggers typically let us step through the code, one statement at a time, so that we can watch the execution at a very fine level.

An automatic debugger is a powerful tool for aiding our reasoning process during debugging. It can be invaluable in those frustrating cases where the program crashes without any output at all, as the debugger will usually take us right to the location where the crash occurred. Automatic debuggers can also be a tremendous waste of time, however. It’s all too tempting to single-step through the code aimlessly, hoping to notice when something goes wrong. Debuggers are best used to augment the reasoning process, not as a substitute for it. In that vein, it’s worth noting first the alternatives to automated debuggers (even if that takes us somewhat beyond the scope of a “Unix” course):

1 Debugging Output

One of the easiest ways to gather information about what’s happening in a program is to add output statements to the code. These can show the values of key variables and also serve as documentation of what statements in the code were actually reached during execution. In our example above, the easiest way to test our hypothesis about i going out of bounds would be to alter our code like this:

for (int i = 0; i < numElements; ++i) {
    cerr << "i: " << i << endl;
    cout << myArray[i] << endl;
}
cerr << "Exited loop" << endl;
myArray[0] = myArray[1];

Note that we send the debugging information, not to the standard output, but to the standard error stream. This may or may not be significant, but in many programs standard output may be redirected to files or to other programs, and we would not want to introduce new complications by having this unanticipated extra output included in that redirection.

This solution is not perfect, however. For one thing, we need to remember that this extra output is not acceptable in the final version of the code and must be removed. Actually removing the debugging output statements may not be a good idea anyway. In my experience, I have often removed debugging output statements after fixing one bug, only to discover another bug and wish that I had the same output available once more. So, I seldom remove debugging output, preferring instead to comment it out:

for (int i = 0; i < numElements; ++i) {
    // cerr << "i: " << i << endl;
    cout << myArray[i] << endl;
}
// cerr << "Exited loop" << endl;
myArray[0] = myArray[1];

But, if we wind up with lots of debugging output like this in our program, we may have to hunt to find and remove it all before turning in our final program. A better solution is to use conditional compilation:

for (int i = 0; i < numElements; ++i) {
#ifdef DEBUG
    cerr << "i: " << i << endl;
#endif
    cout << myArray[i] << endl;
}
#ifdef DEBUG
cerr << "Exited loop" << endl;
#endif
myArray[0] = myArray[1];

Now our debugging output will be compiled only if the compile-time symbol DEBUG is set. This can be done by defining it at the start of the file (or in a .h file that is #include’d by this one):

#define DEBUG 1

or by defining it when we compile the code:

g++ -g -c -DDEBUG myProgram.cpp

For the final program submission, we simply omit these definitions of DEBUG, thereby turning off all our debugging output at once.

Another possibility is to use the macro facilities of C and C++ to define special debugging commands that, again, are active only when DEBUG is defined: For example, I sometimes have a header file named debug.h:

#ifdef DEBUG

#define dbgprint(x) cerr << #x << ": " << x << endl
#define dbgmsg(message) cerr << message << endl

#else

#define dbgprint(x)
#define dbgmsg(message)

#endif

These are macros. The C/C++ compiler will now replace any strings of the form dbgprint(...) and dbgmsg(...) in your source code by the rest of the macro before actually compiling your code.

So we can then write:

#include "debug.h"
⋮
for (int i = 0; i < numElements; ++i) {
    dbgprint(i);
    cout << myArray[i] << endl;
}
dbgmsg("Exited loop");
myArray[0] = myArray[1];

If we compile that code this way:

g++ -g -c -DDEBUG myProgram.cpp

then what actually gets compiled is

#include "debug.h"
⋮
for (int i = 0; i < numElements; ++i) {
    cerr << "i" << ": " << i << endl;
    cout << myArray[i] << endl;
}
cerr << "Exited loop" << endl;
myArray[0] = myArray[1];

but if we compile the code this way:

g++ -g -c myProgram.cpp

then what actually gets compiled is

#include "debug.h"
⋮
for (int i = 0; i < numElements; ++i) {
    ;
    cout << myArray[i] << endl;
}
;
myArray[0] = myArray[1];

Again, you can see that we can turn our debugging output on and off at will.

One final refinement is worth noting. When you have a large program with many source code files, it’s easy to get confused about which debugging output lines are coming from where. C and C++ define two special macros: __FILE__ is replaced by the name of the file (in quotes) in which it appears, and __LINE__ is replaced by the number of the line on which it occurs. (In case it’s not clear, each of these symbols has a pair of underscore (_) characters in front and another pair in back.)

So we can rewrite debug.h like this:

#ifdef DEBUG

#define dbgprint(x) cerr << #x << ": " << x << " in " << __FILE__ << ":" << __LINE__ << endl
#define dbgmsg(message) cerr << message << " in " << __FILE__ << ":" << __LINE__ << endl

#else

#define dbgprint(x)
#define dbgmsg(message)

#endif

to get debugging output like:

i: 0 in myProgram.cpp:23
i: 1 in myProgram.cpp:23
i: 2 in myProgram.cpp:23
⋮
i: 125 in myProgram.cpp:23
Exited loop in myProgram.cpp:26

2 Assertions

Sometimes, we can anticipate potential trouble spots as we are writing the code. Good programmers often engage in defensive programming, in which they introduce into their code special checks and actions just in case things don’t actually behave as expected.

One staple of defensive programming is the assertion, a boolean test that should be true if things are working as expected. The assert macro, defined in the header <assert.h> for C and <cassert> for C++, allows us to introduce assertions into our code so that the program stops with an informational message whenever the asserted condition fails.

For example, back when we were first writing myProgram.cpp, we might have anticipated trouble with that loop and written:

#include <cassert>
⋮
assert (numElements <= myArraySize);
for (int i = 0; i < numElements; ++i) {
    cout << myArray[i] << endl;
}
myArray[0] = myArray[1];

Assertions are controlled by the compile-time symbol NDEBUG (“not debugging”). If NDEBUG is defined, then each assertion is compiled as a comment - it doesn’t affect the actual program execution. But if NDEBUG is not defined, then the assertion gets translated to something along these lines (it varies slightly among different compilers):

#define assert(condition) if (!(condition)) {\
    cerr << "assertion failed at " << __FILE__ << ":" << __LINE__ \
         << endl; \
    abort(); \
}

so our sample code would become:

#include <cassert>
⋮
if (!(numElements <= myArraySize)) {
    cerr << "assertion failed at " << __FILE__ << ":" << __LINE__
         << endl;
    abort();
}
for (int i = 0; i < numElements; ++i) {
    cout << myArray[i] << endl;
}
myArray[0] = myArray[1];

Unlike the kind of debugging output we have looked at earlier, assertions are silent unless things are going wrong anyway, so many programmers don’t bother turning them off in the final submitted version unless the conditions being tested are complicated enough to noticeably slow the execution speed.

3 Automated Debuggers

When debugging output and assertions aren’t convenient or you need more details about what’s going on, it’s time to look at automatic debuggers.

Although there are many automatic debuggers out there, they all provide pretty much the same basic set of capabilities:

If a program crashes, the debugger will pause or freeze execution at the moment of the crash.

Debuggers will allow you to set breakpoints, locations in the code where you want the debugger to pause the program whenever execution hits one of those locations.

When you are paused, either by a crash or at a breakpoint, you can

examine the source code at the paused location, and the code of any functions that were called to get you there, and

Examine the values of variables at the moment of the pause.

When you are paused at a breakpoint, you can

Allow the execution to move forward in small steps.

Most debuggers will have commands allowing you to step forward to the next statement in the same function where you are paused (often called “next”); or to step forward to the next statement but, if any other functions are being called, stop first at the start of those function calls (often called “step”); or to run to the end of the current function in which you are paused (often called “finish”). Some debuggers will have other options, such as stepping from one machine-code instruction to the next, but these are not used as often.

Resume normal execution, running normally without stopping until the program crashes, ends normally, or until another breakpoint is hit (often called “continue”).

When compiling with gcc and g++, the -g option causes the compiler to emit information required for an automatic debugger. The debugger of choice with these compilers is called gdb.

For Java programs, the same -g option causes the compiler to emit information useful for its run-time debugger, called jdb.

3.1 Working with a Debugger - General Strategies

Debuggers like gdb and jdb can be especially useful in dealing with silent crashes, where you really don’t know where in the program the crash occurred.

1. Look at the output produced before the crash. That can give you a clue as to where in the program you were when the crash occurred.

2. Run the program from within a debugger (gdb if you have compiled with g++). Don’t worry about breakpoints or single-stepping or any of that stuff at first. Just run it.

When the crash occurs, the debugger should tell you what line of code in what file was being executed at the moment of the crash.

Actually, it’s not quite that simple. There’s a good chance that the crash will occur on some line of code you didn’t actually write yourself, deep inside some system library function that was called by some other system library function that was called by some other…until we finally get back to your own code. That crash occurred because you called a function but passed it some data that was incorrect or corrupt in some way.

Your debugger should let you view the entire runtime stack of calls that were in effect at the moment of the crash. (Use the command “backtrace” or “bt” in gdb to view the entire stack.) So you should be able to determine where the crash occurred. That’s not as good as determining why, but it’s a start.

3. Take a good look at the data being manipulated at the location of the crash. Are you using pointers? Could some of them be null? Are you indexing into an array? Could your index value be too large or negative? Are you reading from a file? Could the file be at the end already, or might the data be in a different format than you expected?

If you used a debugger to find the crash location, you can probably move up and down the stack (gdb commands “up” and “down”) and view the values of variables within each active call. This may give a clue about what was happening.

4. Form some hypotheses (take a guess) as to what was going on at the time of the crash. Then test your hypothesis! You can do this a number of ways:

A. Add debugging output. If you think one of your variables may have a bad or unanticipated value, print it out. Rerun the program and see if the value looks OK. E.g.,

cerr << "x = " << x << " y = " << y << endl;
cerr << "myPointer = " << myPointer << endl;
cerr << "*myPointer = " << *myPointer << endl;

B. Add an assertion to test for an OK value. E.g.,

assert (myPointer != 0);

Rerun the program and see if the assertion is violated.

C. In the debugger, set a breakpoint shortly before the crash location. Run the program and examine the values of the variables via the debugger interface. Single-step toward the crash, watching for changes in the critical variables.

Once you have figured out what was the immediate cause of the crash, then you’re ready for the really important part.

5. Try to determine the ultimate reason for the problem.

Sometimes the actual problem is right where the crash occurs. Unfortunately, it’s all too common for the real “bug” to have occurred much earlier during the execution. But once you know which data values are incorrect or corrupted, you can start trying to reason backwards through your code to ask what could have caused that data to become incorrect.

As you reason backwards, continue to form hypotheses about what the earlier cause might be, and keep testing those hypotheses as described in the prior step.

4 emacs Debugging Mode

The easiest way to run gdb and jdb is, again, from inside emacs. The reason for this is quite simple. emacs will synchronize a display of the source code with the current debugger state, so that as you use the debugger to move through the code, emacs will show you just where in the code you are.

Try creating a longer program in C or C++, and compile it to produce an executable program foo. From within emacs, look at one of the source code files for that program and then give the command M-x gdb. (For Java programs, use M-x jdb, instead.)

At the prompt “Run gdb like this:”, type the program name foo. emacs will then launch gdb, and eventually you will get the prompt “(gdb)” in a window. You can now control gdb by typing commands into the gdb window. The most important commands are:

set args … If your program expects arguments on its command line when it is invoked from the shell, list those arguments in this command before running the program. (These may include redirection of the input and output.)

break function Sets a breakpoint at the entry to the named function (i.e., indicates that you want execution to pause upon entry to that function).

break lineNumber Sets a breakpoint at the indicated line number in the current source code file. You can execute this command in emacs either by typing it directly into the debugger command window pane, or by changing to a window containing the source code, positioning the cursor on the desired line, and giving the emacs command C-X spacebar.

run Starts the program running.

c Continues execution after it has been paused at a breakpoint.

n Executes the next statement, then pauses execution. If that statement is a function/procedure call, the entire call is performed before pausing.

You can also do this by giving the emacs command C-C C-N.

s Like n, executes the next statement, but if that statement is a function/procedure call, this command steps into the body of the function/procedure and pauses there.

You can also do this by giving the emacs command C-C C-S.

bt (backtrace) Prints a list of all active function calls from main down to the function where execution actually paused.

up Moves the current debugger position up the call stack, allowing you to examine the caller of the current procedure/function.

You can also do this via the emacs command C-C <

down (or just “d” for short) Moves the current debugger position down the call stack, reversing the effect of a previous “up” command. This lets you return your attention to the function called by the selected procedure/function.

You can also do this via the emacs command C-C >

p expression Prints the value of expression, which may include any variables that are visible at the current execution location.

quit Ends your gdb session.

gdb also has a help command that supplies more details on its available commands.

For Java programs, the jdb commands are similar to those listed above, but vary in minor details (e.g., you must spell out the command word “print” instead of abbreviating it to “p”). The easiest thing to do is to stick to the emacs commands (e.g., C-c n rather than typing out “next”) and let emacs deal with the minor differences between debuggers.

1: The #x is a special trick supported by the C/C++ macro processor. It takes whatever string x stands for, and puts it inside quotes.

Debugging under X

Steven Zeil

Last modified: Aug 2, 2018

Contents:
1 emacs Debugging Mode
2 nemiver
3 The Code::Blocks Debugger
4 The Eclipse Debugger

Next we turn our attention to automated debuggers.

As we have discussed earlier, there are certain common functions that we expect an automatic debugger to provide:

If a program crashes, the debugger will pause or freeze execution at the moment of the crash.

Debuggers will allow you to set breakpoints, locations in the code where you want the debugger to pause the program whenever execution hits one of those locations.

When you are paused, either by a crash or at a breakpoint, you can

examine the source code at the paused location, and the code of any functions that were called to get you there, and

examine the values of variables at the moment of the pause.

When you are paused at a breakpoint, you can

Allow the execution to move forward in small steps.

Most debuggers will have commands allowing you to step forward to the next statement in the same function where you are paused (often called “next”), or to step forward to the next statement but, if any other functions are being called, to stop first at the start of those function calls (often called “step”), or to run to the end of the current function in which you are paused (often called “finish”).

Some debuggers will have other options, such as stepping from one machine-code instruction to the next, but these are not used so often.

Resume normal execution, running normally without stopping until the program crashes, ends normally, or until another breakpoint is hit (often called “continue”).

We have seen before how the gdb debugger provides these functions for C++, and noted that the jdb debugger does the same for Java. The basic interfaces of these debuggers leave a bit to be desired, however, which is why we studied the emacs debugging mode as an interface to them.

In fact, all of the debuggers that we will look at in this section are actually interfaces to the same underlying gdb and/or jdb debuggers.

In general, these debuggers come in two different forms. Some are standalone programs that can be used with any code that has been compiled with the appropriate compiler (and the appropriate compiler options). Others are part of a larger IDE, including each of the IDEs that we have just looked at.

1 emacs Debugging Mode

The gdb and jdb modes for controlling the debugger in emacs acquire some debugging-specific menus when emacs is run under X. Certainly, the emacs interface to the gdb and jdb debuggers remains a viable option when working under X. But the differences are small enough that we have really already covered this option sufficiently.

2 nemiver

nemiver is a relatively new entry into the list of automatic debuggers. It’s an attractive, no-nonsense standalone tool for C++ debugging.

Because nemiver is a standalone debugger interface, not tied to any of our IDEs, it is a useful complement to vim, the only one of our IDEs that lacks debugging support.

A nemiver session is contained within a single window, dominated by a source code viewer on the top and a multi-purpose panel below, which can be switched between an input/output console and a “context” area where program variables can be explored.

Example 1: Try This: nemiver

1. For this exercise, you will use your code from one of the earlier IDE “Try This” exercises. Use your choice of the two IDEs discussed earlier (or emacs) to edit findPrimes.cpp. Near the bottom, you will find two statements that look like this:

// bool* theSieve = NULL;            // error
bool* theSieve = new bool[maxNum];   // correct

Comment out the second statement and remove the comment markers from the left of the first statement. We are, in fact, deliberately injecting a bug into the program for the purpose of illustration. Compile the resulting program (if you are compiling with emacs, remember to use the -g option to enable debugging).

2. Then, in an xterm window, cd to the directory where the IDE has placed your executable program. From there, give the command

nemiver ./progdevx

(If your IDE called the compiled program something other than “progdevx”, change the name in the above command accordingly.) The nemiver window should appear shortly. It will have started execution of your program, but paused at a breakpoint automatically set at the beginning of your main() function.

3. Click the Context tab at the bottom to get a view of the current call stack (only main() has been called so far) and the parameters and local variables of main(). (In the current version of nemiver, there’s a bit of a glitch in the display of function parameters/arguments that causes them to frequently be displayed twice.)

maxNum and theSieve could have any value at the moment. They have not been initialized yet, so they actually have whatever random bits happened to have been left at their respective locations in memory by earlier-running code.

4. Click the Next (“stepping over”) button to move forward one line. Observe how the value of maxNum changes.

5. Click the Continue button. The program seems to hang up, waiting for something. Click on the “Target Terminal” tab at the bottom to view the console. You can see that you are being prompted for input. Enter an integer of your choosing.

A box will pop up to tell you that a Segmentation Fault has occurred. This is one of several run-time errors that can occur when our code tries to access an illegal memory location. Click OK to clear that pop-up. Then click on the Context tab to view the stack and see where the problem occurred.

In the display, you can see that main has called findPrimes, findPrimes has called sieve, sieve has called fill, and the crash occurred somewhere in fill. Click on any of the stack entries to see in what line of code the various calls take place. Notice that the display of local variables and function arguments changes each time you do, according to which function call you have selected.

6. Now, return to your IDE or editor, fix the bug that we injected, and recompile. nemiver will probably detect when you save the change to findPrimes.cpp and ask if you want to load the new, changed file. You can agree to this.

Then, back in nemiver, click the Restart button. Again, when you see that you have stopped at the initial breakpoint, click Continue, then switch to the Target Terminal tab, and enter a number. This time, the program should run to completion. Click OK in the pop-up box that notifies you that the program is finished.

7. Click the Restart button. Find the body of the findPrimes function, and locate the line that writes to cout. Set a breakpoint at this line by clicking in the margin, just to the right of the line number. Click continue, answer the prompt in the Target Terminal window, and observe that, this time, execution stops at your new breakpoint.

8. Looking at the code just in front of your breakpoint, you can see what the value of message is supposed to be. Check its value in the Context pane. Initially, it will be shown as {...}. You’ll have to click on the arrow to the left of the name to expand its view of this string, because std::strings in C++ are actually structured types (classes). In fact, you will have to do several expansions before you finally stumble across the actual characters, “ is prime.”, that you might have expected.

This rather annoying behavior may be familiar to you. It’s a common problem that really gets in the way when working with some of the more common classes provided by the C++ standard library. Recent versions of gdb actually have support for “pretty-printing” standard classes, but not all debugger interfaces take advantage of it.

9. Let’s fix this annoyance. Exit from nemiver. Give the command

cp ~zeil/.gdbinit ~

to copy my gdb settings into your home directory. (Remember that file names that start with a ‘.’ are hidden in Unix, so you won’t see these in directory listings unless you use “ls -a”.)

These settings activate the “pretty-printing” feature of gdb for several C++ standard types, including strings. (If you later decide you don’t want this, just delete the ~/.gdbinit file.)

Restart nemiver as before. Set a breakpoint at that same statement in findPrimes, and run the program. When execution pauses at that output line, look at “message” in the Context pane. See the difference? This is much, much nicer.

10. There’s another way to display variables, particularly ones that are not listed in the context window (e.g., because they are not local variables of the current function). Let your mouse hover over one of the variables in the source code for a few seconds. Eventually a window will pop up containing the value of that specific variable. If it’s a structured variable (like message), you may need to expand the components of the variable in this pop-up window.

11. One last thing to try. Remember that this program can accept parameters on the command line. There are two ways to get that kind of information to nemiver. Try Clicking on “Load Executable…” in the File menu. A dialog box will pop up that allows you to select the executable program that you want to debug (it should already be indicating your progdevx executable) and the arguments to supply to it. Type an integer into the Arguments: box and click on Execute. Your program should restart. If you click Continue you will see that, this time, it goes straight into findPrimes (pausing at your breakpoint) without prompting for a number in the Target Terminal.

Now exit from nemiver.

You can give command arguments to your program when launching nemiver. Do

nemiver ./progdevx 48

Notice that nemiver does not remember the breakpoint that you had set in your prior session. Click on Continue and you can observe that your program does indeed make use of the command line argument “48”.

12. Exit from nemiver.

3 The Code::Blocks Debugger

Code::Blocks will already be familiar to many students who have used it on Windows machines. The Linux installation of Code::Blocks provides debugger support for C++. It’s certainly usable, though some basic tasks such as examining data values and the call stack seem to be more difficult than they are in other debuggers.

Of course, with any IDE-based debugger like this one, one of the biggest attractions is being able to edit, compile, and debug within one package. In working with nemiver you may have felt it inconvenient to have to switch to a different program each time you wanted to make a change to the program being debugged. Having the debugger within an IDE makes small, quick changes somewhat easier.

Example 2: Try This: Code::Blocks Debugger

1. Launch the Code::Blocks IDE. You should be able to open your former Code::Blocks project by selecting “Recent Projects” from the File menu.

2. Look at the file findPrimes.cpp. Near the bottom, you will find two statements that look like this:

// bool* theSieve = NULL;            // error
bool* theSieve = new bool[maxNum];   // correct

Comment out the second statement and remove the comment markers from the left of the first statement. We are again deliberately injecting a bug into the program for the purpose of illustration. Compile the resulting program. Via the Project menu, set the program’s arguments to “48”. Run the program via the green arrow button, and you should see it crash with a segmentation fault. Hit Enter to close the window that popped up.

3. Next, run the program in the debugger by selecting “Start” from the Debug menu. Again the program will crash, but this time you will see a lot more activity. The source code display will shift to the location of the crash. A window pops up to show the contents of the call stack at the time of the crash. A separate notification window may pop up to inform you of the segmentation fault, then disappear on its own after a few seconds. A program console window pops up (but, unlike the earlier run, is likely to inform you that it could not set itself as the controlling terminal).

4. From the Debug menu, select “Debugging Windows”, then “Watches” to open a window that will show you the current local variables and function parameters.

(Oddly, I find that the debugging windows are not very well behaved. If I move them on my screen, they are often no longer rendered properly. This can usually be fixed by moving the window to where I want it, closing it, and then re-opening it. It then appears at the desired location, properly drawn.

More of a problem, I find that sometimes opening either of these windows will lock the main Code::Blocks window, so that it refuses to acknowledge any further keypresses or mouse clicks until I close all Debugger windows.)

5. In the call stack window, you can see that main has called findPrimes, findPrimes has called sieve, sieve has called fill, and the crash occurred somewhere in fill. Double-Click on any of the stack entries to see in what line of code the various calls take place. Unlike many of the other debuggers, Code::Blocks does not update the display of local variables and function arguments according to which function call you have selected.

6. Return to findPrimes.cpp, and set a breakpoint on the first line of code in main() by clicking just to the right of the line number. Then stop the debugger (the red X button or via the Debug menu) and restart it. The Debugger windows re-appear when you restart the debugger. If necessary, expand the list of local variables so that you can see maxNum. Click the “Next line” button to move forward one step. Note that the value of maxNum is not only updated, but the color changes to draw your attention to the fact that it has changed.

7. Stop the debugger again. Use the editor to repair the bug that we injected, and recompile. Run the program in the normal fashion, and observe that it completes correctly.

8. Find the body of the findPrimes function, and locate the line that writes to cout. Set a breakpoint at this line by clicking in the margin, just to the right of the line number. Start the debugger. You should stop at the first breakpoint. Click continue, and execution should stop at your new breakpoint.

Looking at the code just in front of your breakpoint, you can see what the value of message is supposed to be. Check its value in the Watches window. You’ll have to expand it at least once, because Code::Blocks does not support the gdb pretty-printing option.

9. Remove the breakpoint from this statement by clicking on the small red breakpoint marker. Click the “Step out” button to finish execution of this function, stopping as soon as we return to the caller. Take note of how the Watches and Call Stack windows are updated.

10. Exit from Code::Blocks.

4 The Eclipse Debugger

Eclipse also offers good support for debugging. I personally prefer it over Code::Blocks because it seems more stable in Linux, presents the crucial information (call stack and variables) by default, and can be used equally well with both C++ and Java.

Example 3: Try This: Eclipse Debugger

1. Launch the Eclipse IDE. You will probably find that your former Eclipse project is already open and waiting for you. Eclipse tends to keep projects open until you close them. (If you leave many projects open, Eclipse will take a long time to start up because it begins by recompiling all open projects.)

2. Look at the file findPrimes.cpp. Near the bottom, you will find two statements that look like this:

// bool* theSieve = NULL;            // error
bool* theSieve = new bool[maxNum];   // correct

Comment out the second statement and remove the comment markers from the left of the first statement. We are again deliberately injecting a bug into the program for the purpose of illustration. Compile the resulting program.

3. Via the Run menu, check the Run Configurations… You should find that there is an existing entry for a C++ application named “progdevx”. Select it and check the Arguments. It probably recalls the integer you entered there earlier. If not, put a small integer, say, 50, in there. Click on Run and watch the program crash.

The crash is actually quieter than with the other debuggers. It may simply stop running with no output at all.

4. Next, run the program in the debugger. From the Run menu, select “Debug” or click the insect button. If you are asked about launchers, select “Using GDB…Create Process”. The program starts running, but pauses at the first statement in main. Above the source code, you can see the call stack on the left and the local variables and parameters on the right. Click the “Step Over” button to move forward one statement. Note that maxNum changes value and is highlighted in the data display to show you what has changed. Clicking on any of the variables in the data display will cause more detailed info about it to appear at the bottom of that pane.

5. Click on the Resume button. The program runs until the crash. In the call stack area, you can see that it says that execution has been “suspended” because the program received a “signal”. That “signal” is the segmentation fault that we have seen before. Click on the various entries in the call stack to see the editor window change to the call locations. Note that the data display changes to show the variables in each function that you have selected.

6. Click the Terminate button (the red square) to stop the execution of our program.

Use the editor to repair the bug that we injected, and recompile. Run the program in the normal fashion, and observe that it completes correctly.

7. Find the body of the findPrimes function, and locate the line that writes to cout. Set a breakpoint at this line by double-clicking in the left margin alongside that line. Click the insect/bug button to start the debugger. You should stop at the beginning of main(). Click Resume, and execution should stop at your new breakpoint.

8. Looking at the code just in front of your breakpoint, you can see what the value of message is supposed to be. Check its value in the data display. You’ll have to expand it at least once, because the version of Eclipse currently on our servers does not support the gdb pretty-printing option. (The latest version does, so this may change on our servers soon.)

9. Again, we can fix this problem. Click the Terminate button to stop this debug run. Via the “Run” menu, check the Debug Configurations. You should find an existing entry for your progdevx application. Select it and click on the Debugger tab.

In the box for “GDB command file:”, enter the path to the .gdbinit file you created earlier.[1]

Click the Debug button and run the program again to reach that same breakpoint. This time, the value of message should be clearer.

10. Remove the breakpoint from this statement by double-clicking on the small blue breakpoint marker. Click the “Step return” button to finish execution of this function, stopping as soon as we return to the caller. Take note of how the Watches and Call Stack windows are updated.

11. Exit from Eclipse.

1: Alternatively, you could copy the .gdbinit file to your project directory. Eclipse encourages this as professional programmers might prefer to have the pretty-printing available on some projects but not on all of them.

Shore Are a Lot of Shells!

Steven Zeil

Last modified: Jun 29, 2015

Although you may not have thought of it as such, the Unix commands that you type into your keyboard constitute a programming language. This language is interpreted by a program called a shell.

The Unix operating system is positively overrun with different shell languages. You can find out what shell you are using by giving the command

echo $SHELL

Whatever shell you are using, it is only one of several possibilities. And, although we have not touched on it yet, each is a full programming language, including control flow instructions for branching and loops.

csh and tcsh constitute one family of commonly used shells. The languages they accept are nearly identical. tcsh adds features that are of especial use when typing commands directly to the shell. csh is preferred when writing scripts, short programs in the shell language, if only because csh is more likely to be installed on any given Unix system.

Another family is the sh family, which contains sh, bsh, and bash. sh is commonly used for writing scripts but lacks some of the conveniences we would want in an interactive shell. bsh is the “Bourne shell” (named for its inventor), and bash, the default interactive shell for Linux and CygWin, is the “Bourne Again shell”.

These are by no means the only shell or scripting languages available for Unix, but they’ll do for now.

What distinguishes these shell languages from more general-purpose programming languages is their emphasis on simplicity. Scripting languages are designed to permit immediate translation. If, every time you typed a command, you had to wait while a full-fledged compiler was loaded and executed to process the characters you had typed, the time delays would have you screaming in frustration. Instead, these languages are designed to very quickly determine (usually by examining the first word you typed) if the command is a special command “built in” to the shell. Anything else is a program to be launched.

If you are using a shell in the sh/bsh/bash family, the command type can be used to determine if a given word represents a built-in shell command, a program (and if so, where that program is located), or something else. Try logging in and giving the following commands:

type g++
type cp
type echo
type more
type if
type type
type foobar

If your login shell is not in the sh/bash family, you can try out these commands by typing “bash” or “sh” first, then trying those commands, then typing “exit” to return to your normal shell.

We’ll concentrate on the csh for scripting purposes, with some notes on how sh-like languages, including bash, differ from csh.

Shell and Environment Variables

Steven Zeil

Last modified: Apr 20, 2019

Contents:
1 Setting and Retrieving Shell Variable Values
1.1 Shell Variables and Backticks
1.2 Manipulating File Paths
2 The Environment
2.1 The $PATH

In any programming language, we expect to have variable names. The commands that you type in via a command shell constitute a programming language, though you might not think of it as one, and so it should not come as too big a surprise to discover that command shells have variables.

1 Setting and Retrieving Shell Variable Values

A typical shell variable name begins with an alphabetic character and continues with zero or more alphanumeric characters. Oddly enough, when we want to get the value of a shell variable, we add a $ to the front of it.

Example 1: Try this:

Log in and enter the following commands.

A=hello
echo $A

and the echo command will produce the string “hello”.

All shell variables hold strings, although in selected instances we may be able to interpret those strings as numbers. It all really depends on the commands we are using. If you place a shell variable into a command at a position where the command expects a number, then it will be treated as a number.

Example 2: Try This

Give the following commands:

cd ~/playing
ls ~ > 1.txt
cat 1.txt
one=1
head $one.txt
head -n $one 1.txt

In the next-to-last command, we used the shell variable as a string, part of the filename. In the final command, we used the variable in a position where the head command expects to find a number, and so it was treated as one.

Caution: when writing commands that involve shell variables, it helps to keep in mind the different forms of quoting in Unix shells, especially the difference between ‘single quotes’ and “double quotes”.

Example 3: Try This

Give the following commands.

cd ~/playing
date > 1.dat
ls > 2.dat
A=1.dat 2.dat
echo $A
A="1.dat 2.dat"
echo $A
echo "$A"
echo '$A'
head -n 1 $A
head -n 1 "$A"
head -n 1 '$A'

Some of these commands may give error messages. Study the messages carefully until you understand what “went wrong”.
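To check your understanding afterwards, here is a minimal sketch of the three quoting forms, using the same made-up file names (sh/bash syntax):

```shell
A="1.dat 2.dat"   # the quotes make the space part of the stored value
echo $A           # unquoted: the value is expanded, then split into two words
echo "$A"         # double quotes: $A is expanded but kept as a single word
echo '$A'         # single quotes: no expansion at all; prints the literal $A
```

The earlier bare command A=1.dat 2.dat fails for a different reason: without quotes, the shell reads A=1.dat as a temporary assignment applying only to the command 2.dat, which it then tries (and fails) to run.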

1.1 Shell Variables and Backticks

The use of backticks to insert the output of one command into another is often exploited to assign environment variables.

Example 4: Try This

Give the following commands

Today=`date`
echo $Today
Words=`grep ^ze /usr/share/dict/words`
echo $Words
echo $Words | wc

Because many Unix commands function as “filters”, taking input from standard in and producing output on standard out, we can use pipes within backticks to alter the values that would be stored in a variable.

Example 5: Try This

Give the following commands

echo $USER
uc=`echo I am $USER | tr a-z A-Z`
echo $uc
nv=`echo I am $USER | sed -e "s/[aeiou]/./ig"`
echo $nv

1.2 Manipulating File Paths

A common use of the combination of environment variables and backticks is to pull apart “pieces” of a file path.

The following commands are commonly used for this purpose:

basename filepath: Prints the last part of a path, just after the final /

dirname filepath: Extracts the part of a path just before the final /, or prints . if the path has no /s

readlink -f filepath: Prints the “canonical form” of the file path — the absolute path to the file with no unnecessary steps.

Example 6: Try This

Give the following commands:

cd ~
echo /home/$USER/playing
basename /home/$USER/playing
basename ~/playing
basename playing
basename .

dirname /home/$USER/playing
dirname ~/playing
dirname playing
dirname .

readlink -f /home/$USER/playing
readlink -f ~/playing
readlink -f playing
readlink -f .

These three commands aren’t generally something you would bother with when typing commands directly at the command line. But they can be used with backticks to put useful information into environment variables.

Example 7: Try This

Give the following commands:

cd ~
myFile=~/playing/games/math.h
ls -l $myFile
myDir=`dirname $myFile`
echo $myDir
ls -l $myDir

myDir=~/playing/games/..
ls -l $myDir
basename $myDir
dirname $myDir

Now, we, as “intelligent” programmers, understand that ~/playing/games/.. is “really” just an awkward way of saying ~/playing. But the basename and dirname commands just aren’t that smart. That’s where the readlink command comes into play. It not only turns relative paths into absolute paths, it “canonicalizes” the path by getting rid of steps like “../” or “./”. Give the following commands:

myDir2=`readlink -f $myDir`
echo $myDir2
ls -l $myDir2
basename $myDir2
dirname $myDir2
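As a sketch of how these pieces might combine in a script: a common idiom uses $0, which holds the path by which the script was invoked, to find the script’s own directory (the variable names here are arbitrary):

```shell
# Hypothetical script fragment: find the directory this script lives in,
# even if it was invoked through a relative path like ../bin/myscript.
scriptPath=`readlink -f "$0"`      # canonical absolute path to the script
scriptDir=`dirname "$scriptPath"`  # strip off the final /name component
echo "This script lives in $scriptDir"
```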

This combination of the directory manipulation commands, backticks, and environment variables often proves useful when writing scripts, the subject of the final lesson in this course.

2 The Environment

The scope rules for shell variables are a bit odd. Most variables are local to the process where they are assigned. For example, look at the following (‘>’ is the prompt for a new command).

> A=hello
> echo $A
hello
> sh
> echo $A
A: undefined variable
> exit
> echo $A
hello

We set A to a value and then, using echo, printed it. In the next line, we start another copy of the shell running (sh or csh). This runs as a separate process (the old one is temporarily suspended). This new “child” process does not inherit the variable A, so when we try to print it, we are told that A is undefined. The exit command shuts down the child process, returning us to the original (parent) process where we had previously defined A, and so we were able to print it again.

This means that, normally, any shell variables that we set in this fashion will have no effect at all on any subsequent commands or programs that we launch (because all commands and programs are run as new, separate processes). Setting shell variables like this affects only the command shell in which we do it, so they can affect how that shell interprets what we type before it launches the command.

If we prefer, we can make a variable exported, meaning that its value will be seen by child processes. In sh and bash this is done by naming the variable in an export command.

> B=goodbye
> export B
> echo $B
goodbye
> sh
> echo $B
goodbye
> exit
> echo $B
goodbye
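As a side note, bash (though not the original Bourne sh) also accepts the assignment and the export combined into a single command, a small sketch:

```shell
export B=goodbye   # set and export in one step (a bash convenience)
sh -c 'echo $B'    # the single quotes keep the parent shell from expanding $B;
                   # the child shell expands it and sees the exported value
```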

The Unix environment is the collection of all shell variable values that have been imported from a parent process. Such inherited shell variables are called environment variables. Even when you first log in, your shell may inherit a number of environment variables from the process that manages user logins.

Many Unix programs rely on exported shell variables to control or modify their behavior. Examples that may be familiar include:

TERM Way back when we were learning to log in with ssh, we had to set the TERM variable to indicate what kind of terminal our text-mode ssh client was emulating. Many Unix programs use this setting to determine what control character sequences will work on our display.

DISPLAY The DISPLAY variable is used by X applications to determine where to send the windows and graphics for display to the person running the application.

SHELL This variable saves the name of the command shell program being used to interpret your typed-in commands.

Example 8: Try This

The env command lists all environment variables exported to your current process. Give the command

env | more

and page through the list (it may be quite long). Can you guess the meaning/purpose of most of these?

2.1 The $PATH

A particularly important environment variable is PATH. This determines what programs you can execute without typing in a full path name. If you type a command that is not a built-in shell command, and does not contain a ‘/’, then the shell looks at each directory listed in $PATH to see if the program you have requested can be found there. For example, suppose you have compiled a C++ program and produced a new program, yourProgram, in your current working directory. Some people will be able to execute the program this way:

yourProgram

while others will have to do it like this:

./yourProgram

The difference stems from whether your account has been set up so that your working directory (.) is in your $PATH. If it is, then you can use the first form. If it is not, you must use the second form. (Some people consider it a bit of a security risk to have . in your $PATH, arguing that you could get spoofed into doing some very strange things if someone had deposited some malicious programs into your working directory under innocuous names like “ls” or “cp”. My own feeling is that the threat here is pretty small.) To see your path, just give the command

echo $PATH

to print it, just as you would print any environment variable.
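You can watch the search list directly with standard POSIX tools (nothing here is specific to any particular server): tr splits the colon-separated list, and `command -v` reports which file the shell would actually run for a given command name.

```shell
# Show the PATH search list, one directory per line.
echo "$PATH" | tr ':' '\n'

# Which executable would actually run if you typed "ls"?
command -v ls
```

If `.` appears in the first output, the current working directory is searched too, which is exactly the security trade-off discussed above.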

Adding additional directories to your $PATH is one of the more common customizations that people make to their Unix environments. This lets them set up special directories full of customized commands and frequently-used programs.

Customizing Your Unix Environment

Contents:
1 Startup Customizations
1.1 Customizing bash
1.2 Customizing tcsh
2 Customizing X Application Appearance
3 Program Customizations

Sooner or later, you’re going to decide that you don’t like some of the default behaviors of your favorite Unix programs. When that happens, you will be pleased to learn that there’s a great deal you can do to modify your Unix environment.

1 Startup Customizations

When you launch a new shell (e.g., whenever you log in, or spawn off a new xterm), the shell starts by executing the commands in a set of files within your home directory. This gives you an opportunity to customize your Unix environment.

Prior to late summer of 2016, new student accounts were set up, by default, to use the command shell tcsh. Since then, new student accounts have been set up with bash as the default shell. To see which one your account is using, run the command

echo $SHELL

1.1 Customizing bash

When a new bash shell is started by a command that logs the user in, it reads start-up instructions from a file ~/.bash_profile before allowing the user to type. If a new bash shell is started by an already logged-in user, it reads start-up instructions from ~/.bashrc instead.

Most people put instructions that they always want performed in ~/.bashrc, and have their ~/.bash_profile load that after it has done its job. Many people accumulate a large number of “alias” commands, and these are usually kept in a separate file, ~/.bash_aliases. (Note that all of these files start with a ‘.’ in their name, so they are “hidden” from an ordinary ls command unless you give the -a option.)

1.1.1 .bash_profile

A typical .bash_profile:

# include .bashrc if it exists
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi

Actually, all that this does is to run my .bashrc file. So, whether I am logging in for the first time or not, my .bashrc file will be executed.

1.1.2 .bashrc

Here is a simplified version of my .bashrc file:

# ~/.bashrc: executed by bash(1) for non-login shells.

# If not running interactively, don't do anything
[ -z "$PS1" ] && return

# don't put duplicate lines in the history.
HISTCONTROL=ignoredups:ignorespace

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# Display the last two directories of my current path in the command prompt.
export PROMPT_DIRTRIM=2
PS1='\u@\h:\w\$ '

# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
    PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME}: ${PWD/$HOME/~}\007"'
    ;;
*)
    ;;
esac

# Set my default editor
EDITOR=emacs

# Default permissions when I create a new directory are 700,
# when I create a new file, 600
umask 077

# Load alias definitions from ~/.bash_aliases
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

# Add my ~/bin directory to my $PATH
if [ -d ~/bin ] ; then
    PATH=~/bin:"${PATH}"
fi

There’s a lot going on here, but most of it is explained in the comments.

A few things of note:

The EDITOR= line indicates that emacs is my editor of choice. Some programs will use this information to load an editor when you have large amounts of text to enter/alter.

The umask command sets my default file protections. See this discussion for details.

At the end, I add my ~/bin directory, where I store my personal collection of useful programs and scripts, to my $PATH variable so that I can run them without typing out their full path.

Just before that, I loaded my ~/.bash_aliases file.

1.1.3 .bash_aliases

The .bash_aliases file contains a number of personal shortcuts that I have established. Here’s a sample of mine:

alias cp='cp -i'
alias mv='mv -i'

# enable color support of ls & grep
if [ -x /usr/bin/dircolors ]; then
    test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
    alias ls='ls --color=auto'
    alias grep='grep --color=auto'
    alias fgrep='fgrep --color=auto'
    alias egrep='egrep --color=auto'
fi

atria() {
    ssh -f -Y -A $* atria.cs.odu.edu xterm -title "atria" -sl 500 -sb
}

sirius() {
    ssh -f -Y -A $* sirius.cs.odu.edu xterm -title "sirius" -sl 500 -sb
}

The aliases for cp and mv add a -i option to those commands. This option causes the commands to warn me if I am about to overwrite an existing file.

The alias for ls causes it to use different colors for directories and for executable files than for ordinary files (rather like ls -F).

The aliases for grep cause it to use colors to highlight the matched portion of each line of output.

Finally, I have set up atria and sirius as shorthand commands to launch a new xterm remotely connected to each of those machines.

1.2 Customizing tcsh

When a C-shell or TC-shell is started up (e.g., whenever you log in, or spawn off a new xterm), it executes the commands in a file called ~/.cshrc

You may or may not already have a .cshrc file. You can check by giving the command

ls -a ~

If you don’t have a .cshrc file, you should make one. If you do, consider changing it as described here. Edit your .cshrc file and insert the following:

setenv EDITOR emacs
umask 002
limit coredumpsize 0
#
# skip remaining setup if not an interactive shell
#
if ($?USER == 0 || $?prompt == 0) exit
set history=40
set ignoreeof
set ellipsis="..."
set prompt="%m:%c02> "
alias cp 'cp -i'
alias mv 'mv -i'
alias rm 'rm -i'
alias ls 'ls -F'
alias lsc 'ls --color=tty'
alias ff 'find . -name \!* -print'

The setenv line indicates that emacs is your editor of choice. Some programs, including many e-mail programs, will use this information to load an editor when you have large amounts of text to enter/alter.

The umask command sets your default file protections. See this discussion for details.

Of the remaining lines, the most interesting are the alias commands. These set up “abbreviations” for commands. In this case, we are mainly adding options to familiar commands. The first three aliases add a -i option to the cp, mv, and rm commands. This option causes each command to prompt for a confirmation whenever its action would result in a file being deleted. The fourth alias adds the -F option to all ls commands, making it easier to tell the difference between ordinary files, directories, and executable programs. The lsc alias uses color to give cues as to which files are plain files, which are executable, and which are directories. The final alias sets up a “find-file” command, ff. This will search the current directory and all subdirectories within it for a file matching a given pattern. For example, the command sequence

cd ~
ff '*.txt'

will list all of your files with the .txt extension.

After you have checked this file and saved it, try invoking a new copy of the shell

tcsh

to test out the changes in behavior.

Another change worth considering in a .cshrc file is adding additional directories to your PATH. For example, if you want to be sure that xterm and other X client programs are readily available whenever you are logged in, you would want to make sure that /usr/local/X11R6/bin is in your $PATH. You can do this by adding the command

set path = (/usr/local/X11R6/bin $path)

to your .cshrc file.

2 Customizing X Application Appearance

When an X application is started, it looks for instructions in a ~/.Xdefaults file. These instructions can be used to control the appearance and, to a lesser degree, the behavior, of most X applications.

Each line in a .Xdefaults file has the form

programName*resourceName: value

The programName is the name of the program to which this directive applies. Examples would be “xterm” or “emacs”. The resourceName is the name of some customizable property of that program. The value is the setting you want for that property. Exactly what resources can be customized and what the allowable values are likely to be will vary considerably from one program to another. You can usually find the list by issuing the man command to read the manual page for that program. Almost all programs will support a geometry resource to indicate how large a window to open and/or where to place it on the screen. Many support foreground and background resources to set text and background colors. Another common resource is font, for designating the main text font.

For example, try putting the following into your own ~/.Xdefaults file (on the Unix system):

!
! My xterm settings: I like a heavy font and always want a scrollbar
!
xterm*faceName: dejavu sans mono:medium:pixelsize=12
xterm*Geometry: 80x36
xterm*foreground: midnight blue
xterm*background: white
xterm*scrollBar: true
!
! A light-text-on-dark background version
dark*faceName: dejavu sans mono:medium:pixelsize=12
dark*Geometry: 60x24
dark*foreground: grey
dark*background: black

then launch a new xterm the “normal” way, and also with its instance name set to “dark” so that it picks up the dark* resources instead.

You can see the effects of changing the geometry and color resources.

By the way, emacs offers a nice way to see the range of available color names. Running emacs under X, try giving the command M-x list-colors-display. It’s better than the giant-size box of Crayolas!

The nastiest part of customizing X is dealing with fonts. As you can see from the above example, font names tend to be long and complicated. The command xlsfonts will list all the fonts known to your remote Unix client machine. But that list is unlikely to be helpful, partly because it’s so long and partly because the X windows server program running on your local PC may not know how to display most of those fonts. A more helpful program is called xfontsel. You can use it to sample all the available fonts, seeing how the entire character set will look on your machine.

3 Program Customizations

Many programs have their own mechanisms for customizing their appearance and behavior. Typically, these are stored in your home directory, but with file names beginning with “.”, making them invisible to your usual ls command unless you add the -a option. (Try doing a ls -a ~ right now - you might be surprised at how many things are already stored there.) The .login and .cshrc files discussed earlier are simply one example of this more general convention in Unix.

In many cases, these files represent information stored after you worked with a program and/or used menu settings to establish your preferences. For example, once you have used the insight X window interface to the gdb debugger, you will afterwards have a file ~/.gdbtkinit that records the breakpoints you had set in your last debugging session and which of the optional “view” windows were open. Under normal circumstances, you would never alter this file directly.

On the other hand, some of these files are intended for you to edit directly. For example, we have already discussed altering your .cshrc and .Xdefaults files. Another worth mentioning is the file ~/.emacs, which controls the behavior of the emacs editor. emacs may be the most customizable program ever written, and the range of things you can put into a .emacs file is well beyond the scope of this course. The best way for most people to get started in emacs customization is to look at other people’s .emacs files, such as mine, and to read the emacs documentation, particularly examples of some common things to put into a .emacs file.

Scripts

Steven Zeil

Last modified: Dec 13, 2016

Contents:
1 Parameters
2 Control Flow
3 The test and expr Programs
4 Scripting Example: The Student Test Manager
4.1 Sample Scenario
4.2 The Script
5 Debugging Your Scripts
5.1 Isolate the Problem
5.2 Add Debugging Output
5.3 Trace the Execution

You can put any sequence of Unix commands into a file and turn that file into a command. Such a file is called a script. For example, suppose that you are working on a program myprog and have several files of test data that you run through it each time you make a change. You might create a file dotest1 with the following lines:

./myprog < test1.dat > test1.dat.out
./myprog < test2.dat > test2.dat.out
./myprog < test3.dat > test3.dat.out
./myprog < test4.dat > test4.dat.out
./myprog < test5.dat > test5.dat.out

You can’t execute dotest1 yet, because you don’t have execute permission. So you would use the chmod command to add execute permission:

chmod u+x dotest1

Now you could execute dotest1 by simply typing

./dotest1

Most shells provide special facilities for use in scripts. Since these differ from one shell to another, it’s a good idea to tell Unix which shell to use when running the script. You do this by placing the command #!path-to-the-shell in the first line of the script.

There are two main families of shells. The sh family includes sh and bash. The csh family includes csh and tcsh. bash and tcsh have a number of features that are oriented towards people typing in commands directly from the keyboard. For example, both feature command and filename completion when the Tab key is pressed. Such interactive features are unnecessary when writing scripts and, in some cases, could actually create problems. So we tend to fall back on the simpler sh and csh shells when writing command scripts. In this lesson, we will focus on sh.

In fact, you can list any program in the #! line, not just a command shell, and Unix will use that program to process the remainder of the lines in the script. Any text file with execute permission can be invoked as a program. The first line must identify the program to be run (after a starting #!). The remaining lines are fed to the standard input of that program. So another possibility for executing your test code would be to put the line #!./myprog at the front of each of the test data input files test1.dat, test2.dat, …, and then execute those data files!
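To make the #! convention concrete, here is a minimal sketch (hello.sh is a hypothetical throwaway name) that builds a two-line script, marks it executable, and runs it:

```shell
# Create a tiny script: the #! line names the interpreter,
# and the remaining lines are handed to that program.
printf '#!/bin/sh\necho hello from a script\n' > hello.sh
chmod u+x hello.sh
./hello.sh    # prints: hello from a script
```

The chmod step matters: without execute permission, the kernel will refuse to run the file no matter what its first line says.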

1 Parameters

We can pass parameters to shell scripts from the command line. For example, suppose we wanted a script to execute a single test like this:

dotest2 test1.dat

that would feed test1.dat (or whatever) to the input of myprog, saving the output in test1.dat.out.

We use the symbol $k to stand for the kth argument given to the script. So we can write our script, dotest2, as follows:

#!/bin/sh
./myprog < $1 > $1.out

After the appropriate chmod, this could then be invoked as

./dotest2 test1.dat test2.dat test3.dat test4.dat test5.dat

Of course, scripts can have more than one parameter.
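As a sketch of a script with two parameters (demo.sh is a hypothetical name, and $#, the standard sh count of arguments, is not introduced above):

```shell
# Build a throwaway script that uses two parameters and the argument count $#.
cat > demo.sh <<'EOF'
#!/bin/sh
echo "got $# args; first=$1 second=$2"
EOF
chmod u+x demo.sh
./demo.sh alpha beta    # prints: got 2 args; first=alpha second=beta
```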

Example 1: Try This: A script with command line parameters

Use emacs or any other editor to create the following file, saving it as mcdonald

#!/bin/sh
echo Old McDonald had a farm.
echo EIEIO
echo And on that farm he had a $1.
echo EIEIO
echo With a $2, $2 here,
echo And a $2, $2 there.

Do

chmod +x mcdonald

to make it executable, and then try invoking it this way:

./mcdonald cow moo
./mcdonald dog bark

2 Control Flow

Shells feature control flow based on “status codes” returned by programs to indicate whether the program execution “succeeded” or “failed”. For example, this script tests to see if the gcc compiler has successfully compiled a file.

if gcc -c expr.c; then
   echo All is well.
else
   echo Did not compile!
fi

In sh (and its relatives bsh and bash), the if is followed by a list of commands, each terminated by a ‘;’. The status code of the last command is used as the if condition.

This idea of returning a status code explains why, in C and C++, the function main is always supposed to return an int:

int main (int argc, char** argv)
{
    ⋮
    return 0;
}

The returned value is the status code. You are supposed to return a zero when your program works normally, but you are supposed to return a non-zero “error code” if your program terminates abnormally (e.g., if it discovers that a file it needs is missing).

The shell if statements will take the “then” branch when a command returns a status code of 0, but any non-zero value denotes a failure and will cause the script to follow the “else” branch. Most Unix commands will list the status codes they return on their man pages (the pages you get when running the command man command).
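You can inspect status codes directly at the keyboard through the special variable $? (standard in sh-family shells, though not otherwise used in this lesson), which holds the status of the most recent command:

```shell
true       # a command that always succeeds
echo $?    # prints: 0
false      # a command that always fails
echo $?    # prints a non-zero value (1 on typical systems)
```

This is handy when reading a man page's list of status codes: run the command, then echo $? to see which case you hit.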

Looping is also available in the shells. One of the most commonly used loop forms is that of visiting every item in some list. The list is often all files satisfying some wildcard pattern:

for file in *.txt
do
   echo $file
done

Another common kind of loop visits every parameter passed to the script

for x in $*
do
   echo $x
done

$* is a list of all the command parameters. If either of these scripts were stored in a file testp and then invoked as

./testp a b c

the output would be

a
b
c

If invoked as

./testp *.txt

the output would be a list of all .txt files in the working directory.

Another common scripting pattern is a script that has a few special parameters at the beginning, followed by an arbitrary number of remaining parameters (often filenames). The shift command helps us to handle these. Each call to shift removes the first element of the $* list.

Example 2: Try This: Looping through parameters

Create a file testp2 containing this script:

#!/bin/sh
p1="$1"
shift
p2="$1"
shift
echo The first two parameters are\
     $p1 and $p2
echo The remaining parameters are $*
for x in $*
do
   echo $x
done

Use chmod to make that file executable and invoke it as

./testp2 a b c d

Study the output. Do you see the effects of the shifts? If not, try commenting out the shift commands (put a ‘#’ character in front of them) and run the command again.

Once you are satisfied that you know what is happening and why, try invoking it like this:

./testp2 a "b c" 'd e f' g

The “$*” is expanded into a simple string “a b c d e f g” and the for loop actually breaks that string apart at each blank space. The result does not respect the grouping of parameters imposed by the quotes in the original command. Hence 'd e f' gets broken into three pieces. On the other hand, the shift command does respect the original quoting. Hence the "b c" were kept together by the shifting.
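There is also a standard sh spelling that avoids this re-splitting entirely: "$@", with the quotes, expands to the original parameters with their grouping intact. It is not used in the examples above, but a quick sketch (testq is a hypothetical script name) shows the difference:

```shell
# A loop over "$@" preserves each original parameter as one unit.
cat > testq <<'EOF'
#!/bin/sh
for x in "$@"
do
   echo "[$x]"
done
EOF
chmod u+x testq
./testq a "b c"    # prints [a] on one line, then [b c] on the next
```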

A while loop is also available.

while condition; do
   commands
done

It’s hard to see how we can use this with the tools we have discussed so far. In the next section, we’ll look at some useful ways to write conditions that can take advantage of this style of loop, including providing a more accurate way to loop through command line arguments.

3 The test and expr Programs

One thing that becomes obvious quickly is that the status code based testing is limited. Often we want to

- check for the presence/status of files,
- do tests on strings, especially in variables & script parameters, or
- do numeric tests,

but status codes indicating whether or not a program executed successfully don’t seem to help us very much.

The solution is to use a program whose job is to

- do the tests, and
- return the appropriate value as a status code.

The Unix program that does this is called test:

gcc -c expr.c
if test -r expr.o; then
   echo All is well.
else
   echo Did not compile!
fi

test takes a bewildering variety of possible parameters. You can see the whole list by giving the command man test. Many of these are used for checking the status of various files. The -r expr.o in the above script checks to see if a file named expr.o exists and if we have permission to read it. Some common file tests are:

test      is true if
-r file   file exists and is readable
-w file   file exists and is writable
-x file   file exists and is executable
-d file   file exists and is a directory
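A quick sketch you can run to see a few of these file tests in action; sample.txt is a scratch file created on the spot:

```shell
touch sample.txt    # make an empty (readable) file
test -r sample.txt && echo sample.txt is readable
test -d sample.txt || echo sample.txt is not a directory
test -d .          && echo . is a directory
```

The && and || connectors used here are themselves driven by the status code that test returns, which is the whole point of the program.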

Strings are compared with = and != in sh.

if test $USER = zeil; then
   echo Nice guy!
else
   echo Who are you?
fi

We can use the ability to compare strings to provide a more accurate solution to the problem of looping through command-line arguments than we were able to achieve in the previous section.

Example 3: Try This: Looping through parameters with while

Create a file testp3 containing this script:

#!/bin/sh
p1="$1"
shift
p2="$1"
shift
echo The first two parameters are\
     $p1 and $p2
echo The remaining parameters are $*
while test "$1" != ""; do
   file="$1"
   shift
   echo $file
done

Use chmod to make that file executable and invoke it as

./testp3 a b c d

Study the output. Do you see the effects of the shifts? If not, try commenting out the shift commands (put a ‘#’ character in front of them) and run the command again.

Once you are satisfied that you know what is happening and why, try invoking it like this:

./testp3 a "b c" 'd e f' g

This time you should see that the original grouping of the commands via quotes is being preserved. That could be important if we had file names that had blank characters inside the name. Such names are somewhat rare in Unix practice, but common in Windows, so if you tend to work with files created on both kinds of systems, you might want to be sure that your scripts can handle such names.

Numbers are compared with rather clumsy operators in sh (-eq -ne -lt -gt -le -ge):

if test $count -eq 0; then
   echo zero
else
   echo non-zero
fi

But where do numbers come from if all the variables contain strings?

From yet another program, of course. The shells themselves have no built-in numeric capability. Calculations can be performed by the expr program. This program treats its arguments as an arithmetic expression, evaluates that expression, and prints the result on standard output. For example:

expr 1
1
expr 2 + 3 \* 5
17

Note the use of \ to “quote” the following character (*). Without that backwards slash, the shell into which we typed the command would treat the * as the filename wildcard, replacing it with a list of all files in the current directory, and expr would have actually seen something along the lines of

expr 2 + 3 file1.txt file2.txt myfile.dat 5

Now, how do we get the output of an expr program evaluation into a variable or a script expression where it can do some good? For this we use the convention that the backwards apostrophes (`), when used to quote a string, mean “execute this string as a shell command and put its standard output right here in this command”.

For example:

echo Snow White and the expr 6 + 1 Dwarves
Snow White and the expr 6 + 1 Dwarves
echo Snow White and the `expr 6 + 1` Dwarves
Snow White and the 7 Dwarves

With these two ideas, we can now do numerical calculations in our scripts:

count=0
for file in *
do
   count=`expr $count + 1`
done
echo There are $count files\
     in this directory.

Example 4: Try This: Looping from the command line

Although control flow commands are most commonly used inside scripts, they are just part of the same shell language that you use when you type commands directly at the keyboard. Try the following commands:

If your login shell is bash:

for file in ~/*
do
   echo My home directory contains $file
done

If your login shell is tcsh:

foreach file (~/*)
   echo My home directory contains $file
end

This can be a useful trick sometimes when you want to apply a command to every file in some directory (although you can usually accomplish the same task with some combination of find or xargs).
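As one hedged sketch of the find alternative just mentioned: find can run a command once per matching file. Note that the -maxdepth option is a GNU extension (it limits the search to the current directory), so the shell loop above remains the more portable choice.

```shell
# Echo a line for each .txt file in the current directory only.
find . -maxdepth 1 -name '*.txt' -exec echo found {} \;
```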

4 Scripting Example: The Student Test Manager

As an example of how to bring all those scripting details together, let’s look at some scripts to aid in testing programs. Many programming students wind up adopting a hit-and-miss approach to testing their code, in part because they don’t set themselves up with an easy way to repeat a large number of tests every time they make a change to their programs.

What we’d like to end up with is a simple system for testing non-GUI programs. The idea is that the student designs a number of tests and can then issue a command to run all or a selected subset of those tests. The command should run those tests, capturing the program outputs in files that the student can examine later. Furthermore, we can save the student a bit of time by letting him or her know if the program output has changed on any of those tests.

4.1 Sample Scenario

So, suppose the student is working on a program named myProg and has designed 20 test cases. Once the program compiles successfully, the student could say

./runTests ./myProg 1 20

to run all 20 tests. The output might look something like:

Starting test 1...
Starting test 2...
Starting test 3...
Starting test 4...

Maybe at this point the output stops, suggesting that the program has been caught in an infinite loop. The student kills the program with ^C. Now looking in the directory, the student finds files testOut1.txt, testOut2.txt, testOut3.txt, and testOut4.txt corresponding to the tests that were actually started. The student looks at the first 3 of these, decides that the captured output looks correct, and starts debugging the program to figure out why it hung up on test 4. Eventually the student makes a change to the program, recompiles it, and tries again, this time just running test 4.

./runTests ./myProg 4 4
Starting test 4...
** Test 4 output has changed

Not surprisingly, the output from test 4 is different, because the infinite loop has now, apparently, been fixed. Checking testOut4.txt, everything looks good.

Encouraged, the student launches the whole test set once again

./runTests ./myProg 1 20
Starting test 1...
Starting test 2...
** Test 2 output has changed
Starting test 3...
** Test 3 output has changed
Starting test 4...
Starting test 5...
⋮
Starting test 20...

The unexpected has occurred. The fix to make test 4 work has changed the behavior of the program on tests 2 and 3, which had previously been believed to be OK. The student must go back and check these (as well as the outputs of tests 5…20) to see what has happened. Annoying? Yes, but it’s better that the student should discover these changes in behavior before submitting than that the grader should do so after submission!

4.2 The Script

As we have envisioned it, our runTests script takes three parameters:

1. The name of the program to run

2. The number of the first test to be performed

3. The number of the last test to be performed

So we start our script by gathering those three parameters:

#!/bin/sh
programName=$1
firstTest=$2
lastTest=$3

Clearly, the main control flow here will be a loop going through the requested test numbers.

#!/bin/sh
programName=$1
firstTest=$2
lastTest=$3
#
# Loop through all tests
testNum=$firstTest
while test $testNum -le $lastTest; do
   ⋮
   testNum=`expr $testNum + 1`
done

For each test, we will eventually want to compare the output from this test, stored in testOutN.txt (where N is the test number), to the output from the previous run of that test, which we will assume is stored in testOutN.old.txt. The cmp command lets us compare two files to see if their contents are identical.
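cmp is easy to try in isolation. It returns status 0 when the files match and non-zero when they differ; the sketch below adds the standard -s option to suppress cmp's own "differ" message so that only the status code is used (the f1/f2/f3 files are scratch files):

```shell
echo one > f1.txt
echo one > f2.txt
echo two > f3.txt
if cmp -s f1.txt f2.txt; then echo f1 and f2 match; fi
if cmp -s f1.txt f3.txt; then :; else echo f1 and f3 differ; fi
```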

#!/bin/sh
programName=$1
firstTest=$2
lastTest=$3
#
# Loop through all tests
testNum=$firstTest
while test $testNum -le $lastTest; do
   ⋮
   # Has the output changed?
   if test -r testOut$testNum.old.txt; then
      if cmp testOut$testNum.old.txt testOut$testNum.txt; then
         donothing=0
      else
         echo \*\* Test $testNum output has changed
      fi
   fi
   testNum=`expr $testNum + 1`
done

To set this up, we must determine just where the .old.txt files come from. They are simply the previous version of the test output files (if that particular test has ever been run).

#!/bin/sh
programName=$1
firstTest=$2
lastTest=$3
#
# Loop through all tests
testNum=$firstTest
while test $testNum -le $lastTest; do
   # Save the previous output from this test
   if test -r testOut$testNum.txt; then
      /bin/mv testOut$testNum.txt testOut$testNum.old.txt
   fi
   ⋮
   #
   # Has the output changed?
   if test -r testOut$testNum.old.txt; then
      if cmp testOut$testNum.old.txt testOut$testNum.txt; then
         donothing=0
      else
         echo \*\* Test $testNum output has changed
      fi
   fi
   testNum=`expr $testNum + 1`
done

Finally, we come to the heart of the matter. We need to actually execute the program, saving the output in the appropriate testOut... file. Exactly how we want to execute the program depends upon how the program gets its input data. I’m going to assume, for the moment, that this program reads its input data from the standard input stream, and that the student saves the input test cases in testIn1.txt, testIn2.txt, …

#!/bin/sh
programName=$1
firstTest=$2
lastTest=$3
#
# Loop through all tests
testNum=$firstTest
while test $testNum -le $lastTest; do
   #
   # Save the previous output from this test
   if test -r testOut$testNum.txt; then
      /bin/mv testOut$testNum.txt testOut$testNum.old.txt
   fi
   # Run the test, capturing both standard output and error messages
   echo Starting test $testNum...
   $programName < testIn$testNum.txt > testOut$testNum.txt 2>&1
   #
   # Has the output changed?
   if test -r testOut$testNum.old.txt; then
      if cmp testOut$testNum.old.txt testOut$testNum.txt; then
         donothing=0
      else
         echo \*\* Test $testNum output has changed
      fi
   fi
   testNum=`expr $testNum + 1`
done

By making minor changes to the way the program is run, we can accommodate a number of different possible program styles. How would you change this script for a program that read no inputs at all, but could be invoked with different command-line parameters?

There are a number of possibilities, but I would put the various parameters into the testIn... files, and run them this way:

#!/bin/sh
⋮
#
# Run the test, capturing both standard output and error messages
echo Starting test $testNum...
$programName `sed -e s/[\\r\\n]//g testIn$testNum.txt` \
    > testOut$testNum.txt 2>&1
⋮

5 Debugging Your Scripts

Scripting is just another form of programming. Just like programs that you write in C++ or other “traditional” programming languages, you need to test your scripts to see if they work, and, just like those other programs, your scripts are probably not going to work correctly on the first try. Debugging your scripts is not all that different from debugging traditional programs, either. Many of the basic techniques are similar:

5.1 Isolate the Problem

If you have a script that does several things in succession, try to determine how far along that sequence you have been getting before things go wrong. The whole reasoning process that we call “debugging” is a lot easier if you know where to look.

You can try to do this by examining any intermediate results to see if they are correct. In particular, if your script produces some temporary or working files, look at those. If your script rms those files when it’s done with them, try commenting out the rm statements (put a # in front of them) until you’re sure everything is working. Another way to examine intermediate results is by adding debugging output to print them out (see below).

If that doesn’t work, you can try “shortening” your script by commenting out the later steps and seeing if the partial script seems to work, then uncommenting the next step and testing the script again, and so on.

5.2 Add Debugging Output

Just as you might add extra output statements to a C++ program to reveal the values of selected important variables, you can do the same thing with scripts. The echo command is used for this purpose. For example, if we were having problems with this script:

#!/bin/sh
count=0
for file in *
do
   count=`expr $count + 1`
done
echo There are $count files in this directory.

we might add some temporary output to see just what was going on inside the loop:

#!/bin/sh
count=0
for file in *
do
   echo Looking at $file.
   count=`expr $count + 1`
   echo Count is $count.
done
echo There are $count files in this directory.

Debugging output like this not only reveals the values of variables, but also can be valuable in showing which branch of an if was selected, how many times a loop is executed, etc.

Because echo can print pretty much anything you give it as an argument, it’s particularly useful when you have a script that runs other programs and suspect that your script may not be invoking those programs correctly. For example, if you have a script huntFor:

#!/bin/sh
#
# Hunt through a list of files for
# a string. E.g.,
#   huntFor sqrt *.cpp
#
Search=$1
shift
grep -l $Search $*

and are wondering if the grep command is being issued the way you expect, you might try:

#!/bin/sh
#
# Hunt through a list of files for
# a string. E.g.,
#   huntFor sqrt *.cpp
#
Search=$1
shift
echo grep -l $Search $*
grep -l $Search $*

to see the command just before it gets issued.

You might find it valuable to actually create the above script, storing it in a file named huntFor, and try testing it like this:

./huntFor yes ~/UnixCourse/compileAsst/*.cpp
./huntFor bigger ~/UnixCourse/compileAsst/*.cpp
./huntFor "bigger than" ~/UnixCourse/compileAsst/*.cpp

The last case reveals a problem, which is easily fixed:

#!/bin/sh
#
# Hunt through a list of files for
# a string. E.g.,
#   huntFor sqrt *.cpp
#
Search=$1
shift
grep -l "$Search" $*

If you have tried those tests and don’t see how this fix works, review Quoting.
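To see why the quotes matter, here is a minimal sketch, independent of huntFor, showing how an unquoted two-word value is split by the shell into two separate arguments:

```shell
#!/bin/sh
Search="bigger than"
set -- $Search          # unquoted: the shell splits on the space
echo "unquoted: $# arguments"
set -- "$Search"        # quoted: the value stays a single argument
echo "quoted: $# arguments"
```

In the unquoted version, grep would receive "bigger" as its pattern and then treat "than" as the first file name to search.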

The use of echo to print out entire commands does carry a bit of a risk if the commands involve output redirection or pipes. For example, if the original command is redirecting its output into a file:

cat $1 $2 > $3

then throwing an echo version in front:

echo cat $1 $2 > $3
cat $1 $2 > $3

will actually result in your debugging output being written into the output file, probably messing things up even more than they were before.
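If you do want the echo approach in a case like this, one quick workaround is to quote the entire command, so the > is just text inside echo’s argument rather than a live redirection. A sketch, with the positional parameters simulated via set:

```shell
#!/bin/sh
set -- a.txt b.txt combined.txt   # hypothetical $1, $2, $3
# The quotes keep > inside echo's argument, so the debug line
# goes to the terminal instead of into $3:
echo "cat $1 $2 > $3"
```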

In cases like that, you can obtain the same information more easily by our next technique, tracing the execution.

5.3 Trace the Execution

If you were working with a C++ program, you could use gdb or ddd to step through the program and see exactly which statements were being executed, in which order.

Shells provide a more primitive, but still useful, tracing facility. The “-x” option, supplied to sh or csh, asks it to list each command before it executes it (and after replacing any variables by their values).

So, if you have written a shell script named “myScript” with the first line:

#!/bin/sh

you might test it this way:

sh -x myScript parameters
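You can also limit tracing to just the suspect region by turning it on and off inside the script itself with set -x and set +x, a standard feature of Bourne-family shells. For example, to trace only the loop in the file-counting script from earlier:

```shell
#!/bin/sh
count=0
set -x                    # begin tracing from this point
for file in *
do
   count=`expr $count + 1`
done
set +x                    # tracing off again
echo There are $count files in this directory.
```

The trace lines (each prefixed with +) are written to standard error, so they stay separate from the script’s normal output.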

Example 5: Tracing a Script

Return, if necessary, to the directory where you created your testp3 script, and run

sh -x testp3 a b c d

and

sh -x testp3 a "b c" 'd e f' g

Observe the results of the trace. Take particular note of how each iteration of the loop is presented to you, in turn.