<<

B16: Software Engineering 2 Four Parts

Structured Programming Design patterns

Algorithms, data structures, complexity, Tested program structures to solve typical correctness problems B16 Software Engineering languages:

Structured Programming Object Oriented Programming Operating Systems Lecture 1: Software engineering Object-oriented programming Hardware abstraction and virtualisation

Dr Andrea Vedaldi Object-oriented languages: C++ Programming the hardware 4 lectures, Hilary Term

For lecture notes, tutorial sheets, and updates see http://www.robots.ox.ac.uk/~vedaldi/teach.html

B16 Part 1: Structured Programming 3 Texts 4

Software engineering principles Algorithms

Design, modularity, abstraction, encapsulation, Proving algorithm correctness by mathematical etc. induction

Structured programming languages Time and space complexity

Interpreted (MATLAB) vs compiled (C) languages Recursion

Control fow Sequencing, alternation, iteration Functions and libraries

Data Data types: primitive, aggregate, and compound The C programming language, 2nd edition Introduction to algorithms Local and global variables, parameters Kernaghan & Ritchie Cormen, Leiserson, Rivest, Stein The heap and the stack Lecture 1 outline 5 Lecture 1 outline 6

The challenge of building software The challenge of building software

Why we care about software Why we care about software

The size and complexity of code The size and complexity of code

Software engineering Software engineering

The aims and scope of software engineering The aims and scope of software engineering

Abstraction and modularity Abstraction and modularity

Design, validation & verifcation Design, validation & verifcation

Structured programming Structured programming

Structuring programs by using abstractions in a programming language Structuring programs by using abstractions in a programming language

Types of languages: imperative vs declarative Types of languages: imperative vs declarative

Fundamental abstractions Fundamental abstractions

Software is ubiquitous 7 Software in engineering: Design & Control 8

From nuclear reactors ...

Business / productivity Communication Entertainment

... to cars. The “size” of software 9 10

SLOC: number of source lines of code

/* Cy=CyMin + iY*PixelHeight; c program: if (fabs(Cy)< PixelHeight/2) Cy=0.0; /* Main antenna */ ------for(iX=0;iX { #include Zy=2*Zx*Zy + Cy; int main() Zx=Zx2-Zy2 +Cx; { Zx2=Zx*Zx; /* screen ( integer) coordinate */ Zy2=Zy*Zy; int iX,iY; }; const int iXmax = 800; /* compute pixel color (24 bit = 3 bytes) */ const int iYmax = 800; if (Iteration==IterationMax) /* world ( double) coordinate = parameter plane*/ { /* interior of Mandelbrot set = black */ double Cx,Cy; color[0]=0; const double CxMin=-2.5; color[1]=0; const double CxMax=1.5; color[2]=0; const double CyMin=-2.0; } const double CyMax=2.0; else /* */ { /* exterior of Mandelbrot set = white */ double PixelWidth=(CxMax-CxMin)/iXmax; color[0]=255; /* Red*/ double PixelHeight=(CyMax-CyMin)/iYmax; color[1]=255; /* Green */ /* color component ( R or G or B) is coded from 0 to 255 */ color[2]=255;/* Blue */ /* it is 24 bit color RGB file */ }; const int MaxColorComponentValue=255; /*write color to the file*/ FILE * fp; fwrite(color,1,3,fp); char *filename="new1.ppm"; } char *comment="# ";/* comment should start with # */ } static unsigned char color[3]; fclose(fp); /* Z=Zx+Zy*i ; Z0 = 0 */ return 0; double Zx, Zy; } output: double Zx2, Zy2; /* Zx2=Zx*Zx; Zy2=Zy*Zy */ /* */ int Iteration; const int IterationMax=200; /* bail-out value , radius of circle ; */ const double EscapeRadius=2; double ER2=EscapeRadius*EscapeRadius; /*create new file,give it a name and open it in binary mode */ fp= fopen(filename,"wb"); /* b - binary mode */ About 90 SLOCs of C code /*write ASCII header to the file*/ fprintf(fp,"P6\n %s\n %\n %d\n %d\n",comment,iXmax,iYmax,MaxColorComponentValue); /* compute and write image data bytes to the file*/ for(iY=0;iY

Source: https://www.rosettacode.org/wiki/Mandelbrot_set#C

11 12 13 14

What 130K lines of code look like 15 Lecture 1 outline 16 Apollo-11, 1969 The challenge of building software

Why we care about software

The size and complexity of code

Software engineering

The aims and scope of software engineering

Abstraction and modularity

Design, validation & verifcation

Structured programming

Structuring programs by using abstractions in a programming language

Margaret Hamilton, Types of languages: imperative vs declarative director of software engineering Fundamental abstractions Code available here: https://github.com/chrislgarry/Apollo-11/ Software engineering 17 Abstraction and modularity 18

Aims Scope The complexity of software is reduced via: Modularity Software engineering seeks principles and Software engineering is concerned with all aspects of Decomposing the system into smaller components methodologies to make programs that are: software production. It includes: Abstraction Usable Theory Each component's behaviour has a simple description, independent of the Meet their requirements, including being Computability, algorithms, correctness, other components and of the internal implementation acceptable by the users complexity, formal languages

Dependable Tools and best practices Benefts: Reliable, secure, safe Specifc programming languages, programming Understandability environments, developer tools to build, debug, Individual components are simple and easy to understand Maintainable and analyse programs Can be updated with minimal effort Reuse Management The same component can be used in many applications (e.g. transistors) Effcient Processes of software creation & maintenance Run as fast as possible with limited resources Isolating changes The implementation of a component (e.g. transistor materials) can be changed as long as the behaviour (e.g. electrical properties) does not

Abstraction and modularity 19 Abstraction and modularity 20 Examples A hierarchy of increasingly-powerful abstractions

transistor transistors fip-fops registers CPU

A Tristate buffers (4 in parallel) CPU MAR 4 Address Bus Bus PC SP Memory INCpc/LOADpc IC β IB, CLKmem = C 4 IR IR(opcode) IR(address) MBR IE = IC + IB, Data Bus B AC VEB 60 mV 4 = CU Status ALU

s SETalu Select CLK Control Lines to Registers, ALU, Memory, etc Outside the CPU

operational amplifer computer virtual machine (OS) application / interpreter

I+ V+ Iout

- + Vout

V- I-

Vgnd

V+ = V-,I+ = 0 A, I- = 0 A [Some images from Micro-controller, Murray] Producing software 21 Design & implementation 22

A software is a set of related activities that leads to the production of a software product [Sommerville]. Design Implementation

Key activities of a software process: Formal Top-down

collect the requirements implement frst the most abstract module

break down the problem in sub-problems use placeholders (mock-ups) for sub-modules

Design & map them to software sub-modules Bottom-up Specifcation implementation start from elementary module Informal (aka "extreme programming") then build more complex ones using them quickly implement & iterate over a prototype

incorporate user feedback at each iteration

Verifcation & Evolution validation

Verifcation & validation 23 Lecture 1 outline 24

Verifcation Testing strategies The challenge of building software

Why we care about software Squash the bugs: Top-down: test that the system behaves correctly overall. If not, dig into your code to fnd the problem. The size and complexity of code Black-box: test what the code does

White-box: inspect the code (possibly automatically) Bottom-up: create unit tests to test individual sub- Software engineering modules, starting from the smallest ones. This is The aims and scope of software engineering Validation extremely useful! Abstraction and modularity Test the code in the real world and get feedback Test coverage Design, validation & verifcation Exhaustive testing is impossible More often than not, the requirements are wrong (i.e. you don't know exactly how the software should behave Pick representative examples of “normal” inputs as Structured programming well as “corner cases” to be useful to the user). Structuring programs by using abstractions in a programming language Example: Test a function to compute the tangent Types of languages: imperative vs declarative normal input: tan(1.1) corner cases: tan(-pi/2), tan(0), Fundamental abstractions tan(pi/2) Imperative languages 25 Imperative languages 26

The most common programming languages are imperative. Abstractions can have a massive impact on the ease of use, understandability, maintainability, power, and effciency of programming languages. An imperative program is a list of instructions to be executed in a specifed order. Look at these three versions of the same program:

Different imperative languages are characterised by different abstractions. Machine Code (Intel ) Machine Language C Language Machine Code (Intel x86) Machine Language 0: 55 pushq %rbp int f() 0: 55 pushq %rbp 1: 48 89 e5 movq %rsp, %rbp { 1: 48 89 e5 movq %rsp, %rbp 4: c7 45 fc 01 00 00 00 movl $1, -4(%rbp) int x = 1 ; 4: c7 45 fc 01 00 00 00 movl $1, -4(%rbp) b: c7 45 f8 02 00 00 00 movl $2, -8(%rbp) int y = 2 ; b: c7 45 f8 02 00 00 00 movl $2, -8(%rbp) 12: 8b 45 fc movl -4(%rbp), %eax int z = x + y ; 12: 8b 45 fc movl -4(%rbp), %eax 15: 03 45 f8 addl -8(%rbp), %eax return z ; 15: 03 45 f8 addl -8(%rbp), %eax 18: 89 45 f4 movl %eax, -12(%rbp) } 18: 89 45 f4 movl %eax, -12(%rbp) 1b: 8b 45 f4 movl -12(%rbp), %eax 1b: 8b 45 f4 movl -12(%rbp),Machine %eax language ’s main abstraction 1e: 5d popq %rbp are mnemonics (readable names for 1e: 5d popq %rbp 1f: c3 retq instructions, registers, etc.) 1f: c3 retq

Imperative languages 27 Declarative (functional) languages 28

Abstractions have a massive impact on the ease of use, understandability, A declarative program specifes the desired behaviour of the program, but not how this is achieved in term of maintainability, power, and effciency of programming languages. elementary steps.

Here's a preview of some of the abstractions we are going to learn in this course: Regular expression C language (declarative) (imperative) [a-Z]* bool f(char const * str) local variable C Language { function or initialiser bool match = true ; procedure int f() constant literal This means that the program should while (*str) { { match any string consisting only of match &= ('a' <= *str statement int x = 1 ; letters from ‘a’ to ‘Z’. && *str <= 'Z') ; int y = 2 ; binary operator It says what the program should do. str ++ ; code block } int z = x + y ; return match ; return z ; } } This is a C implementation of the same program. return with value It specifes how to solve the problem in terms of elementary steps. Structure in imperative languages 29 Hello world! 30 An overview of fundamental abstractions Getting started with C programming

Data Control fow Data types: elementary, aggregate Blocks, conditionals, and compound. loops, switches. Variables. Statements. #include

int main(int argc, char** argv) Procedural languages { Functions, data scoping, encapsulation, recursion. printf("Hello, world!\n") ; return 0 ; }

Object-oriented programming (Part II of the course) Attach behaviour to data.

Hello world! 31 Hello world! 32 Getting started with C programming Getting started with C programming Hello world! 33 Getting started with C programming

B16 Software Engineering Structured Programming Lecture 2: Control fow, variables, procedures, and modules

Dr Andrea Vedaldi 4 lectures, Hilary Term

For lecture notes, tutorial sheets, and updates see http://www.robots.ox.ac.uk/~vedaldi/teach.html

Lecture 2 outline 35 Lecture 2 outline 36

Control fow Practical notes Control fow Practical notes

Imperative languages Clean vs obfuscated code Imperative languages Clean vs obfuscated code

Goto (considered harmful) Avoid cut & paste Goto (considered harmful) Avoid cut & paste

Blocks, conditionals, loops Blocks, conditionals, loops

State State

Variable Variable

Data types Data types

Static vs dynamic typing Static vs dynamic typing

Compiled vs interpreted language Compiled vs interpreted language

MATLAB functions, subfunctions, toolboxes MATLAB functions, subfunctions, toolboxes

C/C++ declaration, defnition, objective, and C/C++ declaration, defnition, objective, and executable fles executable fles Control fow 37 Goto with line numbers (avoid) 38

An imperative program is a list of statements (instructions) to execute Control fow using Machines like this really existed! Statements are executed sequentially BASIC language example:

The (PC) is a register pointing to the current instructions 10 i ← 0 20 i ← i + 1 It is incremented to move to the next instruction 30 print i, “ squared is “, i * i 40 if i >= 10 then goto 60 10 sleep eight hours 50 goto 20 11 wake up PC 101211 60 print “that’s all folks!” 12 have breakfast

To insert a line of code, one uses an intermediate line number Branching statements (make sure to leave some spaces in the numbers for later!) Allow for non-sequential execution, conditionally on the state of the program 10 i ← 0 Branching is performed by resetting the program counter 20 i ← i + 1 30 print i, “ squared is “, i * i

35 print i, “ cubed is “, i * i * i 13 if today is Saturday then goto 10 14 leave home 40 if i >= 10 then goto 60 50 goto 20 1982-1994 60 print “that’s all folks!”

Goto with labels (better, but still avoid) 39 Structured control fow 40

Labels are an abstraction of line numbers and simplify the use of goto Goto is (almost) never used. Any program can be Spaghetti monster expressed in terms of three simple control A label is just a name given to a statement in the sequence structures [Böhm-Jacopini 66]: i ← 0 more: i ← i + 1 print i, “ squared is “, i * i i ← 0 if i >= 10 then goto end more: i ← i + 1 Blocks: sequences of executable statements goto more print i, “ squared is “, i * i end: print “that’s all folks!” { do_this ; do_that ; do_something_else ; } if i >= 10 then goto end goto more end: print “that’s all folks!” [Dijkstra, E. W. (1968). Letters to the editor: go to Conditionals: execute a block if a condition is statement considered harmful. true Communications of the ACM, 11(3), 147-148.] we call this "" if (condition) { } Structured program Loops: keep executing a block until a condition { i ← 0 remains true while (i < 10) { while (condition) { } i ← i + 1 print i, “ squared is “, i*i } print “that’s all folks!” } Structured control fow 41 Structured control fow: procedures 42

A way to create a software module or component is to wrap a sequence of The code is much easier to understand because Spaghetti monster each block has statements in a procedure. i ← 0 procedure print_n_squared_numbers(n) { procedure do_something () { only one at the beginning more: i ← i + 1 print i, “ squared is “, i * i i ← 0 print “first four squares” if i >= 10 then goto end while (i < n) print_n_squared_numbers(4) ; only one point at the end goto more { } end: print “that’s all folks!” i ← i + 1 print i, “ squared is “, i*i ; procedure do_something_more () { } print “first ten squares” print “that’s all folks!” print_n_squared_numbers(10) ; } }

A procedure implements a reusable functionality (behaviour) hiding the internal implementation details. Structured program enter { i ← 0 Examples while (i < 10) y = tan(x) // compute the tangent of a number { - i ← i + 1 - printf(“a string”) // display a string on the screen print i, “ squared is “, i*i - window = createWindow() // create a new window on the display } print “that’s all folks!” - destroyWindow(window) // destroy it } leave

Structured control fow: procedures 43 MATLAB vs C 44

A way to create a software module or component is to wrap a sequence of statements in C version MATLAB version a procedure: #include procedure print_n_squared_numbers(n) { Calling the procedure from two other procedures: function print_n_squared_numbers(n) i ← 0 void print_n_squared_numbers(int n) i = 0 ; while (i < n) procedure do_something () { { while i < n { print “first four squares” int i = 0 ; print_n_squared_numbers(4) ; i ← i + 1 while (i < n) { i = i + 1 ; print i, “ squared is “, i*i ; } i = i + 1 ; } fprintf('%d squared is %d\n',i,i*i); procedure do_something_more () { print “that’s all folks!” printf("%d squared is %d\n",i,i*i); end print “first ten squares” } } print_n_squared_numbers(10) ; fprintf('that’s all folks!\n'); } printf("that s all folks!\n"); } end A procedure implements a reusable functionality (behaviour) hiding the internal implementation details. /* the program entry point % Example usage is called main */ int main(int argc, char **argv) print_n_squared_numbers(10) ; Examples of procedures: { print_n_squared_numbers(10) ; Both MATLAB and C are imperative and procedural. y = tan(x) // compute the tangent of a number return 0 ;

printf(“a string”) // display a string on the screen } MATLAB is interpreted, C/C++ is compiled. window = createWindow() // create a new window on the display destroyWindow(window) // destroy it MATLAB is dynamically typed, C/C++ is statically typed. Lecture 2 outline 45 Statements and the state 46

Control fow Practical notes Program state = program counter + content of memory Imperative languages Clean vs obfuscated code Executing a statement changes the state Goto (considered harmful) Avoid cut & paste Updates the program counter Blocks, conditionals, loops Almost always modifes the content of the memory as well State

Variable Example

Data types 50 * x * x + y ; // result is not remembered, no effect z = 50 * x * x + y ; // write the result to the variable z Static vs dynamic typing

Compiled vs interpreted language If a statement does not alter the content of the memory, it has essentially no effect.

MATLAB functions, subfunctions, toolboxes Exceptions: C/C++ declaration, defnition, objective, and wasting time executable fles in MATLAB, displaying a value on the screen other side effects

Memory and variables 47 Data types 48

A computer’s memory is just a large array of Data type Examples 001F034C: 66 72 65 64 words (strings of bits): A (data) type specifes Most programming languages support several The meaning depends on how it is interpreted: primitive data types: Addresses 32 bit integer 1718773092 a set of possible values 00000000 4 characters “fred” e.g. integers in the range −2,147,483,648 to 2,147,483,647, MATLAB: numeric arrays (characters, integer, 00000001 foating point 1.68302e+022 character strings single and double precision), logical arrays, cell ... operations involving variables of that type arrays, ... milesToGo: e.g. create, assign, sum, multiply, divide, print, convert to 001F034C Write a 32-bit integer to memory in C foat, ... C: various integer types, character types, foating point types, arrays, strings, ... *((int *) 0x001F034C) = 1718773092 ; MEMORY Data type representation (4GB) A data type representation specifes how values are Structured version: using a variable actually stored in memory milesToGo = 1718773092 ; e.g. integer is usually represented as a string of 32 bits, or four consecutive bytes, in binary notation

This is a another example of abstraction FFFFFFFF To a frst approximation, a variable is just a name given to a You never have to think how MATLAB represents numbers to use memory address (and a data type) them! Dynamic data typing 49 Dynamic data typing 50

Consider the following MATLAB fragment x: 5 Consider the following MATLAB fragment x: 5

% x, y, and z are stored as 64-bit float y: 10 % x, y, and z are stored as 64-bit float y: 10 x = 5 ; x = 5 ; y = 10 ; z: 50 y = 10 ; z: 50 z = x * y ; z = x * y ;

Each variable stores both:

the address of the data in memory and

the type of the data

Use the MATLAB command whos to get a list of variables and their types MEMORY MEMORY (classes):

Name Size Bytes Class Attributes

x 1x1 8 double y 1x1 8 double z 1x1 8 double

Dynamic data typing 51 Dynamic data typing 52

Consider the following MATLAB fragment x: 5 Consider the following MATLAB fragment 5

% x, y, and z are stored as 64-bit float y: 10 % x, y, and z are stored as 64-bit float 10 x = 5 ; x = 5 ; y = 10 ; z: 50 z: 50 y = 10 ; z = x * y ; z = x * y ; x: “Oxford U” % now reassign x and z x: “Oxford U” % now reassign x and z x = 'Oxford U' ; x = 'Oxford U' ; z = x * y ; z = x * y ; z: [1 x 8 array] z: [1 x 8 array] Each variable stores both:

Now variable x refers to the address of the data in memory and

a new memory block and MEMORY the type of the data MEMORY

a different data type Use the MATLAB command who to get a list of variables and their types (classes):

In MATLAB, the data type associated to a variable can be determined only at run- Name Size Bytes Class Attributes time, i.e. when the program is executed x 1x8 16 char y 1x1 8 double z 1x8 64 double This is called dynamic typing Dynamic data typing 53 Overhead in dynamic data typing 54

Consider the following MATLAB fragment 5 In dynamic data typing each a variable is contains to both the actual data record as well as metadata describing its type. x: Type = string % x, y, and z are stored as 64-bit float 10 “Oxford U” x = 5 ; While usually this is not a problem, in some cases the overhead may be y = 10 ; z: 50 signifcant. z = x * y ; y: Type = array

x: “Oxford U” 3.14 % now reassign x and z Example x = 'Oxford U' ; z = x * y ; z: [1 x 8 array] MATLAB uses about 80 bytes to store the data type descriptor: What is z? Storing one array of 1 million numbers uses

80 + 8 * 1e6 bytes (~ 7.6 MB, effciency ~ 100%) Name Size Bytes Class Attributes MEMORY MEMORY … Storing 1 million arrays of 1 number each uses z 1x8 64 double (80 + 8) * 1e6 bytes (~ 83 MB, effciency ~ 9%) Two operations are involved in calculating z: promotion: the string x is reinterpreted as an array of 1⨉8 64-bit foats vector-matrix mult.: the scalar y is multiplied by this array

Static data typing and variable declaration 55 Lecture 2 outline 56

In C variables must be declared before they can be Statically-typed variables Control fow Practical notes used have a well defned type before the program is run Imperative languages Clean vs obfuscated code

A declaration assigns statically a data type to a incorporate constraints on how a variable can be Goto (considered harmful) Avoid cut & paste variable used Blocks, conditionals, loops

Examples Static typing allows for State int anInteger ; /* usually 32 bits length */ smaller run-time overhead in handling variables Variable unsigned int anUnsignedInteger ; char aCharacter ; better error checking before the program is run Data types double aFloat ; int32_t a32BitInteger ; /* C99 and C++ */ Static vs dynamic typing int16_t a16BitInteger ; Compiled vs interpreted language

MATLAB functions, subfunctions, toolboxes

C/C++ declaration, defnition, objective, and executable fles Compiled vs interpreted languages 57 MATLAB: program organisation 58

MATLAB is an interpreted language MATLAB procedures are called functions

a MATLAB program is executed by an interpreter A MATLAB function is stored in a homonymous fle with a .m extension the interpreter converts the text fle in instructions on the fy

signifcant overhead at run-time fle: print_ten_squared_numbers.m fle: my_script.m C and C++ are compiled languages function print_ten_squared_numbers(n) % demonstrates the use of a i = 0 ; function while i < n print_ten_squared_numbers() a compiler reads and transforms the text fle into an executable before it can be executed i = i + 1 ; fprintf('%d squared is %d\n',i,i*i); no overhead at run-time end fprintf('that’s all folks!\n'); A .m fle can also contain a script. the can spot some programming error before the program runs (e.g. type errors) end A script does not defne a function. It is more similar to cutting & pasting code in to Example. Compiling the following fragment generates an error because the multiplication of an the MATLAB prompt. integer and a pointer (see later) is not defned:

int * aPointerToInt = 0 ; int anInt = 10 ; int anotherInt = anInt * aPointerToInt ;

error-pointer-by-integer.c:7: error: invalid operands to binary * (have ‘int *’ and ‘int’)

MATLAB: program organisation 59 MATLAB: grouping related functions 60

MATLAB procedures are called functions Put related functions into a given directory:

A MATLAB function is stored in a homonymous fle with a .m extension Directory: drawing/ drawAnArc.m drawAnArrow.m drawACircle.m fle: print_ten_squared_numbers.m fle: my_script.m Directory: math/ function print_ten_squared_numbers(n) % demonstrates the use of a tan.m i = 0 ; function atan.m while i < n print_ten_squared_numbers() sqrt.m i = i + 1 ; Directory: pde/ fprintf('%d squared is %d\n',i,i*i); euler.m end thats_all() ; end An .m fle defnes a function that can be Use MATLAB addpath() function to include directories in MATLAB search path accessed by functions and scripts in other function thats_all() and make these functions visible. fles. fprintf('that’s all folks!\n'); end A .m fle can contain also any number of MATLAB Toolboxes are just collections of functions organised in directories. local functions.

Local functions are only visible from the fle 1 1Advanced techniques allow to pass references to local functions, so where they are defned. that they can be called from other fles. C/C++: program organisation 61 C/C++: compiling a program 62

C/C++ explicitly support the notion of modules. Run the compiler cc Run the linker, usually also implemented in cc Each .c fle is compiled into an object fle .o The .o fles are merged to produce an executable fle A module has two parts: This is the binary translation of a module the declaration (.h), defning the interface of the functions i.e. the function names and the types of the input and output arguments usefulstuff module myprogram module the defnition (.c), containing the actual implementation of the functions declaration fle: declaration fle: usefulstuff.h N/A fle: usefulstuff.h fle: usefulstuff.c void print_n_squared_numbers(int n) ; #include "usefulstuff.h" defnition fle: defnition fle: int get_an_awesome_number() ; #include usefulstuff.c myprogram.c

void print_n_squared_numbers(int n) { compile compile int i = 0 ; while (i < n) { object fle: object fle: i = i + 1 ; usefulstuff.o myprogram.o printf("%d squared is %d\n",i,i*i); fle: myprogram.c } printf("that s all folks!\n"); #include "usefulstuff.h" link } int main(int argc, char** argv) {

print_n_squared_numbers(10) ; executable fle: int get_an_awesome_number() { } return 42 ; myprogram }

C/C++: compiling a program 63 More on declaring, defning, and calling functions 64

Declaration of the Run the compiler cc Run the linker, usually also implemented in cc void print_n_squared_numbers(int n) ; Each .c fle is compiled into an object fle .o The .o fles are merged to produce an executable fle This is the binary translation of a module Defnition of the function implementation Invocation of the function void print_n_squared_numbers(int n) print_n_squared_numbers(10) ; morestuff module usefulstuff module myprogram module { /* do something */ declaration fle: declaration fle: declaration fle: } morestuff.h usefulstuff.h N/A formal parameter actual parameter defnition fle: defnition fle: defnition fle: Declaring a function morestuff.c usefulstuff.c myprogram.c defnes its prototype = name of the functions + type of input/output parameters compile compile compile thus specifes the interface object fle: object fle: object fle: the compiler needs to know only the prototype to call the function morestuff.o usefulstuff.o myprogram.o Defning a function specifes its implementation. In the defnition, the parameters are link said to be formal as their value is not determined until the function is called. executable fle: myprogram Calling a function starts executing the function body. Now the parameters are said to be actual because they are assigned a value. Return value(s) 65 Lecture 2 outline 66

In C functions have a single output value, assigned by the return statement. Control fow Practical notes Imperative languages Clean vs obfuscated code Defnition of the function Invocation of the function int get_awesome_number() int x ; Goto (considered harmful) Avoid cut & paste { x = get_awesome_number() ; return 42 ; /* x is now 42 */ Blocks, conditionals, loops } State In MATLAB functions have an arbitrary number of output values. Variable They get their value from homonymous variables. Data types Defnition of the function Invocation of the function Static vs dynamic typing function [a,b,c] = get_many_numbers() % x gets 42, y 3.14, and z +inf a = 42 ; [x,y,z] = get_many_numbers(); b = 3.14 ; Compiled vs interpreted language c = +inf ; % get eigenvectors and eigenvalues return ; [V, D] = eig(A) ; MATLAB functions, subfunctions, toolboxes end C/C++ declaration, defnition, objective, and executable fles

Some practical notes 67 Obfuscated code (don’t) 68

The look is important Here is a valid C program (http://en.wikipedia.org/wiki/Obfuscation_(software))

Use meaningful variable names char M[3],A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf("%d",&C); Use comments to supplement the meaning -- E; J[ E] =T [E ]= E) printf("._"); for(;(A-=Z=!Z) || (printf("\n|" Indent code for each block/loop ) , A = 39 ,C -- ) ; Z || printf (M ))M[Z]=Z[A-(E =A[J-Z])&&!C Avoid cutting & pasting code & A == T[ A] |6<<27

Duplicating code is a recipe to disaster because when you need to change the code, you need to change all the copies!

Top-down vs bottom-up Can you fgure out what this program do?

Design top-down

Code bottom-up or top-down, or a combination Lecture 3 outline 70

Scope

Local and global variables Modularisation and side effects

Dynamic memory and pointers

B16 Software Engineering Memory organisation, dynamically allocating memory in the heap Structured Programming Pointers, dereferencing, referencing, references Lecture 3: Scope, dynamic memory, pointers, references, recursion, Passing by values or reference, side-effects stack, compounds Recursion Dr Andrea Vedaldi Procedures that call themselves 4 lectures, Hilary Term Recursion and local variables The stack and stack frames For lecture notes, tutorial sheets, and updates see http://www.robots.ox.ac.uk/~vedaldi/teach.html Passing functions as parameters

Compound data types: structures

Lecture 3 outline 71 The scope of a variable 72

Scope The scope of a variable is the context in which the variable can be used.

Local and global variables The scope of a local variable is the function where the variable is defned. Modularisation and side effects Usually, local variables are created when the function is entered, and destroyed when it is left. Dynamic memory and pointers

Memory organisation, dynamically allocating memory in the heap Global variables can be accessed by all functions. They are created when the Pointers, dereferencing, referencing, references program starts, and destroyed when it ends. Passing by values or reference, side-effects MATLAB example Recursion function x = myFunction(n) m here denotes a variable local to myFunction Procedures that call themselves m = 10 ; Recursion and local variables x = m * n ; end m here denotes a variable local to the test script The stack and stack frames % test script The two variables are distinct and accessible Passing functions as parameters myFunction(5) % 50 only from the respective context. m = 20 ; Compound data types: structures myFunction(5) % still 50! MATLAB global variables 73 C/C++ global variables 74

fle: myfunction.c MATLAB strongly discourages the use of global variables. Declaring a global variable #include "myfunction.h" #include When they are really needed, they must be declared by the global operator. A variable is implicitly global if declared outside of int m ; /* global */ any function. int myFunction(int n) { function x = myFunction(n) Question: which part of the program is responsible return m * n ; global m ; global lets the variable m refer to a data record for initialising m ? } x = m * n ; stored in the global memory area. end Now the two variables are the same. Scope % test script fle: myfunction.h global m ; A global variable defned in a module is visible only /* global decleration */ m = 10 ; to the functions of that module. export int m ; myFunction(5) % 50 m = 20 ; int myFunction(int n) ; myFunction(5) % 100 To make the variable visible from other modules it must be declared in the .h fle, exactly like functions. You can always use MATLAB whos command to check your variables: Furthermore, the export keyword must be used. Name Size Bytes Class Attributes m 1x1 8 double global

Procedure as functions 75 Side-effects 76

Procedures are often intended as mathematical A procedure is useful only if its behaviour is easy to predict and understand. functions: input output This is particularly desirable in software librares: Then only effect of calling a procedure is to function parameters values compute and return an output value. e.g. C/C++ math.h (tan, cos, ...), MATLAB toolboxes The output value depends only on the value of the input parameters. In practice, many procedures have useful side-effects:

This is sometimes called having the function reading a fle, displaying a message, generating an error, ... semantics allocating and returning a new memory block

Side-effects break the function-like semantics reading / writing a global variable input output function e.g. a global variable is an implicit input/ parameters values operating on data in the caller scope by means of references (see later) output parameter ...

hidden input hidden output global variable A clean interface design (and documentation) is essential to control these side- effects. 7 Lecture 3 outline 77 Memory organisation 78

Scope A structured program organises the memory into four areas: address space model

Local and global variables CODE The code area stores the program instructions. Modularisation and side effects

The OS prevents the program from changing it. DATA Dynamic memory and pointers

Memory organisation, dynamically allocating memory in the heap The data (or heap) area contains dynamically allocated records. Pointers, dereferencing, referencing, references Implicit in MATLAB, using malloc() in C. Passing by values or reference, side-effects FREE It grows towards the bottom as more memory is allocated.

Recursion The stack area is used to handle recursive procedure calls and local variables. Procedures that call themselves STACK Recursion and local variables The free area is memory not yet assigned to a particular purpose. The stack and stack frames

Passing functions as parameters

Compound data types: structures

C/C++ local variables, trace for

Dynamic memory allocation (in MATLAB) 79 Dynamic memory allocation (in C) 80

In MATLAB dynamic memory allocation is implicit. In C/C++ dynamic memory allocation is explicit.

% allocate 80,000 bytes to store an array of 10,000 double In C a new memory block is obtained by calling the malloc() function. x = zeros(100,100) ; Allocated memory must be disposed by calling free; otherwise the memory is leaked.

memory allocate block use block The output of malloc is the address of the allocated memory block.

CODE CODE CODE An address is stored into a variable of type pointer to T.

/* declare a pointer x to a double */ DATA DATA double * x ; DATA /* allocate a double (eight bytes) and store the address in x */ } 80,000 bytes x x = malloc(8) ;

FREE FREE /* better: use sizeof to get the required size */ FREE x = malloc(sizeof(double)) ;

/* write to the memory pointed by x */ *x = 3.14 ;

STACK STACK STACK /* free the memory once done */ free(x) ; Pointers and dereferencing 81 Null pointers 82

A pointer to T is a variable containing the address to a record of type By convention, the memory address 0 (0x0000000) is reserved. T. Its type is denoted T * CODE A null pointer is a pointer with value 0 (denoted by NULL). /* Declare and assign a pointer to double */ double *x ; Null pointers are commonly used to represents particular states. For example: x = malloc(sizeof(double)) ; malloc() returns NULL if the requested memory block cannot be allocated /* Dereference x to access the pointed memory */ DATA because the memory is exhausted (an error condition). *x = 3.14 ; in a linked list a NULL pointer may be used to denote the end of the list (see /* This changes the pointer, not the malloc Lecture 4). pointed data. */ output block allocated x = 42 ; 001C1D2F 3.14 by malloc Note that writing to a null pointer (or as a matter of fact to any address not /* This crashes the program corresponding to a properly allocated memory block) crashes the program (or because x does not contain FREE worse!). For example: the address of a valid dereferencing * memory block anymore */ int *myPointer = NULL ; free(x) local variable *myPointer = 42 ; /* crash */ x 001C1D2F The operator * is called dereferencing. It allows accessing the value pointed by the STACK pointer.

Referencing 83 Pointers as input arguments 84

Pointers can be copied: By using pointers, a procedure can access data that belongs to the caller. unsigned int * x = malloc(sizeof(unsigned int)) ; Example. A procedure that swaps the value of two variables: unsigned int * y = x ;

Pointers can also point to a local variable. void swap(int a, int b) { void swap(int *a, int *b) { int temp = a ; int temp = *a ; unsigned int a = 42 ; a = b ; *a = *b ; unsigned int * x = &a ; b = temp ; *b = temp ; } } a (10C4A9D1) 42 The operator & is called referencing. It returns the address of a variable. /* example usage */ /* example usage */ int x = 10 ; int x = 10 ; & * int y = 20 ; int y = 20 ; Now the same data record can be accessed swap(x,y) ; swap(&x,&y) ; by using the variable or the pointer: /* x = 10, y = 20 */ /* x = 20, y = 10 */ x 10C4A9D1 The function has no effect because calling it a and b are copies By passing pointers, the function can access the variables x and /* these three instructions have the same effect */ of x and y. x and y remain unaffected. y in the caller and can swap them. a = 56 ; *x = 56 ; STACK *(&a) = 56 ; Pass by value vs reference 85 Pointers and references: why? 86

C++ (but not C) can pass parameters by reference instead of value Uses. Pointers/references are powerful: Think of references as implicit pointers they allow a procedure to access data of the caller (e.g. swap()) they allow to pass data to a procedure avoiding copying (faster) void swap(int *a, int *b) { void swap(int &a, int &b) { E.g. think of passing a 1000-dimensional vector int temp = *a ; int temp = a ; *a = *b ; a = b ; they allow to construct interlinked data structures *b = temp ; b = temp ; E.g. lists, trees, containers in general } } Caveats. Pointers/references allow side effects: /* example usage */ /* example usage */ int x = 10 ; int x = 10 ; they make a procedure behaviour harder to understand int y = 20 ; int y = 20 ; they make programming errors harder to fnd swap(&x,&y) ; swap(x,y) ; /* x = 20, y = 10 */ /* x = 20, y = 10 */ An error inside a procedure may affect the caller in unpredictable ways

Using pointers (C or C++). Using references (C++ only) In MATLAB there are (almost) no references nor pointers it is only possible to assign or copy the value of variables Under the hood, however, all data are passed by reference. The pass-by-value semantic is ensured by sharing copies as much as possible.

Lecture 3 outline 87 Recursion 88

Scope Recursion is one of the most powerful ideas in

Local and global variables Algorithmic techniques such as divide & conquer map directly to recursion Many data structures are recursive (e.g. trees) Modularisation and side effects Procedures can also be called recursively Dynamic memory and pointers Example: computing the factorial of n Memory organisation, dynamically allocating memory in the heap Pointers, dereferencing, referencing, references Mathematical defnition Corresponding C function Passing by values or reference, side-effects 1, n = 1, int fact(int n) fact(n) = { Recursion {n fact(n − 1), n > 1. int m ; if (n == 1) return 1 ; Procedures that call themselves m = n * fact(n - 1) ; Recursion and local variables return m ; The stack and stack frames }

Passing functions as parameters

Compound data types: structures Recursion and local variables 89 Recursion and the stack 90

The stack is a memory area used to handle (recursive) calls to procedures.

A stack frame is pushed on top of the stack when a procedure is entered, and Each time a function is re-entered a new copy int fact(int n) popped when it is left. It contains: { of the local variables m and n is created. int m ; a return location (PC) to enable resuming the caller upon completion of the procedure if (n == 1) return 1 ; the input and output parameters m = n * fact(n - 1) ; stack frame the local variables return m ; } local variable m CODE ... local variable 1 DATA The local variables constitute the “private” state of the function. Each execution has return location its own state. parameter k ... In this manner, recursive calls do not interfere with each other. FREE parameter 1 And memory for local variable is allocated only when needed. return value n STACK ... STACK return value 1

Example of recursive calls 91 Example of recursive calls 92

int fact(int n) fact(1) m int fact(int n) fact(1) m { res { res int m ; n 1 int m ; n 1 if (n == 1) return 1 ; return value if (n == 1) return 1 ; return value 1 res: m = n * fact(n - 1) ; res: m = n * fact(n - 1) ; m m return m ; fact(2) return m ; fact(2) 2 } res } res n 2 n 2 /* example usage */ return value /* example usage */ return value 2 x = fact(5) ; fact(3) m x = fact(5) ; fact(3) m 6 res res lab: printf(“fact(5) is %d”, x) ; lab: printf(“fact(5) is %d”, x) ; n 3 n 3 return value return value 6 fact(4) m fact(4) m 24 res res n 4 n 4 return value return value 24 fact(5) local variable m fact(5) local variable m 120 return address lab return address lab input param. n 5 input param. n 5 return value return value 120 Recursion: a more advanced example 93 Lecture 3 outline 94

Multiple recursion. A procedure can call itself multiple times. Scope

This examples paints the image region of colour old_colour containing the pixel Local and global variables (x, y) with new_colour. Modularisation and side effects const int SIZE = 256 ; int im[SIZE][SIZE] ; Dynamic memory and pointers void fill (int x, int y, int old_colour, int new_colour) (x,y-1) Memory organisation, dynamically allocating memory in the heap { Pointers, dereferencing, referencing, references if (x >= 0 && x < SIZE && y >=0 && y < SIZE) { if (im[y][x] == old_colour) { Passing by values or reference, side-effects im[y][x] = new_colour; (x-1,y) (x,y) (x+1,y) fill (x-1,y,old_colour,new_colour); Recursion fill (x+1,y,old_colour,new_colour); fill (x,y-1,old_colour,new_colour); Procedures that call themselves fill (x,y+1,old_colour,new_colour); (x,y+1) } Recursion and local variables } return; The stack and stack frames } Passing functions as parameters

Compound data types: structures

Passing functions as parameters 95 Passing functions as parameters (in MATLAB) 96

A function can be passed as a parameter of another function to change its behaviour. More in general, there is one ODE problem for each function F:

Example. Consider implementing an algorithm for the numerical solution of a frst y·(t) = F(y(t)), t ≥ 0 order ODE The Euler solver needs to be modifed: y·(t) = − y(t), t ≥ 0 y(0) = y0 The Euler method chooses a step size h and an initial condition y0 and then uses the iteration: y(hn) = hF(y(h(n − 1))) + y(h(n − 1)), n = 1,2,…, N − 1 To avoid writing a new program for each F pass the latter as a parameter: y(0) = y0 % Example usage function y = solve(F, y0, h, N) y(hn) = − hy(h(n − 1)) + y(h(n − 1)), n = 1,2,…, N − 1 function myF(y) y = zeros(1, N) ; ydot = -y ; y(1) = y0 ; MATLAB implementation: end for n = 2:N ydot = F(y(n-1)) ; function y = solve(y0, h, N) y(n) = y(n-1) + h * ydot ; solve(@myF, Y0, h, N) ; y = zeros(1, N) ; end y(1) = y0 ; end for n = 2:N ydot = - y(n-1) ; y(n) = y(n-1) + h * ydot ; The @ operator returns a handle to a function. A handle is similar to a pointer. end end Passing functions as parameters (in C) 97 Lecture 3 outline 98

In C one uses a pointer to a function: Scope

double myF(double y) { Local and global variables return -y ; Modularisation and side effects }

/* this is not equivalent to MATLAB version as it integrates only one Dynamic memory and pointers step */ double solve(double func(double), double y, double h) Memory organisation, dynamically allocating memory in the heap { Pointers, dereferencing, referencing, references double ydot = func(y) ; return y + h * ydot ; Passing by values or reference, side-effects } This declares a parameter func.

int main() The type of func is “pointer to a function Recursion { that takes a double as input and returns a Procedures that call themselves int n ; double as output”. double y[200], h = 0.05; Recursion and local variables y[0] = 1.0; for (n = 1 ; n < 200 ; n++) { The stack and stack frames y[i] = solve(myF, y[n-1], h); printf(“Y[%d] = %f\n”, n, y[n]); } Passing functions as parameters } Compound data types: structures

Custom and structured data types 99 MATLAB structures 100

All languages support natively a number of primitive types A structure is a compound data type which comprises related data into a single entity. C/C++: char, int, float, double, ... MATLAB: arrays of char, int16,int32, single, double, ... In MATLAB, a structure is defned by assigning a variable using the . operator.

Most languages support defning novel data types. Often these are compound Example: create and assigns a new variable person: types combining primitive types. C: array and structures (struct) person.name = 'Isaac' ; person.surname = 'Asimov' ; C++: array, structures (struct), and classes (class) person.age = 66 ; MATLAB: cell arrays { }, structures, and classes (class) person.occupation = 'writer' ;

Structures can be used to group related information together into a single data The variable person is a structure with the following felds: name, surname, age record. and occupation.

Classes add a behavior to structures in term of a custom set of operations that can Structures can contain other structures, recursively: be applied to the data (see the next lecture series). person.address.city = 'New York' ; person.address.zipCode = '12345' ; C/C++ structures: struct Example: VTOL state

In C, each new type of structure must be declared before a corresponding variable VTOL (Vertical Take-Off and Land) VTOL state: can be defned and assigned: height, velocity, mass (numbers) velocity trust Declaration Defnition and assignment landed (bool) struct Complex_ { struct Complex_ c ; gravity height Use a single structure to store all double re ; c.re = 1.0 ; double im ; c.im = 0.0 ; numbers } ; data encapsulation abstraction typedef can be used as a shorthand: Example

struct Complex_ { double re ; typedef struct double im ; Complex c ; { } ; c.re = 1.0 ; double position ; typedef struct Complex_ c.im = 0.0 ; double velocity ; Complex ; double mass ; bool landed ; } VTOLState ;

Using structures in C/C++ 103

Creating a structure

% As a local variable % Dynamically VTOLState state ; VTOLState * statePtr ; state.position = 10 ; statePtr = malloc(sizeof(VLTOState)); state.velocity = 5 ; statePtr->position = 10 ; state.mass = 1000 ; statePtr->velocity = 5 ; state.landed = false ; statePtr->mass = 1000 ; statePtr->landed = false ; B16 Software Engineering

Note: x-> combines dereferencing and structure access. It is the same as (*x). Structured Programming Lecture 4: Programs = Algorithms + Data structures Passing a structure to a function

% By value % By a pointer Dr Andrea Vedaldi double getThrust(VTOLState double getThrust(VTOLState 4 lectures, Hilary Term state) ; *statePtr) ;

double t = getThrust(state); double t = getThrust(statePtr) ; For lecture notes, tutorial sheets, and updates see http://www.robots.ox.ac.uk/~vedaldi/teach.html double t = getThrust(&state) ; Lecture 4 outline 105 Lecture 4 outline 106

Arrays Linked list Arrays Linked list

In MATLAB and C Search, insertion, deletion In MATLAB and C Search, insertion, deletion

Pointer arithmetic Pointer arithmetic Trees Trees Binary search trees Sorting Binary search trees Sorting

The sorting problem The sorting problem Graphs Graphs Insertion sort Insertion sort Minimum spanning tree Minimum spanning tree Algorithmic complexity Algorithmic complexity

Divide & conquer Divide & conquer

Solving problems recursively Solving problems recursively

Merge sort Merge sort

Bisection root fnding Bisection root fnding

Arrays 107 Array representation in C 108

An array is a data structure containing a numbered (indexed) collection of items of a single data type. In C an array is represented as a sequence of records at consecutive memory addresses. In MATLAB arrays are primitive types. /* array of five doubles */ A: element [0] In C, arrays are compound types. Furthermore, C arrays are much more limited than MATLAB’s. double A [5] ; element [1] fve consecutive blocks of size

element [2] sizeof(double) /* get a pointer to element [3] /* Define, initlaise, and access an array of three integers in C */ the third element */ each int a[3] = {10, 20, 30} ; double * pt = &A[2] ; element [4] } int sum = a[0] + a[1] + a[2];

/* Arrays of custom data types are supported too */ VTOLState states[100]; for (t = 1 ; t < 100 ; t++) { Two (and more) dimensional arrays A: element [0][0] states[t].position = states[t-1].position + states[t-1].velocity + 0.5*g; element [0][1] states[t].velocity = states[t-1].velocity + g are simply arrays of arrays. element [0][2] – getThrust(states[t-1], burnRate) / states[t-1].mass; element [0][3] states[t].mass = states[t-1].mass – burnRate * escapeVelocity ; /* A 2x5 array */ element [0][4] 10 elements in a 2 × 5 array } double A[2][5] ; element [1][0] (row-major order) element [1][1] element [1][2] element [1][3] } element [1][4] Static vs dynamic arrays in C 109 Lecture 4 outline 110

This C statement defnes an array a of fve integers Arrays Linked list int A[5] ; In MATLAB and C Search, insertion, deletion

The size is static because it is specifed before the program is compiled. What if the Pointer arithmetic Trees size needs to be adjusted at run-time? Binary search trees Sorting The solution is to allocate dynamically the required memory: The sorting problem Graphs int arraySize = 5 ; int *A = malloc(sizeof(int) * arraySize) ; Insertion sort Minimum spanning tree Algorithmic complexity Note that A is declared as a pointer to an int, not as an array. However, the array access operator [ ] can still be used. E.g. A[1] = 2 Divide & conquer Pointer math: A[n] is the same as *(A + n) Solving problems recursively E.g. A[0] is the same as dereferencing the pointer *A Merge sort Under the hood, the address stored by a is incremented by n * sizeof(int) to account for the size of the pointed elements Bisection root fnding

Sorting 111 Insertion 112

Problem: sort an array of numbers in non-decreasing order. The insertion procedure extends a sorted array by inserting a new element into it:

There are many algorithms to do this: bubble sort, merge sort, quick sort, ... % Input: array A of size ≥ n such that A[1] <= ... <= A[n-1] % Output: permuted array such that A[1] <= ... <= A[n-1] <= A[n] We will consider three aspects: function A = insert(A, n) i = n Describing the algorithm formally. % the invariant is true here I.e., start with the Proving its correctness. while i > 1 and A[i-1] > A[i] frst n-1 elements swap(A[i-1], A[i]) sorted, end with n Evaluating its effciency. i = i - 1 % the invariant is true here We start from the insertion sort algorithm end end Input: an array of numbers. Output: the numbers sorted in non-decreasing order. A loop invariant is a property that is valid before each loop execution starts. It is Algorithm: initially the sorted output array is empty. At each step, remove an usually proved by induction. For insert() the loop invariant is: element from the input array and insert it into the output array at the right place. A[1] ≤ A[2] ≤ ... ≤ A[i-1] ≤ A[i+1] ≤ ... ≤ A[n] and A[i] ≤ A[i+1] See http://www.sorting-algorithms.com/ for illustrations Insertion: example 113 Insertion sort 114

n = 9; means ≤ % Input: an array A with n elements % Output: array A such that A[1] <= A[2] <= ... <= A[n] i = 9 function A = insertionSort(A, n) 4 7 10 12 14 15 19 20 13 3 31 i = 1 % the invariant is true here (A) i = 8 while i < n

4 7 10 12 14 15 19 13 20 3 31 i = i + 1 A = insert(A, i) % the invariant is true here (B) i = 7 end 4 7 10 12 14 15 13 19 20 3 31 end

i = 6 Loop invariant: the frst i elements are sorted: A[1] ≤ A[2] ≤ ... ≤ A[i]

4 7 10 12 14 13 15 19 20 3 31 Proof by induction base case (A) i = 1: A[1] is sorted i = 5

4 7 10 12 13 14 15 19 20 3 31 inductive step (B) i ≥ 1: at iteration i the insert() procedure sorts A[1], ..., A[i] provided that A[1], ..., A[i-1] are sorted. The latter is given by the invariant at iteration i - 1.

Insertion sort: example 115 Algorithmic complexity 116

Task: sort 5 4 1 2 3 The time complexity of an algorithm is the maximum number of elementary operations f(n)required to process an input of size n. Its space complexity is the maximum amount of memory required. insert(2) insert(5)

5 4 1 2 3 1 2 4 5 3 It often suffces to determine the order of the complexity g(n): linear n, squared n2, polynomial nk, logarithmic log(n), exponential exp(n), ... We say that the order of 1 2 4 3 5 is , and we write , if: insert(3) f(n) g(n) f(n) = O(g(n))

4 5 1 2 3 1 2 3 4 5 ∃a, n0 : ∀n ≥ n0 : f(n) ≤ ag(n) Example: insertion sort 4 1 5 2 3 1 2 3 4 5 The size of the input is the number n of elements to sort. The space complexity is as the algorithm stores only the elements and a insert(4) O(n) constant number of local variables. 1 4 5 2 3 The time complexity of insert() is O(m) as the while loop is executed at most m times. The time complexity of insertionSort() is O(n2) because n 1 4 2 5 3 (n + 1)n 2 ∑ m = = O(n ) m=1 2 1 2 4 5 3 Lecture 4 outline 117 Divide and conquer 118

Arrays Linked list Divide and conquer is a recursive strategy applicable to the solution of a wide variety of problems. In MATLAB and C Search, insertion, deletion The idea is to split each problem instance into two or more smaller parts, solve Pointer arithmetic Trees those, and recombine the results.

Binary search trees % Divide and conquer pseudocode Sorting solution = solve(problem) If problem is easy, compute solution The sorting problem Graphs Else Insertion sort Minimum spanning tree Subdivide problem into subproblem1, subproblem2, ... sol1 = solve(subproblem1), sol2 = solve(subproblem2), ... Algorithmic complexity Get solution by combining sol1, sol2, ...

Note the recursive call. Divide and conquer is naturally implemented as a recursive Divide & conquer procedure. Solving problems recursively Some of the best known and most famous (and useful) algorithms are of this form, Merge sort notably quicksort and the Fast Fourier Transform (FFT). Bisection root fnding

Complexity of divide and conquer 119 Merge sort 120

Assume that the cost of splitting and merging a subproblem of size m is O(m) The merge sort algorithm sorts an array A by divide and conquer: (linear) and that the cost of solving a subproblem of size is . m = 1 O(1) Split: divide A into two halves A1 and A2.

split & merge solve total Merge: iteratively remove from the beginning of the sorted A1 and A2 the smallest element and append it to A. 1 problem of size 8 1 × 8 0 8 Base case: if A has one element only it is sorted.

2 problems of size 4 2 × 4 0 8 function A = mergeSort(A) function A = merge(A1,A2) n = length(A) i1 = 1, i2 = 1 % nothing to do if one element m1 = length(A1), m2 = length(A1) if n == 1 then return A while i1 <= m1 and i2 <= m2 3 problems of size 2 4 × 2 0 8 % split into half if A1[i1] <= A2[i2] k = floor(n / 2) A[i1+i2-1] = A1[i1], i1 = i1 + 1 A1 = A(1:k) else A2 = A(k+1:end) A[i1+i2-1] = A2[i2], i2 = i2 + 1 8 problems of size 1 0 8 × 1 8 % solve subproblems end A1 = mergeSort(A1) end A2 = mergeSort(A2) while i1 <= m1 Given a problem of size n, at each level O(n) work is done in order to split&merge % merge solutions A[i1+i2-1] = A1[i1], i1 = i1 + 1 or solve subproblems. Since there are levels the total cost is log2(n) return merge(A1, A2) end end while i2 <= m2 A[i1+i2-1] = A2[i2], i2 = i2 + 1 O(n log2 n) end end Merge sort: merging example 121 Insertion vs Merge Sort 122

Task: merge 1 4 6 8 2 3 5 7 Insertion Sort The two sorting algorithms have different complexities: 1 10 Merge Sort insertion: O(n2) 1 4 6 8 2 3 5 7 1 4 6 8 2 3 5 7 merge: O(n log(n)) 0 1 1 2 3 4 5 10 Plotting time vs size in loglog coordinates should give a line of slope: 2 for insertion sort −1 1 4 6 8 2 3 5 7 1 4 6 8 2 3 5 7 10 ~ 1 for merge sort

1 2 3 4 5 6 Time [s] 1 2 This is verifed experimentally in the fgure. −2 10

1 4 6 8 2 3 5 7 1 4 6 8 2 3 5 7 −3 10 1 2 3 1 2 3 4 5 6 7

−4 10 1 4 6 8 2 3 5 7 1 4 6 8 2 3 5 7 1 2 3 4 10 10 10 10 1 2 3 4 1 2 3 4 5 6 7 8 Input size

Root fnding 123 Lecture 4 outline 124

Problem: fnd a root of a non-linear scalar function f(x)) Arrays Linked list i.e. a value of x such that f(x) = 0. In MATLAB and C Search, insertion, deletion Assumption: is a continuous function defned in the interval ; f(x) [a, b] Pointer arithmetic Trees furthermore, f(a) f(b) < 0. Binary search trees The bisection algorithm is a divide and conquer strategy to solve this problem. Sorting function bisect(f, a, b) f(b) The sorting problem Graphs m = (a + b) / 2 if f(m) close to zero then return mu Insertion sort Minimum spanning tree if f(m) * f(a) > 0 return bisect(f, m, b) Algorithmic complexity else return bisect(f, a, m) end Divide & conquer a m b Solving problems recursively Merge sort f(a) f(m) Bisection root fnding ⏟ Linked lists 125 Inserting an element into a linked list 126

A limitation of arrays is that inserting an element into an arbitrary position is O(n). To insert an element into a list, use pointers to create a “bypass” at cost O(1).

This is because existing elements must be shifted (moved in memory) in order to list prev make space for the a one. next next next next next next ∅ NA value value value value value value Linked lists solve this problem by using pointers:

list frst el. last el. next /* Insert an element in a list */ void insert(ListElement *prev, ∅ value next next next next next next ListElement *element) { NA value value value value value value element element->next = prev->next ; prev->next = element ; }

In C a list data type could be defned as follows: Example usage /* Create an empty list */ /* Insert at the beginning of the list /* List element datatype */ The list could be defned as a pointer to its List list ; */ frst element. list->next = NULL ; insert(&list, element) ; typedef struct ListElement_ {

struct ListElement_ *next ; /* Create an element */ /* Insert after element */ double value ; It is customary to use instead a fake list element ListElement *element = insert(element, element2) ; } ListElement ; (which contains such a pointer) to simplify coding malloc(sizeof(ListElement)); functions using the list. element->next= NULL ; /* List datatype */ element->value = 42.0 ; typedef ListElement List ;

Removing an element from a linked list 127 Lecture 4 outline 128

To insert an element into a list, use pointers to create a “bypass” at cost O(1). Arrays Linked list

list frst el. prev removed In MATLAB and C Search, insertion, deletion next next next next next next ∅ NA value value value value value value Pointer arithmetic Trees Binary search trees Sorting /* Remove an element */ ListElement *remove(ListElement *prev) { The sorting problem Graphs ListElement removed = prev->next ;

if (removed != NULL) { Insertion sort Minimum spanning tree prev->next = removed->next ; } Algorithmic complexity Example usage return removed ; } Divide & conquer /* Remove the element after previous */ ListElement * removed ; removed = remove(previous) ; Solving problems recursively

/* Do not forget to release the memory if needed */ Merge sort if (removed != NULL) { free(removed) ; Bisection root fnding } Binary tree 129 Binary search tree 130

Each node in a binary tree has one left child and one right child. A binary search tree is a binary tree such that the value of each node is 8 left right There are no backward links (no cycles). at least as large as the value of its left Example C data type 3, 6, 7 ≤ 8 10, 12 > 8 descendants Similar to a linked list: value 6 10 smaller than the values of its right descendants left right typedef struct Node_ { left right left right struct Node_ *left ; 3 ≤ 6 7 > 6 12 > 10 struct Node_ *right ; Its main purpose is to support the binary search value value double value ; algorithm. 3 7 12 parent } Node ; left right left right left right left right left right left child right child

value value value Depth frst traversal (post-order) left right left right left right This algorithm visits recursively all the nodes in a tree:

function visit(node) if node == NULL then return visit(node.left) visit(node.right) print(node.value) end

Binary search algorithm 131 Balanced binary trees 132

Problem: fnd a node with value x in a binary search tree. internal node A binary tree of height h has at most n = 2h leaves. The binary search algorithm searches for x recursively, using the binary search leaf node tree property to descend only into one branch every time. Proof: function node = binarySearch(node, x) Let n(h) the maximum number of leaves of a binary tree of if node == NULL return NULL height h if node.value == x return node if x > node.value Split it at the root in two subtrees of height h′ ≤ h − 1 return binarySearch(node.right, x) else Then at best n(h) = 2n(h − 1) return binarySearch(node.left, x) end end height = For a given n, the shortest binary tree is balanced: longest path The cost is O(h) where h is the depth of the binary tree. Typically h = O(log n), where n is the number of nodes in the tree. Hence the search cost is O(log n), sub-linear. Compare this with the O(h) cost of searching in an array or a linked list. 4 = 22 Decision tree 133 Sorting algorithms as decision trees 134

Many algorithms can be described as decision trees: at every step one changes An algorithm that sorts an array A based on pairwise comparisons can be the state based on a binary test applied to the input. thought of as a decision tree.

For example, the following algorithm decides whether a person is a father, a A[1] < A[2] mother, or neither. yes no

A[2] < A[3] A[2] < A[3] Gender = male? yes no yes no yes no Person A[3] < A[4] ...... Has children? Has children? yes no yes no yes no A[4] < A[1] Permutation: ... yes no A[1] A[2] A[3] A[4] Father Neither Mother Neither

Permutation: Permutation: A[2] A[3] A[1] A[4] A[4] A[3] A[1] A[2]

Sorting algorithms as decision trees 135 Sorting algorithms as decision trees 136

For a sorting algorithm:

Leaves = possible permutations of the array n n − 1 n n t ≥ log2 n! = log2 n ⋅ (n − 1) ⋅ … ⋅ ⋅ ⋅ … ⋅ 2 ⋅ 1 ≥ log Path from root to leaf = comparisons in a run of the algorithm 2 ( 2 ) 2 2 n n 2 Tree height = maximum number of steps required ≥( 2 )

For an array of length n, there are n! possible permutations

The tree height is at least (achieved by a balanced tree) log2(n!) height = The best possible algorithm is O(n log n) number of Hence the best possible algorithm requires at least Hence merge sort is optimal*! steps in the t = log2(n!) worst case steps in the worst case

output binary test permutation *in the asymptotic worst case sense Lecture 4 outline 137 Graphs 138

Arrays Linked list A (directed) graph is a set of vertices V and edges E ⊂ V × V connecting the vertices. An undirected graph is a graph such that for each edge (u, v) there is an opposite edge In MATLAB and C Search, insertion, deletion (v, u).

Pointer arithmetic Trees % MATLAB representation Binary search trees edges = [1 2 2 3 4 5 5 6 2 3 6 4 5 6 8 7 Sorting 2 3 6 4 5 6 8 7 1 2 2 3 4 5 5 6] ;

The sorting problem Graphs An alternative representation of a graph is the adjacency matrix A. A is a n × n matrix such that if, and only if, . Insertion sort Minimum spanning tree A(u, v) = 1 (u, v) ∈ E 1 Algorithmic complexity A = [0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 2 6 0 1 0 1 0 0 0 0 7 Divide & conquer 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 3 Solving problems recursively 5 0 1 0 0 1 0 1 0 Merge sort 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0] ; 4 8 Bisection root fnding

Minimum spanning tree 139 Concept summary 140

Consider a weighted undirected graph with non-negative weights on the edges: Software engineering processes Data structures and algorithms

1 Specifcation, design & implementation, validation, Complexity and correctness evolution 6 Arrays, lists, trees, graphs 1 5 2 6 Waterfall and extreme programming 10 7 Sorting, searching, numerical problems

2 3 Software engineering tools 5 Exam questions 3 4 1 Abstraction and modularity 5 See tutorial sheet 4 8 Procedures

Variables, data type, scoping A spanning tree is a subset of the edges forming a tree including all the nodes. Dynamic memory allocation A minimum spanning tree (MST) is a spanning tree such that the sum of the edge weights is minimal. Pointers, references

A famous algorithm to compute the MST is explored in the tutorial sheet. Recursion, stack, stack frames

Pointers to functions

Compound data types Acknowledgments 141

Thanks to Samuel Albanie for suggesting numerous improvements