• Linux has come a long way since its humble beginnings in 1991
• Today Linux supports a very wide range of platforms, from embedded systems based on ARM, PowerPC, Intel, and Hitachi microprocessors, to name a few, all the way up to workstations, servers, and clusters
• It also served as a launch pad for the open source movement, and consequently led to great interest from academia and business alike

Babak Kia
Adjunct Professor
Boston University College of Engineering
Email: bkia -at- bu.edu
ENG SC757 - Advanced Microprocessor Design

What is Linux?

• Linux is free, open source software that is fully featured, portable, and extremely versatile
• It runs on everything from PDAs to the largest mainframes
• Unlike traditional proprietary software, Linux is developed by a multitude of developers across the world
• People often (and mistakenly) use the term Linux to refer to one of three disparate concepts:
  • A Linux Distribution
  • A Linux System
  • The Linux Kernel
• Our focus is primarily on the Linux Kernel, and therefore the term Linux refers to the Kernel itself

Why Linux?

• Due to its open source nature, Linux has a high-quality code base
• The Kernel can be very small; it can fit onto a single 1.44 MB floppy disk while still including all the fundamental operating system tasks!
• It is highly portable; it is available for almost every microprocessor system in existence today
• It is highly supported; it draws on the open source community across the globe for both development and support
• It supports a multi-user environment with a built-in capability to concurrently execute applications belonging to 2 or more users
• Supports multi-processor systems
• Well documented. The source code is available!

What is uClinux

• uClinux is not Linux; it is a variant of it which runs on processors that lack a memory management unit (MMU)
• Without memory management, there is no differentiation between user space and kernel space, and therefore all applications run at Privilege Level 0
• Without memory management, all code runs in a flat memory space, and therefore doesn't require a virtual memory subsystem
• Therefore in this configuration all processes have direct access to memory and I/O resources, and device drivers are not strictly necessary

What is a Linux Distribution?

• A Linux distribution is a collection of software components, including the Linux Kernel itself, as well as the GNU toolchain (compiler, linker, etc.), and a number of free and open source programs such as Emacs, X11, FTP, etc.
• There are many companies involved in creating distributions, such as RedHat, SUSE, and Mandriva, and there are many community projects dedicated to creating distributions (or distros) such as Debian and Gentoo Linux
• There are over 300 active Linux distribution projects in existence today


Linux Distribution

• Without distros, a person interested in Linux would have to install everything manually, which requires considerable expertise in the Operating System
• Distros therefore make the process of installing Linux easier. They usually provide both binaries and source, and are segmented into packages, each package providing one component of the system such as fonts, a web browser, etc.
• Some popular Package Management Systems are:
  • RPM: the RPM package manager
  • deb: the Debian package
  • tgz, or tar.gz: a tar archive compressed with gzip, used to distribute simple hand-made packages

The Linux Kernel

• The most important element of Embedded Linux is its core, called the Linux Kernel
• The Linux Kernel is maintained and distributed by Linus Torvalds, who initially wrote the Kernel when he was a student at the University of Helsinki
• Unlike proprietary Operating Systems, its source code is available for anyone to freely use, distribute, or modify
• At the time of writing, the latest released version of the Linux Kernel is version 2.4, though development of the Linux Kernel is of course ongoing and newer versions become available on a regular basis


The Linux Kernel

• Like any Operating System, the Linux Kernel is responsible for managing resources (memory and I/O), contains device drivers, a networking stack, and a file system, and performs other OS tasks
• Linux implements different privilege levels, where a module, which is a Kernel function, runs in kernel space (supervisor mode), and user applications run in user space (user mode)
• Linux can manage both multiple processes and multiple processors (symmetric multiprocessing, or SMP systems). As such, all kernel code is reentrant

GNU

• GNU is a recursive acronym for GNU's Not Unix, and is pronounced guh-noo
• The GNU project was started in 1983 with the goal of creating a UNIX-flavored operating system which was freely distributable
• GNU is not Linux! GNU is used in conjunction with the Linux kernel to form a completely operational Operating System. This GNU/Linux combination (distribution) is often mistakenly called Linux
• Some software developed by the GNU project: Bash (command shell), Emacs (text editor), gzip (data compression), and GNOME (graphical desktop environment)


General Public License

• The GNU General Public License, or GPL as it is otherwise known, is the free software license under which Linux is written and distributed
• The GPL grants the recipient of a computer program the following rights:
  • The freedom to run the program for any purpose
  • The freedom to study how the program works and to modify it
  • The freedom to redistribute copies of the program
  • The freedom to improve the program, and to redistribute the improvements to the public
• The GPL is in contrast to the end-user licenses that plague proprietary software, which rarely grant the end-user any rights

General Public License

• The GPL has been at the center of controversy recently. Opponents of the GPL often call it viral, implying that the license acts as a virus in that once it comes in contact with proprietary code, the proprietary code becomes GPL
• This is an incorrect assessment, as the GPL simply requires all copies of derived work to be GPL licensed
• In another example, in 2003 the SCO Group sued IBM, claiming that the latter had contributed portions of SCO's copyrighted code to the Linux Kernel, and went further by threatening legal action against a number of companies and demanding licensing fees from them
• To date, there is no proof of SCO's claims of the use of copyrighted code


Other Licensing Models

• GPL is not the only licensing model available
• Some licenses such as BSD permit distribution of modified BSD-based code as proprietary software
• The difference between the GPL and BSD licenses is a legal mechanism known as copyleft, invented by Richard Stallman (initiator of the GNU project and founder of the Free Software Foundation)
• Copyleft requires that derivative works of a GPL-licensed application also be covered by the GPL

The Copyleft

• The right to redistribute GPL-based code is granted only if the licensee includes the source code in the redistribution (including all modifications!)
• The redistributed copies themselves are required to include and be licensed under the GPL, in a mechanism known as copyleft
• Copyleft actually derives its legal impact from the fact that the program is copyrighted!
• Under copyright, a licensee does not have the right to modify or redistribute the code except under the terms outlined in the copyleft license
• Therefore copyleft uses copyright law to accomplish an almost opposite effect: granting modification and redistribution rights


The GNU Toolchain

• Linux relies on the GNU development toolchain
• A toolchain is a series of programming tools (assembler, compiler, linker, etc.) which are used to create another computer program
• The tools are used sequentially, or in a chain, in such a way that the output of one program becomes the input of another one, hence the term toolchain

The GNU Toolchain

• The GNU toolchain is an overall term given to the series of programming tools developed by the GNU project
• The projects include:
  • GNU make: build and compilation automation
  • GNU Compiler Collection (GCC): compilers for several programming languages
  • GNU Binutils: linker, assembler, and other tools
  • GNU Debugger (GDB): interactive debugger
• Other related projects are:
  • GNU C Library: a standard C library
  • CVS: Concurrent Versions System


CVS

• The Concurrent Versions System implements a version control system to keep track of all the work and changes in a set of files
• This enables developers from across the globe to collaborate on a project, and as such it has become a popular component of the open-source development community

CVS

• The way CVS works is as follows:
  • Any number of clients (developers) can check out a full copy of a given project
  • One or more developers can work on the same copy of the code and then check in their modifications
  • The CVS server automatically attempts to merge the different changes
  • If it is unsuccessful, for example in the case where two developers are trying to modify the same line of code, it rejects the second developer's update and directs the two developers to merge the code manually

Developing a Linux System

• There are three basic setup mechanisms which developers use to develop code for Linux
  • The Permanent Link Setup is where the host and the target are permanently connected together, via an Ethernet cable for example. In this case a root file system can be NFS-mounted, which prevents the need for constantly copying programs back and forth
  • The Removable Storage Setup is a situation where the code is created on the host, copied onto a removable storage device such as Compact Flash, and transferred to the target
  • The Stand-alone Setup is a situation where the toolchain is contained on the target, as could be the case for creating embedded Linux on PC-based platforms

Starting up Linux

• From system power up to the time the system is up and running, there are three distinct steps that must be completed
  • The Bootloader is the first piece of code which runs on the hardware, and it is closely related to the type of platform on which it runs. There are many different types of bootloaders for Linux
  • The Kernel Startup Code is the second stage of the boot process, and it too differs greatly depending on the target platform
  • Init is the final process, which further initializes the system

Linux Device Drivers

• Most Linux users are happily unaware of the complexities associated with the underlying hardware
• But every piece of the underlying hardware requires a device driver to be written for it, and this is a job designers bravely undertake
• In the Linux Kernel there are many concurrent processes which tend to various system resources, such as memory, I/O, or the file system
• Though the Kernel can have any number of processes, it can basically be broken into the following groups:

* Linux Device Drivers, Alessandro Rubini and Jonathan Corbet

Resource Management

• Process Management
  • It's the Kernel's task to manage processes, to ensure that they can communicate with each other, and that they are scheduled, created and disposed of properly
• Memory Management
  • The Kernel is also responsible for handling memory resources, providing a virtual address space and memory management
• Filesystem
  • The filesystem is a major component of a UNIX (and Linux) operating system. Almost every resource in UNIX can be treated as a file

Resource Management

• Device Control
  • The Kernel implements a device driver for every hardware resource which is available on the system, ranging from hard drives to timer modules
• Networking
  • Finally, the Kernel is responsible for providing a networking stack to the higher-level operating system functions


Classes of Devices

• Unix differentiates its resources into three classes of devices
• Character Device
  • One of the simplest classes of devices is the character device
  • It can be accessed as a stream of bytes and implements functions such as open, close, read and write
  • A character device, such as a serial port (/dev/ttyS0), is accessed much like a file, with the minor difference that while you can only access a character device resource sequentially, you can move back and forth within a file

Classes of Devices

• Block Devices
  • Another class of devices is the block device, which is closely tied to a resource such as a Compact Flash card, where the resource can only be accessed in multiples of blocks
  • Unix enables an application to read and write a block device like a character device, and therefore the difference between a character device and a block device is transparent to the user
• Network Interfaces
  • Finally, network resources are managed through interfaces, which are generally hardware resources in charge of transmitting and receiving data
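The character/block distinction above is visible from user space through the file type that the kernel reports for a device node. The following is a minimal sketch using the POSIX stat() call; the enum and the function name classify_path are hypothetical names chosen for illustration. Network interfaces do not appear as device nodes, so they are not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Device classes as seen from user space (illustrative names). */
enum dev_class { DEV_ERROR = -1, DEV_OTHER = 0, DEV_CHAR = 1, DEV_BLOCK = 2 };

/* Ask the kernel what kind of node a path refers to. */
enum dev_class classify_path(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return DEV_ERROR;   /* path missing or not accessible */
    if (S_ISCHR(st.st_mode))
        return DEV_CHAR;    /* e.g. a serial port or /dev/null */
    if (S_ISBLK(st.st_mode))
        return DEV_BLOCK;   /* e.g. a disk or flash partition */
    return DEV_OTHER;       /* regular file, directory, FIFO, ... */
}
```

On a typical Linux system, calling classify_path("/dev/null") reports a character device, while a directory such as "/" falls into the "other" class.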


Kernel Modules

• Loadable pieces of Kernel functionality are called modules, and they are loaded and unloaded from memory using the insmod and rmmod commands
• Unlike traditional programs, which are loaded and executed completely, a kernel module registers itself when it is loaded with insmod in order to specify which services it is capable of providing, and its initialization function terminates afterwards

Example of a Kernel Module

    #define MODULE
    #include <linux/module.h>

    /* Called when the module is loaded with insmod */
    int init_module(void)
    {
        printk("<1>Hello, initializing module...\n");
        return 0;
    }

    /* Called when the module is removed with rmmod */
    void cleanup_module(void)
    {
        printk("<1>Thank you, & goodbye...\n");
    }

    > gcc -c test.c
    > insmod test.o
    Hello, initializing module...
    >

The Journaling Flash File System

• JFFS is a log-structured file system designed by Axis Communications AB in Sweden specifically for use with flash devices on embedded systems
• Since many embedded systems are battery operated or may otherwise be suddenly and uncleanly shut down, one of the major purposes of a file system such as JFFS is to prevent data corruption in such incidents
• Another advantage of JFFS is that it provides wear-leveling of Flash devices

How JFFS works

• Nodes containing data and metadata are stored sequentially on the flash chips
• The entire flash device is then scanned at mount time, with each node being read and interpreted to build a hierarchical directory structure
• Writing continues in this way until the system runs out of space, at which point it begins to reclaim dirty space, which contains old nodes that have been rendered obsolete
• JFFS2 has since been in use, and it offers all of the advantages of JFFS plus compression
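The mount-time scan described above can be sketched as a single pass over a log. This is a simplified illustration, not the actual JFFS on-flash format: the struct layout, the MAX_INODES limit, and the function scan_log are all hypothetical. A newer version of a node supersedes the older one, and superseded nodes are counted as dirty space that could later be reclaimed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INODES 8   /* hypothetical limit for this sketch */

/* One log entry; real JFFS nodes carry far more metadata. */
struct log_node {
    unsigned ino;       /* which file the node belongs to */
    unsigned version;   /* a higher version supersedes a lower one */
    const char *data;   /* payload written with this node */
};

/* Result of a mount-time scan. */
struct scan_result {
    const struct log_node *latest[MAX_INODES]; /* live node per inode */
    size_t dirty;                              /* obsolete nodes found */
};

/* Walk the log front to back, keeping only the newest node per inode. */
struct scan_result scan_log(const struct log_node *log, size_t count)
{
    struct scan_result res = { {0}, 0 };

    for (size_t i = 0; i < count; i++) {
        const struct log_node *n = &log[i];
        assert(n->ino < MAX_INODES);
        const struct log_node *cur = res.latest[n->ino];

        if (cur == NULL) {
            res.latest[n->ino] = n;    /* first node seen for this inode */
        } else if (n->version > cur->version) {
            res.latest[n->ino] = n;    /* newer node wins */
            res.dirty++;               /* the older node is now obsolete */
        } else {
            res.dirty++;               /* this node was already obsolete */
        }
    }
    return res;
}
```

Because the whole log must be walked before any file can be resolved, the cost of this scan grows with the size of the flash device, which is why mount time matters for this design.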


Memory Management

• Linux employs three memory addressing schemes
  • Logical Address, where each address contains a segment and an offset
  • Linear Address, a single 32-bit unsigned integer to address memory from 0 to 4 GB
  • Physical Address, the actual addressing scheme on the system bus (the physical address provided to a flash chip for instance)
• The kernel translates a logical address into a linear address through segmentation, and further translates it into a physical address through paging
• Linux prefers paging over segmentation

Process Management

• Linux uses 5 states to manage processes
  • TASK_RUNNING: Process is either executing, or is waiting to run
  • TASK_INTERRUPTIBLE: The process is suspended until a certain condition is met or a signal arrives
  • TASK_UNINTERRUPTIBLE: Task is suspended until a condition is met and is uninterruptible until the condition is met
  • TASK_STOPPED: Process execution has been stopped (for example after receiving a stop signal)
  • TASK_ZOMBIE: The process has been terminated but the parent may still need information pertaining to it and therefore the OS can't discard the process
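The two translation steps in the Memory Management slide can be sketched with plain integer arithmetic. This sketch assumes the classic 32-bit x86 layout with 4 KB pages and two-level page tables (10-bit directory index, 10-bit table index, 12-bit offset), which the slides do not state; the function and struct names are hypothetical, and the page-table lookup itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a 32-bit linear address (assumed x86 layout). */
struct linear_parts {
    uint32_t dir;     /* index into the page directory (top 10 bits) */
    uint32_t table;   /* index into the page table (next 10 bits) */
    uint32_t offset;  /* byte offset inside the 4 KB page (low 12 bits) */
};

/* Segmentation step: linear address = segment base + offset. */
uint32_t logical_to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Paging step, part 1: split the linear address into its fields. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;

    p.dir = linear >> 22;
    p.table = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}

/* Paging step, part 2: combine the page frame base found in the
 * page table with the offset to form the physical address. */
uint32_t frame_to_physical(uint32_t frame_base, uint32_t offset)
{
    assert((frame_base & 0xFFFu) == 0);  /* frames are 4 KB aligned */
    assert(offset <= 0xFFFu);
    return frame_base | offset;
}
```

With a segment base of 0, the logical offset and the linear address are identical, so segmentation has no effect and paging does all the work. That is one way to read the statement that Linux prefers paging over segmentation.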


Process Management

• Processes created in Linux have a parent/child relationship, and sibling relationships between child processes
• Process 1 (init) is the ancestor of all other processes
• The way Unix has traditionally handled creation of child processes was that the resources available to a parent process were duplicated and a copy was provided to the child process
• However, this is an inefficient mechanism, especially if the parent depends on a large pool of resources and creates many child processes
• The duplicated resources include the stack, memory, current working directory, nice value, etc.

Fork()

• Modern Unix systems, including Linux, create processes with the fork() and vfork() system calls and work around this inefficiency
• Both fork() and vfork() principally perform the same function, that of creating a child process
• fork() originally copied the entire memory space of the parent process to the child, and vfork() was introduced to avoid that cost. With the copy-on-write mechanism, where the copying of the address space is deferred until modification time, there is less justification for using vfork() anymore
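The visible effect of fork() can be shown with a small user-space sketch: after the call, a write in the child does not change the parent's data. Whether the kernel copies pages eagerly or lazily (copy-on-write) cannot be observed from this code; it only demonstrates that each process ends up with its own private copy. The function name fork_and_write is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that overwrites *value with child_value (0..255).
 * Returns the value the child saw after its write, or -1 on error.
 * The parent's *value is left untouched. */
int fork_and_write(int *value, int child_value)
{
    assert(value != NULL);
    assert(child_value >= 0 && child_value <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        *value = child_value;      /* lands in the child's copy only */
        _exit(*value);             /* report what the child sees */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller that holds the value 3 and asks the child to write 7 gets 7 back as the child's view, while its own variable still holds 3.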

Interrupt and Exception Handling

• There are two sources of interrupts in Linux, synchronous and asynchronous
• Synchronous interrupts, better known as exceptions, are generated by the CPU control unit
• Asynchronous interrupts (known as interrupts) are generated by hardware resources, such as a serial module or timers
• Interrupts are grouped into three different categories: critical, non-critical, and deferrable non-critical
• The addresses of all the interrupt service routines must be programmed into the Interrupt Descriptor Table (IDT)

Interrupt and Exception Handling

• The nature of an asynchronous interrupt is that it can happen at any time
• If it happens during a time when the kernel is busy performing an important function, then the kernel must do the following:
  • Switch over and execute as much of the interrupt service routine as necessary
  • Switch back and finish the remainder of the task it was performing before the interrupt occurred
  • Switch back yet again and finish the remainder of the interrupt service routine
• The first half of the interrupt service routine is referred to as the top half, while the second half is referred to as the bottom half
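The top half / bottom half split can be illustrated with a small user-space simulation. This is not kernel code and uses no kernel API: the queue, its depth, and the function names are hypothetical, and a real driver would also need to protect the shared queue against concurrent access. The idea is that the top half only captures the data and returns quickly, while the bottom half does the slower processing later.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 8   /* hypothetical queue depth for this sketch */

/* Work handed from the top half to the bottom half. */
struct pending_work {
    int samples[PENDING_MAX];
    size_t count;
};

/* Top half: do the minimum (capture the value) and return quickly.
 * Returns 0 on success, -1 if the queue is full and the sample is dropped. */
int top_half(struct pending_work *w, int sample)
{
    assert(w != NULL);
    if (w->count == PENDING_MAX)
        return -1;
    w->samples[w->count++] = sample;
    return 0;
}

/* Bottom half: runs later, does the slower processing, and empties
 * the queue. Returns the sum of the samples it handled. */
long bottom_half(struct pending_work *w)
{
    long sum = 0;

    assert(w != NULL);
    for (size_t i = 0; i < w->count; i++)
        sum += w->samples[i];
    w->count = 0;
    return sum;
}
```

Between a call to top_half() and the later call to bottom_half(), the interrupted task gets to finish its work, which mirrors the three switching steps listed above.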

Interprocess Communication

• Another of the kernel's tasks is to handle interprocess communication (IPC)
• Signals and pipes are two mechanisms that Linux uses to perform IPC
• A signal is a mechanism, like interrupts and exceptions, to notify processes of events. However, unlike the two, a signal is also available in user space
• For example, the kill() system call can send a signal to a process at any time to terminate it
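Both mechanisms can be tried from user space with standard POSIX calls. The sketch below is only an illustration; the helper names send_through_pipe and kill_child_with are hypothetical. The first helper passes a message from a child to its parent through a pipe, and the second uses kill() to terminate a child with a signal and reports which signal ended it.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe: the child writes msg, the parent reads it back into out.
 * Returns the number of bytes read, or -1 if setup fails. */
ssize_t send_through_pipe(const char *msg, char *out, size_t outlen)
{
    int fds[2];

    assert(msg != NULL && out != NULL && outlen > 0);
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: writer */
        size_t len = strlen(msg);
        close(fds[0]);
        ssize_t w = write(fds[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                        /* parent: reader */
    ssize_t total = 0;
    while ((size_t)total < outlen - 1) {
        ssize_t n = read(fds[0], out + total, outlen - 1 - (size_t)total);
        if (n <= 0)
            break;                        /* end of data or read error */
        total += n;
    }
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}

/* Signal: fork a child that sleeps, then terminate it with signo.
 * signo must be a signal whose default action terminates the process,
 * such as SIGTERM. Returns the terminating signal, or -1 on error. */
int kill_child_with(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                      /* child: wait for a signal */
    }

    int status;
    if (kill(pid, signo) != 0) {
        kill(pid, SIGKILL);               /* clean up the child anyway */
        waitpid(pid, NULL, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The signal path needs no cooperation from the child, which matches the point above that a signal can be sent to a process at any time.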

Portions of this PowerPoint presentation may have been taken from relevant users and technical manuals. Original content Copyright © 2005 Babak Kia
