PEPEonBOARD Development board for embedded systems

Pedro Guilherme Antunes Diogo

Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering

Examination Committee
Chairperson: Prof. Nuno Cavaco Gomes Horta
Supervisor: Prof. Rui Manuel Rodrigues Rocha
Co-supervisor: Prof. Carlos Nuno da Cruz Ribeiro
Member of the Committee: Prof. Nuno Filipe Valentim Roma

April 2013

Dedicated to the memory of my father...

Acknowledgements

First of all, I would like to thank my thesis supervisors: Professor Rui Rocha, who accompanied me throughout this process for a year, for his immense availability and for his ability to demand the best of me and of my decisions; and Professor Carlos Ribeiro, for the support he provided and because without him there would be no simulator.

I must also thank Instituto Superior Técnico and all the professors who accompanied me over these almost 6 years and helped in my development as a student. To all the members of the GEMS group, my thanks for the Wednesday meetings. A special thanks to Professor Carlos Almeida, for teaching me the fundamentals of embedded systems that were so useful to me in this work, and for his ideas in the Wednesday meetings. To José Catela, for the trouble I caused him and for all the support in the laboratory and with the MoteIST. To Mr. João Pina, for all the help with the components and the assembly of the board.

It would not be fair to mention names, but to all the friends I made during this journey, my heartfelt thanks. You made it easier to get through the less good moments and made the good ones even better. I will fondly remember the conversations over coffee after lunch. It would not be fair to leave out my colleagues and friends at NEEC, who supported me and encouraged me to give my best and to increase my knowledge while helping others; if I could go back, I would repeat all the time I spent with you.

To my mother, for being the best mother in the world: thank you for all the affection and support you have given me over all these years; you helped make me the man I am today. To my siblings Tita and André, for constantly annoying me, but for making me, at the same time, a better brother and a better person.

Last but not least, a big thank you to my girlfriend Joana, for always being here when I need you, for all the wonderful moments and for being my safe harbour; this work is also yours.

Summary

PEPE is a 16-bit processor created to teach the basic concepts of Computer Architecture. Initially, this processor was simulated in a computer simulation environment, SIMAC. The main disadvantage of running a processor in a simulated environment is being unable to interact with real peripherals, which limits its applications. As a result, users were unable to run their programs outside the simulation environment. Motivated by these limitations, a development board was created as a platform for an external emulator. This board includes many of the peripherals available in SIMAC, such as an LCD, a hexadecimal keypad, seven-segment displays, buttons and LEDs. A microcontroller was used to emulate PEPE outside the simulation environment. This microcontroller is responsible for executing PEPE programs, interacting with the peripherals and communicating with the computer. This communication allows the computer to control the emulator. To keep the interface familiar, the same window used to simulate programs in the simulation environment is now also used to control the emulator. This work opens up new possibilities for PEPE users, as it makes it possible to control new external peripherals and serves as an alternative to other, more complex development boards.

Keywords: PEPE, Embedded Systems, Processing Unit, Emulation, Computer Simulator, Hardware Design

Abstract

PEPE is a 16-bit processor created to teach the basics of Computer Architecture. This processor was initially simulated on a computer simulator, SIMAC. The main disadvantage of a simulated processor is that it cannot interact with real peripherals, which limits its applications. Because of this, users were unable to run their programs outside the simulator environment. Motivated by these limitations, a hardware development board was purpose-built to serve as a platform for an external emulator. This platform integrates many of SIMAC's existing peripherals, such as an LCD, a keypad, seven-segment displays, buttons and LEDs. To emulate the PEPE processor outside the computer, we used a microcontroller. The microcontroller is responsible for emulating PEPE programs, interfacing with the peripherals and communicating with the computer. This communication allows the emulator to be controlled by a computer. To maintain a familiar user interface, the same simulator interface is used to control the emulator, while keeping the option to run the same programs on the simulator. This work opens new possibilities for anyone using the PEPE processor, as it can now control new external peripherals, and it provides an alternative to other, more complex development boards.

Keywords: PEPE, Embedded Systems, Processing Unit, Emulation, Computer Simulator, Hardware Design

Contents

List of Tables
List of Figures
List of Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Contributions
  1.4 Organization

2 Emulation Process Overview
  2.1 Comparing Different Emulation Approaches
    2.1.1 Decode-and-Dispatch Interpretation
    2.1.2 Threaded Interpretation
    2.1.3 Threaded Interpretation with Precoding
    2.1.4 Binary Translation
  2.2 Existing Work
  2.3 Discussion

3 Emulated PEPE Architecture
  3.1 PEPE
  3.2 Architecture

4 Designing PEPEonBOARD
  4.1 Target Platform
    4.1.1 Starting the Microcontroller
  4.2 Storing a PEPE Program in the microcontroller
  4.3 Translating from PEPE's Instruction Set to the MSP430 Instruction Set
  4.4 Virtual Memory
  4.5 Running a program
  4.6 Emulated Instructions Routines
  4.7 Reading and Writing from / to Memory
  4.8 Interfacing with the Peripherals
    4.8.1 LEDs
    4.8.2 Push Buttons and Toggle Switches
    4.8.3 Seven Segment Displays
    4.8.4 Keypad
    4.8.5 LCD
    4.8.6 Serial Interface
  4.9 Interrupts
    4.9.1 Overflow
    4.9.2 DIV0
    4.9.3 INV OPCODE
    4.9.4 MISALIGNED D
    4.9.5 MISALIGNED I
    4.9.6 External Interrupts

5 The PEPEonBOARD Platform
  5.1 Requirements
  5.2 Implementing PEPEonBOARD
    5.2.1 Power
    5.2.2 Interfacing with the MoteIST++ s5 1011
    5.2.3 Serial Communication With a Computer
    5.2.4 Seven Segment Displays
    5.2.5 Speaker
    5.2.6 Push Buttons and Switches
    5.2.7 LEDs
    5.2.8 Extending the Main Board with an LCD and Keypad Add-On Board
  5.3 Second Version of the Main Board

6 Interfacing with the Computer Simulator
  6.1 Software Changes to the SIMAC
  6.2 Communication with the Jalapeño
    6.2.1 Download COD
    6.2.2 Upload COD
    6.2.3 Erase COD
    6.2.4 Version
    6.2.5 Run Emulator
    6.2.6 Pause Emulator
    6.2.7 Step Emulator
    6.2.8 Add Breakpoint
    6.2.9 Remove Breakpoint
  6.3 Breakpoints
    6.3.1 Stepping

7 Evaluation and Result Analysis
  7.1 Instructions Timings
  7.2 Speed Test
  7.3 Virtual Memory
  7.4 LCD Update Time
  7.5 Startup Time

8 Conclusions

A Hardware
  A.1 Schematics of the Second Version of the Main Board
  A.2 Schematics of the LCD and Keypad Add-On Board
  A.3 Bill of Materials

B Test Code
  B.1 Instructions Timings
  B.2 Speed Test
    B.2.1 PEPE Program
    B.2.2 Native Program
  B.3 Virtual Memory
  B.4 LCD

Bibliography

List of Tables

3.1 Description of the input and outputs of the PEPE
3.2 Description of PEPE's registers
3.3 Description of each bit of the
3.4 Description of the RCN register
3.5 Description of PEPE's interrupts
3.6 Memory Mapping for the Peripherals

4.1 RCU Value to Baud Rate Conversion Table

7.1 Time Measurements of Speed Test Program

List of Figures

2.1 Decode-and-dispatch interpretation [1]
2.2 Threaded Interpretation [1]
2.3 Threaded Interpretation with Precoding [1]
2.4 Binary Translation [1]

3.1 The PEPE module in SIMAC
3.2 SIMAC's representation of the emulated PEPE Architecture

4.1 Key components of PEPEonBOARD
4.2 Flowchart of the microcontroller's initialization process
4.3 Microcontroller's memory mapping
4.4 Example representation of the hierarchic organization of the look-up tables
4.5 Simplification of the expression that indicates a 4 bit opcode instruction
4.6 Flowchart Describing the Reading and Writing Process of PEPE's Memory
4.7 Emulating a PEPE instruction
4.8 Comparison Between the Simulated and Physical LCDs
4.9 Organization of the memories in both LCDs

5.1 Block Diagram of the PCB Project
5.2 Schematic of the Voltage Regulator
5.3 Photos of the MoteIST++ s5 1011
5.4 GPIO pin with a current limiting resistor
5.5 Resistor arrays placement on both versions of the main board
5.6 Schematic of the USB to RS-232 converter
5.7 Difference between versions of the main board on one of the RS-232 channels
5.8 Schematic of the LED Driver and Dual Seven Segment Display
5.9 Highlight of the jumpers that enable or disable the integrated peripherals
5.10 Speaker Circuit
5.11 Algorithm to Read the Keypad
5.12 Schematic of the LCD
5.13 Photo of the LCD and Keypad Add-on board
5.14 Photo of the first version of the main board
5.15 Photo of the second version of the main board

6.1 PEPE Simulation Window in SIMAC
6.2 Executing an Action on the SIMAC
6.3 Format of a Valid Message
6.4 Representation of the Circular Buffer
6.5 Algorithm of the checkAction Function
6.6 Format of the Download COD message
6.7 Format of the Upload COD message
6.8 Format of the Erase COD message
6.9 Format of the Version message
6.10 Format of the Run Emulator Message
6.11 Format of the Pause Emulator Message
6.12 Format of the Step Emulator Message
6.13 Format of the Add Breakpoint Message
6.14 Format of the Remove Breakpoint Message
6.15 Format of the Registers Message

7.1 Number Clock Cycles of Different Instructions
7.2 Measured Times for The Virtual Memory Test
7.3 LCD Test Pattern
7.4 Startup Time of Different Programs

List of Abbreviations

ACK     Acknowledgement
ADC     Analog-to-Digital Converter
BSL     Bootstrap Loader
CRC     Cyclic Redundancy Check
DTR     Data Terminal Ready
FPGA    Field-Programmable Gate Array
GPIO    General Purpose Input/Output
I2C     Inter-Integrated Circuit
IC      Integrated Circuit
IDE     Integrated Development Environment
LCD     Liquid-Crystal Display
LDO     Low-dropout Regulator
LED     Light Emitting Diode
MISO    Master In Slave Out
MOSFET  Metal Oxide Semiconductor Field Effect Transistor
MOSI    Master Out Slave In
NACK    Negative Acknowledgement
PCB     Printed Circuit Board
PEPE    Processador Especial Para Ensino
PROM    Programmable Read-Only Memory
RAM     Random Access Memory
SIMAC   SIMulador de Arquitectura de Computadores
SOIC    Small-Outline Integrated Circuit
SPI     Serial Peripheral Interface
UART    Universal Asynchronous Receiver/Transmitter
USB     Universal Serial Bus

Chapter 1

Introduction

PEPE - Processador Especial Para Ensino (translated from the Portuguese as Special Processor for Education) - is, as the name suggests, a computer processor created specifically to teach the basics of Computer Architecture to students. PEPE is a 16-bit processor that allows students to program in a low-level language, and all its implementation details are documented in [2]. The book starts by describing PEPE-8, a simpler 8-bit version of the PEPE, in Chapter 3, and then proceeds to describe the more advanced specification that we use in this dissertation.

Accompanying the book there is also a simulator - SIMAC (SIMulador de Arquitectura de Computadores, translated from the Portuguese as Computer Architecture Simulator) - which is used to illustrate most of the programming examples included in the book. The motivation behind this simulator comes from the complex and static nature of electronic circuits, which can be hard to design and cannot easily be changed once built. A complex system can therefore be built in SIMAC using rather simple building blocks, called modules in this simulation tool. Modules can be as simple as logic gates or as complex as memories and processors, PEPE being one example. Modules are also used as inputs and outputs for the user. Some examples include buttons, LEDs, displays and serial UARTs.

The PEPE module allows anyone to program the processor using a particular assembly language and to interact with the connected modules. This interaction is possible because the PEPE has inputs and outputs that can be used to connect it to other modules.

1.1 Motivation

Even though PEPE is very useful as an entry-level processor for a computer architecture course, it is also somewhat limited, as it only exists inside SIMAC. Inside SIMAC, projects are restricted to artificial scenarios, as neither the processor nor the peripherals physically exist. This can be a problem when students want to apply their knowledge in real-world settings, controlling real peripherals.

One example of this problem occurs in the Computer Architecture course at the Tagus Park Campus of Instituto Superior Técnico. During one semester the students learn how to use the processor in conjunction with SIMAC. After learning some of PEPE's concepts, the students are encouraged to participate in a competition in the framework of an on-going project called Suba. Project Suba consists of a small electric car that has a development board and some sensors mounted on it. The students participating in this project compete among themselves for the best lap times around a track. The development board that controls the car is equipped with an ARM processor, so if the students want to control the car they must first learn ARM assembly, which differs from the assembly language they learnt with the PEPE.

Although learning to use an ARM processor may be a useful skill, as ARM processors are widely used in the market today, it falls outside the syllabus of the course. By continuing to use the PEPE language they are familiar with, the students can focus on the challenges of the project rather than on the particularities of ARM assembly. For that reason, a hardware implementation of PEPE would be more appropriate.

1.2 Goals

To solve such problems, this dissertation aims to create a platform - the PEPEonBOARD - that enables students to use the PEPE in a real-world scenario.

The first goal is to find or create a suitable hardware platform to be the bridge between the PEPE and the user. It must include some of the peripherals that exist in SIMAC and are commonly used with the PEPE.

Secondly, we must develop a way to run PEPE programs on the developed hardware platform.

Thirdly, we must strive to provide a graphical interface similar to the one the SIMAC user is familiar with, so that a seamless integration with SIMAC can be obtained, thereby improving the user experience.

1.3 Contributions

The work developed in this dissertation makes the following contributions:

• A hardware platform for embedded systems - PEPEonBOARD - and an Add-on board with an LCD and Keypad
• An emulator that enables running PEPE programs in PEPEonBOARD
• A paper submitted to the exp.at'13 2nd Experiment @ International Conference

1.4 Organization

This dissertation is organized into eight chapters, the present one being the introduction to this work.

In the second chapter we go through the decision of emulating the PEPE on a microcontroller rather than running it natively on an FPGA. Some of the emulation techniques that can be used in the context of this work are also discussed.

In the third chapter we introduce the emulated PEPE architecture.

In the fourth chapter we describe in detail the emulation process that allows PEPE programs to run on the PEPEonBOARD platform, and all the challenges involved.

In the fifth chapter we present the challenges and decisions made during the design of the hardware that supports this work. We go through the requirements, features and choice of components of the developed board, as well as the changes made in later versions.

In the sixth chapter we give a brief overview of SIMAC, describe the changes made to it in order to communicate with the PEPEonBOARD, and detail the mechanisms used for that communication. We also present how various mechanisms included in SIMAC that control PEPE are ported to the microcontroller.

In the seventh chapter we compare and validate the work developed in this dissertation against the existing platform in SIMAC.

Finally, in the eighth chapter we give an overview of the work developed and draw conclusions from the results obtained.

Chapter 2

Emulation Process Overview

Originally, all PEPE programs ran inside SIMAC, which runs on the Linux, Mac OS and Windows operating systems¹. As we need to emulate PEPE programs on the PEPEonBOARD platform, this chapter analyzes the different emulation approaches, compares their advantages and disadvantages, and justifies why we chose one over the others. Emulating the PEPE on a different hardware platform requires more than just emulating its Instruction Set; it also requires emulating its Architecture and all its connected peripherals.

Initially, two options were considered for running PEPE programs on the PEPEonBOARD platform: one was implementing the PEPE processor directly on an FPGA; the other was running those programs on a 16-bit (because PEPE is a 16-bit processor) or 32-bit microcontroller-based emulator. Existing 8-bit microcontrollers were not considered because they generally offer fewer peripherals and lower performance, and because of the added complexity of emulating a 16-bit processor on an 8-bit platform.

Each option has its own advantages and disadvantages. The FPGA would run the PEPE natively, without emulation, and would therefore run faster than a microcontroller at the same clock frequency. However, it has disadvantages of its own. Although this is subjective and highly dependent on the system designer's skill and the tools available, the FPGA implementation would take considerably longer to develop than the emulated approach. Additionally, at the time of writing, an FPGA Integrated Circuit is considerably more expensive than a 16-bit or 32-bit microcontroller².

On the other hand, emulation based on a microcontroller has a significant number of advantages. The development language is C/C++ or Assembly, which can speed up the development process as we are abstracted from the hardware layer. A microcontroller also has many available peripherals and integrated memory whose implementations are already tested and documented, which is not the case for a custom component on an FPGA. In most cases the manufacturer also provides sample code to work with the peripherals. Lastly, a microcontroller provides a working platform from the start.

¹ The simulator was developed using the Java programming language. For that reason it is able to run on all the mentioned operating systems.
² The cheapest Xilinx Spartan 6 FPGA available on Farnell (http://goo.gl/CHsiH) costs twice as much as the microcontroller used in this work (http://goo.gl/FWDPB).

2.1 Comparing Different Emulation Approaches

Emulating the PEPE Instruction Set can be accomplished in different ways, each with different characteristics of performance, memory footprint and portability. Some of the emulation approaches relevant to this work are described thoroughly in [1].

2.1.1 Decode-and-Dispatch Interpretation

The interpretation approach to emulation is possibly one of the simplest routes to take. In interpretation, code running on the PEPEonBOARD platform is in charge of both interpreting the program memory and managing the aspects related to the architecture of the emulated system (its memory, register bank, interrupts, etc.), separately from the platform itself.

In the decode-and-dispatch interpretation approach we have a main interpreter loop - the dispatch loop - that steps through each instruction of the program memory, decodes it, compares the result with the available routines and then dispatches it to the appropriate routine that emulates such instruction, making the appropriate changes to the emulated architecture managed by the interpreter, if necessary. After the instruction routine is finished, the program goes back to the dispatch loop to restart the process with the next instruction. Figure 2.1 illustrates the decode-and-dispatch interpretation method.

Figure 2.1: Decode-and-dispatch interpretation [1]

Although simple, this method may be quite slow. For every interpreted instruction we have to decode, compare and dispatch it and, after the instruction routine is complete, branch back to the dispatch loop. All those branches can negatively impact the performance of this approach.

However, the advantage of this method is a lower memory requirement compared to other interpretation methods. In this case we only need to store the code for the dispatch loop and the code for each emulated instruction, in addition to the original source code and the emulated architecture (e.g. registers, memory bank).
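The decode-and-dispatch loop described above can be sketched as follows. This is only an illustration under stated assumptions: the three-instruction accumulator machine, its encoding (opcode in bits 15..12, operand in bits 11..0) and all names are hypothetical and are not PEPE's real instruction set.

```python
# Toy 16-bit instruction set (hypothetical, NOT PEPE's real encoding):
# bits 15..12 hold the opcode, bits 11..0 hold an immediate operand.

def op_halt(state, operand):
    state["running"] = False

def op_load(state, operand):
    state["acc"] = operand

def op_add(state, operand):
    state["acc"] = (state["acc"] + operand) & 0xFFFF

# One emulation routine per opcode.
ROUTINES = {0x0: op_halt, 0x1: op_load, 0x2: op_add}

def run_decode_and_dispatch(program):
    state = {"acc": 0, "pc": 0, "running": True}
    while state["running"]:               # the dispatch loop
        word = program[state["pc"]]       # fetch
        state["pc"] += 1
        opcode = (word >> 12) & 0xF       # decode
        operand = word & 0x0FFF
        ROUTINES[opcode](state, operand)  # dispatch, then return to the loop
    return state["acc"]

# LOAD 5; ADD 7; HALT
result = run_decode_and_dispatch([0x1005, 0x2007, 0x0000])
```

Every instruction pays for a fetch, a decode, a table lookup and a return to the loop, which is the overhead the following methods try to reduce.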

2.1.2 Threaded Interpretation

Threaded interpretation tries to solve the performance problem that decode-and-dispatch interpretation has with its dispatch loop. By moving the decode-and-dispatch part of the interpreter to the end of each instruction routine, we do not need to branch back to the dispatch loop to perform the decode-and-dispatch operation, as we branch directly to the next instruction routine. This effectively saves one branch per instruction. Figure 2.2 illustrates threaded interpretation. Note that with threaded interpretation we no longer need a dispatch loop.

Figure 2.2: Threaded Interpretation [1]

Compared with decode-and-dispatch interpretation, threaded interpretation has slightly better performance, as it saves at least one branch, but it also has a greater memory footprint, because every instruction routine must now replicate the functionality that previously existed in only one place, i.e. in the dispatch loop.
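A rough sketch of the same idea, using the same hypothetical toy instruction set (opcode in bits 15..12, operand in bits 11..0; not PEPE's real encoding). Each routine ends by fetching and decoding the next instruction and transferring control straight to its routine. Python has no computed jumps, so the transfer is modelled as a call in tail position and the replicated code is factored into a helper; a real threaded interpreter would inline that code at the end of every routine, and this sketch is only suitable for short programs because the call stack grows with each instruction.

```python
def next_routine(state):
    # Stands in for the fetch/decode/jump sequence that a threaded
    # interpreter replicates at the end of every instruction routine.
    word = state["program"][state["pc"]]
    state["pc"] += 1
    return ROUTINES[(word >> 12) & 0xF](state, word & 0x0FFF)

def op_halt(state, operand):
    return state["acc"]                  # nothing further to dispatch

def op_load(state, operand):
    state["acc"] = operand
    return next_routine(state)           # go directly to the next routine

def op_add(state, operand):
    state["acc"] = (state["acc"] + operand) & 0xFFFF
    return next_routine(state)

ROUTINES = {0x0: op_halt, 0x1: op_load, 0x2: op_add}

def run_threaded(program):
    state = {"acc": 0, "pc": 0, "program": program}
    return next_routine(state)           # no central dispatch loop

# LOAD 5; ADD 7; HALT
result = run_threaded([0x1005, 0x2007, 0x0000])
```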

2.1.3 Threaded Interpretation with Precoding

Even though threaded interpretation features better performance than decode-and-dispatch interpretation, we still have to perform the decode operation every time we interpret an instruction. This can be avoided if we precode every instruction of the original source code to point to the corresponding routine that emulates it. The precode task parses the original source code, decoding each instruction and saving the address of the routine that emulates that instruction in an intermediate code memory. This is done once, at the beginning of the emulation process. This differs from the Threaded Interpretation method because, instead of reading the next instruction, decoding it and jumping, we now only have to jump to the routine of the next instruction, as the decoding is done in advance. This way we still have a routine for every instruction, but we decode only at the beginning of the program and only once for every instruction in memory, which is particularly advantageous if the program has loops. If the program does not have loops and no address in the program memory is visited more than once, precoding is possibly slower than threaded interpretation without precoding.

Figure 2.3 illustrates threaded interpretation with precoding. As we can observe, the original source code is precoded to an intermediate code region. To begin the execution of the emulation process, we jump to the address pointed to by the first entry in the intermediate code region. This jumps to the routine that emulates the first instruction of the original source code. When the execution of that instruction is over, we repeat the process for the next instruction. Note that, even though it is not clear in Figure 2.3, we no longer need to read and decode the next instruction.

Figure 2.3: Threaded Interpretation with Precoding [1]

This method usually has better performance than the two previous methods, with the tradeoff of an increased memory footprint. In addition to the emulated architecture memory, we must also reserve a dedicated memory space for the addresses of the emulated instruction routines, which is proportional to the source code size. This method also introduces a startup delay to precode all the emulated instructions, whereas the previous methods had little to no startup delay.
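The effect of precoding on a program with a loop can be sketched as follows. The toy instruction set (opcode in bits 15..12, operand in bits 11..0), the counter register and all names are assumptions made for illustration and do not reflect PEPE's encoding or the implementation described later. Python cannot jump between routines, so a small loop hands control from one precoded entry to the next; the point of the sketch is that no decoding happens during execution.

```python
def op_halt(state, operand):
    state["running"] = False

def op_load(state, operand):
    state["acc"] = operand

def op_add(state, operand):
    state["acc"] = (state["acc"] + operand) & 0xFFFF

def op_setc(state, operand):
    state["cnt"] = operand               # set the loop counter

def op_djnz(state, operand):
    state["cnt"] -= 1                    # decrement, jump if not zero
    if state["cnt"] != 0:
        state["pc"] = operand

ROUTINES = {0x0: op_halt, 0x1: op_load, 0x2: op_add,
            0x3: op_setc, 0x4: op_djnz}

def precode(program):
    # One-time pass: decode each word into (routine, operand).
    # Assumes every word in the program is a valid instruction.
    return [(ROUTINES[(w >> 12) & 0xF], w & 0x0FFF) for w in program]

def run_precoded(program):
    code = precode(program)              # startup cost, paid once
    state = {"acc": 0, "cnt": 0, "pc": 0, "running": True}
    executed = 0
    while state["running"]:
        routine, operand = code[state["pc"]]   # already decoded
        state["pc"] += 1
        routine(state, operand)
        executed += 1
    return state["acc"], executed

# SETC 3; LOAD 0; loop: ADD 5; DJNZ loop; HALT
PROGRAM = [0x3003, 0x1000, 0x2005, 0x4002, 0x0000]
acc, executed = run_precoded(PROGRAM)
```

In this example 5 words are decoded once while 9 instructions are executed, so the decoding work saved grows with the number of loop iterations. The intermediate table holds one entry per program word, which is the extra memory cost mentioned above.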

2.1.4 Binary Translation

Binary translation takes the most extreme approach. Its goal is to translate every instruction of the emulated architecture to the equivalent instruction(s) of the PEPEonBOARD architecture. This implies that we no longer have a separate memory or register bank for the emulated architecture, as there is no longer a distinction between the two. The final program running on the PEPEonBOARD platform must be converted from the emulated architecture to that of the PEPEonBOARD. This process is comparable to the precoding process in threaded interpretation, but instead of just pointing to the instruction routines we convert the instructions to the new architecture. This may or may not happen on the PEPEonBOARD side. Figure 2.4 represents the binary translation method. This approach offers the best performance of all those mentioned, as no extra work is performed during the execution of the program. The disadvantages of binary translation are that it is not portable, as the translated program can only run on the platform it was built for, and that its startup time is even longer than that of Threaded Interpretation with Precoding.

Figure 2.4: Binary Translation [1]

2.2 Existing Work

Historically, emulation has often been used when users of a newer architecture want to run programs written for older architectures. However, many projects in use today tackle the problem of emulating one architecture on another. The QEMU³ project, for instance, can emulate different hardware platforms (e.g. ARM and PowerPC) as well as several peripherals. To emulate central processing units it uses a type of binary translation called dynamic binary translation [3]. This differs from the static binary translation just discussed in that, with dynamic binary translation, the code is translated as it is discovered. Apple Inc. also used binary translation in Rosetta, software that translates older PowerPC programs to the x86 architecture in use today [4]. Both examples are more advanced and require higher performance than we need in our case. Interpretation techniques are often used to interpret programming languages. Languages such as Forth were originally interpreted using Threaded Interpretation with Precoding [5].

2.3 Discussion

For emulating the PEPE, a few aspects are worth mentioning:

1. SIMAC uses a simple decode-and-dispatch method and its focus is not on speed; we must provide a consistent user experience on the emulated platform.
2. On a microcontroller we typically do not have much memory available, so we must take memory footprint into consideration.
3. Code portability is not a problem, as there are no plans to change from the chosen hardware platform.

³ Quick EMUlator - http://qemu.org

After analyzing the requirements and limitations, we chose the threaded interpretation with precoding method. This choice was heavily influenced by the choice of the microcontroller used on the PEPEonBOARD platform. This microcontroller has enough memory to accommodate the typical memory cost of Threaded Interpretation with Precoding, which yields better performance than the Decode-and-Dispatch and Threaded Interpretation methods. To obtain even better performance, we programmed the entire interpreter in Assembly and reused the flags generated in the status register on the PEPEonBOARD side as the flags of the emulated status register. The disadvantages of this are lower portability and longer development time. As for the Binary Translation method, the main reason for not adopting it is its intrinsic complexity. Moreover, the lack of strict performance requirements and the penalty of increased development time do not favor the use of the Binary Translation approach.

Chapter 3

Emulated PEPE Architecture

We will be emulating an existing SIMAC architecture on the external emulator board. The advantage of using SIMAC is that we can change all modules as we wish. That is much more difficult on physical hardware. To mitigate this problem we chose an architecture containing many different peripherals that are used throughout the simulation examples of [2] and as part of the final project of the Computer Architecture course [6] at the Tagus Park Campus of Instituto Superior Técnico, where PEPE is used. In this course, a SIMAC architecture is provided as part of the project. The students are supposed to create a PEPE program that applies most of the knowledge learned about PEPE during the semester. For the 2011/2012 final project of the Computer Architecture course, the provided SIMAC architecture contained the following modules:

• 1x PEPE
• 1x Memory Bank
• 1x Pixel Screen - LCD
• 1x Seven Segment Display
• 1x Keypad
• 1x Serial Interface
• 2x Clocks

Taking that list into consideration, we decided to add a few other peripherals that could be useful, so our final list of peripherals is the following:

• 1x PEPE
• 1x Memory Bank
• 1x Pixel Screen - LCD
• 2x Seven Segment Displays
• 1x Keypad
• 1x Serial Interface
• 4x LEDs
• 2x Toggle Switches
• 2x Push Buttons
• 2x Clocks

As most of these peripherals are physical and not simulated, we made the decision to add all the peripherals a user could need, as they cannot be easily changed. To reduce this problem we added support for an Add-on board on the dedicated hardware platform, that could be used to expand the emulated peripherals. This choice will be justified later on. Given that the physical peripherals are not easily changed, we chose to use a fixed architecture. This means that changes performed in SIMAC will not affect the emulator implementation. Before going into more detail on the process of running PEPE programs on the emulation platform, we must first study the architecture we are going to emulate.

3.1 PEPE

PEPE is a conventional 16 bit processor. A PEPE instruction always has a fixed size of 16 bits. Those 16 bits include an opcode, which is either 4 or 8 bits long; the remaining 12 or 8 bits hold the arguments of the instruction. PEPE uses a Von Neumann architecture, in which the Memory Bank is shared by data and program memory. The real interest comes when PEPE is connected to its peripherals. PEPE uses memory mapped I/O, where certain ranges of addresses are dedicated to specific peripherals. With this in mind, emulating PEPE alone on an external emulator would have very few applications and would provide almost no advantage over simulating it in SIMAC. The interest for this dissertation lies in emulating both PEPE and all connected peripherals of a given SIMAC architecture on the PEPEonBOARD platform. By running PEPE programs on the PEPEonBOARD platform, we are able to interact with physical counterparts of the peripherals that exist inside the given SIMAC architecture. Before tackling the problem of emulating PEPE, we must first describe the features of PEPE that are relevant for this dissertation. All the details presented in this section were extracted from [2]; they are presented here to serve as a comparison with the implemented solution. Figure 3.1 shows the PEPE module in SIMAC.

Figure 3.1: The PEPE module in SIMAC

Port Name   Number of bits   Type           Description
Reset       1                Input          Resets the processor (active high)
INT0        1                Input          Interrupt 0
INT1        1                Input          Interrupt 1
INT2        1                Input          Interrupt 2
INT3        1                Input          Interrupt 3
Clock       1                Input          External Clock Input
D15..D0     16               Input/Output   Data Bus
A15..A0     16               Output         Address Bus
BA          1                Output         Byte Addressing (active high)
RD          1                Output         Reading Memory (active low)
WR          1                Output         Writing Memory (active on rising edge)
WAIT        1                Input          When active, extends the memory access cycle (active high)
IC          1                Input          Ignore Cache (active high)
BRQ         1                Input          Bus Request (active high)
BGT         1                Input          Bus Grant (active high)

Table 3.1: Description of the inputs and outputs of the PEPE.

As can be seen, the module has several pins that can be connected to peripherals. A description of each pin is presented in table 3.1. Although these pins are relevant for SIMAC's simulation, they matter less in our case: because we have a fixed architecture, our emulation knows exactly which peripherals are connected and must be activated. For that reason, the functions performed by the module's pins, relevant in a generic simulation environment, are replaced by a hardcoded approach in our emulator. Let us take a memory read of a byte as an example. This can be accomplished using the MOVB Rd, [Rs] instruction of PEPE. In the SIMAC simulation, the address contained in Rs is read and sent to the Address Bus, and the Byte Addressing and Reading Memory signals are set. After the new pin values are set, the simulation carries that information to the Memory Bank module and continues until PEPE receives the byte value. In emulation this is processed differently: we read the address contained in Rs and then read the value at that address directly from memory. That does not mean that all of PEPE's pins are irrelevant in our situation. For some connections, such as the interrupt pins, we must be aware of what is connected to them.

Internally, PEPE has a register bank with sixteen registers (R0 to R15). Some of those registers, however, are reserved for special functions and cannot generally be used as general purpose registers. A description of the registers is presented in table 3.2.

Register Number   Register Name   Description
R0 - R10                          General Purpose Registers
R11               RL              Linking Register
R12               SP              Stack Pointer
R13               RE              Status Register
R14               BTE             Interrupt Table Pointer
R15               TEMP            Temporary Register. Modified in some of the instructions

Table 3.2: Description of PEPE's registers

One of these registers is the Status Register. Each bit of this register, also known as a flag, has a special purpose, described in table 3.3. Even though most of the flags are set by the user, some (Z, N, C, V) are updated as a side effect of executing certain instructions and can be used by others, such as conditional jumps. One of the responsibilities of the emulator is therefore to update these flags.

For the purpose of this dissertation we chose not to emulate the DMA or the System Protection Level. Although the behavior of the DMA is described in [2], the DMA modules in SIMAC are not implemented yet. The System Protection Level is a relatively advanced feature that is not used in most cases, although it could be implemented in future versions of the emulator. For these reasons, the DE and NP bits of the Status Register are ignored.

PEPE also has sixteen auxiliary registers (A0 to A15), which control some of its internal features: PEPE's internal configuration, the Cache and the Virtual Memory. Once again, we decided not to include the Cache or the Virtual Memory in the emulation, for a reason similar to the one given for the DMA: these features are not yet fully implemented in SIMAC. The only auxiliary register relevant for our emulation work is A0, the RCN register. This register controls some of PEPE's internal configuration: the triggering level of the interrupt pins, the clock selection (between an internal clock and a clock connected to the Clock pin of PEPE in SIMAC) and a bit that enables or disables the processor's pipeline. Table 3.4 provides a description of all RCN's bits. As our emulator is not intended to be cycle-accurate, we do not support an external clock, so this functionality is not implemented. The Pipeline functionality is also not implemented in the PEPE module of SIMAC, so it has not been included in the emulator. The eight least significant bits of the RCN register (bits 7..0) are relevant for our work, as they control the triggering level of the interrupts.
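As an illustration of how the trigger fields of RCN could be interpreted, the following C sketch extracts the 2 bit field of each interrupt pin using the encoding of table 3.4. The function and type names are hypothetical and are not taken from the emulator, which is written in MSP430 Assembly.

```c
#include <stdint.h>

/* Trigger modes encoded in each 2 bit NSIx field of RCN (table 3.4). */
enum trigger_mode {
    TRIG_RISING_EDGE  = 0, /* 00 */
    TRIG_FALLING_EDGE = 1, /* 01 */
    TRIG_ACTIVE_HIGH  = 2, /* 10 */
    TRIG_ACTIVE_LOW   = 3  /* 11 */
};

/* Hypothetical helper: trigger mode of interrupt pin 0..3.
 * NSI0 is in bits 1..0, NSI1 in bits 3..2, and so on. */
enum trigger_mode rcn_trigger_mode(uint16_t rcn, unsigned pin)
{
    return (enum trigger_mode)((rcn >> (2u * pin)) & 0x3u);
}

/* Hypothetical helper: does the pin fire, given its previous and
 * current logic level (0 or 1)? */
int pin_triggers(enum trigger_mode mode, int prev, int now)
{
    switch (mode) {
    case TRIG_RISING_EDGE:  return !prev && now;
    case TRIG_FALLING_EDGE: return prev && !now;
    case TRIG_ACTIVE_HIGH:  return now != 0;
    default:                return now == 0; /* TRIG_ACTIVE_LOW */
    }
}
```

For example, an RCN value of 0x00E4 would configure INT0 as rising edge, INT1 as falling edge, INT2 as active high and INT3 as active low.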
PEPE has 15 predefined interrupts controlled by the processor. However, up to 256 interrupts can be used if they are called with the SWE k instruction, which can directly call any interrupt from 0 to 255, including the predefined ones. A list of the existing interrupts is given in table 3.5. As shown in table 3.3, some bits of the Status Register enable or disable certain interrupts, or all interrupts altogether. Interrupts are only enabled when the IE bit of the Status Register is active. We can also enable or disable individual interrupts, namely Overflow, DIV0 and each of the external interrupts.

Bit   Acronym   Name                 Description
0     Z         Zero                 Active when previous instruction's result is zero (if applicable)
1     N         Negative             Active when previous instruction's result is negative (if applicable)
2     C         Carry                Active when previous instruction's result generates a carry
3     V         Overflow             Active when previous instruction's result overflows
4     A                              Reserved
5     B                              Reserved
6     TV        Trap on Overflow     Generate an interrupt when the overflow flag is active
7     TD        Trap on DIV0         Generate an interrupt when a division or modulo by zero is detected
8     IE        Interrupts Enable    Enable all the interrupts
9     IE0       Enable Int0          Enable the Interrupt 0
10    IE1       Enable Int1          Enable the Interrupt 1
11    IE2       Enable Int2          Enable the Interrupt 2
12    IE3       Enable Int3          Enable the Interrupt 3
13    DE        DMA Enable           Allows direct access to the memory
14    NP        Protection Level     System protection level when NP=0, User protection level otherwise
15    R                              Reserved

Table 3.3: Description of each bit of the Status Register

Bit      Acronym   Description
1..0     NSI0      Trigger for INT0. Rising Edge = 00; Falling Edge = 01; Active High = 10; Active Low = 11
3..2     NSI1      Trigger for INT1. Same options as for INT0
5..4     NSI2      Trigger for INT2. Same options as for INT0
7..6     NSI3      Trigger for INT3. Same options as for INT0
8        FR        Clock select
9        E         Enable Pipeline
15..10             Reserved

Table 3.4: Description of the RCN register

Number   Name           Cause
0..255   SWE            Execution of the SWE instruction
0        INT0           INT0 Triggered
1        INT1           INT1 Triggered
2        INT2           INT2 Triggered
3        INT3           INT3 Triggered
4        Overflow       The Overflow Flag is active
5        DIV0           Division (or modulo) by zero
6        INV OPCODE     Invalid Opcode
7        MISALIGNED D   Access to an odd address while reading a 16 bit word
8        MISALIGNED I   Odd value
9        D MISS PAG     Related to Virtual Memory
10       I MISS PAG     Related to Virtual Memory
11       D PROT         Related to Protection Levels
12       I PROT         Related to Protection Levels
13       READ ONLY      Related to Virtual Memory
14       SYSTEM         Related to Protection Levels

Table 3.5: Description of PEPE's interrupts

When an interrupt occurs, a user defined routine is executed. To program the addresses of these routines, the user must maintain a table with the routine address for each interrupt. These routine addresses are ordered by the number presented in table 3.5. For the processor to know the address of that table, the BTE register must point to the base address of the table, that is, to the memory address containing the address of the INT0 interrupt routine. The goal for the external emulator is to emulate PEPE's instruction set and its interaction with the peripherals that are virtual in SIMAC.
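The enable bits of table 3.3 and the interrupt table pointed to by BTE can be combined as in the C sketch below. It assumes that an external interrupt is accepted only when both IE and its own IEn bit are set, and that the table entries are consecutive 16 bit words; both assumptions follow from the description above but are not confirmed details of PEPE, and the function names are hypothetical.

```c
#include <stdint.h>

/* Status Register (RE) bits from table 3.3. */
#define RE_IE   (1u << 8)  /* global interrupt enable */
#define RE_IE0  (1u << 9)  /* enable for INT0; IE1..IE3 follow at bits 10..12 */

/* Assumption: external interrupt n (0..3) is accepted only if both
 * IE and IEn are set. */
int external_interrupt_enabled(uint16_t re, unsigned n)
{
    uint16_t mask = (uint16_t)(RE_IE | (RE_IE0 << n));
    return (re & mask) == mask;
}

/* Assumption: the table holds consecutive 16 bit routine addresses,
 * so the entry of interrupt n is located at BTE + 2 * n. */
uint16_t interrupt_entry_address(uint16_t bte, unsigned n)
{
    return (uint16_t)(bte + 2u * n);
}
```

Under these assumptions, with BTE = 0x1000 the address of the DIV0 routine (interrupt 5) would be read from memory address 0x100A.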

3.2 Architecture

As we chose to include peripherals beyond those in the existing SIMAC architectures, we had to create a new architecture in SIMAC so that the peripherals can be used in the external emulator in the same manner as they are used in SIMAC. A diagram of the created architecture is presented in figure 3.2. As presented in table 3.1, PEPE has an Address Bus of 16 bits, which allows it to address 65536 different memory positions. To communicate with peripherals, PEPE uses memory mapped I/O: by decoding the address bus, we can enable peripherals other than the memory bank. In our architecture, a PROM decodes the address bus and enables the appropriate peripheral, depending on the address range. With this approach we cannot use a full $2^{16}$ B = 64 KB memory, as some address ranges are dedicated to peripherals. The PROM has 16 positions of 8 bits and is indexed by the 4 most significant bits of the address bus. This means that the minimum address range we can allocate to a peripheral has a length of 0x1000 bytes (4 KB). Also, as each PROM position holds 8 bits, we can control a maximum of 8 different peripherals. The range of addresses dedicated to each peripheral is given in table 3.6.

Figure 3.2: SIMAC's representation of the emulated PEPE Architecture.

Peripheral                                Address Range
Memory Bank                               0x0000 - 0x7FFF
Push Buttons, Toggle Switches, Clock 1    0x8000 - 0x8FFF
LEDs                                      0x9000 - 0x9FFF
Dual Seven Segment Displays               0xA000 - 0xAFFF
Keypad In / Out                           0xB000 - 0xBFFF
LCD                                       0xC000 - 0xC07F
Serial Interface                          0xD000 - 0xDFFF
Reserved for Future Peripherals           0xE000 - 0xFFFF

Table 3.6: Memory Mapping for the Peripherals

Figure 3.2 shows the SIMAC representation of the emulated architecture, including the peripherals listed at the beginning of this chapter. In this architecture we have two clocks, directly connected to the interrupt pins of PEPE. Currently, the periods of Clock 1 and Clock 2 are fixed at 200 ms and 100 ms, respectively. Additionally, Clock 1 is also connected to bit 4 of the Buttons & Clock input module. To enable the peripherals that cannot be enabled or disabled manually, we use an input or output interface module. Some of these modules are shared by several peripherals, as already seen with Clock 1. This is possible because we can select which bits of the connection each peripheral uses. In the Buttons & Clock input module, the 2 push buttons are connected to bits 0-1, the 2 toggle switches to bits 2-3 and Clock 1 to bit 4. The Seven Segment Output module is a similar case: it outputs 8 bits, of which the first 4 are displayed on the Right Display and the remaining 4 on the Left Display. KeypadIn and KeypadOut are enabled by the same address, but they are individually selected depending on whether that address is being read or written to.
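The PROM-based decoding can be pictured with the following C sketch, in which a 16 entry table is indexed by the 4 most significant address bits and each entry holds one enable bit per peripheral. The assignment of bits to peripherals is hypothetical, since the actual PROM contents are not listed here; only the address ranges come from table 3.6.

```c
#include <stdint.h>

/* Hypothetical enable-bit assignment (one bit per peripheral). */
enum {
    EN_MEMORY  = 1u << 0,
    EN_BUTTONS = 1u << 1, /* push buttons, toggle switches, Clock 1 */
    EN_LEDS    = 1u << 2,
    EN_7SEG    = 1u << 3,
    EN_KEYPAD  = 1u << 4,
    EN_LCD     = 1u << 5,
    EN_SERIAL  = 1u << 6
};

/* One 8 bit entry per 4 KB block, indexed by address bits 15..12.
 * Blocks 0x0..0x7 map to the Memory Bank, 0xE..0xF are reserved. */
const uint8_t prom[16] = {
    EN_MEMORY, EN_MEMORY, EN_MEMORY, EN_MEMORY,
    EN_MEMORY, EN_MEMORY, EN_MEMORY, EN_MEMORY,
    EN_BUTTONS, EN_LEDS, EN_7SEG, EN_KEYPAD,
    EN_LCD, EN_SERIAL, 0, 0
};

/* Returns the enable bits for a 16 bit address. The PROM alone selects
 * whole 4 KB blocks; any finer range (such as the LCD's 0xC000 - 0xC07F)
 * would have to be resolved by the peripheral itself. */
uint8_t decode_address(uint16_t addr)
{
    return prom[addr >> 12];
}
```

In the emulator the same mapping is fixed, so a comparison on the upper address bits is enough to decide which peripheral routine to call.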

Chapter 4

Designing PEPEonBOARD

The PEPEonBOARD consists of a physical hardware platform containing a number of peripherals that are controlled by a microcontroller. The PEPEonBOARD communicates with a computer via a USB connection, which can also provide power. To process the incoming messages from the computer, a resident monitor was created: Jalapeño (discussed in chapter 6). The other key component of the PEPEonBOARD is the emulator, which Jalapeño controls according to the actions requested by the computer. A diagram describing the key components of PEPEonBOARD is pictured in figure 4.1.

Figure 4.1: Key components of PEPEonBOARD

4.1 Target Platform

The choice of the microcontroller for the external emulator was based on several factors. First, it had to be a 16 bit or 32 bit microcontroller. It should also have enough memory to store a representation of PEPE's architecture. We chose the MSP430F5438A, a 16 bit microcontroller with 16 KB of RAM and 256 KB of flash memory. It is a suitable 16 bit platform with an instruction set that shares many of PEPE's instructions, and we had previous experience and existing hardware to work with this platform. Using this microcontroller proved to be a good choice. We chose it before knowing the memory requirements of our emulation approach, and having this much memory available enabled us to choose a faster emulation method than would otherwise have been possible.

4.1.1 Starting the Microcontroller

Before being able to start the emulation process, we must first initialize the peripherals on the microcontroller. Figure 4.2 presents a flowchart containing such initializations.

[Flowchart steps, in order: Start; Disable the Watchdog Timer; Initialize Clock; Initialize Peripherals; Initialize Self Programming Functionality; Initialize Breakpoints; Initialize Timers; Enable Interrupts; Jump to the Jalapeño]

Figure 4.2: Flowchart of the microcontroller's initialization process

The first operation is to disable the Watchdog Timer, which is active by default in the MSP430F5438A microcontroller.¹ To prevent it from resetting the microcontroller, the developer can either restart the watchdog timer periodically or simply deactivate it. As we had no requirement for the watchdog timer in this work, we chose to deactivate it.

The second operation is to set up the microcontroller's clock. For this work we use a clock frequency of 8 MHz. Initially we ran at 1 MHz but, due to the update time of the LCD, we were forced to speed it up: programs that updated the LCD showed a noticeable delay. The higher frequency also improved the overall responsiveness of the emulator.

The next operation is to initialize the peripherals, the microcontroller's serial UARTs and the timers connected to PEPE's interrupt pins. To initialize the peripherals we must set the direction of the microcontroller's pins and also initialize the microcontroller's SPI bus peripheral, so we can use it with the LCD and the LED driver for the Seven Segment Displays. The serial UART peripherals have to be initialized with the appropriate baud rate, and the interrupts of the serial port used by the resident monitor must be enabled.

After initializing the peripherals we initialize the self programming functions. Those functions are the result of a particularity of the MSP430 and are used for programming the flash memory: code that erases or writes a flash bank cannot run from that same bank and must instead execute from RAM. The flash memory is divided into 4 banks of 64 KB and each bank is divided into 512 B segments. As we have no option to write code directly to RAM, we place the desired functions in a specific region of the flash memory (Func Flash), which is then copied to RAM (Func RAM). These memory regions are presented in figure 4.3, which illustrates the microcontroller's memory partition.

Finally, we initialize the breakpoints, a functionality described in chapter 6, and enable the microcontroller's interrupts. After the initialization process is complete, we jump to the resident monitor, Jalapeño, which waits for commands from the computer.

¹ A watchdog timer is a timer that is usually used to recover from malfunctions. In short, the watchdog starts from a given value and, once it reaches a certain limit, resets the microcontroller.

4.2 Storing a PEPE Program in the microcontroller

Before being able to emulate PEPE programs we must first have a representation of the programs in memory. The program is transferred from SIMAC to PEPEonBOARD via USB and is stored in the microcontroller's flash memory, where we assigned 64 KB for this purpose. The program arrives from the computer in COD, a format used by SIMAC. As we assigned 64 KB for storing the COD file, we are limited to COD files smaller than that size. As COD files are generally smaller than the memory bank, this is not a problem. In our current architecture we reserved 32 KB for the Memory Bank, so even if the range of addresses dedicated to the Memory Bank were expanded in the future, we would still be able to store the COD file in flash memory.

COD files are compiled versions of PEPE programs. SIMAC compiles the PEPE programs and uses the COD files in the simulation of PEPE. COD files store a representation of PEPE's memory, including PEPE's instructions. They also store information useful for SIMAC, such as the labels used in the Assembly program and their locations. The instructions in the memory representation no longer refer to these labels; instead they use immediate values that represent the difference between the address of the instruction and the address of the label.

The reasons for using COD files, as opposed to transferring raw program memory contents to the microcontroller, are twofold. First, COD files store two (or more) non-contiguous memory segments more efficiently than a full memory image. For instance, let us assume that our program starts at address 0x0000 and ends at address 0x0539, and that we have 128 bytes of data starting at address 0x2000, with a gap between those two segments. In the COD file the gap between the program and the data is not stored; for each segment we only keep its length and its start address. The second reason is that, at the beginning of the development process, we had not yet developed the Jalapeño resident monitor to handle the communication with the computer, yet we still had to test our emulator. Using COD files, we can run the same file as SIMAC and compare the results. At that stage, we transferred the program by writing a copy of the binary contents of the COD file to the microcontroller with a dedicated programmer. The disadvantage of using a COD file is that we have to decode it every time we start emulating a new program. Finally, we also store the total length of the COD file in flash, so that we can decode the file later on the microcontroller.

Saving the program in flash memory has some advantages over saving it in RAM. First, we only have 16 KB of RAM available, which makes it impossible to keep a whole program in RAM at once. Secondly, a program saved in flash memory is kept when the power is disconnected from the microcontroller. However, this approach comes with a few disadvantages. The number of write cycles available on flash is limited, so the microcontroller's flash memory may wear out over time. The microcontroller's data sheet guarantees a minimum of 10000 write cycles and a typical value of 100000 write cycles [see 7, p. 65]. Also, programming flash memory is slower than writing to RAM, and the corresponding code is harder to develop and debug. Some of the difficulties of writing to flash memory on the MSP430 have been mentioned previously.

4.3 Translating from PEPE's Instruction Set to the MSP430 Instruction Set

After having stored a COD file in the microcontroller's flash memory, we have all the information needed to start emulating the program. But first we must process that information by decoding the COD file and storing the decoded raw memory contents in another section of the memory. While the place where the COD file is stored is static and only changes when the user uploads a new program to the microcontroller, this is not the case with the decoded memory contents. As PEPE uses a Von Neumann model, the decoded memory is loaded with both program and data. It is stored in a flash memory region called PEPE Memory, which has 64 KB, as seen in figure 4.3. Note that, as with the PEPE COD section in flash memory, this section is 32 KB larger than the corresponding memory space reserved in the emulated architecture. Using 64 KB would allow other architectures to dedicate the whole range of addresses solely to the Memory Bank (and not use any peripheral), thus conferring a bit more versatility on our emulation approach.

PEPE's memory is best suited to RAM but, as mentioned before, our microcontroller only has 16 KB of RAM. This means that in the best case scenario we would be limited to emulating PEPE programs of at most 16 KB. Even this value would be unlikely, as we also need RAM for purposes other than the emulator. The way to overcome this limitation is studied in greater detail in a later section.

Figure 4.3: Microcontroller's memory mapping (RAM map regions: Func RAM, PEPE Registers, PEPE Breakpoints, PEPE Page, Free RAM; flash map regions: PEPE COD, PEPE Memory, PEPE, which stores the addresses pointing to the emulating routines, Free Flash, Func Flash)

Although there is no documentation on the COD file format, we had access to the SIMAC source code, which includes a COD file decoder. SIMAC is written in Java, so we had to port the decoder to C, a language supported by the microcontroller.

Decoding the COD file is a relatively simple process that can easily be performed by the microcontroller. The first byte represents the word size; in our case we must check that it is equal to 2, meaning that the word size is 16 bits. After reading this byte we advance the read pointer by the word size, to keep it word aligned. Then we skip the section of the COD file containing the labels, after which we are able to read the contents of the memory. The COD file separates non-contiguous blocks of memory into segments. Each segment is preceded by three pieces of information, each with 16 bits:

• The first address of the segment
• The size of the segment in bytes
• The first address of an instruction

The first and third values may seem identical (and in some cases are), but they are useful for separating data from instructions. This way we are able to tell whether a given memory position holds a PEPE instruction or not, which is relevant for the precoding process of our emulation method. After reading the three values we can save the segment contents to the appropriate location in the flash memory section reserved for PEPE's memory. While decoding the COD file, we also decode the instructions for the Threaded Interpretation with Precoding. As we know whether the word just read is a PEPE instruction, we also know whether it must be linked with the MSP430 routine that emulates it.

Recall that PEPE instructions have a fixed length of 16 bits and that one of the fields of the instruction is the opcode, the field that tells the processor which instruction is going to be executed. An opcode on the PEPE can have 4 or 8 bits. Four bit opcodes are used by instructions that need twelve argument bits rather than eight. Taking advantage of these opcodes, we created a hierarchic group of look-up tables, as exemplified in figure 4.4. At the top there is a look-up table with 16 positions (the first opcode LUT). For every instruction with a 4 bit opcode, this table holds the address of the MSP430 routine that emulates it. For the instructions with 8 bit opcodes, it holds instead the address of another look-up table, which is indexed by the following 4 bits and holds the routine addresses on the MSP430. Considering that there are 9 instructions with 4 bit opcodes and that no instruction starts with the opcodes 0b1110 or 0b1111, only 5 second-level look-up tables of 16 positions are needed to map the rest of the instructions. With multiple look-up tables we are able to "compress" a look-up table of 256 ($2^8$) positions (as would be needed in a flat organization) into only 96 (6 × 16) positions, at the cost of a slight performance penalty for detecting whether an instruction has a 4 or 8 bit opcode.

Figure 4.4: Example representation of the hierarchic organization of the look-up tables (the first opcode LUT points either to the emulated instructions' routines or to a second level of LUTs)
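The hierarchy of figure 4.4 can be sketched in C as follows. This is only an illustration: it assumes that the opcode occupies the most significant bits of the instruction word, it marks 8 bit opcodes with a NULL routine instead of the Boolean test used by the emulator, and the routines and table contents are placeholders rather than real PEPE opcode assignments.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the address of an MSP430 emulation routine. */
typedef int (*routine_t)(uint16_t instr);

/* One position of the first opcode LUT. */
struct lut_entry {
    routine_t routine;             /* set for a 4 bit opcode */
    const routine_t *second_level; /* otherwise a 16 position LUT */
};

/* Placeholder routines (hypothetical). */
int emu_four_bit(uint16_t instr)  { (void)instr; return 1; }
int emu_eight_bit(uint16_t instr) { (void)instr; return 2; }

/* Hypothetical second-level LUT for instructions starting with 0b0000. */
const routine_t second_level_0[16] = { [0x5] = emu_eight_bit };

/* First opcode LUT; positions not listed are left empty here. */
const struct lut_entry first_level[16] = {
    [0x0] = { NULL, second_level_0 },
    [0x2] = { emu_four_bit, NULL },
};

/* Resolve the routine for an instruction, or NULL if none is mapped. */
routine_t lookup_routine(uint16_t instr)
{
    const struct lut_entry *e = &first_level[instr >> 12];
    if (e->routine != NULL)
        return e->routine;                         /* 4 bit opcode */
    if (e->second_level != NULL)
        return e->second_level[(instr >> 8) & 0xF]; /* 8 bit opcode */
    return NULL;
}
```

During precoding, a lookup of this kind would run once per instruction word, and the resulting routine address would be stored for later use by the interpreter.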

To detect whether an instruction has a 4 or 8 bit opcode we chose a simple approach. As the set of 4 bit opcodes is fixed, we took the first 4 bits of the opcode and derived the Boolean expression that indicates whether they form a 4 bit opcode. The derivation of this Boolean function is shown in figure 4.5. After detecting whether an instruction has a 4 or 8 bit opcode and reading the address of the routine that emulates that instruction on the microcontroller, we are able to write that address to the memory of the microcontroller.

Instructions that have a 4 bit opcode (ABCD): 0010, 0011, 0100, 0111, 1000, 1001, 1010, 1100, 1101

$F(A,B,C,D) = A\bar{C} + \bar{A}CD + \bar{B}C\bar{D} + B\bar{C}\bar{D}$

Figure 4.5: Simplification of the expression that indicates a 4 bit opcode instruction
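The simplified expression of figure 4.5 translates directly into a few logical operations. The C function below is a sketch of that test (the emulator itself is written in MSP430 Assembly, and the function name is hypothetical); it takes the first four opcode bits ABCD, with A as the most significant bit.

```c
#include <stdint.h>

/* Returns 1 when the four bits ABCD form a complete 4 bit opcode,
 * 0 when they are the first half of an 8 bit opcode (or unused). */
int is_4bit_opcode(uint8_t op)
{
    int a = (op >> 3) & 1;
    int b = (op >> 2) & 1;
    int c = (op >> 1) & 1;
    int d = op & 1;

    /* F = A.~C + ~A.C.D + ~B.C.~D + B.~C.~D */
    return (a && !c) || (!a && c && d) || (!b && c && !d) || (b && !c && !d);
}
```

The function is true exactly for the nine opcodes listed in figure 4.5 and false for the remaining seven 4 bit values.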

As required by the interpretation method of emulation, we must keep a representation of the PEPE architecture, that is, of PEPE's memory and its Register Bank. Moreover, with the Threaded Interpretation with Precoding, we must also keep in memory, separately from PEPE's memory, the address of the routine that emulates the instruction at each memory address. This new section is also stored in flash memory, in a region called PEPE, pictured in figure 4.3. However, as opposed to PEPE's memory, it does not change during the execution of the PEPE program, as the instructions remain the same. This section must have the same size as the previously mentioned PEPE Memory section, because for every word of PEPE's memory we must be able to store the address of the routine that emulates the instruction at that address. Routine addresses on the microcontroller can have 16 or 20 bits, depending on whether the routines are stored in the first 64 KB of the flash memory or not. As we chose to store all these routines in the first 64 KB of the flash memory, we were able to use 16 bit addresses. Adding up all the memory sections, we are now using the total 256 KB of flash memory available on the microcontroller. After decoding the COD file and saving both the contents of PEPE's memory and the addresses of the emulation routines, we have everything required to run the program.

4.4 Virtual Memory

To tackle the problem introduced previously, of having a PEPE memory larger than the microcontroller's RAM while still needing to use the full PEPE memory, we now introduce the concept of virtual memory. Note that, despite having the same name as the feature present in PEPE, the two are not related, as we are not emulating PEPE's virtual memory. Although we could write to the flash memory every time PEPE's memory changes, this approach would very quickly exhaust the write cycles of our flash memory. Writing to flash can only change bits from 1 to 0, so writing to the same position more than once after an erase is not guaranteed to succeed. Whenever such a write failed, we would have to copy the entire segment, modify the desired address, erase the segment (setting all its bits to 1) and rewrite the entire segment.

Instead, we introduced the concept of pages. In our implementation a page is an 8 KB segment of PEPE's memory that is stored in RAM. The size of 8 KB was chosen because a page must be smaller than the 16 KB of existing RAM and still leave some RAM for the other parts of the program. The page size must also be a power of 2, to allow an easier identification of each page. Using 8 KB pages, PEPE's memory is divided into 8 different pages. Before a PEPE program starts running, the first page (the first 8 KB of PEPE's memory) is copied to the space allocated in RAM for the pages, PEPE Page, shown in figure 4.3.

In the current implementation, every time the PEPE program reads from memory, we just check whether the address is inside the current page. If it is, we return the value present in RAM; otherwise we read the address from the flash memory. Doing so ensures that the program always reads the most recent value of the address. Note that we do not change pages when an address outside the page is read. Both the read and write processes are represented in figure 4.6.

Figure 4.6: Flowchart describing the reading and writing process of PEPE's memory: (a) Read Memory, (b) Write Memory

If the user tries to write to the memory we perform similar checks. If the address is inside the page, all we do is update the value in RAM. Otherwise, if the user tries to write outside the page, we swap out the current page in RAM to the flash memory, change the page number and swap in the new page from flash to RAM. After that operation is done, we proceed as we normally would for a write to an address inside the page. However, the issue of flash wear is still present, though mitigated in this case. To avoid erasing the entire 8 KB page in flash before copying to it, we perform a slightly more complex check. The microcontroller we are using has an internal peripheral that can calculate the CRC16 of one or more bytes. Instead of erasing the whole 8 KB of flash memory corresponding to the page, we compute the CRC16 of each flash segment² and compare it with the CRC16 of the corresponding part of our page in RAM. If the CRCs differ, we erase the segment and copy it from RAM. This way we write only the flash segments that were modified, leaving the remaining ones untouched.

² The flash memory is organized in segments of 512 bytes.

With this approach we are trading performance for a better management of the flash’s write cycles. The page number is kept in a reserved RAM position and represents the first address of the page. For instance, if we have the second page in RAM, then the page number is 0x2000.
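The paging scheme described in this section can be sketched in C as follows. This is a minimal host-side illustration and not the emulator's actual code: the flash is modelled as a plain array, a software CRC-16/CCITT routine stands in for the microcontroller's CRC peripheral, and every name used (flash_mem, page_ram, page_base, segments_written, pepe_read8, pepe_write8) is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE    0x2000u   /* 8 KB page, as described in the text */
#define SEGMENT_SIZE 512u      /* flash segment size */
#define MEM_SIZE     0x10000u  /* 8 pages of 8 KB */

uint8_t  flash_mem[MEM_SIZE];  /* stand-in for the flash copy of PEPE's memory */
uint8_t  page_ram[PAGE_SIZE];  /* the page currently held in RAM */
uint16_t page_base = 0;        /* "page number": first address of the page */
unsigned segments_written = 0; /* counts emulated erase + program cycles */

/* Software CRC-16/CCITT, assumed here in place of the hardware CRC module. */
uint16_t crc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

int in_current_page(uint16_t addr) {
    return (addr & ~(PAGE_SIZE - 1u)) == page_base;
}

/* Write back only the 512-byte segments whose CRC differs from flash. */
void swap_out(void) {
    for (unsigned off = 0; off < PAGE_SIZE; off += SEGMENT_SIZE) {
        uint8_t *dst = &flash_mem[page_base + off];
        if (crc16(dst, SEGMENT_SIZE) != crc16(&page_ram[off], SEGMENT_SIZE)) {
            memcpy(dst, &page_ram[off], SEGMENT_SIZE); /* erase + copy on real flash */
            segments_written++;
        }
    }
}

void swap_in(uint16_t addr) {
    page_base = (uint16_t)(addr & ~(PAGE_SIZE - 1u));
    memcpy(page_ram, &flash_mem[page_base], PAGE_SIZE);
}

/* Before running a program: load the first page into RAM. */
void pepe_mem_init(void) { swap_in(0x0000u); }

/* Reads never change the page: outside the page they go straight to flash. */
uint8_t pepe_read8(uint16_t addr) {
    if (in_current_page(addr))
        return page_ram[addr - page_base];
    return flash_mem[addr];
}

/* Writes outside the page swap the page first, then update RAM. */
void pepe_write8(uint16_t addr, uint8_t value) {
    if (!in_current_page(addr)) {
        swap_out();
        swap_in(addr);
    }
    page_ram[addr - page_base] = value;
}
```

Because the page size is a power of 2, the page of an address is obtained with a single mask. Note also that comparing CRCs treats two segments with equal CRC16 values as identical, which is the trade-off implied by the approach described above.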

4.5 Running a program

Before running a program we must start by parsing the COD file and decoding the instructions into the addresses of the microcontroller’s routines that emulate PEPE’s instructions. This must be done every time the program is started, for the simple reason that the memory might have changed from its original state. A flag could have been added to signal whether the memory changed; however, in the current version of SIMAC, every time the user restarts the program it also sends a new copy of the COD file. This means we would have to decode the COD file again anyway, as the flag would have been reset.

After the decoding process and before starting the program, we copy the first page of the virtual memory to RAM and clear the registers and the interrupts variable. The registers are stored in a reserved section of 34 bytes of RAM, given that we have 16 registers and 1 auxiliary register, each with 2 bytes. The registers must be cleared before executing the program, as that is their default state at the beginning of a program execution. Clearing the interrupts variable is necessary because at the beginning no interrupt has been triggered. The interrupts variable is the mechanism we used to implement interrupts in our emulation, and we will describe the process in further detail in a following section.

As mentioned previously, our interpreter was developed in Assembly for the MSP430. This was useful because we were able to use some of the flags generated in the microcontroller’s status register in our emulation. Those flags were used to update the flags of PEPE’s status register. By using Assembly we also had greater control over the stack and the jumps that take place in the emulator. Another great advantage of using Assembly is its greater performance. This is not without a cost, though: development was slower, given that Assembly code is more complex than equivalent code in C.
However, all other parts of the emulator that were not related to the interpreter were programmed in C. TI’s MSP430 compiler, available in the Code Composer Studio IDE, has the option to reserve the MSP430’s R4 and/or R5 registers, so that these registers are never used by the C code. This impacts the performance of the C code, but we had no strong performance requirements for the parts of the program implemented in C. Therefore, we use the R4 register to store the program counter of the running PEPE program. This is extremely useful because having a reserved register for the PEPE program counter means we can jump to the next instruction more easily, without having to read its value from memory before performing the jump. As for the R5 register, we use it as a general purpose register for the interpreter. This way we do not need to save that register on the microcontroller’s stack before using it in a routine, as we would have to do otherwise. Of course, not all instructions could be emulated with only a single general purpose register, so in a few cases other registers were pushed to the stack. Both the R4 and R5 registers are also cleared prior to starting the program.

After performing all these actions the program jumps to the first routine that emulates the corresponding PEPE instruction.

4.6 Emulated Instructions Routines

Upon jumping to a routine and knowing the current program counter, we are able to read the current instruction from memory. After reading the instruction, the program counter is incremented by two and we can now proceed to parse the instruction’s fields.

[Flowchart: Start; Read Instruction From Memory; Check for Breakpoints; Parse Incoming Messages; Increment Program Counter; Parse Instruction’s Fields; Emulate Instruction; Update Flags (optional); Check for Interrupts; Jump to Next Instruction]

Figure 4.7: Emulating a PEPE instruction

However, before proceeding to increment the program counter and parse the instruction’s fields we need to perform one action. To ensure that the resident monitor processes incoming messages from the computer, or pauses the emulator because of a breakpoint, for instance, we must call a function that performs those tasks. That action is further described in chapter 6. If the program was not paused by the resident monitor, then it proceeds normally to emulate the instruction. The specific implementation of the instruction’s emulation varies from instruction to instruction. In general, PEPE instructions that are similar to existing instructions of the microcontroller require less code and are generally quicker. However, the PEPE instruction set includes some instructions that are not supported by the microcontroller’s instruction set and are computationally more intensive. Instructions such as MUL Rd, Rs, DIV Rd, Rs and MOD Rd, Rs require more code.

In some special cases, where the instructions do exist in both architectures, the flags generated by the microcontroller are different from the ones generated by PEPE. This is the case, for example, of the carry flag in arithmetic instructions that involve subtractions. This is due to the fact that the microcontroller uses the carry flag according to the carry bit convention, as opposed to PEPE, which adopts the borrow bit convention.

The carry bit convention takes into consideration that $A - B = A + \overline{B} + 1$, where $\overline{B}$ is the bitwise complement of B, and calculates the carry flag according to that addition. The borrow bit convention calculates the carry flag as the result of the borrow operation in a subtraction. For the same A and B the two conventions yield opposite carry flags, so in instructions performing subtractions we have to convert the carry flag from the microcontroller to the one used by PEPE.
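The difference between the two conventions can be checked with a short C sketch. The function names are hypothetical and the code only models 16 bit subtraction on a host computer; it is not the emulator's Assembly routine.

```c
#include <stdint.h>

/* Carry bit convention: carry out of the 16 bit addition a + ~b + 1. */
int sub_carry_convention(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + (uint16_t)~b + 1u;
    return (int)((sum >> 16) & 1u);
}

/* Borrow bit convention: the flag is set when a borrow is needed (a < b). */
int sub_borrow_convention(uint16_t a, uint16_t b) {
    return a < b;
}

/* Conversion applied after a subtraction: the two flags are complementary. */
int carry_to_borrow(int carry) {
    return !carry;
}
```

For instance, with A = 5 and B = 3 the addition 5 + 0xFFFC + 1 overflows 16 bits, so the carry convention gives 1 while no borrow occurs; with A = 3 and B = 5 the situation is reversed.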

After carrying out the emulation of each instruction and updating PEPE’s status register flags, we check if any interrupts occurred during the execution of the instruction. Interrupts are always attended at the end of an instruction, as they cannot stop its execution in the middle. The process of detecting if an interrupt occurred and how we deal with interrupts is further detailed in later sections.

Finally, we jump to the next instruction.

A diagram illustrating the sequence of emulating a PEPE instruction is depicted in figure 4.7.

As an example, let us describe the implementation of the ADD Rd, Rs instruction.

The first operation is to read the current instruction (which is saved in a microcontroller’s register), and check if the resident monitor received any messages. Only after that we increment PEPE’s program counter.

After incrementing the program counter we extract the necessary fields from the read instruction. In this case we extract the Rd register from the second least significant nibble, and the Rs register from the first least significant nibble of the read instruction.

Having extracted the necessary fields we can then emulate the instruction. To emulate this instruction we used an existing instruction of the MSP430 instruction set - ADD src,dst - as shown below.

    ; Add Rs to Rd
    ADD pepe_registers(R5), pepe_registers(R6)

In this case we use the indexed addressing mode of the MSP430 to access PEPE’s registers. The pepe_registers symbol corresponds to the base address of PEPE’s registers, and R5 and R6 hold the relative addresses of Rs and Rd, respectively.

Finally we proceed to update PEPE’s flags from the MSP430’s generated flags, check for new interrupts and jump to the next PEPE instruction.
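The same steps can be illustrated in C. The sketch below is only an assumption-laden model of the ADD Rd, Rs routine: the emulator itself uses a single MSP430 ADD and reuses the hardware flags, whereas here the flags are computed explicitly. The flags_t type and the emulate_add name are hypothetical, and the upper byte of the instruction word (the opcode) is ignored.

```c
#include <stdint.h>

/* Hypothetical container for the four flags an addition produces. */
typedef struct { int z, n, c, v; } flags_t;

uint16_t pepe_registers[16]; /* PEPE's 16 general registers */

flags_t emulate_add(uint16_t instruction) {
    /* Rd is in the second least significant nibble, Rs in the first. */
    unsigned rd = (instruction >> 4) & 0xFu;
    unsigned rs = instruction & 0xFu;

    uint16_t a = pepe_registers[rd];
    uint16_t b = pepe_registers[rs];
    uint32_t wide = (uint32_t)a + b;
    uint16_t r = (uint16_t)wide;
    pepe_registers[rd] = r;

    flags_t f;
    f.z = (r == 0);                          /* zero */
    f.n = (r >> 15) & 1;                     /* negative: sign bit of result */
    f.c = (int)((wide >> 16) & 1u);          /* carry out of bit 15 */
    f.v = ((~(a ^ b) & (a ^ r)) >> 15) & 1;  /* signed overflow */
    return f;
}
```

As a usage example, adding 1 to 0x7FFF yields 0x8000 with the negative and overflow flags set, which is exactly the situation that later triggers the Overflow exception.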

29 4.7 Reading and Writing from / to Memory

In SIMAC, PEPE communicates with peripherals using predetermined ranges of addresses. The address is decoded and a signal is activated to use a certain peripheral. For the user this process is hidden, which means the user can use the same instruction to read a word from memory and to read the state of the push buttons, for instance. In this section we describe how we translate this behavior into our emulator. Instead of decoding the memory addresses using simulated hardware, we decode the address in Assembly software. Rather than replicating this decoding process in every instruction that can read or write to the memory, we centralized it and created two routines - readMemory and writeMemory. These routines are similar in the sense that they accept an address and an indication of whether one wants to read/write a byte or a 16 bit word. The writeMemory routine also accepts the value to be written into the memory. They start by decoding the address. Then, depending on the address received as input, they jump to a section of the routine that controls the corresponding peripheral. It is worth noting that some of the address ranges support either reading or writing, while others support both operations. The memory bank, for instance, supports both read and write operations, whereas the push buttons only support reading and the LEDs only support writing. This was taken into consideration in our decoding process. If the user tries to read or write to a peripheral that does not support that functionality, the routine performs no action. The process of actually reading or writing the memory bank has been described previously in the Virtual Memory section.
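A centralized decoder of this kind can be sketched in C as shown below. The address map is purely hypothetical (the real ranges are the ones defined by the SIMAC architecture), the peripherals are modelled as plain variables, and big-endian byte order is assumed for 16 bit words. Only three devices are shown to keep the example short.

```c
#include <stdint.h>

/* Hypothetical address map, for illustration only. */
#define MEM_BANK_END 0x7FFFu /* memory bank: read and write */
#define LEDS_ADDR    0xA000u /* LEDs: write only */
#define BUTTONS_ADDR 0xC000u /* push buttons: read only */

uint8_t memory_bank[MEM_BANK_END + 1u];
uint8_t led_pins;    /* stand-in for the LED output pins */
uint8_t button_pins; /* stand-in for the push button input pins */

uint16_t readMemory(uint16_t addr, int is_word) {
    if (addr <= MEM_BANK_END) {
        if (!is_word)
            return memory_bank[addr];
        /* Word access: high byte first (assumed byte order). */
        return (uint16_t)((memory_bank[addr] << 8) |
                          memory_bank[(addr + 1u) & MEM_BANK_END]);
    }
    if (addr == BUTTONS_ADDR)
        return button_pins;
    return 0; /* write-only or unmapped address: no action */
}

void writeMemory(uint16_t addr, uint16_t value, int is_word) {
    if (addr <= MEM_BANK_END) {
        if (is_word) {
            memory_bank[addr] = (uint8_t)(value >> 8);
            memory_bank[(addr + 1u) & MEM_BANK_END] = (uint8_t)(value & 0xFFu);
        } else {
            memory_bank[addr] = (uint8_t)(value & 0xFFu);
        }
    } else if (addr == LEDS_ADDR) {
        led_pins = (uint8_t)(value & 0x0Fu); /* four LEDs on the board */
    }
    /* read-only or unmapped address: no action */
}
```

Keeping the decoding in one place means that an instruction routine only has to call one of the two functions, regardless of which device sits behind the address.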

4.8 Interfacing with the Peripherals

Now that we have described the mechanisms used to communicate with the peripherals, we concentrate on how we communicate with each of the peripherals existing on the PEPEonBOARD platform.

4.8.1 LEDs

The four LEDs in the main board are directly connected to the microcontroller. Controlling the LEDs is just a matter of transferring the correct bits from the value the user wrote to memory to the output pins to which the LEDs are connected.

4.8.2 Push Buttons and Toggle Switches

The process of reading the push buttons and toggle switches is identical for both and analogous to the process of controlling the LEDs. In this case both the push buttons and the toggle switches are also directly connected to the microcontroller. This time, however, we must read the values from the input pins to which both the push buttons and the toggle switches are connected and transfer their values directly to a word the user will receive. It is a known fact that buttons and switches, in general, suffer from a phenomenon called “bouncing”. This phenomenon happens when the user presses a button. An ideal button should change position instantaneously. However, physical buttons must obey physical constraints. This means that sometimes they do not create a perfect contact instantaneously, originating the “bouncing” effect, characterized by a waveform bouncing between the high state and the low state before finally settling in the desired state. Even though there are many known debouncing circuits and debouncing software routines, we chose not to include any. This was done on purpose so that the students can experience these issues themselves.

4.8.3 Seven Segment Displays

The seven segment displays are controlled by an LED driver. This provides an interface with a lower number of pins, saving many of the microcontroller’s pins that would otherwise be needed to control a double seven segment display. This issue is further discussed in section 5. This LED driver can control up to 16 different LEDs using an SPI interface, while requiring an extra pin to latch the written values and control the LEDs. To control the LEDs we send the state of each LED in consecutive clock cycles. This means we have to send two separate bytes to control the 16 LEDs. After sending the LED states we must toggle the Latch signal on and off and the LEDs will change. We are using a dedicated SPI peripheral, included in the MSP430, that enables greater transfer speeds (we are using 1 MHz) compared to programming it manually, while leaving the CPU free for other tasks. In SIMAC we have two types of seven segment displays - Hex Display and 7 Segment Display. Although they have the same graphical interface, they light different segments when written to. When we write to a 7 Segment Display, it uses each bit of the input byte to light the appropriate segments, while the Hex Display shows a representation of the input byte (e.g. if we write 0xA it displays the “A” character). In our base architecture we chose to use two Hex Displays to show a byte. This means that we have to transcode the input byte to light the appropriate segments in the seven segment displays before sending it to the LED driver.
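The transcoding step can be sketched in C with a lookup table. The segment assignment used here (bit 0 for segment a up to bit 6 for segment g) and the placement of the most significant digit in the high byte are assumptions; the actual mapping depends on how the LED driver outputs are wired to the displays. The function name is hypothetical.

```c
#include <stdint.h>

/* Segment patterns for 0-F, assuming bit 0 = segment a ... bit 6 = segment g. */
const uint8_t hex_segments[16] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, /* 0-7 */
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71  /* 8-9, A, b, C, d, E, F */
};

/* Transcode one byte into the 16 LED states sent to the driver as two bytes:
   the high byte drives the digit for the high nibble, the low byte the other. */
uint16_t hex_byte_to_segments(uint8_t value) {
    uint8_t high_digit = hex_segments[value >> 4];
    uint8_t low_digit  = hex_segments[value & 0x0Fu];
    return (uint16_t)((high_digit << 8) | low_digit);
}
```

With this table, writing 0x0A produces the pattern for “0” on one display and for “A” on the other; the two resulting bytes are what would be shifted out over SPI before toggling the Latch signal.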

4.8.4 Keypad

In SIMAC the users must read the keypad using a scanning method. With this method the user must first write the code of the row that he/she wants to read and then read the pressed buttons as a column code. This is actually similar to what happens in many existing keypads and it is the way we implemented our physical keypad. When the user outputs the row to be read, all we do is transfer the correct bits to the output pins of the microcontroller connected to the keypad. The same happens when the user reads the keypad’s columns, but in this case we are reading from the input pins of the microcontroller connected to the keypad. Further information about the keypad is given in section 5. The same “bouncing” effect happens with the keypad buttons. The same criterion applied to the push buttons and toggle switches was applied to the keypad as well.

4.8.5 LCD

The LCD is slightly more complex than the previous peripherals. The pixel screen peripheral on the SIMAC is organized as illustrated in figure 4.9 (a), while our physical LCD uses a different, more complex organization - figure 4.9 (b). To make matters worse, we were not able to find a suitable physical LCD with the same dimensions as the ones we chose for the SIMAC - 32 x 32 pixels. To solve this issue we used an LCD with a resolution of 128 x 64 pixels and upscaled every pixel of the pixel screen by two in both dimensions. This left a remaining horizontal space of 64 pixels in the physical LCD, so we centered the pixel screen image on the LCD with two vertical bars measuring 32 pixels on each side, as illustrated in figure 4.8.

(a) Simulated Version (b) Physical Version

Figure 4.8: Comparison Between the Simulated and Physical LCDs

Having a resolution of 32 x 32 pixels means the pixel screen module has 128 memory positions. The pixel screen module supports both read and write operations. The problem is that our physical LCD does not support the read operation when interfaced through the SPI bus. To solve the problem of reading from the LCD we maintain a table of 128 bytes on the microcontroller, one for each address of the pixel screen, and when the user reads from the LCD we return a value from that table. When writing to the LCD we perform a similar process: we write to the table and then write the changes to the LCD. The physical LCD uses the SPI interface for communication, with three extra pins (Reset, A0 and CS). The Reset pin resets the LCD and puts it in a state ready for initialization; A0 distinguishes whether the byte sent via the SPI bus is a data byte or a command byte; and CS selects the LCD on the SPI bus, because the SPI bus can be shared with other peripherals (in this case we are sharing it with the LED driver for the seven segment display). The information about the initialization steps and how to communicate with the LCD was taken from the LCD controller’s documentation [see 8]. To write the changes to the LCD we must first convert the table we used for the pixel screen to a memory representation our physical LCD can use. The differences between both architectures are presented in figure 4.9.

[Diagram. (a) On the SIMAC: byte addresses 0 to 127, four per line. (b) On the Physical LCD: Row 0 to Row 7.]

Figure 4.9: Organization of the memories in both LCDs

Let us suppose we wrote a new value to address 0x6 of the pixel screen. As we can see in figure 4.9, the byte orientation of one LCD is perpendicular to that of the other. This means that by changing address 0x6 on the pixel screen we must change at least 8 addresses of the LCD. However, as we are upscaling the 32 x 32 pixel screen to a 64 x 64 pixel resolution, we will be changing 16 addresses of the LCD. Continuing with the example, we select the first address - 0x30 (48) - on Row 0 of the physical LCD. We selected the address 0x30 because we must leave 32 pixels for the vertical bar, skip 2*6 pixels for the corresponding byte we are changing and add 4. The reason we add 4 to the address is that for this LCD the first visible pixel in the row starts at address 4. After having selected the address of the first byte we are changing, we can now reconstruct the first column of the 16, using the information stored in the addresses 0x2, 0x6, 0xA and 0xE. This reconstructed byte is sent twice, as the second position of the physical LCD is equal to the first because of the upscaling. We repeat this process until all 16 addresses are updated.
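The reconstruction of the 16 column bytes can be sketched in C as follows. This sketch covers only the conversion between the two byte orientations and deliberately leaves out the computation of the starting address on the physical LCD. It assumes that the most significant bit of a pixel screen byte is the leftmost pixel and that the least significant bit of an LCD byte is the top pixel of its column; both are assumptions, as is the function name.

```c
#include <stdint.h>

/* Build the 16 LCD column bytes affected by a write to one pixel screen
   address. screen is the 128 byte shadow table kept in RAM. */
void build_lcd_columns(const uint8_t screen[128], uint8_t addr, uint8_t out[16]) {
    addr &= 0x7Fu;
    unsigned byte_col  = addr % 4u;          /* which of the 4 bytes in a line */
    unsigned first_row = (addr / 4u) & ~3u;  /* first of the 4 lines sharing an LCD row */

    for (unsigned px = 0; px < 8; px++) {
        uint8_t column = 0;
        for (unsigned r = 0; r < 4; r++) {
            uint8_t src = screen[(first_row + r) * 4u + byte_col];
            if (src & (0x80u >> px))
                column |= (uint8_t)(0x3u << (2u * r)); /* pixel doubled vertically */
        }
        out[2u * px]      = column; /* pixel doubled horizontally: */
        out[2u * px + 1u] = column; /* the same byte is sent twice */
    }
}
```

For address 0x6 the source bytes are the ones at 0x2, 0x6, 0xA and 0xE, matching the example above: four lines of the pixel screen, each doubled, fill the 8 pixel height of one LCD row.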

4.8.6 Serial Interface

The final peripheral we are emulating is the Serial Interface. In the architecture presented in figure 3.2 we have two connected MUARTs and a TX terminal. The purpose of the MUART connected to the TX terminal is to translate the information from/to the TX terminal to the MUART connected to the PEPE. In the external emulator this second MUART represents the peripheral connected to the board, be it a connection from the computer or a new peripheral altogether. SIMAC’s MUARTs have two half-duplex channels, which enables full-duplex communication. In our current architecture, the two MUARTs are indeed connected in full-duplex mode, using both channels for simultaneous communications. This is similar to what happens on a normal RS-232 UART, where we have two different channels, each with a different direction. SIMAC’s MUARTs have four different 8 bit registers:

• REP - Status Register
• RCU - Control Register
• RDU1 - Data Register for channel 1
• RDU2 - Data Register for channel 2

All these registers can be read or written to. This was the first obstacle we found when emulating the MUART with one of the UARTs integrated into the microcontroller. In the microcontroller’s UART we can only write to the transmit channel and read from the receive channel. To solve this problem we considered two solutions. The first solution was to simply ignore all values written to RDU1 and return the value 0x0000 if the RDU2 register was read. This is valid because channel 1 is only used to receive data and channel 2 to transmit. Although possible, this solution would break all previous programs that used channel 1 as the transmit channel and channel 2 as the receive channel. The chosen solution was to join both channels, meaning that the user can read or write both RDU1 and RDU2, which results in reading from or writing to the microcontroller’s UART. Even though this solution does not fully emulate the capabilities of SIMAC’s MUARTs, it does not break any previous programs, while still working appropriately. Also, in our fixed architecture, a single channel can only read or write, so no functionality is lost. After solving the first obstacle, we used the same approach with the Status Register - REP. This register has 4 status bits for each channel:

• Bit 0, 4 - ERX - Received Status: 1 if it has received a byte, 0 otherwise;
• Bit 1, 5 - ETX - Transmit Status: 1 if it is able to transmit a byte, 0 otherwise;
• Bit 2, 6 - IRX - Receiving Error: 1 if the last byte was not correctly received, 0 otherwise;
• Bit 3, 7 - SRX - Overlap: 1 if the last received byte overlapped a previous unread byte.

As we can see from the list above, all functionalities are the same for both channels. In the same way we did for the Data Registers, we will also be joining both channels, so the last 4 bits will be a copy of the first 4 bits. Replicating the same information for both channels guarantees compatibility with other programs with swapped channels. Fortunately the microcontroller’s UART module provides all the necessary bits for the Status Register, so when a user reads from the status register we copy the corresponding bits from the microcontroller’s registers to the return value for the user.
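Building the value returned for REP then amounts to assembling the four status bits and mirroring them into the upper nibble. The sketch below takes the four conditions as plain arguments; how they are obtained from the microcontroller's UART registers is not shown, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assemble the REP value: bits 0-3 hold ERX, ETX, IRX and SRX for one
   channel, and bits 4-7 are a copy, since both channels are joined. */
uint8_t build_rep(int erx, int etx, int irx, int srx) {
    uint8_t channel = (uint8_t)((erx ? 0x1u : 0u) |
                                (etx ? 0x2u : 0u) |
                                (irx ? 0x4u : 0u) |
                                (srx ? 0x8u : 0u));
    return (uint8_t)(channel | (channel << 4));
}
```

A program that polls bit 0 and a program that polls bit 4 for a received byte therefore both observe the same state.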

RCU Value    Baud Rate
0000         9600
0001         300
0010         600
0011         1200
0100         2400
0101         4800
0110         9600
0111         11440
1000         19200
1001         28800
1010         38400
1011         57600
1100         115200
1101         230400
1110         230400
1111         230400

Table 4.1: RCU Value to Baud Rate Conversion Table

Finally we must also be able to emulate the control register. This register controls the number of clock cycles the MUART waits between transmitting or receiving each byte. This behavior could not be fully replicated in the real UART peripheral, so we chose to use it to control the UART’s baud rate. The RCU register is divided in 2 groups of 4 bits, controlling the speed for each channel. As we did previously, we are joining both channels; however, in this case we are only using the value for the first channel - the first 4 bits. Table 4.1 has the correspondence between each nibble value and the baud rate. This list tries to include the most common baud rates. We chose to assign the 9600 baud rate to the 0000 value because 0000 is used as the default in many PEPE test programs, and 9600 is likewise the default baud rate of many peripherals. Also, the 230400 baud rate is assigned to three different values because the microcontroller was not able to use higher baud rates.
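The conversion can be expressed as a simple table lookup. In the C sketch below the values are copied from table 4.1, and the first channel is assumed to occupy the four least significant bits of RCU, consistent with the bit numbering used for REP. The names are hypothetical.

```c
#include <stdint.h>

/* Baud rates indexed by the RCU nibble, copied from table 4.1. */
const uint32_t rcu_baud_table[16] = {
    9600, 300, 600, 1200, 2400, 4800, 9600, 11440,
    19200, 28800, 38400, 57600, 115200, 230400, 230400, 230400
};

/* Only the first channel's nibble (assumed to be bits 0-3) is used. */
uint32_t rcu_to_baud(uint8_t rcu) {
    return rcu_baud_table[rcu & 0x0Fu];
}
```

Since the nibble of the second channel is masked out, a value such as 0xFC selects the same rate as 0x0C.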

4.9 Interrupts

First, we need to make a distinction between the interrupts we are emulating. Some of the interrupts, such as the INT[0-3] interrupts, are triggered externally. That means that they react to an external signal that is connected to the PEPE and are triggered depending on the value of the RCN register, as described in table 3.4. The other interrupts are in fact exceptions, dependent only on the internal features of the PEPE. Even though all interrupts generate the same result, as all will end up calling the appropriate routine through the interrupt vector table, the way we identify whether an interrupt has occurred differs because of the different nature of both kinds of interrupts.

In an attempt to standardize the attending of interrupts we created a variable in which each bit is dedicated to a different interrupt. Every bit can be set by any instruction if an interrupt situation arises. This variable is then read in a routine that is called at the end of each instruction - checkInterrupts - which throws the appropriate interrupt if necessary. This behavior of checking the interrupts at the end of each instruction is mentioned in [2, pp.462,611]. Before attending any interrupts, this routine checks if the IE bit of PEPE’s status register is active. If the IE bit is not active then the interrupts variable is cleared and the program runs normally. Interrupts are attended in the order defined in the interrupts table [see 2, pp.465], which means that if both the INT0 and the Overflow interrupts are set, INT0 will be attended first and only after it exits from the interrupt can we attend the Overflow interrupt.

Every time an interrupt is thrown some basic actions are performed. First, PEPE’s status register is copied to the TEMP register. Then the status register’s IE bit is cleared [see 2, pp.479-480], PEPE’s program counter is pushed to PEPE’s stack and so is the TEMP register. Finally we set the program counter to the appropriate routine in the table pointed to by the BTE register.

Let us start by describing the internal interrupts, or exceptions, which occur in software and are dependent on internal features. Those interrupts are, according to table 3.5 - Overflow, DIV0, INV OPCODE, MISALIGNED D, MISALIGNED I. We will now describe how we deal with each one separately.

4.9.1 Overflow

The Overflow exception can only occur in instructions that affect the overflow (V) flag of PEPE’s status register. Coincidentally, only instructions belonging to the ARITOP group of instructions (characterized by the same 4 first bits of the opcode) affect the V flag. This exception is thrown every time the V flag is active. When an instruction needs to update PEPE’s status register flags it calls a routine named updateFlags. That routine takes the desired flag values from the microcontroller’s status register and transfers them to PEPE’s status register. This gives us a centralized point to check whether the overflow flag is active or not. If the overflow flag is active, we set the appropriate bit in the interrupts variable and exit the routine normally. When the instruction that set the overflow flag runs the checkInterrupts routine and detects that the overflow trap is set, we must first check another condition before throwing the interrupt. As presented in table 3.3, the overflow exception has a special bit that enables or disables it - TV. If the interrupt is disabled by the TV bit then the overflow trap is cleared and the program runs normally. If the trap is enabled then we throw the interrupt to run the routine pointed to by the address at BTE + 0x08.

4.9.2 DIV0

The DIV0 exception happens every time a division or modulo by zero is attempted. These are the only two situations in which this exception might happen. This is checked after decoding the arguments of both instructions. If the divisor is zero, the appropriate bit is set in the interrupts variable and the program jumps to the end of the instruction, where it calls the checkInterrupts routine, performing no calculations. In the checkInterrupts routine the situation is similar to the Overflow exception, but with the TD bit of PEPE’s status register; if the interrupt is thrown, it goes to the routine pointed to by the address at BTE + 0x0A.

4.9.3 INV OPCODE

The INV OPCODE exception is detected in the decoding process, before starting the emulator. If the read opcode does not belong to a known PEPE instruction then it is pointed to a routine that sets the bit for the INV OPCODE exception and calls the checkInterrupts routine. This routine will then throw the interrupt to the address at BTE + 0x0C.

4.9.4 MISALIGNED D

The MISALIGNED D exception happens when the user tries to read or write a 16 bit word from memory (or from a peripheral, as discussed earlier) at an odd address. This condition is detected in the instructions that interact with the memory and use 16 bit words. If the instruction detects an odd address, it sets the appropriate bit in the interrupts variable and jumps to the end of the instruction without reading or writing to memory. At the end of the instruction it calls the checkInterrupts routine and an action similar to that of the previous exceptions is performed. If the interrupt is thrown then it will move the program counter to the address at BTE + 0x0E.

4.9.5 MISALIGNED I

The MISALIGNED I exception happens when the program counter is an odd number and we are trying to read the instruction from memory. This exception is analogous to the MISALIGNED D exception. For catching this exception we had two options. The first option was to check the program counter every time an instruction is read from memory. The second option was to check it every time the user changes the program counter. We chose the second option because, although the program counter is changed in every instruction, it is changed by our emulator to a known valid state. This means that it will remain an even number throughout the execution of the program unless the user changes the program counter manually. So we only check whether the program counter is an odd number in the instructions through which the user can change the program counter. If the interrupt is thrown then it jumps to the routine pointed to by the BTE + 0x10 address.
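The checkInterrupts behavior described so far can be summarized in a C sketch. Several details are assumptions made only for illustration: the bit positions of IE, TV and TD in the status register, the order of the bits in the interrupts variable (chosen so that bit i maps to the vector at BTE + 2*i, which matches the offsets 0x08 to 0x10 given above, and which places INT0 to INT3 at offsets 0x00 to 0x06), the representation of PEPE's stack as a small array, and all identifier names. The sketch follows the text literally and is not the emulator's Assembly routine.

```c
#include <stdint.h>

/* Hypothetical bit positions in PEPE's status register. */
#define SR_IE 0x0100u
#define SR_TV 0x0200u
#define SR_TD 0x0400u

/* Bit index in the interrupts variable; also the priority order assumed here. */
enum {
    IRQ_INT0, IRQ_INT1, IRQ_INT2, IRQ_INT3,
    IRQ_OVERFLOW, IRQ_DIV0, IRQ_INV_OPCODE,
    IRQ_MISALIGNED_D, IRQ_MISALIGNED_I, IRQ_COUNT
};

typedef struct {
    uint16_t pc, status;
    uint16_t pending;   /* the interrupts variable */
    uint16_t stack[16]; /* simplified stand-in for PEPE's stack */
    int depth;
} pepe_cpu;

/* Returns the vector offset from BTE of the interrupt thrown, or -1. */
int check_interrupts(pepe_cpu *cpu, const uint16_t *vector_table) {
    if (!(cpu->status & SR_IE)) {  /* interrupts disabled: clear and go on */
        cpu->pending = 0;
        return -1;
    }
    if (!(cpu->status & SR_TV))    /* overflow trap disabled by TV */
        cpu->pending &= (uint16_t)~(1u << IRQ_OVERFLOW);
    if (!(cpu->status & SR_TD))    /* division trap disabled by TD */
        cpu->pending &= (uint16_t)~(1u << IRQ_DIV0);

    for (int i = 0; i < IRQ_COUNT; i++) {
        if (cpu->pending & (1u << i)) {
            cpu->pending &= (uint16_t)~(1u << i);
            uint16_t temp = cpu->status;         /* status copied to TEMP */
            cpu->status &= (uint16_t)~SR_IE;     /* IE cleared */
            cpu->stack[cpu->depth++] = cpu->pc;  /* push program counter */
            cpu->stack[cpu->depth++] = temp;     /* push TEMP */
            cpu->pc = vector_table[i];           /* entry at BTE + 2*i */
            return 2 * i;
        }
    }
    return -1;
}
```

In this model an instruction routine only has to set a bit in pending, for example when a divisor is zero or a word address is odd, and leave the decision of whether to throw the interrupt to this single routine.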

4.9.6 External Interrupts

In this subsection we describe how we deal with the external interrupts. These interrupts differ from the previous ones because they can be triggered at any time by an external source (i.e. they are asynchronous to the program execution) and their trigger level is set by the RCN register. The RCN register, being an auxiliary register, has a special instruction to read and set its value. As we are not using the other auxiliary registers, we ignore their values if they are used by these instructions. To maintain the state of the RCN register we save it as part of the register bank. We have two different external sources: the clocks and a UART. Both these external sources are available as peripherals in the microcontroller and provide some interrupt features. For instance, the clocks provide interrupt signals when they cross a predetermined threshold and the UART provides interrupt signals when it receives or transmits a character. To emulate SIMAC’s behavior for the clocks, we must use the microcontroller’s timer to generate a square wave. This is done simply by using a variable and a timer with half the period of the simulated clock. This way, when the timer crosses a determined value and toggles the variable, we obtain a square wave behavior on our variable. The problem arises when we need to change the triggering level for these signals. The microcontroller activates the interrupt signals when an action occurs, but does not provide a functionality to change the trigger level. To solve this problem, we have to take into account a couple of situations. First, in situations when we are triggering on level, that is, when the signal is either low or high, we just read the current value of these signals and throw an interrupt if the signal is low or high, respectively. The other situation is when we trigger on edges, that is, on either the rising or the falling edge. This is accomplished by saving the last value of the interrupt’s signal and comparing it with the current one. This requires maintaining a variable in RAM with the previous state of the interrupt signals. The previous state of the interrupt signals is updated at the end of the checkInterrupts routine.
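The level and edge detection can be sketched in C as follows. The trigger_t enumeration is a hypothetical stand-in for the encoding of the RCN register given in table 3.4, and the software clock is modelled by a function that would be called from the timer interrupt every half period.

```c
/* Hypothetical trigger modes; the real encoding is the one of the RCN register. */
typedef enum {
    TRIG_LEVEL_LOW,
    TRIG_LEVEL_HIGH,
    TRIG_RISING_EDGE,
    TRIG_FALLING_EDGE
} trigger_t;

/* Decide whether an interrupt must be thrown, given the previous and the
   current value of the signal (0 or non-zero). */
int interrupt_triggered(trigger_t mode, int previous, int current) {
    int prev = (previous != 0);
    int curr = (current != 0);
    switch (mode) {
    case TRIG_LEVEL_LOW:    return !curr;
    case TRIG_LEVEL_HIGH:   return curr;
    case TRIG_RISING_EDGE:  return !prev && curr;
    case TRIG_FALLING_EDGE: return prev && !curr;
    }
    return 0;
}

/* Software square wave: toggled by a timer running at half the clock period. */
int clock_level = 0;

void timer_half_period_elapsed(void) {
    clock_level = !clock_level;
}
```

In use, the value of clock_level sampled in the previous call to checkInterrupts is kept in RAM and passed as previous; after the check it is overwritten with the current value.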

Chapter 5

The PEPEonBOARD Platform

In this work we need to emulate a specific PEPE architecture. Some of the modules constituting this architecture may already exist inside the microcontroller, such as the memory module and the timers, but others are external and need to interface with the microcontroller. Additionally, the microcontroller cannot work as a standalone unit and needs circuitry to provide power and to enable communications with a computer, for instance. Although there are several development boards that provide some of the features we needed, we were not able to find any that could provide all of them. So the decision was made to design our own solution with a custom board - the PEPEonBOARD - on a single PCB. The first version of the PEPEonBOARD was designed before the actual development of the PEPE emulator, as the hardware platform was needed for that task. To aid the process of creating the PEPEonBOARD, it was decided early in the requirements phase that the PEPEonBOARD should extend the functionalities of the already existing MoteIST++ s5 1011 [9] board. This board’s original purpose is to be used in Wireless Sensor Networks, but due to its great expandability we are able to use its microcontroller to control the peripherals that are mounted on the main board.

5.1 Requirements

At the beginning of the development of the new PCB we had to define some minimum requirements:

1. Must extend the MoteIST++ s5 1011 board;
2. Must include the required peripherals presented previously;
3. Low cost.

In addition to the minimum requirements we also wanted this board to be used like a normal devel- opment board for the MSP430 microcontroller, so a few other features were added as well:

1. Should use a USB interface for both power and communication with a computer;
2. Should be able to be powered by a computer or an external power supply. This is useful in cases where the board is not plugged into a computer or higher currents are needed;
3. All the GPIO pins should be user accessible for purposes other than the built-in peripherals;
4. Should have support for output buffers that allow higher currents to be supplied when using the GPIO pins as outputs, as well as some protection to the microcontroller;
5. Should provide some input protection to the GPIO pins of the microcontroller;
6. Should be compatible with the PC104 form factor. This will allow the PEPEonBOARD to be stacked on top of other PC104 compatible boards, even though it has no support for the PC104 Bus.

5.2 Implementing PEPEonBOARD

In this section we will discuss in detail all the choices made during the development of the PCB, as well as present the final results. In figure 5.1 we present a block diagram containing all the components of the main board and their interconnections. In total, two versions of the PEPEonBOARD were created, so any changes made between the two versions will also be discussed in the corresponding subsection. In the end we will present the final result of the main board for both versions. The schematics and bill of materials are presented in Appendix A.

5.2.1 Power

The MSP430F5438A can be powered by voltages ranging from 1.8 V to 3.6 V. We chose 3.3 V because some of the peripherals used (such as the LCD and the serial communications IC) are limited to that voltage. The board can be powered either via USB or via an external power supply, selectable by the user through a jumper. To regulate the input voltage down to 3.3 V we used the LD1117DT33 LDO voltage regulator [10], which is limited to a maximum input voltage of 15 V. The circuit for the input voltage regulation is based on the reference design of the data sheet and is pictured in figure 5.2. The capacitor values were chosen according to the reference values in the manufacturer’s data sheet.

5.2.2 Interfacing with the MoteIST++ s5 1011

The MoteIST++ s5 1011 board has three interface options: one Hirose 52 expansion socket, compatible with the MicaZ boards [see 9, pp. 14], and two communication board connectors (Hirose 20). The Hirose expansion socket is located on the bottom of one side of the board while the two communication board connectors are located on the other side, as pictured in figure 5.3. As the two communication board connectors are located on the same side, they can both be used at the same time.

[Block diagram: MSP430 microcontroller, CB connectors, speaker, push buttons, toggle switches, LEDs, output buffers, expansion connector, LED driver with dual seven segment display, USB to RS-232 converter, USB connector, voltage regulator, external power supply (4.5 V to 15 V) and Add-On board, linked by GPIO, I2C, SPI, RS-232 and 3.3 V / GND lines.]

Figure 5.1: Block Diagram of the PCB Project

Figure 5.2: Schematic of the Voltage Regulator

(a) Front side (with the Hirose expansion socket) (b) Back side (with the two communication board connectors)

Figure 5.3: Photos of the MoteIST++ s5 1011

The two communication board connectors provide a total of twenty-six GPIO pins, two separate RS-232 interfaces (each with a receive and a transmit pin), an SPI interface (three pins: clock, MOSI (Master Out Slave In) and MISO (Master In Slave Out)) and an I2C interface (two pins, one for clock and the other for data). The Hirose expansion socket provides similar functionality with twenty-five GPIO pins, four of which have access to the ADC peripheral of the microcontroller, and the same interfaces provided by the two communication board connectors.

After analyzing the options we ended up choosing the two communication board connectors to interface with the microcontroller, due to the greater number of GPIO pins and the added capability of using the built-in BSL (Bootstrap Loader), available on some of the pins, which allows programming the microcontroller without a specialized programmer.

In retrospect, this choice was likely not the best. The two communication board connectors don’t provide any pins with access to the ADC. For the purpose of emulating the PEPE this is not relevant, given that there are no modules with support for analog voltages in SIMAC, but it would be useful when using the board as an embedded systems development board. In addition, we did not use the BSL feature of the microcontroller, and the connectors proved difficult to assemble, causing some problems, although they are more stable than the Hirose 52. There are several possible solutions to this problem. The most direct one is to use the Hirose expansion socket instead of the two communication board connectors. However, to reduce costs and improve the overall robustness of the board, a better choice would be to include the microcontroller directly on the main board.

After choosing the interface, we had to provide some protection to the GPIO pins of the microcontroller. There are many input protection circuits, but we had to protect thirty-four input pins while keeping the parts count low and guaranteeing easy hand assembly with a small PCB footprint. This is why we chose to provide some protection with a current limiting resistor on each pin, as represented in figure 5.4.

[Diagram: a resistor R in series between the microcontroller GPIO pin and the external input/output, limiting the current to Imax.]

Figure 5.4: GPIO pin with a current limiting resistor

To determine the value of the resistor we referred to the microcontroller’s data sheet [see 7, pp. 44], which specifies a maximum current of 15 mA while maintaining acceptable voltage drops. By choosing a resistor value of 330 Ω we get a maximum current of 10 mA according to equation 5.1.

$$I_{max} = \frac{V}{R} = \frac{3.3}{330} = 10\ \text{mA} \tag{5.1}$$

To aid the process of hand assembling the boards, we used resistor arrays that have four 330 Ω resistors in a single package. From the first version of the board to the second, the placement of these resistor arrays was changed, as illustrated in figure 5.5, to ease the assembly process. Another improvement made from the first version to the second was to change the size of the outer pads of the resistor arrays in order to make them easier to hand assemble.

(a) First Version (b) Second Version

(c) Close-up of the Solder Jumpers

Figure 5.5: Resistor arrays placement on both versions of the main board

5.2.3 Serial Communication With a Computer

We have two separate RS-232 channels that we would like to control via a computer. Nowadays very few computers have an integrated RS-232 port, USB being the most common option. The USB connector also has the advantage of providing power to the board while acting as a data channel. To convert both RS-232 channels to USB we assessed two options: connect two separate serial converter ICs to two separate USB ports, or use a single serial converter IC able to control two RS-232 channels through one USB port. We chose the latter approach as it was the most convenient and also minimizes the footprint of the USB to RS-232 converter on the board.

The IC chosen for this task was the FTDI FT2232HL [11], which is able to control two RS-232 channels using one USB port. The implemented solution, depicted in figure 5.6, was based on the schematic in the device’s data sheet [see 11, pp. 22]. All the capacitor and resistor values are the ones suggested by the data sheet. The LEDs pictured in figure 5.6 light up when a character is being sent or received on either channel. For current limiting we added a resistor to each LED. We used the same resistor array we used for input protection, which limits the current per LED to approximately 4.5 mA when using red LEDs with a 1.8 V voltage drop, as calculated in equation 5.2.

Note that all the components associated with the EECS, EECLK, EEDI and EEDO lines of figure 5.6 are optional. They are meant to connect an external EEPROM that configures the FTDI IC. However, as we are using the default configuration of the IC, we don’t need to solder those components on the main board; the functionality is still present if needed.

$$I_{LED} = \frac{V_{CC} - V_D}{R} = \frac{3.3 - 1.8}{330} \approx 4.54\ \text{mA} \tag{5.2}$$
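As a quick check of equations 5.1 and 5.2, the same Ohm's law calculation can be written as a small Python helper. The function name is an assumption made for this sketch; the voltages and the 330 Ω value are the ones given in the text.

```python
def resistor_current_ma(supply_v, drop_v, resistance_ohm):
    """Current through a series resistor, in milliamperes."""
    return (supply_v - drop_v) / resistance_ohm * 1000.0

# Equation 5.1: GPIO pin shorted to ground through its 330 ohm resistor.
gpio_ma = resistor_current_ma(3.3, 0.0, 330)

# Equation 5.2: red LED with a 1.8 V forward drop on the same resistor value.
led_ma = resistor_current_ma(3.3, 1.8, 330)
```

Both values stay below the 15 mA limit quoted from the microcontroller's data sheet.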

We intentionally reserved one of the channels for communication with a computer. It is used by the Resident Monitor discussed in the next chapter. The other channel was also connected to the computer, but can also be used to connect the microcontroller to other external peripherals.

Some corrections had to be made to the first version of the board. First, the RS-232 channel dedicated to computer communication was not operational. This was because the first version had the DTR (Data Terminal Ready) line connected to the Reset line of the MoteIST++ s5 1011. This is useful if we want to reset the microcontroller every time we open the serial connection, as we would if we were using the BSL, but in our case it was causing the microcontroller to disconnect every time a character was sent. This was fixed by adding two solder jumpers on the PCB, which can be seen on the right side of figure 5.5 for the second version of the board.

Secondly, on the other RS-232 channel the communication between the computer and the microcontroller had no problems, but when we added external peripherals to the channel they could not communicate with the microcontroller. This was later found to be because the lines from the USB to RS-232 converter were directly connected to the external peripheral, as figure 5.7 depicts. The solution to this problem was to add resistors between the USB to RS-232 converter lines and the lines connecting to the external peripherals. This way we are able to use the channel to communicate either with a computer or with external peripherals.

(a) USB to RS-232 converter

(b) LEDs

Figure 5.6: Schematic of the USB to RS-232 converter

(a) First version: the USB to RS-232 converter lines are tied directly to the microcontroller and the external peripheral (b) Second version: 470 Ω resistors are placed in series with the USB to RS-232 converter lines

Figure 5.7: Difference between versions of the main board on one of the RS-232 channels

5.2.4 Seven Segment Displays

To implement the two seven segment displays that exist on our PEPE architecture, we chose an existing dual seven segment display - HDSP-521E [12]. However, this posed the problem that a dual seven segment display requires at least fourteen GPIO pins to control every segment (sixteen if we wanted to control the dot points on the screen).

To solve this problem we used an LED Driver IC - TLC5927 [13] - that can control sixteen different LEDs at a constant current while allowing it to be controlled by an SPI interface. This interface uses the SPI Clock and the SPI MOSI, which are shared between all devices using the SPI interface, and a single GPIO pin dedicated to the LED Driver IC. This effectively lowers the GPIO count from fourteen to only one.

The current supplied to each LED is controlled by a resistor connected to the TLC5927. To calculate the appropriate resistor value we used the equation 5.3, given in the device’s data sheet [see 13, pp. 15]. The seven segment display is limited to a forward current of 25 mA per segment [see 12, pp.5], so to provide a lower 12.5 mA we must use a resistor value of 1.5 kΩ. Using this data we were able to design the circuit pictured in figure 5.8. Note that the SDO pin is not connected as it is only useful when multiple LED drivers are connected in series.

$$I_{OUT} = \frac{1.25\ \text{V}}{R} \times 15 = \frac{1.25}{1.5 \times 10^{3}} \times 15 = 12.5\ \text{mA} \tag{5.3}$$
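Equation 5.3 can also be used in the opposite direction, to pick the resistor for a target segment current. The sketch below assumes only the relation quoted in the text (a 1.25 V reference and a gain of 15); the function names are illustrative.

```python
def tlc5927_output_current_ma(r_ext_ohm, v_ref=1.25, gain=15):
    """Output current per LED for a given external resistor (equation 5.3)."""
    return v_ref / r_ext_ohm * gain * 1000.0

def resistor_for_current_ohm(i_out_ma, v_ref=1.25, gain=15):
    """External resistor needed for a target output current in mA."""
    return v_ref * gain / (i_out_ma / 1000.0)

current_ma = tlc5927_output_current_ma(1500)   # the 1.5 kOhm value chosen
resistor_ohm = resistor_for_current_ohm(12.5)  # the 12.5 mA target
```

With the chosen 1.5 kΩ resistor the segment current is half of the 25 mA limit of the display.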

A jumper was also added to allow the user to disable this peripheral as illustrated in figure 5.9.

In the first version of the board we used a SOIC package for the LED Driver, however this version of the IC was only available from the Texas Instruments website, so for the second version we used the SSOP package which is more readily available.

Figure 5.8: Schematic of the LED Driver and Dual Seven Segment Display

Figure 5.9: Highlight of the jumpers that enable or disable the integrated peripherals

5.2.5 Speaker

Despite the SIMAC not having a speaker module, one was added to the main board to allow its use in normal development board usage. The circuit used is illustrated in figure 5.10. The footprint of the speaker on the PCB allows the usage of two different speakers. The capacitor C3 acts as a decoupling capacitor when the MOSFET Q1 is driven by PWM. The resistor R11 acts as a pull-down resistor to prevent the gate of the MOSFET from floating. The R12 resistor, in parallel with the speaker, provides a discharge path for the speaker capacitance.

Figure 5.10: Speaker Circuit

As with all the integrated peripherals a jumper was added to allow the user to disable the peripheral.

5.2.6 Push Buttons and Switches

The push buttons and switches are designed to output 3.3 V or 0 V depending on their state. Both can be disabled separately using a jumper.

5.2.7 LEDs

The LEDs are connected directly to the GPIOs, and all the current limiting is provided by the GPIO pin current limiting resistors. To allow the peripheral to be disabled, all the LED cathodes are connected to a jumper that the user can remove.

5.2.8 Extending the Main Board with an LCD and Keypad Add-On Board

On the main board we left support for an Add-on board, which was developed at the same time as the second version of the main board. This Add-on board has access to power, an RS-232 channel and the I2C and SPI interfaces, as well as to fifteen GPIO pins numbered from 11 to 25. All those pins are distributed between three 0.1 inch headers, as illustrated in figure 5.13. Using an Add-on board has the advantage of providing an area of approximately 7 cm x 9 cm, which would not be available otherwise because the main board is already occupied by other components. This proved to be invaluable because both the LCD and the keypad are peripherals that require a large footprint on the board.

The keypad is a hexadecimal keypad and uses eight of the provided GPIOs to address the sixteen buttons. The buttons are wired as a matrix: all the buttons on the same row share one line and all the buttons on the same column share another. To read the buttons the user must drive each of the rows in turn and read the values of the columns, or vice versa, a technique known as keypad scanning. The flowchart describing the algorithm to read the keypad is pictured in figure 5.11.

[Flowchart: Start; ROW = 0, KEYPAD = 0; pull all rows high; pull row ROW low; read column values to KEYPAD; KEYPAD << 4, ROW = ROW + 1; if ROW = 4 is false, repeat for the next row; otherwise invert KEYPAD and return KEYPAD.]

Figure 5.11: Algorithm to Read the Keypad
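A minimal Python sketch of the scanning loop in figure 5.11 is shown below. The `read_columns` callback is a hypothetical stand-in for the GPIO accesses, and the sketch assumes active-low columns (a pressed key reads as 0), which is why the result is inverted at the end. It shifts before merging each row so that all four nibbles fit in 16 bits, a small reordering of the steps drawn in the flowchart.

```python
def scan_keypad(read_columns):
    """Scan a 4x4 matrix keypad and return a 16-bit mask of pressed keys.

    read_columns(row) is a hypothetical hardware hook: it drives the given
    row low (all other rows high) and returns the 4 column bits, where a
    0 bit means the key at that row/column is pressed.
    """
    keypad = 0
    for row in range(4):
        # Make room for this row's nibble, then merge the column bits.
        keypad = (keypad << 4) | (read_columns(row) & 0xF)
    # Invert so that a 1 bit means "pressed", keeping only the 16 key bits.
    return ~keypad & 0xFFFF
```

In this layout row 0 ends up in the most significant nibble of the returned mask.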

In choosing the LCD we had some requirements. First, it had to fit in the available area of the Add-on board, leaving some space for the keypad. Secondly, it had to have the lowest possible cost. We chose the Displaytech 64128K-FC-BW-3 [14] as the best solution given these limitations. This display has a resolution of 128x64 pixels and an external area of 58.2 mm x 41.7 mm, which fits the available area of the Add-on board while still leaving space for the keypad.

This LCD uses the ST7565R-G controller and is interfaced via the SPI interface and three extra pins - A0, Reset and Chip Select. This controller has a built-in voltage booster, necessary for the normal functioning of the LCD, which only needs external capacitors connected to some pins of the LCD. According to the controller’s data sheet, the maximum voltage for the LCD is 13.5 V [see 8, pp. 56]. So, to boost our input voltage of 3.3 V, we used the reference design for the 4x step-up voltage circuit [see 8, pp. 39], which boosts the LCD voltage to 13.2 V. The schematic representing the design on the main board is pictured in figure 5.12.

The reference design does not provide a fixed value for the capacitors; instead it gives a range of acceptable values. We chose 3.3 µF for the C1, C7, C8 and C9 capacitors because this value is within the suggested range (from 1.0 µF to 4.7 µF) and was already used elsewhere on the board, thus reducing the number of different parts necessary. For the remaining capacitors we chose 100 nF, again because it falls within the suggested range (from 100 nF to 4.7 µF) and was already used on the board. Although we could have used 3.3 µF for all the capacitors, we chose 100 nF where possible because these capacitors are slightly cheaper than the 3.3 µF ones.

Figure 5.12: Schematic of the LCD

A photo of the final version of the LCD and Keypad Add-on board is presented in figure 5.13.

5.3 Second Version of the Main Board

The first version of the main board, pictured in figure 5.14, implemented all the features previously discussed. However, this first version of the board had some problems. Some of them were already mentioned previously; the remaining ones were addressed in the second version with the following changes:

• Changed the PCB footprint of the speaker. The previous footprint had support for two different speaker components, but for one of them the spacing between holes was wrong.
• Added a power indicator LED, which was not present on the first version of the board.
• Changed the spacing of the Peripheral Enable jumpers. In the first version the jumpers were not all aligned on a standard 0.1 inch spacing. This change was mainly for ease of assembly and cosmetic reasons.
• Added an On/Off indication on the PCB silkscreen to indicate whether a toggle switch is on or off.
• Added two test points, connected to ground and 3.3 V, to measure the board voltage.

(a) Front Side (b) Back Side

Figure 5.13: Photo of the LCD and Keypad Add-on board

(a) Front Side (b) Back Side

Figure 5.14: Photo of the first version of the main board

The second version of the main board was developed to solve these problems and provide a stable base for the purpose of this thesis. A photograph of the main board is displayed in figure 5.15 as well as the main board with the LCD and Keypad Add-on attached.

(a) Front Side (b) Back Side

(c) with the LCD and Keypad Add-on (d) Profile view

Figure 5.15: Photo of the second version of the main board

Chapter 6

Interfacing with the Computer Simulator

SIMAC simulates a group of interconnected modules, which can be as simple as logic gates or as complex as processors such as PEPE. Some of the modules act as inputs or outputs to enable interaction with the user. The SIMAC architecture we are using for this dissertation has already been presented in figure 3.2. This was also the base architecture for our emulation.

SIMAC has two modes of operation. The first is design, where users can add new peripherals, connect them and change their properties. The second is simulation, which allows users to simulate the modules they interconnected in design mode. The simulation window of the PEPE module allows the user to perform various actions, which are annotated in figure 6.1. The simulation window also shows the register values, a table with all the flags of the status register and the values of the auxiliary registers. This is useful for following the program flow and watching the results of each instruction.

Recalling what was previously stated, we have a host computer that is connected to the PEPEonBOARD via USB. The PEPEonBOARD runs PEPE programs and is controlled by the Jalapeño, which receives messages sent from SIMAC running on the host computer. In this chapter we will discuss the details of the Jalapeño and its communication with the host computer.

6.1 Software Changes to the SIMAC

One of our requirements was to maintain a familiar user interface. So, instead of creating a new simulation window, we simply added the elements necessary to run PEPE programs either locally or on the PEPEonBOARD. These changes are highlighted in figure 6.1 (b). We added a toggle button that selects between executing the programs locally or on the PEPEonBOARD. If the user chooses to run the program on the external emulator, he/she must select the appropriate serial port of the external emulator from a list. This is the only visible difference on the toolbar that the user will notice from the previous version of the SIMAC.

Nonetheless, to enable the communication with the PEPEonBOARD we had to make additions to the existing SIMAC code that implements the PEPE module. Previously, all the actions described in figure 6.1 (a) only affected the simulation of the processor. With the addition of the feature that allows the user to run PEPE programs on the external emulator, all the actions had to be adapted. As part of our requirements, we want to keep the user experience identical, so the actions must have the same effect whether running on the simulator or on the external hardware.

When the user performs an action, we must take into consideration whether the action will be performed by the simulator or by the external emulator. However, before making that check, there are steps common to both cases: preprocessing information and updating the graphical elements of the simulation window can be performed first. The only actions requiring preprocessing are those related to assembling the program. As we mentioned previously, the external emulator accepts COD files, which are compiled versions of the PEPE assembly program. Before sending a new program to the external emulator we must first compile it. Likewise, not all the actions require updating the graphical elements of the window. Actions that load new programs (which change the source code listing), add or remove breakpoints, or pause or step the program require some changes to the graphical elements of the simulation window. These operations do not depend on whether the program is running on the simulator or on the external emulator.
Figure 6.2 presents a flowchart describing the process of executing an action.

Only minor updates were made to the simulation code of PEPE. During the test phase of the emulator some minor bugs were detected in the DIV Rd, Rs and MOD Rd, Rs instructions. When executing the division or modulo instructions with a negative number in at least one of the registers, the results obtained were not correct. This was because the simulator performed the division or modulo operation on the registers as 32 bit integers, without sign-extending the original 16 bit values. To fix this problem we simply sign-extended the 16 bit values to 32 bits. The 32 bit integers are used because SIMAC simulates the processor with the resources of the host computer. In addition, minor changes were made to the breakpoint colors in the program listing: the current instruction is displayed in cyan and a breakpoint in magenta. An example can be seen in figure 6.1.
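The effect of the missing sign extension can be illustrated with a short Python sketch. It is not the SIMAC code: the helper names are hypothetical, and rounding toward zero with the remainder taking the sign of the dividend is an assumption (it matches Java integer arithmetic, the language SIMAC is written in). Division by zero is not handled here.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def div16(rd, rs):
    """Signed 16-bit division, truncating toward zero (assumed semantics)."""
    a, b = sign_extend16(rd), sign_extend16(rs)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & 0xFFFF

def mod16(rd, rs):
    """Signed 16-bit remainder with the sign of the dividend (assumed)."""
    a, b = sign_extend16(rd), sign_extend16(rs)
    remainder = abs(a) % abs(b)
    return (-remainder if a < 0 else remainder) & 0xFFFF
```

For example, the register value 0xFFFA represents -6. Dividing it by 2 without sign extension treats it as 65530 and yields 0x7FFD, whereas `div16(0xFFFA, 2)` yields 0xFFFD, that is, -3.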

6.2 Communication with the Jalapeño

We needed to communicate with the external emulator to command each action. This communication is done through one of the two serial interfaces that are created when the dedicated hardware is connected through USB. To enable communication over the serial interfaces in the Java programming language, we used the Java library RXTX (see footnote 1).

(a) Description of each action and tables

(b) New changes made to the window

Figure 6.1: PEPE Simulation Window in SIMAC

[Flowchart: the user performs an action; preprocessing is performed (if applicable); the visual elements relative to that action are updated (if applicable); if running on the simulator, the action is performed on the simulator, otherwise the action is sent to the external emulator; End.]

Figure 6.2: Executing an Action on the SIMAC

As the actions performed are specific to our problem, we needed a custom solution to transmit these messages between the external emulator and the computer, so a simple communication protocol was created. As described in figure 6.2, if the current program is running on the external emulator we must send the executed action to the external emulator. These messages must be processed by the external emulator, and for this we created a resident monitor that listens for incoming messages on the serial port, interprets them and, if necessary, sends a response back to the computer. We called this resident monitor Jalapeño.

Before introducing how the Jalapeño works, we must first define what we consider a message in our communication protocol. A message is a group of bytes, which can be either received or sent by the Jalapeño. A message must start with a Sync Character, which in our case is the character ‘@’, followed by a byte that corresponds to the header of the message. The header serves as an identifier for the different message types. The following bytes depend entirely on the message type and can carry any necessary information; still, for each message type the length must be known beforehand. Figure 6.3 represents the format of a message.

[Diagram: Sync character ‘@’, Header byte H, optional parameters.]

Figure 6.3: Format of a Valid Message
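The message format of figure 6.3 can be sketched in a few lines of Python. The helper names are assumptions made for this example; the only facts taken from the text are the '@' sync character, the single header byte and the optional parameters that follow it.

```python
SYNC = ord("@")

def build_message(header, params=b""):
    """Frame a message: sync character, one header byte, then parameters."""
    return bytes([SYNC, ord(header)]) + bytes(params)

def split_message(data):
    """Return (header, params) for a framed message, or None if invalid."""
    if len(data) < 2 or data[0] != SYNC:
        return None
    return chr(data[1]), bytes(data[2:])
```

For instance, `build_message("E")` produces the two bytes `@E` used by the Erase COD message shown in figure 6.8.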

This communication protocol makes use of a CRC in one of the implemented messages, to guarantee that the message was received with no errors. Ideally, the use of CRCs should have been extended to the remaining messages as well.

Now that we have defined what a message is, we can describe how the Jalapeño reads the messages received from the computer. When the computer sends a message to the external emulator, the message is received byte by byte by the microcontroller’s UART peripheral. The UART peripheral is responsible for managing the serial communications and frees the microcontroller to perform other actions. Every time a byte is received, the microcontroller serves an interrupt, whose routine saves the received byte in a circular buffer with a fixed size of 1024 bytes. This size was chosen because it can store at least one message to upload a new program and several other smaller messages at the same time. Note that, if the communication is correct, we will only have one upload program message in the buffer at a time, because the computer needs confirmation before sending the next segment. This behavior is described in more detail in the following sections.

Choosing a circular buffer allows us to save a significant amount of data, with the advantage of overwriting old messages. The circular buffer has two pointers - readPointer and writePointer - that are used in two distinct situations. When a byte arrives, the interrupt routine saves the byte at the position pointed to by the writePointer and increments the pointer. If the pointer is at the end of the circular buffer, it is reset to the beginning of the buffer. The readPointer is used by the Jalapeño to read new messages. When the Jalapeño tries to process new messages it checks whether the readPointer is the same as the writePointer. If they are the same, no new messages have arrived. Otherwise it processes the new message. As the Jalapeño reads bytes from the circular buffer it also increments the readPointer, in the same way as the writePointer. A diagram of the circular buffer is presented in figure 6.4.

1 http://rxtx.qbang.org/

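[Diagram: buffer positions 0 to 1023 holding queued messages such as @V and @D, with readPointer and writePointer marking the read and write positions.]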

Figure 6.4: Representation of the Circular Buffer
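The two-pointer scheme of figure 6.4 can be sketched as a small Python class. This is an illustration rather than the firmware code: the class and method names are assumptions, while the 1024 byte size and the wrap-around behaviour of both pointers follow the description in the text.

```python
class CircularBuffer:
    def __init__(self, size=1024):
        self.data = bytearray(size)
        self.read_pointer = 0
        self.write_pointer = 0

    def put(self, byte):
        """Store one byte; stands in for the UART receive interrupt routine."""
        self.data[self.write_pointer] = byte
        self.write_pointer = (self.write_pointer + 1) % len(self.data)

    def has_data(self):
        """Equal pointers mean that no new bytes have arrived."""
        return self.read_pointer != self.write_pointer

    def get(self):
        """Read one byte on behalf of the message processing code."""
        if not self.has_data():
            raise IndexError("circular buffer is empty")
        byte = self.data[self.read_pointer]
        self.read_pointer = (self.read_pointer + 1) % len(self.data)
        return byte
```

One property of this simple sketch is that a completely full buffer is indistinguishable from an empty one, because both leave the two pointers equal; unread data is then silently overwritten.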

The Jalapeño has the ability to control the flow of the emulator. Currently, the emulator has three different states: running is the normal state, in which the emulator executes instructions; paused is the state enforced by the Jalapeño when the current instruction has a breakpoint or the user steps or pauses the emulation; finally, stopped is the state in which the emulator is halted. When the Jalapeño pauses the emulator, the user must perform an action to put the emulator back into the running state. The Jalapeño pauses the emulator in three cases: when it receives a pause message from the computer; when the current instruction has a breakpoint; or when the user sent a step message on the previous instruction. While the emulator is paused the Jalapeño remains in a loop waiting for messages from the user.

The function responsible for the flow of the emulator, as well as for processing new messages, is the checkAction function. This function is called at the beginning of every instruction if the emulator is running, and is looped if the emulator is either paused or stopped. Figure 6.5 represents the algorithm of the checkAction function. The flow of the emulator can be changed either by a message or by a breakpoint. Both situations will be described in detail in the following sections.

We have already described how the Jalapeño knows that new messages have been received; we now detail what happens when there are new messages to process. The first check is whether the message is valid. If the message does not have a valid format or has an unknown header, the Jalapeño replies with a NACK message. The NACK message is used when an error has occurred, and a similar message - ACK - is used when the message was executed successfully. The ACK and NACK messages don’t follow our established convention for a valid message and are represented by the single bytes 0xA5 and 0x5A, respectively. The reason is that the convention would impose a significant 50% overhead with the inclusion of the sync character. After checking that a message is valid, the Jalapeño performs the appropriate action. Currently we have nine different actions, which are detailed in the following sections.

[Flowchart nodes: Start; Check For Breakpoints; Process Incoming Messages; Emulator is running? (Yes/No); Emulator is starting? (Yes/No); Start the Emulator; End.]

Figure 6.5: Algorithm of the checkAction Function
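The validity check and the single-byte ACK and NACK replies described in this section can be sketched as follows. The `handle` function and the handler table are hypothetical; only the '@' sync character and the reply bytes 0xA5 and 0x5A come from the text, and actions that answer with data instead of an ACK (such as Download COD) are left out of this sketch.

```python
SYNC = ord("@")
ACK, NACK = 0xA5, 0x5A

def handle(message, handlers):
    """Validate a framed message and return the one-byte reply.

    handlers maps a header character to a callable that receives the
    parameter bytes and returns True when the action succeeded.
    """
    if len(message) < 2 or message[0] != SYNC:
        return bytes([NACK])          # invalid format
    action = handlers.get(chr(message[1]))
    if action is None:
        return bytes([NACK])          # unknown header
    return bytes([ACK if action(message[2:]) else NACK])
```

Replying with a single byte avoids the sync character, which would otherwise make up half of a two-byte framed reply.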

6.2.1 Download COD

This action is currently not used by the SIMAC. It allows the Jalapeño to send the whole COD file stored in the microcontroller’s memory to the computer. It was created when the integration between the external emulator and the SIMAC was not yet complete, and provided a way to test whether the Upload COD action, which can be composed of several segments, was successfully writing the COD file into the flash memory. The format of this message is described in figure 6.6.

(a) Message: @ D (b) Reply message: @ D, length of the COD file, COD file (bytes 0 to length)

Figure 6.6: Format of the Download COD message

As it could be useful in the future, it was left as part of the Jalapeño.

6.2.2 Upload COD

This is one of the most important actions. It is performed when the user loads a new PEPE program to the external emulator. These messages have a particular format, pictured in figure 6.7. After the header there are two bytes that correspond to the length (in bytes) of the segment. After the length come two bytes for the start address. As we are limiting the size of the segment to 512 bytes, we receive a start address that indicates where we will need to write the received segment. Finally, we receive another two bytes for the CRC. The CRC is calculated by the computer before sending the segment and is based on the CRC-16-CCITT standard. After all these prefixed fields comes the segment, with a known size.

@ U | Length of the Segment | Address | CRC | Segment (bytes 0 … length)

Figure 6.7: Format of the Upload COD message

Despite the length field allowing segments with a maximum size of $2^{16} = 65536$ bytes, we chose a maximum size of 512 bytes for the segment, for two reasons. First, it provides an acceptable overhead between the segment field and all the previous fields. In the best-case scenario we have $\frac{8}{512+8} \approx 1.5\%$ overhead, when the segment length is 512 bytes. Secondly, there might be situations where the whole message is stored in a circular buffer, waiting to be interpreted by the Jalapeño. That means we would have to allocate a larger circular buffer if we wanted larger messages. Not having a size limitation for the segment would mean having to begin processing the message before the circular buffer overwrites part of it. During the tests performed on the Jalapeño, we did not find any major downsides of using this maximum segment size.

To exemplify a plausible use of this message, if a user tries to send a COD file with a size of 1256 bytes, then a total of $\lceil 1256 / 512 \rceil = 3$ messages must be sent: two with a length of 512 bytes and a final one with a length of $1256 - 2 \times 512 = 232$ bytes. The addresses of those messages would be 0, 512 and 1024 respectively.

Before saving the segment to the flash memory, a CRC comparison is performed. The CRC of the received segment is calculated by a dedicated peripheral on the microcontroller. This peripheral calculates the CRC faster than the microcontroller's CPU would, and it frees the CPU for other tasks. If the calculated CRC and the received CRC are the same, the segment is saved in flash memory and an ACK message is sent back to the computer. Otherwise a NACK message is sent back.
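The segmentation and CRC steps described above can be sketched from the computer's side as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the fields are assumed to be sent most significant byte first, and the CRC uses the CCITT polynomial 0x1021 with an initial value of 0xFFFF, which is one common CRC-16-CCITT variant; the text does not state which variant or byte order the Jalapeño actually uses.

```python
import struct

MAX_SEGMENT = 512  # maximum segment size stated in the text


def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16 with polynomial 0x1021 (initial value 0xFFFF is an assumption)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_upload_messages(cod):
    """Split a COD image into messages: '@U', length, address, CRC, segment.

    The big-endian field order is an assumption made for this sketch.
    """
    messages = []
    for address in range(0, len(cod), MAX_SEGMENT):
        segment = cod[address:address + MAX_SEGMENT]
        header = b"@U" + struct.pack(">HHH", len(segment), address, crc16_ccitt(segment))
        messages.append(header + segment)
    return messages


# The 1256-byte example from the text: three messages of 512, 512 and 232 bytes.
messages = build_upload_messages(bytes(1256))
segment_lengths = [len(m) - 8 for m in messages]
addresses = [struct.unpack(">H", m[4:6])[0] for m in messages]
best_case_overhead = 8 / (512 + 8)
```

On the receiving side the same CRC would be recomputed over the segment and compared with the received field before replying with ACK or NACK.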

6.2.3 Erase COD

This action is very simple and is responsible for erasing the COD memory of the microcontroller. It also erases all existing breakpoints. This is called by the SIMAC before sending a new COD file. The format of the message is pictured in figure 6.8.

@ E

Figure 6.8: Format of the Erase COD message

This message has no arguments. After erasing the COD memory, the Jalapeño replies with an ACK message to the computer.

6.2.4 Version

This is another action that is not used by the SIMAC but was kept with possible future updates in mind. This message has no arguments; all it does is reply with the major and minor version numbers to the computer. Those numbers are sent as bytes, so each is limited to $2^{8} = 256$ different values, but used in conjunction they allow $2^{16} = 65536$ different versions. Figure 6.9 provides a representation of the message format.

(a) Message: @ V
(b) Reply message: @ V | Major Version | '.' | Minor Version

Figure 6.9: Format of the Version message

6.2.5 Run Emulator

This command puts the emulator into the running state and sends an ACK message back to the computer as a reply. The message format is pictured in figure 6.10.

@ R

Figure 6.10: Format of the Run Emulator Message

6.2.6 Pause Emulator

This command is similar to the Run Emulator command but puts the emulator in the paused state. Its format is pictured in figure 6.11.

@ P

Figure 6.11: Format of the Pause Emulator Message

6.2.7 Step Emulator

This command puts the emulator in the running state and replies with an ACK message to the computer. In addition, it sets a variable, pauseOnNextInstruction. This variable is then read by the checkForBreakpoints function on the next instruction, which pauses the emulator. The message format is illustrated in figure 6.12.

@ S

Figure 6.12: Format of the Step Emulator Message

Using the pauseOnNextInstruction variable was the solution found for the problem of stepping single instructions. An alternative would be to put a breakpoint on the next instruction and remove it once that instruction is reached. The main difficulty with that approach is that we would need to predict whether the current instruction will jump away from the normal flow of the program, and add the breakpoint at the appropriate instruction. It would also have to be done carefully, because if the next instruction already has a breakpoint we do not want to remove it. Our solution is simpler and more efficient than that alternative, as we neither have to predict whether the current instruction will jump away from the normal flow of the program nor have to modify the current implementation of the breakpoints.

6.2.8 Add Breakpoint

This command adds a breakpoint. It receives a two-byte field containing the address of the instruction at which the breakpoint should be added. Its format is represented in figure 6.13.

Address

@ B

Figure 6.13: Format of the Add Breakpoint Message

Before replying to the computer it performs a simple check. The emulator has a limited number of available breakpoints so, before adding a breakpoint, it checks if there is space for adding a new breakpoint. If there is space for the breakpoint, it is added in the first available slot in the breakpoints list and an ACK message is replied to the computer. Otherwise it replies with a NACK message. To check if there is space available for breakpoints, it checks a variable containing the number of used breakpoints against the maximum number of breakpoints, which is currently defined as 16 breakpoints. After adding a breakpoint this variable is incremented.

6.2.9 Remove Breakpoint

The format of this command is similar to the Add Breakpoint command, with two bytes serving as the instruction address from where a breakpoint should be removed. The format of this message is illustrated in figure 6.14.

Address

@ C

Figure 6.14: Format of the Remove Breakpoint Message

This message differs from the Add Breakpoint message in that it does not need to check for available space for new breakpoints. Instead, it replies with an ACK message if the breakpoint was successfully removed. Otherwise it replies with a NACK message.

After a breakpoint is deleted, the number of used breakpoints is decremented, and all the following breakpoints in the breakpoints list are shifted, to leave one more slot at the end of the list.
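The add and remove behaviour described in sections 6.2.8 and 6.2.9 can be sketched as a small fixed-size list with a counter. This is a minimal model written for illustration, not the firmware code: the class and method names are hypothetical, `True` and `False` stand in for the ACK and NACK replies, and the handling of duplicate addresses is not specified in the text, so the sketch simply does not check for them.

```python
MAX_BREAKPOINTS = 16  # limit stated in the text


class BreakpointList:
    """Fixed-size breakpoint list with a counter of used slots."""

    def __init__(self):
        self.slots = [0] * MAX_BREAKPOINTS  # fixed storage, as it would be in RAM
        self.used = 0                       # number of used breakpoints

    def add(self, address):
        """Return True (ACK) if the breakpoint was stored, False (NACK) if the list is full."""
        if self.used >= MAX_BREAKPOINTS:
            return False
        self.slots[self.used] = address  # first available slot
        self.used += 1
        return True

    def remove(self, address):
        """Return True (ACK) if the breakpoint was removed, False (NACK) if it was not found."""
        for i in range(self.used):
            if self.slots[i] == address:
                # Shift the following breakpoints down, freeing a slot at the end.
                for j in range(i, self.used - 1):
                    self.slots[j] = self.slots[j + 1]
                self.used -= 1
                return True
        return False
```

Keeping the used entries contiguous means the per-instruction check only has to scan the first `used` slots.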

6.3 Breakpoints

Breakpoints put the emulator in the paused state, meaning that for the program to run again, the user must send an action to the external emulator to change the emulator back to the running state. Pausing the program is indeed a useful operation. When the program pauses, we transfer the values of all PEPE's registers and the program counter back to the computer, to be displayed in the PEPE simulation window. This is what allows users to use the PEPE simulation window as they would in a simulation: as they step the program, or add breakpoints, they see the registers change according to the executed instruction. This message follows the format pictured in figure 6.15.

@ G PC R0 … R15 A0

Figure 6.15: Format of the Registers Message

In the simulator we must somehow process that information and display it in the appropriate fields of the PEPE simulator window. To do this we added an event² for incoming serial messages from the external emulator. This only happens after the user puts the emulator in the running state. For instance, if the user starts or steps the program and we receive a reply confirming this operation, we know the emulator is now running. When this happens we add an event that processes incoming serial messages from the emulator. The only message that the Jalapeño sends to the computer without receiving an action first is precisely the message carrying the register values and the program counter. After the event receives a message and confirms that it carries the registers and the program counter, it updates the corresponding register fields and the single flag indicators (depending on the received status register value) in the simulation window.

This does not mean the user may not send commands to the external hardware while the emulator is running. As the Jalapeño checks for actions on every instruction, commands received while the emulator is running are always processed.

Now we must describe how breakpoints are saved. Because the program is saved in flash memory (a non-volatile type of memory), there could be a need to also save the breakpoints in non-volatile memory. However, this was considered not to be the case, as breakpoints are mainly relevant for the current session of running the program. Moreover, saving the breakpoints in flash memory would add wear to the flash memory. For this reason the breakpoints are stored in RAM and the maximum number of breakpoints is set as a constant in the code. A separate variable holds the number of breakpoints currently in use.

² This event is a function that is called automatically every time we receive a serial message.

We chose a maximum of 16 simultaneous breakpoints. This number is a compromise between speed and convenience. Ideally we would have the same number of available breakpoints as the number of instructions. However, to enable this type of behavior we would need at least 4 KB of RAM available for breakpoints, to match the 64 KB of PEPE's memory with 1 bit for every instruction (16 bits) of the PEPE memory. Dedicating 4 KB of RAM to breakpoints, knowing we only have 8 KB of RAM free, was considered unnecessary. So we have chosen to have a maximum of 16 breakpoints and save them as a list instead. Saving the breakpoints in a list means we have to iterate through the whole list whenever no breakpoint on the list matches the current instruction address. As this is done on every instruction we want to keep this operation short, so 16 breakpoints was considered to be a good compromise between speed and convenience.
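The 4 KB figure follows directly from the sizes given above, as the short derivation below shows. The 32-byte estimate for the list is an assumption of this sketch: it supposes each list entry stores only the two-byte address carried by the Add Breakpoint message.

```latex
\begin{align*}
\text{instructions} &= \frac{64 \times 1024\ \text{bytes}}{2\ \text{bytes per instruction}} = 32768 \\
\text{bitmap size} &= \frac{32768\ \text{bits}}{8\ \text{bits per byte}} = 4096\ \text{bytes} = 4\ \text{KB} \\
\text{list size (assumed)} &= 16 \times 2\ \text{bytes} = 32\ \text{bytes}
\end{align*}
```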

6.3.1 Stepping

As described previously, stepping the program is just a particular way of pausing the emulator. When the user executes the step command he/she intends to pause the emulator in the next instruction. Checking if the program must pause in the next instruction is done in the same function as the function that checks for the breakpoints. However, if the emulator has the order to pause on the next instruction it has no need to check for the breakpoints on that instruction as the end result will be the same - the emulator will pause and the information about the registers and the program counter is sent to the computer.
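The interaction between stepping and the breakpoint check can be sketched as below. This is a simplified model, not the firmware: the class is hypothetical, the attribute `pause_on_next_instruction` mirrors the pauseOnNextInstruction variable, and the `sent` list stands in for the registers message transmitted to the computer when the emulator pauses.

```python
class Emulator:
    """Toy model of the pause logic shared by breakpoints and stepping."""

    def __init__(self, breakpoints=()):
        self.breakpoints = list(breakpoints)
        self.running = False
        self.pause_on_next_instruction = False
        self.sent = []  # stand-in for the registers messages sent on pause

    def step(self):
        # Step Emulator: start running, but pause again on the next instruction.
        self.running = True
        self.pause_on_next_instruction = True

    def check_for_breakpoints(self, pc):
        """Called before each instruction; returns True if the emulator paused."""
        if self.pause_on_next_instruction:
            # Stepping: the outcome is a pause either way, so skip the list scan.
            self.pause_on_next_instruction = False
            hit = True
        else:
            hit = pc in self.breakpoints
        if hit:
            self.running = False
            self.sent.append(pc)  # here the registers and PC would be transmitted
        return hit
```

Because the flag is tested first, a step costs a single comparison regardless of how many breakpoints are defined.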

Chapter 7

Evaluation and Result Analysis

In this chapter we will evaluate some of the relevant features of PEPEonBOARD and comment on the results. The tests performed try to measure the time some of the components of PEPEonBOARD take to run. The instructions timings test offers a comparison between instructions with different complexities. The speed test will compare the run time of a similar program running on SIMAC, PEPEonBOARD and natively on the PEPEonBOARD platform without emulation. The virtual memory test will measure the impact of using a virtual memory and compare it with the expected results for these operations. Finally, the LCD update time test measures how much time it takes to fully update the LCD using a PEPE program and determine if it harms the emulation experience.

7.1 Instructions Timings

In this test we chose to measure the time of some of PEPE's instructions running in the PEPEonBOARD. Despite not being able to compare the results with all instructions running in SIMAC, by analyzing the results we can estimate how long an instruction takes to run, and have an idea of the emulation's overhead. Figure 7.1 provides these results.

For this test, we chose to include instructions with different complexities. Before measuring the results, we had no real metric for the complexity of each instruction, so an instruction was considered to be of low, medium or high complexity based on the emulating algorithm for that instruction. Instructions such as MOVL Rd, k, XOR Rd, Rs and JMP Label are considered to be simple instructions, as they are emulated using a single microcontroller instruction. The instructions ROLC Rd, n and SHLA Rd, n are considered to be of medium complexity, as they are emulated using a single microcontroller instruction called multiple times. Finally, MUL Rd, Rs, DIV Rd, Rs and MOD Rd, Rs are considered to be of high complexity, as they are emulated using loops that can have many instructions. It is worth mentioning that the results of this test may vary depending on the inputs, which this test does not measure extensively.

To measure the clock cycles we use a timer that is cleared before every measured instruction, and we add a breakpoint at the end of each of those instructions. This way we are able to manually read the timer value for every instruction. As this was measured directly on the microcontroller, the measurement is not observable in the test code.

[Bar chart: clock cycles (axis from 0 to 350) per instruction, for MOVL, XOR, JMP, ROLC, SHLA, MUL, DIV and MOD]

Figure 7.1: Number of Clock Cycles of Different Instructions

Analysing figure 7.1 we can see that our initial predictions for the instructions' complexities are confirmed. The least complex instructions were executed in a lower number of clock cycles, although they did not all take the same time. The JMP Label instruction was the quickest because its emulation does not read or write to memory; MOVL Rd, k needs to save the immediate operand to a PEPE register (stored in the microcontroller's RAM), taking a little longer; and XOR Rd, Rs needs to read both operands from memory, perform the XOR operation and save the result in the Rd register.

Both medium complexity instructions perform similarly. This was expected, given that both instructions are implemented in a similar way and had almost the same inputs. The difference between medium and high complexity instructions, however, is smaller than expected. This is most likely explained by the small inputs of the high complexity instructions (in this case we are only multiplying, dividing and taking the modulo by 10, which is a relatively small operation). Further testing with different (larger) inputs would have to be performed to confirm this hypothesis.

7.2 Speed Test

In this test we measured the speed of the PEPEonBOARD and compared it with the speed of SIMAC and the speed of a native implementation on the microcontroller. For this we created a program using the PEPE Assembly Language and ran it in both the PEPEonBOARD and SIMAC. For the native implementation on the microcontroller, we created a program using the same algorithm in the C programming language and ran it on the PEPEonBOARD platform. This test was written in C, as opposed to MSP430 Assembly, because operations such as divisions, modulos and multiplications are already available.

The program created for this test finds prime numbers. The algorithm used for this task was the Sieve of Sundaram. This algorithm allows finding prime numbers using only multiplications and by iterating two variables, making it easy to implement in Assembly. It is also quicker than a naive implementation for finding prime numbers, where we compare a number to all previous numbers and verify if the remainder of the integer division is not zero.

To test more of the capabilities of the emulation, we transmit the found prime numbers over serial communication. At the beginning of the program we transmit the message Press Button... and wait for a button press. Once the button is pressed we transmit the message Starting! and light all the LEDs. After those operations are concluded we start finding all prime numbers below a predefined limit. For this case we find prime numbers below 2000. Once all prime numbers are found and transmitted, we turn off all the LEDs, transmit the message Done. and repeat this process from the beginning, waiting for a button press.

To measure the times we developed a program to run on the computer that reads the serial interface and calculates the time it takes to calculate and transmit all the prime numbers. Note that this approach is only valid for the programs running (natively and emulated) in PEPEonBOARD, because for the program running in SIMAC we have no access to the serial interface from the outside world. For SIMAC we measured the execution 10 times using a stopwatch. The resulting times are presented in table 7.1.

                      Simulation    Emulation on     Natively on
                      (SIMAC)       PEPEonBOARD      PEPEonBOARD
Time [s]              18,96         2,784            1,424
Std. Deviation [s]    0,209         1,89 × 10^-9     8,78 × 10^-9

Table 7.1: Time Measurements of Speed Test Program

This approach does not take the startup time into consideration. This was due to the difficulty of measuring the execution time including the startup time. In addition, it would make it harder to compare the results.

As we can see from the results in table 7.1, the PEPEonBOARD emulation execution time is much lower than the result obtained with SIMAC. However, the comparison between the two results is not a fair one. The PEPE implementation in SIMAC has a default internal period that is not matched with the period of PEPEonBOARD. In addition, during the measurement of the simulated times we noted that while finding all the prime numbers took 7 to 8 seconds, most of the execution time was spent transmitting data. From the comparison between the two results we can only deduce that using the PEPEonBOARD is not harming the overall experience of running PEPE programs.

Comparing the results from the programs running (natively and emulated) in PEPEonBOARD is much fairer. The tests were performed under the same conditions, on the same platform, running at the same clock frequency. As observed from the results, the native implementation is quicker than the emulated one in PEPEonBOARD, as expected. However, the difference is relatively small: a 95,5% increase. Considering the smallest emulated instruction takes more than 10 MSP430 instructions, the increase is acceptable. The programs developed for this test are provided in Appendix B.
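For readers unfamiliar with the Sieve of Sundaram, the sketch below shows the algorithm in Python and recomputes the 95,5% figure from the times in table 7.1. It is an independent illustration, not the thesis's Assembly or C program (those are in Appendix B), and the function name is hypothetical.

```python
def sundaram_primes_below(limit):
    """Return all primes below `limit` using the Sieve of Sundaram."""
    if limit <= 2:
        return []
    k = (limit - 2) // 2          # each n in 1..k stands for the odd number 2n + 1 < limit
    marked = [False] * (k + 1)
    i = 1
    while i + i + 2 * i * i <= k:
        j = i
        while i + j + 2 * i * j <= k:
            # i + j + 2ij corresponds to the odd composite (2i + 1)(2j + 1)
            marked[i + j + 2 * i * j] = True
            j += 1
        i += 1
    return [2] + [2 * n + 1 for n in range(1, k + 1) if not marked[n]]


primes = sundaram_primes_below(2000)

# Relative slowdown of the emulated run against the native run (table 7.1).
emulated_s, native_s = 2.784, 1.424
increase_percent = (emulated_s - native_s) / native_s * 100
```

Only additions and multiplications of the two loop variables are needed to mark composites, which is what makes the algorithm convenient on an instruction set where division is expensive.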

7.3 Virtual Memory

One other measured feature of PEPEonBOARD was the time spent changing pages of the virtual memory. For this we developed a PEPE program that writes to a given number of segments¹ of a page. Recalling what we described in chapter 4, we change the page every time we perform a write operation outside the current page. Also, we only write to the flash memory the segments that have changed. For that reason, our program starts by changing the page when no changes were made to the page (possible because we start with page 0 and the first write operation is to page 1), and continues by writing to an increasing number of segments until all the segments of the page have changed. The results obtained are presented in figure 7.2.

[Plot: time in seconds (axis from 0,00 to 0,60) against the number of changed segments per page (0 to 16)]

Figure 7.2: Measured Times for The Virtual Memory Test

As we can see in figure 7.2, the time it takes to change a page increases with the number of changed segments. Also, by looking at the time when no segments changed, we can deduce that the majority of the time spent changing pages is occupied by writing to flash memory, as we calculate the CRC in every test case. Using the minimum and maximum values for programming the flash memory provided in [7, p. 65], we calculated in equations 7.1 and 7.2 that programming 8 KB of flash memory could take between 0,387 s and 0,515 s. Taking those values into consideration, we can regard our value of 0,52 s as acceptable, considering it includes calculating the CRC for the whole page and copying the new page back to RAM.

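$$
\begin{aligned}
t_{min} &= \frac{t_{ERASE,min}}{1 \times 10^{3}} + \frac{t_{BLOCK,0} + 2\,t_{BLOCK,1-(N-1)} + t_{BLOCK,N}}{1 \times 10^{6}} \times 2 \times 1024 \\
        &= \frac{23}{1 \times 10^{3}} + \frac{49 + 37 \times 2 + 55}{1 \times 10^{6}} \times 2 \times 1024 \\
        &= 0{,}387\ \text{s}
\end{aligned}
\tag{7.1}
$$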

¹ A page has 8 KB and a segment has 512 bytes.

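$$
\begin{aligned}
t_{max} &= \frac{t_{ERASE,max}}{1 \times 10^{3}} + \frac{t_{BLOCK,0} + 2\,t_{BLOCK,1-(N-1)} + t_{BLOCK,N}}{1 \times 10^{6}} \times 2 \times 1024 \\
        &= \frac{32}{1 \times 10^{3}} + \frac{65 + 49 \times 2 + 73}{1 \times 10^{6}} \times 2 \times 1024 \\
        &= 0{,}515\ \text{s}
\end{aligned}
\tag{7.2}
$$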
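Equations 7.1 and 7.2 can be checked numerically with a few lines of code. The sketch below only evaluates the expressions with the numbers that appear in them (erase time in milliseconds, block write times in microseconds); the function and parameter names are hypothetical.

```python
def flash_program_time_s(t_erase_ms, t_block_0_us, t_block_mid_us, t_block_n_us,
                         repeats=2 * 1024):
    """Evaluate the expression used in equations 7.1 and 7.2, in seconds."""
    erase_s = t_erase_ms / 1e3
    block_group_s = (t_block_0_us + 2 * t_block_mid_us + t_block_n_us) / 1e6
    return erase_s + block_group_s * repeats


t_min = flash_program_time_s(23, 49, 37, 55)   # about 0.3875 s
t_max = flash_program_time_s(32, 65, 49, 73)   # about 0.5153 s
```

Both values agree with the bounds quoted in the text to within a millisecond.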

To measure these times we used an internal timer, present in the microcontroller. Its clock was configured to a frequency of $\frac{1{,}048576\ \text{MHz}}{16} = 65{,}536\ \text{kHz}$. By resetting the timer before changing the page and reading its value afterwards, we were able to get the time values presented in figure 7.2. This clock frequency was chosen because it allowed measuring the timer value for all test cases without overflowing the 16-bit timer register.

The program used in this test is presented in Appendix B. Note that this program does not include the portion of the code used to measure the time values, as this was performed directly on the microcontroller.
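The choice of timer frequency can be illustrated with a small conversion helper. This sketch assumes only what the text states, a 16-bit counter clocked at 1,048576 MHz divided by 16; the names are hypothetical.

```python
TIMER_HZ = 1_048_576 / 16        # 65536.0 ticks per second
MAX_TICKS = 2 ** 16 - 1          # largest value a 16-bit timer register can hold


def ticks_to_seconds(ticks):
    """Convert a raw timer reading to seconds."""
    return ticks / TIMER_HZ


longest_measurable_s = ticks_to_seconds(MAX_TICKS)   # just under 1 s
ticks_for_largest_result = round(0.52 * TIMER_HZ)    # the 0,52 s case fits comfortably
```

At this frequency the register wraps after about one second, so the largest measured page change (0,52 s) uses roughly half of the available range.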

7.4 LCD Update Time

One of the features that took a noticeable amount of time was updating the values of the LCD. This was one of the reasons that motivated the increase of the microcontroller's clock frequency from 1 MHz to 8 MHz. Furthermore, along with increasing the clock frequency, we also optimized the code used to update the LCD in order to speed up the process. Previously, every time a new value was written to the LCD we updated the whole screen, while now we only update the necessary addresses. The final algorithm for updating values on the LCD is described in chapter 4.

To measure the time necessary to update all the values of the LCD we created a program, provided in Appendix B, that writes a test pattern on the LCD. This program was used to generate figure 7.3.

Figure 7.3: LCD Test Pattern

To measure the time we used the same method as in the previous section. The timer was reset at the beginning of the code and read at the end. To enable reading the timer value, we added a breakpoint on the last instruction of the program. After 10 executions the read value remained the same, corresponding to an update time of 69,44 ms. This value is low enough to be ignored during the normal execution of the program.

Taking this value and considering a clock frequency of 1 MHz instead, the total update time would be 8 times higher: 555,54 ms. This value is much more significant and was perceptible during the execution of the program, while the current value is not. Although this value is derived from the measured one rather than measured directly, it is of the same order of magnitude as, and comparable with, the update time experienced before the clock frequency was increased.
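The extrapolation to 1 MHz is a simple inverse scaling with the clock frequency, sketched below. The helper name is hypothetical, and the sketch assumes, as the text does, that the update time is inversely proportional to the clock. Scaling the rounded 69,44 ms gives 555,52 ms; the small difference from the 555,54 ms quoted above presumably comes from scaling the unrounded measurement.

```python
def scale_time_ms(measured_ms, measured_clock_hz, target_clock_hz):
    """Scale a measured time to another clock, assuming time is proportional to 1/f."""
    return measured_ms * measured_clock_hz / target_clock_hz


lcd_update_at_1mhz_ms = scale_time_ms(69.44, 8_000_000, 1_000_000)
```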

7.5 Startup Time

One of the disadvantages of our emulation approach compared to others resides in the need to precode every instruction. This process was described in chapter 4. What we consider to be startup time is the time spent on the decode process, as well as the time spent copying a new page from flash to RAM and clearing the registers.

To measure how significant the startup time is, we measured it for 14 different programs. The resulting times are presented in figure 7.4. As in the previous sections, we used a timer to measure the time difference.

[Scatter plot: startup time in ms (axis from 0 to 300) against COD file size in bytes (0 to 14000), with the linear fit y = 0,0175x + 54,658 and R² = 0,94629]

Figure 7.4: Startup Time of Different Programs

As can be observed in figure 7.4, different COD files result in different startup times, with larger COD files causing higher startup times. This could be explained because larger COD files tend to include larger programs, which increases the startup time, as we are writing to flash memory.
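The linear fit shown in figure 7.4 (y = 0,0175x + 54,658) can be used to estimate the startup time for a given COD file size. The sketch below simply evaluates that fit; the function name is hypothetical, and the result is only an estimate within the measured range of roughly 0 to 14000 bytes.

```python
SLOPE_MS_PER_BYTE = 0.0175   # slope of the fit in figure 7.4
INTERCEPT_MS = 54.658        # intercept of the fit in figure 7.4


def estimated_startup_ms(cod_size_bytes):
    """Estimate the startup time from the COD file size using the fitted line."""
    return SLOPE_MS_PER_BYTE * cod_size_bytes + INTERCEPT_MS


estimate_8000_bytes_ms = estimated_startup_ms(8000)
```

The intercept suggests a fixed cost of about 55 ms that does not depend on program size, with each additional kilobyte of COD file adding roughly 18 ms.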

Chapter 8

Conclusions

This dissertation had the goal of creating a new way of running PEPE programs, outside the computer simulator. We presented some approaches for emulating PEPE on a microcontroller. After discussing four different emulation approaches we chose the Threaded Interpretation with Precoding approach. This was made possible by the amount of flash memory available in our microcontroller, and it provided good performance, as confirmed by the results.

We discussed the challenges we encountered when emulating PEPE on a microcontroller. We first studied PEPE's features that were relevant for this work, and how we translated from one instruction set to another by decoding COD files. In addition, we were also able to use PEPE's memory bank with limited RAM by using a virtual memory. We also described how we interacted with the peripherals and PEPE's interrupts.

For this work a hardware platform was created, using the MoteIST++ s5 1011 board as the microcontroller board, on which we included many peripherals. The microcontroller we used proved to be a good choice due to the amount of available memory, the included peripherals and its performance. However, different connectors should have been used to connect the MoteIST++ s5 board to the PEPEonBOARD platform, as the current ones present some problems. In future versions of the PEPEonBOARD platform, the microcontroller should be integrated directly on the platform, for a more robust connection. Furthermore, with the addition of the Add-On board on the PEPEonBOARD platform, we will be able to further extend the main board, or even to use the Add-On boards on different platforms in the future.

Finally, we were able to maintain a familiar user interface by adapting SIMAC's code to communicate with the external emulator. This way we were able to retain all the previous functionality.
In the end, this work was validated when we measured some of the results of the emulator and verified that the emulator produced good results, not harming the user experience. While verifying that the different components of the emulator produced good results, we were able to run the same PEPE program faster than the computer simulator.

Even though the work developed in this dissertation is a start, there is room for improvement. This dissertation discusses an emulator with a fixed SIMAC architecture; a welcome improvement would be adding support for changing this architecture in SIMAC or adding new boards in PEPEonBOARD. Also, minor additions to SIMAC, such as a general purpose input/output module, would allow using PEPE with any external peripheral.

The work developed for this dissertation opens new possibilities for teaching the basics of Computer Architectures. Where previously students were limited to a simulation scenario, they can now use their knowledge to interact with the real world. We look forward to seeing this work used and improved by students, while motivating them for new challenges.

72 73 Appendix A B C A D Hardware ision Rev : 4 4 A.1 Schematics of the Second Version of the Main Board Sheet of Drawn By NLCBC202 NLCBC204 NLCBC206 NLCBC208 NLCBC2010 NLCBC2012 NLCBC2014 NLCBC2016 NLCBC2018 NLCBC2020 CBC2_2 CBC2_4 CBC2_6 CBC2_8 CBC2_10 CBC2_12 CBC2_14 CBC2_16 CBC2_18 CBC2_20 2013 B ber 2 4 6 8 10 12 14 16 18 20 S 04- PICBC202 PICBC204 PICBC206 PICBC208 PICBC2010 PICBC2012 PICBC2014 PICBC2016 PICBC2018 PICBC2020 Num 08- C:\Users\..\MoteConnectors.SchDoc 2 4 6 8 10 12 14 16 18 20 A4 ile: Title Size Date: F connected to the U the to connected DF a 7 Header 17Header 1 3 5 7 9 11 13 15 17 19 O16 O15 O14 O17 O18 O19 O20 O21 O22 O23 O24 O25 I I I I I I I I I I I I COCBC2 CBC2 DF 2C_SDA 2C_SCL NLRX1 NLGPIO16 NLGPIO15 NLGPIO14 NLI2C0SDA NLI2C0SCL NLGPIO17 NLTX1 NLGPIO18 NLGPIO19 NLGPIO20 NLGPIO21 NLGPIO22 NLGPIO23 NLGPIO24 NLGPIO25 RX1 GP GP GP I I GP TX1 GP GP GP GP GP GP GP GP 1 3 5 7 9 reviously PIR605 PIR606 PIR607 PIR608 PIR805 PIR806 PIR807 PIR808 PIR2005 PIR2006 PIR2007 PIR2008 PIR2205 PIR2206 PIR2207 PIR2208 11 13 15 17 19 P 330 330 330 330 3 3 COR6 COR8 COR20 COR22 R6 R8 R20 R22 PICBC201 PICBC203 PICBC205 PICBC207 PICBC209 PICBC2011 PICBC2013 PICBC2015 PICBC2017 PICBC2019 PIR601 PIR602 PIR603 PIR604 PIR801 PIR802 PIR803 PIR804 PIR2001 PIR2002 PIR2003 PIR2004 PIR2201 PIR2202 PIR2203 PIR2204 NLRESET NLRTS1 NLCBC2011 NLCBC209 NLCBC207 NLCBC205 NLCBC2019 NLCBC2013 RESET RTS1 CBC2_5 CBC2_7 CBC2_9 CBC2_11 CBC2_13 CBC2_19 CBC2_11 CBC2_9 CBC2_7 CBC2_5 CBC2_18 CBC2_20 CBC2_19 CBC2_13 CBC2_2 CBC2_4 CBC2_6 CBC2_8 CBC2_10 CBC2_12 CBC2_14 CBC2_16 3V3 GND NLCBC102 NLCBC104 NLCBC106 NLCBC108 NLCBC1010 NLCBC1012 NLCBC1014 NLCBC1016 NLCBC1018 NLCBC1020 CBC1_2 CBC1_4 CBC1_6 CBC1_8 CBC1_10 CBC1_12 CBC1_14 CBC1_16 CBC1_18 CBC1_20 2420 2 4 6 8 10 12 14 16 18 20 C PICBC102 PICBC104 PICBC106 PICBC108 PICBC1010 PICBC1012 PICBC1014 PICBC1016 PICBC1018 PICBC1020 2 4 6 8 10 12 14 16 18 20 2 2 ER) EAK ED4) ED3) ED2) ED1) K SO connected to the C the to connected DF a 7 
Header O4 (L O3 (L O2 (L O1 (L O5 (SW1) O6 (SW2) O7 (SW3) O8 (SW4) O9 (7SEG) O10 (SP O11 O12 O13 _MOSI _MI _CL I I I I I I I I I I I I I 17Header I I I 1 3 5 7 9 11 13 15 17 19 COCBC1 NLGPIO4 (LED4) NLGPIO3 (LED3) NLGPIO2 (LED2) NLGPIO1 (LED1) NLTX2 NLRX2 NLGPIO5 (SW1) NLGPIO6 (SW2) NLGPIO7 (SW3) NLGPIO8 (SW4) NLGPIO9 (7SEG) NLGPIO10 (SPEAKER) NLGPIO11 NLGPIO12 NLGPIO13 NLSPI0MOSI NLSPI0MISO NLSPI0CLK CBC1 DF GP GP GP GP TX2 RX2 GP GP GP GP GP GP GP GP GP SP SP SP PIR705 PIR706 PIR707 PIR708 PIR905 PIR906 PIR907 PIR908 PIR2105 PIR2106 PIR2107 PIR2108 PIR2305 PIR2306 PIR2307 PIR2308 PIR2405 PIR2406 PIR2407 PIR2408 330 330 330 330 330 1 3 5 7 9 reviously COR7 COR9 COR21 COR23 COR24 11 13 15 17 19 R7 R9 R21 R23 R24 P PIR701 PIR702 PIR703 PIR704 PIR901 PIR902 PIR903 PIR904 PIR2101 PIR2102 PIR2103 PIR2104 PIR2301 PIR2302 PIR2303 PIR2304 PIR2401 PIR2402 PIR2403 PIR2404 PICBC101 PICBC103 PICBC105 PICBC107 PICBC109 PICBC1011 PICBC1013 PICBC1015 PICBC1017 PICBC1019 NLCBC107 NLCBC105 NLCBC103 NLCBC101 NLCBC1013 NLCBC1011 NLCBC109 NLCBC1019 CBC1_1 CBC1_3 CBC1_5 CBC1_7 CBC1_9 CBC1_11 CBC1_13 CBC1_19 CBC1_7 CBC1_5 CBC1_3 CBC1_1 CBC1_13 CBC1_11 CBC1_9 CBC1_2 CBC1_4 CBC1_6 CBC1_8 CBC1_10 CBC1_12 CBC1_14 CBC1_16 CBC1_18 CBC1_20 CBC1_19 GND 74 1 1 B C A D 1 2 3 4

[Schematic: External Expansion (ExternalExpansion.SchDoc, 08-04-2013). Output Buffers: three SN74HC541N octal buffers (U3, U6, U7) buffering GPIO1 to GPIO24 into BUFGPIO1 to BUFGPIO24. Output Select Jumpers: select, per line, between the buffered output and the directly connected IO (EXTGPIO1 to EXTGPIO24). Buffer Power Select: selects between 3.3 V and a voltage given by the user (EXTVCC) for BUFVCC. Add-on Board headers (Header 7, Header 8, Header 9): GPIO11 to GPIO25, TX2, RX2, I2C_SCL, I2C_SDA, SPI_CLK, SPI_MOSI and SPI_MISO; this board's IOs are not buffered. External Connector (Header 17X2): EXTVCC, RX2, TX2, I2C_SCL, I2C_SDA, SPI_MOSI, SPI_MISO, SPI_CLK, EXTGPIO1 to EXTGPIO24, 3V3 and GND.]

[Schematic: Peripherals (Peripherals.SchDoc, 08-04-2013). Dual 7 Segment Display: HDSP-521E (DS1) driven by a TLC5927 (U2) over SPI_CLK and SPI_MOSI, with GPIO9 (7SEG) as control line and a 7 Segment Enable jumper; R-EXT = 1.5 kΩ (R3), Iout_x = 12.5 mA. LEDs: D1 to D4 on GPIO1 to GPIO4, with an LEDs Enable jumper. Switches: SW1 and SW2 on GPIO5 and GPIO6, with 10 kΩ resistors R4 and R5 and a Switches Enable jumper. Push Buttons: SW3 and SW4 on GPIO7 and GPIO8, with 10 kΩ resistors R13 and R14. Piezo Speaker: LS1 switched by a 2N7002K MOSFET (Q1) from GPIO10 (SPEAKER), with a Speaker Enable jumper. VCC and GND Testpoint, Reset Switch (SW5, RESET) and Power LED (D9).]

[Schematic: USB and Power (USB and Power.SchDoc, 08-04-2013). DC IN Connector (J1, PWR2.5) supplying VCC-JACK and USB Connector (USB1) supplying VCC-USB. Power Select Jumper: chooses the source of VCC-UNREG. Power Regulator: LM1117DT-3.3 (U4) producing 3V3. USB to Serial IC: FT2232HL (U1) with a 12 MHz crystal (Y2, 27 pF capacitors C1 and C2) and a 12 kΩ 1% reference resistor (R2), providing the serial signals RX1, TX1, RTS1, RX2 and TX2 and the RESET line. SPI EEPROM: 93LC46B/ST (U5); normally this is not soldered, but the footprint is there if needed. Serial LEDs: D5 to D8 (TX1_LED, RX1_LED, TX2_LED, RX2_LED).]

A.2 Schematics of the LCD and Keypad Add-On Board

[Schematic: LCD and Keypad Add-On Board (LCD and Keypad Addon.SchDoc). LCD1: 64128K display connected through SPI_CLK, SPI_MOSI, LCD_RES, LCD_CS and LCD_A0, with capacitors C1 to C9 (3.3 µF and 100 nF). Keypad Matrix: 16 push buttons (SW1 to SW16) arranged as ROW1 to ROW4 by COL1 to COL4, with 10 kΩ resistors R1 to R4 between the columns and 3V3. Headers: P1 (Header1X9), P2 (Header1X8) and P3 (Header1X7), carrying TX2, RX2, I2C_SCL, I2C_SDA, SPI_CLK, SPI_MOSI, SPI_MISO, 3V3, GND, GPIO22 to GPIO25 and the LCD and keypad signals.]

A.3 Bill of Materials

Main Board

| Qty | Description | Manufacturer | Unit Cost (€) | Total Cost (€) | URL |
|---|---|---|---|---|---|
| 3 | LOGIC, BUFF/DVR TRI-ST OCTAL, 20DIP | TEXAS INSTRUMENTS | 0,16 | 0,47 | http://pt.farnell.com/texas-instruments/sn74ahc540n/logic-buff-dvr-tri-st-octal-20dip/dp/1749625 |
| 2 | CONN HEADER 20POS .5MM SMD GOLD | Hirose Electric Co Ltd | 1,89 | 3,78 | http://www.digikey.pt/product-detail/en/DF17A(2.0)-20DP-0.5V(57)/H11135CT-ND/1036101 |
| 10 | RESISTOR ARRAY, 1206, 330R | BOURNS | 0,02 | 0,15 | http://pt.farnell.com/bourns/cay16-331j4lf/resistor-array-1206-330r/dp/1770132 |
| 9 | LED, RED 1206, SMD | KINGBRIGHT | 0,08 | 0,70 | http://pt.farnell.com/kingbright/kpt-3216ec/led-red-1206-smd/dp/2099245 |
| 2 | SLIDE SWITCH, 2 POS | TE CONNECTIVITY | 0,85 | 1,70 | http://pt.farnell.com/te-connectivity/sls121pc04/slide-switch-2-pos/dp/1197660 |
| 1 | IC, LED DRIVER LINEAR 24-SSOP | TEXAS INSTRUMENTS | 1,02 | 1,02 | http://pt.farnell.com/texas-instruments/tlc5927idbqr/led-driver-constant-current-ssop24/dp/1858095 |
| 1 | LED DISPLAY, DUAL, 0.56", HE-RED | AVAGO TECHNOLOGIES | 1,80 | 1,80 | http://pt.farnell.com/avago-technologies/hdsp-521e/led-display-dual-0-56-he-red/dp/1003314 |
| 3 | SWITCH, PUSHBUTTON | TE CONNECTIVITY / ALCOSWITCH | 0,37 | 1,11 | http://pt.farnell.com/te-connectivity-alcoswitch/4-1437565-2/switch-pushbutton/dp/2060823 |
| 1 | MOSFET, N CH, 60V, 340MA, SOT23 | NXP | 0,08 | 0,08 | http://pt.farnell.com/nxp/2n7002k/mosfet-n-ch-60v-340ma-sot23/dp/1758065 |
| 1 | RESISTOR, ANTI SULFUR, 1K5, 0805, 5% | MULTICOMP | 0,01 | 0,01 | http://pt.farnell.com/multicomp/mcmr08x152-jtl/resistor-anti-sulfur-1k5-0805-5/dp/2073648 |
| 3 | RESISTOR, ANTI SULFUR, 1K, 0805, 5% | MULTICOMP | 0,01 | 0,03 | http://pt.farnell.com/multicomp/mcmr08x102-jtl/resistor-anti-sulfur-1k-0805-5/dp/2073611 |
| 6 | RESISTOR, ANTI SULFUR, 10K, 0805, 5% | MULTICOMP | 0,01 | 0,07 | http://pt.farnell.com/multicomp/mcsr08x103-jtl/resistor-anti-sulfur-10k-0805-5/dp/2074340 |
| 1 | RESISTOR, 0805, 12K, 1%, ANTI SULFUR | MULTICOMP | 0,01 | 0,01 | http://pt.farnell.com/multicomp/mcmr08x1202ftl/resistor-0805-12k-1-anti-sulfur/dp/2073625 |
| 30 | JUMPER, 2.54MM, BLACK | FISCHER ELEKTRONIK | 0,16 | 4,80 | http://pt.farnell.com/fischer-elektronik/cab-4-gs/jumper-2-54mm-black/dp/9728970 |
| 1 | USB-UART/FIFO, 2232, DUAL, 64LQFP | FTDI | 10,81 | 10,81 | http://pt.farnell.com/ftdi/ft2232hl-r/usb-uart-fifo-2232-dual-64lqfp/dp/1697461 |
| 1 | V REG LDO +3.3V, SMD, 1117, DPAK-3 | STMICROELECTRONICS | 0,17 | 0,17 | http://pt.farnell.com/stmicroelectronics/ld1117dt33c/v-reg-ldo-3-3v-smd-1117-dpak-3/dp/1087170 |
| 1 | XTAL, 12.000MHZ, 18PF, SMD, 5.0X3.2 | TXC | 1,00 | 1,00 | http://pt.farnell.com/txc/7a-12-000maaj-t/xtal-12-000mhz-18pf-smd-5-0x3-2/dp/1841940 |
| 1 | SOCKET, LOW VOLTAGE | MULTICOMP | 0,64 | 0,64 | http://pt.farnell.com/multicomp/mj-179ph/socket-low-voltage/dp/1737246 |
| 1 | MINI USB TYPE B, RECEPTACLE | TE CONNECTIVITY / AMP | 1,06 | 1,06 | http://pt.farnell.com/te-connectivity-amp/1734035-1/mini-usb-type-b-receptacle/dp/1654060 |
| 5 | MLCC, 0805, Y5V, 25V, 100NF | MULTICOMP | 0,01 | 0,05 | http://pt.farnell.com/multicomp/mcca000296/mlcc-08055v-25v-100nf/dp/1759167 |
| 2 | MLCC, 0805, NP0, 50V, 27PF | MULTICOMP | 0,01 | 0,02 | http://pt.farnell.com/multicomp/mcca000323/mlcc-0805-np0-50v-27pf/dp/1759196 |
| 1 | MLCC, 0805, Y5V, 10V, 3.3UF | MULTICOMP | 0,04 | 0,04 | http://pt.farnell.com/multicomp/mcca000538/mlcc-08055v-10v-3-3uf/dp/1759417 |
| 3 | SOCKET IC, DIL, 0.3", 20WAY | MULTICOMP | 0,12 | 0,35 | http://pt.farnell.com/multicomp/2227-20-03-07/socket-ic-dil-0-3-20way/dp/4285608 |
| 1 | MLCC, 0805, Y5V, 6.3V, 10UF | MULTICOMP | 0,04 | 0,04 | http://pt.farnell.com/multicomp/mcca000268/mlcc-08055v-6-3v-10uf/dp/1759136 |
| 1 | Printed Circuit Board | | 3,50 | 3,50 | |

Total Cost per Board: € 33,42

LCD and Keypad Add-On Board

| Qty | Description | Manufacturer | Unit Cost (€) | Total Cost (€) | URL |
|---|---|---|---|---|---|
| 16 | SWITCH, PUSHBUTTON | TE CONNECTIVITY / ALCOSWITCH | 0,37 | 5,92 | http://pt.farnell.com/te-connectivity-alcoswitch/4-1437565-2/switch-pushbutton/dp/2060823 |
| 1 | 128x64 COG FSTN, POS MODE, white LED b/l | DisplayTech | 10,07 | 10,07 | http://pt.rs-online.com/web/p/lcd-monochrome-displays/0564399/ |
| 4 | MLCC, 0805, Y5V, 10V, 3.3UF | MULTICOMP | 0,04 | 0,18 | http://pt.farnell.com/multicomp/mcca000538/mlcc-08055v-10v-3-3uf/dp/1759417 |
| 5 | MLCC, 0805, Y5V, 25V, 100NF | MULTICOMP | 0,01 | 0,05 | http://pt.farnell.com/multicomp/mcca000296/mlcc-08055v-25v-100nf/dp/1759167 |
| 4 | RESISTOR, ANTI SULFUR, 10K, 0805, 5% | MULTICOMP | 0,01 | 0,04 | http://pt.farnell.com/multicomp/mcsr08x103-jtl/resistor-anti-sulfur-10k-0805-5/dp/2074340 |
| 2 | SOCKET, PCB, 1 ROW, 8WAY | MULTICOMP | 0,85 | 1,70 | http://pt.farnell.com/multicomp/2212s-08sg-85/socket-pcb-1-row-8way/dp/1593463 |
| 1 | SOCKET, PCB, 1 ROW, 10WAY | MULTICOMP | 1,01 | 1,01 | http://pt.farnell.com/multicomp/2212s-10sg-85/socket-pcb-1-row-10way/dp/1593464 |
| 1 | Printed Circuit Board | | 2,70 | 2,70 | |

Total Cost per Board: € 21,67

Appendix B

Test Code

B.1 Instructions Timings

    ; Test MOVL instruction
    MOV  R0, 10
    MOV  R1, 20
    MOV  R2, -1

    ; Test XOR
    XOR  R1, R2
    XOR  R1, R2

    ; Test JMP
    JMP  label1
    NOP
    NOP
    NOP
    NOP
    NOP
label1:
    JMP  label2
    NOP
label2:
    ROLC R2, 3
    ROLC R2, 5
    SHLA R2, 3
    SHLA R2, 5
    MUL  R1, R0
    DIV  R1, R0
    MOD  R1, R2

B.2 Speed Test

B.2.1 PEPE Program

;**********************************************************************
; * Symbols
;**********************************************************************
M1_RCU      EQU 0D000H      ; UART's Control Register
M1_REP      EQU 0D002H      ; UART's Status Register
M1_RD1      EQU 0D004H      ; UART's RX Port
M1_RD2      EQU 0D006H      ; UART's TX Port
BUTTONS     EQU 8000H       ; Buttons Address
LEDS        EQU 9000H       ; LEDs Address
N           EQU 1000        ; Top limit for the primes / 2


PLACE 1000h
PRIMES:      TABLE N
PRESSBUTTON: STRING 'P', 'r', 'e', 's', 's', ' ', 'B', 'u', 't', 't', 'o', 'n', '.', '.', '.', 0Ah, 0
STARTING:    STRING 'S', 't', 'a', 'r', 't', 'i', 'n', 'g', '!', 0Ah, 0
DONE:        STRING 0Ah, 'D', 'o', 'n', 'e', '.', 0Ah, 0

PLACE 0000h

    MOV  SP, 500h
    CALL ini_MUART

start:
    MOV  R8, PRESSBUTTON
    CALL outString

    MOV  R0, BUTTONS
loopButtons:
    MOVB R1, [R0]
    BIT  R1, 0
    JZ   loopButtons

    MOV  R8, STARTING
    CALL outString

    ; Light all LEDs
    MOV  R0, LEDS
    MOV  R2, 0Fh
    MOVB [R0], R2

    ; Init registers
    ; R0 = i
    MOV  R1, 1              ; R1 = j
    ; R2 = 3*j
    MOV  R3, N              ; R3 = N
    ; R4 = 2*i*j + i + j - 1
    MOV  R5, PRIMES         ; R5 = PRIMES
    MOV  R6, 0              ; R6 = 0
    MOV  R7, 0101h          ; R7 = 0x0101

    ; Set the primes table
    CALL setTable

    ; Start computing the prime numbers
    ; Compare 3*j < N
outLoop:
    MOV  R2, 3
    MUL  R2, R1
    CMP  R2, R3
    JP   endOutLoop

    MOV  R0, 1              ; R0 = i
inLoop:
    MOV  R4, R0             ; R4 = i
    MUL  R4, R1             ; R4 = i*j
    ADD  R4, R4             ; R4 = 2*i*j
    ADD  R4, R0             ; R4 = 2*i*j + i
    ADD  R4, R1             ; R4 = 2*i*j + i + j
    SUB  R4, 1              ; R4 = 2*i*j + i + j - 1
    CMP  R4, R3
    JP   endInLoop

    ADD  R4, R5
    MOVB [R4], R6
    INC  R0

    JMP  inLoop

endInLoop:
    INC  R1
    JMP  outLoop

endOutLoop:
    CALL sendTable

    ; Turn off all LEDs
    MOV  R0, LEDS
    MOV  R2, 00h
    MOVB [R0], R2

    MOV  R8, DONE
    CALL outString

    JMP  start

;****************************************************************
;* setTable - initializes all the elements of the prime table as 1
;* Inputs - None
;* Outputs - None
;****************************************************************
setTable:
    PUSH R1
    MOV  R1, 0
setTableLoop:
    CMP  R1, R3
    JP   endSetTable

    MOV  [R5 + R1], R7      ; writes two table entries (0101h) at once
    ADD  R1, 2

    JMP  setTableLoop

endSetTable:
    POP  R1
    RET

;****************************************************************
;* sendTable - Sends all the primes via serial
;* Inputs - None
;* Outputs - None
;****************************************************************
sendTable:
    PUSH R0
    PUSH R1
    PUSH R2
    PUSH R4
    ; Send the primes through serial
    ; Number 2 is prime and is not on the list, must be sent first
    MOV  R8, 2
    CALL outNumber
    ; Send space
    MOV  R8, 20h
    CALL outChar

    ; Go through all numbers in the table
    MOV  R0, 0

sendTableLoop:
    CMP  R0, R3             ; R0 ...
    ...
    MOV  R2, R0
    ADD  R2, R5             ; R2 = address of PRIMES[R0]
    INC  R0

    MOVB R4, [R2]
    CMP  R4, 0
    JZ   sendTableLoop
    ; If it is prime, send R0*2 + 1 (R0 has already been incremented)
    MOV  R8, R0
    ADD  R8, R0
    ADD  R8, 1
    CALL outNumber

    ; Send space
    MOV  R8, 20h
    CALL outChar

    JMP  sendTableLoop

endSendTable:
    POP  R4
    POP  R2
    POP  R1
    POP  R0
    RET

;****************************************************************
;* ini_MUART - Initializes the UART
;* Inputs - None
;* Outputs - None
;****************************************************************

ini_MUART:
    PUSH R0
    PUSH R1
    MOV  R0, M1_RCU
    MOV  R1, 00H
    MOVB [R0], R1           ; Multiplicative factor of 16. 9600 baud for the external emulator
    POP  R1
    POP  R0
    RET

;****************************************************************
;* outNumber - Converts a number to ASCII and sends it via serial
;* Inputs - R8 - Number to Send
;* Outputs - None
;*
;****************************************************************

outNumber:
    PUSH R0
    PUSH R1
    PUSH R2
    PUSH R3
    PUSH R8

    ; Puts the digits of the input number on the stack
    MOV  R0, 0
    MOV  R2, 10
    MOV  R3, 30h            ; R3 = '0'

numberToStackLoop:
    INC  R0
    MOV  R1, R8
    MOD  R1, R2
    ADD  R1, R3
    PUSH R1
    DIV  R8, R2
    CMP  R8, 0
    JNZ  numberToStackLoop

    MOV  R8, R0             ; R8 = number of digits on the stack

    ; Check if can send
outNumberNotOk:
    MOV  R0, M1_REP
    MOVB R1, [R0]           ; Read status
    BIT  R1, 5              ; This bit will be 1 if we are allowed to send the char
    JZ   outNumberNotOk

    ; Send character
    POP  R2
    MOV  R0, M1_RD2
    MOVB [R0], R2           ; Send the char
    DEC  R8
    CMP  R8, 0
    JNZ  outNumberNotOk

    POP  R8
    POP  R3
    POP  R2
    POP  R1
    POP  R0
    RET


;****************************************************************
;* outChar - Sends a char over serial
;* Inputs - R8 - Char
;* Outputs - None
;****************************************************************

outChar:
    PUSH R0
    PUSH R1
    MOV  R0, M1_REP
outCharNotOk:
    MOVB R1, [R0]           ; Read status
    BIT  R1, 5              ; This bit will be 1 if we are allowed to send the char
    JZ   outCharNotOk
    MOV  R0, M1_RD2
    MOVB [R0], R8           ; Send the char
    POP  R1
    POP  R0
    RET

;****************************************************************
;* outString - Sends a string over serial
;* Inputs - R8 - String's address
;* Outputs - None
;*
;****************************************************************
outString:
    PUSH R0
    PUSH R1
outStringNotOk:
    MOV  R0, M1_REP
    MOVB R1, [R0]           ; Read status
    BIT  R1, 5              ; This bit will be 1 if we are allowed to send the char
    JZ   outStringNotOk
    MOV  R0, M1_RD2
    MOVB R1, [R8]
    CMP  R1, 0
    JZ   endOutString
    MOVB [R0], R1           ; Send the char
    INC  R8
    JMP  outStringNotOk
endOutString:
    POP  R1
    POP  R0
    RET
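The marking step in the program above, which clears table entry 2*i*j + i + j - 1, corresponds to the sieve of Sundaram: an entry left set at index k stands for the prime 2k + 3. A minimal host-side sketch of the same computation, which may help when checking the serial output, is given below. The function name sundaram_primes, the SIEVE_MAX bound and the output-buffer interface are illustrative assumptions and are not part of the thesis code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIEVE_MAX 4096u   /* assumed upper bound on the table size */

/*
 * Hypothetical reference routine: runs the sieve over a table of n entries
 * and writes the resulting primes (starting with 2) to out[0..cap-1].
 * Returns the number of primes written, or 0 if n is out of range.
 */
size_t sundaram_primes(unsigned n, unsigned *out, size_t cap)
{
    static uint8_t table[SIEVE_MAX];
    size_t count = 0;
    unsigned i, j, k;

    assert(out != NULL);
    if (n == 0 || n > SIEVE_MAX)
        return 0;

    /* Initially every entry is considered prime. */
    memset(table, 1, n);

    /* Clear entry m - 1 for every m = i + j + 2ij <= n. */
    for (j = 1; 3u * j < n; j++)
        for (i = 1; i + j + 2u * i * j <= n; i++)
            table[i + j + 2u * i * j - 1] = 0;

    /* 2 is prime but is not represented in the table. */
    if (count < cap)
        out[count++] = 2;

    /* Entry k represents the odd number 2k + 3. */
    for (k = 0; k < n; k++)
        if (table[k] && count < cap)
            out[count++] = 2u * k + 3u;

    return count;
}
```

Called with n = 1000, the value of N in the listing, and a buffer of at least 303 entries, the routine yields the primes from 2 to 1999.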

B.2.2 Native Program

#include <msp430.h>   // device registers (UCA0CTL1, UCSCTL4, P2IN, ...)
#include <stdint.h>   // uint8_t, uint16_t
#include <stdio.h>    // sprintf
#include <string.h>   // strlen

#define N 1000

uint8_t primes[N];

// UART Functions
void initUART_A0(void)
{
    P3SEL = 0x30;                    // P3.4,5 = USCI_A0 TXD/RXD
    UCA0CTL1 |= UCSWRST;             // **Put state machine in reset**
    UCA0CTL1 |= UCSSEL_2;            // SMCLK
    UCA0BR0 = 109;                   // 1MHz 9600 (see User's Guide)
    UCA0BR1 = 0;                     // 1MHz 9600
    UCA0MCTL |= UCBRS_2 + UCBRF_0;   // Modulation UCBRSx=2, UCBRFx=0
    UCA0CTL1 &= ~UCSWRST;            // **Initialize USCI state machine**
    UCA0IE &= ~UCRXIE;               // Disable USCI_A0 RX interrupt
}

void writeByteUART_A0(uint8_t c)
{
    while (!(UCA0IFG & UCTXIFG));    // TX buffer ready?
    UCA0TXBUF = c;
}

// Sends length bytes of str; a length of 0 means "send the whole string".
void writeStringUART_A0(char *str, uint16_t length)
{
    uint16_t i;

    if (length < 1) length = strlen(str);

    for (i = 0; i < length; ++i)
        writeByteUART_A0((uint8_t) str[i]);
}

// Clock Functions
void initClock(void)
{
    //
    // Init the SMCLK as 1MHz (default) and the MCLK as 8MHz
    //

    UCSCTL3 |= SELREF_2;             // Set DCO FLL reference = REFO
    UCSCTL4 |= SELA_2;               // Set ACLK = REFO

    __bis_SR_register(SCG0);         // Disable the FLL control loop
    UCSCTL0 = 0x0000;                // Set lowest possible DCOx, MODx
    UCSCTL1 = DCORSEL_5;             // Select DCO range 16MHz operation
    UCSCTL2 = FLLD_1 + 249;          // Set DCO Multiplier for 8MHz
                                     // (N + 1) * FLLRef = Fdco
                                     // (249 + 1) * 32768 = 8MHz
                                     // Set FLL Div = fDCOCLK/2
    UCSCTL5 = DIVS__8;               // SMCLK = DCO / 8
    __bic_SR_register(SCG0);         // Enable the FLL control loop

    // Worst-case settling time for the DCO when the DCO range bits have been
    // changed is n x 32 x 32 x f_MCLK / f_FLL_reference. See UCS chapter in 5xx
    // UG for optimization.
    // 32 x 32 x 8 MHz / 32,768 Hz = 250000 = MCLK cycles for DCO to settle
    __delay_cycles(250000);

    // Loop until XT1, XT2 & DCO fault flag is cleared
    do
    {
        UCSCTL7 &= ~(XT2OFFG + XT1LFOFFG + XT1HFOFFG + DCOFFG);
                                     // Clear XT2, XT1, DCO fault flags
        SFRIFG1 &= ~OFIFG;           // Clear fault flags
    } while (SFRIFG1 & OFIFG);       // Test oscillator fault flag
}

void main(void)
{
    char text[10];
    unsigned int i = 1, j = 1;
    WDTCTL = WDTPW + WDTHOLD;        // Stop WDT

    initUART_A0();
    initClock();

    // Init Switch 1
    P3DIR &= ~BIT6;
    // Init Switch 2, 3, 4
    P2DIR &= ~(BIT1 | BIT2 | BIT3);
    // Init LED 1, 2 and 3
    P4DIR |= BIT2 | BIT1 | BIT0;
    // Init LED 4
    P2DIR |= BIT0;

    while (1)
    {
        i = 1; j = 1;
        writeStringUART_A0("Press Button...\n", 0);
        while (!(P2IN & BIT3));
        writeStringUART_A0("Starting...\n", 0);

        // Light all LEDs
        P4OUT |= BIT2 | BIT1 | BIT0;
        P2OUT |= BIT0;

        // Initial filling of the primes table with all numbers considered prime.
        for (i = 0; i < N; i++)
            primes[i] = 1;

        while (3*j < N)
        {
            for (i = 1; (i + j + 2*i*j <= N); i++)
            {
                primes[i + j + 2*i*j - 1] = 0;
            }
            j++;
        }

        writeStringUART_A0("2 ", 2);
        for (i = 0; i < N; i++)
        {
            if (primes[i])
            {
                // Table entry i represents the odd number 2*i + 3
                sprintf(text, "%u ", i + i + 3);
                writeStringUART_A0(text, 0);
            }
        }

        writeStringUART_A0("\nDone.\n", 7);
        // Turn off all LEDs
        P4OUT &= ~(BIT2 | BIT1 | BIT0);
        P2OUT &= ~(BIT0);
    }
}
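The constants written by initUART_A0 and initClock follow from two small formulas in the MSP430x5xx family documentation: the USCI low-frequency divider N = f_BRCLK / baud, whose integer part goes into UCAxBRx and whose fractional part, multiplied by 8 and rounded, goes into UCBRSx; and the FLL relation f_DCO = (FLLN + 1) x f_ref. The sketch below reproduces that arithmetic on a host machine. The type uart_div_t and the helpers uart_divider and fll_dco_hz are hypothetical names introduced only for this illustration, and the code assumes the divider fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: integer divider and second modulation stage. */
typedef struct {
    uint16_t br;    /* value for UCAxBR1:UCAxBR0 */
    uint8_t  brs;   /* value for UCBRSx, 0..7    */
} uart_div_t;

/* Low-frequency mode: BR = INT(N), BRS = round((N - INT(N)) * 8). */
uart_div_t uart_divider(uint32_t clk_hz, uint32_t baud)
{
    uart_div_t d;
    uint32_t rem;

    assert(baud != 0 && clk_hz >= baud);

    d.br = (uint16_t)(clk_hz / baud);
    rem = clk_hz % baud;

    /* Rounded fractional part in eighths, using integer arithmetic only. */
    d.brs = (uint8_t)((rem * 8u + baud / 2u) / baud);

    /* A fraction that rounds up to 8/8 carries into the integer divider. */
    if (d.brs == 8u) {
        d.br++;
        d.brs = 0;
    }
    return d;
}

/* FLL lock frequency: f_DCO = (FLLN + 1) * f_ref. */
uint32_t fll_dco_hz(uint16_t flln, uint32_t ref_hz)
{
    return ((uint32_t)flln + 1u) * ref_hz;
}
```

For a 1,048,576 Hz clock and 9600 baud the helper gives 109 and 2, the values used in the listing, and fll_dco_hz(249, 32768) gives 8,192,000 Hz. If SMCLK is instead taken as that DCO frequency divided by 8 (1,024,000 Hz), the same helper gives 106 and 5.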

B.3 Virtual Memory

;********************************************************
; This program measures the time taken to change pages.
; Running it consumes flash write cycles.
;********************************************************

; Symbols
STACK       EQU 1000h
PAGESTART   EQU 2000h
PAGEEND     EQU 4000h
SEGMENTSIZE EQU 512


PLACE 0000h
    MOV  SP, STACK
    MOV  R0, PAGESTART
    MOV  R1, PAGEEND
    MOV  R2, SEGMENTSIZE
    MOV  R3, 0A53Fh
    MOV  R4, 17
    MOV  R5, 1
    ; Once we write to R0 we will change the page.
    ; As nothing has been written to the first page yet,
    ; this change should be quick.
start:
    MOV  R6, R5
    INC  R3                 ; Increment R3 so the CRC changes every time
loop:
    ; Write something in every segment
    MOV  [R0], R3
    ADD  R0, R2
    DEC  R6
    JNZ  loop
    ; Write to the first page;
    ; this will change the page
    PUSH R3
    POP  R3
    ; Now let's check the memory
    MOV  R0, PAGESTART
    MOV  R6, R5
loop1:
    MOV  R7, [R0]
    CMP  R7, R3
    JNZ  error
    DEC  R6
    JNZ  loop1

    ; If everything went well, let's write one more page
    INC  R5
    CMP  R5, R4
    JNZ  start
end:
    JMP  end
error:
    JMP  error

B.4 LCD

; Define the addresses for the peripherals
LCD     EQU 0C000h
LCDEND  EQU 0C080h
STACK   EQU 1FFEh

PLACE 0000h
    MOV  SP, STACK
    MOV  R0, LCD
    MOV  R2, LCDEND
loop:
    MOV  R1, 0CCCCh
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    MOV  R1, 3333h
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    MOV  [R0], R1
    ADD  R0, 2
    CMP  R2, R0
    JNZ  loop
fim:
    JMP  fim

Bibliography

[1] James E. Smith and Ravi Nair. Virtual Machines – Versatile Platforms For Systems And Processes, chapter 2. Denise E.M. Penrose, 1st edition, 2005.

[2] José Delgado and Carlos Ribeiro. Arquitectura de Computadores. FCA - Editora de Informática, Lda., 4th edition, September 2010.

[3] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC '05, pages 41–41, Berkeley, CA, USA, 2005. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1247360.1247401.

[4] Apple Inc. Universal binary programming guidelines, second edition, February 2009. URL http://developer.apple.com/legacy/mac/library/documentation/MacOSX/Conceptual/universal_binary/universal_binary.pdf.

[5] Chuck Moore. Forth - the early years. http://www.colorforth.com/HOPL.html, 1999.

[6] Rui M. R. Rocha and José C. M. Delgado. Projecto de arquitectura de computadores - jogo de defesa antiaérea, 2011/2012.

[7] Texas Instruments Inc. MSP430F5438A Datasheet, October 2010. URL http://www.ti.com/lit/ds/symlink/msp430f5438a.pdf. SLAS655B.

[8] Sitronix. ST7565R Datasheet, March 2006. URL http://www.lcd-module.de/eng/pdf/zubehoer/st7565r.pdf.

[9] José Miguel de Carvalho Catela Teixeira. Moteist++: A hardware platform for wireless sensor networks, November 2009.

[10] STMicroelectronics. LD1117 Series Datasheet, July 2005. URL http://www.farnell.com/datasheets/93498.pdf.

[11] Future Technology Devices International Ltd. FT2232H Datasheet, June 2012. URL http://www.ftdichip.com/Support/Documents/DataSheets/ICs/DS_FT2232H.pdf. FT_000061.

[12] Agilent Technologies Inc. HDSP-521E Datasheet, July 2004. URL http://www.farnell.com/datasheets/95204.pdf.

[13] Texas Instruments Inc. TLC5927 Datasheet, July 2008. URL http://www.ti.com/lit/ds/symlink/tlc5927.pdf. SLVS677.

[14] Displaytech Ltd. 64128K-FC-BW-3 Datasheet, June 2008. URL http://docs-europe.electrocomponents.com/webdocs/0bf5/0900766b80bf5313.pdf.