4.12.2014
Computer System Structures cz:Struktury počítačových systémů
Version: 1.0
Lecturer: Richard Šusta
ČVUT-FEL in Prague, CR – subject A0B35SPS
Microprocessors
Definition: Microprocessor
A microprocessor is a single VLSI chip that contains a CPU and may also contain other units (for example caches, a floating-point arithmetic unit, pipelining, and superscalar units).

Source      Microprocessor family
Motorola    68HCxxx
Intel       80x86, i860
Sun         SPARC
IBM         PowerPC

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design
Many types from various manufacturers have existed and still exist today: http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum_contents/Microprocessor_Evolution_Poster.jpg
From 8 bit to 32 bit CISC processors
8-bit µ-processor               8008    8080    8085    Z80
Year                            1972    1974    1976    1976
Instructions                    66      111     113     158
MIPS                            0.06    0.29    0.37    0.58
Minimum system (IO circuits)    20-40   7       3       3

16- and 32-bit processors, with a 64-bit example (instruction counts are for the processor + numeric coprocessor):

Intel µ-processor    8086/88     80286         80386     80486     Pentium    i5-2500K Quad core (64-bit example)
Year                 1978/79     1982          1985-91   1989-94   1993-98    2011
Instructions         133         ~175          ~195      ~200      ~225       ~500
MIPS                 0.33-0.75   0.9-2.6       2.5-11    16-70     100-300    ~80000
Memory               1 MB        16 MB / 1 GB  4 GB / 64 TB (80386 to Pentium)    32 GB / 256 TB
(physical / virtual)
Definition: Microcontroller
A microcontroller is a single-chip VLSI unit which, though having limited computational capabilities, possesses enhanced input-output capabilities and a number of on-chip functional units.

Source      Microcontroller family
Motorola    68HC11xx, HC12xx, HC16xx
Intel       80x86, 8051, 80251
Microchip   PIC 16F84, 16F876, PIC18
ARM         ARM9, ARM7

Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design
Microcontroller
[Figure: a microcontroller (uC) with auxiliary units. Input devices 1 to 3 are connected through input interfaces, output devices 1 and 2 through output interfaces.]
Example: “Original” 8051 Microcontroller
[Block diagram: units of the 8051 connected to the CPU by an internal data bus]
- Oscillator and timing
- 4096 bytes of program memory
- 128 bytes of data memory
- Two 16-bit timer/event counters
- 64 KByte bus expansion control (address and data bus)
- Programmable I/O (parallel ports, I/O pins)
- Programmable serial port: full duplex UART, synchronous shifter (serial input, serial output)
- Interrupt subsystem (external interrupts)
Source: Raj Kamal, Embedded Systems: Architecture, Programming and Design
Hardcore/Softcore Processors
Hard-core processors: a fabricated integrated circuit that may or may not be embedded into additional logic.
Soft-core processors: a processor described in an HDL and implemented in reconfigurable logic (i.e. in an FPGA), e.g. NIOS, MicroBlaze, OpenRISC.
Soft-core Processors: Logic Elements
[Bar chart on a logarithmic axis from 100 to 100000 logic elements]

Processor                 Logic elements
Ultra SPARC S1 Core/f     60000
S1 Core                   37000
OpenRISC 1200             6000
LEON2                     5000
LEON3                     3500
ARM Cortex-M1             2600
aeMB                      2536
LatticeMico32             1984
OpenFire                  1928
Nios II/f                 1800
Xilinx MicroBlaze         1324
Nios II/s                 1170
DSPuva16                  510
Nios II/e                 390
PacoBlaze                 204
LatticeMico8              200
Xilinx PicoBlaze          192
Nios II: RISC soft-core processor
Copyright © 2005 Altera Corporation
Nios II Versions
Nios II Processor Comes In Three ISA (Instruction Set Architecture) Compatible Versions
FAST: Optimized for Speed
STANDARD: Balanced for Speed and Size
ECONOMY: Optimized for Size
Software code is binary compatible: no changes are required when the CPU is changed.
Nios II: Hard Numbers
             Nios II/f             Nios II/s             Nios II/e
Stratix II   200 DMIPS @ 175 MHz   90 DMIPS @ 175 MHz    28 DMIPS @ 190 MHz
             1180 LEs              800 LEs               400 LEs
Cyclone      100 DMIPS @ 125 MHz   62 DMIPS @ 125 MHz    20 DMIPS @ 140 MHz
             1800 LEs              1200 LEs              550 LEs

DMIPS = Dhrystone benchmark MIPS (million instructions per second). Dhrystone contains no floating-point operations (note: the Whetstone benchmark does).
For comparison, a Pentium (1996) reaches 224 DMIPS @ 200 MHz, which scales linearly to about 196 DMIPS @ 175 MHz and about 140 DMIPS @ 125 MHz.
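The Pentium comparison is a simple proportion. A minimal sketch, assuming that the Dhrystone score scales linearly with the clock frequency (the function name scale_dmips is only illustrative):

```c
#include <assert.h>

/* Scale a Dhrystone score to another clock frequency.
   Assumption: DMIPS is proportional to the clock rate of the same core. */
static double scale_dmips(double dmips, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0);
    return dmips * to_mhz / from_mhz;
}
```

With the slide's numbers, scale_dmips(224, 200, 175) gives 196 and scale_dmips(224, 200, 125) gives 140, so the Nios II/f on Stratix II is roughly comparable to this Pentium at the same clock.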
Processor Core Variations

                                    Nios II/f            Nios II/s      Nios II/e
                                    Fast                 Standard       Economy
Pipeline                            6 stage              5 stage        None
H/W multiplier & barrel shifter     1 cycle              3 cycle        Emulated in software
Branch prediction                   Dynamic              Static         None
Instruction cache                   Configurable         Configurable   None
Data cache                          Configurable         None           None
Logic requirements (typical LEs)    1800 w/o MMU         1200           600
                                    3200 w/ MMU
Custom instructions                 Up to 256

MMU = Memory Management Unit
© 2010 Altera Corporation (Public). ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S.
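Multicycle CPU
[State diagram of the multicycle control unit, states 0 to 9. From Start, state 0 (instruction fetch, advance PC) leads to state 1 (instruction decode, register fetch, branch address). Depending on the instruction type (load, store, register operation, jump, branch), the following states compute the memory address, read memory data, write memory data, perform the ALU operation, write the register, write the PC on a branch condition, or write the jump address to the PC.]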
Different Execution Time of Instructions
[Figure: the units used by each instruction type: instruction memory, registers, ALU, data memory, registers; the PC is used by all. Rows are ordered from fastest to slowest: Jump, Branch, Compute, Load, Store. Jump leaves the most units unused, Load uses all of them.]
Single-cycle versus multicycle instruction execution.
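[Figure: timing of Instr 1 to Instr 4. Single-cycle execution: every instruction is allotted the same long clock period, although the time needed differs. Multicycle execution with a shorter clock: the instructions take 3, 5, 3, and 4 cycles, so the time allotted follows the time needed and time is saved.]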
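The saving in the figure can be checked with a little arithmetic. This is a sketch under one assumption that the slide does not state explicitly: the single-cycle clock period equals the longest instruction, i.e. 5 of the short cycles. The function names are illustrative.

```c
/* Total time, in short clock cycles, of a multicycle CPU:
   every instruction takes only the cycles it needs. */
static int multicycle_total(const int *cycles, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += cycles[i];
    return sum;
}

/* Total time of a single-cycle CPU measured in the same short cycles:
   every instruction is allotted the time of the slowest one (assumption). */
static int singlecycle_total(const int *cycles, int n)
{
    int longest = 0;
    for (int i = 0; i < n; i++)
        if (cycles[i] > longest)
            longest = cycles[i];
    return longest * n;
}
```

For the four instructions of the figure (3, 5, 3, 4 cycles) the multicycle design needs 15 short cycles, the single-cycle design 20.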
Internal Registers in CPUs
[Figure: processor, memory, and input/output connected by the address bus and the data bus. The processor contains the control unit, the ALU, and the internal registers listed below.]
PC    Program counter
AC    Accumulator
IR    Instruction register
MAR   Memory address register
MDR   Memory data register
Cycles of Instructions
[Figure: the instruction cycle: fetch instruction, decode opcode, fetch operands, execute operation, store result; the steps use the registers IR, MAR, and MDR.]
Pipelining
Controlling a processor with pipelined instruction processing is not simple. The authors of the MIPS processor, which is similar to NIOS, report in their book (Patterson, D., Hennessy, J.: Computer Organization and Design, Elsevier) that the pipelining unit they designed, successfully tested by at least 100 people at 18 different sites, contained a bug hidden in several thousand lines of HDL code, which was found only after prolonged use. More follows in the course A0B36APO Computer Architectures (Architektura počítačů).
Minimum Nios II Processor
[Block diagram of the Nios II processor core. Inputs: reset, clock. Blocks: program controller & address generation, general purpose registers, status & control registers, arithmetic logic unit, instruction master port, data master port. The legend distinguishes configurable and fixed blocks.]
Outside NIOS
Image source: Tron
Simple NIOS System on DE2
[Block diagram: the Nios processor is connected to a 32-bit Avalon bus (Address (32), Read, Write, Data In (32), Data Out (32)). Peripherals on the bus: Switch PIO, LED PIO, 7-Segment LED PIO, PIO-32, UART, Timer, User-Defined Interface.]
Nios II/e on Basic Computer for DE2
NIOS DE2 Basic Computer
Start-address   End-address   Element        Size
0x00000000      0x007FFFFF    SDRAM          8 MB RAM
0x08000000      0x0807FFFF    ROM (SRAM)     512 kB read-only
0x09000000      0x09001FFF    On-chip-RAM    8 kB
0x10000000      0x10000003    LEDR           4 bytes, write-only
0x10000010      0x10000013    LEDG           4 bytes, write-only
0x10000020      0x10000023    HEX3_HEX0      4 bytes, write-only
0x10000030      0x10000033    HEX7_HEX4      4 bytes, write-only
0x10000040      0x10000043    SW             4 bytes, read-only
0x10000050      0x1000005F    KEY_BASE       4 bytes data + 12 bytes control
..                            other I/O

The address ranges are larger than the memories placed in them: the SDRAM uses 8 MB of a 128 MB range (120 MB unused), the SRAM 512 kB of a 16 MB range (15.5 MB unused), and the on-chip RAM 8 kB of a 112 MB range (almost all unused).
Memory Map of Basic Computer
SDRAM: synchronous dynamic RAM, i.e. synchronized with the system bus; on the DE2 board 1M x 16 bits x 4 banks; addresses 0x00000000 to 0x007FFFFF.
SRAM: static RAM chip; on the DE2 board 256K x 16 bits; address space 0x08000000 to 0x0807FFFF.
On-chip memory: 8-Kbyte memory inside the Cyclone II FPGA chip; 2K x 32 bits; addresses 0x09000000 to 0x09001FFF.
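The memory map can be written down as a small lookup table. The following is only a sketch for experimenting on a PC: the addresses are taken from the table of the Basic Computer, while the type region_t and the function region_of are hypothetical names, not part of any Altera library.

```c
#include <stdint.h>
#include <stddef.h>

/* One entry of the memory map: first address, last address, name. */
typedef struct {
    uint32_t start;
    uint32_t end;
    const char *name;
} region_t;

/* Address ranges copied from the DE2 Basic Computer memory map. */
static const region_t de2_map[] = {
    {0x00000000u, 0x007FFFFFu, "SDRAM"},
    {0x08000000u, 0x0807FFFFu, "SRAM"},
    {0x09000000u, 0x09001FFFu, "On-chip RAM"},
    {0x10000000u, 0x10000003u, "LEDR"},
    {0x10000010u, 0x10000013u, "LEDG"},
    {0x10000020u, 0x10000023u, "HEX3_HEX0"},
    {0x10000030u, 0x10000033u, "HEX7_HEX4"},
    {0x10000040u, 0x10000043u, "SW"},
    {0x10000050u, 0x1000005Fu, "KEY"},
};

/* Return the region that contains addr, or NULL for an unused address. */
static const region_t *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof de2_map / sizeof de2_map[0]; i++)
        if (addr >= de2_map[i].start && addr <= de2_map[i].end)
            return &de2_map[i];
    return NULL;
}
```

The size of a region is end - start + 1, which reproduces the 8 MB, 512 kB, and 8 kB of the table; an address such as 0x01000000 falls into an unused gap.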
Nios II/s: DE2 Media Computer
NIOS DE2 Media Computer
Start-address   End-address   Element        Size
0x00000000      0x007FFFFF    SDRAM          8 MB synchronous DRAM
0x08000000      0x0807FFFF    SRAM           512 kB, VGA pixel buffer
0x09000000      0x09001FFF    On-chip-RAM    8 kB, VGA character buffer
0x10000000      0x10000003    LEDR           4 bytes, write-only
0x10000010      0x10000013    LEDG           4 bytes, write-only
0x10000020      0x10000023    HEX3_HEX0      4 bytes, write-only
0x10000030      0x10000033    HEX7_HEX4      4 bytes, write-only
0x10000040      0x10000043    SW             4 bytes, read-only
0x10000050      0x1000005F    KEY_BASE       4 bytes data + 12 bytes control
..                            other I/O

As in the Basic Computer, the SDRAM occupies 8 MB of a 128 MB range, the SRAM 512 kB of a 16 MB range, and the on-chip RAM 8 kB of a 112 MB range.
I/O DE2 Basic & Media Computer
0x10000000  LEDR_BASE
0x10000010  LEDG_BASE
0x10000020  HEX3_HEX0_BASE
0x10000030  HEX7_HEX4_BASE
0x10000040  SW_BASE
0x10000050  KEY_BASE
...         other I/O
Literature on DVD: DE2_Media_Computer.pdf
Switch and Push Button
SW_BASE
KEY_BASE
LED
LEDR_BASE
LEDG_BASE
HEX
HEX3_HEX0_BASE
HEX7_HEX4_BASE
Wiki: Memory Mapped IO
Memory-mapped I/O (MMIO) and port I/O (also called isolated I/O or port-mapped I/O abbreviated PMIO) are two complementary methods of performing input/output between the CPU and peripheral devices in a computer. An alternative approach, not discussed here, is using dedicated I/O processors — commonly known as channels on mainframe computers — that execute their own instructions. Memory-mapped I/O (not to be confused with memory-mapped file I/O) uses the same address bus to address both memory and I/O devices ‒ the memory and registers of the I/O devices are mapped to (associated with) address values. So when an address is accessed by the CPU, it may refer to a portion of physical RAM, but it can also refer to memory of the I/O device. Thus, the CPU instructions used to access the memory can also be used for accessing devices. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. To accommodate the I/O devices, areas of the addresses used by the CPU must be reserved for I/O and must not be available for normal physical memory. The reservation might be temporary — the Commodore 64 could bank switch between its I/O devices and regular memory — or permanent. Port-mapped I/O often uses a special class of CPU instructions specifically for performing I/O. This is found on Intel microprocessors, with the IN and OUT instructions. These instructions can read and write one to four bytes (outb, outw, outl) to an I/O device. I/O devices have a separate address space from general memory, either accomplished by an extra "I/O" pin on the CPU's physical interface, or an entire bus dedicated to I/O. Because the address space for I/O is isolated from that for main memory, this is sometimes referred to as isolated I/O. [ http://en.wikipedia.org/wiki/Memory-mapped_I/O ]
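In C, memory-mapped I/O looks like an ordinary pointer access. A minimal sketch, assuming the DE2 computers described in this lecture (SW at 0x10000040, LEDR at 0x10000000, 18 switches and 18 red LEDs); the function name copy_switches_to_leds is hypothetical, and the register addresses are passed as parameters so that the code can also be tried on a PC with ordinary variables:

```c
#include <stdint.h>

/* Copy the state of the slider switches to the red LEDs.
   On the board one would call it with the fixed addresses, e.g.
   copy_switches_to_leds((volatile const uint32_t *)0x10000040,
                         (volatile uint32_t *)0x10000000);
   (addresses taken from the memory map of this lecture). */
static void copy_switches_to_leds(volatile const uint32_t *sw,
                                  volatile uint32_t *ledr)
{
    /* One load and one store: the same instructions that access memory
       also access the device registers. Only bits 17..0 are used. */
    *ledr = *sw & 0x3FFFFu;
}
```

No special I/O instruction is needed; with port-mapped I/O the same job would require dedicated instructions such as IN and OUT.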
Inside NIOS
[Block diagram of a Nios II system inside an FPGA: Nios II CPU with debug unit and cache, UART, GPIO, timer, SPI, on-chip ROM, on-chip RAM, and SDRAM controller, all connected by the Avalon switch fabric.]
NIOS Simplified System
.text
.data
NIOS: Common Register Usage
Reg         Name   Normal usage
r0          zero   0x0000_0000
r1          at     Assembler Temporary
r2 to r15          (no special name)
r16 to r23         (no special name)
r24         et     Exception Temporary
r25         bt     Break Temporary
r26         gp     Global Pointer
r27         sp     Stack Pointer
r28         fp     Frame Pointer
r29         ea     Exception Return Address
r30         ba     Break Return Address
r31         ra     Return Address
NIOS Assembler
The NIOS assembler is the GNU assembler:
http://tigcc.ticalc.org/doc/gnuasm.html
http://sources.redhat.com/binutils/docs-2.12/as.info/
Altera Monitor, literature on DVD:
• Doporuceno_Altera_Monitor_Program_Tutorial.pdf
• n2cpu_nii5v1.pdf (instruction set reference)
General Processor Instruction Formats
Three-address instructions:   ADD o1, o2, o3    o1 ← o2 + o3
Two-address instructions:     MOV o1, o2        o1 ← o2
One-address instructions:     NEXTPC o1         o1 ← PC + 4
Zero-address instructions:    NOP               r1 ← r1 add r0 (NIOS)

R-Type: operands oi are registers only
I-Type: instruction contains an immediate value
C-Type: control instruction
NIOS Addressing Modes
(a) Register direct addressing: the register contains the operand.
    mov R2, R1
(b) Immediate addressing: the instruction contains the operand.
    movi r2, -20
(c) Displacement (or offset) addressing: address of operand = register + constant.
    stw R2, byte_offset(R1)
    stw R2, 100(R1)      : R2 → M[R1+100]
RISC vs CISC - Pentium Addressing Modes Only for self-study
LA = linear address ~ EA [Source: Intel co.]
Load/Store
ldw (load 32-bit word from memory)
    ldw rB, im-const(rA) : rB ← MEM[rA + im-const]
stw (store 32-bit word into memory)
    stw rB, im-const(rA) : rB → MEM[rA + im-const]
Example - demo1.s
/*.include "nios_macros.s"*/
.equ LEDR_BASE, 0x10000000
.equ SW_BASE, 0x10000040

.text                       /* executable code follows */
.global _start
_start:
    /* initialize base addresses of parallel ports */
    movia r10, SW_BASE      /* SW slider switch base address */
    movia r12, LEDR_BASE    /* red LED base address */
LOOP:
    ldwio r8, 0(r10)        /* read the slider switches */
    stwio r8, 0(r12)        /* write the value to the red LEDs */
    br LOOP
.end

Pseudo-instruction: movia r12, 0x10000040 is translated into
    orhi r12, r0, 0x1000
    addi r12, r12, 0x0040
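The split of a 32-bit constant into the two 16-bit immediates of movia can be sketched in C. The sketch assumes the usual Nios II expansion into orhi with the adjusted upper half and addi with the lower half; because addi sign-extends its immediate, the upper half must be increased by 1 whenever bit 15 of the constant is set. The function names are illustrative only.

```c
#include <stdint.h>

/* Lower 16 bits: the immediate of addi. */
static uint16_t lo16(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}

/* Upper 16 bits adjusted for the sign extension done by addi:
   the immediate of orhi. */
static uint16_t hiadj16(uint32_t v)
{
    return (uint16_t)(((v >> 16) + ((v >> 15) & 1u)) & 0xFFFFu);
}

/* Model of the two instructions: orhi rB, r0, hiadj; addi rB, rB, lo */
static uint32_t movia_result(uint32_t v)
{
    uint32_t reg = (uint32_t)hiadj16(v) << 16;        /* orhi               */
    uint32_t lo = lo16(v);
    int32_t imm = (lo & 0x8000u) ? (int32_t)lo - 0x10000 : (int32_t)lo;
    return reg + (uint32_t)imm;                       /* addi, sign-extended */
}
```

For 0x10000040 the halves are 0x1000 and 0x0040, as on the slide; for a constant such as 0x1000FFF0 the upper half becomes 0x1001 and the sign-extended lower half subtracts the extra 0x10000 again.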
Some GNU assembler directives
.include   inserts another file.
.text      tells the assembler to place the following statements at the end of the code section.
.global    makes the symbol visible to the program loading instructions into memory.
.data      tells the assembler to place the following statements at the end of the data section.
.equ       sets the value of a symbol.
.word      sets the content of memory locations to the specified value.
.end       marks the end of the current assembly program.
Data Types in Instructions NIOS II registers hold 32-bit (4-byte) words.
Other common data sizes include byte and halfword.
Byte = 8 bits; example: ldb / ldbu, ldbio / ldbuio (load byte, extended as signed/unsigned)
Halfword = 2 bytes; example: ldh / ldhu, ldhio / ldhuio (load halfword, extended as signed/unsigned)
Word = 4 bytes; example: ldw, ldwio (load word)
Doubleword = 8 bytes; user-defined instructions only

u: unsigned extension; io: bypass the cache
Unsigned/Signed Extension
[Figure: extension of a k-bit value X by m bits. Unsigned extension fills the m added upper bits with 0; signed extension fills them with copies of the highest bit of X.]
ALU Instructions

operation   R-format                   I-format                      note
add         add                        addi
subtract    sub                        subi (pseudo-instruction)
multiply    mul                        muli                          lower 32 bits of the 64-bit result
multiply    mulxuu, mulxsu, mulxss                                   high 32 bits of the 64-bit result
AND         and                        andi, andhi
OR          or                         ori, orhi
XOR         xor                        xori, xorhi
NOR         nor

addi: rB ← rA + se(number), where se = sign-extension.
The logic instructions AND, OR, XOR, NOR do not use sign-extension.
Increasing Performance
Image source: Tron
3 Ways to Increase Performance
1. Custom instructions: accelerate CPU performance with application-specific hardware.
2. Hardware accelerators: add external co-processing hardware to accelerate data functions.
3. Multi-processor system: add more processors (internal &/or external) to increase processing power.
[Figure: three FPGAs: a Nios II with custom instructions, a Nios II with a hardware accelerator, and several Nios II cores together with an external CPU or DSP connected over PCIe.]
Accelerating Software in FPGA
[Figure: bar chart of iterations/second (axis 0 to 2,500) for three variants: software only, custom instruction (27 times faster), and custom accelerator with DMA (530 times faster). The block diagrams show custom logic (inputs A and B, operators +, -, <<, >>, output Out) attached to the Nios II embedded processor as a custom instruction, and an accelerator with DMA that shares the program memory and data memories with the Nios II through arbiters.]
• Accelerator running 64Kb CRC at 100 MHz
(CRC can be easily calculated on a Linear Feedback Shift Register, see the next lecture)
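To see why a CRC is such a good candidate for acceleration, here is a bit-serial software version that mimics a shift register with feedback. It is only a sketch: the slide does not say which CRC was measured, so the 16-bit polynomial x^16 + x^12 + x^5 + 1 (0x1021) and the function name crc16_bitwise are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial CRC-16, polynomial 0x1021, most significant bit first.
   crc is the initial register value (e.g. 0xFFFF or 0x0000). */
static uint16_t crc16_bitwise(const uint8_t *data, size_t len, uint16_t crc)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);   /* feed next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)                        /* feedback bit set */
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Software needs a test, a shift, and an XOR for every single bit, while hardware performs the same step in one clock cycle with a few flip-flops and XOR gates.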
Altera C2H – the SW Accelerator Solution
Generates and integrates a custom hardware accelerator from an ANSI C function.
[Figure: the CPU and the C2H accelerator access the program memory and the data memories through arbiters.]
Expander created from modified VGAFrame
[Schematic: the PLL altpll0 (inclk0 = CLOCK_27, 27.000 MHz, normal operation mode, clock ratio 1007/1080, phase 0.00 deg, duty cycle 50 %) generates the 25.175 MHz clock for VGAsynchro. VGAsynchro passes yrow[9..0], xcolumn[9..0], VGA_CLK, VGA_BLANK, VGA_HS, VGA_VS, and VGA_SYNC to VGANiosRectangle, which drives the outputs VGA_R[9..0], VGA_G[9..0], VGA_B[9..0], VGA_CLK, VGA_BLANK, VGA_HS, VGA_VS, and VGA_SYNC. The input SW[17..0] is connected to datain[17..0]; the LEDR outputs show the bits of the datain regions.]

datain(17): enable write
datain(15 downto 12): memory index (0-x0, 1-y0, 2-xwidth, 3-yheight; 4 to 6 - RGB color of background, 8 to 10 - RGB color of the rectangle)

VGANiosRectangle
[Figure: inputs yrow[9..0], xcolumn[9..0], VGA_CLK, ACLRN, datain[17..0]; outputs VGA_R[9..0], VGA_G[9..0], VGA_B[9..0]. The process frame (VGA_CLK) draws a rectangle at x0, y0 with size xwidth, yheight; the process store (VGA_CLK, ACLRN) writes the parameters.]

Parameters of the picture (memory index):
0    1    2    3    4   5   6   8    9    10
x0   y0   xw   yh   R   G   B   Rb   Gb   Bb

    subtype datain_t is unsigned(9 downto 0);
    type memory_t is array(0 to 15) of datain_t;
    signal memory : memory_t;
Top entity of Basic Computer
DE2 Basic Computer Schema Detail
[Schematic symbol DE2_Basic_Computer with its pins]
Inputs: CLOCK_50, CLOCK_27, KEY[3..0], SW[17..0], UART_RXD
Bidirectional: GPIO_0[35..0], GPIO_1[35..0], SRAM_DQ[15..0], DRAM_DQ[15..0]
Outputs: LEDG[8..0], LEDR[17..0], HEX0[6..0] to HEX7[6..0], SRAM_ADDR[17..0], SRAM_CE_N, SRAM_WE_N, SRAM_OE_N, SRAM_UB_N, SRAM_LB_N, UART_TXD, DRAM_ADDR[11..0], DRAM_BA_1, DRAM_BA_0, DRAM_CAS_N, DRAM_RAS_N, DRAM_CLK, DRAM_CKE, DRAM_CS_N, DRAM_WE_N, DRAM_UDQM, DRAM_LDQM
Connected Accelerator
[Schematic: the DE2_Basic_Computer symbol with the same pins as before, extended by the accelerator. The output LEDR[17..0] of the computer is connected to datain[17..0] of VGANiosRectangle. The PLL altpll0 (inclk0 = 27.000 MHz, clock ratio 1007/1080) clocks VGAsynchro with 25.175 MHz; VGAsynchro passes yrow[9..0], xcolumn[9..0] and the synchronization signals to VGANiosRectangle, which drives VGA_R[9..0], VGA_G[9..0], VGA_B[9..0], VGA_CLK, VGA_BLANK, VGA_HS, VGA_VS, and VGA_SYNC.]
E(x)ternal Influence
Q: How do we transfer signals that depend on different clocks?
A: If you look at the picture, you will see the key to the solution: you just need second hands for automatic shaking :-)
[Image: 9gag.com]
Synchronize Signal with 2nd Clock
[Figure: a signal source clocked by clk1 sends data and the signal RDY (new data are ready) through a communication channel to a receiver clocked by Clk2. Clk1 and Clk2 are 2 unsynchronized clocks.]

Sending without implicit handshaking (unacknowledged data transfer) is similar to UDP on networks (UDP = User Datagram Protocol).
Handshaking
[Figure: the signal source (clk1, clk = 5 kHz) passes data to a data register loaded by SLOAD and announces them with RDY through the handshake FSM; the receiver clocked by Clk2 answers with ACK (acknowledge). The signal is synchronized by the 1st clock.]

Reading with handshaking (positively acknowledged transfer) is like TCP = Transmission Control Protocol.
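Handshake with stable data
[Figure: the signal source (clk1) sends data to a register loaded by SLOAD and a request to the handshake unit (Clk1), which produces RDY; the side clocked by clk2 returns ACK. The timing diagram shows clk1, data, request, RDY, and ACK with the numbered steps 1 to 7.]
RDY is synchronized with clk1, but not with clk2.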
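FSM for Handshaking
Moore machine clocked by Clock1; inputs Clk1-signal and ACK, output RDY.
S0: stays in S0 while Clk1-signal = '0'; goes to S1 when Clk1-signal = '1'.
S1 (RDY active): stays in S1 while ACK = '0'; goes to S2 when ACK = '1'.
S2: stays in S2 while Clk1-signal = '1' or ACK = '1'; returns to S0 when Clk1-signal = '0' and ACK = '0'.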
Handshaking for unstable data
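[Figure: the signal source (clk1) sends data and init1 to the handshake FSM; the FSM produces RDY and loads a shift register by SLOAD; the side clocked by clk2 returns ACK (acknowledge).]

Handshake with unstable data
[Figure: the signal source (clk1, KEY) sends init0 and a request to the handshake unit (clk1), which produces init1 and RDY; the shift register is loaded by SLOAD and the side clocked by clk2 returns ACK. The timing diagram shows clk1, init0, request, init1, RDY, and ACK with the numbered steps 1 to 7.]
RDY is synchronized with clk1, but not with clk2.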
Handshake for unstable data (1/4)
[Figure: structure of ENTITY handshake with ARCHITECTURE rtl. Input ports of the entity: clock, aclrn, data0, ack, request. Output ports of the entity: data1, rdy. The process StateRegister holds the signals stReg and dataReg; the process NextState computes the signals stNext and loadData; the concurrent assignment data1 <= dataReg drives the output.]
Handshake for unstable data (2/4)
LIBRARY ieee; USE ieee.std_logic_1164.all;              -- L: libraries
ENTITY handshake is
  PORT( ack, request : IN STD_LOGIC;                    -- EP: entity ports
        data0 : in std_logic_vector(15 downto 0);
        clock, aclrn : IN STD_LOGIC;
        rdy : OUT STD_LOGIC;
        data1 : out std_logic_vector(15 downto 0) );
end handshake;
ARCHITECTURE rtl of handshake is
  type state is (s0, s1, s2);                           -- AD: architecture declarations
  signal stReg, stNext : state;
  signal dataReg : std_logic_vector(15 downto 0);
  signal loadData : std_logic;
begin                                                   -- ACS: architecture concurrent statements
  StateRegister: PROCESS(clock, aclrn) BEGIN...END PROCESS;
  NextState: PROCESS(stReg, ack, request) BEGIN...END PROCESS;
  data1 <= dataReg;
end rtl;

There are 3 concurrent statements: StateRegister:process, NextState:process, and the assignment data1 <= dataReg.
Handshake for unstable data (3/4)
StateRegister: PROCESS(clock, aclrn)     -- PH: process header with sensitivity list
begin
  if (aclrn = '0') then
    stReg <= s0;
    dataReg <= (others => '0');
  elsif rising_edge(clock) then          -- PC: process clock test
    stReg <= stNext;                     -- PS: process, synchronous part
    if loadData = '1' then
      dataReg <= data0;
    end if;
  end if;
end PROCESS;
Handshake for unstable data (4/4)
NextState: PROCESS(stReg, ack, request)  -- PH: process header with sensitivity list
begin
  rdy <= '0'; loadData <= '0';           -- PD: process, dataflow part
  case stReg is
    when s0 =>
      if request = '1' then stNext <= s1; loadData <= '1';
      else stNext <= s0;
      end if;
    when s1 =>
      rdy <= '1';
      if ack = '1' then stNext <= s2;
      else stNext <= s1;
      end if;
    when s2 =>
      if request = '0' and ack = '0' then stNext <= s0;
      else stNext <= s2;
      end if;
    when others => report "Undefined state";
  end case;
end PROCESS;
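The behaviour of the two VHDL processes can be tried out in a small C model. It is a sketch for experiments on a PC, not generated code: one call of hs_clock stands for one rising edge of clock, the asynchronous clear is left out, and the names handshake_t, hs_clock, and hs_rdy are hypothetical.

```c
typedef enum { S0, S1, S2 } hs_state_t;

typedef struct {
    hs_state_t st;            /* models the signal stReg   */
    unsigned short data_reg;  /* models the signal dataReg */
} handshake_t;

/* Output rdy depends only on the state: it is '1' in s1. */
static int hs_rdy(const handshake_t *h)
{
    return h->st == S1;
}

/* One rising edge of clock: next-state logic plus register update. */
static void hs_clock(handshake_t *h, int request, int ack, unsigned short data0)
{
    switch (h->st) {
    case S0:                        /* wait for a request        */
        if (request) {
            h->data_reg = data0;    /* loadData = '1': capture   */
            h->st = S1;
        }
        break;
    case S1:                        /* rdy is active, wait ack   */
        if (ack)
            h->st = S2;
        break;
    case S2:                        /* wait until both released  */
        if (!request && !ack)
            h->st = S0;
        break;
    }
}
```

The model shows that the data are captured only once, at the edge that leaves s0, so changes of data0 during the handshake cannot disturb the value offered to the receiver.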
RTL
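[Figure: RTL view of the handshake entity. The register dataReg[15..0] (D, Q, ENA, CLR) loads data0[15..0] when loadData enables it and drives data1[15..0]; the state machine stReg with the states s0, s1, s2 has the inputs ack, request, clock, and the reset aclrn, and the output rdy.]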
Receiving the handshake signal
[Figure: DFFE registers (D, Q, EN, CLK) clocked by Clk2 load DATA when SLOAD enables them, so the data are synchronous with clk2. RDY passes through two DFF flip-flops clocked by Clk2 and produces ACK. CLRN is an asynchronous clear, active-low.]
The synchronizer circuit for the acknowledge is not strictly necessary, but it is strongly recommended.
NIOS gcc
programming in low-level C language
Basic Steps of C Compiler
Inputs: stdio.h, string.h, code.C, code.h
1. The C preprocessor modifies the source code by substitutions; the result is code.tmp.
2. NIOS gcc C compiler, passes 1 to M (some parts are optional): translation into metacode.
3. Compiler passes M to N: optimization of the metacode.
4. The result is a relative object module; the linker joins it with the system libraries.
5. The loader places the program into the NIOS processor.
C primitive types

Size  Java     C                         C alternative   Range
1     boolean  any integer, true if !=0  BOOL (1)        0 to !=0
8     byte     char (2)                  signed char     -128 to +127
8              unsigned char             BYTE (1)        0 to 255
16    short    short int                 signed short    -32768 to +32767
16             unsigned short                            0 to +65535
32    int      int                       signed int      -2^31 to 2^31-1
32             unsigned int              DWORD (1)       0 to 2^32-1
64    long     long long int                             -2^63 to 2^63-1
64             unsigned long long        LWORD (1)       0 to 2^64-1

1) Not a standard C datatype; in many implementations it is only a common custom name introduced by the user's "#define" macro definitions, see the next slides.
2) Plain char is signed by default with many compilers, but this is implementation-defined, so it is best to specify signed or unsigned explicitly.
Sign Extension Example in C
short int x = 15213;   int ix = (int) x;
short int y = -15213;  int iy = (int) y;

      Decimal   Hex           Binary
x     15213     3B 6D         00111011 01101101
ix    15213     00 00 3B 6D   00000000 00000000 00111011 01101101
y     -15213    C4 93         11000100 10010011
iy    -15213    FF FF C4 93   11111111 11111111 11000100 10010011
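The table can be verified on any C compiler. A minimal sketch with fixed-width types from stdint.h (the helper names widen_signed and widen_unsigned are illustrative): a signed 16-bit value is sign-extended, as ldh does, while an unsigned one is zero-extended, as ldhu does.

```c
#include <stdint.h>

/* Sign extension: the upper 16 bits are copies of bit 15 (like ldh). */
static uint32_t widen_signed(int16_t v)
{
    return (uint32_t)(int32_t)v;
}

/* Zero extension: the upper 16 bits are 0 (like ldhu). */
static uint32_t widen_unsigned(uint16_t v)
{
    return (uint32_t)v;
}
```

The same bit pattern C4 93 therefore becomes FF FF C4 93 when it is treated as signed and 00 00 C4 93 when it is treated as unsigned.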
NIOS C #define / const
const int TEN = 10;        // read-only access; const ~ static final in Java

const int TEN2 = TEN;      /* Error: initializer element is not constant.
                              Note: OK in C++, but in C a variable must not
                              initialize a constant. */

#define TENDEF 10          // substitution rule

const int TEN3 = TENDEF;   // OK
/* after the preprocessor run: const int TEN3 = 10; */
Definition of BYTE and BOOL
// by substitution rule: no ; and no type check
#define BYTE unsigned char
#define BOOL int

// by introducing a new type: the ending ; is required
typedef unsigned char BYTE;
typedef int BOOL;

The C language has no strict type checking, so #define ~ typedef here, but typedef is usually better integrated into the compiler.
Bitwise Logic Instructions
Examples:
AND           &    n = n & 0xF0;
OR            |    n = n | 0xFF00;
XOR           ^    n = n ^ 0x80;
left shift    <<   n = 0xFF << 4;
right shift   >>   n = n >> 4;
NOT           ~    n = ~0xFF;

In C, the operator >> usually behaves as a logical shift for unsigned operands and as an arithmetic shift for signed operands.
NIOS Shift Instructions
shift        dir     Register   Immediate   note
logical      left    sll        slli        C << ; *2
logical      right   srl        srli        C >> ; unsigned / 2
arithmetic   right   sra        srai        signed / 2
rotate       left    rol        roli        barrel shifter
rotate       right   ror        ---         barrel shifter

Register:    sll  rC, rA, rB     rC ← rA << (rB[4..0])
Immediate:   slli rC, rA, sh     rC ← rA << (sh)
NIOS and C
NIOS instruction                         C equivalent (int x4, x2 = -10;)
1. ORI  r4, r2, 0x30                     x4 = x2 | 0x30;
2. ANDI r4, r2, 0x0F                     x4 = x2 & 0x0F;
3. XORI r4, r2, 0b10000000               x4 = x2 ^ 0b10000000;
4. NOR  r4, r2, r0   /* r4 ← not r2 */   x4 = ~x2;
5. SLLI r4, r2, 2                        x4 = x2 << 2;
6. SRLI r4, r2, 3                        x4 = (unsigned int) x2 >> 3;
7. SRAI r4, r2, 4                        x4 = (signed int) x2 >> 4;
8. The rotations ROR and ROL have no direct C equivalents.
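Because C has no rotation operator, a rotation is usually written as two shifts joined by OR. The following sketch shows one common idiom (the names rol32 and ror32 are illustrative); whether a particular compiler turns it into a single rol or ror instruction is not guaranteed.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits; n is taken modulo 32. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate a 32-bit value right by n bits; n is taken modulo 32. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Masking the shift counts with 31 avoids a shift by 32 bits, which is undefined in C, so the functions also work for n = 0.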
Example - demo2.s
/*.include "nios_macros.s"*/
.equ LEDR_BASE, 0x10000000
.equ SW_BASE, 0x10000040

.text                       /* executable code follows */
.global _start
_start:
    /* initialize base addresses of parallel ports */
    movia r10, SW_BASE      /* SW slider switch base address */
    movia r12, LEDR_BASE    /* red LED base address */
LOOP:
    ldwio r8, 0(r10)        /* read the slider switches */
    slli  r8, r8, 1         /* shift the value left by 1 bit */
    stwio r8, 0(r12)        /* write the value to the red LEDs */
    br LOOP
.end
volatile keyword

Without volatile (optimized code):
int main(void)
{
    int x4, x2 = -10;                  // 0xfffffff6
    x4 = x2<<2;
    x4 = (unsigned int) x2>>3;         // x4 = 0x1ffffffe
    x4 = (signed int) x2>>3;           // x4 = 0xfffffffe
}

With volatile (no assumption about the value):
int main(void)
{
    volatile int x4, x2 = -10;
    x4 = x2<<2;
    x4 = (unsigned int) x2>>3;
    x4 = (signed int) x2>>3;
}

Etymology: volatile, a winged animal, from the plural of Latin volatilis = winged.
Volatile
When the keyword volatile is specified for a variable, it tells the compiler not to perform certain optimizations, because the variable can change unexpectedly from other parts of the program. In particular, the compiler must not reuse a value already loaded in a register but has to access the memory again, because the value in the register is not guaranteed to be the same as the value stored in memory. All pointers to the registers of I/O devices should be declared volatile. [Source: C-reference]
Pointer Operators
& (address operator): returns the address of its operand.
Example:
    int y = 5;
    int *yPtr;
    yPtr = &y;    // yPtr gets address of y
yPtr "points to" y.
[Figure: yPtr is stored at address 500000 and contains 600000; y is stored at address 600000 and contains 5. The address of y is the value of yPtr.]
Pointer Operators
& (address operator): returns the address of its operand.
* (dereference operator): gets the operand stored at the address location.
* and & are inverses (though not always applicable) and cancel each other out:
    *&myVar == myVar and &*yPtr == yPtr
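Size of Pointer in C code
    int * ptri;
    char * ptrc;
    double * ptrd;
[Figure: memory cells showing that ptri+1, ptrc+1, and ptrd+1 point to the next element, so each step advances by the size of the pointed-to type.]

*ptrx ≡ ptrx[0]
*(ptrx+1) ≡ ptrx[1]
*(ptrx+n) ≡ ptrx[n]
*(ptrx-n) ≡ ptrx[-n]

nr1 = sizeof (double);
nr2 = sizeof (double*);
In general nr1 != nr2: on NIOS a double has 8 bytes, a pointer 4 bytes.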
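The step of pointer arithmetic can be measured by converting the pointers to char *. A minimal sketch (the function names are illustrative) that works on any C compiler:

```c
#include <stddef.h>

/* Distance in bytes between p and p+1 for an int pointer. */
static size_t stride_int(void)
{
    int a[2];
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* Distance in bytes between p and p+1 for a double pointer. */
static size_t stride_double(void)
{
    double a[2];
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* Distance in bytes between p and p+1 for a char pointer. */
static size_t stride_char(void)
{
    char a[2];
    return (size_t)((a + 1) - a);
}
```

Each result equals sizeof of the pointed-to type, which is exactly why *(ptrx+n) and ptrx[n] address the same element.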
Edge Detector – Full Source Code
    .equ IOBASE, 0x10000000
    .equ LEDG, 0x10
    .equ SW, 0x40
    .text
    .global _start
_start:
    movia r8, IOBASE
    mov r18, zero
LOOP:
    ldwio r15, SW(r8)
    nor r9, zero, r17       /* r9 := not r17 */
    and r9, r9, r16
    and r9, r9, r15
    cmpne r9, r9, zero
    xor r18, r18, r9
    stwio r18, LEDG(r8)
    mov r17, r16
    mov r16, r15
    br LOOP
    .end

Memory address = C pointer:   type * pointer;
Dereferencing a pointer:      *(pointer+ix) ≡ pointer[ix]
Char [ ]
C code:
    char text[] = "Hello World!";
    /* text = &text[0] */

Assembler:
    .data
    text: .asciz "Hello World!"

Memory layout of text (index: byte):
 0: 'H'    1: 'e'    2: 'l'    3: 'l'    4: 'o'    5: ' '    6: 'W'
 7: 'o'    8: 'r'    9: 'l'   10: 'd'   11: '!'   12: 0x00
For example, text[4] is 'o' and text[7] is 'o'.
int [ ]
C code:
    int vect[] = {1, 10, 0xA};
    /* vect = &(vect[0]) */

Assembler:
    .data
    .align 2
    vect: .word 1, 10, 0xA

Memory layout of vect (byte offset: value), little endian byte ordering:
 0: 0x01    1: 0x00    2: 0x00    3: 0x00       vect[0]
 4: 0x0A    5: 0x00    6: 0x00    7: 0x00       *(vect+1) ≡ vect[1]
 8: 0x0A    9: 0x00   10: 0x00   11: 0x00       *(vect+2) ≡ vect[2]
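Pointer and const
int x;

int * lpio = (int *) 0x10000000;                      // pointer and data are writable
*lpio = 1;  x = *lpio;  lpio++;                       // all OK

const int * lpCio = (const int *) 0x10000000;         // pointer to read-only data
*lpCio = 1;                                           // error: the data are const
x = *lpCio;  lpCio++;                                 // OK

int * const lpioC = (int *) 0x10000000;               // constant pointer to writable data
*lpioC = 1;  x = *lpioC;                              // OK
lpioC++;                                              // error: the pointer is const

const int * const lpCioC = (const int *) 0x10000000;  // constant pointer to read-only data
*lpCioC = 1;                                          // error: the data are const
x = *lpCioC;                                          // OK
lpCioC++;                                             // error: the pointer is const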
VGARectangle
DE2 Computer Extender
C-program
//#define DEBUG
volatile int * red_LED_ptr = (int *) 0x10000000;
volatile int * SW_switch_ptr = (int *) 0x10000040;
void WrData(int address, int data);                 // write data to the extender
int Count(int * counter, int * updown, int max);    // increment/decrement counter on updown bits

int main(void)
{
    int x0=0, y0=0, xwidth=1, yheight=1;
    WrData(0, x0); WrData(1, y0);
    WrData(2, xwidth); WrData(3, yheight);          // initialize the extender
    while(1)
    {
        int SW_value = *(SW_switch_ptr);            // read the SW slider switch values
        if(Count(&x0, &SW_value, 640)) WrData(0, x0);
        if(Count(&y0, &SW_value, 480)) WrData(1, y0);
        if(Count(&xwidth, &SW_value, 640)) WrData(2, xwidth);
        if(Count(&yheight, &SW_value, 480)) WrData(3, yheight);
        int delay_count = 500000;
        while(delay_count) --delay_count;           // delay loop
    }
}
Functions
// Write data to an address of the extender
void WrData(int address, int data)
{
    int send = (data & 0x3FF) | ((address&3)<<14);  // address in bits 15 and 14 + data
    *red_LED_ptr=send;
    *red_LED_ptr=(send | 0x20000);                  // toggle bit 17
    int delay=10; while(delay>0) delay--;
    *red_LED_ptr=send;                              // reset toggle
#ifdef DEBUG
    printf("%d:%d\t",address, data);
#endif
}

// increment/decrement counter on updown bits
int Count(int * counter, int * updown, int max)
{
    int ud=*updown, cntr=*counter, cntr0=cntr;
    if((ud & 1) != 0 && cntr>0) cntr--;
    if((ud & 2) != 0 && cntr
Edge Detection in C
Optimal / Non-optimal code
This part demonstrates the links between an assembler program and a C program.
NIOS: Common Register Usage in C
Reg          Name   Normal usage
r0           zero   0x0000_0000
r1           at     Assembler Temporary
r2                  Return Value (least-significant 32 bits)
r3                  Return Value (most significant 32 bits)
r4                  Register Arguments (First 32 bits)
r5                  Register Arguments (Second 32 bits)
r6                  Register Arguments (Third 32 bits)
r7                  Register Arguments (Fourth 32 bits)
r8 to r15           Caller-Saved General-Purpose Registers
r16 to r23          Callee-Saved General-Purpose Registers
r24          et     Exception Temporary
r25          bt     Break Temporary
r26          gp     Global Pointer
r27          sp     Stack Pointer
r28          fp     Frame Pointer
r29          ea     Exception Return Address
r30          ba     Break Return Address
r31          ra     Return Address
VHDL 32 bit edge detector
library ieee; use ieee.std_logic_1164.all;
entity re32d2 is
  port(data : in std_logic_vector(31 downto 0);
       clk : in std_logic;
       q : out std_logic);
end entity;
architecture rtl of re32d2 is
  signal r15, r16, r17 : std_logic_vector(31 downto 0);  -- the shift register
  signal r18 : std_logic := '0';                         -- output flip-flop
begin
  process (clk)
  begin
    if (rising_edge(clk)) then
      if ((r15 and r16 and (not r17)) /= X"00000000") then
        r18 <= not r18;
      end if;
      r17 <= r16; r16 <= r15; r15 <= data;               -- shift
    end if;
  end process;
  q <= r18;                                              -- write output
end rtl;
NIOS SW Edge Detector
[Schematic: hardware equivalent of the software edge detector. The input sw[17..0] forms data[17..0], and data[31..18] are tied to GND. Three 32-bit DFF registers r15, r16, r17 form a shift register; NOT and lpm_and compute r15 and r16 and not r17; lpm_or32to1 reduces the 32 bits to the single bit r9; XOR with a DFF forms r18, which drives the output LEDG[0]. FreqDivByEven (parameter Even_Divisor = 5000, unsigned integer) divides CLOCK_50 to the clock F = 10 kHz. The numbers 1 to 9 in the schematic mark the parts that correspond to the instructions below.]
1. ldwio r15, SW(r8)
2. nor r9, zero, r17
3. and r9, r9, r16
4. and r9, r9, r15
5. cmpne r9, r9, zero
6. xor r18, r18, r9
7. stwio r18, LEDG(r8)
8. mov r17, r16
9. mov r16, r15
Low-level C-code ~ Assembler
Assembler:
    .equ IOBASE, 0x10000000
    .equ LEDG, 0x10
    .equ SW, 0x40
    .text
    .global _start
_start:
    movia r8, IOBASE
    mov r18, zero
LOOP:
    ldwio r15, SW(r8)
    nor r9, zero, r17       /* r9 := not r17 */
    and r9, r9, r16
    and r9, r9, r15
    cmpne r9, r9, zero
    xor r18, r18, r9
    stwio r18, LEDG(r8)
    mov r17, r16
    mov r16, r15
    br LOOP
    .end

Low-level C code; it corresponds to our assembler code:
const int IXLEDG = 0x10/sizeof(int);
const int IXSW   = 0x40/sizeof(int);
int main(void)
{
    int reg17 = 0, reg16 = 0, reg15;
    volatile int * const IOBASE = (volatile int *) 0x10000000;
    int reg18 = 0;
    while(1)
    {
        reg15 = IOBASE[IXSW];
        if(reg15 & reg16 & ~reg17) reg18 = reg18^1;
        IOBASE[IXLEDG] = reg18;   // store result
        reg17 = reg16;            // shift
        reg16 = reg15;
    }
}
Higher level C-code
int main(void)
{
    volatile int * lpLEDG = (int *) 0x10000010;  // green LED address
    volatile int * lpSW   = (int *) 0x10000040;  // SW switch address

    int SW_mem[3] = {0, 0, 0}, SW_value, ledVal = 0;

    while(1)
    {
        SW_value = lpSW[0];      // same as *lpSW: read the SW slider switch values
        if(SW_mem[0] & SW_mem[1] & ~SW_mem[2]) ledVal = ledVal^1;
        lpLEDG[0] = ledVal;      // light up the green LEDs
        int i;
        for(i=2; i>0; i--) SW_mem[i] = SW_mem[i-1];
        SW_mem[0] = SW_value;    // same as *SW_mem = SW_value
    }
}
The high-level C code is compiled into ~28 NIOS instructions, the low-level code into 15 NIOS instructions.
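The edge-detection logic itself can be separated from the I/O and tested on a PC. The following sketch mirrors the low-level version (registers r15 to r18); the type edge_t and the function edge_step are hypothetical names, and the switch value is passed as a parameter instead of being read from 0x10000040.

```c
#include <stdint.h>

typedef struct {
    uint32_t r16;   /* previous sample              */
    uint32_t r17;   /* sample before the previous   */
    uint32_t out;   /* toggling output bit (r18)    */
} edge_t;

/* One pass of the loop: sw plays the role of r15.
   The output toggles when some bit is 1 in the current and in the
   previous sample and was 0 two samples ago. */
static uint32_t edge_step(edge_t *e, uint32_t sw)
{
    if (sw & e->r16 & ~e->r17)
        e->out ^= 1u;
    e->r17 = e->r16;   /* shift the history */
    e->r16 = sw;
    return e->out;
}
```

Requiring two consecutive samples with 1 after a 0 makes the detector ignore a single-sample glitch, at the price of a delay of one sample.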