Lecture 25: New Computer Architectures and Models -- Neuromorphic Architecture, Quantum Computing, and 3D Integration/Photonics/Memristor

CSE 564 Computer Architecture Fall 2016

Department of Computer Science and Engineering, Yonghong Yan, [email protected], www.secs.oakland.edu/~yan

Acknowledge and Copyright

§ https://passlab.github.io/CSE564/copyrightack.html

2 REVIEW OF SYNCHRONIZATION

3 Data Races in a Multithreaded Program

Consider each thread updating a shared variable best_cost:

    if (my_cost < best_cost)
        best_cost = my_cost;

– two threads
– the initial value of best_cost is 100
– the values of my_cost are 50 and 75 for threads t1 and t2

    T1                                  T2
    if (my_cost (50) < best_cost)
                                        if (my_cost (75) < best_cost)
    best_cost = my_cost;  /* 50 */
                                        best_cost = my_cost;  /* 75 */

§ The value of best_cost could be 50 or 75! § The value 75 does not correspond to any serialization of the two threads.

4 Mutual Exclusion using Pthread Mutex

int pthread_mutex_lock(pthread_mutex_t *mutex_lock);
int pthread_mutex_unlock(pthread_mutex_t *mutex_lock);
int pthread_mutex_init(pthread_mutex_t *mutex_lock,
                       const pthread_mutexattr_t *lock_attr);

pthread_mutex_t cost_lock;

int main() {
    ...
    pthread_mutex_init(&cost_lock, NULL);
    pthread_create(&thhandle1, NULL, find_best, ...);
    pthread_create(&thhandle2, NULL, find_best, ...);
}

void *find_best(void *list_ptr) {
    ...
    pthread_mutex_lock(&cost_lock);    /* enter CS */
    if (my_cost < best_cost)           /* critical section */
        best_cost = my_cost;
    pthread_mutex_unlock(&cost_lock);  /* leave CS */
}

pthread_mutex_lock blocks the calling thread if another thread holds the lock.
When the pthread_mutex_lock call returns:
1. The mutex is locked; the caller enters the critical section (CS).
2. Any other locking attempt (call to pthread_mutex_lock) will block the calling thread.
When pthread_mutex_unlock returns:
1. The mutex is unlocked; the caller leaves the CS.
2. One thread blocked on a pthread_mutex_lock call will acquire the lock and enter the CS.

5 Choices of Hardware Primitives for Synchronizations -- 2

§ Compare and Swap (CAS)

    compare&swap(&address, reg1, reg2) {
        if (reg1 == M[address]) {
            M[address] = reg2;
            return success;
        } else {
            return failure;
        }
    }

§ Load-linked and Store-conditional

    load-linked&store-conditional(&address) {
    loop:
        ll   r1, M[address];
        movi r2, 1;            /* Can do arbitrary computation */
        sc   r2, M[address];
        beqz r2, loop;
    }

https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

6 Improved Hardware Primitives: LL-SC

§ Goals:
– Test with reads
– Failed read-modify-write attempts don't generate invalidations
– Nice if a single primitive can implement a range of r-m-w operations
§ Load-Locked (or -Linked), Store-Conditional
– LL reads the variable into a register
– Follow with arbitrary instructions to manipulate its value
– SC tries to store back to the location
– SC succeeds if and only if there has been no other write to the variable since this processor's LL
» indicated by condition codes
§ If SC succeeds, all three steps happened atomically
§ If it fails, it doesn't write or generate invalidations -- must retry the acquire
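As a concrete sketch of the CAS primitive above, the GCC atomic builtins referenced on the slide (`__sync_bool_compare_and_swap`, `__sync_lock_release`) can implement a minimal spinlock protecting the best_cost update from the race example. This is illustrative only: it spins with no backoff and no fairness, and the `my_costs` setup is invented for the demonstration.

```c
#include <pthread.h>

/* Minimal spinlock built on the GCC __sync compare-and-swap builtin. */
static volatile int lock_var = 0;          /* 0 = free, 1 = held */

static void spin_lock(volatile int *l) {
    /* Atomically: if (*l == 0) set *l = 1; otherwise retry. */
    while (!__sync_bool_compare_and_swap(l, 0, 1))
        ;                                  /* spin until the CAS succeeds */
}

static void spin_unlock(volatile int *l) {
    __sync_lock_release(l);                /* store 0 with release semantics */
}

static int best_cost = 100;                /* shared variable from the race example */
static int my_costs[2] = { 50, 75 };

static void *find_best(void *arg) {
    int my_cost = my_costs[*(int *)arg];
    spin_lock(&lock_var);                  /* enter critical section */
    if (my_cost < best_cost)
        best_cost = my_cost;
    spin_unlock(&lock_var);                /* leave critical section */
    return NULL;
}
```

With two pthreads running find_best, the critical section guarantees best_cost ends up 50 under every interleaving, unlike the unsynchronized version.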

7 Simple Lock with LL-SC

    lock:   ll   R1, mem[cost_lock]   /* LL location into reg1 */
            sc   mem[cost_lock], R2   /* SC reg2 into location  */
            beqz R2, lock             /* if failed, start again */
            ret
    unlock: st   mem[cost_lock], #0   /* write 0 to location    */
            ret

§ Can do fancier atomic ops by changing what's between LL & SC
– But keep it small so the SC is likely to succeed
– Don't include instructions that would need to be undone (e.g., stores)
§ SC can fail (without putting a transaction on the bus) if:
– It detects an intervening write even before trying to get the bus
– It tries to get the bus but another processor's SC gets the bus first
§ LL and SC are not lock and unlock, respectively
– They only guarantee no conflicting write to the lock variable between them
– But they can be used directly to implement simple operations on shared variables
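The last bullet -- implementing simple operations on shared variables directly -- can be sketched with a compare-and-swap retry loop standing in for the LL/SC pair (again using the GCC builtin linked on the previous slide). This is a hypothetical helper, not code from the slides: it updates the shared minimum best_cost without any lock.

```c
#include <limits.h>

/* Lock-free update of a shared minimum via a CAS retry loop.
 * CAS here plays the role of the LL/SC pair on the slide. */
static int best_cost = INT_MAX;

void atomic_min(int *shared, int my_cost) {
    int old = *shared;                       /* the "LL": read current value */
    while (my_cost < old) {
        /* The "SC": install my_cost only if *shared is still old. */
        if (__sync_bool_compare_and_swap(shared, old, my_cost))
            return;                          /* our write took effect atomically */
        old = *shared;                       /* lost the race: re-read and retry */
    }
}
```

Unlike the mutex version, a failed attempt simply retries; no thread ever blocks holding a lock.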

8 Memory Consistency Model

§ One of the most confusing topics (if not the most) in computer systems, parallel programming, and parallel computer architecture

9 VON NEUMANN ARCHITECTURE AND TURING MACHINE

10 The Stored Program Computer

1944: ENIAC
– Presper Eckert and John Mauchly -- the first general electronic computer
– hard-wired program -- settings of dials and switches
1944: Beginnings of EDVAC
– among other improvements, includes a program stored in memory
1945: John von Neumann
– wrote a report on the stored-program concept, known as the First Draft of a Report on the EDVAC
– failed to credit the designers; ironically, he still gets the credit
The basic structure proposed in the draft became known as the "von Neumann machine" (or model):
– a memory, containing instructions and data
– a processing unit, for performing arithmetic and logical operations
– a control unit, for interpreting instructions
More history in the optional online lecture

11 John von Neumann

12 Von Neumann Architecture (Model)

§ Machine Model – Architecture

https://en.wikipedia.org/wiki/Von_Neumann_architecture

13 The Von Neumann Architecture

§ Model for designing and building computers, based on the following three characteristics:
1) Main sub-systems of computers
» Memory
» ALU (Arithmetic/Logic Unit)
» Control Unit
» Input/Output System (I/O)
2) Program is stored in memory during execution.
3) Program instructions are executed sequentially.

14 Memory

k x m array of stored bits (k is usually 2^n)
Address
– unique (n-bit) identifier of a location
Contents
– m-bit value stored in the location
Need to distinguish between
• the address of a memory cell
• and the content of a memory cell

[Figure: 16 memory cells addressed 0000-1111; e.g., cell 0010 holds 00101101 and cell 0101 holds 10100010]

15 Operations on Memory

§ Fetch (address): – Fetch a copy of the content of memory cell with the specified address. – Non-destructive, copies value in memory cell. § Store (address, value): – Store the specified value into the memory cell specified by address. – Destructive, overwrites the previous value of the memory cell. § The memory system is interfaced via: – Memory Address Register (MAR) – Memory Data Register (MDR) – Fetch/Store signal
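The MAR/MDR interface above can be sketched as a toy model in C. The cell count, register widths, and the enum name are illustrative choices for a 16-cell (n = 4) memory, not part of any real ISA.

```c
#include <stdint.h>

/* Toy model of the memory interface: a k x m array (16 cells of 8 bits),
 * driven through a Memory Address Register (MAR), a Memory Data Register
 * (MDR), and a fetch/store control signal. */
#define K 16                      /* k = 2^n cells, n = 4 address bits */

static uint8_t mem[K];            /* the stored bits */
static uint8_t MAR;               /* memory address register (n bits) */
static uint8_t MDR;               /* memory data register (m bits) */

enum mem_signal { FETCH, STORE };

void memory_cycle(enum mem_signal s) {
    if (s == FETCH)
        MDR = mem[MAR % K];       /* non-destructive: copies the cell's value */
    else
        mem[MAR % K] = MDR;       /* destructive: overwrites the cell */
}
```

A store followed by a fetch of the same address round-trips the value through the MDR, mirroring the Fetch/Store operations described above.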

16 Processing Unit

Functional Units
– ALU = Arithmetic and Logic Unit
– could have many functional units, some of them special-purpose (multiply, square root, ...)
Registers
– small, temporary storage
– operands and results of functional units
Word Size
– number of bits normally processed by the ALU in one instruction
– also the width of the registers

17 Input and Output

Devices for getting data into and out of computer memory

INPUT: Keyboard, Mouse, Scanner, Disk
OUTPUT: Monitor, Printer, LED, Disk

Each device has its own interface, usually a set of registers like the memory's MAR and MDR

– Keyboard (input) and console (output)
– keyboard: data register (KBDR) and status register (KBSR)
– console: data register (CRTDR) and status register (CRTSR)
– frame buffer: memory-mapped pixels
Some devices provide both input and output
– disk, network
A program that controls access to a device is usually called a driver.

18 Control Unit (Finite State Machine)

Orchestrates execution of the program

[Figure: CONTROL UNIT containing the PC and IR]

Instruction Register (IR) contains the current instruction. Program Counter (PC) contains the address of the next instruction to be executed. Control unit: – reads an instruction from memory » the instruction’s address is in the PC – interprets the instruction, generating signals that tell the other components what to do » an instruction may take many machine cycles to complete

19 Instruction Processing

Fetch instruction from memory

Decode instruction

Evaluate address

Fetch operands from memory

Execute operation

Store result
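The six phases above can be sketched as a tiny accumulator machine in C. The two-field instruction encoding (4-bit opcode, 4-bit address), the opcode set, and the memory layout are all invented for illustration.

```c
#include <stdint.h>

/* Toy fetch-decode-execute loop over a 16-cell memory. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

static uint8_t mem[16] = {
    /* program */ (LOAD << 4) | 14, (ADD << 4) | 15, (STORE << 4) | 13, HALT,
    0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* data */    0, 5, 7           /* mem[14] = 5, mem[15] = 7, result in mem[13] */
};

int run(void) {
    uint8_t pc = 0, acc = 0;
    for (;;) {
        uint8_t ir = mem[pc++];          /* 1. fetch instruction, advance PC */
        uint8_t op = ir >> 4;            /* 2. decode */
        uint8_t addr = ir & 0x0F;        /* 3. evaluate address */
        switch (op) {
        case LOAD:  acc = mem[addr]; break;    /* 4. fetch operand */
        case ADD:   acc += mem[addr]; break;   /* 5. execute */
        case STORE: mem[addr] = acc; break;    /* 6. store result */
        case HALT:  return mem[13];
        }
    }
}
```

Running the three-instruction program computes 5 + 7 and stores 12 in mem[13], one phase of the cycle at a time.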

20 Driving Force: The Clock

The clock is a signal that keeps the control unit moving. – At each clock “tick,” control unit moves to the next machine cycle -- may be next instruction or next phase of current instruction. Clock generator circuit: – Based on crystal oscillator – Generates regular sequence of “0” and “1” logic levels – Clock cycle (or machine cycle) -- rising edge to rising edge

[Figure: clock waveform alternating between "1" and "0"; one machine cycle spans rising edge to rising edge]

21 Summary -- Von Neumann Model

[Figure: Von Neumann model -- MEMORY (MAR, MDR) connected to INPUT (Keyboard, Mouse, Scanner, Disk), OUTPUT (Monitor, Printer, LED, Disk), the PROCESSING UNIT (ALU, TEMP), and the CONTROL UNIT (PC, IR)]

22 Turing Machine 1936

https://en.wikipedia.org/wiki/Turing_machine

23 Alan Turing

24 Modern Computers

§ Von Neumann machines implement a universal Turing machine and have a sequential architecture
§ Most computers are based on the von Neumann architecture and on Turing's 1936 model

25 NEUROMORPHIC COMPUTING

26 Why Neuromorphic Computing?

http://www.isqed.org/English/Archives/2015/Tutorials/Section-6-Jiang.pdf 27 Brain – The Most Efficient Computing Machine

28 Biological Neural Systems https://en.wikipedia.org/wiki/Nervous_system

https://en.wikipedia.org/wiki/Synapse 29 Artificial Neural Network

§ Each neuron can perform non-linear operations.
§ The algorithm is designed to mimic the behavior of biological neural networks.
§ Currently, the algorithm runs on a normal PC.
§ New parallel computing chips can help improve the learning process.
– Neurosynaptic chips / brain-inspired chips
– The hardware design also mimics the biological brain -- it is also based on a neural network

30 Artificial Neural Network

• "We require exquisite numerical precision over many logical steps to achieve what brains accomplish in very few short steps." - John von Neumann
• A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

31 Neurosynaptic System


32 Neuromorphic System

33 Comparison with Conventional Chips: Architecture

                  Conventional Computer             Brain-Inspired Computer
Architecture      Von Neumann                       Neural Network
Computing unit    CPU                               Synaptic chip (e.g., TrueNorth)
Storing unit      Memory                            Synaptic chip (e.g., TrueNorth)
Computing         Serial (multiple cores)           Massively parallel
Communication     CPU <-> Memory                    Neurons <-> Neurons
Advantage         Processing (logical, analytical)  Learning (pattern recognition)

Von Neumann bottleneck

34 Comparison with Conventional Chips: Architecture
§ Processing and storage are separated in a CPU.
§ The CPU is built for a linear process, to handle a linear sequence of events.
§ A synaptic chip integrates processing with storage.
§ A synaptic chip processes information in a massively parallel fashion.
§ Each neuron is able to process a piece of information and store it locally.
§ Synapses help with data communication between neurons.
§ Synapses decide the connectivity between neurons and can thus rewire them.

http://www.forbes.com/sites/alexknapp/2011/08/26/how-ibms-cognitive-computer-works/ 35 Comparison with Conventional Chips

• Example: converting a color image (480x360x3) to a gray image
– gray value = (red + green + blue) / 3
– CPU: 480x360 = 172800 iterations of linear processing
– Synaptic chip: with 480x360x3 input neurons and 480x360 output neurons, each output neuron computes its gray value -- one iteration of parallel processing
• The CPU is good at linear processing, making sure the logic sequence is correct.
• The synaptic chip is good at massively parallel processing of image data:
– image processing
– pattern recognition
– network simulation ...
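The CPU side of the example above is just a sequential loop; a minimal C sketch makes the 172800-iteration count concrete (array shapes follow the slide's 480x360x3 image).

```c
#include <stdint.h>

/* Sequential (CPU-style) gray conversion: one pass of 480*360 = 172800
 * iterations, averaging the three color channels per pixel. */
#define W 480
#define H 360

void rgb_to_gray(uint8_t rgb[H][W][3], uint8_t gray[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)   /* 172800 iterations in total */
            gray[y][x] = (uint8_t)((rgb[y][x][0] + rgb[y][x][1] + rgb[y][x][2]) / 3);
}
```

A synaptic chip would instead dedicate one output neuron per pixel and perform all 172800 averages in a single parallel step.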

36 Comparison with Conventional Chips Complexity

37 Comparison with Conventional Chips: Power Efficiency

§ > 1000 times as efficient as chips made with the conventional architecture.

                 Conventional chips   TrueNorth
Power density    50~100 W/cm^2        0.02 W/cm^2

§ In 2012, the Sequoia IBM conventional supercomputer, simulating a brain using 500 billion neurons and 100 trillion synapses at 1/1500 of brain speed, required 12 GW of power.
§ Each TrueNorth consumes 0.07 W (average 63 mW, maximum 72 mW).
§ The same simulation would require 27.3~35 kW.

Science 8 August 2014: 345 (6197), 614-616. 38 Comparison with Conventional Chips

§ The efficiency of conventional computers is limited because they store data and program instructions in a block of memory that's separate from the processor that carries out the instructions. As the processor works through its instructions in a linear sequence, it has to constantly shuttle information back and forth from the memory store -- a bottleneck that slows things down and wastes energy.
§ Synaptic chips work in parallel, and information can be stored across numerous synaptic chips. The integration of processing and storage avoids data shuttling and makes computing more energy efficient.
§ About 176,000 times more efficient than a modern CPU running the same brain-like workload.

http://www.extremetech.com/extreme/187612-ibm-cracks-open-a-new-era-of-computing-with-brain-like-chip-4096-cores-1-million-neurons-5-4-billion-transistors
http://www.technologyreview.com/news/529691/ibm-chip-processes-data-similar-to-the-way-your-brain-does/

39 Comparison with Conventional Chips Power Efficiency

40 Comparison with Conventional Chips: Denser Package
• Denser package
– TrueNorth size is 4.3 cm^2
– To achieve the same computational performance as the Sequoia IBM conventional supercomputer, it would require 400K~500K TrueNorth chips.
– The size would be about 172~215 m^2.

Sequoia supercomputer / synaptic chip wall
http://www.artificialbrains.com/darpa-synapse-program
http://www.extremetech.com/extreme/131413-us-retakes-supercomputing-crown-with-16-petaflops-sequoia-china-promises-100-petaflops-by-2015

41 Comparison with Conventional Chips: Faster Speed
• Faster speed
– Multi-object detection and classification with 400-pixel-by-240-pixel three-color video input at 30 frames per second
– More than 160 million spikes per second (5.44 Gbit/s)
– TrueNorth vs. Core i7 CPU 950 with 4 cores and 8 threads, clocked at 3.07 GHz (45 nm process, 2009)

www.sciencemag.org/content/345/6197/668/suppl/DC1

42 Comparison with Conventional Chips: Commercial Availability

Hardware          CPU                       Synaptic Chip
Dominant design   Intel, AMD                No (TrueNorth and NPU are prototypes)
Quantity          1 (generally)             ~50K (for human-brain scale, but growing)
Breakthrough      Intel 4004 in 1971        IBM TrueNorth in 2014 (not on market)
Manufacturing     Transistor process        Transistor process (45nm, 28nm)

Software          CPU                       Synaptic Chip
Language          C, C++, Java, etc.        IBM Corelet language
Operating system  Windows, iOS, Linux       New operating system
Application       Office, games, etc.       New applications (Corelet Library)
Algorithm         General (incl. learning)  Learning algorithms
Compiler          Available                 New compiler
Debugger          Available                 New debugger

http://www.computerworld.com/article/2484737/computer-processors/ibm-devises-software-for-its-experimental-brain-like-synapse-chips.html
http://www.research.ibm.com/software/IBMResearch/multimedia/IJCNN2013.corelet-language.pdf
Amir, Arnon, et al. "Cognitive computing programming paradigm: a corelet language for composing networks of neurosynaptic cores." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
http://darksilicon.ucsd.edu/2012/assets/slides/13

43 Comparison with Conventional Chips: Commercial Availability -- Current Status

[Figure: Switching Point]


44 Comparison with Conventional Chips § Qualcomm’s zeroth chip (Neural Processing Unit) may be incorporated into the new smartphone. – “The Zeroth software is being developed to launch with Qualcomm’s Snapdragon 820 processor, which will enter production later this year. The chip and the Zeroth software are also aimed at manufacturers of drones and robots.”

https://en.wikipedia.org/wiki/Zeroth_(software)
https://www.qualcomm.com/invention/cognitive-technologies/machine-learning

45 Performance: Neuromorphic Systems Status

                       Stanford    HRL           SpiNNaker   HiCANN     IBM         Human
                       Neurogrid   neuromorphic  HBP         HBP        TrueNorth   Brain
                       (2009)      chip (2014)   (2012)      (2012)     (2014)
Neurons / prototype    1.00E+06    2304          2.00E+07    1.20E+06   1.60E+07    2.00E+10
Synapses / prototype   8.00E+09    292000        2.00E+10    3.00E+08   4.00E+09    2.00E+14
Power consumption      50          120           1000        3000       20          10 mW/cm^3
  (mW/cm^2)
Manufacturing process  180         90            130         65         28
  (nm)

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss http://rt.com/usa/202323-brain-chip-drone-darpa/ http://www.artificialbrains.com/spinnaker#hardware 46 Performance Neuromorphic Systems Status

47 Performance of IBM Cognitive Chip

§ Defense Advanced Research Projects Agency (DARPA) SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics)
– Brain-inspired computer architecture, event-driven

[Figures: IBM prototype and basic structure]
http://digi.163.com/14/0811/06/A3BKCVHG001618H9.html
http://www.research.ibm.com/articles/brain-chip.shtml

48 Performance of IBM Cognitive Chip

Rate of Improvement of IBM TrueNorth Prototype

                         2013       2014       2017       2018       Human Brain
Neurons                  1.00E+06   1.60E+07   4.00E+09   1.00E+10   2.00E+10
Synapses                            4.00E+09   1.00E+12   1.00E+14   2.00E+14
Power consumption (W)    5.45E+04   4000       1000                  20

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss

49 Performance of IBM Cognitive Chip

[Figure: improvement trends of the TrueNorth prototype; CAGR = 1160%, 531%, and -172% for the three curves]

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss The compound annual growth rate (CAGR) is the mean annual growth rate of an investment over a specified period of time longer than one year. 50 Artificial Neural Network

§ ANN: simulating the biological brain with nonlinear, dynamic, statistical mathematical models.
– Blue Brain Project -- EPFL
– IBM BlueGene L/P supercomputer
– Open-source simulation software NEURON: www.neuron.yale.edu/neuron/

Blue Brain Project

          Cortical column  Rat cortical    Equivalent honey  Rat brain  Full human neocortical
          (2006)           column (2007)   bee brain (2012)  (2014)     brain (2023)
Neurons   10000            10000           1.00E+06          2.10E+07   8.60E+10

http://bluebrain.epfl.ch/page-59952-en.html http://www.artificialbrains.com/blue-brain-project

51 Artificial Neural Network

[Figure: growth of simulated neurons in the Blue Brain Project; CAGR = 160%]

http://bluebrain.epfl.ch/page-59952-en.html http://www.artificialbrains.com/blue-brain-project

52 Cost Analysis

§ According to Moore's law, we can predict the number of transistors on one chip in the future
– number of transistors per chip

year     2014      2016      2017      2018      2020      2022      2024
number   5.40E+09  1.08E+10  1.53E+10  2.16E+10  4.32E+10  8.64E+10  1.73E+11

53 Cost Analysis

§ According to the price we assumed previously, we can get the following result:

year   2014    2017    2018
cost   145.8   413.1   583.2

54 Cost Analysis

§ According to the table below, we can calculate the number of chips for each interval:

Rate of Improvement of IBM TrueNorth Prototype

                         2013       2014       2017       2018       Human Brain
Neurons                  1.00E+06   1.60E+07   4.00E+09   1.00E+10   2.00E+10
Synapses                            4.00E+09   1.00E+12   1.00E+14   2.00E+14
Power consumption (W)    5.45E+04   4000       1000                  20

55 Cost Analysis

§ The number of chips for one artificial brain

year   2014    2017   2018
low    1250    5      2
high   50000   200    4

[Figure: number of chips for one artificial brain vs. year (log scale), low and high estimates]

56 Cost Analysis

§ Total cost for one artificial brain ($)

year   2014       2017     2018
low    1.82E+05   2065.5   1166.4
high   7.29E+06   82620    2332.8

[Figure: total cost ($) for one artificial brain vs. year (log scale); CAGR = -789% (high) and -253% (low)]

57 Application

58 Applications

Object detection

§ Solar-powered leaf
– detects changes in the environment
– sends out environmental and forest-fire alerts
§ Roller robot
– search-and-rescue robot
– has 32 video cameras
– beams back data from hazardous environments

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=JOW40Q37jNX

59 Application

§ Vision assistance for the blind
– Emulating the visual cortex, low-power, lightweight eyeglasses designed to help the visually impaired could be outfitted with multiple video and auditory sensors that capture and analyze the optical flow of data.

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=JOW40Q37jNX

Image Processing

60 Application Vision assistance

§ Synesthetic feedback
– Surround sound is used to indicate the location of a point of interest and to provide audible guidance along the pathway.
§ Visual cues
– For users with residual sight, points of interest or obstacles can be highlighted by displaying either overlapping symbols or braille keywords.

61 Hybrid von Neumann & Neuromorphic Architecture with Photonic Interconnects and 3D-Stacked Chips/Memristors

62 Videos and Info from IBM

§ IBM – http://www.research.ibm.com/cognitive-computing/

§ DoE Neuromorphic computing workshop – http://ornlcda.github.io/neuromorphic2016/

§ IEEE Rebooting Computing – http://rebootingcomputing.ieee.org/

§ Others (link from this site): – https://www.hpcwire.com/2016/03/21/lacking-breakthrough-neuromorphic-computing-steadily-advance/

63 QUANTUM COMPUTING

64 Deterministic

§ Binary Logic – 0 and 1

§ Two states – High and low voltage – Easy to implement

65 Quantum Computing § A quantum computer is a machine that performs calculations based on the laws of quantum mechanics, which describe the behavior of particles at the sub-atomic level.

http://mrrittner.weebly.com/unit-2- biochemistry.html

66 Introduction

§ Quantum Mechanics – Why?
– Moore's law (features shrink toward the atomic/sub-atomic scale)
– Study of matter at the atomic level (the power of atoms)
– Classical physics laws do not apply

§ Superposition – Simultaneously possess two or more values § Entanglement – Quantum states of two atoms correlated even though spatially separated!!! – Albert Einstein baffled “spooky action at a distance”

67 Superposition

§ Simultaneously possess two or more values – Schrödinger's cat

https://en.wikipedia.org/wiki/Schr%C3%B6dinger's_cat https://en.wikipedia.org/wiki/Quantum_superposition 68 Entanglement

§ Quantum states of two atoms are correlated even though spatially separated!!!
– Acting on a particle here can instantly influence a particle far away, something that is often described as theoretical teleportation.
§ "Spooky action at a distance" baffled Albert Einstein
– He could not understand ;-)

§ Example: sending each of a pair of shoes to two places
– When one person learns whether hers is the left or the right shoe, she instantly knows which one the other has

§ Examples: Sending each of a pair of shoes to two places – When one know whether hers is left or right, she knows the other one

https://www.scienceandnonduality.com/videos/brilliantly-simple-explanation-of-quantum-entanglement/

https://en.wikipedia.org/wiki/Quantum_entanglement 69 Feynman to create quantum machine

§ “I think I can safely say that nobody understands quantum mechanics” – Feynman § 1982 - Feynman proposed the idea of creating machines based on the laws of quantum mechanics instead of the laws of classical physics.

§ 1985 - David Deutsch developed the quantum Turing machine, showing that quantum circuits are universal. § 1994 - Peter Shor came up with a quantum algorithm to factor very large numbers in polynomial time. § 1997 - Lov Grover developed a quantum search algorithm with O(√N) complexity

70 Most Recent Development

§ Two breakthroughs in 2000. – Isaac Chuang (now an MIT professor, but then working at IBM's Almaden Research Center) used five fluorine atoms to make a crude, five-qubit quantum computer. – Researchers at Los Alamos National Laboratory figured out how to make a seven-qubit machine using a drop of liquid. § 2005, researchers at the University of Innsbruck added an extra qubit and produced the first quantum computer that could manipulate a qubyte (eight qubits). § 2011, a pioneering Canadian company called D-Wave Systems announced in Nature that it had produced a 128-qubit machine. § March 2015, the team announced they were "a step closer to quantum computation," having developed a new way for qubits to detect and protect against errors. http://www.explainthatstuff.com/quantum-computing.html § Quantum computing from IBM – http://www.research.ibm.com/quantum/

71 Representation of Data - Qubits

§ A bit of data is represented by a single atom that is in one of two states denoted by |0> and |1>. A single bit of this form is known as a qubit.
§ A physical implementation of a qubit could use the two energy levels of an atom: an excited state representing |1> and a ground state representing |0>.

[Figure: a light pulse of frequency λ applied for time interval t flips the electron between the ground state |0> and the excited state |1>]

72 Qubits

§ A single qubit can be forced into a superposition of the two states, denoted by the addition of the state vectors
– A qubit in superposition is in BOTH of the states |1> and |0> at the same time

"To be, or not to be. That is the question" – William Shakespeare
The classical answers: "to be" or "not to be"
The quantum answer: a x (to be) + b x (not to be)
https://en.wikipedia.org/wiki/Qubit

§ α represents the probability of the superposition collapsing to |0>. The α's are called probability amplitudes. In a balanced superposition, α = 1/√(2^n), where n is the number of qubits.

§ If we attempt to retrieve the values represented within a superposition, the superposition randomly collapses to represent just one of the original values.

74 Qubits in Quantum Computation

§ Qubit - can be 1, 0, or both 1 and 0
§ |Ψ> - a number in a quantum computer

§ Superposition of states of N qubits:

|Ψ> = Σ_{i=0}^{2^N - 1} a_i |s_i>,   where   Σ_{i=0}^{2^N - 1} |a_i|^2 = 1

75 Examples

(1/√2) |0> + (1/√2) |1>

(1/2) |00> + (1/2) |01> + (1/2) |10> + (1/2) |11>

76 Representation of Data - Superposition

[Figure: a light pulse of frequency λ for time interval t/2 takes state |0> to the superposition |0> + |1>]

Consider a 3-qubit register. An equally weighted superposition of all possible states would be denoted by:

|ψ> = (1/√8) |000> + (1/√8) |001> + . . . + (1/√8) |111>

77 Relationships among data - Entanglement

§ Entanglement is the ability of quantum systems to exhibit correlations between states within a superposition.
§ Imagine two qubits, each in the state |0> + |1> (a superposition of 0 and 1).
§ We can entangle the two qubits such that the measurement of one qubit is always correlated to the measurement of the other qubit.

§ Used for quantum communication, such as quantum teleportation

https://en.wikipedia.org/wiki/Quantum_teleportation
http://www.sciencemag.org/news/2016/09/big-step-quantum-teleportation-won-t-bring-us-any-closer-star-trek-here-s-why

78 Quantum Computation
§ Prime factorization (cryptography)
– Peter Shor's algorithm
– A hard classical computation becomes an easy quantum computation
– Factors an n-bit integer in O(n^3)
§ Searching an unordered list
– Lov Grover's algorithm
– A hard classical computation becomes a less hard quantum computation
– n elements in O(√n) queries

79 Advantages over Classical computers

§ Encode more information § Powerful § Massively parallel § Easily crack secret codes § Fast in searching databases § Hard computational problems become tractable – NP to polynomial

80 Applications

§ Defense § Cryptography § Accurate weather forecasts § Efficient search § Teleportation § … § Unimaginable

81 Information

§ IBM – http://www.research.ibm.com/quantum/ § D-Wave – http://www.dwavesys.com/ § Google’s Quantum Dream Machine – https://www.technologyreview.com/s/544421/googles- quantum-dream-machine/

§ Others

82 QUANTUM GATES

83 Operations on Qubits - Reversible Logic

§ Due to the nature of quantum physics, the destruction of information in a gate will cause heat to be evolved, which can destroy the superposition of qubits.

Ex. The AND gate (inputs A, B; output C):

A  B  C
0  0  0   <- in these 3 cases,
0  1  0      information is
1  0  0      being destroyed
1  1  1

§ This type of gate cannot be used. We must use Quantum Gates.

84 Quantum Gates
§ Quantum gates are similar to classical gates, but do not have a degenerate output, i.e., their original input state can be derived from their output state, uniquely. They must be reversible.
§ This means that a deterministic computation can be performed on a quantum computer only if it is reversible. Luckily, it has been shown that any deterministic computation can be made reversible. (Charles Bennett, 1973)

85 Quantum Gates - Hadamard

§ The simplest gate involves one qubit and is called a Hadamard gate (also known as a square-root-of-NOT gate). It is used to put qubits into superposition.

[Figure: |0> --H--> |0> + |1> --H--> |1>]

Note: Two Hadamard gates used in succession can be used as a NOT gate

86 Quantum Gates - Controlled NOT

§ A gate which operates on two qubits is called a Controlled-NOT (CN) gate. If the bit on the control line is 1, invert the bit on the target line.

A - Target, B - Control

Input    Output
A  B     A' B'
0  0     0  0
0  1     1  1
1  0     1  0
1  1     0  1

Note: The CN gate has behavior similar to the XOR gate, with some extra information to make it reversible.

87 Example Operation - Multiplication By 2

§ We can build a reversible logic circuit to calculate multiplication by 2 using CN gates arranged in the following manner:

      Input                 Output
Carry bit  Ones bit    Carry bit  Ones bit
    0         0            0         0
    0         1            1         0

[Figure: CN-gate circuit with the carry bit initialized to 0 and the ones bit as input]

88 Quantum Gates - Controlled Controlled NOT (CCN)

§ A gate which operates on three qubits is called a Controlled-Controlled-NOT (CCN) gate. Iff the bits on both of the control lines are 1, the target bit is inverted.

A - Target, B - Control 1, C - Control 2

Input      Output
A B C      A' B' C'
0 0 0      0  0  0
0 0 1      0  0  1
0 1 0      0  1  0
0 1 1      1  1  1
1 0 0      1  0  0
1 0 1      1  0  1
1 1 0      1  1  0
1 1 1      0  1  1

89 A Universal Quantum Computer

§ The CCN gate has been shown to be a universal reversible logic gate, as it can be used as a NAND gate.

A - Target, B - Control 1, C - Control 2

Input      Output
A B C      A' B' C'
0 0 0      0  0  0
0 0 1      0  0  1
0 1 0      0  1  0
0 1 1      1  1  1
1 0 0      1  0  0
1 0 1      1  0  1
1 1 0      1  1  0
1 1 1      0  1  1

When the target input A is 1, the target output A' is the NAND of B and C.

90 Shor's Algorithm § Shor's algorithm shows (in principle) that a quantum computer is capable of factoring very large numbers in polynomial time.

§ The algorithm is dependent on § Modular arithmetic § Quantum parallelism § The quantum Fourier transform
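The modular-arithmetic ingredient listed above is classical: Shor's algorithm evaluates modular exponentiation (in superposition) to find the period of a^x mod m. A standard square-and-multiply sketch in C:

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation: a^e mod m
 * in O(log e) multiplications. */
uint64_t mod_pow(uint64_t a, uint64_t e, uint64_t m) {
    uint64_t r = 1 % m;                /* handles m == 1 */
    a %= m;
    while (e > 0) {
        if (e & 1)
            r = (r * a) % m;           /* fold in the current bit of e */
        a = (a * a) % m;               /* square the base */
        e >>= 1;
    }
    return r;
}
```

For example, 7^4 mod 15 = 1; the period of 7 mod 15 revealed this way (4 here) is the quantity Shor's algorithm extracts to factor 15.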

91 Timeline

§ 2003 - A research team in Japan demonstrated the first solid state device needed to construct a viable quantum computer § 2001 - First working 7-qubit NMR computer demonstrated at IBM’s Almaden Research Center. First execution of Shor’s algorithm. § 2000 - First working 5-qubit NMR computer demonstrated at IBM's Almaden Research Center. First execution of order finding (part of Shor's algorithm). § 1999 - First working 3-qubit NMR computer demonstrated at IBM's Almaden Research Center. First execution of Grover's algorithm. § 1998 - First working 2-qubit NMR computer demonstrated at University of California Berkeley. § 1997 - MIT published the first papers on quantum computers based on spin resonance & thermal ensembles. § 1996 - Lov Grover at Bell Labs invented the quantum database search algorithm § 1995 - Shor proposed the first scheme for quantum error correction 92