Lecture 25: New Computer Architectures and Models -- Neuromorphic Architecture, Quantum Computing, and 3D Integration/Photonics/Memristor

CSE 564 Computer Architecture Fall 2016

Department of Computer Science and Engineering, Yonghong Yan, [email protected], www.secs.oakland.edu/~yan

Acknowledge and Copyright

§ https://passlab.github.io/CSE564/copyrightack.html

2 REVIEW OF SYNCHRONIZATION

3 Data Races in a Multithreaded Program

Consider each thread updating a shared variable best_cost:

    if (my_cost < best_cost)
        best_cost = my_cost;

– two threads
– the initial value of best_cost is 100
– the values of my_cost are 50 and 75 for threads t1 and t2

    T1                                  T2
    if (my_cost (50) < best_cost)
                                        if (my_cost (75) < best_cost)
    best_cost = my_cost;  /* 50 */
                                        best_cost = my_cost;  /* 75 */

§ The value of best_cost could be 50 or 75! § The value 75 does not correspond to any serialization of the two threads.

4 Mutual Exclusion using Pthread Mutex

int pthread_mutex_lock(pthread_mutex_t *mutex_lock);
int pthread_mutex_unlock(pthread_mutex_t *mutex_lock);
int pthread_mutex_init(pthread_mutex_t *mutex_lock,
                       const pthread_mutexattr_t *lock_attr);

pthread_mutex_t cost_lock;

int main() {
    ...
    pthread_mutex_init(&cost_lock, NULL);
    pthread_create(&thhandle1, NULL, find_best, ...);
    pthread_create(&thhandle2, NULL, find_best, ...);
}

void *find_best(void *list_ptr) {
    ...
    pthread_mutex_lock(&cost_lock);    /* enter CS */
    if (my_cost < best_cost)           /* critical section */
        best_cost = my_cost;
    pthread_mutex_unlock(&cost_lock);  /* leave CS */
}

pthread_mutex_lock blocks the calling thread if another thread holds the lock.
When the pthread_mutex_lock call returns:
1. The mutex is locked; the caller enters the critical section (CS).
2. Any other locking attempt (call to pthread_mutex_lock) will block the calling thread.
When pthread_mutex_unlock returns:
1. The mutex is unlocked; the caller leaves the CS.
2. One thread blocked on a pthread_mutex_lock call will acquire the lock and enter the CS.

5 Choices of Hardware Primitives for Synchronizations -- 2

§ Compare and Swap (CAS)

    compare&swap(&address, reg1, reg2) {
        if (reg1 == M[address]) {
            M[address] = reg2;
            return success;
        } else {
            return failure;
        }
    }

§ Load-linked and Store-conditional

    load-linked&store-conditional(&address) {
    loop:
        ll   r1, M[address];
        movi r2, 1;            /* Can do arbitrary computation */
        sc   r2, M[address];
        beqz r2, loop;
    }

https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

6 Improved Hardware Primitives: LL-SC

§ Goals:
– Test with reads
– Failed read-modify-write attempts don't generate invalidations
– Nice if a single primitive can implement a range of r-m-w operations
§ Load-Locked (or -Linked), Store-Conditional
– LL reads the variable into a register
– Follow with arbitrary instructions to manipulate its value
– SC tries to store back to the location
– SC succeeds if and only if there has been no other write to the variable since this processor's LL
» indicated by condition codes
§ If SC succeeds, all three steps happened atomically
§ If it fails, it doesn't write or generate invalidations -- must retry the acquire
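As a concrete sketch of the CAS primitive above, the GCC atomic builtins referenced on the slide (`__sync_bool_compare_and_swap`, `__sync_lock_release`) can implement a minimal spinlock protecting the best_cost update from the race example. This is illustrative only: it spins with no backoff and no fairness, and the `my_costs` setup is invented for the demonstration.

```c
#include <pthread.h>

/* Minimal spinlock built on the GCC __sync compare-and-swap builtin. */
static volatile int lock_var = 0;          /* 0 = free, 1 = held */

static void spin_lock(volatile int *l) {
    /* Atomically: if (*l == 0) set *l = 1; otherwise retry. */
    while (!__sync_bool_compare_and_swap(l, 0, 1))
        ;                                  /* spin until the CAS succeeds */
}

static void spin_unlock(volatile int *l) {
    __sync_lock_release(l);                /* store 0 with release semantics */
}

static int best_cost = 100;                /* shared variable from the race example */
static int my_costs[2] = { 50, 75 };

static void *find_best(void *arg) {
    int my_cost = my_costs[*(int *)arg];
    spin_lock(&lock_var);                  /* enter critical section */
    if (my_cost < best_cost)
        best_cost = my_cost;
    spin_unlock(&lock_var);                /* leave critical section */
    return NULL;
}
```

With two pthreads running find_best, the critical section guarantees best_cost ends up 50 under every interleaving, unlike the unsynchronized version.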

7 Simple Lock with LL-SC

    lock:   ll   R1, mem[cost_lock]   /* LL location into reg1 */
            sc   mem[cost_lock], R2   /* SC reg2 into location  */
            beqz R2, lock             /* if failed, start again */
            ret
    unlock: st   mem[cost_lock], #0   /* write 0 to location    */
            ret

§ Can do fancier atomic ops by changing what's between LL & SC
– But keep it small so the SC is likely to succeed
– Don't include instructions that would need to be undone (e.g., stores)
§ SC can fail (without putting a transaction on the bus) if:
– It detects an intervening write even before trying to get the bus
– It tries to get the bus but another processor's SC gets the bus first
§ LL and SC are not lock and unlock, respectively
– They only guarantee no conflicting write to the lock variable between them
– But they can be used directly to implement simple operations on shared variables
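The last bullet -- implementing simple operations on shared variables directly -- can be sketched with a compare-and-swap retry loop standing in for the LL/SC pair (again using the GCC builtin linked on the previous slide). This is a hypothetical helper, not code from the slides: it updates the shared minimum best_cost without any lock.

```c
#include <limits.h>

/* Lock-free update of a shared minimum via a CAS retry loop.
 * CAS here plays the role of the LL/SC pair on the slide. */
static int best_cost = INT_MAX;

void atomic_min(int *shared, int my_cost) {
    int old = *shared;                       /* the "LL": read current value */
    while (my_cost < old) {
        /* The "SC": install my_cost only if *shared is still old. */
        if (__sync_bool_compare_and_swap(shared, old, my_cost))
            return;                          /* our write took effect atomically */
        old = *shared;                       /* lost the race: re-read and retry */
    }
}
```

Unlike the mutex version, a failed attempt simply retries; no thread ever blocks holding a lock.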

8 Memory Consistency Model

§ One of the most confusing topics (if not the most) in computer systems, parallel programming, and parallel computer architecture

9 VON NEUMANN ARCHITECTURE AND TURING MACHINE

10 The Stored Program Computer

1944: ENIAC
– Presper Eckert and John Mauchly -- the first general electronic computer
– hard-wired program -- settings of dials and switches
1944: Beginnings of EDVAC
– among other improvements, includes a program stored in memory
1945: John von Neumann
– wrote a report on the stored-program concept, known as the First Draft of a Report on the EDVAC
– failed to credit the designers; ironically, he still gets the credit
The basic structure proposed in the draft became known as the "von Neumann machine" (or model):
– a memory, containing instructions and data
– a processing unit, for performing arithmetic and logical operations
– a control unit, for interpreting instructions
More history in the optional online lecture

11 John von Neumann

12 Von Neumann Architecture (Model)

§ Machine Model – Architecture

https://en.wikipedia.org/wiki/Von_Neumann_architecture

13 The Von Neumann Architecture

§ Model for designing and building computers, based on the following three characteristics:
1) Main sub-systems of computers
» Memory
» ALU (Arithmetic/Logic Unit)
» Control Unit
» Input/Output System (I/O)
2) Program is stored in memory during execution.
3) Program instructions are executed sequentially.

14 Memory

k x m array of stored bits (k is usually 2^n)
Address
– unique (n-bit) identifier of a location
Contents
– m-bit value stored in the location
Need to distinguish between
• the address of a memory cell
• and the content of a memory cell

[Figure: 16 memory cells addressed 0000-1111; e.g., cell 0010 holds 00101101 and cell 0101 holds 10100010]

15 Operations on Memory

§ Fetch (address): – Fetch a copy of the content of memory cell with the specified address. – Non-destructive, copies value in memory cell. § Store (address, value): – Store the specified value into the memory cell specified by address. – Destructive, overwrites the previous value of the memory cell. § The memory system is interfaced via: – Memory Address Register (MAR) – Memory Data Register (MDR) – Fetch/Store signal
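The MAR/MDR interface above can be sketched as a toy model in C. The cell count, register widths, and the enum name are illustrative choices for a 16-cell (n = 4) memory, not part of any real ISA.

```c
#include <stdint.h>

/* Toy model of the memory interface: a k x m array (16 cells of 8 bits),
 * driven through a Memory Address Register (MAR), a Memory Data Register
 * (MDR), and a fetch/store control signal. */
#define K 16                      /* k = 2^n cells, n = 4 address bits */

static uint8_t mem[K];            /* the stored bits */
static uint8_t MAR;               /* memory address register (n bits) */
static uint8_t MDR;               /* memory data register (m bits) */

enum mem_signal { FETCH, STORE };

void memory_cycle(enum mem_signal s) {
    if (s == FETCH)
        MDR = mem[MAR % K];       /* non-destructive: copies the cell's value */
    else
        mem[MAR % K] = MDR;       /* destructive: overwrites the cell */
}
```

A store followed by a fetch of the same address round-trips the value through the MDR, mirroring the Fetch/Store operations described above.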

16 Processing Unit

Functional Units
– ALU = Arithmetic and Logic Unit
– could have many functional units, some of them special-purpose (multiply, square root, ...)
Registers
– small, temporary storage
– operands and results of functional units
Word Size
– number of bits normally processed by the ALU in one instruction
– also the width of the registers

17 Input and Output

Devices for getting data into and out of computer memory

INPUT: Keyboard, Mouse, Scanner, Disk
OUTPUT: Monitor, Printer, LED, Disk

Each device has its own interface, usually a set of registers like the memory's MAR and MDR

– Keyboard (input) and console (output)
– keyboard: data register (KBDR) and status register (KBSR)
– console: data register (CRTDR) and status register (CRTSR)
– frame buffer: memory-mapped pixels
Some devices provide both input and output
– disk, network
A program that controls access to a device is usually called a driver.

18 Control Unit (Finite State Machine)

Orchestrates execution of the program

[Figure: CONTROL UNIT containing the PC and IR]

Instruction Register (IR) contains the current instruction. Program Counter (PC) contains the address of the next instruction to be executed. Control unit: – reads an instruction from memory » the instruction’s address is in the PC – interprets the instruction, generating signals that tell the other components what to do » an instruction may take many machine cycles to complete

19 Instruction Processing

Fetch instruction from memory

Decode instruction

Evaluate address

Fetch operands from memory

Execute operation

Store result
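The six phases above can be sketched as a tiny accumulator machine in C. The two-field instruction encoding (4-bit opcode, 4-bit address), the opcode set, and the memory layout are all invented for illustration.

```c
#include <stdint.h>

/* Toy fetch-decode-execute loop over a 16-cell memory. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

static uint8_t mem[16] = {
    /* program */ (LOAD << 4) | 14, (ADD << 4) | 15, (STORE << 4) | 13, HALT,
    0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* data */    0, 5, 7           /* mem[14] = 5, mem[15] = 7, result in mem[13] */
};

int run(void) {
    uint8_t pc = 0, acc = 0;
    for (;;) {
        uint8_t ir = mem[pc++];          /* 1. fetch instruction, advance PC */
        uint8_t op = ir >> 4;            /* 2. decode */
        uint8_t addr = ir & 0x0F;        /* 3. evaluate address */
        switch (op) {
        case LOAD:  acc = mem[addr]; break;    /* 4. fetch operand */
        case ADD:   acc += mem[addr]; break;   /* 5. execute */
        case STORE: mem[addr] = acc; break;    /* 6. store result */
        case HALT:  return mem[13];
        }
    }
}
```

Running the three-instruction program computes 5 + 7 and stores 12 in mem[13], one phase of the cycle at a time.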

20 Driving Force: The Clock

The clock is a signal that keeps the control unit moving. – At each clock “tick,” control unit moves to the next machine cycle -- may be next instruction or next phase of current instruction. Clock generator circuit: – Based on crystal oscillator – Generates regular sequence of “0” and “1” logic levels – Clock cycle (or machine cycle) -- rising edge to rising edge

[Figure: clock waveform alternating between "1" and "0"; one machine cycle spans rising edge to rising edge]

21 Summary -- Von Neumann Model

[Figure: Von Neumann model -- MEMORY (MAR, MDR) connected to INPUT (Keyboard, Mouse, Scanner, Disk), OUTPUT (Monitor, Printer, LED, Disk), the PROCESSING UNIT (ALU, TEMP), and the CONTROL UNIT (PC, IR)]

22 Turing Machine 1936

https://en.wikipedia.org/wiki/Turing_machine

23 Alan Turing

24 Modern Computers

§ Von Neumann machines implement a universal Turing machine and have a sequential architecture
§ Most computers are based on the von Neumann architecture and on Turing's 1936 model

25 NEUROMORPHIC COMPUTING

26 Why Neuromorphic Computing?

http://www.isqed.org/English/Archives/2015/Tutorials/Section-6-Jiang.pdf 27 Brain – The Most Efficient Computing Machine

28 Biological Neural Systems https://en.wikipedia.org/wiki/Nervous_system

https://en.wikipedia.org/wiki/Synapse 29 Artificial Neural Network

§ Each neuron can perform non-linear operations.
§ The algorithm is designed to mimic the behavior of biological neural networks.
§ Currently, the algorithm runs on a normal PC.
§ New parallel computing chips can help improve the learning process.
– Neurosynaptic chips / brain-inspired chips
– The hardware design also mimics the biological brain -- it is also based on a neural network

30 Artificial Neural Network

• "We require exquisite numerical precision over many logical steps to achieve what brains accomplish in very few short steps." - John von Neumann
• A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

31 Neurosynaptic System


32 Neuromorphic System

33 Comparison with Conventional Chips: Architecture

                  Conventional Computer             Brain-Inspired Computer
Architecture      Von Neumann                       Neural Network
Computing unit    CPU                               Synaptic chip (e.g., TrueNorth)
Storing unit      Memory                            Synaptic chip (e.g., TrueNorth)
Computing         Serial (multiple cores)           Massively parallel
Communication     CPU <-> Memory                    Neurons <-> Neurons
Advantage         Processing (logical, analytical)  Learning (pattern recognition)

Von Neumann bottleneck

34 Comparison with Conventional Chips: Architecture
§ Processing and storage are separated in a CPU.
§ The CPU is built for a linear process, to handle a linear sequence of events.
§ A synaptic chip integrates processing with storage.
§ A synaptic chip processes information in a massively parallel fashion.
§ Each neuron is able to process a piece of information and store it locally.
§ Synapses help with data communication between neurons.
§ Synapses decide the connectivity between neurons and can thus rewire them.

http://www.forbes.com/sites/alexknapp/2011/08/26/how-ibms-cognitive-computer-works/ 35 Comparison with Conventional Chips

• Example: converting a color image (480x360x3) to a gray image
– gray value = (red + green + blue) / 3
– CPU: 480x360 = 172800 iterations of linear processing
– Synaptic chip: with 480x360x3 input neurons and 480x360 output neurons, each output neuron computes its gray value -- one iteration of parallel processing
• The CPU is good at linear processing, making sure the logic sequence is correct.
• The synaptic chip is good at massively parallel processing of image data:
– image processing
– pattern recognition
– network simulation ...
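The CPU side of the example above is just a sequential loop; a minimal C sketch makes the 172800-iteration count concrete (array shapes follow the slide's 480x360x3 image).

```c
#include <stdint.h>

/* Sequential (CPU-style) gray conversion: one pass of 480*360 = 172800
 * iterations, averaging the three color channels per pixel. */
#define W 480
#define H 360

void rgb_to_gray(uint8_t rgb[H][W][3], uint8_t gray[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)   /* 172800 iterations in total */
            gray[y][x] = (uint8_t)((rgb[y][x][0] + rgb[y][x][1] + rgb[y][x][2]) / 3);
}
```

A synaptic chip would instead dedicate one output neuron per pixel and perform all 172800 averages in a single parallel step.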

36 Comparison with Conventional Chips Complexity

37 Comparison with Conventional Chips: Power Efficiency

§ > 1000 times as efficient as chips made with the conventional architecture.

                 Conventional chips   TrueNorth
Power density    50~100 W/cm^2        0.02 W/cm^2

§ In 2012, the Sequoia IBM conventional supercomputer, simulating a brain using 500 billion neurons and 100 trillion synapses at 1/1500 of brain speed, required 12 GW of power.
§ Each TrueNorth consumes 0.07 W (average 63 mW, maximum 72 mW).
§ The same simulation would require 27.3~35 kW.

Science 8 August 2014: 345 (6197), 614-616. 38 Comparison with Conventional Chips

§ The efficiency of conventional computers is limited because they store data and program instructions in a block of memory that's separate from the processor that carries out the instructions. As the processor works through its instructions in a linear sequence, it has to constantly shuttle information back and forth from the memory store -- a bottleneck that slows things down and wastes energy.
§ Synaptic chips work in parallel, and information can be stored across numerous synaptic chips. The integration of processing and storage avoids data shuttling and makes computing more energy efficient.
§ About 176,000 times more efficient than a modern CPU running the same brain-like workload.

http://www.extremetech.com/extreme/187612-ibm-cracks-open-a-new-era-of-computing-with-brain-like-chip-4096-cores-1-million-neurons-5-4-billion-transistors
http://www.technologyreview.com/news/529691/ibm-chip-processes-data-similar-to-the-way-your-brain-does/

39 Comparison with Conventional Chips Power Efficiency

40 Comparison with Conventional Chips: Denser Package
• Denser package
– TrueNorth size is 4.3 cm^2
– To achieve the same computational performance as the Sequoia IBM conventional supercomputer, it would require 400K~500K TrueNorth chips.
– The size would be about 172~215 m^2.

Sequoia supercomputer / synaptic chip wall
http://www.artificialbrains.com/darpa-synapse-program
http://www.extremetech.com/extreme/131413-us-retakes-supercomputing-crown-with-16-petaflops-sequoia-china-promises-100-petaflops-by-2015

41 Comparison with Conventional Chips: Faster Speed
• Faster speed
– Multi-object detection and classification with 400-pixel-by-240-pixel three-color video input at 30 frames per second
– More than 160 million spikes per second (5.44 Gbit/s)
– TrueNorth vs. Core i7 CPU 950 with 4 cores and 8 threads, clocked at 3.07 GHz (45 nm process, 2009)

www.sciencemag.org/content/345/6197/668/suppl/DC1

42 Comparison with Conventional Chips: Commercial Availability

Hardware          CPU                       Synaptic Chip
Dominant design   Intel, AMD                No (TrueNorth and NPU are prototypes)
Quantity          1 (generally)             ~50K (for human-brain scale, but growing)
Breakthrough      Intel 4004 in 1971        IBM TrueNorth in 2014 (not on market)
Manufacturing     Transistor process        Transistor process (45nm, 28nm)

Software          CPU                       Synaptic Chip
Language          C, C++, Java, etc.        IBM Corelet language
Operating system  Windows, iOS, Linux       New operating system
Application       Office, games, etc.       New applications (Corelet Library)
Algorithm         General (incl. learning)  Learning algorithms
Compiler          Available                 New compiler
Debugger          Available                 New debugger

http://www.computerworld.com/article/2484737/computer-processors/ibm-devises-software-for-its-experimental-brain-like-synapse-chips.html
http://www.research.ibm.com/software/IBMResearch/multimedia/IJCNN2013.corelet-language.pdf
Amir, Arnon, et al. "Cognitive computing programming paradigm: a corelet language for composing networks of neurosynaptic cores." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
http://darksilicon.ucsd.edu/2012/assets/slides/13

43 Comparison with Conventional Chips: Commercial Availability -- Current Status

[Figure: Switching Point]


44 Comparison with Conventional Chips § Qualcomm’s zeroth chip (Neural Processing Unit) may be incorporated into the new smartphone. – “The Zeroth software is being developed to launch with Qualcomm’s Snapdragon 820 processor, which will enter production later this year. The chip and the Zeroth software are also aimed at manufacturers of drones and robots.”

https://en.wikipedia.org/wiki/Zeroth_(software)
https://www.qualcomm.com/invention/cognitive-technologies/machine-learning

45 Performance: Neuromorphic Systems Status

                       Stanford    HRL           SpiNNaker   HiCANN     IBM         Human
                       Neurogrid   neuromorphic  HBP         HBP        TrueNorth   Brain
                       (2009)      chip (2014)   (2012)      (2012)     (2014)
Neurons / prototype    1.00E+06    2304          2.00E+07    1.20E+06   1.60E+07    2.00E+10
Synapses / prototype   8.00E+09    292000        2.00E+10    3.00E+08   4.00E+09    2.00E+14
Power consumption      50          120           1000        3000       20          10 mW/cm^3
  (mW/cm^2)
Manufacturing process  180         90            130         65         28
  (nm)

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss http://rt.com/usa/202323-brain-chip-drone-darpa/ http://www.artificialbrains.com/spinnaker#hardware 46 Performance Neuromorphic Systems Status

47 Performance of IBM Cognitive Chip

§ Defense Advanced Research Projects Agency (DARPA) SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics)
– Brain-inspired computer architecture, event-driven

[Figures: IBM prototype and basic structure]
http://digi.163.com/14/0811/06/A3BKCVHG001618H9.html
http://www.research.ibm.com/articles/brain-chip.shtml

48 Performance of IBM Cognitive Chip

Rate of Improvement of IBM TrueNorth Prototype

                         2013       2014       2017       2018       Human Brain
Neurons                  1.00E+06   1.60E+07   4.00E+09   1.00E+10   2.00E+10
Synapses                            4.00E+09   1.00E+12   1.00E+14   2.00E+14
Power consumption (W)    5.45E+04   4000       1000                  20

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss

49 Performance of IBM Cognitive Chip

[Figure: improvement trends of the TrueNorth prototype; CAGR = 1160%, 531%, and -172% for the three curves]

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=tVSs3tKj1tw http://www.research.ibm.com/articles/brain-chip.shtml http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=i9UhV_HagUs http://www-03.ibm.com/press/us/en/pressrelease/44529.wss The compound annual growth rate (CAGR) is the mean annual growth rate of an investment over a specified period of time longer than one year. 50 Artificial Neural Network

§ ANN: simulating the biological brain with nonlinear, dynamic, statistical mathematical models.
– Blue Brain Project -- EPFL
– IBM BlueGene L/P supercomputer
– Open-source simulation software NEURON: www.neuron.yale.edu/neuron/

Blue Brain Project

          Cortical column  Rat cortical    Equivalent honey  Rat brain  Full human neocortical
          (2006)           column (2007)   bee brain (2012)  (2014)     brain (2023)
Neurons   10000            10000           1.00E+06          2.10E+07   8.60E+10

http://bluebrain.epfl.ch/page-59952-en.html http://www.artificialbrains.com/blue-brain-project

51 Artificial Neural Network

[Figure: growth of simulated neurons in the Blue Brain Project; CAGR = 160%]

http://bluebrain.epfl.ch/page-59952-en.html http://www.artificialbrains.com/blue-brain-project

52 Cost Analysis

§ According to Moore's law, we can predict the number of transistors on one chip in the future
– number of transistors per chip

year     2014      2016      2017      2018      2020      2022      2024
number   5.40E+09  1.08E+10  1.53E+10  2.16E+10  4.32E+10  8.64E+10  1.73E+11

53 Cost Analysis

§ According to the price we assumed previously, we can get the following result:

year   2014    2017    2018
cost   145.8   413.1   583.2

54 Cost Analysis

§ According to the table below, we can calculate the number of chips for each interval:

Rate of Improvement of IBM TrueNorth Prototype

                         2013       2014       2017       2018       Human Brain
Neurons                  1.00E+06   1.60E+07   4.00E+09   1.00E+10   2.00E+10
Synapses                            4.00E+09   1.00E+12   1.00E+14   2.00E+14
Power consumption (W)    5.45E+04   4000       1000                  20

55 Cost Analysis

§ The number of chips for one artificial brain

year   2014    2017   2018
low    1250    5      2
high   50000   200    4

[Figure: number of chips for one artificial brain vs. year (log scale), low and high estimates]

56 Cost Analysis

§ Total cost for one artificial brain ($)

year   2014       2017     2018
low    1.82E+05   2065.5   1166.4
high   7.29E+06   82620    2332.8

[Figure: total cost ($) for one artificial brain vs. year (log scale); CAGR = -789% (high) and -253% (low)]

57 Application

58 Applications

Object detection

§ Solar-powered leaf
– detects changes in the environment
– sends out environmental and forest-fire alerts
§ Roller robot
– search-and-rescue robot
– has 32 video cameras
– beams back data from hazardous environments

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=JOW40Q37jNX

59 Application

§ Vision assistance for the blind
– Emulating the visual cortex, low-power, lightweight eyeglasses designed to help the visually impaired could be outfitted with multiple video and auditory sensors that capture and analyze the optical flow of data.

http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=JOW40Q37jNX

Image Processing

60 Application Vision assistance

§ Synesthetic feedback
– Surround sound is used to indicate the location of a point of interest and to provide audible guidance along the pathway.
§ Visual cues
– For users with residual sight, points of interest or obstacles can be highlighted by displaying either overlapping symbols or braille keywords.

61 Hybrid von Neumann & Neuromorphic Architecture with Photonic Interconnects and 3D-Stacked Chips/Memristors

62 Videos and Info from IBM

§ IBM – http://www.research.ibm.com/cognitive-computing/

§ DoE Neuromorphic computing workshop – http://ornlcda.github.io/neuromorphic2016/

§ IEEE Rebooting Computing – http://rebootingcomputing.ieee.org/

§ Others (link from this site): – https://www.hpcwire.com/2016/03/21/lacking-breakthrough-neuromorphic-computing-steadily-advance/

63 QUANTUM COMPUTING

64 Deterministic

§ Binary Logic – 0 and 1

§ Two states – High and low voltage – Easy to implement

65 Quantum Computing § A quantum computer is a machine that performs calculations based on the laws of quantum mechanics, which describe the behavior of particles at the sub-atomic level.

http://mrrittner.weebly.com/unit-2- biochemistry.html

66 Introduction

§ Quantum Mechanics – Why?
– Moore's law (features shrink toward the atomic/sub-atomic scale)
– Study of matter at the atomic level (the power of atoms)
– Classical physics laws do not apply

§ Superposition – Simultaneously possess two or more values § Entanglement – Quantum states of two atoms correlated even though spatially separated!!! – Albert Einstein baffled “spooky action at a distance”

67 Superposition

§ Simultaneously possess two or more values – Schrödinger's cat

https://en.wikipedia.org/wiki/Schr%C3%B6dinger's_cat https://en.wikipedia.org/wiki/Quantum_superposition 68 Entanglement

§ Quantum states of two atoms are correlated even though spatially separated!!!
– Acting on a particle here can instantly influence a particle far away, something that is often described as theoretical teleportation.
§ "Spooky action at a distance" baffled Albert Einstein
– He could not understand ;-)

§ Example: sending each of a pair of shoes to two places
– When one person learns whether hers is the left or the right shoe, she instantly knows which one the other has

§ Examples: Sending each of a pair of shoes to two places – When one know whether hers is left or right, she knows the other one

https://www.scienceandnonduality.com/videos/brilliantly-simple-explanation-of-quantum-entanglement/

https://en.wikipedia.org/wiki/Quantum_entanglement 69 Feynman to create quantum machine

§ “I think I can safely say that nobody understands quantum mechanics” – Feynman § 1982 - Feynman proposed the idea of creating machines based on the laws of quantum mechanics instead of the laws of classical physics.

§ 1985 - David Deutsch developed the quantum Turing machine, showing that quantum circuits are universal. § 1994 - Peter Shor came up with a quantum algorithm to factor very large numbers in polynomial time. § 1997 - Lov Grover developed a quantum search algorithm with O(√N) complexity

70 Most Recent Development

§ Two breakthroughs in 2000. – Isaac Chuang (now an MIT professor, but then working at IBM's Almaden Research Center) used five fluorine atoms to make a crude, five-qubit quantum computer. – Researchers at Los Alamos National Laboratory figured out how to make a seven-qubit machine using a drop of liquid. § 2005, researchers at the University of Innsbruck added an extra qubit and produced the first quantum computer that could manipulate a qubyte (eight qubits). § 2011, a pioneering Canadian company called D-Wave Systems announced in Nature that it had produced a 128-qubit machine. § March 2015, the team announced they were "a step closer to quantum computation," having developed a new way for qubits to detect and protect against errors. http://www.explainthatstuff.com/quantum-computing.html § Quantum computing from IBM – http://www.research.ibm.com/quantum/

71 Representation of Data - Qubits

§ A bit of data is represented by a single atom that is in one of two states denoted by |0> and |1>. A single bit of this form is known as a qubit.
§ A physical implementation of a qubit could use the two energy levels of an atom: an excited state representing |1> and a ground state representing |0>.

[Figure: a light pulse of frequency λ applied for time interval t flips the electron between the ground state |0> and the excited state |1>]

72 Qubits

§ A single qubit can be forced into a superposition of the two states, denoted by the addition of the state vectors
– A qubit in superposition is in BOTH of the states |1> and |0> at the same time

"To be, or not to be. That is the question" – William Shakespeare
The classical answers: "to be" or "not to be"
The quantum answer: a x (to be) + b x (not to be)
https://en.wikipedia.org/wiki/Qubit

§ α represents the probability of the superposition collapsing to |0>. The α's are called probability amplitudes. In a balanced superposition, α = 1/√(2^n), where n is the number of qubits.

§ If we attempt to retrieve the values represented within a superposition, the superposition randomly collapses to represent just one of the original values.

74 Qubits in Quantum Computation

§ Qubit - can be 1, 0, or both 1 and 0
§ |Ψ> - a number in a quantum computer

§ Superposition of states of N qubits:

|Ψ> = Σ_{i=0}^{2^N - 1} a_i |s_i>,   where   Σ_{i=0}^{2^N - 1} |a_i|^2 = 1

75 Examples

(1/√2) |0> + (1/√2) |1>

(1/2) |00> + (1/2) |01> + (1/2) |10> + (1/2) |11>

76 Representation of Data - Superposition

[Figure: a light pulse of frequency λ for time interval t/2 takes state |0> to the superposition |0> + |1>]

Consider a 3-qubit register. An equally weighted superposition of all possible states would be denoted by:

|ψ> = (1/√8) |000> + (1/√8) |001> + . . . + (1/√8) |111>

77 Relationships among data - Entanglement

§ Entanglement is the ability of quantum systems to exhibit correlations between states within a superposition.
§ Imagine two qubits, each in the state |0> + |1> (a superposition of 0 and 1).
§ We can entangle the two qubits such that the measurement of one qubit is always correlated to the measurement of the other qubit.

§ Used for quantum communication, such as quantum teleportation

https://en.wikipedia.org/wiki/Quantum_teleportation
http://www.sciencemag.org/news/2016/09/big-step-quantum-teleportation-won-t-bring-us-any-closer-star-trek-here-s-why

78 Quantum Computation
§ Prime factorization (cryptography)
– Peter Shor's algorithm
– A hard classical computation becomes an easy quantum computation
– Factors an n-bit integer in O(n^3)
§ Searching an unordered list
– Lov Grover's algorithm
– A hard classical computation becomes a less hard quantum computation
– n elements in O(√n) queries

79 Advantages over Classical computers

§ Encode more information § Powerful § Massively parallel § Easily crack secret codes § Fast in searching databases § Hard computational problems become tractable – NP to polynomial

80 Applications

§ Defense § Cryptography § Accurate weather forecasts § Efficient search § Teleportation § … § Unimaginable

81 Information

§ IBM – http://www.research.ibm.com/quantum/ § D-Wave – http://www.dwavesys.com/ § Google’s Quantum Dream Machine – https://www.technologyreview.com/s/544421/googles- quantum-dream-machine/

§ Others

82 QUANTUM GATES

83 Operations on Qubits - Reversible Logic

§ Due to the nature of quantum physics, the destruction of information in a gate will cause heat to be evolved, which can destroy the superposition of qubits.

Ex. The AND gate (inputs A, B; output C):

A  B  C
0  0  0   <- in these 3 cases,
0  1  0      information is
1  0  0      being destroyed
1  1  1

§ This type of gate cannot be used. We must use Quantum Gates.

84 Quantum Gates
§ Quantum gates are similar to classical gates, but do not have a degenerate output, i.e., their original input state can be derived from their output state, uniquely. They must be reversible.
§ This means that a deterministic computation can be performed on a quantum computer only if it is reversible. Luckily, it has been shown that any deterministic computation can be made reversible. (Charles Bennett, 1973)

85 Quantum Gates - Hadamard

§ The simplest gate involves one qubit and is called a Hadamard gate (also known as a square-root-of-NOT gate). It is used to put qubits into superposition.

[Figure: |0> --H--> |0> + |1> --H--> |1>]

Note: Two Hadamard gates used in succession can be used as a NOT gate

86 Quantum Gates - Controlled NOT

§ A gate which operates on two qubits is called a Controlled-NOT (CN) gate. If the bit on the control line is 1, invert the bit on the target line.

A - Target, B - Control

Input    Output
A  B     A' B'
0  0     0  0
0  1     1  1
1  0     1  0
1  1     0  1

Note: The CN gate has behavior similar to the XOR gate, with some extra information to make it reversible.

87 Example Operation - Multiplication By 2

§ We can build a reversible logic circuit to calculate multiplication by 2 using CN gates arranged in the following manner:

      Input                 Output
Carry bit  Ones bit    Carry bit  Ones bit
    0         0            0         0
    0         1            1         0

[Figure: CN-gate circuit with the carry bit initialized to 0 and the ones bit as input]

88 Quantum Gates - Controlled Controlled NOT (CCN)

§ A gate which operates on three qubits is called a Controlled-Controlled-NOT (CCN) gate. Iff the bits on both of the control lines are 1, the target bit is inverted.

A - Target, B - Control 1, C - Control 2

Input      Output
A B C      A' B' C'
0 0 0      0  0  0
0 0 1      0  0  1
0 1 0      0  1  0
0 1 1      1  1  1
1 0 0      1  0  0
1 0 1      1  0  1
1 1 0      1  1  0
1 1 1      0  1  1

89 A Universal Quantum Computer

§ The CCN gate has been shown to be a universal reversible logic gate, as it can be used as a NAND gate.

A - Target, B - Control 1, C - Control 2

Input      Output
A B C      A' B' C'
0 0 0      0  0  0
0 0 1      0  0  1
0 1 0      0  1  0
0 1 1      1  1  1
1 0 0      1  0  0
1 0 1      1  0  1
1 1 0      1  1  0
1 1 1      0  1  1

When the target input A is 1, the target output A' is the NAND of B and C.

90 Shor's Algorithm § Shor's algorithm shows (in principle) that a quantum computer is capable of factoring very large numbers in polynomial time.

§ The algorithm is dependent on § Modular arithmetic § Quantum parallelism § The quantum Fourier transform
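The modular-arithmetic ingredient listed above is classical: Shor's algorithm evaluates modular exponentiation (in superposition) to find the period of a^x mod m. A standard square-and-multiply sketch in C:

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation: a^e mod m
 * in O(log e) multiplications. */
uint64_t mod_pow(uint64_t a, uint64_t e, uint64_t m) {
    uint64_t r = 1 % m;                /* handles m == 1 */
    a %= m;
    while (e > 0) {
        if (e & 1)
            r = (r * a) % m;           /* fold in the current bit of e */
        a = (a * a) % m;               /* square the base */
        e >>= 1;
    }
    return r;
}
```

For example, 7^4 mod 15 = 1; the period of 7 mod 15 revealed this way (4 here) is the quantity Shor's algorithm extracts to factor 15.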

91 Timeline

§ 2003 - A research team in Japan demonstrated the first solid state device needed to construct a viable quantum computer § 2001 - First working 7-qubit NMR computer demonstrated at IBM’s Almaden Research Center. First execution of Shor’s algorithm. § 2000 - First working 5-qubit NMR computer demonstrated at IBM's Almaden Research Center. First execution of order finding (part of Shor's algorithm). § 1999 - First working 3-qubit NMR computer demonstrated at IBM's Almaden Research Center. First execution of Grover's algorithm. § 1998 - First working 2-qubit NMR computer demonstrated at University of California Berkeley. § 1997 - MIT published the first papers on quantum computers based on spin resonance & thermal ensembles. § 1996 - Lov Grover at Bell Labs invented the quantum database search algorithm § 1995 - Shor proposed the first scheme for quantum error correction 92