
CS252 Graduate Architecture
Lecture 28
Esoteric Computer Architecture
DNA Computing &
Prof John D. Kubiatowicz
http://www.cs.berkeley.edu/~kubitron/cs252
5/11/2009 cs252-S09, Lecture 28

DNA
• Can we use DNA to do massive computations?
  – Organisms do it
  – DNA has very high information density:
    » 4 different bases, forming two base pairs:
      • Adenine/Thymine
      • Guanine/Cytosine
    » Always paired on opposite strands ⇒ Energetically favorable
  – Active operations:
    » Copy: Split strands of DNA apart in solution, gain 2 copies
    » Concatenate: e.g. GTAATCCT will combine XXXXXCATT with AGGAYYYYY
    » Polymerase Chain Reaction (PCR): amplifies region of molecule between two marker molecules
[Figure credit: ©1999 Access Excellence @ the National Health Museum]
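The concatenate operation above can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the helper names `complement` and `splint_for` are made up for illustration, and the strands are read position by position exactly as the slide writes them (strand direction is ignored).

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, in the same left-to-right order as the slide."""
    return "".join(COMPLEMENT[base] for base in strand)

def splint_for(left: str, right: str, overlap: int = 4) -> str:
    """Linker that pairs with the last `overlap` bases of `left`
    and the first `overlap` bases of `right` (hypothetical helper)."""
    return complement(left[-overlap:] + right[:overlap])

# The X and Y placeholders are never looked up; only CATT and AGGA matter.
linker = splint_for("XXXXXCATT", "AGGAYYYYY")
print(linker)
```

The linker pairs with the tail of one strand and the head of the other, which is what holds the two strands next to each other so they can be joined.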

DNA Computing and Hamiltonian Path
• Given a set of cities and costs between them (possibly directed paths):
  – Find shortest path to all cities
  – Simpler: find single path that visits all cities
• DNA Computing example is latter version:
  – Every city represented by unique 20 base-pair strand
  – Every path between cities represented by complementary pairs: 10 pairs from source city, 10 pairs from destination
  – Shorter example: AAGT for city 1, TTCG for city 2
      Path 1->2: CAAA
      Will build: AAGTTTCG
                  ..CAAA..
  – Dump “city molecules” and “path molecules” into test tube. Select and amplify paths of right length. Analyze for result.
  – Been done for a 7-city graph! (Adleman, 1994)

Even more promising uses of DNA
• Self-assembly of components
  – DNA serves as substrate
  – Attach active elements in middle of components.
  – Final step: metal deposited over DNA
• Other interesting structures could be built
[Figure labels: Active Region, DNA Bonding, Active Region]
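The encoding on the Hamiltonian Path slide can be sketched in Python. This is an illustration only: the third city strand and the road list are made up, and the brute-force search merely stands in for what the test tube does chemically in parallel.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def path_strand(src: str, dst: str) -> str:
    """Complement of the second half of the source city strand
    followed by the first half of the destination city strand."""
    half = len(src) // 2
    return "".join(COMPLEMENT[b] for b in src[half:] + dst[:half])

def hamiltonian_paths(cities: dict, roads: set) -> list:
    """Brute-force stand-in for the test tube: keep every ordering of all
    cities in which each consecutive hop is an existing directed road."""
    found = []
    for order in permutations(cities):
        if all((a, b) in roads for a, b in zip(order, order[1:])):
            found.append("".join(cities[c] for c in order))
    return found

cities = {1: "AAGT", 2: "TTCG", 3: "GGCA"}  # city 3 is a made-up strand
roads = {(1, 2), (2, 3), (1, 3)}            # hypothetical directed roads

print(path_strand(cities[1], cities[2]))    # path molecule for 1->2
print(hamiltonian_paths(cities, roads))     # assembled strands visiting every city
```

Selecting strands of the right length corresponds to keeping only orderings that contain every city exactly once.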

Use Quantum Mechanics to Compute?
• Weird but useful properties of quantum mechanics:
  – Quantization: Only certain values or orbits are good
    » Remember orbitals from chemistry???
  – Superposition: Schizophrenic physical elements don’t quite know whether they are one thing or another
• All existing digital abstractions try to eliminate QM
  – Transistors/Gates designed with classical behavior
  – Binary abstraction: a “1” is a “1” and a “0” is a “0”
• Quantum Computing: Use of Quantization and Superposition to compute.
• Interesting results:
  – Shor’s algorithm: factors in polynomial time!
  – Grover’s algorithm: Finds items in unsorted database in time proportional to square-root of n.
  – Materials simulation: exponential classically, linear-time QM

Quantization: Use of “Spin”
[Figure: Spin ½ particle (Proton/Electron) between North and South poles; Representation: |0> or |1>]
• Particles like Protons have an intrinsic “Spin” when defined with respect to an external magnetic field
• Quantum effect gives “1” and “0”:
  – Either spin is “UP” or “DOWN”, nothing between


Kane Proposal II (First one didn’t quite work)
[Figure labels: Single Spin Control Gates, Inter-bit Control Gates, Phosphorus Impurity Atoms]
• Bits represented by combination of proton/electron spin
• Operations performed by manipulating control gates
  – Complex sequences of pulses perform NMR-like operations
• Temperature < 1 Kelvin!

Now add Superposition!
• The bit can be in a combination of “1” and “0”:
  – Written as: $|\psi\rangle = C_0|0\rangle + C_1|1\rangle$
  – The C’s are complex numbers!
  – Important Constraint: $|C_0|^2 + |C_1|^2 = 1$
• If we measure the bit to see what it looks like,
  – With probability $|C_0|^2$ we will find |0> (say “UP”)
  – With probability $|C_1|^2$ we will find |1> (say “DOWN”)
• Is this a real effect? Options:
  – This is just statistical: given a large number of protons, a fraction of them ($|C_0|^2$) are “UP” and the rest are down.
  – This is a real effect, and the proton is really both things until you try to look at it
• Reality: second choice!
  – There are experiments to prove it!

A register can have many values!
• Implications of superposition:
  – An n-bit register can have $2^n$ values simultaneously!
  – 3-bit example:
    $|\psi\rangle = C_{000}|000\rangle + C_{001}|001\rangle + C_{010}|010\rangle + C_{011}|011\rangle + C_{100}|100\rangle + C_{101}|101\rangle + C_{110}|110\rangle + C_{111}|111\rangle$
• Probabilities of measuring all bits are set by coefficients:
  – So, prob of getting |000> is $|C_{000}|^2$, etc.
  – Suppose we measure only one bit (first):
    » We get “0” with probability: $P_0 = |C_{000}|^2 + |C_{001}|^2 + |C_{010}|^2 + |C_{011}|^2$
      Result: $|\psi\rangle = \frac{1}{\sqrt{P_0}}\,(C_{000}|000\rangle + C_{001}|001\rangle + C_{010}|010\rangle + C_{011}|011\rangle)$
    » We get “1” with probability: $P_1 = |C_{100}|^2 + |C_{101}|^2 + |C_{110}|^2 + |C_{111}|^2$
      Result: $|\psi\rangle = \frac{1}{\sqrt{P_1}}\,(C_{100}|100\rangle + C_{101}|101\rangle + C_{110}|110\rangle + C_{111}|111\rangle)$
• Problem: Don’t want environment to measure before ready!
  – Solution: Quantum Error Correction Codes!

Spooky action at a distance
• Consider the following simple 2-bit state: $|\psi\rangle = C_{00}|00\rangle + C_{11}|11\rangle$
  – Called an “EPR” pair for “Einstein, Podolsky, Rosen”
• Now, separate the two bits: [Figure: the two bits far apart, “Light-Years?”]
• If we measure one of them, it instantaneously sets other one!
  – Einstein called this a “spooky action at a distance”
  – In particular, if we measure a |0> at one side, we get a |0> at the other (and vice versa)
• Teleportation
  – Can “pre-transport” an EPR pair (say bits X and Y)
  – Later to transport bit A from one side to the other we:
    » Perform operation between A and X, yielding two classical bits
    » Send the two bits to the other side
    » Use the two bits to operate on Y
    » Poof! State of bit A appears in place of Y
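The partial-measurement rule from the register slide can be written out as a small Python sketch. The function name `measure_first_bit` and the uniform example state are assumptions made for illustration; the state is stored as a plain dictionary from bit strings to complex coefficients.

```python
import math

def measure_first_bit(amps: dict) -> dict:
    """For each outcome of the first bit, return (probability, state after
    measurement). The surviving coefficients are rescaled by 1/sqrt(p) so
    that the squared magnitudes again sum to 1."""
    results = {}
    for outcome in "01":
        kept = {k: c for k, c in amps.items() if k[0] == outcome}
        p = sum(abs(c) ** 2 for c in kept.values())
        post = {k: c / math.sqrt(p) for k, c in kept.items()} if p > 0 else {}
        results[outcome] = (p, post)
    return results

# Hypothetical 3-bit register: all 8 values equally weighted.
amps = {format(i, "03b"): 1 / math.sqrt(8) for i in range(8)}
res = measure_first_bit(amps)
p0, state0 = res["0"]
print(p0, sorted(state0))
```

Measuring one bit discards half of the terms, which is why an unintended measurement by the environment destroys information held in the register.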

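The teleportation steps can be simulated on three bits A, X and Y. The slide does not name the operations, so this sketch assumes the standard textbook choice: a CNOT from A onto X followed by a Hadamard on A, then an X correction on Y if the second classical bit is 1 and a Z correction if the first is 1. Instead of sampling a random measurement, it walks through all four possible outcomes.

```python
import math

def teleport_all_outcomes(alpha: complex, beta: complex) -> dict:
    """Return the state of Y after correction for each classical bit pair."""
    s = 1 / math.sqrt(2)
    # Amplitudes indexed by (a, x, y); X and Y start as the pair |00> + |11>.
    state = {(a, x, y): (alpha if a == 0 else beta) * (s if x == y else 0)
             for a in (0, 1) for x in (0, 1) for y in (0, 1)}
    # CNOT with A as control and X as target.
    state = {(a, x ^ a, y): c for (a, x, y), c in state.items()}
    # Hadamard on A.
    after_h = {}
    for (a, x, y), c in state.items():
        for a2 in (0, 1):
            sign = -1 if (a == 1 and a2 == 1) else 1
            after_h[(a2, x, y)] = after_h.get((a2, x, y), 0) + sign * s * c
    results = {}
    for m1 in (0, 1):          # classical bit from measuring A
        for m2 in (0, 1):      # classical bit from measuring X
            y0, y1 = after_h[(m1, m2, 0)], after_h[(m1, m2, 1)]
            norm = math.sqrt(abs(y0) ** 2 + abs(y1) ** 2)
            y0, y1 = y0 / norm, y1 / norm
            if m2:
                y0, y1 = y1, y0    # X correction on Y
            if m1:
                y1 = -y1           # Z correction on Y
            results[(m1, m2)] = (y0, y1)
    return results

alpha, beta = 0.6, 0.8j            # arbitrary normalized example state
outcomes = teleport_all_outcomes(alpha, beta)
print(outcomes)
```

Whichever pair of classical bits is sent, Y ends up with the coefficients that A started with, and only two classical bits cross between the sides.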

Model: Operations on coefficients + measurements
[Figure: Input Complex State → Unitary Transformations → Measure → Output Classical Answer]
• Basic Computing Paradigm:
  – Input is a register with superposition of many values
    » Possibly all $2^n$ inputs equally probable!
  – Unitary transformations compute on coefficients
    » Must maintain probability property (sum of squares = 1)
    » Looks like doing computation on all $2^n$ inputs simultaneously!
  – Output is one result attained by measurement
• If do this poorly, just like probabilistic computation:
  – If $2^n$ inputs equally probable, may be $2^n$ outputs equally probable.
  – After measure, like picked random input to classical function!
  – All interesting results have some form of “Fourier transform” computation being done in unitary transformation

Security of Factoring
• The security of RSA public-key cryptosystems depends on the difficulty of factoring a number N=pq (product of two primes)
  – Classical computer: sub-exponential time factoring
  – Quantum computer: polynomial time factoring
• Shor’s Factoring Algorithm (for a quantum computer)
  1) (Easy) Choose random x : $2 \le x \le N-1$.
  2) (Easy) If $\gcd(x,N) \ne 1$, Bingo!
  3) (Hard) Find smallest integer r : $x^r \equiv 1 \pmod{N}$
  4) (Easy) If r is odd, GOTO 1
  5) (Easy) If r is even, $a \equiv x^{r/2} \pmod{N}$ ⇒ $(a-1)(a+1) = kN$
  6) (Easy) If a = N-1 GOTO 1
  7) (Easy) ELSE $\gcd(a \pm 1, N)$ is a non-trivial factor of N.
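The seven steps translate almost line for line into Python. In this sketch the hard step 3 is done by brute force, which takes exponential time in the number of bits and is exactly the part a quantum computer would replace; the function names are chosen here for illustration, and N is assumed to be an odd composite that is not a prime power.

```python
import math
import random

def find_order(x: int, n: int) -> int:
    """Step 3 by brute force: smallest r > 0 with x**r = 1 (mod n).
    Requires gcd(x, n) == 1, otherwise the loop never ends."""
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r

def shor_classical(n: int, rng: random.Random) -> int:
    """Follow steps 1-7 and return a non-trivial factor of n."""
    while True:
        x = rng.randint(2, n - 1)      # step 1
        g = math.gcd(x, n)
        if g != 1:                     # step 2: lucky guess shares a factor
            return g
        r = find_order(x, n)           # step 3 (the hard one)
        if r % 2 == 1:                 # step 4
            continue
        a = pow(x, r // 2, n)          # step 5
        if a == n - 1:                 # step 6
            continue
        return math.gcd(a + 1, n)      # step 7

factor = shor_classical(15, random.Random(0))
print(factor)
```

Step 7 works because (a-1)(a+1) is a multiple of N while neither factor is, so each of them must share a proper divisor with N.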

Finding r with $x^r \equiv 1 \pmod{N}$
[Equation sequence, partly garbled in extraction; normalization omitted]
$\sum_k |k\rangle|1\rangle \;\rightarrow\; \sum_k |k\rangle|x^k\rangle \;=\; \sum_{w=0}^{r-1} \Big( \sum_y |w + ry\rangle \Big) |x^w\rangle$
  then apply the Quantum Fourier Transform to the first register
• Finally: Perform measurement
  – Find out r with high probability
  – Get $|y\rangle|a_{w'}\rangle$ where y is of form k/r and w’ is related

ION Trap Quantum Computer: Promising technology
[Figure: Cross-Sectional View and Top View of the trap. Proposal: NIST Group]
• IONS of Be+ trapped in oscillating quadrupole field
  – Internal electronic modes of IONS used for quantum bits
  – MEMS technology
  – Target? 50,000 ions
  – ROOM Temperature!
• Ions moved to interaction regions
  – Ion interactions with one another moderated by lasers
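The period-finding slide can be imitated classically for a tiny case. This is a sketch under simplifying assumptions that are not in the lecture: the register size `q` is chosen as a multiple of r so the Fourier peaks fall exactly on multiples of q/r, the second register is treated as already measured, and an ordinary discrete Fourier sum stands in for the quantum transform.

```python
import cmath
import math
from fractions import Fraction

def period_candidates(x: int, n: int, q: int) -> set:
    """Group k by the value of x^k mod n, Fourier-transform one group,
    and read candidate periods off the denominators of y/q."""
    branches = {}
    for k in range(q):
        branches.setdefault(pow(x, k, n), []).append(k)
    ks = branches[1]   # the branch with x^k = 1, i.e. k = 0, r, 2r, ...
    candidates = set()
    for y in range(1, q):
        amp = sum(cmath.exp(2j * math.pi * k * y / q) for k in ks)
        amp /= math.sqrt(q * len(ks))
        if abs(amp) ** 2 > 1e-9:   # y can actually be observed
            candidates.add(Fraction(y, q).limit_denominator(n).denominator)
    return candidates

found = period_candidates(7, 15, 16)
print(found)
```

Each observable y is a multiple of q/r, so y/q reduces to a fraction k/r whose denominator divides r; the largest denominator seen over a few runs is r itself.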


Ion Trap Quantum Computer
• Major Components
  - Data = an ion
  - Gate = a location
• Ballistic Movement
  - Apply pulse sequences to electrodes
  - Electrostatic forces move ion
  - Intersections similar, but more complicated pulse sequences

Ballistic Movement Network
[Figure: One-Qubit Gates, Two-Qubit Gates and Memory Cells joined by an Interconnection Network of routers (R); qubits Q1 to Q5]
• Problem: Noise accumulation!

Noise Accumulation from Movement
[Plot: Qubit Error (log scale, 1.0E-08 to 1.0E-02) vs. Distance Moved in Gates (0 to 64); one curve per initial error, 1.0E-4 down to 1.0E-8]
• Noise may increase error by factor of 100

Movement Option 2: Teleportation
[Figure: Source Location and Target Location. 1. Generate EPR pair (E1, E2); 2. Local Ops at source on data D; 3. Transmit two classical bits; 4. Local Ops at target, where D reappears]
• Goal: Transfer the state, not the data ion
• Problem: EPR pairs become noisy
• Teleportation Benefits
  - Error Correction of data (arbitrary state): ~100 ms
    Purification of EPR pair (known state): ~120 µs
  - Pre-distribution of EPR pairs


EPR Pair Distribution Network
[Figure: One-Qubit Gates, Two-Qubit Gates and Memory Cells joined by an Interconnection Network of routers (R); qubits Q1 to Q5; EPR Pair Generators feeding the network, For Data Teleportation]

Setting Up a Teleportation Link
• Purification = Amplification of EPR pair link
  - Two EPR pairs ⇒ One “purer” pair, one junk pair
  - Chance of failure
• Need to send multiple pairs
[Figure: P, G, P nodes in a row; EPR Qubits flow from G to each P, yielding STRONGER Entanglement between the P nodes; Recycled Qubits return to G]
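How much “purer” one round makes a pair can be illustrated with a recurrence from the purification literature. The slide does not say which protocol is used, so this sketch assumes the BBPSSW protocol applied to Werner-state pairs; the function name `purify` and the starting fidelity are hypothetical.

```python
def purify(f: float) -> tuple:
    """One purification round on two pairs of fidelity f.
    Returns (fidelity of the surviving pair, probability the round succeeds).
    Assumes the BBPSSW recurrence for Werner states."""
    e = (1 - f) / 3                      # weight of each of the three error terms
    p_success = f * f + 2 * f * e + 5 * e * e
    return (f * f + e * e) / p_success, p_success

fidelity = 0.75                          # hypothetical starting fidelity
for round_number in range(1, 4):
    fidelity, p = purify(fidelity)
    print(round_number, round(fidelity, 4), round(p, 4))
```

Each round consumes two pairs and keeps at most one, and it can fail outright, which is why many pairs have to be sent to set up one good link.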

Chained Teleportation
[Figure: chain of T nodes joined by G nodes (T G T G T G T), each T with a P node; Gates below; a teleportation hop between adjacent T nodes]
• Adjacent T nodes linked for teleportation
• Positive Features
  - T node linking not on critical path
  - Pre-purification (Link Amplification): part of link setup

Quantum Network Architecture
[Figure: grid of T nodes joined by G nodes, with P nodes and Gates attached]
• Grid of T nodes, linked by G nodes
• Packet-switched network
  - Dimension-order routing
• Each qubit has associated message
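Dimension-order routing on the grid of T nodes is simple enough to sketch directly. The coordinate scheme below is an assumption for illustration: T nodes sit at integer (x, y) positions and a message first corrects its X coordinate, then its Y coordinate.

```python
def dimension_order_route(src: tuple, dst: tuple) -> list:
    """List of grid positions visited from src to dst, X dimension first."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = dimension_order_route((0, 0), (2, 1))
print(route)
```

Every hop in the returned list corresponds to one teleportation between adjacent T nodes, so the hop count equals the Manhattan distance between source and destination.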

Classical Control
• Quantum Datapath Layer
  - T Nodes and G Nodes (P Nodes and Gates not shown)
• Classical Control Layers
  - Messages Associated with Qubits
  - Teleportation and Purification Bits

Running a Quantum Circuit
• Simple gates (transversal)
• More complex gates (non-transversal)
  – Exist in any universal set
• Quantum Error Correction (QEC)
  – $10^{-8}$ to $10^{-6}$ error rates from gates, movement and idleness
  – Data must be encoded and periodically error corrected
• Ancilla (helper) qubits
  – Necessary for complex gates and for QEC
  – Computation with ancilla qubits > 90% of quantum program
[Figure: circuit on qubits Q0 and Q1 with H, T and controlled-X gates, each followed by QEC steps, along a time axis; Serial Circuit Latency]

Running a Quantum Circuit
• Ancilla qubits are independent of data
  – Preparation may be pulled offline
[Figure: circuit on qubits Q0 and Q1 with H, T and controlled-X gates and QEC steps along a time axis, with Ancilla encoding moved off the data path; Parallel Circuit Latency]

Running at The Speed of Data
• Ideally, execution time determined solely by data
  – Ancilla qubits should be ready just in time to avoid noise from idleness
[Figure labels: Operations involving data qubits; hardware]

Limited Ancilla Bandwidth
[Plot: Execution Time of a 32-Bit QCLA (μs), 0 to 900000, vs. Encoded Ancilla Bandwidth Available (Ancillae per ms), 1 to 1000 on a log scale]
• 32-bit Quantum Carry-Lookahead Adder
  – Varying rate at which encoded zero ancillae are provided for QEC
  – Conclusion: design architecture with “ancilla factories”

Ancilla Factory Design I
• “In-place” ancilla preparation
[Figure: Ancilla Generation Circuit. Three rows of 0 Prep and Cat Prep feeding Verify steps, followed by Bit Correct and Phase Correct and QEC; outputs labeled Encoded Ancilla and Verification Qubits]
• Ancilla factory consists of many of these
  – Encoded ancilla prepared in many places, then moved to output port
  – Movement is costly!
[Figure: grid of In-place Prep units]

Idealized Qalypso Architecture
• Dense data region
  – Data qubits only
  – Local communication
• Shared Ancilla Factories
  – Distributed to data as needed
  – Fully multiplexed to all data
  – Output ports: close to data
  – Input ports: may be far from data, since recycled qubits have irrelevant state
• Goals
  – Design ancilla factories
  – Answer Question: How much hardware is needed for ancilla generation to run at the speed of data?

Discussion of ISCA 2009 paper
• “A Fault Tolerant, Area Efficient Architecture for Shor’s Factoring Algorithm”
  – Mark Whitney, Nemanja Isailovic, Yatish Patel, and John Kubiatowicz
  – ISCA 2009
• How to compare layouts? (what is good)?
  – Probabilistic circuits ⇒ need metric that includes probability of failure
  – ADCR = Probabilistic version of area-delay product
    $\mathrm{ADCR} = \dfrac{\mathrm{Area} \times \mathrm{Latency}_{single}}{P_{success}}$
  – Lower is better
• What to optimize?
  – Many different “datapath organizations”
  – Far too much error correction
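The ADCR metric is easy to apply once area, single-run latency and success probability are known. The two layouts compared below are invented numbers for illustration only, not results from the paper.

```python
def adcr(area: float, latency_single: float, p_success: float) -> float:
    """Area times single-run latency, divided by the probability that a
    single run succeeds. Lower is better."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return area * latency_single / p_success

# Hypothetical layouts: (area, latency of one run, probability of success).
small_but_flaky = adcr(area=100.0, latency_single=10.0, p_success=0.5)
large_but_reliable = adcr(area=150.0, latency_single=10.0, p_success=0.9)
print(small_but_flaky, large_but_reliable)
```

Dividing by the success probability charges a layout for the reruns it is expected to need, so a larger but more reliable layout can come out ahead of a smaller one.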


Datapath Organizations
• QLA: “Quantum Logic Array”
  – Every compute region has space for 2 bits and ancilla generation for 2 bits (to correct after every operation)
• CQLA: “Compressed Quantum Logic Array”
  – Same compute regions as QLA, but ability to have less ancilla generation/bit for memory (idle bits less prone to error)
• Qalypso
  – Matching ancilla generation to needs

Error Correction Optimization
• Selective error correction placement
  – Standard techniques: correct after every operation
  – Instead: only correct bits that are particularly “dirty”
• Error correction modeled after retiming optimization
  – Only place correction when approximate “EDist” parameter reaches threshold
  – Then, perform full mapping
  – Choose EDist by optimizing ADCR
• Very successful at reducing area/latency
• Even improves probability of success in some cases!
  – Why? Because error correction involves operations which can introduce error

Shor’s Factoring Circuit
[Figure: Shor’s factoring circuit]
• Most of time spent in modular exponentiation
  – QFT is much smaller fraction of time
  – Easiest way to build modular exponentiation: with adders
    » Build multiplier instead? Not studied yet; would be very large

Conclusion
• Computing can be done in a variety of ways
  – Normal silicon gates not required
• DNA Computing
  – Limited use “demonstration of concept”
  – Form of massive parallelism
  – Interesting consequences: self assembly
• Quantum Computing:
  – Computing using superposition and quantization
  – Ion Traps: a particularly promising technology
• CAD Tools for Quantum Computing
  – Can actually optimize circuits, just like classical case
  – ADCR = probabilistic version of Area-Delay product
• Paper result: can factor in 7659 mm²
  – Previous result was 0.9 m²
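The remark on the Shor’s Factoring Circuit slide that modular exponentiation is easiest to build with adders can be illustrated classically. This sketch is an ordinary integer analogue, not the quantum circuit: exponentiation is reduced to modular multiplications by square-and-multiply, and each multiplication is reduced to modular additions by shift-and-add. All function names are chosen here for illustration.

```python
def mod_add(a: int, b: int, n: int) -> int:
    """The only arithmetic primitive used below: addition modulo n."""
    return (a + b) % n

def mod_mul(a: int, b: int, n: int) -> int:
    """Shift-and-add multiplication built purely from mod_add."""
    result = 0
    a %= n
    while b:
        if b & 1:
            result = mod_add(result, a, n)
        a = mod_add(a, a, n)           # doubling is just another addition
        b >>= 1
    return result

def mod_exp(x: int, e: int, n: int) -> int:
    """Square-and-multiply exponentiation built purely from mod_mul."""
    result = 1 % n
    x %= n
    while e:
        if e & 1:
            result = mod_mul(result, x, n)
        x = mod_mul(x, x, n)
        e >>= 1
    return result

print(mod_exp(7, 4, 15))
```

The nesting shows why adders dominate: one exponentiation expands into many multiplications, and each of those into many additions.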
