10-Multiprocx6.Pdf

10-Multiprocx6.Pdf

Overview. • Mul3processor(OS((Background(and(Review)( - How(does(it(work?((Background)( - Scalability((Review)( • Mul3processor(Hardware( - Contemporary(systems((Intel,(AMD,(ARM,(Oracle/Sun)( - Experimental(and(Future(systems((Intel,(MS,(Polaris)( • OS(Design(for(Mul3processors( - Guidelines( Mul1processor.OS. - Design(approaches( • Divide(and(Conquer((Disco,(Tessela3on)( COMP9242(–(Advanced(Opera3ng(Systems( • Reduce(Sharing((K42,(Corey,(Linux,(FlexSC,(scalable(commuta3vity)( Ihor(Kuz(|([email protected]( T2/2019(Week(10( • No(Sharing((Barrelfish,(fos)( www.data61.csiro.au. 2((|( COMP9242(T2/2019(W10( Uniprocessor.OS. CPU. App1.App2. OS. Mul1processor.OS. OS.data. Applica1on.data. App4. Run.. Process.control.. FS.. App1. queue. blocks. structs. App2. App3. Memory. 3( COMP9242(T2/2019(W10( 4((|( COMP9242(T2/2019(W10( Mul1processor.OS. Mul1processor.OS. CPU. CPU. CPU. CPU. CPU. CPU. CPU. CPU. App1. App3. App4. App4. App1. App3. App4. App4. Key(design(challenges:( OS. OS. OS. OS. OS. OS. • Correctness(of((shared)(data(structures(OS. OS. • Scalability((performance(doesn’t(suffer)( OS.data. Applica1on.data. OS.data. Applica1on.data. App4. App4. Run.. Process.control.. FS.. App1. Run.. Process.control.. FS.. App1. queue. blocks. structs. App2. App3. queue. blocks. structs. App2. App3. Memory. Memory. 5((|( COMP9242(T2/2019(W10( 6((|( COMP9242(T2/2019(W10( Correctness.of.Shared.Data. Scalability. • Concurrency(control( Speedup(as(more(processors(added( - Locks( Ideal. - Semaphores( - T Transac3ons( S(N) 1 - Lockdfree(data(structures( = TN • We(know(how(to(do(this:( - In(the(applica3on( - In(the(OS( Speedup.(S). number.of.processors.(n). 7((|( COMP9242(T2/2019(W10( 8((|( COMP9242(T2/2019(W10( Scalability. Scalability.and.Serialisa1on. Processor.1. Processor.2. Processor.3. Speedup(as(more(processors(added( Parallel. Reality. Parallel( Parallel( Parallel( Program. Parallel( Parallel( Parallel( T Parallel( Parallel( Parallel( S(N) = 1 Parallel( Parallel( Parallel( TN Parallel( Parallel( Parallel( Serial( Parallel( Serial( Serial( speedup. Parallel( Parallel( Serial( Parallel( Parallel( Parallel( number.of.processors. Parallel( Parallel( Parallel( 9((|( COMP9242(T2/2019(W10( 10((|( COMP9242(T2/2019(W10( Scalability.and.Serialisa1on. Serialisa1on. Remember(Amdahl’s(law( - Serial((nondparallel)(por3on:(when(applica3on(not(running(on(all(cores( Where(does(serialisa3on(show(up?( - Applica3on((e.g.(access(shared(app(data)( - Serialisa3on(prevents(scalability( - OS((e.g.(performing(syscall(for(app)(How(much(3me(is(spent(in(OS?( T =1= (1− P)+ P 1 Sources.of.Serialisa1on. P TN = (1− P)+ Locking((explicit(serialisa3on)( N • Wai3ng(for(a(lock(!(stalls(self( T 1 • Lock(implementa3on:(( S(N) = 1 = • Atomic(opera3ons(lock(bus(!(stalls(everyone(wai3ng(for(memory(( T P • Cache(coherence(traffic(loads(bus(!(stalls(others(wai3ng(for(memory(( N (1− P)+ N Memory(access((implicit)( 1 - Rela3vely(high(latency(to(memory((!(stalls(self(( S(∞) → (1− P) Cache((implicit)( - Processor(stalled(while(cache(line(is(fetched(or(invalidated( - Affected(by(latency(of(interconnect( - Performance(depends(on(data(size((cache(lines)(and(conten3on((number(of(cores)( 11((|( COMP9242(T2/2019(W10( From(hgp://en.wikipedia.org/wiki/File:AmdahlsLaw.svg( 12((|( COMP9242(T2/2019(W10( More.CacheMrelated.Serialisa1on. False(sharing( - Unrelated(data(structs(share(the(same(cache(line( - Accessed(from(different(processors(( !(Cache(coherence(traffic(and(delay( Cache(line(bouncing( Mul1processor. - Shared(R/W(on(many(processors( - E.g:(bouncing(due(to(locks:(each(processor(spinning(on(a(lock(brings(it(into(its(own(cache( !(Cache(coherence(traffic(and(delay( Hardware. Cache(misses( - Poten3ally(direct(memory(access(!(stalls(self( - When(does(cache(miss(occur?( • Applica3on(accesses(data(for(the(first(3me,((Applica3on(runs(on(new(core(( • Cached(memory(has(been(evicted( • Cache(footprint(too(big,(another(app(ran,(OS(ran( 13((|( COMP9242(T2/2019(W10( 14( COMP9242(T2/2019(W10( Mul1MWhat?. Interes1ng.Proper1es.oF.Mul1processors. • Terminology:(( • Scale(and(Structure( - core,(die((chip),(package((module,(processor,(CPU)( - How(many(cores(and(processors(are(there( • Mul3processor,(SMP( - What(kinds(of(cores(and(processors(are(there( - >1(separate(processors,(connected(by(offdprocessor(interconnect( - How(are(they(organised((access(to(IO,(etc.)( • Mul3thread,(SMT( • Interconnect( - >1(hardware(threads(in(a(single(processing(core( - How(are(the(cores(and(processors(connected( • Mul3core,(CMP( - >1(processing(cores(in(a(single(die,(connected(by(onddie(interconnect( • Memory(Locality(and(Caches( • Mul3core(+(Mul3processor( - Where(is(the(memory( - >1(mul3core(dies(in(a(package((mul3dchip(module),(ondprocessor(interconnect(( - What(is(the(cache(architecture( - >1(mul3core(processors,(offdprocessor(interconnect( • Interprocessor(Communica3on( • Manycore( - How(do(cores(and(processors(send(messages(to(each(other( - Lots((>100)(of(cores( 15((|( COMP9242(T2/2019(W10( 16((|( COMP9242(T2/2019(W10( Contemporary.Mul1processor.Hardware. Scale.and.Structure. • Intel:(( • ARM(Cortex(A9(MPCore( - Nehalem,(Westmere:((10(core,(QPI( - Sandy(Bridge,(Ivy(Bridge:(5(core,(ring(bus,(integrated(GPU,(L3,(IO( - Haswell((Broadwell):(18+(core,(ring(bus,(transac3onal(memory,(slices((EP)( - Skylake((SP):(mesh(architecture( • AMD:( - K10((Opteron:(Barcelona,(Magny(Cours):(12(core,(Hypertransport( - Bulldozer,(Piledriver,(Steamroller((Opteron,(FX)( • 16(core,(Clustered(Mul3thread:(module(with(2(integer(cores( - Zen:(on(die(NUMA:(CPU(Complex((CCX)((4(core,(private(L3)( - Zen(2:(chiplets((2xCCX)(chiplets,(IO(die((incl(mem(controller)( • Oracle((Sun)(UltraSparc(T1,T2,T3,T4,T5((Niagara),(M5,M7( - T5:(16(cores,(8(threads/core((2(simultaneous),(crossbar,(8(sockets,(( - M8:(32(core,(8(threads,(on(chip(network,(8(sockets,(5GHz( • ARM(Cortex(A9,(A15(MPCore,(big.LITTLE,(DynamIQ( - 4(d8(cores,(big.LITTLE:(A7(+(A15,(dynamIQ:(A75(+(A55( 17((|( COMP9242(T2/2019(W10( 18((|( COMP9242(T2/2019(W10( From(hgp://www.arm.com/images/CortexdA9dMPdcore_Big.gif( ( Scale.and.Structure. Scale.and.Structure. • ARM(big.LITTLE( 19((|( COMP9242(T2/2019(W10( From(hgp://www.arm.com/images/Fig_1_CortexdA15_CCI_CortexdA7_System.jpg( 20((|( COMP9242(T2/2019(W10( From(hgps://developer.arm.com/d/media/developer/Other%20Images/dynamiqdimprovementsdoverdbigdligle.png( ( Scale.and.Structure. Memory.Locality.and.Caches. • Intel(Nehalem( • NUMA((NondUniform(Memory(Access)( From(www.dawnorhered.net/wpdcontent/uploads/2011/02/NehalemdEXdarchitectureddetailed.jpg( From(www.dawnorhered.net/wpdcontent/uploads/2011/02/NehalemdEXdarchitectureddetailed.jpg( 21((|( COMP9242(T2/2019(W10( 22((|( COMP9242(T2/2019(W10( ( ( Interconnect. Interconnect.(Latency). • AMD(Barcelona( ( 23((|( COMP9242(T2/2019(W10( From(www.sigops.org/sosp/sosp09/slides/baumanndslidesdsosp09.pdf( 24((|( COMP9242(T2/2019(W10( From(www.systems.ethz.ch/educa3on/pastdcourses/falld2010/aos/lectures/wk10dmul3core.pdf( “Victoria Falls” 128 threads Par:al'interconnect' FB DIMMFB FB DIMM DIMM FB FBDIMM DIMM FB FB DIMM DIMM FB DIMM 16 cores Interconnect.(Bandwidth). Interconnect. 65x (two sockets) MCU MCU MCU MCU • Oracle(Sparc(T2( MCU MCU MCU MCU UltraSPARC T2 L2$ L2$ L2$ L2$ L2$ L2$ L2$ L2$ processor L2$ L2$ L2$ L2$ L2$ L2$ L2$ L2$ 64 threads Node'0' Full Cross Bar eight cores Full Cross Bar 35x C0 C1 C2 C3 C4 C5 C6 C7 C0 C1 C2 C3 C4 C5 C6 C7 FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU FPU SPU SPU SPU SPU SPU SPU SPU SPU UltraSPARC® T1 processor 32 threads Node'6' Node'7' eight cores NIU 14x Sys I/F PCIe (E-NET+) Buffer Switch Core UltraSPARC® IIIi NIU Sys I/F PCIe processor (Ethernet+) Buffer Switch Core 1x 2x 10 Power <100 W x8 @2. GHz Gigabit Ethernet 3GB/s( 6GB/s( 4GB/sd3GB/s( Unidirec3onal( 2x 10 Power <95 W x8 @ 2.0 GHz No'direct'link'between'node'0'and'7,'0'will'do'“2_hops”'to'access'7' Gigabit Ethernet 2004 2005 2006 2007 2008 10' 25((|( COMP9242(T2/2019(W10( From(hgps://www.usenix.org/conference/atc15/technicaldsession/presenta3on/lepers( 26((|( COMP9242(T2/2019(W10( From(Sun/Oracle( Interconnect. Interconnect/Structure/Memory. 27((|( COMP9242(T2/2019(W10( From(hgp://www.anandtech.com/show/8423/inteldxeonde5dversiond3dupdtod18dhaswelldepdcoresd/4( 28((|( COMP9242(T2/2019(W10( From(hgp://www.anandtech.com/show/8423/inteldxeonde5dversiond3dupdtod18dhaswelldepdcoresd/4( Experimental/Future/NonMmainstream.Mul1processor. Scale.and.Structure. Hardware. • Tilera.Tile64.(newest:.Mellanox.TILEMGx),(Intel(Polaris( • Microsor(Beehive( - Ring(bus,(no(cache(coherence( DDR2 Controller 0 DDR2 Controller 1 • Tilera((now(Mellanox)(Tile64,(TiledGx( SerDes SerDes PROCESSOR CACHE - PCIe 0 Reg File L2 CACHE 100(cores,(mesh(network( L-1I L-1D MAC/ P P P I-TLB D-TLB PHY • Intel(Polaris( 2 1 0 2D DMA - UART, MDN TDN 80(cores,(mesh(network( HPI, I2C, GbE 0 UDN IDN JTAG,SPI STN SWITCH • Intel(SCC( Flexible GbE 1 I/O Flexible - 48(cores,(mesh(network,(no(cache(coherency( I/O PCIe 1 • Intel(MIC((Mul3(Integrated(Core)(( MAC/ XAUI 1 PHY MAC/ - Knight’s(Corner/Landing(d(Xeon(Phi( PHY SerDes - 60+(cores,(ring(bus/mesh( SerDes DDR2 Controller 3 DDR2 Controller 2 29((|( COMP9242(T2/2019(W10( 30((|( COMP9242(T2/2019(W10( From(www.3lera.com/products/processors/TILE64(( Cache.and.Memory.and.IPC. Interprocessor.Communica1on. • Intel(SCC( • Beehive( 31((|( COMP9242(T2/2019(W10( From(techresearch.intel.com/spaw2/uploads/files/SCC_Plasorm_Overview.pdf( 32((|( COMP9242(T2/2019(W10( From(projects.csail.mit.edu/beehive/BeehiveV5.pdf( Interconnect. Skylake.SP. • Intel(MIC((Mul3(Integrated(Core)((Knight’s(Corner/Landing(d(Xeon(Phi)(

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us