NEDO-PR-9903
x-A°-=i wu ? '7-z / □ ixomgffi'g.
NEDOBIS T93018 #f%4vl/4=- — 'I'Jff&'a fi
ISEDQ mb'Sa 3a
0-7 rx-/<-n>/<^7-T^yovOIElS9f%l
TR&1 2^3^ A 249H
illK
t>*^#itafc?iJI+WE$>i=t^
*%#$...... (3) (5) m #...... (7) Abstract...... (9)
m #?'Hb3>/W3## i 1.1 # i 1.2 g ##Wb3 7 ©&ffitib[p]...... 3
1.2.1 ...... 3
1.2.2 ^ y ...... 18 1.2.3 OpenMP 43,fctf OpenMP ©x-^fcKlfOttfcifeSi...... 26 1.2.4 #%3 >;W ly —^ 3 ...... 39 1.2.5 VLIW ^;i/3£#Hb...... 48
1.2.6 ...... 62 1.2.7 3>;W7 py—...... 72
1.2.8 3 >;w 3©##E#m©##mm...... 82 1.2.9 3°3-tr y +1 © lb fp] : Stanford Hydra Chip Multiprocessor 95
1.2.10 s^^Eibrniiis : hpca -6 ...... 134 1.3 # b#^Hb3>/W7^#m%...... 144 1.3.1 m # 144 1.3.2 145 1.3.3 m 152 1.3.4 152
2$ j£M^>E3 >hajL-T-^ 155 2.1 M E...... 155 2.2 j£M5>i(3 > b° j-—x 4“ >^"©&Hlib[p] ...... 157 2.2.. 1 j[£WtS(3>h0a.-:r^ >^'©51^...... 157 2.2.2 : Grid Forum99, U.C.B., JavaGrande Portals Group Meeting iZjotf & ti^jSbfp] — 165 2.2.3 : Supercomputing99 iZlotf & ££l/tribip]...... 182 2.2.4 Grid##43Z^^m^m^^7"A Globus ^©^tn1 G: Hit* £ tiWSbfp] ---- 200
(1) 2.3 >
3$ tstXf...... 213
: A. "The Stanford Hydra Chip Multiprocessor"###^ (1999.11.22,24) OHP...... 215
M £Wib tT % o £ ti \Z <£ D , #!l X (i\ Virtual Laboratory (lEfiB^iB) • Virtual Microscope • Visual Super grid-computing • Networked Art (x y h y — 9 */7 JH tjx ) • Distributed Simulation • Technology for Everyone (?itHflfcW) & df O 4; d &,
bl_t:<7)4; d &WS£B§£xT, * rx-yt-n>yw ^ □ ^-oSHSE^j C li^do X 174^0 m% (D4 fcOli M t £&M £ 'if & o fc o #^'Jfb3 >;W VT(i, t;V^T'V^f >, >f >^-7D '> — A’flItfK OpenMP ^I70penMP(Dy-^^#Cl:fq](7X:#^, VLIWm#4l/^;i/
u WW y ©-titgfFlfficDfe^i&fpi}! t)I^§IiTc0 IMH
(D4#(D4fq)&tE@f ^ < gllfco j£ti^icn > b° j. —-r X > P&ffllz ol^T &, l^ti^icn > bi-f >f >777'J T — '>3 >(Dlilfgb!Mti'OUT!ll§t~£ b bSH, fMSftT £> ftTUTJKll^lc
dfi,^6(DC:0 2o0^#(D#^ -
(4) iNSfiK
? • a x v ny ]
#M #- ¥«BA^ ai?S t*f8¥f4 SS [# *] (50 g*) sis t#« ##EA# m%#@p ss tt ?A (#S±*E%Sr 3 AXAAgmm m# eia % i»ii*¥ ass Uj£ ¥A aSE*S:c*S«PS*^ti«|g-erE^flfftS$7-tfmg|i AEE55W m*# wbvekep/t mm A
[± *] 's.m mm ¥«D*f ax?s #%#?##%## ss [S M] (50 eIII) mm s*69§ ^xxAjg*w^m *3 ® *# Esij s±®9mm 3>t‘^.-^->x^AE^m MAEt-ef^® ass *$ #mm@A# A#K#mA-x^A#m^# ass ii]£ ¥A *ia*#@:[;#S*p^«ES«||^p^mt»fS?-tfWtg|5 EEE^g
[j£St#fC3 VtTa — 7-f • 9 ——7] [A 1] #w m- ##EA# Ufa #### ss m m\ (5os»i) "FE XW ARA# ASIt®«A>SS «* igjt *iSE*S:i:*eEPS*ASriS^rE^mtSIB7-tfmas £EE2ES era m ssha ? A#p^m*m%#m%# ass
(5) ill P fl# mm# #^i. rfe | B ill fPM iliSSHS x*SEP3 I8S$ i ^B & me x#e*ps ssssas #B me a##is6*@ *?iK*is ea #(6 me «««***i9 #?m*m j$««* me «««fse#)9 MAS SriW-i fiwn ■ficffiNisigte I /J\B &7 EiW-i fta #P foM Sfitot"--i [W9SE. ej jtsia mz mm*-i (M)B*tfr 4b H MM (M)B*t*! M# ^ (#) B
[$n*WlVt6*] $S « WS±*Sf%Sr 3 *b m mBiiKim 's^T-k^wum i§3@ ±ee%* *# @x #*B±$ ai?s /jv$ W¥ mmmm^x.mmmm^mm-^m^\nm7-mf^ i.ammr MX -$ T*>y h y-^-b>^- Btm
[###] m-h $- (M)B*tiigma^%tss ««&■§!$ s«s$ »* BBS® JEX (Bf)B*tl«®a^%B6 Sffjjfcegp SEP ±ffi®
(6) ¥-f& l l ^SirsljBLfc^SIStE^ r^-yi-n wtO'fi'/Dyj ©SES&**iB*P5S-eii, »iftaii3>ta-r-f >y^ia f 6 7cto©*#9#R#t: LTn Sta^vyi/^T'n-fe >t"i-7 7 >7&*fSk lf:3>/H7«W7'Dy7 5 >^JStttS«COVNT, is *#%&5VX S«l$S©#ttl • E?£is%*ffl©16lt&fiofc0 mssmcfefe o-Ctt, rae^ijn WW 5WGJ t y >^WGj ©2o©«i$§ gait, 36?ij3>yH7WG-ett, ® rm%r®mmb? >;H 7g»;itJSfiilj Sff? k #):, 7d^i? ('Sf^&SlJI^-SCifcfc-oTffl© ® rg ffismmg&ftJtoUtMbj &ffofc0 £«A-l(3>ea-7^ >7WG-t-tt. ® rggf© SSliffiU © -x-f >^*yyv-y-i>a>fl-s©1ti#j Stfofeo «TC^fflESS$kto-5o aS^J-fbnWSixftffi : mmt#ay%^A©u-fe«m (*^ny±©**esg) (v7k?x7 &76 6#Am©*^©e#) k©s«. a^s cntt, ae^j ®gy\- h'^iySEChtetU, ffiMB9i:36?iJ®a V 7 h 7 $ y I'^cissuti'^o #c, 70^7At©;i/-7"w.*©@BA-c©%^u%#mA^%a' t?afe 0x 7"n75AttJ©s$d"$tc v^.;v-e©jfi?iJtt#tti (vyu^^v i > jfi^j-fbti*) %*©7=-7*#, ®J»tt#©®]iKj6@xfcSlff*j$; (®aw*ffs T-f ?#K $ ?.(;tt3.—(ftffl'f >?77'>3 >&!SSx7r*?iJ®a^a-->7tt«A^^©tt* ¥Sft4b-(t-6ttk^-5o *^6©iS$E^"Clix 3fi5U®a v 7 h 7 ^/’©tfStte-S gijMWbn >;u 7©S #minit:'3V'-ci$*»is*&%mu^. uttreui. ©vw^i"< v^^j-fb^rK ©i >?-7n ->-'77)$ffi\ ©OpenMP fflteffix ®l«3W('fl/-y3>ftl, ©Air v^;v36?U®a1 ©SStoSIffftSx @a=-7L —— >7"'7 —yi/©ttEI6icax ®n>yW7© tilbFffl®l6lnlC-ov-t * kto&„ £ 6>tr, 77 > 7 * — FTa^tcatiut Hydra knV-S* 777Af7n-fe 77ffl»^ia«t;fef £fc>3 Olukotun $tj§&ffiWH8iB U 4-E© 36 ?|J -f b ft W tc 731' T MII i ^ Sfi U o iy.±©is«6ss$x.. ®7d77Affl#@^sc«^M?ij®a?i)Sfflf#e>nsftii7^e Stcreg^-giJ U369iJ-fbt"-S rvyv^^vi >36?!HbS«j, ©77 7 b^yi/&Sti r®« toSIGSiij^ ©a.—y#nwu% < T4,*jat;7-'-7&«-irr?> r@i)Tr-7di!tS«ij, >36?iJ-fb(;W/S bfe rx7i7 7L —U >7S6Ej, ©E/to*7D 77 A^Wj t*$BCX-3Wc r^.,,.-->7tt$j ©5-7$7ny3L7 h k UT»^|g%T^$S6BE @k VTIS$L7c<, ^*©^>7-7-777 t-T-tib »Mb3 >y W 7 ©ASM S6& y^irttie & »«-c- $ » v /c to. #Mfb3>/w7©em#mR#cMT6R#^% As^stfflisiftta bfco
(7) S • ¥ • S©$m#&ISSl IT V -7©fl»©Tl;:E^SM658»'f B*f ISMSieS: k B C k k bfeo l£*ail!3yi;i-TO?Si : fia^icn > en. —7 -f >¥&Wi (7d -;i*3>ei-fO?S*) B\ -IXCS^SliSlCfifl^cSflfB^jlA/T #Tt\BR%T& 0 , $ y b? —7k3>en. —7 t ©st-g-c ± o reKait®# nr 16 k & -a jai $ nx u b „ *^6©ii«E%T-tts •E>i£iifl'E3>t;zL- f0i'077^h7!’ft (Grid) C7HT> *H6*-C'k U 728®lJ|6ll8$ & ■ i5 to C SI it L £ = £ 9t 5b Sc n > tf n, - 7 i’ >?cit^iSS;6Itii 4® Supercomputing ’99 N ©Grid Forum, ©JavaGrande Portals Group meeting x @ International Symposium on Computing with Objects in Parallel Environments 6Sj $7 B k#l%\ if7B 7 x. 7 b T& B®UC Berkeley © TMillennium 7D ¥ b 7 ©£«M3>ka-f-( 7 7l:A^T7-51/7v bffl&SI&Xfcf rGlobusjx ©NCSA k % o Ti# A B I"the Alliance 7□ V x- 7 b j U)co C©&@*, 4-JPRttEI+*tt7^7 j'"'!'Ai/7 b£@C VT*5D, f Tl: Grid Forum k If HtlB Grid S$©fcto©ffi®AsefiK$n, • 55jt*k"6 ffoTUB T ktfflBf Ufeo Site, '(>7 77b7!>fi>tLtli3;Er-(fi'fflIl!SS ltil$hfe7777S- ffll£®iil£ST7 b^-^TiS-BU $6(677 7-f y77 ig*A>e> PDA l:Wc-Ei4g*6 Web 4kffli’>77SlSCTD-ext1 f©±©77 |J 'r-'>3>liS*I14:SiItIli'(.> B#to7-lf7*Tl6t\K^#»$ftTt^B. cn?)©«!;st;tt^s a#m©m%mmi5##b:±#%;ad%&k?TL&oT^Bck# fl®J bfco £fc, &%5b%3 7Un.—7 -f >7l:i@±B77S7—73>(:3t\TI5, ®ig|SH'& k©eiSlb6E®fliJC«e.T^E^to±i£tc± DKttJSlc*©^ SDP iaS(¥IE;6e:7-n 77A)6«k LT8?D±tJTl6ltUko SDP FtSSHu *«©#m$jm»k©A^6, 8 icf©mgea^is
W±©±7»*^lS©m*#%&^*±T, 3m
Abstract
This report summarizes the results of the leading research on "Super-compiler technology" carried out in fiscal year 1999. The research was a preliminary investigation of the key fundamental technologies for next-generation high-performance computing. Concretely, we investigated (1) compiler technology for next-generation parallel computers and (2) global computing technology, then identified and made concrete the technical problems and examined the R&D organization. We set up two working groups, the "Parallel Compiler WG" and the "Global Computing WG," one for each technical field. The Parallel Compiler WG carried out three investigations: (1) summarizing technological trends and problems, (2) defining the R&D content needed to launch a project, and (3) planning the project organization. The Global Computing WG carried out two: (1) summarizing the latest trends in global computing technology and (2) investigating application areas for global computing technology. The outline is shown below.
Parallelizing Compiler Technology: The gap between the peak (theoretical) performance of parallel computers and their effective performance (the performance sustained when software actually runs) has been widening in recent years. This indicates that R&D on parallel software lags behind R&D on parallel processing hardware. In particular, the extraction of parallelism from the parts of a program other than loops is insufficient. Consequently, three technologies become keys to technological innovation in the near future: (1) extraction of parallelism at various levels of a program, such as multi-grain parallelization; (2) new execution methods that go beyond control dependence, such as speculation and data prediction; and (3) tuning technology that interacts with the user. In this research, trends in automatic parallelizing compilers, the main technology of parallel software, were investigated with respect to eight important technologies: (1) multi-grain parallelization, (2) interprocedural analysis, (3) extension of OpenMP, (4) dynamic compilation, (5) instruction-level parallelization, (6) speculative execution, (7) tuning tools, and (8) compiler performance evaluation. In addition, Professor Olukotun of Stanford University, who is involved in the R&D of the on-chip multiprocessor called Hydra, was invited for an exchange of opinions. As a result, five research topics were identified for research and development as a project: (1) multi-grain parallelization, which divides a program into grains of suitable size to maximize performance; (2) speculative execution schemes, including task-level speculation; (3) automatic data distribution without user assistance; (4) scheduling schemes for multi-grain parallelization; and (5) tuning schemes that use dynamic program information.
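As a rough illustration of why parallelism left unextracted outside loops limits effective performance, the following Python sketch applies Amdahl's law to a program in which only part of the run time is parallelized. The function name and the sample fractions are illustrative assumptions, not measurements from this investigation.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the run time
    is spread over `processors`; the remainder stays sequential."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical parallelized fractions, all evaluated on 1000 processors.
    for fraction in (0.90, 0.99, 0.999):
        bound = amdahl_speedup(fraction, 1000)
        print(f"{fraction:.3f} parallelized -> at most {bound:.1f}x speedup")
```

Under these assumptions, a program that is 99% parallelized is bounded at about 91x on 1000 processors, so the remaining sequential 1% dominates the gap between peak and effective performance.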
Moreover, technology for performance evaluation must be developed in order to evaluate parallelizing compilers, because conventional benchmark programs are designed to evaluate hardware performance. In addition, the project organization was examined, with a view to adopting a centralized management scheme directed by a project leader. Under this scheme, all the researchers from industry, universities, and national laboratories are brought together to carry out the research.
Global Computing Technology: Global computing is an advanced technology that has recently been researched and developed rapidly in the United States. Attention is focused on the high-performance computing capability that results from fusing wide-area networks with computer systems. In this research, technological trends in global computing in the United States were investigated intensively in order to clarify the state of high-performance computing and of the global computing infrastructure (the Grid). Four major international conferences in the field of global computing are reported: (1) Supercomputing '99, (2) Grid Forum, (3) JavaGrande Portals Group meeting, and (4) International Symposium on Computing with Objects in Parallel Environments. Three major projects are also reported: (1) the "Millennium project" of UC Berkeley, (2) "Globus," which plays the role of a toolkit for global computing, and (3) "the Alliance project," a joint project among industry, universities, and national laboratories that is managed by NCSA. We conclude that high-performance computing has undergone a paradigm shift: an organization called the "Grid Forum" has already been formed to spread the Grid, and it is circulating information, working on standardization, and so on. Moreover, clusters built from commodity components are connected by very high bandwidth wide-area networks. Terminals ranging from high-performance graphics terminals to PDAs are served through infrastructure such as the Web. Using these infrastructures, many applications, such as very large-scale numerical calculations and commercial services, are being tried experimentally. It has turned out that R&D in our country lags far behind these developments. As a global computing application, we take up the SDP (Semidefinite Programming) problem, which solves optimization problems such as structural design by non-empirical methods rather than by rules of thumb.
This problem is attracting attention because of its importance for the effective use of resources. However, solving SDP problems requires computing performance exceeding that of a single supercomputer. SDP problems are sped up by computing many solutions with different parameters and selecting the best one. SDP problems are therefore well suited to global computing, because the effect of the long communication delay, which is a disadvantage of global computing, can be kept small.
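The speed-up method described above, in which independent runs with different parameters are compared and the best result is kept, can be sketched in Python as follows. This is only a minimal sketch: `solve_with_parameter` is a hypothetical stand-in for one solver run (a toy quadratic objective, not an SDP solver), and local threads stand in for remote computing sites.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def solve_with_parameter(param: float) -> Tuple[float, float]:
    """Hypothetical stand-in for one independent solver run.

    Returns (objective value, parameter). A real run would be a long
    computation on a remote machine; here it is a toy quadratic."""
    objective = (param - 3.0) ** 2 + 1.0
    return objective, param


def best_of(params: List[float], workers: int = 4) -> Tuple[float, float]:
    """Run every parameter setting independently and keep the smallest objective.

    The runs exchange no data with each other; only the small result tuples
    travel back, which is why long communication delays matter little."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(solve_with_parameter, params))
    return min(results)


if __name__ == "__main__":
    objective, param = best_of([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(f"best objective {objective} at parameter {param}")
```

Because each run needs only its parameter as input and returns only a small result, communication happens once at the start and once at the end of each run, regardless of how long the run itself takes.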
We conclude that the project called "Advanced Parallelizing Compiler" should be initiated in fiscal year 2000. Moreover, the current R&D on global computing should be continued in cooperation with universities and national laboratories. We should then aim at new joint R&D among industry, universities, and national laboratories by clarifying the possibilities for industrial use of global computing.
(10) S1S *?!Hta>/U7SHf is
1.1 mw
J!7fc<7>;W A 7 — v >7 • 3>h°a — X&, 1EE0 7T7 7 7°n-b y b ;i/7n-i! V^#L/c7;i/A 7°D t Itl^o CCDcfcd&7;i/X-7°D-t? 7tb^^;\X;^7^-—vyx - —7°o-bvi7^®i#AQba^cyxyA cob-7## of##) ^l#t:7 7V
f ^t)7,yDt'ytbmcoi#^b#c,b-7%#b^m#mbcom^#7c^iM:^ D, nx b A7xr —xyxfrGMfc^frTj&fnJACTi^ bW:#W: < $ G c, #%co7;i/yxDtvtb^^;\X;^7^—y>x - oybxi —7T(±, ### #&r8i±^i±za brs ux-;i/#& C D, yxyA&^^C^f C0^##C#L^b^ df^ES&£o C07cA, 30J:9^ai§if3iiT^7^7t-x>x • n>bxi.— 7 b#Bf —tE(±#^%CMT 4bB67tT73 D , X—77 — C t-D T&MA ^yXA A^fg#7^##$7t6 0CM LX, ZCDXo &/W A7 ;t-7>X • 7 > bxL —7 047^&M^X, ABX&BiCS^fRI^ § — o ® IjJtEP C b, DOE(Department of Energy) £ pBL
- 1 - V>7 - —#t:3—K®y7> t*v;i/f7n ’b'^it- 7 —77 7 7--v^£M£ii&t§n-i;:tiu ;W;^7 —v>7 • 3 >
^^7-A#6D 73 77 Ag#W:#j^^73 77 < >7^#g^t 0^73 77 y(: j: D^^^fi^#^t:^^7 4b.73 77 < >7(D0M^W:73 77 Aco^m%|q|±, 7377 < >7#^MT##m##&&3 7>7;i/f-v7 - y;i/f-7Dt'yit, ^i^^^#LX:y;i/y73t ^7-777
a#X.6^l^o '7^7373tvT'^6;W;^7^ —777 • 3 7 ba — 7 37b3- —7 kr&UTs fiJiE'B> 37 b/17#—777#f%±#4#$#-C^
©73 77 ASrSititoM^iMbU mTc k&W#^a T 6 g##^Jfb 3 7/W 7©^%^^#"C&6o *7-;^ —37/W7 - 777 37©#K#2%, ^^ij37;W7 - 7'-jp777;i/-7 c©j:7&^m@##?U
• 5to0;v—7jfeWb£®x7 7—7 7>&'Mbfa±&;S^t~£ fc#><£>7;vy7'v-Y 7
• 7©#?iJ#&g|^mf&AW737-7yfg7-7#<#0#^%L • 3.-4f^3>;W D 3>;W API(T7V7-7 3 7 - 73 77 A - -T 77-7%:^7), e 7377A^^^©###m U 3y/w;i/&ff9#^ • ^^7^;i/#^lj#&3 7/W 7(31 b e 73 77 APp©^^J%©^MC j: b 7:r7m$Jm'T^&l'©&^#;5/:#)
• J1 —7Ccb^) 73 77 ->7'#^$#6/:&b(D7 —;i/, • n >;W 7©'i4|b§rWE^j^|f{ffit"^^teCD^^Imi • 7 7 771/7-y 7 • 7jVf73-fe ^-7©ijfn] • M9\> #%bl:s 73 7x7 MbbipWtI&EtIv'r UT^ < ^<§^EEE^3UtES7^o -2- i.2 1.2.1 yaeyyss lft0VAf7Dt y 7 > 7 t1 A IE gM7!Mb3 WH7T-gy A—74 7 > —> a > > /<7V©5)1'^'J-fb 4fr o T lx 6 [l]-[5]0 C tl fb © 3 >71 d y X!' g N GCD(Greatest Common Devisor)?S[l]x Banerjee © inexact and exact test[2]-[3], OMEGA test[6], > > dt '7 -by >t d y 71$f/i\ -Y >7 —7"n-> —-> yflU/riigi&k'mSy&T-'-ytt# 8W[l]-?\ 71/ — 7^-ia, 71 —70^-x X b ') 7/7/f T3>7\ 71/ — 7’d’ >7 — f-x >>, Tl/d'75-d'/ SIJhtf-7'Si);®!' 7 1/ - > 3 >©#%7@M L&ghg& b&Wl-:/ (71-7 *y V K -7^>7>7) 44)37V-7,jy. limit* © J;a it$&k'6S-3 7!-7\ $,^;xtt7P-7*©S5mx.* :Cflfc©|ByiJ#!BAsa$ix^ 6©AS$> 6Jf6 (S?'J©fSK#«) !±3>yH ae©E$»7-yto#AsH*6;Acto. f ltb©7l — 714 # — © 7 n -b y y- ± X M ft W tc {& a $ lx 6 o 7D77 ASffaefS©l*) 99%©@fl'A si«)$>fc7'-ytt#«ffSt>'l7 7 h a Xfy '7 >7‘C g b afiyiHbTgfc blT4> i%©gB##±m© Za%#mb@#%7i/- 7 (ii?771—7) i^l'tt71/-7tt>i©Sy b LtSot LJ ofelMCIl 1000 6© 7 n-b yy-4ffll'T t>i67 100 IgClifiloUil/iMf'blt&l'o f&fci*., 7n-byy-*Ai g'&l'Jf-n-Cgx 1%© jfi?'Hb^ nrlb»g|S4>As, 7 n-b yy-&©igink#£j67ijffi8t$Sg©ini±6PEWf"6*§&7 r 77 k&oT L£a„ btzifi-oX, 4>#©77P77Dy y 7777A©@#zACI±, %dR©71-7#7'J tttriax. cns;T*Ggfc)ft-ttA&A>oAcfflsaafe?'Jtt[7]-[ii], ®a*4S3eyijtt[24]5& 7Blx6!ftgA si56[17].[22]<, 4 ') 7 4 til ') 7d7137*f7-Ad' >l$»sj±|B|T-gS% 1/71X6 PROMIS 3 > /Hdtii, zztfscommzmmtfiicmmtz htg (7\dry**7i -77777 7 ) k > > # '7 y 7 • -r— 7##)##R#i 6 d" '7 7 d" To#© Parafrase2 3 > 7tg 7 k, VLIW 641'Cxk >^71*8636 SUffiaSrffa * V 7771377;© EVE 3 >Ad7Si#^fct5Ckt: -3- J|BK6>l£?!Jti6i3itv;i/^ L (1) S»ffl6«36?!l®a ;:ta, oscar vii/sf w >36?y-(bo wuresitsgfj*i667 7 7 3e?y@ afsconta^ii, *i667 7 7 36?y®afii£ti:1 jut©40 (a) 8fii?6n6X?tsS (b) *166"=<'7 o 777181©o > h o—T7 o—, 7—7 tt $11 It (c) o>po —T—7##@04: V7 D 7 77ia©###(tW#Af6 JSffi (d) &PCfflfflo-Ffc7'4't-5 y6^^^i“7©4sE OlTT'li, ;*l6ffl^f'?7SSj4f5» a. 
77 0 7 7 7£fi£ Fortran 70 77 Alt, HffBgt: 70-b y 44 (PE) 6>07T-A*'\y P\ T-7gmT-/i^\y F kit® OT*g*t69l;:7; $ < &S J; 7 t: OSCAR o >y(4 5 #Tj$TS v7 D 77 7ltA SSfttft A77o y 7 (BPA), #0)8070 y 7 (RB), 44711/ —7->7o y 7 (SB) © 3 SI®© 70 y 7 TS> So BPA It, «SttS®©**7o y 7 (BB) k LT$#SftSo 4:4: U Cffl BPA ©4fiUc* ‘ lATIi, 7o77A©36?iJtt, > fStl'T-f igST-A'-'x y Pltt T, ®*7o y 7 ©74|y-^, a 0 66©*1 W7 a 4’zi' 6£4$t"S &to© v7 n 77 7 iiir»k-*iefflS*tSo **7o y 7-9- BPA ©44ffllt, Z 066©*H\-77 o777M©367y##mi:m'6 *tSo «X«H 1.2.1-1 (a) © BB2 It, %'n9 77 T£>S RBI ©#ma@44k##t77 7T;fcS RB3, RB4 fflliu®a$fl-ffl t o & 2 -D©-K#©)ffiV^g|144SSA-e* ‘0 , BB2 It 0 1.2.1-1 (b) ICSItS BB2A k BB2B K»S1T-$S„ C. ©tHflll; iol. RBI k BB2A ©711/-71, BB2B, RB3, RB4 ©711/-7ffl36?y)aa* s Htb k & S o £4:, 5EffHeflfl©'li?& BB CtlOTIt BPA ^©ffl-6r* s)iffl$n, 7‘4'T5.y777 -7a.-V >7^--y1^xy P6«'>$ti-So MS It, * OB 1.2.1-2 (a) ©7D-777C SltS BB4 k BBStfltk Ak:6:6S4:&l,vjM5& BB-e$>S& 6>lt,BB4 k BB5 It BB2 mc&Sd'Mk 0T7n$4tSA($44R71:#fr$*i-, 0 1.2.1-2 (b) ic^snsSole® ®7k Lt«t>tlSo BBS #0 1.2.1-2 (a) IC<S BB4, BBS lcT-7## - 4 - (a) An example of a basic block (b) A macro task graph after having disjoint data dependence basic block decomposition inside 0 1.2.1-1 y j: 0 BPA BBS It BB2, BB4, BB5 y 2 BB Z (± BPA a RB it. Do ;i/—i/\Z tZll — 7\ t ft t> % =l 7 )l)l — X 0 1.2.1-3 \Zmt 9 U $ 7,9 1t7" u-c&, pcpc thy'x 7^ RB CD5>|!ltCjoVTs 0 1.2.1-4 (a) iZjF^tlZt d —A — ■7 v fco;i/ —7lt, th7"v^7 D ^ £ fc&fcn — F t° 0 1.2.1-4 (b) lz&tf2> RBl.l t RBI.2 (D t O tzt-X h Ztlfzfr RB Doaii ;b-y \t PC \ZMb btlfz'?# up Zf7 t LXWifc>tl%o tt£t>t>, ±X 0 PE, & U < & PC Doall )l — X\t±XO 7°n -fe '^T't Doall;b-7(±k _v/7D^^/7^Li:PC(:mD^%6^6o cc^k(±, mmtipcm^^i'v D —4f-'> 3 XDfctblZMRgtlZo £(D&?i]j£&ttiti}'X! — LX RB tfel ffilz# P *? 
- 5 - BB1 Data dependence edge Q Control flow edge BB2 BB3 ___Q-__ ___Q___ BB 4 BBS BB 6 BB7 ___Q___ O BBS RB9 RB10 BB11 BB12 O ___Q___ (a) A macro task graph with several small basic blocks (BBs) BB 1 BPA BPA BB2 BBS O A pseudo 0 A pseudo statement statement BBS BB7 BB4 BBS Q O BB11 BB12 BBS —G— —Q— I \ i •, RB9 RB10 (b) A macro task graph after basic block fusion m 1.2.1-2 'ytm.'nte £ BPA - 6 - RBI .4.1 Do dccoss 0 1.2.1-3 RB - 7 - -M- JU _a a (a) A RB having overlapped loops inside its loop body RB1.1 RB1.2 (b) Sub-macrotasks of a RB generated by copying code m 1.2.1-4 7 y ac ate RB D^ 7$ 7 <£>£$; -8- b. Y^D7D“^77 (MFG) ©^5% y 7D7D “^77 (MFG) U\ — 71/7D —, ©iSOT? &itlv£&©-£'&£o m 1.2.1-5 (± MFG ©$I£tRLT^3o G © MFG £;fcl*£ y —FJi BPA n RB, SB CD d £ ^ito ^X7^li3>FD-jl/7D-, H ^#"7^ Df;% ^fMjCD^—f 7 — F F^] C7D /J\ H&, C©MFGt:^^T(±^Rl(±#B&$flTl^^, 3:'^^© AT©;iv^^vi;(d: RB MFG (±#th ^^7F#[o|^77 (DAG) 'T&&o Data Dependency Control flow Conditional branch BPA Block of Pfeuedo Assignment Statements RB Repetition Block 0 1.2.1-5 D yu — tf’7 7 -9- c. vi' n ^IS© j6^iJtt©#titi MFG (j:vi7Di'Xi'M©3>h D —11/7 D —kv —i'teWSrSTAb 7 7 D £ 7 7 HI] ©3fe?!H414*3I bTi'&V'o -jKtoC> n > b d— ;MS#7' v 7, t> b < Ii7n T'v Att #7-9714 , * vvi7Di'xi'i iac7:-'-i'tts»s^itnHE*fflj6Mtt4SUTv^ 0 v A* bHISCtt, v7 o 77 7H9lcl4f-7##A^##±%o Ifetfot, v7n7D —7' V 7 *6 v7 Di'Xi7|g©a69iJtt6#titiT3 > h D— JU&frtr — 7tt#© *ffiK6j6Mtoa^$tr*V'Tli. n>hn-;u 770777 i (MTi) MTi SfctoOiffTifeS. M!;U> H 1.2.1-5 t^lAT, MTI t MT2 c o > b O —lb®# b, MT3 iCir-7tt# Ltl^ MT6 ©*¥|lff BJfglSifttt, W.7©43(:*&. (MTS A%77-& ORMT2 ASMT4 CiblR-f 3) ;;f, "MTS A#7±^," kb7©!4, MT6 £ MT3 Or-i'ftSfcl IT> OSCAR villi'" Vi1 >3 >/W 7»sESLTV^,y.T©*tt6SS'f-E>^V'?Si*"eafe5o a) * b MTi *5 MTj l:?-il# b-0'£& ?>t4, MTi I4 MTj ASH 7 7 -5 4 7i I! 
£bT-§&, MTj C3>h n-iMK#bTV'-5 MTi liSIffaTIgk&S ±13© MT6 ©«¥Slff MTi ©nfi:*ttffl7i®stty.T© (MTi As3 > h n-JH£#T% MTj Ab MTi (C^Kf 6) AND (MTi A^-7##f MTk (05k5 |N|) ASE7 OR MTk AsHff£tl&± 7 -5 ) «x.l4, MT6 ©e¥^fT5J66*(7ffl±©®(i (MTI A5 MTS OR MT2 As MT4 lcfi-lli$) AND (MTS As$£7f"3 OR MTI As MT2 AND ©tiifflad^ftlib 3 > b n-;i/tt#tc± oTi*S±-5*¥Slfi :i5J6g$(zf:72fe 0 , AND ffl&fflgHibsHifttt, ir-7tt#&?i.£'f qT«g*tt±*>5o 2 #S©jfett 14, MT6 14 MT3 As»7 b tz'&cn'n£ *U Ab MTS 7 t A^^t 3 7 CC^fttC&bT, MTS ©Slfftt, MTI As MTS ICibtiT 3 C k 6, MT2 ffl*fi(4 MTI As MT2 7 fc & IT b bfcAsoT, 7©*ff l4Eil$n. JiVf © 4 3 ZtltzBttlZo - 10- UT EP0& 3> (MTG) * »ff©@#-fbtt, -6 MTG o — ho &:&';/ C tl^Ts — T — u-c^&o > DAG X' / 7\Z£-oX$itobtltzX-"Jz/t) b 3:^ SgfTBJt^ftBu i f (MT3 D ' ± CD^I — — ^6D^rq)(j:T[q]^^E^L, i ;V#;#:n.y ^ vS/(i ^x 1.2.1-6 -ry MFG OSCAR 3 .- OR V^D ®S<7)n m - v MT2 11 MTG i. - HV/HvEiotgimctftift?.. 2.1-6 % ft OR MT4 > V (^k/v^(7)^En(j:#l$ m&lZ&ZZ b D IZfrti&tZ) — A/tito p $ ** > £(}-£& h £ D -e — ;i/7 -5 £ LTl^o < to D-J: N c MTi eg 2 V CO A — ^ r- o d. MTG ±©TX D XX X© PC, PE 'ji'X'ri’i-'J'/y vxDXxxii, HfiWc PC, PE tend'd- 5 >X£tl-?>o CfflX''i'd- 5 •^^'ryi- 1) XXtifflilSBXXXCytf LTjSfflSh-Bfcto, XXXjl-V XX’tf-yS —'xx PA5# X XtoUffBSHSOsf L"Cffi*ftoC'J'$ < &-5o OSCAR n WW xCint-SX'-f X- 5 x X XXXi-'.) XXtt, -flgto& SMP 20% Estimated branch probability 20 10 20 40 70 Longest path length from +30 +50 +50 +50 +3q the exit to each macrotask 1 i 1 l max ( 50 60)i max (70 90 10(* = 60 = 100 0.80*60 + 0.20*100 = 68 0 1.2.1-7 vXDXXX^7X©ttinx — P^N©##yixS©liS (2) IBM RS6000 SP 8 XD-b X » SMP ±T?©t4|g C. ct'li, 8 XD-fe X+!£}§* bfc SMP IBM RS6000 SP 604e High Node C istt-s, v> -12 - a. 
OSCAR Fortran 3 >7W ~7 HI 1.2.1-8 Us OSCAR Fortran n >7W 7 ^ LT & tC >7^ 7 7 D > b^> K (FE), ^ F71/7^ (MP), J:> F (BE) OSCAR Fortran □WW7l:(t OSCAR, VPP, MPI-2, UltraSparc, PowerPC, OpenMP(Dj;o^#^%^-yvF, #?']####, 7^77Vm0/i'7^:i:>F#& 6o OpenMP 7iyf :n> FU\ OpenMP F ^7 f- ^ Fortran V —X3 — F £ g fj#J £ f 3 tz & £1128 41 G OpenMP Fortran Source Code' Middle Path Multi Grain Parallelization -Coarse Grain Parallelization -Loop Parallelization -Near Fine Parallelization Dynamic Scheduler Generation Static Schedulmg ______ (intermediate Language) V V' OSCAR 1 lPP::i::i:i OpenMP ^ STAMPS 1 Ultra S^arc i i Power PC 1 Back End; Back End; Back End; .Back End; Back End 1 JBackJEnd^ J /^Native \ t^Native \ < Machine } 1 Machine 1 \Code \Code HI 1.2.1-8 OSCAR Fortran 3 >/U ~7 - 13- 1.2.1-9 ARC2D INTEGR © MTG - 14 - C. FFfflrny^A 7P 7"7 AH, Perfect -x>^v —7© ARC2D T&So ARC2D l*S* psb, ^Y s« 3>ys Y 7i; J;S®iifb7’n0!lk UT ARC2D ©-9-y;p-^> INTEGR OV7p7777"77&[1 1.2.1-9 (CStf o ARC2D tt 40 ©»y;t,-f 4500ff©7-D7'7 ATfcSo ARC2D ©#f?R ®©a *> 90%6+h7>-^> INTEGR #AtoTto 0 - +f7>-Y-> INTEGR ©$T% +t7MV-^> FILERX- FILERY- STEPFX- STEPFY »sflt© v 7 D 7 7 7 J; 0 tiitlR 6. kto6©»7'yi/-Y->)c*LT- 05-fm yh-T" y>n-0 >7\ y VY T^Y^-ir Y -fe'-Va >&k"£i6f9 L- 6f?ofci6§$- H 1.2.1-9 MTG #Ifeh)t£toSo d. SMP +t-yi±-e©tt*g k CTH- ±IB©7"n7'7A&HVt;£ IBM RS6000 SP 604e High Node ±T©ffl*66 #%©%* Fortran 7D^7ii>^ OpenMP 7 11/^x174 IH'/fc Fortran T IBxb £ to £ *1*6 6 367!lfb 71 n 7" 7 A # OSCAR 3 >yi Y 7 C (3) $kto OSCAR Fortran vyi/7-7 W >367iJIb3 >71Y 7 ©®56*ie87 X 7 36 9Uffia^a&*-C>tcji4^fco OSCAR-771/^7'W >36?iJfb3 >ylY 7©tt(ig£- 8PE 6 jgSSLfc IBM RS6000 SP 604e High Node SMP _h®36^'JYb3 >y!Y 7T$>S IBM XL Fortran Version 5.1 k tblit LXilSSE- OSCAR 3 >71 Y 7 IC - 15 - gOSCAR ■XL o CD DC CL T3 8 Q. 0) Processors 0 1.2.1-10 RS6000±T?@ ARC2D ©j$Slq|±* [♦#**] [1] auttt, isoi. [2] U.Banerjee, Loop Transformations for Restructuring Compilers -- The Foundations, Kluwer Academic Pub., 1993. 
[3] U. Banerjee, Loop Parallelization, Kluwer Academic Pub., 1994.
[4] W. Blume, R. Eigenmann, J. Hoeflinger, P. Petersen, L. Rauchwerger and P. Tu, "Automatic Detection of Parallelism," IEEE Parallel & Distributed Technology, Vol. 2, No. 3, pp. 37-47, Fall 1994.
[5] D. J. Lilja, "Exploiting the Parallelism Available in Loops," IEEE Computer, Vol. 27, No. 2, pp. 13-26, Feb. 1994.
[6] W. Pugh, "The Omega Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis," Proc. Supercomputing '91, 1991.
[7] 350, "Fortran D- I, Vol.J73-D-I, No. 12, pp951-960, Dec. 1990.
[8] H. Kasahara, H. Honda, M. Iwata, M. Hirota, "A Macro-dataflow Compilation Scheme for Hierarchical Multiprocessor Systems," Proc. Int. Conf. on Parallel Processing, Aug. 1990.
[9] SJE, f^*, Fortran ###, Vol.J75-Dl, No.Spp. 511-525, Aug. 1992.
[10] H. Honda, K. Aida, M. Okamoto, A. Yoshida, W. Ogata and H. Kasahara, "Fortran Macro-Dataflow Compiler," Proc. of Fourth Workshop on Compilers for Parallel Computers, pp. 265-286, 1993.
[11] H. Kasahara, H. Honda, S. Narita, "A Multi-Grain Parallelizing Compilation Scheme for OSCAR," Proc. 4th Workshop on Languages and Compilers for Parallel Computing, 1991.
[12] P. Tu and D. Padua, "Automatic Array Privatization," 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993.
[13] Zhiyuan Li, "Array Privatization for Parallel Execution of Loops," Proc. of the 1992 ACM Int'l Conf. on Supercomputing, pp. 313-322, 1992.
[14] M. Gupta and P. Banerjee, "Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers," IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 2, pp. 179-193, 1992.
[15] J. M. Anderson and M. S. Lam, "Global Optimizations for Parallelism and Locality on Scalable Parallel Machines," Proc. of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pp. 112-125, 1993.
[16] B. KUHN, R. MENON, T. MATTSON, R.
EIGENMANN, “OpenMP Parallel Programming ”, IEEE ACM Supercomputing ’98 Tutorial Notes, Nov. 1998. U7]IB* be, mm, mm, ma mzm FORTRAN 3 Vol.40, No.12, pp. 4296-4308, Dec. 1999. [18] H. Kasahara, M. Okamoto, A. Yoshida, W. Ogata, K. Kimura,G. Matsui, H. Matsuzaki, K.Aida, H.Honda, ’’OSCAR Multi-grain Architecture and Its Evaluation ”, Proc. International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, IEEE Press, 1998. [19] ^# m-au ## mm, /b# ## ##^L@#^ARC#^^ /HPC#^^, Mar. 2000. [20] mm mm, ## ^m, d# am, sb to, mm: ARCl36-8#^^, pp.43-48, Jan. 2000. [21] SB to, mm m-, be, mu: ^ mu f^uv - 7°{h^'- ^ n - * ^ y u - a > voi. 40. No. 5, pp. 2054-2063, May 1999. [22] H. Kasahara and A. Yoshida, “A Data-Localization Compilation Scheme Using Partial Static Task Assignment for Fortran Coarse Grain Parallel Processing ”, Journal of Parallel Computing, Special Issue on Languages and Compilers for parallel Computers, May 1998.. [23] # B WjM, 4$^ ]## : “A Standard Task Graph Set for Fair Evaluation of Multiprocessor Scheduling Algorithms ”, Proc. ICS99 Workshop, pp. 71-77, Jun. 1999. [24] 7t# MB m, #E, E:(DfcMn&mmm”, Vol.40, No.5,pp.l924-1933,May 1999. - 17- 1.2.2 -1* -f > f-ro -> y Ltzo «£%Esnfcii**$(D±ifx mm&ffift&iMmtZo mmmti-eyu-MKc-Duxit, ^©f«is$»sibs$ ft"CU'^i©"Ct©Pl#& (1) T'SSdh'f"-5o (2) X14 Whole Program Paths kDf y°uy?A(DWi ( 1 ) #8g?!l7=-f 7D-## 137(1 y-*- 7 7 D-1$#f@E6ti:7,n77 A&gM7iJ-(b'f-E./ztoO^-H-f > h Tfc D , C©###mi=HA!:-m©@#lol±Hd:?oI/k?&^. k k 6A\ $ECtt7"n7‘ 7Artcjklcz^,$l$a-%l±f ©tEm^##&a#»*©l: bti'^o 3 >;W yfilE UV'7-d ^y AHfrSlSET-B^S^ife-BXzto, ?M%y-7##ibMU -5 kES-ti-d-^. 6fe-fs k©E$©)iTg)g4b^j£7!Hbti:SJ|Sg;* fuck«5. 77 > 7*- K*¥-eS«dnZc SUIF 3 WW y&IH'ASMElb J; D , *(^^$119 7 D-6#gUfc3 >yt+';hBeWSf©#S-fbkkffl«ffi;$S$l?Sz5ufc ffi3 7 h Ikiai -|g© 467iMb©{ig;eA 5S6$-t- § 3 k k * S/T$n tl'4[5]o ^ k T-. * ft 7 * k l? 
a 3 * ft ft R £ # ffi U fc »Iff * S k U T £ ft ft SB 7U ft- 7 7 o —)WSrS5;[5]AstgSStvtt'-6o k©ftfl;tt, #*©#Mib^'77+'^— Mb©?:©© SB?1!?1— 7 7 D—Sff^lSilt'5 b t C - 18 - XT\ ipo-®a±;i/ — XX^ &mii: 430 ;i/-7#4 u, f m7 SEM-X& 150 tfeofco ^-X SUIF 150 )};-yp(DMLm'Ci\ZtkW. LXX^fztK ^fbFtgE?0X'-7 7 D— ffttif £ £ D 64 lb—Xtt^iMbtr^fco D W$i&&mx$>3 h¥U$rT:§3o UT cm&AM:U\ Z=a;UaE?'J#^#0@Efi|#B8 (Z^f'f- X “ 7 7 D — #P$f k LX L ^D L tlX M % Inspector/Executor[6] [7] b H? (2) Whole Program Paths X7 7'7 Am#fiJ{b"7#i#lb&X##)X^ < ±X7°U ^"7 c a(± &6o cm (±, ifji$&Axti$g£ESl‘£xn 7y^;v^v oREm^&^yco ac5^. cm;b-X^^i^g#W(j:XDX7Am##$:m%±X# > bx&&o f cx. xoX7 Am#8y^##&^AC%@x^^^)}:XDX7 Am#j#mrn^i^ #t #J X 6 #f L L» ^ ^ k LX Whole Program Paths(WPP)^|ES ^ tlT V'l ^ [3]c C0f fel£$tMX^t£fr^fz)\' — XtE£$i^(Df5i^%£fz <' C ^St^falO^X 7°D 7T-r v >x&w#^x^^d7m^M%x^*a^^ t)mx&^,o WPP (±27m7^-XX^#^^iXL'6o #-7:c-XkL XoX7At:j:zDX^e ^ft6#[o];^x&bi/ —xf^c^x&^o #x7oi—XT(d:, cmbL—x^6#M e (Xz^X^#^^L^^) ZD7>^7bX#b^f^B m-#lXDX7Am^ek#eLXbi/-xmE#^et>a, DAGX&60 cmB^C id: X n 7" 7 A m ito &BOT7 D —$:7 7/^7 bfronxft < b Ltti7r a. #[q];iX b V—X^ltE # —X5^ 7 XX (j:, XD X7 A^#^ff L/: ^ ^m;^X ^#6#X j: 7 t:XD X7 A ^#b#7 — b^#&71A^C a$:fj7o cm^^(^^^^6ftXL'^^^[8](:^-7l\XL' a#, ^e^x#:i--7t:m%ijT#6ct 7&:/ix b L-xcm^jm^^^Aox^miE ^^foXW^o WPP XU:/1XXD77^ V >Xm^#^#f^AXV^o U^^^XD77^mJ:7^(±#fo|X77mi^ag^f^o C7f6C^X. /^xhiz-xm+b^X^^x^^, bi/-x##^LX 7—iv+h-Y b ^m^X* s7 — U —iz £-DX'Mfr£tifz^X(Dmtemm£tiZ> &olzt£% ai^7X U 7 b^&^o b. AXDT^A/iX^JSS '& - 19 - T;vrfU XA[9][10]£IWT:& tU X U XX;W;:eJE£;b[];LT vaQ SEQUITUR (d:X h U >^£Ei7;i/3 'JXAT'feot, A*i:^lT3 >y 3rX h 7 60 ZCOTJlzf V XAld;£ £ ^ V —>XCD^^ C kC ^t>ftT ££&©y&a0 SEQUITUR y;vrf U XAi:olX^> UMm%lMz.T3o < o C(7)y;i/3 V XA& U ny XlGC^X. 
a &a 6^,/: a SEQUITUR S -»■ acba c yb 5ii*n^il/r hf aQ ac 6o SEQUITUR \£Ztl%'> >4t; 1/ A T'E^i&X-T S -> AbA A -> ac c 6Dy;vrf V XA®MA LT Xy y — )l/'V — S -> BCBA A -> ab B -> Ac C -» Ad izmtz\zis>-tf)i difi'Mtinztirzt'rz t sequiturdTsa®#uu, 6o S -> DD A ->• ab B -> Ac C -> Ad D -> BC cc#Ay. B S -» DD A -> ab D -» AcAd a^ao C0y;i/7'JXA(i^t)^Tiffl^Tfe^ SPECint95 ^>7 7-^7D^7 A®^ &y#±m#-f&a osg.go c^fazGB t:±aM/-x<& 300MB ix -^fP^tiitao^^T^ao xyy —, co^yyy —yxo xy^Tf cD#m(±^ iQOMBT&ao ccoxyy- 0X^x^6t)^a cko c wpp teitogb U&ix -20- WPP T-tt DAG a$6fflV'T±IB^7 V-&H 1.2.2-1 © J; oeit$t"3o ;l©I2T-DAG ©i*|g| 5y — Ptti^T—©#*&$£'> >*d/ (±|gfi]l?tt S, As D) t&Zo (±IB«T*ii as b N Cs d) Ttfo-So Bi.2.2-1 DAG am DAG ##®Aggy — pay ?-7 — 0 production (£j$) SSL.Tl'5. -Etl 6 li production ©65$ A1 6> t > #11P9U?& 6 « J — P A 6 y — P B ^\©m y yiiSffl A ©6i3 EMM B ltl'5CtSSt, C© DAG SICint WPP T-|i7-Dy7A©lU!)4f#SS4CSlt5CtAst- t3„ y — PmSlff'SStty j'— h ->>^;v*6^-©y- p^\© dag ±©yixEk ur m$tiZo H 1.2.2-1 e^Lfc»^lillff*B6*t-o c. A7D y 7 WPP 0 DAG *5)dvy KHOT)^X6H,ott-5e kitP-eS&o diy pAiXkv?©ti: a*#e##©m©@!A-z&6. ch$-t-ae.nTv^tixyD7r^v -ettrn A©Hff SSSdf y c t tf-n^tefr-Dtco wpp tt^m%mwjp)i-y’mWiMz-Tt)^bg: fctfT-S-5o *fes -So ww y ctbn'nmmvfrfrz? - m: ai-r-setefr-So $<©#^i±s $>4@»©n-P8 (S»s**^tiiEiia±&*fce> fe WPP T-ttc©diy pyixSE-yit^fctoiCs yixr-Si-S-y-yVl^ t Bf-S8t^:6#A UTVi-Bo $y Md-y^^ttfc-5 3 % P o®v>#®fl/'iX'es bs KHestffSiisvtX/tPs #@enx p ©iEt';t^ v —;> a >&StivlX©k"6 6ipe& £ o DAG 6 p =7 n—Ztz Z t CioT s P tt± - 21 - fcBfeodt 7 h y yy^7 6 M-7it, kfttcy yyt7 SttlttiitT v d. FMS6$ SPECint95 'Of 7-570^7^!; Microsoft tt© ') P —'>3 dOfy —i"<-7 7 uy=7 h. SQL7.0. 
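The grammars listed in the SEQUITUR example above can be checked mechanically. The following Python sketch expands the final grammar (S -> DD, A -> ab, D -> AcAd) and the intermediate grammar that still contains rules B and C, and confirms that both derive the same string. The helper name `expand` and the dictionary encoding of the rules are assumptions made for illustration; the sketch only expands a grammar and does not implement SEQUITUR's grammar inference.

```python
from typing import Dict


def expand(symbol: str, rules: Dict[str, str]) -> str:
    """Recursively replace non-terminals (keys of `rules`) by their
    right-hand sides; any other character is a terminal and is kept."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


# Final grammar from the example, after rules used only once are inlined.
FINAL_GRAMMAR = {"S": "DD", "A": "ab", "D": "AcAd"}

# Intermediate grammar from the example, before B and C are inlined into D.
INTERMEDIATE_GRAMMAR = {"S": "DD", "A": "ab", "B": "Ac", "C": "Ad", "D": "BC"}

if __name__ == "__main__":
    print(expand("S", FINAL_GRAMMAR))
    print(expand("S", INTERMEDIATE_GRAMMAR))
```

Both grammars expand to the same 12-symbol string, while the final grammar stores only 8 right-hand-side symbols; that reuse of repeated subsequences is what makes the representation compact for long traces.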
y-Fyn-fe^yyyynyyA WinWords I: WPP fflfftik LT**$.-5$g$As#e>nTV?,„ SQL TfflfFffitt TPC-C ^>f7-i>7D 77 A&ffiBeUflStffSyfctjtoT'&So y^ 7 7 n 7 y 1- yp 6 S 3 to C Microsoft, tt© Vulcan '7 — yi/7rSKlf^'J' E> PP Path profiler [8] SflJfH Lfco Z tVCSfc h V — 7 A1 6 WPP 6$h!t U, PPCompress Lteo E$g*ti: 7.3-392.8%y$>b, yoyy A©M#7D-##8m#y% fty byyy^7©%M7ld:, 4f*Jgy^7©ff«P»k ityixCt&otlfi^hSff) ^sasife. *yyiy*m#&%©y^g, t ©ft$tfy;7&ft7 fyyyt 7 k LT * 7 > h Lfco < k, $ < ©#Aft 7 hyyyiy(d:€©** 100 #6f#m$yyo37 6, irfc & ft 7 i- y yyt 7 # mt. * 3 k taftu:!:A5t>^oteo ;;fftofe'Of7-?7Df7AciufttS *7 hyyy (3) yyvy-7 v 7 pynyy Aiqitj-^'f >7ig#r yyf-yy l 7 K7n77A&®ybft7k kct#57n-ft#, n >y ft7 ESreHtf-i' >7®f/fSS#f6$SiiTVt5 [4]o vypy 7 L 7 H7Dk7Affl4if 7 o —###%#f)f&(?7@nl:ld:7 V 7 Httt©&y»sk7o 5liST-SIfi1;*ft 3*$;#itf"3i&E* s&vfcto, iEftynyy a klalLj;7lcE7kk7tisyS-E>o LfrL, 7D-##%##&R7#Al:lft, L "CV--E.7 P 7 Ktrftlt^Uff 7 D—LWC, ^ftkifcff LTftff LT t'-5ffe7 P 7 fCi 3ftft >7fflfiSH«©*fb'fe#]*ft^ ft 7 k©re, &71/ 7 Mc#*ftef' ■ 22 - a. MOfSit 3*7o 7? At;ft(t3tf 41 u- y H'H5©4?$tf £ c h Tx ?;i/77 V y H7n 7? AS® bj&A 5 J; 5 t;r;i/oU 7A&te?l Itl'S. X >7jSff4l£t;ti: Wilson t Lam t; J; 3 4?£[11]& ©$ $Stfl LTft b x 70 7? AtpO'SESf^T 0 7 —>3 >-fc y h kfftJftSEoll -23- -f^HHgl/, M&ffoTVxSo &43|8SCI4:7 7>7;<-- F*© SUIF 776A&S JELXm^X^io ^>f7-?7D?7iB 18 *t, -£®-9-'l' Xfct 53 fi~4478 ffT-feSo 7d ^7A fft, lu, cholesky, fib, queens, knapsack ti. iZXifc -S o (Wffi#Stt7-D»-7A® SUIF tfHfl3E$CSVT, load 43 store iWt'7i'-k7S n-5 WEtt®$.-5 n^-->a >-tr 7 F ©$614*6 3 b 4: ic 4; o TttaiJ Lti'5. SUIF 6 UBS! $614: load, store C J; 6 ffiS#S43 J; lFE?iJ#Bffl@'6'tC ® » ffcfiKSh,5®T, 6-f >^##®#g&EI^±tt4:gm%t,® ^■ffliBStt, — 77"D7‘5AtC43t''C load, store Jt D 7 7 4zX$4li6BJIbttAsife5 4:£4lfcD7" —->a>-fe7 F©$I4; 1 D**T'6 4 fl T$>b, #^®% 8 SlIlcH VTttiEBtCfce 1 -3®D-y-->3 >-fe7 h &0ffl-tZ>Zt c®m*&fFmf rn7-7Att*4e?ijstt$n-?.7 u -7 FgGe&%#fF OE*AS) k®S*®tt« C. 
3iW >7ft?Wffl(Bffl60: bfflitXfflBW >7##6AI4:%@, >7'7,D'>*i7 F 4: 6 »36?ij-(b 7nyx? F-t-fflfflStiTV^, 4>® 4: b 314:$ fciESlT'n ACISIff C L4bfflV64i % C1$ c ©## @B 6 IB t' 7 7 ;k 6 7 U 7 F7‘n7,5At::43tt3E'a (race)®®to^4tfll6i>a-;iz^E^J#Wto$6->-xL-;P6ft:Bg1--5:F/ET:$.5, d. V 7 F 7 x7I?’V-tU-xfflfEM 7,11/67 V v F 7"n 75 A©H4$iiK7-l4:, 7-—+f C#7 V >7 FBBtc ±©6 5 &4BSfF fB##&634p©##6##63 C j-jiV7 F 7 x7©±jH4|o]±C43f'7MS7'&3o C®6®k:|4:*@®miA:4W >7lSffi 645 5 £ e. 7D75 A$tft/x®fiUH 7D75 Atfi^SSt'-S 7 rq 1 ;V#ff 7 F 7 — 7, 6—7^—77767©65 t Uq ,6>->®*$f'#ffl4:, $$©## 6$ ±4b7|B} byjjS6R-3--3©#fNc6 3 (batching transformations) C. b X, 7 d 7 5 A © ^4564)^6 fe if 3 d 4; i5nJt67 S> 3 <> c©@AC*BI#(:, 4#fi®«t'4fq' >7JSflrl4;eg7$>3<,

[1] w) 'W 7 -T^y D v®i^E^s f ^ 10 ^SiiSE^|g^S> NEDO-PR-9809.
[2] Sungdo Moon and Mary W. Hall, Evaluation of Predicated Array Data-Flow Analysis for Automatic Parallelization, Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 84-95, 1999.
[3] James R. Larus, Whole Program Paths, Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation (PLDI), pp. 259-269, 1999.
[4] Radu Rugina and Martin Rinard, Pointer Analysis for Multithreaded Programs, Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation (PLDI), pp. 77-90, 1999.
[5] Sungdo Moon, Mary W. Hall, and Brian R. Murphy, Predicated array data-flow analysis for run-time parallelization, in Proceedings of the 1998 ACM International Conference on Supercomputing, pp. 204-211, Melbourne, Australia, July 1998.
[6] Joel H. Saltz, Ravi Mirchandaney, and Kay Crowley, Run-time parallelization and scheduling of loops, IEEE Transactions on Computers, 40(5):603-612, May 1991.
[7] Lawrence Rauchwerger and David Padua, The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, in Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, pp. 218-232, June 1995.
[8] T. Ball and J. R. Larus, Efficient Path Profiling, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, Paris, France, pp. 46-57, 1996.
[9] C. G. Nevill-Manning and I. H. Witten, Compression and explanation using hierarchical grammars, The Computer Journal, vol. 40, pp. 103-116, 1997.
[10] C. G. Nevill-Manning and I. H. Witten, Linear-time, incremental hierarchy inference for compression, in Proceedings of the Data Compression Conference (DCC '97), Snowbird, UT: IEEE Computer Society, pp. 3-11, 1997.
[11] R. Wilson and M. Lam, Efficient context-sensitive pointer analysis for C programs, in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, CA, June 1995.
[12] M. Frigo, C. Leiserson, and K. Randall, The implementation of the Cilk-5 multithreaded language, in Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.

1.2.3 OpenMP 43 OpenMP (Dr-j’ftflazligVfz&ZM (1) lit»C OpenMP CoUt. ##. 7D X 5 5 > ?=£ 7=>. ff $6(±#©mM. tt<£. *Atm©lia©t:rA/i:©J:b!R&£'CWL'TI!5 W?Zo *C. OpenMP & A ^ 0 fl-gtSMMIf ©«!;'«)♦ A <««£ If 3 fc »fflll£5l C mLxmwtzttt>iZ' fo?7i> «. o (2) OpenMP i;(4 a. «S OpenMP ti:. —g-cvo k. rftfrx^e V v/t a -t"d -tz x tt-oifeM it'd X7 i. >? ©3— F±H%$@MCf-5 itAC. V —X Xn X'x AC3 WH 7^x©ftSjt$ Afl-E> C k* 5Utt Uilrfcft-So :ct% Rg@»©IA. f ©3>/W 7^©@^A©<±##e%? S & o T1' A <2 k 1?$> 6 o Mili, SGI Power Fortran/C. SUN Impact. KAI/KAP ft klMA. tt#frsSfCoTl'-5o ChT-IA. xfr/AXDt xtlAXAAIg?©###*^ < . ?ct. SWf A^-eWIiC Ex-5 API sasiii ki'7ifi:&-7fctfx.?)n5„ OpenMP k#x g,fta. ctua Stt©^-X©Mlg* 5 Fortran. C/C++tS^:ti)Ati»J. C fflU i k LX tA. j£ ?yitn«$±k utEo©aa ^#T*$>ofcC k*$>3#. ##gk UX^6#6CIA. ttTfflCttfliJ, if. ## ftrii+*xDX3AttibetDtj^wttfr 5*!^kAs±ife>n-E> 0 tt it * s At-&-&£».
1/0 CMLt©#%b%g$#AftC'C k. $fc. c©B©xnX3 AT-li. x"nX7 A3 — p A© 5%©a^A^tt©Slfi:RF(a© 95%$c5to& k Bbntixt. jfiFUfb©ECtt. f © 5%© k c dffl^CEAffttfS < . ifi?ij-fb*5SS"C$iofcc kAsASMEbt-S-B,, - 26 - ofcc £:&a6h UTSx e>ft3[3]0 b. OpenMP CD#E OpenMP &§r Li^B !§"£'&&uQ >/W (directive/pragma )s 7^7^ ^ (C 3: D ^ Bln £ Fortran ^(3:, !$OMPT^^^e(±, OpenMP ft£o Cf #pragma omp <5 fx(3: OpenMP fiTpfx'Trfe £ 0 £fz, OpenMP (3:, yo^7A& 7°D Vyb^^ij ±#a0m^^^##f^(3:, f(D7D < c V^ & D3m7"D/f7 A&df m^-riEL<#f#f6yD ^A^ Ml:, c^U:j:b, b &T(f 6 c c. ^ff drrrOb b T'—^rv' *7 ^ -V OpenMP CD^EfrdrTOl/ld^ fork-join M'V #pragma omp parallel call foo( ); call foo( ) call foo( ); #pragma omp parallel IH 1.2.3-1 OpenMP (DMfr^T*^ -27- cm<^ao ^fyoyyA^e^, f ##, #pragma iZMMt % ts * © jlT CD y D y # £ »J \Z%ff f 3 o C 0 ^J T (d: H E OpenMP 0T-yyy^-H:OV\yM^^o Ml.2.3-2 Cme&^ftljo yyvy —ys> y jL—y ♦ /x ♦ * m- ycf i/^7^ y 3 >;W y V os 0y i/ v F;F—y> m 1.2.3-2 OpenMP 0 1.2.3-2 — if©T 7'J y —'> 3 >\t, OpenMP 0 tN yyy^ry, &&wa3>/w^&^L"t OpenMP 0^e^yyyyu$:^yo #^e# yy yy v §#w\ c®yyyA 0 os ^mmoy v^y y v F;i/-y > c ^ OpenMP #mne yy yy a. fwtyoi/aoiw #^jyDyy<>ytf;i/kuyw:x OpenMP ixL^t:^^<0^^^&^[i]o ccy (± y- 0 4^ ^ 6 \ MPI(Messege Passing Interface) h B+Sill § £y V y F##& #y#A^(Z)!t#^fTOo 1.2.3-1 Q(d:#B -28- 1.2.3-1 fWtfOFa©## mg MPI 7 D v F OpenMP zF — 7 \L 0 t" 4 O A O x-y-7 b- ’jT-'f O O O o o x — 7 36 ^0 if tF— F o o mm%^36^yib o gdMD^;i/©#$ o W<77 — F ^©3# o ^y^mm^gamoTu^ 3;oi:% "C#6o O) itmtruy^&m OpenMP 7#($ (Dmm%Mw tz>o a. Parallel Region Parallel Region (3:, ##&©%DvFl:3:c-C36^yi:^e^3T,^a5^^f^^^l: &t>tl%o Parallel #^Cl: 3; o "Cft-5 o Id) D parallel region V V F § team tl%0 ID (±0 0, 7 7 7 7 D y F© ID ft 0T&&O ID e^(DXDvF^giJ(Z)a5^$:T ^33, iD © omp_get_thread_num( )T'Sf#"3" 'b Z Z h %7: ^ %> o 36 ^>J C SE fi ll" 'b 7 1/ y F © !$( (i, ^ If # 7 f 7' i? 
0 ###[ omp_set_num_threads(n) £ GEoT, ^1 © H ^ ^ ^ ^ 8% OMP_NUM_THREADS 0 &3b\ parallel region&ai £ H5 £ 0 0 i.2.3-3 i:yo^7 Aen&^f [2]o 0 1.2.3-3 It, 1000 to -29- #pragma omp parallel { int c,b,e,I,ss; c=1000/omp_get_num_threads( ); (1) b=c*omp_get_thread_num( ); (2) e=b+c; (3) ss=0; (4) for (I=b; I m 1.2.3-3 ( 1 ) f©#, ss for (?) -e, e#i/v ^mscML^Av-cm^m^^af^o #7^# (?) i^i^cmm© #l/'7 atomic C b. Work Sharing ## Work sharing ###, parallel region f^]©^^® team C 43T, # ^ © X 1/ V F t)s fttUTZ 7°D ?'■? A oft^h^JliAEit' ^ fc <& IZ fill ^ % o For ##, section ##, single ## For ###, ;i/-7#B©^^l/-i>3>©#^l/V F/\©##!l7]&&m/E T % i) (DT'#) £ o V>£) $> % x — 9 ^^Jlf W-'fe'iJ O B^F ^i$ifc>tl % o Section ### section ##^#^^4%X:e#D'7^^#l/V #^JC^eT^6©T^^o For # #yb^-^^^#!l1-6 C ^iZck D section ###^^ m^Jl:^e^i±^)Ck&B8 gkLTl^o Single ###, -o©#l/vF© ® for ## for###, ;k —7©^f l/ —^3>gHLT#W:^fzfaX=A&l:;B^&fl,&o y$y ? z tM$k%ixv\zMmt^_o______#pragma omp for [clause...] for (var=lb ; var logical-op ub; incr-expr ) body 0 1.2.3-4 for## -30- $zlWL var (d:, — o Logical-op T'tiU < <=, > >=0 4 ^T'^o ^ Tz , incr-expr <£tWL var £: }E$f d~ ^ #3, C 0 for ;i/“7°0^^©;l^/jB bs break £ ftTU&U C h ^T'$> 3 <, Mt: cyauae-e^, mT&cfi&Bmfao X ^ i/ n. — U > 9 0fitAiifd:, schedule(kind [,chunk_size]) T'ffOo s chunk_size (±, Kind (:(±, static, dynamic, guided, runtime 0@Si Z> 0 Static (d: chunk_size 0 d" ^ D — S/ 3 > 6\G > P □ tf > X! $ ^ G; X b \v MC B!j D G" (j" £ 73 ^ X:' n schedule(static,n) schedule(static) schedule(dynamic,n) schedule(guided,n) 0 i.2.3-5 7,>r¥?--') >vm 0 1.2.3-5 0##(d:, -f *? ]y '— i/ a > £ m LX £ o M#C0l§ n'JdU chunk_size T ^ cF ~^l£f d" %>o Schedule(static,n)'X: (d\ n ft(D chunk H IZ Hi HSU bttttZo Dynamic Xlid:, flltt73 &^7t It t' & dfc £M0H:b o £ X D y E D 0 chunk £: 5§: (d"EX £ ^M-ofe^^T'lr^tb^o Guided X! (d:, 51 D 0 d* ^ D — C^flX:, chunk iy'd'X^d;bT'O/J\^<%^l:e <0 (D section #dC section #dC(±, section tltc? 
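The parallel-region example of Figure 1.2.3-3 is truncated in this copy (the loop header breaks off after `for (I=b; I`). The following is a hedged reconstruction of what such a fragment typically looks like: each thread derives its own chunk bounds from `omp_get_num_threads()` and `omp_get_thread_num()`, sums its chunk privately, and then adds the partial sum to a shared total under `atomic`. The function name `chunked_sum`, the remainder handling for the last thread, and the serial fallback when OpenMP is disabled are assumptions added here, not part of the source.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum the integers 0 .. n-1, giving each thread one contiguous chunk. */
long chunked_sum(int n)
{
    long s = 0;                 /* shared accumulator */
    assert(n >= 0);

    #pragma omp parallel
    {
#ifdef _OPENMP
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
#else
        int nthreads = 1;       /* serial fallback when compiled without OpenMP */
        int tid = 0;
#endif
        int c = n / nthreads;   /* chunk size, as in step (1) of the figure */
        int b = c * tid;        /* chunk start, step (2) */
        /* Chunk end, step (3); the last thread also takes any remainder. */
        int e = (tid == nthreads - 1) ? n : b + c;
        long ss = 0;            /* private partial sum, step (4) */

        for (int i = b; i < e; i++)
            ss += i;

        /* Combine partial sums without a data race. */
        #pragma omp atomic
        s += ss;
    }
    return s;
}
```

With `n = 1000` the result is 499500 regardless of the number of threads. Compiling with an OpenMP flag (for example `-fopenmp` on GCC) enables the parallel region; without it the pragmas are ignored and the same code runs on one thread.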
U y # )/ y KXfM^Jfc^fr't 3 o 0 1.2.3-6 X!(d:, sectiol k sectio2 0 ~fU y ^^£50 cF tl%> o - 31 - #pragma omp sections { #pragma omp section { section 1 } #pragma omp section { section2 } } M 1.2.3-6 section (E> single #^C single 1 ^ 1/ y P mto 111 1.2.3-7 £ S £ £ ^ 1~ o #pragma omp single { statements } M 1.2.3-7 single c. mm, mMfflfflvmtt * i/ y oz>o ® Barrier #7^^C Parallel region l46D##CD% 1/ V &4on Work sharing '0//ci§n\ parallel region £ /±5 £ (± #pragma omp barrier ® 1.2.3-8 Barrier ® Atomic #^C ^ t V CDM#f & atomic o fztf)(D^3C.'V;&>Z> o #pragma omp atomic statement HI 1.2.3-9 Atomic # - 32 - © Critical #tC V "J 60 #*$0 1.2.3-10 (C^-f „ Critical section &itu& 0 1.2.3-10 Critical #tC d. OpenMP ©3©E U OpenMP ©tt#3(t 0 17^71/It weak consistency X $> & o U fz A5 o T, parallel region ©#%#, volatile $gt©E», ^uTimm, flush mmtc©#fri:te u©-me&e e. ^fflffe It § fcflttt Kl^td n Orphan 5s -f U ^ rr 4 T, master #tC, ordered # &k’®#l*, ttB©8l%t>$> D, LTIt© c-Ctt^BSf-So (4) 1418 OpenMP *fl 6©3 WW 7 It, *H© V 7 h •> 1T ^ > 9X & £ KAI, PCI &%©, SGI, SUN, Compaq, IBM & k*/\-F ^ > 9 % IB38 £ fr o X t' & „ CCX It, ®ASH © RWC -e^*E© Omni OpenMP 3 >7W 7 ©MS3 „ RWC Omni OpenMP □tll-ftli, C k Fortran K&tttl- F L, Solaris5.6, Linux2.2.5 % k UnixOS mtMTXmat 5 o :®3WU7t, NPB1 CG, BT, SP(ClassA)&3 >7W 7V L, SUN S1000(8 CPU)tl!jff k ^ 5, 7"n b -y+ 1E 8 © mfrc, t=}»»SAs 3.8-5.4 ggai5 H k #*!?,£ tlfco Chtt3>/H5rli;'f t T6Eotc»Mk VCIt, tfc«65li»*gstdkSx.?>o *fe, SolarisOS isSftt5X V y PSftot/D t7i>S IBif Ufc®-&klt8 t UT &, Solaris XUtP ©®ir kl$IEPlS©ttlgAMI e. 
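The work-sharing and synchronization directives listed above (`for` with a `schedule` clause, `sections`, `critical`) can be combined as in the following sketch. It is an illustration written for this section under stated assumptions, not code from the report: the function names `count_multiples` and `min_max`, the chunk size 16, and the choice of `dynamic` scheduling are all arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Count the multiples of k in 1..n with a work-shared loop. */
int count_multiples(int n, int k)
{
    int total = 0;              /* shared result */
    assert(k > 0);

    #pragma omp parallel
    {
        int local = 0;          /* per-thread counter */

        /* Iterations are handed out in chunks of 16 as threads become idle. */
        #pragma omp for schedule(dynamic, 16)
        for (int i = 1; i <= n; i++) {
            if (i % k == 0)
                local++;
        }

        /* Only one thread at a time may update the shared total. */
        #pragma omp critical
        total += local;
    }
    return total;
}

/* Compute minimum and maximum as two independent sections. */
void min_max(const int *a, size_t n, int *min_out, int *max_out)
{
    assert(a != NULL && n > 0 && min_out != NULL && max_out != NULL);

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int lo = a[0];
            for (size_t i = 1; i < n; i++)
                if (a[i] < lo) lo = a[i];
            *min_out = lo;
        }
        #pragma omp section
        {
            int hi = a[0];
            for (size_t i = 1; i < n; i++)
                if (a[i] > hi) hi = a[i];
            *max_out = hi;
        }
    }
}
```

The `for` directive splits loop iterations among the threads of the team, while `sections` assigns whole blocks of different code to different threads. The two sections write to distinct variables, so they need no further synchronization.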
n^> C kzbsfl-6^-3 tc OpenMP It5fclciz|y<;fc t d&IS* & OS fflt l/ 7 K Cit® UTffo T © £ fc®, Rt,My^;i/fa 3i4T-t-5^ kit, 3—H© k#x.6o (5) OpenMP amn:a'%u6yt v<#m$ It -5 fc ® © f± H © K 31 t -31 ^ T Ml »j f -5 [4] „ OpenMP ltt±#7 - 33 - t*&£o vuvn«\ F^^tm&ibbtz < Mt> £ £ o ##:% U\ for 0 schedule #7p@ii'^ll/^ 7°#^&^}#!l Ltzftfte, ^Wl^tifzy'—^^^Mt^y v y Yi^Moyu -b 7tM:#!l Otttfbtlfc 0 1" yLTZ'tt, frWC**: V y — OpenMP (DttffilZ'Ol'iZfflffitZo D > &IZM& btlfc$)(DZ&ftl'o (6) OpenMP (D&mttWi a. Processor group Processor group U\ (DZ$)% o HI 1.2.3-11 temt (IMTTId: Fortran l^)o !$OMP PROCESSOR p(n) HI 1.2.3-11 Processor Group HI 1.2.3-11 T\ p }£7°n-b y+f7";i/ — 7r b. Index distribution Index distribution IdL IS^U0 4 ^ 7 7 7 §fs/R 1“ £ fe £ 0 HI 1.2.3-12 777 o HI 1.2.3-12 -CU\ mf, P Index distribution IX ^WTO - 34 - 1.2.4 gjfitjn yn-i v—i/ a (l) litfel: bSi-5 C k A^ sj|gt> L < !2Bi6T, SH$tcro ^7 A&SIfr UT t©-Cf WW L-7a > ft»!2, S -39- tz DAISY DAISY (Dynamically Architected Instruction Set from Yorktown) (iN 3 t v it (C b A LT PowerPC ihcday Y- V t°d ^y y —jpy y Y-v±cD^### A#%3 Y D-^/3 b ^o:y^#(:icTs >^f>yyAY7-% il/3 — b& A'CDSttoTlEB&yD-t: yitx ;xl/-'>3 > CD E >i< $ tl ^ y ,Dy ,Y'A&&'a ^$m<^<3_D-i/3>^ef6CA^B#ALX: VLIW yotvit^i^^ fr^^:#v7 c^yo^o:^ bcg#f ^ S&8*to&yY 7s>r T/bWI^^tiru^Siy:y$>^y:E[i]^-ou-riiSLtco C0# YcT^Bf^^^Y > b(i : 1. at)^o^^^^cD#ijm^#mD^ vLiw n- 2. t-u7D/f7AmaL/=3>bD — ;i/7 d -^7 7©^^^M©x b LyA 3. yvitY%^mD^^^^mET^^AcDyot Jt Ltc VLIW3- p/\CD^#& Linux i A0H^f i: btf Linus Toravards tby>A — A LTtDt)oT^£ C ATr&ftl^n^ifcH Transmeta *±(i, E^fY^OSSfaS^'O 2000 ^ 1 D 19 B /:o ^It©vY £ nyD-feyit Crusoe Intel % CD 4: AY iV Pentium III A (i(^B)^CD##b&^(^tCD 1 (D'MWMXlX'MMLfz^ AY ib^tsroKt 7°n -fe v it*T? L^Tv^ccD##m;^y3t ^it^# &IE L < fMt*£ tz&>(D^$mlmWi A LT Ss ^K#^ltl#(iY;A(c%ito & © A®t>ii£o (2) a. ##A## vLiw(i#8#cD#mfiA <>Dv^mMcDmm, —A-y ^ Y"^CD%#A;Y Y- V 3— b&^ffT^&A A ^,o LT7D Y >/=Yi? 
y :Ll/-f Jp##a-7iY ^/?b3—g^tg^^y^V^itA&ajYjf^yD^^As 7D7 8a#^3-bs by b^y ALT#mf 6 3-b, ^^$ -40- iSliSIGBe36?iJ-fb©Xitoettffs L4V; b. 3 >/H l/ — '> 3 > YAzi i; XU 31 ’ ZL v — h Sn^SVT —^r-r » 5^^ 31 5 jl V— b f 6 VLIW Sv-f ^7 > b T-l-r'??-* tlZTZo OripMf ?<***/«.' C 4- *44 rl.rt.rt wot rti.rt.rtyjxjw 63 . «uew3t . > tnyntf ^ rtwrea «ad rt, rM.< rt G> VLBWlf 4-*dd rt.rt.r3 -rt$. rt.rtj^rs^® Li *^W rtl.rt.3 ** rir|3, vtiim ::::;:::i4“ adtd: rl.ara.irt :;-"*or :: 3rt:3 >*i* :3Ne:: M, . G> pi?if' ,:;::..:; rl.r«.«3 til .;■: detir t-flS , rSj .: : :a; w*fia *» «»****#-**~ CD CD *4~ prtjo.rta.rt WLIWH 4- •** rl,r»,r3 A ** ■ - :M.:: i: :::: : xAr *«»,*#'*p£\jp*> a Vt-jcwa b orrww»* bOrtrtOl b orwHMW ^ Figure 1; Example of conversion from PowerPC code to VLIW tree instructions. 1.2.4-1 PowerPC (XEUU D ) - 41 - =>rV "J 5 a + < ILP ffltttbt) —* VLIW 3 >/W ILP fctoc*$&3t" — H'4ff-5o DAISY 0 g#U±3 >/W 4/©^ —v LV#R§##&)# fc ILP ^ C i;Cfc-5o DAISY li)$^&7C0/W f- M3— P?0mmmc#$ LT7M ; =r 4 71Z%& U VILW ©*K©X D -y h ClRSCjlgiD LTU < o fctzL*. 7') l =r 4 T’^UIh H© VLIW y j.-iVnJf66a^r(iT-S5E b ¥00^^y a.-;v6l$^-5= c©#^-, ft ^0lS$ii-aetot: (-x-xy-^7 L^^ + *e,^ais©) v^-Avyx^icesajL, **0l|i^e*tCttIE LV' V yx < o Cfttctot, % #nmt&Zo ttz. 7L'-j'6s$i-3T3--y->ycx-ri>^-;vsn T IS ot tiPISliS tftl'o %-e^ 7"ny7A©HffA sEI$CilJilt"-S$T-ttT — c. l "ji/'Jtj. M * —y U >^©fc0©SIE VLIW tligiWifr^D- H©fl-lK/X iTSiifeilfi^X'rya-'I ><>'&» #- VLIW ©L vX h /WiJD£*V a > vuw virtual Address Space VUW Real Address Space tctmtUMpnofArdUtectam Phytiea f»y« le" b—*"*m< ttnuuia* + bkcooo r*wsi#t4p#AreWtactar* of rhyiiml M#* 9 * iu»i tMUJOWR + 0*8066 VLIUJUUB > 0*8600 Top or VUW Rmri MmOQr Vtit*„JWtiHHl»6e6006* — tfow SvrArch Mnmry 6*1606 «Q» P#6» » # 6*1690 |JNW«*W» «0» Me* I 6*1000 «6iy* M»* A # 6*1000 LM*eAre6 fW# Ptff» i *Li**rcttit»ctwr* Iim Arofcibieture Ohyeloel Wewry Myeloid. 
Hewwcy iigute 2: VLIW Address Space Layout H i.2.4-2 VLIWT H VXSB0EB CfciftliU b) - 42 - ©□- IT t)#W&5l§$5c£-f ^©A-fe y Y L, 9 91$* y h £ft 7c VXXX&SStoT&l' (3 5 y h LTV3) at'XU--> 3 Lfe k £ izfflebx W^As*4t'-5o l%#lcx D—K&Xhrj:b 6 ± !:##$-& ^,t*C x #V0gj%7:X hrtfflfflfcx-f ctie-tioT, ®i'?w7 D-fe y +7 3 >yXf>'>?« >J r y 9 I/O 4 ©®«S D - h* 4, HS£ ft, % „ «XI-X X'tir—+rX xr vfb$tiT vx^v vxxx tx—xtf^v—r -r >9999 a* 6ti^ BUSTfeb, X V+7kXM^a#©AA47Ts 3>XXX MabSSxtotC-tr-yUfcb u d. 9 k 7 K U X V y L > X"## VLIW (?-f^7>l7- + T>ft) tt 3 oo-feX X-a > tcfl-SI $ n7cEIEX V S |g.r h* Lxo*e.®*^fi -43- f. VLIW 3- h ;^ > taa-s c yi-rssi', 9->f"j h VLIW »s*a^x> h IJ k UTV-X Silt ia &A>o £«£■«: rftEJixv h u^j F©B*ft*5fr*fcix-2>o c©#*td: vliw 7T+)--f hvv^vr (ITLB) IFSfflvTISBItcfi&do L%x >x®^7 4?-e V, X> V IJ ntl'iSLl^CB, frt> b l: VLIW 3— H^fjSUV — ^>© r K V XSSglTJ) < 5>h,-Bo g. xr-=Vx^^ve©«^1-ilHf%^©^$ VLIW 3- a kx VMM liia»fflIlt«:of;^-Xr-^ x7^-v*^kx ^©^^&Efft"-6ilm)ffl vyx j'tu; W "E V ©ttffi&fSsS L&ttft «& 5>&l/> = C©fe©C SmArchktmrtC^it VUWCo*t VLIW1 c#i cr0*r3 r0 ■““ load r5'*0|r3) b vLj«a 0x0 a*i er0«*r?,0 nm 0x4 be erO.eq,Li A be erO.eq 0x8 load r5*6(rS> copy r5*r5'A\ l>2t b L2 b LI Figure 3: Finding the base mhiteciw instruction responsible for as exception m 1.2.4-3 (XEUU D) & ?-o<7X ^!J&7 ^ -;v b^T--7>£fflV& V737£"CkU VLIW i^(D7>-7B b U4W > b #\ VMM kL IWJMBh&ofc ji> b V TtW > - 44 - t%o b v ;tw > b izmMLtz^, ftfot b vx^st* ^)o VLIW 3— K4:T(Z)7 —bT(±. ^ —73—pf#'r#ftA/#ll#/;%byk#JGbfa#'%\ — o t 7 ^ -5 o h. RS/6000 Vy>4Vl/3>/W 1/ —S/s L. RS/6000 Jt© VLIW i/Ul/-^J3? SPECint95, W < 0^60 AIX 3-^ U x >f N b^T3%(D#$yuy7A (7^ >7^-- ^#^^>^7-7(7)- 3) &^#7o//7A&meL!:###?#M&e&o&o m 1.2.4-1 PowerPC &6 VLIW s\ Pm###:PowerPC ha Average Size of per VUW Translated Page compress 1% m gee 24K BO 2.4 ------""I’?...... "...iOK mMWm TO perl 2.4 i^ IK ■ vortex MEAN... zi" Tabic 1: Pathlength reductions and code explosion moving from PowerPC to VLIW. #1.2.4-11*. 
800S5ffY'> I/T-SPECintSd'O^T —ColATiSfiKSilfc/l;*#© UTH3, cn6.©S*ttSPECint95#B9X*S:ttS:fl3Ui:^>^T-^ 6# fiVTff6h.fc*©T-$)t). IS$k UT, 5000f*l*±fflPowerPC:t^l' —a >, %1\ U*tj 2OOOfi|VLIWfti0 A^Htr $ AT V 5 » eft It MS** y Z/j./i%#8 im©TR#2.5 knoiii-Slti'J. (/t^ftfflSkit. RS/eoooUff h v-^t©*^v-->3 >©»&, VLIWUff h V-Xt©VLIWi6^SiT8llofc6ffllClFLVi) vlXfiffl#lt. rnk/^A©, *®*^ y v^<;i/5i?ijg©#g;to&i!Sk !,& ¥ c i: * ST- $ 3 „ DAISYffltiEltfrft bffitA$8 |3X p-C-ilSK 1:$?,= %*©*#?!*, 1 o® PowerPC 0$&3 a®C¥94315 RS/6OOO00*^*^o $ < ®a#^tl6#a-(b miotzi. DfiKW4#w vliw 3 >vw ?-e it* 100,00000*^*^* 1, DAISY©|#ff tt(bfSTIt20XW.I*l C kklotl'-So t »^l:. gcc3 >/W ?lt 1 c© v>>^^S4)ilt5feS)l:, ¥965,000 RS/eOGOA^SUffT^o @4©%#lt, # tt^^eiSeglLfcE^T-D h j?Y7-|:f g^V'a -?-3--->yb, E8t* sfiK» v&ne^. tffllilictot, P6**gt:8!lMT$3 kWl^LTV'-Eio *t5S"-D - 45 - PSJgjbnii 4fgT*&£o MftnJffi? t-( )10 AT0^-y&^3<Lmo#l% VLIW |g 1.2.4-4kL VLIW^S/^ 4 N 2 o ts ALU ^ V — '> a >2oAUt U ^ ^ L — '>3 > 5> ^ 1 o/z U\ L U d y v <7^4 7"^vy yusturtd: 2Iu^cdilp 4i 24 6D7W :l> FV'>>tit fgrep 7? ILP # 5 iS < £ ~e fa J: t~ £ o je»Nrcyel» jWMwmwcwigwww*; :#&** - 9 ALU's -«Mem Am -# Branches 10:24-16-8-7 6: 8 -B-4-3 9 :16-16*7 *: 6-6-3-3 8 :12-12-8-7 3: 4- 4-4-8 7: 8 -8 -8-7 2: 4* 4-2-2 6: 8 -84-7 1: 4-2-2-1 J___l .1,....I.... t..... I....i....1 3 3 4 3 6 7 fl 8 10 **&&*#*## Figure 4: Pathkngth reductions for Different Machine Con- figurations 0 1.2.4-4 (X»[l] - 46 - VLIW3>;Wl/ -'>3 >#&«£ D 6# U^#Atj3 >/U U -9 s >y;i/=f VXA^LX:o —Conte ^ Sathaye 0f±$Cd'97 ;W7$ftTV^o VLIW 7 9 9 R8 T #^^-7 y i/:z- <7#® vliw 9 y — h 9 jl7C L^L, ##^;\-P9 a:7&^^^L. >/W 6 C ^ FX!32(Z)j:o^#^W#t9 ^-;i/(D#8 y^^#^##^ftTV^o L^L, ^^9 —7497977 A, 7/^v^, 7/W7P9 d';^#AT, * 1^7 — 777 79 C#f& 100%#5##&3#J5%Ta tl'5 Ltlil^^^o CC^m^L^7Y7470$:^At)i±^, ## teSj^iW^Ltettt3fr Li^#0££1#h£1~£ £:S9ML£o j. 
IpH w t)tl^tl{i VLIW ^f&OSr LU7 —77 7 79 £, ^“^7-^rf^ft©fcfe©® #V7 b 7^7a^ACa#C7^Xz&b(D7 7D—7-C$)^ DAISYCo^T%IMZ:o#7 077D—7(j:, #^CD^^9 —74>7977 A^, #^0^—77—7777 9&, #-0;\—^7^7797^^^^"^^^ r^#0^"—79977Aj O) ^m&momm Transmeta #0 Crusoe C^m (4) f HotChipsll tlfz COMPAQ (IE DEC) C 7 & "Wiggins/Redstone: An On-line Program Specialized [2 ]W\ Alpha JlT'ifrfb'f' & x86 x^ j. l/-7 i: - 47 - FX!32 XOft&’ZfoZo Alpha 21264a fct/lx. G ftTT t> b t y* — ^fr& t % 7°n 4r yoy^ A*y >^0^>y V >yc J:oy^fLMJ^0#i^^^]LL < AR yoy^A^ fr^l:dd^0^e#j^^l:#I^L, b L-yt 0 u, b \s-z±m^t>tzz>m'Mib%&£ ;i btz±r, ^0 bL-y ^ 4; 3 i:yo y-7 A^#^#y, yo y^ A)gfb LT^ < hi>3 0£sS*i$ feT'fr-fTtfe^o fg^#{J Wisconsin ft0 Trace Processors 0 V 7 b y J: TKMy & 4:3^$)^^, ijftj3WW l/ —'>a >0— lLffl$IL LXs # R£'^ v ^ fifths ^ ® ft ft £ o [##***] [1] Ebciglu, K. and Altman, E.R.: DAISY: Dynamic Compilation for 100% Architectural Compatibility, In Proc. 24th International Symposium on Computer Architecture, June. 1997. [2] Gordon, R.: Wiggins/Redstone - An On-line Program Specializer, In Proc. HotChips 11, Aug. 1999. i.2.5 vliw i/ (i) at«>£ {JG VLIW (Very Long Instruction Word) (Predication)j f 0#B, Tib 3 V XA, 3 >/W X0#^%, V —X ^ ^ 1/-S/3 6o (2) VLIW a. dfr^11/^11/(Instruction Level Parallelism, ILP) -48- b. VLIW ^7-47*7 7D* 7*Ate##^-77-47 7 77®It#%%Mle VLIW k7-/t7* 7*5$,-6[2][3], C® 2-3 0 7-47777 12^4 6 *^v^;vm 36£iJtt£5l£m7 4 iete2 b . 7D ^7 Am^anfr^ gfg It ti-S. fc/cU ;v® l"&t>4- vliw 4l2n ww /vb| cA%0#9ua&3>/w?### U4#m2^ (SFfa). -*4, 7-/17*741211 («j69)c VLIW 19 no 111 112 15 16 17 18 11 12 13 14 4 4 4 4 [add ] [mUL I | L/S | [ BR ] 01.2.5-1 VLIW k7-/17»70Slff©fil^ 0 1.2.5-1 te VLIW k7-y-S 7*7 4©7D77 AHff®#7SS7o VLIW 412, ii jzE® 2 7 ten 7T--B5lc||ff4S^*^®«6)Sflr Ls (Long Instruction Word) CK. SlflBetel2- C ® ft Steffi 4 2, ##g§(e%X 4n3Ckte20 S8ff£ft-5o -*- 7-/17*7412- E*®n>/W 7*s$ELfcn- h*?ij6$E*jA»=#- 7®fl#4 4Riete#e4#6#^3#&yi'*< 7*te|$?ffi U SS$tejSAL4t> - 49 - O tz ® X> c. 
#m Percolation Scheduling © =b d & d|t ti X y ^ j. — V > y s Loop unrollings Software pipelining^]© T©2oy&^^#x.^[l]o (D 7°D ^ h )&^©vLiWs y-^y^^yotv-yy^s M#m^#mfbL-c^D, cfu:#imL Ts /wyyy>^m (mx.^7"-y;i/^k*) yaamuy^^s ^cf©imLy^#^izm^L^^i:^, M£ (B jpd' V i>jx < y©##&M^fc ^ ^i%^^©B#^W6^f©(±, yDt'7y^igp©^y&^^s ###. ytvyy^A l:Nf^^©y&^)o Cft(d:s yDdz'7y©i^)#fb©#!jAl:]±^y, 7tV^©#^ - 50 - C®f^#(D4r^ V y(±j^mjCit^TJg:^^f#rD)C$)^o C0C klis ^<^7D-lzyyrft i#imbLT&s b LTV^C^^^^k^oCa&^LTl^o Z. tl'tkf&yk't %> fz&> IZ^ Blockings Tilings Prefetching fc $ tlX £ ^ (#^)s L%;i/—& Ds ##U\ Predication ^^'yi>%i:^LT(±c^m±s %5ADL&^cak: 'T 'h o (3) 0H4^# ct#=fz(Predication) fztz Ls C ©&©*£&&< s z fc£&j@t<*nfcl'[7][ 8]o a. mm icvho^y&ou-s f077^0^->s ^-7 1:ZoTs fj;o)k *#^W:s ^^®#[*W:s Check cn^s - 51 - subcc x, y, cr2 bg cr2 subcc x, y, cr2 eg cr2 cadd cr2,T;csub cr2,F 0 1.2.5-2 a l.2.5-2 imi'Ti&m-fZo a 1.2.5-2 r% ba© -y 7 knf-fe, 3>yHvT-ttfi<'6bh-63>yW 5-e©@a©#*#fitT- afe-60 £ © 0 ii, subcc x 2; y ©#$t Sfrl' b* 6 cr2 tc-b y f d > ( x>y ) -c-&n«in$f&1 k#77 oyyAtss. ;©7ny7i,tt4#;f ( bg) »s#$-r£/c4tx c©^R(%$©m#?Kx tti!0fi$hT U$o0 H 1.2.5-2 ©6©0ti:S b. tta ccfli. S:bMf t9$ff©t#iSi&$ kto-5,, Sf, ±K©M-Ct,a-A^d:3t:. £bMt§Slff©e;Hb&ff 7 ks < T Cti©itigto&%!l$k LT, SEC £ 3 75 -f >© 7 3 -y '> i ©@»&*6fcntiD, nffnei@©®«»sE^-es3, i£, s-es -rr©m#M% ctCJ;^ fl-tt^flijyvy 3 r &k'ffl^> b ')»#c©#mib&Gt»#? <*ot±il:li5. ktiiciEK MfSiny7 70 7?d' >©77 -y ->3.6«61c k, *ir, 0 1.2.5-2 ©M-ce&aZoC, *<4=^SHff6St kfl-edb^/b5* < & b, ± -«l:, i^7 7->'jl-V >7 &fz7#fi\ fl-»$*^6Sx.T©e^rS«)ttH*»k k . #<©@^ -o©** 7o y V >7&R? C k#$V'o fct. %©7n75A WiSTtiA $*7n -y 7# 3 kti5>©®©db^^tott* Wc —S, tMS StffSrffiLfelSlIrtiA -o©S*7n y7k&oT*b, #^77i7^.-b >7A^@K cfttcib, 3 >/W j;a*fU@©#mA%:& 0 , nTtEtttflgxfcc ktc&So -52- _a O g K8 K ■F *0 K? AJ ilnn it m £ AJ ti K £ T J m > 48 E U m V ti tx 4^ ti ti # AJ u S y O hj F ti Kj e x. 
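The predication example of Figure 1.2.5-2 (a compare that sets `cr2`, followed by `cadd cr2,T` and `csub cr2,F` in place of a branch) can be modelled in C as below. This is only a software model under stated assumptions: real predicated hardware nullifies the instruction whose predicate does not hold, whereas this sketch computes both results and selects one. The function names `branchy` and `predicated` are hypothetical, and operands are assumed small enough that `a + b` and `a - b` do not overflow.

```c
#include <assert.h>

/* Original control flow: a conditional branch chooses one of two operations. */
int branchy(int x, int y, int a, int b)
{
    if (x > y)
        return a + b;
    return a - b;
}

/* If-converted form: no branch, both operations issue under a predicate. */
int predicated(int x, int y, int a, int b)
{
    int p = (x > y);            /* predicate, like cr2 set by the compare */
    assert(p == 0 || p == 1);

    int t = a + b;              /* models cadd: takes effect when p is true  */
    int f = a - b;              /* models csub: takes effect when p is false */

    /* Exactly one of the two terms is non-zero, so this selects t or f. */
    return p * t + (1 - p) * f;
}
```

Both functions return the same value for every input in range. The predicated form removes the branch, and with it the misprediction penalty, at the cost of issuing both operations, which is why the choice of which branches to convert is left to a cost comparison in the compiler.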
m G ti r\ n IK m 4^ F >K Kr 4> WA4t X It e % H F iK -R F A E 1^ AO ti n A < G F -AJ -R -ti K3 G ti it U It ti F 0 K X- it $: G ti K E N n it t: 4?- K AJ -£ , 0 #ti <^- AJ M +S X) It -X s K fZ G -AJ K4 ti m .o V *N a> ti ti rH ti U -fiu ° ti F ti It # iK F A e A it tv A -4D # ti a 0 U # ti 0 o \z F ^> y ■it t\ ti m , It O 0 < « # 'tx It n 1$ G C-A 1 G 12 M V -A 'TV K^ 1 iK 43= 12 4^ *N AJ > AJ £ F #( 44o , ilnn -Q ti ti # F G S CFG:= 7D^7 h(D%m? n- NG_LIST:= ^ While (H:=CFG NG_LIST { H (c^f LT, F P cp #)ao If ( Cp < Cn ) { 3 - f p %umtz>o } else { H & NG_LIST } 1.2.5-3 ^(D^WL ck cr1 ck cr2 ck cr3 if cr2 if cr3 andcr cr1,cr2,cr2 andncr cr1,cr3,cr3 cadd r1,cr2, T add r1 sub r2 mu I r3 div r4 csub r2, cr2, F cr1=T & cr2 = T cr1=F & cr3 = T cmul r3, cr3, T cr1=T & cr2 = F cr1=F & cr3 = F cdiv r4, cr3, F m 1.2.5-4 >/UJl predicate ^ ^ MT, HI 1.2.5-4 (7)$| l^T mmtZo - 54 - -#^0 add ft 2 o b btfLiLtZ) d bft'j&Wb'&Zo —&ftt> 2 -0^)0 sub t^CDUffTtiX WtuC)^^^(d: false, S d X>b^m(D^^at true -£$>3 C bft&^bt£%o £¥^>0 muL div^^0^e0^&61:(d:, ##0^14^:^ false C0m®3>/W;i/m$:iai.2.5.4Cr)^#t:^fo SoCO^^^C^^fjLTcrL cr2, cr3 0^{$3—b&AL%6o ^#3—4o0^%^;0 b&¥^if6o C0#1:, ^:^0##^^^ij0^fj^^ &o C0#A, cr2(:crl0^14:^M%^^, ^0add, sub^^0^eJ#L/TV^o $ fzs cr3 C 4b crl (D§k\$ftfsiyk cF tls muL div ^¥0^17 &$ij# LT ^ -5 o CCDey^fj:, crl 4o0###^03^, ##1:%;^0W:. /:/:lo "£$>£C crl 0$§M^M% L^V^x cr2, £&& cr3 0 d. 
3 w-w )im cc-r^, th>y¥0 yo^7A#A&m 1.2.5.5 cafo cc^(d:, ^m4-^^e03>/w;i/^#aUT, V 9 1,2,3,4,5,7&)tfMb t%, S¥7"U 'yp 6 &, 0 1.2.5-3 /\ >^6 v ^ 1 subicc x,10, crl bl crl 2 I subicc h, 5, cr2 subcc y, 4,cr6 I bg cr2 bl cr6 2: IF 3:1F2T 4:1F2F 5:1F2F or 6T 7: IT 0 1.2.5-5 ©3>;W;H^ (1) - 55 - la i.2.5.3 LTv^i^, com-ekL % (Tftfc>*>, S*7"D y 7 2frt>T(D 3,4,5 %ftMtt£)o la 1.2.5-5 co^inckL iWlB^ftT l^£0 s^7"n y 7 2 %z\$X 1 ft subicc x,10, crl bl crl subcc y, 4,cr6 subicc h, 5, cr2 bl cr6 2: IF orncr cr2, cr6, cr5 cmul cr5, T 7: IT ia 1.2.5-6 >A^nvm (2) subicc x,10, crl subcc y, 4,cr6 subicc h, 5, cr2 bl cr6 andncr crl, cr2 cr2 orncr cr2, cr6, cr5 % H 1.2.5-7 ww;v#i (3) - 56 - False(2= ip), 3##0H:&a/:#)CU\ #H4^1# False, (3:lF2T)o C^l^CD^f^ O&kT, m^yo^/7 3,4,5 1.2.5-6 -e&&o HI 1.2.5-6 Tli, y & 3 b 4 fttii'&ZtlX S* ^#7D'y/7 2 k &^yD'y^2^7 t:^8L, c^&^lzL^e!I^HIi.2.5.7C^fo subicc x,10, crl; subicc h, 5, cr2 subcc y, 4,cr6 andncr crl, cr2, cr2 bl cr6 orncr cr2, cr6, cr5 M 1.2.5-8 (4) im i.2.5-7 tid:, jjgtin^z t£>st>frZo 8* y" uy y 6 fr b(Dfflffl (4) mm 8®rad:, V 7 t'l)x7'>;il/ — XX' F&81& f 63>/W - e[!3]o v? h*)x7^viU-'>3>cj;Dfiofeo 7 '> >CD ^ 7s Jl it LIT X ISb 3 [12] [13] o - VLIW:#mm2, 2, ^}l%l (^c/:L, a^kL ##/J#A^W:l)o • ###g : Load/Store : 2 , ALU : 2 , FADD : 2 , FMUL : 2 , Branch : 1 -57- - : 6 4, 1/^77 : 6 4, . D—P01/-r7>i>(t4itd'^;bo - H± SPARC ^:$#o /:/:L, 1/^7^ #Lo ^>f7-^7U^7AHTIt Mediabench 0 — gP 0 7 D 7 z? A £ ffl ^ fz [9] 0 Mediabench(t, UCLA!?##, 7^07077 7T&^oJpeg, mpeg, ##{b^^07D77A^A^Tt3^,4'#Ctl607D7 7 A0#^#^!#f/:6o a#o##r0t L^co a. 
ILP0mO ^f, VLIW^^p#!:a'0<^ cfua, vLiw ^^^0^0^ ft©7 >r — ;b F^LlzM LT, NOP !?&D, ^e^0VLIW^^P#0#!l^^#^L/:#R!:(d:^^o C0*§#&I3 1.2.5-9 to ##^^>7-7 — ^, $%#^##^#^0#AO$ (%) c 0@ 40 35 30 25 20 1 5 1 0 5 0 m 1.2.5-9 ILP ®mm -58- nnt- 0 gp CTtT E d MI M M M 4 cn o 4 & 3* m Ml % 1—* N: FE E d -1N3C0^UIO)nJC0(OO 4 * » S w nt E h-* 41 gp io 5$ rt gp st- ooooooooooo O' MI S* 0 0 S4 d w % 4 # io VP Cn n rT[~ gp 4 0 4: 4 Or d % cn E E h # CD % d adpcm.encode ^ ^ fX M aw t—' 1—1 aw O aw e>i 4 d # r a E 5 0 o' O n M » % < S? d H d adpcm.decode | ^ 4 r Or 4 <4 =t! 2fi Stl 0 S or e>^ gp 4: n E d St VP X d n X gp gp 3t id b> d bt Or gp 4 S 1C gsm.decode j—...... ■■■■■...... —j su i» ft d B> rT X. B> 0 C" ? 0 E 9 S r f< * gp 9-t Si jpeg.encode j d E e d SI & rv V m % O' n ! i ■d> Ml St I jpeg.decode j...... j (d A 0 E SF «P % E d 4 m to Sfc m E d X r cE mesa.mipmap |------1 & Cn bi S> St- j | id m 1—i 0 0 9S St gp o 1—‘ n§ Intel HP% IA64 T —/:/: L, IA64 TL^l^ (ko'T&^o i^ly^Ul/^iJtM, tSWUift&a ^c/:L, IA64 O'T&aZak:, ^3>;w ^,c ^(±m#-r(±^^o 1.9 1.71.8 1.61.5 1.4 1.3 1.2 1.1 1.0 e e ® e ® e ® e ® e ® a 0 c ® e e ® "O *o ■o TJ T3 ■o ■o ■o ■O ■o T3 <0 E e ■o ■o « -a ■o o o o 0 O o o 0 o 0 0 E M o o "g 0 o o o o 0 O o o o o o O % o o o o c ® c e c e ® c e c ® ® c ® c e ® ■o ® ■q ® ■q ■o V ■o ® ■o ® *o ® ® "O E 0 ® E E d d & E E bO m 1.2.5-n [##*$] [1] J.L.Hennessy, P.A.Patterson: Computer Architecture A Quantitative Approach, Second Edition, Morgan Kaufmann, 1995. [2] R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth, and P.K.Rodman: A VLIW -60- Architecture for a Trace Scheduling Compiler. Proceedings of . Second Int'l Conf. on Architectural Support for Programming Languages andOperating Systems, PP.180--192, March, 1987. [3] M. Johnson: Superscalar Microprocessor Design. Prentice Hall Series in Innovative Technology, Prentice Hall, 1991. [4] J. A. Fisher: Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transaction on Computers, Vol.30, No. 7, pp. 478-490, 1981. 
[5] M. Lam: Software Pipelining: An Effective Scheduling Technique for VLIW Processors, SIGPLAN Conference on Programming Language Design and Implementation, pp. 318-328, June 1988.
[6] M. Lam, E. E. Rothberg, and M. E. Wolf: The cache performance and optimizations of blocked algorithms, Fourth Int'l Conference on Architectural Support for Programming Languages and Operating Systems, pp. 63-74, April 1991.
[7] S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu: Characterizing the Impact of Predicated Execution on Branch Prediction, in Proceedings of the 27th International Symposium on Microarchitecture, pp. 217-227, Dec. 1994.
[8] G. S. Tyson: The Effects of Predicated Execution on Branch Prediction, in Proceedings of the 27th International Symposium on Microarchitecture, pp. 196-206, Dec. 1994.
[9] Chunho Lee, Miodrag Potkonjak, and William H. Mangione-Smith: Mediabench: A Tool for Evaluating and Synthesizing Multimedia and Communication Systems, in Proceedings of the 30th International Symposium on Microarchitecture, Dec. 1997.
[10] S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann: Effective Compiler Support for Predicated Execution using the Hyperblock, in Proceedings of the 25th International Symposium on Microarchitecture, pp. 45-54, Dec. 1992.
[11] S. A. Mahlke, R. E. Hank, J. McCormick, D. I. August, and W. W. Hwu: A Comparison of Full and Partial Predicated Execution Support for ILP Processors, in Proceedings of the 22nd International Symposium on Computer Architecture, pp. 138-150, June 1995.
[12] 99-ARC-134-19, pp. 109-114, Aug. 1999.
[13] A. Asato, E. Yamanaka, T. Ozawa, Y. Kimura: Compiler Approaches for Exploiting Various Levels of Parallelism, in Proceedings of RWC 2000 Symposium, pp. 71-76, Jan. 2000.
[14] M. S. Schlansker, B. R. Rau: EPIC: Explicitly Parallel Instruction Computing, IEEE Computer, Vol. 33, No. 2, pp. 37-45, Feb. 2000.
[15] A.Nicolau, and J.A.Fisher: Measuring the parallelism available for very long instruction word architectures, IEEE Trans. On Computers, C-33, No. 11, pp968- 976,1984. - 61 - i.2.6 (l) littoK &£© < ofroft/gfl'SflciS^T*'!? L A'L & As 6> x *©-£ t\ $ (2) M?!Hb^5il/-'>3> 3£?iJI+@SttEl3jolj--E>gi6ffle*©li@(iV'o $X6% < V 7 b 7 ^r5g#gl:fc-5o ttfg, SfflnTt&Si«©j££x 7o77A^%37 b»k#76AX#M©©<^*#&A^ -i:-gk£toxu-5o c©z-3%m#i:Av\Xx cn$xc36?iJg+SS[cy§V7 b ^ix© HXSffik LX k 5.ftXSfc77n-f h LX • A<#L©#9'J7D77 ^ >7@#g • @t#xn^7 J:^a6?ij-fb • 3£?!Hbn WW 5 • 36?iJ-fb£ft;fe7i'77 V©Sft %k'##(f LA'Lx ©Til*, te/Sfl-»X©^@ffl/$JA6B< kx -62- V7 h ww 7[i]&k*ffl§S/i|6?ij-fbfit u*uB,-t»* se>. S4$r-ic, ®iSAsiS)i!t£h,TV3 kl±l'C'A^l''o SScoSHjjfi^J-fbn >;H7li, tc*-7V'-c;v-7"lz^;v© (fit-S &©;bs«fcA,fc' T-afeb, >7&A6fflv£=k (JEil&CLfrLSSICti; < ^7 < 36?iJ itvS-eta^A, V7 h 7^777 77©R@©#*K, C ©lUV-i/a & C aAT# j:7o 1"%fc>*> a# (m*) :+m#©7Dy?A&, M?'jftg«±-gjis j-i^-->3>^ff L-3 7>atg[6]±£ff 5 tl'-5 tifflT-ft-So Ztl% n. D-->g >J 1£ L W'& * 7 V I ? h n - K (i$8Bro'>6i.'7-7 6KS U ( = WJKl4aWU) A'fctJ IC?»J • «B7 - K ( = *-;\"^7 K) Si6tnbfc757 @ i.2.6-i ®«s«j'K#7’7 7SfflMfeiie?U'fb0fce.* t -63- o) 7 #9Ij-fb^ 5 a ls — '>a T"D^7 A6»6 3£9iJf$0 6lJ#m&Et^-5o M^nyyAffl^TJl-i; Ltli, fW#7 D-©maa^6)RI7m L*6i'$^©@B^9U -e$>-5$*7'D V U **7a y £ISIC#41‘£$J0iMft#litl& tf- «S977t LT#a k© IS7D97 -5 rfflffll • 7-'— Eft® b HR t -5 -.KB# 6 3 o f kt, SilfefiMlffliAC i7$ij -EE/- Ftik St^CSJffll? 
n-&U U^-j'E© h-7 >#SJ*T-6 - 64 - # x * 7 v 7 w * sn 3 mm 7 n - a (4) a»ai'r-^t#y7 7tii> eni$?#c j;-3zej##*M#&*g#L, 7--7 fit ^ i*C i o X =r-- 7 «c # M « 4 & IP t S C t fc • SI® 7 D— (Control Speculation) IIS© 2 ;& [6] ANSI’S* 1 t&C>A\ (f #$A^&© V 7— 7&SA) K V 7 & -FSJA S o • ir—7 fit (Data Value Speculation) U 77 7* JtU'ftWCT K 7 7©)&;t£ ns 7 "E U AsS*7n y 7 A s#^, f©#A-e#*$tt7t\s#&?aiif So ttSA,, -r^T©SI®fflSn/x-7ffltt%IE V < »»JLTteSH«lc (5) S|®7 0-©?#| eK?#ia, £IC7n-b'y-y-©asi6|±tt«k DT, mt\^CA6cT#*%E%A^ &*>AT$fco i&^ewmSk LTaT© -65- C^U± (L^^c &) (L%^) ISHf^^fiJffl ITi^o ^£oT^iii5©W£R £>£* ^[9], #<0'7-r^DyDt ^^^^#1:2 If y D, ltfvhCD77^^^<2tf'yh Lf flH_L) c a^:ck'oT, ^^8yi:^^©##©m^#^< f ftT j:o lc LTl^o cNa.T'D^ A©em:,L'3T^La#iiiai$irfm©^fimmim#&$m u^miir&^o VXt#$g£ffll>fcEg§/<-X0^&&^S£ftT^;MlO][12]o A©i^#i: L^^i%©##©^^#i^t^;i/ ibL, t^;i/©;^7—at: j; D#J^©#i^#&^)k^9 ^ t)© &&AL C ac j: D#J^©#^?#J^W#a^ •?> [13] o C© jlo ^#^©^6#^^©## /W 7 V V ckoT j: (6) T-zmo^m ii-v-r pv'7u-t'yy-®m&\n\±&ffit u-ccc^^^m^^g LTLLT© d:o^^©^#(f ^)^t6o * 7 L?#j • /W 7" U v K7-$J Lipasti 6U\ — gtf© Load i^^|?e •66- 777 F t/7©s&iJ7-7#?#i©#*A#A-k#x^c 7'j7i'j/fr F F7©7$JC(;i, Z>x F 7^ F ?#!#$-< i>tir Sti'5 [17][18]o 7i»i[19][20]& £% < ©iSjceti, CtlS^*- T-'-^eoneFatoffiHtt^eti-ferotcattit"- e. *©k lt(*®)v;i/3 7T$jgsAs# x e>ns= *K[2i]T-ti:7V 7^777 F V7©78JC, jtKtigjT-ti^'-^evflicv ;i/3 Lti^» 7-7#7#lk L7t, a !##(:, SmMti&fUfflT-S AS, tr^K-7 ©AS, 7o 7 L7%&©7#I#S& @©e(7 6 AS[20]Ak#i@m$ATU6. ( 7) Java Jog-time Analyzer a±, %#i%eim7-7##77 7Cct\xm^, 7c7mv^tt^7#iR#c^t\T Mf&fr&oAo ±K77Vl/C%-7 < Java 7D 77 Aj£?iHh©StSI6fi:&7 AtoC, 7 D 7"7 A ##@##7—71/ Java Jog-time Analyzer $r @8% L A (Jog-time iltt Runtime i <0 lill'i! 
Walktime[22] J; V !4i$<©S)« JJA ©KftlJJ^oa D 7fe-5o *f , 7 57 7 x7 ;tffl n— F8SC, 1M###A Z A S^ttSCovTeSiro&JSffi&fi&a,, 7n77 A*se«!)An-5 fc, tFFSn- F7 > 7 7V 7IJ#R[F©7D77 A©@6#7'l7lMf ^>#%A##g©iR#&GA7o A © A © C, 4tR0n — FV >7 7 V 7IJ, ffMAWffiT-f# ftTU-5ffifg#* tC Lt, $*7n 7 7©%#&gg#T6#*7o 7 7 F 7 7 >7&ffAV-, 7n 7 7©%l?&iliit' £ A A' C, EI+t#IB©iK*, 7-7 - #K?#]e7:L-;i,©@K, 7o 77@mtZA77 V 7 7-b7©77 A77 >7©|+®Ak'©totoAtl$©E*%ff A-5o ?lt, 7077A# TBec, 7V7 FWffiUhm **7o 7 7Eff0E, ©?#]*, 7 V71- *;i/y<71+ene@, jfi^iJSA k'ffl^ffKa+7-7 6, ##77707 n 7 F 11 & cm A t -So Java Jog-time Analyzer Static Analysis Runtime Analysis Analysis Results Java - Basic Blocks - Interpreter - Block Exec. class file ■=> - Control - Basic Block Counts Dependence Tracking - Prediction Rate - Data - Predictor - Critical Path Dependence Invocation Length Branch Data Value Prediction Prediction Modules Modules M 1.2.6-2 Java Jog-time Analyzer -67- jja tt, tzfctbtis satbTsacffl^attcktf-et 5, kvid^-ett^gnJigt-s-Ex, $4. jja tea, tI.T0.fc -5 n-tv^„ • 2 tfc-y hSSffiAty j'SfilUfcfl-tiTiM#? • fl-eassfflv^/cfl-iRTSJs • T-TiassA^y j'-tT+ffl-r^TTT^-xofl-eTSits • Xh7>f fc*j$;©B»ST-j'BT#l$ JJA-eiikA6&SU6k fctfcoT7D • »#5/T-^ fiT9)0tttg • 36?iJS 0j;o»m#ac (8) nines Java Jog-time Analyzer VC Java Tnfc^^AOjS^iftl'SiiiiJUT^fco ''<> T v-fc7D^7Ailt Unpack k Javac fcfBUfco < — ^ k UTl£ < JBU fe tVCU 4 Unpack 6 Java "CSBxE U/fe LinpackJava -nfo&o ttHAsfi i.2.6-i m&*es LinpacklOO Linpack200 Javac Javac -O 8,015,783 57,348,307 5,011,325 24,340,585 y /7 8.7 9.1 4.6 4.8 VtS36?!lfi 1.15 1.08 15.5 11.7 886 3156 36.6 129 99.1% 99.6% 93.8% 94.7% T-j'TSJ* 98.2% 99.2% 62.7% 70.8% -68- Unpack tffljfeS'JSttHH 1 T-fe-5o CMS, iv-73! Ch6©BbigLI@C. ;H7^>r?Xttl:J;5r-5'l#l @»SS61" 5fctoT-$>5o Unpack TttWt&A-T-©!!)? 
#M©I$ k A, 2:" 6 £ to-5 4», ?#l*k AC 98%y.±i:«toTiWV'o CCDISS, fiSm'ftSaSfTCiot, iuv — 7T >7'77gStCcfc 5f-Asfl?-;B£fu iv—7©5-iRb ii L iwi'j 6/r v/zt-*—j'tt#M*ffl*c®ito$nrx'r ya, — u >y$ft.5J;dc&b, CtoeiRStoTilVjfe^JS^tiaiJSh-r U3= $ fc LinpacklOO 7 886, Linpack200 T- 3156 t, HH 4 fgC& o T£ b , Siiii b IHg-y-T X© 2 SgCtt^J UT l'5 C bfttofr&o 400000 total daxpy matgen 350000 dmxpy dgefa idamax 300000 250000 200000 150000 100000 50000 irmnmnnnnnn n n n n nm nnnnnn-inrinr-ir-i Time Steps 0 1.2.6-3 LinpacklOO ©36?Ug©^ffig-fb (MtJ© 1000 7 n V 7 ) 0 1.2.6-3 C LinpacklOO C*143 3fi?!lfifflB£fSSfb£Rfflffl 1000 i?D'^l;-3Pt iTLfeo U < C Linpack ©Jl^WLflTife -5 LU (dgefa) H 3 #01V— $ tl, 1*3(11© 2 "3© Ik-7 (c©9 daxpy) * s3fi?iJ(bnJSgt:fe^>„ 0H»Jfb$nfcrtffll© 2 S IV — 7 A5 IIM-13IV — b iS^tXTU-Stt^SSt) Sfc, CftAC 5ti:A>#toTi®Ulf — £A5fK9J £ fl •?>ASCtxtt, ®biSL6#fiKT5 2 c©**7o -y 7 « 4 A * u © C % ff m 'n $ ft T u £ fc to t? fe -5 o 0 1.2.6-4 C JJA AstHAj Lfctt#77 7 StSTo C illi Linpack © dmxpy tl'i ^ V 7 F©##7"? 7-C&&0 ff5U h lVffl#6*to-5 2 SlV —7T$> b , Linpack ©4" v ##77 7©##A7J\$ <, &*&o, L-Cli)U-5o -69- method: dmxpy class: RunUnpack 0 1.2.6-4 7 7 tti (Linpack © dmxpy) #68bf y-y i:«x vyxvoiE •5, ^nen©ffitte*7-ny »68blt* t>. n©4'i-ti:8*ftton-:r-4? A^sn TU-E>„ 6±©Ett^> h 0 — J — h'6Sb LTb b , C z.-frb%%. f4r- y V y b05l8A"3©tt#Hfli68bt"o -#T©###%%G0&©$V\#1| M L©%*yo y y T- $> 3, Cffl y - b Ctt 6 * ©T-'-^ttSr-y k l *©$i|fflttsr-^»sx* ltv'^al ;ol*7D y^0#O)SL#ecj;O «u465iJS*sff bii% mtx ##7-^©$ < li% h^'f h« o o$ b i. t> ,lv>TSi|*A sf# e. ftTl'^o khttCb ’SOEA5 Value Locality 68b ltl'5 = 4"6tt Javac ©8ff&n$t^ "3 V T & 3 o y'> 3 > 6 "3 M" V i Javac T-tiu #a%R6R%b6t\@Ac?Q 15 ge©a6?utt* sff e.hrb t> , f? 2.4 fg©4fi?!jg©|n|±A s)#')>b3 C ttffciPSo Linpack 0 - 70- (9) £ L'0'diij^|q]±$:f#^, ^Jft:^ ^ 2- P — '> a U M7( 7° □ 7" 7 A i;: !*];£ f £ BO #. 
mmrnm - 7'-7##7'7 7a^7 #0^707*7 Amm^^^u, 2 w:, Java yD7"7A(D#####7-;i/ Java Jog-time Analyzer & (### L> CtlSrfflVTV'' < 7> ;fc> CD 7° D 7" 7 A CD £:$tl /L L f®&8#U:'7WT##?L&o ##iW#';F-f##f77l:j;cTm#7

[References]
[1] Wolfe, M.: High Performance Compilers for Parallel Computing, Addison Wesley (1996).
[2] Wilson, R. and Lam, M.: Efficient context-sensitive pointer analysis for C programs, SIGPLAN 95 Conference on Programming Language Design and Implementation (1995).
[3] Sites, R. et al.: Binary translation, Communications of the ACM, Vol. 36, No. 2 (1993).
[4] /p#: Virtual Accelerator (3 L 7 —A P 7 Vol. 96, No. 231 (1996).
[5] /J\#, ULl P: 7 U— ^ r;WlS7' < 5. n. P — 7 3 ><7)t&fp # #j^#,Vol. 97, No. 225 (1997).
[6] /p#, iP^r, ill □ : 7##7 7 7 A Java Jog-time Analyzer - Java Virtual Accelerator ttT ©Y’iHf fffi - , : 7°D 7*7 ^ >7", Vol. 40 No. SIG(PR02), Feb. 1999.
[7] Hammerstrom, D. W. and Davidson, E. S.: Information Content of CPU Memory Referencing Behavior, the 4th Annual Intl. Symp. on Computer Architecture (1977).
[8] Bobrow, D. and Clark, D.: Compact Encodings of List Structure, ACM Trans. on Prog. Lang. and Systems, Vol. 1, No. 2 (1979).
[9] Smith, J.: A Study of Branch Prediction Strategies, the 8th Annual Intl. Symp. on Computer Architecture (1981).
[10] Young, C. and Smith, M.: Improving the Accuracy of Static Branch Prediction using Branch Correlation, ASPLOS VI (1994).
[11] Yeh, T. and Patt, Y.: Two-Level Adaptive Branch Prediction, the 24th Intl. Symp. on Microarchitecture (1991).
[12] Nair, R.: Dynamic Path-Based Branch Correlation, the 28th Intl. Symp. on Microarchitecture (1995).
[13] #, /J\#, AB: 7 V ^ SWoPP97 (1997).
[14] McFarling, S.: Combining Branch Predictors, WRL Tech. Note 36, Digital Equipment Corp (1993).
[15] Lipasti, M. H., Wilkerson, C. B. and Shen, J. P.: Value Locality and Load Value Prediction, ASPLOS VII (1996).
[16] Lipasti, M. H. and Shen, J. P.: Exceeding the Dataflow Limit via Value Locality, the 29th Intl. Symp. on Microarchitecture (1996).
[17] Fu, J. W. C., Patel, J. H. and Janssens, B.: Stride Directed Prefetching in Scalar Processors, the 25th Intl. Symp. on Microarchitecture (1992).
[18] Eickemeyer, R. J. and Vassiliadis, S.: A Load-Instruction Unit for Pipelined Processors, IBM J. Res. Develop., Vol. 37, No. 4 (1993).
[19] Sazeides, Y. and Smith, J. E.: The Predictability of Data Values, the 30th Intl. Symp. on Microarchitecture (1997).
[20] Gabbay, F. and Mendelson, A.: Can Program Profiling Support Value Prediction?, the 30th Intl. Symp. on Microarchitecture (1997).
[21] Joseph, D. and Grunwald, D.: Prefetching using Markov Predictors, the 24th Annual Intl. Symp. on Computer Architecture (1997).
[22] Fisher, J. A.: Walk-Time Techniques: Catalyst for Architectural Change, IEEE Computer, Vol. 30, No. 9 (1997).

1.2.7 n7 y*-SUIF 7nyx ^ b CDhMT M WbSSE X fz SUIF 3>;W7[2][3]^#^L^m^Jfb^mm#, Parafrase ^7[6][7]a^ LTL-S & © £ UTtL & o
(1) SUIF Explorer[1]
SUIF >7 Monica Lam t & o X 'M&X U 3 W W 3 7 ©4 > 7 ^ £ UT&B Explorer HI 1.2.7-1 iz&to Explorer a. coarse ^ U Y > ft tz SUIF 4fe?0fb Rivet visualize: m 1.2.7-1 SUIF Explorer t«08Mf6J. IB^iJPffiTffli.'^#® 'J -y a jW'JXAtffli»^t)ttait-5faSII» LTl'?« »ff»ttt^T07D^7 Af& D . ^KSWiti L£$£»s7>fcT!a©fi¥tiT&ffd„ ® xxiwt/r © K?!is»©tt#«w © ns, r=y^^- MB5'j©itm @ x*7$»klB?iJ$E© 'J“^9>l:Mt5U y^ya>/i^->©#m S5@©$e[i6]T6iE^£1fc'>t;:, cft$T-SSK ©fSAr^ffitfiSiesn-tus#. jji tdSffftt6^ij-&ii5fct-n«Av'©-ett» <^ nfflh-C*fe-&o k < C>pfBi/X^A-ettlWDfBeFaA5*$»0JI -o#&3k#k& D #;i/-7A^A^RR RSCAA^MfTx ;i/-X-img6 b ©?Q#+@#R@&%^f 6. cn6©t*ISiR*tt. 3 ww 3##d/-7©m#C#)g#$B4 3a#f S3+9I3- FS#At5Ctl:J;oT Hilt'S, c©g+9ICBt'SBSfSti:#SC,>& <7 t-Mji K fcli&fefcV'o HfftWrttS&i+S U /!/-7##tU +bWmbS 4$U ®fSo Cffl&tottD 75 Affl read/write #®43+9Jb, 73 73Alft©^7tV# B)C++VteSf© write #!+:«$ 4 S+# ft So ft Ac, 3 >7 W 7 C ft oT^t+i$ ftfc'Jf 6S'$Sk U 77 7 a > 4ItoJ U c+i5©iSi:itStt#4*StSo $ Ac.
iMtt# Jpf-?©t7 + ^- MbC ft o-t+£ftJ+b£ftS7-7 -fe l®S'J "T S C I: As ft $ S & 27 3>/W 3 kffliS8l+)sUrD6^®nTVS= C ©#%%#+)+CD# RA^^^S Aft 7d y + ArttcptSt'S 4£?!ltt4Adit'Sftfttf+l+ftfcSo 3 >/W 3 © @##M+b#b tjfflStSISS, 6SVlt36tiJ+be«©»«kv3 fc+imc SfiJfflft$ S 2: JgfcftS« c. tilbStti+f ft 7 >7 73 73 A©tt6blRl±4HS+cto©>tlS5)fflt)'d'7>7#E rGuruj 4(®xT^So 3 7/W 3### Utft-7^H++#ClR# L+zft-7 473 73ftCf ©ft ftftftA TSitS A&ftlfttt < x a_-74mB%ft-7C#g$#ft#?iJ+bCAB&m !843--yciigvaii+s*-ettibiR]±4iaoTv^ <„ a-fcs*s+i5©iiis»c M?iJ+b4fTofc»©»f§lft D te VS++Sk Lftu S 7b 77 A A* 2:*© ft S CiB#d A>4)tioft V-ftlftStio Guru C J;S36?iJ+b7D-fe7fttT!B© 2-3©Smto»RB»s+l)V^nSo CfflKBli +£ n it m * 4 ffl s f s fc © c a +» m v> s ft s * © t- s> s , • j£tti+b* : j6?iJU-7a >ftftHfi£ftS8Slig©i!l'6' • 36?i)+bto6 : +fi?iJ®aiFto©*g$ #M+b*#ml++t|j% *wK#Aft# 6ftSftltftlft%^o #M+b*BA7j\$%@6C ltl@im^7n t 37M7-7im+#©t"-;^3 K C ft oft M?'J+b Lfc+8-nt;:tttbAsffiT f S C t * feSo RB©'J'$%t/—7#$V'#6CI±KB47@ <+St + 7©ft @ S +21+++#©/!/ —7ft#tU+b Lt D , ES7 —74 ft k#>"C jfeM U — 7 a > C ftSfto^C 2:4li(7SC 2:#AB2:%S. Guru (t3btiJ+b+i;+6?iJ+b*4B«*4«))n$ ■e-SC i:4fido Guru ©##4T3BCtiJ#t"So ® 3£ty+b* 2:$t@$ft SII;toca6?iJ+bSii+c3- bCMfS#9iJ+b*i:#B43-7C7n l, Ifci;jv-7 A^'J+b $ ft Ac »-f=r C It ^ © tit* 4 M»f t" S o ® ie?iJ+b+tfe;i/-7© u 7 +as A © V 7 HC It I/O 4#ftft. A"3#ti)+b/V—7PtC%t'%%/)/ —7A^#ftftSo 7 — 7031/ —773 77+ ft 7 7 3 ft +f C ft o ft |+9J £ ft/cHf+Bf ®©RJi C V 7 + $ ft S o - 74 - (D y°u to'MW, T t:$ < —&&&&&&'? 2>Z tftt)fr'oX\<'tl\£' 3- — +?l&Z(D)l — 7&i##L&L'&& Lfl&V^o Guru —7°D 77 vCi^^iife^7n^7A^^i/Tt^)o 7°D 77 7###^—7#####!? ^ ^ &^iG±E^^777^-HbW#^^7^&$mLT^ 7 A7(7)7:^#^f 6 C fcT, c®##$fbf 077077 A777 7>7[4]^^7m^^# Alti^o 6 3— K £ giJjftjl:H 'Mt Z>jtfztflzmfet % z b&-C%% ^)(7)T&0, 3. 
—7^1±gT^#^C^:(7^7^71/7 V >7f ^caiCdLoT##^^# c k !:& & D, 3.-^7 ^cc#gL%Mj-(zm^& fe ^ #,^ <7) 7° U 7" 7 A 7 7 7 7 (program slice) ^#^^7-7 7 7 7 7(data slice) LU\ 7 P 77 A 7 7 7 7 <7) 7 7-fe 7 b ”£ fe o T, L&V7—^titocDUl^fctt^G&So — ^(7)$IJ#7 7 7 7(control slice)7D 77 A7777(T)it7t7 bT&oT, (7)#^"C^6o ^^7^, lg 1.2.7-2-f7-77 77 7k#m7 77 7®M##|&^f, GCD^iJT'ti, DO 1000 6 C RS L RL (7)##^^ DO 1000 - 75 - DO 1000 1=1,NMOL DO 1110 K=l,9 RS(K)=- 1110 IF(RS(K).GT.CUT2) KC=KC+1 ® IF(KC.EQ.9) GOTO 1100 DO 1130 K=2,5 IF(RS(K+4).GT.CUT2) GOTO 1130 ® RL(K+4)=...... ® 1130 CONTINUE IF(KC.NE.O) GOTO 20 ® DO 1140 K=ll,14 =RL(K-5) ® 1140 CONTINUE 20 DO 1150 1100 CONTINUE 1000 CONTINUE 0 1.2.7-2 7D V? kT.V'i BB9IJ RL (in *-flH?iJRL(6:9)^ffiffl$n^©Ii*ft*@^f» KC^0©i;fCfe2,o bZ6t\ KC ©I*# 0 RS(l:9)©##f CUT2 LTF©#^T&6o i. o t, 3kft%®X\& RS(6:9)AS1-^T CUT2 HTl:*5fc»E51 RL(6:9)-\©S#AsSlff $ ns, o, bx Xfrb £ n 3 C k A5 -5 , CfflC ztiftt>tfti\is s^ij rlowmmmw'nxs iv -t- 1000 Z© j: d tc 2 -3©E?iJ#!SRa©6[fle> A>Ct"±auts5!i5 fctoCli. rl Silkffiffl©Z3»@ •grttn >/W 7©##'C^#k#m©M#&%6AGcf c cfflio (±l3«7?ttSM RL AOV--7-®bilSLT-S«kffifII©H zb) ZMtlf Vs f - 76 - -USk:#®©?-^? AX 7^x14 AS < &3©TX5'f^Oiifa^igf T-'-xaviTife^ck6So^ajrfctoi3)aiciz^»vtsig6#$Xs XM&'J'S < T-5-Ar-. J.—tf© on demand C 4 o T P88tSS8&3lST-§-5 4 7 Ct 2.citi£>iT-fe5o e. ®*-fb it(t'>Xf AtlT Rivet S*ft$giS[ll]SfflVTl,x-6„ *i/Xf Atol'Tliil T-5AA C©'>XAA6($oT Explorer l4~FSB©lliE&teft LTUSo (D ;W/iXV >X - X'xXXxXif *i LTt'-SX? 77 - HI4AS c©x9 xtna^amT-Dy? A©#m&^fLX"? x ©SviCffliA 5>il-5c (2) fi^-X©$ttM*IB V — X 3 — K © bird’s eye view c?ilT43 D , -f ©Sin _fcT’ V — Xffl§7-f >14 #-©7^ >t 71- >©s$i4?*x Hressctttiiii u-ce^sn ofc4 7 tiffin Sin $ft -So © V —X3 —f • Ka7 Rivet > X AA03 — K ba7l4aSfflt)©tlt'tT, L> < oi>fflKS#}fbnt f. 
fi/pX A 1 7 A 3_-VA#Af-nXI43 >/H 7^x®S^T-$> D , *®3>;W 7l4c©#i/SC t6ofc®86ff 7c Btk L-CEofc#B/TSX#AC 47TE LV'gS-fb^jfiXiJ-fb^fibn f, LX^RKm?IELt^?Ac ?;t\ i-lf##A LAfg/pX^iE Lt'&0*A7 fr&3 #####©8*&$wLTiy^-r-Sc ctti43>/w tcAS (2) Rivet >XA A[ll] 3 > t a — X >X A A14 4 A 4 Aitftfb LTto D , 7"D -b 7 XI4 4 1" 4 f"# < © h 7 - 77 - u, —7 —aoC^cT^T ^£o tZZfiK Zo\,^'otzfliU£tifz3>\f3-'-&(DMntJPWW£—M&M4hLZ& Rivet A7 7" A&7 >b:i-7 A7^ Al;fb7-;V£:;ihit" 6C b(c^c b ccDao^^^^A^^mf ^±T##b Li:(±TaB(Dcko^^co^&^o -AGC^TT" V 7" —i> a — 7(a^##C%6o c^t&giJWcf ^ AC ®M%ik j^C A^Mco^—7 /nf 6 C b^^f^AUC^—7cD^#^^c k, ^,7"—7 6^:ACA%#[#Ay^##&^-oC Rivet i> 7 ^ A T # SimOS[ll]CZ^)i><^D — S/3>aC(Di><3.D — i>3>CZ ^TiRJft Lfc7r“7®^:ilfb£^iSiftCfi: o c ±mUlMih(DM^M It (af —7CD##V>7, ##& Query, 7"^7D^T7hCDg^#f^, 7"—7CD#^lfb d>bJL-mT&mmaucm^f ^ltb§f^tl^o tztz I/, A ^ n. ]y — 7 3 y&fid fz A C jUfxRvfFd] A5® < , #C7: m#yy v^-i>3 >-c(a^m#cm^&6o (3) 7(Dft§8iz£Z>miMmM[5] luizULfc SUIF Explorer ^SMWbn 7/H 7£/<-7 t b fc^tlSMWb 7-;V T'fe^CDC*fbT, #^!Jib0 3 >;W ;i/7DD-feX^3- — ^b©^tlS$:M btlA^ 93>tyb^#o^^77^^;i/^m^&gmL^^CDbLl: GPE(Graphic Parallelization Environment)f% <£> %> 0 CCDi>7^A(a^-tFm#b3 >;W 7^#^$g^At)1±^X:ACD##:ib7-;i/^m X., :L-tn:j;^yD^7ACD^^Jb^-^cD^m%^.-if-e&6o GPE co^Tia^ vv^^^co CSRD(Center for Supercomputing Research and Development) t II $ tl fz Parafrase-2 ^^'Jfb3 7 [6][7]T& £ o CCD GPE ^7^A^(a3L-if k3 >/W 7 b0ATCDm^##(a^7 7^7 7 ftiSftJ&7°D 7*7 DTff t)tl%o a. 7777^77 777 77 7(a3L-4f^7D77 ACD^^J^fr^m^^Cb 67^CDCt, o b t))gU 777777&BRfBL%:o 777bbT(a7:^ ;h—7°, +f7;i/ —7"7, #*7D'y7CD - 78- (D 7"d77 ASa*ke«©3 Wfd 7jt^jt#A (Jedit) © 3 W\" d ;l/ffl 3 > 7 d 3r a. 
V — i/ 3 > k 3 >/fd A (Jedit+Parafrase-2) ® 7-n77A777 77 7k777 m##M#©##fb (HTGviz) @ 777©m*R, SiR Vfc ^77 CSS1"-53- h-gHA^x© OpenMP jgi,T*SA kl'-pfeAn^^A^A^dfe^iJ-fbOAi-^k^ (HTGviz) © —kl: j:6x JI6?|J3- h*kItfflfB 3 — h ©SiR >h'f73-FM^UffljSl (HTGviz+Parafrase-2) © 70^7 AUff (Jedit) 3- — -y* ti — l®Ai“ — >7'+J-d7/Vffl&itd 7;i/"Cj6?i)3— NIS7D7 rd ;k t5 k kCi b , /Afflitd k kAst S-6, GPE a Tcl/Tk k»-£ S. hVk'S I7llg[15]67'7 7 d **3 > h©%B^f h6k3>-’W7kffl'f 7^-7i-7tiotl'S. Tel ti, 77')7-ya>Affli tojA»=#$rgg(3't^7ctotC|§|+Sn7c77 V 7 h HlgT*& o t, |a]#Cf ©@#g©d > 7 7"') 7 GPE TbSiStoSn-E, k Tcl/Tk d > 7 7 U 7 A^ibi* ft, GPE t vvs >A^*T^E, PM b 3 WW 7&e#jbfe b , 3Wid7kffld>777'>3>4?T7o A%©3 >/i d 7S# (A*-j'#SJv^6r$) li Tcl/Tk d>^7,'J^7pe,MAE>J;7t;7d;oT*b, ±tffl77 7-f */l/3>#-d>Hi3>/(d b , ztii, 4« jE't'E>kkAs"C$E>0 Parafrase-2 li 7.2- 7 #f i§T $> o T 3 >Vi d —/!/ (+/-7/U —^>) T-Sgi$£ft"tVE>o U fc A5 o T Tcl/Tk 037>ftltS3>d'f7 riXSS'J^tcUff^-e-Ek kAs7SE)o -79- c. 36?iJ-fb3 77A 5 Parafrase-2 14 V — X to V — X ©#;I6$}66 fi17 S-lliatiSx Wi-f 7f siMtri3*$nfev -xxoxa Att±ta©F*3»*sjc*5t$ns ##©#aib, #ig $#, #XiHb»k'##tftSo c©iaX67c©A*m§;BitXtii*1"S„ chS*lt 5 fe© (C sights ©7 0 XPtyAkAX P7Dt7 76#3Xt\So 7 0 7 O t 7 7 i4e###6#m©mai#%i:$#u *x p xot v+n±-e©m@;m#&A*mmc $161"-So 81614 c k Fortran ©«b#7A — p ^ntl'SJ; 7 Tip 0, PDE "C*I4 Fortran CMLTCntC OpenMP ®A*[14]68txS 47 tC#IES;!j|];LTVSo n ww 7 ©|*| *@914-''! x 2:ud t> ©Afl-AiiT*; b, 3 WW ;PJMI4&7x 1C HSbfl-SSnrt'So en£7#1"SkTIB©47t:&So - JSMr : XnXAAtSSAefiSfgJK* - $# : AA7D77 AlCttl-SXnA'A A$# - ?77t 7X : 3>^H7i*iS7ny7A*lfflas - AV Vy A > 7 : A A 7 7 3 - p ©# A - IISJ3- p : $j$3— P©l$|g8+Sij - 3— P$fiK : X—77 P 3— P©4bK XXX£t£©77n — AI4*t#lHsg0'>>X 77XCStA C A> Fortran T'Zfch-tiL X X X 6AS k @ ICI4 natural boundary, 3 $ b Xt-fillP— X1467IP — AXPftFdi LXi#^D 7XAsiPS47&7>7;PXxXkttSo XXXX*77©X — PI4XXX tcliS V, 7 —XI4 2 3©y — p®SHf©)*1'Bc6 :fe^S1'So c fflUfi1 Stott# k!4, XXX©SUfESSS/EAS £>©TiPSo XXX»S!H43-+P»$lg^7T®$1"SC k #1^5. 
OpenMP ltA*A s#A£n/c Fortran V-X 7D 7 7 A14 A V X V 7 P 7 4 7X V [9][10]fftPtb L6^A£ Fortran 7P 77 A tC$#£flS „ C © 7 -4 77 U (43--—7 P ^;P©<6A—-0^7 P©X V 7 P 7 4 7*7 0 Tip So d. iTt ^ Jedit 14 A X X 7 7 XnJBt& X windows A AX hxr-f X T$i b, Tcl/Tk TSBizE A tl fc 7 7 0 7-'>3 >t7-f 77 V$lTiPSo Cft 14 Jstool[13] k»f.£7 7 7-7©-® k LTtg#t£ftTVSo SStptoXX'OPCSSIC OpenMP lgA*6#AT # S 4 7 &« (b6#3 Tla So X-^-*e.SALycV^}g^*6glRASe kte 4 b A AX P_t^© 1177# A^WtbT-feS, *fe, t3k%m@%mb#3>/W7©AX3>7'f A3i/-7 3>k$wx&So Jedit A>X —7zAX#3X7-f7t7 H:ff UT§Sl:3 >7 4 AAX SiiAnMSlA SA@k7X3©g|$4>A7>3 XSrHAASAfiSiStetttASo -80- e. f X 7 7 ^ HTGviz kmT###o<, HTG#;E^m^L, 3-K&HTG(D/-H:#J&2#ao - htg fawztm®?1 '-#2 3 4 5tmmy v—&*>%&{& ftmtirtzofaMit, nnm tfjcoMmk - xox^ A#^jfbaHTG HTGviz TBdlC^f S'OO^&t^f > —7ai-<7^i#X.Tl^o e ^%7^77#m m±0ckoC, GPE (±3>/W 7(:#T(±& <, t? L5^-if & D, >7-7 7^#x.Tv^o f x < z> bf&t)ti%>o [1] Shih-Wei Liao, Amer Diwan, Robert P. Bosch Jr., Anwar Ghuloum, Monica s. Lam SUIF Explorer: An Interactive and Interprocedural Parallelizes Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel programming(PPoPP), pp. 37-48, May 1999. [2] M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, M. Lam, Detecting coarse-grain parallelism using an interprocedural parallelizing compiler, proceedings of Supercomputing ’95, San Diego, CA, November 1995. [3] R. Wilson, R. French, C. Wilson, S. Amarasinghe, J. Anderson, S. Tjiang, S.-W. Liao, C.-W, Tseg, M. Hall, M. Lam, and J. Hennessy, SUIF: An infrastructure for research on parallelizing and optimizing compilers, ACM SIGPLAN Notices, 29(1994), pp. 31-37. [4] M. Weiser, Program slicing, IEEE Transactions on Software Engineering, 10(4), pp. 352-357, 1984. [5] C. R. Calidonna, M. Giordano and M. Mango Furnari, A Graphic Parallelizing - 81 - Environment for User-Compiler Interaction, Intern. Conf. On SUPERCOMPUTING, pp. 238-245, June 1999. [6] Polychronopoulos C. D ., Gyrkar M. B., Haghighat M.R., Lee C. L., Leung B. 
P., and Schouten D. A. The Structure of Parafrase-2: An Advanced Parallelizing Compiler for Parallel Computing, MIT Press (1990).
[7] Polychronopoulos C. D., Girkar M. B., Haghighat M. R., Lee C. L., Leung B. P., and Schouten D. A. Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors. Int. J. of High Speed Computing, 1, 1 (1989).
[8] Girkar M. and Polychronopoulos C. D. The Hierarchical Task Graph as a Universal Intermediate Representation. Int. J. Parallel Programming, 22 (1994), pp. 519-551.
[9] Polychronopoulos C. D. Nano-Threads: Compiler-Driven Multithreading. CSRD TR, Univ. of Illinois at Urbana-Champaign (Urbana, IL, 1993).
[10] Martorell X., Labarta J., Navarro N., and Ayguade E. A Library Implementation of the Nano-Threads Programming Model. In Proc. of the 2nd Intern. Euro-Par Conf. 2 (Lyon, France), pp. 644-649, August 1996.
[11] http://graphics.stanford.edu/projects/rivet/
[12] http://simos.stanford.edu/
[13] The Jstool Application suite and libraries, http://www1.shore.net/~js/jstools/
[14] OpenMP Organization, Fortran Language Specification, http://www.openmp.org/openmp/mpdocuments/
[15] Welch B. B. Practical Programming in Tcl/Tk. Second edition, Prentice Hall (1997).
[16] #r:c4v^- - w) 7 D 10 NEDO-PR-9809.

1.2.8 n WM 7 ©t$SBSMS©f$Wib[R] HjIpKcou-C (l)tlin > tfi — n > — (S)Zi±Z2 >;w 7%#g#@©4-#©& t> is Hon (i) Lf ©He#R0#SM&f^>f?-57D77A4v-Xt 6, kVd ITStt^bCfttt, i/Z=rh
(D X.t£ 0 ^HXb, S^V-y^S )Aib • itt'^d^bb^^fj© s¥f[fl] ^ — V — ^ ^)5 Si ^ ffl l ^ (b tlT I"* ^ o 7;i/^/r—;i/T7V^r ——^7D^7A : T ^ T 43 D , ^7 — ^ ;b ^ ^ y © ^ > ^ ^ U "T (i T y V y — ^ 3 > y U y 7 A a^m#ibLT#^o # ;i/77')^-'>3>7D^7A^^>f V- ^yny^A^LTffll^CI^ ^1# x. 3>;^ bT7U/r ——^yoy^A : 7Jl/77'J^-'>3>B>f7-^ ItlOl) £&yn t?y h-^7" uy7A©###^^^y©@#^^^f^o fc-c, 7;i/yy-;i/yyvy-y3 L, f 4t6y#y^cfff©yDy7 A^#h%L^ F77V y —y 3 >T&6o 3>;^FT7V^ — y y —;i/T y V y —y3>&f8^&<2:&, ©T y V y —y 3 >yu y^ ^^yyyA©y-ym# #T #&o -83- 7o7^ At: f 0H8^0&%&Mft: £TF • A^C«t5V-^3-^l0fl:§fii • AfCcJ:^V-X3-FA©3>/H7fV 1/^T^r 7#A0FfS^B . n >/w 7 777 3 -ae^fa# 7^/3 >m##m -7D7^ A#(D#m3 >;w ^7773 >#^0w^ - -7-77^7 -7nt v7#k -OS^# (7>7;i/3L-7/'7;i/7^L-7, ^-^7^7%^ '>Xf LTtj:, 7D77A0^ei#fm&m'a&®#$^#, a ;i/ —7vh^m^a^(D^&ao f 0#A0f±AW:, 0^^M^a 6 0, m i)0^^m^mt:##u7m^7a6 0, a^a#b##0#ma0ita u%m a (2) i!t#0^>fT-^7D^7 ho^m ::tli3 >SU ?m'Mffi$i(Dmmmte&^XRtt^tiZ> — ^7 0S#!l t: O ^ T a&B,H1~ a o a. Perfect Benchmarks [1-6] Perfect Benchmarks (PERFormance Evaluation for Cost-effective Transformations) (4u AH Illinois A#0 CSRD(Center for Supercomputing Research and Development) t^< ~3fr mmfe^&fz-DX\Z, ^7D7^ A^HfrUAc^0 CPU wall-clock SJ^nao f 0MA a&7D 7^ A0##/J#A##m^ 6 MFLOPS # ( millions of -84- 1.2.8-1 Perfect Benchmarks 7°3 y z? A 3- h*^ yy v y —y 3 y V —7 3- MtB ADM Pseudospectral Air Pollution 7252 ARC2D Two-dimensional Fluid Flow Solver 4650 BDNA A Molecular Dynamics Package for the Simulation of Nucleic 4843 Acids DYFESM Structural Dynamics Benchmark 8446 FL052 Transonic Inviscid Flow Past an Airfoil 2324 MDG Molecular Dynamics Program for the Simulation of Liquid 1430 Water MG3D Depth Migration Code 3455 OCEAN Two-dimensional Ocean Simulation 3198 QCD Quantum Chromodynamics 2816 SPEC?? 
Weather Simulation 4870 SPICE Circuit Simulation 18521 TRACK Missile Tracking 4271 TRFD A Kernel Simulating a Two-electron Integral Transformation 580 floating-point operations per second)cF ti %> o =g-7°n y'7 A£&£#££ c t^rn^t U A^K: ^'CDfljS0^ A §GA;£ tl A: lb SI jS (optimization diary) ;wyy^Dyy-4y(D#Ax V7 7A/3VXA^Mx 3 >;W7^^rl/^7"^ 7kL Fortran?? CD3 7 > hfr^: LT yoyy AlC#A^fl^4b(D T\ 46DD1/— y^fr x 46 ff th 7'/>V — -A > BT17 til Lx 7^-7 ## Perfect Benchmarks #^Jfb3 >/W CSRD T £ btoTc^yy-v-^-e &$>dx csrd Np#®E^#cz^3>/w7#^yd) -85- >^7 —7 7D?7At LT^fiJfHSivO'-SCkA11?, gttJfiJ'Jfbn >/W 7 k LTtejSt2Jfci>©©U'kok#;?. 6#lTl'5. b. SPEChpc96[7-9] SPEChpc96 ld\ SPEC/HPG(Standard Performance Evaluation Corporation / High Performance Group) (C J; o T 1994 #0 1 H 1.2.8-2 SPEChpc96 SSlitS'Of A 3 - PS T 7 V /r —3 > V —X3— Pff&% SPECseis96 Seismic activity simulation 20,000 SPECchem96 Molecular modeling 110,000 SPECclimate Weather modeling 50,000 ra AnnvpfiwmrsimzttMt &o c ©#-c 86,400[#](1 B)&BLfcfil£ SPEChpc96 ^ k f -5 „ C©m#l±^mf stxil-rv P 6S8IUTV'5 kSHSL^TuAA 7->7? PJ'fASailt kl:aSAsi£i>ST-$>-E)o ttlbSJStrS/roT^X^AfflttffiSk© «td l:iS$1"^,6atts 'Of?-?7D77 a * SI fi -f § # # a £r- * a k © J; o * tt * r* « s 17c a & * t * it ft a & e> & t ^ „ —7 7n A©*SI:&t cTK, PM®$fx7c®HrtT*3- P©SjSfbAsIP -+f/bs SmjSk VTA A?l: -86- *5. 
SPEC CPU (EiB) kH& t) $ fc, —(B): j) < , 7— U BLAS(basic linear algebra subprograms), LINPACK(linear equation solution), EISPACK(eigensystem solution), k V' o fc 7 '9' 41 7 tt small(_SM), medium(_MD), large(_LG), extra large(_XL)© 4 S$tlT jb‘ b , f9@©1M Xl:i6tTkffl SPEC/HPG -eii7'D7'7 AejiiaA^EfSnTtlb, ZEUS-MP(computational fluid dynamics code), PUPI(path integral monte carlo particle code), UHBD(linearized and non-linear Poisson-Blotzmann equation), UHGROMOS(parallel molecular dynamics), CCM3(atmospheric general circulation model) k V' o fz 7" D 7 7 A ): ilft5 o X V' 5 o $ fz SAS(shared address space)i£7ij©/l— V a SPEChpc96 ttil^-C-6 k*6, iE&ztP'b j£?U^©77-7 U V t©k#M7o77 A©#e#^<Rf ^ck): Z b, 3 7 0l$#EffFfll:fct6ffl k»5W^t*$-E.o c. SPEC CPU2000[10] SPEC CPU2000 tt, SPEC/OSG(The Standard Performance Evaluation Corporateion / Open Systems Group)): • 7-7 7-0^? 7): ut. 3 > £3.-7 —95©7 ‘n7'7 Ali SPEC CFP95 C^StVt tcfc t> © k Is) 577 9 7--> a 7 ©6©*ife?>A s, 3— Kl:$S6sSStvc*5 b £-ofz< @ UtitoT-li^ev'o SPEC CPU2000 li, 7D-fe 7"9\ 9, 3 WW 7 & * k •5 7:to):ft5>tlT©-5o ZCOfzbb, I/O 7^7)9-7, 77 7-f 7 77©tttg©l¥Iit: ktt-eS&Oo SPEC CPU2000 ©-x>^v-77D77 A tiuSBjRSttSEtofffiB ): Bb 3 CINT2000 - 87 - cFP2oooTS^^n^o CINT2000 It 12 077'J 7 —'>3 >7D^7 ATffllJ&Sn, CFP2000 |£ 14 077U 7 — ^/3 >7D7^ 1.2.8-3, # 1.2.8-4) 0 v —7 7° D77 A^a74b$%Ct)/cT)7t^ ^C5C SPEC CPU 2000©#^[^&6o c ft e> © 7 7° v ^-—'>3 >7d^7 a&&t©^j* utaiR^nfco •^“F>>x7i:0S Ci IT7D^7A0^“^ U U 7"7 - i/o • V 7 —^>7*^ 7* 7 7 >r 'y 77^@^^^o • 256MB 0 RAM ©ifBit-t'r, 7 7 y h° > 7"* UTfiKtf £o • spec 5%% ^§X. 
t£ V^o Of the 12 CINT2000 programs, 11 are written in C and the remaining one in C++. Of the 14 CFP2000 programs, six are written in Fortran 77, four in Fortran 90, and four in C. & ft ^<7 ft V — 7 7° D 7* 7 A C: ft fc D , CCD 26 70^7 A0H 17 070^7 A(i, SPEC Cft^%eftft/:77V7--7377'D7'7A##4r77^-7(#A $5,000)*^— 7 >h°:i-7'>7ftA©'f£fb^ Ltli, '>7fAM(^->77 7> F 7 4* A)^7/l/ —V©^o^^7, 6ft6/bA SPEC CPU 2000 t^C077fAiIi:7 IV — 7" y V © z: C) © # # # ^ $: fr U», ft©MM&ftft"fft none-rate(speed)##, rate (throughput) jy ti h LX^kM~t %>o none-rate 6 SPECint2000 k SPECfp2000(4:, ft^T©^7ftT-7©^e^^©< 6V'#<5uTf rate MS© SPECint_rate2000 t SPECfp_rate2000 &, & < CD ^ft, SPEC CPU2000 CD7;V-7'7h#j^(±, y>^/V7Dtvth, ##^VlVfty n-fe^+t, 7 777 77ft A©7;v—7°'y V ft 7 VUStf^ftT v^As, :hl ftiftftV:-7;Vft7'’Dft y-ftftftft AT'©7/V — 7°'y VftT ^ fzo luizfi© t& 0 , C0^v;i/f7D-t77^'7jlf7 7^r>7B CTEffl Ufc^©7;V — 7°'y V 0, A^iJ

Table 1.2.8-3 SPEC CINT2000 Benchmarks

Program    Description                        Language
gzip       Data compression utility           C
vpr        FPGA circuit placement and routing C
gcc        C compiler                         C
mcf        Minimum cost-flow network          C
crafty     Chess program                      C
parser     Natural Language processing        C
eon        Ray tracing                        C++
perlbmk    Perl Programming Language          C
gap        Computational group theory         C
vortex     Object-oriented database           C
bzip2      Data compression utility           C
twolf      Place and route simulator          C

Table 1.2.8-4 SPEC CFP2000 Benchmarks

Program    Description                                                  Language
wupwise    Quantum chromodynamics                                       Fortran 77
swim       Shallow water modeling                                       Fortran 77
mgrid      Multi-grid solver in 3D potential field                      Fortran 77
applu      Parabolic/elliptic partial differential equations            Fortran 77
mesa       3D graphics library                                          C
galgel     Fluid dynamics: analysis of oscillatory instability          Fortran 90
art        Neural network simulation: adaptive resonance theory         C
equake     Finite element simulation: earthquake modeling               C
facerec    Computer vision: recognizes faces                            Fortran 90
ammp       Computational chemistry                                      C
lucas      Number theory: primality testing                             Fortran 90
fma3d      Finite-element crash simulation                              Fortran 90
sixtrack   Particle accelerator model                                   Fortran 77
apsi       Solves problems regarding temperature, wind,                 Fortran 77
           distribution of pollutants

Itl^o M — base flST\ fflfelZ M fz o Tt£^ LT £ 1^3 WW 7 7° '> 3 > ti: 4 k L, t^tO^>fV“^7D^7Ai:^ltNbt7'>3 L&mm-e&ao oti^o #§Z.(± no-base(peak, aggressive compilation)^® Tn ca^yoy^ 3:9 kfu^m f CDs? U 77l/>/(V'>> (Sun Microsystems Ultra10 '>3> 300MHz SPARC 256MB ^ t 100 hbitIA© bbk SPEC CPU 2000 8 oa&& : SPECint2000, SPECint_base2000, SPECint_rate2000, SPECint_rate_base2000, SPECfp2000, SPECfp_base2000, SPECfp_rate2000, SPECfp_rate_base2000 o SPEC CPU tit, 5-Wf-i'a'A Ai-V;Hl/3 > Un. —7 fclAofc —

Table 1.2.8-5 NAS Parallel Benchmarks

Category            Code   Description
kernel              EP     Embarrassingly parallel
                    MG     3D Multigrid
                    CG     Conjugate gradient
                    FT     3-D FFT partial differential equation
                    IS     Integer sort
pseudo-application  LU     LU solver
                    SP     Pentadiagonal solver
                    BT     Block tridiagonal solver

7-*-7-fe 7 F C(4^©-y-'l'XOjtXCj; Class A. Class B, Class C, Class W#fl} BSXTV-Ex, NPB iiUtn(D7 7')'r-'>3>7D^7i>l:SftM!)> * <577 >J 7-7 3 7© eauemmacMf

e. PARKBENCH[13]
PARKBENCH (PARallel Kernels and BENCHmarks) ltt. a#%*#9Wa^777 -7©gg%6itok ft 1992 ^CISfiKSft/i PARKBENCH committee 1:4 -ffto&ft, 1993 4fG:*S$n*x7* 7-7tfc3<, iy®l4fl-$7 ^E V 7 7 7©tiSE itok ft, -~<7 77-7 7n77 A14 Fortran77 k PVAftaBjeE^ft-tU*:,, $ 4©n-7 3 >7? (4 MPI &miA6t0&Rm$ftT!A6.
Table 1.2.8-6 PARKBENCH

Category              Code      Description
low-level             TICK1     Timer resolution
                      TICK2     Timer value
                      RINF1     Basic Arithmetic Operations (R-infinity/N-half)
                      POLY1     Memory bottleneck (in-cache)
                      POLY2     Memory bottleneck (out-cache)
                      COMMS1    Communication (ping-pong)
                      COMMS2    Communication (message exchange)
                      COMMS3    Total saturation bandwidth
                      POLY3     Communication bottleneck
                      SYNCH1    Barrier synchronization rate
kernel                LU        Dense LU factorization with partial pivoting
                      MATMUL    Dense matrix multiply
                      QR        QR Decomposition
                      TRANS     Matrix Transpose
                      TRD       Matrix tridiagonalization
                      FT(NPB)   FFT
                      MG(NPB)   Multigrid
compact applications  PSTSWM    Parallel Spectral Transform Shallow Water Model
                      LU(NPB)   LU solver
                      SP(NPB)   Pentadiagonal solver
                      BT(NPB)   Block tridiagonal solver

PARKBENCH consists of 10 low-level benchmark programs that measure the basic performance of a system, 7 kernel benchmark programs (including FT and MG from NPB), and 4 compact applications (including LU, SP and BT from NPB-CFD). PARKBENCH also includes HPF compiler benchmarks: 10 kernel benchmarks that exercise HPF features such as forall and independent. PARKBENCH 14 7 7 * A ©ffifgff ffit: 3 8S©3 7 *7-7 7n 7"7 A &FBB f M a»0l:@#*m&m=45 3 k Ltu3kc5l:#m#&^,.
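The RINF1 entry in the low-level group of Table 1.2.8-6 is summarized by two numbers, R-infinity and N-half. The following is a minimal sketch, not PARKBENCH code, of how such a pair can be estimated from measured timings. It assumes the usual linear timing model t(n) = (n + n_half) / r_inf for an operation on a vector of length n. The function name fit_rinf_nhalf and the sample data are hypothetical and are introduced here only for illustration.

```python
def fit_rinf_nhalf(lengths, times):
    """Estimate (r_inf, n_half) from timings of a vector operation.

    Assumed model: t(n) = t0 + n / r_inf, with n_half = t0 * r_inf.
    lengths: vector lengths n; times: measured seconds for each n.
    Uses an ordinary least-squares straight-line fit.
    """
    m = len(lengths)
    if m < 2 or m != len(times):
        raise ValueError("need at least two (length, time) pairs")

    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    if sxx == 0:
        raise ValueError("vector lengths must not all be equal")
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))

    slope = sxy / sxx                  # seconds per element = 1 / r_inf
    if slope <= 0:
        raise ValueError("timings do not grow with vector length")
    t0 = mean_t - slope * mean_n       # startup time at n = 0

    r_inf = 1.0 / slope                # asymptotic rate (elements per second)
    n_half = t0 * r_inf                # length at which half of r_inf is reached
    return r_inf, n_half


if __name__ == "__main__":
    # Synthetic timings generated from the model itself (hypothetical values).
    ns = [10, 100, 1000, 10000]
    ts = [(n + 50) / 1.0e8 for n in ns]
    print(fit_rinf_nhalf(ns, ts))
```

Fitting over several vector lengths, rather than timing one long vector, keeps the startup cost (n_half) separate from the asymptotic rate (r_inf); on the synthetic data above the fit should recover the values used to generate it, up to rounding.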
3y;iy pyyby —ys ytc £c k* sa$ tv1, • SnaakUT©3yyW7©ttsgffffl> f 6kt3 y/W 7A#k UT©ttBtt;tt7;pyy —;uy7'b y —y a yyn7'7 AAsas t^-cfe57o —S, fl gij0 3y7U;ue«©ttBbffFfflJp»^^*ca>Ac7-c0#ffiy-y t UT©ttfgff*t v rii, ^©(Hsyft«©I¥fflt;«^s®7Ac*-^;p^3 yy y pyyj y-y 3 y tc j;6 cnii7;pyy-;vyy u y-y 3 yyoyyAtcza tttbfffflytts f ©K#&aa LTV'-&R#A)BBA^'C (c&6© < bt U * 7 DlSgtt* sfe 6 Ap?> T*fc 6 o @y©3 y/w 7S®6A;S < MtJi, 7y y (ctt#y6f$*fctt# UAiyfiWtc fl-®sh,6o vyytctt#uAcv>@#>©s*©ttggff -92- £tlX, V'> «t $$#7&:3— K(D#a UT^#fbUT C^l^C 5oo O ^yo>/W7 y^AfOf"? £*filh L^60CPM6f —f x ^>y v — y <£>||fl|gk:&fc-o't&—^ 0#N ^>y7-y^)o cmi^yoy^ At:(d:aT0 j:Oo ' SPEC CPU ^it U U — F“f £ & © £&oTt'& < 't&ft £&1'0 - ^(Dyoy^A^. L/Tx ^>f7-^yn^7A®^i4 • ]E=M##gg#) 6:H& - ^yoy^A#v —yny-yt^h.—M^#(zX#i#(yDy7A(Z#ff#^ p^yyK yyoy^±#aj:o^yDy7A^m^^(:(±^<®@m^?^^^^o yuy^ Aa uymi^}:(±mT(D #c,j;o%R0mA^A#^L^ ^yyyy-s/a Uft&fc b &i^o - ^#^'T^t)fiyi^yDy7A(±/<>^'7-yyDy7AaL/ymto-c$)^^, - - ^;M#yy v y —s/3 >yoy^ A0#A(±M#f^0y >yy>yny - f ^h0#f##T^ew#&ct o ^:yu y^ A&^t-y >yf < ##;% • $^yoy^ < (D#t##%r#ffW#bky ^ ^yyv —^ yn y^ AGDaBx&fc^ffl£*iTi' $>5oo #t:yot—y (D^1f#MC6#^J -93- mtmmvmMtLTiz, ^©iitfo^toufTii ^m^x^^cDtzi-f^u^i, m &&:/n-fe y+h£Erc:Slfi:t"£ c h £ £ b £t£tb#;W/fbf'£ ^ bv^o fc, 3 - pcDX/r-7 u Vf-J 0##-^, ^>c kt: j; #a^6T&50o &tz-D tit tlfz^ — \s®Wb^'o fctJScDffe^ vxyo&.o v < o . .%-ifkCM' fb y —;k & £' L C^i^cDACM LT t)+^B &lo|U"Tl^(J^U^^ G&tG [1] Lyle Kipp, Perfect Benchmarks Decumentation Suite 1, CSRD University of Illinois at Urbana-Champaign, 1993. [2] Lyle Kipp, Perfect Benchmarks Benchmarking and Optimization Guidelines Suite 1, CSRD University of Illinois at Urbana-Champaign, 1993. [3] M. Berry, et al, The PERFECT Club Benchmarks: Effective Performance Evaluation of Supercomputers, Int’l Journal of Supercomputer Applications, Vol.3, No. 3, p.p.5-40, Fall 1989. [4] George Cybenko, Lyle Kipp, Lynn Pointer and David Kuck, Supercomputer Performance Evaluaiton and the Perfect Benchmarks™, Proc. 
of ICS, Amsterdam, Netherlands, pp. 254-266, March 1990.
[5] William Blume and Rudolf Eigenmann, Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks™ Programs, IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 6, pp. 643-656, Nov. 1992.
[6] Rudolf Eigenmann, Jay Hoeflinger and David Padua, On the Automatic Parallelization of the Perfect Benchmarks, IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 1, pp. 5-23, Jan. 1997.
[7] Rudolf Eigenmann and Siamak Hassanzadeh, Benchmarking with Real Industrial Applications: The SPEC High-Performance Group, IEEE Computational Science and Engineering, Vol. 3, No. 1, pp. 18-23, Spring 1996.
[8] Rudolf Eigenmann, Greg Gaertner, Faisal Saied and Mark Straka, Performance Evaluation with Industrial Applications, Purdue Univ. School of ECE, High-Performance Lab, ECE-HPCLab-98211, Oct. 1998.
[9] SPEC High Performance Steering Committee, SPEC Run and Report Rules for SPEChpc Suite, http://www.spec.org/hpg/runrules.html, 1996.
[10] SPEC, readme1st.txt, runrules.txt, http://www.spec.org/osg/cpu2000/docs, 1999.
[11] David Bailey, et al., The NAS Parallel Benchmarks, International Journal of Supercomputer Applications, Vol. 5, No. 3, pp. 63-73, Fall 1991.
[12] David Bailey, et al., The NAS Parallel Benchmarks 2.0, NASA Technical Report NAS-95-020, NASA Ames Research Center, Moffett Field, CA, Dec. 1995.
[13] Roger Hockney and Michael Berry, Public International Benchmarks, PARKBENCH Committee Report-1, http://www.netlib.org/parkbench/, 1994.
[14] F. H. McMahon, The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory, Livermore, Calif., Dec. 1986.
[15] J. J. Dongarra, The LINPACK Benchmark: An Explanation, Supercomputing, Spring 1988, pp. 10-14.
[16] J. J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment, TR MCSRD 23, Argonne National Laboratory, March 1988.
[17] D. Bailey and J. Barton, The NAS Kernel Benchmarks Program, Technical Report 86711, NASA Ames Technical Memorandum, 1985.

1.2.9 Processor Trends: Stanford Hydra Chip Multiprocessor

As part of this survey we attended lectures by Kunle Olukotun of Stanford University on the Stanford Hydra Chip Multiprocessor Project.

Lecture title: The Stanford Hydra Chip Multiprocessor -Architecture, Implementation, Software-
Lecturer: Kunle Olukotun, Assistant Professor, Electrical Engineering and Computer Science, Stanford University
Dates: November 22, 1999: "Hydra architecture" and "Hydra Implementation"; November 24, 1999 (Wednesday): "Hydra Software"
Place: [illegible]

(1) "Hydra architecture"

1) Technology <-> Architecture
The lecture began with the technology trends behind the development of the Hydra chip multiprocessor. As process technology advances, the number of transistors that fit on a chip keeps growing and individual transistors keep getting faster. Cross-chip wires, however, do not speed up in the same way.
?7 U y ^7 JEtlfcSt &$)(if £ Z £: £s titi ifc ft <( ft D > cross-chip-wire & L ftM# C14: ^ /W X ^ X Wb cF tl fcvd' y£m^fttfftti:ft eftu0 z.til*7-*7 :-#?*fc.b'oXi*, iy-f ^ cornea 60 U J T 2) Exploiting Program Parallelism fflMlt, h ^ y¥7># lx, ra*'yvaftffi&WcDfztofc -96- ^7A0Mttl:iitS^ 7D^7AClt 1EEcD3£?|J'fbqJfe& LTU #6#w]//<;i/k LT, S*7"n y^u^;v #7'J%A LT^O 67lT £fz, & o ^'3#VMy UT, y-r ^ i/-i> s cN±;h-7°i//;W 7 iz £ D g b: l^iMb£ n7c 7" d 7"7 Abid; Df#7iia0 7°n-h7 ^ii^o dft &£iJ7<7)70P-h7 ^ iz o m%07-^7777v7>'r(j:, mmi£i%^&)i-7pu'ommmzM* lti^0 ti^(b&%-hK C7i^ 6 LT(i &^ b A^7 c k/co ^ 6, cross-chip-wire ti7MlW 31 £ ft> CtlU: ^ < (D U J r > '> £ t tZfrbtzo f^^i:U7n77A0iT0M§IJIU r^f£tb£ c 3) Hydra Approach Hydra 7° D 7 x 7 Y IZX& S U T 1A a tMteA 7 7 7")l7* 7 7° V)]/7“ 7° D -ir 7 it CD|£HfT'$>&o C©7-^T77t0S*^^7^r7[t WE<7> T^-fe 7 th£ ^7>6DT- 7 7°±(:##f ac :©7-^T^7ttit X 1/ 7 F i/^A^d'bfecD^iJffl^ w#a^D. #u77^ h^^L/:j;7^±T®7^;i/(:#ayp 7: ^ & R9M blt^ #^(7 7 7 M T°D-h7 7&M^T77 77 V 7D^777'J7“ 7 s 7±T#l^#a^c^)^(±, 7 7 7;h7n-h7T7 V 7 —7 a >^!Hbf 7&t>^ 3£7Mb£;h,fc7PD77 A£7h£U ##07°D-h 77X^6^%^^ ^^Jfr®II^&x££Jf6U #^Jib7P77 A0f^h%<&7P77 7 bbt^WH7bbt^ cn^Xcb D WSiC L/co Hydra ##tf 6±7r7c^ ^^0 3;oc uxf ao -97- 4) Outline Hydray-yyyy^#mewyao CCDT-yy 1: LT9^ b^ &o %#y y y b® HC'O^Tgmf 6o Hydra y both# -b^#(f^^i^o f UT^i^HC^^yyvy %yyyb(D##^^j^^j;bx i^o^oyoyyA, #i:#^^cDyDyyA-c^yt Hydra (DyDb^^yO^gf- immjBm Q:yn-teX tX\y y b®i#W±#&? A :i&}ll^U:, fn^#-77'J^-'>3>^^ff:^)©K Z o X* & i^Ofr b ID o Z Q=#^yy vy-y 3 >0^e(±wm^? 
A: [illegible]

5) The Base Hydra Design
This slide describes the base Hydra design. Four processors are integrated on a single chip. Each processor has its own L1 instruction cache and L1 data cache, and the processors share an L2 cache. The processors and the L2 cache are connected by a 64-bit write-through bus and a 256-bit read/replace bus. The chip also has an interface to main memory. Because the processors are tightly coupled on one chip, communication between them is fast.

6) Cache Hierarchy Details
This slide gives further details of the caches. Each processor has an instruction cache and a data cache. The L1 caches are 2-way set associative and write-through; write-through was chosen with the shared L2 cache in mind. The L1 caches are 8 Kbytes each, 16 Kbytes in total per processor. The L2 cache is shared by all processors and is 4-way set associative and write-back. In the evaluation the L2 capacity was varied from 128 Kbytes to 512 Kbytes. The cache line size is 32 bytes for the L2 cache and 16 bytes for the L1 caches. The write-through bus and the read/replace bus connect the L1 caches to the L2 cache.

7) Hydra vs. Superscalar
This slide presents simulation results comparing Hydra with a superscalar processor. Hydra, with four 2-way-issue processors, was compared with a 6-way-issue superscalar processor of about the same area. Integer and floating-point applications and multiprogrammed workloads were used. compress is one of the SPEC95 benchmark applications; applications of this kind have little thread-level parallelism. OLTP is a multithreaded application, and pmake is a multiprocess workload that runs several compilations at the same time. A factor of roughly 1.4 to 1.5 is quoted in the comparison [context illegible]. eqntott and m88ksim are also among the applications discussed.

8) Memory System Performance
yi7 0^#lH:oWT#%%f ^)o y^7Mf 6^0 7^7 V 7 h a L7, yi7 77t7#377V7 f/^^^LT L^VX h;i/^^7(C^^ aO'O C -99- Tomcatv XttAX©tiWW 10%£T@£o 7 X b M'X CD ddW WXft [SOiBg] Q:C ft Lxo©7 — ^7777 ©X D 7 Xt^X X;)/l±|s) L^? A:f o^o Q:7 >X7b7 v 7°7,n/7Xn dr 'y +HI&, ct D iE;i&X n 7 X £{£^/L^©xy:? A:f©7ft0^o #m^XD7XW#^^^7Ck^X^6o ^[m(±f7#bb#©B^X& £ © X Idl l^ X D 7 X lz — b X |+$lj L AL Hydra &7>TXl/&7^-:)r7X77fo©X\ 7 0#^C##W#X&^o Q:X—;^X^7 77>©jp7 7 7^.'7d' XW:? A:S*^JtI, 1 7° D dr 7 +h © 4 fgCD A-7 7 '> jlIM XX:o###(± ASPLOS96 ^The case for a Single Chip Multiprocessorj LT^J L^o Q:Hydra © 7°U dr 7 it 7 7fdU 2-way X—ft —X 7 7&©£>? A:Hydra ©#^#(d:, #$(:##^77>X(±#Lt^m7©X, 7>X;i/^7n.-77 4, o, sa ^U&^C5T#&&CD#4;&^o #^if^24bL 9) Problem: Paralleled Software jfe^ij V 7 b X^7$: Z 6/:&/)©#)# #?UV7 b X^7^#ft^C k^LX^ftm^7X V X-7 3 >^'Jfb76©(d:@#X&6 c 6:^^^Xft6o ##!:$ < ©f^#©*lz, 7 7°U X*-7 3 >£d£?iHb7S c £&&&%&& Lft&fto $M(:4b SUIF ©d:7 T:i^J^^7>;W7X, 0f7?U FORTRAN 77VX-73>©g#m^iHb#m^^^4b Lft^fto mecDsuiF&mw%:e&7j;7%\ v-x&gm#^ifb76 77d-7(dx C 7°n X"7 AX>&T L& 7 £ < ft < t>OXti:&fto 7 7° V X* —'> 3 >(DML&]ik(Dfzt>b\Z\&' X lz 7 b LX1EE©X lz 7 P##7 LX C, C 7 7° V X" —7 3 >X&> 7 7 >±X^zftft^X 7 0#^©{ii@#f L&U"ft(3& 6%fto C7XUX-73 7&^^Jfb7^/:Al:(±, #LXfta©&t# L&{tftte! & L&fto :ti^d >7 0flfiKb©f^Eh V>7o f LX, >7^mLxv^#mf^MT,(t^c wmn-p^xL 7 p^i:(± £t?CLXXlz7p©Xb7&JEL - 100- ? kT-sg@k&-2>©lik *7 >7iflfi|-fb;t>sS8lsgftti:i6L©kvdk kT$>3 = k©#% ©emiiS < 'nttt>tix\^ti\ C77'J'r->3 >-e±^ ^TlcttSot^ k@t>n-5o L* LUBMTtiiFUi Bfa© C 7D 75 AftigiRTS 3 $7ftSoTti:V>&V' 10) Solution: Data Speculation *1:, Itltta^WotSfeckb, f ftft jL6R#©gk#ft-3t\T%E^<6o k © j:3»IBBftWftR#*@Rk LT, A'-XtSitlftJ; b, 7D?7A©T-5lft#SICt1'l:77lJ 7-7 a>%mmbt 5Cki!tS -5 7n 77 AS3ft3-iJ-fb LTUff L£ k LT S, HfiOSftftfcft R n- h* kx iri'Ttffl/Dyjioiifbta k k&x atzfrt,-? 
$>%o ctill 7n77A6ifi?mfi:LTi'-E>|iST'&x JELt'iK*A^#7\-M7j:7ft Z bffitiE$ft£ kt'b k kSjttt'T-So % 1/ '7 MBftf-7##A!#$ LT &x IE LV'51 fT$S$As)#e>ft£o kftCJ; bx 77 b -y-y a >©@#b&ak#T6 oI#@©& & kk 5 tkfci)-i9ai4xnniij:<- d $ Ox iEmg©t©T% <##©6©^itfti@im JtftftbfJft'o kiltMbx 7=-7%##»M-ft«x 7n77 A£JEL < ?£ ?>-B--5 fe toft Hiss Aft e>&v>0 <©J;d7^-9-^t- b* stil5Rfttf;i/-7ilfiM-fbttE#7afe-50 36?iHbft ^$;i/-76SS ft-5 7clftC =t < x 11/-77 7 l/-->a >IBT@kRBl#b#©&6T-7##&mftft6 A Bliftt'o 7\-FbirisiitSfrbtfeS. LA^Ux 7n 75 A©36?iHb^(tASA- ##©#x.m±x ES©;E#:7n75 a&xl7 ecesiLx t ft 5,©X V 7 h'toJESfissHftTVRBlbx 7-7##&@cTf©77b 7-7 3 >6 3£?!lSlff AR k k* sT-$£ k^o t>®7$>£,, -o$bkft6©7V7 #x 6 k0 7D77 A© - 101 - Mt Hydra T'idU ^ — M'/\ y H (D'b £ V^^SUff <£>+*-# — h 'fcttlM't 3 o :®7-Jrf^ h "CU\ v 77ji/f7n-l! f C^^'J l/^;i/TCDiIS& nJIE ^t*£ S ©£§!$£: t* 3 o 11) Data Speculation Requirements I f'—& Hydra & £ &lv\ — K *> ^ T £01^ ;i/—LTt^ LZoo f i/ —s> 3 o, -f ^ 1/ —3 > i (ZWZd' ^ l/ —i/ 3 > i+1 ^ UT d'^l/ —^3>i^^(j-^) d'^ l/-i>3 > i+1 4\ c^t6(Dco(Z);i/ —yd'^l/ —i>3>^, c 12) Data Speculation Requirements II 1/ —'> 3 yj^mKDd' ^b-^3>(:i)itl) X CDS^iM J; D L$ ^LT, Mb7x:T(±, -hf 7l/-yd'^l/-i>3> 26<#&6o ^^(DtK^&^tacaasT^&W&AWC, T&60 %66t:C(Dd'^l/ —i>3>0^fT^^%L/:af^o CKDd'^ 1/ —y3>i+l(Z)^^^Ti>>(D^^^#^LT3 - 102 - AM:### 3 a 4 13) Data Speculation Requirements III b 'y i+i & ^ z. t& i~f til cttn ytzi^o v /b^ u & ^ B x 'it? y —y b y i+i & x # m §■ iA -f ^ U-S>3 > i+i ffl £> & V/bC Ol/-y3>iif)^litli^^j:i^o ^ 6x 'i $ ^#%AM C#^A"C(3:, #^^(^^^>±1: X(d:co##&^oTi^ &o —S/3>i#4b#, —S/3>i+l#4b#T$)^o eft ILP rn-fe y V"VO v V7,9 U ^ y?\z tx & «t < MT&0 x ^ t V V f UT^^5/Cx C#7Dtv"^^W^l/-y3>i+2#^^^%A66^C^, -S/3> i+1 X f#X:A6#&oZ:7'Dt i/ —i>3 ^ &IE#UTl^U^Ud!^ Q:HydraTW:x A:Hydra^W:f:-^##mMmf^o###^(±, 4b^ 14) Hydra Speculation Support a##&. Hydra b %fi\ IMTCj^^o x CPU C^AD^ftx ¥k$k 7s y "J b £ rfrJ MtZ>o LI ■*-*? 
Extra bits are added to the L1 cache tags so that a data dependency violation can be detected when it occurs. Buffers are added at the L2 cache to hold speculative writes. The details of how the speculative state is kept correct are given in the later session on the implementation. In addition to this hardware, software also supports speculative execution; this is explained in the software session.

[Q&A]
Q: How large are the write buffers?
A: The size of the write buffers is explained later.
Q: Are processors assigned dynamically?
A: [illegible]
Q: [question comparing with the Multiscalar processor; illegible]
A: [illegible]

15) Speculative Reads
This figure shows how a speculative read works. Iterations i-2 to i+1 are assigned to the four CPUs. Suppose the CPU executing iteration i misses in its L1 cache. The read is then looked up in the L2 cache and in the write buffers of the CPUs executing the less speculative iterations i-2 and i-1, which hold the data those iterations have written so far. The most recent version of the data is returned to the requesting processor.

[Q&A]
Q: [question about the software handlers; illegible]
A: [illegible]

16) Speculative Writes
This figure shows how a speculative write works. Again, iterations i-2 to i+1 are assigned to the four CPUs. A speculative write is placed in the write buffer, and the writing CPU also sends the data out on the write bus. CPUs executing earlier iterations are not affected by the write. If a CPU executing a later iteration has already read the written location, however, a Read-After-Write hazard arises.
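The speculative read and write behaviour described in 15) and 16) can be sketched as a small software model. This is only an illustration under stated assumptions, not the Hydra hardware: the class names `SpecThread` and `SpecMemory`, the per-thread `read_set`, and the use of dictionaries for buffers are all hypothetical, and the violation check is conservative (any earlier-iteration write to an address a later iteration has already read is flagged).

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    """One loop iteration running speculatively on one CPU (hypothetical model)."""
    iteration: int
    write_buffer: dict = field(default_factory=dict)  # addr -> speculatively written value
    read_set: set = field(default_factory=set)        # addrs read before any local write
    violated: bool = False


class SpecMemory:
    """Committed L2 state plus one write buffer per speculative iteration."""

    def __init__(self, l2, iterations):
        self.l2 = dict(l2)
        self.threads = {i: SpecThread(i) for i in iterations}

    def read(self, it, addr):
        t = self.threads[it]
        if addr in t.write_buffer:        # the thread's own write wins
            return t.write_buffer[addr]
        t.read_set.add(addr)              # remember the exposed read
        # Search less speculative iterations, most recent first.
        for j in sorted((k for k in self.threads if k < it), reverse=True):
            buf = self.threads[j].write_buffer
            if addr in buf:
                return buf[addr]
        return self.l2.get(addr, 0)       # fall back to committed state

    def write(self, it, addr, value):
        self.threads[it].write_buffer[addr] = value
        # "Broadcast" the write: later iterations that already read addr
        # consumed stale data, which is a Read-After-Write violation.
        for j, t in self.threads.items():
            if j > it and addr in t.read_set:
                t.violated = True
```

In this model, an iteration marked `violated` would have to be restarted; reads that come after the earlier iteration's write simply pick up the forwarded value and are not flagged.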
^ Read-After-Write C ® 7 7 7 b eu^ufUd;^6^^o ^efa cpu ca^-c^ao a cpu®#^3&^(±, &/2%m#i-?&-3-e(±&6&Uo ##^® cpu 3^® Li ^^77 ^^igELTjo 17) Speculation Runtime System 777 b®##^ff®$iJ#(d:77b7^T^ff^9o 77b7^T w\ f^T®^#7i/7b®j##&;a^u. ^®777b^^cc&a^^i^^##& ^#f a« Q:i#im:###e2fia7i/';/ yut vifm-i^? A:f®kj60^o ^oaX:<^/u®71/7M®^fT^#x.aca'6-e^a^, fflkL f 0 # < ®;\— b 7 iTaQ 1 7°DU 7tb±T®##&®^#7 7 7 b'®HU§: tbf-bfa;\-b7^7(±M(:^^7L^tu 4'®hcafji(±^W#^o M^0%#7 u 7 b®^fr^#;^7 c®f 7^# A(D,##&®%#7 1/ 7 b®^ff b f a#3'J^;\- b 7^7®#3Ml:N U%(±, j6U'Tj^ 77 b7^T(±, 777 b®###e^7-7##mM^^®%#®w^^$«j#f ao - 105 - Ll A"V7i7^PF©7d'>©##fb 7-f b^y770^iOii^fr^^ o 3 > t° :l - 7 7 7^ A©E£St(d7 F ^x/t'IitliC ^V7b 7 x7T'Ilt 3 d to F 1/— F^7 & 6^17 7 7 F 7 x7T'©#tJI(d:^iJ 7—/\7 F&^l^^d L, — F>>x70#iJDtt$)^)^Stf0t“^“^^ F& 77F7^TT©#@1:(±, ^0©###^^#©##$^:^ 3 Xz 7 V 7 F^fe^o rHydra softwarej T\ 7 7 F 7 ^ 7©#J####^I# (Z:#j#fb ^ 7 7 F7^77©^#^Fr©$lJ#©###ld:\ ASPLOS'98 rData Speculation Support for a Chip Multiprocessor j T\ #^Ufb© @#^##7 7° V 7 — 7 3 7 © 7 7 F 7 ^ 7 (C cfc £ t?fc tl§ [rO-t li „ ICS'99 r Improving the Performance of Speculative Parallel Applications on the Hydra CMPj & o 18) Creating Speculative Threads £ £ £7T\ 7 7 7 F©F£$£flfT Ff^7N — F7xTh7>77 A7 7 F 7 ^ LXzo #7:, 7>yH7i:oi7ilt^o #7W:, 77 7 F##^e7-A-7 7 7^^y^- FT^3>;W ei:ML7e^7 7r$)5 7#mfbl:7)tM:#^/zo 77 V 7-7 3 7©##7 7 7 Ft7 ;v^fttTovmdcStt Hydra 777A^(d:, 73©^#^777F^mi^o ##8^;)/— for ^ while —3©##7 7 7 F F: —3»©7 7 7 — 7 s^&miv^T&o d^u::, ^^iem#© Hydra 7-dr7777'Tfj&7k:(d\ #m#©3-F^y-7##mM©#m$:^^7 WI^T'o ^^^^^#©3—F&777 FkL7^#^fT^fj^3o Hydra 7© 3 >A^ 7 © f± $ (d: ^ for & pfor> while £ pwhile h l'' z> /z <£ 7 &n /7 — 7^^#8g;F —7©7 —^"77 F V 7 — 73— F&^J5Kf £ Cl tX$) % o X LX ^ source to source ^j^7°D 77 A §:^ff L> pfor -7 pwhile #^ffm©7>^d'Ai777'A©ie^mw:^Mf^o ^-y^v##©#^ 7^77^©D—77/)F^^c()^yD—dtt(±, #7©777 F#§#§#©7f 7 7&^hKf F 7 J:ya7l/ —7"jpif V##©$)^)^#(Hc:^L7y— 6o Q:#^iJ;h-y©#A(d:, ###m©##ihT;&&&? A:7! 
%#7 7 7 F©#^am$©7 7 7 F©^^(±|q|^lC:fT^7^o - 106 - A:f CD^#o D /:o 3 1/ 7 FA^)7 A ^(d:#M: LTV^ f 6C Qifft'TkL ccD% F;^777^^#^(D7N±^l^? A:#^x 7^F;^777(D±#^(±X 7^F/^777^#ft^^W:x 7l/7FCD^e^#±U, fCDCPU^^yFyoty ■9-H^^^T'^Oo ^o&(±/\y F7"D t 7 tH:: & & CD T\ 7 7 F U 7 Zlt&Z £>&l^0 ccDB#x r/\y H7D-t^^li^-5I^Dj '>Xx An—;i/^^ff'f^o 19) Base Speculative Thread Performance ^#7 1/7 F^%7"A±T0^##mcDM^^|gl:^'fo yy U/r-i>3 >CD-gPl!(± ^< A#:&V —f 0# gcc 2.7.2 T# 7^3:^ F^^^:L/=o gcc CD# )@fb^yi>H>l± 02 'T&&o f UTX 4 0CD^>y;F^^^-yDt^tFF0Hydra 7 —^7^77. #Z^##^fr&3>FD—A/T&7>f^Ai>;%^A&, S/^a.1/ — # m bfffffl^fr^^feo *MW±±X(DV 7 F y oir4o «fcU0\— F y o:y^#0Ci/ l/- F LT^6o 4^0CD, compress CD «k 5&7 7 0 ^ — '>3 ^ 1.5 fgOjiJ^roJi:^^ 6^-CV^o mpeg2 (D£olZs &-d t&tfeifi&T §& o&yyvy-s/accDdio^yyvy-z/3 \Z~Dl/^X W:# Odi^^5 o 7 7 7 # 0 7 7° V 7— S/ 3 > cholesky x earx simplex^ sparsel.3 ^6W:, ca%6(Z)y7'vy-i>3>u\ %#yi/7Fty;F cfi6cDf^#(j:x -ACD#±m@cD#^^ 2igfmi! 7;i/-7 7 V 7 F fb(i^i!j(i~^ £} % source-to-source 3>;H 7^l)t:^ TDS tsCD <£ 7 fc source-to-source 3 >/W SbfbT' & x if ^ h X' & alibi! CDt)x ^^AgicccDyoy^ ^ >yty;H±m##7!&&&6/:o 20) Optimizing Parallel Performance 1/ 7 FtT0l/&^ L^z^x 7 7" V 7 —z/ 3 1/ 7 F77 V ##8g#^jfbyDy7Ai:^^Tem^±f - 107- •O^T#x£©*5jMT&3o £©T — 7"A0D—^7 -5A# C0D-* V 7^-f Mf^SM/bL T7°u 7*-'>3 >(7M£tEd£S;Hf&5±t: SPLASH 77U^T“y3>[:ilU^^il C©IiftL D-^j^ax# ft5 A--7 'y'>n-^ 7 £$5^1" 3 tz&lZftf>lX %tco Ztl^(DT7V7 — '>3>(D^r\ MemSpy (iv'^7.1/ — 'y b >^#T7r ## Ln $ tz Flashpoint (i Stanford (D Flash V ;V^;TD-b y th-eifrfEt~3o 1/ v Fmf a fLfxtill #^J7 1/ '7 FREt:###&i^^o{%mE&f L^ m bti'tei^o Zotzfo, #?#fb C ^AW\ ?—7tk&'M)z. 
21) Feedback and Code Transformations i> < :L FC7 JITICW:, &&&<, Cti^0^^-XAtlL x — Lfcn — Y^7 h7^©7n 6 &7rl:, 7 U y Y'kbOU^Wii btzfr)b^^tz%(D'&^t\t Z> o c ft £> * e> > btiti%^%t ZSkfeXfo Ds cft60##^6%#m^6ck(d:$^o z>tK mmonxoLM^um^oi&r^m^m^ m^o cft(d:&^@(D7yU^ —^3>"r, C6D^i&(±, yd'^l/ —7'yyf6 ckolc^^^LvF&^^f^k^oTd'f'TC^^^Ti^o /=^x.UL ;h—yd'^L ■—‘ is 3 >(D9tt.11-n— Fi^i^fe ^ C0 D— F^^* s;v —7°^ 7V — i/s >(DMf%(D7 F T#^C## lti^l^^;i/“7^ ^71/ — '> 3 >0Hff (d:^-—y^' —7 y 7°7: CCD^O^#^^, D-F3:T(f. 7FT^mH:^(f6cai:j:D, ed'^L-i/ —^ LW - 108 - 22) Optimized speculative Performance mpeg2 V## ■o^rm v* o 7"o 7 7©##Cj;0 ;s —^ 7 7°W#^#^r Lx J£tllZ£-DT^lMt£^7 ^ — V >XO|p]±^fTl^o mSSksim &m CfS6 ”£^7 *“ V >7#[r1± Itl^o compress ~£# |i|J8I £ ^ £ £ t IZ £ o X %'}><& A 7 ;* Q:#%l:M/:*7 7 ac©*7 7'£##^m7 d:7^^, UTV^©^? A:Xf\3l©Mf;o C©7"y 7-£#7 -f — P^'7 7 j^#gi&'t*fT ofco tu©^^7T'(ir ;Vd'U VT$>So mpeg2 £j3lDTld\ SuS-dTh- F £H,T £ C TtiJtl *>& o - £ifi&*z>o &mt v ^c^©7;i/^ VXAg#:©U7 k XXf-V U >*#, 4^-i7©^^a LT#j%TL^o y;i/3 vXAf ©6©© hX7^7 v >x#e^7-ci^ i^o ^^iHbf^ca©#AaLT, y;k3VXA©V7k7f^v>^# cccinti^o Q:compress £ j3# g fj^Mb© t £BT U % ft? A:compress £fcl'o 23) Hydra Prototype Hydra y ^ f A © 7 n k 7 7° $: f± _k (j1' /r 0 7 D T 7° 7 7 & |g ^ f" o Integrated Device Technology(IDT)#A\ iB- 7 © 7° D -te 7 Ik © Verilog £: j/E#£ D T < ft o ##t#& #IH&&-32:©'T, T'D-k7tN:#i^2fta7tV77^At:'O^Tm id q ££~£m ^^©7D-t'^©^;e1J3>l'D-7l:itf1U;o c©7tV7>kD -7#, ffe©7°D-fe yT*©/^ U 7 > k n-77\ U — Kj3«fctf7>f #.*©XDt 7ik#, 7 7:1 j3Z ^7^—7^* 7 7JL^j^Oo 7^0. 
— 7* — 7 3 > # £ © L2 'yyi§ILT^$tl^)o 7 -f ky( 7 7 y #g © £ jo 0 ^EE^ft^o c©«k o C, f - y 7° • # 7 • y;v^7 0n-fe 7-7^7- y 7°±^^©^ ft£0 24) Chip Design Road Map Verilog imi%®#S#kk77T7k#^^ #*t:3&Tf&o 7-XT7k#^^#^^?^DZV^o 25) Conclusions Hydra # 7 7 701/ -7 7 7° 7* 11/ 7 7° D 7 y 1k©§f D £ i/i^ 7 £ o wide-issue x-^-X77©^^^m76^^7^vi/a/^j^^^|#m770 4btrU6, #*#71/7 -109- f c-c, c# CMP y3>c^Lt, &&, 26) Hydra Team Monica Lam ItWMOX V v L%#^ff(D77 T7^-^#lI#T/co Single-Chip (D% ^ Hydra OBtSf HHt* & LIT <£> web page UT^UL http://www-hydra.stanford.edu/ iwmmm Q: 7° D h 7 7 7° II Rambus 7 >^-7x-^|iHU;0i(p? A:l^o Q:f ^TT(±. 7 7 >7t V®;i> 1:^6 60^? A:64bit x 100 MHz tie Q:7 t 0 01/7T>i>(d:^0^j^^? A.-ffilill&AcIS Rambua 0) & -c> & (7) £s$> 6 D^o Q:£'E&7^ U (DJ^y IttLtzOW A:#^LX:o LfrLs 7 ^ U 7 7 T 7 0^ l> 7 7° 0 7 — '> 3 > TT¥I0 L L T — Mi Q:f VT4 A:Object /So 1 17 7 7 T f*] T fr & 17 tl ti! £> l't o ^“Fl)x7JpV7 f^x70 A:(Hydra'r(d:)m##^jfb'r#^jfbhrm^;k-y& - 110- Q;C0d:7^%#^e0^7z:XA(d:, 7 7m$:i#^LT L^7 0^ T"D7770#B#t'T#&7V7Y7;i/&g|572:&:50(dA jp^77^T7 77T&D. yot 7 7 0 7 D 7 7 W##((C^#&7x. & V C X., marginal 7dz 7 '>xl 7 7 'sik%M't %> OH5&0 7 7 Md^^o L^ L, T 777 1/ 7 7 > > (d: marginal ^ 7 7 > xl N LI ^P 7 7>xL0T77777 A 1C # x. 6 tlTl^o L2 ^P-P 7> ^.\ZWit^>T 777{dU LI dpy 7 > xl f 3 7 7 7 7 tt~~ n — yy-7t% o ^cODtztb, S6to&^“yS-/\ 7 b0t|,bntdA b 7 ^7(d: ^04§m<&? A:L1 ^P7 7 tffi.%. LTL^o jSSHfrB^S §•&^#C2cW:#W'T#ao Q.o 7°D7 7 Did;? A:#^^1C#3 7°D7 7 7*7' id: & < N 7 U y LUffO/x— b 7xn7lz^j"7^7 >7 —7 77!&ao ^CD^cA, 37D77tb#A#:i:A^^#!l^(d:##l:/L^^o MIPS^(±, 7 7°D 7 7# 0 & MMU ^\07 >^-7x-X^ 3 7 D 7 7 # 1 /S#2fiT^:5o #7(d:, 3737772^71/7 b0^##fr;\“b7^7^07 >7-7 ai-7 ^ L%#m Lfeo Q:C0^#t^$+(d:7-;i-7^7 ^fm| L7:&^ A:ILP 73777(^^^077 h^y^-y^e0^^C;^777^miX Hydra X!(d: 7 1/7 F07 7 b :t7;t“ 7HH 0fcA6i:7Y hD'77 7^fflLTL'50t\ f 0## T(d:mm^^^mm(±lEL^o L^L. 
ytVT'yyAC^f^y^b^y^-T'^ff 7!(±, ILP 737 77W53 J; > ys —^\7 bid;:£ L&Vo ILP 73 7 7 7x!(d> V ;t —7VS 7 7 y #CD 1^4^77 7 < 7 7 77 vXL — u >7"$: 1 77 7 ;bcpT'ff &t> ^(tfUd:^ & &(Ao Hydra 'Tii, 1 77 7 ;i/7'fr & t> & it fafd!& ^ &l^MJi(d:& < ^ M S0yS7 7°77 Wb/b 5?iffl7££o m'#?m&9l5mL&^#fr(d\ ILP 707 77 7^x./(7 v V$“< y;i/773777y77A77^ %Z£\Z' b 7^7 7 7 > b7o ILP 73 7 7 77& qjfb7&3 0 #00)73 77 A7 7 >7#, ^±)k^^7d > b7±#[m^7“4P7 7 7-7 x.6o C0m3M#&7mi7Y>b7'T(d\ #707377A77>7^73777^ #L, e707d>b7(±^ei:^^^J^imposef^cai:^^o ^0/:^, ^7:% 7d > b70#m^C(d:77 V V$-< >7#<&^a&&o 7;i/77i/ - Ill - y H ©y — ^ t~ 7 y d71! & > wide-issue T' fr & £) T l't 6 J; d) & 7 dr V U ^ > 7* Q:7 7-7tf VydC^f6#R0o Hydra 6D7 1/ V b##^e(d:Ll b y;b-#^C##LT^6^\ ^yb7;b-#^(d:37b^±#Wo Hydra 7fU\ % % 4 yo-k f hm±0 7Dt'7ymf 6ma^3&6&o 8 yot vyw#^a#x.^,o 77-7 if •Jf d &Mt Jfteu J: l) a- F1) x-T-fr'j&W: bt£%>0 A7 Fyxy&fl-jfjn f6^, yi/vKaL##ff(Dyd'yy^y^i/^hVjpy^y u^u, isyoty f^6^lS^f/:(d"CD%#7 1/v M±f#6fl&^/=50o Q:4 7U dr y Aiyoy^Ac^^o ##j^®yyvy-i/3XDm^UT, a. Mfc.&mft&ft'ObCD b. t'(DU^^lX b&&\itWML^ lb Q:f yoy^ A$r^(D A:7yj^-'>3>i:ltu> ##j;o&M;5&'T#&o a. b. &£A;£'(DT7V'r — '>3>{Z&\,^X, bfc &(D&&WIZ& 77b^777^m^y# \zx V y b &ftW\t %o ^l/'^ b^O 637 v Mcoy^x&^t^o yi/ 7 6^, 7l/vb(D#a^m#(D^-;i-^^M^m7, 7 1/vb^^^j@^6k. 77 b/i'7 77##fi6o :lx^©:^3>;t0^y D7’77^fllt^ 11*7 1/ u&L, $^®yyvy —y3>T##u/:^^, ;i/—yi//< Q:y;i/f-f"yy(Dy7yA&^6^(±^of66D^? 7tVy77A(±^fo^6(D ^? A:^^^AC(i#7TV^^o f&T, fd 1/7 b U^d77y^^m^T7 1/7 b^#^e &y#“ b b D :b(tSuy$h&/tgx&V^ —DCDfi&X (±&6^6oo yvy^07y-yifvyd^^(Z), #-yvyi:^f ®t^yb s$)6CDT\ 7 Xibfc o l^T&^fx. fcl^o - 112 - (2) "Hydra Implementation 1) HE3E (Outline) Hydra XC##&XDt7it^^#L^ At:o^T(CMP o Hydra CD 7 ^ U 7— tt 7 t ^ lZ~DC^X j&^SLt o ^ W:, Hydra vWtf- F t % PC;a:T(D#A^6, 7>^-rAXX^A^ ^mtC^fGLX^a Hydra OtU b 7° lZ & Mtl % D 7: to 2) #Ak CMP t£ 3) #i% CMP & - /i> FH(Why a CMP? Bandwidth) CMP Xti:X nx-^ 7 XCD$to;b5&u fc#x A> F ti©^X0 £ Xo read t write (DAT. 
Providing separate read and write buses in this way gives ample bandwidth. The buses can also be pipelined to raise their speed, and the bus protocol can be kept simple.

4) Why a CMP? Coherence
Because all the processors are on one chip, cache coherence can be maintained with a simple protocol. Data written by every processor appears on the shared write-through bus, where the other processors can observe it. For example, a simple write-through protocol can be used.

5) The Base Hydra Design
Hydra has the following organization. There are four processors. Each processor has an L1 instruction cache and an L1 data cache, and the L2 cache is shared. The four processors are connected by the write-through bus, and the chip also has a memory interface and an I/O interface.

6) Memory Controllers
Memory accesses are controlled by controllers implemented as state machines. The memory controller in each processor handles that processor's accesses to the L2 cache and is implemented as two state machines, readM and writeM. The main-memory interface controller and the controller that moves data from the L2 cache to the processors are likewise implemented as state machines.

7) State Machine Design
Each state machine operates as follows. Access requests wait in a FIFO. When a state machine needs a shared resource, it asks the CRA (Central Resource Arbiter) for permission, and once permission is granted the memory access proceeds through the pipeline stage by stage.

8) The Central Resource Arbiter
This slide explains the CRA. The state machines (SMs) described above go through the CRA whenever they access a shared resource.
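The simple write-through coherence scheme mentioned in 4) can be illustrated with a toy model. This is a sketch under assumptions, not the Hydra implementation: the class name `WriteThroughSystem`, the invalidate-on-snoop policy, and the no-write-allocate behaviour are hypothetical choices made for the example.

```python
class WriteThroughSystem:
    """Toy model: private L1 caches kept coherent by snooping a shared write bus."""

    def __init__(self, n_cpus=4):
        self.l2 = {}                               # shared L2: addr -> value
        self.l1 = [dict() for _ in range(n_cpus)]  # per-CPU L1: valid lines only

    def read(self, cpu, addr):
        cache = self.l1[cpu]
        if addr not in cache:                      # L1 miss: fill from the shared L2
            cache[addr] = self.l2.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        if addr in self.l1[cpu]:                   # update own copy only if present
            self.l1[cpu][addr] = value
        self.l2[addr] = value                      # write-through: L2 is always current
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(addr, None)              # snooped write invalidates other copies
```

Because every write reaches the shared L2 immediately, a CPU whose copy was invalidated simply misses and refetches the current value; no ownership states are needed.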
7 7 77^^^04:7 ^ L7V^ SM # CRA C V 7 J:7 h & f o l#|^C#m0 V 7^7h^m6^^#A, ^0^m-i:v7]:7h 0#%|JC (ta &0'%% read 0^# write X D & {§.% £ tl £ t o u&cpu#4#0o%^-f&a&-c#%j^#i&3;yaifo fAi^0#^jgw: CRA (±/J\2& ROM ^-y/U/^Mr-DT^T, f0#&m^yy7^7 H:^f^m:#01/74i>7^^hK^fl^fo CRA (4:^#l:^0y7A>7ty 04>^7x-x;^'y 7 r%Mty '^X % z hi)#)*) £to 9) 7 F 1/ 7 0##(The Central Address Arbiter) 2^jp7yi>^<7^0/:y777W:, 7A>7tV/\07777#TuTf&&7\ ^ 07Fl/7&^^U%^^^C'aC'(j"^#Ayo CAU±, [a!DyM/7(:Mf^#^0^7 mi^0T777(4:A-^-A >7^ - 114 - ^17^*777 7 70^7 £^7 tlzte D t) ^T'il U o 2 #4" 7 v S/^_ < 6 ts f 07 b 1/7&, f < 7^^7^)7* ^l:^otl^it©7n^7 P P7 bMmtt$iL£to it#0^^, 7PU7#5- Hv b^7 7 0^l57^Hv b^tv b^^l,^7o tv b^^l^Hv <70#g^^7f^J^(:7 V7^ft, f^70Hv bA^7 V7^fi/:^^(:(d: D&bT7 7 V 77 7 7^^ff ^fl6 0Tf o 10) CMP T jE £ *tr cl £ 7 h (i ? (Can a CMP do even more?) CMP T(j:7Dtvti^0m#^#^cex^0'r, yD^v^^-cmm^i/v Pjm 0 j:9 C^^coWimCM^XimD'rf o L& < T, 7n 77 y(DMtt 0&ia:3 — P£fbJ$bfcl*tU£ft D £tA>0 7 7T:\ 'J7^-fe^®^^-l £ # X. T g* ;£ Li o o Hydra T'lilP^T 0 write (i write 7l7^MDTf&07Dt v it £7n — P777 b £ft£0T\ g-VD 7 vlll^XrA±t^4lfc^t© write (7)® 7 ^#7#g=7o ^7c, S 7 — PT/E7 read 707 *5ft;E#to£;:SlfTl~^§7D7"7 A7 7*7 > b £ 70j:o^%#aU%77O7777jmL^m^{b(±, ;\-P7 :c7t:7'3Tg##C^;t&0#y:^&y V v bTfo 11) ir —(Data Speculation Requirements) #uxB0%#^^y t V 7 7 77&^#T& X:#)£:U\ 1770 5 c0##^^#(c^^ h #x £>ft£7o (B read l/-i>3 >#fG^L&7 # # s >#C^^L/X:77^ write 0##&^A£C'fb'T#a## (3) write &yi:0 7D^7Amb 07 Lwm^7^776## (E) L7t^^07l/v P£:^L7 t)]E L1^7 7 V 0 view 12) Hydra 0##tb/p — b (Hydra Speculation Support) Hydra b£:c^7g%%L^fo 1 #3-7 Vi/^-077'^ b#ffl0^7b v b^^(t ^.ntl^to 2 #77 v S/^.07C^#8y^ write 7 — 7 7#x^) /:6607; V 7 y(i77 L2 /W 7 T)ft & D £7o ^ 6 £:, ^/:&b037Dt v+1^7D 7 yVtM tltlA £ 7o 7ft£>0 V V —77 lot, ###&77 U 7 7 77 7 d write 7“ 7& write A7&E6 LT L2 /i'j/77i:S§)i^n, CTE^tiH^tiSTo 7*0fi^S #&m%&m:77&a%#mU#&"3Z:7kkL 1 #jr7 v^o.77'^0 Read Hv b& 7^7 77^7 h7^utinIfb7 7o $ 7c n 1 # 3 7 v 7 :x 410 Dirty tf v b - 115 - 7 C jsstotss&^ft Xz7-7(i 
L2 Ay7 7it|-CtEU^mmt%c,T, 2 ^4r7y i>^t#Si&mft^t-o # 7° D -tz y if t IE U V'' view :ttJ3-Z.%>fztf)(Dsl^ V V < > 7 (t , 1 7 iy pre invalidation H v b 0#^ E: L2 a y 7 t fr £> o 7 t y —7 -f > 7^##^t J;oT ft o 13) 1 #^7 y i/ n.*? ?f <£>S¥$B(L1 Cache Tag Details) 1 17^7 7 '>:i 7 7'to AT £ t§E b < I&0J1 L £ t"o 4 Dlto 1 oftii read t^fa-ofzZ. E £: word #jitif Read-by-word 7 7* T\ 4^0 jo!®!! S read t)%'fc>~DtzZ. EtDj^tititfflAbft^ifo 2 0&b(t Written-by-word 7 7* T , write h %&> ~D tzZ. E £: word # ji t tf L ;£ if o C ft (t 7 t V V % — < > 7 i: ff 7 /z &t|£U'bftfzS07:1"o 3 -OAW Modified 7 7* T\ ^iE% read *A V 0|gC, 17 ml0tf # t & -5 ^ E' 7 ^ ^ t" & 0 "tr f o ##(4: Pre-invalidation 7 7^1?, I/DT^ AtTV > b0^^^^7 LT^tmAyl?«EWC E^^b^f o C ft b 0 7 7'0 j(g fb f 6 /z 0#XU ^ D 7 y 7 ^ ^#'%r f o 70^7^03 ^ y b t) b < (±^#0#7C0^z^, Cft60MHy b <&-^t7 V 7t"6 y 7^^' m^to ffet, Modified btf-fey b£flTl'£H#£&5ctf*i£a Valid tf y b £ 7 U 7t§ [!]{?&Pre-invalidation b y b ^iz 7 b ^ ftT A ^#7^7 ^ y b C o fcB$t Valid b y b £7 U D $Lto 14) L2 dy7 7 0##(L2 Buffer Overview) l##m##&;U/y Pt, 1#0 L2 Ay7y#%&b&fo MtfftbL^bt, IfcXty P0*§^£ 2 ^^r-v y '> jl tMBfc'f atPH^A y 7 77 7 > PTrfi1 otz®>(Dx- ^7b7^ L2 Ay 7 L2 7 s? y 7 7 0rM 0 tz <£> D v y 7 Cft(i, ^#t^%b^Xt/yp^^e0^Dgb&t-6^tAy77& 7UTb/zD, ^byp0^T&^#t2^^r^yi>^^0#SJ&^^bf:l), 7f“bV'>>^ It7ly770^i^fibfc Ds Ay 7 7 £E 15) L2 A y 7 T 0##(L2 Buffer Details) L2 /Ay77to^TMt#L - 116 - 16) L2 %(L2 Data Buffer Sizing) L2 a 7 y tWJ Xlzmt U tt 0 z.®#? ylt^U'Ot v — L2 ;iv7y(DJ:> h ;i7 7 70#j^(d:7;i/TV7T7' ^ 1KB write L2 7 i 7^/^. 
^,-e L j: 7o 17) #%#l#3 7° D -fc 7 ^(Speculative Coprocessor) jSHHff<7)rOT£:fT7 3 7°D-tr 7 tKCOUTl^B^ L^t'o 3 7° D dr 7 it f*] ^ < 3 1EUW7- J l >77:7 U ^^2, ffe©7Dn-fe 7itfr£>®3 v> p o 37 > P&^^&T P 77^0 write t LX% ##0777 p^^t,±(t^, ^ftpp0777 p& f 6V^ < 3^(D#m^V7 P 7x:T;\7 p^ ^ 18) ###cfr0 7 > 7 t A 1/ 7 7" A (Speculation Runtime System) 7>7^A777"A^ U7^<3^0V7 h^:cy/\> Ctl 6(±, 71 V#3B0#@L 37D^7t©R ffe7D-fe7li-A037> P?£fl&£'£ff VU $ 7 7 P0#U^$«j#P U^f o ##N#0/W ^7 —7 s p^scneu^^ti^To cti6 i>^^0^##7:W:V7 P 7x:Tl: j;6T73—f-0^^#tl7 7'^f o 777^A77 ^Ai:Nui:(±, i^iTcD2 ttlto • “Data Speculation Support for a Chip Multiprocessor ” ASPLOS ’98 • “Improving the Performance of Speculatively Parallel Applications on the Hydra CMP” ICS’99 19) 7 7 7^ A77:rA0;£^ tib (Runtime System Summary) 777'fA77f'A0ot>^&60k:UT7t&fo %#A^^7 7 7 P k U7 procedure b loop C k ^ MMlt loop (D - 117- End Procedure procedure U y P &y p #± 110 1M y ;v •Loop Hjtil Start Loop loop CD# iteration Iz V P t LX^ffT £ *P S £: HE X, t* & o 70 tfd' ^VP, loop 30thY ^1P End of each loop iteration M^£%'fs 41 CD iteration b, CD iteration (C% D #^ 6 o 80 +h^ ^ ;p, %#CD##& loop trPMS L^S'nid: 12 tfd' ^;p Finish Loop MUfrrpC) iteration *W:71 % t t & lZs loop ^ 7 $ ^ £ o #± 80 +h>f ^;p, ^#CD##& loop \ZMfeL 22 IM y ;p •Support ,>P —if* > Violation Local Hff FpCDyn -L lz — 3 EX 25 thd' ^ ;P, ##CDM#& loop tzW>7?& 7V7 #)]/ Violation: Receive from another CPU 20) y —y /x*tf — P(Anatomy of a Data Hazard) P^^^L/:^^CDmf#^EI^^T3l:m%L^f o yot ytl 0-CW&M fpyv^ izkL^fo v p^^u^ti, c cr, LUfyn-tyo-hv+f yi d^y- i,2,3 P uy0lid: x D b^^c^»iz v p^Uff/%#!:, $ y D t v tf Ot'X ^xCDSS&^^fe-o t btto yn-t '^o write A y^muyyDtv-y i yot^th i VIOLATION Po P^fd: j; D^aau^y i/ v p^## [#0yDt vdf 2,3 1: KILL y f KILL/\> p7 b^Xo bT, yot v+f 1,2,3 (d:X^x y iz ^ p c 0 o - 118- 21) 7"D b 7 d* y(DMW(Prototype Overview) Hydra 7°n b 7 d* 7°©1^#£ LTl^to CPU 3 7^77 bJafetv 7^eu 7X7A©JiiizttmtZTyu—^-xm&xi^to S|37D-t 7ibC (d:%#891:7 t V 63 > b d — 7, SU D *-XA, 
####^7-0 A7 A©7:i-->A^©7d- b/uy 7&g#a DiAAT'U^t-o 7^ U'>X7AK:id;, read /^Xt write A A, V V-XOTffl 3 > hu-7, 7" 7A '7 7° 2 ^A7 'r>2^ ^777 A 7 t'J®3> hn-7N A£tiA kA/^7 A©^^©^^^ >^T A/Uy AM##^Am±d' > 7 T a: —T(±m^8yc(d:^o D o -A 7- b 77 >©^#^7 A >7 • ## 3 7° UU7'Al*]7d'7 -^#7 t U©V77l/>At-7 • Central Resource Arbiter © dr 3 7 -A/^7 AM/^At-7 '^7 7" 7 A 7 t V 7 >7 7 —A •*A batSHh©^iJAtitiAd> >7 T x —A 22) 7D 7®7D77"7> (Hydra Prototype Floorplan) yu y 7d*IDT#©RC32364 t°©7 ^^-AlC%^A^A, 1^7-7(77^ 7 7^-W:#^, PS®) A-7 ^ ^ \Z 8KB, 2^7 7 128KB tto f'VSfli 88mm2, fVU'71i 0.25um tto 23) E^H-© BtH(Key Design Challenges) ^®(DmnY'T:lZ& 24) $fctfti!$i k 7 A© ^(Statistics/Debug Mechanisms) D , fA'V©f:fetl7n bAd* Ag%% U ^ f o ^7f - b 7 7 > A > 7 £ tit l^T, idle, busy, arbitrating & - 119- 25) 7°n b ^7 7° 0 A Hi ^(Prototype I/O) 7"D b f 7 7'®^bgP7 7^ 7 7 ^ LTW:, 7:^^^7f'7 77tV^®^777b &7 7^ 7 ^7 b7—77X—73 #^J I/O 7 7^7 J: —C®7 7f 7:c —7&MDT 77 7 7 X U t7D^7Mn- b OfctK 7°n 7"7 A0^fT$pH£flX t> ft Cfc bK #$ 26) ^ 7 7Di£§l'(7) D-^Yy 7°(Chip Design Road Map) 7°D b 7 7 7°©^Htx^{i^©d:o l:/d:oti^t o 1999 ^* A7 7tf-x7 b© VerilogXx> 27) i£ £#> (Conclusions) CMP 7^^r^7t©7t #g#C7'Dy7A®#mb&e7Ck&gmL&L&o ^#77^77^ mtAC 7 7 a, jloT mxc, Hydra @77^977^ -737t:ciM:m%umL/co p^azy^oiM:, a'7f &Pp7\C##L^ Ltzo IHi: Hydra CD7°D b77 7b£ L/co 7° D b 7 7 7"# v7b^^T^®#m^m #)T V < i^Xfo D o [mmm Q: 2 ^^7 7 7 n.|i7 7£Ol/# — b £ & o T l' £ t~ £A 7;i/f^- M:if7b7t'L J; 7 ^o A:fwc^uxm& o & fa\ oxw:^- b&mAoci:^#miq]±(± D #H$X k#x.-[(7g;To ^7 7 7 :x a 7 7 (DftM i> lilSb: & £ D &i±/vo Q: 2 ^jr7 7 7^®77t73 7^77 3 7®##(±fjl^ L&&o V ^r —7 3 7 A:&k;Ua tomcatv&^T#:37^773 7#$ Q:Hydra^@[77 7 7 j;7^o - 120- ;i/ —7l/^vi/0# j:o U z O^o A:f ^ toiJ^<7)7 1/ 7 f & &:U:;i v 7 L^l^#/vo X: ^7(j!#j^(D#1^7 1/ 7 b(J;W^i/-S/3 V vm®;i7 7 7#^t:##^& Df 9tt o Java (Jh“7°h77 7 7 ^1“ l>T"fo /W b n — b h Ti7>(D^l:&6^M'7i>7±'r##%%:7 l/ 7 b &f#& C c a^-r o Q:iM&%Ajg(D7 1/ 7 K&tU D mf C a^3 >;W ^C-C^^CO'T L j: aaewx ;i/-7^0m#mm^^BLi:, mto^ma&frA^ct^^f-cfo 
Q:/W tV —'>a >IZ ££7 1/ 7 Y(DW^'i7&, 7 1/ 7 b %9ifr 7 1/7 S/— 1? ^ U J; Ly^o Q:%#7 1/7 procedure ie%;mL^T#f^tC^#7 1/7 ^ L^L, ±76D7Dt7tb^##Pb'C^#^^^^X:#'o, ^ a#"?U j; 7o A:%#7 1/ 7 b}:(jf 6^171^ fo 70 4:7^ (Dmi)fj(j(ja:#^^^^7i/7b^a*#^f^(DT. &7 1/7 b&^1Jpb®7D t 7 7'^^1J^Pb#rL7s $ffc&7 1/ 7 b£ V v —7 &%(J #f c k(:^ D ^ t"o b^7 1 Ol:^^-3TV^(DT(±^l/^ U«t O^Po Q:7 1/7 b0#^m{6^^CT#bbL7l^(DTf^o A:7 7^^ Ai>7TAT#mL7^mf o 7 Ai7 7^A(a+#C##^< 3-^J >y^^^V7 b 7^7tf o ##(± ASPLOS t:##L%B8LTT2^o Q7l/-7U^1:, ifXthenAelseB CDj:9^#^#7 1/7b®#m^'r%7C:^(J: A:A ^ B ^^^07^-b7> - 121 - (3) "Hydra Software i) mm #8# Hydra 0V7 b b C? >/w 7&^#v7 b ^^y^#(ttu2%ctz:t,&itAyo *0(2, Hydra b 7 xiTdo^T&fS U^cl^to 2) #t3l(0utline) *0®!S®4iJ$i2C (D 2 a (-& otl^ 2 o %#7 V v b<* — b t Z> tz&> ®mff#i>72A(Co^Z:&BL3;2o CtlC2^T, ##7 1/'7b^^%2^^^^, to ^Cfiv 7 b >>x7l:o^t^iltto #%, Hydra "£(2 C &&XF Java Ha§ C #^(:o^T(2 V —7 to V —7©3£?!Hb b 7 >7 1/— ^ Hydracat ^^bt^tto CtW2m#®* —y#8g#^ij1l/ —y(:^^*6 4b®l! to *012 Hydracat ©fSIU ^EiSfb^o UT jolS LA^fc L^fo &C: Java (£oUT to #SL^*o Java -£(2E^'7^>^^^m2^6, ^OECbO _k^#6 A\ 2&t>t>, E£lt7 v >®—#' —^<7 3 1/77^7773 — 7&$0#(z##[qU:^#6^ (co^%y#%%u^ L£~f o ^ tcE^vv > _k T ito 2 £ B myny^Ag#:^, 7 w b^ztmL®%#mec2oy^#(o]±Lm2o 3) Hydra y D ^7 ^ (Parallel Programming on Hydra) 702 3:2, Hydra±-e^(D2 7C^^jyDy7 < Hydra (ivji/tyn-t 7lt-£2^6, Wvjl/yyn-fe y it [nj (2 © y n 77 < >72^tk(2f © $ Siiffl Wlb^'t" o L/?P L, Hydra (C (2, y D 4z 7 7" (101##© L t 7 7 7#J\ ^ C* L ^ 2 7lt^T|5]^^%^) 1c 6b©#*##2 $) 6 , LL(Load Lock) 2" SC(Store Conditional) |©D y 7Sfb£tb* — b btLtto L^L, Hydra ©#4b21c^##(2##7Ly b©1t* —bT2o Ctl(Z2 D, yo7'7A©m?me#^m:%D3;2o t), &L4&y — Sl(2 it/b^/b^tcl L%4b, 7n— b7^y 7 b b7x:Z'>yyA®^^^2 D, JELHS^/b^E£ft3:2o #%, b 7 F ^t^Et^ D t to -0(2, 7-7'>—7>iW mck, nfztmw:#^f^3-ba^^yi/vbkf^4b®7?*o - 122 - 4) S' (Speculation Runtime System) T*t3\ A©S5tcA D ^©luicvx- F^x T©!^^ Ll ** y'>ilC«U 6^^)©##&©U v F WMlX\$tlXU£t o L2 v:2.©^fiulC write ffl©Ay 77^t)Jto C ft £ ©7x — F 
txT&fM'fa fcfe©37D-t CPU f o Cft6©/x — F^7xT&$iJ#f ^©(±V7 F C7xy;x> o A^ <^UT 3 # ^®;x>F7^&D^fo ^oia, %#^tV@f^;x>F7'%r, ^©Xl/vF6^3: Dm#^n!&66^mic%#U, 373 U F'7t'to 3 #U(± write /Ui:37> F&jTf&© 73 U y 7 ^©N#J#T6;x > F7tto ;x — F 0 x7#f ft£^til U V7F>)x70MiI^“f>^ |07Dt'y^CXrl!“'>^Iot c©j;3^m@©-gP^V7F^x7^e3C^lcj:D, S&^m&mWc&OSiPo 5) yiPF^Ri U##3 — F © #%# #c fr (Po st- subroutine -call Speculation) TW:, #l^lc^©j:3lcU7^#7UvF&^j^f6)(p&M%vx^^L/j:3o #^©X ^(d:77;P-^>^U:mU©#lCi^< 3- F$:##7 1/ 'y F^ UT, t&W^^'ijt Z> hVxo ^©tto C©#T(±, Prod, Proc2 ©z:o©-tF7;P —^ >6s $) D ^to # 861C ^ ^ >7D^7 Procl 6W^ftfc h ^1C, 7 1/ y F© fork ix^To ^U7, f©l$^mwc#i^'#'^« some code »©gP^^%#7l/vFkU7 Hfflc(j:f L^Xo ft&?#jU^Uft(d:% C©#rai, Procl d lc(± U #?#©X^a U7ld:#^^^©^^^ft7ix^V ^ — &%5 x#©x^^mmw#'rf#^©^@©BAyi:^uT(±, c© Z 6 tc U%, ^o©7 V y Fti^^ijlc^b o ^ UT, t) U^7-^##^.©#M6^# £>, ;x— F 7 x7UX F <)x7^> F'7 tCj;oTilu@©MT'j£^£: £ 6 &[U ^T> ^1C Proc2 ©dfl/m o #U^[B|#lC fork #j^C D, Proc2 ©df U7P Licit < 3— F6^$|7 1/ '7 F' tte D , Proc2 X^^M^iJlcHU^ tl^-To C CT, Proc2 &^^U7Vx^71/'y F6^ Proc2 6^6© Procl ©e^mUlC#!jmU/:#AlC, ^ &%'J©##7l/y F6^^K^ft Procl ^^mU©#i^3-F^^eU^fo CCT:^# UT(5U!x©U\ c©^#71/vF(d:, #©##7l/vF, f^t)^^^>7373Ai^ © Proc2 U©#i^3— F U7lx^%#7 l/vF©caTf^, f ft 3: D t) ^mjg^mixa^ock'rfo m^^e©m^ic^ic#e^^%td:f©3-F &^eu%vx^A^Tfo c©m^^t)^^j:3ic, m#xi/'7Fi±^fU4bm^^em ^i>7X A(±^^##1C^ l; o 6) +FCOP—A > IC^f 'f'-S+F^— F V7 F 7 x 7 (Support Software for Subroutines) V7 F 7xy;x> F^ia, Mips ©7t>7v#^&^c7#$ic^m^<@^ft7ix - 123 - tto Ltwtto —o#\ 7 v v btp£>tife<7)7 1/ 'y M&(Dtz 5, V7b7^7y\>b7(t^01y^7^^^^TW^^^Ml^t^A^^0T, t^t 0l/i77t ^#$^(7tl(d:^f) t7-/uo ^tol/^7 t0^#CMLT3 >yW 70#(7^ &tud:, c^^rntt^M^ti^itt^to v^->#07#, tot^. 
cti^0t—y^vb}:tD, tE^0^%yb-t>(t^ 7Otb7 7yb0 3 7 b £E L t to tE^£E7 LTfr 1/ 7 K £ H£6t £ yb — t >&$ no tb7^yb#^Dtto gT'otvIf^Tr^mZtl&mM&Mt&yb-t:/# 30 7"7 7 yb N #737 'yy-'Vtfctti£tlfcM£i%fl&Mt£)l — 3:->& 80 77 7yb#^p Dtto ca6#m^&6#x.ac 1:U\ &t t) )kv^tE#^%#^f7t^0W: 7) )\/~7‘Wk 0 3^ L0##^ff (Loop Iteration Speculation) #(±yb—7#D ML0###f7Tto o£ D > C 0 t ? & for yb — 7^fe o fc ^ § K7 ^0e#DmL^-30##7l/7bkt^^v^9'60l!to C0#A, &^7l/7b^ yb — ^(D^T^^ttiLfzt^^U, yb —70^7£^7T#;5>&7 L y ^,tp^^ntit#Ayo f 0#A, ^7mL&7l/7b&%^#&71/7H:#Ll7 ^ 17t±#t o 7 7tybA^6 tit to cti(i7— c0t 3%yb-70##^e^31< i^ < tl(t& D t-tir/Lo L& L#b^L^C##^$>o7^, 77^ > L^^f7Lti^##7i/7b(t, yb-777f40#%^6^D@tcac^60"e, g #Rk#M^C 9CB#^ey(:^/7 b t^)^67rto t fc> ^-^xTC.tot #;mt:7tv0vt-<>7#e:bti&0'r, @#777"<-bfb^^m^titto t ^73tvit(t@^g#0^tV 11&6, #0#D^LCtot write ^tl/=7-t& read t&0ttl(t, 7tV Vt-^>7ais|#0#^^f#6 tit to 8) yb —f^C^l't^itzM'— b 7 7 h 7 -x 7 I (Support Software for Loops I) yb —7&:#t& 7 7b 7 ^T/\ 7 b 7 t:(t 3 o0yi —^ 3 >#& 0, Lt# Tc7(t, Slow ^ Quick ^^7 2 3»0yi-7 3 7^^tL^o Slow #yb-70##^^1f7yb-t>0Wmk&m%LT{Bo#fr0/i-y 3 7 T'to CtlttStlW ij;+b7'yb —t>0bS^Eff ^|5j L7 t7-XA"eto Quick ityb —7 m'r0ityyb-t>^#^e^#^#^^'3^TmmTto vt t^iz, f 0t077yb —t L^v^ki^o Slow yi-^3 >(ttbyyb-t>##^ea|g|L7±-XATt^^, t-/i/\'.v b - 124 - — 80 'V'J ^Jl/03^ h W&1P 0 £to —Quick V 3 >Xlt^ — A ^\7 < ^ D, #(:#d^L^T0^—y^7 M3 16 ,, c^ic j; D, yk-7^7^r DT;^ <&<%&, ^ mfo 9) />V—7,tC^>l‘'f'^t^^w-bV7 KxT II (Support Software for Loops II) L&u%#f), w 10) x“f## £ Tf £ #^(Enforcing Data Dependencies) ^fr^i>7 7 AX^^0 f 7 1: LT##7 L 7 N:7—f A0y\> o I3f m!#7 1/ 7 F 1 #7 M/7 X & read L7. 
f0#T#%#7 1/ 7 K 0 #7 M/7 X C write LX: ^ yi7&^#LT^X:7 1/ 7 K 1 iwmfog i Q:7 1/ 7 F©7 D-L 7 1^0 III D ^Xtt£'CDJ:o oCDXTfro A:^ff#i77 7A#^07'Dt 7th#^T^&&@LT:& M 7 V -^^07Dt 7iXH||?3^tito Q:#Ee#^77A0##7D 7 7ma^ fi%0m#&D 7ff &o A:(±lx #/uT f o 11) Hydracat 0##(Hydracat Overview) 17±T'HIf ^v77 A0l5^Ht) D s ^{:3>yW7®ii:f f f C n >/i T 7 iCOL»T L & f o #X± Hydracat(Hydra C Annotation Translator (DM)t^ 7 V —7 to V —7 0 C al§ b7>71/ —f£l!%LTV£fo Cftld:, C T°D7'7 A0 - 125 - ;i/—yb — %ffl D tiH ltit7;i/-f >(: L£1~0 ^ LZyb —7"o#CA7Dt v F7'^73-^#AUto ^fcTTicvv—7°£s 6j;7^c##yb—7}:^# L ^ t*o yb — 7° I*] O breaks continue N return 'fef yb — 7°iSllx 7 v M#077 7 7^7 77 7C#UD##6fl%m j:7k: L^(7fl(d: & D tiiAo tt£t>%, 7'D—y^ybib D ^#/vo — yyT\ 7°7i'^ — Mb hi fb&^Bid:7 7 7 7±C#l^^A^#fp|± f bt^to #%0 k C 5 ^ Hydracat & 7“ U 7" 7 7 tl 12) Hydracat O^# I: yb —7° (Hydracat Conversion I: Loops) C0g|U7 yb — 7°g#:0^£:^ btl^to yb-7°£mh IvT12J D titi U IS AODTOtfj ly£#A lv$f o f ^t)7x ®$£yb—7°0 H$6yb —7-7 (spec_begin)^p^ — 7°$| D ?E L©HTyb — 7* 7(spec_end_of_iteration) N ##yb — 7°©^T^bw“7' 7 (spec_terminate)& 13) Hydracat II: (Hydracat Conversion II: Variables) £ 7 L£"io ;v — 7°5IStito £ *5 ^ ^St £ 7' n —y^;v/W 7|qH:M:W:cfi 7 (: L^i"o yb-7"#-co^o#!maco#B#:0 7 7yi^#^f^ j;7 yb-7^%#ut:7G0^#0#&C0#ja#:^y^77L, yb-7#T#^:#^#:^^ 7G®^#k/\y 7/^77 L&fo ;k— ^D-yWfb^tl^fo :©b^ i 0#07 77 V 7 7 0 yb-7^^^r o7cmcg[#±(f i) 14) 7 — tWy 7 fcl £ £ MMit(Feedback Optimization) ^mfbagm^S'e/bo m^7-K(±L(j:L^, yyb^v XA AfaftJWMLX L£ 7 d; 7 c D , CO j: 7 (±, 7— bStE^ck^t^lbT-^. —-777)7^^^^ b> £1"o D mtl/vo Hydra 7:(d:> 1$ $10-7 n-— ~ > 7" LTl^&^7— LTl^FdlM ###MO# J; 7 br&oTU^f o tt£t>s K 7 ^7 £ V 7 b 7 ^7H - 126 - ###&< ctotcf 0gp^ $<0#A. 15) 33 — F|£^(Optimization: Code Movement) Tit, if ® 16) lijStfb: tilt* SJ (Optimization: Value Prediction) {6MJ£:ffl^£ ££ ^337 — F %±filz&WjTg£To —Jlgtett|fi ©mooam&^siL. c0^T(t. 
if tC0^Tft^h y (TMEM^l/^-tirAvo L;tPU if X 17) #i@Hb: |B)M(Optimization: Synchronization) itto C0#T(d\ |5|#i&g%&2:#)k: sum Jock a^7D77^B=f3#(tTl^ to sumjock & 0 TJZJKJHb 0 l£ t“o ^ fz specjock £: V't 7 0(t7"fe > 7" V SlnT'S^tl — 7" >T\ sumjock 0#^ i f3 & £ T *t :n f h Uto 31 tt f3 tl D ^ #U0 sum f3 write U^:#T, 310#b^L^ sum f3T7t7T6C 2J3& D 6& < & D & 7 h° > D 7 7 T' D 7 7 ^$(0 #& read t-ek^f3##mM^^^Tt)^7T(±@^0T. f0#^f3(t#^#^^ read A^ffll^to 18) C 70^7 AT 0##^ft0##(Speculation Performance with C Programs) Tft, C 7D^7A(:#t-^. %#^ft0#^&MT^^Lj;7o ^770# $6 ft Hydra 0 4 7°D dr 7 ^t£ffl UWIIt b tct§^0. M&mff fcttt - 127 - 6S L$fo 6©AIJSEIt@M©7D ^7A?t. eja-fbLfcneiie->^^A&fliv^te@^©ttig, $?> t, ttSas©KI+tS$B6fflvxTA^^^-->^U3- h'©tt|gs^ urv>$r, s» iR7D79 A7lf^-7##l±m&k|Bl@g©t©A^t^f A\ ®a-fbt ioTMffl 1.6 {gy.±icttigi9±urv'$t'o ttitsf,ffl7ny7i,-eii 7tt#g-e& 2 fgA> e> 3(g©ttig|n|±AH# £fc s ejS-fbU^UffBeAX-r ASfflV'-5>CkT*. ear fttll — 7)S8Btt#Asfe 3 ©T* t AA SHSIfiT-6 MltolcSiK D S V A5 9 7 h U ,ft^M5USAsffe-nri'^fo wc fa»-/ir^»@C'j'Si'©t\ ^-7-etta ik^fiy.T©tt6btc»oT L$oTV^$TA$x S)6-(b$ftA:9lfi;Be->7 9A&/8v-E>e k -e, g%© 1.6 {S©ttlg|6l±A sSI £ ft$ 1~„ mpeg2 T*li3 — F^iJfflftjifttCAoT^- -67 7 7sa«*u tttgi6i±uru$i"o 19) C T©jSt8Slff © $ k ^(Conclusions from Speculation with C) cr*©sssifi&$kto$i-o mgawmecz o. kA^#-e ■to -9-7;i/-A>EE3- -y-7ll/-A>© V 4'->(ilAs:FiMT*Ss A>-3S +1711 — f <1*-M7 K &MT- *&*•& to ;V-7iSbiE u ©###(? If. ($kAk’©;t—7T$irfe$fo ilk it—7tc)f Lnne-fbSft/c. #-/w-7 b©^»t\#i?#77TA&mt''^#Ac, t> ito s«toae?mGicS'd< 7077 5 d$t-„ ® a*e©AA4f7, i!S©ii?A7n79Ak$7>fc < l@l#lc U7#9iJ7n79A< C e©36?iJ7D79 A14 Hydracat titl @#*H:###f?|o|ldlC#^J(b$ n$1"o $&. ;\-K7T:7A^m#f L7. 7D79A© #a(bl:$imf %ckA^@$f. 20) Hydra fi Java C k 7> "C9@69(Why Hydra is Ideal for Java) f fl'CWX Hydra A^tb— h "3©@e§. Java © K C # 0 it, Java fflS!gtt«kHff9#i:li. Hydra £ InH'T t'3 »© A O 0 *To If. 
Java ©f-f 7 -f 77 V -7 btf H/|J3bfiJ%9IC##IC|o|t\7t''^f-o $fc Java ©#R#9% k JIT 3>Af7ttMIfJfflifi!jftiai:Sitlilt. $£ Java ©EHv-> >©96$l **-f. [email protected]>>r*ti:. #-^->*7 u 9 9 a >(GC), JIT n > d(7. )7^n-fi >7kt&fiE&k\ ^ < ©7Jt- M/-A>6e£;'Sk L$T„ eft ?,©;v-A>©lSkAk(i. «Kg*‘£U’ffle@ffl5l?iJMte J;3x8$A$Efe7 $$-#■,, 21) Java-Hydra 9%(Java-Hydra Environment) Hydra © Java JSitli Kaffe EH7 7 > tlX 11 * t o CftCIf MIPS 7 — ffi’ftiffl JIT n wW 7 Asa * nr 4=5 0. ASift 9 97 9 ha-fffl 6ia-fbS:fft'$1"<, 9f 79 V If JDK1.1 VET AWT ^ SwingSet &7J1- Utl'l •fo Hydra T tt e ftCtt @ ©«|gi§ft]%ff o T tA * t. f ftl±. 7 D A |S|Ri ft k'©3£9l]@L97 0 if 1 7. KSUffa > b' 9 Tf 0 - 128 - 22) Java 7?©X V V K (Speculating on Java Methods) TJJ Hydra T Java © X V 7 K £T £ l5' -4 6 c ck t), X V 7 P©###^fj©a Li7xX©#& ^#aLT(J, XV7MTXtX77X&gSU^c^tcj:D, ^fT^XXTA^^©XV7PA^#^e©^mA^t»^6j:7(:L^fo JIT 3 >;W7(±%#^e^^#f67-h>yV3-KAU^fo XV 23) Java 1!©A/ — X##^fr(Using Loop Speculation in Java) Java TA/—^^^Cd^TTTo C ©%=#)}:#, < o^p©3 — P^#A^'#Tf o cntiATt:: £oT -SS^Wi^TT U> V —X to V-X© h 7 > X 1/ — X £fc i J V — X to X VX 7 7 X A/A X h 7 — p © 7 > A X X £ £ o T & WtbT fo A/-XJtX4&W^mUTX7XX74'^-h^XV7pie^ m LC L^f o X V7 FWffib^lttt7o©A-^3>^ffliLTl^to --oiJX 7 TT h©^m©A/-xTT ^ T, ^&T°77h7;f-A-e&^eqr#-efo c^u±m ##±©^TA/T4^%Aj:7^LT&D^fo 4bo-7(d:^ ##mm©A/-T'#7^'rfo o 24) Java t*©il^ft7D7 7^f V > X(Advanced Profiling Under Java) Java TI3\ #%7 >;W A/M LTr^Jg&7°D 7 7 X V V X##^)^#T #^ To mxw:, ##jC7- D, XD77^ u >x©^^©7- p&man LfcDiJI$UfcDT^£To £fz, JIT T7 VAX A/£ftfc7- P©^E^> 7°D 7 7X A/7—M©/:^)©XA—x^T^JLT43< C^^)T#^To C©T7(:XD77XVVX C k CZ D X LIT©ck 7 D & f o ^T, 7 K kXu 7 7X V VX©##j^^©#laAA#T#^To Jfc^CJStTT^ #0 H;(Level-of-detail N LOD)I'llt^to 7>7'J VX©T?££:fUffi LT 25) 70774 V 7 T &%##cfz©#j®fb(Optiinize Speculation with Profiling) XD77^U>X(±#mfbCt)%M^^fo &1\ XoXXAT©^©^^^^#^^ \z £ ~>TMMik£ tiZ o % (Dlz$.z.£t o cfU:(±, Az-t^TV X>X v 7 P Wtitib©XX XA/Bu UX-v#?#)©#^, HfrM&£©tilfg/b^ijffl-e§^t"o £ 7cs F 
l/X^IS^^H^ £ s fet£t o 7°n 7 7 4 U 7X©^^£TiJffl LT, ro^Tvti^o-FI m^mmfbx cxoX7A©#^^im#^A<7^©mm4b. D D Jto - 129 - 26) 7 V 7 -S14Hafpj-h(Speedups from Method Speculation) fW#&Java yoy^Ac^f^, 7V7h##me(D#^^^Ly i^to 1.3 3.5 £' 27) 3 1/7^3 >(Speculative Garbage Collection) Java 7D^7 A g #:£ iW^bf 31ST Cfco l> Java 7" 7 ^ #|y(d\ yoy7 7A^7t U#^(D#^$r,DEL^ < Aoi:, ^^7777^3 1/7 73 > ^-^737773>(GC)^(d:, b-y^(D^y7x:7h^ 7^ &o#!82fi&c &^a-7y:c7 h ^)C a ~C?oGC 0 J^o-oid: GC ®m' 28) V-77> K y —y^^C(Mark and Sweep Collection) lf7-77> H7^ - y^^^-oUTEB^UT 43§^t*oJava l: (± SH^CDIV — K fo 77 ^;kh 7 7y o-yi:#^ ^l^y7^;i/h777^(D7li^>7, $J'^^y7l/7M7777(7)JiM'>7, ^LT, ;V— b#esN 1~&fc>£>N B)§^^ii;i/— hy'&3 ^7 — 7 $tlTV^^7"7^7 h7?f o 11/ - h^^m^uy a^icid:, #(d:7^y^l:-yy, ^-3. &&©7:'1~o Mfiid^fd: D 77 7"ftt — ytry#\ f (D Ji7 > h^c^#, JA6, £ fc id: 6 cD^-y^j:7 J:7^^(D-eyo K6(D^-y^/^7 hid:, — > h^(7)^7"7ai7 h £7 — 7 L/:#T#ic^ D £ to 6 id: 77—7 $ fc id: £ £ 7 — 7 £ fiyt^iu:-yyyo y 7-y#)cTu, ic id:, t^t©Siyy^x7 b id: 7d' — ^7i;:& 0 o 29) 7 — 7 7 7 h*y^-y£^(Mark and Sweep Collection) ^§ c0EIid:, yy(D^p#(D#f UT^&fo S, K6, 6(D^7"'7x7 h^rSS l/T y 7 — y7^T Etc t §id:, H^fcid: 6 0:7 7"'7 x7 h D ^#/uo H i±7^y^^y^o:7h, ^Lza(d:^-^7a^D^yo - 130- 30) *6 —7 7 7 > ^ n 1/ 7 7 (Baseline Garbage Collector) /< —7 77 > t W 7 7(37 CCD^^-r^, 'X'XVx.# b £60-!t7 TTlT'oT, 2 To rT£i>j T7"7a:7 b 200 ;W b T ZS0T 7"v :n 7 b (JC ^^cmc, ^r^#^j ^-y^o:7 b ^<0#^, eft(3:^#m^0#t^E^J7!To mbmJ##mc%6T7(:#0To%:f] j^ffl^T^TTo T&t>*>> @B?'JT'&V>T7"7o: 7 b 07*7 7'£T#)ltS LT:£§7 T 7"^ ^7 bif7Xe^(CV—b 2 ft 2: 7 V — V 7 b $:f#oTf3#^To TT^j:7b^#!l t)#(76k#(3:, c07U-U7b^^m^T#^To 2 ft & To & /:, write V TCT^%7 >7 V 7 >7;i/^ GC D ^To T% #WT7y:n7 b&6m'T7^:c7 b/\##T >7^#^#AT7 a U^#fr(: (37 b 7 77°&%ac LTTo Cfl^T D T7'7ai7 b 0® 6 ^T^£t££:ftT To e 0 E ^0^&6C, JIT n>;W7(3:t:-y#m^^MT6T7^;W bn - b, T^T>t,. 
putstatic^ putfielcU aastore ©bzfB tC write A U 7 £:# A U T To 31) GC 03£^!Hb(Parallelizing Garbage Collection) C0#(37 GC 0a'0^##^0To (:#mb-C5 6&&^L&&0 "TTo mTyn77A71/7b^ write V T^^^T6 T6^ ?(Hb#oI#ETTo GC7l/7b^^60V7b^77-yT6^^C. ;t-7%# #ffl:T^#^iHb^mI#TTo &fsccT(37 ^Hbt)#x.^ftmTo f0^0, $$T^L@T^#Tyi;j:7b0m^j^m4bx ;i/ -7°%#^^^^(±#m#^^iJ^e^W#7:To ^6V7b077-y^t)7- j^etx 80Ty^^7b^^%Lx ##l:Ty^^7b0^Tmm^fft^T^, Cft 6 32) write /i V Tt^foij" 6j&HSI fa (Speculating on Write Barriers) f ftf(37 write ;^VT^^(t^m#^e(:nVM:##C - 131 - 33) *7 U 'T -i GC )\/— 7H&O %> (Speculating on Critical Collector Loops) —ycD#^)^L# C, W.WLO U \) x h -£UWt Z> z. biz b£ LTzo :tii:J:^ a^l)10Mbi)5,J> U^ %Z>£o £ & D $To 34) ^ (Dlt#(vs. Traditional Parallelization Only) atm-fiTo c^ui#mmi(:(±a:#me#^ib^i5i#'rfo c^c, ^IK^DJto Ctl£*fLTj£$l^fT& D (DWinlt, l7D^'y^t:^lt l^b y F&Sfl D X ly ^ h \Zt£% Z £Wfo D 3; t“o L^L, ISMOIE 7D^77©f±| CEL < c t tlth D ^i±Ao 35) GC ©'bttli|6j_h(Speedup in Garbage Collecion) ccomw:, GC C0#^mecd:D^ft^(7e#^[ql±L^^^^LTi^fo 36) GC \Z cL -?> $Lt^lolJt(Garbage Collection Speedups) C##EkL e@JavayD^7A(:j3(76GCCD%#^e(D#^^^L-CV^mfo 7D GC C6Dbb$ (± compress TGi 0%fr t> javac 0 27% £ T'$t% “£ t o 7D V =7 h E% ^6 21%'trfo JIT n y(D^J$l'tZ> 3— KCD&fgiPfplJ: Lfc £ GC 0^#[q]±A^yD^7 L«t 3o 37) Java 1CO l^T (D ^ h #> (Conclusions for Java) 7?«U ##C Java o Java i: Hydra iiffi&AHH' t ^ 5 d k7?fo #x£: D x 7 >f — hV'S y £ ItM'fb&frofc £ f & d o Java T##: ©7D-fe y+l£7Sffi1~3 £td\ &©£'?&;£$£#& D Jto Java % 1/ v Java g#:, GC^JIT3>/W7^f7XD- - 132 - ASfo tl±T* Hydra © V7 h xTOISSUt) b Sf [REit-g] Q'.E#T*S £ ftfc ;v- 7$$© W* 3 ww 3 Asffim;t:* 3 boilfoz -ctt. 3>^-f^izt^xammtztmmmfuz itSf A>o A:f*^8J6*Jfflt--S;H 3T*ttH?6t£fc©l'g;-f„ SfcjSSUffO-fe-^x-f X 7 h As£>-5.fc‘A>(:f7x 3 >/W 3 < T6 j6?iJllff'i!S-5 k V 7 ©#*$ WJA?f o 9:75^ -X- h5M©#mi±3 >/W 3(;toTSiLV'«-a-*sSi b $1" o Cfttfg#® tT-S5©li#SCtSk,Si'St, A:{?fto -M±MA^|s|bx-^&@3 7©T##A^bf 7%©l:x Htt77'i'^- Hbr-sa kudUMc, it. #@!c? 
m < © < ©•?«❖%)»§* Lfeo Qix —7 6j=5&*>7 Lfc 6>s JtT b V 7 h\ -3 $ 0 kill $ ftfc 7 U 7 b ©81 1B-&S1XT < ££ W A:^t <-t;fflTC*V'^ s ICS'99 ©Mi*C«V'T* b it. iBM©kC5x SSxtt l"7 d73ACWSj bl'TitCSDIt. *Stt, t LA©##A^61i7U6»6# T btltzXV 7 Ux ##%##©% 6# < ©7 1/ 7 b AsjtT feilTVi-S 6>h,$fo Q:Hydra T*ffl;i/-7g!| b StfflSlittb?&oTU*ti>o A:66*©3 5ASfioTV5©kR«>&^$6^x.$1-o «»ffl«l3SlSitftt -7©7V7 HtcIIIDaT^kAx rHx-f I'J LXf ©#)©#?% 6 kVidSiili&b $1"„ Q:while d/-7©##*fT©M&m#7 < AdS^o while 11/-7& 3 >;H 7 (3 t o ttt M^HbaHT-to A:while A- —7gf*© W24-f" <* CttiT C Acl^x tfd1 > 7 6 £ k'-B Q:Hydra T-tif 717 td*f VT 3 WH 7 A^ASt* S B fr&sfxt© * UfcAbx 6 U 3 ww 7 As;k—7Asafi?!|T!fcB k9;;S"e $ AciB^lclixKSSIffS: L4 ■ 133 - LT < ^16 o Q:3 — P##(d:3 WW 7 \Z t oTlltto A:(d:lX CCDt^WCiamZ:# < 0^^ D f O^f tto 1.2.10 mK&mmfamg.: hpca-g npcA-6 cm^c, $E E3 & ^ ^ A -& o #^B# : ¥b£l 2^1B7B-¥bn 2 5UH 1 4B (8 0H) : HPCA-6 (2000.1.8-1.12) (The 6th International Symposium on High-Performance Computer Architecture) IH6S : k r>;v-X'Tf3s 7 7 >7 A^## : ® Proceedings of the 6th International Symposium on High-Performance Computer Architecture (D Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation (S) Proceedings of Workshop on Interaction between Compilers and Computer Architecture ^ 16 o x 35# (##%m 1 e 3, 21%) (1) ^mrnrn HPCA(±. Bll:IEEE^^^^#f^ISCA(Inter. 
national Symposium on Computer Architecture)IZ H < No.2CD & & o # 3 i:lTA- b 7 i T iz £Hb $ tl % Mfq] 0 fo & iSCAi: \&M o T, 3 >;W 7 V 7 k ^ C 5 ^ U A^m^:(±2 616 ###3 A#:02 Z) o AIHSiC&l^'TtiC HPCA WWlZtltz 2 307 — 7 v 3 y 7° (Workshop on Multithreaded Execution, Architecture and Compilation J$l Zf Workshop on Interaction between Compilers and Computer Architecture) Cl tBJiff C Wi[p] IHilE £ff o fZo - 134 - HPCAUA —M Aj T$)D^t|60g^d:^o n>b°!L — ^7-3r7"7^7©M^&^ffi i: ^ 6 r 71 U j rapt'^ij r^^t'17 —r^7^EUj r>>xfA7-^f^ f-7 A LT©##t^#lq]±^#j rv;i/^-^ P 7 V7F7x7Mi: LT ry;i/3UXAj ^ $#%y-^^7 ^7i:UT© iJ)V 7 ;*;!/ n T7^©Haungs£> ©fgHT' $> 0 > 7 /c&b(C# L/(A## i: L"C rBranch Transition Ratej dv>TiafcAA Sr/c lcN if© < £> OTaken (fe l AiNOT Taken) < fa A l'' 7 Transition Rate £ all A b fz&IZ Sf %^©^%m$98%a^omm-c(±, 98tm ©Taken©^ ^ Not Taken** 2 M'MM t" £ © *\ 49 @] # C1 @ © NOT Taken# ##ilT ^) © ^ & O ^ §■ & ** -o fz *A Transision Rate& #A^" C i: H ck 0 T n —7f/3 7 7"C(±, r-711/^-71/7 —^r^7^-7CNf ^7 — 7 A 3 7 7°j j&zp r 3 >;W 7 t n > b°xL — 7 r — >7 3 7 >> 3 >£Hf| f67-77377j ©2o(cm^LX:o Cfl%©7 —7S/3 v7(±, mXOZJt V T- 4 14:|W < &U*A #S%©Alo]#i: V* 7 ##T ^kh %&~otzo *711/7 ;fll/ — TX^V' AAV J:=T^©Kreaseck6 ©^g^T (±, 7° D 7" 7 A © Hfr 7° U 7 7 i:^og^©4b i:, 77 7 i/^ii/©a/^j#©m^^ IH^T1 a£0 Hl^^if'© < i:teA LTl^t A ll/±7!#^^**, ^e7D77A^©7-2 2%©7tVT7t7(d:. ##A6777i: ©##l###©&lAy 7 t7-edD& C iz^^LT^D, 777 A ^11/7?©%#^^^**$) 6 $ A:n PowerPoint-^Excel© ck 7 ^ \ l > :b $) & A 7 7 h 7 77 7 V 7" — 7 3 AtCjoC'T^ if©^Jg7l/7p**#t>^Tl^}A, f^T6©71/7p^l:(±if©#J^©#^J^^ &6©*^#^L*:#7:, ^WcUA TiPA7l/7p'r^fTA^^lz, A<^##©^W7 7 7$:if©BA©#^T (2) Workshop on Multithreaded Execution, Architecture and Compilation ©fg^**A7 >-t)l£tlfzo lAAA'tiA A ft 7 6 ft© A*^S]Ei:®:bft;E>*gSk:-ca'»T ^ b &> %> o fo ft n http://www-cse.ucsd.edu/users/tullsen/mteac2000/index.html *p £> - 135 - a. 
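As a side note on the Branch Transition Rate metric reported in the HPCA overview above: the taken rate alone cannot tell a branch that is taken 98 times and then not taken twice from one that is not taken once after every 49 taken outcomes. The following is only a minimal sketch of that observation. The function names and the exact definition of the transition rate (direction changes divided by the number of consecutive outcome pairs) are assumptions made for illustration, not the definition used by the authors.

```python
from typing import Sequence


def taken_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of executions in which the branch was taken."""
    return sum(outcomes) / len(outcomes)


def transition_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of consecutive outcome pairs in which the direction changed.

    Assumed definition for illustration only.
    """
    changes = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return changes / (len(outcomes) - 1)


# Pattern 1: 98 taken outcomes followed by 2 consecutive not-taken outcomes.
bursty = [True] * 98 + [False] * 2
# Pattern 2: one not-taken outcome after every 49 taken outcomes.
periodic = ([True] * 49 + [False]) * 2

if __name__ == "__main__":
    for name, pattern in (("bursty", bursty), ("periodic", periodic)):
        print(name, taken_rate(pattern), transition_rate(pattern))
```

Both patterns have the same taken rate, while the second one changes direction more often, which is the kind of difference the additional metric is meant to expose.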
Symbiotic Jobscheduling on the Ter a MTA (Allan Snavely, d 71/ 7 ;f71/ —>:r Y J: d'^) T7tt® MTA (J, V7V^7 ly 7 Lfc7—7^ —7 > b° ^ ^ T C0 MTA ±7x ##k0#]A^^y ^ a c ai: d; b 1/ 7 h >^) cru #jmm$^±(fack&m^7c# r Symbiotic j &AJ #00#A&L&yyf #i:y7k- yy NPB2.3 #-^7b0f#&6a#02o0yDy7A&mx A#:0y7l/—y7b^^0J:7(:^fb^a^&^^X:o caCck^tW:, (^a/v^0#^, 10-2 0%###[iS±##f#i2fl&o X:/:L. yoy7A0$M^At)#^7: 1 0%#J^ ^(Dtztf), ±X (DUfo-nt)-# % EWittJteUt z btek-DX, MM & MtTftffziT S.O.S (Sample, Optimize, Symbios) b V bXXX V n.— 7$:^#L5*0 NPB2.3 *-$7l/^AT^frfa#A0A#:0y71/-y7b^m 10% ±(f ac k&mmu&o v7i/f-yi/7p^$^0yDy7A0#^fb^(J7!W:^<. ##&0^y^& LT^g^^rao b. Thread level Parallelism of desktop applications ( K.Flautner, < i7 d >; R.Uhlig, -Y >:r 71/ ; S.Reinhardt, < S/ d : T.Mudge, IBM Power-R Compaq EV8 N Sun MAJC T'tdL V )\z7~X ly 7 K ^17^— b cF tl a o L Ln V 7V ^ X ly 7 y Y > y A\ Internet Explorer ^ PowerPoint ( 3 ) Workshop on Interaction between Compilers and Computer Architecture 0## 3t7i/3>8#0f8##&^7co LTF-ftL C^60^^^,m^amt)^Ta^#I:7) LTS ^ao &4o, http://www-users.cs.umn.edu/~sycIio/INTERACT-4/ ^ G ## ^ ##^A^-e^ao - 136- a. Limits of Task-based Parallelism in Irregular Applications (B. Kreaseck, D.Tullsen, B. Calder, 7z7V 7 7V7. >7'7 SPECint95 Compaq #0 ATOM 7°D 7 7 7 7Vto#7PD 7*7 A±T\ 0 7°D 7 7 7 71/ & ^ (7 > 7 7 7^7: memory-independent ^7 — Memory-independent hide C 0$B 7~22%07X7 (1V-X, XD7 —7^7^ —C)07X7k#x6) IT 6 7 7 7 h CD ^7: memory-independent 7r $) 6 C ^ LTco ^fT7X7^^07"-7##^#l^#^L^:7X7 l//<71/TCD^#ay#fr ##H:, &6LHd:. U/ftJl:It 7 X 7 (4) HPCA-6 omm 1 lt7i>3>3 jp-7-kT Pl/7 3f4:^&^/:o %d3, tz Panel ^Impact of Interconnect on Computer Architecturej (dc -7* ^ X 7^ 0 #)A- 7 7-L71/ tt£ o fzo JXTTtdC C tl b 0 41 b t Bt> tl 6 %M\Z~0 WX £ t &> 6 o & 4b\ http://www.irit.fr/HPCA6/ a. A —7 — h X h° —A* : Joel Emer (7 >yVy 7 XIV 7 7 IE fg 70V —7°) •"Thoughts on the Evolution of Computer Architecture j - 137 - ©so^&^o b. 
Dynamic Cluster Assignment Mechanisms (R.Canal, J.M.Parcerisa, A.Gonzalez, il #)i— - Palacharla L Smith 6 /b^tES DTtA7)7Dn-fe'ythy — Lx 7°D E 7E(*1© 2 7 h (777^) ^^%1:#E D ^T7)^66©EtA #KC-C#lK 79-P&^#C^#^-C&6o 7 777^T©^-7©^(d-#Dl:#7^—/^7 P^^JMf 7)^&6, C^l6 2 o©# D—P&7)C'(d:7pT7Pl/7f#f7)^^©#^7r&7) To -P '7P777^7j -^, ^ #{iiaci:^j©##^-^hy\a#!iD^(t^o m/:, ^^©D-pv^7>7^-^#m± 1:±# < &6&^cko l:#7>7 & igtt, IE D ML, fo%-sLm£X±lzt£-DtzWi&lzte, n — P©/J\$l^©##3.-v b-\SE Dttl*7>o $61:, D—pl:^v^7:jrE77^<7$^#v\ &7)iG±, Ei#E#i$## ^ il/“'>3 >§ SPECint95 36%©#m o Ctl^T'l: IBM © Code Pack ^F/b5tE^ $ ftT & D , 55-63%^^ § 7) o D^Pb, de-compress 1: 75^1120 dit^p >&IE ' -a8©7°Dty7-e#, FPFu ©x< p;L^^#m<#%T$)7©'r, pp^. -y hU simple INIT a-y h£itEU 2o©7777 (ALU) #$)7)#^l:(±^667tINT#t #^T#7 - 138 - y T^trc SPECint95 Cck^7 0(iA#:03i), e. A- — 7 — h 7 h°— ^ : Kevin Kahn (A 7 7^1/ 7 A-y 7 A-7 Lab.) rDirections in Connected Computing for the Consumerj 1999 4M:(i 1,700 ^^0 PC ##mt:##U-C^6(IDC 07"-7)o 1 —#^T^&A0 61%, 2^#pcTt^A0 86%x 3^#c7O»6A0 97%^A>^-$yH:##^LTt^o C0ioC, —PC 0mBcgci:#30-C(i)k<, 3:T0$:#0##E& PC "C iti' — hi* & o lz tz % o C^Oitli^ ^ Av0 7 f. Register Organization for Media Processing (S.Rixner, J.Dally, B.J.Khailany, P. J.Mattson, U. J.Kapasi, 7 7 7 7 3— K Ac#) 7;rMyM-e(i, i#^c io"eir&0;m#<a& %>o Z(Dfz&, 1 yot V^^0####0#^(^A ^,o c^o, D^7^77Aih0^m, ###m0m#@m 777®#, ^rr(i, D^7^7TA;h0#^^#h%&, ®#, m#mm, mmmii03A&6i±## M 0, 7' D —W1/& 7 7 7 7 7 y A UT, 7777 7 195 ^01, 20 ^01, 430 #01 c c, ##0mT& 8%}: - 139 - v—'>a ysmuttz g. 
-eofifeOfgH -Y U y 4" A#® J.Torrellas 6> ti, ^Toward a Cost-Effective DSM Organization that Exploits Processor -Memory Integration j kSLTSSSSffofco 7D -fe y 4" © cf1 ifi tt > PDA PIM(Processor in Memory)7r kffllutSt^i£*>, PIM 4$) V'Tk'®J;di: DSM S$S1-nii, DSM AsHS-eS3; ^ k S SPLASH-2, SPECint95, TPC-D ©z<>^v—7 tC J; *3 -> 5 a Iz —-> 3 Dfeo il — X 4f— T.C.Mowry E>ii, E Software-Controlled Multithreading Using Informing Memory Operationsj V 7 h 4 :nT$lj®fflVA4"^ Iz ~y ir t viv«i/>Kll, y^E V iz^^y-y>-c kAsT-$5 kt'd #A^t®®, 7D-E y 4h#hK±, l/iy%^77d'A&*@< f SlD'-H^xTisSilUttSo *fc, 1 c©x 1/ -y P©^4^E,44^#*tcW:, yNI/4- x iz '>ft >tn,z&t) £tt-&®fI4 u*t'** s*iST*fe.5o ;h?.ffl)Sc*ll- rn-tr iitHEH, @t#©S© (ISCA’96 -Ctl3g4U fc A- -V -y Sz a = * * 3.-97 □ V => h zxfix^fcto®2o®vyx4ffltu 7T-a®lt5EiiaoE, lz y h* 4W *3 @x -5 ToiS (E*2 X V -y P/ rn-b'y+f) 4lg$U/c<, SGI Origin 4T7u*;i/h U SPLAS-2 C 5 1 U-->3 > X, 4/7 07 7') 7" —'> 3 >l:ot'T 10~14%®tt|g|6l±^f#e>ni 3/7®77'J!r-'> 3 >Bu i~2%®tttBti±;W9E>to3C k6«S5Ufc0 (5) SE&autfflfllr® 4-0®si*s* dt, #ic*#®e%-ea, stsi-e-r <-{®x-5 »E^A45%lcfi:t>nTt'?,tS4a5ttoTE!$U7c<, ^ET-tiy «x« rl;'®7D77Ai; M7D77 A40KtC^E>-&5 fciWtfJft'j tft EPowerPoint^Excel^Java-® ®7D 77 At ffflggX V y P As<$ftft-tV'3©;tPj E x lz y Ptt*60±® teiACIB f^e,n-cv>5,®A\ znt&xu s. >y±, jFc (6) 7D77AIS1 HPCA-6 ®yD7'7A4U.TlCSt-fo - 140 - Keynote speech Relaxing Constraints: Thoughts on the Evolution of Computer Architecture Joel Emer, Compaq Computer Corporation Session 1: System Architecture Tradeoffs Impact of Chip-Level Integration on Performance of OLTP Workloads Luiz A. Barroso, Kourosh Gharachorloo, Andreas Nowatzyk, Ben Verghese Compaq Computer Corporation Toward a Cost-Effective DSM Organization that Exploits Processor-Memory Integration Josep Torrellas, Liuxi Yang, Anthony-Trung Nguyen University of Illinois, Urbana-Champaign; Sun Microsystems Impact of Heterogeneity on DSM Performance Renato J. Figueiredo, Jose A. 
Fortes
Purdue University

Session 2a: Memory and Cache

Design of a Parallel Vector Access Unit for SDRAM Memory Systems
Binu K. Mathew, Sally A. McKee, John B. Carter, Al B. Davis
Department of Computer Science, University of Utah

Modified LRU Policies for Improving Second-level Cache Behavior
Wayne A. Wong, Jean-Loup A. Baer
University of Washington

eXtended Block Cache
Stephan Jourdan, Lihu Rappoport, Yoav Almog, Mattan Erez, Adi Yoaz, Ronny Ronen
Intel Corporation

Session 2b: Networks

Flit-Reservation Flow Control
Li-Shiuan Peh, William J. Dally
Stanford University

Performance Evaluation of Dynamic Reconfiguration in High-Speed Local Area Networks
Rafael Casado, Aurelio Bermudez, Francisco J. Quiles, Jose L. Sanchez, Jose Duato
Universidad de Castilla-La Mancha; Universidad Politecnica de Valencia

Investigating QoS Support for Traffic Mixes with the MediaWorm Router
Ki H. Yum, Aniruddha H. Vaidya, Chita R. Das, Anand Sivasubramaniam
Penn State University

Session 3a: Multithreading and Microarchitecture

Quantifying the SMT Layout Overhead - Does SMT Pull Its Weight?
James S. Burns, Jean-Luc S. Gaudiot
USC

Software-Controlled Multithreading Using Informing Memory Operations
Todd C. Mowry, Sherwyn R. Ramkissoon
Carnegie Mellon University; ATI Technologies, Inc.

Dynamic Cluster Assignment Mechanisms
Ramon Canal, Joan Manuel Parcerisa, Antonio Gonzalez
Universitat Politecnica de Catalunya - Barcelona

Session 3b: Shared Memory

Design and Performance of Parallel High-Throughput Coherence Controllers
Ashwini Nanda, Anthony-Trung Nguyen, Maged Michael, Douglas Joseph
IBM T.J. Watson Research Center; University of Illinois, Urbana-Champaign

Coherence Communication Prediction in Shared-Memory Multiprocessors
Stefanos Kaxiras, Cliff Young
Bell Laboratories, Lucent Technologies

Improving the Throughput of Synchronization by Insertion of Delays
Ravi Rajwar, Alain Kagi, James Goodman
UW Madison; Intel Corporation

Keynote speech

2K papers on caches by Y2K: Do we need more?
Jean-Loup Baer, University of Washington

Session 4: Software Techniques

On the Performance of Hand vs. Automatically Optimized Numerical Codes
Marta Jimenez, Jose Maria Llaberia, Agustin Fernandez
Universitat Politecnica de Catalunya

Cache-Efficient Matrix Transposition
Siddhartha Chatterjee, Sandeep Sen
The University of North Carolina at Chapel Hill; UNC Chapel Hill and IIT Delhi

A Prefetching Technique for Irregular Accesses to Linked Data Structures
Magnus Karlsson, Fredrik Dahlgren, Per Stenstrom
Dept. of Computer Engineering, Chalmers University, Sweden; Ericsson Mobile Communications, Sweden

Reducing Code Size with Run-Time Code Decompression
Charles Lefurgy, Eva Piccininni, Trevor Mudge
University of Michigan

Session 5a: Prediction I

Decoupled Value Prediction on Trace Processors
Sang-Jeong Lee, Wang Yuan, Yew Pen-Chung
Dept. of Computer Science and Engineering, Soonchunhyang Univ., Korea; Dept. of Computer Science and Engineering, Univ. of Minnesota

Branch Transition Rate: A New Metric for Improved Branch Classification Analysis
Michael Haungs, Phil Sallee, Matthew Farrens
University of California, Davis

Combining Static and Dynamic Branch Prediction To Reduce Destructive Aliasing
Harish G. Patil, Joel S. Emer
Compaq Corporation

Session 5b: Parallel Systems

The Effect of Network Total Order, Broadcast, and Remote Write Capability on Network-Based Shared Memory Computing
Robert Stets, Sandhya Dwarkadas, Leonidas Kontothanasis, Umit Rencuzogullari, Michael L. Scott
University of Rochester

PowerMANNA: A Parallel Architecture Based on the PowerPC MPC620
Peter M. Behr, Samuel M. Pletner, Angela C. Sodan
GMD FIRST

A DSM Architecture for a Parallel Computer Cenju-4
Takeo Hosomi, Yasushi Kanoh, Masaaki Nakamura, Tetuya Hirose
C&C Media Research Laboratories, NEC

Session 6a: Prediction II

Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors
Andreas Moshovos, Gurindar S. Sohi
Northwestern University; Computer Sciences, University of Wisconsin-Madison

A Technique for High Bandwidth and Deterministic Low Latency Load/Store Accesses to Multiple Cache Banks
Henk Neefs, Hans Vandierendonck, Koen De Bosschere
University of Gent

Trace Cache Redundancy: Red & Blue Traces
Alex Ramirez, Josep L. Larriba-Pey, Mateo Valero
UPC-Barcelona

Session 6b: Parallel Systems Performance

Evaluation of Active Disks for Large Decision Support Databases
Mustafa Uysal, Anurag Acharya, Joel Saltz
University of Maryland, College Park; University of California, Santa Barbara

Investigating the Performance of Two Programming Models for Clusters of SMP PCs
Franck Cappello, Olivier Richard, Daniel Etiemble
CNRS, LRI

Performance Analysis and Visualization of Parallel Systems Using SimOS and Rivet: A Case Study
Robert P. Bosch, Chris R. Stolte, Gordon W. Stoll, Mendel W. Rosenblum, Pat W. Hanrahan
Stanford University

Keynote speech

Networking At Home - Directions in Connected Computing for the Consumer
Kevin Kahn, Intel Fellow and Director of Communication Architectures Lab, Intel

Session 7: Novel Architecture Issues

Register Organization for Media Processing
Scott Rixner, William J. Dally, Brucek J. Khailany, Peter J. Mattson, Ujval J. Kapasi
Stanford University

Architectural Issues in Java Runtime Systems
Ramesh Radhakrishnan, Narayanan Vijaykrishnan, Lizy K.
John, Anand Sivasubramaniam University of Texas at Aust in: Pennsylvania State University The Best Distribution for a Parallel OpenGL 3D Engine with Texture Caches Alexis Vartanian, Jean-Luc Bechennec, Nathalie Drach-Temam Paris-Sud University Cache Memory Design for Network Processors Tzi-Cker Chiueh, Prashant Pradhan State University of New York at Stony Brook - 143 - i.3 = 7 1.3.1 «S =i>m--inz£z±Mm, siffliistrtt, m± ■ **#77> hD. #s-^«rfc*fl?®©sijmiri6iit'r*s»®S!iA5iiS$n'rv^o ^©fc»N Hpccmgh Performance Computer)/x — F >) 17 C 3 > t" JL —7 *sj|| — 7P -fe -y +1 £> 7)bf7D -fe y b"tb> < iigtcafe D . 7;17 r/n-fe -y+f3 vea-iMi. ESb tt64fS6$x%S S*Iig*C*oTU < Ck&^b71\^. bfrb&Ase>, v;v^-7n-b -y+h3 ytfji-j'->XT-AC.i;-S*36?iJ-fb3 >;H 5ti\ SttC*UTI4. *fST-S5 7n75 3. >7Blg»s#ST-S ti . #HT-@toTfibVIC < U b t k#ti\ M7a:T©@#&+a-l:3l$fbf C ?)l/77Pbyt3> (a#@#l:*T677U 7--> 3 VSgft^fflfgBW&ffiig tb) tt. A^o-c ®*mi0iei6IHEy7 7AW.L-amA"'V^l OiTSiB-^tm <2>21 ^ - l&’B.OM+n&ft -KXi a JWtStSKtL fid ir-TStT^BSfiviiyi. a*fkM*|:j-Vm;KKAiiW± ©rbv V>AH£?i|(baA^ ftifsetp-itfc ©3t?'Kb=i> W3o>ftffimmmo)W3zflm HaSIBS: f?'lxi#;5^f$y)3£?i|8tH^coili ( *(Sl b 73'J'T- V3> g tbrtdti-TciSro'ltifc) ©IXttlAPC, WS, HPCIZ^Stt^ttfA 7-=t79*V©t** nWTfm fiWX#WI% ;*1£«VLSW fcs^isk ^mmfiSk mwmm 0 1.3.1-1 aasEig • x»e6gaiBi:*mfiM5e©i$ - 144 - c©fc», 7-9-3 > if 3.-7 ->x# A©®!"## $ &itg± L, A"-og|8btt 6E6iei6-Sj£?!Hb3 >/W 7KiW©giJ%As;lU$k&oTV.5o iRffl7:®uJp-tU7'n'/7 5 >7'ltg£A;bk L, S£jfe©3fi?iJfb3 > ;W LtSt 4 6. A" LtoSt© 5> ti;fc@Jl#fit;#|i6(;:7'D y7A6fl-|!|-t-5 73-$-C*tt*< N 3 3Asgtttoic»jeA^®aStotc5>8!l VX-iryn. 
-'J>y6f7-5,7-3^h75r-A7V-Ajg|!)jfi?ij{b3>yW7ttrjfflE^IB%6ff3„ kft6©#^uK#C4 b , A#ft PC (Personal Computer), WS(Workstation), HPC ©EblEliJSAitk&S x'll/AT'D-tz 'yD-3>Ka-7v7#A0®©#-f$£|n]±$-9-.5 kk*C, SDattlbSfgtiSti'^, cm:4 b, fSlgaf85eiie»5t:* ‘v>T, *18 ft PC, ws, HPC rfi@©ggg • y ?SK»5tcH)iUAc«SEEfflSiJttl4Sto, PC A" 6 HPC CS-6-IS 18$ SB, #6©3>if o.-^&@mL#a^*&e7$<©m#, s e.cii, b*&t-*-7 (««) -sc $Ac, *««©**«, i«, fiKlc#T-So 1.3.2 r P*n>7 P #Mfb3 7/W (Dr prs>7 P»Jfb3 wt •f 7ftSi,*J;y-'(D3fi^Jfb3 7/W 7©tt|g|fflS«©gg%l:ot^TE^Ig%&SIS1"5o (l) 7P/^7A P#9Ufbn>/W7*%M% %*©#Mfb7 >/W 7K#H4, #-eS©H6?iJtt©*S7"D *7 AA^lftm tfJffl L Xi^fzo LA" L, n-pijiytf, '^77*? 70-t^tli'e,, SHSiELfct 7 Pl/S^i^-A, HPC->7t=-A k#«fb UEUfb LIT < 3 k, %*©# -*£6&MSk Lfc3£?!lfb7:ti:, +fl-&SI%ittSg6 ttifk kAs7:$t\ cf%6©/\— p 7 x7t;tii£ < g»36?ijfb7 ww 7f£S©lFSrAs#toe>ftTV"3,, 01.3.2-11;## 4 7 £, HPC'>7# A©14t£B:, SrVW7ft« («**fb k BfflW $ S©W$fbt: =k;6ilf*T'©f£lbi6i±) k, #)SfiE t^i' 9t|ffl7-^fl'ft) C±D, ^©ai6tt|gA s|6l±L-CSTt'5o -#, Hf1077 Ufr-->3 >S*lA"L7cl)?f©StgT*fe^*affitbtt, 0 1.3.2-1C ## - 145 - £ TZ (D 7° n Hr y —^ ccDcko^##®^a, ^;k p## ^ ^ > - ;i/-y - &*yo v mmn(Dmm%mm$}^\%& t sm'??v j &o Ctl^^Ds H^Jt4fb$:fo]±$1i:^ tt&lz, C^P'OTi^ }$o fc:x—if ^&/b>o fc x^v^yn-b y if rj > b° — ^ A©f'J ammaemmm ASCI CHALLENGE 100TF- (100TFLCPS/ 2004) VPP700(F) SX-5(N) 1 OTF ------(MPP)______- kSX-4(N) CM200(64KPEl y\ SR8800(H) 100GF Z\Origin2000(SGI) RS/6000SP(IBM) SR220KH) 10GF CRAY-C90 P26 00(F) V 3>tfzL—$1 ^ybRAY-XMP (800MFLCPS CRAY-I (150MFLOI 7-* fwcr 70 1980 1990 2000 2010 gi2#j1466 va>0)3gf7l466) -91614660)10-50% -MPR-e(*3~5% y.T©a*3„ a. giiT^fyi/'f >jte?u-(bft^i©iii% (Fortran, CH) T-® frtl fc V - 7 7" D V =7 A t & 0 , #M#:am@g*%KkT6 77 7 h7;i--A7 ';-&6*ja£?!Mbr! >;U 7©S#gSff ■5 o #y*toCti\ AS@fl-6m@tolci6ttJ»^iy Y > (*i@) Cfl-gllU 6«jto£ lbKFi%Sg%-f 3=, ;;■?«, 7-?m&ffi9mWu jSStoSUfSE, ®, X-yya-U >'f&m - 146 - Fortran HPF, HPC++, Occam, Linda, Id C, Pascal Fortran, Prolog, JAVA DFC, SISAL, Valid ~r — *? 
$5 M * IrI OpenMP^m, (##)!/ ^ t V ...... ------7 -y y h 7 * — A V i/ > SMP (Shard Memory Multiprocessor) i.3.2-2 r hviyx - 147- «iit a-Rm*#©7 oy?yo77-f 6ffl^TM®{b6H^MISS©7-7 y h 7 * -i>7 'J —ftafi^Hb^ji-——;i/©|!%&f7-3o El 1.3.2-21;r hv<>7 h >;W ?0A##aEI, Eli.3.2-3l:m%M*M#& sf. ai.3.2.ii:flsijerjtcHr-B^Mi$@6^v, «7©K#ciwf Ell.3.2-4~1.3.2-7t;^-ro #M4b=i>,W9 ------.x V—x3f □ U HPGJz-eSi$ 01.3.2-3 m. 3.2-1 r bvx>x h#^j §r^!4o » a ft © k ib tc li; ^ - -y* *s L& < T g Stj-r — ajj^-r^^afebo S4o x — V >*?j^#^##4o pfa. $>^v>tt$jBg»tsE »65*70^7Ap»tiS >;W 7 7 4 — H 9 is H ?§ih iz. fij ^J 4“ -2> #c##4 o - 148- (2) afcTlHbn WW 7©ttSgFfflfitflS®Sf%gB% 3 > £ a — f '>7.9- SPEC ^0Ufc!0-2>^>^x’ — 3©AS—IBtoTfcS A^ Cpt6©^>f-y —±Ica—p >>x7©ttfi6£jffflj-f&fc fetl*Jhfct0t!fe5o £©£©, am-fb3 >yH 5ti«©&JE»ttt6FFffiku-5|g ££i:liTg&V''o A>yy — ^©tpCli, %*©$ — *i@ j£¥U-fb3 >;H 7-ElJlJy\- p 7 ^ r © t;-7 ttlgCSV'ttSE ASjSjST- $ 3 t© (M : SPEC CPU95fp ^>^y-y© SWIM 70^7^ iSC jm-fb3 WH 5 As U < £>Si8)b Sff 7t ti a - p C ty±©#l%C Z bttib|p|±* 5HSlT:fe-5 ta-^^TAi;Lt0gft ffi^gWt bft'Ofx’-7T-H:I¥ffiT*S tiWtK ^^©y;uy-T"n-fe'7it3>t"i — ^ ^yyA©6filflec j3V'-ttttt|g|6)±tcW^-f ^ 3 >yw 7tSE©tttb6&iECfffl'r -5 ««©«* #-i:-g-e & 5 o SMP(Shared Memory Multiprocessor) ■> 7 ir AL, WS^HB^flS® r 7 HA >7 pam)b3 ww 7K#i©gg*j ic.tn>TBB%1~-5 gumu^yp'f >#¥U-fb#©N 6t©l:WT©R%^*&R9» UtmE^SSmStiA KTffl il t) O a. «3'J##b*m#@©^* jfc¥ij-fb3>yi¥ 7A^e3flgij«fb©Slb6l¥«f ^>^$6e^gs%'r^= g»3-;pyy W >3fiWbfi:Sn e«ttSlffftSn Stt^-j'fl-fR 77yi-'j>ye*^flsw«gsff - 151 - 1.3.3 HTi:. zmf&BmZmto (1) 7 hvt>7 t-3£?!Hbn>7W7ftSIB% a. v =7e.v»sma#t-s3i o CT&y? v h 7*-A7 V-&g«jv;i/fL7'V'f >36?!Hbft«&{i*'t•&„ Eizt"-Bo c. 
*E^gg%-c*ig%Lfcs*sffei-^fetoc^ m*$nfc*9!iffia3aifigttsi® ff-T-So E^|g%Jlg rjfe^ij^bn >/H 7 0tt|gfffltt®ogg%j tj; ■o-tfB/ESnSfffl^SSfflVi-r, #«©S&3)*»fflSMP(Shared Memory Multiprocessor)'>XTAICj3V>T> 5$S©S—SSTcDilt^iJttSltiHSff 7 gHj3£ ?ij-fbn >;W 7©tt|g£ttSt l, 2 )gjy.±©6El6|±6il)S'f 5 c t 6 I^fc-r-5, (2) #MYb3 >/W 7©e##@iK*©m%N* ry pm>x b#Mib3 >/w ?ft*©m %J ©ifffl©HJ(fi§a tt. SMP->yy A6*ts> k LfcatM-fbn ww 7© 1.3.4 ffiftfflftfoi&l 7 hvi>7 b3£?iJlb3 WW 7ftiri5i*&*iit4CafefcoTfflSlf^l8J6«:©l ($) 6 KTICyfo (l) #*@ag $to»E^8g%©lE*61l-s<, zcDtztb, (7n yxy b V-y-) ©PCS • # • *«# • *ttoCE^gg%6HST^6a; (Stia 50 - 152 - (2) O) m'pfflpim MQwmM&mmtz (4) *nwafais©av' MUS4=t;*4bfe*n«Meistco^rtt s f t, RiftgtcS *0 k't^o - 153 - * 2* 2S >ymm 2.1 ms f\ 2.1T*lAt$(9-tS(3>Li — -T'f >yirHb-6E%Sg#S©Stt6 $ kto-5„ Sic, 2.2? z:/? Hc«L?H16l8$ef7o/r®$S$kto6o ## 2.3t;i3UTi£«^it3>ti-7i^ >ySEA ijSfflW*B*T7 -';'?-'>g >#»£ -3^?#MUX:0m&7nf o j£«»ic3 >ycMt)58f^sg%©®tt?tts k Lt. «^SE$g^»^0r6t-C>CE^a%»sfi;tonTV'^ rNinf7"n -7 x 7 f j N >7-'|'>737F7777T$,.z> TGUSTOj , *5 Fffl rGlobusj S5tb±lj ‘\ SttfflfMSfB&i&Stf' 1--5o £S«dF!(3 >7'to«KMt3to»i|j|u]I0*f;::fcOTB\ M^SI^ISSIS m^mrnmmn, jsiwjisaneiMnyfa-f-t > tv >777 f^7 •7-k rGridj ©. #lr*BS*-C>k LfcSttlcgat--Si8$Se,-StoCff ofcE$&$ k to^o w*7tt, ®t»I*ox-;f “3>i;i-^sifiiltt*»;tIt:|tJ^l rSupercomputing ’99j. ©Grid -&S;©3>V —7 7A r Grid Forum j. ©Java 6ffiofciWttfE>16 9 lJ4)'iS(3 >b°rL — 7 J rJavaGrande Portals Group meetingj. 7*7 $ 7 F fitlnl £ SI ?U • jJS( • J[£tj8 Jo <£tfflittlul+StCHBf JeI rinternational Symposium on Computing with Objects in Parallel Environments _K ©UC Berkeley © D.Culler |tj§A5[t,,C| k&b^-^W^ptj S'11(3 > t° n. —-r J > 7* d” > 7 7 7 Ibi'f-tS rMillennium 7Dyi|) h j, ©l£tl53-i(3 J L^lC^LT^-dk^y F ffl®S!l£IRfct r Globusj, ©NCSA tfd'i'itiotlSiS ^##1%#© Grid ##7D 7x7 F ? (KlSfl-En > e^-T- J >7®*#k UT HI" 3 7 7" U 7*-7 a >fl-SF:o®T©to WCdot\TK. 
*^to».®.-a-»3SRffs ¥ffiE*#s SisxStofL - 155 - H%6jiito'T V-Z> rl£®^&i!C3 > t’i — =r *< T-6 -ir — ix 3 >©JBfgj SIMilT *E%gg%T-ti, T7-'Jtr-->g>kUT SDP LTVSo SDP ®@(¥iE stt7-d ^7a)«, #m#:+6sissiji:*?>z u mms lC5Rto-a^@T-$.D, %#©##$UM%k'0A^6s TV'.&o LA^l/x E^©#S®lEI+»k'ICj@fflt--5fctoiCtt1 #-©*-71-3 > t"a- %;+#### o, c^*-emmfb#?w#g-c6 7t. c©6&. sdp R@&6#a-#K3 >yc*ftE$itT@ lWjSI+*6^S1"-BC k-c, cii$T- c k&gmLTt,^. sdp ngeoMsybcjsvrtt, sdp 7D^7Asss®^7^-^sfi vt . 6 ki'3^m&k^6©x & wi^bt^T-s, £ia»a3>bi-f^>yffl©T7"vtr-->3>kVTjsi"5ck jfit>fr^>fZo - 156 - 2.2 2.2.1 iSaM^VKi-r-O^OStt J>£« #g( 3 >t" h* y h 9-^R#©* M L, x — ^ "C& C Give & Take £ £ b rSffl £ *1 3 MMY & Z> aG'OCD^ ;xij7^y h7-^^n->!j;i/xij7^'> h7-^S:^:a U/z^D— t LTJ>£ —b&E^ • Akenti http:/Mjy.w-lt^lblgDvZAkejiu/ Y A ]) ti ty;kft • Albatross http : //w ww.cs.vu. n I/a lb a tr o s s/ ^ ~7 > & f bo • AppLeS http://apples.uesd.edu/ ~7 ^ ]) il yy V y —:/ a > — V >y^:#B L/zE^yo S/o: /? bo • Condor http://www.cs.wisc.edu/condor/ 7^ U A ^mE@^fi/z$m(D9-^y7'-i/a bb#&fr&o • EuroTools http://www.irisa.fr/EuroTools/ 3 — D y y @1 3 — u b 7 — Y t Z> fctfXD'T’u i/xY b 0 • Globus http://www.globus.org/ T ^ ]) il T A U MfeWLk^^tzYut/ xt? bo y D“^'/i/3 > b° jl—r -r > - 157- • Grid Forum ki&vJMwmgnMDXumjixgi ^D—>tf 0.-7"^ >y^7^A0#$ • IceT http://www.mathcs.eniory.edu/icet/ T 7 U tl • IPG http://www.nas.nasa.gov/Groups/Too1s/IPG/ 7 7 V * f 6#^yDS/o:/7 ho • Legion htlp.://Ieg.LQ.ni.Y_irginiaJ.edu/ T 7 ]) ti 7^%vn >/9 ——i> 3 ty f ^ c aco-r ^Av7 h^^yyo^j:/? ho • NetSolve http://www.cs.utk.edu/netsolve/ T 7 U ts -f7>h "th—/^7=";i/C^'d • Ninf http: /In inf. e tl. go . i p / B ^ f 6C acDTr ^6S/7TAo • PACX-MPI http://www.hlrs.de/structure/organisation/par/projects/pacx-mpi/ Pd' 7 //D-wi/:]>trMPim#7^Vo • UNI CORE h ttp: //w w w. 
kfa -iuelich.de/unicore/ K 'i 7 C^l6(DyD7ai<> h a&U'Tia&^o f ^t^^T(DyD^a:7 h^f ^T^fUD%#!j&@oTl^o > U:L—f-^r j; D, f ft £13 2.2.1-1 \Z7Fto - 158- Application Problem Solving Environment Science Portals Environmental Chemical Cosmology Molecular Scientific hydrology engineering biology instrumentation Nanomaterials Application Component Architecture High performance middleware Web CAVERN HPC+ + Condor Resource Many Numerical tools soft brokers libraries Worlds Legion MPI SWIG NetSolve DAGH Ninf Architecture Components Accounting Communications Information Scheduling Fault detection Security Instrumentation QoS Data access High sgeed networks and routers Resources CF a2.2.i-i if D-ivinyea-fi xxa t a > 5 W??;!)* US/v (eft*3 #atbfc9Us Utt'-So ± SB U -> t A A © # < 6 s c© 5 •k5. n/'J-Ir-va >©ff *EIC|6*t'|Sg©^Sfllj-S: UT (1) Ninf: ^7U t' M9> 5 T-it - n > b a. - 9 (T ii 3 ^ X 9 '> T t U & k"©*tttg8t@ i/X^Atts ■?■ 5 Atl & tl5 & ©Tti&U'o B*TAfjT6©S/ATAA UT l11 5 dfeHs 5H^x#^s Aj tlT t'' 5 o J6fl^iW't3:tbstW7/X'T‘A©^^ltj:s - 159- a^w#a^6o Z=A\ Cft(d:^bTW:^<, y^ V .UL |5|#& API tcmO^C^COy-f b^^ c^7c0%7k^6M#^yy tyf-&$6 wm-r^^o yyyA. cm b^^(:&^#f##m(PC ^9-^yy-ys >)±t*ji—if-r >y 7 x-7^ti#t u ;^3 >&¥)%m ffl if & '> 7 :f A 0 d. b £ Datorr(Desktop Access to Remote Resources) 1.1'' o o Ninf (i c®Datorri>y^A(Dic-e&D, #% E^^rUxTU^o Ninf ©M^^TAti^7^7> b /ft-7^ y )l £S”3 ^T iSHf $ m2.2.i.2 Ninfy7^y>K Ninfy-;^ 7 £ tf — M'® 4 y □—/Syi/i]z/tf .%—T't^y NinfDB Ninf Register Ninf Ninf Executable 1 Internet Computational Ninf Executable Server Ninf Executable ervetj Stub Ninf Af/nf Client Library Program Procedure Ninf Stub IDL File Generator Program -3- — If U: 7 "7 T > b y D 7'77 A £ (X C++N Fortran^ JavaN Lisp h V'1 o coycy^ < Ninf ——ys>#)±7!#r#L, ^y^7>b^^(DU^3LyM: L.-ecD^^&y y^y> bc^to 7 7^77 b#Jy(4: Ninf_Call 7$:#^ LTy-;^C#LT U 7%-7 b & - 160- mmtzo m±uc8i®i^ Ninf_Call("foo", arglist); ktWMSiWtil L4f?&7 k, X 7 M r7-i'±CMia$li'n'l. Ninf -tt-x©* f L, f 0»-/i±07l' 7" 5 V"foo"4 WTR t. 
mex att-x4%xLtt'@Ai:ii, ninf://hostname[:port]/funcname ©iWC, URL tcapLfclB?iXSIfi:1W'9--x-©;k7 M, *- MSUW -f 7*7 <) 4fl£ t%o Ninf_Call ©ffetc*, !+S©%fi;kilS$©Efi6fl-fflLTff*7Zci<)©MEJV h 7 > tf 7 -> a > ® a * * 7 tz isb © M S * if tf it #t$ ix X © 5 o c © i 71: Ninf 777A(i, f:t®»-U7 4ig#f ^#j(»-/\)j c*fu #7L©% l(^7-i7> M)tf 6lt®#$&*#L, k tz^mt 1-i6->X^A7?dfe5o Ltf t, •9--UXA siex.7c@'6'T-*E*© SunRPC Skligft D . ft- 7 t fl] ti it T- S * tf 'n x. 3 o (2) PACX-MPI: ^n-/vua>\^^~7-'f XfSSEItCSIScFnfc MPI y 7-t-y*m&m\ 6#M7'cif 7 5 >746«A-m77 7-A±(:mMT-5tf&tc# $pl)9& MPI 4i£5*Lfct)©tf PACX-MPI X&6« MPI li#*##*)Xt,7 kt$ <# fflSftTVSy yfe-yafi^-f 7" 7 U XS> D, C Sl§^ Fortran & k’XttJfflt-5 k k tfX#e. MPI liMRlIWltStley-^fr 7 7-7Ci£U£#?$-r-y 7-b--7ie{14fi: 7 - 161- Global rank In-daemon Out-daemon Local rank In-daemon Out-daemon Global rank Local rank 0 2.2.1-3 PACX-MPI PACX-MPI MPI &V-7 JibMPICH-G(MPICH on Globus Device)b it /v 7: l£ M (3) Globus Metacomputing Toolkit Ninf ^ PACX-MPI & b;b7o:T#l:{&@f ^77TA#T^^^)lC(±, 3L- Globus 7° D 7 ^ 7 b liM^J • % 7 h 7 — 7 N bj-a !J bV>'o/c1il5 ,&^ 6 70^^7 p-c$)^o b 7 H:#ADbTb'^o Globus 7D7 J: 7 b 1 3 (CN Globus Metacomputing Toolkit(J7T Globus Toolkit)#5 ifc 7> o GiobusToolkit (±3.-trmmEi/7^A, m#3^73ib ^0(3 7 If ^ - T 7 7^77TACD##1:^^^ 2 H7##3i 0 (toolkit) 7? 'b o Globus Toolkit #5 T % 7 — )V & I"* T JGV 7 ^^b( ^ bbb 7 ^ 7JH ;i;i/37tfn.-T^7^77TA$:##T6Ck#5'r^^o #lx.Ub M# Globus Toolkit MPICH ^^#b/z MPICH-G(MPICH Globus Device) & £ o luidi® Ninf & Globus Toolkit #5Ji#JT £ Mff!3 ^ 7 3 V ^#b/:^-73 7^^#5#%T6o COTol: Globus Toolkit (±jA^^m371f^- bf7^^#bT^D, 77 b 7 7 37b 3 1998 ^ 10 G ^ Globus Toolkit - 162- #2.2.11 Globus Toolkit ©3 T# — ex -9-- ex £ lu « m #isea GRAM 9 V-X©g!l D ST * a a Nexus Unicast/Multicast Mfitt— b*X t»« MDS -> X t" A ©«£* ct t M t -3 tSfg'N 0 T f -b X -b * .n. ijrt GSI authentication ^t'0-fe^rJ- t) fr -f +b— e X ttema HBM SBr-XT 7-bX GASS x — i"\© 9 T — l-ri’-bx#- b'X SlfrX xX iPta GEM Slffx T'Ol/©«$. 
V y XT'doiU'BEB Globus cn&©+»■—tr^ii/i>Bicj®dtibsij count'sceast* 3 - 163- Access Protocol) k ID 7 t7 ^— 7 ("T 4 V 7 b ]))\Z.'T 9 X't' %> tz &)Ot®*P 7 D b 7" —7 a >7°n7'7 < >7X >7 7 x — 7 http://www.globus.org/mds/ 0^-70 GCI 77V7b&il=amfflU\ Globus b%Ib #b#W:mf #&c a#-?#&<) (4) £fc>0£ CCC^g^U/c 707^7 b(±c:<—gpco^T&^o 7 U^.—7 ^ >7(:^ frtlZX'&Z o $>®\£ Grid Forum #&£<> ^ CT' Grid h 7 <7)U:j£fi^S(7 >b°zL — 7^>7"£>fcffl©7D7:ii7 b&UCRf f xrlc^^cD 6 U k 10 U $^t/co #An#(± #&07-jr7777i/-7ia-na#9 OT&&0 • Scheduling Working Group (Sched-WG) • Grid Information Service Working Group (GIS-WG) • Security Working Group (Security-WG) • Remote Data Access Working Group (Data-WG) • Application and Tools Requirements Working Group (Apps-WG) • End-to-end Performance Working Group (Perf-WG) • Advanced Programming Models Working Group (Models-WG) • Account Management Working Group (Accounts-WG) • User Services Working Group (Users-WG) >Un.—T--r >^PI?t^^0^y£T*li"FaB ©$£>#& D > tuTftS 7 X* n — ^ x. -2) o • 3/15-17 7" O — A )l k 7 7 7 7 U > h° a — X 4 > 7"k: [H11~ £ 7 — 7 7 3 7 7° (WGCC2000)' , • 3/22-24 7" U 7 b 7 ;t “*7 A(Grid Forum) 12 ^ 3 4 7ir 4 rb • 7/18-21 4* 7 7 — * 7 b*>7yl/>7 (INET2000)\ ii/P>7>f 3 - 9 H 7 V 7 F7X--7 A. • 11/4-10 SC2000\ ^0777$ 1 http://www.trc.rwcp.or.jp/ 2 http://www.sdsc.edu/GridForum 3 http://www.isoc.org/ 4 http://www.sc2000.org/ - 164- 2.2.2 S^M5j$ri&fnJSl3E : Grid Forun 99, U.C.B., JavaGrande Portals Group Meeting C&tf & tiSS/lnj A;i/3 > b°iL— y 7 >7", Grid ®, £ 41 fob b 3 !H^£ £<£ftjC f®^^, fTrC Grid Forum anfw:ti6 Grid###/:#)® IETF #f&®ej^^j:77 -<®#^##'r##^ti/:7777&, —7-e^AL, 7^ v77%*^6 PDA Web #2:"®^ H7 L, -B ®±®yyV 7 —i>3 ^#Ag7—Ify tlTV^o G tlGCDib^^tb^, ##g®^7 H)/7^3 >P#;0^#® 0 — ^7 — (1) The Grid Forum fK0H ¥$1 1^1 0 0 1 80~¥$1 1^1 00 2 3 0 (60^) The 2nd Grid Forum Meeting (1999.10. 
21~ 10.23) '>*d ’ffG y 7 U * Grid Forum( 7' U v F 7 t “ 7 A ) (http://www.gridforuin.org/ )U:@j£; # ###b #t #, #1# Grid Co^T®#&^®3 >V —7TAT&6o 1999 ^SC&^T NCSA, NPACI, NASA, DOE ASCI & 1999 ^fFCA® 7-773 CCDot,, 10/21-10/23®^, )Km7^^CT 10^1AI±# ADD, ###C#imL, 0*®Grid^#®#fq|&m^L^o #xH® Supercomputing ’99 4b, BoF (Birds-of-a-Feather Workshop)/) 5 HI #^ti, The Grid Forum a v7®#^^et)a^o The Grid Forum ®7§Wj(d: Internet Engineering Task Force (IETF) —7 A It l^/)5, IETF #1# Grid Co DT©7t-7At'fel,o IETF k|a]#, «lb®}Iffl&®$i]7£C £ o X Grid ®77f 4 tAx 'i £: 7°U B: — v 3 > G, ®)##C^3G»T (d: r^tft® n*^(rough consensus)j b ^ n — F (running code ) ® H >f< (o ;£ 0 , #0", ^ tOt ^ 17, ^G'OCai^^^aLTV^o The Grid Forum (±, #%, l^T® 9 ®7-=t>7"TOk-7(WG)T#M2f!-t!7ao ztiztuD wg ii Grid®mij®###ma&@^ o, m#8yC(±0#^tm#Ib^#(RFC &*> Draft Proposed Standard)® Ufe* 0 life U TC-BtlB-fl® 7 — A-> 701/ —7°® Chair, mailing list, webpage, 4o ck 77 PFl ^ ^ ^J #^7 %> o - 165- (D Scheduling Working Group (Sched-WG) Interim chair (s): Bill Nitzberg, [email protected] , Jenny Schopf, [email protected] Email list name: [email protected] Web page: http://www.nas.nasa.gov/~nitzberg/sched-wg/index.html — V ——713 Grid (2) Grid Information Service Working Group (GIS-WG) Interim chair(s): Gregor von Laszewski, [email protected] Email list name: [email protected] Web page: http://www.mcs.anl.gov/gridforum/gis Gis-wG a Grid ath-c XML (3) Security Working Group (Security-WG) Interim chair(s): Randy Butler, [email protected] . Andrew Grimshaw, grimS-hxi_w_@„v.irgi.nia J£.cLii Email list name: [email protected] Web page: ??? -te n. 'Jr>f © WG tJ Grid & I^EE(authentication)jo J: If qj (authorization) Gi §lj ih — PKI I'Zid' >7 (D Remote Data Access Working Group (Data-WG) Interim chair(s): Micah Beck, [email protected]. Reagan Moore, moore@sdsc. edu Email list name: [email protected] Grid heterogeneous iZjiZMlZftWL U fz remote data & ~T ' *? 
* 9fai& M&> fz Data Grid tlx at# Grid i:©jft W t iEfScIn (transparency) IZ J; C - 166 - Mitttemmibhzmt %), o Data-WGTUu Ztlt>(D& (D Application and Tools Requirements Working Group (Requirements-WG) Interim chair(s): Fran Berman, [email protected] . Bob Hood, [email protected] Email list name: [email protected] Web page: http://www.gridfbrum.org/www.ncsa.uiuc.edu/People/novotny/apps/index html C® WG TUG V /r—>7^ Grid — \ZftLT ¥(D£o %o li®77 © End-to-end Performance Working Group (Perf-WG) Interim chair(s): Mark Gates, [email protected] . and Valerie Taylor, [email protected] Email list name: [email protected] Web page: http://www.dast.nlanr.net-/Perf-WG/ Perf-WG U: Grid & WG T&&o a < (2) "En* — — (3) (4) © Advanced Programming Models Working Group (Models-WG) Interim chair(s): David Bader, [email protected]. and Craig A. Lee, [email protected] Email list name: models-wg@gridforum,org Web page: http://www.eece.unm.edu/~dbader/grid/ Models-WG Ji Grid ft7 7'J ©-'> a - 167- ® Account Management Working Group (Accounts-WG) Interim chair(s): Tom Hacker, [email protected]. and William W. Thigpen, [email protected] Email list name: [email protected] Web page : htt_pjl/www_lnas_,_nasa..go.Y/~iM^pen/_a_c_co_iLnts-.ws WG (D User Services Working Group (Userserv-WG) Interim chair(s): Rita Williams, [email protected] . John Towns, [email protected] Interim secretary: Email list name: [email protected] Web page: http://dast.nlanr.net/UserServ-WG/ WG $ 3 — D y :fe U £7" ]) y K©rS WjtLX^ European Grid initiative(egrid; http://www.egrid.org/ ) v7"(± 2000 ^ 3 B 22-24 (Dm, J (2) U.C.B., JavaGrande Portals Group Meeting fKBfl : fhRl 1$U 2B 6 B-fm 1 1^1 2B 1 1 B (6 BR3) IMS : y^' —7 L — TfU +h> 7 7 >7 77 rR,T7 V ;£? 12 B 6 B^6 12 B 10 Bomm, > 7 7 > 7X 3 ££ & tf UC Berkeley 0 EECS (D David E. 
Culler Wk, *5 - 168- LTBieL'3o $,.E>„ ^©fe©Cli^**g+*i: I/O ©tgXnb sSSCr X-bX njggT-fcS-il-S , |+S*f*©5 :'"-Xjp|+**©^ 5 ^ v-->3 >, t Ti'SIsJiSki' >XX X -> 3 XT-SS&S^ifcS,, ^©J;o»iSte&t6#lt'-5* m#^xxf-A©«#&«, $c»B3m#©3>^-*> hfr'ffiscjSELTX^'i'y > h<. ^-©gei&xxxAk v,xmw Tifc-5 i;#x-5(L'Stt* < *3o t? L3> SStt^tienfflyxu^-Xs >c*kM'g &dS( AX t- Al:iot7D/r- Uh5, k © - 170- • Berkeley Multimedia Research Center • lnierne.t.Si;ale„.Sj„s.t.e„ms.E.e.s.eMeh„Pjx>Mci- • Digital Libraries • Computational Astrophysics • Reconstructing the History of Life in Integrative Biology • Computational Finance • Chemistry • Civil Engineering • Economics • Ge.olo„gy.„„and..0.e„Qphy„s„ic.s.. • Parallel Computing for Optimization and Simulation in Complex Manufacturing Operations (IEOR) • National Energy Research Scientific Computing • Mechanical Engineering • M.ath_e.m.atic.&. • National Airspace System Simulation • Patient-Based Optimization and Treatment Planning for Neutron-Based Radiation Therapy of Brain Tumors (Department of Nuclear Engineering) • Physics • School of Information Management and Systems • Technology CAD Berkeley VIA CD ^^[4], # —bl/v^^CD^>f-T — 9 £>tl& [2] [3]0VIA Cornell I### cF tlfz U-net £ l'* o Active message CD f§ M ^ © if! V dr — y '> > £7t CG Intel £ CDIdJfiiff%Tr User-space low-latency ifBiiy y V 'y O £1*^ O & CD 1!\ £ft VIA CD fill % & fr o X O' £ o Millenium T: li Berkeley HE CD VIA £ L"£G NOW "Chi O C tlT O £ O h^'y hy — PT&Z Myrinet D v # [##%#] [1] REXEC: A Decentralized, Secure Remote Execution Environment for Clusters. Brent N. Chun and David E. Culler. To appear in 4th Workshop o n Comm unication. Architecture, and Applications for Network-based Parallel Computing . Toulouse, France, January 2000. [2] Architectural Requirements and Scalability of the NAS Parallel Benchmarks. Frederick C. Wong . Richard P. Martin. Remzi IP Arpaci-Dusseau . and David E. - 173 - Culler. 
In Proceedings of Supercomputing '99, Portland, Oregon, November 1999.
[3] Millennium Sort: A Cluster-Based Application for Windows NT using DCOM, River Primitives and the Virtual Interface Architecture. Philip Buonadonna, Josh Coates, Spencer Low, and David E. Culler. In Proceedings of the 3rd USENIX Windows NT Symposium, Seattle, WA, July 1999.
[4] An Implementation and Analysis of the Virtual Interface Architecture. Philip Buonadonna, Andrew Geweke, and David E. Culler. In Proceedings of Supercomputing '98, Orlando, Florida, November 1998.

• Ninja Project (http://ninja.cs.berkeley.edu/)
Millenium Ninja — —a C JLLTV^o PDA Ninja A scalable Internet services architecture. 13 2.2.2-2 #t#*maLT(D"Base"WA ^/r-^yil/^NOWCDZo^ persistent u, workstation PC PDA, Units — #1: PDA T L,/7^T7> P#6D3 — P^mb^T/zU®/^ —^t#/z%V'oC(Di#'&,Base^6 Active Proxy > P P^y^>D- P^fl, Unit Ninja :
• Structured Partitioning of State: o Ninja 0 T —A" T" Z7 V U persistent &tAS

Staff and Visiting Researchers
• Reiner Ludwig
• Luis Barriga
• Junichi Hagiwara

[1] Jaguar: Enabling Efficient Communication and I/O from Java, Matt Welsh and David Culler. To appear in Concurrency: Practice and Experience, Special Issue on Java for High-Performance Applications, December 1999.
[2] The MultiSpace: an Evolutionary Platform for Infrastructural Services, by Steven D. Gribble, Matt Welsh, Eric A. Brewer, and David Culler. Proceedings of the 1999 Usenix Annual Technical Conference, Monterey, CA, June 1999.
[3] An Architecture for a Secure Service Discovery Service, by Steven E. Czerwinski, Ben Y. Zhao, Todd D. Hodes, Anthony D. Joseph, and Randy H. Katz. Fifth Annual International Conference on Mobile Computing and Networks (MobiCom '99), Seattle, WA, August 1999, pp. 24-35.
[4] The Ninja Jukebox, by Ian Goldberg, Steven D. Gribble, David Wagner, and Eric A. Brewer. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.
[5] A Document-based Framework for Internet Application Control, by Todd Hodes and Randy H. Katz. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.

b. JavaGrande Portals Group meeting (December 7, 1999) (http://www.javagrande.org)
JavaGrande Portals Group Java tf'O h LT lb > b° n. —4 > [§|eN “ The JavaGrande Forum distributed computing working group ^(Grid 3 IETF 0Z3# chUGS^PfbS: UTUt" The Grid Forum (http://www.gridforum.org)0 IT

Table 2.2.2-1 JavaGrande Portals Group meeting program
Duration | Topic
Start 8:30am | PART I: Introduction
20 minutes | Datorr, Computing Portals, and Science Portals, Dennis Gannon, Piyush Mehrotra
30 minutes | Science Portals, Geoffrey C. Fox
| PART II: Talks by Industry and Researchers
30-40 minutes | iPlanet, Sun Microsystems
30-40 minutes | e-Speak, Hewlett Packard
30-40 minutes | Ninja, UC Berkeley
30-40 minutes | NPACI Hotpage
| Lunch
| PART II: Working groups
overlapping Working Groups | Group A: Computing Portal Frontends and Architecture. This might include discussions about the Computing Portals previous working groups, as well as some proposals that are submitted to the Gridforum in the area of the backend.
| Group B: Industrial Portals. This might include discussions on how existing technology can be used and enhance Computing Portals in existing/future industrial applications and frameworks.
| Other groups upon suggestion by the community
| PART III: Summary

^@(7) ^-7"4 y^xit, y v 3 wo c tt$t$)T±) X ^TXMtf) ^>tlX c? fz Grid 4o

An Object-Oriented Framework for Parallel Simulation of Ultra-large Communication Networks Dhananjai Madhava Rao and Philip A.
Wilsey Computer Architecture Design Laboratory University of Cincinnati - Cincinnati, OH, USA ARAMIS: A Remote Access Medical Imaging System(short) David Sarrut and Serge Miguet Laboratoire ERIC -Bron Cedex - France Language Interoperability for High-Performance Parallel Scientific Components Brent Smolinski, Scott Kohn, Noah Elliott, and Nathan Dykman Lawrence Livermore National Laboratory - Livermore, CA, USA A Framework for Object-Oriented Metacomputing(short) Nenad Stankovic FSJ Inc. - Naka-Meguro, Meguro-ku, Tokyo, Japan Tiger : Towards Object-Oriented Distributed and Parallel Programming in Global Environment (short) Youn-Hee Han, Chan Yeol Park, Chong-Sun Hwang and Young-Sik Jeoung Korea Uni versity - Seoul, Republic of Korea (3) ### HPC by - 180- n y bn.-if- d * ‘,kU''Grid:3ybn.-A-f >ytji5!;i'i/;. $l::t#©yr b A-y a y©A& 61\ d- y -y h ©(*❖* J-—if Cft-f SA -KXtSfttSfeiiKS^'n'?-^ HPC, A5»$Etot3t,. iS3pmCSe6«MsiW* ofeo LTtt, ?7X^3>lfa-f-( >y?lt b TFLOPS, Xf'jlT/iO, P /W h»©-*IBfi*liix.-5 * 7X^sHS®if#?*jntl>5, $ fc, Chien © A *§« t « » S 7J H T* » + gflops ©*mte###*w#r& t,, ^#©#fu #+##©#%%%:#& ii^&^tiTtySAL *>A sH©y — * —lixvo >S±^*T-$> 0, ttlSttiSiiTiy S b ux =fc -5 o *fc, Grid nybn.-Ad' y^ti, d1 > h ©S» k Sa&MiifblcfiH', TFLOPS «©§+*«, iifflST -^^/W MS©* F V Gigabit S^SfS Gbit © WAN kC«toTiWeig|+®* ‘d:tf€©5:-'-j'-fey b StSSStcjS# ■r^fctoffllted'A-efe^*s, x-tp^y^-r 7®'Mm^£.z>'mmw>mt\n, ^»©-by-9-©vni/j'd - am , Azv Digital Library ©A9/W MS©?-£ ©A'- f ^-XM, lt*«4alBTffi-5 bt'y!gt;tttt*o-C*3?>'f . ^*©Itffd' >77©iittj:?,5J 8&tt* sfcSo *Brti;C© 2-3 NGI iSAW Internet2 % b'©@A#m%@ftd' yX-r y h b»P(SLr, SSCa- h\ 7 7 1, DB, * J;tFA Wft-f >770Sii|iIlflTl'-5. 
7D77At L T & 133 ji E © Alliance, DOE© DOE2000, NSF © NPACI, NASA © IPG, DOE/DOD © ASCI DISCOM &k'#& b, 5i A&E?Eil4/I>s7} ffjiAS ftoo ai?> S <, M A H SDSC (San Diego Supercomputing Center—NCSA fcHfcV NSF © 2 *X-t13yea-f-f y XA y £ ®-o)T*tt, 50% &l±ostwmsmiz npaci cgbiAHtr Ab, NASA &Eg%m©xA3 'sltt^T Grid © V 7 h L, G/75 (CjglS'f S C. i; SAMftC LT © So Crawford MffliSOidt, ASCI T* H AE^lft £ TFLOPS #^(,4*# PFLOPS *©H-K«£l®»8!ILT^<-S, E^Sf®$ 100Gbit E©Bgi§-fb* y h V -XTISSL, Grid H*&ffaite$ilTTV'^<, -AfctfH© X-*, A#, jbATT E^Srttc©'$n*e.ttS^:C8ti)Bd:nT* b, hpc * au ** hpn ©m,#t:A < op (Sr* s f t; t' s © *ssttr $, s „ Kifflidiy 3®ttr*(d;t>* 5Bttt'*KP#r/oy±r©%* Sfflrru©tt|g|cg±©*tCy< —, E%#, +f©MiC,'* si±* snr*b , f rtisisa^wtcftifift hpc • hpn ^©AyX'd - A77 bt*bsnri3b, s e>tx hpc 03Uzf t LTV'S©»sSttrafe So -tiJ6¥<*B©Stt& HPC jbcktH yX-*y 103U-f-(MiL, fflS ffiA'J'SrfcfltfcoT Network Supercomputing ©Hr Lf''ttt^C[S|lPfeE5SRB^4b d" yy^sfitciqiA^o^tgrifeSyo - 181 - NCSA Supercluster f TIZ production use Tf 24x7 T'lsBb LTD , ^ Origin, T3E, 10#jg V'— ^7 cF tifz t)Ss C C T'JJ 2D Navier Stokes CD Pi $: tf o ^^07 7 V Zeus-MP (256P, Mike Norman) ISIS++ (192P, Robert Clay) ASPCG (256P, Danesh Tafti) Cactus (256P, Paul Walker/John Shalf/Ed Seidel) MILC QCD (256P, Lubos Mitas); QMC (128P, Lubos Mitas) Boeing CFD Test Codes, CFD Overflow (128P, David Levine) freeHEP (256P, Doug Toussaint) ARPI3D (256P, weather code, Dan Weber) GMIN (L. Munro in K. Jordan) DSMC (Ravaioli) FUN3D with PETSc (Kaushik) SPRNG (Srinivasan) MO PAG (Me Ke Ivey) Astrophysical N body codes (Bode) Parallel Sorting (Rivera - CSAG), 10.3 GB Minutesort World Record AS-PCO MM Perfemmride»S3 Navier Stokes Kernel 20 18 4k v; \’i , 16 > ' r”. 5 .'i . ?<: < 14 ...... # 12 10 mr .... * ♦ Ml O20M. 
250 WU R1KM0 NT Out* a* 950 FMI Ms 9 32 128 180 192 224 256 Processors mn 2.2.3-4 NT 0 Origin2000 £ - 185 - $ tzn ffe© NT Cluster t LX ^ Sandia's Kudzu Cluster (10/98), Cornell's AC3 Velocity Cluster (8/99) frgtlfco 10 {%LL±CD=1Z X — £$8££tl NT Supercluster, NCSA - http://www.ncsa.uiuc.edu/General/CC/ntcluster/ , - http://www-csag.ucsd.edu/projects/hpvm.html AC3 Cluster, TC - http://www.tc.cornell.edu/UserDoc/Cluster/ Communication Hardware - Myrinet, http://www.myri.com/ - Giganet, http://www.giganet.com/ - Servernet II, http://www.compaq.com/ Cluster Management and Communication Software - LSF, http://www.platform.com/ - Codeine, http://www.gridware.net/ - Cluster CoNTroller, MPI, http://www.mpi-softtech.com/ - MPICH, http://www-unix.mcs.anl.gov/mpi/mpich/ - PVM, http://www.epm.ornl.gov/pvm/ Microsoft Cluster Info - Win2000, http://www.microsoft.com/windows2000/ - MSCS, http://www.microsoft.com/ntserver/ntserverenterprise/exec/overview/clustering.asp - UCSD CSAG # f 7 X f : Andrew Chien UIUC/NCSA 6, UCSD/SDSC UCSD CO#^-^ UT#, WAN a^7^^#@C0 #^CO^^%^^^am#^#^)^(Cluster Federation), fCOP^CO h FD V^com#, WAN ^CO - f ^ #3^ Hbl:Z D, ^ P u y ^ ^ 5 73 P71/#j^T 1G I/O ^ I/O a COM*&;i> P#^ Gigabit WAN U%, 3000km NCSA C0^7%^ k® I/O federation CO^#&fT - 186 - ddTrfJJiCG Pozo IS IT l'' 'b *~SciMark2 Benchmark j tz o SciMark2 CGJ\ • 5 ocD^##t#± —$11/ - fast Fourier transform - successive over-relaxation (SOR) - Monte Carlo integration - sparse matrix multiply - dense LU factorization k(Dda-(X dh&kL Java771/vk^LTaax&2flT^&Z:#X Recent SciMark2 Results http://math.nist.gov/scimark/ 51.6 MFlops IE VM 1.1.4 WinNT 4.0 Intel Pentium III 450MHz 47.7 MFlops IE VM 1.1 Win95, Mobile Pentium II PE 366MHz 45.8 MFlops NE VM 1.1.5 WinNT 4.0 Dell ... 
Dual Pentium III 500MHz 44.1 MFlops JDK 1.2 appletviewer Vin95, Pentium II 400MHz h 76MFlops t(DZ t'Tr&Zo SciMark Java t C t ©ji® Jt$£ Java C (MFlops) Pentium II 400MHz Linux gcc, java Small 41 66 Large 23 36 Pentium II 333MHz Win95/VisualC++, jvc Small 37 41 Large 16 21 SGI R1000 194MHz Irix 6.5/ gcc, JDK 1.1.6 Small 11 47 Large 8 16 LTLlT(DA^#(f • yffm-nZtitzo • Sun t - 188- 3 A^\©3 >yW C©3 >yW JavaCC (Java fflCDn > ;W 73 >;w 7 a b , Java HHn+x 7 l/^f^f V — y 3 — p^6, MS© Java+3 T/^YAthtf—Pd73X©lWltiL©3 — ^Tt^o 3>f^A7^73V-CW\ <7y^y%i;^>Paisim^17^-P$7lTl^o ^ /c, Java Grande 7 2f — 3 A © Message Passing 7 — ^ > 7*7* IV — T^rSSblaTi'A5 fft)tl^o TMPJj anton30r7c& Java MPI £ 7-^ >7>-7*7!^ Lt Id < Hfli^fe £ a©C at'S^o Ctlli MPI-like & API ;t 7"'7 a: 7 P ^ fa £ Java £ L <#^^717:1^ awo^'rcTi^'rc^O'yyD —y&ao7:i^ a©c a^&^o ^©J:3^#r/=^ API J:3 • Java ©%#©M#y^ 'V 7 — V7?$j £ RMI ^ Socket ii^7^7> p —17 —y^D 7" 7 5: >7C#Yb^7i/z t (DTr&^o • MPI-l ^ MPI-2 7!(±^#%M#tf IPT&^o • r JavaGrande 77'J7“'>a>j & ^ ^ & W #bC # /c 6b (C (J, 7 7 dr — 'Js^y'y >7^© API H7GJ, £7c MPI-l — C, C++, Fortran ©3— ^ 77°n— f b bTx ^ 7'':7 x 7 P f|§ fo] 7: Java |q] §■ © 7 '7 *fe — 77^ y '> > 7" £ BfB^k©caT&^o fCT(J:, yypyxi/vf j>ya77dz-^yi7i>>7'a^o #^6y^^x A&a*3 < 777^ 6^^##7#ma%^a©ca7:&^o ©msMm#t)AD*^7t^ 6 i/i'o ##(C, rComputing Portalsj © 3 ^ il J £0 V^T©|p7t^fe o feQ C 7UJ J7tu, rDatorr (Desktop Access TO Remote Resources] a HTIJ tlT ID fz & © ^^ BU £ ^ X. 7: ^©7?&^)o £ T\ rPortal] a (J M 7^ © SM 7)s & o 7:0 7 > 7 — ^ 'V P £;fe Id 7! 7? x 3"$: 7 D > P Fa LT##^Tt/:3.-47*77yYXW#^-M©17-a7o f Netscape Jp AOL, Yahoo, ##©17—^ b, Ztlft rportal 17—tfX] atM TlTV^o ^CTGJ, Java 75 S IE & l!J £ :SE 7: LT75 b, 7 71/ y ^ JavaScript, Java-Corba yW > F, RMI, XML 7 — )l& £'&{£ t> ftT l0 fCT!, m-###C^(76 rportal] &^#L7:WC3a^7#X.y]7r&6o &b#6 t)©a LTLTF^#(7 6 7t^o - ^m#W:#IbL&7j:7Y >7-7 j:Y7 - %—y^— 37" j >7 v v—7^©> a y© v t — p^^j - :t#C J:6^#©gS#t©^6b©7 7 Vyyj>7Y>7-7o:Y7 - XML &/< — ;% a LX=y^ —7 T —7? 
Y y^\© V t — p T7 t7©/:6b©7 —IP - ^7#iyyb7 —>a>©yy7 p ^y-e©^E^]AT • 77 i$( W) Jj © 17 — P - %#ib^a©^7 7 p yp©^# - 190- • @EK#:£ UT t'/zS’E© 7 7-b-'>Wy -> > y 7 T* 7-7 V (NX, PVM.PARMACS, Express tt • #<©^>y-* s&a«g©. ^-y evy -/©&v7y-fe-->wy->>7 -7^77 V 6fflVi"C$cro4ilil L "O ^ ii „ (IBM SPN Intel Paragon^ TM CMS) • S’ < © * 7 i7 7— (National Labs & fc*) • #i#lb7"Dt7&g0%f 6 C <3>-fe>t^iiifeofe, • i Vtl£®Hlcfcfc5#iDAs$>7fc= «gffll$£«;tT. tWtot:*. ^SSfl-SiiS it. cncMit Javauifo*? tt'7iai<'i)setfSHt?.n&„ • *7 7 7 —©Sgii^-E,©* 1? 7-y* —&*777 — fi Java ±ffl MPI &%L#7T t'-Sie? • 'oy-ie %©+>-*- b^fc-s *>? MP12 ffl*g-e-r ?> • ¥oJ$-z>X%±t£^'( >7V (,f't''©ii^y W:tiii^7V^©A»? I/O tt bot?.? -*isia«*t'7» k 7ijk-ay 5? mgfjfl©ffl5*fflttti: tyt5? • %#©##W:&^©#? • gSttCWf3l£$S&3 >-t>+)-7.l£&Z fiftCx $t MPI # Java £ £ t' T j@ St; £ fi£ £ & t'ffltriitt Ufr i; 0> 7 j&fCO t'T y.T©a6»s¥U6nic0 • MPI HU SofcrnyvAttSaioifcltfXf - b yyyAAf © Z 7 ^««©1^ttlC*ffi$Sic^V'o 0'JxtiL &3*f§i>sSatfcfct, ii/Lj7 7A© 7-7-b7A^D^kSitT/t)S6ViS7Ci:i)sS§«5,o f ai:ML% Java tiL Eoir7ny7 Att(6ttl$n. Ifltr- b ttS* 7 fe 6 © tc & D Af&Uo #Rim#+#4'C*m/t7 7 r C r 7 -b 71" -5.;: f©m©&b#6#Ak LTUTtfWfciiiio • @@t@©msam# • 7 L 7 PtfOl/ • Java © Socket API ^©/W > b • m^ma - 196 - ■ 7 V £ Java CD heap iS ^ & V ^ T\ d? □ 3 X h S' V T 7 3 >/W 7T#%U&th#- h - ;W h3-M^^<^eW#/W±U(:^afo - native 3- P & 7 t'Jf^ k##L, #(DT^ Jaguar (DM Jaguar ^(±mT(D Jlo^^CDT&^o - a java## y^/? ###. ^ v k *7 ^ T y *? 
— ~7 rawfO^7 I/O^^'o - Jaguar 3 >/W ^>o >7 j* Wb£tifc3-K(C J; tK M^MT&Zo - Java CO/W h3-^s^“^ 7>& Jaguar ;W h 3 — K (Jaguar ;W h3—p(±vS/>V V —^^/vk^U7y^ - Jaguar /W h3- pcDthtf- HJ> %#0 JIT 3 >/H ^ o - Jaguar ($vh9 ——P-^s X — P ^ ^7M0)o - Java fr 6 Jaguar ^xCD/W h 3 — p )]/Q JavaVM JIT 3 >;W - Jaguar >rf jjfoCD 3 >/W 7 0 - MPI ^ RMIX Java RPCX Jiro & £ [ 1 ] hitp.i//w.w.w_le_p-£C.1e.xL_ac.luk/j_av.agr.a_ndfi/ [2] http://www.javagrande.org/ - 199- 2.2.4 Kmz.tsvr>i&m&ts&vft&imms'ZTixCD Globus -/\ H.TIH:lE BHT\ >1<@® y y^l/d" > £ University of Illinois at Urbana - Champaign IZ& £ National Center for Supercomputing Applications £ l^fnl Ln Utflvj StiBU: yhK 12 ^ 2 u 27 E-3 U 3 B tfifflft 1: 2 U 28 B Gregor von Laszewski JS NCSA T'Bg ±*MWk(D)Zmfrm\WnWi National Technology Grid #L3:3 £ g y D S/ a: ^7 b T:' $) £ National Computational Science Alliance^ Alliance IZ ~D ANL Java y L'y b#mi>yyA MOBA Globus Globus (1) NCSA sm NCSAT(D#^^^^m^&f^<, UIUC tfitlfco NCSA T Chief Technology Officer $rS<$TUCharlie Catlett J3: C: $ I*] £: 3o Catlett £ Tzs Alliance T: (3a Distributed Computing y — A 3o 3: U' Data and Collaboration ^ — A (7) Lead Investigator(10 cF G LA Grid Forum [ 1] 1?' (± A#:(D Chair(U gcf. Grid $:##L3:3kU'3yD^a:/7b, Alliance t:'0(DT#S&3& Globus yos; a:/7b[7][10]N##m^^#tf NCSACO^^Mmei^# tftL'TA < y — b — a. 
the Alliance UIUCN Beckman Institute Senior Associate Director T:& 6 Melanie Loots gq3:D the Alliance National Computational Science Alliance(C it & lb £ NCSA fz 7bs UIUC & -200- * NCSA Stzltmz Alliance j: ^|+PSECD jg LtzM£ 0?'$>%o Ztl%9cmtZ>(D\± NCSA Tfo D s 50 JAl± (D)Kmcr)^:#i)$M#^WDL-CV^o NSF t'£o ("National Technology Grid 0##j UTCD^M U^c# ^17^ — V"C'$)^>o A@T:(d; Global Computing C UTs Grid Forum © J; 5 Alliance 60 3:3^73^0:^ b Alliance &$ < 0f--A(±s DTP 60 4 ^7^3 0 (D Application Technologies Teams (D Enabling Technologies Teams (D Education, Outreach, and Training Teams @ Partners for Advanced Computational Services f &o (D Application Technologies Teams (AT) - Science driving the Grid Grid 3$ DE6toT7°V^" —i>3 > £ T t 3 *4¥#60 l^o 1%. 60 6 ^ — kfrhMfdLcStlZ)'. Chemical Engineering x Cosmology N Environmental Hydrology N Molecular Biology N Nanomaterials s Scientific Instrumentation 0 (D Enabling Technologies Teams(ET) - Architects of the Grid Grid i ck 3 V7 b 0 W:/^3>U^-^60#^^f 60%#!ia^^)o ^60 3 A-A 6: Parallel Computings Distributed Computings Data and Collaboration 0 (B) Education, Outreach, and Training Teams(EOT) - Access to the Grid Grid ^#(3:s NSF ##J WL\s tz National Partnership for Advanced Computational Infrastructure(NPACI) program [2] DTV> < 0 D1T60 3 f “A Enhancing Educations Universal AccesSs Government @ Partners for Advanced Computational Services(PACS) -Support for the Grid m^aGrid0##uato#@^u-Cs <^-^xTs m-####s Grid ^Alliance 7% a a c a fur ^ 60 -201 - £> Alliance T'0ffi^01i£0&li&0 3 "3T&6 A0C ko (D Capability Computing (D Building the Grid (Distributed Computing) (S Scientific Portals #i: 3 ofcCailfcK Tportalj lu 3b§m3a WWW >\ d ^ t) ^ web ^\0 A D □ C Ao, A A 3 tztz. Loots CDfijm#^ 7 :c —A&, web^>^7x-^t'|)^, il^lTCfco web t TGI If & t>tV3~3& £ A < -Dfr(DffiW]lZ-zn^T Loots pq (L D dS#3& 3 & /=<, X.id! 
Education, Outreach, and Training Team(EOT)T: idA 'ATiCj###^'t0A^0 m^m^0'6cD^&6 v/r-i>3 LT, ###^#oAA#:##ji:-3^X:7^^3Ao Crutcher #^070^0:7 b 7^0 NCSA ^X?mD, fCXr##Lx Al^db —^x A^MA b L —v b9 —AXr$M^At)#X:h9l^ Al^x At^^^Xr&^o NCSA (3: 15 0 industrial A — b t“§^30 C Lt G 0 A#(d: Alliance A |S#0### (d:^^o u^L, NcsATf^60A##m^m/v^^^(±, m^Tf0A#mx? *ijffihlSE A&3o Ctlid: Alliance 0^^X: fe o X % $I^T' id: & A 0 NCSA 0±I ltZ(D -202- NCSA am^LT# ^ £ ;B ti) T l ^ -S o #!l X (J: J.P. Morgan X'&tl td! Financial Modeling t l'' 3 fzMa'X &%0 Z(DJ:olZLX NCSA Zm&nMWvmW, 77U^“'>a>Slft^± (j\ £ tz. n kkW&MMWteMTt LX c* fzo NCSA {J H 0 f± f£M J & Alliance N Global Computing t) 9 ^ (2) TA3>?mALffl$tmwm Java Xl/v MOBA[8][9](D, Jl^ry b X&%> Globus[7][10]/\6DEpr& Btoa LfclHS> fS %&£>'& (Dtz& kX TUX 'i #((J y />!/ d' > @ V£511 ^pjf (ANL)© Gregor von Laszewski S £ Sfr fnl L Zi o Gregor S(J ANL 6D Mathematics and Computer Science Division(MCS) lZPf\M LX l'* & o ##GJTBZ Globus^ # Java IHj }!l CD HI % jo ck tb Metacomputing Directory Service(MDS) % 0 /L 7r G* o MDS H^0 41 'L AfflJX & % o Grid Forum X (JN Grid Information Service WG CD chair #)% (X & <, XCDWfficom-CDBBlt MOBA b Globus (OWt^X^^tK . £ ^ Giobus tf-if MOBA on Globus ^6^<$ -203- a. -k < X — b — X k *> £>*>£>© #8 3t©i*]g:tt#:ffliiDo S6* ‘ ♦ g B,@Jb a@, ##©*#, k4i$X©ttW&k'<, ♦ m#©m^ Global Computing Infrastructure(GCI)XD i7 x X b C ^n'T, B 69, #iD$166, X X h-xy b*ffl®Ji£„ ♦ MOBA: Java X V y b#j&XX X A moba ©@jbo #a, tBffl, %#m%, «b, xnx'x ; >xxx*;v, fascist*, ttlgfffflo ♦ MOBA CMf gci xnxxx b-t-FE#8lti'3, MOBAfflc&a-y —Kxcouto Shared object, Security, Registry, Manager, Scheduler,, ♦ Streaming Calculation I£tit8 +66#B|60 LfcSfiRi'XfA, Globus k©8t^M LTl'5o ♦ MOBA k Globus ©%-& Globus ©tSWf ^S-aX-BX, Globus X-BX £«ffl-f.5Bt#->X X A©$8« IS SoTlii^xx-ex©x*- b jLtbX-exfm^a©#^#^. MOBA CoVT, $fc, Globus k©St^tCot'T tt ^'J> bl¥ L < -5 o b. 
MOBA MOBA[8][9]tt Java fiffiT -> >PaT'ffl, IffttiSffo&X b y b£ X*- b f ^ ~>^f AUSSo Sun Microsystems #©% 6 @© Java fit® V X > •£> XX X'J >kLTSissnTV'So f©#ai±, xotyx^os#a%&:+@#i@x©#m, t t£fc>% SS#flfl^j3l(heterogeneous migration)-^, #j#X 1/ y b IX^b, M A# -*p #X1/ y b # 6 © jetb 1C J. 6 #&, # |u]N8#%(asynchronous migration)& 8gK VT t' -5 AC&-5. ifc, #mxi/y b#ABkf 6XXXIJ, X lx y b©£fiJt7G*> y b XXX##+##k klcAV'it-5 kl'^fcASBS 5.X&®lt-5k k* sr-§-5o i&mkLxiJ:, #e*©XDXyA&:+##iax#mf6kkxmi%#ee@K, disconnected operation 4^ffttW©SjSH:$/c, SI6XI/ y b © 3688 k VT t 6 o ADX T, X 1/ y b #ii&©T@ X i" XX V check pointing iiBtt©iig±A^Hn-E l = :t*b'fe«), saanssistt-ii'Sxfc-Bo mt, m@#a+#^©&m, 0$ b EStoXd' 5. > XX X X X x - X % k#aX V y b © -SkkAs^fi^feto, #|Bim#m'6mB»#Rk%%. -204- c. Globus Globus [7] [10] 14 University of Southern California - Information Sciences Institute(USC-ISI) fc ANLO/nyi]' liitF77 b 9 Globus "J —A*"J b l4j£ti6gt@©$$]k&S&l$-9- —e7$ig#t-4S 5. #-^147077 S L%©o %1B#I4, Globus -9 —t*7©±lcfggt£n/c MPI-G tl'-ofcMl'^Ai)-- t*7, 7>g(Ma->77"A6m' -tfl-iraissff&do Globus l4*©-9--U7&m# LTf^So C ft 5> © MOBA f, ©fflfflnJIgtt 61^14* S o ♦ «SM d gram sn»e.©ya7©jsx^ So o DUROC S»->'3 7©|Blliega. 
@atSt6SEtt1"So (co-allocation) ♦ mm □ Nexus active message S®, 7—j7tUS&ffofciSSIX V '7 b'@Wj© «si££Sl#tf So □ Globus I/O TCP, UDP, IP ZA-f- * -y 7 b tt £©fiJIS, SSL Lfc®mi*lS©6^'fb6'9'4: — b -f So ♦ SPi^T'f ;i/747-bx □ GASS 3cA%ai#7 y-fib7f-UXWE&mKfSo ♦ t+a'lft □ GSI 9 X -f y7f A0 ftfflf < ©It — fXA^fflfBSnSo ♦ 97bU □ MDS LDAP ^-7©|+**iiXV W b 9„ □ Gloperf AV b >7 —^]f«©iilse„ NWS © Globus ^©gtl^C 4 0 k&So □ NWS * v b9-f %#©#]&&47b?#lo ♦ □ hbm 7 7 >©####&«[#&f,#mfs##&m#o Globus ±(CSI$$nfcg$#0iW V-x;t-9--H7Ci4^IX.I4y.T©'fe©» sSSo MOBA A>& Globus if—U7£fiJM1-S|Sg©#if fctSo ♦ MPICH-G ANL (C 4 S MPI[11]©*#7&S MPICH & Globus 54oC Lfc*©» # < © Globus tt-£7 6180 S: MDS, GSI, GASS, GRAM, DUROC, Nexuso ♦ Nimrod/G Nimrod[12]l4 Monash University frUH^cf ttfeX^ yt—j7 -9" —Xffl©iftti:tbl+S 7 XfA« Nimrod/G[13]l4 Globus th—UXSrfiJfilTS Nimrodo Nimrod/G t" - 205 - □ mds mmumo □ Nexus JS&o Nimrod Resource Broker(NRB)©^SBb 0 □ GRAM NRB # £, © V 3 7"^Ao ♦ Ninf on Globus Ninf[14](i Remote Procedure Call(RPC) ^ — X © Global Computing '> 7 r A t fe o Ninf-on-Globus[15]{i Ninf © MM Dt "P Nexus \Z b # & OX £ o C. CD Globus Nexus CD Ninf D s MftXfotlU Nexus Tit:# < X Globus I/O £TiJfd t~ ^ ^ X fe 3 o d. MOBAon Globus MOB A H!k Globus #1 ©7%## G, MOBA on Globus © C#o ZZXlt^f Gregor J3; © IS) S#{£:IZiL^tzo ♦ MOBA ##6, {Sl/^l/th-C'X©^- □ MOBA ©T >X b —;V0 □ Place ©^SHJ □ St S S $1 © £* S (resource locating) O □ tlJffl#©I^SE(authenticataion) A: X V V Itbl(authorization) ♦ Globus## 6, MOBA □ GRAM ©tllfflfBHilS) D o Place ©^SIKo □ DUROC □ Nexus X^#o Globus I/O©## MOBA tcmL-C^a&A&o □ Globus I/O Q$C TCP & ck^#\ secure □ GSI ©^ftu:#d: GRAM&^##mf&o □ MDS □ NWS □ HBM ##lto □ GASS ##lto MOBA#^ Globus Gregor #it-lf ©tSlS'S^/c/cl't/'co o) ®fam&%MLX0ffim IE W S o T N Global Computing C:?jS'n f £ TX]) >T — 'y 3 >© *7 y X&M b ti -206- NCSA l8Uf4 Globus fili: bft (tft&ft £ ft t'fro [1] Grid Forum Home Page, http://www.gridforum.org/ . 
[2] A National Partnership for Advanced Computational Infrastructure, http://www.npaci.edu/
[3] SCXY Conference Series, http://www.supercomp.org/
[4] Computing Portals, http://www.computingportals.org/
[5] IETF Home Page, http://www.ietf.org/
[6] Internet Society Web Site, http://www.isoc.org/
[7] The Globus Project, http://www.globus.org/
[8] Kazuyuki Shudo, Yoichi Muraoka, Noncooperative Migration of Execution Context in Java Virtual Machines, Proc. of the First Annual Workshop on Java for High-Performance Computing, 1999.
[9] ##-$,#W#-, Java CPSY98-32, pp. 39-46, 1998.
[10] Ian Foster, Carl Kesselman, Globus: A Metacomputing Infrastructure Toolkit, Intl J. Supercomputer Applications, 11(2), pp. 115-128, 1997.
[11] Message Passing Interface Forum, Document for a standard message-passing interface, 1994.
[12] Abramson D., Sosic R., Giddy J., and Hall B., Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations, The 4th IEEE Symposium on High Performance Distributed Computing, 1995.
[13] Abramson D., Giddy J., Foster I., and Kotler L., High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?, to appear in Proc. of Intl Parallel and Distributed Processing Symposium 2000, 2000.
[14] W, JSPP’97 mJtM, pp. 281-288, 1997.
[15] ^##^ai3,^;n#^,mM#, > h - o-wi/:] >7"'>7ir A CD.bblk---Ninf,NetSolve,CORBA,Ninf-on-Globus O'fTtbfFffi---, tSIBM 99-HPC-77-34, pp. 197-202, 1999.

2.3.1 ie5<4IBU-SCj3»5je«5-E3 yy fcMfrWL^i > b^-7 ^ Ui^o t)^,5Avx (l) /s—nytr^L —^ . r>-9->7> ^#J0/£ijj^fC3 >h°i— 7 ^ >^<7M y — -3 > bn.— £ & * y h V-^Trjg^U — ^ >ba—ft D b:i-7-f >7"bd; D >bn.-x'T Supercomputing ’99 b&UT C 3 -o £l0 7D i/x^ hf- Ali '^© HLRS (High Performance Computing Center Stuttgart) $: 41 ^> t ^r V 7 CD CSAR/MCC (Computer Services for Academic Research/ManchesterX 7 ^ V ^7 RUS/HLRS and Partners @ SC‘99 Portland Network Topology Cerfecs ► Toulouse EU Projects Optiblade TEN .. , ■-' ""T * > : «» " 155 .
vBNS EQ 1 Y ARAN . Abilene 1 T 1 JANET S. IS S m-fh Sg IMnet ' , PSC SCInet Belwii. LANRUS . I ■v m' r^i db "V T 1 Tsukuba/Tokyo Pittsburgh Portland Stuttgart Manchester toncheeter USA, JP H m via JANET Tdeglobe IP Service TACC Hitachi SR 8000 PSCCrayT3E SC -93 Portland HLRS Cray T3E MCCCray T3E sr8k.aist.go.jp jsromir.psc.edu Workstations hwwt3e-at.hvwv.de turing.cfsac.uk 150.29.228.82 128.182.7368 140 221 x.x 129.69.230.195 130.88.212.1 ATM PVC HLRS MeBoompudng HLRS Metacompuing HLRS booth Meaoompiting HLRS Metacomputing HLRS Metacompuling 10MbiVs Eiropean Projects booth EU Projects EU Projects Shared Acattd.Covas, Edison, 4catad, Covas , Edison Novice, RCNet connections Hpoom, Notice, Optibbde, RCNet RUS Projects ______Communication System; BeWii Develop mert (PKB) m 2.3.1-1 $ y h7-^Oiji -208- © PSC (Pittsburgh Supercomputer Center)t 0 ;fc©I Sifft'Jl'te > 9 — (TACC) A> ib © X > n — T * $ ft X tA -5 o ¥ ■=£ > X p V - a > T- ti: HLRS, CSAR/MCC. PSC ffl T3E t TACC CD rsRSOOOj (64 y- -y h V-Xtig s$v, 2.2 tflops a it, # f 5 3- V-^a >65tff LA: (0 2.3.1-1), (2) -#> )£«»«cn>h‘a-7"4 LTs «S@© 5 * i: M l T 11' X )1 - 7 "J It< Supercomputing ’99 ~Sfyt>tlfc;r^:~SH1&W 1 0 ftE5C8$lilcFftAil7— X Xt— L a > ❖ PC UNIX «©-6-|+ 1 5 0 i$±tc8+StoiS6fl-|!l 4Hl©c© #*#©H6©kC6tttt#*©l+*«S*toT < ^,C Ci;tfe-5o tft#tf)©ifitt:f+||«£rSJS'tft ut. h %If(St-S*S* s*-s c ix *tit: J; D#(ajAsHEl:^oT6^n$v^ j'flutc-Ex-BS k, SmT-ny^ASSIffS-SJ;) LS^fci: SCtt£®fflft@PS-t-^©Xn ck, *k -$yc$/;iiffltotctti$a:-iSv^6©»$6-$.o l^u •Y >X-* -y p ©te#-t-v V) -X > X L -X * X ASfiKiz: LT V> & ©"C zfe •£ A> 6>. 
itn-e©*,© 6 "9"—L X t l/TSIl -5 L X 3 Cft pf %> ASP(Application Service Provider) 2.3.2 y-j >^!;A'f;l7t-V>X3>Ka-T'f >y Md7t-7>X3>ti-fl>)' (HPC) l±3+###K%km# LX, ##» #¥©**?>t, #i#©m#ami:eKUTi\6 (02.3.2-1)0 laca ^t, #l±AA#A!Uxa% < ^>Ax-^*'J'A#&**$#©/:©CI±C© HPC ttESA ALii-gfft-So LA> L, X-A-n > La—X© J; A &i6til6IP@tilti:A V 3 > LR^T i^flS'lifAAsfflKtcAcoTt'-So X-A-3>La-X ti; x-71-xe> v -209- fumam #*« W-*E HPCMKR* f4»tt«it85<75V- 0 2.3.2-1 k HPC eixii, s© cpu &6#@©77uy-y a >& ^'>7-^1: LT«flj©y ■>yv>$T-#ti"C%#t-r^Ik'5tc6-5o ASP © HPC dficffivw 7<-y-Cfc-5o £fc, -v-^y p SSiiS'S-^fetoUi, tSipto»^«©«iz: * s¥iii;*4& fen-5 *9* sfe -5c 6s¥RJkT-ife^o vsntttc s u r - a * ® v g+ e »s £- $ k »-a o c a cpu &C kl: *fc, -^tu^qJtgt: *:-& 2.3.3 ^D-A>3>kJ.-T^f >^?X F a#©R@ai±m%#0mMi5&:fa?&'5o tsa^fcjr-< >y©w*ts E6$e@6tocsifrf ¥t:kcA>© www -y- 1' h sisht^ v7 1^Rotst . -i >7 b-;H"n«iStfisk¥tt"t*tttci'o 'j>& < k^ :+#»- vx\ #ii* y 1 9-77, ^=7^T'y h -210- —i’7'f7> ha-tf £ tot l£* y t 9 — ^0 AT7f tXf 3>Ka-f^ >^f7 t^V M&*@# • ##L Z 9 tit i#4?hTi'?0 ctiST-tct)'J'M^T*ff oTSftit £ < <$x 3 RWCP 1.5Mbps Europe 45Mbps APAN STAR tap iMnet T2ty TransPAC Chica9° 70Mbps 1 Mbps CSys Australia #tHhTV-»$S#tL, Globus, Ninf 3¥£®|::4%±lf Tfc< ‘ OOMbps |g 2.3.3-1 wvri 7^ - 211 - H3S titxs ts ? V 1 0 1 Z b. ®*#Wf!Jffl»if*s0J!SI t*i5iaflj3>;H5a®l:HUtt, ¥fiU 2 ry MWt3 >;t •i- 7f5Egg#gj MbtJ, <2)£Sfl-S(3 > t‘n.—9 4 y^SEC-oV'T &. tfesff3sms4$icisst?t^tc. mm^mm^oj #ee&M&A4:f^,ckczD, <^$c©*smi: (1) #?!Hb3WW9Sffi 3 WiioSttWffitt. x*;i/*-^|g. S* • > h->x^A^k'#aS«t£i6/j:$$S$^'r*t), ^E*¥Sfto6E5aiB X^SffcterPSmSJtijCiniU-T^g^SEll^W^SnTV'-Bo ^fflfcto. HPC (High Performance Computer)/ 1-> — P’i'^7’Ctt-2>V'>>tttb©Ic5J;i;J;ibC^ V 7 K?i7 t; J; s tilSJM© siifb & ff -5 }£«© * s-£.g^ t tt o r u s 0 Cfflfcto. 1 2^fi=fc bHJS-r-5 fT P/\'>7 h jfe?ij-fb3 >;W 9tit«$%j T-ti. Fortran. C %©iFDlTffil'^>f V'7D if 9 X > 7Hlg% X* h L, #^©#Mfb3 •f 9f£WAs*tS;i: LT #£*?£. 1" &£>%$> 6 fp D©9t© 5> it £® $1 Stile * intern 7 9 A&a-#!lf < . 
3 >/U 9 Asg«!)WC#jg*®a#tet»S!l 1X7^1- V >761? 9. 77 7 F7f-A7 lJ-7^glijfi?ijfb3>7W9Sri©9f?$Sg%6fi :3o Cilfe©®f ?i! US $5 tC (DiP&M&JWia : ¥/$ 1 2^@~¥f$ 1 4^S ( 3ipf@) <2)E^^j$*ej: * rheas (3) flfgmms : X*A 5?AffliE)*S8S (Fortran. C #) T-fi>tlfe7-77a79A -c*$>o. (openMP ©ksie ) crs7 9-7 h7*-A7 IJ-%@##Mfb3 >/W7®S»*fT)hJ;{iC. 3 > / w 9 tC cfc S P « & * Dr 7- tt ff 5) ft ti l r tt W * £ «tl« £ m' X i*jSfb&EIS;tflSS©79 7 (7t-A7 'J -&36?!Hb^n.-3>7 (4) W%9-7 : ®7 hVt>X h jfiMfb3>7W9Sr7©IB% ©*?iJfb3>/W 9©tt8gfffltt$©|g% (5) *t£lfS : ®7 P/l>7 f #?ijfb3 >/W 9K(6©H% aest-fes*^ v^;p. t*ts-czfesge?iJtoav^/M/. $ e>cttffl*4ST-s>s - 213 - ■y-»—t- > > • IP—y • S*yn y tote §1 tttiT 616 V yi/i- SI36ttE®eti6IISt-5 = ©36?iJ-fb3>yi-f 7®66gff (2) je«^*3>Ki—rd >^ l£«diC3 >bi-ri >^/tt, LX 1997 ^/6^K9E»srS%-fb LTU-E. 53-if-Cfe b ,*1!ilt®3 > >^S8k b-ri±i$n-tt'5o*Clte*H'T(i, c n$t®S|gE^58%te6nx-C. Grid Forum i;ll¥«n-5 Grid. SS®fcto®#l«As$Sfi$ StS®ffl/e*=tt>'t*#ffl«^ • *Xo 076X176 fti7)7- ^"eSSlB'U ltti?77^76XS*6'?» PDA CVfc-Si!«i*& Web 6 60076 Sfflt'T+6-b, -e®±®T7 V -7"--> g >tiffi*S)S»ElSB+#* 6, iSStoth — If X * X £ V > SC * ifi & $ ii X v' -5 o C06®l6$tei:b^x a#5@0)%5g%ia##tc*$%;igo& k oT U$ oTV-6®* 5 SttT*fe-So 50 >^»S}XU, $ 7 1- -7-67fiJffl®y I>7\t7^tgip 'ttmjtizfflmtzxgzrztb, #&®#eim® *«£«* &i\ ■ 214 - ## A. 
Appendix A. "The Stanford Hydra Chip Multiprocessor" lecture (1999.11.22, 24) OHP

The Stanford Hydra Chip Multiprocessor
Kunle Olukotun
The Hydra Team
Computer Systems Laboratory
Stanford University

Exploiting Program Parallelism
[Figure: kinds of parallelism (Instruction, Loop, Thread, Process) plotted against grain size in instructions, from 1 to 1M]

Technology <-> Architecture
■ Transistors are cheap, plentiful and fast
  > Moore's law
  > 100 million transistors by 2000
■ Wires are cheap, plentiful and slow
  > Wires get slower relative to transistors
  > Long cross-chip wires are especially slow
■ Architectural implications
  > Plenty of room for innovation
  > Single cycle communication requires localized blocks of logic
  > High communication bandwidth across the chip easier to achieve than low latency

The Hydra Approach
■ A single-chip multiprocessor architecture composed of simple fast processors
■ Multiple threads of control
  > Exploits parallelism at all levels
■ Memory renaming and thread-level speculation
  > Makes it easy to develop parallel programs
■ Keep design simple by taking advantage of single chip implementation

Hydra vs. Superscalar
■ ILP only => SS 30-50% better than single Hydra processor
■ ILP & fine thread => SS and Hydra comparable
■ ILP & coarse thread => Hydra 1.5-2x better
■ "The Case for a CMP" ASPLOS '96

The Base Hydra Design
[Figure: four CPUs, each with L1 instruction and data caches and a memory controller; write-through bus (64b); read/replace bus (256b); on-chip L2 cache; Rambus memory interface to DRAM main memory; I/O bus interface]
■ Single-chip multiprocessor
■ Four processors
■ Separate primary caches
■ Write-through data caches to maintain coherence
■ Shared 2nd-level cache
■ Low latency interprocessor communication (10 cycles)
■ Separate read and write buses
On-cMpL* L CPUO maintain Base T “ Four Single-chip Separate Write-through CWSUMMyCoaMW- to Cadwj > >• > > Hydra The University University Stanford Stanford access) cycle cycle (~5 cache lines L2 performance time purposes caches 16-byte writethroughs array writethrough writeback most Details L1 for on-chip thread architecture for Data, performance support bus lines 2-cycle 8K bus design base access write Associative, Associative, read of 32-byte Architecture thread thread buses cycle single-ported Set Set speculative single-ported pipelined, KB, of Instruction, Hierarchy data prototype Hydra Single Fully 2-way 8K 4-way Line-wide 256 Word-wide sets ------ 4 Shared, Two > > > Performance Improving Base Hydra Speculative Speculative Conclusions Outline ■ ■ ■ ■ ■ ■ ■ Cache - 217 - University University very Stanford Stanford is I Loop early applications too expensive write: synchronization Parallelized from Forwarding and threads C-programs => occur of FORTRAN conservative difficult Speculatively reads disambiguation parallel too aggressive matrix is Requirements Software is be to dependencies limited help? when applications pointer dense is between for data analysis i Sequential compiler compiler Parallel data Loop auto-parallelization Iteration need time have disambiguation hardware violations the software Original Speculation can Pointer Remove Hand-parallelized Compile Allow Threads Auto-parallelized Forward Detect >- >- >- > > > >- How Parallel difficult Traditional O © ■ ■ ■ Problem: Data University University regard Stanford Stanford semantics without CMP writes speculative for for memory commits automated (e.g. 
writes sequential time correctness writes for support support performance bottleneck 40-60% a easily easily are perfect for parallelization t of the ’ threads code >95% original of sequential isn Performance now refills, ensures only is 0-7% cache for 10% refills, Speculation with follow L2 enables support arbitrary arbitrary low-overhead for miss to <10% structure parallelize within hardware ) into stores multiscalar ~35% to bus only Data System execution and provides code parallelization rate: accesses synchronization ways speculation speculation Hit Internal Occupancy: caches - - - data-dependencies Loads Loop Break Speculation Parallel subroutines Hydra Add Wisconsin Performance L1 Most > >- > > > > > > >- > > Data Data Other for Memory Solution: ■ ■ ■ - 218 - University University Stanford Stanford set III are by bits in CPU A-D) read #i+1 CPU later later Hi Speculative hit pulled The parallel L1 are in (priority memory O ---- line #i L1 byte a of CPU *1 Cache n: ” to L checked each Requirements are on CPU written views “ #i-1 CPU Speculative earlier buffers bytes encoders Reads write CPU ” newest © multiple and #i-2 priority CPU miss Head L2 “ The Nonspeculative L1 © Speculation D — Maintain Cache © Data Speculative University University ” M Stanford view Stanford “ Iterations II for Iteration © STATE Successful PERMANENT forwarding L2 after IteraUoni violation state backup smart state & Writes after threads Support provide Requirements speculative state control F7 buffers to retire pre-invalidation violations speculative bad write and with detect and retire bits bits reorder caches coprocessors discard tag tag L1 L1 Speculation L1 ” buffers ” Speculation Safely Correctly Read Dirty “ Separate “ Write Speculation © © © © © © Data Hydra - 219 - University University of a -02 for but on modeling the Stanford Stanford runtime applications architecture aspects 2.7.2 real all single-issue Support Support Hydra GCC processors 4 of system Accurate and interface 
hardware Entire Improving > > > > “ Applications dependency CP2 than and data 98 ‘ Speculation Parallel System threads implement from through to Data “ Thread speculation in ASPLOS to 99 ‘ ” threads recover simpler speculative ICS Speculatively ” Runtime all and of of overhead routines CMP description speculative Handlers order more flexible Multiprocessor Hydra Speculative Exception more Performance Chip Complete violations Adds the Control Track >- > > > » Software Performance Base Speculation ■ University University checks that Stanford Stanford hazard L2 the RAW into variables buffer L1 out cause thread & © write our CPU local ilidations iteration & L1 #i+1 CPU drains -gnffgr later Speculative any per our Pre-inv cache buffer speculatively Threads . - L1 te speculative #i pre-invalidate its write Wri thread CPU a dependencies invalidate Cache to just globalize # iterations CPUs writes translator procedure and CPUs ” & #i-1 CPU generate in CPU loop Speculative Earlier" Writes A Later after “ speculative Non-speculative body “ loop-carried Detection source O © © © loops procedures calls © CPU one Invalidations Speculative RAW to while code loop support pwhile #i-2 CPU mmm cause "Head" o Nonspeculative and Bus source Write Execute Procedure Pfor, Typically C Analyze could L2 for > > > > > > Cache > Speculative Speculative Compiler Speculative Creating ■ ■ ■ - 220 - University university ► RC32364 Stanford lost) Stanford code (IDT) work L2 source KB with 128 violations frequency, Technology and ------ values Transformations D (PCs, mm I, PC Device 11 occurring KB 8 thread write Code thread loads in statistics load-stores with in and Integrated frequently up down on and read violation tool 0.25gm loads stores in Prototype based dependent non-violating 2 Motion mm Synchronize Use Move Move Collects Correlates Find > > > > > > > Design 88 Feedback Synchronization Code > > Feedback Hydra ■ ■ ■ used University University transform Stanford Stanford of statistics 
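The speculative-write slides describe writes from "earlier" CPUs triggering RAW hazard checks against data that "later" CPUs have already read speculatively. The following is a minimal software sketch of that check, not the Hydra hardware: the names `spec_cpu`, `spec_read`, `spec_write`, the toy address space of 16 lines, and the convention that a lower CPU index means an earlier thread in sequential order are all assumptions made for illustration. It also ignores the case where a CPU reads a location it has itself written first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_CPUS  4   /* the base Hydra design has four processors */
#define NUM_LINES 16  /* toy address space (assumption for illustration) */

/* Per-CPU speculation state; a hypothetical model of the L1 "read" tag bits. */
typedef struct {
    bool read_bit[NUM_LINES]; /* line was speculatively read by this CPU */
    bool violated;            /* this CPU's thread must be restarted */
} spec_cpu;

static spec_cpu cpus[NUM_CPUS];

/* Clear all speculation state, as when a new set of threads starts. */
void spec_reset(void) { memset(cpus, 0, sizeof cpus); }

/* CPU `cpu` speculatively loads from `line`: record the read. */
void spec_read(int cpu, int line) {
    assert(cpu >= 0 && cpu < NUM_CPUS && line >= 0 && line < NUM_LINES);
    cpus[cpu].read_bit[line] = true;
}

/* CPU `cpu` writes `line`. Every later thread that already read the line
 * consumed a stale value (a RAW violation) and is flagged for restart.
 * Returns the number of threads newly flagged. */
int spec_write(int cpu, int line) {
    assert(cpu >= 0 && cpu < NUM_CPUS && line >= 0 && line < NUM_LINES);
    int violations = 0;
    for (int later = cpu + 1; later < NUM_CPUS; later++) {
        if (cpus[later].read_bit[line] && !cpus[later].violated) {
            cpus[later].violated = true;
            violations++;
        }
    }
    return violations;
}

bool spec_is_violated(int cpu) { return cpus[cpu].violated; }
```

In this model a write never disturbs earlier threads, which matches the slides' distinction between invalidating "earlier" and merely pre-invalidating for "later" CPUs only loosely; the sketch covers violation detection alone.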
performance manually hardware impact Base code Violation to ■ m and performance some Flashpoint) limit Performance help frequency latency (requires Performance locality memory can violations (MemSpy, data reduce for to statistics movement independence shared misses prediction Parallel Speculative data data communication threads Feedback Base dependence optimize optimize cache violation H ■ to to at at cycle violations coherent explicit explicit Look Frequent Dependence 100+ Need Look Need data support) No No - > > > > > > > > > 4-i 3 Speculative Cache 35- Optimizing Optimized -221 - University University Stanford Stanford Hubbert, all Maciek 2000 of Ben and 00 late ’ Chen, Lim by H1 models by out Mike Melvyn 99 ’ Verilog tape Map layout Siu, H2 and Hammond, by and Mike Road Lance Finish design . (IDT) synthesizable Prahbu, verification Lam, on Team circuit Design Kozyrczak Monica Manohar http://www-hydra.stanford.edu > URL Team Finish Complete Working components Hydra ■ ■ Chip ■ ■ ■ University University and Stanford Stanford for path with levels evaluation parallelism all compiler at microprocessors migration mechanism and applications controller applications on design performance speculative details parallelism to for controller for screening parallelism evaluation fine-grain reference Tasks optimization application interface way resources tuning for buses exploits all support performance memory new memory exception MR for code cache a large-grain parallelize coprocessor write mechanisms implementation high performance main L2 to Implementation platform to debugging performance system and out overhead offers and facilitate Read Speculative Speculative I/O Controllers On-chip Off-chip Statistics Provides Low medium to Realistic Single-chip Provide difficult Allows development Work v v > > v > v > > > >- > v » v Memory Speculative Hydra Prototype Implementation Conclusions ■ ■ ■ ■ - 222 - University University Stanford Stanford CMP a CMP in processors all large to 
Hydra implemented the in be architecture broadcast arbitrarily may implemented prototype be implemented be possible support Bandwidth be made buses memory are Hydra may simply be may the faster may CMP can buses of buses CMP? speculation CMP? writes a a techniques Hydra Pipelined, Multiple All Wider > > > > Bandwidth New Overview Why Thread The Why Outline ■ ■ ■ ■ University University Stanford Stanford the critical Latency t ’ from of memory from isn or more cycles) cycles) cycles) Cache Read (50 main L2 CMP Read (-5 100 communication parallelism #2 #2 than CPU CPU Laboratory Team (more University Olukotun Main MP cycles) Hydra Cache communication 10 Hydra Memory fine-grained L2 DRAM Systems interprocessor of Kunle Communication about multi-chip The #1 #1 Stanford faster CPU CPU (usually to minimizing more to Implementation more Cache Computer or memory cycles) Write or L2 (5 offers cycles) Conventional, Hydra Write exploitation • CMP? • (50 Stanford main a CMP The A Therefore, Allows Why - 223 - of to SM University University full the time an a prevent a CRA remainder remainder statistics at buses from to into , Stanford does Stanford handle * from take to write Devices caches ,cm i cache necessary also access through passed turned if I/O CPUS 1 access machine and (64b (256b) necessary w be arbitration stages ON**** sequence: Bi Bus arbitration s as H primary read ’ counters state access 2nd-level conditions Handles pipeline, Accesses SM SM memory Could Get Sequence - - - - - Each Control Each race address Write-through Read/Replace brtarface Separate Shared Separate > > > > State Memory Memory > >- > Main Memory rest chip to DRAM of Control Rambus Design . 
1 Design rtumber rtumber CPU state Logic Output the caches fist in count to data FIFO cycles Hydra state multiprocessor coherence of up : LSCedw per State Buffer 1; Access Queues processors On-chlp Machine CPUO maintain Base Four Single-chip to Write-through Next- Arbiter Central > >- >- Resource State The University University Stanford Stanford lines less reads writes of transfers on on sequence processors allow errors and access access L2 unit ownership system the interface well bus bandwidth controllers possible cache-to-cache to memory memory controller: do acquire work high data to to off-chip main main memory still and memory interface: Coherence to execution protocols hard-to-find have have memory start start drive return the t t ’ ’ time for to to to to machine machine design don don faster memory in protocols Controllers state state through design coherence independent CMP? processor machine machine machine machine potential easier communication main read write a result Processors Processors Fast Less Shorter efficient L2 L2 State State State State each the >- > > > > v > > v > > Faster, Can Simpler In In Several accesses Memory Why ■ a ■ ■ ■ ■ - 224 - University University s ’ its ” Stanford clear Stanford clear it is early will program as accesses older each need? reserve too performed “ up threads all matches bits for violations vector is of vector bit ... 
multiple original occur must bit after match queued its L2 those immediately 12 system the the cache indicating access memory memory its to when are Arbiter in reads speculative the state L2 of the clear bits when ” simultaneously done in in the clears be Requirements set write are in writes when memory address address views will all “ memory conditions its between with an it bad misses misses access to main of Address missed processed race when the a data that is that same address same vector each — violations multiple of that speculative bit thread of the a while of accesses does to competition, prevents access access access Speculation discarding no Central Parallel-compares If miss Saves accesses Otherwise, completes This Later > > > > > > Every Each Each address Forwarding Detection Retiring Maintaining Safe order executing Data ■ m ■ ■What The O 0 © © © University University Stanford Stanford system try to the tracking easier in threads re-execute automatic requests & parallel R/W interface CPUs) in resources bus writes full send hardware all writes) grants do reads automatic correctness of support becomes write t among Arbiter to ’ fine-grained shared memory own allows more? the access of isn dependencies before parallelization segments track code its an cycles real use with still generates main keep memory hardware reads track acquire support start find makes even can program add then maintain make to parallelization to to (priority can (usually Resource do can prioritized this broadcast CPU table allocates must necessary need memory by type ... 
tracking therefore therefore are speculative are support, processor processors processors sequential that ROM by parallelization also uses can R/W when add CMP this cycle First SMs Second SM Each Other Do writes We communication small CRA a Central ------ Requests Programmers A All Programmer Speculative running All With > > > >- > > > > Each Every Fast buffers, However Hence The ■ The * ■ ■ Can ■ ■ - 225 - University University Bus Only Mask byte) Clear and Clear (by Stanford Write Stanford Write Invalidation from Commit Gang (he) in Force on Bus backups commits Data Data cache Read (L2 Write on on to advance and Only out bits backups Clear ICAM] L2T«g Data and Invalidation clear clear Addresses of array Backup V violations Read Gang on to to and Force renaming the detection after bits bits types commit in up to until commits valid memory valid us are extra back violation on 4 here to line required Details buffers cause cause Allow pointer Allow bits priority ” us Allow held complete on each line are tail all bits bits cache ler are *** Clear of for earlier L2 e Tag requires “ when Details present with muxing Allow cache if L2 Gang mask by and from writes byte-by-byte from clear array circuits tricky into modified pre-inval tag write time read data are Read-by-word: Pre-invalidation: Modified: Set Set Gang Written-by-word: Buffer Cache Drains Requires Line Collected Byte Any substituted, encoding > > > > > > > CAM Special Speculation ------Speculative commit ■ 12 L1 > > Reads University University ” to view Stanford Stanford 't SM “ 1 1 for : thread Icpzj i ‘ small Sw:, threads UDMCMwl SpwwkrtkH. 
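The central resource arbiter slides indicate that every cycle the memory state machines send requests, which are prioritized by type (apparently reads before writes) and then among CPUs, and that grants depend on shared resources being free. A rough sketch of such a fixed-priority arbiter follows. The `request` layout, the resource bitmask encoding, and the rule that a lower index wins within a type are hypothetical choices for illustration, not details taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Request types; a larger value is assumed to mean higher priority. */
typedef enum { REQ_NONE = 0, REQ_WRITE = 1, REQ_READ = 2 } req_type;

typedef struct {
    req_type type;
    uint32_t resources; /* bitmask of shared resources the access needs */
} request;

/* One arbitration cycle over n requesters (n <= 32).
 * Reads are considered before writes; within a type, lower index first.
 * A request is granted only if none of its resources is already taken.
 * Returns a bitmask with bit i set when requester i is granted. */
uint32_t arbitrate(const request *reqs, int n) {
    assert(n >= 0 && n <= 32);
    uint32_t busy = 0, granted = 0;
    for (int type = REQ_READ; type >= REQ_WRITE; type--) {
        for (int i = 0; i < n; i++) {
            if (reqs[i].type != (req_type)type) continue;
            if (reqs[i].resources & busy) continue; /* conflict: retry next cycle */
            busy |= reqs[i].resources;
            granted |= (uint32_t)1 << i;
        }
    }
    return granted;
}
```

A requester that loses arbitration simply presents the same request again in the next cycle, so no queueing state is needed in this model.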
Pd (64b) .0 ' <»• ' Bys ^ buffers: with forwarding complete 1 1 | L2 reads the &** ™*- i IcPZj 1 speculative C«h*A A completed Write-through state smart backup ■ buffers „ U & | restart threads per of buffer * " A UM::* from ^ : threads i provide Support > forwarding manages ' , 1 ; ' when speculative order ■» CP2 included 1 threads control j control L2 buffers i buffers to i retire * pre-invalidation provide violations to be to UOwNCwlw* Spee»l*tXt»D*» of I maintained write and when * with is detect background and controller buffers may 'S'"* sequencing jvC#D ' L2 bits buffers Overview bits reorder the sr caches CM coprocessors 1 draining buffers tag and in tag L1 buffer sequencing t buffer L1 Speculation 3pM»OT*ol»b- L1 ” buffers bus buffers ! ” ■V>. - L2 ■ C occur Clears Maintains Uses Allows Commits Dirty Read Separate Speculation “ “ Write Write Buffer > > > > > Extra One Simple 0 0 o e & ■ ■ ■ Hydra L2 - 226 - University University Stanford Stanford needed changes is tone 7 2 12 - - 12 -30 OS -22 address + processors Loop-only Omhid 17 end bits tine handler sequence 15 -70 -70 OS other -no Hs sequencing special + threads OwM 25 Procedure a tag of if this on CPU that to starts snd and track Mice by loop execute from routines: the present) cache and iter buffers if Ae thread procedure SYSCALL on execution a execution a to on end loop speculative iteration, iteration, this code, procedure registers thread task cache primary a software starts thread, non-speculative. committed less next of cache loop loop a for at speculatively the processor then the speculative following following a requires out to is Vee writes speculative Summary processing it when snd completion secondary violation current current speculative runs procedure code thread major if tone L1 or head" its cerent stops down system the the until ... 
processing " the off processor processor coming bits RAW bits the the a another off iterations, shuts the processor, processor of attempts machine speculative forked control of this control then the the loop Forks loop Completes Completes Restarts ■utiuctioB, Completes then Temporarily (or Prepares running has Handles Pauses simply CPU a Coprocessor of thread Full running loop T Local from ” Loop speculative state are each Exception Procedure Buffer Procedure of the stop SunL the together Finish Receive End new Start Vioimkm: Hold: End Hold: System clearing the a or to some all operation commands examples: small “ it ~i a Procedures L *W<»* speculation Restart Start Start are - - - Has Initiates Commands Some > > > > Interrupts Holds Maintains Here Controls Catches Putting Speculative Runtime ■ ■ ■ ■ ■ KB 2 University University a for but on the Stanford Stanford Lines violations KB processors Support Cache 1 B hardware 64 Improving “ Applications other than to structures and dependency 98 ‘ Speculation System Parallel KB implement 0 from memory to Data commands “ speculation in ASPLOS to 99 ‘ ” threads Sizing recover simpler to ICS Speculatively speculation Runtime ” write-based writes and of coprocessor size overhead are line routines CMP description thread fully full in caches the Buffer most a speculative Handlers managing sending per more L1 flexible Multiprocessor when Hydra By Using By buffers used of KB - - - 1 Exception Control more Performance Complete Chip Adds the Data buffers buffer < captures Stall We associative pair > > > > Software > > > Small sufficient comparable a All L2 Speculation ■ ■ * - 227 - University University and bits Stanford Stanford avoid ... Hydra to logic of tuning evaluation ... 
including speculation code merging necessary version macros L1 for as mechanism here buses controller read controller for including cores cache performance write functional implementation pipelined for and required coprocessor, cache information screening memory reference be and existing system, fully interface structures resources invalidation of core fan-out CPUs will arbiter main buses all yet read Challenges rate the rate operation high gang memory interrupt feedback for bits cache CPU Overview of secondary write mechanisms memory speculative long system L2 small memory debugging clock off-chip cycle clock a with clear resource and for fan-in, our our and generate MHz Design Memory Single slowdown Conditional Read Gang Simple Target Controllers I/O Speculative Statistics On-chip Speculative to High > > > > > > > > > > > > > > Drivers Speculative Clearable Central Building Starting Adding Adding 250 Key Prototype ■ ■ ■ ■ ■ University university -► RC32364 out hazard, Stanford Stanford X to X by KILL data sends restart (IDT) handler the VIOLATION the address address message L2 the speculative respond CPUs handler - 3 KILL notices notices reads writes KB 1 1 a more processors executing handler starts The They All 0 exception 128 ***** Technology access and D bus C«w#wuc«b Hazard Device ______i \ 11 write IcigwAfl* ***** a KB Floorplan — [ NMI 8 | ******* mi indicate Data :■ with ****** Integrated a Lines CM) OwtM* - \ on I of 0.25p.m * in based Prototype '**£$!* 2 mm ■< Design 88 > > Hydra Anatomy - 228 - University University of Stanford Stanford purposes between make efficiently easy can evaluation memory workstation memory balance fine-grained statistics memory execution host for easier implementation resources main main relatively relatively a further main makes software is effective to into speculation CMP from program allow allocate internals off-chip like performance latency offers automatic will hardware chip and interface results programs during large 
speculative parallelization to of of I/O nearly to I/O reasonable structure and techniques I/O mechanisms a make prototype for possible speculative interface interface loading reading simple complexity communication can parallel interface offers Hydra hardware Allows Allows Lower Allows Allows Bus-and-SM Support parallelization design Arbitration threads Adding CMP >- > > > > > > > > Direct Simple Hydra A our The Prototype Conclusions ■ ■ ■ ■ ■ University University Stanford Stanford tuning 2000 all code of of for circuitry end coprocessors by HTOO models by monitors out 1999 violations system of on times Verilog tape Map layout speculative monitors end machines Mechanisms in and memory reference by and feedback state arbiter debugging utilization Road timers in Finish for design . through memory synthesizable provide verification on idle-arbitrating-busy resource resource circuit counters chains Design monitors Track Track Primarily > >- > Bus Programmable State Scan Central Speculative Finish Working Complete components Statistics/Debug ■ ■ ■ ■ ■ ■ Chip ■ ■ ■ - 229 - University University but Stanford Stanford violations loops processors hardware ... 
other parallel than to structures threads CPUs dependency dependency System system the implement from sequence in memory to commands speculation speculatively Java parallelizer to speculative caches threads in recover C into simpler simpler properly L2 for management speculation caches Runtime collector to to concepts write-based and L1 coprocessors coprocessor loops overhead routines threads in thread the speculative added automatic Handlers support code managing garbage system sending bits more flexible together together By Using By user the - - - In Dynamic Basic In Extra Buffers Exception automatically Speculation Transforms Control more Adds > > > > > > > > > > > Speculative Hydracat Software Hardware Software Works Speculation Outline ■ ■ ■ ■ ■ ■ University University while code easier Stanford Stanford dependencies parallel the management data in parallelism data conservative for CMP Hydra of post-subroutine for programming easier parallelization enforces parallel program Laboratory semantics a in with on Team of much simplifies make instead system parallel University Olukotun Hydra parallel program Hydra iterations Systems Support in runtime locks opportunities sections threads Kunle optimistic The for loop Stanford be parallelization communication sequential conventional execute can expands parallel to Computer multiple subroutines Programming manual supported Stanford latency run run allows Hardware-software Software - LL/SC Low maintaining Attempts Can Can Makes Compilers ► > > > > > > Hydra Speculative Speculation Parallel ■ ■ ■ - 230 - University University Stanford Stanford iteration ” speed memory Quick Orertieed “ each for checked -no management code loops to speculation forked I ” A'##!###';,: and reasons: through through loops Stow and off and “ OvaituA another within this forked processor, to processor another made has compatibility this locks call this overhead within shuts loop assembly requires passed by on this that attempts CPU be processor, 
several processor procedure running subroutine for then the Loops Subroutines then call this this be thread procedure execute on and starts requires a on by processor for and loop procedure committed speculatively execution MIPS task Use a and use require the speculative thread subroutines for subroutines on starts (or iteration, for for must speculative present) processor code, following iteration, committed violation if speculatively speculative processor, then loop to loop speculative code subroutines less current iteration into and RAW loops a processing speculative processing the thread, a speculation subroutine the violation current speculative completion system loop off current of its when down the execute . less the current the structures expensive expensive next RAW a ■ iterations, Handles registers predictions must predictions a Forks Restarts Completes the the made adds disable when can are Prepares run Completes data CPU to Completes Handles Restarts Local types are Receive hand-optimized value subroutine Procedure Procedure loops Software another Software in Routine loops loop CPU ” Local End Start Receive Violation: as ” Loop Loop from Violation: each routines another of iteration Start Return Complex Finish bodies Violation: different Callee-saved End from Violation: Slow Quick - - - “ “ Unfortunately, Same Written The - - - - > > Loop Two > > Support Support University University values loop- Stanford forked Stanford available effects j return hoc* dependencies side among tror'x predictions ... controlled made Threads dynamically executed from 4-45 return is Threads are predictable are easily or iterations distributed Speculative errors loop-carried loop calls check Speculation ... 
Speculative *■ > & as and/or T!"T .jfcwxfc VOID : nt- body call:: Prod prevents with enforces continuations generate limited Speculation dynamically software body are with Prod specially-marked ■ hardware hardware handlers a subroutines loops dependencies Loop speculation when iterations Iteration by Software Program Original - Post-subroutine-call Requires Speculation off Loop Requires CPUs carried Speculation > Original > > > > > Post-subroutine-call Loop -231 - loop loop Test in ); watt); University University Start y, Initialization x, locally *«»); . () variable out hazard, Stanford Stanford used location to X X function i*l; ,..)i * by ( KILL Xi data counter » y = sends yt restart loop W) * y handler the VIOLATION the 2S| Loop variables < address original address dependencies message f l* «** s // t //Loop-carried the •I** ’ speculative respond CPUs * handler (Hi U () 3 « KILL reads notices writes if spec_end_of_iter»tion X it tool*er_BigJunctiontl HOr»_Cod«_B«r separate loop 1 1 processors handler a more executing All 0 starts exception The They processors data all the TtiisLoop New, to In true Loops ThistoopUi void I: hardware: broadcast when write a Dependencies invoked indicate loop is code speculative in Lines the code itself Data locally Conversion by variable system function x, used Siwj , loop control , ti encapsulation i CPUs own ( counter i*i; the all its =, x? loop Loop, Variables Loop-carried y in * i++* runtime on / body detected // // loop / variables y 25} speculative 90; loop !* oop « loop else are I FOR The i <1 Add Put * > Hydracat Enforcing if X Another_Bifl_Funcfelon — — Program Start Speculative The The i, Transform 1. int University University ” that body only Stanford Stanford 7 7 -22 Improved Overhead “ ... 
Improved Overhead

■ "Slow" loop routines aren't needed for programs whose loops have been converted into their own subroutines
■ "Quick" loop routines use a simplified system with special register-allocated values
> However, this disables subroutine speculation

Routine: Use
- Start Loop: Prepares the system to run a speculative loop
- End of each iteration: Completes the current iteration, and then starts the next
- Finish Loop: Completes the current iteration, and then shuts down speculative loop processing
- Violation: Local: Handles a RAW violation committed by the current speculative thread
- Violation: Receive from another CPU: Restarts the current speculative thread when a less speculative thread requires this

Hydracat Overview

■ Converting loops into speculative code by hand is tedious
■ Our "Hydracat" tool converts loops into speculative code
> The user just marks loops as "pfor" or "pwhile"
> Convert the entire loop body into a subroutine
> Add the speculation management code (setup, start of iteration)
> Try to localize variables so that they can be register-allocated
> Pull out the loop-carried variables
■ Key purposes
> Makes it easy to develop speculative programs
> Ensure the speculative code is equivalent to the original loop
> Minimal added overhead
Optimization: Feedback

■ Speculation always maintains sequential semantics, even with untuned sequential code
> Hence code runs correctly without any tuning
> However, parallelism may be artificially hidden by the way the code is written
■ Hardware/software support monitors violations during speculation
> Records where violations occur
> Tracks the most critical, most frequent dependencies
■ Thanks to this information, a few small code improvements can often speed up the parallel code
> Allows quick performance tuning without changing the algorithm

Optimization: Value Prediction

■ Often, a loop-carried dependency is present that is only occasionally involved in a true dependency
■ Speculate on the common case first, and fix up in the rare case that the prediction fails
[Code figure: the example loop over i, with x = i*i, calls to A_Big_Function, Another_Big_Function and More_Code_Here, the special case i != 25, and the running total kept in local_sum before it is added to nonlocals->sum]

Variable Transformation

■ Loop counter: each iteration works on its own copy of the original loop counter
■ Loop-carried and other nonlocal variables (e.g. sum) are packed into a struct and accessed through a pointer (nonlocals->sum)
■ Variables used only locally (x, y, local_sum) stay local to the loop function
Hydracat Loop Conversion

2. Speculative code
■ The loop body becomes a new, separate function: void thisLoop(thisLoopNonlocals *nonlocals)
■ Nonlocal variables are gathered in a struct (thisLoopNonlocals) and accessed as nonlocals->sum
■ Loop variables (int i, x, y) are gathered in a second struct (thisLoopVars)
■ The program starts the speculative loop on all of the CPUs; the loop body itself is not changed

Optimization: Code Movement

■ Simply moving the writes of a critical loop-carried variable closer to the top of the loop body can greatly increase the amount of available parallelism
■ The calculation of the critical value is done in a local (local_sum), with a single write to nonlocals->sum
■ These are not transformations that compilers (e.g. SGI) apply to loop-carried variables held in registers

Java Speculation

■ Base: JIT compiler and runtime system (RTS), with no manual intervention
■ Optimized: code manually transformed with the help of dynamic performance (violation) statistics
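Returning to the Hydracat conversion and code-movement slides, the transformation can be sketched in plain C. This is a minimal, purely sequential illustration under stated assumptions: the bodies of A_Big_Function and Another_Big_Function are hypothetical stand-ins (the slides only name them), the else branch (More_Code_Here) is omitted, and the speculation runtime calls such as spec_end_of_iteration are indicated only by a comment.

```c
#include <assert.h>

/* Nonlocal (loop-carried) state, reached through a pointer as on the slides. */
typedef struct {
    int sum;
} thisLoopNonlocals;

/* Hypothetical stand-ins for the placeholder functions named on the slides. */
static int A_Big_Function(int i, int x) { return i + x; }
static int Another_Big_Function(int x, int y) { return x - y; }

/* Original form: the loop body is inline and sum is a plain loop-carried variable. */
int original_loop(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int x = i * i;
        int y = A_Big_Function(i, x);
        if (i != 25)
            sum += y * Another_Big_Function(x, y);
    }
    return sum;
}

/* Converted form: one iteration, encapsulated in its own function.
 * All heavy work goes into locals; the loop-carried variable is touched
 * by a single short read-modify-write. */
static void thisLoop(thisLoopNonlocals *nonlocals, int i) {
    int x = i * i;
    int y = A_Big_Function(i, x);
    int local_sum = 0;
    if (i != 25)
        local_sum = y * Another_Big_Function(x, y);
    nonlocals->sum += local_sum;
    /* On Hydra, the end-of-iteration runtime call would go here. */
}

/* Sequential driver; on Hydra the iterations would be handed to the CPUs. */
int converted_loop(int n) {
    thisLoopNonlocals nonlocals = { 0 };
    for (int i = 0; i < n; i++)
        thisLoop(&nonlocals, i);
    return nonlocals.sum;
}
```

Both versions compute the same result, for example original_loop(50) == converted_loop(50); the point of the second form is that the window in which an iteration touches nonlocals->sum is as short as possible.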
Why Java?

■ The Java Virtual Machine specification provides an ideal environment for implementing speculation
> The Java thread model supports coarse-grained multiprocessing
> Speculation is well suited to fine-grained multiprocessing
> A JIT compiler is required for performance
■ Many Java programs can benefit from speculation
■ Most native runtime routines (e.g. GC, class loading/verification) can benefit as well

Optimization: Synchronization

■ Explicit synchronization can be added to protect a critical section with frequent, unpredictable dependencies
> sum_lock is a new variable in the "nonlocals" structure, used to protect sum
> spec_lock is an assembly-language routine that waits until the lock is synced before continuing
> With Hydracat, it is used just like a normal function call: spec_lock(i, nonlocals->sum_lock)
■ Similar to synchronization in conventional parallel programming
■ Only helpful when the code is large enough to amortize the overheads

Conclusions

■ Speculation can provide low-overhead automatic parallelization of fully sequential code, especially for loops and post-subroutine-call continuations
■ With feedback from the speculation hardware, minor code optimizations can achieve high performance
■ Explicit synchronization can help with predictable dependencies in critical code
■ Programming with speculation can be as easy as conventional sequential programming
Speculating Java Methods

■ Implementation
> JIT compiler inserts speculation annotations into the compiled code for methods
> Unused registers are reserved for passing speculation state
■ Java-specific considerations
> Stack vs. heap accesses: local (stack) accesses can be clearly identified from the bytecodes
> Analogous to procedural speculation in C
■ Advantages
> Runtime system dynamically enables / disables speculative methods
> Can change, add or remove speculation by rewriting code dynamically

Advanced Techniques

■ Profiling
> Use a flag or sampling to minimize profiling overhead
> Can adjust the profile level-of-detail (LOD) dynamically
■ Take advantage of the JIT compiler
> Can invoke the compiler at runtime to utilize profile information
> Code can be compiled in separate versions, with and without annotations

Java Runtime Environment

■ Using the Kaffe virtual machine (http://www.transvirtual.com)
> GPL licensed
> JDK 1.1 compliant
> Just-in-time (JIT) compiler for the MIPS ISA
> AWT / SwingSet libraries
■ Can run any Java executable, with or without source
Hydra Support in the Runtime

■ General multiprocessing support
> Default thread primitives and synchronization
■ Loop speculation: two versions
> Via a source-to-source (or source-to-classfile) translator: the loop body is moved into a class-private method
> By hand: speculation handlers are inlined around the loop body
■ Method speculation
> Done in the JIT compiler, on chosen method call bytecodes
■ Some global register allocation is required for the speculation handlers

Speedups

■ Unmodified source => 1.38x speedup
■ With hand-optimized source and hand-picked speculative methods => 1.98x speedup

Garbage Collection

■ Tri-color Mark and Sweep
> White: unmarked objects; garbage after the sweep completes
> Gray: objects marked live that may still hold pointers to white objects
> Black: objects marked live with no pointers to white objects
■ Roots
> Thread stacks
> Class loader
> Native references
■ By default all objects are white; roots are explicitly marked gray; each gray object becomes black once the objects it points to are marked; whatever is left white after the sweep is garbage

Speculation Profiling

■ Collect runtime statistics
> Method call frequencies
> Size of loop and procedure bodies
> Accuracy of return value prediction
> Frequency and addresses of violations
■ Use the statistics to dynamically enable / disable speculation
■ Identify critical sections and regions that can benefit from optimizations
■ The same profiling can be used with C programs

Applying Speculation to Garbage Collection

■ Baseline garbage collection: mark the live objects, then sweep and free the objects that are no longer referenced
■ Speculative implementation: apply speculation to the critical sections of the collector, based on the profiling results
■ Determine the effectiveness of speculation in the collector from detailed performance results
■ Optimize the critical sections of the mark and sweep loops

Baseline Garbage Collector

■ Incremental, non-copying mark and sweep
■ Fast allocation
> Objects are allocated from free lists sorted by object size
> Large objects and arrays are allocated separately
■ Two threads: the collector thread and the finalizer thread

Write Barriers

■ Role: maintain the black/white invariant (no black object references a white object) during incremental collection
■ The JIT compiler inserts a write barrier on the bytecodes that modify references (putstatic, putfield, aastore)
■ The write is trapped and the object is placed onto the gray list

Experimental Results

■ Most objects are short-lived
■ Small objects are the most common: non-array object sizes are usually less than 200 to 300 bytes
■ Speculating on the write barrier
> No violations expected
> Executes as a procedural speculation: the barrier runs as the callee with no return value, while the calling method continues

Parallelizing Garbage Collection

Thread: Function
- Program: Write barrier
- Program: Heap at limit, invoke GC thread
- GC: Mark root objects (thread stack)
- GC: Mark root objects (native references, class loader)
- GC: Sweep gray list
- GC: Sweep gray list again if needed
- GC: Identify objects to finalize
- GC: Free white objects
- GC: Invoke Finalizer thread
- GC: Resume Program thread
- Finalizer: Finalize objects
Implementation(s) listed for these functions: Procedure spec (one function), Loop spec and Non-spec (five functions), Parallel (one function)
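The tri-color scheme and the write barrier from the slides above can be sketched as follows. This is a minimal illustration, assuming fixed-size objects with two reference slots and a gray list kept as a simple stack; the names Obj, GrayList, add_root, mark_step, write_ref and sweep are hypothetical. The barrier here grays the white target of a store into a black object, which is one common way to keep the black/white invariant; the slides do not make clear exactly which object the Hydra collector places on the gray list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 2
#define MAX_OBJS 16

typedef enum { WHITE, GRAY, BLACK } Color;

typedef struct Obj {
    Color color;
    struct Obj *refs[MAX_REFS];
} Obj;

/* Gray list as a stack; each object is pushed at most once per cycle. */
typedef struct {
    Obj *items[MAX_OBJS];
    size_t count;
} GrayList;

/* Turn a white object gray and queue it for scanning. */
static void shade(GrayList *g, Obj *o) {
    if (o != NULL && o->color == WHITE) {
        o->color = GRAY;
        g->items[g->count++] = o;
    }
}

/* Roots are explicitly marked gray at the start of a cycle. */
void add_root(GrayList *g, Obj *root) { shade(g, root); }

/* One increment of marking: scan one gray object, then blacken it.
 * Returns 0 when the gray list is empty. */
int mark_step(GrayList *g) {
    if (g->count == 0)
        return 0;
    Obj *o = g->items[--g->count];
    for (int k = 0; k < MAX_REFS; k++)
        shade(g, o->refs[k]);
    o->color = BLACK;
    return 1;
}

/* Write barrier: every reference store goes through here. A store into
 * a black object must not leave it pointing at a white one. */
void write_ref(GrayList *g, Obj *src, int slot, Obj *target) {
    src->refs[slot] = target;
    if (src->color == BLACK)
        shade(g, target);
}

/* Sweep: white objects are garbage; survivors are reset to white. */
int sweep(Obj *heap, size_t n) {
    int freed = 0;
    for (size_t k = 0; k < n; k++) {
        if (heap[k].color == WHITE)
            freed++;
        else
            heap[k].color = WHITE;
    }
    return freed;
}
```

The barrier matters when the program stores a reference into an object the collector has already blackened: without the shade call in write_ref, the newly referenced object would stay white and be freed by sweep even though it is reachable.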
Garbage Collector Speedups

[Table: garbage collection results for the original (orig) and optimized (opt) versions of each benchmark. Columns: elapsed GC time (ms), % of time spent in GC, GC speedup, overall speedup. The row alignment did not survive extraction; the values are listed per column in extracted order.]
- Elapsed time (ms): 9.5, 9.2, 17.4, 16.5, 58.3, 58.2, 33.5, 70.2, 79.4, 40.4, 62.3, 89.3, 135.2, 122.1, 2837, 295.2
- % time in GC: 1%, 0%, 0%, 7%, 7%, 3%, 7%, 5%, 6%, 2%, 16%, 12%, 16%, 10%, 27%, 20%
- GC speedup: 3.41, 3.12, 2.27, 2.66, 2.28, 3.78, 2.53, 2.84
- Overall speedup: 1.00, 1.11, 1.05, 1.04, 1.21, 1.12, 1.12, 1.04
■ Overall speedups are calculated from the GC speedup and the relative time spent in GC
■ Only JIT compiled code is measured
■ Applications that spend more time in GC will see larger overall speedups as the collector improves

Speculative vs. Traditional Parallelization

■ Traditional (non-speculative) parallelization
> Must design synchronization on shared variables
> Debugging for correctness is difficult
■ Speculative parallelization
> Code is structurally similar to the original loops
> Correctness is a non-issue; debugging is for performance
> More iterative: identify critical sections from violations, then set about improving them
> Performance is bound by wait and idle time resulting from violations, and by loops whose iterations are not balanced
■ Benchmarks: mtrt, Swing, compress, db, jack, javac, jess, jBYTEmark

Garbage Collector Speedups by Function

Thread: Function
- Program: Heap at limit, invoke GC thread
- GC: Mark root objects
- GC: Mark gray objects
- GC: Sweep gray list
- GC: Sweep gray list again if needed
- GC: Identify objects to finalize
- GC: Invoke Finalizer thread
- GC: Resume Program thread
- Finalizer: Finalize objects
Implementation listed for these functions: Procedure spec (one function), Loop spec and Non-spec (five functions), Parallel (one function)
Speedup values (row alignment not recoverable): 2.20, 2.61, 2.11, 2.22, 2.41, 2.61, 2.41, 2.09, 3.41, 3.11

Speculating Linked-list Loops in the Garbage Collector

■ Identified loops accessing linked-lists which can benefit from speculation
■ Need to eliminate non-essential dependencies between iterations
■ Process multiple elements per iteration
> Amortize speculative overheads
> Better load balancing between iterations
■ Maintain multiple (four) linked-lists, one for each speculative iteration
> Eliminate the critical dependency between iterations

Conclusions

■ Goal: take advantage of Hydra speculation for Java
■ Java VM: incorporate speculation into the JIT compiler, GC, class loader and verification
■ Dynamically managed speculation
> Profiling and feedback based optimization
> Match the parallelism to the available processors
■ Distribute work to multiple processors to keep processor utilization high
> Run Java threads concurrently
> Speculate on loops and methods within a thread

m IS 03-3987-9354 FAX 03-3981-1536 /l —3 ? •ri'yp ^©iSSW^E 16 fi ¥1® 1 2 ^ 3 H =7=105-0011 ESSSKS&E3-5-8 * IS 03-3432-9390 FAX 03-3431-4324?7;VT-7 V y f 7 V 5 7 -f 7->@#§©3 >711 75 vt3 Lfetfo T£/J'£ < */A£A-2>o Cffl^SI4ti£*©X7 7 V^TX Mctt^T*S1i*X7 7 4jT.s 1995 ^CjPjfgci ilfc Supercmputing ’95 7:^iS$jTfc'''>7u'7’ — 7 D' & r the Alliance 7*D id5 x7Fj, © 7 3©±g#. doJO'T'Di/z^ F£*fSik L£o iS*©iSS. 4-JPiWl4fElt*iti:ri7 7d'A77 F&iBC LT*b.
-rt-tc Grid Forum kPflifl-S Grid US®/cto® IETF(Internet Engineering Task ForcejlC jgo As Md&©®J£doir>'«|g©*d|i • SftS;J;*&lfoTU5cil)S|iJlibfe. $£. -f >77tFttt3;ET-( f-f Fife 7 7 7 7 £. iSWl&%* 7 F y-y-n^Hb. MtiSE7*7 7 -f 7 7 7iS*/b>6> PDA f;t©c3SS*& Web &k'©d,> 77&S|t''T'b—L7 L. ?©1©77'J 7-7a iSStff] +F—tf 7 *?&©$(<% #&2FtT©&o cdt(,®##lcFb^. ®ASH®^7 F ;F7v^3 > 4>'D±#©7-* —tt#*56ftCjgft£ ko? liotl'SC k Uto