NEDO-PR- 9 90 3

x-A°-=i wu ? '7-z / □ ixomgffi'g.

NEDOBIS T93018 #f%4vl/4=- — 'I'Jff&'a fi

ISEDQ mb'Sa 3a

0-7 rx-/<-n>/<^7-T^yovOIElS9f%l

TR&1 2^3^ A 249H

illK

t>*^#itafc?iJI+WE$>i=t^/U7S#r$u:|-^n S & -

*%#$...... (3) (5) m #...... (7) Abstract...... (9)

m #?'Hb3>/W3## i 1.1 # i 1.2 g ##Wb3 7 ©&ffitib[p]...... 3

1.2.1 ...... 3

1.2.2 ^ y ...... 18 1.2.3 OpenMP 43,fctf OpenMP ©x-^fcKlfOttfcifeSi...... 26 1.2.4 #%3 >;W ly —^ 3 ...... 39 1.2.5 VLIW ^;i/3£#Hb...... 48

1.2.6 ...... 62 1.2.7 3>;W7 py—...... 72

1.2.8 3 >;w 3©##E#m©##mm...... 82 1.2.9 3°3-tr y +1 © lb fp] : Stanford Hydra Chip Multiprocessor 95

1.2.10 s^^Eibrniiis : hpca -6 ...... 134 1.3 # b#^Hb3>/W7^#m%...... 144 1.3.1 m # 144 1.3.2 145 1.3.3 m 152 1.3.4 152

2$ j£M^>E3 >hajL-T-^ 155 2.1 M E...... 155 2.2 j£M5>i(3 > b° j-—x 4“ >^"©&Hlib[p] ...... 157 2.2.. 1 j[£WtS(3>h0a.-:r^ >^'©51^...... 157 2.2.2 : Grid Forum99, U..B., JavaGrande Portals Group Meeting iZjotf & ti^jSbfp] — 165 2.2.3 : Supercomputing99 iZlotf & ££l/tribip]...... 182 2.2.4 Grid##43Z^^m^m^^7"A Globus ^©^tn1 G: Hit* £ tiWSbfp] ---- 200

(1) 2.3 >?...... 208 2.3.2 3 >h° j.-T->r y?...... 209 2.3.3 ^DWiJ^>h”2-f0^7;MyF ...... 210

3$ tstXf...... 213

: A. "The Stanford Hydra Chip Multiprocessor"###^ (1999.11.22,24) OHP...... 215


M £Wib tT % o £ ti \Z <£ D , #!l X (i\ Virtual Laboratory (lEfiB^iB) • Virtual Microscope • Visual Super grid-computing • Networked Art (x y h y — 9 */7 JH tjx ) • Distributed Simulation • Technology for Everyone (?itHflfcW) & df O 4; d &,

bl_t:<7)4; d &WS£B§£xT, * rx-yt-n>yw ^ □ ^-oSHSE^j C li^do X 174^0 m% (D4 fcOli M t £&M £ 'if & o fc o #^'Jfb3 >;W VT(i, t;V^T'V^f >, >f >^-7D '> — A’flItfK OpenMP ^I70penMP(Dy-^^#Cl:fq](7X:#^, VLIWm#4l/^;i/

u WW y ©-titgfFlfficDfe^i&fpi}! t)I^§IiTc0 IMH

(D4#(D4fq)&tE@f ^ < gllfco j£ti^icn > b° j. —-r X > P&ffllz ol^T &, l^ti^icn > bi-f >f >777'J T — '>3 >(Dlilfgb!Mti'OUT!ll§t~£ b bSH, fMSftT £> ftTUTJKll^lc

dfi,^6(DC:0 2o0^#(D#^ -

(4) iNSfiK

? • a x v ny ]

#M #- ¥«BA^ ai?S t*f8¥f4 SS [# *] (50 g*) sis t#« ##EA# m%#@p ss tt ?A (#S±*E%Sr 3 AXAAgmm m# eia % i»ii*¥ ass Uj£ ¥A aSE*S:c*S«PS*^ti«|g-erE^flfftS$7-tfmg|i AEE55W m*# wbvekep/t mm A

[± *] 's.m mm ¥«D*f ax?s #%#?##%## ss [S M] (50 eIII) mm s*69§ ^xxAjg*w^m *3 ® *# Esij s±®9mm 3>t‘^.-^->x^AE^m MAEt-ef^® ass *$ &# #mm@A# A#K#mA-x^A#m^# ass ii]£ ¥A *ia*#@:[;#S*p^«ES«||^p^mt»fS?-tfWtg|5 EEE^g

[j£St#fC3 VtTa — 7-f • 9 ——7] [A 1] #w m- ##EA# Ufa #### ss m m\ (5os»i) "FE XW ARA# ASIt®«A>SS «* igjt *iSE*S:i:*eEPS*ASriS^rE^mtSIB7-tfmas £EE2ES era m ssha ? A#p^m*m%#m%# ass

(5) ill P fl# mm# #^i. rfe | B ill fPM iliSSHS x*SEP3 I8S$ i ^B & me x#e*ps ssssas #B me a##is6*@ *?iK*is ea #(6 me «««***i9 #?m*m j$««* me «««fse#)9 MAS SriW-i fiwn ■ficffiNisigte I /J\B &7 EiW-i fta #P foM Sfitot"--i [W9SE. ej jtsia mz mm*-i (M)B*tfr 4b H MM (M)B*t*! M# ^ (#) B

[$n*WlVt6*] $S « WS±*Sf%Sr 3 *b m mBiiKim 's^T-k^wum i§3@ ±ee%* *# @x #*B±$ ai?s /jv$ W¥ mmmm^x.mmmm^mm-^m^\nm7-mf^ i.ammr MX -$ T*>y h y-^-b>^- Btm

[###] m-h $- (M)B*tiigma^%tss ««&■§!$ s«s$ »* BBS® JEX (Bf)B*tl«®a^%B6 Sffjjfcegp SEP ±ffi®

(6) ¥-f& l l ^SirsljBLfc^SIStE^ r^-yi-n wtO'fi'/Dyj ©SES&**iB*P5S-eii, »iftaii3>ta-r-f >y^ia f 6 7cto©*#9#R#t: LTn Sta^vyi/^T'n-fe >t"i-7 7 >7&*fSk lf:3>/H7«W7'Dy7 5 >^JStttS«COVNT, is *#%&5VX S«l$S©#ttl • E?£is%*ffl©16lt&fiofc0 mssmcfefe o-Ctt, rae^ijn WW 5WGJ t y >^WGj ©2o©«i$§ gait, 36?ij3>yH7WG-ett, ® rm%r®mmb? >;H 7g»;itJSfiilj Sff? k #):, 7d^i? ('Sf^&SlJI^-SCifcfc-oTffl© ® rg ffismmg&ftJtoUtMbj &ffofc0 £«A-l(3>ea-7^ >7WG-t-tt. ® rggf© SSliffiU © -x-f >^*yyv-y-i>a>fl-s©1ti#j Stfofeo «TC^fflESS$kto-5o aS^J-fbnWSixftffi : mmt#ay%^A©u-fe«m (*^ny±©**esg) (v7k?x7 &76 6#Am©*^©e#) k©s«. a^s cntt, ae^j ®gy\- h'^iySEChtetU, ffiMB9i:36?iJ®a V 7 h 7 $ y I'^cissuti'^o #c, 70^7At©;i/-7"w.*©@BA-c©%^u%#mA^%a' t?afe 0x 7"n75AttJ©s$d"$tc v^.;v-e©jfi?iJtt#tti (vyu^^v i > jfi^j-fbti*) %*©7=-7*#, ®J»tt#©®]iKj6@xfcSlff*j$; (®aw*ffs T-f ?#K $ ?.(;tt3.—(ftffl'f >?77'>3 >&!SSx7r*?iJ®a^a-->7tt«A^^©tt* ¥Sft4b-(t-6ttk^-5o *^6©iS$E^"Clix 3fi5U®a v 7 h 7 ^/’©tfStte-S gijMWbn >;u 7©S #minit:'3V'-ci$*»is*&%mu^. uttreui. ©vw^i"< v^^j-fb^rK ©i >?-7n ->-'77)$ffi\ ©OpenMP fflteffix ®l«3W('fl/-y3>ftl, ©Air v^;v36?U®a1 ©SStoSIffftSx @a=-7L —— >7"'7 —yi/©ttEI6icax ®n>yW7© tilbFffl®l6lnlC-ov-t * kto&„ £ 6>tr, 77 > 7 * — FTa^tcatiut Hydra knV-S* 777Af7n-fe 77ffl»^ia«t;fef £fc>3 Olukotun $tj§&ffiWH8iB U 4-E© 36 ?|J -f b ft W tc 731' T MII i ^ Sfi U o iy.±©is«6ss$x.. ®7d77Affl#@^sc«^M?ij®a?i)Sfflf#e>nsftii7^e Stcreg^-giJ U369iJ-fbt"-S rvyv^^vi >36?!HbS«j, ©77 7 b^yi/&Sti r®« toSIGSiij^ ©a.—y#nwu% < T4,*jat;7-'-7&«-irr?> r@i)Tr-7di!tS«ij, >36?iJ-fb(;W/S bfe rx7i7 7L —U >7S6Ej, ©E/to*7D 77 A^Wj t*$BCX-3Wc r^.,,.-->7tt$j ©5-7$7ny3L7 h k UT»^|g%T^$S6BE @k VTIS$L7c<, ^*©^>7-7-777 t-T-tib »Mb3 >y W 7 ©ASM S6& y^irttie & »«-c- $ » v /c to. #Mfb3>/w7©em#mR#cMT6R#^% As^stfflisiftta bfco

(7) S • ¥ • S©$m#&ISSl IT V -7©fl»©Tl;:E^SM658»'f B*f ISMSieS: k B C k k bfeo l£*ail!3yi;i-TO?Si : fia^icn > en. —7 -f >¥&Wi (7d -;i*3>ei-fO?S*) B\ -IXCS^SliSlCfifl^cSflfB^jlA/T #Tt\BR%T& 0 , &#$ y b? —7k3>en. —7 t ©st-g-c ± o reKait®# nr 16 k & -a j&#ai $ nx u b „ *^6©ii«E%T-tts •E>i£iifl'E3>t;zL- f0i'077^h7!’ft (Grid) C7HT> *H6*-C'k U 728®lJ|6ll8$ & ■ i5 to C SI it L £ = £ 9t 5b Sc n > tf n, - 7 i’ >?cit^iSS;6Itii 4® Supercomputing ’99 N ©Grid Forum, ©JavaGrande Portals Group meeting x @ International Symposium on Computing with Objects in Parallel Environments 6Sj $7 B k#l%\ if7B 7 x. 7 b T& B®UC Berkeley © TMillennium 7D ¥ b 7 ©£«M3>ka-f-( 7 7l:A^T7-51/7v bffl&SI&Xfcf rGlobusjx ©NCSA k % o Ti# A B I"the Alliance 7□ V x- 7 b j U)co C©&@*, 4-JPRttEI+*tt7^7 j'"'!'Ai/7 b£@C VT*5D, f Tl: Grid Forum k If HtlB Grid S$©fcto©ffi®AsefiK$n, • 55jt*k"6 ffoTUB T ktfflBf Ufeo Site, '(>7 77b7!>fi>tLtli3;Er-(fi'fflIl!SS ltil$hfe7777S- ffll£®iil£ST7 b^-^TiS-BU $6(677 7-f y77 ig*A>e> PDA l:Wc-Ei4g*6 Web 4kffli’>77SlSCTD-ext1 f©±©77 |J 'r-'>3>liS*I14:SiItIli'(.> B#to7-lf7*Tl6t\K^#»$ftTt^B. cn?)©«!;st;tt^s a#m©m%mmi5##b:±#%;ad%&k?TL&oT^Bck# fl®J bfco £fc, &%5b%3 7Un.—7 -f >7l:i@±B77S7—73>(:3t\TI5, ®ig|SH'& k©eiSlb6E®fliJC«e.T^E^to±i£tc± DKttJSlc*©^ SDP iaS(¥IE;6e:7-n 77A)6«k LT8?D±tJTl6ltUko SDP FtSSHu *«©#m$jm»k©A^6, 8 &#icf©mgea^is tki-7 £;«;L £tt#tttg;!)s!£!'ST;& 0, Cft*T$mbb^±5b#T afeofeo SDP I9@© RiSlblC *51^715, ig-l+WSfi SDP 7n77 A£#»©777<- 7 Sints l#R(:#»©:+##T#E$-f B kuof && kB. C©fe», l£tiifl'S(3>tfn--7^ >7©ki5T-$>B7"-7iieeffilc±B6i6^©BW £® < » X. B C k * 57 l£«5bg(3 > t" n. -7 > 7ffl©7 7 >J 7- 7 3 > k LTiSf BC kSEIS Vfco

W±©±7»*^lS©m*#%&^*±T, 3m7W 7R#C3V\TI5. VeK 1 2^6*6 ry bvtyx b j£?iJ(b3 >7H 7 6®IS%j k VT7D7x7 b-fb±Bc kS fSltl'Bo j£85biC3 >t°n —7-f >7S®IC7ViT15, ©#@& =M>k Lfc9f^S!#B&£ MciSMStiG m#toMm©II#6&W6A4:f BC kl:± b,

Abstract

This report summarizes the results of the Leading Research project "Super-compiler technology" carried out in fiscal year 1999. The research surveyed the key fundamental technologies for next-generation high-performance computing. Specifically, it investigated (1) compiler technology for next-generation parallel computers and (2) global computing technology, identified and made concrete the technological problems, and examined the R&D organization. Two working groups, the "Parallel Compiler WG" and the "Global Computing WG", were set up to investigate each field. The Parallel Compiler WG carried out three investigations: (1) summarizing technological trends and problems, (2) making concrete the R&D content needed to initiate a project, and (3) planning the project organization. The Global Computing WG carried out two: (1) summarizing the latest trends in global computing technology, and (2) investigating application areas of global computing technology. The outline is shown below.

Parallelizing Compiler Technology: In recent years, the gap between the peak (theoretical) performance of parallel computers and their effective performance (the sustained performance when software actually runs) has been widening. This shows that R&D on parallel software lags behind R&D on parallel processing hardware. In particular, extraction of parallelism from the parts of a program other than loops is insufficient. Thus, (1) extraction of parallelism at various levels of a program, such as multi-grain parallelization, (2) new execution methods that go beyond control dependence, such as speculation and data prediction, and (3) tuning technology that interacts with the user will be the keys to technological innovation in the near future. This research surveyed trends in automatic parallelizing compilers, the main technology of parallel software, with respect to eight important technologies: (1) multi-grain parallelization, (2) interprocedural analysis, (3) extension of OpenMP, (4) dynamic compilation, (5) instruction-level parallelization, (6) speculative execution, (7) tuning tools, and (8) compiler performance evaluation. In addition, Professor Olukotun, who is engaged in the R&D of the on-chip multiprocessor called Hydra at Stanford University, was invited for an exchange of opinions. As a result, five topics were identified to be researched and developed as a project: (1) multi-grain parallelization, which divides a program into suitable grains to maximize performance, (2) speculative execution schemes, including task-level speculation, (3) automatic data distribution without user assistance, (4) a scheduling scheme for multi-grain parallelization, and (5) tuning schemes that use dynamic program information.
Moreover, technological development for performance evaluation is necessary in order to evaluate parallelizing compilers, because conventional benchmark programs are designed to evaluate hardware performance. In addition, the project organization was examined, and a centralized management scheme directed by a project leader is to be adopted. Under this scheme, researchers from industry, universities, and national laboratories are brought together to carry out the research.

Global Computing Technology: Global computing is an advanced technology that has recently been researched and developed rapidly in the United States. Attention is focused on the high-performance computing capability that results from fusing wide-area networks with computer systems. This research concentrated on technological trends in global computing in the United States in order to clarify the state of high-performance computing and the global computing infrastructure (Grid). Four major international conferences in the field are reported: (1) Supercomputing '99, (2) Grid Forum, (3) the JavaGrande Portals Group meeting, and (4) the International Symposium on Computing with Objects in Parallel Environments. Three major projects are also reported: (1) the "Millennium project" of UC Berkeley, (2) "Globus", which plays the role of a toolkit for global computing, and (3) "the Alliance project", a joint project among industry, universities, and national laboratories managed by NCSA. We conclude that high-performance computing has undergone a paradigm shift: an organization for spreading the Grid, called "Grid Forum", has already been formed and is circulating information, promoting standardization, and so on. Moreover, clusters built from commodity components are connected by very-high-bandwidth wide-area networks. Terminals, including high-performance graphics terminals and PDAs, are served through infrastructure such as the Web. On top of this infrastructure, many applications, from very-large-scale numerical calculation to commercial services, are being tried experimentally. It has become clear that R&D in Japan lags far behind these movements. As a global computing application, we take up the SDP (semidefinite programming) problem, which solves optimization problems such as structural design by a non-empirical method rather than by rules of thumb.
The problem is attracting attention for its importance to the effective use of resources. However, solving SDP problems requires computing performance exceeding that of a single supercomputer. SDP problems are sped up by computing many solutions with different parameters and selecting the best one. SDP is therefore well suited to global computing, because the long communication delay that is a disadvantage of global computing can be kept small.

We conclude that a project called "Advanced Parallelizing Compiler" should be initiated in fiscal year 2000. Moreover, the current R&D on global computing should be continued in cooperation with universities and national laboratories, with the aim of establishing new joint R&D among industry, universities, and national laboratories by clarifying the possibility of industrial use of global computing.

(10) S1S *?!Hta>/U7SHf is

1.1 mw

J!7fc<7>;W A 7 — v >7 • 3>h°a — X&, 1EE0 7T7 7 7°n-b y b ;i/7n-i! V^&##L/c7;i/A 7°D t Itl^o CCDcfcd&7;i/X-7°D-t? 7tb^^;\X;^7^-—vyx - —7°o-bvi7^®i#AQba^cyxyA cob-7## of##) ^l#t:7 7V3> -XoX^A^#

f ^t)7,yDt'ytbmcoi#^b#c,b-7%#b^m#mbcom^#7c^iM:^ D, nx b A7xr —xyxfrGMfc^frTj&fnJACTi^ bW:#W: < $ G c, #%co7;i/yxDtvtb^^;\X;^7^—y>x - oybxi —7T(±, ### #&r8i±^i±za brs ux-;i/#& C D, yxyA&^^C^f C0^##C#L^b^ df^ES&£o C07cA, 30J:9^ai§if3iiT^7^7t-x>x • n>bxi.— 7 b#Bf —tE(±#^%CMT 4bB67tT73 D , X—77 — C t-D T&MA ^yXA A^fg#7^##$7t6 0CM LX, ZCDXo &/W A7 ;t-7>X • 7 > bxL —7 047^&M^X, ABX&BiCS^fRI^ § — o ® IjJtEP C b, DOE(Department of Energy) £ pBL tlZ> Xo CH^iSI© 7D 'X a: X bCj:0;W/^7^ —7>X • 7 > b xl — 7 • X —77 — ^ 100TFLOPS b^;b03>bxL-7&^f5f&0&^#bX^&o ^6C, 1997 ^2 H (7 Clinton A&^i#7b #^0IT R&D #fb^ B# b, High Performance Computing, Communications, Information Technology, Next Generation Internet CIH IT -S 7 b'AX X#Z7C, &^(7&B#Xg8@bT: PITAC (President's Information Technology Advisory Committee ) ti,” X -^Tr^f y&m^b, 2004 ^^^^c$i.37biiiion (# 1400 mm) %#### - ;^x vx7^m##^^(@m'y^^Ac#o^^x&o, f 0 7: Ac, mmmfmom#, #^i7^-b0^^0$#fb^#x^^#x&6o m^c, ;\-b /V 7 b#m&0^7 >X & b D ,2010 X V 7 — ^3 > C#bX Peta-FLOPS 0^###^#^)/:A0 High-End Computing b^oC b&#^b, #^%C^;\X;^7^-7>X - 3>b:L-7##C:&(7&X::y7^ y bWo^^b&^mibbXV^o^^c, C0^^C^^#bX,2OO5 ;£ X C. Peta-FLOPS £3lit$<;b Xoh'XZ) HTMT(Hybrid Technology Multi-thread)7 7 >0#fl@&%A Zo b bXt^o — 7^^D7Dtyti:i^fq]ittM^)b, IBM Power4 7D -b y +ECf^£7l£ X o C. lfy 7°_t C1EB0 7° n -b y +E £ ## L 7c '> > 7";1/f y 7 • 7;i/f7Dizy7© ^^^tS^ibbXl^o C0j;o^S/>7';i/y'y7''771/yXD'b'ytb(d:, ^%ft0/^ — v 7* ;v 7 > b XL — 7, 7 — 7xy — '>3 > £AnA7 —A4S^\0SA7si^$i]£7l£ b#c, '> >77V7- 7 7° • 7;i/yxn7r b 17 < v b b >77—;^, r»X /^7 —

- 1 - V>7 - —#t:3—K®y7> t*v;i/f7n ’b'^it- 7 —77 7 7--v^£M£ii&t§n-i;:tiu ;W;^7 —v>7 • 3 >

^^7-A#6D 73 77 Ag#W:#j^^73 77 < >7^#g^t 0^73 77 y(: j: D^^^fi^#^t:^^7 4b.73 77 < >7(D0M^W:73 77 Aco^m%|q|±, 7377 < >7#^MT##m##&&3 7>7;i/f-v7 - y;i/f-7Dt'yit, ^i^^^#LX:y;i/y73t ^7-777

a#X.6^l^o '7^7373tvT'^6;W;^7^ —777 • 3 7 ba — 7 37b3- —7 kr&UTs fiJiE'B> 37 b/17#—777#f%±#4#$#-C^

©73 77 ASrSititoM^iMbU mTc k&W#^a T 6 g##^Jfb 3 7/W 7©^%^^#"C&6o *7-;^ —37/W7 - 777 37©#K#2%, ^^ij37;W7 - 7'-jp777;i/-7 c©j:7&^m@##?U/H £ tt&lz r7FA>7 b#?!Hb3 >;U 7Mj§H^j ©7D '7x7 b i\j fo] (t fz *PDi £: ff o fz o jp —77737bL7#g^L^,

• 5to0;v—7jfeWb£®x7 7—7 7>&'Mbfa±&;S^t~£ fc#><£>7;vy7'v-Y 7

• 7©#?iJ#&g|^mf&AW737-7yfg7-7#<#0#^%L • 3.-4f^3>;W D 3>;W API(T7V7-7 3 7 - 73 77 A - -T 77-7%:^7), e 7377A^^^©###m U 3y/w;i/&ff9#^ • ^^7^;i/#^lj#&3 7/W 7(31 b e 73 77 APp©^^J%©^MC j: b 7:r7&#m$Jm'T^&l'©&^#;5/:#)

• J1 —7Ccb^) 73 77 ->7'&##^$#6/:&b(D7 —;i/, • n >;W 7©'i4|b§rWE^j^|f{ffit"^^teCD^^Imi • 7 7 771/7-y 7 • 7jVf73-fe ^-7©ijfn] • M9\>

#%bl:s 73 7x7 MbbipWtI&EtIv'r UT^ < ^<§^EEE^3UtES7^o

-2- i.2

1.2.1 yaeyyss

lft0VAf7Dt y 7 > 7 t1 A IE gM7!Mb3 WH7T-gy A—74 7 > —> a > > /<7V©5)1'^'J-fb 4fr o T lx 6 [l]-[5]0 C tl fb © 3 >71 d y X!' g N GCD(Greatest Common Devisor)?S[l]x Banerjee © inexact and exact test[2]-[3], OMEGA test[6], > > dt '7 -by >t d y 71$f/i\ -Y >7 —7"n-> —-> yflU/riigi&k'mSy&T-'-ytt# 8W[l]-?\ 71/ — 7^-ia, 71 —70^-x X b ') 7/7/f T3>7\ 71/ — 7’d’ >7 — f-x >>, Tl/d'75-d'/3 > [I2][i3]&k'©7n7'y A 'JX F5i;f t'J (g«i7"n ^7illi) ft®[2]67Bl'T_ jfi^lJlbAJtg^tV-T-Sti^rS?** UTt'^., «Xll *B'PJ7'f*?0 Polaris 3 >7ld y[4]g N +4771/-7- >© 4 > 3 d >JI gBx i/>dtv y 7fit®x T I/d' 7Df?b'->a >, #e#y-7####l: 7d - SU1F 3 >71^ 7ttt, 7 D -> — '> y |a]©:r — 7 ##&## f lid1 > 7 — XD 7 — '7XMK- 7 D 7 "7 A V 7 15X b^y V >76ff?iJ$$6i: LTftGfHb 3 Sift, *y 'r>i7t '7 ©WxM'Jffl &Sllbfe7'-7 n —* '77 d icMI"5*iSlb»^'Sffltx"C7l/-7j6?0®$i©#xfi*lbS gf!LTV3[l5]o cn6©7V—7j6?iJ7ld 7 CltJ g b Ax& b ^ < 0 71 —7ASJ7! $tog367iJ-(bT-g6g 7 g&oT*Tix6 A* N gmi©d 7 P -7 a >©##T

SIJhtf-7'Si);®!' 7 1/ - > 3 >©#%7@M L&ghg& b&Wl-:/ (71-7 *y V K -7^>7>7) 44)37V-7,jy. limit* © J;a it$&k'6S-3 7!-7\ $,^;xtt7P-7*©S5mx.* :Cflfc©|ByiJ#!BAsa$ix^ 6©AS$> 6Jf6 (S?'J©fSK#«) !±3>yH ae©E$»7-yto#AsH*6;Acto. f ltb©7l — 714 # — © 7 n -b y y- ± X M ft W tc {& a $ lx 6 o 7D77 ASffaefS©l*) 99%©@fl'A si«)$>fc7'-ytt#«ffSt>'l7 7 h a Xfy '7 >7‘C g b afiyiHbTgfc blT4> i%©gB##±m© Za%#mb@#%7i/- 7 (ii?771—7) i^l'tt71/-7tt>i©Sy b LtSot LJ ofelMCIl 1000 6© 7 n-b yy-4ffll'T t>i67 100 IgClifiloUil/iMf'blt&l'o f&fci*., 7n-byy-*Ai g'&l'Jf-n-Cgx 1%© jfi?'Hb^ nrlb»g|S4>As, 7 n-b yy-&©igink#£j67ijffi8t$Sg©ini±6PEWf"6*§&7 r 77 k&oT L£a„ btzifi-oX, 4>#©77P77Dy y 7777A©@#zACI±, %dR©71-7#7'J tttriax. cns;T*Ggfc)ft-ttA&A>oAcfflsaafe?'Jtt[7]-[ii], ®a*4S3eyijtt[24]5& 7Blx6!ftgA si56[17].[22]<, 4 ') 7 4 til ') 7d7137*f7-Ad' >l$»sj±|B|T-gS% 1/71X6 PROMIS 3 > /Hdtii, zztfscommzmmtfiicmmtz htg (7\dry**7i -77777 7 ) k > > # '7 y 7 • -r— 7##)##R#i 6 d" '7 7 d" To#© Parafrase2 3 > 7tg 7 k, VLIW 641'Cxk >^71*8636 SUffiaSrffa * V 7771377;© EVE 3 >Ad7Si#^fct5Ckt:71 d y g N tt3I L 7b OpenMP API tC g -o X

-3- J|BK6>l£?!Jti6i3itv;i/^ L S*)5l6?iJ-fb3 WW 7[ll][17][18]©®g6, SMP v->>±t:jg fflnJEKs *16636 ?iJ»a41iICxB^So

(1) S»ffl6«36?!l®a

;:ta, oscar vii/sf w >36?y-(bo wuresitsgfj*i667 7 7 3e?y@ afsconta^ii, *i667 7 7 36?y®afii£ti:1 jut©40

(a) 8fii?6n6X?tsS (b) *166"=<'7 o 777181©o > h o—T7 o—, 7—7 tt $11 It (c) o>po —T—7##&#@04: V7 D 7 77ia©###(tW#Af6 JSffi (d) &PCfflfflo-Ffc7'4't-5 y6^^^i“7©4sE

OlTT'li, ;*l6ffl^f'?7SSj4f5» a. 77 0 7 7 7£fi£ Fortran 70 77 Alt, HffBgt: 70-b y 44 (PE) 6>07T-A*'\y P\ T-7gmT-/i^\y F kit® OT*g*t69l;:7; $ < &S J; 7 t: OSCAR o >y(4 5 #Tj$TS v7 D 77 7ltA SSfttft A77o y 7 (BPA), #0)8070 y 7 (RB), 44711/ —7->7o y 7 (SB) © 3 SI®© 70 y 7 TS> So BPA It, «SttS®©**7o y 7 (BB) k LT$#SftSo 4:4: U Cffl BPA ©4fiUc* ‘ lATIi, 7o77A©36?iJtt, > fStl'T-f igST-A'-'x y P&#ltt T, ®*7o y 7 ©74|y-^, a 0 66©*1 W7 a 4’zi' 6£4$t"S &to© v7 n 77 7 iiir»k-*iefflS*tSo **7o y 7-9- BPA ©44ffllt, Z 066©*H\-77 o777M©367y##mi:m'6 *tSo «X«H 1.2.1-1 (a) © BB2 It, %'n9 77 T£>S RBI ©#ma@44k##t77 7T;fcS RB3, RB4 fflliu®a$fl-ffl t o & 2 -D©-K#©)ffiV^g|144SSA-e* ‘0 , BB2 It 0 1.2.1-1 (b) ICSItS BB2A k BB2B K»S1T-$S„ C. ©tHflll; iol. RBI k BB2A ©711/-71, BB2B, RB3, RB4 ©711/-7ffl36?y)aa* s Htb k & S o £4:, 5EffHeflfl©'li?& BB CtlOTIt BPA ^©ffl-6r* s)iffl$n, 7‘4'T5.y777 -7a.-V >7^--y1^xy P6«'>$ti-So MS It, * OB 1.2.1-2 (a) ©7D-777C SltS BB4 k BBStfltk Ak:6:6S4:&l,vjM5& BB-e$>S& 6>lt,BB4 k BB5 It BB2 mc&Sd'Mk 0T7n$4tSA($44R71:#fr$*i-, 0 1.2.1-2 (b) ic^snsSole® ®7k Lt«t>tlSo BBS #0 1.2.1-2 (a) IC<S BB4, BBS lcT-7##

(a) An example of a basic block having disjoint data dependence inside
(b) A macro task graph after basic block decomposition
Fig. 1.2.1-1 Basic block decomposition into BPAs

BBS It BB2, BB4, BB5 y 2 BB Z (± BPA a RB it. Do ;i/—i/\Z tZll — 7\ t ft t> % =l 7 )l)l — X

0 1.2.1-3 \Zmt 9 U $ 7,9 1t7" u-c&, pcpc thy'x 7^ RB CD5>|!ltCjoVTs 0 1.2.1-4 (a) iZjF^tlZt d —A — ■7 v fco;i/ —7lt, th7"v^7 D ^ £ fc&fcn — F t° 0 1.2.1-4 (b) lz&tf2> RBl.l t RBI.2 (D t O tzt-X h Ztlfzfr RB Doaii ;b-y \t PC \ZMb btlfz'?# up Zf7 t LXWifc>tl%o tt£t>t>, ±X 0 PE, & U < & PC Doall )l — X\t±XO 7°n -fe '^T't Doall;b-7(±k _v/7D^^/7^Li:PC(:mD^%6^6o cc^k(±, mmtipcm^^i'v

D —4f-'> 3 XDfctblZMRgtlZo

£(D&?i]j£&ttiti}'X! — LX RB tfel ffilz# P *?

(a) A macro task graph with several small basic blocks (BBs)
(b) A macro task graph after basic block fusion
(Legend: data dependence edge; control flow edge. In (b), the fused basic blocks form BPAs, each containing a pseudo statement.)
Fig. 1.2.1-2 Basic block fusion into BPAs

Fig. 1.2.1-3 RB (diagram labels: RB1.4.1, Doacross)

(a) A RB having overlapped loops inside its loop body
(b) Sub-macrotasks of a RB generated by copying code (RB1.1, RB1.2)
Fig. 1.2.1-4 Generation of sub-macrotasks of an RB

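b. Generation of the macro flow graph (MFG)

The macro flow graph (MFG) represents the control flow and data dependences among macrotasks. Fig. 1.2.1-5 shows an example of an MFG. Nodes in the MFG represent macrotasks such as BPAs, RBs, and SBs. Edges represent control flow and data dependences, and a small circle inside a node represents a conditional branch. Arrows on the edges are omitted in this MFG; edges are assumed to point downward. Because every loop back edge is contained in an RB, the MFG is a directed acyclic graph (DAG).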

Legend: data dependency; control flow; conditional branch
BPA: Block of Pseudo Assignment Statements
RB: Repetition Block

Fig. 1.2.1-5 Macro flow graph (MFG)

-9- c. vi' n ^IS© j6^iJtt©#titi MFG (j:vi7Di'Xi'M©3>h D —11/7 D —kv —i'teWSrSTAb 7 7 D £ 7 7 HI] ©3fe?!H414*3I bTi'&V'o -jKtoC> n > b d— ;MS#7' v 7, t> b < Ii7n T'v Att #7-9714 , * vvi7Di'xi'i iac7:-'-i'tts»s^itnHE*fflj6Mtt4SUTv^ 0 v A* bHISCtt, v7 o 77 7H9lcl4f-7##A^##±%o Ifetfot, v7n7D —7' V 7 *6 v7 Di'Xi7|g©a69iJtt6#titiT3 > h D— JU&frtr — 7tt#© *ffiK6j6Mtoa^$tr*V'Tli. n>hn-;u

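The earliest executable condition of macrotask i (MTi) is the condition under which MTi can begin execution at the earliest possible time. For example, in Fig. 1.2.1-5, the earliest executable condition of MT6, which is control-dependent on MT1 and MT2 and data-dependent on MT3, is as follows.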

(MT3 completes OR MT2 branches to MT4)

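Here, "MT3 completes" reflects the data dependence of MT6 on MT3. The OSCAR multigrain compiler assumes the following execution conditions: a) if MTi is data-dependent on MTj, MTi cannot start execution until MTj completes; b) an MTi that is control-dependent on MTj becomes executable once the branch direction of MTj is determined. Generalizing the condition for MT6 above, the original form of the earliest executable condition of MTi is as follows.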

(MTj, on which MTi is control-dependent, branches to MTi) AND (each MTk (0 ≤ k ≤ |N|) on which MTi is data-dependent completes OR it is determined that MTk will not be executed)

For example, the original form of the earliest executable condition of MT6 is

(MT1 branches to MT3 OR MT2 branches to MT4) AND (MT3 completes OR MT1 branches to MT2)

The part before AND is the earliest executable condition determined by control dependence, and the part after AND is the condition for satisfying data dependence. The second condition states that MT6 becomes executable when MT3 has completed or when it is determined that MT3 will not be executed. Here, execution of MT3 implies that MT1 has branched to MT3, and execution of MT2 implies that MT1 has branched to MT2. The condition is therefore simplified to (MT3 completes OR MT2 branches to MT4).
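The relation between the original and the simplified earliest executable condition of MT6 can be checked mechanically. The following Python fragment is only a minimal sketch: the `RunState` record, the function names, and the control-flow assumptions listed in `consistent_states` are hypothetical illustrations, not part of the OSCAR compiler. It encodes both forms of the MT6 condition and confirms that they agree on every run-time state consistent with the assumed control flow.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class RunState:
    """Run-time facts a dynamic scheduler could record (hypothetical structure)."""
    completed: set = field(default_factory=set)  # names of finished macrotasks
    branches: set = field(default_factory=set)   # (source, destination) branches taken


def mt6_original_condition(s: RunState) -> bool:
    # (MT1 branches to MT3 OR MT2 branches to MT4)
    control = ("MT1", "MT3") in s.branches or ("MT2", "MT4") in s.branches
    # (MT3 completes OR MT1 branches to MT2, i.e. MT3 will not be executed)
    data = "MT3" in s.completed or ("MT1", "MT2") in s.branches
    return control and data


def mt6_simplified_condition(s: RunState) -> bool:
    # (MT3 completes OR MT2 branches to MT4)
    return "MT3" in s.completed or ("MT2", "MT4") in s.branches


def consistent_states():
    """Enumerate states that respect the assumed control flow.

    Assumptions: MT1 branches to MT2 or to MT3 but never both, MT3 can
    complete only after MT1 branched to it, and MT2 can branch to MT4
    only after MT1 branched to MT2.
    """
    for b12, b13, b24, c3 in product([False, True], repeat=4):
        if (b12 and b13) or (c3 and not b13) or (b24 and not b12):
            continue
        state = RunState()
        if b12:
            state.branches.add(("MT1", "MT2"))
        if b13:
            state.branches.add(("MT1", "MT3"))
        if b24:
            state.branches.add(("MT2", "MT4"))
        if c3:
            state.completed.add("MT3")
        yield state


if __name__ == "__main__":
    for state in consistent_states():
        assert mt6_original_condition(state) == mt6_simplified_condition(state)
        print(sorted(state.branches), sorted(state.completed),
              mt6_simplified_condition(state))
```

The simplified form needs fewer run-time tests, which matters when the condition is evaluated by a scheduler each time a macrotask finishes or a branch is resolved.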

Fig. 1.2.1-6 Macro task graph (MTG)

A — ^ r- o d. MTG ±©TX D XX X© PC, PE 'ji'X'ri’i-'J'/y vxDXxxii, HfiWc PC, PE tend'd- 5 >X£tl-?>o CfflX''i'd- 5 •^^'ryi- 1) XXtifflilSBXXXCytf LTjSfflSh-Bfcto, XXXjl-V XX’tf-yS —'xx PA5# X XtoUffBSHSOsf L"Cffi*ftoC'J'$ < &-5o OSCAR n WW xCint-SX'-f X- 5 x X XXXi-'.) XXtt, -flgto& SMP X;v-^>tc^oTffbn^x, 3>yW3tt, b fc5$*totcx-yy a. — u >X3— h*$XDX3AcrktC$fi$t"-E>o PE &-^|a|M^- —yl-^ X P©J;3^'X'>>y\-7X-Xtcd;oT*4'X'XyyL-U >X^ffitisjStR$ftfciSiBr, X yr-Xi-u ipe ±-cnff$n«o »BxxxyL- v >x^S6sa^$n fct§-&, x^Xo. —u xxyv —f^xivx dxxX3— p©##ic#A$ti&o x'rXi-'J xx'yyi/x u XA t UTIi, <©x-XX^L-V xXiX-yi'vx p k4fiK Sil/cX-yXyL —yi/©*6#® LfciSS, X U t- d *yi/7lxS6IBU-E> X'l' x- 5 xXXX" XZL-IJ >xr;i/3"ij XA Dynamic-CP &X X n X X X K*3 tt -5 MTG ± ©ttlPX-k ^n ©X U f-'t *;py^xMIA, TX oXxxim©^ (r-^|»»i) i£ t)S88£ tit V 3 [20][21]„

Estimated branch probabilities: 80% and 20%
Longest path length from the exit to each macrotask: 20+30 and 10+50 on the 80% branch; 20+50, 40+50, and 70+30 on the 20% branch
max(50, 60) = 60
max(70, 90, 100) = 100
0.80*60 + 0.20*100 = 68

Fig. 1.2.1-7 Estimating the longest path length from the exit node to each macrotask using branch probabilities
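The arithmetic in Fig. 1.2.1-7 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name and data layout are hypothetical, the 80% probability is inferred from the 0.80 factor in the figure, and the grouping of the five paths under the two branches is inferred from the two max() terms. For a macrotask with a conditional branch, the longest path under each branch is weighted by that branch's estimated probability.

```python
def expected_longest_path(branches):
    """Expected longest remaining path below a branching macrotask.

    branches: list of (probability, paths); each path is a pair of
    numbers whose sum is the path length, as in "20 +30" in the figure.
    """
    total = 0.0
    for probability, paths in branches:
        longest = max(first + second for first, second in paths)
        total += probability * longest
    return total


# Values read from Fig. 1.2.1-7 (grouping per branch is an assumption).
figure_example = [
    (0.80, [(20, 30), (10, 50)]),            # max(50, 60) = 60
    (0.20, [(20, 50), (40, 50), (70, 30)]),  # max(70, 90, 100) = 100
]

if __name__ == "__main__":
    print(round(expected_longest_path(figure_example), 6))  # prints 68.0
```

A scheduler in the style of Dynamic-CP could use such an estimate as the priority of a macrotask, so that macrotasks on the expected critical path are dispatched first.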

(2) IBM RS6000 SP 8 XD-b X » SMP ±T?©t4|g

C. ct'li, 8 XD-fe X+!£}§* bfc SMP IBM RS6000 SP 604e High Node C istt-s, v>

-12 - a. OSCAR Fortran 3 >7W ~7 HI 1.2.1-8 Us OSCAR Fortran n >7W 7 ^ LT & tC >7^ 7 7 D > b^> K (FE), ^ F71/7^ (MP), J:> F (BE)

OSCAR Fortran □WW7l:(t OSCAR, VPP, MPI-2, UltraSparc, PowerPC, OpenMP(Dj;o^#^%^-yvF, #?']####, 7^77Vm0/i'7^:i:>F#& 6o OpenMP 7iyf :n> FU\ OpenMP F ^7 f- ^ Fortran V —X3 — F £ g fj#J £ f 3 tz & £1128 41 a ts 1MB 0i-7?4 FL2^r^'7'>a§f^, V (± 1GB t? & £ o

Diagram labels, top to bottom: OpenMP Fortran Source Code; Middle Path (Multi Grain Parallelization: Coarse Grain Parallelization, Loop Parallelization, Near Fine Parallelization; Dynamic Scheduler Generation; Static Scheduling); Intermediate Language; Back Ends (OSCAR, VPP, OpenMP, STAMPS, UltraSparc, PowerPC); Native Machine Code

Fig. 1.2.1-8 OSCAR Fortran compiler

Fig. 1.2.1-9 MTG of the ARC2D subroutine INTEGR

c. Evaluation program

The program used for evaluation is ARC2D from the Perfect benchmarks. As an example of restructuring by the compiler, Fig. 1.2.1-9 shows the macro task graph of the ARC2D subroutine INTEGR. ARC2D is a program of 40 subroutines and 4500 lines. About 90% of its execution time is spent in subroutine INTEGR, and within INTEGR the subroutines FILERX, FILERY, STEPFX, and STEPFY are larger than the other macrotasks. Applying transformations such as loop unrolling to these subroutines yields the MTG of Fig. 1.2.1-9.

d. Performance on the SMP server

This section describes the performance of the above program on the IBM RS6000 SP 604e High Node. In the evaluation, the original sequential Fortran program is converted by the OSCAR compiler into a parallelized Fortran program written with OpenMP directives, which is then run on 1 to 8 processors of the RS6000 SP 604e High Node. The performance of the OSCAR compiler is compared with that of the IBM XL Fortran automatic parallelizing compiler. XL Fortran is invoked with "-qsmp=auto -O3 -qmaxmem=-1 -qhot", its highest optimization options. Fig. 1.2.1-10 shows the speedup of ARC2D obtained with the coarse grain parallel processing of the OSCAR compiler. The sequential execution time of ARC2D is 77.5 seconds. With coarse grain parallel processing by the OSCAR compiler, the execution time on 8 PEs is 23.3 seconds. The OSCAR compiler thus achieves a 3.3-fold speedup over sequential execution and a 2.6-fold speedup over the XL Fortran Version 5.1 automatic parallelizing compiler on the same 8 PEs.
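The speedup figures quoted above (77.5 seconds sequential, 23.3 seconds with OSCAR on 8 PEs, factors of 3.3 and 2.6) can be cross-checked with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, and the XL Fortran execution time it prints is derived from the quoted 2.6-fold factor, not a value taken from the report.

```python
def speedup(reference_seconds, measured_seconds):
    """Speedup of a measured run relative to a reference run."""
    return reference_seconds / measured_seconds


SEQUENTIAL_S = 77.5   # sequential execution time of ARC2D
OSCAR_8PE_S = 23.3    # OSCAR coarse grain parallel execution on 8 PEs
OSCAR_OVER_XL = 2.6   # quoted speedup of OSCAR over XL Fortran on 8 PEs

oscar_vs_sequential = speedup(SEQUENTIAL_S, OSCAR_8PE_S)

# Derived estimates (assumption: the 2.6x factor is a ratio of 8-PE times).
implied_xl_8pe_s = OSCAR_OVER_XL * OSCAR_8PE_S
implied_xl_vs_sequential = speedup(SEQUENTIAL_S, implied_xl_8pe_s)

if __name__ == "__main__":
    print(f"OSCAR vs sequential: {oscar_vs_sequential:.1f}x")
    print(f"Implied XL Fortran time on 8 PEs: {implied_xl_8pe_s:.1f} s")
    print(f"Implied XL Fortran vs sequential: {implied_xl_vs_sequential:.1f}x")
```

Under these assumptions the 3.3-fold figure is consistent with the two measured times, and the 2.6-fold figure implies that the XL Fortran version ran only modestly faster than sequential execution on this benchmark.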

(3) Summary

This section described the coarse grain task parallel processing scheme of the OSCAR Fortran multigrain parallelizing compiler. The performance of the OSCAR multigrain parallelizing compiler was compared with that of IBM XL Fortran Version 5.1, the parallelizing compiler on the 8-PE IBM RS6000 SP 604e High Node SMP. Using the same 8 processors, the OSCAR compiler was confirmed to achieve a 2.6-fold speedup over the IBM XL Fortran compiler on ARC2D from the Perfect benchmarks.

Fig. 1.2.1-10 Speedup of ARC2D on the RS6000 (series: OSCAR, XL; horizontal axis: processors; vertical axis: speedup)

[References]
[1] [illegible]
[2] U. Banerjee, Loop Transformations for Restructuring Compilers: The Foundations, Kluwer Academic Pub., 1993.
[3] U. Banerjee, Loop Parallelization, Kluwer Academic Pub., 1994.
[4] W. Blume, R. Eigenmann, J. Hoeflinger, P. Petersen, L. Rauchwerger and P. Tu, "Automatic Detection of Parallelism," IEEE Parallel & Distributed Technology, Vol. 2, No. 3, pp. 37-47, Fall 1994.
[5] D. J. Lilja, "Exploiting the Parallelism Available in Loops," IEEE Computer, Vol. 27, No. 2, pp. 13-26, Feb. 1994.
[6] W. Pugh, "The Omega Test: A Fast and Practical Integer Programming Algorithm for Dependency Analysis," Proc. Supercomputing '91, 1991.
[7] [illegible], "Fortran [illegible]," [illegible] D-I, Vol. J73-D-I, No. 12, pp. 951-960, Dec. 1990.
[8] H. Kasahara, H. Honda, M. Iwata, M. Hirota, "A Macro-dataflow Compilation Scheme for Hierarchical Multiprocessor Systems," Proc. Int. Conf. on Parallel Processing, Aug. 1990.
[9] [illegible], "Fortran [illegible]," [illegible], Vol. J75-D-I, No. 8, pp. 511-525, Aug. 1992.
[10] H. Honda, K. Aida, M. Okamoto, A. Yoshida, W. Ogata and H. Kasahara, "Fortran Macro-Dataflow Compiler," Proc. of Fourth Workshop on Compilers for Parallel Computers, pp. 265-286, 1993.
[11] H. Kasahara, H. Honda, S. Narita, "A Multi-Grain Parallelizing Compilation Scheme for OSCAR," Proc. 4th Workshop on Languages and Compilers for Parallel Computing, 1991.
[12] P. Tu and D. Padua, "Automatic Array Privatization," 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993.
[13] Zhiyuan Li, "Array Privatization for Parallel Execution of Loops," Proc. of the 1992 ACM Int'l Conf. on Supercomputing, pp. 313-322, 1992.
[14] M. Gupta and P. Banerjee, "Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers," IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 2, pp. 179-193, 1992.
[15] J. M. Anderson and M. S. Lam, "Global Optimizations for Parallelism and Locality on Scalable Parallel Machines," Proc. of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pp. 112-125, 1993.
[16] B. Kuhn, R. Menon, T. Mattson, R. Eigenmann, "OpenMP Parallel Programming," IEEE/ACM Supercomputing '98 Tutorial Notes, Nov. 1998.
[17] [illegible], "[illegible] FORTRAN [illegible]," [illegible], Vol. 40, No. 12, pp. 4296-4308, Dec. 1999.
[18] H. Kasahara, M. Okamoto, A. Yoshida, W. Ogata, K. Kimura, G. Matsui, H. Matsuzaki, K. Aida, H. Honda, "OSCAR Multi-grain Architecture and Its Evaluation," Proc. International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, IEEE Press, 1998.
[19] [illegible], ARC/HPC [illegible], Mar. 2000.
[20] [illegible], ARC136-8 [illegible], pp. 43-48, Jan. 2000.
[21] [illegible], Vol. 40, No. 5, pp. 2054-2063, May 1999.
[22] H. Kasahara and A. Yoshida, "A Data-Localization Compilation Scheme Using Partial Static Task Assignment for Fortran Coarse Grain Parallel Processing," Journal of Parallel Computing, Special Issue on Languages and Compilers for Parallel Computers, May 1998.
[23] [illegible], "A Standard Task Graph Set for Fair Evaluation of Multiprocessor Scheduling Algorithms," Proc. ICS99 Workshop, pp. 71-77, Jun. 1999.
[24] [illegible], Vol. 40, No. 5, pp. 1924-1933, May 1999.

- 17- 1.2.2 -1*

-f > f-ro -> y Ltzo «£%Esnfcii**$(D±ifx mm&ffift&iMmtZo mmmti-eyu-MKc-Duxit, ^©f«is$»sibs$ ft"CU'^i©"Ct©Pl#& (1) T'SSdh'f"-5o (2) X14 Whole Program Paths kDf y°uy?A(DWiTjpx Hz > -> 7 + 7h/f-7"D4z y+lia Fortran?? 7D 7'7 AffiJffl t -5 7 — 7 tfiitS o fz A5, gStt Fortran90 ^ C/C++ k © o fz ft 4 > 7$E&7"n 77 5. > 7' a gU OfUffl As JiijP'f &(flifacfe D. 6i63£?!Hb kv> A k k/cl;i"C-tt & <, 3£?il 7" o 7" 7 a |6] id © -i1 > 7 IWf/f ASE h f y 7 lb & D ok> fc 3 „

( 1 ) &##8g?!l7=-f 7D-##

137(1 y-*- 7 7 D-1$#f@E6ti:7,n77 A&gM7iJ-(b'f-E./ztoO^-H-f > h Tfc D , C©###mi=HA!:-m©@#lol±Hd:?oI/k?&^. k k 6A\ $ECtt7"n7‘ 7Artcjklcz^,$l$a-%l±f ©tEm^##&a#»*©l: bti'^o 3 >;W yfilE UV'7-d ^y AHfrSlSET-B^S^ife-BXzto, ?M%y-7##ibMU -5 kES-ti-d-^. 6fe-fs k©E$©)iT&#g)g4b^j£7!Hbti:SJ|Sg;* fuck«5. 77 > 7*- K*¥-eS«dnZc SUIF 3 WW y&IH'ASMElb J; D , *(^^$119 7 D-6#gUfc3 >yt+';hBeWSf©#S-fbkkffl«ffi;$S$l?Sz5ufc ffi3 7 h Ikiai -|g© 467iMb©{ig;eA 5S6$-t- § 3 k k * S/T$n tl'4[5]o ^ k T-. * ft 7 * k l? a 3 * ft ft R £ # ffi U fc »Iff * S k U T £ ft ft SB 7U ft- 7 7 o —)WSrS5;[5]AstgSStvtt'-6o k©ftfl;tt, #*©#Mib^'77+'^— Mb©?:©© SB?1!?1— 7 7 D—Sff^lSilt'5 b t C rl+• /MS lb © <7ft'J IB £ tvt $ Ac A\ k ft£ a -fiS-lb LTSgfflieib| iF?) predicate SrS^ttilTk k&fiHo k©e®6 SUIF nwiotmt. 3-3©^>ftft-7 7n7'7A(SPECfp95, NAS sample benchmarks, Perfect)A15> T’D 77 A 6 SIR I, 4000 Tv — 7S:Si ?> ti © IkMl/tam U, Fffl LAcSS$A5i86£tVTV3[2]o ftfttba-5 k, ^-7 SUIF 3 WW 7tlW“?ffl 50%6Sx^/k-7S3l£?iJfbt S fc* s, # 0 ©A 2000 Ik —Tibia, I/O "prtSKtijn As#6f-5 k ktba o"t 4£7(Hb©lK#

- 18 - XT\ ipo-®a±;i/ — XX^ &mii: 430 ;i/-7#4 u, f m7 SEM-X& 150 tfeofco ^-X SUIF 150 )};-yp(DMLm'Ci\ZtkW. LXX^fztK ^fbFtgE?0X'-7 7 D— ffttif £ £ D 64 lb—Xtt^iMbtr^fco D W$i&&mx$>3 h¥U$rT:§3o UT

cm&AM:U\ Z=a;UaE?'J#^#0@Efi|#B8 (Z^f'f- X “ 7 7 D — #P$f k LX L ^D L tlX M % Inspector/Executor[6] [7] b H?

(2) Whole Program Paths

X7 7'7 Am#fiJ{b"7#i#lb&X##)X^ < ±X7°U ^"7 c a(± &6o cm (±, ifji$&Axti$g£ESl‘£xn 7y^;v^v

oREm^&^yco ac5^. cm;b-X^^i^g#W(j:XDX7Am##$:m%±X# > bx&&o f cx. xoX7 Am#8y^##&^AC%@x^^^)}:XDX7 Am#j#mrn^i^ #t #J X 6 #f L L» ^ ^ k LX Whole Program Paths(WPP)^|ES ^ tlT V'l ^ [3]c C0f fel£$tMX^t£fr^fz)\' — XtE£$i^(Df5i^%£fz <' C ^St^falO^X 7°D

7T-r v >x&w#^x^^d7m^M%x^*a^^ t)mx&^,o WPP (±27m7^-XX^#^^iXL'6o #-7:c-XkL XoX7At:j:zDX^e ^ft6#[o];^x&bi/ —xf^c^x&^o #x7oi—XT(d:, cmbL—x^6#M e (Xz^X^#^^L^^) ZD7>^7bX#b^f^B m-#lXDX7Am^ek#eLXbi/-xmE#^et>a, DAGX&60 cmB^C id: X n 7" 7 A m ito &BOT7 D —$:7 7/^7 bfronxft < b Ltti7r

a. #[q];iX b V—X^ltE # —X5^ 7 XX (j:, XD X7 A^#^ff L/: ^ ^m;^X ^#6#X j: 7 t:XD X7 A ^#b#7 — b^#&71A^C a$:fj7o cm^^(^^^^6ftXL'^^^[8](:^-7l\XL' a#, ^e^x#:i--7t:m%ijT#6ct 7&:/ix b L-xcm^jm^^^Aox^miE ^^foXW^o WPP XU:/1XXD77^ V >Xm^#^#f^AXV^o U&#^^^XD77^mJ:7^(±#fo|X77mi^ag^f^o C7f6C^X. /^xhiz-xm+b^X^^x^^, bi/-x##^LX 7—iv+h-Y b ^m^X* s7 — U —iz £-DX'Mfr£tifz^X(Dmtemm£tiZ> &olzt£% ai^7X U 7 b^&^o b. AXDT^A/iX^JSS '& oZ(Dtzih\^ SEQUITUR

- 19 - T;vrfU XA[9][10]£IWT:& tU X U XX;W;:eJE£;b[];LT vaQ SEQUITUR (d:X h U >^£Ei7;i/3 'JXAT'feot, A*i:^lT3 >y 3rX h 7 60 ZCOTJlzf V XAld;£ £ ^ V —>XCD^^ C kC ^t>ftT ££&©y&a0 SEQUITUR y;vrf U XAi:olX^> UMm%lMz.T3o < o C(7)y;i/3 V XA& U ny XlGC^X. a &a 6^,/: a SEQUITUR

S -»■ acba c yb 5ii*n^il/r hf aQ ac 6o SEQUITUR \£Ztl%'> >4t; 1/ A T'E^i&X-T S -> AbA A -> ac

c 6Dy;vrf V XA®MA LT Xy y — )l/'V —

S -> BCBA A -> ab B -> Ac C -» Ad izmtz\zis>-tf)i difi'Mtinztirzt'rz t sequiturdTsa®#uu, 6o S -> DD A ->• ab B -> Ac C -> Ad D -> BC cc#Ay. B S -» DD A -> ab D -» AcAd a^ao
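As a hedged illustration of how a SEQUITUR grammar stands in for the original trace, the sketch below expands the final grammar given above (S -> DD, A -> ab, D -> AcAd) back into its terminal string. The names `rule`, `rhs_of`, `expand`, and `expand_start` are hypothetical and introduced only for this example, and the convention that uppercase letters are nonterminals is an assumption. This is not the SEQUITUR compression algorithm itself, only the decompression direction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One production: an uppercase nonterminal and its right-hand side. */
struct rule { char lhs; const char *rhs; };

/* Final grammar from the text: S -> DD, A -> ab, D -> AcAd. */
static const struct rule grammar[] = {
    { 'S', "DD" }, { 'A', "ab" }, { 'D', "AcAd" },
};

/* Right-hand side of nonterminal nt, or NULL if nt is a terminal. */
static const char *rhs_of(char nt) {
    for (size_t i = 0; i < sizeof grammar / sizeof grammar[0]; i++)
        if (grammar[i].lhs == nt) return grammar[i].rhs;
    return NULL;
}

/* Append the expansion of sym to out (current length len, capacity cap).
 * Returns the new length; out stays NUL-terminated. */
static size_t expand(char sym, char *out, size_t len, size_t cap) {
    const char *rhs = rhs_of(sym);
    if (rhs == NULL) {                 /* terminal: copy it through */
        assert(len + 1 < cap);
        out[len++] = sym;
        out[len] = '\0';
        return len;
    }
    for (const char *p = rhs; *p; p++) /* nonterminal: expand each symbol */
        len = expand(*p, out, len, cap);
    return len;
}

/* Convenience wrapper: expand the start symbol S into buf. */
static const char *expand_start(char *buf, size_t cap) {
    assert(cap > 0);
    buf[0] = '\0';
    expand('S', buf, 0, cap);
    return buf;
}
```

Expanding S reproduces the 12-symbol string abcabdabcabd, which the grammar stores as three short rules.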

C0y;i/7'JXA(i^t)^Tiffl^Tfe^ SPECint95 ^>7 7-^7D^7 A®^ &y#±m#-f&a osg.go c^fazGB t:±aM/-x<& 300MB ix -^fP^tiitao^^T^ao xyy —, co^yyy —yxo xy^Tf cD#m(±^ iQOMBT&ao ccoxyy- 0X^x^6t)^a cko c wpp teitogb U&ix

-20- WPP T-tt DAG a$6fflV'T±IB^7 V-&H 1.2.2-1 © J; oeit$t"3o ;l©I2T-DAG ©i*|g| 5y — Ptti^T—©#*&$£'> >*d/ (±|gfi]l?tt S, As D) t&Zo (±IB«T*ii as b N

Cs d) Ttfo-So

Bi.2.2-1 DAG am

DAG ##®Aggy — pay ?-7 — 0 production (£j$) SSL.Tl'5. -Etl 6 li production ©65$ A1 6> t > #11P9U?& 6 « J — P A 6 y — P B ^\©m y yiiSffl A ©6i3 EMM B ltl'5CtSSt,

C© DAG SICint WPP T-|i7-Dy7A©lU!)4f#SS4CSlt5CtAst- t3„ y — PmSlff'SStty j'— h ->>^;v*6^-©y- p^\© dag ±©yixEk ur m$tiZo H 1.2.2-1 e^Lfc»^lillff*B6*t-o

c. A7D y 7

WPP 0 DAG *5)dvy KHOT)^X6H,ott-5e kitP-eS&o diy pAiXkv?©ti: a*#e##©m©@!A-z&6. ch$-t-ae.nTv^tixyD7r^v

-ettrn A©Hff SSSdf y c t tf-n^tefr-Dtco wpp tt^m%mwjp)i-y’mWiMz-Tt)^bg:

fctfT-S-5o *fes -So ww y ctbn'nmmvfrfrz? - m: ai-r-setefr-So $<©#^i±s $>4@»©n-P8 (S»s**^tiiEiia±&*fce> fe WPP T-ttc©diy pyixSE-yit^fctoiCs yixr-Si-S-y-yVl^ t Bf-S8t^:6#A UTVi-Bo $y Md-y^^ttfc-5 3 % P o®v>#®fl/'iX'es bs KHestffSiisvtX/tPs #@enx p ©iEt';t^ v —;> a

>&StivlX©k"6 6ipe& £ o DAG 6 p =7 n—Ztz Z t CioT s P tt±

- 21 - fcBfeodt 7 h y yy^7 6 M-7it, kfttcy yyt7 SttlttiitT v

d. FMS6$ SPECint95 'Of 7-570^7^!; Microsoft tt© ') P —'>3 dOfy —i"<-7 7 uy=7 h. SQL7.0. y-Fyn-fe^yyyynyyA WinWords I: WPP fflfftik LT**$.-5$g$As#e>nTV?,„ SQL TfflfFffitt TPC-C ^>f7-i>7D 77 A&ffiBeUflStffSyfctjtoT'&So y^ 7 7 n 7 y 1- yp 6 S 3 to C Microsoft, tt© Vulcan '7 — yi/7rSKlf^'J' E> PP Path profiler [8] SflJfH Lfco Z tVCSfc h V — 7 A1 6 WPP 6$h!t U, PPCompress Lteo E$g*ti: 7.3-392.8%y$>b, yoyy A©M#7D-##8m#y%

fty byyy^7©%M7ld:, 4f*Jgy^7©ff«P»k ityixCt&otlfi^hSff) ^sasife. *yyiy*m#&%©y^g, t ©ft$&#tfy;7&ft7 fyyyt 7 k LT * 7 > h Lfco < k, $ < ©#Aft 7 hyyyiy(d:€©** 100 #6f#m$yyo37 6, irfc & ft 7 i- y yyt 7 # mt. * 3 k taftu:!:A5t>^oteo ;;fftofe'Of7-?7Df7AciufttS *7 hyyytifco St5t, chJt-4iJ; < SfcftTUS k tfSJi5, y □ yyAtottEli HA©-3Bfl-©d5 tonsil-n-A^Eto T-$>3 kt'-5 k k h^aiTi'^. — SPECint95 © gcc, compress k 2 7ffl@H7Pk7 A (SQL, WinWord)t;H trtt, B7 y tc-o ft, ^© 2.5-3.0/tPo £o k©k kttiSffl77 V y —'> 3 >|6ltL©tt|g i6i±tctt, ft b 3 >yw 7###y*,^k k&ft LTV'E>„

(3) yyvy-7 v 7 pynyy Aiqitj-^'f >7ig#r

yyf-yy l 7 K7n77A&®ybft7k kct#57n-ft#, n >y ft7 ESreHtf-i' >7®f/fSS#f6$SiiTVt5 [4]o vypy 7 L 7 H7Dk7Affl4if 7 o —###%#f)f&(?7@nl:ld:7 V 7 Httt©&y»sk7o 5liST-SIfi1;*ft 3*$;#itf"3i&E* s&vfcto, iEftynyy a klalLj;7lcE7kk7tisyS-E>o LfrL, 7D-##%##&R7#Al:lft, L "CV--E.7 P 7 Ktrftlt^Uff 7 D—LWC, ^ftkifcff LTftff LT t'-5ffe7 P 7 fCi 3ftft >7fflfiSH«©*fb'fe#]*ft^ ft 7 k©re, &71/ 7 Mc#*ftef'^iz tSs ttses^iis* -y-t:flSx67b5 6«S©ftV''«ff6ff7*i4SS$L-CVt^0

■ 22 - a. MOfSit 3*7o 7? At;ft(t3tf 41 u- y H'H5©4?$tf £ c h Tx ?;i/77 V y H7n 7? AS® bj&A 5 J; 5 t;r;i/oU 7A&te?l Itl'S. X >7jSff4l£t;ti: Wilson t Lam t; J; 3 4?£[11]& ©$ $Stfl LTft b x 70 7? AtpO'SESf^T 0 7 —>3 >-fc y h kfftJftSEoll ©JF# t;lESlb L&_h7x 7 D—@#x O > 7 4 7 M&#& 4^S RUlSSf 6ffo T 7 -5 O fllf/fMS k VTlix x=&y, X=y,x X=*y,x *x=y © 4 ©4-f > 7ftA*k >I£?IJ7 V 7 f 4g&? par ftl'5. S7D77A^ve3il$x iSMftStfd' >7©fiSH«6l+ei--E.Ex 3*707? A© (SW7fen«, 7o-7?7±7M7f©,^*7#jmf>7©mmN*©#fr&t kCI+e&tfxHXVo kC6#77l/77 L 7 K7D7? A©@fr«x jfiff LTSIffStl 6#7 V 7 f I; >7©}|^H#©$fb'fe#8 Vfc^r-l+SSfio^SAsfc-Bo ;©feS),; ©#7©f @7IE#7o 77 AAA^fg# UTVStf-T >7©j@^M#&ii>iH?il:j:7Tfaf?. c;t C tt^©7o7?AtS©emiTiigb itor tA-Stfd’ >7 ©fawM#©#^?& b x y£5fc©3*7D77ACj3lt?.lSfif©@-ni:lBl L E/rCff ttiabo&773ffl?*.x I MX v y f t;4 6f #m#x E 1ES6 4 © 7 V 7 f l;ZoT4a$ftt@wlM#©#A&# Lti'So #7 ly y ft; ■fc-ST&tifK I ttx ffi©1-^T©7 V y f t; X o T4SS ft-5 tfd- >7© f#^M#&Ab-&X:A©x ft&bftAx V y f #t;|+W£ftfc E ©toS£-e$>-E.» :ii MX V y f-cffbti5*-f >7©fi7pH«©4fi8x iMt; J;-5>SS$:£,5 Sifts to© $ A77 f£>©7$. b x SS^SftftoX 1/ y f ©#7o 7"? AA?l±#l; cft6©KftM#Aiab VE?Tt'S £>©i: LTiSf/rSffdo tEftttffe©ft^-t©7 >y f &## LHft S *7$toS C k A4ft #7 V y f ©##f©ttol;l±C©7 > y f @#ffll$Pf)Tt; ftoTff ?>ftS E #Agk%Ski>>m*#4L&6tox *fftt I ©%m tfe±71 %*7e7i/ y f ©##&#bmftami#&i;ft?-ceo. ±IB©67o7? A;5/bs»Ef LTVSfti' >7©#^M«l:mt;#7l/ y @^M#&AnftSAkx @7 1/ y f ©*l/f*ff bt?©71/ y fA*4j$ftS}§ /ftM«6Ht$fftS,'5&BI:N;fx !$!$$£*©WffSSk|BH*Tifc So W, 7 L y f As-6-$r-E.SIS *5 t;ftftS C tt#7 > y f©HAftftftS C ©#*£■ &®ofc£)© bt£%o cftlix oft ftfr©7 > y f tSSSt £>ft&fg^Sl®tt&7 > y f #}:*> vxdllST-llffSftft? i: tiESSfaCMMS-tMiftS Lx Oftft»x©7 V y f I 6iiLT-f- 5> t»S> Xeggfrli *&A&7li:+mmA^#E4-7-a»bx f ftt;tby?7;VT-7 V y f 7 V 5 7 -f 7&#->@#§©3 >71

-23- -f^HHgl/, M&ffoTVxSo &43|8SCI4:7 7>7;<-- F*© SUIF 776A&S JELXm^X^io ^>f7-?7D?7iB 18 *t, -£®-9-'l' Xfct 53 fi~4478 ffT-feSo 7d ^7A fft, lu, cholesky, fib, queens, knapsack ti. iZXifc -S o (Wffi#Stt7-D»-7A® SUIF tfHfl3E$CSVT, load 43 store iWt'7i'-k7S n-5 WEtt®$.-5 n^-->a >-tr 7 F ©$614*6 3 b 4: ic 4; o TttaiJ Lti'5. SUIF 6 UBS! $614: load, store C J; 6 ffiS#S43 J; lFE?iJ#Bffl@'6'tC ® » ffcfiKSh,5®T, 6-f >^##®#g&EI^±tt4:gm%t,® ^■ffliBStt, — 77"D7‘5AtC43t''C load, store Jt D 7 7 4zX$4li6BJIbttAsife5 4:£4lfcD7" —->a>-fe7 F©$I4; 1 D**T'6 4 fl T$>b, #^®% 8 SlIlcH VTttiEBtCfce 1 -3®D-y-->3 >-fe7 h &0ffl-tZ>Zt

c®m*&fFmf rn7-7Att*4e?ijstt$n-?.7 u -7 FgGe&#&%#fF OE*AS) k®S*®tt«4:l/'-5 ±1865*6 *® t UTitRfcf&cfclf f. 4lT V'' -?> o Xk^Fr^'f't-hSSSSflJuti'J 1 00/-O67 —7tt^l4: iRAa®im-e#@®g#%f'cammu-tu&o i£4: fflttRiy&S 417431), 434364a 2 (gl-XTCiKSioTU-Bo

C. 3iW >7ft?Wffl(Bffl60: bfflitXfflBW >7##6AI4:%@, >7'7,D'>*i7 F 4: 6 »36?ij-(b 7nyx? F-t-fflfflStiTV^, 4>® 4: b 314:$ fciESlT'n ACISIff C L4bfflV64i % C1$ c ©## @B 6 IB t' 7 7 ;k 6 7 U 7 F7‘n7,5At::43tt3E'a (race)®®to^4tfll6i>a-;iz^E^J#Wto$6->-xL-;P6ft:Bg1--5:F/ET:$.5, d. V 7 F 7 x7I?’V-tU-xfflfEM 7,11/67 V v F 7"n 75 A©H4$iiK7-l4:, 7-—+f C#7 V >7 FBBtc ±©6 5 &4BSfF fB##&634p©##6##63 C j-jiV7 F 7 x7©±jH4|o]±C43f'7MS7'&3o C®6®k:|4:*@®miA:4W >7lSffi 645 5 £ e. 7D75 A$tft/x®fiUH 7D75 Atfi^SSt'-S 7 rq 1 ;V#ff 7 F 7 — 7, 6—7^—77767©65 t Uq ,6>->®*$f'#ffl4:, $$©## 6$ ±4b7|B} byjjS6R-3--3©#fNc6 3 (batching transformations) C. b X, 7 d 7 5 A © ^4564)^6 fe if 3 d 4; i5nJt67 S> 3 <> c©@AC*BI#(:, 4#fi®«t'4fq' >7JSflrl4;eg7$>3<,

[1] […], NEDO-PR-9809.
[2] Sungdo Moon and Mary W. Hall, Evaluation of Predicated Array Data-Flow Analysis for Automatic Parallelization, Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 84-95, 1999.
[3] James R. Larus, Whole Program Paths, Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation (PLDI), pp. 259-269, 1999.
[4] Radu Rugina and Martin Rinard, Pointer Analysis for Multithreaded Programs, Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation (PLDI), pp. 77-90, 1999.
[5] Sungdo Moon, Mary W. Hall, and Brian R. Murphy, Predicated Array Data-Flow Analysis for Run-Time Parallelization, Proceedings of the 1998 ACM International Conference on Supercomputing, pp. 204-211, Melbourne, Australia, July 1998.
[6] Joel H. Saltz, Ravi Mirchandaney, and Kay Crowley, Run-Time Parallelization and Scheduling of Loops, IEEE Transactions on Computers, 40(5):603-612, May 1991.
[7] Lawrence Rauchwerger and David Padua, The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization, Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, pp. 218-232, June 1995.
[8] T. Ball and J. R. Larus, Efficient Path Profiling, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, Paris, France, pp. 46-57, 1996.
[9] C. G. Nevill-Manning and I. H. Witten, Compression and Explanation Using Hierarchical Grammars, The Computer Journal, vol. 40, pp. 103-116, 1997.
[10] C. G. Nevill-Manning and I. H. Witten, Linear-Time, Incremental Hierarchy Inference for Compression, Proceedings of the Data Compression Conference (DCC '97), Snowbird, UT: IEEE Computer Society, pp. 3-11, 1997.
[11] R. Wilson and M. Lam, Efficient Context-Sensitive Pointer Analysis for C Programs, Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, CA, June 1995.
[12] M. Frigo, C. Leiserson, and K. Randall, The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.

-25- 1.2.3 OpenMP 43 OpenMP (Dr-j’ftflazligVfz&ZM

(1) Introduction

OpenMP CoUt. ##. 7D X 5 5 > ?=£ 7=>. ff $6(±#©mM. tt<£. *Atm©lia©t:rA/i:©J:b!R&£'CWL'TI!5 W?Zo *C. OpenMP & A ^ 0 fl-gtSMMIf ©«!;'«)♦ A <««£ If 3 fc »fflll£5l C mLxmwtzttt>iZ' fo?7i> «. o

(2) OpenMP i;(4 a. «S OpenMP ti:. —g-cvo k. rftfrx^e V v/t a -t"d -tz x tt-oifeM it'd X7 i. >? 5> © Fortran ■$> C, C++C directive (JiiSfC) &/JDX-5 £ k C A 0 . Sla©Stt $i£5IL. jfeFUfb UTt'^o Cfl6©!±@©#MIA. ISV (Independent Software Vendor) & 4^'bdCff f)ft. 1997 ^© 10 FI C Fortran © A ©© version 1 API 1998 Ff© 10 DC C/C++HI © API version 1 As9$;6£ flT V 3 „ AA. Ctl5>ffltt Stt—ftT jo 0 . http://www.openmp.org/fr' -5 C k fr*tl}3fe& o XZXs Ctl 6>©l±SI4x 3 > b"i — 9 X — A&31GA V y b 7i768*Ettffl 1 tt© ^TrlftiftfcCT'ttfc <. $< ©Stt©Af@As* :py-/V%^6ilbT?*toTV^/iC k o cftciAWT©A3»#@A^x6ft6o &%fr^*# X t V@7ii/A7Dt vfI-$$@CKAf ^ AfrC% 0. fr-o-t&c&n&tz, AfrCft X^tzo Vfr1 Vs i (D7/if7Df 'V +)-$3WA < <8-5 fc»©3WW 7 "p'V-ZVffifflS fittB$»fr'ofc» itc. »ei+srnf7-7AS,wi$ciifivAfr ks 39

©3— F±H%$@MCf-5 itAC. V —X Xn X'x AC3 WH 7^x©ftSjt$ Afl-E> C k* 5Utt Uilrfcft-So :ct% Rg@»©IA. f ©3>/W 7^©@^A©<±##e%? S & o T1' A <2 k 1?$> 6 o Mili, SGI Power Fortran/C. SUN Impact. KAI/KAP ft klMA. tt#frsSfCoTl'-5o ChT-IA. xfr/AXDt xtlAXAAIg?©###*^ < . ?ct. SWf A^-eWIiC Ex-5 API sasiii ki'7ifi:&-7fctfx.?)n5„ OpenMP k#x g,fta. ctua Stt©^-X©Mlg* 5 Fortran. C/C++tS^:ti)Ati»J. C fflU i k LX tA. j£

?yitn«$±k utEo©aa ^#T*$>ofcC k*$>3#. ##gk UX^6#6CIA. ttTfflCttfliJ, if. ## ftrii+*xDX3AttibetDtj^wttfr 5*!^kAs±ife>n-E> 0 tt it * s At-&-&£». 1/0 CMLt©#%b%g$#AftC'C k. $fc. c©B©xnX3 AT-li. x"nX7 A3 — p A© 5%©a^A^tt©Slfi:RF(a© 95%$c5to& k Bbntixt. jfiFUfb©ECtt. f © 5%© k c dffl^CEAffttfS < . ifi?ij-fb*5SS"C$iofcc kAsASMEbt-S-B,,

- 26 - ofcc £:&a6h UTSx e>ft3[3]0 b. OpenMP CD#E OpenMP &§r Li^B !§"£'&&uQ >/W (directive/pragma )s 7^7^ ^ (C 3: D ^ Bln £ Fortran ^(3:, !$OMPT^^^e(±, OpenMP ft£o Cf #pragma omp <5 fx(3: OpenMP fiTpfx'Trfe £ 0 £fz, OpenMP (3:, yo^7A& 7°D Vyb^^ij

±#a0m^^^##f^(3:, f(D7D

< c V^ & D3m7"D/f7 A&df m^-riEL<#f#f6yD ^A&#^

Ml:, c^U:j:b, b &T(f 6 c c. ^ff drrrOb b T'—^rv' *7 ^ -V OpenMP CD^EfrdrTOl/ld^ -join M'V

#pragma omp parallel
    call foo( );    call foo( );    call foo( );
#pragma omp parallel

Figure 1.2.3-1 OpenMP execution model
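The fork-join behaviour pictured above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the report: `foo` here is a hypothetical stand-in that only counts its own invocations, and `run_region` is a name introduced for the example. When the file is compiled without OpenMP support the pragmas are ignored and the region runs on a single thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the foo() that every thread calls in the figure. */
static void foo(int *calls) {
    #pragma omp atomic
    (*calls)++;
}

/* Fork a team, let every thread call foo() once, then join.
 * Returns the number of calls made, which is one per thread in the team. */
static int run_region(void) {
    int calls = 0;
    #pragma omp parallel shared(calls)
    {
        foo(&calls);
    }   /* implicit barrier: the team joins here and one thread continues */
    assert(calls >= 1);
    return calls;
}
```

The return value equals the team size, so it depends on the runtime settings (for example `OMP_NUM_THREADS`) and is 1 in a serial build.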

-27- cm<^ao ^fyoyyA^e^, f

##, #pragma iZMMt % ts * © jlT CD y D y # £ »J \Z%ff f 3 o C 0 ^J T (d: H E

OpenMP 0T-yyy^-H:OV\yM^^o Ml.2.3-2 Cme&^ftljo

yyvy —ys> y jL—y ♦ /x ♦ * m- ycf i/^7^ y 3 >;W y V 1 75 v

os 0y i/ v F;F—y>

m 1.2.3-2 OpenMP

0 1.2.3-2 — if©T 7'J y —'> 3 >\t, OpenMP 0 tN yyy^ry, &&wa3>/w^&^L"t OpenMP 0^e^yyyyu$:^yo #^e# yy yy v §#w\ c®yyyA 0 os ^mmoy v^y y v F;i/-y > c ^ OpenMP #mne yy yy a. fwtyoi/aoiw #^jyDyy<>ytf;i/kuyw:x OpenMP ixL^t:^^<0^^^&^[i]o ccy (± y- 0 4^ ^ 6 \ MPI(Messege Passing Interface) h B+Sill § £y V y F##& #y#A^(Z)!t#^fTOo 1.2.3-1 Q(d:#Bo $f\ ^“^b''Jf^it MPI, OpenMP t&foZ>t^Z.X& i^o yyvF(±#@ic## #^&^0y, yy-ytfvy^r(±, tfO^yoy y ^ ^©y\ y y-y tf v y ^ Z O ^tk#y(d:^0'o 14 tb Jh fn] (Performance Oriented) ^ 7s — *? ite^U it 4? “ h (Supports Data Parallel)C: HI V T (4 n y 1/ V F (D < c ^ #m#y&ao idi^i/^A/^iWv (y i% v F(4i/^;F^-##<, mpi &y y-t?-y

-28- 1.2.3-1 fWtfOFa©##

mg MPI 7 D v F OpenMP

zF — 7 \L 0 t" 4 O A O x-y-7 b- ’jT-'f O O O o o x — 7 36 ^0 if tF— F o o mm%^36^yib o gdMD^;i/©#$ o W<77 — F ^©3# o

^y^mm^gamoTu^ 3;oi:%

"C#6o

O) itmtruy^&m

OpenMP 7#($ (Dmm%Mw tz>o a. Parallel Region Parallel Region (3:, ##&©%DvFl:3:c-C36^yi:^e^3T,^a5^&#^f^^^l: &t>tl%o Parallel #^Cl: 3; o "Cft-5 o Id) D parallel region V V F § team tl%0 ID (±0 0, 7 7 7 7 D y F© ID ft 0T&&O ID e^(DXDvF^giJ(Z)a5^$:T ^33, iD © omp_get_thread_num( )T'Sf#"3" 'b Z Z h %7: ^ %> o 36 ^>J C SE fi ll" 'b 7 1/ y F © !$( (i, ^ If # 7 f 7' i? 0 ###[ omp_set_num_threads(n) £ GEoT, ^1 © H ^ ^ ^ ^ 8% OMP_NUM_THREADS 0 &3b\ parallel region&ai £ H5 £ 0 0 i.2.3-3 i:yo^7 Aen&^f [2]o 0 1.2.3-3 It, 1000 to tltzW>&fc, ZtUDmffi%>k®%7U Z Z~a, #pragma omp parallel ftMtl&t., ^©T © 7" D y ^7 ^ 36 ^>J I: #Eff cFft^>o C ©

#pragma omp parallel
{
    int c,b,e,I,ss;
    c=1000/omp_get_num_threads( );    (1)
    b=c*omp_get_thread_num( );        (2)
    e=b+c;                            (3)
    ss=0;                             (4)
    for (I=b; I

m 1.2.3-3 ( 1 )
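Because the listing above is cut off in the source, the following is a hedged, self-contained sketch of the same idea: each thread derives its own block of the 1000 elements from its thread ID, accumulates a private partial sum, and adds it to the shared total with `atomic`. The array contents, the names `block_sum` and `demo_block_sum`, the rounding-up of the block size, and the serial fallbacks for the two `omp_` functions are assumptions made for illustration; they are not taken from the original program.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

#define N 1000

/* Sum a[0..N-1]; each thread adds up one contiguous block. */
static long block_sum(const int *a) {
    long s = 0;
    #pragma omp parallel shared(s)
    {
        int nth = omp_get_num_threads();
        int tid = omp_get_thread_num();
        int c = (N + nth - 1) / nth;        /* block size, rounded up */
        int b = c * tid;                    /* first index of this block */
        int e = (b + c < N) ? b + c : N;    /* one past the last index */
        long ss = 0;                        /* thread-private partial sum */
        for (int i = b; i < e; i++)
            ss += a[i];
        #pragma omp atomic
        s += ss;                            /* combine partial sums safely */
    }
    return s;
}

/* Usage example: sum the integers 1..N. */
static long demo_block_sum(void) {
    int a[N];
    for (int i = 0; i < N; i++)
        a[i] = i + 1;
    long total = block_sum(a);
    assert(total > 0);
    return total;
}
```

Rounding the block size up, rather than dividing exactly as in line (1) of the listing, keeps the result correct when 1000 is not a multiple of the thread count.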

f©#, ss for (?) -e, e#i/v ^mscML^Av-cm^m^^af^o #7^# (?) i^i^cmm© #l/'7 atomic C

b. Work Sharing ## Work sharing ###, parallel region f^]©^^® team C 43T, # ^ © X 1/ V F t)s fttUTZ 7°D ?'■? A oft^h^JliAEit' ^ fc <& IZ fill ^ % o For ##, section ##, single ## For ###, ;i/-7#B©^^l/-i>3>©#^l/V F/\©##!l7]&&m/E T % i) (DT'#) £ o V>£) $> % x — 9 ^^Jlf W-'fe'iJ O B^F ^i$ifc>tl % o Section ### section ##^#^^4%X:e#D'7^^#l/V #^JC^eT^6©T^^o For # #yb^-^^^#!l1-6 C ^iZck D section ###^^ m^Jl:^e^i±^)Ck&B8 gkLTl^o Single ###, -o©#l/vF©

(1) for directive
The for directive divides the iterations of a loop and assigns them to the threads of the team. Its syntax is as follows.

#pragma omp for [clause...]
for (var=lb ; var logical-op ub; incr-expr )
    body

Figure 1.2.3-4 for directive
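A small example may make the for directive syntax above concrete. The sketch below, written under the assumption that the loop iterations are independent, shares a scaled vector addition among the threads of a parallel region. The function names `saxpy` and `saxpy_demo` and the sample data are hypothetical.

```c
#include <assert.h>

/* y[i] = a * x[i] + y[i]; iterations are independent, so they can be shared. */
static void saxpy(int n, float a, const float *x, float *y) {
    assert(n >= 0);
    #pragma omp parallel
    {
        int i;   /* the loop variable of an omp for loop is private */
        #pragma omp for
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}

/* Usage example: returns element k of the updated vector. */
static float saxpy_demo(int k) {
    float x[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float y[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    assert(k >= 0 && k < 4);
    saxpy(4, 2.0f, x, y);
    return y[k];
}
```

The loop has the canonical shape required by the directive: a single integer control variable, a relational test, and a simple increment, with no `break` out of the body.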

-30- $zlWL var (d:, — o Logical-op T'tiU < <=, > >=0 4 ^T'^o ^ Tz , incr-expr <£tWL var £: }E$f d~ ^ #3, C 0 for ;i/“7°0^^©;l^/jB bs break £ ftTU&U C h ^T'$> 3 <, Mt: cyauae-e^, mT&cfi&Bmfao X ^ i/ n. — U > 9 0fitAiifd:, schedule(kind [,chunk_size]) T'ffOo s chunk_size (±, Kind (:(±, static, dynamic, guided, runtime 0@Si Z> 0 Static (d: chunk_size 0 d" ^ D — S/ 3 > &# 6\G > P □ tf > X! $ ^ G; X b \v MC B!j D G" (j" £ 73 ^ X:' yW ;vt"^jl'n 07°D-tr V tA0jl/-— o b UTiiS^ilfcSOl:' &ofc 0 Runtime (±, OMP_SCHEDULE tlZ>mtlZ$Lo o 0 1.2.3-5 (CC-e(d:%DvMm^4aUTl^)o

n

schedule(static,n)

schedule(static)

schedule(dynamic,n)

schedule(guided,n)

0 i.2.3-5 7,>r¥?--') >vm

0 1.2.3-5 0##(d:, -f *? ]y '— i/ a > £ m LX £ o M#C0l§ n'JdU chunk_size T ^ cF ~^l£f d" %>o Schedule(static,n)'X: (d\ n ft(D chunk H IZ Hi HSU bttttZo Dynamic Xlid:, flltt73 &^7t It t' & dfc £M0H:b o £ X D y E D 0 chunk £: 5§: (d"EX £ ^M-ofe^^T'lr^tb^o Guided X! (d:, 51 D 0 d* ^ D —

C^flX:, chunk iy'd'X^d;bT'O/J\^<%^l:e <0
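For the static case with an explicit chunk size, the assignment of iterations to threads is fully determined and can be written down directly. The sketch below computes which thread executes a given iteration under `schedule(static, chunk)`, where chunks are dealt out to threads round-robin in thread-number order. The helper name `static_owner` is hypothetical; the dynamic and guided kinds are not modelled because their assignment depends on run-time timing.

```c
#include <assert.h>

/* Thread that executes iteration i (0-based) under schedule(static, chunk):
 * chunks of `chunk` consecutive iterations are dealt out round-robin. */
static int static_owner(int i, int chunk, int nthreads) {
    assert(i >= 0 && chunk > 0 && nthreads > 0);
    return (i / chunk) % nthreads;
}
```

With four threads and a chunk size of 2, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, 6-7 to thread 3, and iteration 8 wraps around to thread 0.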

(D section #dC section #dC(±, section tltc? U y # )/ y KXfM^Jfc^fr't 3 o 0 1.2.3-6 X!(d:, sectiol k sectio2 0 ~fU y ^^£50 cF tl%> o

#pragma omp sections
{
    #pragma omp section
    { section1 }
    #pragma omp section
    { section2 }
}

Figure 1.2.3-6 section
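The skeleton above can be filled in as a runnable sketch. The two work functions `section1` and `section2` are illustrative placeholders, as is the wrapper `run_sections`; the only point is that the two blocks are independent and may be executed by different threads.

```c
#include <assert.h>

/* Two independent pieces of work; the bodies are illustrative only. */
static int section1(void) { return 1 + 2 + 3; }
static int section2(void) { return 4 * 5; }

/* Run both sections, possibly on different threads, and add the results. */
static int run_sections(void) {
    int r1 = 0, r2 = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        { r1 = section1(); }
        #pragma omp section
        { r2 = section2(); }
    }   /* implicit barrier: both results are available here */
    assert(r1 != 0 && r2 != 0);   /* both sections must have run */
    return r1 + r2;
}
```

Each section writes a different variable, so no synchronization beyond the implicit barrier at the end of the construct is needed.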

(E> single #^C single 1 ^ 1/ y P mto 111 1.2.3-7 £ S £ £ ^ 1~ o

#pragma omp single
{
    statements
}

M 1.2.3-7 single c. mm, mMfflfflvmtt * i/ y oz>o

® Barrier #7^^C Parallel region l46D##CD% 1/ V &4on Work sharing '0//ci§n\ parallel region £ /±5 £ (±

#pragma omp barrier

Figure 1.2.3-8 Barrier

® Atomic #^C ^ t V CDM#f & atomic o fztf)(D^3C.'V;&>Z> o

#pragma omp atomic
    statement

Figure 1.2.3-9 Atomic directive
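As a hedged usage sketch of the atomic directive, the function below counts the even values in an array while several threads update one shared counter. The names `count_even` and `count_even_demo` and the sample data are assumptions made for this example.

```c
#include <assert.h>

/* Count the even values in v[0..n-1]; the shared counter is updated atomically. */
static int count_even(const int *v, int n) {
    int count = 0;
    assert(n >= 0);
    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < n; i++) {
            if (v[i] % 2 == 0) {
                #pragma omp atomic
                count++;          /* one indivisible read-modify-write */
            }
        }
    }
    return count;
}

/* Usage example with a fixed array containing four even values. */
static int count_even_demo(void) {
    int v[] = { 1, 2, 3, 4, 6, 7, 8 };
    return count_even(v, 7);
}
```

Without the directive, two threads could read the same old value of `count` and one increment would be lost. The directive applies only to the single update statement that follows it.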

- 32 - © Critical #tC V "J 60 #*$0 1.2.3-10 (C^-f „ Critical section &itu&

0 1.2.3-10 Critical #tC d. OpenMP ©3©E U OpenMP ©tt#3(t 0 17^71/It weak consistency X $> & o U fz A5 o T, parallel region ©#%#, volatile $gt©E», ^uTimm, flush mmtc©#fri:te u©-me&e

e. ^fflffe It § fcflttt Kl^td n Orphan 5s -f U ^ rr 4 T, master #tC, ordered # &k’®#l*, ttB©8l%t>$> D, LTIt© c-Ctt^BSf-So

(4) 1418

OpenMP *fl 6©3 WW 7 It, *H© V 7 h •> 1T ^ > 9X & £ KAI, PCI &%©, SGI, SUN, Compaq, IBM & k*/\-F ^ > 9 % IB38 £ fr o X t' & „ CCX It, ®ASH © RWC -e^*E© Omni OpenMP 3 >7W 7 ©MS3 „ RWC Omni OpenMP □tll-ftli, C k Fortran K&tttl- F L, Solaris5.6, Linux2.2.5 % k UnixOS mtMTXmat 5 o :®3WU7t, NPB1 CG, BT, SP(ClassA)&3 >7W 7V L, SUN S1000(8 CPU)tl!jff k ^ 5, 7"n b -y+ 1E 8 © mfrc, t=}»»SAs 3.8-5.4 ggai5 H k #*!?,£ tlfco Chtt3>/H5rli;'f t T6Eotc»Mk VCIt, tfc«65li»*gstdkSx.?>o *fe, SolarisOS isSftt5X V y PSftot/D t7i>S IBif Ufc®-&klt8 t UT &, Solaris XUtP ©®ir kl$IEPlS©ttlgAMI e. n^> C kzbsfl-6^-3 tc OpenMP It5fclciz|y<;fc t d&IS* & OS fflt l/ 7 K Cit® UTffo T © £ fc®, Rt,My^;i/fa 3i4T-t-5^ kit, 3—H© k#x.6o

(5)

OpenMP &#amn:a'%u6yt v<#m$ It -5 fc ® © f± H © K 31 t -31 ^ T Ml »j f -5 [4] „ OpenMP ltt±#7

- 33 - t*&£o vuvn«\ F^^tm&ibbtz

< Mt> £ £ o ##:% U\ for 0 schedule #7p@ii'^ll/^ 7°#^&^}#!l Ltzftfte, ^Wl^tifzy'—^^^Mt^y v y Yi^Moyu -b 7tM:#!l Otttfbtlfc 0 1"

yLTZ'tt, frWC**: V y — OpenMP (DttffilZ'Ol'iZfflffitZo D > &IZM& btlfc$)(DZ&ftl'o

(6) OpenMP (D&mttWi a. Processor group Processor group U\ (DZ$)% o HI 1.2.3-11 temt (IMTTId: Fortran l^)o

!$OMP PROCESSOR p(n)

Figure 1.2.3-11 Processor Group

HI 1.2.3-11 T\ p }£7°n-b y+f7";i/ — 7r —7° p cmf^m^ay^T-D-hv^cDmm-e&^o i6#co CPU T?#J5%2ftT^:5&WdC -cO^m^D-bv^CW:, ccT, ^yu ~\Z yy-(D%t%5HZ'&Tte7PV z %o £fz, 7°D-b y th$(©3&

b. Index distribution Index distribution IdL IS^U0 4 ^ 7 7 7 §fs/R 1“ £ fe £ 0 HI 1.2.3-12 777 o HI 1.2.3-12 -CU\ mf, P Index distribution IX ^WTO ^777^1^^ 100 (INDEX=l:100)o #S!l (PART) fiN BLOCK ^|!j* (PART=BL0CK)o BLOCK ^#!j 1 100 £Z%7°t]-t yW)l— 7°$ref!jo fct££(j\ 7°D-b 7 7-7> — ccoen'T^d'^fv^xcDi^^ 25 n-b 71L701/ — 7° 1 13SO 0 26 ^ ^ 50 ^7:^: 7D-b v^70P —7° 2
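The BLOCK partitioning described above (index range 1:100, with indices 1 to 25 going to processor group 1 and 26 to 50 to group 2) can be sketched as a small helper. The sketch assumes four processor groups, 1-based group numbers, and a block length rounded up when the range does not divide evenly; the names `range` and `block_range` are hypothetical and not part of the proposed directive.

```c
#include <assert.h>

/* Inclusive index range owned by one processor group. */
struct range { int first, last; };

/* Range owned by group p (1-based) when lo..hi is split into nproc
 * contiguous blocks. An empty block is reported as first > last. */
static struct range block_range(int lo, int hi, int nproc, int p) {
    struct range r;
    int n, size;
    assert(lo <= hi && nproc > 0 && p >= 1 && p <= nproc);
    n = hi - lo + 1;
    size = (n + nproc - 1) / nproc;      /* block length, rounded up */
    r.first = lo + (p - 1) * size;
    r.last  = r.first + size - 1;
    if (r.last > hi) r.last = hi;        /* the last block may be shorter */
    return r;
}
```

For INDEX=1:100 and four groups this gives 1-25, 26-50, 51-75, and 76-100.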

1.2.4 Dynamic Compilation

(1) Introduction

bSi-5 C k A^ sj|gt> L < !2Bi6T, SH$tcro ^7 A&SIfr UT t©-Cf WW L-7a > ft»!2, S v-->a >®E5Sk LT, #*7b 7 7*5> htek"®S6fffli6tilf66m'fc*jSlb mAM#®#&"P, b )v-7tek'®iWSI898fi :7 7t#«6fflV'fc$*7*B y 7|gl:#6l$ ^777i- V 3 >7 2 >->a XrWHteSIffSHlIfgk UT±IBtt^tc*#^*60»s#x 6n-E>, ;W*U 73 75 A®## 1: tC k AT 66ALto# #1*®?*#) 6ftfc7B 75 5 >78lg®V —73 — b 6/tfSk LT-;6®$to6ilul6k L AI#»r$fltext2A VE*®3 >72 5 k J±7, ffS®72* V 7375A®#%te##r 12, *#73 75 A®@6Tmu-m#i0 Ale, t67teHS$6 kt)te5„ 0!xtf, im#7 *>7^4)0^17^7 b>7k LTa»£ffllA37B75Akl77 (Sfflttl2k*A< k LT) @%teW&#x-C*^k%6Ate2 5le, 111(0727 V 7V 7 5 A le 6^ T12, 3 > h b—;V7 3 —75 7f 6B69leiEieA-358*roi,:tl;;tt*3 C kl2-7 > — -> a >te btlil't 100%®#%te5 8lttkSlfiyjj*k6aVz:$-ti'-5c k&H®ie L*CSfc„ #*®73* v^mRag-fbR#?^, >6?ijJM!®SAt> Estofc, 7-+f i'f+ ±08 t'WofcaSt'SA* D < -5 t#i 6ft-S A, *® k tteHHtotefS® k UTS < ® LAAoT <6®#, ih.2 Tie* §612 ?>ht?ft8#7Dt'?tfflfflI Ate V 7 b ^xTllSui'CjStatctJb kte < A7 6 3 o jii*® V 7 b 7xi7*$»sk"n7d(7abt;®EIBlbttS&5:lBL-CVx5 AI2, %0 Y2K P4@At7 K%A%*2 DM%$nT$T*b, V' < -3A®j@M#l:b77 73* 7»7-A7 7 A *®7 A —7teSffteS*6#6# LT Sfco UAL, :r i. zl > --> a >&Sj)*A < frtel' +fl-tel$|g&fe-5c k(2Stor®u<, $fc, 7-77* 5*5t®Sffltek*®7n-feyf fflltfllbleff oT, 7 3* y7®S^)#V'6jEBA3>cb*6 (Iixb-> a >t* 3 C k 12* t* *2* HitkteoT 1)»3 >7 2 >->a >1C2oT72* U 73 75 A0tl^telPtif 6 RISE k L, HffB5 77 b 7i7->7*AAiWSte3- F*j|fSfjte7Ctl:2otch()0MSMt 6CkAHI#k#x6b^. C® A5teW#A6, K#m%73*7»k72*VS## ^rt^'oteSrlET —A57**S:^^2~-?ite61^El6^3 >7 2 1/ — 7 a 7 & 1* 2. W ^ lb |S] C o l> T ® m * £ 'n te 0 tz o "7 Hlli4§ff teo te®!2, IBM Thomas J. Watson Research Center Tfftebil-T §

-39- tz DAISY DAISY (Dynamically Architected Instruction Set from Yorktown) (iN 3 t v it (C b A LT PowerPC ihcday Y- V t°d ^y y —jpy y Y-v±cD^### A#%3 Y D-^/3 b ^o:y^#(:icTs >^f>yyAY7-% il/3 — b& A'CDSttoTlEB&yD-t: yitx ;xl/-'>3 > CD E >i< $ tl ^ y ,Dy ,Y'A&&'a ^$m<^<3_D-i/3>^ef6CA^B#ALX: VLIW yotvit^i^^ fr^^:#v7 c^yo^o:^ bcg#f ^ S&8*to&yY 7s>r T/bWI^^tiru^Siy:y$>^y:E[i]^-ou-riiSLtco C0# YcT^Bf^^^Y > b(i : 1. at)^o^^^^cD#ijm^#mD^ vLiw n- 2. t-u7D/f7A&#maL/=3>bD — ;i/7 d -^7 7©^^^M©x b LyA

3. yvitY%^mD^^^^mET^^AcDyot Jt Ltc VLIW3- p/\CD^#&

Linux i A0H^f i: btf Linus Toravards tby>A — A LTtDt)oT^£ C ATr&ftl^n^ifcH Transmeta *±(i, E^fY^OSSfaS^'O

2000 ^ 1 D 19 B /:o ^It©vY £ nyD-feyit Crusoe Intel % CD 4: AY iV Pentium III A (i(^B)^CD##b&^(^tCD 1 (D'MWMXlX'MMLfz^ AY ib^tsroKt 7°n -fe v it*T? D s VLIW #h%CD 7° 3 iz '7 it A Code Morphing Software x86 A0# tibXW^tZ^ Y t 'J LTDI)o Crusoe CD Code Morphing Software (iilBJS &IW3 Y V — A a > (c i D Y Y- V AY Y“ V 7%M'&%M\jfz%

L^Tv^ccD##m;^y3t ^it^# &IE L < fMt*£ tz&>(D^$mlmWi A LT Ss ^K#^ltl#(iY;A(c%ito & © A®t>ii£o

(2) a. ##A## vLiw(i#8#cD#mfiA <>Dv^mMcDmm, —A-y ^ Y"^CD%#A;Y Y- V 3— b&^ffT^&A A ^,o LT7D Y >/=Yi? y :Ll/-f Jp##a-7iY ^/?b3—g^tg^^y^V^itA&ajYjf^yD^^As 7D7 8a#^3-bs by b^y ALT#mf 6 3-b, ^^$

-40- iSliSIGBe36?iJ-fb©Xitoettffs L4V;3<, £ e>fe. 7'i)-V-'{?.Mft

b. 3 >/H l/ — '> 3 > YAzi i; XU

31 ’ ZL v — h Sn^SVT —^r-r » 5^^ 31 5 jl V— b f 6 VLIW Sv-f ^7 > b T-l-r'??-* tlZTZo

Figure 1.2.4-1 Example of conversion from PowerPC code to VLIW tree instructions (from [1])

- 41 - =>rV "J 5 a + < ILP ffltttbt) —* VLIW 3 >/W ILP fctoc*$&3t" — H'4ff-5o DAISY 0 g#U±3 >/W 4/©^ —v LV#R§##&)# fc ILP ^ C i;Cfc-5o DAISY li)$^&7C0/W f- M3— P?0mmmc#$ LT7M ; =r 4 71Z%& U VILW ©*K©X D -y h ClRSCjlgiD LTU < o fctzL*. 7') l =r 4 T’^UIh H© VLIW y j.-iVnJf66a^r(iT-S5E b ¥00^^y a.-;v6l$^-5= c©#^-, ft ^0lS$ii-aetot: (-x-xy-^7 L^^ + *e,^ais©) v^-Avyx^icesajL, **0l|i^e*tCttIE LV' V yx < o Cfttctot, %

#nmt&Zo ttz. 7L'-j'6s$i-3T3--y->ycx-ri>^-;vsn T IS ot tiPISliS tftl'o %-e^ 7"ny7A©HffA sEI$CilJilt"-S$T-ttT —

c. l "ji/'Jtj. M * —y U >^©fc0©SIE VLIW tligiWifr^D- H©fl-lK/X iTSiifeilfi^X'rya-'I ><>'&» #- VLIW ©L vX h /WiJD£*V a >

Figure 1.2.4-2 VLIW address space layout (panels: VLIW virtual address space, VLIW real address space) (from [1])
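The layout places translated code in a region that starts at VLIW_BASE, and this section describes a fixed mapping of the form n × N + VLIW_BASE from a base-architecture address n to the address of its VLIW translation. The sketch below illustrates that arithmetic only. The values N = 4 and VLIW_BASE = 0x80000000 are taken as illustrative, and the 32-bit address width and the names `N_EXPANSION` and `vliw_address` are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: code expansion factor N and start of the translated-code area. */
#define N_EXPANSION 4u
#define VLIW_BASE   0x80000000u

/* Address of the VLIW translation for base-architecture address n. */
static uint32_t vliw_address(uint32_t n) {
    /* n * N must fit in the space available above VLIW_BASE. */
    assert(n <= (UINT32_MAX - VLIW_BASE) / N_EXPANSION);
    return n * N_EXPANSION + VLIW_BASE;
}
```

Because the mapping is a fixed function of n, the translation for a branch target can be located without a lookup table: under these values, base address 0x1000 maps to 0x80004000.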

- 42 - ©□- IT t)#W&5l§$5c£-f ^©A-fe y Y L, 9 91$* y h £ft 7c VXXX&SStoT&l' (3 5 y h LTV3) at'XU--> 3 Lfe k £ izfflebx W^As*4t'-5o l%#lcx D—K&Xhrj:b 6 ± !:##$-& ^,t*C x #V0gj%7:X hrtfflfflfcx-f ctie-tioT, ®i'?w7 D-fe y +7 3 >yXf>'>?« >J r y 9 I/O 4 ©®«S D - h* 4, HS£ ft, % „ «XI-X X'tir—+rX xr vfb$tiT vx^v vxxx tx—xtf^v—r -r >9999 a* 6ti^ BUSTfeb, X V+7kXM^a#©AA47Ts 3>XXX MabSSxtotC-tr-yUfcb u

d. 9 k 7 K U X V y L > X"## VLIW (?-f^7>l7- + T>ft) tt 3 oo-feX X-a > tcfl-SI $ n7cEIEX V S |g.r h* Lxo*e.®*^fi^t?y x£7uvliw fi*jr h v 9 =vliw nr h v- x t- afc b x ^-xr-+rrxy + ©Mr H v x k 1=] - t- *^>o s®, vliw u r h Lx^fSotheSfl-a, (i)i®T>>t-^(VMM) & ffi }$ f 4 Mb <7* ffi L * IS f!«x (2)'VMM t$&B k t -5 X- X «IS £ f -5 Sc 7+ * * & «x (3)##&XT V#a, * 6 & b x ❖ « b |BJ- v y a > XT v y y£ fox VLIWEffi=VLIW jg TfcJ.S3®, S»£7l7c3- h*St&lrt1-^««T* b, VLIW_BASE knf liix-SHt* 2 ortiHr K lx (Mxli oxsooooooo) 7>> e> ® $ -s, „ -x-xvy- >© tiax-e b ©^nen©^--x (=vliw e®xt vmfiesfl-) rkix n ig*§;6x -'X&'$#3- h*^jgtCfflS1"^o PowerPCs S/39CL x86 ©*£■ N=4 A^tOT&^o dte^^SifTfif©^—xr—+-r xr + ©tiar k lx n ©$$s nxn+vliw_base kvx VLIWmmr Klxk13.» e. 'X— -XS&3 — h*©4j$ 'X-XT —+rr Xy X Tfgff ^t+TV'S Xn X9 A##ar K UX# n ©XD XX ATtts VLIW fiiffir h* l/X nXN +VLIW_BASE xffl^gkft^o C©^ -xr-^erxr^nru^ttna rvuw^^sxj Is VMM fctx> h ijtf-f > t'frf.SljanJtg&^-XF'jW^fi^fflSj&n-h*S4fi6LT ^?>sy ba*ti7cdesff*ff ts 4>giifl-ecnKij] 1-^, S»r'-X kx - h*©?g$s PC k© 6 a #®3- htts x-77-Jfr7f-+®4 T0LXXX (XnXX AAXXX-^ii^ly X 99^-sts) t)K XnX3A»x-xr-*rxrx±T-SlffSn7c@6'k|BjLfil6®lc (Sf5t"3 kv>-5 VHlc JaoTISStotcfttiti-Bo gaat$3-h*6iEL

-43- f. VLIW 3- h ;^ > taa-s c yi-rssi', 9->f"j h VLIW »s*a^x> h IJ k UTV-X Silt ia &A>o £«£■«: rftEJixv h u^j F©B*ft*5fr*fcix-2>o c©#*td: vliw 7T+)--f hvv^vr (ITLB) IFSfflvTISBItcfi&do L%x >x®^7 4?-e V, X> V IJ ntl'iSLl^CB, frt> b l: VLIW 3— H^fjSUV — ^>© r K V XSSglTJ) < 5>h,-Bo g. xr-=Vx^^ve©«^1-ilHf%^©^$ VLIW 3- a kx VMM liia»fflIlt«:of;^-Xr-^ x7^-v*^kx ^©^^&Efft"-6ilm)ffl vyx j'tu; W "E V ©ttffi&fSsS L&ttft «& 5>&l/> = C©fe©C*X#'£3»A't3o VPA ti:* U Vi-IV3- P*03a4^-y©T P VX6«#e Vx $#%#3- -sfcW^MSf^n^o l*IT-©*X-b x P £ VLIW #^©7 -f —IV P%V< Lx —7"IV#:##f & C ttJ.5. VLIW ©Htf-fev>7i'f XX* sX7-*s^(ii$nfe«^C VLIW &vt)©i:f3 kx $1* x ;f 7-b x P Smvr^<—X3 — ptfc-k’c* ^Kltntf 6v»» $i$gijVx Cx si-3*0^-xir^s ISKSIfi UTx #!ia^©S?6#irkf ©m#©t/^X^ kX t V©K#&#^f

Figure 1.2.4-3 Finding the base architecture instruction responsible for an exception (from [1])

& ?-o<7X ^!J&7 ^ -;v b^T--7>£fflV& V737£"CkU VLIW i^(D7>-7B b U4W > b #\ VMM kL IWJMBh&ofc ji> b V TtW >

- 44 - t%o b v ;tw > b izmMLtz^, ftfot b vx^st* ^)o VLIW 3— K4:T(Z)7 —bT(±. ^ —73—pf#'r#ftA/#ll#/;%byk#JGbfa#'%\ — o t 7 ^ -5 o h. RS/6000 Vy>4Vl/3>/W 1/ —S/s L. RS/6000 Jt© VLIW i/Ul/-^J3? SPECint95, W < 0^60 AIX 3-^ U x >f N b^T3%(D#$yuy7A (7^ >7^-- ^#^^>^7-7(7)- 3) &^#7o//7A&meL!:###?#M&e&o&o

m 1.2.4-1 PowerPC &6 VLIW s\ A 3- PM^^D (^#[1] j: D)

Pm###:PowerPC ha Average Size of per VUW Translated Page compress 1% m gee 24K BO 2.4 ------""I’?...... "...iOK mMWm TO perl 2.4 i^ IK ■ vortex MEAN... zi"

Table 1: Pathlength reductions and code explosion moving from PowerPC to VLIW.

#1.2.4-11*. 800S5ffY'> I/T-SPECintSd'O^T —ColATiSfiKSilfc/l;*#© UTH3, cn6.©S*ttSPECint95#B9X*S:ttS:fl3Ui:^>^T-^ 6# fiVTff6h.fc*©T-$)t). IS$k UT, 5000f*l*±fflPowerPC:t^l' —a >, %1\ U*tj 2OOOfi|VLIWfti0 A^Htr $ AT V 5 » eft It MS** y Z/j./i%#8 im©TR#2.5 knoiii-Slti'J. (/t^ftfflSkit. RS/eoooUff h v-^t©*^v-->3 >©»&, VLIWUff h V-Xt©VLIWi6^SiT8llofc6ffllClFLVi) vlXfiffl#lt. rnk/^A©, *®*^ y v^<;i/5i?ijg©#g;to&i!Sk !,& ¥ c i: * ST- $ 3 „ DAISYffltiEltfrft bffitA$8 |3X p-C-ilSK 1:$?,= %*©*#?!*, 1 o® PowerPC 0$&3 a®C¥94315 RS/6OOO00*^*^o $ < ®a#^tl6#a-(b&# miotzi. DfiKW4#w vliw 3 >vw ?-e it* 100,00000*^*^* 1, DAISY©|#ff tt(bfSTIt20XW.I*l C kklotl'-So t »^l:. gcc3 >/W ?lt 1 c© v>>^^S4)ilt5feS)l:, ¥965,000 RS/eOGOA^SUffT^o @4©%#lt, # tt^^eiSeglLfcE^T-D h j?Y7-|:f g^V'a -?-3--->yb, E8t* sfiK» v&ne^. tffllilictot, P6**gt:8!lMT$3 kWl^LTV'-Eio *t5S"-D

- 45 - PSJgjbnii 4fgT*&£o MftnJffi? t-( )10 AT0^-y&^3<Lmo#l% VLIW

|g 1.2.4-4kL VLIW^S/^

4 N 2 o ts ALU ^ V — '> a >2oAUt U ^ ^ L — '>3 > 5> ^ 1 o/z U\ L U d

y v <7^4 7"^vy yusturtd: 2Iu^cdilp 4i 24 6D7W :l> FV'>>tit fgrep 7? ILP # 5 iS < £ ~e fa J: t~ £ o

je»Nrcyel» jWMwmwcwigwww*; :#&** - 9 ALU's -«Mem Am -# Branches 10:24-16-8-7 6: 8 -B-4-3 9 :16-16*7 *: 6-6-3-3 8 :12-12-8-7 3: 4- 4-4-8 7: 8 -8 -8-7 2: 4* 4-2-2 6: 8 -84-7 1: 4-2-2-1

Figure 4: Pathlength reductions for Different Machine Configurations

0 1.2.4-4 (X»[l]>©8tS:tt- IBM © VM >y->XT-A6kT-, ^CjgoTtttotlT STU-So C n$-e©EfflV ^ >tt, Sfcl©7 — (S/370 ±T‘© S/360, 486 ±© 8086 &k") Il7-J/r7f 7c *K DAISY (4 VLIW±C#SICS*-6®SV'>>T-jr^^^7S-9-4:- It4. X- 5 CL U-->a >***-(b-r^7c»C4r4- ^>7x S. cl 1/ - f U £, tl% „ f hf n©*^* sflgijc$#sn, BMii©7cto(;'S»*ss* s++7->iu» u, c© 77D-mi, a#dh7cV*-yj >ytt^nJ|gT-$. D,

- 46 - VLIW3>;Wl/ -'>3 >#&«£ D 6# U^#Atj3 >/U U -9 s >y;i/=f VXA&#^LX:o —Conte ^ Sathaye 0f±$Cd'97 ;W7$ftTV^o VLIW 7 9 9 R8 T

#^^-7 y i/:z- <7#® vliw 9 y — h 9 jl7C L^L, ##^;\-P9 a:7&^^^L. >/W 6 C ^ FX!32(Z)j:o^#^W#t9 ^-;i/(D#8 y^^#^##^ftTV^o L^L, ^^9 —7497977 A, 7/^v^, 7/W7P9 d';^#AT, * 1^7 — 777 79 C#f& 100%#5##&3#J5%Ta tl'5 Ltlil^^^o CC^m^L^7Y7470$:^At)i±^, ## teSj^iW^Ltettt3fr Li^#0££1#h£1~£ £:S9ML£o j. IpH w t)tl^tl{i VLIW ^f&OSr LU7 —77 7 79 £, ^“^7-^rf^ft©fcfe©® #V7 b 7^7a^ACa#C7^Xz&b(D7 7D—7-C$)^ DAISYCo^T%IMZ:o#7 077D—7(j:, #^CD^^9 —74>7977 A^, #^0^—77—7777 9&, #-0;\—^7^7797^^^^"^^^ r^#0^"—79977Aj

O) ^m&momm

Transmeta #0 Crusoe C^m

(4) f

HotChipsll tlfz COMPAQ (IE DEC) C 7 & "Wiggins/Redstone: An On-line Program Specialized [2 ]W\ Alpha JlT'ifrfb'f' & x86 x^ j. l/-7 i:

- 47 - FX!32 XOft&’ZfoZo Alpha 21264a fct/lx. G ftTT t> b t y* — ^fr& t % 7°n 4r yoy^ A*y >^0^>y V >yc J:oy^fLMJ^0#i^^^]LL < AR yoy^A^ fr^l:dd^0^e#j^&#^l:#I^L, b L-yt 0 u, b \s-z±m^t>tzz>m'Mib%&£ ;i btz±r, ^0 bL-y ^ 4; 3 i:yo y-7 A^#^#y, yo y^ A&#)gfb LT^ < hi>3 0£sS*i$ feT'fr-fTtfe^o fg^#{J Wisconsin ft0 Trace Processors 0 V 7 b y J: TKMy &

4:3^$)^^, ijftj3WW l/ —'>a >0— lLffl$IL LXs # R£'^ v ^ fifths ^ ® ft ft £ o

[References]
[1] Ebcioglu, K. and Altman, E.R.: DAISY: Dynamic Compilation for 100% Architectural Compatibility, In Proc. 24th International Symposium on Computer Architecture, June 1997.
[2] Gordon, R.: Wiggins/Redstone - An On-line Program Specializer, In Proc. HotChips 11, Aug. 1999.

i.2.5 vliw i/

(i) at«>£ {JG VLIW (Very Long Instruction Word)

(Predication)j f 0#B, Tib 3 V XA, 3 >/W X0#^%, V —X ^ ^ 1/-S/3 6o

(2) VLIW a. dfr^11/^11/(Instruction Level Parallelism, ILP)/W t J:3(:^^^J&Aft#x.Tg(^mf #^J#0C ^^2, cft&fT33 >/W 3&#?ijfb:3 >/W 3 anf^[i]o j£^0^Lffl v Y ^ n 7°D-b y-0-T'fdL #)0fb0^&6, yo t 3'60^$C'o fft60yDtvtf0#3#^0##^&i\A>i:3^<^^y% c0^to0#^^j0^hK&e3 0^^^jfb3 >;w xy& •?> o

-48- b. VLIW ^7-47*7 7D* 7*Ate##^&#&#-77-47 7 77®It#%%Mle VLIW k7-/t7* 7*5$,-6[2][3], C® 2-3 0 7-47777 12^4 6 *^v^;vm 36£iJtt£5l£m7 4 iete2 b . 7D ^7 Am^anfr^ gfg It ti-S. fc/cU ;v® l"&t>4- vliw 4l2n ww /vb| cA%0#9ua&3>/w?### U4#m2^ (SFfa). -*4, 7-/17*741211 («j69)c

VLIW

[Figure: three long instruction words, holding instructions I1 to I4, I5 to I8, and I9 to I12, each word feeding four functional units: ADD, MUL, L/S, BR.]

01.2.5-1 VLIW k7-/17»70Slff©fil^
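The layout in Figure 1.2.5-1 can be sketched in a few lines of Python. This is a minimal illustration, not the compiler described in the report: the `pack` function, the one-slot-per-unit word format, and the single-cycle latency are all assumptions made for the example. Each operation is placed in the earliest long instruction word that follows all of its producers and still has its functional-unit slot free; unused slots stay as NOP.

```python
# Hypothetical sketch of static VLIW packing; names and latencies are assumptions.
UNITS = ("ADD", "MUL", "L/S", "BR")  # one slot per functional unit, as in the figure


def pack(ops):
    """Pack operations into long instruction words.

    ops is a list of (name, unit, deps) in program order, where deps names
    earlier operations whose results are needed. Every operation is assumed
    to take one cycle, so a consumer must sit in a later word than its producers.
    """
    words = []    # each word maps a unit to an operation name or "NOP"
    placed = {}   # operation name -> index of the word that holds it
    for name, unit, deps in ops:
        # Earliest word allowed by the data dependences.
        index = max((placed[d] + 1 for d in deps), default=0)
        while True:
            if index == len(words):
                words.append(dict.fromkeys(UNITS, "NOP"))
            if words[index][unit] == "NOP":
                break          # the slot for this unit is free
            index += 1         # slot taken: try the next word
        words[index][unit] = name
        placed[name] = index
    return words
```

For example, packing `I1` (ADD), `I2` (MUL), `I3` (L/S), `I4` (ADD, depends on `I3`) and `I5` (ADD) yields three words, because the single ADD slot is the bottleneck and the remaining slots are filled with NOP.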

0 1.2.5-1 te VLIW k7-y-S 7*7 4©7D77 AHff®#7SS7o VLIW 412, ii jzE® 2 7 ten 7T--B5lc||ff4S^*^®«6)Sflr Ls (Long Instruction Word) CK. SlflBetel2- C ® ft Steffi 4 2, ##g§(e%X 4n3Ckte20 S8ff£ft-5o -*- 7-/17*7412- E*®n>/W 7*s$ELfcn- h*?ij6$E*jA»=#- 7®fl#4 4Riete#e4#6#^3#&yi'*< 7*te|$?ffi U SS$tejSAL4t>E Site 364 2cn- b*^e,^ff4S^»^%«)63te?*to4%ff L4t76<, C0M4I2- 15 ®%ff 2 6 6- 12- 14 < »74t\%,c ktea#42i2cu. Ch2 D- 2 -3® 7-477 77 leML412T® 2 7 VLIW 4(2- 369U@&#R#teA4#ai2-5 4 A- 70* 7tl0/i— K *3:7l2M$te%6o 4 tlt2- 7"D-b 7-9-0 7 D 77-9-4 7/l/6±l2^>5tb$4*II^4$-E)o —*4- n >/W/LB5 tei2#e#mm»t\®4- 7-/t7*7tett^ni2- gism-ti-a^iy^-M^re t2-«te(2ftt< &5. 7-/17* 7412- $^i/^/b®36M#t2A4me#(emom2. «ot- EffmsiE ft&tLZtzib. !%%L^/V36?iJt2®*|ii§lttti-fr5iiJtgttAsfeiS. $2c- E* 0n>/14 7 42/* L fen- k*i4$^4to- —*4-

- 49 - O tz ® X> c. #m Percolation Scheduling © =b d & d|t ti X y ^ j. — V > y s Loop unrollings Software pipelining^]© H/ Blocking ^ Tiling ©cbo^##^##f#yE^J^±^<^^V i/ i- (C## ^'vS[6]s prefetching (D &. o tg.^- °v "J i/ n. < y © 1/ d' T" > '> ^ >

T©2oy&^^#x.^[l]o

(D 7°D ^ h )&^©vLiWs y-^y^^yotv-yy^s M#m^#mfbL-c^D, cfu:#imL Ts /wyyy>^m©m^^m <%%, lom^at^cba^yD'h^th^mmLTt^o C©Zo^^vwW77d'>^#o/:yD't:vtl"r(d:s ;W yy d' & < ;W774>©i^ (b0'7^) ©f^yfSH^fl btiZ z btefc*) ^ yi%^^©^e(:j:o, pc (yoy^A yy>y) ^^M^^t^d:o^yDy7A#mi:$iJ#^#cX:^lc(d:s /wy^y©# a#oo c©#fWs m#^y©^^A^ < klz^^o #(:, #m©####&#yy^& VLiw^y-^y^y©#^lz(d:s

(mx.^7"-y;i/^k*) yaamuy^^s ^cf©imLy^#^izm^L^^i:^, M£;U;^-fL LyL^O(bo^3>;i d' )\zJ5^t%^> ^ o C^t^#;#y#%y 6#{$fT (Predication) y $> 6 o CLfUZlUl ltil ^#LU#y###%y

(B jpd' V i>jx < y©##&M^fc ^ ^i%^^©B#^W6^f©(±, yDt'7y^igp©^y&^^s ###. ytvyy^A l:Nf^^©y&^)o Cft(d:s yDdz'7y©i^)#fb©#!jAl:]±^y, 7tV^©#^

- 50 - C®f^#(D4r^ V y(±j^mjCit^TJg:^^f#rD)C$)^o C0C klis ^<^7D-lzyyrft i#imbLT&s b LTV^C^^^^k^oCa&^LTl^o Z. tl'tkf&yk't %> fz&> IZ^ Blockings Tilings Prefetching fc $ tlX £ ^ (#^)s L%;i/—& Ds ##U\ Predication ^^'yi>%i:^LT(±c^m±s %5ADL&^cak: 'T 'h o

(3) 0H4^# ct#=fz(Predication)

fztz Ls C ©&©*£&&< s z fc£&j@t<*nfcl'[7][ 8]o a. mm icvho^y&ou-s f077^0^->s ^-7 1:ZoTs fj;o)k

*#^W:s

,...,, : :A±fs : : Trues False ^

^^®#[*W:s -e#^^^T/:yi/f:^ h l/^7^0#^k^ L U-ftkfs ^A±ys &^,w(d:m^^L%s

Check , , : : 3 ^/3>3—pl/^/7^ : 71/%^ /r-

cn^s 0^tf£ft&j#fc Ltchiis C True &#^Ls ffitz ltl^ci:ittllis False %Wt.fetZ>o

Left: subcc x, y, cr2 / bg cr2
Right: subcc x, y, cr2 / eg cr2 / cadd cr2,T; csub cr2,F

0 1.2.5-2

a l.2.5-2 imi'Ti&m-fZo a 1.2.5-2 r% ba© -y 7 knf-fe, 3>yHvT-ttfi<'6bh-63>yW 5-e©@a©#*#fitT- afe-60 £ © 0 ii, subcc x 2; y ©#$t Sfrl' b* 6 cr2 tc-b y f d > ( x>y ) -c-&n«in$f&1 k#77 oyyAtss. ;©7ny7i,tt4#;f ( bg) »s#$-r£/c4tx c©^R(%$©m#?Kx tti!0fi$hT U$o0 H 1.2.5-2 ©6©0ti:S3— b'-e7C©^#3 — b&>©A{$&4xt#A(: T(true), &&Wi F(False) k & & -b -y b f -6o cadd esub C © cr2 ©rtg* 5 T, F fC 16 li T, SuS^St* &f?do Cffl3>yw;b3-o c©dbim&*ef=ta/W 73 d” >077 -y Off *
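The difference between the two code shapes in Figure 1.2.5-2 can be modelled as follows. This is only a sketch under stated assumptions: the function names are hypothetical, and the choice of an addition on the true side and a subtraction on the false side is taken from the `cadd cr2,T` and `csub cr2,F` mnemonics rather than from a full listing. In the predicated form both operations are issued and the condition register decides which result is committed, so no branch remains.

```python
# Hypothetical model of branch code versus predicated code; not a real ISA.

def branch_version(x, y, a, b):
    """subcc x, y, cr2 ; bg cr2 : control flow selects one side."""
    if x > y:            # branch taken when x > y
        return a + b     # operation on the taken path
    return a - b         # operation on the fall-through path


def predicated_version(x, y, a, b):
    """subcc x, y, cr2 ; cadd cr2,T ; csub cr2,F : no branch."""
    cr2 = x > y          # condition register holds T (True) or F (False)
    result = None
    # Both guarded operations are executed; only the one whose guard
    # matches cr2 is allowed to commit its result.
    for guard, value in ((True, a + b), (False, a - b)):
        if cr2 == guard:
            result = value
    return result
```

The two functions return the same value for every input; the predicated form trades the branch for one extra executed operation.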

b. tta ccfli. S:bMf t9$ff©t#iSi&$ kto-5,, Sf, ±K©M-Ct,a-A^d:3t:. £bMt§Slff©e;Hb&ff 7 ks < T Cti©itigto&%!l$k LT, SEC £ 3 75 -f >© 7 3 -y '> i ©@»&*6fcntiD, nffnei@©®«»sE^-es3, i£, s-es -rr©m#M% ctCJ;^ fl-tt^flijyvy 3 r &k'ffl^> b ')»#c©#mib&Gt»#? <*ot±il:li5. ktiiciEK MfSiny7 70 7?d' >©77 -y ->3.6«61c k, *ir, 0 1.2.5-2 ©M-ce&aZoC, *<4=^SHff6St kfl-edb^/b5* < & b, ± -«l:, i^7 7->'jl-V >7 &fz7#fi\ fl-»$*^6Sx.T©e^rS«)ttH*»k k . #<©@^ -o©** 7o y V >7&R? C k#$V'o fct. %©7n75A WiSTtiA $*7n -y 7# 3 kti5>©®©db^^tott* Wc —S, tMS StffSrffiLfelSlIrtiA -o©S*7n y7k&oT*b, #^77i7^.-b >7A^@K cfttcib, 3 >/W j;a*fU@©#mA%:& 0 , nTtEtttflgxfcc ktc&So

-52- _a O g K8 K ■F *0 K? AJ ilnn it m £ AJ ti K £ T J m > 48 E U m V ti tx 4^ ti ti # AJ u S y O hj F ti Kj e x. m G ti r\ n IK m 4^ F >K dd G m V i—3 U tvQ iK -ti z F ti ti 4%J <# ti m ti «» < m # > g G X ti ti ti 0 *N Cx F

Kr 4> WA4t X It e % H F iK -R F A E 1^ AO ti n A < G F -AJ -R -ti K3 G ti it U It ti F 0 K X- it $: G ti K E N n it t: 4?- K AJ -£ , 0 #ti <^- AJ M +S X) It -X s K fZ G -AJ K4 ti m .o V *N a> ti ti rH ti U -fiu ° -AJ V d: ti X

ti F ti It # iK F A e A it tv A -4D # ti a 0 U # ti 0 o \z F ^> y ■it t\ ti m , It O 0 < « # 'tx It n 1$ G C-A 1 G 12 M V -A 'TV K^ 1 iK 43= 12 4^ *N AJ

>

AJ £ F #( # % -fZ G G YJ *s ij- m G F m y ti G 4g It W ti ti it ss z K 0 ■it it ti ti it it # AO z 4t It 4) K m -AJ G XMX ti # y K

44o , ilnn -Q ti ti # F G S ti B ## h E < m n € It 48 E e miB AJ iK 0 G _A ti AO ti m it 48 F w m ti ti K € tx 1 ti Kd -A 1.2.5-3 , 0 m F aF ti 4% y> -R r<> K? id #( gs It ti ti ti y K4 F z z m G C\ x ti K& 440 it lira 44U 440 ti 49 ti A ti iz y #( K F 4§ % # -AJ € 4K ti "n F ti ti ti An AJ « % ■it AJ % o Si % *N 0 % F % §5 It >r- m z AJ 4F *< _A F z tx A ti ti F 0 # AO it ti 1 4t AO

CFG:= 7D^7 h(D%m? n- NG_LIST:= ^ While (H:=CFG NG_LIST {

H (c^f LT, F P cp &##)ao If ( Cp < Cn ) { 3 - f p %umtz>o } else { H & NG_LIST

} 1.2.5-3 ^(D^WL

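Condition-setting instructions in the figure: ck cr1; ck cr2; ck cr3; if cr2; if cr3; andcr cr1,cr2,cr2; andncr cr1,cr3,cr3
Unpredicated operations: add r1; sub r2; mul r3; div r4
Predicated operations and the conditions under which they execute:
cadd r1, cr2, T (cr1=T & cr2=T)
csub r2, cr2, F (cr1=T & cr2=F)
cmul r3, cr3, T (cr1=F & cr3=T)
cdiv r4, cr3, F (cr1=F & cr3=F)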

m 1.2.5-4 >/UJl

predicate ^ ^ MT, HI 1.2.5-4 (7)$| l^T mmtZo

- 54 - -#^0 add ft 2 o b btfLiLtZ) d bft'j&Wb'&Zo —&ftt> 2 -0^)0 sub t^CDUffTtiX WtuC)^^^(d: false, S d X>b^m(D^^at true -£$>3 C bft&^bt£%o £¥^>0 muL div^^0^e0^&61:(d:, ##0^14^:^ false C0m®3>/W;i/m$:iai.2.5.4Cr)^#t:^fo SoCO^^^C^^fjLTcrL cr2, cr3 0^{$3—b&AL%6o ^#3—4o0^%^;0 b&¥^if6o C0#1:, ^:^0##^^^ij0^fj^^ &o C0#A, cr2(:crl0^14:^M%^^, ^0add, sub^^0^e&#J#L/TV^o $ fzs cr3 C 4b crl (D§k\$ftfsiyk cF tls muL div ^¥0^17 &$ij# LT ^ -5 o CCDey^fj:, crl 4o0###^03^, ##1:%;^0W:. /:/:lo "£$>£C crl 0$§M^M% L^V^x cr2, £&& cr3 0^1~ 2o0^e*am#7"D^A±M;t%Lmockt:&ao ^^^^70^7^0^

d. 3 w-w )im cc-r^, th>y¥0 yo^7A#A&m 1.2.5.5 cafo cc^(d:, ^m4-^^e03>/w;i/^#aUT, V 9 1,2,3,4,5,7&)tfMb t%, S¥7"U 'yp 6 &, 0 1.2.5-3 /\ >^6 v ^

1  subicc x,10, cr1
   bl cr1

2  subicc h, 5, cr2        subcc y, 4, cr6
   bg cr2                  bl cr6

2: 1F   3: 1F2T   4: 1F2F   5: 1F2F or 6T   7: 1T

0 1.2.5-5 ©3>;W;H^ (1)

- 55 - la i.2.5.3 LTv^i^, com-ekL % (Tftfc>*>, S*7"D y 7 2frt>T(D 3,4,5 %ftMtt£)o la 1.2.5-5 co^inckL iWlB^ftT l^£0 s^7"n y 7 2 %z\$X 1 ft

subicc x,10, cr1
bl cr1

subcc y, 4, cr6        subicc h, 5, cr2
bl cr6

2: 1F   orncr cr2, cr6, cr5   cmul cr5, T   7: 1T

ia 1.2.5-6 >A^nvm (2)

subicc x,10, cr1

subcc y, 4, cr6        subicc h, 5, cr2
bl cr6

andncr cr1, cr2, cr2

orncr cr2, cr6, cr5


H 1.2.5-7 ww;v#i (3)

- 56 - False(2= ip), 3##0H:&a/:#)CU\ #H4^1# False, (3:lF2T)o C^l^CD^f^ O&kT, m^yo^/7 3,4,5 1.2.5-6 -e&&o HI 1.2.5-6 Tli, y & 3 b 4 fttii'&ZtlX S* ^#7D'y/7 2 k &^yD'y^2^7 t:^8L, c^&^lzL^e!I^HIi.2.5.7C^fo

subicc x,10, cr1; subicc h, 5, cr2
subcc y, 4, cr6
andncr cr1, cr2, cr2
bl cr6

orncr cr2, cr6, cr5

Figure 1.2.5-8 (4)

im i.2.5-7 tid:, jjgtin^z t£>st>frZo 8* y" uy y 6 fr b(Dfflffl/w

(4) mm

8®rad:, V 7 t'l)x7'>;il/ — XX' F&81& f 63>/W - e[!3]o v? h*)x7^viU-'>3>cj;Dfiofeo 7 '> >CD ^ 7s Jl it LIT X ISb 3 [12] [13] o

- VLIW:#mm2, 2, ^}l%l (^c/:L, a^kL ##/J#A^W:l)o • ###g : Load/Store : 2 , ALU : 2 , FADD : 2 , FMUL : 2 , Branch : 1

-57- - : 6 4, 1/^77 : 6 4, . D—P01/-r7>i>(t4itd'^;bo

- H± SPARC ^:$#o /:/:L, 1/^7^ #Lo

^>f7-^7U^7AHTIt Mediabench 0 — gP 0 7 D 7 z? A £ ffl ^ fz [9] 0 Mediabench(t, UCLA!?##, 7^07077 7T&^oJpeg, mpeg, ##{b^^07D77A^A^Tt3^,4'#Ctl607D7 7 A0#^#^!#f/:6o a#o##r0t L^co a. ILP0mO ^f, VLIW^^p#!:a'0<^ cfua, vLiw ^^^0^0^ ft©7 >r — ;b F^LlzM LT, NOP !?&D, ^e^0VLIW^^P#0#!l^^#^L/:#R!:(d:^^o C0*§#&I3 1.2.5-9 to ##^^>7-7 — ^, $%#^##^#^0#AO$ (%) c 0@

[Figure: bar chart; vertical axis in percent, 0 to 40.]

m 1.2.5-9 ILP ®mm

[Figure: bar chart over the Mediabench programs adpcm.encode, adpcm.decode, g721.decode, gs.decode, gsm.decode, jpeg.encode, jpeg.decode, mesa.mipmap, mesa.texgen, mpeg2.encode, mpeg2.decode, pegwit.encode, pegwit.decode.]

X gp gp 3t id b> d bt Or gp 4 S 1C gsm.decode j—...... ■■■■■...... —j su i» ft d B> rT X. B> 0 C" ? 0 E 9 S r f< * gp 9-t Si jpeg.encode j d E e d SI & rv V m % O' n ! i ■d> Ml St I jpeg.decode j...... j (d A 0 E SF «P % E d 4 m to Sfc m E d X r cE mesa.mipmap |------1 & Cn bi S> St- j | id m 1—i 0 0 9S St­ gp o 1—‘ n§ 0 I...... ow n □ Pd or d n> & % s> E mesa.texgen j 3 m 0 uu 0 d n> T4 0 % fX % $ j | m aw US St­ St 0 0 vo^ mpeg2.encode j u XE E #E ow > k—* its VP # gp d to r # E Pd fX M »&- mpeg2.decodel j i Pd F$ □ r> y\ m d d 0 8P nut- O" 0 ever Mi d □ pegwit.encode I „, ~1 4 H u- St 0 0 St­ d gp F% 4- ^ SFF n E > F# 1> iE ep St n F# gp 0 nn> pegwit.decode j...... ~~1 or m 4 0 ever E E 0 > Or 0 n n r X z F# m # PE 0 % to or 9tl T, (Predication) V7 b t>^Ti/ ? 3.U — & J x>r ffffi

Intel HP% IA64 T —/:/: L, IA64 TL^l^ (ko'T&^o i^ly^Ul/^iJtM, tSWUift&a

^c/:L, IA64 O'T&aZak:, ^3>;w ^,c ^(±m#-r(±^^o

[Figure: bar chart over the encode and decode benchmark programs; vertical axis from 1.0 to 1.9.]

Figure 1.2.5-11

[References]
[1] J. L. Hennessy, D. A. Patterson: Computer Architecture: A Quantitative Approach, Second Edition, Morgan Kaufmann, 1995.
[2] R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth, and P. K. Rodman: A VLIW Architecture for a Trace Scheduling Compiler, Proceedings of the Second Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 180-192, March, 1987.
[3] M. Johnson: Superscalar Microprocessor Design, Prentice Hall Series in Innovative Technology, Prentice Hall, 1991.
[4] J. A. Fisher: Trace Scheduling: A Technique for Global Microcode Compaction, IEEE Transactions on Computers, Vol. 30, No. 7, pp. 478-490, 1981.
[5] M. Lam: Software Pipelining: An Effective Scheduling Technique for VLIW Processors, SIGPLAN Conference on Programming Languages Design and Implementation, pp. 318-328, June, 1988.
[6] M. Lam, E. E. Rothberg, and M. E. Wolf: The cache performance and optimizations of blocked algorithms, Fourth Int'l Conference on Architectural Support for Programming Languages and Operating Systems, pp. 63-74, April, 1991.
[7] S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu: Characterizing the Impact of Predicated Execution on Branch Prediction, in Proceedings of the 27th International Symposium on Microarchitecture, pp. 217-227, Dec., 1994.
[8] G. S. Tyson: The Effects of Predicated Execution on Branch Prediction, in Proceedings of the 27th International Symposium on Microarchitecture, pp. 196-206, Dec., 1994.
[9] Chunho Lee, Miodrag Potkonjak, and William H. Mangione-Smith: Mediabench: A Tool for Evaluating and Synthesizing Multimedia and Communication Systems, in Proceedings of the 30th International Symposium on Microarchitecture, Dec., 1997.
[10] S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann: Effective Compiler Support for Predicated Execution using the Hyperblock, in Proceedings of the 25th International Symposium on Microarchitecture, pp. 45-54, Dec., 1992.
[11] S. A. Mahlke, R. E. Hank, J. McCormick, D. I. August, and W. W. Hwu: A Comparison of Full and Partial Predicated Execution Support for ILP Processors, in Proceedings of the 22nd International Symposium on Computer Architecture, pp. 138-150, June, 1995.
[12] 99-ARC-134-19, pp. 109-114, Aug., 1999.
[13] A. Asato, E. Yamanaka, T. Ozawa, Y. Kimura: Compiler Approaches for Exploiting Various Levels of Parallelism, in Proceeding of RWC 2000 Symposium, pp. 71-76, Jan. 2000.
[14] M. S. Schlansker, B. R. Rau: EPIC: Explicitly Parallel Instruction Computing, IEEE Computer, Vol. 33, No. 2, pp. 37-45, Feb., 2000.
[15] A. Nicolau and J. A. Fisher: Measuring the parallelism available for very long instruction word architectures, IEEE Trans. on Computers, C-33, No. 11, pp. 968-976, 1984.

1.2.6

(l) littoK

&£© < ofroft/gfl'SflciS^T*'!? L A'L & As 6> x *©-£ t\ $ttx-ti:&©o x T&k'fflfr Ui77|)'r->3>BJ; ti *$&M.$SggAfcg+WSS-t*® tlW5, 5fi?'Jgt@t$^- K r>iy^©*ffl»s-)iS3i-triCkoTA^HKi6:l$ k*i|vf**g+*axfe-5 kLAttiR* *«•?>«*©*© hftot ^X©£o LA'L&As6x f ©Zo%/\- b 5 r70#%Wijm < +£•£#* lti'5 klia;i&V'fflAs SttXfc&o -m%:+##^-v#m'm«©mmAK-3©xmmf&ck©**&A»a. «la s:§- LXUS^-^xer-ya >&© L^-VtJVaVka-i'ffltt tbA^©* *te® sn, $e>©«ffl LXV3J£ffl7n77 AAU D*8l#$&|gi@lc*f LX J; b MSlcsiftiS zoc&&ckx&53. cacMUx Lx©^,cka, B:S&Ase>x ©t>li8iJ©tttW©(ti**X- LA'&Vo ccicae-n-ay#?ug+#aK%©g&&%tf <#m-fb^-g-x©<6©ci±, c©jf+ y t &t>$s :+##&, Lx^gt##©##gKm©^mk l x ^ b c fij ffl -r 4 k k © x* t %

(2) M?!Hb^5il/-'>3>

3£?iJI+@SttEl3jolj--E>gi6ffle*©li@(iV'o $X6% < V 7 b 7 ^r5g#gl:fc-5o ttfg, SfflnTt&Si«©j££x 7o77A^%37 b»k#76AX#M©©<^*#&A^ -i:-gk£toxu-5o c©z-3%m#i:Av\Xx cn$xc36?iJg+SS[cy§V7 b ^ix© HXSffik LX k 5.ftXSfc77n-f h LX

• A<#L©#9'J7D77 ^ >7@#g • @t#xn^7 J:^a6?ij-fb • 3£?!Hbn WW 5 • 36?iJ-fb£ft;fe7i'77 V©Sft

%k'##(f LA'Lx ©Til*, te/Sfl-»X©^@ffl/$JA6B< kx

-62- V7 h ww 7[i]&k*ffl§S/i|6?ij-fbfit u*uB,-t»* se>. S4$r-ic, ®iSAsiS)i!t£h,TV3 kl±l'C'A^l''o SScoSHjjfi^J-fbn >;H7li, tc*-7V'-c;v-7"lz^;v© (fit-S &©;bs«fcA,fc' T-afeb, >7&A6fflv£=k (JEil&CLfrLSSICti; < ^7 < 36?iJ5)f??tt* fiSittcoufcafr /w7©a*%&#m&amcf V'7'D77Alcyu-Cff»-5ck©B®5tc$>5k@tin^o £©tfc-5&Stt£fT®'f-5 — -QfflSttt, ®StotelJto$SS#fflSA"Cli6:V'%-5 as < wALfrmit^n&mvxyt-ot l-c, xui^-y3>w»tf6 n-E>o X 5 i l/ —-> 3 >tt, @t#©V7 h ^^TXSSEApgtcWSlfiJffl L-OO, 7 h 7— ^r7 7 7 7 OS 7 — 77777% ^©#fz&7A — XICf%%7 %&©#I%K#T t LX, £tz, 7^777 7 h 7*-A®#tCj3lj-^,'>-A V7»ti-e««fliSieto-E.A »fflft»i; LTlSESn, ti«©fiK#it [3], «;i£©i9ffl-(b«k UT, DEC (COMPAQ) © FX!32, Connectix tt© VirtualPC %

itvS-eta^A, V7 h 7^777 77©R@©#*K, C ©lUV-i/a & C a&#AT# j:7o 1"%fc>*> a# (m*) :+m#©7Dy?A&, M?'jftg«±-gjis j-i^-->3>^ff L-3 7>atg[6]±£ff 5 tl'-5 tifflT-ft-So Ztl% n. D-->g >J 1£ L

W'& * 7 V I ? h n - K

(i$8Bro'>6i.'7-7 6KS U ( = WJKl4aWU) A'fctJ IC?»J • «B7 - K ( = *-;\"^7 K) Si6tnbfc757

@ i.2.6-i ®«s«j'K#7’7 7SfflMfeiie?U'fb0fce.* t

-63- o) 7

#9Ij-fb^ 5 a ls — '>a T"D^7 A6»6 3£9iJf$0 6lJ#m&Et^-5o M^nyyAffl^TJl-i; Ltli, fW#7 D-©maa^6)RI7m L*6i'$^©@B^9U -e$>-5$*7'D V U **7a y £ISIC#41‘£$J0iMft#litl& tf- «S977t LT#a k©

IS7D97 -5 rfflffll • 7-'— Eft® b HR t -5 -.KB# 6 3 o f kt, SilfefiMlffliACft7ci7iM*t: =ko Tft#H«4SJ5t#7-9S*»ft»t5. ##C, i7$iJ*»s+»lciW<1 WS:t57-i'S r^gij . $j|8g/- h j tpf«ft;z>»JS&7- K4#tfy-7I;*

i7$ij -EE/- Ftik St^CSJffll? n-&U U^-j'E© h-7 >#SJ*T-6 ©@J#&#tf t:*eabl:*X$4i6k kl:»%. UolEd^ :F8 I • «E7- h*l;n h-7 >#$!!# V7ck$tcttsaXSf©^Si|« h-7 7» ©mmmtfo rh fr 6 ff & 7 -KB # ifc -S o HE©t-dT'7Atcjb‘V>Ttt, t-"-1/->*7 7H8© ##M#, £ eu\ Btotcy f vx©z*/£$fi57^ iJ##fS©tt#M«t:pg6n^<, y M l/7##AbtC%^$fl%7t L < &7 fctolCx 7 t b#ff @Ctt#W®#zfc-5#%v#©;?zSlt)ft&i-k 7 K vxtbgtlcJ; btoSMR©* fie©#S&e&7 k©7te3I#-KBk&-2>o 7l±, «0i#977^lil')!;ail!7D97iffltTiV *0?&^ k ti:MLTIELVo %*©##7? 7 k#% b - s«to$ijffl/Tr-j'tos^7 7i!tt, tt#i#sat7-7tf?©ssgia7ti»7- (7$n-0'^>^*5»®T-ft-5o 7Dy7A©#^#V\©6A^'6?#jR#»@;a'&-C#3 Eb#mi"-5iikt:* bx *m*)»;+#@#k, t©:t#&%#f-5A©©#*:^&#M, 1~ •2>ItWk©tzjb:&W-#fi:&fc>ft-So iu\ipi5t, y-7 imtiZr-fiMtei-' LiMffl 7D-»s^aijBii7^y-7!$k\ $<©mm##mj%yi'6y-7, ?mi©@a&y-7 ii««fi©,j>&uy-7kM&u 7ny 5i,mm.s.m^'ntt7kk c

- 64 - # x * 7 v 7 w * sn 3 mm 7 n - a

(4)

a»ai'r-^t#y7 7tii> eni$?#c j;-3zej##*M#&*g#L, 7--7 fit ^ i*C i o X =r-- 7 «c # M « 4 & IP t S C t fc \ x *7d 7x*f*cML-cwsffliW S c k?&So WTt ^aij©*tSk& ;S'fc©t;mT© j;7&ssi"t-$>So

• SI® 7 D— (Control Speculation) IIS© 2 ;& [6] ANSI’S* 1 t&C>A\ (f #$A^&© V 7— 7&SA) K V 7 & -FSJA S o • ir—7 fit (Data Value Speculation) U 77 7* JtU'ftWCT K 7 7©)&;t£ ns 7 "E U AsS*7n y 7 A s#^, f©#A-e#*$tt7t\s#&?aiif So

ttSA,, -r^T©SI®fflSn/x-7ffltt%IE V < »»JLTteSH«lc7 7 7, tfk >7&k") 1: S7-7##NmkA^S7&S7o i^iH A J; o T#to W Ek#x 6AS t©©$ < l±##©Z7%7-7©'R#M#7&Sk kA^?@$ASo

(5) S|®7 0-©?#|

eK?#ia, £IC7n-b'y-y-©asi6|±tt«k DT, mt\^CA6cT#*%E%A^ &*>AT$fco i&^ewmSk LTaT©©AsW=ASo • A-R©##&@& L6?ai| • ggft* 7 > 7 A J; S fl-lK^iM • flli©fl-tt®$kffl#M4«fllTS^fli] • 7?#| • /\k 7') 7 P?#|

-65- C^U± (L^^c &) (L%^) ISHf^^fiJffl ITi^o ^£oT^iii5©W£R £&#>£* ^[9], #<0'7-r^DyDt ^^^^#1:2 If y

D, ltfvhCD77^^^<2tf'yh Lf flH_L) c a^:ck'oT, ^^8yi:^^©##©m^#^< f ftT j:o lc LTl^o

cNa.T'D^ A©em:,L'3T^La#iiiai$irfm©^fimmim#&$m u^miir&^o

VXt#$g£ffll>fcEg§/<-X0^&&^S£ftT^;MlO][12]o A©i^#i: L^^i%©##©^^#i^t^;i/ ibL, t^;i/©;^7—at: j; D#J^©#i^#&^)k^9 ^ t)©

&&AL C ac j: D#J^©#^?#J^W#a^ •?> [13] o C© jlo ^#^©^6#^&#^©##

/W 7 V V ckoT j:

(6) T-zmo^m

ii-v-r pv'7u-t'yy-®m&\n\±&ffit u-ccc^^^m^^g LTLLT© d:o^^©^#(f ^)^t6o

* 7 L?#j

• /W 7" U v K7-$J

Lipasti 6U\ — gtf© Load i^^|?e

•66- 777 F t/7©s&iJ7-7#?#i©#*A#A-k#x^c 7'j7i'j/fr F F7©7$JC(;i, Z>x F 7^ F ?#!#$-< i>tir Sti'5 [17][18]o 7i»i[19][20]& £% < ©iSjceti, CtlS^*- T-'-^eoneFatoffiHtt^eti-ferotcattit"- e. *©k lt(*®)v;i/3 7T$jgsAs# x e>ns= *K[2i]T-ti:7V 7^777 F V7©78JC, jtKtigjT-ti^'-^evflicv ;i/3 Lti^» 7-7#7#lk L7t, a !##(:, SmMti&fUfflT-S AS, tr^K-7 ©AS, 7o 7 L7%&©7#I#S& @©e(7 6 AS[20]Ak#i@m$ATU6.
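The last-value and stride schemes named above can be illustrated with a small Python sketch. It is a simplified model, not the predictors evaluated in the cited papers: the class names, the unbounded per-instruction tables keyed by `pc`, and the `hit_rate` helper are assumptions made for the example.

```python
# Hypothetical sketch of two data value predictors; table sizes are unbounded here.

class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.last = {}                      # pc -> last observed value

    def predict(self, pc):
        return self.last.get(pc)            # None until the pc has been seen

    def update(self, pc, value):
        self.last[pc] = value


class StridePredictor:
    """Predicts last value plus the most recently observed difference."""

    def __init__(self):
        self.table = {}                     # pc -> (last value, stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + stride

    def update(self, pc, value):
        entry = self.table.get(pc)
        stride = value - entry[0] if entry else 0
        self.table[pc] = (value, stride)


def hit_rate(predictor, pc, values):
    """Fraction of values predicted correctly when seen in this order."""
    hits = 0
    for value in values:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
    return hits / len(values)
```

On a sequence such as 0, 4, 8, 12, 16, 20 (for instance an address advancing through an array) the last-value predictor never hits, while the stride predictor hits from the third value onward. On a constant sequence both behave identically.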

( 7) Java Jog-time Analyzer

a±, %#i%eim7-7##77 7Cct\xm^, 7c7mv^tt^7#iR#c^t\T Mf&fr&oAo ±K77Vl/C%-7 < Java 7D 77 Aj£?iHh©StSI6fi:&7 AtoC, 7 D 7"7 A ##@##7—71/ Java Jog-time Analyzer $r @8% L A (Jog-time iltt Runtime i <0 lill'i! Walktime[22] J; V !4i$<©S)« JJA ©KftlJJ^oa D 7fe-5o *f , 7 57 7 x7 ;tffl n— F8SC, 1M###A Z A S^ttSCovTeSiro&JSffi&fi&a,, 7n77 A*se«!)An-5 fc, tFFSn- F7 > 7 7V 7IJ#R[F©7D77 A©@6#7'l7lMf ^>#%A##g©iR#&GA7o A © A © C, 4tR0n — FV >7 7 V 7IJ, ffMAWffiT-f# ftTU-5ffifg#* tC Lt, $*7n 7 7©%#&gg#T6#*7o 7 7 F 7 7 >7&ffAV-, 7n 7 7©%l?&iliit' £ A A' C, EI+t#IB©iK*, 7-7 - #K?#]e7:L-;i,©@K, 7o 77@mtZA77 V 7 7-b7©77 A77 >7©|+®Ak'©totoAtl$©E*%ff A-5o ?lt, 7077A# TBec, 7V7 FWffiUhm **7o 7 7Eff0E, ©?#]*, 7 V71- *;i/y<71+ene@, jfi^iJSA k'ffl^ffKa+7-7 6, ##77707 n 7 F 11 & cm A t -So

Java Jog-time Analyzer
  Input: Java class file
  Static Analysis: Basic Blocks, Control Dependence, Data Dependence
  Runtime Analysis: Interpreter, Basic Block Tracking, Predictor Invocation
  Analysis Results: Block Exec. Counts, Prediction Rate, Critical Path Length
  Branch Prediction Modules, Data Value Prediction Modules

Figure 1.2.6-2 Java Jog-time Analyzer
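One of the analysis results listed in the figure is the critical path length. As a rough sketch of how such a number relates to available parallelism, the following Python code computes the longest dependence chain of a trace and divides the number of executed units by it. The representation (a dictionary from each unit to its predecessors, in execution order, with unit cost one) and the function names are assumptions for illustration; the report does not give the analyzer's actual data structures.

```python
# Hypothetical sketch: ideal parallelism as work divided by critical path length.

def critical_path_length(deps):
    """Length of the longest dependence chain.

    deps maps each executed unit to the list of units it depends on.
    Keys must be in execution order, so every predecessor appears earlier.
    Each unit is assumed to cost one time step.
    """
    depth = {}
    for node, preds in deps.items():
        depth[node] = 1 + max((depth[p] for p in preds), default=0)
    return max(depth.values(), default=0)


def parallelism(deps):
    """Average number of units that could run per time step."""
    if not deps:
        return 0.0
    return len(deps) / critical_path_length(deps)
```

Under this model, a dependence that speculation removes (for example, a control dependence resolved by a correct branch prediction) is simply deleted from `deps`, which can shorten the critical path and raise the computed parallelism.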

-67- jja tt, tzfctbtis satbTsacffl^attcktf-et 5, kvid^-ett^gnJigt-s-Ex, $4. jja tea, tI.T0.fc -5 n-tv^„

• 2 tfc-y hSSffiAty j'SfilUfcfl-tiTiM#? • fl-eassfflv^/cfl-iRTSJs • T-TiassA^y j'-tT+ffl-r^TTT^-xofl-eTSits • Xh7>f fc*j$;©B»ST-j'BT#l$
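The first module in the list above, branch prediction with 2-bit saturating counters, can be sketched as follows. The class name, the per-branch dictionary, and the initial state (weakly not taken) are assumptions made for this example and are not taken from the analyzer itself.

```python
# Hypothetical sketch of a 2-bit saturating counter branch predictor.

class TwoBitPredictor:
    """States 0 and 1 predict not taken; states 2 and 3 predict taken."""

    def __init__(self):
        self.counters = {}                       # pc -> counter state 0..3

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2     # assumed initial state: 1

    def update(self, pc, taken):
        state = self.counters.get(pc, 1)
        # Move one step toward the observed outcome, saturating at 0 and 3.
        self.counters[pc] = min(3, state + 1) if taken else max(0, state - 1)


def prediction_rate(outcomes, pc=0):
    """Fraction of correctly predicted outcomes for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)
```

For a loop-closing branch that is taken nine times and then falls through, the predictor misses the first and the last outcome. Because the counter drops only from 3 to 2 at the loop exit, a second execution of the same loop is predicted correctly from its first iteration.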

JJA-eiikA6&SU6k fctfcoT7D

• »#5/T-^ fiT9)0tttg • 36?iJS

0j;o»m#&#ac

(8) nines

Java Jog-time Analyzer VC Java Tnfc^^AOjS^iftl'SiiiiJUT^fco ''<> T v-fc7D^7Ailt Unpack k Javac fcfBUfco < &# — ^ k UTl£ < JBU fe tVCU 4 Unpack 6 Java "CSBxE U/fe LinpackJava -nfo&o ttHAsfiolE"5. ##IE JDK1.1.2 (CfcfcKT^ Java TgBxB $ fclfe Java 3 >;H 7 javac tft!), 3 > r!-f IV fcfcfl © 7D 7 A k L "t ± 13 © Linpack.java SfflUfco 3>&3|J%1^T3>/WTT^@^ (Javac) k 3(Jti#fr (Javac-O) kTil/ISff^ofeo # 1.2.6-1 CiBSSTTo TA^fitc, nffSSTre^-jlJS fcfei*7D -y i'ftoTtSs tStSSEfi 4 ff & t> & Av £ t|ir© MU6. ##*045 %ct#l&©T%3mS, ^RT«*©f T-^T#j*©TQ4T LTU^o

Table 1.2.6-1

                                   Linpack100   Linpack200   Javac       Javac -O
                                   8,015,783    57,348,307   5,011,325   24,340,585
y /7                               8.7          9.1          4.6         4.8
Parallelism without speculation    1.15         1.08         15.5        11.7
Parallelism with speculation       886          3156         36.6        129
Branch prediction rate             99.1%        99.6%        93.8%       94.7%
Data prediction rate               98.2%        99.2%        62.7%       70.8%

-68- Unpack tffljfeS'JSttHH 1 T-fe-5o CMS, iv-73! Ch6©BbigLI@C. ;H7^>r?Xttl:J;5r-5'l#l @»SS61" 5fctoT-$>5o Unpack TttWt&A-T-©!!)? #M©I$ k A, 2:" 6 £ to-5 4», ?#l*k AC 98%y.±i:«toTiWV'o CCDISS, fiSm'ftSaSfTCiot, iuv — 7T >7'77gStCcfc 5f-Asfl?-;B£fu iv—7©5-iRb ii L iwi'j 6/r v/zt-*—j'tt#M*ffl*c®ito$nrx'r ya, — u >y$ft.5J;dc&b, CtoeiRStoTilVjfe^JS^tiaiJSh-r U3= $ fc LinpacklOO 7 886, Linpack200 T- 3156 t, HH 4 fgC& o T£ b , Siiii b IHg-y-T X© 2 SgCtt^J UT l'5 C bfttofr&o

[Figure: line chart with series total, daxpy, matgen, dmxpy, dgefa, idamax; vertical axis from 0 to 400000; horizontal axis: Time Steps.]

0 1.2.6-3 LinpacklOO ©36?Ug©^ffig-fb (MtJ© 1000 7 n V 7 )

0 1.2.6-3 C LinpacklOO C*143 3fi?!lfifflB£fSSfb£Rfflffl 1000 i?D'^l;-3Pt iTLfeo U < C Linpack ©Jl^WLflTife -5 LU (dgefa) H 3 #01V— $ tl, 1*3(11© 2 "3© Ik-7 (c©9 daxpy) * s3fi?iJ(bnJSgt:fe^>„ 0H»Jfb$nfcrtffll© 2 S IV — 7 A5 IIM-13IV — b iS^tXTU-Stt^SSt) Sfc, CftAC 5ti:A>#toTi®Ulf — £A5fK9J £ fl •?>ASCtxtt, ®biSL6#fiKT5 2 c©**7o -y 7

« 4 A * u © C % ff m 'n $ ft T u £ fc to t? fe -5 o 0 1.2.6-4 C JJA AstHAj Lfctt#77 7 StSTo C illi Linpack © dmxpy tl'i ^ V 7 F©##7"? 7-C&&0 ff5U h lVffl#6*to-5 2 SlV —7T$> b , Linpack ©4" v ##77 7©##A7J\$ <, &*&o, L-Cli)U-5o

method: dmxpy class: RunLinpack

0 1.2.6-4 7 7 tti (Linpack © dmxpy)

#68bf y-y i:«x vyxvoiE •5, ^nen©ffitte*7-ny »68blt* t>. n©4'i-ti:8*ftton-:r-4? A^sn TU-E>„ 6±©Ett^> h 0 — J — h'6Sb LTb b , C z.-frb%%. f4r- y V y b05l8A"3©tt#Hfli68bt"o -#T©###%%G0&©$V\#1| M L©%*yo y y T- $> 3, Cffl y - b Ctt 6 * ©T-'-^ttSr-y k l *©$i|fflttsr-^»sx* ltv'^al ;ol*7D y^0#O)SL#ecj;O «u465iJS*sff bii% mtx ##7-^©$ < li% h^'f h« o o$ b i. t> ,lv>TSi|*A sf# e. ftTl'^o khttCb ’SOEA5 Value Locality 68b ltl'5 = 4"6tt Javac ©8ff&n$t^ "3 V T & 3 o y'> 3 > 6 "3 M" V i Javac T-tiu #a%R6R%b6t\@Ac?Q 15 ge©a6?utt* sff e.hrb t> , f? 2.4 fg©4fi?!jg©|n|±A s)#')>b3 C ttffciPSo Linpack 0 3 >6otj-fct§iB\ y'-6 ^**©ft±ti $ i$k"eti:&i'lc: S W b61\ ###b=CE:6#Ma0|o|±^A lHgkx &© 2.4 fgktb^TA'^ DbSVk V'3**Sb3l*As»e,h-6 0 c©aAk LTx *mYbl:ZoT*b^ms^##^ec b^ll6$;ije©|cg±t:iSLyc'fe©-t-ife^k k A^aiSh-SAL S;t&Ass> Javac IE##A^ ktc6axTi*)SA s&ggdhTuA =cVNfcto N cbLH©#m%##li#KA-c liHLi'c EDS LV®tlftt4-E0Ei@T-$)5o

- 70- (9) £

L'0'diij^|q]±$:f#^, ^Jft:^ ^ 2- P — '> a U M7( 7° □ 7" 7 A i;: !*];£ f £ BO #. mmrnm - 7'-7##7'7 7a^7 #0^707*7 Amm^^^u, 2 w:, Java yD7"7A(D#####7-;i/ Java Jog-time Analyzer & (### L> CtlSrfflVTV'' < 7> ;fc> CD 7° D 7" 7 A CD £:$tl /L L f®&8#U:'7WT##?L&o ##iW#';F-f##f77l:j;cTm#7

[References]
[1] Wolfe, M.: High Performance Compilers for Parallel Computing, Addison Wesley (1996).
[2] Wilson, R. and Lam, M.: Efficient context-sensitive pointer analysis for C programs, SIGPLAN 95 Conference on Programming Language Design and Implementation (1995).
[3] Sites, R. et al.: Binary translation, Communications of the ACM, Vol. 36, No. 2 (1993).
[4] /p#: Virtual Accelerator (3 L 7 —A P 7 Vol. 96, No. 231 (1996).
[5] /J\#, ULl P: 7 U— ^ r;WlS7' < 5. n. P — 7 3 ><7)t&fp # #j^#,Vol. 97, No. 225 (1997).
[6] /p#, iP^r, ill □ : 7##7 7 7 A Java Jog-time Analyzer - Java Virtual Accelerator ttT ©Y’iHf fffi - , : 7°D 7*7 ^ >7", Vol. 40 No. SIG(PR02), Feb. 1999.
[7] Hammerstrom, D. W. and Davidson, E. S.: Information Content of CPU Memory Referencing Behavior, the 4th Annual Intl. Symp. on Computer Architecture (1977).
[8] Bobrow, D. and Clark, D.: Compact Encodings of List Structure, ACM Trans. on Prog. Lang. and Systems, Vol. 1, No. 2 (1979).
[9] Smith, J.: A Study of Branch Prediction Strategies, the 8th Annual Intl. Symp. on Computer Architecture (1981).
[10] Young, C. and Smith, M.: Improving the Accuracy of Static Branch Prediction using Branch Correlation, ASPLOS VI (1994).
[11] Yeh, T. and Patt, Y.: Two-Level Adaptive Branch Prediction, the 24th Intl. Symp. on Microarchitecture (1991).
[12] Nair, R.: Dynamic Path-Based Branch Correlation, the 28th Intl. Symp. on Microarchitecture (1995).
[13] #, /J\#, AB: 7 V ^ SWoPP97 (1997).
[14] McFarling, S.: Combining Branch Predictors, WRL Tech. Note 36, Digital Equipment Corp (1993).
[15] Lipasti, M. H., Wilkerson, C. B. and Shen, J. P.: Value Locality and Load Value Prediction, ASPLOS VII (1996).
[16] Lipasti, M. H. and Shen, J. P.: Exceeding the Dataflow Limit via Value Locality, the 29th Intl. Symp. on Microarchitecture (1996).
[17] Fu, J. W. C., Patel, J. H. and Janssens, B.: Stride Directed Prefetching in Scalar Processors, the 25th Intl. Symp. on Microarchitecture (1992).
[18] Eickemeyer, R. J. and Vassiliadis, S.: A Load-instruction Unit for Pipelined Processors, IBM J. Res. Develop., Vol. 37, No. 4 (1993).
[19] Sazeides, Y. and Smith, J. E.: The Predictability of Data Values, the 30th Intl. Symp. on Microarchitecture (1997).
[20] Gabbay, F. and Mendelson, A.: Can Program Profiling Support Value Prediction?, the 30th Intl. Symp. on Microarchitecture (1997).
[21] Joseph, D. and Grunwald, D.: Prefetching using Markov Predictors, the 24th Annual Intl. Symp. on Computer Architecture (1997).
[22] J. A. Fisher: Walk-Time Techniques: Catalyst for Architectural Change, IEEE COMPUTER, Vol. 30, No. 9 (1997).

1.2.7 n7 y*-SUIF 7nyx ^ b CDhMT M WbSSE X fz SUIF 3>;W7[2][3]^#^L^m^Jfb^mm#, Parafrase ^7[6][7]&#a^ LTL-S & © £ UTtL & o

(1) SUIF Explorer [1]

SUIF >7 Monica Lam t & o X 'M&X U 3 W W 3 7 ©4 > 7 ^ £ UT&B Explorer HI 1.2.7-1 iz&to Explorer a. coarse ^ U Y > ft tz SUIF 4fe?0fb

Rivet visualize:

Figure 1.2.7-1 SUIF Explorer

t«08Mf6J. IB^iJPffiTffli.'^#® 'J -y a jW'JXAtffli»^t)ttait-5faSII» LTl'?« »ff»ttt^T07D^7 Af& D . ^KSWiti L£$£»s7>fcT!a©fi¥tiT&ffd„

® xxiwt/r © K?!is»©tt#«w © ns, r=y^^- MB5'j©itm @ x*7$»klB?iJ$E© 'J“^9>l:Mt5U y^ya>/i^->©#m

S5@©$e[i6]T6iE^£1fc'>t;:, cft$T-SSK ©fSAr^ffitfiSiesn-tus#. jji tdSffftt6^ij-&ii5fct-n«Av'©-ett» <^ nfflh-C*fe-&o k < C>pfBi/X^A-ettlWDfBeFaA5*$»0JI

-o#&3k#k& D #;i/-7A^A^RR RSCAA^MfTx ;i/-X-img6 b ©?Q#+@#R@&%^f 6. cn6©t*ISiR*tt.

- 73 - 3 ww 3##d/-7©m#C#)g#$B4 3a#f S3+9I3- FS#At5Ctl:J;oT Hilt'S, c©g+9ICBt'SBSfSti:#SC,>& <7 t-Mji K fcli&fefcV'o HfftWrttS&i+S U /!/-7##tU +bWmbS 4$U ®fSo Cffl&tottD 75 Affl read/write #®43+9Jb, 73 73Alft©^7tV# B)C++VteSf© write #!+:«$ 4 S+# ft So ft Ac, 3 >7 W 7 C ft oT^t+i$ ftfc'Jf 6S'$Sk U 77 7 a > 4ItoJ U c+i5©iSi:itStt#4*StSo $ Ac. iMtt# Jpf-?©t7 + ^- MbC ft o-t+£ftJ+b£ftS7-7 -fe l®S'J "T S C I: As ft $ S & 27 3>/W 3 kffliS8l+)sUrD6^®nTVS= C ©#%%#+)+CD# RA^^^S Aft 7d y + ArttcptSt'S 4£?!ltt4Adit'Sftfttf+l+ftfcSo 3 >/W 3 © @##M+b#b tjfflStSISS, 6SVlt36tiJ+be«©»«kv3 fc+imc SfiJfflft$ S 2: JgfcftS« c. tilbStti+f ft 7 >7 73 73 A©tt6blRl±4HS+cto©>tlS5)fflt)'d'7>7#E rGuruj 4(®xT^So 3 7/W 3### Utft-7^H++#ClR# L+zft-7 473 73ftCf ©ft ftftftA TSitS A&ftlfttt < x a_-74mB%ft-7C#g$#ft#?iJ+bCAB&m !843--yciigvaii+s*-ettibiR]±4iaoTv^ <„ a-fcs*s+i5©iiis»c M?iJ+b4fTofc»©»f§lft D te VS++Sk Lftu S 7b 77 A A* 2:*© ft S CiB#d A>4)tioft V-ftlftStio Guru C J;S36?iJ+b7D-fe7fttT!B© 2-3©Smto»RB»s+l)V^nSo CfflKBli +£ n it m * 4 ffl s f s fc © c a +» m v> s ft s * © t- s> s ,

• j£tti+b* : j6?iJU-7a >ftftHfi£ftS8Slig©i!l'6' • 36?i)+bto6 : +fi?iJ®aiFto©*g$

#M+b*#ml++t|j% *wK#Aft# 6ftSftltftlft%^o #M+b*BA7j\$%@6C ltl@im^7n t 37M7-7im+#©t"-;^3 K C ft oft M?'J+b Lfc+8-nt;:tttbAsffiT f S C t * feSo RB©'J'$%t/—7#$V'#6CI±KB47@ <+St + 7©ft @ S +21+++#©/!/ —7ft#tU+b Lt D , ES7 —74 ft k#>"C jfeM U — 7 a > C ftSfto^C 2:4li(7SC 2:#AB2:%S. Guru (t3btiJ+b+i;+6?iJ+b*4B«*4«))n$ ■e-SC i:4fido Guru ©##4T3BCtiJ#t"So

® 3£ty+b* 2:$t@$ft SII;toca6?iJ+bSii+c3- bCMfS#9iJ+b*i:#B43-7C7n l, Ifci;jv-7 A^'J+b $ ft Ac »-f=r C It ^ © tit* 4 M»f t" S o

® ie?iJ+b+tfe;i/-7© u 7 +as A © V 7 HC It I/O 4#ftft. A"3#ti)+b/V—7PtC%t'%%/)/ —7A^#ftftSo 7 — 7031/ —773 77+ ft 7 7 3 ft +f C ft o ft |+9J £ ft/cHf+Bf ®©RJi C V 7 + $ ft S o

- 74 - (D y°u to'MW,

T t:$ < —&&&&&&'? 2>Z tftt)fr'oX\<'tl\£' 3- — +?l&Z(D)l — 7&i##L&L'&& Lfl&V^o Guru —7°D 77 vCi^^iife^7n^7A^^i/Tt^)o 7°D 77 7###^—7#####!? ^ ^ &^iG±E^^777^-HbW#^^7^&$mLT^

7 A7(7)7:^#^f 6 C

fcT, c®##&#$fbf 077077 A777 7>7[4]^^7m^^# Alti^o 6 3— K £ giJjftjl:H 'Mt Z>jtfztflzmfet % z b&-C%% ^)(7)T&0, 3. —7^1±gT^#^C^:(7^7^71/7 V >7f ^caiCdLoT##^^#

c k !:& & D, 3.-^7 ^cc#gL%Mj-(zm^&

fe ^ #,^ <7) 7° U 7" 7 A 7 7 7 7 (program slice) ^#^^7-7 7 7 7 7(data slice) LU\ 7 P 77 A 7 7 7 7 <7) 7 7-fe 7 b ”£ fe o T, L&V7—^titocDUl^fctt^G&So — ^(7)$IJ#7 7 7 7(control slice)7D 77 A7777(T)it7t7 bT&oT, (7)#^"C^6o ^^7^, lg 1.2.7-2-f7-77 77 7k#m7 77 7®M##|&^f, GCD^iJT'ti, DO 1000 6 C RS L RL (7)##^^ DO 1000 yb^^E/b safe^o 6 7-7##L$iJ### l:Nt>6^:^m^mL7ct(7)^7P77 A777 7^^^,o 1B7U RS (± DO 1110 'V RS(l:9)(7)|E@^^S$n, N -11/-7ft (:&£ ^(77(1)7:15] L RS(l:9)^^m^^l6o ^61:, DOll3Om(7)^4:^:0^RS(6:9)#^m^^6o ctoT, 1S7U RS (± DO 1000 T71/-7#D^Lt:^T^#L^m(7)im##%:LC C(7)N#W:3 >;W 71: j:6E^J7-7 7P-##1: 6C T^&o

      DO 1000 I=1,NMOL

        DO 1110 K=1,9
          RS(K)=...
 1110   IF(RS(K).GT.CUT2) KC=KC+1 ®
        IF(KC.EQ.9) GOTO 1100

        DO 1130 K=2,5
          IF(RS(K+4).GT.CUT2) GOTO 1130 ®
          RL(K+4)=...... ®
 1130   CONTINUE
        IF(KC.NE.0) GOTO 20 ®

        DO 1140 K=11,14
          ...=RL(K-5) ®
 1140   CONTINUE
 20     DO 1150

 1100 CONTINUE

 1000 CONTINUE

0 1.2.7-2 7D V? kT.V'i

BB9IJ RL (in

*-flH?iJRL(6:9)^ffiffl$n^©Ii*ft*@^f» KC^0©i;fCfe2,o bZ6t\ KC ©I*# 0 RS(l:9)©##f CUT2 LTF©#^T&6o i. o t, 3kft%®X\& RS(6:9)AS1-^T CUT2 HTl:*5fc»E51 RL(6:9)-\©S#AsSlff $ ns, o, bx Xfrb £ n 3 C k A5 -5 , CfflC ztiftt>tfti\is s^ij rlowmmmw'nxs iv -t- 1000 Z© j: d tc 2 -3©E?iJ#!SRa©6[fle> A>Ct"±auts5!i5 fctoCli. rl Silkffiffl©Z3»@ •grttn >/W 7©##'C^#k#m©M#&%6AGcf c cfflio (±l3«7?ttSM RL AOV--7-®bilSLT-S«kffifII©H zb) ZMtlf Vs f

- 76 - -USk:#®©?-^? AX 7^x14 AS < &3©TX5'f^Oiifa^igf T-'-xaviTife^ck6So^ajrfctoi3)aiciz^»vtsig6#$Xs XM&'J'S < T-5-Ar-. J.—tf© on demand C 4 o T P88tSS8&3lST-§-5 4 7 Ct 2.citi£>iT-fe5o
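The reasoning about array RL in the DO 1000 loop of Figure 1.2.7-2 can be checked mechanically with a small model. The Python sketch below is an assumption-laden illustration rather than part of SUIF Explorer: it assumes that KC counts only the elements of RS exceeding CUT2 within the current iteration (the figure does not show where KC is reset), and the function names are hypothetical. It enumerates every above/below-threshold pattern of RS(1:9) and confirms that whenever RL(6:9) is read, the same elements were written earlier in the same iteration, which is the condition for treating RL as private to each iteration.

```python
# Hypothetical model of one iteration of the DO 1000 loop; KC reset is assumed.
import itertools


def rl_accesses(rs, cut2):
    """Return the sets of RL indices written and read in one iteration.

    rs is indexed from 1 to 9 (index 0 is unused) to mirror the Fortran code.
    """
    written, read = set(), set()
    kc = 0
    for k in range(1, 10):               # DO 1110 K=1,9
        if rs[k] > cut2:
            kc += 1
    if kc == 9:                          # IF(KC.EQ.9) GOTO 1100
        return written, read
    for k in range(2, 6):                # DO 1130 K=2,5
        if rs[k + 4] > cut2:
            continue                     # GOTO 1130 skips the write
        written.add(k + 4)               # RL(K+4) = ...
    if kc == 0:                          # IF(KC.NE.0) GOTO 20 skips the reads
        for k in range(11, 15):          # DO 1140 K=11,14
            read.add(k - 5)              # ... = RL(K-5)
    return written, read


def rl_is_privatizable(cut2=1.0):
    """Check all 2**9 above/below patterns: every read must follow a write."""
    for pattern in itertools.product((0.5, 2.0), repeat=9):
        rs = (None,) + pattern           # shift to 1-based indexing
        written, read = rl_accesses(rs, cut2)
        if not read <= written:
            return False
    return True
```

The check succeeds because reads occur only when KC is 0, and KC is 0 only when every RS element, including RS(6:9), is at most CUT2, in which case all four writes to RL(6:9) have been executed. An exhaustive test of this kind is a sanity check on the argument, not a substitute for the compiler's symbolic analysis.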

e. ®*-fb it(t'>Xf AtlT Rivet S*ft$giS[ll]SfflVTl,x-6„ *i/Xf Atol'Tliil T-5AA C©'>XAA6($oT Explorer l4~FSB©lliE&teft LTUSo

(D ;W/iXV >X - X'xXXxXif *i LTt'-SX? 77 - HI4AS t3 Lfetfo T£/J'£ < */A£A-2>o Cffl^SI4ti£*©X7 7 V^TX Mctt^T*S1i*X7 7 4

c©x9 xtna^amT-Dy? A©#m&^fLX"? x ©SviCffliA 5>il-5c

(2) fi^-X©$ttM*IB V — X 3 — K © bird’s eye view c?ilT43 D , -f ©Sin _fcT’ V — Xffl§7-f >14 #-©7^ >t 71- >©s$i4?*x Hressctttiiii u-ce^sn

ofc4 7 tiffin Sin $ft -So

© V —X3 —f • Ka7 Rivet > X AA03 — K ba7l4aSfflt)©tlt'tT, L> < oi>fflKS#}fbnt

f. fi/pX A 1 7 A

3_-VA#Af-nXI43 >/H 7^x®S^T-$> D , *®3>;W 7l4c©#i/SC t6ofc®86ff 7c Btk L-CEofc#B/TSX#AC 47TE LV'gS-fb^jfiXiJ-fb^fibn f, LX^RKm?IELt^?Ac ?;t\ i-lf##A LAfg/pX^iE Lt'&0*A7 fr&3 #####©8*&$wLTiy^-r-Sc ctti43>/w tcAS

(2) Rivet system [11]

3 > t a — X >X A A14 4 A 4 Aitftfb LTto D , 7"D -b 7 XI4 4 1" 4 f"# < © h 7

- 77 - u, —7 —aoC^cT^T ^£o tZZfiK Zo\,^'otzfliU£tifz3>\f3-'-&(DMntJPWW£—M&M4hLZ&

Rivet A7 7" A&7 >b:i-7 A7^ Al;fb7-;V£:;ihit" 6C b(c^c b

ccDao^^^^A^^mf ^±T##b Li:(±TaB(Dcko^^co^&^o

-AGC^TT" V 7" —i> a — 7(a^##C%6o c^t&giJWcf ^ AC

®M%ik j^C A^Mco^—7 &#/nf 6 C b^^f^AUC^—7cD^#^^c k, ^,7"—7 6^:ACA%#[#Ay^##&^-oC

Rivet i> 7 ^ A T # SimOS[ll]CZ^)i><^D — S/3>aC(Di><3.D — i>3>CZ ^TiRJft Lfc7r“7®^:ilfb£^iSiftCfi: o c ±mUlMih(DM^M It (af —7CD##V>7, ##& Query, 7"^7D^T7hCDg^#f^, 7"—7CD#^lfb d>bJL-mT&mmaucm^f ^ltb§f^tl^o tztz I/, A ^ n. ]y — 7 3 y&fid fz A C jUfxRvfFd] A5® < , #C7: m#yy v^-i>3 >-c(a^m#cm^&6o

(3) 7(Dft§8iz£Z>miMmM[5]

luizULfc SUIF Explorer ^SMWbn 7/H 7£/<-7 t b fc^tlSMWb 7-;V T'fe^CDC*fbT, #^!Jib0 3 >;W ;i/7DD-feX^3- — ^b©^tlS$:M btlA^ 93>tyb^#o^^77^^;i/^m^&gmL^^CDbLl: GPE(Graphic Parallelization Environment)f% <£> %> 0 CCDi>7^A(a^-tFm#b3 >;W 7^#^$g^At)1±^X:ACD##:ib7-;i/^m X., :L-tn:j;^yD^7ACD^^Jb^-^cD^m%^.-if-e&6o GPE co^Tia^ vv^^^co CSRD(Center for Supercomputing Research and Development) t II $ tl fz Parafrase-2 ^^'Jfb3 7 [6][7]T& £ o CCD GPE ^7^A^(a3L-if k3 >/W 7 b0ATCDm^##(a^7 7^7 7 ftiSftJ&7°D 7*7 DTff t)tl%o a. 7777^77 777 77 7(a3L-4f^7D77 ACD^^J^fr^m^^Cb 67^CDCt, o b t))gU

777777&BRfBL%:o 777bbT(a7:^ ;h—7°, +f7;i/ —7"7, #*7D'y7CD

- 78- fiPBW©v-x;n:lj;ESv' — b &7/V— t">T'LT77D7 — Ktuaits. htg e.nfcyDy7A©ffli6s»6iM*i®8-e©s:T®j6?u #&#&c k#-c$&o b. GPE 7 — Jf A 7 A -h GPE ©**%%#xA (ffltS) tik 7n^7A©M5iJftk^a ©tt+fd 7 ') "jfts.1- i-->y7D-i2x©$g*T:Sor, im 7/kA^ font© < kt'-5 k kT$)-5o k©-ie©#ffk®fflf 5 7-;kliiAT® =ka CS» T-§&o

(D 7"d77 ASa*ke«©3 Wfd 7jt^jt#A (Jedit) © 3 W\" d ;l/ffl 3 > 7 d 3r a. V — i/ 3 > k 3 >/fd A (Jedit+Parafrase-2) ® 7-n77A777 77 7k777 m##M#©##fb (HTGviz) @ 777©m*R, SiR Vfc ^77 CSS1"-53- h-gHA^x© OpenMP jgi,T*SA kl'-pfeAn^^A^A^dfe^iJ-fbOAi-^k^ (HTGviz) © —kl: j:6x JI6?|J3- h*kItfflfB 3 — h ©SiR >h'f73-FM^UffljSl (HTGviz+Parafrase-2) © 70^7 AUff (Jedit)

3- — -y* ti — l®Ai“ — >7'+J-d7/Vffl&itd 7;i/"Cj6?i)3— NIS7D7 rd ;k t5 k kCi b , /Afflitd k kAst S-6, GPE a Tcl/Tk k»-£ S. hVk'S I7llg[15]67'7 7 d **3 > h©%B^f h6k3>-’W7kffl'f 7^-7i-7tiotl'S. Tel ti, 77')7-ya>Affli tojA»=#$rgg(3't^7ctotC|§|+Sn7c77 V 7 h HlgT*& o t, |a]#Cf ©@#g©d > 7 7"') 7o Hig&teSIf E> k k C i. b Tel 1C37> h’&jigftlt" E> k ki'tgSo Tk l± Tel ©***$6t)^:tiiMT-$>or, X window 77f Affi’7-/k

GPE TbSiStoSn-E, k Tcl/Tk d > 7 7 U 7 A^ibi* ft, GPE t vvs >A^*T^E, PM b 3 WW 7&e#jbfe b , 3Wid7kffld>777'>3>4?T7o A%©3 >/i d 7S# (A*-j'#SJv^6r$) li Tcl/Tk d>^7,'J^7pe,MAE>J;7t;7d;oT*b, ±tffl77 7-f */l/3>#-d>Hi3>/(d b , ztii, 4« jE't'E>kkAs"C$E>0 Parafrase-2 li 7.2- 7 #f i§T $> o T 3 >Vi d —/!/ (+/-7/U —^>) T-Sgi$£ft"tVE>o U fc A5 o T Tcl/Tk 037>ftltS3>d'f7 riXSS'J^tcUff^-e-Ek kAs7SE)o

-79- c. 36?iJ-fb3 77A 5 Parafrase-2 14 V — X to V — X ©#;I6$}66 fi17 S-lliatiSx Wi-f 7f siMtri3*$nfev -xxoxa Att±ta©F*3»*sjc*5t$ns ##©#aib, #ig $#, #XiHb»k'##tftSo c©iaX67c©A*m§;BitXtii*1"S„ chS*lt 5 fe© (C sights ©7 0 XPtyAkAX P7Dt7 76#3Xt\So 7 0 7 O t 7 7 i4e###6#m©mai#%i:$#u *x p xot v+n±-e©m@;m#&A*mmc $161"-So 81614 c k Fortran ©«b#7A — p ^ntl'SJ; 7 Tip 0, PDE "C*I4 Fortran CMLTCntC OpenMP ®A*[14]68txS 47 tC#IES;!j|];LTVSo n ww 7 ©|*| *@914-''! x 2:ud t> ©Afl-AiiT*; b, 3 WW ;PJMI4&7x 1C HSbfl-SSnrt'So en£7#1"SkTIB©47t:&So

- JSMr : XnXAAtSSAefiSfgJK* - $# : AA7D77 AlCttl-SXnA'A A$# - ?77t 7X : 3>^H7i*iS7ny7A*lfflas - AV Vy A > 7 : A A 7 7 3 - p ©# A - IISJ3- p : $j$3— P©l$|g8+Sij - 3— P$fiK : X—77 P 3— P©4bK

XXX£t£©77n — AI4*t#lHsg0'>>X 77XCStA C A> Fortran T'Zfch-tiL X X X 6&#AS k @ ICI4 natural boundary, 3 $ b Xt-fillP— X1467IP — AXPftFdi LXi#^D 7XAsiPS47&7>7;PXxXkttSo XXXX*77©X — PI4XXX tcliS V, 7 —XI4 2 3©y — p®SHf©)*1'Bc6 :fe^S1'So c fflUfi1 Stott# k!4, XXX©SUfESSS/EAS £>©TiPSo XXX»S!H43-+P»$lg^7T®$1"SC k #1^5. OpenMP ltA*A s#A£n/c Fortran V-X 7D 7 7 A14 A V X V 7 P 7 4 7X V [9][10]fftPtb L6^A£ Fortran 7P 77 A tC$#£flS „ C © 7 -4 77 U (43--—7 P ^;P©<6A—-0^7 P©X V 7 P 7 4 7*7 0 Tip So d. iTt ^ Jedit 14 A X X 7 7 XnJBt& X windows A AX hxr-f X T$i b, Tcl/Tk TSBizE A tl fc 7 7 0 7-'>3 >t7-f 77 V$lTiPSo Cft 14 Jstool[13] k»f.£7 7 7-7©-® k LTtg#t£ftTVSo SStptoXX'OPCSSIC OpenMP lgA*6#AT # S 4 7 &« (b6#3 Tla So X-^-*e.SALycV^}g^*6glRASe kte 4 b A AX P_t^© 1177# A^WtbT-feS, *fe, t3k%m@%mb#3>/W7©AX3>7'f A3i/-7 3>k$wx&So Jedit A>X —7zAX#3X7-f7t7 H:ff UT§Sl:3 >7 4 AAX SiiAnMSlA SA@k7X3©g|$4>A7>3 XSrHAASAfiSiStetttASo

e. f X 7 7 ^ HTGviz kmT###&#o<,

HTG#;E^m^L, 3-K&HTG(D/-H:#J&2#ao - htg fawztm®?1 '-#2 3 4 5tmmy v—&*>%&{& ftmtirtzofaMit, nnm tfjcoMmk - xox^ A#^jfbaHTG

HTGviz TBdlC^f S'OO^&t^f > —7ai-<7^i#X.Tl^o

e ^%7^77#mMC7 (D 7” d X -7 a n — F £ >r > F £ © ^77^77#%^^r>KC7 (3) j#^Xf^h&^4>FC7 © jfe^j n - f ^ mmm 3 - f 7 4 > F7

m±0ckoC, GPE (±3>/W 7(:#T(±& <, t? L5^-if & D, >7-7 7^#x.Tv^o f x < z> bf&t)ti%>o

[1] Shih-Wei Liao, Amer Diwan, Robert P. Bosch Jr., Anwar Ghuloum, Monica S. Lam, SUIF Explorer: An Interactive and Interprocedural Parallelizer, Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 37-48, May 1999.
[2] M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, M. Lam, Detecting coarse-grain parallelism using an interprocedural parallelizing compiler, Proceedings of Supercomputing '95, San Diego, CA, November 1995.
[3] R. Wilson, R. French, C. Wilson, S. Amarasinghe, J. Anderson, S. Tjiang, S.-W. Liao, C.-W. Tseng, M. Hall, M. Lam, and J. Hennessy, SUIF: An infrastructure for research on parallelizing and optimizing compilers, ACM SIGPLAN Notices, 29(1994), pp. 31-37.
[4] M. Weiser, Program slicing, IEEE Transactions on Software Engineering, 10(4), pp. 352-357, 1984.
[5] C. R. Calidonna, M. Giordano and M. Mango Furnari, A Graphic Parallelizing Environment for User-Compiler Interaction, Intern. Conf. on Supercomputing, pp. 238-245, June 1999.
[6] Polychronopoulos C. D., Girkar M. B., Haghighat M. R., Lee C. L., Leung B. P., and Schouten D. A., The Structure of Parafrase-2: An Advanced Parallelizing Compiler for Parallel Computing, MIT Press (1990).
[7] Polychronopoulos C. D., Girkar M. B., Haghighat M. R., Lee C. L., Leung B. P., and Schouten D. A., Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors, Int. J. of High Speed Computing, 1, 1 (1989).
[8] Girkar M. and Polychronopoulos C. D., The Hierarchical Task Graph as a Universal Intermediate Representation, Int. J. Parallel Programming, 22(1994), pp. 519-551.
[9] Polychronopoulos C. D., Nano-Threads: Compiler-Driven Multithreading, CSRD TR, Univ. of Illinois at Urbana-Champaign (Urbana, IL, 1993).
[10] Martorell X., Labarta J., Navarro N., and Ayguade E., A Library Implementation of the Nano-Threads Programming Model, in Proc. of the 2nd Intern. Euro-Par Conf. 2 (Lyon, France), pp. 644-649, August 1996.
[11] http://graphics.stanford.edu/projects/rivet/
[12] http://simos.stanford.edu/
[13] The Jstool Application suite and libraries, http://www1.shore.net/~js/jstools/
[14] OpenMP Organization, Fortran Language Specification, http://www.openmp.org/openmp/mpdocuments/
[15] Welch B. B., Practical Programming in Tcl/Tk, Second edition, Prentice Hall (1997).
[16] #r:c4v^- - w) 7 D 10 NEDO-PR-9809.

1.2.8 Trends in Performance Evaluation Technology for Compilers

HjIpKcou-C (l)tlin > tfi — n >

— (S)Zi±Z2 >;w 7%#g#@©4-#©& t> is Hon

(i)

Lf ©He#R0#SM&f^>f?-57D77A4v-Xt 6, kVd ITStt^bCfttt, i/Z=rh

- 82 - — g##^IJ{b3 >/W7§afe3>/U7® cco&ax @m^^j/W7^#

v —^ 7°d ;W ;f Lfcn — F £%rM&Ui% C3 >/UJV lvfc3 — F#^iH-##±'T(D^fjR$fa]<#f & ^^9 ^©T'fe^o (C ^\ > ^- T — ^7 (benchmarks ;£ fc benchmark suite) — ^ 7” D ^7" 7AW:v-%3- F ^ u-cmmzfiao

^7—^yo^^A : ^ b y& j6^ij{b^#^ LX\^tztz®>, u^ yo^7 A^: X ^ tzo L- (D &. o t£^y ^ — 5? (D X.t£ 0 ^HXb, S^V-y^S )Aib • itt'^d^bb^^fj© s¥f[fl] ^ — V — ^ ^)5 Si ^ ffl l ^ (b tlT I"* ^ o

7;i/^/r—;i/T7V^r ——^7D^7A :

T ^ T 43 D , ^7 — ^ ;b ^ ^ y © ^ > ^ ^ U "T (i T y V y — ^ 3 > y U y 7 A a^m#ibLT#^o #

;i/77')^-'>3>7D^7A^^>f V- ^yny^A^LTffll^CI^ ^1# x.

3>;^ bT7U/r ——^yoy^A : 7Jl/77'J^-'>3>B>f7-^ ItlOl) £&yn t?y h-^7" uy7A©###^^^y©@#^^^f^o fc-c, 7;i/yy-;i/yyvy-y3 L, f 4t6y#y^cfff©yDy7 A^#h%L^ F77V y —y 3 >T&6o 3>;^FT7V^ — y y —;i/T y V y —y3>&f8^&<2:&, ©T y V y —y 3 >yu y^ ^^yyyA©y-ym#

#T #&o

7o7^ At: f 0H8^0&%&Mf&#t: £TF 6^T^D, ##g###^t: & fc o Ttd:£'0 £ ? & c b&)fctobnT\,'%o

• A^C«t5V-^3-^l0fl:§fii • AfCcJ:^V-X3-FA©3>/H7fV 1/^T^r 7#A0FfS^B . n >/w 7 777 3 -ae^fa# 7^/3 >m##m -7D7^ A#(D#m3 >;w ^7773 >#^0w^ - -7-77^7 -7nt v7#k -OS^# (7>7;i/3L-7/'7;i/7^L-7, ^-^7^7%^ '>Xf

LTtj:, 7D77A0^ei#fm&m'a&®#$^#, a ;i/ —7vh^m^a^(D^&ao f 0#A0f±AW:, 0^^M^a 6 0, m i)0^^m^mt:##u7m^7a6 0, &#a^a#b##0#ma0ita u%m a

(2) i!t#0^>fT-^7D^7 ho^m

::tli3 >SU ?m'Mffi$i(Dmmmte&^XRtt^tiZ> — ^7 0S#!l t: O ^ T a&B,H1~ a o a. Perfect Benchmarks [1-6] Perfect Benchmarks (PERFormance Evaluation for Cost-effective Transformations) (4u AH Illinois A#0 CSRD(Center for Supercomputing Research and Development) t^< ~3frh°:2.“-~P i988 ^t:^^^t±u^D 1993^ 10 U Perfect Benchmarks 0^ > A V — 7 70 77 Ati Fortran77 713?^^ fife 13 077 iJA-'>3 >7D7^ AT'#to£ftat>tltz$)(D(Dtztb, 7D 7^ A0g5id!t:6-7d > b 0^7 >#M0### m&$mCT^amA&&[K 7D7^ A0#Bm^^At: Fortran?? t:$#U7^a^

mmfe^&fz-DX\Z, ^7D7^ A^HfrUAc^0 CPU wall-clock SJ^nao f 0MA a&7D 7^ A0##/J#A##m^ 6 MFLOPS # ( millions of

floating-point operations per second)cF ti %> o

Table 1.2.8-1 Perfect Benchmarks programs

Code name   Application                                                       Source lines
ADM         Pseudospectral Air Pollution                                      7252
ARC2D       Two-dimensional Fluid Flow Solver                                 4650
BDNA        A Molecular Dynamics Package for the Simulation of Nucleic Acids  4843
DYFESM      Structural Dynamics Benchmark                                     8446
FLO52       Transonic Inviscid Flow Past an Airfoil                           2324
MDG         Molecular Dynamics Program for the Simulation of Liquid Water     1430
MG3D        Depth Migration Code                                              3455
OCEAN       Two-dimensional Ocean Simulation                                  3198
QCD         Quantum Chromodynamics                                            2816
SPEC77      Weather Simulation                                                4870
SPICE       Circuit Simulation                                                18521
TRACK       Missile Tracking                                                  4271
TRFD        A Kernel Simulating a Two-electron Integral Transformation        580
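The MFLOPS rate mentioned above (millions of floating-point operations per second) can be illustrated with a minimal sketch. The function name, the operation count, and the timings below are hypothetical values chosen for illustration; they are not measurements from the Perfect Benchmarks.

```python
def mflops(flop_count: int, elapsed_seconds: float) -> float:
    """Return millions of floating-point operations per second.

    flop_count      -- number of floating-point operations executed (assumed known)
    elapsed_seconds -- CPU time or wall-clock time of the run, in seconds
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count / elapsed_seconds / 1.0e6


# Hypothetical run: 2.4e9 floating-point operations.
FLOPS_EXECUTED = 2_400_000_000
cpu_rate = mflops(FLOPS_EXECUTED, 12.0)    # rate based on an assumed CPU time
wall_rate = mflops(FLOPS_EXECUTED, 15.0)   # rate based on an assumed wall-clock time
print(f"CPU-time rate:   {cpu_rate:.1f} MFLOPS")
print(f"Wall-clock rate: {wall_rate:.1f} MFLOPS")
```

The same operation count gives a lower rate when divided by wall-clock time, which is why it matters which of the two times a reported figure is based on.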

=g-7°n y'7 A£&£#££ c t^rn^t U A^K:;W 3^7 V 7°D -fe y VIZ £ £ HjSfblif^ tlX ID 2> o %$^

^'CDfljS0^ A §GA;£ tl A: lb SI jS (optimization diary) ;wyy^Dyy-4y(D#Ax V7 7A/3VXA^Mx 3 >;W7^^rl/^7"^ 7kL Fortran?? CD3 7 > hfr^: LT yoyy AlC#A^fl^4b(D T\ 46DD1/— y^fr x 46 ff th 7'/>V — -A > BT17 til Lx 7^-7 ##;W 7 mi^1~ £ G & y#V —A3— PCO^flf 6 ^)6DT. #&b^^66Cfft)^i6o z;i/3 VXA^M(±#i"#07;i/^UXAg#:&^MU% L&o 6 (±m^T;i/3 uxkfrb$LV'Mn\zMLtz'7uXAicg^^f 6c (b ## ^ Al ^ o t4Sb$iJ/t M fc o r x ^ > A 7 — 7 y u y 7 A y > y ;i/:x—-y ^ — p h Lx ^^^qiA^yyT Ay-t ^ c & &&0T0 %) o yoyyLy#G^t^#k&G

Perfect Benchmarks #^Jfb3 >/W CSRD T £ btoTc^yy-v-^-e &$>dx csrd Np#®E^#cz^3>/w7#^yd)

-85- >^7 —7 7D?7At LT^fiJfHSivO'-SCkA11?, gttJfiJ'Jfbn >/W 7 k LTtejSt2Jfci>©©U'kok#;?. 6#lTl'5. b. SPEChpc96[7-9] SPEChpc96 ld\ SPEC/HPG(Standard Performance Evaluation Corporation / High Performance Group) (C J; o T 1994 #0 1 HjT.s 1995 ^CjPjfgci ilfc Supercmputing ’95 7:^iS$jTfc'''>7u'7’ — 7 D' o SPEC/HPG © rp jC' 7 >71—1± SPEC 7>n-k Perfect Benchmarks CftfC, PARKBENCH -?> Seismic benchmarks kV>o)c'f|tl©^<>5 :'7 — 77"D'>u:7 p tf jSK^ • SlpfS SPEChpc96 ttvWmy p©469ij • fl-S( 3 > £ u.- ^©ttsgffffi* g ffaCff — 0, K — k&W@gk hTi'J. SPEChpc96 —7 7"D AltT/W tl'5o ;fl^07Dy7klt y-77f-i-3 >777^, SMP(Symmetric Multiprocessor)^ ^<7 h 7k @t # # ^ k 1' o tz 7\-f 7t 7 t — V > 7 A ZrA±r-mff nJlig&J;? Cs jiftn- p k36?ij3- p k LTadxBSftTV^o Ptctt, 7 v^ — vnv 's'sWi 7"k bt PVM %6iPl: mpi ©-en?li4IPt7 d77 5 >y$n7c*©ks OpenMP fc J; 5 7^4 h <7 t- 4 :/ Sffiffl LX 7n V =? X > V $nfct)©Asis*$n tcjo Sff© SPEChpc96 ©''k>^7 — 7 7u >7=7 Att Fortran doZIf C T*aBM£tls 3 O ©XT"') ^r-'> a >7D77 A *>e>«JEjg$ nr t'&fS 1.2.8-2)o SPECseis96 k SPECchem96 © ~o©TrCMf®$ tlX 1'7c#\ 1998 SPECclimate *SiiiD$n7co

Table 1.2.8-2 SPEChpc96 benchmark programs

Code name     Application                   Source lines
SPECseis96    Seismic activity simulation   20,000
SPECchem96    Molecular modeling            110,000
SPECclimate   Weather modeling              50,000

ra AnnvpfiwmrsimzttMt &o c ©#-c 86,400[#](1 B)&BLfcfil£ SPEChpc96 ^ k f -5 „ C©m#l±^mf stxil-rv P 6S8IUTV'5 kSHSL^TuAA 7->7? PJ'fASailt kl:aSAsi£i>ST-$>-E)o ttlbSJStrS/roT^X^AfflttffiSk© «td l:iS$1"^,6atts 'Of?-?7D77 a * SI fi -f § # # a £r- * a k © J; o * tt * r* « s 17c a & * t * it ft a & e> & t ^ „ —7 7n A©*SI:&t cTK, PM®$fx7c®HrtT*3- P©SjSfbAsIP -+f/bs SmjSk VTA A?l:A-f7j 7" A a > (757) ©ISS, 3>/W 7?-( P7f-f 7 ($fcttm§i£5I) ©ffiffl, *s$> !76tt5o iine-tcH vrttk© J;d7d:»iSfb6fi:o7c6»6&MU%ttntf»e.»V'o »
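The SPEChpc96 value described above is obtained by dividing 86,400 seconds (one day) by the execution time, so it can be read as the number of runs that would fit into one day. A minimal sketch follows; the function name and the example run time are assumptions made for illustration, not SPEC tooling.

```python
SECONDS_PER_DAY = 86_400  # 24 h * 60 min * 60 s


def spechpc96_value(elapsed_seconds: float) -> float:
    """Return 86,400 divided by the measured wall-clock time in seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


# Hypothetical example: a benchmark run that takes one hour.
print(spechpc96_value(3600.0))
```

A larger value means a shorter run, so the metric grows with performance even though the underlying measurement is elapsed time.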

-86- *5. SPEC CPU (EiB) kH& t) $ fc, —(B): j) < , 7— U BLAS(basic linear algebra subprograms), LINPACK(linear equation solution), EISPACK(eigensystem solution), k V' o fc

7 '9' 41 7 tt small(_SM), medium(_MD), large(_LG), extra large(_XL)© 4 S$tlT jb‘ b , f9@©1M Xl:i6tTkffl

SPEC/HPG -eii7'D7'7 AejiiaA^EfSnTtlb, ZEUS-MP(computational fluid dynamics code), PUPI(path integral monte carlo particle code), UHBD(linearized and non-linear Poisson-Blotzmann equation), UHGROMOS(parallel molecular dynamics), CCM3(atmospheric general circulation model) k V' o fz 7" D 7 7 A ): ilft5 o X V' 5 o $ fz SAS(shared address space)i£7ij©/l— V a SPEChpc96 ttil^-C-6 k*6, iE&ztP'b j£?U^©77-7 U V t©k#M7o77 A©#e#^<Rf ^ck): Z b, 3 7 0l$#EffFfll:fct6ffl k»5W^t*$-E.o c. SPEC CPU2000[10] SPEC CPU2000 tt, SPEC/OSG(The Standard Performance Evaluation Corporateion / Open Systems Group)): (7Db7f)i!)'Of7-i' kCa£-t*fc-&„ SPEC CPU95 ^6 0±»$g^tlil)lT©k*bT-$>5o

• 7-7 7-0^? 7): ut. 3 > £3.-7 ;< 7 u£„ • #^>^7-7 7"d 7-7 A©ia@-9-d ,X6*S < Ufco Ig©77'J'r-Aa >7‘nT'7Att*g $-^i*|t$*slg*UTSTl'4©): SPEC CPU95 -m^n&KBi!!U & < & < Ufco • '<>f7-7 7D^7A08SS)@^lfc„ CINT95 tt 8 ©7”0 7*7 A-C,CFP95 ti 10 © 7 D 77 A T" ^ft£ fit Vfc =

—95©7 ‘n7'7 Ali SPEC CFP95 C^StVt tcfc t> © k Is) 577 9 7--> a 7 ©6©*ife?>A s, 3— Kl:$S6sSStvc*5 b £-ofz< @ UtitoT-li^ev'o SPEC CPU2000 li, 7D-fe 7"9\ 9, 3 WW 7 & * k •5 7:to):ft5>tlT©-5o ZCOfzbb, I/O 7^7)9-7, 77 7-f 7 77©tttg©l¥Iit: ktt-eS&Oo SPEC CPU2000 ©-x>^v-77D77 A tiuSBjRSttSEtofffiB ): Bb 3 CINT2000

- 87 - cFP2oooTS^^n^o CINT2000 It 12 077'J 7 —'>3 >7D^7 ATffllJ&Sn, CFP2000 |£ 14 077U 7 — ^/3 >7D7^ 1.2.8-3, # 1.2.8-4) 0 v —7 7° D77 A^a74b$%Ct)/cT)7t^ ^C5C SPEC CPU 2000©#^[^&6o

c ft e> © 7 7° v ^-—'>3 >7d^7 a&&t©^&#j* utaiR^nfco

•^“F>>x7i:0S Ci IT7D^7A0^“^ U U 7"7 - i/o • V 7 —^>7*^ 7* 7 7 >r 'y 77^@&#^^^o • 256MB 0 RAM ©ifBit-t'r, 7 7 y h° > 7"* UTfiKtf £o • spec 5%% ^§X. t£ V^o

CINT2000 © 12 © 7°D 7*7 A © 7 *>, 11 © 7D 7*7 A C ft, ftfc©U£:o © 7° D 7* 7 A >b5 C++T'IBM£ftTl>>£o CFP2000 CD 14 ©7°D <77 A©o ^>, 6 o©7 D 7*7 Fortran77 4 O©70 7^7 A# Fortan 90 "ClB^ft, 4o©7° n^7A^C T'lB^^ ftT ^ £ o & ft ^<7 ft V — 7 7° D 7* 7 A C: ft fc D , CCD 26 70^7 A0H 17 070^7 A(i, SPEC Cft^%eftft/:77V7--7377'D7'7A##4r77^-7(#A $5,000)*^— 7 >h°:i-7'>7ftA©'f£fb^ Ltli, '>7fAM(^->77 7> F 7 4* A)^7/l/ —V©^o^^7, 6ft6/bA SPEC CPU 2000 t^C077fAiIi:7 IV — 7" y V © z: C) © # # # ^ $: fr U», ft©MM&ftft"fft none-rate(speed)##, rate (throughput) jy ti h LX^kM~t %>o none-rate 6 SPECint2000 k SPECfp- 2000(4:, ft^T©^7ftT-7©^e^^©< 6V'#<5uTf rate MS© CPECint_rate2000 t SPECfp_rate2000 &, & < CD

^ft, SPEC CPU2000 CD7;V-7'7h#j^(±, y>^/V7Dtvth, ##^VlVfty n-fe^+t, 7 777 77ft A©7;v—7°'y V ft 7 VUStf^ftT v^As, :hl ftiftftV:-7;Vft7'’Dft y-ftftftft AT'©7/V — 7°'y VftT ^ fzo luizfi© t& 0 , C0^v;i/f7D-t77^'7jlf7 7^r>7B CTEffl Ufc^©7;V — 7°'y V 0, A&#^iJ^"7-77D77A©3u-^#m(-^c(±7Dt^-7am^)mmu, w c tizx^xfioo — >;W 7 A-7©7°d 7*7 L^WA©^^(ft none-rate k V, AfMffCft D W^tl7clS^7:$)^C a^%gBf 6 C SPEC CPU2000ft'TOfb£ft£o SPEC CPU2000 T(ftV-73-P&37/W;Vft^(:^^cT^©ft7^3>;W7 7-^3 >(#m

Table 1.2.8-3 SPEC CINT2000 Benchmarks

Code name   Description                     Language
gzip        Data compression utility        C
vpr         FPGA circuit placement and routing  C
gcc         C compiler                      C
mcf         Minimum cost-flow network       C
crafty      Chess program                   C
parser      Natural language processing     C
eon         Ray tracing                     C++
perlbmk     Perl programming language       C
gap         Computational group theory      C
vortex      Object-oriented database        C
bzip2       Data compression utility        C
twolf       Place and route simulator       C

Table 1.2.8-4 SPEC CFP2000 Benchmarks

Code name   Description                                                                 Language
wupwise     Quantum chromodynamics                                                      Fortran 77
swim        Shallow water modeling                                                      Fortran 77
mgrid       Multi-grid solver in 3D potential field                                     Fortran 77
applu       Parabolic/elliptic partial differential equations                           Fortran 77
mesa        3D graphics library                                                         C
galgel      Fluid dynamics: analysis of oscillatory instability                         Fortran 90
art         Neural network simulation: adaptive resonance theory                        C
equake      Finite element simulation: earthquake modeling                              C
facerec     Computer vision: recognizes faces                                           Fortran 90
ammp        Computational chemistry                                                     C
lucas       Number theory: primality testing                                            Fortran 90
fma3d       Finite-element crash simulation                                             Fortran 90
sixtrack    Particle accelerator model                                                  Fortran 77
apsi        Solves problems regarding temperature, wind, distribution of pollutants     Fortran 77

Itl^o M — base flST\ fflfelZ M fz o Tt£^ LT £ 1^3 WW 7 7° '> 3 > ti: 4 k L, t^tO^>fV“^7D^7Ai:^ltNbt7'>3

L&mm-e&ao oti^o #§Z.(± no-base(peak, aggressive compilation)^® Tn

ca^yoy^ 3:9 kfu^m

f CDs? U 77l/>/(V'>> (Sun Microsystems UltralO '>3> 300MHz SPARC 256MB ^ t 100 hbitIA© bbk SPEC CPU 2000 8 oa&& :SPECint2000, SPECint_base2000, SPECint_rate2000, SPECint_rate_base2000, SPECfp2000,

- 89 - SPECfp_base2000, SPECfp_rate2000, SPECfp_rate_base2000 o SPEC CPU tit, 5-Wf-i'a'A Ai-V;Hl/3 > Un. —7 fclAofc — tfa —7ttEfffl6"C'l£ < bBVMsiLTV'^^yf-V — b, tfC*bb6t Jpa«AsLJ?>t-v>6 ©C6 6 if d. NAS Parallel Benchmarks(NPB)[ll-12] NPB ti, NASA Ames Research Center © NAS (Numerical Aerospace Simulation) 7" n 7*7 h\z j; b v-3 369ijx —yt —3 > uu. —7©tttbfffflffl©z<>7"7 —7T- feflo NPB tt, 21 MB© aerospace vehicle TH'St8ttt;b¥(CFD: Computational Fluid Dynamics) i A#%at##©# fgfWl&fi 1? £ tZBftt ltl'5, NPB1.0 fct 1991 ;6ti>, if'-i'-WISs T)13') xu-pyay? >7-') NPB ©^>^7-7T"n77Ati; CFD rD^bAtfflltli 7-7Rm&#« Wct©k»7-C#b, CFD 7"n7'7A-C)lH'?>ftf> 5 oeU-Hii© 5 o©j£?ij*-*;uk cfd 3- t"-E> 3 70I177'J ir-->3 1.2.8-5)o NPBi.o C Z %###]:&-CkL y\- F 7tT--<>7'A^#^y\- F7u:7icmgi%(c#)g (b^fi=ofe3 -fbn-cv $ 7 £©, 75 (2)fln^e>nfcT-D7-7 5 >^s$a ssk 3 (3)#-7 E:iJr-ffflib53- FlrffltttblPffiA5 T*t&t', *f©FtS@* 5BMftSfeo -E-CTa 1996 IFiC, MPI t. Fortran77 Sfflt'T T'DT‘7 5 >7'£ftfc V —73 — F # NPB2.1 k UT#6S$tlfeo NPB2.1 li NPBI.O © 8 -3©-< >v-7 © d *. FT, MG, LU, SP, BT ©5-o£a/v"t*Ufco NPB2.2-Ctt EP t IS As)iSP$n, 1997 #%m© NPB2.3 8 -DO)±XD 77 h S tlX t> 3 o * fc3£?!ltt£§Jlt D® »7D77itbfc NPB2.3-serial ^iESP^n&o C flli#-7D -fe 7 7 -> X 7 A ©ft tg if ffl t ah ' & i s *\ * n \b 7—)i -?> life n it 3 > > w 7 © s is m t % n t \ z, c t & x # 3 o

Table 1.2.8-5 NAS Parallel Benchmarks

Code   Type                 Description
EP     kernel               Embarrassingly parallel
MG     kernel               3D Multigrid
CG     kernel               Conjugate gradient
FT     kernel               3-D FFT partial differential equation
IS     kernel               Integer sort
LU     pseudo-application   LU solver
SP     pseudo-application   Pentadiagonal solver
BT     pseudo-application   Block tridiagonal solver

-90- 7-*-7-fe 7 F C(4^©-y-'l'XOjtXCj; Class A. Class B, Class C, Class W#fl} BSXTV-Ex, NPB iiUtn(D7 7')'r-'>3>7D^7i>l:SftM!)> * <577 >J 7-7 3 7© eauemmacMf e. PARKBENCH[13] PARKBENCH(EARallel Kernels and BENCHmarksltt. a#%*#9Wa^777 -7©gg%6itok ft 1992 ^CISfiKSft/i PARKBENCH committee 1:4 -ffto&ft, 1993 4fG:*S$n*x7* 7-7tfc3<, iy®l4fl-$7 ^E V 7 7 7©tiSE itok ft, -~<7 77-7 7n77 A14 Fortran77 k PVAftaBjeE^ft-tU*:,, $ 4©n-7 3 >7? (4 MPI &miA6t0&Rm$ftT!A6.

Table 1.2.8-6 PARKBENCH

Code       Type                   Description
TICK1      low-level              Timer resolution
TICK2      low-level              Timer value
RINF1      low-level              Basic Arithmetic Operations (R-infinity/N-half)
POLY1      low-level              Memory bottleneck (in-cache)
POLY2      low-level              Memory bottleneck (out-cache)
COMMS1     low-level              Communication (ping-pong)
COMMS2     low-level              Communication (message exchange)
COMMS3     low-level              Total saturation bandwidth
POLY3      low-level              Communication bottleneck
SYNCH1     low-level              Barrier synchronization rate
LU         kernel                 Dense LU factorization with partial pivoting
MATMUL     kernel                 Dense matrix multiply
QR         kernel                 QR Decomposition
TRANS      kernel                 Matrix Transpose
TRD        kernel                 Matrix tridiagonalization
FT(NPB)    kernel                 FFT
MG(NPB)    kernel                 Multigrid
PSTSWM     compact applications   Parallel Spectral Transform Shallow Water Model
LU(NPB)    compact applications   LU solver
SP(NPB)    compact applications   Pentadiagonal solver
BT(NPB)    compact applications   Block tridiagonal solver

PARKBENCH 14, 77 r A©a*W*'ftfg&fFffl-r 3 fc«>© 10© low-level A7*7 -7707? A, 7o©*-4lK>^7-77Dy7A(NPBffl FT/MG 6S&), 4 c ©3 777 FT 71) 7-7a XNPB-CFD © LU/ST/BT 6St?)* S-WlSS ftTV 3 » *45, PARKBENCH 1:14 HPF 3 774 7 77*7-7*^4151 ftIa -5„ ktU4 HPF © forall ■‘V independent * k k 7 7 k©%f? 4> ■£) © t 10©* — 431/7 7*7 —7 4ft 17 &o PARKBENCH 14 7 7 * A ©ffifgff ffit: 3 8S©3 7 *7-7 7n 7"7 A &FBB f M a»0l:@#*m&m=45 3 k Ltu3kc5l:#m#&^,.

(3) Future Directions and Open Issues in Performance Evaluation of Automatic Parallelizing Compilers

gm#9Mb3 y;W 7©@«g#m#&k L?l±x 4-ItiCftit-i^ gm#9UYb3 y /W 4BKStiten— H&SII+@S±T$ff

u ^■©iiffiiefiassist'^kvos?$»s# ua ^^ ctisttt, Jt^ffl^y^v—y7dalt cpu *toffiSbifflofetoffl-^yf- V-?7n^7A4Sffl ly-tV'-fcfc©, f©$A^#&©^yyy —7T"Dy7A©^&

*mt :uac b, ^©ttflgffttfsetfeAcortt^f u*nf?jiiij©affl4>^ftiS5E©i>Bi As+fl-Ttt&A^ Ac b bAcilia-SS-Vo g##M-(b3 y/W 7R*©g|±©6©(clif © ffffife»A5«s;$tv^;tA5*a-c-fcb. gm#?ufb3y/w7©e##me#A&Ko

fc^yyy — ^©!I3Sn f Ach A»y yyy —y ynyy A©a^, SlffSliJmiSS. fffffi m#©^&, «*n5» c©gS%t:SAcoTliE5fe©^3>;H 7##©#%&+A-t:#*l:Afie^aA^^ T-$,%d =

• n y/w 7©»jg-fb • #MfbK*a»$ < ©&-5@&*a 6#g%$ fttV'5. C©6©ny/W 7±#@#©#«©mtc, «S'JK*©Ac©©tt6bff

yyc«#fu* ua;*^, 3u$$tiAc 3 - K 0 R tc y X 6 S5§ tt iw m tc «J b »(t ?> ft Ad: ^ if £■ i> $> -5 <> • S(£UAc3 - P©(yt7*-vyx)^-j7 K Vy-f Asill>($7As2$ UV'„ • 33yyW yyw /7imi@A Jpflgij©3s@ul$7A yvw s 2$7t$E©y-yIA'„ e V y ^ As,Wvl$7 Asa$ uu0

3fe?(|-fb3 y;(A 7©^< yyv — y 7n y 7&>r—)i

y t"v7"—y 3 y. 3y;iy pyyby —ys ytc £c k* sa$ tv1, • SnaakUT©3yyW7©ttsgffffl> f 6kt3 y/W 7A#k UT©ttBtt;tt7;pyy —;uy7'b y —y a yyn7'7 AAsas t^-cfe57o —S, fl gij0 3y7U;ue«©ttBbffFfflJp»^^*ca>Ac7-c0#ffiy-y t UT©ttfgff*t v rii, ^©(Hsyft«©I¥fflt;«^s®7Ac*-^;p^3 yy y pyyj y-y 3 y tc j;6 cnii7;pyy-;vyy u y-y 3 yyoyyAtcza tttbfffflytts f ©K#&aa LTV'-&R#A)BBA^'C (c&6© < bt U * 7 DlSgtt* sfe 6 Ap?> T*fc 6 o @y©3 y/w 7S®6A;S < MtJi, 7y y (ctt#y6f$*fctt# UAiyfiWtc fl-®sh,6o vyytctt#uAcv>@#>©s*©ttggff

-92- £tlX, V'> «t $$#7&:3— K(D#a UT^#fbUT C^l^C 5oo O ^yo>/W7 y^AfOf"? £*filh L^60CPM6f —f x ^>y v — y <£>||fl|gk:&fc-o't&—^ 0#N ^>y7-y^)o cmi^yoy^ At:(d:aT0 j:Oo

' SPEC CPU

^it U U — F“f £ & © £&oTt'& < 't&ft £&1'0 - ^(Dyoy^A^. L/Tx ^>f7-^yn^7A®^i4 • ]E=M##gg#) 6:H&

- ^yoy^A#v —yny-yt^h.—M^#(zX#i#(yDy7A(Z#ff#^ p^yyK yyoy^±#aj:o^yDy7A^m^^(:(±^<®@m^?^^^^o yuy^ Aa uymi^}:(±mT(D #c,j;o%R0mA^A#^L^ ^yyyy-s/a Uft&fc b &i^o

- ^#^'T^t)fiyi^yDy7A(±/<>^'7-yyDy7AaL/ymto-c$)^^, -

- ^;M#yy v y —s/3 >yoy^ A0#A(±M#f^0y >yy>yny

- f ^h0#f##T^ew#&ct o ^:yu y^ A&^t-y >yf < ##;% • $^yoy^ < (D#t##%r#ffW#bky ^ ^yyv —^ yn y^ AGDaBx&fc^ffl£*iTi' $>5oo #t:yot—y (D^1f#MC6#^J;W j;oy^^ ^ 0####(:##0 -6 coa ^ 6 y &a/:#x ^>;w;i/^t:yotyy-ry&%^qaf

-93- mtmmvmMtLTiz, ^©iitfo^toufTii ^m^x^^cDtzi-f^u^i, m &&:/n-fe y+h£Erc:Slfi:t"£ c h £ £ b £t£tb#;W/fbf'£ ^ bv^o fc, 3 - pcDX/r-7 u Vf-J 0##-^, ^>c kt: j;

#a^6T&50o

&tz-D tit tlfz^ — \s®Wb^'o fctJScDffe^ vxyo&.o v < o

. .%-ifkCM' fb y —;k & £' L

C^i^cDACM LT t)+^B &lo|U"Tl^(J^U^^ G&tG

[1] Lyle Kipp, Perfect Benchmarks Documentation Suite 1, CSRD University of Illinois at Urbana-Champaign, 1993.
[2] Lyle Kipp, Perfect Benchmarks Benchmarking and Optimization Guidelines Suite 1, CSRD University of Illinois at Urbana-Champaign, 1993.
[3] M. Berry, et al., The PERFECT Club Benchmarks: Effective Performance Evaluation of Supercomputers, Int'l Journal of Supercomputer Applications, Vol. 3, No. 3, pp. 5-40, Fall 1989.
[4] George Cybenko, Lyle Kipp, Lynn Pointer and David Kuck, Supercomputer Performance Evaluation and the Perfect Benchmarks™, Proc. of ICS, Amsterdam, Netherlands, pp. 254-266, March 1990.
[5] William Blume and Rudolf Eigenmann, Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks™ Programs, IEEE Trans. of Parallel and Distributed Systems, Vol. 3, No. 6, pp. 643-656, Nov. 1992.
[6] Rudolf Eigenmann, Jay Hoeflinger and David Padua, On the Automatic Parallelization of the Perfect Benchmarks, IEEE Trans. of Parallel and Distributed Systems, Vol. 9, No. 1, pp. 5-23, Jan. 1997.
[7] Rudolf Eigenmann and Siamak Hassanzadeh, Benchmarking with Real Industrial Applications: The SPEC High-Performance Group, IEEE Computational Science and Engineering, Vol. 3, No. 1, pp. 18-23, Spring 1996.
[8] Rudolf Eigenmann, Greg Gaertner, Faisal Saied and Mark Straka, Performance Evaluation with Industrial Applications, Purdue Univ. School of ECE, High-Performance Lab, ECE-HPCLab-98211, Oct. 1998.
[9] SPEC High Performance Steering Committee, SPEC Run and Report Rules for SPEChpc Suite, http://www.spec.org/hpg/runrules.html, 1996.
[10] SPEC, readme1st.txt, runrules.txt, http://www.spec.org/osg/cpu2000/docs, 1999.
[11] David Bailey, et al., The NAS Parallel Benchmarks, International Journal of Supercomputer Applications, Vol. 5, No. 3, pp. 63-73, Fall 1991.
[12] David Bailey, et al., The NAS Parallel Benchmarks 2.0, NASA Technical Report NAS-95-020, NASA Ames Research Center, Moffett Field, CA, Dec. 1995.
[13] Roger Hockney and Michael Berry, Public International Benchmarks, PARKBENCH Committee Report-1, http://www.netlib.org/parkbench/, 1994.
[14] F. H. McMahon, The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory, Livermore, Calif., Dec. 1986.
[15] J. J. Dongarra, The LINPACK Benchmark: An Explanation, Supercomputing, Spring 1988, pp. 10-14.
[16] J. J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment, TR MCSRD 23, Argonne National Laboratory, March 1988.
[17] D. Bailey and J. Barton, The NAS Kernel Benchmarks Program, Technical Report 86711, NASA Ames Technical Memorandum, 1985.

1.2.9 Trends in Single-Chip Multiprocessors: Stanford Hydra Chip Multiprocessor

"jWWLtlt LT, 1 #0^7 7# t >

V xty b CD loTfe^i Stanford Hydra Chip Multiprocessor Project 1C C W ld]A#(Z) Kunle Olukotun L^ b 4 7 "J 'y 3 > £ M bt> ^60

Lecture theme: The Stanford Hydra Chip Multiprocessor -Architecture, Implementation, Software-
Lecturer: Kunle Olukotun, Assistant Professor, Electrical Engineering and Computer Science, Stanford University
Dates: November 22, 1999: "Hydra architecture", "Hydra Implementation"
       November 24, 1999 (Wed): "Hydra Software"
Venue: 55s #2 mm 3

(1) "Hydra architecture"

1) Technology <-> Architecture £ t\ Hydra f'y7*7jl/f7nt 'y'V'lM%§&W}tZ>o

^CD##kL f UT, 3>tf 3. — &T — * :r?3-*'oX\<'%o Z.flb\t±X, A —T©^|iJ b (is — frLX^ZfemX&TO , —^(D^MfoT-'yXCDMfm&lZ 18 tl H ft Z> t d t

v^04b. mmftmm:ctc,T##m^#t#^mwmcft&&6%o mmmift

uyjoT> Al:&6o L^c at:, W: i ## CD h 7 >^^^^##W#l:ft6o t,0-^CD^#ay^#a@aLT, #

#CDm^l:j;D, h7>^%fcDmj^(d:d:Dp#<^cT^^^^CD @(Dfz5ofro 9clzm^tz X o IZ, # %CD$>^,##f$;ffftoCD(:+^ftm0h7>^%^^#^W#aft^o L&Lft#&, m

£ £> ft < ft £ o Tft t) cross-chip-wire ^V>oyciSV' 174> T (Dtztib\Z. ?7 U y ^7 JEtlfcSt &$)(if £ Z £: £s titi ifc ft <( ft D > cross-chip-wire & L ftM# C14: ^ /W X ^ X Wb cF tl fcvd' y£m^fttfftti:ft eftu0 z.til*7-*7 :-#?*fc.b'oXi*, iy-f ^ cornea 60 U J T4tlZ& D N |iV\*> btlZtf' cross-chip-wire £fflix 1jftg'&Zfciblz& s ^ y yfgjCDMIMU J r>'>lz&< W n v ^7 ^S^ftitfU^ft £>ft Vo

2) Exploiting Program Parallelism fflMlt, h ^ y¥7># lx, ra*'yvaftffi&WcDfztofc

-96- ^7A0Mttl:iitS^ 7D^7AClt 1EEcD3£?|J'fbqJfe& LTU

#6#w]//<;i/k LT, S*7"n y^u^;v #7'J%A LT^O 67lT £fz, & o ^'3#VMy UT, y-r ^ i/-i> s cN±;h-7°i//;W 7 iz £ D g b: l^iMb£ n7c 7" d 7"7 Abid; Df#7iia0 7°n-h7 ^ii^o dft &£iJ7<7)70P-h7 ^ iz o

m%07-^7777v7>'r(j:, mmi£i%^&)i-7pu'ommmzM* lti^0 ti^(b&%-hK C7i^ 6 LT(i &^ b A^7 c k/co ^ 6, cross-chip-wire ti7MlW 31 £ ft> CtlU: ^ < (D U J r > '> £ t tZfrbtzo f^^i:U7n77A0iT0M§IJIU r^f£tb£ c

3) Hydra Approach Hydra 7° D 7 x 7 Y IZX& S U T 1A a tMteA 7 7 7")l7* 7 7° V)]/7“ 7° D -ir 7 it CD|£HfT'$>&o C©7-^T77t0S*^^7^r7[t WE<7> T^-fe 7 th£ ^7>6DT- 7 7°±(:##f ac :©7-^T^7ttit X 1/ 7 F i/^A^d'bfecD^iJffl^ w#a^D. #u77^ h^^L/:j;7^±T®7^;i/(:#ayp

7: ^ & R9M blt^ #^(7 7 7 M T°D-h7 7&M^T77 77 V 7D^777'J7“ 7 s 7±T#l^#&#a^c^)^(±, 7 7 7;h7n-h7T7 V 7 —7 a >&#^!Hbf 7&t>^ 3£7Mb£;h,fc7PD77 A£7h£U ##07°D-h 77X^6^%^^

^^Jfr®II^&x££Jf6U #^Jib7P77 A0f^h%<&7P77 7 bbt^WH7bbt^ cn^Xcb D WSiC L/co Hydra ##tf 6±7r7c^ 07 7 7±^m^a

^^0 3;oc uxf ao

-97- 4) Outline Hydray-yyyy^#me&#wyao CCDT-yy 1: LT9^ b^ &o %#y y y b® HC'O^Tgmf 6o Hydra y both# -b^#(f^^i^o f UT^i^HC^^yyvy

%yyyb(D##^^j^^j;bx i^o^oyoyyA, #i:#^^cDyDyyA-c^yt Hydra (DyDb^^yO^gf-

immjBm Q:yn-teX tX\y y b®i#W±#&? A :i&}ll^U:, fn^#-77'J^-'>3>^^ff:^)©K Z o X* & i^Ofr b ID o Z

Q=#^yy vy-y 3 >0^e(±wm^? A:^n-fewt^co y^T^yu-k y-y_h-t\ iej-&3 v>&s&£y yu y —ya >07 u

5) The Base Hydra Design CCT Hydra 0^ —yy+MytCO^TxG^ao 1 yyy^&E'O# yn-fe'r7^}^:tfe$t^^o #yn-k y ^ y y ^ b?—# A-y y y^JA t), L2 ^ y y y ^ lt^^)o f LT, ynt'j/^lt 64 tf y b «li<7)y A b y lb“ ;iy a 256 tf y b*m# V — b V yy —y/iy LT, L2 ^py y y LT l^6o yyyA/^yyi:f6#JAkLT, c j&iwi>b*@^%^uyw^o ccD;^y(±, yo tyth#^yy#^'T#^T^NZ, # com## lothd'

6) Cache Hierarchy Details C0ESI+CD, ^icA-y y ^ailCliit^o ^yn-fe y +b (ififr ^3 A- y y y zl b y —*2 ^ t y 7 1 §: y — y A- -V y 71S 2-way ty byvy:ny ^ ycoy^ byib-^^T&^o yA by;i/-c^Ly(±, L2 ^py y y n. (DWL$R(D&AzWL^'1r & o yyi©7^ X (i i§- 8Kbyte X\ A#t 16Kbyte T'fel)o L2 jr^yy^i::##y^yy^^, ^Atc/wyyd'Xb^^TjoD, yyy;i/^-b 0 4-way-ky bTVy^y^ X0yA b;lyX^^T$)6o L2 A-yyyj-©l5[Siy 128Kbyte tf*t> 512Kbyte $ T'StXbcF 'tir'T IffBi&fffe -o /c0 A- y \y y zl y A > 17A X'{i L2 A- y y y -3. jbs 32byte N Ll y'>i|l 16byte Ar$> & o L2

-98- 77^.7^7^79 f^C^^/zcLo (-, ^—7^7#7^ h;i7 k V — P U 7 7 —7;\7mz:#^i:^)^ flTL^o 7q» P 7^-0##!^ 7^ M#^#mLZe^oX:o #m

7) Hydra vs. Superscalar CCTr, Hydra P d' 7^. —7—/^ —7*7(Dlt#&f7 7k-3^|g^l:ol^7^/<^o nW&i±#fy@#®yDt^ {j'fzo C(7)|§^L LT > Hydra (i 4 X 2-way 4* 7 :r — x 7 —y^ —7^7 (4: 6-way 4* 7 a “®^)0^ffliLfeo cn^lL 4o4ocb^ld] L@^T$) D. W#2##&#@®77V 7"-7 a >^Htf LTtt«ELfc0 7D^7A C'a0^j^lq]± $^^77}:L/tito C<£>7'7 7 dD^E^CD compress # SPEC95 ^7f-7 —7 Pp(D7 7° V 7* —v a >T'fe£0 o077'J^ “ya>it S^^LXl/7 Fl/^;i/t'0t*^: S oTU&Uo —pmake OLTP (d\ 1/7 #am##&ao OTLP (d:777yi/77V7-7371!&!l, pmake (l(ai^(C#m(D7 3 7#^frf avyi/f-yo t7#77 V 7 —737^$,^,o c®g|^677V7—73>0#^j^^t>}76o 7—y^—7^7'77>(±/:i^O' 1.4 1.5 fgid: Hydra yi/46^J#^yt —746^iJ#(±7 7 7 F z £: 7b5 7! t £ © T\ 7-y^ — 7#7'7772: Hydra 7 —jr^7^7CD%#l±#(±T#6^57o CCT^f77U 7-73>(D(^aA^W:, 0#J^7Lyp-C^e2h&k#7T&&T)T&^o ##j^7 7 7 P07 7 V 7 — 7 3 fa|B5C###7 7 7 P^#< 7 — ^77^707]^ 6&T&6 0 #%J^7 7 7 P©7 7° U 7* —7 3 7 7'fe £ eqntotL mSSksim ^6 3 C bt) st>fr& o :©0©^©77lJ7-y3>t:oi7f; o compress (±± < 46^'J ib T' c^ & l "> o eqntott^ m88ksinu apsi ^ 7D 77 VCD^H'^fc^MblL %> >&SE:fts<£> £ o S V^ #7fUZ, 3>;W7^c7L^77V7-73>&ak^j& SUIF 3 >/W 7T4£Wbnm£o OLTP (4:, btbtm m& 7-7 3 7^#^ijfbf ^,3 >yW

8) Memory System Performance C CT. yi7 0^#lH:oWT#%%f ^)o y^7&#Mf 6^0 7^7 V 7 h a L7, yi7 77t7#377V7 f/^^^LT L^VX h;i/^^7(C^^ aO'O C

-99- Tomcatv XttAX©tiWW 10%£T@£o 7 X b M'X CD ddW WXft

[SOiBg] Q:C ft Lxo©7 — ^7777 ©X D 7 Xt^X X;)/l±|s) L^? A:f o^o

Q:7 >X7b7 v 7°7,n/7Xn dr 'y +HI&, ct D iE;i&X n 7 X £{£^/L^©xy:? A:f©7ft0^o #m^XD7XW#^^^7Ck^X^6o ^[m(±f7#bb#©B^X& £ © X Idl l^ X D 7 X lz — b X |+$lj L AL Hydra &7>TXl/&7^-:)r7X77fo©X\ 7 0#^C##W#X&^o

Q:X—;^X^7 77>©jp7 7 7^.'7d' XW:? A:S*^JtI, 1 7° D dr 7 +h © 4 fgCD A-7 7 '> jlIM XX:o###(± ASPLOS96 ^The case for a Single Chip Multiprocessorj LT^J L^o

Q:Hydra © 7°U dr 7 it 7 7fdU 2-way X—ft —X 7 7&©£>? A:Hydra ©#^#(d:, #$(:##^77>X(±#Lt^m7©X, 7>X;i/^7n.-77

4, o, sa ^U&^C5T#&&CD#4;&^o #^if^24bL

9) Problem: Paralleled Software jfe^ij V 7 b X^7$: Z 6/:&/)©#)# #?UV7 b X^7^#ft^C k&#^LX^ftm^7X V X-7 3 >&#^'Jfb76©(d:@#X&6 c 6:^^^Xft6o ##!:$ < ©f^#©*lz, 7 7°U X*-7 3 >£d£?iHb7S c £&&&%&& Lft&fto $M(:4b SUIF ©d:7 T:i^J^^7>;W7X, 0f7?U FORTRAN 77VX-73>©g#m^iHb#m^^^4b Lft^fto mecDsuiF&mw%:e&7j;7%\ v-x&gm#^ifb76

77d-7(dx C 7°n X"7 AX>&T L& 7 £ < ft < t>OXti:&fto

7 7° V X* —'> 3 >(DML&]ik(Dfzt>b\Z\&' X lz 7 b LX1EE©X lz 7 P##7 LX

C, C 7 7° V X" —7 3 >X&> 7 7 >±X^zftft^X 7 0#^©{ii@&##f L&U"ft(3& 6%fto C7XUX-73 7&^^Jfb7^/:Al:(±,

#LXfta©&&#t# L&{tftte! & L&fto :ti^d >7 0flfiKb©f^Eh V>7o f LX, >7^mLxv^#mf^MT,(t^c wmn-p^xL 7 p^i:(± £t?CLXXlz7p©Xb7&JEL

- 100- ? kT-sg@k&-2>©lik *7 >7iflfi|-fb;t>sS8lsgftti:i6L©kvdk kT$>3 = k©#% ©emiiS < 'nttt>tix\^ti\ C77'J'r->3 >-e±^ ^TlcttSot^

k@t>n-5o L* LUBMTtiiFUi Bfa© C 7D 75 AftigiRTS 3 $7ftSoTti:V>&V'/W;vne)Sff'C\5l69iJttffl1ftitit:'®s¥roc*€.^-5Sfe»VNa6litolr*$,^„99% ©;i/-77 7 L-7a o, @b © i%As^d-C7kvkv>a;v-y7b sfe-5 k T-So f LTa6M@A^%t^@frA^< $ftTt\Rft©ftx f ©;i/-7li#MT»t'ka# LTft < A\ ;t-7&|BDt£-U-&(tfttf & >7%#'lb&#;H7tf, < 7 7 7 7 7 7&J$tfr6ft; * o J; b ft Lfew

10) Solution: Data Speculation *1:, Itltta^WotSfeckb, f ftft jL6R#©gk#ft-3t\T%E^<6o k © j:3»IBBftWftR#*@Rk LT, A'-XtSitlftJ; b, 7D?7A©T-5lft#SICt1'l:77lJ 7-7 a>%mmbt 5Cki!tS -5 7n 77 AS3ft3-iJ-fb LTUff L£ k LT S, HfiOSftftfcft R n- h* kx iri'Ttffl/Dyjioiifbta k k&x atzfrt,-? $>%o ctill 7n77A6ifi?mfi:LTi'-E>|iST'&x JELt'iK*A^#7\-M7j:7ft Z bffitiE$ft£ kt'b k kSjttt'T-So % 1/ '7 MBftf-7##A!#$ LT &x IE LV'51 fT$S$As)#e>ft£o kftCJ; bx 77 b -y-y a >©@#b&ak#T6 oI#@©& & kk 5 tkfci)-i9ai4xnniij:<- d $ Ox iEmg©t©T% <##©6©^itfti@im&# JtftftbfJft'o kiltMbx 7=-7%##»M-ft«x 7n77 A£JEL < ?£ ?>-B--5 fe toft Hiss Aft e>&v>0 <©J;d7^-9-^t- b* stil5Rfttf;i/-7ilfiM-fbttE#7afe-50 36?iHbft ^$;i/-76SS ft-5 7clftC =t < x 11/-77 7 l/-->a >IBT@kRBl#b#©&6T-7##&mftft6 A Bliftt'o 7\-FbirisiitSfrbtfeS. LA^Ux 7n 75 A©36?iHb^(tASA-

##©#x.m±x ES©;E#:7n75 a&xl7 ecesiLx t ft 5,©X V 7 h'toJESfissHftTVRBlbx 7-7##&@cTf©77b 7-7 3 >6 3£?!lSlff AR k k* sT-$£ k^o t>®7$>£,, -o$bkft6©7V7 #x 6 k0 7D77 A©7ft«Tt'Ro ftStlTli, &#x -eft?>*!7c©7D77A©®l?T*5S7-r-5 kk$®EUx #7l±fft&7i/7 Hft*fLTft&5kkftL;fco o*bx m*k'ftb ft 5S7T-5 k k $«BLoox V 7 h*436?UftS!ff ft-So •e LT k©x —7®K©=#xli7 ^ 73 >7 >© Multiscalar 7n 7 = 7 h ft A* LT V^o tffl7Db 1? bTlix 7*-7jS#6ftinLfcSt>©TSft^-BSftAc7'>>®l6

- 101 - Mt Hydra T'idU ^ — M'/\ y H (D'b £ V^^SUff <£>+*-# — h 'fcttlM't 3 o :®7-Jrf^ h "CU\ v 77ji/f7n-l! f C^^'J l/^;i/TCDiIS& nJIE ^t*£ S ©£§!$£: t* 3 o

11) Data Speculation Requirements I f'—& Hydra & £ &lv\ — K *> ^ T £01^

;i/—LTt^ LZoo f i/ —s> 3 o, -f ^ 1/ —3 > i (ZWZd' ^ l/ —i/ 3 > i+1 ^ UT d'^l/ —^3>i^^(j-^) d'^ l/-i>3 > i+1 4\ c^t6(Dco(Z);i/ —yd'^l/ —i>3>^, c i ^d'^i/-i>3> i+i c D#^o ^'3(d:d' ^l/ —^3>i(:#3(t^X(D# d»f 1/ —i>3> i+i X (D^^aiU(DmH:^^ck^&6o co#^ j:o, yot 7t7-Fbf;K -o^ D i§^CDE:>f ^ ~d' $) h o

12) Data Speculation Requirements II

1/ —'> 3 yj^mKDd' ^b-^3>(:i)itl) X CDS^iM J; D L$ ^LT, Mb7x:T(±,

-hf 7l/-yd'^l/-i>3>3 >^(d: ^

26<&##&6o ^^(DtK^&^tacaasT^&W&AWC, T&60 %66t:C(Dd'^l/ —i>3>0^fT^^%L/:af^o CKDd'^ 1/ —y3>i+l(Z)^^^Ti>>(D^^^#^LT33 -5C C0#^C(±, d'^l/ —^s^i^CDXCDg^^^jlD^C, d'^l/ —^s^i+lTfD

AM:### 3 a 4

13) Data Speculation Requirements III

b 'y i+i & ^ z. t& i~f til cttn ytzi^o v /b^ u & ^ B x 'it? y —y b y i+i & x # m §■ iA -f ^ U-S>3 > i+i ffl £> & V/bC Ol/-y3>iif)^litli^^j:i^o ^ 6x 'i $

^#%AM C#^A"C(3:, #^^(^^^>±1: X(d:co##&^oTi^ &o —S/3>i#4b#, —S/3>i+l#4b#T$)^o eft ILP rn-fe y V"VO v V7,9 U ^ y?\z tx & «t < MT&0 x ^ t V V

f UT^^5/Cx C#7Dtv"^^W^l/-y3>i+2#^^^%A66^C^, -S/3> i+1 X f#X:A6#&oZ:7'Dt i/ —i>3 ^ &IE#UTl^U^Ud!^

Q:HydraTW:x A:Hydra^W:f:-^##mM&#mf^o###^(±, 4b^

14) Hydra Speculation Support a##&. Hydra b %fi\ IMTCj^^o x CPU C^AD^ftx ¥k$k 7s y "J b £ rfrJ MtZ>o LI ■*-*? "J yn-lzlit, $t?(D U y b Z>tA%> O. 3 h — V y

Z) o

:T — ytkftM)yftt££titzZ.t%:M%ktZ>o uT#k&D-c& &o Ll (±####x L2 ap^^y^pfgcr)^- t)#kf ^>o U v ^ ^ L2 ap^.y^^t:3o

- 103- V*-5 >7tiX ^«07D-tytiiLl Ztlb a sib| tr k uxcm urasc^—j7&#oc ®S*fT®ilS$SEL< SE1"5fc0©l¥«B6:jSei*x EE^lflEto-b 7->a XT-Siifl t§o yx — F ->zTtcjmXx V 7 h e- zr tijsasiff 6"9"4< — h f-Sc : zT©@S:$( b #vf-7„ CjxHx % B© V 7 h 7 ^y©-fe 7 -> 3 >7!BJBJ!-f-5o

[*S$tKS] Q:7 4" h /t 7 7 y 0-9" -f Xtivx < O*? A: IK 7:0 7 4 b rt 7 7 r ©IM X fcffitgCMtfcCotxT «X 4=*mi#f 6o

hf e/x- h* 7 = T k'® <£ 7 (.: & 3 A\ &««)$$

Q:7"n-fc yth©SI b STIi'^'f 7 V 7 7 Cff&tift-E. ©*? A:itM(Dm&Z-fr&oZtftT-SZo Multiscalar ©

l*i7i7i)i7Dt 7+7^0|!l=i4l31i UTt'-SOA1? AiSSIfjfln^Dtym X V 7 ©XD-b 7+7till b 7-f hXll/-yx -X611UTX V 7 K© ID *sjp b t b $n^o

15) Speculative Reads u©%m&m0f Htfo 0-7© CPU ICftLT, -f 7 l-ya> i-2 5, i+1 ##!l b ST 6 tot VS „ C Ct, •1' 7 V —7 a > i *Hff t"-E> XD-fe 7-9" TMej k"©RACtt'-C SSSIfrS L&V'T'n-b 7tl-»s#6"t3o C©###fz&R%h ^txxot y+i&^ v h* cpu a#*. ^7 h cpu ttssuffsff6t)»t'©r-. wig Si|XD-b 7-9 -k UT$#7o -7$ b, fsaitlfcr-7lt LI ^r7 7 7i^0i**7: LS5l<*st 7 h ufc®-grtt, ffli ir-? SS» ttifo LI *7 7->^.^x©E^7: V5#As5Xb7 h LL2 ^K#X 1/ 7 K©?^ h;i7 7 y©%m&#$f 60 i84toC, -f 7V-7a> i-2 t i- 1 Siffltl'?i7D-fe77li, E Lt'y-7 £«gjAA,7Hx-5o ftfrT-SXlx 7 f©7 i" iA777ffltttiiAi' ha, ®if©y —74^-7 1 7-r>#7Dt7 9" C ® $ ti •£> o

[H@IKS] Q:V 7 h I)i7/U X7'f->XIC7iViT%E-5o ft ftXD-fe 79-¥

- 104- u^owm^^&a®!:, m%®^77-737a s#B®^ 77—737^77 b 7 ^7/W T"^ - 7 7"f a®&? A:7 7 b 7^T/W 77^z:>7(j:3>;W C #&i'o 7D7^A C&UtUdC f'—7##i@M®##^C t"CD^SIffffl/x— H7ai Tttifrff U&1^0

16) Speculative Writes ztutg ^izmvkm^o Ho® CPU 7\ 4 7 V — 7 3 7 i-2 £> i+1 #^U$^7Tl^a &(Dbt% o Ll jp^77^a7^ b^77y®^^Cf:-7^#^3&^jiao ^cff fa CPU (3 7^ b;i71:^-7^mfo #^c® CPU W\ -<77-737il:^^%(d: 0lif^©t\ f'—7###M& jo C U^ U, #® CPU £S§- ?A^ tlfcf'—7 £rl%t? ^ Read-After-Write C ® 7 7 7 b eu^ufUd;^6^^o ^efa cpu ca^-c^ao a cpu®#^3&^(±, &/2%m#i-?&-3-e(±&6&Uo ##^® cpu 3^® Li ^^77 ^^igELTjo ^tai^ &#x.ao &a^Ai? cpu rMej ^^77-737 i+4 &;mf a,, ^®^ATrd'7 7 -73 7 i+i ®f^T®7-f ^w##jc^a®/2ao^o f x:®^r^ vi/^®&a7^ 7^®&a 9 - b(±, ^^l/“'>3> i+l cn& pre- invalidation k Of ^o pre-invalidation cFflfc9 —b (d7 L2 ^r t ^ ^ i3 3 b cF jl f fia^Pt) Ujl^^o

17) Speculation Runtime System 777 b®##^ff®$iJ#(d:77b7^T^ff^9o 77b7^T w\ f^T®^#7i/7b®j##&;a^u. ^®777b^^cc&a^^i^^##& ^#f a«

Q:i#im:###e2fia7i/';/ yut vifm-i^? A:f®kj60^o ^oaX:<^/u®71/7M®^fT^#x.aca'6-e^a^, fflkL f 0 # < ®;\— b 7 iTaQ 1 7°DU 7tb±T®##&®^#7 7 7 b'®HU§: tbf-bfa;\-b7^7(±M(:^^7L^tu 4'®hcafji(±^W#^o M^0%#7 u 7 b®^fr^#;^7 c®f 7^# A(D,##&®%#7 1/ 7 b®^ff b f a#3'J^;\- b 7^7®#3Ml:N U%(±, j6U'Tj^

77 b7^T(±, 777 b®###e^7-7##mM^^®%#®w^^$«j#f ao

- 105 - Ll A"V7i7^PF©7d'>©##fb 7-f b^y770^iOii^fr^^ o 3 > t° :l - 7 7 7^ A©E£St(d7 F ^x/t'IitliC ^V7b 7 x7T'Ilt 3 d to F 1/— F^7 & 6^17 7 7 F 7 x7T'©#tJI(d:^iJ 7—/\7 F&^l^^d L, — F>>x70#iJDtt$)^)^Stf0t“^“^^ F& 77F7^TT©#@1:(±, ^0©###^^#©##$^:^ 3 Xz 7 V 7 F^fe^o rHydra softwarej T\ 7 7 F 7 ^ 7©#J####^I# (Z:#j#fb ^

7 7 F7^77©^#^Fr©$lJ#©###ld:\ ASPLOS'98 rData Speculation Support for a Chip Multiprocessor j T\ #^Ufb© @#^##7 7° V 7 — 7 3 7 © 7 7 F 7 ^ 7 (C cfc £ t?fc tl§ [rO-t li „ ICS'99 r Improving the Performance of Speculative Parallel Applications on the Hydra CMPj & o

18) Creating Speculative Threads £ £ £7T\ 7 7 7 F©F£$£flfT Ff^7N — F7xTh7>77 A7 7 F 7 ^ LXzo #7:, 7>yH7i:oi7ilt^o #7W:, 77 7 F##^e7-A-7 7 7^^y^- FT^3>;W ei:ML7e^7 7r$)5 7#mfbl:7)tM:#^/zo 77 V 7-7 3 7©##7 7 7 Ft7 ;v^fttTovmdcStt

Hydra 777A^(d:, 73©^#^777F^mi^o ##8^;)/— for ^ while —3©##7 7 7 F F: —3»©7 7 7 — 7 s^&miv^T&o d^u::, ^^iem#©

Hydra 7-dr7777'Tfj&7k:(d\ #m#©3-F^y-7##mM©#m$:^^7 WI^T'o ^^^^^#©3—F&777 FkL7^#^fT^fj^3o Hydra 7© 3 >A^ 7 © f± $ (d: ^ for & pfor> while £ pwhile h l'' z> /z <£ 7 &n /7 — 7^^#8g;F —7©7 —^"77 F V 7 — 73— F&^J5Kf £ Cl tX$) % o X LX ^ source to source ^j^7°D 77 A §:^ff L> pfor -7 pwhile #^ffm©7>^d'Ai777'A©ie^mw:^Mf^o ^-y^v##©#^ 7^77^©D—77/)F^^c()^yD—dtt(±, #7©777 F#§#§#©7f 7 7&^hKf F 7 J:ya7l/ —7"jpif V##©$)^)^#(Hc:^L7y— 6o

Q:#^iJ;h-y©#A(d:, ###m©##&#ihT;&&&? A:7! %#7 7 7 F©#^am$©7 7 7 F©^^(±|q|^lC:fT^7^o

A:f CD^#o D /:o 3 1/ 7 FA^)7 A ^(d:#M: LTV^ f 6C

Qifft'TkL ccD%

F;^777^^#^(D7N±^l^? A:#^x 7^F;^777(D±#^(±X 7^F/^777^#ft^^W:x 7l/7FCD^e^#±U, fCDCPU^^yFyoty ■9-H^^^T'^Oo ^o&(±/\y F7"D t 7 tH:: & & CD T\ 7 7 F U 7 Zlt&Z £>&l^0 ccDB#x r/\y H7D-t^^li^-5I^Dj '>Xx An—;i/^^ff'f^o

19) Base Speculative Thread Performance ^#7 1/7 F^%7"A±T0^##mcDM^^|gl:^'fo yy U/r-i>3 >CD-gPl!(± ^< A#:&V —f 0# gcc 2.7.2 T# 7^3:^ F^^^:L/=o gcc CD# )@fb^yi>H>l± 02 'T&&o f UTX 4 0CD^>y;F^^^-yDt^tF&#F0Hydra 7 —^7^77. #Z^##^fr&3>FD—A/T&7>f^Ai>;%^A&, S/^a.1/ — # &#m bfffffl^fr^^feo *MW±±X(DV 7 F y oir4o «fcU0\— F y o:y^#0Ci/ l/- F LT^6o 4^0CD, compress CD «k 5&7 7 0 ^ — '>3 ^ 1.5 fgOjiJ^roJi:^^ 6^-CV^o mpeg2 (D£olZs &-d t&tfeifi&T §& o&yyvy-s/accDdio^yyvy-z/3 \Z~Dl/^X W:# Odi^^5 o 7 7 7 # 0 7 7° V 7— S/ 3 > cholesky x earx simplex^ sparsel.3 ^6W:, ca%6(Z)y7'vy-i>3>u\ %#yi/7Fty;F cfi6cDf^#(j:x -ACD#±m@cD#^^ 2igfmi!

7;i/-7 7 V 7 F fb(i^i!j(i~^ £} % source-to-source 3>;H 7^l)t:^ TDS tsCD <£ 7 fc source-to-source 3 >/W SbfbT' & x if ^ h X' & alibi! 0 #y^##l:^^(j"^7yvy-i>3>(j:ijpegT&^^ x iBlLXfF#^^^ m^fbyDy^ < >

CDt)x ^^AgicccDyoy^ ^ >yty;H±m##7!&&&6/:o

20) Optimizing Parallel Performance 1/ 7 FtT0l/&^ L^z^x 7 7" V 7 —z/ 3 1/ 7 F77 V

##8g#^jfbyDy7Ai:^^Tem^±f

- 107- •O^T#x£©*5jMT&3o £©T —

7"A0D—^7 -5A# C0D-* V 7^-f Mf^SM/bL T7°u 7*-'>3 >(7M£tEd£S;Hf&5±t: SPLASH 77U^T“y3>[:ilU^^il C©IiftL D-^j^ax#

ft5 A--7 'y'>n-^ 7 £$5^1" 3 tz&lZftf>lX %tco Ztl^(DT7V7 — '>3>(D^r\ MemSpy (iv'^7.1/ — 'y b >^#T7r ## Ln $ tz Flashpoint (i Stanford (D Flash V ;V^;TD-b y th-eifrfEt~3o 1/ v F&#mf a fLfxtill #^J7 1/ '7 FREt:###&i^^o{%mE&f L^ m bti'tei^o Zotzfo, #?#fb C ^AW\ ?—7tk&'M)z.

21) Feedback and Code Transformations i> < :L FC7 JITICW:, &&&<, Cti^0^^-XAtlL x — Lfcn — Y^7 h7^©7n

6 &7rl:, 7 U y Y'kbOU^Wii btzfr)b^^tz%(D'&^t\t Z> o c ft £> * e> > btiti%^%t ZSkfeXfo Ds cft60##^6%#m^6ck(d:$^o

z>tK mmonxoLM^um^oi&r^m^m^ m^o cft(d:&^@(D7yU^ —^3>"r, C6D^i&(±, yd'^l/ —7'yyf6 ckolc^^^LvF&^^f^k^oTd'f'TC^^^Ti^o /=^x.UL ;h—yd'^L ■—‘ is 3 >(D9tt.11-n— Fi^i^fe ^ C0 D— F^^* s;v —7°^ 7V — i/s >(DMf%(D7 F T#^C## lti^l^^;i/“7^ ^71/ — '> 3 >0Hff (d:^-—y^' —7 y 7°7: CCD^O^#^^, D-F3:T(f. 7FT^mH:^(f6cai:j:D, ed'^L-i/

—^ LW

22) Optimized Speculative Performance mpeg2 V##

■o^rm v* o 7"o 7 7©##Cj;0 ;s —^ 7 7°W#^#^r Lx J£tllZ£-DT^lMt£^7 ^ — V >XO|p]±^fTl^o mSSksim &m CfS6 ”£^7 *“ V >7#[r1± Itl^o compress ~£# |i|J8I £ ^ £ £ t IZ £ o X %'}><& A 7 ;*

Q:#%l:M/:*7 7 ac©*7 7'£##^m7 d:7^^, UTV^©^?

A:Xf\3l©Mf;o C©7"y 7-£#7 -f — P^'7 7 j^#gi&'t*fT ofco tu©^^7T'(ir ;Vd'U VT$>So mpeg2 £j3lDTld\ SuS-dTh- F £H,T £ C TtiJtl *>& o - £ifi&*z>o &mt v ^c^©7;i/^ VXAg#:©U7 k XXf-V U >*#, 4^-i7©^^a LT#j%TL^o y;i/3 vXAf ©6©© hX7^7 v >x#e^7-ci^ i^o ^^iHbf^ca©#AaLT, y;k3VXA©V7k7f^v>^# cccinti^o

Q:compress £ j3# g fj^Mb© t £BT U % ft? A:compress £fcl'o

23) Hydra Prototype Hydra y ^ f A © 7 n k 7 7° $: f± _k (j1' /r 0 7 D T 7° 7 7 & |g ^ f" o Integrated Device Technology(IDT)#A\ iB- 7 © 7° D -te 7 Ik © Verilog £: j/E#£ D T < ft o ##t#& #IH&&-32:©'T, T'D-k7tN:#i^2fta7tV77^At:'O^Tm id q ££~£m ^^©7D-t'^©^;e1J3>l'D-7l:itf1U;o c©7tV7>kD -7#, ffe©7°D-fe yT*©/^ U 7 > k n-77\ U — Kj3«fctf7>f

#.*©XDt 7ik#, 7 7:1 j3Z ^7^—7^* 7 7JL^j^Oo 7^0. — 7* — 7 3 > # £ © L2 'yyi§ILT^$tl^)o 7 -f ky( 7 7 y #g © £ jo 0 ^EE^ft^o c©«k o C, f - y 7° • # 7 • y;v^7 0n-fe 7-7^7- y 7°±^^©^ ft£0

24) Chip Design Road Map Verilog imi%®#S#kk77T7k#^^ #*t:3&Tf&o 7-XT7k#^^#^^?^DZV^o

25) Conclusions Hydra # 7 7 701/ -7 7 7° 7* 11/ 7 7° D 7 y 1k©§f D £ i/i^ 7 £ o wide-issue x-^-X77©^^^m76^^7^vi/a/^j^^^|#m770 4btrU6, #*#71/7

f c-c, c# CMP y3>c^Lt, &&,

26) Hydra Team Monica Lam ItWMOX V v L%#^ff(D77 T7^-^#lI#T/co Single-Chip (D% ^ (D^Ws^i^. #&%i/vhcDyD^^A-r >^—7^—7, hC;:&y&iMlf8L/=o Mike Chen (i Hydra _h© Java LX ID % 0 Ben Hubbert (iT 7 7°0 D '7 v 7 ^Hl^CDESIt^S^ Ui^o Machiek Kozyrczak It IDT CD+fT- h£0

For more details on Hydra, see the web page at http://www-hydra.stanford.edu/

iwmmm Q: 7° D h 7 7 7° II Rambus 7 >^-7x-^|iHU;0i(p? A:l^o

Q: Then what is the main memory bandwidth? A: 64 bits x 100 MHz.

Q:7 t 0 01/7T>i>(d:^0^j^^? A.-ffilill&AcIS Rambua 0) & -c> & (7) £s$> 6 D^o

Q:£'E&7^ U (DJ^y IttLtzOW A:#^LX:o LfrLs 7 ^ U 7 7 T 7 0^ l> 7 7° 0 7 — '> 3 > TT¥I0 L L T — Mi

Q:f VT4 A:Object /So 1 17 7 7 T f*] T fr & 17 tl ti! £> l't o ^“Fl)x7JpV7 f^x70

A:(Hydra'r(d:)m##^jfb'r#^jfbhrm^;k-y&

Q;C0d:7^%#^e0^7z:XA(d:, 7 7m$:i#^LT L^7 0^

T"D7770#B#t'T#&7V7Y7;i/&g|572:&:50(dA jp^77^T7 77T&D. yot

7 7 0 7 D 7 7 W##((C^#&7x. & V C ^ 0 (wide-issue)CD 7 D dz y^ttt^^t, (Hydra 0)7“ A —^7 \H&ft'B\Zr\\g < 7 7 7 V > 7 7 A £ ftt> D LI Jpy y '>iL©^iJ(7)®(i^M^iPx.^o H^OgRTidi^yccL d &7$47£

X., marginal 7dz 7 '>xl 7 7 'sik%M't %> OH5&0 7 7 Md^^o L^ L, T 777 1/ 7 7 > > (d: marginal ^ 7 7 > xl N LI ^P 7 7>xL0T77777 A 1C # x. 6

tlTl^o L2 ^P-P 7> ^.\ZWit^>T 777{dU LI dpy 7 > xl f 3 7 7 7 7 tt~~

n — yy-7t% o ^cODtztb, S6to&^“yS-/\ 7 b0t|,bntdA

b 7 ^7(d: ^04§m<&#&? A:L1 ^P7 7 tffi.%. LTL^o jSSHfrB^S §•&^&##C2cW:#W'T#ao

Q.o 7°D7 7 Did;? A:#^^1C#3 7°D7 7 7*7' id: & < N 7 U y LUffO/x— b 7xn7lz^j"7^7 >7 —7 77!&ao ^CD^cA, 37D77tb#A#:i:A^^#!l^(d:##l:/L^^o MIPS^(±, 7 7°D 7 7# 0 & MMU ^\07 >^-7x-X^ 3 7 D 7 7 # 1 /S#2fiT^:5o #7(d:, 3737772^71/7 b0^##fr;\“b7^7^07 >7-7 ai-7 ^ L%#m Lfeo

Q:C0^#t^$+(d:7-;i-7^7 ^fm| L7:&^ A:ILP 73777(^^^077 h^y^-y^e0^^C;^777^miX Hydra X!(d: 7 1/7 F07 7 b :t7;t“ 7HH 0fcA6i:7Y hD'77 7^fflLTL'50t\ f 0## T(d:mm^^^mm(±lEL^o L^L. ytVT'yyAC^f^y^b^y^-T'^ff 7!(±, ILP 737 77W53 J; > ys —^\7 bid;:£ L&Vo ILP 73 7 7 7x!(d>

V ;t —7VS 7 7 y #CD 1^4^77 7 < 7 7 77 vXL — u >7"$: 1 77 7 ;bcpT'ff &t> ^(tfUd:^ & &(Ao Hydra 'Tii, 1 77 7 ;i/7'fr & t> & it fafd!& ^ &l^MJi(d:& < ^ M S0yS7 7°77 Wb/b 5?iffl7££o m'#?m&9l5mL&^#fr(d\ ILP 707 77 7^x./(7 v V$“< y;i/773777y77A77^ %Z£\Z' b 7^7 7 7 > b7&#o ILP 73 7 7 77& qjfb7&3 0 #00)73 77 A7 7 >7#, ^±)k^^7d > b7±&##[m^7“4P7 7 7-7&# x.6o C0m3M#&7mi7Y>b7'T(d\ #707377A77>7^73777^ #L, e707d>b7(±^ei:^^^J^imposef^cai:^^o ^0/:^, ^7:% 7d > b70#m^C(d:77 V V$-< >7#<&^a&&o 7;i/77i/

- Ill - y H ©y — ^ t~ 7 y d71! & > wide-issue T' fr & £) T l't 6 J; d) & 7 dr V U ^ > 7*

Q:7 7-7tf VydC^f6#R0o Hydra 6D7 1/ V b##^e(d:Ll b y;b-#^C##LT^6^\ ^yb7;b-#^(d:37b^±#Wo Hydra 7fU\ % % 4 yo-k f hm±0 7Dt'7y&#mf 6ma^3&6&o 8 yot vyw#^a#x.^,o 77-7 if •Jf d &Mt Jfteu J: l) a- F1) x-T-fr'j&W: bt£%>0 A7 Fyxy&fl-jfjn f6^, yi/vKaL##ff(Dyd'yy^y^i/^hVjpy^y u^u, isyoty f^6^lS^f/:(d"CD%#7 1/v M±f#6fl&^/=50o

Q:4 7U dr y Aiyoy^Ac^^o ##j^®yyvy-i/3XDm^UT, a. Mfc.&mft&ft'ObCD b. t'(DU^^lX b&&\itWML^ lbli 8 fr £> 16 yndr vy^+^/io

Q:f yoy^ A$r^(D A:7yj^-'>3>i:ltu> ##j;o&M;5&'T#&o a. b. &£A;£'(DT7V'r — '>3>{Z&\,^X, bfc &(D&&WIZ& 77b^777^m^y#

\zx V y b &ftW\t %o ^l/'^ b^O 637 v Mcoy^x&^t^o yi/ 7 6^, 7l/vb(D#a^m#(D^-;i-^^M^m7, 7 1/vb^^^j@^6k. 77 b/i'7 77##fi6o :lx^©:^3>;t0^y D7’77^fllt^ 11*7 1/ u&L, $^®yyvy —y3>T##u/:^^, ;i/—yi//<

Q:y;i/f-f"yy(Dy7yA&^6^(±^of66D^? 7tVy77A(±^fo^6(D ^? A:^^^AC(i#7TV^^o f&T, fd 1/7 b U^d77y^^m^T7 1/7 b^#^e &y#“ b b D :b(tSuy$h&/tgx&V^ —DCDfi&X (±&6^6oo yvy^07y-yifvyd^^(Z), #-yvyi:^f ®t^yb s$)6CDT\ 7 Xibfc o l^T&^fx. fcl^o

(2) "Hydra Implementation"

1) HE3E (Outline) Hydra XC##&XDt7it^^#L^ At:o^T(CMP o Hydra CD 7 ^ U 7— tt 7 t ^ lZ~DC^X j&^SLt o ^ W:, Hydra vWtf- F t % PC;a:T(D#A^6, 7>^-rAXX^A^ ^mtC^fGLX^a Hydra OtU b 7° lZ & Mtl % D 7: to

2) #Ak CMP t£ '>(Why a CMP? Communication Latency) 7D t L j; Oo t%3K#'7;i/^7Xi/X^ AT\ ^7f- 7X(DX t V &^LXXDt 50th^^;i/LL±, 100iFd'Xyi/LL±^^a(D^^mTfo cA ^^LX, imL<^7X±(:&a 2^4- 7V'yn--&jb LTMfi^lf til, 10 +M 7 frm'&’CiuTt Z> Z. t#X 111 o X&t^ v FA^X 'Jt^ iz)b Xid:& < & a © X\ Xu XX v(ifi£3fc® «L -5 l:r-X IB <&o, d;oc^K)o SPEC 'Ot'7 — 7 CD ear

3) #i% CMP &#& - /i> FH(Why a CMP? Bandwidth) CMP Xti:X nx-^ 7 XCD$to;b5&u fc#x A> F ti©^X0 £ Xo read t write (DAT. ££0 X IZ X > X 0 7 > F f a d t % X ^ a U ;^X*@&J& (fa C ^ &##XXo £fc> /W XX f Wb^cL-DT^^lbSrlH^ C tb^Z.t>tl£to m/:, ;ixXDF3;i/0@x^#f^^^^&^#fac^^qi#gxfo AX®X f F o F &##(:X/fl£ Xo

4) #i% CMP - 3 L — 1/ > X #J#(Why a CMP? Coherence) F#A^^acaCZcXx ##^3t:-l/>XXD F3;i/&m VXS'tM£titit"£ tft7:%Z>tDlzt£ D £to Z(DZt&, iSff£ X > tjllz t a 0X\

Tz, xo F D o #i%% W.M'&tu b 7)b%U mc^#fr. mmc&^XDt 7-x#m#c#:&oxxf >0 3-x-x7X&mf#f a# #^^-x7X^±c#%Lx^ax—x^coxx-tx^^^-facox, f 7 F'IXcC^A^tXo f!lx.(f ##^Xf Fx;i/"7n F^ji/^ffli^

5) The Base Hydra Design

Hydra has the organization shown on this slide. It is a single chip that integrates four processors. Each processor has its own L1 instruction cache and L1 data cache, and the L2 cache is shared. The four processors are connected by the write-through bus [...] to the main memory interface and the I/O interface.

6) 7 t U 7 > h D — y (Memory Controllers) 7 t V T777^$U#f #^0#]lU/c3 >hD-y^7T"- h 7^>0BT #& o ^ 7 □ 7 y 7-hi t;: & £7 ^E V n > b n — 7 ti;, 2 % jp 7 y 7 7. ^ 0 fig 0 y — 6 readM, writeMf h^f!07y- h7y>a LT##L3:fo 7A>7tVA>77 > hO —f" y y^gB^0A >7 7 a: — 7^UT, 2 #^-byy:L^7°Dt'yy^0y-f#$&&fMf&7^-h7iy> t LXWJi^L^to

7) 77— h 7 y > (State Machine Design) #7y—h7y>0#ff#C07yA T77 7#^(^-BFIFO ##VV-7l:T77 7 ^ 6 =] U\ CRA(Central Resource Arbiter)IZ$c$I LXMi¥^: ^ t~ o ^-0#, 7tV77-b7^^Tf^)^'r, y7770#^H:^D7^^67y-h&^ib |5]-7M/7^077'b7^^^L^#^CW:, ^^^^077"-h#cy4'7;)/m

8) U V-70iH (The Central Resource Arbiter) CRA Co^Tg%%L#;fo d:7t:^07y- h7S/>(^^ SM)t)##y V -71:7777f^^l:(A CRA SyA7;i/T. 7 7 77^^^04:7 ^ L7V^ SM # CRA C V 7 J:7 h & f o l#|^C#m0 V 7^7h^m6^^#A, ^0&#^m-i:v7]:7h 0#%|JC (ta &0'%% read 0^# write X D & {§.% £ tl £ t o u&cpu#4#0o%^-f&a&-c#%j^#i&3;yaifo fAi^0#^jgw: CRA (±/J\2& ROM ^-y/U/^Mr-DT^T, f0#&m^yy7^7 H:^f^m:#01/74i>7^^hK^fl^fo CRA (4:^#l:^0y7A>7ty 04>^7x-x;^'y 7 r%Mty '^X % z hi)#)*) £to

9) 7 F 1/ 7 0##(The Central Address Arbiter) 2^jp7yi>^<7^0/:y777W:, 7A>7tV/\07777#TuTf&&7\ ^ 07Fl/7&^^U%^^^C'aC'(j"^#Ayo CAU±, [a!DyM/7(:Mf^#^0^7 mi^0T777(4:A-^-A >7^

- 114 - ^17^*777 7 70^7 £^7 tlzte D t) ^T'il U o 2 #4" 7 v S/^_ < 6 ts f 07 b 1/7&, f < 7^^7^)7* ^l:^otl^it©7n^7 P P7 bMmtt$iL£to it#0^^, 7PU7#5- Hv b^7 7 0^l57^Hv b^tv b^^l,^7o tv b^^l^Hv <70#g^^7f^J^(:7 V7^ft, f^70Hv bA^7 V7^fi/:^^(:(d: D&bT7 7 V 77 7 7^^ff ^fl6 0Tf o

10) CMP T jE £ *tr cl £ 7 h (i ? (Can a CMP do even more?) CMP T(j:7Dtvti^0m#^#^cex^0'r, yD^v^^-cmm^i/v P&#jm 0 j:9 C^^coWimCM^XimD'rf o L& < T, 7n 77 y(DMtt 0&ia:3 — P£fbJ$bfcl*tU£ft D £tA>0 7 7T:\ 'J7^-fe^®^^-l £ # X. T g* ;£ Li o o Hydra T'lilP^T 0 write (i write 7l7^MDTf&07Dt v it £7n — P777 b £ft£0T\ g-VD 7 vlll^XrA±t^4lfc^t© write (7)® 7 ^#7#g=7o ^7c, S 7 — PT/E7 read 707 *5ft;E#to£;:SlfTl~^§7D7"7 A7 7*7 > b £

70j:o^%#aU%77O7777&#jmL^m^{b(±, ;\-P7 :c7t:7'3Tg##C^;t&0#y:^&y V v bTfo

11) ir —(Data Speculation Requirements) #uxB0%#^^y t V 7 7 77&^#T& X:#)£:U\ 1770 5 c0##^^#(c^^ h #x £>ft£7o

(B read l/-i>3 >#fG^L&7 &## # s >#C^^L/X:77^ write 0##&^A£C&#'fb'T#a## (3) write &yi:0 7D^7Amb 07 Lwm^7^776## (E) L7t^^07l/v P£:^L7 t)]E L1^7 7 V 0 view

12) Hydra 0##tb/p — b (Hydra Speculation Support) Hydra b£:c^7g%%L^fo 1 #3-7 Vi/^-077'^ b#ffl0^7b v b^^(t ^.ntl^to 2 #77 v S/^.07C^#8y^ write 7 — 7 7#x^) /:6607; V 7 y(i77 L2 /W 7 T)ft & D £7o ^ 6 £:, ^/:&b037Dt v+1^7D 7 yVtM tltlA £ 7o 7ft£>0 V V —77 lot, ###&77 U 7 7 77� 7 d write 7“ 7& write A7&E6 LT L2 /i'j/77i:S§)i^n, CTE^tiH^tiSTo 7*0fi^S #&m%&m:77&a%#mU#&"3Z:7kkL 1 #jr7 v^o.77'^0 Read Hv b& 7^7 77^7 h7^utinIfb7 7o $ 7c n 1 # 3 7 v 7 :x 410 Dirty tf v b

- 115 - 7 C jsstotss&^ft Xz7-7(i L2 Ay7 7it|-CtEU^mmt%c,T, 2 ^4r7y i>^t#Si&mft^t-o # 7° D -tz y if t IE U V'' view :ttJ3-Z.%>fztf)(Dsl^ V V < > 7 (t , 1 7 iy pre ­ invalidation H v b 0#^ E: L2 a y 7 t fr £> o 7 t y —7 -f > 7^##^t J;oT ft o

13) L1 Cache Tag Details

This slide describes the L1 cache tags in more detail. There are four kinds of tag bits. The first is the Read-by-word bits, which record at word granularity that a read has occurred; they are used to detect that a speculative read has taken place. The second is the Written-by-word bits, which record at word granularity that a write has occurred. They were introduced to support memory renaming. The third is the Modified bit [...]. The last is the Pre-invalidation bit [...]. Special logic is needed to manage these bits. [...] there is logic that clears these bits all at once. In addition, there is a circuit that clears the Valid bit of a line whose Modified bit is set, and a circuit that clears the Valid bit of a line whose Pre-invalidation bit is set [...].
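The tag bits named on this slide can be pictured with a small model. The sketch below is only an illustration under stated assumptions: the class name `L1Line`, the line size of 8 words, and the exact commit and squash rules are not given in the text and are one plausible reading of how Read-by-word, Written-by-word, Modified, Pre-invalidation and Valid might interact.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 8  # assumed line size in words; not stated in the text


@dataclass
class L1Line:
    """Hypothetical model of one L1 line with the four speculation tag bits."""
    valid: bool = True
    modified: bool = False
    pre_invalidate: bool = False
    read_by_word: list = field(default_factory=lambda: [False] * WORDS_PER_LINE)
    written_by_word: list = field(default_factory=lambda: [False] * WORDS_PER_LINE)

    def speculative_read(self, word):
        # Only reads of words this thread has not written itself are exposed
        # to violations (assumed role of the Written-by-word bits).
        if not self.written_by_word[word]:
            self.read_by_word[word] = True

    def speculative_write(self, word):
        self.written_by_word[word] = True
        self.modified = True

    def violates(self, word):
        # A write from an earlier thread to a word we already read is a hazard.
        return self.valid and self.read_by_word[word]

    def _gang_clear(self):
        self.read_by_word = [False] * WORDS_PER_LINE
        self.written_by_word = [False] * WORDS_PER_LINE
        self.modified = False
        self.pre_invalidate = False

    def commit(self):
        # Assumed rule: pre-invalidated lines are dropped when the CPU moves on.
        if self.pre_invalidate:
            self.valid = False
        self._gang_clear()

    def squash(self):
        # Assumed rule: speculatively modified lines are dropped on a restart.
        if self.modified or self.pre_invalidate:
            self.valid = False
        self._gang_clear()
```

In this model a squash invalidates any line the discarded thread wrote, while a commit keeps it, which is the behaviour the Valid-bit clearing circuits appear to be there to provide.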

14) L2 dy7 7 0##(L2 Buffer Overview) l##m##&;U/y Pt, 1#0 L2 Ay7y#%&b&fo MtfftbL^bt, IfcXty P0*§^£ 2 ^^r-v y '> jl tMBfc'f atPH^A y 7 77 7 > PTrfi1 otz®>(Dx- ^7b7^ L2 Ay 7 L2 7 s? y 7 7 0rM 0 tz <£> D v y 7 Cft(i, ^#t^%b^Xt/yp^^e0^Dgb&t-6^tAy77& 7UTb/zD, ^byp0^T&^#t2^^r^yi>^^0#SJ&^^bf:l), 7f“bV'>>^ It7ly770^i^fibfc Ds Ay 7 7 £E

15) L2 A y 7 T 0##(L2 Buffer Details) L2 /Ay77to^TMt#L b U (t A- 7 y '> zl 7 i >#ji ttz^T t'T , ^CDtp-t'^l^© write y —7jig&^f b#ji0'7% 7^7t^T^fo 7 7X3 CAM tftoT AT, T b' b 7 t £ o T ttfot & ^ > b V ^ C to 7-70bA###(D#%mgm Head, Tail © 2 j@0# 7 > 7 T'^fiS ft £ to L2 Ay?y&b0#,&#b(±6b7##T, 2 ^jpT7b^^b0 read 7-7^/^y 77^ #7 —7'T@(3##Jt#S#X.;5 C f&t>t>, ±T0 L2 Ay 7 7 &tb—f bT, 2^jr^yyo.6D7-7j;l)t)#Lt^(D^$)ftW:, f(D^T4b##®7-7m%b

- 116 - 16) L2 %(L2 Data Buffer Sizing) L2 a 7 y tWJ Xlzmt U tt 0 z.®#? ylt^U'Ot v — L2 ;iv7y(DJ:> h ;i7 7 70#j^(d:7;i/TV7T7'

^ 1KB write L2 7 i 7^/^. ^,-e L j: 7o

17) #%#l#3 7° D -fc 7 ^(Speculative Coprocessor) jSHHff<7)rOT£:fT7 3 7°D-tr 7 tKCOUTl^B^ L^t'o 3 7° D dr 7 it f*] ^ < 3 1EUW7-

J l >77:7 U ^^2, ffe©7Dn-fe 7itfr£>®3 v> p o 37 > P&^^&T P 77^0 write t LX% ##0777 p^^t,±(t^, ^ftpp0777 p& f 6V^ < 3^(D#m^V7 P 7x:T;\7 p^ ^

18) Speculation Runtime System

The runtime system consists of several software handlers. They [...] the coprocessor, send messages to the other processors, and control the ordering of the speculative threads. [...] Because these functions are implemented in software [...]. For details of the runtime system, see the following two papers.

• “Data Speculation Support for a Chip Multiprocessor”, ASPLOS ’98
• “Improving the Performance of Speculatively Parallel Applications on the Hydra CMP”, ICS ’99

19) 7 7 7^ A77:rA0;£^ tib (Runtime System Summary) 777'fA77f'A0ot>^&60&#k:UT7t&fo %#A^^7 7 7 P k U7 procedure b loop  C k ^ MMlt loop (D (Dt~- /i/\7 P^/J^

- 117- End Procedure procedure U y P &y p #± 110 1M y ;v •Loop Hjtil Start Loop loop CD# iteration Iz V P t LX^ffT £ *P S £: HE X, t* & o 70 tfd' ^VP, loop 30thY ^1P End of each loop iteration M^£%'fs 41 CD iteration b, CD iteration (C% D #^ 6 o 80 +h^ ^ ;p, %#CD##& loop trPMS L^S'nid: 12 tfd' ^;p Finish Loop MUfrrpC) iteration *W:71 % t t & lZs loop ^ 7 $ ^ £ o #± 80 +h>f ^;p, ^#CD##& loop \ZMfeL 22 IM y ;p •Support ,>P —if* > Violation Local Hff FpCDyn -L lz — 3 EX 25 thd' ^ ;P, ##CDM#& loop tzW>7?& 7V7 #)]/ Violation: Receive from another CPU

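20) Anatomy of a Data Hazard

This slide walks through the sequence of events when a hazard occurs. Processors 0 through 3 are running threads in order, and processors 1, 2, and 3 are executing speculative threads. Suppose processor 1 has read address X early, and processor 0, which is running an earlier thread, then writes to X. The write from processor 0 appears on the write bus, processor 1 detects it, and the VIOLATION handler runs on processor 1. That handler sends KILL messages to processors 2 and 3, which are running more speculative threads. As a result, processors 1, 2, and 3 all restart their threads.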
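The VIOLATION and KILL sequence on this slide can be sketched as follows. This is a simplified model under assumptions: threads are numbered in speculation order, each thread's exposed reads are kept as a set, and the function name `handle_write` is hypothetical.

```python
def handle_write(writer, addr, exposed_reads):
    """Return the list of threads that must restart after `writer` stores to addr.

    exposed_reads[i] is the set of addresses that thread i has read
    speculatively; index order is speculation order (0 is the head).
    """
    for later in range(writer + 1, len(exposed_reads)):
        if addr in exposed_reads[later]:
            # `later` takes the VIOLATION; every thread after it is KILLed,
            # so all of them restart.
            return list(range(later, len(exposed_reads)))
    return []
```

With `exposed_reads = [set(), {"X"}, set(), set()]`, a write to `"X"` by thread 0 restarts threads 1, 2, and 3, matching the case described on the slide, while a write to an address nobody has read restarts nothing.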

- 118- 21) 7"D b 7 d* y(DMW(Prototype Overview) Hydra 7°n b 7 d* 7°©1^#£ LTl^to CPU 3 7^77 bJafetv 7^eu 7X7A©JiiizttmtZTyu—^-xm&xi^to S|37D-t 7ibC (d:%#891:7 t V 63 > b d — 7, SU D *-XA, ####^7-0 A7 A©7:i-->A^©7d- b/uy 7&g#a DiAAT'U^t-o 7^ U'>X7AK:id;, read /^Xt write A A, V V-XOTffl 3 > hu-7, 7" 7A '7 7° 2 ^A7 'r>2^ ^777 A 7 t'J®3> hn-7N A£tiA kA/^7 A©^^©^^^ >^T A/Uy AM##^Am±d' > 7 T a: —T(±m^8yc(d:^o D o -A 7- b 77 >©^#^7 A >7 • ## 3 7° UU7'Al*]7d'7 -^#7 t U©V77l/>At-7 • Central Resource Arbiter © dr 3 7 -A/^7 AM/^At-7 '^7 7" 7 A 7 t V 7 >7 7 —A •*A batSHh©^iJAtitiAd> >7 T x —A

22) Hydra Prototype Floorplan

The prototype is based on the RC32364 processor from IDT. The L1 caches are 8 KB each for instructions and data, and the L2 cache is 128 KB. The die area is 88 mm², and the process is 0.25 µm.

23) E^H-© BtH(Key Design Challenges) ^®(DmnY'T:lZ& D ^ t- o 1 & A- 7 7 7 2L © 7 y U '7 b fjjiJ fP © , Gang clear T3 ^ {$ 3 # Gang invalidation © [ggg£:$fl

24) $fctfti!$i k 7 A© ^(Statistics/Debug Mechanisms) D , fA'V©f:fetl7n bAd* Ag%% U ^ f o ^7f - b 7 7 > A > 7 £ tit l^T, idle, busy, arbitrating &

- 119- 25) 7°n b ^7 7° 0 A Hi ^(Prototype I/O) 7"D b f 7 7'®^bgP7 7^ 7 7 ^ LTW:, 7:^^^7f'7 77tV^®^777b &7 7^ 7 ^7 b7—77X—73 #^J I/O 7 7^7 J: —C®7 7f 7:c —7&MDT 77 7 7 X U t7D^7Mn- b OfctK 7°n 7"7 A0^fT$pH£flX t> ft Cfc bK #$

26) ^ 7 7Di£§l'(7) D-^Yy 7°(Chip Design Road Map) 7°D b 7 7 7°©^Htx^{i^©d:o l:/d:oti^t o 1999 ^* A7 7tf-x7 b© VerilogXx>

27) i£ £#> (Conclusions) CMP 7^^r^7t©7t #g#C7'Dy7A®#mb&e7Ck&gmL&L&o ^#77^77^ mtAC 7 7 a, jloT mxc, Hydra @77^977^ -737t:ciM:m%umL/co p^azy^oiM:, a'7f &Pp7\C##L^ Ltzo IHi: Hydra CD7°D b77 7b£ L/co 7° D b 7 7 7"# v7b^^T^®#m^m #)T V < i^Xfo D o

[mmm Q: 2 ^^7 7 7 n.|i7 7£Ol/# — b £ & o T l' £ t~ £A 7;i/f^- M:if7b7t'L J; 7 ^o A:fwc^uxm& o & fa\ oxw:^- b&mAoci:^#miq]±(± D #H$X k#x.-[(7g;To ^7 7 7 :x a 7 7 (DftM i> lilSb: & £ D &i±/vo

Q: 2 ^jr7 7 7^®77t73 7^77 3 7®##(±fjl^ L&&o V ^r —7 3 7

A:&k;Ua tomcatv&^T#:37^773 7#$

Q:Hydra^@[77 7 7 j;7^o

- 120- ;i/ —7l/^vi/0# j:o U z O^o A:f ^ toiJ^<7)7 1/ 7 f & &:U:;i v 7 L^l^#/vo X: ^7(j!#j^(D#1^7 1/ 7

b(J;W^i/-S/3 V vm®;i7 7 7#^t:##^& Df 9tt o Java (Jh“7°h77 7 7 ^1“ l>T"fo /W b n — b h Ti7>(D^l:&6^M'7i>7±'r##%%:7 l/ 7 b &f#& C c a^-r o

Q:iM&%Ajg(D7 1/ 7 K&tU D mf C a^3 >;W ^C-C^^CO'T L j:

aaewx ;i/-7^0m#mm^^BLi:, mto^ma&frA^ct^^f-cfo

Q:/W tV —'>a >IZ ££7 1/ 7 Y(DW^'i7&, 7 1/ 7 b %9ifr

7 1/7 S/— 1? ^ U J; Ly^o

Q:%#7 1/7 procedure ie%;mL^T#f^tC^#7 1/7 ^ L^L, ±76D7Dt7tb^##Pb'C^#^^^^X:#'o, ^ a#"?U j; 7o A:%#7 1/ 7 b}:(jf 6^171^ fo 70 4:7^

(Dmi)fj(j(ja:#^^^^7i/7b^a*#^f^(DT.

&7 1/7 b&^1Jpb®7D t 7 7'^^1J^Pb#rL7s $ffc&7 1/ 7 b£ V v —7 &%(J #f c k(:^ D ^ t"o

b^7 1 Ol:^^-3TV^(DT(±^l/^ U«t O^Po

Q:7 1/7 b0#^m{6^^CT#bbL7l^(DTf^o A:7 7^^ Ai>7TAT#mL7^mf o 7 Ai7 7^A(a+#C##^< 3-^J >y^^^V7 b 7^7tf o ##(± ASPLOS t:##L&#%&#B8LTT2^o

Q7l/-7U^1:, ifXthenAelseB CDj:9^#^&##7 1/7b®#m^'r%7C:^(J:

A:A ^ B ^^^07^-b7>

(3) "Hydra Software"

i) mm #8# Hydra 0V7 b b C? >/w 7&^#v7 b ^^y^#(ttu2%ctz:t,&itAyo *0(2, Hydra b 7 xiTdo^T&fS U^cl^to

2) Outline

*0®!S®4iJ$i2C (D 2 a (-& otl^ 2 o %#7 V v b<* — b t Z> tz&> ®mff#i>72A(Co^Z:&BL3;2o CtlC2^T, ##7 1/'7b^^%2^^^^,

to ^Cfiv 7 b >>x7l:o^t^iltto #%, Hydra "£(2 C &&XF Java Ha§

C #^(:o^T(2 V —7 to V —7©3£?!Hb b 7 >7 1/— ^ Hydracat ^^bt^tto CtW2m#®* —y&##8g#^ij1l/ —y(:^^*6 4b®l! to *012 Hydracat ©fSIU ^EiSfb^o UT jolS LA^fc L^fo &C: Java (£oUT to #SL^*o Java -£(2E^'7^>^^^m2^6, ^OECbO _k^#6 A\ 2&t>t>, E£lt7 v >®—#' —^<7 3 1/77^7773 — 7&$0#(z##[qU:^#6^ (co^%y#%%u^ L£~f o ^ tcE^vv > _k T ito 2 £ B myny^Ag#:^, 7 w b^ztmL®%#mec2oy^#(o]±Lm2o

3) Hydra y D ^7 ^ (Parallel Programming on Hydra) 702 3:2, Hydra±-e^(D2 7C^^jyDy7 < Hydra (ivji/tyn-t 7lt-£2^6, Wvjl/yyn-fe y it [nj (2 © y n 77 < >72^tk(2f © $ Siiffl Wlb^'t" o L/?P L, Hydra (C (2, y D 4z 7 7" (101##© L t 7 7 7#J\ ^ C* L ^

2 7lt^T|5]^^%^) 1c 6b©#*##2 $) 6 , LL(Load Lock) 2" SC(Store Conditional) |©D y 7Sfb£tb* — b btLtto L^L, Hydra ©#4b21c^##(2##7Ly b©1t* —bT2o Ctl(Z2 D, yo7'7A©m?me#^m:%D3;2o t), &L4&y — Sl(2 it/b^/b^tcl L%4b, 7n— b7^y 7 b b7x:Z'>yyA®^^^2 D, JELHS^/b^E£ft3:2o #%, b 7 F ^t^Et^ D t to -0(2, 7-7'>—7>iW mck, nfztmw:#^f^3-ba&#^^yi/vbkf^4b®7?*o ©ck^w#k%D^2o —o(2, 3>/W7^yo y^A^^^jfby^^c, ^7-0(2, A*c2atmfb 2^t>t,, ]EL<^i^^mti^^#^T4bfm-C#^Jm^Ly <^6C k#7r^3;2o

- 122 - 4) S' (Speculation Runtime System) T*t3\ A©S5tcA D ^©luicvx- F^x T©!^^ Ll ** y'>ilC«U 6^^)©##&©U v F WMlX\$tlXU£t o L2 v:2.©^fiulC write ffl©Ay 77^t)Jto C ft £ ©7x — F txT&fM'fa fcfe©37D-t CPU f o Cft6©/x — F^7xT&$iJ#f ^©(±V7 F C7xy;x> o A^ <^UT 3 # ^®;x>F7^&D^fo ^oia, %#^tV&#@f^;x>F7'%r, ^©Xl/vF6^3: Dm#^n!&66^mic%#U, 373 U F'7t'to 3 #U(± write /Ui:37> F&jTf&© 73 U y 7 ^©N#&#J#T6;x > F7tto ;x — F 0 x7#f ft£^til U V7F>)x70MiI^“f>^ |07Dt'y^CXrl!“'>^Iot c©j;3^m@©-gP^V7F^x7^e3C^lcj:D, S&^m&mWc&OSiPo

5) yiPF^Ri U##3 — F © #%# #c fr (Po st- subroutine -call Speculation) TW:, #l^lc^©j:3lcU7^#7UvF&^j^f6)(p&M%vx^^L/j:3o #^©X ^(d:77;P-^>^U:mU©#lCi^< 3- F$:##7 1/ 'y F^ UT, t&W^^'ijt Z> hVxo ^©tto C©#T(±, Prod, Proc2 ©z:o©-tF7;P —^ >6s $) D ^to # 861C ^ ^ >7D^7 Procl 6W^ftfc h ^1C, 7 1/ y F© fork ix^To ^U7, f©l$^mwc#i^'#'^« some code »©gP^^%#7l/vFkU7 Hfflc(j:f L^Xo ft&?#jU^Uft(d:% C©#rai, Procl d lc(± U #?#©X^a U7ld:#^^^©^^^ft7ix^V ^ — &%5 x#©x^^mmw#'rf#^©^@©BAyi:^uT(±, c© Z 6 tc U%, ^o©7 V y Fti^^ijlc^b o ^ UT, t) U^7-^##^.©#M6^# £>, ;x— F 7 x7UX F <)x7^> F'7 tCj;oTilu@©MT'j£^£: £ 6 &[U

^T> ^1C Proc2 ©dfl/m o #U^[B|#lC fork #j^C D, Proc2 ©df U7P Licit < 3— F6^$|7 1/ '7 F' tte D , Proc2 X^^M^iJlcHU^ tl^-To C CT, Proc2 &^^U7Vx^71/'y F6^ Proc2 6^6© Procl ©e^mUlC#!jmU/:#AlC, ^ &%'J©##7l/y F6^^K^ft Procl ^^mU©#i^3-F^^eU^fo CCT:^# UT(5U!x©U\ c©^#71/vF(d:, #©##7l/vF, f^t)^^^>7373Ai^ © Proc2 U©#i^3— F U7lx^%#7 l/vF©caTf^, f ft 3: D t) ^mjg^mixa^ock'rfo m^^e©m^ic^ic#e^^%td:f©3-F &^eu%vx^A^Tfo c©m^^t)^^j:3ic, m#xi/'7Fi±^fU4bm^^em

^i>7X A(±^^##1C^ l; o

6) +FCOP—A > IC^f 'f'-S+F^— F V7 F 7 x 7 (Support Software for Subroutines) V7 F 7xy;x> F^ia, Mips ©7t>7v#^&^c7#$ic^m^<@^ft7ix

- 123 - tto Ltwtto —o#\ 7 v v btp£>tife<7)7 1/ 'y M&(Dtz 5, V7b7^7y\>b7(t^01y^7^^^^TW^^^Ml^t^A^^0T, t^t 0l/i77t ^#$^(7tl(d:^f) t7-/uo ^tol/^7 t0^#CMLT3 >yW 70#(7^ &tud:, c^^rntt^M^ti^itt^to v^->#07#, tot^.

cti^0t—y^vb}:tD, tE^0^%yb-t>(t^ 7Otb7 7yb0 3 7 b £E L t to tE^£E7 LTfr 1/ 7 K £ H£6t £ yb — t >&$ no tb7^yb#^Dtto gT'otvIf^Tr^mZtl&mM&Mt&yb-t:/# 30 7"7 7 yb N #737 'yy-'Vtfctti£tlfcM£i%fl&Mt£)l — 3:->& 80 77 7yb#^p Dtto ca6#m^&6#x.ac 1:U\ &t t) )kv^tE#^%#^f7t^0W:

7) )\/~7‘Wk 0 3^ L0##^ff (Loop Iteration Speculation) #(±yb—7#D ML0###f7Tto o£ D > C 0 t ? & for yb — 7^fe o fc ^ § K7 ^0e#DmL^-30##7l/7bkt^^v^9'60l!to C0#A, &^7l/7b^ yb — ^(D^T^^ttiLfzt^^U, yb —70^7£^7T#;5>&7 L y ^,tp^^ntit#Ayo f 0#A, ^7&#mL&7l/7b&%^#&71/7H:#Ll7 ^ 17t±#t o 7 7tybA^6 tit to cti(i7— c0t 3%yb-70##^e^31< i^ < tl(t& D t-tir/Lo L& L#b^L^C##^$>o7^, 77^ >

L^^f7Lti^##7i/7b(t, yb-777f40#%^6^D@tcac^60"e, g #Rk#M^C 9CB#^ey(:^/7 b t^)^67rto t fc> ^-^xTC.tot #;mt:7tv0vt-<>7#e:bti&0'r, @#777"<-bfb^^m^titto t ^73tvit(t@^g#0^tV 11&6, #0#D^LCtot write ^tl/=7-t& read t&0t&#tl(t, 7tV Vt-^>7ais|#0#^^f#6 tit to

8) yb —f^C^l't^itzM'— b 7 7 h 7 -x 7 I (Support Software for Loops I) yb —7&:#t& 7 7b 7 ^T/\ 7 b 7 t:(t 3 o0yi —^ 3 >#& 0, Lt# Tc7(t, Slow ^ Quick ^^7 2 3»0yi-7 3 7^^tL^o Slow #yb-70##^^1f7yb-t>0W&#mk&m%LT{Bo#fr0/i-y 3 7 T'to CtlttStlW ij;+b7'yb —t>0bS^Eff ^|5j L7 t7-XA"eto Quick ityb —7 m'r0ityyb-t>^#^e^#^#^^'3^TmmTto vt t^iz, f 0t077yb —t L^v^ki^o

Slow yi-^3 >(ttbyyb-t>##^ea|g|L7±-XATt^^, t-/i/\'.v b

- 124 - — 80 'V'J ^Jl/03^ h W&1P 0 £to —Quick V 3 >Xlt^ — A ^\7 < ^ D, #(:#d^L^T0^—y^7 M3 16 ,, c^ic j; D, yk-7^7^r DT;^ <&<%&, ^ mfo

9) Support Software for Loops II

In real programs, however, [...] even the overhead of the Quick version is too large [...]. We therefore built an Improved (Quicker) version. [...] As a result, the overhead is about 30 cycles to start a loop, 12 cycles per iteration, and 22 cycles to finish the loop. Being able to swap in a different version of the handlers like this is one of the advantages of controlling speculation in software.
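The effect of these handler overheads on small loops can be estimated with a little arithmetic. The sketch below uses the figures quoted for the improved handlers (about 30 cycles to start a loop, 12 per iteration, 22 to finish). It assumes four CPUs, no violations, and perfectly overlapped iterations, none of which is guaranteed in practice, and the function names are hypothetical.

```python
LOOP_START = 30     # cycles, loop start handler
LOOP_PER_ITER = 12  # cycles, end-of-iteration handler
LOOP_END = 22       # cycles, loop finish handler


def speculative_loop_cycles(iterations, body_cycles, cpus=4):
    """Idealized cycle count for a speculative loop with no violations."""
    rounds = -(-iterations // cpus)  # ceiling division
    return LOOP_START + rounds * (body_cycles + LOOP_PER_ITER) + LOOP_END


def speedup(iterations, body_cycles, cpus=4):
    """Sequential time divided by the idealized speculative time."""
    sequential = iterations * body_cycles
    return sequential / speculative_loop_cycles(iterations, body_cycles, cpus)
```

Under these assumptions, 100 iterations of a 100-cycle body take 2852 cycles instead of 10000, but with a 4-cycle body the same loop runs slower than the sequential version, which is why the per-iteration overhead matters so much.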

10) x“f## £ Tf £ #^(Enforcing Data Dependencies) ^fr^i>7 7 AX^^0 f 7 1: LT##7 L 7 N:7—f

A0y\> o I3f m!#7 1/ 7 F 1 #7 M/7 X & read L7. f0#T#%#7 1/ 7 K 0 #7 M/7 X C write LX: ^ yi7&^#LT^X:7 1/ 7 K 1 o y\> M7(d:7 1/ 7 M l g#0^fT&#%^6^ b igf j; D###%7 1/ 7 M 2 1/ 3 (: KILL 7 7t-S;^m^ o KILL 7 7t-^&^ (j-%'3X:71/7 M 2 3 (±g#0 KILL y\> K7&^#LT, ML£To

iwmfog i Q:7 1/ 7 F©7 D-L 7 1^0 III D ^Xtt£'CDJ:o oCDXTfro A:^ff#i77 7A#^07'Dt 7th#^T^&&&#@LT:& M 7 V -^^07Dt 7iXH||?3^tito

Q:#Ee#^77A0##7D 7 7ma^ fi%0m#&D 7ff &o A:(±lx #/uT f o

11) Hydracat 0##(Hydracat Overview)

17±T'HIf ^v77 A0l5^Ht) D s ^{:3>yW7®ii:f f f C n >/i T 7 iCOL»T L & f o #X± Hydracat(Hydra C Annotation Translator (DM)t^ 7 V —7 to V —7 0 C al§ b7>71/ —f£l!%LTV£fo Cftld:, C T°D7'7 A0

- 125 - ;i/—yb — %ffl D tiH ltit7;i/-f >(: L£1~0 ^ LZyb —7"o#CA7Dt v F7'^73-^#AUto ^fcTTicvv—7°£s 6j;7^c##yb—7}:^# L ^ t*o yb — 7° I*] O breaks continue N return 'fef yb — 7°iSllx 7 v M#077 7 7^7 77 7C#UD##6fl%m j:7k: L^(7fl(d: & D tiiAo tt£t>%, 7'D—y^ybib D ^#/vo — yyT\ 7°7i'^ — Mb hi fb&^Bid:7 7 7 7±C#l^^A^#fp|± f bt^to #%0 k C 5 ^ Hydracat & 7“ U 7" 7 7 tl while yb —yo#^c p T\ 7°n 77 7tc h oT&Ij l> W7ShSx $ t~o

12) Hydracat Conversion I: Loops

This slide shows how the loop itself is converted. The loop body is extracted [...]. In other words, calls to three handlers are inserted: a handler at the start of the speculative loop (spec_begin), a handler at the end of each loop iteration (spec_end_of_iteration), and a handler at the end of the speculative loop (spec_terminate).
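The placement of the three handler calls can be illustrated with stubs. This is a sequential Python stand-in, not Hydracat output: the real tool rewrites C source and the real handlers start and retire speculative threads. Only the names spec_begin, spec_end_of_iteration, and spec_terminate come from the text; `run_converted_loop` and the `calls` log are hypothetical.

```python
calls = []  # records the order in which the stub handlers are invoked


def spec_begin():
    calls.append("spec_begin")


def spec_end_of_iteration():
    calls.append("spec_end_of_iteration")


def spec_terminate():
    calls.append("spec_terminate")


def run_converted_loop(body, items):
    """Shape of a converted loop: one call before, one per iteration, one after."""
    spec_begin()
    for item in items:
        body(item)               # the extracted loop body
        spec_end_of_iteration()  # would hand over to the next iteration
    spec_terminate()
```

Running `run_converted_loop(print, [1, 2])` appends one `spec_begin`, two `spec_end_of_iteration` entries, and one `spec_terminate` to `calls`, in that order.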

13) Hydracat II: (Hydracat Conversion II: Variables) £ 7 L£"io ;v — 7°5IStito £ *5 ^ ^St £ 7' n —y^;v/W 7|qH:M:W:cfi

7 (: L^i"o yb-7"#-co^o#!maco#B#:0 7 7yi^#^f^ j;7 yb-7^%#ut:7G0^#0#&C0#ja#:^y^77L, yb-7#T#^:#^#:^^ 7G®^#k/\y 7/^77 L&fo ;k— ^D-yWfb^tl^fo :©b^ i 0#07 77 V 7 7 0 yb-7^^^r o7cmcg[#±(f i)

14) 7 — tWy 7 fcl £ £ MMit(Feedback Optimization)

^mfbagm^S'e/bo m^7-K(±L(j:L^, yyb^v

XA AfaftJWMLX L£ 7 d; 7 c D , CO j: 7 (±, 7— bStE^ck^t^lbT-^. —-777)7^^^^ b> £1"o D mtl/vo Hydra 7:(d:> 1$ $10-7 n-— ~ > 7" LTl^&^7— LTl^FdlM ###MO# J; 7 br&oTU^f o tt£t>s K 7 ^7 £ V 7 b 7 ^7H

- 126 - ###&< ctotcf 0gp^ $<0#A.

15) 33 — F|£^(Optimization: Code Movement) Tit, if ®

16) lijStfb: tilt* SJ (Optimization: Value Prediction) {6MJ£:ffl^£ ££ ^337 — F %±filz&WjTg£To —Jlgtett|fi

©mooam&^siL. c0^T(t. if tC0^Tft^h y (TMEM^l/^-tirAvo L;tPU if X

17) #i@Hb: |B)M(Optimization: Synchronization)

itto C0#T(d\ |5|#i&g%&2:#)k: sum Jock a^7D77^B&#X=f3#(tTl^ to sumjock & 0 TJZJKJHb 0 l£ t“o ^ fz specjock £: V't 7 0(t7"fe > 7" V SlnT'S^tl — 7" >T\ sumjock 0#^ i f3 & £ T *t :n f h Uto 31 tt f3 tl D ^ #U0 sum f3 write U^:#T, 310#b^L^ sum f3T7t7T6C 2J3& D 6& < & D & 7 h° > D 7 7 T' D 7 7 ^$(0 #& read t-ek^f3##mM^^^Tt)^7T(±@^0T. f0#^f3(t#^#^^ read A^ffll^to

18) C 70^7 AT 0##^ft0##(Speculation Performance with C Programs) Tft, C 7D^7A(:#t-^. %#^ft0#^&MT^^Lj;7o ^770# $6 ft Hydra 0 4 7°D dr 7 ^t£ffl UWIIt b tct§^0. M&mff fcttt

- 127 - 6S L$fo 6©AIJSEIt@M©7D ^7A?t. eja-fbLfcneiie->^^A&fliv^te@^©ttig, $?> t, ttSas©KI+tS$B6fflvxTA^^^-->^U3- h'©tt|gs^ urv>$r, s» iR7D79 A7lf^-7##l±m&k|Bl@g©t©A^t^f A\ ®a-fbt ioTMffl 1.6 {gy.±icttigi9±urv'$t'o ttitsf,ffl7ny7i,-eii 7tt#g-e& 2 fgA> e> 3(g©ttig|n|±AH# £fc s ejS-fbU^UffBeAX-r ASfflV'-5>CkT*.

ear fttll — 7)S8Btt#Asfe 3 ©T* t AA SHSIfiT-6 MltolcSiK D S V A5 9 7 h U ,ft^M5USAsffe-nri'^fo wc fa»-/ir^»@C'j'Si'©t\ ^-7-etta ik^fiy.T©tt6btc»oT L$oTV^$TA$x S)6-(b$ftA:9lfi;Be->7 9A&/8v-E>e k -e, g%© 1.6 {S©ttlg|6l±A sSI £ ft$ 1~„ mpeg2 T*li3 — F^iJfflftjifttCAoT^- -67 7 7sa«*u tttgi6i±uru$i"o

19) C T©jSt8Slff © $ k ^(Conclusions from Speculation with C) cr*©sssifi&$kto$i-o mgawmecz o. kA^#-e ■to -9-7;i/-A>EE3- -y-7ll/-A>© V 4'->(ilAs:FiMT*Ss A>-3S +1711 — f <1*-M7 K &MT- *&*•& to ;V-7iSbiE u ©###(? If. ($kAk’©;t—7T$irfe$fo ilk it—7tc)f Lnne-fbSft/c. #-/w-7 b©^»t\#i?#77TA&mt''^#Ac, t> ito s«toae?mGicS'd< 7077 5 d$t-„ ® a*e©AA4f7, i!S©ii?A7n79Ak$7>fc < l@l#lc U7#9iJ7n79A&#< C e©36?iJ7D79 A14 Hydracat titl @#*H:###f?|o|ldlC#^J(b$ n$1"o $&. ;\-K7T:7A^m#f L7. 7D79A© #a(bl:$imf %ckA^@$f.

20) Hydra fi Java C k 7> "C9@69(Why Hydra is Ideal for Java) f fl'CWX Hydra A^tb— h "3©@e§. Java © K C # 0 it, Java fflS!gtt«kHff9#i:li. Hydra £ InH'T t'3 »© A O 0 *To If. Java ©f-f 7 -f 77 V -7 btf H/|J3bfiJ%9IC##IC|o|t\7t''^f-o $fc Java ©#R#9% k JIT 3>Af7ttMIfJfflifi!jftiai:Sitlilt. $£ Java ©EHv-> >©96$l **-f. [email protected]>>r*ti:. #-^->*7 u 9 9 a >(GC), JIT n > d(7. )7^n-fi >7kt&fiE&k\ ^ < ©7Jt- M/-A>6e£;'Sk L$T„ eft ?,©;v-A>©lSkAk(i. «Kg*‘£U’ffle@ffl5l?iJMte J;3x8$A$Efe7 $$-#■,,

21) Java-Hydra 9%(Java-Hydra Environment) Hydra © Java JSitli Kaffe EH7 7 > tlX 11 * t o CftCIf MIPS 7 — ffi’ftiffl JIT n wW 7 Asa * nr 4=5 0. ASift 9 97 9 ha-fffl 6ia-fbS:fft'$1"<, 9f 79 V If JDK1.1 VET AWT ^ SwingSet &7J1- Utl'l •fo Hydra T tt e ftCtt @ ©«|gi§ft]%ff o T tA * t. f ftl±. 7 D A |S|Ri ft k'©3£9l]@L97 0 if 1 7. KSUffa > b' 9 Tf 0

- 128 - 22) Java 7?©X V V K (Speculating on Java Methods) TJJ Hydra T Java © X V 7 K £T £ l5'

-4 6 c ck t), X V 7 P©###^fj©a Li7xX©#&&# ^#aLT(J, XV7MTXtX77X&gSU^c^tcj:D, ^fT^XXTA^^©XV7PA^#^e©^mA^t»^6j:7(:L^fo JIT 3 >;W7(±%#^e^^#f67-h>yV3-K&#AU^fo XV

23) Java 1!©A/ — X##^fr(Using Loop Speculation in Java) Java TA/—^^^Cd^TTTo C ©%=#)}:#, < o^p©3

— P^#A^'#Tf o cntiATt:: £oT -SS^Wi^TT U> V —X to V-X© h 7

> X 1/ — X £fc i J V — X to X VX 7 7 X A/A X h 7 — p © 7 > A X X £ £ o T & WtbT fo A/-XJtX4&W^mUTX7XX74'^-h^XV7pie^ m LC L^f o X V7 FWffib^lttt7o©A-^3>^ffliLTl^to --oiJX 7 TT h©^m©A/-xTT ^ T, ^&T°77h7;f-A-e&^eqr#-efo c^u±m ##±©^TA/T4^%Aj:7^LT&D^fo 4bo-7(d:^ ##mm©A/-T'#7^'rfo o

24) Java t*©il^ft7D7 7^f V > X(Advanced Profiling Under Java) Java TI3\ #%7 >;W A/&#M LTr^Jg&7°D 7 7 X V V X##^)^#T #^ To mxw:, ##jC7- D, XD77^ u >x©^^©7- p&man LfcDiJI$UfcDT^£To £fz, JIT T7 VAX A/£ftfc7- P©^E^> 7°D 7 7X A/7—M©/:^)©XA—x^T^JLT43< C^^)T#^To C©T7(:XD77XVVX C k CZ D X LIT©ck 7 D & f o ^T,

7 K kXu 7 7X V VX©##j^^©#laAA&##T#^To Jfc^CJStTT^ #0

H;(Level-of-detail N LOD)I'llt^to 7>7'J VX©T?££:fUffi LT

25) 70774 V 7 T &%##cfz©#j®fb(Optiinize Speculation with Profiling) XD77^U>X(±#mfbCt)%M^^fo &1\ XoXXAT©^©^^^^#^^ \z £ ~>TMMik£ tiZ o % (Dlz$.z.£t o cfU:(±, Az-t^TV X>X v 7 P Wtitib©XX XA/Bu UX-v#?#)©#^, HfrM&£©tilfg/b^ijffl-e§^t"o £ 7cs F l/X^IS^^H^ £ s fet£t o 7°n 7 7 4 U 7X©^^£TiJffl LT, ro^Tvti^o-FI m^mmfbx cxoX7A©#^^im#^A<7^©mm4b. D D Jto

- 129 - 26) 7 V 7 -S14Hafpj-h(Speedups from Method Speculation) fW#&Java yoy^Ac^f^, 7V7h##me(D#^^^Ly i^to 1.3 3.5 £'

27) 3 1/7^3 >(Speculative Garbage Collection) Java 7D^7 A g #:£ iW^bf 31ST Cfco l> Java 7" 7 ^ #|y(d\ yoy7 7A^7t U#^(D#^$r,DEL^ < Aoi:, ^^7777^3 1/7 73 > ^-^737773>(GC)^(d:, b-y^(D^y7x:7h^ 7^ &o#!82fi&c &#&^a-7y:c7 h ^)C a ~C?oGC 0 J^o-oid: GC ®m'7 v y >7;i/^'7 —77> py^ —y^^'rfo ^;i/t7 73>i:TdUy, emiol-k&lg D £to

28) V-77> K y —y^^C(Mark and Sweep Collection) lf7-77> H7^ - y^^^-oUTEB^UT 43§^t*oJava l: (± SH^CDIV — K fo 77 ^;kh 7 7y o-yi:#^ ^l^y7^;i/h777^(D7li^>7, $J'^^y7l/7M7777(7)JiM'>7, ^LT, ;V— b#esN 1~&fc>£>N B)§^^ii;i/— hy'&3 ^7 — 7 $tlTV^^7"7^7 h7?f o 11/ - h^^m^uy a^icid:, #(d:7^y^l:-yy, ^-3. &&©7:'1~o Mfiid^fd: D 77 7"ftt — ytry#\ f (D Ji7 > h^c^#, JA6, £ fc id: 6 cD^-y^j:7 J:7^^(D-eyo K6(D^-y^/^7 hid:, — > h^(7)^7"7ai7 h £7 — 7 L/:#T#ic^ D £ to 6 id: 77—7 $ fc id: £ £ 7 — 7 £ fiyt^iu:-yyyo y 7-y#)cTu, ic id:, t^t©Siyy^x7 b id: 7d' — ^7i;:& 0 o

29) 7 — 7 7 7 h*y^-y£^(Mark and Sweep Collection) ^§ c0EIid:, yy(D^p#(D#f UT^&fo S, K6, 6(D^7"'7x7 h^rSS l/T y 7 — y7^T Etc t §id:, H^fcid: 6 0:7 7"'7 x7 h D ^#/uo H i±7^y^^y^o:7h, ^Lza(d:^-^7a^D^yo
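The two slides above describe mark-and-sweep collection. As a rough illustration only, the sketch below shows the generic textbook algorithm in Python. It is not the collector used in the Hydra Java environment; the names `Obj`, `mark`, `sweep`, `collect` and `demo` are hypothetical and chosen for this example.

```python
class Obj:
    """A heap object that may reference other heap objects (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other Obj instances
        self.marked = False  # mark bit used by the collector


def mark(roots):
    """Mark phase: set the mark bit on every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue  # already visited, so cycles terminate
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, and clear the mark bits."""
    live = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            live.append(obj)
    return live


def collect(heap, roots):
    """Run one full collection and return the surviving heap."""
    mark(roots)
    return sweep(heap)


def demo():
    a, b, c, d = (Obj(n) for n in "abcd")
    a.refs.append(b)
    c.refs.append(d)  # c and d form a cycle that no root reaches
    d.refs.append(c)
    heap = [a, b, c, d]
    return [o.name for o in collect(heap, roots=[a])]
```

In this toy heap only `a` is a root, so `a` and `b` survive while the unreachable cycle of `c` and `d` is reclaimed. The sweep loop visits each heap entry independently, which is the kind of loop the later slides consider for parallel or speculative execution.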

- 130- 30) *6 —7 7 7 > ^ n 1/ 7 7 (Baseline Garbage Collector) /< —7 77 > t W 7 7(37 CCD^^-r^, 'X'XVx.# b £60-!t7 TTlT'oT, 2 To rT£i>j T7"7a:7 b 200 ;W b T ZS0T 7"v :n 7 b (JC ^^cmc, ^r^#^j ^-y^o:7 b ^<0#^, eft(3:^#m^0#t^E^J7!To mbmJ##mc%6T7(:#0To%:f] j^ffl^T^TTo T&t>*>> @B?'JT'&V>T7"7o: 7 b 07*7 7'£T#)ltS LT:£§7 T 7"^ ^7 bif7Xe^(CV—b 2 ft 2: 7 V — V 7 b $:f#oTf3#^To TT^j:7b^#!l t)#(76k#(3:, c07U-U7b^^m^T#^To 2 ft & To & /:, write V TCT^%7 >7 V 7 >7;i/^ GC D ^To T% #WT7y:n7 b&6m'T7^:c7 b/\##T >7^#^#AT7 a U^#fr(: (37 b 7 77°&%ac LTTo Cfl^T D T7'7ai7 b 0® 6 ^T^£t££:ftT To e 0 E ^0^&6C, JIT n>;W7(3:t:-y#m^^MT6T7^;W bn - b, T^T>t,. putstatic^ putfielcU aastore ©bzfB tC write A U 7 £:# A U T To

31) GC 03£^!Hb(Parallelizing Garbage Collection) C0#(37 GC 0a'0^##^0To (:#mb-C5 6&&^L&&0 "TTo mTyn77A71/7b^ write V T^^^T6 T6^ ?(Hb#oI#ETTo GC7l/7b^^60V7b^77-yT6^^C. ;t-7%# #ffl:T^#^iHb^mI#TTo &fsccT(37 ^Hbt)#x.^ftmTo f0^0, $$T^L@T^#Tyi;j:7b0m^j^m4bx ;i/ -7°%#^^^^(±#m#^^iJ^e^W#7:To ^6V7b077-y^t)7- j^etx 80Ty^^7b^^%Lx ##l:Ty^^7b0^Tmm^fft^T^, Cft 6

32) write /i V Tt^foij" 6j&HSI fa (Speculating on Write Barriers) f ftf(37 write ;^VT^^(t^m#^e(:nVM:##C7 07 77 ;i/&GCt::&t'T, write /iVTU\ b —70—gp^77—7LT, a0T7^^7b& J:7 b 1:T& 2:^7%mi&@oTU#;To write ;iO70^^c^%#^ek(±, write V 1/ 7 b ^ LT^fr L, #0^7 V 7 bl^0 write /i V T(:#i^T 6n- b^##7 V 7 b ^ LTlIff T6 e hT'To write /i V TXtt V 7 —>#(3:^^)k <, $X:nf^7t:7V7b^0^TmM^e6W#g^(±^^7'0'rT^, 4fnT(3:#t^0 7r%#^ff0T —7T ^ 6 7b D TTo C e7r^mh^60(J, HE^^ftfc(f0m0T7'7x7 b ifiX 7 — 7° £ ft 6 t U 7 e J:tTo 7 7 b/^7 y rt^^fiX CT 7 Lft^#Ao — T D (c^T^t^T —7 bli^^Si-o^JiC^^To HSIS^^T6 t, write V T7r(3:##7r^ 10 frt> 25 1®0T7"7 x 7 b £7 7 -7° l, 300 U±C&6Ca# $> D Tit/uT' Lfco eft(37 (^^^^'07707-73 >'T0#^<&#^ UTt^TTo

33) Speculating on Critical Collector Loops

—ycD#^)^L# C, W.WLO U \) x h -£UWt Z> z. biz b£ LTzo :tii:J:^ a^l)10Mbi)5,J>

U^ %Z>£o £ & D $To

34) ^ (Dlt#(vs. Traditional Parallelization Only) atm-fiTo c^ui#mmi(:(±a:#me#^ib^i5i#'rfo c^c,

^IK^DJto Ctl£*fLTj£$l^fT& D (DWinlt, l7D^'y^t:^lt l^b y F&Sfl D X ly ^ h \Zt£% Z £Wfo D 3; t“o L^L, ISMOIE 7D^77©f±| CEL < c t tlth D ^i±Ao

35) Speedup in Garbage Collection. This figure shows how much speculative execution improves GC performance.

36) GC \Z cL -?> $Lt^lolJt(Garbage Collection Speedups) C##EkL e@JavayD^7A(:j3(76GCCD%#^e(D#^^^L-CV^mfo 7D GC C6Dbb$ (± compress TGi 0%fr t> javac 0 27% £ T'$t% “£ t o 7D V =7 h E% ^6 21%'trfo JIT n y(D^J$l'tZ> 3— KCD&fgiPfplJ: Lfc £ GC 0^#[q]±A^yD^7 L«t 3o

37) Java 1CO l^T (D ^ h #> (Conclusions for Java) 7?«U ##C Java o Java i: Hydra iiffi&AHH' t ^ 5 d k7?fo #x£: D x 7 >f — hV'S y £ ItM'fb&frofc £ f & d o Java T##: ©7D-fe y+l£7Sffi1~3 £td\ &©£'?&;£$£#& D Jto Java % 1/ v Java g#:, GC^JIT3>/W7^f7XD-

ASfo tl±T* Hydra © V7 h xTOISSUt) b Sf

[REit-g] Q'.E#T*S £ ftfc ;v- 7$$© W* 3 ww 3 Asffim;t:* 3 boilfoz -ctt. 3>^-f^izt^xammtztmmmfuz itSf A>o A:f*^8J6*Jfflt--S;H 3T*ttH?6t£fc©l'g;-f„ SfcjSSUffO-fe-^x-f X 7 h As£>-5.fc‘A>(:f7x 3 >/W 3 < T6 j6?iJllff'i!S-5 k V 7 ©#*$ WJA?f o

9:75^ -X- h5M©#mi±3 >/W 3(;toTSiLV'«-a-*sSi b $1" o Cfttfg#® tT-S5©li#SCtSk,Si'St, A:{?fto -M±MA^|s|bx-^&@3 7©T##A^bf 7%©l:x Htt77'i'^- Hbr-sa kudUMc, it. #@!c? m < © < ©•?«❖%)»§* Lfeo

Qix —7 6j=5&*>7 Lfc 6>s JtT b V 7 h\ -3 $ 0 kill $ ftfc 7 U 7 b ©81 1B-&S1XT < ££ W A:^t <-t;fflTC*V'^ s ICS'99 ©Mi*C«V'T* b it. iBM©kC5x SSxtt l"7 d73ACWSj bl'TitCSDIt. *Stt, t LA©##A^61i7U6»6# T btltzXV 7 Ux ##%##©% 6# < ©7 1/ 7 b AsjtT feilTVi-S 6>h,$fo

Q:Hydra T*ffl;i/-7g!| b StfflSlittb?&oTU*ti>o A:66*©3 5ASfioTV5©kR«>&^$6^x.$1-o «»ffl«l3SlSitftt -7©7V7 HtcIIIDaT^kAx rHx-f &# I'J LXf ©#)©#?% 6 kVidSiili&b $1"„

Q:while d/-7©##*fT©M&m#7 < AdS^o while 11/-7& 3 >;H 7 (3 t o ttt M^HbaHT-to A:while A- —7gf*© W24-f" <* CttiT C Acl^x tfd1 > 7 6 £ k'-B xi^x ffSSSfcSblSUS kill-f-S Adit©k kt* to

Q:Hydra T-tif 717 td*f VT 3 WH 7 A^ASt* S B fr&sfxt© * UfcAbx 6 U 3 ww 7 As;k—7Asafi?!|T!fcB k9;;S"e $ AciB^lclixKSSIffS: L4 tCbi)!T-SStAi>. 3 W W 3 AVI 7 7 r+f'f 74 #lt Ltfig^ft»5C k %iCiAA,$-&/bA^ IrJA^© idfcttifflt, 3 >/! d- 3 i3MUTat?eii-fbtt$.b $■#■*<> A:m@©*&^%#%R©ggf\#m©*^R»m7f. 3 - b#*x 73d1 Mbx %

■ 133 - LT < ^16 o

Q:3 — P##(d:3 WW 7 \Z t oTlltto A:(d:lX CCDt^WCiamZ:# < 0^^ D f O^f tto

1.2.10 HPCA-6

npcA-6 cm^c, $E E3 & ^ ^ A -& o

Dates: January 7 to January 14, 2000 (8 days)
Conference: HPCA-6 (2000.1.8-1.12) (The 6th International Symposium on High-Performance Computer Architecture)
Location: Toulouse, France
Materials obtained:
(1) Proceedings of the 6th International Symposium on High-Performance Computer Architecture
(2) Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation
(3) Proceedings of Workshop on Interaction between Compilers and Computer Architecture

^ 16 o x 35# (##%m 1 e 3, 21%)

(1) ^mrnrn

HPCA(±. Bll:IEEE^^^^#f^ISCA(Inter. national Symposium on Computer Architecture)IZ H < No.2CD & & o # 3 i:lTA- b 7 i T iz £Hb $ tl % Mfq] 0 fo & iSCAi: \&M o T, 3 >;W 7 V 7 k ^ C 5 ^ U A^m^:(±2 616 ###3 A#:02 Z) o AIHSiC&l^'TtiC HPCA WWlZtltz 2 307 — 7 v 3 y 7° (Workshop on Multithreaded Execution, Architecture and Compilation J$l Zf Workshop on Interaction between Compilers and Computer Architecture) Cl tBJiff C Wi[p] IHilE £ff o fZo

- 134 - HPCAUA —M Aj T$)D^t|60g^d:^o n>b°!L — ^7-3r7"7^7©M^&^ffi i: ^ 6 r 71 U j rapt'^ij r^^t'17 —r^7^EUj r>>xfA7-^f^ f-7 A LT©##t^#lq]±^#j rv;i/^-^ P 7 V7F7x7Mi: LT ry;i/3UXAj ^ $#%y-^^7 ^7i:UT©

iJ)V 7 ;*;!/ n T7^©Haungs£> ©fgHT' $> 0 > 7 /c&b(C# L/(A## i: L"C rBranch Transition Ratej dv>TiafcAA Sr/c lcN if© < £> OTaken (fe l AiNOT Taken) < fa A l'' 7 Transition Rate £ all A b fz&IZ Sf %^©^%m$98%a^omm-c(±, 98tm ©Taken©^ ^ Not Taken** 2 M'MM t" £ © *\ 49 @] # C1 @ © NOT Taken# ##ilT ^) © ^ & O ^ §■ & ** -o fz *A Transision Rate& #A^" C i: H ck 0 T n

—7f/3 7 7"C(±, r-711/^-71/7 —^r^7^-7CNf ^7

— 7 A 3 7 7°j j&zp r 3 >;W 7 t n > b°xL — 7 r — >7 3 7 >> 3 >£Hf| f67-77377j ©2o(cm^LX:o Cfl%©7 —7S/3 v7(±, mXOZJt V T- 4 14:|W < &U*A #S%©Alo]#i: V* 7 ##T ^kh %&~otzo *711/7 ;fll/ — TX^V' AAV J:=T^©Kreaseck6 ©^g^T (±, 7° D 7" 7 A © Hfr 7° U 7 7 i:^og^©4b i:, 77 7 i/^ii/©a/^j#©m^^

IH^T1 a£0 Hl^^if'© < i:teA LTl^t A ll/±7!#^^**, ^e7D77A^©7-2 2%©7tVT7t7(d:. ##A6777i: ©##l###©&lAy 7 t7-edD& C iz^^LT^D, 777 A ^11/7?©%#^^^**$) 6

$ A:n PowerPoint-^Excel© ck 7 ^ \ l > :b $) & A 7 7 h 7 77 7 V 7" — 7 3 AtCjoC'T^ if©^Jg7l/7p**#t>^Tl^}A, f^T6©71/7p^l:(±if©#J^©#^J^^ &6©*^#^L*:#7:, ^WcUA TiPA7l/7p'r^fTA^^lz, A<^##©^W7 7 7$:if©BA©#^T

(2) Workshop on Multithreaded Execution, Architecture and Compilation

©fg^**A7 >-t)l£tlfzo lAAA'tiA A ft 7 6 ft© A*^S]Ei:®:bft;E>*gSk:-ca'»T

^ b &> %> o fo ft n http://www-cse.ucsd.edu/users/tullsen/mteac2000/index.html *p £>

- 135 - a. Symbiotic Jobscheduling on the Ter a MTA (Allan Snavely, d 71/ 7 ;f71/ —>:r Y J: d'^) T7tt® MTA (J, V7V^7 ly 7 Lfc7—7^ —7 > b° ^ ^ T C0 MTA ±7x ##k0#]A^^y ^ a c ai: d; b 1/ 7 h >^) cru #jmm$^±(fack&m^7c# r Symbiotic j &AJ #00#A&L&yyf #i:y7k- yy NPB2.3 #-^7b0f#&6a#02o0yDy7A&mx A#:0y7l/—y7b^^0J:7(:^fb^a^&^^X:o caCck^tW:, (^a/v^0#^, 10-2 0%###[iS±##f#i2fl&o X:/:L. yoy7A0$M^At)#^7: 1 0%#J^ ^(Dtztf), ±X (DUfo-nt)-# % EWittJteUt z btek-DX, MM & MtTftffziT S.O.S (Sample, Optimize, Symbios) b V bXXX V n.— 7$:^#L5*0 NPB2.3 *-$7l/^AT^frfa#A0A#:0y71/-y7b^m 10% ±(f ac k&mmu&o v7i/f-yi/7p^$^0yDy7A0#^fb^(J7!W:^<. ##&0^y^& LT^g^^rao b. Thread level Parallelism of desktop applications ( K.Flautner, < i7 d >; R.Uhlig, -Y >:r 71/ ; S.Reinhardt, < S/ d : T.Mudge, IBM Power-R Compaq EV8 N Sun MAJC T'tdL V )\z7~X ly 7 K ^17^— b cF tl a o L Ln V 7V ^ X ly 7 y Y > y A\ Internet Explorer ^ PowerPoint i:j3C'y#^'r&a^^7^w:^%-c&ao c 0 7c 66, *#y:-c(d:, r^#0T y U X-'y a >X TLP(Thread Level Parallelism)^ a frj l“OS ifi b(DU& TLP £rSffl LT^a^j £IH6Lfco ^0*5 HI, WindowsNT ±07^77“ '>3>yjj, TLP (J, 1.1 M0&0 A, £'T\ 1.3 £®x7c0&, Palallel MPEG Player (DfoXfo-otzo WindowsNT ±0 7 7° U X~'>3 >T(J, TLP 0 60-90% # os 0 7 1/7 M: cLa60^&acai:^0L%wao BeOS Trfd:, |S|#I#0 WindowsNT ±0yyUy-'>3>^i:bELT> TLP (1.1-1.9) Zbftfc>frz>tzo m#0yyuy-iy'3>(d:, yDy7<>y±0@A^6yiyvp&^cT t^ao

( 3 ) Workshop on Interaction between Compilers and Computer Architecture 0##

3t7i/3>8#0f8##&^7co LTF-ftL C^60^^^,m^amt)^Ta^#I:7) LTS ^ao &4o, http://www-users.cs.umn.edu/~sycIio/INTERACT-4/ ^ G ## ^ ##^A^-e^ao

- 136- a. Limits of Task-based Parallelism in Irregular Applications (B. Kreaseck, D.Tullsen, B. Calder, 7z7V 7 7V7. >7'7 SPECint95 Compaq #0 ATOM 7°D 7 7 7 7Vto#7PD 7*7 A±T\ 0 7°D 7 7 7 71/ & ^ (7 > 7 7 7^7: memory-independent ^7 — Memory-independent hide C 0$B 7~22%07X7 (1V-X, XD7 —7^7^ —C)07X7k#x6) IT 6 7 7 7 h CD ^7: memory-independent 7r $) 6 C ^ LTco ^fT7X7^^07"-7##^#l^#^L^:7X7 l//<71/TCD^#ay#fr ##H:, &6LHd:. U/ftJl:It 7 X 7 (j'Ltiir (& 6iHd:^DlA^) c2:#T#fUj% *#7: 0^0x^vxa Lzm^-r&^o

(4) HPCA-6 omm

1 lt7i>3>3 jp-7-kT Pl/7 3f4:^&^/:o %d3, tz Panel ^Impact of Interconnect on Computer Architecturej (dc -7* ^ X 7^ 0 #)A- 7 7-L71/ tt£ o fzo JXTTtdC C tl b 0 41 b t Bt> tl 6 %M\Z~0 WX £ t &> 6 o & 4b\ http://www.irit.fr/HPCA6/ a. A —7 — h X h° —A* : Joel Emer (7 >yVy 7 XIV 7 7 IE fg 70V —7°) •"Thoughts on the Evolution of Computer Architecture j , d: D X —£>*£*& 6o tl###±lX-XX7XX#^#X$) 0, # 0 m/c, 7D77#i#m:%6t:ofiX7i/7#0; rcu Emi:f6j r7D77^'7 7VT'V->Tm#tf^j r#|§im[mg&j SIA 0n — pv 7 7t'fe?) 2006 *£0 3.5GHz, 2008 *pCD 6GHz, 2012^0 10GHz 61:(d:<&#&&6T&5 7o ISCA 0li7:0firpO$:^Stoi:7-^77t"6hs ISCAlT'ldC X — XX7XXXX0V #2 2#0 ISCA24T(d:, ^<^l/-i>3>^2 1#0 E^IE^0X77;V (d:, r^g%6D#S^^tr7>^7-7D-p0m^^i7<^l/-i73>^^#J "T# 6£0 #%0lE#(d:, abstraction fro N 9-7D -F/)sf^t'fe^fcto(I x;v^iE^X6ca^m#x&6o

- 137 - ©so^&^o b. Dynamic Cluster Assignment Mechanisms (R.Canal, J.M.Parcerisa, A.Gonzalez, il #)i— - Palacharla L Smith 6 /b^tES DTtA7)7Dn-fe'ythy — Lx 7°D E 7E(*1© 2 7 h (777^) ^^&#%1:#E D ^T7)^66©EtA &##KC-C#lK 79-P&^#C^#^-C&6o 7 777^T©^-7©^(d-#Dl:#7^—/^7 P^^JMf 7)^&6, C^l6 2 o©# D—P&7)C'(d:7pT7Pl/7&#f#f7)^^©#^7r&7) To -P '7P777^7j -^, ^ #{iiaci:^j©##^-^hy\a#!iD^(t^o m/:, ^^©D-pv^7>7^-^#m± 1:±# < &6&^cko l:#7>7 & igtt, IE D ML, fo%-sLm£X±lzt£-DtzWi&lzte, n — P©/J\$l^©##3.-v b-\SE Dttl*7>o $61:, D—pl:^v^7:jrE77^<7$^#v\ &7)iG±, Ei#E#i$## ^ il/“'>3 >§ SPECint95 36%©#m o Ctl^T'l: IBM © Code Pack ^F/b5tE^ $ ftT & D , 55-63%^^ § 7) o D^Pb, de-compress 1: 75^1120 dit^p >&IE o $61:, Cache < 7 1: (±, 1%© 0^EE77^<77r4bA^^^^©5%#J^©^E)l/E7^^b7)o C^tld:, Cache < 7^1:, &^#Em$fi/::3- P©^©gBEl:^7))b^'7 7 E>XE-X;p^ 7,/:(&©X-;^7 PA^#l^6'T&7)o ^P bw X T' {i, ESBE'v£ h LX, Dictionary compression ^IteS U T V> 7> o Cflli, — ^l:E-7^-7^©#^X#t)^7^7)E%^&7)o 32bit©^h^ (3-^7>P&#t?) 7 ht^X —7'>£fflS L, C©X“7" )1/1:^^©^^&## DT# < o de-compress ^I:li, 16bit ©Sf i: ITf -7’>£#BSf ftli£^©7:, de-compress B#PsWXfil:MIBT'#7) o $61:, E##© ^^^!l©m^(±, E##u©m^ai5|E^&7,X=A, cache <7^©^E;PX7&E$o
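The code-compression talk summarized above uses dictionary compression: repeated 32-bit instructions are stored once in a table, and the instruction stream holds 16-bit indexes into that table. The Python sketch below is a simplified illustration under that reading, assuming every instruction is replaced by an index. The real scheme may keep some instructions uncompressed or use a different encoding, and the function names are hypothetical.

```python
def compress(instructions):
    """Return (dictionary, indices) for a list of 32-bit instruction words."""
    dictionary = []  # unique instruction words, in first-seen order
    index_of = {}    # instruction word -> position in the dictionary
    indices = []     # one 16-bit index per original instruction
    for word in instructions:
        if word not in index_of:
            if len(dictionary) >= 1 << 16:
                raise ValueError("dictionary exceeds the 16-bit index range")
            index_of[word] = len(dictionary)
            dictionary.append(word)
        indices.append(index_of[word])
    return dictionary, indices


def decompress(dictionary, indices):
    """Rebuild the instruction stream with one table lookup per index."""
    return [dictionary[i] for i in indices]


def compressed_size_bytes(dictionary, indices):
    """4 bytes per dictionary entry plus 2 bytes per index."""
    return 4 * len(dictionary) + 2 * len(indices)


def demo():
    # Hypothetical instruction words; only the amount of repetition matters here.
    program = [0x8C820000, 0x24420001] * 8 + [0x03E00008]
    dictionary, indices = compress(program)
    restored = decompress(dictionary, indices)
    return restored == program, 4 * len(program), compressed_size_bytes(dictionary, indices)
```

Decompression is a single table lookup per instruction, which is why the decompression time stays short. The saving depends entirely on how often instructions repeat, since the dictionary itself also has to be stored.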

' -a8©7°Dty7-e#, FPFu ©x< p;L^^#m<#%T$)7©'r, pp^. -y hU simple INIT a-y h£itEU 2o©7777 (ALU) #$)7)#^l:(±^667tINT#t #^T#7

- 138 - y T^trc SPECint95 Cck^7Ci DmmUTl^o d. Branch Transition Rate: A New Metric for Improved Branch Classification Analysis (M.Haungs, P.Sallee, M.Barrens, AOl/ 7 -t )]/^~ T O fz&> lZ§f h LX rBranch Transition Rate j 0 fzmJCT: $>6o I'* Taken (fe ^7 Mi NOT Taken) t)s$i < }zi^o Transition Rate ^ SA U tz #l7kL %^0^i%m^98%ai\o#m^(i, 98 00 Taken (D'&\Z Not Taken ft 2 % ® t)\ &^lMi, 49 ®#(I 1 @0 NOT Taken -?> 0 fr £O ^ Transision Rate &#Ai" %>Z)z\Z ioTs o £ < % Z t \Z$L?h It o D s £0 < ^l^0E^TN o Taken $> £ l Mi NOT Taken t>%Wi tl^^O ##(Transision Rate)&#A L/r0
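The example in the summary above (a branch that is taken 98% of the time, either as 98 taken outcomes followed by 2 not-taken outcomes or as one not-taken outcome after every 49 taken ones) can be checked numerically. The sketch below is an illustration only. It assumes the transition rate is the fraction of consecutive outcome pairs in which the direction changes; the normalization used in the paper may differ, and the function names are hypothetical.

```python
def taken_rate(history):
    """Fraction of executions in which the branch was taken."""
    return sum(history) / len(history)


def count_transitions(history):
    """Number of times the outcome differs from the previous outcome."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


def transition_rate(history):
    """Transitions per consecutive pair of outcomes (assumed normalization)."""
    return count_transitions(history) / (len(history) - 1)


# Two histories of 100 outcomes, both 98% taken (True means taken).
clustered = [True] * 98 + [False] * 2   # the not-taken outcomes come back to back
spread = ([True] * 49 + [False]) * 2    # one not-taken outcome after every 49 taken
```

Both histories have the same taken rate, so the taken rate alone cannot tell them apart, but the spread pattern changes direction three times and the clustered pattern only once. That difference is the information the transition rate adds for branch classification.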

0(iA#:03i), e. A- — 7 — h 7 h°— ^ : Kevin Kahn (A 7 7^1/ 7 A-y 7 A-7 Lab.) rDirections in Connected Computing for the Consumerj

1999 4M:(i 1,700 ^^0 PC ##mt:##U-C^6(IDC 07"-7)o 1 —#&#^T^&A0 61%, 2^#pcTt^A0 86%x 3^#c7O»6A0 97%^A>^-$yH:##^LTt^o C0ioC, —PC 0mBcgci:#30-C(i)k<, 3:T0$:#0##E& PC "C iti' — hi* & o lz tz % o

C^Oitli^ ^ Av0 7

f. Register Organization for Media Processing (S.Rixner, J.Dally, B.J.Khailany, P. J.Mattson, U. J.Kapasi, 7 7 7 7 3— K Ac#) 7;rMyM-e(i, i#^c io"eir&0;m#<&#a& %>o Z(Dfz&, 1 yot V^^0####0#^(^A ^,o c^o, D^7^77Aih0^m, ###m0m#@m 777®#,

^rr(i, D^7^7TA;h0#^^#h%&, ®#, m#mm, mmmii03A&6i±## M 0, 7' D —W1/& 7 7 7 7 7 y A UT, 7777 7 195 ^01, 20 ^01, 430 #01 c c, ##0mT& 8%}:

- 139 - v—'>a ysmuttz g. -eofifeOfgH -Y U y 4" A#® J.Torrellas 6> ti, ^Toward a Cost-Effective DSM Organization that Exploits Processor -Memory Integration j kSLTSSSSffofco 7D -fe y 4" © cf1 ifi tt > PDA PIM(Processor in Memory)7r kffllutSt^i£*>, PIM 4$) V'Tk'®J;di: DSM S$S1-nii, DSM AsHS-eS3; ^ k S SPLASH-2, SPECint95, TPC-D ©z<>^v—7 tC J; *3 -> 5 a Iz —-> 3 Dfeo il — X 4f— T.C.Mowry E>ii, E Software-Controlled Multithreading Using Informing Memory Operationsj V 7 h 4 :nT$lj®fflVA4"^ Iz ~y ir t viv«i/>Kll, y^E V iz^^y-y>-c kAsT-$5 kt'd #A&#^t®®, 7D-E y 4h#hK±, l/iy%^77d'A&*@< f SlD'-H^xTisSilUttSo *fc, 1 c©x 1/ -y P©^4^E,44^#*tcW:, yNI/4- x iz '>ft >tn,z&t) £tt-&®fI4 u*t'** s*iST*fe.5o ;h?.ffl)Sc*ll- rn-tr iitHEH, @t#©S© (ISCA’96 -Ctl3g4U fc A- -V -y Sz a = * * 3.-97 □ V => h zxfix^fcto®2o®vyx4ffltu 7T-a®lt5EiiaoE, lz y h* 4W *3 @x -5 ToiS (E*2 X V -y P/ rn-b'y+f) 4lg$U/c<, SGI Origin 4T7u*;i/h U SPLAS-2 C 5 1 U-->3 > X, 4/7 07 7') 7" —'> 3 >l:ot'T 10~14%®tt|g|6l±^f#e>ni 3/7®77'J!r-'> 3 >Bu i~2%®tttBti±;W9E>to3C k6«S5Ufc0

(5) SE&autfflfllr®

4-0®si*s* dt, #ic*#®e%-ea, stsi-e-r <-{®x-5 »E^A45%lcfi:t>nTt'?,tS4a5ttoTE!$U7c<, ^ET-tiy «x« rl;'®7D77Ai; M7D77 A40KtC^E>-&5 fciWtfJft'j tft EPowerPoint^Excel^Java-® ®7D 77 At ffflggX V y P As<$ftft-tV'3©;tPj E x lz y Ptt*60±® teiACIB f^e,n-cv>5,®A\ znt&xu s. >y±, jFc$ DME>iittV'*fflAsSv>o cfttiu c%m.izmtixuz,tuom&t>*.gub%z.t>tiZo 8f%M%®*T-tt* < , C©ft -5 UAc* <-<$x5ft-> %m%M%4#4' UTV < c kA\ S7*6©jS8?4iE®"EV < fc®(c*e@-eafe-5o

(6) Program

The HPCA-6 program is listed below.

Keynote speech
  Relaxing Constraints: Thoughts on the Evolution of Computer Architecture
    Joel Emer, Compaq Computer Corporation

Session 1: System Architecture Tradeoffs
  Impact of Chip-Level Integration on Performance of OLTP Workloads
    Luiz A. Barroso, Kourosh Gharachorloo, Andreas Nowatzyk, Ben Verghese
    Compaq Computer Corporation
  Toward a Cost-Effective DSM Organization that Exploits Processor-Memory Integration
    Josep Torrellas, Liuxi Yang, Anthony-Trung Nguyen
    University of Illinois, Urbana-Champaign; Sun Microsystems
  Impact of Heterogeneity on DSM Performance
    Renato J. Figueiredo, Jose A. Fortes
    Purdue University

Session 2a: Memory and Cache
  Design of a Parallel Vector Access Unit for SDRAM Memory Systems
    Binu K. Mathew, Sally A. McKee, John B. Carter, Al B. Davis
    Department of Computer Science, University of Utah
  Modified LRU Policies for Improving Second-level Cache Behavior
    Wayne A. Wong, Jean-Loup A. Baer
    University of Washington
  eXtended Block Cache
    Stephan Jourdan, Lihu Rappoport, Yoav Almog, Mattan Erez, Adi Yoaz, Ronny Ronen
    Intel Corporation

Session 2b: Networks
  Flit-Reservation Flow Control
    Li-Shiuan Peh, William J. Dally
    Stanford University
  Performance Evaluation of Dynamic Reconfiguration in High-Speed Local Area Networks
    Rafael Casado, Aurelio Bermudez, Francisco J. Quiles, Jose L. Sanchez, Jose Duato
    Universidad de Castilla-La Mancha; Universidad Politecnica de Valencia
  Investigating QoS Support for Traffic Mixes with the MediaWorm Router
    Ki H. Yum, Aniruddha H. Vaidya, Chita R. Das, Anand Sivasubramaniam
    Penn State University

Session 3a: Multithreading and Microarchitecture
  Quantifying the SMT Layout Overhead - Does SMT Pull Its Weight?
    James S. Burns, Jean-Luc S. Gaudiot
    USC
  Software-Controlled Multithreading Using Informing Memory Operations
    Todd C. Mowry, Sherwyn R. Ramkissoon
    Carnegie Mellon University; ATI Technologies, Inc.
  Dynamic Cluster Assignment Mechanisms
    Ramon Canal, Joan Manuel Parcerisa, Antonio Gonzalez
    Universitat Politecnica de Catalunya - Barcelona

Session 3b: Shared Memory
  Design and Performance of Parallel High-Throughput Coherence Controllers
    Ashwini Nanda, Anthony-Trung Nguyen, Maged Michael, Douglas Joseph
    IBM T.J. Watson Research Center; University of Illinois, Urbana-Champaign
  Coherence Communication Prediction in Shared-Memory Multiprocessors
    Stefanos Kaxiras, Cliff Young
    Bell Laboratories, Lucent Technologies
  Improving the Throughput of Synchronization by Insertion of Delays
    Ravi Rajwar, Alain Kagi, James Goodman
    UW Madison; Intel Corporation

Keynote speech
  2K papers on caches by Y2K: Do we need more?
    Jean-Loup Baer, University of Washington

Session 4: Software Techniques
  On the Performance of Hand vs. Automatically Optimized Numerical Codes
    Marta Jimenez, Jose Maria Llaberia, Agustin Fernandez
    Universitat Politecnica de Catalunya
  Cache-Efficient Matrix Transposition
    Siddhartha Chatterjee, Sandeep Sen
    The University of North Carolina at Chapel Hill; UNC Chapel Hill and IIT Delhi
  A Prefetching Technique for Irregular Accesses to Linked Data Structures
    Magnus Karlsson, Fredrik Dahlgren, Per Stenstrom
    Dept. of Computer Engineering, Chalmers University, Sweden; Ericsson Mobile Communications, Sweden
  Reducing Code Size with Run-Time Code Decompression
    Charles Lefurgy, Eva Piccininni, Trevor Mudge
    University of Michigan

Session 5a: Prediction I
  Decoupled Value Prediction on Trace Processors
    Sang-Jeong Lee, Wang Yuan, Yew Pen-Chung
    Dept. of Computer Science and Engineering, Soonchunhyang Univ., Korea; Dept. of Computer Science and Engineering, Univ. of Minnesota
  Branch Transition Rate: A New Metric for Improved Branch Classification Analysis
    Michael Haungs, Phil Sallee, Matthew C. Farrens
    University of California, Davis
  Combining Static and Dynamic Branch Prediction To Reduce Destructive Aliasing
    Harish G. Patil, Joel S. Emer
    Compaq Corporation

Session 5b: Parallel Systems
  The Effect of Network Total Order, Broadcast, and Remote Write Capability on Network-Based Shared Memory Computing
    Robert Stets, Sandhya Dwarkadas, Leonidas Kontothanasis, Umit Rencuzogullari, Michael L. Scott
    University of Rochester
  PowerMANNA: A Parallel Architecture Based on the PowerPC MPC620
    Peter M. Behr, Samuel M. Pletner, Angela C. Sodan
    GMD FIRST
  A DSM Architecture for a Parallel Computer Cenju-4
    Takeo Hosomi, Yasushi Kanoh, Masaaki Nakamura, Tetuya Hirose
    C&C Media Research Laboratories, NEC

Session 6a: Prediction II
  Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors
    Andreas Moshovos, Gurindar S. Sohi
    Northwestern University; Computer Sciences, University of Wisconsin-Madison
  A Technique for High Bandwidth and Deterministic Low Latency Load/Store Accesses to Multiple Cache Banks
    Henk Neefs, Hans Vandierendonck, Koen De Bosschere
    University of Gent
  Trace Cache Redundancy: Red & Blue Traces
    Alex Ramirez, Josep L. Larriba-Pey, Valero L. Mateo
    UPC-Barcelona

Session 6b: Parallel Systems Performance
  Evaluation of Active Disks for Large Decision Support Databases
    Mustafa Uysal, Anurag Acharya, Joel Saltz
    University of Maryland, College Park; University of California, Santa Barbara
  Investigating the Performance of Two Programming Models for Clusters of SMP PCs
    Franck Cappello, Olivier Richard, Daniel Etiemble
    CNRS, LRI
  Performance Analysis and Visualization of Parallel Systems Using SimOS and Rivet: A Case Study
    Robert P. Bosch, Chris R. Stolte, Gordon W. Stoll, Mendel W. Rosenblum, Pat W. Hanrahan
    Stanford University

Keynote speech
  Networking At Home - Directions in Connected Computing for the Consumer
    Kevin Kahn, Intel Fellow and Director of Communication Architectures Lab, Intel

Session 7: Novel Architecture Issues
  Register Organization for Media Processing
    Scott Rixner, William J. Dally, Brucek J. Khailany, Peter J. Mattson, Ujval J. Kapasi
    Stanford University
  Architectural Issues in Java Runtime Systems
    Ramesh Radhakrishnan, Narayanan Vijaykrishnan, Lizy K. John, Anand Sivasubramaniam
    University of Texas at Austin; Pennsylvania State University
  The Best Distribution for a Parallel OpenGL 3D Engine with Texture Caches
    Alexis Vartanian, Jean-Luc Bechennec, Nathalie Drach-Temam
    Paris-Sud University
  Cache Memory Design for Network Processors
    Tzi-Cker Chiueh, Prashant Pradhan
    State University of New York at Stony Brook

- 143 - i.3 = 7

1.3.1 «S

=i>m--inz£z±Mm, siffliistrtt, m± ■ **#77> hD. #s-^«rfc*fl?®©sijmiri6iit'r*s»®S!iA5iiS$n'rv^o ^©fc»N Hpccmgh Performance Computer)/x — F >) 17 C >ffitls© In) _h i iltiC, V 7 h 7 x 7 t; =t 5tiffijaa©^jifb4fj d ni^-es. s o HPC rplilzttBESfflv^ 7 D 7Dlr 7-S-& jE^iJCgiE Lit 5 7 K V'sW-n A5#® AD ITU5. bfflJ^&rfriffflftStt, 7ot 7»©rnmfbK#A^g#l:&-j@,

3 > t" JL —7 *sj|| — 7P -fe -y +1 £> 7)bf7D -fe y b"tb> < iigtcafe D . 7;17 r/n-fe -y+f3 vea-iMi. ESb tt64fS6$x%S S*Iig*C*oTU < Ck&^b71\^. bfrb&Ase>, v;v^-7n-b -y+h3 ytfji-j'->XT-AC.i;-S*36?iJ-fb3 >;H 5ti\ SttC*UTI4. *fST-S5 7n75 3. >7Blg»s#ST-S ti . #HT-@toTfibVIC < U b t k#ti\ M7a:T©@#&+a-l:3l$fbf C ?)l/77Pbyt3> (a#@#l:*T677U 7--> 3 VSgft^fflfgBW&ffiig tb) tt. A^o-c

[Figure 1.3.1-1]

- 144 - c©fc», 7-9-3 > if 3.-7 ->x# A©®!"## $ &itg± L, A"-og|8btt 6E6iei6-Sj£?!Hb3 >/W 7KiW©giJ%As;lU$k&oTV.5o iRffl7:®uJp-tU7'n'/7 5 >7'ltg£A;bk L, S£jfe©3fi?iJfb3 > ;W LtSt 4 6. A" LtoSt© 5> ti;fc@Jl#fit;#|i6(;:7'D y7A6fl-|!|-t-5 73-$-C*tt*< N 3 3Asgtttoic»jeA^®aStotc5>8!l VX-iryn. -'J>y6f7-5,7-3^h75r-A7V-Ajg|!)jfi?ij{b3>yW7ttrjfflE^IB%6ff3„ kft6©#^uK#C4 b , A#ft PC (Personal Computer), WS(Workstation), HPC ©EblEliJSAitk&S x'll/AT'D-tz 'yD-3>Ka-7v7#A0®©#-f$£|n]±$-9-.5 kk*C, SDattlbSfgtiSti'^, cm:4 b, fSlgaf85eiie»5t:* ‘v>T, *18 ft PC, ws, HPC rfi@©ggg • y ?SK»5tcH)iUAc«SEEfflSiJttl4Sto, PC A" 6 HPC CS-6-IS 18$ SB, #6©3>if o.-^&@mL#a^*&e7$<©m#, s e.cii, b*&t-*-7 (««) -sc $Ac, *««©**«, i«, fiKlc#T-So

1.3.2

r P*n>7 P #Mfb3 7/W (Dr prs>7 P»Jfb3 wt •f 7ftSi,*J;y-'(D3fi^Jfb3 7/W 7©tt|g|fflS«©gg%l:ot^TE^Ig%&SIS1"5o

(l) 7P/^7A P#9Ufbn>/W7*%M%

%*©#Mfb7 >/W 7K#H4, #-eS©H6?iJtt©*S7"D *7 AA^lftm tfJffl L Xi^fzo LA" L, n-pijiytf, '^77*? 70-t^tli'e,, SHSiELfct 7 Pl/S^i^-A, HPC->7t=-A k#«fb UEUfb LIT < 3 k, %*©# -*£6&MSk Lfc3£?!lfb7:ti:, +fl-&SI%ittSg6 ttifk kAs7:$t\ cf%6©/\— p 7 x7t;tii£ < g»36?ijfb7 ww 7f£S©lFSrAs#toe>ftTV"3,, 01.3.2-11;## 4 7 £, HPC'>7# A©14t£B:, SrVW7ft« («**fb k BfflW $ S©W$fbt: =k;6ilf*T'©f£lbi6i±) k, #)SfiE t^i' 9t|ffl7-^fl'ft) C±D, ^©ai6tt|gA s|6l±L-CSTt'5o -#, Hf1077 Ufr-->3 >S*lA"L7cl)?f©StgT*fe^*affitbtt, 0 1.3.2-1C ## b’n. — 7 AsitSt ©^7 P;P7"n-fe7ft&P)SffCffl7 7;u7"^7 P;pfbgPi,a###l4m#mil:#tPlcAL — 7jT5^®ttfbttTAso fee $ 6 )C, 1990#Su#t;l4, MPP(Massive Parallel Processor) kff«n5#!67^7-D-b 7th&f5©#-7'-T-SELAc7 >tfn.-7 Asgli L, -StCS iitt«gl4|fi)±LAc^©©, ^att#bf4St% LA"iifil7: A"o tzo 41c, $#7:141006

- 145 - £ TZ (D 7° n Hr y —^

ccDcko^##®^a, ^;k p## ^ ^ > - ;i/-y - &*yo v mmn(Dmm%mm$}^\%& t sm'??v j &o Ctl^^Ds H^Jt4fb$:fo]±$1i:^ tt&lz, C^P'OTi^ }$o fc:x—if ^&/b>o fc x^v^yn-b y if rj > b° — ^ A©f'J

ammaemmm

[Figure 1.3.2-1: supercomputer performance by year, 1970 to 2010, with axis marks at 10GF, 100GF, 10TF and 100TF. Plotted systems include CRAY-1 (150 MFLOPS), CRAY-XMP (800 MFLOPS), CRAY-C90, CM200 (64K PE), SX-4 (N), SX-5 (N), VPP700 (F), Origin2000 (SGI), RS/6000SP (IBM) and the ASCI challenge target of 100 TFLOPS in 2004. Legible annotations: 10-50%; for MPP, 3~5%.]

y.T©a*3„ a. giiT^fyi/'f >jte?u-(bft^i©iii% (Fortran, CH) T-® frtl fc V - 7 7" D V =7 A t & 0 , #M#:am@g*%KkT6 77 7 h7;i--A7 ';-&6*ja£?!Mbr! >;U 7©S#gSff ■5 o #y*toCti\ AS@fl-6m@tolci6ttJ»^iy Y > (*i@) Cfl-gllU 6«jto£ lbKFi%Sg%-f 3=, ;;■?«, 7-?m&ffi9mWu jSStoSUfSE, ®, X-yya-U >'f&mTfiOo &tis ,'ti*ffl©3e?Utta3ji8lgtt5*®Clj:, *S*P toft 7d y 7 5 > ^8I§ («iia OpenMP©*®®#) 41fH'&o

- 146 - Fortran HPF, HPC++, Occam, Linda, Id C, Pascal Fortran, Prolog, JAVA DFC, SISAL, Valid

~r — *? $5 M * IrI OpenMP^m, (##)!/

^ t V

...... ------7 -y y h 7 * — A V i/ > SMP (Shard Memory Multiprocessor)

i.3.2-2 r hviyx

- 147- «iit a-Rm*#©7 oy?yo77-f 6ffl^TM®{b6H^MISS©7-7 y h 7 * -i>7 'J —ftafi^Hb^ji-——;i/©|!%&f7-3o El 1.3.2-21;r hv<>7 h >;W ?0A##aEI, Eli.3.2-3l:m%M*M#& sf. ai.3.2.ii:flsijerjtcHr-B^Mi$@6^v, «7©K#ciwf Ell.3.2-4~1.3.2-7t;^-ro

#M4b=i>,W9 ------.x

V—x3f □ U HPGJz-eSi$^^JttrU»Jl?U'l4tEid!SI5^h3EJ5R(OpneMPW)'t?titiA0

01.3.2-3

m. 3.2-1 r bvx>x h#^j/u mm# 7D!77i>CSS##Ci v;v-^7" w ^(DW-tL^CDMLmtCD £: 4>J ffl o ®S^eSSCB@4)-S!l LI

§r^!4o

» a ft © k ib tc li; ^ - -y* *s L& < T &# g Stj-r — ajj^-r^^afebo S4o

x — V >*?j^#^##4o pfa. $>^v>tt$jBg»tsE »65*70^7Ap»tiS >;W 7 7 4 — H 9 is H ?§ih iz. fij ^J 4“ -2> #c#&##4 o

- 148- (2) afcTlHbn WW 7©ttSgFfflfitflS®Sf%gB%

3 > £ a — f '>7.9- SPEC ^0Ufc!0-2>^>^x’ —

3©AS—IBtoTfcS A^ Cpt6©^>f-y —±Ica—p >>x7©ttfi6£jffflj-f&fc fetl*Jhfct0t!fe5o £©£©, am-fb3 >yH 5ti«©&JE»ttt6FFffiku-5|g ££i:liTg&V''o A>yy — ^©tpCli, %*©$ — *i@ j£¥U-fb3 >;H 7-ElJlJy\- p 7 ^ r © t;-7 ttlgCSV'ttSE ASjSjST- $ 3 t© (M : SPEC CPU95fp ^>^y-y© SWIM 70^7^ iSC jm-fb3 WH 5 As

U < £>Si8)b Sff 7t ti a - p C ty±©#l%C Z bttib|p|±* 5HSlT:fe-5 ta-^^TAi;Lt0gft ffi^gWt bft'Ofx’-7T-H:I¥ffiT*S tiWtK ^^©y;uy-T"n-fe'7it3>t"i — ^ ^yyA©6filflec j3V'-ttttt|g|6)±tcW^-f ^ 3 >yw 7tSE©tttb6&iECfffl'r -5 ««©«* #-i:-g-e & 5 o SMP(Shared Memory Multiprocessor) ■> 7 ir AL, WS^HB^flS® r 7 HA >7 pam)b3 ww 7K#i©gg*j ic.tn>TBB%1~-5 gumu^yp'f >#¥U-fb#©N 6t©l:WT©R%^*&R9» UtmE^SSmStiA KTffl il t) O a. «3'J##b*m#@©^* jfc¥ij-fb3>yi¥ 7A^e3flgij«fb©Slb6l¥«f ^>^$6e^gs%'r^= g»3-;pyy W >3fiWbfi:Sn e«ttSlffftSn Stt^-j'fl-fR 77yi-'j>ye*^flsw«gsff/W 7©KA##b&''\— P ■> xy©#t$^>ttl6l:tt#;'ti'Ti:fffl't5^i£ &E^PS%t"-E>o #^##©6©©A>f-7—73 7? A&#g& (@t#© ^>iP7-7«77'J 7-7 3 >A>€>©iliR§#it) U afiWbttE** i: LT©H;-6'tt 6. a i.3.2-2 . a i.3.2-2 #^Yb3>/wy©%##m%%i:Mf mm s am-ib 3 >iW7iam-(b &ffis r-tfc&mtiitkWu mme/mimm, 7 7 a — V > »‘SES © fl sto« S ff mf6^*&#Wc #¥Hb3 >A¥ 7©ig;^tt n yj\ 7\-P7ir©«6£Jptt(g IbfffflS® ttt#-&f"ca£?ijft3 >7i ##o d’3©iEirtt*g6Fe'r^

- 151 - 1.3.3

HTi:. zmf&BmZmto

(1) 7 hvt>7 t-3£?!Hbn>7W7ftSIB%

a. v =7e.v»sma#t-s3i o CT&y? v h 7*-A7 V-&g«jv;i/fL7'V'f >36?!Hbft«&{i*'t•&„

Eizt"-Bo c. *E^gg%-c*ig%Lfcs*sffei-^fetoc^ m*$nfc*9!iffia3aifigttsi®

ff-T-So E^|g%Jlg rjfe^ij^bn >/H 7 0tt|gfffltt®ogg%j tj; ■o-tfB/ESnSfffl^SSfflVi-r, #«©S&3)*»fflSMP(Shared Memory Multiprocessor)'>XTAICj3V>T> 5$S©S—SSTcDilt^iJttSltiHSff 7 gHj3£ ?ij-fbn >;W 7©tt|g£ttSt l, 2 )gjy.±©6El6|±6il)S'f 5 c t 6 I^fc-r-5,

(2) #MYb3 >/W 7©e##@iK*©m%N*

ry pm>x b#Mib3 >/w ?ft*©m %J ©ifffl©HJ(fi§a tt. SMP->yy A6*ts> k LfcatM-fbn ww 7© o

1.3.4 ffiftfflftfoi&l

7 hvi>7 b3£?iJlb3 WW 7ftiri5i*&*iit4CafefcoTfflSlf^l8J6«:©l ($) 6 KTICyfo

(l) #*@ag

$to»E^8g%©lE*61l-s<, zcDtztb, (7n yxy b V-y-) ©PCS • # • *«# • *ttoCE^gg%6HST^6a; (Stia 50

- 152 - (2)

O) m'pfflpim

MQwmM&mmtz

(4) *nwafais©av'

MUS4=t;*4bfe*n«Meistco^rtt s f t, RiftgtcS *0 k't^o

- 153 - * 2* 2S >ymm

2.1 ms

f\ 2.1T*lAt$(9-tS(3>Li — -T'f >yirHb-6E%Sg#S©Stt6 $ kto-5„ Sic, 2.2? z:/? Hc«L?H16l8$ef7o/r®$S$kto6o ## 2.3t;i3UTi£«^it3>ti-7i^ >ySEA ijSfflW*B*T7 -';'?-'>g >#»£ -3^?#MUX:0m&7nf o

j£«»ic3 >ycMt)58f^sg%©®tt?tts k Lt. «^SE$g^»^0r6t-C>CE^a%»sfi;tonTV'^ rNinf7"n -7 x 7 f j N >7-'|'>737F7777T$,.z> TGUSTOj , *5 Fffl rGlobusj S5tb±lj ‘\ SttfflfMSfB&i&Stf' 1--5o

£S«dF!(3 >7'to«KMt3to»i|j|u]I0*f;::fcOTB\ M^SI^ISSIS m^mrnmmn, jsiwjisaneiMnyfa-f-t > tv >777 f^7 •7-k rGridj ©. #lr*BS*-C>k LfcSttlcgat--Si8$Se,-StoCff ofcE$&$ k to^o w*7tt, ®t»I*ox-;f “3>i;i-^sifiiltt*»;tIt:|tJ^l rSupercomputing ’99j. ©Grid -&#&S;©3>V —7 7A r Grid Forum j. ©Java 6ffiofciWttfE>16 9 lJ4)'iS(3 >b°rL — 7 J rJavaGrande Portals Group meetingj. 7*7 $ 7 F fitlnl £ SI ?U • jJS( • J[£tj8 Jo <£tfflittlul+StCHBf JeI rinternational Symposium on Computing with Objects in Parallel Environments _K ©UC Berkeley © D.Culler |tj§A5[t,,C| k&b^-^W^ptj S'11(3 > t° n. —-r J > 7* d” > 7 7 7 Ibi'f-tS rMillennium 7Dyi|) h j, ©l£tl53-i(3 J L^lC^LT^-dk^y F ffl®S!l£IRfct r Globusj, ©NCSA tfd'i'itiotlSiS ^##1%#© Grid ##7D 7x7 F ? & r the Alliance 7*D id5 x7Fj, © 7 3©±g&##. doJO'T'Di/z^ F£*fSik L£o iS*©iSS. 4-JPiWl4fElt*iti:ri7 7d'A77 F&iBC LT*b. -rt-tc Grid Forum kPflifl-S Grid US®/cto® IETF(Internet Engineering Task ForcejlC jgo As Md&©®J£doir>'«|g©*d|i • SftS;J;*&lfoTU5cil)S|iJlibfe. $£. -f >77tFttt3;ET-( f-f Fife 7 7 7 7 £. iS&#Wl&%* 7 F y-y-n^Hb. MtiSE7*7 7 -f 7 7 7iS*/b>6> PDA f;t©c3SS*& Web &k'©d,> 77&S|t''T'b—L7 L. ?©1©77'J 7-7a iSStff] +F—tf 7 *?&©$(<% #&2FtT©&o cdt(,®##lcFb^. ®ASH®^7 F ;F7v^3 > 4>'D±#©7-* —tt#*56ftCjgft£ ko? liotl'SC k Uto

(KlSfl-En > e^-T- J >7®*#k UT HI" 3 7 7" U 7*-7 a >fl-SF:o®T©to WCdot\TK. *^to».®.-a-»3SRffs ¥ffiE*#s SisxStofL

- 155 - H%6jiito'T V-Z> rl£®^&i!C3 > t’i — =r *< T-6 -ir — ix 3 >©JBfgj SIMilT

*E%gg%T-ti, T7-'Jtr-->g>kUT SDP LTVSo SDP ®@(¥iE stt7-d ^7a)«, #m#:+6sissiji:*?>z u mms lC5Rto-a^@T-$.D, %#©##$UM%k'0A^6s TV'.&o LA^l/x E^©#S®lEI+»k'ICj@fflt--5fctoiCtt1 #-©*-71-3 > t"a- %;+#### o, c^*-emmfb#?w#g-c6 7t. c©6&. sdp R@&6#a-#K3 >yc*ftE$itT@ lWjSI+*6^S1"-BC k-c, cii$T- c k&gmLTt,^. sdp ngeoMsybcjsvrtt, sdp 7D^7Asss®^7^-^sfi vt . 6 ki'3^m&k^6©x & wi^bt^T-s, £ia»a3>bi-f^>yffl©T7"vtr-->3>kVTjsi"5ck jfit>fr^>fZo

- 156 - 2.2

2.2.1 iSaM^VKi-r-O^OStt

J>£« #g( 3 >t" h* y h 9-^R#©* M L, x — ^ "C& C Give & Take £ £ b rSffl £ *1 3 MMY & Z>

aG'OCD^

;xij7^y h7-^^n->!j;i/xij7^'> h7-^S:^:a U/z^D—

t LTJ>£ —b&E^

• Akenti http://www-itg.lbl.gov/Akenti/ Y A ]) ti

ty;kft

• Albatross http://www.cs.vu.nl/albatross/ ^ ~7 > &

f bo

• AppLeS http://apples.ucsd.edu/ ~7 ^ ]) il yy V y —:/ a > — V >y^:#B L/zE^yo S/o: /? bo

• Condor http://www.cs.wisc.edu/condor/ 7^ U A ^mE@^fi/z$m(D9-^y7'-i/a b&#b#&fr&o

• EuroTools http://www.irisa.fr/EuroTools/ 3 — D y y @1 3 — u b 7 — Y t Z> fctfXD'T’u i/xY b 0

• Globus http://www.globus.org/ T ^ ]) il T A U MfeWLk^^tzYut/ xt? bo y D“^'/i/3 > b° jl—r -r >

- 157- • Grid Forum ki&vJMwmgnMDXumjixgi ^D—>tf 0.-7"^ >y^7^A0#$

• IceT http://www.mathcs.emory.edu/icet/ T 7 U tl

• IPG http://www.nas.nasa.gov/Groups/Tools/IPG/ 7 7 V *

f 6#^yDS/o:/7 ho

• Legion http://legion.virginia.edu/ T 7 ]) ti 7^%vn >/9 ——i> 3 ty f ^ c aco-r ^Av7 h^^yyo^j:/? ho

• NetSolve http://www.cs.utk.edu/netsolve/ T 7 U ts

-f7>h "th—/^7=";i/C^'d

• Ninf http://ninf.etl.go.jp/ B ^

f 6C acDTr ^6S/7TAo

• PACX-MPI http://www.hlrs.de/structure/organisation/par/projects/pacx-mpi/ Pd' 7 //D-wi/:]>trMPim#7^Vo

• UNICORE http://www.kfa-juelich.de/unicore/ K 'i 7

C^l6(DyD7ai<> h a&U'Tia&^o f ^t^^T(DyD^a:7 h^f ^T^fUD%#!j&@oTl^o > U:L—f-^r j; D, f ft

£13 2.2.1-1 \Z7Fto

[Figure 2.2.1-1. Labels in the figure: Application (Problem Solving Environments, Science Portals; Environmental hydrology, Chemical engineering, Cosmology, Molecular biology, Scientific instrumentation, Nanomaterials); Application Component Architecture and high-performance middleware (Web tools, CAVERNsoft, HPC++, Condor, Resource brokers, Many Worlds, Numerical libraries, Legion, MPI, SWIG, NetSolve, DAGH, Ninf); Architecture Components (Accounting, Communications, Information, Scheduling, Fault detection, Security, Instrumentation, QoS, Data access); Resources (high-speed networks and routers).]

t a > 5 W??;!)* US/v (eft*3 #atbfc9Us

Utt'-So ± SB U -> t A A © # < 6 s c© 5 •k5. n/'J-Ir-va >©ff *EIC|6*t'|Sg©^Sfllj-S: UTXA£T$>bs 7D^77tf(iiJU: k £ & 5 @#5-1?$) -5 o b©5 HlV x: yStc-ISBf-S t1 A© 41* 6, jilHJtfettfe-SIt SXi®$T^-bTf-5fc©©ti:iE^6tg#t uru5 Ninf ks &?!l7‘a?7 lUffl $ ti5 ->< s'-fe — y MfS5 'f T 5 'J Tcfc5 MPI S^D —niPn > bi — x-f >^11^ liglj-CHSUfc PACX-MPI CHUTg^-f 5o $£s *Bt©7-y<-;il(7x-*SC|bt I-WAY y —;i/#gS%£ftTGlobus ttcfte,©7-;nfs k o i kKi&7d ii x£ s t-$> •is 1997 ¥C0lfe, Globus 70^1? htt^D-11113 > Ki-f-f >^®V7 K)x7-f>77xi 7 •5fc®©>A'-;v$ik vrifi^sscag snrv' •5 Globus Metacomputing Toolkig ©®$6tt 17© k Us SlflfiST'SS < ©SlU^gl UT S4 5 h,ll')x71i:(4Bt5^

(1) Ninf: ^7U

t' M9> 5 T-it - n > b a. - 9 (T ii 3 ^ X 9 '> T t U & k"©*tttg8t@ i/X^Atts ■?■ 5 Atl & tl5 & ©Tti&U'o B*TAfjT6©S/ATA&#A UT l11 5 dfeHs 5H^x#^s Aj tlT t'' 5 o J6fl^iW't3:tbstW7/X'T‘A©^^ltj:s

- 159- a^w#a^6o Z=A\ Cft(d:^bTW:^<, y^ V &#x.UL |5|#& API tcmO^C^COy-f b^^ c^7c0%7k^6M#^yy tyf-&$6 wm-r^^o yyyA. cm b^^(:&^#f##m(PC ^9-^yy-ys >)±t*ji—if-r >y 7 x-7^ti#t u ;^3 >&¥)%m ffl if & '> 7 :f A 0 d. b £ Datorr(Desktop Access to Remote Resources) 1.1'' o o Ninf (i c®Datorri>y^A(Dic-e&D, #%

E^^rUxTU^o Ninf ©M^^TAti^7^7> b /ft-7^ y )l £S”3 ^T iSHf $ m2.2.i.2 Ninfy7^y>K Ninfy-;^ 7 £ tf — M'® 4

y □—/Syi/i]z/tf .%—T't^y

[Figure 2.2.1-2. Labels in the figure: NinfDB, Ninf Register, Ninf Computational Server, Ninf Executable, Ninf Procedure, Stub Program, Ninf Client Library, Ninf Stub Generator, IDL File, Program, Internet.]

-3- — If U: 7 "7 T > b y D 7'77 A £ (X C++N Fortran^ JavaN Lisp h V'1 o coycy^ < Ninf ——ys>#)±7!#r#L, ^y^7>b^^(DU^3LyM: L.-ecD^^&y y^y> bc^to 7 7^77 b#Jy(4: Ninf_Call 7$:#^ LTy-;^C#LT U 7%-7 b &

- 160- mmtzo m±uc8i®i^ Ninf_Call("foo", arglist); ktWMSiWtil L4f?&7 k, X 7 M r7-i'±CMia$li'n'l. Ninf -tt-x©* f L, f 0»-/i±07l' 7" 5 V"foo"4 WTR t. mex att-x4%xLtt'@Ai:ii, ninf://hostname[:port]/funcname ©iWC, URL tcapLfclB?iXSIfi:1W'9--x-©;k7 M, *- MSUW -f 7*7 <) 4fl£ t%o Ninf_Call ©ffetc*, !+S©%fi;kilS$©Efi6fl-fflLTff*7Zci<)©MEJV h 7 > tf 7 -> a > ® a * * 7 tz isb © M S * if tf it #t$ ix X © 5 o c © i 71: Ninf 777A(i, f:t®»-U7 4ig#f ^#j(»-/\)j c*fu #7L©% l(^7-i7> M)tf 6lt®#$&*#L, k tz^mt 1-i6->X^A7?dfe5o Ltf t, •9--UXA siex.7c@'6'T-*E*© SunRPC Skligft D . ft- 7 t fl] ti it T- S * tf 'n x. 3 o
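The two naming forms that appear in the paragraph above, a bare function name passed to `Ninf_Call("foo", arglist)` and the URL form `ninf://hostname[:port]/funcname`, can be told apart with a few lines of string handling. The following is only a minimal sketch in Python, not the Ninf client library: the helper name `parse_ninf_name`, the `NinfTarget` type, and the example host are hypothetical, and what a real client does when no host is given is not modelled here.

```python
from typing import NamedTuple, Optional


class NinfTarget(NamedTuple):
    """Hypothetical container for the parts of a Ninf function name."""
    host: str            # empty string when the name carries no server
    port: Optional[int]  # None when the optional ":port" part is absent
    func: str            # name of the remote library function


def parse_ninf_name(name: str) -> NinfTarget:
    """Split 'ninf://hostname[:port]/funcname' into host, port and function.

    A bare name such as 'foo' is returned with an empty host, leaving the
    choice of server to the caller (an assumption of this sketch).
    """
    prefix = "ninf://"
    if not name.startswith(prefix):
        return NinfTarget(host="", port=None, func=name)

    authority, slash, func = name[len(prefix):].partition("/")
    if not slash or not func:
        raise ValueError(f"missing function name in {name!r}")

    host, colon, port_text = authority.partition(":")
    if not host:
        raise ValueError(f"missing host name in {name!r}")

    port = int(port_text) if colon else None
    return NinfTarget(host=host, port=port, func=func)


if __name__ == "__main__":
    # 'server.example.org' and port 3000 are placeholders, not real Ninf servers.
    print(parse_ninf_name("ninf://server.example.org:3000/foo"))
    print(parse_ninf_name("foo"))
```

Keeping the port optional mirrors the square brackets in the URL pattern; a real client would substitute whatever default its configuration defines.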

(2) PACX-MPI: ^n-/vua>\^^~7-'f XfSSEItCSIScFnfc MPI

y 7-t-y*m&m\ 6#M7'cif 7 5 >746«A-m77 7-A±(:mMT-5tf&tc# $pl)9& MPI 4i£5*Lfct)©tf PACX-MPI X&6« MPI li#*##*)Xt,7 kt$ <# fflSftTVSy yfe-yafi^-f 7" 7 U XS> D, C Sl§^ Fortran & k’XttJfflt-5 k k tfX#e. MPI liMRlIWltStley-^fr 7 7-7Ci£U£#?$-r-y 7-b--7ie{14fi: 7 Zr&lZ£^Zt£ TCP/IP Ci6m#4 L/c 0 f 6©kE#l:6MttS7 77 AtkAlt^ MPI ©fr+IW >tff' < -7tf&3o fe£U MPI tt55*7 ‘D-yt;i/3 >Un.-7-f >711814*1® LT5!$l;SftTU&6© tf&<, to#© MPI S//D-/SA3>i;2-ff>//SIt«fflt4®tt7-/ff/'f + -I.I+*«R8T-ffllijffl^ jSH'>7 7 A±-e y a 77"n-fe 7 4/fefig-r ^ #sSCM LT P##tf& D, $fcttlg©H»i:"©®E#$.-S.= PACX-MPI tt MW 7 © HLRS(High Performance Computing Center Stuttgart) 7 ErtJfS snte, 7D-ytA3>t"n.-7-f >7*a)tEtj-c#ssnfe mpi z&Zo pacx -mpi tfi@#T6 7T 7" 7 V MS© API ^tv>f t 77 ti MPI CMUXt f), MPI 4ffio T*#n^7n7*7A*6liV-73 - MCSlE4i[]X.-5 k k& < , PACX-MPI ©7-f 77 'J 4 U >Pt6 k k Ci b i£8tttB3 >7SJ1X7d 77 AtfSjff T3o k©#6, Ama-mxMEC»%©AW 7 M 9-7©Smt=iSBf«x(i >*-*7 Mx@m$ttxtt&@fr, n&&ft#«i:x»< 7d t7B0Xli TCP/IP fcAJiATSIttf-5 kkCt^o #m#*©#9'J#+#a7- M: ti, -e©;t#a@#©#@xma%7 7t->*#4e6 7k k©x$^ mpi hti'5. fi$®a«k7-MM©*«4K3iJ V, &m&mntztzib PACX-MPI tiH L !t#m ±T- toftw S 7- n -fc 7 ra X ii 4 © It® « @ # » MPI Sftotiiu *»tat ®«±X«J#1-5 7D-fe7MXIi TCP/IP 4EoXilfl4ff7o H 2.2.1-3 tCft*«(*l® k>- MBS MPI ©M#4^To

[Figure 2.2.1-3 PACX-MPI. Labels in the figure: global rank, local rank, in-daemon, out-daemon (shown for each of two machines).]
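The relation between global and local ranks in the figure can be illustrated with a small mapping function. This is a sketch under stated assumptions, not PACX-MPI code: it assumes that each machine reserves its first two local ranks for the out-daemon and in-daemon and that global ranks number only the remaining application processes, machine by machine. The constant `DAEMONS_PER_HOST` and the function name are hypothetical.

```python
from typing import List, Tuple

# Assumption of this sketch: two local ranks per machine are taken by the
# communication daemons (in-daemon and out-daemon) and run no application code.
DAEMONS_PER_HOST = 2


def global_to_local(global_rank: int, procs_per_host: List[int]) -> Tuple[int, int]:
    """Map a global application rank to (host index, local rank).

    procs_per_host[i] is the total number of local MPI processes started on
    machine i, daemons included.
    """
    if global_rank < 0:
        raise ValueError("global rank must be non-negative")

    remaining = global_rank
    for host, total in enumerate(procs_per_host):
        workers = total - DAEMONS_PER_HOST
        if workers < 0:
            raise ValueError(f"host {host} has too few processes for the daemons")
        if remaining < workers:
            # Skip over the local ranks reserved for the daemons.
            return host, remaining + DAEMONS_PER_HOST
        remaining -= workers

    raise ValueError("global rank exceeds the number of application processes")


if __name__ == "__main__":
    # Two machines with 4 and 6 local processes give 2 + 4 application ranks.
    for g in range(6):
        print(g, global_to_local(g, [4, 6]))
```

With this layout, a message between two global ranks on the same machine stays inside the local MPI, while one whose endpoints map to different host indices would have to pass through the daemons.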

PACX-MPI MPI &V-7

JibMPICH-G(MPICH on Globus Device)b it /v 7: l£ M

(3) Globus Metacomputing Toolkit

Ninf ^ PACX-MPI & b;b7o:T#l:{&@f ^77TA&##T^^^)lC(±, 3L-

Globus 7° D 7 ^ 7 b liM^J • % 7 h 7 — 7 N bj-a !J bV>'o/c1il5 ,&^ 6 70^^7 p-c$)^o b 7 H:#ADbTb'^o Globus 7D7 J: 7 b 1 3 (CN Globus Metacomputing Toolkit(J7T Globus Toolkit)#5 ifc 7> o GiobusToolkit (±3.-trmmEi/7^A, m#3^73ib ^0(3 7 If ^ - T 7 7^77TACD##1:^^^ 2 H7##3i 0 (toolkit) 7? 'b o Globus Toolkit #5 T % 7 — )V & I"* T JGV 7 ^^b( ^ bbb 7 ^ 7JH ;i;i/37tfn.-T^7^77TA$:##T6Ck#5'r^^o #lx.Ub M# Globus Toolkit MPICH ^^#b/z MPICH-G(MPICH Globus Device) & £ o luidi® Ninf & Globus Toolkit #5Ji#JT £ Mff!3 ^ 7 3 V ^#b/:^-73 7^^#5#%T6o COTol: Globus Toolkit (±jA^^m371f^- bf7^^#bT^D, 77 b 7 7 37b 3 1998 ^ 10 G ^ Globus Toolkit

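Table 2.2.1-1 Globus Toolkit core services

Service                 Name    Function
Resource management     GRAM    Resource allocation and process management
Communication           Nexus   Unicast/multicast communication services
Information             MDS     Distributed access to system structure and state information
Security                GSI     Authentication and related security services
Health and status       HBM     Monitoring of the health and status of system components
Remote data access      GASS    Remote access to data
Executable management   GEM     Construction, caching, and location of executables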

Globus cn&©+»■—tr^ii/i>Bicj®dtibsij count'sceast* 33 >^©T >X 9 X > $ It** A A5 nJsg * -a T ID -5 ^n^n©«$6y.Ttrsf= a. Hi® It a (GRAM) Globus Resource Allocation Manager(GRAM)fi It St J$ 15(Sftli Eh C 7" □ -b >y ih) & IS af 6 fcto© jESti® StStif t"6 oGRAM |± fork. LSF(Load Sharing Facility)-? 5 Condor ft if. Xo-bx©&i& - @a©ft*lffBDft@&RcTt'3o l+®th—0-9 —ytc *1:13 7" n -b X £ tit© ft*CS If T 1 oiy,±© GRAM Wxli fork C <£ o TXD tX&$.af 3 7Xyft0#fl fork @0 GRAM 3C ifCft 9. LSF © a 7"n-b X 6jgMj1"3 i -5 9 >Xdt 9 ->-6HELTt^»T h t;i5C'Ttts LSF IS© GRAM -5 H if C * -5 » b. iifB(Nexus) Globus Toolkit ©aH+f-exli Nexus tL5I*7? X5 'J (C * o Tfttt $ ftT t' 3 = Nexus API fcltjSU X y-b-vaffl JP«PH^iE§»UtttlLftif ©iWV^;t%XDX7 5. — lt5fel!)C)Sl'5. Nexus liiEliflSCX mm©m@ft*&-9-^- h uxt^c mas* i±ma To h3ii/SidTiis<, -b^i 'iff. anstt. m^E#fte©#*&#*#3. Nexus ©maiimev t, s * 9 . #9 >xa©mas*&#imf 3c e#ai# ifft3o c. tSlfi(MDS) tiilSlf— KXtt Globus Metacomputing Directory Service (MDS) As@S LTl'^oMDS tty —x, os. x e 9 . t x h v — za's KtaisiU'w *>->. WESmflyn h n;v. IP ypvxk*x hy-XyXXDyiftoyy exft'ftk©# mds ii#+®*m©#m^%«t:iMTe#m&%aLs9 t *6©#mt: 7 7 -fe X ~f 3 fc 0© 'X — lb 5?5 API 6$6#l"i" 5 o ##KllCli LDAP(Lightweight Directory

- 163- Access Protocol) k ID 7 t7 ^— 7 ("T 4 V 7 b ]))\Z.'T 9 X't' %> tz &)Ot®*P 7 D b 7" —7 a >7°n7'7 < >7X >7 7 x — 7

http://www.globus.org/mds/ 0^-70 GCI 77V7b&il=amfflU\ Globus b%Ib #b#W:mf &##&#&c a#-?#&<)

(4) £fc>0£

CCC^g^U/c 707^7 b(±c:<—gpco^T&^o 7 U^.—7 ^ >7(:^ frtlZX'&Z o $>®\£ Grid Forum #&£<> ^ CT' Grid h 7 <7)U:j£fi^S(7 >b°zL — 7^>7"£>fcffl©7D7:ii7 b&UCRf f xrlc^^cD 6 U k 10 U $^t/co #An#(± #&07-jr7777i/-7ia-na#9 OT&&0

• Scheduling Working Group (Sched-WG)
• Grid Information Service Working Group (GIS-WG)
• Security Working Group (Security-WG)
• Remote Data Access Working Group (Data-WG)
• Application and Tools Requirements Working Group (Apps-WG)
• End-to-end Performance Working Group (Perf-WG)
• Advanced Programming Models Working Group (Models-WG)
• Account Management Working Group (Accounts-WG)
• User Services Working Group (Users-WG)

>Un.—T--r >^PI?t^^0^y£T*li"FaB ©$£>#& D > tuTftS 7 X* n — ^ x. -2) o

• 3/15-17 7" O — A )l k 7 7 7 7 U > h° a — X 4 > 7"k: [H11~ £ 7 — 7 7 3 7 7° (WGCC2000)' , • 3/22-24 7" U 7 b 7 ;t “*7 A(Grid Forum) 12 ^ 3 4 7ir 4 rb • 7/18-21 4* 7 7 — * 7 b*>7yl/>7 (INET2000)\ ii/P>7>f 3 - 9 H 7 V 7 F7X--7 A. • 11/4-10 SC2000\ ^0777$

1 http://www.trc.rwcp.or.jp/ 2 http://www.sdsc.edu/GridForum 3 http://www.isoc.org/ 4 http://www.sc2000.org/

2.2.2 S^M5j$ri&fnJSl3E : Grid Forum 99, U.C.B., JavaGrande Portals Group Meeting C&tf & tiSS/lnj

A;i/3 > b°iL— y 7 >7", Grid ®, £ 41 fob b 3 !H^£ £<£ftjC f®^^, fTrC Grid Forum anfw:ti6 Grid###/:#)® IETF #f&®ej^^j:77

-<®#^##'r##^ti/:7777&, —7-e^AL, 7^ v77%*^6 PDA Web #2:"®^ H7 L, -B ®±®yyV 7 —i>3 ^#Ag7—Ify tlTV^o G tlGCDib^^tb^, ##g®^7 H)/7^3 >P#;0^#® 0 — ^7 —

(1) The Grid Forum

fK0H ¥$1 1^1 0 0 1 80~¥$1 1^1 00 2 3 0 (60^) The 2nd Grid Forum Meeting (1999.10. 21~ 10.23) '>*d ’ffG y 7 U *

Grid Forum( 7' U v F 7 t “ 7 A ) (http://www.gridforuin.org/ )U:@j£; # ###b #t #, #1# Grid Co^T®#&^®3 >V —7TAT&6o 1999 ^SC&^T NCSA, NPACI, NASA, DOE ASCI & 1999 ^fFCA® 7-773 CCDot,, 10/21-10/23®^, )Km7^^CT 10^1AI±# ADD, ###C#imL, 0*®Grid^#®#fq|&m^L^o #xH® Supercomputing ’99 4b, BoF (Birds-of-a-Feather Workshop)/) 5 HI #^ti, The Grid Forum a v7®#^^et)a^o The Grid Forum ®7§Wj(d: Internet Engineering Task Force (IETF) —7 A It l^/)5, IETF #1# Grid Co DT©7t-7At'fel,o IETF k|a]#, «lb®}Iffl&®$i]7£C £ o X Grid ®77f

4 tAx 'i £: 7°U B: — v 3 > G, ®)##C^3G»T (d: r^tft® n*^(rough consensus)j b ^ n — F (running code ) ® H >f< (o ;£ 0 , #0", ^ tOt ^ 17, ^G'OCai^^^aLTV^o The Grid Forum (±, #%, l^T® 9 ®7-=t>7"TOk-7(WG)T#M2f!-t!7ao ztiztuD wg ii Grid®mij®###ma&@^ o, m#8yC(±0#^tm#Ib^#(RFC &*> Draft Proposed Standard)® Ufe* 0 life U TC-BtlB-fl® 7 — A-> 701/ —7°® Chair, mailing list, webpage, 4o ck 77 PFl ^ ^ ^J #^7 %> o

① Scheduling Working Group (Sched-WG)

Interim chair(s): Bill Nitzberg, [email protected], Jenny Schopf, [email protected]
Email list name: [email protected]
Web page: http://www.nas.nasa.gov/~nitzberg/sched-wg/index.html

— V ——713 Grid

② Grid Information Service Working Group (GIS-WG)

Interim chair(s): Gregor von Laszewski, [email protected]
Email list name: [email protected]
Web page: http://www.mcs.anl.gov/gridforum/gis

Gis-wG a Grid ath-c

XML

③ Security Working Group (Security-WG)

Interim chair(s): Randy Butler, [email protected], Andrew Grimshaw, grimshaw@virginia.edu
Email list name: [email protected]
Web page: ???

-te n. 'Jr>f © WG tJ Grid & I^EE(authentication)jo J: If qj (authorization) Gi §lj ih — PKI I'Zid' >7

④ Remote Data Access Working Group (Data-WG)

Interim chair(s): Micah Beck, [email protected], Reagan Moore, moore@sdsc.edu
Email list name: [email protected]

Grid heterogeneous iZjiZMlZftWL U fz remote data &

~T ' *? * 9fai& M&> fz Data Grid tlx at# Grid i:©jft W t iEfScIn

(transparency) IZ J; C

- 166 - Mitttemmibhzmt %), o Data-WGTUu Ztlt>(D&

⑤ Application and Tools Requirements Working Group (Requirements-WG)

Interim chair(s): Fran Berman, [email protected], Bob Hood, [email protected]
Email list name: [email protected]
Web page: http://www.gridforum.org/www.ncsa.uiuc.edu/People/novotny/apps/index.html

C® WG TUG V /r—>7^ Grid — \ZftLT ¥(D£o %o li®77

⑥ End-to-end Performance Working Group (Perf-WG)

Interim chair(s): Mark Gates, [email protected], and Valerie Taylor, [email protected]
Email list name: [email protected]
Web page: http://www.dast.nlanr.net/Perf-WG/

Perf-WG U: Grid & WG T&&o a <

(2) "En* — — (3) (4)

⑦ Advanced Programming Models Working Group (Models-WG)

Interim chair(s): David Bader, [email protected], and Craig A. Lee, [email protected]
Email list name: models-wg@gridforum.org
Web page: http://www.eece.unm.edu/~dbader/grid/

Models-WG Ji Grid ft7 7'J ©-'> a

⑧ Account Management Working Group (Accounts-WG)

Interim chair(s): Tom Hacker, [email protected], and William W. Thigpen, [email protected]
Email list name: [email protected]
Web page: http://www.nas.nasa.gov/~thigpen/accounts-wg

WG

⑨ User Services Working Group (Userserv-WG)

Interim chair(s): Rita Williams, [email protected], John Towns, [email protected]
Interim secretary:
Email list name: [email protected]
Web page: http://dast.nlanr.net/UserServ-WG/

WG

$ 3 — D y :fe U £7" ]) y K©rS WjtLX^ European Grid initiative(egrid; http://www.egrid.org/ )

v7"(± 2000 ^ 3 B 22-24 (Dm, J

(2) U.C.B., JavaGrande Portals Group Meeting

fKBfl : fhRl 1$U 2B 6 B-fm 1 1^1 2B 1 1 B (6 BR3) IMS : y^' —7 L — TfU +h> 7 7 >7 77 rR,T7 V ;£?

12 B 6 B^6 12 B 10 Bomm, > 7 7 > 7X 3 ££ & tf UC Berkeley 0 EECS (D David E. Culler Wk, *5 - 7°© Matt Welsch fiH ^ U x ^ © #: JavaGrande Portals Group Meetings 4d dr t/ ISCOPE'99 (International Symposium on Computing with Objects in Parallel Environment)(C# Ufcoth> 7 5 v 3 V —& < sUC Berkeley^ Stanford & drt/Grid try 7 7 :r J 7" & Lawrence Livermore 0 ^ -f>7 —$7hB##©U7$7^^/C^©W:U'7m't!t)^U'o^^^ E-business® Ds 7'>7 7>77 7©i##7 3 Union Square t&M (JavaGrande t ISCOPE B*©y*

- 168- LTBieL'3o $,.E>„ ^©fe©Cli^**g+*i: I/O ©tgXnb sSSCr X-bX njggT-fcS-il-S , |+S*f*©5 :'"-Xjp|+**©^ 5 ^ v-->3 >, t Ti'SIsJiSki' >XX X -> 3 XT-SS&S^ifcS,, ^©J;o»iSte&t6#lt'-5* m#^xxf-A©«#&«, $c»B3m#©3>^-*> hfr'ffiscjSELTX^'i'y > h<. ^-©gei&xxxAk v,xmwXrArt®(t7)yXf Aj k Lt, ±#©*M*&&*L6 5 pg»@m»tofito&& LTC5. ##©@X0xXX h 'j,7?y>^f-/'iffl SMP(Symmetrical Multiprocessor - tiu CLUMP tPfHft-5 SMP 0i?7X^tItlU. CLUMP tt Berkeley NOW Tig axt -am . jo^vxnx^ ’ j;oTSi$sn, ex©em S©D-*-H/*l+@S*icSx^o cft6© clump (, c >;ix i/^oi/©a# # CLUMP CLUMP kt X > /i X # T # A > p # © f A' U y h Ethernet (C J; o Ttl5fliclS$ tU Intercluster k Dflifl £ A X Ir A $: fiigJt t" -5 o H © 7ixrtSttfgl£«?X h A-y K f, Srfc^XX-7 XOl/TT^'f 7t'Jfi® st'th-UXffl©XDX5 5 >Xa#(Ninja). l£*^iit3*f{6f 3*ffc&^'JSlIftK 8. gl/¥i7-4tt»SW- btzm U'XnX^Atlg, *fS©*SB»©^a, »k'A5E^$tvCt'5 =

Tifc-5 i;#x-5(L'Stt* < *3o t? L3> SStt^tienfflyxu^-Xs >c*kM'g &dS( AX t- Al:iot7D/r- Uh5, k © 3 >ld:*ip$i|jgf -5 C k*h mxit&ZiiK Cfflio &/£«S«TTtt^tittHli-C-fc b, to Ldfl-iSttoCEiS mi'ik:#oT##&evA -6 k %;t £ o k © Z O % computational economy &a.—y©tfOK 13J;wax ?A^gK©cp'UW^tMIUk tci. 0, £## ttlgHi*T©#SEfl-|lE@©«C»^ k Millenium T tt k 5> X. IlOiiSffotl' •?> o $ & kX Intercluster jS£ti% computational science 0+htK— h©fe©tA, gt# ©mmib, gui oxt'^o etc. !S«lettlg$iiSC*tiTi*S©S,StoMISA stiSaEL. Intercluster l*]T;&iSC Lfclt* lCk©=k3C##LTx x Xtf'L — -> 3 VSffo A\ k IA-5 ©&* S&K® toSSTifc 3 „ Millenium ©$ < ©7 - H±m##g%X"X X -i -7 X XSSE&tiS X T *> 0 s Millenium tt ^©fcto©T-X h Ay H k&oT V&„

• Berkeley Multimedia Research Center
• Internet Scale Systems Research Project
• Digital Libraries
• Computational Astrophysics
• Reconstructing the History of Life in Integrative Biology
• Computational Finance
• Chemistry
• Civil Engineering
• Economics
• Geology and Geophysics
• Parallel Computing for Optimization and Simulation in Complex Manufacturing Operations (IEOR)
• National Energy Research Scientific Computing
• Mechanical Engineering
• Mathematics
• National Airspace System Simulation
• Patient-Based Optimization and Treatment Planning for Neutron-Based Radiation Therapy of Brain Tumors (Department of Nuclear Engineering)
• Physics
• School of Information Management and Systems
• Technology CAD

Berkeley VIA CD ^^[4], # —bl/v^^CD^>f-T — 9 £>tl& [2] [3]0VIA Cornell I### cF tlfz U-net £ l'* o Active message CD f§ M ^ © if! V dr — y '> > £7t CG Intel £ CDIdJfiiff%Tr User-space low-latency ifBiiy y V 'y O £1*^ O & CD 1!\ £ft VIA CD fill % & fr o X O' £ o Millenium T: li Berkeley HE CD VIA £ L"£G NOW "Chi O C tlT O £ O h^'y hy — PT&Z Myrinet D v #

[References]
[1] REXEC: A Decentralized, Secure Remote Execution Environment for Clusters. Brent N. Chun and David E. Culler. To appear in 4th Workshop on Communication, Architecture, and Applications for Network-based Parallel Computing, Toulouse, France, January 2000.
[2] Architectural Requirements and Scalability of the NAS Parallel Benchmarks. Frederick C. Wong, Richard P. Martin, Remzi H. Arpaci-Dusseau, and David E. Culler. In Proceedings of Supercomputing '99, Portland, Oregon, November 1999.
[3] Millennium Sort: A Cluster-Based Application for Windows NT using DCOM, River Primitives and the Virtual Interface Architecture. Philip Buonadonna, Josh Coates, Spencer Low, and David E. Culler. In Proceedings of the 3rd USENIX Windows NT Symposium, Seattle, WA, July 1999.
[4] An Implementation and Analysis of the Virtual Interface Architecture. Philip Buonadonna, Andrew Geweke, and David E. Culler. In Proceedings of Supercomputing '98, Orlando, Florida, November 1998.

• Ninja Project (http://ninja.cs.berkeley.edu/ ) Millenium Ninja — —a C JLLTV^o PDA

Ninja A scalable Internet services architecture. 13 2.2.2-2 #t#*maLT(D"Base"WA ^/r-^yil/^NOWCDZo^ persistent u, workstation PC PDA, Units — #1: PDA T L,/7^T7> P#6D3 — P^mb^T/zU®/^ —^t#/z%V'oC(Di#'&,Base^6 Active Proxy > P P^y^>D- P^fl, Unit

Ninja :

• Structured Partitioning of State: o Ninja 0 T —A" T" Z7 V U persistent &tAS 1tf!cF tl%>o % tlLWCD v X if A ©jX

Staff and Visiting Researchers
• Reiner Ludwig
• Luis Barriga
• Junichi Hagiwara

[1] Jaguar: Enabling Efficient Communication and I/O from Java, Matt Welsh and David Culler. To appear in Concurrency: Practice and Experience, Special Issue on Java for High-Performance Applications, December 1999.
[2] The MultiSpace: an Evolutionary Platform for Infrastructural Services, by Steven D. Gribble, Matt Welsh, Eric A. Brewer, and David Culler. Proceedings of the 1999 Usenix Annual Technical Conference, Monterey, CA, June 1999.
[3] An Architecture for a Secure Service Discovery Service, by Steven E. Czerwinski, Ben Y. Zhao, Todd D. Hodes, Anthony D. Joseph, and Randy H. Katz. Fifth Annual International Conference on Mobile Computing and Networks (MobiCom '99), Seattle, WA, August 1999, pp. 24-35.
[4] The Ninja Jukebox, by Ian Goldberg, Steven D. Gribble, David Wagner, and Eric A. Brewer. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.
[5] A Document-based Framework for Internet Application Control, by Todd Hodes and Randy H. Katz. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.

b. JavaGrande Portals Group meeting (1999 ^ 12 H 7 0) (http://www.javagrande.org ) JavaGrande Portals Group Java tf'O h LT lb > b° n. —4 > [§|eN “ The JavaGrande Forum working group ^(Grid 3 IETF 0Z3&## chUGS^PfbS: UTUt" The Grid Forum (http://www.gridforuni.org)0 IT o V — J JavaGrande 0 SI H il 7: £ Syracuse A# 0 Geoffrey Fox N 4o J; LF Argonne El Gregor von Laszewski T'ifo £ o & b° A 4 > HlZjrjt 2

Table 2.2.2-1 JavaGrande Portals Group meeting program

Duration         Topic
Start 8:30am     PART I: Introduction
20 minutes       Datorr, Computing Portals, and Science Portals, Dennis Gannon, Piyush Mehrotra
30 minutes       Science Portals, Geoffrey C. Fox
                 PART II: Talks by Industry and Researchers
30-40 minutes    iPlanet, Sun Microsystems
30-40 minutes    e-Speak, Hewlett Packard
30-40 minutes    Ninja, UC Berkeley
30-40 minutes    NPACI Hotpage
                 Lunch
                 PART II: Working groups
overlapping      Group A: Computing Portal Frontends and Architecture
Working Groups   This might include discussions about the Computing Portals previous working groups, as well as some proposals that are submitted to the Gridforum in the area of the backend.
                 Group B: Industrial Portals
                 This might include discussions on how existing technology can be used and enhance Computing Portals in existing/future industrial applications and frameworks
                 Other groups upon suggestion by the community
                 PART III: Summary

^@(7) ^-7"4 y^xit, y v 3 wo c tt$t$)T±) X ^TXMtf) ^>tlX c? fz Grid 4o tf zz- — fr 'y h (DXl'M'&M htlfzo ^ T \ Datorr/Portals $!l 6 > Datorr k Scientific Portals Exa LII^ & ■?> )9c CD Ninja Project vb^Snzfl- cF tltzo C 0#^ Grid v- ^ J D v\ Portals t- c^t^0^m0 web

An Object-Oriented Framework for Parallel Simulation of Ultra-large Communication Networks Dhananjai Madhava Rao and Philip A. Wilsey Computer Architecture Design Laboratory University of Cincinnati - Cincinnati, OH, USA

ARAMIS: A Remote Access Medical Imaging System(short) David Sarrut and Serge Miguet Laboratoire ERIC -Bron Cedex - France

Language Interoperability for High-Performance Parallel Scientific Components Brent Smolinski, Scott Kohn, Noah Elliott, and Nathan Dykman Lawrence Livermore National Laboratory - Livermore, CA, USA

A Framework for Object-Oriented Metacomputing(short) Nenad Stankovic FSJ Inc. - Naka-Meguro, Meguro-ku, Tokyo, Japan

Tiger: Towards Object-Oriented Distributed and Parallel Programming in Global Environment (short) Youn-Hee Han, Chan Yeol Park, Chong-Sun Hwang and Young-Sik Jeoung Korea University - Seoul, Republic of Korea

(3)

### HPC by

- 180- n y bn.-if- d * ‘,kU''Grid:3ybn.-A-f >ytji5!;i'i/;. $l:&#:t#©yr b A-y a y©A& 61\ d- y -y h ©(*❖* J-—if Cft-f SA -KXtSfttSfeiiKS^'n'?-^ HPC, A5»$Etot3t,. iS3pmCSe6«MsiW* ofeo LTtt, ?7X^3>lfa-f-( >y?lt b TFLOPS, Xf'jlT/iO, P /W h»©-*IBfi*liix.-5 * 7X^sHS®if#?*jntl>5, $ fc, Chien © A *§« t « » S 7J H T* » + gflops ©*mte###*w#r& t,, ^#©#fu #+##©#%%%:#& ii^&^tiTtySAL *>A sH©y — * —lixvo >S±^*T-$> 0, ttlSttiSiiTiy S b ux =fc -5 o *fc, Grid nybn.-Ad' y^ti, d1 > h ©S» k Sa&MiifblcfiH', TFLOPS «©§+*«, iifflST -^^/W MS©* F V Gigabit S^SfS Gbit © WAN kC«toTiWeig|+®* ‘d:tf€©5:-'-j'-fey b StSSStcjS# ■r^fctoffllted'A-efe^*s, x-tp^y^-r 7®'Mm^£.z>'mmw>mt\n, ^»©-by-9-©vni/j'd - am , Azv Digital Library ©A9/W MS©?-£ ©A'- f ^-XM, lt*«4alBTffi-5 bt'y!gt;tttt*o-C*3?>'f . ^*©Itffd' >77©iittj:?,5J 8&tt* sfcSo *Brti;C© 2-3 NGI iSAW Internet2 % b'©@A#m%@ftd' yX-r y h b»P(SLr, SSCa- h\ 7 7 1, DB, * J;tFA Wft-f >770Sii|iIlflTl'-5. 7D77At L T & 133 ji E © Alliance, DOE© DOE2000, NSF © NPACI, NASA © IPG, DOE/DOD © ASCI DISCOM &k'#& b, 5i A&E?Eil4/I>s7} ffjiAS ftoo ai?> S <, M A H SDSC (San Diego Supercomputing Center—NCSA fcHfcV NSF © 2 *X-t13yea-f-f y XA y £ ®-o)T*tt, 50% &l±ostwmsmiz npaci cgbiAHtr Ab, NASA &Eg%m©xA3 'sltt^T Grid © V 7 h L, G/75 (CjglS'f S C. i; SAMftC LT © So Crawford MffliSOidt, ASCI T* H AE^lft £ TFLOPS #^(,4*# PFLOPS *©H-K«£l®»8!ILT^<-S, E^Sf®$ 100Gbit E©Bgi§-fb* y h V -XTISSL, Grid H*&ffaite$ilTTV'^<, -AfctfH© X-*, A#, jbATT E^Srttc©'$n*e.ttS^:C8ti)Bd:nT* b, hpc * au ** hpn ©m,#t:A < op (Sr* s f t; t' s © *ssttr $, s „ Kifflidiy 3®ttr*(d;t>* 5Bttt'*KP#r/oy±r©%* Sfflrru©tt|g|cg±©*tCy< —, E%#, +f©MiC,'* si±* snr*b , f rtisisa^wtcftifift hpc • hpn ^©AyX'd - A77 bt*bsnri3b, s e>tx hpc 03Uzf t LTV'S©»sSttrafe So -tiJ6¥<*B©Stt& HPC jbcktH yX-*y 103U-f-(MiL, fflS ffiA'J'SrfcfltfcoT Network Supercomputing ©Hr Lf''ttt^C[S|lPfeE5SRB^4b d" yy^sfitciqiA^o^tgrifeSyo

- 181 - NCSA Supercluster f TIZ production use Tf 24x7 T'lsBb LTD , ^ Origin, T3E, 10#jg V'— ^7 cF tifz t)Ss C C T'JJ 2D Navier Stokes CD Pi $: tf o

^^07 7 V 3 LT(±,

Zeus-MP (256P, Mike Norman)
ISIS++ (192P, Robert Clay)
ASPCG (256P, Danesh Tafti)
Cactus (256P, Paul Walker/John Shalf/Ed Seidel)
MILC QCD (256P, Lubos Mitas); QMC (128P, Lubos Mitas)
Boeing CFD Test Codes, CFD Overflow (128P, David Levine)
freeHEP (256P, Doug Toussaint)
ARPI3D (256P, weather code, Dan Weber)
GMIN (L. Munro in K. Jordan)
DSMC (Ravaioli)
FUN3D with PETSc (Kaushik)
SPRNG (Srinivasan)
MOPAC (McKelvey)
Astrophysical N body codes (Bode)
Parallel Sorting (Rivera - CSAG), 10.3 GB Minutesort World Record

[Figure 2.2.3-4: ASPCG Navier-Stokes kernel performance on the NT Supercluster and the Origin2000; horizontal axis: number of processors.]

- 185 - $ tzn ffe© NT Cluster t LX ^ Sandia's Kudzu Cluster (10/98), Cornell's AC3 Velocity Cluster (8/99) frgtlfco 10 {%LL±CD=1Z X — £$8££tl

NT Supercluster, NCSA
- http://www.ncsa.uiuc.edu/General/CC/ntcluster/
- http://www-csag.ucsd.edu/projects/hpvm.html
AC3 Cluster, TC
- http://www.tc.cornell.edu/UserDoc/Cluster/
Communication Hardware
- Myrinet, http://www.myri.com/
- Giganet, http://www.giganet.com/
- Servernet II, http://www.compaq.com/
Cluster Management and Communication Software
- LSF, http://www.platform.com/
- Codine, http://www.gridware.net/
- Cluster CoNTroller, MPI, http://www.mpi-softtech.com/
- MPICH, http://www-unix.mcs.anl.gov/mpi/mpich/
- PVM, http://www.epm.ornl.gov/pvm/
Microsoft Cluster Info
- Win2000, http://www.microsoft.com/windows2000/
- MSCS, http://www.microsoft.com/ntserver/ntserverenterprise/exec/overview/clustering.asp

- UCSD CSAG # f 7 X f : Andrew Chien UIUC/NCSA 6, UCSD/SDSC UCSD CO#^-^ UT#, WAN a^7^^#@C0

#^CO^^%^^^am#^#^)^(Cluster Federation), fCOP^CO h FD V^com#, WAN ^CO

- f ^ #3^ Hbl:Z D, ^ P u y ^

^ 5 73 P71/#j^T 1G I/O ^ I/O a COM*&;i> P#^ Gigabit WAN U%, 3000km NCSA C0^7%^ k® I/O federation CO^#&fT

- 186 - ddTrfJJiCG Pozo IS IT l'' 'b *~SciMark2 Benchmark j tz o SciMark2 CGJ\

• Five numerical computation kernels
  - fast Fourier transform
  - successive over-relaxation (SOR)
  - Monte Carlo integration
  - sparse matrix multiply
  - dense LU factorization
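To give a feel for the kind of inner loop these kernels exercise, one of them, successive over-relaxation (SOR), can be sketched as follows. This is an illustrative Python version written for this text, not the SciMark2 source; the function name `sor_sweep` and the grid used in the example are assumptions of the sketch.

```python
from typing import List


def sor_sweep(grid: List[List[float]], omega: float) -> None:
    """Perform one in-place SOR sweep over the interior points of a 2D grid.

    Each interior value is replaced by a blend of its old value and the
    average of its four neighbours; omega is the relaxation factor.
    Boundary values are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = (grid[i - 1][j] + grid[i + 1][j]
                          + grid[i][j - 1] + grid[i][j + 1])
            grid[i][j] = omega * 0.25 * neighbours + (1.0 - omega) * grid[i][j]


if __name__ == "__main__":
    # 5x5 grid with the boundary held at 1.0 and the interior starting at 0.0.
    n = 5
    g = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
         for i in range(n)]
    for _ in range(50):
        sor_sweep(g, omega=1.25)
    print(g[2][2])  # approaches the boundary value as the sweeps converge
```

The loop is dominated by floating-point additions and array indexing, which is why a kernel of this shape is sensitive to how well a Java virtual machine or C compiler handles bounds checks and memory access.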

k(Dda-(X dh&kL Java771/vk^LTaax&2flT^&Z:#X

Recent SciMark2 Results http://math.nist.gov/scimark/

51.6 MFlops IE VM 1.1.4 WinNT 4.0 Intel Pentium III 450MHz
47.7 MFlops IE VM 1.1 Win95, Mobile Pentium II PE 366MHz
45.8 MFlops NE VM 1.1.5 WinNT 4.0 Dell ... Dual Pentium III 500MHz
44.1 MFlops JDK 1.2 appletviewer Win95, Pentium II 400MHz

h 76MFlops t(DZ t'Tr&Zo SciMark Java t C t ©ji® Jt$£

Platform                                       Size    Java   C     (MFlops)
Pentium II 400MHz, Linux, gcc, java            Small   41     66
                                               Large   23     36
Pentium II 333MHz, Win95/VisualC++, jvc        Small   37     41
                                               Large   16     21
SGI R1000 194MHz, Irix 6.5/gcc, JDK 1.1.6      Small   11     47
                                               Large   8      16
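The Java and C figures above are easier to compare as ratios. The short Python sketch below simply divides the Java value by the C value for each row; the dictionary keys are shortened platform labels chosen for this example, and the numbers are copied from the table.

```python
from typing import Dict, Tuple

# (platform, problem size) -> (Java MFlops, C MFlops), copied from the table above.
RESULTS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("Pentium II 400MHz, Linux", "Small"): (41, 66),
    ("Pentium II 400MHz, Linux", "Large"): (23, 36),
    ("Pentium II 333MHz, Win95", "Small"): (37, 41),
    ("Pentium II 333MHz, Win95", "Large"): (16, 21),
    ("SGI R1000 194MHz, Irix 6.5", "Small"): (11, 47),
    ("SGI R1000 194MHz, Irix 6.5", "Large"): (8, 16),
}


def java_to_c_ratios(results: Dict[Tuple[str, str], Tuple[float, float]]
                     ) -> Dict[Tuple[str, str], float]:
    """Return Java performance as a fraction of C performance, rounded to 2 places."""
    return {key: round(java / c, 2) for key, (java, c) in results.items()}


if __name__ == "__main__":
    for (platform, size), ratio in java_to_c_ratios(RESULTS).items():
        print(f"{platform:28s} {size:5s} Java/C = {ratio:.2f}")
```

On these figures Java reaches roughly 60 to 90 percent of the C rate on the two Pentium II configurations and between about a quarter and a half on the SGI machine.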

LTLlT(DA^#(f

• yffm-nZtitzo • Sun t D • J£ < ##□ ^ J

- 188- 3 A^\©3 >yW C©3 >yW JavaCC (Java fflCDn > ;W 73 >;w 7 a b , Java HHn+x 7 l/^f^f V — y 3 — p^6, MS© Java+3 T/^YAthtf—Pd73X©lWltiL©3 — ^Tt^o 3>f^A7^73V-CW\ <7y^y%i;^>Paisim^17^-P$7lTl^o ^ /c, Java Grande 7 2f — 3 A © Message Passing 7 — ^ > 7*7* IV — T^rSSblaTi'A5 fft)tl^o TMPJj anton30r7c& Java MPI £ 7-^ >7>-7*7!^ Lt Id < Hfli^fe £ a©C at'S^o Ctlli MPI-like & API ;t 7"'7 a: 7 P ^ fa £ Java £ L <#^^717:1^ awo^'rcTi^'rc^O'yyD —y&ao7:i^ a©c a^&^o ^©J:3^#r/=^ API J:3

• Java ©%#©M#y^ 'V 7 — V7?$j £ RMI ^ Socket ii^7^7> p —17 —y^D 7" 7 5: >7C#Yb^7i/z t (DTr&^o • MPI-l ^ MPI-2 7!(±^#%M#tf IPT&^o • r JavaGrande 77'J7“'>a>j & ^ ^ & W #bC # /c 6b (C (J, 7 7 dr — 'Js^y'y >7^© API

H7GJ, £7c MPI-l — C, C++, Fortran ©3— ^ 77°n— f b bTx ^ 7'':7 x 7 P f|§ fo] 7: Java |q] §■ © 7 '7 *fe — 77^ y '> > 7" £ BfB^k©caT&^o fCT(J:, yypyxi/vf j>ya77dz-^yi7i>>7'a^o #^6y^^x A&a*3 < 777^ 6^^##7#ma%^a©ca7:&^o ©msMm#t)AD*^7t^ 6 i/i'o ##(C, rComputing Portalsj © 3 ^ il J £0 V^T©|p7t^fe o feQ C 7UJ J7tu, rDatorr (Desktop Access TO Remote Resources] a HTIJ tlT ID fz & © ^^ BU £ ^ X. 7: ^©7?&^)o £ T\ rPortal] a (J M 7^ © SM 7)s & o 7:0 7 > 7 — ^ 'V P £;fe Id 7! 7? x 3"$: 7 D > P Fa LT##^Tt/:3.-47*77yYXW#^-M©17-a7o f Netscape Jp AOL, Yahoo, ##©17—^ b, Ztlft rportal 17—tfX] atM TlTV^o ^CTGJ, Java 75 S IE & l!J £ :SE 7: LT75 b, 7 71/ y ^ JavaScript, Java-Corba yW > F, RMI, XML 7 — )l& £'&{£ t> ftT l0 fCT!, m-###C^(76 rportal] &^#L7:WC3a^7#X.y]7r&6o &b#6 t)©a LTLTF^#(7 6 7t^o

- ^m#W:#IbL&7j:7Y >7-7 j:Y7 - %—y^— 37" j >7 v v—7^©> a y© v t — p^^j - :t#C J:6^#©gS#t©^6b©7 7 Vyyj>7Y>7-7o:Y7 - XML &/< — ;% a LX=y^ —7 T —7? Y y^\© V t — p T7 t7©/:6b©7 —IP - ^7#iyyb7 —>a>©yy7 p ^y-e©^E^]AT • 77 i$( W) Jj © 17 — P - %#ib^a©^7 7 p yp©^#

- 190- • @EK#:£ UT t'/zS’E© 7 7-b-'>Wy -> > y 7 T* 7-7 V (NX, PVM.PARMACS, Express tt • #<©^>y-* s&a«g©. ^-y evy -/©&v7y-fe-->wy->>7 -7^77 V 6fflVi"C$cro4ilil L "O ^ ii „ (IBM SPN Intel Paragon^ TM CMS) • S’ < © * 7 i7 7— (National Labs & fc*) • #i#lb7"Dt7&g0%f 6 C <3>-fe>t^iiifeofe, • i Vtl£®Hlcfcfc5#iDAs$>7fc= «gffll$£«;tT. tWtot:*. ^SSfl-SiiS it.

cncMit Javauifo*? tt'7iai<'i)setfSHt?.n&„

• *7 7 7 —©Sgii^-E,©* 1? 7-y* —&*777 — fi Java ±ffl MPI &%L#7T t'-Sie? • 'oy-ie %©+>-*- b^fc-s *>? MP12 ffl*g-e-r ?> • ¥oJ$-z>X%±t£^'( >7V (,f't''©ii^y W:tiii^7V^©A»? I/O tt bot?.? -*isia«*t'7» k 7ijk-ay 5? mgfjfl©ffl5*fflttti: tyt5? • %#©##W:&^©#? • gSttCWf3l£$S&3 >-t>+)-7.l£&Z

fiftCx $t MPI # Java £ £ t' T j@ St; £ fi£ £ & t'ffltriitt Ufr i; 0> 7 j&fCO t'T y.T©a6»s¥U6nic0

• MPI HU SofcrnyvAttSaioifcltfXf - b yyyAAf © Z 7 ^««©1^ttlC*ffi$Sic^V'o 0'JxtiL &3*f§i>sSatfcfct, ii/Lj7 7A© 7-7-b7A^D^kSitT/t)S6ViS7Ci:i)sS§«5,o

f ai:ML% Java tiL Eoir7ny7 Att(6ttl$n. Ifltr- b ttS* 7 fe 6 © tc & D Af&Uo #Rim#+#4'C*m/t7 7 r C r 7 -b 71" -5.;:

f©m©&b#6#Ak LTUTtfWfciiiio

• @@t@©msam# • 7 L 7 PtfOl/ • Java © Socket API ^©/W > b • m^ma

- 196 - ■ 7 V £ Java CD heap iS ^ & V ^ T\ d? □ 3 X h S' V T 7

3 >/W 7T#%U&th#- h - ;W h3-M^^<^eW#/W±U(:^afo - native 3- P & 7

t'Jf^ k##L, #(DT^ Jaguar (DM Jaguar ^(±mT(D Jlo^^CDT&^o

- a java## y^/? ###&#. ^ v k *7 ^ T y *? — ~7 rawfO^7 I/O^^'o - Jaguar 3 >/W ^>o >7 j* Wb£tifc3-K(C J; tK M^MT&Zo

- Java CO/W h3-^s^“^ 7>& Jaguar ;W h 3 — K (Jaguar ;W h3—p(±vS/>V V —^^/vk^U7y^

- Jaguar /W h3- pcDthtf- HJ> %#0 JIT 3 >/H ^ o

- Jaguar ($vh9 ——P-^s X — P ^ ^7M0)o - Java fr 6 Jaguar ^xCD/W h 3 — p )]/Q

JavaVM JIT 3 >;W - Jaguar >rf jjfoCD 3 >/W 7 0

- MPI ^ RMIX Java RPCX Jiro & £

[1] http://www.epcc.ed.ac.uk/javagrande/
[2] http://www.javagrande.org/

- 199- 2.2.4 Kmz.tsvr>i&m&ts&vft&imms'ZTixCD Globus -/\

H.TIH:lE BHT\ >1<@® y y^l/d" > £ University of Illinois at Urbana - Champaign IZ& £ National Center for Supercomputing Applications £ l^fnl Ln Utflvj

StiBU: yhK 12 ^ 2 u 27 E-3 U 3 B tfifflft 1: 2 U 28 B Gregor von Laszewski JS y;k3>^g^#^^f(ANL), -YW^'I'M, Wm% 2: 3 U 1 B Charlie Catlett £6 . National Center for Supercomputing Applications(NCSA), University of Illinois at Urbana Champaign(UIUC), 4 U J 4 'I'M, T ^ V ^7

NCSA T'Bg ±*MWk(D)Zmfrm\WnWi National Technology Grid &##L3:3 £ g y D S/ a: ^7 b T:' $) £ National Computational Science Alliance^ Alliance IZ ~D ANL Java y L'y b#mi>yyA MOBA Globus Globus

(1) NCSA sm

NCSAT(D#^^^^m^&f^<, UIUC tfitlfco NCSA T Chief Technology Officer $rS<$TUCharlie Catlett J3: C: $ I*] £: 3o Catlett

£ Tzs Alliance T: (3a Distributed Computing y — A 3o 3: U' Data and Collaboration ^ — A (7) Lead Investigator(10 cF G LA Grid Forum [ 1] 1?' (± A#:(D Chair(U

gcf. Grid $:##L3:3kU'3yD^a:/7b, Alliance t:'0(DT#S&3& Globus yos; a:/7b[7][10]N##m^^#tf NCSACO^^Mmei^# tftL'TA < y — b — a. the Alliance UIUCN Beckman Institute Senior Associate Director T:& 6 Melanie Loots gq3:D the Alliance National Computational Science Alliance(C it & lb £ NCSA fz 7bs UIUC &

-200- * NCSA Stzltmz Alliance j: ^|+PSECD jg

LtzM£ 0?'$>%o Ztl%9cmtZ>(D\± NCSA Tfo D s 50 JAl± (D)Kmcr)^:#i)$M#^WDL-CV^o NSF t'£o ("National Technology Grid 0##j UTCD^M U^c#

^17^ — V"C'$)^>o A@T:(d; Global Computing C UTs Grid Forum © J; 5 Alliance 60 3:3^73^0:^ b

Alliance &#&$ < 0f--A(±s DTP 60 4 ^7^3 0

① Application Technologies Teams
② Enabling Technologies Teams
③ Education, Outreach, and Training Teams
④ Partners for Advanced Computational Services

f &o

① Application Technologies Teams (AT) - Science driving the Grid
Teams that drive the development of the Grid from the application side. There are six teams: Chemical Engineering, Cosmology, Environmental Hydrology, Molecular Biology, Nanomaterials, Scientific Instrumentation.

② Enabling Technologies Teams (ET) - Architects of the Grid
Teams that develop the software infrastructure of the Grid. There are three teams: Parallel Computing, Distributed Computing, Data and Collaboration.

③ Education, Outreach, and Training Teams (EOT) - Access to the Grid

This activity is carried out together with the National Partnership for Advanced Computational Infrastructure (NPACI) program [2]. There are three teams: Enhancing Education, Universal Access, Government.

@ Partners for Advanced Computational Services(PACS) -Support for the Grid m^aGrid0##uato#@^u-Cs

<^-^xTs m-####s Grid ^Alliance 7% a a c a fur ^ 60

-201 - £>

Alliance T'0ffi^01i£0&li&0 3 "3T&6 A0C ko

① Capability Computing
② Building the Grid (Distributed Computing)
③ Scientific Portals

#i: 3 ofcCailfcK Tportalj lu 3b§m3a WWW >\ d ^ t) ^ web ^\0 A D □ 3 SttbTrtSibil^niisi) Ao Web ID H # id: $ T portal portal &g & & 0X:$) £ 0 C 0> fifgCD web portal S R£ T:

C Ao, A A 3 tztz. Loots CDfijm#^ 7 :c —A&, web^>^7x-^t'|)^, il^lTCfco web t TGI If & t>tV3~3& £ A < -Dfr(DffiW]lZ-zn^T Loots pq (L D dS#3& 3 & /=<, X.id! Education, Outreach, and Training Team(EOT)T: idA 'ATiCj###^'t0A^0 m^m^0'6cD^&6 Education Program A S£ LT A: tl A: x MM A L A7°n z? A^ffi^tl^ o Loots <,EOT ###Af &0(d:f AT&&%AAmXA A(A3 Ai^X:# < .minority^ A©cAo Av^7A(d:x A

v/r-i>3 LT, ###^#&#oAA#:##ji:-3^X:7^^3Ao Crutcher #^070^0:7 b

7^0 NCSA ^X?mD, fCXr##Lx Al^db

—^x A^MA b L —v b9 —AXr$M^At)#X:h9l^ Al^x At^^^Xr&^o NCSA (3: 15 0 industrial A — b t“§^30 C Lt G 0 A#(d: Alliance A |S#0### (d:^^o u^L, NcsATf^60A##m^m/v^^^(±, m^&#Tf0A#mx? *ijffihlSE A&3o Ctlid: Alliance 0^^X: fe o X % $I^T' id: & A 0 NCSA 0±I ltZ(D

-202- NCSA am^LT#

^ £ ;B ti) T l ^ -S o #!l X (J: J.P. Morgan X'&tl td! Financial Modeling t l'' 3 fzMa'X &%0 Z(DJ:olZLX NCSA Zm&nMWvmW, 77U^“'>a>Slft^± (j\ £ tz. n kkW&MMWteMTt LX c* fzo NCSA {J H 0 f± f£M J & Alliance N Global Computing t) 9 ^ & Catlett SGX M.Mh %X b Grid Forum Hol^Tt^ < ^p#>3 fz0 Internet Society(ISOC)[6]CD ~F \z N Internet Engineering Task Force(IETF)[5] by&JlX %W>X Grid Forum iH® Z> b(D Z. t o IETF aV/UJ RFC a Internet- Drafts d' v b a#$fbT(±&iOk: Grid Forum 0^)X b9-^^#(IETF)a^^3X#f#(Grid Forum)T0##fb(±mUf o/:, ai'o#ma#m& Catlett IX bi^oi)(D(DXdX$)'otzo Grid Forum 0$E#D$#(d: IETF Z&fzfz&s working group(WG)$lJ b X o LX l> %> 0 #X "C <7*tltix WG bL TlJoTl) D Mtsbc^o&x&M bttft^id:&i^0y!p t) Lft&lX Grid Forum X & WG tioT0ftBbiLTV^ftL rSiJ0iE^ aLT###m%

(2) TA3>?mALffl$tmwm

Java Xl/v MOBA[8][9](D, Jl^ry b X&%> Globus[7][10]/\6DEpr& Btoa LfclHS> fS %&£>'& (Dtz& kX 'i #((J y />!/ d' > @ V£511 ^pjf (ANL)© Gregor von Laszewski S £ Sfr fnl L Zi o Gregor S(J ANL 6D Mathematics and Computer Science Division(MCS) lZPf\M LX l'* & o ##GJTBZ Globus^ # Java IHj }!l CD HI % jo ck tb Metacomputing Directory Service(MDS) % 0 /L 7r G* o MDS H^0 41 'L AfflJX & % o Grid Forum X (JN Grid Information Service WG CD chair &##)% (X & <, XCDWfficom-CDBBlt MOBA b Globus (OWt^X^^tK . £ ^ Giobus tf-if MOBA on Globus ^6^<$7 \z-dwx t^EGTX ffiLSo ^,07f fG7L oGTaSLfco ^ # 0 # (J \ Gregor S a Peter Lane S() b . Ip CDEiro^: t) 3 fco

-203- a. -k < X — b — X k *> £>*>£>© #8 3t©i*]g:tt#:ffliiDo S6* ‘ofc ±"t‘, MOBA, Globus K^KdOiX©*, 6 C©©8R**S$&#@

♦ g B,@Jb a@, ##©*#, k4i$X©ttW&k'<, ♦ m#©m^ Global Computing Infrastructure(GCI)XD i7 x X b C ^n'T, B 69, #iD$166, X X h-xy b*ffl®Ji£„ ♦ MOBA: Java X V y b#j&XX X A moba ©@jbo #a, tBffl, %#m%, «b, xnx'x ; >xxx*;v, fascist*, ttlgfffflo ♦ MOBA CMf gci xnxxx b-t-FE#8lti'3, MOBAfflc&a-y —Kxcouto Shared object, Security, Registry, Manager, Scheduler,, ♦ Streaming Calculation I£tit8 +66#B|60 LfcSfiRi'XfA, Globus k©8t^&#M LTl'5o ♦ MOBA k Globus ©%-& Globus ©tSWf ^S-aX-BX, Globus X-BX £«ffl-f.5Bt#->X X A©$8« IS SoTlii^xx-ex©x*- b jLtbX-exfm^a©#^#^.

MOBA CoVT, $fc, Globus k©St^tCot'T tt ^'J> bl¥ L < -5 o b. MOBA MOBA[8][9]tt Java fiffiT -> >PaT'ffl, IffttiSffo&X b y b£ X*- b f ^ ~>^f AUSSo Sun Microsystems #©% 6 @© Java fit® V X > •£> XX X'J >kLTSissnTV'So f©#ai±, xotyx^os#a%&:+@#i@x©#m, t t£fc>% SS#flfl^j3l(heterogeneous migration)-^, #j#X 1/ y b IX^b, M A# -*p #X1/ y b # 6 © jetb 1C J. 6 #&, # |u]N8#%(asynchronous migration)& 8gK VT t' -5 AC&-5. ifc, #mxi/y b#ABkf 6XXX&#IJ, X lx y b©£fiJt7G*> y b

XXX&###+##k klcAV'it-5 kl'^fcASBS 5.X&®lt-5k k* sr-§-5o i&mkLxiJ:, #e*©XDXyA&:+##iax#mf6kkxmi%#ee@K, disconnected operation 4^ffttW©SjSH:$/c, SI6XI/ y b © 3688 k VT t 6 o ADX T, X 1/ y b #ii&©T@ X i" XX V check pointing iiBtt©iig±A^Hn-E l =

:t*b'fe«), saanssistt-ii'Sxfc-Bo mt, m@#a+#^©&m, 0$ b EStoXd' 5. > XX X X X x - X % k#aX V y b © -SkkAs^fi^feto, #|Bim#m'6mB»#Rk%%.

-204- c. Globus Globus [7] [10] 14 University of Southern California - Information Sciences Institute(USC-ISI) fc ANLO/nyi]' liitF77 b 9 Globus "J —A*"J b l4j£ti6gt@©$$]k&S&l$-9- —e7$ig#t-4S 5. #-^147077 S L%©o %1B#I4, Globus -9 —t*7©±lcfggt£n/c MPI-G tl'-ofcMl'^Ai)-- t*7, 7>g(Ma->77"A6m' -tfl-iraissff&do Globus l4*©-9--U7&m# LTf^So C ft 5> © MOBA f, ©fflfflnJIgtt 61^14* S o

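♦ Resource management
□ GRAM: allocation of resources and management of the jobs running on them.
□ DUROC: simultaneous allocation of several resources (co-allocation).
♦ Communication
□ Nexus: communication library based on active messages, used together with multithreading.
□ Globus I/O: wrappers for TCP, UDP, IP multicast and file I/O, with support for SSL-based authentication and encryption.
♦ Remote file access
□ GASS: access to files held on remote sites.
♦ Security
□ GSI: single sign-on security infrastructure on which many of the other services rely.
♦ Directory
□ MDS: LDAP-based directory of information about computing resources.
□ Gloperf: measurement of network performance.
□ NWS: measurement and forecasting of network conditions.
♦ Fault detection
□ HBM: monitoring of whether processes are still alive.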

Globus ±(CSI$$nfcg$#0iW V-x;t-9--H7Ci4^IX.I4y.T©'fe©» sSSo MOBA A>& Globus if—U7£fiJM1-S|Sg©#if fctSo

♦ MPICH-G ANL (C 4 S MPI[11]©*#7&S MPICH & Globus 54oC Lfc*©» # < © Globus tt-£7 6180 S: MDS, GSI, GASS, GRAM, DUROC, Nexuso ♦ Nimrod/G Nimrod[12]l4 Monash University frUH^cf ttfeX^ yt—j7 -9" —Xffl©iftti:tbl+S 7 XfA« Nimrod/G[13]l4 Globus th—UXSrfiJfilTS Nimrodo Nimrod/G t"

- 205 - □ mds mmumo □ Nexus JS&o Nimrod Resource Broker(NRB)©^SBb 0 □ GRAM NRB # £, © V 3 7"^Ao ♦ Ninf on Globus Ninf[14](i Remote Procedure Call(RPC) ^ — X © Global Computing '> 7 r A t fe o Ninf-on-Globus[15]{i Ninf © MM Dt "P Nexus \Z b # & OX £ o C. CD Globus Nexus CD Ninf D s MftXfotlU Nexus Tit:# < X Globus I/O £TiJfd t~ ^ ^ X fe 3 o d. MOBAon Globus MOB A H!k Globus #1 ©7%## G, MOBA on Globus © C#o ZZXlt^f Gregor J3; © IS) S#{£:IZiL^tzo

♦ MOBA ##6, {Sl/^l/th-C'X©^- □ MOBA ©T >X b —;V0 □ Place ©^SHJ □ St S S $1 © £* S (resource locating) O □ tlJffl#©I^SE(authenticataion) A: X V V Itbl(authorization)

♦ Globus## 6, MOBA □ GRAM ©tllfflfBHilS) D o Place ©^SIKo □ DUROC □ Nexus X^#o Globus I/O©## MOBA tcmL-C^a&A&o □ Globus I/O Q$C TCP & ck^#\ secure □ GSI ©^ftu:#d: GRAM&^##mf&o □ MDS □ NWS □ HBM ##lto □ GASS ##lto
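The relationship between the higher-level tools and the Globus services they build on can be written down as a small lookup table. The following is only an illustrative sketch: the service lists for MPICH-G and Nimrod/G are the ones named in the text above, the entry for Ninf-on-Globus lists Nexus only (the text also mentions Globus I/O as an alternative), and the function names are hypothetical.

```python
# Globus services used by each higher-level tool, as named in the text above.
# The dictionary layout and helper names are assumptions made for illustration.
SERVICES_USED = {
    "MPICH-G": {"MDS", "GSI", "GASS", "GRAM", "DUROC", "Nexus"},
    "Nimrod/G": {"MDS", "Nexus", "GRAM"},
    "Ninf-on-Globus": {"Nexus"},
}


def tools_using(service):
    """Return the tools (sorted by name) that rely on the given service."""
    return sorted(tool for tool, used in SERVICES_USED.items() if service in used)


def common_services(tools):
    """Return the services shared by every tool in `tools`."""
    sets = [SERVICES_USED[tool] for tool in tools]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    print(tools_using("GRAM"))
    print(common_services(["MPICH-G", "Nimrod/G"]))
```

A table of this kind makes it easy to see which services a new system such as MOBA on Globus would have in common with the existing tools.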

MOBA#^ Globus Gregor #it-lf

©tSlS'S^/c/cl't/'co o) ®fam&%MLX0ffim

IE W S o T N Global Computing C:?jS'n f £ TX]) >T — 'y 3 >© *7 y X&M b ti

-206- NCSA l8Uf4

Globus fili: bft (tft&ft £ ft t'fro

[1] Grid Forum Home Page, http://www.gridforum.org/ .
[2] A National Partnership for Advanced Computational Infrastructure, http://www.npaci.edu/ .
[3] SCXY Conference Series, http://www.supercomp.org/ .
[4] Computing Portals, http://www.computingportals.org/ .
[5] IETF Home Page, http://www.ietf.org/ .
[6] Internet Society Web Site, http://www.isoc.org/ .
[7] The Globus Project, http://www.globus.org/ .
[8] Kazuyuki Shudo, Yoichi Muraoka, Noncooperative Migration of Execution Context in Java Virtual Machines, Proc. of the First Annual Workshop on Java for High-Performance Computing, 1999.
[9] ##-$,#W#-, Java CPSY98-32, pp. 39-46, 1998.
[10] Ian Foster, Carl Kesselman, Globus: A Metacomputing Infrastructure Toolkit, Intl. J. Supercomputer Applications, 11(2), pp. 115-128, 1997.
[11] Message Passing Interface Forum, Document for a standard message-passing interface, 1994.
[12] Abramson D., Sosic R., Giddy J. and Hall B., Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations, The 4th IEEE Symposium on High Performance Distributed Computing, 1995.
[13] Abramson D., Giddy J., Foster I. and Kotler L., High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?, to appear in Proc. of Intl. Parallel and Distributed Processing Symposium 2000, 2000.

W, JSPP’97 mJtM, pp. 281-288, 1997. [i5]^##^ai3,^;n#^,mM#, > h - o-wi/:] >7"'>7ir A CD.bblk---Ninf,NetSolve,CORBA,Ninf-on-Globus O'fTtbfFffi---, tSIBM 99-HPC-77-34, pp. 197-202, 1999.

-207- 2.3.1 ie5<4IBU-SCj3»5je«5-E3 yy

fcMfrWL^i > b^-7 ^ Ui^o t)^,5Avx

(l) /s—nytr^L —^ . r>-9->7>

^#J0/£ijj^fC3 >h°i— 7 ^ >^<7M y — -3 > bn.— £ & * y h V-^Trjg^U — ^ >ba—ft D

b:i-7-f >7"bd; D >bn.-x'T

Supercomputing ’99 b&UT C 3 -o £l0 7D i/x^ hf- Ali '^© HLRS (High Performance Computing Center Stuttgart) $: 41 ^> t ^r V 7 CD CSAR/MCC (Computer Services for Academic Research/ManchesterX 7 ^ V ^7

Figure 2.3.1-1 Network configuration
[Diagram "RUS/HLRS and Partners @ SC'99 Portland, Network Topology": TACC Hitachi SR8000 (Tsukuba/Tokyo), PSC Cray T3E (Pittsburgh), SC'99 workstations at the HLRS booth (Portland), HLRS Cray T3E (Stuttgart) and MCC Cray T3E (Manchester), connected through networks including IMnet, vBNS, Abilene, SCInet, JANET and BelWü.]

-208- © PSC (Pittsburgh Supercomputer Center)t 0 ;fc©I Sifft'Jl'te > 9 — (TACC) A> ib © X > n — T * $ ft X tA -5 o ¥ ■=£ > X p V - a > T- ti: HLRS, CSAR/MCC. PSC ffl T3E t TACC CD rsRSOOOj (64 y- -y h V-Xtig s$v, 2.2 tflops a it, # f 5 3- V-^a >65tff LA: (0 2.3.1-1),

(2)

-#> )£«»«cn>h‘a-7"4 LTs «S@© 5 * i: M l T 11' X )1 - 7 "J It< Supercomputing ’99 ~Sfyt>tlfc;r^:~SH1&W 1 0 ftE5C8$lilcFftAil7— X Xt— L a > ❖ PC UNIX «©-6-|+ 1 5 0 i$±tc8+StoiS6fl-|!l 4Hl©c© #*#©H6©kC6tttt#*©l+*«S*toT < ^,C Ci;tfe-5o tft#tf)©ifitt:f+||«£rSJS'tft ut. h %If(St-S*S* s*-s c ix *tit: J; D#(ajAsHEl:^oT6^n$v^ j'flutc-Ex-BS k, SmT-ny^ASSIffS-SJ;) LS^fci: SCtt£®fflft@PS-t-^©Xn ck, *k -$yc$/;iiffltotctti$a:-iSv^6©»$6-$.o l^u •Y >X-* -y p ©te#-t-v V) -X > X L -X * X ASfiKiz: LT V> & ©"C zfe •£ A> 6>. itn-e©*,© 6 "9"—L X t l/TSIl -5 L X 3 Cft pf %> ASP(Application Service Provider)

2.3.2 y-j >^!;A'f;l7t-V>X3>Ka-T'f >y

Md7t-7>X3>ti-fl>)' (HPC) l±3+###K%km# LX, ##»

#¥©**?>t, #i#©m#ami:eKUTi\6 (02.3.2-1)0 laca ^t,

#l±AA#A!Uxa% < ^>Ax-^*'J'A#&**$#©/:©CI±C© HPC ttESA ALii-gfft-So LA> L, X-A-n > La—X© J; A &i6til6IP@tilti:A V 3 > LR^T i^flS'lifAAsfflKtcAcoTt'-So X-A-3>La-X ti; x-71-xe> vo- y L a > X 0^-f ft;P»@frCi±?hIA^:+##ft&pe#f-5, let, x-A-x >ua-xi±-#-Xct±#A^x X/\X&##f ^ c kA^uI#x £>•?>, hpc tt«©rxp v-xxx'KttSifflxS ix a®tt, &/1V3vi-e@e©(±*6 -r-E>A5, t'^3>(?')f-f»)i/©@HT-ttjr cfr. lfl«roC S L- L *© l+Sfg*©##t6git 3 L k ASflJEC& ■?>, $ lc, *#j©m-e#*&e#L-ci^»-/H±f©*$ja-t:Mfpcm to i-i?x> PttXxtTX h©55R6iSSt--5

-209- fumam #*«

W-*E

HPCMKR*

f4»tt«it85<75V- 0 2.3.2-1 k HPC

eixii, s© cpu &6#@©77uy-y a >& ^'>7-^1: LT«flj©y ■>yv>$T-#ti"C%#t-r^Ik'5tc6-5o ASP © HPC dficffivw 7<-y-Cfc-5o £fc, -v-^y p SSiiS'S-^fetoUi, tSipto»^«©«iz: * s¥iii;*4& fen-5 *9* sfe -5c 6s¥RJkT-ife^o

vsntttc s u r - a * ® v g+ e »s £- $ k »-a o c a cpu &C kl: *fc, -^tu^qJtgt: *:-& Xx A£*ig

2.3.3 ^D-A>3>kJ.-T^f >^?X F

a#©R@ai±m%#0mMi5&:fa?&'5o tsa^fcjr-< >y©w*ts E6$e@6tocsifrf ¥t:kcA>© www -y- 1' h sisht^ v7 1^Rotst . -i >7 b-;H"n«iStfisk¥tt"t*tttci'o 'j>& < k^ :+#»- vx\ #ii* y 1 9-77, ^=7^T'y h h SEtiT u$ oTI4*Hto*)6«S-i(3 > £3.—7 -f vy'Tii &V-o iSisEdfUi > Ki--r -f 5 fctoCteU B*6tj*T7"Vy —->a > v? hu^ ^<©77^t> h&jax-c, #mc##Kk#@@& 7-—y'CMSB U^i'kV'ftfei'o ko UfeM*S

-210- —i’7'f7> ha-tf £ tot l£* y t 9 — ^0 AT7f tXf 3>Ka-f^ >^f7 t^V M&*@# • ##L Z 9 tit i#4?hTi'?0 ctiST-tct)'J'M^T*ff oTSftit £ < <$x 3 S"C *5. WfCitK-y — nomturn'o, f<®77'jt- tSSlcftTSKKk&So

Figure 2.3.3-1
[Network diagram: RWCP, IMnet, APAN, TransPAC, STAR TAP (Chicago), Europe and Australia; link speeds shown include 1 Mbps, 1.5 Mbps, 45 Mbps and 70 Mbps; Globus and Ninf are named as the software to be run across the sites.]

- 211 - H3S titxs ts ? V

1 0 1 Z b. ®*#Wf!Jffl»if*s0J!SI t*i5iaflj3>;H5a®l:HUtt, ¥fiU 2 ry MWt3 >;t •i- 7f5Egg#gj MbtJ, <2)£Sfl-S(3 > t‘n.—9 4 y^SEC-oV'T &. tfesff3sms4$icisst?t^tc. mm^mm^oj #ee&M&A4:f^,ckczD, <^$c©*smi:

(1) #?!Hb3WW9Sffi

3 WiioSttWffitt. x*;i/*-^|g. S* • > h->x^A^k'#aS«t£i6/j:$$S$^'r*t), ^E*¥Sfto6E5aiB X^SffcterPSmSJtijCiniU-T^g^SEll^W^SnTV'-Bo ^fflfcto. HPC (High Performance Computer)/ 1-> — P’i'^7’Ctt-2>V'>>tttb©Ic5J;i;J;ibC^ V 7 K?i7 t; J; s tilSJM© siifb & ff -5 }£«© * s-£.g^ t tt o r u s 0 Cfflfcto. 1 2^fi=fc bHJS-r-5 fT P/\'>7 h jfe?ij-fb3 >;W 9tit«$%j T-ti. Fortran. C %©iFDlTffil'^>f V'7D if 9 X > 7Hlg% X* h L, #^©#Mfb3 •f 9f£WAs*tS;i: LT #£*?£. 1" &£>%$> 6 fp D©9t© 5> it £® $1 Stile * intern 7 9 A&a-#!lf < . 3 >/U 9 Asg«!)WC#jg*®a#tet»S!l 1X7^1- V >761? 9. 77 7 F7f-A7 lJ-7^glijfi?ijfb3>7W9Sri©9f?$Sg%6fi :3o Cilfe©®f ?i! US $5 tC tTn. — ki:6C. SS&rt4fe&(§»£■» SC fc*S}gt-c HTC. 7Dyi7(fflIIS/iitc

(DiP&M&JWia : ¥/$ 1 2^@~¥f$ 1 4^S ( 3ipf@) <2)E^^j$*ej: * rheas (3) flfgmms : X*A 5?AffliE)*S8S (Fortran. C #) T-fi>tlfe7-77a79A -c*$>o. (openMP ©ksie ) crs7 9-7 h7*-A7 IJ-%@##Mfb3 >/W7®S»*fT)hJ;{iC. 3 > / w 9 tC cfc S P « & * Dr 7- tt ff 5) ft ti l r tt W * £ «tl« £ m' X i*jSfb&EIS;tflSS©79 7 (7t-A7 'J -&36?!Hb^n.-3>7

(4) W%9-7 : ®7 hVt>X h jfiMfb3>7W9Sr7©IB% ©*?iJfb3>/W 9©tt8gfffltt$©|g% (5) *t£lfS : ®7 P/l>7 f #?ijfb3 >/W 9K(6©H% aest-fes*^ v^;p. t*ts-czfesge?iJtoav^/M/. $ e>cttffl*4ST-s>s

- 213 - ■y-»—t- > > • IP—y • S*yn y tote §1 tttiT 616 V yi/i- SI36ttE®eti6IISt-5 = ©36?iJ-fb3>yi-f 7®66gffX h 36£!Mb3 >vH 7t£®®5fl%J le43V'T55%1~-E> 616W >36 ?iJ-(bll®i*S*36?iHbtt®6ff< 1

(2) je«^*3>Ki—rd >^

l£«diC3 >bi-ri >^/tt, LX 1997 ^/6^K9E»srS%-fb LTU-E. 53-if-Cfe b ,*1!ilt®3 > >^S8k b-ri±i$n-tt'5o*Clte*H'T(i, c n$t®S|gE^58%te6nx-C. Grid Forum i;ll¥«n-5 Grid. SS®fcto®#l«As$Sfi$ StS®ffl/e*=tt>'t*#ffl«^ • *Xo 076X176 fti7)7- ^"eSSlB'U ltti?77^76XS*6'?» PDA CVfc-Si!«i*& Web 6 60076 Sfflt'T+6-b, -e®±®T7 V -7"--> g >tiffi*S)S»ElSB+#* 6, iSStoth — If X * X £ V > SC * ifi & $ ii X v' -5 o C06®l6$tei:b^x a#5@0)%5g%ia##tc*$%;igo& k oT U$ oTV-6®* 5 SttT*fe-So 50 >^»S}XU, $ 7 1- -7-67fiJffl®y I>7\t7^tgip 'ttmjtizfflmtzxgzrztb, #&®#eim® *«£«* &i\


Appendix A. OHP slides of the lecture "The Stanford Hydra Chip Multiprocessor" (1999.11.22, 24)


Slide: The Stanford Hydra Chip Multiprocessor
Kunle Olukotun, The Hydra Team
Computer Systems Laboratory, Stanford University

Slide: Technology <-> Architecture
■ Transistors are cheap, plentiful and fast
  > Moore's law
  > 100 million transistors by 2000
■ Wires are cheap, plentiful and slow
  > Wires get slower relative to transistors
  > Long cross-chip wires are especially slow
■ Architectural implications
  > Plenty of room for innovation
  > Single cycle communication requires localized blocks of logic
  > High communication bandwidth across the chip easier to achieve than low latency

Slide: Exploiting Program Parallelism
[Chart: levels of parallelism (Instruction, Loop, Thread, Process) plotted against grain size in instructions (10, 100, 1K, 10K, 100K, 1M)]

Slide: Hydra Approach
■ A single-chip multiprocessor architecture composed of simple fast processors
■ Multiple threads of control
  > Exploits parallelism at all levels
■ Memory renaming and thread-level speculation
  > Makes it easy to develop parallel programs
■ Keep design simple by taking advantage of single chip implementation

Slide: Outline
■ Base Hydra architecture
■ Performance of base architecture
■ Speculative thread support
■ Speculative thread performance
■ Improving speculative thread performance
■ Hydra prototype design
■ Conclusions

Slide: The Base Hydra Design
[Block diagram: four CPUs, each with L1 instruction and data caches and a memory controller; write-through bus (64b) and read/replace bus (256b); on-chip L2 cache; Rambus memory interface to main memory (DRAM); I/O bus interface; centralized bus arbitration]
■ Single-chip multiprocessor
■ Four processors
■ Separate primary caches
■ Write-through data caches to maintain coherence
■ Shared 2nd-level cache
■ Low latency interprocessor communication (10 cycles)
■ Separate read and write buses

Slide: Hydra vs. Superscalar
■ ILP only => SS 30-50% better than single Hydra processor
■ ILP & fine thread => SS and Hydra comparable
■ ILP & coarse thread => Hydra 1.5-2x better
■ "The Case for a CMP", ASPLOS '96

Slide: Cache Hierarchy Details
[Key phrases: separate L1 instruction and data caches, set associative, write-through; shared on-chip L2 cache, write-back, pipelined; line-wide read bus and word-wide write bus; support for speculative threads]

Slide: Problem: Parallel Software
■ Parallel software is limited
  > Hand-parallelized applications
  > Auto-parallelized dense matrix FORTRAN applications
■ Traditional auto-parallelization of C-programs is very difficult
  > Threads have data dependencies => synchronization
  > Pointer disambiguation is difficult and expensive
  > Compile time analysis is too conservative
■ How can hardware help?
  > Remove need for pointer disambiguation
  > Allow the compiler to be aggressive

Slide: Solution: Data Speculation
■ Data speculation enables parallelization without regard for data-dependencies
  > Loads and stores follow original sequential semantics
  > Speculation hardware ensures correctness
  > Add synchronization only for performance
  > Loop parallelization is now easily automated
■ Other ways to parallelize code
  > Break code into arbitrary threads (e.g. speculative subroutines)
  > Parallel execution with sequential commits
■ Data speculation support
  > Wisconsin multiscalar
  > Hydra provides low-overhead support for CMP

Slide: Memory System Performance
[Key phrases: L1 cache hit rate, L2 cache miss rate, internal bus occupancy, write-through traffic and refills]

Slide: Data Speculation Requirements I
■ Forward data between parallel threads
■ Detect violations when reads occur too early
[Diagram: original sequential loop and speculatively parallelized loop, with forwarding from a write in iteration i to a read in iteration i+1]
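The two requirements on the slide "Data Speculation Requirements I" (forwarding between threads, and detecting reads that occurred too early) can be illustrated with a small software model. This is a simplified sketch and not the Hydra hardware: iterations run one at a time in an arbitrary order given by the caller, each iteration keeps a private write buffer and a read set, and a violating iteration is discarded and rerun together with all later iterations that have already run. All names are hypothetical.

```python
def run_sequential(body, n, memory):
    """Reference execution: iterations 0..n-1 in program order."""
    mem = dict(memory)
    for i in range(n):
        body(i, mem.__getitem__, mem.__setitem__)
    return mem


def run_speculative(body, n, memory, order):
    """Run iterations in `order` (a permutation of range(n)) and commit in program order.

    Returns the final memory and the number of restarted iterations.
    """
    mem = dict(memory)
    buffers = {}   # iteration -> {addr: value}, its speculative writes
    reads = {}     # iteration -> addresses read from outside its own buffer
    restarts = 0
    pending = list(order)
    while pending:
        i = pending.pop(0)
        buf, rset = {}, set()

        def load(addr):
            if addr in buf:                      # own earlier write: no dependence
                return buf[addr]
            rset.add(addr)
            for j in range(i - 1, -1, -1):       # forward from nearest earlier writer
                if j in buffers and addr in buffers[j]:
                    return buffers[j][addr]
            return mem[addr]

        def store(addr, value):
            buf[addr] = value

        body(i, load, store)
        buffers[i], reads[i] = buf, rset

        # A later iteration that already read an address we just wrote read too early.
        violated = [k for k in reads if k > i and reads[k] & buf.keys()]
        if violated:
            first = min(violated)
            for k in [k for k in reads if k >= first]:
                del buffers[k], reads[k]         # discard speculative state
                pending.append(k)                # and rerun that iteration
                restarts += 1

    for i in range(n):                           # retire writes in program order
        mem.update(buffers[i])
    return mem, restarts


def accumulate(i, load, store):
    """Loop body with a loop-carried dependence through the address "sum"."""
    store("sum", load("sum") + i)


if __name__ == "__main__":
    start = {"sum": 0}
    print(run_sequential(accumulate, 4, start))
    print(run_speculative(accumulate, 4, start, order=[3, 2, 1, 0]))
```

Running the iterations in program order causes no restarts, while an out-of-order schedule triggers violations but still produces the sequential result, which is the property the speculation hardware is meant to guarantee.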

Slide: Data Speculation Requirements II
■ Safely discard bad state after violation
■ Correctly retire speculative state
[Diagram: writes after violations are discarded; writes after successful iterations are retired to permanent state]

Slide: Data Speculation Requirements III
■ Maintain multiple "views" of memory

Slide: Hydra Speculation Support
■ Write bus and L2 buffers provide forwarding
■ "Read" L1 tag bits detect violations
■ "Dirty" L1 tag bits and write buffers provide backup
■ Write buffers reorder and retire speculative state
■ Separate L1 caches with pre-invalidation & smart L2 forwarding for "view"
■ Speculation coprocessors to control threads

Slide: Speculative Reads
■ L1 hit
  > The read bits are set
■ L1 miss
  > L2 and write buffers are checked in parallel
  > The newest bytes written to a line are pulled in by priority encoders on each byte (priority A-D)
[Diagram: nonspeculative "Head" CPU (#i-2), speculative earlier CPU (#i-1), the reading CPU (#i) and speculative later CPU (#i+1), with their write buffers and the L2 cache]
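The L1-miss case of the "Speculative Reads" slide, where the newest bytes of a line are selected per byte from the L2 cache and the write buffers, can be sketched in software as follows. This is an illustrative model only: the hardware checks all buffers in parallel with priority encoders, whereas the sketch applies them one after another, and the data layout and names are assumptions.

```python
def speculative_read(line_addr, reader, l2, write_buffers, line_size=4):
    """Assemble the cache line as seen by thread `reader`.

    l2:            {line_addr: list of byte values}
    write_buffers: {thread: {line_addr: {byte_offset: value}}}
                   lower thread numbers are earlier (less speculative).
    """
    line = list(l2.get(line_addr, [0] * line_size))
    # Apply buffers from the oldest thread up to the reader itself, so that
    # for every byte the newest visible write wins. Writes by threads later
    # than the reader must not be visible and are skipped.
    for thread in sorted(write_buffers):
        if thread > reader:
            continue
        for offset, value in write_buffers[thread].get(line_addr, {}).items():
            line[offset] = value
    return line


if __name__ == "__main__":
    l2 = {0: [1, 1, 1, 1]}
    buffers = {
        0: {0: {0: 9}},
        1: {0: {0: 7, 1: 7}},
        2: {0: {3: 5}},
        3: {0: {2: 4}},
    }
    print(speculative_read(0, reader=2, l2=l2, write_buffers=buffers))
```

Each thread therefore gets its own "view" of the line: a later thread sees everything an earlier thread sees plus the intervening writes.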

Slide: Speculative Writes
■ A CPU writes to its L1 cache & write buffer
■ "Earlier" CPUs invalidate our L1 & cause RAW hazard checks
■ "Later" CPUs just pre-invalidate our L1
■ Non-speculative write buffer drains out into the L2
[Diagram: write bus and L2 cache; invalidations & RAW detection from earlier CPUs; pre-invalidations from later CPUs; CPUs #i-2 to #i+1]

Slide: Speculative Runtime System
■ Software handlers
  > Control speculative threads through CP2 interface
  > Track order of all speculative threads
  > Exception routines recover from data dependency violations
  > Adds more overhead to speculation than hardware but more flexible and simpler to implement
■ Complete description in "Data Speculation Support for a Chip Multiprocessor" (ASPLOS '98) and "Improving the Performance of Speculatively Parallel Applications on the Hydra CMP" (ICS '99)

Slide: Creating Speculative Threads
■ Speculative loops
  > for and while loop iterations
  > Typically one speculative thread per iteration
■ Speculative procedures
  > Execute code after procedure speculatively
  > Procedure calls generate a speculative thread
■ Compiler support
  > C source to source translator
  > pfor, pwhile
  > Analyze loop body and globalize any local variables that could cause loop-carried dependencies

Slide: Base Speculative Thread Performance
■ Entire applications
■ GCC 2.7.2 -O2
■ 4 single-issue processors
■ Accurate modeling of all aspects of Hydra architecture and real runtime system
[Chart: speedups of the benchmark applications]
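The rules on the "Speculative Writes" slide for what a CPU does when it observes another CPU's store on the write bus can be summarised as a small decision function. The sketch below is an assumption-laden illustration: threads are numbered in speculation order (lower is earlier), and the action names are hypothetical labels for the hardware behaviour described on the slide.

```python
def snoop_write(writer, me, line_cached, word_read):
    """Actions taken by CPU `me` when it sees `writer` store to a line.

    writer, me:  positions in speculation order, lower means earlier.
    line_cached: the line is present in this CPU's L1 cache.
    word_read:   this CPU has speculatively read the written word
                 (its "read" tag bit is set).
    """
    actions = []
    if writer == me or not line_cached:
        return actions                     # nothing to do for own or uncached writes
    if writer < me:
        actions.append("invalidate")       # our copy is stale
        if word_read:
            actions.append("violation")    # we read too early: RAW hazard
    else:
        actions.append("pre-invalidate")   # stale only once we move to a later thread
    return actions


if __name__ == "__main__":
    print(snoop_write(writer=0, me=2, line_cached=True, word_read=True))
    print(snoop_write(writer=3, me=2, line_cached=True, word_read=True))
```

Only writes from earlier threads can signal a violation; writes from later threads never disturb the current thread and merely mark the line for invalidation later.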

Slide: Optimizing Parallel Performance
■ Cache coherent shared memory
  > No explicit data movement
  > 100+ cycle communication latency
  > Need to optimize for data locality
  > Look at cache misses (MemSpy, Flashpoint)
■ Speculative threads
  > No explicit data independence
  > Frequent dependence violations limit performance
  > Need to optimize to reduce frequency and impact of violations
  > Dependence prediction can help
  > Look at violation statistics (requires some hardware support)

Slide: Feedback and Code Transformations
■ Feedback tool
  > Collects violation statistics (PCs, frequency, work lost)
  > Correlates read and write PC values with source code
■ Synchronization
  > Synchronize frequently occurring violations
  > Use non-violating loads
■ Code motion
  > Find dependent load-stores
  > Move loads down in thread
  > Move stores up in thread

Slide: Optimized Speculative Thread Performance
■ Base performance
■ Optimized: used feedback from violation statistics to manually transform code
[Chart: base and optimized speedups]

Slide: Hydra Prototype Design
■ Design based on Integrated Device Technology (IDT) RC32364
■ 88 mm² in 0.25 µm with 8 KB I and D caches and 128 KB L2
[Chip floorplan]
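The feedback step described on the slides, collecting violation statistics (PCs, frequency, work lost) and finding the most damaging dependences, amounts to a simple aggregation. The sketch below assumes a hypothetical trace format of (read PC, write PC, cycles lost) tuples; the real tool and its data format are not described in the slides.

```python
from collections import defaultdict


def summarize_violations(events):
    """Aggregate violation events and rank load/store pairs by lost work.

    events: iterable of (read_pc, write_pc, cycles_lost) tuples.
    Returns a list of ((read_pc, write_pc), count, total_cycles_lost),
    sorted so that the most costly pair comes first.
    """
    stats = defaultdict(lambda: [0, 0])
    for read_pc, write_pc, cycles_lost in events:
        entry = stats[(read_pc, write_pc)]
        entry[0] += 1
        entry[1] += cycles_lost
    ranked = [(pair, count, lost) for pair, (count, lost) in stats.items()]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


if __name__ == "__main__":
    trace = [(0x40, 0x80, 100), (0x40, 0x80, 50), (0x44, 0x90, 500)]
    for (read_pc, write_pc), count, lost in summarize_violations(trace):
        print(hex(read_pc), hex(write_pc), count, lost)
```

The pairs at the top of such a ranking are the natural candidates for the synchronization and code-motion transformations listed on the slide.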

Slide: Hydra Team
■ Team: Monica Lam, Lance Hammond, Mike Chen, Ben Hubbert, Manohar Prabhu, Mike Siu, Melvyn Lim and Maciek Kozyrczak (IDT)
■ URL: http://www-hydra.stanford.edu

Slide: Chip Design Road Map
[Key phrases: synthesizable Verilog models of all components; design verification; circuit design and layout; tape out; milestones at H2 '99, H1 '00 and late 2000]

Slide: Implementation Tasks
[Key phrases: memory system (read and write buses, L2 cache controller, main memory interface and controller); speculative support (speculation coprocessor, speculative write buffers); I/O controllers (on-chip and off-chip); statistics mechanisms for performance evaluation and application tuning; exception and debugging support]

Slide: Conclusions
■ Hydra offers a new way to design microprocessors
  > Single-chip MP exploits parallelism at all levels
  > Low overhead support for speculative parallelism
  > Provides high performance on applications with medium to large-grain parallelism
  > Allows performance optimization migration path for difficult to parallelize fine-grain applications
■ Prototype implementation
  > Work out implementation details
  > Provide platform for application and compiler development
  > Realistic performance evaluation

Slide: Title slide of the second lecture (Hydra implementation)
Kunle Olukotun, The Hydra Team
Computer Systems Laboratory, Stanford University

Slide: Outline
[Topics: why a CMP; overview of the Hydra architecture; thread speculation support; new techniques implemented in the Hydra prototype]

Slide: Why CMP? Bandwidth
■ Bandwidth in a CMP can be made arbitrarily large
  > Wider buses may be implemented
  > Pipelined, faster buses may be implemented
  > Multiple buses may be implemented
■ All writes may simply be broadcast to all processors

Slide: Why CMP? Communication Latency
[Diagram, conventional multi-chip MP: CPU #1 writes to main memory (50 or more cycles), CPU #2 reads from main memory (50 or more cycles)]
[Diagram, Hydra CMP: CPU #1 writes to the L2 cache (5 cycles), CPU #2 reads from the L2 cache (~5 cycles)]
■ A CMP offers faster interprocessor communication (usually about 10 cycles, against more than 100 cycles)
■ Therefore, minimizing communication isn't as critical
■ Allows the exploitation of more fine-grained parallelism
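The claim on the communication latency slide, that shorter interprocessor latency makes finer-grained parallelism worthwhile, can be made concrete with a simple calculation. The model below is an assumption introduced only for illustration: each thread does `grain_cycles` of useful work and then pays one communication of `comm_cycles`, with the latencies taken as about 10 cycles for the CMP and 100 cycles for a multi-chip system, as read from the slide.

```python
def parallel_efficiency(grain_cycles, comm_cycles):
    """Fraction of time spent on useful work when every `grain_cycles`
    of computation is followed by one communication of `comm_cycles`."""
    return grain_cycles / (grain_cycles + comm_cycles)


CMP_COMM_CYCLES = 10         # write plus read through the on-chip L2 cache
MULTICHIP_COMM_CYCLES = 100  # write plus read through main memory (lower bound)

if __name__ == "__main__":
    for grain in (10, 100, 1000, 10000):
        cmp_eff = parallel_efficiency(grain, CMP_COMM_CYCLES)
        mc_eff = parallel_efficiency(grain, MULTICHIP_COMM_CYCLES)
        print(f"grain {grain:>5} cycles: CMP {cmp_eff:.2f}, multi-chip {mc_eff:.2f}")
```

Under this simple model the two organisations behave similarly at coarse grain sizes, and the gap widens as the grain shrinks, which is why the slide connects low latency with fine-grained parallelism.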

Slide: The Base Hydra Design
[Same block diagram and bullet list as in the first lecture: single-chip multiprocessor, four processors, separate primary caches, write-through data caches to maintain coherence, shared 2nd-level cache, separate read and write buses]

Slide: State Machine Design
[Diagram: access queues (FIFO buffer), next-state logic, state and cycle count, output logic, control to the rest of the chip, Central Resource Arbiter (CRA)]
[Key phrases: each state machine (SM) handles one type of memory access; an access passes through the sequence of arbitration, pipeline stages and statistics counters; an SM gets arbitration from the CRA and takes the necessary resources to prevent race conditions]

Slide: Why CMP? Simpler Design
[Key phrases: faster, simpler coherence protocols; less design time; less potential for hard-to-find errors; shorter, more efficient communication; processors still have independent execution]

Slide: Memory Controllers
[Key phrases: several independent state machines; L2 read and L2 write state machines; main memory read and write state machines; off-chip memory interface; state machines acquire ownership of the buses and controllers in sequence to allow high bandwidth]

Slide: Address Arbitration
[Key phrases: every memory access reserves its address; the address of each access is saved and parallel-compared with those of later accesses; if there is a match the later access is queued until the earlier one completes, otherwise it proceeds immediately; this prevents race conditions between multiple accesses to the same address]

Slide: Data Speculation Requirements: What do we need?
❶ Forwarding of data between threads
❷ Detection of violations when reads occur too early
❸ Safe discarding of bad speculative state after violations
❹ Retiring of speculative writes in original program order
❺ Maintaining multiple "views" of memory

Slide: Can a CMP do more?
[Key phrases: a CMP allows fine-grained communication among processors, but programmers must still find parallel code segments and maintain correctness; speculative thread support makes automatic parallelization easier because hardware tracks memory dependencies among threads; this support can be added to a CMP with small buffers]

Slide: The Central Resource Arbiter
[Key phrases: every cycle the state machines send requests for shared resources to the CRA; requests are prioritized by type using a table in ROM; the CRA grants and allocates resources, first to the highest priority state machine and then to others that can use the remaining resources]

University University

Bus Only

Mask

byte)

Clear and Clear (by

Stanford Write Stanford

Write Invalidation

from Commit

Gang (he) in

Force on Bus backups commits

Data Data

cache

Read (L2

Write on on

to advance and

Only

out

bits backups

Clear ICAM]

L2T«g Data

and

Invalidation clear clear

Addresses of

array Backup V violations

Read Gang on to to and

Force renaming

the detection

after

bits bits types commit in

up

to

until commits

valid

memory valid us are

extra

back violation on

4 here

to line required Details

buffers cause cause Allow pointer

Allow bits

priority

us

Allow held complete on

each

line are tail

all

bits bits cache ler

are

*** Clear of for earlier

L2 e Tag requires “ when Details

present with muxing Allow

cache

if

L2 Gang

mask by

and from writes byte-by-byte

from

clear array

circuits

tricky

into

modified pre-inval

tag write time read

data

are

Read-by-word: Pre-invalidation: Modified: Set Set Gang Written-by-word: Buffer

Cache

Drains Requires Line Collected Byte Any substituted, encoding > > > > > > >

CAM

Special Speculation ------Speculative commit

■ 12 L1 > > Reads

to

view 't SM “

1 1

for : thread Icpzj

i ‘ small

Sw:, threads

Pd (64b)

buffers: with forwarding

complete

1 1 | L2

reads the &**

™*- i IcPZj 1 speculative C«h*A A

completed Write-through state

smart backup ■

buffers

U &

| restart threads per of

buffer

from

: threads

i provide Support

>

forwarding

manages ' , 1 ;

' when

speculative

order

■» CP2 included 1

threads control j control

L2

buffers i

buffers to i retire

* pre-invalidation provide violations to

be

to

of I maintained write

and

when *

with

is detect background and controller

buffers may

sequencing L2 bits

buffers

Overview

bits

reorder

the sr caches

CM coprocessors

1 draining buffers tag

and in tag L1

buffer

sequencing

t buffer

L1 Speculation

L1 ”

buffers bus buffers !

- L2

C occur Clears Maintains Uses Allows Commits Dirty Read

Separate Speculation “ “ Write Write Buffer > > > > >

Extra One Simple 0 0 o e &

■ ■ ■ Hydra L2
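The slide fragments above mention per-word read bits in the L1 tags, writes broadcast on a write bus, and RAW violations that restart speculative threads. A minimal sketch of that detection idea follows, under stated assumptions: the CPU index doubles as the speculation order (lower index means earlier thread), memory is a small word-addressed array, and the names `spec_reset`, `spec_read`, and `spec_write` are hypothetical, not taken from the Hydra design.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_CPUS 4    /* assumption: four on-chip processors */
#define NUM_WORDS 64  /* assumption: tiny word-addressed memory, for illustration only */

/* One read bit per word per CPU, standing in for the read bits kept in the L1 tags. */
static bool read_bits[NUM_CPUS][NUM_WORDS];

/* Clear all read bits, as would happen when speculative threads commit or restart. */
void spec_reset(void) {
    memset(read_bits, 0, sizeof read_bits);
}

/* Record that a speculative thread on `cpu` has read `word`. */
void spec_read(int cpu, int word) {
    read_bits[cpu][word] = true;
}

/*
 * A thread on `cpu` writes `word`. Every later (more speculative) CPU that has
 * already read the word consumed a stale value: a read-after-write violation.
 * Returns a bitmask of the CPUs that would have to restart.
 */
unsigned spec_write(int cpu, int word) {
    unsigned violated = 0;
    for (int later = cpu + 1; later < NUM_CPUS; later++) {
        if (read_bits[later][word]) {
            violated |= 1u << later;
        }
    }
    return violated;
}
```

In this sketch a write by the earliest thread to a word that CPU 2 has already read returns a mask with bit 2 set; a write by the latest thread never flags anyone, because no thread is more speculative than it.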

needed

changes is

tone

7 2 12 - - 12 -30 OS -22

address + processors

Loop-only Omhid

17 end

bits tine

handler

sequence 15 -70 -70

OS other -no

Hs sequencing special +

threads

OwM

25

Procedure a tag

of

if

this on

CPU that to starts

snd and track Mice

by loop

execute from

routines: the present) cache

and iter

buffers if Ae

thread procedure SYSCALL

on execution

a execution a to

on end loop speculative

iteration, iteration, this

code, procedure

registers thread

task cache primary

a software

starts

thread, non-speculative.

committed

less

next of

cache loop loop

a

for at speculatively

the processor

then the speculative

following following a requires out

to is

Vee writes

speculative Summary processing

it

when snd

completion secondary

violation

current current speculative

runs

procedure code thread major if

tone L1 or head"

its current stops

down system the the until ...

processing

"

the

off processor processor coming bits RAW

bits the

the

a another

off

iterations, shuts the processor,

processor of attempts

machine speculative forked control

of this control

then the the loop Forks loop Completes Completes Restarts

Completes

then Temporarily (or

Prepares

running has Handles Pauses simply CPU a

Coprocessor

of thread Full running loop

T

Local from

Loop speculative

state are each

Exception Procedure Buffer Procedure

of the

stop SunL the together Finish

Receive End

new Start Violation:

Hold: End Hold:

System clearing the a or to

some all operation commands

examples: small “

it a Procedures

speculation

Restart Start Start

are

- - - Has Initiates Commands Some

> > > > Interrupts Holds Maintains Here Controls Catches Putting

Speculative Runtime ■ ■ ■ ■ ■ KB

2

a

for but on

the

Lines

violations

KB processors Support Cache

1

B

hardware 64

Improving “

Applications other

than to

structures and

dependency

98

‘ Speculation

System

Parallel

KB

implement 0

from

memory

to Data

commands

“ speculation

in ASPLOS to

99

‘ ” threads

Sizing recover

simpler to

ICS

Speculatively speculation

Runtime ”

write-based writes

and of

coprocessor size

overhead

are line

routines

CMP description thread fully full

in

caches the

Buffer

most a speculative

Handlers

managing sending per more

L1

flexible

Multiprocessor

when Hydra By Using By

buffers used of KB

- - - 1

Exception Control more Performance Complete Chip Adds the Data buffers buffer

< captures Stall We associative

pair > > > >

Software > > > Small sufficient comparable a All

L2

Speculation ■ ■ *

and bits

avoid

... Hydra to

logic

of

tuning

evaluation ...

including

speculation code

merging necessary

version

macros

L1 for as

mechanism here buses

controller

read

controller for

including cores

cache

performance

write

functional implementation

pipelined

for and required coprocessor, cache

information screening memory

reference

be

and

existing

system,

fully interface

structures resources invalidation of core

fan-out CPUs

will

arbiter

main buses

all

yet

read

Challenges rate

the

rate

operation

high gang memory interrupt

feedback for bits cache

CPU

Overview

of secondary

write mechanisms

memory speculative

long

system L2 small

memory

debugging

clock

off-chip

cycle

clock a

with clear resource and

for fan-in, our our

and

generate

MHz Design

Memory Single slowdown Conditional Read Gang Simple Target Controllers I/O Speculative Statistics On-chip Speculative to

High

> > > > > > > > > > > > > > Drivers Speculative Clearable Central Building Starting Adding Adding

250 Key Prototype ■ ■ ■ ■ ■

-► RC32364 out

hazard,

X to

X

by KILL

data sends

restart (IDT)

handler the

VIOLATION

the

address address

message

L2 the speculative respond

CPUs

handler

-

3

KILL notices notices reads writes

KB 1 1 a more processors executing handler starts The They All 0 exception

128

***** Technology

access and

D bus

Hazard

Device ______i \ 11

write

a KB

Floorplan

[ NMI 8 |

mi indicate

Data

:■ with ******

Integrated

a Lines CM) OwtM*

- \

on I

of

0.25 µm

*

in based Prototype

2

mm ■<

Design 88

> > Hydra Anatomy

of

purposes between

make efficiently

easy can

evaluation memory workstation

memory

balance fine-grained

statistics memory

execution host

for easier implementation

resources main main

relatively relatively

a

further main

makes software

is

effective to into

speculation

CMP

from program

allow

allocate internals

off-chip

like

performance

latency

offers

automatic

will hardware

chip

and

interface results

programs during

large

speculative parallelization to

of of

I/O nearly to I/O

reasonable

structure

and

techniques I/O

mechanisms a

make

prototype for

possible

speculative interface interface loading reading simple complexity

communication

can parallel

interface offers

Hydra hardware

Allows Allows Lower Allows Allows Bus-and-SM Support parallelization design Arbitration threads Adding CMP

>- > > > > > > > > Direct Simple Hydra A our

The Prototype Conclusions ■ ■ ■ ■ ■

tuning

2000 all

code of

of

for

circuitry end

coprocessors

by HTOO

models

by monitors out 1999

violations

system

of

on times

Verilog tape Map

layout speculative

monitors

end

machines Mechanisms

in

and

memory reference by

and

feedback

state arbiter debugging utilization Road

timers in

Finish for

design . through memory

synthesizable

provide

verification

on idle-arbitrating-busy resource

resource

circuit counters chains

Design

monitors

Track

Track Primarily > >- > Bus Programmable State Scan Central Speculative Finish Working Complete components

Statistics/Debug ■ ■ ■ ■ ■ ■ Chip ■ ■ ■

but

violations loops

processors

hardware

...

other

parallel than

to

structures

threads CPUs

dependency dependency

System system

the

implement

from sequence

in memory

to commands

speculation

speculatively Java

parallelizer

to

speculative caches threads in

recover

C

into simpler simpler

properly L2

for

management

speculation

caches

Runtime collector

to to

concepts write-based

and

L1 coprocessors coprocessor loops

overhead

routines threads in

thread the

speculative added automatic Handlers support

code

managing garbage system sending bits more

flexible

together together

By Using By user the

- - - In Dynamic Basic In Extra Buffers Exception automatically Speculation Transforms Control more Adds

> > > > > > > > > > > Speculative Hydracat Software Hardware Software Works

Speculation Outline ■ ■ ■ ■ ■ ■

while code

easier

dependencies

parallel the

management data

in

parallelism data conservative

for

CMP

Hydra

of post-subroutine for programming

easier

parallelization enforces

parallel

program

Laboratory semantics a in

with on

Team

of

much

simplifies

make instead system

parallel University

Olukotun

Hydra parallel program Hydra

iterations

Systems Support in runtime locks opportunities sections

threads Kunle

optimistic The for

loop

Stanford be parallelization

communication sequential

conventional

execute

can expands parallel

to

Computer multiple subroutines Programming

manual supported

Stanford latency run run allows

Hardware-software

Software - LL/SC Low maintaining Attempts Can Can Makes Compilers

► > > > > > > Hydra Speculative Speculation

Parallel ■ ■ ■

iteration ”

speed

memory Quick

Overhead “ each for

checked -no

management code loops to

speculation

forked

I

and

reasons: through through

loops Slow and off

and “

Overhead

another within

this

forked

processor, to

processor

another

made

has compatibility this

locks call this overhead

within

shuts loop assembly requires

passed by on

this that attempts

CPU

be processor, several processor procedure

running

subroutine

for

then the Loops Subroutines

then

call this this

be thread

procedure

execute

on

and

starts

requires

a on by

processor for and loop

procedure committed speculatively execution

MIPS task Use a and use require

the

speculative

thread subroutines for subroutines on

starts (or iteration, for

for must

speculative

present) processor code,

following iteration,

committed violation if speculatively speculative

processor, then loop

to loop speculative

code subroutines

less current

iteration into and

RAW

loops

a processing speculative

processing

the thread,

a

speculation subroutine

the violation

current

speculative completion

system loop off

current

of its when down

the execute .

less the current

the structures expensive expensive

next RAW

a ■

iterations,

Handles

registers predictions must predictions a Forks Restarts Completes

the

the made

adds disable

when can are Prepares run

Completes data CPU to Completes Handles

Restarts Local types are

Receive hand-optimized

value

subroutine Procedure Procedure

loops

Software another Software in

Routine loops loop CPU

” Local End Start

Receive Violation: as ” Loop

Loop from

Violation:

each

routines another of iteration

Start Return Complex Finish bodies

Violation:

different Callee-saved End from Violation: Slow Quick

- - - “ “ Unfortunately, Same Written The

- - - - > > Loop Two

> > Support Support

values

loop- forked

available

effects j

return

hoc*

dependencies side

among tror'x

predictions

... controlled

made Threads

dynamically executed from

4-45

return is Threads

are

predictable

are

easily

or iterations

distributed Speculative errors

loop-carried loop

calls check

Speculation ... Speculative

*■ > &

as and/or T!"T

VOID

: nt- body

call:: Prod prevents with

enforces continuations generate limited

Speculation dynamically software

body

are with

Prod

specially-marked ■

hardware hardware handlers a

subroutines loops

dependencies Loop

speculation when

iterations

Iteration by Software

Program

Original - Post-subroutine-call Requires Speculation off Loop Requires CPUs carried Speculation

> Original > > > > > Post-subroutine-call Loop

loop loop Test

in

); watt);

Start

y,

Initialization x,

locally *«»);

. () variable

out

hazard,

used location to X

X function

i*l;

,..)i * by ( KILL

Xi

data counter » y

= sends yt restart

loop

W) * y handler the

VIOLATION

the 2S|

Loop variables

< address original address

dependencies message f l*

«**

s

// t //Loop-carried the •I**

’ speculative respond

CPUs

* handler

(Hi U

() 3 «

KILL reads notices writes if spec_end_of_iteration

X it Another_Big_Function More_Code_Here

separate loop

1

1 processors handler a more executing All 0 starts exception The They processors data

all the

ThisLoop New,

to In

true Loops

ThistoopUi void

I:

hardware: broadcast when

write a

Dependencies

invoked

indicate

loop is

code speculative

in

Lines

the

code itself

Data

locally Conversion by variable system

function x, used

Siwj ,

loop control

, ti encapsulation

i CPUs own (

counter

i*i;

the all its =,

x? loop

Loop, Variables Loop-carried

y in

*

i++* runtime on / body

detected

// // loop / variables y

25} speculative

90;

loop

!* oop « loop else

are I FOR The i

<1

Add Put

*

> Hydracat Enforcing if X Another_Bifl_Funcfelon — — Program Start Speculative The The i, Transform

1. int

” that

body only

7 7 -22 Improved Overhead

“ ... loop loop

code

system single

loops

tedious equivalent

” purposes

a by

the the

is

II

Quick subroutine

in

converted required “

on just special

register-allocated

; t ’

be ”

own not

loops pwhile simplified 80 with

~

to Slow programs

a

its

system “ aren returns

speculative

code

automated performance

Loops and

s

need into the and uses

processors marked shuts be

loop for

this

attempts

processor, processor procedure to

parallel user values then

for

then all this this

that

execute

pfor and subroutine,

requires to

on by

then

loop

and

can

body) execution speculative a

the

speculative loop

speculation management locals,

thread

on

are

start (or starts iteration,

processor present)

iteration,

committed loops loops if speculatively

Use

into then loop loop added develop into continues,

to to

loop

iteration and to

loop pure speculative

processing

thread,

to

management

violation thread

current speculative ” (or

system loop converts current

process

original of loop-carried

p down the

less

the current

the

code RAW next

made a

iterations,

Overview a

loops

the

the a

loop easy

subroutine breaks,

the is Software loop

the when tool

that overhead

run Prepares ”

it

Completes localize to

Handles Completes Restarts all

add setup the

portions

to

subroutine loop

CPU loop

initial Local

Receive

Loop

Loop

Minimal No Key Ensure each Makes Add Convert Convert Try Just

Pull

another of iteration

Routine Start - - - Finish Improved > > > > > > > > Violation: Entire End “ However, disables Converting Our from Violation:

> > Hydracat Support ■ ■ ■

up t

x:

+■

; be i*i;

local_sum value * violations

t

speed

parallel y,

lot!

locals#

x, local_sum) * a

r

even maintained

, frequently

not

critical i fi

(

parallelism local_sum

information * i*i; case

parallelization

the

= are help can

(...);

hidden

y necessary

(nonlocals->sum)

that may

most

*

25) dependency failure

can be

(nonlocals->sum) so != application

A_Big_Function

of i

value common

violation calculating

track (nonlocals->sum) occur local_sum Another_Big_Function More_Code_Here

the

Prediction may semantics semantics A

automatic movement artificially with

the

of

r

written information critical

is chance

value Rare

is the parallelism

Case a Fixup

lative Move code runs for of provides monitors make

s

involved ’

tuning allows

improvements

to code

Value ); *->«»));

sequential y;

there + loop-carried

dependencies without

algorithm optimization

calculation quick support untuned

nonlocal

Optimization ( occasionally always the

true

y, with with when

critical attempts

speculation,

in

a

code

x,

(nonlocals->sum) because record first,

sequential

,

small, ,

i to

(

i

performance ( is sum) only

i*i; * where Hardware/software and is writes

involved =

few (...);

- y

Fix not Often, present Speculation during that that A This

25)

performed Prediction Thanks > > > > > !=

Hence However Speculation A_Big_Function

Code Code Code > > (i

Feedback ■ ■ ■ Optimization: if x* More_Code_Here Another_Big_Function

; ; struct

Locals struct

; ; ));

) Induction

Stanford sum PdCk Optimization Loop-carried

loop ->

s UfipBCk

Transformation i 1 in

(nonlocals->sum) variable local_sum

tl

closer

the ;

«

y,

y.

. «t, iOttloeal»->*UB>

variables locally

nonloca x, loop location counter , ( function ■

variable 1

<

nonlocals->sum)

of ( ,

used ; Ci, l i

loop-carried loop

0 ( ( ■*

local_sum p

; ; *nonlocale) loop ..,);

. copy +* i*i; •

)

counter %:' subi sun

original

. i

=

variable 0: =

s

’ increase increase y * Nonlocal Nonlocal Loop-carried Local (nonlocals->*um) Loop loop-carried Variable*

sum i '

, . U ft II // * // Au.Bia_Functi Variables separate 25) ( loop

thisLoopVars

; ( with thisLoopNonlocals;

!=

(nonlocals->sum) (nonlocals->sum) x Another_Big_Function More_Code_Here( the i; A_Big_Function sub 0;

New,

(i struct = value In II:

y;

int int thisLoopNonlocals

nonlocals->sum) struct thisLoop ( greatly if local_sum i: - Another_Big_Function More_Code_Here

x, sub

thisLoopVars;

) typedef thisLoopVars thisLoopNonlocals Movement int int int thisLoopNonlocals sum void thisLoop(thisLoopNonlocals *
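The fragments above show a loop over `i != 25` whose body calls `A_Big_Function` and `Another_Big_Function`, with the body moved into a function `thisLoop` that reaches outside state through a `thisLoopNonlocals` pointer. A minimal sketch of that kind of encapsulation follows. It is an illustration only: the two function bodies are placeholders, the exact statement order on the slide is not recoverable, and the names `thisLoopBody`, `original_loop`, and `encapsulated_loop` are hypothetical.

```c
/* Variables that live outside the loop body but are written by it. */
typedef struct {
    int sum; /* loop-carried: every iteration reads and updates it */
} thisLoopNonlocals;

/* Placeholder bodies; the slide elides what these functions compute. */
static int A_Big_Function(int i) { return i + 1; }
static int Another_Big_Function(int x, int i) { return x * i; }

/* Original form: locals and nonlocals mixed inside one loop. */
int original_loop(void) {
    int sum = 0;
    for (int i = 0; i != 25; i++) {
        int x = A_Big_Function(i);
        int y = Another_Big_Function(x, i);
        sum += x + y;
    }
    return sum;
}

/* One iteration as a standalone function: x and y stay local to the
 * iteration, while the loop-carried sum is reached through the pointer. */
static void thisLoopBody(thisLoopNonlocals *nonlocals, int i) {
    int x = A_Big_Function(i);
    int y = Another_Big_Function(x, i);
    nonlocals->sum += x + y;
}

/* Encapsulated form: a speculative runtime could hand each call of
 * thisLoopBody to a different processor; here it simply runs in order. */
int encapsulated_loop(void) {
    thisLoopNonlocals nonlocals = { 0 };
    for (int i = 0; i != 25; i++) {
        thisLoopBody(&nonlocals, i);
    }
    return nonlocals.sum;
}
```

Both forms compute the same result when run sequentially. The point of the transformation is that the only memory shared between iterations is what sits behind `nonlocals`, which is where loop-carried dependencies have to be tracked.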

:

critical loop-carried can

value a the

of

) to body registers

parallelism compilers

of

Code

); code y;

write SGI

♦ loop-carried out loop

a for

calculation

ndnlocal»->euni> ( -**«*) the work Conversion

with critical y,

transformations

of available a variables

(nonlocals->sum)

x, transformation

CPUs

, of

i moving nonlocals itself < (

all

top loop variable

* writes on body

Involved

=

loop variables

transformations y

the oop not loop that

I FOR loop-carried

25)

Simply amount to These

!=

Program Start Speculative The Speculative The

A_Big_Function(i, — Code Code (i >

Force =

Hydracat (nonlocals->sum) Optimization: if x Another More_Code_Here 2. ■

no

used

Class with

and transform

RTS

statistics

dynamic intervention

performance

C compiler,

manually

manual Base Optimized code to Violation JIT

multiprocessing

fine-grained ■ # ■ facilitate

with for GC,

from

(e.g. Java

suited compiler

benefit for JIT well

required can

environment

and

model

Ideal and

speculation multiprocessing

implementation Performance

routines routines

is

of

thread environment

these

Java

support

of machine

specification Hydra

Native Runtime Many loading/verification) Most management coarse-grained

> > > > Virtual Java 4-,

Programs Speculation Why ■ ■

is

that

until

C ):

loops, waits

structure

normal )); >aum) that

*

overheads

code with

continuing routine frequent

like

most its

synced optimizations with

"nonlocals"

{nonlocals before

the ,

easy

performance be i

i

in ;

(

just i to

with { dependencies parallel

i i*i; helpful

is

be

direct « X;

protect equal

{...);

variable

with

=

a(nonlocals->sum_lock y assembly-language high

be

>eum_loek) y amortize

new

w to an

a 25) along should is

Hydracat minor

can is help becomes

0,

to helpful else

to

can (i

to «

be

used used (nonlocals spec_lock(i, x if with Another_Big_Function More_Code_Here

form

sum_lock spec_lock sum_lock

' initiated

^ conventional conventional protect achieve

be

can in enough

programming

to

Speculation can

can

feedback

dependencies speculation large

programming

; ;

) Synchronization parallel

dependencies continue

the from

low-overhead parallelization

code speculation

provide for

can critical

y? its

local_sum)

routines

; synchronization

y, programming

in parallelism parallel code

can

it

x, to

(nonlocals->sum)

of

, most

, i

{ synchronization i (

local_sum

iteration automatic i*i; «: speculation,

the * x; protect

(...};

unpredictable synchronization = y section

(nonlocals->sum) Similar

variable

to

y

Speculation

” * 25) - - Loop Post-subroutine-call predictable Fully sequential especially With

Explicit Only and

Hardware = else A_Big_Function

sum >- > > > > Explicit Critical “ added (i

Speculative > > Speculative

=

(nonlocals->sum)

if x

Another_Big_Function More_Code_Here Optimization: Conclusions

... □ □ ■ ■

/

for

stack code

required

by

as during

from

C disables

profiling / invoke

in

speculative detail (LOD)

compiled Java

passed to

JIT vs.

dynamically accesses

of

code code

enables Methods

compilation identifies top

heap

controlled

registers Under

at

speculation overhead

flag system level-of-detail

annotations techniques assembly

Java

space

dynamic separate

code minimize

access

of runtime dynamically

on profile code dynamically

inserts

can annotation

procedural considerations Profiling

the performance

clearly sampling of

methods

code to to

method speculation

assembly system

remove

accesses change utilize

adjust / reserved

compiler compiler advantage

rewriting Use profile Can local Can Can method Unused methods speculation Add Bytecodes JIT JIT

> > > > > » > > > Implementation Runtime speculative Take Advantages Analogous Java-specific

■ Speculating ■ ■ ■ Advanced ■ ■

) run

to

platform call

penalty

any control, code

Java version

on

ISA

method in source-to-classfile

process source only

or

performance

MIPS g. d)

choses ’ implemented or

no

(e

required

bytecodes for

executable

for call (GPL

hand, http://www.transvirtual.com

translator (

body

class-private speculation by

primitives common support

compiler allocation

normally normally Environment

dynamically into for loop licensed method

additions handlers

loop

Speculation

done

(JIT) of

for machine compliant

transformation inlined

register compiler

SwingSet body

libraries public

/

1.1

normal

easily be

system

specific

body Hydra

Loop be code

source-to-source loop Multiprocessing synchronization) Global Optimization AWT

virtual JDK Can

-

Speculation

versions

------Hydra Support General Just-in-time bytecode Default Loop Can Via

> > > > > > > > Kaffe Move Some Runtime Two

Using * Java ■ ■ ■ ■

source

source

object speedup

hand-

38x

stack 1.98x 1 garbage

gray

completes Hand-picked methods => => speculative Unmodified optimized With

be objects roots

speedup occupancy > > »

■ □ as

runtime

must sweep objects; objects classes

points-to from

after gray marked

white

Speculation all

or default objects

or

to

heap;

pointers marks

black white

black heap explicitly

to all

Collection to

pointers live object

Method

sweep left, -

- unmarked

of

objects points

points

or

-

after

stack loader

base from objects heap;

Sweep

heap;

black as

garbage live class gray

live

thread model

- -

references -

no and acts

Default Black Native Root becomes Gray White and

> > > > > > Roots Tri-color

Mark Speedups ■ ■
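The slide fragments above name a tri-color model (white, gray, black), roots such as thread stacks and the class loader, and a sweep that runs after marking completes. A minimal sketch of tri-color mark and sweep follows, assuming a fixed number of reference fields per object and a caller-supplied gray stack. All type and function names here are hypothetical and are not taken from the garbage collector the slides describe.

```c
#include <stddef.h>

typedef enum { WHITE, GRAY, BLACK } Color; /* unmarked, marked but unscanned, fully scanned */

#define MAX_REFS 2 /* assumption: every object has at most two reference fields */

typedef struct Object {
    Color color;
    struct Object *refs[MAX_REFS];
} Object;

/* Mark phase. gray_stack must have room for every object in the heap;
 * each object is pushed at most once, when it turns from white to gray. */
void mark(Object **roots, size_t nroots, Object **gray_stack) {
    size_t top = 0;
    for (size_t i = 0; i < nroots; i++) {
        if (roots[i] != NULL && roots[i]->color == WHITE) {
            roots[i]->color = GRAY;
            gray_stack[top++] = roots[i];
        }
    }
    while (top > 0) {
        Object *obj = gray_stack[--top];
        for (int k = 0; k < MAX_REFS; k++) {
            Object *child = obj->refs[k];
            if (child != NULL && child->color == WHITE) {
                child->color = GRAY;
                gray_stack[top++] = child;
            }
        }
        obj->color = BLACK; /* all objects it points to are now at least gray */
    }
}

/* Sweep phase: objects still white are garbage; survivors are reset to
 * white for the next collection. Returns the number of garbage objects. */
size_t sweep(Object *heap, size_t nobjects) {
    size_t freed = 0;
    for (size_t i = 0; i < nobjects; i++) {
        if (heap[i].color == WHITE) {
            freed++;
        } else {
            heap[i].color = WHITE;
        }
    }
    return freed;
}
```

The loop over the gray stack is the part the later slides appear to target for speculative parallelization, since each gray object can be scanned largely independently of the others.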

in

sweep

critical objects

with

be

and

live

regions from longer

mark speedup

call

Profiling optimizations

no to

frequencies identify

/

benefit programs

and can method

C

with Collection statistics or

speculative

may

as

speculation

incremental

accuracy which of

that

is

speculation

bodies

addresses dynamically

disable profiling

/ transformations

loop loop

can

objects of execution

prediction sections from

performance

optimizations Garbage violation

of

based and

enable

of free

effectiveness

Speculation value cycles

system critical in

results

and collection

implementation sorts

detailed

Return Frequency Size Speculative

- - - - procedure heap Dynamically Determine Programmer runtime Same Identify speculation

>- > > > > Collect Applying referenced Use Baseline Garbage sections the

Speculative Optimize ■ ■ ■ ■ ■

a

each

calling

size from

modify

with barriers bytes

300

that

examined object

object 200 list

write speculative < than

we by

barrier as gray

white

violations

with list collection a

more invariant

aastore) the bytecodes

sorted

to

write arrays no

at gray on

method

short-lived, on Barriers computed

lists thread but

applications

expected

onto putfield, pointer

pre

no Collector

free barrier

a

head calling objects

most incremental long-lived black/white

are sweep off in of

as accomplished

objects, of

common, good object

Write

write

for write

and speculation to sizes

required, (putstatic,

usually

most 10-25 on

white barrier objects

number - -

maintains

results inserts

barrier

mark

value collection object least Garbage

sizes continuation write

these

attempts some at requirements

trapped objects write objects object;

references

all

procedural

return

compiler

of

allocation object

Place Sweep Execute Execute No method Sweep thread Within black Large heap Non-array Allocate JIT Trap

Small

> > > > > > > > >- > > > > Role Using Experimental Fast Incremental Non-copying Two

Baseline Speculating ■ ■ ■ ■ ■ ■ ■ University University
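The fragments above describe write barriers that the JIT compiler inserts on reference-storing bytecodes (`putstatic`, `putfield`, `aastore`), trapping attempts to write references to white objects into black objects and placing the white object onto the gray list. A minimal sketch of such a barrier follows, assuming a single reference field per object and an intrusive gray list. The names `GcObject`, `gray_list`, and `write_ref` are hypothetical.

```c
#include <stddef.h>

typedef enum { GC_WHITE, GC_GRAY, GC_BLACK } GcColor;

typedef struct GcObject {
    GcColor color;
    struct GcObject *field;     /* assumption: a single reference field */
    struct GcObject *next_gray; /* intrusive link for the gray list */
} GcObject;

static GcObject *gray_list = NULL; /* objects marked live but not yet scanned */

/*
 * Reference store guarded by a write barrier. If a black (already scanned)
 * holder is about to point at a white (unmarked) object, the white object is
 * turned gray and queued, so the collector still visits it and the invariant
 * "no black object points to a white object" is preserved.
 */
void write_ref(GcObject *holder, GcObject *value) {
    if (holder->color == GC_BLACK && value != NULL && value->color == GC_WHITE) {
        value->color = GC_GRAY;
        value->next_gray = gray_list;
        gray_list = value;
    }
    holder->field = value;
}
```

Stores into white or gray holders take the fast path, since the collector will scan those holders later anyway; only the black-to-white case does extra work.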

Non-spec Non-spec Non-spec Non-spec Non-spec spec

Implementation(s)

spec, spec, spec, spec, spec,

Parallel Procedure Loop Loop Loop Loop Loop references root Collection

stack

GC

finalize

thread needed again

thread

to

Collection if

invoke list list

objects native

objects objects Function

Program object limit, objects

gray gray Finalizer

barrier at root root

Garbage

white

Resume Mark Heap Mark Invoke Identify Free Write Sweep Finalize Sweep Sweep loader

class and GC GC GC GC GC GC GC GC

Thread default Program Program Finalizer Mark Parallelizing

in

and

and spent

speeds up

JIT speedups code speedup loops larger

time as

on

sections have

critical when

based relative Calculated in GC application Collector impact improves compiled will identifying

Only

* ■ wait

like

variables idle

in

implementation non-issue

GC

a in

1% 0% 0% 7% 7% 3% 7% 5% 6% 2% shared 16% 12% 16% 10% 27% iterations 20%

is debugging time

debug on Speedups %

resulting

9.5 9.2 and 17.4 16.5 58.3 58.2 33.5 70.2 79.4 40.4 62.3 89.3

135.2 122.1 2837 295.2 time

between speculative

(ms)

Parallelization model, to

Elapsed correctness design

loops performance

to is

1.00 1.11 1.05 1.04 1.21 1.12 1.12 1.04 Overall balanced

since synchronization loops

similar iterative Collection

not work GC to

3.41 3.12 2.27 2.66 2.28 3.78 2.53 2.84

Speedup difficult is have of

orig opt orig opt orig orig opt orig orig opt orig opt orig opt opt opt Bound Bulk Must More violations, work Traditional

> > > > Set

Non-speculative Speculative Structurally

Garbage ■ ■ vs. ■ Benchmark mtrt Swing compress db jack javac jess jBYTEmark

2.20 2.61 2.11

- Speedup from 2.22, 2.41, 2.61 2.41, 2.09, 3.41, 3.11

Loops

between

speculative

benefit

per speculative

linked-list can iteration Non-spec Non-spec Non-spec Non-spec Non-spec

iterations spec

Implementation

dependencies per each Collector

spec, spec, spec, spec, spec, Collection

loop

which for

Parallel Procedure Loop Loop Loop Loop Loop elements

accessing

loops for

between overheads

GC

Critical finalize

thread needed again

linked-lists to

non-essential if

linked-list Garbage invoke

list list on

collector objects objects

in Function

Program balancing object limit, objects

gray gray gray Finalizer

dependency

speculative at

root root multiple

four

load multiple eliminate

Resume Mark Heap Mark Invoke Finalize Identify Sweep Sweep to

Eliminate iterations Better iteration Maintain Amortize

> > > > GC GC GC GC GC Process Identified Need iteration speculation Thread

Program Finalizer Program Speculating Speedups ■ ■ ■

JIT

GC,

VM:

high! into

speculation

and

method

processors system verification

concurrently utilization

and speculation

match /

profiling and

Java

loops runtime threads

good available optimization

loader of a for

processor

to Java

Java

managed

parallelism

on based class

keep Hydra work

to multiple

advantage is with

Dynamically Run Feedback Incorporate compiler, Speculate

> > >- > > Distribute Goal Java Take

Conclusions ■ ■ ■ ■

TEL 03-3987-9354 FAX 03-3981-1536

/l —3 ? •ri'yp ^©iSSW^E

16 fi ¥1® 1 2 ^ 3 H

=7=105-0011 ESSSKS&E3-5-8

* IS 03-3432-9390 FAX 03-3431-4324