5HYLHZꢊꢉ5HRUGHUꢉ%XIIHUꢉꢍ52%ꢎ
&6ꢀꢁꢀ
*UDGXDWHꢉ&RPSXWHUꢉ$UFKLWHFWXUH
/HFWXUHꢉꢃ
8VHꢉRIꢉUHRUGHUꢉEXIIHU
² ,QꢏRUGHUꢉLVVXHꢌꢉ2XWꢏRIꢏRUGHUꢉH[HFXWLRQꢌꢉ,QꢏRUGHUꢉFRPPLW ² +ROGVꢉUHVXOWVꢉXQWLOꢉWKH\ꢉFDQꢉEHꢉFRPPLWWHGꢉLQꢉRUGHU
ª 6HUYHVꢉDVꢉVRXUFHꢉRIꢉYDOXHVꢉXQWLOꢉLQVWUXFWLRQVꢉFRPPLWWHG
,QVWUXFWLRQꢉ/HYHOꢉ3DUDOOHOLVPꢊ
*HWWLQJꢉWKHꢉ&3,ꢉꢋꢉꢅ
² 3URYLGHVꢉVXSSRUWꢉIRUꢉSUHFLVHꢉH[FHSWLRQVꢂ6SHFXODWLRQꢊꢉVLPSO\ꢉWKURZꢉRXWꢉ LQVWUXFWLRQVꢉODWHUꢉWKDQꢉH[FHSWHGꢉLQVWUXFWLRQꢄ
² &RPPLWVꢉXVHUꢏYLVLEOHꢉVWDWHꢉLQꢉLQVWUXFWLRQꢉRUGHU ² 6WRUHVꢉVHQWꢉWRꢉPHPRU\ꢉV\VWHPꢉRQO\ꢉZKHQꢉWKH\ꢉUHDFKꢉKHDGꢉRIꢉEXIIHU
6HSWHPEHUꢉꢀꢇꢌꢉꢀꢈꢈꢈ
3URIꢄꢉ-RKQꢉ.XELDWRZLF]
,Qꢏ2UGHUꢏ&RPPLW LVꢉLPSRUWDQWꢉEHFDXVHꢊ
² $OORZVꢉWKHꢉJHQHUDWLRQꢉRIꢉSUHFLVHꢉH[FHSWLRQV ² $OORZVꢉVSHFXODWLRQꢉDFURVVꢉEUDQFKHV
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅ
&6ꢀꢁꢀꢂ.XELDWRZLF]
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
/HF ꢃꢄꢀ
5HYLHZꢊꢉ0HPRU\ꢉ'LVDPELJXDWLRQ
5HYLHZꢊꢉ([SOLFLWꢉ5HJLVWHUꢉ5HQDPLQJ
0DNHꢉXVHꢉRIꢉDꢉSK\VLFDO UHJLVWHUꢉILOHꢉWKDWꢉLVꢉODUJHUꢉWKDQꢉ
4XHVWLRQꢊꢉ*LYHQꢉDꢉORDGꢉWKDWꢉIROORZVꢉDꢉVWRUHꢉLQꢉSURJUDPꢉ
QXPEHUꢉRIꢉUHJLVWHUVꢉVSHFLILHGꢉE\ꢉ,6$
RUGHUꢌꢉDUHꢉWKHꢉWZRꢉUHODWHG"
² 7U\LQJꢉWRꢉGHWHFWꢉ5$:ꢉKD]DUGVꢉWKURXJKꢉPHPRU\ꢉ
.H\ꢉLQVLJKWꢊꢉ$OORFDWHꢉDꢉQHZꢉSK\VLFDOꢉGHVWLQDWLRQꢉUHJLVWHUꢉ
² 6WRUHVꢉFRPPLWꢉLQꢉRUGHUꢉꢍ52%ꢎꢌꢉVRꢉQRꢉ:$5ꢂ:$:ꢉPHPRU\ꢉKD]DUGVꢄ
IRUꢉHYHU\ꢉLQVWUXFWLRQꢉWKDWꢉZULWHV
² 5HPRYHVꢉDOOꢉFKDQFHꢉRIꢉ:$5ꢉRUꢉ:$:ꢉKD]DUGV
,PSOHPHQWDWLRQꢉ
² 6LPLODUꢉWRꢉFRPSLOHUꢉWUDQVIRUPDWLRQꢉFDOOHGꢉ6WDWLFꢉ6LQJOHꢉ$VVLJQPHQW
ª /LNHꢉKDUGZDUHꢏEDVHGꢉG\QDPLFꢉFRPSLODWLRQ"
² .HHSꢉTXHXHꢉRIꢉVWRUHVꢌꢉLQꢉSURJꢉRUGHU ² :DWFKꢉIRUꢉSRVLWLRQꢉRIꢉQHZꢉORDGVꢉUHODWLYHꢉWRꢉH[LVWLQJꢉVWRUHV
0HFKDQLVP"ꢉꢉ.HHSꢉDꢉWUDQVODWLRQꢉWDEOHꢊ
² ,6$ꢉUHJLVWHUꢉ⇒ SK\VLFDOꢉUHJLVWHUꢉPDSSLQJ
:KHQꢉKDYHꢉDGGUHVVꢉIRUꢉORDGꢌꢉFKHFNꢉVWRUHꢉTXHXHꢊ
² ,IꢉDQ\ VWRUHꢉSULRUꢉWRꢉORDGꢉLVꢉZDLWLQJꢉIRUꢉLWVꢉDGGUHVVꢌꢉVWDOOꢉORDGꢄ
² :KHQꢉUHJLVWHUꢉZULWWHQꢌꢉUHSODFHꢉHQWU\ꢉZLWKꢉQHZꢉUHJLVWHUꢉIURPꢉIUHHOLVWꢄ ² 3K\VLFDOꢉUHJLVWHUꢉIUHHꢉZKHQꢉQRWꢉXVHGꢉE\ꢉDQ\ꢉDFWLYHꢉLQVWUXFWLRQV
² ,IꢉORDGꢉDGGUHVVꢉPDWFKHVꢉHDUOLHUꢉVWRUHꢉDGGUHVVꢉꢍDVVRFLDWLYHꢉORRNXSꢎꢌꢉ WKHQꢉZHꢉKDYHꢉDꢉPHPRU\ꢀLQGXFHGꢁ5$:ꢁKD]DUGꢊ
ª VWRUHꢉYDOXHꢉDYDLODEOHꢉ⇒ UHWXUQꢉYDOXH ª VWRUHꢉYDOXHꢉQRWꢉDYDLODEOHꢉ⇒ UHWXUQꢉ52%ꢉQXPEHUꢉRIꢉVRXUFHꢉ
² 2WKHUZLVHꢌꢉVHQGꢉRXWꢉUHTXHVWꢉWRꢉPHPRU\
$GYDQWDJHVꢉRIꢉH[SOLFLWꢉUHQDPLQJꢊ
² 'HFRXSOHVꢉUHQDPLQJ IURPꢉVFKHGXOLQJꢀ
² $OORZVꢉGDWDꢉWRꢉEHꢉIHWFKHGꢉIURPꢉVLQJOHꢉUHJLVWHUꢉILOH
ª 1RꢉQHHGꢉWRꢉE\SDVVꢉYDOXHVꢉIURPꢉUHRUGHUꢉEXIIHU
:LOOꢉUHOD[ꢉH[DFWꢉGHSHQGHQF\ꢉFKHFNLQJꢉLQꢉODWHUꢉOHFWXUH
ª %HWWHUꢉXWLOL]DWLRQꢉRIꢉUHJLVWHUꢉSRUWV
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢐ
&6ꢀꢁꢀꢂ.XELDWRZLF]
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
/HF ꢃꢄꢑ
*HWWLQJꢉ&3,ꢉꢋꢉꢅꢊꢉ,VVXLQJ
0XOWLSOHꢉ,QVWUXFWLRQVꢂ&\FOH
6XSHUVFDODUꢊꢉYDU\LQJꢉQRꢄꢉLQVWUXFWLRQVꢂF\FOHꢉꢍꢅꢉWRꢉ
ꢃꢎꢌꢉVFKHGXOHGꢉE\ꢉFRPSLOHUꢉRUꢉE\ꢉ+:ꢉꢍ7RPDVXORꢎ
² ,%0ꢉ3RZHU3&ꢌꢉ6XQꢉ8OWUD6SDUFꢌꢉ'(&ꢉ$OSKDꢌꢉ+3ꢉꢃꢈꢈꢈ
,QVWUXFWLRQꢉ/HYHOꢉ3DUDOOHOLVP
+LJKꢉVSHHGꢉH[HFXWLRQꢉEDVHGꢉRQꢉLQVWUXFWLRQ
OHYHO SDUDOOHOLVP ꢍLOSꢎꢊꢉSRWHQWLDOꢉRIꢉVKRUWꢉ LQVWUXFWLRQꢉVHTXHQFHVꢉWRꢉH[HFXWHꢉLQꢉSDUDOOHO
ꢍ9HU\ꢎꢉ/RQJꢉ,QVWUXFWLRQꢉ:RUGVꢉꢍ9ꢎ/,:ꢊꢉ
IL[HGꢉQXPEHUꢉRIꢉLQVWUXFWLRQVꢉꢍꢑꢏꢅꢒꢎꢉVFKHGXOHGꢉ E\ꢉWKHꢉFRPSLOHUꢓꢉSXWꢉRSVꢉLQWRꢉZLGHꢉWHPSODWHV
² -RLQWꢉ+3ꢂ,QWHOꢉDJUHHPHQWꢉLQꢉꢅꢆꢆꢆꢂꢀꢈꢈꢈ"
+LJKꢏVSHHGꢉPLFURSURFHVVRUVꢉH[SORLWꢉ,/3ꢉE\ꢊ
ꢅꢎꢉSLSHOLQHGꢉH[HFXWLRQꢊꢉRYHUODSꢉLQVWUXFWLRQV
² ,QWHOꢉ$UFKLWHFWXUHꢏꢒꢑꢉꢍ,$ꢏꢒꢑꢎꢉꢒꢑꢏELWꢉDGGUHVV
ꢀꢎꢉ2XWꢏRIꢏRUGHUꢉH[HFXWLRQꢉꢍFRPPLWꢉLQꢏRUGHUꢎ
ª 6W\OHꢊꢉ´([SOLFLWO\ꢉ3DUDOOHOꢉ,QVWUXFWLRQꢉ&RPSXWHUꢉꢍ(3,&ꢎµ
² 1HZꢉ681ꢉ0$-,&ꢉ$UFKLWHFWXUHꢊꢉ9/,:ꢉIRUꢉ-DYD
ꢐꢎꢉ0XOWLSOHꢉLVVXHꢊꢉLVVXHꢉDQGꢉH[HFXWHꢉPXOWLSOHꢉ
LQVWUXFWLRQVꢉSHUꢉFORFNꢉF\FOH
9HFWRUꢉ3URFHVVLQJꢊ
ꢑꢎꢉ9HFWRUꢉLQVWUXFWLRQVꢊꢉPDQ\ꢉLQGHSHQGHQWꢉRSVꢉVSHFLILHGꢉ
ZLWKꢉDꢉVLQJOHꢉLQVWUXFWLRQ
([SOLFLWꢉFRGLQJꢉRIꢉLQGHSHQGHQWꢉORRSVꢉDVꢉ RSHUDWLRQVꢉRQꢉODUJHꢉYHFWRUVꢉRIꢉQXPEHUV
² 0XOWLPHGLDꢉLQVWUXFWLRQVꢉEHLQJꢉDGGHGꢉWRꢉPDQ\ꢉSURFHVVRUV
0HPRU\ꢉDFFHVVHVꢉIRUꢉKLJKꢏVSHHGꢉ
PLFURSURFHVVRU"
$QWLFLSDWHGꢉVXFFHVVꢉOHDGꢉWRꢉXVHꢉRIꢉ
²
'DWDꢉ&DFKHꢉSRVVLEO\ꢉPXOWLSRUWHGꢌꢉPXOWLSOHꢉOHYHOV
,QVWUXFWLRQVꢉ3HUꢉ&ORFN F\FOHꢉꢍ,3&ꢎꢉYVꢄꢉ&3,
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢁ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢒ
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
*HWWLQJꢉ&3,ꢉꢋꢉꢅꢊꢉ,VVXLQJ
5HYLHZꢊꢉ8QUROOHGꢉ/RRSꢉWKDWꢉ
0LQLPL]HVꢉ6WDOOVꢉIRUꢉ6FDODU
0XOWLSOHꢉ,QVWUXFWLRQVꢂ&\FOH
6XSHUVFDODU '/;ꢊꢉꢀꢉLQVWUXFWLRQVꢌꢉꢅꢉ)3ꢉIꢉꢅꢉDQ\WKLQJꢉ
1 Loop: LD
F0,0(R1)
LD to ADDD: 1 Cycle ADDD to SD: 2 Cycles
HOVH
234
LD LD LD
F6,-8(R1) F10,-16(R1) F14,-24(R1)
² )HWFKꢉꢒꢑꢏELWVꢂFORFNꢉF\FOHꢓꢉ,QW RQꢉOHIWꢌꢉ)3ꢉRQꢉULJKW ² &DQꢉRQO\ꢉLVVXHꢉꢀQGꢉLQVWUXFWLRQꢉLIꢉꢅVWꢉLQVWUXFWLRQꢉLVVXHV ² 0RUHꢉSRUWVꢉIRUꢉ)3ꢉUHJLVWHUVꢉWRꢉGRꢉ)3ꢉORDGꢉIꢉ)3ꢉRSꢉLQꢉDꢉSDLU
5678
ADDD F4,F0,F2 ADDD F8,F6,F2 ADDD F12,F10,F2 ADDD F16,F14,F2
- 7\SH
- 3LSH6WDJHV
,QWꢄꢉLQVWUXFWLRQ
)3ꢉLQVWUXFWLRQ
,QWꢄꢉLQVWUXFWLRQ
)3ꢉLQVWUXFWLRQ
,QWꢄꢉLQVWUXFWLRQ
)3ꢉLQVWUXFWLRQ
,) ,' (; 0(0 :%
,) ,' (; 0(0 :%
,) ,' (; 0(0 :%
,) ,' (; 0(0 :%
,) ,' (; 0(0 :%
,) ,' (; 0(0 :%
9
SD SD SD
0(R1),F4 -8(R1),F8 -16(R1),F12
10 11 12 13 14
SUBI R1,R1,#32 BNEZ R1,LOOP
- SD
- 8(R1),F16
; 8-32 = -24
ꢅꢉF\FOHꢉORDGꢉGHOD\ꢉH[SDQGVꢉWRꢉꢐꢉLQVWUXFWLRQV LQꢉ66
14 clock cycles, or 3.5 per iteration
² LQVWUXFWLRQꢉLQꢉULJKWꢉKDOIꢉFDQ·WꢉXVHꢉLWꢌꢉQRUꢉLQVWUXFWLRQVꢉLQꢉQH[W &V6OꢀRꢁꢀWꢂ.XELDWRZLF]
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢃ
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
/HF ꢃꢄꢇ
'\QDPLFꢉ6FKHGXOLQJꢉLQꢉ6XSHUVFDODU
7KHꢉHDV\ꢉZD\
/RRSꢉ8QUROOLQJꢉLQꢉ6XSHUVFDODU
,QWHJHUꢁLQVWUXFWLRQ
/'ꢉꢉꢉꢉ)ꢈꢌꢈꢍ5ꢅꢎ
- )3ꢁLQVWUXFWLRQ
- &ORFNꢁF\FOH
- /RRSꢊ
- ꢅ
ꢀꢐꢑꢁꢒꢇꢃꢆ
+RZꢉWRꢉLVVXHꢉWZRꢉLQVWUXFWLRQVꢉDQGꢉNHHSꢉLQꢏRUGHUꢉ
/'ꢉꢉꢉꢉ)ꢒꢌꢏꢃꢍ5ꢅꢎ /'ꢉꢉꢉꢉ)ꢅꢈꢌꢏꢅꢒꢍ5ꢅꢎ /'ꢉꢉꢉꢉ)ꢅꢑꢌꢏꢀꢑꢍ5ꢅꢎ /'ꢉꢉꢉꢉ)ꢅꢃꢌꢏꢐꢀꢍ5ꢅꢎ 6'ꢉꢉꢉꢉꢈꢍ5ꢅꢎꢌ)ꢑ 6'ꢉꢉꢉꢉꢏꢃꢍ5ꢅꢎꢌ)ꢃ 6'ꢉꢉꢉꢉꢏꢅꢒꢍ5ꢅꢎꢌ)ꢅꢀ 6'ꢉꢉꢉꢉꢏꢀꢑꢍ5ꢅꢎꢌ)ꢅꢒ 68%,ꢉꢉꢉ5ꢅꢌ5ꢅꢌLꢑꢈ %1(=ꢉꢉ5ꢅꢌ/223 6'ꢉꢉꢉꢉꢏꢐꢀꢍ5ꢅꢎꢌ)ꢀꢈ
LQVWUXFWLRQꢉLVVXHꢉIRUꢉ7RPDVXOR"
$'''ꢉ)ꢑꢌ)ꢈꢌ)ꢀ $'''ꢉ)ꢃꢌ)ꢒꢌ)ꢀ $'''ꢉ)ꢅꢀꢌ)ꢅꢈꢌ)ꢀ $'''ꢉ)ꢅꢒꢌ)ꢅꢑꢌ)ꢀ $'''ꢉ)ꢀꢈꢌ)ꢅꢃꢌ)ꢀ
² $VVXPHꢉꢅꢉLQWHJHUꢉꢔꢉꢅꢉIORDWLQJꢉSRLQW ² ꢅꢉ7RPDVXOR FRQWUROꢉIRUꢉLQWHJHUꢌꢉꢅꢉIRUꢉIORDWLQJꢉSRLQW
,VVXHꢉꢀ;ꢉ&ORFNꢉ5DWHꢌꢉVRꢉWKDWꢉLVVXHꢉUHPDLQVꢉLQꢉRUGHU 2QO\ꢉ)3ꢉORDGVꢉPLJKWꢉFDXVHꢉGHSHQGHQF\ꢉEHWZHHQꢉ
LQWHJHUꢉDQGꢉ)3ꢉLVVXHꢊ
² 5HSODFHꢉORDGꢉUHVHUYDWLRQꢉVWDWLRQꢉZLWKꢉDꢉORDGꢉTXHXHꢓꢉ RSHUDQGVꢉPXVWꢉEHꢉUHDGꢉLQꢉWKHꢉRUGHUꢉWKH\ꢉDUHꢉIHWFKHG
ꢅꢈ
ꢅꢅ ꢅꢀ
² /RDGꢉFKHFNVꢉDGGUHVVHVꢉLQꢉ6WRUHꢉ4XHXHꢉWRꢉDYRLGꢉ5$:ꢉYLRODWLRQ ² 6WRUHꢉFKHFNVꢉDGGUHVVHVꢉLQꢉ/RDGꢉ4XHXHꢉWRꢉDYRLGꢉ:$5ꢌ:$: ² &DOOHGꢉ´GHFRXSOHGꢉDUFKLWHFWXUHµꢊꢉFRPSDUHꢉZLWKꢉ6PLWKꢉSDSHU
8QUROOHGꢉꢁꢉWLPHVꢉWRꢉDYRLGꢉGHOD\VꢉꢍꢔꢅꢉGXHꢉWRꢉ66ꢎ ꢅꢀꢉFORFNVꢌꢉRUꢉꢀꢄꢑꢉFORFNVꢉSHUꢉLWHUDWLRQꢉꢍꢅꢄꢁ;ꢎ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢆ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
0XOWLSOHꢉ,VVXHꢉ&KDOOHQJHV
9/,:ꢊꢉ9HU\ꢉ/DUJHꢉ,QVWUXFWLRQꢉ:RUG
:KLOHꢉ,QWHJHUꢂ)3ꢉVSOLWꢉLVꢉVLPSOHꢉIRUꢉWKHꢉ+:ꢌꢉJHWꢉ&3,ꢉ
RIꢉꢈꢄꢁꢉRQO\ꢉIRUꢉSURJUDPVꢉZLWKꢊ
² ([DFWO\ꢉꢁꢈOꢉ)3ꢉRSHUDWLRQV
(DFKꢉ´LQVWUXFWLRQµꢉKDVꢉH[SOLFLWꢉFRGLQJꢉIRUꢉPXOWLSOHꢉ
RSHUDWLRQV
² 1RꢉKD]DUGV
,IꢉPRUHꢉLQVWUXFWLRQVꢉLVVXHꢉDWꢉVDPHꢉWLPHꢌꢉJUHDWHUꢉ
² ,Qꢉ(3,&ꢌꢉJURXSLQJꢉFDOOHGꢉDꢉ´SDFNHWµ
GLIILFXOW\ꢉRIꢉGHFRGHꢉDQGꢉLVVXHꢊ
² ,Qꢉ7UDQVPHWDꢌꢉJURXSLQJꢉFDOOHGꢉDꢉ´PROHFXOHµꢉꢍZLWKꢉ´DWRPVµꢉDVꢉRSVꢎ
² (YHQꢉꢀꢏVFDODUꢉ !ꢉH[DPLQHꢉꢀꢉRSFRGHVꢌꢉꢒꢉUHJLVWHUꢉVSHFLILHUVꢌꢉIꢉGHFLGHꢉ
7UDGHRIIꢉLQVWUXFWLRQꢉVSDFHꢉIRUꢉVLPSOHꢉGHFRGLQJ
LIꢉꢅꢉRUꢉꢀꢉLQVWUXFWLRQVꢉFDQꢉLVVXH
- ² 7KHꢉORQJꢉLQVWUXFWLRQꢉZRUGꢉKDVꢉURRPꢉIRUꢉPDQ\ꢉRSHUDWLRQV
- ² 5HJLVWHUꢉILOHꢊꢉQHHGꢉꢀ[ꢉUHDGVꢉDQGꢉꢅ[ꢉZULWHVꢂF\FOH
² 5HQDPHꢉORJLFꢊꢉPXVWꢉEHꢉDEOHꢉWRꢉUHQDPHꢉVDPHꢉUHJLVWHUꢉPXOWLSOHꢉWLPHVꢉ
² %\ꢉGHILQLWLRQꢌꢉDOOꢉWKHꢉRSHUDWLRQVꢉWKHꢉFRPSLOHUꢉSXWVꢉLQꢉWKHꢉORQJꢉ
LQꢉRQHꢉF\FOHRꢉꢉ)RUꢉLQVWDQFHꢌꢉFRQVLGHUꢉꢑꢏZD\ꢉLVVXHꢊ
LQVWUXFWLRQꢉZRUGꢉDUHꢉLQGHSHQGHQWꢉ !ꢉH[HFXWHꢉLQꢉSDUDOOHO
add r1, r2, r3 sub r4, r1, r2 lw r1, 4(r4) add r5, r1, r2 add p11, p4, p7 sub p22, p11, p4 lw p23, 4(p22) add p12, p23, p4
² (ꢄJꢄꢌꢉꢀꢉLQWHJHUꢉRSHUDWLRQVꢌꢉꢀꢉ)3ꢉRSVꢌꢉꢀꢉ0HPRU\ꢉUHIVꢌꢉꢅꢉEUDQFK
⇒
ª ꢅꢒꢉWRꢉꢀꢑꢉELWVꢉSHUꢉILHOGꢉ !ꢉꢇꢕꢅꢒꢉRUꢉꢅꢅꢀꢉELWVꢉWRꢉꢇꢕꢀꢑꢉRUꢉꢅꢒꢃꢉ ELWVꢉZLGH
² 1HHGꢉFRPSLOLQJꢉWHFKQLTXHꢉWKDWꢉVFKHGXOHVꢉDFURVVꢉVHYHUDOꢉEUDQFKHV
,PDJLQHꢉGRLQJꢉWKLVꢉWUDQVIRUPDWLRQꢉLQꢉDꢉVLQJOHꢉF\FOHR
² 5HVXOWꢉEXVHVꢊꢉ1HHGꢉWRꢉFRPSOHWHꢉPXOWLSOHꢉLQVWUXFWLRQVꢂF\FOH
ª 6RꢌꢉQHHGꢉPXOWLSOHꢉEXVHVꢉZLWKꢉDVVRFLDWHGꢉPDWFKLQJꢉORJLFꢉDWꢉHYHU\ꢉ UHVHUYDWLRQꢉVWDWLRQꢄ
ª 2UꢌꢉQHHGꢉPXOWLSOHꢉIRUZDUGLQJꢉSDWKV
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢅ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢀ
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
5HFDOOꢊꢉ8QUROOHGꢉ/RRSꢉWKDWꢉ
0LQLPL]HVꢉ6WDOOVꢉIRUꢉ6FDODU
/RRSꢉ8QUROOLQJꢉLQꢉ9/,:
- Memory
- Memory
- FP
- FP
op. 2
Int. op/ branch
Clock reference 1 reference 2 operation 1
1 Loop: LD
F0,0(R1)
LD to ADDD: 1 Cycle ADDD to SD: 2 Cycles
234
LD LD LD
F6,-8(R1) F10,-16(R1) F14,-24(R1)
- LD F0,0(R1) LD F6,-8(R1)
- 1
234567
LD F10,-16(R1) LD F14,-24(R1)
- LD F18,-32(R1) LD F22,-40(R1) ADDD F4,F0,F2
- ADDD F8,F6,F2
5678
ADDD F4,F0,F2 ADDD F8,F6,F2 ADDD F12,F10,F2 ADDD F16,F14,F2
- LD F26,-48(R1)
- ADDD F12,F10,F2 ADDD F16,F14,F2
ADDD F20,F18,F2 ADDD F24,F22,F2
- SD 0(R1),F4
- SD -8(R1),F8 ADDD F28,F26,F2
SD -16(R1),F12 SD -24(R1),F16 SD -32(R1),F20 SD -40(R1),F24 SD -0(R1),F28
9
SD SD SD
0(R1),F4 -8(R1),F8 -16(R1),F12
SUBI R1,R1,#48 BNEZ R1,LOOP
89
10 11 12 13 14
SUBI R1,R1,#32 BNEZ R1,LOOP SD
Unrolled 7 times to avoid delays 7 results in 9 clocks, or 1.3 clocks per iteration (1.8X) Average: 2.5 ops per clock, 50% efficiency
8(R1),F16
; 8-32 = -24
14 clock cycles, or 3.5 per iteration
Note: Need more registers in VLIW (15 vs. 6 in SS)
ꢆꢂꢀꢇꢂꢈꢈ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢐ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢑ
ꢆꢂꢀꢇꢂꢈꢈ
5HFDOOꢊꢉ6RIWZDUHꢉ3LSHOLQLQJꢉ([DPSOH
5HFDOOꢊꢉ6RIWZDUHꢉ3LSHOLQLQJ
2EVHUYDWLRQꢊꢉLIꢉLWHUDWLRQVꢉIURPꢉORRSVꢉDUHꢉLQGHSHQGHQWꢌꢉ
WKHQꢉFDQꢉJHWꢉPRUHꢉ,/3ꢉE\ꢉWDNLQJꢉLQVWUXFWLRQVꢉIURPꢉ GLIIHUHQW LWHUDWLRQV
6RIWZDUHꢉSLSHOLQLQJꢊꢉUHRUJDQL]HVꢉORRSVꢉVRꢉWKDWꢉHDFKꢉ
LWHUDWLRQꢉLVꢉPDGHꢉIURPꢉLQVWUXFWLRQVꢉFKRVHQꢉIURPꢉ GLIIHUHQWꢉLWHUDWLRQVꢉRIꢉWKHꢉRULJLQDOꢉORRSꢉꢍ ꢉ7RPDVXOR LQꢉ
%HIRUHꢊꢉ8QUROOHGꢉꢐꢉWLPHV
After: Software Pipelined
ꢅꢉ LD F0,0(R1)
1 SD
0(R1),F4 ; Stores M[i]
ꢀ ADDD F4,F0,F2
2 ADDD F4,F0,F2 ; Adds to M[i-1]
ꢐ SD
0(R1),F4
F6,-8(R1)
- 3 LD
- F0,-16(R1);Loads M[i-2]
ꢑ LD
4 SUBI R1,R1,#8 5 BNEZ R1,LOOP
ꢁ ADDD F8,F6,F2 ꢒ SD
ꢇ LD
-8(R1),F8
F10,-16(R1)
SW Pipeline
ꢃ ADDD F12,F10,F2
6:ꢎ
Iteration
ꢆ SD
-16(R1),F12
0
Iteration
1
ꢅꢈ SUBI R1,R1,#24
Iteration
- 2
- Iteration
3
Time
ꢅꢅ BNEZ R1,LOOP
Iteration
4
Loop Unrolled
• Symbolic Loop Unrolling
– Maximize result-use distance – Less code space than unrolling – Fill & drain pipe only once per loop
Softwarepipelined iteration
Time
vs. once per each unrolled iteration in loop unrolling
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢁ
&6ꢀꢁꢀꢂ.XELDWRZLF]
/HF ꢃꢄꢅꢒ
- ꢆꢂꢀꢇꢂꢈꢈ
- ꢆꢂꢀꢇꢂꢈꢈ
6RIWZDUHꢉ3LSHOLQLQJꢉZLWK
/RRSꢉ8QUROOLQJꢉLQꢉ9/,:
7UDFHꢉ6FKHGXOLQJ
3DUDOOHOLVPꢉDFURVVꢉ,)ꢉEUDQFKHVꢉYVꢄꢉ/223ꢉEUDQFKHV 7ZRꢉVWHSVꢊ
Memory reference 1
- Memory
- FP
- FP
- Int. op/
- Clock
- reference 2
- operation 1
- op. 2 branch
² 7UDFHꢁ6HOHFWLRQ
LD F0,-48(R1) LD F6,-56(R1) LD F10,-40(R1)
ST 0(R1),F4 ST -8(R1),F8 ST 8(R1),F12
ADDD F4,F0,F2 ADDD F8,F6,F2 ADDD F12,F10,F2
1