IEEE 754-2008

IEEE 754-2008 (based on IEEE 754; a translation of the article from Wikipedia, the free encyclopedia)

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is the IEEE standard for floating-point computation. It is widely used for floating-point calculations, and many hardware units (CPUs and FPUs) and software implementations conform to it. Many programming languages allow or require that some or all of their arithmetic formats and operations be implemented using IEEE 754. The current edition of the standard is IEEE 754-2008, published in August 2008. It includes nearly all of the content of the original IEEE 754-1985 (published in 1985) and of the IEEE Standard for Radix-Independent Floating-Point Arithmetic (IEEE 854-1987).

The standard defines:
• Arithmetic formats: sets of binary and decimal floating-point data, consisting of finite numbers (including negative zero and subnormal numbers), infinities, and special "not a number" values (NaNs).
• Interchange formats: encodings (bit strings) that may be used to exchange floating-point data in a compact, efficient form.
• Rounding algorithms: methods used for rounding numbers during arithmetic operations and conversions.
• Operations: arithmetic and other operations on the arithmetic formats.
• Exception handling: indications of exceptional conditions (division by zero, overflow, and so on).

The standard also includes extensive recommendations for advanced exception handling, additional operations (such as trigonometric functions), expression evaluation, and for achieving reproducible results.

The standard is derived from, and replaces, the 1985 edition; the seven-year revision process was chaired by Dan Zuras and edited by Mike Cowlishaw. The binary formats of the original standard are retained in the new standard, together with three new basic formats (one binary and two decimal). To conform to the current standard, an implementation must implement at least one of the basic formats, as both an arithmetic format and an interchange format.

Contents
• 1. Formats
  o 1.1 Basic formats
  o 1.2 Arithmetic formats
  o 1.3 Interchange formats
• 2. Rounding algorithms
  o 2.1 Roundings to nearest
  o 2.2 Directed roundings
• 3. Operations
• 4. Exception handling
• 5. Recommendations
  o 5.1 Alternate exception handling
  o 5.2 Recommended operations
  o 5.3 Expression evaluation
  o 5.4 Reproducibility
• 6. References
• 7. Further reading
• 8. External links

Formats

The IEEE 754 formats describe sets of floating-point data and the encodings used for interchanging them. A format comprises:
• Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is described very simply by three integers: a sign (0 or 1), s; a significand (or coefficient), c; and an exponent, q. The numerical value of a finite number is (−1)^s × c × b^q, where b is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.
• Two infinities: +∞ and −∞.
• Two kinds of NaN, quiet and signaling. A NaN may carry a payload that is intended to convey diagnostic information about the source of the NaN. The sign of a NaN has no meaning, but it may be predictable in some circumstances.

The finite numbers that can be represented in a given format are determined by the base (b), the number of digits in the significand (the precision, p), and the exponent parameter emax:
• c must be an integer in the range zero through b^p − 1 (for example, if b = 10 and p = 7, then c ranges from 0 through 9999999).
• q must be an integer such that 1 − emax ≤ q + p − 1 ≤ emax (for example, if p = 7 and emax = 96, then q ranges from −101 through 90).
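As an illustration, here is a minimal Python sketch of these definitions; the function names value and in_range are ours, not part of the standard, and the parameters default to the example values b = 10, p = 7, emax = 96 used above:

from decimal import Decimal

def value(s, c, q, b=10):
    # Numeric value of a finite number: (-1)**s * c * b**q.
    # Decimal keeps the base-10 example exact in the printout.
    return (-1) ** s * c * Decimal(b) ** q

def in_range(c, q, b=10, p=7, emax=96):
    # The two constraints from the text:
    #   0 <= c <= b**p - 1
    #   1 - emax <= q + p - 1 <= emax
    return 0 <= c <= b ** p - 1 and 1 - emax <= q + p - 1 <= emax

print(value(1, 12345, -3))                 # -12.345 (the worked example)
print(in_range(12345, -3))                 # True
print(in_range(0, -101), in_range(0, 90))  # True True: q may span -101..90
print(in_range(0, 91))                     # False: exponent out of range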
In other words (for the example parameters), the smallest non-zero positive number that can be represented is 1 × 10^−101, the largest positive number is 9999999 × 10^90 (that is, 9.999999 × 10^96), and the full range of numbers runs from −9.999999 × 10^96 through 9.999999 × 10^96. The numbers −1 × 10^−95 and 1 × 10^−95 are the smallest (in magnitude) normal numbers; non-zero numbers between these smallest numbers are called subnormal numbers.

Basic formats

The standard defines five basic formats, named for their base and the number of bits used to encode them. There are three binary floating-point formats (encoded with 32, 64, or 128 bits) and two decimal floating-point formats (encoded with 64 or 128 bits). The first two binary formats are the 'single' and 'double' formats of IEEE 754-1985, and the third is often called 'quad'; the decimal formats are similarly often called 'double' and 'quad'.

format name   base b   precision p   emax
binary32      2        23+1 bits     +127
binary64      2        52+1 bits     +1023
binary128     2        112+1 bits    +16383
decimal64     10       16 digits     +384
decimal128    10       34 digits     +6144

All the basic formats are available in both hardware and software implementations.

Arithmetic formats

These formats are used only for arithmetic and other operations and do not need an associated encoding (that is, an implementation is free to use whatever internal representation it chooses); all that needs to be defined are their parameters (b, p, and emax). These parameters uniquely define the set of finite numbers (combinations of sign, significand, and exponent) that the format can represent.

Interchange formats

Interchange formats are intended for the exchange of floating-point data using a bit string of fixed length for a given format. For the exchange of binary floating-point numbers, interchange formats of length 16 bits, 32 bits, 64 bits, and any multiple of 32 bits greater than or equal to 128 bits are defined. The 16-bit format is intended for the exchange or storage of small numbers (for example, for graphics).

The encoding scheme for the binary interchange formats is the same as that of IEEE 754-1985: a sign bit, followed by w exponent bits that describe the exponent offset by a bias, and p − 1 bits that describe the significand. The width of the exponent field for a k-bit format is computed as w = round(4 × log2(k)) − 13. The existing 64-bit and 128-bit formats follow this rule, but the 16-bit and 32-bit formats have more exponent bits (5 and 8, respectively) than this formula would provide (3 and 7, respectively).

For the exchange of decimal floating-point numbers, interchange formats of any multiple of 32 bits are defined. The encoding scheme for the decimal interchange formats similarly encodes the sign, exponent, and significand, but a more complex process is used, which allows the significand to be encoded either as a compressed sequence of decimal digits or as a binary integer. In either case, the set of numbers (combinations of sign, significand, and exponent) that may be encoded is identical, and signaling NaNs have a unique encoding (with the same set of possible payloads).
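The following Python sketch, added here for illustration, checks the exponent-width formula against the four standard binary widths and then pulls apart the fields of a binary64 value; it assumes CPython floats are IEEE 754 binary64, which holds on all common platforms:

import math
import struct

def exponent_width(k):
    # w = round(4 * log2(k)) - 13
    return round(4 * math.log2(k)) - 13

# The 64- and 128-bit formats follow the formula; binary16 and binary32
# use 5 and 8 exponent bits where the formula would give 3 and 7.
for k, w_actual in [(16, 5), (32, 8), (64, 11), (128, 15)]:
    print(f"binary{k}: formula w = {exponent_width(k)}, actual w = {w_actual}")

# Decode the fields of a binary64 value (1 sign bit, w = 11 exponent
# bits, p - 1 = 52 trailing-significand bits).
bits, = struct.unpack(">Q", struct.pack(">d", -12.345))
sign = bits >> 63
biased_exponent = (bits >> 52) & 0x7FF       # the bias is 1023
trailing_significand = bits & ((1 << 52) - 1)
print(sign, biased_exponent - 1023, hex(trailing_significand))
# prints: 1 3 0x8b0a3d70a3d71  (i.e. -1.543125... * 2**3)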
Rounding algorithms

The standard defines five rounding algorithms. The first two round to a nearest value; the others are called directed roundings.

Roundings to nearest:
• Round to nearest, ties to even: rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even (zero) least significant significand bit, which happens in 50% of such cases. This algorithm is the default for binary floating-point and the recommended default for decimal.
• Round to nearest, ties away from zero: rounds to the nearest value; if the number falls midway, it is rounded to the nearest value above (for positive numbers) or below (for negative numbers).

Directed roundings:
• Round toward 0: directed rounding toward zero (also called truncation).
• Round toward +∞: directed rounding toward positive infinity.
• Round toward −∞: directed rounding toward negative infinity.

Operations

The operations required for supporting the arithmetic formats (including the basic formats) include:
• Arithmetic operations (add, subtract, multiply, divide, square root, fused multiply-add, remainder, and so on)
• Conversions (between formats, to and from strings, and so on)
• Scaling and (for decimal) quantizing (adjusting the exponent of a value relative to a particular set of values)
• Copying and manipulating the sign (absolute value, negate, and so on)
• Comparisons and total ordering
• Classification of numbers, testing for NaNs, and so on
• Testing and setting status flags
• Miscellaneous operations

Exception handling

The standard defines five exceptions, each of which has a corresponding status flag that is raised when the exception occurs (except in certain cases of underflow). No other action is required, but alternatives are recommended (see below). The five possible exceptions are:
• Invalid operation (for example, taking the square root of a negative number)
• Division by zero
• Overflow (the result is too large to be represented correctly)
• Underflow (the result is very small, outside the normal range, and inexact)
• Inexact

Recommendations

Alternate exception handling.
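To make the exception-handling section above concrete, here is a short sketch using CPython's decimal module, which implements the General Decimal Arithmetic specification aligned with the standard's decimal formats; it illustrates status flags generally and is not the standard's own API:

from decimal import (Decimal, getcontext,
                     DivisionByZero, Inexact, InvalidOperation,
                     Overflow, Underflow)

ctx = getcontext()
ctx.prec = 7                          # p = 7, as in the examples above
ctx.traps[DivisionByZero] = False     # record a flag instead of raising
ctx.traps[InvalidOperation] = False

print(Decimal(1) / Decimal(0))        # Infinity: raises the division-by-zero flag
print(Decimal(1) / Decimal(3))        # 0.3333333: raises the inexact flag
print(Decimal(-1).sqrt())             # NaN: raises the invalid-operation flag

for signal in (DivisionByZero, Inexact, InvalidOperation, Overflow, Underflow):
    print(signal.__name__, bool(ctx.flags[signal]))
# DivisionByZero True, Inexact True, InvalidOperation True,
# Overflow False, Underflow False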