Whitepaper Intel Ivy Bridge .Indd


» Whitepaper «

The 3rd Generation Intel® Core™ Processor: A must have for all high-performance embedded computing appliances

Processor AMC, 3U CompactPCI®, 6U CompactPCI®, COM Express® basic, Mini-ITX, Flex-ATX, 3U VPX

If it's embedded, it's Kontron.

Intel® 3-D processor technology brings a new dimension of processing to the embedded computing space and comes with many improvements. What are the most important benefits for embedded appliances, and how can engineers deploy them most efficiently?

Abstract

With their increased performance levels, lowered TDP, improved high-end embedded graphics performance, optimized security, and broad scalability, the 3rd generation Intel® Core™ processors provide an attractive solution for a broad array of high-performance embedded applications in target markets such as medical, communications, industrial automation, infotainment and military. This whitepaper gives engineers a closer look at the architectural improvements of the new 3rd generation Intel® Core™ processors and explains how they can integrate them most efficiently into their appliances.

CONTENTS

Overview
Improved architecture: a tick-plus
Enhanced performance
Turbo Boost 2.0
Extended AVX and SSE instructions
Improved interface performance
Additional power savings
Enhanced media and graphics
Secure manageability
The new benchmark comes in different flavours
COM Express® basic Computer-on-Module
Flex-ATX and Mini-ITX embedded motherboards
AdvancedMC™
3U and 6U CompactPCI® blades
3U VPX CPU boards
Custom designs and application-ready platforms

www.kontron.com

Overview

High-performance embedded computing applications, such as image processing in automation and medical applications, embedded cloud computing and digital signal processing in communications, as well as signals intelligence in military and aerospace platforms, all share a common demand for the highest possible signal processing performance, throughput and graphics processing. At the same time, this demand is frequently coupled with strict power-efficiency requirements: the platform has to deliver a level of performance per watt that fits the needs of the space-, weight- and power-constrained (SWaP) applications that characterize many embedded deployments. With the development of a new 22-nanometer (nm) 3-D tri-gate transistor technology, Intel® introduced several architectural improvements that lay the groundwork for fulfilling these tough demands in the years to come.

The 3rd generation Intel® Core™ processors, which are the first processors to leverage this new technology, provide up to 20% more computing power and up to 40% more performance per watt than designs based on the 2nd generation Intel® Core™ processors. Embedded computing platforms that implement the new processors enable OEMs to build applications with increased processing density and I/O bandwidth within tight thermal envelopes. This meets and exceeds the requirement for improved size, weight and power of embedded designs and, for the first time, enables designers to use the latest quad-core Intel® processors on small form factors such as COM Express®, AdvancedMC™ and 3U VPX. Additional improvements, such as extended Intel® Advanced Vector Extensions (AVX) and SSE instructions as well as support for OpenCL 1.1, give developers efficient tools to reduce the development effort and time-to-market for parallel computing applications. Further advancements, such as the integrated Intel® HD Graphics 4000, which now features 30% more execution units than the previous generation and natively supports three independent digital display interfaces, enable sophisticated graphics-intensive applications such as infotainment and digital signage with an immersive user experience. All of these architectural improvements make it worth taking a closer look at the enhancements and at how OEMs in the different verticals can unleash the full potential of this new processor architecture by leveraging standardized and proven platforms to minimize design risks and speed up time-to-market.

Improved architecture: a tick-plus

With the introduction of the 32 nm process in 2009, Intel® maintained its historical doubling of chip functionality every two years by continually reducing transistor dimensions. But as gate lengths approach sub-32 nm dimensions, it becomes more challenging to overcome the fundamental physical limitations imposed by traditional semiconductor materials. As their size decreases, planar transistors increasingly suffer from undesirable off-state leakage current, which increases the idle power required by the device [1]. To solve this issue and keep up the pace of technology advancement, yet another innovation was needed to fuel Moore's Law for the years to come.

In 2012, Intel® accomplished this with another radical change in its transistor design. For the first time in history, silicon transistors entered the third dimension. With the 3rd generation Intel® Core™ processors, Intel® is introducing the tri-gate transistor, in which the transistor channel is raised into the third dimension. Adding a third dimension to transistors allows Intel® to increase transistor density to 1.4 billion transistors on a die size of 160 mm² and to pack more capabilities into every square millimetre of these new processors [2]. The current flow is now controlled on three sides of the channel (top, left and right) rather than just from the top, as in conventional planar transistors. The net result is much better control of the transistor: current flow is maximized when high performance is required and minimized when the transistor is off, which reduces leakage [3].

Image 1: 3-D Tri-Gate transistors form conducting channels on three sides of a vertical fin structure to maximize current flow on the one hand and reduce leakage current on the other. Moreover, Tri-Gate transistors can have several of these vertical fins connected together to increase total drive strength for higher performance [4]. Copyright: Intel®

But the change in transistor design is not the only architectural improvement of the 3rd generation over the 2nd generation Intel® Core™ processors. Together with the 3-D tri-gate transistor technology, Intel® also introduced a new graphics architecture that offers up to twice the HD media and 3-D graphics performance of its predecessor. Further new features are support for low-power DDR3L memory, dynamic overclocking control of both the compute and graphics cores, power-management improvements and security enhancements that guard against escalation-of-privilege attacks.

Such a significant redesign is quite unusual in Intel's "tick-tock" chip-release cadence, in which a tick stands for a process shrink and a tock stands for a new architecture. Changing the chips' architecture while at the same time shrinking the underlying transistors is an acceleration of the "tick-tock" model. This is why Intel® refers to the 3rd generation Intel® Core™ processor as "a tick-plus": a scaled-down version of the 2nd generation Intel® Core™ processors, but with its own architectural improvements [2].
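The density figure quoted in this section (1.4 billion transistors on a 160 mm² die) can be turned into a per-area number with simple arithmetic. The following is only a minimal sketch: the function and constant names are illustrative and not part of the whitepaper, and the result is an average over the whole die, not a statement about any particular block on it.

```python
# Figures quoted in the text: 1.4 billion transistors on a 160 mm² die.
TRANSISTORS = 1.4e9
DIE_AREA_MM2 = 160.0


def transistor_density(transistors: float, area_mm2: float) -> float:
    """Return the average number of transistors per square millimetre."""
    if area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return transistors / area_mm2


density = transistor_density(TRANSISTORS, DIE_AREA_MM2)
print(f"{density / 1e6:.2f} million transistors per mm²")
```

With the quoted figures this works out to roughly 8.75 million transistors per square millimetre on average.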
Image 2: 22 nm 3-D Tri-Gate transistors provide improved performance at high voltage and an unprecedented performance gain at low voltage [4]. (The chart plots normalized transistor gate delay against operating voltage from 0.5 V to 1.1 V for 32 nm planar and 22 nm Tri-Gate transistors, with the annotations "37% faster" and "18% faster".)

Enhanced performance

Due to these improvements, and as already mentioned in the overview, the 3rd generation Intel® Core™ processors offer up to 20% more computing power and up to 40% more performance per watt than designs based on the 2nd generation Intel® Core™ processors. But this is not all: the increase in power efficiency also allows applications with tight thermal envelopes to take advantage of the parallel performance of up to four CPU cores and eight threads. This not only enables highly efficient small form factor applications, such as extremely compact unmanned aerial vehicles (UAVs), but, due to the high level of integration, also allows multiple computing systems to be consolidated onto one single platform [5]. This reduces hardware costs, as one multicore system is less expensive than several single-core systems. The decreased system count also results in a higher mean time between failures (MTBF) for the consolidated installation and helps to save valuable space in SWaP-optimized high-performance embedded computing applications. However, it is important to be aware that standard boards for the consumer market are not designed to meet high MTBF requirements. Modules, boards and systems that are intended to meet a high MTBF should be selected from embedded computer vendors such

Turbo Boost 2.0

For applications that are particularly power-hungry, the new processors also provide the enhanced Intel® Turbo Boost 2.0 technology that was introduced with the 2nd generation Intel® Core™ processors. Turbo Boost mode increases the clock speeds of the processor cores and the graphics unit independently of each other. It automatically shifts processor core and processor graphics resources to accelerate performance, tailoring them to the workload to give users an immediate performance boost for their applications