A Dual-Core, Dual-Thread Itanium Processor


MONTECITO: A DUAL-CORE, DUAL-THREAD ITANIUM PROCESSOR

Cameron McNairy
Rohit Bhatia
Intel

INTEL'S MONTECITO IS THE FIRST ITANIUM PROCESSOR TO FEATURE DUPLICATE, DUAL-THREAD CORES AND CACHE HIERARCHIES ON A SINGLE DIE. IT FEATURES A LANDMARK 1.72 BILLION TRANSISTORS AND SERVER-FOCUSED TECHNOLOGIES, AND IT REQUIRES ONLY 100 WATTS OF POWER.

Intel's Itanium 2 processor series has regularly delivered additional performance through increased frequency and cache, as evidenced by the 6-Mbyte and 9-Mbyte versions.1 Montecito is the next offering in the Itanium processor family and represents many firsts for both Intel and the computing industry. Its 1.7 billion transistors extend the Itanium 2 core with an enhanced form of temporal multithreading and a substantially improved cache hierarchy. In addition to these landmarks, designers have incorporated technologies and enhancements that target reliability and manageability, power efficiency, and performance through the exploitation of both instruction- and thread-level parallelism. The result is a single 21.5-mm × 27.7-mm die2 that can execute four independent contexts on two cores with nearly 27 Mbytes of cache, at over 1.8 GHz, yet consumes only 100 W of power.

Beyond Itanium 2

Figure 1 is a block diagram of the Itanium 2 processor.3 The front end—with two levels of branch prediction, two translation look-aside buffers (TLBs), and a zero-cycle branch predictor—feeds two bundles (three instructions each) into the instruction buffer every cycle. This eight-entry queue decouples the front end from the back and delivers up to two bundles of any alignment to the remaining six pipeline stages. The dispersal logic determines issue groups and allocates up to six instructions to nearly every combination of the 11 available functional units (two integer, four memory, two floating point, and three branch). The renaming logic maps virtual registers specified by the instruction into physical registers, which access the actual register file (12 integer and eight floating-point read ports) in the next stage. Instructions then perform their operation or issue requests to the cache hierarchy. The full bypass network allows nearly immediate access to previous instruction results, while the retirement logic writes final results into the register files (10 integer and 10 floating-point write ports).

Figure 1. Block diagram of Intel's Itanium 2 processor. B, I, M, and F: branch, integer, memory, and floating-point functional units; ALAT: advanced load address table; TLB: translation look-aside buffer.

Figure 2 is a block diagram of Montecito, which aims to preserve application and operating-system investments while providing greater opportunity for code generators to continue their steady performance push. This opportunity is important because, even three years after the Itanium 2's debut, compilers continue to be a source of significant performance improvement. Unfortunately, compiler-optimization compatibility—which lets processors run each other's code optimally—limits the freedom to explore aggressive ways of increasing cycle-for-cycle performance and overall frequency.

Figure 2. Block diagram of Intel's Montecito. The dual cores and threads realize performance unattainable in the Itanium 2 processor. Montecito also addresses Itanium 2 port asymmetries and inefficiencies in branching, speculation, and cache hierarchy.
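To make the dispersal constraints concrete, here is a minimal sketch of issue-group formation under the unit counts quoted above (two integer, four memory, two floating point, three branch; up to six instructions per cycle). It is an illustration, not Montecito's dispersal logic: the greedy in-order policy and the one-port-class-per-instruction encoding are assumptions, and the real hardware also honors bundle templates, stop bits, and instruction types that can use more than one port class.

```python
# Toy issue-group formation against the 11 functional units:
# 2 integer (I), 4 memory (M), 2 floating-point (F), 3 branch (B).
# Hypothetical greedy, in-order policy: the first instruction that
# cannot be allocated a unit ends the issue group.

PORTS = {"I": 2, "M": 4, "F": 2, "B": 3}
MAX_ISSUE = 6  # up to six instructions (two bundles) per cycle

def form_issue_group(instructions):
    """Return the prefix of `instructions` that can issue together.

    `instructions` is a sequence of unit types, e.g. ["M", "I", "F"].
    """
    free = dict(PORTS)
    group = []
    for unit in instructions:
        if len(group) == MAX_ISSUE or free.get(unit, 0) == 0:
            break  # in-order dispersal: a blocked instruction stops the group
        free[unit] -= 1
        group.append(unit)
    return group

# Two bundles' worth of instructions: all six can issue together...
print(form_issue_group(["M", "M", "I", "I", "F", "B"]))  # six issue
# ...but a third integer op exceeds the two I-unit ports in this toy model.
print(form_issue_group(["I", "I", "I", "M", "F", "B"]))  # only two issue
```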
Single-thread performance improvements

Internal evaluations of Itanium 2 code indicate that static instructions per cycle (IPC) hover around three and often reach six for a wide array of workloads. Dynamic IPC decreases from the static highs for nearly all workloads. The IPC reduction is primarily due to inefficient cache-hierarchy accesses and, to a small degree, functional-unit asymmetries and inefficiencies in branching and speculation. Montecito targeted these performance weaknesses, optimizing nearly every core block and piece of control logic to improve some performance aspect.

Asymmetry, branching, and speculation

To address the port-asymmetry problem, Montecito adds a second integer shifter, yielding a performance improvement of nearly 100 percent for important cryptographic codes. To address branching inefficiency, Montecito's front end removes the bottlenecks surrounding single-cycle branches, which are prevalent in integer and enterprise workloads. Finally, Montecito decreases the time to reach recovery code when control or data speculation fails, thereby lowering the cost of speculation and enabling the code to use speculation more effectively.4

Cache hierarchy

Montecito supports three levels of on-chip cache. Each core contains a complete cache hierarchy, with nearly 13.3 Mbytes per core, for a total of nearly 27 Mbytes of processor cache.

Level 1 caches. The L1 caches (L1I and L1D) are four-way set associative, and each holds 16 Kbytes of instructions or data. Like the rest of the pipeline, these caches are in order, but they are also nonblocking, which enables high request concurrency. Access to L1I and L1D is through prevalidated tags and occurs in a single cycle. L1D is write-through, dual-ported, and banked to support two integer loads and two stores each cycle. The L1I has dual-ported tags and a single data port to support simultaneous demand and prefetch accesses. The performance levels of Montecito's L1D and L1I caches are similar to those in the Itanium 2, but Montecito's L1I and L1D have additional data protection.
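As a quick check on the L1 geometry just described (16 Kbytes, four-way set associative), the arithmetic below derives the set count and address-bit split. The article does not state the L1 line size, so the 64-byte line used here is an assumption for illustration only.

```python
import math

def cache_geometry(size_bytes, ways, line_bytes):
    """Compute sets and address-bit split for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = int(math.log2(line_bytes))
    index_bits = int(math.log2(sets))
    return sets, offset_bits, index_bits

# L1I or L1D: 16 Kbytes, four-way; the 64-byte line size is assumed.
sets, offset_bits, index_bits = cache_geometry(16 * 1024, 4, 64)
print(f"{sets} sets, {offset_bits} offset bits, {index_bits} index bits")
# -> 64 sets, 6 offset bits, 6 index bits
```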
Level 2 caches. The real differences from the Itanium 2's cache hierarchy start at the L2 caches. The Itanium 2's L2 shares data and instructions, while the Montecito has dedicated instruction (L2I) and data (L2D) caches. This separation of instruction and data caches makes it possible to have dedicated access paths to the caches and thus eliminates contention and capacity pressures at the L2 caches. For enterprise applications, Montecito's dedicated L2 caches can offer up to a 7-percent performance increase.

The L2I holds 1 Mbyte; is eight-way, set associative; and has a 128-byte line size—yet has the same seven-cycle instruction-access latency as the smaller Itanium 2 unified cache. The tag and data arrays are single ported, but the control logic supports out-of-order and pipelined accesses, which enable a high utilization rate.

Montecito's L2D has the same structure and organization as the Itanium 2's shared 256-Kbyte L2 cache but with several microarchitectural improvements to increase throughput. The L2D hit latency remains at five cycles for integer and six cycles for floating-point accesses. The tag array is true four-ported—four fully independent accesses in the same cycle—and the data array is pseudo-four-ported with 16-byte banks.

Montecito optimizes several aspects of the L2D. In the Itanium 2, any accesses to the same cache line beyond the first access that misses the L2 will access the L2 tags periodically until the tags detect a hit. The repeated tag queries consume bandwidth from the core and increase the L2 miss latency. Montecito suspends such secondary misses until the L2D fill occurs. At that point, the fill immediately satisfies the suspended request. This approach greatly reduces bandwidth contention and final latency. The L2D, like the Itanium 2's L2, is out of order, pipelined, and tracks 32 requests (L2D hits or L2D misses not yet passed to the L3 cache) in addition to 16 misses and their associated victims. The difference is that Montecito allocates the 32 queue entries more efficiently, which provides a higher concurrency level than with the Itanium 2.

Level 3 cache. Montecito's L3 cache remains unified, as in previous Itanium processors, but is now 12 Mbytes. Even so, it maintains the same 14-cycle integer-access latency typical of the L3s in the 6- and 9-Mbyte Itanium 2 family. Montecito's L3 uses an asynchronous interface with the data array to achieve this latency.
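The L2D secondary-miss optimization described above is essentially request coalescing: rather than re-probing the tags, later requests to an in-flight line are parked and woken by the returning fill. The sketch below models only that idea; the class and its structure are hypothetical, not Montecito's actual queue design.

```python
from collections import defaultdict

class MissQueue:
    """Toy model of suspending secondary misses until the fill returns.

    Itanium 2 behavior: a secondary miss re-probes the L2 tags until the
    line arrives, consuming tag bandwidth. Montecito behavior (modeled
    here): secondary misses to an in-flight line are parked, and the fill
    satisfies all of them at once.
    """

    def __init__(self):
        self.in_flight = defaultdict(list)  # line address -> waiting requests

    def access(self, line_addr, req_id, hit):
        if hit:
            return f"req {req_id}: L2D hit"
        if line_addr in self.in_flight:
            # Secondary miss: suspend instead of re-probing the tags.
            self.in_flight[line_addr].append(req_id)
            return f"req {req_id}: suspended on in-flight line {line_addr:#x}"
        self.in_flight[line_addr].append(req_id)
        return f"req {req_id}: primary miss, fill requested for {line_addr:#x}"

    def fill(self, line_addr):
        # The returning fill immediately satisfies every suspended request.
        woken = self.in_flight.pop(line_addr, [])
        return f"fill {line_addr:#x} satisfies reqs {woken}"

q = MissQueue()
print(q.access(0x1000, 1, hit=False))  # primary miss
print(q.access(0x1000, 2, hit=False))  # suspended, no repeated tag queries
print(q.fill(0x1000))                  # one fill wakes both requests
```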
Figure 3. How two threads share a core. Control logic monitors the workload's behavior and dynamically adjusts the time quantum for a thread. If the control logic determines that a thread is not making progress, it suspends that thread and gives execution resources to the other thread. This partially offsets the cost of long-latency operations, such as memory accesses. The time to execute a thread switch—white rectangles at the side of each box—consumes a portion of the idle time.
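Figure 3's latency hiding can be reproduced with a toy timeline: two threads alternate on one core, each computing and then blocking on a long-latency operation, with a fixed switch cost charged on every handoff. All cycle counts below are invented for illustration and are not Montecito's actual switch or memory latencies.

```python
def run_two_threads(compute=100, stall=300, switch_cost=15, quanta=3):
    """Toy switch-on-event timeline for two threads sharing one core.

    Each thread computes for `compute` cycles, then blocks on a memory
    operation that takes `stall` cycles. Instead of idling, the core pays
    `switch_cost` and runs the other thread, hiding most of the stall.
    """
    busy = 0           # cycles the core spends computing
    overhead = 0       # cycles spent executing thread switches
    ready_at = [0, 0]  # cycle at which each thread's miss resolves
    now, cur = 0, 0    # current cycle and currently running thread
    for _ in range(2 * quanta):
        now = max(now, ready_at[cur])  # wait only if the miss is unresolved
        now += compute
        busy += compute
        ready_at[cur] = now + stall    # thread blocks on a long-latency miss
        now += switch_cost             # hand the core to the other thread
        overhead += switch_cost
        cur ^= 1
    print(f"total {now} cycles: {busy} compute, {overhead} switch overhead, "
          f"{now - busy - overhead} exposed stall")

run_two_threads()
```

With these toy numbers, the two threads complete 600 compute cycles in 1,030 cycles; run back to back with every stall fully exposed, the same work would take 2,400 cycles.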