Best Practice Guide Modern Processors

Total Page:16

File Type:pdf, Size:1020Kb

Best Practice Guide Modern Processors Best Practice Guide Modern Processors Ole Widar Saastad, University of Oslo, Norway Kristina Kapanova, NCSA, Bulgaria Stoyan Markov, NCSA, Bulgaria Cristian Morales, BSC, Spain Anastasiia Shamakina, HLRS, Germany Nick Johnson, EPCC, United Kingdom Ezhilmathi Krishnasamy, University of Luxembourg, Luxembourg Sebastien Varrette, University of Luxembourg, Luxembourg Hayk Shoukourian (Editor), LRZ, Germany Updated 5-5-2021 1 Best Practice Guide Modern Processors Table of Contents 1. Introduction .............................................................................................................................. 4 2. ARM Processors ....................................................................................................................... 6 2.1. Architecture ................................................................................................................... 6 2.1.1. Kunpeng 920 ....................................................................................................... 6 2.1.2. ThunderX2 .......................................................................................................... 7 2.1.3. NUMA architecture .............................................................................................. 9 2.2. Programming Environment ............................................................................................... 9 2.2.1. Compilers ........................................................................................................... 9 2.2.2. Vendor performance libraries ................................................................................ 10 2.2.3. Scalable Vector Extension (SVE) software support ................................................... 11 2.3. Benchmark performance ................................................................................................. 12 2.3.1. STREAM - memory bandwidth benchmark - Kunpeng 920 ......................................... 12 2.3.2. STREAM - memory bandwidth benchmark - Thunder X2 .......................................... 13 2.3.3. High Performance Linpack ................................................................................... 14 2.4. MPI Ping-pong performance using RoCE .......................................................................... 15 2.5. HPCG - High Performance Conjugated Gradients ............................................................... 15 2.6. Simultaneous Multi Threading (SMT) performance impact ................................................... 17 2.7. IOR ............................................................................................................................ 19 2.8. European ARM processor based systems ........................................................................... 19 2.8.1. Fulhame (EPCC) ................................................................................................ 19 3. Processors Intel Skylake ........................................................................................................... 21 3.1. Architecture ................................................................................................................. 21 3.1.1. Memory Architecture .......................................................................................... 22 3.1.2. Power Management ............................................................................................. 22 3.2. Programming Environment ............................................................................................. 23 3.2.1. Compilers .......................................................................................................... 23 3.2.2. Available Numerical Libraries ............................................................................... 24 3.3. Benchmark performance ................................................................................................. 25 3.3.1. MareNostrum system ........................................................................................... 25 3.3.2. SuperMUC-NG system ........................................................................................ 28 3.4. Performance Analysis .................................................................................................... 33 3.4.1. Intel Application Performance Snapshot .................................................................. 33 3.4.2. Scalasca ............................................................................................................ 42 3.4.3. Arm Forge Reports ............................................................................................. 44 3.4.4. PAPI ................................................................................................................ 45 3.5. Tuning ........................................................................................................................ 48 3.5.1. Compiler Flags ................................................................................................... 48 3.5.2. Serial Code Optimisation ..................................................................................... 50 3.5.3. Shared Memory Programming-OpenMP .................................................................. 53 3.5.4. Distributed memory programming -MPI .................................................................. 56 3.5.5. Environment Variables for Process Pinning OpenMP+MPI ......................................... 61 3.6. European SkyLake processor based systems ....................................................................... 63 3.6.1. MareNostrum 4 (BSC) ......................................................................................... 63 3.6.2. SuperMUC-NG (LRZ) ......................................................................................... 67 4. AMD Rome Processors ............................................................................................................ 69 4.1. System Architecture ....................................................................................................... 70 4.1.1. Cores - «real» vs. virtual/logical ............................................................................ 75 4.1.2. Memory Architecture .......................................................................................... 76 4.1.3. NUMA ............................................................................................................. 79 4.1.4. Balance of AMD/Rome system ............................................................................. 80 4.2. Programming Environment ............................................................................................. 80 4.2.1. Available Compilers ............................................................................................ 80 4.2.2. Compiler Flags ................................................................................................... 81 4.2.3. AMD Optimizing CPU Libraries (AOCL) ............................................................... 82 4.2.4. Intel Math Kernel Library .................................................................................... 83 2 Best Practice Guide Modern Processors 4.2.5. Library performance ............................................................................................ 84 4.3. Benchmark performance ................................................................................................. 84 4.3.1. Stream - memory bandwidth benchmark ................................................................. 84 4.3.2. High Performance Linpack ................................................................................... 85 4.4. Performance Analysis .................................................................................................... 86 4.4.1. perf (Linux utility) .............................................................................................. 86 4.4.2. perfcatch ........................................................................................................... 87 4.4.3. AMD µProf ....................................................................................................... 88 4.4.4. Roof line model ................................................................................................. 91 4.5. Tuning ........................................................................................................................ 93 4.5.1. Introduction ....................................................................................................... 93 4.5.2. Intel MKL pre 2020 version ................................................................................. 93 4.5.3. Intel MKL 2020 version ...................................................................................... 93 4.5.4. Memory bandwidth per core ................................................................................. 94 4.6. European AMD processor based systems ........................................................................... 95 4.6.1. HAWK system (HLRS) ....................................................................................... 95 4.6.2. Betzy system (Sigma2) ........................................................................................ 96 A. Acronyms and Abbreviations ................................................................................................... 100 1. Units ...........................................................................................................................
Recommended publications
  • AMD Athlon™ Processor X86 Code Optimization Guide
    AMD AthlonTM Processor x86 Code Optimization Guide © 2000 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other applica- tion in which the failure of AMD’s product could create a situation where per- sonal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice. Trademarks AMD, the AMD logo, AMD Athlon, K6, 3DNow!, and combinations thereof, AMD-751, K86, and Super7 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc. Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.
    [Show full text]
  • Forge and Performance Reports Modules $ Module Load Intel Intelmpi $ Module Use /P/Scratch/Share/VI-HPS/JURECA/Mf/ $ Module Load Arm-Forge Arm-Reports
    Acting on Insight Tips for developing and optimizing scientific applications [email protected] 28/06/2019 Agenda • Introduction • Maximize application efficiency • Analyze code performance • Profile multi-threaded codes • Optimize Python-based applications • Visualize code regions with Caliper 2 © 2019 Arm Limited Arm Technology Already Connects the World Arm is ubiquitous Partnership is key Choice is good 21 billion chips sold by We design IP, we do not One size is not always the best fit partners in 2017 manufacture chips for all #1 in Infrastructure today with Partners build products for HPC is a great fit for 28% market shares their target markets co-design and collaboration 3 © 2019 Arm Limited Arm’s solution for HPC application development and porting Combines cross-platform tools with Arm only tools for a comprehensive solution Cross-platform Tools Arm Architecture Tools FORGE C/C++ & FORTRAN DDT MAP COMPILER PERFORMANCE PERFORMANCE REPORTS LIBRARIES 4 © 2019 Arm Limited The billion dollar question in “weather and forecasting” Is it going to rain tomorrow? 1. Choose domain 2. Gather Data 3. Create Mesh 4. Match Data to Mesh 5. Simulate 6. Visualize 5 © 2019 Arm Limited Weather forecasting workflow Deploy Production Staging environment Builds Scalability Performance Develop Fix Regressions CI Agents Optimize Commit Continuous Version Start CI job control integration Pull system framework • 24 hour timeframe 6 © 2019 Arm Limited • 2 to 3 test runs for 1 production run Application efficiency Scientist Developer System admin Decision maker • Efficient use of allocation • Characterize application • Maximize resource usage • High-level view of system time behaviour • Diagnose performance workload • Higher result throughput • Gets hints on next issues • Reporting figures and optimization steps analysis to help decision making 7 © 2019 Arm Limited Arm Performance Reports Characterize and understand the performance of HPC application runs Gathers a rich set of data • Analyses metrics around CPU, memory, IO, hardware counters, etc.
    [Show full text]
  • Adaptive Data Migration in Load-Imbalanced HPC Applications
    Louisiana State University LSU Digital Commons LSU Doctoral Dissertations Graduate School 10-16-2020 Adaptive Data Migration in Load-Imbalanced HPC Applications Parsa Amini Louisiana State University and Agricultural and Mechanical College Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations Part of the Computer Sciences Commons Recommended Citation Amini, Parsa, "Adaptive Data Migration in Load-Imbalanced HPC Applications" (2020). LSU Doctoral Dissertations. 5370. https://digitalcommons.lsu.edu/gradschool_dissertations/5370 This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please [email protected]. ADAPTIVE DATA MIGRATION IN LOAD-IMBALANCED HPC APPLICATIONS A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Computer Science by Parsa Amini B.S., Shahed University, 2013 M.S., New Mexico State University, 2015 December 2020 Acknowledgments This effort has been possible, thanks to the involvement and assistance of numerous people. First and foremost, I thank my advisor, Dr. Hartmut Kaiser, who made this journey possible with their invaluable support, precise guidance, and generous sharing of expertise. It has been a great privilege and opportunity for me be your student, a part of the STE||AR group, and the HPX development effort. I would also like to thank my mentor and former advisor at New Mexico State University, Dr.
    [Show full text]
  • Benchmark Evaluations of Modern Multi Processor Vlsi Ds Pm Ps
    1220 Session 1220 Benchmark Evaluations of Modern Multi-Processor VLSI DSPµPs Aaron L. Robinson and Fred O. Simons, Jr. High-Performance Computing and Simulation (HCS) Research Laboratory Electrical Engineering Department Florida A&M University and Florida State University Tallahassee, FL 32316-2175 Abstract - The authors continue their tradition of presenting technical reviews and performance comparisons of the newest multi-processor VLSI DSPµPs with the intention of providing concise focused analyses designed to help established or aspiring DSP analysts evaluate the applicability of new DSP technology to their specific applications. As in the past, the Analog Devices SHARCTM and Texas Instruments TMS320C80 families of DSPµPs will be the focus of our presentation because these manufacturers continue to push the envelop of new DSPµPs (Digital Signal Processing microProcessors) development. However, in addition to the standard performance analyses and benchmark evaluations, the authors will present a new image-processing bench marking technique designed specifically for evaluating new DSPµP image processing capabilities. 1. Introduction The combination of constantly evolving DSP algorithm development and continually advancing DSPuP hardware have formed the basis for an exponential growth in DSP applications. The increased market demand for these applications and their required hardware has resulted in the production of DSPuPs with very powerful components capable of performing a wide range of complicated operations. Due to the complicated architectures of these new processors, it would be time consuming for even the experienced DSP analysts to review and evaluate these new DSPuPs. Furthermore, the inexperienced DSP analysts would find it even more time consuming, and possibly very difficult to appreciate the significance and opportunities for these new components.
    [Show full text]
  • D5.1 Report on Performance and Tuning of Runtime Libraries for ARM
    D5.1 Report on performance and tuning of runtime libraries for ARM Architecture Version 1.0 Document Information Contract Number 288777 Project Website www.montblanc-project.eu Contractual Deadline M12 Dissemintation Level PU Nature Report Coordinator Alex Ramirez (BSC) Contributors Isaac Gelado (BSC), Xavier Martorell (BSC), Judit Gimenez (BSC), Harald Servat (BSC), Bernd Mohr (JUELICH), Daniel Lorenz (JUELICH), Marc Schluetter (JUELICH) Reviewers Chris Adeniyi-Jones (ARM) Keywords compiler, runtime system, parallelism, instrumentation, perfor- mance analysis Notices: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 288777 c 2011 Mont-Blanc Consortium Partners. All rights reserved. D5.1 - Report on performance and tuning of runtime libraries for ARM Architecture Version 1.0 Change Log Version Description of Change v0.1 Initial draft released to the WP5 contributors v0.2 Draft version sent to internal reviewer v1.0 Final version sent to the EU 2 D5.1 - Report on performance and tuning of runtime libraries for ARM Architecture Version 1.0 Contents Executive Summary 4 1 Introduction 5 2 Runtime Libraries 6 2.1 Mercurium compiler . .6 2.1.1 Porting to ARM-v7 . .6 2.1.2 Fortran support . .6 2.2 Nanos++ . .7 2.3 Extrae instrumentation library . .8 2.4 Scalasca support for the ARM platform . .8 3 Performance Analysis and Tuning 11 4 Conclusions 14 3 D5.1 - Report on performance and tuning of runtime libraries for ARM Architecture Version 1.0 Executive Summary In this Mont-Blanc deliverable we present the current status of porting to the ARM architecture of the OmpSs (Mercurium compiler and Nanos++ runtime system), the Extrae instrumentation library and the Scalasca instrumentation facilities.
    [Show full text]
  • (12) Patent Application Publication (10) Pub. No.: US 2011/0231717 A1 HUR Et Al
    US 20110231717A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2011/0231717 A1 HUR et al. (43) Pub. Date: Sep. 22, 2011 (54) SEMICONDUCTOR MEMORY DEVICE Publication Classification (76) Inventors: Hwang HUR, Kyoungki-do (KR): ( 51) Int. Cl. Chang-Ho Do, Kyoungki-do (KR): C. % CR Jae-Bum Ko, Kyoungki-do (KR); ( .01) Jin-Il Chung, Kyoungki-do (KR) (21) Appl. N 13A149,683 (52) U.S. Cl. ................................. 714/718; 714/E11.145 ppl. No.: 9 (22)22) FileFiled: Mavay 31,51, 2011 (57) ABSTRACT Related U.S. Application Data Semiconductor memory device includes a cell array includ (62) Division of application No. 12/154,870, filed on May ing a plurality of unit cells; and a test circuit configured to 28, 2008, now Pat. No. 7,979,758. perform a built-in self-stress (BISS) test for detecting a defect by performing a plurality of internal operations including a (30) Foreign Application Priority Data write operation through an access to the unit cells using a plurality of patterns during a test procedure carried out at a Feb. 29, 2008 (KR) ............................. 2008-OO18761 wafer-level. O 2 FAD MoDE PE, FRS PWD PADX8 GENERATOR EAELER PEC 3. PD) PWDC CRECKCCES CKCKB RASCASECSCE SISTE CLOCKBUFFER CEE e EST CLK BSI RASCASECSCKE WREFB ESTE s ACTF 32 P 8:W 4. 38t.A 3. se. A(2) FIRST REFRESH AKE13A2) REFIREFA CONTROR ROWANC RSBSAApp.12BS ARS BA255612.ESE E L - ww.i. 33 EE 3. PE PWDA RAE7(3) ECS issi: OLREFB SECOND LAK:2) BSDT DATABUFFERG08 EABLER REFIREFA SECON REFRESH BSE Y ADDRC CONTROLLER BST YEADDK)) PWS Testaposs WKB811-12 BAADDICBAAD BSYBRST (88.11:12) "Of BST CLK 380 3.
    [Show full text]
  • Fairborn Camera & Video
    © 2006 The Dayton Microcomputer Association, Inc. V O L UM E 3 1 , I S S U E 6 P A G E 1 TM Volume 31 Issue 6 www.DMA.org November 2006 Association of PC User Groups (APCUG) Member Location for October 31 General Meeting Topic meeting & map inside ... Parking Permits Fairborn Camera & Video Available … Mike Petros - Guest Speaker As the holidays approach, the search be- for prints. RAW format allows the option of gins for the hottest items for Christmas. We manipulating details with photo-editing soft- generally look for some high-tech gadget ware on a PC. It’s a good thing that memory that could make our lives easier, more pro- cards hold more data than ever before. ductive, or just plain fun. The latest in cam- Mike Petros, Store Manager at Fairborn era equipment has always been a favorite Camera & Video, has agreed to demonstrate and the choices this year are impressive. several of this year’s latest digital cameras There seem to be a half dozen models for and help make sense of their many features. every application. Credit-card sized “point Mike draws from 30 years of experience in and shoot” cameras fit easily in a pocket and the business. He often gives presentations to travel well. Those best suited for sports pho- local organizations and once wrote articles tography are the ones marked “single-lens- for the Midwest PC Review magazine. reflex”. They tend to be more responsive Although he would not give away any de- and accept interchangeable lenses. Cameras tails on the specials they will be running this of all sizes offer digital viewfinders.
    [Show full text]
  • Laptops Portfolio
    PREMIUM THE COMMERCIAL ThinkPad X1 Yoga (5th Gen) Thinkpad X1 Extreme (2nd Gen) ThinkPad X1 Carbon (8th Gen) ThinkPad X Series ThinkPad T Series ThinkBook Series LAPTOPS ThinkPad X13, X13 Yoga ThinkPad T14, T14s, T15 ThinkBook Plus, 13s PORTFOLIO MAINSTREAM ThinkPad L Series ThinkPad E Series ThinkBook 14, 15 ThinkPad L13, L13 Yoga, L14, L15 ThinkPad E14, E15 BUDGET FRIENDLY ThinkPad 11e Lenovo™ V Series 11e (5th gen), Lenovo V14, V15, V17 11e Yoga (6th Gen) Lenovo recommends Windows 10 Pro for business Lenovo™ Chromebooks Lenovo™ Winbooks 100e, 300e, 500e 100e, 300e 8 9 Premium Computing ThinkPad Laptops and Ultrabooks ThinkPad X1 Extreme (2nd Gen) ThinkPad Laptops & Ultrabooks: Legendary business tools Powerhouse performer for computing and gaming • Comes with Windows 10 Pro Since launching in 1992, the philosophy behind ThinkPad® has been to build products to suit • Up to 9th Generation Intel® Core™ i9 Processor the human nature of the business user and enable them to be more productive. Superior • NVIDIA® GeForce® GTX 1650 (MaxQ w/4GB GDDR5) thinking in technology and engineering has made ThinkPad a leader in innovative features, • Up to 64GB DDR4 (2666MHz); Non ECC Dual DIMM thoughtful design, reliability and performance. • Up to 14 hours*, 80Whr battery It was this ‘extraordinary thinking’ that took inspiration from a lunch box to design the iconic ThinkPad, a machine that changed the way people do business. KEY DIFFERENTIATORS Award-winning keyboard: The most silent and Thin and light with With larger keys and more efficient cooling system maximum protection: Our space between them, it ever made: ThinkPad’s Owl ThinkPad range of products ensures a high quality Wing fan is 23% lighter, 10% are made of carbon fiber, a ThinkPad X1 Carbon (8th Gen) ThinkPad X1 Yoga (5th Gen) all-day comfortable smaller, has 38% increased light weight material which Marries premium performance & mobility Ever accommodating, it bends over backwards for typing experience.
    [Show full text]
  • Scalasca User Guide
    Scalasca 2.5 User Guide Scalable Automatic Performance Analysis March 2019 The Scalasca Development Team [email protected] The entire code of Scalasca v2 is licensed under the BSD-style license agreement given below, except for the third-party code distributed in the 'vendor/' subdirectory. Please see the corresponding COPYING files in the subdirectories of 'vendor/' included in the distribution tarball for details. Scalasca v2 License Agreement Copyright © 1998–2019 Forschungszentrum Jülich GmbH, Germany Copyright © 2009–2015 German Research School for Simulation Sciences GmbH, Jülich/Aachen, Germany Copyright © 2014–2015 RWTH Aachen University, Germany Copyright © 2003–2008 University of Tennessee, Knoxville, USA Copyright © 2006 Technische Universität Dresden, Germany All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. • Neither the names of – the Forschungszentrum Jülich GmbH, – the German Research School for Simulation Sciences GmbH, – the RWTH Aachen University, – the University of Tennessee, – the Technische Universität Dresden, nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DIS- CLAIMED.
    [Show full text]
  • Computer Architecture Out-Of-Order Execution
    Computer Architecture Out-of-order Execution By Yoav Etsion With acknowledgement to Dan Tsafrir, Avi Mendelson, Lihu Rappoport, and Adi Yoaz 1 Computer Architecture 2013– Out-of-Order Execution The need for speed: Superscalar • Remember our goal: minimize CPU Time CPU Time = duration of clock cycle × CPI × IC • So far we have learned that in order to Minimize clock cycle ⇒ add more pipe stages Minimize CPI ⇒ utilize pipeline Minimize IC ⇒ change/improve the architecture • Why not make the pipeline deeper and deeper? Beyond some point, adding more pipe stages doesn’t help, because Control/data hazards increase, and become costlier • (Recall that in a pipelined CPU, CPI=1 only w/o hazards) • So what can we do next? Reduce the CPI by utilizing ILP (instruction level parallelism) We will need to duplicate HW for this purpose… 2 Computer Architecture 2013– Out-of-Order Execution A simple superscalar CPU • Duplicates the pipeline to accommodate ILP (IPC > 1) ILP=instruction-level parallelism • Note that duplicating HW in just one pipe stage doesn’t help e.g., when having 2 ALUs, the bottleneck moves to other stages IF ID EXE MEM WB • Conclusion: Getting IPC > 1 requires to fetch/decode/exe/retire >1 instruction per clock: IF ID EXE MEM WB 3 Computer Architecture 2013– Out-of-Order Execution Example: Pentium Processor • Pentium fetches & decodes 2 instructions per cycle • Before register file read, decide on pairing Can the two instructions be executed in parallel? (yes/no) u-pipe IF ID v-pipe • Pairing decision is based… On data
    [Show full text]
  • Performance Tuning Workshop
    Performance Tuning Workshop Samuel Khuvis Scientifc Applications Engineer, OSC February 18, 2021 1/103 Workshop Set up I Workshop – set up account at my.osc.edu I If you already have an OSC account, sign in to my.osc.edu I Go to Project I Project access request I PROJECT CODE = PZS1010 I Reset your password I Slides are the workshop website: https://www.osc.edu/~skhuvis/opt21_spring 2/103 Outline I Introduction I Debugging I Hardware overview I Performance measurement and analysis I Help from the compiler I Code tuning/optimization I Parallel computing 3/103 Introduction 4/103 Workshop Philosophy I Aim for “reasonably good” performance I Discuss performance tuning techniques common to most HPC architectures I Compiler options I Code modifcation I Focus on serial performance I Reduce time spent accessing memory I Parallel processing I Multithreading I MPI 5/103 Hands-on Code During this workshop, we will be using a code based on the HPCCG miniapp from Mantevo. I Performs Conjugate Gradient (CG) method on a 3D chimney domain. I CG is an iterative algorithm to numerically approximate the solution to a system of linear equations. I Run code with srun -n <numprocs> ./test_HPCCG nx ny nz, where nx, ny, and nz are the number of nodes in the x, y, and z dimension on each processor. I Download with: wget go . osu .edu/perftuning21 t a r x f perftuning21 I Make sure that the following modules are loaded: intel/19.0.5 mvapich2/2.3.3 6/103 More important than Performance! I Correctness of results I Code readability/maintainability I Portability -
    [Show full text]
  • State of Lenovo Notebooks Running Coreboot Who Am I
    State of Lenovo notebooks running coreboot Who am I B.Sc. Electrical Engineering coreboot developer since 2015 privately motivated Maintaining Lenovo devices from day zero Working at 9eSec as Hardware- and Softwareengineer Contact: [email protected] Why are Lenovo's so awesome? Community driven Wide variety of platforms First class support Actively maintained by community BLOB free Cheap Easy to buy rugged case Good documentation Overview Past Statistics Current state Future Significant changes over the last years Significant changes 2015 General TPM 1.2 support Fix infinite notification loop on shutdown ACPI warnings fixed X60/T60 Native graphics init Brightness control T400/T500 Hybrid graphic support T430s/x220/x230/T530 ACPI code for hybrid graphics ACPI Improved native raminit HDA verb fixes USB3 support X200 Enable PEG device Support disabled IGD Pen support in tablet mode Basic IOMMU support Fixed panel flickering Picture by Infineon Press Photo, https://www.infineon.com/export/sites/default/media/press/Image/press_photo/TPM_SLB9635.jpg Picture by http://www.usb.org, https://de.wikipedia.org/wiki/Universal_Serial_Bus#/media/Datei:SuperSpeed_USB.svg Significant changes 2016 General X200/T400 Don't configure EC on ACPI S3 Enable C4 CPU pstates Fix non working keyboard on boot C T430s/x220/x230/T530 Make use of common GPIO driver First version of shared hybrid graphics driver Allow the use of VGA option ROMs Initial support for dual GPU support Fixed eSATA port Split SandyBridge / IvyBridge native raminit X220/X201 Huge amount
    [Show full text]