MID Linux Tools Launch Guidelines

Moblin.org SDK and Tools: Open Source Development for the Intel Powered MID
Uli Dumschat
Tools Product Marketing, Intel

Intel® Software Development Products Overview

Intel® Compilers
• The best way to get application performance on Intel® multi-core processors
Intel® VTune™ Performance Analyzers
• Identify bottlenecks in source code and optimize multi-core performance
Intel® Performance Libraries
• Highly optimized, thread-safe, multimedia and HPC math functions
Intel® Threading Analysis Tools
• Find threading errors and optimize threaded applications for maximum performance
Intel® Threading Building Blocks
• C++ template-based runtime library that simplifies writing multithreaded applications for performance and scalability
Intel® Cluster Tools
• Create, analyze, optimize and deploy cluster-based applications

Agenda

• Introduction to Moblin
• Intel® Compiler for more performance
• Intel® Debuggers to support Intel® Atom™ processor
• Intel® IPP Library for new feature development
• Intel® VTune™ to identify performance bottlenecks

Introduction To Moblin

Moblin
• Open Source Linux* SW Platform for Mobile & Embedded Devices including Mobile Internet Devices (MIDs), Netbooks, Automotive In-Vehicle Infotainment Systems
• Optimized for Intel® Atom™ Processor Technologies
Moblin SDK
• Common platform for development & deployment
• Complemented by OSS tools
Intel® Tools for MID
• Rich Development Tool Suite from Intel: "Intel® C++ Software Development Tool Suite for Linux* OS Supporting Mobile Internet Devices" (Intel® Tools for MID)

Moblin SDK

Image Creator
• Graphical tool to create a Linux* stack
PowerTop
• Identify how well system uses various power-saving features
• Identify misbehaving applications
• Get tuning suggestions to achieve low power consumption
GNU Tools
• GCC – Open Source Compiler
Intel® Tools for MID
• Complete set of development tools for optimized applications
Documentation
• Sample apps, guides and documentation
Moblin & Moblin SDK provide all components and tools to develop software for MIDs

Why Intel Provides Software Development Products For MIDs

Outstanding performance
• Increased application software performance can help to extend battery life
Intel® Architecture customization increases productivity & efficiency
• Find issues faster with system-level JTAG and application debugging
Technology alignment
• Latest Intel® Atom™ Processor Z5xx and chipset support
• NDA Tools BETA programs for next generation silicon
Excellent customer support

Software Design Cycle

Intel® C++ Compiler
• Highly optimizing
• Full support for Intel® Atom™ processor Z5xx & Moblin
• GCC compatible
Intel® Debuggers
• Intel® Atom™ processor Z5xx and chipset support
• Linux* kernel and low-level driver debugging
• Application debugging
• Built-in flash memory tool
• Execution trace support
Intel® IPP Library
• Highly optimized multimedia functions
• Tailored to Intel® Atom™ processor Z5xx
Intel® VTune™ Analyzer
• Tune code actually running on device
• Performance bottleneck identification
• Tuning Assistant

Intel® MID Tool Suite covers the entire cycle of system and application software development

Linux Compiler Features

Compiler features and their benefits:

Performance
• Significantly faster than GCC
• High performing code maps directly into application quality and battery lifetime
In-order scheduler
• Compiler optimization switch that re-arranges/optimizes application code to be executed with best performance on Intel's Low-power IA technology
• Better performance of system and application software helps to reduce power consumption of a mobile device
Profile Guided Optimization
• Multi-stage optimization method with feedback loop
• Improves application performance by reducing instruction-cache thrashing, reorganizing code layout, shrinking code size, and reducing branch mispredictions
GCC Compatibility
• Intel Compiler provides GCC language extensions and is source and binary code compatible with GCC
• Saves effort in porting/re-using existing code

Need For In-order Scheduler Support

Consider code sequence:

    a = b * 7;
    c = d * 7;

Representative assembly:

    1 movl b,%eax
    2 imull $7,%eax
    3 movl %eax,a
    4 movl d,%edx
    5 imull $7,%edx
    6 movl %edx,c

[Diagram labels for each three-instruction group: Memory Load, Dependency Stall, Dependency; axis: Processor cycles]

• In some cases assembly code causes delays and dependency stalls which decrease the performance of application and performance critical code

Need For In-order Scheduler Support (with the -xL compiler switch)

Consider code sequence:

    a = b * 7;
    c = d * 7;

Representative assembly after the in-order scheduler has run:

    1 movl b,%eax
    4 movl d,%edx
    2 imull $7,%eax
    5 imull $7,%edx
    3 movl %eax,a
    6 movl %edx,c

[Diagram axis: Processor cycles]

• Compiler switch -xL enables the in-order scheduler, which may improve an application's performance behavior

SPEC CPU2000 Benchmark (Estimated)

Estimated SPECint*_base2000, ICC 10.1 vs. GCC 4.2.1

[Chart: estimated performance relative to GCC 4.2.1 (GCC 4.2.1 = 1.00); the two Intel® C++ Compiler 10.1 for Linux* bars are labeled "16% faster" and "30% faster"]

Estimated based on the following configuration assumptions:

Compilers:
• Intel C/C++ Compiler 10.1.013 for Linux
• GCC 4.2.1 (as contained in Ubuntu/MOBLIN 2008/02/05)
Hardware:
• Intel® Atom™ Processor Z530 (code-named: Silverthorne) @ 1.60 GHz
• CPU: 1596.138 MHz (according to Linux /proc/cpuinfo)
• HT enabled (according to BIOS + Linux /proc/cpuinfo)
• FSB: 533 MHz (according to BIOS)
• L1 cache: 24 KB
• L2 cache: 512 KB
• 512 MB Memory (RAM)
• Board: Intel internal reference board
Target Operating System:
• Ubuntu/MOBLIN 2008/02/05 installed on a hard disc drive (no USB stick)
SPECint_base2000:
• Version 1.3

SPEC and SPECint are trademarks of the Standard Performance Evaluation Corporation. For more information see http://www.spec.org.
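The in-order scheduling example earlier in this section (`a = b * 7; c = d * 7;`) can be sketched in C to make the reordering concrete. This is only an illustration of the dependency structure, not the compiler's actual output: the function names `compute_source_order` and `compute_interleaved` and the initial values of `b` and `d` are assumptions made for this sketch. The second function hand-interleaves the two independent statements in the order 1, 4, 2, 5, 3, 6 shown for the -xL switch.

```c
#include <stdint.h>

/* Globals mirror the slide's variables a, b, c, d.
   The initial values are arbitrary choices for this sketch. */
int32_t a, b = 3, c, d = 5;

/* Source order: load, multiply and store for `a` run back to back
   (instructions 1, 2, 3), then the same for `c` (4, 5, 6).
   Each multiply has to wait for the load just before it. */
void compute_source_order(void)
{
    a = b * 7;
    c = d * 7;
}

/* Hand-interleaved order mirroring 1, 4, 2, 5, 3, 6.
   The second load is issued while the first is still in flight,
   so neither multiply directly follows the load it depends on.
   This is legal because the second statement never reads `a`. */
void compute_interleaved(void)
{
    int32_t t0 = b;   /* 1: load b      */
    int32_t t1 = d;   /* 4: load d      */
    t0 *= 7;          /* 2: multiply    */
    t1 *= 7;          /* 5: multiply    */
    a = t0;           /* 3: store to a  */
    c = t1;           /* 6: store to c  */
}
```

Both functions produce the same values for `a` and `c`; only the order of the independent operations differs. An optimizing C compiler is free to reorder either version, so any timing difference would have to be checked in the generated assembly rather than in the C source.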
Compiler switches used (chart legend: GCC 4.2.1, Intel® C++ Compiler 10.1 for Linux*):

-O2:
• GCC: "-O2 -m32"
• ICC: "-xL -O2 -vec-"
Advanced optimization:
• GCC: "-m32 -fprofile-use -O3 -funroll-all-loops"
• ICC: "-xL -prof_use -O3 -ipo -no-prec-div"

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products or call (U.S.) 1-800-628-8686 or 1-916-356-3104.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Use Intel® C++ Compiler for higher performance on MIDs

System And Application Debugger

System Debugging: Intel® C++ JTAG Debugger for Linux* OS Supporting Mobile Internet Devices (Intel® MID JTAG Debugger)
• JTAG System Debugging
  - Connects through In-Target Probe eXtended Debug Port (ITP-XDP)
  - Requires JTAG connector on the target hardware and Intel's XDP3 JTAG hardware interface
  - Flash Memory support
  - Ideal for kernel debugging and board bring-up phase
  - Linux* OS awareness

Application Debugging: Intel® C++ Application Debugger for Linux* OS Supporting Mobile Internet Devices (Intel® MID Application Debugger)
• Application Debugging
  - Connects through TCP/IP
  - Requires debug agent on the target
  - Linux* OS awareness
  - Ideal for application development

Chipset Peripheral Registers

Kernel Module/Audio Driver - Init Code:

    #include <hdaudioregisters.h>
    /* Base address of the HD Audio register block. */
    #define HD_AUDIO_REG_BASE 0x00FF0000
    uint32 *hdaudioregbase = (uint32 *)HD_AUDIO_REG_BASE;
    init()
    {
        /* Write one peripheral register through the base pointer. */
        hdaudioregbase[D27FO_IHDACR] = 0x01;
        …
    }

[Block diagram labels: Intel® Atom™ Processor Z510/Z530; 400/533 MHz FSB; System Controller Hub (SCH) US15W; 24bit LVDS; SDVO; System DDR2 400/533 (mem down); 2 PCIe* x1 Lanes; GPIO; 8 USB 2.0 Host Ports; SMBus; LPC; SDIO/MMC; FWH; SIO; 1 IDE Channel; Intel® High Definition Audio]

• US15W: ~400 Peripheral Registers
• Validating Peripheral Register Settings Can Be Quite Complex

Processor & Chipset Specific Register Access

• Show and change the content of all processor & chipset registers
• Convenient access to architectural registers - analyze register changes after instruction execution
Chipset Registers Bitfield Editor
• Graphical representation of peripheral registers and bit fields with online documentation

Easy and fully documented access to all processor registers and peripherals.
Change register contents on the fly, without re-compilation

Note: Intel® MID JTAG Debugger requires the XDP3 JTAG hardware interface from Intel (see p.20 "Pricing & Support Model")

Trace Support

• Hardware feature of Intel® Atom™ Processor
• Enables viewing of execution history
• Identify the root cause for exceptions
Branch Trace Buffer
• On chip (Intel® MID JTAG Debugger)
• Memory allocated (Intel® MID Application Debugger)

[Diagram labels: Executed Application; Application Source Code; Send Branch Trace Information To Debugger]

Branch Trace Support

[Screenshot labels: C/C++ Source Window; Trace Window; Assembler Window; Stop application at specific signal]

OS Awareness

Kernel
• Monitor kernel modules and system threads (JTAG only)
• Access status information
• Halt, debug threads and applications and modules individually
• Debugging of Linux*
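Returning to the chipset register access described earlier, the kind of bit-field change that a register bitfield editor performs can be sketched in C as a read-modify-write on a 32-bit register. This is a minimal sketch under stated assumptions: it operates on a simulated register bank (`sim_regs`) rather than on real hardware addresses, and the names `SIM_REG_COUNT`, `SIM_REG_CTRL`, `field_mask`, `reg_get_field` and `reg_set_field` are hypothetical, not part of any Intel header or tool.

```c
#include <stdint.h>

#define SIM_REG_COUNT 16u   /* hypothetical size of the simulated bank */
#define SIM_REG_CTRL  3u    /* hypothetical index of one control register */

/* Simulated register bank standing in for memory-mapped hardware.
   `volatile` keeps the compiler from caching or dropping accesses,
   as would be required for real peripheral registers. */
volatile uint32_t sim_regs[SIM_REG_COUNT];

/* Build a mask of `width` bits starting at bit `shift`.
   Assumes 1 <= width <= 32 and shift + width <= 32. */
static uint32_t field_mask(unsigned shift, unsigned width)
{
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu
                                   : (((uint32_t)1 << width) - 1u);
    return ones << shift;
}

/* Read one bit field out of a register. */
uint32_t reg_get_field(volatile uint32_t *regs, unsigned index,
                       unsigned shift, unsigned width)
{
    return (regs[index] & field_mask(shift, width)) >> shift;
}

/* Read-modify-write: change one bit field, leave all other bits alone. */
void reg_set_field(volatile uint32_t *regs, unsigned index,
                   unsigned shift, unsigned width, uint32_t value)
{
    uint32_t mask = field_mask(shift, width);
    uint32_t old  = regs[index];
    regs[index] = (old & ~mask) | ((value << shift) & mask);
}
```

For example, `reg_set_field(sim_regs, SIM_REG_CTRL, 4, 4, 0xA)` rewrites only bits 4 to 7 of the simulated register. With roughly 400 peripheral registers on the chipset, keeping the mask arithmetic in one helper is one way to reduce the manual validation effort the slides point out; a debugger's bitfield editor makes the same kind of change without recompiling.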