Mustafa Çavuş

Home Address
2471. Sokak, Doga Apartmani, No: 12/8
Umitkoy – Ankara / Turkey
[email protected]

University Address
TOBB University of Economics and Technology
Sogutozu Caddesi No: 43
Sogutozu – Ankara / Turkey
[email protected]

Education
• Master of Science (MS)
  ◦ Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey (September 2012 – current)
• Bachelor of Science (BS)
  ◦ Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey (September 2007 – August 2012)

Selected Projects / Research Experience
• My Master's thesis project is an analysis of data-parallel applications of single-layer networks in unsupervised feature learning using CUDA.
• During my Master's degree, I worked on the development of a GPU-based parallel image processing library for embedded systems. The library is based on OpenCL and shows an average 7X speed-up over OpenCV on the Vivante GC2000 embedded GPU. This project was a cooperation with Vivante Corporation.
• During my Bachelor's degree, I worked on the development of conformance tests for the Vivante GC2000 GPU. The aim of the project was to develop applications that test the functionality of both OpenGL ES and OpenCL. We reported many bugs during development; all of them were fixed, and the project ultimately demonstrated the compatibility of the Vivante GC2000 with both OpenGL ES and OpenCL. This project was a cooperation with Vivante Corporation.
• During my Bachelor's degree, I implemented a GPU-based parallel real-time panoramic video stitching application as my graduation project. It was developed using OpenCL, OpenCV and the Qt Framework.

Publications
• “GPU Based Parallel Image Processing Library for Embedded Systems”
  Mustafa Çavuş, Hakkı Doğaner Sümerkan, Osman Seçkin Şimşek, Hasan Hassan, Abdullah Giray Yağlıkçı, Oğuz Ergin
  9th International Conference on Computer Vision Theory and Applications (VISAPP'13), Lisbon, Portugal, January 2014

Conference Presentations
• 9th International Conference on Computer Vision Theory and Applications (VISAPP'13)
  “GPU Based Parallel Image Processing Library for Embedded Systems”
  Lisbon, Portugal, January 2014

Interest Areas
• Parallel Computing
• Heterogeneous Computing
• Image Processing
• Computer Vision

Graduate Teaching Assistantships
• BİL 362L: Microprocessor Laboratory
  ◦ Summer 2014 (TOBB University of Economics and Technology)
  ◦ Spring 2014 (TOBB University of Economics and Technology)
  ◦ Fall 2013 (TOBB University of Economics and Technology)
• BİL 142: Computer Programming II (C/C++)
  ◦ Spring 2014 (TOBB University of Economics and Technology)

Internships
• Software Engineer at Nobel Yazılım (September 2010 – December 2010, Ankara, Turkey)
  ◦ I worked on an accounting program and developed several applications in C++ using the Qt Framework, including a visual graphics chart generator, a drag-and-drop PDF report generator, a SOAP message manager framework and some other small applications.
• Software Engineer at Infina Yazılım (January 2010 – April 2010, İstanbul, Turkey)
  ◦ I worked on the development of Infina Yazılım's financial application.

Technical Skills
• Programming Languages: C/C++, Java, Python, LISP, JavaScript.
• Experience in CUDA, OpenCL and OpenGL ES.
• Experience in OpenCV and MATLAB.
• Experience in the Qt Framework and Android SDK.
• Operating Systems: Windows 7 and Linux distributions (Arch Linux, Ubuntu, OpenSUSE)

Languages
• Turkish (Native)
• English