GPU Programming

Rupesh Nasre
http://www.cse.iitm.ac.in/~rupesh
IIT Madras
July 2017

The Good Old Days for Software
Source: J. Birnbaum
● Single-processor performance improved dramatically through clock speed and architectural improvements (pipelining, instruction-level parallelism).
● Applications experienced automatic performance improvement.

Hitting the Power Wall
toward a brighter tomorrow
[Figure: CPU frequency chart. Source: http://img.tomshardware.com/us/2005/11/21/the_mother_of_all_cpu_charts_2005/cpu_frequency.gif]

Hitting the Power Wall
[Figure: power density in Watts/cm2 (log scale, 1 to 1000) versus process technology (1.5μ, 1μ, 0.7μ, 0.5μ, 0.35μ, 0.25μ, 0.18μ, 0.13μ, 0.1μ, 0.07μ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III and Pentium® 4. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun's surface.]
● Power doubles every 4 years.
● 5-year projection: 200 W total, 125 W/cm2!
● P = VI: 75 W @ 1.5 V = 50 A!
Source: "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp., Micro32 conference keynote, 1999. Courtesy Avi Mendelson, Intel.

Hitting the Power Wall
[Figure: CPU frequency chart. Source: http://img.tomshardware.com/us/2005/11/21/the_mother_of_all_cpu_charts_2005/cpu_frequency.gif]
● 2004 – Intel cancels Tejas and Jayhawk due to "heat problems due to the extreme power consumption of the core ..."

The Only Option: Use Many Cores
● Chip density continues to increase ~2x every 2 years
  – Clock speed is not
  – Number of processor cores may double
● There is little or no more hidden parallelism (ILP) to be found
● Parallelism must be exposed to and managed by software
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

Parallel Platforms
● Shared memory systems (multi-core)
● Distributed systems (cluster)
● Graphics Processing Units (many-core)
● Field-Programmable Gate Arrays (configurable after manufacturing)
● Application-Specific Integrated Circuits

GPU-CPU Performance Comparison
Source: Thorsten Thormählen

In this course...
● Basic GPU Programming
  – Computation, Memory, Synchronization, Debugging
● Advanced GPU Programming
  – Streams, Heterogeneous computing, Case studies
● Topics in GPU Programming
  – Unified virtual memory, multi-GPU, peer access

Logistics
● TAs
  – Somesh Singh, Jyothi Vedurada, Shouvick Mondal
● Evaluation
  – Five assignments (7 + 7 + 13 + 13 + 20)
  – MidSem (20) + EndSem (20)
  – Absolute grading (95, 80, 70, ..., 40)
  – You have this week to suggest changes to dates.
● Attendance (57 * 0.85 = 48.45)
  – I do not shy away from giving W grades.
● Moodle
  – It is your responsibility to subscribe to it.

Logistics
● Tutorials and lectures will be intermixed.
  – But we will have separate doubts / practice sessions.
● You need a login on the GNR Cluster in CC.
  – You can run assignments on your laptop, but make sure they run on the GNR Cluster.
  – The GNR Cluster will be slow.
