Design and Performance Evaluation of a Software Framework for Multi-Physics Simulations on Heterogeneous Supercomputers

Total Page:16

File Type:pdf, Size:1020Kb

Design and Performance Evaluation of a Software Framework for Multi-Physics Simulations on Heterogeneous Supercomputers Design and Performance Evaluation of a Software Framework for Multi-Physics Simulations on Heterogeneous Supercomputers Entwurf und Performance-Evaluierung eines Software-Frameworks für Multi-Physik-Simulationen auf heterogenen Supercomputern Der Technischen Fakultät der Universität Erlangen-Nürnberg zur Erlangung des Grades Doktor-Ingenieur vorgelegt von Dipl.-Inf. Christian Feichtinger Erlangen, 2012 Als Dissertation genehmigt von der Technischen Fakultät der Universität Erlangen-Nürnberg Tag der Einreichung: 11. Juni 2012 Tag der Promotion: 24. July 2012 Dekan: Prof. Dr. Marion Merklein Berichterstatter: Prof. Dr. Ulrich Rüde Prof. Dr. Takayuki Aoki Prof. Dr. Gerhard Wellein Abstract Despite the experience of several decades the numerical simulation of computa- tional fluid dynamics is still an enormously challenging and active research field. Most simulation tasks of scientific and industrial relevance require the model- ing of multiple physical effects, complex numerical algorithms, and have to be executed on supercomputers due to their high computational demands. Fac- ing these complexities, the reimplementation of the entire functionality for each simulation task, forced by inflexible, non-maintainable, and non-extendable im- plementations is not feasible and bound to fail. The requirements to solve the involved research objectives can only be met in an interdisciplinary effort and by a clean and structured software development process leading to usable, main- tainable, and efficient software designs on all levels of the resulting software framework. The major scientific contribution of this thesis is the thorough design and imple- mentation of the software framework WaLBerla that is suitable for the simulation of multi-physics simulation tasks centered around the lattice Boltzmann method. The design goal of WaLBerla is to be usable, maintainable, and extendable as well as to enable efficient and scalable implementations on massively parallel super- computers. In addition, a performance analysis of lattice Boltzmann simulations has been conducted on heterogeneous supercomputers using a MPI, a hybrid, and a heterogeneous parallelization approach and over 1000 GPUs. With the help of a performance model for the communication overhead the parallel perfor- mance has been accurately estimated, understood, and optimized. It is shown that WaLBerla introduces no significant performance overhead and that efficient hardware-aware implementations are possible in WaLBerla. Furthermore, the applicability and flexibility of WaLBerla is demonstrated in simulations of particle flows and nano fluids on CPU-GPU clusters. By the suc- cessful application of WaLBerla in various simulation tasks, and the analysis of the performance and the software quality with help of quality criteria, it is shown that the design goals of WaLBerla have been fulfilled. Zusammenfassung Die numerische Simulation von strömungsmechanischen Anwendungen ist trotz jahrzehntelanger Erfahrung immer noch ein aktives Forschungsfeld mit enor- men Herausforderungen. Viele der wissenschaftlich und industriell relevanten Anwendungen erfordern die Modellierung einer Vielzahl physikalischer Effekte, komplexe numerische Algorithmen, und sind Aufgrund ihres Rechenaufwandes nur auf Supercomputern ausführbar. Im Angesicht dieser Komplexität ist die zeitaufwendige Reimplementierung der gesamten Funktionalität für jede neue Anwendung, erzwungen durch unflexible, nicht wartbare und erweiterbare Im- plementierungen, kein adäquater Ansatz und zum Scheitern verurteilt. Die An- forderungen zum Lösen der Problemstellungen können nur in interdisziplinären Forschungsvorhaben mit Hilfe eines systematischen und strukturierten Softwa- reentwicklungsprozesses erfüllt werden. Der zentrale wissenschaftliche Beitrag dieser Arbeit ist die nachhaltige Entwick- lung des Software-Frameworks WaLBerla zur Simulation von Multi-Physik-An- wendungen mittels der Lattice Boltzmann Methode. Das Designziel von WaL- Berla ist sowohl anwendbar, flexible, und erweiterbar zu sein, als auch effiziente und skalierbare Implementierungen für massiv parallele Simulation auf Super- computern zu ermöglichen. Weiterhin wurde in dieser Arbeit eine Performance-Studie von Lattice-Boltz- mann-Simulationen für Parallelisierungsansätze basierend auf MPI, hybrider, und heterogener Parallelisierungen auf heterogenen CPU-GPU Clustern auf über 1000 GPUs durchgeführt. Mit Hilfe eines Performance-Models für den Parallelisierungs-Overhead konnte die parallele Performance akkurat vorherge- sagt, untersucht, und optimiert werden. Es konnte gezeigt werden, dass WaL- Berla keinen signifikanten Performance-Overhead aufweist und das effiziente Implementierungen in WaLBerla möglich sind. Außerdem wurde mit Hilfe von Fluidisierungs- und Nano-Fluid-Simulation auf CPU-GPU Clustern die Anwendbarkeit und Flexibilität von WaLBerla demon- striert. Durch die Analyse der Performance und der Softwarequalität anhand von Qualitätskriterien, und dem erfolgreichen Einsatz von WaLBerla in unter- schiedlichen Simulationsszenarien konnte gezeigt werden, dass die Designziele von WaLBerla erreicht wurden. Acknowledgements I would like to express my gratitude to my PhD supervisor Prof. Dr. Ulrich Rüde for giving me the great possibility to work on this project. Thank you very much for the ongoing support of my work, for helpful discussions and encouragement throughout the time. This also includes the continuous support of the WaLBerla project as well as providing the environment necessary for its development. I wish to thank Prof. Dr. Gerhard Wellein not only for taking the time to review my PhD thesis, but also for the thought-provoking impulses on HPC computing and the excellent cooperation within the SKALB project. I would like to extend my gratitude to Prof. Dr. Takayuki Aoki (Aoki Labora- tory, Tokyo Institute of Technology, Japan) who kindly agreed to review my PhD thesis. Also I would like to thank him for his scientific support, especially for providing the compute time on the Tsubame 2.0 supercomputer. The disserta- tion would not have been possible in this way without it. Many thanks also to the HPC team of the RRZE: Thomas Zeiser, Georg Hager, Markus Wittmann, and Jan Treibig. On the one hand for the great teamwork within the SKALB project - I really enjoyed working with you, and on the other hand for the long-lasting support, the useful advices, teaching me how to pro- gram efficiently, and how to “fool the masses”. A big “thanks a lot” to all my colleagues, especially to Frank Deserno, Tobias Preclik, Björn Gmeiner, Iris Weiss, and Gaby Fleig. Thank you for the friendly atmosphere and all the help. It has always been fantastic talking to and laugh- ing with you. This also includes the WaLBerla team, Kristina Pickl, Dominik Bartuschat, Jan Götz, Regina Ammer, Daniela Anderl, Matthias Markl, Christian Godenschwager, Florian Schornbaum, and Martin Bauer. I especially enjoyed your cooperative way to promote our project. Special thanks to: • Klaus Iglberger, who was not only my roommate for many years, but also has been a great friend of mine and my priceless source of C++ knowledge. Thank you so much for invaluable help with the final steps of this work. • Johannes Habich for the great time, the awesome discussions, and that he has always listened to my complains. • Harald Köstler, who despite being constantly overloaded always has an open ear and great new ideas. • Stefan Donath for numerous fruitful discussion sometimes provocative, but always rewarding. Also thank you for being there for me in times of trou- ble. • Felipe Aristizabal, for the pleasant collaboration and the enjoyable video conferences. I would like to give credit to the people who supported me writing the thesis by proofreading parts of it: Klaus and Stefanie Iglberger (not to forget: thanks for the pile of sugar), Sabine Feichtinger, Johannes Habich, Thomas Zeiser, Georg Hager, Regina Ammer, Felipe Aristizabal, Stefan Donath, Harald Köstler, Philipp Neumann, Martin Bauer, Kristina Pickl, and Dominik Bartuschat. I owe immense gratitude to my friends for continuous support and necessary distractions: Dirk, Julia, Anke, Alex, and Silvia. Very special thanks to my wife Sabine, who gave me strength, taught me dis- cipline, and backed me up during the final PhD phase that seemed to last for years. I deeply thank you also for your unquestioning love, even if I’ll be abroad for half a year. My parents and also my parents-in-law receive my immense gratitude for their continuous support, unconditional love, and always believing in me. Contents 1 Motivation and Outline 1 1.1 The Need of HPC Software Design for Complex Flow Simulations. 1 I Methods and Architectures 5 2 Brief Introduction to the Lattice Boltzmann Method 7 2.1 TheLatticeBoltzmannMethod . 7 2.1.1 BoundaryConditions . 11 2.2 TheFluctuatingLatticeBoltzmannMethod . .. 12 2.2.1 TheMRTCollisionOperator . 13 2.2.2 TheFLBMCollisionOperator. 15 3 Current Supercomputer Architectures 17 3.1 GeneralClusterSetup . .. .. .. .. .. .. .. 18 3.2 TheNVIDIAGF100Tesla“Fermi"GPU . 20 4 Implementation Details 23 4.1 TheCUDAArchitectureandtheCUDACLanguage. 23 4.2 KernelImplementation. 26 4.2.1 TheLBMKernelforCPUs. 27 4.2.2 TheLBMKernelforNVIDIAGPUs . 28 4.2.3 TheFluctuationLBMKernelforNVIDIAGPUs . 29 II WaLBerla - An HPC Software Framework for Multi- Physics Simulations 31 5 Designing the Software Framework WaLBerla 35 5.1 DerivingtheDesignGoalsofWaLBerla . 35 5.2 The WaLBerla Prototypes and their Physical Design . .... 40 6 Analysis of the Logical Software Design
Recommended publications
  • Desarrollo Del Juego Sky Fighter Mediante XNA 3.1 Para PC
    Departamento de Informática PROYECTO FIN DE CARRERA Desarrollo del juego Sky Fighter mediante XNA 3.1 para PC Autor: Íñigo Goicolea Martínez Tutor: Juan Peralta Donate Leganés, abril de 2011 Proyecto Fin de Carrera Alumno: Íñigo Goicolea Martínez Sky Fighter Tutor: Juan Peralta Donate Agradecimientos Este proyecto es la culminación de muchos meses de trabajo, y de una carrera a la que llevo dedicando más de cinco años. En estas líneas me gustaría recordar y agradecer a todas las personas que me han permitido llegar hasta aquí. En primer lugar a mis padres, Antonio y Lola, por el apoyo que me han dado siempre. Por creer en mí y confiar en que siempre voy a ser capaz de salir adelante y no dudar jamás de su hijo. Y lo mismo puedo decir de mis dos hermanos, Antonio y Manuel. A Juan Peralta, mi tutor, por darme la oportunidad de realizar este proyecto que me ha permitido acercarme más al mundo de los videojuegos, algo en lo que querría trabajar. Pese a que él también estaba ocupado con su tesis doctoral, siempre ha sacado tiempo para resolver dudas y aportar sugerencias. A Sergio, Antonio, Toño, Alberto, Dani, Jorge, Álvaro, Fernando, Marta, Carlos, otro Antonio y Javier. Todos los compañeros, y amigos, que he hecho y que he tenido a lo largo de la carrera y gracias a los cuales he podido llegar hasta aquí. Por último, y no menos importante, a los demás familiares y amigos con los que paso mucho tiempo de mi vida, porque siempre están ahí cuando hacen falta.
    [Show full text]
  • A Data-Driven Approach for Personalized Drama Management
    A DATA-DRIVEN APPROACH FOR PERSONALIZED DRAMA MANAGEMENT A Thesis Presented to The Academic Faculty by Hong Yu In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computer Science Georgia Institute of Technology August 2015 Copyright © 2015 by Hong Yu A DATA-DRIVEN APPROACH FOR PERSONALIZED DRAMA MANAGEMENT Approved by: Dr. Mark O. Riedl, Advisor Dr. David Roberts School of Interactive Computing Department of Computer Science Georgia Institute of Technology North Carolina State University Dr. Charles Isbell Dr. Andrea Thomaz School of Interactive Computing School of Interactive Computing Georgia Institute of Technology Georgia Institute of Technology Dr. Brian Magerko Date Approved: April 23, 2015 School of Literature, Media, and Communication Georgia Institute of Technology To my family iii ACKNOWLEDGEMENTS First and foremost, I would like to express my most sincere gratitude and appreciation to Mark Riedl, who has been my advisor and mentor throughout the development of my study and research at Georgia Tech. He has been supportive ever since the days I took his Advanced Game AI class and entered the Entertainment Intelligence lab. Thanks to him I had the opportunity to work on the interactive narrative project which turned into my thesis topic. Without his consistent guidance, encouragement and support, this dissertation would never have been successfully completed. I would also like to gratefully thank my dissertation committee, Charles Isbell, Brian Magerko, David Roberts and Andrea Thomaz for their time, effort and the opportunities to work with them. Their expertise, insightful comments and experience in multiple research fields have been really beneficial to my thesis research.
    [Show full text]
  • Basado En Imágenes Parametrizadas Sobre Resnet Para IdentiCar Intrusiones En 'Smartwatches' U Otros Dispositivos ANes
    IA eñ ™ • Publicaciones de autores 'Framework' basado en imágenes parametrizadas sobre ResNet para identicar intrusiones en 'smartwatches' u otros dispositivos anes. (Un eje singular de la publicación “Estado del arte de la ciencia de datos en el idioma español y su aplicación en el campo de la Inteligencia Articial”) Juan Antonio Lloret Egea, Celia Medina Lloret, Adrián Hernández González, Diana Díaz Raboso, Carlos Campos, Kimberly Riveros Guzmán, Adrián Pérez Herrera, Luis Miguel Cortés Carballo, Héctor Miguel Terrés Lloret License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND 4.0) IA eñ ™ • Publicaciones de autores aplicación en el campo de la Inteligencia Articial”) Abstracto Se ha definido un framework1 conceptual y algebraicamente, inexistente hasta ahora en su morfología, y pionero en aplicación en el campo de la Inteligencia Artificial (IA) de forma conjunta; e implementado en laboratorio, en sus aspectos más estructurales, como un modelo completamente operacional. Su mayor aportación a la IA a nivel cualitativo es aplicar la conversión o transducción de parámetros obtenidos con lógica ternaria[1] (sistemas multivaluados)2 y asociarlos a una imagen, que es analizada mediante una red residual artificial ResNet34[2],[3] para que nos advierta de una intrusión. El campo de aplicación de este framework va desde smartwaches, tablets y PC's hasta la domótica basada en el estándar KNX[4]. Abstract note The full version of this document in the English language will be available in this link. Código QR de la publicación Este marco propone para la IA una ingeniería inversa de tal modo que partiendo de principios matemáticos conocidos y revisables, aplicados en una imagen gráfica en 2D para detectar intrusiones, sea escrutada la privacidad y la seguridad de un dispositivo mediante la inteligencia artificial para mitigar estas lesiones a los usuarios finales.
    [Show full text]
  • Google Adquiere Motorola Mobility * Las Tablets PC Y Su Alcance * Synergy 1.3.1 * Circuito Impreso Al Instante * Proyecto GIMP-Es
    Google adquiere Motorola Mobility * Las Tablets PC y su alcance * Synergy 1.3.1 * Circuito impreso al instante * Proyecto GIMP-Es El vocero . 5 Premio Concurso 24 Aniversario de Joven Club Editorial Por Ernesto Rodríguez Joven Club, vivió el verano 2011 junto a ti 6 Aniversario 24 de los Joven Club La mirada de TINO . Cumple TINO 4 años de Los usuarios no comprueba los enlaces antes de abrirlos existencia en este septiembre, el sueño que vió 7 Un fallo en Facebook permite apropiarse de páginas creadas la luz en el 2007 es hoy toda una realidad con- Google adquiere Motorola Mobility vertida en proeza. Esfuerzo, tesón y duro bre- gar ha acompañado cada día a esta Revista que El escritorio . ha sabido crecerse en sí misma y superar obs- 8 Las Tablets PC y su alcance táculos y dificultades propias del diario de cur- 11 Propuesta de herramientas libre para el diseño de sitios Web sar. Un colectivo de colaboración joven, entu- 14 Joven Club, Infocomunidad y las TIC siasta y emprendedor –bajo la magistral con- 18 Un vistazo a la Informática forense ducción de Raymond- ha sabido mantener y El laboratorio . desarrollar este proyecto, fruto del trabajo y la profesionalidad de quienes convergen en él. 24 PlayOnLinux TINO acumula innegables resultados en estos 25 KMPlayer 2.9.2.1200 años. Más de 350 000 visitas, un volumen apre- 26 Synergy 1.3.1 ciable de descargas y suscripciones, servicios 27 imgSeek 0.8.6 estos que ha ido incorporando, pero por enci- El entrevistado . ma de todo está el agradecimiento de muchos 28 Hilda Arribas Robaina por su existencia, por sus consejos, su oportu- na información, su diálogo fácil y directo, su uti- El taller .
    [Show full text]
  • Estado Del Arte Metodologia Para El Desarrollo De
    ESTADO DEL ARTE METODOLOGIA PARA EL DESARROLLO DE APLICACIONES EN 3D PARA WINDOWS CON VISUAL STUDIO 2008 Y XNA 3.1 Integrantes EDWIN ANDRÉS BETANCUR ÁLVAREZ JHON FREDDY VELÁSQUEZ RESTREPO UNIVERSIDAD SAN BUENAVENTURA Facultad de Ingenierías Seccional Medellín Año 2012 TABLA DE CONTENIDO AGRADECIMIENTOS 3 PARTE 1 ESTADO DEL ARTE 1. RESUMEN 4 2. INTRODUCCIÓN 5 3. MICROSOFT XNA GAME STUDIO 6 3.1 CONCEPTOS 7 4. HISTORIA 10 4.1 El origen de los Videojuegos 10 4.2 XNA en la actualidad 18 4.3 Versiones 19 4.3.1 XNA Game Studio Professional 19 4.3.2 XNA Game Studio Express 20 4.3.3 XNA Game Studio 2.0 21 4.3.4 XNA Game Studio 3.0 21 4.3.5 XNA Game Studio 3.1 22 4.3.6 XNA Game Studio 4.1 23 5. OTROS FRAMEWORKS (ALTERNATIVAS) 24 5.1 PYGAME 24 5.2 PULPCORE 25 5.3 GAMESALAD 25 5.4 ADVENTURE GAME 26 5.5 BLENDER 3D 27 5.6 CRYSTAL SPACE 27 5.7 DIM3 28 5.8 GAME MAKER 28 5.9 M.U.G.E.N 28 6. POR QUÉ XNA 3.1 29 PARTE 2 MARCO TEORICO 7.1 BUCLE DEL JUEGO 32 7.2 Componentes del juego 33 7.3 COMPORTAMIENTO 35 8. REFERENCIAS 37 9. REFERENCIAS DE IMAGENES 39 10. ANEXOS 43 2 AGRADECIMIENTOS Este proyecto de grado tiene un origen muy especial el cual esta plasmado y respaldado por la trayectoria del Microsoft Student Tech Club USB donde se ha hecho el esfuerzo como estudiantes de pregrado para incentivar, informar y enamorar a los integrantes de la institución con las múltiples posibilidades y plataformas que nos ofrece Microsoft al tener el estado de estudiantes en formación.
    [Show full text]
  • Tutorial CUDA
    Graphic Processing Units – GPU (Section 7.7) History of GPUs • VGA in early 90’s -- A memory controller and display generator connected to some (video) RAM • By 1997, VGA controllers were incorporating some acceleration functions • In 2000, a single chip graphics processor incorporated almost every detail of the traditional high-end workstation graphics pipeline - Processors oriented to 3D graphics tasks - Vertex/pixel processing, shading, texture mapping, rasterization • More recently, processor instructions and memory hardware were added to support general-purpose programming languages • OpenGL: A standard specification defining an API for writing applications that produce 2D and 3D computer graphics • CUDA (compute unified device architecture): A scalable parallel programming model and language for GPUs based on C/C++ 70 Historical PC architecture 71 Contemporary PC architecture 72 Basic unified GPU architecture 73 Tutorial CUDA Cyril Zeller NVIDIA Developer Technology Note: These slides are truncated from a longer version which is publicly available on the web Enter the GPU GPU = Graphics Processing Unit Chip in computer video cards, PlayStation 3, Xbox, etc. Two major vendors: NVIDIA and ATI (now AMD) © NVIDIA Corporation 2008 Enter the GPU GPUs are massively multithreaded manycore chips NVIDIA Tesla products have up to 128 scalar processors Over 12,000 concurrent threads in flight Over 470 GFLOPS sustained performance Users across science & engineering disciplines are achieving 100x or better speedups on GPUs CS researchers can use
    [Show full text]
  • Krakow, Poland
    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland Jesus Carretero, Javier Garcia Blas Roman Wyrzykowski, Emmanuel Jeannot. (Editors) September 10-11, 2015 Volume Editors Jesus Carretero University Carlos III Computer Architecture and Technology Area Computer Science Department Avda Universidad 30, 28911, Leganes, Spain E-mail: [email protected] Javier Garcia Blas University Carlos III Computer Architecture and Technology Area Computer Science Department Avda Universidad 30, 28911, Leganes, Spain E-mail: [email protected] Roman Wyrzykowski Institute of Computer and Information Science Czestochowa University of Technology ul. D ˛abrowskiego 73, 42-201 Cz˛estochowa, Poland E-mail: [email protected] Emmanuel Jeannot Equipe Runtime INRIA Bordeaux Sud-Ouest 200, Avenue de la Vielle Tour, 33405 Talence Cedex, France E-mail: [email protected] Published by: Computer Architecture,Communications, and Systems Group (ARCOS) University Carlos III Madrid, Spain http://www.nesus.eu ISBN: 978-84-608-2581-4 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This document also is supported by: Printed in Madrid — October 2015 Preface Network for Sustainable Ultrascale Computing (NESUS) We are very excited to present the proceedings of the Second International Workshop on Sustainable Ultrascale Com- puting Systems (NESUS 2015), a workshop created to reflect the research and cooperation activities made in the NESUS COST Action (IC1035) (www.nesus.eu), but open to all the research community working in large/ultra-scale computing systems.
    [Show full text]
  • Accelerating Hpc Applications on Nvidia Gpus
    ACCELERATING HPC APPLICATIONS ON NVIDIA GPUS WITH OPENACC Doug Miles, PGI Compilers & Tools, NVIDIA High Performance Computing Advisory Council February 21, 2018 PGI — THE NVIDIA HPC SDK Fortran, C & C++ Compilers Optimizing, SIMD Vectorizing, OpenMP Accelerated Computing Features CUDA Fortran, OpenACC Directives Multi-Platform Solution X86-64 and OpenPOWER Multicore CPUs NVIDIA Tesla GPUs Supported on Linux, macOS, Windows MPI/OpenMP/OpenACC Tools Debugger Performance Profiler Interoperable with DDT, TotalView 2 Programming GPU-Accelerated Systems Separate CPU System and GPU Memories GPU Developer View PCIe System GPU Memory Memory 3 Programming GPU-Accelerated Systems Separate CPU System and GPU Memories GPU Developer View NVLink System GPU Memory Memory 4 CUDA FORTRAN attributes(global) subroutine mm_kernel ( A, B, C, N, M, L ) real :: A(N,M), B(M,L), C(N,L), Cij integer, value :: N, M, L real, device, allocatable, dimension(:,:) :: integer :: i, j, kb, k, tx, ty Adev,Bdev,Cdev real, shared :: Asub(16,16),Bsub(16,16) tx = threadidx%x . ty = threadidx%y i = blockidx%x * 16 + tx allocate (Adev(N,M), Bdev(M,L), Cdev(N,L)) j = blockidx%y * 16 + ty Adev = A(1:N,1:M) Cij = 0.0 Bdev = B(1:M,1:L) do kb = 1, M, 16 Asub(tx,ty) = A(i,kb+tx-1) call mm_kernel <<<dim3(N/16,M/16),dim3(16,16)>>> Bsub(tx,ty) = B(kb+ty-1,j) ( Adev, Bdev, Cdev, N, M, L ) call syncthreads() do k = 1,16 C(1:N,1:L) = Cdev Cij = Cij + Asub(tx,k) * Bsub(k,ty) deallocate ( Adev, Bdev, Cdev ) enddo call syncthreads() .
    [Show full text]
  • Video Game Design Method for Novice
    GSTF INTERNATIONAL JOURNAL ON COMPUTING, VOL. 1, NO. 1, AUGUST 2010 19 Video Game Design Method for Novice Wei-Da Hao1 and Akash Khurana2 Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville 700 University Blvd, Kingsville, TX 78363, USA [email protected] [email protected] Abstract—This paper shows how college students without To program in Alice for video game, in addition to prior experience in video game design can create an interesting the well-known drag-and-drop editor to avoid syntax error, video game. Video game creation is a task that requires weeks students use many of the 3D objects and manipulating methods if not months of dedication and perseverance to complete. resulting from decades of progress in computer graphics and However, with Alice, a group of three sophomore students who virtual reality in a user-friendly environment; for example, never designed a game can create a full-fledged video game from modelling 3D objects are avoided because a wealth of 3D given specifications. Alice is 3D graphics interactive animation objects needed in the design are accessible from web gallery software, which is well-tried and proven to be an enjoyable linked to Alice. Moreover, Alice is a gift from Carnegie Mellon learning environment. At the start of this project, students are University, USA, which is freely downloadable from website given guidelines that describe expected outcomes. With minimum www.alice.org. Next version of Alice will be developed supervision, in three days, a working program that matches the by collaboration between academic communities and Sun- guidelines is accomplished.
    [Show full text]
  • On the Modeling, Simulation, and Visualization of Many-Body Dynamics Problems with Friction and Contact
    ON THE MODELING, SIMULATION, AND VISUALIZATION OF MANY-BODY DYNAMICS PROBLEMS WITH FRICTION AND CONTACT By Toby D. Heyn A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mechanical Engineering) at the UNIVERSITY OF WISCONSIN–MADISON 2013 Date of final oral examination: 05/06/2013 The dissertation is approved by the following members of the Final Oral Committee: Dan Negrut, Associate Professor, Mechanical Engineering Stephen Wright, Professor, Computer Science Nicola J. Ferrier, Professor, Mechanical Engineering Neil A. Duffie, Professor, Mechanical Engineering Krishnan Suresh, Associate Professor, Mechanical Engineering Mihai Anitescu, Professor, Statistics, University of Chicago c Copyright by Toby D. Heyn 2013 All Rights Reserved i To my (growing) family ii ACKNOWLEDGMENTS I would like to thank my advisor, Professor Dan Negrut, for his guidance and support. His dedication and commitment continue to inspire me. I would also like to thank Professor Alessandro Tasora and Dr. Mihai Anitescu for their assistance throughout this work. I am also grateful for the expertise of the other committe members, and the assistance, friendship, and conversation of my colleagues in the Simulation-Based Engineering Laboratory. Finally, I am so grateful to my family. The unconditional love, support, and patience of my wife, parents, and sisters means so much to me. DISCARD THIS PAGE iii TABLE OF CONTENTS Page LIST OF TABLES ....................................... vi LIST OF FIGURES ...................................... vii ABSTRACT ..........................................xiii 1 Introduction ........................................ 1 1.1 Background . 2 1.2 Motivation . 3 1.3 Document Overview . 5 1.4 Specific Contributions . 7 2 Many-Body Dynamics .................................. 9 2.1 DVI Formulation .
    [Show full text]
  • Integration of a Raytracing-Based Visualization Component Into An
    Integration of an Interactive Raytracing-Based Visualization Component into an Existing Simulation Kernel Master Thesis Tobias Werner October 10, 2011 Angewandte Informatik V | Intelligent Graphics Group Universit¨atBayreuth 95440 Bayreuth, Germany Integration of a Raytracing-Based Visualization Component Table of Contents 1 Introduction7 1.1 Abstract................................... 7 1.2 Overview .................................. 7 2 Rasterization and raytracing9 2.1 Basic concepts ............................... 9 2.2 Rasterization ................................ 9 2.3 Raytracing ................................. 10 2.4 Comparison................................. 13 2.4.1 Performance ............................ 13 2.4.2 Design elegance........................... 13 2.4.3 Conclusion ............................. 14 2.5 Historical development........................... 14 2.5.1 Early years ............................. 14 2.5.2 The rendering equation ...................... 15 2.5.3 Consumer adaption ........................ 16 2.5.4 Specialized graphics hardware................... 17 2.5.5 General programming on graphics hardware........... 18 2.5.6 Use in modern applications.................... 19 3 State of the Art 21 3.1 Interactive raytracing ........................... 21 3.1.1 General review........................... 21 3.1.2 KD-trees .............................. 25 3.1.3 Bounding volume hierarchies ................... 26 3.1.4 GPU-based bounding volume rebuilds.............. 29 3.1.5 Memory coherence algorithms..................
    [Show full text]
  • NUMA and GPU So, I Know How to Use MPI and Openmp
    Some hot topics in HPC NUMA and GPU So, I know how to use MPI and OpenMP... is that all ? • (Un)fortunately no Today’s lecture is about two “hot topics” in HPC: • NUMA nodes and thread affinity • GPUs (accelerators) 2 / 52 Outline 1 UMA and NUMA Review Remote access Thread scheduling 2 Cache memory Review False sharing 3 GPUs What’s that ? Architecture A first example (CUDA) Let’s get serious Asynchronous copies 3 / 52 Outline 1 UMA and NUMA Review Remote access Thread scheduling 2 Cache memory Review False sharing 3 GPUs What’s that ? Architecture A first example (CUDA) Let’s get serious Asynchronous copies 4 / 52 UMA and NUMA (Review) — What’s inside a modern cluster 1. A network 2. Interconnected nodes 3. Nodes with multiple processors/sockets (and accelerators) 4. Processors/sockets with multiple cores 5 / 52 UMA and NUMA (Review) — And what about memory ? From the network point of view: • Each node (a collection of processors) has access to its own memory • The nodes are communicating by sending messages • We called that distributed memory and used MPI to handle it From the node point of view: • The (node’s own) memory is shared among the cores • We called that shared memory and used OpenMP to handle it • Ok, but how is it shared ? −→ Uniform Memory Access (UMA) −→ Non-Uniform Memory Access (NUMA) 6 / 52 UMA and NUMA (Review) — The UMA way Memory c0 c1 c2 c3 c4 c5 c6 c7 The cores and the memory modules are interconnected by a bus Every core can access any part of the memory at the same speed Pros: • No matter where the data are located
    [Show full text]