Vysoké učení technické v Brně — Zobrazení stínů ve scéně s využitím knihovny DirectX (Brno University of Technology — Rendering of Shadows in a Scene with DirectX)

Total Pages: 16

File Type: pdf, Size: 1020 KB

VYSOKÉ UČENÍ TECHNICKÉ V BRNĚ / BRNO UNIVERSITY OF TECHNOLOGY
FAKULTA INFORMAČNÍCH TECHNOLOGIÍ / FACULTY OF INFORMATION TECHNOLOGY
ÚSTAV POČÍTAČOVÉ GRAFIKY A MULTIMÉDIÍ / DEPARTMENT OF COMPUTER GRAPHICS AND MULTIMEDIA

ZOBRAZENÍ STÍNŮ VE SCÉNĚ S VYUŽITÍM KNIHOVNY DIRECTX
RENDERING OF SHADOWS IN A SCENE WITH DIRECTX

DIPLOMOVÁ PRÁCE / MASTER'S THESIS
Author: Bc. Jozef Kobrtek
Supervisor: Ing. Jan Navrátil
Brno, 2012

Abstract (Czech version, translated)
This thesis deals with shadow rendering methods, their analysis and their implementation in the DirectX 11 API. The theoretical part describes the historical evolution of shadow usage in 3D applications and the individual shadow-computation algorithms. Using a demonstration application, two variants of the shadow mapping algorithm for omnidirectional lights — one based on cube mapping and one on paraboloid projection, each with five differently optimized implementations — are compared in terms of performance, implementation effort and output quality.

Abstract (English version)
This work discusses shadowing methods, analyses them and describes their implementation in the DirectX 11 API. The theoretical part covers the historical evolution of shadow usage in 3D applications and analyzes the individual shadowing algorithms. The work compares two variants of the shadow mapping algorithm for omnidirectional lights, based on cube mapping and paraboloid projection, in a demo application with respect to quality, performance and implementation aspects.

Keywords
DirectX, Direct3D, shadowing methods, shadow mapping, cube mapping, shadow volumes, geometry shader

Citation
Jozef Kobrtek: Zobrazení stínů ve scéně s využitím knihovny DirectX (Rendering of Shadows in a Scene with DirectX), master's thesis, Brno, FIT BUT, 2012.

Declaration
I declare that I have developed this project independently under the supervision of Ing. Jan Navrátil.
Jozef Kobrtek, 19 May 2012

Acknowledgements
I would like to thank my supervisor, Ing. Navrátil, for his invaluable help and comments, which improved the quality of this work. I also thank Matej Koleják and Matej Smetana for lending the hardware on which the work could be tested.

© Jozef Kobrtek, 2012. This thesis was created as a school work at Brno University of Technology, Faculty of Information Technology. The thesis is protected by copyright law and its use without the author's permission is illegal, except for the cases defined by law.

Contents
1 Introduction
2 Shadows in 3D applications
  2.1 Historical evolution
  2.2 Types of lights
3 Shadowing algorithms
  3.1 Planar projection shadows
  3.2 Shadow mapping
  3.3 Shadow volumes
4 The visibility problem
  4.1 Bounding volumes
  4.2 View frustum culling
5 Application design and implementation
  5.1 Design
  5.2 Implementation
6 Tests
  6.1 Qualitative comparison
  6.2 Performance tests
7 Conclusion
A Pseudocode of Welzl's algorithm
B Pseudocode for computing the intersection of two view volumes
C Screenshots from the qualitative comparison (512 × 512, 1024 × 1024, 2048 × 2048, 4096 × 4096)
D Contents of the enclosed CD
E Using the application

Chapter 1 — Introduction
In reality it would be hard to imagine objects exposed to light that cast no shadows. Shadows are precisely what gives us an idea of where objects are located in space and of their position relative to one another. It is this aspect of the behaviour of light that we try to reproduce in 3D graphics as well.
The computation of shadows in computer graphics has been a subject of research for many years, because processing them in real-time 3D graphics carries a non-negligible performance penalty. Shadow methods are computed globally for the whole scene, usually in several rendering passes, so emphasis must be placed on their efficient implementation.

This thesis describes and incrementally implements selected algorithms for rendering shadows in a scene, and evaluates their properties, their performance and the effort needed to implement them in the DirectX 11 API. For this purpose an application had to be created that implements these algorithms and makes it possible to take measurements and to judge the quality of the shadows produced by the individual methods.

The thesis also compares DirectX, as a proprietary standard, with OpenGL, as an open API, both from the programmer's point of view (for example, how complex it is to express the same algorithms) and from the user's point of view (the performance of an algorithm under one API or the other, or the size of the resulting binaries).

The introduction focuses on the historical evolution of shadowing techniques with respect to the capabilities of graphics adapters over time, from the early ones with a fixed-function pipeline to current accelerators with a programmable pipeline. The individual types of lights and their properties are also described there. Chapter 3 analyses the individual shadow-rendering algorithms and their modifications and variants. Chapter 5 describes the analysis and design of the application, the requirements on its functionality and output, and the problems connected with the implementation in DirectX, including the shaders. The qualitative and performance tests are described in chapter 6, which also evaluates the individual measurements and compares the implemented methods. The conclusion assesses the individual methods on the basis of the measurement results.

Chapter 2 — Shadows in 3D applications

This chapter describes the use of shadows in games and 3D applications in the various periods of graphics accelerator development.

In the traditional view of image synthesis, marking a certain region as lying in shadow means that there is some surface or object located between the light source and the observed region. A shadow does not necessarily mean only a darker area: coloured shadows can also appear in a scene. For example, if there are two light sources of different colours in the scene and one object casts a shadow, light from the second source may still reach the shadowed area. [35]

Figure 2.1: Shadows — terminology. Taken from [32].

Depending on which components a shadow consists of, shadows are divided into hard shadows (umbra only) and soft shadows (umbra and penumbra). The result depends on the shadow-computation algorithm used as well as on the nature of the light source. [2]

2.1 Historical evolution

The use of shadows in games still depends strongly on the performance of the graphics card, because rendering them often requires several passes (as described in the following chapters). Historically, the use of shadows in games can be divided into two periods: accelerators with a fixed-function pipeline, starting in 1995, which is usually regarded as the beginning of the era of graphics chips for the gaming market¹, and accelerators with a programmable graphics pipeline, dating from 2001 with the release of the NVIDIA GeForce3 and ATI Radeon 8500 cards [19].

¹ According to: pctuning.cz/hardware/graficke-karty/4743-historie spolecnosti vyrabejicich 3d cipy-dil i-3dfx
2.1.1 Fixed-function pipeline (1995–2000)

Accelerators from this period are characterized by the following:
• A non-programmable pipeline; graphics effects could only be built from a set of parameterizable functions implemented directly in hardware [28]
• Vertex transformations carried out on the CPU (with the exception of the GeForce 256 and GeForce2, which support hardware T&L [30])
• No possibility of per-pixel lighting, because the graphics pipeline interpolates colours computed at the vertices
• Rendering to a texture is an expensive operation in terms of performance (because of the nature of the bus), and not every accelerator supports it [28, 11]

Figure 2.2: Chronological shift of operations from the CPU to the graphics accelerator; the last column represents the GeForce 256. Taken from anandtech.com.

As figure 2.2 shows, the first accelerators, such as the 3dfx Voodoo, accelerated only rasterization; later the CPU was also relieved of primitive assembly, with the NVIDIA Riva 128 among the first to do so. Later still, the ability to apply several textures in a single pass (multitexturing) was added, for example in the NVIDIA Riva TNT2 [28].

Given the relatively low performance of accelerators in this period (compared with today) and the missing support in hardware and software, dynamic shadows in games were rather a rarity. Figure 2.3 demonstrates an implementation of the shadow mapping algorithm using DirectX 7, but the performance of such a solution would have been insufficient for games (11 FPS in a simple scene).

Figure 2.3: Shadow mapping in DirectX 7. Taken from software.intel.com.

Because the pipeline did not allow any finer control over lighting, so-called light maps were used. These are precomputed structures that store the illumination of an object's surface. The author of this method is John Carmack, and the technique was first used in the game Quake (1996). The method was popular mainly thanks to the growing number of graphics cards supporting multitexturing [34].

Figure 2.4: Applying a light map to a texture. Taken from [4].

Dynamic shadows in games only began to appear towards the end of the fixed-function pipeline era. One of the most significant examples is again the work of id Software and John Carmack: Quake 3 Arena (1999), which implemented dynamic shadows with the stencil shadow volumes algorithm, made possible by the stencil buffer introduced with DirectX 6; see figure 2.5a. Only some objects were shadowed, however, such as ammunition crates or some weapons. The planar shadows technique, which projects an object onto a plane, was also used; see section 3.1.

Figure 2.5: Shadows in id Software games. (a) Shadow volumes in Quake 3 Arena. (b) Depth-fail shadow volumes in Doom 3.

2.1.2 Programmable pipeline (2001–present)

The era of cards with a programmable pipeline began in 2001, when the ATI Radeon 8500 and GeForce3 cards with DirectX 8 support were introduced. A new feature was hardware support for the method …
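The excerpt breaks off at this point. For context on the paraboloid-projection variant mentioned in the abstract, the textbook dual-paraboloid mapping for omnidirectional shadow maps projects the normalized light-space direction d = (p − l) / ||p − l|| (light position l, surface point p) onto the front paraboloid as

    (u, v) = ( d_x / (1 + d_z), d_y / (1 + d_z) ),   for d_z ≥ 0,

with the back hemisphere handled by a second paraboloid using −d_z in the denominator, and the light-to-point distance stored as the depth value to compare against. This is the standard formulation only; the exact conventions used in the thesis (single vs. dual paraboloid, axis orientation, depth encoding) are not visible from this excerpt.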
Recommended publications
  • Fast Calculation of the Lomb-Scargle Periodogram Using Graphics Processing Units
    Draft version August 7, 2018. Preprint typeset using LATEX style emulateapj v. 11/10/09.
    FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS
    R. H. D. Townsend, Department of Astronomy, University of Wisconsin-Madison, Sterling Hall, 475 N. Charter Street, Madison, WI 53706, USA; [email protected]
    ABSTRACT — I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match 8 CPU cores, and on a high-end GPU it is faster by a factor approaching thirty. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities, and Monte-Carlo simulation of periodogram statistical properties.
    Subject headings: methods: data analysis — methods: numerical — techniques: photometric — stars: oscillations
    1. INTRODUCTION — Astronomical time-series observations are often characterized by uneven temporal sampling (e.g., due to transformation to the heliocentric frame) and/or non-uniform coverage (e.g., from day/night cycles, or radiation belt passages). […] of the newly emergent field of GPU computing; then, Section 3 reviews the formalism defining the L-S periodogram, and Section 4 presents a GPU-based code implementing this formalism. Benchmarking calculations to evaluate the accuracy and performance of the code […]
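    For reference, the Lomb-Scargle periodogram that the paper evaluates is conventionally defined (standard form, not quoted from this excerpt) for measurements X_j taken at times t_j as

        P(ω) = 1/2 · { [ Σ_j X_j cos ω(t_j − τ) ]² / Σ_j cos² ω(t_j − τ)  +  [ Σ_j X_j sin ω(t_j − τ) ]² / Σ_j sin² ω(t_j − τ) },
        where  tan(2ωτ) = ( Σ_j sin 2ωt_j ) / ( Σ_j cos 2ωt_j ).

    The sums at each trial frequency ω are independent of every other frequency, which is what makes the evaluation embarrassingly parallel and well suited to a GPU.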
    [Show full text]
  • High-Performance and Energy-Efficient Irregular Graph Processing on GPU Architectures
    High-Performance and Energy-Efficient Irregular Graph Processing on GPU Architectures
    Albert Segura Salvador — Doctor of Philosophy, Department of Computer Architecture, Universitat Politècnica de Catalunya. Advisors: Jose-Maria Arnau, Antonio González. July 2020.
    Abstract — Graph processing is an established and prominent domain that is the foundation of new emerging applications in areas such as Data Analytics, Big Data and Machine Learning. Applications such as road navigational systems, recommendation systems, social networks, Automatic Speech Recognition (ASR) and many others are illustrative cases of graph-based datasets and workloads. Demand for higher processing of large graph-based workloads is expected to rise due to today's trends towards increased data generation and gathering, higher inter-connectivity and inter-linkage, and in general a further knowledge-based society. This increased demand poses challenges to current and future graph processing architectures. To effectively perform graph processing, the large amount of data employed in these domains requires high-throughput architectures such as GPGPUs. Although the processing of large graph-based workloads exhibits a high degree of parallelism, the memory access patterns tend to be highly irregular, leading to poor GPGPU efficiency due to memory divergence. Graph datasets are sparse, highly unpredictable and unstructured, which causes the irregular access patterns and a low computation-per-data ratio, further lowering GPU utilization. The purpose of this thesis is to characterize the bottlenecks and limitations of irregular graph processing on GPGPU architectures in order to propose architectural improvements and extensions that deliver improved performance, energy efficiency and overall increased GPGPU efficiency and utilization. In order to ameliorate these issues, GPGPU graph applications perform stream compaction operations, which process the subset of active nodes/edges so that subsequent steps work on a compacted dataset.
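    As an illustration of the stream compaction step mentioned above, the following sketch uses Thrust (CUDA's standard parallel algorithms library) to gather the IDs of active nodes into a dense frontier array; it is a generic example under assumed data layout, not code from the thesis:

        #include <thrust/device_vector.h>
        #include <thrust/copy.h>
        #include <thrust/iterator/counting_iterator.h>

        struct IsActive {
            const int* flags;                  // 1 = node active this iteration, 0 = inactive
            __host__ __device__ bool operator()(int node) const { return flags[node] != 0; }
        };

        // Returns a dense vector holding only the IDs of active nodes, so the next
        // kernel launch can use one thread per *active* node instead of one per node.
        thrust::device_vector<int> compact_active(const thrust::device_vector<int>& active_flags) {
            int n = static_cast<int>(active_flags.size());
            thrust::device_vector<int> frontier(n);
            IsActive pred{thrust::raw_pointer_cast(active_flags.data())};
            auto end = thrust::copy_if(thrust::counting_iterator<int>(0),
                                       thrust::counting_iterator<int>(n),
                                       frontier.begin(), pred);
            frontier.resize(end - frontier.begin());
            return frontier;
        }

    Compacting the frontier trades one extra pass over the data for much better thread utilization and less divergence in the kernels that follow, which is the trade-off the thesis analyses.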
    [Show full text]
  • GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications
    University of Central Florida, STARS, HIM 1990-2015, 2014. GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications. Adam Phillips, University of Central Florida. Part of the Mathematics Commons. Find similar works at: https://stars.library.ucf.edu/honorstheses1990-2015. University of Central Florida Libraries http://library.ucf.edu. This Open Access is brought to you for free and open access by STARS. It has been accepted for inclusion in HIM 1990-2015 by an authorized administrator of STARS. For more information, please contact [email protected].
    Recommended Citation: Phillips, Adam, "GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications" (2014). HIM 1990-2015. 1613. https://stars.library.ucf.edu/honorstheses1990-2015/1613
    GPU ACCELERATED APPROACH TO NUMERICAL LINEAR ALGEBRA AND MATRIX ANALYSIS WITH CFD APPLICATIONS by ADAM D. PHILLIPS. A thesis submitted in partial fulfilment of the requirements for the Honors in the Major Program in Mathematics in the College of Sciences and in The Burnett Honors College at the University of Central Florida, Orlando, Florida, Spring Term 2014. Thesis Chair: Dr. Bhimsen Shivamoggi. © 2014 Adam D. Phillips. All Rights Reserved.
    ABSTRACT — A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The work's objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control to (3) develop parallel CFD applications for Navier-Stokes and Lattice Boltzmann analysis methods.
    [Show full text]
  • Background on GPGPU Programming
    Background on GPGPU Programming
    • Development of powerful 3D graphics chips
      – Consumer demand was accelerated by immersive first-person games such as Doom and Quake
      – The Microsoft DirectX 8.0 standard released in 2001 required hardware to handle both programmable vertex and programmable pixel shading standards
      – This meant developers had some control over computations on a GPU
      – The NVIDIA GeForce 3 series were examples of such chips
    Early Attempts at GPGPU Programming
    • Calculated a color value for every pixel on screen
      – Used (x, y) position, input colors, textures, etc.
      – Input colors and textures were controlled by the programmer
      – Any data value could be used as input and, after some computation, the final data value could be used for any purpose, not just the final pixel color
    • Restrictions that would prevent widespread use
      – Only a handful of input color and texture values could be used; there were restrictions on where results could be read from and written to in memory
      – There was no reasonable way to debug code
      – The programmer would need to learn OpenGL or DirectX
    NVIDIA CUDA Architecture
    • Introduced in November 2006
      – First used on the GeForce 8800 GTX
      – Introduced many features to facilitate GP programming
    • Some features
      – Introduced a unified shader pipeline that allowed every ALU to be used for GP programming
      – Complied with the IEEE requirements for single-precision floating point
      – Allowed arbitrary read and write access to memory, as well as access to a software-managed cache known as shared memory
      – Provided a free compiler for CUDA C, a language based on the widely used C language
    Selected Applications - 1
    • Medical
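    To make the "unified shader pipeline / every ALU usable for general-purpose work" point concrete, here is a minimal CUDA C kernel of the kind these slides lead up to; it is a generic illustration, not code taken from the slides:

        #include <cstdio>
        #include <cuda_runtime.h>

        // Each thread computes one element: the GPU's ALUs run ordinary C code
        // on arbitrary data, not just pixel colors.
        __global__ void saxpy(int n, float a, const float* x, float* y) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) y[i] = a * x[i] + y[i];
        }

        int main() {
            const int n = 1 << 20;
            float *x, *y;
            cudaMallocManaged(&x, n * sizeof(float));   // unified memory keeps the example short
            cudaMallocManaged(&y, n * sizeof(float));
            for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

            saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
            cudaDeviceSynchronize();

            printf("y[0] = %f\n", y[0]);                // expect 5.0
            cudaFree(x); cudaFree(y);
            return 0;
        }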
    [Show full text]
  • Time Complexity Parallel Local Binary Pattern Feature Extractor on a Graphical Processing Unit
    ICIC Express Letters, ICIC International © 2019, ISSN 1881-803X, Volume 13, Number 9, September 2019, pp. 867–874.
    θ(1) TIME COMPLEXITY PARALLEL LOCAL BINARY PATTERN FEATURE EXTRACTOR ON A GRAPHICAL PROCESSING UNIT
    Ashwath Rao Badanidiyoor and Gopalakrishna Kini Naravi, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India. {ashwath.rao; ng.kini} [email protected]. Received February 2019; accepted May 2019.
    Abstract. Local Binary Pattern (LBP) feature is used widely as a texture feature in object recognition, face recognition, real-time recognitions, etc. in an image. LBP feature extraction time is crucial in many real-time applications. To accelerate feature extraction time, in many previous works researchers have used a CPU-GPU combination. In this work, we provide a θ(1) time complexity implementation for determining the Local Binary Pattern (LBP) features of an image. This is possible by employing the full capability of a GPU. The implementation is tested on LISS medical images.
    Keywords: Local binary pattern, Medical image processing, Parallel algorithms, Graphical processing unit, CUDA
    1. Introduction. Local binary pattern is a visual descriptor proposed in 1990 by Wang and He [1, 2]. Local binary pattern provides a distribution of intensity around a center pixel. It is a non-parametric visual descriptor helpful in describing a distribution around a center value. Since digital images are distributions of intensity, it is helpful in describing an image. The pattern is widely used in texture analysis, object recognition, and image description. The local binary pattern is used widely in real-time description, and analysis of objects in images and videos, owing to its computational efficiency and computational simplicity.
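    The LBP descriptor the paper parallelizes assigns each pixel an 8-bit code by thresholding its 8 neighbours against the centre value. A minimal one-thread-per-pixel CUDA sketch of that per-pixel step (a generic illustration, not the authors' implementation) looks like this:

        #include <cuda_runtime.h>

        // One thread per interior pixel: compare the 8 neighbours with the centre
        // value and pack the results into an 8-bit LBP code.
        __global__ void lbp8(const unsigned char* img, unsigned char* out, int width, int height) {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

            // Neighbour offsets, clockwise from the top-left pixel.
            const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
            const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};

            unsigned char centre = img[y * width + x];
            unsigned char code = 0;
            for (int k = 0; k < 8; ++k) {
                unsigned char neighbour = img[(y + dy[k]) * width + (x + dx[k])];
                code |= (neighbour >= centre) << k;   // one bit per neighbour comparison
            }
            out[y * width + x] = code;
        }

    Each thread does a fixed amount of work (8 comparisons), which is where a constant, θ(1), per-pixel cost comes from once there are enough threads to cover the whole image.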
    [Show full text]
  • E6895 Advanced Big Data Analytics Lecture 7: GPU and CUDA
    E6895 Advanced Big Data Analytics — Lecture 7: GPU and CUDA
    Ching-Yung Lin, Ph.D., Adjunct Professor, Dept. of Electrical Engineering and Computer Science; IBM Chief Scientist, Graph Computing Research. © CY Lin, 2016, Columbia University.
    Reference book: CUDA: Compute Unified Device Architecture
    GPU
    • 2001: NVIDIA's GeForce 3 series made probably the biggest breakthrough in GPU technology — the computing industry's first chip to implement Microsoft's then-new DirectX 8.0 standard, which required that compliant hardware contain both programmable vertex and programmable pixel shading stages.
    • Early 2000s: The release of GPUs that possessed programmable pipelines attracted many researchers to the possibility of using graphics hardware for more than simply OpenGL- or DirectX-based rendering.
      – The GPUs of the early 2000s were designed to produce a color for every pixel on the screen using programmable arithmetic units known as pixel shaders.
      – The additional information could be input colors, texture coordinates, or other attributes.
    CUDA
    • 2006: GPU computing starts going for prime time — release of CUDA.
      – The CUDA architecture included a unified shader pipeline, allowing each and every arithmetic logic unit (ALU) on the chip to be marshaled by a program intending to perform general-purpose computations.
    [Show full text]
  • NVIDIA's Fermi: the First Complete GPU Computing Architecture
    NVIDIA's Fermi: The First Complete GPU Computing Architecture
    A white paper by Peter N. Glaskowsky, prepared under contract with NVIDIA Corporation. Copyright © September 2009, Peter N. Glaskowsky.
    Peter N. Glaskowsky is a consulting computer architect, technology analyst, and professional blogger in Silicon Valley. Glaskowsky was the principal system architect of chip startup Montalvo Systems. Earlier, he was Editor in Chief of the award-winning industry newsletter Microprocessor Report. Glaskowsky writes the Speeds and Feeds blog for the CNET Blog Network: http://www.speedsnfeeds.com/
    This document is licensed under the Creative Commons Attribution ShareAlike 3.0 License. In short: you are free to share and make derivative works of the file under the conditions that you appropriately attribute it, and that you distribute it only under a license identical to this one. http://creativecommons.org/licenses/by-sa/3.0/ Company and product names may be trademarks of the respective companies with which they are associated.
    Executive Summary — After 38 years of rapid progress, conventional microprocessor technology is beginning to see diminishing returns. The pace of improvement in clock speeds and architectural sophistication is slowing, and while single-threaded performance continues to improve, the focus has shifted to multicore designs. These too are reaching practical limits for personal computing; a quad-core CPU isn't worth twice the price of a dual-core, and chips with even higher core counts aren't likely to be a major driver of value in future PCs. CPUs will never go away, but GPUs are assuming a more prominent role in PC system architecture.
    [Show full text]
  • SOUNDVISION 3.5.1 Readme - V.1.0
    SOUNDVISION 3.5.1 readme - v.1.0 [EN]
    Soundvision 3.5.1 Readme
    Soundvision is the L-Acoustics 3D acoustical and mechanical modeling software. Soundvision 3.5.1 is available to download from www.l-acoustics.com from April 27, 2021.
    Computer requirements — host computer minimum configuration for Soundvision:
    — Operating systems: Windows 10, macOS High Sierra (OS X 10.13) to macOS Big Sur (11.2)
    — RAM: 1 GB minimum
    — Processor speed: 1.2 GHz minimum
    — Available hard-disk space: 100 MB minimum
    — Video card:
      — Intel HD, Iris graphics (Intel GMA and Intel Extreme Graphics are not supported). PC users equipped with an Intel HD Graphics 5500 graphics card (i3-5005U, i3-5015U, i3-5010U, i3-5020U, i5-5200U, i5-5300U, i7-5500U and i7-5600U processors): update the drivers to version 10.18.15.4279 (from Intel package version 15.40.7.4279) or higher. Previous versions of the drivers may give unexpected mapping results.
      — GeForce cards series 8 and above. The following models are not compatible: GeForce 256, GeForce 2 series, GeForce 3 series, GeForce 4 series, GeForce FX series, GeForce 6 series, GeForce 7 series.
      — ATI Radeon HD 2000 series and above.
    — Third-party software: Adobe® Reader®
    — (optional) dedicated USB port — to open .sv* files without a hardware key (Windows only), refer to the Soundvision Help
    Refer to the Soundvision optimization technical bulletin for more information on the optimization of the computer configuration and the troubleshooting procedures.
    Windows 10 is a registered trademark of Microsoft Corporation. Mac and macOS are trademarks of Apple Inc., registered in the U.S.
    [Show full text]
  • HP Z400 Workstation Overview
    QuickSpecs — HP Z400 Workstation Overview
    HP recommends Windows Vista® Business
    1. 3 External 5.25" Bays
    2. Power Button
    3. Front I/O: 2 USB 2.0, 1 IEEE 1394a (optional card required), Headphone, Microphone
    4. 3 External 5.25" Bays
    5. 4 DIMM Slots for DDR3 ECC Memory
    6. 2 Internal 3.5" Bays
    7. 475W, 85% efficient Power Supply
    8. Dual/Quad Core Intel 3500 Series Processors
    9. Rear I/O: 6 USB 2.0, PS/2 keyboard/mouse, 1 RJ-45 to Integrated Gigabit LAN, 1 Audio Line In, 1 Audio Line Out, 1 Microphone In
    10. 2 PCIe x16 Gen2 Slots
    11. 1 PCIe x4 Gen2, 1 PCIe x4 Gen1, 2 PCI Slots
    12. 4 Internal USB 2.0 ports
    Form Factor: Convertible Minitower
    Compatible Operating Systems:
    Genuine Windows Vista® Business 32-bit*
    Genuine Windows Vista® Business 64-bit*
    Genuine Windows Vista® Business 32-bit with downgrade to Windows® XP Professional 32-bit custom installed** (expected available until August 2009)
    Genuine Windows Vista® Business 64-bit with downgrade to Windows® XP Professional x64 custom installed** (expected available until August 2009)
    HP Linux Installer Kit for Linux (includes drivers for both 32-bit & 64-bit OS versions of Red Hat Enterprise Linux WS4 and WS5 - see: http://www.hp.com/workstations/software/linux)
    Novell Suse SLED 11 (expected availability May 2009)
    *Certain Windows Vista product features require advanced or additional hardware. See http://www.microsoft.com/windowsvista/getready/hardwarereqs.mspx and http://www.microsoft.com/windowsvista/getready/capable.mspx for details.
    [Show full text]
  • Oral Presentation
    Some Reflections on Advanced Geocomputations and the Data Deluge
    J. A. Rod Blais, Dept. of Geomatics Engineering, Pacific Institute for the Mathematical Sciences, University of Calgary, Calgary, AB — www.ucalgary.ca/~blais, [email protected]
    Outline
    • Introduction
    • Paradigms in Applied Scientific Research
    • Evolution of Computing and Applications
    • High Performance Computational Science (HPCS)
    • Data Intensive Scientific Computing (DISC)
    • Cloud Computing and Applications
    • Examples, Challenges and Future Trends
    • Concluding Remarks
    Paradigms in Applied Scientific Research
    • Traditionally, concept oriented and experimentally tested
    • Currently, evolving experimental and simulated data driven
    • Also, often multidisciplinary and multifaceted group projects
    • In other words, more systems orientation with web collaboration
    • Experimental / simulated data are becoming cheap and accessible
    • Distributed data and metadata are often practically unmovable
    • Sensor webs and computing clouds are changing the landscape
    • Over 90% of web users look at less than 10% of the pages
    Evolution of Advanced Computing
    • Megaflops to Petaflops, i.e. 10^6 to 10^15 floating point operations / sec.
    • Megabytes to Petabytes, i.e. 10^6 to 10^15 bytes (i.e. 8 binary bits)
    • Single Core CPUs to Multi- and Many-Core CPUs
    • Practical limitations:
      • TB HDs are common but I/O speeds imply max. ~100 TBs
      • CPUs have memory, // instructions and power limitations
      • Random-Access files have max. volume I/O limitations
    ► CPUs to GPUs with CUDA code development
    ► The strategic future:
    [Show full text]
  • Cuda by Example.Book.Pdf
    CUDA by Example: An Introduction to General-Purpose GPU Programming
    Jason Sanders, Edward Kandrot
    Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City
    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
    The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
    NVIDIA makes no warranty or representation that the techniques described herein are free from any Intellectual Property claims. The reader assumes all risk of any such claims based on his or her use of these techniques.
    The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact: U.S. Corporate and Government Sales, (800) 382-3419, corpsales@pearsontechgroup.com. For sales outside the United States, please contact: International Sales, international@pearson.com.
    [Show full text]
  • Accelerating Convolutional Neural Network by Exploiting Sparsity on Gpus Weizhi Xu, Shengyu Fan, Hui Yu and Xin Fu, Member, IEEE
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
    Accelerating convolutional neural network by exploiting sparsity on GPUs
    Weizhi Xu, Shengyu Fan, Hui Yu and Xin Fu, Member, IEEE
    Abstract — Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and additions for zero values in the feature map are useless for the convolution result. In addition, convolution layer and pooling layer are computed separately in traditional methods, which leads to frequent data transfer between CPU and GPU. Based on these observations, we propose two new methods to accelerate CNN on GPUs. The first method focuses on accelerating the convolution operation and reducing the calculation of zero values. The second method combines the operations of one convolution layer with the following pooling layer to effectively reduce traffic between CPU and GPU.
    […] series of algorithmic optimization techniques were developed, such as the im2col-based method [12], [13], the FFT-based method [14], and the Winograd-based method [15]. The above acceleration methods have been integrated into the cuDNN library, which is a state-of-the-art library for deep learning on GPU [16]. The convolution operation in CNN refers to the process in which the convolution kernel samples on the feature map. In the sampling process, the convolution kernel carries out a weighted summation operation on the sampling area, and the entire feature map is sampled according to the stride size. The process of convolution contains a lot of multiplications and
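    The observation that multiplications with zero-valued feature-map entries are wasted work can be illustrated with a naive direct-convolution CUDA kernel that simply skips them; this is a toy sketch of the idea, not the paper's actual method:

        #include <cuda_runtime.h>

        // Naive direct 2D convolution (single channel, no padding), one thread per
        // output element. Zero-valued inputs contribute nothing to the sum, so the
        // multiply-add is skipped for them - the sparsity the paper exploits.
        __global__ void conv2d_skip_zeros(const float* in, const float* kernel, float* out,
                                          int in_w, int k, int out_w, int out_h) {
            int ox = blockIdx.x * blockDim.x + threadIdx.x;
            int oy = blockIdx.y * blockDim.y + threadIdx.y;
            if (ox >= out_w || oy >= out_h) return;

            float sum = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    float v = in[(oy + ky) * in_w + (ox + kx)];
                    if (v != 0.0f)                    // skip useless work for zero activations
                        sum += v * kernel[ky * k + kx];
                }
            }
            out[oy * out_w + ox] = sum;
        }

    On a real GPU a per-element branch like this mostly saves energy rather than time, because threads in a warp execute in lockstep; the paper's contribution lies in restructuring the computation so that groups of zero-valued operations are avoided together, which this sketch does not attempt.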
    [Show full text]