
Accelerating Ray Tracing Using FPGAs

DIPLOMARBEIT

zur Erlangung des akademischen Grades

Diplom-Ingenieur

im Rahmen des Studiums

Technische Informatik

eingereicht von

Alexander Reznicek, BSc Matrikelnummer 1125076

an der Fakultät für Informatik der Technischen Universität Wien
Betreuung: Associate Prof. Dipl.-Ing. Dipl.-Ing. Dr.techn. Michael Wimmer
Mitwirkung: Dipl.-Ing. Hiroyuki Sakai

Wien, 15. Juni 2020 Alexander Reznicek Michael Wimmer

Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at

Accelerating Ray Tracing Using FPGAs

DIPLOMA THESIS

submitted in partial fulfillment of the requirements for the degree of

Diplom-Ingenieur

in

Computer Engineering

by

Alexander Reznicek, BSc Registration Number 1125076

to the Faculty of Informatics at the TU Wien
Advisor: Associate Prof. Dipl.-Ing. Dipl.-Ing. Dr.techn. Michael Wimmer
Assistance: Dipl.-Ing. Hiroyuki Sakai

Vienna, 15th June, 2020 Alexander Reznicek Michael Wimmer

Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at

Erklärung zur Verfassung der Arbeit

Alexander Reznicek, BSc Huttengasse 49/12 1160 Wien

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

Wien, 15. Juni 2020 Alexander Reznicek


Danksagung

Bei dieser Diplomarbeit gilt es einigen im Umfeld der Universität besonderen Dank auszusprechen, da jeder dieser Beiträge diese Arbeit erst ermöglicht hat. Zu allererst möchte ich mich beim ECS Institut im Allgemeinen bedanken, die mir einen FPGA für die ersten Schritte zur Verfügung gestellt haben und bei so manchen kniffligen Problemen gerade am Anfang unterstützt haben. Der TU Wien selbst, respektive dem Dekanat, gilt allerdings genauso mein Dank. Der Erhalt des Forschungsstipendiums hat es erst ermöglicht, den in der Arbeit erwähnten FPGA auf einer PCIe-Karte für den vorgesehenen Ansatz einzusetzen. Vom Institut für Visual Computing & Human-Centered Technology möchte ich Hiroyuki Sakai gleich in zweifacher Hinsicht danken. Zum einen, da er der Thematik mit großem Interesse begegnet ist und mich unterstützt hat in der Arbeit weiter zu gehen bzw. diese überhaupt zu starten. Zum anderen, weil er viel Unterstützung nicht nur beim Schreiben gegeben hat, sondern auch rundherum, bspw. kurzfristig einen Labor-PC zu organisieren nachdem mein Heim-PC überraschend kaputt ging. Ich möchte auch Michael Wimmer danken dafür, dass er die doch etwas außergewöhnliche Thematik ermöglicht hat, trotz der vielen möglichen Ungewissheiten die eine solche Arbeit mit sich bringt. Ich möchte an der Stelle beiden dafür danken, dass ich gerade am Ende schnelles Feedback zur Arbeit selbst sonntags erhalten habe, was ich beileibe nicht als selbstverständlich empfinde. Im Freundes- und Familienkreis möchte ich zum einen meinen Eltern für die Geduld danken, was in Anbetracht der Dauer dieser Arbeit durchaus als bemerkenswert zu sehen ist. Auch wenn mir noch einige Freunde einfallen denen mein Dank gilt, möchte ich hier namentlich vor allem Michael Agbontaen nennen, der unter Einsatz seines PCs einige zusätzliche Tests durchführte, damit ich einige CPU und GPU Benchmarks durch eine zweite “Meinung” verifizieren konnte. Zuletzt ist auch das in der Arbeit verwendete Stromverbrauchsmessgerät eine Leihgabe von ihm und die Idee die Messung derart durchzuführen gemeinsam mit ihm entstanden.


Acknowledgements

For this thesis, I have to thank many people around the university, as each of their contributions made this thesis possible in the first place. First, I want to thank the ECS institute, which provided me with an FPGA for the first steps and supported me with some tricky problems, especially in the initial phase. I also want to thank the TU Wien, respectively the deanery, for awarding me the “Forschungsstipendium”, which made this thesis possible in the first place by allowing me to use the mentioned FPGA on a PCIe card for the intended approach. From the Institute of Visual Computing & Human-Centered Technology, I want to thank Hiroyuki Sakai on two counts. First, because he approached the topic with great interest and encouraged me to pursue the thesis and to start it at all. Secondly, because he not only supported me greatly during the writing, but also beyond it, e.g., by quickly organizing a lab PC after my own PC unexpectedly broke down. I also want to thank Michael Wimmer, who made this somewhat extraordinary topic possible despite the various uncertainties such a topic entails. At this point, I want to thank both of them for the fast feedback on the thesis itself that I received, especially towards the end and even on Sundays, which I by no means take for granted. Among family and friends, I want to thank my parents for their patience, which, considering the duration of this thesis, is quite remarkable. Even though there are more friends I ought to thank, I especially want to name Michael Agbontaen, who ran some additional tests on his own PC so that I could verify some CPU and GPU benchmarks with a second “opinion”. Lastly, the power meter used in this thesis was on loan from him, and the idea of performing the measurement in this way was developed together with him.


Kurzfassung

Die Synthese eines Bildes aus einer im Computer gespeicherten Szene ist das sogenannte Rendering, das beispielsweise mit einigen Vertretern der Klasse der Raytracing-Algorithmen fotorealistische Ergebnisse liefern kann. Diese Varianten (beispielsweise das Path Tracing) zeichnen sich allerdings durch eine stochastische Charakteristik aus, welche in einem hohen Rechenaufwand resultieren. Dies liegt in der Natur stochastischer Algorithmen, die durch eine hohe Anzahl an Stichproben ein Ergebnis berechnen – im Falle des Ray Tracing durch eine hohe Anzahl an Strahlen, die zur vollständigen Bildsynthese nötig sind. Eine Möglichkeit um das Ray Tracing, sowohl in den stochastischen als auch in den simpleren Formen, zu beschleunigen ist der Einsatz von spezialisierter Hardware. FPGRay ist ein solcher Ansatz, der dabei die Verwendung von spezialisierter Hardware mit der Software auf einem handelsüblichen PC kombiniert um eine Hybridlösung zu bilden. Dadurch soll die höhere Effizienz spezialisierter Hardware genutzt werden und zeitgleich eine Zukunftsfähigkeit im Falle sich ändernder Algorithmen erreicht werden. Die Ergebnisse deuten darauf hin, dass eine solche Effizienzverbesserung möglich ist. Allerdings war dies im Rahmen der Arbeit nicht realisierbar und die konkrete Implementation zeigte eine niedrigere Effizienz als reine Softwarelösungen. Die Möglichkeit der Erreichung einer höheren Effizienz durch diesen Ansatz konnte allerdings durch das Aufzeigen von FPGRays Potential sichtbar gemacht werden.


Abstract

The synthesis of an image from a scene stored on a computer is called rendering, which can deliver photo-realistic results, e.g., by using specific variants of the class of ray tracing algorithms. However, these variants (e.g., path tracing) possess a stochastic characteristic which results in a high computational expense. This is explained by the nature of stochastic algorithms, which use a high number of samples to compute a result—in the case of ray tracing, these samples manifest in a high number of rays needed for a complete rendering. One possibility to accelerate ray tracing—whether in its stochastic or its simpler variants—is the use of customized hardware. FPGRay is such an approach; it combines customized hardware with software on an off-the-shelf PC to form a hybrid solution. This exploits the higher efficiency of specialized hardware and, at the same time, remains sustainable in case of changing algorithms. The results point towards a possible efficiency gain. Unfortunately, this was not realizable in the scope of this thesis, and the specific implementation showed a lower efficiency than the software implementation. Nevertheless, by indicating FPGRay's potential, the possibility of achieving a higher efficiency with this approach could be shown.


Contents

Kurzfassung

Abstract

Contents

1 Introduction

2 Background
2.1 Ray Tracing Explained
2.2 Intersection Acceleration With k-d Trees
2.3 FPGAs

3 Related Works

4 FPGRay
4.1 FPGRayIF
4.2 kdintersect
4.3 kdscheduler
4.4 Parameterization Recommendations
4.5 PCIe Driver
4.6 API
4.7 Implementation Details

5 Results
5.1 Test Scenes and Designs
5.2 Synthetic Tests
5.3 Special Tests
5.4 Efficiency of FPGRay
5.5 Analyzing the Results

6 Future Work
6.1 Improvements
6.2 Extensions

7 Conclusion

List of Figures

List of Tables

List of Algorithms

Acronyms

Bibliography

CHAPTER 1 Introduction

Rendering is the task of synthesizing an image from a model. Concretely, it refers to producing an image based on a three-dimensional scene which is stored on a computer in order to visualize the scene from a certain position and perspective (described by the so-called camera). This task is essential for many applications. Among the most famous examples are computer games and computer-generated imagery (CGI) in movies. Another application is the prototyping and development of a variety of products ranging from cars to mobile phones in order to make the development process much cheaper. In medicine, rendered visualizations can help medical practitioners to detect diseases or even prevent them. Architects can render visualizations of houses and their interiors. The advent of virtual reality (VR) makes visualizations even more attractive for future applications, e.g., visualizing a new apartment with the desired furniture. Some of these applications aim for highly accurate results, some require fast results, and some even require both. Nowadays, fast results can be generated using rasterization. Techniques that are based on ray tracing, on the other hand, can produce highly realistic results but are generally much slower. Ray tracing is a class of algorithms which rely on the propagation of light rays through the scene. Physically correct rendering is one of the reasons to use ray tracing instead of rasterization. In contrast to a rasterizer, a ray tracer enables physically based rendering, as the propagation of light rays is an abstraction of light transport in reality. Because of this, no special cases need to be implemented to accurately simulate lighting. Ultimately, this results in less complex code for a ray tracer compared to a rasterizer. For example, for mirrored surfaces, a rasterizer needs to render the mirrored objects into a map which is then applied to the surface of the mirror to generate the final image. In contrast, a ray tracer simply reflects a ray if it hits a mirror. As the evaluation of the material at a hit point, including possibly spawning a new ray, is always done, no additional code is needed; the effect is taken care of implicitly.


Algorithms based on ray tracing, such as path tracing, use stochastic methods to generate photo-realistic images. This is not possible with all ray-tracing algorithms, as certain effects are not considered in the basic algorithms of this class. Because of its photo-realism, path tracing is nowadays widely used for creating realistic CGI in movies and for many visualization tasks. However, the stochastic approach of a path tracer comes with a significant drawback: a high number of rays is needed to render an image, leading to a high computational expense. One crucial part of the rendering process is intersection: the task of finding the nearest object a ray hits along its way. The naive way of finding the first object that the ray hits is to test all shapes in a scene for intersection with the ray by evaluating their respective intersection functions. The complexity is obviously linear in the number of objects. For scenes used in practice with millions of triangles, this is very slow. An option to make this more efficient is the use of space partitioning: instead of testing every object, only a part of the scene is tested. Only if the ray intersects this part of the scene are the contained objects evaluated further. Recursively building partitions leads to a logarithmic complexity. Intersection acceleration makes intersections much faster, but they remain computationally expensive: when we compared the computation times of different operations for multiple scenes, intersections amounted to about 50% of the time.
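To make the cost of the naive approach concrete, the following minimal C++ sketch (with placeholder Ray, Hit, and Shape types, not pbrt's actual classes) tests every shape and keeps the closest hit, so the work grows linearly with the number of objects; space partitioning replaces exactly this loop with a tree traversal:

```cpp
#include <optional>
#include <vector>

struct Ray { /* origin, direction, tMax, ... */ };
struct Hit { float t; /* hit point, normal, primitive id, ... */ };

struct Shape {
    // Returns the ray parameter t of the nearest intersection, if any.
    virtual std::optional<float> Intersect(const Ray& ray) const = 0;
    virtual ~Shape() = default;
};

// Naive O(n) intersection: every shape is tested, the closest hit wins.
// With a k-d tree (Section 2.2), only a small part of the scene is tested.
std::optional<Hit> IntersectScene(const std::vector<const Shape*>& scene,
                                  const Ray& ray) {
    std::optional<Hit> nearest;
    for (const Shape* shape : scene) {
        if (auto t = shape->Intersect(ray)) {
            if (!nearest || *t < nearest->t)
                nearest = Hit{*t};
        }
    }
    return nearest;
}
```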

Challenges. One way to accelerate the rendering process is the use of specialized hardware. Such systems consist of a custom-built design on a chip or prototyping platform that performs the ray-tracing computations. The hardware receives the scene data and renders the desired frame on its own. The key advantages are the increased energy efficiency and the saving of logic resources (i.e., transistors on the chip) that are characteristic of tailored hardware designs. Furthermore, the throughput is often increased in comparison to a software-based approach. However, all current hardware designs limit the rendering system to the rendering techniques that are pre-built into them. As ray tracing is a topic of ongoing research, new techniques such as more accurate material models, filtering algorithms for the final image, or noise-reduction approaches are still under development. Traditionally, a new technique requires a new hardware design. Current hardware designs thus become obsolete quickly if complex algorithms are superseded by newer alternatives.

Contributions. FPGRay is our approach to accelerating intersection using a customized hardware design. In contrast to prior work, it was designed from the beginning to perform only parts of the rendering task: FPGRay accompanies a software renderer running on a Central Processing Unit (CPU) and accelerates specific operations only. The idea is to support only basic and frequently used operations to improve the efficiency of the rendering process. This also allows using fewer hardware resources by avoiding the implementation of rarely used operations. As the rendering still relies on a software renderer, such operations can be computed in software in case their frequency of use does not justify a more efficient hardware implementation. Moreover, if a new technique should be introduced to further optimize the rendering or to achieve a more accurate result, it can be implemented in software while the renderer can still make use of the hardware design.

Our approach is similar to Graphics Processing Units (GPUs), which accelerate rasterization. As a consequence, we also need an interface for data transfer similar to that of GPUs in terms of latency, throughput, and accessibility for interaction with a host Personal Computer (PC). Therefore, FPGRay is built on a Peripheral Component Interconnect Express (PCIe) card. The FPGRay card communicates via PCIe with an off-the-shelf PC, which can use the intersection function through an Application Programming Interface (API). The API can be used to easily add FPGRay's functionality to existing renderers. As an example, pbrt-v3 was extended with our functionality. pbrt-v3 also forms the foundation of FPGRay's algorithms.

Before we describe our approach in the following sections, we provide some background information on relevant aspects, in particular, about ray tracing in general, k-d trees, Field Programmable Gate Arrays (FPGAs), and hardware design in general. Afterwards, related works are briefly discussed. In the main section, the details about the implementation are presented. Finally, results, conclusions, and possible avenues for future work are described.


CHAPTER 2 Background

2.1 Ray Tracing Explained

Ray tracing is the task of tracing a ray through the scene while testing for intersections along its way and possibly spawning new rays at the hit points of intersected objects. A traversal with multiple consecutive rays is called a path. The depth of a path is the number of rays that constitute the path, and the number of bounces indicates the number of objects that were hit during traversal. Simple ray tracing algorithms use paths with a depth of only one. More complex algorithms such as recursive ray tracing [FW80] use paths with higher depth only for a few materials like glass. This omits some physical effects and leads to a lack of realism.

Even more complex algorithms based on path tracing facilitate physically based renderings that cannot be distinguished from real photos. These algorithms are stochastic algorithms, so they are highly expensive in terms of computation. The stochastic approach is used in path tracing for numerically computing the rendering equation.

The Rendering Equation. Kajiya [Kaj86] developed a solid basis for physically based rendering—the so-called rendering equation. This equation generalizes the foundations of many rendering algorithms. As Kajiya stated, the idea of the rendering equation itself is not new. But other forms invented before, such as the radiosity equation or recursive ray tracing, use specialized formulations which do not consider the similarities between them. In the following, we do not focus on the equation as described in the original paper, with rays going from one point to another. Instead, we focus on the formula commonly used for path tracing, described in terms of ray directions, as it directly shows the behavior of the lighting simulation at objects.

The equation is given by


Lo(x, ωo) = Le(x, ωo) + ∫Ω fr(x, ωi, ωo) Li(x, ωi) (ωi · n) dωi,

where

• Lo(x, ωo) is the radiance¹ from position x² in direction ωo,

• Le(x, ωo) is the emitted radiance at x in direction ωo,

• Li(x, ωi) is the radiance coming towards x from direction ωi,

• fr(x, ωi, ωo) is the bidirectional scattering distribution function (BSDF) at the point x,

• (ωi · n) is the attenuation based on the incident angle of light; because unit vectors are used, this can be written as cos(θi), where θi is the angle between the two directions. Finally,

• ∫Ω … dωi denotes the integration over the unit hemisphere Ω.

To properly calculate the light at any surface point, this formula needs to be evaluated, which means integrating over the complete hemisphere Ω (see Figure 2.1). However, this is not possible analytically due to the mutual dependencies of surface points. See Figure 2.2 for an example: to compute the radiance coming from the blue hit point, the incoming radiance from the red hit point to the blue hit point must be known. But to compute the radiance coming from the red hit point, the incoming radiance from the blue hit point needs to be known in turn, as the blue hit point can deliver incoming radiance to the red one.

Bidirectional Scattering Distribution Function. The BSDF describes the scattering characteristics of a material. It is a function that yields the light energy leaving in an outgoing direction for a given incoming light direction at a point x. In practice, a distinct BSDF is used for each surface material an object can have. In renderers, the outgoing direction is found by sampling a probability density function which approximates the BSDF. For example, diffuse surfaces distribute light equally over the complete hemisphere, so the function chooses one of these directions. In contrast, functions for mirrored surfaces yield only one outgoing direction for one incoming direction.
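As an illustration, the following small C++ sketch shows cosine-weighted hemisphere sampling, a probability density commonly used for diffuse BSDFs (the Vec3 type and the local frame convention with the normal along z are assumptions of this sketch, not pbrt's interface); a mirror BSDF would instead deterministically return the single reflected direction:

```cpp
#include <algorithm>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

// Samples a direction on the hemisphere around the surface normal, expressed
// in a local frame where the normal points along z. The pdf of the returned
// direction is cos(theta)/pi, matching the cosine term of the rendering
// equation, which makes it a good sampling density for diffuse surfaces.
Vec3 SampleCosineHemisphere(std::mt19937& rng) {
    const float kPi = 3.14159265358979f;
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    float u1 = uni(rng), u2 = uni(rng);
    float r = std::sqrt(u1);                         // radius on the unit disk
    float phi = 2.0f * kPi * u2;                     // angle on the unit disk
    float z = std::sqrt(std::max(0.0f, 1.0f - u1));  // lift onto the hemisphere
    return Vec3{r * std::cos(phi), r * std::sin(phi), z};
}
```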

¹ The radiance is the energy in watts passing through a unit area m² per unit solid angle sr (steradian): W/(m² · sr).
² For path tracers, x is a position on the surface of an object which is hit.



Figure 2.1: The hemisphere over which the rendering equation needs to be integrated. The output vector points towards the camera from the hit point where the incoming radiances from all directions irradiate (depicted by the incoming vectors).

Figure 2.2: The two green objects' surfaces are diffuse. If the integral at the blue hit point should be computed, one direction of the hemisphere comes from the red hit point, whose value is needed. The integral at this hit point in turn requires the blue hit point's integral. The integrals are mutually dependent.

Monte Carlo Rendering. The rendering equation cannot be solved analytically. A widely used approach to solve integrals such as the rendering equation is a numerical integration algorithm known as the Monte Carlo method, which is also used by multiple ray tracing algorithms. When the rendering equation is solved by the Monte Carlo method, a high number of rays is shot from the hit point in arbitrary directions. This delivers enough information to compute the integral of the rendering equation with sufficient accuracy. Kajiya [Kaj86] came up with an approach different from the idea of shooting a high number of rays from a hit point in arbitrary directions.


Instead of branching and generating a tree for each camera ray by spawning multiple rays at each bounce, only one direction is evaluated. This is referred to as a path (or a tree with branching factor 1). By shooting multiple paths from the same origin, considering the probabilities of the BSDFs, each path evaluates a different way through the scene. This approach lowers the number of rays that are shot at later bounces and, in contrast to the straightforward Monte Carlo method, even shoots more camera rays (rays before the first bounce) than rays at any later bounce. This is owed to the fact that every path starts with a camera ray but can terminate at the first bounce. The approach of generating only up to one ray per bounce exploits the fact that the first ray contributes most to the reduction of the variance of the pixel's integral.
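Written out, the one-sample Monte Carlo estimator that such a path tracer evaluates at every bounce has the following standard form (p denotes the probability density used to sample the direction ωi):

Lo(x, ωo) ≈ Le(x, ωo) + fr(x, ωi, ωo) Li(x, ωi) (ωi · n) / p(ωi),   with ωi sampled from p.

Averaging N such path samples per pixel yields the estimate of the pixel's integral; since the variance of this average decreases with 1/N, the noise fades only slowly as more samples are added.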

Paths can be considered as samples in the numerical integration process. If the number of samples per pixel (spp) is high enough, the integrals are accurate enough that the image is rendered satisfactorily. Such an image is called converged. If the spp count is too low, objectionable noise manifests in the rendering and reduces the quality of the result. Mathematically, this is the result of using too few samples for computing the rendering equation. The required number of spp is highly dependent on the scene, and in practice, rendering a single frame can take up to days. Increasing the efficiency of this computationally expensive algorithm is an extensive research topic, which is not constrained to improved sampling algorithms such as bidirectional path tracing (BDPT) and Metropolis light transport (MLT): besides such improved sampling methods, there is also the group of filtering methods that make noise less prominent. Other works focus on the use of specialized hardware to gain greater efficiency. Increasing the efficiency is still subject to much ongoing research.

Path Tracing. For basic path tracing (without any improvements), the traversal of one sample can be described as follows (see Figure 2.3 depicting the elements of the explanation below).

A path starts at the camera with a ray shot into the scene. The direction of the ray from the camera is determined by the position of the corresponding pixel on the image plane, deviated by a slight offset. This unique offset for every sample allows evaluating the whole pixel area. At the end, the samples are averaged to calculate the resulting pixel value. For the shot ray, an intersection test is performed, i.e., every object in the scene is tested for intersection with the ray, and the first intersected object along with its hit point is returned. For the hit object, the BSDF of the surface material has to be evaluated. The evaluation may return a different direction or output point than the incoming ray, so another ray is shot from the evaluated position and direction against the scene. This ray is intersected and evaluated exactly like the ray before (increasing the depth of the path). This continues until a light source is hit or a stopping criterion is met, e.g., the maximum allowed depth is reached.
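The per-sample traversal just described can be summarized in the following C++ sketch; the types and methods (Spectrum, Scene, Intersection, Sampler, Sample_f, AbsDot) are simplified placeholders in the spirit of pbrt, not its actual interface:

```cpp
// Assumed placeholder interfaces (sketch only):
//   Spectrum:      RGB value with +, *, IsBlack()
//   Scene:         Intersect(ray, &isect) -> bool (nearest hit along the ray)
//   Intersection:  hit point p, normal n, Le(wo), pointer to the surface BSDF
//   BSDF:          Sample_f(wo, &wi, &pdf, sampler) -> Spectrum
Spectrum TracePath(const Scene& scene, Ray ray, int maxDepth, Sampler& sampler) {
    Spectrum L(0.f);      // radiance carried back to the camera
    Spectrum beta(1.f);   // product of BSDF weights along the path
    for (int depth = 0; depth < maxDepth; ++depth) {
        Intersection isect;
        if (!scene.Intersect(ray, &isect))   // nearest object hit by the ray
            break;                           // the ray left the scene
        L += beta * isect.Le(-ray.d);        // light emitted at the hit point
        // Evaluate the surface material: sample the BSDF for a new direction.
        Vector3 wi;
        float pdf;
        Spectrum f = isect.bsdf->Sample_f(-ray.d, &wi, &pdf, sampler);
        if (pdf == 0.f || f.IsBlack())
            break;                           // stopping criterion met
        beta *= f * AbsDot(wi, isect.n) / pdf;
        ray = Ray(isect.p, wi);              // shoot the next ray of the path
    }
    return L;
}
```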



Figure 2.3: The basic parts of a path tracer. Three samples are shown. The first path (green) has one bounce at the mirror surface and then hits the light source. The second path (black) goes through a glass object (first and second bounce) and changes its direction towards the light at the last bounce (diffuse object). The third path (orange) does not contribute to the image as no light source is reached.

2.2 Intersection Acceleration With k-d Trees

Multiple algorithms for space partitioning exist. For ray tracing algorithms in general, the k-d tree and the Bounding Volume Hierarchy (BVH) are the most widely used. A BVH performs the same on average, but the heavily used traversal routine is less complex for the k-d tree. We therefore chose the k-d tree for FPGRay (see Figure 2.4). In the following, we describe pbrt's [PJH16] k-d tree implementation, as FPGRay is based on it.


Figure 2.4: A k-d tree splits the scene recursively, ultimately enabling an efficient intersection of a ray with the scene. In each iteration, a particular axis is chosen and the scene is subdivided by a plane determined by the split position. Because of the partitioning, the tree is traversed in such a way that the nearest primitives along the ray’s direction are intersected first.

The k-d Tree. Like any tree in computer science, the k-d tree contains inner nodes and leaf nodes. The first node is the root node. All inner nodes are descendants of the root.


Nodes without any children are called leaf nodes. In rendering, the k-d tree contains the whole scene, which is recursively partitioned. The root node represents the whole scene, which is split into parts represented by child nodes. The splitting is performed recursively until the space in a child node is partitioned in such a way that the maximum number of allowed primitives is not exceeded. If the maximum specified depth of the tree is reached, a leaf is generated instead of further subdividing the tree. In a k-d tree, only one split per node is done, which means that the tree is a binary tree. A split is done along one of the coordinate axes at a certain position. Depending on the implementation, a leaf may contain no objects at all, which allows building a more efficiently partitioned tree. Empty leaf nodes are also used in pbrt [PJH16].

Construction. There are multiple ways to represent a scene with a k-d tree, and the optimal representation cannot be found efficiently. Therefore, several heuristics have been developed for this purpose. The most famous one is the Surface Area Heuristic (SAH). It delivers a good approximation of the optimal solution. The heuristic makes use of the bounding boxes of the primitives and of the inner nodes containing them to compute the surface area of the complete inner node's bounding box. The construction of each node's children is done by splitting all primitives in it into multiple bisections as candidates for possible split positions. After evaluating a cost function for each of the bounding boxes, the SAH uses the candidate with the lowest cost as child nodes. As this is done independently for every node, an optimal segmentation (i.e., over multiple nodes) may not be found (this makes the SAH a greedy algorithm). The costs are computed for the cases of splitting up a node at different chosen positions along an axis or of not splitting it up any further. The cost of not splitting a node any further is

Σ_{i=1}^{N} tisect(i),

where tisect(i) is the cost of the intersection of primitive i and N is the number of primitives that need to be intersected in that node. The cost to split at a certain position is defined as

ttrav + (1 − be) (pA Σ_{i=1}^{NA} tisect(ai) + pB Σ_{i=1}^{NB} tisect(bi)).

Here, ttrav denotes the cost of a traversal, tisect(ai) and tisect(bi) are the costs of the intersection of primitive ai and bi in the corresponding child, pA and pB denote the probabilities that a traversal ends up in child A or B, NA and NB are the numbers of primitives of the corresponding children, and be is a factor accounting for children with no primitives in them.

The costs of the constants tisect(i) and ttrav depend on the complexity of the respective operation. Furthermore, pbrt does not use different costs for intersections but rather fixes them to the constant tisect, using precisely tisect = 80 and ttrav = 1. This is possible because only the relative difference is needed, so one constant can always be set to 1. The parameter values are optimized for pbrt's software implementation, especially for its high number of method calls. As the authors of pbrt state, most implementations use values that are closer together because of lower call depths.

Our implementation of the intersection algorithm in FPGRay would also need tisect and ttrav values that are much closer to each other for optimal use of the heuristic.


Scene trees tested with FPGRay are nevertheless generated with the same parameters if not stated otherwise. This also applies to the decision of using a constant tisect cost. This facilitates a meaningful comparison between the hardware and software implementations. The value be is 0, or 0.5 if at least one child has no primitives in it. The probabilities pA and pB can be calculated directly by computing the surface areas of the two children and dividing by the surface area of the complete node [PJH16]. Based on the costs, the SAH can determine the split position by choosing, among a certain number of candidate positions, the split with the lowest cost. It is sufficient to use only positions at edges of bounding boxes (see Figure 2.5) as candidates for splits. This is due to the fact that the algorithm should bisect primitives, for which it does not matter whether the bisection is done directly at an edge or in between two edges. In cases where the algorithm is allowed to use empty leaf nodes, this can even increase the efficiency, as the space between primitives is maximized. So the algorithm splits the node at each edge along an axis into two partitions, which gives a finite number of possible splits. The split with the smallest cost is used. The smallest cost of the heuristic results in a good bisection, even for later iterations. Note that the axis along which the candidates are computed is chosen to be the one with the maximum extent. It is, however, possible for the k-d tree algorithm to check the other axes afterwards if the cost function yields values that are close to the values that would have resulted without splitting. The generation of child nodes is done in the same way recursively until the maximum depth or the maximum number of primitives per leaf is reached. This results in leaf nodes which contain only up to the maximum number of allowed primitives. As mentioned above, leaf nodes with no primitives in them at all are also possible, depending on the concrete implementation.
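A minimal C++ sketch of the cost comparison the SAH performs per split candidate, using the constant intersection cost and the pbrt parameter values given above (the surface areas of the child regions are assumed to be computed elsewhere):

```cpp
// Relative costs as used by pbrt-v3.
constexpr float kIsectCost  = 80.0f;  // t_isect
constexpr float kTravCost   = 1.0f;   // t_trav
constexpr float kEmptyBonus = 0.5f;   // b_e if one child is empty, else 0

// Cost of turning the node into a leaf containing n primitives.
float LeafCost(int n) { return n * kIsectCost; }

// Cost of splitting: the surface areas of the two child regions, divided by
// the surface area of the whole node, give the probabilities pA and pB that
// a traversal ends up in child A or B.
float SplitCost(float areaA, float areaB, float areaNode, int nA, int nB) {
    float pA = areaA / areaNode;
    float pB = areaB / areaNode;
    float bonus = (nA == 0 || nB == 0) ? kEmptyBonus : 0.0f;
    return kTravCost +
           (1.0f - bonus) * (pA * nA * kIsectCost + pB * nB * kIsectCost);
}
```

Roughly speaking, a split candidate is only kept if its SplitCost beats LeafCost for the node; otherwise a leaf is created.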


Figure 2.5: The SAH considers edges of the bounding boxes that are along the axis with the maximum extent. Since the computation of the cost function with a split along one of those edges results in the minimum cost achievable, intermediate positions do not need to be considered. With this approach, a finite number of split candidates can be easily and efficiently generated.


Traversal. Figure 2.6 illustrates the traversal algorithm. Note that the ray finished? decision checks if the stack is empty or if the next node on the stack has a minimum extent larger than the current rtmax. If one of those conditions applies, the intersection operation is finalized.

[Figure 2.6 is a flowchart of the traversal with the following elements: intersect scene's bounding box; traverse root node; traverse inner node; store second child on stack; intersect leaf node's primitives; fetch node from stack; end. The flow is controlled by the decisions intersection?, child intersects?, both children?, nearer child leaf?, and ray finished?]

Figure 2.6: The intersection of a ray against a k-d tree.
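The loop of Figure 2.6 can be written compactly in C++ as follows; this is a simplified sketch with placeholder node, ray, and helper types, closely following pbrt's traversal rather than FPGRay's hardware version:

```cpp
#include <vector>

struct Ray { float o[3], d[3]; float tMax; };
struct Hit { float t; int primId; };

// Simplified k-d tree node (placeholder layout, not FPGRay's memory format).
struct KdNode {
    bool leaf;
    int axis;                  // split axis of an inner node
    float split;               // split position of an inner node
    const KdNode* child[2];    // children of an inner node
    std::vector<int> prims;    // primitive indices of a leaf node
};

// Assumed helpers: intersect primitive `id` (updates hit and ray.tMax on a
// nearer hit) and clip the ray against the scene's bounding box.
bool IntersectPrimitive(int id, Ray& ray, Hit* hit);
bool IntersectSceneBounds(const Ray& ray, float* tMin, float* tMax);

bool IntersectKdTree(const KdNode* root, Ray ray, Hit* hit) {
    float tMin, tMax;
    if (!IntersectSceneBounds(ray, &tMin, &tMax))     // scene's bounding box
        return false;
    struct Todo { const KdNode* node; float tMin, tMax; };
    std::vector<Todo> stack;                          // deferred far children
    const KdNode* node = root;
    bool found = false;
    while (node) {
        if (ray.tMax < tMin) break;    // "ray finished": nearer hit already known
        if (!node->leaf) {
            int a = node->axis;
            float tSplit = (node->split - ray.o[a]) / ray.d[a];
            bool leftFirst = ray.o[a] < node->split ||
                             (ray.o[a] == node->split && ray.d[a] <= 0);
            const KdNode* nearChild = node->child[leftFirst ? 0 : 1];
            const KdNode* farChild  = node->child[leftFirst ? 1 : 0];
            if (tSplit > tMax || tSplit <= 0) {
                node = nearChild;                     // only one child intersects
            } else if (tSplit < tMin) {
                node = farChild;
            } else {                                  // both children intersect
                stack.push_back({farChild, tSplit, tMax}); // store second child
                node = nearChild;
                tMax = tSplit;
            }
        } else {
            for (int id : node->prims)                // intersect leaf primitives
                found |= IntersectPrimitive(id, ray, hit);
            if (stack.empty()) break;                 // ray finished
            node = stack.back().node;                 // fetch node from stack
            tMin = stack.back().tMin;
            tMax = stack.back().tMax;
            stack.pop_back();
        }
    }
    return found;
}
```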

2.3 FPGAs

For this thesis, hardware acceleration is used to establish a proof of concept that demonstrates a way to speed up the rendering process. The difference between software, which runs on hardware, and hardware alone is exactly the absence of software.


A CPU needs a program to properly compute the data, while a dedicated hardware block only needs the data, because the algorithm is implemented in the silicon.

Example: Dot Product. To compute a dot product in software, a program is needed which consists of multiplications and additions that are executed by a CPU by transferring the data to its arithmetic logic. But for every multiplication and addition, the instruction has to be loaded and fetched, and the data needs to be loaded from RAM before it is executed. The operations also cannot be parallelized well, so before the next dot product is performed, the current one has to be finished first. Hardware, on the other hand, can pipeline the complete operation and has exactly the logic needed to provide a proper throughput: one data block after the next can be handed over as input, and the data is directly computed and routed from logic unit to logic unit until the result is available. The significant efficiency advantage of hardware implementations is already exploited for high-bandwidth applications such as video decoding.

ASICs vs. FPGAs. General-purpose hardware such as CPUs or dedicated accelerating hardware such as GPUs are so-called Application-Specific Integrated Circuits (ASICs). These are produced chips that consist of the designed logic built in silicon. The production of such an ASIC is costly and requires much time, which only pays off for mass production. This makes the use of ASICs infeasible for this thesis. Instead, we focus on using an FPGA. An FPGA is a chip which can be programmed by the user to build up any desired hardware. In contrast to a program on a CPU, the loaded design consists of logic cells that operate as the design instead of computing a program with general-purpose logic. In practice, this is not done by a lithographic process on an empty silicon wafer as for ASICs. Instead, the FPGA itself is an ASIC with many basic logic cells. The cells can be connected to the input pins or to other cells via a complex routing scheme. This is also the reason why an FPGA is slower than an ASIC of the same manufacturing process running the same design. The overhead of this routing and the use of basic logic cells make FPGAs around a factor of 2-4 slower compared to ASICs. Furthermore, they also suffer from a 9-12 times higher power consumption and require a 20-40 times larger area [Ian06]. A performance increase by changing from an FPGA to an ASIC is therefore always possible. Furthermore, modern FPGAs contain additional logic elements, such as multipliers, which deliver performance similar to that of an ASIC for the respective operation.

Source Code. The design for an FPGA is written in a hardware description language that looks similar to a programming language. Two languages are commonly used: Verilog and the Very High Speed Integrated Circuit Hardware Description Language (VHDL); the latter is also used for this thesis. Because the thesis' design primarily uses VHDL, further explanations about the language constructs focus on VHDL. It should be noted that Verilog and VHDL can be used together in a design.


This is also done in FPGRay by using Intellectual Properties (IPs) from Intel, which internally make use of Verilog. An IP is a design made by others that can be used in one's own hardware design. This is similar to a library in software engineering. Examples are trivial buffers such as a First In, First Out buffer (FIFO), but also complex IPs such as complete CPUs. For FPGAs, basic IPs from the vendor should be used, as specialized logic elements, such as on-chip memory or multiplier units, can be accessed with them. This is not guaranteed when using plain code. Another speciality of FPGAs is the so-called HardIP. A HardIP is a logic element that is not built from basic logic elements, but an element providing a complex functionality, such as a complete Random Access Memory (RAM) controller. Similarly to the compilation of a program, a hardware design is synthesized from VHDL or Verilog code. For writing and synthesizing FPGA designs, a so-called Electronic Design Automation (EDA) tool from the FPGA's manufacturer is used, which can be seen as the equivalent of an Integrated Development Environment (IDE) in software design. In case of the Arria V used in this thesis, the EDA is Intel's Quartus Prime [qua20]. It also delivers the IPs needed for this thesis.

Qsys. This thesis also makes use of Qsys, the system integration tool shipped with Quartus. It is used to automate the integration by generating interconnect functionality between hardware components. With this tool, a complete computer system can be built on the FPGA. The Qsys system, itself built from hardware components, constitutes a hardware component that can be integrated in VHDL or Verilog code in the same manner as an IP or any hardware component written in plain source code. The advantage is that the interconnection between components inside a Qsys system is done transparently. This means that if the hardware components use standardized hardware interfaces—well-defined combinations of signalling conduits facilitating data transfer—the system adds translation or synchronization logic (called adapters by Qsys) between them if needed. The standardized interface between the components that is used in this thesis is the so-called Avalon interface from Intel. It has two variants, one for memory-mapped and one for streaming interfaces. In the memory-mapped variant, an interface designated as master can read and write data to and from a slave interface at a specific address, which can be used, e.g., for RAM access. The streaming interface variant does not use addresses and directly forwards data from the source to the sink interface whenever data is ready. Both interface variants allow many parameters which change the behavior of the interfaces, e.g., whether an interface should use a conduit indicating when it is ready for new data. In this thesis, a Qsys system is created by connecting the PCIe controllers to FPGRay's logic. Figure 2.7 shows the Qsys main window with FPGRay's Qsys system as an example. To make use of the Qsys system with any hardware language, it needs to expose input and output conduits. Furthermore, to access the memory-mapped Avalon interface from hardware in the software program, an address range is needed.

Figure 2.7: The Qsys main window showing the actual Qsys system of FPGRay.

Hardware Synthesis. The synthesis of a hardware design for an FPGA consists of the following parts:

• Analysis and Synthesis: The VHDL or Verilog code is interpreted. The EDA tries to build the needed logic by mapping it to matching functional units. A schematic of the design is generated. Errors in the code, such as syntax errors, or more complex errors, such as mismatched bit widths between connections or multiple assignments to the same conduit, are found here.


As VHDL is not only used for designing hardware, some constructs are part of VHDL but cannot be built in hardware. Such non-synthesizable code is also caught in this stage.

• Fitting: The schematic is placed onto specific units of the FPGA and routed together. The errors that can occur here are problems fitting the design properly. This can happen due to the use of special functions that are not available, connecting logic to the wrong pins of the FPGA, or simply because there are too few basic logic units, memory, or additional logic elements, or insufficient routing capacities.

• Assembler: Generates a programming file from the completed hardware design. This file is used to program the FPGA.

• Timing Analysis: The generated hardware design is analyzed and checked with respect to the desired speed (which is determined by the clock speed). For this, it is checked whether all routings between two registers (i.e., the logic elements that hold their output value for a clock cycle and only change it at the rising edge of a cycle) are short enough. This will not result in an error for the design. But if the design is too slow on the FPGA, results can be delayed, ultimately generating wrong outputs.

Terms and Definitions. As a hardware design has different requirements and thus different terms compared to a software design, a few definitions need to be explained. A hardware component or logic block—analogous to a class in object-oriented programming—programmed in VHDL is called a design entity. It contains the entity, defining the interface (the input and output conduits), and the architecture, defining its behavior (the actual implementation). Both can be in the same file or split into two distinct files. To expose a design entity such that it can be used as a hardware component inside other design entities, a component needs to be declared. It describes the interface of a design entity exactly like an entity, but is written in a scope that is accessible to other design entities. This is analogous to the declaration of a function in C such that other functions have access to it. To collect multiple components or separate them from the rest of the code–similarly to a header file in C–a package file is used. If a design entity uses another design entity in its own code, this is called instancing. An IP can also be instanced exactly like a design entity, as long as it is made visible through a component. A Qsys system can also be instanced, similarly to an IP or design entity. It should be mentioned that instances can also be hardware components written in Verilog³. A conduit in VHDL is called a signal. It is possible to merge multiple conduits into a signal vector. These vectors can be used to represent data flowing through logic. Signals can roughly be described as variables in software but, differently from variables, signals form logic that can be processed.

³ In the reverse direction, Verilog is also able to instantiate VHDL design entities, IPs, or Qsys systems.


Furthermore, as they are real conduits, a combination of signals (and signal vectors) can build up an interface such as the already mentioned Avalon interfaces. Similarly to some programming languages using a keyword such as “var” for all variables, both signals and signal vectors are referred to with the same keyword: in VHDL it is always “signal”. The actual data type indicates whether a single signal or a signal vector is present; e.g., a signal containing a floating-point value can be used. Within the thesis, the term port is also used several times. A port is an interface, which concretely means that it consists of multiple signals combined together for data exchange between design entities. The important difference between the terms port and interface is that a port refers to a non-standardized interface (no Avalon interface variant or similar). Every time the term is used, the signals and with them the communication protocol (e.g., streaming or memory-mapped) are tailored to the connected design entities. In fact, the signals are not combined in any way in the source code: the port just identifies a logical combination which makes it easier to refer to the signals that belong together; e.g., all input signals of a design entity computing data in a streaming manner can form a port. The top design file, also called the main project file, has a special purpose. It is a VHDL or Verilog source file combining the whole logic. All design entities used in a hardware design need to be instanced or coded in this file. From a technical perspective, this is a normal source file, which could be used as an instance in a bigger project. For the EDA, it is the starting point of the synthesis. The input and output ports of this design entity determine the pins of the FPGA to connect peripherals. It should be noted that plain source files (both VHDL and Verilog) do not contain the pin mappings identifying which pins of the FPGA should be connected to which signals. This information is dependent on the EDA and, for Quartus, is stored in so-called .tcl files. An architecture can be coded either by making use of structural coding, i.e., by connecting design entities together—which means they are instantiated inside the architecture of another design entity—or through behavioral coding. It is possible to mix both in one architecture to combine already implemented functions with new logic. Behavioral coding refers to writing logic in the form of algorithms in so-called processes. The instructions in a process are executed sequentially, as in a function in software. But in contrast to software, every process works in parallel with all other processes. In the final hardware, each process shows up as dedicated logic besides the logic from other processes. Processes and instances of design entities can be connected together arbitrarily by just using signals between them (which again shows that signals are in fact conduits and not imaginary variables as in software). This project also makes use of constants, generics, and generate statements. They are used to make the hardware design parameterizable, similarly to preprocessor directives in programming languages. All of them are evaluated during the synthesis process of an EDA program to customize the final design. Constants are used, for instance, to set the number of buffer elements the design should have or—in combination with generics—to change the synthesized functionality. Generics are constants that can be used to uniquely parameterize each instantiation of a design entity.
They are specified besides the port mappings (the input and output signals described as the interface in an entity) during the instantiation.


The generate statements are the safe option to synthesize different hardware depending on the values of constants and generics. With the “if generate” and “for generate” statements, instances can be generated only if they are needed, and the generation of multiple instances is also possible. In practice, EDA programs optimize unnecessary logic away, which even applies to code that would still be present when not using the generate statements optimally. In this thesis and by the EDA, this is referred to as synthesized away. But the amount of code synthesized away is not standardized amongst EDAs. Depending on the EDA, code might not be synthesized away at all, even if possible. This results in a higher number of logic units being used than actually necessary.

Synchronous Designs. A special type of process is the synchronization process. Different from other processes, which always compute an output depending on the input, a synchronization process only changes its output once every clock cycle, often at the rising edge of the clock signal. Synchronization processes are used to introduce the clock and build up a synchronous design. Nowadays, most hardware is synchronous (e.g., a clock frequency of 4 GHz of a CPU means that the chip processes the hardware logic driven by that clock 4 billion times per second, or in other words, the logic computes 4 billion output values from 4 billion input values), but on a chip it is possible to use different clocks and also exchange data between those clock domains (e.g., the cores of a CPU may operate at 4 GHz while the on-chip GPU only runs at 1 GHz). The data flow of a synchronous design is a combination of synchronous processes and processes which do not trigger their processing on a clock signal. For every clock cycle, the synchronous process changes its output signals only once, presenting to the following stage the signal values the previous stage computed over the last cycle. This means that a computational process which takes a synchronous signal as input and returns its output as input to a synchronous process has exactly the time of one clock cycle to completely process its logic. If it does not compute the value in time, a wrong value would be fetched by the synchronous process. Despite this drawback, synchronous designs are the common case nowadays, as their design flow is well known and common tools are optimized for such designs. For synchronous designs, the latency is the number of clock cycles a design entity needs before it can present valid output signals. So if a result can be computed in one cycle, the latency of the entity is one. The throughput, on the other hand, defines how often new data can be forwarded to the input. If a design entity accepts a new input datum every cycle, its throughput is one. This means that, differently from the latency, the throughput can be less than one. If the throughput should not decrease but the needed processing logic is too complex to be processed in one cycle, pipelining is used to split up the logic into more processes. With this change, the throughput stays the same in exchange for a higher latency. Each process in such a multi-latency design entity is then called a stage; the data is forwarded from one stage to the next via the synchronization process.


Porting Hardware Designs. VHDL code is portable with respect to the target hardware. Nonetheless, if a design such as FPGRay should be used on a different FPGA, important steps are needed to properly port the design. The main reason is the use of IPs. For the same FPGA vendor, they can often be reused and are regenerated by the EDA; otherwise, they need to be exchanged to work, meaning that the IPs need to be replaced by other IPs with the same specifications (e.g., the same throughput). It should be noted that besides the VHDL code, there are additional files such as pin mappings that need to be adapted if necessary. VHDL is also used for ASICs, but any information regarding that, including porting from an FPGA, is out of the scope of this thesis.

Design Entity vs. Entity. Please note that in practice, the term entity is used as a synonym for the whole design entity and does not only describe the interface definition. This is sufficient in practical use, as for the discussion of design entities and their functionality it is not necessary to distinguish between the entity and the architecture. For the sake of simplicity, we adhere to this tradition in this thesis.


CHAPTER 3 Related Works

In the past, some other works have already focused on the idea of using hardware to accelerate ray tracing. One of the first approaches was SaarCOR [SWS02]. It uses a k-d tree for accelerating ray tracing. The aim of the project was to use the hardware design for rendering a computer game in real time. This was possible by using the computationally less expensive and non-probabilistic recursive ray tracing algorithm [FW80]. In contrast to FPGRay, it was only simulated, so no board was used and all benchmarks were based on timing-accurate simulations. Woop et al. developed the RPU [WSS05], which also only accelerates recursive ray tracing. The acceleration structure used was a k-d tree, and it was implemented on an FPGA. In contrast to these two implementations, FPGRay aims at accelerated intersection tests tailored to the precision and the high number of intersections needed in photo-realistic rendering, i.e., path tracing instead of recursive ray tracing. The T&I-Engine [NPP+11] was the first approach using Single Instruction, Single Data (SISD) to compute each ray independently. Similarly to SaarCOR, it only relied on a simulation for evaluating the design and accelerated the intersections using a k-d tree. Although its aim was real-time rendering based on recursive ray tracing, the design can be used for basic path tracing too. To support this class of algorithms, techniques such as sampling and path termination are additionally needed. SGRT [LLN+12] is based on both a simulation and an FPGA for benchmarks and is able to perform path tracing, although it was mostly used to benchmark recursive ray tracing. In contrast to all other mentioned works, including FPGRay, it uses a BVH to accelerate the intersection function. SGRT uses a simplified processor for ray generation and performs shading on the ARM CPU of a mobile device for simulation. The FPGA design handles the scene transfer via Universal Serial Bus (USB) to the prototype. RayCore by Nah et al. [NKK+14] is the last work presented here. It also supports path tracing and was tested with it, but its primary aim was real-time recursive ray tracing for mobile devices with low power consumption. An interesting comparison was done by benchmarking not only an FPGA but also an ASIC, which gives a good approximation of how much performance gain is possible with an ASIC compared to an FPGA.


tracing for mobile devices with low power consumption. An interesting comparison was done by not using only an FPGA for benchmarking, but also an ASIC, which gives a good approximation of how much performance gain is possible with an ASIC compared to an FPGA. In contrast to all papers presented here, FPGRay is not aimed at real-time rendering. The focus lies on a high number of rays per scene as it is needed for photo-realistic rendering. FPGRay is designed as a hybrid solution where not all functionality (i.e., shading, ray generation) is computed in hardware. Its intended use is the acceleration of a software renderer running on the CPU of a PC. In contrast to that, all presented papers use a dedicated solution for shading and ray generation too. Some use a fixed-function design (SaarCOR and RayCore), some use a simplified processor for it (RPU and T&I-Engine). This enables them to render a complete frame on chip, which is not possible with FPGRay. It should be mentioned that the hardware design of FPGRay is the only work which is intended to be flexible, allowing to build hardware designs with different number of computation instances. Furthermore, it should be noted that released the NVIDIA RTX cards in 2018. In contrast to all other works described in this section, they are consumer products and not scientific research projects. This also manifests itself in limited information about the specifications. What is known is the fact that the eponymous RTX cores are fixed-function hardware for computing the intersection of a scene saved in a BVH. Benchmarks and other comparisons are omitted here as at the writing of this thesis, the RTX cards were too new. RTX cards also aim at recursive ray tracing for improving effects of normal rasterized computer games. It is also not known how much of the needed work for path tracing can be done on such a card and how it can be used without the need of using DirectX or Vulkan.

CHAPTER 4 FPGRay

FPGRay is the complete implementation of a hardware design able to accelerate ray tracing computations. Besides the hardware design, the implementation consists of corresponding software to access the hardware from a PC. FPGRay can be used by a software renderer to let operations of the ray tracing algorithm be computed in hardware. This implies a hybrid approach using a combination of specialized hardware and software. In contrast to previous work using hardware acceleration (or the software-only approach), multiple advantages speak for the use of a hybrid system:

• Flexibility: If new functionality is needed, e.g., a new material model, this can be easily added in software without changing the hardware. In contrast to a hardware-only solution, this allows a sustainable approach which distributes the initial implementation costs over a longer period.

• Efficiency: Using hardware acceleration for higher efficiency is a good idea, but it does not make sense to implement all functionality as a hardware design. Computing inherently sequential operations in software also saves expensive hardware resources. In previous works, however, all functionality needed for rendering had to be implemented in hardware, as with a hardware-only approach this was the only way to enable rendering at all.

• Utilization: Nowadays, CPUs with many cores combined with much system memory are common. If hardware acceleration is used, these resources have reserves that can be used for other computational tasks the hardware is not able to perform.

In light of these points, we identify intersecting the ray with the scene using an acceleration data structure (in our case, a k-d tree) as the most important operation for ray tracing. In a ray tracing system, intersections are always required, regardless of the actual rendering algorithm. They are needed for every ray, which makes them a heavily used operation.


As already mentioned, intersections amounted to almost 50% of the overall rendering time for the tested scenes—despite the use of acceleration structures (k-d trees and BVHs). The only (negligible) drawback is that intersection is not a well-predictable operation, as the number of iterations needed to completely traverse the acceleration structure is highly dependent on the scene and the particular ray. The combination of the flexibility and efficiency advantages over previous works is a direct consequence of FPGRay’s design. Instead of facilitating real-time rendering with ray tracers, FPGRay aims at photo-realistic rendering, which cannot be computed in real time as of now. However, the use of new techniques is crucial to achieve appropriate photo-realism. The only viable option is to implement such techniques in software assisted by hardware, as the hardware implementation needed to support them would be too laborious. A key characteristic of the hardware design is the possibility to change the number of processing units to tailor the design to different FPGAs. This degree of flexibility should not be seen as an advantage over current approaches. It is rather the result of the development process, which requires this parameterization to utilize the FPGA resources in the best way. The possibility to even add further functionality to the hardware design without needing to change the actual hardware (i.e., the FPGA card) is a property of the FPGA itself and would apply to all previous FPGA-based works. The only difference is that previous works do not seem to be adaptable with the same ease by just changing constants, but this is independent of the mentioned contributions. To achieve higher efficiency by making use of parallelization, multiple rays can be sent at once as a batch. The implementations of the intersection and traversal algorithms are based on pbrt-v3, which manifests itself in many similarities between pbrt and FPGRay on the algorithmic side.

Overview

Figure 4.1 depicts a rough overview of FPGRay’s composition. If a program (rendering SW, such as pbrt; not part of FPGRay) wants to use FPGRay, it has to establish a connection via the C++ API (Section 4.6). The API abstracts the data transfer and connection handling to circumvent the direct use of the driver. The PCIe driver (Section 4.5) is responsible for transferring the data and commands at kernel level to the hardware. The complete communication between FPGRay and the software on a PC is done via the PCIe interface. On the hardware side, after the low-level communication, the data ends up at FPGRayIF (Section 4.1). In this entity, all instances used for computing FPGRay’s functions are present. The data is computed here and then kept available to be read back from the software side. With the depicted configuration, the hardware design can be used for intersection tasks. In the default parameterization, FPGRay allows ray-scene intersections against a scene stored in a k-d tree, which is its intended main operation.

Figure 4.1: FPGRay’s schematic. On the PC (software) side, the rendering SW uses the API, which talks to the PCIe driver in kernel mode; on the PCIe card, the PCIe HardIP and the DDR3 controller connect to FPGRayIF, which contains the separately synthesizable function instances (kdintersect, kdintersecttri, kdintersectsp, kdinnernode) and the state machine for command and data transfer.

By changing the parameters before synthesis, parts of the intersection procedure can also be used on their own. The intersection operation for a scene stored in a k-d tree is done in three steps:

1. Load the tree’s data to the RAM of the FPGA.

2. Configure the intersection instance by specifying the tree information and the memory locations of the tree transferred before.

3. Intersect rays with the loaded scene.
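To illustrate these three steps from the software side, the following C++ sketch drives them through a mock interface. The type and function names (FPGRayDevice, uploadTree, configureIntersect, intersect) are hypothetical placeholders and not the actual API of Section 4.6; the sketch only mirrors the order of the steps.

#include <cstdint>
#include <vector>

// Hypothetical types standing in for the actual C++ API (Section 4.6).
struct Ray    { float o[3], d[3]; std::uint32_t rayid; float rtmax; };
struct Hit    { std::uint32_t rayid; std::uint32_t primitiveid; float rtmax; };
struct KdTree { std::vector<std::uint8_t> nodes, primitives, primIndices; };

struct FPGRayDevice {
    // Step 1: copy the serialized tree into the FPGA's on-board DDR3 RAM and
    // return the base address it was placed at (stubbed here).
    std::uint64_t uploadTree(const KdTree &tree) { (void)tree; return 0; }
    // Step 2: configure the intersection instance with the tree information
    // and the memory locations of the previously transferred tree (stub).
    void configureIntersect(std::uint32_t sceneid, std::uint64_t treeBase) {
        (void)sceneid; (void)treeBase;
    }
    // Step 3: intersect a batch of rays with the loaded scene (stub).
    std::vector<Hit> intersect(std::uint32_t sceneid, const std::vector<Ray> &rays) {
        (void)sceneid; return std::vector<Hit>(rays.size());
    }
};

void renderTile(FPGRayDevice &dev, const KdTree &scene, const std::vector<Ray> &rays) {
    std::uint64_t base = dev.uploadTree(scene);        // step 1
    dev.configureIntersect(/*sceneid=*/0, base);       // step 2
    std::vector<Hit> hits = dev.intersect(0, rays);    // step 3: one batch of rays
    (void)hits;  // shading etc. stays in software (hybrid approach)
}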

From the API side, a user program can use FPGRay’s functions by calling the corresponding API function. The data is handed over as a continuous memory space. As basic functionality, the FPGRay hardware design always supports the communication via the PCIe interface and the data transfer to and from the on-board DDR3 RAM. An actual hardware design on an FPGA can support the intersection of a k-d tree (kdintersect, Section 4.2; synthesized by default) or optionally only a part of the k-d tree intersection, if synthesized with the corresponding parameters1. The optional operations are legacy operations which are partly still used by kdintersect, namely the intersection of a triangle (kdintersecttri, Section 4.3.2), intersecting a sphere

1Precisely, by setting VHDL constants located in FPGRayIF.


(kdintersectsp, not used any more by kdintersect), or the traversal of a node of a k-d tree (kdinnernode, Section 4.3.1).

Protocol. The communication protocol is very basic. FPGRay just awaits a command including the number of data blocks that should be computed. After that, the data blocks are sent as raw data without separators or terminators.
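As a rough illustration, the following C++ sketch packs such a transfer: a command word, the announced number of data blocks, and then the raw blocks without separators or terminators. The exact command encoding and word widths are assumptions made for this sketch only.

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical transfer layout: command word, block count, then raw blocks.
std::vector<std::uint8_t> packTransfer(std::uint32_t command,
                                       std::uint32_t blockCount,
                                       const std::vector<std::uint8_t> &rawBlocks) {
    std::vector<std::uint8_t> out(sizeof(command) + sizeof(blockCount) + rawBlocks.size());
    std::memcpy(out.data(), &command, sizeof(command));
    std::memcpy(out.data() + sizeof(command), &blockCount, sizeof(blockCount));
    if (!rawBlocks.empty())  // blocks follow back to back, no terminator
        std::memcpy(out.data() + sizeof(command) + sizeof(blockCount),
                    rawBlocks.data(), rawBlocks.size());
    return out;
}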

Lower Level. To access the PCIe interface on the software side, a Linux kernel module [ALK, LKM] is needed as a driver. On the hardware side, to make use of the PCIe interface, the FPGRayIF entity is supplemented by IPs from Altera. The actual composition of this hardware design, which is abstracted in Figure 4.1 by the green arrow, is further explained in Section 4.7.1. The communication on the software side is done by writing to and reading from a special memory location, exactly as it is done when accessing RAM. This is possible because the driver maps this location to the PCIe interface, ultimately transferring data to this interface instead of accessing the RAM. To get a higher throughput, Direct Memory Access (DMA) is used for the transfer. This means that, instead of transferring data word-wise, the PC’s hardware copies a block of data at once.
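The memory-mapped access can be pictured with the following user-space sketch. The device node name /dev/fpgray0 and the mapping size are hypothetical, and the real data path additionally involves the DMA engine; this is only meant to illustrate the mapping idea.

#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical device node exposed by the kernel module.
volatile std::uint32_t *mapFpgrayBar(std::size_t length) {
    int fd = open("/dev/fpgray0", O_RDWR);   // device name is an assumption
    if (fd < 0) return nullptr;
    void *p = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                // the mapping stays valid
    if (p == MAP_FAILED) return nullptr;
    // Writes and reads through this pointer now end up at the PCIe interface
    // instead of in ordinary RAM.
    return static_cast<volatile std::uint32_t *>(p);
}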

The next sections contain a more detailed overview over the different components of the hardware design. After that, the PCIe driver and the API are explained in more detail. At the end, some implementation details are also discussed.

4.1 FPGRayIF

FPGRayIF (FPGRay InterFace) is the entity that accepts data from the PCIe interface on the hardware side and forwards the data to the demanded entity. To determine the correct entity which should be fed with the data, a state machine is used. The state machine derives its state from the first data word of a data transfer, which contains the command. The state machine then forwards the amount of data which was announced by the command. In the final step, the state machine returns to its idle state to await the next command. As already mentioned, some operations can be chosen not to be synthesized in the final hardware design. This helps to save resources, but if such an operation were initiated, the PC could freeze. To overcome this problem, the state machine automatically mimics the data flow of unsynthesized operations and generates data blocks with dummy values to ensure safe operation. Operations which are finished save their data into an output buffer within FPGRayIF. The software takes the data from this buffer via the PCIe interface. A special purpose of the output buffer is to ensure that the data is always in order. This means the order of the rays sent from software should be kept when the results are received. As a k-d tree

intersection needs a different number of tree traversals and intersections for each ray, the runtimes can vary greatly. FPGRay’s implementation kdintersect (Section 4.2) outputs a finalized ray without taking care of the order. To overcome this problem, the output buffer has a resorting functionality, automatically storing the ray in the correct position and not in a FIFO manner.
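The in-order guarantee of the output buffer can be modelled in software as a small reorder buffer indexed by the rayid. The fixed capacity and the modulo indexing below are illustrative assumptions and do not reflect the actual buffer organization on the FPGA.

#include <cstdint>
#include <optional>
#include <vector>

struct Hit { std::uint32_t rayid; std::uint32_t primitiveid; float rtmax; };

// Minimal reorder-buffer model: results arrive in arbitrary order, but are
// released strictly in the order of their rayid.
class ReorderBuffer {
public:
    explicit ReorderBuffer(std::size_t capacity) : slots_(capacity) {}

    void store(const Hit &h) { slots_[h.rayid % slots_.size()] = h; }  // out-of-order arrival

    std::optional<Hit> tryPop() {            // in-order release
        auto &slot = slots_[next_ % slots_.size()];
        if (!slot || slot->rayid != next_) return std::nullopt;
        Hit h = *slot;
        slot.reset();
        ++next_;
        return h;
    }

private:
    std::vector<std::optional<Hit>> slots_;
    std::uint32_t next_ = 0;
};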

4.2 kdintersect

The main part of this thesis is the implementation of the intersection routine for a k-d tree. The concrete implementation is based on pbrt’s KdTreeAccel class and the Triangle class. The decision to use pbrt was a practical one. Both the code basis and the documentation (i.e., the book [PJH16]) are intended to be educational. Thus the technical details are extensively discussed, including the problems of computational errors due to the finite precision of floating point numbers. pbrt also uses a simple code base. In contrast to that, many renderers, such as Mitsuba [Jak10], are highly optimized. Because of the different workflow needed for FPGRay compared to a software-only approach, those optimizations are a disadvantage. Another advantage is the relation of pbrt to LuxCoreRender [lux20]. LuxCoreRender is based on pbrt-v1 and uses the same class structure as the pbrt-v3 version used by this thesis. This makes another renderer besides pbrt, which is more optimized, easily extendable for use with FPGRay. kdintersect only implements the intersection (the traversal of the k-d tree and the intersection of the contained primitives) and is not able to construct a k-d tree scene from primitives. The absence of a hardware-accelerated tree construction operation is due to the fact that the used SAH algorithm is not parallelizable and is sufficiently fast in software. The reason for this is that the tree construction is only done once per scene, but the intersection is performed multiple million, possibly billion, times. So the possible acceleration of the construction is negligible, especially in comparison to the hardware resources that can be better used for the actual intersection. It should be noted that other algorithms to construct k-d trees in real time exist [ZHWG08, CKl+10], which could further improve the rendering efficiency. Moreover, for dynamic scenes (e.g., when using animations) this eliminates a possible bottleneck in case of highly varying scenes.

Basic Characteristics. kdintersect is the complete implementation needed to intersect a ray against a scene’s k-d tree. It therefore instances many entities to achieve a better structure, which will be explained further in the following sections. The kdintersect entity performs initial computations directly in the entity itself, which is further described in Section 4.2.1. The initial computations intersect the bounding box of the scene’s k-d tree to check whether the ray intersects anything at all. kdintersect takes a ray consisting of the origin, direction, a unique ray identification number (rayid), and the rtmax value of the ray as input. The usage of the rtmax value originates from pbrt, where it defines the maximum distance at which a ray can get intersected (the range of the ray). Initially, it has the value of infinity and is used during


intersection to save the distance of the nearest intersected primitive. rtmax is one of the output signals, alongside the primitiveid, the identification number of the intersected primitive. Furthermore, the ray’s origin, direction, and rayid will also be forwarded to the output. The rayid is crucial as the order of the rays is not retained by this entity and the resorting at the end is done based on it.
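The input and output records of kdintersect described above can be summarized by the following sketch; the exact field widths and packing in hardware are not specified here, so the structs are placeholders.

#include <cstdint>
#include <limits>

// Ray as fed into kdintersect: origin, direction, unique id, and rtmax
// (initialized to infinity, as in pbrt).
struct KdIntersectIn {
    float ox, oy, oz;
    float dx, dy, dz;
    std::uint32_t rayid;
    float rtmax = std::numeric_limits<float>::infinity();
};

// Result leaving kdintersect: the ray data is forwarded unchanged, rtmax now
// holds the distance of the nearest hit and primitiveid identifies that hit.
struct KdIntersectOut {
    float ox, oy, oz;
    float dx, dy, dz;
    std::uint32_t rayid;
    float rtmax;
    std::uint32_t primitiveid;
};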

Structure. kdintersect has three types of entities which do the actual computations necessary for intersection. All of them are derived from parts of pbrt’s KdTreeAccel intersection functionality and the triangle intersection function. The two entities performing the operations of intersecting a triangle (kdintersecttri, Section 4.3.2) and traversing through an inner node of the k-d tree (kdinnernode, Section 4.3.1) are distinct entities which can also be used multiple times to increase the throughput of the kdintersect entity. These distinct entities are not directly instanced in kdintersect, but rather in the kdscheduler entity (Section 4.3). The third part is not a distinct entity but rather kdintersect’s preparation part, which performs the initial computations needed for traversing through the scene. As the check of the scene’s bounding box against a ray needs to be done only once, this computational logic can reside as a singleton in kdintersect. Figure 4.2 shows the schematic of kdintersect. Besides the preparation being computational logic directly written in kdintersect, only one entity is directly instanced: kdscheduler, which encapsulates all logic needed for the actual traversal and intersection.

kdintersect’s preparation Intersection kdscheduler inray with treebounds outray & directioninversion

manage scenes

chgscn buffered memory controller

traversal instance triangle intersection instance

Figure 4.2: The schematic of kdintersect.

Restrictions. In comparison to pbrt’s intersection routine, there are a few constraints owing to technical reasons:

• The k-d tree is the only intersection routine that is supported. BVH or direct intersection of a triangle via kdintersect is not possible.2

2However, a triangle could be intersected directly by using FPGRay’s own kdintersecttri entity (Section 4.3.2).


• Only one k-d tree containing the whole scene is supported. Nesting of multiple k-d trees is not supported.

• Other shapes than triangles are not supported.3

• Alpha textures are not supported in any primitive.

Specialties. kdintersect offers a high degree of flexibility with regard to resource usage. It allows tailoring the design to the targeted scenes and the FPGA’s resources using various parameters. As an example, the number of intersection and traversal instances can be parameterized. kdintersect is able to process rays of multiple scenes in parallel. The support for multiple scenes in parallel is important for animations, where a scene can change from one frame to another. For identifying the scene a ray should be intersected with, the sceneid of the ray is used, which is set by the “change scene” signal. This functionality is implemented to ensure a proper utilization even in case of a scene change. Without it, a ray of a new scene would have to wait to enter kdintersect until the last ray of the previous scene was completely intersected. As an arbitrary ray can take longer than other rays, only one ray might be intersected for a long time, lowering the throughput.

The following sections describe the used entities inside kdintersect in more detail.

4.2.1 kdintersect’s Preparation

Before entering the tree, the bounding box of the k-d tree is checked for intersection. Only if this intersection is present, an intersection of the ray with the scene is possible. The logic for computing this bounding box intersection is located in kdintersect as preparation logic before feeding the data into the kdscheduler (Section 4.3). See Figure 4.3 for the concrete logic. The kdscheduler gets the result as a bit signal (i.e., the output of isnt_false_already). The inverted directions of the ray (rinvdx, rinvdy, and rinvdz) are additionally forwarded to the kdscheduler’s input, as these values are needed for further computations. The computation of the bounding box intersection is done using pbrt’s geometry class implementation for bounding boxes.
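In software terms, the preparation stage corresponds to a slab test of the ray against the scene bounds in the spirit of pbrt's Bounds3::IntersectP, including the conservative widening of tFar by 1 + 2·gamma(3) that is visible in Figure 4.3. The following sketch is a simplified software illustration and not the exact hardware data path.

#include <limits>
#include <utility>

inline float gammaBound(int n) {                     // pbrt's gamma(n) bound
    const float eps = std::numeric_limits<float>::epsilon() * 0.5f;
    return (n * eps) / (1 - n * eps);
}

// Slab test of a ray against an axis-aligned box [lo, hi]; returns true and
// the parametric interval [t0, t1] if the ray hits the bounds within rtmax.
bool intersectBounds(const float o[3], const float invDir[3], float rtmax,
                     const float lo[3], const float hi[3],
                     float *t0out, float *t1out) {
    float t0 = 0.0f, t1 = rtmax;
    for (int axis = 0; axis < 3; ++axis) {
        float tNear = (lo[axis] - o[axis]) * invDir[axis];
        float tFar  = (hi[axis] - o[axis]) * invDir[axis];
        if (tNear > tFar) std::swap(tNear, tFar);
        tFar *= 1 + 2 * gammaBound(3);               // robust overlap, as in pbrt
        t0 = tNear > t0 ? tNear : t0;
        t1 = tFar  < t1 ? tFar  : t1;
        if (t0 > t1) return false;                   // corresponds to isnt_false_already
    }
    *t0out = t0;
    *t1out = t1;
    return true;
}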

4.3 kdscheduler

The kdscheduler is the entity connecting all entities together to properly intersect a ray with a scene stored in a k-d tree, except for the test whether a ray intersects the scene at all (which is located in kdintersect, Section 4.2). This includes entities for traversing inner nodes, fetching nodes and primitives, and intersecting primitives.

3However, a sphere could be intersected with FPGRay’s own kdintersectsp entity.


Figure 4.3: The logic needed to compute an intersection of a bounding box with a ray. Note that any conduit at the input or output of an operation or process at the same height as other conduits is in the same cycle (i.e., synchronized). The only exceptions are the two partitions indicated by the red lines.

Most of the parameterizable parts of kdintersect apply to this entity. The most notable parameters are the number of traversal instances and intersection instances. These settings do not only affect the concrete structure of kdscheduler: Many entities adapt to these parameters to deliver enough ports or use different versions of buffers to prevent overflows, for instance. This results in many possible data flows and also alters parts

of the code used for the synthesized design. This variability is also the reason for the name: All computation and data fetching instances are connected and controlled here. See Figure 4.4 for the schematic.


Figure 4.4: The schematic of kdscheduler. The number of the green kdinnernode instances, the red kdintersecttri instances, and the kdleaf instances is determined by parameters. Although the figure suggests otherwise, kdintersect can be parameterized to reinsert rays from the intersection path into the innernode chain at any position, including the first position, where the input data also enters.

The kdscheduler entity can be roughly split into two computation blocks which interact with each other. The innernode chain is responsible for traversing through the inner nodes of the k-d tree while the intersection path performs the decoding of leaf nodes and the actual intersection of triangles. At the end of an intersection, the ray is written to a final buffer to be ultimately forwarded to the output. It should be noted that this output is the output of kdintersect, which means that there is no further processing needed. Besides the ray data going through the data flow, a ray also occupies a stack for storing not yet processed tree nodes. Instances in both computational blocks need access to these ray stacks. For this, the instances of entities needing access to the stack are connected to the kdstack instance (Section 4.3.5) inside of kdscheduler, which stores and manages all stacks. More precisely, the kdstack depicted in Figure 4.4 is not a single instance. It is possible that the kdscheduler uses multiple kdstack instances. These instances are


abstracted for the other entities in the kdscheduler such that the instances behave like a single kdstack instance. This design decision is due to optimization reasons; more details about this can be found in Section 4.3.5.

Some entities in both computational blocks also need access to data from RAM, which they get via a connection to the kdmemmerger instance (Section 4.3.4), also instanced in kdscheduler. The kdmemmerger routes the instances needing memory access to the FPGA’s RAM, arbitrating the data transfer. kdmemmerger is therefore the only instance inside of kdintersect which has a direct connection to the RAM.

Each instance needing access to a stack or RAM has its own port to it, as depicted in Figure 4.4. For instance, each traversal instance has its own port at the kdstack.

The kdscheduler begins its operation with the newray buffer, where all precomputed rays from kdintersect’s preparation arrive. This buffer is responsible for applying a flow control to the input data to avoid an overflow inside of kdscheduler. If there is space left in the innernode chain, a new ray can be inserted into the chain every cycle.

The Innernode Chain. The innernode chain is basically a combination of elements that is able to traverse an inner node of a k-d tree. An element consists of a few entities. Multiple of those elements can be chained together to traverse rays with higher throughput. Each element’s end is connected to the beginning of the next element. With this approach, a ray inserted into the first element can travel along the chain as it traverses into the depth of the k-d tree. This not only enables the next ray to be inserted in the next cycle. It is also possible to insert a new ray even if an older ray needs another traversal through an element of the innernode chain, which increases the throughput. Only in case the depth of the k-d tree is greater than the number of elements, the insertion of a new ray is blocked. In that case, the last element forwards the ray back to the first element, and the ray goes through the chain again. As the number of elements can be parameterized (with the number of traversal instances), the achievable throughput depends on the actual synthesized hardware design.

One element of the innernode chain consists of an instance of a kdinnernode entity (Section 4.3.1) followed by a buffer to prevent an overflow in the following kdnodefetcher instance (Section 4.3.3), which fetches the next node from RAM. The initial insertion of a ray starts at the first kdinnernode instance. The output of a kdnodefetcher leaves the innernode chain when it identifies a leaf node. The ray is then forwarded to the intersection path. The intersection path itself can fetch an inner node through the kdnodefetcher instances at the bottom of the intersection path. In case of such a node, the ray gets reinserted into the innernode chain again (and this is the only possibility for a ray to come back to the innernode chain). As both the end of each element of the innernode chain and the returning path from the intersection path deliver an inner node which has already been fetched and decoded by the kdnodefetcher entity, it can be forwarded directly to the kdinnernode instance at the beginning of an element.


Please note that, in contrast to the depiction, the reinsertion from the intersection path can also be done at the first element of the innernode chain. In that case, the ray from the intersection path will be prioritized over the rays from the newray buffer and those coming from the last element of the chain.

The Intersection Path. The intersection path performs the actual intersections of primitives by using the kdintersecttri entity (Section 4.3.2). Before intersecting primitives, they must be fetched first, which is done by the kdprimitivefetcher (Section 4.3.3). Furthermore, as leaf nodes coming from the innernode chain may not contain the primitive data directly, some logic is needed to decode the leaf node to identify the primitives to load. This decoding is done by kdleaf (Section 4.3.6). It delivers the primitiveids with which the primitives can be fetched and intersected. Lastly, to find the nearest intersection of a leaf node, the kdcompare entity (Section 4.3.7) compares all intersection results of the node. As a finished leaf node results in either a finished ray or another traversal through the k-d tree, additional kdnodefetchers are used at the end of the intersection path. If a ray is not completely traversed, these fetchers either forward the ray to the beginning of the intersection path in case another leaf node was found or route it back to the innernode chain when an inner node was encountered. To increase the throughput, the intersection path may use multiple kdleaf instances. The number of instances is automatically configured, depending on the number of traversal and intersection instances that are parameterized. To deal with any configuration, an arbiter is used to merge the data for the following primitive fetching. When only one kdleaf instance is used, the data is directly forwarded to the primitive fetchers. In case of multiple instances, the arbiter works differently. In this mode, the arbiter lets one kdleaf instance exclusively finish forwarding a leaf node’s result so as not to split it in the middle. Only after a full leaf node was forwarded, the arbiter takes the data of a different kdleaf. The number of kdintersecttri and kdprimitivefetcher instances depends on the number of intersection instances that were parameterized. The number of kdnodefetcher instances in the intersection path is set automatically depending on the intersection instances and may be higher than the number of intersection instances in order to avoid lowering the throughput.

Finishing an Intersection. If an intersection is evaluated completely, it is forwarded to the final buffer. The direct way from the input of kdscheduler to the final buffer is also possible. This path is taken for all rays which do not intersect the bounding box of the scene at all (i.e., rays for which root not false already is false and which are thus already rejected by kdintersect’s preparation, Section 4.2.1). The final buffer outputs the ray (consisting of rayid, origin, direction, and rtmax) besides the primitiveid.

Latency and Throughput. Most entities used in kdintersect have a certain latency, but because they are fully pipelined, they can sustain a throughput of one. This means that


every cycle a new ray can be processed. This also applies to the data flow of kdscheduler. The only exception is kdleaf, where multiple possible inputs are merged to one to optimize resource usage. In practice, the only possible decrease of throughput comes from the fetchers. This is owed to the fact that the RAM is too slow to be accessed by all fetchers at once, forcing some of them to stall the data until the RAM returns their requests. Furthermore, a decrease can also occur due to the caches which may not be able to process all requests in time if the RAM latency is too high. It should be mentioned that a low number of intersection and traversal instances lowers the throughput of the kdscheduler entity depending on the complexity of the scene it intersects. This is due to the mentioned fact that the innernode chain returns nodes from the end of the chain to the beginning, if the depth of the scene’s k-d tree is too high. So even if the distinct instances used for the innernode chain do not suffer from a throughput penalty, the fact that a ray is processed twice by the same instances—which technically still sustains the throughput—implies that the throughput in terms of rays per cycle decreases.

4.3.1 kdinnernode

The kdinnernode entity implements the computation of a traversal along an inner node in a k-d tree. It takes the ray and the data of the node, consisting of the split axis, the split position along this axis, and the references to the child nodes, as input. The output consists of the ray including the references to the children of the node. It traverses through the inner node, evaluating which of the two children should be evaluated next. This child is always forwarded. The other child is only forwarded to the output if its space is traversed by the ray. There are two differences to the implementation of pbrt-v3 that only concern the saving and loading of the next children. Instead of loading the next child in kdinnernode, the output consists of the index which is forwarded to the kdnodefetcher (Section 4.3.3) in FPGRay. Also, FPGRay’s kdinnernode does not itself push a possible second child onto the stack; this data is directly sent as output to the kdstack (Section 4.3.5), where it is put onto the stack. We omit a description of the implementation details and only provide a high-level overview of the needed logic (see Figure 4.5), as the implementation is almost identical to the one in pbrt-v3 (namely the code of KdTreeAccel::Intersect inside the !node->IsLeaf() branch).

Figure 4.5: The logic to evaluate the children of an inner node and forward them in the right order. Note that conduits of inputs and outputs (this also applies to the inputs and outputs of the instances) drawn at the same height are synchronous, i.e., in the same cycle of their computation. The only exception is the splitting depicted by the red line.
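For reference, the interior-node step of pbrt's KdTreeAccel::Intersect, which kdinnernode mirrors, can be sketched in software as follows. The result struct and its field names are chosen for illustration; in FPGRay the first child goes to the kdnodefetcher and the second child, if present, to the kdstack.

#include <cstdint>

// One traversal step through an inner node. [tMin, tMax] is the ray's
// parametric interval for this node, splitPos/axis describe the split plane,
// belowChild/aboveChild are the node indices of the two children.
struct InnerNodeResult {
    std::uint32_t firstChild;   // always traversed next
    float firstTMin, firstTMax;
    bool hasSecond;             // second child only if its half-space is crossed
    std::uint32_t secondChild;  // goes onto the ray's stack in FPGRay
    float secondTMin, secondTMax;
};

InnerNodeResult traverseInner(float originAxis, float dirAxis, float invDirAxis,
                              float tMin, float tMax, float splitPos,
                              std::uint32_t belowChild, std::uint32_t aboveChild) {
    float tPlane = (splitPos - originAxis) * invDirAxis;

    // Decide which child the ray visits first (pbrt's belowFirst test).
    bool belowFirst = (originAxis < splitPos) ||
                      (originAxis == splitPos && dirAxis <= 0);
    std::uint32_t first  = belowFirst ? belowChild : aboveChild;
    std::uint32_t second = belowFirst ? aboveChild : belowChild;

    InnerNodeResult r{first, tMin, tMax, false, 0, 0, 0};
    if (tPlane > tMax || tPlane <= 0) {
        // Only the child visited first is crossed; nothing to defer.
    } else if (tPlane < tMin) {
        r.firstChild = second;              // only the other child is crossed
    } else {
        r.firstTMax = tPlane;               // visit the first child up to the plane
        r.hasSecond = true;                 // second child is deferred to the stack
        r.secondChild = second;
        r.secondTMin = tPlane;
        r.secondTMax = tMax;
    }
    return r;
}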

4.3.2 kdintersecttri

The kdintersecttri entity implements the logic to intersect a triangle against a ray. The triangle corner points are the inputs, which have to be provided in the cycle in which the ray is forwarded. The implementation of this entity is directly taken from pbrt-v3’s Triangle::Intersect method, from its beginning up to the part commented with Compute $\delta_t$ term for triangle $t$ error bounds and check _t_ (including this part). It takes the ray and the primitive to intersect, including its primitiveid, as input.

kdintersecttri checks at the end of a valid intersection whether the distance to it (the t value) is smaller than the ray’s rtmax. This indicates a closer intersection than the one found previously. In this case, the ray’s rtmax value and the primitiveid are updated accordingly to form the new output. Otherwise, the output of this entity equals the input. The implementation of the logic as it is done in FPGRay is shown in Figure 4.6. kdintersecttri forwards the stack index to the stacking logic to fetch the next stack element 4 cycles before the finished computation is forwarded to the output. This is done to avoid any latency due to the stacking logic after the primitive intersection. Because FPGRay only has access to the scene data but nothing else such as textures, alpha textures cannot be tested against intersection. Alpha textures give the ability to additionally define areas on a shape that do not lead to an intersection even if the shape is intersected. As FPGRay does not test against alpha textures but pbrt does in the Triangle::Intersect method, using alpha textures would lead to different results of the intersection functions. For this reason, alpha textures are not supported in any scene using FPGRay.
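The final check described above amounts to keeping the nearer of the previous and the current hit; a minimal software sketch of this update step is given below, with illustrative signal names mirroring those used in the text.

#include <cstdint>

// After a valid triangle intersection at distance t, keep it only if it is
// closer than the nearest intersection found so far (stored in rtmax).
// Otherwise the entity's output simply equals its input.
void updateNearestHit(bool hitValid, float t, std::uint32_t hitPrimitiveId,
                      float &rtmax, std::uint32_t &primitiveid) {
    if (hitValid && t < rtmax) {
        rtmax = t;                     // new nearest distance
        primitiveid = hitPrimitiveId;  // remember which primitive was hit
    }
}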

4.3.3 Data Fetcher

kdintersect (Section 4.2) needs three types of fetcher entities for delivering the needed data from RAM. One type is used for fetching nodes (kdnodefetcher), one for fetching primitives (kdprimitivefetcher), and one for fetching the indices of primitives that should be intersected in one node (kdprimindicesfetcher). All three entities have only small differences and use the same underlying kdcachedfetcher for the actual data fetching and caching.



Figure 4.6: The logic to intersect a triangle against a ray. Notice that this entity is the only one making explicit use of double-precision floating point operations (operations marked with DP). It should be mentioned that only the input and output signals are in the same cycle; conduits of other operations or processes may not be in the same cycle even if they are at the same height.

The distinct types only instance kdcachedfetcher and split the different types of signals for final presentation or processing. This saves a lot of code. The kdnodefetcher and kdprimitivefetcher entities can be seen in Figure 4.4, labeled as nf and pf. The kdprimindicesfetcher is contained exactly once in each kdleaf instance (Section 4.3.6).

kdcachedfetcher

The kdcachedfetcher implements the cache and the interface to the RAM. It is able to fetch data with an arbitrary bit width and can also save data of arbitrary bit width (that is different to the bit width of the fetched data) in a wait queue during fetching. With two operation modes, it is built to deliver the base logic for all needed fetchers in this thesis.

Each kdcachedfetcher is connected to the kdmemmerger (Section 4.3.4), as there is no direct connection from a fetcher to the RAM. The kdcachedfetcher fetches data via the kdmemmerger and caches it. The fetcher is abstract as it only uses the index of the object to load—in FPGRay this is either a node, a primitive, or primitive indices—and the sceneid to identify the address of the data in RAM. Data which is needed after the fetcher but not during fetching (e.g., the ray’s data) is passed through, which means the data is stored in a wait queue and forwarded at the output besides the fetched data. If multiple signals need to be passed through, they need to be concatenated and split afterwards. Besides the passed-through data, the fetched data (either from RAM or cache) is returned.

The fetcher outputs a ready signal (named rdy) at the input port—which consists additionally of the mentioned sceneid, index to fetch and the signal for data that should be passed-through—such that a flow control can be used. If any queue inside the fetcher is full, the signal rdy is set to ‘0’. In practice, this often happens due to the latency and limited throughput of the RAM. As a result, the fetchers are the only instances in kdintersect (Section 4.2) that use flow control, giving them a variable throughput.

Two Operation Modes. The fetcher is parameterizable by generics. The operation mode can be changed to out-of-order by setting ALLOW_REORDER to ‘1’. In this mode, the fetcher forwards data when it is available, i.e., when it is present in cache. Data that is fetched from the RAM is forwarded via the main output port and cached data is forwarded via a second port. This enables to maximize the throughput by forwarding data when it is available. Furthermore, the latency suffers from less penalty. If ALLOW_REORDER is set to ‘0’, the fetcher is working in-order. In this mode, all data at the output has the same order as the data fed into the fetcher. All data is forwarded via the main output port no matter if cached or from RAM. The second output always outputs zeroes. This mode suffers from a throughput and latency penalty due to the stalling of already available data. Nevertheless, when the order of the data needs to be preserved, this penalty must be accepted.

To be able to satisfy the needed properties for the concrete types of fetchers, the main output port itself uses a flow control similar to a ready signal. With this, it is able to stall readily available data while still presenting it on the output port until it should be forwarded (to present the next finished data item). The second port does not exhibit this special behavior and just forwards data when it is ready.


kdnodefetcher

The kdnodefetcher is used to fetch all nodes of the k-d tree. This applies to inner nodes as well as to leaf nodes. The fetcher is able to extract the type of the fetched node (which indicates whether a node is an inner node or a leaf node). This allows the fetcher to forward the data to the primitive intersection instances or the traversal instances, depending on the node type. The input ray values, including the actual bounding values of the k-d tree, are passed through by an instance of kdcachedfetcher. The id of the requested node together with the sceneid is issued in the corresponding input signals. The kdcachedfetcher is used in out-of-order mode as the order of nodes does not matter. As an advantage, already cached data can be forwarded before data already waiting in the queue for the RAM interface, allowing shorter latency. Additionally, this increases the throughput because the next data fetching can be started earlier as the queues inside the kdcachedfetcher are emptied earlier. Each kdnodefetcher also includes two kdnodedecode instances. Each instance of this entity evaluates the data of an already fetched node. Its input therefore consists of a ray with a raw fetched node (i.e., the output of the kdcachedfetcher), which is evaluated for whether it is an inner node or a leaf node. Depending on the result, the data is forwarded either to the traversal output port or the intersection output port. The node’s data is properly formatted for one of the outputs (leaf node output or inner node output) of the kdnodefetcher, i.e., the values are extracted from the raw data of the fetched node. The entity decodes in one cycle, which cannot be changed. The connection scheme inside a kdnodefetcher entity is illustrated in Figure 4.7. An interesting point about the fetcher is the fact that the innernode outputs of the kdnodedecode instances are buffered whereas both outputs for leaf nodes are directly forwarded.
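The decoding performed by kdnodedecode can be pictured on a packed node layout in the style of pbrt-v3's KdAccelNode, where the two lowest bits of the flags word either encode the split axis or mark a leaf. The layout below follows pbrt-v3 and is only an approximation of the bit layout actually used on the FPGA.

#include <cstdint>

// Packed k-d tree node in the style of pbrt-v3's KdAccelNode (8 bytes).
struct PackedKdNode {
    union {
        float split;                      // interior node: split position
        std::uint32_t onePrimitive;       // leaf with exactly one primitive
        std::uint32_t primIndicesOffset;  // leaf with several primitives
    };
    std::uint32_t flags;                  // low 2 bits: axis (0/1/2) or 3 = leaf
};

inline bool isLeaf(const PackedKdNode &n)               { return (n.flags & 3u) == 3u; }
inline int  splitAxis(const PackedKdNode &n)            { return static_cast<int>(n.flags & 3u); }
inline std::uint32_t nPrimitives(const PackedKdNode &n) { return n.flags >> 2; }  // leaf only
inline std::uint32_t aboveChild(const PackedKdNode &n)  { return n.flags >> 2; }  // interior only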

kdprimindicesfetcher

In FPGRay, primitives are stored in a contiguous RAM space. As each primitive is stored only once, leaf nodes with multiple primitives cannot use just a range of this space. A good example that demonstrates the problem would be two leaf nodes having a common first primitive with all the others being different. To find the correct primitives, a third RAM memory space, besides those for nodes and primitives, is used, which stores the primitive indices. It holds the pointers to the primitives for each leaf node with more than one primitive in a consecutive manner. The kdprimindicesfetcher is able to fetch and forward up to four of these indices in one cycle. This is owed to the fact that one data block from RAM holds 4 primitive indices. This entity uses one kdcachedfetcher instance for fetching. Besides the ray data, the fetcher takes the index of the data block containing the desired indices, the number of primitives that should be fetched, the offset of the first index in the first data block, the sceneid, and a final flag as input. The data block index and the sceneid are used as



Figure 4.7: A kdnodefetcher instance consists of a kdcachedfetcher and two kdnodedecode instances. A buffer for inner nodes is buffering data if two inner nodes have finished fetching at the same cycle.

input for the fetching address. All other signals are passed through. The kdcachedfetcher is used in in-order mode, as blocks need to stay in order. This is important because a fetched block contains the indices of primitives of a leaf node which need to be fetched afterwards. A leaf node can contain multiple primitives, and to allow an independent (i.e., parallel) intersection for each of them, the complete ray data is copied for every primitive. To be able to correctly separate different rays, the last ray copy is flagged. If the ray copy marking the final copy of this node—indicated by the mentioned final flag—is not the last copy in the data flow, the node could be split up further, corrupting multiple rays. The kdcachedfetcher’s output, besides the passed-through data, is a block consisting of four indices. This block is obtained from a memory space that stores all indices for all leaf nodes consecutively. Because not all leaf nodes contain exactly a multiple of four primitives, the first primitive belonging to a node may not be the first in the block. kdprimindicesfetcher thus computes the correct starting position (and analogously the end position) to forward only the indices belonging to the desired node. For improving the throughput, up to four output ports can be configured to forward all four indices in parallel in the same cycle (or fewer if not all indices are needed by a leaf node). This configuration is done automatically when the number of triangle intersection instances is greater than one. If fewer than four output ports are used, the output of


kdcachedfetcher’s main output port is stalled until all indices of the block are forwarded.

The fetcher also alters the final flag of the input. At the input, it indicates the last indices block of a ray. The fetcher issues the final flag at the output only for the last primitive of this block, ultimately marking the last primitive of the leaf node of a ray.
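The starting position inside an index page and the number of pages to request follow from simple arithmetic, sketched below under the stated assumption that each page holds four primitive indices. The function and field names are illustrative.

#include <cstdint>

// Each RAM data block ("page") holds 4 primitive indices stored consecutively
// over all leaf nodes, so a node's indices may start in the middle of a page.
struct PageRequest {
    std::uint32_t firstPage;    // index of the first page to fetch
    std::uint32_t startOffset;  // position of the node's first index in that page
    std::uint32_t pageCount;    // number of pages (= ray copies) to request
};

PageRequest planIndexFetch(std::uint32_t primIndicesOffset, std::uint32_t nPrims) {
    PageRequest r;
    r.firstPage   = primIndicesOffset / 4;
    r.startOffset = primIndicesOffset % 4;
    r.pageCount   = (r.startOffset + nPrims + 3) / 4;  // round up to full pages
    return r;
}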

kdprimitivefetcher

The kdprimitivefetcher fetches the primitive data from RAM and delivers the 3 points of a triangle.

This entity uses one kdcachedfetcher instance for fetching. Besides the ray data, the primitiveid, a final flag, and the number of the cluster this ray is contained in are passed through. The cluster number is used to identify a ray without the need for the rayid. Cluster numbers are used later on by the kdcompare entity (Section 4.3.7) for more efficient comparisons. The address of the requested primitive and the sceneid are used for kdcachedfetcher’s address inputs. Similarly to the kdprimindicesfetcher entity, the kdprimitivefetcher needs to be in in-order mode because a split ray copy of a leaf node could otherwise be interchanged with one of another leaf node, making it impossible to merge the ray back properly.

The main output port from the kdcachedfetcher is stalled if the primitive fetcher is waiting to forward the output. Thus the primitive fetcher just controls when the kdcachedfetcher should forward the output and does not have an additional buffer.

The primitive fetcher has two modes for outputting the primitive data including the ray data. In normal mode, which is configured by setting the generic SYNC_OUTPUT to ‘0’, each output is forwarded when the data is ready.

If the generic is set to ‘1’, the primitive fetcher does its work in combination with other primitive fetchers, syncing the outputs. This mode ensures that primitives intersected in parallel that are intended to be in the same cycle remain so, by stalling all fetched primitives until all of a cycle can be forwarded. For this synchronization, the kdprimitivefetcher uses additional input and output signals. All kdprimitivefetchers exchange a few outputs with the other instances before the actual forwarding of the outputs is performed. The exchanged values are the rayid, the primitiveid, and the validity signals of every fetcher. These outputs are compared with the reference values handed over at the input to identify the readiness to return the desired values. Only if the expected and current values are equal do all kdprimitivefetchers forward the data in sync (which is ensured as all expected values are the same for all fetchers). The synchronization makes it possible to use multiple fetchers and triangle intersection instances in parallel, because in such a case it has to be ensured that split-up rays cannot be mixed up and bring the design into an undefined state.


4.3.4 kdmemmerger

The connection from the RAM interface to all fetchers is established by one instance of the kdmemmerger entity. All three types of data fetchers are connected to this merger such that the actual transfer with the RAM is centralized. Figure 4.8 shows the use of the three port types and the merging to two ports for RAM access. It should be noted that this number is fixed to two ports to optimize for the actual FPGA used.


Figure 4.8: The connection scheme of kdmemmerger. Each type of fetcher has its own array of ports. Depending on the number of fetchers, different numbers of ports per array are used. Only the two ports to the RAM are fixed.

The kdmemmerger has a distinct buffer for every port to stall requests if the RAM ports are busy. Although this can absorb a sizeable amount of requests for a few cycles, it is assumed that the number of RAM ports—and of course their overall throughput—is sufficient to deliver enough data on average. The kdmemmerger issues no ready signal, so it cannot prevent an overflow if the number of RAM ports is not sufficient for the number of fetchers used. Because of the big buffers used, this is only an issue if the RAM or the interface to it is too slow. If FPGRay issued too many requests, this could only be prevented by reducing the throughput of FPGRay itself (e.g., by lowering the number of stacks).

4.3.5 kdstack

Each ray needs its distinct stack for saving nodes that are candidates for future evaluation. As kdintersect (Section 4.2) needs to compute many rays in parallel for a good utilization,


a high number of stacks is needed. To use multiple stacks, the kdstack entity provides a parameterizable number of stacks. The abstraction mentioned in kdscheduler (Section 4.3), which mimics a single kdstack instance even when multiple instances are used, is technically implemented in kdscheduler and kdstack together. Other optimizations of the stacking logic that are described further below, however, are entirely implemented inside of kdstack and are only configured by other entities instancing it. The stack consists of three entities:

• kdstack: Main entity containing the whole logic needed for stack allocation. It also includes the instances of the other two entities.

• stackcommunication: Entity connecting multiple stacks together.

• stack: The basic entity containing one stack and the needed logic for maintaining the stack pointer.

Depending on the parameterized latencies of the basic floating point operations and the number of kdintersecttri (Section 4.3.2) and kdinnernode (Section 4.3.1) instances used, many stacks can be needed.

kdstack. The main entity is responsible for arbitrating all read and write accesses to the stacks that happen in any entity used in kdintersect (Section 4.2). As every kdinnernode instance (Section 4.3.1) needs to write to a stack, each instance has a distinct port to kdstack. It is used to write the second child of a node onto the stack if it needs to be evaluated during tree traversal. Besides the node index, the minimum and maximum extent of that node and the stack index are handed over by kdinnernode as input. For every kdnodefetcher, 2 read ports are needed. The reason is that every kdnodefetcher can output 2 leaf nodes in parallel. If these nodes contain no primitives at all, all rays would need to read the next node from the stack to traverse the k-d tree further. Additionally, for every kdintersecttri instance, its own port to read from the stack is needed. For each reading port, the stack index is needed as input and the output consists of the node index, minimum and maximum extent, and a zero bit signalling an empty stack. In case of an empty stack, the values must be ignored. See Figure 4.9 for the connections to the other entities of kdintersect. kdstack also provides reset ports. For every possible finished intersection, one port is provided. Additionally, for every kdnodefetcher two ports are needed. They are needed in case a ray fetches an empty leaf node that is also the last node of the traversal (resulting in the end of the traversal and thus the release of the stack). If a ray has finished its evaluation through kdintersect, its stack index is handed over to properly reset the corresponding stack. This is done by setting the stack pointer of this stack to zero. The input of every reset is only the stack index and there is no return value.



Figure 4.9: The connection scheme of kdstack, showing ports of read (r), write (w), and reset (rst) access type. * is the number of ports of the comparison instance (kdintersecttri + 1, or one if kdintersecttri = 1); + is the number of ports one kdleaf instance (Section 4.3.6) has (of each of the two needed types): for full kdleafs it is 8 or 16 (2 ports from every kdinnernode instance, where 4 or 8 are connected to each full kdleaf, depending on the configuration) and for the last kdleaf it is 2×(remaining kdinnernode instances)+1.

Stack Allocation. kdstack maintains the number of available stacks as well as the allocation and releasing of stacks. For signalling the availability of a free stack, an output bit signal is provided together with an index pointing to the next free stack. If the free stack bit is false, the indicated free stack is in fact not available and using it would result in a collision. The allocation of a stack must be indicated by a dedicated signal. The release of a stack is done by issuing a reset through one of the reset ports, which also resets the stack pointer. In contrast to the reset, an allocation can be done only once per cycle. Precisely, it is issued only by the newray buffer in kdscheduler, when a ray leaves the buffer towards the innernode chain (see Figure 4.4).
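The allocation handshake can be modelled as a small pool: kdstack advertises one free index together with a valid bit, the newray buffer claims at most one stack per cycle, and a reset returns the stack by clearing its stack pointer. The following sketch is a behavioural software model of that handshake, not the VHDL implementation.

#include <cstdint>
#include <vector>

class StackPool {
public:
    explicit StackPool(std::size_t numStacks)
        : inUse_(numStacks, false), stackPtr_(numStacks, 0) {}

    // Output of kdstack: is a stack free, and which index would be handed out?
    bool freeAvailable(std::uint32_t &freeIndex) const {
        for (std::uint32_t i = 0; i < inUse_.size(); ++i)
            if (!inUse_[i]) { freeIndex = i; return true; }
        return false;  // "free stack" bit is false: using the index would collide
    }

    // Allocation signal: issued at most once per cycle by the newray buffer.
    void allocate(std::uint32_t index) { inUse_[index] = true; }

    // Reset port: releases the stack and clears its stack pointer.
    void reset(std::uint32_t index) { inUse_[index] = false; stackPtr_[index] = 0; }

private:
    std::vector<bool> inUse_;
    std::vector<std::uint32_t> stackPtr_;
};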

Multiple kdstack Instances. The mentioned configuration where multiple kdstack instances mimic a single instance is used to further optimize the routing. Each kdstack instance gets an equal number of stacks, which it exclusively maintains and instantiates internally. The conversion of the stack addresses is then done in kdscheduler. Outside of this conversion, a single instance is mimicked and each kdstack instance is treated as if it were the only instance. The connection scheme seen by other instances is therefore the one illustrated in Figure 4.9.

A Stack Element. If a read or write operation occurs, kdstack uses 3 cycles to perform the operation on the stack. This is due to technical reasons to save resources on the FPGA (see Section 4.7.8 for further information). Each variable is written in its own cycle, resulting in an increase or decrease of 3 entries on the stack for one stack operation. Figure 4.10 shows how the variables are stored on the stack.

kdstack does not directly perform the stack operations, nor does it instance the memory logic itself.


Figure 4.10: The layout of a stack element on the stack: entry 0 holds the node index, entry 1 holds tmin, and entry 2 holds tmax.

For this, it instances the stackcommunication entity where the read, write, and reset operations are processed. For technical reasons, each stack has a maximum depth (i.e., number of elements) of 85, which means a deeper k-d tree is not supported. As this is even more than pbrt’s maximum of 64 entries, a bottleneck should not be expected in this regard.

stackcommunication

This entity is responsible for generating instances of the stack entities. It connects them together in parallel, distributing access to them via the same connections. If data needs to be returned, the stackcommunication traces which stack needs to return data to the kdstack. At the cycle the returned data is available, the corresponding stack’s output is routed to the kdstack. As many stacks can be needed, an efficient routing between the stacks is crucial. Due to this, the communication is encapsulated in a distinct entity. This enables easy code changes of this connection to optimize timing or resource usage without changing the actual stack logic.

stack

The stack entity is the actual implementation of the stack memory with its stack pointer. It should be noted that this memory is not part of the RAM but rather on-chip memory, exactly like the buffers. In the context of the kdstack, one element requires three entries, i.e., if kdstack writes one element to the stack, three entries are written—for the stack these are distinct elements. This means the stack can be used with a different number of entries per stack element.

4.3.6 kdleaf

This entity summarizes the entire logic needed to evaluate a leaf node and decides where to hand over the processed node further. It is possible to have more than one instance of

kdleaf in a kdintersect instance (Section 4.2). To learn more about how the number of instances is obtained or how to influence it, see Section 4.7.9. Figure 4.11 depicts kdleaf’s entities, its own logic, and the internal connections between them. Depending on the number of primitives a leaf node possesses, different paths are used through the kdleaf entity. One kdprimindicesfetcher instance (Section 4.3.3) is needed to fetch the indices of multiple primitives in case a leaf node holds multiple primitives to intersect.


Figure 4.11: The internal scheme of kdleaf. Depending on the number of primitives a node has, three paths can be taken. The path on the left is taken by nodes with no primitives at all; the path on the right goes directly from the buffers to the build output process if a node has only one primitive. The complete run through the rightmost path, with a kdprimindicesfetcher, is only done for leaf nodes with more than one primitive.

Path One: No Primitives. kdintersect’s (Section 4.2) k-d tree can have leaf nodes with no primitives at all (left branch of Figure 4.11). This means the node is already completely evaluated and the next node of the ray’s traversal can be fetched (by first fetching its node index from the stack). If there are no further nodes on the stack or the rtmax value of the ray is already lower than the node’s minimum extent tmin, the ray is forwarded to the final buffer. Otherwise, the ray is forwarded to the kdnodefetcher entities to load the next node from RAM.

Path Two: A Single Primitive. If a node has exactly one primitive in it, it is forwarded from the input buffers to the output process build output and is ready to be


processed by the arbiter in kdscheduler (Section 4.3), which ultimately forwards the data to the kdprimitivefetchers (Section 4.3.3). This is possible because the leaf node stores the address of the single primitive directly instead of making use of the primitive indices array.

Path Three: Multiple Primitives. If a node has more than one primitive in it, additional computations need to be done to be ready for the output processes. Concretely, one kdprimindicesfetcher is needed to fetch the primitive addresses of nodes with multiple primitives. If a node with multiple primitives occurs, it hands over the position of the first primitive’s index in the primindices array together with the number of primitives that should be intersected. The primitive indices are stored consecutively in the primindices array, starting at that first index. As for many primitives multiple index pages—each page (i.e., one data block fetched from the RAM) consisting of 4 primitive indices—need to be fetched, the prepare pif process is able to copy a ray multiple times to generate multiple fetching requests. It also computes the starting index inside a page, as indices belonging to different nodes can be stored in one index page. Independently of the possible copying of a ray, prepare pif marks the last copy (or the original if no copies were made) of a ray with a final flag. This signals the end of a ray to the kdcompare entity (Section 4.3.7), such that the comparison between intersection results can be finished. It should be noted that with one primitive (path two), no comparison is needed: Each ray is automatically marked with the final flag.

The data from prepare pif is forwarded to the kdprimindicesfetcher, which fetches the actual primitive addresses which are stored at the primitive indices. Similar to prepare pif, the ray can get copied again, as each index of the page can belong to the current node. If the page is the last of a node, the final flag marker is only set at the last primitive. The primitive addresses which are fetched are forwarded to the output process, where they take the same way as those of path two.

The output process build output at the end has two versions, depending on the corresponding arbiter type kdscheduler is using. One version is a simple process that forwards data directly; the other is a complex version that requires a buffer to operate properly with multiple kdleaf instances in parallel. For further explanations, please refer to Section 4.7.9.

Cluster Numbers. The more complex version of the build output process generates so-called cluster numbers. Each forwarded primitive gets such a number, which is unique per node (but not per primitive) within a cycle. The number indicates the group of primitives for which comparisons must be performed. This also means that in one cycle, comparisons of different rays with different leaf nodes can be done independently. The numbering and primitive positions are in order, so only neighboring primitives can belong to the same node.
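
As a rough software analogy (function and variable names are illustrative, not taken from the design), cluster numbers could be assigned like this: primitives forwarded in the same cycle keep the same number as long as they belong to the same leaf node, and the number increases whenever the node changes.

#include <cstdint>
#include <vector>

// Assign a cluster number to each primitive forwarded in one cycle.
// The primitives are already ordered, so primitives of one node are neighbors.
std::vector<uint32_t> assignClusterNumbers(const std::vector<uint32_t>& nodeIdOfPrimitive) {
    std::vector<uint32_t> cluster(nodeIdOfPrimitive.size());
    uint32_t current = 0;
    for (size_t i = 0; i < nodeIdOfPrimitive.size(); ++i) {
        if (i > 0 && nodeIdOfPrimitive[i] != nodeIdOfPrimitive[i - 1])
            ++current;                     // new node -> new cluster number
        cluster[i] = current;
    }
    return cluster;
}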


The simple buffer does not need such cluster numbers, as it is only synthesized for the case of one primitive being forwarded per cycle, so no distinction is needed in the comparison stage.

4.3.7 kdcompare

The kdcompare entity is used to extract the rtmax value of a ray after its intersection with the primitives of a node. For this task, the entity has to find the nearest intersecting primitive of the node, saving its primitiveid and rtmax value as the ray's data after the intersection. It should be noted that a comparison of the intersection of interest with intersections that are already done is not necessary here: kdintersecttri (Section 4.3.2) already gets two primitiveids (the current nearest intersection and the one which is intersected) as input besides the old rtmax value and outputs the nearer of the two. So the only comparison that must be done here is between multiple primitive intersections within a node. The kdcompare entity gets the ray with its primitiveid, the stack index, a cluster number, a final flag, and the data from the stack for fetching the next node as input. This input port is replicated for each kdintersecttri instance, which makes it possible to process multiple rays per cycle. For each of these ports there are two output ports: one forwards to the final buffer (Figure 4.4) and the other to the kdnodefetcher (Section 4.3.3) for further evaluation.

Plain Comparison. For designs using one kdintersecttri instance, an optimized plain comparison function is used. The complete comparison is always finished with a fixed latency of 3 cycles, beginning at the cycle where the last primitive of a leaf node was intersected. If a comparison is not finished, the function is able to store the intermediate result for multiple cycles where no new data appears.


Figure 4.12: The logic for comparing primitives over multiple cycles, including the state machine that is used. Note that only the feedback functionality is depicted here for simplicity.

The functionality of the feedback loop is depicted in Figure 4.12. The loop is built such that every possible state (i.e., single-primitive node, or begin, middle, or end of a multiple-primitive node) of the data flow can be forwarded through it. In case of leaf nodes with only one primitive, the data is transferred through both inputs of the comparison function in the "IDLE" state. If a leaf node contains multiple primitives, the loop's state changes to "CMPR" after the first primitive of the node—so it also forwards its data to both inputs of the comparison function—and switches back only after the final primitive was forwarded. In the "CMPR" state, one input of the comparison function always gets the result of the comparison of the last iteration as input. If no new data arrives, the other input also gets the comparison result as input to "store" the intermediate value in the feedback loop. When new data arrives, the newly finished primitive's intersection is compared with the stored comparison value. After the last primitive has exited the feedback loop (bringing it back into the "CMPR" state again if necessary), the plain comparison evaluates the final comparison result against the stack's data. The output is then forwarded to the final buffer (Figure 4.4) if the stack is empty or if the current ray's rtmax value is smaller than the node's minimum extent tmin. Otherwise, it is put back to the kdnodefetcher's input (of the intersection path, as seen in Figure 4.4). Please note that in practice, the functionality is performed over more than one clock cycle; the state machine operates only on the rtmax value, and the rest of the data is forwarded at the end. The plain comparison does not need cluster numbers and thus ignores them.

Complex Comparison. If multiple kdintersecttri instances are used, the complex comparison function has to be used. It uses comparison stages, each able to compare pairs of primitives and their comparison results. By comparing a pair and merging two rtmax values into the lower value, the number of rtmax values can be halved with each stage. Ultimately, this results in a number of comparison stages that is logarithmic in the number of kdintersecttri instances. It does not matter whether the primitives forwarded to this function in the same cycle belong to the same node, to different nodes, or to any mixture of them. Similarly to the plain comparison, the intermediate comparison value can be stored if primitives still need to be compared in future cycles (or if the entity needs to wait for new data). It would be possible to use this version for any number of intersection instances, even in case of one kdintersecttri instance. However, the plain comparison should be preferred in such situations as it is more efficient. This should only be seen as a side note, as the comparison function is chosen automatically depending on the set parameters. Please refer to Section 4.7.10 for more information about the complex comparison function.

4.4 Parameterization Recommendations

When FPGRay is to be adapted (e.g., to fit properly on a different FPGA model), a few recommendations ease the parameterization of kdscheduler (Section 4.3):

• Computing instances: The number of kdinnernode instances (Section 4.3.1) should be greater than the number of kdintersecttri instances (Section 4.3.2). This is due to the fact that a practical k-d tree uses a depth that is greater than the number of primitives in one node. This means that in order to intersect one primitive, multiple inner nodes need to be traversed. The design is therefore more limited by the number of kdinnernode instances than by the number of kdintersecttri instances.

• Cache sizes: The number of cache elements should be increased as much as possible (as timing is more limiting than the number of logic resources). Experiments also showed that RAM access is so slow that, when hardware resources are scarce, bigger caches can facilitate almost the same throughput as an additional computing instance would. For more in-depth information about the cache, refer to the implementation details of the cache (Section 4.7.6).

• Stacks: As explained before, a low number of kdinnernode instances limits the throughput more than the number of kdintersecttri instances. As a result, providing enough stacks to fully utilize the kdinnernode instances is sufficient. As a rule of thumb, it is sufficient to use twice as many stacks as the latency of the whole kdinnernode entity multiplied by the number of used instances (see the sketch after this list).

• Tradeoffs: When combining all of the recommendations above, the available resources of the FPGA should be utilized by first setting the number of computing instances and with it the number of stacks. Then, the size of the caches should be increased with the remaining resources. Experiments showed that designs with two or three kdinnernode instances need at least 64 cache elements in order to avoid an over-utilization of the RAM. If this number of cache elements cannot be synthesized on the FPGA, or when scenes with many primitives are used, it may be better to sacrifice a computing instance for more cache elements. Such a design can be faster because the faster data access from the cache increases the throughput more than an additional computing instance that, in turn, waits longer for data.
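
The stack rule of thumb from the list above can be written down as a small helper. This is purely illustrative (the real values are VHDL generics of kdscheduler); the function name is hypothetical.

// Rule of thumb for the number of stacks: twice the latency of the whole
// kdinnernode entity, multiplied by the number of used kdinnernode instances.
unsigned recommendedStackCount(unsigned kdinnernodeLatencyCycles,
                               unsigned kdinnernodeInstances) {
    return 2 * kdinnernodeLatencyCycles * kdinnernodeInstances;
}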

4.5 PCIe Driver

The driver accessing the PCIe interface is a Linux kernel module driver [ALK, LKM] and as such is written in C. Because of this, the data transfer between the API (Section 4.6)—which provides FPGRay's functionality to user programs—and the driver is done by making use of plain C structs. The driver gets pointers to arrays of such structs to gain access to the user program's data. As C structs store their members in consecutive memory locations, they can be used directly as data blocks. Accordingly, the DMA can copy the data block by block to the FPGA. This is in contrast to the classes and structs typically used by renderers; pbrt, for example, uses C++ classes for rays, which must be converted when using FPGRay. The driver running in kernel mode takes the pointers to data from a struct the API hands over. This is done every time a function call to the driver is performed. The supported functions are:


• Configure a transfer: Sets the number of data units to transfer, the operation that should be performed by FPGRay and the delays to use.

• Reset the FPGA: Sends a reset request to the resetter entity in the FPGA. After sending the acknowledgment, the FPGA waits briefly to safely complete the acknowledgment and then needs time to properly reset. To be safely ready again, the function therefore waits 500 ms before returning.

• Read a register: Reads a register of one of the reserved addresses in FPGRay. These registers deliver debug and profiling information about FPGRay. Note that this function could read output data of FPGRay or reset the FPGA but should not be used for that.

The actual data transfer and the desired operations are done by a different system call. This call just starts the operation and needs to be called after the configuration of the transfer. All calls from a user program are blocking, i.e., the driver stops the program’s execution until the data is computed and transferred back from the FPGA. To configure a transfer, the following parameters can be used:

• command: The 32-bit command for FPGRay containing the operation and the number of units (e.g., rays, memory lines) to process. The command is the value which is used by FPGRay’s state machine (FPGRayIF, Section 4.1) and is directly handed over.

• read and write pattern: The number of data blocks a unit contains (one 128-bit data block corresponds to one line of the PCIe interface or RAM), both in the sending direction to the FPGA and in the receiving direction. In the case of different numbers of data blocks per direction, the transfer needs to be adjusted to ensure maximum throughput while avoiding overflows in the overall system. This is done by using so-called patterns: the direction with the bigger data amount is transferred continuously, while the smaller one is transferred with pauses in between. For a k-d tree intersection (kdintersect, Section 4.2), two 128-bit blocks are needed as input to the FPGA and one as output. This results in a continuous read and a write only every second data transfer period (see the sketch at the end of this section).

• offset parameter: This parameter defines how many DMA operations (i.e., data transfer periods) should be read-only until the first write operation is started (which may then run simultaneously, depending on the read and write pattern and the total number of transfer periods needed). Because each transfer period consists of the transfer of 1024 data blocks (read, write, or both simultaneously), the write-back starts after the read of the (1024 × offset parameter)-th data block.

• source and destination pointer: Marks the beginning of the memory spaces a user program wants to get data taken from or written to.


Depending on the function and the command with which FPGRay is used, not all parameters are needed. If a pointer is not needed, the null pointer can be used.
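
The interplay of read pattern, write pattern, and offset parameter can be sketched as follows. This is only an illustration of the scheduling described above (the real scheduling is done inside the driver); for kdintersect, two read blocks and one write block per unit lead to a continuous read and a write in every second transfer period, with the write-back starting only after offset read-only periods.

#include <cstdio>

int main() {
    const unsigned readPattern  = 2;   // 128-bit blocks per unit sent to the FPGA (kdintersect)
    const unsigned writePattern = 1;   // 128-bit blocks per unit received from the FPGA
    const unsigned offset       = 2;   // read-only transfer periods (1024 blocks each) before writing

    // Illustrate which directions are active in the first few transfer periods.
    for (unsigned period = 0; period < 6; ++period) {
        bool doRead  = true;           // the direction with more data runs continuously
        bool doWrite = (period >= offset) &&
                       (((period - offset) % (readPattern / writePattern)) == 0);
        std::printf("period %u: read=%d write=%d\n", period, doRead, doWrite);
    }
    return 0;
}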

4.6 API

To make the usage of FPGRay more convenient, this project does not stop at the driver. The API is used to abstract the hardware and also the hardware-near implementation of the driver. It provides a simple interface for using FPGRay in software by instantiating a C++ class. The API itself is written in C++ but makes use of the data structures specified in the C driver. This is owed to the fact that the driver is programmed in C and the data structures should be handed over efficiently. All functions of the driver can be accessed through the API. The following functions are available:

• bool connected(): Indicates whether the FPGA card is properly connected via the API.

• int resetFPGA(bool output): Resets the whole card including the RAM. Returns 0 on success, 1 otherwise. Set output to true to print the resetter instance's response (0x19E5E7 will be sent by FPGRay if the reset was issued correctly).

• int readRegister(int addr): Returns the register’s content at address addr. Only the offset address is needed as the function starts at the base address of the FPGRayIF instance (Section 4.1).

• int loadScene(int *src, int numunits): Writes numunits 128-bit wide data blocks, read beginning from src+1, to the RAM of the FPGA. The starting target address is the content of src plus one and is incremented for every data block. This is the function that should be used for loading scene data to the RAM. The increment of the base address is done because FPGRay interprets the first (offset) address 0 as a null pointer.

• int storeRAM(int *dest, int address, int numunits): Reads numunits 128-bit wide data blocks from the FPGA's RAM to dest. The copy operation begins at position address. As this function is meant for debugging/raw reading, the address is not incremented by one as in loadScene().

• int processRaw(int *dest, int *src): Transfers data beginning at location src+1 to FPGRay and the resulting data returned from the FPGA after processing to dest. As this function uses the raw mode of the underlying driver function, the command needs to be present at the first position of the data field src.

• int processFunc(int *dest, int *src, int numunits, uint8_t func, uint8_t read_pattern, uint8_t write_pattern, uint8_t writeafterreadoffset): Performs the DMA transfer and lets the operation func compute on the FPGA. The parameters that are used are explained in Section 4.5. More in-depth information can be found in Section 4.7.11. This method is used by nearly all of FPGRay's method calls, which just call this function with properly set parameters.

• traverseNode/intersectPrimitive/intersectTriangle(int *dest, int *src, int numunits): Use the instanced kdinnernode (Section 4.3.1), kdintersectsp, or kdintersecttri (Section 4.3.2) for intersecting a ray. Essentially, they all only call processFunc() with proper parameters and do not check if the operation is synthesized by design (this needs to be done manually by reading the function register of FPGRay).

• int kdIntersectChgScn(...): This function is used to set the k-d tree scene’s root node and the addresses of RAM locations for the kdintersect instance (Section 4.2). This is done by a struct which is handed over to processFunc().

• int kdIntersect(int *dest, int *src, int numunits, int readafter): This function uses FPGRay’s kdintersect to intersect rays found in src together with the loaded scene in the RAM of the FPGA. With the readafter parameter, the offset parameter of the PCIe driver can be configured. Similarly to kdIntersectChgScn(), it does not check if kdintersect is synthesized in the current design.

The memory blocks handed over via src and dest need to be arrays of plain C structs which are provided in the API. The usage of FPGRay via the API is simple, which is exemplified with this code:

#include <iostream>   // for std::cout
#include <cstdlib>    // for malloc
#include "FPGRay.h"
using std::cout;
/*
 * The data of the scene needs to be present in those blocks already (so
 * these variables need to be defined and filled outside of FPGRay's API):
 *   primmemoryblock, nodememoryblock, primindicesmemoryblock
 * Furthermore, the following variables need correct values and also need
 * to be defined outside of FPGRay's API and this example already:
 *   num_primitives, num_nodes, num_primitiveindices, worldspminx, worldspminy,
 *   worldspminz, worldspmaxx, worldspmaxy, worldspmaxz, rootnodeIndex,
 *   rootnodeAboveChild, primbaseaddr, SplitPos, primindicesbase, SplitAxis
 */
int main() {
    FPGRay *hw = new FPGRay();          // connect to FPGRay
    if (!hw->connected()) return -1;    // check if connected
    hw->resetFPGA(true);                // reset the hardware

    // load the scene into FPGRay's RAM
    hw->loadScene((int *)primmemoryblock, num_primitives);
    hw->loadScene((int *)nodememoryblock, num_nodes);
    hw->loadScene((int *)primindicesmemoryblock, num_primitiveindices);
    // set the scene information for kdintersect
    hw->kdIntersectChgScn(worldspminx, worldspminy, worldspminz, worldspmaxx,
                          worldspmaxy, worldspmaxz, rootnodeIndex, rootnodeAboveChild,
                          primbaseaddr, SplitPos, primindicesbase, SplitAxis);

    // allocate the memory for the number of rays an intersection should be done for
    struct inIntersectRay *input = (struct inIntersectRay *)
        malloc(sizeof(struct inIntersectRay) * num_units);
    input[0].ox = 21.42;                // and so on to fill in the data of the rays
    // allocate the memory for the output
    struct outIntersectRay *output = (struct outIntersectRay *)
        malloc(sizeof(struct outIntersectRay) * num_units);

    // perform the operation
    hw->kdIntersect((int *)output, (int *)input, num_units, 2);

    cout << "first rays rtmax is " << output[0].rtmax;

    delete hw;                          // disconnect from FPGRay
    return 0;
}

Let's take a closer look at a few lines:

FPGRay *hw = new FPGRay();          // connect to FPGRay
if (!hw->connected()) return -1;    // check if connected
delete hw;                          // disconnect from FPGRay

After creating a new instance of the FPGRay class from the API, the PCIe card should be connected. With the method connected(), this can be checked. Likely causes for not getting a connection are the absence of the hardware design on the FPGA (trying to connect to the "empty" card with no design loaded), the card not being connected, or a missing installation of the driver. To close the connection, the class needs to be destroyed to properly disconnect from the card. If no dynamic memory allocation is desired (e.g., in a small method calling FPGRay), it is possible to instantiate the card in the current scope with FPGRay hw; like every other C++ class.

hw->loadScene((int *)primmemoryblock, num_primitives);

hw->kdIntersectChgScn(worldspminx, worldspminy, worldspminz, worldspmaxx,
                      worldspmaxy, worldspmaxz, rootnodeIndex, rootnodeAboveChild,
                      primbaseaddr, SplitPos, primindicesbase, SplitAxis);

With the loadScene() method, arbitrary data can be loaded into the RAM of the FPGA. Here, the method is used to load the scene data: one call for the primitives, one for the nodes, and a third one for the indices of the primitives. The loading of data can of course be split further such that these three types are loaded in parts. For bigger scenes this would be necessary, as at most 268,435,440 bytes can be loaded at once.4 The kdIntersectChgScn(...) method is used to forward the additional parameters computed during the scene generation beforehand (outside of this example and the FPGRay API) to the kdintersect entity. rootnodeIndex, primbaseaddr, and primindicesbase identify the memory locations of the scene data.

4 In practice, this is no problem at the moment, as the PCIe card used for the thesis only has 256 MiB RAM anyway.

struct inIntersectRay *input = (struct inIntersectRay *)
    malloc(sizeof(struct inIntersectRay) * num_units);

FPGRay uses C structs (see FPGRay.h for exact definitions) to send rays and receive the computed results. Multiple rays can be sent at once by using an array of those structs. This can be done with static or dynamic memory allocation; for more than a few rays, the code above (using dynamic memory allocation) must be used to avoid a stack overflow. The struct that has to be used changes depending on the performed operation and on the direction (in/out). The structs can be filled with data and read like any other struct to easily set the data for FPGRay and fetch the results back:

input[0].ox = 21.42;
input[1].rayid = 14;

cout << "first rays rtmax is " << output[0].rtmax;
cout << "second rays rayid is " << output[1].rayid;
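
Because these structs are plain C data, an array of them forms one contiguous memory block that the API and, underneath it, the driver (Section 4.5) can hand to the DMA directly. The sketch below is only an approximation of such a ray struct: the field names follow the example above, but the authoritative layout (field order, exact types, padding) is the one defined in FPGRay.h.

#include <cstdint>

// Approximation of a ray input struct (see FPGRay.h for the authoritative definition).
struct inIntersectRayApprox {
    uint32_t rayid;        // ray identifier
    float ox, oy, oz;      // ray origin
    float dx, dy, dz;      // ray direction
    float rtmax;           // maximum ray parameter
};

// An array of such structs is one contiguous block; its pointer (cast to int*)
// is what the API methods operate on, e.g.:
//   inIntersectRayApprox rays[64];
//   // fill rays[0..63], then hand (int *)rays to the intersection method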

The actual operation is done by calling the corresponding method. The needed parameters are the starting addresses of the source and destination memory blocks as pointers and the number of units on which the computation should be performed. Note the cast to int* for the arrays of structs:

hw->kdIntersect((int *)output, (int *)input, num_units, 2);

After the call of this method, the results are stored in the output memory block. Note that the memory blocks have to be allocated beforehand to avoid any errors.5 We omit the explanation of the extraction of a scene from pbrt but show an actual snippet for performing a ray-intersection test:

bool KdTreeAccel::Intersect(const Ray &ray, SurfaceInteraction *isect) const {
    ProfilePhase p(Prof::AccelIntersect);
    FPGRay *hw = new FPGRay();              // connect to FPGRay
    // set values from the ray
    struct inIntersectRay input;
    input.rayid = 1;
    input.ox = ray.o.x;
    input.oy = ray.o.y;
    input.oz = ray.o.z;
    input.dx = ray.d.x;
    input.dy = ray.d.y;
    input.dz = ray.d.z;
    input.rtmax = ray.tMax;
    // prepare the memory block
    struct outIntersectRay memblock;
    // let the FPGA do the computation
    hw->kdIntersect((int *)&memblock, (int *)&input, 1, 1);
    if (memblock.intersected == 0) return false;

    // intersection == true -> compute surface interaction
    const std::shared_ptr<Primitive> &prim = primitives[memblock.primitiveid];
    prim->Intersect(ray, isect);
    return true;
}

5 Similar to a segmentation fault, a crash of the program will happen without proper allocation.

This intersection method is the alternative version of the original KdTreeAccel::Intersect(...) method that is accelerated by FPGRay. The code is similar to the example above and much smaller than the original intersection method. It should be noted that the resulting primitive, found by FPGRay, is intersected again by the pbrt method to properly populate pbrt’s intersection object.

4.7 Implementation Details

In this section, we provide a deeper look into FPGRay. We discuss details about how the components work to justify some design decisions. We also give additional information about some key parameters and how they change the hardware design. As the details refer to the information of the higher-level discussions in the previous sections, reading the previous sections before this one is highly recommended.

4.7.1 Hardware Design

As stated, FPGRayIF (Section 4.1) does not handle the physical and protocol-specific communication of the PCIe interface. For this, the HardIP of the FPGA is used, together with additional IPs, such as an on-chip memory for storing DMA requests. Besides FPGRayIF (which is the top entity, encapsulating all functionality with respect to ray tracing), there is another entity used for interfacing with software, the so-called resetter. The Qsys integration tool provided by the Quartus EDA from Altera is used to properly connect all those components; see Figure 4.13 and Figure 4.14 for a view of the used Qsys system. Two reasons speak for the Qsys approach. First, with Qsys, we do not need to build the arbitration between multiple entities connected to the same bus. There is no need to connect interfaces together in plain VHDL and thereby run the risk of needing additional translation code to properly connect different entities; instead, all needed translation is done implicitly by the bus used in Qsys. The second reason for Qsys is the work saved by building upon the reference design. The used Qsys system is based on a demo showing the DMA operation between the PC and the FPGA's RAM, so the only modification needed on the hardware side was the removal of the RAM controller and the inclusion of FPGRayIF and the resetter in its place.


The connection between the PCIe IPs, FPGRayIF, and the resetter is done inside the Qsys system by using the Avalon memory-mapped interface, with one bus connecting all entities. As the interface only needs to perform basic data transfers, no experiments were carried out with additional features of the PCIe interface (e.g., interrupts). The complete hardware design consists of the top design file instancing the mentioned Qsys system (Figure 4.13 and Figure 4.14) and the DDR3 RAM controller, which is instanced outside of the Qsys system. It should be noted that there is no arbitration or adapter used between the Qsys system and the DDR3 controller, as the only entity using it is FPGRayIF (i.e., this connection is an exported conduit from Qsys connected to the controller's conduit). Besides the internals of the used IPs, the top design file (taken from the reference design) is the only code written in Verilog; all other code is written in VHDL. To have debugging capabilities even without a PCIe connection, additional code was added in the top design file of the project to make use of the FPGA card's Light-emitting Diodes (LEDs). The design was also extended to allow the reset of the card from within the Qsys project. This enables the user to reset the card via software through the resetter entity. As the reset button is located on the PCIe card and therefore difficult to access, this is the only viable possibility for performing a reset. All IPs used in this thesis come from Altera. Additionally, all computational logic, including the logic written on our own, has a throughput of one. Entities which rely on RAM access times—which is the case for all fetching entities—can have a throughput lower than one. In contrast to the throughput, the latency depends on the entity (and the set parameters).

Accessing the Main Entities. To get access to FPGRayIF and the resetter via software, each has been assigned an exclusive address space (starting at the "Base" address specified in the columns of Figure 4.13 and Figure 4.14). This means that if the PCIe interface gets data for a specific address, the data is sent to either FPGRayIF or the resetter, depending on the applicable address space. Writing to or reading from an address mapped to one of these spaces transfers data from and to the entities instead of transferring data to the PC's RAM. Because of this behavior, such an interface is called memory-mapped. FPGRay should actually get one ray after the other and return the performed intersection results in the same order. This is the behavior of a streaming interface, but the use of a memory-mapped interface instead of this desired interface has a simple reason: The provided IPs for PCIe only allow the use of the memory-mapped mode. Making use of the streaming behavior would require much more work; in particular, it would require extensive knowledge about the PCIe specification. The resetter is an entity listening on the PCIe interface's bus for a reset command from software. Additionally, a distinct port exists that is directly (i.e., without a conversion from Qsys) connected to FPGRayIF. Over this port, the resetter receives an alive signal. This signal is used to feed a timer inside the resetter, which automatically resets the FPGA after getting no alive signal for about 10 seconds. This feature is generally known as a watchdog timer. It is used for resetting the FPGA if the PCIe interface is blocking (e.g., because of a corrupted DMA transfer) and a reset cannot be sent from software.


Figure 4.13: The part of the Qsys system FPGRay is composed of that is needed for the PCIe communication. clk_0 is a clock source that uses a pin of the FPGA via an exported conduit. For the configuration of the PCIe-controller HardIP pcie_256_hip_avmm_0, the two components at the top are needed. Note the connections which are made via the Avalon bus ("Avalon Memory Mapped Master/Slave" in the "Description" column). They are indicated by the lines in the "Connections" column and carry the data transfer between all of FPGRay's components.


Figure 4.14: The part of the Qsys system FPGRay is composed of that contains the FPGRay-specific components FPGRayIF_0 and resetter_0. Additionally, a small on-chip memory (to store DMA requests) can be seen at the top, including a second clock source clk_1 that FPGRay's components use. Note the connections which connect their Avalon bus interfaces to those of the PCIe components shown in Figure 4.13.


It should be noted that FPGRayIF and the resetter always return the empty word (all bits set to '1') on read access when no data is available (which is always the case for the resetter). Only when a function is currently processing data does the read block until the accessed address's data is available. An exception to this rule are special registers listening on reserved addresses that always forward the register's content on access. The reset command at the resetter is such a special register (and, besides the activation or deactivation of the watchdog via another special register, the resetter's only functionality). When no function is performed, both entities ignore any received data with no command (i.e., all bits of a data block set to '0'). This behavior ensures that reading and writing can always be done safely, even when no operation should be performed on the transferred data. This enables the driver to use data blocks of fixed size (1024 per DMA operation) and still allows an arbitrary number of data blocks with actual content, which facilitates simpler code for the driver. As the driver writes and reads empty data, accepting and returning it is crucial in order to avoid blocking a read, which could crash the host PC.

4.7.2 FPGRayIF

FPGRayIF (basic explanations in Section 4.1) holds the state machine for correctly routing input data to the entities that perform the ray-tracing-related operations. It not only instantiates those entities but also contains the output buffer for providing finished data to be read back by software, as well as logic to ensure the data is kept in order.

Resorting. To translate between the memory-mapped interface and the actually desired streaming behavior, FPGRayIF possesses some extra logic. On the input side, an additional buffer is used for all data transfers—except for those belonging to kdintersect's (Section 4.2) operation6. It transparently re-sorts data arriving out of order from the PCIe interface. Out-of-order data occurs only seldom and the magnitude of the displacements tends to be small, so a very small buffer can be used here. The reason for the resorting lies in the PCIe interface specification, which does not require the order of requests (i.e., data transfers) to be preserved. But as the DMA sends requests in order, the transfers arrive nearly in order. At the output side of FPGRayIF, results are stored in a buffer. As it allows random access, the PCIe interface can read data out of order directly, without the need for an additional buffer. The resulting data is written into the output buffer in order. The only exception is kdintersect, which uses a resorting logic to fill the output buffer out of order. This needs to be done as individual rays of a k-d tree intersection can get shuffled due to the different number of operations an intersection of a ray can take. The resorting logic therefore tracks the ray order and returns each result at the correct position in the output buffer. As FPGRay uses its own internal ray IDs for sorting, the used rayids can be chosen arbitrarily.

DDR3 Controller. FPGRayIF directly connects to the DDR3 RAM controller in the top design file. The connection is done via the fabric interface provided by the HardIP. It enables the usage of two Avalon memory-mapped interfaces with the frequency at which FPGRayIF operates, transparently transferring between the clock domains of FPGRayIF and the DDR3 interface. The DDR3 controller runs at 533 MHz, which is the maximum the controller supports.7 The interfaces are connected to kdintersect and additionally, the first interface of the RAM has a second connection to the input state machine of FPGRayIF on the request side and a connection to the output buffer on the response side. This connection is used to load scene data from the PC to the RAM. Note that requests to the RAM are buffered.

6 As kdintersect returns rays out of order, a resorting needs to be done anyway. kdintersect therefore marks the rearrangement at the input and resorts these changed positions at the end only.
7 FPGRayIF (and with it all computational logic concerning ray-tracing operations) runs at 100 MHz.


I/O. The part of FPGRayIF providing the output logic is completely independent of the input state machine and does not use a state machine itself. By designing the entity like this, it is possible to perform operations in arbitrary order. For example, if a scene has been loaded and intersection operations have been performed, further intersections can be done or a new scene can be loaded. It is still possible to read the data of the first operation even if the input state machine is already performing another operation. Additionally, further operations could easily be added by adding a new case to the state machine and routing the new operation to a new entity—the output logic stays as is and only needs to know the number of data blocks that should be awaited, as for any other operation. The communication protocol of the state machine is as follows. The first data block, containing 32 bits of data, encodes the command in the upper 8 bits; the lower 24 bits determine the number of data units on which the operation should be performed (see the small sketch below). Then, without any termination symbol, the data units are sent one after another. After receiving a data block, the state machine is in the state named after the operation that should be performed, routing any input to the corresponding function. Notice that for all operations, one data unit needs to be a multiple of 128 bits8. If no data needs to be sent (e.g., for just reading data from RAM), only the command is sent. As in practice the usage scenario only consists of performing one operation at a time, performing multiple operations simultaneously is not supported. Every operation has to be finished before starting the next one (but its result does not need to be read from the output buffer). The only exception is issuing the same operation again. This restriction is crucial because no arbitration between multiple simultaneous operations is done on the output side; with this approach, hardware resources are saved. It is also not a problem when using the provided driver, as it never performs multiple operations simultaneously.
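
The command encoding described above (upper 8 bits operation, lower 24 bits unit count) can be captured in a one-line helper. This is a sketch based on the description in the text, not code taken from the driver; the function name is illustrative.

#include <cstdint>

// Build the 32-bit command word: upper 8 bits select the operation,
// lower 24 bits hold the number of data units to process.
uint32_t makeCommand(uint8_t operation, uint32_t numUnits) {
    return (static_cast<uint32_t>(operation) << 24) | (numUnits & 0x00FFFFFFu);
}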

Supported Operations. By default, FPGRay is always able to perform the direct passthrough or inversion of input data (operations NOP, INV), write data (LOAD) to the card's RAM, or read from it (STORE). All other operations can be synthesized into the final hardware through constants in the FPGRayIF entity. Apart from the passthrough (which does not need many resources) and the RAM access (a basic operation), not synthesizing all operations can save hardware resources for other operations. As an example, the final hardware design for this thesis only contains the k-d tree intersection function KdINT (which additionally provides the function CHGSCN to load the k-d tree parameters into kdintersect) to spend all available hardware resources on it. All other operations9 are not present. However, except for kdintersectsp, all operations are used inside of kdintersect; the distinct use of those operations was implemented for testing purposes. If an operation is not synthesized in the current design, FPGRayIF returns the same amount of data as the called operation would, filled with a "not synthesized" pattern. This is done to avoid freezes.

8 This bit width for one data block is the bit width of the PCIe interface on the hardware side.
9 INTERS – intersect a sphere (kdintersectsp), INTERTRI – intersect a triangle (kdintersecttri, Section 4.3.2), TRAV – traverse an inner node of a k-d tree (kdinnernode, Section 4.3.1)

Besides the normal operation using the state machine and the output logic, FPGRayIF possesses additional registers residing at specific addresses at the upper end of the address range assigned to FPGRayIF. Because the addresses are in the upper part of the address range, they will not be accessed mistakenly during normal operation. The registers provide profiling and debugging information. For example, the operations synthesized and thus usable in the current design can be read through a register. Another example is a register tracking buffer overflows throughout the used instances. For a more detailed view of the synthesizable operations, reserved registers, and parameters, see the parameter list in Section 4.7.12.

It should be noted that the optionally synthesizable operation INTERS is the only synthesizable operation which is not part of kdintersect. kdintersectsp was the first primitive intersection entity, used to intersect a sphere instead of a triangle. It is not used anymore and is only available as a distinct operation. It is also the only logic which needs a fixed latency for the floating point operations (i.e., 1 cycle of latency for comparisons and 10 cycles for all other operations); this does not apply to other entities, as explained further in the following section.

4.7.3 kdintersect

kdintersect, as the entity containing all intersection-related entities, possesses some characteristics that are not discussed in the main section (Section 4.2). If no primitive intersection was found for a ray, its rtmax value is kept and the primitiveid with index 0 is returned. Furthermore, indices such as the primitiveid and the stackid are shifted inside kdintersect to reserve 0 as null pointer. For the primitiveid this convention is kept up to the output, so FPGRayIF (Section 4.1) converts it before forwarding it to the output buffer. In detail, there are two possible reasons for the reordering of rays inside of kdintersect. First, each ray can need a different number of nodes to traverse, and the number of primitive intersections is also not the same. Secondly, the rays can also get reordered when requesting nodes. This is due to the cache, which uses the out-of-order operation mode for nodes, returning data from the cache faster than from the RAM.

Dealing With Multiple Scenes. In all hardware designs used for this thesis, kdintersect can intersect rays against 16 different scenes in parallel. As in practice only two different scenes are intersected in parallel—at the point where the next animation frame is intersected—this number is sufficient. The use of a different scene for all following rays is done when the CHGSCN function is called from software. To allow parallel operation, the rayid needs to be unique over multiple scenes10. The bookkeeping of scenes is managed by the process set_scenecache (see Figure 4.3). kdintersect can also use a different scene for every ray. In that operation mode (USE_DEDICATED_SCNID), each ray must supply the index of the scene against which the intersection should be done.

10 Note that FPGRayIF generates its own rayids for proper resorting, which is why this is always fulfilled for kdintersect even when the software does not send unique rayids.

Changing IPs. For each floating point operation used inside kdintersect, different IPs can be used by changing only the implementation in FPGRay_pkg. The latency can be any natural number (in clock cycles) and can also be changed there. This enables the easy exchange of IPs to adapt the design to other FPGAs. Note that only one delay value can be used for all comparison functions. The commonly used 32-bit IEEE 754 standard is used to represent floating point values, but, as in many software ray tracers, the bit width can be changed. In this case, additional steps are needed: The IPs for floating point operations and all constants must be adapted. The used constants are also defined in FPGRay_pkg and comprise basic constants such as 0 as well as precomputed values for the gamma function used for intersecting triangles. Furthermore, FPGRayIF would need adaptations because many bit width values are hardcoded to allow simpler PCIe interfacing logic.

4.7.4 kdscheduler

kdscheduler has a special path for supporting k-d trees with primitives only (i.e., the root node is a leaf), which is not depicted in the overview diagram (Figure 4.4). This path forwards data, before it enters the newray buffer, to a special input port of a kdleaf instance (refer to Section 4.3.6; the last instance if multiple exist). The stack plays a central role for the flow control of kdintersect (Section 4.2). As a ray can only enter the computation from the newray buffer of the kdscheduler when a free stack is available, the number of stacks controls the number of concurrently computable rays. If there is only one stack, each ray can only start to traverse once the complete intersection of the previous ray is done. An exception is the use of primitive-only scenes. kdscheduler is the entity that is responsible for the flow control of the whole kdintersect entity: The ready signal of kdintersect depends on the occupancy of the newray buffer. The entity does not ignore data on the input port if it is not ready, which means it must be ensured that no data is fed while the ready bit signals zero, to prevent overflows. In contrast to the PCIe interface, the RAM must answer data in the same order as requested.

4.7.5 kdintersecttri

The intersection function of a triangle against a ray conforms to pbrt's implementation with one exception: The values e0, e1, and e2 are always computed with double precision, while pbrt does this only as a fallback if the accuracy of single precision is not sufficient.

kdintersecttri is able to compute the values b0, b1, and b2, which can be found in pbrt's code below the comment Compute barycentric coordinates and $t$ value for triangle intersection. The computations below this comment are not needed in kdintersect (Section 4.2), as they are not required for the actual task of finding the nearest primitive. Therefore, the values b0, b1, and b2 are synthesized out by default. They can be included if the computations performed on the FPGA should be further used in software.

4.7.6 kdcachedfetcher

The kdcachedfetcher can be used in in-order and out-of-order mode. Both modes use the main output port to provide fetched data for the following entities in a producer-consumer manner. This means that when data is available, it is presented at the output and the following entity has to acknowledge its use. With that behavior, the other fetcher types can stall the output without the need for a distinct buffer. In out-of-order mode, a second output port delivers data that is found in the cache (thus the main output port delivers only data fetched from RAM) without producer-consumer behavior, i.e., it just forwards the data.

Implementation Details. Internally, the fetcher stores all requests in a waitbuffer. A second buffer, the uniquewaitbuffer, is used to store only those requests that are sent to RAM. Those requests are unique, as only the first request of an address which is not in the cache is forwarded to the RAM; the response may therefore be awaited by multiple entries in the waitbuffer. As fetched data in general has high locality, the same data is requested often, which allows the use of a much smaller uniquewaitbuffer compared to the waitbuffer. For storing passed-through data, an on-chip FIFO is instanced. As the waitbuffer contains only a few elements, the minimum possible 256 elements for the FIFO are used. The cache itself is an on-chip RAM with a size that is determined by a generic (i.e., the number of cache elements of the fetcher). Other aspects that are parameterizable by generics are the number of elements the waitbuffer and the uniquewaitbuffer should have, and the bit widths of the memory index, the sceneid, the data that should be fetched, and the passed-through data. The combination of memory index and sceneid is used as a tag to identify the origin of a cache entry in RAM. This tag is checked to identify either a cache hit (element is in cache) or a cache miss (not found; fetch the element from RAM). As a sceneid can refer to a new scene after a scene change, the fetcher is notified of scene changes and flushes the entire cache—by deleting the tag array and resetting the clock algorithm explained below—in such a case11.

11 Finding and deleting only those entries belonging to the obsolete scene would have a performance impact on the tag array, which is already a timing-critical part of the cache. Therefore, the cache is flushed completely.


Clock Algorithm. kdcachedfetcher uses the clock algorithm [TR01] to replace cache entries with new data. The algorithm uses a pointer to a cache entry, which traverses all entries in a clockwise manner, and an array of bits indicating an access for each entry. See Figure 4.15 for an example. Initially, all bits are set to 0. If an entry was accessed (i.e., read or written from the cache), the bit of that entry is set to 1.

Figure 4.15: The function of the clock algorithm.

Given a cache with an access pattern as illustrated in Figure 4.15 (1a), the insertion of a new datum proceeds as follows. The pointer walks from its last position along the cache entries, finding the next cache entry which has its bit set to 0. On its way, all bits of value 1 are reset (Figure 4.15 (1b)). The bit of the entry which stores the new data is set to 1 and the pointer increments its position (Figure 4.15 (1c)). If all the bits in the access pattern are set (Figure 4.15 (2a)), the pointer walks around all entries once, ultimately finding the entry it started at, now with its bit set to 0 (Figure 4.15 (2b)). The fetched data is inserted in the found entry and the pointer is incremented by one position (Figure 4.15 (2c)).
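
A compact software model of this replacement policy, together with the tag check (memory index + sceneid) from the previous paragraphs, might look as follows. All names are illustrative; the real implementation is VHDL operating on on-chip RAM.

#include <cstdint>
#include <vector>

// Software model of the cache: a tag array plus a clock-style replacement pointer.
struct ClockCache {
    struct Entry { uint32_t memIndex; uint16_t sceneId; bool valid; };
    std::vector<Entry> entries;
    std::vector<bool>  referenced;   // access bit per entry
    size_t hand = 0;                 // the clock pointer

    explicit ClockCache(size_t n) : entries(n, Entry{0, 0, false}), referenced(n, false) {}

    // Returns the slot on a hit and sets its access bit, or -1 on a miss.
    int lookup(uint32_t memIndex, uint16_t sceneId) {
        for (size_t i = 0; i < entries.size(); ++i) {
            if (entries[i].valid && entries[i].memIndex == memIndex &&
                entries[i].sceneId == sceneId) {
                referenced[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Insert data fetched from RAM: advance the hand, clearing access bits,
    // until an entry with a cleared bit is found, then replace that entry.
    size_t insert(uint32_t memIndex, uint16_t sceneId) {
        while (referenced[hand]) {
            referenced[hand] = false;
            hand = (hand + 1) % entries.size();
        }
        const size_t victim = hand;
        entries[victim].memIndex = memIndex;
        entries[victim].sceneId  = sceneId;
        entries[victim].valid    = true;
        referenced[victim] = true;               // the new entry counts as accessed
        hand = (hand + 1) % entries.size();      // pointer moves one position further
        return victim;
    }
};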

4.7.7 kdmemmerger

For every cycle, the kdmemmerger forwards one request from a fetcher to one of the RAM interfaces. The requests can be buffered for each fetcher and are processed in a round-robin manner to avoid overflows and starvation. Two additional buffers store requests already sent to RAM to be able to properly route the responses back to the correct fetcher in a non-buffered manner. This entity needs RAM interfaces with streaming behavior, whereas the Altera HardIP DDR3 controller uses a memory-mapped interface. To overcome this, the input from the kdmemmerger to the RAM interface is buffered. On the response side, no buffering is needed, as the streaming and memory-mapped interfaces behave the same on this side. The kdmemmerger stores the base addresses (i.e., the starting addresses of the memory spaces) for all scenes. Besides the sceneid of a request, the base address also depends on the type of fetcher that has sent it (i.e., node, primitive, or indices fetcher). If a fetcher sends a request, the base address is added to the request's address, resulting in the absolute address of the RAM lines. With this separation, the fetchers use only abstract array indices without knowledge of the actual memory locations. For primitive fetchers, the kdmemmerger additionally splits a request into three RAM requests, as a triangle's data is constituted by three RAM lines.
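
The address translation can be modeled with a few lines. This is only an illustration of the scheme described above; the type and function names are hypothetical, and the assumption that a triangle's three RAM lines are consecutive (base + index, +1, +2) is mine, not stated in the text.

#include <cstdint>
#include <vector>

enum class FetcherType { Node, Primitive, Indices };

// Hypothetical base-address table: one entry per scene and fetcher type.
struct SceneBases { uint32_t nodeBase, primBase, indicesBase; };

// Translate an abstract array index into absolute RAM line addresses.
std::vector<uint32_t> toRamLines(const SceneBases& scene, FetcherType type, uint32_t index) {
    switch (type) {
        case FetcherType::Node:
            return { scene.nodeBase + index };
        case FetcherType::Indices:
            return { scene.indicesBase + index };
        case FetcherType::Primitive:
            // a triangle occupies three RAM lines, assumed consecutive here
            return { scene.primBase + index,
                     scene.primBase + index + 1,
                     scene.primBase + index + 2 };
    }
    return {};
}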

4.7.8 kdstack

The stack is needed to store a child node for further evaluation when another child node is evaluated first. This applies to every ray, so to be able to compute multiple rays in parallel, multiple stacks are needed. kdstack therefore supplies a parameterizable number of stacks and maintains them. It is crucial that each stack is only accessed once until the operation is completely finished, as the operations require multiple cycles. Violating this rule results in data corruption, as the stacks are not secured against parallel access. FPGRay waits the minimum needed number of cycles for avoiding collisions. This is ensured as the fetchers which are in between stack accesses have a sufficiently high latency and furthermore—in the case of a split ray—only a ray copy that is flagged as final accesses its stack.

Stack. The stack entity uses an on-chip RAM block as memory. As explained previously, one stack element needs 3 entries in the stack memory. This is done to save resources, as on the used Arria V FPGA one memory block has at least 256 entries but provides only up to 40 bits of data per entry. The split of an access into three cycles of operation does not add much latency, as most of the accesses can be anticipated and thus started earlier than needed. The stack pointer is also located in the stack entity, which means the entity has to indicate an empty stack to other entities. This is done by an empty signal. To preserve a constant latency for an empty stack as well, kdstack always performs three read accesses and does not check the empty bit. The stack has an underflow protection to allow such reads, but as the stack depth is very high in practice, an overflow protection is not used.

4.7.9 kdleaf

One kdleaf entity is instanced for every 8 kdnodefetcher instances (Section 4.3.3), processing only the leaf nodes coming from its 8 fetchers. By setting the DINDS (double indices) parameter to '1', the number of kdleaf instances is doubled, making each instance process the outputs of only 4 fetchers. The last instance, which may process fewer than 8 (resp. 4) fetchers if the number of instanced fetchers is not a multiple of 8 (resp. 4), additionally processes all root-only k-d trees (i.e., scenes where the root node is a leaf node). Those nodes are processed the same way as the other inputs, with the exception of being routed to the final output buffer of kdscheduler (Section 4.3) in any case, as there is no other node which could be evaluated afterwards. Please note that such root-only k-d trees are just a single node containing the whole scene by referencing all primitives in it; such trees are corner cases which should not be used in practice.

Single-primitive nodes are stored in the input buffer of the corresponding input port. They are kept in these buffers until they are requested by the merging output buffer at the end of kdleaf (build output in Figure 4.11).

Simple and Complex Output Buffers. The output buffer in a kdleaf instance (build output in Figure 4.11) communicates with the arbiter in kdscheduler (Section 4.3, see Figure 4.4). For each of the two versions of the output buffer, a matching arbiter is available as a counterpart. The simpler output is optimized for one primitive intersection instance and one kdleaf instance. The more complex version needs to be used when more than one primitive intersection instance or more than one kdleaf instance is used (but it could always be used). However, the simpler buffer should be used if possible, because the complex buffer uses many more logic resources. This is due to the fact that the on-chip memory buffer does not support more than one memory access per cycle, but the complex buffer needs multiple accesses.

The simple buffer just forwards one primitive of the kdprimindicesfetcher (Section 4.3.3) each cycle, as long as the fetcher has not finished the output of all primitives of a node. Otherwise, a single-node primitive is forwarded from an input buffer. This is done in a round-robin manner to avoid starvation.

The complex buffer needs a distinct buffer to prepare a certain number of primitives which are presented as output to the arbiter. The data flow is done in a producer-consumer manner: Only if the arbiter acknowledges the output are the next primitives from the buffer presented. This needs to be done as the arbiter might choose a different kdleaf instance to take data from. The number of primitives that are presented to the arbiter is the number of intersection instances used. In contrast to the simple buffer, the complex buffer fills its buffer such that single-node primitives can occur before and after primitives from the kdprimindicesfetcher, but never in between those primitives. It is also ensured that the buffer has enough empty slots to store primitives from the kdprimindicesfetcher to avoid data loss. An overflow protection is not used, as the buffer's size is chosen such that the arbiter takes primitives from it often and fast enough.

Note that the arbiter also ensures that a node is not split in case of multiple kdleaf instances, i.e., the arbiter only takes data from another kdleaf instance if the current instance has returned a completed node. Furthermore, data is fetched from the kdleaf instances in a round-robin manner to avoid starvation.


4.7.10 kdcompare

The complex comparison can be used for any number of synthesized primitive intersection instances. To efficiently compare multiple primitives which finished their intersection in parallel, comparisons are also done in parallel.

The Helper Entity. The merging of resulting intersections is done by multiple instances of the complexcmp_stage entity. It merges intersections as depicted in Figure 4.16. To be able to perform this operation, the entity needs the primitiveid, rtmax, the final flag, the cluster number, and the stack index position for each input port. The stack index position is the index of the port of the complexcmp_stage entity from which the stack's data should be used as stack data for the own port. complexcmp_stage has the same number of output ports, but fewer outputs than inputs are forwarded, as only merged outputs are forwarded. Note that even though primitives of different nodes can be merged in parallel, they must be consecutive, i.e., all primitives of node 1 and after that all primitives of node 2; interleaved primitives are not supported. In the best case, complexcmp_stage can merge n inputs to n/2 outputs (⌊n/2⌋ + 1 when n is odd) by comparing all primitives in pairs.


Figure 4.16: One instance of complexcmp_stage checks for inputs with the same cluster number (i.e., intersection results of the same node) and merges them in a pairwise manner. The entity can merge pairs of multiple nodes independently in the same cycle. Merged inputs are returned at the left of the two possible output ports, which is desired for the final comparison step. The merged result at the output is the primitiveid with the smaller rtmax value. Note that the stack index is always taken from the right input, as the last primitive of a node shows up at the rightmost position and is the only primitive which contains valid stack data.

Note that all comparisons between rtmax values in the simple and complex comparison are done with the fpgray_cmpaleb comparison function. It always uses a latency of zero (which is needed for the loopback comparison of the comparison state machine at the end) and automatically adapts to any given floating point specification set in FPGRay_pkg. Note that fpgray_cmpaleb is independent from FPGRay and can be used standalone. Each complexcmp_stage instance uses a different number of fpgray_cmpaleb instances to merge inputs together. The number is set automatically and depends on the number of inputs which have to be merged.
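
The per-stage behavior can be approximated in software as follows. This is only a sketch under simplifying assumptions (fixed pairing of neighboring ports, no modelling of stack index positions); the names are illustrative and not taken from the VHDL sources.

#include <cstdint>
#include <optional>
#include <vector>

struct CmpResult {
    uint32_t primitiveid;
    float    rtmax;
    uint32_t cluster;    // results of the same leaf node share a cluster number
    bool     finalFlag;
};

// One merge stage: neighboring results with the same cluster number are merged
// pairwise to the one with the smaller rtmax; everything else is passed through.
std::vector<std::optional<CmpResult>>
mergeStage(const std::vector<std::optional<CmpResult>>& in) {
    std::vector<std::optional<CmpResult>> out(in.size());
    for (size_t i = 0; i < in.size(); i += 2) {
        const bool canMerge = (i + 1 < in.size()) && in[i] && in[i + 1] &&
                              in[i]->cluster == in[i + 1]->cluster;
        if (canMerge) {
            CmpResult merged = (in[i]->rtmax <= in[i + 1]->rtmax) ? *in[i] : *in[i + 1];
            merged.finalFlag = in[i + 1]->finalFlag;  // the right input carries the final flag / stack data
            out[i] = merged;                          // merged results appear at the left port of the pair
        } else {
            out[i] = in[i];
            if (i + 1 < in.size()) out[i + 1] = in[i + 1];
        }
    }
    return out;
}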


Complex Comparison. The complex comparison provides an input port for each kdintersecttri instance and the same number of output port pairs plus an additional one (each pair consists of a port to fetch the next node and one to finalize a ray's intersection completely). This is done to forward finalized intersections without a bottleneck. With this configuration, each intersection as well as an intersection result collected over multiple cycles can be forwarded in the same cycle.

The comparison starts by merging all intersection results of a cycle such that there is only one result per leaf node. This is done by concatenating a certain (automatically computed) number of complexcmp_stage instances. At the end, the final comparison uses the same state machine as the simple comparison (Figure 4.12) to merge intersection results over multiple cycles if a node is not finished within a cycle. The same state machine can be used because it is ensured that only one comparison per cycle can be unfinished. In parallel to the multi-cycle state machine, the stack's data is checked to determine whether the ray is finished or the tree needs to be traversed further. This check is also done for all rays finished in this cycle. Note that this check can only be done at the end, as the stack's data is only loaded by the ray copy carrying the final flag, i.e., the last flag.

It should be noted that the number of stages and, as such, the number of kdintersecttri instances can be arbitrary. In practice, there is a limit nonetheless. This is due to the fact that the overall intersection path (Figure 4.4) has a high number of interconnections between its entities, which results in a very limited number of kdintersecttri instances compared to the high number of elements possible in the innernode chain. As the number of traversal instances should exceed the number of intersection instances anyway, this is not a big issue.

4.7.11 PCIe Driver

The communication is done via the PCIe interface on the hardware level and needs a driver on the software side so that the hardware can be used in programs. To use the FPGA card, it has to be plugged into a PCIe port before booting. After booting a Linux distribution, the driver has to be installed. As it is a Linux kernel module [ALK, LKM], this has to be done after every restart of the PC. Furthermore, the Linux headers need to be installed so that the module can be compiled for the kernel in use; the compilation is done automatically during installation.

The driver is based on the driver used in Altera's reference design for testing the PCIe communication. It is provided as source code and can be installed via the make file named install. This process requires root access; all other operations for FPGRay can be run by a non-root user.

An important note on changing the driver: when compiling and reinstalling the driver (simply by using the install make file with no parameters), it is necessary to reset FPGRay before installing the new driver in order to avoid freezing the PC.


Basic Properties. The DMA transfer is viewed from the FPGA’s perspective, which means a read transfer is the operation of sending data to the FPGA and a write transfer describes the reception of data.

The driver is only able to properly handle one FPGA card on a PC, but the same card can be accessed by multiple driver handles in parallel. As all driver handles use the same DMA descriptors for controlling the transfer, a simultaneous use is only recommended when a second driver instance is used for accessing reserved registers (e.g., to read statistics or reset the device in case of a failure).

The configuration of a DMA transfer (as discussed in the higher-level section), the access to reserved registers, and the reset are done by the function altera_dma_exec_cmd(). The actual DMA transfer is performed by the function transfer_DMA_simul() afterwards. Not configuring the transfer properly beforehand can even crash the PC, as the driver runs in kernel mode.

Note that it would be technically possible to do the whole data transfer without using DMA, but since experiments indicated an unacceptable throughput (only a few MB/s), a function using direct access was discarded. In contrast, the access to the reserved registers only transfers a few bytes, so using the DMA, which would add additional overhead, is not practical in this case. Furthermore, this allows an undisturbed register access during a DMA transfer. For transfers of large amounts of data, such as intersection payloads, the DMA overhead pays off, so the DMA is always used for such transfers.

transfer_DMA_simul()

The function transfer_DMA_simul() configures the DMA such that it always transfers 1024 data blocks (each being 128-bit wide, the bit width of the PCIe interface). The function awaits the end of the transfer and triggers further transfers if more data should be transferred. As DMA descriptors for configuring write and read access in parallel exist, the function transfers in full-duplex mode when needed. Besides being faster, this also prevents overflows of the FPGA’s output buffer by writing data back before the end of the read transfer. Note that a transfer of less than 1024 data blocks (or more by chaining multiple DMA descriptors together) is possible, but transfer_DMA_simul() never makes use of this. This makes the code simpler, and in practice the used settings have proven to be a good choice with regard to performance.

If a transfer requires less than 1024 data blocks, the function pads the empty data blocks before reading and crops the unused data after writing back. The only limitation is that a multiple of 16 bytes (i.e., 128 bits, one data block) needs to be transferred. As padding is transferred in such a case, which reduces efficiency, this should not be used too extensively (e.g., by using only 1 data block per call). The function allows the use of 1 to 16,777,215 data units. Note that a data unit (e.g., one ray) can use more than one data block (i.e., more than 128 bits), which means one function call can transfer more than 16,777,215 × 16 bytes of data.
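To illustrate the block arithmetic described above (an interpretation of the stated behavior, not code taken from the driver): for a transfer of $N$ data units with $B$ 128-bit blocks per unit,
\[
\text{blocks} = N \cdot B, \qquad
\text{DMA rounds} = \left\lceil \frac{N \cdot B}{1024} \right\rceil, \qquad
\text{padding} = 1024 \cdot \left\lceil \frac{N \cdot B}{1024} \right\rceil - N \cdot B \ \text{blocks},
\]
so the maximum payload per call is $16{,}777{,}215 \cdot B \cdot 16$ bytes, and the padding only occurs in the last 1024-block round.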


transfer_DMA_simul() supports the configuration of the transfer by the struct specified in the higher-level section of the API (Section 4.5). With the three parameters "Offset", "Read Pattern", and "Write Pattern", any operation FPGRay supports can be performed. As the values are handed over from user space, the driver can handle new functions in future releases without change; only the API needs to be adapted by calling the driver with different parameters. The "Offset" determines how many DMA transfers should be read-only before starting with the full-duplex operation. The "Read Pattern" and "Write Pattern" specify the number of 128-bit data blocks one unit of the operation has in each direction; transfer_DMA_simul() uses them to properly synchronize the transfer for different numbers of data blocks per unit.

4.7.12 Parameter List

As mentioned, FPGRay can be tailored to different target FPGAs. The needed changes can mostly be done by changing parameters, which are technically constants or generics of different entities. All constants regarding ray tracing are collected in one package file (FPGRay_pkg.vhd), which also contains the declarations of all components used in FPGRay. Furthermore, in combination with the instantiations of the used floating point and on-chip IPs, all configurations and changes of any IP can be done in this package.

The constants presented here are often used to control the number (which can also be zero) of instances by using the for generate and if generate constructs defined in VHDL. FPGRay uses these constructs to synthesize away logic that is not needed for the parameterized design. The constructs are not used if the additionally generated logic does not need many resources (in case the EDA tool does not find the unnecessary code) and using such constructs would lower the code readability.

The types seen in the following are data types of VHDL signals. Besides numbers (integer, and natural as an integer with a range from 0 to max(integer)), there are the std_logic and std_logic_vector types. std_logic is a bit signal such as a valid signal, but as it is able to model electric lines, there are additional values besides '0' and '1' (in fact, there are 9 values). The constants specified below should always be set to either '0' or '1' to represent false or true. A std_logic_vector is a vector of std_logic signals. Such vectors are needed for parallel data lines, e.g., any floating point value is in fact a std_logic_vector of BIT_WIDTH_FP std_logic elements.

The following list gives a complete overview of all constants that can be configured:

Bit widths:

BIT_WIDTH_FP (natural, default 32): The bit width of a floating point value.

BIT_WIDTH_EXP (natural, default 8): The bit width of the exponent of the floating point value. Per default this is 8 bit for IEEE 754 single precision. This constant, together with BIT_WIDTH_MANTISSA, is needed for fpgray_cmpaleb (Section 4.7.10) to know the used floating point format.

BIT_WIDTH_MANTISSA (natural, default 23): The bit width of the mantissa of the floating point value.

BIT_WIDTH_PRIMIDS (natural, default 32): The bit width of primitive identification numbers (primitiveids, equal to the primitive's index in memory), also used for the primitive indices used together with the indices memory space.

BIT_WIDTH_RAYIDS (natural, default 32): The bit width of ray identification numbers (rayids).

BIT_WIDTH_STACKIDS (natural, default 8): The bit width of stack identification numbers (stackids).

BIT_WIDTH_ALTS (natural, default 8): The bit width of scene identification numbers (sceneids).

BIT_WIDTH_NUMPRIMS (natural, default 8): The bit width of the number of primitives a leaf node can contain.

BIT_WIDTH_NODEIDS (natural, default 32): The bit width of node identification numbers (the index of a node in memory).

Parameterization of kdintersect:

NUM_SCENES (natural, default 16): The number of scenes kdintersect can store in parallel to use them for rays.

NUM_INNERNODES (natural, default 1): The number of kdinnernode instances that are used.

NUM_INTERSECTTRI (natural, default 1): The number of kdintersecttri instances that are used.

DINDS (natural, default 0): DoubleINDices parameter (allowed range 0-1): if 1, one kdleaf instance for every 4 kdnodefetchers is used, otherwise one for every 8.

NUM_STACKS (natural, default 32): The number of stacks the kdintersect instance has in total.

NUM_CACHE_ELEM_PRIMITIVES (natural, default 32): The number of cache elements each kdprimitivefetcher instance should have.

NUM_CACHE_ELEM_PRINDICIES (natural, default 32): The number of cache elements each kdprimindicesfetcher instance should have.

NUM_CACHE_ELEM_INNERNODES (natural, default 32): The number of cache elements each kdnodefetcher instance should have.

NUM_BUFFERELEM_INNERNODEUAD (natural, default 4): The number of elements the uniquewaitbuffer of each kdnodefetcher should have.

NUM_BUFFERELEM_INNERNODEAD (natural, default 4): The number of elements the waitbuffer of each kdnodefetcher should have.

NUM_BUFFERELEM_PRIMITIVEUA (natural, default 4): The number of elements the uniquewaitbuffer of each kdprimitivefetcher should have.

NUM_BUFFERELEM_PRIMITIVEAD (natural, default 4): The number of elements the waitbuffer of each kdprimitivefetcher should have.

NUM_BUFFERELEM_INDICESUA (natural, default 4): The number of elements the uniquewaitbuffer of each kdprimindicesfetcher should have.

NUM_BUFFERELEM_INDICESAD (natural, default 4): The number of elements the waitbuffer of each kdprimindicesfetcher should have.

NUM_BUFFERELEM_INNERNODEOB (natural, default 8): The number of elements the buffer merging the inner node outputs of both kdnodedecode instances should have (the buffer depicted in Figure 4.7). Please note that this is no on-chip memory.

NUM_BUFFERELEM_NEWRAYBUFFER (natural, default 256): How many entries the newray buffer of kdscheduler (Figure 4.4) should have.

NUM_BUFFERELEM_FINALBUFFERS (natural, default 256): How many entries each final buffer (Figure 4.4) should have (there is one for each output which forwards rays to the output of kdscheduler).

NUM_BUFFERELEM_LEAFOUTPUTBUFFER (natural, default 256): How many buffer elements the complex version of the build output buffer (Figure 4.11) of a kdleaf instance should have (note that no on-chip memory is used). Ignored if the simple version of the buffer is used.

BIT_WIDTH_DDR3ADDR (natural, default 25): The bit width of the address line to the DDR3 RAM controller. Only subject to change when using a different FPGA board with less or more RAM.

DELAY_* (natural, default –): The constants used to specify the latency of each floating point operation. Constants exist for: FPCMP (one for all comparison functions), FPADD, FPSUB, FPMULT, FPDIV, FPSQRT, FPCONVERTTODP (for the conversion function converting a standard floating point value to double precision), FPCONVERTFROMDP (for converting double precision back to the standard representation), DPFPMULT, and DPFPSUB.

Additionally, there are also constants whose values are computed automatically based on the constants set above (but it may sometimes be useful to manually change them too):

NUM_STACKBLOCKS (natural, default 1): The number of blocks in which each kdstack instance should be (equally) divided, used to allow a better timing by lowering the path lengths (this is the setting choosing how many stackcommunication instances one kdstack should use).

NUM_KDSTACK_UNITS (natural, default 1): The number of kdstack instances one kdintersect instance should use. This is the setting defining whether a single kdstack instance should be used or the abstracted version, where the kdscheduler mimics multiple kdstack instances such that they appear to exist alone.

BEGINRETURNINDX (natural, default NUM_INNERNODES/2): The index of the innernode chain element where the reinsertion of rays from the intersection path should start (Figure 4.4), assuming the first element getting data from the newray buffer is defined as element index 0. May be changed to a manual index for very small or very big designs to optimize the data flow for typically encountered scenes, for which the simple default computation is not viable.

It should be noted that FPGRayIF (Section 4.1) relies on additional constants that are only used internally. These parameters cannot be found in FPGRay_pkg.vhd, as FPGRayIF, being the interfacing entity, is exceptional, e.g., by not using bit width constants for all signals (but rather fixed bit widths optimized for the data transfer over the PCIe interface). Because of this, changing constants for bit widths in FPGRay_pkg.vhd does not affect all signals in FPGRayIF and makes manual changes necessary. All constants tailored to the current PCIe interface are therefore collectively stored in FPGRayIF.vhd. In particular, this applies to the following constants:

GENINTERS (std_logic, default '0'): Determines if a dedicated kdintersectsp instance to be used as a function should be synthesized.

GENINTERTRI (std_logic, default '0'): Determines if a dedicated kdintersecttri instance to be used as a function should be synthesized.

GENTRAV (std_logic, default '0'): Determines if a dedicated kdinnernode instance to be used as a function should be synthesized.

GENkdINT (std_logic, default '1'): Determines if a dedicated kdintersect instance to be used as a function should be synthesized.

INTERTRI_GENBS (std_logic, default '0'): Determines if the b values (Figure 4.6) of the dedicated kdintersecttri instance should be computed and thus written to software. This is not supported for the instances inside of kdintersect, as no space is reserved there to carry these values through the data flow until they could be forwarded to software.

NUM_BLOCKS (natural, default 1024): The number of PCIe data blocks that are sent with one DMA transfer.

NUM_BUFFERELEM_OUTPUTBUFFER (natural, default 8192): The number of elements the output buffer of FPGRayIF should have.
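To illustrate how such constants typically drive conditional instantiation, the following VHDL sketch (not taken from FPGRay's sources; the package name, entity, and logic are placeholders) shows a for generate loop driven by a natural constant and if generate blocks driven by a std_logic constant, mirroring the way constants such as NUM_INTERSECTTRI or GENINTERS control whether logic is synthesized at all:

-- Illustrative sketch only; "demo_pkg", "generate_sketch", and their contents
-- are placeholders and not part of FPGRay.
library ieee;
use ieee.std_logic_1164.all;

package demo_pkg is
  constant NUM_WORKERS : natural   := 4;   -- stands in for, e.g., NUM_INTERSECTTRI
  constant GEN_DEBUG   : std_logic := '0'; -- stands in for, e.g., GENINTERS
end package demo_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.demo_pkg.all;

entity generate_sketch is
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(NUM_WORKERS - 1 downto 0);
    dout : out std_logic_vector(NUM_WORKERS - 1 downto 0);
    dbg  : out std_logic
  );
end entity generate_sketch;

architecture rtl of generate_sketch is
begin
  -- One register per configured worker; with NUM_WORKERS = 0 the loop is a
  -- null range and the logic is synthesized away entirely.
  workers : for i in 0 to NUM_WORKERS - 1 generate
    reg_i : process (clk)
    begin
      if rising_edge(clk) then
        dout(i) <= din(i);
      end if;
    end process;
  end generate workers;

  -- Logic that only exists when the std_logic constant is set to '1'.
  debug_gen : if GEN_DEBUG = '1' generate
    dbg <= din(0);
  end generate debug_gen;

  no_debug_gen : if GEN_DEBUG = '0' generate
    dbg <= '0';
  end generate no_debug_gen;
end architecture rtl;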

Lastly, the profiling and debug registers of FPGRayIF, which can be read from software, are always 32-bit wide values. In detail these are:


Address 131,072: Returns the number of readily computed data blocks the output buffer of FPGRayIF contains. As data may not be returned in order, the data blocks which contain the data do not have to be consecutive in the buffer.

Address 131,073: Returns the functions which are synthesized at the current FPGRayIF instance. Beginning from the Least Significant Bit (LSB), the 4 bits indicate INTERS (intersect a sphere with kdintersectsp), TRAV (traverse an inner node of a k-d tree with kdinnernode), INTERTRI (intersect a triangle with kdintersecttri), and kdINT (intersect against a k-d tree scene with kdintersect).

Address 131,074: Returns the number of resets which were performed since power-up.

CHAPTER 5

Results

The evaluation of FPGRay showed that, in its current form, it is not usable for the intended hybrid solution in combination with a PC. Nevertheless, a closer look at some specific characteristics of FPGRay shows that this goal can be reached by removing some of the identified bottlenecks. However, during the writing of this thesis, the first GPUs supporting ray tracing operations with fixed-function hardware were introduced and are now already available; such hardware is the better solution for most of the target audience, due to the much lower cost of not having to develop custom hardware. How and under which circumstances a further use of FPGRay or similar custom hardware designs is still a viable option is elaborated in more detail in the last section of this chapter.

A variety of tests have been performed on FPGRay. The basic evaluation was performed by synthesizing multiple designs for the Arria V GX FPGA [1]. The FPGA is contained on the "Arria V GX Starter Kit" from Intel, which is a PCIe card made for development and evaluation purposes. The PCIe card was plugged into the PC which was also used for the software-related benchmarks. Either the number of basic logic elements or the routing capabilities of the FPGA were the factors that limited a further increase of FPGRay's parameters. The hardware multiplying units, the registers (i.e., the logic elements that introduce the clock and thus separate stages), and the on-chip memory were never the limiting resources. For more details on the FPGA's intrinsics mentioned in this paragraph, please refer to Section 2.3.

Furthermore, the HardIPs for the PCIe interface and the DDR3 RAM controller were used. Note that the PCIe interface supports up to 2 GB/s, either via 8 lanes at Gen1 or 4 lanes at Gen2, and the RAM controller supports up to DDR3-533. As the card's two 128 MiB DDR3-1066 chips are fast enough, the DDR3-533 specification is used. Although the

[1] The specific model is the Arria V 5AGXFB3H4F35C4.


card can load a design on power up from the onboard memory, each design was loaded via the PC. The host PC consists of an AMD Ryzen 3700X with 32 GB DDR4-3200 RAM and runs Manjaro Linux KDE with kernel 5.4.

5.1 Test Scenes and Designs

Multiple designs were synthesized or simulated to evaluate the characteristics of FPGRay. All hardware designs shared the following basic characteristics:

• The buffer sizes to stall data and prevent overflows are set to 256 entries, which is the minimum possible size for on-chip buffers. The only exceptions are the stalling buffers before the kdprimitivefetcher (Section 4.3.3) and kdprimindicesfetcher (Section 4.3.3) instances (which use 512 entries).

• The delays of the floating point operations are set to the minimum possible values for these IPs.

• FPGRayIF (Section 4.1, and with it all ray-tracing-related operations) runs at a clock speed of 100 MHz.

• kdintersect (Section 4.2) and FPGRayIF both use the default parameters as stated in Section 4.7.12.

Based on these constants, the following designs were used:

Design     No. instances   No. entries   No. stacks   statistics
1I         1/1             32            32           No
1Is        1/1             32            32           Yes
1ILs       1/1             64            32           Yes
1IXLs      1/1             128           32           Yes
1IXLSLs    1/1             64            128          Yes
1ILSLs     1/1             64            96           Yes
1IXLSXLs   1/1             128           128          Yes
2Is        2/1             64            64           Yes
3I         3/1             64            96           No
1ILc       1/1             64            32           -
1IXXL      1/1             256           32           -
4I         4/1             256           128          -
8I         8/1             256           256          -
8I2        8/2             256           320          -

The "No. instances" column indicates the number of traversal and intersection instances, "No. entries" the number of entries per cache, and "statistics" stands for


the ability of the design to collect additional statistics such as latencies. The first nine designs (1I up to 3I) were synthesized and are usable on the FPGA; the remaining designs (1ILc up to 8I2) are only available for simulation, as they do not fit on the used FPGA. This was necessary as the limit for the FPGA was the 3I design, which already had a basic logic resource usage of around 90%.

The naming convention of the designs should be read as follows: All digits indicate the number of computational instances. The digit to the left of the separator "I" is the number of traversal instances (i.e., kdinnernode, Section 4.3.1). The number to the right of "I" indicates the number of intersection instances (i.e., kdintersecttri, Section 4.3.2); a missing number stands for a single instance. As multiple designs based on 1I were used, the size identifiers L, XL, and XXL mark parameters with a setting higher than the default 1I design. If the size identifier is followed by "S", it identifies a design with a higher number of stacks. If the identifier is followed by "s", the number of cache entries is increased. The "s" at the end of a design identifies the presence of the mentioned statistics functionality.

Note that the simulation-specific designs contain only the kdintersect instance, i.e., they were not simulated with the PCIe connection via FPGRayIF. Furthermore, they do not include the statistics functionality because all desired information can be taken from the simulation anyway. To allow a direct comparison between these different designs used for the FPGA and simulation tests, the designs 1ILs and 1ILc (where c is the abbreviation for "comparison") use the same parameters.

To test the designs, five scenes were used: two from Benedikt Bitterli's rendering resources page [Bit16] and three from the official test scenes of pbrt-v3 [pbr20]. In detail, these were:

• “Pontiac GTO 67”: referred to as “car2”, using 1600x900@512spp for the PC benchmarks and 160x90@1spp with FPGRay

• “Japanese Classroom”: referred to as “classroom”, using 800x450@512spp for the PC benchmarks and 80x45@2spp with FPGRay

• “bathroom”: using 1200x760@512spp for the PC benchmarks, 120x76@1spp with FPGRay

• “coffee-splash”: referred to as “coffee”, using 1000x800@512spp for the PC benchmarks and 100x80@2spp with FPGRay

• “head”: using 1280x720@512spp for the PC benchmarks, 128x72@1spp with FPGRay

All of these scenes were modified to use the k-d tree accelerator, different resolutions, and different spps. See Figure 5.1 and Figure 5.2 for renderings of the test scenes. Note that all scenes use a path tracer except the bathroom scene, which uses BDPT.


Figure 5.1: The test scenes from Benedikt Bitterli’s page. “Pontiac GTO 67” is seen on the left and “Japanese Classroom” on the right.

Figure 5.2: The test scenes from the official pbrt-v3 page. “bathroom” is seen on the left, “coffee-splash” in the middle, and “head” on the right side.

5.2 Synthetic Tests

The evaluation of the synthesized designs was done with a synthetic test program. This was done as a renderer supporting the desired batched call was not implemented, because such an implementation would have exceeded the scope of this thesis [2]. To benefit from the parallelism FPGRay provides, multiple rays per function call need to be transferred via batches. The synthetic test program uses test dumps containing the scene's data and a large number of rays which should be intersected. The test dumps can be produced by rendering a scene with a k-d tree with our modified pbrt-v3 version called pbrt-fpgray. These dumps additionally contain pbrt's intersection results such that the test program can verify FPGRay's outputs. It should be noted that the verification accepts small differences in the rtmax values between FPGRay and pbrt, as the implementations differ slightly [3]. Furthermore, because of these differences, the obtained primitiveids are accepted too when the rtmax values differ within the acceptable margin. For each benchmark run, such occurrences are additionally collected for manual inspection. The tests performed with the synthetic test program load the scene dump to FPGRay

[2] pbrt-fpgray's intersection method using FPGRay (presented in the API section, Section 4.6) supports the call with only 1 ray per batch.
[3] The differences of the rtmax values lie within the magnitude of 10⁻⁶.

and send multiple ray batches to simulate the use in a renderer. For each benchmark, three runs over 10,000 batches were performed. Each run starts at the first ray of the test dump and fills the desired batch size with subsequent rays. Each of the 10,000 batches starts with the second ray of the previous batch as its new first ray. Initially, this overlapping was done to test different rays without requiring many different rays. However, it was kept to potentially break the locality of neighboring rays from time to time; as seen in later tests, this works, as the caches are not large enough to still hold the data of old rays. The reported runtime is the average over all batches of the three runs. In the benchmarks, the used batch sizes can be identified by the rays/batch (r/b) values.

For comparison, benchmarks without FPGRay were performed with pbrt-fpgray. For these tests, the default settings of pbrt-v3 were used, with the measurement of the intersection throughput as the only activated extension. As this means that pbrt-fpgray uses the same code as pbrt-v3 during the actual rendering, we denote tests performed with pbrt-fpgray as pbrt-v3 to express the use of the software-only renderer.
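Formally, numbering the rays of the test dump from 0, batch $k$ of a run with batch size r/b therefore contains the rays
\[
\{\,k,\ k+1,\ \dots,\ k + \mathrm{r/b} - 1\,\}, \qquad k = 0, \dots, 9999,
\]
so two consecutive batches overlap in all but one ray.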

The time measurements only take the API call to FPGRay into account, so the extraction of rays from the test dump and the verification afterwards are not included. The measurements were done using the C++ std::chrono::high_resolution_clock. The measurements of the rays/second (r/s) are done by pbrt-fpgray itself by evaluating the collected statistics of the profiling system of pbrt-v3 at the end of the run. Only the calls tagged "ray intersections" are taken into account; "shadow ray intersections" are not included. This matches the rays contained in the test dump, as only calls for ray intersections are used to dump rays.

As stated before, the resolutions and spps of the test scenes were changed. For any test on FPGRay, a smaller resolution and spp setting of a scene was used. These changes were necessary to cover a big part of the rendered frame with the small test dumps, in order to avoid a throughput distortion caused by rendering only over a small part with high locality or beneficial frame contents. Unfortunately, using test dumps generated from renderings with the settings for the software renderer was not feasible: this would have resulted in long runtimes and large file sizes for the test dumps, given the high number of scenes and designs that were tested. As an example, the fastest scene for pbrt-v3 was the head scene with around 30 seconds; running the synthetic test program over the corresponding test dump only for the fastest 3I design would have taken approximately 17 hours. The bathroom scene, as the worst case with 14 minutes in pbrt-v3, would have taken 14 days with the 3I design. These extreme runtimes would have been caused by the runs with small batch sizes, as seen later on.

Note that the test dumps are generated as plain text files to allow easier debugging. However, even if optimally sized binary dumps had been used, the dump files for the number of rays needed for the high resolution and spp settings would have taken around 120 GB for all five scenes.

On the other hand, pbrt-v3 needs a higher number of rays to obtain valid benchmark results [4], therefore the higher resolutions and spps were necessary for the software rendering.

To verify the assumption that this benchmark procedure was not beneficial for FPGRay, a few examples were tested. The classroom, bathroom, head, and car2 scenes were tested with the first part of the test dump that is generated by the scenes using the settings from pbrt-v3's runs. The same tests as in this section were performed with different starting rays for the first batch (i.e., rays 0, 100k, 200k, 500k, 1M) to get a higher accuracy of the achievable throughput for the whole test dump. Furthermore, a test with no overlapping (i.e., a following batch uses the first ray of the dump that was not in the previous batch) was performed for 10,000 batches (again averaging over three of those runs). For these tests, only a batch size of 1024 was tested to save time. For the classroom and bathroom scenes, the assumption holds and slightly higher throughputs from 1% to 10% were seen. For the car2 scene, the 100k, 200k, and non-overlapping results achieved a 7% lower throughput, and only the other tests achieved a comparable to 1% higher throughput. The head scene was even much faster than in the following tests, from around 60% up to a factor of 10. As another test reveals, the FPGA would achieve much narrower throughputs for the head scene for those different settings, but the PCIe interface limits the smaller test dump. Using the throughputs measured without the overhead of the interface, the head scene is, similar to the classroom and bathroom scenes, only slightly faster by 10%. As a conclusion, it can be stated that the different scene settings are not beneficial for FPGRay.


Figure 5.3: Synthetic test run results performed on the car2 scene.

Results. Figure 5.3, Figure 5.4, and Figure 5.5 depict the results of the test runs. The car2 scene was merged into a single plot as it was the only one showing a very different behavior, being unaffected by most parameter changes. To verify that designs supporting the collection of statistical data, which are marked with an "s" in their name,

[4] pbrt-v3 measures profiling information on a 0.01 second basis, so a short rendering results in a high distortion of the measured throughput.


Figure 5.4: Synthetic test run results (r/s over r/b) performed on the classroom and bathroom scenes. Panels (a), (c), and (e) show the classroom scene, panels (b), (d), and (f) the bathroom scene; (a)/(b) compare different cache sizes, (c)/(d) different stack (and cache) sizes, and (e)/(f) different numbers of traversal instances.


Figure 5.5: Synthetic test run results (r/s over r/b) performed on the coffee and head scenes. Panels (a), (c), and (e) show the coffee scene, panels (b), (d), and (f) the head scene; (a)/(b) compare different cache sizes, (c)/(d) different stack (and cache) sizes, and (e)/(f) different numbers of traversal instances.

do not suffer from a throughput penalty, one run with a design differing only in the statistics functionality was performed with the classroom scene. The 1I and 1Is designs depicted in Figure 5.4 (a) deliver exactly the same throughput. Unlike in software, where debugging information causes a performance penalty, this is possible because additional hardware is used, which does not alter the rest of the design. For this reason, each design supports the collection of statistics if enough hardware resources are available.

Note that the batch sizes are not limited to powers of two: any batch size can be used without detrimental effects on performance. However, the batch size is still limited to a maximum of 1024 rays, as FPGRayIF's output buffer uses 2048 entries, which provides enough space for one finalized batch and a second ongoing one. Using more than 1024 r/b can cause data loss. Note that, because of a problem which could be narrowed down to FPGRayIF but not identified further, the tests with 16 and 32 r/b are omitted, as in this range a freeze of the interface was often experienced.

Analyzing the Results. At a first look, for all designs and scenes tested, a small batch size yields a low throughput; the bad utilization of the hardware's pipeline is apparent. On the other hand, for all scenes except car2 (and, much less prominently, the head scene), the linear throughput gain flattens at some point when increasing the batch size: the design is fully utilized and larger batches only allow marginal gains. When a bigger design is used, the point at which the throughput curve flattens is delayed. For the bathroom and coffee scenes with the 3I design, this happens relatively late at 512 r/b, but as a significant flattening. Such gains after full utilization are possible because the design is not fully utilized at the beginning and the end of a batch, where not all possibly computable rays are processed; a higher batch size raises the ratio between full utilization and the beginning and end parts.

As all plots concerning the stack sizes show, for the 1I designs, the stack size is chosen well enough to not be the bottleneck. In contrast, the cache sizes and the number of traversal instances limit the throughput for all scenes except car2, which is only limited by the number of traversal instances. When choosing caches that are four times larger, the throughput of the 1I design increases by between 25% for the classroom scene and 47% for the coffee scene at 1024 r/b. When using a second traversal instance but leaving the number of cache entries per fetcher the same, all scenes except car2 showed a 79 - 85% higher throughput at 1024 r/b.

The car2 scene behaved completely differently from all other scenes on the used designs when increasing the batch size. It is limited neither by the cache sizes nor by the stacks for the 1I designs; all deliver the same throughput. On the other hand, when using more traversal instances, the throughput increases linearly even up to 1024 r/b: from one to two instances a 60%, and from one to three instances a 102% higher throughput was observed. The head scene behaves like the other scenes for most designs. When increasing the number of traversal instances, however, the flattening of the throughput gain is much less prominent than for the other scenes; the curvature is similar to that of the car2 scene.

The measurements for all scenes except car2 are as expected: increasing the batch size,


a linear throughput gain can be reached until the number of computational instances limits it. Additionally, a limit imposed by the caches can be seen. As we elaborate in more detail later, the memory and with it the caches are highly utilized. The car2 scene, which has a lower average utilization of the memory subsystem, is therefore not memory bound. Furthermore, the head scene is also much less memory bound, which is another similarity to the car2 scene. For all scenes, the similar progression suggests that for bigger designs, significant throughput gains can be reached by using higher batch sizes. They also show that if FPGRay is to be used without losing too much performance, a high batch size is important.

Note that for the computational instances to be the limiting factor, the stack sizes need to be set properly. As seen, this is in fact the case for the 1I designs, but may not be the case for bigger designs, especially for 3I, which is near the limit of the FPGA's resources. If set too low, the number of stacks becomes the primary limiting factor and even more limiting than the number of computational instances. The reason is that the number of stacks limits the concurrently computable rays and thus prevents proper utilization; the linear gain for increasing batch sizes would flatten at the point where the number of stacks equals the batch size.
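Put differently, since every ray in flight occupies a stack, the number of rays that can be processed concurrently is bounded by
\[
\text{rays in flight} \leq \min(\mathrm{r/b},\ \mathrm{NUM\_STACKS}),
\]
which is why the throughput gain from larger batches would flatten once the batch size reaches the number of available stacks.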

At first glance, these results seem to contradict the expectations that arise when knowing the design's intrinsics. The design is able to retain a throughput of one as long as the fetchers do not stall due to high memory latency. But to deliver this throughput, the design must contain enough computational instances to not reroute the same ray multiple times through the same instance. If a reroute happens, the ray is processed by the same instance (e.g., a traversal instance) a second time, which stalls the insertion of a new ray. A single such stall lowers the throughput only marginally, but if it happens too many times, a severe throughput penalty can be observed. This relation between the throughput and the number of computational instances is seen for nearly all scenes, which explains why the limiting factor is the number of instances.

The car2 and head scenes, on the other hand, do not suffer as much from this relation, as many rays end after a low number of bounces (or even without entering kdscheduler's computational parts, because they do not intersect the scene at all). This also explains why the RAM is not as highly utilized for the car2 and head scenes, as many rays do not even require any data. The reason is the scene itself, which contains no background, i.e., every ray that does not intersect the car or the floor (respectively the head for the head scene) does not intersect anything at all. These scenes therefore facilitate a throughput near the optimum attainable with the current design, where no computational instance lowers it by being used twice for the same ray. For the other scenes, this optimum cannot be reached with only 3 traversal instances (and one intersection instance). Note that the maximum throughput for these designs would be 50 Mrays/s, as the FPGRayIF interface needs two cycles for one ray and runs at 100 MHz.
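For reference, this limit follows directly from the interface clock and the two cycles FPGRayIF needs per ray:
\[
r_{\max} = \frac{f_{\mathrm{clk}}}{2\ \text{cycles/ray}} = \frac{100\ \mathrm{MHz}}{2} = 50\ \mathrm{Mrays/s}.
\]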

Hardware vs. Software. The following table lists the highest observed throughputs for the scenes tested with FPGRay compared to the software rendering by pbrt-v3:


Scene       pbrt-v3 (kr/s)   FPGRay (kr/s)   Factor (pbrt-v3 / FPGRay)
classroom   10,014.5         782             12.8
car2        18,527.3         10,771.1        1.7
bathroom    8,954.1          1,089.3         8.2
coffee      9,726.6          1,359.2         7.2
head        13,131.6         1,313           10

When compared to a pure software implementation, it can be seen that the software is much faster. Even for the well-performing car2 scene, CPU-only rendering is still almost 2 times faster, and the much slower classroom scene is one order of magnitude (a factor of 13) slower on the FPGA. For the remaining scenes, the software implementation yields a 7 to 10 times higher throughput. This means that the presented design cannot be used for the proposed hybrid approach on modern PCs, as it would slow down the rendering. The utilization of both specialized hardware and a PC is therefore not achievable with our setup.

Nevertheless, some bottlenecks have been identified during testing which point to possible improvements or extensions to be able to reach that goal. A more detailed description of those bottlenecks follows in the next section.

5.3 Special Tests

In this section, we present tests that focus on a few key aspects in order to explore the problems that lead to the low throughput. Furthermore, some predictions of the performance increase due to particular improvements are presented. Lastly, we present more in-depth comparisons between the throughput of CPU-based and GPU-based renderers, as well as the power consumption of all shown configurations.

5.3.1 RAM and Cache Latencies

Of the designs presented in Section 5.2, the ones designated with "s" are capable of tracking statistics such as cache and RAM latencies. With these measurements, the hit rate of the caches and thus their efficiency can be evaluated. When the hit rate is high, only a small number of the data requests is forwarded to the RAM, i.e., the cache is big enough to store data which is often requested.

Investigating these statistics allows some interesting conclusions. For example, the hit rate of the caches shows how efficient the caching is. The following table lists the average hit rates of the fetchers (using the last 2048 fetching requests as data):


Design   r/b    classroom (%)   bathroom (%)   coffee (%)
1Is      64     19              22             59
1Is      1024   25              27             35
1ILs     64     26              27             48
1ILs     1024   34              38             46
1IXLs    64     35              37             59
1IXLs    1024   44              49             56
2Is      64     30              31             53
2Is      1024   38              42             49

The test scenes are split into two groups. The first group, with the head and car2 scenes, has a hit rate of 100%. This is another advantage which allows those two scenes to achieve a higher throughput without encountering a flattening of the throughput increase at a specific r/b. The other group consists of the remaining scenes which, with the exception of the coffee scene reaching 59% at best, barely reached a 50% hit rate. This means that the majority of data accesses is served directly by the RAM. It can also be seen that the 1I designs can nearly double the hit rate by using a cache that is four times larger than the others. 2Is achieves a higher hit rate than 1ILs despite both designs using 64 entries per fetcher; this is because 2Is has slightly more cache entries in total, as it has one cache more than the 1I designs due to the additional traversal instance.

The fluctuations between 64 r/b and 1024 r/b show the locality of the data, which is preserved longer for 1024 r/b. The reason is that inside a batch, subsequent rays are spatially close (i.e., they have a high locality), but when switching between batches, the last ray has a low locality with respect to the first ray of the following batch. Note that the high hit rate of 100% for the two scenes (which would mean no RAM access at all) can be reached because the observed data is only collected from the last 2048 rays during rendering, after the initial requests have already been fetched. This is also the case for the other scenes, so the significant difference of the hit rates between those two groups of scenes is assumed to be representative.

The tracked latencies for fetching from the cache or the RAM reveal significant differences, namely 3-4 cycles against 17-41 cycles, with an overall latency of 14-21 cycles, which is inversely proportional to the cache sizes for the group of scenes with a low hit rate. These latencies are similar to the access latency from kdintersect to the RAM alone (18 cycles on average). Because of the high hit rates for the car2 and the head scene, a much lower average of 4 cycles was observed for them. Note that the latency for accessing the RAM increased over time, which indicates that it is at its limit. These findings lead to two conclusions:

• The number of cache entries is too low. By quadrupling their number, the hit rate nearly doubles. It is still low at around 50%, as hit rates of over 90% are often seen in practice.

• The locality of the data allows a higher cache efficiency. Using a higher batch size and thus more locality in the data, the hit rate increased by 25-32%. Increasing the locality can be done independently of the cache sizes with different sampling patterns in the software (e.g., by using tiled rendering).

As shown, the hit rate can be increased. This not only lowers the utilization of the RAM but further increases the throughput due to a much lower fetching latency, and it additionally causes less stalling in the buffers before the fetchers. Furthermore, a lower latency for each ray means that more rays can be processed at the same time.
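As a rough plausibility check (a simplifying model, not a measurement), the overall fetch latency can be approximated as the hit-rate-weighted average of the cache and RAM latencies; plugging in, e.g., a hit rate of 35%, a cache latency of 4 cycles, and a RAM latency of 25 cycles (a value within the observed 17-41 cycle range) gives
\[
\bar{t} \approx h \cdot t_{\text{cache}} + (1 - h) \cdot t_{\text{RAM}} = 0.35 \cdot 4 + 0.65 \cdot 25 \approx 18\ \text{cycles},
\]
which lies within the observed overall range of 14-21 cycles.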

5.3.2 FPGA Runtimes

Another statistical variable collected by the "s" variants of the hardware designs is the runtime of kdintersect alone, i.e., the raw intersection time without the overhead of the data transfer from hardware to the API in software. In fact, two values are taken: the raw time (being only the latency of a ray through kdintersect) and the theoretical output time. The theoretical output time adds the latency caused by not reading each finished ray instantly but rather waiting until it is its turn. This means that the consecutive access of the DMA is simulated to assess the moment at which the results could be read back; this moment is compared to the time instant at which the ray was forwarded to kdintersect to compute the latency. The table shows the averaged r/s for the last 2048 rays:

Scene / Design   rays/Batch   kr/s       kr/s (theoretical)   kr/s (software)   factor (theoretical / software)
classroom
  1ILs           64           352.2      351.9                302.2             1.16
  1ILs           1024         401.4      401.4                423.7             0.95
  2Is            64           538.4      537.4                428.4             1.25
  2Is            1024         739.4      739.4                782               0.95
bathroom
  1ILs           64           341        340.8                319.5             1.07
  1ILs           1024         474.3      474.3                496.6             0.95
  2Is            64           550.2      548.5                425.1             1.29
  2Is            1024         892.8      892.7                889.8             1.00
coffee
  1ILs           64           410.2      409.3                367               1.12
  1ILs           1024         607.6      607.5                604.9             1.00
  2Is            64           646.9      643.3                479.5             1.34
  2Is            1024         1132       1131.9               1110.9            1.02
head
  1ILs           64           7960.8     7950.9               353.4             22.50
  1ILs           1024         9502.6     9501.7               613.7             15.48
  2Is            64           12,191.9   12,168.7             447.3             27.21
  2Is            1024         16,664     16,661.2             1106.4            15.06
car2
  2Is            64           5358.0     5165.1               811.8             6.36
  2Is            1024         11,282.5   11,117.7             8527.4            1.3

When comparing these latencies with the actual r/s measured by the software (as done for the synthetic tests), it can be seen that kdintersect, i.e., the hardware-accelerated intersection, is slowed down by the data transfer from and to the software when using a small number of r/b. The impact, from the theoretically achievable r/s down to the software-measured r/s, ranges from practically none up to a factor of 27. The impact is gone for most scenes when using 1024 r/b. For the classroom and bathroom scenes, the throughput with the overhead of the data transfer is even higher than the estimated theoretical limit (theoretical r/s); but as the FPGA runtimes cover only a few rays, while the software was benchmarked over multiple thousands, this discrepancy can be explained by fluctuations in the runtimes.

Nevertheless, some scenes are limited by the interface even when using 1024 r/b. For the car2 scene, the overhead is acceptable, but the head scene is still heavily limited. The current data transfer approach is therefore not optimal. The fact that the scenes reaching the highest theoretical throughput suffer dramatically from the used data transfer indicates that the transfer is less problematic for the other scenes only because of the low throughput that is needed to fully utilize FPGRay on the current hardware. It also indicates, besides the high hit rate, that the head scene, just like the car2 scene, is small enough to avoid too many rays going through the same instances inside of kdintersect multiple times, and likewise has a high number of rays not intersecting the scene at all.

5.3.3 Simulated Tests

As the used FPGA is too small for big designs, the designs 1ILc, 1IXXL, 4I, 8I, and 8I2 have been tested through simulations. This is done using the Modelsim software included in Quartus. The simulation is done with kdintersect only, and the test bench feeding it with data emulates the RAM with an on-chip memory inside the test bench (this is done as it dramatically increases the simulation speed). As the on-chip memory is limited in size, the classroom scene, which has a low number of primitives, was used. Because it is the worst-performing scene, it can be assumed that the other scenes would achieve at least the same throughput. The statisticsbuffer entity, used in FPGRay on hardware designs to measure the runtime of operations in clock cycles, is also used here to measure the runtime of rays (which is used for computing the rays/second).


As the benchmark simulates only kdintersect, the overhead of the PCIe interface is avoided and thus the results can best be compared with the FPGA runtimes of the previous section. However, the simulations should be seen as approximations only: the simulation is only done at the behavioral level, so there is no proof that the simulated design runs with the desired timing on a real FPGA.

The test bench (i.e., a VHDL code file which is not synthesizable but made for simulation only) uses the scene's and rays' data from the same test dumps that were used for the real benchmarks, so the parameters for the classroom scene are also 80x45@2spp. In contrast to the synthetic tests (Section 5.2), each simulation is only performed once (as the runs are deterministic) and always starts with the first ray of the test dump. Furthermore, only one batch per batch size is tested and not multiple runs (as the simulation is very time consuming). The latency of the simulated RAM interface is set to 18 cycles, which is the actual measured average latency observed during the tests on the FPGA. The designs introduced in Section 5.1 use the same parameters as the synthesized designs, with the exception of using no resource optimization for the stacks.

Figure 5.6 shows the results of the simulations. Note that the missing results for higher batch sizes are due to problems with the designs. The investigations showed that an overflow happens in the design when intersecting too many rays (for 1024 r/b with 8I, it was identified at the buffer stalling primitives before the fetcher). This issue could not be fixed even when using bigger caches; a faster RAM access would be necessary.

Note that 1ILc was around 20% faster than its synthesized equivalent 1ILs in the synthetic tests, even though both designs use the same design parameters. But as the simulation only computes one run and always uses 18 cycles for RAM access, these benchmarks cannot be compared exactly. Lastly, in contrast to 1ILc, which consists only of kdintersect, 1ILs is a synthesized design that uses FPGRayIF and the PCIe interface; the depicted results come from the synthetic tests, which already include the overhead of the data transfer.

The results show that increasing the number of computational units allows a higher throughput. E.g., if we compare 1IXXL with 8I2 (which use the same cache sizes), the throughput for 128 r/b is roughly 4 times higher. But even for this highest obtainable throughput, the software-only rendering is slightly more than 4 times faster. Nevertheless, by using a bigger design on hardware with more resources (and a faster RAM), the throughput can be increased further. The tests allow the conclusion that the bottleneck is in fact the amount of resources on the FPGA and not the design itself. Furthermore, the throughput increase with more computational units is not only seen in the simulation: when comparing the synthetic test results of the classroom scene with the 1ILs and 2Is designs (using the same number of cache entries), a nearly doubled throughput was observed.


Figure 5.6: The results of the simulated intersection operation of kdintersect for the classroom scene (designs 1ILc, 1IXXL, 4I, 8I, and 8I2). Note that 1ILs is the synthetic test's result, shown for reference.

5.3.4 Latency

In this thesis, the term latency has often been used. While performing the benchmarks of the FPGA runtimes (Section 5.3.2) and the simulations (Section 5.3.3), the actual latencies were directly observed: the runtimes the "s" designs deliver are the latencies of each ray in cycles, and the simulated tests can be depicted as timing diagrams (Figure 5.7). The runtimes show, for the fastest and slowest renderings, an average latency of 772 cycles for the car2 scene and 10,721 cycles for the classroom scene. This reflects the throughput depicted in the plots of the synthetic tests and shows that the number of computational instances each ray in the classroom scene traverses is much higher (around 10 times, when taking into account that the lower hit rates of the fetchers additionally increase the latency caused by the fetchers). The rays of the other scenes were intersected within this range of cycles, which means that a ray in these scenes traversed fewer computational instances than those of the classroom scene on average.

Figure 5.7: The timing diagram of a simulation of 4I with the classroom scene showing the input and output of a ray from kdintersect.

For a more detailed view, one simulated ray intersection during the simulated test benchmark of the classroom scene is taken and its latencies are explored further: the 128th ray of the 256-ray batch is taken, as it is the first ray of this batch size which was not already intersected by a preceding batch. Furthermore, this batch size is high enough to utilize the cache and thus gives a good example of an arbitrary ray in the middle of the intersection batches, where some data still needs to be fetched from RAM. This ray has a total runtime and thus latency of 30,173 cycles. Independent of the scene, the current hardware design has fixed latencies of 14 cycles for a node traversal and 77 cycles for an intersection. Furthermore, kdintersect's preparation for the initial bounding box intersection test has a latency of 27 cycles. The following table shows the number of traversal (t), intersection (i), node fetcher of the innernode chain (nf), node fetcher of the intersection path (nfp), primitive fetcher (pf), kdleaf (l, Section 4.3.6), and kdcompare (c, Section 4.3.7) uses of the observed 128th ray. The simulated design used was 1ILc, but the number of calls is the same regardless of the design; only the instance which works on a specific call may vary on bigger designs.

t    i    nf   nfp   pf   l    c
46   4    46   9     4    10   1


The measured overall latency is not the sum of the latencies of these calls. The reason is that the latencies caused by stalling before the fetchers are not taken into account in the presented measurements. Furthermore, some operations are processed in parallel, which lessens the detrimental effect of the individual latencies.
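To put this into numbers: summing only the fixed computational latencies listed above for this ray gives
\[
46 \cdot 14 + 4 \cdot 77 + 27 = 644 + 308 + 27 = 979\ \text{cycles},
\]
only a small fraction of the measured 30,173 cycles; the remainder is dominated by fetching from the caches and the RAM, stalling before the fetchers, and waiting in buffers, partly hidden by parallel processing.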

5.3.5 Power Consumption

Unfortunately, the power consumption of the FPGA card can only be read from a PC with the Windows driver. But as FPGRay uses a Linux driver for operation, the on-board consumption measurements could not be performed. To still deliver a rough approximation of the power consumption and thus compute the efficiency of the design, a power meter was used. The concrete model [5] was plugged in between the PC and the power plug and measures with a tolerance of 1%. As the measurements were done for the complete PC, some fluctuations need to be considered. The following table lists the measured consumptions:

Configuration                                        Consumption (Watt, W)
PC idle (with FPGA card plugged in)                  76 - 79 (80 - 86)
PC rendering classroom (with FPGA card plugged in)   160 (181)
PC rendering car2 (with FPGA card plugged in)        162 - 164 (174 - 176)
PC rendering bathroom (with FPGA card plugged in)    169 (180)
PC rendering coffee (with FPGA card plugged in)      172 (175)
PC rendering head (with FPGA card plugged in)        176 (182)
PC stress 1T                                         112 - 117
PC rendering with FPGRay (any scene)                 110 - 120
FPGA card with default design (and fan on)           7.4 (9.6)
FPGA card with 3IL (and fan on)                      9.5 (11.8)

Note that the measurements named "FPGA card" used the card's dedicated power supply plugged into the power meter, i.e., without measuring the power consumption of the PC. Only the 3IL design was used for this measurement to maximize the resource usage on the FPGA; like 3I, it uses three traversal instances, the only difference to 3I being the use of 128 cache entries for the primitive fetcher.

All measurements show that the FPGA card consumes between 7 and 12 W, depending on the loaded design. For the same design, the consumption exhibited no difference outside the measurement tolerance. This behavior is typical for FPGAs, as they only have basic power saving features, which are not used in FPGRay. The measured power consumption is therefore, apart from a negligible difference (due to the power needed for switching between '1' and '0' or keeping the conduit's power level), the same for idle and full load. PCs, in particular modern CPUs, possess advanced power saving features, which resulted in a difference of around 100% between idle and rendering with full CPU load. For the

[5] TS Electronic EMG-1


used CPU, the highest allowed consumption is defined by the Package Power Tracking (PPT) at 88 W [ryz20]. When comparing the overall consumption delta between FPGRay and pbrt-fpgray not using the FPGA, the software rendering has a delta of around 100 W (which is more than the CPU is allowed to consume [ryz20]). In contrast, FPGRay exhibits a delta of roughly 40 W when compared to not having the card plugged in at all. As seen in the card-only measurements, most of this consumption delta comes from the PC. In detail, it is the single CPU thread that is utilized nearly 100% by the synthetic test program. To verify this, a comparable consumption was observed by using the Linux program "stress" [str20b] to apply a high load to the CPU with 1 thread. Additionally, by modifying the synthetic test program to show the consumption of FPGRay alone, without the consumption of reading and verifying the test dump, it was shown that the consumption decreased to 106-110 W but the throughput did not increase. This leads to the conclusion that the single-threaded API use is not a bottleneck. Furthermore, a different data transfer to the FPGA (e.g., using interrupts) may be able to reduce the consumption.

5.3.6 CPUs vs. GPUs

Besides using only CPUs for rendering, GPUs are nowadays supported by multiple renderers. Unfortunately, FPGRay, pbrt-fpgray, and pbrt-v3 do not support GPUs as rendering target, which makes a direct comparison with the same program impossible. Furthermore, the related LuxCoreRender [lux20] would support GPUs, but no longer supports k-d trees or pbrt scenes. This means that only a rough approximation of the throughput of a GPU compared to a CPU, based on different scenes and renderers, can be made. For more variety, Blender's Cycles [ble20b] path tracer and, through the official add-on, LuxCoreRender were used. Both were fed with the same Blender scenes. All tests were done with the default rendering settings for the path tracing algorithm, and tiled rendering was activated. For LuxCoreRender, CPU tests were performed with the C++ renderer and GPU tests with the OpenCL renderer, with the GPU being the only activated device. The tests were performed on the PC that was used for the synthetic tests, consisting of a Ryzen 3700X, 32 GB DDR4-3200 RAM, and an AMD Radeon RX4806. The consumption measurements were done in the same way as in the previous section. As the FPGA card was plugged in, the same idle consumption of around 80 - 86 W was measured. Four test scenes were used: two pbrt-v3 test scenes [pbr20], bathroom and coffee, for which Blender scenes are available, and two from the official Blender homepage [ble20a] in order to have scenes similar to the car2 and classroom scenes. For the classroom scene, the equally named "Class room" scene was used to show a closed room with various objects in it (Figure 5.8). For the car2 scene, the "Car Demo" scene (Figure 5.9) was modified by removing the second car and the floor/wall combination.

6 With 8 GB of GDDR5 RAM and a GPU boost clock of 1290 MHz.


Figure 5.8: The test scene "Class room". The left image depicts the intended outcome as rendered by Cycles, the right image shows how it is processed by LuxCoreRender.

Figure 5.9: The test scene “Car Demo”. The left image depicts the intended outcome as rendered by Cycles, the right image shows how it is rendered by LuxCoreRender.

Figure 5.10: The three different renderings of the bathroom scene. The left image is the result of pbrt-v3, the middle image from Cycles, and the right image was rendered by LuxCoreRender.

The "Car Demo" scene provided one scene file for CPU and one for GPU rendering, which were used with the corresponding renderers. With Cycles, each test scene was completely rendered as specified by the default settings. The rendering time was taken together with the measured power consumption. The preparation times (e.g., building the BVH or compiling rendering kernels) were subtracted from the reported runtimes. With LuxCoreRender, the r/s were taken from the metrics samples/s and rays/sample after the rendering had held these values for multiple seconds, which avoids registering misleading values during the rendering startup.


Figure 5.11: The three different renderings of the coffee scene. The left image is the result of pbrt-v3, the middle image from Cycles, and the right image was rendered by LuxCoreRender.

In contrast to Cycles, LuxCoreRender renders the scene until it is fully converged. As not all material models of the test scenes are supported by LuxCoreRender, the materials were converted by the "Use Cycles settings" function in Blender. The resulting renderings therefore differ from those of Cycles, see Figure 5.8 and Figure 5.9 as examples. Furthermore, as different algorithms (e.g., BVH vs. k-d tree for intersection acceleration), material models, and camera perspectives were used, the bathroom and coffee scenes are not directly comparable to pbrt-v3 either, as seen in Figure 5.10 and Figure 5.11. These differences make a comparison between Cycles and LuxCoreRender (or pbrt) impossible, but for each renderer, the performance difference between CPU and GPU rendering is a valid indicator for the general performance increase by using GPUs. Note that, differently to the other tests, the complete rendering was measured and not only the intersection time. Also note that the GPU engines perform the complete rendering on the device and not only the intersection operation, so they are not hybrid approaches.

    Cycles (t [s] / P [W])
                Car Demo          Class room         bathroom           coffee
    CPU         52.15 / 186       481.88 / 194       4085.8 / 196       384.52 / 188
    GPU         93.35 / 190       1156.98 / 190      7918.03 / 191      503.04 / 191

    LuxCoreRender (Mr/s / P [W])
                Car Demo          Class room         bathroom           coffee
    CPU         7.82 / 190        6 / 197            8.8 / 199          10.92 / 192
    GPU         32.30 / 180       16.25 / 179        18 / 184           27.54 / 184

As the results show, the GPU is surprisingly slower than the CPU path tracer when using Cycles. Moreover, the same consumption was measured, and the resulting images, although not identical, are noisy in a similar way for both versions. The CPU is therefore 70 - 140% faster.


For LuxCoreRender, the results showed the expected throughput increase when using the GPU for rendering. The scenes achieved a 2 - 4 times higher throughput. As a final note, it should be mentioned that the used GPU renderers are built with the OpenCL [ope20] API. There is also the proprietary CUDA [cud20] environment for implementing such general-purpose computations on GPUs. As CUDA is more widely adopted than OpenCL, renderers using this platform may already contain optimizations that achieve an even higher throughput. But as CUDA is NVIDIA-exclusive, it could not be used on the benchmarking PC, which had an AMD card installed.

5.4 Efficiency of FPGRay

The throughput of FPGRay is inferior to that of the CPU implementation. In terms of efficiency, however, FPGRay can come out ahead, depending on the design and scene. Using the observed consumption deltas, respectively the consumption of the FPGA card alone, and comparing them with the achievable throughputs of the synthetic tests (Section 5.2) to estimate the total efficiency7, the FPGA card alone shows an efficiency competitive with the PC. For the car2 scene, FPGRay even wins the duel clearly with 898 krays/W against 211 krays/W. A much lower efficiency was observed for the coffee scene, where FPGRay still holds a narrow lead with 113 kr/W vs. 111 kr/W for pbrt-v3. The bathroom scene achieved a comparable efficiency with a small advantage for pbrt-v3 (102 kr/W vs. 91 kr/W for FPGRay). For the head scene, the PC took the lead with 149 kr/W against the 109 kr/W of the FPGA. The classroom scene, for which FPGRay performed worst, is also the scene with the biggest lead for pbrt-v3: 114 kr/W stands against the 65 kr/W of FPGRay. It should be noted that realizing larger caches would make a significant throughput and thus efficiency increase possible. Furthermore, for the head scene it would be sufficient to improve the PCIe interface, as the throughput achieved on the FPGA was by far higher than in software. To summarize, while FPGRay in combination with the used FPGA is not inefficient, it is too slow for proper use with a modern PC. Using the deltas of the power consumption measurements of the complete PC, i.e., comparing the CPU implementation in pbrt-v3 against the synthetic tests of FPGRay, leads to similar results with a more obvious advantage for the software-only approach. FPGRay still performs well for the car2 scene with 269 krays/W against 211 krays/W. But for all other scenes, the PC takes the lead by a significant margin. Figure 5.12 lists all efficiency values collected for CPU, GPU, and FPGA rendering. The CPU-based renderers pbrt-v3 and LuxCoreRender achieved different results for the similar scenes. While the GPU version of LuxCoreRender was able to achieve a higher efficiency for all scenes, FPGRay only performs better than the CPU-based renderers for the car2 scene, which illustrates that FPGRay does not scale well across different scenes, whereas the GPU implementation does.

7 40 W for FPGRay, 88 W for the PC, and 12 W for the FPGA card alone were used. The throughputs used were the highest results observed during the synthetic tests.
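For clarity, each efficiency value is simply the throughput divided by the power attributed to the respective configuration. The following worked example for the car2 scene uses the powers from footnote 7; the throughputs (about 10.78 Mr/s for FPGRay and 18.53 Mr/s for pbrt-v3) are back-computed from the reported efficiencies rather than quoted from Section 5.2:

    \[
    E = \frac{R}{P}: \qquad
    \frac{10.78\,\mathrm{Mr/s}}{12\,\mathrm{W}} \approx 898\,\mathrm{kr/W}, \qquad
    \frac{10.78\,\mathrm{Mr/s}}{40\,\mathrm{W}} \approx 269\,\mathrm{kr/W}, \qquad
    \frac{18.53\,\mathrm{Mr/s}}{88\,\mathrm{W}} \approx 211\,\mathrm{kr/W}
    \]

The first two quotients correspond to FPGRay (FPGA card alone and PC consumption delta), the last one to pbrt-v3 on the same scene.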


Absolute Efficiency (r/s/W)

                            car2 / Car Demo   classroom / Class room   bathroom     coffee       head
    LuxCoreRender (GPU)     343,617.02        174,731.18               183,673.47   281,020.41   -
    LuxCoreRender (CPU)     75,192.3          54,054.05                77,876.11    103,018.87   -
    pbrt-v3                 210,537.02        113,801                  101,751.6    110,529.97   149,222.36
    FPGRay                  269,278.44        19,549.58                27,232       33,980.74    32,825.91

Figure 5.12: The absolute efficiency of the renderers supporting special hardware.

Normalized Efficiency (%)

            Car       Classroom   bathroom   coffee
    GPU     456.98    323.25      235.85     272.79
    CPU     100       100         100        100
    FPGA    127.9     17.18       26.76      30.74

Figure 5.13: When normalizing the CPU benchmarks of the similar scenes of both renderers, an approximate comparison between FPGRay and a GPU implementation can be made.


The results lead to the conclusion that the GPU renderers are superior to the CPU renderers, which is not the case for FPGRay. Lastly, for Figure 5.13, we have aggregated the results of both CPU renderers and normalized them to 100% in order to provide an approximate comparison between the FPGA and the GPU approach. This was done by mapping both CPU implementations (i.e., LuxCoreRender and pbrt-v3 of Figure 5.12) and their similar scenes (e.g., LuxCoreRender's Car Demo with pbrt-v3's car2 scene) together to the normalized 100%. Therefore, the GPU (i.e., LuxCoreRender GPU) and FPGA (i.e., the synthetic test results of FPGRay) implementations are depicted in relation to their CPU-based counterparts. Unsurprisingly, in the current implementation, FPGRay achieved a higher efficiency than the CPU for the car2 scene and a lower efficiency for the other scenes. Furthermore, the GPU renderer is superior in terms of efficiency.

5.5 Analyzing the Results

The previous sections not only provided comparisons but also examined aspects of FPGRay that simple tests on demo scenes could not expose. The main benefit of FPGRay should be the efficiency gain compared to a CPU-only solution. Unfortunately, the efficiency measurements led to an underwhelming outcome. The most important question emerging from the presented tests is how the results should be interpreted. FPGRay is, in its current state, not usable for the intended scenario. But the use of specialized hardware to increase the efficiency and even accelerate the rendering is still a viable approach. This can be exemplified by the first GPUs containing such hardware instances for real-time graphics. As the manufacturers of these GPUs are substantial companies with enough resources to develop advanced hardware designs, it is enough for most of the target audience to use them and build their renderers on top of them instead of developing a new and costly design. However, if new or more advanced techniques should be elaborated that are not likely to be implemented in new GPUs, at least for some time, FPGRay can be used by extending its functionality. In other words, if hardware designs should be used in the field of research, FPGRay is a viable option. For production rendering, visualizations, or the development of new products, GPUs provide the highest efficiency with low implementation effort. In any case, it should be noted that FPGRay has to be improved to allow proper usage. The special tests section (Section 5.3) already exposed some bottlenecks of the presented design. In the following, we address some of them and explore the impact of possible solutions.

Cache Sizes. Quadrupling the cache sizes increased the hit rate to around 70% and achieved a 20 - 40% higher throughput (even for the head scene with an already high hit rate, the throughput increased by 39%).

Taking the (even for such cache sizes low) hit rate of around 40 - 50% into account, an improvement of the hit rate by at least a factor of 2 is possible. This means that by additionally quadrupling or, better, octupling the cache sizes relative to the biggest size used, the throughput could increase again by 40%, with the hit rate rising to around 90%. The expected gain makes the increase of the cache sizes desirable, even if it results in a higher latency for cache accesses. This throughput gain would not increase the power consumption significantly as long as the design fits on the current or a bigger FPGA of the same series. The result would be an efficiency gain equal to the throughput increase of around 40%.

Improving the Interface. As the FPGA runtimes (Section 5.3.2) exposed, the PCIe interface can limit the throughput already for small scenes. When the throughput on the FPGA is improved, the interface could prove to be a serious bottleneck. Furthermore, the measured power consumption of FPGRay's distinct units revealed that most of the power was consumed by the PC: of the measured 40 W, only 12 W were consumed by the FPGA. An improved interface would therefore not only avoid a bottleneck at the PCIe interface, it would also reduce the needed power. Although the PC's share might be reduced even further, we assume that the delta can be halved, which would in turn double the efficiency. Such a power reduction is plausible because the PC only needs to call the API, which initiates the driver transfer of the rays to and from the FPGA. It should be noted that we did not take into account that the ray data needs to be transferred into appropriate memory structures for use with FPGRay; this assumes that a renderer is built around such structures anyway to avoid a translation between different structures for the PC and FPGRay.

An improved interface would also allow using fewer resources on the FPGA, which in turn could be used for other improvements mentioned in this section. As a side effect, more than 1024 r/b could be used stably.

Number of Computational Instances. By using two instead of one traversal instance, the throughput nearly doubled. For 4 instances at 1024 r/b, an increase of 2.6, using one instance as the baseline, was observed. As the synthetic tests (Section 5.2) and FPGA runtimes indicated, the bigger designs are able to increase the throughput more for higher batch sizes than for smaller ones. It is therefore likely that using more computational instances and a higher batch size would increase the throughput roughly linearly, as seen for the smaller designs. Extrapolating from the used three traversal instances to 8 would, even with sublinear scaling, allow doubling the throughput. Such a design would not fit on the current FPGA, so the efficiency would not double, as a bigger, more power-consuming FPGA would be needed. Nevertheless, when using 8 traversal instances and a second intersection instance, or even more than 8 traversal instances to fill up a bigger FPGA, a doubling of the efficiency could be possible. Note that, as stated before, an improved interface would be necessary to retrieve this gain through higher batch sizes.


Changing the FPGA. As already mentioned, a bigger FPGA could be used to increase the throughput. But a different FPGA can also deliver a higher efficiency by itself through an improved design or new special function blocks. Furthermore, the efficiency can be improved by using an FPGA with a newer manufacturing process. Chip manufacturers state that a newer manufacturing node increases the efficiency by around 30%. The AMD Radeon Vega 56, Vega 64, and Vega VII illustrate this: they technically use the same architecture, produced in Samsung/GlobalFoundries' 14 nm process node for the Vega 56/64 and in TSMC's 7 nm process node for the Vega VII. The Vega VII delivers either 30% higher performance for the same consumption or a 30% higher efficiency through higher performance while consuming less [veg20b, veg20c, veg20a]. Note that the CPU used for the thesis' tests also uses the TSMC 7 nm process, while the FPGA uses the TSMC 28 nm process [arr20a]. This means that those two chips are at least 4 manufacturing nodes apart, which results in an at least 2.5 times higher efficiency for the CPU. Furthermore, the currently used Arria V FPGA could be changed to an Arria 10 using the 20 nm node, which delivers 20% more efficiency without changing the design at all [arr20c]. Using a bigger FPGA can improve the throughput without a much higher power consumption. As the device table [arr20b] shows, the currently used Arria FPGA could be changed to a model with 40% more of the limiting basic logic resources. This model could accommodate the 4I design with ease and should even allow up to 6 traversal instances. The mentioned Arria 10 series allows, besides the newer process node, around 3 times more logic resources. Furthermore, the biggest and most recent FPGA series from Intel, the Stratix [str20a] and Agilex [agi20] series, provide 20 and 4 times more resources, respectively, and use the 14 and 10 nm process nodes, respectively. This gives a big potential for accommodating bigger designs if the high prices can be afforded.

FPGAs vs. ASICs. As mentioned in the background chapter, an ASIC is the best option for hardware designs. Unfortunately, they are prohibitively expensive to develop and prepare for production. Still, it should be stated that a significant efficiency gain could be achieved by using an ASIC for the design. As Ian Kuon and Jonathan Rose [Ian06] found, an ASIC has a 9 - 12 times lower power consumption than a comparable FPGA of the same process node. Furthermore, the FPGA is slower by a factor of 2 - 4. This enables the ASIC to achieve an at least 18 times higher efficiency.
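The stated factor follows from multiplying the lower bounds of the two ratios, a rough estimate that assumes the speed and power advantages combine multiplicatively:

    \[
    \text{efficiency gain} \approx \text{speedup} \times \text{power ratio} \geq 2 \times 9 = 18
    \]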

When accumulating the extrapolated efficiency increases that are possible for the existing design, it can be assumed that the efficiency can be 2.8 times higher when just improving the caches and the interface. If the saved space from these improvements is used for a fourth traversal instance, those three improvements together would allow a gain by a factor of 3.6. This would already suffice to be on par in terms of efficiency with the CPU renderer for all scenes except classroom, without changing the FPGA. Furthermore, if a newer FPGA of the same Arria series is used, the design could reach an even higher efficiency for every tested scene.

Moreover, when using the available resources of a bigger FPGA to also improve the throughput, it would be fast enough to allow its intended use besides the CPU while not being the bottleneck. This is based on the assumption that 8 instead of 4 traversal instances can be accommodated and achieve a doubled throughput, which, for the worst-performing classroom scene, delivers an efficiency at least 20% higher than the CPU renderer. Note that for the newer and bigger Arria, the same power consumption is assumed, as more hardware resources increase the power consumption subproportionally to the throughput increase and the more advanced manufacturing node compensates for the higher consumption. Furthermore, to achieve or even exceed these predicted increases, the RAM, or rather its interface to the FPGA, needs to be scaled appropriately in terms of throughput. An even bigger FPGA of the high-end series or an ASIC is therefore not even necessary to outperform current CPUs when extrapolating the results. Nevertheless, to achieve a 4 times higher efficiency than the CPU and thereby also outperform GPUs, such an FPGA or even an ASIC could be needed.
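One plausible decomposition of these accumulated factors, assuming the individual gains from the preceding paragraphs multiply (the factor of roughly 1.3 for a fourth traversal instance is inferred from the stated totals, not a measured value):

    \[
    1.4\ (\text{caches}) \times 2\ (\text{interface}) = 2.8, \qquad
    2.8 \times 1.3\ (\text{4th traversal instance}) \approx 3.6
    \]
    \[
    \text{classroom:}\quad 19.5\,\mathrm{kr/W} \times 3.6 \times 2\ (\text{8 instances}) \approx 140\,\mathrm{kr/W}
    \;\approx\; 1.23 \times 114\,\mathrm{kr/W}
    \]

The last line reproduces the claim of an at least 20% higher efficiency than the CPU renderer for the classroom scene under these assumptions.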


CHAPTER 6 Future Work

After having discussed the impact of possible improvements in the previous section, ideas on how to implement them, as well as further future work, are presented below.

6.1 Improvements

6.1.1 PCIe Interface

As seen in the FPGA runtimes benchmark (Section 5.3.2), the transfer via the PCIe interface needs to be improved to avoid a bottleneck. Furthermore, the efficiency is subpar with the current approach. Technically, the interface is implemented as DMA, i.e., a dedicated controller that has direct access to the PC's RAM. But the DMA is used in a simple in-order mode, copying data consecutively. This is the standard behavior of the PCIe HardIP adopted from Altera's reference design for FPGRay. For feeding data to kdintersect, this is no problem, but for the resulting rays, which are returned out of order, a random-access scheme would be preferable. Using such a scheme would make the output buffer inside of FPGRayIF unnecessary, which would save resources. Furthermore, as rays are sent to the PC's RAM as soon as they are available instead of waiting for a specific order, the throughput is maximized. It should be noted that PCIe supports the use of interrupts, which can be used to implement callbacks for asynchronous function calls and thus avoid the currently used polling behavior. Nevertheless, interrupts were not considered for this thesis and it is unclear whether an improvement is possible by using them, so this is another aspect that could be investigated further.
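To illustrate on the software side what replacing polling with interrupt-driven completion would mean, the following C++ sketch contrasts the two waiting models. It is only a conceptual sketch: BatchCompletion, wait_polling, wait_interrupt, and on_interrupt are hypothetical names and not part of the existing FPGRay API or driver.

    #include <condition_variable>
    #include <mutex>

    // Hypothetical completion object for one submitted ray batch (not the FPGRay API).
    struct BatchCompletion {
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
    };

    // Current polling model (sketch): repeatedly read a status flag / register.
    void wait_polling(volatile bool& status_flag) {
        while (!status_flag) { /* spin; the CPU thread stays fully busy */ }
    }

    // Interrupt-driven model (sketch): the driver's interrupt path would set
    // `done` and notify; the application thread sleeps until then.
    void wait_interrupt(BatchCompletion& c) {
        std::unique_lock<std::mutex> lock(c.m);
        c.cv.wait(lock, [&] { return c.done; });
    }

    // Called from the (hypothetical) kernel-to-user notification path.
    void on_interrupt(BatchCompletion& c) {
        { std::lock_guard<std::mutex> g(c.m); c.done = true; }
        c.cv.notify_one();
    }

With the interrupt-driven variant, the submitting thread sleeps until the driver signals completion, so the CPU time spent spinning on a status flag, and with it part of the measured 40 W delta, could be saved.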

6.1.2 Caching and Memory Interface

The tests showed that even a cache size of 128 entries per fetcher is not sufficient for a good hit rate of the caches.


But the cache was optimized for latency, which only allows the use of 128 entries while meeting the timing requirements. By changing the cache to support a higher number of entries at the cost of latency, the hit rate could be improved and, with it, the overall throughput of FPGRay. The memory interface is built to support two ports to the RAM, which limits the throughput to 2 accesses per cycle. If a faster memory controller is used, the kdmemmerger (Section 4.3.4) would become the bottleneck. To avoid this, the merger needs to be extended to split memory accesses across multiple ports.

6.2 Extensions

6.2.1 Optimized Algorithms

The entities FPGRay uses are taken from pbrt-v3. Compared to the papers discussed in the related work (Chapter 3), the number of operations a triangle intersection needs is much higher. By using optimized algorithms, the latency of a computational instance decreases, which also results in a lower number of stacks needed. Consequently, the design either needs fewer resources, or a bigger design can be used on the same FPGA, which increases the throughput.
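As one example of such an optimized algorithm, the well-known Möller–Trumbore ray-triangle test gets by with two cross products, a handful of dot products, and a single division. The following single-precision C++ sketch is purely illustrative; it is not the routine implemented in kdintersecttri, which follows pbrt-v3's approach.

    #include <array>
    #include <cmath>
    #include <optional>

    using Vec3 = std::array<float, 3>;

    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return { a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0] };
    }
    static float dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
    static Vec3 sub(const Vec3& a, const Vec3& b) { return { a[0]-b[0], a[1]-b[1], a[2]-b[2] }; }

    // Möller–Trumbore intersection: returns the ray parameter t if the ray
    // (o + t*d) hits the triangle (p0, p1, p2), std::nullopt otherwise.
    std::optional<float> intersect_tri(const Vec3& o, const Vec3& d,
                                       const Vec3& p0, const Vec3& p1, const Vec3& p2) {
        const Vec3 e1 = sub(p1, p0);
        const Vec3 e2 = sub(p2, p0);
        const Vec3 pvec = cross(d, e2);
        const float det = dot(e1, pvec);
        if (std::fabs(det) < 1e-8f) return std::nullopt;   // ray (nearly) parallel to triangle
        const float inv_det = 1.0f / det;                  // the only division
        const Vec3 tvec = sub(o, p0);
        const float u = dot(tvec, pvec) * inv_det;
        if (u < 0.0f || u > 1.0f) return std::nullopt;
        const Vec3 qvec = cross(tvec, e1);
        const float v = dot(d, qvec) * inv_det;
        if (v < 0.0f || u + v > 1.0f) return std::nullopt;
        const float t = dot(e2, qvec) * inv_det;
        return (t > 0.0f) ? std::optional<float>(t) : std::nullopt;
    }

Fewer arithmetic operations directly translate into a shorter pipeline and thus fewer in-flight rays that need stack entries.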

6.2.2 New Operations

The intersection operation is an essential and heavily used operation for every ray tracing algorithm. But FPGRayIF could instance additional functions to support more parts of the rendering process. This may even be extended to use multiple operations consecutively before returning the data back to software.

6.2.3 pbrt-fpgray Part 2

Currently, a pbrt version extending pbrt-v3 with the use of FPGRay is provided. But this extension only replaces the computation of a k-d tree intersection with a method that makes use of FPGRay. The function therefore intersects each ray with its own batch, which is inefficient. To make use of the parallelism of the FPGA, pbrt (or any other renderer) needs to be rewritten to render with batches. The optimal solution should send multiple samples and use tiles to make optimal use of the locality of the rays. When performing path tracing, the paths should be split into distinct rays such that for each bounce, the locality of the rays is maximized. The renderer should support collecting a high number of rays for a single batch—at least 128—to reach the limit of the FPGA throughput. Only the usage of a real renderer instead of the synthetic test program makes a proper comparison of the hybrid approach vs. the software approach possible. The current tests do not reveal possible bottlenecks when rendering and communicating with the same data, i.e., whether the synchronization between the rendering threads and the communication thread to the FPGA works without long waiting times.
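A minimal C++ sketch of the intended batched flow is given below. The Ray and Hit structs, the batch size constant, and the FPGRaySubmit/FPGRayCollect calls are hypothetical placeholders (with stub bodies so the sketch is self-contained), not the implemented FPGRay API.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical plain-data types; the real FPGRay structures may differ.
    struct Ray { float o[3], d[3], tmax; std::uint32_t id; };
    struct Hit { std::uint32_t ray_id, prim_id; float t; };

    // Stand-ins for a FPGRay user-space API; signatures and behavior are assumptions.
    void FPGRaySubmit(const Ray*, std::size_t) { /* would hand the batch to the driver */ }
    std::size_t FPGRayCollect(Hit* hits, std::size_t max) {
        for (std::size_t i = 0; i < max; ++i) hits[i] = Hit{};  // stub results
        return max;                                             // real results arrive out of order
    }

    constexpr std::size_t kBatchSize = 128;  // at least 128 rays per batch (see text)

    // Collects locality-preserving rays (e.g., one tile/bounce) into batches,
    // submits them, and gathers the out-of-order results via their ray ids.
    void trace_batched(std::vector<Ray>& pending, std::vector<Hit>& results) {
        std::vector<Ray> batch;
        batch.reserve(kBatchSize);
        while (!pending.empty()) {
            while (batch.size() < kBatchSize && !pending.empty()) {
                batch.push_back(pending.back());
                pending.pop_back();
            }
            FPGRaySubmit(batch.data(), batch.size());
            std::vector<Hit> buf(batch.size());
            std::size_t got = 0;
            while (got < batch.size())
                got += FPGRayCollect(buf.data() + got, batch.size() - got);
            results.insert(results.end(), buf.begin(), buf.end());
            batch.clear();
        }
    }

In a real integration, the rendering threads would keep filling the pending queue while a dedicated communication thread performs the submit/collect loop; this is exactly the synchronization behavior the current tests cannot evaluate.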


6.2.4 Ray Storage

The rays' data should be stored in structs rather than classes in ray tracers. As the batches are arrays of structs, this avoids the conversion and decreases the overhead of the data transfer between the PC and FPGRay. Depending on the PC's platform, it should be considered to use the low-level allocation function dma_alloc_coherent() of Linux' DMA-API. In addition to using structs, this may omit the copy operation between the renderer's allocated memory and the memory space used for DMA, but it is not supported on all platforms.
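A minimal sketch of what such a plain ray record could look like is shown below; the field layout is an illustrative assumption, not the format FPGRay actually expects. The point is that a trivially copyable, standard-layout struct stored in a contiguous array can be handed to the DMA path as raw bytes, without a per-ray conversion step.

    #include <cstdint>
    #include <type_traits>
    #include <vector>

    // Plain-old-data ray record; field names and widths are illustrative assumptions.
    struct RayRecord {
        float         origin[3];
        float         direction[3];
        float         tmin, tmax;
        std::uint32_t ray_id;
        std::uint32_t pad;          // keeps the record size a multiple of 8 bytes
    };

    // Trivially copyable + standard layout: the batch can be copied into (or, with
    // dma_alloc_coherent(), allocated directly in) a DMA buffer as raw bytes.
    static_assert(std::is_trivially_copyable<RayRecord>::value, "must be DMA-able");
    static_assert(std::is_standard_layout<RayRecord>::value, "must be DMA-able");

    using RayBatch = std::vector<RayRecord>;   // contiguous array of structs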


CHAPTER 7 Conclusion

FPGRay is an approach to accelerate path tracing with the use of an FPGA. This is done by providing a PCIe card plugged into a PC and appropriate drivers to use this hardware for acceleration. With such a hybrid approach between hardware and software, the advantages of both can be combined. The customized hardware design enables an increase in the overall efficiency but still saves hardware resources because not all computations needed for ray tracing have to be built in silicon. Such computations are done in software to utilize the resources of a modern PC. The approach thus makes it possible to concentrate on the heavily used tasks of the rendering algorithm, which should be done by the hardware design to increase the efficiency. As the ideal candidate for such an operation, the intersection of a ray against a k-d tree scene was chosen to be implemented in hardware. The flexibility the software part of FPGRay delivers allows adding new techniques to the rendering process without changing the hardware design. The efficiency improvement is still present as long as the hardware-accelerated k-d tree intersection routine can be used. This sustainability of FPGRay is unique compared to all previous work, which required a new hardware design if even a single aspect of the ray-tracing algorithm changed. To make use of FPGRay from software, a driver for Linux-based PCs and a C++ API were implemented. Furthermore, a short demo implementation of pbrt's k-d intersection method for FPGRay showed the functionality of the system. Because of the naive implementation, which just replaces the original method and does not consider the batch sizes needed for efficient hardware acceleration, further work to properly extend a renderer with FPGRay's functionality is necessary. To allow realistic benchmarks with the needed—batched—data flow, a synthetic test program was implemented that makes use of scene and ray data generated through the rendering of a scene with pbrt-v3.


As FPGRay's hardware design is flexible and configurable (e.g., in the number of traversal instances), the synthetic test program used the generated test dumps to evaluate the throughput of different designs synthesized on the FPGA. The results showed that, although working, the throughput was poor in comparison to FPGRay's host PC rendering the scenes in software alone. Depending on the tested scene, the throughput was between a factor of 2 and roughly an order of magnitude worse than using the software renderer. As a direct comparison was not feasible, CPU as well as GPU renderers were fed with similar scenes to provide an approximate comparison between GPU rendering and FPGRay. Unsurprisingly, as GPU rendering achieves a higher throughput than its CPU counterparts, FPGRay's throughput is also lower in this comparison. After testing different parts of FPGRay to find the reason for its bad performance, some key problems in the hardware design were found. The memory subsystem adds a high throughput penalty to FPGRay because the caches are too small to buffer the RAM accesses of the PCIe card properly. This stresses the RAM interface in a way that it easily gets over-utilized because it is already working at its limit. Another problem is the interface between the PCIe card and the PC, which is not only a part of the hardware design operating at its limit but also uses a non-optimal data transfer. Moreover, the currently used FPGA was too small to fit a design with enough computational units to process the PC's data fast enough. These main drawbacks of the current implementation do not allow FPGRay to unfold its full potential. The final measurements of the power consumption nevertheless showed a beneficial efficiency of FPGRay. However, the absolute throughput is too low for current PCs. It therefore acts as the bottleneck of the overall system and makes a use in the intended hybrid form not viable. Unfortunately, FPGRay could not surpass the efficiency of GPU renderers. In its current design, the efficiency is therefore not improved over existing solutions. Lastly, the tests indicated that FPGRay has much untapped potential to improve the efficiency and, with it, the throughput, which in turn may allow surpassing existing approaches. Unfortunately, the indicated possibilities for improvement could not be further explored in the scope of this thesis. Within this scope, FPGRay was not able to reach its goal. Nevertheless, on its basis, multiple improvements were presented that show how, and with which expected impact, the goals can be reached. However, as dedicated ray-tracing hardware was introduced to GPUs recently, the effort to implement those improvements is only justified for a few specific purposes, in particular topics that focus on the investigation or implementation of new or optimized hardware-accelerated functionality in the field of ray tracing.

List of Figures

2.1 The hemisphere over which the rendering equation needs to be integrated. The output vector points towards the camera from the hit point where the incoming radiances from all directions irradiate (depicted by the incoming vectors) ...... 7
2.2 The two green objects' surfaces are diffuse. If the integral of the blue hit point should be computed, one direction of the hemisphere comes from the red hit point, whose value is needed. The integral of this hit point in turn requires the blue hit point's integral. The integrals are mutually dependent ...... 7
2.3 The basic parts of a path tracer. Three samples are shown. The first path (green) has one bounce at the mirror surface and then hits the light source. The second path (black) goes through a glass object (first and second bounce) and changes its direction towards the light at the last bounce (diffuse object). The third path (orange) does not contribute to the image as no light source is reached ...... 9
2.4 A k-d tree splits the scene recursively, ultimately enabling an efficient intersection of a ray with the scene. In each iteration, a particular axis is chosen and the scene is subdivided by a plane determined by the split position. Because of the partitioning, the tree is traversed in such a way that the nearest primitives along the ray's direction are intersected first ...... 9
2.5 The SAH considers edges of the bounding boxes that are along the axis with the maximum extent. Since the computation of the cost function with a split along one of those edges results in the minimum cost achievable, intermediate positions do not need to be considered. With this approach, a finite number of split candidates can be easily and efficiently generated ...... 11
2.6 The intersection of a ray against a k-d tree ...... 12
2.7 The Qsys Main window showing the actual Qsys system of FPGRay ...... 15

4.1 FPGRay’s schematic ...... 25 4.2 The schematic of kdintersect...... 28 4.3 The logic needed to compute an intersection of a bounding box with a ray. Note that any conduit at the input or output of an operation or process on the same height as other conduits is in the same cycle (i.e., synchronized). The only exceptions for this are the two partitions through the red lines. . 30

4.4 The schematic of kdscheduler. The number of the green kdinnernode instances, the red kdintersecttri instances and the kdleaf instances is determined by parameters. Although it does not seem so, kdintersect can be parameterized to do the reinsertion of the intersection path to the innernode chain on all positions. This includes the first position, where also the input data is taken ...... 31

4.5 The logic to evaluate the children of an inner node and forward them in the right order. Note that conduits of inputs and outputs (applies also to the inputs and outputs to the instances) drawn in the same height are synchronous, i.e., in the same cycle of their computation. The only exception is the splitting depicted by the red line...... 35

4.6 The logic to intersect a triangle against a ray. Notice that this entity is the only one making explicitly use of double precision floating point operations (operations marked with DP). It should be mentioned that only the input and output signals are in the same cycle but conduits of other operations or processes may not be on the same cycle if they are at the same height. . . 36

4.7 A kdnodefetcher instance consists of a kdcachedfetcher and two kdnodedecode instances. A buffer for inner nodes is buffering data if two inner nodes have finished fetching at the same cycle...... 39

4.8 The connection scheme of kdmemmerger. Each type of fetcher has its own array of ports. Depending on the number of fetchers, different numbers of ports per array are used. Only the two ports to the RAM are fixed. . . . . 41

4.9 The connection scheme of kdstack, showing ports of read (r), write (w), and reset (rst) access type. * is the number of ports of the comparison instance (kdintersecttri +1 or one if kdintersecttri = 1), + is the number of ports one kdleaf instance (Section 4.3.6) has (of each of the two needed types): It is for full kdleafs 8 or 16 (2 ports from every kdinnernode instance, where 4 or 8 are connected to each full kdleaf, depending on the configuration) and for the last kdleaf 2×(remaining kdinnernode instances)+1...... 43

4.10 This is how a stack element on the stack actually looks like...... 44

4.11 The internal scheme of kdleaf. Depending on the number of primitives a node has, three paths can be taken. The path on the left can be taken by nodes with no primitives at all, the right directly from the buffers to build output if a node has only one primitive. The complete run through the rightmost path with a kdprimindicesfetcher is only done for leaf nodes with more than one primitive...... 45

4.12 The logic for comparing primitives over multiple cycles, including the state machine that is used. Note that only the feedback functionality is depicted here for simplicity...... 47

4.13 The part of the Qsys system FPGRay is composed of which is needed for the PCIe communication. clk_0 is a clock source using via exported conduit a pin of the FPGA. For configuration of the PCIe-controller HardIP pcie_256_hip_avmm_0, the two components on the top are needed. Note the connections which are done via the Avalon bus ("Avalon Memory Mapped Master/Slave" in the "Description"). They are indicated by the lines in the "Connections" column and connect the data transfer between all of FPGRay's components ...... 57
4.14 The part of the Qsys system FPGRay is composed of which is containing the FPGRay-specific components FPGRayIF_0 and resetter_0. Additionally, a small on-chip memory (to store DMA requests) is seen at the top, including a second clock source clk_1 FPGRay's components are using. Note the connections which connect their Avalon bus interfaces to those of the PCIe components shown in Figure 4.13 ...... 58
4.15 The function of the clock algorithm ...... 64
4.16 One instance of the complexcmp_stage checks for inputs with the same cluster number (i.e., are intersection results of the same node) and merges them in a pairwise manner. The entity can merge pairs of multiple nodes independently in the same cycle. Merged inputs are returned at the left of the two possible output ports which is desired for the final comparison step. The merged result at the output is the primitiveid with the smaller rtmax value. Note that the stack index is always taken from the right input, as the last primitive of a node shows up at the rightmost position and is the only primitive which contains valid stack data ...... 67

5.1 The test scenes from Benedikt Bitterli's page. "Pontiac GTO 67" is seen on the left and "Japanese Classroom" on the right ...... 78
5.2 The test scenes from the official pbrt-v3 page. "bathroom" is seen on the left, "coffee-splash" in the middle, and "head" on the right side ...... 78
5.3 Synthetic test run results performed on the car2 scene ...... 80
5.4 Synthetic test run results performed on the classroom and bathroom scenes ...... 81
5.5 Synthetic test run results performed on the coffee and head scenes ...... 82
5.6 The results of the simulated intersection operation of kdintersect for the classroom scene. Note that 1Is is the synthetic test's result for reference ...... 90
5.7 The timing diagram of a simulation of 4I with the classroom scene showing the input and output of a ray from kdintersect ...... 91
5.8 The test scene "Class room". The left image depicts the intended outcome as rendered by Cycles, the right how it is processed by LuxCoreRender ...... 94
5.9 The test scene "Car Demo". The left image depicts the intended outcome as rendered by Cycles, the right image shows how it is rendered by LuxCoreRender ...... 94

5.10 The three different renderings of the bathroom scene. The left image is the result of pbrt-v3, the middle image from Cycles, and the right image was rendered by LuxCoreRender ...... 94
5.11 The three different renderings of the coffee scene. The left image is the result of pbrt-v3, the middle image from Cycles, and the right image was rendered by LuxCoreRender ...... 95
5.12 The absolute efficiency of the renderers supporting special hardware ...... 97
5.13 When normalizing the CPU benchmarks of the similar scenes of both renderers, an approximate comparison between FPGRay and a GPU implementation can be made ...... 97

List of Tables


List of Algorithms


Acronyms

API Application Programming Interface. 3, 24–26, 51–54, 70, 78, 79, 87, 93, 96, 99, 107

ASIC Application-specific Integrated Circuit. 13, 19, 22, 100, 101

BDPT bidirectional path tracing. 8, 77

BSDF bidirectional scattering distribution function. 6, 8

BVH Bounding Volume Hierarchy. 9, 21, 22, 24, 28, 94, 95

CGI computer-generated imagery. 1, 2

CPU Central Processing Unit. 2, 12–14, 18, 21–23, 85, 92–98, 100, 101, 108, 112

DMA Direct Memory Access. 26, 49–51, 55, 58, 59, 69, 70, 73, 87, 103, 105, 111

EDA Electronic Design Automation. 14, 15, 17–19, 55, 70

FIFO First in - First out (buffer). 14, 27, 63

FPGA Field Programmable Gate Array. 3, 13–17, 19, 21, 22, 24, 25, 29, 32, 41, 43, 48–53, 55–58, 62, 63, 65, 68–70, 72, 75, 77, 80, 84, 85, 88–90, 92, 93, 96, 98–101, 103, 104, 107, 108, 111

GPU Graphics Processing Unit. 3, 13, 18, 75, 85, 93–98, 101, 108, 112

IDE Integrated Development Environment. 14

IP Intellectual Property. 14, 16, 19, 26, 55–57, 59, 62, 64, 70, 75, 76, 103, 111

LED Light-emitting Diode. 56

LSB Least Significant Bit. 74

MLT Metropolis light transport. 8

PC Personal Computer. 3, 24, 26, 55, 56, 59, 68, 69, 75–77, 85, 92, 93, 96, 99, 103, 105, 107, 108

PCIe Peripheral Component Interconnect Express. 3, 14, 24–26, 49, 50, 52, 53, 55–60, 62, 68, 69, 73, 75, 77, 80, 89, 96, 99, 103, 107, 108, 111

PPT Package Power Tracking. 93

RAM Random Access Memory. 14, 25, 26, 32, 34, 35, 37, 38, 40, 41, 44–46, 49–53, 55, 56, 59–65, 72, 75, 76, 84–89, 91, 93, 101, 103, 104, 108, 110

SAH Surface Area Heuristic. 10, 11, 27, 109

SISD Single Instruction, Single Data. 21

spp samples per pixel. 8, 77, 79, 80

USB Universal Serial Bus. 21

VHDL Very High Speed Integrated Circuit Hardware Description Language. 13–17, 19, 25, 55, 56, 70, 89

VR virtual reality. 1

Bibliography

[agi20] intel® Agilex™ I-Series SoC FPGA Product Table, accessed on 08.06.2020. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/intel-agilex-i-series-product-table.pdf.

[ALK] Linux Loadable Kernel Module. https://wiki.archlinux.org/index.php/Kernel_module.

[arr20a] Arria V Device Overview, accessed on 08.06.2020. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/arria-v/av_51001.pdf.

[arr20b] Arria V FPGA and SoC Features, accessed on 08.06.2020. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/arria-v-product-table.pdf.

[arr20c] Intel® Arria® 10 FPGAs, accessed on 08.06.2020. https://www.intel.com/content/www/us/en/products/programmable/fpga/arria-10.html.

[Bit16] Benedikt Bitterli. Rendering resources, 2016. https://benedikt-bitterli.me/resources/.

[ble20a] Blender: Cycles render demo scenes, accessed on 01.06.2020. https://www.blender.org/download/demo-files/#cycles.

[ble20b] Blender: Open source 3D creation, accessed on 01.06.2020. https://www.blender.org.

[CKl+10] Byn Choi, Rakesh Komuravelli, Victor Lu, Hyojin Sung, Robert L. Bocchino Jr., Sarita Adve, and John Hart. Parallel SAH k-d tree construction. pages 77–86, 06 2010.

[cud20] NVIDIA’s CUDA parallel computing platform, accessed on 01.06.2020. https://developer.nvidia.com/cuda-zone.

[FW80] J. D. Foley and Turner Whitted. An Improved Illumination Model for Shaded Display, 1980.

[Ian06] Ian Kuon and Jonathan Rose. Measuring the Gap between FPGAs and ASICs. In ACM International Symposium on Field Programmable Gate Arrays, 2006.

[Jak10] Wenzel Jakob. Mitsuba renderer, 2010. http://www.mitsuba-renderer.org.

[Kaj86] James T. Kajiya. The Rendering Equation. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '86, pages 143–150, New York, NY, USA, 1986. ACM. http://doi.acm.org/10.1145/15922.15902.

[LKM] Linux Loadable Kernel Module introduction. http://tldp.org/HOWTO/Module-HOWTO/x73.html.

[LLN+12] Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah, Jin-Woo Kim, Youngsam Shin, Jaedon Lee, and Seok-Yoon Jung. SGRT: A Scalable Mobile GPU Architecture Based on Ray Tracing. In ACM SIGGRAPH 2012 Posters, SIGGRAPH '12, pages 44:1–44:1, New York, NY, USA, 2012. ACM. http://doi.acm.org/10.1145/2342896.2342953.

[lux20] LuxCoreRenderer: Open Source Physically Based Renderer, accessed on 22.03.2020. https://luxcorerender.org.

[NKK+14] Jae-Ho Nah, Hyuck-Joo Kwon, Dong-Seok Kim, Cheol-Ho Jeong, Jinhong Park, Tack-Don Han, Dinesh Manocha, and Woo-Chan Park. RayCore: A Ray-Tracing Hardware Architecture for Mobile Devices. ACM Trans. Graph., 33(5):162:1–162:15, September 2014. http://doi.acm.org/10.1145/2629634.

[NPP+11] Jae-Ho Nah, Jeong-Soo Park, Chanmin Park, Jin-Woo Kim, Yun-Hye Jung, Woo-Chan Park, and Tack-Don Han. T&I Engine: Traversal and Intersection Engine for Hardware Accelerated Ray Tracing. In Proceedings of the 2011 SIGGRAPH Asia Conference, SA '11, pages 160:1–160:10, New York, NY, USA, 2011. ACM. http://doi.acm.org/10.1145/2024156.2024194.

[ope20] OpenCL: Open Standard for Parallel Programming of Heterogeneous Systems, accessed on 01.06.2020. https://www.khronos.org/opencl/.

[pbr20] Scenes for pbrt-v3, accessed on 05.06.2020. https://pbrt.org/scenes-v3.html.

[PJH16] Matt Pharr, Wenzel Jakob, and Greg Humphreys. Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2016.

[qua20] Intel Quartus Prime Software Suite, accessed on 20.02.2020. https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/overview.html.

[ryz20] AMD Ryzen 3700X Power Consumption and Limits, accessed on 02.05.2020. https://www.anandtech.com/show/14605/the-and-ryzen-3700x-3900x-review-raising-the-bar/19.

[str20a] intel® Stratix® 10 GX/SX Product Table, accessed on 08.06.2020. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/stratix-10-product-table.pdf.

[str20b] stress - tool to impose load on and stress test systems, accessed on 31.05.2020. https://linux.die.net/man/1/stress.

[SWS02] Jörg Schmittler, Ingo Wald, and Philipp Slusallek. SaarCOR: A Hardware Architecture for Ray Tracing. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, HWWS '02, pages 27–36, Aire-la-Ville, Switzerland, Switzerland, 2002. Eurographics Association. http://dl.acm.org/citation.cfm?id=569046.569051.

[TR01] Andrew S. Tanenbaum and Robbert Van Renesse. Modern Operating Systems, Second Edition, 2001.

[veg20a] ComputerBase: AMD Radeon Vega VII vs. Vega 64, accessed on 08.06.2020. https://www.computerbase.de/2019-02/amd-radeon-vii-test/5/#abschnitt_radeon_vii_vs_rx_vega_64_bei_gleicher_leistungsaufnahme.

[veg20b] Radeon Vega VII Test, accessed on 08.06.2020. https://www.igorslab.de/heisses-eisen-im-test-amd-radeon-vii-mit-viel-anlauf-und-wind-auf-augenhoehe-zur-geforce-rtx-2080/16/.

[veg20c] The AMD Radeon VII Review, accessed on 08.06.2020. https://www.anandtech.com/show/13923/the-amd-radeon-vii-review/19.

[WSS05] Sven Woop, Jörg Schmittler, and Philipp Slusallek. RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing. In ACM SIGGRAPH 2005 Papers, SIGGRAPH ’05, pages 434–444, New York, NY, USA, 2005. ACM. http://doi.acm.org/10.1145/1186822.1073211.

[ZHWG08] Kun Zhou, Qiming Hou, Rui Wang, and Baining Guo. Real-time kd-tree construction on graphics hardware. Technical Report MSR-TR-2008-52, April 2008.
