<<

Louisiana State University LSU Digital Commons

LSU Master's Theses Graduate School

2013 GPU acceleration of the Variational for Many Body Physics Kaushik Ragavan Rajagopalan Louisiana State University and Agricultural and Mechanical College, [email protected]

Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_theses Part of the Electrical and Computer Engineering Commons

Recommended Citation Rajagopalan, Kaushik Ragavan, "GPU acceleration of the Variational Monte Carlo Method for Many Body Physics" (2013). LSU Master's Theses. 3622. https://digitalcommons.lsu.edu/gradschool_theses/3622

This Thesis is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Master's Theses by an authorized graduate school editor of LSU Digital Commons. For more , please contact [email protected]. GPU ACCELERATION OF THE VARIATIONAL MONTE CARLO METHOD FOR MANY BODY PHYSICS

A Thesis

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

in The Department of Electrical and Computer Engineering

by Rajagopalan Kaushik Ragavan B.ENG in Electronics and Communication, Anna University, Chennai, May 2010 May 2013 Acknowledgments

My sincere thanks to Dr. David Koppelman for his valuable teachings on GPU, CPU architecture and on High Performance computing, which motivated me to pick my thesis in this domain. He was a great mentor and guided me on every step of my thesis. I would like to thank Dr.Mark Jarrell, Dr.Juana Moreno and the LA-Sigma group for funding this research. I would like to thank Dr.Xin Li and Dr.Juana Moreno for accepting my request to be a part of the thesis committee. I would also like to thank other colloborators: Dr.Ka Ming Tam, Dr.Zhifeng Yun, Niladri Sengupta and Dr.Sandeep Pathak for their valuable contributions and sharing of thoughts. Finally, I would like to thank my parents for supporting my education till now. It was their motivation and trust which made my degree possible.

ii Table of Contents

ACKNOWLEDGMENTS ...... ii

LIST OF TABLES ...... v

LIST OF FIGURES ...... vi

ABSTRACT ...... viii

CHAPTER 1 VARIATIONAL MONTE CARLO (VMC) ...... 1 1.1 Trial Wavefunction ...... 1 1.2 The VMC Algorithm ...... 2 1.2.1 Difficulty of VMC ...... 3 1.2.2 Faster VMC ...... 4 1.2.3 Summary of VMC method ...... 5

2 PRIORWORK...... 7 2.1 CPU Implementation ...... 7 2.2 MPI Implementation ...... 7 2.3 GPU Implementation ...... 7

3 GRAPHICS PROCESSING UNITS ...... 9 3.1 Increasing Trend in Parallel Computing ...... 9 3.2 CUDA Programming model...... 11

4 CPU IMPLEMENTATION OF VMC ...... 14 4.1 Pseudocode for the VMC Method ...... 14 4.1.1 Structure of the CPU code ...... 19 4.2 Calculation of the Ratio of Determinants using SMH Formula ...... 20 4.3 Update of a configuration ...... 22 4.4 Algorithm Analysis ...... 24 4.5 Cache Behavior...... 24 4.5.1 Function DetPByDet ...... 25 4.5.2 Function UpdateConfig ...... 25 4.5.3 Function Pairfunction ...... 25 4.6 Results ...... 26 4.7 Multiple MCs and the system behavior ...... 31

5 PORTING VMC TO CUDA ...... 33

iii 5.1 GPU Acceleration of the VMC Method ...... 33 5.2 Na¨ıve Implementation ...... 33 5.3 Execution configuration...... 35 5.4 Memory requirements and cache behaviour ...... 36 5.5 Memory Access Pattern...... 39 5.5.1 Function DetPByDet ...... 39 5.5.2 Function UpdateConfig ...... 42 5.6 Optimization efforts ...... 44 5.6.1 Thread selection in depth ...... 44 5.6.2 Elimination of redundant copies of a configuration...... 46 5.6.3 Optimized memory access pattern ...... 46 5.7 Merger of Equilibration and Accumulation ...... 51 5.8 Streamlined function ...... 53

6 RESULTS...... 57 6.1 Input Parameters ...... 57 6.2 Comparison between a CPU and GPU code ...... 57 6.3 Comparison between MPI-CPU and GPU code ...... 57 6.4 GPU results in depth ...... 60

7 CONCLUSION ...... 64 7.1 Conclusion ...... 64 7.2 Future Work ...... 64

REFERENCES ...... 65

APPENDIX

A CUDA CODE FOR THE MERGED EQUILIBRATION AND ACCUMULATION STAGE ...... 68

B DEVICE FUNCTIONS ...... 74

VITA ...... 87

iv List of Tables

4.1 Structures and Memory requirements...... 19

4.2 Classes used in the Implementation of VMC ...... 19

4.3 FLOPS calculation for the VMC method ...... 24

4.4 Execution time for L =5...... 27

4.5 Execution time for L =9...... 28

4.6 Execution time for L =15...... 29

5.1 Speculation of a GPU‘s cache behaviour for L =5 ...... 37

5.2 Speculation of a GPU‘s cache behaviour for L =9 ...... 37

5.3 Speculation of a GPU‘s cache behaviour for L =15 ...... 38

5.4 Summary of Memory prediction for L =5...... 38

5.5 Threads per MC and Number of MCs per SM...... 46

5.6 Occupancy calculation for L =5...... 46

6.1 Input Parameters ...... 58

6.2 Comparison of CPU vs GPU Performance for bfactor=8...... 59

6.3 Comparison of CPU vs GPU Performance for bfactor=16 ...... 59

6.4 Nodal architecture of Philip Supercomputer ...... 59

6.5 Comparison of MPI vs GPU Performance ...... 60

6.6 GPU execution time in cycles for L =5...... 60

6.7 GPU execution time in cycles for L =9...... 61

6.8 GPU execution time in cycles for L =15...... 62

v List of Figures

3.1 Floating Point operations per second for CPU vs GPU, Source: CUDA Programming guide ...... 10

3.2 Memory Bandwidth for CPU vs GPU, Source: CUDA Program- ming guide ...... 10

3.3 GPU vs CPU architecture, Source: CUDA Programming guide ...... 11

3.4 Automatic Scalability of Applications, Source: CUDA Program- ming guide ...... 12

3.5 CUDA Thread hierarchy, Source: Programming Massively Par- allel Processors- A hands-on Approach ...... 13

3.6 CUDA Memory Hierarchy, Source CUDA Programming guide ...... 13

4.1 Sequential Memory Access Pattern for the up spin: (a) Ψ−1, (b) Plist, (c) Pairfunction ...... 20

4.2 Sequential Memory Access Pattern for the down spin: (a) Ψ−1, (b) Plist, (c) Pairfunction ...... 21

4.3 Update of a configuration for the up spin ...... 22

4.4 Update of a configuration for the down spin...... 23

4.5 Pie for L = 5:(a) Split of Equilibration, (b) Split of Accumulation ...... 26

4.6 for L = 9:(a) Split of Equilibration, (b) Split of Accumulation ...... 30

4.7 Pie chart for L = 15:(a) Split of Equilibration, (b) Split of Accumulation . . . . . 30

5.1 VMC Workflow for Multiple Markov Chains (MCs) ...... 34

5.2 Memory Access Pattern for up spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction ...... 40

5.3 Memory Access Pattern for down spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction ...... 41

vi 5.4 Memory Access Pattern for up spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction ...... 43

5.5 Memory Access Pattern for down spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction ...... 45

5.6 Optimization for up spin: (a) Ψ−1, (b) Plist, (c) Pairfunction, (d) Reduction on a shared memory...... 48

5.7 Optimization for down spin: (a) Ψ−1, (b) Plist, (c) Transposed Pairfunction, (d) Reduction on a shared memory...... 49

5.8 Optimization for up spin: (a) Ψ−1 - update of Ipairth column, −1 −1 (b) Ψ with threads-per-col, TP , (c) Pairfunction, (d) Ψ - update of other columns...... 50

5.9 Optimization for down spin: (a) Ψ−1 - update of Ipairth row, (b) Ψ−1 with a factor, bfactor (c) Transposed Pairfunction, (d) Ψ−1 - update of other rows...... 52

−1 5.10 Optimization for up spin: (a) Ψ - with TP , (b) Pairfunction, (c) Reduction in a shared memory to obtain dpbd1, (d) Calcu- lation of dot product using Jpairth row of Ψ−1, (e) Calculation of dot product using Transposed pairfunction, (f) Reduction in a shared memory to obtain the final dpbd: dpbd = dpbd1 ∗ Reduction(colval) . . 55

−1 5.11 Optimization for down spin: (a) Ψ - with TB, (b) Pairfunction, (c) Reduction in a shared memory to obtain dpbd1, (d) Calcula- tion of dot product using Jpairth column of Ψ−1, (e) Calculation of dot product using Transposed pairfunction, (f) Reduction in a shared memory to obtain the final dpbd: dpbd = dpbd1 ∗ Reduction(rowval) . 56

6.1 Pie chart for L =5...... 61

6.2 Pie chart for L =9...... 62

6.3 Pie chart for L =15...... 63

vii Abstract

High-Performance computing is one of the major areas making inroads into the future for large-scale simulation. Applications such as 3D nuclear test, Molecular Dynamics, and Quantum Monte Carlo simulations are now developed on supercomputers using the latest computing technologies. As per the TOP500 supercomputers rating, most of today‘s supercomputers are now heterogeneous: with massively parallel Graphics Processing Units (GPU) equipped with Multi-core CPU(s) to increase the computational capacity. The Variational Monte Carlo(VMC) method is used in the Many Body Physics to study the ground state properties of a system. The wavefunction depends on some variational parameters, which contain the physics for a better prediction. In general, the variational parameters are chosen to realize some sort of order or broken symmetry such as supercon- ductivity and magnetism. The variational approach is computationally expensive and requires a large number of trajectories to obtain convergence. The Markov chains (MCs) exhibit abundant data parallelism and parallelizing across CPU clusters will prove to be expensive and does not scale in proportion to the system size. Hence, this method will be a suitable candidate on a massively parallel Graphics Processing Unit (GPU). In this research, we discuss about the various optimization and parallelization strategies adopted to port the VMC method to a NVIDIA GPU using CUDA. We obtained a speedup of nearly 3.85 X compared to the MPI implementation [4] and a speedup of upto 19 X compared to an object-oriented C++ code.

viii Chapter 1 Variational Monte Carlo (VMC) 1.1 Trial Wavefunction Variational Monte Carlo (VMC) method is a direct application of Monte Carlo inte- gration to strongly correlated systems. The variational approach has been used widely in different areas of condensed matter physics, in particular the d-wave superconducting state of the high TC cuprates at T=0 [1]. In quantum mechanics, variational principle can be derived by expanding a normalized trial wavefunction, ΨT , in terms of the exact normalized eigenstates of the Hamiltonian [9].

∞ X ΨT = ciΨi, (1.1) i=0

where ci is given by, ∞ X 2 |ci| = 1 (1.2) i=0

The many-body Hamiltonian, Hˆ , evaluated by,

* + D E ˆ X ˆ X X 2 ΨT H ΨT = ciΨi H cjΨj = |ci| i, (1.3) i j i

D E ˆ where i = Ψi H Ψi From the above equations [9], the expectation value of a trial wavefunction with the Hamiltonian must be greater than or equal to the true ground state energy. The varia- tional method depends mostly on the trial wavefunction used. Wavefunctions are normally obtained by Hartree-Fock or similar methods and additional parameters added to build in

1 additional physics such as, known limits and derivatives of the many-body wavefunction. The wavefunction is then further optimized by the variational parameters. From [4], an example of a trial wavefunction is then taken to be,

" N # X Ψ(R) = D(R)exp −u(rij) , (1.4) i

where D is a determinant of Hartree-Fock or meanfield solutions. The variational pa- rameters for constructing D is used to optimize the trial wavefunction. The extra projection factor u is included in the above wavefunction. 1.2 The VMC Algorithm The VMC algorithm consists of two distinct phases [9, 5]: Equilibration and Accumu- lation (Measurement). In the first phase, the system is made to equilibrate and sampled for |Ψ|2. In the second phase, the energies and other observables are accumulated. Thus we perform random walk across configurations.

1. Equilibration:

(a) Generate an initial random configuration.

(b) For each electron in the configuration:

0 i. Propose a move from mr to mr

0 2 ii. Compute the ratio R = |Ψ(mr )/Ψ(mr)|

iii. Perform metropolis acceptance comparison min(1,R)

iv. If the move is accepted, update the configuration.

v. Else restore the old configuration.

(c) Repeat the above steps until the system equilibrates.

2 2. Accumulation:

(a) Repeat the same procedure from Equilibration.

0 (b) Accumulate the local energy and other observable parameters at mr and mr,

(c) Perform metropolis acceptance comparison min(1,R)

(d) Repeat the above steps until energies are accumulated

1.2.1 Difficulty of VMC The energy function [5] is defined as,

P Ψ∗ (C)HΨ (C) E(α ) = C α α (1.5) i P ∗ C Ψα(C)Ψα(C)

In many of these cases, the wavefunction in real space is given by, Ψα(r1↑, . . . , rN↑, r1↓, . . . , rN↓), where riσ are the coordinates of the electrons on a lattice and C ≡ (r1↑, . . . , rN↑, r1↓, . . . , rN↓), a configuration of electrons. To sum over all configurations, for example: a lattice with 100 sites and 50 ↑ and 50 ↓ electrons, we need to visit over 1060 configurations. To overcome this difficulty, we use Monte Carlo method to perform the sum[2].

X HΨα(C) E(αi) = P (C) (1.6) Ψα(C) C where P (C), the probability of the configuration, is given by,

|Ψ (C)|2 P (C) = α P ∗ C Ψα(C)Ψα(C)

3 For any operator O

X OΨα(C) hOi = P (C) (1.7) Ψα(C) C P where, C P (C) = 1. To avoid visiting all the configurations, we will visit the “most important” configu- rations [5] and add up the corresponding contribution. The configurations with “high probability” are considered to be important. This is called as importance [5]. Using the importance sampling, accurate results for various quantites can be obtained by a smaller number of Monte Carlo sweeps, NMC .

NMC 1 X HΨα(mr) E(αi) = (1.8) NMC Ψα(mr) k=1 For other operators, NMC 1 X OΨα(mr) hOiα = (1.9) NMC Ψα(mr) k=1

2 |Ψ(mr0 | For every Monte Carlo (MC) step, we need to evaluate the ratio of 2 which is of |Ψ(mr| 3 complexity O(N ). In order to optimize with respect to αi, we need a complexity of O(N).

1.2.2 Faster VMC Consider the spinless fermions [5, 4] with a configuration k given by a wavefunction Ψ and a configuration l given by a wavefunction Φ. Both the configurations differ only by a

0 position of a electron e, el and el

  ψa1(e1) . . . ψa1(el) . . . ψa1(eN )    . . .   ......  (1.10)     ψaN (e1) . . . ψaN (el) . . . ψaN (eN )

4 Since Ψ and Φ differ by only one column, the ratio can be determined as[2],

det[Φ] X = Ψ−1Φ (1.11) det[Ψ] kl kl k

which of O(N) The calculation of Ψ−1 is reduced to the order of O(N 2) using Sherman-Morrison- Woodbury (SMH) method.

1.2.3 Summary of VMC method 1. Start with a random configuration

2. For k = 0,NMC

(a) Pick an electron at random for the configuration mr and move to a random

0 position. Name this configuration mr.

(b) Check for Probability, accept the configuration if the ratio is greater than a

n 0 2 o |Ψ(mr)| 0 uniform random number: min 1, 2 , set m = m if accepted, else set |Ψ(mr)| r+1 r

mr+1 = mr.

(c) Perform the update of Ψ−1 using SMH formula.

(d) Wait until the system Equilibriates.

(e) Repeat the above steps (a - c).

(f) Accumulate the energy contributions.

OΨα(mr+1)

Ψα(mr+1)

3. Determine necessary averages for E(αi).

5 4. Optimize over αi to obtain |ΨGi.

5. Study the ground state |ΨGi.

The VMC method is tested on a tilted square lattice [4, 1]. The Resonance Valance

Bond model is used to represent the high TC . The number of sites, NS is given by NS =

2 L + 1, where L is odd. The number of electron pairs, NP is given by, NP = Ne/2 where

Ne, is the number of electrons. The number of electrons can be calculated based on a given hole doping (x), Nsites × (1.0 − x) [5]. The Hubbard model for this method is given by,

X X − tijciσcjσ + U ni↑ni↓ (1.12) ij i

where U = 0 (free electrons) and U = ∞ (Extremely correlated liquid) and the model is a t − t0 − t00 model to study the material dependencies.

6 Chapter 2 Prior Work 2.1 CPU Implementation In [1, 4], the VMC method is used to study the competition between Antiferromagnetic and Superconducting states in High-TC superconductors. A Fortran implementation of VMC was done by [4] by simulating identical systems, called Markov chains (MCs), each initialized randomly. Since the initial implementation was a sequential version and in order to exploit the parallelism involved in the MCs, the programming model was converted to a MPI version [5]. 2.2 MPI Implementation A MPI Fortran implementation was done by [5, 4] to study the d-correlated systems using the variational approach. The programming model designates a MPI rank (or pro- cessor) per Markov chain and requires inter-processor and inter-node communication to calculate the average energy and error distribution. This implementation does not utilize cache blocking for regions within the code which require locality of reference. Since, des- ignating a CUDA thread per MC will result in a poor performance due to the working set exceeding the device limits, a better MPI implementation can still yield a good perfor- mance. However, the MPI version will be computationally expensive and we need to find a cost effective approach for the VMC algorithm. 2.3 GPU Implementation The first step in porting VMC code to CUDA was done by Byron Tasseff, an REU student at our research group. A naive GPU implementation was done by [17] which laid the foundation for our research on tuning this algorithm. In his implementation, a CUDA

7 thread performs a MC and the number of MCs correspond to the number of threads running in parallel. The implementation lacks enough thread-level parallelism and does not utilize the fast memory resources on a GPU, such as, shared and constant memory. Functions such as update of a configuration and ratio of determinant of the configurations have different memory access pattern and certain caching techniques can yield a better performance. In our GPU implementation, we address these issues and optimize the memory usage. A GPU and a FPGA implementation was done by [18]. This implementation com- pares the performance of Quantum Monte Carlo calculations between a CPU, GPU and a FPGA. Atoms ranging from 256 to 8192 were tested on a Dual-Core-Dual-processor AMD Opteron @ 2.2 GHz, NVIDIA C1060 GPU and a Virtex-4 XC4VLX160 FPGA. Results show a speedup of ≈ 2 for the GPU implementation compared to the CPU version. The implementation does not effectively utilize the memory resources on a GPU and we address these issues in our code.

8 Chapter 3 Graphics Processing Units 3.1 Increasing Trend in Parallel Computing The semiconductor industry has now settled on two main trajectories for microproces- sor design [8]. The Multicore trajectory strives to improve the performance of sequential program, while doubling the number of cores with each generation. As an example, Intel c CoreTM i7 microprocessor has four processor cores, each of which is an out-of-order, mul- tiple instruction issue processor. The processor implements the full x86 instruction set and supports hyperthreading with two threads per core [8]. On the contrary, a GPU in- creases the throughput of parallel applications. As an example, the NVIDIA Tesla M2090 c supports upto 512 cores, Dual Warp Scheduler, which simultaneously schedules and dis- patches instructions from two independent warps and NVIDIA GIGATHREAD for faster application context switching. Figures 3.1, 3.2 show that NVIDIA GPU(s) have better FLOPS and Memory Band- width compared to their CPU counterpart [7]. Data-parallel appplications, when ported to GPU, are observed to get a performance boost compared to the CPU. This discrepancy is maily due to more number of transistors dedicated for data processing rather than data caching and flow control [8]. Figure 3.3 compares the architecture of a Multicore CPU and a GPU model. CPU dedi- cates more transistors for caching and control, whereas GPU dedicates it for computational units.

9 Figure 3.1: Floating Point operations per second for CPU vs GPU, Source: CUDA Pro- gramming guide

Figure 3.2: Memory Bandwidth for CPU vs GPU, Source: CUDA Programming guide

10 Figure 3.3: GPU vs CPU architecture, Source: CUDA Programming guide 3.2 CUDA Programming model Introduced by NVIDIA in November 2006, CUDATM is a new parallel programming model that leverages the compute engine in NVIDIA GPU(s) to solve complex computa- tional problems efficiently than a traditional CPU [7]. The CUDA programming model is designed for an easy transition from the C code by a minimal set of language extensions, thereby providing a low learning curve for the programmer. The core has three key abstractions [7] - a hierarchy of thread groups, shared memories and barrier synchronization. These abstractions enable the programmer to par- tition the application to fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. Figure 3.4 shows the scaling model based on the Multiprocessor count (SM) on a GPU. The threadIdx can be of one, two or three-dimensions, hence forming a one, two, or three dimensional thread block. Each Streaming Multiprocessor (SM) is capable of hosting 1536 CUDA threads, with limited memory resources shared between them. The total number of resident thread blocks per SM is given by 8 for a fermi architecture [11]. Blocks are organized into a one, two or a three-dimensional grid of thread blocks.

11 Figure 3.4: Automatic Scalability of Applications, Source: CUDA Programming guide

Each thread has a private local memory, and a thread block has a shared memory with read/write capability for all threads within the block [7]. Finally, all threads have access to the same global memory, two read-only memory spaces, constant and texture. The global, constant and texture memory spaces are optimized for different memory usage [7, 8]. Global memory coalescing should be followed to avoid longer memory latency. The fermi architecture offers the flexibility of partioning the shared/L1 memory space according to the application. The 64 KB memory space can be partioned into 16 KB shared/64 KB L1 or vice-versa. Figure 3.5, 3.6 shows the memory hierarchy and their access pattern.

12 Figure 3.5: CUDA Thread hierarchy, Source: Programming Massively Parallel Processors- A hands-on Approach

Figure 3.6: CUDA Memory Hierarchy, Source CUDA Programming guide

13 Chapter 4 CPU Implementation of VMC

VMC has been implemented targeting a cluster level system by [4, 5], a CPU, GPU and a reconfigurable Field Programmable Gate Array (FPGA) by cite [18]. In this chapter, we present a straightforward implementation of the VMC method using C++ based on an object-oriented approach. The following sections describe the pseudocode for the VMC method and its core components, structure of the CPU code, algorithm analysis and the results obtained for a single-core implementation. Based on our results and taking factors such as computational cost into consideration [17] and [4], we claim a cost-effective and efficient model can be implemented on a GPU. 4.1 Pseudocode for the VMC Method In this section, we show the pseudocodes for the VMC method and its core stages: Equilibration and Accumulation given by Algorithms 4, 3. The VMC method, given by Algorithm 1 has four main stages: Electron move, calculation of ratio of determi- nants(DetPByDet), update of a configuration (UpdateConfig), and copy of a configuration (Copyconfig), if the move is rejected [5, 9]. The electron move procedure will pick a pair, spin at random and an oldsite from the electron occupancy list (plist). Now, a newsite is picked based on the neighbor probability and an electron is moved to this site. The spinflip is determined and then DetPByDet, UpdateConfig and DetPByDet are called in-order to determine the product of the ratio of determinant of the configurations (dpbd1 × dpbd2). The norm of this product is compared to a real uniform random number and the move is accepted/rejected based on this. [5].

14 Algorithm 1 Overview of the VMC Method begin vmc Initialization() . Generate lattice object, wavefunction and pairfunction Equilibration() . Perform the Equilibration procedure Accumulation() . Accumulate the energy and study the groundstate end vmc

Algorithm 2 Equilibration procedure Equilibration() for i ← 0, nsweeps do Electron move() . Call the electron move procedure Perform MonteCarlo Sweep() . Determine the acceptance of the move and update the Configuration end for end procedure

Algorithm 3 Accumulation procedure Accumulation() for i ← 0, navesweeps do . Change the loop bounds to navesweeps and npsweep for j ← 0, npsweep do . Repeat the procedures from Equilibration Electron Move() Perform MonteCarlo Sweep() end for energy ← EnergyofConfig() . Call the energy of configuration eneloc ← eneloc + energy . Accumulate the energy end for end procedure

15 Algorithm 4 Electron Move procedure Electron Move() ipair ← rand(Npairs) . Pick a pair and spin at random ispin ← rand(2) oldsite ← plist(ipair, ispin) while (newsite == 0) do . Find a site for the electron move if neiprob > 0.0 then newsite ← rand(Nsites) else newsite ← neiblist() . neiblist maintains the neighbour information end if if (latocc(newsite) ! = (BT or HL)) then . latocc will determine the spin at the newsite spinflip ← true jpair ← whichpair(newsite ∗ 2 + 1 − ispin) . whichpair(ilat,spin), “spin” at lattice site ilat else newsite = 0 end if end while end procedure

16 Algorithm 5 Perform MonteCarlo Sweep procedure Perform MonteCarlo Sweep() . Determine the spinflip from the electron move stage if spinflip = true then . Calculate the Ratio of determinants and Update the Configuration dpbd1 ← DetPByDet(ipair, 2 ∗ ispin − 1, newsite) UpdateConfig(ipair, 2 ∗ ispin − 1, newsite, dpbd1) dpbd2 ← DetPByDet(jpair, 1 − 2 ∗ ispin, oldsite) dpbd ← dpbd1 ∗ dpbd2 else dpbd ← DetPByDet(ipair, 2 ∗ ispin − 1, newsite) end if Ψ0−1 norm2 ← norm(dpbd) . Determine Ψ−1 if norm2 ≥ uniformrand(0, 1) then . Move is accepted if spinflip = true then UpdateConfig(jpair, 1 − 2 ∗ ispin, oldsite, dpbd2) else UpdateConfig(ipair, 2 ∗ ispin − 1, newsite, dpbd) end if else . Restore the configuration: Move is not accepted Copyconfig() end if end procedure

17 Algorithm 6 Energy of a configuration procedure EnergyofConfig() saved = config() . Make a copy of the configuration tempconf = config() . Make a temporary configuration to calculate dpdbd and update the configuration for ispin ← 0, 2 do . Calculate the Kinetic energy part of the Hamiltonian spin ← 2 ∗ ispin − 1 for ipair ← 0, Npairs do isite ← plist(ipair ∗ 2 + ispin) for jn ← 0, nneibs do jsite ← neiblist(jn + nneibs ∗ isite) if latocc(jsite) = HL then dpbd ← DetPByDet(ipair, spin, jsite) enekloc ← enekloc + thop(ishell(jn)) ∗ real(dpbd) spkeloc(ispin) ← spkeloc(ispin) + thop(ishell(jn)) ∗ real(dpbd) end if end for end for end for . Compute the Exchange term of the Hamiltonian otheloc ← 0 . To hold the energy from the exchange term for inn ← 0, nearnsets do isite ← nearnp(inn ∗ 2); jsite ← nearnp(inn ∗ 2 + 1) ispin ← latocc(isite); jspin ← latocc(jsite) ipair ← whichpair(isite ∗ 2 + (1 + ispin)/2) jpair ← whichpair(jsite ∗ 2 + (1 + jspin)/2) if ispin ∗ jspin < 0 then tempconf ← CopyConfig() dpbd ← DePByDet(ipair, ispin, jsite); UpdateConfig() dpbd1 ← DetPByDet(jpair, jspin, isite) otheloc = otheloc − real(dpbd ∗ dpbd1) + 1.0 end if end for otheloc ← (Jij ∗ otheloc)/(2) . Jij represents the Antiferromagnetic exchange enektot ← enekloc othetot ← otheloc energy ← (enektot + othetot)/Nsites kinenergy ← (spkeloc(0) + spkeloc(1))/Nsites othenergy ← othetot return energy end procedure 18 The programming model follows the object-oriented approach and the major data struc- tures, such as Ψ−1 and pairfunction are members of the classes defined in Table 4.2. By varying the input parameters such as hole doping and the lattice size, data structures such as Ψ−1 and pairfunction become the point of interest due to increasing memory consump- tion. Table 4.1 shows the memory consumption for different lattice models.

Table 4.1: Data Structures and Memory requirements

Data structure Formula Data Type L = 5 L = 9 L = 15 2 Nsites(NS) L + 1 int 26 82 226 Nelecs(Ne) Nsites ∗ (1.E0 − x) int 24 74 204 where x is hole doping Npairs(NP ) Nelecs/2 int 12 37 102 PsiInv (Npairs2) double 1.152 kB 10.952 kB 83.232 kB Pairfunction (Nsites2) double 5.408 kB 53.792 kB 408.608 kB Plist (2 ∗ Npairs) int 96 B 296 B 816 B

Table 4.2: Classes used in the Implementation of VMC

S.No Class Functionality 1 Sqlat Holds Lattice parameters such as Nsites, Nelecs, Npairs and hopping, etc 2 Config Contains the energy arrays, and functions for calculating the Ratio of Determinants and Update of Ψ−1 3 Wavefunction Contains the variational parameters required to generate the pairfunction or wavefunction 4 Montecarlo Contains the lattice object, pairfunction and randomseeds for a corresponding Markov chain

4.1.1 Structure of the CPU code The implementation uses an object-oriented approach for the individual functions men- tioned in the previous section. Each class is used to create an object and the final executable is obtained by compiling and linking the individual objects. Table 4.2 describes the classes

19 and the corresponding functionality. 4.2 Calculation of the Ratio of Determinants using SMH Formula The SMH formula, as explained in Section 1.2.2, is used to calculate the ratio of determinant of the configurations. This is implemented using a function called DetPByDet.

Figure 4.1: Sequential Memory Access Pattern for the up spin: (a) Ψ−1, (b) Plist, (c) Pairfunction

20 Figure 4.1 shows the access pattern for the up spin. A site is picked from an electron list and referred to as an oldsite. This site corresponds to the column of pairfunction. A newsite given by an input parameter corresponds to the row of pairfunction. The Ipair represents the electron pair and corresponds to the column of Ψ−1. Thus, a dot product of

−1 a row of pairfunction and a column of Ψ is calculated (dpbd). There will be NP cache misses for the Ψ−1 array and one cache miss, to load the first pairfunction element on the

L1 cache. For the given lattice size, L, there will be a total of NP fused multiply-add floating point operations (FLOPS) to compute a dot-product.

Figure 4.2: Sequential Memory Access Pattern for the down spin: (a) Ψ−1, (b) Plist, (c) Pairfunction

21 Figure 4.2 shows the access pattern for the down spin. In this case, the access pattern is reversed, where a column of the pairfunction and a row of Ψ−1 are accessed. There will

−1 be NS of cache misses for the pairfunction array and one cache miss for the Ψ . From section 1.2.2, the ratio of determinants can be calculated in the order of O(NP ) using SMH formula [5]. 4.3 Update of a configuration

Figure 4.3: Update of a configuration for the up spin

Figure 4.3 shows the update of a configuration for the up spin of an electron. The Ipairth column of Ψ−1 is first updated using dpbd.

22 The remaining columns of Ψ−1 are updated using the dot product between Ψ−1 and

−1 2 pairfunction. The update of Ψ is of order O(NP ).

Figure 4.4: Update of a configuration for the down spin

For the case of a down spin, the access pattern is reversed, where an electron pair represents the Ipairth row in Ψ−1. The remaining rows are updated by using the dot product between Ψ−1 and pairfunction. Figure 4.4 shows the update of a configuration for the down spin of an electron.

23 4.4 Algorithm Analysis

The execution time of the individual functions is dependent on parameters such as, NP ,

NS, neqlsweeps, npsweep and navesweeps. Table 4.3 gives the Floating point operations per second (FLOPS) calculation for different functions of the VMC method. The FLOPS are calculated based on a speculation that a spinflip will occur 50% of the time in both the Equilibration and Accumulation stage. Thus, the FLOPS are an estimate to different lattice models based on NP and may vary depending on the occurrence of a spinflip.

Table 4.3: FLOPS calculation for the VMC method

Function FLOPS Operation DetPByDet Npairs ∗ MUL Either case of spinflip UpdateConfig Npairs ∗ DIV For Ipairth row or col of Ψ−1 +Npairs2 ∗ (MUL + ADD) For dot product calculation +Npairs2 ∗ (MUL + SUB) For updating other rows or cols of Ψ−1 CopyConfig NIL NIL Energy of a 2 ∗ Npairs ∗ Nneibs Configuration ∗(DetP ByDet + MUL + ADD) Kinetic energy calculation +Nearnsets∗(DetPByDet) +Nearnsets∗Updateconfig +Nearnsets∗(DetPByDet) +Nearnsets ∗ (ADD + SUB) Accumulation

4.5 Cache Behavior The implementation utilizes a single core of the CPU and further testing and optimiza- tion of the code are decided upon the cache behavior. We consider the Intel(R)Core(TM)i7- 2600 CPU architecture for determing the cache behavior.

24 From [10], the cache hierarchy is given by: 32 kB data + 32 kB instruction L1 cache, 256 kB L2 cache per core.

4.5.1 Function DetPByDet

Referring to Section 4.1.1, one dot product requires a total of NP fused multiply-add floating-point operations. Since a cache line is 64 B wide, we find that for lattice size of L = 5, a row of pairfunction or a column of Ψ−1 does not fit within a cache line. There will be only one miss penalty to load the first element for pairfunction. There will be NP cache misses for Ψ−1 since we access a column. The penalties are reversed for the down spin, where there will be NS of cache misses for the pairfunction and one miss penalty for Ψ−1.

4.5.2 Function UpdateConfig

−1 2 Referring to section 4.3, the update of Ψ is of order O(NP ). For the case of an up

th −1 spin, there will be NP cache misses for every element of the Ipair of Ψ . To perform a dot product, the penalty levels will be the same as mentioned in 4.5.1. To update the remaining columns there will be NP cache misses for every element per column. The penalties are

−1 reversed for the down spin, where there will be one cache miss per row of Ψ , while NS of cache misses per element of pairfunction. The pairfunction access is permuted in either case of the spin and hence the latency will increase if the data falls outside the cache.

4.5.3 Function Pairfunction For the case of an up spin of an electron, a row of pairfunction is accessed. Though the access pattern within a row is permuted, the data is still cached and will have a lower latency. The case is reversed for the down spin, where there will be several cache misses as explained in Section 4.1.1. From Figures 4.3, 4.4, the pairfunction will suffer a mimimum

25 cache miss for the up spin, while there will be several cache misses for the down spin. All these cache misses are viewed from the L1 cache level. Since, the pairfunction is accessed more than 104 in both the Equilibration and Accumulation stage and the size for any lattice model is less than that of L2 cache, we can speculate the hardware to move it to the L2 cache. 4.6 Results The VMC method has been tested on Intel(R)Core(TM)i7-2600 CPU @ 3.40 GHz. The method performs one MC and from the timing results, we find the hot spots for futher parallelism and optimization using GPU. More work on porting VMC to GPU using CUDA are discussed in Chapter 5. Tables 4.4-4.6 give the timing results for individual functions and for the overall VMC method for L = 5, 9, 15.

Figure 4.5: Pie chart for L = 5:(a) Split of Equilibration, (b) Split of Accumulation

26 Table 4.4: Execution time for L = 5

Lattice Size Function Execution Time(s) Equilibration 1.Electron Move 0.02 2.Spinflip = true: DetPbyDet 0.01 UpdateConfig 0.03 3.Spinflip = false: DetPbyDet 0.00 UpdateConfig 0.00 4.Copyconfig 0.04 Move rejected 0.02 5.Total 0.24 L = 5 Accumulation 1.Electron Move 0.01 2.Spinflip = true: DetPbyDet 0.01 UpdateConfig 0.07 3.Spinflip = false: DetPbyDet 0.01 UpdateConfig 0.02 4.Energy calculation DetPbyDet 0.02 UpdateConfig 0.08 Total 0.18 5.Copyconfig 0.08 Move rejected 0.03 6.Total 0.46 Total VMC 0.70

27 Table 4.5: Execution time for L = 9

Lattice Size Function Execution Time(s) Equilibration 1.Electron Move 0.01 2.Spinflip = true: DetPbyDet 0.11 UpdateConfig 1.32 3.Spinflip = false: DetPbyDet 0.02 UpdateConfig 0.04 4.Copyconfig 0.74 Move rejected 0.68 5.Total 3.04 L = 9 Accumulation 1.Electron Move 0.07 2.Spinflip = true: DetPbyDet 0.07 UpdateConfig 1.44 3.Spinflip = false: DetPbyDet 0.00 UpdateConfig 0.09 4.Energy calculation DetPbyDet 0.17 UpdateConfig 1.85 Total 2.91 5.Copyconfig 0.62 Move rejected 0.64 6.Total 6.03 Total VMC 9.07

28 Table 4.6: Execution time for L = 15

Lattice Size Function Execution Time(s) Equilibration 1.Electron Move 0.19 2.Spinflip = true: DetPbyDet 0.51 UpdateConfig 25.15 3.Spinflip = false: DetPbyDet 0.07 UpdateConfig 0.96 4.Copyconfig 12.29 Move rejected 11.99 5.Total 51.74 L = 15 Accumulation 1.Electron Move 0.18 2.Spinflip = true: DetPbyDet 0.44 UpdateConfig 25.17 3.Spinflip = false: DetPbyDet 0.03 UpdateConfig 0.98 4.Energy calculation DetPbyDet 0.80 UpdateConfig 36.50 Total 52.67 5.Copyconfig 12.30 Move rejected 11.61 6.Total 104.05 Total VMC 156.02

29 Figure 4.6: Pie chart for L = 9:(a) Split of Equilibration, (b) Split of Accumulation

Figure 4.7: Pie chart for L = 15:(a) Split of Equilibration, (b) Split of Accumulation

30 Figure 4.5 gives the execution time for L = 5 using a pie-chart. The upper half represents the execution time taken by the accumulation procedure. From the chart, we determine that accumulation is the most time consuming procedure. On further profiling, we found that the execution time for calculating the energy was higher than other functions. Further breakdown revealed the most time consuming and computationally intensive part to be the update of a configuration. Chapter 5 describes the parallelization efforts on GPU for this procedure. On the lower half of the figure, variables such as, DPBD-T and UPD-T represents the execution time taken for the case when the spinflip is true. Figures 4.6, 4.7 gives the execution time for L = 9, 15. From the , we find that the contribution is more when the spinflip is true. Variables DPBD-F and UPD-F represents the contribution when there is no spin-flip. Based on these charts, we claim that the speedup of the application will be enhanced, if the individual functions are parallelized and optimized for effective memory access. 4.7 Multiple MCs and the system behavior The sequential code can be expanded to a MPI-version performing N MCs in parallel on N CPU cores. Issues such as the total execution time, parallel and sequential regions within the equilibration and accumulation stages need to addressed. Let T (N) be the total execution time to complete N MCs in parallel. Let Te be the equilibration time on a serial implementation and let Ta be the accumulation time on a serial implementation. If we execute N MCs on N CPUs in parallel, the total time can be theoretically,

−1 T (N) = (Te + Ta)(N) . Practically, we cannot expect this execution time due to the follwing factors.

• Large equilibration time which will be constant for N MCs.

31 • Serial regions within a MC such as: Electron move, DetPbyDet and update of a configuration.

1. For every MC we start with a random configuration and wait until the system equilibrates and start accumulating the energy.

2. Since, certain MCs can finish well ahead, there will be a race around condition in the energy accumulation and average calculation.

• Inter-node communication delays if the number of MCs is of order 100 or more.

Hence, these factors will affect the parallelism approach and we cannot expect a direct speedup of over N for this algorithm on a MPI model.

32 Chapter 5 Porting VMC to CUDA 5.1 GPU Acceleration of the VMC Method In the previous chapter, we explained the CPU implementation of the VMC method and determined the hot spots for further parallelization. From Section 4.6, we found that the CPU implementation is time consuming for lattice models L = 9, 15. Functions such as the ratio of determinants, update and energy of a configuration are computationally intensive and exhibit data-level parallelism, thereby becoming the best candidates for parallelization. Hence, in this chapter, we discuss the several strategies adopted to parallelize the VMC method for multiple MCs on a NVIDIA GPU using CUDA. Figure 5.1 represents the workflow for the GPU implementation. Each MC is inde- pendent and can be executed in parallel on a GPU. Every chain has its own copy of a configuration object with members such as, Ψ−1, neighbor list, energy and whichpair. The pairfunction and lattice object are shared by all MCs. A block of CUDA threads will han- dle a MC and hence we have N parallel CUDA blocks executing in parallel on a NVIDIA GPU. At the end of each MC, the energy per site is accumulated and the result is written to the energy array corresponding to that chain. The results are copied back to the CPU, where the energy and are calculated. The groundstate, |ΨGi is then optimized with the variational parameters. 5.2 Na¨ıve Implementation The CUDA implementation described here is based upon a preliminary CUDA port, which will be called the na¨ıveimplementation, performed by [17], an intern student who worked for the La-Sigma research group at LSU. In this implementation, each MC is han-

33 Figure 5.1: VMC Workflow for Multiple Markov Chains (MCs)

34 dled by a single CUDA thread and the total number of CUDA blocks were equal to the number of MCs. This does not utilize the capability of the massively parallel GPU and hence no improvement was found over the CPU code. With only one thread per block,

1 the code cannot exceed using 32 of the GPU‘s computing potential per warp (group of 32 threads). The other factors contributing to the slowdown being: Higher memory access latency due to global memory read/write(s) and the lack of usage of shared memory and constant memory, and unoptimized use of the L1 and L2 caches. Hence, in the following sections, we describe in detail, the parallelization and effective memory utilization efforts done to improve this na¨ıve implementation. 5.3 Execution configuration The number of threads required to obtain occupancy and SM utilization was the de- ciding factor for our implementation. The threads per MC depends upon the NP of the given lattice. On a Fermi device (Architecture model for a NVIDIA GPU) with compute capability 2.X [11], a SM can support 8 resident blocks of threads. However, the total number of threads per SM cannot exceed 1536 and the shared or L1 cache per SM is limited to 48 or 16 kB. More blocks will increase the warp occupancy per SM, while the shared memory usage shrinks to a factor of 48/8 = 6 kB per block. Another factor affecting the performance will be the register usage per SM. On a Fermi device, the total number of registers per thread values to 63 [12]. If a thread exceeds this limit, it will result in a local memory spill, which will move the data to high latency global memory [12]. Hence, to avoid higher access latency and assign enough work for the threads, we pick the number of threads per block based on the complexity of the functions. The order of these functions are given by:

2 DetPByDet - O(NP ); Updateconfig - O(NP ); CopyConfig - O(2 ∗ NS). If the number of

35 2 threads is set to NP , there will be enough threads to cover the latency for UpdateConfig and CopyConfig. On the contrary, this will reduce the speedup due to sequential zones such as Electron move and wasteful threads in DetPByDet. If we recall from the previous chapter, the memory access pattern is distinct for either case of a spinflip of UpdateConfig.

2 Thus, the number of threads should be in the of (NP < threads/MC < NP ) and rounded to the nearest multiple of a warp. We discuss more about this in Section 5.6.1. 5.4 Memory requirements and cache behaviour Though the amount of parallelism in a MC is low, by GPU standards, as this section will show the working set, careful use of the available high-speed memory will make this code efficient. From Section 4.1, we recall the memory requirements for different lattice models. The total device global memory on a fermi device is approximately 5 ∼ 6 GB. From the Table 4.1, we find that the total memory requirements for a lattice model of size L = 15 is within the GPU global memory limit. However, global memory has the highest latency compared to other memory resources on a GPU and hence we need to find effective ways to utilize the high-speed memories such as, registers, L1/shared, and constant memory [7]. In this section, we discuss about the caching behaviour on a GPU. Tables 5.1, 5.2 and 5.3 shows whether each major array used by the code fits into three different parts of the memory hierarchy for different lattice sizes. From Table 5.1, arrays such as Ψ−1, pairfunction and plist fits within the L1 cache. The repeated access of the pairfunction by different MCs will result in caching at L2 level by the hardware. Consider the lattice model, L = 5: From Table 5.1, we find the size of Ψ−1 to be 1.152 kB. The total number of threads per MC is 64. Therefore, the total number of blocks per SM will be 1536/64 = 24. Fermi device has a limit of 8 resident blocks per

36 Table 5.1: Speculation of a GPU‘s cache behaviour for L = 5

GPU Memory Ψ−1 Pairfunction Plist (double) (double) (int) L = 5 1.152 KB 5.408 KB 96 B L1 cache/ Yes Yes Yes shared L2 cache No Yes.Due No to sharing between MC Constant No No No Memory

Table 5.2: Speculation of a GPU‘s cache behaviour for L = 9

GPU Memory Ψ−1 Pairfunction Plist (double) (double) (int) L = 9 10.952 KB 53.792 KB 296 B L1 cache/ Yes No Yes shared L2 cache No Yes.Due No to sharing between M.C Constant No No No Memory

SM [8]. The memory consumption per SM will be: Ψ−1 = 1.152 ∗ 8 = 9.216 ∗ 3 = 27.648 kB, since we have three copies of a configuration. This is still within the limit of 48 KB of L1 cache and hence will be cached by the hardware; The pairfunction will be 5.408 kB and is within the limit of 64 kB of constant memory; The plist array will be 96 ∗ 3 = 288 B, for three configurations and still can reside in L1 cache. All these speculations are based on the assumption that the pairfunction does not evict lines holding Ψ−1. Since, there are three copies of a configuration and the cache requirements per MC is close to 27 kB, the

37 Table 5.3: Speculation of a GPU‘s cache behaviour for L = 15

GPU Memory Ψ−1 Pairfunction Plist (double) (double) (int) L = 15 83.232 KB 408.608 KB 816 B L1 cache/ Yes, but No Yes shared partly cached L2 cache Yes. Exceeds Yes. Due No L1 cache size to sharing between M.C Constant No No No Memory

Table 5.4: Summary of Memory prediction for L = 5

Lattice Threads per Config Ψ−1 Pairfunc Plist SM MC occupancy Config1 L1 Constant-Common L1 L = 5 64 Config2 L1 to all MC L1 Config3 L1 L1 Total 27.648 kB 5.408 kB 288 B 1 MC number of MC per SM will be limited to 1. Table 5.4 summarizes this memory prediction for L = 5. For the case of L = 15, the number of threads per MC will be 416; the number of blocks per SM will be 1536/416 ≈ 3; and the memory consumption for Ψ−1 = 83.232 KB. Hence, it does not fit within the L1 cache and it will be cached on L2 level by the hardware. This drawback will be overcome by more threads which will hide the global memory latency due to coalescing.

38 5.5 Memory Access Pattern In this section, we discuss about the memory access pattern and initial parallelization efforts for functions such as: DetPByDet (Ratio of determinants) and UpdateConfig (To update a configuration). Section 5.6 discusses more about the optimization strategies adopted for these functions.

5.5.1 Function DetPByDet This function is used to calculate the dot product of Ψ−1 and pairfunction. We followed these steps to parallelize this function using CUDA:

• Spinflip = up

1. The electron pair, represented by Ipair, will correspond to the column of Ψ−1.

2. A newsite is picked where the electron will be moved.

3. From the lattice occupancy list, given by plist, we find the lattice site of the electron, given by othsite.

4. Now, the newsite will correspond to the row and the othsite will correspond to the column of pairfunction.

5. Thus, we perform a dot product of a column of Ψ−1 and a row of pairfunction using the faster VMC technique given by SMH.

6. In this implementation, we use NP threads to perform the dot product, whereas [17] uses one thread. Figure 5.2 explains this procedure.

39 Figure 5.2: Memory Access Pattern for up spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction

40 Figure 5.3: Memory Access Pattern for down spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction

41 • Spinflip = down

1. The electron pair, represented by Ipair, will correspond to the row of Ψ−1.

2. A newsite is picked where the electron will be moved.

3. From the lattice occupancy list, given by plist, we find the lattice site of this electron, given by othsite.

4. Now the newsite will correspond to the column and the othsite will correspond to the row of pairfunction.

5. Thus, we perform a dot product of a row of Ψ−1 and a column of pairfunction, thereby summing the contribution using the faster VMC technique given by SMH.

6. In this implementation, we use NP threads to perform the dot product, whereas [17] uses one thread. Figure 5.3 explains this procedure.

5.5.2 Function UpdateConfig • Spinflip = up

1. The electron pair, represented by Ipair, will correspond to a column of Ψ−1.

2. The Ipairth column is updated using the dpbd from function DetPByDet.

3. Now, a dot product is computed between every column of Ψ−1 and a row of pairfunction.

4. The remaining columns of Ψ−1 are now updated with this dot product.

−1 5. This implementation utilizes NP threads for updating the Ψ and N threads- per-col to perform the dot product. Figure 5.4 describes this procedure.

42 Figure 5.4: Memory Access Pattern for up spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction

43 • Spinflip = down

1. The electron pair, represented by Ipair, will correspond to a row of Ψ−1.

2. The Ipairth row is updated using the dpbd from function DetPByDet.

3. Now, a dot product is computed between every row of Ψ−1 and a column of pairfunction.

4. The remaining rows of Ψ−1 are now updated with this dot product.

−1 5. This implementation utilizes NP threads for updating the Ψ and N threads- per-col to perform the dot product. Figure 5.5 describes this procedure.

5.6 Optimization efforts In the previous Section, we mentioned the preliminary parallelization of functions using CUDA. In this section, we discuss the various memory optimization strategies adopted for these functions.

5.6.1 Thread selection in depth Functions such as Updateconfig and Copyconfig have loop bounds on the order of

2 O(NP ) and O(2 ∗ NS). An optimal thread selection range for a MC will be within (NP ≤

2 blocksize ≤ NP ). Functions such as updateconfig have different access patterns for either case of a spinflip. Hence, we set a parameter called, threads-per-col, to determine the number of threads per CUDA block. The number of threads per block is given by: Blocksize

= Round(threads-per-col*NP ,nearest multiple of warp). Table 5.5 mentions the threads per block for different lattice models.

44 Figure 5.5: Memory Access Pattern for down spin: (a) Ψ−1, (b) Picking a site from Plist, (c) Pairfunction

45 Table 5.5: Threads per MC and Number of MCs per SM

Lattice Npairs Threads-per-col No.threads No.resident per MC MCs per SM L = 5 12 4 64 24 L = 9 37 4 160 9 L = 15 102 4 416 3

5.6.2 Elimination of redundant copies of a configuration We predicted the memory requirements and the SM occupancy for L = 5 on Section 5.4. The prediction was done with three copies of a configuration per MC. In this section, we predict the SM occupancy by eliminating the redundant copies of a configuration. Table 5.6 calculates the total memory requirements for L = 5 with a single configura- tion. The SM occupancy (Number of MCs) is given by, 48/9.312 = 5.15 ≈ 5.

Table 5.6: Occupancy calculation for L = 5

Lattice Threads per Config Ψ−1 Pairfunc Plist SM MC occupancy L = 5 64 Config1 L1 Constant Memory L1 Total 9.216 kB 5.408 kB 96 B 5 MCs

5.6.3 Optimized memory access pattern In the previous section, we explained the parallelization efforts for the device functions. For the function Detpbydet, the dot product can be accumulated and reduced in a shared memory. This will result in faster reduction compared to the reduction done on a global memory. For the function Updateconfig, the entire Ψ−1 array is accessed and updated based on the spinflip. Caching effects can be explored if this function is customized with a

46 blocking factor for the L1 cache. • Function DetPByDet

A total of NP threads work on this function. The dot product of each element of Ψ−1 and pairfunction is stored in a register for either case of a spinflip. The dot product of all elements are accumulated and reduced in a shared memory. The pairfunction has a permuted access and for the case of a down spinflip, a transposed version of the pairfunction is used. This will result lower cache misses and caching of the pairfunction at L1 level. Figure 5.6 and 5.7 explains the optimization techniques. • Function UpdateConfig

For lattice models, L = 5, 9, the Ψ−1 fits within the L1 cache and can be updated within this memory level. For L = 15, it does not fit within this cache and hence we need to make use of blocking factor. The following steps illustrate the optimization techniques for this function.

• Spinflip = up

th −1 1. Update the Ipair column of Ψ with the dpbd parameter. Here, we use NP threads to update the column.

2. We use the threads-per-col (Tp) parameter to set the row and column index.

3. col = threadIdx.x / threads-per-col; row = threadIdx.x % threads-per-col.

4. Now, these threads (4 as per the code) will work on a column of Ψ−1 and a row of pairfunction to compute the dot product.

5. Perform a reduction in shared memory to compute the dot product. The dot product is then used to update the remaining columns of Ψ−1. Figure 5.8 ex- plains these optimization techniques.

47 Figure 5.6: Optimization for up spin: (a) Ψ−1, (b) Plist, (c) Pairfunction, (d) Reduction on a shared memory.

48 Figure 5.7: Optimization for down spin: (a) Ψ−1, (b) Plist, (c) Transposed Pairfunction, (d) Reduction on a shared memory.

49 Figure 5.8: Optimization for up spin: (a) Ψ−1 - update of Ipairth column, (b) Ψ−1 with −1 threads-per-col, TP , (c) Pairfunction, (d) Ψ - update of other columns.

50 • Spinflip = down

1. We use the blocking factor, bfactor to set the row and column index.

2. row = threadIdx.x / bfactor; col = threadIdx.x % bfactor.

3. Set the number of rows per chunk = blocksize/bfactor.

th −1 4. Update the Ipair row of Ψ with the dpbd parameter. NP threads work on a row.

5. Use the transposed pairfunction to compute the dot product. Perform a reduc- tion in shared memory to compute the dot product.

6. Use this dot product to update the other columns in Ψ−1. Figure 5.9 shows the optimization techniques.

• Drawbacks in the Optimization

In function DetPByDet, we access only a row or a column of Ψ−1 and pairfunction to compute a dot product. In function UpdateConfig, we first update the Ipairth column or row of Ψ−1, then compute the dot product and finally update the remaining columns or rows of Ψ−1. Thus, to reduce the amount of work, we suggest a merger of these two functions. Instead of three function calls to determine the dpbd, given by dpbd = dpbd1 ∗ dpbd2, we reduce it to a single function call by calculating the dpbd without updating the Ψ−1. 5.7 Merger of Equilibration and Accumulation If we notice from the Algorithms 4 and 3, the steps of Electron move, Ratio of determi- nants (DPBD) and Update of a Configuration (UPD) are similar, except the fact that we calculate the energy of a configuration for Accumulation. Hence, in our GPU implemen- tation we merged the two algorithms into a single function and reduced the kernel launch overhead. 51 Figure 5.9: Optimization for down spin: (a) Ψ−1 - update of Ipairth row, (b) Ψ−1 with a blocking factor, bfactor (c) Transposed Pairfunction, (d) Ψ−1 - update of other rows.

52 5.8 Streamlined function In this function, we merged the DetPByDet and the UpdateConfig functions to cal- culate the dpbd(s) without updating the Ψ−1. We consider either case of a spinflip and optimize accordingly.

• Spinflip = up

1. Set the column and row index: col = threadIdx.x / TP ; row = threadIdx.x % TP

−1 2. Compute the dot product of Ψ and pairfunction, with TP working on a column.

3. Store the dot product in shared memory. Perform a reduction to obtain the dpbd1.

4. Set the column index to: col = threadIdx.x

5. The dpbd is stored at the Ipairth index of shared memory.

6. If the col = Ipair, perform the operation of a down spin from DetPByDet.

7. If col ≠ Ipair, multiply the dpbd with the Jpairth row elements of Ψ−1 and subtract from the Ipairth column of Ψ−1. Here Jpair is obtained from the whichpair array, which tells which pair has an electron of spin “spin” at lattice site ilat: whichpair(ilat, spin).

8. Multiply again with a row of transposed pairfunction. Accumulate the results in a shared memory.

9. Perform a reduction to obtain the final dpbd: dpbd = dpbd1 ∗ dpbd2. Thus, we have merged these operations: dpbd1 = DetPByDet, UpdateConfig, dpbd2 = DetPByDet into a single function without writing to Ψ−1.

• Spinflip = down

1. Set the blocking factor, bfactor

2. Set the row and column index: row = threadIdx.x / bfactor; col = threadIdx.x % bfactor

3. Set the rows-per-chunk: rows-per-chunk = blocksize / bfactor

4. Now compute the dot product of every row of Ψ−1 with every row of transposed pairfunction. Store the dot product in a shared memory.

5. Perform a reduction inside the shared memory to get the dpbd.

6. Set the row index to: row = threadIdx.x

7. We need to perform the operation of Spinflip = up for DetPByDet.

8. If row = Ipair, obtain the product of Ipairth row and Jpairth column of Ψ−1 with the pairfunction.

9. If row ≠ Ipair, multiply the dpbd with the Jpairth column elements of Ψ−1 and subtract from the Ipairth row of Ψ−1. Here Jpair is obtained from the whichpair array, which tells which pair has an electron of spin “spin” at lattice site ilat: whichpair(ilat, spin).

• Figures 5.10 and 5.11 explain the techniques followed for either case of a spinflip; a condensed code sketch of the up-spin case is given after the figure captions.

Figure 5.10: Optimization for up spin: (a) Ψ−1 with TP, (b) Pairfunction, (c) Reduction in shared memory to obtain dpbd1, (d) Calculation of dot product using Jpairth row of Ψ−1, (e) Calculation of dot product using Transposed pairfunction, (f) Reduction in shared memory to obtain the final dpbd: dpbd = dpbd1 ∗ Reduction(colval)

Figure 5.11: Optimization for down spin: (a) Ψ−1 with TB, (b) Pairfunction, (c) Reduction in shared memory to obtain dpbd1, (d) Calculation of dot product using Jpairth column of Ψ−1, (e) Calculation of dot product using Transposed pairfunction, (f) Reduction in shared memory to obtain the final dpbd: dpbd = dpbd1 ∗ Reduction(rowval)
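A condensed sketch of the up-spin case of this streamlined evaluation, following the UP branch of d_UpdateConfig_d in Appendix B, is given below; the down-spin case mirrors it with the blocking-factor mapping. Here jpair is the pair whose opposite-spin electron occupies the new site and isite is the vacated site it will be moved to, as in the appendix call. The sizes, the array layouts and the final serial reduction are simplifications assumed only for this sketch (the appendix uses reduce_dbl_buffer).

// Sketch of the streamlined up-spin evaluation: the final ratio
// dpbd = dpbd1 * dpbd2 is obtained in one pass without writing PsiInv,
// following the UP branch of d_UpdateConfig_d (Appendix B).
// Requires blockDim.x >= THD_PER_COL * NPAIRS.
constexpr int NPAIRS      = 12;  // number of pairs (NP), assumed for illustration
constexpr int NSITES      = 25;  // number of lattice sites (NS), assumed
constexpr int THD_PER_COL = 4;   // threads cooperating on one column (TP)

__device__ double streamlined_dpbd_up_sketch(
    int ipair, int jpair, int newsite, int isite,
    const double *PsiInv,     // NPAIRS x NPAIRS inverse matrix (read-only here)
    const int *plist,         // plist[2*p + spin] = lattice site of that electron
    const double *pairfunc,   // pair function, NSITES x NSITES, row-major
    const double *tpairfunc)  // transposed pair function, NSITES x NSITES, row-major
{
  __shared__ double colfac[NPAIRS];  // dot product of each column with pairfunc
  __shared__ double colval[NPAIRS];  // per-column contribution to the final dpbd
  __shared__ double dpbd;
  const int tid  = threadIdx.x;
  const int col  = tid / THD_PER_COL;
  const int row0 = tid % THD_PER_COL;

  // Steps 1-3: column dot products of PsiInv with the pair function at the
  // new site; colfac[ipair] is dpbd1 (the first DetPByDet).
  if (col < NPAIRS) {
    double fac = 0.0;
    for (int i = row0; i < NPAIRS; i += THD_PER_COL)
      fac += PsiInv[i * NPAIRS + col] * pairfunc[newsite * NSITES + plist[2 * i]];
    if (row0 == 0) colfac[col] = fac;              // serialized in-warp reduction,
    for (int k = 1; k < THD_PER_COL; k++)          // as in the appendix listing
      if (row0 == k) colfac[col] += fac;
  }
  __syncthreads();

  // Steps 4-8: each column contributes what the second DetPByDet would have
  // seen after the (skipped) update of PsiInv, scaled through by dpbd1.
  if (tid < NPAIRS) {
    const int c     = tid;
    const double d1 = colfac[ipair];               // dpbd1
    const int oths  = plist[2 * c + 1];            // down-spin partner's site
    colval[c] = (c == ipair)
        ? PsiInv[jpair * NPAIRS + ipair] * tpairfunc[newsite * NSITES + isite]
        : (PsiInv[jpair * NPAIRS + c] * d1
           - colfac[c] * PsiInv[jpair * NPAIRS + ipair])
          * tpairfunc[oths * NSITES + isite];
  }
  __syncthreads();

  // Step 9: reduce the column contributions to obtain dpbd = dpbd1 * dpbd2.
  if (tid == 0) {
    double s = 0.0;
    for (int c = 0; c < NPAIRS; c++) s += colval[c];
    dpbd = s;
  }
  __syncthreads();
  return dpbd;
}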

Chapter 6
Results

In this chapter, we compare our results with the CPU and MPI implementations [4].

6.1 Input Parameters

The input parameters used to run the VMC method on both the CPU and the GPU are given in Table 6.1.

From these parameters we obtain Ne, NP and NS as per Table 4.1.

6.2 Comparison between the CPU and GPU code

The CUDA code has been tested on an NVIDIA Tesla M2070 GPU and the results are benchmarked against an Intel(R) Core(TM) i7-2600 CPU @ 3.40 GHz. The MCs are done in parallel, one per block of CUDA threads on the GPU, whereas a CPU core performs multiple MCs sequentially. We tested our GPU code by varying the blocking factor, bfactor, for the UpdateConfig and the Streamlined function. Tables 6.2 and 6.3 show the speedup compared to the CPU code for bfactor = 8, 16. From Table 6.3 we notice a speedup of ≈ 2 compared to the GPU code with bfactor = 8. The speedup is mainly due to the access of 16 words (128-byte transactions) per warp compared to 8 words (64-byte transactions) for bfactor = 8.

6.3 Comparison between the MPI-CPU and GPU code

We compared our CUDA code with the MPI code implemented by [5] using Fortran 90. The MPI code designates one CPU core per MC. The code has been tested on the Philip supercomputer at HPC, LSU; Table 6.4 gives the nodal architecture of the Philip supercomputer. The MPI code, written in Fortran 90, suffers from inter-processor and inter-node communication. The massive speedup on the GPU is due to the fact that CUDA threads communicate in groups of a warp (32 threads) and all threads of a block have barrier synchronization on an SM.

Table 6.1: Input Parameters

Parameter            Function                          Value
t                    First neighbor hopping            1.0
tp                   Second neighbor hopping           0.0
tpp                  Third neighbor hopping            0.0
Jex                  Antiferromagnetic exchange        0.3
L                    Lattice                           5/9/15
x                    Hole Doping                       0.1
nattemptmax          Maximum generation attempts       1000000
ConditionNumberMin   Minimum condition number          1.E-07
Delta                Gap                               1.5
mu                   Chemical Potential                0.0
nsweeps              Number of sweeps                  10000
neqlsweeps           Number of equilibration sweeps    5000
nfac                 Sweep factor                      1
nbins                Number of bins                    10
neifrac              Neighbor fraction                 0.0
naveseeds            Number of seeds to average over   1
selfrac              Selection fraction                0.75
iseedseed            Seed for random number            13213
nMarkov              Number of MCs                     32

Table 6.2: Comparison of CPU vs GPU Performance for bfactor=8

Lattice   No. MCs   CPU Exec Time (s)   GPU Exec Time (s)   Speedup
L = 5     32        22.66               3.19                7.26
L = 9     32        287.13              17.37               16.72
L = 15    32        4992.24             309.72              16.11

Table 6.3: Comparison of CPU vs GPU Performance for bfactor=16

Lattice   No. MCs   CPU Exec Time (s)   GPU Exec Time (s)   Speedup
L = 5     32        22.66               2.48                9.13
L = 9     32        287.13              14.44               19.88
L = 15    32        4992.24             256.83              19.43

Table 6.4: Nodal architecture of Philip Supercomputer

Nodal Configuration    Two 2.93 GHz Quad Core Nehalem Xeon 64-bit Processors
No. Processors/node    8
Total No. Nodes        32
DRAM                   24 GB @ 1333 MHz

Table 6.5: Comparison of MPI vs GPU Performance

Lattice   No. MCs   MPI Exec Time (s)   GPU Exec Time (s)   Speedup
L = 5     32        4.89                3.19                1.53
L = 9     32        63.66               17.37               3.66
L = 15    32        1193.687            309.72              3.85

6.4 GPU results in depth

In this section, we examine the individual functions and their percentage of the total execution time. Figure 6.1 illustrates the execution time taken by functions such as DPBD and UpdateConfig for L = 5. Figures 6.2 and 6.3 illustrate the same for L = 9 and L = 15.

Table 6.6: GPU execution time in cycles for L = 5

Function                        No. Cycles
Electron move                   420361934
DPBD-UP                         95922170
DPBD-DN                         97922630
DPBD Total                      193844800
UPD-UP                          322257414
UPD-DN                          259125546
UPD Total                       581382960
Energy                          969450046
Streamlined (DPBD-UPD-DPBD)     1630512656

Figure 6.1: Pie chart for L = 5

Table 6.7: GPU execution time in cycles for L = 9

Function                        No. Cycles
Electron move                   1398769542
DPBD-UP                         321308398
DPBD-DN                         351500708
DPBD Total                      672809106
UPD-UP                          1275117358
UPD-DN                          1476716774
UPD Total                       2751834132
Energy                          6105049222
Streamlined (DPBD-UPD-DPBD)     10717256244

Figure 6.2: Pie chart for L = 9

Table 6.8: GPU execution time in cycles for L = 15

Function                        No. Cycles
Electron move                   4958484770
DPBD-UP                         890322994
DPBD-DN                         932234278
DPBD Total                      1822557272
UPD-UP                          5874557324
UPD-DN                          7310109920
UPD Total                       13184667244
Energy                          48836694164
Streamlined (DPBD-UPD-DPBD)     8600610984504

Figure 6.3: Pie chart for L = 15

Chapter 7
Conclusion

7.1 Conclusion

The VMC algorithm achieved an extensive speedup compared to the MPI-Fortran implementation. We obtained nearly 114 times speedup compared to the MPI version and nearly 16 times speedup compared to the C++ version. The MPI implementation suffers from extensive inter-processor and inter-node communication. Since one Markov chain is designated to an MPI rank and no further optimization has been deployed, the MPI code does not scale well with the system size. From the results in the previous chapter, we observe that our GPU code gives a linear speedup as the lattice size increases.

7.2 Future Work

The ground state energy obtained by the VMC method will be used to optimize the variational parameters and to study the ground state properties of a system. The results can be compared with other approaches such as Exact Diagonalization to provide a better guess for the wavefunction. The GPU code can be enhanced by increasing the blocking factor for lattice models larger than L = 15. Further optimization can be done by examining the low-level assembly code to reduce the WAR and RAW hazards. Possible expansion to multiple GPUs and porting to future accelerators such as NVIDIA Kepler [13] and Intel Xeon Phi [14] will boost the performance further.

References

[1] A. Paramekanti, M. Randeria, N. Trivedi, “High-Tc superconductors: A variational theory of the superconducting state”, in Phys. Rev. B 70, 054504, 2004.

[2] P. W. Anderson, P. A. Lee, T. M. Rice, N. Trivedi, F. C. Zhang, “The physics behind high-temperature superconducting cuprates: the ’plain vanilla’ version of RVB”, in J. Phys.: Condens. Matter 16, R755, 2004.

[3] B. Edegger, V. N. Muthukumar, C. Gros, “Gutzwiller-RVB Theory of High Temperature Superconductivity: Results from Renormalized Mean Field Theory and Variational Monte Carlo Calculations”, in Adv. Phys. 56, 927, 2007.

[4] S. Pathak, V. B. Shenoy, M. Randeria, N. Trivedi, “Competition between Antiferromagnetic and Superconducting States, Electron Hole Doping Asymmetry, and Fermi-Surface Topology in High Temperature Superconductors”, in Phys. Rev. Lett. 102, 027002, 2009.

[5] V. B. Shenoy, “Variational Monte Carlo Method for Fermions”, in IISC Lecture Notes, http://www.physics.iisc.ernet.in/ shenoy/LectureNotes/vmc.pdf

[6] Nvidia, “Nvidia CUDA: Compute Unified Device Architecture: Parallel Program- ming model”, http://developer.nvidia.com/category/zone/cuda-zone.

[7] Nvidia, “Nvidia CUDA Programming guide”, http://docs.nvidia.com/cuda/cuda-c- programming-guide.

[8] W. M. Hwu, D. B. Kirk, “Programming Massively Parallel Processors - A Hands-on Approach”, Morgan Kaufmann publishers.

[9] ORNL, “Variational Monte Carlo”, http://www.ornl.gov/ pk7/thesis/pkthnode20.html.

[10] Intel, “Intel CoreTM i7-2600 Processor (8M Cache, up to 3.80 GHz)”, http://ark.intel.com/products/52213.

[11] Nvidia, “Nvidia’s Next Generation CUDA Compute Architecture: Fermi”, http://www.nvidia.com/object/fermi-architecture.html.

[12] P. Micikevicius, “Local Memory and Register Spilling”, http://developer.download.nvidia.com/CUDA/training/registerspilling.pdf, 2011.

[13] Nvidia, “Nvidia’s Next Generation CUDA Compute Architecture: Kepler GK 110”, http://www.nvidia.com/object/nvidia-kepler.html.

[14] Intel, “Xeon Phi: Parallel Processing, Unparalleled Discovery”, http://www.intel.com/content/www/us/en/high-performance-computing/high- performance-xeon-phi-coprocessor-brief.html.

[15] M. P. Nightingale, C. J. Umrigar, “NATO Series Lecture Notes”, http://www.phys.uri.edu/ nigh/QMC-NATO/webpage/abstracts/lectures.html.

[16] D. Ceperley, G. V. Chester, M. H. Kalos, “Monte Carlo simulation of a many-fermion study”, in Phys. Rev. B 16, 3081-3099, 1977.

[17] B. Tasseff, N. Setty, N. Sengupta, Z. Yun, “GPU-Accelerated Variational Monte Carlo-LSU”, http://reu.cct.lsu.edu/documents/2012-posters/ByronTasseffURFPoster.pdf

[18] A. Gothandaraman, “Accelerating Quantum Monte Carlo Simulations with Emerging Architectures”, Doctoral Dissertation, University of Tennessee, 2009.

Appendix A
CUDA code for the Merged Equilibration and Accumulation stage

__global__ void k_sysAve ( Configuration d_config, MonteCarlo d_mcarlo, Lattice d_sqlat, Pairfunction pairfunc, double *d_enearr, curandState *state ) { int tid = threadIdx.x; int MID = blockIdx.x; curandState localState = state[MID];

#define dplist(i) d_config.plist[MID*2*npairs+(i)] #define dlatocc(i) d_config.latocc[MID*nsites+(i)] #define dwhichpair(i) d_config.whichpair[MID*2*nsites+1+(i)]

__shared__ int idat, nbsamp; __shared__ int ipair,ispin,oldsite,newsite; __shared__ bool spin_flip; bool donenei; __shared__ double eneloc;

# ifdef COLLECT_CYCLE_COUNTS // These are declared shared to avoid wasting registers. __shared__ clock_t timer_start_thread, timer_start_eoc, timer_start_findpairs; __shared__ clock_t time_eoc, time_findpairs; # endif

if ( !tid ) { # ifdef COLLECT_CYCLE_COUNTS timer_start_thread = clock64(); time_eoc = time_findpairs = 0; time_detpbydet[0] = time_detpbydet[1] = 0;

time_update_config[0] = time_update_config[1] = 0; time_update_config_cnt = time_detpbydet_cnt = 0; time_d_update_d[0] = time_d_update_d[1] = 0; time_d_update_d_cnt = 0; # endif

idat = 0; eneloc = 0.0; nbsamp = 0; }

// Equilibrate the system by performing a large number of Monte Carlo sweeps for ( int i2 = 0; i2 < navesweeps + neqlsweeps; i2++ ) { for (int i = 0; i < npsweep; i++ ) { if(tid == 0) { # ifdef COLLECT_CYCLE_COUNTS timer_start_findpairs = clock64(); # endif

// Pick a pair and a spin at random ipair = curand_uniform( &localState ) * dc.npairs_randscaler; ispin = curand( &localState ) % 2; oldsite = dplist(ipair*2+ispin);

// Find a site where this electron may be moved newsite = 0; while ( newsite == 0 ) { const float neiprob = curand_uniform( &localState ); donenei = neiprob <= dc.neifrac; if ( !donenei ) { newsite = curand_uniform( &localState ) * dc.nsites_randscaler; } else { const int inei = curand_uniform( &localState ) * dc.nneibs_randscaler;

newsite = d_sqlat.neiblist[oldsite*nneibs+inei]; } if ( dlatocc(newsite) != ispin ) { if ( dlatocc(newsite) == HL ) { spin_flip = false; } else if ( false && dlatocc(newsite) == BT ) { assert( false ); newsite = 0; //miscutils::InfoPrint( "****WARNING: Gutzwiller projection violation" ); //exit( 0 ); } else { spin_flip = true; } } else { newsite = 0; } } # ifdef COLLECT_CYCLE_COUNTS time_findpairs += clock64() - timer_start_findpairs; # endif } __syncthreads();

// Note: these were complex doubles in original C++ code double dpbd;

if ( spin_flip ) { const int jpair = dwhichpair(newsite*2-ispin); dpbd = d_UpdateConfig_d ( ipair, ispin, newsite, MID, pairfunc, d_config, jpair, oldsite );

} else { dpbd = DetPByDet_parallel( ipair, ispin, newsite, MID, pairfunc, d_config.PsiInv, d_config.plist ); }

__shared__ bool accept;

if ( !tid ) { const float norm2 = dpbd * dpbd; const float prob = curand_uniform( &localState ); accept = norm2 >= prob; } __syncthreads();

if ( accept ) { // Accept the move if ( !tid ) d_mcarlo.naccept[MID] += 1; if ( spin_flip ) { const int jpair = dwhichpair(newsite*2-ispin); const double dpbd1 = DetPByDet_parallel ( ipair, ispin, newsite, MID, pairfunc, d_config.PsiInv, d_config.plist ); // Update the configuration UpdateConfig_parallel ( ipair, ispin, newsite, MID, pairfunc, dpbd1, d_config ); const double dpbd2 = DetPByDet_parallel ( jpair, 1-ispin, oldsite, MID, pairfunc, d_config.PsiInv, d_config.plist );

// Now, move the other spin electron to the old site

UpdateConfig_parallel( jpair, 1-ispin, oldsite, MID, pairfunc, dpbd2, d_config );

if(tid == 0)

d_mcarlo.nflip[MID] += 1; } else { UpdateConfig_parallel( ipair, ispin, newsite, MID, pairfunc, dpbd, d_config ); if(tid == 0) d_mcarlo.nsingle[MID] += 1; }

if ( donenei ) { if(tid == 0) d_mcarlo.nneistep[0] += 1; } } }

if ( i2 < neqlsweeps ) continue;

__syncthreads();

// Now, calculate the energy of the configuration # ifdef COLLECT_CYCLE_COUNTS if ( !tid ) timer_start_eoc = clock64(); # endif const double energyvalue = EnergyOfCurrentConfiguration( d_config, d_sqlat, pairfunc);

# ifdef COLLECT_CYCLE_COUNTS if ( !tid ) time_eoc += clock64() - timer_start_eoc; # endif if ( !tid ) { idat++; eneloc += energyvalue;

if ( idat == ndat ) { d_enearr[MID*nbins+nbsamp] = eneloc / double( ndat ); nbsamp++; eneloc = 0.0; idat = 0;

} } }

if ( !tid ) { d_mcarlo.nsteps[MID] = navesweeps * npsweep;

# ifdef COLLECT_CYCLE_COUNTS clock_t time_total = clock64() - timer_start_thread; timers[blockIdx.x].total = time_total; timers[blockIdx.x].eoc = time_eoc; timers[blockIdx.x].detpbydet[0] = time_detpbydet[0]; timers[blockIdx.x].detpbydet[1] = time_detpbydet[1]; timers[blockIdx.x].detpbydet_cnt = time_detpbydet_cnt; timers[blockIdx.x].update_config[0] = time_update_config[0]; timers[blockIdx.x].update_config[1] = time_update_config[1]; timers[blockIdx.x].update_config_cnt = time_update_config_cnt; timers[blockIdx.x].d_update_d[0] = time_d_update_d[0]; timers[blockIdx.x].d_update_d[1] = time_d_update_d[1]; timers[blockIdx.x].d_update_d_cnt = time_d_update_d_cnt; timers[blockIdx.x].findpairs = time_findpairs; # endif } #undef dplist #undef dlatocc #undef dwhichpair }

Appendix B
Device Functions

// Energy of a configuration

__device__ double EnergyOfCurrentConfiguration ( Configuration d_config, Lattice d_sqlat, Pairfunction pairfunc ) { int MID = blockIdx.x; __shared__ double enekloc, otheloc, spkeloc[2];

// ***------// Kinetic energy part of the Hamiltonian // _ // H = -\sum t ( c c ) // t (i,j) ij is js // ***------

#define PsiInv(r,c) d_config.PsiInv[MID*npairs*npairs+(r)*npairs+(c)] #define Plist(i) d_config.plist[MID*2*npairs+(i)]

{ const int ispin = 0; // Down // The variables below need to be tuned. const int thd_per_jn = 1; const int thd_per_ipair = thd_per_col; const int my_1st_ipair = threadIdx.x % thd_per_ipair; const int my_1st_col = threadIdx.x / thd_per_ipair % thd_per_jn; const int jn = threadIdx.x / ( thd_per_ipair * thd_per_jn ); double thread_sum = 0;

if ( jn < nneibs ) { for (int ipair = my_1st_ipair; ipair < npairs; ipair += thd_per_ipair ) { const int isite = Plist(ipair*2+ispin); const int jsite = d_sqlat.neiblist[jn+nneibs*isite];

if ( d_config.latocc[MID*nsites+jsite] == HL ) for ( int col = my_1st_col; col < npairs; col += thd_per_jn ) thread_sum += TPairfunc(Plist(col*2+1),jsite) * PsiInv(ipair,col); }

thread_sum *= d_sqlat.thop_ishell[jn]; }

const double sum = reduce_dbl_buffer( thread_sum, nneibs * thd_per_ipair * thd_per_jn );

if ( !threadIdx.x ) { enekloc = sum; spkeloc[ispin] = sum; } } { const int ispin = 1; // Up const int thd_per_jn = 1; const int thd_per_ipair = thd_per_col; const int my_1st_ipair = threadIdx.x % thd_per_ipair; const int my_1st_row = threadIdx.x / thd_per_ipair % thd_per_jn; const int jn = threadIdx.x / ( thd_per_ipair * thd_per_jn ); double thread_sum = 0;

if ( jn < nneibs ) { for (int ipair = my_1st_ipair; ipair < npairs; ipair += thd_per_ipair ) { const int isite = Plist(ipair*2+ispin); const int jsite = d_sqlat.neiblist[jn+nneibs*isite];

if ( d_config.latocc[MID*nsites+jsite] == HL ) for ( int row = my_1st_row; row < npairs; row += thd_per_jn ) thread_sum += Pairfunc(jsite,Plist(row*2)) * PsiInv(row,ipair); }

thread_sum *= d_sqlat.thop_ishell[jn]; }

const double sum = reduce_dbl_buffer(thread_sum, nneibs * thd_per_jn * thd_per_ipair); if ( !threadIdx.x ) { enekloc += sum; spkeloc[ispin] = sum; } } #undef PsiInv #undef Plist

// ***------// Now compute the Exchange Term // _ _ 1 // H = J ( S .S - --- n n ) Term // J i j 4 i j // ***------if ( !threadIdx.x ) otheloc = 0.0;

for ( int inn = 0; inn < nearnsets; inn++ ) { const int isite = d_sqlat.nearnp[inn*2+0]; const int jsite = d_sqlat.nearnp[inn*2+1]; // Note ispin is used as spin here. const int sispin = d_config.latocc[MID*nsites+isite]; const int jspin = d_config.latocc[MID*nsites+jsite];

if ( sispin + jspin == 1 ) { const int jpair = d_config.whichpair[MID*2*nsites+jsite*2+jspin]; const int sipair = d_config.whichpair[MID*2*nsites+isite*2+sispin];

const double e = d_UpdateConfig_d

( sipair, sispin, jsite, MID, pairfunc, d_config, jpair, isite );

if ( !threadIdx.x ) otheloc -= e + 1; } }

double energy = 0; if ( !threadIdx.x ) { otheloc = Jij * otheloc / 2.0; energy = ( enekloc + otheloc ) / double( nsites ); d_config.energy[MID] = energy; d_config.kinenergy[MID] = ( spkeloc[0] + spkeloc[1] ) / double( nsites ); d_config.othenergy[MID] = otheloc; d_config.modified[MID] = false; } return energy; }

// Reduction Routine // // Uses dbl_buffer, so cannot be mixed with other code using dbl_buffer. // __device__ double reduce_dbl_buffer(double val, int size, bool pre_written = false) { // Return sum of VAL for threads 0 to SIZE-1. // If PRE_WRITTEN is true, ignore VAL and use value already in dbl_buffer.

const int tid = threadIdx.x; __shared__ double sum; __syncthreads(); // Don’t write sum too early. if ( !pre_written ) { if ( tid < size + 31 ) dbl_buffer[tid] = val; __syncthreads(); } const bool one_warp = size <= WP_SZ;

// Optimization-friendly, sync-free code for a single-warp block.

# define STEP(offset) if ( tid < offset ) \ dbl_buffer[tid] += dbl_buffer[tid + offset];

if ( tid < WP_SZ ) { if ( !one_warp ) for ( int i = tid + WP_SZ; i < size; i += WP_SZ ) dbl_buffer[tid] += dbl_buffer[i]; if ( size >= 16 ) STEP(16); STEP(8); STEP(4); STEP(2); STEP(1); if ( !tid ) sum = dbl_buffer[0]; } __syncthreads(); return sum;

#undef STEP }

__device__ double reduce_dbl_buffer(int size) { // Return sum of values already in dbl_buffer, from index 0 to size-1

return reduce_dbl_buffer(0,size,true); }

// Ratio of Determinant of the configurations - DetPByDet

__device__ double DetPByDet_parallel ( int ipair, int spin, int newsite, int mid, Pairfunction pairfunc, double *d_PsiInv, int *d_plist ) { #define PsiInv(r,c) d_PsiInv[mid*npairs*npairs+(r)*npairs+(c)] #define Plist(i) d_plist[mid*2*npairs+(i)]

# ifdef COLLECT_CYCLE_COUNTS __shared__ clock_t time_start; if ( !threadIdx.x ) time_start = clock64();

# endif

double ans = 0.0; const int i = threadIdx.x;

if ( spin == UP ) { if(i

const double sum = reduce_dbl_buffer(ans,npairs);

# ifdef COLLECT_CYCLE_COUNTS if ( !threadIdx.x ) { time_detpbydet[spin==DN?0:1] += clock64() - time_start; time_detpbydet_cnt++; } # endif

return sum; }

//Update of a configuration

__device__ void UpdateConfig_parallel

( int ipair, int ispin, int newsite, int mid, Pairfunction pairfunc, double dpbdt, Configuration d_config ) { const int tid = threadIdx.x;

# ifdef COLLECT_CYCLE_COUNTS __shared__ clock_t timer_start; if ( !threadIdx.x ) timer_start = clock64(); # endif

# define dPsiInv(i) d_config.PsiInv[ mid * npairs * npairs + (i) ] # define dplist(i) d_config.plist[mid*2*npairs + (i) ] # define dwhichpair(i) d_config.whichpair[mid*2*nsites + (i)] # define dlatocc(i) d_config.latocc[mid*nsites + (i)]

// First, update the pair list and lattice occupancy

if(tid == 0) { const int oldsite = dplist(ipair*2+ispin); dplist(ipair*2+ispin) = newsite; dwhichpair(oldsite*2+ispin) = 0;

if ( dlatocc(oldsite) == BT ) { dlatocc(oldsite) = 1-ispin; } else { dlatocc(oldsite) = HL; } if ( dlatocc(newsite) == 1-ispin ) { dlatocc(newsite) = BT; } else { dlatocc(newsite) = ispin; }

dwhichpair(newsite*2+ispin) = ipair; } __syncthreads(); // Now, update PsiInv if ( ispin == UP ) { // First, update the ipair-th column of PsiInv if ( tid < npairs ) dPsiInv(ipair+tid*npairs) /= dpbdt; __syncthreads();

// Update the other columns

// WARNING: block size must be >= thd_per_col * npairs. const int my_col = threadIdx.x / thd_per_col; const int my_1st_row = threadIdx.x % thd_per_col; if ( my_col < npairs && my_col != ipair ) { double fac = 0.0; for ( int i = my_1st_row; i < npairs; i+=thd_per_col) { const int othsite = dplist(i*2); fac += dPsiInv(i*npairs+my_col) * Pairfunc(newsite,othsite); } if ( thd_per_col > 1 ) { if ( my_1st_row == 0 ) dbl_buffer[my_col] = fac; for ( int k = 1; k < thd_per_col; k++ ) if ( my_1st_row == k ) dbl_buffer[my_col] += fac; fac = dbl_buffer[my_col]; }

for ( int i = my_1st_row; i < npairs; i+=thd_per_col )

dPsiInv(i*npairs+my_col) -= fac * dPsiInv(i*npairs+ipair);

} __syncthreads(); } // all threads

else if ( ispin == DN ) {

// First, update the ipair-th row of PsiInv if ( tid < npairs ) dPsiInv(ipair*npairs+tid) /= dpbdt; __syncthreads();

// Update the other rows const int ispin = 1; const int bfactor = 8; // Blocking factor. const int my_1st_row = tid / bfactor; const int my_1st_col = tid % bfactor; const int nthds = blockDim.x; // nthds must be a mult of bfactor. const int rows_per_chunk = nthds / bfactor;

for ( int my_row = my_1st_row; my_row < npairs; my_row += rows_per_chunk ) if ( my_row != ipair ) { double fac = 0.0;

// Compute sum of "our" row elements. for ( int i = my_1st_col; i < npairs; i+= bfactor ) { const int othsite = dplist(i*2+ispin); fac += dPsiInv(i+npairs*my_row) * TPairfunc(othsite,newsite); }

// Get sum of entire row. if ( bfactor == 8 && DBL_BUFFER_SIZE >= blockDim.x + 4 ) { // Optimize for popular value of bfactor. dbl_buffer[tid] = fac; fac = dbl_buffer[tid] = fac + dbl_buffer[tid+1]; fac = dbl_buffer[tid] = fac + dbl_buffer[tid+2]; dbl_buffer[tid] = fac + dbl_buffer[tid+4]; fac = dbl_buffer[my_1st_row * bfactor]; } else if ( bfactor > 1 ) { if ( my_1st_col == 0 ) dbl_buffer[my_row] = fac;

for ( int k = 1; k < bfactor; k++ ) if ( my_1st_col == k ) dbl_buffer[my_row] += fac; fac = dbl_buffer[my_row]; }

// Update row using sum. for ( int i = my_1st_col; i < npairs; i+=bfactor ) dPsiInv(my_row*npairs+i) -= fac * dPsiInv(ipair*npairs+i); }

__syncthreads(); }

if ( !threadIdx.x ) { d_config.modified[mid] = true; # ifdef COLLECT_CYCLE_COUNTS time_update_config[ispin==DN?0:1] += clock64() - timer_start; time_update_config_cnt++; # endif } # undef dplist # undef dPsiInv # undef dwhichpair # undef dlatocc

}

//Streamlined function - DetPByDet-UpdateConfig-DetPByDet

__device__ double d_UpdateConfig_d ( int ipair, int spin, int newsite, int mid, Pairfunction pairfunc, Configuration d_config, int jpair, int isite ) { const int tid = threadIdx.x;

# ifdef COLLECT_CYCLE_COUNTS __shared__ clock_t timer_start; if ( !threadIdx.x ) timer_start = clock64();

# endif

double rv = 0;

# define dPsiInv(r,c) d_config.PsiInv[ mid*npairs*npairs + npairs*(r) + (c) ] # define dplist(i) d_config.plist[mid*2*npairs + (i) ]

if ( spin == UP ) { // WARNING: block size must be >= thd_per_col * npairs. const int my_col = threadIdx.x / thd_per_col; const int my_1st_row = threadIdx.x % thd_per_col; double fac = 0.0;

if ( my_col < npairs ) { for ( int i = my_1st_row; i < npairs; i+=thd_per_col) { const int othsite = dplist(i*2); fac += dPsiInv(i,my_col) * Pairfunc(newsite,othsite); } if ( thd_per_col > 1 ) { if ( my_1st_row == 0 ) dbl_buffer[my_col] = fac; for ( int k = 1; k < thd_per_col; k++ ) if ( my_1st_row == k ) dbl_buffer[my_col] += fac; fac = dbl_buffer[my_col]; } } __syncthreads(); double col_val = 0; if ( tid < npairs ) { const int my_col = tid; const int othsite = dplist(my_col*2+1); const double fac = dbl_buffer[my_col]; const double dpbdt = dbl_buffer[ipair];

col_val = my_col == ipair

? ( dPsiInv(jpair,ipair) * TPairfunc(newsite,isite) ) : ( ( dPsiInv(jpair,my_col) * dpbdt - fac * dPsiInv(jpair,ipair) ) * TPairfunc(othsite,isite) ); } rv = reduce_dbl_buffer(col_val,npairs); } else { // Update the other rows const int ispin = 1;

const int bfactor = 8; // Blocking factor. const int my_1st_row = tid / bfactor; const int my_1st_col = tid % bfactor; const int nthds = blockDim.x; // nthds must be a mult of bfactor. const int rows_per_chunk = nthds / bfactor;

__syncthreads();

for ( int my_row = my_1st_row; my_row < npairs; my_row += rows_per_chunk ) { double fac = 0.0;

// Compute sum of "our" row elements. for ( int i = my_1st_col; i < npairs; i+= bfactor ) { const int othsite = dplist(i*2+ispin); fac += dPsiInv(my_row,i) * TPairfunc(othsite,newsite); }

// Get sum of entire row. if ( bfactor == 8 && DBL_BUFFER_SIZE >= blockDim.x + 4 ) { // Optimize for popular value of bfactor. dbl_buffer[tid] = fac; fac = dbl_buffer[tid] = fac + dbl_buffer[tid+1]; fac = dbl_buffer[tid] = fac + dbl_buffer[tid+2]; if ( my_1st_col == 0 ) dbl_buffer[blockDim.x+my_row] = fac + dbl_buffer[tid+4]; }

else if ( bfactor > 1 ) { if ( my_1st_col == 0 ) dbl_buffer[my_row] = fac; for ( int k = 1; k < bfactor; k++ ) if ( my_1st_col == k ) dbl_buffer[my_row] += fac; fac = dbl_buffer[my_row]; if ( my_1st_col == 0 ) dbl_buffer[blockDim.x+my_row] = fac; } }

__syncthreads(); double row_val = 0;

if ( tid < npairs ) { const int my_row = tid; const int othsite = dplist(my_row*2); const double fac = dbl_buffer[blockDim.x+my_row]; const double dpbdt = dbl_buffer[blockDim.x+ipair];

row_val = my_row == ipair ? ( dPsiInv(ipair,jpair) * Pairfunc(isite,newsite) ) : ( ( dPsiInv(my_row,jpair) * dpbdt - fac * dPsiInv(ipair,jpair) ) * Pairfunc(isite,othsite) ); } rv = reduce_dbl_buffer(row_val,npairs); }

# ifdef COLLECT_CYCLE_COUNTS if ( !threadIdx.x ) { time_d_update_d[spin==DN?0:1] += clock64() - timer_start; time_d_update_d_cnt++; } # endif # undef dplist # undef dPsiInv return rv; }

Vita

Rajagopalan, Kaushik Ragavan was born in Chennai, India, in 1989 to Mrs. Pushpa and Mr. T. A. Rajagopalan. After graduating from high school with distinction, he went on to study for a B.ENG in Electronics and Communication at Rajalakshmi Engineering College, Anna University, India, from 2006 to 2010. He is currently a Master's student in the Department of Electrical & Computer Engineering at Louisiana State University, Baton Rouge, where he has been a graduate student since Fall 2010. During his graduate studies, he worked as a Research Assistant at the Center for Computation and Technology, LSU, for the LA-Sigma research group and contributed by tuning Monte Carlo based applications for multi-core CPUs and accelerators. With this experience, he did a summer internship at the National Institute for Computational Sciences, Oak Ridge, TN, on porting electronic structure calculations to GPUs.
