Matrix Transpose Naive Kernel Analysis

[AMD Official Use Only - Internal Distribution Only]

Chapter 3.2: Matrix Transpose Naive Kernel Analysis

ROCm Tutorial | AMD 2020

Table of Contents

  Chapter 3.2: Matrix Transpose Naive Kernel Analysis
    Preparation
    Compiling and Executing
    Profiling

In this tutorial, we will analyze the matrix transpose naive implementation using the rocProf profiler tool from ROCm.

Preparation

1. First, in the tutorial repository, go to the directory:

   cd 02_Matrix_Transpose

2. The application code is in matrix_transpose_naive.cpp.

Compiling and Executing

1. Compile the program:

   hipcc matrix_transpose_naive.cpp -o matrix_transpose_naive

2. Execute the program without the profiler:

   ./matrix_transpose_naive

3. Note that we are not printing any output from the matrices, as the matrices are large. You can add print code if desired.

Profiling

1. Now we will analyze the application through the profiler.

2. First, let us collect the kernel execution time using the performance measurement mode. Run the following command:

   rocprof --stats ./matrix_transpose_naive

   You will get the output in a file results.csv. Note down the kernel duration (ns).

3. For this application, we have provided the metrics file for collecting the HW performance counters in “metrics_matrix_transpose_naive_kernel.txt”.

4. On a closer look, the only thing different in this file from the metrics file of the copy kernel is the kernel name, which is set to transpose_kernel.

5. Now we will run the application in performance counter mode using our defined metrics file:

   rocprof -i metrics_matrix_transpose_naive_kernel.txt -o metrics_matrix_transpose_naive.csv ./matrix_transpose_naive

   This will output the results of the HW performance counters in metrics_matrix_transpose_naive.csv.

6. Keep a record of the results obtained for this kernel. In our case, we obtained the results below. Your results might be different depending on the GPU you are on:

   Kernel time (ns):  15135463
   TCC_EA_RDREQ_sum:  524289
   TCC_EA_WRREQ_sum:  1686190
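The results recorded in step 6 show roughly three times as many write requests (TCC_EA_WRREQ_sum) as read requests (TCC_EA_RDREQ_sum). The tutorial text does not list the source of matrix_transpose_naive.cpp, so the following is only a host-side C++ sketch of the indexing a naive transpose typically uses, written to make one plausible explanation visible: neighboring threads read neighboring elements, but write elements that are a full column height apart. The function names transpose_naive, read_index, and write_index are hypothetical and are not taken from the tutorial code; x and y stand in for the global thread indices a GPU kernel would derive from its block and thread IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index read by the "thread" at (x, y) from a row-major width x height input.
// Hypothetical helper, not part of the tutorial source.
std::size_t read_index(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}

// Index written by the same "thread" in the row-major height-wide output.
// Hypothetical helper, not part of the tutorial source.
std::size_t write_index(std::size_t x, std::size_t y, std::size_t height) {
    return x * height + y;
}

// Host-side emulation of a naive transpose: one loop iteration per element,
// standing in for one GPU thread per element. This is an assumed scheme,
// not the kernel shipped with the tutorial.
std::vector<float> transpose_naive(const std::vector<float>& in,
                                   std::size_t width, std::size_t height) {
    assert(in.size() == width * height);
    std::vector<float> out(in.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Read: consecutive x maps to consecutive addresses (unit stride).
            // Write: consecutive x maps to addresses `height` elements apart.
            out[write_index(x, y, height)] = in[read_index(x, y, width)];
        }
    }
    return out;
}
```

Calling transpose_naive on a 2-row by 3-column input {1, 2, 3, 4, 5, 6} with width 3 and height 2 yields the 3-row by 2-column result {1, 4, 2, 5, 3, 6}. Under this assumed indexing, threads that differ by one in x read addresses one element apart but write addresses height elements apart, which would be consistent with the write counter exceeding the read counter. Treat this as an interpretation to check against the actual kernel source rather than a confirmed diagnosis.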
