Tracing a memory error in a CUDA program
source link: http://www.donghao.org/2021/05/14/trace-memory-error-of-cuda-program/
A program that uses CUDA for computation on the GPU reported a memory error:
terminate called after throwing an instance of 'std::runtime_error'
what(): [CUDA] an illegal memory access was encountered LightGBM/src/treelearner/cuda_tree_learner.cpp 239
For an ordinary C++ program we would use gdb for debugging. For a CUDA program we should use cuda-gdb instead. Make sure to compile the CUDA code with the -g flag (nvcc's -G flag additionally generates device-side debug information), and then run:
/usr/local/cuda-11.0/bin/cuda-gdb python3
(cuda-gdb) run test.py
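After a while, cuda-gdb stops at the illegal memory access and shows where in the code it happened. The exception was triggered at line 182 of histogram_16_64_256.cu, and the warp had reached line 185 by the time it was reported: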
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x1668b2f0 (histogram_16_64_256.cu:182)
Thread 1 "python3" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 10, block (2163,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x000000001668b380 in LightGBM::histogram16<<<(7360,1,1),(16,1,1)>>> () at LightGBM/src/treelearner/kernels/histogram_16_64_256.cu:185
185 feature = (feature >> ((ind & 1) << 2)) & 0xf;
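To help read the reported source line, here is a minimal sketch in Python of what the expression on line 185 computes. The function name `unpack_4bit` and the sample byte are hypothetical, and the sketch only mirrors the arithmetic of that one statement, not the surrounding kernel, which is not shown in this post.

```python
def unpack_4bit(packed_byte: int, ind: int) -> int:
    """Mirror of `feature = (feature >> ((ind & 1) << 2)) & 0xf`."""
    # (ind & 1) << 2 is 0 for an even ind and 4 for an odd ind.
    shift = (ind & 1) << 2
    # Shift the selected nibble into the low 4 bits and mask the rest away.
    return (packed_byte >> shift) & 0xF


# Hypothetical packed byte: high nibble 0xA, low nibble 0xB.
packed = 0xAB
low = unpack_4bit(packed, 0)   # even index selects the low nibble
high = unpack_4bit(packed, 1)  # odd index selects the high nibble
print(hex(low), hex(high))
```

Assuming `feature` and `ind` are local variables, this statement only shifts and masks a value that has already been loaded, so it is unlikely to be the faulting access itself. That is consistent with cuda-gdb naming line 182 as the place where the exception was triggered, so the memory access near that line is the more plausible place to start looking.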