Python / PyTorch Performance Tuning Guide
Posted 2023-09-15 | Updated 2023-09-15

When training models, you constantly run into slow preprocessing or slow training, especially on large datasets: a careless run can easily eat two or three days, and if the code also has a bug, discovering after two days that you have to start over is even more painful. This post shows how to use a profiler to find your program's bottlenecks.

Line Profiler: Simple and Easy to Use

Line Profiler is still somewhat intrusive: you have to modify the code by hand.

First, the simplest part: install it

pip install line_profiler

Once installed, just add @profile to the functions you want to trace (some editors will complain that the symbol cannot be found), like this:

from line_profiler import profile


@profile
def calculation():
    x = 1
    for i in range(10000):
        x += i
    return x


print(calculation())

Then, instead of running the script directly, run:

kernprof -l <filename>.py
python -m line_profiler <filename>.py.lprof

Running kernprof generates a .lprof profile file. It is not an ASCII file and cannot be opened directly; you need line_profiler to print the report:

$ python -m line_profiler .\exp.py.lprof            
Timer unit: 1e-06 s

Total time: 0.0036551 s
File: ./exp.py
Function: calculation at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def calculation():
     6         1          0.4      0.4      0.0      x = 1
     7     10001       1724.8      0.2     47.2      for i in range(10000):
     8     10000       1929.7      0.2     52.8          x += i
     9         1          0.2      0.2      0.0      return x
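
If your kernprof version supports it, the -v / --view flag prints this report right after the run, so you can skip the separate line_profiler command:

kernprof -l -v <filename>.py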

Pretty intuitive, isn't it? But having to manually add @profile to every function you want to trace is quite inconvenient.
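
If you would rather not edit every function definition, the same package also exposes a LineProfiler object that can wrap functions programmatically. A minimal sketch, reusing the calculation function from above:

from line_profiler import LineProfiler

lp = LineProfiler()
profiled_calculation = lp(calculation)  # wrap the function instead of decorating its source
print(profiled_calculation())
lp.print_stats()                        # print the per-line report to stdout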

cProfile - More Detail

cProfile gives much more detail and can even profile down to the C level.

It can be used in two ways: intrusive and non-intrusive.
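
For the intrusive way, a minimal sketch using the standard cProfile / pstats API (again profiling the calculation function from above):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
calculation()                       # the code you want to measure
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("tottime")         # sort by internal time, as in the output below
stats.print_stats(10)               # show the 10 most expensive entries
stats.dump_stats("myLog.profile")   # optionally save it for flameprof / gprof2dot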

If you want the non-intrusive way:

python -m cProfile -o myLog.profile <myScript.py> arg1 arg2

Here -o is the name of the saved profile file; you can leave it out if you don't need one. The result looks like this:

      5 function calls in 0.001 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 exp.py:2(calculation)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 exp.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

If you want to look at a flame graph, you need the profile file saved with -o; flameprof (a separate package, installable with pip install flameprof) converts it into an SVG:

flameprof pipeline.prof > pipeline.svg

A call graph is also quite intuitive; gprof2dot can produce one:

gprof2dot -f pstats myLog.profile -o callingGraph.dot
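
gprof2dot only writes a Graphviz .dot file; rendering it to an image still requires Graphviz itself (assuming the dot tool is installed):

dot -Tsvg callingGraph.dot -o callingGraph.svg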

py-spy: Super Convenient

Even with cProfile, generating a flame graph takes two commands. How about doing it with a single command? py-spy can, and it is also non-intrusive:

py-spy record -o profile.svg -- python myprogram.py
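
Since py-spy is a sampling profiler, it can also attach to a training job that is already running, given its PID (on Linux this may require sudo); 12345 below is a placeholder PID:

py-spy record -o profile.svg --pid 12345
py-spy top --pid 12345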

torch.profiler

PyTorch ships with its own profiler, which can profile the computation part of a neural network to look for bottlenecks. (In my case the bottleneck turned out not to be on the GPU at all but on a single CPU core. Embarrassing.)

import time

import torch
from torch.profiler import profile, record_function, ProfilerActivity

# model and dummy_graph are assumed to be defined earlier in the script
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             with_stack=True, record_shapes=True) as prof:
    with torch.no_grad():
        start_time = time.time()
        for i in range(100):
            s = model(dummy_graph)
        end_time = time.time()
        print("CUDA FP32 time: ", (end_time - start_time) / 100)  # average time per forward pass

print(prof.key_averages(group_by_stack_n=5).table(sort_by="cpu_time_total", row_limit=10))
print(prof.key_averages(group_by_stack_n=5).table(sort_by="cuda_time_total", row_limit=10))
prof.export_stacks("./profiler_stacks.txt", "self_cuda_time_total")
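
Besides the stack export, the same prof object can also dump a trace that Chrome's chrome://tracing page can open; a minimal follow-up:

prof.export_chrome_trace("./trace.json")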
