Python / PyTorch Performance Tuning Guide
Posted 2023-09-15 | Updated 2023-09-15

When training models, you constantly run into slow preprocessing or slow training, especially on large datasets: a careless run can easily eat two or three days, and if the code also has a bug, discovering after two days that you have to start over is even more painful. This post shows how to use a profiler to find your program's bottlenecks.

Line Profiler: Simple and Easy to Use

Line Profiler is still somewhat intrusive: you have to modify the code by hand.

First, the simplest part: install it

pip install line_profiler

Once installed, just add @profile to the functions you want to trace (some editors will complain that the symbol cannot be found), like this:

from line_profiler import profile


@profile
def calculation():
    x = 1
    for i in range(10000):
        x += i
    return x


print(calculation())

Then, instead of running the script directly, run:

kernprof -l <filename>.py
python -m line_profiler <filename>.py.lprof

Running kernprof generates a .lprof profile file. It is not an ASCII file and cannot be opened directly; you need line_profiler to print the report:

$ python -m line_profiler .\exp.py.lprof            
Timer unit: 1e-06 s

Total time: 0.0036551 s
File: ./exp.py
Function: calculation at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def calculation():
     6         1          0.4      0.4      0.0      x = 1
     7     10001       1724.8      0.2     47.2      for i in range(10000):
     8     10000       1929.7      0.2     52.8          x += i
     9         1          0.2      0.2      0.0      return x
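
If your kernprof version supports it, the -v / --view flag prints this report right after the run, so you can skip the separate line_profiler command:

kernprof -l -v <filename>.py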

Pretty intuitive, isn't it? But having to manually add @profile to every function you want to trace is quite inconvenient.
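
If you would rather not edit every function definition, the same package also exposes a LineProfiler object that can wrap functions programmatically. A minimal sketch, reusing the calculation function from above:

from line_profiler import LineProfiler

lp = LineProfiler()
profiled_calculation = lp(calculation)  # wrap the function instead of decorating its source
print(profiled_calculation())
lp.print_stats()                        # print the per-line report to stdout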

cProfile - More Detail

cProfile gives much more detail and can even profile down to the C level.

It can be used in two ways: intrusive and non-intrusive.
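
For the intrusive way, a minimal sketch using the standard cProfile / pstats API (again profiling the calculation function from above):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
calculation()                       # the code you want to measure
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("tottime")         # sort by internal time, as in the output below
stats.print_stats(10)               # show the 10 most expensive entries
stats.dump_stats("myLog.profile")   # optionally save it for flameprof / gprof2dot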

If you want the non-intrusive way:

python -m cProfile -o myLog.profile <myScript.py> arg1 arg2

Here -o is the name of the saved profile file; you can leave it out if you don't need one. The result looks like this:

      5 function calls in 0.001 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 exp.py:2(calculation)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 exp.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

If you want to look at a flame graph, you need the profile file saved with -o; flameprof (a separate package, installable with pip install flameprof) converts it into an SVG:

flameprof pipeline.prof > pipeline.svg

A call graph is also quite intuitive; gprof2dot can produce one:

gprof2dot -f pstats myLog.profile -o callingGraph.dot
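
gprof2dot only writes a Graphviz .dot file; rendering it to an image still requires Graphviz itself (assuming the dot tool is installed):

dot -Tsvg callingGraph.dot -o callingGraph.svg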

py-spy: Super Convenient

Even with cProfile, generating a flame graph takes two commands. How about doing it with a single command? py-spy can, and it is also non-intrusive:

py-spy record -o profile.svg -- python myprogram.py
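
Since py-spy is a sampling profiler, it can also attach to a training job that is already running, given its PID (on Linux this may require sudo); 12345 below is a placeholder PID:

py-spy record -o profile.svg --pid 12345
py-spy top --pid 12345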

torch.profiler

PyTorch ships with its own profiler, which can profile the computation part of a neural network to look for bottlenecks. (In my case the bottleneck turned out not to be on the GPU at all but on a single CPU core. Embarrassing.)

import time

import torch
from torch.profiler import profile, record_function, ProfilerActivity

# model and dummy_graph are assumed to be defined earlier in the script
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             with_stack=True, record_shapes=True) as prof:
    with torch.no_grad():
        start_time = time.time()
        for i in range(100):
            s = model(dummy_graph)
        end_time = time.time()
        print("CUDA FP32 time: ", (end_time - start_time) / 100)  # average time per forward pass

print(prof.key_averages(group_by_stack_n=5).table(sort_by="cpu_time_total", row_limit=10))
print(prof.key_averages(group_by_stack_n=5).table(sort_by="cuda_time_total", row_limit=10))
prof.export_stacks("./profiler_stacks.txt", "self_cuda_time_total")
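
Besides the stack export, the same prof object can also dump a trace that Chrome's chrome://tracing page can open; a minimal follow-up:

prof.export_chrome_trace("./trace.json")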
