Profiling Python Code Performance

Published 2023-04-19 | Updated 2023-04-19 | technology python


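Python is an excellent object-oriented, interpreted language: code is quick to write and easy to read (at least on the surface). If one had to name a weakness, it would be execution speed, which lags behind C, Java, Golang, and similar languages, although Python 3.11 brought a substantial speedup. Profiling Python code nevertheless remains essential, because it helps locate the time-consuming parts of a program so that they can be optimized. This post introduces some methods for profiling Python code.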

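Profiling is closely related to timing code, except that it is concerned with where the time is spent. The main Python profiling tool is the cProfile module, which records how much time is spent in each function while a program or block of code runs. It is part of the Python standard library and is not specific to IPython.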

cProfile is typically used from the command line: it runs an entire program and then reports the time spent in each function. First, write a Python module py_performance.py:

import numpy as np
from numpy.linalg import eigvals
np.random.seed(33)

def run_experiment(niter=100):
    K = 100
    results = []
    for _ in range(niter):
        mat = np.random.randn(K, K)
        max_eigenvalue = np.abs(eigvals(mat)).max()
        results.append(max_eigenvalue)
    return results


if __name__ == "__main__":
    some_results = run_experiment()
    print(f"Largest one we saw: {np.max(some_results)}")

Run the profiler:

python -m cProfile py_performance.py

The output looks roughly like this:

Largest one we saw: 11.375894329102476
         80055 function calls (77991 primitive calls) in 1.101 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.000    0.000    0.002    0.000 <__array_function__ internals>:177(all)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(amax)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(concatenate)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(copyto)
      100    0.000    0.000    0.592    0.006 <__array_function__ internals>:177(eigvals)
    149/1    0.001    0.000    0.454    0.454 <frozen importlib._bootstrap>:1002(_find_and_load)
   180/16    0.001    0.000    0.435    0.027 <frozen importlib._bootstrap>:1033(_handle_fromlist)
      452    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap>:112(release)
      149    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:152(__init__)
      149    0.000    0.000    0.003    0.000 <frozen importlib._bootstrap>:156(__enter__)
      149    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:160(__exit__)
      452    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:166(_get_module_lock)

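From the log above alone it is hard to tell where most of the time goes, so it is common to sort by one column with the -s flag, for example by cumulative time (python -m cProfile -s cumtime py_performance.py):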

Largest one we saw: 11.375894329102476
         80055 function calls (77991 primitive calls) in 0.743 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    110/1    0.000    0.000    0.743    0.743 {built-in method builtins.exec}
        1    0.000    0.000    0.743    0.743 py_performance.py:1(<module>)
        1    0.001    0.001    0.621    0.621 py_performance.py:5(run_experiment)
      100    0.000    0.000    0.567    0.006 <__array_function__ internals>:177(eigvals)
  203/103    0.000    0.000    0.567    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}
      100    0.562    0.006    0.567    0.006 linalg.py:976(eigvals)
       13    0.001    0.000    0.259    0.020 __init__.py:1(<module>)
    149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:1002(_find_and_load)
    149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
    138/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:659(_load_unlocked)
    109/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap_external>:844(exec_module)
    216/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
   180/16    0.001    0.000    0.118    0.007 <frozen importlib._bootstrap>:1033(_handle_fromlist)
    362/8    0.001    0.000    0.117    0.015 {built-in method builtins.__import__}
      100    0.053    0.001    0.053    0.001 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}

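Note: if a function calls another function, the clock does not stop and restart. The cumtime column therefore includes the time spent in everything the function calls, while tottime counts only the time spent in the function's own body.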
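
The difference between the two columns can be seen in a small sketch. The functions inner and outer below are hypothetical and not part of py_performance.py, and get_stats_profile() assumes Python 3.9 or later. Since outer does almost nothing itself, its tottime should be close to zero while its cumtime should roughly match the time spent in inner.

```python
import cProfile
import pstats


def inner(n=500_000):
    """Hypothetical helper that does the real work in a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def outer():
    """Hypothetical caller: almost no work of its own, it only calls inner()."""
    return inner() + inner()


profiler = cProfile.Profile()
profiler.runcall(outer)

# get_stats_profile() is available in Python 3.9 and later.
func_profiles = pstats.Stats(profiler).get_stats_profile().func_profiles
outer_stats = func_profiles["outer"]
inner_stats = func_profiles["inner"]

for name, st in (("outer", outer_stats), ("inner", inner_stats)):
    print(f"{name}: ncalls={st.ncalls} "
          f"tottime={st.tottime:.3f}s cumtime={st.cumtime:.3f}s")
```

The exact numbers depend on the machine, so none are shown here.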

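Besides the command-line usage above, cProfile can also be used programmatically to profile an arbitrary block of code. IPython provides a convenient interface for this through %prun and %run -p.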
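
As a rough illustration of the programmatic route outside IPython, the following sketch profiles a single call with the standard-library cProfile and pstats modules. The helper profile_call and the workload sum_of_squares are hypothetical names used only for this example; the workload stands in for run_experiment so that the snippet does not depend on NumPy.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Hypothetical workload standing in for run_experiment()."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, limit=7, sort_key="cumulative"):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()

    # Send the report to a string buffer instead of stdout
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_call(sum_of_squares, 100_000)
print(report)
```

Here limit and sort_key play the same role as the -l and -s options of %prun.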

Basic profiling: %prun and %run -p

These magic commands can be used directly in IPython or a Jupyter notebook:

from pyscripts.py_performance import run_experiment
# Print only 7 lines to save space
%prun -l 7 -s cumulative run_experiment()

The output:

       4004 function calls (3904 primitive calls) in 0.603 seconds

Ordered by: cumulative time
 List reduced from 35 to 7 due to restriction <7>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    0.603    0.603 {built-in method builtins.exec}
      1    0.000    0.000    0.603    0.603 <string>:1(<module>)
      1    0.001    0.001    0.603    0.603 py_performance.py:5(run_experiment)
    100    0.000    0.000    0.552    0.006 <__array_function__ internals>:177(eigvals)
200/100    0.000    0.000    0.552    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}
    100    0.547    0.005    0.552    0.006 linalg.py:976(eigvals)
    100    0.049    0.000    0.049    0.000 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}

%run -p -l 7 -s cumulative pyscripts/py_performance.py

The output:

Largest one we saw: 11.375894329102476

4163 function calls (4062 primitive calls) in 0.584 seconds

Ordered by: cumulative time
   List reduced from 106 to 7 due to restriction <7>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.000    0.000    0.584    0.584 {built-in method builtins.exec}
        1    0.000    0.000    0.584    0.584 <string>:1(<module>)
        1    0.000    0.000    0.584    0.584 interactiveshell.py:2774(safe_execfile)
        1    0.000    0.000    0.583    0.583 py3compat.py:51(execfile)
        1    0.000    0.000    0.582    0.582 py_performance.py:1(<module>)
        1    0.001    0.001    0.581    0.581 py_performance.py:5(run_experiment)
      100    0.000    0.000    0.532    0.005 <__array_function__ internals>:177(eigvals)

Line-by-line function profiling: %lprun

Line-by-line profiling requires the third-party line_profiler module, which can be installed as follows:

pip install line_profiler

After installation, you can edit ~/.ipython/profile_default/ipython_config.py and add the following:

c.TerminalIPythonApp.extensions = [
    'line_profiler',
]

The magic is then available in IPython:

In [1]: from pyscripts.py_performance import run_experiment

In [2]: %lprun -f run_experiment run_experiment()
Timer unit: 1e-09 s

Total time: 0.997021 s
File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py
Function: run_experiment at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def run_experiment(niter=100):
     6         1       1530.0   1530.0      0.0      K = 100
     7         1        421.0    421.0      0.0      results = []
     8       100      57943.0    579.4      0.0      for _ in range(niter):
     9       100   66120221.0 661202.2      6.6          mat = np.random.randn(K, K)
    10       100  930721489.0 9307214.9     93.4          max_eigenvalue = np.abs(eigvals(mat)).max()
    11       100     118745.0   1187.5      0.0          results.append(max_eigenvalue)
    12         1        321.0    321.0      0.0      return results

The general syntax of %lprun is:

%lprun -f func1 -f func2 func(x, y)

In a Jupyter notebook, load the extension first:

%load_ext line_profiler
from pyscripts.py_performance import run_experiment
%lprun -f run_experiment run_experiment()

The output:

Timer unit: 1e-09 s

Total time: 0.58282 s
File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py
Function: run_experiment at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def run_experiment(niter=100):
     6         1       4160.0   4160.0      0.0      K = 100
     7         1        952.0    952.0      0.0      results = []
     8       100      43829.0    438.3      0.0      for _ in range(niter):
     9       100   46188517.0 461885.2      7.9          mat = np.random.randn(K, K)
    10       100  536499670.0 5364996.7     92.1          max_eigenvalue = np.abs(eigvals(mat)).max()
    11       100      82394.0    823.9      0.0          results.append(max_eigenvalue)
    12         1        192.0    192.0      0.0      return results
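
Outside IPython, line_profiler can also be driven from ordinary Python code. The sketch below is a minimal example under some assumptions: the workload slow_sum and the helper profile_lines are hypothetical names, and the import is guarded so that the snippet still runs (returning None) when line_profiler is not installed. For source lines to appear in the report, the profiled function needs to live in a real file on disk.

```python
import io

try:
    # Third-party package: pip install line_profiler
    from line_profiler import LineProfiler
except ImportError:
    LineProfiler = None


def slow_sum(n):
    """Hypothetical workload used only for this illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_lines(func, *args, **kwargs):
    """Return a line-by-line timing report for one call, or None if
    line_profiler is unavailable."""
    if LineProfiler is None:
        return None
    profiler = LineProfiler()
    wrapped = profiler(func)      # register func and wrap it for timing
    wrapped(*args, **kwargs)
    buffer = io.StringIO()
    profiler.print_stats(stream=buffer)
    return buffer.getvalue()


report = profile_lines(slow_sum, 10_000)
print(report if report is not None else "line_profiler is not installed")
```

The report has the same layout as the %lprun output above.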
