Problems and Solutions When Installing TensorFlow on Ubuntu 20.04

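I installed the newly released Ubuntu 20.04 to try it out and ran into quite a few problems installing TensorFlow locally. This post records how I solved them.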

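Two problems show up on Ubuntu 20.04:

CUDA 10.1 requires gcc <= 8, but Ubuntu 20.04 ships gcc 9.3.0.
The system Python is 3.8, which TensorFlow 2.1 and earlier do not support.

The CUDA installer rejects the newer compiler, as its log shows:

$ cat /var/log/cuda-installer.log
...
[ERROR]: unsupported compiler version: 9.3.0. Use --override to override this check.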

Solving the Python version problem

Use Conda or Docker to create a separate environment with the Python version you need, alongside the system Python.
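Once the new environment is active, a quick check that its interpreter falls in the range your TensorFlow release supports can save a failed pip install. This is a minimal sketch; the bounds below are assumptions (Python 3.5 to 3.8, matching TensorFlow 2.2/2.3) and should be adjusted for the release you actually install.

```python
import sys

# Assumed bounds for TensorFlow 2.2/2.3 (Python 3.5 to 3.8); TensorFlow 2.1
# stops at 3.7. Adjust them for the release you actually install.
MIN_VERSION = (3, 5)
MAX_VERSION = (3, 8)


def python_version_ok(version=None, lo=MIN_VERSION, hi=MAX_VERSION):
    """Return True if (major, minor) of `version` lies within [lo, hi]."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])  # ignore the micro version
    return lo <= major_minor <= hi


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    verdict = "OK" if python_version_ok() else "outside the assumed range"
    print(f"Python {major}.{minor}: {verdict}")
```

Comparing (major, minor) tuples keeps the check independent of the patch level, so 3.8.2 and 3.8.10 are treated the same.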

Solving the gcc version problem

See my earlier post, "Installing multiple versions of gcc on Linux".
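After switching compilers, it is worth confirming which gcc is actually picked up before rerunning the CUDA installer. The sketch below parses the first line of gcc --version (for example "gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0"); the limit of 8 comes from the installer error above, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: CUDA 10.1's installer rejects gcc newer than 8 (see the log above).
MAX_GCC_MAJOR = 8


def gcc_major(version_output):
    """Extract the major version from the first line of `gcc --version`."""
    first_line = version_output.strip().splitlines()[0]
    # The release number is the last token, e.g. "... 9.3.0".
    match = re.search(r"(\d+)\.\d+(?:\.\d+)?\s*$", first_line)
    if match is None:
        raise ValueError(f"cannot parse gcc version from: {first_line!r}")
    return int(match.group(1))


def current_gcc_major(compiler="gcc"):
    """Run `<compiler> --version`; return its major version, or None if absent."""
    if shutil.which(compiler) is None:
        return None
    out = subprocess.run([compiler, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return gcc_major(out)


if __name__ == "__main__":
    major = current_gcc_major()
    if major is None:
        print("gcc not found on PATH")
    elif major <= MAX_GCC_MAJOR:
        print(f"gcc {major}: accepted by the CUDA 10.1 installer")
    else:
        print(f"gcc {major}: too new for the CUDA 10.1 installer")
```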

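Installing the CUDA Toolkit

Download the runfile (.run) version of the CUDA 10.1 installer, make it executable, and install the toolkit and samples silently without the driver: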

$ chmod a+x cuda_10.1.243_418.87.00_linux.run
$ sudo ./cuda_10.1.243_418.87.00_linux.run --silent --toolkit --samples --librarypath=/usr/local/cuda

You can also run ./cuda_10.1.243_418.87.00_linux.run --help to see the other options.

Check the CUDA version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
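The version check above can also be scripted, which is handy when several toolkits are installed side by side. A minimal sketch: it reads the "release X.Y" field from nvcc --version; the helper names are illustrative, not part of any NVIDIA tool.

```python
import re
import shutil
import subprocess

# Sample output copied from the nvcc --version run above.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243"""


def cuda_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' field, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc --version if nvcc is on PATH and parse its release."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(["nvcc", "--version"], capture_output=True,
                         text=True, check=True).stdout
    return cuda_release(out)


if __name__ == "__main__":
    print("nvcc reports CUDA", installed_cuda_release())
```

Returning a tuple rather than a string makes comparisons such as `cuda_release(out) == (10, 1)` straightforward.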

cuDNN

Download cuDNN from NVIDIA here: cudnnlib. Then extract it and copy the header and libraries into the CUDA directory:

$ tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
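To confirm which cuDNN ended up under /usr/local/cuda, you can read the version macros from the copied header. This sketch assumes cuDNN 7.x, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h (cuDNN 8 moved them to cudnn_version.h); the header path matches the copy commands above.

```python
import re
from pathlib import Path

# Assumption: cuDNN 7.x keeps its version macros in cudnn.h.
HEADER = Path("/usr/local/cuda/include/cudnn.h")


def cudnn_version(header_text):
    """Combine the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines into 'X.Y.Z'."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None  # macro missing: wrong header or a different layout
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    if HEADER.exists():
        print("cuDNN", cudnn_version(HEADER.read_text()))
    else:
        print(f"{HEADER} not found")
```

For the archive used here (cudnn-10.1-linux-x64-v7.6.5.32), the expected result would be 7.6.5.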

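.bashrc

Add the CUDA paths to ~/.bashrc, appending to any existing LD_LIBRARY_PATH instead of overwriting it, then run source ~/.bashrc or open a new terminal:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}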
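A quick way to confirm the .bashrc change took effect in the current shell is to check that the CUDA library directory is one of the LD_LIBRARY_PATH entries. A minimal sketch; the function name is hypothetical and the path is the one used above.

```python
import os

CUDA_LIB = "/usr/local/cuda/lib64"  # library directory used in .bashrc


def on_library_path(path, ld_library_path=None):
    """Return True if `path` is one of the colon-separated LD_LIBRARY_PATH entries."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # normpath makes "/usr/local/cuda/lib64/" and "/usr/local/cuda/lib64" compare equal
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return os.path.normpath(path) in entries


if __name__ == "__main__":
    print(CUDA_LIB, "on LD_LIBRARY_PATH:", on_library_path(CUDA_LIB))
```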

Installing TensorFlow

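Create a conda environment and install TensorFlow from the Tsinghua PyPI mirror (the environment name tf and Python 3.7 are only examples):

conda create -n tf python=3.7
conda activate tf
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow

CUDA 10.1 with cuDNN 7.6 matches TensorFlow 2.1 to 2.3. Newer TensorFlow releases are built against newer CUDA versions, so if you follow this setup today, pin the version (for example tensorflow==2.3.0).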
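Before training a model, a smaller sanity check is to ask TensorFlow whether it was built with CUDA and which GPUs it can see. This is a sketch assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available); it returns None instead of failing when TensorFlow is not installed.

```python
def tensorflow_gpu_report():
    """Return TensorFlow's version, CUDA build flag and visible GPUs, or None."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    gpus = tf.config.list_physical_devices("GPU")  # TF >= 2.1
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],  # e.g. "/physical_device:GPU:0"
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

An empty "gpus" list on a CUDA build usually means the CUDA or cuDNN libraries could not be loaded; TensorFlow logs which library failed when it starts.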

Write a simple model to test the installation:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

num_classes = 10
img_rows, img_cols = 28, 28

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Add a channel axis (channels-last layout): (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# One-hot encode the integer labels 0-9
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Two conv layers, max pooling and dropout, then a dense softmax classifier
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=1024,
          epochs=20,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

While the model trains, monitor the GPU (the python3 process below is the training script):

watch nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0  On |                  N/A |
| 37%   59C    P2   212W / 255W |   3770MiB /  7974MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1232      G   /usr/lib/xorg/Xorg                            60MiB |
|    0      1943      G   /usr/lib/xorg/Xorg                           283MiB |
|    0      2145      G   /usr/bin/gnome-shell                         135MiB |
|    0      2551      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   236MiB |
|    0      8349      G   ...quest-channel-token=1436292411171661387   296MiB |
|    0     27901      G   /usr/bin/totem                                18MiB |
|    0     56093      C   python3                                     2715MiB |
+-----------------------------------------------------------------------------+
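For logging rather than watching, nvidia-smi can print selected fields as CSV via --query-gpu and --format=csv,noheader,nounits. The sketch below parses that output into one dict per GPU; the helper names are hypothetical, and it assumes every field is numeric except the name (some GPUs report "[N/A]" for utilization, which would need extra handling).

```python
import shutil
import subprocess

FIELDS = ("name", "memory.used", "memory.total", "utilization.gpu")


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        name, used, total, util = [v.strip() for v in line.split(",")]
        rows.append({"name": name,
                     "memory_used_mib": int(used),
                     "memory_total_mib": int(total),
                     "util_percent": int(util)})
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for row in query_gpus():
        print(row)
```

Note that "CUDA Version: 10.2" in the nvidia-smi header is the newest CUDA version the installed driver supports, not the installed toolkit; nvcc --version reports the toolkit, 10.1 here.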

You can also download the code from https://github.com/tensorflow/benchmarks and run its benchmarks as a further test.

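That covers installing TensorFlow on Ubuntu 20.04. Ubuntu 20.04 was released only recently, so it is hard to predict what other problems will come up, and many tools do not support it yet. I suggest treating it as something to try out rather than moving your development environment to it.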

When reposting, please include the URL of this article: https://allenwind.github.io/blog/12238/
More articles: https://allenwind.github.io/blog/archives/

