An Explanation of torch Functions

Posted on 2019-08-21

Post modified: 2019-11-20

| In NLP


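The torch API

This article introduces some commonly used APIs and tries to explain their parameters clearly and in plain terms. [Most of it is adapted from the official documentation](https://pytorch-cn.readthedocs.io/zh/latest/)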

Tensor

Common creation operations

torch.ones/zeros/rand/randn

torch.ones(*sizes, out=None) → Tensor
torch.zeros(*sizes, out=None) → Tensor
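torch.rand(*sizes, out=None) → Tensor  # returns a tensor of random numbers drawn from the uniform distribution on [0, 1)
torch.randn(*sizes, out=None) → Tensor  # returns a tensor of random numbers drawn from the standard normal distribution (mean 0, variance 1, i.e. Gaussian white noise)


All four are used the same way: the shape of the result is exactly the sizes you pass in. Note that a single argument n gives a 1-D tensor of length n, not an n×n matrix; for a square matrix pass both sizes, for example torch.ones(3, 3). The description of ones below applies to the other three as well.

Returns a tensor filled with 1s, whose shape is given by the variadic argument sizes.

Parameters:

- sizes (int...) – a sequence of integers defining the output shape
- out (Tensor, optional) – the result tensor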


torch.ones(2, 3)
 1  1  1
 1  1  1

torch.eye

torch.eye(n, m=None, out=None)

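Returns a 2-D tensor with 1s on the diagonal and 0s everywhere else.

Parameters:
n (int) – number of rows
m (int, optional) – number of columns; defaults to n if None
out (Tensor, optional) – the result tensor
Return value: a 2-D tensor with 1s on the diagonal and 0s elsewhere
Return type: Tensor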

torch.from_numpy

torch.from_numpy(ndarray)

Converts a numpy.ndarray into a PyTorch Tensor. The returned tensor shares memory with the ndarray, so modifying one also modifies the other. The returned tensor cannot be resized.

torch.where

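import torch as t  # the snippet refers to torch through the alias t

x = t.linspace(1, 6, steps=6).view(2, 3)
print(x)
# where x > 5, take the value from the tensor of 5s; elsewhere keep x
ans = t.where(x > 5, t.full_like(x, 5), x)
print(ans)

Input:
Three tensors of the same shape: a condition and two value tensors

Output:
tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[1., 2., 3.],
        [4., 5., 5.]])

Each element of the first tensor (the condition, here x > 5) is checked. Where the condition holds, the result takes the value from the second tensor; otherwise it takes the value from the third.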

Indexing, slicing, joining, and transposing operations

torch.cat

torch.cat(inputs, dimension=0) → Tensor

Concatenates the given sequence of tensors along the given dimension.

Note that inputs is passed as a sequence, typically a tuple such as (x, x, x).


>>> x = torch.randn(2, 3)
>>> x

0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735
[torch.FloatTensor of size 2x3]

>>> torch.cat((x, x, x), 0)

0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735
 0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735
 0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735
[torch.FloatTensor of size 6x3]

>>> torch.cat((x, x, x), 1)

0.5983 -0.0341  2.4918  0.5983 -0.0341  2.4918  0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735  1.5981 -0.5265 -0.8735  1.5981 -0.5265 -0.8735
[torch.FloatTensor of size 2x9]

torch.gather

torch.gather(input, dim, index, out=None) → Tensor
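index has the same number of dimensions as the source tensor, and the output has the same shape as index. Each entry of index says which element to pick along dimension dim.

Gathers the values at the positions given by the index tensor along the axis dim.
For a 3-D tensor the output is defined as:

out[i][j][k] = tensor[index[i][j][k]][j][k]  # dim=0
out[i][j][k] = tensor[i][index[i][j][k]][k]  # dim=1
out[i][j][k] = tensor[i][j][index[i][j][k]]  # dim=2

Parameters:

input (Tensor) – the source tensor
dim (int) – the axis along which to index
index (LongTensor) – the indices of the elements to gather
out (Tensor, optional) – the destination tensor

Example
>>> t = torch.Tensor([[1,2],[3,4]])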
>>> torch.gather(t, 1, torch.LongTensor([[0,0],[1,0]]))
 1  1
 4  3
[torch.FloatTensor of size 2x2]
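
The indexing rule above can be spelled out in plain Python for the 2-D case. This is only a sketch of the semantics on nested lists, not how PyTorch implements gather; the helper name gather_2d is made up for illustration.

```python
def gather_2d(source, dim, index):
    """Plain-Python version of the gather rule for 2-D nested lists."""
    out = []
    for i, index_row in enumerate(index):
        out_row = []
        for j, idx in enumerate(index_row):
            if dim == 0:
                out_row.append(source[idx][j])  # out[i][j] = source[index[i][j]][j]
            elif dim == 1:
                out_row.append(source[i][idx])  # out[i][j] = source[i][index[i][j]]
            else:
                raise ValueError("dim must be 0 or 1 for a 2-D input")
        out.append(out_row)
    return out


t = [[1, 2], [3, 4]]
index = [[0, 0], [1, 0]]
print(gather_2d(t, 1, index))  # [[1, 1], [4, 3]], same as the torch example above
print(gather_2d(t, 0, index))  # [[1, 2], [3, 2]]
```

With dim=1 every row picks from its own row of the source; with dim=0 every column picks from its own column.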

torch.index_select

torch.index_select(input, dim, index, out=None) → Tensor

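Selects entries of the input along the given dimension, using the indices in index (a LongTensor), and returns them in a new tensor. The returned tensor has the same number of dimensions as the original; only the size along dim changes, to the length of index. I find this one quite handy.
Note: the returned tensor does not share memory with the original tensor.

Parameters:

input (Tensor) – the input tensor
dim (int) – the axis along which to index
index (LongTensor) – a 1-D tensor containing the indices
out (Tensor, optional) – the destination tensor

Example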
>>> x = torch.randn(3, 4)
>>> x

1.2045  2.4084  0.4001  1.1372
 0.5596  1.5677  0.6219 -0.7954
 1.3635 -1.2313 -0.5414 -1.8478
[torch.FloatTensor of size 3x4]

>>> indices = torch.LongTensor([0, 2])
>>> torch.index_select(x, 0, indices)

1.2045  2.4084  0.4001  1.1372
 1.3635 -1.2313 -0.5414 -1.8478
[torch.FloatTensor of size 2x4]

>>> torch.index_select(x, 1, indices)

1.2045  0.4001
 0.5596  0.6219
 1.3635 -0.5414
[torch.FloatTensor of size 3x2]

torch.squeeze

torch.squeeze(input, dim=None, out=None)

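Removes the dimensions of size 1 from the shape of the input and returns the result. For example, an input of shape (A×1×B×1×C×1×D) gives an output of shape (A×B×C×D).

Note: when dim is given, the squeeze is applied only to that dimension. For example, for an input of shape (A×1×B), squeeze(input, 0) leaves the tensor unchanged; only squeeze(input, 1) changes the shape to (A×B).
The returned tensor shares memory with the input tensor, so changing the contents of one changes the other.

Parameters:

input (Tensor) – the input tensor
dim (int, optional) – if given, the input is squeezed only in this dimension
out (Tensor, optional) – the output tensor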


>>> x = torch.zeros(2,1,2,1,2)
>>> x.size()
(2L, 1L, 2L, 1L, 2L)
>>> y = torch.squeeze(x)
>>> y.size()
(2L, 2L, 2L)
>>> y = torch.squeeze(x, 0)
>>> y.size()
(2L, 1L, 2L, 1L, 2L)
>>> y = torch.squeeze(x, 1)
>>> y.size()
(2L, 2L, 1L, 2L)

torch.unsqueeze

torch.unsqueeze(input, dim, out=None)

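Returns a new tensor with a dimension of size 1 inserted at the specified position.
Note: the returned tensor shares memory with the input tensor, so changing the contents of one changes the other. Unlike squeeze, unsqueeze always adds a dimension, even if a neighbouring dimension already has size 1.

Parameters:

input (Tensor) – the input tensor
dim (int) – the index at which to insert the new dimension
out (Tensor, optional) – the output tensor

Example: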
>>> x
tensor([1, 2, 3, 4])
>>> x.unsqueeze(0)
tensor([[1, 2, 3, 4]])
>>> x
tensor([1, 2, 3, 4])
>>> x.unsqueeze(1)
tensor([[1],
        [2],
        [3],
        [4]])
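
The effect of squeeze and unsqueeze on the shape alone can be sketched in plain Python. The two helper names below are hypothetical, the sketch assumes non-negative dim values, and it only tracks shapes, not data.

```python
def squeeze_shape(shape, dim=None):
    """Shape after squeeze: drop size-1 dimensions (all of them, or only dim)."""
    if dim is None:
        return tuple(s for s in shape if s != 1)
    if shape[dim] != 1:
        return tuple(shape)  # nothing to squeeze at this position
    return tuple(s for i, s in enumerate(shape) if i != dim)


def unsqueeze_shape(shape, dim):
    """Shape after unsqueeze: insert a dimension of size 1 at position dim."""
    new_shape = list(shape)
    new_shape.insert(dim, 1)
    return tuple(new_shape)


print(squeeze_shape((2, 1, 2, 1, 2)))     # (2, 2, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 0))  # (2, 1, 2, 1, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 1))  # (2, 2, 1, 2)
print(unsqueeze_shape((4,), 0))           # (1, 4)
print(unsqueeze_shape((4,), 1))           # (4, 1)
```

The printed shapes match the sizes shown in the squeeze and unsqueeze examples above.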

torch.stack

torch.stack(inputs, dim=0)
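Unlike cat, stack does not join tensors along an existing dimension; it places them next to each other along a new dimension.

Parameters:
inputs (sequence of tensors) – a tuple of tensors, all of which must have the same shape.
dim (int) – the index at which the new dimension is inserted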


>>> x=torch.rand(2,3)
>>> y=torch.rand(2,3)
>>> x
tensor([[0.9574, 0.1898, 0.1229],
        [0.0717, 0.9662, 0.9138]])
>>> y
tensor([[0.8737, 0.4180, 0.1566],
        [0.1349, 0.8757, 0.3162]])
>>> torch.stack((x,y),dim=0)
tensor([[[0.9574, 0.1898, 0.1229],
         [0.0717, 0.9662, 0.9138]],

        [[0.8737, 0.4180, 0.1566],
         [0.1349, 0.8757, 0.3162]]])
>>> torch.stack((x,y),dim=1)
tensor([[[0.9574, 0.1898, 0.1229],
         [0.8737, 0.4180, 0.1566]],

        [[0.0717, 0.9662, 0.9138],
         [0.1349, 0.8757, 0.3162]]])
>>> torch.stack((x,y),dim=2)
tensor([[[0.9574, 0.8737],
         [0.1898, 0.4180],
         [0.1229, 0.1566]],

        [[0.0717, 0.1349],
         [0.9662, 0.8757],
         [0.9138, 0.3162]]])
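
The difference between cat and stack is easiest to see in the result shapes. The following is a small shape-only sketch in plain Python; cat_shape and stack_shape are illustrative names, not PyTorch functions.

```python
def cat_shape(shapes, dim=0):
    """Shape of torch.cat: sizes add up along dim, all other sizes must match."""
    first = list(shapes[0])
    for shape in shapes[1:]:
        same_rank = len(shape) == len(first)
        if not same_rank or any(a != b for i, (a, b) in enumerate(zip(shape, first)) if i != dim):
            raise ValueError("shapes must match except along dim")
    first[dim] = sum(shape[dim] for shape in shapes)
    return tuple(first)


def stack_shape(shapes, dim=0):
    """Shape of torch.stack: a new dimension of size len(shapes) appears at dim."""
    if any(tuple(shape) != tuple(shapes[0]) for shape in shapes):
        raise ValueError("all shapes must be identical")
    out = list(shapes[0])
    out.insert(dim, len(shapes))
    return tuple(out)


print(cat_shape([(2, 3)] * 3, dim=0))    # (6, 3)
print(cat_shape([(2, 3)] * 3, dim=1))    # (2, 9)
print(stack_shape([(2, 3)] * 2, dim=0))  # (2, 2, 3)
print(stack_shape([(2, 3)] * 2, dim=2))  # (2, 3, 2)
```

cat keeps the number of dimensions, while stack always adds exactly one.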

torch.transpose

torch.transpose(input, dim0, dim1, out=None) → Tensor

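Returns the transpose of input by swapping dimensions dim0 and dim1. The output tensor shares memory with the input tensor, so changing one also changes the other.

Note: this only applies to tensors with at least two dimensions. Also, the transposed tensor is generally no longer contiguous in memory, which affects the view function.
Parameters:

input (Tensor) – the input tensor
dim0 (int) – the first dimension to swap
dim1 (int) – the second dimension to swap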

>>> x = torch.randn(2, 3)
>>> x

0.5983 -0.0341  2.4918
 1.5981 -0.5265 -0.8735
[torch.FloatTensor of size 2x3]

>>> torch.transpose(x, 0, 1)

0.5983  1.5981
-0.0341 -0.5265
 2.4918 -0.8735
[torch.FloatTensor of size 3x2]

tensor.view

tensor.view(*size) → Tensor

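Changes the shape of the tensor while keeping its elements in the same order. This is very useful right before a fully connected layer.

Parameters:

*size – the dimensions of the new shape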

>>> x
tensor([[0.1295, 0.1408, 0.8924, 0.4253],
        [0.5481, 0.2811, 0.1334, 0.5376],
        [0.9201, 0.4686, 0.2011, 0.7833]])
>>> x.view(2,6)
tensor([[0.1295, 0.1408, 0.8924, 0.4253, 0.5481, 0.2811],
        [0.1334, 0.5376, 0.9201, 0.4686, 0.2011, 0.7833]])
>>> x.view(3,2,2)
tensor([[[0.1295, 0.1408],
         [0.8924, 0.4253]],

        [[0.5481, 0.2811],
         [0.1334, 0.5376]],

        [[0.9201, 0.4686],
         [0.2011, 0.7833]]])

Tensor.contiguous

tensor.contiguous() -> Tensor

Returns a tensor with the same data laid out contiguously in memory.

>>> a=torch.rand(3,2)
>>> a=a.t()
>>> print(a)
tensor([[0.6175, 0.4523, 0.3499],
        [0.0089, 0.5199, 0.4651]])
>>> a.is_contiguous()
False
>>> a.view(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at ../aten/src/TH/generic/THTensor.cpp:203
>>> a=a.contiguous()
>>> a.view(-1)
tensor([0.6175, 0.4523, 0.3499, 0.0089, 0.5199, 0.4651])

The example above shows that transposing does affect memory contiguity. When that happens, calling contiguous restores it.

Tensor.repeat

tensor.repeat(*a) -> tensor

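a[0] is required and the remaining arguments are optional. Each argument says how many times the tensor is repeated, as a whole block, along the corresponding dimension. The result has as many dimensions as there are arguments, so for a 1-D tensor, passing k arguments adds k-1 leading dimensions.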

>>> a=torch.rand(3)
>>>
>>> b=a.repeat(3)
>>> b
tensor([0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108])
>>> b=a.repeat(3,3)
>>> b
tensor([[0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108],
        [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108],
        [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108]])
>>> b=a.repeat(3,3,3)
>>> b
tensor([[[0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108]],

        [[0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108]],

        [[0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108],
         [0.9655, 0.5626, 0.4108, 0.9655, 0.5626, 0.4108, 0.9655, 0.5626,
          0.4108]]])
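
The shape produced by repeat follows a simple rule, sketched here in plain Python. repeat_shape is a hypothetical helper used only to illustrate the rule; it does not touch any data.

```python
def repeat_shape(shape, repeats):
    """Shape after tensor.repeat(*repeats) for a tensor of the given shape."""
    if len(repeats) < len(shape):
        raise ValueError("need at least one repeat count per dimension")
    # Missing leading dimensions are treated as size 1, then each size is multiplied.
    padded = (1,) * (len(repeats) - len(shape)) + tuple(shape)
    return tuple(size * count for size, count in zip(padded, repeats))


print(repeat_shape((3,), (3,)))       # (9,)
print(repeat_shape((3,), (3, 3)))     # (3, 9)
print(repeat_shape((3,), (3, 3, 3)))  # (3, 3, 9)
```

These are the shapes of the three results printed in the example above.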

torch.nn

Containers

class torch.nn.Module

The base class for all networks.

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)  # submodule: Conv2d
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

cpu()

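Copies all model parameters and buffers to the CPU.

cuda(device_id)

device_id (int, optional) – if given, all model parameters are copied to that device.

eval()

Puts the model in evaluation mode. This only has an effect when the model contains layers such as Dropout or BatchNorm.

train(mode=True)

Puts the model in training mode. This only has an effect when the model contains layers such as Dropout or BatchNorm.

zero_grad()

Sets the gradients of all model parameters in the module to 0.

Convolutional (CNN) layers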

Conv1d

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

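A 1-D convolution layer. The input has size (N, C_in, L) and the output has size (N, C_out, L_out), computed as:

$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star input(N_i, k)$

where $\star$ denotes cross-correlation.

Parameters:

in_channels (int) – number of channels in the input signal
out_channels (int) – number of channels produced by the convolution
kernel_size (int or tuple) – size of the convolution kernel
stride (int or tuple, optional) – stride of the convolution
padding (int or tuple, optional) – number of layers of zeros added to each side of the input
dilation (int or tuple, optional) – spacing between kernel elements
groups (int, optional) – number of blocked connections from input channels to output channels
bias (bool, optional) – if bias=True, a learnable bias is added

Shape:
Input: (N, C_in, L_in)
Output: (N, C_out, L_out)
The output length follows from the input length:

$L_{out} = \lfloor (L_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1) / stride + 1 \rfloor$

Variables:
weight (Tensor) – the weights of the convolution, of size (out_channels, in_channels, kernel_size)
bias (Tensor) – the bias of the convolution, of size (out_channels)

Example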

>>> m = nn.Conv1d(16, 33, 3, stride=2)
>>> input = autograd.Variable(torch.randn(20, 16, 50))
>>> output = m(input)
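
To check an output size by hand, the L_out formula above can be written as a small function. This is a plain-Python sketch of the formula only; conv_output_length is an illustrative name, not part of PyTorch.

```python
def conv_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)."""
    # Integer floor division implements the floor in the formula.
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# The example above: nn.Conv1d(16, 33, 3, stride=2) applied to an input of length 50.
print(conv_output_length(50, kernel_size=3, stride=2))  # 24
# A kernel of 3 with padding 1 and stride 1 keeps the length unchanged.
print(conv_output_length(50, kernel_size=3, padding=1))  # 50
```

So the output of the example above would have shape (20, 33, 24).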

Conv2d

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

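A 2-D convolution layer. The input has size (N, C_in, H, W) and the output has size (N, C_out, H_out, W_out), computed as:

$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star input(N_i, k)$

The parameters kernel_size, stride, padding, and dilation can each be a single int, in which case the same value is used for height and width, or a tuple of two ints, where the first value applies to the height and the second to the width.

Parameters:

in_channels (int) – number of channels in the input signal
out_channels (int) – number of channels produced by the convolution
kernel_size (int or tuple) – size of the convolution kernel
stride (int or tuple, optional) – stride of the convolution
padding (int or tuple, optional) – number of layers of zeros added to each side of the input
dilation (int or tuple, optional) – spacing between kernel elements
groups (int, optional) – number of blocked connections from input channels to output channels
bias (bool, optional) – if bias=True, a learnable bias is added

Shape:
Input: (N, C_in, H_in, W_in)
Output: (N, C_out, H_out, W_out)

$H_{out} = \lfloor (H_{in} + 2 \times padding[0] - dilation[0] \times (kernel\_size[0] - 1) - 1) / stride[0] + 1 \rfloor$
$W_{out} = \lfloor (W_{in} + 2 \times padding[1] - dilation[1] \times (kernel\_size[1] - 1) - 1) / stride[1] + 1 \rfloor$

Variables:
weight (Tensor) – the weights of the convolution, of size (out_channels, in_channels, kernel_size[0], kernel_size[1])
bias (Tensor) – the bias of the convolution, of size (out_channels)

Example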

>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = autograd.Variable(torch.randn(20, 16, 50, 100))
>>> output = m(input)

MaxPool1d

class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

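Applies 1-D max pooling over the channels of the input signal.

If the input has size (N, C, L), the output has size (N, C, L_out) and is computed as:
$out(N_i, C_j, k) = \max_{m=0}^{kernel\_size-1} input(N_i, C_j, stride \times k + m)$

If padding is non-zero, the input is implicitly padded on each side by that many elements. (The documentation this is adapted from describes zero padding; recent PyTorch releases pad max pooling with negative infinity instead.)

Parameters:

kernel_size (int or tuple) – size of the max pooling window
stride (int or tuple, optional) – stride of the max pooling window. Default: kernel_size
padding (int or tuple, optional) – number of layers of padding added to each side of the input
dilation (int or tuple, optional) – a parameter that controls the stride of the elements inside the window
return_indices – if True, the indices of the maximum values are returned along with the outputs; useful for unpooling later
ceil_mode – if True, the output size is computed with ceil instead of the default floor

Shape:
Input: (N, C, L_in)
Output: (N, C, L_out)

$L_{out} = \lfloor (L_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1) / stride + 1 \rfloor$

Example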

>>> # pool of size=3, stride=2
>>> m = nn.MaxPool1d(3, stride=2)
>>> input = autograd.Variable(torch.randn(20, 16, 50))
>>> output = m(input)
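
The pooling formula can be illustrated on a single channel with plain Python. This is a minimal sketch that assumes padding=0 and dilation=1; max_pool1d here is a local helper written for illustration, not the PyTorch function.

```python
def max_pool1d(values, kernel_size, stride=None):
    """Max pooling over one channel: out[k] = max(values[stride*k : stride*k + kernel_size])."""
    if stride is None:
        stride = kernel_size  # same default as in the parameter list above
    l_out = (len(values) - kernel_size) // stride + 1
    return [max(values[stride * k: stride * k + kernel_size]) for k in range(l_out)]


signal = [1, 3, 2, 5, 4, 0, 6]
print(max_pool1d(signal, kernel_size=3, stride=2))  # [3, 5, 6]
print(max_pool1d(signal, kernel_size=3))            # [3, 5]
```

With stride 2 the windows overlap by one element; with the default stride they do not overlap and the trailing element that does not fill a window is dropped.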

MaxPool2d

class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

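Applies 2-D max pooling over the channels of the input signal.

If the input has size (N, C, H, W), the output has size (N, C, H_out, W_out), and with a pooling window of size (kH, kW) it is computed as:

$out(N_i, C_j, h, w) = \max_{m=0}^{kH-1} \max_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)$

If padding is non-zero, the input is implicitly padded on each side by that many elements.

Parameters:

kernel_size (int or tuple) – size of the max pooling window
stride (int or tuple, optional) – stride of the max pooling window. Default: kernel_size
padding (int or tuple, optional) – number of layers of padding added to each side of the input
dilation (int or tuple, optional) – a parameter that controls the stride of the elements inside the window
return_indices – if True, the indices of the maximum values are returned along with the outputs; useful for unpooling later
ceil_mode – if True, the output size is computed with ceil instead of the default floor

Shape:
Input: (N, C, H_in, W_in)
Output: (N, C, H_out, W_out)

$H_{out} = \lfloor (H_{in} + 2 \times padding[0] - dilation[0] \times (kernel\_size[0] - 1) - 1) / stride[0] + 1 \rfloor$
$W_{out} = \lfloor (W_{in} + 2 \times padding[1] - dilation[1] \times (kernel\_size[1] - 1) - 1) / stride[1] + 1 \rfloor$

Example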

>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
>>> input = autograd.Variable(torch.randn(20, 16, 50, 32))
>>> output = m(input)

BatchNorm1d

class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)

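Applies Batch Normalization over a mini-batch of 2-D or 3-D inputs.

$y = \frac{x - mean[x]}{\sqrt{Var[x] + \epsilon}} \times gamma + beta$

The mean and standard deviation are computed per feature dimension over each mini-batch. gamma and beta are learnable parameter vectors of size C (where C is the number of input features).

During training, this layer computes the mean and variance of each input batch and keeps a running (moving) average of them. The default momentum of the running average is 0.1.

During evaluation, the mean and variance accumulated during training are used to normalize the data.

Parameters:

num_features: number of features of the expected input, whose size is 'batch_size x num_features [x width]'
eps: a value added to the denominator for numerical stability (so that it never reaches or approaches 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: a boolean; when set to True, the layer has learnable affine parameters.

Shape:
- Input: (N, C) or (N, C, L)
- Output: (N, C) or (N, C, L) (same shape as the input)

Example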

>>> # With Learnable Parameters
>>> m = nn.BatchNorm1d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm1d(100, affine=False)
>>> input = autograd.Variable(torch.randn(20, 100))
>>> output = m(input)
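
For a single feature column, the normalization formula and the running-average update can be sketched in plain Python. This is an illustration of the arithmetic under the assumption that training mode normalizes with the biased batch variance; the function names are made up for this sketch.

```python
import math


def batch_norm_feature(column, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over the batch: (x - mean) / sqrt(var + eps) * gamma + beta."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n  # biased variance over the batch
    return [(x - mean) / math.sqrt(var + eps) * gamma + beta for x in column]


def update_running(running, batch_stat, momentum=0.1):
    """Running average kept during training and used at evaluation time."""
    return (1 - momentum) * running + momentum * batch_stat


column = [1.0, 2.0, 3.0, 4.0]  # one feature across a batch of 4 samples
normalized = batch_norm_feature(column)
print(normalized)                # values with mean ~0 and variance ~1
print(update_running(0.0, 2.5))  # running mean after one batch whose mean is 2.5
```

With the default momentum of 0.1, each new batch contributes 10% to the running statistics.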

BatchNorm2d

class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)

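Applies Batch Normalization over a 4-D input, that is, a mini-batch of 3-D data.

$y = \frac{x - mean[x]}{\sqrt{Var[x] + \epsilon}} \times gamma + beta$

The mean and standard deviation are computed per feature dimension over each mini-batch. gamma and beta are learnable parameter vectors of size C (where C is the number of input features).

During training, this layer computes the mean and variance of each input batch and keeps a running (moving) average of them. The default momentum of the running average is 0.1.

During evaluation, the mean and variance accumulated during training are used to normalize the data.

Parameters:

num_features: number of features of the expected input, whose size is 'batch_size x num_features x height x width'
eps: a value added to the denominator for numerical stability (so that it never reaches or approaches 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: a boolean; when set to True, the layer has learnable affine parameters.

Shape:
- Input: (N, C, H, W)
- Output: (N, C, H, W) (same shape as the input)

Example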

>>> # With Learnable Parameters
>>> m = nn.BatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = autograd.Variable(torch.randn(20, 100, 35, 45))
>>> output = m(input)

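class torch.nn.RNN(*args, **kwargs)

Parameters:

input_size – number of features in the input x.
hidden_size – number of features in the hidden state.
num_layers – number of recurrent layers.
nonlinearity – the non-linearity to use, tanh or relu. Default: tanh.
bias – if False, the layer does not use the bias weights b_ih and b_hh. Default: True.
batch_first – if True, the input tensor should have shape [batch_size, time_step, feature], and so does the output.
dropout – if non-zero, a dropout layer is applied to the output of every layer except the last.
bidirectional – if True, becomes a bidirectional RNN. Default: False.

RNN input: (input, h_0)

input (seq_len, batch, input_size): tensor holding the features of the input sequence. The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence() for details. If batch_first is True, the input has shape (batch, seq_len, input_size).
h_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial hidden state

RNN output: (output, h_n)

output (seq_len, batch, hidden_size * num_directions): tensor holding the output features of the last RNN layer. If the input is a padded sequence, the output is a padded sequence as well.
h_n (num_layers * num_directions, batch, hidden_size): tensor holding the hidden state at the last time step.

RNN model parameters:

weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for the first layer.
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)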

rnn = nn.RNN(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)
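
The shape rules listed above are easy to mix up, so here is a shape-only sketch in plain Python. rnn_output_shapes is a hypothetical helper; it encodes the rules as stated in this section, including that batch_first changes the layout of output but not of h_n.

```python
def rnn_output_shapes(seq_len, batch, hidden_size, num_layers=1,
                      bidirectional=False, batch_first=False):
    """Return (output_shape, h_n_shape) for an RNN/GRU with the given settings."""
    num_directions = 2 if bidirectional else 1
    if batch_first:
        output = (batch, seq_len, hidden_size * num_directions)
    else:
        output = (seq_len, batch, hidden_size * num_directions)
    h_n = (num_layers * num_directions, batch, hidden_size)
    return output, h_n


# The example above: nn.RNN(10, 20, 2) on an input of shape (5, 3, 10).
print(rnn_output_shapes(seq_len=5, batch=3, hidden_size=20, num_layers=2))
# ((5, 3, 20), (2, 3, 20))
print(rnn_output_shapes(seq_len=5, batch=3, hidden_size=20, num_layers=2, bidirectional=True))
# ((5, 3, 40), (4, 3, 20))
```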

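class torch.nn.LSTM(*args, **kwargs)

Formulas:

$i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$
$f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$
$o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$
$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$
$c_t = f_t * c_{t-1} + i_t * g_t$
$h_t = o_t * \tanh(c_t)$

h_t is the hidden state at time t, c_t is the cell state at time t, and x_t is the hidden state of the previous layer at time t, or the input at time t for the first layer. i_t, f_t, g_t, and o_t are the input gate, forget gate, cell gate, and output gate respectively.

Parameters:

input_size – number of features in the input
hidden_size – number of features in the hidden state
num_layers – number of layers (not to be confused with unrolling over time steps)
bias – if False, the LSTM does not use b_ih and b_hh. Default: True.
batch_first – if True, the input and output tensors have shape (batch, seq, feature)
dropout – if non-zero, a dropout layer is applied to the output of every LSTM layer except the last.
bidirectional – if True, becomes a bidirectional LSTM. Default: False.

LSTM input: input, (h_0, c_0)

input (seq_len, batch, input_size): tensor containing the features of the input sequence. It can also be a packed variable-length sequence; see pack_padded_sequence
h_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial hidden state for each element in the batch
c_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial cell state for each element in the batch

LSTM output: output, (h_n, c_n)

output (seq_len, batch, hidden_size * num_directions): tensor holding the output of the last layer. If the input is a torch.nn.utils.rnn.PackedSequence, the output is a torch.nn.utils.rnn.PackedSequence as well.
h_n (num_layers * num_directions, batch, hidden_size): tensor holding the hidden state at the last time step.
c_n (num_layers * num_directions, batch, hidden_size): tensor holding the cell state at the last time step.

LSTM model parameters:

weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for the first layer
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size).
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size).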

lstm = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, (hn, cn) = lstm(input, (h0, c0))  # the second return value is the tuple (h_n, c_n)

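class torch.nn.GRU(*args, **kwargs)

Formulas:

$r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr})$
$i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi})$
$n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn}))$
$h_t = (1 - i_t) * n_t + i_t * h_{(t-1)}$

h_t is the hidden state at time t, and x_t is the hidden state of the previous layer at time t, or the input at time t for the first layer. r_t, i_t, and n_t are the reset gate, the input (update) gate, and the new (candidate) gate respectively.

Parameters:

input_size – number of features expected in the input x
hidden_size – number of features in the hidden state
num_layers – number of GRU layers.
bias – if False, the GRU layer does not use bias weights. Default: True
batch_first – if True, the input and output tensors have shape (batch, seq, feature).
dropout – if non-zero, a dropout layer is applied to the output of every GRU layer except the last.
bidirectional – if True, becomes a bidirectional GRU. Default: False.

Input: input, h_0

input (seq_len, batch, input_size): tensor containing the features of the input sequence. It can also be a packed variable-length sequence; see pack_padded_sequence.
h_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial hidden state for each element in the batch

Output: output, h_n

output (seq_len, batch, hidden_size * num_directions): tensor holding the output of the last layer. If the input is a torch.nn.utils.rnn.PackedSequence, the output is a torch.nn.utils.rnn.PackedSequence as well.
h_n (num_layers * num_directions, batch, hidden_size): tensor holding the hidden state at the last time step.

GRU model parameters:

weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ir|W_ii|W_in), of shape (3*hidden_size, input_size) for the first layer
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hr|W_hi|W_hn), of shape (3*hidden_size, hidden_size).
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ir|b_ii|b_in), of shape (3*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hr|b_hi|b_hn), of shape (3*hidden_size).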

rnn = nn.GRU(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)

Other commonly used layers

Linear

class torch.nn.Linear(in_features, out_features, bias=True)

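Applies a linear transformation to the input: y = Ax + b

Parameters:

in_features – size of each input sample
out_features – size of each output sample
bias – if set to False, the layer does not learn a bias. Default: True

Shape:

Input: (N, in_features)
Output: (N, out_features)

Variables:

weight – the learnable weights of the module, of shape (out_features x in_features)
bias – the learnable bias of the module, of shape (out_features)

Example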

>>> m = nn.Linear(20, 30)
>>> input = autograd.Variable(torch.randn(128, 20))
>>> output = m(input)
>>> print(output.size())
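
Because the weight is stored with shape (out_features x in_features), each output value is the dot product of one weight row with the input sample, plus the bias. A plain-Python sketch on nested lists (the helper linear is illustrative only):

```python
def linear(x, weight, bias=None):
    """y[n][o] = sum_i weight[o][i] * x[n][i] + bias[o], for every sample n."""
    out = []
    for row in x:
        y = [sum(w * v for w, v in zip(weight_row, row)) for weight_row in weight]
        if bias is not None:
            y = [a + b for a, b in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0, 3.0]]        # N = 1, in_features = 3
weight = [[1.0, 0.0, 0.0],   # out_features = 2, in_features = 3
          [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]
print(linear(x, weight, bias))  # [[1.5, 4.0]]
```

For the example above, nn.Linear(20, 30) maps an input of shape (128, 20) to an output of shape (128, 30).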

Dropout

class torch.nn.Dropout(p=0.5, inplace=False)

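Randomly sets some of the elements of the input tensor to 0. The zeroed elements are chosen at random on every forward call.

Parameters:

p – probability of an element being zeroed. Default: 0.5
inplace – if set to True, the operation is done in place. Default: False

Shape:

Input: any. The input can have any shape.
Output: same. The output has the same shape as the input.

Example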

>>> m = nn.Dropout(p=0.2)
>>> input = autograd.Variable(torch.randn(20, 16))
>>> output = m(input)

Embedding

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)

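A simple lookup table that stores embeddings for a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them by index. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Parameters:

num_embeddings (int) – size of the embedding dictionary
embedding_dim (int) – size of each embedding vector
padding_idx (int, optional) – if given, the output is padded with zeros whenever this index is encountered
max_norm (float, optional) – if given, embeddings are renormalized so that their norm is less than this value
norm_type (float, optional) – the p of the p-norm computed for the max_norm option
scale_grad_by_freq (boolean, optional) – if given, gradients are scaled by the frequency of the words in the dictionary

Variables:

weight (Tensor) – the learnable weights of the module, of shape (num_embeddings, embedding_dim)

Shape:

Input: LongTensor (N, W), N = mini-batch, W = number of indices to extract per mini-batch
Output: (N, W, embedding_dim)

Example: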

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = Variable(torch.LongTensor([[1,2,4,5],[4,3,2,9]]))
>>> embedding(input)

Variable containing:
(0 ,.,.) =
 -1.0822  1.2522  0.2434
  0.8393 -0.6062 -0.3348
  0.6597  0.0350  0.0837
  0.5521  0.9447  0.0498

(1 ,.,.) =
  0.6597  0.0350  0.0837
 -0.1527  0.0877  0.4260
  0.8393 -0.6062 -0.3348
 -0.8738 -0.9054  0.4281
[torch.FloatTensor of size 2x4x3]

>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = Variable(torch.LongTensor([[0,2,0,5]]))
>>> embedding(input)

Variable containing:
(0 ,.,.) =
  0.0000  0.0000  0.0000
  0.3452  0.4937 -0.9361
  0.0000  0.0000  0.0000
  0.0706 -2.1962 -0.6276
[torch.FloatTensor of size 1x4x3]
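
Conceptually the embedding layer is just row selection from the weight matrix. The sketch below shows that idea in plain Python with a hand-written table; embedding_lookup and the values in weight are made up for illustration.

```python
def embedding_lookup(weight, indices):
    """Replace every index in a (N, W) batch with the matching row of weight."""
    return [[weight[i] for i in row] for row in indices]


weight = [
    [0.0, 0.0],  # row 0 plays the role of padding_idx=0: all zeros
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]
batch = [[1, 0, 3], [2, 2, 0]]  # N = 2 samples, W = 3 indices each
out = embedding_lookup(weight, batch)
print(out[0])  # [[0.1, 0.2], [0.0, 0.0], [0.5, 0.6]]
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 2, i.e. (N, W, embedding_dim)
```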

# An embedding layer can be initialized this way (fragment of a model class named Joint)
    def __init__(self, opt, social_emb=None, bipartite_emb=None, social_adj=None, bipartite_adj=None):
        super(Joint, self).__init__()
        self.opt = opt
        self.social_embedding = nn.Embedding(opt["number_user"] + opt["number_dev_user"], opt["social_dim"])  # user + dev_user, social_dim
        if social_emb is None:
            # no pretrained vectors: initialize uniformly in [-1, 1]
            self.social_embedding.weight.data[:, :].uniform_(-1.0, 1.0)
        else:
            # copy pretrained vectors (a numpy array) into the embedding weights
            social_emb = torch.from_numpy(social_emb)
            self.social_embedding.weight.data.copy_(social_emb)

Distance

class torch.nn.PairwiseDistance(p=2, eps=1e-06)

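Computes the batchwise pairwise distance between vectors v1 and v2:

$\Vert x \Vert_p := \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$

Parameters:

x (Tensor): the two input batches, which must have the same shape
p (real): the norm degree. Default: 2

Shape:

Input: (N, D), where D = vector dimension
Output: (N, 1) in the version documented here; recent PyTorch releases return (N) unless keepdim=True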

>>> pdist = nn.PairwiseDistance(2)
>>> input1 = autograd.Variable(torch.randn(100, 128))
>>> input2 = autograd.Variable(torch.randn(100, 128))
>>> output = pdist(input1, input2)
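
The distance is the p-norm of the difference of each pair of rows. A plain-Python sketch of that computation follows; it leaves out the small eps term of the real module, and pairwise_distance is a local illustrative helper.

```python
def pairwise_distance(batch1, batch2, p=2.0):
    """p-norm of (v1 - v2) for each pair of rows; the eps term is omitted here."""
    return [
        sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1.0 / p)
        for v1, v2 in zip(batch1, batch2)
    ]


input1 = [[0.0, 0.0], [1.0, 1.0]]
input2 = [[3.0, 4.0], [1.0, 1.0]]
print(pairwise_distance(input1, input2))       # [5.0, 0.0]
print(pairwise_distance(input1, input2, p=1))  # [7.0, 0.0]
```

One distance is produced per row, which is why a batch of N vectors gives N values.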

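The most basic usage:

criterion = LossCriterion()  # the constructor has its own parameters
loss = criterion(x, y)  # calling the criterion also takes arguments

The arguments x and y vary slightly depending on the loss function. The usage of the main loss functions is summarized below.

Note: the loss returned here has already been averaged over batch_size.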

L1Loss

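class torch.nn.L1Loss(size_average=True)  # size_average: average over the number of samples

Creates a criterion that measures the mean absolute difference between the input x (the model's prediction) and the target y.

$loss(x, y) = \frac{1}{n} \sum_i |x_i - y_i|$

x and y can have any shape, but their shapes must match; each contains n elements.
The absolute differences of the n element pairs are summed and the result is divided by n.
If size_average=False is passed to the constructor when creating the L1Loss instance, the sum of absolute differences is not divided by n.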

MSELoss

class torch.nn.MSELoss(size_average=True)

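Creates a criterion that measures the mean squared error between the input x (the model's prediction) and the target y.

$loss(x, y) = \frac{1}{n} \sum_i (x_i - y_i)^2$

x and y can have any shape; each contains n elements.
The squared differences of the n element pairs are summed and the result is divided by n.
If size_average=False is passed to the constructor when creating the MSELoss instance, the sum of squares is not divided by n.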
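
The two losses above differ only in whether the per-element error is an absolute value or a square. A plain-Python sketch on flat lists, with size_average mirroring the constructor flag described above (the function names are illustrative):

```python
def l1_loss(x, y, size_average=True):
    """Sum of |x_i - y_i|, divided by n when size_average is True."""
    total = sum(abs(a - b) for a, b in zip(x, y))
    return total / len(x) if size_average else total


def mse_loss(x, y, size_average=True):
    """Sum of (x_i - y_i)^2, divided by n when size_average is True."""
    total = sum((a - b) ** 2 for a, b in zip(x, y))
    return total / len(x) if size_average else total


x = [1.0, 2.0, 4.0]
y = [1.0, 4.0, 1.0]
print(l1_loss(x, y), l1_loss(x, y, size_average=False))    # 5/3 and 5.0
print(mse_loss(x, y), mse_loss(x, y, size_average=False))  # 13/3 and 13.0
```

The squared error penalizes the largest difference (3) much more strongly than the absolute error does.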

CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, size_average=True)

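Cross entropy is the most widely used loss function, so make sure you remember how to use it!

This criterion combines LogSoftMax and NLLLoss in a single class.

It is very useful when training a multi-class classifier.

weight (Tensor): a 1-D tensor with n elements giving the weight of each of the n classes; very useful when the training set is badly imbalanced. Default: None. It scales the loss contributed by each class; see the formulas below.

The computed loss is averaged over the size of the mini-batch.

Arguments when calling:

input: the score for each class, of shape batch × n
target: of shape batch; target_i is the class index that input_i belongs to

When weight is None:

$loss(x, class) = -\log \frac{\exp(x[class])}{\sum_j \exp(x[j])} = -x[class] + \log\left( \sum_j \exp(x[j]) \right)$

When weight is not None:

$loss(x, class) = weights[class] \times \left( -x[class] + \log\left( \sum_j \exp(x[j]) \right) \right)$

Shape:

Input: (N, C), where C is the number of classes
Target: (N), where N is the mini-batch size and 0 <= targets[i] <= C-1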
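
The per-sample formulas above can be evaluated directly with the standard library. The sketch below handles one sample at a time plus an unweighted batch mean; it is meant to illustrate the formula, and the function names are assumptions of this example rather than PyTorch API.

```python
import math


def cross_entropy(scores, target, weight=None):
    """loss = -x[class] + log(sum_j exp(x[j])), optionally scaled by weight[class]."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    loss = -scores[target] + log_sum_exp
    if weight is not None:
        loss *= weight[target]
    return loss


def batch_cross_entropy(batch_scores, targets):
    """Unweighted case: average of the per-sample losses over the mini-batch."""
    losses = [cross_entropy(s, t) for s, t in zip(batch_scores, targets)]
    return sum(losses) / len(losses)


print(cross_entropy([0.0, 0.0], 0))                      # log(2): two equally likely classes
print(cross_entropy([0.0, 0.0], 1, weight=[1.0, 3.0]))   # 3 * log(2)
print(batch_cross_entropy([[5.0, 0.0], [0.0, 5.0]], [0, 1]))  # small: both predictions are right
```

Note that the inputs are raw scores, not probabilities: the softmax is part of the loss.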

BCELoss

class torch.nn.BCELoss(weight=None, size_average=True)

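Its full name is Binary Cross Entropy Loss. It exists because, for binary classification, a single output value giving the probability of the positive class is enough, whereas with CrossEntropyLoss we would still need two output nodes, one for the probability of class 0 and one for the probability of class 1. BCELoss covers this case: it computes the binary cross entropy between target and output.

If weight is None:

$loss(o, t) = -\frac{1}{n} \sum_i \left( t[i] \log(o[i]) + (1 - t[i]) \log(1 - o[i]) \right)$

If weight is specified:

$loss(o, t) = -\frac{1}{n} \sum_i weights[i] \left( t[i] \log(o[i]) + (1 - t[i]) \log(1 - o[i]) \right)$

Points that need special attention:

input is a 1-D tensor of length batch holding the probability of label 1; its values must not be greater than 1 or less than 0!
target is a 1-D tensor of length batch, and the label values must be floating-point numbers!

A fix for the first case (clamping the outputs into [0, 1]):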

outputs[outputs < 0.0] = 0.0
outputs[outputs > 1.0] = 1.0

cre=nn.BCELoss()
input=torch.rand(3)
ans=torch.tensor([0,1.0,0])
loss=cre(input,ans)
print(loss)
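
To see what the BCELoss formula computes, here is a plain-Python version. It assumes every output lies strictly between 0 and 1, since log(0) is undefined; bce_loss is an illustrative helper, not the PyTorch implementation.

```python
import math


def bce_loss(outputs, targets, weights=None):
    """-1/n * sum_i w[i] * (t[i]*log(o[i]) + (1 - t[i])*log(1 - o[i]))."""
    n = len(outputs)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for o, t, w in zip(outputs, targets, weights):
        total += w * (t * math.log(o) + (1 - t) * math.log(1 - o))
    return -total / n


print(bce_loss([0.5, 0.5], [0.0, 1.0]))    # log(2): the model is maximally unsure
print(bce_loss([0.9, 0.1], [1.0, 0.0]))    # -log(0.9): confident and correct
print(bce_loss([0.1, 0.9], [1.0, 0.0]))    # -log(0.1): confident and wrong, much larger
```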

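clip_grad_norm

torch.nn.utils.clip_grad_norm(parameters, max_norm, norm_type=2)

The norm is computed over all gradients together, as if they were concatenated into a single vector. The gradients are modified in place. In recent PyTorch releases this function is named clip_grad_norm_ (with a trailing underscore), and the old name is deprecated.

Parameters:
- parameters (Iterable[Variable]) – an iterable of Variables whose gradients will be normalized.
- max_norm (float or int) – the maximum p-norm of the gradients after clipping
- norm_type (float or int) – the type of p-norm used. Can be inf for the infinity norm.

Return value: the total p-norm of the gradients of all parameters (viewed as a single vector).

torch.nn.utils.clip_grad_norm(self.model.parameters(), self.opt['max_grad_norm'])  # opt holds the hyperparameters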
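
The clipping rule can be sketched on plain lists of gradient values. This is an illustration of the idea (one global norm, one common scaling factor), with a small constant in the denominator as an assumption to avoid division by zero; it is not the library code.

```python
def clip_grad_norm(grads, max_norm, norm_type=2.0):
    """Scale all gradients in place so that their joint p-norm is at most max_norm."""
    flat = [g for grad in grads for g in grad]
    if norm_type == float("inf"):
        total_norm = max(abs(g) for g in flat)
    else:
        total_norm = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:  # only shrink, never enlarge
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm  # the norm before clipping


grads = [[3.0, 0.0], [4.0]]  # gradients of two hypothetical parameters
total = clip_grad_norm(grads, max_norm=1.0)
print(total)  # 5.0
print(grads)  # roughly [[0.6, 0.0], [0.8]]
```

All gradients are multiplied by the same factor, so the direction of the overall gradient vector is preserved.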

PackedSequence

torch.nn.utils.rnn.PackedSequence(_cls, data, batch_sizes)

Holds the data and list of batch_sizes of a packed sequence.

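All RNN modules accept packed sequences as inputs.

NOTE: instances of this class should not be created manually. They are meant to be instantiated by pack_padded_sequence().

Parameters:

data (Variable) – a Variable containing the packed sequence.
batch_sizes (list[int]) – a list holding the batch size at each time step of the packed sequence.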

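pack_padded_sequence

torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False)

It is best to read pack here as compress. The function compresses a padded batch of variable-length sequences (padding introduces redundancy, so it is squeezed out).

The input can have shape (T×B×*), where T is the length of the longest sequence, B is the batch size, and * is any number of further dimensions (including 0). If batch_first=True, the input is expected to have shape (B×T×*).

The sequences stored in input should be sorted by length, longest first. That is, input[:, 0] is the longest sequence and input[:, B-1] is the shortest.

NOTE: any input with at least 2 dimensions can be passed to this function. You can use it to pack labels, and then compute the loss from the RNN output and the packed labels. The underlying Variable is available through the .data attribute of the PackedSequence object. This way no mask is needed!

Parameters:

input (Variable) – the padded batch of variable-length sequences
lengths (list[int]) – the length of each sequence in the batch.
batch_first (bool, optional) – if True, the input is expected to have shape B*T*size.

Returns: a PackedSequence object.

An intuitive example: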

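import torch

input = torch.tensor([
    [1, 2, 3],
    [4, 5, 0],
    [6, 0, 0]
])  # our input: 0 is the padding id, every other number is a token id
length = [3, 2, 1]  # the original (unpadded) length of each row


packed = torch.nn.utils.rnn.pack_padded_sequence(input, length, batch_first=True)
# after this step we can think of the result as

# packed = [1, 2, 3, 4, 5, 6]  (the real object holds more than this; it is only an intuitive picture)

# in effect the RNN can skip the padding during computation,
# which saves compute; and because the lengths are known, the original matrix can be restored


# an example from real use (a fragment: zero_state, self.rnn, inputs, and seq_lens come from the surrounding model code)
h0, c0 = zero_state(batch_size)
inputs = nn.utils.rnn.pack_padded_sequence(inputs, seq_lens, batch_first=True)
hidden, (ht, ct) = self.rnn(inputs, (h0, c0))
hidden, output_lens = nn.utils.rnn.pad_packed_sequence(hidden, batch_first=True)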
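
The intuitive picture above lists the tokens row by row. As far as I understand the packed format, the data is actually stored time step by time step, together with the batch size at each step. The plain-Python sketch below models that layout on nested lists; pack_padded and pad_packed are illustrative helpers written for this example, not the PyTorch functions.

```python
def pack_padded(padded_rows, lengths):
    """Pack rows (sorted longest first) into time-major data plus batch sizes."""
    data, batch_sizes = [], []
    for t in range(max(lengths)):
        # Only the sequences that are still running at step t contribute.
        step = [row[t] for row, n in zip(padded_rows, lengths) if t < n]
        data.extend(step)
        batch_sizes.append(len(step))
    return data, batch_sizes


def pad_packed(data, batch_sizes, padding_value=0):
    """Inverse operation: rebuild the padded rows and their lengths."""
    rows = [[] for _ in range(batch_sizes[0])]
    pos = 0
    for size in batch_sizes:
        for b in range(size):
            rows[b].append(data[pos + b])
        pos += size
    lengths = [len(row) for row in rows]
    max_len = len(batch_sizes)
    padded = [row + [padding_value] * (max_len - len(row)) for row in rows]
    return padded, lengths


rows = [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
data, batch_sizes = pack_padded(rows, [3, 2, 1])
print(data, batch_sizes)              # [1, 4, 6, 2, 5, 3] [3, 2, 1]
print(pad_packed(data, batch_sizes))  # ([[1, 2, 3], [4, 5, 0], [6, 0, 0]], [3, 2, 1])
```

The batch sizes tell the RNN how many sequences are still active at each step, which is how the padding gets skipped.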

pad_packed_sequence

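Pads a packed sequence.

The function described above compresses a padded batch of variable-length sequences. This one is the inverse of pack_padded_sequence(): it pads the compressed sequences back out.

The returned Variable has size T×B×*, where T is the length of the longest sequence and B is the batch size. If batch_first=True, the returned data has size B×T×*.

The elements of the batch are ordered by decreasing length. In other words, this undoes the operation above and restores the original shape.

Parameters:

sequence (PackedSequence) – the batch to be padded
batch_first (bool, optional) – if True, the returned data has the format B×T×*.

Returns: a tuple containing the padded sequences and the list of lengths of the sequences in the batch. That is, the batch is reconstructed from the batch_sizes stored in the PackedSequence object.

Example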

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn import utils as nn_utils
batch_size = 2
max_length = 3
hidden_size = 2
n_layers =1

tensor_in = torch.FloatTensor([[1, 2, 3], [1, 0, 0]]).resize_(2,3,1)
tensor_in = Variable( tensor_in ) #[batch, seq, feature], [2, 3, 1]
seq_lengths = [3,1] # length of each sequence in the batch, longest first

# pack it
pack = nn_utils.rnn.pack_padded_sequence(tensor_in, seq_lengths, batch_first=True)

# initialize
rnn = nn.RNN(1, hidden_size, n_layers, batch_first=True)
h0 = Variable(torch.randn(n_layers, batch_size, hidden_size))

#forward
out, _ = rnn(pack, h0)

# unpack
unpacked = nn_utils.rnn.pad_packed_sequence(out)
print(unpacked)

Those are the commonly used torch APIs! I will keep adding to this summary over time.

Thank you for reading. Corrections are welcome in the comments. If you repost this article, please credit the source.
