A First Taste of PyTorch: Implementing an LSTM
source link: https://mp.weixin.qq.com/s/k_z8sNbO3sqqkTV8gvWaIw?amp%3Butm_medium=referral
LSTM parameters
First we need to define the recurrent network with nn.LSTM(). Let's start by introducing this function's parameters.
LSTM data format:
- num_layers: how many LSTM layers the recurrent network stacks
- num_directions: when bidirectional=True, num_directions=2; when bidirectional=False, num_directions=1
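As a quick sanity check (my own sketch, not part of the original article), the effect of num_layers and num_directions on the state shapes can be verified directly:

```python
import torch as t
import torch.nn as nn

# A 2-layer bidirectional LSTM: num_layers=2, num_directions=2
lstm = nn.LSTM(input_size=4, hidden_size=10, num_layers=2, bidirectional=True)

x = t.randn(5, 3, 4)  # (seq_len, batch, input_size) -- the default layout
out, (hn, cn) = lstm(x)

print(out.shape)  # (seq_len, batch, hidden_size * num_directions) = (5, 3, 20)
print(hn.shape)   # (num_layers * num_directions, batch, hidden_size) = (4, 3, 10)
print(cn.shape)   # (4, 3, 10)
```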
Input data format for the LSTM
The input X fed into the LSTM has shape (seq_len, batch, input_size); in addition, h0 and c0 have the shapes below:
- h0: (num_layers * num_directions, batch_size, hidden_size)
- c0: (num_layers * num_directions, batch_size, hidden_size)
Output data format of the LSTM
The LSTM's output has shape (seq_len, batch, hidden_size * num_directions); the returned hn and cn have the shapes below:
- hn: (num_layers * num_directions, batch_size, hidden_size)
- cn: (num_layers * num_directions, batch_size, hidden_size)
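A small check of how the output relates to hn (my own sketch, not from the article): for a single-layer, unidirectional LSTM, the last time step of the output is exactly the final hidden state hn.

```python
import torch as t
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=10)  # 1 layer, unidirectional

x = t.randn(5, 3, 4)     # (seq_len, batch, input_size)
out, (hn, cn) = lstm(x)  # h0/c0 omitted -> they default to zeros

# out[-1] is the hidden state at the final time step for every batch element,
# which is exactly what hn[0] holds.
print(t.allclose(out[-1], hn[0]))  # True
```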
```python
import torch as t
import torch.nn as nn

lstm = nn.LSTM(input_size=4,      # the input has 4 features
               hidden_size=10,    # the output feature size (hidden_size) is 10
               batch_first=True)  # batch-first layout: (batch_size, seq_len, num_features)
lstm
```
LSTM(4, 10, batch_first=True)
Given how an LSTM network is used, each LSTM layer takes three pieces of external input:
- X: the data fed into the LSTM from outside
- h_0: the output produced by the previous LSTM step
- c_0: the adjusted memory (cell state) from the previous LSTM step
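A note of my own (not from the article): if h_0 and c_0 are not passed, PyTorch initializes them to zeros, so the following two calls are equivalent:

```python
import torch as t
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=10, batch_first=True)
x = t.randn(3, 5, 4)

# explicit zero initial states
h0 = t.zeros(1, 3, 10)
c0 = t.zeros(1, 3, 10)

out1, _ = lstm(x, (h0, c0))
out2, _ = lstm(x)  # omitted states default to zeros

print(t.allclose(out1, out2))  # True
```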
Following the LSTM input data format summarized earlier (make sure you understand what batch_size, seq_len, and dims each stand for), we now define the LSTM's input data.
Our LSTM network needs these three inputs:
- x: we give x the shape (batch_size, seq_len, dims) = (3, 5, 4)
- h0: its shape (num_layers * num_directions, batch_size, hidden_size) is (1, 3, 10)
- c0: its shape (num_layers * num_directions, batch_size, hidden_size) is (1, 3, 10)
```python
# x: shape (batch_size, seq_len, dims) = (3, 5, 4)
x = t.randn(3, 5, 4)
x
```
tensor([[[ 0.1478, -0.7733, -0.3462,  0.0320],
         [-0.0540,  0.4757, -1.2787,  0.6141],
         [ 1.9581,  0.0015,  1.4387, -0.5895],
         [-1.0691, -1.7070,  1.0219, -0.7990],
         [-1.7735,  0.6824,  0.6067, -0.6630]],

        [[-0.3223, -0.6943,  0.1120, -1.7799],
         [-1.0542,  0.2151, -2.2530,  0.2640],
         [-0.0599, -0.1996,  0.9793, -1.4952],
         [-0.2328,  0.2297, -1.4825,  0.0720],
         [ 0.7112, -0.1165,  2.5641, -1.4247]],

        [[-0.4157, -1.1617, -0.7442, -0.8369],
         [ 0.5266,  2.3119,  0.6428,  0.3797],
         [-0.2951, -1.5711,  1.2832, -0.2773],
         [ 0.4760,  0.2403,  0.2923,  2.2315],
         [-0.3348, -0.0976,  0.0388,  0.5948]]])
h0 has shape (num_layers * num_directions, batch_size, hidden_size) = (1, 3, 10):

```python
h0 = t.randn(1, 3, 10)
h0
```
tensor([[[ 1.2469,  1.2457, -1.0390,  0.3173,  1.0083,  0.7610, -0.0088,  0.0614,
           0.5630,  0.7260],
         [-0.6529, -1.4584, -0.7871, -0.4002, -0.4619, -0.2633,  0.2818, -0.3486,
          -1.0637, -1.0772],
         [ 0.6969,  1.2095, -0.9888, -1.1326, -1.1339, -1.0660,  0.9650,  0.4040,
          -0.7997, -1.3996]]])
c0 has shape (num_layers * num_directions, batch_size, hidden_size) = (1, 3, 10):

```python
c0 = t.randn(1, 3, 10)
c0
```
tensor([[[-1.5377, -0.7845,  0.0971, -0.1659,  1.8828,  1.8013, -0.7545,  0.7165,
           2.1182, -0.7022],
         [ 0.2850,  0.2503, -0.8153,  0.5210,  0.0405,  0.5819, -0.1994,  0.2940,
           0.4487, -0.4580],
         [-0.6478, -1.1122, -0.0021,  0.3013,  1.1450,  0.5811, -0.8989, -0.2919,
          -0.9292,  0.0599]]])
Now call the lstm instantiated earlier, passing in the data x along with the previous h_0 and c_0.

```python
output = lstm(x, (h0, c0))
output
```
(tensor([[[-0.2626, -0.1759,  0.0594,  0.0890,  0.4585,  0.5010, -0.2925,  0.0523,  0.2142, -0.2536],
          [ 0.0161,  0.0077, -0.0074, -0.0148,  0.3526,  0.1798, -0.1033, -0.1095,  0.2395, -0.1300],
          [ 0.0354, -0.0625,  0.1339,  0.1123,  0.1212,  0.0853,  0.0660, -0.0315,  0.0441, -0.1083],
          [ 0.0188, -0.3257,  0.1776,  0.1890,  0.0583,  0.0848,  0.0848, -0.1144,  0.0529, -0.0439],
          [ 0.0346, -0.2408,  0.1594,  0.2038,  0.1914, -0.0385,  0.1791, -0.2192,  0.0237, -0.0506]],

         [[ 0.1276, -0.2258, -0.1029,  0.3354, -0.0689,  0.1603,  0.0564, -0.0073,  0.1161, -0.0802],
          [ 0.1915, -0.0040, -0.0824,  0.0939, -0.1243,  0.1259, -0.0495, -0.2668,  0.1277, -0.0344],
          [ 0.1991, -0.1713,  0.0796,  0.2259, -0.0527,  0.1457,  0.0777, -0.2230, -0.0176, -0.0446],
          [ 0.2169,  0.0204,  0.0673,  0.0936, -0.0887,  0.1041,  0.0164, -0.3327,  0.0704, -0.0270],
          [ 0.0938, -0.2337,  0.1380,  0.2187,  0.0426,  0.0422,  0.1518, -0.1585, -0.0422, -0.0449]],

         [[ 0.0959, -0.3436,  0.1434,  0.1418,  0.0680,  0.2509,  0.0244, -0.3842, -0.1125,  0.1595],
          [ 0.0318, -0.0647,  0.0393,  0.0978,  0.2184, -0.0525,  0.2293, -0.1959, -0.1001,  0.0704],
          [-0.0548, -0.3070,  0.1436,  0.1868,  0.0755, -0.0966,  0.1553, -0.1882,  0.0149,  0.0397],
          [-0.1964,  0.0121,  0.0689,  0.0245,  0.1022, -0.1088,  0.1525, -0.0814,  0.1698,  0.0285],
          [-0.0799,  0.0252,  0.0494,  0.0438,  0.1130, -0.1117,  0.1234, -0.1222,  0.1560,  0.0400]]],
        grad_fn=<TransposeBackward0>),
 (tensor([[[ 0.0346, -0.2408,  0.1594,  0.2038,  0.1914, -0.0385,  0.1791, -0.2192,  0.0237, -0.0506],
           [ 0.0938, -0.2337,  0.1380,  0.2187,  0.0426,  0.0422,  0.1518, -0.1585, -0.0422, -0.0449],
           [-0.0799,  0.0252,  0.0494,  0.0438,  0.1130, -0.1117,  0.1234, -0.1222,  0.1560,  0.0400]]],
          grad_fn=<ViewBackward>),
  tensor([[[ 0.0747, -0.3711,  0.2711,  0.7422,  0.4142, -0.0688,  0.3214, -0.5058,  0.0441, -0.1047],
           [ 0.1504, -0.3095,  0.3440,  0.5343,  0.0911,  0.0653,  0.2167, -0.3579, -0.1910, -0.1100],
           [-0.1395,  0.0483,  0.0805,  0.1036,  0.2337, -0.2296,  0.2053, -0.2010,  0.3182,  0.0997]]],
          grad_fn=<ViewBackward>)))
The result above splits into two main parts, and the second part itself splits into two smaller ones. They are, respectively, the LSTM's output sequence together with the current h_out and c_out:

```python
out, (h_out, c_out) = lstm(x, (h0, c0))
print(out.shape)
print(h_out.shape)
print(c_out.shape)
```
torch.Size([3, 5, 10])
torch.Size([1, 3, 10])
torch.Size([1, 3, 10])
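As one last sanity check (my own sketch, not in the original article): with batch_first=True the time axis is dimension 1, so slicing out the last time step of out recovers h_out for this single-layer, unidirectional network:

```python
import torch as t
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=10, batch_first=True)
x = t.randn(3, 5, 4)
h0 = t.randn(1, 3, 10)
c0 = t.randn(1, 3, 10)

out, (h_out, c_out) = lstm(x, (h0, c0))

# out[:, -1, :] is each sequence's final hidden state -- identical to h_out[0]
print(t.allclose(out[:, -1, :], h_out[0]))  # True
```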
Everything runs correctly, which shows that we organized the parameters properly and used PyTorch's LSTM model the right way.