
CIFAR-10 in Practice: Building a ResNet18 Neural Network

source link: https://www.wmathor.com/index.php/archives/1389/

If you are not familiar with ResNet, you can first read my earlier post on the ResNet paper (ResNet论文阅读).


First, implement a Residual Block.

import torch
from torch import nn
from torch.nn import functional as F

class ResBlk(nn.Module):
    def __init__(self, ch_in, ch_out, stride=1):
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)
        
        if ch_out == ch_in:
            self.extra = nn.Sequential()
        else:
            self.extra = nn.Sequential(
                
                # The 1×1 convolution adjusts the channel dimension of the input x
                # [b, ch_in, h, w] => [b, ch_out, h, w]
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride),
                nn.BatchNorm2d(ch_out),
            )
        
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        # shortcut: add the (possibly projected) input to the residual branch
        out = self.extra(x) + out
        out = F.relu(out)
        
        return out

The block applies batch normalization so that training is faster and more stable. We also have to consider that if ch_in and ch_out do not match, the element-wise addition in the shortcut will raise an error, so we check for this case: when the channel counts differ, a 1×1 convolution adjusts the input to the right shape.

Let's test it:

blk = ResBlk(64, 128, stride=2)
tmp = torch.randn(2, 64, 32, 32)
out = blk(tmp)
print(out.shape)

The output shape is torch.Size([2, 128, 16, 16]).
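
For comparison, when ch_in equals ch_out and the stride is 1, the shortcut is just the empty nn.Sequential() and the output keeps the input shape; a quick extra check:

blk = ResBlk(64, 64, stride=1)
tmp = torch.randn(2, 64, 32, 32)
print(blk(tmp).shape) # torch.Size([2, 64, 32, 32])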

A quick note on why some layers are given a stride larger than 1. Setting the other layers aside, consider one residual block whose channel count grows from 64 to 128: if every stride were 1 (with padding 1), the width and height of the feature maps would not change while the channel count doubles, so the tensors flowing through the network, and the cost of everything that consumes them, keep growing. And that is just one block, to say nothing of the FC layer and the remaining blocks. So the stride cannot be 1 everywhere; downsampling keeps the network from growing without bound.
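
To make this concrete, here is a quick back-of-the-envelope check (a small sketch, reusing the imports above and the 2×64×32×32 tensor size from the test):

x = torch.randn(2, 64, 32, 32)
same_res = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
downsample = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)

print(x.numel())              # 2 * 64 * 32 * 32 = 131072
print(same_res(x).numel())    # 2 * 128 * 32 * 32 = 262144, doubled
print(downsample(x).numel())  # 2 * 128 * 16 * 16 = 65536, spatial size halved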

Next, we build the full ResNet-18.

class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(64),
        )
        # followed 4 blocks
        
        # [b, 64, h, w] => [b, 128, h, w]
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h, w]
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h, w]
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h, w]
        self.blk4 = ResBlk(512, 512, stride=2)
        
        self.outlayer = nn.Linear(512*1*1, 10)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        
        # After the four blocks: [b, 64, h, w] => [b, 512, h, w]
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        
        x = self.outlayer(x)
        
        return x

Let's test it:

x = torch.randn(2, 3, 32, 32)
model = ResNet18()
out = model(x)
print("ResNet:", out.shape)

This fails with the following error:

size mismatch, m1: [2048 x 2], m2: [512 x 10] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:961

The problem is that the input dimension we declared for the final linear layer does not match the output of the last block. Printing x.shape right after the last block of ResNet18 finishes shows torch.Size([2, 512, 2, 2]).

There are several ways to fix this: we can change the linear layer's input size so that it matches, or we can add an operation after the last block so that its output matches the 512 features the linear layer expects.
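
For reference, the first option can be sketched in isolation (a minimal sketch, assuming the [2, 512, 2, 2] block output printed above; the exact size depends on the input resolution and strides):

# Option 1 (sketch): make the linear layer match the block output instead
feat = torch.randn(2, 512, 2, 2)       # stand-in for the last block's output
fc = nn.Linear(512 * 2 * 2, 10)        # in_features = 2048 matches [2, 512, 2, 2]
out = fc(feat.view(feat.size(0), -1))  # flatten to [2, 2048], then classify
print(out.shape)                       # torch.Size([2, 10])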

Here is the revised code first; the explanation follows.

class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(64),
        )
        # followed 4 blocks
        
        # [b, 64, h, w] => [b, 128, h, w]
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h, w]
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h, w]
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h, w]
        self.blk4 = ResBlk(512, 512, stride=2)
        
        self.outlayer = nn.Linear(512*1*1, 10)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        
        # After the four blocks: [b, 64, h, w] => [b, 512, h, w]
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        
        # print("after conv:", x.shape) # [b, 512, 2, 2]
        
        # [b, 512, h, w] => [b, 512, 1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        
        x = x.view(x.size(0), -1) # [b, 512, 1, 1] => [b, 512*1*1]
        x = self.outlayer(x)
        
        return x

Here I take the second approach: after the last block, I add an adaptive pooling layer. No matter what the input width and height are, this pooling outputs a tensor whose width and height are both 1, leaving the other dimensions unchanged. A reshape then turns the [batchsize, 512, 1, 1] tensor into a [batchsize, 512*1*1] tensor, which lines up with the following linear layer, whose input size is 512 and output size is 10. The final output of the whole network therefore has shape [batchsize, 10].
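
The pooling and reshape step can also be checked on its own (a small sketch, using a tensor shaped like the block output reported above):

x = torch.randn(2, 512, 2, 2)
x = F.adaptive_avg_pool2d(x, [1, 1])  # any h, w => 1 x 1
print(x.shape)                        # torch.Size([2, 512, 1, 1])
x = x.view(x.size(0), -1)             # flatten: [2, 512, 1, 1] => [2, 512]
print(x.shape)                        # torch.Size([2, 512])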

Finally, copy over the code we used earlier to train LeNet5 and change model = LeNet5() to model = ResNet18(). The full code is as follows.

import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


batch_size=32
cifar_train = datasets.CIFAR10(root='cifar', train=True, transform=transforms.Compose([
    transforms.Resize([32, 32]),
    transforms.ToTensor(),
]), download=True)

cifar_train = DataLoader(cifar_train, batch_size=batch_size, shuffle=True)

cifar_test = datasets.CIFAR10(root='cifar', train=False, transform=transforms.Compose([
    transforms.Resize([32, 32]),
    transforms.ToTensor(),
]), download=True)
    
cifar_test = DataLoader(cifar_test, batch_size=batch_size, shuffle=True)      

class ResBlk(nn.Module):
    def __init__(self, ch_in, ch_out, stride=1):
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)
        
        if ch_out == ch_in:
            self.extra = nn.Sequential()
        else:
            self.extra = nn.Sequential(
                
                # The 1×1 convolution adjusts the channel dimension of the input x
                # [b, ch_in, h, w] => [b, ch_out, h, w]
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride),
                nn.BatchNorm2d(ch_out),
            )
        
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        # shortcut: add the (possibly projected) input to the residual branch
        out = self.extra(x) + out
        out = F.relu(out)
        
        return out
        
class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(64),
        )
        # followed 4 blocks
        
        # [b, 64, h, w] => [b, 128, h, w]
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h, w]
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h, w]
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h, w]
        self.blk4 = ResBlk(512, 512, stride=2)
        
        self.outlayer = nn.Linear(512*1*1, 10)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        
        # After the four blocks: [b, 64, h, w] => [b, 512, h, w]
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        
        # print("after conv:", x.shape) # [b, 512, 2, 2]
        
        # [b, 512, h, w] => [b, 512, 1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        
        x = x.view(x.size(0), -1) # [b, 512, 1, 1] => [b, 512*1*1]
        x = self.outlayer(x)
        
        return x

def main():

    ##########  train  ##########
    #device = torch.device('cuda')
    #model = ResNet18().to(device)
    criteon = nn.CrossEntropyLoss()
    model = ResNet18()
    optimizer = optim.Adam(model.parameters(), 1e-3)
    for epoch in range(1000):
        model.train()
        for batchidx, (x, label) in enumerate(cifar_train):
            #x, label = x.to(device), label.to(device)
            logits = model(x)
            # logits: [b, 10]
            # label:  [b]
            loss = criteon(logits, label)
            
            # backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        print('train:', epoch, loss.item())
        
        ########## test  ##########
        model.eval()
        with torch.no_grad():
            total_correct = 0
            total_num = 0
            for x, label in cifar_test:
                # x, label = x.to(device), label.to(device)

                # [b, 10]
                logits = model(x)
                # [b]
                pred = logits.argmax(dim=1)
                # [b] vs [b]
                total_correct += torch.eq(pred, label).float().sum().item()
                total_num += x.size(0)
            acc = total_correct / total_num
            print('test:', epoch, acc)

if __name__ == '__main__':
    main()
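
If a GPU is available, the commented-out device lines in main() can be enabled; a minimal sketch of the idea (the same .to(device) call is also needed for x and label inside both the train and test loops):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResNet18().to(device)
x = torch.randn(2, 3, 32, 32).to(device)
print(model(x).shape)  # torch.Size([2, 10])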


Compared with LeNet, ResNet's accuracy rises much faster, but the extra layers inevitably make it slower to run: without a GPU, one epoch takes roughly 15 minutes. On top of this, readers can modify the network structure and apply a few tricks, for example normalizing the images right at the start.
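
For example, adding a Normalize transform to the data pipeline could look roughly like this (a sketch; the mean and std below are commonly quoted per-channel CIFAR-10 statistics, not values from this post):

cifar_train = datasets.CIFAR10(root='cifar', train=True, transform=transforms.Compose([
    transforms.Resize([32, 32]),
    transforms.ToTensor(),
    # commonly quoted CIFAR-10 per-channel mean/std (assumed values)
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                         std=[0.2470, 0.2435, 0.2616]),
]), download=True)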

