• Classic Papers: MobileNet V1, Paper and Practice


    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications


    • Authors: Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
    • Affiliation: Google


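    We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. Two simple global hyperparameters are introduced that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for an application based on the constraints of the problem. We also present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grained classification, face attributes, and large-scale geolocalization.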

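    1 Introduction

    Convolutional neural networks have become ubiquitous in computer vision ever since AlexNet [19] popularized deep convolutional neural networks by winning the ImageNet Challenge, ILSVRC 2012 [24]. The general trend has been to build deeper and more complicated networks in order to achieve higher accuracy [27, 31, 29, 8]. However, these advances in accuracy do not necessarily make networks more efficient with respect to model size and speed. In many real-world applications such as robotics, self-driving cars, and augmented reality, recognition tasks need to be carried out in a timely fashion on a computationally limited platform.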



    2 Prior Work


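    MobileNets are built primarily from depthwise separable convolutions, initially introduced in [26] and subsequently used in Inception models [13] to reduce the computation in the first few layers. Flattened networks [16] build a network out of fully factorized convolutions and showed the potential of factorized networks. Independent of this paper, Factorized Networks [34] introduce a similar factorized convolution as well as the use of topological connections. Subsequently, the Xception network [3] demonstrated how to scale up depthwise separable filters to outperform Inception V3 networks. Another small network is SqueezeNet [12], which uses a bottleneck approach to design a very small network. Other reduced-computation networks include structured transform networks [28] and Fastfood-based networks [37], which replace the fully connected layers.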


    3 MobileNet Architecture


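    3.1 Depthwise Separable Convolution

    The MobileNet model is based on depthwise separable convolutions, a form of factorized convolution that factorizes a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution. For MobileNets, the depthwise convolution applies a single filter to each input channel. The pointwise convolution then applies a 1×1 convolution to combine the outputs of the depthwise convolution. A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers: a separate layer for filtering and a separate layer for combining. This factorization drastically reduces computation and model size. Figure 2 shows how a standard convolution 2(a) is factorized into a depthwise convolution 2(b) and a 1×1 pointwise convolution 2(c).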


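    A standard convolutional layer takes as input a $D_F \times D_F \times M$ feature map F and produces a $D_G \times D_G \times N$ feature map G, where $D_F$ is the spatial width and height of a square input feature map, M is the number of input channels (input depth), $D_G$ is the spatial width and height of a square output feature map, and N is the number of output channels (output depth).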

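    The standard convolutional layer is parameterized by a convolution kernel K with $D_K \times D_K \times M \times N$ parameters in total, where $D_K$ is the spatial dimension of the kernel, assumed to be square, M is the number of input channels, and N is the number of output channels, as defined above.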





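    A standard convolution has a computational cost of $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$, which depends multiplicatively on the number of input channels M, the number of output channels N, the kernel size $D_K \times D_K$, and the feature map size $D_F \times D_F$. MobileNet models address each of these terms and their interactions. First, they use depthwise separable convolutions to break the interaction between the number of output channels and the size of the kernel.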
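The parameter count and the multiplicative cost of a standard convolution can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the layer sizes (3×3 kernel, 512 input and output channels, 14×14 feature map) are assumptions chosen for this example, and it assumes stride 1 with padding so that the output feature map has the same spatial size as the input.

```python
def standard_conv_params(d_k: int, m: int, n: int) -> int:
    """Number of weights in a standard convolution: D_K * D_K * M * N."""
    return d_k * d_k * m * n


def standard_conv_mult_adds(d_k: int, m: int, n: int, d_f: int) -> int:
    """Mult-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


# Illustrative layer (values assumed for this example only):
# 3x3 kernel, 512 input channels, 512 output channels, 14x14 feature map.
params = standard_conv_params(3, 512, 512)
mult_adds = standard_conv_mult_adds(3, 512, 512, 14)
print(params)     # 2359296
print(mult_adds)  # 462422016
```

Doubling any one of M, N, or the feature map area doubles the cost, which is what "depends multiplicatively" means here.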





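    In depthwise convolution, $\hat{K}$ is the depthwise convolutional kernel of size $D_K \times D_K \times M$, where the m-th filter in $\hat{K}$ is applied to the m-th channel in F to produce the m-th channel of the filtered output feature map $\hat{G}$. The computational cost of depthwise convolution is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$.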
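To see how much the factorization saves, the depthwise cost can be added to the cost of the 1×1 pointwise convolution and compared with a standard convolution. The sketch below assumes the pointwise layer costs $M \cdot N \cdot D_F \cdot D_F$ mult-adds (a standard convolution with $D_K = 1$); the function names and layer sizes are illustrative assumptions, not taken from the original post.

```python
def depthwise_mult_adds(d_k: int, m: int, d_f: int) -> int:
    """One D_K x D_K filter per input channel: D_K * D_K * M * D_F * D_F."""
    return d_k * d_k * m * d_f * d_f


def pointwise_mult_adds(m: int, n: int, d_f: int) -> int:
    """1x1 convolution combining M channels into N: M * N * D_F * D_F."""
    return m * n * d_f * d_f


def standard_mult_adds(d_k: int, m: int, n: int, d_f: int) -> int:
    """Standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


# Illustrative layer (assumed values): 3x3 kernel, 512 -> 512 channels, 14x14 map.
d_k, m, n, d_f = 3, 512, 512, 14
separable = depthwise_mult_adds(d_k, m, d_f) + pointwise_mult_adds(m, n, d_f)
standard = standard_mult_adds(d_k, m, n, d_f)

ratio = separable / standard
closed_form = 1 / n + 1 / d_k ** 2  # the ratio simplifies to 1/N + 1/D_K^2
print(separable, standard)
print(round(ratio, 4), round(closed_form, 4))
```

For 3×3 kernels the ratio is dominated by the $1/D_K^2 = 1/9$ term, so the separable layer needs roughly 8 to 9 times fewer mult-adds than the standard one.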










    3.2 Network Structure and Training





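    It is not enough to define a network simply in terms of a small number of Mult-Adds; it is also important to make sure these operations can be implemented efficiently. For instance, unstructured sparse matrix operations are not typically faster than dense matrix operations until a very high level of sparsity. The model structure proposed in the paper puts nearly all of the computation into dense 1×1 convolutions, which can be implemented with highly optimized general matrix multiply (GEMM) functions. Convolutions are often implemented by a GEMM but require an initial reordering in memory, called im2col, in order to map them to a GEMM. This approach is used, for instance, in the Caffe package. 1×1 convolutions do not require this reordering in memory and can be implemented directly with GEMM, which is one of the most optimized numerical linear algebra algorithms. MobileNet spends 95% of its computation time in 1×1 convolutions, which also contain 75% of the parameters, as shown in Table 2. Nearly all of the additional parameters are in the fully connected layer.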
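The claim that a 1×1 convolution maps directly onto a matrix multiplication can be illustrated without any library. The sketch below is a toy, plain-Python version under assumed conventions (channel-last nested lists, hypothetical function names); real frameworks call optimized GEMM routines instead. It computes a 1×1 convolution once with explicit per-pixel loops and once as a single matrix product, with no im2col-style patch gathering.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def pointwise_conv_direct(feature_map, weights):
    """1x1 convolution as explicit loops.

    feature_map: H x W x M nested lists; weights: M x N nested lists.
    Returns an H x W x N nested list.
    """
    m, n = len(weights), len(weights[0])
    return [[[sum(pixel[c] * weights[c][o] for c in range(m))
              for o in range(n)]
             for pixel in row]
            for row in feature_map]


def pointwise_conv_gemm(feature_map, weights):
    """The same 1x1 convolution as one (H*W x M) by (M x N) matrix product."""
    height, width = len(feature_map), len(feature_map[0])
    flat = [pixel for row in feature_map for pixel in row]  # (H*W) x M
    out = matmul(flat, weights)                             # (H*W) x N
    return [out[r * width:(r + 1) * width] for r in range(height)]


# Tiny illustrative input: 2x2 spatial size, M = 3 channels, N = 2 outputs.
fmap = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]
weights_1x1 = [[1, 0],
               [0, 1],
               [1, 1]]
direct = pointwise_conv_direct(fmap, weights_1x1)
gemm = pointwise_conv_gemm(fmap, weights_1x1)
print(direct == gemm)  # True
```

A 3×3 convolution would first have to copy each 3×3×M patch into a row of a larger matrix (the im2col step) before the same product could be applied; for 1×1 kernels each pixel already is such a row.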


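    MobileNet models were trained in TensorFlow [1] using RMSprop with asynchronous gradient descent, similar to Inception V3 [31]. However, contrary to training large models, less regularization and data augmentation are used, because small models have less trouble with overfitting. When training MobileNets, side heads and label smoothing are not used, and the amount of image distortion is reduced by limiting the size of the crops used in large Inception training [31]. Additionally, we found that it is important to put very little or no weight decay (L2 regularization) on the depthwise filters, since they have so few parameters. For the ImageNet benchmarks in the next section, all models were trained with the same training parameters regardless of model size.

    3.3 Width Multiplier: Thinner Models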




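    The width multiplier α thins the network uniformly at each layer: for a given layer, the number of input channels M becomes αM and the number of output channels N becomes αN, so the computational cost of a depthwise separable convolution becomes $D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$. Here α ∈ (0, 1], with typical settings of 1, 0.75, 0.5, and 0.25. α = 1 is the baseline MobileNet and α < 1 gives reduced MobileNets. The width multiplier reduces the computational cost and the number of parameters quadratically, by roughly $\alpha^2$. It can be applied to any model structure to define a new, smaller model with a reasonable accuracy, latency, and size tradeoff. It is used to define a new reduced structure that needs to be trained from scratch.

    3.4 Resolution Multiplier: Reduced Representation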
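The "roughly $\alpha^2$" reduction can be checked numerically. The sketch below assumes the cost of a depthwise separable layer with width multiplier α is $D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$; the function name, the truncation of channel counts with `int`, and the layer sizes are assumptions made for illustration.

```python
def separable_mult_adds(d_k: int, m: int, n: int, d_f: int, alpha: float = 1.0) -> int:
    """Mult-adds of a depthwise separable layer thinned by width multiplier alpha."""
    m_thin = int(alpha * m)  # alpha * M input channels
    n_thin = int(alpha * n)  # alpha * N output channels
    depthwise = d_k * d_k * m_thin * d_f * d_f
    pointwise = m_thin * n_thin * d_f * d_f
    return depthwise + pointwise


# Illustrative layer (assumed values): 3x3 kernel, 512 -> 512 channels, 14x14 map.
baseline = separable_mult_adds(3, 512, 512, 14, alpha=1.0)
ratios = {}
for alpha in (1.0, 0.75, 0.5, 0.25):
    ratios[alpha] = separable_mult_adds(3, 512, 512, 14, alpha) / baseline
    print(alpha, round(ratios[alpha], 4), alpha ** 2)
```

The measured ratio sits slightly above $\alpha^2$ because the depthwise term scales only linearly with α; the pointwise term, which dominates the cost, scales quadratically.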



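    The resolution multiplier ρ is applied to the input image, and the internal representation of every layer is subsequently reduced by the same multiplier. With width multiplier α and resolution multiplier ρ, the computational cost of a depthwise separable convolution becomes $D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$. Here ρ ∈ (0, 1] is typically set implicitly, so that the input resolution of the network is 224, 192, 160, or 128. ρ = 1 is the baseline MobileNet and ρ < 1 gives reduced-computation MobileNets. The resolution multiplier reduces the computational cost by $\rho^2$.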
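Because ρ is set implicitly through the input resolution, it can help to compute the implied ρ and the resulting cost reduction for each typical resolution. The following is a minimal sketch under stated assumptions: 224 is taken as the baseline resolution, the feature map size of the example layer scales proportionally with the input, and the function names and layer sizes are hypothetical.

```python
BASE_RESOLUTION = 224  # baseline input resolution, corresponding to rho = 1


def implied_rho(resolution: int) -> float:
    """Resolution multiplier implied by choosing a given input resolution."""
    return resolution / BASE_RESOLUTION


def scaled_mult_adds(d_k: int, m: int, n: int, d_f: int,
                     alpha: float = 1.0, rho: float = 1.0) -> int:
    """Depthwise separable mult-adds with width multiplier alpha and resolution multiplier rho."""
    m_thin, n_thin = int(alpha * m), int(alpha * n)
    d_f_small = int(round(rho * d_f))  # rho * D_F feature map side
    return d_f_small * d_f_small * (d_k * d_k * m_thin + m_thin * n_thin)


# Illustrative layer (assumed values): 14x14 feature map at 224 input resolution.
baseline = scaled_mult_adds(3, 512, 512, 14)
cost_ratios = {}
for resolution in (224, 192, 160, 128):
    rho = implied_rho(resolution)
    cost_ratios[resolution] = scaled_mult_adds(3, 512, 512, 14, rho=rho) / baseline
    print(resolution, round(rho, 3), round(cost_ratios[resolution], 3))
```

The two multipliers compose: applying both scales the cost by roughly $\alpha^2 \rho^2$, which is why they are treated as independent knobs for trading accuracy against latency.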



    4 Experiments


    4.1 Model Choices


    4.2 Model Shrinking Hyperparameters












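    4.3 Fine-Grained Recognition

    We train MobileNet for fine-grained recognition on the Stanford Dogs dataset [17]. We extend the approach of [18] and collect from the web a training set that is larger but noisier than the one in [18]. The noisy web data is used to pretrain a fine-grained dog recognition model, which is then fine-tuned on the Stanford Dogs training set. Results on the Stanford Dogs test set are shown in Table 10. MobileNet can almost achieve the state-of-the-art results from [18] at greatly reduced computation and size.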


    4.4 Large-Scale Geolocalization


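    We retrain PlaNet using the MobileNet architecture on the same data. The full PlaNet model, based on the Inception V3 architecture [31], has 52 million parameters and 5.74 billion mult-adds. The MobileNet model has only 13 million parameters, with the usual 3 million for the body and 10 million for the final layer, and 0.58 million mult-adds. As shown in Table 11, the MobileNet version delivers only slightly decreased performance compared to PlaNet despite being much more compact. Moreover, it still outperforms Im2GPS by a large margin.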


    4.5 Face Attributes




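    4.6 Object Detection

    MobileNet can also be deployed as an effective base network in modern object detection systems. We report results for MobileNet trained for object detection on COCO data, based on the recent work that won the 2016 COCO challenge [10]. In Table 13, MobileNet is compared with VGG under both the Faster-RCNN [23] and SSD [21] frameworks. In our experiments, SSD is evaluated at 300 input resolution (SSD 300), and Faster-RCNN is compared at both 300 and 600 input resolution (Faster-RCNN 300, Faster-RCNN 600). The Faster-RCNN model evaluates 300 RPN proposal boxes per image. The models are trained on COCO train+val, excluding 8k minival images, and evaluated on minival. For both frameworks, MobileNet achieves results comparable to other networks with only a fraction of the computational complexity and model size.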


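    4.7 Face Embeddings

    The FaceNet model is a state-of-the-art face recognition model [25]. It builds face embeddings based on the triplet loss. To build a mobile FaceNet model, we use distillation, training by minimizing the squared differences between the outputs of FaceNet and MobileNet on the training data. Results for very small MobileNet models can be found in Table 14.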
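The distillation objective described here, minimizing the squared differences between the teacher (FaceNet) and student (MobileNet) outputs, can be written down in a few lines. The sketch below is a plain-Python illustration only: the function name, the toy embeddings, and the choice to average over samples are assumptions, since the text does not specify the normalization.

```python
def squared_difference_loss(teacher_embeddings, student_embeddings):
    """Summed squared difference per sample, averaged over the batch (assumed normalization)."""
    total = 0.0
    for teacher, student in zip(teacher_embeddings, student_embeddings):
        total += sum((t - s) ** 2 for t, s in zip(teacher, student))
    return total / len(teacher_embeddings)


# Toy 2-dimensional embeddings for two images (made-up values).
teacher = [[0.0, 1.0], [1.0, 0.0]]  # stands in for FaceNet outputs
student = [[0.0, 0.5], [1.0, 0.0]]  # stands in for MobileNet outputs
loss = squared_difference_loss(teacher, student)
print(loss)  # 0.125
```

The loss is zero only when the student reproduces the teacher's embedding for every training image, so no identity labels are needed for this stage.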


    5 Conclusion



    import os
    import sys
    import numpy as np
    import pandas as pd
    from typing import Any
    from matplotlib import pyplot as plt
    import torch
    import torch.nn as nn
    import torch.nn.init as init
    import torch.nn.functional as f
    import torch.optim as optim
    from torch.utils.data import Dataset, DataLoader
    try:
        from torch.hub import load_state_dict_from_url
    except ImportError:
        from torch.utils.model_zoo import load_url as load_state_dict_from_url
    # Select the GPU to use
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    # Training hyperparameters
    batch_size = 256
    num_works = 4
    lr = 1e-4
    epochs = 100
    image_size = 224
    # Load the data
    from torchvision import datasets, transforms
    # Assumed preprocessing (data_transform is not defined elsewhere in this listing):
    # resize the CIFAR-10 images to image_size and convert them to tensors.
    data_transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor(),
    ])
    train_data = datasets.CIFAR10(root='./', train=True, download=True, transform=data_transform)
    test_data = datasets.CIFAR10(root='./', train=False, download=True, transform=data_transform)
    # Wrap the datasets in data loaders
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_works, drop_last=True)
    test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=num_works)
    # Inspect one batch
    image, label = next(iter(train_loader))
    print(image.shape, label.shape)
    # plt.imshow(image[0][0], cmap='gray')
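    # Build the model
    class MobileNetV1(nn.Module):
        def __init__(self, input_dim, num_classes=1000):
            super().__init__()

            # BatchNorm + ReLU after each convolution are assumed here,
            # following the layer design described in the MobileNet paper.
            def conv_bn(inp, output, stride):
                # standard 3x3 convolution
                return nn.Sequential(
                    nn.Conv2d(inp, output, 3, stride, 1, bias=False),
                    nn.BatchNorm2d(output),
                    nn.ReLU(inplace=True),
                )

            def conv_dw_pw(inp, output, stride):
                return nn.Sequential(
                    # depthwise: one 3x3 filter per input channel (groups=inp)
                    nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                    nn.BatchNorm2d(inp),
                    nn.ReLU(inplace=True),
                    # pointwise: 1x1 convolution that combines the channels
                    nn.Conv2d(inp, output, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(output),
                    nn.ReLU(inplace=True),
                )

            self.model = nn.Sequential(
                conv_bn(input_dim, 32, 2),
                conv_dw_pw(32, 64, 1),
                conv_dw_pw(64, 128, 2),
                conv_dw_pw(128, 128, 1),
                conv_dw_pw(128, 256, 2),
                conv_dw_pw(256, 256, 1),
                conv_dw_pw(256, 512, 2),
                conv_dw_pw(512, 512, 1),
                conv_dw_pw(512, 512, 1),
                conv_dw_pw(512, 512, 1),
                conv_dw_pw(512, 512, 1),
                conv_dw_pw(512, 512, 1),
                conv_dw_pw(512, 1024, 2),
                conv_dw_pw(1024, 1024, 1),
                # Assumed global average pooling, so that forward() can
                # flatten the output to 1024 features per image.
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(1024, num_classes)

        def forward(self, x):
            x = self.model(x)
            x = x.view(-1, 1024)
            x = self.fc(x)
            return x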
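    # Instantiate the model
    # (CIFAR-10 has 10 classes, so num_classes=10 would suffice; the original value is kept)
    model = MobileNetV1(3, 1000).cuda()
    # Define the loss function and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    # Logging
    from torch.utils.tensorboard import SummaryWriter
    writer1 = SummaryWriter('./runs/loss')
    writer2 = SummaryWriter('./runs/acc')
    # Training and validation loops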
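    def train(epoch):
        # Switch to training mode (affects BatchNorm)
        model.train()
        train_loss = 0
        for data, label in train_loader:
            data, label = data.cuda(), label.cuda()
            # Reset gradients, run forward and backward passes, update the weights
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, label)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * data.size(0)
        # Average training loss per sample
        train_loss = train_loss / len(train_loader.dataset)
        writer1.add_scalar('train_loss', train_loss, epoch)
        print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))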
    def val(epoch):
        # Switch to evaluation mode
        model.eval()
        val_loss = 0
        gt_labels = []
        pred_labels = []
        # Disable gradient tracking
        with torch.no_grad():
            for data, label in test_loader:
                data, label = data.cuda(), label.cuda()
                output = model(data)
                preds = torch.argmax(output, 1)
                # Collect ground-truth and predicted labels for the accuracy
                gt_labels.append(label.cpu().numpy())
                pred_labels.append(preds.cpu().numpy())
                loss = criterion(output, label)
                val_loss += loss.item() * data.size(0)
        # Average validation loss per sample
        val_loss = val_loss / len(test_loader.dataset)
        writer1.add_scalar('val_loss', val_loss, epoch)
        gt_labels, pred_labels = np.concatenate(gt_labels), np.concatenate(pred_labels)
        # Compute the accuracy
        acc = np.sum(gt_labels == pred_labels) / len(pred_labels)
        writer2.add_scalar('acc', acc, epoch)
        print('Epoch: {} \tValidation Loss: {:.6f}, Accuracy: {:.6f}'.format(epoch, val_loss, acc))
    for epoch in range(1, epochs+1):
        train(epoch)
        val(epoch)

    Training log:
    Epoch: 1 	Training Loss: 0.542552
    Epoch: 1 	Validation Loss: 1.102596, Accuracy: 0.636600
    Epoch: 2 	Training Loss: 0.456910
    Epoch: 2 	Validation Loss: 1.116227, Accuracy: 0.637100
    Epoch: 3 	Training Loss: 0.394289
    Epoch: 3 	Validation Loss: 1.163121, Accuracy: 0.642500
    Epoch: 4 	Training Loss: 0.332055
    Epoch: 4 	Validation Loss: 1.204307, Accuracy: 0.635500
    Epoch: 5 	Training Loss: 0.282552
    Epoch: 5 	Validation Loss: 1.317173, Accuracy: 0.632500
    Epoch: 6 	Training Loss: 0.241465
    Epoch: 6 	Validation Loss: 1.304476, Accuracy: 0.644200
    Epoch: 7 	Training Loss: 0.212577
    Epoch: 7 	Validation Loss: 1.297210, Accuracy: 0.652200
    Epoch: 8 	Training Loss: 0.174558
    Epoch: 8 	Validation Loss: 1.427903, Accuracy: 0.629300
    Epoch: 9 	Training Loss: 0.162358
    Epoch: 9 	Validation Loss: 1.316071, Accuracy: 0.655200
    Epoch: 10 	Training Loss: 0.151313
    Epoch: 10 	Validation Loss: 1.385027, Accuracy: 0.652200
    Epoch: 11 	Training Loss: 0.134953
    Epoch: 11 	Validation Loss: 1.348235, Accuracy: 0.652200
    Epoch: 12 	Training Loss: 0.112869
    Epoch: 12 	Validation Loss: 1.455159, Accuracy: 0.647700
    Epoch: 13 	Training Loss: 0.100508
    Epoch: 13 	Validation Loss: 1.478256, Accuracy: 0.655700
    Epoch: 14 	Training Loss: 0.104845
    Epoch: 14 	Validation Loss: 1.470615, Accuracy: 0.651800
    Epoch: 15 	Training Loss: 0.092965
    Epoch: 15 	Validation Loss: 1.486673, Accuracy: 0.652200
    Epoch: 16 	Training Loss: 0.093700
    Epoch: 16 	Validation Loss: 1.480290, Accuracy: 0.657300
    Epoch: 17 	Training Loss: 0.091183
    Epoch: 17 	Validation Loss: 1.496931, Accuracy: 0.659100
    Epoch: 18 	Training Loss: 0.093449
    Epoch: 18 	Validation Loss: 1.545923, Accuracy: 0.655500
    Epoch: 19 	Training Loss: 0.089333
    Epoch: 19 	Validation Loss: 1.656608, Accuracy: 0.652300
    Epoch: 20 	Training Loss: 0.080280
    Epoch: 20 	Validation Loss: 1.691422, Accuracy: 0.637400
    Epoch: 21 	Training Loss: 0.076340
    Epoch: 21 	Validation Loss: 1.553575, Accuracy: 0.667200
    Epoch: 22 	Training Loss: 0.074090
    Epoch: 22 	Validation Loss: 1.529634, Accuracy: 0.669000
    Epoch: 23 	Training Loss: 0.063183
    Epoch: 23 	Validation Loss: 1.581277, Accuracy: 0.667900
    Epoch: 24 	Training Loss: 0.059101
    Epoch: 24 	Validation Loss: 1.594428, Accuracy: 0.666500
    Epoch: 25 	Training Loss: 0.064686
    Epoch: 25 	Validation Loss: 1.653475, Accuracy: 0.659600
    Epoch: 26 	Training Loss: 0.072094
    Epoch: 26 	Validation Loss: 1.603179, Accuracy: 0.666100
    Epoch: 27 	Training Loss: 0.061241
    Epoch: 27 	Validation Loss: 1.615846, Accuracy: 0.668100
    Epoch: 28 	Training Loss: 0.064317
    Epoch: 28 	Validation Loss: 1.692577, Accuracy: 0.665700
    Epoch: 29 	Training Loss: 0.065329
    Epoch: 29 	Validation Loss: 1.705178, Accuracy: 0.661300
    Epoch: 30 	Training Loss: 0.062401
    Epoch: 30 	Validation Loss: 1.679631, Accuracy: 0.657900
    Epoch: 31 	Training Loss: 0.056963
    Epoch: 31 	Validation Loss: 1.773723, Accuracy: 0.665400
    Epoch: 32 	Training Loss: 0.052961
    Epoch: 32 	Validation Loss: 1.673505, Accuracy: 0.675400
    Epoch: 33 	Training Loss: 0.061805
    Epoch: 33 	Validation Loss: 1.882510, Accuracy: 0.648500
    Epoch: 34 	Training Loss: 0.056032
    Epoch: 34 	Validation Loss: 1.658725, Accuracy: 0.671600
    Epoch: 35 	Training Loss: 0.045075
    Epoch: 35 	Validation Loss: 1.644350, Accuracy: 0.677000
    Epoch: 36 	Training Loss: 0.043535
    Epoch: 36 	Validation Loss: 1.666007, Accuracy: 0.677900
    Epoch: 37 	Training Loss: 0.051126
    Epoch: 37 	Validation Loss: 1.714207, Accuracy: 0.672100
    Epoch: 38 	Training Loss: 0.054347
    Epoch: 38 	Validation Loss: 1.628756, Accuracy: 0.674000
    Epoch: 39 	Training Loss: 0.046926
    Epoch: 39 	Validation Loss: 1.661052, Accuracy: 0.674700
    Epoch: 40 	Training Loss: 0.049473
    Epoch: 40 	Validation Loss: 1.649359, Accuracy: 0.687500
    Epoch: 41 	Training Loss: 0.052907
    Epoch: 41 	Validation Loss: 1.650108, Accuracy: 0.684400
    Epoch: 42 	Training Loss: 0.048754
    Epoch: 42 	Validation Loss: 1.744841, Accuracy: 0.677700
    Epoch: 43 	Training Loss: 0.050814
    Epoch: 43 	Validation Loss: 1.673245, Accuracy: 0.683600
    Epoch: 44 	Training Loss: 0.040706
    Epoch: 44 	Validation Loss: 1.813371, Accuracy: 0.672600
    Epoch: 45 	Training Loss: 0.046347
    Epoch: 45 	Validation Loss: 1.720800, Accuracy: 0.680300
    Epoch: 46 	Training Loss: 0.039459
    Epoch: 46 	Validation Loss: 1.677644, Accuracy: 0.686300
    Epoch: 47 	Training Loss: 0.038748
    Epoch: 47 	Validation Loss: 1.794113, Accuracy: 0.670000
    Epoch: 48 	Training Loss: 0.037007
    Epoch: 48 	Validation Loss: 1.848523, Accuracy: 0.668700
    Epoch: 49 	Training Loss: 0.040037
    Epoch: 49 	Validation Loss: 1.804912, Accuracy: 0.674400
    Epoch: 50 	Training Loss: 0.038721
    Epoch: 50 	Validation Loss: 1.715275, Accuracy: 0.689200
    Epoch: 51 	Training Loss: 0.039992
    Epoch: 51 	Validation Loss: 1.801695, Accuracy: 0.671300
    Epoch: 52 	Training Loss: 0.039914
    Epoch: 52 	Validation Loss: 1.691090, Accuracy: 0.683200
    Epoch: 53 	Training Loss: 0.035851
    Epoch: 53 	Validation Loss: 1.692445, Accuracy: 0.692500
    Epoch: 54 	Training Loss: 0.032450
    Epoch: 54 	Validation Loss: 1.677598, Accuracy: 0.692100
    Epoch: 55 	Training Loss: 0.042038
    Epoch: 55 	Validation Loss: 1.776169, Accuracy: 0.673200
    Epoch: 56 	Training Loss: 0.037488
    Epoch: 56 	Validation Loss: 1.752145, Accuracy: 0.683700
    Epoch: 57 	Training Loss: 0.038894
    Epoch: 57 	Validation Loss: 1.721822, Accuracy: 0.683900
    Epoch: 58 	Training Loss: 0.045965
    Epoch: 58 	Validation Loss: 1.726646, Accuracy: 0.694800
    Epoch: 59 	Training Loss: 0.028319
    Epoch: 59 	Validation Loss: 1.672058, Accuracy: 0.700900
    Epoch: 60 	Training Loss: 0.027599
    Epoch: 60 	Validation Loss: 1.717727, Accuracy: 0.698900
    Epoch: 61 	Training Loss: 0.029994
    Epoch: 61 	Validation Loss: 1.676433, Accuracy: 0.695300
    Epoch: 62 	Training Loss: 0.035561
    Epoch: 62 	Validation Loss: 1.759827, Accuracy: 0.684600
    Epoch: 63 	Training Loss: 0.033197
    Epoch: 63 	Validation Loss: 1.749903, Accuracy: 0.693200
    Epoch: 64 	Training Loss: 0.031044
    Epoch: 64 	Validation Loss: 1.795904, Accuracy: 0.693800
    Epoch: 65 	Training Loss: 0.030796
    Epoch: 65 	Validation Loss: 1.743078, Accuracy: 0.688700
    Epoch: 66 	Training Loss: 0.034064
    Epoch: 66 	Validation Loss: 1.723591, Accuracy: 0.691700
    Epoch: 67 	Training Loss: 0.031070
    Epoch: 67 	Validation Loss: 1.715376, Accuracy: 0.696200
    Epoch: 68 	Training Loss: 0.033009
    Epoch: 68 	Validation Loss: 1.800107, Accuracy: 0.692200
    Epoch: 69 	Training Loss: 0.031370
    Epoch: 69 	Validation Loss: 1.707125, Accuracy: 0.700900
    Epoch: 70 	Training Loss: 0.030852
    Epoch: 70 	Validation Loss: 1.685999, Accuracy: 0.702800
    Epoch: 71 	Training Loss: 0.028473
    Epoch: 71 	Validation Loss: 1.732791, Accuracy: 0.697500
    Epoch: 72 	Training Loss: 0.039392
    Epoch: 72 	Validation Loss: 1.700891, Accuracy: 0.698900
    Epoch: 73 	Training Loss: 0.027731
    Epoch: 73 	Validation Loss: 1.695288, Accuracy: 0.699700
    Epoch: 74 	Training Loss: 0.024269
    Epoch: 74 	Validation Loss: 1.760297, Accuracy: 0.694400
    Epoch: 75 	Training Loss: 0.022800
    Epoch: 75 	Validation Loss: 1.713674, Accuracy: 0.698300
    Epoch: 76 	Training Loss: 0.022720
    Epoch: 76 	Validation Loss: 1.760442, Accuracy: 0.698500
    Epoch: 77 	Training Loss: 0.024492
    Epoch: 77 	Validation Loss: 1.793320, Accuracy: 0.700000
    Epoch: 78 	Training Loss: 0.024060
    Epoch: 78 	Validation Loss: 1.711711, Accuracy: 0.704400
    Epoch: 79 	Training Loss: 0.021388
    Epoch: 79 	Validation Loss: 1.719663, Accuracy: 0.703800
    Epoch: 80 	Training Loss: 0.028990
    Epoch: 80 	Validation Loss: 1.796706, Accuracy: 0.697200
    Epoch: 81 	Training Loss: 0.025605
    Epoch: 81 	Validation Loss: 1.716452, Accuracy: 0.704000
    Epoch: 82 	Training Loss: 0.030471
    Epoch: 82 	Validation Loss: 1.837791, Accuracy: 0.701600
    Epoch: 83 	Training Loss: 0.030337
    Epoch: 83 	Validation Loss: 1.726494, Accuracy: 0.703300
    Epoch: 84 	Training Loss: 0.030966
    Epoch: 84 	Validation Loss: 1.693119, Accuracy: 0.707400
    Epoch: 85 	Training Loss: 0.027364
    Epoch: 85 	Validation Loss: 1.655064, Accuracy: 0.712800
    Epoch: 86 	Training Loss: 0.024018
    Epoch: 86 	Validation Loss: 1.644200, Accuracy: 0.709800
    Epoch: 87 	Training Loss: 0.019428
    Epoch: 87 	Validation Loss: 1.693605, Accuracy: 0.705800
    Epoch: 88 	Training Loss: 0.020505
    Epoch: 88 	Validation Loss: 1.687958, Accuracy: 0.713800
    Epoch: 89 	Training Loss: 0.023152
    Epoch: 89 	Validation Loss: 1.728888, Accuracy: 0.711500
    Epoch: 90 	Training Loss: 0.024781
    Epoch: 90 	Validation Loss: 1.808021, Accuracy: 0.701900
    Epoch: 91 	Training Loss: 0.022800
    Epoch: 91 	Validation Loss: 1.739466, Accuracy: 0.707900
    Epoch: 92 	Training Loss: 0.027215
    Epoch: 92 	Validation Loss: 1.729207, Accuracy: 0.709000
    Epoch: 93 	Training Loss: 0.020297
    Epoch: 93 	Validation Loss: 1.739426, Accuracy: 0.709900
    Epoch: 94 	Training Loss: 0.021358
    Epoch: 94 	Validation Loss: 1.711817, Accuracy: 0.708100
    Epoch: 95 	Training Loss: 0.022679
    Epoch: 95 	Validation Loss: 1.677134, Accuracy: 0.714900
    Epoch: 96 	Training Loss: 0.028576
    Epoch: 96 	Validation Loss: 1.729768, Accuracy: 0.714400
    Epoch: 97 	Training Loss: 0.022231
    Epoch: 97 	Validation Loss: 1.688242, Accuracy: 0.719900
    Epoch: 98 	Training Loss: 0.020768
    Epoch: 98 	Validation Loss: 1.756792, Accuracy: 0.709500
    Epoch: 99 	Training Loss: 0.024385
    Epoch: 99 	Validation Loss: 1.798574, Accuracy: 0.705400
