YOLO-V3

Background

  YOLO-V3 (You Only Look Once): published in 2018. As the name suggests, the network looks at the image only once, which gives it high detection speed; it is a typical one-stage detector and the third version of the YOLO series.

[Figure: YOLO-V3]

Differences Between YOLO-V3 and SSD

  The backbones differ: SSD extracts features with VGG, while YOLO-V3 uses Darknet-53, a ResNet-style improvement, and performs deep feature fusion across layers.
  The priors differ: SSD computes its priors from each layer's scale and a set of aspect ratios, while YOLO-V3 obtains each layer's priors by clustering a large amount of ground-truth data.
  The encoding/decoding functions differ: SSD predicts relative offsets for both the center and the width/height, while YOLO-V3 predicts an absolute offset for the center and relative offsets for the width and height.
  The loss functions differ: SSD uses Smooth-L1 loss plus multi-class cross-entropy, while YOLO-V3 uses MSE plus binary cross-entropy.
  The prediction rules differ: SSD takes a softmax over the classes and keeps only the highest-scoring class, while YOLO-V3 passes each class score through a sigmoid and multiplies it by the objectness confidence; every class whose product exceeds the threshold counts as present, so a single predicted box can contain multiple objects (a minimal sketch of this difference follows the list).
  The positive/negative sample assignment differs: SSD marks a prior positive whenever its IOU with a ground-truth box exceeds a set value, then fixes a 1:3 positive-to-negative ratio to pick negatives; YOLO-V3 takes the single prior closest to each ground-truth box as the positive and picks negatives from the remaining priors whose IOU is below a set value.
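
To make the prediction difference concrete, here is a minimal numpy sketch (the logits and threshold are illustrative values, not taken from either paper) contrasting SSD-style softmax scoring with YOLO-V3-style sigmoid-times-objectness scoring:

import numpy as np

# hypothetical raw class scores (logits) for a single predicted box over 3 classes
logits = np.array([2.0, 1.5, -1.0])
objectness = 0.9  # YOLO-V3 objectness confidence for the same box

# SSD-style: softmax forces a single winning class
softmax = np.exp(logits) / np.exp(logits).sum()
print(softmax.argmax())           # index of the one predicted class

# YOLO-V3-style: independent sigmoids scaled by objectness;
# every class whose score clears the threshold is reported
sigmoid = 1 / (1 + np.exp(-logits))
scores = objectness * sigmoid
print(np.where(scores > 0.5)[0])  # possibly several classes for one box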

YOLO-V3 Network Diagram

[Figure: YOLO-V3]

TensorFlow 2.0 Implementation

from functools import reduce
import tensorflow.keras as keras


def compose(*funcs):
    # chain single-argument calls left to right: compose(f, g)(x) == g(f(x))
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')


class Conv_Bn_Relu(keras.layers.Layer):
    # Conv2D -> BatchNormalization -> LeakyReLU, the basic Darknet-53 unit
    def __init__(self, filters, kernel_size, strides, padding, **kwargs):
        super(Conv_Bn_Relu, self).__init__(**kwargs)
        self.blocks = keras.Sequential()
        self.blocks.add(keras.layers.Conv2D(filters=filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_regularizer=keras.regularizers.l2(5e-4)))
        self.blocks.add(keras.layers.BatchNormalization())
        self.blocks.add(keras.layers.LeakyReLU(0.1))

    def call(self, inputs, **kwargs):
        return self.blocks(inputs)


def block(x, filters, times, name):
    # one Darknet stage: stride-2 downsampling followed by `times` residual blocks
    x = compose(keras.layers.ZeroPadding2D((1, 1), name='{}_zeropadding'.format(name)),
                Conv_Bn_Relu(filters, (3, 3), (2, 2), 'valid', name='{}_conv_bn_relu'.format(name)))(x)

    for i in range(times):
        shortcut = x
        x = compose(Conv_Bn_Relu(filters // 2, (1, 1), (1, 1), 'same', name='{}_resblock{}_conv1'.format(name, i + 1)),
                    Conv_Bn_Relu(filters, (3, 3), (1, 1), 'same', name='{}_resblock{}_conv2'.format(name, i + 1)))(x)
        x = keras.layers.Add(name='{}_resblock{}_add'.format(name, i + 1))([x, shortcut])

    return x


def five_conv(x, filters, name):
    # the five alternating 1x1 / 3x3 convolutions applied before each detection head
    x = compose(Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu1'.format(name)),
                Conv_Bn_Relu(filters * 2, (3, 3), (1, 1), 'same', name='{}_conv_bn_relu2'.format(name)),
                Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu3'.format(name)),
                Conv_Bn_Relu(filters * 2, (3, 3), (1, 1), 'same', name='{}_conv_bn_relu4'.format(name)),
                Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu5'.format(name)))(x)

    return x


def yolo_v3(input_shape):
    input_tensor = keras.layers.Input(input_shape, name='input')
    x = input_tensor

    # Darknet-53 backbone
    x1 = Conv_Bn_Relu(32, (3, 3), (1, 1), 'same', name='conv_bn_relu')(x)
    x1 = block(x1, 64, 1, name='block1')
    x1 = block(x1, 128, 2, name='block2')
    x1 = block(x1, 256, 8, name='block3')    # 52x52 feature map

    x2 = block(x1, 512, 8, name='block4')    # 26x26 feature map

    x3 = block(x2, 1024, 4, name='block5')   # 13x13 feature map
    feature3 = five_conv(x3, 512, name='feature3')

    # 13x13 head: 3 anchors x (4 + 1 + 80) = 255 channels
    pred3 = compose(Conv_Bn_Relu(1024, (3, 3), (1, 1), 'same', name='pred3_conv1'),
                    keras.layers.Conv2D(3 * 85, (1, 1), (1, 1), 'same', name='pred3_conv2'),
                    keras.layers.Flatten(name='pred3_flatten'))(feature3)

    # upsample and fuse with the 26x26 feature map
    upsampling2 = compose(Conv_Bn_Relu(256, (1, 1), (1, 1), 'same', name='conv_bn_relu2'),
                          keras.layers.UpSampling2D((2, 2), name='upsampling2'))(feature3)
    concatenate2 = keras.layers.Concatenate(name='concatenate2')([upsampling2, x2])
    feature2 = five_conv(concatenate2, 256, name='feature2')
    pred2 = compose(Conv_Bn_Relu(512, (3, 3), (1, 1), 'same', name='pred2_conv1'),
                    keras.layers.Conv2D(3 * 85, (1, 1), (1, 1), 'same', name='pred2_conv2'),
                    keras.layers.Flatten(name='pred2_flatten'))(feature2)

    # upsample and fuse with the 52x52 feature map
    upsampling1 = compose(Conv_Bn_Relu(128, (1, 1), (1, 1), 'same', name='conv_bn_relu1'),
                          keras.layers.UpSampling2D((2, 2), name='upsampling1'))(feature2)
    concatenate1 = keras.layers.Concatenate(name='concatenate1')([upsampling1, x1])
    feature1 = five_conv(concatenate1, 128, name='feature1')
    pred1 = compose(Conv_Bn_Relu(256, (3, 3), (1, 1), 'same', name='pred1_conv1'),
                    keras.layers.Conv2D(3 * 85, (1, 1), (1, 1), 'same', name='pred1_conv2'),
                    keras.layers.Flatten(name='pred1_flatten'))(feature1)

    concatenate = keras.layers.Concatenate(name='concatenate')([pred1, pred2, pred3])

    # (52^2 + 26^2 + 13^2) * 3 = 10647 priors, each carrying 85 values
    output = keras.layers.Reshape((10647, 85), name='reshape')(concatenate)

    model = keras.Model(input_tensor, output, name='YOLO-V3')

    return model


if __name__ == '__main__':
    model = yolo_v3(input_shape=(416, 416, 3))
    model.build(input_shape=(None, 416, 416, 3))
    model.summary()

[Figure: YOLO-V3]

Complete Walkthrough on the Shape Dataset

Project Directory Layout

  • project
    • shape
      • train_imgs (training-set images)
      • annotations (training-set labels)
      • test_imgs (test-set images)
    • Yolo_V3_weight (model weights)
    • Yolo_V3_test_result (test-set results)
    • YOLO-V3.py

Walkthrough Steps

  1. Object detection and semantic segmentation are different kinds of projects, and a detection pipeline is considerably harder to build. The first step is to read the ground-truth box information and save it for the encoding stage later.
  2. Build the priors. Following the network structure, lay out different priors on each feature layer; the total count is, for each regression/classification layer, the number of pixels times the number of priors per pixel. With the paper's setup there are 3 feature layers of size 52x52, 26x26 and 13x13, each with 3 priors per pixel (the short check after this list confirms the count).
    [Figure: anchors]
    $$ 52^2 \times 3 + 26^2 \times 3 + 13^2 \times 3 = 10647 $$
    So there are 10647 priors in total.
  3. Compute the IOU of every prior against each ground-truth box and take the prior with the largest IOU as the positive sample. Then encode: set the objectness confidence at that position to 1, set the matching class confidence to 1 and the other class confidences to 0, and compute the differences between the positive prior's center/width/height and the ground-truth box's. The output is (batch_size, num_prior, 4 + 1 + num_class + 1), where num_prior is the number of priors and each prior carries 4 + 1 + num_class + 1 values: 4 for the center and width/height offsets, 1 for the objectness confidence, num_class for the per-class confidences, and a final 1 for the IOU between the ground-truth box and the prior, which makes it easy to select negatives when computing the loss. Encoding answers the question of what the network's output should look like for a given ground truth, so that the two can be pushed as close together as possible.
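
A quick sanity check of that prior count, using the paper's layer sizes (the shape-dataset code later uses its own, smaller configuration):

# priors per detection layer = feature-map area x 3 anchors per cell (paper configuration)
layer_sizes = [52, 26, 13]
print(sum(s * s * 3 for s in layer_sizes))  # 10647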

IOU (Intersection Over Union): the area where two regions intersect, divided by the area of their union; the same quantity, averaged, is also the standard metric (mean IOU) for semantic segmentation. The figure below shows the computation directly.
[Figure: IOU]
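
As a minimal sketch, the same computation for two corner-format boxes (the iou() function in the full code below performs this against all priors at once):

import numpy as np

def iou_single(a, b):
    # a, b: [xmin, ymin, xmax, ymax]
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou_single([0, 0, 2, 2], [1, 1, 3, 3]))  # 1 / 7 ≈ 0.143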
4. Design the loss function. Because most priors are negatives, the loss cannot be computed over everything directly: take the prior closest to each ground-truth box as the positive, then pick negatives from the remaining priors whose IOU is below the set value.
5. Build the network, set suitable parameters, and train.
6. At prediction time, decode the network output (the inverse of encoding): keep as candidates the priors whose objectness confidence times class confidence exceeds the set value, then use the prior coordinates and the 4 regression parameters to get each candidate's top-left and bottom-right corners. Run NMS on each class's candidates to obtain the predicted boxes, draw them on the image, and label them with their confidences to complete the detection task.
NMS (Non-Maximum Suppression): simply put, "if you are not the maximum, you are suppressed". In detection, the large number of priors on an image means many nearby boxes predict the same object, but only the highest-scoring prediction is kept; that is non-maximum suppression.
Steps:
(1) Start from the highest-probability box F and check whether the IOU of each of A~E with F exceeds the set threshold. Suppose B and D overlap F beyond the threshold: discard B and D, and mark F as the first box we keep.
(2) From the remaining boxes A, C and E, pick the highest-probability one, E; check E's overlap with A and C and discard whichever exceeds the threshold; mark E as the second kept box.
(3) Repeat step (2) until every box has been either discarded or kept.
[Figure: NMS]
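
These steps translate almost line for line into numpy. Here is a sketch of greedy per-class NMS (the full code below delegates the same job to tf.image.non_max_suppression):

import numpy as np

def nms(boxes, scores, iou_threshold):
    # boxes: (N, 4) in corner format; scores: (N,)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]             # the current maximum: keep it
        keep.append(best)
        rest = order[1:]
        # IOU of the kept box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        union = area_best + area_rest - inter
        # drop boxes that overlap the kept one too much; repeat on the rest
        order = rest[inter / union <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, 0.5))  # [0, 2]: box 1 overlaps box 0 too much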

Tips

  1. The network output has shape (batch_size, num_prior, 4 + 1 + num_class). This dataset has 3 classes, so the last dimension is 8. For each prior, indexes 0-3 are the regression parameters used to adjust the prior into a predicted box, index 4 is the objectness confidence, index 5 is circle, index 6 is triangle, and index 7 is square (see the slicing sketch after this list).
  2. Real projects usually also resize and augment the dataset. For simplicity none of that is done here, but remember to resize or pad your images as needed and to add rotations, contrast enhancement, affine transforms and similar augmentations to make the model more robust. Real images are also not necessarily named sequentially, so pay attention to how the file names are read.
  3. Weight checkpointing, learning-rate decay and early stopping are all configured.
  4. The yield keyword produces a generator, so the full dataset never has to be held in memory at once, which saves a great deal of memory.
  5. The 1000 samples are split into 800 for training, 100 for validation and 100 for testing; adjust the split as you like.
  6. Watch the dimension manipulations and the common numpy and tensorflow operations, otherwise the code may be hard to follow.
  7. YOLO-V3's feature extractor is Darknet-53; you can consult the feature-extraction-network posts, try other backbones, and compare parameter counts, running speed and final results.
  8. Normalize the input images to [0, 1] or [-1, 1] first; network weights are generally small, so normalized inputs are easier to compute with and converge faster.
  9. Choose the number of feature layers, the anchor shapes and counts, and the various thresholds according to the actual image size.
  10. The anchor sizes are determined by clustering and strongly affect detection quality; try different anchors and compare the test results.
  11. TF2.0 is not a very stable release and training sometimes stalls; adding a line that prints the loss value inside the loss function avoids the stall.
  12. Since this blog is a record of my own learning, meant for discussion and as a hands-on tutorial for beginners, the detection parameters here are my own trials rather than the optimum for this dataset; modify the network to fit your actual needs.
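
As a sketch of tip 1, slicing one batch's 8 values per prior out of the network output (the random tensor here is a stand-in for a real prediction):

import numpy as np

# stand-in for the network output: (batch_size, num_prior, 8) for this 3-class dataset
out = np.random.rand(8, 2688, 8)

reg = out[..., 0:4]       # indexes 0-3: regression parameters that adjust each prior
obj_conf = out[..., 4]    # index 4: objectness confidence
cls_conf = out[..., 5:8]  # indexes 5-7: circle, triangle, square confidences

# per-class detection score, as used at prediction time
scores = obj_conf[..., np.newaxis] * cls_conf
print(scores.shape)  # (8, 2688, 3)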

Complete Code

import colorsys
import os
import xml.etree.ElementTree as ET
from functools import reduce
import numpy as np
import cv2 as cv
import tensorflow as tf
import tensorflow.keras as keras


# build the priors for one feature layer
def get_prior(layer_id):
    layer_id = layer_id - 1

    box_widths = [x[1] for x in anchors[layer_id]]
    box_heights = [x[0] for x in anchors[layer_id]]

    step_x = img_size[1] / feature_map[layer_id]
    step_y = img_size[0] / feature_map[layer_id]
    linx = np.linspace(0.5 * step_x, img_size[1] - 0.5 * step_x, feature_map[layer_id])
    liny = np.linspace(0.5 * step_y, img_size[0] - 0.5 * step_y, feature_map[layer_id])

    centers_x, centers_y = np.meshgrid(linx, liny)
    centers_x = centers_x.reshape(-1, 1)
    centers_y = centers_y.reshape(-1, 1)

    # center coordinates of the priors
    prior_center = np.concatenate((centers_x, centers_y), axis=1)
    prior_center = np.tile(prior_center, (1, prior[layer_id] * 2))

    prior_lt_rb = prior_center.copy()

    # top-left and bottom-right corners of the priors
    prior_lt_rb[:, ::4] -= box_widths
    prior_lt_rb[:, 1::4] -= box_heights
    prior_lt_rb[:, 2::4] += box_widths
    prior_lt_rb[:, 3::4] += box_heights

    # normalize to [0, 1]
    prior_lt_rb[:, ::2] /= img_size[1]
    prior_lt_rb[:, 1::2] /= img_size[0]
    prior_lt_rb = prior_lt_rb.reshape(-1, 4)
    prior_lt_rb = np.minimum(np.maximum(prior_lt_rb, 0.0), 1.0)

    prior_center_wh = np.zeros_like(prior_lt_rb)
    # centers, widths and heights of the priors
    prior_center_wh[:, 0] = 0.5 * (prior_lt_rb[:, 2] + prior_lt_rb[:, 0])
    prior_center_wh[:, 1] = 0.5 * (prior_lt_rb[:, 3] + prior_lt_rb[:, 1])
    prior_center_wh[:, 2] = prior_lt_rb[:, 2] - prior_lt_rb[:, 0]
    prior_center_wh[:, 3] = prior_lt_rb[:, 3] - prior_lt_rb[:, 1]

    return prior_center_wh.astype(np.float32), prior_lt_rb.astype(np.float32)


# read the bounding-box information from the xml annotation files
def get_bbox(image_id, bbox_path, annotations_path):
    with open(bbox_path, 'w') as f:
        for id in image_id:
            # image path
            info = os.getcwd() + imgs_path[1:] + '\\' + str(id) + '.jpg'
            in_file = open(annotations_path + '\\' + str(id) + '.xml', encoding='utf-8')
            tree = ET.parse(in_file)
            root = tree.getroot()

            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text
                if cls not in classes or int(difficult) == 1:
                    continue
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
                info += " " + ",".join([str(x) for x in b]) + ',' + str(cls_id)
            f.write(info + '\n')


def compose(*funcs):
    # chain single-argument calls left to right: compose(f, g)(x) == g(f(x))
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')


class Conv_Bn_Relu(keras.layers.Layer):
    # Conv2D -> BatchNormalization -> LeakyReLU
    def __init__(self, filters, kernel_size, strides, padding, **kwargs):
        super(Conv_Bn_Relu, self).__init__(**kwargs)
        self.blocks = keras.Sequential()
        self.blocks.add(keras.layers.Conv2D(filters=filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_regularizer=keras.regularizers.l2(5e-4)))
        self.blocks.add(keras.layers.BatchNormalization())
        self.blocks.add(keras.layers.LeakyReLU(0.1))

    def call(self, inputs, **kwargs):
        return self.blocks(inputs)


def block(x, filters, times, name):
    # stride-2 downsampling followed by `times` residual blocks
    x = compose(keras.layers.ZeroPadding2D((1, 1), name='{}_zeropadding'.format(name)),
                Conv_Bn_Relu(filters, (3, 3), (2, 2), 'valid', name='{}_conv_bn_relu'.format(name)))(x)

    for i in range(times):
        shortcut = x
        x = compose(Conv_Bn_Relu(filters // 2, (1, 1), (1, 1), 'same', name='{}_resblock{}_conv1'.format(name, i + 1)),
                    Conv_Bn_Relu(filters, (3, 3), (1, 1), 'same', name='{}_resblock{}_conv2'.format(name, i + 1)))(x)
        x = keras.layers.Add(name='{}_resblock{}_add'.format(name, i + 1))([x, shortcut])

    return x


def five_conv(x, filters, name):
    x = compose(Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu1'.format(name)),
                Conv_Bn_Relu(filters * 2, (3, 3), (1, 1), 'same', name='{}_conv_bn_relu2'.format(name)),
                Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu3'.format(name)),
                Conv_Bn_Relu(filters * 2, (3, 3), (1, 1), 'same', name='{}_conv_bn_relu4'.format(name)),
                Conv_Bn_Relu(filters, (1, 1), (1, 1), 'same', name='{}_conv_bn_relu5'.format(name)))(x)

    return x


def small_yolo_v3(input_shape):
    input_tensor = keras.layers.Input(input_shape, name='input')
    x = input_tensor

    x1 = Conv_Bn_Relu(16, (3, 3), (1, 1), 'same', name='conv_bn_relu')(x)
    x1 = block(x1, 32, 2, name='block2')
    x1 = block(x1, 64, 2, name='block3')    # 32x32 feature map

    x2 = block(x1, 128, 2, name='block4')   # 16x16 feature map

    x3 = block(x2, 256, 2, name='block5')   # 8x8 feature map
    feature3 = five_conv(x3, 128, name='feature3')
    pred_reg3 = compose(Conv_Bn_Relu(256, (3, 3), (1, 1), 'same', name='pred3_reg_conv1'),
                        keras.layers.Conv2D(2 * 4, (1, 1), (1, 1), 'same', name='pred3_reg_conv2'),
                        keras.layers.Flatten(name='pred3_reg_flatten'))(feature3)

    pred_conf3 = compose(Conv_Bn_Relu(256, (3, 3), (1, 1), 'same', name='pred3_conf_conv1'),
                         keras.layers.Conv2D(2 * num_class, (1, 1), (1, 1), 'same', name='pred3_conf_conv2'),
                         keras.layers.Flatten(name='pred3_conf_flatten'))(feature3)

    upsampling2 = compose(Conv_Bn_Relu(64, (1, 1), (1, 1), 'same', name='conv_bn_relu2'),
                          keras.layers.UpSampling2D((2, 2), name='upsampling2'))(feature3)
    concatenate2 = keras.layers.Concatenate(name='concatenate2')([upsampling2, x2])
    feature2 = five_conv(concatenate2, 64, name='feature2')
    pred_reg2 = compose(Conv_Bn_Relu(128, (3, 3), (1, 1), 'same', name='pred2_reg_conv1'),
                        keras.layers.Conv2D(2 * 4, (1, 1), (1, 1), 'same', name='pred2_reg_conv2'),
                        keras.layers.Flatten(name='pred2_reg_flatten'))(feature2)

    pred_conf2 = compose(Conv_Bn_Relu(128, (3, 3), (1, 1), 'same', name='pred2_conf_conv1'),
                         keras.layers.Conv2D(2 * num_class, (1, 1), (1, 1), 'same', name='pred2_conf_conv2'),
                         keras.layers.Flatten(name='pred2_conf_flatten'))(feature2)

    upsampling1 = compose(Conv_Bn_Relu(32, (1, 1), (1, 1), 'same', name='conv_bn_relu1'),
                          keras.layers.UpSampling2D((2, 2), name='upsampling1'))(feature2)
    concatenate1 = keras.layers.Concatenate(name='concatenate1')([upsampling1, x1])
    feature1 = five_conv(concatenate1, 32, name='feature1')
    pred_reg1 = compose(Conv_Bn_Relu(64, (3, 3), (1, 1), 'same', name='pred1_reg_conv1'),
                        keras.layers.Conv2D(2 * 4, (1, 1), (1, 1), 'same', name='pred1_reg_conv2'),
                        keras.layers.Flatten(name='pred1_reg_flatten'))(feature1)

    pred_conf1 = compose(Conv_Bn_Relu(64, (3, 3), (1, 1), 'same', name='pred1_conf_conv1'),
                         keras.layers.Conv2D(2 * num_class, (1, 1), (1, 1), 'same', name='pred1_conf_conv2'),
                         keras.layers.Flatten(name='pred1_conf_flatten'))(feature1)

    concatenate_reg = keras.layers.Concatenate(name='concatenate_reg')([pred_reg1, pred_reg2, pred_reg3])
    concatenate_cls = keras.layers.Concatenate(name='concatenate_cls')([pred_conf1, pred_conf2, pred_conf3])

    reshape_reg = keras.layers.Reshape((num_prior, 4), name='reshape_reg')(concatenate_reg)
    reshape_cls = keras.layers.Reshape((num_prior, num_class), name='reshape_cls')(concatenate_cls)

    sigmoid_cls = keras.layers.Activation('sigmoid', name='sigmoid_cls')(reshape_cls)

    output = keras.layers.Concatenate(name='concatenate')([reshape_reg, sigmoid_cls])

    # output shape is [batch_size, num_prior, box regression + objectness + class confidences], here [8, 2688, 8]
    model = keras.Model(input_tensor, output, name='YOLO-V3')

    return model


# IOU between one ground-truth box and all priors
def iou(box):
    inter_upleft = np.maximum(prior_lt_rb[:, :2], box[:2])
    inter_botright = np.minimum(prior_lt_rb[:, 2:4], box[2:])

    inter_wh = inter_botright - inter_upleft
    inter_wh = np.maximum(inter_wh, 0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    # area of the ground-truth box
    area_true = (box[2] - box[0]) * (box[3] - box[1])
    # areas of the priors
    area_gt = (prior_lt_rb[:, 2] - prior_lt_rb[:, 0]) * (prior_lt_rb[:, 3] - prior_lt_rb[:, 1])
    # intersection over union
    union = area_true + area_gt - inter

    iou = inter / union

    return iou


# encode one ground-truth bounding box against the priors
def encoder(box):
    iou_val = iou(box)
    encoded_box = np.zeros((num_prior, 5))

    # priors that overlap this ground-truth box strongly enough
    assign_mask = iou_val > overlap_threshold
    encoded_box[:, -1][assign_mask] = iou_val[assign_mask]

    # center offset (in grid units) and log width/height ratio of the ground-truth box
    encoded_box[:, 0:2] = (0.5 * (box[:2] + box[2:]) - prior_center_wh[:, :2]) * feature_shape
    encoded_box[:, 2:4] = tf.math.log((box[2:] - box[:2]) / prior_center_wh[:, 2:])

    return encoded_box


# build the target labels, i.e. the y_true fed to the loss function
def assign_boxes(boxes):
    # shape num_prior * (4 + num_class + 1): 4 regression values, class confidences, and the iou
    assignment = np.zeros((num_prior, 4 + num_class + 1))
    if len(boxes) == 0:
        return assignment
    # compute the iou-based encoding for every ground-truth box
    encoded_boxes = np.apply_along_axis(f_encode, 1, boxes[:, :4])
    # encoded values plus iou, one row per ground-truth box
    encoded_boxes = encoded_boxes.reshape(-1, num_prior, 5)
    # for each ground-truth box, the index of the prior with the largest overlap
    best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=1)

    # first 4 columns: center and width/height offsets
    assignment[:, :4][best_iou_idx] = encoded_boxes[np.arange(len(best_iou_idx)), best_iou_idx, :4]
    # middle num_class columns: label information
    assignment[:, 4:-1][best_iou_idx] = boxes[..., 4:]
    # last column: iou
    assignment[:, -1] = encoded_boxes[:, :, -1].max(axis=0)
    return assignment


# produce an iterable of batches via yield
def generate_arrays_from_file(train_data, batch_size):
    # total number of samples
    n = len(train_data)
    i = 0
    while True:
        X_train = []
        Y_train = []
        # collect one batch
        while len(X_train) < batch_size:
            if i == 0:
                np.random.shuffle(train_data)
            # read the image from disk
            img = cv.imread(imgs_path + '\\' + str(train_data[i]) + '.jpg')
            img = img / 127.5 - 1
            info = np.array([list(map(int, x.split(','))) for x in bounding_info[train_data[i]].split()[3:]])
            if not len(info):
                i = (i + 1) % n
                continue
            box = (info[:, :4] + 1).astype(np.float32)
            box[:, [0, 2]] = box[:, [0, 2]] / img_size[1]
            box[:, [1, 3]] = box[:, [1, 3]] / img_size[0]
            label = np.eye(num_class)[np.array(info[:, 4] + 1, np.int32)]
            label[:, 0] = 1
            if ((box[:, 0] - box[:, 2]) >= 0).any() or ((box[:, 1] - box[:, 3]) >= 0).any():
                i = (i + 1) % n
                continue
            box = np.concatenate([box, label], axis=-1)
            X_train.append(img)
            y = assign_boxes(box)
            Y_train.append(y)
            i = (i + 1) % n
        yield tf.constant(X_train), tf.constant(Y_train)


# loss function
@tf.function
def compute_loss(y_true, y_pred):
    y_true = tf.reshape(y_true, (-1, 9))
    y_pred = tf.reshape(y_pred, (-1, 8))

    # positives carry an object; negatives are background priors with low iou
    pos = tf.equal(y_true[:, 4], 1)
    neg = tf.logical_and(tf.equal(y_true[:, 4], 0), tf.less(y_true[:, -1], overlap_threshold))

    y_true_pos = tf.boolean_mask(y_true[:, :-1], axis=0, mask=pos)
    y_true_neg = tf.boolean_mask(y_true[:, :-1], axis=0, mask=neg)
    y_pred_pos = tf.boolean_mask(y_pred, axis=0, mask=pos)
    y_pred_neg = tf.boolean_mask(y_pred, axis=0, mask=neg)
    y_true_valid = tf.concat([y_true_pos, y_true_neg], axis=0)
    y_pred_valid = tf.concat([y_pred_pos, y_pred_neg], axis=0)
    # MSE regression loss on positives only, binary cross-entropy on all valid samples
    reg_loss = tf.reduce_mean((y_true_pos[:, :4] - y_pred_pos[:, :4]) ** 2)
    conf_loss = tf.reduce_mean(keras.losses.binary_crossentropy(y_true_valid[:, 4:], y_pred_valid[:, 4:]))
    # printing the loss works around occasional TF2.0 stalls (see tip 11)
    tf.print(conf_loss)
    return reg_loss + conf_loss


# decode the network's regression output into candidate boxes
def decoder(loc):
    # prior centers and widths/heights
    prior_center_x = prior_center_wh[:, 0]
    prior_center_y = prior_center_wh[:, 1]
    prior_width = prior_center_wh[:, 2]
    prior_height = prior_center_wh[:, 3]

    # centers and widths/heights of the decoded boxes
    decode_bbox_center_x = (loc[:, 0] / feature_shape[:, 0] + prior_center_x)
    decode_bbox_center_y = (loc[:, 1] / feature_shape[:, 1] + prior_center_y)
    decode_bbox_width = np.exp(loc[:, 2]) * prior_width
    decode_bbox_height = np.exp(loc[:, 3]) * prior_height

    # top-left and bottom-right corners of the decoded boxes
    decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
    decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
    decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
    decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

    # stack the corners
    decode_bbox = np.concatenate((decode_bbox_xmin[:, np.newaxis], decode_bbox_ymin[:, np.newaxis], decode_bbox_xmax[:, np.newaxis], decode_bbox_ymax[:, np.newaxis]), axis=-1)
    # clip to [0, 1]
    decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
    return decode_bbox


# run non-maximum suppression over the candidates to get the final boxes
def detection_out(pred):
    # regression predictions
    mbox_loc = pred[:, :4]
    # confidence predictions
    mbox_conf = pred[:, 4:]
    results = []
    # decode this image's boxes
    decode_bbox = decoder(mbox_loc)
    for c in range(1, num_class):
        # class confidence times objectness confidence
        c_confs = mbox_conf[:, c] * mbox_conf[:, 0]
        c_confs_mask = c_confs > confidence_threshold
        if len(c_confs[c_confs_mask]) > 0:
            # boxes scoring above confidence_threshold
            boxes_to_process = decode_bbox[c_confs_mask]
            confs_to_process = c_confs[c_confs_mask]
            # iou-based non-maximum suppression
            idx = tf.image.non_max_suppression(boxes_to_process.astype(np.float32), confs_to_process, max_output_size=keep_top_k, iou_threshold=nms_thresh)
            idx = idx.numpy()
            # keep the survivors of the suppression
            box = boxes_to_process[idx]
            confs = confs_to_process[idx][:, np.newaxis]
            # stack label, confidence and box position
            labels = c * np.ones((len(idx), 1))
            c_pred = np.concatenate((labels, confs, box), axis=1)
            # append to the results
            results.extend(c_pred)
    if len(results) > 0:
        # sort by confidence
        results = np.array(results)
        arg = np.argsort(results[:, 1])[::-1][:keep_top_k]
        results = results[arg]
    return results


# predict on one image and draw the boxes
def detect_image(filename):
    test_img = cv.imread(filename)
    preds = tf.squeeze(model.predict(tf.constant([test_img / 127.5 - 1])), axis=0).numpy()

    # decode the predictions
    results = detection_out(preds)

    if len(results) <= 0:
        return test_img
    print(filename)
    # keep the boxes whose score clears the confidence threshold
    det_label = results[:, 0]
    det_conf = results[:, 1]
    det_xmin, det_ymin, det_xmax, det_ymax = results[:, 2], results[:, 3], results[:, 4], results[:, 5]
    indices = [index for index, conf in enumerate(det_conf) if conf >= confidence_threshold]
    top_conf = det_conf[indices]
    top_label_indices = det_label[indices].tolist()
    top_xmin = np.expand_dims(det_xmin[indices], -1) * img_size[1]
    top_ymin = np.expand_dims(det_ymin[indices], -1) * img_size[0]
    top_xmax = np.expand_dims(det_xmax[indices], -1) * img_size[1]
    top_ymax = np.expand_dims(det_ymax[indices], -1) * img_size[0]
    boxes = np.concatenate([top_xmin, top_ymin, top_xmax, top_ymax], axis=-1)

    font = cv.FONT_HERSHEY_SIMPLEX

    for i, c in enumerate(top_label_indices):
        cls = int(c) - 1
        predicted_class = classes[cls]
        score = top_conf[i]

        left, top, right, bottom = boxes[i]
        left = left - expand
        top = top - expand
        right = right + expand
        bottom = bottom + expand

        left = max(0, np.floor(left + 0.5).astype('int32'))
        top = max(0, np.floor(top + 0.5).astype('int32'))
        right = min(img_size[1], np.floor(right + 0.5).astype('int32'))
        bottom = min(img_size[0], np.floor(bottom + 0.5).astype('int32'))

        # draw the box and its label
        label = '{} {:.2f}'.format(predicted_class, score)

        cv.rectangle(test_img, (left, top), (right, bottom), colors[cls], 1)
        cv.putText(test_img, label, (left, top - int(label_size * 10)), font, label_size, colors[cls], 1)
    return test_img


if __name__ == '__main__':
    neg_pos_ratio = 3
    # number of classes including background
    num_class = 4
    train_data = list(range(800))
    validation_data = list(range(800, 900))
    test_data = range(900, 1000)
    epochs = 50
    batch_size = 8
    tf.random.set_seed(22)
    img_size = (128, 128)
    classes = ["circle", "triangle", "square"]
    # number of priors per pixel on each feature map
    prior = [2, 2, 2]
    # feature map sizes
    feature_map = [32, 16, 8]
    # anchor heights and widths
    anchors = [[(4, 4), (8, 8)], [(16, 16), (24, 24)], [(36, 36), (64, 64)]]
    # total number of priors
    num_prior = sum([prior[x] * feature_map[x] ** 2 for x in range(len(prior))])
    # build all priors
    prior_center_wh = []
    prior_lt_rb = []
    feature_shape = []
    for i in range(len(prior)):
        c_wh, tl_br = get_prior(i + 1)
        prior_center_wh.append(c_wh)
        prior_lt_rb.append(tl_br)
        feature_shape.append(np.broadcast_to(feature_map[i], (feature_map[i] ** 2 * prior[i], 2)))
    # num_prior * 4 (2688 * 4 with this configuration)
    prior_center_wh = np.vstack(prior_center_wh)
    # num_prior * 4
    prior_lt_rb = np.vstack(prior_lt_rb)
    # num_prior * 2
    feature_shape = np.vstack(feature_shape)
    # priors with IOU above this threshold count as positives
    overlap_threshold = 0.3
    # encoding function
    f_encode = encoder
    # a distinct drawing color per class
    hsv_tuples = [(x / (num_class - 1), 1., 1.) for x in range(num_class - 1)]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[1] * 255), int(x[2] * 255), int(x[0] * 255)), colors))
    # maximum number of boxes kept per image
    keep_top_k = 5
    # detection confidence; scores above this value count as objects
    confidence_threshold = 0.5
    # non-maximum suppression threshold; overlap must not exceed this value
    nms_thresh = 0.5
    # expand predicted boxes outward by this many pixels so they do not hug the object
    expand = 5
    # label font size
    label_size = 0.3

    imgs_path = r'.\shape\train_imgs'
    annotations_path = r'.\shape\annotations'
    test_path = r'.\shape\test_imgs'
    save_path = r'.\Yolo_V3_test_result'
    weight_path = r'.\Yolo_V3_weight'
    bbox_path = r'.\shape\bbox.txt'

    # convert the xml bboxes to bbox.txt, one line of file_path + bbox + class_id per image
    if 'bbox.txt' not in os.listdir(r'.\shape'):
        get_bbox(train_data + validation_data, bbox_path, annotations_path)

    with open(bbox_path, 'r') as f:
        bounding_info = f.readlines()

    try:
        os.mkdir(save_path)
    except FileExistsError:
        print(save_path + ' already exists')

    try:
        os.mkdir(weight_path)
    except FileExistsError:
        print(weight_path + ' already exists')

    model = small_yolo_v3(input_shape=(img_size[0], img_size[1], 3))

    model.build(input_shape=(batch_size, img_size[0], img_size[1], 3))
    model.summary()

    optimizer = keras.optimizers.Adam(lr=1e-4)

    model.compile(optimizer=optimizer, loss=compute_loss)

    # checkpointing: save every 3 epochs
    checkpoint_period = keras.callbacks.ModelCheckpoint(
        weight_path + '\\' + 'ep{epoch:03d}-loss{loss:.3f}.h5',
        monitor='loss',
        save_weights_only=True,
        save_best_only=True,
        period=3
    )

    # learning-rate schedule: halve the rate when the loss has not improved for 3 epochs
    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor='loss',
        factor=0.5,
        patience=3,
        verbose=1
    )

    # early stopping: once the loss stops improving, the model is essentially trained
    early_stopping = keras.callbacks.EarlyStopping(
        monitor='loss',
        min_delta=0,
        patience=10,
        verbose=1
    )

    model.fit_generator(generate_arrays_from_file(train_data, batch_size),
                        steps_per_epoch=max(1, len(train_data) // batch_size),
                        validation_data=generate_arrays_from_file(validation_data, batch_size),
                        validation_steps=max(1, len(validation_data) // batch_size),
                        epochs=epochs,
                        callbacks=[checkpoint_period, reduce_lr, early_stopping])

    for name in test_data:
        test_img_path = test_path + '\\' + str(name) + '.jpg'
        save_img_path = save_path + '\\' + str(name) + '.png'
        test_img = detect_image(test_img_path)
        cv.imwrite(save_img_path, test_img)

Results

[Figure: YOLO-V3]

YOLO-V3 Summary

  YOLO-V3 is a simple object-detection network; as the figure above shows, the model has 62M parameters. Thanks to its simple structure and stable results, YOLO-V3 is still the detector of choice in many settings. As a founding one-stage detection model, it is one every practitioner should master.
