Data Augmentation


Background

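  Data augmentation: a dataset is indispensable in practical deep learning work, but collecting one yourself is very time-consuming, and you rarely end up with enough samples. In that case, data augmentation can be used to expand the dataset and make the network more robust. Today I'll walk you through the most commonly used data augmentation operations.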


Differences Between How OpenCV and NumPy Store Data

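OpenCV: OpenCV describes sizes and points with the column first and the row second, i.e. (width, height) and (x, y), which differs from the way we usually think about arrays. Take special care with this, especially when indexing image coordinates. For common OpenCV operations, see my other blog post on commonly used OpenCV functions.
NumPy: NumPy indexes an image with the row first and the column second, i.e. (height, width). This matches our intuition, because two-dimensional arrays are always taught as row first, column second. For common NumPy operations, see my other blog post on commonly used NumPy functions.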

import cv2 as cv
import numpy as np


img = cv.imread('origin.png')
img1 = cv.resize(img, (300, 400))

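The NumPy array for img1 has shape (400, 300, 3) even though (300, 400) was passed to resize: OpenCV takes the size as (width, height), while shape reports (height, width, channels). This is surprising the first time you run into it, so always remember to convert between the two. In addition, OpenCV reads and displays images in BGR channel order by default, which is another common source of errors.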
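The two conventions can be checked without an image file. Here is a minimal sketch that uses only NumPy; the synthetic array `img`, the helper `pixel_at`, and the sample coordinates are made up for illustration and are not part of OpenCV. It shows that `shape` is (height, width, channels), that the pixel at point (x, y) is read as `img[y, x]`, and that reversing the last axis turns BGR into RGB.

```python
import numpy as np

# A synthetic "image": 400 rows (height) by 300 columns (width), 3 channels.
img = np.zeros((400, 300, 3), dtype=np.uint8)
height, width, channels = img.shape


def pixel_at(image, x, y):
    """Return the pixel at point (x, y); NumPy wants the row (y) first."""
    return image[y, x]


# Paint one pixel pure blue, in BGR order, at x=250, y=10.
img[10, 250] = (255, 0, 0)

# Reversing the channel axis converts BGR to RGB (and back again).
rgb = img[..., ::-1]

print(img.shape)                        # (400, 300, 3)
print(pixel_at(img, 250, 10).tolist())  # [255, 0, 0]  (B, G, R)
print(pixel_at(rgb, 250, 10).tolist())  # [0, 0, 255]  (R, G, B)
```

Note that `img[250, 10]` would read row 250, column 10, which is a different pixel; mixing up the two orders is exactly the mistake described above.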

Flip

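Flip: flip the image horizontally, vertically, or both at once. Flipping does not change the image's height or width, but it does change the coordinates of its contents. For tasks such as object detection, the bounding-box coordinates therefore need to be adjusted accordingly.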
cv2 already provides a function for flipping images, flip.

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            # box = [x1, y1, x2, y2, label]; labels start at 1
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
output_size = (260, 360)
input_size = tuple(img.shape[:2])


# Vertical flip
flip0 = cv.flip(img, 0)
# Horizontal flip
flip1 = cv.flip(img, 1)
# Vertical and horizontal flip at the same time
flip2 = cv.flip(img, -1)
# Vertical flip: y1' = H - y2, y2' = H - y1; x is unchanged
boxes0 = [[img.shape[0] - box[4 - i] if i % 2 else box[i] for i in range(4)] + [box[-1]] for box in boxes]
# Horizontal flip: x1' = W - x2, x2' = W - x1; y is unchanged
boxes1 = [[box[i] if i % 2 else img.shape[1] - box[2 - i] for i in range(4)] + [box[-1]] for box in boxes]
# Both flips: apply both mappings
boxes2 = [[img.shape[0] - box[4 - i] if i % 2 else img.shape[1] - box[2 - i] for i in range(4)] + [box[-1]] for box in boxes]
imshow([img, flip0, flip1, flip2], [boxes, boxes0, boxes1, boxes2])
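The three list comprehensions above are compact but hard to read. As a rough sketch, the same box arithmetic can be written as one small function; `flip_boxes` is a hypothetical helper name, and the 360x260 image size is only an assumption for the example. The `flip_code` argument follows the convention of cv.flip (0 is vertical, positive is horizontal, negative is both).

```python
def flip_boxes(boxes, width, height, flip_code):
    """Flip [x1, y1, x2, y2, label] boxes the way cv.flip flips the image.

    flip_code: 0 = vertical, positive = horizontal, negative = both.
    """
    flipped = []
    for x1, y1, x2, y2, label in boxes:
        if flip_code != 0:   # horizontal component (positive or negative code)
            x1, x2 = width - x2, width - x1
        if flip_code <= 0:   # vertical component (zero or negative code)
            y1, y2 = height - y2, height - y1
        flipped.append([x1, y1, x2, y2, label])
    return flipped


boxes = [[24, 18, 220, 260, 1]]
width, height = 360, 260  # assumed image size for this example

print(flip_boxes(boxes, width, height, 0))   # [[24, 0, 220, 242, 1]]
print(flip_boxes(boxes, width, height, 1))   # [[140, 18, 336, 260, 1]]
print(flip_boxes(boxes, width, height, -1))  # [[140, 0, 336, 242, 1]]
```

Swapping the two corners while subtracting keeps x1 <= x2 and y1 <= y2, so the result is still a valid top-left/bottom-right box. Flipping twice with the same code gives back the original boxes, which is a handy sanity check.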


Rotate

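Rotate: rotate the image. Unlike flipping, rotation changes not only the coordinates of the image's contents but possibly also its height and width. For tasks such as object detection, the bounding-box coordinates therefore need to be adjusted accordingly.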
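cv2 already provides a function for rotating images, rotate, which rotates in multiples of 90 degrees. Because rotation may change the image's height and width, I first resize the image so that its height and width are equal, which makes the results easier to display.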

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
output_size = (260, 360)
input_size = tuple(img.shape[:2])

# Make the image square so all four results can be tiled together
new_size = (300, 300)
resize_img = cv.resize(img, new_size)
# Scale the boxes to match: x by the width ratio, y by the height ratio
boxes = [[int(box[i] * resize_img.shape[0] / img.shape[0]) if i % 2 else int(box[i] * resize_img.shape[1] / img.shape[1]) for i in range(4)] + [box[-1]] for box in boxes]

# Rotate 90 degrees clockwise
rotate0 = cv.rotate(resize_img, cv.ROTATE_90_CLOCKWISE)
boxes0 = [[resize_img.shape[0] - box[3], box[0], resize_img.shape[0] - box[1], box[2]] + [box[-1]] for box in boxes]

# Rotate 180 degrees
rotate1 = cv.rotate(resize_img, cv.ROTATE_180)
boxes1 = [[resize_img.shape[1] - box[2], resize_img.shape[0] - box[3], resize_img.shape[1] - box[0], resize_img.shape[0] - box[1]] + [box[-1]] for box in boxes]

# Rotate 90 degrees counterclockwise
rotate2 = cv.rotate(resize_img, cv.ROTATE_90_COUNTERCLOCKWISE)
boxes2 = [[box[1], resize_img.shape[1] - box[2], box[3], resize_img.shape[1] - box[0]] + [box[-1]] for box in boxes]
imshow([resize_img, rotate0, rotate1, rotate2], [boxes, boxes0, boxes1, boxes2])
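All three box formulas above follow from the 90-degree clockwise case applied once, twice, or three times. Here is a small sketch of that idea in plain Python; `rotate_box_90_clockwise` and `rotate_box` are hypothetical helper names, and the 400x300 image size is assumed only for the example. The helper also tracks how the width and height swap on each quarter turn.

```python
def rotate_box_90_clockwise(box, height):
    """Map [x1, y1, x2, y2, label] through a 90-degree clockwise rotation.

    `height` is the image height before the rotation.
    """
    x1, y1, x2, y2, label = box
    return [height - y2, x1, height - y1, x2, label]


def rotate_box(box, width, height, quarter_turns):
    """Apply clockwise quarter turns, tracking the image size as it swaps."""
    for _ in range(quarter_turns % 4):
        box = rotate_box_90_clockwise(box, height)
        width, height = height, width  # a quarter turn swaps width and height
    return box, width, height


box = [24, 18, 220, 260, 1]
width, height = 400, 300  # assumed image size for this example

print(rotate_box(box, width, height, 1))  # ([40, 24, 282, 220, 1], 300, 400)
print(rotate_box(box, width, height, 2))  # ([180, 40, 376, 282, 1], 400, 300)
print(rotate_box(box, width, height, 3))  # ([18, 180, 260, 376, 1], 300, 400)
```

Two turns reproduce the 180-degree formula and three turns reproduce the counterclockwise formula, so only one mapping has to be remembered.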


Resize

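Resize: change the size of the image. This is the most complicated of the operations. A neural network takes images of a fixed size as input, so after resizing, the result must still be output at that fixed size, which means padding with a gray border or cropping. We also have to consider whether objects get cropped. For tasks such as object detection, the bounding-box coordinates therefore need to be adjusted accordingly, and we must also decide whether each bounding-box is still valid. Here I set a threshold: if the area that has not been cropped away is more than 0.25 times the total area, the bounding-box is considered valid; otherwise it is deleted.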
cv2 already provides a function for resizing images, resize. For ease of display, I set the output size to be the same as the input size.

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
# (height, width); hconcat in imshow needs this height to equal the input image's height
output_size = (260, 360)
input_size = tuple(img.shape[:2])

# Random scale per axis: shrink (0.5 to 1) or enlarge (1 to 2) with equal probability
ratio_h, ratio_w = np.where(np.random.rand(2) < 0.5, np.random.uniform(0.5, 1, 2), np.random.uniform(1, 2, 2))
new_h, new_w = int(input_size[0] * ratio_h), int(input_size[1] * ratio_w)
scale_img = cv.resize(img, (new_w, new_h))
# If the scaled image is larger than the output, crop a random window from it
valid_h = 0 if new_h <= output_size[0] else int(np.random.randint(0, new_h - output_size[0]))
valid_w = 0 if new_w <= output_size[1] else int(np.random.randint(0, new_w - output_size[1]))
crop_img = scale_img[valid_h:min(valid_h + output_size[0], new_h), valid_w:min(valid_w + output_size[1], new_w)]
# If it is smaller, paste it at a random offset on a gray canvas
resize_img0 = np.full((output_size[0], output_size[1], 3), 127, np.uint8)
dy, dx = int(np.random.rand() * (output_size[0] - crop_img.shape[0])), int(np.random.rand() * (output_size[1] - crop_img.shape[1]))
resize_img0[dy:dy + crop_img.shape[0], dx:dx + crop_img.shape[1]] = crop_img
# Scale the boxes, then shift them by the paste offset minus the crop offset
boxes0 = [[int(box[i] * ratio_h) + dy - valid_h if i % 2 else int(box[i] * ratio_w) + dx - valid_w for i in range(4)] + [box[-1]] for box in boxes]
boxes1 = []
for box in boxes0:
    # Clamp every coordinate to the output image: x to [0, width], y to [0, height].
    # Clamping all four values keeps a box that lies fully outside from getting a
    # negative width and height, whose product would look like a positive area.
    new_box = [min(output_size[1 - i % 2], max(0, box[i])) for i in range(4)] + [box[-1]]
    area = (box[2] - box[0]) * (box[3] - box[1])
    new_area = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
    # Keep the box only if more than 25% of its area is still visible
    if new_area > 0.25 * area:
        boxes1.append(new_box)
imshow([img, resize_img0], [boxes, boxes1])
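The validity rule at the end of the code above is easy to test on its own. The following sketch isolates it in a hypothetical helper, `clip_and_filter`; the sample boxes and the 360x260 output size are made up for illustration. It clamps each box to the output image and keeps it only when the visible part is more than `threshold` times the original area.

```python
def clip_and_filter(boxes, out_width, out_height, threshold=0.25):
    """Clamp boxes to the output image and drop the ones that are mostly cut off."""
    kept = []
    for x1, y1, x2, y2, label in boxes:
        area = (x2 - x1) * (y2 - y1)
        # Clamp x to [0, out_width] and y to [0, out_height].
        cx1, cx2 = (min(max(x, 0), out_width) for x in (x1, x2))
        cy1, cy2 = (min(max(y, 0), out_height) for y in (y1, y2))
        visible = (cx2 - cx1) * (cy2 - cy1)
        if area > 0 and visible > threshold * area:
            kept.append([cx1, cy1, cx2, cy2, label])
    return kept


boxes = [
    [24, 18, 220, 260, 1],    # fully inside: kept unchanged
    [300, 100, 500, 200, 2],  # 30% visible: kept, clipped to the right edge
    [340, 100, 540, 200, 2],  # 10% visible: dropped
    [-80, -60, -10, -5, 1],   # fully outside: dropped
]
print(clip_and_filter(boxes, 360, 260))
# [[24, 18, 220, 260, 1], [300, 100, 360, 200, 2]]
```

The last sample box is the case worth watching: if only one side of each coordinate pair were clamped, its width and height would both come out negative and their product would pass the threshold test.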


Noise

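Noise: add noise to the image. This is the simplest of the operations: adding noise does not change the image size, so the bounding-boxes do not need to be modified. Note, however, that noise can push values below 0 or above 255, and converting such values directly to uint8 gives wrong results, so clip the data to [0, 255] first.
NumPy's random module can generate the random numbers directly; choose whichever noise type you need. Gaussian noise is the most commonly used.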

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
output_size = (260, 360)
input_size = tuple(img.shape[:2])

# Add Gaussian noise (mean 0, standard deviation 30), then clip before casting to uint8
noise_img = np.clip(img + np.random.normal(0, 30, img.shape), 0, 255).astype(np.uint8)
imshow([img, noise_img], [boxes, boxes])
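To make the clipping point concrete, here is a small NumPy-only sketch. The helper name `add_gaussian_noise` and the sample values are assumptions for illustration. The wraparound is shown with integers on purpose, since integer casts wrap modulo 256 predictably, whereas the result of casting an out-of-range float to uint8 can vary by platform.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and return a clipped uint8 image."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


# Without clipping, out-of-range integers wrap around modulo 256.
values = np.array([-20, 100, 300], dtype=np.int16)
wrapped = values.astype(np.uint8)
clipped = np.clip(values, 0, 255).astype(np.uint8)
print(wrapped.tolist())  # [236, 100, 44]
print(clipped.tolist())  # [0, 100, 255]

# A bright test image, so that noise regularly pushes values past 255.
rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 250, dtype=np.uint8)
noisy = add_gaussian_noise(image, 30, rng)
print(noisy.dtype, noisy.shape)  # uint8 (4, 4, 3)
```

A dark pixel that wraps to 236 or a bright one that wraps to 44 shows up as salt-and-pepper speckle, which is why the clip has to come before the cast.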


HSV (Hue, Saturation, Value)

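HSV (hue, saturation, value): unlike the familiar RGB color space, the HSV color space uses hue, saturation, and value in place of the red, green, and blue channels. In OpenCV, for 8-bit images, hue ranges over [0, 179], while saturation and value both range over [0, 255]. Changing the HSV values does not change the image size, so the bounding-boxes do not need to be modified.
In cv2, use cvtColor to convert between color spaces, then multiply the HSV channels by random factors to change the hue, saturation, and value.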

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
output_size = (260, 360)
input_size = tuple(img.shape[:2])

hsv_img = cv.cvtColor(img, cv.COLOR_BGR2HSV)
# Multiply each channel by its own random factor in [0.5, 2); the result is a float array
hsv_img = hsv_img * np.random.uniform(0.5, 2, 3)
# Clip each channel back into its valid range before casting to uint8
hsv_img[:, :, 0] = np.clip(hsv_img[:, :, 0], 0, 179)
hsv_img[:, :, 1] = np.clip(hsv_img[:, :, 1], 0, 255)
hsv_img[:, :, 2] = np.clip(hsv_img[:, :, 2], 0, 255)
hsv_img = hsv_img.astype(np.uint8)
bgr_img = cv.cvtColor(hsv_img, cv.COLOR_HSV2BGR)
imshow([img, bgr_img], [boxes, boxes])
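The per-channel scaling and clipping can also be tried on a single pixel without converting a real image. This is only a sketch: `scale_hsv` and `HSV_MAX` are hypothetical names, and the input is assumed to already be an 8-bit HSV array such as the one cvtColor returns. Broadcasting lets one clip call apply a different upper limit to each channel.

```python
import numpy as np

# Upper limits of the H, S, and V channels for 8-bit images in OpenCV.
HSV_MAX = np.array([179, 255, 255], dtype=np.float64)


def scale_hsv(hsv, ratios):
    """Multiply the H, S, V channels by `ratios`, then clip each to its range."""
    scaled = hsv.astype(np.float64) * np.asarray(ratios, dtype=np.float64)
    return np.clip(scaled, 0, HSV_MAX).astype(np.uint8)


hsv = np.array([[[100, 200, 50]]], dtype=np.uint8)  # a single HSV pixel
out = scale_hsv(hsv, (2.0, 1.5, 0.5))
print(out[0, 0].tolist())  # [179, 255, 25]
```

Here the hue and saturation both hit their ceilings (179 and 255), while the value channel is simply halved.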


Full Code

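Combining the five operations above into a single function gives us something that takes an image and its bounding-boxes as input and outputs the augmented image and bounding-boxes.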

import cv2 as cv
import numpy as np


def imshow(img, boxes):
    # Draw each image's boxes, then tile the images in one window.
    cv.namedWindow('result')
    nums = len(img)
    for i in range(nums):
        for box in boxes[i]:
            cv.rectangle(img[i], tuple(box[:2]), tuple(box[2:4]), color[box[-1] - 1], 2)
    if nums == 4:
        result = cv.vconcat([cv.hconcat(img[:2]), cv.hconcat(img[2:])])
    else:
        result = cv.hconcat(img)
    cv.imshow('result', result)
    cv.waitKey(0)
    cv.destroyAllWindows()


def data_augmentation(image, bbox, output_shape, resize_scale=2, threshold=0.25, noise_scale=10, h_scale=2, s_scale=2, v_scale=2):

    # Add noise (use the shape of the argument, not of a global image)
    noise_img = np.clip(image + np.random.normal(0, noise_scale, image.shape), 0, 255).astype(np.uint8)

    # Change hue, saturation, and value
    hsv_img = cv.cvtColor(noise_img, cv.COLOR_BGR2HSV)

    h_ratio = np.random.uniform(1 / h_scale, 1, 1) if np.random.rand() < 0.5 else np.random.uniform(1, h_scale, 1)
    s_ratio = np.random.uniform(1 / s_scale, 1, 1) if np.random.rand() < 0.5 else np.random.uniform(1, s_scale, 1)
    v_ratio = np.random.uniform(1 / v_scale, 1, 1) if np.random.rand() < 0.5 else np.random.uniform(1, v_scale, 1)

    hsv_img[:, :, 0] = np.clip(hsv_img[:, :, 0] * h_ratio, 0, 179)
    hsv_img[:, :, 1] = np.clip(hsv_img[:, :, 1] * s_ratio, 0, 255)
    hsv_img[:, :, 2] = np.clip(hsv_img[:, :, 2] * v_ratio, 0, 255)

    hsv_img = hsv_img.astype(np.uint8)

    bgr_img = cv.cvtColor(hsv_img, cv.COLOR_HSV2BGR)

    # Rotate: 90 clockwise, 180, 90 counterclockwise, or none, each with probability 1/4
    rotate_flag = np.random.rand()
    if rotate_flag < 1 / 4:
        rotate_img = cv.rotate(bgr_img, cv.ROTATE_90_CLOCKWISE)
        rotate_boxes = [[bgr_img.shape[0] - box[3], box[0], bgr_img.shape[0] - box[1], box[2]] + [box[-1]] for box in bbox]
    elif rotate_flag < 2 / 4:
        rotate_img = cv.rotate(bgr_img, cv.ROTATE_180)
        rotate_boxes = [[bgr_img.shape[1] - box[2], bgr_img.shape[0] - box[3], bgr_img.shape[1] - box[0], bgr_img.shape[0] - box[1]] + [box[-1]] for box in bbox]
    elif rotate_flag < 3 / 4:
        rotate_img = cv.rotate(bgr_img, cv.ROTATE_90_COUNTERCLOCKWISE)
        rotate_boxes = [[box[1], bgr_img.shape[1] - box[2], box[3], bgr_img.shape[1] - box[0]] + [box[-1]] for box in bbox]
    else:
        rotate_img = bgr_img
        rotate_boxes = bbox

    # Flip: vertical, horizontal, both, or none, each with probability 1/4
    flip_flag = np.random.rand()
    if flip_flag < 1 / 4:
        flip_img = cv.flip(rotate_img, 0)
        flip_boxes = [[rotate_img.shape[0] - box[4 - i] if i % 2 else box[i] for i in range(4)] + [box[-1]] for box in rotate_boxes]
    elif flip_flag < 2 / 4:
        flip_img = cv.flip(rotate_img, 1)
        flip_boxes = [[box[i] if i % 2 else rotate_img.shape[1] - box[2 - i] for i in range(4)] + [box[-1]] for box in rotate_boxes]
    elif flip_flag < 3 / 4:
        flip_img = cv.flip(rotate_img, -1)
        flip_boxes = [[rotate_img.shape[0] - box[4 - i] if i % 2 else rotate_img.shape[1] - box[2 - i] for i in range(4)] + [box[-1]] for box in rotate_boxes]
    else:
        flip_img = rotate_img
        flip_boxes = rotate_boxes

    # Resize, then crop or pad with a gray border to reach output_shape
    ratio_h, ratio_w = np.where(np.random.rand(2) < 0.5, np.random.uniform(1 / resize_scale, 1, 2), np.random.uniform(1, resize_scale, 2))
    new_h, new_w = int(flip_img.shape[0] * ratio_h), int(flip_img.shape[1] * ratio_w)
    scale_img = cv.resize(flip_img, (new_w, new_h))
    valid_h = 0 if new_h <= output_shape[0] else int(np.random.randint(0, new_h - output_shape[0]))
    valid_w = 0 if new_w <= output_shape[1] else int(np.random.randint(0, new_w - output_shape[1]))
    crop_img = scale_img[valid_h:min(valid_h + output_shape[0], new_h), valid_w:min(valid_w + output_shape[1], new_w)]
    resize_img = np.full((output_shape[0], output_shape[1], 3), 127, np.uint8)
    dy, dx = int(np.random.rand() * (output_shape[0] - crop_img.shape[0])), int(np.random.rand() * (output_shape[1] - crop_img.shape[1]))
    resize_img[dy:dy + crop_img.shape[0], dx:dx + crop_img.shape[1]] = crop_img
    boxes = [[int(box[i] * ratio_h) + dy - valid_h if i % 2 else int(box[i] * ratio_w) + dx - valid_w for i in range(4)] + [box[-1]] for box in flip_boxes]
    resize_boxes = []
    for box in boxes:
        # Clamp every coordinate to the output image: x to [0, width], y to [0, height]
        new_box = [min(output_shape[1 - i % 2], max(0, box[i])) for i in range(4)] + [box[-1]]
        area = (box[2] - box[0]) * (box[3] - box[1])
        new_area = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
        # Keep the box only if enough of it is still visible
        if new_area > threshold * area:
            resize_boxes.append(new_box)

    return resize_img, resize_boxes


img = cv.imread('origin.png')
boxes = [[24, 18, 220, 260, 1], [196, 16, 330, 244, 2]]
color = [[0, 255, 0], [0, 0, 255]]
output_size = (300, 300)
input_size = tuple(img.shape[:2])

# Four independent random augmentations of the same image
img1, bbox1 = data_augmentation(img, boxes, output_size)
img2, bbox2 = data_augmentation(img, boxes, output_size)
img3, bbox3 = data_augmentation(img, boxes, output_size)
img4, bbox4 = data_augmentation(img, boxes, output_size)
imshow([img1, img2, img3, img4], [bbox1, bbox2, bbox3, bbox4])


Other Data Augmentation Operations

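The operations above are aimed mainly at object detection tasks with bounding-boxes, which puts some limits on the augmentation we can do. If the problem is purely image classification, many more operations are available, such as rotating by an arbitrary angle or applying an affine transformation. With bounding-boxes, however, rotating by an arbitrary angle or applying an affine transformation means the bounding-box is no longer a rectangle parallel to the coordinate axes, and the change in image size also becomes very complicated. The five operations above already cover the vast majority of needs, so I won't discuss other methods here. If you're interested, go and look for augmentation methods you like.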
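To see why arbitrary angles are awkward for bounding-boxes, consider what happens to the four corners of a box. The sketch below is not part of the pipeline above; `rotated_enclosing_box` is a hypothetical helper that rotates the corners about a chosen center and then takes the smallest axis-aligned box that still contains them, which is one common workaround.

```python
import math


def rotated_enclosing_box(box, angle_deg, cx, cy):
    """Rotate the corners of [x1, y1, x2, y2] about (cx, cy) and return
    the axis-aligned box that encloses the rotated corners."""
    x1, y1, x2, y2 = box
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return [min(xs), min(ys), max(xs), max(ys)]


square = [40, 40, 60, 60]  # a 20x20 box centered on (50, 50)
same = rotated_enclosing_box(square, 0, 50, 50)
grown = rotated_enclosing_box(square, 45, 50, 50)
print(same)                 # [40.0, 40.0, 60.0, 60.0]
print(grown[2] - grown[0])  # about 28.28, i.e. 20 * sqrt(2)
```

At 45 degrees the enclosing box of a square has twice the original area, so it contains a lot of background that does not belong to the object. That looseness is the price of keeping axis-aligned boxes under arbitrary rotation.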

Summary

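  The amount of data is an important determinant of a deep learning model's performance; with very little data, even a very good algorithm may struggle to achieve good results. That makes data augmentation extremely important, and knowing how to do it is a skill you need to master.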
