API - Data Pre-Processing
=========================
.. automodule:: tensorlayer.prepro

.. autosummary::

   affine_rotation_matrix
   affine_horizontal_flip_matrix
   affine_vertical_flip_matrix
   affine_shift_matrix
   affine_shear_matrix
   affine_zoom_matrix
   affine_respective_zoom_matrix
   transform_matrix_offset_center
   affine_transform
   affine_transform_cv2
   affine_transform_keypoints
   projective_transform_by_points

   rotation
   rotation_multi
   crop
   crop_multi
   flip_axis
   flip_axis_multi
   shift
   shift_multi
   shear
   shear_multi
   shear2
   shear_multi2
   swirl
   swirl_multi
   elastic_transform
   elastic_transform_multi
   zoom
   respective_zoom
   zoom_multi
   brightness
   brightness_multi
   illumination
   rgb_to_hsv
   hsv_to_rgb
   adjust_hue
   imresize
   pixel_value_scale
   samplewise_norm
   featurewise_norm
   channel_shift
   channel_shift_multi
   drop
   array_to_img
   find_contours
   pt2map
   binary_dilation
   dilation
   binary_erosion
   erosion

   obj_box_coord_rescale
   obj_box_coords_rescale
   obj_box_coord_scale_to_pixelunit
   obj_box_coord_centroid_to_upleft_butright
   obj_box_coord_upleft_butright_to_centroid
   obj_box_coord_centroid_to_upleft
   obj_box_coord_upleft_to_centroid
   parse_darknet_ann_str_to_list
   parse_darknet_ann_list_to_cls_box
   obj_box_horizontal_flip
   obj_box_imresize
   obj_box_crop
   obj_box_shift
   obj_box_zoom

   keypoint_random_crop
   keypoint_resize_random_crop
   keypoint_random_rotate
   keypoint_random_flip
   keypoint_random_resize
   keypoint_random_resize_shortestedge

   pad_sequences
   remove_pad_sequences
   process_sequences
   sequences_add_start_id
   sequences_add_end_id
   sequences_add_end_id_after_pad
   sequences_get_mask

..
  Threading
  ------------
  .. autofunction:: threading_data

Affine Transform
----------------
Python can be FAST
^^^^^^^^^^^^^^^^^^
Image augmentation is a critical step in deep learning.
Although TensorFlow provides ``tf.image``,
image augmentation often remains a key bottleneck.
``tf.image`` has three limitations:

- Real-world visual tasks such as object detection, segmentation, and pose estimation
  must cope with image meta-data (e.g., coordinates).
  Such data are beyond the scope of ``tf.image``,
  which processes images as tensors.
- ``tf.image`` operators
  break the pure Python programming experience (i.e., users have to
  use ``tf.py_func`` in order to call image functions written in Python); however,
  frequent use of ``tf.py_func`` slows down TensorFlow,
  making it hard for users to balance flexibility and performance.
- The ``tf.image`` API is inflexible. Image operations are
  performed one after another, so they are hard to optimize jointly. More importantly,
  sequential image operations can significantly
  reduce the quality of images, thus affecting training accuracy.

TensorLayer addresses these limitations by providing a
high-performance image augmentation API in Python.
This API is based on affine transformation and ``cv2.warpAffine``.
It allows you to combine multiple image processing functions into
a single matrix operation. This combined operation
is executed by the fast ``cv2`` library, offering a 78x performance improvement (observed in
`openpose-plus <https://github.com/tensorlayer/openpose-plus>`_ for example).
The following example illustrates the rationale
behind this tremendous speedup.

Example
^^^^^^^
The source code of complete examples can be found \
`here <https://github.com/tensorlayer/tensorlayer/tree/master/examples/data_process/tutorial_fast_affine_transform.py>`__.
The following is a typical Python program that applies rotation, shifting, flipping, zooming and shearing to an image:

.. code-block:: python

    import tensorlayer as tl

    image = tl.vis.read_image('tiger.jpeg')
    # Each call below resamples the image once, so the image is interpolated five times
    xx = tl.prepro.rotation(image, rg=-20, is_random=False)
    xx = tl.prepro.flip_axis(xx, axis=1, is_random=False)
    xx = tl.prepro.shear2(xx, shear=(0., -0.2), is_random=False)
    xx = tl.prepro.zoom(xx, zoom_range=0.8)
    xx = tl.prepro.shift(xx, wrg=-0.1, hrg=0, is_random=False)
    tl.vis.save_image(xx, '_result_slow.png')

However, by leveraging affine transformation, image operations can be combined into one:
.. code-block:: python

    # Image height and width (assumes `image` is an H x W x C array loaded as above)
    h, w = image.shape[:2]

    # 1. Create required affine transformation matrices
    M_rotate = tl.prepro.affine_rotation_matrix(angle=20)
    M_flip = tl.prepro.affine_horizontal_flip_matrix(prob=1)
    M_shift = tl.prepro.affine_shift_matrix(wrg=0.1, hrg=0, h=h, w=w)
    M_shear = tl.prepro.affine_shear_matrix(x_shear=0.2, y_shear=0)
    M_zoom = tl.prepro.affine_zoom_matrix(zoom_range=0.8)

    # 2. Combine matrices
    # NOTE: operations are applied in a reversed order (i.e., rotation is performed first)
    M_combined = M_shift.dot(M_zoom).dot(M_shear).dot(M_flip).dot(M_rotate)

    # 3. Convert the matrix from Cartesian coordinates (the origin in the middle of image)
    # to image coordinates (the origin on the top-left of image)
    transform_matrix = tl.prepro.transform_matrix_offset_center(M_combined, x=w, y=h)

    # 4. Transform the image using a single operation
    result = tl.prepro.affine_transform_cv2(image, transform_matrix)  # 76 times faster
    tl.vis.save_image(result, '_result_fast.png')

The following figure illustrates the rationale behind combined affine transformation.
.. image:: ../images/affine_transform_why.jpg
:width: 100 %
:align: center
Using combined affine transformation has two key benefits. First, it allows \
you to leverage a pure Python API to achieve orders-of-magnitude speedups in image augmentation,
and thus prevents data pre-processing from becoming a bottleneck in training. \
Second, performing sequential image transformations requires multiple image interpolations, \
which produces low-quality input images. In contrast, a combined transformation performs the \
interpolation only once, and thus
preserves the content of the image. The following figure illustrates these two benefits:
.. image:: ../images/affine_transform_comparison.jpg
:width: 100 %
:align: center
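
The quality argument can be checked on a one-dimensional toy signal. The sketch below uses only NumPy, not the TensorLayer API; ``shift_linear`` is a hypothetical helper that resamples with linear interpolation. Two half-sample shifts interpolate twice and smear a sharp impulse, whereas the single combined one-sample shift leaves it intact.

.. code-block:: python

    import numpy as np

    def shift_linear(signal, offset):
        """Shift a 1-D signal right by `offset` samples using linear interpolation."""
        positions = np.arange(len(signal), dtype=float)
        # Sample the input at (position - offset); values outside the signal are 0.
        return np.interp(positions - offset, positions, signal, left=0.0, right=0.0)

    signal = np.zeros(9)
    signal[3] = 1.0  # a single sharp impulse

    twice = shift_linear(shift_linear(signal, 0.5), 0.5)  # two interpolations
    once = shift_linear(signal, 1.0)                      # one interpolation

    print(twice.max())  # the peak is reduced: the impulse has been blurred
    print(once.max())   # the peak is preserved

Both results end up shifted by one sample in total; only the number of resampling steps differs.
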
The main reason combined affine transformation is fast is its lower computational complexity.
Assume we have ``k`` affine transformations ``T1, ..., Tk``, where each ``Ti`` can be represented by a 3x3 matrix.
The sequential transformation can be represented as ``y = Tk (... T1(x))``,
and its time complexity is ``O(k N)``, where ``N`` is the cost of applying one transformation to image ``x``.
``N`` is linear in the size of ``x``.
For the combined transformation ``y = (Tk ... T1) (x)``,
the time complexity is ``O(27(k - 1) + N) = max{O(27k), O(N)} = O(N)`` (assuming 27k << N), where 27 = 3^3 is the cost of combining two transformations.
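
The equivalence behind this argument can be sketched with plain NumPy. The matrices below are hand-written stand-ins (``rotation``, ``zoom`` and ``shift`` are hypothetical helpers, not the TensorLayer functions), and a set of points stands in for the pixels of ``x``.

.. code-block:: python

    import numpy as np

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def zoom(sx, sy):
        return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])

    def shift(tx, ty):
        return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

    transforms = [rotation(np.pi / 9), zoom(0.8, 0.8), shift(5.0, -3.0)]  # T1, T2, T3

    # N points in homogeneous coordinates, shape (3, N)
    rng = np.random.default_rng(0)
    points = np.vstack([rng.uniform(0, 100, size=(2, 1000)), np.ones((1, 1000))])

    # Sequential: one pass over all N points per transformation, O(k N)
    sequential = points
    for T in transforms:
        sequential = T @ sequential

    # Combined: a few cheap 3x3 products, then a single pass over the N points
    combined_matrix = np.eye(3)
    for T in transforms:
        combined_matrix = T @ combined_matrix  # later transforms multiply on the left
    combined = combined_matrix @ points

    print(np.allclose(sequential, combined))

For a real image the single pass is the expensive step, because it also involves interpolation, which is why folding the matrices first pays off.
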
Get rotation matrix
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_rotation_matrix
Get horizontal flipping matrix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_horizontal_flip_matrix
Get vertical flipping matrix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_vertical_flip_matrix
Get shifting matrix
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_shift_matrix
Get shearing matrix
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_shear_matrix
Get zooming matrix
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_zoom_matrix
Get respective zooming matrix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_respective_zoom_matrix
Cartesian to image coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: transform_matrix_offset_center
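
As a rough illustration of this coordinate change, the sketch below wraps a matrix between a translation of the image center to the origin and the translation back. It is an assumption about what such a conversion looks like, not necessarily TensorLayer's implementation; ``offset_center`` and the center convention ``((w - 1) / 2, (h - 1) / 2)`` are hypothetical.

.. code-block:: python

    import numpy as np

    def offset_center(matrix, w, h):
        """Make `matrix` act about the image center instead of the top-left origin."""
        cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
        to_origin = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
        back = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
        # Move the center to the origin, apply the transform, then move back
        return back @ matrix @ to_origin

    theta = np.deg2rad(20)
    rotate = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta), np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])

    w, h = 640, 480
    centered = offset_center(rotate, w, h)
    center = np.array([(w - 1) / 2.0, (h - 1) / 2.0, 1.0])

    print(np.allclose(centered @ center, center))  # the image center stays fixed
    print(np.allclose(rotate @ center, center))    # without the offset it moves
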
..
  Apply image transform
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .. autofunction:: affine_transform

Apply image transform
^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_transform_cv2
Apply keypoint transform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: affine_transform_keypoints
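
The same 3x3 matrix that warps an image can also move its annotations. A minimal NumPy sketch, assuming keypoints are given as ``(x, y)`` pairs; ``transform_points`` is a hypothetical helper and not the library function documented above.

.. code-block:: python

    import numpy as np

    def transform_points(points_xy, matrix):
        """Apply a 3x3 affine matrix to an (N, 2) array of (x, y) points."""
        points_xy = np.asarray(points_xy, dtype=float)
        ones = np.ones((len(points_xy), 1))
        homogeneous = np.hstack([points_xy, ones])  # shape (N, 3)
        moved = homogeneous @ matrix.T              # row-vector form of M @ p
        return moved[:, :2]

    # Mirror x about the origin, then shift by +100 in x
    flip_then_shift = np.array([[-1.0, 0.0, 100.0],
                                [0.0, 1.0, 0.0],
                                [0.0, 0.0, 1.0]])

    keypoints = [(10.0, 20.0), (30.0, 40.0)]
    moved_keypoints = transform_points(keypoints, flip_then_shift)
    print(moved_keypoints)  # x becomes 100 - x, y is unchanged
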
Images
-----------
Projective transform by points
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: projective_transform_by_points
Rotation
^^^^^^^^^
.. autofunction:: rotation
.. autofunction:: rotation_multi
Crop
^^^^^^^^^
.. autofunction:: crop
.. autofunction:: crop_multi
Flip
^^^^^^^^^
.. autofunction:: flip_axis
.. autofunction:: flip_axis_multi
Shift
^^^^^^^^^
.. autofunction:: shift
.. autofunction:: shift_multi
Shear
^^^^^^^^^
.. autofunction:: shear
.. autofunction:: shear_multi
Shear V2
^^^^^^^^^^^
.. autofunction:: shear2
.. autofunction:: shear_multi2
Swirl
^^^^^^^^^
.. autofunction:: swirl
.. autofunction:: swirl_multi
Elastic transform
^^^^^^^^^^^^^^^^^^
.. autofunction:: elastic_transform
.. autofunction:: elastic_transform_multi
Zoom
^^^^^^^^^
.. autofunction:: zoom
.. autofunction:: zoom_multi
Respective Zoom
^^^^^^^^^^^^^^^^^
.. autofunction:: respective_zoom
Brightness
^^^^^^^^^^^^
.. autofunction:: brightness
.. autofunction:: brightness_multi
Brightness, contrast and saturation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: illumination
RGB to HSV
^^^^^^^^^^^^^^
.. autofunction:: rgb_to_hsv
HSV to RGB
^^^^^^^^^^^^^^
.. autofunction:: hsv_to_rgb
Adjust Hue
^^^^^^^^^^^^^^
.. autofunction:: adjust_hue
Resize
^^^^^^^^^^^^
.. autofunction:: imresize
Pixel value scale
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: pixel_value_scale
Normalization
^^^^^^^^^^^^^^^
.. autofunction:: samplewise_norm
.. autofunction:: featurewise_norm
Channel shift
^^^^^^^^^^^^^^
.. autofunction:: channel_shift
.. autofunction:: channel_shift_multi
Noise
^^^^^^^^^^^^^^
.. autofunction:: drop
Numpy and PIL
^^^^^^^^^^^^^^
.. autofunction:: array_to_img
Find contours
^^^^^^^^^^^^^^
.. autofunction:: find_contours
Points to Image
^^^^^^^^^^^^^^^^^
.. autofunction:: pt2map
Binary dilation
^^^^^^^^^^^^^^^^^
.. autofunction:: binary_dilation
Greyscale dilation
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: dilation
Binary erosion
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: binary_erosion
Greyscale erosion
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: erosion
Object detection
-------------------
Tutorial for Image Aug
^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of image augmentation on the VOC dataset.

.. code-block:: python

    import tensorlayer as tl

    ## download VOC 2012 dataset
    imgs_file_list, _, _, _, classes, _, _,\
        _, objs_info_list, _ = tl.files.load_voc_dataset(dataset="2012")

    ## parse annotation and convert it into list format
    ann_list = []
    for info in objs_info_list:
        ann = tl.prepro.parse_darknet_ann_str_to_list(info)
        c, b = tl.prepro.parse_darknet_ann_list_to_cls_box(ann)
        ann_list.append([c, b])

    # read and save one image
    idx = 2  # you can select your own image
    image = tl.vis.read_image(imgs_file_list[idx])
    tl.vis.draw_boxes_and_labels_to_image(image, ann_list[idx][0],
        ann_list[idx][1], [], classes, True, save_name='_im_original.png')

    # left right flip
    im_flip, coords = tl.prepro.obj_box_horizontal_flip(image,
        ann_list[idx][1], is_rescale=True, is_center=True, is_random=False)
    tl.vis.draw_boxes_and_labels_to_image(im_flip, ann_list[idx][0],
        coords, [], classes, True, save_name='_im_flip.png')

    # resize
    im_resize, coords = tl.prepro.obj_box_imresize(image,
        coords=ann_list[idx][1], size=[300, 200], is_rescale=True)
    tl.vis.draw_boxes_and_labels_to_image(im_resize, ann_list[idx][0],
        coords, [], classes, True, save_name='_im_resize.png')

    # crop
    im_crop, clas, coords = tl.prepro.obj_box_crop(image, ann_list[idx][0],
        ann_list[idx][1], wrg=200, hrg=200,
        is_rescale=True, is_center=True, is_random=False)
    tl.vis.draw_boxes_and_labels_to_image(im_crop, clas, coords, [],
        classes, True, save_name='_im_crop.png')

    # shift
    im_shift, clas, coords = tl.prepro.obj_box_shift(image, ann_list[idx][0],
        ann_list[idx][1], wrg=0.1, hrg=0.1,
        is_rescale=True, is_center=True, is_random=False)
    tl.vis.draw_boxes_and_labels_to_image(im_shift, clas, coords, [],
        classes, True, save_name='_im_shift.png')

    # zoom
    im_zoom, clas, coords = tl.prepro.obj_box_zoom(image, ann_list[idx][0],
        ann_list[idx][1], zoom_range=(1.3, 0.7),
        is_rescale=True, is_center=True, is_random=False)
    tl.vis.draw_boxes_and_labels_to_image(im_zoom, clas, coords, [],
        classes, True, save_name='_im_zoom.png')

In practice, you may want to use threading to process a batch of images, as follows.

.. code-block:: python

    import tensorlayer as tl
    import random

    # NOTE: imgs_file_list, ann_list and classes come from the previous example
    batch_size = 64
    im_size = [416, 416]
    n_data = len(imgs_file_list)
    jitter = 0.2

    def _data_pre_aug_fn(data):
        im, ann = data
        clas, coords = ann
        ## change image brightness, contrast and saturation randomly
        im = tl.prepro.illumination(im, gamma=(0.5, 1.5),
            contrast=(0.5, 1.5), saturation=(0.5, 1.5), is_random=True)
        ## flip randomly
        im, coords = tl.prepro.obj_box_horizontal_flip(im, coords,
            is_rescale=True, is_center=True, is_random=True)
        ## randomly resize and crop image, it can have same effect as random zoom
        tmp0 = random.randint(1, int(im_size[0]*jitter))
        tmp1 = random.randint(1, int(im_size[1]*jitter))
        im, coords = tl.prepro.obj_box_imresize(im, coords,
            [im_size[0]+tmp0, im_size[1]+tmp1], is_rescale=True,
            interp='bicubic')
        im, clas, coords = tl.prepro.obj_box_crop(im, clas, coords,
            wrg=im_size[1], hrg=im_size[0], is_rescale=True,
            is_center=True, is_random=True)
        ## rescale value from [0, 255] to [-1, 1] (optional)
        im = im / 127.5 - 1
        return im, [clas, coords]

    # randomly read a batch of images and the corresponding annotations
    idxs = tl.utils.get_random_int(min=0, max=n_data-1, number=batch_size)
    b_im_path = [imgs_file_list[i] for i in idxs]
    b_images = tl.prepro.threading_data(b_im_path, fn=tl.vis.read_image)
    b_ann = [ann_list[i] for i in idxs]

    # threading process
    data = tl.prepro.threading_data(list(zip(b_images, b_ann)),
        _data_pre_aug_fn)
    b_images2 = [d[0] for d in data]
    b_ann = [d[1] for d in data]

    # save all images
    for i in range(len(b_images)):
        tl.vis.draw_boxes_and_labels_to_image(b_images[i],
            ann_list[idxs[i]][0], ann_list[idxs[i]][1], [],
            classes, True, save_name='_bbox_vis_%d_original.png' % i)
        tl.vis.draw_boxes_and_labels_to_image((b_images2[i]+1)*127.5,
            b_ann[i][0], b_ann[i][1], [], classes, True,
            save_name='_bbox_vis_%d.png' % i)

Image Aug with TF Dataset API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Example code for VOC `here <https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_tf_dataset_voc.py>`__.
Coordinate pixel unit to percentage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_rescale
Coordinates pixel unit to percentage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coords_rescale
Coordinate percentage to pixel unit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_scale_to_pixelunit
Coordinate [x_center, y_center, w, h] to up-left bottom-right
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_centroid_to_upleft_butright
Coordinate up-left bottom-right to [x_center, y_center, w, h]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_upleft_butright_to_centroid
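
The two box layouts above are related by simple arithmetic. A minimal pure-Python sketch, assuming boxes are flat lists of numbers; ``centroid_to_corners`` and ``corners_to_centroid`` are hypothetical names used for illustration, not the library functions.

.. code-block:: python

    def centroid_to_corners(box):
        """[x_center, y_center, w, h] -> [x1, y1, x2, y2]."""
        x_c, y_c, w, h = box
        return [x_c - w / 2.0, y_c - h / 2.0, x_c + w / 2.0, y_c + h / 2.0]

    def corners_to_centroid(box):
        """[x1, y1, x2, y2] -> [x_center, y_center, w, h]."""
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        return [x1 + w / 2.0, y1 + h / 2.0, w, h]

    box = [30.0, 40.0, 20.0, 10.0]
    corners = centroid_to_corners(box)
    print(corners)                       # [20.0, 35.0, 40.0, 45.0]
    print(corners_to_centroid(corners))  # [30.0, 40.0, 20.0, 10.0]
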
Coordinate [x_center, y_center, w, h] to up-left width-height
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_centroid_to_upleft
Coordinate up-left width-height to [x_center, y_center, w, h]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_coord_upleft_to_centroid
Darknet format string to list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: parse_darknet_ann_str_to_list
Darknet format split class and coordinate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: parse_darknet_ann_list_to_cls_box
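
For orientation, the sketch below shows the kind of parsing these two steps involve. It assumes the common Darknet text layout, with one object per line written as ``class x_center y_center w h``; ``parse_ann_str`` and ``split_cls_box`` are hypothetical helpers and may differ from the library functions in details such as error handling.

.. code-block:: python

    def parse_ann_str(annotation):
        """Parse 'class x y w h' lines into [[int, float, float, float, float], ...]."""
        objects = []
        for line in annotation.splitlines():
            fields = line.split()
            if len(fields) != 5:
                continue  # skip blank or malformed lines
            objects.append([int(fields[0])] + [float(v) for v in fields[1:]])
        return objects

    def split_cls_box(objects):
        """Split parsed objects into a class list and a box list."""
        class_ids = [obj[0] for obj in objects]
        boxes = [obj[1:] for obj in objects]
        return class_ids, boxes

    text = "0 0.5 0.5 0.2 0.4\n3 0.25 0.75 0.1 0.1\n"
    class_ids, boxes = split_cls_box(parse_ann_str(text))
    print(class_ids)  # [0, 3]
    print(boxes)      # [[0.5, 0.5, 0.2, 0.4], [0.25, 0.75, 0.1, 0.1]]
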
Image Aug - Flip
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_horizontal_flip
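
When an image is flipped left to right, only the horizontal center of each box changes. A minimal sketch, assuming boxes in ``[x_center, y_center, w, h]`` form; ``flip_boxes_horizontally`` is a hypothetical helper, and the pixel convention ``image_width - x`` is an assumption.

.. code-block:: python

    def flip_boxes_horizontally(boxes, image_width=None):
        """Flip [x_center, y_center, w, h] boxes left to right.

        With image_width=None the coordinates are treated as ratios in [0, 1];
        otherwise they are treated as pixels.
        """
        extent = 1.0 if image_width is None else float(image_width)
        return [[extent - x_c, y_c, w, h] for x_c, y_c, w, h in boxes]

    ratio_boxes = flip_boxes_horizontally([[0.25, 0.5, 0.2, 0.4]])
    pixel_boxes = flip_boxes_horizontally([[100.0, 50.0, 40.0, 20.0]], image_width=640)
    print(ratio_boxes)  # [[0.75, 0.5, 0.2, 0.4]]
    print(pixel_boxes)  # [[540.0, 50.0, 40.0, 20.0]]
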
Image Aug - Resize
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_imresize
Image Aug - Crop
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_crop
Image Aug - Shift
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_shift
Image Aug - Zoom
^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: obj_box_zoom
Keypoints
------------
Image Aug - Crop
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_random_crop
Image Aug - Resize then Crop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_resize_random_crop
Image Aug - Rotate
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_random_rotate
Image Aug - Flip
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_random_flip
Image Aug - Resize
^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_random_resize
Image Aug - Resize Shortest Edge
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: keypoint_random_resize_shortestedge
Sequence
---------
More related functions can be found in ``tensorlayer.nlp``.
Padding
^^^^^^^^^
.. autofunction:: pad_sequences
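
To make the idea concrete, here is a pure-Python sketch of right-padding token ID lists to a common length. ``pad_to_length`` is a hypothetical helper with simplified options; see the function documentation above for the parameters the library actually supports.

.. code-block:: python

    def pad_to_length(sequences, pad_value=0, maxlen=None):
        """Right-pad (and truncate) lists of token IDs to a common length."""
        if maxlen is None:
            maxlen = max(len(seq) for seq in sequences)
        padded = []
        for seq in sequences:
            kept = list(seq[:maxlen])  # truncate sequences that are too long
            padded.append(kept + [pad_value] * (maxlen - len(kept)))
        return padded

    batch = [[1, 2, 3], [4], [5, 6, 7, 8]]
    print(pad_to_length(batch))            # [[1, 2, 3, 0], [4, 0, 0, 0], [5, 6, 7, 8]]
    print(pad_to_length(batch, maxlen=2))  # [[1, 2], [4, 0], [5, 6]]
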
Remove Padding
^^^^^^^^^^^^^^^^^
.. autofunction:: remove_pad_sequences
Process
^^^^^^^^^
.. autofunction:: process_sequences
Add Start ID
^^^^^^^^^^^^^^^
.. autofunction:: sequences_add_start_id
Add End ID
^^^^^^^^^^^^^^^
.. autofunction:: sequences_add_end_id
Add End ID after pad
^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: sequences_add_end_id_after_pad
Get Mask
^^^^^^^^^
.. autofunction:: sequences_get_mask
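
A padding mask marks which positions hold real tokens. The sketch below assumes that the pad ID never occurs as a real token; ``padding_mask`` is a hypothetical helper, not the library function.

.. code-block:: python

    def padding_mask(padded, pad_value=0):
        """Return 1 for real tokens and 0 for padding positions."""
        return [[0 if token == pad_value else 1 for token in seq] for seq in padded]

    padded_batch = [[1, 2, 3, 0], [4, 0, 0, 0]]
    mask = padding_mask(padded_batch)
    print(mask)  # [[1, 1, 1, 0], [1, 0, 0, 0]]

Such a mask is typically used to keep padded positions out of a loss or an average.
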