• PaddleOCRv3 Part 2: Building a Training Set with TextRecognitionDataGenerator



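    For the recognition stage of OCR, the fonts and backgrounds in the dataset matter a great deal. Real scenarios rarely yield enough real samples, so it is well worth building a batch of synthetic training data by hand before training starts. That batch can be used for pre-training followed by fine-tuning on real data, or it can simply be mixed with the real data and trained on together.
    TextRecognitionDataGenerator works well for building such a dataset. It generates text line by line and can also output bounding boxes and masks.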
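The "mix synthetic and real data" option can be sketched as below. This is a minimal illustration, not the author's code: it assumes each sample is a `path<TAB>label` line (the layout commonly used for PaddleOCR recognition label lists), and the function name `mix_label_lines` and the `real_repeat` oversampling factor are hypothetical.

```python
import random


def mix_label_lines(synthetic, real, real_repeat=1, seed=0):
    """Combine synthetic and real label lines into one shuffled training list.

    real_repeat repeats the (usually scarcer) real samples so they are not
    drowned out by the synthetic ones. Hypothetical helper for illustration.
    """
    mixed = list(synthetic) + list(real) * real_repeat
    random.Random(seed).shuffle(mixed)  # fixed seed keeps the order reproducible
    return mixed


# Made-up sample lines in "path<TAB>label" form.
synthetic = [f"synth/{i}.jpg\tSYN{i}" for i in range(6)]
real = ["real/0.jpg\tABC", "real/1.jpg\tXYZ"]

train_list = mix_label_lines(synthetic, real, real_repeat=3)
print(len(train_list))  # 6 synthetic + 2 real * 3 = 12
```

For the pre-train-then-fine-tune route, the two lists would instead be kept separate and used in two consecutive training runs.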

    1. The open-source project TextRecognitionDataGenerator

    Link: TextRecognitionDataGenerator
    Some useful operations it supports:

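    • Blur
    • Writing text on a background image (very useful: pick background images from the real scene and render the text on them, which looks more realistic)
    • Rotation (skew) by an angle
    • Generating bounding-box and mask labels
    • Character spacing adjustment
    • Text color setting
    • More than 100 fonts provided for Latin scripts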

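    There are two ways to install it:

    • Install with pip
    pip install trdg

    • Install from source
      Download the source, cd into the directory containing setup.py, then install the dependencies
    pip install -r requirements.txt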

    To generate data, use run.py in the trdg folder; by default it is located at TextRecognitionDataGenerator-master\trdg

    2. Usage

    Common options

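    python run.py 
    --font_dir fonts\latin 	# font directory; use -ft/--font instead to select a single font file
    --dict dicts\myDict.txt 	# path to the dictionary file
    -c 50 		# total number of images to generate
    --output_dir outputs # output directory
    -k 5 -rk  		# -k 5 sets the skew angle to 5°; adding -rk randomizes it within [-5°, +5°]
    -bl 3 -rbl		# -bl applies Gaussian blur with the given radius; -rbl randomizes the radius between 0 and 3
    --case upper 	# upper generates uppercase text only, lower generates lowercase only
    -b 3 -id images	# -b selects the background type, 3 means image backgrounds; -id (--image_dir) sets the background image directory
    -f 64		# --format, the height of the generated images (for horizontal text)
    -tc #22211f	# --text_color, a 6-digit hex number converted directly from RGB, e.g. rgb=[10,15,255] ==> #0a0fff
    -obb 1		# generate bounding boxes
    		# the format is nested. Line 1: four numbers, the x,y of the top-left corner of the first character and the x,y of the bottom-right corner of the last character
    		# Line 2: four numbers, the x,y of the top-left corner of the second character and the x,y of the bottom-right corner of the last character
    		# Line n: four numbers, the x,y of the top-left corner of the n-th character and the x,y of the bottom-right corner of the last character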
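The value passed to -tc is just the three RGB channels written as two hex digits each. A small sketch of that conversion follows; the helper name `rgb_to_hex` is hypothetical and is not part of trdg.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) triple of 0-255 integers to a '#rrggbb' string."""
    r, g, b = rgb
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range 0-255: {channel}")
    # {:02x} pads each channel to two lowercase hex digits (15 -> '0f').
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


print(rgb_to_hex((10, 15, 255)))  # #0a0fff
print(rgb_to_hex((34, 33, 31)))   # #22211f
```

In shells where `#` starts a comment (bash, for example), the value should be quoted when passed on the command line, e.g. `-tc "#22211f"`.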
    
    

    For example:

    python run.py --font_dir fonts\latin --dict dicts\OCRDict.txt -c 52 --output_dir K:\imageData\OCR\ocr_dataset\test  -k 6 -rk -bl 1 -rbl -b 3 -id K:\imageData\OCR\ocr_background\yellow -f 80
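After a run like this, the labels still have to be collected into a list for training. The sketch below assumes the default name format 0 from the help text (`[TEXT]_[ID].[EXT]`) and a `file<TAB>label` output layout; the helper names and the sample file names are made up for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def labels_from_filenames(names):
    """Return (file_name, text) pairs for names shaped like [TEXT]_[ID].[EXT]."""
    pairs = []
    for name in names:
        path = Path(name)
        if path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip label/box text files and anything that is not an image
        # Split on the LAST underscore so underscores inside the text survive.
        text, sep, sample_id = path.stem.rpartition("_")
        if not sep or not sample_id.isdigit():
            continue  # not in the expected [TEXT]_[ID] shape
        pairs.append((path.name, text))
    return pairs


def to_label_lines(pairs):
    """Format pairs as 'file<TAB>label' lines."""
    return [f"{name}\t{text}" for name, text in pairs]


# Made-up directory listing; a real run would use os.listdir(output_dir).
names = ["AB12_0.jpg", "X_Y_1.jpg", "AB12_0_mask.png", "labels.txt", "notes.jpg"]
lines = to_label_lines(labels_from_filenames(names))
print(lines)
```

File names cannot hold every character, so for text containing symbols, name format 2 (`-na 2`), which writes a separate labels.txt, is likely the safer choice.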
    

    The built-in help shows the complete usage:

    python run.py --help
    
    usage: run.py [-h] [--output_dir [OUTPUT_DIR]] [-i [INPUT_FILE]]
                  [-l [LANGUAGE]] -c [COUNT] [-rs] [-let] [-num] [-sym]
                  [-w [LENGTH]] [-r] [-f [FORMAT]] [-t [THREAD_COUNT]]
                  [-e [EXTENSION]] [-k [SKEW_ANGLE]] [-rk] [-wk] [-bl [BLUR]]
                  [-rbl] [-b [BACKGROUND]] [-hw] [-na NAME_FORMAT]
                  [-om OUTPUT_MASK] [-obb OUTPUT_BBOXES] [-d [DISTORSION]]
                  [-do [DISTORSION_ORIENTATION]] [-wd [WIDTH]] [-al [ALIGNMENT]]
                  [-or [ORIENTATION]] [-tc [TEXT_COLOR]] [-sw [SPACE_WIDTH]]
                  [-cs [CHARACTER_SPACING]] [-m [MARGINS]] [-fi] [-ft [FONT]]
                  [-fd [FONT_DIR]] [-id [IMAGE_DIR]] [-ca [CASE]] [-dt [DICT]]
                  [-ws] [-stw [STROKE_WIDTH]] [-stf [STROKE_FILL]]
                  [-im [IMAGE_MODE]]
    
    Generate synthetic text data for text recognition.
    
    optional arguments:
      -h, --help            show this help message and exit
      --output_dir [OUTPUT_DIR]
                            The output directory
      -i [INPUT_FILE], --input_file [INPUT_FILE]
                            When set, this argument uses a specified text file as
                            source for the text
      -l [LANGUAGE], --language [LANGUAGE]
                            The language to use, should be fr (French), en
                            (English), es (Spanish), de (German), ar (Arabic), cn
                            (Chinese), ja (Japanese) or hi (Hindi)
      -c [COUNT], --count [COUNT]
                            The number of images to be created.
      -rs, --random_sequences
                            Use random sequences as the source text for the
                            generation. Set '-let','-num','-sym' to use
                            letters/numbers/symbols. If none specified, using all
                            three.
      -let, --include_letters
                            Define if random sequences should contain letters.
                            Only works with -rs
      -num, --include_numbers
                            Define if random sequences should contain numbers.
                            Only works with -rs
      -sym, --include_symbols
                            Define if random sequences should contain symbols.
                            Only works with -rs
      -w [LENGTH], --length [LENGTH]
                            Define how many words should be included in each
                            generated sample. If the text source is Wikipedia,
                            this is the MINIMUM length
      -r, --random          Define if the produced string will have variable word
                            count (with --length being the maximum)
      -f [FORMAT], --format [FORMAT]
                            Define the height of the produced images if
                            horizontal, else the width
      -t [THREAD_COUNT], --thread_count [THREAD_COUNT]
                            Define the number of thread to use for image
                            generation
      -e [EXTENSION], --extension [EXTENSION]
                            Define the extension to save the image with
      -k [SKEW_ANGLE], --skew_angle [SKEW_ANGLE]
                            Define skewing angle of the generated text. In
                            positive degrees
      -rk, --random_skew    When set, the skew angle will be randomized between
                            the value set with -k and it's opposite
      -wk, --use_wikipedia  Use Wikipedia as the source text for the generation,
                            using this paremeter ignores -r, -n, -s
      -bl [BLUR], --blur [BLUR]
                            Apply gaussian blur to the resulting sample. Should be
                            an integer defining the blur radius
      -rbl, --random_blur   When set, the blur radius will be randomized between 0
                            and -bl.
      -b [BACKGROUND], --background [BACKGROUND]
                            Define what kind of background to use. 0: Gaussian
                            Noise, 1: Plain white, 2: Quasicrystal, 3: Image
      -hw, --handwritten    Define if the data will be "handwritten" by an RNN
      -na NAME_FORMAT, --name_format NAME_FORMAT
                            Define how the produced files will be named. 0:
                            [TEXT]_[ID].[EXT], 1: [ID]_[TEXT].[EXT] 2: [ID].[EXT]
                            + one file labels.txt containing id-to-label mappings
      -om OUTPUT_MASK, --output_mask OUTPUT_MASK
                            Define if the generator will return masks for the text
      -obb OUTPUT_BBOXES, --output_bboxes OUTPUT_BBOXES
                            Define if the generator will return bounding boxes for
                            the text, 1: Bounding box file, 2: Tesseract format
      -d [DISTORSION], --distorsion [DISTORSION]
                            Define a distorsion applied to the resulting image. 0:
                            None (Default), 1: Sine wave, 2: Cosine wave, 3:
                            Random
      -do [DISTORSION_ORIENTATION], --distorsion_orientation [DISTORSION_ORIENTATION]
                            Define the distorsion's orientation. Only used if -d
                            is specified. 0: Vertical (Up and down), 1: Horizontal
                            (Left and Right), 2: Both
      -wd [WIDTH], --width [WIDTH]
                            Define the width of the resulting image. If not set it
                            will be the width of the text + 10. If the width of
                            the generated text is bigger that number will be used
      -al [ALIGNMENT], --alignment [ALIGNMENT]
                            Define the alignment of the text in the image. Only
                            used if the width parameter is set. 0: left, 1:
                            center, 2: right
      -or [ORIENTATION], --orientation [ORIENTATION]
                            Define the orientation of the text. 0: Horizontal, 1:
                            Vertical
      -tc [TEXT_COLOR], --text_color [TEXT_COLOR]
                            Define the text's color, should be either a single hex
                            color or a range in the ?,? format.
      -sw [SPACE_WIDTH], --space_width [SPACE_WIDTH]
                            Define the width of the spaces between words. 2.0
                            means twice the normal space width
      -cs [CHARACTER_SPACING], --character_spacing [CHARACTER_SPACING]
                            Define the width of the spaces between characters. 2
                            means two pixels
      -m [MARGINS], --margins [MARGINS]
                            Define the margins around the text when rendered. In
                            pixels
      -fi, --fit            Apply a tight crop around the rendered text
      -ft [FONT], --font [FONT]
                            Define font to be used
      -fd [FONT_DIR], --font_dir [FONT_DIR]
                            Define a font directory to be used
      -id [IMAGE_DIR], --image_dir [IMAGE_DIR]
                            Define an image directory to use when background is
                            set to image
      -ca [CASE], --case [CASE]
                            Generate upper or lowercase only. arguments: upper or
                            lower. Example: --case upper
      -dt [DICT], --dict [DICT]
                            Define the dictionary to be used
      -ws, --word_split     Split on words instead of on characters (preserves
                            ligatures, no character spacing)
      -stw [STROKE_WIDTH], --stroke_width [STROKE_WIDTH]
                            Define the width of the strokes
      -stf [STROKE_FILL], --stroke_fill [STROKE_FILL]
                            Define the color of the contour of the strokes, if
                            stroke_width is bigger than 0
      -im [IMAGE_MODE], --image_mode [IMAGE_MODE]
                            Define the image mode to be used. RGB is default, L
                            means 8-bit grayscale images, 1 means 1-bit binary
                            images stored with one pixel per byte, etc.
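The bounding-box file written with -obb 1 can be read back with a few lines of Python. This sketch assumes only what is described earlier in this post, namely one box per line as four whitespace-separated integers (x1 y1 x2 y2); it does not handle the Tesseract format (-obb 2). The helper names and the sample coordinates are made up.

```python
def parse_box_lines(text):
    """Parse lines of four integers 'x1 y1 x2 y2' into (x1, y1, x2, y2) tuples."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 4:
            raise ValueError(f"line {line_no}: expected 4 values, got {len(parts)}")
        x1, y1, x2, y2 = (int(p) for p in parts)
        boxes.append((x1, y1, x2, y2))
    return boxes


def enclosing_box(boxes):
    """Smallest axis-aligned box that contains every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Made-up file content, not real trdg output.
sample = "2 4 20 30\n22 5 41 30\n\n43 4 60 31\n"
boxes = parse_box_lines(sample)
print(boxes)
print(enclosing_box(boxes))  # (2, 4, 60, 31)
```

The enclosing box gives a text-line-level box, which is handy for a quick visual check that the boxes line up with the rendered text.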
    

    3. Examples

    • Sample images from the original project

    [Figures: sample images from the original project]

    • Images generated with background pictures from a real scene, with characters from the dictionary written on top
      [Figures: generated samples on real-scene backgrounds]
  • Original article: https://blog.csdn.net/qq_40622955/article/details/125876518