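The code comes from https://github.com/Belval/TextRecognitionDataGenerator
Download the repository archive and extract it.
Install the package: pip install trdg
Enter the extracted folder and install the requirements: pip install -r requirements.txt
Installation is now complete.
Enter the trdg folder: cd trdg
Run a quick test to check for errors: python run.py -c 10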
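The test run fails with:
AttributeError: module 'PIL.Image' has no attribute 'Resampling'
'FreeTypeFont' object has no attribute 'getlength'
Both errors come from an outdated Pillow version that lacks these attributes.
Upgrading Pillow (to 9.2.0 here) fixes them:
pip install --upgrade pillow -i https://pypi.tuna.tsinghua.edu.cn/simple
Run the command again; the out folder now contains the generated data.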
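To see in advance whether an installed Pillow would trigger the two errors above, the version can be compared against the releases that introduced the missing attributes. The sketch below assumes that `Image.Resampling` arrived in Pillow 9.1.0 and `FreeTypeFont.getlength` in 8.0.0, as stated in the Pillow release notes; the helper names `parse_version` and `missing_features` are made up for this example.

```python
import itertools
from typing import List, Tuple

# Assumed minimum Pillow releases for the attributes named in the errors.
REQUIRED = {
    "PIL.Image.Resampling": (9, 1, 0),
    "FreeTypeFont.getlength": (8, 0, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '9.2.0' into (9, 2, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '9.1' to (9, 1, 0) so comparisons are fair
        parts.append(0)
    return tuple(parts)


def missing_features(pillow_version: str) -> List[str]:
    """List the attributes that the given Pillow version does not provide."""
    installed = parse_version(pillow_version)
    return [name for name, minimum in REQUIRED.items() if installed < minimum]


if __name__ == "__main__":
    try:
        import PIL
    except ImportError:
        print("Pillow is not installed")
    else:
        print(PIL.__version__, "missing:", missing_features(PIL.__version__))
```

An empty list means neither error should occur for that version.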
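To customize the file names written to labels.txt, edit the name_format == 2 branch in run.py:
    if args.name_format == 2:
        # Create file with filename-to-label connections
        with open(
            os.path.join(args.output_dir, "labels.txt"), "w", encoding="utf8"
        ) as f:
            for i in range(string_count):
                # Edit this naming scheme as needed. To continue numbering
                # after existing labels, replace 0 with the starting offset.
                iname = "20221114_gan_y_" + str(i + 0)
                file_name = iname + "." + args.extension
                label = strings[i]
                label = label.replace(" ", "")
                f.write("{}\t{}\n".format(file_name, label))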
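The same naming logic can be tried outside the generator. The following is a minimal standalone sketch, not part of trdg: `write_labels`, `prefix` and `start` are hypothetical names introduced here. It covers only the label file and assumes the images themselves are saved under matching names.

```python
import os
import tempfile
from typing import List


def write_labels(output_dir: str, strings: List[str], extension: str = "jpg",
                 prefix: str = "20221114_gan_y_", start: int = 0) -> str:
    """Write labels.txt with one 'file_name<TAB>label' line per string."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "labels.txt")
    with open(path, "w", encoding="utf8") as f:
        for i, text in enumerate(strings):
            # start lets a new batch continue the numbering of an earlier one
            file_name = "{}{}.{}".format(prefix, i + start, extension)
            label = text.replace(" ", "")  # labels are stored without spaces
            f.write("{}\t{}\n".format(file_name, label))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        labels_path = write_labels(tmp, ["20220708 A000", "2158D219A1"], start=100)
        with open(labels_path, encoding="utf8") as f:
            print(f.read())
```

Passing `start=100` would number a second batch from 100 onward, which corresponds to replacing the `0` in `str(i + 0)`.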
Add the following arguments to python run.py as needed.
Quoted from: https://blog.csdn.net/u012995500/article/details/109405270?spm=1001.2014.3001.5501
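| Argument | Description | Example |
|---|---|---|
| --output_dir | Output directory for the generated images | |
| -ft/--font | Font file (.ttf) used to render the text | -ft ./fonts/font/simhei.ttf |
| -fd/--font_dir | Font directory; each image uses a font picked at random from it | |
| -dt/--dict | Dictionary file (path) from which words are drawn | -dt ./dicts/0.txt (a temporary file) |
| -i/--input_file | Source text file (path); the project's default file is used if not set | |
| -l/--language | Language: en for English, cn for Chinese; default en | |
| -w/--length | Number of words in each generated image | 12 characters: -w 12 |
| -c/--count | Number of images to generate | -c 500 |
| -f/--format | Pixel height of the image (horizontal layout) or pixel width (vertical layout) | -f 100 |
| -b/--background | Image background. 0: Gaussian noise, 1: plain white, 2: quasicrystal, 3: image | |
| -sw/--space_width | Width of the spaces between words, as a multiple of the normal space width (2.0 means twice as wide); default 1.0 | -sw 0 |
| -na/--name_format | Naming scheme for the generated files. 0: [TEXT]_[ID].[EXT], 1: [ID]_[TEXT].[EXT], 2: [ID].[EXT] plus a labels.txt file that maps file names to labels. File names normally contain the label, but special characters are not allowed in file names, so format 2 records the labels in a separate text file | -na 2 |
| -rs/--random_sequences | Use random sequences as the source text. -let, -num and -sym only take effect together with -rs; if none of them is given, all three are used | |
| -let/--include_letters | Include letters in the characters used to build random words | |
| -num/--include_numbers | Include digits in the characters used to build random words | |
| -sym/--include_symbols | Include symbols in the characters used to build random words | |
| -r/--random | Generate images with a variable number of words, with the -w value as the upper bound | |
| -t/--thread_count | Number of threads. The quoted source reports about 6 s for 10,000 images with 8 threads, so a higher thread count speeds things up noticeably | |
| -e/--extension | File format of the saved images; default jpg | |
| -k/--skew_angle | Skew angle of the text in the image | |
| -rk/--random_skew | With -k set to a, the skew angle is drawn at random from -a to a | |
| -bl/--blur | Gaussian blur radius; default 0 (no blur) | |
| -rbl/--random_blur | With -bl set to b, the blur radius is drawn at random from 0 to b | |
| -id/--image_dir | When -b is 3 (image), the folder from which background images are read | |
| -hw/--handwritten | Generate handwritten-style text with a trained RNN model | |
| -om/--output_mask | Also output a text mask of the same size for each generated image; can be used as a training trick | |
| -d/--distorsion | Distortion applied to the text; default 0 (none). 1: sine wave, 2: cosine wave, 3: random | |
| -do/--distorsion_orientation | Distortion direction, used only when -d is set. 0: vertical, 1: horizontal, 2: both | |
| -wd/--width | Pixel width of the image. If not set, it is the text width + 10; if set too small, part of the text is cut off | |
| -al/--alignment | Text alignment when -wd is set, which also decides which part of the text is kept when it is cut off. 0: left, 1: center, 2: right | |
| -or/--orientation | Text layout. 0: horizontal, 1: vertical; default horizontal | |
| -tc/--text_color | Text color as a single hex color or a range of hex colors | #282828 or #000000,#282828 |
| -cs/--character_spacing | Spacing between characters in pixels; default 0 | |
| -m/--margins | Margins around the text (top, left, bottom, right) in pixels; default (5,5,5,5) | |
| -fi/--fit | Crop the image tightly around the text so that all margins are 0; default False | |
| -ca/--case | Force upper or lower case: upper/lower | |
| -ws/--word_split | When set, split the text on words instead of on characters; default False (characters) | |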
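When many of these arguments are combined, it can help to assemble the command from a dictionary and pass it to the interpreter as an argument list, which also sidesteps shell quoting of values such as `#000000,#FFFFFF`. This is a minimal sketch under that assumption; `build_trdg_command` is a hypothetical helper, not part of trdg, and the flags used are taken from the table above.

```python
import shlex
import sys
from typing import Dict, List, Union

Value = Union[bool, int, float, str]


def build_trdg_command(options: Dict[str, Value], script: str = "run.py") -> List[str]:
    """Turn {'-c': 500, '-fi': True} into an argument list for run.py."""
    cmd = [sys.executable, script]
    for flag, value in options.items():
        if value is True:
            cmd.append(flag)  # switch without a value, e.g. -rs, -r, -rk, -fi
        elif value is False:
            continue  # switch left off
        else:
            cmd.extend([flag, str(value)])  # flag followed by its value
    return cmd


if __name__ == "__main__":
    command = build_trdg_command(
        {"-c": 500, "-w": 10, "-f": 100, "-b": 1, "-sw": 0, "-fi": True, "-na": 2}
    )
    print(" ".join(shlex.quote(part) for part in command))
    # To launch it from inside the trdg folder:
    # import subprocess; subprocess.run(command, check=True)
```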
python run.py --help
usage: run.py [-h] [--output_dir [OUTPUT_DIR]] [-i [INPUT_FILE]]
              [-l [LANGUAGE]] -c [COUNT] [-rs] [-let] [-num] [-sym]
              [-w [LENGTH]] [-r] [-f [FORMAT]] [-t [THREAD_COUNT]]
              [-e [EXTENSION]] [-k [SKEW_ANGLE]] [-rk] [-wk] [-bl [BLUR]]
              [-rbl] [-b [BACKGROUND]] [-hw] [-na NAME_FORMAT]
              [-om OUTPUT_MASK] [-obb OUTPUT_BBOXES] [-d [DISTORSION]]
              [-do [DISTORSION_ORIENTATION]] [-wd [WIDTH]] [-al [ALIGNMENT]]
              [-or [ORIENTATION]] [-tc [TEXT_COLOR]] [-sw [SPACE_WIDTH]]
              [-cs [CHARACTER_SPACING]] [-m [MARGINS]] [-fi] [-ft [FONT]]
              [-fd [FONT_DIR]] [-id [IMAGE_DIR]] [-ca [CASE]] [-dt [DICT]]
              [-ws] [-stw [STROKE_WIDTH]] [-stf [STROKE_FILL]]
              [-im [IMAGE_MODE]]
Generate synthetic text data for text recognition.
optional arguments:
-h, --help show this help message and exit
--output_dir [OUTPUT_DIR]
The output directory
-i [INPUT_FILE], --input_file [INPUT_FILE]
When set, this argument uses a specified text file as
source for the text
-l [LANGUAGE], --language [LANGUAGE]
The language to use, should be fr (French), en
(English), es (Spanish), de (German), ar (Arabic), cn
(Chinese), ja (Japanese) or hi (Hindi)
-c [COUNT], --count [COUNT]
The number of images to be created.
-rs, --random_sequences
Use random sequences as the source text for the
generation. Set '-let','-num','-sym' to use
letters/numbers/symbols. If none specified, using all
three.
-let, --include_letters
Define if random sequences should contain letters.
Only works with -rs
-num, --include_numbers
Define if random sequences should contain numbers.
Only works with -rs
-sym, --include_symbols
Define if random sequences should contain symbols.
Only works with -rs
-w [LENGTH], --length [LENGTH]
Define how many words should be included in each
generated sample. If the text source is Wikipedia,
this is the MINIMUM length
-r, --random Define if the produced string will have variable word
count (with --length being the maximum)
-f [FORMAT], --format [FORMAT]
Define the height of the produced images if
horizontal, else the width
-t [THREAD_COUNT], --thread_count [THREAD_COUNT]
Define the number of thread to use for image
generation
-e [EXTENSION], --extension [EXTENSION]
Define the extension to save the image with
-k [SKEW_ANGLE], --skew_angle [SKEW_ANGLE]
Define skewing angle of the generated text. In
positive degrees
-rk, --random_skew When set, the skew angle will be randomized between
the value set with -k and it's opposite
-wk, --use_wikipedia Use Wikipedia as the source text for the generation,
using this paremeter ignores -r, -n, -s
-bl [BLUR], --blur [BLUR]
Apply gaussian blur to the resulting sample. Should be
an integer defining the blur radius
-rbl, --random_blur When set, the blur radius will be randomized between 0
and -bl.
-b [BACKGROUND], --background [BACKGROUND]
Define what kind of background to use. 0: Gaussian
Noise, 1: Plain white, 2: Quasicrystal, 3: Image
-hw, --handwritten Define if the data will be "handwritten" by an RNN
-na NAME_FORMAT, --name_format NAME_FORMAT
Define how the produced files will be named. 0:
[TEXT]_[ID].[EXT], 1: [ID]_[TEXT].[EXT] 2: [ID].[EXT]
+ one file labels.txt containing id-to-label mappings
-om OUTPUT_MASK, --output_mask OUTPUT_MASK
Define if the generator will return masks for the text
-obb OUTPUT_BBOXES, --output_bboxes OUTPUT_BBOXES
Define if the generator will return bounding boxes for
the text, 1: Bounding box file, 2: Tesseract format
-d [DISTORSION], --distorsion [DISTORSION]
Define a distorsion applied to the resulting image. 0:
None (Default), 1: Sine wave, 2: Cosine wave, 3:
Random
-do [DISTORSION_ORIENTATION], --distorsion_orientation [DISTORSION_ORIENTATION]
Define the distorsion's orientation. Only used if -d
is specified. 0: Vertical (Up and down), 1: Horizontal
(Left and Right), 2: Both
-wd [WIDTH], --width [WIDTH]
Define the width of the resulting image. If not set it
will be the width of the text + 10. If the width of
the generated text is bigger that number will be used
-al [ALIGNMENT], --alignment [ALIGNMENT]
Define the alignment of the text in the image. Only
used if the width parameter is set. 0: left, 1:
center, 2: right
-or [ORIENTATION], --orientation [ORIENTATION]
Define the orientation of the text. 0: Horizontal, 1:
Vertical
-tc [TEXT_COLOR], --text_color [TEXT_COLOR]
Define the text's color, should be either a single hex
color or a range in the ?,? format.
-sw [SPACE_WIDTH], --space_width [SPACE_WIDTH]
Define the width of the spaces between words. 2.0
means twice the normal space width
-cs [CHARACTER_SPACING], --character_spacing [CHARACTER_SPACING]
Define the width of the spaces between characters. 2
means two pixels
-m [MARGINS], --margins [MARGINS]
Define the margins around the text when rendered. In
pixels
-fi, --fit Apply a tight crop around the rendered text
-ft [FONT], --font [FONT]
Define font to be used
-fd [FONT_DIR], --font_dir [FONT_DIR]
Define a font directory to be used
-id [IMAGE_DIR], --image_dir [IMAGE_DIR]
Define an image directory to use when background is
set to image
-ca [CASE], --case [CASE]
Generate upper or lowercase only. arguments: upper or
lower. Example: --case upper
-dt [DICT], --dict [DICT]
Define the dictionary to be used
-ws, --word_split Split on words instead of on characters (preserves
ligatures, no character spacing)
-stw [STROKE_WIDTH], --stroke_width [STROKE_WIDTH]
Define the width of the strokes
-stf [STROKE_FILL], --stroke_fill [STROKE_FILL]
Define the color of the contour of the strokes, if
stroke_width is bigger than 0
-im [IMAGE_MODE], --image_mode [IMAGE_MODE]
Define the image mode to be used. RGB is default, L
means 8-bit grayscale images, 1 means 1-bit binary
images stored with one pixel per byte, etc.
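The three naming schemes listed under -na in the help text can be illustrated with a short function. This is only a sketch of the patterns as the help text describes them, not trdg's own code; `output_name` is a hypothetical name.

```python
def output_name(name_format: int, text: str, index: int, extension: str = "jpg") -> str:
    """Build a file name following the -na patterns from the help text."""
    if name_format == 0:
        return "{}_{}.{}".format(text, index, extension)  # [TEXT]_[ID].[EXT]
    if name_format == 1:
        return "{}_{}.{}".format(index, text, extension)  # [ID]_[TEXT].[EXT]
    if name_format == 2:
        # [ID].[EXT]; the label goes into labels.txt instead of the file name
        return "{}.{}".format(index, extension)
    raise ValueError("name_format must be 0, 1 or 2")


if __name__ == "__main__":
    for fmt in (0, 1, 2):
        print(fmt, output_name(fmt, "A000", 7))
```

Format 2 is the only one that keeps arbitrary label characters out of the file name, which is why the examples in this post use -na 2.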
Using the fonts in D:\MyDatasets\ocr\alpha\font, generate text in the format 20220708 A000 or 2158D219A1:
python run.py -fd D:/MyDatasets/ocr/alpha/font/ --random_sequences -let --include_numbers -c 10 -w 2 -r -wd 340 -f 50 -b 3 -na 2
python run.py -fd D:/MyDatasets/ocr/alpha/font/ --random_sequences --include_numbers -c 50 -w 2 -r -wd 340 -f 50 -b 3 -k 5 -rk -bl 3 -rbl
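The command above combines -k 5 -rk and -bl 3 -rbl. According to the help text, the skew angle is then randomized between -5 and 5 and the blur radius between 0 and 3. The sketch below shows those ranges; uniform integer sampling is an assumption made here, and `sample_skew` and `sample_blur` are hypothetical helpers, not trdg functions.

```python
import random


def sample_skew(skew_angle: int, random_skew: bool, rng: random.Random) -> int:
    """Return the skew for one image: fixed, or drawn from [-skew_angle, skew_angle]."""
    if random_skew:
        return rng.randint(-skew_angle, skew_angle)
    return skew_angle


def sample_blur(blur: int, random_blur: bool, rng: random.Random) -> int:
    """Return the blur radius for one image: fixed, or drawn from [0, blur]."""
    if random_blur:
        return rng.randint(0, blur)
    return blur


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same values
    for _ in range(5):
        print(sample_skew(5, True, rng), sample_blur(3, True, rng))
```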
Chinese:
I collected a set of visually similar characters and built a txt dictionary from them.
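A dictionary file of this kind could be produced with a few lines of Python. The sketch assumes that -dt expects a UTF-8 text file with one entry per line; the characters listed are only illustrative look-alikes, not the author's actual dictionary, and `write_dict` is a hypothetical helper.

```python
import os
import tempfile
from typing import Iterable


def write_dict(path: str, words: Iterable[str]) -> int:
    """Write unique, non-empty entries one per line; return how many were written."""
    seen = set()
    count = 0
    with open(path, "w", encoding="utf8") as f:
        for word in words:
            word = word.strip()
            if not word or word in seen:
                continue  # skip blanks and duplicates
            seen.add(word)
            f.write(word + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative groups of visually similar characters.
    similar = ["己", "已", "巳", "未", "末", "土", "士"]
    with tempfile.TemporaryDirectory() as tmp:
        dict_path = os.path.join(tmp, "chinese.txt")
        print(write_dict(dict_path, similar), "entries written to", dict_path)
```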
python run.py -fd ./fonts/cn -l cn -dt D:/pythonProject/TextRecognitionDataGenerator-master/trdg/dicts/chinese.txt -c 100 -w 12 -r -wd 340 -f 50 -b 3 --output_dir ./cout -sw 0 -k 5 -rk -bl 1 -rbl -tc "#000000,#FFFFFF"
python run.py -fd D:/MyDatasets/ocr/alpha/njs/ -dt ./dicts/temp.txt -c 500 -w 10 -f 100 -b 1 --output_dir ./ntt -sw 0 -fi -na 2
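After a run with -na 2, the labels file can be read back and checked against the images on disk. This sketch assumes the tab-separated `file_name<TAB>label` lines written by the modified snippet earlier in this post; `read_labels` and `missing_images` are hypothetical helper names.

```python
import os
from typing import List, Tuple


def read_labels(path: str) -> List[Tuple[str, str]]:
    """Parse labels.txt into (file_name, label) pairs."""
    pairs = []
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue
            file_name, _, label = line.partition("\t")
            pairs.append((file_name, label))
    return pairs


def missing_images(output_dir: str, pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the file names listed in labels.txt that are not present on disk."""
    return [name for name, _ in pairs
            if not os.path.isfile(os.path.join(output_dir, name))]


if __name__ == "__main__":
    out_dir = "./ntt"  # output directory used in the command above
    labels_file = os.path.join(out_dir, "labels.txt")
    if os.path.isfile(labels_file):
        entries = read_labels(labels_file)
        print(len(entries), "labels,", len(missing_images(out_dir, entries)), "images missing")
```

A non-empty result from `missing_images` would indicate that the names in labels.txt and the saved image names have drifted apart.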