A dataset built with TextRecognitionDataGenerator names each image [label]_[index].png, so the file-name prefix is the label; for example, the label of 72K_123.png is 72K.
The script below walks the dataset and produces gt_train.txt and gt_test.txt:
import os
import random

def TextRecognitionDataGenerator():
    """
    Walk every image file under root and write its relative path and label
    to gt_train.txt / gt_test.txt.
    The images were produced with TextRecognitionDataGenerator, so the
    file-name prefix is the label.
    """
    extdName = ["bmp", "jpg", "jpeg", "png"]
    root = r"\\192.168.1.247\Pictures\imageAndModel\paddle_OCR_dataset\OCR_dataset"
    train_ratio = 0.85  # proportion of samples assigned to the training set
    date = "20220720"
    with open(os.path.join(root, date + "_gt_train.txt"), "w", encoding="utf-8") as train_f:
        with open(os.path.join(root, date + "_gt_test.txt"), "w", encoding="utf-8") as test_f:
            for subdir in os.listdir(root):
                subdir = os.path.join(root, subdir)
                if not os.path.isdir(subdir):
                    continue
                for file in os.listdir(subdir):
                    ext = file.rsplit(".", 1)[-1]
                    if ext.lower() in extdName:
                        # randomly assign the sample to the train or test split
                        if random.random() < train_ratio:
                            write_f = train_f
                        else:
                            write_f = test_f
                        # "72K_123.png" -> "72K": everything before the last "_"
                        label = file.rsplit("_", 1)[0]
                        father_dir = os.path.basename(subdir)
                        write_msg = os.path.join(father_dir, file) + "\t" + label + "\n"
                        write_f.write(write_msg)
                        print(write_msg)
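Each line written to the label file is the image path relative to data_dir, a tab, and the label. Assuming a subdirectory named numbers (a hypothetical name; the actual subdirectory names are not given), an entry in 20220720_gt_train.txt looks like:

numbers\72K_123.png	72K

Because the script runs against a Windows UNC path, os.path.join produces a backslash separator; PaddleOCR's SimpleDataSet joins this relative path onto data_dir, so it resolves correctly on Windows.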
The main fields to adjust in the config are listed below (they can also be overridden on the command line with -o, as shown after this summary):
Global:
  save_model_dir: directory where trained models are saved
  character_dict_path: path to the character dictionary (one character per line)
  save_res_path: path where prediction results are saved
Optimizer:
  learning_rate: 0.0001  # learning rate; use a smaller value when finetuning
Train:
  data_dir: path to the dataset
  label_file_list: path to gt_train.txt
  batch_size_per_card: 32  # batch size per card
Eval:
  data_dir: path to the dataset
  label_file_list: path to gt_test.txt
  batch_size_per_card: 32  # batch size per card
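A minimal sketch of overriding these fields without editing the YAML, using PaddleOCR's -o option (values taken from the config below; adjust the paths to your own setup):

python tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.save_model_dir=./output/myOCR_model2 Optimizer.lr.learning_rate=0.0001 Train.loader.batch_size_per_card=32

The command-line keys follow the YAML nesting, e.g. the learning rate lives under Optimizer.lr.learning_rate and the batch size under Train.loader.batch_size_per_card.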
The full config file I used for training is below:
Global:
  debug: false
  use_gpu: true
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/myOCR_model2
  save_epoch_step: 3
  eval_batch_step: [0, 500]
  cal_metric_during_train: true
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: ./doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/my_en_dict2.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/myOCR_model2/predicts_ppocrv3_en.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset
    ext_op_transform_idx: 1
    label_file_list:
    - D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset\20220720_gt_train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 32
    drop_last: true
    num_workers: 1
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset
    label_file_list:
    - D:\myAPP\pythonDoc\PaddleOCRv3\train_data\Paddle_OCR\OCR_dataset\20220720_gt_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 1
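The config points character_dict_path at a custom dictionary, my_en_dict2.txt. PaddleOCR expects one character per line in this file; its exact contents are not given here, but for labels like 72K it would at least contain the digits and the letter K, for example:

0
1
2
3
4
5
6
7
8
9
K

With use_space_char: true, the space character is appended to this character set automatically.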
Training is then started by finetuning from the official English PP-OCRv3 recognition weights:
python tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.pretrained_model=./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
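If training is interrupted, it can be resumed by pointing Global.checkpoints at the weights already saved in save_model_dir (a sketch, assuming PaddleOCR's default latest checkpoint files):

python tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.checkpoints=./output/myOCR_model2/latest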