• [Engineering Deployment] Deploying OCR (text detection + recognition, DBNet + CRNN) on the RK3588


    Hardware platform:

    1. A Firefly RK3588 board running Ubuntu;

    2. A Windows PC running an Ubuntu 18.04 virtual machine.

    Reference manuals: 《00-Rockchip_RKNPU_User_Guide_RKNN_API_V1.3.0_CN》

    《RKNN Toolkit Lite2 用户使用指南》 (RKNN Toolkit Lite2 User Guide)

    1. Text Detection

    Project repository:

    GitHub - WenmuZhou/PytorchOCR: a PyTorch-based OCR toolbox supporting common text detection and recognition algorithms

    DBNet (Differentiable Binarization Network) is a segmentation-based deep learning model for text detection, proposed in 2019 by Minghui Liao et al. in "Real-time Scene Text Detection with Differentiable Binarization". Its design goal is to improve detection efficiency while keeping accuracy high. Traditional segmentation-based detectors binarize the predicted probability map with a fixed threshold and then group pixels into text instances with heuristic post-processing; because hard thresholding is not differentiable, it cannot be optimized jointly with the network, which limits both accuracy and speed.

    To address this, DBNet moves binarization into the network itself. The model has two key components:

    (1) A segmentation head on top of a backbone with FPN-style feature fusion. Features from several scales are upsampled, concatenated, and used to predict two maps the same size as the input: a probability map P, giving the likelihood that each pixel belongs to text, and a threshold map T, giving a per-pixel binarization threshold learned from the image content.

    (2) A differentiable binarization (DB) module. Instead of a hard step function, the approximate binary map is computed as B = 1 / (1 + e^(-k(P - T))), where k is an amplification factor (the paper uses k = 50). Because this function is differentiable, the threshold map can be supervised and learned end to end, which produces sharper boundaries between adjacent text instances.

    Training follows the standard forward and backward passes: the image goes through the backbone, feature fusion, and the two prediction heads; losses are computed on the probability map, the approximate binary map, and the threshold map; and back-propagation updates the weights. At inference time the DB module can be dropped entirely: the probability map is binarized with a fixed threshold, contours are extracted, and boxes are expanded with the Vatti clipping algorithm (the unclip step implemented with pyclipper in the post-processing code below).

    DBNet achieves strong results on text detection benchmarks. Because the learned thresholds remove the need for complex post-processing, it reaches real-time speed while keeping high accuracy, which makes it practical for deployment. Paper: https://arxiv.org/abs/1911.08947
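    To make the binarization step concrete, the following is a minimal NumPy sketch of the DB function (the 640x640 map size matches the deployment pipeline below; the random P and constant T are placeholders standing in for real network outputs):

    import numpy as np

    def differentiable_binarization(prob_map, thresh_map, k=50):
        """Approximate binarization B = 1 / (1 + exp(-k * (P - T))).

        Unlike a hard threshold, this stays differentiable, so the
        threshold map T can be learned jointly with the probability map P.
        """
        return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

    # Placeholder maps standing in for real network outputs (N=1, H=W=640)
    P = np.random.rand(1, 640, 640).astype(np.float32)   # probability map
    T = np.full((1, 640, 640), 0.3, dtype=np.float32)    # threshold map
    B = differentiable_binarization(P, T)                # approximate binary map
    print(B.min(), B.max())  # values squashed towards 0 or 1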

    Figure 1. DBNet network architecture

    2. Text Recognition

    Project repository:

    GitHub - WenmuZhou/PytorchOCR: a PyTorch-based OCR toolbox supporting common text detection and recognition algorithms

    CRNN (Convolutional Recurrent Neural Network) is a deep learning model that combines the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and is widely used for image-based text recognition (OCR). It was first proposed by Baoguang Shi et al. in 2015 and marked a significant step forward in the OCR field.

    The design idea behind CRNN is to use a CNN for image feature extraction and an RNN for sequence modeling, so that the model can learn end to end, directly from the image level to the sequence level.

    A CRNN model typically consists of the following parts:

    (1) Convolutional layers: CRNN uses a stack of convolutional layers to extract local features from the image, learning representations at different levels, from low-level features (edges and textures) to higher-level ones (shapes and patterns).

    (2) Recurrent layers: after the convolutional stack, CRNN applies RNN layers to the resulting feature sequence. Because the RNN captures contextual information along the sequence, it handles text lines of varying length effectively.

    (3) Transcription layer: after the recurrent layers, a transcription layer maps the RNN outputs to character classes. This is usually a fully connected layer projecting onto a predefined character set, whose per-time-step outputs are decoded (with CTC) into the recognized text.

    Training follows the usual two steps of forward and backward propagation: the image passes through the convolutional and recurrent layers to produce a predicted character sequence, a loss between prediction and ground truth is computed, and back-propagation updates the parameters so the predictions converge towards the labels.

    CRNN is applied widely in OCR and can recognize text of different sizes, fonts, colors, and backgrounds. It performs well on long text sequences, and its end-to-end design avoids the complex multi-stage pipelines of traditional OCR systems. It has therefore proven effective in many practical scenarios such as license plate recognition, text detection, and handwriting recognition.

    In summary, CRNN combines a CNN and an RNN into a single model for image text recognition. Its end-to-end design, strong sequence modeling ability, and broad adoption make it one of the most important OCR models, greatly simplifying automated text processing and recognition. Paper: https://arxiv.org/abs/1507.05717
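    The three components map naturally onto a few lines of PyTorch. The sketch below is illustrative only (the layer widths, pooling scheme, and 6625-class alphabet are assumptions, not the PytorchOCR configuration); it shows how an input of height 32 becomes one class distribution per time step for CTC:

    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        """Illustrative CRNN: CNN features -> sequence -> BiLSTM -> per-step logits."""
        def __init__(self, num_classes, in_channels=3):
            super().__init__()
            self.cnn = nn.Sequential(                     # input (N, 3, 32, W)
                nn.Conv2d(in_channels, 64, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),          # collapse height -> (N, 256, 1, W/4)
            )
            self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(256, num_classes)         # transcription layer

        def forward(self, x):
            feat = self.cnn(x).squeeze(2)                 # (N, 256, T)
            feat = feat.permute(0, 2, 1)                  # (N, T, 256)
            seq, _ = self.rnn(feat)                       # (N, T, 256)
            return self.fc(seq)                           # (N, T, num_classes), fed to CTC

    logits = TinyCRNN(num_classes=6625)(torch.randn(1, 3, 32, 224))
    print(logits.shape)  # torch.Size([1, 56, 6625])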

    Figure 2. CRNN architecture

    Environment Setup

    Setting up rknn-toolkit and the rknpu_sdk

    (Hands-on) Setting up the rknn-toolkit and rknpu_sdk environment, using the RK3588 as an example_warren@伟_'s blog - CSDN

    Model Export and Verification

    Text detection

    Exporting the ONNX model

    #!/usr/bin/env python3
    '''
    Author: warren
    Date: 2023-06-07 14:52:27
    LastEditors: warren
    LastEditTime: 2023-06-12 15:20:28
    FilePath: /warren/VanillaNet1/export_onnx.py
    Description: export onnx model
    Copyright (c) 2023 by ${git_name_email}, All Rights Reserved.
    '''
    import torch
    from torchocr.networks import build_model

    MODEL_PATH = './model/det_db_mbv3_new.pth'
    DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    print("-----------------------devices", DEVICE)


    class DetInfer:
        def __init__(self, model_path):
            # Load the checkpoint and rebuild the network from its stored config
            ckpt = torch.load(model_path, map_location=DEVICE)
            cfg = ckpt['cfg']
            self.model = build_model(cfg['model'])
            state_dict = {}
            for k, v in ckpt['state_dict'].items():
                state_dict[k.replace('module.', '')] = v
            self.model.load_state_dict(state_dict)
            self.device = torch.device(DEVICE)
            self.model.to(self.device)
            self.model.eval()
            # Prepare a dummy input tensor (NCHW, 640x640)
            input = torch.randn(1, 3, 640, 640, requires_grad=False).float().to(self.device)
            # Export the torch model as onnx
            print("-------------------export")
            torch.onnx.export(self.model,
                              input,
                              'detect_model_small.onnx',  # name of the exported onnx model
                              export_params=True,
                              opset_version=12,
                              do_constant_folding=False)


    # Load the pretrained model and export it as onnx
    model = DetInfer(MODEL_PATH)
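    Before writing the full verification script, a quick sanity check of the exported file (a minimal sketch assuming onnx and onnxruntime are installed) can catch export problems early:

    import onnx
    import onnxruntime
    import numpy as np

    # Verify the exported graph is well-formed
    model = onnx.load('detect_model_small.onnx')
    onnx.checker.check_model(model)

    # Run one dummy inference and print the output shapes
    sess = onnxruntime.InferenceSession('detect_model_small.onnx')
    input_name = sess.get_inputs()[0].name
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    out = sess.run(None, {input_name: dummy})
    print([o.shape for o in out])  # the verification script below uses out[0] as an (N, C, 640, 640) segmentation map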

    Verification

    import numpy as np
    import cv2
    import torch
    from torchvision import transforms
    # from label_convert import CTCLabelConverter
    import pyclipper
    from shapely.geometry import Polygon
    import onnxruntime


    class DBPostProcess():
        def __init__(self, thresh=0.3, box_thresh=0.7, max_candidates=1000, unclip_ratio=2):
            self.min_size = 3
            self.thresh = thresh
            self.box_thresh = box_thresh
            self.max_candidates = max_candidates
            self.unclip_ratio = unclip_ratio

        def __call__(self, pred, h_w_list, is_output_polygon=False):
            '''
            h_w_list: list of [h, w] pairs, one per image in the batch
            pred:
                binary: text region segmentation map, with shape (N, 1, H, W)
            '''
            pred = pred[:, 0, :, :]
            segmentation = self.binarize(pred)
            boxes_batch = []
            scores_batch = []
            for batch_index in range(pred.shape[0]):
                height, width = h_w_list[batch_index]
                boxes, scores = self.post_p(pred[batch_index], segmentation[batch_index], width, height,
                                            is_output_polygon=is_output_polygon)
                boxes_batch.append(boxes)
                scores_batch.append(scores)
            return boxes_batch, scores_batch

        def binarize(self, pred):
            return pred > self.thresh

        def post_p(self, pred, bitmap, dest_width, dest_height, is_output_polygon=False):
            '''
            _bitmap: single map with shape (H, W), whose values are binarized as {0, 1}
            '''
            height, width = pred.shape
            boxes = []
            new_scores = []
            # bitmap = bitmap.cpu().numpy()
            if cv2.__version__.startswith('3'):
                _, contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            if cv2.__version__.startswith('4'):
                contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            for contour in contours[:self.max_candidates]:
                epsilon = 0.005 * cv2.arcLength(contour, True)
                approx = cv2.approxPolyDP(contour, epsilon, True)
                points = approx.reshape((-1, 2))
                if points.shape[0] < 4:
                    continue
                score = self.box_score_fast(pred, contour.squeeze(1))
                if self.box_thresh > score:
                    continue
                if points.shape[0] > 2:
                    box = self.unclip(points, unclip_ratio=self.unclip_ratio)
                    if len(box) > 1:
                        continue
                else:
                    continue
                four_point_box, sside = self.get_mini_boxes(box.reshape((-1, 1, 2)))
                if sside < self.min_size + 2:
                    continue
                if not isinstance(dest_width, int):
                    dest_width = dest_width.item()
                    dest_height = dest_height.item()
                if not is_output_polygon:
                    box = np.array(four_point_box)
                else:
                    box = box.reshape(-1, 2)
                # Rescale box coordinates back to the destination image size
                box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
                box[:, 1] = np.clip(np.round(box[:, 1] / height * dest_height), 0, dest_height)
                boxes.append(box)
                new_scores.append(score)
            return boxes, new_scores

        def unclip(self, box, unclip_ratio=1.5):
            # Expand the shrunk polygon with the Vatti clipping algorithm
            poly = Polygon(box)
            distance = poly.area * unclip_ratio / poly.length
            offset = pyclipper.PyclipperOffset()
            offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
            expanded = np.array(offset.Execute(distance))
            return expanded

        def get_mini_boxes(self, contour):
            bounding_box = cv2.minAreaRect(contour)
            points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
            index_1, index_2, index_3, index_4 = 0, 1, 2, 3
            if points[1][1] > points[0][1]:
                index_1 = 0
                index_4 = 1
            else:
                index_1 = 1
                index_4 = 0
            if points[3][1] > points[2][1]:
                index_2 = 2
                index_3 = 3
            else:
                index_2 = 3
                index_3 = 2
            box = [points[index_1], points[index_2], points[index_3], points[index_4]]
            return box, min(bounding_box[1])

        def box_score_fast(self, bitmap, _box):
            # bitmap = bitmap.detach().cpu().numpy()
            h, w = bitmap.shape[:2]
            box = _box.copy()
            xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1)
            xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1)
            ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int32), 0, h - 1)
            ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1)
            mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
            box[:, 0] = box[:, 0] - xmin
            box[:, 1] = box[:, 1] - ymin
            cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]


    def narrow_224_32(image, expected_size=(224, 32)):
        # Resize while keeping the aspect ratio, then pad to the expected size
        ih, iw = image.shape[0:2]
        ew, eh = expected_size
        # scale = eh / ih
        scale = min((eh / ih), (ew / iw))
        # scale = eh / max(iw, ih)
        nh = int(ih * scale)
        nw = int(iw * scale)
        image = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_CUBIC)
        top = 0
        bottom = eh - nh
        left = 0
        right = ew - nw
        new_img = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
        return image, new_img


    def draw_bbox(img_path, result, color=(0, 0, 255), thickness=2):
        if isinstance(img_path, str):
            img_path = cv2.imread(img_path)
            # img_path = cv2.cvtColor(img_path, cv2.COLOR_BGR2RGB)
        img_path = img_path.copy()
        for point in result:
            point = point.astype(int)
            cv2.polylines(img_path, [point], True, color, thickness)
        return img_path


    if __name__ == '__main__':
        onnx_model = onnxruntime.InferenceSession("detect_model_small.onnx")
        input_name = onnx_model.get_inputs()[0].name
        # Set inputs
        img = cv2.imread('./pic/6.jpg')
        img0, image = narrow_224_32(img, expected_size=(640, 640))
        transform_totensor = transforms.ToTensor()
        tensor = transform_totensor(image)
        tensor_nor = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        tensor = tensor_nor(tensor)
        tensor = np.array(tensor, dtype=np.float32).reshape(1, 3, 640, 640)
        post_proess = DBPostProcess()
        is_output_polygon = False
        # Run inference
        outputs = onnx_model.run(None, {input_name: tensor})
        # Post-process
        feat_2 = torch.from_numpy(outputs[0])
        print(feat_2.size())
        box_list, score_list = post_proess(outputs[0], [image.shape[:2]], is_output_polygon=is_output_polygon)
        box_list, score_list = box_list[0], score_list[0]
        if len(box_list) > 0:
            idx = [x.sum() > 0 for x in box_list]
            box_list = [box_list[i] for i, v in enumerate(idx) if v]
            score_list = [score_list[i] for i, v in enumerate(idx) if v]
        else:
            box_list, score_list = [], []
        print("-----------------box list", box_list)
        img = draw_bbox(image, box_list)
        img = img[0:img0.shape[0], 0:img0.shape[1]]
        print("============save pic")
        img1 = np.array(img, dtype=np.uint8).reshape(640, 640, 3)
        cv2.imwrite("img.jpg", img1)
        cv2.waitKey()

    Text recognition

    Exporting the ONNX model

    #!/usr/bin/env python3
    import os
    import sys
    import pathlib
    # Add the torchocr directory to the Python path
    __dir__ = pathlib.Path(os.path.abspath(__file__))
    import numpy as np
    sys.path.append(str(__dir__))
    sys.path.append(str(__dir__.parent.parent))
    import torch
    from torchocr.networks import build_model

    MODEL_PATH = './model/ch_rec_moblie_crnn_mbv3.pth'
    DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    print("-----------------------devices", DEVICE)


    class RecInfer:
        def __init__(self, model_path, batch_size=1):
            # Load the checkpoint and rebuild the network from its stored config
            ckpt = torch.load(model_path, map_location=DEVICE)
            cfg = ckpt['cfg']
            self.model = build_model(cfg['model'])
            state_dict = {}
            for k, v in ckpt['state_dict'].items():
                state_dict[k.replace('module.', '')] = v
            self.model.load_state_dict(state_dict)
            self.batch_size = batch_size
            self.device = torch.device(DEVICE)
            self.model.to(self.device)
            self.model.eval()
            # Prepare a dummy input tensor (NCHW, height 32, width 224)
            input = torch.randn(1, 3, 32, 224, requires_grad=False).float().to(self.device)
            # Export the torch model as onnx
            print("-------------------export")
            torch.onnx.export(self.model,
                              input,
                              'rego_model_small.onnx',
                              export_params=True,
                              opset_version=12,
                              do_constant_folding=False)


    # Load the pretrained model and export it as onnx
    model = RecInfer(MODEL_PATH)
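    As with the detection model, a quick dummy-input run (a sketch under the same assumptions as before) confirms that the recognition head produces one class distribution per time step, which the CTC decoding in the verification script below relies on:

    import onnxruntime
    import numpy as np

    sess = onnxruntime.InferenceSession('rego_model_small.onnx')
    input_name = sess.get_inputs()[0].name
    dummy = np.random.randn(1, 3, 32, 224).astype(np.float32)
    out = sess.run(None, {input_name: dummy})
    print(out[0].shape)  # expected: one distribution per time step, e.g. (N, T, num_classes)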

    Verification

    import onnxruntime
    import numpy as np
    import cv2
    import torch

    DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    IMG_WIDTH = 448
    ONNX_MODEL = './onnx_model/repvgg_s.onnx'
    LABEL_FILE = '/root/autodl-tmp/warren/PytorchOCR_OLD/torchocr/datasets/alphabets/dict_text.txt'
    #ONNX_MODEL='./onnx_model/rego_model_small.onnx'
    #LABEL_FILE='/root/autodl-tmp/warren/PytorchOCR_OLD/torchocr/datasets/alphabets/ppocr_keys_v1.txt'
    PIC = './pic/img.jpg'


    class CTCLabelConverter(object):
        """ Convert between text-label and text-index """
        def __init__(self, character):
            # character (str): path to the file listing the possible characters.
            dict_character = []
            with open(character, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    dict_character += list(line)
            self.dict = {}
            for i, char in enumerate(dict_character):
                # NOTE: 0 is reserved for 'blank' token required by CTCLoss
                self.dict[char] = i + 1
            # TODO replace ' ' with special symbol
            self.character = ['[blank]'] + dict_character + [' ']  # dummy '[blank]' token for CTCLoss (index 0)

        def decode(self, preds, raw=False):
            """ Convert text-index into text-label (greedy CTC decoding). """
            preds_idx = preds.argmax(axis=2)
            preds_prob = preds.max(axis=2)
            result_list = []
            for word, prob in zip(preds_idx, preds_prob):
                if raw:
                    result_list.append((''.join([self.character[int(i)] for i in word]), prob))
                else:
                    result = []
                    conf = []
                    for i, index in enumerate(word):
                        # Skip blanks and repeated characters (CTC collapse rule)
                        if word[i] != 0 and (not (i > 0 and word[i - 1] == word[i])):
                            result.append(self.character[int(index)])
                            conf.append(prob[i])
                    result_list.append((''.join(result), conf))
            return result_list


    def decode(preds, raw=False):
        """ Standalone variant of CTCLabelConverter.decode (not used in __main__). """
        dict_character = []
        dict = {}
        character = LABEL_FILE
        with open(character, "rb") as fin:
            lines = fin.readlines()
            for line in lines:
                line = line.decode('utf-8').strip("\n").strip("\r\n")
                dict_character += list(line)
        for i, char in enumerate(dict_character):
            # NOTE: 0 is reserved for 'blank' token required by CTCLoss
            dict[char] = i + 1
        # TODO replace ' ' with special symbol
        character = ['[blank]'] + dict_character + [' ']  # dummy '[blank]' token for CTCLoss (index 0)
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        result_list = []
        for word, prob in zip(preds_idx, preds_prob):
            if raw:
                result_list.append((''.join([character[int(i)] for i in word]), prob))
            else:
                result = []
                conf = []
                for i, index in enumerate(word):
                    if word[i] != 0 and (not (i > 0 and word[i - 1] == word[i])):
                        result.append(character[int(index)])
                        conf.append(prob[i])
                result_list.append((''.join(result), conf))
        return result_list


    def width_pad_img(_img, _target_width, _pad_value=0):
        # Pad the image on the right up to the target width
        _height, _width, _channels = _img.shape
        to_return_img = np.ones([_height, _target_width, _channels], dtype=_img.dtype) * _pad_value
        to_return_img[:_height, :_width, :] = _img
        return to_return_img


    def resize_with_specific_height(_img):
        resize_ratio = 32 / _img.shape[0]
        return cv2.resize(_img, (0, 0), fx=resize_ratio, fy=resize_ratio, interpolation=cv2.INTER_LINEAR)


    def normalize_img(_img):
        return (_img.astype(np.float32) / 255 - 0.5) / 0.5


    if __name__ == '__main__':
        onnx_model = onnxruntime.InferenceSession(ONNX_MODEL)
        input_name = onnx_model.get_inputs()[0].name
        # Set inputs
        imgs = cv2.imread(PIC)
        if not isinstance(imgs, list):
            imgs = [imgs]
        imgs = [normalize_img(resize_with_specific_height(img)) for img in imgs]
        widths = np.array([img.shape[1] for img in imgs])
        idxs = np.argsort(widths)
        txts = []
        label_convert = CTCLabelConverter(LABEL_FILE)
        for idx in range(len(imgs)):
            batch_idxs = idxs[idx:min(len(imgs), idx + 1)]
            batch_imgs = [width_pad_img(imgs[idx], IMG_WIDTH) for idx in batch_idxs]
            batch_imgs = np.stack(batch_imgs)
            print(batch_imgs.shape)
            tensor = batch_imgs.transpose([0, 3, 1, 2]).astype(np.float32)
            out = onnx_model.run(None, {input_name: tensor})
            tensor_out = torch.tensor(out)
            tensor_out = torch.squeeze(tensor_out, dim=1)
            softmax_output = tensor_out.softmax(dim=2)
            print("---------------out shape is", softmax_output.shape)
            txts.extend([label_convert.decode(np.expand_dims(txt, 0)) for txt in softmax_output])
        # Restore the original image order
        idxs = np.argsort(idxs)
        out_txts = [txts[idx] for idx in idxs]
        import sys
        import codecs
        sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
        print(out_txts)

    At this point, export and verification have succeeded.

    Deployment on the RK3588 board

    Converting to an RKNN model

    from rknn.api import RKNN

    ONNX_MODEL = 'xxx.onnx'
    RKNN_MODEL = 'xxxx.rknn'
    DATASET = './dataset.txt'

    if __name__ == '__main__':
        # Create RKNN object
        rknn = RKNN(verbose=True)
        # Pre-process config
        print('--> Config model')
        ret = rknn.config(mean_values=[[0, 0, 0]], std_values=[[0, 0, 0]], target_platform='rk3588')  # wzw
        if ret != 0:
            print('config model failed!')
            exit(ret)
        print('done')
        # Load ONNX model
        print('--> Loading model')
        ret = rknn.load_onnx(model=ONNX_MODEL, outputs=['output', '345', '346'])
        if ret != 0:
            print('Load model failed!')
            exit(ret)
        print('done')
        # Build model
        print('--> Building model')
        ret = rknn.build(do_quantization=True, dataset=DATASET)
        #ret = rknn.build(do_quantization=False)
        if ret != 0:
            print('Build model failed!')
            exit(ret)
        print('done')
        # Export RKNN model
        print('--> Export rknn model')
        ret = rknn.export_rknn(RKNN_MODEL)
        if ret != 0:
            print('Export rknn model failed!')
            exit(ret)
        print('done')
        # Release rknn
        rknn.release()
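    Two practical notes on this script. First, DATASET points to a plain-text calibration list used for INT8 quantization: one image path per line. A minimal sketch for generating it (the ./calib_images folder and the count of 50 are assumptions, not from the original setup):

    import glob

    # dataset.txt: one calibration image path per line, consumed by rknn.build()
    with open('dataset.txt', 'w') as f:
        for path in sorted(glob.glob('./calib_images/*.jpg'))[:50]:
            f.write(path + '\n')

    Second, outputs=['output', '345', '346'] in rknn.load_onnx pins tensor names that are specific to the ONNX graph exported above; for a different export, inspect the graph (for example in Netron) and substitute the actual output names.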

    Development with PyQt

    PyQt software design

    The application is developed with PyQt; the UI is shown below.

    UI

    6. The PyQt-based UI

    The interface contains three function buttons: one for selecting a static image, one for using the camera, and one for running detection. A TextEdit widget displays the recognition results, and a label shows the processed image.

    The software flow chart is as follows:

    Overall directory structure for reference

    The code for image detection is presented below:

    import sys
    import cv2
    import numpy as np
    import torch
    import pyclipper
    from shapely.geometry import Polygon
    import time
    import os
    import glob
    import threading
    from PyQt5.QtGui import *
    from PyQt5.QtWidgets import *
    from PyQt5.QtCore import *
    from rknnlite.api import RKNNLite

    # Drop the Qt plugin path injected by opencv-python so PyQt5 uses the system platform plugins
    os.environ.pop("QT_QPA_PLATFORM_PLUGIN_PATH")

    DETECT_MODEL = './model/model_small.rknn'
    REGO_MODEL = './model/repvgg_s.rknn'
    LABEL_FILE = './dict/dict_text.txt'
    LABEL_SIZE_PRIVIOUS = 0
    LABEL_SIZE_LATTER = 0
    # Folder holding the cropped text regions
    folder_path = './crop_pic'
    # Use glob to collect the paths of all image files
    image_files = glob.glob(os.path.join(folder_path, '*.png')) + glob.glob(os.path.join(folder_path, '*.jpg'))


    def resize_img_self(image, reszie_size=(0, 0)):
        # Resize to the target height, then pad on the right to the target width
        ih, iw = image.shape[0:2]
        ew, eh = reszie_size
        scale = eh / ih
        width = int(iw * scale)
        height = int(ih * scale)
        if height != eh:
            height = eh
        image = cv2.resize(image, (width, height), interpolation=cv2.INTER_LINEAR)
        top = 0
        bottom = 0
        left = 0
        right = ew - width
        new_img = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
        #print("new image shape", new_img.shape)
        return new_img


    def narrow_224_32(image, expected_size=(224, 32)):
        # Resize while keeping the aspect ratio, then pad to the expected size
        ih, iw = image.shape[0:2]
        ew, eh = expected_size
        # scale = eh / ih
        scale = min((eh / ih), (ew / iw))
        # scale = eh / max(iw, ih)
        nh = int(ih * scale)
        nw = int(iw * scale)
        image = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_CUBIC)
        top = 0
        bottom = eh - nh
        left = 0
        right = ew - nw
        new_img = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
        return image, new_img


    def draw_bbox(img_path, result, color=(0, 0, 255), thickness=2):
        if isinstance(img_path, str):
            img_path = cv2.imread(img_path)
            # img_path = cv2.cvtColor(img_path, cv2.COLOR_BGR2RGB)
        img_path = img_path.copy()
        for point in result:
            point = point.astype(int)
            cv2.polylines(img_path, [point], True, color, thickness)
        return img_path


    def delay_milliseconds(milliseconds):
        seconds = milliseconds / 1000.0
        time.sleep(seconds)


    class CTCLabelConverter(object):
        """ Convert between text-label and text-index """
        def __init__(self, character):
            # character (str): path to the file listing the possible characters.
            dict_character = []
            with open(character, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    dict_character += list(line)
            self.dict = {}
            for i, char in enumerate(dict_character):
                # NOTE: 0 is reserved for 'blank' token required by CTCLoss
                self.dict[char] = i + 1
            # TODO replace ' ' with special symbol
            self.character = ['[blank]'] + dict_character + [' ']  # dummy '[blank]' token for CTCLoss (index 0)

        def decode(self, preds, raw=False):
            """ convert text-index into text-label. """
            preds_idx = preds.argmax(axis=2)
            preds_prob = preds.max(axis=2)
            result_list = []
            for word, prob in zip(preds_idx, preds_prob):
                if raw:
                    result_list.append((''.join([self.character[int(i)] for i in word]), prob))
                else:
                    result = []
                    conf = []
                    for i, index in enumerate(word):
                        # Skip blanks and repeated characters (CTC collapse rule)
                        if word[i] != 0 and (not (i > 0 and word[i - 1] == word[i])):
                            result.append(self.character[int(index)])
                            #conf.append(prob[i])
                    #result_list.append((''.join(result), conf))
                    result_list.append((''.join(result)))
            return result_list


    class DBPostProcess():
        def __init__(self, thresh=0.3, box_thresh=0.7, max_candidates=1000, unclip_ratio=2):
            self.min_size = 3
            self.thresh = thresh
            self.box_thresh = box_thresh
            self.max_candidates = max_candidates
            self.unclip_ratio = unclip_ratio

        def __call__(self, pred, h_w_list, is_output_polygon=False):
            pred = pred[:, 0, :, :]
            segmentation = self.binarize(pred)
            boxes_batch = []
            scores_batch = []
            for batch_index in range(pred.shape[0]):
                height, width = h_w_list[batch_index]
                boxes, scores = self.post_p(pred[batch_index], segmentation[batch_index], width, height,
                                            is_output_polygon=is_output_polygon)
                boxes_batch.append(boxes)
                scores_batch.append(scores)
            return boxes_batch, scores_batch

        def binarize(self, pred):
            return pred > self.thresh

        def post_p(self, pred, bitmap, dest_width, dest_height, is_output_polygon=False):
            '''
            _bitmap: single map with shape (H, W), whose values are binarized as {0, 1}
            '''
            height, width = pred.shape
            boxes = []
            new_scores = []
            # bitmap = bitmap.cpu().numpy()
            if cv2.__version__.startswith('3'):
                _, contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            if cv2.__version__.startswith('4'):
                contours, _ = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
            for contour in contours[:self.max_candidates]:
                epsilon = 0.005 * cv2.arcLength(contour, True)
                approx = cv2.approxPolyDP(contour, epsilon, True)
                points = approx.reshape((-1, 2))
                if points.shape[0] < 4:
                    continue
                score = self.box_score_fast(pred, contour.squeeze(1))
                if self.box_thresh > score:
                    continue
                if points.shape[0] > 2:
                    box = self.unclip(points, unclip_ratio=self.unclip_ratio)
                    if len(box) > 1:
                        continue
                else:
                    continue
                four_point_box, sside = self.get_mini_boxes(box.reshape((-1, 1, 2)))
                if sside < self.min_size + 2:
                    continue
                if not isinstance(dest_width, int):
                    dest_width = dest_width.item()
                    dest_height = dest_height.item()
                if not is_output_polygon:
                    box = np.array(four_point_box)
                else:
                    box = box.reshape(-1, 2)
                box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
                box[:, 1] = np.clip(np.round(box[:, 1] / height * dest_height), 0, dest_height)
                boxes.append(box)
                new_scores.append(score)
            return boxes, new_scores

        def unclip(self, box, unclip_ratio=1.5):
            poly = Polygon(box)
            distance = poly.area * unclip_ratio / poly.length
            offset = pyclipper.PyclipperOffset()
            offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
            expanded = np.array(offset.Execute(distance))
            return expanded

        def get_mini_boxes(self, contour):
            bounding_box = cv2.minAreaRect(contour)
            points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
            index_1, index_2, index_3, index_4 = 0, 1, 2, 3
            if points[1][1] > points[0][1]:
                index_1 = 0
                index_4 = 1
            else:
                index_1 = 1
                index_4 = 0
            if points[3][1] > points[2][1]:
                index_2 = 2
                index_3 = 3
            else:
                index_2 = 3
                index_3 = 2
            box = [points[index_1], points[index_2], points[index_3], points[index_4]]
            return box, min(bounding_box[1])

        def box_score_fast(self, bitmap, _box):
            # bitmap = bitmap.detach().cpu().numpy()
            h, w = bitmap.shape[:2]
            box = _box.copy()
            xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1)
            xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1)
            ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int32), 0, h - 1)
            ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1)
            mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
            box[:, 0] = box[:, 0] - xmin
            box[:, 1] = box[:, 1] - ymin
            cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]


    class Process_Class(QWidget):
        detect_end = pyqtSignal(str)
        clear_text = pyqtSignal()

        def __init__(self):
            super().__init__()
            self.image = None
            self.img = None
            self.camera_status = False
            self.result_string = None
            self.cap = cv2.VideoCapture()
            # Detection model
            rknn_model_detect = DETECT_MODEL
            self.rknn_lite_detect = RKNNLite()
            self.rknn_lite_detect.load_rknn(rknn_model_detect)  # load RKNN model
            self.rknn_lite_detect.init_runtime(core_mask=RKNNLite.NPU_CORE_2)  # pin detection to NPU core 2
            # Recognition model
            rknn_model_rego = REGO_MODEL
            self.rknn_lite_rego = RKNNLite()
            self.rknn_lite_rego.load_rknn(rknn_model_rego)  # load RKNN model
            self.rknn_lite_rego.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1)  # run recognition on NPU cores 0 and 1
            self.detect_end.connect(self.update_text_box)
            self.clear_text.connect(self.clear_text_box)

        def cv2_to_qpixmap(self, cv_image):
            height, width, channel = cv_image.shape
            bytes_per_line = 3 * width
            q_image = QImage(cv_image.data, width, height, bytes_per_line, QImage.Format_RGB888).rgbSwapped()
            return QPixmap.fromImage(q_image)

        def show_pic(self, cv_image):
            pixmap = self.cv2_to_qpixmap(cv_image)
            if MainWindow.pic_label is not None:
                MainWindow.pic_label.setPixmap(pixmap)
                QApplication.processEvents()
            else:
                print("wrong!!!!!!!")

        def camera_open(self):
            self.camera_status = not self.camera_status
            print("------------camera status is", self.camera_status)
            if self.camera_status:
                self.cap.open(12)
                if self.cap.isOpened():
                    print("run camera")
                    while (True):
                        frame = self.cap.read()
                        if not frame[0]:
                            print("read frame failed!!!!")
                            exit()
                        self.image = frame[1]
                        self.detect_pic()
                        if not self.camera_status:
                            break
                else:
                    print("Cannot open camera")
                    exit()
            else:
                self.release_camera()

        def release_camera(self):
            if self.cap.isOpened():
                self.cap.release()
            self.camera_status = False
            print("camera released")

        def open_file(self):
            # Ask the user for an image path
            img_path, _ = QFileDialog.getOpenFileName()
            if img_path != '':
                self.image = cv2.imread(img_path)
                self.show_pic(self.image)

        def crop_and_save_image(self, image, box_points):
            global LABEL_SIZE_PRIVIOUS
            global LABEL_SIZE_LATTER
            i = -1
            # Convert box_points to an integer NumPy array
            box_points = np.array(box_points, dtype=np.int32)
            mask = np.zeros_like(image)  # all-black image of the same size as the input
            print("LABEL_SIZE_PRIVIOUS ", LABEL_SIZE_PRIVIOUS, "LABEL_SIZE_LATTER ", LABEL_SIZE_LATTER)
            if LABEL_SIZE_PRIVIOUS == LABEL_SIZE_LATTER:
                LABEL_SIZE_PRIVIOUS = len(box_points)
                for box_point in box_points:
                    i = i + 1
                    cropped_image = image.copy()
                    # Crop the bounding rectangle of the polygon
                    x, y, w, h = cv2.boundingRect(box_point)
                    cropped_image = image[y:y + h, x:x + w]
                    # Create an all-black mask of the same size as the crop
                    mask = np.zeros_like(cropped_image)
                    # Draw the polygon on the mask
                    cv2.fillPoly(mask, [box_point - (x, y)], (255, 255, 255))
                    # Apply the mask with bitwise_and
                    masked_cropped_image = cv2.bitwise_and(cropped_image, mask)
                    # Save the cropped image
                    output_path = f"{'./crop_pic/'}img_{i}.jpg"
                    cv2.imwrite(output_path, masked_cropped_image)
            else:
                #self.clear_text.emit()
                LABEL_SIZE_LATTER = LABEL_SIZE_PRIVIOUS
                current_directory = os.getcwd() + '/crop_pic'  # Get the crop directory
                for filename in os.listdir(current_directory):
                    if filename.endswith(".jpg"):
                        file_path = os.path.join(current_directory, filename)
                        os.remove(file_path)
                        print(f"Deleted: {file_path}")

        def detect_thread(self):
            # Detection inference
            img0, image = narrow_224_32(self.image, expected_size=(640, 640))
            outputs = self.rknn_lite_detect.inference(inputs=[image])
            post_proess = DBPostProcess()
            is_output_polygon = False
            box_list, score_list = post_proess(outputs[0], [image.shape[:2]], is_output_polygon=is_output_polygon)
            box_list, score_list = box_list[0], score_list[0]
            if len(box_list) > 0:
                idx = [x.sum() > 0 for x in box_list]
                box_list = [box_list[i] for i, v in enumerate(idx) if v]
                score_list = [score_list[i] for i, v in enumerate(idx) if v]
            else:
                box_list, score_list = [], []
            self.image = draw_bbox(image, box_list)
            self.crop_and_save_image(image, box_list)
            self.image = self.image[0:img0.shape[0], 0:img0.shape[1]]
            self.show_pic(self.image)

        def rego_thread(self):
            label_convert = CTCLabelConverter(LABEL_FILE)
            self.clear_text.emit()
            for image_file in image_files:
                if os.path.exists(image_file):
                    print('-----------image file', image_file, len(image_files))
                    self.img = cv2.imread(image_file)
                    image = resize_img_self(self.img, reszie_size=(448, 32))
                    # Inference
                    outputs = self.rknn_lite_rego.inference(inputs=[image])
                    # Post-process
                    feat_2 = torch.tensor(outputs[0], dtype=torch.float32)
                    txt = label_convert.decode(feat_2.detach().numpy())
                    self.result_string = ' '.join(txt)
                    print(self.result_string)
                    self.detect_end.emit(self.result_string)
                else:
                    print("-----------no crop image!!!")

        def detect_pic(self):
            self.detect_thread()
            my_thread = threading.Thread(target=self.rego_thread)
            # Start the recognition thread
            my_thread.start()
            # Wait for it to finish
            my_thread.join()

        def update_text_box(self, text):
            # Update the text box from the main thread
            MainWindow.text_box.append(text)

        def clear_text_box(self):
            print("clear--------------------------------")
            # Clear the text box from the main thread
            MainWindow.text_box.clear()


    class MainWindow(QMainWindow):
        # Class-level widget handles, assigned in main() and read by Process_Class
        pic_label = None
        text_box = None

        def __init__(self):
            super().__init__()
            self.process_functions = Process_Class()
            self.window = QWidget()
            # Create the widgets
            self.pic_label = QLabel('Show Window!', parent=self.window)
            self.pic_label.setMinimumHeight(500)  # minimum height
            self.pic_label.setMaximumHeight(500)  # maximum height
            self.pic_button = QPushButton('Picture', parent=self.window)
            self.pic_button.clicked.connect(self.process_functions.open_file)
            self.camera_button = QPushButton('Camera', parent=self.window)
            self.camera_button.clicked.connect(self.process_functions.camera_open)
            self.detect_button = QPushButton('Detect', parent=self.window)
            self.detect_button.clicked.connect(self.process_functions.detect_pic)
            self.text_box = QTextEdit()
            # Create the layout managers and add the widgets
            self.left_layout = QVBoxLayout()
            self.right_layout = QVBoxLayout()
            self.layout = QHBoxLayout()
            self.create_ui()
            self.window.closeEvent = self.closeEvent

        def create_ui(self):
            self.window.setWindowTitle('Scene_text_rego')
            self.window.setGeometry(0, 0, 800, 600)  # window position and size
            # Lay out the main window
            self.pic_label.setStyleSheet('border: 2px solid black; padding: 10px;')
            self.left_layout.addWidget(self.pic_label)
            self.left_layout.addWidget(self.text_box)
            self.right_layout.addWidget(self.pic_button)
            self.right_layout.addWidget(self.camera_button)
            self.right_layout.addWidget(self.detect_button)
            self.layout.addLayout(self.left_layout)
            self.layout.addLayout(self.right_layout)
            self.window.setLayout(self.layout)
            self.window.show()

        def closeEvent(self, event):
            # Release the camera on exit
            self.process_functions.release_camera()
            event.accept()


    def main():
        # Create the application object
        app = QApplication(sys.argv)
        win = MainWindow()
        MainWindow.pic_label = win.pic_label  # expose the pic_label widget through the class attribute
        MainWindow.text_box = win.text_box  # expose the text_box widget through the class attribute
        # Run the application
        sys.exit(app.exec_())


    if __name__ == '__main__':
        main()

    Results

    References

    Blog posts:

    [Engineering Deployment] A hands-on guide to deploying an OCR service on RKNN (part 1)_rknn ocr_三叔家的猫's blog - CSDN

  • Original article: https://blog.csdn.net/warren103098/article/details/134392494