Implementing OCR (Text Recognition) in Python

There are many Python libraries for OCR. This article introduces Tesseract, ddddocr, and CnOCR; PaddleOCR is listed in the reference links.

Tesseract

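Tesseract is an open-source OCR engine that works out of the box. It was originally developed at Hewlett-Packard Laboratories, ported to Windows in 1996, and partly migrated to C++ in 1998. HP released it as open source in 2005. From 2006 to 2018 its development was sponsored by Google; it is now maintained by the open-source community. In Python it is commonly used through the pytesseract wrapper, which requires the Tesseract executable to be installed separately.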

import pytesseract
from PIL import Image, ImageEnhance

# Typical captcha workflow; this snippet covers only the last step:
#   1. Locate the captcha element and take a screenshot of the browser page.
#   2. Get the captcha's coordinates and the sizes of the captcha image, browser, and screenshot.
#   3. Crop the captcha out of the screenshot and save it.
#   4. Recognize the captcha text.

# Path to the source image
image_path = "../img/test.png"
# Path where the preprocessed image is saved
save_path = "../img/savePng.png"

# Load the image and convert it to grayscale ("L"); "RGB" is the other common mode
img = Image.open(image_path).convert("L")

# Preprocess the image to improve the recognition rate
img = ImageEnhance.Color(img).enhance(0)        # remove any remaining color
img = ImageEnhance.Brightness(img).enhance(2)   # brighten
img = ImageEnhance.Contrast(img).enhance(8)     # increase contrast
img = ImageEnhance.Sharpness(img).enhance(20)   # sharpen
img = ImageEnhance.Contrast(img).enhance(2.0)   # second contrast pass
img.save(save_path)

# Recognize the text in the preprocessed image
code = pytesseract.image_to_string(Image.open(save_path)).strip()
print(f"Extracted text: {code}")
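
For captchas it often helps to restrict what Tesseract may return and to clean up the raw output afterwards. The following is a minimal sketch and not part of the original example: the helper names build_tesseract_config, clean_captcha_text, and recognize_captcha are hypothetical, and it assumes the captcha is a single line of ASCII letters and digits. `--psm 7` (treat the image as a single text line) and `tessedit_char_whitelist` are Tesseract options passed through pytesseract's `config` argument; how strictly the whitelist is honored may depend on the installed Tesseract version.

```python
import re
import string


def build_tesseract_config(psm: int = 7, whitelist: str = "") -> str:
    """Build a string for pytesseract's `config` argument (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def clean_captcha_text(raw: str) -> str:
    """Drop everything except ASCII letters and digits from raw OCR output."""
    return re.sub(r"[^0-9A-Za-z]", "", raw)


def recognize_captcha(image_path: str) -> str:
    """Run Tesseract on one image and return the cleaned text."""
    # Imported lazily so the two helpers above work without Tesseract installed.
    import pytesseract
    from PIL import Image

    config = build_tesseract_config(
        psm=7, whitelist=string.digits + string.ascii_letters
    )
    raw = pytesseract.image_to_string(Image.open(image_path), config=config)
    return clean_captcha_text(raw)
```

Calling recognize_captcha("../img/savePng.png") on the preprocessed image would then return only the letters and digits that Tesseract found.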

ddddocr

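ddddocr is an open-source, deep-learning-based captcha recognition library. It ships with pretrained models and runs offline after installation. Besides recognizing the characters in a captcha image, it also provides target detection for locating objects or text regions in an image.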

pip install ddddocr

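import ddddocr

# old=True selects the older recognition model bundled with ddddocr
ocr = ddddocr.DdddOcr(old=True)
# classification() takes the raw bytes of the image file
with open('../img/test.png', 'rb') as f:
    img_bytes = f.read()
res = ocr.classification(img_bytes)
print('Recognized text: ' + res)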

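import io

import ddddocr
from PIL import Image

# Initialize a detector (locates targets) and a recognizer (reads text).
# Note: ddddocr's models are trained for captchas, so results on ID cards may be poor.
det = ddddocr.DdddOcr(det=True)
ocr = ddddocr.DdddOcr()

# Read the ID card image
image_path = 'path_to_your_image.jpg'
with open(image_path, 'rb') as f:
    img_bytes = f.read()
image = Image.open(io.BytesIO(img_bytes))

# Image preprocessing
# TODO: preprocess the image (crop, resize, convert to grayscale, ...)

# Text region detection: each box is [x1, y1, x2, y2]
text_boxes = det.detection(img_bytes)

# Text recognition: crop each detected box and classify the crop
results = []
for x1, y1, x2, y2 in text_boxes:
    buf = io.BytesIO()
    image.crop((x1, y1, x2, y2)).save(buf, format='PNG')
    results.append(ocr.classification(buf.getvalue()))

# Result parsing
# TODO: parse and post-process the results to extract the key fields of the ID card

# Print the recognition results
for result in results:
    print(result)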
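
A detector usually returns boxes in no particular order, so the recognized strings may not come out in reading order. The sketch below shows one simple way to order boxes before recognition. It is an illustration under stated assumptions rather than part of ddddocr: the function name sort_boxes_reading_order and the line_tol parameter are hypothetical, and boxes are assumed to be axis-aligned rectangles given as [x1, y1, x2, y2].

```python
from typing import List, Sequence

Box = Sequence[int]  # assumed layout: [x1, y1, x2, y2]


def sort_boxes_reading_order(boxes: List[Box], line_tol: int = 10) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right within each text line.

    Two boxes count as the same line when their top edges differ from the
    first box of that line by at most `line_tol` pixels.
    """
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan by top edge
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(box)   # same line as the previous group
        else:
            lines.append([box])     # start a new line

    ordered: List[Box] = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda b: b[0]))  # left to right
    return ordered
```

Sorting the detected boxes this way before the recognition loop keeps the output lines in the order a person would read them; line_tol would need tuning to the font size of the image.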


CnOCR

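CnOCR is an OCR (Optical Character Recognition) toolkit for Python 3. It recognizes common characters in Simplified Chinese, Traditional Chinese (some models only), English, and digits, and it supports vertical text. It ships with more than 20 pretrained recognition models for different scenarios, ready to use right after installation. CnOCR also provides simple commands for training your own models.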

$ pip install cnocr[ort-cpu]

from cnocr import CnOcr

img_fp = './docs/examples/huochepiao.jpeg'
ocr = CnOcr()  # use the default value for every parameter
out = ocr.ocr(img_fp)

print(out)
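
The printed result is a nested structure rather than plain text. The sketch below shows one way to turn it into a string. It assumes that each item of the result is a dict with a 'text' key and an optional 'score' key, which should be checked against the installed CnOCR version; the function name join_ocr_lines, the min_score parameter, and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def join_ocr_lines(results: List[Dict[str, Any]], min_score: float = 0.0) -> str:
    """Join recognized lines into one string, skipping blank or low-score items."""
    lines = [
        item["text"]
        for item in results
        # Items without a score are kept (treated as fully confident).
        if item.get("score", 1.0) >= min_score and item["text"].strip()
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data in the assumed result format.
    sample = [
        {"text": "G1234", "score": 0.98},
        {"text": " ", "score": 0.90},
        {"text": "noise", "score": 0.20},
    ]
    print(join_ocr_lines(sample, min_score=0.5))
```

With real output, join_ocr_lines(out, min_score=0.5) would keep only the lines that the model is reasonably confident about; the threshold is a judgment call.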

Reference links

https://github.com/tesseract-ocr/tesseract
https://github.com/sml2h3/ddddocr
https://aka.ms/vs/16/release/VC_redist.x86.exe
https://aka.ms/vs/16/release/VC_redist.x64.exe
https://cnocr.readthedocs.io/zh/latest/
https://github.com/PaddlePaddle/PaddleOCR


Original article: https://blog.csdn.net/lilongsy/article/details/133747021