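After a day at the museum I came home with countless photos. To sort them, I wanted to recognize the text in the pictures and attach the descriptions of each exhibition hall and exhibit.
Sorting them one by one by hand is slow, tiring, and thoroughly exhausting...
Fortunately, this is the era of modular, low-code tools. Efficiency, performance, UI, and usability can wait for now: solve the problem first, and save some effort and time.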
- >>> pip install easyocr
- Requirement already satisfied: easyocr in c:\python39\lib\site-packages (1.6.2)
- Requirement already satisfied: scikit-image in c:\python39\lib\site-packages (from easyocr) (0.19.3)
- Requirement already satisfied: Pillow in c:\python39\lib\site-packages (from easyocr) (9.2.0)
- Requirement already satisfied: PyYAML in c:\python39\lib\site-packages (from easyocr) (6.0)
- Requirement already satisfied: torch in c:\python39\lib\site-packages (from easyocr) (1.13.0)
- Requirement already satisfied: pyclipper in c:\python39\lib\site-packages (from easyocr) (1.3.0.post3)
- Requirement already satisfied: python-bidi in c:\python39\lib\site-packages (from easyocr) (0.4.2)
- Requirement already satisfied: Shapely in c:\python39\lib\site-packages (from easyocr) (1.8.5.post1)
- Requirement already satisfied: numpy in c:\python39\lib\site-packages (from easyocr) (1.23.4)
- Requirement already satisfied: scipy in c:\python39\lib\site-packages (from easyocr) (1.9.2)
- Requirement already satisfied: opencv-python-headless<=4.5.4.60 in c:\python39\lib\site-packages (from easyocr) (4.5.4.60)
- Requirement already satisfied: ninja in c:\python39\lib\site-packages (from easyocr) (1.10.2.4)
- Requirement already satisfied: torchvision>=0.5 in c:\python39\lib\site-packages (from easyocr) (0.14.0)
- Requirement already satisfied: typing-extensions in c:\python39\lib\site-packages (from torchvision>=0.5->easyocr) (4.4.0)
- Requirement already satisfied: requests in c:\python39\lib\site-packages (from torchvision>=0.5->easyocr) (2.25.1)
- Requirement already satisfied: six in c:\python39\lib\site-packages (from python-bidi->easyocr) (1.16.0)
- Requirement already satisfied: networkx>=2.2 in c:\python39\lib\site-packages (from scikit-image->easyocr) (2.8.7)
- Requirement already satisfied: PyWavelets>=1.1.1 in c:\python39\lib\site-packages (from scikit-image->easyocr) (1.4.1)
- Requirement already satisfied: packaging>=20.0 in c:\python39\lib\site-packages (from scikit-image->easyocr) (21.3)
- Requirement already satisfied: imageio>=2.4.1 in c:\python39\lib\site-packages (from scikit-image->easyocr) (2.22.1)
- Requirement already satisfied: tifffile>=2019.7.26 in c:\python39\lib\site-packages (from scikit-image->easyocr) (2022.10.10)
- Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in c:\python39\lib\site-packages (from packaging>=20.0->scikit-image->easyocr) (2.4.7)
- Requirement already satisfied: certifi>=2017.4.17 in c:\python39\lib\site-packages (from requests->torchvision>=0.5->easyocr) (2020.12.5)
- Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\python39\lib\site-packages (from requests->torchvision>=0.5->easyocr) (1.26.3)
- Requirement already satisfied: idna<3,>=2.5 in c:\python39\lib\site-packages (from requests->torchvision>=0.5->easyocr) (2.10)
- Requirement already satisfied: chardet<5,>=3.0.2 in c:\python39\lib\site-packages (from requests->torchvision>=0.5->easyocr) (4.0.0)

- import easyocr
-
- reader = easyocr.Reader(['ch_sim','en'], gpu=True)
- result = reader.readtext('pic_file.jpg')
- print(result)
-
- >>>
- CUDA not available - defaulting to CPU. Note: This module is much faster with a GPU.
-
- ([[12, 0], [292, 0], [292, 24], [12, 24]], '博物馆一日游。拒照片无数。分类整理', 0.5019760698786572)
- ([[298, 0], [500, 0], [500, 24], [298, 24]], '希望图片牛的文字进行识别', 0.2667440711212794)
- ([[506, 0], [711, 0], [711, 24], [506, 24]], '加上各展馆。各展品的说明。', 0.48956195253399476)
- ([[12, 26], [280, 26], [280, 50], [12, 50]], '手工一张张的整理。慢。累。要老命。', 0.443645141397)
- ([[12, 52], [260, 52], [260, 76], [12, 76]], '还好。模块化。低代码时代。效率', 0.48323813949440303)
- ([[268, 52], [358, 52], [358, 76], [268, 76]], '性能。界面', 0.7953857046933088)
- ([[364, 52], [516, 52], [516, 76], [364, 76]], '易用性暂下过多考虑', 0.6913828229274245)
- ([[522, 52], [612, 52], [612, 76], [522, 76]], '解决问题先', 0.8767933218561421)
- ([[620, 52], [776, 52], [776, 76], [620, 76]], '省点力气。省点时间。', 0.563630720606001)
- ..........
- raise FileNotFoundError("Missing %s and downloads disabled" % detector_path)
- FileNotFoundError: Missing ./model\craft_mlt_25k.pth and downloads disabled
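Each entry printed by readtext above is a tuple of (four corner points, text, confidence), and one printed row of the image is often split into several boxes. A minimal sketch of how the boxes could be filtered and stitched back into rows follows. The helper name `group_into_lines` and the thresholds `min_conf` and `y_tolerance` are assumptions chosen for illustration, not part of EasyOCR.

```python
from typing import List, Sequence, Tuple

# One EasyOCR detection: (four corner points, recognized text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def _top(det: Detection) -> float:
    """y coordinate of the top-left corner."""
    return det[0][0][1]


def _left(det: Detection) -> float:
    """x coordinate of the top-left corner."""
    return det[0][0][0]


def group_into_lines(detections: List[Detection],
                     min_conf: float = 0.3,
                     y_tolerance: float = 10.0) -> List[str]:
    """Drop low-confidence boxes, then join the remaining text row by row.

    min_conf and y_tolerance are illustrative defaults (assumptions);
    tune them for your own photos.
    """
    kept = sorted((d for d in detections if d[2] >= min_conf),
                  key=lambda d: (_top(d), _left(d)))

    lines: List[str] = []
    current: List[Detection] = []
    for det in kept:
        # A box whose top edge is far below the row's first box starts a new row.
        if current and abs(_top(det) - _top(current[0])) > y_tolerance:
            lines.append("".join(d[1] for d in sorted(current, key=_left)))
            current = []
        current.append(det)
    if current:
        lines.append("".join(d[1] for d in sorted(current, key=_left)))
    return lines


if __name__ == "__main__":
    # Three entries copied from the output above.
    sample = [
        ([[12, 0], [292, 0], [292, 24], [12, 24]], '博物馆一日游。拒照片无数。分类整理', 0.5019760698786572),
        ([[298, 0], [500, 0], [500, 24], [298, 24]], '希望图片牛的文字进行识别', 0.2667440711212794),
        ([[12, 26], [280, 26], [280, 50], [12, 50]], '手工一张张的整理。慢。累。要老命。', 0.443645141397),
    ]
    for line in group_into_lines(sample):
        print(line)
```

With the default threshold the box with confidence 0.27 is dropped, which shows the trade-off: a higher `min_conf` removes more misreadings but also loses text, so the value is worth tuning per photo set.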
* About CUDA: I have not looked into the usage details yet; to be covered later.
* AttributeError: partially initialized module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline' (most likely due to a circular import)
- The installed opencv-python-headless version does not match what easyocr requires.
- Remove it with pip uninstall, then reinstall a matching version with pip install:
opencv-python-headless<=4.5.4.60 in c:\python39\lib\site-packages (from easyocr) (4.5.4.60)
* WARNING: Ignoring invalid distribution -pencv-python-headless (python_install_path\lib\site-packages)
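- This is a leftover temporary file created when an opencv-python-headless installation failed; location: python_install_path\lib\site-packages
- Fix: find that file in the Python installation's lib folder, delete it, and reinstall.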
* ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: '%APPDATA%\Python\..........'
Consider using the `--user` option or check the permissions.
- Use the --user option, e.g. pip install --user *********************
- Option description: --user Install to the Python user install directory for your platform. Typically ~/.local/, or %APPDATA%\Python on Windows. (See the Python documentation for site.USER_BASE for full details.)
* cv.gapi.wip.GStreamerPipeline = cv.gapi_wip_gst_GStreamerPipeline
AttributeError: partially initialized module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline' (most likely due to a circular import)
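- The opencv-python and opencv-python-headless versions are inconsistent.
- Fix: check the installed versions of both packages, uninstall them, and reinstall the specified version.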
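Several of the problems above come down to which package versions are actually installed. As a rough sketch (not an official diagnostic), the standard-library importlib.metadata module can list them without importing cv2 itself, which helps when importing cv2 is the very thing that fails. The function names `installed_versions` and `opencv_mismatch` and the package list are hypothetical, chosen for this example.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages mentioned in the notes above; adjust the list as needed (assumption).
PACKAGES = ("easyocr", "torch", "opencv-python", "opencv-python-headless")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return {package name: installed version, or None if not installed}."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def opencv_mismatch(versions: Dict[str, Optional[str]]) -> bool:
    """True when both OpenCV wheels are installed with different versions."""
    full = versions.get("opencv-python")
    headless = versions.get("opencv-python-headless")
    return full is not None and headless is not None and full != headless


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if opencv_mismatch(found):
        print("opencv-python and opencv-python-headless differ; reinstall matching versions.")
```

Running it before and after a reinstall is a quick way to confirm that the uninstall really removed the old wheel.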
Code sketch: run OCR on every image with the jpg extension in the current folder and write the recognized text to GetText.txt.
- import glob
- import os
-
- import easyocr
-
- # Models are loaded from ./model; downloading is disabled, so the files must already be there.
- reader = easyocr.Reader(['ch_sim', 'en'], gpu=True, model_storage_directory='./model',
-                         verbose=True, download_enabled=False)
- out_path = './GetText.txt'
- fn = 1  # 1-based index of the next image
-
- # Start from a clean output file on every run.
- if os.path.exists(out_path):
-     os.remove(out_path)
-
- for f in glob.glob('./*.jpg'):
-     result = reader.readtext(f)
-     name = os.path.basename(f)  # file name without the folder part, on any OS
-
-     print("################ ", name, " ################")
-     temp = ""
-     for i in result:
-         temp = temp + i[1]  # i is (bounding box, text, confidence)
-         print(i)
-
-     with open(out_path, "a", encoding='utf-8') as fp:
-         fp.write("################ " + name + " ################\n")
-         fp.write(temp)
-         fp.write("\n\n\n")
-     fn = fn + 1
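Because the script above passes download_enabled=False, a missing model file in ./model produces the FileNotFoundError shown earlier. A small pre-flight check, sketched below, could report that before the Reader is created. The helper `missing_model_files` is hypothetical; by default it looks only for craft_mlt_25k.pth, the one file name that appears in the traceback, and the recognition model file names are left for you to pass in, since they are not listed in this post.

```python
from pathlib import Path
from typing import Iterable, List


def missing_model_files(model_dir: str,
                        required: Iterable[str] = ("craft_mlt_25k.pth",)) -> List[str]:
    """Return the names from `required` that are not present in model_dir.

    The default list contains only the detector file named in the traceback;
    add the recognition model file names for your languages (assumption:
    you know them from your own download).
    """
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./model")
    if missing:
        print("Missing model files in ./model:", ", ".join(missing))
        print("Copy them there, or run once with download_enabled=True.")
    else:
        print("Detector model found; offline loading should get past this check.")
```

An explicit check like this is a convenience only: it turns a traceback into a one-line message and does not verify that the files themselves are intact.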
References: