Maven dependency (pom.xml):
```xml
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.11.0</version>
</dependency>
```
Sample code:
```java
package com.zy.datapickcli.sys.controller;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.File;

public class TempTest {
    public static void main(String[] args) throws TesseractException {
        File file = new File("D:\\1.png");
        System.out.println(recognizeText(file));
    }

    public static String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();

        // Directory holding the *.traineddata files; "chi_sim" needs chi_sim.traineddata here.
        // (For English-only recognition this call can be skipped if the TESSDATA_PREFIX
        // environment variable already points to a tessdata directory with eng.traineddata.)
        tesseract.setDatapath("D:\\tessdata");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }
}
```
Download the traineddata language files (e.g. chi_sim.traineddata) from:
https://gitcode.com/tesseract-ocr/tessdata/tree/main
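Tesseract expects one `<lang>.traineddata` file per language code inside the datapath, and several languages can be combined with `+` (e.g. `chi_sim+eng`). A missing file is a common cause of initialization errors, so the stdlib-only sketch below checks for the files before OCR runs. The class and method names (`TessdataCheck`, `missingLanguages`) are hypothetical and not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tess4J): reports which languages in a
// Tesseract language spec such as "chi_sim+eng" have no traineddata file.
public class TessdataCheck {

    public static List<String> missingLanguages(String datapath, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a stray "+" such as "chi_sim+"
            }
            // Each language code maps to <datapath>/<code>.traineddata
            Path file = Paths.get(datapath, trimmed + ".traineddata");
            if (!Files.isRegularFile(file)) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingLanguages("D:\\tessdata", "chi_sim+eng");
        if (missing.isEmpty()) {
            System.out.println("All traineddata files found");
        } else {
            System.out.println("Missing traineddata for: " + missing);
        }
    }
}
```

Running such a check with the same values passed to `setDatapath` and `setLanguage` gives a clearer message than a failed native initialization.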
Additional reference code (Spring service):
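```java
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

@Service
public class OcrService {

    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();

        // Directory holding the *.traineddata files (must contain chi_sim.traineddata)
        tesseract.setDatapath("path/to/your/tessdata"); // replace with your tessdata directory
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // Use a unique temp file instead of a fixed "downloaded.jpg" so that
        // concurrent requests cannot overwrite each other's images
        Path tempFile = Files.createTempFile("ocr-", ".jpg");
        try (InputStream in = new URL(imageUrl).openStream()) { // stream is closed automatically
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            Files.deleteIfExists(tempFile); // remove the downloaded image afterwards
        }
    }
}
```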
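The URL variant saves every download with a `.jpg` name even when the server returns a PNG or another format. A minimal sketch, assuming you want the temp file to keep the extension found in the URL path (falling back to `.jpg` as the original code does), could pick the suffix like this; `ImageSuffix` and `suffixFromUrl` are hypothetical names, and the list of accepted extensions is an assumption.

```java
import java.net.URI;

// Hypothetical helper: derive a temp-file suffix from an image URL's path,
// ignoring the query string, and fall back to ".jpg" when none is recognized.
public class ImageSuffix {

    public static String suffixFromUrl(String imageUrl) {
        // URI.create throws IllegalArgumentException for malformed URLs
        String path = URI.create(imageUrl).getPath(); // null for opaque URIs
        if (path != null) {
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            // Only treat a dot in the last path segment as an extension
            if (dot > slash) {
                String ext = path.substring(dot).toLowerCase();
                if (ext.matches("\\.(png|jpe?g|gif|bmp|tiff?)")) {
                    return ext;
                }
            }
        }
        return ".jpg";
    }
}
```

The returned value can be passed as the suffix argument of `Files.createTempFile` when saving the downloaded image, so the file name matches its actual format.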
Execution result: