tesseract-ocr

https://www.waitalone.cn/python-php-ocr.html

tesseract-ocr-setup-3.02.02.exe
v3.02 does not need any environment variables to be set.

github: tesseract-ocr - tesseract - releases

4.1.0 Release
Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.

https://digi.bib.uni-mannheim.de/tesseract/

tesseract-ocr-setup-3.05.02-20180621.exe # pytesseract runs normally with this one; use this version
tesseract-ocr-w32-setup-v4.1.0-elag2019.exe # pytesseract raises errors with this one
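
Because the note above depends on which Tesseract version is installed, here is a minimal Python sketch that reads the version from `tesseract --version` before pytesseract is used. The helper names parse_tesseract_version and installed_tesseract_version are hypothetical, and the sketch assumes the output contains a dotted version number such as 3.05.02 or 4.1.0 on its first line.

```python
import re
import subprocess


def parse_tesseract_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_tesseract_version(command="tesseract"):
    """Run `tesseract --version` and parse it; return None if the command fails.

    Assumes the tesseract executable is reachable through PATH.
    """
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams in case the version goes to stderr
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_tesseract_version(result.stdout)
```

Calling installed_tesseract_version() on a machine with the 3.05.02 installer should give a tuple starting with 3; a result of None suggests the executable was not found or could not be run.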

For v3.05 and later, add the install directory to the PATH environment variable:

C:\Program Files (x86)\Tesseract-OCR

Set the TESSDATA_PREFIX environment variable to the "tessdata" directory:

setx TESSDATA_PREFIX "C:\Program Files (x86)\Tesseract-OCR\tessdata"
# List the installed language packs; the environment variables are configured correctly only if this runs without errors
tesseract --list-langs

# Show the version
tesseract --version

# Show help
tesseract --help
tesseract --help-extra
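
The environment checks above can also be scripted. The following Python sketch is only an illustration: check_tesseract_setup, parse_list_langs and installed_languages are hypothetical helper names, and the parser assumes that `tesseract --list-langs` prints a header line such as "List of available languages (2):" followed by one language code per line.

```python
import os
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    Assumes a "List of available languages (N):" header followed by
    one language code per line.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def check_tesseract_setup():
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if shutil.which("tesseract") is None:
        problems.append("tesseract is not on PATH")
    prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append("TESSDATA_PREFIX is not a directory: " + prefix)
    return problems


def installed_languages():
    """Run `tesseract --list-langs`; raises if the command is missing or fails."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the list goes to stderr
        universal_newlines=True,
        check=True,
    )
    return parse_list_langs(result.stdout)
```

Values written with setx are only visible in consoles opened afterwards, so run the check from a new console window.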

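traineddata: language packs (language training data)
When installing tesseract-ocr, you can tick the "Additional language data (download)" option to install the language packs that OCR recognition supports, but the download is very slow.
Alternatively, download the language pack zip archive from GitHub and copy the files in tessdata-master into the tessdata directory under the Tesseract install directory: C:\Program Files (x86)\Tesseract-OCR\tessdata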
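
The manual copy step can be sketched in Python as follows. The function name copy_traineddata is hypothetical, and as a simplifying assumption the sketch copies only the *.traineddata files and ignores anything else in the extracted archive.

```python
import shutil
from pathlib import Path


def copy_traineddata(source_dir, tessdata_dir):
    """Copy every *.traineddata file from source_dir into tessdata_dir.

    Returns the sorted names of the files that were copied.
    """
    source = Path(source_dir)
    target = Path(tessdata_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob("*.traineddata")):
        # Existing files with the same name in the target are overwritten.
        shutil.copy2(str(path), str(target / path.name))
        copied.append(path.name)
    return copied
```

For example, copy_traineddata(r"C:\Downloads\tessdata-master", r"C:\Program Files (x86)\Tesseract-OCR\tessdata"), where the Downloads path is only a placeholder for wherever the zip was extracted. Writing into Program Files will likely require an elevated (administrator) console.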

https://github.com/tesseract-ocr/tessdata

https://github.com/tesseract-ocr/tessdata/archive/refs/tags/4.1.0.zip
These traineddata files can be used with Tesseract 4.0 and newer releases.

https://github.com/tesseract-ocr/tessdata/archive/refs/tags/4.0.0.zip
These traineddata files can be used with Tesseract 4.0 and newer releases.

https://github.com/tesseract-ocr/tessdata/archive/refs/tags/3.04.00.zip
Newer language data for tesseract-ocr 3.04.

Tutorial
http://c.biancheng.net/tensorflow/