https://www.waitalone.cn/python-php-ocr.html
tesseract-ocr-setup-3.02.02.exe
Version 3.02 does not require adding any environment variables.
GitHub: tesseract-ocr/tesseract releases
4.1.0 Release
Windows installer can be downloaded from https://github.com/UB-Mannheim/tesseract/wiki.
https://digi.bib.uni-mannheim.de/tesseract/
tesseract-ocr-setup-3.05.02-20180621.exe # pytesseract runs correctly; use this version
tesseract-ocr-w32-setup-v4.1.0-elag2019.exe # pytesseract reports errors with this version
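Because pytesseract is pinned to 3.05.02 here, it can help to check which Tesseract build is actually found on PATH. A minimal Python sketch, assuming `tesseract --version` prints a first line such as `tesseract 3.05.02` or `tesseract v4.1.0-elag2019`. The function names are hypothetical helpers, not part of pytesseract:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumed output formats, e.g. "tesseract 3.05.02" or "tesseract v4.1.0-elag2019".
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `tesseract --version` output."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_tesseract_version() -> Optional[Tuple[int, int, int]]:
    """Run `tesseract --version` if tesseract is on PATH and parse the result."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # not on PATH: see the environment-variable step
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, errors="replace")
    # Some builds print the version to stderr, so search both streams.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_tesseract_version())
```

A result of `(3, 5, 2)` would match the version recommended above for pytesseract.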
For v3.05 and later, add the install directory to the PATH environment variable:
C:\Program Files (x86)\Tesseract-OCR
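Set the TESSDATA_PREFIX environment variable to the "tessdata" directory (setx only affects newly opened command prompts, not the current one):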
setx TESSDATA_PREFIX "C:\Program Files (x86)\Tesseract-OCR\tessdata"
# List the installed language packs; if this runs without errors, the environment variables are set correctly
tesseract --list-langs
# Show the version
tesseract --version
# Show help
tesseract --help
tesseract --help-extra
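The `--list-langs` check can be roughly mirrored from Python by reading TESSDATA_PREFIX and listing the *.traineddata files in that directory. This is only a sketch under that assumption: it does not invoke Tesseract, so it shows which files are present, not whether Tesseract can load them. `list_traineddata` is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import List


def list_traineddata(tessdata_dir: str) -> List[str]:
    """Return language codes (file stems) for every *.traineddata file."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {folder}")
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set (open a new console after setx)")
    else:
        print(list_traineddata(prefix))
```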
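traineddata: language packs (trained language data)
When installing tesseract-ocr, you can tick the "Additional language data (download)" option to install the language packs supported for OCR, but the download is very slow.
Alternatively, download the language-pack zip archive from GitHub and copy the files in tessdata-master into the tessdata folder of the Tesseract install directory, C:\Program Files (x86)\Tesseract-OCR\tessdata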
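The manual copy step can also be scripted. A minimal Python sketch, assuming only the top-level *.traineddata files are wanted and that files already in tessdata are kept unless overwrite=True. `copy_traineddata` is a hypothetical helper, and the paths in the usage comment are examples only:

```python
import shutil
from pathlib import Path
from typing import List


def copy_traineddata(src_dir: str, dest_dir: str, overwrite: bool = False) -> List[str]:
    """Copy *.traineddata files from an extracted archive into tessdata.

    Returns the names of the files that were actually copied.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.glob("*.traineddata")):
        target = dest / f.name
        if target.exists() and not overwrite:
            continue  # keep files the installer already placed there
        shutil.copy2(f, target)
        copied.append(f.name)
    return copied


# Example usage (paths are illustrative; writing under Program Files
# needs an administrator prompt):
# copy_traineddata(r"C:\Downloads\tessdata-master",
#                  r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
```

Skipping existing files by default avoids silently replacing the language data that shipped with the installer.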
https://github.com/tesseract-ocr/tessdata
https://github.com/tesseract-ocr/tessdata/archive/refs/tags/4.1.0.zip
These traineddata files can be used with Tesseract 4.0 and newer releases.
https://github.com/tesseract-ocr/tessdata/archive/refs/tags/4.0.0.zip
These traineddata files can be used with Tesseract 4.0 and newer releases.
https://github.com/tesseract-ocr/tessdata/archive/refs/tags/3.04.00.zip
new version language data for tesseract-ocr 3.04
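Which archive to pick depends on the installed major version. A small lookup sketch based on the release notes listed above; choosing 4.1.0 over 4.0.0 for 4.x and newer is a judgment call, and treating the 3.04.00 data as suitable for 3.05.x is an assumption. `tessdata_archive_for` is a hypothetical helper:

```python
from typing import Optional, Tuple

TESSDATA_TAGS_URL = "https://github.com/tesseract-ocr/tessdata/archive/refs/tags/"


def tessdata_archive_for(version: Tuple[int, int]) -> Optional[str]:
    """Map a (major, minor) Tesseract version to a tessdata tag archive URL."""
    major, minor = version
    if major >= 4:
        # "These traineddata files can be used with Tesseract 4.0 and newer releases."
        return TESSDATA_TAGS_URL + "4.1.0.zip"
    if major == 3 and minor >= 4:
        # Data for 3.04; assumed to also work with 3.05.x.
        return TESSDATA_TAGS_URL + "3.04.00.zip"
    return None  # older releases are not covered by these notes
```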