This workshop introduces tesseract-ocr.
Preparing tesseract-ocr
https://github.com/tesseract-ocr/tesseract

Tesseract Version
Tesseract 3.X (legacy)
Tesseract 4.X (+ LSTM) : Line Detection, Fine Tuning
Tesseract 5.X (+ Windows installers) : by UB Mannheim
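
Since the available features depend on the major version, it can help to check which version is installed. The following is a minimal sketch, assuming the `tesseract` executable is on the PATH and that `tesseract --version` prints a first line such as `tesseract 5.3.0` or `tesseract v5.0.0-alpha...`. The helper names (`parse_major_version`, `has_lstm`, `installed_major_version`) are hypothetical and used only for illustration.

```python
import re
import shutil
import subprocess


def parse_major_version(version_output):
    """Return the major version found in `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def has_lstm(major):
    """The LSTM engine was added in Tesseract 4.X; 3.X is legacy only."""
    return major >= 4


def installed_major_version(executable="tesseract"):
    """Run `tesseract --version` and parse the result; None if it is not installed."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Check both streams, since the version text may go to stdout or stderr.
    return parse_major_version(result.stdout + result.stderr)


if __name__ == "__main__":
    major = installed_major_version()
    if major is None:
        print("tesseract was not found on PATH")
    else:
        print(f"Tesseract {major}.X, LSTM available: {has_lstm(major)}")
```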
Installing tesseract-ocr
https://tesseract-ocr.github.io/tessdoc/Installation.html

https://github.com/UB-Mannheim/tesseract/wiki

Installing the tesseract-ocr library
(base) C:\DEV\gitworkspaces>conda activate tf38-torch
(tf38-torch) C:\DEV\gitworkspaces>cd ocr-tesseract
(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>dir/w/p
Volume in drive C has no label.
Volume Serial Number is EA9B-192D
Directory of C:\DEV\gitworkspaces\ocr-tesseract
[.] [..]
0 File(s) 0 bytes
2 Dir(s) 25,983,504,384 bytes free
(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>pip install pytesseract
Collecting pytesseract
Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Requirement already satisfied: packaging>=21.3 in c:\dev\anaconda3\envs\tf38-torch\lib\site-packages (from pytesseract) (23.1)
Requirement already satisfied: Pillow>=8.0.0 in c:\users\daekyeong\appdata\roaming\python\python38\site-packages (from pytesseract) (9.5.0)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10
(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>
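
Once pytesseract is installed, a quick way to try it is to run OCR on a single image. Note that pytesseract is only a wrapper, so the Tesseract engine itself must also be installed. The sketch below assumes that; the helper names `build_config` and `ocr_image` are hypothetical, and the image path is taken from the command line. `--oem` and `--psm` are Tesseract command-line options passed through pytesseract's `config` argument.

```python
def build_config(psm=3, oem=3):
    """Build the Tesseract option string passed via pytesseract's `config`."""
    if not 0 <= psm <= 13:
        raise ValueError("psm (page segmentation mode) must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem (OCR engine mode) must be between 0 and 3")
    return f"--oem {oem} --psm {psm}"


def ocr_image(image_path, lang="eng", psm=3, oem=3, tesseract_cmd=None):
    """Return the text recognized in the image at image_path."""
    # Imported here so build_config stays usable without these packages.
    import pytesseract
    from PIL import Image

    if tesseract_cmd is not None:
        # Needed when tesseract is not on PATH, which is common on Windows.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_config(psm, oem)
        )


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(ocr_image(sys.argv[1]))
```

On Windows, `tesseract_cmd` would typically point at the `tesseract.exe` installed by the UB Mannheim installer; the exact path depends on where it was installed.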



Obtaining trained data
Trained Data Download
| | Trained models | Speed | Accuracy | Supports legacy | Retrainable |
|---|---|---|---|---|---|
| tessdata | Legacy + LSTM (integerized tessdata-best) | Faster than tessdata-best | Slightly less accurate than tessdata-best | Yes | No |
| tessdata-best | LSTM only (based on langdata) | Slowest | Most accurate | No | Yes |
| tessdata-fast | Integerized LSTM of a smaller network than tessdata-best | Fastest | Least accurate | No | No |
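
The trade-offs in the table can be written down as a small lookup to help pick a data set. This is only a sketch: the ranks simply restate the table's Speed and Accuracy columns, the names are spelled as they appear in the table, and `ModelSet` and `choose_model_set` are hypothetical helpers, not part of Tesseract or pytesseract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSet:
    name: str
    speed_rank: int      # 1 = fastest, 3 = slowest
    accuracy_rank: int   # 1 = most accurate, 3 = least accurate
    supports_legacy: bool
    retrainable: bool


# One entry per row of the table.
MODEL_SETS = [
    ModelSet("tessdata", 2, 2, True, False),
    ModelSet("tessdata-best", 3, 1, False, True),
    ModelSet("tessdata-fast", 1, 3, False, False),
]


def choose_model_set(need_legacy=False, need_retraining=False, prefer="accuracy"):
    """Return the name of the model set that best fits the given requirements."""
    if prefer not in ("accuracy", "speed"):
        raise ValueError("prefer must be 'accuracy' or 'speed'")
    candidates = [
        m for m in MODEL_SETS
        if (m.supports_legacy or not need_legacy)
        and (m.retrainable or not need_retraining)
    ]
    if not candidates:
        raise ValueError("no model set satisfies these requirements")
    if prefer == "accuracy":
        return min(candidates, key=lambda m: m.accuracy_rank).name
    return min(candidates, key=lambda m: m.speed_rank).name


if __name__ == "__main__":
    print(choose_model_set(need_retraining=True))
    print(choose_model_set(prefer="speed"))
```

After downloading the chosen `.traineddata` files into a folder, Tesseract can be pointed at that folder with its `--tessdata-dir` option.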


https://github.com/tesseract-ocr





