
Trying Out tesseract-ocr

KZNetwork 2024. 4. 28. 17:43

This workshop introduces tesseract-ocr.
Preparing tesseract-ocr
https://github.com/tesseract-ocr/tesseract

 

Tesseract versions
Tesseract 3.X (legacy)
Tesseract 4.X (+ LSTM): line detection, fine-tuning
Tesseract 5.X (+ Windows builds): by UB Mannheim
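
To see which of these generations is installed, the first line of `tesseract --version` can be parsed. The sketch below is a minimal example that assumes the executable is on PATH and that the output starts with something like "tesseract v5.3.1.20230401" or "tesseract 4.1.1". The GENERATIONS labels are just shorthand for the list above, not official names.

```python
import re
import shutil
import subprocess

# Shorthand for the version list above (illustrative labels, not official).
GENERATIONS = {
    3: "legacy engine",
    4: "legacy engine + LSTM",
    5: "5.X series (Windows builds by UB Mannheim)",
}


def parse_major_version(version_output):
    """Return the major version found in `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_major_version():
    """Run `tesseract --version` if the executable is on PATH, else return None."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some releases write the version banner to stderr, so check both streams.
    return parse_major_version(result.stdout + result.stderr)


if __name__ == "__main__":
    major = installed_major_version()
    print(major, GENERATIONS.get(major, "unknown or not installed"))
```

If nothing is found on PATH, the script reports "unknown or not installed" instead of failing.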


Installing tesseract-ocr
https://tesseract-ocr.github.io/tessdoc/Installation.html


https://github.com/UB-Mannheim/tesseract/wiki



Installing the tesseract-ocr library

Installer: tesseract-ocr-w64-setup-5.3.1.20230401.exe
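
After the installer finishes, it can help to confirm that the executable is actually reachable. The following is a rough sketch that assumes the installer's default folder C:\Program Files\Tesseract-OCR (the same folder that holds tessdata later in this post). The function name find_tesseract and the candidate list are made up for illustration.

```python
import os
import shutil

# Assumed default install location; change it if another folder was chosen
# during setup.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES, search_path=True):
    """Return the path of a tesseract executable, or None if none is found."""
    if search_path:
        on_path = shutil.which("tesseract")
        if on_path:
            return on_path
    # Fall back to well-known install locations.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract() or "tesseract executable not found")
```

The installer does not necessarily add the folder to PATH, which is why the explicit candidate path is checked as a fallback.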
 

 


 

Installing the tesseract-ocr Python library (pytesseract)


(base) C:\DEV\gitworkspaces>conda activate tf38-torch

(tf38-torch) C:\DEV\gitworkspaces>cd ocr-tesseract

(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>dir/w/p
 Volume in drive C has no label.
 Volume Serial Number is EA9B-192D

 Directory of C:\DEV\gitworkspaces\ocr-tesseract

[.]  [..]
               0 File(s)              0 bytes
               2 Dir(s)  25,983,504,384 bytes free

(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>pip install pytesseract
Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Requirement already satisfied: packaging>=21.3 in c:\dev\anaconda3\envs\tf38-torch\lib\site-packages (from pytesseract) (23.1)
Requirement already satisfied: Pillow>=8.0.0 in c:\users\daekyeong\appdata\roaming\python\python38\site-packages (from pytesseract) (9.5.0)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10

(tf38-torch) C:\DEV\gitworkspaces\ocr-tesseract>
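
With pytesseract installed, a first OCR call might look like the sketch below. It is only an outline: sample.png is a hypothetical image file, the helper names are made up, and the tesseract_cmd argument is needed only when the tesseract executable is not on PATH. The two pytesseract features used here are `pytesseract.pytesseract.tesseract_cmd` and `image_to_string`.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ocr_image(image_path, lang="eng", tesseract_cmd=None):
    """Run OCR on one image and return the recognized text."""
    # Imported lazily so this file still loads when the packages are missing.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Point pytesseract at the executable when it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


if __name__ == "__main__":
    sample = "sample.png"  # hypothetical test image
    missing = missing_modules(["pytesseract", "PIL"])
    if missing:
        print("Missing modules:", missing)
    elif os.path.isfile(sample):
        print(ocr_image(sample))
    else:
        print("Put an image named", sample, "next to this script to try OCR.")
```

The Pillow package is imported under the name PIL, which is why the check looks for "PIL" rather than "Pillow".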

 

 

 

 

 

 

Obtaining trained data
Trained Data Download

 


Trained models

Repository | Trained models | Speed | Accuracy | Supports legacy | Retrainable
tessdata | Legacy + LSTM (integerized tessdata-best) | Faster than tessdata-best | Slightly less accurate than tessdata-best | Yes | No
tessdata-best | LSTM only (based on langdata) | Slowest | Most accurate | No | Yes
tessdata-fast | Integerized LSTM of a smaller network than tessdata-best | Fastest | Least accurate | No | No
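
The trade-offs in the table can be written down as a small lookup to help pick a model set. This is just one way to read the table: the numeric ranks (1 = best) are derived from the Speed and Accuracy columns, and choose_model_set is a hypothetical helper, not part of Tesseract.

```python
from collections import namedtuple

ModelSet = namedtuple(
    "ModelSet", "name speed_rank accuracy_rank supports_legacy retrainable"
)

# Ranks follow the table: 1 = fastest / most accurate, 3 = slowest / least accurate.
MODEL_SETS = [
    ModelSet("tessdata", 2, 2, True, False),
    ModelSet("tessdata-best", 3, 1, False, True),
    ModelSet("tessdata-fast", 1, 3, False, False),
]


def choose_model_set(need_legacy=False, need_retraining=False, prefer="speed"):
    """Return the name of the best-ranked model set that meets the needs, or None."""
    candidates = [
        m for m in MODEL_SETS
        if (m.supports_legacy or not need_legacy)
        and (m.retrainable or not need_retraining)
    ]
    if not candidates:
        return None
    if prefer == "speed":
        return min(candidates, key=lambda m: m.speed_rank).name
    return min(candidates, key=lambda m: m.accuracy_rank).name


if __name__ == "__main__":
    print(choose_model_set(prefer="accuracy"))
    print(choose_model_set(need_legacy=True))
```

No row in the table is both legacy-capable and retrainable, so asking for both returns None.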

 

 

Trained data folder:
C:\Program Files\Tesseract-OCR\tessdata
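
To see which trained data files are already in that folder, the *.traineddata files can simply be listed. A minimal sketch, assuming the folder path shown above; installed_traineddata is a made-up helper name.

```python
from pathlib import Path

# Folder from this post; adjust if Tesseract was installed elsewhere.
TESSDATA_DIR = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def installed_traineddata(tessdata_dir=TESSDATA_DIR):
    """Return the sorted names (without extension) of the *.traineddata files."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(path.stem for path in folder.glob("*.traineddata"))


if __name__ == "__main__":
    print(installed_traineddata())
```

Each name in the result (for example eng or kor, if those files are present) is what gets passed as the language code when running OCR.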
 

 

 

 

 
 

 

 


https://github.com/tesseract-ocr

 

 

 

Download the trained data files.
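
The download can also be scripted. The sketch below rests on assumptions that should be checked against the GitHub organization linked above: that the repositories are named tessdata, tessdata_best and tessdata_fast (with underscores), that their default branch is main, and that each language is a single <lang>.traineddata file at the repository root. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumed repository names under https://github.com/tesseract-ocr
REPOS = ("tessdata", "tessdata_best", "tessdata_fast")


def traineddata_url(repo, lang, branch="main"):
    """Build the assumed raw-download URL for one language file."""
    if repo not in REPOS:
        raise ValueError(f"unknown repository: {repo}")
    return f"https://github.com/tesseract-ocr/{repo}/raw/{branch}/{lang}.traineddata"


def download_traineddata(repo, lang, dest_dir):
    """Download <lang>.traineddata into dest_dir and return the saved path."""
    dest = Path(dest_dir) / f"{lang}.traineddata"
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(traineddata_url(repo, lang), str(dest))
    return dest


if __name__ == "__main__":
    # Only prints the URL; call download_traineddata(...) to fetch the file.
    print(traineddata_url("tessdata_best", "kor"))
```

Downloading into a working folder first, rather than straight into the Tesseract install folder, avoids permission problems under C:\Program Files.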
 

 

 

 

 

Copy the downloaded files into C:\Program Files\Tesseract-OCR\tessdata, overwriting the existing files.
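
Because overwriting replaces the files that shipped with the installer, keeping a backup copy is a reasonable precaution. The sketch below is one possible approach, not part of the original workshop: install_traineddata is a hypothetical helper, the downloads folder is a hypothetical location, and writing under C:\Program Files normally requires an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def install_traineddata(source_dir, tessdata_dir, backup=True):
    """Copy *.traineddata from source_dir into tessdata_dir, overwriting.

    When backup is True, an existing file is first saved as <name>.bak.
    Returns the list of file names that were copied.
    """
    source = Path(source_dir)
    target = Path(tessdata_dir)
    copied = []
    for src in sorted(source.glob("*.traineddata")):
        dst = target / src.name
        if backup and dst.exists():
            # Keep the previous model so the change can be undone.
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied


# Example call (hypothetical download folder; run from an elevated prompt):
# install_traineddata(r"C:\DEV\downloads", r"C:\Program Files\Tesseract-OCR\tessdata")
```

The .bak files do not end in .traineddata, so Tesseract ignores them when looking up languages.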
 

 

 

 

 
