Friday, May 28, 2010

Install Tesseract on Ubuntu

from http://www.howtoforge.com/ocr_with_tesseract_on_ubuntu704

1. Install ImageMagick

Tesseract supports only uncompressed and G3-compressed TIFF files,

so the image has to be converted to one of those formats first.

Install ImageMagick through the Synaptic Package Manager to do the conversion.

2. Install the packages tesseract-ocr, tesseract-ocr-eng and tesseract-ocr-dev.
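If you prefer the terminal to Synaptic, steps 1 and 2 can be sketched as a single apt-get call. The function name install_ocr_packages is just a hypothetical helper, and the package names are the ones listed above; they may be named differently on other Ubuntu releases.

```shell
# Hypothetical helper: installs ImageMagick and the Tesseract packages
# named in steps 1 and 2 (package names may differ on other releases).
install_ocr_packages() {
    sudo apt-get update &&
    sudo apt-get install -y imagemagick tesseract-ocr tesseract-ocr-eng tesseract-ocr-dev
}
```

Run install_ocr_packages once after defining it; it asks for your password through sudo.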

3. Prepare the image file.

Open GIMP -> File -> New Image -> draw the English characters

-> Image -> Mode -> RGB or Grayscale

-> Tools -> Color Tools -> Threshold

-> reduce the image to black and white -> Image -> Mode -> Indexed
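For an existing scan, roughly the same grayscale and threshold steps could be done with ImageMagick instead of GIMP. This is only a sketch: the function name to_black_and_white is made up for illustration, and the 50% threshold is an assumed starting value that you will probably need to tune for your image.

```shell
# Hypothetical helper: to_black_and_white INPUT OUTPUT
# Converts the image to grayscale, then forces every pixel to pure
# black or pure white around an assumed 50% brightness cutoff.
to_black_and_white() {
    convert "$1" -colorspace Gray -threshold 50% "$2"
}
```

For example, to_black_and_white scan.png scan_bw.png writes the thresholded copy and leaves the original untouched.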

4. Convert to uncompressed TIFF

$ convert document.jpg document.tif
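Depending on the ImageMagick version and the input file, the TIFF written by a plain convert may not be uncompressed. A minimal sketch that asks for no compression explicitly with the -compress option is shown here; the function name to_uncompressed_tiff is hypothetical.

```shell
# Hypothetical helper: to_uncompressed_tiff INPUT OUTPUT.tif
# -compress None tells ImageMagick to write the output without compression.
to_uncompressed_tiff() {
    convert "$1" -compress None "$2"
}
```

For example, to_uncompressed_tiff document.jpg document.tif. Running identify -verbose document.tif and looking at its Compression line is one way to check the result.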

5. Use Tesseract

$ tesseract document.tif result
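Tesseract appends .txt to the output name, so the command above writes the recognized text to result.txt. The sketch below wraps that in a small function that names the output after the image and prints it; ocr_to_stdout is a hypothetical name, and -l eng assumes the English language data from step 2 is installed.

```shell
# Hypothetical helper: ocr_to_stdout IMAGE.tif
# Runs Tesseract with the English language data, writing IMAGE.txt
# next to the image, then prints the recognized text.
ocr_to_stdout() {
    tesseract "$1" "${1%.*}" -l eng &&
    cat "${1%.*}.txt"
}
```

For example, ocr_to_stdout document.tif creates document.txt and shows its contents in the terminal.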
