from http://www.howtoforge.com/ocr_with_tesseract_on_ubuntu704
1. Install ImageMagick
Tesseract supports only uncompressed and G3-compressed TIFF files,
so images in other formats have to be converted first.
Install ImageMagick through the Synaptic Package Manager to do the conversion.
2. Install the packages tesseract-ocr, tesseract-ocr-eng and tesseract-ocr-dev.
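Once both installs are done, it can help to confirm that the commands used in the later steps are actually on the PATH. A minimal sketch, assuming a POSIX shell; check_tools is a hypothetical helper name, not something shipped with either package:

```shell
# check_tools (hypothetical helper): print every named command
# that cannot be found, and return non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "check_tools convert tesseract" prints nothing when both ImageMagick and Tesseract are installed. If something is reported missing and you prefer the terminal over Synaptic, installing the same package names with apt-get should be equivalent, although the original guide only describes the Synaptic route.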
3. Prepare the image file.
Start GIMP -> File -> New Image -> draw some English characters
-> Image -> Mode -> RGB or Grayscale
-> Tools -> Color Tools -> Threshold
-> reduce the image to black and white -> Image -> Mode -> Indexed
4. Convert to uncompressed TIFF
$ convert document.jpg -compress none document.tif
5. Run Tesseract (the recognized text is written to result.txt)
$ tesseract document.tif result
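Steps 4 and 5 can be wrapped into one small shell function. This is only a sketch under a few assumptions: ocr_image is a hypothetical name, the output is named after the input file rather than "result", and "-compress none" is added to make ImageMagick write an uncompressed TIFF regardless of how the input was compressed.

```shell
# ocr_image (hypothetical helper): convert one image to an
# uncompressed TIFF, run Tesseract on it, and print the name
# of the text file that Tesseract wrote.
ocr_image() {
    input="$1"            # e.g. document.jpg
    base="${input%.*}"    # strip the last extension -> document

    # Step 4: convert to TIFF; "-compress none" is an assumption
    # added here to force uncompressed output.
    convert "$input" -compress none "$base.tif" || return 1

    # Step 5: Tesseract appends .txt to the output base name.
    tesseract "$base.tif" "$base" || return 1

    echo "$base.txt"
}
```

Calling "ocr_image document.jpg" would leave document.tif and document.txt next to the original image. The function stops at the first failing command, so a failed conversion never reaches Tesseract.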