tesseract

Version: v1.0.0-...-b5aa24e
Published: Oct 20, 2014 | License: BSD-2-Clause | Imports: 10 | Imported by: 1

README

go.tesseract

go.tesseract is a wrapper for the tesseract OCR library (text recognition from image/PDF).

Installation and dependencies

go.tesseract has two direct dependencies: go.leptonica and libtesseract.

Make sure you have installed go.leptonica. go.leptonica has a C library dependency; please read go.leptonica/README.md.

You must install the tesseract library, including development headers, at version 3.02.02 or later; go.tesseract cannot compile against earlier versions of tesseract. At the time of writing, this version of tesseract is not yet in the Ubuntu/Debian stable repository.

go.tesseract uses gopkg.in for versioned releases:

go get gopkg.in/GeertJohan/go.tesseract.v1

Debian testing (jessie) package

sudo apt-get install -t testing libtesseract3 libtesseract-dev

OSX with Homebrew

Do the following before trying to go get this package:

$ brew install leptonica
$ brew install tesseract
$ export CGO_LDFLAGS="-L/usr/local/Cellar/leptonica/1.69_1/lib -L/usr/local/Cellar/tesseract/3.02.02/lib"
$ export CGO_CFLAGS="-I/usr/local/Cellar/leptonica/1.69_1/include -I/usr/local/Cellar/tesseract/3.02.02/include"

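Note: this assumes you are using the standard Brew path of /usr/local/Cellar. Adjust the version directories (1.69_1 and 3.02.02 above) to match the versions Brew actually installed.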

Manual installation

Download, configure, make and install

svn checkout http://tesseract-ocr.googlecode.com/svn/tags/release-3.02.02 tesseract-ocr-read-only
cd tesseract-ocr-read-only
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Language files

If you have installed from debian testing (jessie):

sudo apt-get install -t testing tesseract-ocr-YOUR-LANGUAGE-SHORTCODE

# example: this installs Dutch and English
sudo apt-get install -t testing tesseract-ocr-nld
sudo apt-get install -t testing tesseract-ocr-eng

If you have installed manually, copy the language files (do this for each language you require):

sudo cp tessdata/YOUR-LANGUAGE-SHORTCODE.* /usr/local/share/tessdata/

# example for English and Dutch:
sudo cp tessdata/eng.* /usr/local/share/tessdata/
sudo cp tessdata/nld.* /usr/local/share/tessdata/
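
To check which language files ended up in the tessdata directory, something like the following sketch may help. It assumes the /usr/local/share/tessdata path from the manual installation above and treats everything before the first dot in a file name as the language shortcode; languageCodes is a hypothetical helper, not part of go.tesseract.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// languageCodes extracts the distinct shortcodes from file names such as
// "eng.traineddata" by taking everything before the first dot.
// Names without a dot (for example subdirectories) are skipped.
func languageCodes(filenames []string) []string {
	seen := make(map[string]bool)
	var codes []string
	for _, name := range filenames {
		i := strings.Index(name, ".")
		if i <= 0 {
			continue // no shortcode prefix, so not treated as a language file
		}
		code := name[:i]
		if !seen[code] {
			seen[code] = true
			codes = append(codes, code)
		}
	}
	sort.Strings(codes)
	return codes
}

func main() {
	// Assumption: language files live in the manual-install location.
	paths, err := filepath.Glob("/usr/local/share/tessdata/*")
	if err != nil {
		fmt.Println("glob failed:", err)
		return
	}
	names := make([]string, 0, len(paths))
	for _, p := range paths {
		names = append(names, filepath.Base(p))
	}
	fmt.Println(languageCodes(names))
}
```

Any dotted file in the directory is counted, so treat the result as a quick sanity check and not as an authoritative list. On an initialized instance, AvailableLanguages (documented below) is the better source.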

For more information, view the tesseract compilation guide.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Version

func Version() string

Version returns go.tesseract's version together with the version reported by the tesseract library (3.02.02 or later).

Types

type BoxCharacter

type BoxCharacter struct {
	Character  rune
	StartX     uint32
	StartY     uint32
	EndX       uint32
	EndY       uint32
	Pagenumber uint32
}

type BoxText

type BoxText struct {
	Characters []BoxCharacter
}

TODO: make this: `type BoxText []BoxCharacter` ?

type PageSegMode

type PageSegMode int

PageSegMode mirrors the C enum TessPageSegMode:

typedef enum TessPageSegMode { PSM_OSD_ONLY, PSM_AUTO_OSD, PSM_AUTO_ONLY, PSM_AUTO, PSM_SINGLE_COLUMN, PSM_SINGLE_BLOCK_VERT_TEXT, PSM_SINGLE_BLOCK, PSM_SINGLE_LINE, PSM_SINGLE_WORD, PSM_CIRCLE_WORD, PSM_SINGLE_CHAR, PSM_COUNT } TessPageSegMode;

const (
	PSM_OSD_ONLY PageSegMode = iota
	PSM_AUTO_OSD
	PSM_AUTO_ONLY
	PSM_AUTO
	PSM_SINGLE_COLUMN
	PSM_SINGLE_BLOCK_VERT_TEXT
	PSM_SINGLE_BLOCK
	PSM_SINGLE_LINE
	PSM_SINGLE_WORD
	PSM_CIRCLE_WORD
	PSM_SINGLE_CHAR
	PSM_COUNT
)
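
Because the constants are declared with iota in the same order as the C enum, each one takes the numeric value of its position (PSM_OSD_ONLY is 0, PSM_COUNT is 11). The sketch below redeclares the type locally so it compiles without cgo, and adds a String method for readable log output. That String method is a hypothetical helper and is not documented as part of go.tesseract.

```go
package main

import "fmt"

// PageSegMode is redeclared here only so the sketch is standalone.
type PageSegMode int

const (
	PSM_OSD_ONLY PageSegMode = iota
	PSM_AUTO_OSD
	PSM_AUTO_ONLY
	PSM_AUTO
	PSM_SINGLE_COLUMN
	PSM_SINGLE_BLOCK_VERT_TEXT
	PSM_SINGLE_BLOCK
	PSM_SINGLE_LINE
	PSM_SINGLE_WORD
	PSM_CIRCLE_WORD
	PSM_SINGLE_CHAR
	PSM_COUNT
)

// psmNames is indexed by mode value, in the same order as the constants.
var psmNames = [...]string{
	"PSM_OSD_ONLY", "PSM_AUTO_OSD", "PSM_AUTO_ONLY", "PSM_AUTO",
	"PSM_SINGLE_COLUMN", "PSM_SINGLE_BLOCK_VERT_TEXT", "PSM_SINGLE_BLOCK",
	"PSM_SINGLE_LINE", "PSM_SINGLE_WORD", "PSM_CIRCLE_WORD", "PSM_SINGLE_CHAR",
}

// String returns the constant name for valid modes and a numeric
// fallback for anything outside [0, PSM_COUNT).
func (m PageSegMode) String() string {
	if m < 0 || m >= PSM_COUNT {
		return fmt.Sprintf("PageSegMode(%d)", int(m))
	}
	return psmNames[m]
}

func main() {
	fmt.Println(int(PSM_AUTO), PSM_AUTO)
	fmt.Println(int(PSM_COUNT), PSM_COUNT)
}
```

PSM_COUNT is the number of modes, not a mode itself, which is why the sketch treats it as out of range.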

type Tess

type Tess struct {
	// contains filtered or unexported fields
}

Tess represents a tesseract instance

func NewTess

func NewTess(datapath string, language string) (*Tess, error)

NewTess creates and returns a new tesseract instance.

func (*Tess) AvailableLanguages

func (t *Tess) AvailableLanguages() []string

AvailableLanguages returns the languages available to the given tesseract instance. To find the languages actually loaded use (*Tess).LoadedLanguages().

func (*Tess) BoxText

func (tess *Tess) BoxText(pagenumber int) (*BoxText, error)

BoxText returns the output given by BoxTextRaw as a BoxText object.

func (*Tess) BoxTextRaw

func (t *Tess) BoxTextRaw(pagenumber int) string

BoxTextRaw returns the raw box text for the given page number.
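
To illustrate how raw box text relates to the BoxCharacter fields, here is a minimal parsing sketch. It assumes the usual tesseract box layout of one character per line followed by four coordinates and a page number, and it assumes those five numbers map to StartX, StartY, EndX, EndY and Pagenumber in that order. The mapping and the sample data are assumptions; the package's own BoxText method may behave differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// BoxCharacter is redeclared here only so the sketch is standalone.
type BoxCharacter struct {
	Character  rune
	StartX     uint32
	StartY     uint32
	EndX       uint32
	EndY       uint32
	Pagenumber uint32
}

// parseBoxLine parses one line of the assumed form "c x0 y0 x1 y1 page".
func parseBoxLine(line string) (BoxCharacter, error) {
	fields := strings.Fields(line)
	if len(fields) != 6 {
		return BoxCharacter{}, fmt.Errorf("expected 6 fields, got %d", len(fields))
	}
	r, size := utf8.DecodeRuneInString(fields[0])
	if size != len(fields[0]) || (r == utf8.RuneError && size == 1) {
		return BoxCharacter{}, fmt.Errorf("expected a single character, got %q", fields[0])
	}
	var nums [5]uint32
	for i, f := range fields[1:] {
		n, err := strconv.ParseUint(f, 10, 32)
		if err != nil {
			return BoxCharacter{}, fmt.Errorf("field %d: %v", i+2, err)
		}
		nums[i] = uint32(n)
	}
	return BoxCharacter{
		Character: r,
		StartX:    nums[0], StartY: nums[1],
		EndX: nums[2], EndY: nums[3],
		Pagenumber: nums[4],
	}, nil
}

// parseBoxText parses every non-blank line and stops at the first error.
func parseBoxText(raw string) ([]BoxCharacter, error) {
	var out []BoxCharacter
	for _, line := range strings.Split(raw, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		bc, err := parseBoxLine(line)
		if err != nil {
			return nil, err
		}
		out = append(out, bc)
	}
	return out, nil
}

func main() {
	// Made-up sample data, not real tesseract output.
	chars, err := parseBoxText("H 10 20 30 60 0\ni 34 20 40 58 0\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range chars {
		fmt.Printf("%c (%d,%d)-(%d,%d) page %d\n",
			c.Character, c.StartX, c.StartY, c.EndX, c.EndY, c.Pagenumber)
	}
}
```

The sketch rejects multi-character symbols such as ligatures, since BoxCharacter holds a single rune.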

func (*Tess) Clear

func (t *Tess) Clear()

Clear frees up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImagePix before doing any Recognize or Get* operation.

func (*Tess) Close

func (t *Tess) Close()

Close clears the tesseract instance from memory

func (*Tess) DumpVariables

func (t *Tess) DumpVariables()

DumpVariables dumps the variables set on a Tess to stdout

func (*Tess) HOCRText

func (t *Tess) HOCRText(pagenumber int) string

HOCRText returns the HOCR text for the given page number.

func (*Tess) InitializedLanguages

func (t *Tess) InitializedLanguages() string

InitializedLanguages returns the languages string used in the last valid initialization. If the last initialization specified "deu+hin" then that will be returned. If hin loaded eng automatically as well, then that will not be included in this list. To find the languages actually loaded use (*Tess).LoadedLanguages().

func (*Tess) LoadedLanguages

func (t *Tess) LoadedLanguages() []string

LoadedLanguages returns all languages loaded for the given tesseract instance, including those loaded as dependencies of other loaded languages.
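
The difference between the initialization string and the loaded list can be made concrete with a small sketch. implicitlyLoaded is a hypothetical helper, not part of go.tesseract; it takes values shaped like the results of InitializedLanguages and LoadedLanguages and reports the languages that were pulled in as dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// implicitlyLoaded returns the entries of loaded that were not named in
// the "+"-separated initialized string.
func implicitlyLoaded(initialized string, loaded []string) []string {
	requested := make(map[string]bool)
	for _, lang := range strings.Split(initialized, "+") {
		requested[lang] = true
	}
	var extra []string
	for _, lang := range loaded {
		if !requested[lang] {
			extra = append(extra, lang)
		}
	}
	return extra
}

func main() {
	// Values follow the "deu+hin" example in the text, where hin loads eng.
	fmt.Println(implicitlyLoaded("deu+hin", []string{"deu", "hin", "eng"}))
}
```

With a real instance t, the call would presumably look like implicitlyLoaded(t.InitializedLanguages(), t.LoadedLanguages()), given the signatures documented here.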

func (*Tess) SetImagePix

func (t *Tess) SetImagePix(pix *leptonica.Pix)

SetImagePix sets the input image using a leptonica Pix

func (*Tess) SetInputName

func (t *Tess) SetInputName(filename string)

SetInputName sets the name of the input file. Needed only for training and loading a UNLV zone file. TODO: drop this?

func (*Tess) SetPageSegMode

func (tess *Tess) SetPageSegMode(psm PageSegMode)

SetPageSegMode corresponds to the C API function: void TessBaseAPISetPageSegMode(TessBaseAPI* handle, TessPageSegMode mode);

func (*Tess) SetRectangle

func (t *Tess) SetRectangle(left, top, width, height int)

SetRectangle corresponds to the C API function: void TessBaseAPISetRectangle(TessBaseAPI* handle, int left, int top, int width, int height);
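
SetRectangle takes a corner plus a size, whereas regions are often known as two corners. The sketch below converts between the two forms. rectFromCorners is a hypothetical helper, and it assumes image coordinates with the origin at the top left and y growing downward.

```go
package main

import "fmt"

// rectFromCorners converts two opposite corners into left, top, width and
// height. The corners may be given in either order.
func rectFromCorners(x0, y0, x1, y1 int) (left, top, width, height int) {
	if x1 < x0 {
		x0, x1 = x1, x0
	}
	if y1 < y0 {
		y0, y1 = y1, y0
	}
	return x0, y0, x1 - x0, y1 - y0
}

func main() {
	left, top, width, height := rectFromCorners(120, 40, 20, 90)
	fmt.Println(left, top, width, height) // 20 40 100 50
}
```

Since the four return values match the parameter list, a call such as t.SetRectangle(rectFromCorners(120, 40, 20, 90)) should compile against the signature shown above.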

func (*Tess) SetVariable

func (t *Tess) SetVariable(name, value string) error

SetVariable corresponds to the C API function: BOOL TessBaseAPISetVariable(TessBaseAPI* handle, const char* name, const char* value);
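
When several variables need to be set, it can be convenient to apply them from a map and stop at the first failure. The sketch below does this against any function with SetVariable's documented signature. applyVariables is a hypothetical helper, and the printing setter stands in for a real instance so that the example runs without libtesseract. The variable name tessedit_char_whitelist is used purely as an example.

```go
package main

import (
	"fmt"
	"sort"
)

// applyVariables calls set for every entry in vars, in sorted name order
// for reproducibility, and returns the first error encountered.
func applyVariables(set func(name, value string) error, vars map[string]string) error {
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := set(name, vars[name]); err != nil {
			return fmt.Errorf("setting %q: %v", name, err)
		}
	}
	return nil
}

func main() {
	// Stand-in setter that only prints what it is given.
	printSetter := func(name, value string) error {
		fmt.Printf("%s=%s\n", name, value)
		return nil
	}
	vars := map[string]string{
		"tessedit_char_whitelist": "0123456789",
	}
	if err := applyVariables(printSetter, vars); err != nil {
		fmt.Println(err)
	}
}
```

With a real instance t, the method value t.SetVariable has the required function type and could be passed in place of printSetter.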

func (*Tess) Text

func (t *Tess) Text() string

Text returns text after analysing the image(s)

func (*Tess) UNLVText

func (t *Tess) UNLVText() string

UNLVText returns the UNLV text
