gocropus

Version: v0.9.5
Published: Nov 23, 2019
License: MIT
Imports: 10
Imported by: 0

README

gocropus

A small utility library for working with the line-segmented images used for OCR by ocropy and similar tools.

Test files

All ground truth and image test files in testdata are taken from the GT4HistOCR corpus.

Documentation

Constants

const (
	GTExt     = ".gt.txt"
	TxtExt    = ".txt"
	LLocsExt  = ".llocs"
	BinPngExt = ".bin.png"
	DewPngExt = ".dew.png"
	PngExt    = ".png"
	NrmPngExt = ".nrm.png" /* GT4HistOCR */
)

File extensions for gt, img, txt and llocs files.
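Each line of a data set is identified by a stripped base path, and its files are that base path plus one of these extensions. The following minimal sketch illustrates this with a hypothetical helper, `pathsFor`, which is not part of the package:

```go
package main

import "fmt"

// Copies of the package constants, repeated so the sketch is self-contained.
const (
	GTExt    = ".gt.txt"
	TxtExt   = ".txt"
	LLocsExt = ".llocs"
)

// pathsFor is a hypothetical helper: it appends each extension to a
// stripped base path (a path without any extensions).
func pathsFor(base string) (gt, txt, llocs string) {
	return base + GTExt, base + TxtExt, base + LLocsExt
}

func main() {
	gt, txt, llocs := pathsFor("testdata/0001")
	fmt.Println(gt, txt, llocs) // testdata/0001.gt.txt testdata/0001.txt testdata/0001.llocs
}
```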

Variables

var ImageExtensions = []string{
	BinPngExt,
	DewPngExt,
	PngExt,
	NrmPngExt,
}

ImageExtensions defines the possible extensions of line image files. The order of the slice determines the priority in which candidate image files are chosen: an earlier extension takes precedence over a later one. Change this variable if you need a different priority.

Functions

func GTFromFile added in v0.9.4

func GTFromFile(p string, stat bool) (string, bool)

GTFromFile returns the corresponding gt file path for the given stripped or unstripped path and whether that file exists. If stat is false, it returns the gt path together with false and does not check whether the file exists. In either case the gt file path is returned.
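The stat flag only controls whether existence is checked; the returned path is always computed. The sketch below shows this contract. It uses a hypothetical helper, `gtPath`, and assumes the input path is already stripped:

```go
package main

import (
	"fmt"
	"os"
)

const GTExt = ".gt.txt"

// gtPath is a hypothetical sketch of the documented contract of GTFromFile.
// ASSUMPTION: stripped has no extensions left (the real function also
// accepts unstripped paths).
func gtPath(stripped string, stat bool) (string, bool) {
	p := stripped + GTExt
	if !stat {
		// Without stat the existence flag is always false.
		return p, false
	}
	_, err := os.Stat(p)
	return p, err == nil
}

func main() {
	fmt.Println(gtPath("testdata/0001", false)) // testdata/0001.gt.txt false
}
```

LLocsFromFile and TxtFromFile are documented with the same contract, using their own extensions.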

func ImageFromFile added in v0.9.4

func ImageFromFile(stripped string) (string, bool)

ImageFromFile returns the corresponding line image file for the given stripped or unstripped path and whether it exists. To identify the right extension, the candidate file paths are checked with stat. If no existing image file can be found, the function returns "", false.

func LLocsFromFile added in v0.9.4

func LLocsFromFile(p string, stat bool) (string, bool)

LLocsFromFile returns the corresponding llocs file path for the given stripped path and whether that file exists. If stat is false, it returns the llocs path together with false and does not check whether the file exists. In either case the llocs file path is returned.

func OpenImgFile

func OpenImgFile(path string) (image.Image, error)

OpenImgFile opens the PNG-encoded file at path and decodes its image data.

func OpenTxtFile

func OpenTxtFile(path string) (string, error)

OpenTxtFile opens a txt or gt file and returns its content line.

func ReadImgFile

func ReadImgFile(in io.Reader) (image.Image, error)

ReadImgFile decodes PNG-encoded image data from the given reader.

func ReadTxtFile

func ReadTxtFile(in io.Reader) (string, error)

ReadTxtFile reads the content line of a txt or gt file from the given reader.

func Strip

func Strip(p string) string

Strip returns the bare file path for a given path, with all extensions stripped. If the path's file name starts with a dot, the whole file name is removed.
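The documented rule can be read as "cut the file name at its first dot". Below is a hypothetical re-implementation, `stripSketch`, written only from that description. Keeping the trailing separator for dot-files is an assumption of this sketch:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stripSketch cuts the file name (not the directory) at its first dot.
// This removes all extensions, and removes the whole name if it starts
// with a dot. ASSUMPTION: the directory part, including its trailing
// separator, is returned unchanged.
func stripSketch(p string) string {
	dir, file := filepath.Split(p)
	if i := strings.IndexByte(file, '.'); i >= 0 {
		file = file[:i]
	}
	return dir + file
}

func main() {
	fmt.Println(stripSketch("testdata/0001.nrm.png")) // testdata/0001
}
```

Only the file name is cut, so dots in directory names are left alone.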

func TxtFromFile added in v0.9.4

func TxtFromFile(p string, stat bool) (string, bool)

TxtFromFile returns the corresponding txt file path for the given stripped or unstripped path and whether that file exists. If stat is false, it returns the txt path together with false and does not check whether the file exists. In either case the txt file path is returned.

func Walk

func Walk(dir, ext string, recursive bool, f WalkFunc) error

Walk iterates over all files in the given directory and calls the given callback function for each set of Ocropy files. Each set is determined from the given file extension. If recursive is false, subdirectories are ignored.

Types

type Cmd added in v0.9.1

type Cmd struct {
	Exe   string // executable to run
	Model string // path of the model to use
}

Cmd wraps information to run gocropus commands.

func (*Cmd) Cmd added in v0.9.1

func (cmd *Cmd) Cmd(args ...string) []string

Cmd returns the command line, as a slice of strings, that gets executed.

func (*Cmd) Run added in v0.9.1

func (cmd *Cmd) Run(args ...string) ([]byte, error)

Run runs the command with the given arguments and returns its combined output (stdout and stderr).

func (*Cmd) RunContext added in v0.9.1

func (cmd *Cmd) RunContext(ctx context.Context, args ...string) ([]byte, error)

RunContext works like Run, but runs the command with the given context.
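The sketch below shows how such a wrapper can be built on os/exec: exec.CommandContext kills the process when the context is done, and CombinedOutput collects stdout and stderr together. The type `cmdSketch` is hypothetical. The documentation does not say how Model is passed to the executable, so the sketch ignores it:

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// cmdSketch mirrors the fields of Cmd. ASSUMPTION: Model is not used here,
// since the way the real type passes it on is undocumented.
type cmdSketch struct {
	Exe   string // executable to run
	Model string // path of the model to use (unused in this sketch)
}

// Cmd returns the command line: the executable followed by args.
func (c *cmdSketch) Cmd(args ...string) []string {
	return append([]string{c.Exe}, args...)
}

// RunContext runs the command and returns stdout and stderr combined.
// exec.CommandContext kills the process if ctx is done before it exits.
func (c *cmdSketch) RunContext(ctx context.Context, args ...string) ([]byte, error) {
	return exec.CommandContext(ctx, c.Exe, args...).CombinedOutput()
}

// Run is RunContext with a background context.
func (c *cmdSketch) Run(args ...string) ([]byte, error) {
	return c.RunContext(context.Background(), args...)
}

func main() {
	cmd := &cmdSketch{Exe: "echo"}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	out, err := cmd.RunContext(ctx, "hello")
	fmt.Printf("%v %q %v\n", cmd.Cmd("hello"), out, err)
}
```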

type LLoc

type LLoc struct {
	Char rune
	Cut  float32
	Conf float32
}

LLoc represents the information for one recognized character: the character itself (Char), its right cut position (Cut) and its confidence (Conf).

type LLocs

type LLocs []LLoc

LLocs represents character information for one recognized line.

func OpenLLocsFile

func OpenLLocsFile(path string) (LLocs, error)

OpenLLocsFile opens an llocs file and returns its contents.

func ReadLLocsFile

func ReadLLocsFile(in io.Reader) (LLocs, error)

ReadLLocsFile reads the contents of an llocs file from the given reader.

func (LLocs) Confs added in v0.9.3

func (l LLocs) Confs() []float32

Confs returns the confidences as a slice.

func (LLocs) Cuts added in v0.9.3

func (l LLocs) Cuts() []int

Cuts returns the right cuts of the llocs as an int slice.
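Both accessors project one field of every LLoc into a slice of the same length. Here is a sketch on local copies of the types. How the float32 cuts are converted to int is not documented, so truncation with int() is an assumption:

```go
package main

import "fmt"

// Local copies of the documented types.
type LLoc struct {
	Char rune
	Cut  float32
	Conf float32
}

type LLocs []LLoc

// Confs returns the confidence of every character, in line order.
func (l LLocs) Confs() []float32 {
	res := make([]float32, len(l))
	for i, lloc := range l {
		res[i] = lloc.Conf
	}
	return res
}

// Cuts returns the right cut of every character as an int.
// ASSUMPTION: the float32 cut is truncated toward zero with int().
func (l LLocs) Cuts() []int {
	res := make([]int, len(l))
	for i, lloc := range l {
		res[i] = int(lloc.Cut)
	}
	return res
}

func main() {
	l := LLocs{{'a', 10.5, 0.98}, {'b', 22.0, 0.91}}
	fmt.Println(l.Cuts(), l.Confs()) // [10 22] [0.98 0.91]
}
```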

func (LLocs) String

func (l LLocs) String() string

type WalkFunc

type WalkFunc func(string, string, string, string) error

WalkFunc defines the type of the callback function used by Walk. It is called with the paths of each existing set of Ocropy files: the first argument is the gt file path, the second the image file path, the third the txt file path and the fourth the llocs file path. If a file does not exist, the corresponding argument is the empty string "".
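Because missing files arrive as "", a callback can filter incomplete sets with simple string checks. The sketch below builds a hypothetical callback, `trainingPairs`, that keeps only the sets having both a gt file and a line image. It uses a local copy of the WalkFunc type and simulates the calls that Walk would make:

```go
package main

import "fmt"

// Local copy of the documented callback type. Argument order: gt, img,
// txt, llocs; missing files are passed as "".
type WalkFunc func(gt, img, txt, llocs string) error

// trainingPairs returns a WalkFunc that collects (img, gt) pairs and
// skips every set that lacks either file.
func trainingPairs(pairs *[][2]string) WalkFunc {
	return func(gt, img, txt, llocs string) error {
		if gt == "" || img == "" {
			return nil // incomplete set: skip it
		}
		*pairs = append(*pairs, [2]string{img, gt})
		return nil
	}
}

func main() {
	var pairs [][2]string
	f := trainingPairs(&pairs)
	// Simulate the calls Walk would make for two line sets.
	_ = f("d/0001.gt.txt", "d/0001.bin.png", "", "")
	_ = f("", "d/0002.bin.png", "d/0002.txt", "d/0002.llocs")
	fmt.Println(pairs) // [[d/0001.bin.png d/0001.gt.txt]]
}
```

With the package itself, a function of this shape would be passed as the last argument of Walk.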
