hocr

package module
v0.1.1
Published: May 28, 2019 | License: MIT | Imports: 6 | Imported by: 1

Documentation

Constants

const (
	ClassPage = "ocr_page"
	ClassArea = "ocr_carea"
	ClassLine = "ocr_line"
	ClassWord = "ocrx_word"
)

Possible classes for elements.

Types

type Element

type Element struct {
	Class string
	Node  xml.StartElement
}

Element is used to represent text elements in the hOCR document.

func (Element) Attribute

func (e Element) Attribute(key string) (string, bool)

Attribute returns the value of the attribute with the given key and reports whether the attribute was found.
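The lookup described above can be illustrated with a small standalone sketch. It is not the package's implementation: lookupAttr and sampleNode are hypothetical names, and the sketch assumes the attribute is matched on its local name in the xml.StartElement stored in Element.Node.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// lookupAttr is a hypothetical helper, not part of the hocr package.
// It returns the value of the first attribute whose local name equals key
// and reports whether such an attribute exists.
func lookupAttr(node xml.StartElement, key string) (string, bool) {
	for _, a := range node.Attr {
		if a.Name.Local == key {
			return a.Value, true
		}
	}
	return "", false
}

// sampleNode builds a start element resembling
// <span class="ocrx_word" title="bbox 10 20 110 45">.
func sampleNode() xml.StartElement {
	return xml.StartElement{
		Name: xml.Name{Local: "span"},
		Attr: []xml.Attr{
			{Name: xml.Name{Local: "class"}, Value: "ocrx_word"},
			{Name: xml.Name{Local: "title"}, Value: "bbox 10 20 110 45"},
		},
	}
}

func main() {
	node := sampleNode()
	if title, ok := lookupAttr(node, "title"); ok {
		fmt.Println("title:", title)
	}
	if _, ok := lookupAttr(node, "id"); !ok {
		fmt.Println("no id attribute")
	}
}
```

The two-value return follows the usual Go "comma ok" convention, so callers can tell a missing attribute apart from one that is present but empty.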

func (Element) BoundingBox

func (e Element) BoundingBox() image.Rectangle

BoundingBox returns the bounding box of the element. If the element does not have a bounding box, the empty bounding box (0,0)-(0,0) is returned.
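In hOCR, the bounding box is normally stored in the title attribute as a "bbox x0 y0 x1 y1" property, with properties separated by semicolons. The following is a minimal sketch of such a lookup using only the standard library. parseBBox is a hypothetical helper, and the sketch only assumes the fallback to the empty rectangle described above; the package may parse the attribute differently.

```go
package main

import (
	"fmt"
	"image"
	"strings"
)

// parseBBox is a hypothetical helper, not part of the hocr package.
// It searches a semicolon-separated hOCR title value for a
// "bbox x0 y0 x1 y1" property and returns it as an image.Rectangle.
// If no valid bbox property is found, the empty rectangle is returned.
func parseBBox(title string) image.Rectangle {
	for _, prop := range strings.Split(title, ";") {
		prop = strings.TrimSpace(prop)
		if !strings.HasPrefix(prop, "bbox ") {
			continue
		}
		var x0, y0, x1, y1 int
		n, err := fmt.Sscanf(prop, "bbox %d %d %d %d", &x0, &y0, &x1, &y1)
		if err == nil && n == 4 {
			return image.Rect(x0, y0, x1, y1)
		}
	}
	return image.Rectangle{}
}

func main() {
	title := `image "page.png"; bbox 10 20 110 45; ppageno 0`
	fmt.Println(parseBBox(title))
	fmt.Println(parseBBox("ppageno 0").Empty())
}
```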

func (Element) Scanf

func (e Element) Scanf(attr, key, format string, args ...interface{}) bool

Scanf reads values from the attributes of an element. For example: e.Scanf("title", "image", "%s", &str)
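To make the idea concrete, here is a hedged standalone sketch of scanning one keyed property out of an hOCR title value. scanProp is a hypothetical stand-in: unlike the documented method it takes the attribute value directly instead of an attribute name, and the meaning of its boolean result (all arguments were filled) is an assumption, since the documentation does not describe the return value.

```go
package main

import (
	"fmt"
	"strings"
)

// scanProp is a hypothetical stand-in, not the package's Scanf.
// It looks for the property named key in a semicolon-separated hOCR
// title value and scans the remainder of that property with format.
// It reports whether the property was found and all args were filled.
func scanProp(title, key, format string, args ...interface{}) bool {
	for _, prop := range strings.Split(title, ";") {
		prop = strings.TrimSpace(prop)
		if !strings.HasPrefix(prop, key+" ") {
			continue
		}
		rest := strings.TrimSpace(prop[len(key)+1:])
		n, err := fmt.Sscanf(rest, format, args...)
		return err == nil && n == len(args)
	}
	return false
}

func main() {
	title := "bbox 10 20 110 45; x_wconf 93"
	var conf int
	if scanProp(title, "x_wconf", "%d", &conf) {
		fmt.Println("confidence:", conf)
	}
}
```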

type Meta

type Meta struct {
	Name, Content string
}

Meta represents /html/head/meta tags.

type Node

type Node interface{}

Node represents hOCR nodes returned by the scanner. A Node is one of the types Text, Element, Title or Meta.
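Because Node is an empty interface, callers typically dispatch on its dynamic type. The sketch below redeclares the documented types locally so that it compiles on its own; describe is a hypothetical function, and the sketch assumes the scanner hands back values rather than pointers, which the documentation does not state.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Local copies of the documented types, declared here only so the
// example is self-contained. With the real package they would be
// referenced as hocr.Node, hocr.Text and so on.
type (
	Node    interface{}
	Text    string
	Title   string
	Meta    struct{ Name, Content string }
	Element struct {
		Class string
		Node  xml.StartElement
	}
)

// describe is a hypothetical helper that dispatches on the node type.
func describe(n Node) string {
	switch t := n.(type) {
	case Text:
		return fmt.Sprintf("text %q", string(t))
	case Title:
		return fmt.Sprintf("title %q", string(t))
	case Meta:
		return fmt.Sprintf("meta %s=%q", t.Name, t.Content)
	case Element:
		return fmt.Sprintf("element %s", t.Class)
	default:
		return fmt.Sprintf("unknown %T", n)
	}
}

func main() {
	nodes := []Node{
		Title("sample"),
		Meta{Name: "ocr-system", Content: "tesseract"},
		Element{Class: "ocrx_word"},
		Text("hello"),
	}
	for _, n := range nodes {
		fmt.Println(describe(n))
	}
}
```

Keeping a default branch guards against node types that a later version of the package might add.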

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner is a low-level scanner for hOCR documents.

func NewScanner

func NewScanner(r io.Reader) *Scanner

NewScanner creates a new hocr.Scanner that reads from r.

func (*Scanner) Err

func (s *Scanner) Err() error

Err returns the last error.

func (*Scanner) Node

func (s *Scanner) Node() Node

Node returns the last scanned node.

func (*Scanner) Scan

func (s *Scanner) Scan() bool

Scan scans the next element in the document. It returns true if a new element was scanned and false if an error occurred or if there are no more nodes to be scanned.
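The Scan, Node and Err methods suggest the same loop shape as bufio.Scanner: call Scan until it returns false, then check Err. The sketch below shows that pattern with a toy scanner built on encoding/xml. toyScanner, newToyScanner and collect are hypothetical and far simpler than hocr.Scanner; in particular, treating the end of input as "no error" is an assumption borrowed from bufio.Scanner, not something the documentation confirms.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// toyScanner is a hypothetical, much-simplified stand-in for hocr.Scanner.
// It yields a string for each start element that has a class attribute
// and for each non-empty run of character data.
type toyScanner struct {
	dec  *xml.Decoder
	node string
	err  error
}

func newToyScanner(r io.Reader) *toyScanner {
	return &toyScanner{dec: xml.NewDecoder(r)}
}

// Scan advances to the next node. It returns false at the end of the
// input or on the first error; only real errors are stored for Err.
func (s *toyScanner) Scan() bool {
	for {
		tok, err := s.dec.Token()
		if err == io.EOF {
			return false // clean end of input, not an error
		}
		if err != nil {
			s.err = err
			return false
		}
		switch t := tok.(type) {
		case xml.StartElement:
			for _, a := range t.Attr {
				if a.Name.Local == "class" {
					s.node = "element:" + a.Value
					return true
				}
			}
		case xml.CharData:
			if text := strings.TrimSpace(string(t)); text != "" {
				s.node = "text:" + text
				return true
			}
		}
	}
}

func (s *toyScanner) Node() string { return s.node }
func (s *toyScanner) Err() error   { return s.err }

// collect runs the scan loop and checks the error once the loop ends.
func collect(doc string) ([]string, error) {
	s := newToyScanner(strings.NewReader(doc))
	var out []string
	for s.Scan() {
		out = append(out, s.Node())
	}
	return out, s.Err()
}

func main() {
	const doc = `<div class="ocr_page"><span class="ocr_line"><span class="ocrx_word">hello</span></span></div>`
	nodes, err := collect(doc)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, n := range nodes {
		fmt.Println(n)
	}
}
```

Checking Err only after the loop keeps the loop body free of error handling while still surfacing a failure such as truncated input.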

type Text

type Text string

Text is used to represent (non-empty) char data nodes.

type Title

type Title string

Title represents the char data nodes of /html/head/title elements.
