extractor

package

v2.1.0+incompatible (latest) | Published: Jul 14, 2018 | License: AGPL-3.0 | Imports: 8 | Imported by: 0

Repository

github.com/gtrafimenkov/ossdoc

Documentation

Overview

Package extractor provides a simple interface for quickly extracting content from PDF pages. It currently supports extracting textual content only.

Index

type Extractor
- func New(page *model.PdfPage) (*Extractor, error)
- func (e *Extractor) ExtractText() (string, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor holds the state needed to extract content from a PDF page and provides the methods for doing so.

func New

func New(page *model.PdfPage) (*Extractor, error)

New returns an Extractor instance for extracting content from the input PDF page.

func (*Extractor) ExtractText

func (e *Extractor) ExtractText() (string, error)

ExtractText processes all text data in the page's content streams and returns it as a single string. Character encodings are resolved via the CMaps in the PDF file. The text is processed linearly, i.e. in the order in which it appears, and spaces and newlines are inserted on a best-effort basis.
