extractor

package
v2.1.0+incompatible
Warning

This package is not in the latest version of its module.

Published: Jul 14, 2018 License: AGPL-3.0 Imports: 8 Imported by: 0

Documentation

Overview

Package extractor provides a simple interface for quickly extracting content from PDF files. At present it supports extraction of textual content.

Types

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor stores and offers functionality for extracting content from PDF pages.

func New

func New(page *model.PdfPage) (*Extractor, error)

New returns an Extractor instance for extracting content from the input PDF page.

func (*Extractor) ExtractText

func (e *Extractor) ExtractText() (string, error)

ExtractText processes all text data in the content streams and returns it as a single string. Character encoding is taken into account via the CMaps in the PDF file. The text is processed linearly, i.e. in the order in which it appears. Spaces and newlines are added on a best-effort basis.
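
The per-page flow described above (one Extractor per page, one ExtractText call each) can be sketched as follows. This is a minimal illustration, not code from the package: textExtractor, joinPageText, stubPage and errBadStream are hypothetical names introduced here. The sketch relies only on the ExtractText signature documented above, which means a *Extractor returned by New would satisfy the textExtractor interface. Stubs stand in for real pages so the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// textExtractor is a hypothetical interface matching the documented
// ExtractText signature; *extractor.Extractor would satisfy it.
type textExtractor interface {
	ExtractText() (string, error)
}

// joinPageText calls ExtractText on each extractor in order and
// concatenates the results, terminating each page's text with a newline.
// It stops at the first error and reports the 1-based page position.
func joinPageText(pages []textExtractor) (string, error) {
	var sb strings.Builder
	for i, p := range pages {
		text, err := p.ExtractText()
		if err != nil {
			return "", fmt.Errorf("page %d: %w", i+1, err)
		}
		sb.WriteString(text)
		sb.WriteString("\n")
	}
	return sb.String(), nil
}

// errBadStream is a hypothetical error used by the stub below.
var errBadStream = errors.New("bad content stream")

// stubPage is a stand-in for a real extractor. In real use, each element
// would instead come from extractor.New(page) for a *model.PdfPage.
type stubPage struct {
	text string
	err  error
}

func (s stubPage) ExtractText() (string, error) { return s.text, s.err }

func main() {
	pages := []textExtractor{
		stubPage{text: "Hello"},
		stubPage{text: "World"},
	}
	text, err := joinPageText(pages)
	if err != nil {
		fmt.Println("extraction failed:", err)
		return
	}
	fmt.Printf("%q\n", text)

	// A failing page aborts the run and identifies its position.
	_, err = joinPageText([]textExtractor{stubPage{err: errBadStream}})
	fmt.Println(err)
}
```

Accepting an interface rather than the concrete type keeps the joining logic testable without a PDF file. Whether to abort on the first failing page or skip it is a choice left to the caller; this sketch aborts.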
