unidoc: github.com/unidoc/unidoc/pdf/extractor

package extractor

import "github.com/unidoc/unidoc/pdf/extractor"

Package extractor is used for quickly extracting PDF content through a simple interface. It currently offers functionality for extracting textual content.


Package Files

const.go doc.go extractor.go text.go utils.go

type Extractor

type Extractor struct {
    // contains filtered or unexported fields
}

Extractor holds the content of a PDF page and offers functionality for extracting content from it.

func New

func New(page *model.PdfPage) (*Extractor, error)

New returns an Extractor instance for extracting content from the input PDF page.

func (*Extractor) ExtractText

func (e *Extractor) ExtractText() (string, error)

ExtractText processes and extracts all text data in the page's content streams and returns it as a string. Character encoding is taken into account via the CMaps in the PDF file. The text is processed linearly, i.e. in the order in which it appears in the content streams. Spaces and newlines are added on a best-effort basis.
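The best-effort spacing can be illustrated with a small, self-contained sketch. It is not unidoc's implementation: the `fragment` type, the `joinFragments` helper and both thresholds are hypothetical, chosen only to show how a single linear pass over positioned text can infer spaces from horizontal gaps and newlines from vertical jumps.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// fragment is a hypothetical piece of decoded text together with the
// position where it was drawn, in PDF user-space units.
type fragment struct {
	X, Y  float64 // start position of the text
	Width float64 // advance width of the text
	Text  string
}

// joinFragments concatenates fragments in the order given (linear order).
// It inserts "\n" when the vertical position changes by more than lineTol,
// otherwise " " when the horizontal gap after the previous fragment exceeds
// spaceGap; adjacent fragments on the same line are joined directly.
func joinFragments(frags []fragment, spaceGap, lineTol float64) string {
	var b strings.Builder
	for i, f := range frags {
		if i > 0 {
			prev := frags[i-1]
			switch {
			case math.Abs(f.Y-prev.Y) > lineTol:
				b.WriteString("\n")
			case f.X-(prev.X+prev.Width) > spaceGap:
				b.WriteString(" ")
			}
		}
		b.WriteString(f.Text)
	}
	return b.String()
}

func main() {
	frags := []fragment{
		{X: 72, Y: 700, Width: 25, Text: "Hello"},
		{X: 100, Y: 700, Width: 25, Text: "world"}, // gap of 3 units: space
		{X: 72, Y: 686, Width: 20, Text: "Next"},   // Y drops by 14: newline
	}
	fmt.Printf("%q\n", joinFragments(frags, 1.5, 2.0)) // prints "Hello world\nNext"
}
```

Because the pass follows drawing order rather than visual layout, content such as multi-column text comes out in the order it was drawn, consistent with the linear processing described for ExtractText.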

Package extractor imports 9 packages. Updated 2018-03-22.