pdftotext

package pdftotext

Version: v1.0.3

Warning: This package is not in the latest version of its module.

Published: Dec 11, 2023 | License: MIT | Imports: 4 | Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty. The constructors Open, OpenReader, and OpenURL are documented under type PdfParser.

Types

type PdfParser

type PdfParser struct {
	// contains filtered or unexported fields
}

PdfParser is a wrapper around the go-fitz library.

func Open

func Open(path string) (*PdfParser, error)

Open creates a new PdfParser and opens a PDF file at the specified path.

Parameters:

  • path: The path of the PDF file to open.

Returns:

  • *PdfParser: A pointer to the new PdfParser if the file was opened successfully.
  • error: A non-nil error if the file could not be opened.

func OpenReader

func OpenReader(r io.Reader) (*PdfParser, error)

OpenReader creates a new PdfParser from an io.Reader.

Parameters:

  • r: The io.Reader that supplies the PDF data.

Returns:

  • *PdfParser: The created PdfParser.
  • error: Any error that occurred during the creation of the PdfParser.

func OpenURL

func OpenURL(u string) (*PdfParser, int, error)

OpenURL creates a new PdfParser from the PDF document fetched from the specified URL.

Parameters:

  • u: The URL of the PDF document.

Returns:

  • *PdfParser: A pointer to the created PdfParser.
  • int: The HTTP status code of the response.
  • error: Any error that occurred while fetching or opening the document.

func (*PdfParser) Close

func (pp *PdfParser) Close() error

Close closes the PDF document opened by the PdfParser.

func (*PdfParser) ExtractPageTexts

func (pp *PdfParser) ExtractPageTexts(pages ...int) (string, error)

ExtractPageTexts extracts the text from the specified pages of the PDF document. Page numbers are zero-based: the first page is 0.

Parameters:

  • pages: The zero-based numbers of the pages to extract text from (variadic).

Returns:

  • A string containing the text content of the specified pages.
  • An error if any error occurs during the extraction process.
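Because page numbers are zero-based, valid indices run from 0 to NumPages()-1, while most PDF viewers number pages from 1. The sketch below shows one way a caller might convert and check page numbers before calling ExtractPageTexts. Both helpers are hypothetical, and this documentation does not state how ExtractPageTexts itself handles out-of-range pages.

```go
package main

import "fmt"

// checkPages is a hypothetical helper (not part of pdftotext). It returns an
// error if any zero-based page index falls outside [0, numPages).
func checkPages(numPages int, pages ...int) error {
	for _, p := range pages {
		if p < 0 || p >= numPages {
			return fmt.Errorf("page %d out of range [0, %d)", p, numPages)
		}
	}
	return nil
}

// toZeroBased is a hypothetical helper that converts one-based page numbers
// (as shown in most PDF viewers) to zero-based indices.
func toZeroBased(pages ...int) []int {
	out := make([]int, len(pages))
	for i, p := range pages {
		out[i] = p - 1
	}
	return out
}

func main() {
	pages := toZeroBased(1, 3)           // first and third pages
	fmt.Println(pages)                   // [0 2]
	fmt.Println(checkPages(3, pages...)) // <nil>
	fmt.Println(checkPages(3, 3))        // page 3 out of range [0, 3)
}
```

With the real package, the checked slice would then be passed as pp.ExtractPageTexts(pages...), using pp.NumPages() as the page count.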

func (*PdfParser) ExtractTexts

func (pp *PdfParser) ExtractTexts() (string, error)

ExtractTexts extracts the text from all pages of a PDF document.

Parameters:

  • None

Returns:

  • A string containing the text content of all pages, separated by the page separator (see SetPageSep).
  • An error if any error occurs during the extraction process.
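The exact placement of the separator in ExtractTexts output (for example, whether it is surrounded by newlines) is not documented. The following is a minimal sketch, assuming the separator sits on its own line between consecutive pages; joinPages and splitPages are hypothetical helpers that illustrate the idea and show how a caller might recover per-page text.

```go
package main

import (
	"fmt"
	"strings"
)

// joinPages is a hypothetical helper (not part of pdftotext) that combines
// per-page text into one string. Assumption: the separator sits on its own
// line between pages; the real ExtractTexts may lay it out differently.
func joinPages(pageTexts []string, sep string) string {
	return strings.Join(pageTexts, "\n"+sep+"\n")
}

// splitPages reverses joinPages. It is only reliable if the separator line
// never appears inside the page text itself.
func splitPages(all, sep string) []string {
	return strings.Split(all, "\n"+sep+"\n")
}

func main() {
	out := joinPages([]string{"page one", "page two"}, "---")
	fmt.Println(out)
	fmt.Println(len(splitPages(out, "---"))) // 2
}
```

Splitting is only as reliable as the separator is unique, which is one reason to choose a separator that is unlikely to occur in the document text.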

func (*PdfParser) NumPages

func (pp *PdfParser) NumPages() int

NumPages returns the number of pages in the PDF.
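To process a document page by page, a caller can loop from 0 to NumPages()-1 and pass each index to ExtractPageTexts. The sketch below codes against a small interface that matches the method signatures documented on this page, so it runs without a PDF file. pageExtractor and fakeDoc are hypothetical; since *PdfParser declares these three methods with the same signatures, it should satisfy the interface.

```go
package main

import (
	"fmt"
	"strings"
)

// pageExtractor is a hypothetical interface listing the subset of
// *PdfParser's documented methods used in this example.
type pageExtractor interface {
	NumPages() int
	ExtractPageTexts(pages ...int) (string, error)
	Close() error
}

// fakeDoc is an in-memory stand-in used only so this example can run.
type fakeDoc struct{ pages []string }

func (d *fakeDoc) NumPages() int { return len(d.pages) }

func (d *fakeDoc) ExtractPageTexts(pages ...int) (string, error) {
	var b strings.Builder
	for _, p := range pages {
		if p < 0 || p >= len(d.pages) {
			return "", fmt.Errorf("page %d out of range", p)
		}
		b.WriteString(d.pages[p])
	}
	return b.String(), nil
}

func (d *fakeDoc) Close() error { return nil }

// perPageTexts extracts each page separately, using zero-based indices.
func perPageTexts(doc pageExtractor) ([]string, error) {
	texts := make([]string, 0, doc.NumPages())
	for i := 0; i < doc.NumPages(); i++ {
		t, err := doc.ExtractPageTexts(i)
		if err != nil {
			return nil, fmt.Errorf("page %d: %w", i, err)
		}
		texts = append(texts, t)
	}
	return texts, nil
}

func main() {
	var doc pageExtractor = &fakeDoc{pages: []string{"intro", "body"}}
	defer doc.Close()

	texts, err := perPageTexts(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, t := range texts {
		fmt.Printf("page %d: %s\n", i, t)
	}
}
```

With the real package, doc would come from Open, OpenReader, or OpenURL, and the deferred Close belongs after the constructor's error check so it is never called on a nil parser.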

func (*PdfParser) SetPageSep

func (pp *PdfParser) SetPageSep(sep string)

SetPageSep sets the separator that ExtractTexts places between the text of consecutive pages. The default separator is a run of 100 hyphens ("-" repeated 100 times).
