pdftotext

package

Version: v1.0.3 (latest) | Published: Dec 11, 2023 | License: MIT | Imports: 4 | Imported by: 0


Repository

github.com/young2j/oxmltotext

Documentation ¶

Index ¶

type PdfParser

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type PdfParser ¶

type PdfParser struct {
	// contains filtered or unexported fields
}

PdfParser is a wrapper around the go-fitz library.

func Open ¶

func Open(path string) (*PdfParser, error)

Open creates a new PdfParser and opens a PDF file at the specified path.

Parameters:

path: The path of the PDF file to open.

Returns:

*PdfParser: A pointer to the PdfParser object if the file was opened successfully.
error: An error object if there was an error opening the file.

func OpenReader ¶

func OpenReader(r io.Reader) (*PdfParser, error)

OpenReader creates a new PdfParser from an io.Reader.

Parameters:

r: The io.Reader from which to create the PdfParser.

Returns:

*PdfParser: The created PdfParser.
error: Any error that occurred during the creation of the PdfParser.

func OpenURL ¶

func OpenURL(u string) (*PdfParser, int, error)

OpenURL creates a new PdfParser by reading the specified URL.

Parameters:

u: The URL of the PDF file to open.

Returns:

*PdfParser: A pointer to the PdfParser for the document read from the URL.
int: The HTTP status code of the URL response.
error: Any error that occurred during the process.
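
The description above does not say whether a non-2xx status code is also reported through the error value, so a cautious caller may want to check both. The following is a minimal sketch of such a check, assuming nothing about the library beyond the documented return types. The helper `checkOpenURLResult` and the `errDemo` value are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errDemo is a hypothetical error used only to exercise the helper below.
var errDemo = errors.New("connection refused")

// checkOpenURLResult is a hypothetical helper that folds the status code and
// error returned by a call such as OpenURL into a single error value.
// It treats anything other than 200 OK as a failure, which is an assumption
// and may be stricter than a given caller needs.
func checkOpenURLResult(statusCode int, err error) error {
	if err != nil {
		return fmt.Errorf("open URL (status %d): %w", statusCode, err)
	}
	if statusCode != http.StatusOK {
		return fmt.Errorf("open URL: unexpected HTTP status %d %s",
			statusCode, http.StatusText(statusCode))
	}
	return nil
}

func main() {
	// In real code the two arguments would come from the second and third
	// return values of OpenURL.
	fmt.Println(checkOpenURLResult(http.StatusOK, nil))
	fmt.Println(checkOpenURLResult(http.StatusNotFound, nil))
	fmt.Println(checkOpenURLResult(0, errDemo))
}
```

Wrapping the original error with `%w` keeps it available to `errors.Is` and `errors.As` further up the call chain.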

func (*PdfParser) Close ¶

func (pp *PdfParser) Close() error

Close closes the PDF document held by the PdfParser.

func (*PdfParser) ExtractPageTexts ¶

func (pp *PdfParser) ExtractPageTexts(pages ...int) (string, error)

ExtractPageTexts extracts the text from the specified pages of a PDF document. Page numbers are zero-based.

Parameters:

pages: A variadic list of zero-based page numbers to extract text from.

Returns:

A string containing the text content of the specified pages.
An error if any error occurs during the extraction process.
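
Because page numbers start at 0, callers that work with human-facing page numbers (starting at 1) need to convert and range-check them first. The sketch below shows one way to do that against the documented `NumPages` and `ExtractPageTexts` signatures. The `pageTextExtractor` interface, the `extractHumanPages` helper, and the in-memory `fakeParser` are hypothetical and exist only so the example runs without a PDF file; how the fake joins page texts is arbitrary and not a statement about the library.

```go
package main

import (
	"fmt"
	"strings"
)

// pageTextExtractor lists the two documented PdfParser methods this sketch
// needs. The interface itself is hypothetical and not part of the package.
type pageTextExtractor interface {
	NumPages() int
	ExtractPageTexts(pages ...int) (string, error)
}

// extractHumanPages takes 1-based page numbers, validates them against
// NumPages, and passes the equivalent zero-based numbers to ExtractPageTexts.
func extractHumanPages(p pageTextExtractor, humanPages ...int) (string, error) {
	indices := make([]int, 0, len(humanPages))
	for _, n := range humanPages {
		if n < 1 || n > p.NumPages() {
			return "", fmt.Errorf("page %d out of range 1..%d", n, p.NumPages())
		}
		indices = append(indices, n-1)
	}
	return p.ExtractPageTexts(indices...)
}

// fakeParser is an in-memory stand-in so the sketch runs without a PDF file.
type fakeParser struct{ pages []string }

func (f fakeParser) NumPages() int { return len(f.pages) }

func (f fakeParser) ExtractPageTexts(pages ...int) (string, error) {
	parts := make([]string, 0, len(pages))
	for _, i := range pages {
		parts = append(parts, f.pages[i])
	}
	return strings.Join(parts, "\n"), nil
}

func main() {
	doc := fakeParser{pages: []string{"cover", "intro", "body"}}
	text, err := extractHumanPages(doc, 1, 3) // first and third page
	fmt.Printf("%q %v\n", text, err)
}
```

Given the method signatures listed in this documentation, a `*PdfParser` should satisfy the same interface, so the helper could be called with a parser returned by `Open`.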

func (*PdfParser) ExtractTexts ¶

func (pp *PdfParser) ExtractTexts() (string, error)

ExtractTexts extracts the text from all pages of a PDF document.

Parameters:

None

Returns:

A string containing the text content of all pages, separated by the page separator (see SetPageSep).
An error if any error occurs during the extraction process.

func (*PdfParser) NumPages ¶

func (pp *PdfParser) NumPages() int

NumPages returns the number of pages in the PDF.

func (*PdfParser) SetPageSep ¶

func (pp *PdfParser) SetPageSep(sep string)

SetPageSep sets the separator placed between page texts by the PdfParser. The default is "-" repeated 100 times.
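
Since the combined output uses a known separator, it can be split back into per-page strings. The following is a minimal sketch, assuming the separator appears exactly once between consecutive pages; whether the library adds newlines around it is not documented here, so the sketch trims whitespace from each piece. `defaultPageSep` and `splitPages` are hypothetical names, and the sample string stands in for real extracted text.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPageSep mirrors the documented default: "-" repeated 100 times.
var defaultPageSep = strings.Repeat("-", 100)

// splitPages splits combined text on sep and trims surrounding whitespace
// from each piece. It assumes sep appears between consecutive pages.
func splitPages(text, sep string) []string {
	parts := strings.Split(text, sep)
	for i, part := range parts {
		parts[i] = strings.TrimSpace(part)
	}
	return parts
}

func main() {
	// Stand-in for the string that a two-page document might produce.
	combined := "first page\n" + defaultPageSep + "\nsecond page"
	for i, page := range splitPages(combined, defaultPageSep) {
		fmt.Printf("page %d: %q\n", i, page)
	}
}
```

A run of dashes can occur in ordinary page content (for example in plain-text tables), so setting a more distinctive separator with SetPageSep before extracting may make this kind of splitting more reliable.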
