pdf

package module

Version: v0.0.8 (latest) Published: Apr 8, 2024 License: BSD-3-Clause Imports: 16 Imported by: 0

Repository

github.com/ScriptRock/pdf

README ¶

PDF Reader

A simple Go library for reading the text content of PDF files. Fork tree:

Reader.GetText returns the text content annotated with text size and weight information. Text is returned in stream order: irrespective of where it appears on the page, the returned text follows the order in which it appears in the PDF stream.

The library attempts to return text that is displayed in separate blocks in the PDF as separate paragraphs.

e.g. with tabular PDF content:

Col 1 header	Col 2 header
Text in row 1 col 1	Text in row 1 col 2
Text in row 2 col 1	Text in row 2 col 2

Reader.GetText returns content as:

Col 1 header

Col 2 header

Text in row 1 col 1

Text in row 1 col 2

Text in row 2 col 1

Text in row 2 col 2
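
The flattening shown above can be sketched in a few lines. This is only an illustration of the output shape, not the library's implementation: the `block` type and `streamOrderParagraphs` function are hypothetical, and the slice order stands in for the order of the PDF stream.

```go
package main

import (
	"fmt"
	"strings"
)

// block is a hypothetical text block: a position on the page plus its text.
type block struct {
	x, y float64
	text string
}

// streamOrderParagraphs emits the blocks in the order they are given
// (standing in for stream order), ignoring their page positions, and
// separates them with a blank line so each block reads as a paragraph.
func streamOrderParagraphs(blocks []block) string {
	parts := make([]string, 0, len(blocks))
	for _, b := range blocks {
		parts = append(parts, b.text)
	}
	return strings.Join(parts, "\n\n")
}

func main() {
	// The first two rows of the table above; coordinates are made up.
	blocks := []block{
		{x: 0, y: 100, text: "Col 1 header"},
		{x: 200, y: 100, text: "Col 2 header"},
		{x: 0, y: 80, text: "Text in row 1 col 1"},
		{x: 200, y: 80, text: "Text in row 1 col 2"},
	}
	fmt.Println(streamOrderParagraphs(blocks))
}
```

Because position is ignored, a PDF whose stream stores cells column by column would come out in column order instead.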

Documentation ¶

Overview ¶

Package pdf implements reading of PDF files.

PDF is Adobe's Portable Document Format, ubiquitous on the internet. A PDF document is a complex data format built on a fairly simple structure. This package exposes the simple structure along with some wrappers to extract basic information. If more complex information is needed, it is possible to extract that information by interpreting the structure exposed by this package.

Specifically, a PDF is a data structure built from Values, each of which has one of the following Kinds:

Null, for the null object.
Integer, for an integer.
Real, for a floating-point number.
Bool, for a boolean value.
Name, for a name constant (as in /Helvetica).
String, for a string constant.
Dict, for a dictionary of name-value pairs.
Array, for an array of values.
Stream, for an opaque data stream and associated header dictionary.

The accessors on Value—Int64, Float64, Bool, Name, and so on—return a view of the data as the given type. When there is no appropriate view, the accessor returns a zero result. For example, the Name accessor returns the empty string if called on a Value v for which v.Kind() != Name. Returning zero values this way, especially from the Dict and Array accessors, which themselves return Values, makes it possible to traverse a PDF quickly without writing any error checking. On the other hand, it means that mistakes can go unreported.
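
The zero-value accessor pattern described above can be illustrated with a small stand-in type. This is a minimal sketch under stated assumptions: the `Value` struct, its fields, and the `Key` method here are hypothetical and are not the package's own definitions. Only a few of the Kinds are modeled.

```go
package main

import "fmt"

// Kind models a few of the kinds listed above (hypothetical subset).
type Kind int

const (
	Null Kind = iota
	Integer
	Name
	Dict
)

// Value is a hypothetical stand-in, not the package's Value type.
type Value struct {
	kind Kind
	i    int64
	name string
	dict map[string]Value
}

// Kind reports which kind of data v holds.
func (v Value) Kind() Kind { return v.kind }

// Int64 returns v's integer, or 0 if v is not an Integer.
func (v Value) Int64() int64 {
	if v.kind != Integer {
		return 0
	}
	return v.i
}

// Name returns v's name, or "" if v is not a Name.
func (v Value) Name() string {
	if v.kind != Name {
		return ""
	}
	return v.name
}

// Key returns the dictionary entry for key. It returns a Null Value if v
// is not a Dict or the key is absent, so lookups can be chained.
func (v Value) Key(key string) Value {
	if v.kind != Dict {
		return Value{}
	}
	return v.dict[key]
}

func main() {
	font := Value{kind: Dict, dict: map[string]Value{
		"BaseFont": {kind: Name, name: "Helvetica"},
	}}
	page := Value{kind: Dict, dict: map[string]Value{"Font": font}}

	// A chained lookup needs no error checks along the way.
	fmt.Println(page.Key("Font").Key("BaseFont").Name())

	// A misspelled key silently yields zero values all the way down.
	fmt.Println(page.Key("Fnt").Key("BaseFont").Name() == "")
}
```

The second lookup shows the trade-off noted above: the typo produces an empty result rather than an error.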

The basic structure of the PDF file is exposed as the graph of Values.

Most richer data structures in a PDF file are dictionaries with specific interpretations of the name-value pairs. The Font and Page wrappers make the interpretation of a specific Value as the corresponding type easier. They are only helpers, though: they are implemented only in terms of the Value API and could be moved outside the package. Equally important, traversal of other PDF data structures can be implemented in other packages as needed.

Index ¶

type Page
- func (p *Page) Text() (result text.Text, err error)
type Reader
Bugs

Types ¶

type Page ¶

type Page struct {
	// contains filtered or unexported fields
}

A Page represents a single page in a PDF file. The methods interpret a Page dictionary stored in V.

func (*Page) Text ¶

func (p *Page) Text() (result text.Text, err error)

Text returns the structured text on the page.

type Reader ¶

type Reader struct {
	// contains filtered or unexported fields
}

A Reader is a single PDF file open for reading.

func NewReader ¶

func NewReader(f io.ReaderAt, size int64) (*Reader, error)

NewReader opens a file for reading, using the data in f with the given total size.
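
NewReader needs an io.ReaderAt and the total size. The sketch below shows two common ways to obtain that pair using only the standard library. The helper names `fromBytes` and `fromFile` are hypothetical, and the call to NewReader itself appears only in a comment so the example stays self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// fromBytes adapts an in-memory document to the (io.ReaderAt, size) pair.
// *bytes.Reader implements io.ReaderAt.
func fromBytes(data []byte) (io.ReaderAt, int64) {
	return bytes.NewReader(data), int64(len(data))
}

// fromFile opens path and reports its size. *os.File implements
// io.ReaderAt. The caller is responsible for closing the returned file.
func fromFile(path string) (*os.File, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, info.Size(), nil
}

func main() {
	// Placeholder bytes only; this is not a valid PDF document.
	ra, size := fromBytes([]byte("%PDF-1.7 ..."))
	fmt.Println("size:", size)

	// With the package imported, the pair would be passed on as:
	//   r, err := pdf.NewReader(ra, size)
	_ = ra
}
```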

func NewReaderEncrypted ¶

func NewReaderEncrypted(f io.ReaderAt, size int64, pw string) (*Reader, error)

NewReaderEncrypted opens a file for reading, using the data in f with the given total size. If the PDF is encrypted, NewReaderEncrypted tries pw as the password. If the file cannot be decrypted, NewReaderEncrypted returns an error.

func Open ¶

func Open(file string) (*Reader, error)

Open opens a file for reading. Reader.Close should be called when done with the Reader.

func (*Reader) Close ¶

func (r *Reader) Close() error

Close closes the underlying Reader if it is an io.Closer.
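
The "close only if it is an io.Closer" behavior is a standard Go type-assertion pattern. The sketch below shows that pattern in isolation. It is an assumption about the general technique, not a copy of the package's code, and `closeIfCloser`, `plainReader`, and `trackedReader` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
)

// closeIfCloser closes ra when it also implements io.Closer and is a
// no-op otherwise.
func closeIfCloser(ra io.ReaderAt) error {
	if c, ok := ra.(io.Closer); ok {
		return c.Close()
	}
	return nil
}

// plainReader is a hypothetical io.ReaderAt with no Close method.
type plainReader struct{}

func (plainReader) ReadAt(p []byte, off int64) (int, error) { return 0, io.EOF }

// trackedReader is a hypothetical io.ReaderAt that records being closed.
type trackedReader struct{ closed bool }

func (t *trackedReader) ReadAt(p []byte, off int64) (int, error) { return 0, io.EOF }

func (t *trackedReader) Close() error {
	t.closed = true
	return nil
}

func main() {
	// Nothing to close here, so the call succeeds without doing anything.
	fmt.Println("plain:", closeIfCloser(plainReader{}))

	tr := &trackedReader{}
	_ = closeIfCloser(tr)
	fmt.Println("tracked closed:", tr.closed)
}
```

One consequence of this pattern is that a Reader built over an in-memory reader such as *bytes.Reader has nothing to close, while one built over an *os.File does.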

func (*Reader) NPages ¶

func (r *Reader) NPages() int

NPages returns the number of pages in the PDF file.

func (*Reader) Page ¶

func (r *Reader) Page(i int) (text.Text, error)

Page returns the page for the given page number. Page numbers are indexed starting at 1, not 0. If the page is not found, Page returns an error.
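
Because page numbers start at 1, a loop over all pages runs from 1 through NPages inclusive. The sketch below shows that loop against a small generic interface that mirrors the documented NPages and Page signatures. The `pager` interface, `collectPages`, and `fakeReader` are hypothetical names introduced for illustration; the fake returns strings, whereas the documented Page method returns text.Text. Type parameters require Go 1.18 or later.

```go
package main

import "fmt"

// pager captures the two documented methods used below. T stands in for
// the per-page result type.
type pager[T any] interface {
	NPages() int
	Page(i int) (T, error)
}

// collectPages fetches every page, numbering from 1 through NPages.
func collectPages[T any](r pager[T]) ([]T, error) {
	n := r.NPages()
	pages := make([]T, 0, n)
	for i := 1; i <= n; i++ {
		p, err := r.Page(i)
		if err != nil {
			return nil, fmt.Errorf("page %d: %w", i, err)
		}
		pages = append(pages, p)
	}
	return pages, nil
}

// fakeReader is a hypothetical in-memory stand-in for a PDF reader.
type fakeReader struct{ pages []string }

func (f fakeReader) NPages() int { return len(f.pages) }

// Page uses 1-based numbering and reports an error for unknown pages.
func (f fakeReader) Page(i int) (string, error) {
	if i < 1 || i > len(f.pages) {
		return "", fmt.Errorf("page %d not found", i)
	}
	return f.pages[i-1], nil
}

func main() {
	r := fakeReader{pages: []string{"first", "second"}}
	pages, err := collectPages[string](r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, p := range pages {
		fmt.Printf("page %d: %s\n", i+1, p)
	}
}
```

Starting the loop at 0 would hit the "page not found" error on the first call.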

func (*Reader) Text ¶

func (r *Reader) Text() (text.Text, error)

Text returns a structured Text for all pages of the PDF.

Notes ¶

Bugs ¶

The library makes no attempt at efficiency. A value cache maintained in the Reader would probably help significantly.
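
One shape such a value cache could take is a memoizing map keyed by object reference. This is a sketch under stated assumptions, not something the package provides: `objRef`, `valueCache`, and `newValueCache` are hypothetical names, the key assumes PDF indirect objects are identified by object number and generation number, and the cache is not safe for concurrent use.

```go
package main

import "fmt"

// objRef identifies an indirect object by object and generation number.
type objRef struct{ id, gen uint32 }

// valueCache memoizes resolved objects. V is whatever the parsed
// representation happens to be.
type valueCache[V any] struct {
	resolve func(objRef) (V, error) // the expensive parse, supplied by the caller
	seen    map[objRef]V
}

func newValueCache[V any](resolve func(objRef) (V, error)) *valueCache[V] {
	return &valueCache[V]{resolve: resolve, seen: make(map[objRef]V)}
}

// get returns the cached value for ref, resolving it on first use.
// Failed resolutions are not cached.
func (c *valueCache[V]) get(ref objRef) (V, error) {
	if v, ok := c.seen[ref]; ok {
		return v, nil
	}
	v, err := c.resolve(ref)
	if err != nil {
		return v, err
	}
	c.seen[ref] = v
	return v, nil
}

func main() {
	parses := 0
	cache := newValueCache(func(ref objRef) (string, error) {
		parses++ // count how often the expensive path runs
		return fmt.Sprintf("object %d %d", ref.id, ref.gen), nil
	})
	for i := 0; i < 3; i++ {
		cache.get(objRef{id: 12})
	}
	fmt.Println("resolve calls:", parses)
}
```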

Directories ¶

Path	Synopsis
internal
decrypter
encoding
state
types
text
