Documentation ¶
Overview ¶
Package pdf implements reading of PDF files.
PDF is Adobe's Portable Document Format, ubiquitous on the internet. A PDF document is a complex data format built on a fairly simple structure. This package exposes the simple structure along with some wrappers to extract basic information. If more complex information is needed, it is possible to extract that information by interpreting the structure exposed by this package.
Specifically, a PDF is a data structure built from Values, each of which has one of the following Kinds:
Null, for the null object.
Integer, for an integer.
Real, for a floating-point number.
Bool, for a boolean value.
Name, for a name constant (as in /Helvetica).
String, for a string constant.
Dict, for a dictionary of name-value pairs.
Array, for an array of values.
Stream, for an opaque data stream and associated header dictionary.
The accessors on Value—Int64, Float64, Bool, Name, and so on—return a view of the data as the given type. When there is no appropriate view, the accessor returns a zero result. For example, the Name accessor returns the empty string if called on a Value v for which v.Kind() != Name. Returning zero values this way, especially from the Dict and Array accessors, which themselves return Values, makes it possible to traverse a PDF quickly without writing any error checking. On the other hand, it means that mistakes can go unreported.
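The zero-result convention described above can be illustrated with a small self-contained sketch. The Value type below is a hypothetical miniature written only for this example, with three kinds and an assumed dictionary accessor named Key; it is not the package's implementation. It shows how chained accessors need no error checks, and how a misspelled key silently produces an empty result.

```go
package main

import "fmt"

// Kind is an illustrative subset of the kinds listed above.
type Kind int

const (
	Null Kind = iota
	Name
	Dict
)

// Value is a hypothetical stand-in for the package's Value type.
type Value struct {
	kind Kind
	name string
	dict map[string]Value
}

// Kind reports which kind of data v holds.
func (v Value) Kind() Kind { return v.kind }

// Name returns the name, or "" if v is not a Name.
func (v Value) Name() string {
	if v.kind != Name {
		return ""
	}
	return v.name
}

// Key (an assumed accessor name) returns the entry for key, or the zero
// Value, whose kind is Null, if v is not a Dict or the key is missing.
func (v Value) Key(key string) Value {
	if v.kind != Dict {
		return Value{}
	}
	return v.dict[key]
}

func main() {
	font := Value{kind: Dict, dict: map[string]Value{
		"BaseFont": {kind: Name, name: "Helvetica"},
	}}
	page := Value{kind: Dict, dict: map[string]Value{"Font": font}}

	// Chained traversal with no error checking.
	fmt.Println(page.Key("Font").Key("BaseFont").Name())

	// A misspelled key is not reported; the chain just yields "".
	fmt.Printf("%q\n", page.Key("Fnt").Key("BaseFont").Name())
}
```

When a silent empty result would be ambiguous, checking Kind() at the point of interest is one way to distinguish "missing" from "present but empty".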
The basic structure of the PDF file is exposed as the graph of Values.
Most richer data structures in a PDF file are dictionaries with specific interpretations of the name-value pairs. The Font and Page wrappers make the interpretation of a specific Value as the corresponding type easier. They are only helpers, though: they are implemented only in terms of the Value API and could be moved outside the package. Equally important, traversal of other PDF data structures can be implemented in other packages as needed.
Types ¶
type Page ¶
type Page struct {
// contains filtered or unexported fields
}
A Page represents a single page in a PDF file. The methods interpret a Page dictionary stored in V.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
A Reader is a single PDF file open for reading.
func NewReaderEncrypted ¶
NewReaderEncrypted opens a file for reading, using the data in f with the given total size. If the PDF is encrypted, NewReaderEncrypted calls pw repeatedly to obtain passwords to try. If pw returns the empty string, NewReaderEncrypted stops trying to decrypt the file and returns an error.
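The signature is not shown in this section, so the following is only a sketch under the assumption that pw is a callback of type func() string. The helper passwordSource is a hypothetical name introduced here. It builds a callback that hands out each candidate password once and then returns the empty string, which, per the description above, is the signal to stop trying.

```go
package main

import "fmt"

// passwordSource returns a callback that yields each candidate once, in
// order, and then "" to signal that there is nothing left to try.
// Assumption: the pw parameter has type func() string.
func passwordSource(candidates []string) func() string {
	i := 0
	return func() string {
		if i >= len(candidates) {
			return ""
		}
		p := candidates[i]
		i++
		return p
	}
}

func main() {
	pw := passwordSource([]string{"first-guess", "second-guess"})

	// Simulate a caller that keeps asking until it receives "".
	for p := pw(); p != ""; p = pw() {
		fmt.Println("trying", p)
	}
}
```

Because the empty string doubles as the stop signal, an empty candidate in the list would end the attempts early. A callback that prompts the user interactively would follow the same shape.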
Notes ¶
Bugs ¶
The library makes no attempt at efficiency. A value cache maintained in the Reader would probably help significantly.
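As a rough illustration of what such a value cache might look like, the sketch below memoizes an expensive load by object number and generation, which is how PDF identifies indirect objects. The names objKey, cache, and load are hypothetical and the cached value is a plain string for brevity; nothing here reflects the Reader's actual internals.

```go
package main

import "fmt"

// objKey identifies an indirect object by object number and generation.
type objKey struct {
	id  uint32
	gen uint16
}

// cache memoizes the result of an expensive load, one entry per object.
type cache struct {
	load  func(objKey) string // hypothetical stand-in for parsing an object
	items map[objKey]string
}

func newCache(load func(objKey) string) *cache {
	return &cache{load: load, items: make(map[objKey]string)}
}

// get returns the cached value, calling load only on the first request.
func (c *cache) get(k objKey) string {
	if v, ok := c.items[k]; ok {
		return v
	}
	v := c.load(k)
	c.items[k] = v
	return v
}

func main() {
	loads := 0
	c := newCache(func(k objKey) string {
		loads++
		return fmt.Sprintf("object %d %d", k.id, k.gen)
	})

	c.get(objKey{1, 0})
	c.get(objKey{1, 0}) // served from the cache
	fmt.Println("loads:", loads)
}
```

This version is not safe for concurrent use; a shared cache would need a mutex or similar protection.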