naveoss_pdf

v1.0.4  Published: Nov 14, 2018  License: MIT  Imports: 13  Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

gitlab.com/naveoss-go/naveoss-pdf

README

naveoss-pdf

This is a Golang PDF library for obtaining metadata (Author, Title) from a PDF.

Supported PDF Versions:

1.2
1.3
1.4
1.5
1.6
1.7

Not Supported:

ObjStm (object stream) objects
Hybrid-reference files (a pre-1.5 xref table combined with an xref stream)
Embedded exif metadata

Why?

I found that most Go PDF libraries (and those in other languages) only read exif streams. That was not useful for most of the PDF files in my library: often those libraries picked an exif entry older than the latest one, or the PDF had no exif data at all.

Current shortcomings

Incremental updates that change only some fields (for example, Author is updated but Title is not) require the Info entries of the current and Prev trailer/xref sections to be merged; this merge is not currently performed.
If the Info entry references an object stored inside an ObjStm, it cannot be located (support is in development).
Trailer.Prev resolution is not available, so if Info is only defined in a previous trailer/xref section (pre-1.5 files and older incremental updates), it cannot be located.
exif streams are not supported and likely never will be. Use another library if the PDF's only metadata is exif.
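The merge described in the first shortcoming could work roughly as follows: walk the trailer/xref sections from newest to oldest through Prev, let a value from a newer section win, and let older sections only fill keys that are still missing. This is a minimal sketch, not the library's code; the `section` type and `mergeInfo` function are hypothetical, and each section's Info dictionary is assumed to be already resolved into a map of decoded strings.

```go
package main

import "fmt"

// section is a hypothetical, already-parsed trailer/xref section.
// Info holds the resolved Info dictionary (nil if the section has none);
// Prev points to the previous (older) section, or nil.
type section struct {
	Info map[string]string
	Prev *section
}

// mergeInfo walks the Prev chain from the newest section to the oldest.
// A key set in a newer section takes precedence; older sections only
// contribute keys that no newer section defined.
func mergeInfo(newest *section) map[string]string {
	merged := make(map[string]string)
	seen := make(map[*section]bool) // guards against a cyclic Prev chain
	for s := newest; s != nil && !seen[s]; s = s.Prev {
		seen[s] = true
		for k, v := range s.Info {
			if _, ok := merged[k]; !ok {
				merged[k] = v
			}
		}
	}
	return merged
}

func main() {
	original := &section{Info: map[string]string{"Author": "A. Writer", "Title": "Draft"}}
	update := &section{Info: map[string]string{"Author": "B. Editor"}, Prev: original}
	meta := mergeInfo(update)
	fmt.Println(meta["Author"], "/", meta["Title"]) // B. Editor / Draft
}
```

Walking newest-first means the first value seen for a key is always the most recent one, so no timestamps or version numbers are needed.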

Documentation

Index

Constants
Variables
func Decompress(stream []byte) ([]byte, error)
func PdfPredictor12Decode(ba []byte, size int) ([]byte, error)
type ObjectStreamScanner
type Pdf
- func ParsePDF(filename string) (*Pdf, error)
- func (pdf *Pdf) GetMeta() (PdfMeta, error)
type PdfMeta
type Scanner
type XRefI

Constants

const (
	DecodeParms  = "DecodeParms"
	Author       = "Author"
	CreationDate = "CreationDate"
	ModDate      = "ModDate"
	Title        = "Title"
	Columns      = "Columns"
	Predictor    = "Predictor"
	Filter       = "Filter"
	FlateDecode  = "FlateDecode"
	Info         = "Info"
	Length       = "Length"
	Root         = "Root"
	Size         = "Size"
	Type         = "Type"
	XRef         = "XRef"
	W            = "W"
	Prev         = "Prev"
	Index        = "Index"
)
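Several of these constants (Type with value XRef, W, Index, Size, Prev) name keys of a PDF 1.5 cross-reference stream dictionary. In such a stream, W gives the byte widths of the three fields of every entry, each field being a big-endian integer. The sketch below decodes one already-decompressed row following the PDF specification; `xrefEntry`, `readBE`, and `decodeXRefRow` are hypothetical names, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// xrefEntry is a hypothetical decoded cross-reference stream entry.
type xrefEntry struct {
	Type   int   // 0 = free, 1 = uncompressed object, 2 = object inside an ObjStm
	Field2 int64 // type 1: byte offset; type 2: object number of the ObjStm
	Field3 int64 // type 1: generation number; type 2: index within the ObjStm
}

// readBE reads a big-endian unsigned integer spanning all of b.
func readBE(b []byte) int64 {
	var v int64
	for _, c := range b {
		v = v<<8 | int64(c)
	}
	return v
}

// decodeXRefRow decodes one row of a decompressed xref stream using the
// field widths from the stream's W array.
func decodeXRefRow(row []byte, w [3]int) (xrefEntry, error) {
	for _, n := range w {
		if n < 0 {
			return xrefEntry{}, errors.New("negative width in W")
		}
	}
	if len(row) != w[0]+w[1]+w[2] {
		return xrefEntry{}, errors.New("row length does not match W")
	}
	e := xrefEntry{Type: 1} // the type defaults to 1 when W[0] is 0
	if w[0] > 0 {
		e.Type = int(readBE(row[:w[0]]))
	}
	e.Field2 = readBE(row[w[0] : w[0]+w[1]])
	e.Field3 = readBE(row[w[0]+w[1]:]) // zero width yields 0
	return e, nil
}

func main() {
	// W = [1 2 1]: type 1, offset 0x01F4 (500), generation 0.
	e, err := decodeXRefRow([]byte{0x01, 0x01, 0xF4, 0x00}, [3]int{1, 2, 1})
	fmt.Println(e, err) // {1 500 0} <nil>
}
```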

Variables

var (
	PdfErrorNotParsed                   = errors.New("Pdf has not been parsed yet")
	PdfErrorNotFound                    = errors.New("Not Found")
	PdfErrorInfoNotFound                = errors.New("Info Not Found")
	PdfErrorXRefNotFound                = errors.New("XRef Not Found")
	PdfErrorFailedToResolveRef          = errors.New("Failed to resolve a PdfRef to an OBJ")
	PdfErrorFailedToFindIndexForXRef    = errors.New("Failed to find index for XRef & object is not in xref table")
	PdfErrorInvalidXRefIndex            = errors.New("XRef Index was not formatted correctly")
	PdfErrorCompressObjectsNotSupported = errors.New("Compressed Objects are not currently supported")
	PdfErrorFailedToParseTrailer        = errors.New("Failed to parse Trailer")
	PdfErrorInvalidTrailerObjectType    = errors.New("Expected *pdfDictionary for trailer")
	PdfErrorUnknownMode                 = errors.New("Mode passed to parseObject is not a known mode")
	PdfParseErrorUnclosedObject         = errors.New("Never encrountered the closing token of an object")
	PdfParseErrorInvalidNameObject      = errors.New("Invalid name object")
	PdfParseErrorEncounteredWhitespace  = errors.New("Encountered whitespace that was suppose to be skipped. File Bug!")
)

var (
	Debug bool
)

Functions

func Decompress

func Decompress(stream []byte) ([]byte, error)

Decompress returns an expanded byte slice for the given PDF stream. It selects the decompression method automatically, so it should succeed regardless of which type of compression is used. It returns an error if decompression fails.

func PdfPredictor12Decode

func PdfPredictor12Decode(ba []byte, size int) ([]byte, error)

PdfPredictor12Decode decodes data encoded with the PNG Up predictor (PDF Predictor 12). In PDF each row is size + 1 bytes long, because the first byte of each row records the PNG filter type used for that row; in principle every row could use a different filter. At the moment this decoder only supports filter type 2 (Up). The filter-type byte is stripped from each row and discarded during decoding.

Types

type ObjectStreamScanner

type ObjectStreamScanner struct {
	// contains filtered or unexported fields
}

type Pdf

type Pdf struct {
	// contains filtered or unexported fields
}

func ParsePDF

func ParsePDF(filename string) (*Pdf, error)

func (*Pdf) GetMeta

func (pdf *Pdf) GetMeta() (PdfMeta, error)

type PdfMeta

type PdfMeta struct {
	Title        string
	Author       string
	CreationDate time.Time
}

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner represents a lexical scanner.
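One job of a PDF lexical scanner is reading name objects such as /Title: a slash followed by regular characters up to the next whitespace or delimiter, with #xx hex escapes (PDF 1.2 and later). A minimal sketch of that one step, using the whitespace and delimiter sets from the PDF specification; `scanName`, `isWhitespace`, and `isDelimiter` are hypothetical names, not this package's Scanner API.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// isWhitespace reports PDF white-space characters (NUL, HT, LF, FF, CR, SP).
func isWhitespace(c byte) bool {
	switch c {
	case 0x00, '\t', '\n', '\f', '\r', ' ':
		return true
	}
	return false
}

// isDelimiter reports PDF delimiter characters.
func isDelimiter(c byte) bool {
	switch c {
	case '(', ')', '<', '>', '[', ']', '{', '}', '/', '%':
		return true
	}
	return false
}

// scanName reads a name object starting at src[pos], which must be '/'.
// It returns the decoded name (without the slash) and the position just after it.
func scanName(src []byte, pos int) (string, int, error) {
	if pos >= len(src) || src[pos] != '/' {
		return "", pos, errors.New("invalid name object")
	}
	pos++
	var name []byte
	for pos < len(src) && !isWhitespace(src[pos]) && !isDelimiter(src[pos]) {
		if src[pos] == '#' {
			if pos+3 > len(src) {
				return "", pos, errors.New("truncated #xx escape in name")
			}
			v, err := strconv.ParseUint(string(src[pos+1:pos+3]), 16, 8)
			if err != nil {
				return "", pos, errors.New("invalid #xx escape in name")
			}
			name = append(name, byte(v))
			pos += 3
			continue
		}
		name = append(name, src[pos])
		pos++
	}
	return string(name), pos, nil
}

func main() {
	name, next, err := scanName([]byte("/Title#20Page (Example)"), 0)
	fmt.Printf("%q %d %v\n", name, next, err) // "Title Page" 13 <nil>
}
```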

type XRefI

type XRefI interface {
	// contains filtered or unexported methods
}

Directories

Path	Synopsis
app
tools
