naveoss_pdf

Version: v1.0.4
Published: Nov 14, 2018 License: MIT Imports: 13 Imported by: 0

README

naveoss-pdf

This is a Go library for reading metadata (Author, Title, CreationDate) from a PDF.

Supported PDF Versions:
  • 1.2
  • 1.3
  • 1.4
  • 1.5
  • 1.6
  • 1.7
Not Supported:
  • ObjStm objects
  • Hybrid Pre 1.5 xref + xref object stream
  • exif embedded meta
Why?

I found that most PDF libraries, in Go and in other languages, only support exif streams. That was not useful for most of the PDF files in my library: those libraries would often pick an exif entry older than the most recent one, or the PDF had no exif data at all.

Current shortcomings
  1. Partial updates are not merged. When an update changes Author but not Title, the current and Prev trailer/xref objects need to be merged, and currently they are not.
  2. If the Info object is a reference to an object stored in an ObjStm, it cannot be located (in development).
  3. Trailer.Prev resolution is not available, so if Info is located in a previous trailer/xref section (pre-1.5 files and older updates), Info cannot be located.
  4. exif streams are not supported and likely never will be. Use another library if the only metadata in the PDF is exif.
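
To illustrate the merge that item 1 describes as missing, here is a minimal sketch of one way entries from the current section and its Prev sections could be combined, with newer values taking precedence. This is an assumption about the intended behaviour, not the library's code: mergeSections is a hypothetical helper, and entries are simplified to string maps.

```go
package main

import "fmt"

// mergeSections is a hypothetical helper (not part of naveoss_pdf).
// It takes dictionaries ordered newest first and keeps, for each key,
// the value from the newest section that defines it.
func mergeSections(sections ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, section := range sections {
		for key, value := range section {
			if _, seen := merged[key]; !seen {
				merged[key] = value
			}
		}
	}
	return merged
}

func main() {
	// The update rewrote Author only; Title lives in the previous section.
	current := map[string]string{"Author": "New Author"}
	previous := map[string]string{"Author": "Old Author", "Title": "Original Title"}

	merged := mergeSections(current, previous)
	fmt.Println(merged["Author"], "/", merged["Title"])
}
```

Ordering the sections newest first means a key that an update left untouched simply falls through to the older section.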

Documentation

Constants

const (
	DecodeParms  = "DecodeParms"
	Author       = "Author"
	CreationDate = "CreationDate"
	ModDate      = "ModDate"
	Title        = "Title"
	Columns      = "Columns"
	Predictor    = "Predictor"
	Filter       = "Filter"
	FlateDecode  = "FlateDecode"
	Info         = "Info"
	Length       = "Length"
	Root         = "Root"
	Size         = "Size"
	Type         = "Type"
	XRef         = "XRef"
	W            = "W"
	Prev         = "Prev"
	Index        = "Index"
)

Variables

var (
	PdfErrorNotParsed                   = errors.New("Pdf has not been parsed yet")
	PdfErrorNotFound                    = errors.New("Not Found")
	PdfErrorInfoNotFound                = errors.New("Info Not Found")
	PdfErrorXRefNotFound                = errors.New("XRef Not Found")
	PdfErrorFailedToResolveRef          = errors.New("Failed to resolve a PdfRef to an OBJ")
	PdfErrorFailedToFindIndexForXRef    = errors.New("Failed to find index for XRef & object is not in xref table")
	PdfErrorInvalidXRefIndex            = errors.New("XRef Index was not formatted correctly")
	PdfErrorCompressObjectsNotSupported = errors.New("Compressed Objects are not currently supported")
	PdfErrorFailedToParseTrailer        = errors.New("Failed to parse Trailer")
	PdfErrorInvalidTrailerObjectType    = errors.New("Expected *pdfDictionary for trailer")
	PdfErrorUnknownMode                 = errors.New("Mode passed to parseObject is not a known mode")
	PdfParseErrorUnclosedObject         = errors.New("Never encrountered the closing token of an object")
	PdfParseErrorInvalidNameObject      = errors.New("Invalid name object")
	PdfParseErrorEncounteredWhitespace  = errors.New("Encountered whitespace that was suppose to be skipped. File Bug!")
)
var (
	Debug bool
)

Functions

func Decompress

func Decompress(stream []byte) ([]byte, error)

Decompress returns the expanded bytes of the given PDF stream. It selects the decompression method automatically, so it should succeed regardless of which type of compression was used. It returns an error if decompression fails.

func PdfPredictor12Decode

func PdfPredictor12Decode(ba []byte, size int) ([]byte, error)

PdfPredictor12Decode decodes data encoded with the PNG Up predictor. In PDF, each line is size + 1 bytes long because the first byte of each line is the type of predictor used, so that each line can use a different predictor. At the moment this decoder only handles type 2. The type byte is stripped from the byte array and discarded during the computation.

Types

type ObjectStreamScanner

type ObjectStreamScanner struct {
	// contains filtered or unexported fields
}

type Pdf

type Pdf struct {
	// contains filtered or unexported fields
}

func ParsePDF

func ParsePDF(filename string) (*Pdf, error)

func (*Pdf) GetMeta

func (pdf *Pdf) GetMeta() (PdfMeta, error)

type PdfMeta

type PdfMeta struct {
	Title        string
	Author       string
	CreationDate time.Time
}

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner represents a lexical scanner.

type XRefI

type XRefI interface {
	// contains filtered or unexported methods
}
