pdflib

v0.0.0-...-99d8987 Published: Aug 18, 2017 License: MIT Imports: 18 Imported by: 0

Repository

github.com/EndFirstCorp/pdflib

README ¶

pdflib: a golang pdf processor

Package pdflib is a simple PDF processing library written in Go. It provides both an API and a command line tool. All versions up to PDF 1.7 (ISO-32000) are supported.

Motivation

The main goal is to reduce the size of large PDF files for mass mailings by optimizing them down to the bare minimum. This is achieved by analyzing a PDF's cross reference table, removing redundant embedded resources such as font files or images, and always writing the file back with maximum PDF compression.

I also wanted to have my own swiss army knife for PDFs written entirely in Go that allows me to trim, split and merge PDF content.

Features

Validate (validates PDF files up to version 1.7)
Read (builds xref table from PDF file)
Write (writes xref table to PDF file)
Optimize (gets rid of redundancies like duplicate fonts, images)
Split (split a multi page PDF file into single page PDF files)
Merge (a set of PDF files into one consolidated PDF file)
Trim (generate a custom version of a PDF file)
Extract Images (extract all embedded images of a PDF file into a given dir)
Extract Fonts (extract all embedded fonts of a PDF file into a given dir)
Extract Pages (extract specific pages into a given dir)
Extract Content (extract the PDF-Source into given dir)
Extract Text (extract the text of the PDF to an io.Reader)

Installation

go get github.com/hhrutter/pdflib/cmd/...

Usage

pdflib is a tool for PDF manipulation written in Go.

Usage:

pdflib command [arguments]

The commands are:

validate	validate PDF against PDF 32000-1:2008 (PDF 1.7)
optimize	optimize PDF by getting rid of redundant page resources
split		split multi-page PDF into several single-page PDFs
merge		concatenate 2 or more PDFs
extract		extract images, fonts, content, pages out of a PDF
trim		create trimmed version of a PDF
version		print pdflib version

Single-letter Unix-style abbreviations are supported for commands and flags.

Use "pdflib help [command]" for more information about a command.

pdflib validate [-verbose] [-mode strict|relaxed] inFile
pdflib optimize [-verbose] [-stats csvFile] inFile [outFile]
pdflib split [-verbose] inFile outDir
pdflib merge [-verbose] outFile inFile1 inFile2 ...
pdflib extract [-verbose] -mode image|font|content|page [-pages pageSelection] inFile outDir
pdflib trim [-verbose] -pages pageSelection inFile outFile
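
The synopsis for the validate command can be modelled with the standard library's flag package. The following is a minimal sketch, not the code of the pdflib command line tool: the type validateArgs, the function parseValidate and the default mode "strict" are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// validateArgs is a hypothetical holder for the arguments of
// "pdflib validate [-verbose] [-mode strict|relaxed] inFile".
type validateArgs struct {
	verbose bool
	mode    string
	inFile  string
}

// parseValidate parses the arguments that follow the "validate" command word.
func parseValidate(args []string) (validateArgs, error) {
	var v validateArgs
	fs := flag.NewFlagSet("validate", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead of printed
	fs.BoolVar(&v.verbose, "verbose", false, "turn on logging")
	// Assumption: "strict" as the default is a guess, the synopsis does not state it.
	fs.StringVar(&v.mode, "mode", "strict", "validation mode: strict|relaxed")
	if err := fs.Parse(args); err != nil {
		return v, err
	}
	if v.mode != "strict" && v.mode != "relaxed" {
		return v, fmt.Errorf("invalid mode %q", v.mode)
	}
	if fs.NArg() != 1 {
		return v, errors.New("expected exactly one inFile")
	}
	v.inFile = fs.Arg(0)
	return v, nil
}

func main() {
	v, err := parseValidate([]string{"-verbose", "-mode", "relaxed", "in.pdf"})
	fmt.Println(v.verbose, v.mode, v.inFile, err)
}
```

A flag.FlagSet per command keeps the flags of each command separate, which matches the per-command synopsis lines above.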

Please read the documentation

Status

Version: 0.0.1-beta

The extraction code for font files and images is experimental and serves as proof of concept only.

To Do

validation of the less used page entry "PresSteps"
validation of the less used root entries "SpiderInfo", "Permissions", "Legal", "Collection"

I am looking for test PDFs that use one of these features. If you have one that you can share, let me know. I am also accepting PRs, but right now only for the items defined on the to-do list.

Disclaimer

Usage of pdflib assumes you know about and respect all copyrights of any PDF content you may be processing. This applies to the PDF files as such, their content and in particular all embedded resources like font files or images.

License

MIT

Documentation ¶

Overview ¶

Package pdflib is a simple PDF processing library written in Go. It provides both an API and a command line tool. Supported are all versions up to PDF 1.7 (ISO-32000).

The available commands are:

validate	validate PDF against PDF 32000-1:2008 (PDF 1.7)
optimize	optimize PDF by getting rid of redundant page resources
split		split multi-page PDF into several single-page PDFs
merge		concatenate 2 or more PDFs
extract		extract images, fonts, content, pages out of a PDF
trim		create trimmed version of a PDF
version		print pdflib version

Index ¶

Constants
func ExtractContent(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)
func ExtractFonts(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)
func ExtractImages(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)
func ExtractPages(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)
func ExtractText(fileIn string, config *types.Configuration) (io.Reader, error)
func Merge(filesIn []string, fileOut string, config *types.Configuration) (err error)
func Optimize(fileIn, fileOut string, config *types.Configuration) (err error)
func ParsePageSelection(s string) (ps []string, err error)
func Process(cmd *Command) (err error)
func Read(fileIn string, config *types.Configuration) (ctx *types.PDFContext, err error)
func Split(fileIn, dirOut string, config *types.Configuration) (err error)
func Trim(fileIn, fileOut string, pageSelection *[]string, config *types.Configuration) (err error)
func Validate(fileIn string, config *types.Configuration) (err error)
func Verbose(verbose bool)
func Write(ctx *types.PDFContext) (err error)
type Command

Constants ¶

const (
	VALIDATE commandMode = iota
	OPTIMIZE
	SPLIT
	MERGE
	EXTRACTIMAGES
	EXTRACTFONTS
	EXTRACTPAGES
	EXTRACTCONTENT
	TRIM
)

The available commands for the CLI.
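
As a reminder of how an iota-based constant block assigns values, here is a small self-contained sketch. It redeclares commandMode locally with an assumed underlying type of int; the modeNames table and the String method are hypothetical additions and not part of pdflib.

```go
package main

import "fmt"

// commandMode is a local stand-in for the package's unexported type.
// Assumption: the underlying type is int.
type commandMode int

const (
	VALIDATE commandMode = iota // 0
	OPTIMIZE
	SPLIT
	MERGE
	EXTRACTIMAGES
	EXTRACTFONTS
	EXTRACTPAGES
	EXTRACTCONTENT
	TRIM // 8
)

// modeNames is a hypothetical lookup table used only by this sketch.
var modeNames = [...]string{
	"validate", "optimize", "split", "merge",
	"extract images", "extract fonts", "extract pages", "extract content",
	"trim",
}

// String returns a readable name, or a numeric fallback for unknown values.
func (m commandMode) String() string {
	if m < 0 || int(m) >= len(modeNames) {
		return fmt.Sprintf("commandMode(%d)", int(m))
	}
	return modeNames[m]
}

func main() {
	fmt.Println(int(VALIDATE), int(TRIM), TRIM)
}
```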

Functions ¶

func ExtractContent ¶

func ExtractContent(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)

ExtractContent dumps "PDF source" files from fileIn into dirOut for selected pages.

func ExtractFonts ¶

func ExtractFonts(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)

ExtractFonts dumps embedded font files from fileIn into dirOut for selected pages.

func ExtractImages ¶

func ExtractImages(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)

ExtractImages dumps embedded image resources from fileIn into dirOut for selected pages.

func ExtractPages ¶

func ExtractPages(fileIn, dirOut string, pageSelection *[]string, config *types.Configuration) (err error)

ExtractPages generates single page PDF files from fileIn in dirOut for selected pages.

func ExtractText ¶

func ExtractText(fileIn string, config *types.Configuration) (io.Reader, error)

ExtractText extracts the text of the PDF file fileIn and returns it as an io.Reader.

func Merge ¶

func Merge(filesIn []string, fileOut string, config *types.Configuration) (err error)

Merge merges a set of PDF files and writes the result to fileOut. This corresponds to concatenating the files in the order specified by filesIn. The first entry of filesIn serves as the destination xRefTable into which all remaining files are merged.

func Optimize ¶

func Optimize(fileIn, fileOut string, config *types.Configuration) (err error)

Optimize reads in fileIn, does validation, optimization and writes the result to fileOut.

func ParsePageSelection ¶

func ParsePageSelection(s string) (ps []string, err error)

ParsePageSelection ensures a correct page selection expression.

func Process ¶

func Process(cmd *Command) (err error)

Process executes a pdflib command.

Example (ExtractImages) ¶

// Extract all embedded images for first 5 and last 5 pages but not for page 4.
selectedPages := []string{"-5", "5-", "!4"}

cmd := ExtractImagesCommand("in.pdf", "dirOut", selectedPages, types.NewDefaultConfiguration())

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (ExtractPages) ¶

// Extract single-page PDFs for pages 3, 4 and 5.
selectedPages := []string{"3..5"}

cmd := ExtractPagesCommand("in.pdf", "dirOut", selectedPages, types.NewDefaultConfiguration())

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (Merge) ¶

// Concatenate this sequence of PDF files:
filenamesIn := []string{"in1.pdf", "in2.pdf", "in3.pdf"}

cmd := MergeCommand(filenamesIn, "out.pdf", types.NewDefaultConfiguration())

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (Optimize) ¶

config := types.NewDefaultConfiguration()

// Generate optional stats.
config.StatsFileName = "stats.csv"

// Configure end of line sequence for writing.
config.Eol = types.EolLF

cmd := OptimizeCommand("in.pdf", "out.pdf", config)

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (Split) ¶

// Split into single-page PDFs.
cmd := SplitCommand("in.pdf", "outDir", types.NewDefaultConfiguration())

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (Trim) ¶

// Trim to first three pages.
selectedPages := []string{"-3"}

cmd := TrimCommand("in.pdf", "out.pdf", selectedPages, types.NewDefaultConfiguration())

err := Process(&cmd)
if err != nil {
	return
}

Output:

Example (Validate) ¶

config := types.NewDefaultConfiguration()

// Set relaxed validation mode.
config.SetValidationRelaxed()

cmd := ValidateCommand("in.pdf", config)

err := Process(&cmd)
if err != nil {
	return
}

Output:

func Read ¶

func Read(fileIn string, config *types.Configuration) (ctx *types.PDFContext, err error)

Read reads in a PDF file and builds an internal structure holding its cross reference table aka the PDFContext.
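
For readers unfamiliar with cross reference tables: in the classic table format of ISO 32000-1, each entry consists of a 10-digit number, a 5-digit generation number and the keyword n (in use) or f (free). The sketch below parses one such entry. It is not pdflib's reader, which also has to deal with subsections, cross reference streams and damaged files; xrefEntry and parseXrefEntry are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// xrefEntry is one row of a classic cross reference table.
type xrefEntry struct {
	offset     int64 // byte offset if in use, next free object number if free
	generation int
	inUse      bool
}

// parseXrefEntry parses a line such as "0000000017 00000 n".
func parseXrefEntry(line string) (xrefEntry, error) {
	f := strings.Fields(line)
	if len(f) != 3 || len(f[0]) != 10 || len(f[1]) != 5 {
		return xrefEntry{}, fmt.Errorf("malformed xref entry %q", line)
	}
	off, err := strconv.ParseInt(f[0], 10, 64)
	if err != nil {
		return xrefEntry{}, fmt.Errorf("bad offset in %q: %v", line, err)
	}
	gen, err := strconv.Atoi(f[1])
	if err != nil {
		return xrefEntry{}, fmt.Errorf("bad generation in %q: %v", line, err)
	}
	switch f[2] {
	case "n":
		return xrefEntry{offset: off, generation: gen, inUse: true}, nil
	case "f":
		return xrefEntry{offset: off, generation: gen, inUse: false}, nil
	}
	return xrefEntry{}, fmt.Errorf("bad entry type in %q", line)
}

func main() {
	e, err := parseXrefEntry("0000000017 00000 n \n")
	fmt.Println(e.offset, e.generation, e.inUse, err)
}
```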

func Split ¶

func Split(fileIn, dirOut string, config *types.Configuration) (err error)

Split generates a sequence of single page PDF files in dirOut, creating one file for every page of fileIn.

func Trim ¶

func Trim(fileIn, fileOut string, pageSelection *[]string, config *types.Configuration) (err error)

Trim generates a trimmed version of fileIn containing all pages selected.

func Validate ¶

func Validate(fileIn string, config *types.Configuration) (err error)

Validate validates a PDF file against ISO-32000-1:2008.

func Verbose ¶

func Verbose(verbose bool)

Verbose controls logging output.

func Write ¶

func Write(ctx *types.PDFContext) (err error)

Write generates a PDF file for a given PDFContext.

Types ¶

type Command ¶

type Command struct {
	Mode          commandMode          // VALIDATE  OPTIMIZE  SPLIT  MERGE  EXTRACT  TRIM
	InFile        *string              //    *         *        *      -       *      *
	InFiles       *[]string            //    -         -        -      *       -      -
	OutFile       *string              //    -         *        -      *       -      *
	OutDir        *string              //    -         -        *      -       *      -
	PageSelection *[]string            //    -         -        -      -       *      *
	Config        *types.Configuration //
}

Command represents an execution context.
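
The comment columns in the struct definition form a matrix of which fields each mode uses. The sketch below encodes that matrix as bit masks and reports whether a command sets a field that its mode ignores. It reads "*" as "used by this mode" rather than "required", which is an assumption; the local types mode and command, the uses table and hasUnusedFields are hypothetical and use a single extract mode as in the comment header.

```go
package main

import "fmt"

type mode int

const (
	validate mode = iota
	optimize
	split
	merge
	extract
	trim
)

// One bit per optional Command field.
const (
	fInFile = 1 << iota
	fInFiles
	fOutFile
	fOutDir
	fPageSelection
)

// uses transcribes the "*" marks from the struct comments.
var uses = map[mode]int{
	validate: fInFile,
	optimize: fInFile | fOutFile,
	split:    fInFile | fOutDir,
	merge:    fInFiles | fOutFile,
	extract:  fInFile | fOutDir | fPageSelection,
	trim:     fInFile | fOutFile | fPageSelection,
}

// command is a local copy of the struct layout without the Config field.
type command struct {
	Mode          mode
	InFile        *string
	InFiles       *[]string
	OutFile       *string
	OutDir        *string
	PageSelection *[]string
}

// set returns the bit mask of fields that are non-nil.
func (c command) set() int {
	s := 0
	if c.InFile != nil {
		s |= fInFile
	}
	if c.InFiles != nil {
		s |= fInFiles
	}
	if c.OutFile != nil {
		s |= fOutFile
	}
	if c.OutDir != nil {
		s |= fOutDir
	}
	if c.PageSelection != nil {
		s |= fPageSelection
	}
	return s
}

// hasUnusedFields reports whether c sets a field that its mode does not use.
func hasUnusedFields(c command) bool {
	return c.set()&^uses[c.Mode] != 0
}

func main() {
	in, dir := "in.pdf", "outDir"
	fmt.Println(hasUnusedFields(command{Mode: split, InFile: &in, OutDir: &dir}))
	fmt.Println(hasUnusedFields(command{Mode: validate, InFile: &in, OutDir: &dir}))
}
```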

func ExtractContentCommand ¶

func ExtractContentCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) Command

ExtractContentCommand creates a new ExtractContentCommand.

func ExtractFontsCommand ¶

func ExtractFontsCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) Command

ExtractFontsCommand creates a new ExtractFontsCommand. (experimental)

func ExtractImagesCommand ¶

func ExtractImagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) Command

ExtractImagesCommand creates a new ExtractImagesCommand. (experimental)

func ExtractPagesCommand ¶

func ExtractPagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) Command

ExtractPagesCommand creates a new ExtractPagesCommand.

func MergeCommand ¶

func MergeCommand(pdfFileNamesIn []string, pdfFileNameOut string, config *types.Configuration) Command

MergeCommand creates a new MergeCommand.

func OptimizeCommand ¶

func OptimizeCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration) Command

OptimizeCommand creates a new OptimizeCommand.

func SplitCommand ¶

func SplitCommand(pdfFileNameIn, dirNameOut string, config *types.Configuration) Command

SplitCommand creates a new SplitCommand.

func TrimCommand ¶

func TrimCommand(pdfFileNameIn, pdfFileNameOut string, pageSelection []string, config *types.Configuration) Command

TrimCommand creates a new TrimCommand.

func ValidateCommand ¶

func ValidateCommand(pdfFileName string, config *types.Configuration) Command

ValidateCommand creates a new ValidateCommand.

Directories ¶

Path	Synopsis
bufio	Package bufio extends the stdlib bufio with additional support for the \r eol marker.
cmd
pdflib
extract	Package extract provides methods for extracting fonts, images, pages and page content.
filter	Package filter contains implementations for PDF filters.
merge	Package merge provides code for merging two PDFContexts.
optimize	Package optimize contains code for optimizing the resources of a PDF file.
read	Package read provides methods for parsing PDF files into memory.
types	Package types provides the PDFContext, representing an ecosystem for PDF processing.
validate	Package validate contains validation code for ISO 32000-1:2008.
write	Package write contains code that writes PDF data from memory to a file.
