pico

package module
v0.0.0-...-0dddbe1
Warning

This package is not in the latest version of its module.

Published: Jul 28, 2023 License: MIT Imports: 18 Imported by: 0

README

pico - convert PDF to images with progress

A Go implementation of @Belval's Python pdf2image, with added progress support.

(Image: converting a PDF with a progress bar)

Install the package and its dependencies

$ go install github.com/DeathKing/pico/...@latest

NOTICE
This will also install the pdf2image binary into the $GOPATH/bin folder.

Windows

Windows users will have to build or download poppler for Windows. I recommend @oschwartz10612's version, which is the most up-to-date. You will then have to add the bin/ folder to PATH or use WithPopplerPath("C:/path/to/poppler-xx/bin").

Mac

Mac users will have to install poppler.

Installing using Brew:

brew install poppler

Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, use your package manager to install poppler-utils.

Platform-independent (using conda)

Install poppler: conda install -c conda-forge poppler

Usage

Use it programmatically as a library
package main

import (
    "fmt"
    "time"

    "github.com/DeathKing/pico"
)

func main() {
    // Case 1. Silently convert a file with a single worker. You must call
    //         `Wait()` to block until the conversion finishes.
    // See _example/single/main.go
    task, _ := pico.Convert("path/to/pdf")
    task.Wait()

    // Case 2. Convert a single file with multiple workers. Instead of calling
    //         `Wait()` for the final result, we receive the per-page conversion
    //         results through the `Entries` channel. In this situation,
    //         `task.Wait()` is not necessary.
    // See _example/woker/main.go
    task, _ = pico.Convert("path/to/pdf",
        pico.WithJob(4),
    )

    // Each entry looks like ["current_page" "total_page" "output_filename" "worker_index"]
    for entry := range task.Entries {
        fmt.Printf("page %s is converted as file %s\n",
            entry[0], // current page
            entry[2], // output filename
        )
    }

    // Case 3. A more advanced usage
    task, _ = pico.Convert("path/to/pdf",
        pico.WithPopplerPath("path/to/poppler"),
        pico.WithFormat("jpg"),
        pico.WithDpi(72),
        pico.WithPageRange(22, 42),       // convert pages 22 through 42 (inclusive)
        pico.WithJob(3),                  // use 3 workers/processes to convert
        pico.WithTimeout(10*time.Second), // must finish within 10 seconds
    )

    // `WaitAndCollect()` blocks until the conversion finishes and collects
    // the results into a slice. Every field of an entry is a string.
    for _, entry := range task.WaitAndCollect() {
        fmt.Printf("[worker#%s] file: %s %s/%s\n", entry[3], entry[2], entry[0], entry[1])
    }
}

Use it as a command line tool
pdf2image [-d dpi] [-f firstPage] [-l lastPage] [-j n] [-o outputFolder] path/to/file pattern/to/folder

For more details, see cmd/pdf2image/main.go.

TODO

  • outputFileFn() to specify the output filename via a function.
  • a Converts() function that supports converting multiple files concurrently.
  • implement the WithScale()/WithSize()/WithScaleToX()/WithScaleToY() options
    • WithScale(400) or WithSize(400) will fit the image into a 400x400 box, preserving the aspect ratio.
    • WithScaleToX(400) will make the image 400 pixels wide, preserving the aspect ratio.
    • WithScaleToY(400) will make the image 400 pixels tall, preserving the aspect ratio.
  • explain each parameter in detail.
  • more robust command-line parsing.
  • more test cases.

Limitations / known issues

  1. Does not work well with filenames or paths that contain CJK characters (this may be caused by poppler).

Credit

Many thanks to Edouard Belval, not only for his original Python library pdf2image, which inspired this Go variation, but also for the poppler installation instructions.

Documentation

Constants

This section is empty.

Variables

var ErrProviderClosed = errors.New("provider is closed")
var WithSize = WithScaleTo

WithSize is an alias of WithScaleTo.

Functions

func GetInfo

func GetInfo(pdf string, options ...CallOption) (map[string]string, error)

func GetPagesCount

func GetPagesCount(pdfPath string, options ...CallOption) (int, error)
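GetInfo and GetPagesCount are listed without descriptions. The Python pdf2image that this package is modeled on obtains document information by running poppler's pdfinfo and parsing its `Key: value` lines. A minimal sketch of that parsing follows, assuming pico does something similar; parseInfo and pagesCount are hypothetical names, not pico's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo turns pdfinfo-style "Key:   value" lines into a map.
// Lines without a colon are skipped. Only the first colon splits key and
// value, so values that contain colons (such as timestamps) stay intact.
func parseInfo(output string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		info[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return info
}

// pagesCount reads the "Pages" entry, as a GetPagesCount-like helper might.
func pagesCount(info map[string]string) (int, error) {
	raw, ok := info["Pages"]
	if !ok {
		return 0, fmt.Errorf("no Pages entry in pdfinfo output")
	}
	return strconv.Atoi(raw)
}

func main() {
	sample := "Title:          Example\nPages:          42\nCreationDate:   Fri Jul 28 10:00:00 2023\n"
	info := parseInfo(sample)
	n, err := pagesCount(info)
	fmt.Println(n, err, info["CreationDate"])
}
```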

Types

type BatchTask

type BatchTask struct {
	Task
}

BatchTask handles multi-document conversion, where each convertor converts a single document.

func ConvertFiles

func ConvertFiles(files interface{}, options ...CallOption) (*BatchTask, error)

ConvertFiles converts multiple PDF files to images.

files can be of type `[]string`, `chan string`, or `PdfProvider`.
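Because ConvertFiles takes files as an interface{} holding one of three forms, one way to normalize them is a type switch that wraps a slice or channel in a provider. The sketch below assumes that approach; provider, sliceProvider, chanProvider and fromInterface are hypothetical stand-ins for PdfProvider and FromInterface, and reporting -1 as the count of a channel-backed source is an assumption made here.

```go
package main

import "fmt"

// provider mirrors the shape of pico's PdfProvider (Source and Count);
// it is redefined so this sketch is self-contained.
type provider interface {
	Source() <-chan string
	Count() int
}

// sliceProvider is a hypothetical provider backed by a fixed list of paths.
type sliceProvider struct{ files []string }

func (p sliceProvider) Count() int { return len(p.files) }

// Source streams the paths on a buffered channel and closes it afterwards.
func (p sliceProvider) Source() <-chan string {
	ch := make(chan string, len(p.files))
	for _, f := range p.files {
		ch <- f
	}
	close(ch)
	return ch
}

// chanProvider wraps an existing channel. Its length is unknown up front,
// so Count reports -1 (an assumption of this sketch).
type chanProvider struct{ ch chan string }

func (p chanProvider) Count() int            { return -1 }
func (p chanProvider) Source() <-chan string { return p.ch }

// fromInterface dispatches on the dynamic type of files, accepting the
// three forms ConvertFiles documents: []string, chan string, or a provider.
func fromInterface(files interface{}) (provider, error) {
	switch v := files.(type) {
	case []string:
		return sliceProvider{files: v}, nil
	case chan string:
		return chanProvider{ch: v}, nil
	case provider:
		return v, nil
	default:
		return nil, fmt.Errorf("unsupported files type %T", files)
	}
}

func main() {
	p, err := fromInterface([]string{"a.pdf", "b.pdf"})
	if err != nil {
		panic(err)
	}
	for f := range p.Source() {
		fmt.Println(f)
	}
}
```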

func (*BatchTask) Start

func (t *BatchTask) Start(provider PdfProvider) error

type CallOption

type CallOption func(o *Parameters, command []string) []string
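The CallOption signature suggests that each option can both record a setting on Parameters and append arguments to the poppler command line, which fits Convert being described as an options parser and command builder. The sketch assumes that design; parameters, withDpi, withFirstPage, withLastPage and buildCommand are hypothetical lowercase stand-ins, while -r, -f and -l are pdftoppm's real resolution, first-page and last-page flags. The 200 DPI default is taken from the WithDpi description.

```go
package main

import (
	"fmt"
	"strconv"
)

// parameters stands in for pico's unexported Parameters fields (hypothetical).
type parameters struct {
	dpi int
}

// callOption has the same shape as pico's CallOption: it may update the
// parameters and returns the (possibly extended) command line.
type callOption func(o *parameters, command []string) []string

func withDpi(dpi int) callOption {
	return func(o *parameters, command []string) []string {
		o.dpi = dpi
		return append(command, "-r", strconv.Itoa(dpi))
	}
}

func withFirstPage(first int) callOption {
	return func(o *parameters, command []string) []string {
		return append(command, "-f", strconv.Itoa(first))
	}
}

func withLastPage(last int) callOption {
	return func(o *parameters, command []string) []string {
		return append(command, "-l", strconv.Itoa(last))
	}
}

// buildCommand folds every option, in order, over a base pdftoppm command.
func buildCommand(options ...callOption) (*parameters, []string) {
	p := &parameters{dpi: 200}
	command := []string{"pdftoppm"}
	for _, opt := range options {
		command = opt(p, command)
	}
	return p, command
}

func main() {
	p, cmd := buildCommand(withDpi(72), withFirstPage(22), withLastPage(42))
	fmt.Println(p.dpi, cmd)
}
```

A real builder would also append the input file and an output prefix; they are omitted to keep the option-folding pattern visible.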

func WithContext

func WithContext(ctx context.Context) CallOption

func WithDpi

func WithDpi(dpi int) CallOption

WithDpi sets the image resolution (quality) in DPI (default 200).

func WithFirstPage

func WithFirstPage(firstPage int) CallOption

WithFirstPage sets the first page to convert

func WithFormat

func WithFormat(fmt string) CallOption

WithFormat sets the output image format

func WithGrayScale

func WithGrayScale() CallOption

func WithHideAnnotations

func WithHideAnnotations() CallOption

func WithJPEGOpt

func WithJPEGOpt(jpegOpt map[string]string) CallOption

func WithJPEGOptimize

func WithJPEGOptimize(optimize bool) CallOption

func WithJPEGProgressive

func WithJPEGProgressive(progressive bool) CallOption

func WithJPEGQuality

func WithJPEGQuality(quality int) CallOption

func WithJob

func WithJob(job int) CallOption

WithJob sets the number of threads to use

func WithLastPage

func WithLastPage(lastPage int) CallOption

WithLastPage sets the last page to convert

func WithOutputFile

func WithOutputFile(outputFile string) CallOption

func WithOutputFileFn

func WithOutputFileFn(fn nameFn) CallOption

func WithOutputFolder

func WithOutputFolder(outputFolder string) CallOption

WithOutputFolder writes the resulting images to a folder (instead of keeping them in memory).

func WithOwnerPw

func WithOwnerPw(ownerPw string) CallOption

WithOwnerPw sets PDF's owner password

func WithPageRange

func WithPageRange(firstPage, lastPage int) CallOption

WithPageRange sets the range of pages to convert

func WithPopplerPath

func WithPopplerPath(popplerPath string) CallOption

WithPopplerPath sets poppler binaries lookup path

func WithScaleTo

func WithScaleTo(size int) CallOption

WithScaleTo sets the size of the resulting images: size=400 fits each image into a 400x400 box, preserving the aspect ratio.
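Fitting into a box reduces to one scale factor, size / max(width, height), applied to both sides. The sketch below shows only that arithmetic; fitInBox is a hypothetical helper. The actual rendering is done by poppler, whose pdftoppm offers a -scale-to option with this meaning, and that pico passes the setting through to it is an assumption.

```go
package main

import "fmt"

// fitInBox scales (w, h) so that the longer side equals size, preserving
// the aspect ratio. The shorter side is rounded to the nearest pixel.
func fitInBox(w, h, size int) (int, int) {
	if w <= 0 || h <= 0 || size <= 0 {
		return w, h // nothing sensible to scale
	}
	if w >= h {
		return size, (h*size + w/2) / w
	}
	return (w*size + h/2) / h, size
}

func main() {
	// A US Letter page at 72 DPI is 612x792 points.
	fmt.Println(fitInBox(612, 792, 400)) // prints "309 400"
}
```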

func WithScaleToX

func WithScaleToX(size int) CallOption

func WithScaleToY

func WithScaleToY(size int) CallOption

func WithSingleFile

func WithSingleFile() CallOption

func WithStrict

func WithStrict() CallOption

WithStrict enables strict mode: when a syntax error is encountered in the PDF, it is reported as an error instead of being ignored.

func WithTimeout

func WithTimeout(timeout time.Duration) CallOption

WithTimeout sets a time limit for the conversion.

func WithTransparent

func WithTransparent() CallOption

func WithUseCropBox

func WithUseCropBox() CallOption

func WithUsePdftocario

func WithUsePdftocario() CallOption

func WithUserPw

func WithUserPw(userPw string) CallOption

WithUserPw sets PDF's password

func WithVerbose

func WithVerbose() CallOption

WithVerbose prints useful debugging information.

type ChanProvider

type ChanProvider struct {
	// contains filtered or unexported fields
}

func (*ChanProvider) Count

func (p *ChanProvider) Count() int

func (*ChanProvider) Source

func (p *ChanProvider) Source() <-chan string

type ConversionError

type ConversionError struct {
	// contains filtered or unexported fields
}

func (*ConversionError) Cause

func (e *ConversionError) Cause() error

func (*ConversionError) Error

func (e *ConversionError) Error() string

type Convertor

type Convertor struct {
	Progress
	// contains filtered or unexported fields
}

func (*Convertor) Aborted

func (c *Convertor) Aborted() bool

func (*Convertor) Completed

func (c *Convertor) Completed() bool

Completed reports whether the convertor is in completed state

func (*Convertor) Error

func (c *Convertor) Error() (err error)

func (*Convertor) Errors

func (c *Convertor) Errors() []*ConversionError

type GetBinaryVersionError

type GetBinaryVersionError struct {
	// contains filtered or unexported fields
}

func NewGetBinaryVersionError

func NewGetBinaryVersionError(binary string) *GetBinaryVersionError

func (*GetBinaryVersionError) Error

func (e *GetBinaryVersionError) Error() string

type Observable

type Observable interface {
	// Total is the total number of items to convert
	Total() int32

	// Finished counts finished conversions. Since the conversion may cover
	// only part of a file, e.g. from `firstPage` to `lastPage`, the total
	// count may be less than `lastPage`, and Finished() <= Current() always holds
	Finished() int32

	// Current is the current page number we've just converted
	Current() int32

	Completed() bool
	Aborted() bool
}

type PDFSyntaxError

type PDFSyntaxError struct {
	// contains filtered or unexported fields
}

func NewOldPDFSyntaxError

func NewOldPDFSyntaxError(line, filename string, page int32) *PDFSyntaxError

func NewPDFSyntaxError

func NewPDFSyntaxError(line string) *PDFSyntaxError

func (*PDFSyntaxError) Error

func (e *PDFSyntaxError) Error() string

type Parameters

type Parameters struct {
	// contains filtered or unexported fields
}

type PdfProvider

type PdfProvider interface {
	Source() <-chan string
	Count() int
}

func FromChan

func FromChan(ch chan string) PdfProvider

func FromGlob

func FromGlob(pattern string) PdfProvider

func FromInterface

func FromInterface(i interface{}) PdfProvider

func FromMultiSource

func FromMultiSource(patterns []string) PdfProvider

func FromMultiSourceAsync

func FromMultiSourceAsync(patterns []string) PdfProvider

func FromSlice

func FromSlice(files []string) PdfProvider

type PerPageTimeoutError

type PerPageTimeoutError struct {
	// contains filtered or unexported fields
}

func NewPerPageTimeoutError

func NewPerPageTimeoutError(page string) *PerPageTimeoutError

func (*PerPageTimeoutError) Error

func (e *PerPageTimeoutError) Error() string

type Progress

type Progress struct {
	// contains filtered or unexported fields
}

func (*Progress) Current

func (p *Progress) Current() int32

func (*Progress) Filename

func (p *Progress) Filename() string

func (*Progress) Finished

func (p *Progress) Finished() int32

func (*Progress) Incr

func (p *Progress) Incr(delta int32)

func (*Progress) PushTotal

func (p *Progress) PushTotal(delta int32)

func (*Progress) SetCurrent

func (p *Progress) SetCurrent(current int32)

func (*Progress) Total

func (p *Progress) Total() int32
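Progress exposes int32 counters that several convertors may update concurrently, and the Observable comments state that Finished() <= Current() holds when only a sub-range of pages is converted. A minimal sketch using sync/atomic follows; the field layout and the Completed rule (finished >= total) are assumptions, not pico's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// progress is a hypothetical re-creation of the counters pico's Progress
// exposes. Atomic operations let many workers update it without a mutex.
type progress struct {
	total, finished, current int32
}

func (p *progress) PushTotal(delta int32)    { atomic.AddInt32(&p.total, delta) }
func (p *progress) Incr(delta int32)         { atomic.AddInt32(&p.finished, delta) }
func (p *progress) SetCurrent(current int32) { atomic.StoreInt32(&p.current, current) }
func (p *progress) Total() int32             { return atomic.LoadInt32(&p.total) }
func (p *progress) Finished() int32          { return atomic.LoadInt32(&p.finished) }
func (p *progress) Current() int32           { return atomic.LoadInt32(&p.current) }

// Completed is an assumed rule: all pushed work has been finished.
func (p *progress) Completed() bool { return p.Finished() >= p.Total() }

func main() {
	var p progress
	p.PushTotal(21) // pages 22..42
	var wg sync.WaitGroup
	for page := int32(22); page <= 42; page++ {
		wg.Add(1)
		go func(pg int32) {
			defer wg.Done()
			p.SetCurrent(pg) // last writer wins; any value in 22..42
			p.Incr(1)
		}(page)
	}
	wg.Wait()
	fmt.Println(p.Finished(), p.Total(), p.Completed())
}
```

With pages 22 to 42, Finished reaches 21 while Current is always 22 or more, consistent with the documented invariant.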

type SingleTask

type SingleTask struct {
	Task
}

SingleTask handles single-document conversion. Because the given PDF is usually a large file, it is split (evenly) into parts by page range, and each part is dispatched to a convertor.
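Splitting evenly by page range gives every convertor one contiguous block of pages, with block sizes that differ by at most one page. A minimal sketch of that arithmetic follows, assuming inclusive page numbers as in WithPageRange; splitPages and pageRange are hypothetical names, not pico's internals.

```go
package main

import "fmt"

// pageRange is an inclusive [First, Last] range of page numbers.
type pageRange struct{ First, Last int }

// splitPages divides pages first..last (inclusive) into at most n
// contiguous ranges whose sizes differ by at most one page.
func splitPages(first, last, n int) []pageRange {
	total := last - first + 1
	if total <= 0 || n <= 0 {
		return nil
	}
	if n > total {
		n = total // never create empty ranges
	}
	base, extra := total/n, total%n
	ranges := make([]pageRange, 0, n)
	start := first
	for i := 0; i < n; i++ {
		size := base
		if i < extra {
			size++ // the first `extra` ranges take one more page
		}
		ranges = append(ranges, pageRange{start, start + size - 1})
		start += size
	}
	return ranges
}

func main() {
	fmt.Println(splitPages(22, 42, 3)) // prints "[{22 28} {29 35} {36 42}]"
}
```

For the README's Case 3 (pages 22 to 42, three workers) this yields 22-28, 29-35 and 36-42.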

func Convert

func Convert(pdf string, options ...CallOption) (*SingleTask, error)

Convert converts a single PDF to images. This function is solely an options parser and command builder.

func (*SingleTask) Start

func (t *SingleTask) Start(pdf string) error

Start initiates the conversion process

type SliceFileProvider

type SliceFileProvider struct {
	ChanProvider
	// contains filtered or unexported fields
}

func (*SliceFileProvider) Count

func (p *SliceFileProvider) Count() int

type Task

type Task struct {
	// FileProgress is measured by file counts
	Progress

	// Convertors are used to convert PDF to images
	Convertors []*Convertor

	// Entries is the channel of conversion progress entries;
	// each entry has the format ["currentPage" "lastPage" "filename" "workerId"]
	Entries chan []string
	// contains filtered or unexported fields
}
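Each value received from Entries is a []string in the order currentPage, lastPage, filename, workerId. Converting it into a typed struct once keeps the index lookups in one place. The sketch assumes the numeric fields are decimal integers; entry and parseEntry are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// entry is a typed view of one Task.Entries element.
type entry struct {
	Page, LastPage, Worker int
	File                   string
}

// parseEntry converts the string slice into an entry, rejecting malformed input.
func parseEntry(e []string) (entry, error) {
	if len(e) != 4 {
		return entry{}, fmt.Errorf("want 4 fields, got %d", len(e))
	}
	page, err := strconv.Atoi(e[0])
	if err != nil {
		return entry{}, fmt.Errorf("current page: %w", err)
	}
	last, err := strconv.Atoi(e[1])
	if err != nil {
		return entry{}, fmt.Errorf("last page: %w", err)
	}
	worker, err := strconv.Atoi(e[3])
	if err != nil {
		return entry{}, fmt.Errorf("worker id: %w", err)
	}
	return entry{Page: page, LastPage: last, Worker: worker, File: e[2]}, nil
}

func main() {
	en, err := parseEntry([]string{"23", "42", "out-23.jpg", "1"})
	fmt.Println(en, err)
}
```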

func (*Task) Aborted

func (t *Task) Aborted() bool

func (*Task) Completed

func (t *Task) Completed() bool

func (*Task) Error

func (t *Task) Error() error

func (*Task) Errors

func (t *Task) Errors() (errs []*ConversionError)

func (*Task) Wait

func (t *Task) Wait()

Wait hijacks the EntryChan and waits for all the workers to finish.

func (*Task) WaitAndCollect

func (t *Task) WaitAndCollect() (entries [][]string)

WaitAndCollect acts like Wait() but collects all the entries into a slice. An empty slice is returned if no entry is received.
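Wait is described as hijacking the entry channel: if nothing consumed the entries, workers sending on an unbuffered channel would block forever. The sketch below assumes that model, with the channel closed once a sync.WaitGroup reports that every worker is done; runWorkers, wait and waitAndCollect are hypothetical, and whether pico's Entries channel is buffered is not stated in the documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n hypothetical workers that each send perWorker entries
// on a shared unbuffered channel, then closes it once all have finished.
func runWorkers(n, perWorker int) chan []string {
	entries := make(chan []string)
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 1; i <= perWorker; i++ {
				entries <- []string{fmt.Sprint(i), fmt.Sprint(perWorker), "page.png", fmt.Sprint(id)}
			}
		}(w)
	}
	go func() {
		wg.Wait()
		close(entries) // lets range loops over the channel terminate
	}()
	return entries
}

// wait drains and discards entries so the workers never block on sends.
func wait(entries <-chan []string) {
	for range entries {
	}
}

// waitAndCollect drains the channel until it is closed. It starts from an
// empty, non-nil slice, so callers get [] rather than nil when nothing arrives.
func waitAndCollect(entries <-chan []string) [][]string {
	collected := [][]string{}
	for e := range entries {
		collected = append(collected, e)
	}
	return collected
}

func main() {
	all := waitAndCollect(runWorkers(3, 2))
	fmt.Println(len(all)) // prints "6"
	wait(runWorkers(2, 5))
}
```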

type WrongArgumentError

type WrongArgumentError struct {
	// contains filtered or unexported fields
}

func (*WrongArgumentError) Error

func (e *WrongArgumentError) Error() string

Directories

Path Synopsis
_example
cmd
