pdfreader

package module
v0.3.0 Latest

This package is not in the latest version of its module.
Published: Jan 14, 2024 License: MIT Imports: 8 Imported by: 0

README

pdfreader

Introduction

pdfreader is a Go library for reading the contents of PDF files.

As a first stage, this results in a PDF-to-SVG converter.

Details

PDF files usually sit at the end of some workflow and are intended to conserve information in a way that lets it be accessed as it was intended (e.g. in terms of typographical layout). This information needs to be fetched. This project tries to make that possible with a library for Go.

If you are not a Go programmer, just move away or play with an example application.

Currently everything is in its premature state and no production-ready library is to be expected. That said, things usually work fine for many tasks.

If you are willing to experiment, check out the source at http://code.google.com/p/pdfreader/source/checkout

Basic design principles

  • Using this library with a malformed PDF might crash the program. This is intentional.
  • Keep things simple - there is no reason to produce billions of lines of code.
  • Make the crash happen late. As late as possible. If something is really wrong, it will crash sooner or later. Why use a "safe" programming language if you do not rely on it and add useless validity tests on the input instead?
  • Avoid endless recursion. There are many places where it could occur in PDF files. A fix for issue 226 in golang would help, but the gurus at Google decided otherwise. So be prepared to have no real fun with the implementation language. See the Philosophy page.
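
One way to picture the last principle is a hop limit on reference chains. The following is a minimal sketch and is not taken from the library: the names resolve, maxDepth and errTooDeep are hypothetical, and the object table is simplified to a map from reference strings to values.

```go
package main

import (
	"errors"
	"fmt"
)

// maxDepth is a hypothetical bound chosen for this sketch; it is not a
// constant exported by pdfreader.
const maxDepth = 64

// errTooDeep reports that a chain of references did not end within maxDepth hops.
var errTooDeep = errors.New("reference chain too deep")

// resolve follows indirect references in objs until it reaches a value that
// is not itself a key of objs. It gives up after maxDepth hops, so a cycle
// in a malformed file cannot loop forever.
func resolve(objs map[string]string, ref string) (string, error) {
	for depth := 0; depth < maxDepth; depth++ {
		next, ok := objs[ref]
		if !ok {
			return ref, nil // not a reference: this is the final value
		}
		ref = next
	}
	return "", errTooDeep
}

func main() {
	objs := map[string]string{
		"1 0 R": "2 0 R",
		"2 0 R": "/Catalog",
		"3 0 R": "4 0 R", // 3 and 4 point at each other
		"4 0 R": "3 0 R",
	}
	value, err := resolve(objs, "1 0 R")
	fmt.Println(value, err)
	value, err = resolve(objs, "3 0 R")
	fmt.Println(value, err)
}
```

The loop replaces recursion with iteration, so the bound is explicit and the cyclic case ends with an error instead of exhausting the stack.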

Example

This shows an SVG displayed in Inkscape that was converted from a PDF:

example-convert.png

Credits

  1. This library was originally created by Helmar Wodtke and available on Google Code: https://code.google.com/archive/p/pdfreader/ . This code is available under the MIT license.
  2. This library is forked from a GitHub repo maintained by Nathan Kerr here: https://github.com/nathankerr/pdfreader . The benefit of this repo is that it maintains the commit history from Google Code Archive. This may have been possible via the "Export to GitHub" button on Google Code Archive which no longer appears to work. Additionally, the code from Nathan here is also under the MIT license.
  3. Another fork was created by James Healy here https://github.com/yob/pdfreader , however, it does not maintain Helmar's original commit history.
  4. Another fork was created by Raffaele Sena here https://github.com/raff/pdfreader , however, it is based on Healy's fork so the original commit history is lost, and the license is unknown for Raffaele's additional code.

Documentation

Overview

Access to PDF files.

Index

Constants

const (
	MAX_PDF_UPDATES   = 1024
	MAX_PDF_ARRAYSIZE = 1024
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Dictionary added in v0.3.0

type Dictionary map[string][]byte
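
Since a Dictionary is a plain Go map from names to raw bytes, ordinary map operations apply. The sketch below redeclares the type locally so that it compiles on its own. The helper hasType is hypothetical, and the key spelling with a leading slash ("/Type") is an assumption about how the library stores PDF names.

```go
package main

import "fmt"

// Dictionary mirrors the declaration above: names map to raw PDF bytes.
type Dictionary map[string][]byte

// hasType reports whether d carries a "/Type" entry equal to name.
// Both the helper and the "/Type" key spelling are assumptions of this sketch.
func hasType(d Dictionary, name string) bool {
	v, ok := d["/Type"]
	return ok && string(v) == name
}

func main() {
	page := Dictionary{
		"/Type":     []byte("/Page"),
		"/MediaBox": []byte("[0 0 612 792]"),
	}
	fmt.Println(hasType(page, "/Page"))
	fmt.Printf("%s\n", page["/MediaBox"])
}
```

Values stay as raw bytes, so interpreting them further (numbers, arrays, references) is left to the caller.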

type PDFReader added in v0.3.0

type PDFReader struct {
	File string // name of the file

	Startxref int         // starting of xref table
	Xref      map[int]int // "pointers" of the xref table
	Trailer   Dictionary  // trailer dictionary of the file
	// contains filtered or unexported fields
}

func Load

func Load(fn string) *PDFReader

Load() loads a PDF file of a given name.

func (*PDFReader) Arr added in v0.3.0

func (pd *PDFReader) Arr(reference []byte) [][]byte

pd.Arr() queries array data from a reference.

func (*PDFReader) Att added in v0.3.0

func (pd *PDFReader) Att(a string, src []byte) []byte

pd.Att() tries to get an attribute from a page reference. The attribute will be resolved.

func (*PDFReader) DecodedStream added in v0.3.0

func (pd *PDFReader) DecodedStream(reference []byte) (Dictionary, []byte)

DecodedStream returns decoded contents of a stream.

func (*PDFReader) Dic added in v0.3.0

func (pd *PDFReader) Dic(reference []byte) Dictionary

pd.Dic() queries dictionary data from a reference.

func (*PDFReader) ForcedArray added in v0.3.0

func (pd *PDFReader) ForcedArray(reference []byte) [][]byte

pd.ForcedArray() queries array data. If reference does not refer to an array, reference is taken as element of the returned array.
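
The wrapping behaviour described here can be illustrated without the library. This is a sketch of the idea only, not the package's implementation: forceArray is a hypothetical helper that works on an already resolved value, and its whitespace splitting only handles simple tokens such as numbers and names.

```go
package main

import (
	"bytes"
	"fmt"
)

// forceArray returns the elements of value if it looks like a PDF array
// ("[ ... ]"). Otherwise it returns a one-element slice holding value itself.
// Splitting on whitespace is a simplification: nested arrays, strings and
// indirect references such as "4 0 R" are not handled.
func forceArray(value []byte) [][]byte {
	trimmed := bytes.TrimSpace(value)
	if len(trimmed) >= 2 && trimmed[0] == '[' && trimmed[len(trimmed)-1] == ']' {
		return bytes.Fields(trimmed[1 : len(trimmed)-1])
	}
	return [][]byte{value}
}

func main() {
	for _, in := range []string{"[0 0 612 792]", "/DeviceRGB"} {
		fmt.Printf("%q has %d element(s)\n", in, len(forceArray([]byte(in))))
	}
}
```

Callers can then range over the result without first checking whether the entry was written as a single value or as an array.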

func (*PDFReader) PageFonts added in v0.3.0

func (pd *PDFReader) PageFonts(page []byte) Dictionary

PageFonts returns references to the fonts defined for a page.

func (*PDFReader) Pages added in v0.3.0

func (pd *PDFReader) Pages() [][]byte

pd.Pages() returns an array with references to the pages of the PDF.
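
Pages() and PageFonts() combine naturally into a per-page walk. The sketch below stays self-contained by coding against a small interface that copies the two documented signatures. The names pageSource, fontCounts and fakeReader are hypothetical, the sample references are made up, and Dictionary is redeclared locally, so real code would use the package's own Dictionary type in the interface.

```go
package main

import "fmt"

// Dictionary is redeclared here so the sketch compiles on its own.
type Dictionary map[string][]byte

// pageSource names the two documented methods this sketch relies on.
type pageSource interface {
	Pages() [][]byte
	PageFonts(page []byte) Dictionary
}

// fontCounts returns, for each page in document order, the number of font
// entries reported for that page.
func fontCounts(src pageSource) []int {
	pages := src.Pages()
	counts := make([]int, 0, len(pages))
	for _, page := range pages {
		counts = append(counts, len(src.PageFonts(page)))
	}
	return counts
}

// fakeReader is a stand-in with canned data so the sketch runs without a PDF.
type fakeReader struct {
	pages [][]byte
	fonts map[string]Dictionary
}

func (f fakeReader) Pages() [][]byte { return f.pages }

func (f fakeReader) PageFonts(page []byte) Dictionary { return f.fonts[string(page)] }

func main() {
	r := fakeReader{
		pages: [][]byte{[]byte("3 0 R"), []byte("7 0 R")},
		fonts: map[string]Dictionary{
			"3 0 R": {"/F1": []byte("5 0 R"), "/F2": []byte("6 0 R")},
			"7 0 R": {"/F1": []byte("5 0 R")},
		},
	}
	for i, n := range fontCounts(r) {
		fmt.Printf("page %d: %d font(s)\n", i+1, n)
	}
}
```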

Directories

Path Synopsis
Character Mappings (cmap).
"crush" bytes into bits - variable length.
Enhanced input.
hex encoder/decoder for PDF.
LZW decoder for PDF.
HTTP-server example.
Example program for pdfread.go
Convert PDF-pages to SVG.
Decoder for pfb fonts.
PS top-down parser.
Stacks of different types.
string math
Library to convert PDF pages to SVG.
SVG driver for graf.go.
SVG driver (text) for graf.go.
Type1 font tester.
Some utilities.
Encode UTF-8.
