serial

Published: Oct 28, 2019 License: MIT Imports: 7 Imported by: 0

README

serial

flatbuffers serialization

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func MakeDocPageLocations

func MakeDocPageLocations(b *flatbuffers.Builder, ppos []OffsetBBox) []byte

MakeDocPageLocations returns a flatbuffers serialized byte array for `ppos`.

func MakeDocPositions

func MakeDocPositions(b *flatbuffers.Builder, doc DocPositions) []byte

MakeDocPositions returns a flatbuffers serialized byte array for `doc`.

func MakeSerialBlevePdf

func MakeSerialBlevePdf(b *flatbuffers.Builder, spi SerialBlevePdf) []byte

MakeSerialBlevePdf returns a flatbuffers serialized byte array for `spi`.

func MakeTextLocation

func MakeTextLocation(b *flatbuffers.Builder, loc OffsetBBox) []byte

MakeTextLocation returns a flatbuffers serialized byte array for `loc`.

func WriteSerialBlevePdf

func WriteSerialBlevePdf(spi SerialBlevePdf) []byte

WriteSerialBlevePdf converts `spi` into a byte array.

Types

type DocPositions

type DocPositions struct {
	Path          string         // Path of input PDF file.
	DocIdx        uint64         // Index into blevePdf.fileList.
	PagePositions [][]OffsetBBox // PagePositions[i] = doc.pagePositions[doc.pageNums[i]].offsetBBoxes
	PageNums      []uint32       // 1-offset page numbers of entries.
	PageTexts     []string       // Extracted page text of entries.
}

DocPositions is used to serialize a doclib.DocPositions.

table DocPositions {
	path:  string;
	doc_idx:  uint64;
	page_dpl: [locations.PagePositions];
	page_nums:  [uint32];
	page_texts: [string];
}
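
The field comments suggest that PagePositions, PageNums and PageTexts are parallel slices with one entry per extracted page. A minimal sketch of looking up a page by its 1-offset number under that assumption follows. `pageEntry` is a hypothetical helper, not part of this package, and the types are local copies of the structs documented above.

```go
package main

import (
	"errors"
	"fmt"
)

// OffsetBBox and DocPositions are local copies of the documented structs.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

type DocPositions struct {
	Path          string
	DocIdx        uint64
	PagePositions [][]OffsetBBox
	PageNums      []uint32
	PageTexts     []string
}

// pageEntry (hypothetical) returns the text and positions stored for the
// 1-offset page number pageNum. It assumes the three slices are parallel.
func pageEntry(doc DocPositions, pageNum uint32) (string, []OffsetBBox, error) {
	if len(doc.PageNums) != len(doc.PageTexts) || len(doc.PageNums) != len(doc.PagePositions) {
		return "", nil, errors.New("page slices have different lengths")
	}
	for i, n := range doc.PageNums {
		if n == pageNum {
			return doc.PageTexts[i], doc.PagePositions[i], nil
		}
	}
	return "", nil, fmt.Errorf("page %d not found", pageNum)
}

func main() {
	// Made-up data: only pages 1 and 3 have entries.
	doc := DocPositions{
		Path:          "a.pdf",
		PagePositions: [][]OffsetBBox{{{Offset: 0, Llx: 72, Lly: 700, Urx: 200, Ury: 712}}, {}},
		PageNums:      []uint32{1, 3},
		PageTexts:     []string{"Hello", "World"},
	}
	text, positions, err := pageEntry(doc, 3)
	fmt.Println(text, len(positions), err)
}
```

The length check is there because nothing in the struct itself forces the three slices to stay in step.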

func (DocPositions) String

func (doc DocPositions) String() string

String returns a text description of `doc`.

type HashIndexPathDoc

type HashIndexPathDoc struct {
	Hash  string
	Index uint64
	Path  string
	Doc   DocPositions
}

HashIndexPathDoc is used for serializing a doclib.BlevePdf. The keys and values of the maps in the BlevePdf are stored in []HashIndexPathDoc.

table HashIndexPathDoc {
	hash: string;
	index: uint64;
	path: string;
	doc: DocPositions;
}
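
As an illustration of storing map keys and values in a slice, here is a minimal sketch that flattens three maps keyed by file hash into []HashIndexPathDoc. The map names and the `flatten` helper are hypothetical, since the actual fields of doclib.BlevePdf are not shown on this page. DocPositions is trimmed to two fields for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// DocPositions is trimmed to the fields this sketch needs.
type DocPositions struct {
	Path   string
	DocIdx uint64
}

// HashIndexPathDoc mirrors the struct documented above.
type HashIndexPathDoc struct {
	Hash  string
	Index uint64
	Path  string
	Doc   DocPositions
}

// flatten (hypothetical) joins three maps keyed by file hash into one slice.
// Map iteration order is random in Go, so the result is sorted by Index to
// make the serialized output deterministic.
func flatten(hashIndex map[string]uint64, hashPath map[string]string,
	hashDoc map[string]DocPositions) []HashIndexPathDoc {
	hipds := make([]HashIndexPathDoc, 0, len(hashIndex))
	for hash, idx := range hashIndex {
		hipds = append(hipds, HashIndexPathDoc{
			Hash:  hash,
			Index: idx,
			Path:  hashPath[hash],
			Doc:   hashDoc[hash],
		})
	}
	sort.Slice(hipds, func(i, j int) bool { return hipds[i].Index < hipds[j].Index })
	return hipds
}

func main() {
	hipds := flatten(
		map[string]uint64{"aa11": 1, "bb22": 0},
		map[string]string{"aa11": "b.pdf", "bb22": "a.pdf"},
		map[string]DocPositions{
			"aa11": {Path: "b.pdf", DocIdx: 1},
			"bb22": {Path: "a.pdf", DocIdx: 0},
		},
	)
	for _, h := range hipds {
		fmt.Println(h.Index, h.Hash, h.Path)
	}
}
```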

type OffsetBBox

type OffsetBBox struct {
	Offset             uint32  // Offset of the text fragment in extracted page text.
	Llx, Lly, Urx, Ury float32 // Bounding box of fragment on PDF page.
}

OffsetBBox maps the location of a piece of text on a PDF page to the offset of that text in the text extracted from the page. The text extracted from PDF pages is sent to bleve for indexing. BBox() is used to map the results of bleve searches (offsets in the extracted text) back to PDF contents. (Members need to be public because they are accessed by the doclib package.)
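
To make the offset-to-rectangle mapping concrete, here is a minimal sketch of finding the fragment that covers a search hit. It assumes the page's OffsetBBox entries are sorted by Offset and that each fragment runs until the next one starts. Neither assumption is confirmed by this page, and `findFragment` is a hypothetical helper rather than part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// OffsetBBox is a local copy of the struct documented above.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

// findFragment (hypothetical) returns the last entry whose Offset is <= offset.
// positions must be sorted by ascending Offset.
func findFragment(positions []OffsetBBox, offset uint32) (OffsetBBox, bool) {
	// i is the index of the first entry that starts after offset.
	i := sort.Search(len(positions), func(k int) bool {
		return positions[k].Offset > offset
	})
	if i == 0 {
		return OffsetBBox{}, false // offset precedes every fragment
	}
	return positions[i-1], true
}

func main() {
	// Made-up fragments for a page whose extracted text is "Hello world again".
	positions := []OffsetBBox{
		{Offset: 0, Llx: 72, Lly: 700, Urx: 100, Ury: 712},
		{Offset: 6, Llx: 104, Lly: 700, Urx: 134, Ury: 712},
		{Offset: 12, Llx: 138, Lly: 700, Urx: 168, Ury: 712},
	}
	if frag, ok := findFragment(positions, 8); ok {
		fmt.Printf("offset 8 falls in the fragment starting at %d\n", frag.Offset)
	}
}
```

A binary search keeps the lookup logarithmic in the number of fragments, which matters when a page has many of them.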

func ReadDocPageLocations

func ReadDocPageLocations(buf []byte) ([]OffsetBBox, error)

func ReadTextLocation

func ReadTextLocation(buf []byte) OffsetBBox

func (OffsetBBox) BBox

func (t OffsetBBox) BBox() model.PdfRectangle

BBox returns `t` as a UniDoc rectangle. This is convenient for drawing bounding rectangles around text in a PDF file.

func (OffsetBBox) Equals

func (t OffsetBBox) Equals(u OffsetBBox) bool

Equals returns true if `t` has the same text offset and bounding box as `u`.

type SerialBlevePdf

type SerialBlevePdf struct {
	NumFiles uint32
	NumPages uint32
	HIPDs    []HashIndexPathDoc
}

SerialBlevePdf is for serializing and deserializing doclib.BlevePdf. It corresponds to the following flatbuffers schema.

table PdfIndex  {
	num_files:   uint32;
	num_pages:   uint32;
	hipd:       [HashIndexPathDoc];
}

func ReadSerialBlevePdf

func ReadSerialBlevePdf(buf []byte) (SerialBlevePdf, error)

ReadSerialBlevePdf converts byte array `buf` into a SerialBlevePdf. TODO: write round-trip tests.
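
A round-trip test could take roughly the following shape: encode a value, decode the bytes, and compare with the original. To stay runnable with only the standard library, this sketch uses encoding/gob as a stand-in codec instead of flatbuffers. `encode`, `decode` and `roundTrips` are hypothetical names, and in a real test the first two would be replaced by WriteSerialBlevePdf and ReadSerialBlevePdf.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"reflect"
)

// Local copies of the structs documented above.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

type DocPositions struct {
	Path          string
	DocIdx        uint64
	PagePositions [][]OffsetBBox
	PageNums      []uint32
	PageTexts     []string
}

type HashIndexPathDoc struct {
	Hash  string
	Index uint64
	Path  string
	Doc   DocPositions
}

type SerialBlevePdf struct {
	NumFiles uint32
	NumPages uint32
	HIPDs    []HashIndexPathDoc
}

// encode is a gob stand-in for WriteSerialBlevePdf (assumption, not flatbuffers).
func encode(spi SerialBlevePdf) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(spi); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode is a gob stand-in for ReadSerialBlevePdf.
func decode(b []byte) (SerialBlevePdf, error) {
	var spi SerialBlevePdf
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&spi)
	return spi, err
}

// roundTrips reports whether spi survives encode followed by decode unchanged.
func roundTrips(spi SerialBlevePdf) (bool, error) {
	b, err := encode(spi)
	if err != nil {
		return false, err
	}
	got, err := decode(b)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(spi, got), nil
}

func main() {
	// Made-up sample with every slice non-empty.
	spi := SerialBlevePdf{
		NumFiles: 1,
		NumPages: 1,
		HIPDs: []HashIndexPathDoc{{
			Hash: "aa11", Index: 0, Path: "a.pdf",
			Doc: DocPositions{
				Path:          "a.pdf",
				PagePositions: [][]OffsetBBox{{{Offset: 0, Llx: 72, Lly: 700, Urx: 100, Ury: 712}}},
				PageNums:      []uint32{1},
				PageTexts:     []string{"Hello"},
			},
		}},
	}
	ok, err := roundTrips(spi)
	fmt.Println(ok, err)
}
```

The sample keeps every slice non-empty because reflect.DeepEqual treats a nil slice and an empty slice as different, and a codec may not preserve that distinction.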
