process

package
v0.0.0-...-876d392
Warning

This package is not in the latest version of its module.

Published: Aug 20, 2020 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ExtMap = map[string]ExtConst{
	"application/pdf": PDF,
	"text/plain":      OFFICE,
	"application/rtf": OFFICE,
	"application/vnd.oasis.opendocument.text-template": OFFICE,
	"application/msword": OFFICE,
	"application/vnd.openxmlformats-officedocument.wordprocessingml.document":   OFFICE,
	"application/vnd.oasis.opendocument.text":                                   OFFICE,
	"application/vnd.ms-excel":                                                  OFFICE,
	"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":         OFFICE,
	"application/vnd.oasis.opendocument.spreadsheet":                            OFFICE,
	"application/vnd.ms-powerpoint":                                             OFFICE,
	"application/vnd.openxmlformats-officedocument.presentationml.presentation": OFFICE,
	"application/vnd.oasis.opendocument.presentation":                           OFFICE,
	"text/html": URL,
}

ExtMap maps Content-Type values to ExtConst

Functions

func InjestFile

func InjestFile(ctx context.Context, file types.FileI, contenttype string, stream io.Reader, dbconfig database.Database) (fs *types.FileStore, err error)

InjestFile builds a file and a file store from the data and adds them to the database

func SentenceSplitter

func SentenceSplitter(data []byte, atEOF bool) (advance int, token []byte, err error)

SentenceSplitter implements the scanner split function type (bufio.SplitFunc) for splitting text into sentences
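Because the signature matches bufio.SplitFunc, a function of this shape can be handed to bufio.Scanner.Split. The following is a minimal sketch of that pattern, not the package's implementation: splitSentences and sentences are hypothetical names, and the rule "a sentence ends at '.', '!' or '?'" is an assumption that will mis-split abbreviations and decimals.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitSentences is a hypothetical bufio.SplitFunc. It emits one token per
// sentence, assuming a sentence ends at '.', '!' or '?'.
func splitSentences(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Skip leading whitespace so each token starts at its first word.
	start := 0
	for start < len(data) && strings.ContainsRune(" \t\r\n", rune(data[start])) {
		start++
	}
	// A terminator inside the buffer completes a sentence.
	if i := bytes.IndexAny(data[start:], ".!?"); i >= 0 {
		end := start + i + 1
		return end, data[start:end], nil
	}
	// At end of input, emit any remaining text as the final sentence.
	if atEOF {
		if rest := bytes.TrimSpace(data[start:]); len(rest) > 0 {
			return len(data), rest, nil
		}
		return len(data), nil, nil
	}
	// Otherwise drop the skipped whitespace and ask the scanner for more data.
	return start, nil, nil
}

// sentences runs the split function over text and collects the tokens.
func sentences(text string) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Split(splitSentences)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	for _, s := range sentences("Tika extracts text. Is it split? Yes") {
		fmt.Println(s)
	}
}
```

With the package itself, the equivalent call would presumably be sc.Split(process.SentenceSplitter), where process is whatever name the package is imported under.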

Types

type ContentExtractor

type ContentExtractor tika.Client

ContentExtractor is a connector to a Tika server that builds content lines from file streams

func NewContentExtractor

func NewContentExtractor(httpClient *http.Client, urlString string) *ContentExtractor

NewContentExtractor connects to a tika server at a given address

func (*ContentExtractor) ExtractCSV

func (ce *ContentExtractor) ExtractCSV(ctx context.Context, filecontent io.Reader) ([]types.ContentLine, error)

ExtractCSV processes a byte stream, assuming it contains comma- or tab-separated values

func (*ContentExtractor) ExtractText

func (ce *ContentExtractor) ExtractText(ctx context.Context, filecontent io.Reader) ([]types.ContentLine, error)

ExtractText processes a file stream, assuming it is some kind of text file

type ExtConst

type ExtConst int

ExtConst is an enum type indicating how a particular file type is to be processed

const (
	// Convertible to PDF
	OFFICE ExtConst = iota
	// URL to a website to convert to PDF
	URL
	// Already a PDF
	PDF
)

Values of ExtConst

func IdentifyFileAction

func IdentifyFileAction(name string, ctype string) ExtConst

IdentifyFileAction generates the ExtConst based on the file header

func MapContentType

func MapContentType(ct string) ExtConst

MapContentType converts the Content-Type header value to the associated ExtConst
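A Content-Type header often carries parameters such as "; charset=utf-8", and indexing a Go map with a missing key silently yields the zero value. The sketch below shows one way to handle both, using mime.ParseMediaType and the comma-ok lookup form. It is an illustration rather than the package's code: lookup is a hypothetical helper, the constants and map are trimmed local copies, and the constant values are assumed.

```go
package main

import (
	"fmt"
	"mime"
)

// Local copies for illustration only; the numeric values are an assumption.
type ExtConst int

const (
	OFFICE ExtConst = iota
	URL
	PDF
)

// extMap is a trimmed copy of the ExtMap table shown above.
var extMap = map[string]ExtConst{
	"application/pdf":    PDF,
	"text/plain":         OFFICE,
	"application/msword": OFFICE,
	"text/html":          URL,
}

// lookup is a hypothetical helper: it strips header parameters, then reports
// the mapped action and whether the media type was found at all.
func lookup(header string) (ExtConst, bool) {
	// ParseMediaType lowercases the type and drops "; charset=..." and similar.
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return 0, false
	}
	action, ok := extMap[mediaType]
	return action, ok
}

func main() {
	for _, h := range []string{"text/html; charset=utf-8", "application/pdf", "image/png"} {
		action, ok := lookup(h)
		fmt.Printf("%q -> action=%d known=%v\n", h, action, ok)
	}
}
```

Returning the extra boolean lets a caller tell an unmapped type apart from a type that legitimately maps to the zero-valued constant.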

type FileConverter

type FileConverter gotenberg.Client

FileConverter is a connector to gotenberg that converts a wide variety of files into PDFs

func NewFileConverter

func NewFileConverter(url string) *FileConverter

NewFileConverter builds a new connector to the gotenberg server at url

func (*FileConverter) ConvertOffice

func (fc *FileConverter) ConvertOffice(inputName string, in []byte) (out []byte, err error)

ConvertOffice converts many common file types into PDFs

func (*FileConverter) ConvertURL

func (fc *FileConverter) ConvertURL(url string) (out []byte, err error)

ConvertURL produces a PDF version of the webpage at the given address
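The three ExtConst values suggest a dispatch step: pass PDFs through, send office documents to ConvertOffice, and send web addresses to ConvertURL. The sketch below shows that shape under stated assumptions. The converter interface, toPDF, and fakeConverter are hypothetical names introduced here; the constants are local copies with assumed values; and treating the payload as the address for URL inputs is a guess, not documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies for illustration only; the numeric values are an assumption.
type ExtConst int

const (
	OFFICE ExtConst = iota
	URL
	PDF
)

// converter is a hypothetical interface covering the two documented
// FileConverter methods, so the dispatch can run without a gotenberg server.
type converter interface {
	ConvertOffice(inputName string, in []byte) ([]byte, error)
	ConvertURL(url string) ([]byte, error)
}

// toPDF picks a conversion path based on the action.
func toPDF(c converter, action ExtConst, name string, data []byte) ([]byte, error) {
	switch action {
	case PDF:
		return data, nil // already a PDF, nothing to convert
	case OFFICE:
		return c.ConvertOffice(name, data)
	case URL:
		// Assumption: for URL inputs the payload holds the address itself.
		return c.ConvertURL(string(data))
	default:
		return nil, fmt.Errorf("unsupported action %d", action)
	}
}

// fakeConverter stands in for a real converter and returns labelled bytes.
type fakeConverter struct{}

func (fakeConverter) ConvertOffice(name string, in []byte) ([]byte, error) {
	return []byte("converted office file " + name), nil
}

func (fakeConverter) ConvertURL(url string) ([]byte, error) {
	if url == "" {
		return nil, errors.New("empty url")
	}
	return []byte("converted page " + url), nil
}

func main() {
	out, err := toPDF(fakeConverter{}, OFFICE, "report.docx", []byte("raw bytes"))
	fmt.Println(string(out), err)
}
```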
