sources

package
v0.0.0-...-78f357f
Published: Jul 22, 2023 License: MIT Imports: 9 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrInvalidID = errors.New("Invalid ID(s)")

ErrInvalidID means that an invalid ID was passed to a query.

var ErrNoDocument = errors.New("No matching document")

ErrNoDocument means that no document matches the query.

var ErrNotImplemented = errors.New("Not implemented")

ErrNotImplemented is for methods that a given datastore cannot sensibly implement.

Functions

This section is empty.

Types

type DirectoryStore

type DirectoryStore struct {
	Path string
}

DirectoryStore is a datastore for sources containing plain text files in a directory. The filenames will be used as the identifiers.

func NewDirectoryStore

func NewDirectoryStore(path string) (*DirectoryStore, error)

NewDirectoryStore creates a new DirectoryStore for the directory at path.

func (*DirectoryStore) GetDocFromPath

func (ds *DirectoryStore) GetDocFromPath(ctx context.Context, id string) (*Doc, error)

GetDocFromPath takes a file name and returns a document representation by reading the file contents from the directory.

func (*DirectoryStore) GetTreatisePage

func (ds *DirectoryStore) GetTreatisePage(context.Context, string, string) (*TreatisePage, error)

GetTreatisePage is not implemented for a DirectoryStore.

type Doc

type Doc struct {
	Identifier string
	FullText   string
}

Doc is a simple document which has no pages, parent, or other subdivisions.

func NewDoc

func NewDoc(id string, text string) *Doc

NewDoc creates a new Doc with the given ID and text.

func (*Doc) CorrectOCR

func (d *Doc) CorrectOCR(subs []*OCRSubstitution)

CorrectOCR takes a set of OCR mistakes to be corrected via substitution and fixes them in the document.
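
Substitution-based correction of this kind can be sketched with strings.ReplaceAll. The code below is an assumption about how it might work, not the package's implementation: the types are local copies, correctOCR is a hypothetical stand-in for CorrectOCR, and skipping empty mistakes is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the package's types, so the sketch is self-contained.
type OCRSubstitution struct {
	Mistake    string
	Correction string
}

type Doc struct {
	Identifier string
	FullText   string
}

// correctOCR is a hypothetical stand-in for Doc.CorrectOCR. It applies the
// substitutions in order, replacing every occurrence of each mistake.
func (d *Doc) correctOCR(subs []*OCRSubstitution) {
	for _, s := range subs {
		// An empty Mistake would make ReplaceAll insert the correction
		// between every character, so skip it (and nil entries).
		if s == nil || s.Mistake == "" {
			continue
		}
		d.FullText = strings.ReplaceAll(d.FullText, s.Mistake, s.Correction)
	}
}

func main() {
	d := &Doc{Identifier: "a.txt", FullText: "the plaintifl and the defendaut"}
	d.correctOCR([]*OCRSubstitution{
		{Mistake: "plaintifl", Correction: "plaintiff"},
		{Mistake: "defendaut", Correction: "defendant"},
	})
	fmt.Println(d.FullText)
}
```

Because the substitutions run in sequence, a later substitution sees the output of earlier ones, so the order of the list can matter.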

func (*Doc) HasParent

func (d *Doc) HasParent() bool

HasParent returns false, because a Doc by definition has no parent document. Used to satisfy the Document interface.

func (*Doc) ID

func (d *Doc) ID() string

ID returns the ID of the document.

func (*Doc) ParentID

func (d *Doc) ParentID() string

ParentID returns an empty string, because a Doc by definition has no parent document. Used to satisfy the Document interface.

func (Doc) String

func (d Doc) String() string

String returns a string representation of the document.

func (*Doc) Text

func (d *Doc) Text() string

Text returns the full text of the document.

type Document

type Document interface {
	ID() string
	ParentID() string
	HasParent() bool
	Text() string
	CorrectOCR([]*OCRSubstitution)
}

Document models a source document: it has an ID, an optional parent document, and full text, and it supports OCR correction.
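
Code written against the Document interface can handle standalone documents and pages of a larger work uniformly. The following sketch copies the interface locally; the page type and the label function are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the package's declarations, so the sketch is self-contained.
type OCRSubstitution struct {
	Mistake    string
	Correction string
}

type Document interface {
	ID() string
	ParentID() string
	HasParent() bool
	Text() string
	CorrectOCR([]*OCRSubstitution)
}

// page is a hypothetical implementation used only in this sketch.
type page struct {
	id, parent, text string
}

func (p *page) ID() string       { return p.id }
func (p *page) ParentID() string { return p.parent }
func (p *page) HasParent() bool  { return p.parent != "" }
func (p *page) Text() string     { return p.text }
func (p *page) CorrectOCR(subs []*OCRSubstitution) {
	for _, s := range subs {
		p.text = strings.ReplaceAll(p.text, s.Mistake, s.Correction)
	}
}

// Compile-time check that *page satisfies Document.
var _ Document = (*page)(nil)

// label builds a display key from any Document, prefixing the parent ID
// when there is one.
func label(d Document) string {
	if d.HasParent() {
		return d.ParentID() + "/" + d.ID()
	}
	return d.ID()
}

func main() {
	docs := []Document{
		&page{id: "a.txt", text: "standalone"},
		&page{id: "p1", parent: "t1", text: "first page"},
	}
	for _, d := range docs {
		fmt.Println(label(d), d.Text())
	}
}
```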

type OCRSubstitution

type OCRSubstitution struct {
	Mistake    string
	Correction string
}

OCRSubstitution represents an OCR correction that should be made to an input document via simple string substitution.

func OCRSubstitutionsFromCSV

func OCRSubstitutionsFromCSV(path string) ([]*OCRSubstitution, error)

OCRSubstitutionsFromCSV reads OCR substitutions from a CSV file.

type PgxStore

type PgxStore struct {
	DB *pgxpool.Pool
}

PgxStore is a datastore for sources contained in a PostgreSQL database using the pgx driver.

func NewPgxStore

func NewPgxStore(db *pgxpool.Pool) *PgxStore

NewPgxStore creates a new datastore backed by the given database connection pool.

func (*PgxStore) GetAllTreatisePageIDs

func (p *PgxStore) GetAllTreatisePageIDs(ctx context.Context) ([]*TreatisePage, error)

GetAllTreatisePageIDs gets all the IDs (both treatise and page) for the treatises. The FullText field of each returned TreatisePage is left empty.

func (*PgxStore) GetDocFromPath

func (p *PgxStore) GetDocFromPath(context.Context, string, string) (*Doc, error)

GetDocFromPath is not implemented for this datastore. It will always return an error.

func (*PgxStore) GetOCRSubstitutions

func (p *PgxStore) GetOCRSubstitutions(ctx context.Context) ([]*OCRSubstitution, error)

GetOCRSubstitutions gets the complete list of OCR substitutions from the database.

func (*PgxStore) GetTreatisePage

func (p *PgxStore) GetTreatisePage(ctx context.Context, treatiseID string, pageID string) (*TreatisePage, error)

GetTreatisePage gets a TreatisePage given the ID of the treatise and the ID of the page.

type Store

type Store interface {
	GetDocFromPath(ctx context.Context, id string) (*Doc, error)
	GetTreatisePage(ctx context.Context, treatiseID string, pageID string) (*TreatisePage, error)
	GetAllTreatisePageIDs(ctx context.Context) ([]*TreatisePage, error)
	GetOCRSubstitutions(ctx context.Context) ([]*OCRSubstitution, error)
}

Store describes a datastore for sources.

type TreatisePage

type TreatisePage struct {
	PageID     string
	TreatiseID string
	FullText   string
}

TreatisePage represents a page from a treatise in MOML.

func NewTreatisePage

func NewTreatisePage(pageID string, treatiseID, text string) *TreatisePage

NewTreatisePage creates a new TreatisePage from the page ID, the treatise ID, and the page text.

func (*TreatisePage) CorrectOCR

func (t *TreatisePage) CorrectOCR(subs []*OCRSubstitution)

CorrectOCR takes a set of OCR mistakes to be corrected via substitution and fixes them in the page.

func (*TreatisePage) HasParent

func (t *TreatisePage) HasParent() bool

HasParent returns true, because a page from a treatise by definition has a parent document. Used to satisfy the Document interface.

func (*TreatisePage) ID

func (t *TreatisePage) ID() string

ID returns the ID of the page from the treatise.

func (*TreatisePage) ParentID

func (t *TreatisePage) ParentID() string

ParentID returns the ID of the treatise. Used to satisfy the Document interface.

func (TreatisePage) String

func (t TreatisePage) String() string

String returns a string representation of the document.

func (*TreatisePage) Text

func (t *TreatisePage) Text() string

Text returns the full text of the page of the treatise.
