Documentation ¶
Index ¶
- Variables
- type DirectoryStore
- type Doc
- type Document
- type OCRSubstitution
- type PgxStore
- func (p *PgxStore) GetAllTreatisePageIDs(ctx context.Context) ([]*TreatisePage, error)
- func (p *PgxStore) GetDocFromPath(context.Context, string, string) (*Doc, error)
- func (p *PgxStore) GetOCRSubstitutions(ctx context.Context) ([]*OCRSubstitution, error)
- func (p *PgxStore) GetTreatisePage(ctx context.Context, treatiseID string, pageID string) (*TreatisePage, error)
- type Store
- type TreatisePage
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidID = errors.New("Invalid ID(s)")
ErrInvalidID means an improper ID has been passed to a query.
var ErrNoDocument = errors.New("No matching document")
ErrNoDocument means no document matches the query.
var ErrNotImplemented = errors.New("Not implemented")
ErrNotImplemented is for methods for which implementation does not make sense.
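Because these are sentinel errors, callers would typically test for them with errors.Is rather than by comparing message strings. The following is a minimal sketch under stated assumptions: the three variables are local stand-ins that mirror the declarations above, and lookup and classify are hypothetical helpers that are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the package's sentinel errors, so the
// example compiles without importing the package itself.
var (
	ErrInvalidID      = errors.New("Invalid ID(s)")
	ErrNoDocument     = errors.New("No matching document")
	ErrNotImplemented = errors.New("Not implemented")
)

// lookup is a hypothetical helper that wraps a sentinel error with context.
func lookup(id string) error {
	if id == "" {
		return fmt.Errorf("lookup %q: %w", id, ErrInvalidID)
	}
	return fmt.Errorf("lookup %q: %w", id, ErrNoDocument)
}

// classify is a hypothetical helper that maps an error to a short label.
// errors.Is sees through wrapping done with %w.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrInvalidID):
		return "invalid id"
	case errors.Is(err, ErrNoDocument):
		return "no document"
	case errors.Is(err, ErrNotImplemented):
		return "not implemented"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(lookup("")))
	fmt.Println(classify(lookup("missing")))
	fmt.Println(classify(ErrNotImplemented))
}
```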
Functions ¶
This section is empty.
Types ¶
type DirectoryStore ¶
type DirectoryStore struct {
Path string
}
DirectoryStore is a datastore for sources containing plain text files in a directory. The filenames will be used as the identifiers.
func NewDirectoryStore ¶
func NewDirectoryStore(path string) (*DirectoryStore, error)
NewDirectoryStore creates a new DirectoryStore rooted at the given path.
func (*DirectoryStore) GetDocFromPath ¶
GetDocFromPath takes a file name and returns a document representation by reading the file contents from the directory.
func (*DirectoryStore) GetTreatisePage ¶
func (ds *DirectoryStore) GetTreatisePage(context.Context, string, string) (*TreatisePage, error)
GetTreatisePage is not implemented for a DirectoryStore
type Doc ¶
Doc is a simple document which has no pages, parent, or other subdivisions.
func (*Doc) CorrectOCR ¶
func (d *Doc) CorrectOCR(subs []*OCRSubstitution)
CorrectOCR takes a set of OCR mistakes to be corrected via substitution and fixes them in the document.
func (*Doc) HasParent ¶
HasParent returns false, because a Doc by definition has no parent document. Used to satisfy the Document interface.
func (*Doc) ParentID ¶
ParentID returns an empty string, because a Doc by definition has no parent document. Used to satisfy the Document interface.
type Document ¶
type Document interface {
	ID() string
	ParentID() string
	HasParent() bool
	Text() string
	CorrectOCR([]*OCRSubstitution)
}
Document models a document.
type OCRSubstitution ¶
OCRSubstitution represents an OCR correction that should be made to an input document via simple string substitution.
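Correction by simple string substitution can be sketched as below. The fields of OCRSubstitution are not shown in this documentation, so the substitution type with From and To fields, and the applySubstitutions helper, are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// substitution is a hypothetical stand-in for OCRSubstitution.
// The field names From and To are assumptions.
type substitution struct {
	From string // the misrecognized text
	To   string // the corrected text
}

// applySubstitutions replaces every occurrence of each From with its To,
// applying the substitutions in slice order.
func applySubstitutions(text string, subs []*substitution) string {
	for _, s := range subs {
		// Skip nil entries and empty patterns; an empty pattern would
		// make strings.ReplaceAll insert To between every character.
		if s == nil || s.From == "" {
			continue
		}
		text = strings.ReplaceAll(text, s.From, s.To)
	}
	return text
}

func main() {
	subs := []*substitution{
		{From: "tbe", To: "the"},
		{From: "wbich", To: "which"},
	}
	fmt.Println(applySubstitutions("tbe court in wbich tbe case arose", subs))
}
```

Because the substitutions run in order, a later entry sees the output of earlier ones, so the order of the list can affect the result.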
func OCRSubstitutionsFromCSV ¶
func OCRSubstitutionsFromCSV(path string) ([]*OCRSubstitution, error)
OCRSubstitutionsFromCSV reads OCR substitutions from a CSV file.
type PgxStore ¶
PgxStore is a datastore for sources contained in a PostgreSQL database using the pgx driver.
func NewPgxStore ¶
NewPgxStore creates a new datastore backed by the database.
func (*PgxStore) GetAllTreatisePageIDs ¶
func (p *PgxStore) GetAllTreatisePageIDs(ctx context.Context) ([]*TreatisePage, error)
GetAllTreatisePageIDs gets all the IDs (both document and page) for the treatises. The returned TreatisePage values carry the IDs only; their full text is empty.
func (*PgxStore) GetDocFromPath ¶
GetDocFromPath is not implemented for this datastore. It will always return an error.
func (*PgxStore) GetOCRSubstitutions ¶
func (p *PgxStore) GetOCRSubstitutions(ctx context.Context) ([]*OCRSubstitution, error)
GetOCRSubstitutions gets a complete list of OCR substitutions from the database.
func (*PgxStore) GetTreatisePage ¶
func (p *PgxStore) GetTreatisePage(ctx context.Context, treatiseID string, pageID string) (*TreatisePage, error)
GetTreatisePage gets a TreatisePage given the ID of the treatise and the ID of the page.
type Store ¶
type Store interface {
	GetDocFromPath(ctx context.Context, id string) (*Doc, error)
	GetTreatisePage(ctx context.Context, treatiseID string, pageID string) (*TreatisePage, error)
	GetAllTreatisePageIDs(ctx context.Context) ([]*TreatisePage, error)
	GetOCRSubstitutions(ctx context.Context) ([]*OCRSubstitution, error)
}
Store describes a datastore for sources.
type TreatisePage ¶
TreatisePage represents a page from a treatise in MOML.
func NewTreatisePage ¶
func NewTreatisePage(pageID string, treatiseID, text string) *TreatisePage
NewTreatisePage creates a new TreatisePage from a page ID, a treatise ID, and the page text.
func (*TreatisePage) CorrectOCR ¶
func (t *TreatisePage) CorrectOCR(subs []*OCRSubstitution)
CorrectOCR takes a set of OCR mistakes to be corrected via substitution and fixes them in the document.
func (*TreatisePage) HasParent ¶
func (t *TreatisePage) HasParent() bool
HasParent returns true, because a page from a treatise by definition has a parent document. Used to satisfy the Document interface.
func (*TreatisePage) ID ¶
func (t *TreatisePage) ID() string
ID returns the ID of the page from the treatise.
func (*TreatisePage) ParentID ¶
func (t *TreatisePage) ParentID() string
ParentID returns the ID of the treatise. Used to satisfy the Document interface.
func (TreatisePage) String ¶
func (t TreatisePage) String() string
String returns a string representation of the document.
func (*TreatisePage) Text ¶
func (t *TreatisePage) Text() string
Text returns the full text of the page of the treatise.