corpus

package
v0.0.80 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 19, 2023 License: Apache-2.0 Imports: 9 Imported by: 1

Documentation

Overview

Package for scanning the corpus collections

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetOutfileMap

func GetOutfileMap(loader CorpusLoader) (*map[string]CorpusEntry, error)

Gets a map of corpus entries keyed by output (HTML) file name.

Param:

loader: the CorpusLoader that supplies the corpus entries

Returns

A map with keys being the output file names, or an error
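The indexing step described above can be sketched in isolation. This is a minimal sketch, not the package's implementation: `outfileMap` is a hypothetical helper, the `CorpusEntry` type is a local copy of the documented struct, and treating `GlossFile` as the output (HTML) file name is an assumption.

```go
package main

import "fmt"

// CorpusEntry is a local copy of the fields documented for the package's
// CorpusEntry type, so that this sketch compiles on its own.
type CorpusEntry struct {
	RawFile, GlossFile, Title, ColTitle, ColFile string
}

// outfileMap is a hypothetical helper that indexes entries by GlossFile.
// Assumption: GlossFile holds the output (HTML) file name.
func outfileMap(entries []CorpusEntry) map[string]CorpusEntry {
	m := make(map[string]CorpusEntry, len(entries))
	for _, e := range entries {
		m[e.GlossFile] = e
	}
	return m
}

func main() {
	entries := []CorpusEntry{
		{RawFile: "text/ch01.txt", GlossFile: "html/ch01.html", Title: "Chapter 1"},
		{RawFile: "text/ch02.txt", GlossFile: "html/ch02.html", Title: "Chapter 2"},
	}
	m := outfileMap(entries)
	// Look up the source file that produced a given HTML page.
	fmt.Println(m["html/ch02.html"].RawFile)
}
```

If two entries share an output file name, the later one overwrites the earlier one in this sketch.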

func IsExcluded

func IsExcluded(excluded map[string]bool, text string) bool

Tests whether the string should be excluded from corpus analysis.

Parameter text: the string to be tested

func LoadExcluded

func LoadExcluded(file io.Reader) (*map[string]bool, error)

func ReadIntroFile

func ReadIntroFile(r io.Reader) string

Reads a text file introducing the collection. The file should be a plain text file. HTML breaks will be added for line breaks.

Parameter r: a reader over the text introducing the collection

func ReadText added in v0.0.25

func ReadText(r io.Reader) string

Reads a Chinese text file

Types

type CollectionEntry

type CollectionEntry struct {
	CollectionFile, GlossFile, Title, Summary, Intro, DateUpdated, Corpus string
	CorpusEntries                                                         []CorpusEntry
	AnalysisFile, Format, Date, Genre                                     string
}

type CorpusConfig

type CorpusConfig struct {
	CorpusDataDir string
	CorpusDir     string
	Excluded      map[string]bool
	ProjectHome   string
	// contains filtered or unexported fields
}

CorpusConfig encapsulates parameters for corpus configuration

func NewFileCorpusConfig added in v0.0.22

func NewFileCorpusConfig(corpusDataDir, corpusDir string,
	excluded map[string]bool, projectHome string) CorpusConfig

Creates a new CorpusConfig struct

type CorpusEntry

type CorpusEntry struct {
	RawFile, GlossFile, Title, ColTitle, ColFile string
}

An entry in a collection

func NewCorpusEntry

func NewCorpusEntry() *CorpusEntry

Constructor for an empty CorpusEntry

type CorpusLoader

type CorpusLoader interface {

	// Method to get the corpus configuration
	GetConfig() CorpusConfig

	// Method to get a single entry in a collection
	// Param:
	//   fName: The file name of the collection
	// Returns
	//   A CollectionEntry encapsulating the collection or an error
	GetCollectionEntry(fName string) (*CollectionEntry, error)

	// Method to load the entries in a collection
	// Param:
	//   fName: A file name containing the entries in the collection
	//   colTitle: The title of the collection
	LoadCollection(fName, colTitle string) (*[]CorpusEntry, error)

	// Method to load the collections in a corpus from the default file
	LoadCollections() (*[]CollectionEntry, error)

	// Method to load the collections in a corpus
	// Parameter:
	//  r: to read the listing of the collections
	LoadCorpus(r io.Reader) (*[]CollectionEntry, error)

	// Method to read the contents of a corpus entry
	// Parameter:
	//  srcFile: the name of the source file to read
	ReadText(srcFile string) (string, error)
}

Interface for loading corpus with hierarchical collections of documents
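Because the loader is an interface, code that walks the corpus can be exercised without touching the file system. The sketch below is illustrative only: `textLoader` is a hypothetical, reduced version of CorpusLoader with three of its methods, `memLoader`, `countRunes`, and `newSampleLoader` are made-up names, and the struct types are local copies limited to the fields used here.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of two documented types, reduced to the fields used here.
type CorpusEntry struct {
	RawFile, GlossFile, Title, ColTitle, ColFile string
}

type CollectionEntry struct {
	CollectionFile, GlossFile, Title string
	CorpusEntries                    []CorpusEntry
}

// textLoader is a hypothetical subset of the CorpusLoader interface.
type textLoader interface {
	LoadCollections() (*[]CollectionEntry, error)
	LoadCollection(fName, colTitle string) (*[]CorpusEntry, error)
	ReadText(srcFile string) (string, error)
}

// memLoader serves collections and texts from memory instead of files.
type memLoader struct {
	collections []CollectionEntry
	entries     map[string][]CorpusEntry // keyed by collection file name
	texts       map[string]string        // keyed by source file name
}

func (m memLoader) LoadCollections() (*[]CollectionEntry, error) {
	cols := m.collections
	return &cols, nil
}

func (m memLoader) LoadCollection(fName, colTitle string) (*[]CorpusEntry, error) {
	src, ok := m.entries[fName]
	if !ok {
		return nil, errors.New("unknown collection: " + fName)
	}
	out := make([]CorpusEntry, len(src))
	for i, e := range src {
		e.ColTitle, e.ColFile = colTitle, fName
		out[i] = e
	}
	return &out, nil
}

func (m memLoader) ReadText(srcFile string) (string, error) {
	text, ok := m.texts[srcFile]
	if !ok {
		return "", errors.New("unknown source file: " + srcFile)
	}
	return text, nil
}

// countRunes walks collections, then entries, then texts, and sums the
// number of runes (characters) across every text.
func countRunes(l textLoader) (int, error) {
	cols, err := l.LoadCollections()
	if err != nil {
		return 0, err
	}
	total := 0
	for _, col := range *cols {
		entries, err := l.LoadCollection(col.CollectionFile, col.Title)
		if err != nil {
			return 0, err
		}
		for _, e := range *entries {
			text, err := l.ReadText(e.RawFile)
			if err != nil {
				return 0, err
			}
			total += len([]rune(text))
		}
	}
	return total, nil
}

// newSampleLoader builds a one-collection corpus with two short texts.
func newSampleLoader() memLoader {
	return memLoader{
		collections: []CollectionEntry{{CollectionFile: "col.csv", Title: "Sample"}},
		entries: map[string][]CorpusEntry{
			"col.csv": {{RawFile: "a.txt"}, {RawFile: "b.txt"}},
		},
		texts: map[string]string{"a.txt": "天地玄黃", "b.txt": "宇宙"},
	}
}

func main() {
	n, err := countRunes(newSampleLoader())
	fmt.Println(n, err)
}
```

The three-level walk mirrors the hierarchy the interface describes: a corpus lists collections, a collection lists entries, and each entry points at a source text.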

func NewFileCorpusLoader added in v0.0.22

func NewFileCorpusLoader(corpusConfig CorpusConfig) CorpusLoader

NewFileCorpusLoader gets the default kind of CorpusLoader
