Documentation ¶
Index ¶
- Constants
- func CombineURLs(urls ...string) string
- func ExpandBraces(str string) ([]string, error)
- func NormalizeString(str string, sourround bool) string
- func WithEdges(cs ...*Concept) func(*Concept)
- func WithID(id int) func(*Concept)
- type Concept
- func (c *Concept) Ambig() bool
- func (c *Concept) EachEdge(f func(Edge))
- func (c *Concept) EdgeAt(i int) Edge
- func (c Concept) Edges() []Edge
- func (c *Concept) EdgesLen() int
- func (c *Concept) FindEdge(p, o string) (Edge, bool)
- func (c *Concept) GobDecode(bs []byte) error
- func (c *Concept) GobEncode() ([]byte, error)
- func (c *Concept) ID() int32
- func (c *Concept) MarshalJSON() ([]byte, error)
- func (c *Concept) ShortName() string
- func (c *Concept) ShortURL() string
- func (c *Concept) String() string
- func (c *Concept) URL() string
- func (c *Concept) UnmarshalJSON(b []byte) error
- type DFA
- type DFAMatcher
- type Dictionary
- type Document
- type Edge
- type EdgeSet
- type FileDocument
- type FuzzyDFA
- type FuzzyDFAMatcher
- type Graph
- type HTTPDocument
- type HandleAmbigsFunc
- type MatchPos
- type Matcher
- type Parser
- type ReaderDocument
- type RegexMatcher
- type Resource
- type RulesDictionary
- type Stream
- type StreamToken
- type Token
- type Traits
- type Triple
- type URLRegister
- func (r *URLRegister) GobDecode(bs []byte) error
- func (r *URLRegister) GobEncode() ([]byte, error)
- func (r *URLRegister) LookupID(id int) (string, bool)
- func (r *URLRegister) LookupURL(url string) (int, bool)
- func (r *URLRegister) Register(url string) int
- func (r *URLRegister) Write(path string) error
Constants ¶
const (
	// SplitURL is the name of predicates that denote ambiguous connections
	// in the concept graph.
	SplitURL = "http://bitbucket.org/fflo/semix/pkg/semix/a-star"
)
Variables ¶
This section is empty.
Functions ¶
func CombineURLs ¶
CombineURLs combines two or more URLs. If urls is empty, the empty string is returned. If urls contains exactly one URL, that URL is returned.
func ExpandBraces ¶
ExpandBraces expands braces in a given string using a bash-like syntax.
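The following is a minimal sketch of bash-like brace expansion, not the package's implementation. The helper name expandSketch and its error messages are hypothetical, and the sketch assumes that alternatives are separated by commas, that groups may nest, and that unbalanced braces are reported as an error. Unlike bash, it also expands a group without a comma.

```go
package main

import (
	"errors"
	"fmt"
)

// expandSketch expands the first top-level brace group of s and recurses on
// every resulting string. It is a hypothetical stand-in for ExpandBraces.
func expandSketch(s string) ([]string, error) {
	open, depth := -1, 0
	var commas []int // positions of the commas that separate alternatives
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '{':
			if depth == 0 {
				open = i
			}
			depth++
		case ',':
			if depth == 1 {
				commas = append(commas, i)
			}
		case '}':
			if depth == 0 {
				return nil, errors.New("unbalanced closing brace")
			}
			depth--
			if depth > 0 {
				continue
			}
			// The group s[open:i+1] is complete: substitute each alternative.
			var res []string
			start := open + 1
			for _, end := range append(commas, i) {
				exp, err := expandSketch(s[:open] + s[start:end] + s[i+1:])
				if err != nil {
					return nil, err
				}
				res = append(res, exp...)
				start = end + 1
			}
			return res, nil
		}
	}
	if depth != 0 {
		return nil, errors.New("unbalanced opening brace")
	}
	return []string{s}, nil
}

func main() {
	fmt.Println(expandSketch("a{b,c}d"))     // [abd acd] <nil>
	fmt.Println(expandSketch("x{a,b{c,d}}")) // [xa xbc xbd] <nil>
}
```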
func NormalizeString ¶
NormalizeString normalizes a given string. The normalization converts any non-empty sequence of punctuation or whitespace characters to exactly one whitespace character.
If sourround is true, the result string is surrounded with exactly one whitespace character on each side.
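The normalization described above can be sketched as follows. The helper normalizeSketch is hypothetical, and it assumes that "punctuation" and "whitespace" mean unicode.IsPunct and unicode.IsSpace and that leading and trailing runs are dropped before the optional surrounding spaces are added. The real function may differ on both points.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSketch collapses every run of punctuation or whitespace into a
// single space. It is a hypothetical stand-in for NormalizeString.
func normalizeSketch(str string, surround bool) string {
	// FieldsFunc splits at every run of separator runes and drops empty fields.
	fields := strings.FieldsFunc(str, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.IsPunct(r)
	})
	res := strings.Join(fields, " ")
	if surround {
		return " " + res + " "
	}
	return res
}

func main() {
	fmt.Printf("%q\n", normalizeSketch("Hello,   world!", false)) // "Hello world"
	fmt.Printf("%q\n", normalizeSketch("a--b", true))             // " a b "
}
```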
Types ¶
type Concept ¶
type Concept struct {
	Name string
	// contains filtered or unexported fields
}
Concept represents a concept in the concept graph. It consists of a unique URL, an optional (human-readable) name, a list of edges and a unique ID.
func HandleAmbigsWithMerge ¶
HandleAmbigsWithMerge handles ambiguities by creating a new distinct concept.
func HandleAmbigsWithSplit ¶
HandleAmbigsWithSplit handles an ambiguity by creating a new ambig split concept.
func NewConcept ¶
NewConcept creates a new Concept with the given URL and configuration functions.
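The index lists WithID and WithEdges as functions that return a func(*Concept), which suggests the functional-options pattern. A minimal sketch of that pattern follows. The types sketchConcept and option, and the assumption that the constructor takes the options as a variadic parameter, are illustrative only, since the signature of NewConcept is not shown here.

```go
package main

import "fmt"

// sketchConcept is a hypothetical stand-in for Concept.
type sketchConcept struct {
	Name string
	url  string
	id   int32
}

// option configures a sketchConcept during construction.
type option func(*sketchConcept)

// withID mirrors the shape of WithID: it returns a function that sets the ID.
func withID(id int) option {
	return func(c *sketchConcept) { c.id = int32(id) }
}

// newSketchConcept applies every option, in order, to a fresh concept.
func newSketchConcept(url string, opts ...option) *sketchConcept {
	c := &sketchConcept{url: url}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newSketchConcept("http://example.org/concept", withID(42))
	fmt.Println(c.url, c.id) // http://example.org/concept 42
}
```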
func (*Concept) GobDecode ¶
GobDecode decodes a concept from gob encoded binary data. Only the name, url and id are decoded.
func (*Concept) GobEncode ¶
GobEncode encodes a concept to gob encoded binary data. Only the name, url and id are encoded.
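One way to encode only a subset of fields is to route the data through a small wire struct inside GobEncode and GobDecode. The sketch below illustrates this with hypothetical types gobConcept and conceptWire. The actual wire format of Concept is not documented here, so the field layout is an assumption.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// gobConcept is a hypothetical stand-in for Concept.
type gobConcept struct {
	Name  string
	url   string
	id    int32
	edges []string // deliberately not encoded
}

// conceptWire holds exactly the fields that travel through gob.
type conceptWire struct {
	URL, Name string
	ID        int32
}

// GobEncode writes only the name, URL and ID.
func (c *gobConcept) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(conceptWire{URL: c.url, Name: c.Name, ID: c.id})
	return buf.Bytes(), err
}

// GobDecode restores the name, URL and ID and leaves the edges untouched.
func (c *gobConcept) GobDecode(bs []byte) error {
	var w conceptWire
	if err := gob.NewDecoder(bytes.NewReader(bs)).Decode(&w); err != nil {
		return err
	}
	c.url, c.Name, c.id = w.URL, w.Name, w.ID
	return nil
}

// roundTrip encodes c with gob and decodes the bytes into a new value.
func roundTrip(c *gobConcept) (*gobConcept, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(c); err != nil {
		return nil, err
	}
	out := new(gobConcept)
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	in := &gobConcept{Name: "a", url: "http://example.org/a", id: 7, edges: []string{"x"}}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out.Name, out.url, out.id, len(out.edges)) // a http://example.org/a 7 0
}
```

A consequence of this design is that the edges have to be rebuilt from the graph after decoding.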
func (*Concept) MarshalJSON ¶
MarshalJSON writes the concept to JSON. To avoid writing the whole graph of concepts, the edges of the concept are written as pairs of URLs and recursive links are omitted.
func (*Concept) ShortName ¶
ShortName returns a short, human-readable name for the concept. This does not need to be a unique identifier for this concept.
func (*Concept) ShortURL ¶
ShortURL returns a short version of the URL of this concept. The short URL is not necessarily unique.
func (*Concept) UnmarshalJSON ¶
UnmarshalJSON reads the concept from JSON. Since the edges are written as pairs of URLs, it is not possible to recreate the whole concept from JSON alone.
type DFA ¶
type DFA struct {
// contains filtered or unexported fields
}
DFA is a simple wrapper around a sparsetable.DFA. It maps the ids to Concepts.
func (DFA) Delta ¶
func (d DFA) Delta(s sparsetable.State, c byte) sparsetable.State
Delta executes one transition in the DFA.
func (DFA) Final ¶
func (d DFA) Final(s sparsetable.State) (*Concept, bool)
Final returns the found Concept and true iff s denotes a final state. Otherwise it returns nil and false.
func (*DFA) GobDecode ¶
GobDecode decodes the sparsetable.DFA of a DFA. It does not decode the graph.
func (DFA) GobEncode ¶
GobEncode encodes the sparsetable.DFA of a DFA. It does not encode the graph.
func (DFA) Initial ¶
func (d DFA) Initial() sparsetable.State
Initial returns the initial state of the DFA.
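The three methods Initial, Delta and Final are enough to run an input through the automaton byte by byte. The sketch below shows that loop on a toy trie-based DFA. The types state and toyDFA, the dead state -1, and the use of a URL string in place of *Concept are all assumptions made for illustration and do not reflect sparsetable's real representation.

```go
package main

import "fmt"

// state is a hypothetical stand-in for sparsetable.State.
type state int

// dead marks the absence of a transition in this sketch.
const dead state = -1

// toyDFA maps final states to concept URLs instead of *Concept values.
type toyDFA struct {
	trans  map[state]map[byte]state
	finals map[state]string
}

// newToyDFA builds a trie-shaped DFA from label to concept URL entries.
func newToyDFA(entries map[string]string) toyDFA {
	d := toyDFA{trans: map[state]map[byte]state{}, finals: map[state]string{}}
	next := state(1)
	for label, url := range entries {
		s := state(0)
		for i := 0; i < len(label); i++ {
			if d.trans[s] == nil {
				d.trans[s] = map[byte]state{}
			}
			t, ok := d.trans[s][label[i]]
			if !ok {
				t = next
				next++
				d.trans[s][label[i]] = t
			}
			s = t
		}
		d.finals[s] = url
	}
	return d
}

func (d toyDFA) Initial() state { return 0 }

// Delta executes one transition and returns dead if none exists.
func (d toyDFA) Delta(s state, c byte) state {
	if t, ok := d.trans[s][c]; ok {
		return t
	}
	return dead
}

// Final reports the concept URL of s and whether s is a final state.
func (d toyDFA) Final(s state) (string, bool) {
	url, ok := d.finals[s]
	return url, ok
}

// run feeds the whole input to the DFA and checks the state it ends in.
func run(d toyDFA, input string) (string, bool) {
	s := d.Initial()
	for i := 0; i < len(input) && s != dead; i++ {
		s = d.Delta(s, input[i])
	}
	if s == dead {
		return "", false
	}
	return d.Final(s)
}

func main() {
	d := newToyDFA(map[string]string{"cat": "http://example.org/cat"})
	fmt.Println(run(d, "cat")) // http://example.org/cat true
	fmt.Println(run(d, "ca"))  // prints an empty string followed by false
}
```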
type DFAMatcher ¶
type DFAMatcher struct {
DFA DFA
}
DFAMatcher uses a DFA to search for matches in a string.
func (DFAMatcher) Match ¶
func (m DFAMatcher) Match(str string) MatchPos
Match returns the MatchPos of the first encountered entry in the DFA. The MatchPos denotes the first encountered concept in the string; its Concept is nil if nothing could be matched.
type Dictionary ¶
Dictionary is a dictionary that maps the labels of the concepts to their appropriate IDs. Negative IDs mark ambiguous dictionary entries. They map to the corresponding positive ID.
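A small sketch of how a caller might interpret such an entry is shown below. It assumes, without confirmation from this page, that the dictionary behaves like a map from label to integer ID and that the positive counterpart of a negative ID is its negation. The names dictSketch and lookup are hypothetical.

```go
package main

import "fmt"

// dictSketch is a hypothetical stand-in for Dictionary.
type dictSketch map[string]int

// lookup returns the positive ID for label, whether the entry is marked as
// ambiguous (stored with a negative ID) and whether the label exists at all.
func lookup(d dictSketch, label string) (id int, ambiguous, ok bool) {
	raw, ok := d[label]
	if !ok {
		return 0, false, false
	}
	if raw < 0 {
		return -raw, true, true
	}
	return raw, false, true
}

func main() {
	d := dictSketch{"bank": -3, "river": 5}
	fmt.Println(lookup(d, "bank"))  // 3 true true
	fmt.Println(lookup(d, "river")) // 5 false true
	fmt.Println(lookup(d, "moon"))  // 0 false false
}
```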
type Document ¶
type Document interface {
	io.ReadCloser
	Path() string
}
Document defines an interface for readable documents.
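Any type with Read, Close and Path methods satisfies this interface. As a minimal sketch, the hypothetical stringDoc below serves a string from memory. The local document interface repeats the definition above so that the example compiles on its own, and the path value is arbitrary.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// document repeats the Document interface so the sketch is self-contained.
type document interface {
	io.ReadCloser
	Path() string
}

// stringDoc is a hypothetical in-memory document.
type stringDoc struct {
	*strings.Reader // provides Read
	path            string
}

func (d stringDoc) Close() error { return nil } // nothing to release
func (d stringDoc) Path() string { return d.path }

// newStringDoc returns a document that reads from content.
func newStringDoc(path, content string) document {
	return stringDoc{Reader: strings.NewReader(content), path: path}
}

// readAll drains a document and closes it afterwards.
func readAll(d document) (string, error) {
	defer d.Close()
	bs, err := io.ReadAll(d)
	return string(bs), err
}

func main() {
	d := newStringDoc("mem://greeting", "hello")
	text, err := readAll(d)
	fmt.Println(d.Path(), text, err) // mem://greeting hello <nil>
}
```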
func NewFileDocument ¶
NewFileDocument creates a new FileDocument with the given path. The first call to Read will trigger an os.Open. Any errors from os.Open will be returned in the Read method.
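The deferred open described above can be sketched like this. The type lazyFile and its constructor are hypothetical and only illustrate the pattern: the constructor cannot fail, and a missing file surfaces as an error from the first Read.

```go
package main

import (
	"fmt"
	"os"
)

// lazyFile is a hypothetical stand-in for FileDocument.
type lazyFile struct {
	path string
	file *os.File // nil until the first Read
}

// newLazyFile only records the path; it does not touch the file system.
func newLazyFile(path string) *lazyFile { return &lazyFile{path: path} }

// Read opens the file on first use and reports any os.Open error.
func (d *lazyFile) Read(p []byte) (int, error) {
	if d.file == nil {
		f, err := os.Open(d.path)
		if err != nil {
			return 0, err
		}
		d.file = f
	}
	return d.file.Read(p)
}

// Close closes the file if it was ever opened.
func (d *lazyFile) Close() error {
	if d.file == nil {
		return nil
	}
	return d.file.Close()
}

// Path returns the path the document was created with.
func (d *lazyFile) Path() string { return d.path }

func main() {
	d := newLazyFile("/path/that/does/not/exist")
	_, err := d.Read(make([]byte, 16))
	fmt.Println("first Read reported:", err)
	fmt.Println("Close reported:", d.Close())
}
```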
func NewHTMLDocument ¶
NewHTMLDocument returns a new HTML Document reader. If the parsing of the html fails, its Read method will return the appropriate error.
func NewHTTPDocument ¶
NewHTTPDocument creates a new HTTPDocument with the given url. The first call to Read will trigger an http.Get request to be sent. Any errors from this request will be returned in the Read method.
func NewReaderDocument ¶
NewReaderDocument creates a new ReaderDocument.
func NewStringDocument ¶
NewStringDocument returns a document that reads from a string.
type Edge ¶
Edge represents an edge in the concept graph that links one concept to another concept with a predicate and a Levenshtein distance.
type EdgeSet ¶
EdgeSet represents a set of relations.
func IntersectEdges ¶
IntersectEdges calculates the intersection of the relation sets of the given concepts.
type FileDocument ¶
type FileDocument struct {
// contains filtered or unexported fields
}
FileDocument wraps an os.File and a path.
func (*FileDocument) Close ¶
func (d *FileDocument) Close() error
Close closes the underlying file of the FileDocument.
func (*FileDocument) Path ¶
func (d *FileDocument) Path() string
Path returns the path of the FileDocument.
type FuzzyDFA ¶
type FuzzyDFA struct {
// contains filtered or unexported fields
}
FuzzyDFA is a simple wrapper around a sparsetable.FuzzyDFA. It maps the ids of the underlying sparsetable.DFA to the corresponding Concepts.
func NewFuzzyDFA ¶
NewFuzzyDFA constructs a new FuzzyDFA with the given maximum error bound k.
func (FuzzyDFA) Delta ¶
func (d FuzzyDFA) Delta(s *sparsetable.FuzzyStack, f func(int, int, *Concept)) bool
Delta executes one fuzzy transition in this FuzzyDFA.
func (FuzzyDFA) Initial ¶
func (d FuzzyDFA) Initial(str string) *sparsetable.FuzzyStack
Initial returns the initial state of this FuzzyDFA.
type FuzzyDFAMatcher ¶
type FuzzyDFAMatcher struct {
DFA FuzzyDFA
}
FuzzyDFAMatcher uses a FuzzyDFA to search for matches in a string.
func (FuzzyDFAMatcher) Match ¶
func (m FuzzyDFAMatcher) Match(str string) MatchPos
Match returns the MatchPos of the first encountered entry in the DFA. The MatchPos denotes the first encountered concept in the string; its Concept is nil if nothing could be matched.
type Graph ¶
type Graph struct {
// contains filtered or unexported fields
}
Graph represents a graph of linked concepts. It holds a map of the URLs and the concepts and an array of all concepts.
func (*Graph) Add ¶
Add adds a triple to the graph. It returns a Triple that consists of the corresponding concepts that were created.
func (*Graph) ConceptsLen ¶
ConceptsLen returns the number of concepts in the array.
func (*Graph) FindByID ¶
FindByID searches for a concept by its ID. If a negative ID is given, a new split concept is returned that links to the concept with the corresponding positive ID.
type HTTPDocument ¶
type HTTPDocument struct {
// contains filtered or unexported fields
}
HTTPDocument is a document that reads from HTTP.
func (*HTTPDocument) Close ¶
func (d *HTTPDocument) Close() error
Close closes the underlying body of the HTTP GET response of the HTTPDocument.
func (*HTTPDocument) Path ¶
func (d *HTTPDocument) Path() string
Path returns the URL of the HTTPDocument.
type HandleAmbigsFunc ¶
HandleAmbigsFunc defines a function that handles ambiguities in the parsing of the knowledge base. If the function is successful, it must return a non-nil concept; otherwise the corresponding dictionary entry is discarded.
type MatchPos ¶
MatchPos represents a matching position in a string. Concept is the associated concept of the match. It is nil if nothing can be matched. Begin and End mark the begin and end positions of the match if Concept is not nil.
type Matcher ¶
type Matcher interface {
	// Match returns the MatchPos of the next concept in the given string.
	Match(string) MatchPos
}
Matcher is a simple interface for searching a concept in a string.
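A custom matcher only needs a Match method. The sketch below implements a literal substring matcher against local copies of the types. The field names Concept, Begin and End follow the description of MatchPos, while their exact types, the sketchConcept struct, and the choice of byte offsets with an exclusive End are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchConcept and matchPos are hypothetical stand-ins for Concept and MatchPos.
type sketchConcept struct{ URL string }

type matchPos struct {
	Concept    *sketchConcept // nil if nothing matched
	Begin, End int            // byte offsets, End exclusive (an assumption)
}

// matcher repeats the Matcher interface so the sketch is self-contained.
type matcher interface {
	Match(string) matchPos
}

// literalMatcher finds the first occurrence of a fixed label.
type literalMatcher struct {
	label   string
	concept *sketchConcept
}

// Match returns the position of the first occurrence of the label or a
// zero matchPos with a nil Concept if the label does not occur.
func (m literalMatcher) Match(str string) matchPos {
	i := strings.Index(str, m.label)
	if i < 0 {
		return matchPos{}
	}
	return matchPos{Concept: m.concept, Begin: i, End: i + len(m.label)}
}

func main() {
	var m matcher = literalMatcher{label: "bar", concept: &sketchConcept{URL: "http://example.org/bar"}}
	p := m.Match("foo bar baz")
	fmt.Println(p.Concept.URL, p.Begin, p.End) // http://example.org/bar 4 7
}
```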
type ReaderDocument ¶
ReaderDocument wraps an io.Reader.
func (ReaderDocument) Path ¶
func (d ReaderDocument) Path() string
Path returns the path of this ReaderDocument.
type RegexMatcher ¶
RegexMatcher uses a regex to search for a match in a string.
func (RegexMatcher) Match ¶
func (m RegexMatcher) Match(str string) MatchPos
Match returns the MatchPos of the first occurrence of the regex.
type Resource ¶
type Resource struct {
	Graph      *Graph
	Dictionary Dictionary
	Rules      RulesDictionary
	DFA        DFA
}
Resource is a struct that holds all parsed knowledge base resources.
func NewResource ¶
func NewResource(g *Graph, d Dictionary, r RulesDictionary) *Resource
NewResource creates a new resource.
type RulesDictionary ¶
RulesDictionary is a dictionary that maps concept URLs to their respective rules.
type Stream ¶
type Stream <-chan StreamToken
Stream represents a stream from which tokens can be read.
func Match ¶
Match matches concepts in the stream and splits the tokens accordingly. So one token ' text <match> text ' is split into ' text ', '<match>' and ' text '.
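The splitting step can be illustrated on plain strings. The sketch below uses a literal label in place of a real Matcher and a hypothetical piece type in place of Token, so it only shows how one text is cut into unmatched and matched parts. It says nothing about how the real function handles channels or errors.

```go
package main

import (
	"fmt"
	"strings"
)

// piece is a hypothetical stand-in for a token: a text and a match flag.
type piece struct {
	Text    string
	Matched bool
}

// splitMatches cuts text into alternating unmatched and matched pieces,
// using every occurrence of label as a match.
func splitMatches(text, label string) []piece {
	var out []piece
	for len(text) > 0 {
		i := strings.Index(text, label)
		if label == "" || i < 0 {
			out = append(out, piece{Text: text}) // no further match
			break
		}
		if i > 0 {
			out = append(out, piece{Text: text[:i]})
		}
		out = append(out, piece{Text: label, Matched: true})
		text = text[i+len(label):]
	}
	return out
}

func main() {
	for _, p := range splitMatches(" text match text ", "match") {
		fmt.Printf("%q matched=%v\n", p.Text, p.Matched)
	}
}
```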
type StreamToken ¶
StreamToken wraps either a token or an error.
func ReadStreamToken ¶
func ReadStreamToken(d Document) StreamToken
ReadStreamToken reads a StreamToken from a document. It simply wraps ReadToken and returns a StreamToken.
type Token ¶
Token denotes a token in an input document. It holds the corresponding Concept or nil and its position in the input document.
type Traits ¶
type Traits interface {
	Ignore(string) bool
	IsSymmetric(string) bool
	IsTransitive(string) bool
	IsName(string) bool
	IsDistinct(string) bool
	IsAmbig(string) bool
	IsInverted(string) bool
	IsRule(string) bool
	HandleAmbigs() HandleAmbigsFunc
}
Traits defines the interface for the different traits of predicates.
type Triple ¶
type Triple struct {
S, P, O *Concept
}
Triple represents a relational triple in the graph. It consists of a subject S, a predicate P and an object O.
type URLRegister ¶
type URLRegister struct {
// contains filtered or unexported fields
}
URLRegister is used to map urls to unique ids and vice versa.
func ReadURLRegister ¶
func ReadURLRegister(path string) (*URLRegister, error)
ReadURLRegister reads a URLRegister from a gob encoded file. If the given file does not exist, a new empty register is returned.
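The "missing file means empty register" behaviour can be sketched with the standard library alone. In the sketch below a plain map stands in for the register, and the helper names and the on-disk layout are assumptions rather than the package's actual format.

```go
package main

import (
	"encoding/gob"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readRegisterSketch loads a gob-encoded map, or returns an empty one if
// the file does not exist. Any other error is passed on to the caller.
func readRegisterSketch(path string) (map[string]int, error) {
	f, err := os.Open(path)
	if errors.Is(err, fs.ErrNotExist) {
		return map[string]int{}, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()
	reg := map[string]int{}
	if err := gob.NewDecoder(f).Decode(&reg); err != nil {
		return nil, err
	}
	return reg, nil
}

// writeRegisterSketch stores the map as a gob-encoded file.
func writeRegisterSketch(path string, reg map[string]int) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(reg); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	dir, err := os.MkdirTemp("", "register")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "register.gob")

	reg, err := readRegisterSketch(path) // the file does not exist yet
	fmt.Println(len(reg), err)           // 0 <nil>

	reg["http://example.org/a"] = 1
	if err := writeRegisterSketch(path, reg); err != nil {
		fmt.Println(err)
		return
	}
	reg, err = readRegisterSketch(path)
	fmt.Println(reg, err) // map[http://example.org/a:1] <nil>
}
```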
func (*URLRegister) GobDecode ¶
func (r *URLRegister) GobDecode(bs []byte) error
GobDecode implements gob.GobDecoder.
func (*URLRegister) GobEncode ¶
func (r *URLRegister) GobEncode() ([]byte, error)
GobEncode implements gob.GobEncoder.
func (*URLRegister) LookupID ¶
func (r *URLRegister) LookupID(id int) (string, bool)
LookupID searches for the given id and returns its associated url and true if it can be found or "" and false otherwise.
func (*URLRegister) LookupURL ¶
func (r *URLRegister) LookupURL(url string) (int, bool)
LookupURL searches for the given url and returns its associated id and true if it can be found or 0 and false otherwise.
func (*URLRegister) Register ¶
func (r *URLRegister) Register(url string) int
Register registers a url and returns its associated id. If the given url does not yet exist, it is inserted and given a new id.
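A two-way register of this kind can be sketched with a map and a slice. The type registerSketch is hypothetical, and the choice to start ids at 1 is an assumption of the sketch: the documentation only states that ids are unique and that registering a known url returns its existing id.

```go
package main

import "fmt"

// registerSketch is a hypothetical stand-in for URLRegister.
type registerSketch struct {
	ids  map[string]int // url -> id
	urls []string       // urls[id-1] is the url registered under id
}

func newRegisterSketch() *registerSketch {
	return &registerSketch{ids: map[string]int{}}
}

// Register returns the id of url and inserts url first if it is unknown.
func (r *registerSketch) Register(url string) int {
	if id, ok := r.ids[url]; ok {
		return id
	}
	r.urls = append(r.urls, url)
	id := len(r.urls) // ids start at 1 in this sketch
	r.ids[url] = id
	return id
}

// LookupURL returns the id of url, or 0 and false if url is unknown.
func (r *registerSketch) LookupURL(url string) (int, bool) {
	id, ok := r.ids[url]
	return id, ok
}

// LookupID returns the url of id, or "" and false if id is unknown.
func (r *registerSketch) LookupID(id int) (string, bool) {
	if id < 1 || id > len(r.urls) {
		return "", false
	}
	return r.urls[id-1], true
}

func main() {
	r := newRegisterSketch()
	a := r.Register("http://example.org/a")
	b := r.Register("http://example.org/b")
	fmt.Println(a, b, r.Register("http://example.org/a")) // 1 2 1
	fmt.Println(r.LookupID(b))                            // http://example.org/b true
}
```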
func (*URLRegister) Write ¶
func (r *URLRegister) Write(path string) error
Write writes a URLRegister into a gob encoded file.