semix

package
Version: v1.2.1
Published: Feb 25, 2018 License: MIT Imports: 19 Imported by: 0

Documentation

Constants

const (
	// SplitURL is the name of predicates that denote ambiguous connections
	// in the concept graph.
	SplitURL = "http://bitbucket.org/fflo/semix/pkg/semix/a-star"
)

Variables

This section is empty.

Functions

func CombineURLs

func CombineURLs(urls ...string) string

CombineURLs combines two or more URLs. If urls is empty, the empty string is returned. If urls contains exactly one URL, that URL is returned.

func ExpandBraces

func ExpandBraces(str string) ([]string, error)

ExpandBraces expands braces in a given string using a bash-like syntax.
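
The exact syntax accepted by ExpandBraces is not documented here. The following is a minimal sketch of bash-like brace expansion, assuming only comma-separated alternatives (no ranges such as {1..3}), support for nesting, and an error on unbalanced braces. The names expandBraces and expandGroup are hypothetical and not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// expandBraces is a hypothetical stand-in for ExpandBraces. It expands
// comma-separated alternatives such as "a{b,c}d" into "abd" and "acd".
func expandBraces(s string) ([]string, error) {
	open, depth := -1, 0
	var commas []int // positions of top-level commas in the first group
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '{':
			if depth == 0 {
				open = i
			}
			depth++
		case ',':
			if depth == 1 {
				commas = append(commas, i)
			}
		case '}':
			if depth == 0 {
				return nil, errors.New("unbalanced '}'")
			}
			depth--
			if depth == 0 {
				return expandGroup(s, open, i, commas)
			}
		}
	}
	if depth != 0 {
		return nil, errors.New("unbalanced '{'")
	}
	return []string{s}, nil // no brace group left
}

// expandGroup replaces the group s[open:end+1] by each of its
// alternatives in turn and expands the remaining groups recursively.
func expandGroup(s string, open, end int, commas []int) ([]string, error) {
	var out []string
	start := open + 1
	for _, stop := range append(commas, end) {
		res, err := expandBraces(s[:open] + s[start:stop] + s[end+1:])
		if err != nil {
			return nil, err
		}
		out = append(out, res...)
		start = stop + 1
	}
	return out, nil
}

func main() {
	fmt.Println(expandBraces("a{b,c}d")) // [abd acd] <nil>
}
```

Unlike bash, this sketch also expands a group with a single alternative, so "{a}" yields "a".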

func NormalizeString

func NormalizeString(str string, sourround bool) string

NormalizeString normalizes a given string. The normalization converts any non-empty sequence of punctuation or whitespace characters to exactly one whitespace character.

If sourround is true, the result string is surrounded with exactly one whitespace character on each side.
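
The described normalization can be sketched with the standard library alone. This is an illustration, not the package's implementation: the helper name normalize is hypothetical, and it assumes that leading and trailing separators are dropped when no surrounding whitespace is requested.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize is a hypothetical stand-in for NormalizeString.
func normalize(str string, surround bool) string {
	// Split at every run of punctuation or whitespace characters.
	fields := strings.FieldsFunc(str, func(r rune) bool {
		return unicode.IsPunct(r) || unicode.IsSpace(r)
	})
	// Join the remaining pieces with exactly one space.
	res := strings.Join(fields, " ")
	if surround {
		return " " + res + " "
	}
	return res
}

func main() {
	fmt.Printf("%q\n", normalize("Hello,  world!", true))  // " Hello world "
	fmt.Printf("%q\n", normalize("Hello,  world!", false)) // "Hello world"
}
```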

func WithEdges

func WithEdges(cs ...*Concept) func(*Concept)

WithEdges returns a configuration function that sets the edges. Each pair (p,o) in cs is set to the edge pointing to o with the predicate p.

func WithID

func WithID(id int) func(*Concept)

WithID returns a configuration function that sets the concept's ID.
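
Configuration functions such as WithID and WithEdges follow the functional options pattern: each returns a func(*Concept) that NewConcept can apply to the concept it creates. The sketch below shows the pattern on a hypothetical, simplified concept type; the lowercase names are assumptions and not the package's code.

```go
package main

import "fmt"

// concept is a hypothetical, simplified stand-in for Concept.
type concept struct {
	url string
	id  int32
}

// withID mirrors the shape of WithID: it returns a function that
// sets the ID of the concept it is applied to.
func withID(id int) func(*concept) {
	return func(c *concept) { c.id = int32(id) }
}

// newConcept mirrors the shape of NewConcept: it applies every
// configuration function, in order, to a freshly created concept.
func newConcept(url string, cfs ...func(*concept)) *concept {
	c := &concept{url: url}
	for _, cf := range cfs {
		cf(c)
	}
	return c
}

func main() {
	c := newConcept("http://example.org/a", withID(42))
	fmt.Println(c.url, c.id) // http://example.org/a 42
}
```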

Types

type Concept

type Concept struct {
	Name string
	// contains filtered or unexported fields
}

Concept represents a concept in the concept graph. It consists of a unique URL, an optional (human-readable) name, a list of edges and a unique ID.

func HandleAmbigsWithMerge

func HandleAmbigsWithMerge(g *Graph, urls ...string) (*Concept, error)

HandleAmbigsWithMerge handles ambiguities by creating a new distinct concept.

func HandleAmbigsWithSplit

func HandleAmbigsWithSplit(g *Graph, urls ...string) (*Concept, error)

HandleAmbigsWithSplit handles an ambiguity by creating a new ambig split concept.

func NewConcept

func NewConcept(url string, cfs ...func(*Concept)) *Concept

NewConcept creates a new Concept with the given URL and configuration functions.

func (*Concept) Ambig

func (c *Concept) Ambig() bool

Ambig reports whether the concept is ambiguous.

func (*Concept) EachEdge

func (c *Concept) EachEdge(f func(Edge))

EachEdge iterates over all edges of this concept.

func (*Concept) EdgeAt

func (c *Concept) EdgeAt(i int) Edge

EdgeAt returns the edge at the given position in the edges slice.

func (Concept) Edges

func (c Concept) Edges() []Edge

Edges returns the edges of a concept.

func (*Concept) EdgesLen

func (c *Concept) EdgesLen() int

EdgesLen returns the length of the edges.

func (*Concept) FindEdge

func (c *Concept) FindEdge(p, o string) (Edge, bool)

FindEdge searches for the matching edge.

func (*Concept) GobDecode

func (c *Concept) GobDecode(bs []byte) error

GobDecode decodes a concept from gob encoded binary data. Only the name, url and id are decoded.

func (*Concept) GobEncode

func (c *Concept) GobEncode() ([]byte, error)

GobEncode encodes a concept to gob encoded binary data. Only the name, url and id are encoded.
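
Encoding only a subset of a type's unexported fields can be sketched as follows. The types concept and payload and the helper roundTrip are hypothetical; the sketch only assumes the standard encoding/gob GobEncoder and GobDecoder interfaces and is not the package's implementation.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// concept is a hypothetical stand-in for Concept.
type concept struct {
	name, url string
	id        int32
	edges     []string // deliberately not serialized
}

// payload lists the fields that are written to the gob stream.
type payload struct {
	Name, URL string
	ID        int32
}

// GobEncode writes only name, url and id.
func (c *concept) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(payload{c.name, c.url, c.id})
	return buf.Bytes(), err
}

// GobDecode restores name, url and id; edges are left untouched.
func (c *concept) GobDecode(bs []byte) error {
	var p payload
	if err := gob.NewDecoder(bytes.NewReader(bs)).Decode(&p); err != nil {
		return err
	}
	c.name, c.url, c.id = p.Name, p.URL, p.ID
	return nil
}

// roundTrip encodes c and decodes the bytes into a new concept.
func roundTrip(c *concept) (*concept, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(c); err != nil {
		return nil, err
	}
	out := new(concept)
	err := gob.NewDecoder(&buf).Decode(out)
	return out, err
}

func main() {
	in := &concept{name: "a", url: "http://example.org/a", id: 7, edges: []string{"e"}}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out.name, out.url, out.id, len(out.edges))
}
```

After the round trip the decoded value carries the name, URL and ID, while its edges are empty.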

func (*Concept) ID

func (c *Concept) ID() int32

ID returns the unique ID of the concept.

func (*Concept) MarshalJSON

func (c *Concept) MarshalJSON() ([]byte, error)

MarshalJSON writes the concept to JSON. To avoid writing the whole graph of the concepts, the edges of the concept are written as pairs of URLs and recursive links are omitted.

func (*Concept) ShortName

func (c *Concept) ShortName() string

ShortName returns a nice human-readable name for the concept. This does not need to be a unique identifier for this concept.

func (*Concept) ShortURL

func (c *Concept) ShortURL() string

ShortURL returns a short version of the URL of this concept. The short URL is not necessarily unique.

func (*Concept) String

func (c *Concept) String() string

func (*Concept) URL

func (c *Concept) URL() string

URL returns the URL of this concept.

func (*Concept) UnmarshalJSON

func (c *Concept) UnmarshalJSON(b []byte) error

UnmarshalJSON reads the concept from JSON. Since the edges are written as pairs of URLs, it is not possible to recreate the whole concept using JSON.

type DFA

type DFA struct {
	// contains filtered or unexported fields
}

DFA is a simple wrapper around a sparsetable.DFA. It maps the ids to Concepts.

func NewDFA

func NewDFA(d Dictionary, graph *Graph) DFA

NewDFA constructs a new DFA.

func (DFA) Delta

func (d DFA) Delta(s sparsetable.State, c byte) sparsetable.State

Delta executes one transition in the DFA.

func (DFA) Final

func (d DFA) Final(s sparsetable.State) (*Concept, bool)

Final returns the found Concept and true iff s denotes a final state. Otherwise it returns nil and false.

func (*DFA) GobDecode

func (d *DFA) GobDecode(bs []byte) error

GobDecode decodes the sparsetable.DFA of a DFA. It does not decode the graph.

func (DFA) GobEncode

func (d DFA) GobEncode() ([]byte, error)

GobEncode encodes the sparsetable.DFA of a DFA. It does not encode the graph.

func (DFA) Initial

func (d DFA) Initial() sparsetable.State

Initial returns the initial state of the DFA.

type DFAMatcher

type DFAMatcher struct {
	DFA DFA
}

DFAMatcher uses a DFA to search for matches in a string.

func (DFAMatcher) Match

func (m DFAMatcher) Match(str string) MatchPos

Match returns the MatchPos of the first encountered entry in the DFA. The MatchPos denotes the first encountered concept in the string; its Concept is nil if nothing could be matched.

type Dictionary

type Dictionary map[string]int32

Dictionary is a dictionary that maps the labels of the concepts to their appropriate IDs. Negative IDs mark ambiguous dictionary entries. They map to the corresponding positive ID.
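
The sign convention can be illustrated with a small lookup helper. The type dictionary mirrors the declared map type; the method lookup and the example labels are hypothetical and not part of this package.

```go
package main

import "fmt"

// dictionary mirrors the declared type of Dictionary.
type dictionary map[string]int32

// lookup is a hypothetical helper. It returns the positive concept ID,
// whether the entry is marked ambiguous, and whether the label exists.
func (d dictionary) lookup(label string) (id int32, ambig, ok bool) {
	id, ok = d[label]
	if id < 0 {
		// A negative ID marks an ambiguous entry for the concept -id.
		return -id, true, ok
	}
	return id, false, ok
}

func main() {
	d := dictionary{"jaguar": -3, "cat": 5}
	fmt.Println(d.lookup("jaguar")) // 3 true true
	fmt.Println(d.lookup("cat"))    // 5 false true
	fmt.Println(d.lookup("dog"))    // 0 false false
}
```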

type Document

type Document interface {
	io.ReadCloser
	Path() string
}

Document defines an interface for readable documents.

func NewFileDocument

func NewFileDocument(path string) Document

NewFileDocument creates a new FileDocument with the given path. The first call to Read will trigger an os.Open. Any errors from os.Open will be returned in the Read method.
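
The deferred open can be sketched like this. The type lazyFile is a hypothetical stand-in for FileDocument, written under the assumption that the file handle is created on the first Read and that Close is a no-op if the file was never opened.

```go
package main

import (
	"fmt"
	"os"
)

// lazyFile is a hypothetical stand-in for FileDocument.
type lazyFile struct {
	path string
	file *os.File
}

// Read opens the file on the first call, so an os.Open error
// surfaces here instead of at construction time.
func (d *lazyFile) Read(b []byte) (int, error) {
	if d.file == nil {
		f, err := os.Open(d.path)
		if err != nil {
			return 0, err
		}
		d.file = f
	}
	return d.file.Read(b)
}

// Close closes the file if it was opened.
func (d *lazyFile) Close() error {
	if d.file == nil {
		return nil
	}
	return d.file.Close()
}

// Path returns the path given at construction.
func (d *lazyFile) Path() string { return d.path }

func main() {
	d := &lazyFile{path: "/no/such/dir/no-such-file.txt"}
	defer d.Close()
	_, err := d.Read(make([]byte, 8))
	fmt.Println("read error:", err)
}
```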

func NewHTMLDocument

func NewHTMLDocument(path string, r io.Reader) Document

NewHTMLDocument returns a new HTML Document reader. If the parsing of the html fails, its Read method will return the appropriate error.

func NewHTTPDocument

func NewHTTPDocument(url string) Document

NewHTTPDocument creates a new HTTPDocument with the given url. The first call to Read will trigger an http.Get request to be sent. Any errors from this request will be returned in the Read method.

func NewReaderDocument

func NewReaderDocument(path string, r io.Reader) Document

NewReaderDocument creates a new ReaderDocument.

func NewStringDocument

func NewStringDocument(path, str string) Document

NewStringDocument returns a document that reads from a string.

type Edge

type Edge struct {
	P, O *Concept
	L    int
}

Edge represents an edge in the concept graph that links one concept to another concept with a predicate and a Levenshtein distance.

func (Edge) String

func (e Edge) String() string

type EdgeSet

type EdgeSet map[string]map[string]struct{}

EdgeSet represents a set of relations.

func IntersectEdges

func IntersectEdges(g *Graph, urls ...string) EdgeSet

IntersectEdges calculates the intersection of the relation sets of the given concepts.

type FileDocument

type FileDocument struct {
	// contains filtered or unexported fields
}

FileDocument wraps an os.File and a path.

func (*FileDocument) Close

func (d *FileDocument) Close() error

Close closes the underlying file of the FileDocument.

func (*FileDocument) Path

func (d *FileDocument) Path() string

Path returns the path of the FileDocument.

func (*FileDocument) Read

func (d *FileDocument) Read(b []byte) (int, error)

Read implements the io.Reader interface.

type FuzzyDFA

type FuzzyDFA struct {
	// contains filtered or unexported fields
}

FuzzyDFA is a simple wrapper around a sparsetable.FuzzyDFA. It maps the ids of the underlying sparsetable.DFA to the according Concepts.

func NewFuzzyDFA

func NewFuzzyDFA(k int, dfa DFA) FuzzyDFA

NewFuzzyDFA constructs a new FuzzyDFA with the given maximum error bound k.

func (FuzzyDFA) Delta

func (d FuzzyDFA) Delta(s *sparsetable.FuzzyStack, f func(int, int, *Concept)) bool

Delta executes one fuzzy transition in this FuzzyDFA.

func (FuzzyDFA) Initial

func (d FuzzyDFA) Initial(str string) *sparsetable.FuzzyStack

Initial returns the initial state of this FuzzyDFA.

func (FuzzyDFA) MaxError

func (d FuzzyDFA) MaxError() int

MaxError returns the maximum allowed error for this FuzzyDFA.

type FuzzyDFAMatcher

type FuzzyDFAMatcher struct {
	DFA FuzzyDFA
}

FuzzyDFAMatcher uses a FuzzyDFA to search for matches in a string.

func (FuzzyDFAMatcher) Match

func (m FuzzyDFAMatcher) Match(str string) MatchPos

Match returns the MatchPos of the first encountered entry in the DFA. The MatchPos denotes the first encountered concept in the string; its Concept is nil if nothing could be matched.

type Graph

type Graph struct {
	// contains filtered or unexported fields
}

Graph represents a graph of linked concepts. It holds a map of the URLs and the concepts and an array of all concepts.

func NewGraph

func NewGraph() *Graph

NewGraph creates a new graph.

func (*Graph) Add

func (g *Graph) Add(s, p, o string) (*Concept, *Concept, *Concept)

Add adds a triple to the graph. It returns the corresponding subject, predicate and object concepts that were created.

func (*Graph) ConceptAt

func (g *Graph) ConceptAt(i int) *Concept

ConceptAt returns the concept at the given position.

func (*Graph) ConceptsLen

func (g *Graph) ConceptsLen() int

ConceptsLen returns the number of concepts in the array.

func (*Graph) FindByID

func (g *Graph) FindByID(id int32) (*Concept, bool)

FindByID searches a concept by its ID. If a negative ID is given, a new split concept is returned, that links to the concept with the according positive ID.

func (*Graph) FindByURL

func (g *Graph) FindByURL(str string) (*Concept, bool)

FindByURL searches a concept by its URL.

func (*Graph) Register

func (g *Graph) Register(url string) *Concept

Register registers a new concept with the given URL in the Graph. If the URL already exists, the corresponding concept is returned. This function will never return a nil concept.

type HTTPDocument

type HTTPDocument struct {
	// contains filtered or unexported fields
}

HTTPDocument is a document that reads from HTTP.

func (*HTTPDocument) Close

func (d *HTTPDocument) Close() error

Close closes the underlying body of the HTTP GET response of the HTTPDocument.

func (*HTTPDocument) Path

func (d *HTTPDocument) Path() string

Path returns the url of the HTTPDocument.

func (*HTTPDocument) Read

func (d *HTTPDocument) Read(b []byte) (int, error)

Read implements the io.Reader interface.

type HandleAmbigsFunc

type HandleAmbigsFunc func(*Graph, ...string) (*Concept, error)

HandleAmbigsFunc defines a function that handles ambiguities in the parsing of the knowledge base. If the function is successful, it must return a non-nil concept; otherwise the corresponding dictionary entry is discarded.

type MatchPos

type MatchPos struct {
	Concept    *Concept
	Begin, End int
}

MatchPos represents a matching position in a string. Concept is the associated concept of the match. It is nil if nothing can be matched. Begin and End mark the begin and end positions of the match if Concept is not nil.

type Matcher

type Matcher interface {
	// Match returns the MatchPos of the next concept in the given string.
	Match(string) MatchPos
}

Matcher is a simple interface for searching a concept in a string.

type Parser

type Parser interface {
	Parse(func(string, string, string) error) error
}

Parser defines a parser that parses (Subject, Predicate, Object) triples.

type ReaderDocument

type ReaderDocument struct {
	io.Reader
	// contains filtered or unexported fields
}

ReaderDocument wraps an io.Reader.

func (ReaderDocument) Close

func (ReaderDocument) Close() error

Close returns nil.

func (ReaderDocument) Path

func (d ReaderDocument) Path() string

Path returns the path of this ReaderDocument.

type RegexMatcher

type RegexMatcher struct {
	Re      *regexp.Regexp
	Concept *Concept
}

RegexMatcher uses a regex to search for a match in a string.

func (RegexMatcher) Match

func (m RegexMatcher) Match(str string) MatchPos

Match returns the MatchPos of the first occurrence of the regex.
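
A regex-based matcher of this shape can be sketched with regexp.FindStringIndex. All lowercase names are hypothetical, and the concept is reduced to a string, with the empty string playing the role of a nil Concept.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchPos is a simplified stand-in for MatchPos.
type matchPos struct {
	concept    string
	begin, end int
}

// regexMatcher is a hypothetical stand-in for RegexMatcher.
type regexMatcher struct {
	re      *regexp.Regexp
	concept string
}

// newRegexMatcher is a convenience constructor for this sketch.
func newRegexMatcher(pattern, concept string) regexMatcher {
	return regexMatcher{re: regexp.MustCompile(pattern), concept: concept}
}

// match returns the position of the first occurrence of the regex,
// or the zero matchPos if the regex does not match.
func (m regexMatcher) match(str string) matchPos {
	loc := m.re.FindStringIndex(str)
	if loc == nil {
		return matchPos{}
	}
	return matchPos{concept: m.concept, begin: loc[0], end: loc[1]}
}

func main() {
	m := newRegexMatcher(`gr[ae]y`, "http://example.org/grey")
	fmt.Println(m.match("a grey cat")) // {http://example.org/grey 2 6}
}
```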

type Resource

type Resource struct {
	Graph      *Graph
	Dictionary Dictionary
	Rules      RulesDictionary
	DFA        DFA
}

Resource is a struct that holds all parsed knowledge base resources.

func NewResource

func NewResource(g *Graph, d Dictionary, r RulesDictionary) *Resource

NewResource creates a new resource.

func Parse

func Parse(p Parser, t Traits) (*Resource, error)

Parse creates a resource from a parser.

func (*Resource) GobDecode

func (r *Resource) GobDecode(bs []byte) error

GobDecode decodes a graph from gob encoded binary data.

func (*Resource) GobEncode

func (r *Resource) GobEncode() ([]byte, error)

GobEncode encodes a graph to gob encoded binary data.

type RulesDictionary

type RulesDictionary map[string]string

RulesDictionary is a dictionary that maps concept URLs to their respective rules.

type Stream

type Stream <-chan StreamToken

Stream represents a stream to read tokens.

func Filter

func Filter(ctx context.Context, s Stream) Stream

Filter discards all tokens that do not match a concept.
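
A stream stage of this kind is typically a goroutine that reads from one channel and writes to another until the input is closed or the context is cancelled. The following is a minimal sketch under that assumption; token, filter and run are hypothetical names, the concept is reduced to a string, and error tokens are not modelled.

```go
package main

import (
	"context"
	"fmt"
)

// token is a simplified stand-in for StreamToken; an empty concept
// plays the role of a nil Concept.
type token struct{ text, concept string }

// filter forwards only tokens that carry a concept. It stops when
// the input is closed or ctx is cancelled.
func filter(ctx context.Context, in <-chan token) <-chan token {
	out := make(chan token)
	go func() {
		defer close(out)
		for t := range in {
			if t.concept == "" {
				continue // discard tokens without a concept
			}
			select {
			case out <- t:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// run feeds tokens through filter and collects the surviving texts.
func run(tokens ...token) []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	in := make(chan token)
	go func() {
		defer close(in)
		for _, t := range tokens {
			select {
			case in <- t:
			case <-ctx.Done():
				return
			}
		}
	}()
	var res []string
	for t := range filter(ctx, in) {
		res = append(res, t.text)
	}
	return res
}

func main() {
	fmt.Println(run(
		token{" text ", ""},
		token{"match", "http://example.org/c"},
	)) // [match]
}
```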

func Match

func Match(ctx context.Context, m Matcher, s Stream) Stream

Match matches concepts in the stream and splits the tokens accordingly. So one token ' text <match> text ' is split into ' text ', '<match>' and ' text '.
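
The splitting step can be sketched as a cut at the match boundaries. The helper split is hypothetical; it assumes Begin and End are byte offsets with End exclusive and that empty pieces are dropped. Repeating the match on the remaining text is not covered.

```go
package main

import "fmt"

// split cuts str at the match boundaries [begin, end) and returns the
// non-empty pieces in order: text before, the match, text after.
func split(str string, begin, end int) []string {
	var parts []string
	for _, p := range []string{str[:begin], str[begin:end], str[end:]} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return parts
}

func main() {
	fmt.Printf("%q\n", split(" text <match> text ", 6, 13))
	// [" text " "<match>" " text "]
}
```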

func Normalize

func Normalize(ctx context.Context, s Stream) Stream

Normalize normalizes the token input. It prepends and appends one ' ' character to the token. All sequences of one or more unicode punctuation or unicode whitespaces are replaced by exactly one whitespace character ' '.

func Read

func Read(ctx context.Context, ds ...Document) Stream

Read reads documents into tokens.

type StreamToken

type StreamToken struct {
	Token Token
	Err   error
}

StreamToken wraps either a token or an error.

func ReadStreamToken

func ReadStreamToken(d Document) StreamToken

ReadStreamToken reads a StreamToken from a document. It simply wraps ReadToken and returns a StreamToken.

type Token

type Token struct {
	Token, Path string
	Concept     *Concept
	Begin, End  int
}

Token denotes a token in an input document. It holds the according Concept or nil and its position in the input document.

func ReadToken

func ReadToken(d Document) (Token, error)

ReadToken reads a single Token from a document.

func (Token) String

func (t Token) String() string

String returns the string representation of a token.

type Traits

type Traits interface {
	Ignore(string) bool
	IsSymmetric(string) bool
	IsTransitive(string) bool
	IsName(string) bool
	IsDistinct(string) bool
	IsAmbig(string) bool
	IsInverted(string) bool
	IsRule(string) bool
	HandleAmbigs() HandleAmbigsFunc
}

Traits defines the interface for the different traits of predicates.

type Triple

type Triple struct {
	S, P, O *Concept
}

Triple represents a relational triple in the graph. It consists of a subject S, a predicate P and an object O.

type URLRegister

type URLRegister struct {
	// contains filtered or unexported fields
}

URLRegister is used to map urls to unique ids and vice versa.

func NewURLRegister

func NewURLRegister() *URLRegister

NewURLRegister creates a new URLRegister.

func ReadURLRegister

func ReadURLRegister(path string) (*URLRegister, error)

ReadURLRegister reads a URLRegister from a gob encoded file. If the given file does not exist, a new empty register is returned.

func (*URLRegister) GobDecode

func (r *URLRegister) GobDecode(bs []byte) error

GobDecode implements gob.GobDecoder.

func (*URLRegister) GobEncode

func (r *URLRegister) GobEncode() ([]byte, error)

GobEncode implements gob.GobEncoder.

func (*URLRegister) LookupID

func (r *URLRegister) LookupID(id int) (string, bool)

LookupID searches for the given id and returns its associated url and true if it can be found or "" and false otherwise.

func (*URLRegister) LookupURL

func (r *URLRegister) LookupURL(url string) (int, bool)

LookupURL searches for the given url and returns its associated id and true if it can be found or 0 and false otherwise.

func (*URLRegister) Register

func (r *URLRegister) Register(url string) int

Register registers a new url and returns its associated id. If a given url does not yet exist, it is inserted and given a new id.
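
A two-way mapping between urls and ids can be sketched with a map and a slice. The type urlRegister and its methods are hypothetical stand-ins; the sketch assumes that ids are assigned sequentially starting at 1, which the documentation does not state.

```go
package main

import "fmt"

// urlRegister is a hypothetical stand-in for URLRegister.
type urlRegister struct {
	ids  map[string]int
	urls []string // urls[i] belongs to id i+1 (assumed numbering)
}

func newURLRegister() *urlRegister {
	return &urlRegister{ids: make(map[string]int)}
}

// register returns the id of url and inserts the url if needed.
func (r *urlRegister) register(url string) int {
	if id, ok := r.ids[url]; ok {
		return id
	}
	r.urls = append(r.urls, url)
	r.ids[url] = len(r.urls)
	return len(r.urls)
}

// lookupID returns the url of id, or "" and false if it is unknown.
func (r *urlRegister) lookupID(id int) (string, bool) {
	if id < 1 || id > len(r.urls) {
		return "", false
	}
	return r.urls[id-1], true
}

// lookupURL returns the id of url, or 0 and false if it is unknown.
func (r *urlRegister) lookupURL(url string) (int, bool) {
	id, ok := r.ids[url]
	return id, ok
}

func main() {
	r := newURLRegister()
	a := r.register("http://example.org/a")
	b := r.register("http://example.org/b")
	fmt.Println(a, b, r.register("http://example.org/a")) // 1 2 1
	fmt.Println(r.lookupID(b))                             // http://example.org/b true
	fmt.Println(r.lookupURL("http://example.org/c"))       // 0 false
}
```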

func (*URLRegister) Write

func (r *URLRegister) Write(path string) error

Write writes a URLRegister into a gob encoded file.
