elowl

package
v0.0.0-...-97a57b4

Published: Mar 17, 2018 License: MIT Imports: 19 Imported by: 0

README

This package contains approaches for a Turtle parser. It was never finished, but I've added it as an example of how Turtle support could be implemented.

Documentation

Overview

Package elowl builds a bridge between OWL ontology files and the internal representation of EL++. It contains parser(s) for OWL files and builder(s) for generating EL++ formulae given a set of RDF / OWL triples.

Parsing Input

Currently there is a parser for Turtle files (see https://www.w3.org/TR/turtle/), but it does not cover the whole grammar. For example, language tags are not supported, so the following cannot be parsed:

<#spiderman>
  rel:enemyOf <#green-goblin> ;
  a foaf:Person ;
  foaf:name "Spiderman", "Человек-паук"@ru .

There is also no support for escape sequences such as \n in strings. Only string data is supported at the moment, but the other literal types could easily be added.

There are several approaches to parse these files, so there are interfaces for the individual parsing components (the parser interface requires an io.Reader). The default implementation uses a hand-written tokenizer, translates the sequence of tokens into an abstract syntax tree (AST), and from the AST produces a set of triples. Tokenization, the transformation into an AST and the transformation from an AST to a set of triples are each defined in their own interfaces, so it is easy to plug new approaches into the current model.
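
To make this concrete, here is a minimal sketch of how the default pipeline documented below could be wired together. The import path and the file name are placeholders, not part of the package:

package main

import (
	"log"
	"os"

	elowl "example.com/elowl" // placeholder import path, replace with the real module path
)

func main() {
	// Open a Turtle file; "ontology.ttl" is just an example name.
	f, err := os.Open("ontology.ttl")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// DefaultTurtleParser wires up the default tokenizer, AST builder and
	// AST-to-triple converter; the argument is the number of concurrent
	// statement workers used during the AST --> triples conversion.
	parser := elowl.DefaultTurtleParser(4)

	// The builder receives every parsed triple via HandleTriple.
	builder := elowl.NewDefaultOWLBuilder()

	if err := parser.Parse(f, "http://example.org/", builder); err != nil {
		log.Fatal(err)
	}
}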

However, there are some things I'm not very happy with at the moment (though the performance seems ok):

* The tokenizer reads the whole file into memory before parsing the tokens. Turtle lends itself to line-by-line processing because a statement usually ends on the same line, so it should be ok to read the input line by line. However, there are multiline strings (''' and """) which may span more than one line; it should not be too hard to add this special case though. The tokenizer simply stores a list of regex elements, and the first one that matches becomes the next token. I think it would be a good idea to create a matcher interface that reads from the given io.Reader (see the sketch after this list). Most matchers would be simple regular expressions, but for multiline strings we could use a combination of regexes and other methods. The parser itself should then provide a method to get the next line from the input; this method should take care to either return the rest of a not yet completely parsed line or read the next line from the input.

* No concurrency. The tokenizer and AST builder don't make use of goroutines right now, which is not very nice. The AST --> triples converter, however, does process several statements concurrently.
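
As a rough idea of the matcher interface mentioned in the first point, a sketch could look like the following. This is not part of the package, just an illustration:

// TokenMatcher is a hypothetical interface: each matcher tries to recognise
// one kind of token at the start of the current line.
type TokenMatcher interface {
	// Match reports whether this matcher recognises a token at the beginning
	// of line. Multiline string matchers may call nextLine to pull additional
	// input. On success it returns the match and the unconsumed rest of the
	// (possibly extended) line.
	Match(line string, nextLine func() (string, error)) (match *TokenMatch, rest string, ok bool)
}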

Index

Constants

const OntologyDir = ".ontologies"
const RDFClass = "http://www.w3.org/2002/07/owl#Class"
const RDFComment = "http://www.w3.org/2000/01/rdf-schema#comment"
const RDFDatatypeProperty = "http://www.w3.org/2002/07/owl/#DatatypeProperty"
const RDFDomain = "http://www.w3.org/2000/01/rdf-schema#domain"
const RDFFirst = "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
const RDFInverseOf = "http://www.w3.org/2002/07/owl#inverseOf"
const RDFLabel = "http://www.w3.org/2000/01/rdf-schema#label"
const RDFList = "http://www.w3.org/1999/02/22-rdf-syntax-ns#List"
const RDFNil = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"
const RDFObjectProperty = "http://www.w3.org/2002/07/owl#ObjectProperty"
const RDFOnProperty = "http://www.w3.org/2002/07/owl#onProperty"
const RDFRange = "http://www.w3.org/2000/01/rdf-schema#range"
const RDFRest = "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
const RDFRestriction = "http://www.w3.org/2002/07/owl#Restriction"
const RDFSomeValuesFrom = "http://www.w3.org/2002/07/owl#someValuesFrom"
const RDFSubPropertyOf = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
const RDFSubclass = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
const RDFTransitiveProperty = "http://www.w3.org/2002/07/owl/#TransitiveProperty"
const RDFType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
const RDFVersionInfo = "http://www.w3.org/2002/07/owl#versionInfo"

Variables

var ErrNoToken = errors.New("Expected token, but non was found.")

A special error that is used to specify that we expected to read a token but no further token was found. This way you can check whether an error occurred because a rule didn't match or simply because the stream was empty.
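
For example, a caller could distinguish the two cases roughly like this (a sketch; it assumes that NextToken reports an exhausted stream with ErrNoToken and that the package is imported as elowl):

func readOne(t elowl.TurtleTokenizer) (*elowl.TokenMatch, error) {
	match, err := t.NextToken()
	if err == elowl.ErrNoToken {
		// The stream is simply empty: no further token was available.
		return nil, nil
	}
	if err != nil {
		// A real error occurred while reading the input.
		return nil, err
	}
	return match, nil
}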

Functions

func EqualsRDFObject

func EqualsRDFObject(object, other RDFObjectType) bool

func GetObjectString

func GetObjectString(o RDFObjectType) (string, error)

func IsBlankNode

func IsBlankNode(s string) bool

Types

type ASTArc

type ASTArc struct {
	ArcType *ASTTypeInfo
	Dest    int
}

func NewASTArc

func NewASTArc(typeInfo *ASTTypeInfo, dest int) *ASTArc

type ASTBuilder

type ASTBuilder interface {
	BuildAST(t TurtleTokenizer, r io.Reader) (*TurtleAST, error)
}

type ASTConverter

type ASTConverter interface {
	Convert(ast *TurtleAST, defaultBase string, builder OWLBuilder) error
}

type ASTNode

type ASTNode struct {
	TokenRef int
	Arcs     []*ASTArc
}

func NewASTNode

func NewASTNode(tokenRef int) *ASTNode

func (*ASTNode) AddArc

func (node *ASTNode) AddArc(arc *ASTArc)

type ASTParser

type ASTParser struct {
	// contains filtered or unexported fields
}

func DefaultTurtleParser

func DefaultTurtleParser(numStatementWorkers int) *ASTParser

func NewASTParser

func NewASTParser(t TurtleTokenizer, astBuilder ASTBuilder, converter ASTConverter) *ASTParser

func (*ASTParser) Parse

func (parser *ASTParser) Parse(r io.Reader, defaultBase string, builder OWLBuilder) error

type ASTTypeInfo

type ASTTypeInfo struct {
	// contains filtered or unexported fields
}

func NewTypeInfoFromNameFromToken

func NewTypeInfoFromNameFromToken(t TurtleToken) *ASTTypeInfo

func NewTypeInfoFromNonterm

func NewTypeInfoFromNonterm(n TurtleNonterminal) *ASTTypeInfo

func (*ASTTypeInfo) GetNonterm

func (info *ASTTypeInfo) GetNonterm() (TurtleNonterminal, bool)

func (*ASTTypeInfo) GetToken

func (info *ASTTypeInfo) GetToken() (TurtleToken, bool)

func (*ASTTypeInfo) IsNonterm

func (info *ASTTypeInfo) IsNonterm(n TurtleNonterminal) bool

func (*ASTTypeInfo) IsToken

func (info *ASTTypeInfo) IsToken(t TurtleToken) bool

func (*ASTTypeInfo) String

func (info *ASTTypeInfo) String() string

type DefaultASTBuilder

type DefaultASTBuilder struct {
	// contains filtered or unexported fields
}

func NewDefaultASTBuilder

func NewDefaultASTBuilder() *DefaultASTBuilder

func (*DefaultASTBuilder) Bla

func (builder *DefaultASTBuilder) Bla(r io.Reader)

func (*DefaultASTBuilder) BuildAST

func (builder *DefaultASTBuilder) BuildAST(t TurtleTokenizer, r io.Reader) (*TurtleAST, error)

type DefaultASTConverter

type DefaultASTConverter struct {
	// contains filtered or unexported fields
}

func NewDefaultASTConverter

func NewDefaultASTConverter(numStatementWorkers int) *DefaultASTConverter

func (*DefaultASTConverter) Convert

func (converter *DefaultASTConverter) Convert(ast *TurtleAST, defaultBase string, builder OWLBuilder) error

type DefaultOWLBuilder

type DefaultOWLBuilder struct {
	NextBlankID  uint
	SubjectMap   TripleMap
	PredicateMap TripleMap
}

func NewDefaultOWLBuilder

func NewDefaultOWLBuilder() *DefaultOWLBuilder

func (*DefaultOWLBuilder) AnswerQuery

func (handler *DefaultOWLBuilder) AnswerQuery(subject, predicate *string, object RDFObjectType,
	f func(t *RDFTriple) error) error

func (*DefaultOWLBuilder) GetBlankNode

func (handler *DefaultOWLBuilder) GetBlankNode() string

func (*DefaultOWLBuilder) HandleTriple

func (handler *DefaultOWLBuilder) HandleTriple(t *RDFTriple) error

type OWLBuilder

type OWLBuilder interface {
	HandleTriple(t *RDFTriple) error
	GetBlankNode() string
}

type OneRegexTokenizer

type OneRegexTokenizer struct {
	// contains filtered or unexported fields
}

func NewOneRegexTokenizer

func NewOneRegexTokenizer() *OneRegexTokenizer

func (*OneRegexTokenizer) Init

func (t *OneRegexTokenizer) Init(r io.Reader) error

func (*OneRegexTokenizer) Match

func (t *OneRegexTokenizer) Match(str string)

func (*OneRegexTokenizer) NextToken

func (t *OneRegexTokenizer) NextToken() (*TokenMatch, error)

type OntologyLib

type OntologyLib struct {
	Db      *sql.DB
	BaseDir string
}

func NewOntologyLib

func NewOntologyLib(db *sql.DB, baseDir string) *OntologyLib

func (*OntologyLib) GetFile

func (lib *OntologyLib) GetFile(url *url.URL, addLocally bool) (*bytes.Reader, error)

func (*OntologyLib) InitDatabase

func (lib *OntologyLib) InitDatabase(driver string) error

func (*OntologyLib) InitLocal

func (lib *OntologyLib) InitLocal() error

func (*OntologyLib) RetrieveFromUrl

func (lib *OntologyLib) RetrieveFromUrl(url *url.URL) ([]byte, error)

type RDFObjectType

type RDFObjectType interface{}

type RDFTriple

type RDFTriple struct {
	Subject, Predicate string
	Object             RDFObjectType
}

func NewRDFTriple

func NewRDFTriple(subject, predicate string, object RDFObjectType) *RDFTriple

func (*RDFTriple) String

func (triple *RDFTriple) String() string

type RegexTokenizer

type RegexTokenizer struct {
	// contains filtered or unexported fields
}

Tokenizes a reader by trying a list of regexes; the first one that matches becomes the next token. Implements the tokenizer interface.
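
As an illustration of this idea (hypothetical code, not the tokenizer's actual internals; it assumes the regexp package is imported), matching against an ordered list of regexes could look roughly like this:

// patterns is a hypothetical, abbreviated table: each entry pairs a token
// with a regexp anchored at the start of the remaining input.
var patterns = []struct {
	tok TurtleToken
	re  *regexp.Regexp
}{
	{IRIRef, regexp.MustCompile(`^<[^>]*>`)},
	{Point, regexp.MustCompile(`^\.`)},
	{Comma, regexp.MustCompile(`^,`)},
	// ... further token patterns ...
}

// matchNext returns the first pattern that matches the remaining input,
// or ErrorToken if no pattern matches.
func matchNext(rest string) (TurtleToken, string, bool) {
	for _, p := range patterns {
		if m := p.re.FindString(rest); m != "" {
			return p.tok, m, true
		}
	}
	return ErrorToken, "", false
}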

func NewRegexTokenizer

func NewRegexTokenizer() *RegexTokenizer

func (*RegexTokenizer) Init

func (t *RegexTokenizer) Init(r io.Reader) error

func (*RegexTokenizer) NextToken

func (t *RegexTokenizer) NextToken() (*TokenMatch, error)

type SynchTokenizer

type SynchTokenizer struct {
	*RegexTokenizer
}

func (*SynchTokenizer) NextToken

func (t *SynchTokenizer) NextToken() (*TokenMatch, error)

type TBoxConverter

type TBoxConverter struct {
	Classes, BlankClasses []string
	ClassID               map[string]int
	Relations             []*elconc.BinaryObjectRelation
	RelationNames         []string
	RelationID            map[string]int
	SubProperties         []*elconc.SubProp
}

func NewTBoxConverter

func NewTBoxConverter() *TBoxConverter

func (*TBoxConverter) ConvertToTBox

func (converter *TBoxConverter) ConvertToTBox(handler TripleQueryHandler) error

type TokenMatch

type TokenMatch struct {
	Token TurtleToken
	Seq   string
}

A match defines the token type of the match and the string sequence that was matched.

func (*TokenMatch) CleanUp

func (match *TokenMatch) CleanUp()

type TripleMap

type TripleMap map[string]map[string][]RDFObjectType

func NewTripleMap

func NewTripleMap() TripleMap

func (TripleMap) AddElement

func (tm TripleMap) AddElement(key1, key2 string, value RDFObjectType)

type TripleQueryHandler

type TripleQueryHandler interface {
	AnswerQuery(subject, predicate *string, object RDFObjectType,
		f func(t *RDFTriple) error) error
}

type TurtleAST

type TurtleAST struct {
	Nodes  []*ASTNode
	Tokens []*TokenMatch
}

func NewTurtleAST

func NewTurtleAST(tokens []*TokenMatch) *TurtleAST

func (*TurtleAST) AddNode

func (ast *TurtleAST) AddNode(node *ASTNode) int

func (*TurtleAST) Backtrack

func (ast *TurtleAST) Backtrack(nodeID int)

func (*TurtleAST) GetArc

func (ast *TurtleAST) GetArc(node *ASTNode, ids ...int) *ASTArc

func (*TurtleAST) GetArcByID

func (ast *TurtleAST) GetArcByID(nodeID int, ids ...int) *ASTArc

func (*TurtleAST) String

func (ast *TurtleAST) String() string

type TurtleNonterminal

type TurtleNonterminal int

A type for the nonterminals as defined in the Turtle grammar. Nearly the same as in the formal specification, though we don't support everything just yet.

const (
	TurtleDoc TurtleNonterminal = iota
	Statement
	NonterminalPrefixID
	NonterminalBase
	Directive
	Subject
	IRI
	BlankNode
	Collection
	PrefixedName
	Triples
	Object
	BlankNodePropertyList
	Literal
	String
	ObjectList
	PredicateObjectList
	Verb
	Predicate
	RDFLiteral
)

func (TurtleNonterminal) String

func (n TurtleNonterminal) String() string

Human readable version.

type TurtleParser

type TurtleParser interface {
	Parse(r io.Reader, defaultBase string, builder OWLBuilder) error
}

An interface for everything that parses triples from a given reader and adds those triples to the builder for further processing.
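
Any OWLBuilder can be plugged into such a parser. The following sketch (hypothetical code; it assumes imports of fmt and io, that the package is imported as elowl, and that the "_:bN" blank node label format is acceptable) counts the triples a TurtleParser produces:

// countingBuilder is a hypothetical OWLBuilder that only counts triples.
type countingBuilder struct {
	triples int
	blanks  uint
}

func (b *countingBuilder) HandleTriple(t *elowl.RDFTriple) error {
	b.triples++
	return nil
}

func (b *countingBuilder) GetBlankNode() string {
	// Hand out a fresh blank node label on every call (assumed format).
	b.blanks++
	return fmt.Sprintf("_:b%d", b.blanks)
}

func countTriples(p elowl.TurtleParser, r io.Reader) (int, error) {
	b := &countingBuilder{}
	if err := p.Parse(r, "http://example.org/", b); err != nil {
		return 0, err
	}
	return b.triples, nil
}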

type TurtleToken

type TurtleToken int

Defines a type for Turtle grammar tokens. The list is nearly as defined in the Turtle documentation; some small changes have been made, for example @base and @prefix are defined as tokens.

const (
	ErrorToken TurtleToken = iota
	EOF
	WS
	Comment
	IRIRef
	PNameNS
	BlankNodeLabel
	StringLiteralQuote
	Annon
	PNameLN
	Point
	OpenBrace
	CloseBrace
	OpenBracket
	CloseBracket
	OpenCurlyBrace
	CloseCurlyBrace
	Comma
	Semicolon
	Averb
	PrefixDirective
	BaseDirective
)

func TokenFromRegexCapture

func TokenFromRegexCapture(str string) TurtleToken

func (TurtleToken) RegexCapture

func (t TurtleToken) RegexCapture() string

func (TurtleToken) String

func (t TurtleToken) String() string

A human readable version of the token name.

type TurtleTokenizer

type TurtleTokenizer interface {
	Init(r io.Reader) error
	NextToken() (*TokenMatch, error)
}

An interface for tokenizers that return a sequence of tokens. Each time before you use a tokenizer you *must* call its Init method. The tokenizer should also be able to handle subsequent calls with different readers, i.e. you can reuse it for tokenizing another file, but before you use NextToken you must always call Init first. After that, each call to NextToken returns the next token. Note one important thing about the NextToken method: an error != nil should only be returned if there was an error while reading the input. If a syntax error occurred, return error = nil together with the token ErrorToken.
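
A minimal sketch of driving a tokenizer according to this contract (hypothetical code; it assumes imports of fmt and io, that the package is imported as elowl, and that the end of input is signalled with the EOF token):

func dumpTokens(t elowl.TurtleTokenizer, r io.Reader) error {
	// Init must be called before the first NextToken.
	if err := t.Init(r); err != nil {
		return err
	}
	for {
		match, err := t.NextToken()
		if err != nil {
			// A non-nil error means the input could not be read.
			return err
		}
		switch match.Token {
		case elowl.ErrorToken:
			// Syntax errors are reported via ErrorToken, not via err.
			return fmt.Errorf("syntax error near %q", match.Seq)
		case elowl.EOF:
			// Assumed end-of-input signal.
			return nil
		default:
			fmt.Println(match.Token, match.Seq)
		}
	}
}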
