rabbit

package module
v0.0.0-...-490b20b Latest
Published: Apr 9, 2021 License: MIT Imports: 12 Imported by: 1

README

🐰rabbit

An interpreted language written in Go - XPath 3.1 implementation for HTML

XML Path Language (XPath) 3.1 has been a W3C Recommendation since 21 March 2017. The rabbit language is built for selecting HTML nodes with XPath syntax.

Overview

The Rabbit language is built for HTML, not for XML. Since XPath 3.1 targets XML, it was not possible to implement every concept listed in https://www.w3.org/TR/xpath-31/. In most cases, though, the rabbit language is sufficient for selecting HTML nodes.

For example:

  • //a
  • //div[@category='web']/preceding::node()[2]
  • let $abc := ('a', 'b', 'c') return fn:insert-before($abc, 4, 'z')

Basic Usage

// Calls on the xpath object can be chained. data is nil or a []string.
data := rabbit.New().SetDoc("uri/or/filepath.txt").Eval("//a").GetAll()
// If you expect the evaluated result to be a sequence of HTML nodes,
// use NodeAll() instead of DataAll() or GetAll().
nodes := rabbit.New().SetDoc("uri/or/filepath.txt").Eval("//a").NodeAll()
// With error checks.
x := rabbit.New()
x.SetDoc("uri/or/filepath.txt")
if len(x.Errors()) > 0 {
  // ... do something with the errors (x.Errors() returns []error)
}
x.Eval("//a")
if len(x.Errors()) > 0 {
  // ... do something with the errors
}
items := x.DataAll() // DataAll returns []interface{}, so it needs its own variable
// Without SetDoc. Since no document is set in the context,
// node-related xpath expressions are not going to work.
y := rabbit.New()
result := y.Eval("1+1").Data()
// You can test simple xpath expressions using the CLI program.
rabbit.New().SetDoc("uri/or/filepath.txt").CLI()

Features

What is supported
  1. Primary Expressions
    • Integer(1)
    • Decimal(1.1)
    • Double(1e1)
    • String("")
    • Boolean(true, false)
    • Variable($var)
    • Context Item(.)
    • Placeholder(?)
  2. Functions
    • Named Function(built in function - bif)
    • Inline Function(custom function)
    • Map
    • Array
    • Arrow operator(=>)
    • Simple Map Operator(!)
  3. Path Expressions
    • Forward Step(child::, descendant::, ...)
    • Reverse Step(parent::, ...)
    • Node Test
    • Predicate([])
    • Abbreviated Syntax(@, ..)
  4. Sequence Expressions(())
  5. Arithmetic Expressions
    • Additive(+, -)
    • Multiplicative(*, div, idiv, mod)
    • Unary(+, -)
  6. String Concatenation Expressions(||)
  7. Comparison Expressions
    • Value Compare(eq, ne, lt, le, gt, ge)
    • Node Compare(is, <<, >>)
    • General Compare(=, !=, <, <=, >, >=)
  8. Logical Expressions(and, or)
  9. For Expressions(for)
  10. Let Expressions(let)
  11. Conditional Expressions(if)
  12. Quantified Expressions(some, every)
  13. Lookup(?)
What is not supported
  1. Namespace
    The Rabbit language ignores prefixed tag names and xmlns attributes in tags. An xmlns attribute is not treated as a namespace node, and a prefixed tag does not raise an error when the document declares no namespace for its prefix.

  2. Limited Types
    The XPath data model defines many data types. You can check all of them at https://www.w3.org/TR/xpath-datamodel-31/. Many of these types are not supported in the Rabbit language, and most of the data types in the Rabbit language are simplified to strings. It makes no sense to implement all the data types because HTML has nothing like XML Schema Definition (xsd).

  3. Limited KindTest
    In the XPath 3.1 document, there are 10 kinds of KindTest. But the namespace-node test, processing-instruction test, schema-attribute test, and schema-element test are not supported in the Rabbit language because our parsing engine (/x/net/html) does not recognize them.

  4. Sequence Type Check
    In XPath 3.1, you can specify data types in an inline function, like this: function($a as xs:string) as xs:string {$a}. This syntax is not part of the Rabbit language. An inline function should look like this: function($a) {$a}.

  5. Node Test with Argument
    Node tests with arguments are not supported. For example, element(person), element(person, surgeon), element(*, surgeon), attribute(price), and attribute(*, xs:decimal) are not allowed. But you can use element() and attribute() without arguments.

  6. Wildcard Expressions
    Only the * wildcard is allowed in the Rabbit language. NCName:*, *:NCName, and BracedURILiteral* are not supported because the Rabbit language ignores namespaces.

Notice

Attribute node is custom *html.Node type

The Rabbit language supports attribute nodes. But the /x/net/html package has no such type (it only has 6 kinds of nodes) and treats an attribute as a field of an element node. So, in order to represent an attribute as a node, I had to make a custom *html.Node. It has the following fields.

  • Type: html.NodeType(7).
  • Parent: the node(*html.Node) that contains the attribute
  • FirstChild, LastChild: nil
  • PrevSibling, NextSibling: prev or next attribute node(*html.Node) of current one
  • Data: attribute key(string).
  • DataAtom: atomized Data(atom.Atom)
  • Namespace: ""(empty string)
  • Attr: contains exactly one html.Attribute item, which holds the key/value pair for the attribute.
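
The field layout above can be sketched as follows. This is an illustration under stated assumptions, not rabbit's code: node and attr are simplified local stand-ins for html.Node and html.Attribute so the example runs without golang.org/x/net/html, and attributeNodes is a hypothetical helper name.

```go
package main

import "fmt"

// attr and node are simplified stand-ins for html.Attribute and html.Node.
type attr struct{ Key, Val string }

type node struct {
	Type                     int
	Parent                   *node
	PrevSibling, NextSibling *node
	Data                     string
	Attr                     []attr
}

const (
	elementNode   = 3 // stand-in value for an element node in this sketch
	attributeNode = 7 // the custom node type value listed above
)

// attributeNodes builds one attribute node per attribute of el and links
// neighbouring attributes as siblings. FirstChild and LastChild are omitted
// from the stand-in type because they are always nil for attribute nodes.
func attributeNodes(el *node) []*node {
	nodes := make([]*node, 0, len(el.Attr))
	for _, a := range el.Attr {
		n := &node{
			Type:   attributeNode,
			Parent: el,
			Data:   a.Key,
			Attr:   []attr{a}, // exactly one key/value pair
		}
		if len(nodes) > 0 {
			prev := nodes[len(nodes)-1]
			prev.NextSibling = n
			n.PrevSibling = prev
		}
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	a := &node{Type: elementNode, Data: "a", Attr: []attr{{"href", "/home"}, {"class", "nav"}}}
	for _, n := range attributeNodes(a) {
		fmt.Println(n.Type, n.Data, n.Attr[0].Val, n.Parent.Data)
	}
}
```

In practice this means code that receives a node from NodeAll can check whether its Type equals html.NodeType(7) to tell an attribute node apart from the node kinds that /x/net/html defines.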

Not well-formed document will be transformed

The Rabbit language uses the /x/net/html package for parsing HTML, so the type of a selected node is *html.Node. One thing you should know is that the /x/net/html package wraps a document in html, head, and body tags if it is not well-formed.

For example, if your document looks like this

<div>
  ...
</div>

/x/net/html package transforms the document to this internally.

<html>
  <head></head>
  <body>
    <div>
      ...
    </div>
  </body>
</html>

So, in this example, the XPath expression /div has no result because the root element is html, not div. Keep this in mind, or the results may confuse you.
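
The effect on absolute paths can be sketched with a small stand-in tree. The elem type and the selectChildPath helper are hypothetical and only model child steps; they are not part of rabbit or /x/net/html.

```go
package main

import "fmt"

// elem is a simplified stand-in for a parsed element.
type elem struct {
	Name     string
	Children []*elem
}

// selectChildPath walks an absolute child path such as /html/body/div,
// starting from the document node, and returns the matching elements.
func selectChildPath(doc *elem, names ...string) []*elem {
	current := []*elem{doc}
	for _, name := range names {
		var next []*elem
		for _, e := range current {
			for _, c := range e.Children {
				if c.Name == name {
					next = append(next, c)
				}
			}
		}
		current = next
	}
	return current
}

func main() {
	// The tree the parser builds for a bare <div>...</div> fragment.
	div := &elem{Name: "div"}
	body := &elem{Name: "body", Children: []*elem{div}}
	html := &elem{Name: "html", Children: []*elem{{Name: "head"}, body}}
	doc := &elem{Name: "#document", Children: []*elem{html}}

	fmt.Println(len(selectChildPath(doc, "div")))                 // 0
	fmt.Println(len(selectChildPath(doc, "html", "body", "div"))) // 1
}
```

With the wrapped tree, /html/body/div or the descendant form //div reaches the element that /div misses.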

Documentation

Types

type XPath

type XPath struct {
	// contains filtered or unexported fields
}

XPath is the base object for evaluating xpath expressions. The xpath field holds the expression saved by the Eval method. The context field holds the document and the context node; SetDoc saves a document into it. The evaled field is set when Eval is called and holds an object.Item, a custom data type used in the rabbit language. You can convert an object.Item to a Go data type using the Data or Node methods (or DataAll and NodeAll). The errors field collects the errors raised while parsing and evaluating.
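
The chaining style relies on each method returning the receiver and recording failures on it rather than returning them. The following is a minimal sketch of that pattern, assuming a failed step does not stop later steps in the chain. The chain type and its methods are hypothetical stand-ins and do not reflect rabbit's internals.

```go
package main

import (
	"errors"
	"fmt"
)

// chain is a hypothetical stand-in for a chainable evaluator. It only
// illustrates collecting errors on the receiver between chained calls.
type chain struct {
	doc    string
	result []string
	errs   []error
}

// setDoc records the document, or an error when the input is empty.
func (c *chain) setDoc(doc string) *chain {
	if doc == "" {
		c.errs = append(c.errs, errors.New("setDoc: empty document"))
		return c
	}
	c.doc = doc
	return c
}

// eval pretends to evaluate an expression; it fails without a document.
func (c *chain) eval(expr string) *chain {
	if c.doc == "" {
		c.errs = append(c.errs, fmt.Errorf("eval %q: no document set", expr))
		return c
	}
	c.result = []string{expr + " on " + c.doc}
	return c
}

// collected returns every error recorded so far.
func (c *chain) collected() []error { return c.errs }

func main() {
	ok := (&chain{}).setDoc("page.html").eval("//a")
	fmt.Println(len(ok.collected()), ok.result) // 0 [//a on page.html]

	bad := (&chain{}).setDoc("").eval("//a")
	for _, err := range bad.collected() {
		fmt.Println(err)
	}
}
```

Because a chained call cannot return an error alongside the receiver, checking the collected errors after each step, or once at the end of the chain, is the only way to detect a failure.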

func New

func New() *XPath

New creates a new XPath object.

func (*XPath) CLI

func (x *XPath) CLI()

CLI is a command line interface

func (*XPath) Data

func (x *XPath) Data() interface{}

Data returns the first item of the value returned by DataAll.

func (*XPath) DataAll

func (x *XPath) DataAll() []interface{}

DataAll converts the evaled field to []interface{}.
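
Because the items come back as interface{}, callers typically need a type switch to work with them. The sketch below uses plain Go values in place of a real DataAll result. Which concrete types rabbit puts in the slice is not stated here, so the cases and the helper name describe are assumptions for illustration only.

```go
package main

import "fmt"

// describe reports the dynamic type of one item from a []interface{}.
// The set of cases is an assumption made for this sketch.
func describe(item interface{}) string {
	switch v := item.(type) {
	case nil:
		return "nil"
	case string:
		return fmt.Sprintf("string %q", v)
	case bool:
		return fmt.Sprintf("bool %t", v)
	case int, int64:
		return fmt.Sprintf("integer %d", v)
	case float64:
		return fmt.Sprintf("float %g", v)
	default:
		return fmt.Sprintf("other %T", v)
	}
}

func main() {
	// Hypothetical values standing in for a DataAll result.
	items := []interface{}{"a", 2, 1.5, true, nil}
	for _, it := range items {
		fmt.Println(describe(it))
	}
}
```

The default case keeps the switch safe when an item has a type the caller did not anticipate.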

func (*XPath) Errors

func (x *XPath) Errors() []error

Errors returns errors field

func (*XPath) Eval

func (x *XPath) Eval(input string) *XPath

Eval evaluates an xpath expression and saves the result to the evaled field.

func (*XPath) Evals

func (x *XPath) Evals(input string) []*XPath

Evals evaluates an xpath expression and returns a slice of *XPath.

func (*XPath) Get

func (x *XPath) Get() string

func (*XPath) GetAll

func (x *XPath) GetAll() []string

func (*XPath) Node

func (x *XPath) Node() *html.Node

Node returns the first item of the value returned by NodeAll.

func (*XPath) NodeAll

func (x *XPath) NodeAll() []*html.Node

NodeAll converts the evaled field to []*html.Node.

func (*XPath) Raw

func (x *XPath) Raw() object.Item

Raw returns evaled field

func (*XPath) SetDoc

func (x *XPath) SetDoc(input string) *XPath

SetDoc sets the document in the context. If no document is set in the context, node-related xpath expressions are not going to work. The input parameter can be a URL or a local file path.

func (*XPath) SetDocN

func (x *XPath) SetDocN(n *html.Node) *XPath

SetDocN is another version of SetDoc.

func (*XPath) SetDocR

func (x *XPath) SetDocR(r *http.Response) *XPath

SetDocR is another version of SetDoc.

func (*XPath) SetDocS

func (x *XPath) SetDocS(s string) *XPath

SetDocS is another version of SetDoc.

func (*XPath) String

func (x *XPath) String() string

String returns input field
