charlatan

package module
v0.0.0-...-c5ebb49 Latest Latest
Published: Dec 28, 2016 License: MIT Imports: 8 Imported by: 4

README

Charlatan

Charlatan is a query engine for lists or streams of records in different formats. It natively supports CSV and JSON formats but can easily be extended to others.

It supports an SQL-like query language that is defined below. Queries are applied to records to extract values depending on zero or more criteria.

Query Syntax

SELECT <fields> FROM <source> [ WHERE <value> ] [ STARTING AT <index> ] [ LIMIT [<offset>,] <count> ]
  • <fields> is a list of comma-separated field names. Each field name must exist in the source. When reading CSV files, field names are the column names; when reading JSON, they are keys.
  • <source> is the filename from which the data is read. The API is agnostic about this, and one can implement support for any source type.
  • <value> is a SQL-like value, which can be either a constant (e.g. WHERE 1), a field (e.g. WHERE archived) or any operation using comparison operators (=, !=, <, <=, >, >=), logical operators (AND, OR) and optionally parentheses (e.g. WHERE (foo > 2) AND (bar = "yo")). The parser also accepts && instead of AND and || instead of OR. It also supports inclusive range tests, like WHERE age BETWEEN 20 AND 30.
  • LIMIT N can be used to keep only the first N matched records. It also supports the MySQL way of specifying offsets: LIMIT M, N can be used to get the first N matched records after the M-th.
  • STARTING AT <index> can be used to skip the first <index> records. It’s equivalent to the <offset> field of the LIMIT clause, and if both clauses are used in a query, the last one will be used.
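
The way an offset and a count combine can be sketched on an in-memory slice of records that already matched. This is only an illustration: window is a hypothetical helper that is not part of charlatan, and it assumes the offset counts matched records, as the LIMIT M, N form describes.

```go
package main

import "fmt"

// window is a hypothetical helper (not part of charlatan): it skips the
// first offset items, then keeps at most count items. A negative count
// means "no limit" in this sketch.
func window(matched []string, offset, count int) []string {
	if offset < 0 {
		offset = 0
	}
	if offset >= len(matched) {
		return nil
	}
	rest := matched[offset:]
	if count >= 0 && count < len(rest) {
		rest = rest[:count]
	}
	return rest
}

func main() {
	matched := []string{"a", "b", "c", "d", "e", "f"}

	// LIMIT 2: the first two matched records.
	fmt.Println(window(matched, 0, 2))
	// LIMIT 3, 2: two matched records after the third one.
	fmt.Println(window(matched, 3, 2))
	// STARTING AT 4 without LIMIT: everything after the fourth record.
	fmt.Println(window(matched, 4, -1))
}
```

In a streaming setting the same effect would be obtained with two counters instead of slicing, since the records are not all in memory.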

Constant values include strings, integers, floats, booleans and the null value.

Examples
SELECT CountryName FROM sample/csv/population.csv WHERE Year = 2010 AND Value > 50000000 AND Value < 70000000
SELECT name, age FROM sample/json/people.jsons WHERE stats.walking > 30 AND stats.biking < 300
SELECT name, age FROM sample/json/people.jsons WHERE stats.walking BETWEEN 20 AND 100 LIMIT 10, 5
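
The JSON examples above use dotted names such as stats.walking. How the bundled JSON record resolves them is not spelled out here, so the following is only a sketch under the assumption that each dot descends one level into a nested object. The lookup helper and the sample record are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup is a hypothetical helper: it walks a decoded JSON object following
// a dot-separated path and reports whether the key was found.
func lookup(obj map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // tried to descend into a non-object
		}
		cur, ok = m[key]
		if !ok {
			return nil, false // key is missing at this level
		}
	}
	return cur, true
}

func main() {
	// A made-up record; the real sample files may look different.
	raw := `{"name": "Ann", "age": 31, "stats": {"walking": 42, "biking": 120}}`

	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		fmt.Println("decode error:", err)
		return
	}

	v, ok := lookup(obj, "stats.walking")
	fmt.Println(v, ok)
}
```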
Type Coercion Rules
  • int: same value if the constant is an integer. Truncated value if it’s a float. 1 if it’s a true boolean. 0 for everything else.
  • float: same value if the constant is an integer or a float. 1.0 if it’s a true boolean. 0.0 for everything else.
  • boolean: true if it’s a string (even if it’s empty), a true boolean, a non-zero integer or float. false for everything else.
  • string: the string representation of the constant. null becomes "null".

These rules mean that e.g. WHERE 0 is equivalent to WHERE false and WHERE "" is equivalent to WHERE true.
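
The rules above can be mirrored in a few lines of plain Go. This is an illustrative stand-in and not charlatan's implementation: values are held in interface{}, nil plays the role of null, and the exact float-to-string format is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// toBool mirrors the boolean rule: strings are always true, numbers are
// true when non-zero, null and everything else are false.
func toBool(v interface{}) bool {
	switch x := v.(type) {
	case bool:
		return x
	case int64:
		return x != 0
	case float64:
		return x != 0
	case string:
		return true // any string, even an empty one
	default:
		return false
	}
}

// toInt mirrors the int rule: integers are kept, floats are truncated,
// true becomes 1 and everything else becomes 0.
func toInt(v interface{}) int64 {
	switch x := v.(type) {
	case int64:
		return x
	case float64:
		return int64(x)
	case bool:
		if x {
			return 1
		}
	}
	return 0
}

// toString mirrors the string rule, with null rendered as "null".
// The float format used here is an assumption of this sketch.
func toString(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case string:
		return x
	case int64:
		return strconv.FormatInt(x, 10)
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64)
	case bool:
		return strconv.FormatBool(x)
	}
	return fmt.Sprint(v)
}

func main() {
	fmt.Println(toBool(""), toBool(int64(0)), toBool(nil))
	fmt.Println(toInt(3.9), toInt(true), toInt("12"))
	fmt.Println(toString(nil), toString(int64(7)))
}
```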

API

The library is responsible for parsing the query and executing it against records. Everything else is up to you, including how fields are retrieved from records.

Note: code examples below don’t include error handling for clarity purposes.

// parse the query
query, _ := charlatan.QueryFromString("SELECT foo FROM myfile.json WHERE foo > 2")

// open the source file
reader, _ := os.Open(query.From())

defer reader.Close()

// skip lines if the query contains "STARTING AT <n>"
skip := query.StartingAt()

decoder := json.NewDecoder(reader)

for {
    // get a new JSON record
    r, err := record.NewJSONRecordFromDecoder(decoder)

    if err == io.EOF {
        break
    }

    // here we use STARTING AT to skip all lines, not only the ones that match
    // the query. This is not the usual behavior, but we can do whatever we
    // want here. The record is read before this check so that skipping
    // actually consumes it.
    if skip > 0 {
        skip--
        continue
    }

    // evaluate the query against the record to test if it matches
    if match, _ := query.Evaluate(r); !match {
        continue
    }

    // extract the values and print them
    values, _ := query.FieldsValues(r)
    fmt.Printf("%v\n", values)
}
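
The loop above handles STARTING AT but never looks at LIMIT. A caller that wants to honor it could count matches and stop early. The sketch below shows that idea on plain strings: the limiter interface only lists the HasLimit and Limit signatures documented for Query, while fakeQuery, takeMatches and the match callback are hypothetical stand-ins that keep the example runnable without the library.

```go
package main

import "fmt"

// limiter lists the two Query methods this sketch relies on.
type limiter interface {
	HasLimit() bool
	Limit() int64
}

// fakeQuery is a stand-in used only to make the sketch runnable.
type fakeQuery struct{ limit int64 }

func (q fakeQuery) HasLimit() bool { return q.limit > 0 }
func (q fakeQuery) Limit() int64   { return q.limit }

// takeMatches collects matching lines and stops once the limit is reached.
// The match callback stands in for query.Evaluate.
func takeMatches(q limiter, lines []string, match func(string) bool) []string {
	var out []string
	for _, l := range lines {
		if q.HasLimit() && int64(len(out)) >= q.Limit() {
			break
		}
		if match(l) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{"apple", "avocado", "banana", "apricot"}
	startsWithA := func(s string) bool { return len(s) > 0 && s[0] == 'a' }

	fmt.Println(takeMatches(fakeQuery{limit: 2}, lines, startsWithA))
	fmt.Println(takeMatches(fakeQuery{}, lines, startsWithA))
}
```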

Two record types are included: JSONRecord and CSVRecord. Implementing a record only requires one method: Find(*Field) (*Const, error), which takes a field and returns its value.

As an example, let’s implement a LineRecord that’ll be used to get specific characters on each line of a file, c0 being the first character:

type LineRecord struct { Line string }

func (r *LineRecord) Find(f *charlatan.Field) (*charlatan.Const, error) {

    // this is the name of the field whose value we must return
    name := f.Name()

    // we reject fields that don't start with 'c'
    if len(name) < 2 || name[0] != 'c' {
        return nil, fmt.Errorf("Unknown field '%s'", name)
    }

    // we extract the character index from the field name.
    index, err := strconv.ParseInt(name[1:], 10, 64)
    if err != nil {
        return nil, err
    }

    // let's not be too strict and accept out-of-range indexes
    if index < 0 || index >= int64(len(r.Line)) {
        return charlatan.StringConst(""), nil
    }

    return charlatan.StringConst(fmt.Sprintf("%c", r.Line[index])), nil
}

One can now loop over a file’s content, construct LineRecords from its lines and evaluate queries against them:

query, _ := charlatan.QueryFromString("SELECT c1 FROM myfile WHERE c0 = 'a'")

f, _ := os.Open(query.From())
defer f.Close()

s := bufio.NewScanner(f)
for s.Scan() {
    r := &LineRecord{Line: s.Text()}

    if m, _ := query.Evaluate(r); !m {
        continue
    }

    values, _ := query.FieldsValues(r)
    fmt.Printf("%v\n", values)
}
Examples

Two examples are included in the repository under sample/csv/ and sample/json/.

Documentation

Types

type Const

type Const struct {
	// contains filtered or unexported fields
}

Const represents a Constant

func BoolConst

func BoolConst(value bool) *Const

BoolConst returns a new Const of type Bool

func ConstFromString

func ConstFromString(s string) *Const

ConstFromString parses a Const from a string

func FloatConst

func FloatConst(value float64) *Const

FloatConst returns a new Const of type float

func IntConst

func IntConst(value int64) *Const

IntConst returns a new Const of type int

func NewConst

func NewConst(value interface{}) (*Const, error)

NewConst creates a new constant whatever the type is

func NullConst

func NullConst() *Const

NullConst returns a new const of type null

func StringConst

func StringConst(value string) *Const

StringConst returns a new Const of type String

func (Const) AsBool

func (c Const) AsBool() bool

AsBool converts into a bool

  • for bool, returns the value
  • for null, returns false
  • for numeric, returns true if not 0
  • for strings, returns true (test existence)

func (Const) AsFloat

func (c Const) AsFloat() float64

AsFloat converts into a float64. Returns 0 if the const is a string or null.

func (Const) AsInt

func (c Const) AsInt() int64

AsInt converts into an int64. Returns 0 if the const is a string or null.

func (Const) AsString

func (c Const) AsString() string

AsString converts into a string

func (Const) CompareTo

func (c Const) CompareTo(c2 *Const) (int, error)

CompareTo returns:

  • a positive integer if this Constant is greater than the given one
  • a negative integer if this Constant is lower than the given one
  • zero if this Constant is equal to the given one

If the comparison is not possible (incompatible types), an error will be returned
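
The contract above (sign of the result plus an error for incompatible operands) can be illustrated with a small stand-alone function. This is not the library's code: compare, toFloat and errIncompatible are hypothetical, and treating a string compared with a number as incompatible is an assumption of this sketch, since the accepted type pairs are not listed here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errIncompatible = errors.New("incompatible types")

// toFloat widens the numeric types handled by this sketch to float64.
func toFloat(v interface{}) (float64, bool) {
	switch x := v.(type) {
	case int64:
		return float64(x), true
	case float64:
		return x, true
	}
	return 0, false
}

// compare returns a positive, negative or zero result in the same spirit as
// CompareTo, for string and numeric operands only.
func compare(a, b interface{}) (int, error) {
	if sa, ok := a.(string); ok {
		sb, ok := b.(string)
		if !ok {
			return 0, errIncompatible
		}
		return strings.Compare(sa, sb), nil
	}
	fa, okA := toFloat(a)
	fb, okB := toFloat(b)
	if !okA || !okB {
		return 0, errIncompatible
	}
	switch {
	case fa > fb:
		return 1, nil
	case fa < fb:
		return -1, nil
	}
	return 0, nil
}

func main() {
	fmt.Println(compare(int64(3), 2.5))
	fmt.Println(compare("a", "b"))
	fmt.Println(compare("a", int64(1)))
}
```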

func (Const) Evaluate

func (c Const) Evaluate(r Record) (*Const, error)

Evaluate evaluates a const against a record. In practice it always returns a pointer to itself.

func (Const) IsBool

func (c Const) IsBool() bool

IsBool tests if a const is a bool

func (Const) IsNull

func (c Const) IsNull() bool

IsNull tests if a const is null

func (Const) IsNumeric

func (c Const) IsNumeric() bool

IsNumeric tests if a const has a numeric type (int or float)

func (Const) IsString

func (c Const) IsString() bool

IsString tests if a const is a string

func (Const) String

func (c Const) String() string

func (Const) Value

func (c Const) Value() interface{}

Value returns the value of a const

type Field

type Field struct {
	// contains filtered or unexported fields
}

Field is a field that appears in the SELECT part or in the condition of a query. A field is an operand: evaluating it returns the value extracted from the Record.

func NewField

func NewField(name string) *Field

NewField returns a new field from the given string

func (Field) Evaluate

func (f Field) Evaluate(record Record) (*Const, error)

Evaluate evaluates the field on a record

func (Field) Name

func (f Field) Name() string

Name returns the field's name

func (Field) String

func (f Field) String() string

type Query

type Query struct {
	// contains filtered or unexported fields
}

Query is a query

func NewQuery

func NewQuery(from string) *Query

NewQuery creates a new query with the given from part

func QueryFromString

func QueryFromString(s string) (*Query, error)

QueryFromString creates a query from the given string

func (*Query) AddField

func (q *Query) AddField(field *Field)

AddField adds one field

func (*Query) AddFields

func (q *Query) AddFields(fields []*Field)

AddFields adds multiple fields

func (*Query) Evaluate

func (q *Query) Evaluate(record Record) (bool, error)

Evaluate evaluates the query against the given record

func (*Query) Fields

func (q *Query) Fields() []*Field

Fields returns the fields

func (*Query) FieldsValues

func (q *Query) FieldsValues(record Record) ([]*Const, error)

FieldsValues extracts the value of each field from the given record. Note that you should evaluate the query first.

func (*Query) From

func (q *Query) From() string

From returns the FROM part of this query

func (*Query) HasLimit

func (q *Query) HasLimit() bool

HasLimit tests if the query has a limit

func (*Query) Limit

func (q *Query) Limit() int64

Limit returns the 'LIMIT' part of the query, or 0 if it's not present

func (*Query) StartingAt

func (q *Query) StartingAt() int64

StartingAt returns the 'STARTING AT' part of the query, or 0 if it's not present

func (*Query) String

func (q *Query) String() string

String returns a string representation of this query

type Record

type Record interface {
	Find(*Field) (*Const, error)
}

A Record is a record

Directories

Path Synopsis
Package record provides records for the charlatan package
samples
csv
tests
