Package tok
v0.0.8
Published: Oct 10, 2018 License: BSD-3-Clause Imports: 4 Imported by: 2

README


tok

A naive tokenizer library

Public Interface

  • Backup - given a token and a buffer, returns a new buffer with the token's value as its prefix
    • parameters
      • Token
      • buffer (byte array)
    • returns
      • buffer (byte array)
  • Between - returns the value between opening and closing delimiter values
    • parameters
      • open value (byte array)
      • close value (byte array)
      • escape value (byte array)
      • buffer (byte array)
    • returns
      • between content (byte array)
      • buffer (byte array)
      • error value if closing value not found before end of buffer
  • Peek - returns the next token without consuming the buffer being scanned
    • parameters
      • buffer (byte array)
    • returns
      • Token
  • Skip - scans through a buffer until a token is found, returns skipped content, token and remaining buffer
    • parameters
      • Token
      • buffer (byte array)
    • returns
      • skipped content (byte array)
      • Token
      • buffer (byte array)
  • Skip2 - like Skip but allows a Tokenizer to be passed in rather than using the default Tok().
    • parameters
      • Token
      • buffer (byte array)
      • Tokenizer function
    • returns
      • skipped content (byte array)
      • Token
      • buffer (byte array)
  • Token - a simple structure
    • properties
      • Type is a string holding the label of the token type
      • Value is a byte array holding the value of the token
  • Tokenizer - a function type that can be applied by Tok2; may be recursive
    • parameters
      • byte array
      • a Tokenizer function
    • returns
      • Token
      • byte array of remaining buffer
  • Tok - a simple, non-look-ahead tokenizer
    • parameter
      • a byte array representing the buffer to evaluate
    • returns
      • a Token of Type Letter, Numeral, Punctuation, or Space
      • the remaining buffer byte array
  • Tok2 - a look-ahead tokenizer driven by a Tokenizer function
    • parameters
      • a byte array representing the buffer to evaluate
      • A Tokenizer function
    • returns
      • a Token of Type defined by the Tokenizer function
      • the remaining buffer byte array
  • Words - an example Tokenizer function
    • returns tokens of type Numeral, Punctuation, Space, or Word

Documentation

Overview

Package tok is a naive tokenizer

@author R. S. Doiel, <rsdoiel@caltech.edu>

Copyright (c) 2016, Caltech All rights not granted herein are expressly reserved by Caltech.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Index

Constants

const (
	// Version of the tok package
	Version = `v0.0.2`

	// Letter is an alphabetical letter (e.g. A-Z, a-z in English)
	Letter = "Letter"
	// Numeral is a single digit
	Numeral = "Numeral"
	// Punctuation is any non-number, non-alphabetical, non-space character (e.g. periods, colons, bang, hash mark)
	Punctuation = "Punctuation"
	// Space characters representing white space (e.g. space, tab, new line, carriage return)
	Space = "Space"

	// Word is a sequence of characters delimited by spaces
	Word = "Word"
	// OpenCurly bracket, e.g. "{"
	OpenCurlyBracket = "OpenCurlyBracket"
	// CloseCurly bracket, e.g. "}"
	CloseCurlyBracket = "CloseCurlyBracket"
	// CurlyBracket, e.g. "{}"
	CurlyBracket = "CurlyBracket"
	// OpenSquareBracket, e.g. "["
	OpenSquareBracket = "OpenSquareBracket"
	// CloseSquareBracket, e.g. "]"
	CloseSquareBracket = "CloseSquareBracket"
	// SquareBracket, e.g. "[]"
	SquareBracket = "SquareBracket"
	// OpenAngleBracket, e.g. "<"
	OpenAngleBracket = "OpenAngleBracket"
	// CloseAngleBracket, e.g. ">"
	CloseAngleBracket = "CloseAngleBracket"
	// AngleBracket, e.g. "<>"
	AngleBracket = "AngleBracket"
	// AtSign, e.g. "@"
	AtSign = "AtSign"
	// EqualSign, e.g. "="
	EqualSign = "EqualSign"
	// DoubleQuote, e.g. "\""
	DoubleQuote = "DoubleQuote"
	// SingleQuote, e.g., "'"
	SingleQuote = "SingleQuote"

	// EOF is an end of file token type. It is separate from Space only because it is a common stop condition
	EOF = "EOF"
)

Variables

var (
	// Numerals is the set of digit characters
	Numerals = []byte("0123456789")

	// Spaces is the set of white space characters
	Spaces = []byte(" \t\r\n")

	// PunctuationMarks is the set of punctuation characters
	PunctuationMarks = []byte("~!@#$%^&*()_+`-=:{}|[]\\:;\"'<>?,./")

	// These map to the specialized tokens
	AtSignMark = []byte("@")
	// EqualMark, e.g. =
	EqualMark = []byte("=")
	// DoubleQuoteMark, e.g. "\""
	DoubleQuoteMark = []byte("\"")
	// SingleQuoteMark, e.g. "'"
	SingleQuoteMark = []byte("'")

	// OpenCurlyBrackets token
	OpenCurlyBrackets = []byte("{")
	// CloseCurlyBrackets token
	CloseCurlyBrackets = []byte("}")
	// CurlyBrackets tokens
	CurlyBrackets = []byte("{}")

	// OpenSquareBrackets token
	OpenSquareBrackets = []byte("[")
	// CloseSquareBrackets token
	CloseSquareBrackets = []byte("]")
	// SquareBrackets tokens
	SquareBrackets = []byte("[]")

	// OpenAngleBrackets token
	OpenAngleBrackets = []byte("<")
	// CloseAngleBrackets token
	CloseAngleBrackets = []byte(">")
	// AngleBrackets tokens
	AngleBrackets = []byte("<>")
)

Functions

func Backup

func Backup(token *Token, buf []byte) []byte

Backup pushes a Token back onto the front of a Buffer

func Between

func Between(openValue []byte, closeValue []byte, escapeValue []byte, buf []byte) ([]byte, []byte, error)

Between returns the buf between two delimiters (e.g. curly braces)
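The idea behind Between can be illustrated with a self-contained sketch. This is not the package's implementation: the lowercase `between` name is hypothetical, and the real function also takes an escape value, which this sketch omits for brevity.

```go
package main

import (
	"bytes"
	"fmt"
)

// between is a sketch of extracting the content between open and close
// delimiters. It returns the inner content, the buffer after the closing
// delimiter, and an error when a delimiter is missing.
func between(open, closeVal, buf []byte) ([]byte, []byte, error) {
	start := bytes.Index(buf, open)
	if start < 0 {
		return nil, buf, fmt.Errorf("open delimiter %q not found", open)
	}
	start += len(open)
	end := bytes.Index(buf[start:], closeVal)
	if end < 0 {
		return nil, buf, fmt.Errorf("close delimiter %q not found", closeVal)
	}
	inner := buf[start : start+end]
	rest := buf[start+end+len(closeVal):]
	return inner, rest, nil
}

func main() {
	inner, rest, err := between([]byte("{"), []byte("}"), []byte("a {b} c"))
	fmt.Printf("%q %q %v\n", inner, rest, err)
}
```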

func IsNumeral

func IsNumeral(b []byte) bool

IsNumeral checks to see if []byte is a number or not

func IsPunctuation

func IsPunctuation(b []byte) bool

IsPunctuation checks to see if []byte is some punctuation or not

func IsSpace

func IsSpace(b []byte) bool

IsSpace checks to see if []byte is a space or not

func Next

func Next(buf []byte, re *regexp.Regexp) ([]byte, []byte)

Next takes a buffer ([]byte) and a compiled regular expression and returns two []byte values: first, the sub-slice up to where the expression matches (or the whole buffer if there is no match), and second, the remaining []byte.

func NextLine

func NextLine(buf []byte) ([]byte, []byte)

NextLine takes a buffer ([]byte) and returns the next line as a []byte and the remainder as a []byte.
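A minimal sketch of the line-splitting idea, assuming a plain `'\n'` delimiter (the lowercase `nextLine` name is hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
)

// nextLine is a sketch of peeling the first line off a buffer: it returns
// the line (without the newline) and the remainder after the newline.
func nextLine(buf []byte) ([]byte, []byte) {
	i := bytes.IndexByte(buf, '\n')
	if i < 0 {
		return buf, nil
	}
	return buf[:i], buf[i+1:]
}

func main() {
	line, rest := nextLine([]byte("first\nsecond\nthird"))
	fmt.Printf("%q %q\n", line, rest)
}
```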

Types

type Token

type Token struct {
	XMLName xml.Name `json:"-"`
	Type    string   `xml:"type" json:"type"`
	Value   []byte   `xml:"value" json:"value"`
}

Token is the structure for emitting simple tokens and values from Tok() and Tok2()

func Peek

func Peek(buf []byte) *Token

Peek generates a token without consuming the buffer

func Skip

func Skip(tokenType string, buf []byte) ([]byte, *Token, []byte)

Skip provides a means to advance to the next non-target Token.

func Skip2

func Skip2(tokenType string, buf []byte, fn Tokenizer) ([]byte, *Token, []byte)

Skip2 is like Skip but takes a Tokenizer function rather than using the default Tok().

func Tok

func Tok(buf []byte) (*Token, []byte)

Tok is a naive tokenizer that looks only at the next character by shifting it off the []byte and returning a token found with remaining []byte

func Tok2

func Tok2(buf []byte, fn Tokenizer) (*Token, []byte)

Tok2 provides an easy to implement look ahead tokenizer by defining a look ahead function

func TokenFromMap

func TokenFromMap(t *Token, m map[string][]byte) *Token

TokenFromMap re-evaluates a token's type against a map of type names and byte arrays and returns the modified Token

func Words

func Words(tok *Token, buf []byte) (*Token, []byte)

Words is an example of implementing a Tokenizer function

func (*Token) String

func (t *Token) String() string

String returns a human-readable representation of the Token struct

type TokenMap

type TokenMap map[string][]byte

TokenMap is a map of simple token names to associated arrays of possible bytes

type Tokenizer

type Tokenizer func(*Token, []byte) (*Token, []byte)

Tokenizer is a function that takes a current token, looks ahead in []byte and returns a revised token and remaining []byte
