sham

package module
v0.0.0-...-c0144c8 Latest
Published: Mar 17, 2021 License: MIT Imports: 13 Imported by: 0

README

sham

sham is a random data generator that uses a custom DSL for defining the data's shape.

Install

go get -u github.com/mattmeyers/sham/cmd/sham/...

Usage

sham is a tool for generating random data

Usage:

	sham [options] <schema>

Options:
	-f value	set the output format: json, xml (default json)
	-n int		the number of generations to perform (default 1)
	-pretty		pretty print the result
	-h, --help	show this help message

To ensure the schema is not affected by any shell escaping, it is recommended that the schema be surrounded by single quotes.

Example

The following schema

{
    "name": name,
    "friends": [(1,5),
        {
            "name": name,
            "age": (20,30),
            "phone": /\(\d{3}\) \d{3}-\d{4}/,
            "job": /programmer|accountant|lawyer/
        }
    ]
}

will produce data such as

{
    "name": "John Doe",
    "friends": [
        {
            "name": "Bob Smith",
            "age": 21,
            "phone": "(555) 746-8193",
            "job": "programmer"
        },
        {
            "name": "Matt Doe",
            "age": 28,
            "phone": "(555) 395-1823",
            "job": "lawyer"
        }
    ]
}

Sham Language

The Sham language defines the structure of the random data. This language is a superset of JSON that adds integer ranges, generator functions, and regular expressions. For the full grammar, refer to doc/sham.ebnf. For the base JSON grammar, refer to RFC 8259. Sham adds three structures to this grammar:

Ranges

A range is an inclusive range of integers defined by the production

range : '(' INTEGER ',' INTEGER ')' ;

where the first integer is the minimum and the second is the maximum; both bounds are included. If a range appears at the beginning of an array, then a random number of elements, chosen from that range, will be generated in the array. In any other position, a range evaluates to a random integer in the range.

Terminal Generators

A terminal generator is a function identifier defined by the production

generator : [a-zA-Z][a-zA-Z]* ;

In the generated data, the terminal generator will be replaced by a single value. Generators must match a function defined by the sham CLI tool. Unknown generators will cause a parsing error.

Regular Expressions

While regular expressions are normally used to match text, Sham provides the ability to instead generate data from a regular expression. Regular expressions are defined by the production

regex : '/' .* '/'

Note: This production is simplified and technically incorrect. Any valid Go-flavored regular expression should work, though.

More generally, a regular expression is a string of characters enclosed by two / characters. These expressions are of the Go flavor.

Documentation

Overview

Package sham generates pseudorandom data from a supplied schema written in the Sham language.

Constants

This section is empty.

Variables

var TerminalGenerators = map[string]Generator{
	"name":        GeneratorFunc(stringAdaptor(gen.Name)),
	"firstName":   GeneratorFunc(stringAdaptor(gen.FirstName)),
	"lastName":    GeneratorFunc(stringAdaptor(gen.LastName)),
	"phoneNumber": GeneratorFunc(stringAdaptor(gen.PhoneNumber)),
	"timestamp":   GeneratorFunc(timeAdaptor(gen.Timestamp)),
}

TerminalGenerators is the standard collection of terminal generators provided by Sham.

Functions

func Generate

func Generate(schema []byte) (interface{}, error)

Generate parses a Sham schema and, on success, performs a single generation of data using the default terminal generators. This function is intended to be a simple wrapper for the Sham data generation process. If multiple generations are required, or custom terminal generators are needed, then a parser should be instantiated with NewParser. After a successful parse, the resulting Schema object can be used to generate data multiple times without re-parsing the schema.

Types

type Array

type Array struct {
	Range *Range
	Inner Node
}

Array represents a list of values that can be generated. An array has two potential parts: an inner node and a range. The inner node defines the structure of the array elements, and the range defines how many elements should be present. The range is optional. If omitted, one element will be generated.

func (Array) Generate

func (a Array) Generate() interface{}

Generate creates a slice of generated values where each value is defined by the inner node field. If the range is omitted, then exactly one element will populate the array. Otherwise, a random number of elements will be generated based on the inclusive range of integers.
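
The element-count rule described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not sham's source: the names generator, constant, countRange, and array are hypothetical stand-ins declared locally, and the count is drawn with math/rand.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generator mirrors the shape of the documented Generator interface.
type generator interface {
	Generate() interface{}
}

// constant is a hypothetical inner node that always yields the same value.
type constant struct{ v interface{} }

func (c constant) Generate() interface{} { return c.v }

// countRange is a hypothetical stand-in for an inclusive integer range.
type countRange struct {
	Min int
	Max int
}

// array is a hypothetical stand-in for the documented Array type.
type array struct {
	Range *countRange
	Inner generator
}

// Generate builds a slice of values produced by the inner node.
// Without a range exactly one element is produced; with a range the
// length is drawn from [Min, Max], both bounds included.
func (a array) Generate() interface{} {
	n := 1
	if a.Range != nil {
		n = a.Range.Min + rand.Intn(a.Range.Max-a.Range.Min+1)
	}
	out := make([]interface{}, n)
	for i := range out {
		out[i] = a.Inner.Generate()
	}
	return out
}

func main() {
	fixed := array{Inner: constant{v: "x"}}
	fmt.Println(len(fixed.Generate().([]interface{}))) // 1

	ranged := array{Range: &countRange{Min: 1, Max: 5}, Inner: constant{v: "x"}}
	fmt.Println(len(ranged.Generate().([]interface{}))) // between 1 and 5
}
```

The range is held by pointer so that a nil value can stand for "omitted", which matches the field type shown in the Array struct.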

type FormattedString

type FormattedString struct {
	Raw    string
	Format string
	Params []Generator
}

FormattedString represents a string literal with values that can be interpolated into the string. In the Sham language, formatted strings are enclosed in backticks, and the interpolated values are enclosed by curly braces. Interpolated values must be valid, registered terminal generators.

func (FormattedString) Generate

func (f FormattedString) Generate() interface{}

Generate produces a string literal value by replacing interpolated values with the values generated by the corresponding terminal generator.
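
One way such interpolation could work is sketched below. It assumes, without confirmation from the source, that the Format field holds a fmt-style format string with one verb per interpolated value and that Params holds the matching generators in order. The lowercase type names are hypothetical local stand-ins.

```go
package main

import "fmt"

// generator mirrors the shape of the documented Generator interface.
type generator interface {
	Generate() interface{}
}

// generatorFunc adapts a plain function to the generator interface.
type generatorFunc func() interface{}

func (f generatorFunc) Generate() interface{} { return f() }

// formattedString is a hypothetical stand-in for FormattedString.
// Assumption: Format is a fmt-style format string derived from the raw
// schema text, e.g. `Hello, {name}!` becoming "Hello, %v!".
type formattedString struct {
	Format string
	Params []generator
}

// Generate runs every parameter generator and substitutes the results
// into the format string in order.
func (f formattedString) Generate() interface{} {
	args := make([]interface{}, len(f.Params))
	for i, p := range f.Params {
		args[i] = p.Generate()
	}
	return fmt.Sprintf(f.Format, args...)
}

func main() {
	// A fixed value stands in for a random terminal generator so the
	// output is predictable.
	fixedName := generatorFunc(func() interface{} { return "John Doe" })
	fs := formattedString{Format: "Hello, %v!", Params: []generator{fixedName}}
	fmt.Println(fs.Generate()) // Hello, John Doe!
}
```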

type Generator

type Generator interface {
	Generate() interface{}
}

Generator represents the core functionality behind Sham's data generation. Any type that implements this interface can be used to generate data. Implementors can be either data structures containing more data, or simpler functions that directly generate a single piece of data. These latter objects are referred to as terminal generators since they are generally found as leaves in the AST.

type GeneratorFunc

type GeneratorFunc func() interface{}

GeneratorFunc is a simple function type that implements the Generator interface. This type can be used to provide single functions as Generators.

func (GeneratorFunc) Generate

func (f GeneratorFunc) Generate() interface{}

type KV

type KV struct {
	Key   string
	Value Node
}

KV represents a single key-value pair in an Object.

type Literal

type Literal struct {
	Value interface{}
}

Literal represents a literal value. No data generation is involved here, but rather values are returned as-is.

func (Literal) Generate

func (l Literal) Generate() interface{}

Generate returns the literal value.

type Node

type Node interface {
	Generator
}

Node represents a single element in the abstract syntax tree. A valid Sham AST must be able to generate data. As such, every node in the tree must implement the Generator interface. There are two main types of nodes: structural and terminal. Terminal nodes are leaves that generate values. Structural nodes generate the data structures that hold these values.

type Object

type Object struct {
	Values []KV
}

Object represents a key-value data structure. In order to maintain the key order in the schema, the pairs are stored in a slice and converted to an ordered map during the generation process. If a key is provided multiple times, the last value will be used during generation.

func (*Object) AppendPair

func (m *Object) AppendPair(k string, v Node)

AppendPair adds a key-value pair to an Object.

func (Object) Generate

func (m Object) Generate() interface{}

Generate creates a map of key-value pairs from the slice of KVs. An ordered map is used to preserve the order of the provided keys. If the same key is provided multiple times, then only the last value will be used.

type OrderedMap

type OrderedMap struct {
	Values map[string]interface{}
	Keys   []string
}

func NewOrderedMap

func NewOrderedMap() *OrderedMap

func (*OrderedMap) MarshalJSON

func (m *OrderedMap) MarshalJSON() ([]byte, error)

func (*OrderedMap) MarshalXML

func (m *OrderedMap) MarshalXML(e *xml.Encoder, start xml.StartElement) error

func (*OrderedMap) Set

func (m *OrderedMap) Set(k string, v interface{})

type Parser

type Parser struct {
	TerminalGenerators map[string]Generator
	// contains filtered or unexported fields
}

Parser maintains the internal state of the language parser. This struct takes a slice of tokens as input and produces an AST if and only if the token stream represents a valid Sham schema. Part of this state is the terminal generator map (the TerminalGenerators field). This map must be set prior to initiating parsing. If a terminal generator is referenced in the schema, but not defined in the terminal generator map, then the parsing process will be halted with an error.

To ensure the parser begins with the proper state, one of the constructor functions should be used.

func NewDefaultParser

func NewDefaultParser(d []byte) *Parser

NewDefaultParser creates a new Parser instance using the default terminal generators map.

func NewParser

func NewParser(d []byte) *Parser

NewParser creates a new Parser instance with an empty terminal generator map.

func (*Parser) Parse

func (p *Parser) Parse() (Schema, error)

Parse generates a new AST from the schema provided to the parser. The parser combines multiple steps to generate this structure. The schema is first tokenized. If an invalid token is presented (either an unknown character or an unterminated sequence), then a scanning error will be returned. Upon success, the slice of tokens will be parsed. If the tokens are representative of a valid Sham schema, then an AST will be returned. Otherwise, an error will be returned.

func (*Parser) RegisterGenerators

func (p *Parser) RegisterGenerators(gs map[string]Generator)

RegisterGenerators merges a terminal generator map into the parser's internal terminal generator map. If a generator is already registered, then the existing generator will be overwritten with the new. To avoid parsing errors, all terminal generators should be registered prior to parsing.
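
The merge-and-overwrite behavior can be illustrated without the parser itself. In this sketch Generator and GeneratorFunc are re-declared locally to match the documented declarations (the method body is an assumption), while registry and register are hypothetical names that stand in for the parser and its RegisterGenerators method.

```go
package main

import "fmt"

// Generator and GeneratorFunc are local copies of the documented
// declarations so that this example compiles on its own.
type Generator interface {
	Generate() interface{}
}

type GeneratorFunc func() interface{}

// Generate calls the wrapped function (assumed implementation).
func (f GeneratorFunc) Generate() interface{} { return f() }

// registry is a hypothetical holder for a terminal generator map.
type registry struct {
	generators map[string]Generator
}

// register merges gs into the registry. An entry whose name is already
// present replaces the existing generator.
func (r *registry) register(gs map[string]Generator) {
	if r.generators == nil {
		r.generators = make(map[string]Generator, len(gs))
	}
	for name, g := range gs {
		r.generators[name] = g
	}
}

func main() {
	r := &registry{}
	r.register(map[string]Generator{
		"color": GeneratorFunc(func() interface{} { return "red" }),
	})
	// "color" is overwritten and "size" is added.
	r.register(map[string]Generator{
		"color": GeneratorFunc(func() interface{} { return "blue" }),
		"size":  GeneratorFunc(func() interface{} { return 42 }),
	})
	fmt.Println(r.generators["color"].Generate(), len(r.generators)) // blue 2
}
```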

type QuoteType

type QuoteType byte
const (
	QuoteSingle   QuoteType = '\''
	QuoteDouble   QuoteType = '"'
	QuoteBacktick QuoteType = '`'
)

type Range

type Range struct {
	Min int
	Max int
}

Range is an inclusive range of integers. Ranges have two uses within a Sham schema. If provided as the first argument in an array, the range will be used to determine the number of elements to populate the array. If provided in the position of a terminal generator, then a random integer will be generated for the value.

func (Range) Generate

func (r Range) Generate() interface{}

Generate chooses a random integer from the inclusive range.

func (Range) GetValue

func (r Range) GetValue() int

GetValue retrieves a random integer from the inclusive range [min, max]. The chosen integer is not cryptographically secure and should never be treated as such.
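
A minimal sketch of drawing from an inclusive range with math/rand follows. The type inclusiveRange and its value method are hypothetical stand-ins, and the sketch assumes Min is not greater than Max. As with the documented method, the result is pseudorandom and not suitable for security purposes.

```go
package main

import (
	"fmt"
	"math/rand"
)

// inclusiveRange is a hypothetical stand-in for the documented Range type.
type inclusiveRange struct {
	Min int
	Max int
}

// value returns a pseudorandom integer in [Min, Max].
// rand.Intn(n) yields a value in [0, n), so the span is widened by one
// to make Max reachable. It assumes Min <= Max.
func (r inclusiveRange) value() int {
	return r.Min + rand.Intn(r.Max-r.Min+1)
}

func main() {
	r := inclusiveRange{Min: 20, Max: 30}
	fmt.Println(r.value()) // some integer from 20 through 30
}
```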

type Regex

type Regex struct {
	Pattern string
	// contains filtered or unexported fields
}

Regex holds a compiled regex. While any valid regular expression can be provided, only a subset will actually generate data. Every node in a parsed regex leads to a possible choice. During data generation, a random path through the parsed expression is taken. Therefore, a complicated expression has the potential to lead to wildly different performance on repeated generations.

TODO: fully document nodes that can generate data

func NewRegex

func NewRegex(pattern string) (Regex, error)

NewRegex parses a regular expression. Regular expressions are of the Go flavor and use Perl flags.

func (Regex) Generate

func (r Regex) Generate() interface{}

Generate traverses a parsed regular expression and generates data where applicable.

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner maintains the state of the tokenization process. This scanner maintains an internal buffer to minimize allocations as the Scanner reads through the source.

func NewScanner

func NewScanner(b []byte) *Scanner

NewScanner initializes a Scanner with the provided schema.

func (*Scanner) Scan

func (s *Scanner) Scan() (tok TokenType, lit string)

Scan consumes characters in the source until a full token is determined. An error will never occur while scanning. Instead, TokInvalid will be returned if a token cannot be created.

Whitespace is not important in the Sham language. If whitespace is encountered outside of string literals or regular expressions, then it will be aggregated into a single TokWS token.

type Schema

type Schema struct {
	Root Node
}

Schema represents a Sham schema and holds the root of the AST. The root of the AST must either be a single terminal node, or a structural node.

func (Schema) Generate

func (s Schema) Generate() interface{}

Generate triggers the Sham data generation process. The generation process begins with the root and walks the tree, generating data structures and data as it goes. Data generation cannot cause errors. Any possible errors will have been caught during the tokenization and parsing processes. This method can be called more than once to generate more data using the same schema.

type TerminalGenerator

type TerminalGenerator struct {
	Name string
	// contains filtered or unexported fields
}

TerminalGenerator represents a function that can generate data.

func (TerminalGenerator) Generate

func (t TerminalGenerator) Generate() interface{}

Generate runs the terminal generator's generation function. The generator is expected to be a non-nil interface. If nil was registered for this terminal generator, then this method will panic.

type Token

type Token struct {
	Type  TokenType
	Value string
}

func Tokenize

func Tokenize(source []byte) ([]Token, error)

Tokenize initializes a Scanner and performs the tokenization of the source. The Scanner will continue reading until EOF or an invalid token is read.

type TokenType

type TokenType int
const (
	TokInvalid TokenType = iota
	TokEOF
	TokWS
	// Structural tokens
	TokLBrace
	TokRBrace
	TokLBracket
	TokRBracket
	TokLParen
	TokRParen
	TokColon
	TokComma

	TokString
	TokFString
	TokRegex
	TokInteger
	TokFloat
	TokIdent

	TokNull
	TokTrue
	TokFalse
)

func (TokenType) String

func (t TokenType) String() string

Directories

Path Synopsis
cmd
