highlight

package module
Published: Sep 19, 2017 License: MIT Imports: 7 Imported by: 0

README

go-highlight

Description

A somewhat crude syntax highlighter loosely based on Pygments.

See chroma for a newer, better, and far more complete syntax highlighter!

Installation

go-highlight can be installed with the regular go get command:

go get github.com/johnsto/go-highlight

Lexers are stored in a separate package, so remember to install those, too:

go get github.com/johnsto/go-highlight/lexers

The CLI app can also be installed with go get:

go get github.com/johnsto/go-highlight/cmd/highlight

Run tests:

go test github.com/johnsto/go-highlight

Usage

To use go-highlight in your own code, import the base highlight package and register the default lexers via an anonymous (blank) import:

import "github.com/johnsto/go-highlight"
import _ "github.com/johnsto/go-highlight/lexers"

Tokenizers can be retrieved by content type or filename:

tokenizer, err = highlight.GetTokenizerForContentType("application/json")
// or:
tokenizer, err = highlight.GetTokenizerForFilename("futurama.json")

Use Tokenize or TokenizeString to tokenize an io.Reader or string respectively:

err = tokenizer.Tokenize(reader, func(t highlight.Token) error {
	_, err := fmt.Print(t.Value)
	return err
})
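
Putting the snippets above together, a minimal end-to-end program might look like the following sketch. The filename and JSON input are invented for illustration, and the sketch assumes Tokenize may surface io.EOF at the end of input, so it treats that error as normal completion:

package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/johnsto/go-highlight"
	_ "github.com/johnsto/go-highlight/lexers"
)

func main() {
	// Pick a tokenizer based on a (made-up) filename.
	tokenizer, err := highlight.GetTokenizerForFilename("futurama.json")
	if err != nil {
		log.Fatal(err)
	}
	if tokenizer == nil {
		log.Fatal("no tokenizer found for filename")
	}

	// Tokenize a small, made-up JSON document and print each token's value.
	reader := bufio.NewReader(strings.NewReader(`{"name": "Fry", "serial": 2716057}`))
	err = tokenizer.Tokenize(reader, func(t highlight.Token) error {
		_, err := fmt.Print(t.Value)
		return err
	})
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}
}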

For colourised terminal output, import the "term" package:

import "github.com/johnsto/go-highlight/output/term"

Then instantiate the emitter and tokenize to it:

emitter = output.NewDebugOutputter()
err = tokenizer.Tokenize(reader, emitter.Emit)

Or for formatted (indented) output, use Format instead:

err = tokenizer.Format(reader, emitter.Emit)

Documentation

Constants

const (
	// Error is emitted when an unexpected token is encountered.
	Error TokenType = "error"
	// Comment e.g. `// this should never happen`
	Comment = "comment"
	// Number - e.g. `2716057` in `"serial": 2716057` or `serial = 2716057;`
	Number = "number"
	// String - e.g. `Fry` in `"name": "Fry"` or `var name = "Fry";`
	String = "string"
	// Text - e.g. `Fry` in `<p>Fry</p>`
	Text = "text"
	// Attribute - e.g. `name` in `"name": "Fry"`, or `font-size` in
	// `font-size: 1.2rem;`
	Attribute = "attribute"
	// Assignment - e.g. `=` in `int x = y;` or `:` in `font-size: 1.2rem;`
	Assignment = "assignment"
	// Operator - e.g. `+`/`-` in `int x = a + b - c;`
	Operator = "operator"
	// Punctuation - e.g. semi/colons in `int x, j;`
	Punctuation = "punctuation"
	// Literal - e.g. `true`/`false`/`null`.
	Literal = "literal"
	// Tag - e.g. `html`/`div`/`b`
	Tag = "tag"
	// Whitespace - e.g. \n, \t
	Whitespace = "whitespace"
)

Variables

var EndToken = Token{}

EndToken is the zero Token; filters such as MergeTokensFilter and RemoveEmptiesFilter treat it as an end-of-input marker.

var MergeTokensFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		curr := Token{}

		return func(t Token) error {
			if t.Type == "" {
				out(curr)
				return io.EOF
			} else if t.Type == curr.Type {

				curr.Value += t.Value
				return nil
			} else if curr.Value != "" {
				out(curr)
			}
			curr = Token{
				Value: t.Value,
				Type:  t.Type,
				State: t.State,
			}
			return nil
		}
	})

MergeTokensFilter combines Tokens if they have the same type.

var PassthroughFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			return out(t)
		}
	})

PassthroughFilter simply emits each token to the output without modification.

var RemoveEmptiesFilter = FilterFunc(
	func(out func(Token) error) func(Token) error {
		return func(t Token) error {
			if t == EndToken || t.Value != "" {
				return out(t)
			}
			return nil
		}
	})

RemoveEmptiesFilter removes empty (zero-length) tokens from the output.

Functions

func GetTokenizers

func GetTokenizers() map[string]Tokenizer

GetTokenizers returns the map of known Tokenizers.

func Register

func Register(name string, t Tokenizer)

Register registers the given Tokenizer under the specified name. Any existing Tokenizer under that name will be replaced.

Types

type Emitter

type Emitter interface {
	// Emit emits the given token to some output
	Emit(t Token) error
}

Emitter is any type that supports emitting tokens to some output.

type Filter

type Filter interface {
	// Filter returns a token-handling function that passes tokens to `out`,
	// typically modifying or filtering them along the way. The returned
	// function should stop processing as soon as the input is exhausted or
	// an error is encountered.
	Filter(out func(Token) error) func(Token) error
}

Filter describes a type that is capable of filtering/processing tokens.

type FilterFunc

type FilterFunc func(out func(Token) error) func(Token) error

FilterFunc is a helper type allowing filter functions to be used as filters.

func (FilterFunc) Filter

func (f FilterFunc) Filter(out func(Token) error) func(Token) error

type Filters

type Filters []Filter

func (Filters) Filter

func (fs Filters) Filter(out func(Token) error) func(Token) error

Filter runs the input through each filter in series, emitting the final result to `out`. This function returns as soon as the last token has been processed, or as soon as an error is encountered by one of the filters.

No further tokens will be emitted to `out` after this function returns.
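
As a hedged sketch of how these pieces compose, the filters above can be chained with Filters and driven by hand. Judging from its source, MergeTokensFilter flushes its pending token when it receives the zero-valued EndToken and then returns io.EOF, so the sketch sends EndToken explicitly and treats that io.EOF as normal completion; the sample tokens are made up:

package main

import (
	"fmt"
	"io"
	"log"

	"github.com/johnsto/go-highlight"
)

func main() {
	// Chain RemoveEmptiesFilter and MergeTokensFilter, printing each token
	// that survives the chain.
	emit := highlight.Filters{
		highlight.RemoveEmptiesFilter,
		highlight.MergeTokensFilter,
	}.Filter(func(t highlight.Token) error {
		fmt.Printf("%s: %q\n", t.Type, t.Value)
		return nil
	})

	// Hand-made tokens; EndToken flushes MergeTokensFilter, which then
	// returns io.EOF (see its source above).
	tokens := []highlight.Token{
		{Type: highlight.String, Value: `"Fry`},
		{Type: highlight.String, Value: `"`},       // merged with the previous string token
		{Type: highlight.Whitespace, Value: ""},    // removed as empty
		{Type: highlight.Punctuation, Value: ","},
		highlight.EndToken,
	}
	for _, t := range tokens {
		if err := emit(t); err != nil && err != io.EOF {
			log.Fatal(err)
		}
	}
}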

type IncludeRule

type IncludeRule struct {
	StateMap  *StateMap
	StateName string
}

IncludeRule allows the rules of another named state to be referenced and included.

func (IncludeRule) Find

func (r IncludeRule) Find(subject string) (int, Rule)

func (IncludeRule) Match

func (r IncludeRule) Match(subject string) (int, Rule, []Token, error)

func (IncludeRule) Stack

func (r IncludeRule) Stack() []string

type Lexer

type Lexer struct {
	Name      string
	States    States
	Filters   Filters
	Formatter Filter
	Filenames []string
	MimeTypes []string
}

Lexer defines a simple state-based lexer.

func (Lexer) AcceptsFilename

func (l Lexer) AcceptsFilename(name string) (bool, error)

AcceptsFilename returns true if this Lexer thinks it is suitable for the given filename. An error will be returned iff an invalid filename pattern is registered by the Lexer.

func (Lexer) AcceptsMediaType

func (l Lexer) AcceptsMediaType(media string) (bool, error)

AcceptsMediaType returns true if this Lexer thinks it is suitable for the given media (MIME) type. An error will be returned iff the given MIME type is invalid.

func (Lexer) Format

func (l Lexer) Format(r *bufio.Reader, emit func(Token) error) error

func (Lexer) ListFilenames

func (l Lexer) ListFilenames() []string

ListFilenames lists the filename patterns this Lexer supports, e.g. ["*.json"]

func (Lexer) ListMediaTypes

func (l Lexer) ListMediaTypes() []string

ListMediaTypes lists the media types this Lexer supports, e.g. ["application/json"]

func (Lexer) Tokenize

func (l Lexer) Tokenize(br *bufio.Reader, emit func(Token) error) error

Tokenize reads from the given input and emits tokens via the emit callback. It ends on any error from the reader, including io.EOF, which signifies the end of input.

func (Lexer) TokenizeString

func (l Lexer) TokenizeString(s string) ([]Token, error)

TokenizeString is a convenience method that tokenizes the given string and returns the resulting tokens.

type RegexpRule

type RegexpRule struct {
	Regexp     *regexp.Regexp
	Type       TokenType
	SubTypes   []TokenType
	NextStates []string
}

RegexpRule matches a state if the subject matches a regular expression.

func NewRegexpRule

func NewRegexpRule(re string, t TokenType, subTypes []TokenType,
	next []string) RegexpRule

NewRegexpRule creates a new regular expression Rule.

func (RegexpRule) Find

func (r RegexpRule) Find(subject string) (int, Rule)

Find returns the first position in subject where this Rule will match, or -1 if no match was found.

func (RegexpRule) Match

func (r RegexpRule) Match(subject string) (int, Rule, []Token, error)

Match attempts to match against the beginning of the given search string. It returns the number of characters matched and a slice of tokens.

If the regular expression contains groups, they are matched with the corresponding token types in `SubTypes`. Any text in between groups is emitted with the token type defined by `Type`.
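
For illustration, a hedged sketch of a rule that uses groups with SubTypes; the regular expression and token-type choices are made up rather than taken from the bundled lexers, and the usual fmt and highlight imports are assumed:

// Hypothetical rule: match a `"key": "value"` pair, giving each group its
// own token type while any text between groups falls back to Text.
rule := highlight.NewRegexpRule(
	`("[^"]+")(\s*:\s*)("[^"]*")`,
	highlight.Text,
	[]highlight.TokenType{highlight.Attribute, highlight.Assignment, highlight.String},
	nil,
)

// Match against the start of the subject; n is the number of characters
// consumed and tokens holds the tokens produced from the groups.
n, _, tokens, err := rule.Match(`"name": "Fry"`)
fmt.Printf("consumed %d characters into %d tokens (err=%v)\n", n, len(tokens), err)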

func (RegexpRule) Stack

func (r RegexpRule) Stack() []string

type Rule

type Rule interface {
	Find(subject string) (int, Rule)
	Match(subject string) (int, Rule, []Token, error)
	Stack() []string
}

type RuleSpec

type RuleSpec struct {
	// Regexp is the regular expression this rule should match against.
	Regexp string
	// Type is the token type for strings that match this rule.
	Type TokenType
	// SubTypes contains an ordered array of token types matching the order
	// of groups in the Regexp expression.
	SubTypes []TokenType
	// State indicates the next state to migrate to if this rule is
	// triggered.
	State string
	// Include specifies the name of another state whose rules should be
	// included here.
	Include string
}

RuleSpec is a shorthand description of the conditions required to match some subject text; Compile converts it into a full Rule.

func (RuleSpec) Compile

func (rs RuleSpec) Compile(sm *StateMap) Rule

Compile converts the RuleSpec shorthand into a fully-fledged Rule.

type Stack

type Stack []string

Stack is a simple stack of string values.

func (*Stack) Empty

func (s *Stack) Empty()

Empty removes all elements from the stack.

func (*Stack) Len

func (s *Stack) Len() int

func (Stack) Peek

func (s Stack) Peek() string

Peek returns the item on the top of the stack, but does not pop it.

func (*Stack) Pop

func (s *Stack) Pop() string

Pop removes and returns the item on the top of the stack.

func (*Stack) Push

func (s *Stack) Push(v string)

Push puts a new value onto the top of the stack.

type State

type State []Rule

State is a list of matching Rules.

func (State) Find

func (s State) Find(subject string) (int, Rule)

Find examines the provided string, looking for a match within the current state. It returns the position `n` at which a rule match was found, and the rule itself.

-1 will be returned if no rule could be matched, in which case the caller should disregard the string entirely (emit it as an error) and continue to the next line of input.

0 will be returned if a rule matches at the start of the string.

Otherwise, this function will return a number of characters to skip before reaching the first matched rule. The caller should emit those first `n` characters as an error, and emit the remaining characters according to the rule.

func (State) Match

func (s State) Match(subject string) (int, Rule, []Token, error)

Match tests the subject text against all rules within the State. If a match is found, it returns the number of characters consumed, a series of tokens consumed from the subject text, and the specific Rule that was successfully matched.

If the start of the subject text can not be matched against any known rule, it will return a position of -1 and a nil Rule.
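
As a hedged illustration of the Find/Match contract described above (this is not the library's own tokenizer loop, just a sketch of how a caller might honour the -1 / 0 / n convention; the helper name lexLine is invented and the highlight import is assumed):

// lexLine emits tokens for a single line using a State, turning any
// unmatched text into Error tokens as the Find documentation suggests.
func lexLine(state highlight.State, line string, emit func(highlight.Token) error) error {
	for len(line) > 0 {
		n, _ := state.Find(line)
		if n < 0 {
			// No rule matches anywhere in the remainder: emit it as an error.
			return emit(highlight.Token{Value: line, Type: highlight.Error})
		}
		if n > 0 {
			// Skip the unmatched prefix, emitting it as an error.
			if err := emit(highlight.Token{Value: line[:n], Type: highlight.Error}); err != nil {
				return err
			}
			line = line[n:]
		}
		consumed, _, tokens, err := state.Match(line)
		if err != nil {
			return err
		}
		if consumed <= 0 {
			// Defensive guard against a zero-length match looping forever.
			return emit(highlight.Token{Value: line, Type: highlight.Error})
		}
		for _, t := range tokens {
			if err := emit(t); err != nil {
				return err
			}
		}
		line = line[consumed:]
	}
	return nil
}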

type StateMap

type StateMap map[string]State

StateMap is a map of state names to their States.

func (StateMap) Compile

func (m StateMap) Compile() (States, error)

Compile does nothing, as a StateMap is already in compiled form.

func (StateMap) Get

func (m StateMap) Get(name string) State

Get returns the State with the given name.

type States

type States interface {
	Get(name string) State
	Compile() (States, error)
}

States contains lexer states.

type StatesSpec

type StatesSpec map[string][]RuleSpec

StatesSpec is a container for Lexer rule specifications, and can be compiled into a full state machine.

func (StatesSpec) Compile

func (m StatesSpec) Compile() (States, error)

Compile compiles the specified states into a complete State machine, returning an error if any state fails to compile for any reason.

func (StatesSpec) Get

func (m StatesSpec) Get(name string) State

func (StatesSpec) MustCompile

func (m StatesSpec) MustCompile() States

MustCompile is a helper method that compiles the State specification, panicking on error.
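
Tying the spec types together, here is a hedged sketch of defining and registering a tiny lexer. The state rules, token types, filename pattern, and MIME type are all invented; it also assumes Lexer satisfies the Tokenizer interface (its method set above suggests it does) and that a compiled StatesSpec can be assigned to Lexer.States:

// A made-up lexer for simple `key = "value"` configuration lines.
var kvLexer = highlight.Lexer{
	Name: "kv",
	States: highlight.StatesSpec{
		"root": {
			{Regexp: `#[^\n]*`, Type: highlight.Comment},
			{Regexp: `[A-Za-z_][\w-]*`, Type: highlight.Attribute},
			{Regexp: `=`, Type: highlight.Assignment},
			{Regexp: `"[^"]*"`, Type: highlight.String},
			{Regexp: `\d+`, Type: highlight.Number},
			{Regexp: `\s+`, Type: highlight.Whitespace},
		},
	}.MustCompile(),
	Filenames: []string{"*.kv"},
	MimeTypes: []string{"text/x-kv"},
}

func init() {
	// Make the lexer available to the GetTokenizer* lookups.
	highlight.Register("kv", kvLexer)
}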

type Token

type Token struct {
	Value string
	Type  TokenType
	State string
}

Token represents one item of parsed output, containing the parsed value and its detected type.

func (Token) String

func (t Token) String() string

type TokenType

type TokenType string

type Tokenizer

type Tokenizer interface {
	// Tokenize reads from the given input and emits tokens via the emit
	// callback. Will end on any error from the reader, including io.EOF to
	// signify the end of input.
	Tokenize(*bufio.Reader, func(Token) error) error

	// Format behaves exactly as Tokenize, except it also formats the output.
	Format(*bufio.Reader, func(Token) error) error

	// AcceptsFilename returns true if this Lexer thinks it is suitable for
	// the given filename. An error will be returned iff an invalid filename
	// pattern is registered by the Lexer.
	AcceptsFilename(name string) (bool, error)

	// AcceptsMediaType returns true if this Lexer thinks it is suitable for
	// the given media (MIME) type. An error will be returned iff the given
	// MIME type is invalid.
	AcceptsMediaType(name string) (bool, error)

	// ListMediaTypes lists the media types this Tokenizer advertises support
	// for, e.g. ["application/json"]
	ListMediaTypes() []string

	// ListFilenames lists the filename patterns this Tokenizer advertises
	// support for, e.g. ["*.json"]
	ListFilenames() []string
}

Tokenizer represents a type capable of tokenizing data from an input source.

func GetTokenizer

func GetTokenizer(name string) Tokenizer

GetTokenizer returns the Tokenizer of the given name.

func GetTokenizerForContentType

func GetTokenizerForContentType(contentType string) (Tokenizer, error)

GetTokenizerForContentType returns a Tokenizer for the given content type (e.g. "text/html" or "application/json"), or nil if one is not found.

func GetTokenizerForFilename

func GetTokenizerForFilename(name string) (Tokenizer, error)

GetTokenizerForFilename returns a Tokenizer for the given filename (e.g. "index.html" or "jasons.json"), or nil if one is not found.

Directories

Path Synopsis
cmd
