thrifter

package module
v0.0.7 Latest
Published: Nov 9, 2023 License: MIT Imports: 10 Imported by: 2

README

thrifter

Non-destructive parser/printer for thrift with zero third-party dependencies.

Inspiration

There are several thrift parsers on GitHub, but each of them has trouble preserving the original format, since they are mostly used to generate RPC code. This project serves a different purpose: helping you write thrift code more efficiently through non-destructive code transformation. Because the parser is non-destructive, a lot can be built on top of it, such as code formatting and code transformation.

Currently, it is mainly used by my other small project, protobuf-thrift, a code transformer between protobuf and thrift.

Here are some other thrift parsers on GitHub that I found before starting protobuf-thrift; none of them is fully suitable for it.

Similar Packages
  1. go-thrift: mainly used to generate RPC code; it ignores whitespace and comments and loses statement order

  2. thriftrw-go: thrift parser and code generator open-sourced by Uber; same issues as above

  3. thriftgo: another thrift parser and code generator; same issues as above

  4. thrift-parser: thrift parser written in TypeScript; same issues as above

That is why I started thinking about writing a new thrift parser that preserves all formatting.

Thanks to rocambole, whose underlying idea is perfect for this project.

Core Concept

The main idea behind how thrifter achieves non-destructive parsing is to use a linked list to chain all the tokens.

Think about the essence of source code: it is just a chain of tokens, and different token combinations declare different syntax. So to preserve the original format, we must preserve every token from the source code.

The best data structure for this chain of tokens is a linked list, since it is easier to modify than an array: we only need to change a few pointers. We can also attach a start token and an end token to each AST node, so that we can easily iterate over the tokens within a node.

To support iterating over the token linked list, we also provide a map for each container type, such as enum/struct/service, to find the field node that starts at a given token.
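The idea above can be sketched with a few lines of standalone Go. This is only an illustration under stated assumptions: the tok and elem types, the chain and render helpers, and the pointer-keyed byStart map are hypothetical simplifications, not thrifter's own types (thrifter keys its maps by a start-token hash string).

```go
package main

import (
	"fmt"
	"strings"
)

// tok is a hypothetical, simplified token: raw text plus a link to the next token.
type tok struct {
	Raw  string
	Next *tok
}

// elem is a hypothetical element node that remembers its first and last token.
type elem struct {
	Name       string
	Start, End *tok
}

// chain links raw strings into a singly linked list and returns every token in order.
func chain(raws ...string) []*tok {
	toks := make([]*tok, len(raws))
	for i, r := range raws {
		toks[i] = &tok{Raw: r}
		if i > 0 {
			toks[i-1].Next = toks[i]
		}
	}
	return toks
}

// render concatenates the raw text of every token from start to end, inclusive.
func render(start, end *tok) string {
	var sb strings.Builder
	for cur := start; cur != nil; cur = cur.Next {
		sb.WriteString(cur.Raw)
		if cur == end {
			break
		}
	}
	return sb.String()
}

func main() {
	// Every token of a small enum, whitespace and line breaks included.
	toks := chain("enum", " ", "a", " ", "{", "\n", "  ", "A", " ", "=", " ", "1", "\n", "  ", "B", "\n", "}")

	// Map from an element's start token to the element node.
	byStart := map[*tok]*elem{
		toks[7]:  &elem{Name: "A", Start: toks[7], End: toks[11]},
		toks[14]: &elem{Name: "B", Start: toks[14], End: toks[14]},
	}

	// While walking the chain, look up which element (if any) starts at each token.
	for cur := toks[0]; cur != nil; cur = cur.Next {
		if e, ok := byStart[cur]; ok {
			fmt.Printf("element %s: %q\n", e.Name, render(e.Start, e.End))
		}
	}

	// Printing the whole chain reproduces the original text, formatting included.
	fmt.Println(render(toks[0], toks[len(toks)-1]))
}
```

Because whitespace is stored as ordinary tokens, nothing has to be reconstructed at print time; the printer only concatenates raw values.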

Usage

Initialize a Parser first, passing the io.Reader that supplies the source code:

parser := thrifter.NewParser(strings.NewReader(XXX), false)
// or
file, err := os.Open(XXX)
if err != nil {
   return nil, err
}
defer file.Close()
parser := thrifter.NewParser(file, false)

Then simply call parser.Parse to start parsing:

definition, err := parser.Parse(YOUR_FILE_NAME)

That's it. Now we have the root node for the source code, which is structured like this:

type Thrift struct {
	NodeCommonField
	// thrift file name, if it exists
	FileName string
	// since Thrift is the root node, we need a property to access its children
	Nodes []Node
}

You might wonder what NodeCommonField, embedded in the Thrift node, is for. That's the magic of thrifter; it is discussed in the AST Node section.

Code Print

The most notable thing about thrifter is that it is also a non-destructive code printer.

Consider this case: when you write a code generator to optimize your workflow, you would normally use a parser to get the code's AST and then manipulate it. Sometimes you merely want to add some new code and leave the rest intact. A normal parser cannot do that, since it ignores whitespace such as line breaks and indentation.

With thrifter, you can initialize your new AST node and then patch its StartToken into the original AST's token linked list, leaving all other code unchanged, like this:

// 1. initialize a new node, an enum for instance
// for simplicity, initialize a parser over the code you want to generate, in order to get its token linked list
// note: next and parse are unexported, so this snippet only compiles inside package thrifter (e.g. in a test)
p := NewParser(strings.NewReader(`enum a {
    A = 1
    B = 2;
    C
    D;
}`), false)
startTok := p.next() // consume enum token
enumNode := NewEnum(startTok, nil)
if err := enumNode.parse(p); err != nil {
   t.Errorf("unexpected error: %v", err)
   return
}

// 2. patch the generated code's StartToken in wherever you want to put it
preNext := someNodeFromOriginalCode.EndToken.Next
someNodeFromOriginalCode.EndToken.Next = enumNode.StartToken
enumNode.EndToken.Next = preNext

// 3. finally, use node.String to print the code
fmt.Println(thriftNodeFromOriginalCode.String())

Each thrifter.Node has its own String function, so you can also print a single node on its own rather than the whole thrift file.

The principle behind String is simple: it traverses the tokens and writes them out one by one:

func toString(start *Token, end *Token) string {
	var res bytes.Buffer
	curr := start
	for curr != end {
		res.WriteString(curr.Raw)
		curr = curr.Next
	}
	res.WriteString(end.Raw)
	return res.String()
}

Note that when you manipulate the original AST as above, the original Thrift.Nodes field is unchanged. This does not affect printing, since printing iterates only over tokens, not nodes. However, you can add the node to Thrift.Nodes yourself for consistency.
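The patching step shown earlier only rewires Next pointers, which is all a forward printer needs. Since Token also carries a Prev pointer, a splice that keeps both directions consistent might look like the following standalone sketch. The trimmed-down Token type and the link, spliceAfter, forward, and backward helpers are hypothetical names used for illustration and are not part of thrifter's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Token keeps only the fields that matter for splicing.
type Token struct {
	Raw  string
	Next *Token
	Prev *Token
}

// link builds a doubly linked chain and returns its first and last token.
func link(raws ...string) (first, last *Token) {
	for _, r := range raws {
		t := &Token{Raw: r, Prev: last}
		if last != nil {
			last.Next = t
		} else {
			first = t
		}
		last = t
	}
	return first, last
}

// spliceAfter inserts the chain first..last directly after at,
// updating Next and Prev on both sides of the insertion point.
func spliceAfter(at, first, last *Token) {
	next := at.Next
	at.Next = first
	first.Prev = at
	last.Next = next
	if next != nil {
		next.Prev = last
	}
}

// forward concatenates raw values by following Next pointers.
func forward(first *Token) string {
	var sb strings.Builder
	for t := first; t != nil; t = t.Next {
		sb.WriteString(t.Raw)
	}
	return sb.String()
}

// backward concatenates raw values by following Prev pointers.
func backward(last *Token) string {
	var sb strings.Builder
	for t := last; t != nil; t = t.Prev {
		sb.WriteString(t.Raw)
	}
	return sb.String()
}

func main() {
	// Original code: a struct followed by a trailing line break.
	first, _ := link("struct", " ", "S", " ", "{", "}", "\n")
	closing := first.Next.Next.Next.Next.Next // the "}" token

	// Generated code to insert right after the struct's closing brace.
	nf, nl := link("\n", "enum", " ", "E", " ", "{", "}")
	spliceAfter(closing, nf, nl)

	fmt.Print(forward(first))
}
```

Updating Prev as well costs two extra assignments and keeps backward traversal usable after the insertion.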

AST Node

To understand the idea behind thrifter, there are two structs and one interface you must know:

type NodeCommonField struct {
	Parent     Node
	Next       Node
	Prev       Node
	StartToken *Token
	EndToken   *Token
}

type Token struct {
	Type  token
	Raw   string // tokens raw value, e.g. comments contain prefix, like // or /* or #; strings contain ' or "
	Value string // tokens transformed value
	Next  *Token
	Prev  *Token
	Pos   scanner.Position
}

type Node interface {
	// recursively output current node and its children
	String() string
	// recursively parse current node and its children
	parse(p *Parser) error
	// get node value
	NodeValue() interface{}
	// get node type, value specified from each node
	NodeType() string
}

First, NodeCommonField is the basis of non-destructive parsing. It is embedded in every AST node, whatever its NodeType is. These two fields are essential:

  • StartToken: the start token of the node, from which you can iterate over the tokens within the node

  • EndToken: the end token of the node; when iteration within the node reaches it, the iteration is done

The second struct, Token, represents a basic token in thrifter. A token can be a symbol such as - or +, a string literal such as "abc" or 'abc', or an identifier.

Note that thrifter currently treats a comment as a token, not a node. I'm not entirely sure this is a good idea, so if anyone has questions about it, please open an issue.

Last, the Node interface represents a thrifter node. Since it is an interface, to access a node's fields you can use NodeType to get the type of the node and then do a type assertion:

for _, node := range thrift.Nodes {
    switch node.NodeType() {
    case "Namespace":
        n := node.(*thrifter.Namespace)
        fmt.Printf("Namespace: %+v", n)
    case "Enum":
        n := node.(*thrifter.Enum)
        fmt.Printf("Enum: %+v", n)
    case "Struct":
        n := node.(*thrifter.Struct)
        fmt.Printf("Struct: %+v", n)
    case "Service":
        n := node.(*thrifter.Service)
        fmt.Printf("Service: %+v", n)
    case "Include":
        n := node.(*thrifter.Include)
        fmt.Printf("Include: %+v", n)
    }
}
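As an alternative to switching on the NodeType string, Go's type switch performs the check and the assertion in one step. The sketch below is standalone: the Node interface and the Namespace, Enum, and Include types are minimal hypothetical stand-ins that only mimic the shape of thrifter's types, and describe is an illustrative helper.

```go
package main

import "fmt"

// Node is a hypothetical stand-in exposing only NodeType.
type Node interface {
	NodeType() string
}

// Minimal stand-ins for a few node kinds.
type Namespace struct{ Name, Value string }
type Enum struct{ Ident string }
type Include struct{ FilePath string }

func (*Namespace) NodeType() string { return "Namespace" }
func (*Enum) NodeType() string      { return "Enum" }
func (*Include) NodeType() string   { return "Include" }

// describe uses a type switch, so v already has the concrete type in each case.
func describe(n Node) string {
	switch v := n.(type) {
	case nil:
		return "nil node"
	case *Namespace:
		return "namespace " + v.Name + " " + v.Value
	case *Enum:
		return "enum " + v.Ident
	default:
		// Fall back to the string tag for kinds not handled above.
		return "unhandled " + n.NodeType()
	}
}

func main() {
	nodes := []Node{
		&Namespace{Name: "go", Value: "demo"},
		&Enum{Ident: "Color"},
		&Include{FilePath: "shared.thrift"},
	}
	for _, n := range nodes {
		fmt.Println(describe(n))
	}
}
```

A type switch cannot panic on a mismatched assertion, whereas pairing a string case with the wrong asserted type would fail only at run time.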

Notice

  1. senum is not supported: since Thrift officially does not recommend using it, thrifter does not handle it either.

  2. The current parser implementation does not fully validate .thrift definitions, since we think validation is better left to a dedicated linter.

Some packages built on top of thrifter:

Contribution

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub.

TODO
  • [] support comment node
  • [] Thrift node support ElemsMap to map start token to each element node

Documentation

Constants

const (
	CONST_VALUE_INT = iota + 1
	CONST_VALUE_FLOAT
	CONST_VALUE_IDENT
	CONST_VALUE_LITERAL
	CONST_VALUE_MAP
	CONST_VALUE_LIST
)
const (
	FIELD_TYPE_IDENT = iota + 1
	FIELD_TYPE_BASE
	FIELD_TYPE_MAP
	FIELD_TYPE_LIST
	FIELD_TYPE_SET
)

The possible kinds of field type.

const (
	FIELD_PARENT_TYPE_ARGS = iota + 1
	FIELD_PARENT_TYPE_THROWS
)
const (
	STRUCT = iota + 1
	UNION
	EXCEPTION
)
const (
	// special tokens
	T_ILLEGAL token = iota
	T_EOF
	T_IDENT
	T_STRING // string literal
	T_NUMBER // integer or float

	// white space
	T_SPACE
	T_LINEBREAK // \n
	T_RETURN    // \r
	T_TAB       // \t

	// punctuator
	T_SEMICOLON   // ;
	T_COLON       // :
	T_EQUALS      // =
	T_QUOTE       // "
	T_SINGLEQUOTE // '
	T_LEFTPAREN   // (
	T_RIGHTPAREN  // )
	T_LEFTCURLY   // {
	T_RIGHTCURLY  // }
	T_LEFTSQUARE  // [
	T_RIGHTSQUARE // ]
	T_COMMENT     // /
	T_LESS        // <
	T_GREATER     // >
	T_COMMA       // ,
	T_DOT         // .
	T_PLUS        // +
	T_MINUS       // -

	T_NAMESPACE
	T_ENUM
	T_SENUM // currently not supported
	T_CONST
	T_SERVICE
	T_STRUCT
	T_INCLUDE
	T_CPP_INCLUDE
	T_TYPEDEF
	T_UNION
	T_EXCEPTION

	// field keywords
	T_OPTIONAL
	T_REQUIRED

	// type keywords
	T_MAP
	T_SET
	T_LIST

	// function keywords
	T_ONEWAY
	T_VOID
	T_THROWS
)
const (
	SINGLE_LINE_COMMENT = iota + 1 // like this
	MULTI_LINE_COMMENT             /* like this */
	BASH_LIKE_COMMENT              // # like this
)

comment type

Variables

This section is empty.

Functions

func GenTokenHash added in v0.0.4

func GenTokenHash(t *Token) (res string)

Generates a hash from token.Type + token.Raw + token.Pos, so that nodes like enum/struct/service can find their element node while iterating over tokens.
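To make the role of such a hash concrete, here is a standalone sketch of one way to derive a lookup key from a token's type, raw text, and position. It is not thrifter's implementation: the Token type, the newToken and tokenKey helpers, the use of SHA-1, the key layout, and the assumption that Pos comes from text/scanner are all illustrative choices.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"text/scanner"
)

// Token is a hypothetical stand-in holding only the hashed fields.
type Token struct {
	Type int
	Raw  string
	Pos  scanner.Position
}

// newToken is a small helper for building tokens at a given line and column.
func newToken(typ int, raw string, line, col int) *Token {
	return &Token{Type: typ, Raw: raw, Pos: scanner.Position{Line: line, Column: col}}
}

// tokenKey combines type, raw text, and position so that two tokens with
// identical text at different positions still get different keys.
func tokenKey(t *Token) string {
	data := fmt.Sprintf("%d|%s|%d:%d", t.Type, t.Raw, t.Pos.Line, t.Pos.Column)
	sum := sha1.Sum([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := newToken(1, "A", 2, 5)
	b := newToken(1, "A", 3, 5) // same text, next line

	// Map from start-token key to an element name.
	elems := map[string]string{
		tokenKey(a): "first A",
		tokenKey(b): "second A",
	}
	fmt.Println(elems[tokenKey(a)])
	fmt.Println(elems[tokenKey(b)])
}
```

Including the position in the key is what lets a container distinguish repeated identifiers while walking its token chain.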

func GetToken added in v0.0.2

func GetToken(literal string) token

Returns the token corresponding to a string literal; mostly used when generating tokens.

func IsDigit added in v0.0.3

func IsDigit(lit rune) bool

IsDigit returns true if the rune is a digit.

func IsKeyword added in v0.0.3

func IsKeyword(tok token) bool

IsKeyword reports whether tok is in the keyword range.

func IsNumber added in v0.0.3

func IsNumber(str string) (isFloat bool, isInt bool)

IsNumber determines whether str is an integer or a floating-point number.
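A classification with the same result shape can be sketched with the standard library. The classifyNumber function below is a hypothetical stand-in built on strconv; thrifter's own IsNumber may accept or reject different inputs, and returning false for both values on non-numeric input is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// classifyNumber reports whether s parses as a float or as a base-10 integer.
// Both results are false when s is not numeric.
func classifyNumber(s string) (isFloat bool, isInt bool) {
	// Try the integer form first, since every integer also parses as a float.
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return false, true
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return true, false
	}
	return false, false
}

func main() {
	for _, s := range []string{"42", "-7", "3.14", "1e5", "abc"} {
		isFloat, isInt := classifyNumber(s)
		fmt.Printf("%q: float=%v int=%v\n", s, isFloat, isInt)
	}
}
```

Note that strconv.ParseFloat also accepts spellings such as "NaN" and "Inf", which a check aimed at thrift source would probably want to reject explicitly.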

func IsWhitespace added in v0.0.3

func IsWhitespace(tok token) bool

Types

type Const

type Const struct {
	NodeCommonField
	Ident string
	Type  *FieldType
	Value *ConstValue
}

func NewConst

func NewConst(start *Token, parent Node) *Const

func (*Const) NodeType

func (r *Const) NodeType() string

func (*Const) NodeValue

func (r *Const) NodeValue() interface{}

func (*Const) String

func (r *Const) String() string

type ConstList

type ConstList struct {
	NodeCommonField
	Elems []*ConstValue
}

func NewConstList

func NewConstList(start *Token, parent Node) *ConstList

func (*ConstList) NodeType

func (r *ConstList) NodeType() string

func (*ConstList) NodeValue

func (r *ConstList) NodeValue() interface{}

func (*ConstList) String

func (r *ConstList) String() string

type ConstMap

type ConstMap struct {
	NodeCommonField
	MapKeyList   []ConstValue
	MapValueList []ConstValue
}

func NewConstMap

func NewConstMap(start *Token, parent Node) *ConstMap

func (*ConstMap) NodeType

func (r *ConstMap) NodeType() string

func (*ConstMap) NodeValue

func (r *ConstMap) NodeValue() interface{}

func (*ConstMap) String

func (r *ConstMap) String() string

type ConstValue

type ConstValue struct {
	NodeCommonField
	Type  int
	Value string
	Map   *ConstMap
	List  *ConstList
}

func NewConstValue

func NewConstValue(parent Node) *ConstValue

func (*ConstValue) NodeType

func (r *ConstValue) NodeType() string

func (*ConstValue) NodeValue

func (r *ConstValue) NodeValue() interface{}

func (*ConstValue) String

func (r *ConstValue) String() string

type Enum

type Enum struct {
	NodeCommonField
	Ident    string
	Elems    []*EnumElement
	Options  []*Option
	ElemsMap map[string]*EnumElement // startToken hash => EnumElement node
}

func NewEnum

func NewEnum(start *Token, parent Node) *Enum

func (*Enum) NodeType

func (r *Enum) NodeType() string

func (*Enum) NodeValue

func (r *Enum) NodeValue() interface{}

func (*Enum) String

func (r *Enum) String() string

type EnumElement

type EnumElement struct {
	NodeCommonField
	ID      int
	Ident   string
	Options []*Option
}

func NewEnumElement

func NewEnumElement(parent Node) *EnumElement

func (*EnumElement) NodeType

func (r *EnumElement) NodeType() string

func (*EnumElement) NodeValue

func (r *EnumElement) NodeValue() interface{}

func (*EnumElement) String

func (r *EnumElement) String() string

type Field

type Field struct {
	NodeCommonField
	ID           int
	Requiredness string
	FieldType    *FieldType
	Ident        string
	DefaultValue *ConstValue
	Options      []*Option
}

Field represent a field within struct/union/exception

func NewField

func NewField(parent Node) *Field

func (*Field) NodeType

func (r *Field) NodeType() string

func (*Field) NodeValue

func (r *Field) NodeValue() interface{}

func (*Field) String

func (r *Field) String() string

type FieldType

type FieldType struct {
	NodeCommonField
	Type     int
	Ident    string
	BaseType string
	Map      *MapType
	List     *ListType
	Set      *SetType
	Options  []*Option
}

func NewFieldType

func NewFieldType(parent Node) *FieldType

func (*FieldType) NodeType

func (r *FieldType) NodeType() string

func (*FieldType) NodeValue

func (r *FieldType) NodeValue() interface{}

func (*FieldType) String

func (r *FieldType) String() string

type Function

type Function struct {
	NodeCommonField
	Ident        string
	Throws       []*Field
	Oneway       bool
	FunctionType *FieldType
	Void         bool
	Args         []*Field
	Options      []*Option
	ArgsMap      map[string]*Field // startToken hash => Argument Field
	ThrowsMap    map[string]*Field // startToken hash => Throws Field
}

func NewFunction

func NewFunction(parent Node) *Function

func (*Function) NodeType

func (r *Function) NodeType() string

func (*Function) NodeValue

func (r *Function) NodeValue() interface{}

func (*Function) String

func (r *Function) String() string

type Include

type Include struct {
	NodeCommonField
	FilePath string
}

func NewInclude

func NewInclude(start *Token, parent Node) *Include

func (*Include) NodeType

func (r *Include) NodeType() string

func (*Include) NodeValue

func (r *Include) NodeValue() interface{}

func (*Include) String

func (r *Include) String() string

type ListType

type ListType struct {
	NodeCommonField
	Elem    *FieldType
	CppType string
}

func NewListType

func NewListType(start *Token, parent Node) *ListType

func (*ListType) NodeType

func (r *ListType) NodeType() string

func (*ListType) NodeValue

func (r *ListType) NodeValue() interface{}

func (*ListType) String

func (r *ListType) String() string

type MapType

type MapType struct {
	NodeCommonField
	// since directly use map structure its hard to index and will lead to loss of order, we use slice to represent map type, use slice index to mapping
	Key     *FieldType
	Value   *FieldType
	CppType string
}

func NewMapType

func NewMapType(start *Token, parent Node) *MapType

func (*MapType) NodeType

func (r *MapType) NodeType() string

func (*MapType) NodeValue

func (r *MapType) NodeValue() interface{}

func (*MapType) String

func (r *MapType) String() string

type Namespace

type Namespace struct {
	NodeCommonField
	Name    string
	Value   string
	Options []*Option
}

func NewNamespace

func NewNamespace(start *Token, parent Node) *Namespace

func (*Namespace) NodeType

func (r *Namespace) NodeType() string

func (*Namespace) NodeValue

func (r *Namespace) NodeValue() interface{}

func (*Namespace) String

func (r *Namespace) String() string

type Node

type Node interface {
	// recursively output current node and its children
	String() string

	// get node value
	NodeValue() interface{}
	// get node type, value specified from each node
	NodeType() string
	// contains filtered or unexported methods
}

type NodeCommonField

type NodeCommonField struct {
	Parent     Node
	Next       Node
	Prev       Node
	StartToken *Token
	EndToken   *Token
}

type Option

type Option struct {
	NodeCommonField
	Name  string
	Value string
}

Represent a single option, e.g. a = "123"

func NewOption

func NewOption(parent Node) *Option

func (*Option) NodeType

func (r *Option) NodeType() string

func (*Option) NodeValue

func (r *Option) NodeValue() interface{}

func (*Option) String

func (r *Option) String() string

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

func NewParser

func NewParser(rd io.Reader, debug bool) *Parser

func (*Parser) Parse

func (p *Parser) Parse(fileName string) (res *Thrift, err error)

Parse parses a thrift file.

type Service

type Service struct {
	NodeCommonField
	Ident    string
	Elems    []*Function
	Extends  string
	Options  []*Option
	ElemsMap map[string]*Function // startToken hash => Function node
}

func NewService

func NewService(start *Token, parent Node) *Service

func (*Service) NodeType

func (r *Service) NodeType() string

func (*Service) NodeValue

func (r *Service) NodeValue() interface{}

func (*Service) String

func (r *Service) String() string

type SetType

type SetType struct {
	NodeCommonField
	Elem    *FieldType
	CppType string
}

func NewSetType

func NewSetType(start *Token, parent Node) *SetType

func (*SetType) NodeType

func (r *SetType) NodeType() string

func (*SetType) NodeValue

func (r *SetType) NodeValue() interface{}

func (*SetType) String

func (r *SetType) String() string

type Struct

type Struct struct {
	NodeCommonField
	Type     int
	Ident    string
	Elems    []*Field
	Options  []*Option
	ElemsMap map[string]*Field // startToken hash => Field node
}

func NewStruct

func NewStruct(start *Token, parent Node) *Struct

func (*Struct) NodeType

func (r *Struct) NodeType() string

func (*Struct) NodeValue

func (r *Struct) NodeValue() interface{}

func (*Struct) String

func (r *Struct) String() string

type Thrift

type Thrift struct {
	NodeCommonField
	// thrift file name, if it exists
	FileName string
	// since Thrift is the root node, we need a property to access its children
	Nodes []Node
}

func NewThrift

func NewThrift(parent Node, FileName string) *Thrift

func (*Thrift) NodeType

func (r *Thrift) NodeType() string

func (*Thrift) NodeValue

func (r *Thrift) NodeValue() interface{}

func (*Thrift) String

func (r *Thrift) String() string

type Token

type Token struct {
	Type  token
	Raw   string // tokens raw value, e.g. comments contain prefix, like // or /* or #; strings contain ' or "
	Value string // tokens transformed value
	Next  *Token
	Prev  *Token
	Pos   scanner.Position
}

type TypeDef

type TypeDef struct {
	NodeCommonField
	Type    *FieldType // except for identifier
	Ident   string
	Options []*Option
}

func NewTypeDef

func NewTypeDef(start *Token, parent Node) *TypeDef

func (*TypeDef) NodeType

func (r *TypeDef) NodeType() string

func (*TypeDef) NodeValue

func (r *TypeDef) NodeValue() interface{}

func (*TypeDef) String

func (r *TypeDef) String() string
