goawk

command module
v1.26.0
Published: Feb 28, 2024 License: MIT Imports: 16 Imported by: 0

README

GoAWK: an AWK interpreter with CSV support

AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse The AWK Programming Language I was inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" and GNU AWK test suites.

GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the library of the University of Antwerp. Read the CSV documentation.

I've also written several articles about GoAWK.

Basic usage

To use the command-line version, simply use go install to install it, and then run it using goawk (assuming ~/go/bin is in your PATH):

$ go install github.com/benhoyt/goawk@latest

$ goawk 'BEGIN { print "foo", 42 }'
foo 42

$ echo 1 2 3 | goawk '{ print $1 + $3 }'
4

# Or use GoAWK's CSV and @"named-field" support:
$ echo -e 'name,amount\nBob,17.50\nJill,20\n"Boba Fett",100.00' | \
  goawk -i csv -H '{ total += @"amount" } END { print total }'
137.5
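
For readers more at home in Go than in AWK, the CSV one-liner above can be pictured as the following plain-Go sketch. It is not how GoAWK implements CSV mode; sumColumn is a hypothetical helper written only for illustration, and it uses the standard encoding/csv package rather than any GoAWK API. The header row plays the role of -H, and looking the column up by name plays the role of @"amount".

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sumColumn is a hypothetical helper (not part of GoAWK). It parses data as
// CSV, treats the first row as a header, and sums the named column.
func sumColumn(data, column string) (float64, error) {
	reader := csv.NewReader(strings.NewReader(data))

	// Read the header row and find the index of the requested column.
	header, err := reader.Read()
	if err != nil {
		return 0, err
	}
	index := -1
	for i, name := range header {
		if name == column {
			index = i
			break
		}
	}
	if index < 0 {
		return 0, fmt.Errorf("column %q not found", column)
	}

	total := 0.0
	for {
		row, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return 0, err
		}
		// Unlike AWK, which treats a non-numeric string as 0, this sketch
		// returns a parse error instead.
		value, err := strconv.ParseFloat(row[index], 64)
		if err != nil {
			return 0, err
		}
		total += value
	}
	return total, nil
}

func main() {
	input := "name,amount\nBob,17.50\nJill,20\n\"Boba Fett\",100.00\n"
	total, err := sumColumn(input, "amount")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(total) // 137.5
}
```

The quoted "Boba Fett" field is the kind of input where naive comma splitting goes wrong, which is why both the sketch and GoAWK's CSV mode use a real CSV parser.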

To use it in your Go programs, you can call interp.Exec() directly for simple needs:

input := strings.NewReader("foo bar\n\nbaz buz")
err := interp.Exec("$0 { print $1 }", " ", input, nil)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// foo
// baz

Or you can use the parser package and then interp.ExecProgram() to control execution, set variables, and so on:

src := "{ print NR, tolower($0) }"
input := "A\naB\nAbC"

prog, err := parser.ParseProgram([]byte(src), nil)
if err != nil {
    fmt.Println(err)
    return
}
config := &interp.Config{
    Stdin: strings.NewReader(input),
    Vars:  []string{"OFS", ":"},
}
_, err = interp.ExecProgram(prog, config)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// 1:a
// 2:ab
// 3:abc

If you need to repeat execution of the same program on different inputs, you can call interp.New once, and then call the returned object's Execute method as many times as you need.

Read the package documentation for more details.

Differences from AWK

The intention is for GoAWK to conform to awk's behavior and to the POSIX AWK spec, but this section describes some areas where it's different.

Additional features GoAWK has over AWK:

  • It has proper support for CSV and TSV files (read the documentation).
  • It's the only AWK implementation we know of with a code coverage feature (read the documentation).
  • It supports negative field indexes to access fields from the right, for example, $-1 refers to the last field.
  • It's embeddable in your Go programs! You can even call custom Go functions from your AWK scripts.
  • It runs most AWK scripts faster than awk and on a par with gawk, though usually slower than mawk. (See recent benchmarks.)
  • The parser supports 'single-quoted strings' in addition to "double-quoted strings", primarily to make Windows one-liners easier when using the cmd.exe shell (which uses " as the quote character).
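
The negative field indexes mentioned in the list above can be pictured with a small plain-Go sketch. fieldAt is a hypothetical helper, not GoAWK's code: it splits on whitespace like AWK's default field splitting and maps a negative index onto a position counted from the right. Returning an empty string for an out-of-range index is an assumption of this sketch, not a statement about GoAWK's behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldAt is a hypothetical helper (not part of GoAWK). Index 0 returns the
// whole record, positive indexes count from the left starting at 1, and
// negative indexes count from the right, so -1 is the last field.
func fieldAt(record string, index int) string {
	if index == 0 {
		return record
	}
	fields := strings.Fields(record)
	if index < 0 {
		// Map -1 to len(fields), -2 to len(fields)-1, and so on.
		index = len(fields) + 1 + index
	}
	// Assumption: out-of-range indexes yield an empty string in this sketch.
	if index < 1 || index > len(fields) {
		return ""
	}
	return fields[index-1]
}

func main() {
	record := "alpha beta gamma"
	fmt.Println(fieldAt(record, 1))  // alpha
	fmt.Println(fieldAt(record, -1)) // gamma
	fmt.Println(fieldAt(record, -2)) // beta
}
```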

Things AWK has over GoAWK:

  • Scripts that use regular expressions are slower than other implementations (unfortunately Go's regexp package is relatively slow).
  • AWK is written by Alfred Aho, Peter Weinberger, and Brian Kernighan.

Stability

This project has a good suite of tests, which include my own interpreter tests, the original AWK test suite, and the relevant tests from the Gawk test suite. I've used it a bunch personally, and it's used in the Benthos stream processor as well as by the software team at the library of the University of Antwerp. However, to err == human, so please use GoAWK at your own risk. I intend not to change the Go API in a breaking way in any v1.x.y version.

AWKGo

The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can read more about AWKGo or browse the code on the awkgo branch.

License

GoAWK is licensed under an open source MIT license.

The end

Have fun, and please contact me if you're using GoAWK or have any feedback!

Documentation

Overview

Package goawk is an implementation of AWK with CSV support.

You can use the command-line "goawk" command or run AWK from your Go programs using the "interp" package. The command-line program has the same interface as regular awk:

goawk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]

The -F flag specifies the field separator (the default is to split on whitespace). The -v flag allows you to set a variable to a given value (multiple -v flags allowed). The -f flag allows you to read AWK source from a file instead of the 'prog' command-line argument. The rest of the arguments are input filenames (default is to read from stdin).
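
To make the effect of -F concrete, here is a rough plain-Go sketch of POSIX-style field splitting. splitFields is a hypothetical helper, not GoAWK's implementation. It assumes the usual POSIX rules: a single space means "split on runs of whitespace", any other single character is a literal separator, and anything longer is treated as a regular expression. Go's regexp syntax is not identical to POSIX extended regular expressions, so the last case is only an approximation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitFields is a hypothetical helper (not part of GoAWK) that approximates
// how AWK splits a record into fields for a given field separator fs.
func splitFields(line, fs string) ([]string, error) {
	// An empty record has no fields.
	if line == "" {
		return nil, nil
	}
	switch {
	case fs == " ":
		// Default: split on runs of whitespace, ignoring leading and
		// trailing whitespace.
		return strings.Fields(line), nil
	case len(fs) == 1:
		// Any other single character is a literal separator, so adjacent
		// separators produce empty fields.
		return strings.Split(line, fs), nil
	default:
		// Longer separators are treated as regular expressions
		// (approximated here with Go's regexp package).
		re, err := regexp.Compile(fs)
		if err != nil {
			return nil, err
		}
		return re.Split(line, -1), nil
	}
}

func main() {
	// Default splitting, as when no -F flag is given.
	fields, _ := splitFields("  foo   12 ", " ")
	fmt.Printf("%q\n", fields) // ["foo" "12"]

	// Literal single-character separator, as with -F:
	fields, _ = splitFields("a:b::c", ":")
	fmt.Printf("%q\n", fields) // ["a" "b" "" "c"]
}
```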

A simple example (prints the sum of the numbers in the file's second column):

$ echo 'foo 12
> bar 34
> baz 56' >file.txt
$ goawk '{ sum += $2 } END { print sum }' file.txt
102

To use GoAWK in your Go programs, see README.md or the "interp" package docs.

Directories

Path                 Synopsis
internal/ast
internal/compiler    Package compiler compiles an AST to virtual machine instructions.
internal/cover       Package cover implements AWK code coverage and reporting.
internal/parseutil   Package parseutil contains various utilities for parsing GoAWK source code.
internal/resolver    Package resolver assigns integer indexes to functions and variables, as well as determining and checking their types (scalar or array).
interp               Package interp is the GoAWK interpreter.
lexer                Package lexer is an AWK lexer (tokenizer).
parser               Package parser is an AWK parser and abstract syntax tree.
scripts
