csvbuddy

v0.0.5
Published: Nov 9, 2022 License: ISC Imports: 10 Imported by: 0

README

csvbuddy - convenient CSV codec for Go


Overview

Package csvbuddy implements a convenient interface for encoding and decoding CSV files.

Install

go get -u github.com/askeladdk/csvbuddy

Quickstart

Use Marshal and Unmarshal to encode and decode slices of structs to and from byte slices. Only slices of structs can be encoded and decoded because CSV is defined as a list of records.

type Person struct {
    Name string `csv:"name"`
    Age  int    `csv:"age"`
}

boys := []Person{
    {"Stan", 10},
    {"Kyle", 10},
    {"Cartman", 10},
    {"Kenny", 10},
    {"Ike", 5},
}

text, _ := csvbuddy.Marshal(&boys)

var boys2 []Person
_ = csvbuddy.Unmarshal(text, &boys2)

fmt.Println(string(text))
fmt.Println(reflect.DeepEqual(&boys, &boys2))
// Output:
// name,age
// Stan,10
// Kyle,10
// Cartman,10
// Kenny,10
// Ike,5
//
// true

Use NewEncoder and NewDecoder to encode to and decode from streams.

func encodePeople(w io.Writer, people *[]Person) error {
    return csvbuddy.NewEncoder(w).Encode(people)
}

func decodePeople(r io.Reader, people *[]Person) error {
    return csvbuddy.NewDecoder(r).Decode(people)
}

Use the SetReaderFunc and SetWriterFunc methods if you need more control over CSV parsing and writing. You can also provide a custom parser and writer by implementing the Reader and Writer interfaces.

enc := csvbuddy.NewEncoder(w)
enc.SetWriterFunc(func(w io.Writer) csvbuddy.Writer {
    cw := csv.NewWriter(w)
    cw.Comma = ';'
    return cw
})
dec := csvbuddy.NewDecoder(r)
dec.SetReaderFunc(func(r io.Reader) csvbuddy.Reader {
    cr := csv.NewReader(r)
    cr.Comma = ';'
    cr.ReuseRecord = true
    return cr
})

Use the SetMapFunc method to perform data cleaning on the fly.

dec := csvbuddy.NewDecoder(r)
dec.SetMapFunc(func(name, value string) string {
    value = strings.TrimSpace(value)
    if name == "age" && value == "" {
        value = "0"
    }
    return value
})

Use the Iterate method to decode a CSV as a stream of rows. This allows very large CSV files to be decoded without reading them entirely into memory.

dec := csvbuddy.NewDecoder(r)
var row structType

iter, _ := dec.Iterate(&row)

for iter.Scan() {
    fmt.Println(row)
}

Read the rest of the documentation on pkg.go.dev. It's easy-peasy!

Performance

Unscientific benchmarks on my laptop suggest that the performance is comparable with csvutil.

% go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: bench_test
cpu: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

# Marshal
BenchmarkMarshal/csvutil.Marshal/1_record-4         	  162283	      6300 ns/op	   10395 B/op	      13 allocs/op
BenchmarkMarshal/csvutil.Marshal/10_records-4       	   73813	     16208 ns/op	   11243 B/op	      22 allocs/op
BenchmarkMarshal/csvutil.Marshal/100_records-4      	   10000	    113700 ns/op	   27372 B/op	     113 allocs/op
BenchmarkMarshal/csvutil.Marshal/1000_records-4     	    1069	   1150120 ns/op	  185336 B/op	    1016 allocs/op
BenchmarkMarshal/csvutil.Marshal/10000_records-4    	     100	  11192591 ns/op	 1542795 B/op	   10019 allocs/op
BenchmarkMarshal/csvutil.Marshal/100000_records-4   	      10	 105524150 ns/op	22383758 B/op	  100023 allocs/op
BenchmarkMarshal/gocsv.Marshal/1_record-4           	  261706	      4046 ns/op	    4712 B/op	      31 allocs/op
BenchmarkMarshal/gocsv.Marshal/10_records-4         	   53725	     21879 ns/op	    7145 B/op	     238 allocs/op
BenchmarkMarshal/gocsv.Marshal/100_records-4        	    5115	    208961 ns/op	   39120 B/op	    2309 allocs/op
BenchmarkMarshal/gocsv.Marshal/1000_records-4       	     554	   2113323 ns/op	  355617 B/op	   23012 allocs/op
BenchmarkMarshal/gocsv.Marshal/10000_records-4      	      57	  20734269 ns/op	 3303851 B/op	  230019 allocs/op
BenchmarkMarshal/gocsv.Marshal/100000_records-4     	       5	 240347370 ns/op	40780699 B/op	 2300032 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/1_record-4        	  238672	      5174 ns/op	    5531 B/op	      22 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/10_records-4      	   76998	     15281 ns/op	    6955 B/op	      94 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/100_records-4     	    9872	    123790 ns/op	   28843 B/op	     815 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/1000_records-4    	     994	   1379329 ns/op	  244415 B/op	    8018 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/10000_records-4   	      96	  12226165 ns/op	 2178207 B/op	   80021 allocs/op
BenchmarkMarshal/csvbuddy.Marshal/100000_records-4  	       9	 121619259 ns/op	28867782 B/op	  800025 allocs/op

# Unmarshal
BenchmarkUnmarshal/csvutil.Unmarshal/1_record-4             	  167607	      7193 ns/op	    8132 B/op	      31 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/10_records-4           	   78772	     14302 ns/op	    9084 B/op	      40 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/100_records-4          	   10000	    104986 ns/op	   18541 B/op	     130 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/1000_records-4         	    1317	    835977 ns/op	  113918 B/op	    1030 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/10000_records-4         	     140	   8483102 ns/op	 1058253 B/op	   10030 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/100000_records-4        	      13	  90634928 ns/op	11056816 B/op	  100031 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/1_record-4                	  128088	      8749 ns/op	    7571 B/op	      65 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/10_records-4              	   35547	     33419 ns/op	   15467 B/op	     320 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/100_records-4             	    3704	    279071 ns/op	   92221 B/op	    2843 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/1000_records-4            	     429	   2790427 ns/op	  878584 B/op	   28047 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/10000_records-4           	      39	  33458428 ns/op	 9066452 B/op	  280054 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/100000_records-4          	       4	 309066282 ns/op	95385456 B/op	 2800067 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/1_record-4                	   66268	     16141 ns/op	    9163 B/op	      96 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/10_records-4              	   14227	     80105 ns/op	   22109 B/op	     496 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/100_records-4             	    1738	    689075 ns/op	  145894 B/op	    4459 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/1000_records-4            	     171	   6987786 ns/op	 1442679 B/op	   44063 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/10000_records-4           	      13	  77296045 ns/op	15585227 B/op	  440072 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/100000_records-4          	       2	 756436991 ns/op	164327960 B/op	 4400088 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/1_record-4             	  176460	      5842 ns/op	    6483 B/op	      32 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/10_records-4           	   76069	     17500 ns/op	   10035 B/op	      63 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/100_records-4          	   10000	    106009 ns/op	   39860 B/op	     336 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/1000_records-4         	    1092	   1072939 ns/op	  396786 B/op	    3040 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/10000_records-4        	      79	  12884045 ns/op	 5068202 B/op	   30048 allocs/op
BenchmarkUnmarshal/csvbuddy.Unmarshal/100000_records-4       	       8	 132744382 ns/op	56674790 B/op	  300060 allocs/op

License

Package csvbuddy is released under the terms of the ISC license.

Documentation

Overview

Package csvbuddy implements a convenient interface for encoding and decoding CSV files.

Only slices of structs can be encoded and decoded because CSV is defined as a list of records.

Every exported struct field is interpreted as a CSV column. Struct fields are automatically mapped by name to a CSV column. Use the "csv" struct field tag to customize how each field is marshaled.

// StructA demonstrates CSV struct field tags.
type StructA struct {
    // The first param is always the name, which can be empty.
    // Default is the name of the field.
    Name string `csv:"name"`
    // Exported fields with name "-" are ignored.
    Ignored int `csv:"-"`
    // Use base to set the integer base. Default is 10.
    Hex uint `csv:"addr,base=16"`
    // Use prec and fmt to set floating point precision and format. Default is -1 and 'f'.
    Flt float64 `csv:"flt,prec=6,fmt=E"`
    // Inline structs with inline tag.
    // Any csv fields in the inlined struct are also (un)marshaled.
    // Beware of naming clashes.
    B StructB `csv:",inline"`
    // Embedded structs do not need the inline tag.
    StructC
}

The following struct field types are supported: bool, int[8, 16, 32, 64], uint[8, 16, 32, 64], float[32, 64], complex[64, 128], []byte, string, encoding.TextMarshaler, encoding.TextUnmarshaler. Other values produce an error.

Pointers to any of the above types are interpreted as optional types. Optional types are decoded if the parsed field is not an empty string, and they are encoded as an empty string if the pointer is nil.

Example (DataCleaning)
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"

	"github.com/askeladdk/csvbuddy"
)

func main() {
	// A messy CSV that is missing a header, uses semicolon delimiter,
	// has numbers with comma decimals, inconsistent capitalization, and stray spaces.
	var messyCSV = strings.Join([]string{
		"Tokyo   ; JP ; 35,6897 ; 139,6922",
		"jakarta ; Id ; -6,2146 ; 106,8451",
		"DELHI   ; in ; 28,6600 ;  77,2300  ",
	}, "\n")

	type city struct {
		Name      string  `csv:"name"`
		Country   string  `csv:"country"`
		Latitude  float32 `csv:"lat"`
		Longitude float32 `csv:"lng"`
	}

	d := csvbuddy.NewDecoder(strings.NewReader(messyCSV))

	// Set the Decoder to use the header derived from the city struct fields.
	d.SkipHeader()

	// Set the CSV reader to delimit on semicolons.
	d.SetReaderFunc(func(r io.Reader) csvbuddy.Reader {
		cr := csv.NewReader(r)
		cr.Comma = ';'
		cr.ReuseRecord = true
		return cr
	})

	// Set the Decoder to clean messy values.
	d.SetMapFunc(func(name, value string) string {
		value = strings.TrimSpace(value)
		switch name {
		case "lat", "lng":
			value = strings.ReplaceAll(value, ",", ".")
		case "name":
			value = strings.Title(strings.ToLower(value)) //nolint
		case "country":
			value = strings.ToUpper(value)
		}
		return value
	})

	// Decode into the cities variable.
	var cities []city
	_ = d.Decode(&cities)

	for _, city := range cities {
		fmt.Printf("%s, %s is located at coordinate (%.4f, %.4f).\n", city.Name, city.Country, city.Latitude, city.Longitude)
	}
}
Output:

Tokyo, JP is located at coordinate (35.6897, 139.6922).
Jakarta, ID is located at coordinate (-6.2146, 106.8451).
Delhi, IN is located at coordinate (28.6600, 77.2300).
Example (DecoderIterate)
package main

import (
	"fmt"
	"strings"

	"github.com/askeladdk/csvbuddy"
)

func main() {
	moviesCSV := strings.Join([]string{
		"movie,year of release",
		"The Matrix,1999",
		"Back To The Future,1985",
		"The Terminator,1984",
		"2001: A Space Odyssey,1968",
	}, "\n")

	var movie struct {
		Name string `csv:"movie"`
		Year int    `csv:"year of release"`
	}

	cr := csvbuddy.NewDecoder(strings.NewReader(moviesCSV))

	iter, _ := cr.Iterate(&movie)
	for iter.Scan() {
		fmt.Printf("%s was released in %d.\n", movie.Name, movie.Year)
	}

}
Output:

The Matrix was released in 1999.
Back To The Future was released in 1985.
The Terminator was released in 1984.
2001: A Space Odyssey was released in 1968.
Example (FloatingPointTags)
package main

import (
	"fmt"
	"math"

	"github.com/askeladdk/csvbuddy"
)

func main() {
	numbers := []struct {
		N float64 `csv:"number,prec=3,fmt=E"`
	}{{math.Pi}, {100e4}}
	text, _ := csvbuddy.Marshal(&numbers)
	fmt.Println(string(text))
}
Output:

number
3.142E+00
1.000E+06
Example (Marshal)
package main

import (
	"fmt"

	"github.com/askeladdk/csvbuddy"
)

func main() {
	movies := []struct {
		Name string `csv:"movie"`
		Year int    `csv:"year of release"`
	}{
		{"The Matrix", 1999},
		{"Back To The Future", 1985},
		{"The Terminator", 1984},
		{"2001: A Space Odyssey", 1968},
	}

	text, _ := csvbuddy.Marshal(&movies)

	fmt.Println(string(text))
}
Output:

movie,year of release
The Matrix,1999
Back To The Future,1985
The Terminator,1984
2001: A Space Odyssey,1968
Example (Unmarshal)
package main

import (
	"fmt"
	"strings"

	"github.com/askeladdk/csvbuddy"
)

func main() {
	moviesCSV := strings.Join([]string{
		"movie,year of release",
		"The Matrix,1999",
		"Back To The Future,1985",
		"The Terminator,1984",
		"2001: A Space Odyssey,1968",
	}, "\n")

	var movies []struct {
		Name string `csv:"movie"`
		Year int    `csv:"year of release"`
	}

	_ = csvbuddy.Unmarshal([]byte(moviesCSV), &movies)

	for _, movie := range movies {
		fmt.Printf("%s was released in %d.\n", movie.Name, movie.Year)
	}
}
Output:

The Matrix was released in 1999.
Back To The Future was released in 1985.
The Terminator was released in 1984.
2001: A Space Odyssey was released in 1968.

Constants

This section is empty.

Variables

var ErrInvalidArgument = errors.New("csv: interface{} argument is of an invalid type")

ErrInvalidArgument signals that an interface{} argument is of an invalid type.

Functions

func Header

func Header(v interface{}) ([]string, error)

Header returns the header of v, which must be a pointer to a slice of structs.

func Marshal

func Marshal(v interface{}) ([]byte, error)

Marshal encodes a slice of structs to a byte slice of CSV text format. The CSV will be comma-separated and have a header.

func Unmarshal

func Unmarshal(data []byte, v interface{}) error

Unmarshal decodes a byte slice as a CSV to a slice of structs. The CSV is expected to be comma-separated and have a header.

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder reads and decodes CSV records from an input stream.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder returns a Decoder that reads from r.

func (*Decoder) Decode

func (d *Decoder) Decode(v interface{}) error

Decode decodes a CSV as a slice of structs and stores it in v. The value of v must be a pointer to a slice of structs.

func (*Decoder) DisallowShortFields added in v0.0.3

func (d *Decoder) DisallowShortFields()

DisallowShortFields causes the Decoder to raise an error if a record has fewer columns than struct fields.

func (*Decoder) DisallowUnknownFields

func (d *Decoder) DisallowUnknownFields()

DisallowUnknownFields causes the Decoder to raise an error if a record has more columns than struct fields.

func (*Decoder) Iterate added in v0.0.5

func (d *Decoder) Iterate(v interface{}) (*DecoderIterator, error)

Iterate returns a DecoderIterator that decodes each row into v, which must be a pointer to a struct.

func (*Decoder) SetMapFunc

func (d *Decoder) SetMapFunc(fn MapFunc)

SetMapFunc causes the Decoder to call fn on every field before type conversion. Use this to clean wrongly formatted values.

func (*Decoder) SetReaderFunc

func (d *Decoder) SetReaderFunc(fn ReaderFunc)

SetReaderFunc customizes how records are decoded. The default value is NewReader.

func (*Decoder) SkipHeader

func (d *Decoder) SkipHeader()

SkipHeader causes the Decoder to not parse the first record as the header but to derive it from the struct tags. Use this to read headerless CSVs.

type DecoderIterator added in v0.0.5

type DecoderIterator struct {
	// contains filtered or unexported fields
}

DecoderIterator decodes one row at a time to enable parsing of large files without having to read them entirely into memory.

func (*DecoderIterator) Err added in v0.0.5

func (d *DecoderIterator) Err() error

Err returns the most recent non-EOF error.

func (*DecoderIterator) Scan added in v0.0.5

func (d *DecoderIterator) Scan() bool

Scan parses the next row and stores the result in the value passed into Decoder.Iterate. It returns false when an error has occurred or it reached EOF. After Scan returns false, Err will return the error that caused it to stop. If Scan stopped because it has reached EOF, Err will return nil.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder writes and encodes CSV records to an output stream.

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder returns an Encoder that writes to w.

func (*Encoder) Encode

func (e *Encoder) Encode(v interface{}) (err error)

Encode encodes a slice of structs to CSV text format. The value of v must be a pointer to a slice of structs.

func (*Encoder) SetHeader added in v0.0.2

func (e *Encoder) SetHeader(h []string)

SetHeader causes the Encoder to change the order in which fields are encoded.

func (*Encoder) SetMapFunc

func (e *Encoder) SetMapFunc(fn MapFunc)

SetMapFunc causes the Encoder to call fn on every field before a record is written.

func (*Encoder) SetWriterFunc

func (e *Encoder) SetWriterFunc(fn WriterFunc)

SetWriterFunc customizes how records are encoded. The default value is NewWriter.

func (*Encoder) SkipHeader

func (e *Encoder) SkipHeader()

SkipHeader causes the Encoder to not write the CSV header.

type MapFunc

type MapFunc func(name, value string) string

MapFunc is a function that replaces a field value with another value.

type Reader

type Reader interface {
	Read() ([]string, error)
}

Reader parses a CSV input stream to records. A Reader must return io.EOF to signal end of file.

func NewReader

func NewReader(r io.Reader) Reader

NewReader returns a new csv.Reader that reads from r.

type ReaderFunc

type ReaderFunc func(io.Reader) Reader

ReaderFunc is a function that returns a Reader that reads from an input stream.

type Writer

type Writer interface {
	Write([]string) error
}

Writer writes CSV records.

Writer may optionally support flushing by implementing Flush() error.
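Any type with those two methods has the required method set. A stdlib-only sketch of a hypothetical tab-separated writer (tsvWriter is made up for illustration) that an Encoder could plug in via SetWriterFunc:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// tsvWriter implements Write([]string) error plus the optional
// Flush() error hook, so it satisfies the Writer interface.
type tsvWriter struct{ bw *bufio.Writer }

func newTSVWriter(w io.Writer) *tsvWriter {
	return &tsvWriter{bw: bufio.NewWriter(w)}
}

// Write emits one record as a tab-separated line.
func (t *tsvWriter) Write(record []string) error {
	_, err := t.bw.WriteString(strings.Join(record, "\t") + "\n")
	return err
}

// Flush drains the buffer; csvbuddy calls this if it is present.
func (t *tsvWriter) Flush() error { return t.bw.Flush() }

func main() {
	w := newTSVWriter(os.Stdout)
	_ = w.Write([]string{"movie", "year"})
	_ = w.Write([]string{"The Matrix", "1999"})
	_ = w.Flush()
	fmt.Println("flushed")
}
```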

func NewWriter

func NewWriter(w io.Writer) Writer

NewWriter returns a new csv.Writer that writes to w.

type WriterFunc

type WriterFunc func(io.Writer) Writer

WriterFunc is a function that returns a Writer that writes to an output stream.
