csvutil

v1.0.0
Published: Dec 3, 2017 License: MIT Imports: 11 Imported by: 0

README

Package csvutil provides fast and idiomatic mapping between CSV and Go values.

This package does not provide a CSV parser itself. It is built on the Reader and Writer interfaces, which are implemented by, e.g., the std csv package. This makes it possible to plug in any other CSV reader or writer that may be more performant.

Installation

go get github.com/jszwec/csvutil

Example

Unmarshal

Unmarshal uses the std csv.Reader with its default options. Use Decoder for streaming and more advanced use cases.

	var csvInput = []byte(`
name,age
jacek,26
john,27`,
	)

	type User struct {
		Name string `csv:"name"`
		Age  int    `csv:"age"`
	}

	var users []User
	if err := csvutil.Unmarshal(csvInput, &users); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Printf("%+v", users)

	// Output:
	// [{Name:jacek Age:26} {Name:john Age:27}]

Marshal

Marshal uses the std csv.Writer with its default options. Use Encoder for streaming or to use a different Writer.

	type Address struct {
		City    string
		Country string
	}

	type User struct {
		Name string
		Address
		Age int `csv:"age,omitempty"`
	}

	users := []User{
		{Name: "John", Address: Address{"Boston", "USA"}, Age: 26},
		{Name: "Bob", Address: Address{"LA", "USA"}, Age: 27},
		{Name: "Alice", Address: Address{"SF", "USA"}},
	}

	b, err := csvutil.Marshal(users)
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(string(b))

	// Output:
	// Name,City,Country,age
	// John,Boston,USA,26
	// Bob,LA,USA,27
	// Alice,SF,USA,

Unmarshal and metadata

Your CSV input may not always have the same header. In addition to your base fields, you may receive extra metadata that you still want to store. Decoder provides the Unused method, which after each call to Decode reports which header indexes were not used during decoding. Based on that, you can handle and store these extra values.

	type User struct {
		Name      string            `csv:"name"`
		City      string            `csv:"city"`
		Age       int               `csv:"age"`
		OtherData map[string]string `csv:"-"`
	}

	csvReader := csv.NewReader(strings.NewReader(`
name,age,city,zip
alice,25,la,90005
bob,30,ny,10005`))

	dec, err := csvutil.NewDecoder(csvReader)
	if err != nil {
		log.Fatal(err)
	}

	header := dec.Header()
	var users []User
	for {
		u := User{OtherData: make(map[string]string)}

		if err := dec.Decode(&u); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		for _, i := range dec.Unused() {
			u.OtherData[header[i]] = dec.Record()[i]
		}
		users = append(users, u)
	}

	fmt.Println(users)

	// Output:
	// [{alice la 25 map[zip:90005]} {bob ny 30 map[zip:10005]}]

Performance

In the benchmarks below, csvutil takes less time per operation and makes fewer allocations than gocsv (Marshal and Unmarshal) and easycsv (Unmarshal) at every input size.

Unmarshal

benchmark code: https://gist.github.com/jszwec/e8515e741190454fa3494bcd3e1f100f

csvutil:

BenchmarkUnmarshal/csvutil.Unmarshal/1_record-8         	  200000	      9407 ns/op	    7408 B/op	      44 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/10_records-8       	  100000	     21384 ns/op	    8433 B/op	      53 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/100_records-8      	   10000	    140172 ns/op	   18609 B/op	     143 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/1000_records-8     	    1000	   1334816 ns/op	  121183 B/op	    1043 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/10000_records-8    	     100	  13203689 ns/op	 1140356 B/op	   10043 allocs/op
BenchmarkUnmarshal/csvutil.Unmarshal/100000_records-8   	      10	 137474932 ns/op	12048059 B/op	  100044 allocs/op

gocsv:

BenchmarkUnmarshal/gocsv.Unmarshal/1_record-8           	  200000	     10613 ns/op	    7451 B/op	      94 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/10_records-8         	   50000	     36413 ns/op	   13547 B/op	     304 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/100_records-8        	    5000	    287672 ns/op	   72300 B/op	    2377 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/1000_records-8       	     500	   2756252 ns/op	  649932 B/op	   23080 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/10000_records-8      	      50	  29407701 ns/op	 7023391 B/op	  230089 allocs/op
BenchmarkUnmarshal/gocsv.Unmarshal/100000_records-8     	       5	 311860368 ns/op	75482985 B/op	 2300102 allocs/op

easycsv:

BenchmarkUnmarshal/easycsv.ReadAll/1_record-8           	  100000	     15636 ns/op	    8863 B/op	      78 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/10_records-8         	   20000	     76797 ns/op	   24080 B/op	     388 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/100_records-8        	    2000	    666465 ns/op	  170548 B/op	    3451 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/1000_records-8       	     200	   6431414 ns/op	 1595751 B/op	   34054 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/10000_records-8      	      20	  70387764 ns/op	18870418 B/op	  340065 allocs/op
BenchmarkUnmarshal/easycsv.ReadAll/100000_records-8     	       2	 737079728 ns/op	190822472 B/op	 3400081 allocs/op
Marshal

benchmark code: https://gist.github.com/jszwec/31980321e1852ebb5615a44ccf374f17

csvutil:

BenchmarkMarshal/csvutil.Marshal/1_record-8         	  200000	      6010 ns/op	    6816 B/op	      28 allocs/op
BenchmarkMarshal/csvutil.Marshal/10_records-8       	  100000	     22391 ns/op	    7728 B/op	      38 allocs/op
BenchmarkMarshal/csvutil.Marshal/100_records-8      	   10000	    189905 ns/op	   25139 B/op	     129 allocs/op
BenchmarkMarshal/csvutil.Marshal/1000_records-8     	    1000	   1812082 ns/op	  165458 B/op	    1031 allocs/op
BenchmarkMarshal/csvutil.Marshal/10000_records-8    	     100	  18112811 ns/op	 1523067 B/op	   10034 allocs/op
BenchmarkMarshal/csvutil.Marshal/100000_records-8   	      10	 183706155 ns/op	22364681 B/op	  100038 allocs/op

gocsv:

BenchmarkMarshal/gocsv.Marshal/1_record-8           	  200000	      7291 ns/op	    5810 B/op	      82 allocs/op
BenchmarkMarshal/gocsv.Marshal/10_records-8         	   50000	     32093 ns/op	    9316 B/op	     389 allocs/op
BenchmarkMarshal/gocsv.Marshal/100_records-8        	    5000	    284238 ns/op	   52673 B/op	    3450 allocs/op
BenchmarkMarshal/gocsv.Marshal/1000_records-8       	     500	   2777589 ns/op	  452503 B/op	   34052 allocs/op
BenchmarkMarshal/gocsv.Marshal/10000_records-8      	      50	  28477563 ns/op	 4413044 B/op	  340064 allocs/op
BenchmarkMarshal/gocsv.Marshal/100000_records-8     	       5	 286370004 ns/op	51970707 B/op	 3400084 allocs/op

Documentation

Variables

var ErrFieldCount = errors.New("wrong number of fields in record")

ErrFieldCount is returned when header's length doesn't match the length of the read record.

Functions

func Marshal

func Marshal(v interface{}) ([]byte, error)

Marshal returns the CSV encoding of slice v. If v is not a slice, Marshal returns InvalidMarshalError. If slice elements are not structs, Marshal will return InvalidEncodeError.

Marshal uses the std encoding/csv.Writer with its default settings for csv encoding.

For the exact encoding rules, see the Encoder.Encode method.

Example
package main

import (
	"fmt"

	"github.com/jszwec/csvutil"
)

func main() {
	type Address struct {
		City    string
		Country string
	}

	type User struct {
		Name string
		Address
		Age int `csv:"age,omitempty"`
	}

	users := []User{
		{Name: "John", Address: Address{"Boston", "USA"}, Age: 26},
		{Name: "Bob", Address: Address{"LA", "USA"}, Age: 27},
		{Name: "Alice", Address: Address{"SF", "USA"}},
	}

	b, err := csvutil.Marshal(users)
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(string(b))

}
Output:

Name,City,Country,age
John,Boston,USA,26
Bob,LA,USA,27
Alice,SF,USA,
Example (CustomMarshalCSV)
package main

import (
	"fmt"

	"github.com/jszwec/csvutil"
)

type Status uint8

const (
	Unknown Status = iota
	Success
	Failure
)

func (s Status) MarshalCSV() ([]byte, error) {
	switch s {
	case Success:
		return []byte("success"), nil
	case Failure:
		return []byte("failure"), nil
	default:
		return []byte("unknown"), nil
	}
}

type Job struct {
	ID     int
	Status Status
}

func main() {
	jobs := []Job{
		{1, Success},
		{2, Failure},
	}

	b, err := csvutil.Marshal(jobs)
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(string(b))

}
Output:

ID,Status
1,success
2,failure

func Unmarshal

func Unmarshal(data []byte, v interface{}) error

Unmarshal parses the CSV-encoded data and stores the result in the slice pointed to by v. If v is nil or not a pointer to a slice, Unmarshal returns an InvalidUnmarshalError.

Unmarshal uses the std encoding/csv.Reader for parsing and csvutil.Decoder for populating the struct elements in the provided slice. For exact decoding rules look at the Decoder's documentation.

The first line in data is treated as a header. The Decoder uses it to map CSV columns to struct fields.

In case of success the provided slice will be reinitialized and its content fully replaced with decoded data.
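
The argument rule above (v must be a non-nil pointer to a slice) can be illustrated with a small reflection check. isSlicePointer is a hypothetical helper written for this sketch; it is not part of csvutil and does not claim to match how the package validates its input.

```go
package main

import (
	"fmt"
	"reflect"
)

// isSlicePointer is a hypothetical check mirroring the documented rule:
// v must be a non-nil pointer to a slice.
func isSlicePointer(v interface{}) bool {
	rv := reflect.ValueOf(v)
	return rv.Kind() == reflect.Ptr &&
		!rv.IsNil() &&
		rv.Type().Elem().Kind() == reflect.Slice
}

func main() {
	var users []struct{ Name string }
	var nilPtr *[]int

	fmt.Println(isSlicePointer(&users)) // pointer to a slice
	fmt.Println(isSlicePointer(users))  // slice passed by value
	fmt.Println(isSlicePointer(nilPtr)) // nil pointer
	fmt.Println(isSlicePointer(nil))    // untyped nil
}
```

Only the first call satisfies the rule; the other three are the kinds of arguments the documentation says are rejected.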

Types

type Decoder

type Decoder struct {
	// Tag defines which key in the struct field's tag to scan for names and
	// options (Default: 'csv').
	Tag string
	// contains filtered or unexported fields
}

A Decoder reads and decodes string records into structs.

Example (CustomUnmarshalCSV)
package main

import (
	"fmt"
	"strconv"

	"github.com/jszwec/csvutil"
)

type Bar int

func (b *Bar) UnmarshalCSV(data []byte) error {
	n, err := strconv.Atoi(string(data))
	*b = Bar(n)
	return err
}

type Foo struct {
	Int int `csv:"int"`
	Bar Bar `csv:"bar"`
}

func main() {
	var csvInput = []byte(`
int,bar
5,10
6,11`)

	var foos []Foo
	if err := csvutil.Unmarshal(csvInput, &foos); err != nil {
		fmt.Println("error:", err)
	}

	fmt.Printf("%+v", foos)

}
Output:

[{Int:5 Bar:10} {Int:6 Bar:11}]
Example (Decode)
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/jszwec/csvutil"
)

func main() {
	type User struct {
		ID   *int   `csv:"id,omitempty"`
		Name string `csv:"name"`
		City string `csv:"city"`
		Age  int    `csv:"age"`
	}

	csvReader := csv.NewReader(strings.NewReader(`
id,name,age,city
,alice,25,la
,bob,30,ny`))

	dec, err := csvutil.NewDecoder(csvReader)
	if err != nil {
		log.Fatal(err)
	}

	var users []User
	for {
		var u User
		if err := dec.Decode(&u); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		users = append(users, u)
	}

	fmt.Println(users)

}
Output:

[{<nil> alice la 25} {<nil> bob ny 30}]
Example (DecodeEmbedded)
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/jszwec/csvutil"
)

func main() {
	type Address struct {
		ID    int    `csv:"id"` // same field as in User - this one will be empty
		City  string `csv:"city"`
		State string `csv:"state"`
	}

	type User struct {
		Address
		ID   int    `csv:"id"` // same field as in Address - this one wins
		Name string `csv:"name"`
		Age  int    `csv:"age"`
	}

	csvReader := csv.NewReader(strings.NewReader(
		"id,name,age,city,state\n" +
			"1,alice,25,la,ca\n" +
			"2,bob,30,ny,ny"))

	dec, err := csvutil.NewDecoder(csvReader)
	if err != nil {
		log.Fatal(err)
	}

	var users []User
	for {
		var u User

		if err := dec.Decode(&u); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		users = append(users, u)
	}

	fmt.Println(users)

}
Output:

[{{0 la ca} 1 alice 25} {{0 ny ny} 2 bob 30}]
Example (DecodeUnusedColumns)
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/jszwec/csvutil"
)

func main() {
	type User struct {
		Name      string            `csv:"name"`
		City      string            `csv:"city"`
		Age       int               `csv:"age"`
		OtherData map[string]string `csv:"-"`
	}

	csvReader := csv.NewReader(strings.NewReader(`
name,age,city,zip
alice,25,la,90005
bob,30,ny,10005`))

	dec, err := csvutil.NewDecoder(csvReader)
	if err != nil {
		log.Fatal(err)
	}

	header := dec.Header()
	var users []User
	for {
		u := User{OtherData: make(map[string]string)}

		if err := dec.Decode(&u); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		for _, i := range dec.Unused() {
			u.OtherData[header[i]] = dec.Record()[i]
		}
		users = append(users, u)
	}

	fmt.Println(users)

}
Output:

[{alice la 25 map[zip:90005]} {bob ny 30 map[zip:10005]}]
Example (Unmarshal)
package main

import (
	"fmt"

	"github.com/jszwec/csvutil"
)

func main() {
	var csvInput = []byte(`
name,age
jacek,26
john,27`,
	)

	type User struct {
		Name string `csv:"name"`
		Age  int    `csv:"age"`
	}

	var users []User
	if err := csvutil.Unmarshal(csvInput, &users); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Printf("%+v", users)

}
Output:

[{Name:jacek Age:26} {Name:john Age:27}]

func NewDecoder

func NewDecoder(r Reader, header ...string) (dec *Decoder, err error)

NewDecoder returns a new decoder that reads from r.

Decoder will match struct fields according to the given header.

If header is empty NewDecoder will read one line and treat it as a header.

Records coming from r must be of the same length as the header.

NewDecoder may return io.EOF if there is no data in r and no header was provided by the caller.

func (*Decoder) Decode

func (d *Decoder) Decode(v interface{}) (err error)

Decode reads the next string record from its input and stores it in the value pointed to by v which must be a non-nil struct pointer.

Decode matches all exported struct fields based on the header. Struct fields can be adjusted by using tags.

The "omitempty" option specifies that the field is skipped during decoding if the record's field is an empty string.

Examples of struct field tags and their meanings:

// Decode matches this field with "myName" header column.
Field int `csv:"myName"`

// Decode matches this field with "Field" header column.
Field int

// Decode matches this field with "myName" header column and decoding is not
// called if record's field is an empty string.
Field int `csv:"myName,omitempty"`

// Decode matches this field with "Field" header column and decoding is not
// called if record's field is an empty string.
Field int `csv:",omitempty"`

// Decode ignores this field.
Field int `csv:"-"`

By default decode looks for "csv" tag, but this can be changed by setting Decoder.Tag field.

To Decode into a custom type, v must implement csvutil.Unmarshaler or encoding.TextUnmarshaler.

Anonymous struct fields with tags are treated like normal fields and they must implement csvutil.Unmarshaler or encoding.TextUnmarshaler.

Anonymous struct fields without tags are populated just as if they were part of the main struct. However, fields in the main struct take priority and are populated first. If the main struct and an anonymous struct field share a field name, the main struct's field is the one populated.

Fields of type []byte expect the data to be base64 encoded strings.
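
As a rough illustration of the []byte rule, the sketch below round-trips a value through base64 with the standard library. The documentation does not say which base64 alphabet is used; base64.StdEncoding is an assumption here, and encodeCell and decodeCell are hypothetical helper names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeCell renders raw bytes as the text that would sit in a CSV cell.
// Assumption: the standard base64 alphabet with padding.
func encodeCell(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// decodeCell reverses encodeCell.
func decodeCell(cell string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(cell)
}

func main() {
	cell := encodeCell([]byte("hello"))
	fmt.Println(cell) // aGVsbG8=

	raw, err := decodeCell(cell)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(raw)) // hello
}
```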

func (*Decoder) Header

func (d *Decoder) Header() []string

Header returns the first line that came from the reader, or the header defined by the caller.

func (*Decoder) Record

func (d *Decoder) Record() []string

Record returns the most recently read record. The slice is valid until the next call to Decode.
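
Since the returned slice is only valid until the next call to Decode, code that wants to keep a record should copy it first. The sketch below shows the usual copy idiom; cloneRecord is a hypothetical helper, and the buf variable merely stands in for the slice returned by Record.

```go
package main

import "fmt"

// cloneRecord returns an independent copy of rec, so later changes to the
// original slice do not affect the copy.
func cloneRecord(rec []string) []string {
	return append([]string(nil), rec...)
}

func main() {
	buf := []string{"alice", "25"} // stands in for the slice returned by Record
	kept := cloneRecord(buf)

	// Simulate a later Decode overwriting the slice contents.
	buf[0], buf[1] = "bob", "30"

	fmt.Println(kept) // [alice 25]
	fmt.Println(buf)  // [bob 30]
}
```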

func (*Decoder) Unused

func (d *Decoder) Unused() (indexes []int)

Unused returns a list of column indexes that were not used during decoding due to lack of matching struct field.
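
The idea behind Unused can be sketched as a comparison between the header and the set of column names that the target struct knows about. unusedIndexes is a hypothetical function written for illustration only; it does not reflect how the Decoder tracks this internally.

```go
package main

import "fmt"

// unusedIndexes is a hypothetical helper. It returns the indexes of header
// columns that have no entry in known.
func unusedIndexes(header []string, known map[string]bool) []int {
	var idx []int
	for i, h := range header {
		if !known[h] {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	header := []string{"name", "age", "city", "zip"}
	known := map[string]bool{"name": true, "city": true, "age": true}

	fmt.Println(unusedIndexes(header, known)) // [3]
}
```

With the header from the metadata example, only the "zip" column at index 3 has no matching field.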

type Encoder

type Encoder struct {
	// Tag defines which key in the struct field's tag to scan for names and
	// options (Default: 'csv').
	Tag string
	// contains filtered or unexported fields
}

Encoder writes CSV representations of structs to the output stream.

Example (Encode)
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"

	"github.com/jszwec/csvutil"
)

func main() {
	type Address struct {
		City    string
		Country string
	}

	type User struct {
		Name string
		Address
		Age int `csv:"age,omitempty"`
	}

	users := []User{
		{Name: "John", Address: Address{"Boston", "USA"}, Age: 26},
		{Name: "Bob", Address: Address{"LA", "USA"}, Age: 27},
		{Name: "Alice", Address: Address{"SF", "USA"}},
	}

	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	enc := csvutil.NewEncoder(w)

	for _, u := range users {
		if err := enc.Encode(u); err != nil {
			fmt.Println("error:", err)
		}
	}

	w.Flush()
	if err := w.Error(); err != nil {
		fmt.Println("error:", err)
	}

	fmt.Println(buf.String())

}
Output:

Name,City,Country,age
John,Boston,USA,26
Bob,LA,USA,27
Alice,SF,USA,

func NewEncoder

func NewEncoder(w Writer) *Encoder

NewEncoder returns a new encoder that writes to w.

func (*Encoder) Encode

func (e *Encoder) Encode(v interface{}) error

Encode writes the CSV encoding of v to the output stream. The provided argument v must be a struct.

Only the exported fields will be encoded.

The first call to Encode writes a header. Header names can be customized with tags ('csv' by default); otherwise the original field names are used.

The header and fields are written in the order in which the struct fields are defined. An embedded struct's fields are treated as if they were part of the outer struct. Embedded fields that are tagged are treated like any other field, but they must implement the Marshaler or encoding.TextMarshaler interface.

Marshaler interface has the priority over encoding.TextMarshaler.
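
The precedence between the two interfaces can be pictured as an ordered type switch. The sketch below declares a local marshaler interface with the same method as the documented Marshaler; encodeField, level, and plain are hypothetical names, and the dispatch shown is an illustration of the documented ordering rather than the package's code.

```go
package main

import (
	"encoding"
	"fmt"
)

// marshaler mirrors the method set of the documented Marshaler interface.
type marshaler interface {
	MarshalCSV() ([]byte, error)
}

// level implements both interfaces.
type level int

func (l level) MarshalCSV() ([]byte, error)  { return []byte("csv-form"), nil }
func (l level) MarshalText() ([]byte, error) { return []byte("text-form"), nil }

// plain implements only encoding.TextMarshaler.
type plain int

func (p plain) MarshalText() ([]byte, error) { return []byte("text-only"), nil }

// encodeField is a hypothetical dispatcher: the first matching case wins, so
// MarshalCSV is preferred over MarshalText.
func encodeField(v interface{}) (string, error) {
	switch m := v.(type) {
	case marshaler:
		b, err := m.MarshalCSV()
		return string(b), err
	case encoding.TextMarshaler:
		b, err := m.MarshalText()
		return string(b), err
	}
	return fmt.Sprint(v), nil
}

func main() {
	for _, v := range []interface{}{level(1), plain(2), 3} {
		s, err := encodeField(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```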

Tagged fields take priority over untagged fields with the same name.

Following the Go visibility rules, if there are multiple fields with the same name (tagged or not) on the same level and the choice between them is ambiguous, all of these fields are ignored.

Nil values are encoded as empty strings. The same happens if the 'omitempty' option is set and the value is a default value such as 0, false, or a nil interface.

Bool types are encoded as 'true' or 'false'.

Float types are encoded using strconv.FormatFloat with precision -1 and 'G' format.
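
For a feel of what the 'G' format with precision -1 produces, the sketch below calls strconv.FormatFloat directly. formatCell is a hypothetical helper, and the bit size of 64 is an assumption (the text above does not state it).

```go
package main

import (
	"fmt"
	"strconv"
)

// formatCell formats f the way the text above describes: 'G' format with
// precision -1 (the shortest representation that round-trips).
// Assumption: a bit size of 64.
func formatCell(f float64) string {
	return strconv.FormatFloat(f, 'G', -1, 64)
}

func main() {
	fmt.Println(formatCell(1.5))     // 1.5
	fmt.Println(formatCell(100000))  // 100000
	fmt.Println(formatCell(1000000)) // 1E+06
}
```

Note that large or very small magnitudes switch to exponent notation, which matters if a downstream consumer expects plain decimals.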

Fields of type []byte are encoded as base64-encoded strings.

Fields can be excluded from encoding by using the '-' tag.

Examples of struct tags:

// Field appears as 'myName' header in CSV encoding.
Field int `csv:"myName"`

// Field appears as 'Field' header in CSV encoding.
Field int

// Field appears as 'myName' header in CSV encoding and is an empty string
// if Field is 0.
Field int `csv:"myName,omitempty"`

// Field appears as 'Field' header in CSV encoding and is an empty string
// if Field is 0.
Field int `csv:",omitempty"`

// Encode ignores this field.
Field int `csv:"-"`

Encode doesn't flush data. The caller is responsible for calling Flush() if the used Writer supports it.
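
Why the flush matters can be seen with the std csv.Writer on its own, which buffers its output. In this sketch, writeRows is a hypothetical helper that reports how many bytes reached the underlying buffer before and after Flush.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRows writes rows through a csv.Writer and returns the number of bytes
// in the destination buffer before and after calling Flush.
func writeRows(rows [][]string) (before, after int, err error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, r := range rows {
		if err := w.Write(r); err != nil {
			return 0, 0, err
		}
	}
	before = buf.Len() // small writes are still held in the writer's buffer

	w.Flush()
	if err := w.Error(); err != nil {
		return before, buf.Len(), err
	}
	return before, buf.Len(), nil
}

func main() {
	before, after, err := writeRows([][]string{{"name", "age"}, {"alice", "25"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, after) // 0 18
}
```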

type InvalidDecodeError

type InvalidDecodeError struct {
	Type reflect.Type
}

An InvalidDecodeError describes an invalid argument passed to Decode. (The argument to Decode must be a non-nil pointer.)

func (*InvalidDecodeError) Error

func (e *InvalidDecodeError) Error() string

type InvalidEncodeError

type InvalidEncodeError struct {
	Type reflect.Type
}

InvalidEncodeError is returned by Encode when the passed argument v is not a struct.

func (*InvalidEncodeError) Error

func (e *InvalidEncodeError) Error() string

type InvalidMarshalError

type InvalidMarshalError struct {
	Type reflect.Type
}

InvalidMarshalError is returned by Marshal when the provided type was not a slice.

func (*InvalidMarshalError) Error

func (e *InvalidMarshalError) Error() string

type InvalidUnmarshalError

type InvalidUnmarshalError struct {
	Type reflect.Type
}

An InvalidUnmarshalError describes an invalid argument passed to Unmarshal. (The argument to Unmarshal must be a non-nil slice pointer.)

func (*InvalidUnmarshalError) Error

func (e *InvalidUnmarshalError) Error() string

type Marshaler

type Marshaler interface {
	MarshalCSV() ([]byte, error)
}

Marshaler is the interface implemented by types that can marshal themselves into a valid string.

type MarshalerError

type MarshalerError struct {
	Type          reflect.Type
	MarshalerType string
	Err           error
}

MarshalerError is returned by Encoder when MarshalCSV or MarshalText returned an error.

func (*MarshalerError) Error

func (e *MarshalerError) Error() string

type Reader

type Reader interface {
	Read() ([]string, error)
}

Reader provides the interface for reading a single CSV record.

If there is no data left to be read, Read returns (nil, io.EOF).

It is implemented by csv.Reader.

type UnmarshalTypeError

type UnmarshalTypeError struct {
	Value string       // string value
	Type  reflect.Type // type of Go value it could not be assigned to
}

An UnmarshalTypeError describes a string value that was not appropriate for a value of a specific Go type.

func (*UnmarshalTypeError) Error

func (e *UnmarshalTypeError) Error() string

type Unmarshaler

type Unmarshaler interface {
	UnmarshalCSV([]byte) error
}

Unmarshaler is the interface implemented by types that can unmarshal themselves from a single field of a record.

type UnsupportedTypeError

type UnsupportedTypeError struct {
	Type reflect.Type
}

An UnsupportedTypeError is returned when attempting to decode an unsupported value type.

func (*UnsupportedTypeError) Error

func (e *UnsupportedTypeError) Error() string

type Writer

type Writer interface {
	Write([]string) error
}

Writer provides the interface for writing a single CSV record.

It is implemented by csv.Writer.
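
A destination other than csv.Writer only needs the one Write method. The sketch below uses a local recordWriter interface with the same method set; memWriter and writeAll are hypothetical names for an in-memory sink and a small driver loop.

```go
package main

import "fmt"

// recordWriter mirrors the method set of the documented Writer interface.
type recordWriter interface {
	Write([]string) error
}

// memWriter is a hypothetical sink that keeps every record in memory.
type memWriter struct {
	rows [][]string
}

// Write stores a copy of rec so the caller may reuse its slice afterwards.
func (m *memWriter) Write(rec []string) error {
	m.rows = append(m.rows, append([]string(nil), rec...))
	return nil
}

// writeAll sends every row to w, stopping at the first error.
func writeAll(w recordWriter, rows [][]string) error {
	for _, r := range rows {
		if err := w.Write(r); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	w := &memWriter{}
	if err := writeAll(w, [][]string{{"name", "age"}, {"alice", "25"}}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(w.rows), w.rows[0])
}
```

A sink like this has no Flush method, which is the case the Encode documentation leaves to the caller to handle.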
