csv

package module
v0.0.0-...-8745000 Latest
Published: Jul 9, 2021 License: LGPL-3.0 Imports: 13 Imported by: 12

README

NAME

go-csv - a set of golang tools and libraries for manipulating CSV representations

DESCRIPTION

go-csv is a set of tools for manipulating streams of CSV data.

As a rule, most tools in this set assume CSV files that include a header record that describes the contents of each field.

TOOLS

  • csv-select - selects the specified fields from the header-prefixed, CSV input stream
  • uniquify - augments a partial key so that each record in the output stream has a unique natural key
  • surrogate-keys - augments the input stream so that each record in the output stream has a surrogate key derived from the MD5 sum of the natural key
  • csv-to-json - converts a CSV stream into a JSON stream.
  • json-to-csv - converts a JSON stream into a CSV stream.
  • csv-sort - sorts a CSV stream according to the specified columns.
  • csv-join - joins two sorted CSV streams after matching on specified columns.
  • influx-line-format - convert a CSV stream into influx line format.
  • csv-use-tab - uses a tab delimiter while writing (default) or reading (--on-read) a CSV stream

INSTALLATION

The instructions assume that a local Go installation is available, that the binaries will be installed into $GOPATH/bin, and that this directory is already on the user's PATH.

go install github.com/wildducktheories/go-csv/...

DOCUMENTATION

For more information, refer to https://godoc.org/github.com/wildducktheories/go-csv .

LICENSE

Refer to LICENSE file in same directory.

(c) 2014 - Wild Duck Theories Australia Pty Limited

Documentation

Overview

Package csv provides stream abstractions that allow the fields of a CSV record to be addressed by the values of the fields of the stream's header record.

It provides an alternative to the stream abstractions in encoding/csv which use unnamed string slices as the stream record type.
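To illustrate the difference, the following is a minimal sketch that uses only encoding/csv to address fields by header name. The helper readKeyed is hypothetical and is not part of this package; it only shows the idea that the package's Reader and Record types are described as providing.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readKeyed is a hypothetical helper (not part of go-csv). It reads a
// header-prefixed CSV document and returns one map per data record, keyed by
// the field names found in the header record.
func readKeyed(input string) ([]map[string]string, error) {
	rows, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, nil
	}
	header, records := rows[0], rows[1:]
	out := make([]map[string]string, 0, len(records))
	for _, row := range records {
		// encoding/csv rejects rows whose field count differs from the
		// first row by default, so row[i] is always in range here.
		m := make(map[string]string, len(header))
		for i, name := range header {
			m[name] = row[i]
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	recs, err := readKeyed("Date,Amount\n2014/12/31,100.0\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(recs[0]["Amount"]) // prints 100.0
}
```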

Constants

This section is empty.

Variables

This section is empty.

Functions

func Format

func Format(record []string) string

Format the specified slice as a CSV record using the default CSV encoding conventions.

func LessNumericStrings

func LessNumericStrings(l, r string) bool

Answers true if l is less than r, using a numerical comparison if l and r are both parseable as floats, or a lexical comparison otherwise.
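The comparison rule can be sketched with strconv.ParseFloat as follows. The function lessNumericOrLexical is a hypothetical stand-in written for illustration; the package's own implementation may differ in details such as how it treats unparseable input.

```go
package main

import (
	"fmt"
	"strconv"
)

// lessNumericOrLexical is a hypothetical stand-in for the rule described
// above: compare as floats when both strings parse, otherwise compare as
// plain strings.
func lessNumericOrLexical(l, r string) bool {
	lf, lerr := strconv.ParseFloat(l, 64)
	rf, rerr := strconv.ParseFloat(r, 64)
	if lerr == nil && rerr == nil {
		return lf < rf
	}
	return l < r
}

func main() {
	fmt.Println(lessNumericOrLexical("9", "10"))   // true: 9 < 10 numerically
	fmt.Println("9" < "10")                        // false: lexical order differs
	fmt.Println(lessNumericOrLexical("abc", "10")) // false: falls back to lexical order
}
```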

func LessStrings

func LessStrings(l, r string) bool

Answers true if l is less than r, according to a lexical comparison.

func Parse

func Parse(record string) ([]string, error)

Parses a string representing one or more encoded CSV records and returns the first such record.
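A round trip between a field slice and its encoded form can be sketched with encoding/csv alone. The helpers formatRecord and parseFirst are hypothetical names used for illustration; whether the package's Format strips the trailing line terminator, as this sketch does, is an assumption.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// formatRecord is a hypothetical helper that encodes one record using the
// default encoding/csv conventions and drops the trailing newline.
func formatRecord(record []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

// parseFirst is a hypothetical helper that returns only the first record
// found in s, ignoring any records that follow it.
func parseFirst(s string) ([]string, error) {
	return csv.NewReader(strings.NewReader(s)).Read()
}

func main() {
	s, err := formatRecord([]string{"a", "b,c", `say "hi"`})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // a,"b,c","say ""hi"""

	fields, err := parseFirst(s + "\nx,y,z\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(fields), fields[1]) // 3 b,c
}
```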

Types

type CatProcess

type CatProcess struct {
}

A process that copies the reader to the writer.

func (*CatProcess) Run

func (p *CatProcess) Run(r Reader, b WriterBuilder, errCh chan<- error)

type CsvToJsonProcess

type CsvToJsonProcess struct {
	BaseObject  string
	StringsOnly bool
}

Given a stream of CSV records, generate a stream of JSON records, one per line. The headers are treated as paths into the resulting JSON object, so a CSV file containing the header foo.bar,baz and the data 1, 2 will be converted into a JSON object like {"foo": {"bar": 1}, "baz": 2}

If column values can be successfully unmarshalled as JSON numbers, booleans, objects or arrays then the value will be encoded as the corresponding JSON object, otherwise it will be encoded as a string. Use --strings to force all column values to be encoded as JSON strings.
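The mapping from header paths to nested JSON objects can be sketched as follows. The helpers setPath, decodeValue and toJSON are hypothetical and written for illustration only; the dotted path syntax is taken from the foo.bar example above, and the handling of JSON strings and null is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// setPath stores value in obj at the dotted path, creating nested objects as
// needed.
func setPath(obj map[string]interface{}, path string, value interface{}) {
	parts := strings.Split(path, ".")
	for _, p := range parts[:len(parts)-1] {
		child, ok := obj[p].(map[string]interface{})
		if !ok {
			child = map[string]interface{}{}
			obj[p] = child
		}
		obj = child
	}
	obj[parts[len(parts)-1]] = value
}

// decodeValue keeps numbers, booleans, objects and arrays as JSON values and
// leaves everything else as the original string. With stringsOnly set, every
// value stays a string.
func decodeValue(s string, stringsOnly bool) interface{} {
	if stringsOnly {
		return s
	}
	var v interface{}
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return s
	}
	switch v.(type) {
	case float64, bool, map[string]interface{}, []interface{}:
		return v
	}
	return s // assumption: JSON strings and null are kept as the original text
}

// toJSON converts one CSV record into one line of JSON.
func toJSON(header, fields []string, stringsOnly bool) (string, error) {
	obj := map[string]interface{}{}
	for i, h := range header {
		setPath(obj, h, decodeValue(fields[i], stringsOnly))
	}
	b, err := json.Marshal(obj)
	return string(b), err
}

func main() {
	s, err := toJSON([]string{"foo.bar", "baz"}, []string{"1", "2"}, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"baz":2,"foo":{"bar":1}}
}
```

encoding/json writes map keys in sorted order, which is why baz precedes foo in the output.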

func (*CsvToJsonProcess) Run

func (p *CsvToJsonProcess) Run(reader Reader, encoder *json.Encoder, errCh chan<- error)

type InfluxLineFormatProcess

type InfluxLineFormatProcess struct {
	Measurement string   // the name of the measurement
	Timestamp   string   // the name of the timestamp column
	Format      string   // the format of the timestamp column (for format see documentation of go time.Parse())
	Location    string   // the location in which the timestamp is interpreted (per go time.LoadLocation())
	Tags        []string // the columns to be used as tags
	Values      []string // the columns to be used as values.
}

InfluxLineFormatProcess is a process which converts a CSV file into influx line format.

func (*InfluxLineFormatProcess) Run

func (p *InfluxLineFormatProcess) Run(reader Reader, out io.Writer, errCh chan<- error)

Run exhausts the reader, writing one record in influx line format per CSV input record.

type Join

type Join struct {
	LeftKeys   []string // the names of the keys from the left stream
	RightKeys  []string // the names of the keys from the right stream
	Numeric    []string // the names of the keys in the left stream that are numeric keys
	LeftOuter  bool     // perform a left outer join - left rows are copied even if there is no matching right row
	RightOuter bool     // perform a right outer join - right rows are copied even if there is no matching left row
}

A Join can be used to construct a process that will join two streams of CSV records by matching records from each stream on the specified key columns.
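Because the csv-join tool expects sorted input, a merge join is one plausible way to match records. The sketch below shows an inner merge join on a single lexically sorted key; it is an illustration under that assumption, not a description of how Join is implemented. The row type and mergeJoin function are hypothetical.

```go
package main

import "fmt"

// row is a hypothetical record with one key column and one value column.
type row struct{ key, value string }

// mergeJoin performs an inner join of two slices that are already sorted by
// key, emitting {key, leftValue, rightValue} for every matching pair.
func mergeJoin(left, right []row) [][3]string {
	var out [][3]string
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		switch {
		case left[i].key < right[j].key:
			i++ // no right match for this left row
		case left[i].key > right[j].key:
			j++ // no left match for this right row
		default:
			k := left[i].key
			// Find the end of the run of right rows that share this key.
			jEnd := j
			for jEnd < len(right) && right[jEnd].key == k {
				jEnd++
			}
			// Pair every left row in the run with every right row in the run.
			for ; i < len(left) && left[i].key == k; i++ {
				for r := j; r < jEnd; r++ {
					out = append(out, [3]string{k, left[i].value, right[r].value})
				}
			}
			j = jEnd
		}
	}
	return out
}

func main() {
	left := []row{{"a", "1"}, {"b", "2"}, {"d", "4"}}
	right := []row{{"b", "x"}, {"c", "y"}, {"d", "z"}}
	fmt.Println(mergeJoin(left, right)) // [[b 2 x] [d 4 z]]
}
```

A left or right outer join would additionally emit the rows skipped in the first two cases, padded with empty fields for the missing side.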

func (*Join) WithRight

func (p *Join) WithRight(r Reader) Process

Binds the specified reader as the right-hand side of a join and returns a Process whose reader will be considered as the left-hand side of the join.

type JsonToCsvProcess

type JsonToCsvProcess struct {
	BaseObject string
	Header     []string
}

Given a stream of JSON records, generate a stream of csv records.

The columns of the output CSV stream are named by the Header parameter. Each column is interpreted as a path into the JSON input object. If the object at that path is a string, the string is copied into the specified column of the output stream. Otherwise, the JSON encoding of the object is copied into the specified column of the output stream.

Each object in the input object which is mapped by a CSV column is logically deleted from the input object. If --base-object-key is specified, a JSON encoding of the remaining input object is written into the specified column of the CSV output stream.
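The extraction and logical deletion described above can be sketched as follows. The helpers takePath and toFields are hypothetical; the dotted path syntax and the choice to leave emptied parent objects in place are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// takePath removes and returns the value found at the dotted path, if any.
func takePath(obj map[string]interface{}, path string) (interface{}, bool) {
	parts := strings.Split(path, ".")
	for _, p := range parts[:len(parts)-1] {
		child, ok := obj[p].(map[string]interface{})
		if !ok {
			return nil, false
		}
		obj = child
	}
	last := parts[len(parts)-1]
	v, ok := obj[last]
	if ok {
		delete(obj, last)
	}
	return v, ok
}

// toFields maps one JSON object onto the columns named by header and also
// returns the JSON encoding of whatever was not consumed by a column.
func toFields(jsonText string, header []string) ([]string, string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(jsonText), &obj); err != nil {
		return nil, "", err
	}
	fields := make([]string, len(header))
	for i, h := range header {
		v, ok := takePath(obj, h)
		if !ok {
			continue // missing paths leave the column empty
		}
		if s, isString := v.(string); isString {
			fields[i] = s // strings are copied as-is
			continue
		}
		b, err := json.Marshal(v)
		if err != nil {
			return nil, "", err
		}
		fields[i] = string(b) // everything else is copied as its JSON encoding
	}
	rest, err := json.Marshal(obj)
	return fields, string(rest), err
}

func main() {
	in := `{"foo":{"bar":1,"qux":true},"baz":"hello"}`
	fields, rest, err := toFields(in, []string{"foo.bar", "baz"})
	if err != nil {
		panic(err)
	}
	fmt.Println(fields) // [1 hello]
	fmt.Println(rest)   // {"foo":{"qux":true}}
}
```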

func (*JsonToCsvProcess) Run

func (p *JsonToCsvProcess) Run(decoder *json.Decoder, builder WriterBuilder, errCh chan<- error)

type Pipe

type Pipe interface {
	Builder() WriterBuilder // Builds a Writer for the write end of the pipe
	Reader() Reader         // Returns the Reader for the read end of the pipe
}

Implements a unidirectional channel that can connect a reader process to a writer process.

func NewPipe

func NewPipe() Pipe

Answer a new Pipe whose Builder and Reader can be used to connect two chained processes.

type Process

type Process interface {
	Run(reader Reader, builder WriterBuilder, errCh chan<- error)
}

A Process is a function that can be run asynchronously that consumes a stream of CSV records provided by reader and writes them into a writer of CSV records as constructed by the specified builder. It signals its successful completion by writing a nil into the specified error channel. An unsuccessful completion is signaled by writing at most one error into the specified error channel.
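The completion convention can be sketched with plain channels. The stage type, rejectEmpty function and run helper below are hypothetical stand-ins that use strings in place of Records; they only illustrate the rule that exactly one value, nil or an error, is written to the error channel.

```go
package main

import (
	"errors"
	"fmt"
)

// stage is a hypothetical stand-in for a Process: it consumes in, produces
// out, and reports completion on errCh.
type stage func(in <-chan string, out chan<- string, errCh chan<- error)

// rejectEmpty copies records until it meets an empty one. It writes nil to
// errCh on success and a single error on failure.
func rejectEmpty(in <-chan string, out chan<- string, errCh chan<- error) {
	defer close(out)
	for s := range in {
		if s == "" {
			errCh <- errors.New("empty record")
			return
		}
		out <- s
	}
	errCh <- nil
}

// run feeds inputs to the stage asynchronously, collects its output and then
// waits for the completion signal.
func run(s stage, inputs []string) ([]string, error) {
	in := make(chan string, len(inputs)) // buffered so the feed never blocks
	for _, v := range inputs {
		in <- v
	}
	close(in)

	out := make(chan string)
	errCh := make(chan error, 1) // buffered so the stage can signal and exit
	go s(in, out, errCh)

	var got []string
	for v := range out {
		got = append(got, v)
	}
	return got, <-errCh
}

func main() {
	got, err := run(rejectEmpty, []string{"a", "b"})
	fmt.Println(got, err) // [a b] <nil>

	got, err = run(rejectEmpty, []string{"a", "", "b"})
	fmt.Println(got, err) // [a] empty record
}
```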

func NewPipeLine

func NewPipeLine(p []Process) Process

Join a sequence of processes by connecting them with pipes, returning a new process that represents the entire pipeline.

type Reader

type Reader interface {
	// Answers the header.
	Header() []string
	// Answers a channel that iterates over a sequence of Records in the stream. The channel
	// remains open until an error is encountered or until the stream is exhausted.
	C() <-chan Record
	// Answers the error that caused the stream to close, if any.
	Error() error
	// Close the reader and release any resources associated with it.
	Close()
}

Reader provides a reader of CSV streams whose first record is a header describing each field.

func WithCsvReader

func WithCsvReader(r *csv.Reader, c io.Closer) Reader

WithCsvReader creates a csv reader from the specified encoding/csv Reader.

func WithIoReader

func WithIoReader(io io.ReadCloser) Reader

WithIoReader creates a csv Reader from the specified io Reader.

func WithIoReaderAndDelimiter

func WithIoReaderAndDelimiter(io io.ReadCloser, delimiter rune) Reader

WithIoReaderAndDelimiter creates a csv Reader from the specified io Reader, using the specified delimiter.

func WithProcess

func WithProcess(r Reader, p Process) Reader

Given a reader and a process, answer a new reader which is the result of applying the specified process to the specified reader.

type Record

type Record interface {
	// Return the header of the record.
	Header() []string
	// Gets the value of the field specified by the key. Returns the empty string
	// if the field does not exist in the record.
	Get(key string) string
	// Puts the value into the field specified by the key.
	Put(key string, value string)
	// Return the contents of the record as a map. Mutation of the map is not supported.
	AsMap() map[string]string
	// Return the contents of the record as a slice. Mutation of the slice is not supported.
	AsSlice() []string
	// Puts all the matching values from the specified record into the receiving record
	PutAll(r Record)
	// Return true if the receiver and the specified record have the same header.
	SameHeader(r Record) bool
}

Record provides keyed access to the fields of data records where each field of a data record is keyed by the value of the corresponding field in the header record.

func ReadAll

func ReadAll(reader Reader) ([]Record, error)

ReadAll reads all the records from the specified reader and only returns a non-nil error if an error, other than EOF, occurs during the reading process.

type RecordBuilder

type RecordBuilder func(fields []string) Record

func NewRecordBuilder

func NewRecordBuilder(header []string) RecordBuilder

NewRecordBuilder returns a function that can be used to create new Records for a CSV stream with the specified header.

This can be used with raw encoding/csv streams in cases where a CSV stream contains more than one record type.

type RecordComparator

type RecordComparator func(l, r Record) bool

A RecordComparator is a function that returns true if the left Record is 'less' than the right Record according to some total order.

func AsRecordComparator

func AsRecordComparator(comparators []RecordComparator) RecordComparator

Constructs a single RecordComparator from a slice of RecordComparators
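One common way to combine comparators is lexicographically: the first comparator that orders the pair decides. The sketch below assumes that behaviour; the record and comparator types and the byField and chain functions are hypothetical and are not the package's own definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// record and comparator are hypothetical stand-ins for Record and
// RecordComparator.
type record map[string]string
type comparator func(l, r record) bool

// byField answers a comparator that orders records lexically by one field.
func byField(key string) comparator {
	return func(l, r record) bool { return l[key] < r[key] }
}

// chain combines comparators so that later ones only break ties left by
// earlier ones.
func chain(cs []comparator) comparator {
	return func(l, r record) bool {
		for _, c := range cs {
			if c(l, r) {
				return true
			}
			if c(r, l) {
				return false
			}
		}
		return false // equal under every comparator
	}
}

func main() {
	recs := []record{
		{"Date": "2014/12/31", "Amount": "85.0"},
		{"Date": "2014/12/30", "Amount": "90.0"},
		{"Date": "2014/12/31", "Amount": "100.0"},
	}
	less := chain([]comparator{byField("Date"), byField("Amount")})
	sort.SliceStable(recs, func(i, j int) bool { return less(recs[i], recs[j]) })
	for _, r := range recs {
		fmt.Println(r["Date"], r["Amount"])
	}
}
```

Note that byField compares lexically, so "100.0" sorts before "85.0"; a numeric string comparator would reverse that pair.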

type SelectProcess

type SelectProcess struct {
	Keys        []string
	PermuteOnly bool
}

Given a header-prefixed input stream of CSV records, select the fields that match the specified keys (Keys). If PermuteOnly is specified, all the fields of the input stream are preserved, but the output stream is permuted so that the key fields occupy the left-most fields of the output stream. The remaining fields are preserved in their original order.
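The effect on the output header can be sketched as follows. The function selectHeader is a hypothetical helper written for illustration; it computes only the output header, not the records.

```go
package main

import "fmt"

// selectHeader is a hypothetical helper that computes the output header for a
// selection. Without permuteOnly only the keys survive; with permuteOnly the
// keys come first and the remaining fields follow in their original order.
func selectHeader(header, keys []string, permuteOnly bool) []string {
	out := append([]string{}, keys...)
	if !permuteOnly {
		return out
	}
	isKey := make(map[string]bool, len(keys))
	for _, k := range keys {
		isKey[k] = true
	}
	for _, h := range header {
		if !isKey[h] {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	header := []string{"Date", "Amount", "Description"}
	keys := []string{"Description", "Date"}
	fmt.Println(selectHeader(header, keys, false)) // [Description Date]
	fmt.Println(selectHeader(header, keys, true))  // [Description Date Amount]
}
```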

func (*SelectProcess) Run

func (p *SelectProcess) Run(reader Reader, builder WriterBuilder, errCh chan<- error)

type SortComparator

type SortComparator func(i, j int) bool

A SortComparator compares two records, identified by i and j, and returns true if the ith record is less than the jth record according to some total order.

type SortKeys

type SortKeys struct {
	Keys     []string // list of columns to use for sorting
	Numeric  []string // list of columns for which a numerical string comparison is used
	Reversed []string // list of columns for which the comparison is reversed
}

Specifies the keys to be used by a CSV sort.

func (*SortKeys) AsRecordComparator

func (p *SortKeys) AsRecordComparator() RecordComparator

Answers a comparator that can compare two records.

func (*SortKeys) AsRecordComparators

func (p *SortKeys) AsRecordComparators() []RecordComparator

Answers a slice of comparators that can compare two records.

func (*SortKeys) AsSort

func (p *SortKeys) AsSort(data []Record) sort.Interface

Answer a Sort for the specified slice of CSV records, using the comparators derived from the keys specified by the receiver.

func (*SortKeys) AsSortProcess

func (p *SortKeys) AsSortProcess() *SortProcess

Derive a SortProcess from the receiver.

func (*SortKeys) AsSortable

func (p *SortKeys) AsSortable(data []Record) *Sortable

Answer a Sortable whose comparators have been initialized with string or numerical string comparators according to the specification of the receiver.

func (*SortKeys) AsStringProjection

func (p *SortKeys) AsStringProjection() StringProjection

Derive a StringProjection from the sort keys.

func (*SortKeys) AsStringSliceComparator

func (p *SortKeys) AsStringSliceComparator() StringSliceComparator

Answers a comparator that can compare two slices.

type SortProcess

type SortProcess struct {
	AsSort func(data []Record) sort.Interface
	Keys   []string
}

A process, which given a CSV reader, sorts a stream of Records using the sort specified by the result of the AsSort function. The stream is checked to verify that it has the specified keys.

func (*SortProcess) Run

func (p *SortProcess) Run(reader Reader, builder WriterBuilder, errCh chan<- error)

Run the sort process specified by the receiver against the specified CSV reader, writing the results to a Writer constructed from the specified builder. Termination of the sort process is signalled by writing nil or at most one error into the specified error channel. It is an error to apply the receiving process to a reader whose Header is not a strict superset of the receiver's Keys.

type Sortable

type Sortable struct {
	Keys        []string
	Data        []Record
	Comparators []SortComparator
}

An adapter that converts a slice of CSV records into an instance of sort.Interface using the specified comparators, in order, to compare records.

func (*Sortable) AsSortProcess

func (b *Sortable) AsSortProcess() *SortProcess

Derives a SortProcess from the receiver. Note that it isn't safe to run multiple processes derived from the same Sortable at the same time.

func (*Sortable) Comparator

func (b *Sortable) Comparator(k string, less StringComparator) SortComparator

Answer a comparator for the field named k, using the string comparator specified by less.

func (*Sortable) Len

func (b *Sortable) Len() int

An implementation of sort.Interface.Len()

func (*Sortable) Less

func (b *Sortable) Less(i, j int) bool

An implementation of sort.Interface.Less()

func (*Sortable) Swap

func (b *Sortable) Swap(i, j int)

An implementation of sort.Interface.Swap()

type StringComparator

type StringComparator func(l, r string) bool

A StringComparator is a function that returns true if the left string is 'less' than the right string according to some total order.

type StringProjection

type StringProjection func(r Record) []string

A StringProjection is a function which produces a slice of strings from a Record.

type StringSliceComparator

type StringSliceComparator func(l, r []string) bool

A StringSliceComparator is a function that returns true if the left slice is 'less' than the right slice according to some total order.

func AsStringSliceComparator

func AsStringSliceComparator(comparators []StringComparator) StringSliceComparator

type SurrogateKeysProcess

type SurrogateKeysProcess struct {
	NaturalKeys  []string
	SurrogateKey string
}

Given a header-prefixed input stream of CSV records and the specification of a natural key (NaturalKeys) generate an augmented, header-prefixed, output stream of CSV records which contains a surrogate key (SurrogateKey) that is derived from the MD5 sum of the natural key.

The surrogate key is constructed by calculating the MD5 hash of the string representation of a CSV record that contains the fields of the natural key of each record.

For example, given the following input CSV stream which has a natural key of Date,Amount,Sequence

Date,Amount,Description,Sequence
2014/12/31,100.0,Payment,
2014/12/31,100.0,Payment,1
2014/12/31,85.0,Payment,

generate an additional column, KeyMD5, containing a surrogate key that represents the natural key.

Date,Amount,Description,Sequence,KeyMD5
2014/12/31,100.0,Payment,"",bead7c34cf0828efb8a240e262e7afea
2014/12/31,100.0,Payment,1,cc8ab528163236eb1aa4004202ee1935
2014/12/31,85.0,Payment,"",8f4d3a8a05031256a4fa4cf1fadd757b
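The derivation can be sketched with crypto/md5 and encoding/csv. The function surrogateKey is a hypothetical helper; the exact bytes that the package hashes (for example, whether the line terminator is included) are not stated here, so this sketch is not expected to reproduce the KeyMD5 values shown above.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/csv"
	"encoding/hex"
	"fmt"
)

// surrogateKey is a hypothetical helper that encodes the natural key fields
// as one CSV record and answers the hex MD5 sum of that encoding.
// Assumption: the trailing newline written by encoding/csv is hashed too.
func surrogateKey(naturalKey []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(naturalKey); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	sum := md5.Sum(buf.Bytes())
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Natural key fields in the order Date, Amount, Sequence.
	for _, key := range [][]string{
		{"2014/12/31", "100.0", ""},
		{"2014/12/31", "100.0", "1"},
		{"2014/12/31", "85.0", ""},
	} {
		k, err := surrogateKey(key)
		if err != nil {
			panic(err)
		}
		fmt.Println(k)
	}
}
```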

func (*SurrogateKeysProcess) Run

func (p *SurrogateKeysProcess) Run(reader Reader, builder WriterBuilder, errCh chan<- error)

type UniquifyProcess

type UniquifyProcess struct {
	PartialKeys   []string
	AdditionalKey string
}

Given a header-prefixed input stream of CSV records and the specification of a partial key (PartialKeys) formed from one or more of the fields, generate an augmented, header-prefixed stream of CSV records such that the augmented key of each output record is unique. The field used to ensure uniqueness is specified by the AdditionalKey option.

For example, given the following input with the partial key Date,Amount

Date,Amount,Description
2014/12/31,100.0,Payment
2014/12/31,100.0,Payment
2014/12/31,85.0,Payment

Generate an additional column, Sequence, such that the augmented key Date,Amount,Sequence is unique for all input records.

Date,Amount,Description,Sequence
2014/12/31,100.0,Payment,
2014/12/31,100.0,Payment,1
2014/12/31,85.0,Payment,
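The numbering in the example can be sketched by counting earlier occurrences of each partial key. The function sequence is a hypothetical helper; that a third duplicate would receive 2, and that duplicates need not be adjacent, are assumptions that go beyond the example above.

```go
package main

import (
	"fmt"
	"strconv"
)

// sequence is a hypothetical helper that answers, for each partial key, the
// value of the additional key column: empty for the first occurrence, then
// 1, 2, ... for each repeat.
func sequence(partialKeys []string) []string {
	seen := map[string]int{}
	out := make([]string, len(partialKeys))
	for i, k := range partialKeys {
		if n := seen[k]; n > 0 {
			out[i] = strconv.Itoa(n)
		}
		seen[k]++
	}
	return out
}

func main() {
	keys := []string{
		"2014/12/31,100.0",
		"2014/12/31,100.0",
		"2014/12/31,85.0",
	}
	fmt.Printf("%q\n", sequence(keys)) // ["" "1" ""]
}
```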

func (*UniquifyProcess) Run

func (p *UniquifyProcess) Run(reader Reader, builder WriterBuilder, errCh chan<- error)

type UseTabProcess

type UseTabProcess struct {
	OnRead bool
}

Merely copies records from input to output; the delimiter munging is done by the tool.

func (*UseTabProcess) Run

func (p *UseTabProcess) Run(reader Reader, builder WriterBuilder, errCh chan<- error)

type Writer

type Writer interface {
	Header() []string      // Answer the header of the stream.
	Blank() Record         // Provide a blank record compatible with the stream.
	Write(r Record) error  // Write a single record into the underlying stream.
	Error() error          // Return the final error.
	Close(err error) error // Close the writer with the specified error.
}

type WriterBuilder

type WriterBuilder func(header []string) Writer

A constructor for a writer using the specified header. By convention, passing a nil to the Builder returns a Writer which will release any underlying resources held by the builder when Close(error) is called.

func WithCsvWriter

func WithCsvWriter(w *encoding.Writer, c io.Closer) WriterBuilder

Answer a Writer for the CSV stream constrained by the specified header, using the specified encoding writer.

func WithIoWriter

func WithIoWriter(w io.WriteCloser) WriterBuilder

Answer a Writer for the CSV stream constrained by the specified header, using the specified io writer.

func WithIoWriterAndDelimiter

func WithIoWriterAndDelimiter(w io.WriteCloser, delimiter rune) WriterBuilder

Answer a Writer for the CSV stream constrained by the specified header, using the specified io writer and delimiter.

Directories

Path Synopsis
cmd
Some additional utilities that are useful when processing CSV headers and data.
