tsv

package

v0.0.11 (latest) | Published: Mar 7, 2024 | License: Apache-2.0 | Imports: 10 | Imported by: 5

Repository

github.com/grailbio/base

Documentation ¶

Overview ¶

Package tsv provides a simple TSV writer which takes care of number->string conversions and tabs, and is far more performant than fmt.Fprintf (thanks to use of strconv.Append{Uint,Float}).

Usage is similar to bufio.Writer, except that in place of the usual Write() method, there are typed WriteString(), WriteUint32(), etc. methods which append one field at a time to the current line, and an EndLine() method to finish the line.
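As an illustration of the technique described above, here is a minimal stdlib-only sketch of a field appender. Each value is formatted straight into a byte slice with strconv.Append* (no format-string parsing or intermediate strings, unlike fmt.Fprintf) and followed by a tab, and finishing a line overwrites the trailing tab with a newline. The type and method names (lineBuf, appendString, and so on) are hypothetical and are not the tsv.Writer API. The tab-to-newline step is an assumption, consistent with the WritePartial* notes below that EndLine clobbers the last character.

```go
package main

import (
	"os"
	"strconv"
)

// lineBuf is a hypothetical field appender illustrating the strconv.Append*
// technique; it is not the tsv.Writer API.
type lineBuf struct {
	buf []byte
}

// appendString appends s followed by the field-separating tab.
func (l *lineBuf) appendString(s string) {
	l.buf = append(l.buf, s...)
	l.buf = append(l.buf, '\t')
}

// appendUint32 formats ui in base 10 directly into the buffer, without
// building an intermediate string.
func (l *lineBuf) appendUint32(ui uint32) {
	l.buf = strconv.AppendUint(l.buf, uint64(ui), 10)
	l.buf = append(l.buf, '\t')
}

// appendFloat64 takes the same format/precision arguments as strconv.AppendFloat.
func (l *lineBuf) appendFloat64(f float64, format byte, prec int) {
	l.buf = strconv.AppendFloat(l.buf, f, format, prec, 64)
	l.buf = append(l.buf, '\t')
}

// endLine replaces the trailing tab with a newline; at least one field must
// have been appended first.
func (l *lineBuf) endLine() {
	l.buf[len(l.buf)-1] = '\n'
}

func main() {
	var l lineBuf
	l.appendString("chr1")
	l.appendUint32(1000)
	l.appendFloat64(0.5, 'f', 2)
	l.endLine()
	os.Stdout.Write(l.buf) // chr1<TAB>1000<TAB>0.50<NEWLINE>
}
```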

Index ¶

Constants
type Reader
- func NewReader(in io.Reader) *Reader
- func (r *Reader) Read(v interface{}) error
type RowWriter
- func NewRowWriter(w io.Writer) *RowWriter
- func (w *RowWriter) Flush() error
- func (w *RowWriter) Write(v interface{}) error
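type Writer
- func NewWriter(w io.Writer) (tw *Writer)
- func (w *Writer) Copy(r io.Reader) error
- func (w *Writer) EndCsv()
- func (w *Writer) EndLine() (err error)
- func (w *Writer) Flush() error
- func (w *Writer) WriteByte(b byte)
- func (w *Writer) WriteBytes(s []byte)
- func (w *Writer) WriteCsvByte(b byte)
- func (w *Writer) WriteCsvUint32(ui uint32)
- func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)
- func (w *Writer) WriteInt64(i int64)
- func (w *Writer) WritePartialByte(b byte)
- func (w *Writer) WritePartialBytes(s []byte)
- func (w *Writer) WritePartialString(s string)
- func (w *Writer) WritePartialUint32(ui uint32)
- func (w *Writer) WriteString(s string)
- func (w *Writer) WriteUint32(ui uint32)
- func (w *Writer) WriteUint64(ui uint64)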

Constants ¶

const EmptyReadErrStr = "empty file: could not read the header row"

EmptyReadErrStr is the error string returned by Read() when the file is empty but at least a header row was expected.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Reader ¶

type Reader struct {
	*csv.Reader

	// HasHeaderRow should be set to true to indicate that the input contains a
	// single header row that lists column names of the rows that follow.  It must
	// be set before reading any data.
	HasHeaderRow bool

	// UseHeaderNames causes the reader to set struct fields by matching column
	// names to struct field names (or `tsv` tag). It must be set before reading
	// any data.
	//
	// If not set, struct fields are filled in order, EVEN IF HasHeaderRow=true.
	// If set, all struct fields must have a corresponding column in the file or
	// IgnoreMissingColumns must also be set. An error will be reported through
	// Read().
	//
	// REQUIRES: HasHeaderRow=true
	UseHeaderNames bool

	// RequireParseAllColumns causes Read() to report an error if there are columns
	// not listed in the passed-in struct. It must be set before reading any data.
	//
	// REQUIRES: HasHeaderRow=true
	RequireParseAllColumns bool

	// IgnoreMissingColumns causes the reader to ignore any struct fields that are
	// not present as columns in the file. It must be set before reading any
	// data.
	//
	// REQUIRES: HasHeaderRow=true AND UseHeaderNames=true
	IgnoreMissingColumns bool
	// contains filtered or unexported fields
}

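Reader reads a TSV file. It wraps the standard csv.Reader and can parse row contents directly into a Go struct. Reader is thread-compatible: a single Reader must not be used concurrently from multiple goroutines without external synchronization.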

TODO(saito) Support passing a custom bool parser.

TODO(saito) Support a custom "NA" detector.
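Since Reader wraps csv.Reader, the underlying row splitting can be approximated with the standard library alone. This sketch configures encoding/csv for tab-separated input; newTabReader and readAll are hypothetical names, and the package's actual csv.Reader settings (quote handling, field-count checks) are assumptions. Note that csv.Reader still treats a leading double quote as the start of a quoted field, which may matter for raw TSV data.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// newTabReader configures the standard csv.Reader for tab-separated input.
// The tsv package's own settings are not documented here; these are assumptions.
func newTabReader(in io.Reader) *csv.Reader {
	r := csv.NewReader(in)
	r.Comma = '\t'        // split on tabs instead of commas
	r.FieldsPerRecord = 0 // 0: every row must have as many fields as the first
	return r
}

// readAll reads every row until io.EOF.
func readAll(in string) ([][]string, error) {
	r := newTabReader(strings.NewReader(in))
	var rows [][]string
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, rec)
	}
}

func main() {
	rows, err := readAll("Key\tCol0\nkey0\t0\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(rows) // [[Key Col0] [key0 0]]
}
```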

Example ¶

package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		Key  string
		Col0 uint
		Col1 float64
	}

	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	r := tsv.NewReader(bytes.NewReader([]byte(`Key	Col0	Col1
key0	0	0.5
key1	1	1.5
`)))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:

{Key:key0 Col0:0 Col1:0.5}
{Key:key1 Col0:1 Col1:1.5}

Example (WithTag) ¶

package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		ColA    string  `tsv:"key"`
		ColB    float64 `tsv:"col1"`
		Skipped int     `tsv:"-"`
		ColC    int     `tsv:"col0,fmt=d"`
		Hex     int     `tsv:",fmt=x"`
		Hyphen  int     `tsv:"-,"`
	}
	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	r := tsv.NewReader(bytes.NewReader([]byte(`key	col0	col1	Hex	-
key0	0	0.5	a	1
key1	1	1.5	f	2
`)))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:

{ColA:key0 ColB:0.5 Skipped:0 ColC:0 Hex:10 Hyphen:1}
{ColA:key1 ColB:1.5 Skipped:0 ColC:1 Hex:15 Hyphen:2}

func NewReader ¶

func NewReader(in io.Reader) *Reader

NewReader creates a new TSV reader that reads from the given input.

func (*Reader) Read ¶

func (r *Reader) Read(v interface{}) error

Read reads the next TSV row into a Go struct. The argument must be a pointer to a struct. Each column in the row is parsed into the matching struct field.

Example:

	r := tsv.NewReader(...)
	...
	type row struct {
		Col0  string
		Col1  int
		Float int
	}
	var v row
	err := r.Read(&v)

If !Reader.HasHeaderRow or !Reader.UseHeaderNames, the N-th column (zero-based) is parsed into the N-th field of the struct.

If Reader.HasHeaderRow and Reader.UseHeaderNames are both set, each struct field's name must match one of the column names listed in the first row of the TSV input (unless Reader.IgnoreMissingColumns is set). The contents of the column with the matching name are parsed into the struct field.

By default, the column name is the struct field's name, but you can override it with a `tsv:"columnname"` tag on the field. The tag may also take an fmt option that specifies how to parse the value using the fmt package, which is useful for numbers written in a different base. Not all verbs are supported by the scanning functions in the fmt package, and using the fmt option may be slower. Consider the following row type:

	type row struct {
		Chr    string `tsv:"chromo"`
		Start  int    `tsv:"pos"`
		Length int
		Score  int    `tsv:"score,fmt=x"`
	}

and the following TSV file:

| chromo | Length | pos | score
| chr1   | 1000   | 10  | 0a
| chr2   | 950    | 20  | ff

The first Read() will return row{"chr1", 10, 1000, 10}.

The second Read() will return row{"chr2", 20, 950, 255}, since ff in hexadecimal is 255.
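The fmt option in this example can be approximated by turning it into a scan verb for fmt.Sscanf, which reproduces the Score values above. Whether the package calls fmt.Sscanf or another fmt scanning function is an assumption, and parseWithVerb is a hypothetical helper.

```go
package main

import "fmt"

// parseWithVerb parses s as an int using the verb from a `fmt=` tag option,
// e.g. verb "x" for hexadecimal. Hypothetical helper, not the package's code.
func parseWithVerb(s, verb string) (int, error) {
	var v int
	_, err := fmt.Sscanf(s, "%"+verb, &v)
	return v, err
}

func main() {
	for _, s := range []string{"0a", "ff"} {
		v, err := parseWithVerb(s, "x")
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", v)
	}
	// Prints:
	// 0a -> 10
	// ff -> 255
}
```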

Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.

type RowWriter ¶ added in v0.0.2

type RowWriter struct {
	// contains filtered or unexported fields
}

RowWriter writes structs to TSV files using field names or "tsv" tags as TSV column headers.

TODO: Consider letting the caller filter or reorder columns.

Example ¶

package main

import (
	"bytes"
	"fmt"

	"github.com/grailbio/base/tsv"
)

func main() {
	type rowTyp struct {
		Foo float64 `tsv:"foo,fmt=.2f"`
		Bar float64 `tsv:"bar,fmt=.3f"`
		Baz float64
	}
	rows := []rowTyp{
		{Foo: 0.1234, Bar: 0.4567, Baz: 0.9876},
		{Foo: 1.1234, Bar: 1.4567, Baz: 1.9876},
	}
	var buf bytes.Buffer
	w := tsv.NewRowWriter(&buf)
	for i := range rows {
		if err := w.Write(&rows[i]); err != nil {
			panic(err)
		}
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
	fmt.Print(buf.String())
}

Output:

foo	bar	Baz
0.12	0.457	0.9876
1.12	1.457	1.9876

func NewRowWriter ¶ added in v0.0.2

func NewRowWriter(w io.Writer) *RowWriter

NewRowWriter constructs a RowWriter that writes to w.

The caller must call Flush() after the last Write().

func (*RowWriter) Flush ¶ added in v0.0.2

func (w *RowWriter) Flush() error

Flush flushes all previously-written rows.

func (*RowWriter) Write ¶ added in v0.0.2

func (w *RowWriter) Write(v interface{}) error

Write writes a TSV row containing the values of v's exported fields. v must be a pointer to a struct.

On the first Write, a TSV header row is written based on v's type. Subsequent calls to Write may pass a v of a different type, but the column ordering is not guaranteed to be consistent across types.

By default, the column name is the struct field's name, but you can override it with a `tsv:"columnname"` tag on the field.

You can optionally specify an fmt option in the tag to control how the value is formatted using the fmt package. Note that Reader may not support all of these verbs when parsing the output back. Without the fmt option, each type uses a preset format. Using the fmt option may be slower.
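The fmt option maps naturally onto fmt.Sprintf, as this sketch shows; the three float values reproduce the RowWriter example output above. formatField is a hypothetical helper, and its per-type defaults (strconv.FormatFloat with 'g' and precision -1, strconv.Itoa) are assumptions rather than the package's documented presets. Routing each value through fmt.Sprintf and an interface{} argument is one plausible reason the fmt option is slower than specialized strconv calls.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatField renders v for a TSV cell. A `fmt=` option in the tag (e.g.
// "foo,fmt=.2f") becomes a fmt verb ("%.2f"); otherwise an assumed per-type
// default applies.
func formatField(tag string, v interface{}) string {
	for _, opt := range strings.Split(tag, ",")[1:] {
		if strings.HasPrefix(opt, "fmt=") {
			return fmt.Sprintf("%"+strings.TrimPrefix(opt, "fmt="), v)
		}
	}
	switch x := v.(type) {
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64) // shortest round-trip form
	case int:
		return strconv.Itoa(x)
	case string:
		return x
	default:
		return fmt.Sprint(v)
	}
}

func main() {
	fmt.Println(formatField("foo,fmt=.2f", 0.1234)) // 0.12
	fmt.Println(formatField("bar,fmt=.3f", 0.4567)) // 0.457
	fmt.Println(formatField("", 0.9876))            // 0.9876
	fmt.Println(formatField("n,fmt=x", 255))        // ff
}
```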

Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.

type Writer ¶

type Writer struct {
	// contains filtered or unexported fields
}

Writer provides an efficient and concise way to append one field at a time to a TSV file. Note, however, that it does NOT have a Write() method; the interface is deliberately restricted.

The struct is padded to fill at least one cache line to prevent false sharing when make([]Writer, parallelism) is used.

func NewWriter ¶

func NewWriter(w io.Writer) (tw *Writer)

NewWriter creates a new tsv.Writer from an io.Writer.

func (*Writer) Copy ¶

func (w *Writer) Copy(r io.Reader) error

Copy appends the entire contents of the given io.Reader (assumed to be another TSV file).

func (*Writer) EndCsv ¶

func (w *Writer) EndCsv()

EndCsv finishes the current comma-separated field, converting its trailing comma to a tab. The comma-separated field must be nonempty.

func (*Writer) EndLine ¶

func (w *Writer) EndLine() (err error)

EndLine finishes the current line, which must be nonempty.

func (*Writer) Flush ¶

func (w *Writer) Flush() error

Flush flushes all finished lines.

func (*Writer) WriteByte ¶

func (w *Writer) WriteByte(b byte)

WriteByte appends the given literal byte (no number->string conversion) and a tab to the current line.

func (*Writer) WriteBytes ¶

func (w *Writer) WriteBytes(s []byte)

WriteBytes appends the given []byte and a tab to the current line.

func (*Writer) WriteCsvByte ¶

func (w *Writer) WriteCsvByte(b byte)

WriteCsvByte appends the given literal byte (no number->string conversion) and a comma to the current line.

func (*Writer) WriteCsvUint32 ¶

func (w *Writer) WriteCsvUint32(ui uint32)

WriteCsvUint32 converts the given uint32 to a string, and appends that and a comma to the current line.
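The comma-separated variants can be illustrated with the same append-then-patch idea: each value is followed by a comma, and ending the sub-list turns the final comma into the field's tab. subfieldBuf and its methods are hypothetical stand-ins for WriteString, WriteCsvUint32, EndCsv, and EndLine, sketched from the descriptions above rather than from the package source.

```go
package main

import (
	"fmt"
	"strconv"
)

// subfieldBuf is a hypothetical buffer showing comma-separated values inside
// one tab-delimited field.
type subfieldBuf struct{ buf []byte }

// writeString appends a whole field followed by a tab.
func (s *subfieldBuf) writeString(str string) {
	s.buf = append(append(s.buf, str...), '\t')
}

// writeCsvUint32 appends the decimal value followed by a comma.
func (s *subfieldBuf) writeCsvUint32(ui uint32) {
	s.buf = append(strconv.AppendUint(s.buf, uint64(ui), 10), ',')
}

// endCsv turns the trailing comma into the field's tab; it must follow at
// least one writeCsvUint32 call (the "must be nonempty" rule).
func (s *subfieldBuf) endCsv() { s.buf[len(s.buf)-1] = '\t' }

// endLine turns the trailing tab into a newline.
func (s *subfieldBuf) endLine() { s.buf[len(s.buf)-1] = '\n' }

func main() {
	var s subfieldBuf
	s.writeString("gene1")
	for _, pos := range []uint32{10, 20, 30} {
		s.writeCsvUint32(pos)
	}
	s.endCsv()
	s.writeString("ok")
	s.endLine()
	fmt.Printf("%q\n", s.buf) // "gene1\t10,20,30\tok\n"
}
```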

func (*Writer) WriteFloat64 ¶

func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)

WriteFloat64 converts the given float64 to a string with the given strconv.AppendFloat parameters, and appends that and a tab to the current line.

func (*Writer) WriteInt64 ¶

func (w *Writer) WriteInt64(i int64)

WriteInt64 converts the given int64 to a string, and appends that and a tab to the current line.

func (*Writer) WritePartialByte ¶ added in v0.0.11

func (w *Writer) WritePartialByte(b byte)

WritePartialByte appends the given literal byte (no number->string conversion) WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialBytes ¶

func (w *Writer) WritePartialBytes(s []byte)

WritePartialBytes appends a []byte WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialString ¶

func (w *Writer) WritePartialString(s string)

WritePartialString appends a string WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialUint32 ¶

func (w *Writer) WritePartialUint32(ui uint32)

WritePartialUint32 converts the given uint32 to a string, and appends that WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
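Why a partial write must be followed by a regular write can be shown with a toy buffer whose endLine overwrites the last byte, as the notes above describe. The line type and its methods are hypothetical and only model that documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// line is a hypothetical buffer modeling partial (tab-less) writes.
type line struct{ buf []byte }

// writePartialString appends s WITHOUT a terminating tab.
func (l *line) writePartialString(s string) { l.buf = append(l.buf, s...) }

// writeUint32 appends the decimal value and the field-terminating tab.
func (l *line) writeUint32(ui uint32) {
	l.buf = append(strconv.AppendUint(l.buf, uint64(ui), 10), '\t')
}

// endLine assumes the last byte is a tab and replaces it with a newline.
func (l *line) endLine() { l.buf[len(l.buf)-1] = '\n' }

func main() {
	var good line
	good.writePartialString("chr") // no tab yet
	good.writeUint32(1)            // completes the field as "chr1\t"
	good.endLine()
	fmt.Printf("%q\n", good.buf) // "chr1\n"

	var bad line
	bad.writePartialString("chr1") // field never terminated
	bad.endLine()                  // overwrites the '1'
	fmt.Printf("%q\n", bad.buf)    // "chr\n"
}
```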

func (*Writer) WriteString ¶

func (w *Writer) WriteString(s string)

WriteString appends the given string and a tab to the current line. (It is safe to use this to write multiple fields at once by passing a string that already contains tab separators.)

func (*Writer) WriteUint32 ¶

func (w *Writer) WriteUint32(ui uint32)

WriteUint32 converts the given uint32 to a string, and appends that and a tab to the current line.

func (*Writer) WriteUint64 ¶ added in v0.0.2

func (w *Writer) WriteUint64(ui uint64)

WriteUint64 converts the given uint64 to a string, and appends that and a tab to the current line.
