
package tsv

import "github.com/grailbio/base/tsv"

Package tsv provides a simple TSV writer which takes care of number->string conversions and tabs, and is far more performant than fmt.Fprintf (thanks to its use of strconv.Append{Uint,Float}).

Usage is similar to bufio.Writer, except that in place of the usual Write() method, there are typed WriteString(), WriteUint32(), etc. methods which append one field at a time to the current line, and an EndLine() method to finish the line.

Package Files

doc.go reader.go row_writer.go writer.go

type Reader

type Reader struct {
    *csv.Reader

    // HasHeaderRow should be set to true to indicate that the input contains a
    // single header row that lists column names of the rows that follow.  It must
    // be set before reading any data.
    HasHeaderRow bool

    // UseHeaderNames causes the reader to set struct fields by matching column
    // names to struct field names (or `tsv` tag). It must be set before reading
    // any data.
    //
    // If not set, struct fields are filled in order, EVEN IF HasHeaderRow=true.
    // If set, all struct fields must have a corresponding column in the file.
    // An error will be reported through Read().
    //
    // REQUIRES: HasHeaderRow=true
    UseHeaderNames bool

    // RequireParseAllColumns causes Read() to report an error if there are columns
    // not listed in the passed-in struct. It must be set before reading any data.
    //
    // REQUIRES: HasHeaderRow=true
    RequireParseAllColumns bool
    // contains filtered or unexported fields
}

Reader reads a TSV file. It wraps the standard csv.Reader and allows parsing row contents directly into a Go struct. Reader is thread-compatible.

TODO(saito) Support passing a custom bool parser.

TODO(saito) Support a custom "NA" detector.
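Since Reader embeds *csv.Reader, tab-separated parsing presumably comes from configuring the standard encoding/csv reader with a tab delimiter. The sketch below shows that configuration using only the standard library; whether NewReader sets exactly these options (or others, such as quoting behavior) is an assumption, not something this page states. `readTSV` is a hypothetical helper.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTSV parses tab-separated input with encoding/csv by switching the
// field delimiter from the default ',' to '\t'.
func readTSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	return r.ReadAll()
}

func main() {
	records, err := readTSV("Key\tCol0\tCol1\nkey0\t0\t0.5\nkey1\t1\t1.5\n")
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		fmt.Println(rec)
	}
	// Output:
	// [Key Col0 Col1]
	// [key0 0 0.5]
	// [key1 1 1.5]
}
```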

Code:

type row struct {
    Key  string
    Col0 uint
    Col1 float64
}

readRow := func(r *tsv.Reader) row {
    var v row
    if err := r.Read(&v); err != nil {
        panic(err)
    }
    return v
}

r := tsv.NewReader(bytes.NewReader([]byte(`Key	Col0	Col1
key0	0	0.5
key1	1	1.5
`)))
r.HasHeaderRow = true
r.UseHeaderNames = true
fmt.Printf("%+v\n", readRow(r))
fmt.Printf("%+v\n", readRow(r))

var v row
if err := r.Read(&v); err != io.EOF {
    panic(err)
}

Output:

{Key:key0 Col0:0 Col1:0.5}
{Key:key1 Col0:1 Col1:1.5}

Code:

type row struct {
    ColA    string  `tsv:"key"`
    ColB    float64 `tsv:"col1"`
    Skipped int     `tsv:"-"`
    ColC    int     `tsv:"col0"`
}
readRow := func(r *tsv.Reader) row {
    var v row
    if err := r.Read(&v); err != nil {
        panic(err)
    }
    return v
}

r := tsv.NewReader(bytes.NewReader([]byte(`key	col0	col1
key0	0	0.5
key1	1	1.5
`)))
r.HasHeaderRow = true
r.UseHeaderNames = true
fmt.Printf("%+v\n", readRow(r))
fmt.Printf("%+v\n", readRow(r))

var v row
if err := r.Read(&v); err != io.EOF {
    panic(err)
}

Output:

{ColA:key0 ColB:0.5 Skipped:0 ColC:0}
{ColA:key1 ColB:1.5 Skipped:0 ColC:1}

func NewReader

func NewReader(in io.Reader) *Reader

NewReader creates a new TSV reader that reads from the given input.

func (*Reader) Read

func (r *Reader) Read(v interface{}) error

Read reads the next TSV row into a Go struct. The argument must be a pointer to a struct. It parses each column in the row into the matching struct field. Once all rows have been read, it returns io.EOF, as the examples above show.

Example:

r := tsv.NewReader(...)
...
type row struct {
    Col0  string
    Col1  int
    Float int
}
var v row
err := r.Read(&v)

- If !Reader.HasHeaderRow or !Reader.UseHeaderNames, the N-th column (base zero) will be parsed into the N-th field in the struct.

- If Reader.HasHeaderRow and Reader.UseHeaderNames, then the struct's field name must match one of the column names listed in the first row in the TSV input. The contents of the column with the matching name will be parsed into the struct field. By default, the column name is the struct's field name, but you can override it by setting a `tsv:"columnname"` tag on the field. Imagine the following row type:

type row struct {
    Chr    string `tsv:"chromo"`
    Start  int    `tsv:"pos"`
    Length int
}

and the following TSV file:

| chromo | length | pos
| chr1   | 1000   | 10
| chr2   | 950    | 20

The first Read() will return row{"chr1", 10, 1000}.
The second Read() will return row{"chr2", 20, 950}.

type RowWriter

type RowWriter struct {
    // contains filtered or unexported fields
}

RowWriter writes structs to TSV files using field names or "tsv" tags as TSV column headers.

TODO: Consider letting the caller filter or reorder columns.

func NewRowWriter

func NewRowWriter(w io.Writer) *RowWriter

NewRowWriter constructs a RowWriter that writes to w.

The caller must call Flush() after the last Write().

func (*RowWriter) Flush

func (w *RowWriter) Flush() error

Flush flushes all previously-written rows.

func (*RowWriter) Write

func (w *RowWriter) Write(v interface{}) error

Write writes a TSV row containing the values of v's exported fields. v must be a pointer to a struct.

On the first Write(), a TSV header row is written using v's type. Subsequent calls to Write() may pass a v of a different type, but no guarantees are made about consistent column ordering across types.

type Writer

type Writer struct {
    // contains filtered or unexported fields
}

Writer provides an efficient and concise way to append a field at a time to a TSV. However, note that it does NOT have a Write() method; the interface is deliberately restricted.

We force this to fill at least one cacheline to prevent false sharing when make([]Writer, parallelism) is used.

func NewWriter

func NewWriter(w io.Writer) (tw *Writer)

NewWriter creates a new tsv.Writer from an io.Writer.

func (*Writer) Copy

func (w *Writer) Copy(r io.Reader) error

Copy appends the entire contents of the given io.Reader (assumed to be another TSV file).

func (*Writer) EndCsv

func (w *Writer) EndCsv()

EndCsv finishes the current comma-separated field, converting the last comma to a tab. The field must be nonempty.
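A combined field such as `3,7,12` can be built by appending each value followed by a comma and then overwriting the final comma with a tab, which is what the EndCsv description suggests. The helpers below are hypothetical byte-slice analogues of WriteCsvUint32 and EndCsv, not the package's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendCsvUint32 appends ui and a trailing comma, mirroring WriteCsvUint32.
func appendCsvUint32(line []byte, ui uint32) []byte {
	line = strconv.AppendUint(line, uint64(ui), 10)
	return append(line, ',')
}

// endCsv turns the last comma into a tab, closing the comma-separated field.
// The field must be nonempty, i.e. line must currently end with a comma.
func endCsv(line []byte) []byte {
	line[len(line)-1] = '\t'
	return line
}

func main() {
	line := []byte("chr1\t") // a preceding ordinary field
	for _, v := range []uint32{3, 7, 12} {
		line = appendCsvUint32(line, v)
	}
	line = endCsv(line)
	fmt.Printf("%q\n", line) // "chr1\t3,7,12\t"
}
```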

func (*Writer) EndLine

func (w *Writer) EndLine() (err error)

EndLine finishes the current line, which must be nonempty.

func (*Writer) Flush

func (w *Writer) Flush() error

Flush flushes all finished lines.

func (*Writer) WriteByte

func (w *Writer) WriteByte(b byte)

WriteByte appends the given literal byte (no number->string conversion) and a tab to the current line.

func (*Writer) WriteBytes

func (w *Writer) WriteBytes(s []byte)

WriteBytes appends the given []byte and a tab to the current line.

func (*Writer) WriteCsvByte

func (w *Writer) WriteCsvByte(b byte)

WriteCsvByte appends the given literal byte (no number->string conversion) and a comma to the current line.

func (*Writer) WriteCsvUint32

func (w *Writer) WriteCsvUint32(ui uint32)

WriteCsvUint32 converts the given uint32 to a string, and appends that and a comma to the current line.

func (*Writer) WriteFloat64

func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)

WriteFloat64 converts the given float64 to a string with the given strconv.AppendFloat parameters, and appends that and a tab to the current line.
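Because the `fmt` and `prec` arguments are handed to strconv.AppendFloat, the standard library's behavior shows what common combinations produce. `floatField` is a hypothetical helper that mimics the documented append-then-tab behavior on a plain byte slice; the bitSize of 64 is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// floatField formats f with strconv.AppendFloat (bitSize 64 assumed) and
// appends a tab, mirroring WriteFloat64's documented behavior.
func floatField(line []byte, f float64, format byte, prec int) []byte {
	line = strconv.AppendFloat(line, f, format, prec, 64)
	return append(line, '\t')
}

func main() {
	fmt.Printf("%q\n", floatField(nil, 0.5, 'g', -1))    // "0.5\t": shortest form that round-trips
	fmt.Printf("%q\n", floatField(nil, 3.14159, 'f', 2)) // "3.14\t": two digits after the point
	fmt.Printf("%q\n", floatField(nil, 1500, 'e', 2))    // "1.50e+03\t": exponent notation
}
```

A precision of -1 asks for the fewest digits needed to reproduce the value exactly, which is usually the right default for data files that will be parsed back.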

func (*Writer) WriteInt64

func (w *Writer) WriteInt64(i int64)

WriteInt64 converts the given int64 to a string, and appends that and a tab to the current line.

func (*Writer) WritePartialBytes

func (w *Writer) WritePartialBytes(s []byte)

WritePartialBytes appends a []byte WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialString

func (w *Writer) WritePartialString(s string)

WritePartialString appends a string WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialUint32

func (w *Writer) WritePartialUint32(ui uint32)

WritePartialUint32 converts the given uint32 to a string, and appends that WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
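The clobbering warning follows from how EndLine presumably works: it overwrites the line's last byte, expected to be a field-terminating tab, with a newline. The hypothetical byte-slice helpers below show a correct sequence that builds `chr1:100` as one field, and what happens when a partial write is the last thing before EndLine.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical byte-slice analogues of the Writer methods.
func writePartialString(line []byte, s string) []byte { return append(line, s...) }

func writePartialUint32(line []byte, ui uint32) []byte {
	return strconv.AppendUint(line, uint64(ui), 10)
}

func writeString(line []byte, s string) []byte { return append(append(line, s...), '\t') }

// endLine overwrites the last byte, assumed to be a tab, with a newline.
func endLine(line []byte) []byte {
	line[len(line)-1] = '\n'
	return line
}

func main() {
	// Correct: the partial writes are closed by a non-partial write.
	ok := writePartialString(nil, "chr")
	ok = writePartialUint32(ok, 1)
	ok = writeString(ok, ":100") // ends the field "chr1:100" with a tab
	ok = writeString(ok, "+")
	fmt.Printf("%q\n", endLine(ok)) // "chr1:100\t+\n"

	// Incorrect: the line ends on a partial write, so endLine clobbers the '1'.
	bad := writePartialString(nil, "chr")
	bad = writePartialUint32(bad, 1)
	fmt.Printf("%q\n", endLine(bad)) // "chr\n"
}
```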

func (*Writer) WriteString

func (w *Writer) WriteString(s string)

WriteString appends the given string and a tab to the current line. (It is safe to use this to write multiple fields at a time.)

func (*Writer) WriteUint32

func (w *Writer) WriteUint32(ui uint32)

WriteUint32 converts the given uint32 to a string, and appends that and a tab to the current line.

func (*Writer) WriteUint64

func (w *Writer) WriteUint64(ui uint64)

WriteUint64 converts the given uint64 to a string, and appends that and a tab to the current line.

Package tsv imports 9 packages and is imported by 4 packages. Updated 2019-12-24.