bq

package
v2.4.3+incompatible
Published: Aug 5, 2020 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Overview

Package bq includes all code related to BigQuery.

NB: NOTES ON MEMORY USE AND HTTP SIZE. The bigquery library uses JSON encoding of data, which appears to be the only option at this time. Furthermore, it uses intermediate data representations, eventually creating a map[string]Value (unless you pass one in to begin with). In general, when we start pumping large volumes of data, both the map and the JSON will cause some memory pressure, and will likely impose fairly severe limits on the size of each insert, on the order of a couple MB of actual row footprint in BQ.

Passing in a slice of structs makes memory pressure a bit worse, but probably isn't worth worrying about.
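Given the payload limit described above, one practical mitigation is to split a large row slice into smaller batches before inserting. A minimal sketch (the chunkRows helper and batch size are illustrative, not part of this package):

```go
package main

import "fmt"

// chunkRows splits rows into batches of at most batchSize entries, so that
// each streaming insert stays well under BigQuery's payload limit. The right
// batchSize depends on your actual per-row footprint after JSON encoding.
func chunkRows(rows []interface{}, batchSize int) [][]interface{} {
	var batches [][]interface{}
	for len(rows) > batchSize {
		batches = append(batches, rows[:batchSize])
		rows = rows[batchSize:]
	}
	if len(rows) > 0 {
		batches = append(batches, rows)
	}
	return batches
}

func main() {
	rows := make([]interface{}, 10)
	for _, b := range chunkRows(rows, 4) {
		fmt.Println(len(b)) // 4, 4, 2
	}
}
```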

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetClient

func GetClient(project string) (*bigquery.Client, error)

GetClient returns an appropriate bigquery client.

func NewBQInserter

func NewBQInserter(params etl.InserterParams, uploader etl.Uploader) (etl.Inserter, error)

NewBQInserter initializes a new BQInserter. Pass in a nil uploader for normal use, or a custom uploader for custom behavior. TODO - improve the naming between here and NewInserter. TODO - migrate all the tests to use NewColumnPartitionedInserter.

func NewColumnPartitionedInserter

func NewColumnPartitionedInserter(pdt bqx.PDT) (row.Sink, error)

NewColumnPartitionedInserter creates a new BQInserter for the specified BQ table using the default bigquery uploader.

func NewColumnPartitionedInserterWithUploader

func NewColumnPartitionedInserterWithUploader(pdt bqx.PDT, uploader etl.Uploader) (row.Sink, error)

NewColumnPartitionedInserterWithUploader creates a new BQInserter with appropriate characteristics. TODO - migrate all the tests to use this instead of NewBQInserter.

func NewInserter

func NewInserter(dt etl.DataType, partition time.Time) (etl.Inserter, error)

NewInserter creates a new BQInserter with appropriate characteristics.

func NewSinkFactory

func NewSinkFactory() factory.SinkFactory

NewSinkFactory returns the default SinkFactory. TODO - inject a common bq client.

Types

type BQInserter

type BQInserter struct {
	// contains filtered or unexported fields
}

BQInserter provides an API for inserting rows into a specific BQ Table.

func (*BQInserter) Accepted

func (in *BQInserter) Accepted() int

func (*BQInserter) Close

func (in *BQInserter) Close() error

Close synchronizes on the tokens, and closes the backing file.

func (*BQInserter) Commit

func (in *BQInserter) Commit(rows []interface{}, label string) (int, error)

Commit implements row.Sink. It is thread-safe, and returns the number of rows successfully committed.

func (*BQInserter) Committed

func (in *BQInserter) Committed() int

func (*BQInserter) Dataset

func (in *BQInserter) Dataset() string

func (*BQInserter) Failed

func (in *BQInserter) Failed() int

func (*BQInserter) Flush

func (in *BQInserter) Flush() error

Flush synchronously flushes the rows in the row buffer up to BigQuery. It is NOT threadsafe, as it touches the row buffer, so it should only be called by the owning thread. Deprecated: Please use external buffer, Put, and PutAsync instead.

func (*BQInserter) FullTableName

func (in *BQInserter) FullTableName() string

func (*BQInserter) InsertRow

func (in *BQInserter) InsertRow(data interface{}) error

InsertRow adds one row to the insert buffer, and flushes if necessary. Caller should check error, and take appropriate action before calling again. NOT THREADSAFE. Should only be called by owning thread/goroutine. Deprecated: Please use external buffer, Put, and PutAsync instead.

func (*BQInserter) InsertRows

func (in *BQInserter) InsertRows(data []interface{}) error

InsertRows adds rows to the insert buffer, and flushes if necessary. Caller should check error, and take appropriate action before calling again. NOT THREADSAFE. Should only be called by owning thread/goroutine. Deprecated: Please use external buffer, Put, and PutAsync instead.
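The InsertRow/InsertRows contract above (single owning goroutine, flush when the buffer fills, check the error before inserting again) can be sketched with a self-contained stand-in; the bufferedInserter type and its fields are illustrative, since BQInserter's internals are unexported:

```go
package main

import "fmt"

// bufferedInserter sketches the InsertRows contract: rows accumulate in a
// buffer owned by one goroutine, and a flush fires when the buffer fills.
type bufferedInserter struct {
	rows    []interface{}
	maxRows int
	flushes int
}

// InsertRows appends rows and flushes if the buffer is full.
func (b *bufferedInserter) InsertRows(data []interface{}) error {
	b.rows = append(b.rows, data...)
	if len(b.rows) >= b.maxRows {
		return b.Flush()
	}
	return nil
}

// Flush drains the buffer; a real implementation would upload to BigQuery.
func (b *bufferedInserter) Flush() error {
	if len(b.rows) > 0 {
		b.flushes++
		b.rows = nil
	}
	return nil
}

func main() {
	in := &bufferedInserter{maxRows: 3}
	for i := 0; i < 7; i++ {
		// Check the error before inserting again, as the docs require.
		if err := in.InsertRows([]interface{}{i}); err != nil {
			fmt.Println("insert failed:", err)
			return
		}
	}
	in.Flush() // flush the final partial buffer
	fmt.Println(in.flushes) // 3
}
```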

func (*BQInserter) Project

func (in *BQInserter) Project() string

func (*BQInserter) Put

func (in *BQInserter) Put(rows []interface{}) error

Put sends a slice of rows to BigQuery, processes any errors, and updates row stats. It uses a token to serialize with any previous calls to PutAsync, to ensure that when Put() returns, all flushes have completed and row stats reflect PutAsync requests. (Of course races may occur if calls are made from multiple goroutines). It is THREAD-SAFE. It may block if there is already a Put or Flush in progress.

func (*BQInserter) PutAsync

func (in *BQInserter) PutAsync(rows []interface{})

PutAsync asynchronously sends a slice of rows to BigQuery, processes any errors, and updates row stats. It uses a token to serialize with other (likely synchronous) calls, to ensure that when Put() returns, all flushes have completed and row stats reflect PutAsync requests. (Of course races may occur if these are called from multiple goroutines). It is THREAD-SAFE. It may block if there is already a Put or Flush in progress.
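The token-based serialization described for Put and PutAsync can be sketched with a buffered channel of capacity one: PutAsync holds the token while its background flush runs, and a later Put blocks until the token is released. All names and fields here are illustrative, not the package's actual internals:

```go
package main

import "fmt"

// tokenInserter sketches the single-token pattern: one token serializes
// synchronous Put calls with asynchronous PutAsync flushes.
type tokenInserter struct {
	token     chan struct{}
	committed int
}

func newTokenInserter() *tokenInserter {
	t := &tokenInserter{token: make(chan struct{}, 1)}
	t.token <- struct{}{} // token starts out available
	return t
}

func (t *tokenInserter) acquire() { <-t.token }
func (t *tokenInserter) release() { t.token <- struct{}{} }

// Put blocks until any in-flight PutAsync has finished, then commits.
func (t *tokenInserter) Put(rows []interface{}) {
	t.acquire()
	defer t.release()
	t.committed += len(rows)
}

// PutAsync grabs the token immediately and commits in a goroutine;
// a subsequent Put waits until that goroutine releases the token.
func (t *tokenInserter) PutAsync(rows []interface{}) {
	t.acquire()
	go func() {
		defer t.release()
		t.committed += len(rows)
	}()
}

func main() {
	in := newTokenInserter()
	in.PutAsync(make([]interface{}, 3))
	in.Put(make([]interface{}, 2)) // waits for the async flush first
	fmt.Println(in.committed)      // 5
}
```

The channel send/receive pair gives the happens-before ordering that makes the committed count accurate once Put returns, which is the property the documentation promises.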

func (*BQInserter) RowsInBuffer

func (in *BQInserter) RowsInBuffer() int

func (*BQInserter) TableBase

func (in *BQInserter) TableBase() string

func (*BQInserter) TableSuffix

func (in *BQInserter) TableSuffix() string

TableSuffix returns the $ or _ suffix of the table name.

type MapSaver

type MapSaver map[string]bigquery.Value

MapSaver is a generic implementation of bq.ValueSaver, based on maps. This avoids extra conversion steps in the bigquery library (except for the JSON conversion). IMPLEMENTS: bigquery.ValueSaver
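A plausible sketch of the map-based ValueSaver pattern, with a local Value alias standing in for bigquery.Value so it compiles without the cloud.google.com/go/bigquery dependency (the empty insertID is an assumption about this implementation):

```go
package main

import "fmt"

// Value stands in for bigquery.Value, which is an empty interface.
type Value = interface{}

// MapSaver mirrors the package's map-based saver: Save returns the map
// itself, so no per-row conversion is needed before JSON encoding.
type MapSaver map[string]Value

// Save returns the map unchanged, an empty insertID, and no error.
func (s MapSaver) Save() (row map[string]Value, insertID string, err error) {
	return s, "", nil
}

func main() {
	row, id, err := MapSaver{"test_id": "abc", "count": 3}.Save()
	fmt.Println(len(row), id == "", err == nil) // 2 true true
}
```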

func (MapSaver) Save

func (s MapSaver) Save() (row map[string]bigquery.Value, insertID string, err error)

Save implements the bigquery.ValueSaver interface.
