s2db_arrow_driver

Version: v1.1.0-alpha
Published: Oct 25, 2023 License: Apache-2.0 Imports: 15 Imported by: 0

README

SingleStoreDB Go Arrow Driver

The SingleStoreDB Go Arrow driver facilitates the reading of data in Apache Arrow format from SingleStoreDB databases. Note that this is the alpha release of the driver, and there may be changes to the API, type conversion, and other internal implementations in the future.

Installation

Run the following command to add the SingleStoreDB Go Arrow driver as a dependency to your Go module:

go get github.com/singlestore-labs/singlestoredb-go-arrow-driver

The driver also requires the Go MySQL driver as a dependency:

go get github.com/go-sql-driver/mysql@v1.7.2-0.20230809113539-7cf548287682

Use the following code to import dependencies:

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql"
    s2db_arrow_driver "github.com/singlestore-labs/singlestoredb-go-arrow-driver"
)

API

The S2DBArrowReader interface provides an API for reading Apache Arrow data from SingleStoreDB databases. To create a new instance of S2DBArrowReader, use the NewS2DBArrowReader function. S2DBArrowReader provides the following methods:

  • GetNextArrowRecordBatch: Retrieves a single Record object (arrow.Record) from the database. When there are no more records to fetch, it returns nil as the first part of the result tuple. You must release the returned Record using the Release() method after use.
  • Close: Finalizes the reading of query results and releases all the acquired resources.
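The read-until-nil contract described above can be sketched without a database. In this minimal sketch, batch and batchReader are hypothetical stand-ins for arrow.Record and S2DBArrowReader, and fakeBatch, fakeReader, and countRows are illustrative names that do not exist in the driver. It shows one way to release every batch immediately after use and to close the reader on every exit path.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical stand-in for arrow.Record. It models only what
// this sketch needs: a row count and Release().
type batch interface {
	NumRows() int64
	Release()
}

// batchReader is a hypothetical stand-in for S2DBArrowReader with the same
// contract: a nil batch with a nil error means there is nothing left to read.
type batchReader interface {
	GetNextArrowRecordBatch() (batch, error)
	Close() error
}

// countRows drains the reader and releases every batch right after use.
func countRows(r batchReader) (total int64, err error) {
	defer func() {
		// Report a Close error only if nothing failed earlier.
		if cerr := r.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	for {
		b, rerr := r.GetNextArrowRecordBatch()
		if rerr != nil {
			return total, rerr
		}
		if b == nil {
			return total, nil // No more records to fetch.
		}
		total += b.NumRows()
		b.Release()
	}
}

// fakeBatch and fakeReader simulate a result set so the sketch can run
// without a database.
type fakeBatch struct {
	rows     int64
	released *int
}

func (b fakeBatch) NumRows() int64 { return b.rows }
func (b fakeBatch) Release()       { *b.released++ }

type fakeReader struct {
	sizes    []int64
	released int
	closed   bool
}

func (r *fakeReader) GetNextArrowRecordBatch() (batch, error) {
	if r.closed {
		return nil, errors.New("reader is closed")
	}
	if len(r.sizes) == 0 {
		return nil, nil
	}
	b := fakeBatch{rows: r.sizes[0], released: &r.released}
	r.sizes = r.sizes[1:]
	return b, nil
}

func (r *fakeReader) Close() error {
	r.closed = true
	return nil
}

func main() {
	r := &fakeReader{sizes: []int64{10000, 10000, 1234}}
	total, err := countRows(r)
	fmt.Println(total, err, r.released, r.closed)
}
```

With the real driver, the same loop shape would apply to the value returned by NewS2DBArrowReader, with arrow.Record in place of the stand-in batch type.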

Configuration

The NewS2DBArrowReader function takes S2DBArrowReaderConfig as a parameter. Here are the supported reader configurations and their explanations:

Name | Default | Description
Conn | No default (required) | The sql.DB object used to connect with a SingleStoreDB database.
Args | nil (no arguments) | Arguments for placeholder parameters in the query.
RecordSize | 10000 | The maximum number of rows in the resulting records.
ParallelReadConfig | nil (sequential read) | Additional configurations for parallel read. If this value is non-nil, parallel read is enabled.
UseClientConvesion | false | Indicates whether the data should be converted to Arrow Record format on the client. It can be set to true for test purposes. Performance optimizations take place only when it is false.
EnableQueryLogging | false | Controls whether the driver should generate debug logs. Debug logs are printed to the standard output.

The S2DBParallelReadConfig allows you to configure additional settings for parallel read. Here are the additional configurations that can be set:

Name | Default | Description
DatabaseName | No default (required) | The name of the SingleStoreDB database. It is used to determine the number of partitions for parallel reading.
ChannelSize | 10000 | The size of the channel buffer. The channel stores references to Arrow Records while reading is in progress and transfers them to the main goroutine.
EnableDebugProfiling | false | Controls whether to profile the query. The profiling result is printed to the standard output.

Note: To use parallel read, set the interpolateParams=true parameter in the DSN used to open the sql.DB. If this parameter is not set, the query fails with the following error: This command is not supported in the prepared statement protocol yet

Note: Parallel read with UseClientConvesion = false is currently not supported. If you set ParallelReadConfig, you must also set UseClientConvesion to true.
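The two notes above can be turned into a pre-flight check that runs before a reader is created. The following is a minimal sketch, not part of the driver: readerOptions and checkParallelRead are hypothetical names, and the DSN handling assumes that parameters follow the first '?' in the string, as in the usage example in this README.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// readerOptions is a hypothetical, trimmed-down mirror of the two
// S2DBArrowReaderConfig settings that the notes above constrain.
type readerOptions struct {
	ParallelRead       bool // stands in for ParallelReadConfig != nil
	UseClientConvesion bool // spelled as in the driver
}

// checkParallelRead reports configuration problems before a reader is created.
func checkParallelRead(dsn string, opts readerOptions) error {
	if !opts.ParallelRead {
		return nil // Sequential read has no extra requirements.
	}
	if !opts.UseClientConvesion {
		return errors.New("parallel read requires UseClientConvesion = true")
	}
	// Assumption: DSN parameters follow the first '?' in the string.
	_, rawParams, found := strings.Cut(dsn, "?")
	if !found {
		return errors.New("parallel read requires interpolateParams=true in the DSN")
	}
	params, err := url.ParseQuery(rawParams)
	if err != nil {
		return fmt.Errorf("parsing DSN parameters: %w", err)
	}
	if params.Get("interpolateParams") != "true" {
		return errors.New("parallel read requires interpolateParams=true in the DSN")
	}
	return nil
}

func main() {
	dsn := "user:password@tcp(host:3306)/database?interpolateParams=true"
	opts := readerOptions{ParallelRead: true, UseClientConvesion: true}
	fmt.Println(checkParallelRead(dsn, opts))
}
```

A check like this fails fast with a readable message instead of surfacing the prepared statement protocol error at query time. A password that contains '?' would defeat the simple split, so treat the parsing as illustrative only.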

Usage example

dsn := fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?interpolateParams=true", "user", "password", "host", 3306, "database")
dsn += fmt.Sprintf("&connectionAttributes=%s:%s,%s:%s", "program_name", "CompanyName_AppName", "program_version", "1.2.3")
db, err := sql.Open("mysql", dsn)
if err != nil {
    // Handle the error
}

arrowReader, err := s2db_arrow_driver.NewS2DBArrowReader(
    context.Background(), 
    s2db_arrow_driver.S2DBArrowReaderConfig{
        Conn:  db,
        Query: "SELECT * FROM t WHERE a > ? AND a < ?",
        Args:  []interface{}{1, 10},
        // Uncomment the lines below to use parallel read instead of Arrow conversion on the server
        // ParallelReadConfig: &s2db_arrow_driver.S2DBParallelReadConfig{
        //     DatabaseName: "db",
        // },
        // UseClientConvesion: true,
    })
if err != nil {
    // Handle the error
}
defer arrowReader.Close()

for {
    batch, err := arrowReader.GetNextArrowRecordBatch()
    if err != nil {
        // Handle the error and stop reading
        break
    }
    if batch == nil {
        // No more records to fetch
        break
    }

    // Process the batch

    // Release the batch right away; a defer would hold it until the function returns
    batch.Release()
}

Performance Considerations

To achieve maximum performance, consider using parallel read. The performance of parallel read depends on the size of the SingleStore cluster and the number of CPU cores on the machine where the code runs. SingleStore recommends using a machine where the number of CPU cores is equal to the number of partitions in the SingleStoreDB database. While the above holds true, parallel read will be supported in the next release.
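The core-count recommendation above can be checked at startup. This is a minimal sketch: compareCores is a hypothetical helper, and the partition count is supplied by the caller because the sketch does not query the database for it.

```go
package main

import (
	"fmt"
	"runtime"
)

// compareCores describes how the local core count relates to the number of
// database partitions. The partition count comes from the caller; this
// sketch does not look it up in the database.
func compareCores(cores, partitions int) string {
	switch {
	case partitions <= 0:
		return "unknown partition count"
	case cores == partitions:
		return "cores match partitions"
	case cores < partitions:
		return fmt.Sprintf("%d fewer cores than partitions", partitions-cores)
	default:
		return fmt.Sprintf("%d more cores than partitions", cores-partitions)
	}
}

func main() {
	partitions := 8 // Hypothetical value for illustration.
	fmt.Println(compareCores(runtime.NumCPU(), partitions))
}
```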

Additionally, performance is influenced by the data types in the SingleStoreDB database. Performance tests conducted by the SingleStore team demonstrated that nullable data types are slower than non-nullable types. Therefore, consider using non-nullable data types when appropriate.

Data type mapping

The following table maps the SingleStoreDB data types to the corresponding Arrow data types. Note that this mapping is based on the alpha version of the driver and it may change in the future.

SingleStoreDB Data Type | Arrow Data Type
UNSIGNED TINYINT | uint8
UNSIGNED SMALLINT | uint16
UNSIGNED MEDIUMINT | uint32
UNSIGNED INT | uint32
UNSIGNED BIGINT | uint64
TINYINT | boolean
SMALLINT | int16
MEDIUMINT | int32
INT | int32
BIGINT | int64
FLOAT | float32
DOUBLE | float64
DECIMAL | string
YEAR | int16
DATE | string
TIME | string
DATETIME | string
TIMESTAMP | string
CHAR | string
VARCHAR | string
TINYTEXT | string
TEXT | string
MEDIUMTEXT | string
LONGTEXT | string
JSON | string
BIT | binary
BINARY | binary
VARBINARY | binary
TINYBLOB | binary
BLOB | binary
MEDIUMBLOB | binary
LONGBLOB | binary

Documentation

Types

type S2DBArrowReader

type S2DBArrowReader interface {
	// GetNextArrowRecordBatch fetches a single arrow.Record from the server
	// It returns nil as the first part of the result tuple if there are no more rows to fetch
	// The returned Record must be Release()'d after use.
	GetNextArrowRecordBatch() (arrow.Record, error)
	// Close finalizes reading of the query results
	// It releases all acquired resources
	Close() error
}

S2DBArrowReader provides an API for reading Arrow data from a SingleStoreDB database. Use the NewS2DBArrowReader function to create a new instance of S2DBArrowReader.

func NewS2DBArrowReader

func NewS2DBArrowReader(ctx context.Context, conf S2DBArrowReaderConfig) (S2DBArrowReader, error)

NewS2DBArrowReader creates an instance of S2DBArrowReader. It sends a query to the database server for execution.

func NewS2DBArrowReaderImpl

func NewS2DBArrowReaderImpl(ctx context.Context, conf S2DBArrowReaderConfig) (S2DBArrowReader, error)

func NewS2DBArrowReaderParallelImpl

func NewS2DBArrowReaderParallelImpl(ctx context.Context, conf S2DBArrowReaderConfig) (S2DBArrowReader, error)

func NewS2DBServerArrowReaderImpl

func NewS2DBServerArrowReaderImpl(ctx context.Context, conf S2DBArrowReaderConfig) (S2DBArrowReader, error)

type S2DBArrowReaderConfig

type S2DBArrowReaderConfig struct {
	// Conn is a sql.DB object which will be used to communicate with the database
	Conn S2SqlDbWrapper
	// Query is a SQL query that will be executed
	Query string
	// Args are arguments for placeholder parameters in the query
	Args []interface{}
	// RecordSize identifies maximum number of rows in the resulting records
	// By default it is 10000
	RecordSize int64
	// UseClientConvesion indicates if the data should be converted to Arrow Record
	// format on the client. For production use, it should be false
	UseClientConvesion bool
	// ParallelReadConfig specifies additional configurations for parallel read
	// By default it is nil and it means that parallel read is not used
	ParallelReadConfig *S2DBParallelReadConfig
	// EnableQueryLogging controls whether the driver should generate debug logs
	// Debug logs are printed to the standard output
	EnableQueryLogging bool
}

type S2DBArrowReaderImpl

type S2DBArrowReaderImpl struct {
	// contains filtered or unexported fields
}

S2DBArrowReaderImpl implements S2DBArrowReader

func (*S2DBArrowReaderImpl) Close

func (s2db *S2DBArrowReaderImpl) Close() error

func (*S2DBArrowReaderImpl) GetNextArrowRecordBatch

func (s2db *S2DBArrowReaderImpl) GetNextArrowRecordBatch() (arrow.Record, error)

type S2DBArrowReaderParallelImpl

type S2DBArrowReaderParallelImpl struct {
	// contains filtered or unexported fields
}

func (*S2DBArrowReaderParallelImpl) Close

func (s2db *S2DBArrowReaderParallelImpl) Close() error

func (*S2DBArrowReaderParallelImpl) GetNextArrowRecordBatch

func (s2db *S2DBArrowReaderParallelImpl) GetNextArrowRecordBatch() (arrow.Record, error)

type S2DBParallelReadConfig

type S2DBParallelReadConfig struct {
	// DatabaseName is a name of the SingleStore database
	// It is needed to get number of partitions from the database for parallel read
	DatabaseName string
	// ChannelSize specifies size of the channel buffer
	// Channel is used to store references to Arrow Records while reading is happening
	// and transfer them to the main goroutine
	// The default value is 10000
	ChannelSize int64
	// Controls whether to profile the query
	// Profiling result is printed to the standard output
	EnableDebugProfiling bool
}

type S2DBServerArrowReaderImpl

type S2DBServerArrowReaderImpl struct {
	// contains filtered or unexported fields
}

S2DBServerArrowReaderImpl implements S2DBArrowReader

func (*S2DBServerArrowReaderImpl) Close

func (s2db *S2DBServerArrowReaderImpl) Close() error

func (*S2DBServerArrowReaderImpl) GetNextArrowRecordBatch

func (s2db *S2DBServerArrowReaderImpl) GetNextArrowRecordBatch() (arrow.Record, error)

type S2SqlDbWrapper

type S2SqlDbWrapper interface {
	Stats() sql.DBStats
	Close() error
	Conn(ctx context.Context) (*sql.Conn, error)
	ExecContext(ctx context.Context, query string, args ...interface{}) (sql.Result, error)
	QueryContext(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error)
}

S2SqlDbWrapper is a utility wrapper around sql.DB, used only by the driver.
