s3selectsqldriver

v0.3.0
Published: Sep 18, 2023 License: MIT Imports: 23 Imported by: 1

README

s3-select-sql-driver

An S3 Select SQL driver for Go's database/sql package.

Usage

For example:

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/mashiike/s3-select-sql-driver"
)

func main() {
	db, err := sql.Open("s3-select", "s3://example-com/abc.csv?format=csv")
	if err != nil {
		log.Fatalln(err)
	}
	defer db.Close()
	rows, err := db.QueryContext(
		context.Background(),
		`SELECT timestamp, message FROM s3object s`,
	)
	if err != nil {
		log.Fatalln(err)
	}
	defer rows.Close()
	for rows.Next() {
		var timestamp time.Time
		var message string
		err := rows.Scan(&timestamp, &message)
		if err != nil {
			log.Println(err)
			break
		}
		log.Printf("%s\t%s", timestamp, message)
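	}
	// Report any error that ended the iteration early.
	if err := rows.Err(); err != nil {
		log.Println(err)
	}
}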
Placeholders

The S3 Select SQL driver supports placeholders.

For example, you can use ordinal placeholders:

rows, err := db.QueryContext(
    context.Background(),
    `SELECT timestamp, message FROM s3object s WHERE timestamp > ?`,
    time.Now().Add(-time.Hour),
)

and named placeholders:

rows, err := db.QueryContext(
    context.Background(),
    `SELECT timestamp, message FROM s3object s WHERE timestamp > :timestamp`,
    sql.Named("timestamp", time.Now().Add(-time.Hour)),
)
DSN format
s3://<bucket>/<key>?<query>
query parameters
name                 description                                          default
format               object format (csv, tsv, json, json_lines, parquet)  auto-detected from the file extension
compression_type     compression type (gzip, bzip2, or none)              none
parse_time           parse time columns                                   false
input_serialization  input serialization as base64-encoded JSON
region               AWS region
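
As a rough illustration of the DSN layout above, the following sketch assembles a DSN with the standard net/url package. The helper name buildDSN is hypothetical and not part of the driver; only the s3://<bucket>/<key>?<query> shape and the parameter names come from this README.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN is a hypothetical helper (not part of the driver) that assembles
// a DSN of the form s3://<bucket>/<key>?<query>.
func buildDSN(bucket, key string, params map[string]string) string {
	q := url.Values{}
	for name, value := range params {
		q.Set(name, value)
	}
	u := url.URL{
		Scheme:   "s3",
		Host:     bucket,
		Path:     "/" + key,
		RawQuery: q.Encode(), // Encode sorts parameters by name.
	}
	return u.String()
}

func main() {
	dsn := buildDSN("example-com", "abc.csv", map[string]string{
		"format":     "csv",
		"parse_time": "true",
	})
	fmt.Println(dsn) // s3://example-com/abc.csv?format=csv&parse_time=true
}
```

Building the query with url.Values keeps values correctly escaped, which matters once a parameter contains characters such as "=" or "&".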
input serialization base64 json

For complex formats, you can specify the input serialization in the DSN.

See also S3 Select InputSerialization.

{
  "CSV": {
    "FileHeaderInfo": "NONE",
    "RecordDelimiter": "\n",
    "FieldDelimiter": ",",
    "QuoteCharacter": "\"",
    "QuoteEscapeCharacter": "\"",
    "Comments": "#",
    "AllowQuotedRecordDelimiter": false
  },
  "CompressionType": "NONE"
}

You can then pass it as a DSN query parameter (the JSON, base64-encoded):

s3://example-com/hoge.csv?input_serialization=ewogICJDU1YiOiB7CiAgICAiRmlsZUhlYWRlckluZm8iOiAiTk9ORSIsCiAgICAiUmVjb3JkRGVsaW1pdGVyIjogIlxuIiwKICAgICJGaWVsZERlbGltaXRlciI6ICIsIiwKICAgICJRdW90ZUNoYXJhY3RlciI6ICJcIiIsCiAgICAiUXVvdGVFc2NhcGVDaGFyYWN0ZXIiOiAiXCIiLAogICAgIkNvbW1lbnRzIjogIiMiLAogICAgIkFsbG93UXVvdGVkUmVjb3JkRGVsaW1pdGVyIjogZmFsc2UKICB9LAogICJDb21wcmVzc2lvblR5cGUiOiAiTk9ORSIKfQo
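
A minimal sketch of producing such a value with the standard library follows. The helper names encodeParam and decodeParam are hypothetical, and the choice of unpadded URL-safe base64 is an assumption based on the sample above carrying no trailing "=" padding. The driver's own SetInputSerializationToURLValues function (added in v0.3.0) is the authoritative way to set this parameter.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

// encodeParam base64-encodes raw InputSerialization JSON for use as the
// input_serialization query value.
// ASSUMPTION: unpadded URL-safe base64; the driver may accept other variants.
func encodeParam(rawJSON string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(rawJSON))
}

// decodeParam reverses encodeParam, which is handy for inspecting a DSN.
func decodeParam(encoded string) (string, error) {
	b, err := base64.RawURLEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	const rawJSON = `{"CSV":{"FileHeaderInfo":"NONE","FieldDelimiter":","},"CompressionType":"NONE"}`

	q := url.Values{}
	q.Set("input_serialization", encodeParam(rawJSON))
	dsn := "s3://example-com/hoge.csv?" + q.Encode()
	fmt.Println(dsn)

	// Round-trip the value back out of the DSN to check it.
	u, err := url.Parse(dsn)
	if err != nil {
		panic(err)
	}
	decoded, err := decodeParam(u.Query().Get("input_serialization"))
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded == rawJSON)
}
```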
Prefix Search and LIMIT clause

The S3 Select SQL driver supports prefix search and the LIMIT clause. For example:

DSN: s3://example-com/data/?format=csv&compression_type=gzip
Query: SELECT * FROM s3object s LIMIT 10

The driver searches the s3://example-com/data/ prefix and reads objects until it has collected 10 rows. Suppose the bucket contains the following objects:

s3://example-com/data/2020-01-01.csv.gz
s3://example-com/data/2020-01-02.csv.gz
s3://example-com/data/2020-01-03.csv.gz
s3://example-com/data/2020-01-04.csv.gz
...

If each object has 5 rows, the driver executes S3 Select against 2020-01-01.csv.gz and 2020-01-02.csv.gz and returns 10 rows.
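
The behavior described above can be modeled with a small in-memory simulation. This is only a conceptual sketch, not the driver's implementation: the object type and the selectWithLimit function are hypothetical stand-ins for listing objects under a prefix and running S3 Select on each one in turn.

```go
package main

import "fmt"

// object is a stand-in for one S3 object and the rows a query would return for it.
type object struct {
	key  string
	rows []string
}

// selectWithLimit walks objects in order and stops once limit rows are collected.
// It returns the rows plus the keys of the objects that were actually queried.
func selectWithLimit(objects []object, limit int) (rows []string, queried []string) {
	for _, obj := range objects {
		if len(rows) >= limit {
			break // Limit reached: later objects are never queried.
		}
		queried = append(queried, obj.key)
		for _, r := range obj.rows {
			if len(rows) >= limit {
				break
			}
			rows = append(rows, r)
		}
	}
	return rows, queried
}

func main() {
	var objects []object
	for _, day := range []string{"01", "02", "03", "04"} {
		o := object{key: "data/2020-01-" + day + ".csv.gz"}
		for i := 1; i <= 5; i++ {
			o.rows = append(o.rows, fmt.Sprintf("%s#%d", o.key, i))
		}
		objects = append(objects, o)
	}
	rows, queried := selectWithLimit(objects, 10)
	fmt.Println(len(rows), queried)
	// 10 [data/2020-01-01.csv.gz data/2020-01-02.csv.gz]
}
```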

LICENSE

MIT

Documentation

Constants

This section is empty.

Variables

var (
	ErrNotSupported = errors.New("not supported")
	ErrDSNEmpty     = errors.New("dsn is empty")
)
var S3SelectClientConstructor func(ctx context.Context, cfg *S3SelectConfig) (S3SelectClient, error)

Functions

func SetDebugLogger

func SetDebugLogger(l Logger) error

func SetInputSerializationToURLValues added in v0.3.0

func SetInputSerializationToURLValues(params url.Values, inputSerialization *types.InputSerialization) error

func SetLogger

func SetLogger(l Logger) error

Types

type Logger

type Logger interface {
	Printf(format string, v ...any)
	SetOutput(w io.Writer)
	Writer() io.Writer
}
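
The three methods of this interface are also provided by the standard library's *log.Logger, so a value built with log.New should satisfy it. The sketch below copies the interface locally so that it compiles on its own; the capture helper and the "[s3-select] " prefix are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
)

// Logger is copied from the driver's documented interface so that this
// example is self-contained.
type Logger interface {
	Printf(format string, v ...any)
	SetOutput(w io.Writer)
	Writer() io.Writer
}

// Compile-time check: *log.Logger provides all three methods.
var _ Logger = (*log.Logger)(nil)

// capture is a hypothetical helper that logs one line through a Logger
// backed by an in-memory buffer and returns what was written.
func capture(format string, v ...any) string {
	var buf bytes.Buffer
	var l Logger = log.New(&buf, "[s3-select] ", 0)
	l.Printf(format, v...)
	return buf.String()
}

func main() {
	fmt.Print(capture("query took %dms", 42)) // [s3-select] query took 42ms
}
```

Since SetLogger and SetDebugLogger both take a Logger, a *log.Logger created this way can be passed to either of them.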

type S3SelectClient

type S3SelectClient interface {
	SelectObjectContentWithWriter(ctx context.Context, w io.Writer, params *s3.SelectObjectContentInput, optFns ...func(*s3.Options)) error
	s3.ListObjectsV2APIClient
}

func DefaultS3SelectClientConstructor

func DefaultS3SelectClientConstructor(ctx context.Context, cfg *S3SelectConfig) (S3SelectClient, error)

type S3SelectClientWithWriter

type S3SelectClientWithWriter struct {
	*s3.Client
}

func (S3SelectClientWithWriter) SelectObjectContentWithWriter

func (c S3SelectClientWithWriter) SelectObjectContentWithWriter(ctx context.Context, w io.Writer, params *s3.SelectObjectContentInput, optFns ...func(*s3.Options)) error

type S3SelectCompressionType

type S3SelectCompressionType string
const (
	S3SelectCompressionTypeNone  S3SelectCompressionType = "none"
	S3SelectCompressionTypeGzip  S3SelectCompressionType = "gzip"
	S3SelectCompressionTypeBzip2 S3SelectCompressionType = "bzip2"
)

type S3SelectConfig

type S3SelectConfig struct {
	BucketName         string
	ObjectKey          string
	ObjectKeyPrefix    string
	Format             S3SelectFormat
	CompressionType    S3SelectCompressionType
	InputSerialization *types.InputSerialization
	ParseTime          *bool
	Params             url.Values
	S3OptFns           []func(*s3.Options)
}

func ParseDSN

func ParseDSN(dsn string) (*S3SelectConfig, error)

func (*S3SelectConfig) String

func (cfg *S3SelectConfig) String() string

func (*S3SelectConfig) WithRegion

func (cfg *S3SelectConfig) WithRegion(region string) *S3SelectConfig

type S3SelectFormat

type S3SelectFormat string
const (
	S3SelectFormatCSV     S3SelectFormat = "csv"
	S3SelectFormatTSV     S3SelectFormat = "tsv"
	S3SelectFormatJSON    S3SelectFormat = "json"
	S3SelectFormatParquet S3SelectFormat = "parquet"
	S3SelectFormatJSONL   S3SelectFormat = "json_lines"
)
