dbtoredshift

package module
v0.0.0-...-46f6989
Published: Feb 20, 2017 License: Apache-2.0 Imports: 8 Imported by: 0

README

db-to-redshift

A Go library for importing data from any sql.DB into Redshift. The library performs three steps:

  1. Retrieves data from the source database using the provided query.
  2. Writes the data to a CSV file and uploads it to the provided S3 bucket.
  3. Copies the S3 file into the provided Redshift database.

Install

go get github.com/jaredpiedt/db-to-redshift

Use

package main

import (
    "database/sql"
    "fmt"
    
    "github.com/aws/aws-sdk-go/aws/session"
    _ "github.com/go-sql-driver/mysql"
    "github.com/jaredpiedt/db-to-redshift"
    _ "github.com/lib/pq"
)

func main() {
    // Initialize AWS session
    sess, err := session.NewSession()
    if err != nil {
        panic(err)
    }

    // Open connection to source database
    sourceDB, err := sql.Open("mysql", fmt.Sprintf(
        "%s:%s@tcp(%s:3306)/%s",
        "user",
        "password",
        "host.com",
        "database",
    ))
    if err != nil {
        panic(err)
    }

    // Open connection to Redshift. Note that lib/pq registers its
    // driver under the name "postgres".
    rsDB, err := sql.Open("postgres", fmt.Sprintf(
        "user=%s password=%s dbname=%s host=%s port=5439 sslmode=require",
        "user",
        "password",
        "database",
        "host.com",
    ))
    if err != nil {
        panic(err)
    }

    // Setup dbtoredshift config
    cfg := dbtoredshift.Config{
        Session:  sess,
        SourceDB: sourceDB,
        Redshift: dbtoredshift.Redshift{
            DB:               rsDB,
            Schema:           "<schema>",
            Table:            "<table>",
            CredentialsParam: "aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>",
            CopyParams:       "TRUNCATECOLUMNS BLANKSASNULL EMPTYASNULL TIMEFORMAT 'auto' DATEFORMAT 'auto'",
        },
        S3: dbtoredshift.S3{
            Bucket: "<bucket>",
            Prefix: "<prefix>",
            Key:    "<key>",
            Region: "<region>",
        },
    }

    client := dbtoredshift.New(cfg)

    // Execute query. Data returned from that query will be inserted into Redshift.
    if err := client.Exec("SELECT * FROM schema.table"); err != nil {
        panic(err)
    }
}

License

db-to-redshift is available under the Apache License, Version 2.0.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client wraps all database and S3 information.

func New

func New(c Config) *Client

New will return a pointer to a new db-to-redshift Client.

func (*Client) Exec

func (c *Client) Exec(query string) error

Exec executes a query, writes the results to S3, and then loads them into Redshift. It returns any error it encounters.

type Config

type Config struct {
	Session      *session.Session
	SourceDB     *sql.DB
	Redshift     Redshift
	S3           S3
	CSVDelimiter rune // Default value is ','
}

Config contains all the information needed to create a new Client.

type Redshift

type Redshift struct {
	DB     *sql.DB
	Schema string
	Table  string

	// A clause that indicates the method your cluster will use when accessing
	// your AWS S3 resource.
	CredentialsParam string

	// Specify how COPY will map field data to columns in the target table,
	// define source data attributes to enable the COPY command to correctly
	// read and parse the source data, and manage which operations the COPY
	// command performs during the load process.
	CopyParams string
}

Redshift contains all of the information needed to COPY from S3 into Redshift.

type S3

type S3 struct {
	Region string
	Bucket string
	Prefix string
	Key    string
}

S3 contains all of the information needed to connect to an S3 bucket.
