csv

package
v1.0.1-0...-386defc Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 18, 2023 License: Apache-2.0 Imports: 18 Imported by: 0

README

Spanner migration tool: CSV-to-Spanner Migration

Spanner migration tool (formerly known as HarbourBridge) is a stand-alone open source tool for Cloud Spanner evaluation and migration. We now support loading data from CSVs. This assumes a Spanner database with schema already exists and Spanner migration tool loads the data for you. It first reads the schema in the database specified by your target profile to understand how to convert the data to relevant types. If using PG Spanner, you should specify the dialect in the target-profile explicitly. For CSVs, you need to run Spanner migration tool in data mode only

Example CSV Usage

You can load CSVs to Spanner in 2 ways.

  • Without manifest input: Before running, make sure each table's csv is present in the current working directory named [table_name].csv where table_name is the same as the Spanner table name. Spanner migration tool will migrate table_name.csv to a Spanner table named table_name in the database specified via the target profile.
spanner-migration-tool data -source=csv -target-profile="instance=my-instance,dbName=my-db,dialect=postgresql" 
  • Providing a manifest input: You can optionally provide a manifest file if you have csv files in different locations (local system as well as Google Cloud Storage). You can also provide multiple csv file paths for a single table using the manifest.
spanner-migration-tool data -source=csv -source-profile="manifest=path/to/manifest/file" -target-profile="instance=my-instance,dbName=my-db" 
Manifest File

The manifest file should be a list of JSONs containing each table's csv file locations. Each item should contain the fields:

  • "table_name": The name of the table, identical to the corresponding table name in your Spanner schema.
  • "file_patterns": A list of the CSV file paths (local and GCS) that contain data for that table. Note that it is a list, so multiple CSV files are accepted per table.

Sample manifest:

[
    {
      "table_name": "Albums",
      "file_patterns": [
        "/Users/username/Desktop/Albums.csv"
      ]
    },
    {
      "table_name": "Singers",
      "file_patterns": [
        "/Users/username/Desktop/tmp/Singers_1.csv",
        "gs://bucket-name/Singers_2.csv",
        "/Users/username/Downloads/other/Singers_3.csv"
      ]
    }
]

CAVEATS:

  • File patterns do not accept regular expressions. Provide the path inside double quotes.
CSV File Format
  • Spanner migration tool checks the first row and matches it with the spanner columns to check if the first row is a permutation of the Spanner table's column names. If a match is not found, then Spanner migration tool assumes the order of data in csv is same as the corresponding Spanner table's column.
  • You can optionally provide a custom ordering by providing the column names in the first row.
  • Remove trailing spaces, tabs between delimiters and data.
  • Array data has to be enclosed within [] or {}.
  • Default delimiter is , but can be specified via the delimiter flag in source profile.
  • Null values are represented by '' by default. This can be specified via the nullStr flag in source profile.

For example, if you want to use | as the delimiter and NULL as the null value, you can use

spanner-migration-tool data -source=csv -source-profile="delimiter=|,nullStr=NULL" -target-profile="instance=my-instance,dbName=my-db" 

Sample CSV: You can just provide the data in the file

true,helloworld,10

or provide col names in a custom order

int_col,bool_col,string_col
10,true,helloworld

Example data types:

bool_col,byte_col,date_col,float_col,int_col,numeric_col,string_col,timestamp_col,json_col
true,bytevalue,2020-12-09,15.13,100,39.94,Helloworld,2019-10-29 05:30:00,"{""key1"": ""value1"", ""key2"": ""value2""}"

Example array data:

string_col,int_col,array_col
abc,10,"[1,2,3]"
xyz,11,"{1,2,3}"

Since the delimiter is also ,, the array has to be enclosed within "" to make it unambiguous. You can choose a different delimiter for the columns to avoid this (like |). The array data however should only be separated by ,. An alternate approach would be using it like abc|10|[1,2,3]. As for enclosing the array data with [] or {}, you can use either.

CSV Data Type Considerations:

  • The only supported date format right now is RFC3339 full-date format.
  • The only supported timestamp format right now is ISO 8601.
  • The format to escape the quotes in json is adding an additional " in front of the double quote. \ does not work. Also enclose the whole data inside "". Some modification might be required since most databases do not export CSVs with escaping quotes like mentioned.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetCSVFiles

func GetCSVFiles(conv *internal.Conv, sourceProfile profiles.SourceProfile) (tables []utils.ManifestTable, err error)

GetCSVFiles finds the appropriate files paths and downloads gcs files in any.

func ProcessCSV

func ProcessCSV(conv *internal.Conv, tables []utils.ManifestTable, nullStr string, delimiter rune) error

ProcessCSV writes data across the tables provided in the manifest file. Each table's data can be provided across multiple CSV files hence, the manifest accepts a list of file paths in the input.

func SetRowStats

func SetRowStats(conv *internal.Conv, tables []utils.ManifestTable, delimiter rune) error

SetRowStats calculates the number of rows per table.

func ToSpannerType

func ToSpannerType(columnType string) (ddl.Type, error)

func VerifyManifest

func VerifyManifest(conv *internal.Conv, tables []utils.ManifestTable) error

VerifyManifest performs certain prechecks on the structure of the manifest while populating the conv with the ddl types. Also checks on valid file paths and empty CSVs are handled as conv.Unexpected errors later during processing.

Types

This section is empty.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL