mysql

package
v1.0.1-0...-1811ebb Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 28, 2023 License: Apache-2.0 Imports: 28 Imported by: 0

README

HarbourBridge: MySQL-to-Spanner Evaluation and Migration

HarbourBridge is a stand-alone open source tool for Cloud Spanner evaluation and migration, using data from an existing database. This README provides details of the tool's MySQL capabilities. For general HarbourBridge information see this README.

Example MySQL Usage

HarbourBridge can either be used with mysqldump or it can be run directly on a MySQL database (via go's database/sql package).

The following examples assume a harbourbridge alias has been setup as described in the Installing HarbourBridge section of the main README.

Using HarbourBridge with mysqldump

The tool can be used to migrate schema from an existing mysqldump file:

harbourbridge schema -source=mysql < my_mysqldump_file

This will generate a session file with session.json suffix. This file contains schema mapping from source to destination. You will need to specify this file during data migration. You also need to specify a particular Spanner instance and database to use during data migration.

For example, run

harbourbridge data -session=mydb.session.json -source=mysql -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name" < my_mysqldump_file

You can also run HarbourBridge in a schema-and-data mode, where it will perform both schema and data migration. This is useful for quick evaluation when source database size is small.

harbourbridge schema-and-data -source=mysql -target-profile="instance=my-spanner-instance" < my_mysqldump_file

HarbourBridge generates a report file, a schema file, and a bad-data file (if there are bad-data rows). You can control where these files are written by specifying a file prefix. For example,

harbourbridge schema -prefix=mydb. -source=mysql < my_mysqldump_file

will write files mydb.report.txt, mydb.schema.txt, and mydb.dropped.txt. The prefix can also be a directory. For example,

harbourbridge schema -prefix=~/spanner-eval-mydb/ -source=mysql < my_mysqldump_file

would write the files into the directory ~/spanner-eval-mydb/. Note that HarbourBridge will not create directories as it writes these files.

Directly connecting to a MySQL database

In this case, HarbourBridge connects directly to the MySQL database to retrieve table schema and data. Set the -source=mysql and corresponding source profile connection parameters host, port, user, dbName and password.

For example, to perform schema conversion, run

harbourbridge schema -source=mysql -source-profile="host=<>,port=<>,user=<>,dbName=<>"

Parameters port and password are optional. Port (port) defaults to 3306 for MySQL source. Password can be provided at the password prompt.

(⚠ Deprecated ⚠) Set environment variables MYSQLHOST, MYSQLPORT, MYSQLUSER, MYSQLDATABASE. Password can be specified either in the MYSQLPWD environment variable or provided at the password prompt.

Note that the various target-profile params described in the previous section are also applicable in direct connect mode.

Schema Conversion

The HarbourBridge tool maps MySQL types to Spanner types as follows:

MySQL Type Spanner Type Notes
BOOL, BOOLEAN,
TINYINT(1)
BOOL
BIGINT INT64
BINARY, VARBINARY BYTES(MAX)
BLOB, MEDIUMBLOB,
TINYBLOB, LONGBLOB
BYTES(MAX)
BIT BYTES(MAX)
CHAR STRING(1) CHAR defaults to length 1
CHAR(N) STRING(N) c
DATE DATE
DATETIME TIMESTAMP t
DECIMAL, NUMERIC NUMERIC p
DOUBLE FLOAT64
ENUM STRING(MAX)
FLOAT FLOAT64 s
INTEGER, MEDIUMINT,
TINYINT, SMALLINT
INT64 s
JSON JSON
SET ARRAY<STRING> SET only supports string values
TEXT, MEDIUMTEXT,
TINYTEXT, LONGTEXT
STRING(MAX)
TIMESTAMP TIMESTAMP
VARCHAR STRING(MAX)
VARCHAR(N) STRING(N) c

Spanner does not support spatial datatypes of MySQL. Along with spatial datatypes, all other types map to STRING(MAX). Some of the mappings in this table represent potential changes of precision (marked p), differences in treatment of timezones (marked t), differences in treatment of fixed-length character types (marked c), and changes in storage size (marked s). We discuss these, as well as other limits and notes on schema conversion, in the following sections.

DECIMAL and NUMERIC

Spanner's NUMERIC type can store up to 29 digits before the decimal point and up to 9 after the decimal point. MySQL's NUMERIC type can potentially support higher precision than this, so please verify that Spanner's NUMERIC support meets your application needs. Note that in MySQL, NUMERIC is implemented as DECIMAL, so the remarks about DECIMAL apply equally to NUMERIC.

TIMESTAMP and DATETIME

MySQL has two timestamp types: TIMESTAMP and DATETIME. Both provide microsecond resolution, but neither actually stores a timezone with the data. The key difference between the two types is that MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. This does not occur for DATETIME and data is returned without a timezone. For TIMESTAMP, timezone can be set by time zone offset parameter.

Spanner has a single timestamp type. Data is stored as UTC (there is no separate timezone) Spanner client libraries convert timestamps to UTC before sending them to Spanner. Data is always returned as UTC. Spanner's timestamp type is essentially the same as TIMESTAMP, except that there is no analog of MySQL's timezone offset parameter.

In other words, mapping MySQL DATETIME to TIMESTAMP is fairly straightforward, but care should be taken with MySQL DATETIME data because Spanner clients will not drop the timezone.

CHAR(n) and VARCHAR(n)

The semantics of fixed-length character types differ between MySQL and Spanner. The CHAR(n) type in MySQL is right-padded with spaces. If a string value smaller than the limit is stored, spaces will be added to pad it out to the specified length. If a string longer than the specified length is stored, and the extra characters are all spaces, then it will be silently truncated. Moreover, trailing spaces are ignored when comparing two values. In constrast, Spanner does not give special treatment to spaces, and the specified length simply represents the maximum length that can be stored. This is close to the semantics of MySQL's VARCHAR(n). However there are some minor differences. For example, even VARCHAR(n) has some special treatment of spaces: string with trailing spaces in excess of the column length are truncated prior to insertion and a warning is generated.

SET

MySQL SET is a string object that can hold muliple values, each of which must be chosen from a list of permitted values specified when the table is created. SET is being mapped to Spanner type ARRAY<STRING>. Validation of SET element values will be dropped in Spanner. Thus for production use, validation needs to be done in the application.

Spatial datatype

MySQL spatial datatypes are used to represent geographic feature. It includes GEOMETRY, POINT, LINESTRING, POLYGON, MULTIPOINT, MULTIPOLYGON and GEOMETRYCOLLECTION datatypes. Spanner does not support spatial data types. This datatype are currently mapped to standard STRING Spanner datatype.

Storage Use

The tool maps several MySQL types to Spanner types that use more storage. For example, SMALLINT is a two-byte integer, but it maps to Spanner's INT64, an eight-byte integer.

Primary Keys

Spanner requires primary keys for all tables. MySQL recommends the use of primary keys for all tables, but does not enforce this. When converting a table without a primary key, HarbourBridge will create a new primary key of type INT64. By default, the name of the new column is synth_id. If there is already a column with that name, then a variation is used to avoid collisions.

NOT NULL Constraints

The tool preserves NOT NULL constraints. Note that Spanner does not require primary key columns to be NOT NULL. However, in MySQL, a primary key is a combination of NOT NULL and UNIQUE, and so primary key columns from MySQL will be mapped to Spanner columns that are both primary keys and NOT NULL.

Foreign Keys

The tool maps MySQL foreign key constraints into Spanner foreign key constraints, and preserves constraint names where possible. Since Spanner doesn't support ON DELETE and ON UPDATE actions, we drop them.

Default Values

Spanner does not currently support default values. We drop these MySQL features during conversion.

Secondary Indexes

The tool maps MySQL secondary indexes to Spanner secondary indexes, and preserves constraint names where possible. Note that Spanner requires index key constraint names to be globally unique (within a database), but in MySQL they only have to be unique for a table, so we add a uniqueness suffix to a name if needed. The tool also maps UNIQUE constraint into UNIQUE secondary index. Note that due to limitations of our mysqldump parser, we are not able to handle key column ordering (i.e. ASC/DESC) in mysqldump files. All key columns in mysqldump files will be treated as ASC.

Other MySQL features

MySQL has many other features we haven't discussed, including functions, sequences, procedures, triggers, (non-primary) indexes and views. The tool does not support these and the relevant statements are dropped during schema conversion.

See Migrating from MySQL to Cloud Spanner for a general discussion of MySQL to Spanner migration issues. HarbourBridge follows most of the recommendations in that guide. The main difference is that we map a few more types to STRING(MAX).

Data Conversion

Timestamps and Timezones

As noted earlier when discussing schema conversion of TIMESTAMP, there are some subtle differences in how timestamps are handled in MySQL and Spanner.

During data conversion, MySQL TIMESTAMP values are converted to UTC and stored in Spanner. The conversion proceeds as follows. If the value has a timezone offset, that timezone is respected during the conversion to UTC. If the value does not have a timezone offset, then we look for any set timezone statements in the mysqldump output and use the timezone offset specified. Otherwise, we use '+00:00' timezone offset (UTC).

Strings, character set support and UTF-8

Spanner requires that STRING values be UTF-8 encoded. All Spanner functions and operators that act on STRING values operate on Unicode characters rather than bytes. Since we map many MySQL types (including TEXT and CHAR types) to Spanner's STRING type, HarbourBridge is effectively a UTF-8 based tool.

Note that the tool itself does not do any encoding/decoding or UTF-8 checks: it passes through data from mysqldump to Spanner. Internally, we use Go's string type, which supports UTF-8.

Spatial datatypes support

As noted earlier when discussing schema conversion of Spatial datatype, Spanner does not support spatial datatypes and are mapped to STRING(MAX) Spanner type. There are differences in data conversion for spatial datatypes depending on whether direct connect or a mysqldump file is used.

  • MySQL information schema approach (direct connect) : Data from MySQL is fetched using 'ST_AsText(g)' function which converts a value in internal geometry format to its WKT(Well-Known Text) representation and returns the string result. This value will be stored as STRING in Spanner.
  • MySQL dump approach : Mysqldump will have the internal geometry data in binary format. It cannot be converted to WKT format and there is no proper method for mysqldump generation of spatial datatypes also. Thus, this value will just be fetched as a TEXT type and converted to Spanner type STRING.

Note that mysql information schema approach would be the recommended approach for data conversion of spatial datatypes. For production use, you must store this data using standard data types, and implement any searching/filtering logic in the application layer.

Documentation

Overview

Package MySQL handles schema and data migrations from MySQL.

Index

Constants

This section is empty.

Variables

View Source
var MysqlSpatialDataTypes = []string{"geometrycollection", "multipoint", "multilinestring", "multipolygon", "point", "linestring", "polygon", "geometry"}

MysqlSpatialDataTypes is an array of all MySQL spatial data types.

Functions

func ConvertData

func ConvertData(conv *internal.Conv, tableId string, colIds []string, srcSchema schema.Table, spSchema ddl.CreateTable, vals []string, additionalAttributes internal.AdditionalDataAttributes) (string, []string, []interface{}, error)

ConvertData maps the source DB data in vals into Spanner data, based on the Spanner and source DB schemas. Note that since entries in vals may be empty, we also return the list of columns (empty cols are dropped).

func NodeType

func NodeType(n ast.StmtNode) string

NodeType strips off "ast." prefix from ast.StmtNode type.

func ProcessDataRow

func ProcessDataRow(conv *internal.Conv, tableId string, colIds []string, srcSchema schema.Table, spSchema ddl.CreateTable, vals []string, additionalAttributes internal.AdditionalDataAttributes)

ProcessDataRow converts a row of data and writes it out to Spanner. srcTable and srcCols are the source table and columns respectively, and vals contains string data to be converted to appropriate types to send to Spanner. ProcessDataRow is only called in DataMode.

Types

type DbDumpImpl

type DbDumpImpl struct{}

DbDumpImpl MySQL specific implementation for DdlDumpImpl.

func (DbDumpImpl) GetToDdl

func (ddi DbDumpImpl) GetToDdl() common.ToDdl

GetToDdl function below implement the common.DbDump interface.

func (DbDumpImpl) ProcessDump

func (ddi DbDumpImpl) ProcessDump(conv *internal.Conv, r *internal.Reader) error

ProcessDump processes the mysql dump.

type InfoSchemaImpl

type InfoSchemaImpl struct {
	DbName        string
	Db            *sql.DB
	SourceProfile profiles.SourceProfile
	TargetProfile profiles.TargetProfile
}

InfoSchemaImpl is MySQL specific implementation for InfoSchema.

func (InfoSchemaImpl) GetColumns

func (isi InfoSchemaImpl) GetColumns(conv *internal.Conv, table common.SchemaAndName, constraints map[string][]string, primaryKeys []string) (map[string]schema.Column, []string, error)

GetColumns returns a list of Column objects and names// ProcessColumns

func (InfoSchemaImpl) GetConstraints

func (isi InfoSchemaImpl) GetConstraints(conv *internal.Conv, table common.SchemaAndName) ([]string, map[string][]string, error)

GetConstraints returns a list of primary keys and by-column map of other constraints. Note: we need to preserve ordinal order of columns in primary key constraints. Note that foreign key constraints are handled in getForeignKeys.

func (InfoSchemaImpl) GetForeignKeys

func (isi InfoSchemaImpl) GetForeignKeys(conv *internal.Conv, table common.SchemaAndName) (foreignKeys []schema.ForeignKey, err error)

GetForeignKeys return list all the foreign keys constraints. MySQL supports cross-database foreign key constraints. We ignore them because HarbourBridge works database at a time (a specific run of HarbourBridge focuses on a specific database) and so we can't handle them effectively.

func (InfoSchemaImpl) GetIndexes

func (isi InfoSchemaImpl) GetIndexes(conv *internal.Conv, table common.SchemaAndName, colNameIdMap map[string]string) ([]schema.Index, error)

GetIndexes return a list of all indexes for the specified table.

func (InfoSchemaImpl) GetRowCount

func (isi InfoSchemaImpl) GetRowCount(table common.SchemaAndName) (int64, error)

GetRowCount with number of rows in each table.

func (InfoSchemaImpl) GetRowsFromTable

func (isi InfoSchemaImpl) GetRowsFromTable(conv *internal.Conv, tableId string) (interface{}, error)

GetRowsFromTable returns a sql Rows object for a table.

func (InfoSchemaImpl) GetTableName

func (isi InfoSchemaImpl) GetTableName(dbName string, tableName string) string

GetTableName returns table name.

func (InfoSchemaImpl) GetTables

func (isi InfoSchemaImpl) GetTables() ([]common.SchemaAndName, error)

GetTables return list of tables in the selected database. Note that sql.DB already effectively has the dbName embedded within it (dbName is part of the DSN passed to sql.Open), but unfortunately there is no way to extract it from sql.DB.

func (InfoSchemaImpl) GetToDdl

func (isi InfoSchemaImpl) GetToDdl() common.ToDdl

GetToDdl implement the common.InfoSchema interface.

func (InfoSchemaImpl) ProcessData

func (isi InfoSchemaImpl) ProcessData(conv *internal.Conv, tableId string, srcSchema schema.Table, commonColIds []string, spSchema ddl.CreateTable, additionalAttributes internal.AdditionalDataAttributes) error

ProcessData performs data conversion for source database.

func (InfoSchemaImpl) StartChangeDataCapture

func (isi InfoSchemaImpl) StartChangeDataCapture(ctx context.Context, conv *internal.Conv) (map[string]interface{}, error)

StartChangeDataCapture is used for automatic triggering of Datastream job when performing a streaming migration.

func (InfoSchemaImpl) StartStreamingMigration

func (isi InfoSchemaImpl) StartStreamingMigration(ctx context.Context, client *sp.Client, conv *internal.Conv, streamingInfo map[string]interface{}) error

StartStreamingMigration is used for automatic triggering of Dataflow job when performing a streaming migration.

type ToDdlImpl

type ToDdlImpl struct {
}

ToDdlImpl MySQL specific implementation for ToDdl.

func (ToDdlImpl) ToSpannerType

func (tdi ToDdlImpl) ToSpannerType(conv *internal.Conv, spType string, srcType schema.Type) (ddl.Type, []internal.SchemaIssue)

ToSpannerType maps a scalar source schema type (defined by id and mods) into a Spanner type. This is the core source-to-Spanner type mapping. toSpannerType returns the Spanner type and a list of type conversion issues encountered. Functions below implement the common.ToDdl interface

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL