MOLT

Migrate Off Legacy Technology (MOLT) is CockroachDB's suite of tools to assist with migrations. This repo contains the open-source MOLT tooling.

Certain packages can be reused by external tools and are subject to the Apache License.

Build

All commands require molt to be built. Example:

# Build molt for the local machine (goes to artifacts/molt)
go build -o artifacts/molt .

# Cross-compile for another platform (e.g. linux/amd64).
GOOS=linux GOARCH=amd64 go build -v -o artifacts/molt .

Encoding passwords

If your password contains special characters, the MOLT tooling may not be able to parse it. If this happens:

  1. Percent/URL-encode the password.
  2. Use the encoded password in your connection string.
# Original connection string
postgres://postgres:a$52&@localhost:5432/replicationload

# Percent-encoded password
a%2452%26

# Percent-encoded connection string
postgres://postgres:a%2452%26@localhost:5432/replicationload

To simplify this process, you can use the escape-password command:

# Command template
molt escape-password "<your password string>"

...

# Example output
molt escape-password ";@;"
Substitute the following encoded password in your original connection url string:
%3B%40%3B

Database-specific setup

MySQL

A prerequisite for using molt fetch with MySQL is that GTID consistency must be enabled. This is necessary for returning the cdc_cursor. To enable GTID, pass two flags to the MySQL server start command or define them in my.cnf:

--gtid-mode=ON
--enforce-gtid-consistency=ON
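
For reference, a minimal my.cnf fragment with these settings might look like the following sketch (the file location and section layout vary by installation):

[mysqld]
gtid-mode = ON
enforce-gtid-consistency = ON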

Additionally, disable ONLY_FULL_GROUP_BY:

-- Inside the MySQL shell
SET PERSIST sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

References: MySQL Docs for Enabling GTID, Enabling GTID for AWS RDS MySQL.

MOLT Verify

molt verify does the following:

  • Verifies that tables between the two data sources are the same.
  • Verifies that table column definitions between the two data sources are the same.
  • Verifies that row values match between the two data sources.

It currently supports PostgreSQL and MySQL comparisons with CockroachDB. It takes in two connection strings as arguments: source and target. source is assumed to be the source of truth.

# Compare postgres and CRDB instance.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable'

# Compare mysql and CRDB instance.
molt verify \
  --source 'jdbc:mysql://root@tcp(localhost:3306)/defaultdb' \
  --target 'postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable'

See molt verify --help for all available parameters. Make sure that your connection strings are properly encoded.

Filters

To verify specific tables or schemas, use --table-filter or --schema-filter.
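
For example, a hedged invocation that restricts verification to one schema and one table (the filter values are illustrative; see molt verify --help for the exact matching semantics):

molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --schema-filter 'public' \
  --table-filter 'good_table'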

Continuous verification

If you want all tables to be verified in a loop, you can use --continuous.

Live verification

If you expect data to change as you do data verification, you can use --live. This makes the verifier re-check rows before marking them as problematic.
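
As a sketch, either flag is simply appended to a normal verify invocation (confirm defaults and valid combinations with molt verify --help):

# Keep verifying all tables in a loop.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --continuous

# Re-check changed rows before marking them as problematic.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable' \
  --live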

Limitations
  • MySQL set types are not supported.
  • Supports only comparing one MySQL database vs a whole CRDB schema (which is assumed to be "public").
  • Geospatial types cannot yet be compared.
  • We do not handle schema changes between commands well.

MOLT Fetch

flowchart LR
    LegacyDB[legacy database<br/>i.e. PG, MySQL]
    S3[(Amazon S3)]
    GCP[(Google Cloud<br/>Bucket)]
    Local[(Local File Server)]
    CRDB[CockroachDB]

    LegacyDB -- CSV Dump --> S3
    LegacyDB -- CSV Dump --> GCP
    LegacyDB -- CSV Dump --> Local
    S3 -- IMPORT INTO<br/>or COPY FROM --> CRDB
    GCP -- IMPORT INTO<br/>or COPY FROM --> CRDB
    Local -- "IMPORT INTO (exposes HTTP server)"<br/>or COPY FROM --> CRDB

    LegacyDB -- COPY FROM --> CRDB

molt fetch migrates data from your PG or MySQL tables to CockroachDB without taking them offline. It takes --source and --target as arguments (see the molt verify documentation above for examples).

It outputs a cdc_cursor which can be fed to CDC programs (e.g. cdc-sink, AWS DMS) to migrate live data without taking your database offline.

It currently supports the following:

  • Pulling a table, uploading CSVs to S3/GCP/local machine (--local-path-listen-addr must be set for the local file server) and running IMPORT on Cockroach for you.
  • Pulling a table, uploading CSVs to S3/GCP/local machine and running COPY FROM on Cockroach from that CSV.
  • Pulling a table and running COPY FROM directly onto the CRDB table without an intermediate store.

By default, data is imported using IMPORT INTO. You can use --use-copy if you need target data to be queryable during loading, which uses COPY FROM instead.

Data can be truncated automatically if run with --table-handling 'truncate-if-exists'. MOLT Fetch can also automatically create the new table on the target side if run with --table-handling 'drop-on-target-and-recreate'. You can also manually create the new table schema on the target side and run with --table-handling 'none' (the default).
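
For instance, a hedged sketch of the recreate mode, reusing the connection strings and table name from the examples below and combined here with --direct-copy to avoid intermediate files:

molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --table-handling 'drop-on-target-and-recreate' \
  --direct-copy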

A PG replication slot can be created for you if you use --pglogical-replication-slot-name; see --help for more related flags.

For now, schemas must be identical on both sides. This is verified upfront; tables with mismatching columns may only be partially migrated.

Example invocations

Make sure that your connection strings are properly encoded.

S3 usage:

# Ensure access tokens are appropriately set in the environment.
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'
# Ensure the S3 bucket is created and accessible from CRDB.
# --table-handling 'truncate-if-exists' automatically truncates destination tables before importing.
# --cleanup cleans up any created S3 files.
molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --bucket-path 's3://otan-test-bucket' \
  --table-handling 'truncate-if-exists' \
  --cleanup

GCP usage:

# Ensure credentials are loaded using `gcloud init`.
# Ensure the GCP bucket is created and accessible from CRDB.
# --bucket-path with a subpath writes to that path within the bucket (i.e. gs://otan-test-bucket/test-migrations).
# --cleanup cleans up any created GCP files.
molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --bucket-path 'gs://otan-test-bucket/test-migrations' \
  --cleanup

Using a direct COPY FROM without storing intermediate files:

molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --direct-copy

Storing CSVs locally before running COPY FROM:

molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --use-copy

Storing CSVs locally and running a file server:

# set --local-path-crdb-access-addr if the automatic IP detection is incorrect.
molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --local-path-listen-addr '0.0.0.0:9005'

Creating a replication slot with PG:

molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --local-path-listen-addr '0.0.0.0:9005' \
  --pglogical-replication-slot-name 'hi_im_elfo' \
  --pglogical-replication-slot-decoding 'pgoutput'

Edge case

macOS + CockroachDB as source within a Docker container

If you encounter an error similar to the following, please contact the support team.

ERROR: AS OF SYSTEM TIME: cannot specify timestamp in the future (1701836988.000000000,0 > 1701836987.322737000,0) (SQLSTATE XXUUU)

This error occurs because, with macOS as the host OS, Docker containers may have nondeterministic time drift from the host machine.[1] Because we run a SELECT ... AS OF SYSTEM TIME query to iterate over content from the source table, time drift can cause a cannot specify timestamp in the future error when using molt fetch to export data from a CockroachDB cluster within a container.

Example to reproduce the time drift:

#!/bin/bash

set -e
docker rm -f random-cont
docker run -d --name random-cont alpine sh -c "apk add --no-cache coreutils && tail -f /dev/null"

# Wait for the coreutils to be fully installed
sleep 8

for ((i = 1; i <= 10; i++)); do
  # Capture start time from the Docker container and clean up non-numeric characters
  start_time=$(docker exec -it random-cont date +%s%N | tr -cd '[:digit:]')

  # Capture end time from the local host and clean up non-numeric characters
  end_time=$(gdate +%s%N | tr -cd '[:digit:]')

  # Calculate the time difference in milliseconds
  time_diff=$(( (end_time - start_time) / 1000000 ))

  # Calculate the time difference in seconds
  time_diff_seconds=$(bc <<< "scale=6; $time_diff / 1000")

  echo "Time difference: ${time_diff} milliseconds"
  echo "Time difference: ${time_diff_seconds} seconds"
  echo
done

Customized Type Mapping

MOLT Fetch supports automatic schema creation if you run it with the --table-handling 'drop-on-target-and-recreate' option. For the newly created schema, you can customize the type mapping for each column. The mapping you would like to use should be listed in a JSON file whose path is passed via --type-map-file='path/to/your/json.json'.

The example json file format is as follows:

[
  {
    "table": "mytable",
    "column_type_map": [
      {
        "column": "*",
        "source_type": "int2",
        "crdb_type": "int2"
      },
      {
        "column": "jsonbcol",
        "source_type": "jsonb",
        "crdb_type": "string"
      },
      {
        "column": "boolcol",
        "source_type": "int2",
        "crdb_type": "bool"
      }
    ]
  }
]

In this example, we specify the overriding map for the table mytable. It specifies the following rules:

  1. All columns of source type int2, except for column boolcol, will have their corresponding columns in CRDB created as type int2.
  2. Column boolcol will have its corresponding column on the CRDB side created as type bool.
  3. Column jsonbcol will have its corresponding column on the CRDB side created as type string.
  4. All other columns will be mapped to the CRDB side using the default type mapping rules.
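
As a hedged example, a mapping file like the one above can be wired into a fetch run roughly as follows (the file path is illustrative, and the other flags reuse earlier examples):

molt fetch \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'mytable' \
  --table-handling 'drop-on-target-and-recreate' \
  --type-map-file 'path/to/your/json.json' \
  --direct-copy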

Local Setup

Setup Git Hooks

To enforce good developer practices, there are Git hooks that must be synced to your local directory. To do this, run make sync_hooks. Right now, this checks that each commit message contains a Release Note:.
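
For illustration only (the exact format the hook accepts may differ), a commit message satisfying the check could look like:

Add retry logic to fetch uploads

Release Note: none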

Running Tests
  • Ensure a local Postgres instance is set up and can be logged into using postgres://postgres:postgres@localhost:5432/defaultdb (this can be overridden with the POSTGRES_URL env var):
CREATE USER postgres WITH PASSWORD 'postgres' SUPERUSER;
CREATE DATABASE defaultdb;
  • Ensure a local, insecure CockroachDB instance is set up (this can be overridden with the COCKROACH_URL env var): cockroach demo --insecure --empty.
  • Ensure a local MySQL is set up with username root and an empty password, with a defaultdb database created (this can be overridden with the MYSQL_URL env var):
CREATE DATABASE defaultdb;
  • Run the tests: go test ./...
    • Data-driven tests can be run with -rewrite, e.g. go test ./verification -rewrite.

Releases

All releases before v0.0.6 were published directly to GitHub release artifacts. Look at the Assets section of each release link for the binaries. From v0.0.6 onward, molt is published to the GCS bucket. The version manifest contains links to all binaries for all versions.

  • Release versions (v0.0.6 and after)
  • Release v0.0.5
  • Release v0.0.4
  • Release v0.0.3
  • Release v0.0.2
  • Release v0.0.1

Directories

Path        Synopsis
cmd         Package moltcsv reads and writes comma-separated values (CSV) files.
dbverify    Package dbverify is responsible for verifying two different databases match.
