datapasta

Published: Dec 1, 2023 License: Apache-2.0 Imports: 10 Imported by: 0

README

DataPasta 🍝

Export and import hierarchical objects in a database.


Summary

This library provides deep-clone functionality for a database object. It currently has an adapter for Postgres, but custom adapters can be created to satisfy a minimal interface.

Mechanism

There are 2 steps:

  • Download: this recursive process fetches a record, recurses into every record that holds a foreign key reference to it and appends those to the output, then appends the record itself, and finally recurses into any records it references by foreign key.
  • Upload: this naive process loops through a slice of objects, inserting each object to its given table. While doing so, however, it keeps track of changes (such as newly generated primary keys), and updates references to those changes in the following records.

These two mechanisms make it easy to download an export of a hierarchical structure from a database, and then upload that export to either the same database or a new one.
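The upload bookkeeping can be sketched with a toy, self-contained version (the types, field names, and id-assignment scheme here are illustrative, not datapasta's real API; the real library keys its mapping by table and primary key):

```go
package main

import "fmt"

// row stands in for one exported record.
type row map[string]any

// naiveUpload inserts rows in order. Each "insert" yields a fresh primary key,
// and references in later rows ("user_id" here) are rewritten to the new keys.
func naiveUpload(dump []row) []row {
	nextID := 100 // pretend the database generates ids starting at 100
	remap := map[any]any{}
	for _, r := range dump {
		// rewrite any reference that points at an already-remapped key
		if ref, ok := r["user_id"]; ok {
			if mapped, seen := remap[ref]; seen {
				r["user_id"] = mapped
			}
		}
		// "insert" the row: record the old->new primary key mapping
		remap[r["id"]] = nextID
		r["id"] = nextID
		nextID++
	}
	return dump
}

func main() {
	dump := []row{
		{"id": 50},               // the user
		{"id": 7, "user_id": 50}, // a record referencing the user
	}
	naiveUpload(dump)
	fmt.Println(dump[0]["id"], dump[1]["user_id"]) // 100 100
}
```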

Example

main.go

// c is a connection to a postgres database
pg, err := datapasta.NewPostgres(ctx, c)
assert.NoError(err)

export.go

// we want to export everything about user 50
cli, err := pg.NewClient(ctx, c)
assert.NoError(err)

// download user id 50 - it will recursively find everything related to the user
dump, trace, err := datapasta.Download(ctx, cli, "user", "id", 50)
assert.NoError(err)

import.go

// now upload a copy of that user
cli, err := pg.NewClient(ctx, db)
assert.NoError(err)

err = datapasta.Upload(ctx, cli, dump)
assert.NoError(err)

// return the new id of the user (as postgres provided a new id)
return dump[0]["id"].(int32), nil

Export Tips

Download accepts a few options, which you will almost certainly want to provide. The most important option is DontRecurse, which tells the clone to include a table's records but not recurse into them. For example, consider:

user ( id serial )
item ( id serial )
purchase (
    user_id REFERENCES user(id),
    item_id REFERENCES item(id)
)

If we export a user, the export will recurse into purchase, and then recurse into other user records that have made purchases, which will likely clone your entire database!

This can be solved by telling Download not to recurse out of the purchase table, with datapasta.DontRecurse("purchase").

This can also be solved by telling Download not to include the purchase table at all, with datapasta.DontInclude("purchase").

Import Tips

There's a very good chance that the resulting export won't be importable without some cleaning up, for a few reasons.

  • Some unique columns that aren't primary keys will need to be nulled or mocked.
  • If you're exporting to a different database, any records excluded from the dump may still be referenced by a foreign key, which will need to be nulled.
  • You might want to strip PII.

Luckily, as the dump is just an array of arbitrary objects, it's easy to clean up the dump between export and import. Here's an example that removes "access codes" from users, as those have a unique constraint:

for _, obj := range dump {
	if obj[datapasta.DumpTableKey] == "user" {
		obj["access_code"] = nil
	}
}

Documentation

Index

Constants

const (
	// DumpTableKey is a special field present in every row of an export.
	// It can be used to determine which table the row is from.
	// Note that the export may have rows from a table interleaved with rows from other tables.
	DumpTableKey = "%_tablename"
)
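For example, pulling every row of one table out of a dump is a simple filter on this key (a self-contained sketch; only the key's value is taken from the documentation above, and `rowsFromTable` is an invented helper):

```go
package main

import "fmt"

// DumpTableKey is the special field value documented by datapasta.
const DumpTableKey = "%_tablename"

// rowsFromTable returns every row in the dump belonging to the given table.
func rowsFromTable(dump []map[string]any, table string) []map[string]any {
	var out []map[string]any
	for _, row := range dump {
		if row[DumpTableKey] == table {
			out = append(out, row)
		}
	}
	return out
}

func main() {
	dump := []map[string]any{
		{DumpTableKey: "user", "id": 50},
		{DumpTableKey: "purchase", "id": 7},
		{DumpTableKey: "user", "id": 51},
	}
	fmt.Println(len(rowsFromTable(dump, "user"))) // 2
}
```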

Variables

var LogFunc = log.Printf

Functions

func ApplyMergeStrategy

func ApplyMergeStrategy(db Database, mapp []Mapping, mas []MergeAction) error

func FindModifiedRows

func FindModifiedRows(pks map[string]string, from, in DatabaseDump) map[RecordID]map[string]any

FindModifiedRows returns a map of the updates or deletes that would make "in" equal "from". The key identifies the changed record, and the value maps each changed column to its new value.
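A single-table, self-contained sketch of the idea (the real function keys its result by RecordID and takes the primary-key map; `findModified` and its signature are invented for illustration):

```go
package main

import "fmt"

// findModified reports, per primary key, the columns whose value in "from"
// differs from the value in "in". Rows missing from "in" are skipped.
func findModified(pk string, from, in []map[string]any) map[any]map[string]any {
	byID := make(map[any]map[string]any)
	for _, r := range in {
		byID[r[pk]] = r
	}
	diff := make(map[any]map[string]any)
	for _, r := range from {
		old, ok := byID[r[pk]]
		if !ok {
			continue
		}
		for col, v := range r {
			if old[col] != v {
				if diff[r[pk]] == nil {
					diff[r[pk]] = make(map[string]any)
				}
				diff[r[pk]][col] = v
			}
		}
	}
	return diff
}

func main() {
	from := []map[string]any{{"id": 1, "name": "new"}}
	in := []map[string]any{{"id": 1, "name": "old"}}
	fmt.Println(findModified("id", from, in)) // map[1:map[name:new]]
}
```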

func FindRow

func FindRow(table, pk string, id any, dump DatabaseDump) map[string]any

func NewPostgres

func NewPostgres(ctx context.Context, c Postgreser) (pgdb, error)

NewPostgres returns a pgdb that can generate a Database for datapasta Upload and Download functions.

func ReverseForeignKeyMapping

func ReverseForeignKeyMapping(fks []ForeignKey, mapp []Mapping, rows DatabaseDump)

reverse all the foreign keys of a dump

func ReverseForeignKeyMappingRow

func ReverseForeignKeyMappingRow(fks []ForeignKey, mapp []Mapping, row map[string]any)

reverse all the foreign keys of an individual row

func ReversePrimaryKeyMapping

func ReversePrimaryKeyMapping(pks map[string]string, mapp []Mapping, dump DatabaseDump)

reverse all the primary keys of a dump

func Upload

func Upload(ctx context.Context, db Database, dump DatabaseDump) error

Upload uploads, in naive order, every record in a dump. It mutates the elements of `dump`, so you can track changes (for example new primary keys).

Types

type Database

type Database interface {
	// SelectMatchingRows must return unseen records.
	// a Database can't be reused between clones, because it must do internal deduping.
	// `conds` will be a map of columns and the values they can have.
	SelectMatchingRows(tname string, conds map[string][]any) ([]map[string]any, error)

	// insert one record, returning the new id
	InsertRecord(record map[string]any) (any, error)

	// apply the updates from the cols to the row
	Update(id RecordID, cols map[string]any) error

	// delete the row
	Delete(id RecordID) error

	// Insert uploads a batch of records.
	// a Destination can't generally be reused between clones, as it may be inside a transaction.
	// it's recommended that callers use a Database that wraps a transaction.
	//
	// the records will have primary keys which must be handled.
	// the Database is responsible for exposing the resulting primary key mapping in some manner.
	Insert(records ...map[string]any) error

	// Mapping must return whatever mapping has been created by prior Inserts.
	// the implementation may internally choose to track this in the database or in memory.
	Mapping() ([]Mapping, error)

	// get foreign key mapping
	ForeignKeys() []ForeignKey

	// get primary key mapping
	PrimaryKeys() map[string]string
}

Database is the abstraction between the cloning tool and the database. The NewPostgres.NewClient method gives you an implementation for Postgres.
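Since Database is a plain interface, writing a custom adapter means implementing these methods. Below is a compile-checked skeleton (the supporting types are copied from the docs on this page; `memDB` and its stub bodies are placeholders, and a real SelectMatchingRows must also dedupe rows it has already returned):

```go
package main

import "fmt"

// Local copies of datapasta's supporting types, per the documentation.
type RecordID struct {
	Table      string
	PrimaryKey any
}
type Mapping struct {
	RecordID
	OriginalID any
}
type ForeignKey struct {
	BaseTable, BaseCol, ReferencingTable, ReferencingCol string
}

// Database mirrors the interface documented above.
type Database interface {
	SelectMatchingRows(tname string, conds map[string][]any) ([]map[string]any, error)
	InsertRecord(record map[string]any) (any, error)
	Update(id RecordID, cols map[string]any) error
	Delete(id RecordID) error
	Insert(records ...map[string]any) error
	Mapping() ([]Mapping, error)
	ForeignKeys() []ForeignKey
	PrimaryKeys() map[string]string
}

// memDB is a do-nothing adapter skeleton; fill in each method for a real backend.
type memDB struct{}

func (m *memDB) SelectMatchingRows(tname string, conds map[string][]any) ([]map[string]any, error) {
	return nil, nil // a real adapter must return only unseen rows
}
func (m *memDB) InsertRecord(record map[string]any) (any, error) { return nil, nil }
func (m *memDB) Update(id RecordID, cols map[string]any) error   { return nil }
func (m *memDB) Delete(id RecordID) error                        { return nil }
func (m *memDB) Insert(records ...map[string]any) error          { return nil }
func (m *memDB) Mapping() ([]Mapping, error)                     { return nil, nil }
func (m *memDB) ForeignKeys() []ForeignKey                       { return nil }
func (m *memDB) PrimaryKeys() map[string]string                  { return map[string]string{"user": "id"} }

var _ Database = (*memDB)(nil) // compile-time interface check

func main() {
	var db Database = &memDB{}
	fmt.Println(db.PrimaryKeys()["user"]) // id
}
```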

type DatabaseDump

type DatabaseDump []map[string]any

DatabaseDump is the output of a Download call, containing every record that was downloaded. It is safe to transport as JSON.
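Since a dump is just []map[string]any, it round-trips through encoding/json. One caveat worth knowing (standard encoding/json behavior, not specific to datapasta): numbers decode back as float64, so a type assertion like .(int32) from the earlier example will fail on a re-decoded dump. The `roundTrip` helper here is illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip marshals a dump to JSON and decodes it back.
func roundTrip(dump []map[string]any) []map[string]any {
	b, err := json.Marshal(dump)
	if err != nil {
		panic(err)
	}
	var out []map[string]any
	if err := json.Unmarshal(b, &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	dump := []map[string]any{{"%_tablename": "user", "id": int32(50)}}
	out := roundTrip(dump)
	// encoding/json decodes all JSON numbers into float64 by default
	fmt.Printf("%T\n", out[0]["id"]) // float64
}
```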

func Download

func Download(ctx context.Context, db Database, startTable, startColumn string, startId any, opts ...Opt) (DatabaseDump, []string, error)

Download recursively downloads a dump of the database from a given starting point. The second return value is a trace that can help debug or understand what happened.

func FindMissingRows

func FindMissingRows(pks map[string]string, from, in DatabaseDump) DatabaseDump

find rows in "from" that are missing in "in"

type ForeignKey

type ForeignKey struct {
	BaseTable        string `json:"base_table"`
	BaseCol          string `json:"base_col"`
	ReferencingTable string `json:"referencing_table"`
	ReferencingCol   string `json:"referencing_col"`
}

ForeignKey contains every REFERENCING column and the BASE column it refers to. This is used to recurse the database as a graph. Database implementations must provide a complete list of references.

type Mapping

type Mapping struct {
	RecordID
	OriginalID any
}

func FindMapping

func FindMapping(id RecordID, mapp []Mapping) Mapping

type MergeAction

type MergeAction struct {
	ID     RecordID
	Action string
	Data   map[string]any
}

func GenerateMergeStrategy

func GenerateMergeStrategy(pks map[string]string, base, main, branch DatabaseDump) []MergeAction

GenerateMergeStrategy returns every update or delete needed to merge branch into main. Note that conflicts will be intermingled with the updates and deletes.

func (MergeAction) String

func (ma MergeAction) String() string

type Opt

type Opt func(*downloadOpts)

Opt is a functional option that can be passed to Download.

func DontInclude

func DontInclude(table string) Opt

DontInclude does not recurse into records from `table`, but still includes referenced records.

func DontRecurse

func DontRecurse(table string) Opt

DontRecurse includes records from `table`, but does not recurse into references to it.

func LimitSize

func LimitSize(limit int) Opt

LimitSize causes the clone to fail if more than `limit` records have been collected. You should use an estimate of a higher bound for how many records you expect to be exported. The default limit is 0, and 0 is treated as having no limit.
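Opt follows Go's functional-options pattern: each option is a function that mutates a private config struct before the download starts. A self-contained sketch of the pattern (the downloadOpts fields here are invented for illustration, not datapasta's real fields):

```go
package main

import "fmt"

// downloadOpts stands in for datapasta's private option struct;
// these field names are illustrative.
type downloadOpts struct {
	dontRecurse map[string]bool
	limit       int
}

// Opt mutates the option struct, mirroring datapasta's Opt type.
type Opt func(*downloadOpts)

func DontRecurse(table string) Opt {
	return func(o *downloadOpts) { o.dontRecurse[table] = true }
}

func LimitSize(limit int) Opt {
	return func(o *downloadOpts) { o.limit = limit }
}

// apply builds the defaults, then runs each option in order.
func apply(opts ...Opt) downloadOpts {
	o := downloadOpts{dontRecurse: map[string]bool{}}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := apply(DontRecurse("purchase"), LimitSize(10000))
	fmt.Println(o.dontRecurse["purchase"], o.limit) // true 10000
}
```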

type Postgreser

type Postgreser interface {
	Exec(context.Context, string, ...interface{}) (pgconn.CommandTag, error)
	Query(context.Context, string, ...interface{}) (pgx.Rows, error)
	QueryRow(context.Context, string, ...interface{}) pgx.Row
	CopyFrom(ctx context.Context, tableName pgx.Identifier, columnNames []string, rowSrc pgx.CopyFromSource) (int64, error)
	SendBatch(context.Context, *pgx.Batch) pgx.BatchResults
}

Postgreser does postgres things. github.com/jackc/pgx/v4/pgxpool.Pool is one such implementation.

type RecordID

type RecordID struct {
	Table      string
	PrimaryKey any
}

func GetRowIdentifier

func GetRowIdentifier(pks map[string]string, row map[string]any) RecordID

func (RecordID) String

func (r RecordID) String() string

Directories

Path Synopsis
integrations package houses some utility functions for making or testing database integrations.
