templater

package module
v1.5.2
Published: Nov 21, 2022 License: GPL-2.0 Imports: 18 Imported by: 0

README


templater

Quick Install

go install github.com/mr-joshcrane/templater/cmd/templater@latest

Usage

$ templater [FIELDS_TO_UNPACK ...]

FIELDS_TO_UNPACK is an optional list of fields that contain JSON objects and are capable of further unpacking.
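
For example, to unpack two JSON columns in one pass (the field names here are hypothetical):

$ templater statistics metadata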


Why would you use templater?

Data engineering often requires taking raw, untyped, and unsanitised data and running it through a series of preliminary transformations before it can be presented in its final format.

Templater was created with DBT as the transformation tool in mind, and is designed to generate a complete DBT project from a set of CSV files. These CSVs are typically exported Snowflake tables, but any CSVs can be used.

Theoretically templater can be used with any data warehouse solution, but it has only been tested with Snowflake.

Data doesn't always play nice

Sometimes you'll have some untyped data that needs to be typed. This usually isn't difficult, but is very tedious and labour intensive.

Sometimes you will have column names that contain special characters that are not allowed in Snowflake (without special handling).

Sometimes you will have nested JSON data that needs to be flattened before it can be used for any real business application.

Templater can be used to alleviate some of the burden here!


Imagine we had a table in Snowflake called ENERGY and we dumped it out to a CSV file as ENERGY.csv:

| sourceAsMetric | target(idealised) | value(per million tonnes) | statistics |
| --- | --- | --- | --- |
| Bio-conversion(natural) | Liquid | 124.729 | "{ ""attributes"": { ""on_hand"": false, ""available_in"": ""days"", ""componentDemands"": [99,2.5,"{5,2,5}"]}}" |
| BiofuelImports | Solid | 35 | "{ ""attributes"": { ""on_hand"": true, ""available_in"": ""minutes"", ""componentDemands"": [1,0,"{0.1,9,0}"]}}" |
| coal imports(55%) | Coal | null | "{ ""attributes"": { ""on_hand"": true, ""available_in"": ""hours"", ""componentDemands"": [1,0,"{0.1,00.1,NA}"]}}" |

Gross! As we can see, the format of this data leaves much to be desired. We could munge it with DBT, but that would be a non-trivial amount of effort!

We can use templater to generate a DBT project that will transform this data into something more useful.

Let's call templater and indicate that we want to unpack the CSV field called statistics, which appears to be a JSON field.

$ templater statistics

Templater will generate the core of a new DBT project, including our transformation below.

output/transform/TRANS01_ENERGY.sql

{{ config(materialized='table') }}
SELECT
  "statistics":"attributes"."available_in"::STRING AS ATTRIBUTES__AVAILABLE_IN
  ,"statistics":"attributes"."componentDemands"::ARRAY AS ATTRIBUTES__COMPONENT_DEMANDS
  ,"statistics":"attributes"."on_hand"::BOOLEAN AS ATTRIBUTES__ON_HAND
  ,"sourceAsMetric"::STRING AS SOURCE_AS_METRIC
  ,"target(idealised)"::STRING AS TARGET_IDEALISED
  ,"value(per million tonnes)"::STRING AS VALUE_PER_MILLION_TONNES
FROM
  {{ source('TEMPLATER', 'ENERGY') }}

We can see that templater has:

  1. Generated our transformation SQL
  2. Tried to infer Snowflake Types for each column
  3. Normalised the column names to be Snowflake friendly

It has also generated some suggested DBT models that you can tweak to your liking in output/transform/_models_schema.yml, which will go a long way when you're trying to generate some dbt docs.

version: 2
models:
  - name: TRANS01_ENERGY
    columns:
      - name: ATTRIBUTES__AVAILABLE_IN
      - name: ATTRIBUTES__COMPONENT_DEMANDS
      - name: ATTRIBUTES__ON_HAND
      - name: SOURCE_AS_METRIC
      - name: TARGET_IDEALISED
      - name: VALUE_PER_MILLION_TONNES
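
Since the generated column entries support optional descriptions and tests, a hypothetical tweak (not generated output) might look like:

version: 2
models:
  - name: TRANS01_ENERGY
    columns:
      - name: ATTRIBUTES__ON_HAND
        description: Whether the component is currently on hand
        tests:
          - not_null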

Documentation


Constants

This section is empty.

Variables

View Source
var SnowflakeTypes = map[string]string{
	"string": "STRING",
	"int":    "INTEGER",
	"float":  "FLOAT",
	"struct": "OBJECT",
	"list":   "ARRAY",
	"null":   "VARCHAR",
	"bool":   "BOOLEAN",
}

SnowflakeTypes is a map of CUE types to Snowflake types.

CUE Lang Type Reference: https://cuelang.org/docs/references/spec/#types.

Snowflake Type Reference: https://docs.snowflake.com/en/sql-reference/data-types.html.
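As a quick sketch of how the mapping is applied, a CUE type name looks up the Snowflake type that the generated SQL casts to:

package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// A CUE "float" becomes a Snowflake FLOAT cast; a CUE "list" becomes ARRAY.
	fmt.Println(templater.SnowflakeTypes["float"]) // FLOAT
	fmt.Println(templater.SnowflakeTypes["list"])  // ARRAY
}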

Functions

func CleanTableName

func CleanTableName(path string) string

CleanTableName derives a table name from a file name in a Snowflake-friendly format.
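No example ships with this function, so here is a minimal sketch of the intent; the file name is hypothetical, and the result is an assumption based on ENERGY.csv becoming the ENERGY table above:

package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// Assumed behaviour: the extension is dropped and the base name is
	// normalised, e.g. ENERGY_DATA for the path below.
	fmt.Println(templater.CleanTableName("exports/energy data.csv"))
}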

func ContainsNonLeadingArray

func ContainsNonLeadingArray(path string) bool
Example
package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	path1 := "meta.mass_edit_custom_type_ids[123]"
	path2 := "meta.mass_edit_custom_type_ids"
	path3 := "[123]meta.mass_edit_custom_type_ids"
	fmt.Println(templater.ContainsNonLeadingArray(path1))
	fmt.Println(templater.ContainsNonLeadingArray(path2))
	fmt.Println(templater.ContainsNonLeadingArray(path3))
}
Output:

true
false
false

func EscapePath

func EscapePath(s string) string

EscapePath escapes each section of the path into "delimited identifiers". If we didn't do this, we'd end up with a path like: foo.bar.baz instead of "foo"."bar"."baz", which would cause errors.

Reference: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html#delimited-identifiers.
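The doc comment's own example, as a runnable sketch:

package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// Each dot-separated section becomes a quoted, delimited identifier.
	fmt.Println(templater.EscapePath("foo.bar.baz")) // "foo"."bar"."baz"
}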

func GenerateColumnsSQL

func GenerateColumnsSQL(f map[string]Field) string

GenerateColumnsSQL generates the SQL required to declare, rename, and typecast the columns of a table in a DBT Project Model.

Example
package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	fields := map[string]templater.Field{
		"Team": {
			Path:         "Team",
			Node:         "Team",
			InferredType: "STRING",
		},
		"Payroll(millions)": {
			Path:         "Payroll(millions)",
			Node:         "Payroll(millions)",
			InferredType: "FLOAT",
		},
		"Wins": {
			Path:         "Wins",
			Node:         "Wins",
			InferredType: "INTEGER",
		},
	}
	fmt.Println(templater.GenerateColumnsSQL(fields))
}
Output:

  "Payroll(millions)"::FLOAT AS PAYROLL_MILLIONS
  ,"Team"::STRING AS TEAM
  ,"Wins"::INTEGER AS WINS

func GenerateReferenceSQL

func GenerateReferenceSQL(table string) string

GenerateReferenceSQL generates a relation for a referenced model in a DBT Project Model.

Reference: https://docs.getdbt.com/reference/dbt-jinja-functions/ref.
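
This function has no shipped example; by analogy with GenerateSourceSQL and GenerateTagsSQL below, here is a sketch whose expected output is an assumption:

package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// Assumed output, by analogy with the source example:
	//   {{ ref('A_TABLENAME') }}
	fmt.Println(templater.GenerateReferenceSQL("A_TableName"))
}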

func GenerateSourceSQL

func GenerateSourceSQL(project, table string) string

GenerateSourceSQL generates a relation for a source table in a DBT Project Model.

Reference: https://docs.getdbt.com/reference/dbt-jinja-functions/source

Example
package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	PROJECT := "A_ProjectName"
	TABLE := "A_TableName"
	fmt.Println(templater.GenerateSourceSQL(PROJECT, TABLE))
}
Output:

  {{ source('A_PROJECTNAME', 'A_TABLENAME') }}

func GenerateTagsSQL

func GenerateTagsSQL(project, table string) string

GenerateTagsSQL generates the config block tags suitable for use in a DBT Project Model.

Reference: https://docs.getdbt.com/reference/resource-configs/tags

Example
package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	PROJECT := "A_ProjectName"
	TABLE := "A_TableName"
	fmt.Println(templater.GenerateTagsSQL(PROJECT, TABLE))
}
Output:

{{ config(tags=['A_PROJECTNAME', 'A_TABLENAME']) }}

func Main

func Main() int

Main is the entrypoint for the templater. Working in the context of the current working directory as an fs.FS, and taking os.Args as a list of fields to unpack, it will generate the following artifacts (sketched below):

  • A [transform] directory containing the DBT SQL transformations of the tables.
  • A [public] directory containing the DBT SQL clone transforms.
  • Schemas for source, transform, and public models.
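
Based on the ENERGY walkthrough above, the generated layout looks roughly like this (the contents of the public directory are an assumption):

output/
  transform/
    TRANS01_ENERGY.sql
    _models_schema.yml
  public/
    ...            clone transforms and their schema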

func NormaliseKey

func NormaliseKey(s string) string

NormaliseKey takes a key and attempts to normalise it to a Snowflake-friendly format. It will:

  • Convert from camelCase to SCREAMING_SNAKE_CASE
  • Cast to uppercase
  • Remove any non-underscore/non-alphanumeric characters
  • Remove any leading/trailing spaces
  • Convert spaces to underscores
  • Replace any double underscores with single underscores
  • Replace any dots with double underscores
Example
package main

import (
	"fmt"

	"github.com/mr-joshcrane/templater"
)

func main() {
	rule_uppercased := "thisisakey"
	rule_spaces_to_underscore := "this is a key"
	rule_nonalphanumeric_stripped := "this%^@is``a()*key"
	rule_dot_separators_to_double_underscore := "json.payload.and_children"
	rule_leading_and_trailing_space_trimmed := "       THISISAKEY          "
	rule_parenthesised_words_considered_word_boundaries := "(THIS)IS(A)KEY"
	rule_camel_case_considered_separate_words := "thisIsAKey"

	fmt.Println(templater.NormaliseKey(rule_uppercased))
	fmt.Println(templater.NormaliseKey(rule_spaces_to_underscore))
	fmt.Println(templater.NormaliseKey(rule_nonalphanumeric_stripped))
	fmt.Println(templater.NormaliseKey(rule_dot_separators_to_double_underscore))
	fmt.Println(templater.NormaliseKey(rule_leading_and_trailing_space_trimmed))
	fmt.Println(templater.NormaliseKey(rule_parenthesised_words_considered_word_boundaries))
	fmt.Println(templater.NormaliseKey(rule_camel_case_considered_separate_words))

}
Output:

THISISAKEY
THIS_IS_A_KEY
THIS_IS_A_KEY
JSON__PAYLOAD__AND_CHILDREN
THISISAKEY
THIS_IS_A_KEY
THIS_IS_A_KEY

func UnmarshalJSONFromCUE added in v1.4.1

func UnmarshalJSONFromCUE(c cue.Value) (cue.Value, error)

UnmarshalJSONFromCUE takes a cue.Value that is assumed to be a JSON string and attempts to unmarshal it, returning an error if unable to do so.
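
A minimal sketch, assuming the input cue.Value holds a JSON string such as the statistics column above:

package main

import (
	"fmt"

	"cuelang.org/go/cue/cuecontext"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// A CUE string value whose contents are embedded JSON.
	v := cuecontext.New().CompileString(`"{\"on_hand\": true}"`)
	unpacked, err := templater.UnmarshalJSONFromCUE(v)
	if err != nil {
		panic(err)
	}
	fmt.Println(unpacked) // a cue.Value for the parsed JSON object
}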

func Unpack

func Unpack(t *Table, c cue.Value, opts ...NameOption)

Unpack constructs a Field from a cue.Value and adds it to the Table. A NameOption can be passed to modify the path of the field. Objects are recursively unpacked. Arrays are not.
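
A rough sketch, assuming the caller supplies a Table with an initialised Fields map:

package main

import (
	"fmt"

	"cuelang.org/go/cue/cuecontext"

	"github.com/mr-joshcrane/templater"
)

func main() {
	t := &templater.Table{Name: "ENERGY", Fields: map[string]templater.Field{}}
	// The object is unpacked recursively; the array is kept whole.
	v := cuecontext.New().CompileString(`{attributes: {on_hand: true, demands: [1, 2]}}`)
	templater.Unpack(t, v)
	for path, f := range t.Fields {
		fmt.Println(path, f.InferredType)
	}
}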

Types

type Column

type Column struct {
	Name        string   `yaml:"name"`
	Description *string  `yaml:"description,omitempty"`
	Tests       []string `yaml:"tests,omitempty"`
}

Column: DBT Reference: https://docs.getdbt.com/reference/resource-properties/columns.

type Field

type Field struct {
	Node         string
	Path         string
	InferredType string
}

A Field represents a column and information about how it should be transformed.

Node: Represents the post-transformation target identifier in Snowflake.

Path: Represents the pre-transformation path to the data in the source table.

InferredType: Represents the current best guess at the Snowflake type, inferred from exemplars.

type Model

type Model struct {
	Name        string   `yaml:"name"`
	Description *string  `yaml:"description,omitempty"`
	Tests       []Test   `yaml:"tests,omitempty"`
	Columns     []Column `yaml:"columns"`
}

Model: DBT Reference: https://docs.getdbt.com/docs/dbt-cloud-apis/metadata-schema-model.

type Models

type Models struct {
	Version int     `yaml:"version"`
	Models  []Model `yaml:"models"`
}

Models: DBT Reference: https://docs.getdbt.com/docs/dbt-cloud-apis/metadata-schema-model.

func GenerateProjectModel

func GenerateProjectModel(tables []*Table) Models

GenerateProjectModel generates the Models required in the _models_schema.yml files that help define a (potentially multi-table) DBT project.

type NameOption

type NameOption func(string) string

type SQLTemplate

type SQLTemplate struct {
	Tags      string
	Columns   string
	Source    string
	Reference string
}

SQLTemplate is an intermediate data structure that represents the table to be rendered as a SQL Model in a DBT Project.

type Source

type Source struct {
	Name   string   `yaml:"name"`
	Schema string   `yaml:"schema"`
	Tables []Column `yaml:"tables,omitempty"`
}

Source: DBT Reference: https://docs.getdbt.com/reference/dbt-jinja-functions/source.

type Sources

type Sources struct {
	Version int      `yaml:"version"`
	Sources []Source `yaml:"sources"`
}

Sources: DBT Reference: https://docs.getdbt.com/reference/dbt-jinja-functions/source.

type Table

type Table struct {
	Name    string
	Project string
	Fields  map[string]Field
	// contains filtered or unexported fields
}

A Table represents a source table. It is the intermediate representation of the untyped semi-structured data.

func (*Table) InferFields

func (t *Table) InferFields(iter cue.Iterator, unpackPaths ...string) error

InferFields takes a cue.Iterator and walks through it, adding fields to the table. It will also unpack any JSON fields whose column name matches one of the (optional) unpackPaths.
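
A rough sketch, assuming rows arrive as a CUE list of structs (the Table initialisation details are assumptions):

package main

import (
	"fmt"

	"cuelang.org/go/cue/cuecontext"

	"github.com/mr-joshcrane/templater"
)

func main() {
	// Two exemplar rows; "statistics" holds embedded JSON to unpack.
	v := cuecontext.New().CompileString(
		`[{a: "x", statistics: "{\"n\": 1}"}, {a: "y", statistics: "{\"n\": 2}"}]`)
	iter, err := v.List()
	if err != nil {
		panic(err)
	}
	t := &templater.Table{Name: "EXAMPLE", Fields: map[string]templater.Field{}}
	if err := t.InferFields(iter, "statistics"); err != nil {
		panic(err)
	}
	for path, f := range t.Fields {
		fmt.Println(path, f.InferredType)
	}
}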

type Test

type Test struct {
	Test string `yaml:"test,omitempty"`
}

Test: DBT Reference: https://docs.getdbt.com/reference/resource-properties/tests.

Directories

Path Synopsis
cmd
