describe

v0.0.8
Published: Aug 21, 2023 License: Apache-2.0 Imports: 7 Imported by: 0

README

Describe

Background

Describe is a package and a command to generate descriptive plots of fields in ClickHouse tables or queries. For fields of type float, describe generates quantile plots. For all other fields, it generates histograms.

Describe writes the plots to files, displays them in a browser, or both.

Describe depends on orca.

Parameters

Help
  • -h

Prints help.

ClickHouse credentials
  • -user <user>. ClickHouse user name.
  • -pw <pw>. ClickHouse password.

Source of the data

One of -q and -t needs to be specified. If a table is specified with -t, Describe includes the comment for each field in the title of the plot.

  • -q <"SELECT * FROM...">. Query to pull the data, enclosed in quotes.
  • -t <db.table>. Table name.
Outputs
  • -xy <'xField,yField1,..yFieldk'>. If this flag is used, an XY plot is created. The input must be a query. The field names to plot are enclosed in quotes and are comma-separated. This syntax works, too:
    • -xy <'field'> which plots 'field' against an index 0,1,2...
    • -lineType <'m,l,b'>. Line types for XY plots (m=marker, l=line, b=boxplot).
    • -color. Colors for XY plots (e.g. 'black', 'red', ...).
    • -f - Filename. Optional root file name for output graphs (no extension).
  • -i <image type>. Image types. One or more of: png, jpeg, html, pdf, webp, svg, eps, emf. If none is specified, the plot(s) are sent to the browser.
  • -d - Directory. Directory for output images. Defaults to the working directory.
  • -b <browser>. Browser for images. If omitted, the system default is used.
  • -show - If included, the plot is (also) sent to the browser. -show is assumed if -d and -i are omitted.
  • -title - If included, the plots are titled with this value.
  • -subtitle - Optional subtitle.
  • -threads - If included, the maximum number of threads for ClickHouse to use.
  • -width - Plot width (default 1000).
  • -height - Plot height (default 800).
  • -xlim <min,max> x-axis range.
  • -ylim <min,max> y-axis range.
  • -log Plot y-axis on log scale.
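
The -xlim and -ylim flags take a min,max pair, and RunDef holds the limits as []float64. As an illustration only, here is a minimal sketch of how such a value could be parsed. The helper name parseLimits is hypothetical and not part of the package, and the requirement that min be less than max is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLimits is a hypothetical helper (not part of describe) that converts
// a "min,max" flag value into a two-element slice.
func parseLimits(arg string) ([]float64, error) {
	parts := strings.Split(arg, ",")
	if len(parts) != 2 {
		return nil, fmt.Errorf("expected min,max but got %q", arg)
	}

	lims := make([]float64, 2)
	for ind, part := range parts {
		val, err := strconv.ParseFloat(strings.TrimSpace(part), 64)
		if err != nil {
			return nil, fmt.Errorf("bad limit %q: %w", part, err)
		}
		lims[ind] = val
	}

	// Assumption: a range is only meaningful when min is below max.
	if lims[0] >= lims[1] {
		return nil, fmt.Errorf("min %v must be less than max %v", lims[0], lims[1])
	}

	return lims, nil
}

func main() {
	lims, err := parseLimits("0, 100")
	fmt.Println(lims, err)
}
```

Returning an error instead of silently ignoring a malformed range keeps a typo on the command line from producing a plot with unexpected axes.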

Images are placed in subdirectories of -d according to image type. For example, if you have

-i png,html

then two subdirectories are created - png and html - for images of the corresponding type.

Each image file is named after its field.

Missing Values.

By default, the results exclude values that indicate the data is missing. Use -miss to disable this feature.

  • -mF <value>. Value that indicates a missing float. Default: -1.

  • -mI <value>. Value that indicates a missing int. Default: -1.

  • -mS <value>. Value that indicates a missing string. Default: !.

  • -mD <value>. Value that indicates a missing date. Default: 19700101.

  • -miss - If present, missing-value filter is disabled.

  • -markdown <filename> - If present, the graphs are bundled into a markdown file <filename>.
    Requires the -d parameter to point to the directory of images. If the images are html files, the markdown file links to them; otherwise, the images are embedded. The markdown file can then be included in a Jekyll (GitHub Pages) site or converted to PDF. If the -d path is relative, the links are relative. -markdown is run standalone, separately from creating the images; otherwise another flag would be needed to specify which image type to use.
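
To illustrate the effect of the missing-value filter, here is a minimal in-memory sketch using the default sentinels for -mF, -mI and -mS listed above. The helper dropMissing is hypothetical; the package may well apply the filter in the ClickHouse query instead, so this shows the outcome rather than the implementation.

```go
package main

import "fmt"

// Default sentinels taken from the -mF, -mI and -mS flag descriptions.
const (
	missFloat  = -1.0
	missInt    = -1
	missString = "!"
)

// dropMissing is a hypothetical helper (not part of describe) that returns
// the values that differ from the missing-value sentinel.
func dropMissing[T comparable](vals []T, miss T) []T {
	kept := make([]T, 0, len(vals))
	for _, val := range vals {
		if val != miss {
			kept = append(kept, val)
		}
	}

	return kept
}

func main() {
	fmt.Println(dropMissing([]float64{0.8, missFloat, 0.95}, missFloat))
	fmt.Println(dropMissing([]int{3, missInt, 7}, missInt))
	fmt.Println(dropMissing([]string{"purchase", missString, "refi"}, missString))
}
```

With -miss, no values would be dropped and the sentinels would appear in the plots as ordinary levels or values.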

Parameter Combinations
  1. -i requires -d
  2. -show is implied if -i is omitted.
  3. If -d is omitted, -d is set to the working directory.
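
The implied settings in rules 2 and 3 can be sketched as a small defaulting step. The options struct and applyDefaults function are hypothetical stand-ins for illustration; they are not the package's RunDef or its flag handling.

```go
package main

import (
	"fmt"
	"os"
)

// options is a hypothetical stand-in for the parsed flags; it is not the
// package's RunDef.
type options struct {
	ImageTypes string // value of -i, empty if omitted
	OutDir     string // value of -d, empty if omitted
	Show       bool   // true if -show was given
}

// applyDefaults fills in the implied settings listed above.
func applyDefaults(opts options) (options, error) {
	// Rule 2: -show is implied if -i is omitted.
	if opts.ImageTypes == "" {
		opts.Show = true
	}

	// Rule 3: if -d is omitted, it is set to the working directory.
	if opts.OutDir == "" {
		wd, err := os.Getwd()
		if err != nil {
			return opts, err
		}
		opts.OutDir = wd
	}

	return opts, nil
}

func main() {
	opts, err := applyDefaults(options{})
	fmt.Println(opts.Show, opts.OutDir != "", err)
}
```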
Examples
describe -q "select purpose from bk.loan" -user <user> -pw <pw>  

Runs the query, sending the graph to the default browser.

describe -q "select * from bk.loan" -i png,html -user <user> -pw <pw>

Runs the query, pulling all the fields from bk.loan. Both png and html files are produced. These are placed in png and html subdirectories of the current working directory. One could, instead, use:

describe -t bk.loan -i png,html -user <user> -pw <pw>

which will include the field comments in the graphs.

describe -t bk.loan -d figs/png -markdown figs.md

Creates a markdown file, figs.md, in the current working directory with the images in figs/png.
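
As described under -markdown, html files are linked while other image types are embedded. A minimal sketch of that rule follows; markdownEntry is a hypothetical helper, and the exact text the package writes for each entry is not documented here, so the entry format is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// markdownEntry is a hypothetical helper (not part of describe). HTML files
// become links; other image types are embedded as images. The label is the
// file name without its extension.
func markdownEntry(path string) string {
	ext := filepath.Ext(path)
	name := strings.TrimSuffix(filepath.Base(path), ext)

	if strings.EqualFold(ext, ".html") {
		return fmt.Sprintf("[%s](%s)", name, path)
	}

	return fmt.Sprintf("![%s](%s)", name, path)
}

func main() {
	fmt.Println(markdownEntry("figs/png/ltv.png"))
	fmt.Println(markdownEntry("figs/html/ltv.html"))
}
```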

describe -q 'select ltv, cltv from bk.loan' -xy 'ltv,cltv' -show

Produces a cross plot of cltv (y-axis) vs ltv (x-axis) in the default browser.
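
The -xy value lists the x field first, followed by one or more y fields; a single field is plotted against an index. A minimal sketch of splitting such a value is shown here. The helper splitXY is hypothetical and not part of the package, and returning an empty x field to signal "use an index" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// splitXY is a hypothetical helper (not part of describe) that separates the
// value of -xy into the x field and the y fields. An empty xField means the
// single field is plotted against an index 0,1,2...
func splitXY(arg string) (xField string, yFields []string, err error) {
	var fields []string
	for _, fld := range strings.Split(arg, ",") {
		if fld = strings.TrimSpace(fld); fld != "" {
			fields = append(fields, fld)
		}
	}

	switch len(fields) {
	case 0:
		return "", nil, fmt.Errorf("no fields in %q", arg)
	case 1:
		return "", fields, nil
	default:
		return fields[0], fields[1:], nil
	}
}

func main() {
	xField, yFields, err := splitXY("ltv,cltv")
	fmt.Println(xField, yFields, err)
}
```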

Images

Histograms are not produced for fields that have more than 1000 distinct levels.

Documentation

Overview

Package describe generates descriptive plots of ClickHouse tables and query results. There are two types of images generated: histograms and quantile plots. Quantile plots are generated for fields of type float. Histograms are generated for fields of type string, date and int. If you want a quantile plot of an int field, cast it as float.
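
The rule for choosing a plot type can be sketched as a small lookup. The helper plotTypeFor is hypothetical, and representing field types as plain strings is an assumption; the package works from the ClickHouse field definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// plotTypeFor is a hypothetical helper (not part of describe). It returns
// "quantile" for float fields and "histogram" for string, date and int
// fields, following the overview above.
func plotTypeFor(kind string) (string, error) {
	switch strings.ToLower(kind) {
	case "float":
		return "quantile", nil
	case "string", "date", "int":
		return "histogram", nil
	}

	return "", fmt.Errorf("unsupported field type %q", kind)
}

func main() {
	for _, kind := range []string{"float", "int", "date", "string"} {
		plotType, err := plotTypeFor(kind)
		fmt.Println(kind, plotType, err)
	}
}
```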

Values deemed "missing" in a field may be omitted from a graph.

In addition, there is a func to create a simple markdown file of the images created.

The command is in the describe subdirectory.

Constants

const (
	SkipLevel = 1000 // a histogram isn't made if there are more than this many levels
)

Variables

This section is empty.

Functions

func Drive

func Drive(runDetail *RunDef, conn *chutils.Connect) error

Drive runs the appropriate task

func FieldPlot

func FieldPlot(runDetail *RunDef, xField, yField, where, plotType, title string, conn *chutils.Connect) error

FieldPlot builds the plot for a single field.

  • runDetail. Run settings. The query (Qry), the output directory (OutDir), the image type(s) to produce (ImageTypesCh) and whether to push the plot to the browser (Show) are taken from here.
  • xField, yField. Field(s) to keep from the query.
  • plotType. ("histogram" or "quantile")
  • title. Title for plot.
  • conn. Connector to ClickHouse.

func Markdown

func Markdown(runDetail *RunDef) error

Markdown creates a simple markdown file of the images in OutDir

func Multiple

func Multiple(runDetail *RunDef, conn *chutils.Connect) error

Multiple creates the graphs for a query (as opposed to a table)

func Table

func Table(runDetail *RunDef, conn *chutils.Connect) error

Table generates plots for all the fields in the table.

func XY added in v0.0.3

func XY(runDetail *RunDef, conn *chutils.Connect) error

XY creates an XY graph for a query.

Types

type RunDef

type RunDef struct {
	Task TaskType // the kind of task to run

	Show         bool                    // if true, send the plots to the browser
	ImageTypesCh []utilities.PlotlyImage // type(s) of image files to create

	// one of these two must be specified
	Qry      string // query to pull data
	Table    string // table to pull data
	XY       string
	LineType string
	Color    string
	Box      bool
	Log      bool

	Title    string
	SubTitle string

	Width  float64
	Height float64

	Xlim []float64
	Ylim []float64

	OutDir   string // directory for image files
	FileName string

	ImageTypes string // types of images to create

	MissStr, MissDt, MissInt, MissFlt any // values which indicate a field value is missing. Ignored if nil.

	Markdown string // if not empty, the name of a markdown file to create with the images in OutDir.

	Fds *chutils.TableDef // field defs of query results (not required if describing a table).
}

The RunDef struct holds the elements required to direct describe's activities.

type TaskType

type TaskType int

TaskType is what we're asked to do:

  • TaskQuery: describe results of query
  • TaskTable: describe all the fields in a table
  • TaskXY: create an XY graph for a query

const (
	TaskNone TaskType = 0 + iota
	TaskQuery
	TaskTable
	TaskXY
)
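
As an illustration of how the flags might map to these task values, here is a minimal sketch. The local taskType and the chooseTask function are hypothetical and are not the package's code. Treating "both -q and -t given" as an error is an assumption; the rule that -xy requires a query comes from the flag description.

```go
package main

import (
	"errors"
	"fmt"
)

// taskType mirrors the TaskType constants above so the sketch is
// self-contained.
type taskType int

const (
	taskNone taskType = 0 + iota
	taskQuery
	taskTable
	taskXY
)

// chooseTask is a hypothetical helper (not part of describe) that picks a
// task from the values of -q, -t and -xy.
func chooseTask(qry, table, xy string) (taskType, error) {
	switch {
	case qry != "" && table != "":
		// Assumption: exactly one data source is allowed.
		return taskNone, errors.New("specify only one of -q and -t")
	case xy != "" && qry == "":
		return taskNone, errors.New("-xy requires a query (-q)")
	case xy != "":
		return taskXY, nil
	case qry != "":
		return taskQuery, nil
	case table != "":
		return taskTable, nil
	}

	return taskNone, errors.New("one of -q and -t needs to be specified")
}

func main() {
	task, err := chooseTask("select ltv, cltv from bk.loan", "", "ltv,cltv")
	fmt.Println(task == taskXY, err)
}
```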
