imports

Package version: v0.2.0

Warning: This package is not in the latest version of its module.

Published: Jun 30, 2022 License: MIT Imports: 17 Imported by: 0

Documentation

Overview

Package imports provides functions that read data stored in another format (CSV, JSON/JSONL or Parquet) and use it to populate a DataFrame. It is the inverse of the exports package.

Constants

This section is empty.

Variables

This section is empty.

Functions

func LoadFromCSV

func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)

LoadFromCSV loads data from a CSV file into a DataFrame.

func LoadFromJSON

func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)

LoadFromJSON loads data from a JSONL file or a JSON array into a DataFrame. The fields present in the first row determine which fields are imported from all subsequent rows.

See: https://jsonlines.org for details on the file format.

func LoadFromParquet

func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)

LoadFromParquet loads data from a Parquet file into a DataFrame.

NOTE: This function is experimental and the implementation is likely to change.

Example (gist):

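import (
	"context"
	"log"
	"github.com/jdfergason/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

func main() {
	fr, err := local.NewLocalFileReader("file.parquet") // fr is a source.ParquetFile
	if err != nil {
		log.Fatal(err)
	}
	defer fr.Close()
	df, err := imports.LoadFromParquet(context.Background(), fr)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(df) // df is a *dataframe.DataFrame
}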

Types

type CSVLoadOptions

type CSVLoadOptions struct {

	// Comma is the field delimiter.
	// The default value is ',' when CSVLoadOptions is not provided.
	// Comma must be a valid rune and must not be \r, \n,
	// or the Unicode replacement character (0xFFFD).
	Comma rune

	// Comment, if not 0, is the comment character. Lines beginning with the
	// Comment character without preceding whitespace are ignored.
	// With leading whitespace the Comment character becomes part of the
	// field, even if TrimLeadingSpace is true.
	// Comment must be a valid rune and must not be \r, \n,
	// or the Unicode replacement character (0xFFFD).
	// It must also not be equal to Comma.
	Comment rune

	// If TrimLeadingSpace is true, leading white space in a field is ignored.
	// This is done even if the field delimiter, Comma, is white space.
	TrimLeadingSpace bool

	// LargeDataSet should be set to true for large datasets.
	// It performs a basic parse of the full dataset first, so that the capacity of the
	// DataFrame's underlying slices can be set before the data is fully processed.
	// Preallocating memory can improve speed; benchmark it for your use case.
	LargeDataSet bool

	// DictateDataType is used to inform LoadFromCSV what the true underlying data type is for a given field name.
	// The key must be the case-sensitive field name.
	// The value for a given key must have the same type as the field's data,
	// e.g. "" for a string or int64(0) for an int64. Only the type matters, not the value itself.
	//
	// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
	DictateDataType map[string]interface{}

	// NilValue allows you to set what string value in the CSV file should be interpreted as a nil value for
	// the purposes of insertion.
	//
	// Common values are: NULL, \N, NaN, NA
	NilValue *string

	// InferDataTypes can be set to true if the underlying data type should be automatically detected.
	// Using DictateDataType is the recommended approach (especially for large datasets or memory-constrained systems).
	// DictateDataType always takes precedence when determining the type.
	// If the data type could not be detected, SeriesString is used.
	InferDataTypes bool

	// Headers must be set if the CSV file does not contain a header row. This must be nil if the CSV file contains a
	// header row.
	Headers []string
}

CSVLoadOptions is likely to change.
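The Comma, Comment and TrimLeadingSpace fields have the same names and documentation as the corresponding fields of the standard library's encoding/csv.Reader. Assuming they are passed through to such a reader (this page does not say so), their combined effect, together with one possible way of mapping a NilValue string to nil, can be sketched with the standard library alone. parseWithNil is a hypothetical helper, and the ';' and '#' characters are arbitrary choices for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseWithNil is a hypothetical helper: it reads semicolon-separated data,
// skips lines starting with '#', trims leading spaces in each field, and
// turns any field equal to *nilValue into a nil *string.
func parseWithNil(data string, nilValue *string) ([][]*string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = ';'
	r.Comment = '#'
	r.TrimLeadingSpace = true

	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	out := make([][]*string, len(records))
	for i, rec := range records {
		out[i] = make([]*string, len(rec))
		for j := range rec {
			if nilValue != nil && rec[j] == *nilValue {
				continue // leave the entry as nil
			}
			out[i][j] = &rec[j]
		}
	}
	return out, nil
}

func main() {
	null := "NULL"
	data := "# exported by some tool\nid; score\n1; 9.5\n2; NULL\n"
	rows, err := parseWithNil(data, &null)
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		for _, f := range row {
			if f == nil {
				fmt.Print("<nil> ")
			} else {
				fmt.Print(*f, " ")
			}
		}
		fmt.Println()
	}
}
```

Note that TrimLeadingSpace is what lets " NULL" match the NilValue "NULL" here; without it the leading space would remain part of the field.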

type Converter

type Converter struct {
	ConcreteType  interface{}
	ConverterFunc GenericDataConverter
}

Converter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("dataframe.SeriesGeneric"). As a special case, if ConcreteType is time.Time, then a SeriesTime is used.

Example:

opts := imports.CSVLoadOptions{
   DictateDataType: map[string]interface{}{
      "Date": imports.Converter{
         ConcreteType: time.Time{},
         ConverterFunc: func(in interface{}) (interface{}, error) {
            return time.Parse("2006-01-02", in.(string))
         },
      },
   },
}

type GenericDataConverter

type GenericDataConverter func(in interface{}) (interface{}, error)

GenericDataConverter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("SeriesGeneric").
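Any function with the GenericDataConverter signature can serve as a ConverterFunc. The sketch below redeclares a local type with the same signature, so it compiles without the package, and defines a hypothetical yesNoToBool converter that turns "yes"/"no" strings into bool values. Like the Converter example above, it assumes CSV input reaches the converter as a string, but it checks the type assertion instead of letting a non-string input panic.

```go
package main

import (
	"fmt"
	"strings"
)

// GenericDataConverter mirrors the signature documented for
// imports.GenericDataConverter; it is redeclared here so the sketch is
// self-contained.
type GenericDataConverter func(in interface{}) (interface{}, error)

// yesNoToBool is a hypothetical converter for "yes"/"no" columns.
// It assumes the raw value arrives as a string.
func yesNoToBool(in interface{}) (interface{}, error) {
	s, ok := in.(string)
	if !ok {
		return nil, fmt.Errorf("expected string, got %T", in)
	}
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "yes", "y":
		return true, nil
	case "no", "n":
		return false, nil
	}
	return nil, fmt.Errorf("cannot convert %q to bool", s)
}

// Compile-time check that yesNoToBool matches the GenericDataConverter signature.
var _ GenericDataConverter = yesNoToBool

func main() {
	for _, in := range []interface{}{"Yes", "no", "maybe", 42} {
		out, err := yesNoToBool(in)
		fmt.Println(out, err)
	}
}
```

With the real package, such a function would presumably be wrapped in an imports.Converter with a bool ConcreteType; this page only documents the time.Time special case, so check how other ConcreteType values are handled before relying on it.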

type JSONLoadOptions

type JSONLoadOptions struct {

	// LargeDataSet should be set to true for large datasets.
	// It performs a basic parse of the full dataset first, so that the capacity of the
	// DataFrame's underlying slices can be set before the data is fully processed.
	// Preallocating memory can improve speed; benchmark it for your use case.
	LargeDataSet bool

	// DictateDataType is used to inform LoadFromJSON what the true underlying data type is for a given field name.
	// The key must be the case-sensitive field name.
	// The value for a given key must have the same type as the field's data,
	// e.g. "" for a string or int64(0) for an int64. Only the type matters, not the value itself.
	//
	// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
	DictateDataType map[string]interface{}

	// If ErrorOnUnknownFields is true, an error is returned when a row after the first contains a field
	// that was not present in the first row.
	ErrorOnUnknownFields bool

	// Path sets the location of the array containing the data to import. It uses dot notation relative to the root
	// JSON object. For JSONL files, it does nothing.
	//
	// NOTE: Not implemented.
	Path string
}

JSONLoadOptions is likely to change.
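DictateDataType (in both CSVLoadOptions and JSONLoadOptions) keys off the type of each sample value, not its contents. The idea can be sketched with a type switch: a hypothetical parseAs inspects the dynamic type of the sample and picks a strconv parser accordingly. The supported types here (string, int64, float64, bool) are an illustrative assumption; this page does not list which types the package accepts beyond the NewSerieser note for custom Series.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAs is a hypothetical helper that converts the raw string s into a
// value of the same type as sample. Only the type of sample matters; its
// value is ignored, as described for DictateDataType.
func parseAs(sample interface{}, s string) (interface{}, error) {
	switch sample.(type) {
	case string:
		return s, nil
	case int64:
		return strconv.ParseInt(s, 10, 64)
	case float64:
		return strconv.ParseFloat(s, 64)
	case bool:
		return strconv.ParseBool(s)
	default:
		return nil, fmt.Errorf("unsupported sample type %T", sample)
	}
}

func main() {
	// Sample values, as they would appear in a DictateDataType map.
	dictate := map[string]interface{}{
		"id":    int64(0),
		"price": float64(0),
		"name":  "",
	}
	row := map[string]string{"id": "42", "price": "9.99", "name": "widget"}
	for _, field := range []string{"id", "price", "name"} {
		v, err := parseAs(dictate[field], row[field])
		fmt.Printf("%s: %v (%T) err=%v\n", field, v, v, err)
	}
}
```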

type ParquetLoadOptions

type ParquetLoadOptions struct {
}

ParquetLoadOptions is likely to change.
