Documentation ¶
Overview ¶
Package files implements Cloud Files logic for getting, reading, and converting files into databases. It reads cloud (or local) files, gets lists of tables, and can scan through them using a distributed query engine.
Package files is a cloud (gcs, s3) and local file datasource that translates json and csv files into an appropriate interface for the qlbridge DataSource so we can run queries. Provides a FileHandler interface to allow custom file-type handling.
Index ¶
Constants ¶
const (
	// SourceType is the registered Source name in the qlbridge source registry
	SourceType = "cloudstore"
)
Variables ¶
var (
	// FileColumns are the default file-columns
	FileColumns = []string{"file", "table", "path", "size", "partition", "updated", "deleted", "filetype"}
)
var (
	// FileStoreLoader defines the interface for loading files
	FileStoreLoader func(ss *schema.SchemaSource) (cloudstorage.Store, error)
)
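Because FileStoreLoader is a package-level variable, a deployment can swap in its own store construction. A minimal sketch; the import paths and the newLocalStore helper are assumptions, and the actual store constructor is elided since it varies by cloudstorage version:

import (
	"github.com/araddon/qlbridge/datasource/files"
	"github.com/araddon/qlbridge/schema"
	"github.com/lytics/cloudstorage"
)

// newLocalStore is a hypothetical helper; build whatever cloudstorage.Store
// fits your deployment (gcs, s3, local-fs).
func newLocalStore(dir string) (cloudstorage.Store, error) {
	// ... construct a local-fs backed store for dir here ...
	return nil, nil
}

func init() {
	// Override how the files source obtains its backing store.
	files.FileStoreLoader = func(ss *schema.SchemaSource) (cloudstorage.Store, error) {
		return newLocalStore("/tmp/mockcloud")
	}
}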
Functions ¶
func NewFileSource ¶
NewFileSource provides a singleton manager for a particular Source Schema and File-Handler to read/manage all files from a source such as gcs folder x or s3 folder y.
func RegisterFileScanner ¶
func RegisterFileScanner(scannerType string, fh FileHandler)
RegisterFileScanner registers a file scanner maker, available by the provided @scannerType.
Types ¶
type FileHandler ¶
type FileHandler interface {
	// Each time the underlying FileStore layer finds a new file it hands it off
	// to the filehandler to determine if it is a File or not, to extract any
	// metadata such as partition, and to parse out fields that may exist in the File/Folder path
	File(path string, obj cloudstorage.Object) *FileInfo
	// Scanner creates a scanner for a particular file
	Scanner(store cloudstorage.Store, fr *FileReader) (schema.ConnScanner, error)
	// FileAppendColumns provides any additional columns that this file-handler is
	// going to contribute to the files list table, ie we are going to extract
	// column info from the folder paths and file-names, which is common.
	// Optional: may be nil
	FileAppendColumns() []string
}
FileHandler defines a file-type/format. Each format, such as csv, json, or a custom-protobuf file type of your choosing, has its own filehandler that knows how to read, parse, and scan that file type.

The file Reading, Opening, and Listing is a separate layer; see FileSource for the Cloudstorage layer.

So it is a factory to create Scanners for a specific format type such as csv or json.
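As an illustration, a skeletal FileHandler for a hypothetical tab-separated format could look like the sketch below; the import paths, the tables/<table>/<file>.tsv layout, and the tsvHandler type are all assumptions, and the actual row scanner is elided:

import (
	"strings"

	"github.com/araddon/qlbridge/datasource/files"
	"github.com/araddon/qlbridge/schema"
	"github.com/lytics/cloudstorage"
)

type tsvHandler struct{}

// File decides whether an object found by the FileStore layer belongs to
// this handler; returning nil means "not one of ours".
func (h *tsvHandler) File(path string, obj cloudstorage.Object) *files.FileInfo {
	if !strings.HasSuffix(obj.Name(), ".tsv") {
		return nil
	}
	// Assumed folder layout:  tables/<table>/<file>.tsv
	parts := strings.Split(obj.Name(), "/")
	if len(parts) < 3 {
		return nil
	}
	return &files.FileInfo{
		Name:     obj.Name(),
		Table:    parts[1],
		FileType: "tsv",
	}
}

// Scanner returns a row scanner for a single file. A real implementation
// would wrap fr.F (an io.Reader) in a format-specific reader; elided here.
func (h *tsvHandler) Scanner(store cloudstorage.Store, fr *files.FileReader) (schema.ConnScanner, error) {
	return nil, nil // ... build your scanner from fr.F ...
}

// FileAppendColumns: this handler contributes no extra columns to the files table.
func (h *tsvHandler) FileAppendColumns() []string { return nil }

func init() {
	// Make the handler available under the "tsv" scanner type.
	files.RegisterFileScanner("tsv", &tsvHandler{})
}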
type FileInfo ¶
type FileInfo struct {
	Name       string         // Name, Path of file
	Table      string         // Table name this file participates in
	FileType   string         // csv, json, etc
	Partition  int            // which partition
	Size       int            // Content-Length size in bytes
	AppendCols []driver.Value // Additional Column info extracted from file name/folder path
	// contains filtered or unexported fields
}
FileInfo is a struct of file info.
type FilePager ¶
type FilePager struct {
	schema.ConnScanner
	// contains filtered or unexported fields
}
FilePager acts like a Partitioned Data Source Conn, wrapping the underlying FileSource and paging through the list of files, scanning only those that match this pager's partition. By default the partition is -1, which means no partitioning.
func NewFilePager ¶
func NewFilePager(tableName string, fs *FileSource) *FilePager
NewFilePager creates a new default FilePager.
func (*FilePager) Columns ¶
func (m *FilePager) Columns() []string

Columns is part of the Conn interface, providing the column names for this table/conn.
func (*FilePager) Next ¶
func (m *FilePager) Next() schema.Message

Next is the iterator for the next message; it wraps the file Scanner and NextFile abstractions.
func (*FilePager) NextFile ¶
func (m *FilePager) NextFile() (*FileReader, error)
NextFile gets the next file, returning io.EOF after the last file.
func (*FilePager) NextScanner ¶
func (m *FilePager) NextScanner() (schema.ConnScanner, error)
NextScanner provides the next scanner, assuming each scanner represents a different file (a single source may have many files).
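For example, a caller can page through a table's files manually instead of relying on Next. A sketch, assuming the import path, that "appearances" is a placeholder table name, that io.EOF ends the file list (as it does for NextFile), and that a scanner's Next returns nil once its file is exhausted:

import (
	"io"

	"github.com/araddon/qlbridge/datasource/files"
)

func scanTable(fs *files.FileSource) error {
	pager := files.NewFilePager("appearances", fs)
	for {
		scanner, err := pager.NextScanner()
		if err == io.EOF {
			return nil // no more files in this pager's partition
		}
		if err != nil {
			return err
		}
		// Drain rows from this file before paging to the next one.
		for msg := scanner.Next(); msg != nil; msg = scanner.Next() {
			_ = msg // ... handle one row ...
		}
	}
}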
type FileReader ¶
type FileReader struct {
	*FileInfo
	F    io.Reader // Actual file reader
	Exit chan bool // exit channel to shutdown reader
}
FileReader bundles file info and access to the file, to supply to ScannerMakers.
type FileSource ¶
type FileSource struct {
	Partitioner string // random, ?? (date, keyed?)
	// contains filtered or unexported fields
}
FileSource is a Source for reading files and scanning them, allowing the contents to be treated as a database, like doing a full table scan in mysql. But you can partition across files.
- readers: s3, gcs, local-fs
- tablesource: translates lists of files into tables. Normally we would have multiple files per table (ie partitioned, per-day, etc)
- scanners: responsible for file-specific scanning/parsing (csv, json, etc)
- files table: a "table" of all the files from this cloud source
func (*FileSource) Open ¶
func (m *FileSource) Open(tableName string) (schema.Conn, error)
Open opens a connection to the given table; part of the Source interface.
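A usage sketch of Open; the import paths are assumed, "appearances"-style table names are placeholders, and the returned Conn is presumably a *FilePager (see NewFilePager), so it can be driven through schema.ConnScanner:

import (
	"fmt"

	"github.com/araddon/qlbridge/datasource/files"
	"github.com/araddon/qlbridge/schema"
)

func openScanner(fs *files.FileSource, table string) (schema.ConnScanner, error) {
	conn, err := fs.Open(table)
	if err != nil {
		return nil, err
	}
	// FilePager embeds schema.ConnScanner, so this assertion should hold.
	scanner, ok := conn.(schema.ConnScanner)
	if !ok {
		return nil, fmt.Errorf("files: expected a scanner conn, got %T", conn)
	}
	return scanner, nil
}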
func (*FileSource) Setup ¶
func (m *FileSource) Setup(ss *schema.SchemaSource) error
Setup the filesource with schema info
type PartitionedFileReader ¶
type PartitionedFileReader interface {
	// NextFile returns io.EOF on last file
	NextFile() (*FileReader, error)
}
PartitionedFileReader defines a file source that can page through files, getting the next file from its partition.
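A sketch of consuming this interface, leaning on the documented io.EOF contract (import paths assumed):

import (
	"fmt"
	"io"

	"github.com/araddon/qlbridge/datasource/files"
)

func listFiles(pr files.PartitionedFileReader) error {
	for {
		fr, err := pr.NextFile()
		if err == io.EOF {
			return nil // io.EOF: no more files
		}
		if err != nil {
			return err
		}
		// FileReader embeds *FileInfo, so file metadata is directly accessible.
		fmt.Printf("table=%s file=%s type=%s size=%d\n", fr.Table, fr.Name, fr.FileType, fr.Size)
	}
}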