Package csv

Version: v0.1.8 (latest) | Published: Feb 21, 2024 | License: Apache-2.0 | Imports: 7 | Imported by: 0

Repository

github.com/Breeze0806/go-etl

README

CsvReader Plugin Documentation

Quick Introduction

The CsvReader plugin reads data from CSV files.

Implementation Principles

CsvReader leverages the os and encoding/csv standard libraries to read files. Each row is assembled into an abstract dataset using go-etl's custom data types and passed downstream for further processing by a Writer.

Concretely, the reading flow defined in file.Task calls go-etl's custom file.InStreamer to read each file.

Functionality Description

Configuration Example

Configuring a job that extracts data from local CSV files:

{
    "job":{
        "content":[
            {
                "reader":{
                    "name": "csvreader",
                    "parameter": {
                        "path":["a.txt","b.txt"],
                        "column":[
                            {
                                "index":"1",
                                "type":"time",
                                "format":"yyyy-MM-dd"
                            }
                        ],
                        "encoding":"utf-8",
                        "delimiter":","
                    }
                }
            }
        ]
    }
}
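The following sketch shows how the fields of this configuration nest by decoding the example into plain Go structs with encoding/json. The struct types (jobConfig, readerParameter, column) are hypothetical stand-ins. go-etl itself reads the job through its own config.JSON type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column mirrors one entry of the "column" array (hypothetical type).
type column struct {
	Index  string `json:"index"`
	Type   string `json:"type"`
	Format string `json:"format"`
}

// readerParameter mirrors the reader "parameter" object (hypothetical type).
type readerParameter struct {
	Path      []string `json:"path"`
	Column    []column `json:"column"`
	Encoding  string   `json:"encoding"`
	Delimiter string   `json:"delimiter"`
}

// jobConfig mirrors the job -> content -> reader nesting of the example.
type jobConfig struct {
	Job struct {
		Content []struct {
			Reader struct {
				Name      string          `json:"name"`
				Parameter readerParameter `json:"parameter"`
			} `json:"reader"`
		} `json:"content"`
	} `json:"job"`
}

const example = `{
  "job": {
    "content": [
      {
        "reader": {
          "name": "csvreader",
          "parameter": {
            "path": ["a.txt", "b.txt"],
            "column": [{"index": "1", "type": "time", "format": "yyyy-MM-dd"}],
            "encoding": "utf-8",
            "delimiter": ","
          }
        }
      }
    ]
  }
}`

// parseJob decodes a job configuration string into jobConfig.
func parseJob(data string) (jobConfig, error) {
	var cfg jobConfig
	err := json.Unmarshal([]byte(data), &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseJob(example)
	if err != nil {
		panic(err)
	}
	p := cfg.Job.Content[0].Reader.Parameter
	fmt.Println(p.Path, p.Column[0].Type, p.Column[0].Format)
}
```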

Parameter Explanation

path

Description: Specifies the absolute path(s) of the CSV file(s). Multiple files can be configured.
Required: Yes
Default: None

column

Description: Configures the array of column definitions for the CSV file. Columns that are not configured are treated as type string.
Required: Yes
Default: None

index

Description: Specifies the column number in the CSV file, starting from 1.
Required: Yes
Default: None

type

Description: Configures the data type of the CSV column. Valid values are bool, bigInt, decimal, string, and time, as listed under Type Conversion.
Required: Yes
Default: None

format

Description: Specifies the format used to parse the column value. It is required for the time type and uses Java Joda-Time pattern syntax, e.g., yyyy-MM-dd.
Required: Yes, for time type
Default: None

encoding

Description: Configures the encoding type of the CSV file, currently supporting utf-8 and gbk.
Required: No
Default: utf-8

delimiter

Description: Specifies the delimiter used in the CSV file. It supports not only visible symbols like commas or semicolons but also invisible characters such as 0x10 (configured as "\u0010").
Required: No
Default: , (comma)
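encoding/csv takes the separator as a single rune in Reader.Comma. The sketch below shows how a configured delimiter such as "\u0010" could be turned into that rune after JSON decoding. The delimiterRune helper and its rejection of multi-character values are assumptions.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf8"
)

// delimiterRune converts the configured delimiter string into the single
// rune expected by csv.Reader.Comma. An empty value falls back to the
// documented default, a comma. This helper is hypothetical.
func delimiterRune(s string) (rune, error) {
	if s == "" {
		return ',', nil
	}
	if utf8.RuneCountInString(s) != 1 {
		return 0, fmt.Errorf("delimiter %q must be a single character", s)
	}
	r, _ := utf8.DecodeRuneInString(s)
	return r, nil
}

func main() {
	// JSON decoding turns the escape \u0010 into the control character 0x10.
	var cfg struct {
		Delimiter string `json:"delimiter"`
	}
	if err := json.Unmarshal([]byte(`{"delimiter":"\u0010"}`), &cfg); err != nil {
		panic(err)
	}
	comma, err := delimiterRune(cfg.Delimiter)
	if err != nil {
		panic(err)
	}

	cr := csv.NewReader(strings.NewReader("a\x10b\x10c\n"))
	cr.Comma = comma
	record, err := cr.Read()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(record), record) // 3 [a b c]
}
```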

nullFormat

Description: CSV files have no standard way to represent null (a nil value). The nullFormat parameter defines which string is interpreted as null. For example, if nullFormat is set to "\N" (written as "\\N" inside the JSON configuration), go-etl treats the source field "\N" as null.
Required: No
Default: Empty string
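The nullFormat rule amounts to a string comparison on each field. In this sketch, the helper name toNullable and the choice to represent null as a nil *string are assumptions for illustration.

```go
package main

import "fmt"

// toNullable returns nil when field equals nullFormat and a pointer to the
// field otherwise (nil *string stands in for null in this sketch).
func toNullable(field, nullFormat string) *string {
	if field == nullFormat {
		return nil
	}
	return &field
}

func main() {
	nullFormat := `\N`
	for _, field := range []string{"abc", `\N`, ""} {
		if v := toNullable(field, nullFormat); v == nil {
			fmt.Println("null")
		} else {
			fmt.Printf("%q\n", *v)
		}
	}
}
```

Under the documented default (an empty string), the same comparison would turn empty fields into null.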

startRow

Description: Specifies the row number from which to start reading in the CSV file, starting from 1.
Required: No
Default: 1

comment

Description: Specifies the comment character for the CSV file; lines that begin with this character are treated as comments and skipped.
Required: No
Default: None

compress

Description: Specifies the compression method used for the CSV file, currently supporting gz (gzip compression) and zip (zip compression).
Required: No
Default: No compression

Type Conversion

CsvReader converts each CSV field according to the type configured for its column in the "column" setting, so make sure the configured types match your data.

Below is a list of type conversions supported by CsvReader for CSV data:

go-etl Type	CSV Data Type
bigInt	bigInt
decimal	decimal
string	string
time	time
bool	bool

Performance Report

Pending testing.

Limitations and Constraints

Frequently Asked Questions (FAQ)

Documentation

Index

type Config
- func NewConfig(conf *config.JSON) (c *Config, err error)
type Job
- func NewJob() *Job
- func (j *Job) Init(ctx context.Context) (err error)
- func (j *Job) Split(ctx context.Context, number int) (configs []*config.JSON, err error)
type Reader
- func (r *Reader) Job() spireader.Job
- func (r *Reader) ResourcesConfig() *config.JSON
- func (r *Reader) Task() spireader.Task

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	csv.InConfig

	Path []string `json:"path"`
}

Config represents the configuration for reading CSV files.

func NewConfig

func NewConfig(conf *config.JSON) (c *Config, err error)

NewConfig reads the JSON configuration conf and returns the CSV reading configuration.

type Job

type Job struct {
	*file.Job
	// contains filtered or unexported fields
}

Job is a unit of work to be performed.

func NewJob

func NewJob() *Job

NewJob creates a new Job.

func (*Job) Init

func (j *Job) Init(ctx context.Context) (err error)

Init initializes the Job, setting up any required resources or state.

func (*Job) Split

func (j *Job) Split(ctx context.Context, number int) (configs []*config.JSON, err error)

Split divides the Job into smaller sub-tasks for parallel processing or distribution, returning one configuration per sub-task.
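The sketch below is a purely hypothetical illustration of dividing a job. It spreads the configured file paths over at most number sub-tasks in round-robin order. Splitting by file is an assumption, not taken from go-etl's Split implementation, which returns *config.JSON values rather than path groups. The helper name splitPaths is also hypothetical.

```go
package main

import "fmt"

// splitPaths distributes paths over at most number groups in round-robin
// order, one group per sub-task. It never creates empty groups.
func splitPaths(paths []string, number int) [][]string {
	if number < 1 {
		number = 1
	}
	if number > len(paths) {
		number = len(paths)
	}
	groups := make([][]string, number)
	for i, p := range paths {
		groups[i%number] = append(groups[i%number], p)
	}
	return groups
}

func main() {
	groups := splitPaths([]string{"a.txt", "b.txt", "c.txt"}, 2)
	fmt.Println(groups) // [[a.txt c.txt] [b.txt]]
}
```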

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is the CsvReader plugin reader.

func (*Reader) Job

func (r *Reader) Job() spireader.Job

Job returns the reader's job.

func (*Reader) ResourcesConfig

func (r *Reader) ResourcesConfig() *config.JSON

ResourcesConfig returns the plugin's resource configuration.

func (*Reader) Task

func (r *Reader) Task() spireader.Task

Task returns the reader's task.
