tabextract

v0.0.3
Published: May 1, 2023 License: MIT Imports: 3 Imported by: 0

README

Tabextract

Extract data from csv, xls, and xlsx files with Go, quickly and reliably.

Features

Tabextract extracts data from xls and xlsx files quickly and reliably. It is written in pure Go and reads Excel files into memory as a tabular data structure that can be manipulated and analyzed easily. This makes it useful for data mining and processing tasks that involve Excel files.

Usage

Tabextract provides a simple standard interface for all supported filetypes, allowing access to both named worksheets and single tables in plain text formats.

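package main

import (
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/collatzc/tabextract"
    // Blank imports pull in the format packages so Open can use them.
    _ "github.com/collatzc/tabextract/simple"
    _ "github.com/collatzc/tabextract/xls"
    _ "github.com/collatzc/tabextract/xlsx"
)

func main() {
    if len(os.Args) < 2 {
        log.Fatal("usage: pass the path of the file to read")
    }
    wb, err := tabextract.Open(os.Args[1])
    if err != nil {
        log.Fatal(err)
    }
    defer wb.Close()

    sheets, err := wb.List()
    if err != nil {
        log.Fatal(err)
    }
    for _, s := range sheets {
        sheet, err := wb.Get(s)
        if err != nil {
            log.Fatal(err)
        }
        for sheet.Next() {
            row := sheet.Strings()
            fmt.Println(strings.Join(row, "\t"))
            // Stop reading this sheet once a row has no leading value.
            if len(row) == 0 || row[0] == "" {
                break
            }
        }
        // Report any error that ended the iteration early.
        if err := sheet.Err(); err != nil {
            log.Fatal(err)
        }
    }
}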

License

All source code is licensed under the MIT License.

Documentation

Constants

const (
	ContinueColumnMerged = "→"
	EndColumnMerged      = "⇥"
	ContinueRowMerged    = "↓"
	EndRowMerged         = "⤓"
)
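
The source lists these marker constants without describing how they are used. As a minimal sketch, assuming the markers appear as cell strings in rows that span merged regions (an assumption, not confirmed by the documentation above), a caller could detect and blank them before further processing. The helpers isMergedMarker and blankMerged are hypothetical and not part of tabextract; the constants are copied locally so the sketch compiles on its own.

```go
package main

import "fmt"

// Local copies of the marker constants listed above, so this sketch
// compiles without importing the package.
const (
	ContinueColumnMerged = "→"
	EndColumnMerged      = "⇥"
	ContinueRowMerged    = "↓"
	EndRowMerged         = "⤓"
)

// isMergedMarker reports whether cell equals one of the four markers.
// Hypothetical helper, not part of tabextract.
func isMergedMarker(cell string) bool {
	switch cell {
	case ContinueColumnMerged, EndColumnMerged, ContinueRowMerged, EndRowMerged:
		return true
	}
	return false
}

// blankMerged returns a copy of row in which marker cells are replaced
// by empty strings. Assumes markers show up as plain cell values.
func blankMerged(row []string) []string {
	out := make([]string, len(row))
	for i, c := range row {
		if !isMergedMarker(c) {
			out[i] = c
		}
	}
	return out
}

func main() {
	row := []string{"Total", "→", "⇥", "42"}
	fmt.Printf("%q\n", blankMerged(row))
}
```

In real use the local constants would be replaced by the ones exported from the package.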

Variables

var (

	// Debug should be set to true to expose detailed logging.
	Debug bool = (loglevel == "debug")
)
var ErrInvalidScanType = errors.New("tabextract: Scan only supports *bool, *int, *float64, *string, *time.Time arguments")

ErrInvalidScanType is returned by Scan for invalid arguments.

var ErrNotInFormat = errors.New("tabextract: file is not in this format")

ErrNotInFormat is used to auto-detect file types using the defined OpenFunc. It is returned by an OpenFunc when the file is not in the format that the OpenFunc handles.

var ErrUnknownFormat = errors.New("tabextract: file format is not known/supported")

ErrUnknownFormat is used when tabextract does not know how to open a file format.

Functions

func ColAtoi added in v0.0.2

func ColAtoi(col string) (int, error)

func Register

func Register(name string, priority int, opener OpenFunc) error

Register the named source as a tabextract datasource implementation.

func WrapErr

func WrapErr(e ...error) error

WrapErr wraps a set of errors.

Types

type Collection

type Collection interface {
	// Next advances to the next record of content.
	// It MUST be called prior to any Scan().
	Next() bool

	// Strings extracts values from the current record into a list of strings.
	Strings() []string

	// Types extracts the data types from the current record into a list.
	// options: "boolean", "integer", "float", "string", "date",
	// and special cases: "blank", "hyperlink" which are string types
	Types() []string

	// Formats extracts the format codes for the current record into a list.
	Formats() []string

	// Scan extracts values from the current record into the provided arguments
	// Arguments must be pointers to one of 5 supported types:
	//     bool, int64, float64, string, or time.Time
	// If invalid, returns ErrInvalidScanType
	Scan(args ...interface{}) error

	// IsEmpty returns true if there are no data values.
	IsEmpty() bool

	// Err returns the last error that occurred.
	Err() error
}

Collection represents an iterable collection of records.

type OpenFunc

type OpenFunc func(filename string) (Source, error)

OpenFunc defines a Source's instantiation function. It should return ErrNotInFormat immediately if filename is not of the correct file type.

type Source

type Source interface {
	// List the individual data tables within this source.
	List() ([]string, error)

	// Get a Collection from the source by name.
	Get(name string) (Collection, error)

	// Close the source and discard memory.
	Close() error
}

func Open

func Open(filename string) (Source, error)

Open a tabular data file and return a Source for accessing its contents.

Directories

Path    Synopsis
xls     Package xls implements the Microsoft Excel Binary File Format (.xls) Structure.
cfb     Package cfb implements the Microsoft Compound File Binary File Format.
crypto  Package crypto implements excel encryption algorithms from the MS-OFFCRYPTO design specs.
