dtxt

package module
v0.0.0-...-9ae038d
Published: Jul 22, 2022 License: MIT Imports: 7 Imported by: 0

README

go-dtxt

Package dtxt implements encoding and decoding of ASCII delimited text, for the Go programming language.

ASCII delimited text is similar to CSV, TSV, and other table & spreadsheet data formats, except that ASCII delimited text uses some of the delimiter control code characters that Unicode inherited from ASCII.

ASCII delimited text could also validly be called Unicode delimited text, especially when Unicode is encoded as UTF-8.

Documentation

Online documentation, which includes examples, can be found at: http://godoc.org/github.com/reiver/go-dtxt

Encoding Example

This is a basic example of how to encode tabular data into ASCII delimited text using this package:

import "github.com/reiver/go-dtxt"

// ...

var writer io.Writer //@TODO: set to wherever you want the encoded ASCII Delimited Text data to go.

// ...

var encoder dtxt.Encoder = dtxt.EncoderWrap(writer)

err := encoder.Begin()

// ...

defer encoder.End()

// ...

// row 1
err = encoder.EncodeRow("ONCE", '۱', "1", "Ⅰ", "یکی")

// ...

// row 2
err = encoder.EncodeRow("TWICE", '۲', "2", "Ⅱ", "دو")

// ...

// row 3
err = encoder.EncodeRow("THRICE", '۳', "3", "Ⅲ", "سه")

// ...

// row 4
err = encoder.EncodeRow("FOURCE", '۴', "4", "Ⅳ", "چهار")

// ...


Decoding Example

This is a basic example of how to decode tabular data from ASCII delimited text using this package.

In this example it is known ahead of time how many columns there are in the data.


import "github.com/reiver/go-dtxt"

// ...

var reader io.Reader //@TODO: set to wherever you want the encoded ASCII Delimited Text data to come from.

// ...

var decoder dtxt.Decoder = dtxt.WrapDecoder(reader)

// ...

for {
	var key string
	var value string
	
	err := decoder.DecodeRow(&key, &value)
	if dtxt.GS == err {
		break
	}
	if nil != err {
		return err
	}
}

Delimiters

Unicode inherited 5 delimiter control code characters from ASCII:

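| Symbol | Name | Alternative Name | Abbreviation | Hexadecimal | Decimal | Caret | UTF-8 |
|--------|------|------------------|--------------|-------------|---------|-------|-------|
| | File Separator | | FS | 0x1c | 28 | ^\ | 0b00011100 |
| | Group Separator | Table Terminator | GS | 0x1d | 29 | ^] | 0b00011101 |
| | Row Separator | Row Terminator | RS | 0x1e | 30 | ^^ | 0b00011110 |
| | Unit Separator | Field Terminator | US | 0x1f | 31 | ^_ | 0b00011111 |
| | Space | Word Separator | SP | 0x20 | 32 | ^` | 0b00100000 |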
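The numeric columns in the table above are all views of the same byte, and the caret column follows from flipping bit 0x40 of the code point. The following is a minimal sketch of that relationship; the constant and function names (`caret`, `describe`) are illustrative assumptions and are not part of this package's API.

```go
package main

import "fmt"

// Delimiter code points, taken from the table above.
// These names are illustrative; they are not exported by the dtxt package.
const (
	FS byte = 0x1c // File Separator
	GS byte = 0x1d // Group Separator (table terminator)
	RS byte = 0x1e // Row Separator (row terminator)
	US byte = 0x1f // Unit Separator (field terminator)
	SP byte = 0x20 // Space (word separator)
)

// caret returns the caret notation for b, obtained by flipping bit 0x40.
func caret(b byte) string {
	return "^" + string(rune(b^0x40))
}

// describe formats the hexadecimal, decimal, caret, and binary views of b.
func describe(b byte) string {
	return fmt.Sprintf("%#x %d %s 0b%08b", b, b, caret(b), b)
}

func main() {
	for _, b := range []byte{FS, GS, RS, US, SP} {
		fmt.Println(describe(b))
	}
}
```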

Table Row Format

Unit Separator (US) and Row Separator (RS) can be used to construct a table row.

For example, suppose we want a table row with 3 fields: “joe”, “blow”, and “root beer”. That is:

joe | blow | root beer

Then the result would be this:

const US = 0x1f
const RS = 0x1e

[]byte{
	'j','o','e', 
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'b','l','o','w',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','o','t',' ','b','e','e','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator
}

(Note that this is just a single row, not a whole table. A whole table would have a GS control code character at the end of it.)

⚠️ Notice that we are using the US control code character in the Unix/Linux style: as a field terminator (and not just a field separator). I.e., the last field gets a US after it too.

⚠️ Notice also that we are using the RS control code character in the Unix/Linux style too: as a row terminator (and not just a row separator). I.e., the last row gets an RS after it too.
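The terminator convention for a single row can be sketched in a few lines of Go. This is an illustration of the byte layout only, not this package's implementation: `encodeRow` is a hypothetical helper, and it assumes the fields contain no control code characters, so it does no escaping.

```go
package main

import "fmt"

const (
	RS byte = 0x1e // row terminator
	US byte = 0x1f // field terminator
)

// encodeRow is a hypothetical helper, not part of the dtxt package.
// It writes a US after every field (including the last) and an RS
// after the row. It performs no escaping.
func encodeRow(fields ...string) []byte {
	var out []byte
	for _, field := range fields {
		out = append(out, field...)
		out = append(out, US)
	}
	return append(out, RS)
}

func main() {
	// %q renders the control code characters as visible escapes.
	fmt.Printf("%q\n", encodeRow("joe", "blow", "root beer"))
}
```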

Table Format

Let's make it more obvious how RS is used by showing a whole table encoded (and not just a row). Let's encode this table:

joe | blow | root beer
john | doe | caramel apple
jane | doe | cotton candy

const GS = 0x1d // table terminator
const RS = 0x1e // row terminator
const US = 0x1f // field terminator

[]byte{
	'j','o','e', 
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'b','l','o','w',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','o','t',' ','b','e','e','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'j','o','h','n',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'd','o','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'c','a','r','a','m','e','l',' ','a','p','p','l','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'j','a','n','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'd','o','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'c','o','t','t','o','n',' ','c','a','n','d','y',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	GS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Table Terminator
}

⚠️ Notice that we are using the GS control code character in the Unix/Linux style: as a table terminator (and not just a table separator). I.e., the last row gets a GS after it.
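Reading the format back is the mirror image: a US closes a field, an RS closes a row, and a GS closes the table. The sketch below shows one way this could be done, assuming the input contains no escaped characters; `decodeTable` is a hypothetical helper and does not reflect how this package's decoder is written.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	GS byte = 0x1d // table terminator
	RS byte = 0x1e // row terminator
	US byte = 0x1f // field terminator
)

// decodeTable is a hypothetical helper, not part of the dtxt package.
// It treats US, RS, and GS as terminators and ignores escaping.
func decodeTable(data []byte) ([][]string, error) {
	var rows [][]string
	var row []string
	var field []byte
	for _, b := range data {
		switch b {
		case US: // end of a field
			row = append(row, string(field))
			field = field[:0]
		case RS: // end of a row
			if len(field) > 0 {
				return nil, errors.New("field not terminated by US")
			}
			rows = append(rows, row)
			row = nil
		case GS: // end of the table
			if len(field) > 0 || len(row) > 0 {
				return nil, errors.New("row not terminated by RS")
			}
			return rows, nil
		default:
			field = append(field, b)
		}
	}
	return nil, errors.New("table not terminated by GS")
}

func main() {
	data := []byte("joe\x1fblow\x1froot beer\x1f\x1ejane\x1fdoe\x1fcotton candy\x1f\x1e\x1d")
	rows, err := decodeTable(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row)
	}
}
```

Because every field and row carries its own terminator, the sketch can report truncated input as an error instead of silently returning a partial table.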

Escaping

One issue that can arise is — what if the data inside of a unit contains a Unit Separator (US), a Row Separator (RS), a Group Separator (GS), or a File Separator (FS)‽

How is that situation handled‽

The answer is that Unicode inherited a control code character for escaping, the aptly named Escape (ESC) control code character:

| Name | Abbreviation | Hexadecimal | Decimal | Caret | UTF-8 |
|------|--------------|-------------|---------|-------|-------|
| Escape | ESC | 0x1b | 27 | ^[ | 0b00011011 |

An ESC character is stuffed before any Escape (ESC), Unit Separator (US), Row Separator (RS), Group Separator (GS), or File Separator (FS) that appears inside of a unit.

Here is an example.

Let's say that we want to encode this table:

[]byte{'E','S','C'} | []byte{ESC} | []byte{'e','s','c','a','p','e'}
[]byte{'F','S'} | []byte{FS} | []byte{'f','i','l','e',' ','t','e','r','m','i','n','a','t','o','r'}
[]byte{'G','S'} | []byte{GS} | []byte{'t','a','b','l','e',' ','t','e','r','m','i','n','a','t','o','r'}
[]byte{'R','S'} | []byte{RS} | []byte{'r','o','w',' ','t','e','r','m','i','n','a','t','o','r'}
[]byte{'U','S'} | []byte{US} | []byte{'f','i','e','l','d',' ','t','e','r','m','i','n','a','t','o','r'}

We would get:

const ESC = 0x1b // escape
const FS  = 0x1c // file terminator
const GS  = 0x1d // table terminator
const RS  = 0x1e // row terminator
const US  = 0x1f // field terminator

[]byte{
	'E','S','C',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	ESC,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'e','s','c','a','p','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'F','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	FS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'f','i','l','e',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'G','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	GS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	't','a','b','l','e',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'R','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	RS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','w',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'U','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	US,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'f','i','e','l','d',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator
	
	
	
	GS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Table Terminator
}
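The escaping rule described above can be sketched as a pair of small functions. These are hypothetical helpers written for illustration (`escapeUnit` and `unescapeUnit` are assumed names, not functions from this package): one stuffs an ESC before each special byte in a unit, and the other undoes that by treating the byte after an ESC as data.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	ESC byte = 0x1b // escape
	FS  byte = 0x1c // file terminator
	GS  byte = 0x1d // table terminator
	RS  byte = 0x1e // row terminator
	US  byte = 0x1f // field terminator
)

// escapeUnit is a hypothetical helper, not part of the dtxt package.
// It stuffs an ESC before every ESC, FS, GS, RS, or US inside unit.
func escapeUnit(unit []byte) []byte {
	out := make([]byte, 0, len(unit))
	for _, b := range unit {
		switch b {
		case ESC, FS, GS, RS, US:
			out = append(out, ESC)
		}
		out = append(out, b)
	}
	return out
}

// unescapeUnit reverses escapeUnit: the byte after an ESC is kept as data.
// A lone ESC at the very end of the input is kept unchanged.
func unescapeUnit(escaped []byte) []byte {
	out := make([]byte, 0, len(escaped))
	for i := 0; i < len(escaped); i++ {
		if escaped[i] == ESC && i+1 < len(escaped) {
			i++ // skip the ESC and keep the byte it protects
		}
		out = append(out, escaped[i])
	}
	return out
}

func main() {
	unit := []byte{'a', US, 'b', ESC}
	escaped := escapeUnit(unit)
	fmt.Printf("% x\n", escaped)
	fmt.Println(bytes.Equal(unescapeUnit(escaped), unit))
}
```

An encoder built this way would escape each unit first and only then append the US terminator, so that the terminator itself is never escaped.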

Documentation

Index

Constants

const (
	GS = fck.Error("GS") // ‘GS’ represents the end of a table. ‘GS’ means ‘Group Separator’. ‘GS’ is also sometimes called a ‘Table Separator’.
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

func EncoderWrap

func EncoderWrap(writer io.Writer) Encoder

func (*Encoder) Begin

func (receiver *Encoder) Begin() error

func (Encoder) EncodeRow

func (receiver Encoder) EncodeRow(values ...any) error

func (*Encoder) End

func (receiver *Encoder) End() error
