dtxt

package module

v0.0.0-...-9ae038d Published: Jul 22, 2022 License: MIT Imports: 7 Imported by: 0

Repository

github.com/reiver/go-dtxt

README ¶

go-dtxt

Package dtxt implements encoding and decoding of ASCII delimited text, for the Go programming language.

ASCII delimited text is similar to CSV, TSV, and other table and spreadsheet data formats, except that it uses some of the deliminator control code characters that Unicode inherited from ASCII.

ASCII delimited text could arguably also be called Unicode delimited text, especially when Unicode is encoded as UTF-8.

Documentation

Online documentation, which includes examples, can be found at: http://godoc.org/github.com/reiver/go-dtxt

Encoding Example

This is a basic example of how to encode tabular data into ASCII delimited text using this package:

import "github.com/reiver/go-dtxt"

// ...

var writer io.Writer //@TODO: set to wherever you want the encoded ASCII Delimited Text data to go.

// ...

var encoder dtxt.Encoder = dtxt.EncoderWrap(writer)

err := encoder.Begin()

// ...

defer encoder.End()

// ...

// row 1
err = encoder.EncodeRow("ONCE", '۱', "1", "Ⅰ", "یکی")

// ...

// row 2
err = encoder.EncodeRow("TWICE", '۲', "2", "Ⅱ", "دو")

// ...

// row 3
err = encoder.EncodeRow("THRICE", '۳', "3", "Ⅲ", "سه")

// ...

// row 4
err = encoder.EncodeRow("FOURCE", '۴', "4", "Ⅳ", "چهار")

// ...

Decoding Example

This is a basic example of how to decode tabular data from ASCII delimited text using this package.

In this example it is known ahead of time how many columns there are in the data.


import "github.com/reiver/go-dtxt"

// ...

var reader io.Reader //@TODO: set to wherever you want the encoded ASCII Delimited Text data to come from.

// ...

var decoder dtxt.Decoder = dtxt.WrapDecoder(reader)

// ...

for {
	var key string
	var value string
	
	err := decoder.DecodeRow(&key, &value)
	if dtxt.GS == err {
		break
	}
	if nil != err {
		return err
	}
}

Deliminators

Unicode inherited 5 deliminator control code characters from ASCII:

Symbol	Name	Alternative Name	Abbreviation	Hexadecimal	Decimal	Caret	UTF-8
␜	File Separator		FS	0x1c	28	`^\`	`0b00011100`
␝	Group Separator	Table Terminator	GS	0x1d	29	`^]`	`0b00011101`
␞	Record Separator	Row Separator, Row Terminator	RS	0x1e	30	`^^`	`0b00011110`
␟	Unit Separator	Field Terminator	US	0x1f	31	`^_`	`0b00011111`
␠	Space	Word Separator	SP	0x20	32	`` ^` ``	`0b00100000`
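
The Caret column follows the usual caret-notation convention: a control code is written as `^` followed by the printable character whose code is the control code XOR 0x40. The sketch below works that arithmetic through for the four separators. The constant names and the `caret` helper are illustrative assumptions and are not taken from the dtxt package.

```go
package main

// The ASCII separator control codes from the table above.
// These names are illustrative; they are not the dtxt package's identifiers.
const (
	FS byte = 0x1c // File Separator
	GS byte = 0x1d // Group Separator (table terminator)
	RS byte = 0x1e // Record Separator (row terminator)
	US byte = 0x1f // Unit Separator (field terminator)
)

// caret returns the caret notation for an ASCII control code by
// flipping bit 6 (XOR 0x40). For example, 0x1f ^ 0x40 = 0x5f, which is '_'.
func caret(c byte) string {
	return "^" + string(rune(c^0x40))
}
```

Calling `caret(US)` yields `^_`, matching the Unit Separator row of the table.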

Table Row Format

Unit Separator (US) and Row Separator (RS) can be used to construct a table row.

For example, suppose we want a table row with 3 fields: “joe”, “blow”, and “root beer”. I.e.:


joe	blow	root beer

Then the result would be this:

const US = 0x1f
const RS = 0x1e

[]byte{
	'j','o','e', 
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'b','l','o','w',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','o','t',' ','b','e','e','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator
}

(Note that this is just a single row, not a whole table. A whole table would have a GS control code character at the end of it.)

⚠️ Notice that we are using the US control code character in the Unix/Linux style: as a field terminator (and not just a field separator). I.e., the last field gets a US after it too.

⚠️ Notice also that we are using the RS control code character in the Unix/Linux style too: as a row terminator (and not just a row separator). I.e., the last row gets an RS after it too.
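
The terminator convention described above can be sketched with the standard library alone. This is a minimal illustration, not the dtxt package's implementation: `encodeRow` is a hypothetical helper, and it assumes the fields contain no control codes, so no escaping is done.

```go
package main

const (
	US byte = 0x1f // Unit Separator, used as the field terminator
	RS byte = 0x1e // Record Separator, used as the row terminator
)

// encodeRow writes each field followed by a US, then closes the row
// with a single RS. Every field, including the last, gets its own US.
// Assumption: fields contain no control codes, so nothing is escaped.
func encodeRow(fields ...string) []byte {
	var out []byte
	for _, field := range fields {
		out = append(out, field...)
		out = append(out, US)
	}
	return append(out, RS)
}
```

With this helper, `encodeRow("joe", "blow", "root beer")` produces the same bytes as the literal shown earlier in this section.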

Table Format

Let's make it more obvious how RS is used by showing a whole table encoded (and not just a row). Let's encode this table:


joe	blow	root beer
john	doe	caramel apple
jane	doe	cotton candy

const GS = 0x1d // table terminator
const RS = 0x1e // row terminator
const US = 0x1f // field terminator

[]byte{
	'j','o','e', 
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'b','l','o','w',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','o','t',' ','b','e','e','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'j','o','h','n',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'd','o','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'c','a','r','a','m','e','l',' ','a','p','p','l','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'j','a','n','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'd','o','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'c','o','t','t','o','n',' ','c','a','n','d','y',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	GS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Table Terminator
}

⚠️ Notice that we are using the GS control code character in the Unix/Linux style: as a table terminator (and not just a table separator). I.e., the last row gets a GS after it.
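
Because every field, row, and table is terminated rather than separated, a reader can process the bytes one at a time without lookahead. The sketch below illustrates that idea. `splitTable` is a hypothetical helper written for this explanation, not the package's decoder, and it assumes the data contains no escaped characters.

```go
package main

import "errors"

const (
	GS byte = 0x1d // table terminator
	RS byte = 0x1e // row terminator
	US byte = 0x1f // field terminator
)

// splitTable parses one GS-terminated table into rows of fields.
// Assumption: the input uses no escaping.
func splitTable(data []byte) ([][]string, error) {
	var rows [][]string
	var row []string
	var field []byte
	for _, b := range data {
		switch b {
		case US: // end of the current field
			row = append(row, string(field))
			field = field[:0]
		case RS: // end of the current row
			if len(field) != 0 {
				return nil, errors.New("field not terminated by US before RS")
			}
			rows = append(rows, row)
			row = nil
		case GS: // end of the table
			if len(field) != 0 || len(row) != 0 {
				return nil, errors.New("row not terminated by RS before GS")
			}
			return rows, nil
		default: // ordinary data byte
			field = append(field, b)
		}
	}
	return nil, errors.New("table not terminated by GS")
}
```

Feeding it the byte slice from this section returns three rows of three fields each; input that stops before the GS is reported as an error rather than silently accepted.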

Escaping

One issue that can arise is — what if the data inside of a unit contains a Unit Separator (US), a Row Separator (RS), a Group Separator (GS), or a File Separator (FS)‽

How is that situation handled‽

The answer is that Unicode inherited a control code character for escaping: the aptly named Escape (ESC) control code character.

Name	Abbreviation	Hexadecimal	Decimal	Caret	UTF-8
Escape	ESC	0x1b	27	`^[`	`0b00011011`

An ESC character is stuffed before any Escape (ESC), Unit Separator (US), Row Separator (RS), Group Separator (GS), or File Separator (FS) that appears inside of a unit.

Here is an example.

Let's say that we want to encode this table:


`[]byte{'E','S','C'}`	`[]byte{ESC}`	`[]byte{'e','s','c','a','p','e'}`
`[]byte{'F','S'}`	`[]byte{FS}`	`[]byte{'f','i','l','e',' ','t','e','r','m','i','n','a','t','o','r'}`
`[]byte{'G','S'}`	`[]byte{GS}`	`[]byte{'t','a','b','l','e',' ','t','e','r','m','i','n','a','t','o','r'}`
`[]byte{'R','S'}`	`[]byte{RS}`	`[]byte{'r','o','w',' ','t','e','r','m','i','n','a','t','o','r'}`
`[]byte{'U','S'}`	`[]byte{US}`	`[]byte{'f','i','e','l','d',' ','t','e','r','m','i','n','a','t','o','r'}`

We would get:

const ESC = 0x1b // escape
const FS  = 0x1c // file terminator
const GS  = 0x1d // table terminator
const RS  = 0x1e // row terminator
const US  = 0x1f // field terminator

[]byte{
	'E','S','C',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	ESC,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'e','s','c','a','p','e',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'F','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	FS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'f','i','l','e',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'G','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	GS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	't','a','b','l','e',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'R','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	RS,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'r','o','w',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator



	'U','S',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	ESC, // ⇚⇚⇚⇚⇚ Escape. Next character will be treated as data regardless of whether it is a control code character or not.
	US,
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	'f','i','e','l','d',' ','t','e','r','m','i','n','a','t','o','r',
	US, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Field Terminator
	
	RS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Row Terminator
	
	
	
	GS, // ⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚⇚ Table Terminator
}
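
The stuffing rule can be sketched as a pair of helpers. This is an illustration under stated assumptions, not the package's own code: `escapeUnit` and `unescapeUnit` are hypothetical names. Working byte by byte is safe for UTF-8 input because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never collide with these control codes.

```go
package main

const (
	ESC byte = 0x1b // escape
	FS  byte = 0x1c // file terminator
	GS  byte = 0x1d // table terminator
	RS  byte = 0x1e // row terminator
	US  byte = 0x1f // field terminator
)

// needsEscape reports whether b must be preceded by ESC inside a unit.
// ESC, FS, GS, RS, and US occupy the contiguous range 0x1b..0x1f.
func needsEscape(b byte) bool {
	return b >= ESC && b <= US
}

// escapeUnit returns a copy of unit with an ESC stuffed before every
// ESC, FS, GS, RS, or US byte.
func escapeUnit(unit []byte) []byte {
	out := make([]byte, 0, len(unit))
	for _, b := range unit {
		if needsEscape(b) {
			out = append(out, ESC)
		}
		out = append(out, b)
	}
	return out
}

// unescapeUnit reverses escapeUnit: each ESC is dropped and the byte
// that follows it is kept as data. A lone trailing ESC is kept as-is.
func unescapeUnit(escaped []byte) []byte {
	out := make([]byte, 0, len(escaped))
	for i := 0; i < len(escaped); i++ {
		if escaped[i] == ESC && i+1 < len(escaped) {
			i++ // skip the ESC; the next byte is literal data
		}
		out = append(out, escaped[i])
	}
	return out
}
```

For instance, `escapeUnit([]byte{US})` gives `[]byte{ESC, US}`, which is the second field of the last row in the table above, and `unescapeUnit` recovers the original unit.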

Documentation ¶

Index ¶

Constants
type Encoder
- func EncoderWrap(writer io.Writer) Encoder

Constants ¶

const (
	GS = fck.Error("GS") // ‘GS’ represents the end of a table. ‘GS’ means ‘Group Separator’. ‘GS’ is also sometimes called a ‘Table Separator’.
)

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Encoder ¶

type Encoder struct {
	// contains filtered or unexported fields
}

func EncoderWrap ¶

func EncoderWrap(writer io.Writer) Encoder

func (*Encoder) Begin ¶

func (receiver *Encoder) Begin() error

func (Encoder) EncodeRow ¶

func (receiver Encoder) EncodeRow(values ...any) error

func (*Encoder) End ¶

func (receiver *Encoder) End() error
