text

package

Version: v1.0.11 (latest) Published: Jan 25, 2024 License: CC0-1.0, MIT Imports: 6 Imported by: 0

Repository

github.com/Hubmakerlabs/replicatr

Documentation ¶

Overview ¶

Package text implements RFC 8259 compliant string escaping, with a pre-calculation stage that avoids repeated allocations for long inputs.

Index ¶

Constants
func EscapeJSONStringAndWrap(s string) (escaped []byte)
func EscapeJSONStringAndWrapOld(s string) (escaped []byte)
func EscapeString(dst []byte, s string) []byte
func FirstHexCharToValue(in byte) (out byte)
func SecondHexCharToValue(in byte) (out byte)
func UnescapeByteString(bs []byte) (o []byte)
func Unwrap(wrapped []byte) (unwrapped []byte)
type Buffer
- func NewBuffer(b []byte) (buf *Buffer)

Constants ¶

const (
	QuotationMark    = 0x22
	QuotationMarkGo  = '"'
	ReverseSolidus   = 0x5c
	ReverseSolidusGo = '\\'
	Solidus          = 0x2f
	SolidusGo        = '/'
	Backspace        = 0x08
	BackspaceGo      = '\b'
	FormFeed         = 0x0c
	FormFeedGo       = '\f'
	LineFeed         = 0x0a
	LineFeedGo       = '\n'
	CarriageReturn   = 0x0d
	CarriageReturnGo = '\r'
	Tab              = 0x09
	TabGo            = '\t'
	Space            = 0x20
	SpaceGo          = ' '
)

Each character constant is named after the character it represents. Every hex value is paired with a `Go`-suffixed constant holding the equivalent Go rune literal; IDEs with inlay hints that expand constant values will show that the two have the same numeric value.

The human readable forms are given mainly for educational purposes. The same escape sequences can be used in regular Go double quoted "" strings to indicate the same character.

Different rules apply to backtick quoted strings. They allow any character (other than a backtick itself) to be placed in the string, and escape sequences are taken literally instead of being parsed into their respective bytes. Editors generally won't allow control characters to be placed in these strings; their purpose is to allow properly flowed strings containing line breaks, such as embedded literal text. Backtick strings can contain printf formatting verbs just as double quoted strings can.
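The equivalence of the paired constants, and the difference between the two kinds of string literal, can be checked with a short program. This is a minimal sketch: it declares local copies of two of the constant pairs rather than importing the package, and the names `sameValue`, `interpreted` and `raw` are made up for the illustration.

```go
package main

import "fmt"

// Local copies of two of the constant pairs above, for illustration only.
const (
	LineFeed   = 0x0a
	LineFeedGo = '\n'
	Tab        = 0x09
	TabGo      = '\t'
)

// sameValue reports whether the hex constants equal their rune literal forms.
func sameValue() bool {
	return LineFeed == LineFeedGo && Tab == TabGo
}

// interpreted is a double quoted string: \n is parsed into one line feed byte.
const interpreted = "a\nb"

// raw is a backtick string: the backslash and the n remain two separate bytes.
const raw = `a\nb`

func main() {
	fmt.Println(sameValue())
	fmt.Println(len(interpreted), len(raw))
}
```

The double quoted form is three bytes long and the backtick form is four, because the escape sequence is only parsed in the former.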

Variables ¶

This section is empty.

Functions ¶

func EscapeJSONStringAndWrap ¶

func EscapeJSONStringAndWrap(s string) (escaped []byte)

func EscapeJSONStringAndWrapOld ¶

func EscapeJSONStringAndWrapOld(s string) (escaped []byte)

EscapeJSONStringAndWrapOld takes an arbitrary string and escapes all control characters as per RFC 8259 section 7, https://www.rfc-editor.org/rfc/rfc8259 (retrieved 2023-11-21):

The representation of strings is similar to conventions used in the C family
of programming languages. A string begins and ends with quotation marks. All
Unicode characters may be placed within the quotation marks, except for the
characters that MUST be escaped: quotation mark, reverse solidus, and the
control characters (U+0000 through U+001F).

The string is assumed to be UTF-8 and only the above escapes are processed. The string will be wrapped in double quotes `"` as it is assumed that the string will be added to a JSON document in a place where a string is valid.

Processing takes two passes. The first calculates the expansion required by the characters in the provided string; the second copies the input across, adding the required escape sequences as it goes. This ensures that even for very long strings only one allocation, of precisely the correct size, is made.
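The two-pass approach can be sketched as follows. This is not the package's source: `escapedLen` and `escapeAndWrap` are hypothetical names, the choice of short escapes (`\b`, `\f`, `\n`, `\r`, `\t`) and `\u00XX` for the remaining control characters is an assumption, and the sketch does not attempt any special handling of backslashes that are already part of an escape.

```go
package main

import "fmt"

const hexDigits = "0123456789abcdef"

// escapedLen is the first pass: it returns the number of bytes s occupies
// once escaped, including the two surrounding quotes.
func escapedLen(s string) int {
	n := 2
	for i := 0; i < len(s); i++ {
		switch c := s[i]; {
		case c == '"' || c == '\\':
			n += 2
		case c == '\b' || c == '\f' || c == '\n' || c == '\r' || c == '\t':
			n += 2
		case c < 0x20:
			n += 6 // \u00XX
		default:
			n++
		}
	}
	return n
}

// escapeAndWrap is the second pass: it allocates once, with the capacity
// computed by escapedLen, and then only appends within that capacity.
func escapeAndWrap(s string) []byte {
	out := make([]byte, 0, escapedLen(s))
	out = append(out, '"')
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch c {
		case '"', '\\':
			out = append(out, '\\', c)
		case '\b':
			out = append(out, '\\', 'b')
		case '\f':
			out = append(out, '\\', 'f')
		case '\n':
			out = append(out, '\\', 'n')
		case '\r':
			out = append(out, '\\', 'r')
		case '\t':
			out = append(out, '\\', 't')
		default:
			if c < 0x20 {
				out = append(out, '\\', 'u', '0', '0', hexDigits[c>>4], hexDigits[c&0xf])
			} else {
				out = append(out, c)
			}
		}
	}
	return append(out, '"')
}

func main() {
	fmt.Printf("%s\n", escapeAndWrap("say \"hi\"\n"))
}
```

Because the capacity is computed up front, none of the `append` calls in the second pass needs to grow the slice.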

Note that iteration through the string must proceed byte by byte, as though the string were a []byte, rather than with `for _, c := range s`. Ranging over a string makes Go decode it as UTF-8, which can yield a different result. The difference appears for byte values from 128 upward, because UTF-8 only uses those bytes as parts of multi-byte sequences.
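The difference between the two iteration styles can be seen in a small stand-alone program. The helper names `countByRange` and `firstByRange` are invented for this sketch and are not part of the package.

```go
package main

import "fmt"

// countByRange counts the values produced by ranging over s, which decodes
// the string as UTF-8 and yields one rune per iteration.
func countByRange(s string) (n int) {
	for range s {
		n++
	}
	return n
}

// firstByRange returns the first rune yielded by ranging over s, or -1 if s
// is empty.
func firstByRange(s string) rune {
	for _, c := range s {
		return c
	}
	return -1
}

func main() {
	s := "\u00e9" // U+00E9 is encoded in UTF-8 as the two bytes 0xc3 0xa9
	fmt.Println(len(s), countByRange(s))

	// A lone 0xe9 byte is not valid UTF-8. Indexing returns the byte itself,
	// while range substitutes the Unicode replacement character.
	lone := string([]byte{0xe9})
	fmt.Printf("%#x %#x\n", lone[0], firstByRange(lone))
}
```

Indexing sees two bytes where range sees a single rune, and for the invalid single byte range reports U+FFFD instead of 0xe9.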

One last thing to note: the stdlib function `json.Marshal` automatically applies HTML escaping, which turns some otherwise valid characters into escape sequences. From the encoding/json documentation:

String values encode as JSON strings coerced to valid UTF-8, replacing
invalid bytes with the Unicode replacement rune. So that the JSON will be
safe to embed inside HTML <script> tags, the string is encoded using
HTMLEscape, which replaces "<", ">", "&", U+2028, and U+2029 with
"\u003c","\u003e", "\u0026", "\u2028", and "\u2029". This replacement can be
disabled when using an Encoder, by calling SetEscapeHTML(false).

This code therefore assumes that, although backslashes need to be escaped, a backslash that already begins an escape sequence needs special handling so that it is not escaped again. This allows custom JSON marshalers to avoid adding further backslashes to entities that are already escaped.
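The standard library behaviour referred to here can be observed directly. The sketch below uses only encoding/json; the wrapper names `marshalDefault` and `marshalNoHTMLEscape` are made up for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalDefault encodes v with json.Marshal, which applies HTML escaping.
func marshalDefault(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// marshalNoHTMLEscape encodes v with an Encoder on which HTML escaping has
// been disabled.
func marshalNoHTMLEscape(v interface{}) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a trailing newline; trim it so both forms compare cleanly.
	return string(bytes.TrimSuffix(buf.Bytes(), []byte("\n"))), nil
}

func main() {
	a, _ := marshalDefault("<a&b>")
	b, _ := marshalNoHTMLEscape("<a&b>")
	fmt.Println(a) // "\u003ca\u0026b\u003e"
	fmt.Println(b) // "<a&b>"
}
```

Running the escaped output through another escaper that treats every backslash as a character to escape would double the backslashes, which is the situation the paragraph above describes.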

func EscapeString ¶

func EscapeString(dst []byte, s string) []byte

EscapeString escapes a string for JSON encoding according to RFC 8259.

Taken from https://github.com/nbd-wtf/go-nostr/blob/master/utils.go. It has been replaced by EscapeJSONStringAndWrap in file rfc8259.go, which has been tested to be functionally equivalent. The purpose of EscapeJSONStringAndWrap is to eliminate extra heap allocations for very long strings such as long form posts.

Formatting is retained from the original despite being ugly.

func FirstHexCharToValue ¶

func FirstHexCharToValue(in byte) (out byte)

FirstHexCharToValue returns the value of the provided hex character when it occupies the first place of the two-character representation of an 8 bit value.

Two of these functions exist to minimise computation cost, at the price of doubling the memory cost of the switch lookup table.

func SecondHexCharToValue ¶

func SecondHexCharToValue(in byte) (out byte)

SecondHexCharToValue returns the hex value of a provided character from the second (last) place in an 8 bit value.
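One way that separate first-place and second-place lookups can save computation is for the first-place lookup to return a value that is already shifted into the upper four bits, so combining the two needs only an OR. Whether the package does exactly this is an assumption; the sketch below uses hypothetical array tables (`highTable`, `lowTable`) and a hypothetical `hexPairToByte` helper rather than the package's switch statements.

```go
package main

import "fmt"

// nibble returns the 4-bit value of a hex digit, or 0 for any other byte.
func nibble(c byte) byte {
	switch {
	case c >= '0' && c <= '9':
		return c - '0'
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10
	}
	return 0
}

// highTable and lowTable are hypothetical lookup tables. highTable stores
// each digit's value already shifted into the upper four bits, so no shift is
// needed at lookup time; this doubles the table memory.
var highTable, lowTable [256]byte

func init() {
	for i := 0; i < 256; i++ {
		lowTable[i] = nibble(byte(i))
		highTable[i] = nibble(byte(i)) << 4
	}
}

// hexPairToByte combines two hex characters into one byte using two lookups.
func hexPairToByte(first, second byte) byte {
	return highTable[first] | lowTable[second]
}

func main() {
	fmt.Printf("%#02x\n", hexPairToByte('5', 'c'))
}
```

Here the pair '5', 'c' yields 0x5c, the reverse solidus.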

func UnescapeByteString ¶

func UnescapeByteString(bs []byte) (o []byte)

UnescapeByteString scans a string assumed to be UTF-8 for escape sequences produced by JSON/HTML encoding and decodes them. This means octal `\xxx` escapes and the Unicode backslash escapes `\uXXXX` and `\UXXXX`.
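As an illustration of the `\uXXXX` case only, the following sketch decodes four-digit Unicode escapes back into UTF-8. It is a simplified stand-in and not the package's implementation: `unescapeUnicode` is a hypothetical name, octal and `\UXXXX` escapes are left untouched, an escaped backslash (`\\`) is copied through unchanged, and surrogate pairs are not combined.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// unescapeUnicode replaces each \uXXXX sequence in bs with the UTF-8 encoding
// of the code point it names and copies every other byte unchanged.
func unescapeUnicode(bs []byte) []byte {
	out := make([]byte, 0, len(bs))
	var enc [utf8.UTFMax]byte
	for i := 0; i < len(bs); i++ {
		// Copy an escaped backslash as-is so the second backslash is not
		// mistaken for the start of another escape.
		if bs[i] == '\\' && i+1 < len(bs) && bs[i+1] == '\\' {
			out = append(out, '\\', '\\')
			i++
			continue
		}
		if bs[i] == '\\' && i+6 <= len(bs) && bs[i+1] == 'u' {
			v, err := strconv.ParseUint(string(bs[i+2:i+6]), 16, 16)
			if err == nil {
				n := utf8.EncodeRune(enc[:], rune(v))
				out = append(out, enc[:n]...)
				i += 5 // skip the rest of the escape; the loop adds one more
				continue
			}
		}
		out = append(out, bs[i])
	}
	return out
}

func main() {
	fmt.Printf("%s\n", unescapeUnicode([]byte(`a\u003cb\u003e`)))
}
```

The decoded output is never longer than the input, so a single allocation of `len(bs)` capacity is sufficient.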

func Unwrap ¶

func Unwrap(wrapped []byte) (unwrapped []byte)

Unwrap is a dumb function that just slices off the first and last byte, which for output from the EscapeJSONStringAndWrap function are the quotes around it.

This can be unsafe to run, as it assumes there are at least two bytes.
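A caller that cannot guarantee the input length could guard the slice operation first. The function below is a hypothetical wrapper written for this example, not something the package provides; it also checks that both ends really are quotes, which Unwrap itself does not claim to do.

```go
package main

import "fmt"

// safeUnwrap strips the first and last byte only when the input is at least
// two bytes long and has a double quote at both ends. Otherwise it returns
// the input unchanged and false.
func safeUnwrap(wrapped []byte) (unwrapped []byte, ok bool) {
	if len(wrapped) < 2 || wrapped[0] != '"' || wrapped[len(wrapped)-1] != '"' {
		return wrapped, false
	}
	return wrapped[1 : len(wrapped)-1], true
}

func main() {
	u, ok := safeUnwrap([]byte(`"hello"`))
	fmt.Printf("%s %v\n", u, ok)
	u, ok = safeUnwrap([]byte(`"`))
	fmt.Printf("%s %v\n", u, ok)
}
```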

TODO: rewrite this all to work from []byte and optional quote wrapping.

Types ¶

type Buffer ¶

type Buffer struct {
	Pos int
	Buf []byte
}

func NewBuffer ¶

func NewBuffer(b []byte) (buf *Buffer)

NewBuffer returns a new buffer containing the provided slice. This slice can/will be mutated.

func (*Buffer) Bytes ¶

func (b *Buffer) Bytes() (bb []byte)

func (*Buffer) Copy ¶

func (b *Buffer) Copy(length, src, dest int) (err error)

Copy a given length of bytes starting at src position to dest position, and move the cursor to the end of the written segment.

func (*Buffer) Head ¶

func (b *Buffer) Head() []byte

Head returns the buffer from the start until the current Pos position.

func (*Buffer) Read ¶

func (b *Buffer) Read() (bb byte, err error)

Read the next byte out of the buffer, or return io.EOF if there are no more bytes.

func (*Buffer) ReadBytes ¶

func (b *Buffer) ReadBytes(count int) (bb []byte, err error)

ReadBytes returns the specified number of bytes and advances the cursor, or returns io.EOF if fewer than that many bytes remain after the cursor.

func (*Buffer) ReadEnclosed ¶

func (b *Buffer) ReadEnclosed() (bb []byte, err error)

ReadEnclosed scans quickly while keeping count of opening and closing brackets [] or braces {}, and returns the byte sub-slice starting with a bracket and ending with the closing bracket at the same depth. The first character determines which kind of bracket is counted.

Ignores anything within quotes.

Useful for quickly finding a potentially valid array or object in JSON.
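The depth-counting idea can be sketched as a stand-alone function. This is an illustration under assumptions, not the method's source: `enclosed` is a hypothetical name, it works on a plain slice that must start with the opening bracket, and it treats a backslash inside quotes as escaping the following byte.

```go
package main

import (
	"errors"
	"fmt"
)

// enclosed returns the sub-slice of b from its first byte, which must be
// '[' or '{', up to and including the matching close at the same depth.
// Brackets that appear inside double quoted strings are not counted.
func enclosed(b []byte) ([]byte, error) {
	if len(b) == 0 {
		return nil, errors.New("empty input")
	}
	var open, closer byte
	switch b[0] {
	case '[':
		open, closer = '[', ']'
	case '{':
		open, closer = '{', '}'
	default:
		return nil, errors.New("input does not start with [ or {")
	}
	depth, inQuotes, escaped := 0, false, false
	for i, c := range b {
		switch {
		case escaped:
			escaped = false // this byte was escaped; ignore it
		case inQuotes && c == '\\':
			escaped = true
		case c == '"':
			inQuotes = !inQuotes
		case inQuotes:
			// inside a string: nothing is counted
		case c == open:
			depth++
		case c == closer:
			depth--
			if depth == 0 {
				return b[:i+1], nil
			}
		}
	}
	return nil, errors.New("no matching close")
}

func main() {
	out, err := enclosed([]byte(`{"a":"}","b":{}} trailing`))
	fmt.Printf("%s %v\n", out, err)
}
```

As the description says, a successful result is only potentially valid JSON: the scan checks bracket balance, not the content in between.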

func (*Buffer) ReadThrough ¶

func (b *Buffer) ReadThrough(c byte) (bb []byte, err error)

ReadThrough is the same as ReadUntil except it returns a slice *including* the character being sought.

func (*Buffer) ReadUntil ¶

func (b *Buffer) ReadUntil(c byte) (bb []byte, err error)

ReadUntil returns all of the buffer from the Pos at invocation, until the index immediately before the match of the requested character.

The next Read after this returns the found character, and the next Write overwrites it. If the character at Pos is the one being sought, a zero length slice is returned.

Note that the implementation does not update Pos until either the end of the buffer is reached or the requested character is found, since there is no need to write the value more than once.

When this function returns an error, the state of the buffer is unchanged from prior to the invocation.

If the character is not `"` then any match within a pair of unescaped `"` is ignored. The closing `"` is not counted if it is escaped with a \.

If the character is `"` then any `"` with a `\` before it is ignored (and included in the returned slice).
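The quote-skipping rules above can be sketched as a search over a plain slice. `indexOutsideQuotes` is a hypothetical helper for illustration; it assumes the sought byte is not a backslash and treats any backslash as escaping the byte that follows it.

```go
package main

import "fmt"

// indexOutsideQuotes returns the index of the first occurrence of c in b
// that is not inside an unescaped pair of double quotes, or -1 if there is
// none. When c is itself a double quote, the first unescaped quote matches.
func indexOutsideQuotes(b []byte, c byte) int {
	inQuotes, escaped := false, false
	for i, v := range b {
		switch {
		case escaped:
			escaped = false // previous byte was a backslash; skip this one
		case v == '\\':
			escaped = true
		case !inQuotes && v == c:
			return i
		case v == '"':
			inQuotes = !inQuotes
		}
	}
	return -1
}

func main() {
	b := []byte(`"a,b",c`)
	i := indexOutsideQuotes(b, ',')
	fmt.Printf("until: %s\n", b[:i])     // excludes the match
	fmt.Printf("through: %s\n", b[:i+1]) // includes the match
}
```

Slicing up to the index corresponds to the ReadUntil behaviour, and slicing one byte further corresponds to ReadThrough.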

func (*Buffer) Scan ¶

func (b *Buffer) Scan(c byte, through, slice bool) (subSlice []byte, err error)

Scan is the utility back end that does all the scan/read functionality.

func (*Buffer) ScanForOneOf ¶

func (b *Buffer) ScanForOneOf(through bool, c ...byte) (which byte, err error)

ScanForOneOf provides the ability to scan for two or more different bytes.

For simplicity it does not skip quoted sections. It was written to find quotes or braces, and is deliberately very bare.

If through is set to true, the cursor is advanced to the position after the match.

func (*Buffer) ScanThrough ¶

func (b *Buffer) ScanThrough(c byte) (err error)

ScanThrough does the same as ScanUntil except that it leaves the cursor at the next index *after* the found item.

func (*Buffer) ScanUntil ¶

func (b *Buffer) ScanUntil(c byte) (err error)

ScanUntil does the same as ReadUntil except it doesn't slice what it passed over.

func (*Buffer) String ¶

func (b *Buffer) String() (s string)

String returns the whole buffer as a string.

func (*Buffer) Tail ¶

func (b *Buffer) Tail() []byte

Tail returns the buffer starting from the current Pos position.

func (*Buffer) Write ¶

func (b *Buffer) Write(bb byte) (err error)

Write a byte into the next index of the buffer or return io.EOF if there is no space left.

func (*Buffer) WriteBytes ¶

func (b *Buffer) WriteBytes(bb []byte) (err error)

WriteBytes copies the given bytes over the existing contents of the buffer, starting at the current position.

Returns io.EOF if the write would exceed the end of the buffer, and does not perform the operation, nor move the cursor.
