stringbenchmarks

package

v0.0.0-...-46fb334 Published: Mar 30, 2021 License: MIT Imports: 11 Imported by: 0

Repository

github.com/skeptycal/util

Documentation ¶

Overview ¶

Package stringutils implements additional functions to support the Go standard library strings package.

The algorithms chosen are based on benchmarks from the stringbenchmarks module. Your mileage may vary.

The current implementation at the start of this project was .../go/1.15.3/libexec/src/strings/strings.go

For information about UTF-8 strings in Go, see https://blog.golang.org/strings.

Index ¶

Constants
Variables
func ByteSamples() []byte
func Cutover(n int) int
func DedupeWhitespace(s string, ignoreNewlines bool) string
func Equal(a, b []byte) bool
func HashStrBytes(sep []byte) (uint32, uint32)
func IndexRabinKarpBytes(s, sep []byte) int
func IsASCIIAlpha(c byte) bool
func IsASCIIPrintable(s string) bool
func IsASCIISpace(c byte) bool
func IsAlphaNum(c byte) bool
func IsAlphaNumSwitch(c byte) bool
func IsAlphaNumUnder(c byte) bool
func IsDigit(c byte) bool
func IsDigitSingleOP(c byte) bool
func IsDigitSingleOPCompare(c byte) bool
func IsHex(c byte) bool
func IsSpaceMask(c byte) bool
func IsUnicodeWhiteSpaceMap(r rune) bool
func JoinLines(list []string) string
func RuneSample(c rune)
func RuneSamples() []rune
func SmallByteSamples() []byte
func SmallByteStringSamples() (list []string)
func SmallRuneSamples() []rune
func SmallRuneStringSamples() (list []string)
func TabIt(s string, n int) string
func ToLower(s string) string
func ToLowerByte(c byte) byte
func ToString(any interface{}) string
func ToUpper(s string) string
func ToUpperByte(c byte) byte
type Any
type List
- func NewList(name string, data []Any) *List
- func (v *List) Add(item Any)
- func (v *List) Cap() int
- func (v *List) Contains(item Any) bool
- func (v *List) Len() int
- func (v *List) Name() string
- func (v *List) ToSet() *Set
- func (v *List) ToSlice() []Any
type Set
- func NewSet(name string, data []Any) *Set
- func (s *Set) Add(item Any) error
- func (s *Set) Cap() int
- func (s *Set) Contains(item Any) bool
- func (s *Set) Len() int
- func (s *Set) Name() string
- func (s *Set) ToList() *List
- func (s *Set) ToSlice() []Any
type SetMap

Constants ¶

const (
	RuneError = utf8.RuneError // '\uFFFD'       // the "error" Rune or "Unicode replacement character"
	RuneSelf  = utf8.RuneSelf  // 0x80           // characters below RuneSelf are represented as themselves in a single byte.
	MaxRune   = utf8.MaxRune   // '\U0010FFFF'   // Maximum valid Unicode code point.
	UTFMax    = utf8.UTFMax    // 4              // maximum number of bytes of a UTF-8 encoded Unicode character.

)

Numbers fundamental to the encoding.

const (
	TAB   = 0x09 // '\t'
	LF    = 0x0A // '\n'
	VT    = 0x0B // '\v'
	FF    = 0x0C // '\f'
	CR    = 0x0D // '\r'
	SPACE = ' '
	NBSP  = 0x00A0
	NEL   = 0x0085
)

const MaxBruteForce = 64 // x86 values

const PrimeRK = 16777619

PrimeRK is the prime base used in the Rabin-Karp algorithm.

Variables ¶

var MaxLen int = 0

var (

	// UnicodeWhiteSpaceMap provides a mapping from Unicode runes to strings
	// with descriptions of each. It is marginally slower than the bool map.
	//
	// In computer programming, whitespace is any character or series of
	// characters that represent horizontal or vertical space in typography.
	// When rendered, a whitespace character does not correspond to a visible
	// mark, but typically does occupy an area on a page. For example, the
	// common whitespace symbol SPACE (unicode: U+0020 ASCII: 32 decimal 0x20
	// hex) represents a blank space punctuation character in text, used as a
	// word divider in Western scripts.
	//
	// Reference: https://en.wikipedia.org/wiki/Whitespace_character
	UnicodeWhiteSpaceMap = map[rune]string{
		0x0009: `CHARACTER TABULATION <TAB>`,
		0x000A: `ASCII LF`,
		0x000B: `LINE TABULATION <VT>`,
		0x000C: `FORM FEED <FF>`,
		0x000D: `ASCII CR`,
		0x0020: `SPACE <SP>`,
		0x00A0: `NO-BREAK SPACE <NBSP>`,
		0x0085: `NEL; Next Line`,
		0x1680: `Ogham space mark, interword separation in Ogham text`,
		0x2000: `EN QUAD, 0x2002 is preferred`,
		0x2001: `EM QUAD, mutton quad, 0x2003 is preferred`,
		0x2002: `EN SPACE, "nut", &ensp, LaTeX: '\enspace'`,
		0x2003: `EM SPACE, "mutton", &emsp;, LaTeX: '\quad'`,
		0x2004: `THREE-PER-EM SPACE, "thick space", &emsp13;`,
		0x2005: `four-per-em space, "mid space", &emsp14;`,
		0x2006: `SIX-PER-EM SPACE, sometimes equated to U+2009`,
		0x2007: `FIGURE SPACE, width of monospaced char, &numsp;`,
		0x2008: `PUNCTUATION SPACE, width of period or comma, &puncsp;`,
		0x2009: `THIN SPACE, 1/5th em, thousands sep, &thinsp;; LaTeX: '\,'`,
		0x200A: `HAIR SPACE, &hairsp;`,
		0x2028: `LINE SEPARATOR`,
		0x2029: `PARAGRAPH SEPARATOR`,
		0x202F: `NARROW NO-BREAK SPACE`,
		0x205F: `MEDIUM MATHEMATICAL SPACE, MMSP, &MediumSpace, 4/18 em`,
		0x3000: `IDEOGRAPHIC SPACE, full width CJK character cell`,
		0xFEFF: `ZERO WIDTH NO-BREAK SPACE <ZWNBSP> (BOM), deprecated Unicode 3.2 (use U+2060)`,
	}
)

Functions ¶

func ByteSamples ¶

func ByteSamples() []byte

func Cutover ¶

func Cutover(n int) int

Cutover reports the number of failures of IndexByte we should tolerate before switching over to Index. n is the number of bytes processed so far. See the bytes.Index implementation for details.
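The heuristic described above can be sketched as follows. This is a minimal illustration, assuming a rule of roughly one tolerated failure per 8 bytes scanned plus a small initial allowance; the name cutover and the constants are assumptions and may differ from what this package or any particular architecture uses.

```go
package main

import "fmt"

// cutover is a hypothetical stand-in for Cutover. It tolerates about one
// IndexByte failure per 8 bytes processed, plus two failures of initial
// slack. The constants are an assumption, not taken from this package.
func cutover(n int) int {
	return (n + 16) / 8
}

func main() {
	for _, n := range []int{0, 8, 64, 1024} {
		fmt.Printf("after %4d bytes: tolerate %d failures\n", n, cutover(n))
	}
}
```

The allowance grows with n, so a search that has already scanned a lot of input is given more chances before it abandons the byte-at-a-time strategy.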

func DedupeWhitespace ¶

func DedupeWhitespace(s string, ignoreNewlines bool) string

DedupeWhitespace collapses each run of consecutive whitespace in s into a single space. If ignoreNewlines is true, '\n' is ignored.
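One possible way to implement this behavior is sketched below. It is not the package's code: the name dedupeWhitespace is hypothetical, whitespace is detected with unicode.IsSpace, and "ignored" is assumed to mean that '\n' is copied through unchanged rather than collapsed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// dedupeWhitespace collapses each run of whitespace into one space.
// When keepNewlines is true, '\n' is copied through unchanged and ends the
// current run. That reading of "ignored" is an assumption.
func dedupeWhitespace(s string, keepNewlines bool) string {
	var b strings.Builder
	b.Grow(len(s))
	inRun := false // true while inside a run of whitespace
	for _, r := range s {
		switch {
		case keepNewlines && r == '\n':
			b.WriteRune(r)
			inRun = false
		case unicode.IsSpace(r):
			if !inRun {
				b.WriteByte(' ')
			}
			inRun = true
		default:
			b.WriteRune(r)
			inRun = false
		}
	}
	return b.String()
}

func main() {
	in := "a  \t b\n\nc"
	fmt.Printf("%q\n", dedupeWhitespace(in, false))
	fmt.Printf("%q\n", dedupeWhitespace(in, true))
}
```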

func Equal ¶

func Equal(a, b []byte) bool

Equal reports whether a and b are the same length and contain the same bytes. A nil argument is equivalent to an empty slice.
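A minimal sketch of these semantics, assuming the same approach as comparing the two slices as strings; the name equal is hypothetical. Converting a nil slice and an empty slice both yield the empty string, which is why the two compare equal.

```go
package main

import "fmt"

// equal reports whether a and b have the same length and the same bytes.
// A nil slice converts to "", so nil and an empty slice compare equal.
func equal(a, b []byte) bool {
	return string(a) == string(b)
}

func main() {
	fmt.Println(equal(nil, []byte{}))
	fmt.Println(equal([]byte("go"), []byte("go")))
	fmt.Println(equal([]byte("go"), []byte("Go")))
}
```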

func HashStrBytes ¶

func HashStrBytes(sep []byte) (uint32, uint32)

HashStrBytes returns the hash and the appropriate multiplicative factor for use in the Rabin-Karp algorithm.
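The two return values can be illustrated with the sketch below, which assumes a polynomial hash in base PrimeRK with natural uint32 overflow, and a factor equal to PrimeRK raised to len(sep). The names hashStr and primeRK are hypothetical stand-ins, and the package's actual code may differ.

```go
package main

import "fmt"

const primeRK = 16777619 // same value as PrimeRK above

// hashStr returns the polynomial hash of sep and primeRK^len(sep).
// All arithmetic wraps modulo 2^32.
func hashStr(sep []byte) (hash, pow uint32) {
	for _, c := range sep {
		hash = hash*primeRK + uint32(c)
	}
	// Exponentiation by squaring: pow = primeRK^len(sep).
	var sq uint32 = primeRK
	pow = 1
	for i := len(sep); i > 0; i >>= 1 {
		if i&1 != 0 {
			pow *= sq
		}
		sq *= sq
	}
	return hash, pow
}

func main() {
	h, p := hashStr([]byte("go"))
	fmt.Printf("hash=%d pow=%d\n", h, p)
}
```

The factor is what lets a rolling hash drop the oldest byte of a window in constant time: the byte's contribution is its value multiplied by that factor.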

func IndexRabinKarpBytes ¶

func IndexRabinKarpBytes(s, sep []byte) int

IndexRabinKarpBytes uses the Rabin-Karp search algorithm to return the index of the first occurrence of sep in s, or -1 if not present.
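A self-contained sketch of a Rabin-Karp search follows. It is an illustration under stated assumptions, not the package's implementation: the name indexRabinKarp is hypothetical, the hash is a base-16777619 polynomial with uint32 overflow, and every hash match is confirmed with bytes.Equal to rule out collisions.

```go
package main

import (
	"bytes"
	"fmt"
)

const primeRK = 16777619 // same value as PrimeRK above

// indexRabinKarp returns the index of the first occurrence of sep in s,
// or -1 if sep is not present.
func indexRabinKarp(s, sep []byte) int {
	n := len(sep)
	if n > len(s) {
		return -1
	}
	// Hash sep and the first window of s; pow ends up as primeRK^n.
	var hashSep, h, pow uint32 = 0, 0, 1
	for i := 0; i < n; i++ {
		hashSep = hashSep*primeRK + uint32(sep[i])
		h = h*primeRK + uint32(s[i])
		pow *= primeRK
	}
	if h == hashSep && bytes.Equal(s[:n], sep) {
		return 0
	}
	// Slide the window one byte at a time: add s[i], drop s[i-n].
	for i := n; i < len(s); i++ {
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i-n])
		if h == hashSep && bytes.Equal(s[i+1-n:i+1], sep) {
			return i + 1 - n
		}
	}
	return -1
}

func main() {
	fmt.Println(indexRabinKarp([]byte("hello world"), []byte("world")))
	fmt.Println(indexRabinKarp([]byte("hello world"), []byte("xyz")))
}
```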

func IsASCIIAlpha ¶

func IsASCIIAlpha(c byte) bool

func IsASCIIPrintable ¶

func IsASCIIPrintable(s string) bool

IsASCIIPrintable reports whether s consists only of printable ASCII characters, i.e., it contains no control characters such as tab or backspace.
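A check of this kind can be sketched as a byte-range test. The sketch assumes that "printable" means the range 0x20 (space) through 0x7E (tilde); the name isASCIIPrintable and that exact range are assumptions rather than details confirmed by the package.

```go
package main

import "fmt"

// isASCIIPrintable reports whether every byte of s lies in the assumed
// printable ASCII range 0x20..0x7E. Bytes >= 0x80 (non-ASCII) fail too.
func isASCIIPrintable(s string) bool {
	for i := 0; i < len(s); i++ {
		if c := s[i]; c < 0x20 || c > 0x7E {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCIIPrintable("Hello, world!"))
	fmt.Println(isASCIIPrintable("tab\there"))
	fmt.Println(isASCIIPrintable("caf\u00e9"))
}
```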

func IsASCIISpace ¶

func IsASCIISpace(c byte) bool

IsASCIISpace tests for the most common ASCII whitespace characters:

' ', '\t', '\n', '\f', '\r', '\v'

This excludes all Unicode code points above 0x007F.

The C language defines whitespace characters to be "space, horizontal tab, new-line, vertical tab, and form-feed."
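Two common ways to test for these six characters are sketched below: a switch statement and a 64-bit mask lookup. Both function names are hypothetical, and whether IsASCIISpace or IsSpaceMask in this package use either form is an assumption.

```go
package main

import "fmt"

// spaceMask has one bit set for each of '\t', '\n', '\v', '\f', '\r', ' '.
const spaceMask uint64 = 1<<0x09 | 1<<0x0A | 1<<0x0B | 1<<0x0C | 1<<0x0D | 1<<0x20

// isSpaceSwitch tests the six characters with a switch statement.
func isSpaceSwitch(c byte) bool {
	switch c {
	case ' ', '\t', '\n', '\f', '\r', '\v':
		return true
	}
	return false
}

// isSpaceMask tests the same six characters with a single bit lookup.
func isSpaceMask(c byte) bool {
	return c < 64 && spaceMask>>c&1 == 1
}

func main() {
	for _, c := range []byte{' ', '\t', 'a', 0xA0} {
		fmt.Printf("%#02x switch=%v mask=%v\n", c, isSpaceSwitch(c), isSpaceMask(c))
	}
}
```

Which form is faster depends on the compiler and CPU, so it is worth benchmarking both rather than assuming.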

func IsAlphaNum ¶

func IsAlphaNum(c byte) bool

IsAlphaNum reports whether the byte is an ASCII letter, number, or underscore.

func IsAlphaNumSwitch ¶

func IsAlphaNumSwitch(c byte) bool

func IsAlphaNumUnder ¶

func IsAlphaNumUnder(c byte) bool

func IsDigit ¶

func IsDigit(c byte) bool

func IsDigitSingleOP ¶

func IsDigitSingleOP(c byte) bool

IsDigitSingleOP uses a single comparison instead of the standard a <= c && c <= b form.

Another good example: a very common test is if (x >= 1 && x <= 9), which can be done as if ((unsigned)(x-1) <= (unsigned)(9-1)). Changing two conditional tests to one can be a big speed advantage, especially when it allows predicated execution instead of branches.

I used this for years (where justified) until I noticed about 10 years ago that compilers had started doing this transform in the optimizer, then I stopped. It is still good to know, since there are similar situations where the compiler can't make the transform for you, or if you're working on a compiler.
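In Go, the same trick can be sketched with unsigned byte arithmetic, which wraps around on underflow. The function names below are hypothetical and are not claimed to match the package's implementations.

```go
package main

import "fmt"

// isDigitTwoCompare is the conventional range test with two comparisons.
func isDigitTwoCompare(c byte) bool {
	return '0' <= c && c <= '9'
}

// isDigitSingleOp subtracts the lower bound first. Because byte arithmetic
// wraps, any c below '0' becomes a large value and fails the one comparison.
func isDigitSingleOp(c byte) bool {
	return c-'0' <= 9
}

func main() {
	for _, c := range []byte{'/', '0', '5', '9', ':'} {
		fmt.Printf("%q two=%v single=%v\n", c, isDigitTwoCompare(c), isDigitSingleOp(c))
	}
}
```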

func IsDigitSingleOPCompare ¶

func IsDigitSingleOPCompare(c byte) bool

IsDigitSingleOPCompare is a sample implementation used for benchmarking IsDigitSingleOP

func IsHex ¶

func IsHex(c byte) bool

func IsSpaceMask ¶

func IsSpaceMask(c byte) bool

func IsUnicodeWhiteSpaceMap ¶

func IsUnicodeWhiteSpaceMap(r rune) bool

IsUnicodeWhiteSpaceMap reports whether the rune is any Unicode whitespace character using the broadest and most complete definition.

This implementation is ~25% slower than IsASCIISpace(c byte) but tests 3.75 times as many code points.

It is ~7% faster than unicode.IsSpace(r rune) from the standard library and covers nearly twice as many code points.

isWhiteSpaceLogicChain checks for any Unicode whitespace rune.

Included:

0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x2028,
0x2029, 0x202F, 0x205F, 0x3000, 0x1680

Related Unicode characters (property White_Space=no), not included:

0x200B,	0x200C,	0x200D,	0x2060
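A map-based lookup of this kind can be sketched as follows. The sketch uses a map[rune]bool populated with the ASCII whitespace characters, NEL, NBSP, and the code points listed under "Included"; the names whiteSpace and isWhiteSpaceMap are hypothetical, and the package's own table may contain different entries.

```go
package main

import (
	"fmt"
	"unicode"
)

// whiteSpace is an illustrative table; a missing key reads as false.
var whiteSpace = map[rune]bool{
	0x0009: true, 0x000A: true, 0x000B: true, 0x000C: true, 0x000D: true,
	0x0020: true, 0x0085: true, 0x00A0: true, 0x1680: true,
	0x2000: true, 0x2001: true, 0x2002: true, 0x2003: true, 0x2004: true,
	0x2005: true, 0x2006: true, 0x2007: true, 0x2008: true, 0x2009: true,
	0x200A: true, 0x2028: true, 0x2029: true, 0x202F: true, 0x205F: true,
	0x3000: true,
}

// isWhiteSpaceMap reports whether r is in the whitespace table.
func isWhiteSpaceMap(r rune) bool {
	return whiteSpace[r]
}

func main() {
	// Compare the table lookup with the standard library for a few runes.
	for _, r := range []rune{' ', 0x00A0, 0x3000, 0x200B, 'a'} {
		fmt.Printf("%U map=%v unicode.IsSpace=%v\n", r, isWhiteSpaceMap(r), unicode.IsSpace(r))
	}
}
```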

func JoinLines ¶

func JoinLines(list []string) string

func RuneSample ¶

func RuneSample(c rune)

RuneSample prints a sample of various Unicode runes.

func RuneSamples ¶

func RuneSamples() []rune

func SmallByteSamples ¶

func SmallByteSamples() []byte

func SmallByteStringSamples ¶

func SmallByteStringSamples() (list []string)

func SmallRuneSamples ¶

func SmallRuneSamples() []rune

func SmallRuneStringSamples ¶

func SmallRuneStringSamples() (list []string)

func TabIt ¶

func TabIt(s string, n int) string

func ToLower ¶

func ToLower(s string) string

func ToLowerByte ¶

func ToLowerByte(c byte) byte

func ToString ¶

func ToString(any interface{}) string

ToString implements Stringer directly as a function call with a parameter instead of a method on that parameter.
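One way such a function could work is sketched below: prefer the value itself if it is a string, then its String or Error method, and fall back to fmt.Sprint. The name toString, the example type celsius, and the order of the cases are all assumptions for illustration.

```go
package main

import "fmt"

// celsius is a hypothetical type that implements fmt.Stringer.
type celsius float64

func (c celsius) String() string { return fmt.Sprintf("%.1f°C", float64(c)) }

// toString converts any value to a string without requiring the caller
// to know whether the value has a String method.
func toString(v interface{}) string {
	switch t := v.(type) {
	case string:
		return t
	case fmt.Stringer:
		return t.String()
	case error:
		return t.Error()
	default:
		return fmt.Sprint(v)
	}
}

func main() {
	fmt.Println(toString("plain"))
	fmt.Println(toString(42))
	fmt.Println(toString(celsius(21.5)))
}
```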

func ToUpper ¶

func ToUpper(s string) string

func ToUpperByte ¶

func ToUpperByte(c byte) byte

Types ¶

type Any ¶

type Any interface{}

Any is used to store data when the type cannot be determined ahead of time.

type List ¶

type List struct {
	// contains filtered or unexported fields
}

List is a wrapper around a slice of items. It offers formatting options and convenience functions.

Example ¶

// List.Contains()
fmt.Println(tempList.Contains(3.14))
fmt.Println(tempList.Contains(42))
// List.Len()
fmt.Println(tempList.Len())
// List.Cap()
fmt.Println(tempList.Cap())
// List.Name()
fmt.Println(tempList.Name())
// List.Add()
fmt.Println(tempList.Contains("fake"))
tempList.Add("fake")
// fmt.Println(tempList.Contains("fake"))

Output:

false
false
6
6
tempList
false

func NewList ¶

func NewList(name string, data []Any) *List

NewList returns a new List from the given data.

Example ¶

fmt.Println(tempList)

Output:

&{tempList [this 1 <nil> 0 3.14 9]}

func (*List) Add ¶

func (v *List) Add(item Any)

Add adds item to the List. Duplicates are allowed.

func (*List) Cap ¶

func (v *List) Cap() int

Cap returns the max number of elements in the List.

func (*List) Contains ¶

func (v *List) Contains(item Any) bool

Contains reports whether the List contains item.

func (*List) Len ¶

func (v *List) Len() int

Len returns the count of elements in the List. If the List is nil, Len() is zero.

func (*List) Name ¶

func (v *List) Name() string

Name returns the name of the List.

func (*List) ToSet ¶

func (v *List) ToSet() *Set

ToSet returns the underlying data as a Set.

func (*List) ToSlice ¶

func (v *List) ToSlice() []Any

ToSlice returns the underlying data as a slice.

type Set ¶

type Set struct {
	// contains filtered or unexported fields
}

Set is a hashable version of a list with unique items.

Example ¶

// Set.Contains()
fmt.Println(tempSet.Contains(3.14))
fmt.Println(tempSet.Contains(42))
// Set.Len()
fmt.Println(tempSet.Len())
// Set.Cap()
fmt.Println(tempSet.Cap())
// Set.Name()
fmt.Println(tempSet.Name())
// Set.Add()
fmt.Println(tempSet.Contains("fake"))
_ = tempSet.Add("fake")
// fmt.Println(tempSet.Contains("fake"))

Output:

true
false
6
6
tempSet
false

func NewSet ¶

func NewSet(name string, data []Any) *Set

NewSet returns a new Set from the given data.

Example ¶

fmt.Println(tempSet)

Output:

&{tempSet map[<nil>:true 3.14:true 0:true 1:true 9:true this:true]}

func (*Set) Add ¶

func (s *Set) Add(item Any) error

Add adds item to the Set or returns an error. Duplicates are not allowed.

func (*Set) Cap ¶

func (s *Set) Cap() int

Cap returns the maximum number of elements in the Set. It is provided because the built-in cap is undefined for map types in Go.

func (*Set) Contains ¶

func (s *Set) Contains(item Any) bool

Contains returns true if the Set contains item.

func (*Set) Len ¶

func (s *Set) Len() int

Len returns the count of elements in the Set. If the Set is nil, Len() is zero.

func (*Set) Name ¶

func (s *Set) Name() string

Name returns the name of the Set.

func (*Set) ToList ¶

func (s *Set) ToList() *List

ToList returns the underlying data as a List.

func (*Set) ToSlice ¶

func (s *Set) ToSlice() []Any

ToSlice returns the underlying data as a slice.

type SetMap ¶

type SetMap = map[Any]bool
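A set backed by a map of this shape can be sketched as follows. The type and method names (item, set, newSet, add, contains, length) and the error returned for duplicates are hypothetical; the unexported fields of the real Set are not shown in this documentation, so the layout is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type item interface{}

type setMap = map[item]bool

// set is an illustrative named set backed by a map.
type set struct {
	name string
	data setMap
}

var errDuplicate = errors.New("set: item already present")

// newSet builds a set from items; duplicates in items collapse silently.
// Items must be comparable: a slice or map used as a key would panic.
func newSet(name string, items []item) *set {
	s := &set{name: name, data: make(setMap, len(items))}
	for _, it := range items {
		s.data[it] = true
	}
	return s
}

// add inserts it, or returns errDuplicate if it is already present.
func (s *set) add(it item) error {
	if s.data[it] {
		return errDuplicate
	}
	s.data[it] = true
	return nil
}

func (s *set) contains(it item) bool { return s.data[it] }

// length returns the element count, treating a nil set as empty.
func (s *set) length() int {
	if s == nil {
		return 0
	}
	return len(s.data)
}

func main() {
	s := newSet("demo", []item{"this", 1, nil, 0, 3.14, 9})
	fmt.Println(s.contains(3.14), s.contains(42), s.length())
	fmt.Println(s.add("fake"), s.add("fake"))
}
```

Membership tests on a map take constant time on average, whereas a slice-backed list needs a linear scan, which is the usual reason to convert a list to a set before repeated lookups.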
