Documentation ¶
Overview ¶
Package stringutils implements additional functions to supplement the Go standard library strings package.
The algorithms chosen are based on benchmarks from the stringbenchmarks module; your mileage may vary.
The standard library implementation current at the start of this project was .../go/1.15.3/libexec/src/strings/strings.go
For information about UTF-8 strings in Go, see https://blog.golang.org/strings.
Index ¶
- Constants
- Variables
- func ByteSamples() []byte
- func Cutover(n int) int
- func DedupeWhitespace(s string, ignoreNewlines bool) string
- func Equal(a, b []byte) bool
- func HashStrBytes(sep []byte) (uint32, uint32)
- func IndexRabinKarpBytes(s, sep []byte) int
- func IsASCIIAlpha(c byte) bool
- func IsASCIIPrintable(s string) bool
- func IsASCIISpace(c byte) bool
- func IsAlphaNum(c byte) bool
- func IsAlphaNumSwitch(c byte) bool
- func IsAlphaNumUnder(c byte) bool
- func IsDigit(c byte) bool
- func IsDigitSingleOP(c byte) bool
- func IsDigitSingleOPCompare(c byte) bool
- func IsHex(c byte) bool
- func IsSpaceMask(c byte) bool
- func IsUnicodeWhiteSpaceMap(r rune) bool
- func JoinLines(list []string) string
- func RuneSample(c rune)
- func RuneSamples() []rune
- func SmallByteSamples() []byte
- func SmallByteStringSamples() (list []string)
- func SmallRuneSamples() []rune
- func SmallRuneStringSamples() (list []string)
- func TabIt(s string, n int) string
- func ToLower(s string) string
- func ToLowerByte(c byte) byte
- func ToString(any interface{}) string
- func ToUpper(s string) string
- func ToUpperByte(c byte) byte
- type Any
- type List
- type Set
- type SetMap
Examples ¶
Constants ¶
const (
	RuneError = utf8.RuneError // '\uFFFD' // the "error" Rune or "Unicode replacement character"
	RuneSelf  = utf8.RuneSelf  // 0x80 // characters below RuneSelf are represented as themselves in a single byte.
	MaxRune   = utf8.MaxRune   // '\U0010FFFF' // Maximum valid Unicode code point.
	UTFMax    = utf8.UTFMax    // 4 // maximum number of bytes of a UTF-8 encoded Unicode character.
)
Numbers fundamental to the encoding.
const (
	TAB   = 0x09 // '\t'
	LF    = 0x0A // '\n'
	VT    = 0x0B // '\v'
	FF    = 0x0C // '\f'
	CR    = 0x0D // '\r'
	SPACE = ' '
	NBSP  = 0x00A0
	NEL   = 0x0085
)
const MaxBruteForce = 64 // x86 values
const PrimeRK = 16777619
PrimeRK is the prime base used in the Rabin-Karp algorithm.
Variables ¶
var MaxLen int = 0
var (
	// UnicodeWhiteSpaceMap provides a mapping from Unicode runes to strings
	// with descriptions of each. It is marginally slower than the bool map.
	//
	// In computer programming, whitespace is any character or series of
	// characters that represent horizontal or vertical space in typography.
	// When rendered, a whitespace character does not correspond to a visible
	// mark, but typically does occupy an area on a page. For example, the
	// common whitespace symbol SPACE (unicode: U+0020 ASCII: 32 decimal 0x20
	// hex) represents a blank space punctuation character in text, used as a
	// word divider in Western scripts.
	//
	// Reference: https://en.wikipedia.org/wiki/Whitespace_character
	UnicodeWhiteSpaceMap = map[rune]string{
		0x0009: `CHARACTER TABULATION <TAB>`,
		0x000A: `ASCII LF`,
		0x000B: `LINE TABULATION <VT>`,
		0x000C: `FORM FEED <FF>`,
		0x000D: `ASCII CR`,
		0x0020: `SPACE <SP>`,
		0x00A0: `NO-BREAK SPACE <NBSP>`,
		0x0085: `NEL; Next Line`,
		0x1680: `Ogham space mark, interword separation in Ogham text`,
		0x2000: `EN QUAD, 0x2002 is preferred`,
		0x2001: `EM QUAD, mutton quad, 0x2003 is preferred`,
		0x2002: `EN SPACE, "nut", &ensp, LaTeX: '\enspace'`,
		0x2003: `EM SPACE, "mutton",  , LaTeX: '\quad'`,
		0x2004: `THREE-PER-EM SPACE, "thick space",  `,
		0x2005: `four-per-em space, "mid space",  `,
		0x2006: `SIX-PER-EM SPACE, sometimes equated to U+2009`,
		0x2007: `FIGURE SPACE, width of monospaced char,  `,
		0x2008: `PUNCTUATION SPACE, width of period or comma,  `,
		0x2009: `THIN SPACE, 1/5th em, thousands sep,  ; LaTeX: '\,'`,
		0x200A: `HAIR SPACE,  `,
		0x2028: `LINE SEPARATOR`,
		0x2029: `PARAGRAPH SEPARATOR`,
		0x202F: `NARROW NO-BREAK SPACE`,
		0x205F: `MEDIUM MATHEMATICAL SPACE, MMSP, &MediumSpace, 4/18 em`,
		0x3000: `IDEOGRAPHIC SPACE, full width CJK character cell`,
		0xFFEF: `ZERO WIDTH NO-BREAK SPACE <ZWNBSP> (BOM), deprecated Unicode 3.2 (use U+2060)`,
	}
)
Functions ¶
func ByteSamples ¶
func ByteSamples() []byte
func Cutover ¶
Cutover reports the number of failures of IndexByte we should tolerate before switching over to Index. n is the number of bytes processed so far. See the bytes.Index implementation for details.
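The heuristic described above can be sketched as follows. The formula mirrors the amd64 version in Go's internal/bytealg package; whether this package uses the same formula is an assumption, and the name cutover is hypothetical.

```go
package main

import "fmt"

// cutover is a hypothetical stand-in for Cutover. The formula mirrors the
// amd64 heuristic in Go's internal/bytealg package; that this package uses
// the same one is an assumption.
func cutover(n int) int {
	// Tolerate one IndexByte false positive per 8 bytes scanned,
	// plus a little slack at the start.
	return (n + 16) / 8
}

func main() {
	for _, n := range []int{0, 64, 1024} {
		fmt.Printf("after %4d bytes: tolerate %d failures\n", n, cutover(n))
	}
}
```

The tolerance grows with the number of bytes already scanned, so a search that keeps hitting false first-byte matches switches to the full Index routine early.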
func DedupeWhitespace ¶
DedupeWhitespace collapses each run of consecutive whitespace in s into a single space. If ignoreNewlines is true, \n is ignored.
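A minimal sketch of this behavior, not the package's actual implementation. It assumes that "ignored" means a newline is copied through unchanged and ends the current whitespace run, and it uses unicode.IsSpace as the whitespace test; the name dedupeWhitespace is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// dedupeWhitespace is a hypothetical stand-in for DedupeWhitespace.
// Assumption: when keepNewlines is true, '\n' is copied through unchanged
// and is not treated as part of a whitespace run.
func dedupeWhitespace(s string, keepNewlines bool) string {
	var b strings.Builder
	b.Grow(len(s))
	inRun := false // true while inside a run of whitespace
	for _, r := range s {
		if keepNewlines && r == '\n' {
			b.WriteRune(r)
			inRun = false
			continue
		}
		if unicode.IsSpace(r) {
			if !inRun {
				b.WriteByte(' ') // first whitespace of the run becomes one space
				inRun = true
			}
			continue
		}
		b.WriteRune(r)
		inRun = false
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", dedupeWhitespace("a \t b\n\nc", false))
	fmt.Printf("%q\n", dedupeWhitespace("a \t b\n\nc", true))
}
```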
func Equal ¶
Equal reports whether a and b are the same length and contain the same bytes. A nil argument is equivalent to an empty slice.
func HashStrBytes ¶
HashStrBytes returns the hash and the appropriate multiplicative factor for use in the Rabin-Karp algorithm.
func IndexRabinKarpBytes ¶
IndexRabinKarpBytes uses the Rabin-Karp search algorithm to return the index of the first occurrence of sep in s, or -1 if not present.
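The rolling-hash search can be sketched as below, in the style of the standard library's Rabin-Karp code. The lowercase names hashBytes and indexRabinKarp are hypothetical, and the length guard is an addition for safety, not something the package is known to do.

```go
package main

import (
	"bytes"
	"fmt"
)

const primeRK = 16777619 // same value as PrimeRK

// hashBytes returns the polynomial hash of sep and primeRK^len(sep),
// both taken modulo 2^32 through uint32 overflow.
func hashBytes(sep []byte) (hash, pow uint32) {
	for _, c := range sep {
		hash = hash*primeRK + uint32(c)
	}
	pow = 1
	for range sep {
		pow *= primeRK
	}
	return hash, pow
}

// indexRabinKarp returns the index of the first occurrence of sep in s,
// or -1 if sep is not present.
func indexRabinKarp(s, sep []byte) int {
	n := len(sep)
	if n > len(s) {
		return -1
	}
	hashSep, pow := hashBytes(sep)
	var h uint32
	for _, c := range s[:n] {
		h = h*primeRK + uint32(c)
	}
	if h == hashSep && bytes.Equal(s[:n], sep) {
		return 0
	}
	for i := n; i < len(s); i++ {
		// Slide the window one byte: append s[i], drop s[i-n].
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i-n])
		// A hash match may be a collision, so confirm with a byte comparison.
		if h == hashSep && bytes.Equal(s[i-n+1:i+1], sep) {
			return i - n + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(indexRabinKarp([]byte("hello, gopher"), []byte("gopher")))
	fmt.Println(indexRabinKarp([]byte("hello, gopher"), []byte("xyz")))
}
```

The second return value of hashBytes is the factor needed to subtract the outgoing byte's contribution when the window slides.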
func IsASCIIAlpha ¶
func IsASCIIPrintable ¶
IsASCIIPrintable reports whether s is ASCII and printable, that is, it doesn't include tab, backspace, etc.
func IsASCIISpace ¶
IsASCIISpace tests for the most common ASCII whitespace characters:
' ', '\t', '\n', '\f', '\r', '\v'
This excludes all Unicode code points above 0x007F.
The C language defines whitespace characters to be "space, horizontal tab, new-line, vertical tab, and form-feed."
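Two common ways to test for these six characters are a switch and a 64-bit mask. The sketch below shows both under hypothetical names; it does not claim to match how IsASCIISpace or IsSpaceMask are actually written.

```go
package main

import "fmt"

// isSpaceSwitch tests the six common ASCII whitespace characters directly.
func isSpaceSwitch(c byte) bool {
	switch c {
	case ' ', '\t', '\n', '\f', '\r', '\v':
		return true
	}
	return false
}

// spaceMask has one bit set for each whitespace byte; all six are below 64.
const spaceMask uint64 = 1<<'\t' | 1<<'\n' | 1<<'\v' | 1<<'\f' | 1<<'\r' | 1<<' '

// isSpaceBitmask tests membership with a range check and a single bit test.
func isSpaceBitmask(c byte) bool {
	return c < 64 && spaceMask&(uint64(1)<<c) != 0
}

func main() {
	// Confirm that both forms agree on every possible byte value.
	agree := true
	for i := 0; i < 256; i++ {
		c := byte(i)
		if isSpaceSwitch(c) != isSpaceBitmask(c) {
			agree = false
		}
	}
	fmt.Println("forms agree:", agree)
}
```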
func IsAlphaNum ¶
IsAlphaNum reports whether the byte is an ASCII letter, digit, or underscore.
func IsAlphaNumSwitch ¶
func IsAlphaNumUnder ¶
func IsDigitSingleOP ¶
IsDigitSingleOP uses a single comparison instead of the standard a <= c && c <= b form. Another good example: a very common test is if (x >= 1 && x <= 9), which can be done as if ((unsigned)(x-1) <= (unsigned)(9-1)). Changing two conditional tests to one can be a big speed advantage, especially when it allows predicated execution instead of branches. I used this for years (where justified) until I noticed about 10 years ago that compilers had started doing this transform in the optimizer, then I stopped. It is still good to know, since there are similar situations where the compiler can't make the transform for you, or if you're working on a compiler.
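In Go the same trick relies on unsigned byte arithmetic wrapping around. A minimal sketch with hypothetical names, comparing the single-comparison form against the two-comparison form:

```go
package main

import "fmt"

// isDigitTwoOps is the conventional range test with two comparisons.
func isDigitTwoOps(c byte) bool {
	return '0' <= c && c <= '9'
}

// isDigitSingleOp subtracts '0' first. For bytes below '0' the unsigned
// subtraction wraps to a value above 9, so one comparison is enough.
func isDigitSingleOp(c byte) bool {
	return c-'0' <= 9
}

func main() {
	// Confirm that both forms agree on every possible byte value.
	agree := true
	for i := 0; i < 256; i++ {
		c := byte(i)
		if isDigitTwoOps(c) != isDigitSingleOp(c) {
			agree = false
		}
	}
	fmt.Println("forms agree:", agree)
}
```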
func IsDigitSingleOPCompare ¶
IsDigitSingleOPCompare is a sample implementation used for benchmarking IsDigitSingleOP
func IsSpaceMask ¶
func IsUnicodeWhiteSpaceMap ¶
IsUnicodeWhiteSpaceMap reports whether the rune is any Unicode whitespace character using the broadest and most complete definition.
This implementation is ~25% slower than IsASCIISpace(c byte) but tests 3.75 times more possible code points.
It is ~7% faster than unicode.IsSpace(r rune) from the standard library and covers nearly twice as many code points.
isWhiteSpaceLogicChain checks for any Unicode whitespace rune.
Included:
0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0x1680
Related Unicode characters (property White_Space=no), not included:
0x200B, 0x200C, 0x200D, 0x2060
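A map-based membership test along these lines can be sketched as follows. The set contains the ASCII and Latin-1 whitespace constants plus the "Included" code points listed above; the names whiteSpace and isWhiteSpaceMap are hypothetical, and the package's own map may hold additional entries.

```go
package main

import "fmt"

// whiteSpace is an illustrative bool map built from the code points
// listed in this section. It is not the package's own table.
var whiteSpace = map[rune]bool{
	0x0009: true, 0x000A: true, 0x000B: true, 0x000C: true, 0x000D: true,
	0x0020: true, 0x0085: true, 0x00A0: true, 0x1680: true,
	0x2000: true, 0x2001: true, 0x2002: true, 0x2003: true, 0x2004: true,
	0x2005: true, 0x2006: true, 0x2007: true, 0x2008: true, 0x2009: true,
	0x200A: true, 0x2028: true, 0x2029: true, 0x202F: true, 0x205F: true,
	0x3000: true,
}

// isWhiteSpaceMap reports whether r is in the table. A missing key
// yields the zero value false, so no explicit existence check is needed.
func isWhiteSpaceMap(r rune) bool {
	return whiteSpace[r]
}

func main() {
	fmt.Println(isWhiteSpaceMap(0x2003)) // EM SPACE, included
	fmt.Println(isWhiteSpaceMap(0x200B)) // ZERO WIDTH SPACE, not included
	fmt.Println(isWhiteSpaceMap('a'))
}
```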
func RuneSamples ¶
func RuneSamples() []rune
func SmallByteSamples ¶
func SmallByteSamples() []byte
func SmallByteStringSamples ¶
func SmallByteStringSamples() (list []string)
func SmallRuneSamples ¶
func SmallRuneSamples() []rune
func SmallRuneStringSamples ¶
func SmallRuneStringSamples() (list []string)
func ToLowerByte ¶
func ToString ¶
func ToString(any interface{}) string
ToString implements Stringer directly as a function call with a parameter instead of a method on that parameter.
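One way such a function could work is sketched below: use the value's own String method when it has one, and fall back to default formatting otherwise. This is an assumption about the approach, not the package's code; toString and celsius are hypothetical names.

```go
package main

import "fmt"

// celsius is a hypothetical type with its own String method.
type celsius float64

func (c celsius) String() string { return fmt.Sprintf("%.1f°C", float64(c)) }

// toString is a hypothetical stand-in for ToString. It prefers the
// value's fmt.Stringer implementation and otherwise uses fmt.Sprint.
func toString(v interface{}) string {
	if s, ok := v.(fmt.Stringer); ok {
		return s.String()
	}
	return fmt.Sprint(v)
}

func main() {
	fmt.Println(toString(celsius(21.5)))
	fmt.Println(toString(42))
	fmt.Println(toString(nil))
}
```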
func ToUpperByte ¶
Types ¶
type Any ¶
type Any interface{}
Any is used to store data when the type cannot be determined ahead of time.
type List ¶
type List struct {
// contains filtered or unexported fields
}
List is a wrapper around a slice of items. It offers formatting options and convenience functions.
Example ¶
// List.Contains()
fmt.Println(tempList.Contains(3.14))
fmt.Println(tempList.Contains(42))
// List.Len()
fmt.Println(tempList.Len())
// List.Cap()
fmt.Println(tempList.Cap())
// List.Name()
fmt.Println(tempList.Name())
// List.Add()
fmt.Println(tempList.Contains("fake"))
tempList.Add("fake")
// fmt.Println(tempList.Contains("fake"))
Output:
false
false
6
6
tempList
false
func NewList ¶
NewList returns a new List from the given data.
Example ¶
fmt.Println(tempList)
Output: &{tempList [this 1 <nil> 0 3.14 9]}
type Set ¶
type Set struct {
// contains filtered or unexported fields
}
Set is a hashable version of a list with unique items.
Example ¶
// Set.Contains()
fmt.Println(tempSet.Contains(3.14))
fmt.Println(tempSet.Contains(42))
// Set.Len()
fmt.Println(tempSet.Len())
// Set.Cap()
fmt.Println(tempSet.Cap())
// Set.Name()
fmt.Println(tempSet.Name())
// Set.Add()
fmt.Println(tempSet.Contains("fake"))
_ = tempSet.Add("fake")
// fmt.Println(tempSet.Contains("fake"))
Output:
true
false
6
6
tempSet
false
func NewSet ¶
NewSet returns a new Set from the given List.
Example ¶
fmt.Println(tempSet)
Output: &{tempSet map[<nil>:true 3.14:true 0:true 1:true 9:true this:true]}
func (*Set) Cap ¶
Cap returns the max number of elements in the Set (since cap is undefined for map types in Go).
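Because Set's fields are unexported, the following is only a sketch of how a map-backed set with these methods might look. The field names, the newSet constructor, Add returning an error for duplicates, and Cap reporting the current length are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// set is a hypothetical map-backed set; the field names are assumptions.
type set struct {
	name  string
	items map[interface{}]bool
}

// newSet builds a set from the given items, dropping duplicates.
func newSet(name string, items ...interface{}) *set {
	s := &set{name: name, items: make(map[interface{}]bool, len(items))}
	for _, it := range items {
		s.items[it] = true
	}
	return s
}

func (s *set) Contains(item interface{}) bool { return s.items[item] }
func (s *set) Len() int                       { return len(s.items) }

// Cap reports the current length, because Go's built-in cap does not
// accept maps. Treating Cap as Len is an assumption of this sketch.
func (s *set) Cap() int { return len(s.items) }

// Add inserts item and returns an error if it is already present.
// The error-on-duplicate behavior is an assumption.
func (s *set) Add(item interface{}) error {
	if s.items[item] {
		return errors.New("item already in set")
	}
	s.items[item] = true
	return nil
}

func main() {
	s := newSet("tempSet", "this", 1, nil, 0, 3.14, 9)
	fmt.Println(s.Contains(3.14), s.Contains(42), s.Len(), s.Cap())
	_ = s.Add("fake")
	fmt.Println(s.Contains("fake"), s.Len())
}
```

Items must be hashable to serve as map keys; adding a slice or map value to a set built this way would panic at run time.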