golang.org/x/text/internal/colltab

package colltab

import "golang.org/x/text/internal/colltab"

Package colltab contains functionality related to collation tables. It is only to be used by the collate and search packages.

Index

Package Files

collelem.go colltab.go contract.go iter.go numeric.go table.go trie.go weighter.go

Constants

const (
    Ignore = ceType4
)

For normal collation elements, we assume that a collation element has either a primary value or a non-default secondary value, not both. Collation elements with a primary value are of one of the following forms:

01pppppp pppppppp ppppppp0 ssssssss, where

- p* is the primary collation value
- s* is the secondary collation value

00pppppp pppppppp ppppppps sssttttt, where

- p* is the primary collation value
- s* is the offset of the secondary value from the default value
- t* is the tertiary collation value

100ttttt cccccccc pppppppp pppppppp, where

- t* is the tertiary collation value
- c* is the canonical combining class
- p* is the primary collation value

Collation elements with a secondary value are of the form 1010cccc ccccssss ssssssss tttttttt, where

- c* is the canonical combining class
- s* is the secondary collation value
- t* is the tertiary collation value

Quaternary values are of the form 11qqqqqq qqqqqqqq qqqqqqq0 00000000, where

- q* is the quaternary value

const (
    MaxQuaternary = 0x1FFFFF // 21 bits.
)
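
The bit layouts documented for the constants above can be made concrete with a small, self-contained sketch. It does not import colltab (the package is internal) and is not the package's implementation; the names packPrimarySecondary, unpackPrimarySecondary, packCompact and unpackCompact are hypothetical, and the shifts and masks are derived only from the patterns 01pppppp pppppppp ppppppp0 ssssssss and 00pppppp pppppppp ppppppps sssttttt.

```go
package main

import "fmt"

// Hypothetical constants derived from the documented bit patterns.
const (
	tagPrimarySecondary = 0x40000000        // leading bits 01
	primaryBits         = 21                // 6 + 8 + 7 p bits
	primaryMask         = 1<<primaryBits - 1 // 0x1FFFFF
)

// packPrimarySecondary builds 01pppppp pppppppp ppppppp0 ssssssss.
func packPrimarySecondary(primary, secondary uint32) uint32 {
	return tagPrimarySecondary | ((primary & primaryMask) << 9) | (secondary & 0xFF)
}

// unpackPrimarySecondary reverses packPrimarySecondary.
func unpackPrimarySecondary(e uint32) (primary, secondary uint32) {
	return (e >> 9) & primaryMask, e & 0xFF
}

// packCompact builds 00pppppp pppppppp ppppppps sssttttt, where the
// secondary field holds a 4-bit offset from the default secondary value.
func packCompact(primary, secOffset, tertiary uint32) uint32 {
	return ((primary & primaryMask) << 9) | ((secOffset & 0xF) << 5) | (tertiary & 0x1F)
}

// unpackCompact reverses packCompact.
func unpackCompact(e uint32) (primary, secOffset, tertiary uint32) {
	return (e >> 9) & primaryMask, (e >> 5) & 0xF, e & 0x1F
}

func main() {
	e := packPrimarySecondary(0x1234, 0x20)
	p, s := unpackPrimarySecondary(e)
	fmt.Printf("%032b primary=%#x secondary=%#x\n", e, p, s)

	c := packCompact(0x1234, 3, 2)
	cp, cs, ct := unpackCompact(c)
	fmt.Printf("%032b primary=%#x offset=%d tertiary=%d\n", c, cp, cs, ct)
}
```

Both forms leave the same 21 bits for the primary weight, which matches the 21-bit width of MaxQuaternary.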

func MatchLang

func MatchLang(t language.Tag, tags []language.Tag) int

MatchLang finds the index of t in tags, using a matching algorithm used for collation and search. tags[0] must be language.Und; the remaining tags should be sorted alphabetically.

Language matching for collation and search is different from the matching defined by language.Matcher: the (inferred) base language must be an exact match for the relevant fields. For example, "gsw" should not match "de". Also the parent relation is different, as a parent may have a different script. So usually the parent of zh-Hant is und, whereas for MatchLang it is zh.
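
The parent relation described above can be illustrated with a deliberately simplified, string-based sketch. It is not MatchLang itself: the real function takes language.Tag values and canonicalizes them, whereas the hypothetical matchLangSketch below only strips trailing subtags from plain strings. It assumes, as the documentation requires, that tags[0] is the root ("und") and that the remaining tags are sorted alphabetically.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchLangSketch is a hypothetical stand-in that works on plain strings.
// It looks for an exact match in tags[1:], then retries with the last
// subtag removed, and falls back to index 0 (the root) if nothing matches.
func matchLangSketch(t string, tags []string) int {
	rest := tags[1:] // tags[0] is the root and is only used as the fallback
	for {
		i := sort.SearchStrings(rest, t)
		if i < len(rest) && rest[i] == t {
			return i + 1
		}
		cut := strings.LastIndexByte(t, '-')
		if cut < 0 {
			return 0 // no more subtags to drop: use the root
		}
		t = t[:cut] // "zh-Hant" becomes "zh", never "und" directly
	}
}

func main() {
	tags := []string{"und", "de", "zh", "zh-Hant"}
	for _, t := range []string{"zh-Hant-TW", "zh-Hans", "gsw", "de"} {
		fmt.Println(t, "->", tags[matchLangSketch(t, tags)])
	}
}
```

In this sketch "gsw" falls through to the root rather than to "de", and "zh-Hans" resolves to "zh", mirroring the two examples in the text.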

type ContractTrieSet

type ContractTrieSet []struct{ L, H, N, I uint8 }

type Elem

type Elem uint32

Elem is a representation of a collation element. This API provides ways to encode and decode Elems. Implementations of collation tables may use values greater than or equal to PrivateUse for their own purposes. However, these should never be returned by AppendNext.

const (
    PrivateUse Elem = minContract
)

func MakeElem

func MakeElem(primary, secondary, tertiary int, ccc uint8) (Elem, error)

MakeElem returns an Elem for the given values. It will return an error if the given combination of values is invalid.

func MakeQuaternary

func MakeQuaternary(v int) Elem

MakeQuaternary returns an Elem with the given quaternary value.

func (Elem) CCC

func (ce Elem) CCC() uint8

CCC returns the canonical combining class associated with the underlying character, if applicable, or 0 otherwise.

func (Elem) Mask

func (ce Elem) Mask(l Level) uint32

Mask sets weights for any level smaller than l to 0. The resulting Elem can be used to test for equality with other Elems to which the same mask has been applied.

func (Elem) Primary

func (ce Elem) Primary() int

Primary returns the primary collation weight for ce.

func (Elem) Quaternary

func (ce Elem) Quaternary() int

Quaternary returns the quaternary value if explicitly specified, 0 if ce == Ignore, or MaxQuaternary otherwise. Quaternary values are used only for shifted variants.
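
The three cases in this description can be sketched on raw uint32 values. This is an illustration under stated assumptions, not the package's code: the function name quaternary is hypothetical, the quaternary form is taken from the documented pattern 11qqqqqq qqqqqqqq qqqqqqq0 00000000, and the value used for ignore (the 1010... form with all weights zero) is an assumption, since the documentation only says Ignore = ceType4.

```go
package main

import "fmt"

const (
	quatTag       = 0xC0000000 // leading bits 11 mark a quaternary element
	ignore        = 0xA0000000 // assumption: 1010... form with zero weights
	maxQuaternary = 0x1FFFFF   // 21 bits, as documented for MaxQuaternary
)

// quaternary mirrors the documented rule: an explicit value if the element
// is in quaternary form, 0 for the ignore element, maxQuaternary otherwise.
func quaternary(e uint32) int {
	switch {
	case e&quatTag == quatTag:
		return int(e>>9) & maxQuaternary
	case e == ignore:
		return 0
	default:
		return maxQuaternary
	}
}

func main() {
	explicit := uint32(quatTag | 5<<9) // quaternary form carrying the value 5
	fmt.Println(quaternary(explicit), quaternary(ignore), quaternary(0x40246820))
}
```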

func (Elem) Secondary

func (ce Elem) Secondary() int

Secondary returns the secondary collation weight for ce.

func (Elem) Tertiary

func (ce Elem) Tertiary() uint8

Tertiary returns the tertiary collation weight for ce.

func (Elem) Weight

func (ce Elem) Weight(l Level) int

Weight returns the collation weight for the given level.

type Iter

type Iter struct {
    Weighter Weighter
    Elems    []Elem
    // N is the number of elements in Elems that will not be reordered on
    // subsequent iterations, N <= len(Elems).
    N   int
    // contains filtered or unexported fields
}

An Iter incrementally converts chunks of the input text to collation elements, while ensuring that the collation elements are in normalized order (that is, they are in the order as if the input text were normalized first).
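
What "normalized order" means for collation elements can be shown with a small sketch of canonical reordering. It is not how Iter is implemented; the elem type and the reorder function are hypothetical. It applies the Unicode canonical ordering rule to elements instead of characters: within each run of elements with a non-zero CCC, elements are stably sorted by CCC, and elements with CCC 0 (starters) never move.

```go
package main

import (
	"fmt"
	"sort"
)

// elem is a hypothetical stand-in for a collation element: an opaque weight
// plus the canonical combining class of the character it was produced from.
type elem struct {
	weight int
	ccc    uint8
}

// reorder sorts each run of non-starters (ccc != 0) by ascending ccc.
// The sort is stable, so elements with equal ccc keep their input order.
func reorder(elems []elem) {
	for i := 0; i < len(elems); {
		if elems[i].ccc == 0 {
			i++
			continue
		}
		j := i
		for j < len(elems) && elems[j].ccc != 0 {
			j++
		}
		run := elems[i:j]
		sort.SliceStable(run, func(a, b int) bool { return run[a].ccc < run[b].ccc })
		i = j
	}
}

func main() {
	// A starter, two modifiers in non-canonical order, then another starter.
	elems := []elem{{1, 0}, {2, 230}, {3, 220}, {4, 0}}
	reorder(elems)
	fmt.Println(elems)
}
```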

func (*Iter) Discard

func (i *Iter) Discard()

Discard removes the collation elements up to N.

func (*Iter) End

func (i *Iter) End() int

End returns the end position of the input text for which Next has returned results.

func (*Iter) Len

func (i *Iter) Len() int

Len returns the length of the input text.

func (*Iter) Next

func (i *Iter) Next() bool

Next appends Elems to the internal array. On each iteration, it will either add starters or modifiers. In the majority of cases, an Elem with a primary value > 0 will have a CCC of 0. The CCC values of collation elements are also used to detect if the input string was not normalized and to adjust the result accordingly.

func (*Iter) Reset

func (i *Iter) Reset(p int)

Reset sets the position in the current input text to p and discards any results obtained so far.

func (*Iter) SetInput

func (i *Iter) SetInput(s []byte)

SetInput resets i to input s.

func (*Iter) SetInputString

func (i *Iter) SetInputString(s string)

SetInputString resets i to input s.

type Level

type Level int

Level identifies the collation comparison level. The primary level corresponds to the basic sorting of text. The secondary level corresponds to accents and related linguistic elements. The tertiary level corresponds to casing and related concepts. The quaternary level is derived from the other levels by the various algorithms for handling variable elements.

const (
    Primary Level = iota
    Secondary
    Tertiary
    Quaternary
    Identity

    NumLevels
)
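
The role of the comparison levels can be sketched with a toy multi-level comparison. Everything below is hypothetical and self-contained: the weights type, the compare and levelKey functions, and the numeric weights are invented for illustration and do not come from any real collation table. The sketch compares all primary weights first and only consults the next level when the previous one is tied, skipping zero weights at each level.

```go
package main

import "fmt"

type Level int

const (
	Primary Level = iota
	Secondary
	Tertiary
)

// weights holds one invented weight per level for a single element.
type weights [3]int

// levelKey collects the non-zero weights of ws at level l.
func levelKey(ws []weights, l Level) []int {
	var key []int
	for _, w := range ws {
		if w[l] != 0 {
			key = append(key, w[l])
		}
	}
	return key
}

// compare returns -1, 0 or 1, considering levels Primary through max.
func compare(a, b []weights, max Level) int {
	for l := Primary; l <= max; l++ {
		ka, kb := levelKey(a, l), levelKey(b, l)
		for i := 0; i < len(ka) && i < len(kb); i++ {
			if ka[i] != kb[i] {
				if ka[i] < kb[i] {
					return -1
				}
				return 1
			}
		}
		if len(ka) != len(kb) {
			if len(ka) < len(kb) {
				return -1
			}
			return 1
		}
	}
	return 0
}

func main() {
	plain := []weights{{10, 1, 1}, {20, 1, 1}}  // e.g. unaccented, lower case
	accent := []weights{{10, 1, 1}, {20, 2, 1}} // differs at the secondary level
	upper := []weights{{10, 1, 2}, {20, 1, 1}}  // differs at the tertiary level
	fmt.Println(compare(plain, accent, Primary), compare(plain, accent, Secondary))
	fmt.Println(compare(plain, upper, Secondary), compare(plain, upper, Tertiary))
}
```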

type Table

type Table struct {
    Index Trie // main trie

    // expansion info
    ExpandElem []uint32

    // contraction info
    ContractTries  ContractTrieSet
    ContractElem   []uint32
    MaxContractLen int
    VariableTop    uint32
}

Table holds all collation data for a given collation ordering.

func (*Table) AppendNext

func (t *Table) AppendNext(w []Elem, b []byte) (res []Elem, n int)

func (*Table) AppendNextString

func (t *Table) AppendNextString(w []Elem, s string) (res []Elem, n int)

func (*Table) Domain

func (t *Table) Domain() []string

func (*Table) Start

func (t *Table) Start(p int, b []byte) int

func (*Table) StartString

func (t *Table) StartString(p int, s string) int

func (*Table) Top

func (t *Table) Top() uint32

type Trie

type Trie struct {
    Index0  []uint16 // index for first byte (0xC0-0xFF)
    Values0 []uint32 // index for first byte (0x00-0x7F)
    Index   []uint16
    Values  []uint32
}

type Weighter

type Weighter interface {
    // Start finds the start of the segment that includes position p.
    Start(p int, b []byte) int

    // StartString finds the start of the segment that includes position p.
    StartString(p int, s string) int

    // AppendNext appends Elems to buf corresponding to the longest match
    // of a single character or contraction from the start of s.
    // It returns the new buf and the number of bytes consumed.
    AppendNext(buf []Elem, s []byte) (ce []Elem, n int)

    // AppendNextString appends Elems to buf corresponding to the longest match
    // of a single character or contraction from the start of s.
    // It returns the new buf and the number of bytes consumed.
    AppendNextString(buf []Elem, s string) (ce []Elem, n int)

    // Domain returns a slice of all single characters and contractions for which
    // collation elements are defined in this table.
    Domain() []string

    // Top returns the highest variable primary value.
    Top() uint32
}

A Weighter can be used as a source for Collator and Searcher.

func NewNumericWeighter

func NewNumericWeighter(w Weighter) Weighter

NewNumericWeighter wraps w so that runs of digits sort by their numeric value rather than digit by digit.

Weighter w must have a free primary weight after the primary weight for 9. If this is not the case, numeric value will sort at the same primary level as the first primary sorting after 9.

Package colltab imports 6 packages and is imported by 3 packages. Updated 2017-12-15.