asciimap

package module
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 6, 2023 License: BSD-3-Clause Imports: 2 Imported by: 0

README

Ascii runes mapper

Ascii runes mapper extends golang.org/x/text/transform extension to support mapping from a single rune to a string with multiple runes. The idea behind implementing such a mapper is to replace ASCII 256 extended characters that are part of unicode.Ll and unicode.Lu to remap to basic ASCII 128 characters.

What's the use case?

We had a common issue that Norwegian characters (could be others - Vietnamese, Greece, etc.) are part of unicode.Ll or Lu but using norm.NFD the decomposition of character does not work for characters such as ["Œ", "Æ", "Ø", "ß"] and their lowercase variants. So you are not able to modify using the transform module the name groß/Æd/Snøfall 2 as it is not part of the norm.NFD decomposition.

When you would like to search for previously mentioned titles using simple ASCII characters, you come across a problem that it does not match at all (it has to be an exact match of Unicode character). That's why this library exists. It can transform uncommon characters into their string representatives that are part of ASCII 128. These new variants could be part of the search index as well as old variants.

When you would like to find gross or groß it does not matter, they are both the same. As I have not found such a library in Go that would fit the transform extension module, only custom libraries.

Rune mapper

The rune Map transformation function has prescription of Map(mapping func(rune) string) Transformer which means that single rune cloud be converted into single/multiple runes. The tests were inspired by module golang.org/x/text/runes which was never adopted into Go modules.

Examples

import mapper "github.com/Matovidlo/ascii-runes-map"

func main() {
    t := mapper.Map(mapper.Ascii256Toascii128)
    text, _, _ = transform.String(t, "Hellø")
    fmt.Println(text) // Hello
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Ascii256Toascii128

func Ascii256Toascii128(r rune) string

Ascii256Toascii128 maps given rune `r` that is in both ascii 256 and unicode.Ll or unicode.Lu categories. This conversion does not support stripping of unicode.Mn as it can be done using norm.NFD normalization.

Types

type Transformer

type Transformer struct {
	// contains filtered or unexported fields
}

Transformer implements the transform.Transformer interface.

func Map

func Map(mapping func(rune) string) Transformer

Map returns a Transformer that maps the runes in the input using the given mapping. Illegal bytes in the input are converted to utf8.RuneError before being passed to the mapping func.

func (Transformer) Reset

func (t Transformer) Reset()

func (Transformer) Span

func (t Transformer) Span(b []byte, atEOF bool) (n int, err error)

func (Transformer) Transform

func (t Transformer) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL