diacritics

package
v0.6.0
Published: Apr 13, 2020 | License: LGPL-3.0 | Imports: 5 | Imported by: 0

Documentation

Overview

Package diacritics is a subpackage of package candidate. It attempts to remove diacritical marks from extended Latin letters using one of two strategies:

- Strategy #1: Straight diacritics removal (NFKD -> strip Mn -> NFKC)
- Strategy #2: Apache Lucene ASCII folding

Constants

This section is empty.

Variables

var AsciiFoldTransformer = transform.Chain(
	norm.NFKC,
	&asciiFoldSpanningTransformer{},
)

AsciiFoldTransformer is a Unicode stream transformer that replaces each character with its ASCII-folded equivalent. As the declaration shows, the input is first normalized to NFKC and then passed through the folding step.

var AsciiFoldTranslateTable = map[rune]string{}/* 1240 elements not displayed */

AsciiFoldTranslateTable is the ASCII folding database, taken from Apache Lucene's ASCIIFoldingFilter: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java
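As a rough illustration of how a map[rune]string folding table can be applied, the sketch below walks a string rune by rune and substitutes any rune found in the table. It is not the package's implementation: foldASCII and sampleFoldTable are hypothetical names, and the four entries are a hand-picked subset chosen for the example, not the 1240-entry AsciiFoldTranslateTable.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleFoldTable is a hypothetical, hand-picked subset used only for
// illustration. It stands in for a much larger table of the same type.
var sampleFoldTable = map[rune]string{
	'\u00e9': "e",  // LATIN SMALL LETTER E WITH ACUTE
	'\u00f6': "o",  // LATIN SMALL LETTER O WITH DIAERESIS
	'\u00df': "ss", // LATIN SMALL LETTER SHARP S
	'\u00c6': "AE", // LATIN CAPITAL LETTER AE
}

// foldASCII replaces every rune present in table with its replacement
// string and copies all other runes through unchanged.
func foldASCII(s string, table map[rune]string) string {
	var b strings.Builder
	for _, r := range s {
		if repl, ok := table[r]; ok {
			b.WriteString(repl)
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(foldASCII("\u00c6ther, Stra\u00dfe", sampleFoldTable))
}
```

Because the replacement is a string rather than a rune, a single input character can expand to several ASCII characters, which is why the table maps rune to string.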

StripDiacriticalMarksTransformer is a Unicode stream transformer that tries to remove as many combining diacritical marks from the input string as possible. It handles different encodings of the same character wherever possible, such as 'ö' as a single code point versus 'o' followed by a combining diaeresis, which renders as 'ö' but consists of 2 code points.

The removal step is preceded by Unicode decomposition (NFKD), and the result is then recomposed (NFKC) to produce the final output.
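The middle step of that pipeline, dropping runes in Unicode category Mn, can be sketched with the standard library alone. This is only an illustration under stated assumptions: stripMarks is a hypothetical helper, and it expects input that is already decomposed, since the standard library has no normalizer. The decomposition and recomposition steps are not shown.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks drops every rune in Unicode category Mn (nonspacing mark).
// It assumes the input is already decomposed; a precomposed letter such
// as U+00F6 contains no separate mark and passes through unchanged.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	decomposed := "o\u0308" // 'o' followed by U+0308 COMBINING DIAERESIS
	fmt.Println(stripMarks(decomposed))
}
```

The precomposed case is exactly why a decomposition pass has to come first: without it, a single-code-point 'ö' has no mark to strip.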

Functions

This section is empty.

Types

This section is empty.
