translit

Version: v1.0.1
Published: Jan 2, 2024 License: Unlicense Imports: 4 Imported by: 1

README

translit

Go (Golang) utilities for (mostly Cyrillic) transliteration.

This project aims to provide:

This project is intended to be used with golang.org/x/text/transform, a well-thought-out and convenient base for streaming text transformation.

If you (a) don't need to build a custom transliterator and (b) are fine with a custom license, take a look at essentialkaos/translit: it is fast and implements plenty of standards for Cyrillic.

Features

  • easy to build custom transliterations with map[string]string rules - translit.Map(map[string]string{...}).Transformer()
  • tries longest match first - correct multi-character transliterations like z → з and zh → ж (and not zh → зх)
  • decent expected performance - implemented using a tree lookup under the hood (no regexps, no brute-force multi-replacement; the transformer itself should not generate any garbage of its own)
  • The Unlicense: Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.
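The longest-match behaviour listed above can be illustrated with a small, standard-library-only sketch. This is not the package's actual implementation (that lives in its internal packages); the node type and the build and transliterate functions are hypothetical names used only for illustration, and unlike the real transformer this sketch works on whole strings rather than streams.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one level of a byte-keyed lookup tree (hypothetical type for this sketch).
type node struct {
	children map[byte]*node
	value    string // replacement text, meaningful only when terminal is true
	terminal bool   // true if a rule ends at this node
}

// build compiles a rule map into a lookup tree, one tree level per source byte.
func build(rules map[string]string) *node {
	root := &node{children: map[byte]*node{}}
	for from, to := range rules {
		n := root
		for i := 0; i < len(from); i++ {
			next, ok := n.children[from[i]]
			if !ok {
				next = &node{children: map[byte]*node{}}
				n.children[from[i]] = next
			}
			n = next
		}
		n.value, n.terminal = to, true
	}
	return root
}

// transliterate walks the tree from every position and applies the longest
// rule that matches there; bytes without a matching rule are copied as-is.
func transliterate(root *node, s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		n := root
		matchLen, matchVal := 0, ""
		for j := i; j < len(s); j++ {
			next, ok := n.children[s[j]]
			if !ok {
				break
			}
			n = next
			if n.terminal {
				// remember the longest rule seen so far and keep descending
				matchLen, matchVal = j-i+1, n.value
			}
		}
		if matchLen == 0 {
			b.WriteByte(s[i])
			i++
			continue
		}
		b.WriteString(matchVal)
		i += matchLen
	}
	return b.String()
}

func main() {
	root := build(map[string]string{"z": "з", "zh": "ж", "h": "х"})
	fmt.Println(transliterate(root, "zh z h")) // prints: ж з х
}
```

Because the walk keeps descending after the first terminal node, "zh" is resolved by the two-byte rule before the shorter "z" rule gets a chance to consume its first byte.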

It is intentionally kept as simple as possible, but that comes at the cost of higher memory usage in some special cases:

  • rules like "first word character" / "non-first word character" result in multiple tree paths (roughly, each "alphabet letter" + "non-first letter" permutation; this applies mostly to the uknational implementation)
  • upper/lower case - each upper/lower character conversion should be a custom rule, like Б → B and б → b
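Since every case variant needs its own rule, one option is to generate the capitalized rules from a lower-case map before passing the result to translit.Map. The sketch below uses only the standard library; withCapitalized and upperFirst are hypothetical helpers, not part of this package, and only the first rune of each rule is upper-cased (all-caps input would need further rules).

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// upperFirst returns s with its first rune upper-cased.
// Empty or invalidly encoded strings are returned unchanged.
func upperFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		return s
	}
	return string(unicode.ToUpper(r)) + s[size:]
}

// withCapitalized copies rules and adds a capitalized variant for every rule
// that does not already have one, so explicit rules are never overwritten.
func withCapitalized(rules map[string]string) map[string]string {
	out := make(map[string]string, 2*len(rules))
	for from, to := range rules {
		out[from] = to
		capFrom := upperFirst(from)
		if _, exists := rules[capFrom]; !exists {
			out[capFrom] = upperFirst(to)
		}
	}
	return out
}

func main() {
	rules := withCapitalized(map[string]string{
		"б":  "b",
		"ля": "lya",
	})
	fmt.Println(len(rules))               // prints: 4
	fmt.Println(rules["Б"], rules["Ля"]) // prints: B Lya
}
```

The doubled map size reflects the memory trade-off mentioned above: each generated variant becomes an additional path in the lookup tree.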

Usage

go get -u github.com/mxmCherry/translit

Custom rules

package translit_test

import (
	"fmt"

	"github.com/mxmCherry/translit"
	"golang.org/x/text/transform"
)

func ExampleMap() {
	// pre-compile a transformer factory from rule map;
	// this is recommended to be a global variable in your own package:
	custom := translit.Map(
		map[string]string{
			"л":  "l",
			"Л":  "L",
			"ля": "lya",
			"Ля": "Lya",
		},
	)

	// get a "fresh" transformer (can be done for each transliteration instead of tr.Reset()):
	tr := custom.Transformer()

	var s string

	tr.Reset() // reset transformer state before usage - it is stateful and non-thread-safe
	s, _, _ = transform.String(tr, "Л - л")
	fmt.Println(s) // L - l

	tr.Reset() // reset transformer state before usage - it is stateful and non-thread-safe
	s, _, _ = transform.String(tr, "Ля-лЯ-ля")
	fmt.Println(s) // Lya-lЯ-lya - no rule for upper-case "Я", so it's not converted

	// Output:
	// L - l
	// Lya-lЯ-lya
}

Language-specific transliterator

package uknational_test

import (
	"fmt"

	"github.com/mxmCherry/translit/uknational"
	"golang.org/x/text/transform"
)

func ExampleToLatin() {
	uk := uknational.ToLatin() // this is recommended to be a global variable in your own package

	// https://uk.wikipedia.org/wiki/Панграма
	s, _, _ := transform.String(uk.Transformer(), "Десь чув, що той фраєр привіз їхньому царю грильяж та класну шубу з пір'я ґави.")
	fmt.Println(s)

	// Output:
	// Des chuv, shcho toi fraier pryviz yikhnomu tsariu hryliazh ta klasnu shubu z piria gavy.
}

Guidelines

This package aims to provide default transliterations for some languages.

Subpackage names for these transliterations should consist of an ISO 639-1 language code followed by the standard name, for example: uknational, where uk is the language code and national is the standard (defined by the national government).

The one-subpackage-per-language/standard approach reduces the memory footprint: you pay (with memory) only for what you actually use.

These subpackages should expose at least one translit.Factory constructor, ideally two (for two-way transformers, like ToLatin()/FromLatin()).

Code style

Documentation

Overview

Package translit provides a transliteration transformer. See subpackages for language-specific transformers.

Types

type Factory

type Factory interface {
	// Transformer returns/builds a "fresh" transliteration transformer.
	Transformer() transform.Transformer
}

Factory represents a pre-compiled tree builder/factory.

func Map

func Map(rules map[string]string) Factory

Map pre-compiles a string-to-string transliteration rule map into a transformer factory.

Directories

Path                     Synopsis
internal/transformer     Package transformer implements a transliteration golang.org/x/text/transform.Transformer.
internal/tree            Package tree defines a byte-to-byte-slice tree and implements some basic operations for it.
internal/tree/maptree    Package maptree provides a tree implementation with map-based children lookup.
ruicao                   Package ruicao provides Russian romanization transliterator, as defined in ICAO standard.
uknational               Package uknational provides Ukrainian romanization transliterator, as defined in http://zakon.rada.gov.ua/laws/show/55-2010-%D0%BF
