kanatrans

Version: v1.0.4
Published: Feb 11, 2024 License: MIT Imports: 8 Imported by: 0

README

English2KanaTransliteration

Convert English phrases into phonetic Japanese kana approximations, also known as Englishru. This does not translate English into Japanese; it transliterates English words into their approximate pronunciations in Japanese.

Based on the English to Katakana transcription code written in Python by Yoko Harada (@yokolet). Please see that repo for details on the phonetic conversion.

English to phoneme conversion is based on CMUDict. Kanji to Katakana conversion is based on KANJIDIC2. Thanks to JMDict and kana. Please refer to those licenses for non-free implementations.

This is a Go port with some additional functions:

  • Filtering functions to split, parse, and rejoin sentences which contain punctuation or improper contractions.
  • Also accepts Japanese input: converts any Kanji characters into their most common Hiragana pronunciation, converts Hiragana into Katakana, and leaves Katakana as is.
  • Also accepts Romaji input.
  • Strict input cleaning mode for use with TTS engines that do not understand punctuation and other characters. See the TTS note below.

Usage Example

Below is an example Go file to test this module. It reads input from stdin, converts the English sentences into their Japanese transliteration, and prints them to stdout.

package main

import (
	"bufio"
	"fmt"
	"os"

	// The package name (kanatrans) differs from the last path element,
	// so the import is aliased explicitly.
	kanatrans "github.com/Luigi-Pizzolito/English2KanaTransliteration"
)

func main() {
	// Create an instance of AllToKana
	allToKana := kanatrans.NewAllToKana()

	// Read lines from stdin until EOF or a read error
	reader := bufio.NewReader(os.Stdin)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			break // Exit loop on EOF or error
		}

		// Convert the line to Katakana
		result := allToKana.Convert(line)

		// Output the result
		fmt.Println(result)
	}
}

Sample Output:

❯ go run .
Hello there.
ヘロー ゼアー。
With this program, you can make Japanese text to speech speak in English!
ウィズ ジス プローラ、 ユー キャン メイク ジャーンイーズ テックスト ツー スピーチ スピーク イン イングシュ!
watashi wa miku desu~
ワタシ ワ ミク デス〜
Hello! こんにちは~ ヘロー, miki松原。
ヘロー! コンニチハ〜 ヘロー、 ミキショウゲン。
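
The loop in the usage example breaks on the first error returned by ReadString, and that includes io.EOF, so a final line without a trailing newline is never converted. One possible variant, sketched here with bufio.Scanner, also handles that last line. The names convertLines, convertString and shout are hypothetical helpers for illustration only; in real use, allToKana.Convert would be passed as the convert argument.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// convertLines reads r line by line, applies convert to each line and writes
// the result to w followed by a newline. convert is a placeholder for a
// function such as allToKana.Convert (assumption: any func(string) string).
func convertLines(r io.Reader, w io.Writer, convert func(string) string) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		// scanner.Text() returns the line with its terminator stripped.
		if _, err := fmt.Fprintln(w, convert(scanner.Text())); err != nil {
			return err
		}
	}
	// Err returns nil when the input simply ended (io.EOF).
	return scanner.Err()
}

// convertString runs convertLines over an in-memory string.
func convertString(input string, convert func(string) string) (string, error) {
	var out strings.Builder
	err := convertLines(strings.NewReader(input), &out, convert)
	return out.String(), err
}

// shout is a stand-in conversion function used only for this demo.
func shout(s string) string { return strings.ToUpper(s) }

func main() {
	// An in-memory reader keeps the demo deterministic; pass os.Stdin
	// instead to read interactively.
	input := strings.NewReader("hello there\nlast line without newline")
	if err := convertLines(input, os.Stdout, shout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because Scanner strips the line terminator, the conversion function never sees a trailing newline, which may or may not matter for a given converter.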

Using individual modules

All2Katakana

// Create an instance of AllToKana
allToKana := kanatrans.NewAllToKana()
// Usage
kana := allToKana.Convert("Hello! watashiwa 初音ミク.")
// -> ヘロー! ワタシワ ショオンミク。

Eng2Katakana

// Create an instance of EngToKana
engToKana := kanatrans.NewEngToKana()
// Usage
kana := engToKana.TranscriptSentence("Hello World!")
// -> ヘローワールド

Kanji2Katakana

// Create an instance of KanjiToKana
kanjiToKana := kanatrans.NewKanjiToKana()
// Usage
kana := kanjiToKana.Convert("初音")
// -> ショオン

This needs some work: it takes the most common pronunciation of each Kanji instead of the correct one for the context. Pull requests are welcome!
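
To make the limitation concrete, here is a minimal sketch of a context-free, per-character lookup. It is not the package's implementation: the function kanjiToKata and the table commonReading are hypothetical, and the four readings are simply the ones implied by the sample outputs in this README (初音 -> ショオン, 松原 -> ショウゲン). The real data comes from KANJIDIC2.

```go
package main

import (
	"fmt"
	"strings"
)

// commonReading is a toy table with a single katakana reading per kanji.
// Assumption: entries are inferred from the README's sample outputs.
var commonReading = map[rune]string{
	'初': "ショ",
	'音': "オン",
	'松': "ショウ",
	'原': "ゲン",
}

// kanjiToKata replaces each known kanji with its single stored reading and
// leaves every other character untouched. No context is considered.
func kanjiToKata(s string) string {
	var b strings.Builder
	for _, r := range s {
		if reading, ok := commonReading[r]; ok {
			b.WriteString(reading)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(kanjiToKata("初音")) // ショオン
}
```

A lookup like this yields ショオン for 初音, although as a name it is normally read ハツネ. Picking the right reading requires looking at whole words rather than single characters.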

Hiragana2Katakana

// Create an instance of HiraganaToKana
hiraganaToKana := kanatrans.NewHiraganaToKana()
// Usage
kana := hiraganaToKana.Convert("こんにちは")
// -> コンニチハ
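
For readers curious how such a conversion can work: in Unicode, each Hiragana letter from U+3041 to U+3096 has its Katakana counterpart exactly 0x60 code points higher. The sketch below relies on that offset. The function hiraToKata is a hypothetical stand-in and not necessarily how HiraganaToKana.Convert is written.

```go
package main

import (
	"fmt"
	"strings"
)

// hiraToKata shifts Hiragana letters into the Katakana block and leaves all
// other characters (including existing Katakana) unchanged.
func hiraToKata(s string) string {
	return strings.Map(func(r rune) rune {
		// U+3041..U+3096 are the Hiragana letters with a Katakana
		// counterpart located 0x60 code points higher.
		if r >= 0x3041 && r <= 0x3096 {
			return r + 0x60
		}
		return r
	}, s)
}

func main() {
	fmt.Println(hiraToKata("こんにちは")) // コンニチハ
}
```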

Romaji2Katakana

// Create an instance of RomajiToKana
romajiToKana := kanatrans.NewRomajiToKana()
// Usage
kana := romajiToKana.Convert("kita kita desu")
// -> キタ キタ デス
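
As an illustration of the general idea, Romaji can be converted with a greedy longest-match lookup against a syllable table. The sketch below assumes a deliberately tiny table and an ASCII-oriented matcher; romajiTable and romajiToKata are hypothetical names, and the package's own table and rules (long vowels, small tsu, and so on) are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// romajiTable is a small excerpt of a Romaji-to-Katakana syllable table.
var romajiTable = map[string]string{
	"a": "ア", "i": "イ", "u": "ウ", "e": "エ", "o": "オ",
	"ka": "カ", "ki": "キ", "ku": "ク", "ke": "ケ", "ko": "コ",
	"sa": "サ", "shi": "シ", "su": "ス", "se": "セ", "so": "ソ",
	"ta": "タ", "chi": "チ", "tsu": "ツ", "te": "テ", "to": "ト",
	"na": "ナ", "ni": "ニ", "nu": "ヌ", "ne": "ネ", "no": "ノ",
	"da": "ダ", "de": "デ", "do": "ド",
	"mi": "ミ", "wa": "ワ", "n": "ン",
}

// romajiToKata tries the longest chunk first (3, then 2, then 1 bytes) at
// each position. Bytes that match nothing are copied through unchanged.
func romajiToKata(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		matched := false
		for n := 3; n >= 1; n-- {
			if i+n > len(s) {
				continue
			}
			if kana, ok := romajiTable[s[i:i+n]]; ok {
				b.WriteString(kana)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			b.WriteByte(s[i])
			i++
		}
	}
	return b.String()
}

func main() {
	fmt.Println(romajiToKata("kita kita desu")) // キタ キタ デス
}
```

Trying the longest chunk first is what lets "shi" win over "s" + "hi", and lets a lone "n" fall back to ン only when no longer syllable matches.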

ConvertPunctuation

// Usage
japanesePunctuation := kanatrans.ConvertToJapanesePunctuation("Hello, World!")
// -> Hello、 World!
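
A rough approximation of this behaviour can be built with strings.NewReplacer. The sketch assumes only the comma and full stop are mapped, which is all the sample outputs in this README show; the hypothetical toJapanesePunct below is not the package's implementation, and the real function may map more characters.

```go
package main

import (
	"fmt"
	"strings"
)

// punctReplacer maps the ASCII comma and full stop to their Japanese forms.
// Assumption: other punctuation is left as is, as in the README samples.
var punctReplacer = strings.NewReplacer(",", "、", ".", "。")

// toJapanesePunct is an illustrative stand-in for ConvertToJapanesePunctuation.
func toJapanesePunct(s string) string {
	return punctReplacer.Replace(s)
}

func main() {
	fmt.Println(toJapanesePunct("Hello, World!")) // Hello、 World!
}
```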

Note for using with Japanese-only text-to-speech (TTS)

This module is intended to allow TTS engines that only support Japanese (such as AquesTalk, Softalk, etc.) to speak English. These engines usually limit what punctuation may be present in the input: only commas and full stops are interpreted as pauses, and all other punctuation causes an error.

To use this module for such TTS input, you may enable strict input cleaning mode (only the Japanese comma and full stop appear in the output) by passing a bool to the constructors of the EngToKana, RomajiToKana and AllToKana types:

// Create an instance of AllToKana with strict punctuation output
allToKana := kanatrans.NewAllToKana(true)
// Create an instance of EngToKana with strict punctuation output
engToKana := kanatrans.NewEngToKana(true)
// Create an instance of RomajiToKana with strict punctuation output
romajiToKana := kanatrans.NewRomajiToKana(true)

You may also use the function kanatrans.ConvertToJapanesePunctuationRestricted instead of kanatrans.ConvertToJapanesePunctuation.
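
To show what "only Japanese comma and stop on output" can mean in practice, here is a minimal sketch of a strict cleaner. The mapping choices are assumptions made for illustration (sentence-ending marks become 。, clause separators become 、, everything else classed as punctuation or symbol is dropped); restrictPunct is a hypothetical helper, and ConvertToJapanesePunctuationRestricted may behave differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// restrictPunct keeps letters, digits and spaces, and reduces punctuation to
// the Japanese comma and full stop. The exact mapping is an assumption.
func restrictPunct(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == ',' || r == ';' || r == ':' || r == '、':
			b.WriteRune('、') // clause separators become a Japanese comma
		case r == '.' || r == '!' || r == '?' || r == '。':
			b.WriteRune('。') // sentence endings become a Japanese full stop
		case unicode.IsPunct(r) || unicode.IsSymbol(r):
			// Drop anything else a strict TTS engine might reject.
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(restrictPunct("Hello, World!")) // Hello、 World。
}
```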

Custom callbacks to process Kanji, Kana, English & Punctuation

Internally, the AllToKana conversion uses a KanjiSplitter to call func(string) string functions which handle Kanji, Kana, English and Punctuation respectively:

// Create an instance of KanjiSplitter with processing callbacks
kanjiSplitter := kanatrans.NewKanjiSplitter(
	kanjiToKana.Convert,                    // Kanji callback
	hiraganaToKana.Convert,                 // Hiragana & Katakana callback
	engToKana.TranscriptSentence,           // English callback
	kanatrans.ConvertToJapanesePunctuation, // Punctuation callback
)

If required, you may use a KanjiSplitter with custom callback functions to provide different processing.
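
The split-and-dispatch idea can be sketched with the standard unicode script tables. The code below is an independent illustration, not KanjiSplitter's source: scriptClass, classify, splitAndProcess, upper and bracket are hypothetical names, and the classification rules (Han, Hiragana/Katakana, Latin, everything else) are an assumption about how such a splitter could segment its input.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type scriptClass int

const (
	classOther scriptClass = iota // punctuation, spaces, digits, ...
	classKanji
	classKana
	classRoman
)

// classify assigns a rune to one of the four segment classes.
func classify(r rune) scriptClass {
	switch {
	case unicode.Is(unicode.Han, r):
		return classKanji
	case unicode.In(r, unicode.Hiragana, unicode.Katakana):
		return classKana
	case unicode.Is(unicode.Latin, r):
		return classRoman
	default:
		return classOther
	}
}

// splitAndProcess groups consecutive runes of the same class into a segment
// and passes each segment to the handler registered for that class.
// Segments without a handler are copied through unchanged.
func splitAndProcess(input string, handlers map[scriptClass]func(string) string) string {
	var out, run strings.Builder
	current := classOther
	flush := func() {
		if run.Len() == 0 {
			return
		}
		seg := run.String()
		if h, ok := handlers[current]; ok {
			seg = h(seg)
		}
		out.WriteString(seg)
		run.Reset()
	}
	for _, r := range input {
		if c := classify(r); c != current {
			flush()
			current = c
		}
		run.WriteRune(r)
	}
	flush()
	return out.String()
}

// upper and bracket are demo callbacks standing in for real converters.
func upper(s string) string   { return strings.ToUpper(s) }
func bracket(s string) string { return "[" + s + "]" }

func main() {
	handlers := map[scriptClass]func(string) string{
		classRoman: upper,
		classKanji: bracket,
	}
	fmt.Println(splitAndProcess("miki松原こんにちは!", handlers)) // MIKI[松原]こんにちは!
}
```

Keeping the callbacks as plain func(string) string values, as NewKanjiSplitter does, means any of the converters can be swapped for custom logic without touching the segmentation code.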

Documentation

Overview

Package kanatrans converts English phrases into phonetic Japanese kana approximations, also known as Englishru.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertToJapanesePunctuation added in v1.0.2

func ConvertToJapanesePunctuation(str string) string

ConvertToJapanesePunctuation converts Western punctuation to its Japanese equivalents.

func ConvertToJapanesePunctuationRestricted added in v1.0.2

func ConvertToJapanesePunctuationRestricted(str string) string

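ConvertToJapanesePunctuationRestricted converts Western punctuation to its Japanese equivalents in strict mode, so that only the Japanese comma and full stop remain in the output (see the TTS note in the README).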

Types

type AllToKana

type AllToKana struct {
	// contains filtered or unexported fields
}

AllToKana struct holds the necessary functions for All to Katakana conversion

func NewAllToKana

func NewAllToKana(strictPunct ...bool) *AllToKana

NewAllToKana creates a new instance of AllToKana
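
The variadic bool in this signature is a common Go idiom for an optional argument: callers may write NewAllToKana() or NewAllToKana(true). How the package handles the value internally is not shown on this page; the sketch below only illustrates the idiom, using a hypothetical converter type and newConverter function.

```go
package main

import "fmt"

// converter is a hypothetical type used only to illustrate the idiom.
type converter struct {
	strict bool
}

// newConverter treats the first variadic value, if any, as the strict flag.
// With no argument the flag keeps its zero value, false.
func newConverter(strict ...bool) *converter {
	c := &converter{}
	if len(strict) > 0 {
		c.strict = strict[0]
	}
	return c
}

func main() {
	fmt.Println(newConverter().strict, newConverter(true).strict) // false true
}
```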

func (*AllToKana) Convert

func (a2k *AllToKana) Convert(s string) string

Convert converts English, Romaji, Hiragana, Kanji to Katakana, leaves Katakana unchanged.

type EngToKana

type EngToKana struct {
	// contains filtered or unexported fields
}

EngToKana struct holds the necessary functions for English to Katakana conversion

func NewEngToKana

func NewEngToKana(strictClean ...bool) *EngToKana

NewEngToKana creates a new instance of EngToKana

func (*EngToKana) TranscriptSentence

func (e2k *EngToKana) TranscriptSentence(line string) string

TranscriptSentence converts an English sentence to Katakana

func (*EngToKana) TranscriptWord

func (e2k *EngToKana) TranscriptWord(word string) string

TranscriptWord converts an English word to Katakana

type HiraganaToKana added in v1.0.1

type HiraganaToKana struct{}

HiraganaToKana struct holds the necessary functions for Hiragana to Katakana conversion

func NewHiraganaToKana added in v1.0.1

func NewHiraganaToKana() *HiraganaToKana

NewHiraganaToKana creates a new instance of HiraganaToKana

func (*HiraganaToKana) Convert added in v1.0.1

func (h2k *HiraganaToKana) Convert(input string) string

Convert converts Hiragana characters to Katakana while leaving Katakana characters unchanged

type KanjiSplitter

type KanjiSplitter struct {
	// contains filtered or unexported fields
}

KanjiSplitter splits a string into segments of Kanji, Hiragana & Katakana, Roman text, and punctuation for individual processing

func NewKanjiSplitter

func NewKanjiSplitter(kanjiCallback, kanaCallback, romanCallback, punctCallback func(string) string) *KanjiSplitter

NewKanjiSplitter creates a new instance of KanjiSplitter

func (*KanjiSplitter) SeparateAndProcess

func (ks *KanjiSplitter) SeparateAndProcess(input string) string

SeparateAndProcess separates the input string into segments of Kanji, Hiragana & Katakana, Roman text, and punctuation, and processes each segment with the corresponding callback

type KanjiToKana

type KanjiToKana struct {
	// contains filtered or unexported fields
}

KanjiToKana struct holds the necessary functions for Kanji to Kana conversion

func NewKanjiToKana

func NewKanjiToKana() *KanjiToKana

NewKanjiToKana creates a new instance of KanjiToKana

func (*KanjiToKana) Convert

func (k2k *KanjiToKana) Convert(kanji string) string

Convert converts Kanji into Katakana

type RomajiToKana

type RomajiToKana struct {
	// contains filtered or unexported fields
}

RomajiToKana struct holds the necessary functions for Romaji to Kana conversion

func NewRomajiToKana

func NewRomajiToKana(strictClean ...bool) *RomajiToKana

NewRomajiToKana creates a new instance of RomajiToKana

func (*RomajiToKana) Convert

func (r2k *RomajiToKana) Convert(s string) string

Convert converts Romaji to Katakana
