symbolset

Version: v0.0.0-...-262ae63 | Published: Mar 21, 2023 | License: MIT | Imports: 8 | Imported by: 0

Repository

github.com/stts-se/symbolset

README ¶

symbolset

Symbolset is a repository for handling phonetic symbol sets and mappers/converters between different symbol sets and languages. Written in Go.

I. Server installation

Set up Go

Download: https://golang.org/dl/ (1.13 or higher)
Installation instructions: https://golang.org/doc/install

Clone the source code

$ git clone https://github.com/stts-se/symbolset.git
$ cd symbolset

Test (optional)

symbolset$ go test ./...

Pre-compile the server (for faster execution times)

symbolset$ cd server
server$ go build .

II. Quick start: Start the server with the demo set of symbol sets

server$ ./server -ss_files demo_files

III. Setup with Wikispeech symbolsets

Clone Wikispeech lexdata (this might take a couple of minutes)

$ git clone https://github.com/stts-se/wikispeech-lexdata.git

Setup

server$ bash setup.sh wikispeech-lexdata ss_files

Start server

server$ ./server -ss_files ss_files

This work was supported by the Swedish Post and Telecom Authority (PTS) through the grant "Wikispeech – en användargenererad talsyntes på Wikipedia" (2016–2017).

Documentation ¶

Overview ¶

Package symbolset is used to define symbol sets, such as NST-SAMPA, Wikispeech-SAMPA, and so on.

Each symbol set is defined in a .sym file including each symbol's corresponding IPA representation:

DESCRIPTION          SYMBOL   IPA	 IPA UNICODE          CATEGORY

Sample lines (Swedish Wikispeech SAMPA):

DESCRIPTION          SYMBOL   IPA	 IPA UNICODE          CATEGORY
sil                  i:       iː 	 U+0069U+02D0         Syllabic
aula                 au       a⁀ʊ	 U+0061U+2040U+028A   Syllabic
bok                  b        b  	 U+0062               NonSyllabic
forna                rn       ɳ  	 U+0273               NonSyllabic
syllable delimiter   .        .  	 U+002E               SyllableDelimiter
accent I             "        ˈ  	 U+02C8               Stress
accent II            ""       ˈ̀  	 U+02C8U+0300         Stress
secondary stress     %        ˌ  	 U+02CC               Stress

Note that the header is required on the first line. As the examples show, the IPA UNICODE value is specified in the format U+<NUMBER>, where <NUMBER> is the hexadecimal code point, with no space between the code points in a sequence.

Each symbol set has a name, extracted from the .sym file name.

Legal categories (pre-defined in code):

Syllabic: syllabic phonemes (typically vowels and syllabic consonants)

NonSyllabic: non-syllabic phonemes (typically consonants)

Stress: stress and accent symbols (primary, secondary, tone accents, etc)

PhonemeDelimiter: phoneme delimiters (white space, empty string, etc)

SyllableDelimiter: syllable delimiters

MorphemeDelimiter: morpheme delimiters that need not align with morpheme boundaries in the decompounded orthography

CompoundDelimiter: compound delimiters that should be aligned with compound boundaries in the decompounded orthography

WordDelimiter: word delimiters

For real world examples (used for unit tests), see the test_data folder: https://github.com/stts-se/pronlex/tree/master/symbolset/test_data

Index ¶

Variables
func LoadSymbolSetsFromDir(dirName string) (map[string]SymbolSet, error)
type IPASymbol
type Symbol
type SymbolCat
- func (i SymbolCat) String() string
type SymbolSet
type Type
- func (i Type) String() string

Constants ¶

This section is empty.

Variables ¶

var SymbolSetSuffix = ".sym"

SymbolSetSuffix defines the filename extension for symbol sets

Functions ¶

func LoadSymbolSetsFromDir ¶

func LoadSymbolSetsFromDir(dirName string) (map[string]SymbolSet, error)

LoadSymbolSetsFromDir loads all symbol sets from the specified folder (all files with the .sym extension)

Types ¶

type IPASymbol ¶

type IPASymbol struct {
	String  string
	Unicode string
}

IPASymbol is an IPA symbol string with its Unicode representation

type Symbol ¶

type Symbol struct {
	String string
	Cat    SymbolCat
	Desc   string
	IPA    IPASymbol
}

Symbol represents a phoneme, stress or delimiter symbol used in transcriptions, including the IPA symbol with Unicode

type SymbolCat ¶

type SymbolCat int

SymbolCat is used to categorize transcription symbols.

const (
	// Syllabic is used for syllabic phonemes (typically vowels and syllabic consonants)
	Syllabic SymbolCat = iota

	// NonSyllabic is used for non-syllabic phonemes (typically consonants)
	NonSyllabic

	// Stress is used for stress and accent symbols (primary, secondary, tone accents, etc)
	Stress

	// PhonemeDelimiter is used for phoneme delimiters (white space, empty string, etc)
	PhonemeDelimiter

	// SyllableDelimiter is used for syllable delimiters
	SyllableDelimiter

	// MorphemeDelimiter is used for morpheme delimiters that need not align with
	// morpheme boundaries in the decompounded orthography
	MorphemeDelimiter

	// CompoundDelimiter is used for compound delimiters that should be aligned
	// with compound boundaries in the decompounded orthography
	CompoundDelimiter

	// WordDelimiter is used for word delimiters
	WordDelimiter
)

func (SymbolCat) String ¶

func (i SymbolCat) String() string

type SymbolSet ¶

type SymbolSet struct {
	Name    string
	Type    Type
	Symbols []Symbol

	// Phonemes: actual phonemes (syllabic and non-syllabic)
	Phonemes []Symbol

	// PhoneticSymbols: Phonemes and stress
	PhoneticSymbols []Symbol

	PhonemeRe     *regexp.Regexp
	SyllabicRe    *regexp.Regexp
	NonSyllabicRe *regexp.Regexp
	SymbolRe      *regexp.Regexp

	PhonemeDelimiter Symbol
	// contains filtered or unexported fields
}

SymbolSet is a struct for package private usage. To create a new 'SymbolSet' instance, use NewSymbolSet

func LoadSymbolSet ¶

func LoadSymbolSet(fName string) (SymbolSet, error)

LoadSymbolSet loads a SymbolSet from file

func LoadSymbolSetWithName ¶

func LoadSymbolSetWithName(name string, fName string) (SymbolSet, error)

LoadSymbolSetWithName loads a SymbolSet from file, and names the SymbolSet

func NewSymbolSet ¶

func NewSymbolSet(name string, symbols []Symbol) (SymbolSet, error)

NewSymbolSet is a constructor for 'symbols' with built-in error checks

func NewSymbolSetWithTests ¶

func NewSymbolSetWithTests(name string, symbols []Symbol, testLines []string, checkForDups bool) (SymbolSet, error)

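NewSymbolSetWithTests is a constructor for 'symbols' with built-in error checks. In addition to the arguments of NewSymbolSet, it takes a list of test lines and a flag that controls the check for duplicates.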

func (SymbolSet) ContainsSymbols ¶

func (ss SymbolSet) ContainsSymbols(trans string, symbols []Symbol) (bool, error)

ContainsSymbols checks if a transcription contains a certain phoneme symbol

func (SymbolSet) ConvertFromIPA ¶

func (ss SymbolSet) ConvertFromIPA(trans string) (string, error)

ConvertFromIPA maps one input IPA transcription into the current symbol set

func (SymbolSet) ConvertToIPA ¶

func (ss SymbolSet) ConvertToIPA(trans string) (string, error)

ConvertToIPA maps one input transcription string into an IPA transcription

func (SymbolSet) Get ¶

func (ss SymbolSet) Get(symbol string) (Symbol, error)

Get searches the SymbolSet for a symbol with the given string

func (SymbolSet) GetFromIPA ¶

func (ss SymbolSet) GetFromIPA(ipa string) (Symbol, error)

GetFromIPA searches the SymbolSet for a symbol with the given IPA symbol string

func (SymbolSet) SplitIPATranscription ¶

func (ss SymbolSet) SplitIPATranscription(input string) ([]string, error)

SplitIPATranscription splits the input IPA transcription into separate symbols

func (SymbolSet) SplitTranscription ¶

func (ss SymbolSet) SplitTranscription(input string) ([]string, error)

SplitTranscription splits the input transcription into separate symbols

func (SymbolSet) ValidIPASymbol ¶

func (ss SymbolSet) ValidIPASymbol(symbol string) bool

ValidIPASymbol checks if a string is a valid IPA symbol or not

func (SymbolSet) ValidSymbol ¶

func (ss SymbolSet) ValidSymbol(symbol string) bool

ValidSymbol checks if a string is a valid symbol or not

type Type ¶

type Type int

Type is used for accent placement, etc.

const (
	// CMU is used for the phone set used in the CMU lexicon
	CMU Type = iota

	// SAMPA is used for SAMPA transcriptions (http://www.phon.ucl.ac.uk/home/sampa/)
	SAMPA

	// IPA is used for IPA transcriptions
	IPA

	// Other is used for symbol sets not defined in the types above
	Other
)

func (Type) String ¶

func (i Type) String() string

Directories ¶

Path	Synopsis
converter	Package converter is used to convert between symbol sets from different languages.
mapper	Package mapper is used to map between different phonetic symbol sets, such as NST-SAMPA to Wikispeech-SAMPA, IPA to SAMPA, and so on.
server