hyphenation

Version: v1.0.2
Published: Apr 5, 2023 | License: CC0-1.0 | Imports: 4 | Imported by: 5

README

A port of TeX's hyphenation algorithm to Go

Installation

go get github.com/speedata/hyphenation

Prerequisites

Download a hyphenation pattern file from CTAN, for example from https://ctan.math.utah.edu/ctan/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/

Usage

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/speedata/hyphenation"
)

func main() {
	// Pattern file downloaded from CTAN (see Prerequisites).
	filename := "hyph-en-us.pat.txt"
	r, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// New reads the patterns from the reader.
	l, err := hyphenation.New(r)
	if err != nil {
		log.Fatal(err)
	}

	for _, v := range []string{"Computer", "developers"} {
		h := l.Hyphenate(v) // break points as offsets into the word
		fmt.Println(v, h)   // [3 6] and [2 5 7 9]
	}
}

Debugging hyphenation patterns

Besides the slice of break points, you can get a detailed view of the hyphenation patterns that matched a word:

str := l.DebugHyphenate("developers")
fmt.Println(str)

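This prints the following. Each row shows one matching pattern and its priorities, "max" is the highest priority at each position, and odd maxima inside the word become hyphens in "final":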

   .   d   e   v   e   l   o   p   e   r   s   .
     0   0   1   0   |   |   |   |   |   |   |    de1v
     |   0   0   0   0   3   0   |   |   |   |    evel3o
     |   |   0   0   4   0   0   |   |   |   |    ve4lo
     |   |   |   |   |   0   0   1   0   0   |    op1er
     |   |   |   |   |   |   |   0   0   1   0    er1s
     |   |   |   |   |   |   |   |   4   0   2    4rs2
max: 0   0   1   0   4   3   0   1   4   1   2
final: d   e - v   e   l - o   p - e   r - s

Other

Contact: gundlach@speedata.de
Twitter: @speedata
License: CC0 / public domain (https://creativecommons.org/publicdomain/zero/1.0/)
Status: Just an example hack; not yet used in production

Documentation

Overview

Package hyphenation hyphenates words with TeX's algorithm.

The algorithm is the one used in TeX and was originally developed by Franklin Liang. His thesis can be downloaded from https://www.tug.org/docs/liang%20/liang-thesis.pdf

You need pattern files, which can be downloaded from http://ctan.math.utah.edu/ctan/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/


Types

type Lang

type Lang struct {
	Leftmin  int // The minimum number of unhyphenated runes at the beginning of a word. Defaults to 0.
	Rightmin int // The minimum number of unhyphenated runes at the end of a word. Defaults to 0.
	// contains filtered or unexported fields
}

Lang is a language object for hyphenation. Create it by calling New(); otherwise the object is not initialized properly.
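To illustrate what Leftmin and Rightmin mean, the following standalone sketch filters a list of break points so that at least a given number of runes stays unhyphenated at each end of the word. It does not use the package; filterBreaks is a hypothetical helper, and the exact boundary handling inside the package may differ from this reading of the field comments.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// filterBreaks is a hypothetical helper (not part of the package). It keeps
// only break points that leave at least leftmin runes before the break and
// at least rightmin runes after it.
func filterBreaks(breaks []int, wordLen, leftmin, rightmin int) []int {
	var kept []int
	for _, b := range breaks {
		if b >= leftmin && wordLen-b >= rightmin {
			kept = append(kept, b)
		}
	}
	return kept
}

func main() {
	word := "developers"
	breaks := []int{2, 5, 7, 9} // break points from the README example
	n := utf8.RuneCountInString(word)
	// With leftmin 3 and rightmin 2, "de-" and "-s" are no longer allowed.
	fmt.Println(filterBreaks(breaks, n, 3, 2)) // [5 7]
}
```

With the real package you would set the Leftmin and Rightmin fields on the *Lang returned by New before calling Hyphenate.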

func New

func New(r io.Reader) (*Lang, error)

New loads patterns from the reader. Patterns are word substrings with a hyphenation priority between each pair of letters; priorities of 0 are omitted. Example patterns are “.ach4 at3est 4if.” where a dot denotes a word boundary. An odd number means “hyphenation is allowed here”; an even number (including 0) means “don't hyphenate here”. The final priority for each position is the maximum of the priorities given by all patterns that apply.
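The pattern rules can be sketched in a few lines of standalone Go. This is a simplified illustration, not the package's implementation: parsePattern and breakPoints are hypothetical names, the pattern list is just the six patterns from the DebugHyphenate output for "developers", and matching is done by a plain linear scan. Consistent with that output, a position becomes a break point when its maximum priority is odd.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePattern splits a pattern such as "evel3o" into its letters ("evelo")
// and the priorities around them ([0 0 0 0 3 0]). Omitted digits count as 0.
func parsePattern(p string) ([]rune, []int) {
	var letters []rune
	points := []int{0}
	for _, r := range p {
		if r >= '0' && r <= '9' {
			points[len(points)-1] = int(r - '0')
		} else {
			letters = append(letters, r)
			points = append(points, 0)
		}
	}
	return letters, points
}

// breakPoints applies every pattern to ".word." and returns the positions
// (counted in runes from the start of the word) whose final priority is odd.
func breakPoints(word string, patterns []string) []int {
	w := []rune("." + strings.ToLower(word) + ".")
	prio := make([]int, len(w)+1) // prio[i] belongs to the gap before w[i]
	for _, p := range patterns {
		letters, points := parsePattern(p)
		for start := 0; start+len(letters) <= len(w); start++ {
			if string(w[start:start+len(letters)]) != string(letters) {
				continue
			}
			// Keep the maximum priority seen at each gap.
			for j, v := range points {
				if v > prio[start+j] {
					prio[start+j] = v
				}
			}
		}
	}
	var breaks []int
	// w[0] is the leading dot, so gap i lies i-1 runes into the word.
	// Only gaps strictly inside the word are considered.
	for i := 2; i < len(w)-1; i++ {
		if prio[i]%2 == 1 {
			breaks = append(breaks, i-1)
		}
	}
	return breaks
}

func main() {
	patterns := []string{"de1v", "evel3o", "ve4lo", "op1er", "er1s", "4rs2"}
	fmt.Println(breakPoints("developers", patterns)) // [2 5 7 9]
}
```

A real pattern file contains thousands of patterns, so a production implementation would typically look them up in a map or trie instead of scanning the list for every word.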

func (*Lang) DebugHyphenate added in v1.0.1

func (l *Lang) DebugHyphenate(word string) string

DebugHyphenate returns a multi-line string with information about the patterns used and the priorities.

func (*Lang) Hyphenate

func (l *Lang) Hyphenate(word string) []int

Hyphenate returns a slice of int with the resulting break points. For example, the word “developers” with English (US) hyphenation patterns could return [2 5 7 9], which means de-vel-op-er-s.
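Turning the break points into a visibly hyphenated word takes only a small helper. The sketch below is standalone and does not call the package; insertHyphens is a hypothetical name, and it assumes the break points are rune offsets into the word, which matches the README examples for ASCII words.

```go
package main

import (
	"fmt"
	"strings"
)

// insertHyphens is a hypothetical helper (not part of the package). It inserts
// sep at each break point, treating break points as rune offsets into word.
func insertHyphens(word string, breaks []int, sep string) string {
	runes := []rune(word)
	var b strings.Builder
	prev := 0
	for _, pos := range breaks {
		if pos <= prev || pos >= len(runes) {
			continue // skip out-of-range or non-increasing positions
		}
		b.WriteString(string(runes[prev:pos]))
		b.WriteString(sep)
		prev = pos
	}
	b.WriteString(string(runes[prev:]))
	return b.String()
}

func main() {
	fmt.Println(insertHyphens("developers", []int{2, 5, 7, 9}, "-")) // de-vel-op-er-s
	fmt.Println(insertHyphens("Computer", []int{3, 6}, "-"))         // Com-put-er
}
```

For typesetting you would usually pass a soft hyphen ("\u00ad") as sep instead of a visible "-".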
