text

Version: v0.14.1
Published: Jan 31, 2024 License: BSD-3-Clause Imports: 0 Imported by: 0

README

Go Text

This repository holds supplementary Go libraries for text processing, many involving Unicode.

Go-CoreLibs Fork

The purpose of this fork is to enable multilingual support within the Go-Enjin and Go-Curses projects. For various reasons, this has required specific changes to the upstream project, which may or may not ever propagate back.

Maintained Changes
# commit message
0 cmd/gotext,message/pipeline: ssoroka - go modules support, see: https://github.com/golang/text/commit/059a5b1dfb6e3c931765fdc5ad8a49299f7bf55b
1 *.{go,json}: replace instances of golang.org/x/text with github.com/go-corelibs/x-text
2 language: implement MarshalBinary and UnmarshalBinary methods for Tag structures
3 language: added func Compare(a Tag, others ...Tag) (equal bool)
4 cmd/gotext: document rewrite sub-command -w option
5 cmd/gotext: add IMPORTANT notices to docstrings and --help output
6 message/catalog: implement Include catalog Option and Builder support
7 message/pipeline: do not clear message .Key during State.Merge process
8 cmd/gotext,message/pipeline: document -srclang global option; implement -declare-var and -go-build global options
9 .gitignore: add /gotext
10 README.md: Go-CoreLibs fork updates
11 message: added http.Request tag and printer functions
Branching Structure

This repository has two main branches: upstream and trunk. All Go-CoreLibs changes are made to trunk. Changes on their way to upstream live in their own branches, which are based on upstream rather than trunk.

All changes made by the Go-Enjin team are licensed under the same terms as the upstream project.

Tagging Versions

This repository follows the same versioning pattern as the upstream project, except that every upstream tag is also included here with an -upstream suffix, which makes it a pre-release (beta) version. The actual tagged releases add the Go-Enjin changes on top of the corresponding upstream versions. This is done to minimize confusion and simplify the build systems used by Go-Enjin.

For example:

  • upstream released v0.14.0
  • trunk is essentially hard-reset to the upstream changes
  • trunk is tagged v0.14.0-upstream
  • all Go-Enjin changes are re-applied to trunk
  • trunk is tagged v0.14.0

Currently there is no evidence of upstream publishing any patch-level versions, so this model works without much effort. Should upstream start using patch-level versions, this model still works, because the beta releases are always pure upstream and the official releases are always the latest upstream plus the Go-Enjin changes.
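Under Semantic Versioning, a version with a pre-release suffix sorts before the same version without one, which is why v0.14.0-upstream acts as a beta of v0.14.0 and version-aware tooling prefers the plain tag. The minimal sketch below illustrates that ordering using only the standard library. The helpers parse and compareVersions are hypothetical and handle only the vMAJOR.MINOR.PATCH[-pre] shape used here; pre-release suffixes are compared as whole strings, a simplification of the full SemVer rules. For real code, golang.org/x/mod/semver provides a complete implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits a tag such as "v0.14.0-upstream" into its numeric core and
// an optional pre-release suffix. Simplified for this sketch: no build
// metadata and no validation beyond the three numeric fields.
func parse(tag string) (core [3]int, pre string, err error) {
	s := strings.TrimPrefix(tag, "v")
	if i := strings.IndexByte(s, '-'); i >= 0 {
		s, pre = s[:i], s[i+1:]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return core, "", fmt.Errorf("malformed version %q", tag)
	}
	for i, p := range parts {
		if core[i], err = strconv.Atoi(p); err != nil {
			return core, "", fmt.Errorf("malformed version %q: %w", tag, err)
		}
	}
	return core, pre, nil
}

// compareVersions returns -1, 0 or +1. A pre-release (e.g. "-upstream")
// ranks below the release that has the same numeric core.
func compareVersions(a, b string) (int, error) {
	ca, pa, err := parse(a)
	if err != nil {
		return 0, err
	}
	cb, pb, err := parse(b)
	if err != nil {
		return 0, err
	}
	for i := range ca {
		if ca[i] != cb[i] {
			if ca[i] < cb[i] {
				return -1, nil
			}
			return 1, nil
		}
	}
	switch {
	case pa == pb:
		return 0, nil
	case pa == "": // a release outranks any pre-release of the same core
		return 1, nil
	case pb == "":
		return -1, nil
	default: // simplified: whole-string comparison of the suffixes
		return strings.Compare(pa, pb), nil
	}
}

func main() {
	pairs := [][2]string{
		{"v0.14.0-upstream", "v0.14.0"},
		{"v0.14.0", "v0.14.1-upstream"},
	}
	for _, p := range pairs {
		c, err := compareVersions(p[0], p[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s vs %s: %d\n", p[0], p[1], c)
	}
}
```

Both comparisons print -1: the -upstream tag precedes the fork release of the same version, and any tag of a lower numeric version precedes a higher one regardless of suffix.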

Long-Term Support

This package is now a part of the Go-CoreLibs project and will be maintained for as long as Go-Enjin and Go-Curses projects need it.

CLDR Versioning

It is important that the Unicode version used in x/text matches the one used by your Go compiler. The x/text repository supports multiple versions of Unicode and will match the version of Unicode to that of the Go compiler. At the moment this is supported for Go compilers from version 1.7.

Download/Install

The easiest way to install is to run go get -u github.com/go-corelibs/x-text. You can also manually git clone the repository to $GOPATH/src/github.com/go-corelibs/x-text.

Contribute

To submit changes to this repository, see http://golang.org/doc/contribute.html.

Testing

Run

go test ./...

from this directory to run all tests. Add the "-tags icu" flag to also run ICU conformance tests (if available). This requires that you have the correct ICU version installed on your system.

TODO:

  • updating unversioned source files.

Generating Tables

To generate the tables in this repository (except for the encoding tables), run go generate from this directory. By default tables are generated for the Unicode version in core and the CLDR version defined in github.com/go-corelibs/x-text/unicode/cldr.

Running go generate will as a side effect create a DATA subdirectory in this directory which holds all files that are used as a source for generating the tables. This directory will also serve as a cache.

Versions

To update a Unicode version run

UNICODE_VERSION=x.x.x go generate

where x.x.x must correspond to a directory in https://www.unicode.org/Public/. If this version is newer than the version in core, it will also update the relevant packages there. The idna package in x/net will always be updated.

To update a CLDR version run

CLDR_VERSION=version go generate

where version must correspond to a directory in https://www.unicode.org/Public/cldr/.

Note that the code is adapted over time to changes in the data, and backwards compatibility is not maintained, so updating to a different version may not work.

The files in DATA/{iana|icu|w3|whatwg} are currently not versioned.

Report Issues / Send Patches

This repository uses Gerrit for code changes. To learn how to submit changes to this repository, see https://golang.org/doc/contribute.html.

The main issue tracker for the text repository is located at https://github.com/golang/go/issues. Prefix your issue with "x/text:" in the subject line, so it is easy to find.

Documentation

Overview

text is a repository of packages related to internationalization (i18n) and localization (l10n), such as character encodings, text transformations, and locale-specific text handling.

There is a 30-minute video, recorded on 2017-11-30, on the "State of golang.org/x/text" at https://www.youtube.com/watch?v=uYrDrMEGu58

Directories

  • cases: Package cases provides general and language-specific case mappers.
  • cmd/gotext: gotext is a tool for managing text in Go source code.
  • collate: Package collate contains types for comparing and sorting Unicode strings according to a given collation order.
  • currency: Package currency contains currency-related functionality.
  • encoding: Package encoding defines an interface for character encodings, such as Shift JIS and Windows 1252, that can convert to and from UTF-8.
  • encoding/charmap: Package charmap provides simple character encodings such as IBM Code Page 437 and Windows 1252.
  • encoding/htmlindex: Package htmlindex maps character set encoding names to Encodings as recommended by the W3C for use in HTML 5.
  • encoding/ianaindex: Package ianaindex maps names to Encodings as specified by the IANA registry.
  • encoding/internal: Package internal contains code that is shared among encoding implementations.
  • encoding/internal/identifier: Package identifier defines the contract between implementations of Encoding and Index by defining identifiers that uniquely identify standardized coded character sets (CCS) and character encoding schemes (CES), which we will together refer to as encodings, for which Encoding implementations provide converters to and from UTF-8.
  • encoding/japanese: Package japanese provides Japanese encodings such as EUC-JP and Shift JIS.
  • encoding/korean: Package korean provides Korean encodings such as EUC-KR.
  • encoding/simplifiedchinese: Package simplifiedchinese provides Simplified Chinese encodings such as GBK.
  • encoding/traditionalchinese: Package traditionalchinese provides Traditional Chinese encodings such as Big5.
  • encoding/unicode: Package unicode provides Unicode encodings such as UTF-16.
  • encoding/unicode/utf32: Package utf32 provides the UTF-32 Unicode encoding.
  • feature/plural: Package plural provides utilities for handling linguistic plurals in text.
  • internal: Package internal contains non-exported functionality that is used by packages in the text repository.
  • internal/catmsg: Package catmsg contains support types for package x/text/message/catalog.
  • internal/cldrtree: Package cldrtree builds and generates a CLDR index file, including all inheritance.
  • internal/colltab: Package colltab contains functionality related to collation tables.
  • internal/export/idna: Package idna implements IDNA2008 using the compatibility processing defined by UTS (Unicode Technical Standard) #46, which defines a standard to deal with the transition from IDNA2003.
  • internal/export/unicode: Package unicode generates the Unicode tables in core.
  • internal/format: Package format contains types for defining language-specific formatting of values.
  • internal/gen: Package gen contains common code for the various code generation tools in the text repository.
  • internal/gen/bitfield: Package bitfield converts annotated structs into integer values.
  • internal/language/compact: Package compact defines a compact representation of language tags.
  • internal/number: Package number contains tools and data for formatting numbers.
  • internal/stringset: Package stringset provides a way to represent a collection of strings compactly.
  • internal/tag: Package tag contains functionality handling tags and related data.
  • internal/testtext: Package testtext contains test data that is of common use to the text repository.
  • internal/triegen: Package triegen implements a code generator for a trie for associating unsigned integer values with UTF-8 encoded runes.
  • internal/ucd: Package ucd provides a parser for Unicode Character Database files, the format of which is defined in https://www.unicode.org/reports/tr44/.
  • internal/utf8internal: Package utf8internal contains low-level utf8-related constants, tables, etc.
  • language: Package language implements BCP 47 language tags and related functionality.
  • language/display: Package display provides display names for languages, scripts and regions in a requested language.
  • message: Package message implements formatted I/O for localized strings with functions analogous to the fmt package's print functions.
  • message/catalog: Package catalog defines collections of translated format strings.
  • message/pipeline: Package pipeline provides tools for creating translation pipelines.
  • number: Package number formats numbers according to the customs of different locales.
  • runes: Package runes provides transforms for UTF-8 encoded text.
  • search: Package search provides language-specific search and string matching.
  • secure: secure is a repository of text security related packages.
  • secure/bidirule: Package bidirule implements the Bidi Rule defined by RFC 5893.
  • secure/precis: Package precis contains types and functions for the preparation, enforcement, and comparison of internationalized strings ("PRECIS") as defined in RFC 8264.
  • transform: Package transform provides reader and writer wrappers that transform the bytes passing through as well as various transformations.
  • unicode: unicode holds packages with implementations of Unicode standards that are mostly used as building blocks for other packages in github.com/go-corelibs/x-text, layout engines, or are otherwise more low-level in nature.
  • unicode/bidi: Package bidi contains functionality for bidirectional text support.
  • unicode/cldr: Package cldr provides a parser for LDML and related XML formats.
  • unicode/norm: Package norm contains types and functions for normalizing Unicode strings.
  • unicode/rangetable: Package rangetable provides utilities for creating and inspecting unicode.RangeTables.
  • unicode/runenames: Package runenames provides rune names from the Unicode Character Database.
  • width: Package width provides functionality for handling different widths in text.
