package tokenizer

Documentation

Overview

Package tokenizer implements file tokenization used by the enry content classifier. This package is an implementation detail of enry and should not be imported by other packages.

Index

Constants

const ByteLimit = 100000

ByteLimit defines the maximum prefix of an input text that will be tokenized.
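As a sketch of what the limit means in practice: bytes past ByteLimit are ignored, so tokenizing a large input is equivalent to tokenizing its ByteLimit-byte prefix. The example below illustrates this under two assumptions: the import path is guessed from the enry module layout, and because the package is internal it only compiles inside the enry module itself.

package tokenizer_test

import (
	"bytes"
	"fmt"

	"github.com/src-d/enry/v2/internal/tokenizer" // assumed import path; internal to enry
)

func ExampleByteLimit() {
	// Roughly 200,000 bytes of input, twice ByteLimit.
	content := bytes.Repeat([]byte("func main "), 20000)
	all := tokenizer.Tokenize(content)
	prefix := tokenizer.Tokenize(content[:tokenizer.ByteLimit])
	// Bytes past ByteLimit are ignored, so both calls see the
	// same 100,000-byte prefix and yield the same tokens.
	fmt.Println(len(all) == len(prefix))
	// Output: true
}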

Variables

This section is empty.

Functions

func Tokenize

func Tokenize(content []byte) []string

Tokenize returns lexical tokens from content. The returned tokens match the output of the Linguist library. At most the first ByteLimit bytes of content are tokenized.

BUG: Until https://github.com/src-d/enry/issues/193 is resolved, there are some differences between this function and the Linguist output.
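For orientation, a minimal usage sketch, with the same caveats as above: the import path is an assumption, and the package is internal to enry. Because the exact token list tracks Linguist and may shift as the issue above is resolved, the example asserts only that tokenization produces features rather than pinning specific tokens.

package tokenizer_test

import (
	"fmt"

	"github.com/src-d/enry/v2/internal/tokenizer" // assumed import path; internal to enry
)

func ExampleTokenize() {
	src := []byte("package main\n\nfunc main() { println(\"hello\") }\n")
	tokens := tokenizer.Tokenize(src)
	// tokens holds Linguist-style lexical tokens (keywords,
	// identifiers, and the like) used as classifier features.
	fmt.Println(len(tokens) > 0)
	// Output: true
}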

Types

This section is empty.
