package tokenizer

Documentation

Overview

Package tokenizer implements file tokenization used by the enry content classifier. This package is an implementation detail of enry and should not be imported by other packages.

Index

Constants

const ByteLimit = 100000

ByteLimit defines the maximum prefix of an input text that will be tokenized.
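As a sketch of what the limit means in practice: bytes past ByteLimit are ignored, so tokenizing a large input is equivalent to tokenizing its ByteLimit-byte prefix. The example below illustrates this under two assumptions: the import path is guessed from the enry module layout, and because the package is internal it only compiles inside the enry module itself.

package tokenizer_test

import (
	"bytes"
	"fmt"

	"github.com/src-d/enry/v2/internal/tokenizer" // assumed import path; internal to enry
)

func ExampleByteLimit() {
	// Roughly 200,000 bytes of input, twice ByteLimit.
	content := bytes.Repeat([]byte("func main "), 20000)
	all := tokenizer.Tokenize(content)
	prefix := tokenizer.Tokenize(content[:tokenizer.ByteLimit])
	// Bytes past ByteLimit are ignored, so both calls see the
	// same 100,000-byte prefix and yield the same tokens.
	fmt.Println(len(all) == len(prefix))
	// Output: true
}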

Variables

This section is empty.

Functions

func Tokenize

func Tokenize(content []byte) []string

Tokenize returns lexical tokens from content. The returned tokens match the output of the Linguist library. At most the first ByteLimit bytes of content are tokenized.

BUG: Until https://github.com/src-d/enry/issues/193 is resolved, there are some differences between this function and the Linguist output.
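For orientation, a minimal usage sketch, with the same caveats as above: the import path is an assumption, and the package is internal to enry. Because the exact token list tracks Linguist and may shift as the issue above is resolved, the example asserts only that tokenization produces features rather than pinning specific tokens.

package tokenizer_test

import (
	"fmt"

	"github.com/src-d/enry/v2/internal/tokenizer" // assumed import path; internal to enry
)

func ExampleTokenize() {
	src := []byte("package main\n\nfunc main() { println(\"hello\") }\n")
	tokens := tokenizer.Tokenize(src)
	// tokens holds Linguist-style lexical tokens (keywords,
	// identifiers, and the like) used as classifier features.
	fmt.Println(len(tokens) > 0)
	// Output: true
}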

Types

This section is empty.
