tokenizer

v0.1.1
Published: Jan 11, 2022 | License: MIT | Imports: 3 | Imported by: 0

README

French Tokenizer

This is a basic rule-based tokenizer for French, written in Go.

How to install

go get github.com/justinsowhat/french-tokenizer

Usage

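package main

import (
	"fmt"

	ft "github.com/justinsowhat/french-tokenizer"
)

func main() {
	// FrenchTokenizer is the concrete type; Tokenizer is the interface it satisfies,
	// so it cannot be instantiated with a composite literal.
	tokenizer := ft.FrenchTokenizer{}

	text := " «    Je m'appelle Jean-Pierre, et j'aime faire du foot jusqu'au soir. » "
	mergeProperNouns := false
	actual := tokenizer.Tokenize(text, mergeProperNouns)
	fmt.Println(actual)
}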

When mergeProperNouns is set to true, hyphenated proper nouns such as Jean-Pierre are kept as a single token (Jean-Pierre) instead of being split into three tokens (Jean, -, Pierre). Use this option with caution, as it relies on a basic heuristic.
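
To make the effect of mergeProperNouns concrete, here is a minimal, self-contained sketch of one possible heuristic: keep a hyphenated word together when every part starts with an uppercase letter. The helpers splitHyphenated and startsUpper are hypothetical names introduced for illustration; they are not part of this package, and the package's actual rules may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// startsUpper reports whether s begins with an uppercase letter.
func startsUpper(s string) bool {
	r, _ := utf8.DecodeRuneInString(s)
	return unicode.IsUpper(r)
}

// splitHyphenated is a hypothetical helper, NOT the package's implementation.
// It splits a word on hyphens and emits each hyphen as its own token, unless
// mergeProperNouns is true and every part starts with an uppercase letter.
func splitHyphenated(word string, mergeProperNouns bool) []string {
	parts := strings.Split(word, "-")
	if len(parts) == 1 {
		return []string{word}
	}
	if mergeProperNouns {
		allUpper := true
		for _, p := range parts {
			if !startsUpper(p) {
				allUpper = false
				break
			}
		}
		if allUpper {
			// Assumed heuristic: treat the whole word as a proper noun.
			return []string{word}
		}
	}
	tokens := make([]string, 0, 2*len(parts)-1)
	for i, p := range parts {
		if i > 0 {
			tokens = append(tokens, "-")
		}
		if p != "" {
			tokens = append(tokens, p)
		}
	}
	return tokens
}

func main() {
	fmt.Println(splitHyphenated("Jean-Pierre", false)) // [Jean - Pierre]
	fmt.Println(splitHyphenated("Jean-Pierre", true))  // [Jean-Pierre]
	fmt.Println(splitHyphenated("peut-être", true))    // [peut - être]
}
```

A capitalization check like this will also merge sentence-initial or title-cased compounds that are not proper nouns, which is the kind of limitation the caution above refers to.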

License

This package is released under the MIT license.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type FrenchTokenizer (added in v0.1.1)

type FrenchTokenizer struct {
}

func (FrenchTokenizer) Tokenize (added in v0.1.1)

func (t FrenchTokenizer) Tokenize(text string, mergeProperNouns bool) []string

type Tokenizer

type Tokenizer interface {
	Tokenize(text string, mergeProperNouns bool) []string
}