text

package
v0.3.2 Latest
Warning: This package is not in the latest version of its module.
Published: Apr 9, 2024 License: Apache-2.0 Imports: 6 Imported by: 1

Documentation

Constants

const (
	// DefaultChunkSize is the default chunk size.
	DefaultChunkSize = 1
	// DefaultChunkOverlap is the default chunk overlap.
	DefaultChunkOverlap = 0
)

Variables

var (
	// DefaultLenFunc is a default string length function.
	// It counts UTF-8 encoded characters aka runes.
	DefaultLenFunc = utf8.RuneCountInString
	// StringBytesLenFunc counts the number of bytes in a string.
	// It is faster for some documents, but less accurate for multilingual text.
	StringBytesLenFunc = func(s string) int { return len(s) }
	// DefaultSeparator is the default text separator.
	// It is intended to split text by paragraphs.
	DefaultSeparator = Sep{Value: "\n\n"}
	// DefaultSeparators are used in RecursiveCharSplitter.
	// RecursiveCharSplitter keeps splitting the document
	// recursively using these separators until done.
	DefaultSeparators = []Sep{
		{Value: "\n\n"},
		{Value: "\n"},
		{Value: " "},
		{Value: ""},
	}
)
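The difference between the two length functions matters for non-ASCII text. The following is a minimal sketch using only the standard library; `runeLen` and `byteLen` are local stand-ins written for this illustration, not identifiers from the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen behaves like DefaultLenFunc: it counts runes (Unicode code points).
func runeLen(s string) int { return utf8.RuneCountInString(s) }

// byteLen behaves like StringBytesLenFunc: it counts bytes.
func byteLen(s string) int { return len(s) }

func main() {
	// For ASCII text both counts agree; for multi-byte text they diverge.
	for _, s := range []string{"hello", "h\u00e9llo", "\u65e5\u672c\u8a9e"} {
		fmt.Printf("%q runes=%d bytes=%d\n", s, runeLen(s), byteLen(s))
	}
}
```

With a byte-based length function, a chunk size limit is reached sooner for text that uses multi-byte characters, which is why it is described as less accurate for multilingual text.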

Functions

This section is empty.

Types

type CharSplitter

type CharSplitter struct {
	*Splitter
	// contains filtered or unexported fields
}

CharSplitter is a character text splitter. It splits texts into chunks over a separator which is either a string or a regular expression.

func NewCharSplitter

func NewCharSplitter() *CharSplitter

NewCharSplitter creates a new character splitter with default options and returns it.

func (*CharSplitter) Split

func (s *CharSplitter) Split(text string) []string

Split splits text into chunks.

func (*CharSplitter) WithSep

func (s *CharSplitter) WithSep(sep Sep) *CharSplitter

WithSep sets the separator.

func (*CharSplitter) WithSplitter

func (s *CharSplitter) WithSplitter(splitter *Splitter) *CharSplitter

WithSplitter sets the splitter.

type Config

type Config struct {
	ChunkSize    int
	ChunkOverlap int
	TrimSpace    bool
	KeepSep      bool
	LenFunc      LenFunc
}

Config configures the splitter. NOTE: it exists to prevent constructors from accidentally mixing up the order of parameters of the same type, which leads to unpredictable behaviour.
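The motivation in the note can be illustrated with a small sketch. The `Config` type below is a local copy of a subset of the fields listed above, and `newConfigPositional` is a hypothetical positional constructor invented for contrast; neither is taken from the package itself.

```go
package main

import "fmt"

// Config is a local stand-in mirroring some of the documented fields.
type Config struct {
	ChunkSize    int
	ChunkOverlap int
	TrimSpace    bool
	KeepSep      bool
}

// newConfigPositional is a hypothetical constructor that takes positional
// parameters. Two ints and two bools in a row are easy to pass in the wrong order.
func newConfigPositional(chunkSize, chunkOverlap int, trimSpace, keepSep bool) Config {
	return Config{
		ChunkSize:    chunkSize,
		ChunkOverlap: chunkOverlap,
		TrimSpace:    trimSpace,
		KeepSep:      keepSep,
	}
}

func main() {
	// Intended: size 200, overlap 20. The arguments are swapped by mistake,
	// and the compiler cannot detect it because both are ints.
	bad := newConfigPositional(20, 200, false, true)

	// Named fields make the intent explicit at the call site.
	good := Config{ChunkSize: 200, ChunkOverlap: 20, KeepSep: true}

	fmt.Printf("%+v\n%+v\n", bad, good)
}
```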

type LenFunc

type LenFunc func(s string) int

LenFunc is used for funcs that calculate string lengths.

type RecursiveCharSplitter

type RecursiveCharSplitter struct {
	*Splitter
	// contains filtered or unexported fields
}

RecursiveCharSplitter is a recursive character text splitter. It tries to split text recursively by different separators to find one that works.
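The idea of trying separators in order can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: `splitRecursive` is a hypothetical function, separators are plain strings, length is measured in runes, and no merging of small pieces or chunk overlap is applied.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitRecursive splits text by the first separator and recurses with the
// remaining separators into any piece that is still longer than maxLen runes.
// Empty pieces are dropped. When no separators remain, the piece is returned as-is.
func splitRecursive(text string, seps []string, maxLen int) []string {
	if utf8.RuneCountInString(text) <= maxLen || len(seps) == 0 {
		return []string{text}
	}
	var chunks []string
	for _, piece := range strings.Split(text, seps[0]) {
		if piece == "" {
			continue
		}
		chunks = append(chunks, splitRecursive(piece, seps[1:], maxLen)...)
	}
	return chunks
}

func main() {
	// Same order as DefaultSeparators: paragraphs, lines, words, characters.
	seps := []string{"\n\n", "\n", " ", ""}
	for _, c := range splitRecursive("aa bb\n\ncc dd ee", seps, 5) {
		fmt.Printf("%q\n", c)
	}
}
```

In this sketch the first paragraph already fits within the limit and is kept whole, while the second is broken down further with the finer separators.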

func NewRecursiveCharSplitter

func NewRecursiveCharSplitter() *RecursiveCharSplitter

NewRecursiveCharSplitter creates a new recursive character splitter and returns it.

func (*RecursiveCharSplitter) Split

func (r *RecursiveCharSplitter) Split(text string) []string

Split splits text into chunks.

func (*RecursiveCharSplitter) WithSeps

func (r *RecursiveCharSplitter) WithSeps(seps []Sep) *RecursiveCharSplitter

WithSeps sets separators.

func (*RecursiveCharSplitter) WithSplitter

func (r *RecursiveCharSplitter) WithSplitter(splitter *Splitter) *RecursiveCharSplitter

WithSplitter sets the splitter.

type Sep added in v0.3.0

type Sep struct {
	Value    string
	IsRegexp bool
}

Sep is a text separator.
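A sketch of how a separator that is either a literal string or a regular expression might be applied, using only the standard library. The `Sep` type here is a local copy of the struct above, and `splitBy` is a hypothetical helper; how the package itself handles regular expressions is not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Sep is a local copy of the documented struct.
type Sep struct {
	Value    string
	IsRegexp bool
}

// splitBy splits text on sep. A literal separator uses strings.Split;
// a regexp separator is compiled and applied with (*regexp.Regexp).Split.
func splitBy(text string, sep Sep) ([]string, error) {
	if !sep.IsRegexp {
		return strings.Split(text, sep.Value), nil
	}
	re, err := regexp.Compile(sep.Value)
	if err != nil {
		return nil, err
	}
	return re.Split(text, -1), nil
}

func main() {
	// "." is treated literally here because IsRegexp is false.
	literal, _ := splitBy("a.b", Sep{Value: "."})
	// `\d+` matches runs of digits because IsRegexp is true.
	pattern, _ := splitBy("a1b22c", Sep{Value: `\d+`, IsRegexp: true})
	fmt.Println(literal, pattern)
}
```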

type Splitter

type Splitter struct {
	// contains filtered or unexported fields
}

Splitter splits text documents.

func NewSplitter

func NewSplitter() *Splitter

NewSplitter creates a new text splitter with default options and returns it. You can override all config options with the appropriate methods.

func NewSplitterWithConfig

func NewSplitterWithConfig(c Config) *Splitter

NewSplitterWithConfig creates a new text splitter and returns it.

func (*Splitter) Split added in v0.3.0

func (s *Splitter) Split(text string, sep Sep) []string

Split splits the text over a separator, optionally keeping the separator, and returns the chunks in a slice. If the separator is an empty string, it splits on individual characters.
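The two behaviours described here have close analogues in the standard library, shown in the sketch below. `splitText` is a hypothetical stand-in, not the package's method, and it assumes a kept separator stays attached to the end of the preceding chunk; the package may place it differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitText splits text on a literal separator. With keepSep set, each
// separator stays attached to the chunk before it (strings.SplitAfter).
// An empty separator splits the text into individual UTF-8 characters.
func splitText(text, sep string, keepSep bool) []string {
	if keepSep && sep != "" {
		return strings.SplitAfter(text, sep)
	}
	return strings.Split(text, sep)
}

func main() {
	fmt.Printf("%q\n", splitText("a,b,c", ",", false))
	fmt.Printf("%q\n", splitText("a,b,c", ",", true))
	fmt.Printf("%q\n", splitText("h\u00e9y", "", false))
}
```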

func (*Splitter) WithChunkOverlap

func (s *Splitter) WithChunkOverlap(chunkOverlap int) *Splitter

WithChunkOverlap sets the chunk overlap.

func (*Splitter) WithChunkSize

func (s *Splitter) WithChunkSize(chunkSize int) *Splitter

WithChunkSize sets the chunk size.
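This documentation does not spell out how chunk size and chunk overlap interact. As an assumption for illustration only, the sketch below uses a common convention in text chunking: each chunk holds at most `size` runes, and consecutive chunks share `overlap` runes. `window` is a hypothetical function and ignores separators entirely.

```go
package main

import "fmt"

// window cuts text into chunks of at most size runes where each chunk
// repeats the last overlap runes of the previous one. It returns nil
// for invalid parameters (size <= 0, overlap < 0, or overlap >= size).
func window(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the final chunk reached the end of the text
		}
	}
	return chunks
}

func main() {
	fmt.Printf("%q\n", window("abcdefgh", 4, 2))
}
```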

func (*Splitter) WithKeepSep

func (s *Splitter) WithKeepSep(keepSep bool) *Splitter

WithKeepSep sets the keep-separator flag.

func (*Splitter) WithLenFunc

func (s *Splitter) WithLenFunc(f LenFunc) *Splitter

WithLenFunc sets the length function.

func (*Splitter) WithTrimSpace

func (s *Splitter) WithTrimSpace(trimSpace bool) *Splitter

WithTrimSpace sets the trim-space flag.
