Documentation ¶
Index ¶
- Constants
- Variables
- type CharSplitter
- type Config
- type LenFunc
- type RecursiveCharSplitter
- type Sep
- type Splitter
- func (s *Splitter) Split(text string, sep Sep) []string
- func (s *Splitter) WithChunkOverlap(chunkOverlap int) *Splitter
- func (s *Splitter) WithChunkSize(chunkSize int) *Splitter
- func (s *Splitter) WithKeepSep(keepSep bool) *Splitter
- func (s *Splitter) WithLenFunc(f LenFunc) *Splitter
- func (s *Splitter) WithTrimSpace(trimSpace bool) *Splitter
Constants ¶
const (
	// DefaultChunkSize is default chunk size.
	DefaultChunkSize = 1
	// DefaultChunkOverlap is default chunk overlap.
	DefaultChunkOverlap = 0
)
Variables ¶
var (
	// DefaultLenFunc is a default string length function.
	// It counts UTF-8 encoded characters aka runes.
	DefaultLenFunc = utf8.RuneCountInString
	// StringBytesLenFunc counts number of bytes in a string.
	// Faster for some documents, but less accurate for multilingual text.
	StringBytesLenFunc = func(s string) int { return len(s) }
	// DefaultSeparator is default text separator.
	// Its intention is to split by paragraphs.
	DefaultSeparator = Sep{Value: "\n\n"}
	// DefaultSeparators are used in RecursiveCharSplitter.
	// RecursiveCharSplitter keeps splitting document
	// recursively using the separators until done.
	DefaultSeparators = []Sep{
		{Value: "\n\n"},
		{Value: "\n"},
		{Value: " "},
		{Value: ""},
	}
)
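The difference between the two length functions matters for non-ASCII text. The following is a minimal sketch using only the standard library; runeLen and byteLen are hypothetical stand-ins that mirror DefaultLenFunc and StringBytesLenFunc, not identifiers from this package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen mirrors DefaultLenFunc: it counts runes (hypothetical helper name).
func runeLen(s string) int { return utf8.RuneCountInString(s) }

// byteLen mirrors StringBytesLenFunc: it counts bytes (hypothetical helper name).
func byteLen(s string) int { return len(s) }

func main() {
	// "h\u00e9llo" contains a precomposed e-acute, which takes 2 bytes in UTF-8.
	for _, s := range []string{"hello", "h\u00e9llo", "日本語"} {
		fmt.Printf("%q runes=%d bytes=%d\n", s, runeLen(s), byteLen(s))
	}
	// Output:
	// "hello" runes=5 bytes=5
	// "héllo" runes=5 bytes=6
	// "日本語" runes=3 bytes=9
}
```

With a byte-based length function, a chunk size limit is reached sooner on text that uses multi-byte characters, so chunks hold fewer characters than the same limit would allow for ASCII text.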
Functions ¶
This section is empty.
Types ¶
type CharSplitter ¶
type CharSplitter struct {
	*Splitter
	// contains filtered or unexported fields
}
CharSplitter is a character text splitter. It splits texts into chunks over a separator which is either a string or a regular expression.
func NewCharSplitter ¶
func NewCharSplitter() *CharSplitter
NewCharSplitter creates a new character splitter with default options and returns it.
func (*CharSplitter) Split ¶
func (s *CharSplitter) Split(text string) []string
Split splits text into chunks.
func (*CharSplitter) WithSep ¶
func (s *CharSplitter) WithSep(sep Sep) *CharSplitter
WithSep sets the separator.
func (*CharSplitter) WithSplitter ¶
func (s *CharSplitter) WithSplitter(splitter *Splitter) *CharSplitter
WithSplitter sets the splitter.
type Config ¶
Config configures the splitter. NOTE: it exists to prevent constructors from accidentally mixing up the order of parameters of the same type, which leads to unpredictable behaviour.
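The rationale in the note can be illustrated with a small sketch. The fields of Config are not shown on this page, so sketchConfig and its field names below are assumptions made for illustration only, as are both constructor names.

```go
package main

import "fmt"

// sketchConfig is a hypothetical stand-in; the real Config fields are not listed here.
type sketchConfig struct {
	ChunkSize    int
	ChunkOverlap int
}

// newPositional takes two ints, so swapped arguments still compile.
func newPositional(chunkSize, chunkOverlap int) sketchConfig {
	return sketchConfig{ChunkSize: chunkSize, ChunkOverlap: chunkOverlap}
}

// newFromConfig takes named fields, so each value is labelled at the call site.
func newFromConfig(c sketchConfig) sketchConfig { return c }

func main() {
	// Intended: size 200, overlap 20. The arguments are swapped by mistake,
	// and the compiler cannot notice because both are ints.
	wrong := newPositional(20, 200)

	// With a config struct the field names make the intent explicit.
	right := newFromConfig(sketchConfig{ChunkOverlap: 20, ChunkSize: 200})

	fmt.Printf("positional: %+v\n", wrong)
	fmt.Printf("config:     %+v\n", right)
}
```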
type RecursiveCharSplitter ¶
type RecursiveCharSplitter struct {
	*Splitter
	// contains filtered or unexported fields
}
RecursiveCharSplitter is a recursive character text splitter. It tries to split text recursively by different separators to find one that works.
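The general idea of recursive splitting can be sketched as follows. This is not the package's implementation: recursiveSplit is a hypothetical function, it takes plain strings rather than Sep values, it measures length in runes, and it does not merge small pieces back together or apply chunk overlap. It only shows how a list of separators, ordered from coarse to fine, is tried in turn until every piece fits.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// recursiveSplit is a hypothetical sketch: it splits text on the first
// separator and recurses with the remaining separators on any piece that
// is still longer than chunkSize runes.
func recursiveSplit(text string, seps []string, chunkSize int) []string {
	if utf8.RuneCountInString(text) <= chunkSize || len(seps) == 0 {
		return []string{text}
	}
	var chunks []string
	for _, part := range strings.Split(text, seps[0]) {
		if part == "" {
			continue // skip empty pieces produced by adjacent separators
		}
		chunks = append(chunks, recursiveSplit(part, seps[1:], chunkSize)...)
	}
	return chunks
}

func main() {
	// Same ordering as DefaultSeparators: paragraphs, lines, words, characters.
	seps := []string{"\n\n", "\n", " ", ""}
	text := "aaaa bbbb\ncc dd\n\nee"
	for _, c := range recursiveSplit(text, seps, 5) {
		fmt.Printf("%q\n", c)
	}
	// Output:
	// "aaaa"
	// "bbbb"
	// "cc dd"
	// "ee"
}
```

Note that "cc dd" stays whole because it already fits within five runes, while "aaaa bbbb" has to fall through to the space separator.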
func NewRecursiveCharSplitter ¶
func NewRecursiveCharSplitter() *RecursiveCharSplitter
NewRecursiveCharSplitter creates a new recursive character splitter and returns it.
func (*RecursiveCharSplitter) Split ¶
func (r *RecursiveCharSplitter) Split(text string) []string
Split splits text into chunks.
func (*RecursiveCharSplitter) WithSeps ¶
func (r *RecursiveCharSplitter) WithSeps(seps []Sep) *RecursiveCharSplitter
WithSeps sets separators.
func (*RecursiveCharSplitter) WithSplitter ¶
func (r *RecursiveCharSplitter) WithSplitter(splitter *Splitter) *RecursiveCharSplitter
WithSplitter sets the splitter.
type Splitter ¶
type Splitter struct {
// contains filtered or unexported fields
}
Splitter splits text documents.
func NewSplitter ¶
func NewSplitter() *Splitter
NewSplitter creates a new text splitter with default options and returns it. You can override all config options with appropriate methods.
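Each With* method in the index returns *Splitter, which allows calls to be chained. The sketch below reproduces that pattern on a stand-in type, since the real Splitter has unexported fields that are not shown here. sketchSplitter, its fields, and newSketchSplitter are assumptions; only the method names and the default values (chunk size 1, overlap 0) are taken from this page.

```go
package main

import "fmt"

// sketchSplitter is a hypothetical stand-in for Splitter.
type sketchSplitter struct {
	chunkSize    int
	chunkOverlap int
	keepSep      bool
}

// newSketchSplitter starts from the documented defaults.
func newSketchSplitter() *sketchSplitter {
	return &sketchSplitter{chunkSize: 1, chunkOverlap: 0}
}

// Each setter mutates the receiver and returns it so calls can be chained.
func (s *sketchSplitter) WithChunkSize(n int) *sketchSplitter {
	s.chunkSize = n
	return s
}

func (s *sketchSplitter) WithChunkOverlap(n int) *sketchSplitter {
	s.chunkOverlap = n
	return s
}

func (s *sketchSplitter) WithKeepSep(keep bool) *sketchSplitter {
	s.keepSep = keep
	return s
}

func main() {
	s := newSketchSplitter().
		WithChunkSize(200).
		WithChunkOverlap(20).
		WithKeepSep(true)
	fmt.Printf("%+v\n", *s)
	// Output:
	// {chunkSize:200 chunkOverlap:20 keepSep:true}
}
```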
func NewSplitterWithConfig ¶
NewSplitterWithConfig creates a new text splitter and returns it.
func (*Splitter) Split ¶ added in v0.3.0
Split splits the text over a separator, optionally keeping the separator, and returns the chunks in a slice. If the separator is an empty string, it splits on individual characters.
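The behaviour described above can be approximated with the standard library. This is a sketch under assumptions, not the package's Split: splitSketch is a hypothetical helper, it takes a plain string separator instead of a Sep, and it attaches a kept separator to the end of the preceding piece, which may differ from where the package places it.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSketch is a hypothetical helper illustrating the three cases.
func splitSketch(text, sep string, keepSep bool) []string {
	if sep == "" {
		// An empty separator makes strings.Split return one element
		// per UTF-8 sequence, i.e. per character.
		return strings.Split(text, "")
	}
	if keepSep {
		// SplitAfter leaves the separator at the end of each piece.
		return strings.SplitAfter(text, sep)
	}
	return strings.Split(text, sep)
}

func main() {
	text := "a\n\nb\n\nc"
	fmt.Printf("%q\n", splitSketch(text, "\n\n", false))
	fmt.Printf("%q\n", splitSketch(text, "\n\n", true))
	fmt.Printf("%q\n", splitSketch("h\u00e9y", "", false))
	// Output:
	// ["a" "b" "c"]
	// ["a\n\n" "b\n\n" "c"]
	// ["h" "é" "y"]
}
```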
func (*Splitter) WithChunkOverlap ¶
WithChunkOverlap sets chunk overlap.
func (*Splitter) WithChunkSize ¶
WithChunkSize sets chunk size.
func (*Splitter) WithKeepSep ¶
WithKeepSep sets keep separator flag.
func (*Splitter) WithLenFunc ¶
WithLenFunc sets length func.
func (*Splitter) WithTrimSpace ¶
WithTrimSpace sets trim space.