tiktoken

v1.0.1
Published: Apr 8, 2023 License: MIT Imports: 16 Imported by: 0

README

tiktoken-go


OpenAI's tiktoken in Go.
Tiktoken is a fast BPE tokeniser for use with OpenAI's models.
This is a port of the original tiktoken.
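
To make the term "BPE" concrete, here is a minimal, self-contained sketch of the byte-pair merge step: start from single bytes and repeatedly merge the adjacent pair with the lowest rank. The function name `bpeMerge` and the three-entry rank table are made up for illustration. This is not the code or vocabulary used by tiktoken-go, and a real tokeniser also splits the input text into pieces before merging.

```go
package main

import "fmt"

// bpeMerge is a hypothetical helper: it splits piece into single bytes and
// then repeatedly merges the adjacent pair whose concatenation has the
// lowest rank in ranks, until no adjacent pair is in the table.
func bpeMerge(piece string, ranks map[string]int) []string {
	parts := make([]string, 0, len(piece))
	for i := 0; i < len(piece); i++ {
		parts = append(parts, piece[i:i+1])
	}
	for len(parts) > 1 {
		best, bestRank := -1, 0
		for i := 0; i < len(parts)-1; i++ {
			r, ok := ranks[parts[i]+parts[i+1]]
			if ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // no mergeable pair left
		}
		merged := parts[best] + parts[best+1]
		// Copy the tail first so the in-place append cannot overwrite it.
		tail := append([]string{merged}, parts[best+2:]...)
		parts = append(parts[:best], tail...)
	}
	return parts
}

func main() {
	// Toy rank table (an assumption for this example only).
	ranks := map[string]int{"ll": 0, "he": 1, "llo": 2}
	fmt.Println(bpeMerge("hello", ranks)) // prints [he llo]
}
```

With this toy table, "hello" ends up as two tokens. A real vocabulary contains many thousands of merges, which is why the same text can yield different token counts under different encodings.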

Usage

Install

go get github.com/ccbond/tiktoken-go

Example

get token by encoding
package main

import (
	"fmt"

	"github.com/ccbond/tiktoken-go"
)

func main() {
	text := "Hello, world!"
	encoding := "r50k_base"

	// Look up the tokeniser by encoding name.
	tke, err := tiktoken.GetEncoding(encoding)
	if err != nil {
		fmt.Printf("GetEncoding: %v\n", err)
		return
	}

	// Encode the text; both special-token arguments are left nil.
	tokens := tke.Encode(text, nil, nil)

	// The number of tokens is the length of the returned slice.
	numTokens := len(tokens)
	fmt.Println(numTokens)
}
get token by model
package main

import (
	"fmt"

	"github.com/ccbond/tiktoken-go"
)

func main() {
	text := "Hello, world!"
	model := "davinci"

	// Look up the tokeniser by model name.
	tkm, err := tiktoken.EncodingForModel(model)
	if err != nil {
		fmt.Printf("EncodingForModel: %v\n", err)
		return
	}

	// Encode the text; both special-token arguments are left nil.
	tokens := tkm.Encode(text, nil, nil)

	// The number of tokens is the length of the returned slice.
	numTokens := len(tokens)
	fmt.Println(numTokens)
}

available encodings

Encoding name        OpenAI models
cl100k_base          gpt-4, gpt-3.5-turbo, text-embedding-ada-002
p50k_base            Codex models, text-davinci-002, text-davinci-003
r50k_base (or gpt2)  GPT-3 models like davinci

available models

Model name Encoding name
gpt-4 cl100k_base
gpt-3.5-turbo cl100k_base
text-davinci-003 p50k_base
text-davinci-002 p50k_base
text-davinci-001 r50k_base
text-curie-001 r50k_base
text-babbage-001 r50k_base
text-ada-001 r50k_base
davinci r50k_base
curie r50k_base
babbage r50k_base
ada r50k_base
code-davinci-002 p50k_base
code-davinci-001 p50k_base
code-cushman-002 p50k_base
code-cushman-001 p50k_base
davinci-codex p50k_base
cushman-codex p50k_base
text-davinci-edit-001 p50k_edit
code-davinci-edit-001 p50k_edit
text-embedding-ada-002 cl100k_base
text-similarity-davinci-001 r50k_base
text-similarity-curie-001 r50k_base
text-similarity-babbage-001 r50k_base
text-similarity-ada-001 r50k_base
text-search-davinci-doc-001 r50k_base
text-search-curie-doc-001 r50k_base
text-search-babbage-doc-001 r50k_base
text-search-ada-doc-001 r50k_base
code-search-babbage-code-001 r50k_base
code-search-ada-code-001 r50k_base
gpt2 gpt2
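
The model table above is essentially a lookup from model name to encoding name. The sketch below shows that idea with a plain Go map holding a few rows copied from the table. The names `modelToEncoding` and `encodingNameFor` are hypothetical and are not part of the tiktoken-go API; the library may store and resolve this mapping differently.

```go
package main

import "fmt"

// modelToEncoding holds an illustrative subset of the table above.
// It is not the library's internal table.
var modelToEncoding = map[string]string{
	"gpt-4":                  "cl100k_base",
	"gpt-3.5-turbo":          "cl100k_base",
	"text-davinci-003":       "p50k_base",
	"davinci":                "r50k_base",
	"text-davinci-edit-001":  "p50k_edit",
	"text-embedding-ada-002": "cl100k_base",
}

// encodingNameFor is a hypothetical helper that returns the encoding name
// for a model, or an error when the model is not in the map.
func encodingNameFor(model string) (string, error) {
	name, ok := modelToEncoding[model]
	if !ok {
		return "", fmt.Errorf("no encoding known for model %q", model)
	}
	return name, nil
}

func main() {
	for _, m := range []string{"gpt-4", "davinci", "unknown-model"} {
		name, err := encodingNameFor(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", m, name)
	}
}
```

Returning an error for an unknown model, rather than silently picking a default encoding, keeps a wrong token count from going unnoticed.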

Test

You can run the tests in the test folder.

compare with original tiktoken

get token by encoding

python tiktoken golang tiktoken-go
text: hallo world!, encoding: cl100k_base, token: 4 text: hallo world!, encoding: cl100k_base, token: 4
text: hallo world!, encoding: p50k_base, token: 4 text: hallo world!, encoding: p50k_base, token: 4
text: hallo world!, encoding: r50k_base, token: 4 text: hallo world!, encoding: r50k_base, token: 4
text: 你好世界!, encoding: cl100k_base, token: 6 text: 你好世界!, encoding: cl100k_base, token: 6
text: 你好世界!, encoding: p50k_base, token: 11 text: 你好世界!, encoding: p50k_base, token: 11
text: 你好世界!, encoding: r50k_base, token: 11 text: 你好世界!, encoding: r50k_base, token: 11
text: こんにちは世界!, encoding: cl100k_base, token: 5 text: こんにちは世界!, encoding: cl100k_base, token: 5
text: こんにちは世界!, encoding: p50k_base, token: 13 text: こんにちは世界!, encoding: p50k_base, token: 13
text: こんにちは世界!, encoding: r50k_base, token: 13 text: こんにちは世界!, encoding: r50k_base, token: 13
text: 안녕하세요 세계!, encoding: cl100k_base, token: 10 text: 안녕하세요 세계!, encoding: cl100k_base, token: 10
text: 안녕하세요 세계!, encoding: p50k_base, token: 21 text: 안녕하세요 세계!, encoding: p50k_base, token: 21
text: 안녕하세요 세계!, encoding: r50k_base, token: 21 text: 안녕하세요 세계!, encoding: r50k_base, token: 21
text: Привет мир!, encoding: cl100k_base, token: 6 text: Привет мир!, encoding: cl100k_base, token: 6
text: Привет мир!, encoding: p50k_base, token: 12 text: Привет мир!, encoding: p50k_base, token: 12
text: Привет мир!, encoding: r50k_base, token: 12 text: Привет мир!, encoding: r50k_base, token: 12
text: ¡Hola mundo!, encoding: cl100k_base, token: 4 text: ¡Hola mundo!, encoding: cl100k_base, token: 4
text: ¡Hola mundo!, encoding: p50k_base, token: 7 text: ¡Hola mundo!, encoding: p50k_base, token: 7
text: ¡Hola mundo!, encoding: r50k_base, token: 7 text: ¡Hola mundo!, encoding: r50k_base, token: 7
text: Hallo Welt!, encoding: cl100k_base, token: 3 text: Hallo Welt!, encoding: cl100k_base, token: 3
text: Hallo Welt!, encoding: p50k_base, token: 5 text: Hallo Welt!, encoding: p50k_base, token: 5
text: Hallo Welt!, encoding: r50k_base, token: 5 text: Hallo Welt!, encoding: r50k_base, token: 5
text: Bonjour le monde!, encoding: cl100k_base, token: 4 text: Bonjour le monde!, encoding: cl100k_base, token: 4
text: Bonjour le monde!, encoding: p50k_base, token: 7 text: Bonjour le monde!, encoding: p50k_base, token: 7
text: Bonjour le monde!, encoding: r50k_base, token: 7 text: Bonjour le monde!, encoding: r50k_base, token: 7
text: Ciao mondo!, encoding: cl100k_base, token: 4 text: Ciao mondo!, encoding: cl100k_base, token: 4
text: Ciao mondo!, encoding: p50k_base, token: 5 text: Ciao mondo!, encoding: p50k_base, token: 5
text: Ciao mondo!, encoding: r50k_base, token: 5 text: Ciao mondo!, encoding: r50k_base, token: 5
text: Hej världen!, encoding: cl100k_base, token: 7 text: Hej världen!, encoding: cl100k_base, token: 7
text: Hej världen!, encoding: p50k_base, token: 8 text: Hej världen!, encoding: p50k_base, token: 8
text: Hej världen!, encoding: r50k_base, token: 8 text: Hej världen!, encoding: r50k_base, token: 8
text: Hallo wereld!, encoding: cl100k_base, token: 3 text: Hallo wereld!, encoding: cl100k_base, token: 3
text: Hallo wereld!, encoding: p50k_base, token: 5 text: Hallo wereld!, encoding: p50k_base, token: 5
text: Hallo wereld!, encoding: r50k_base, token: 5 text: Hallo wereld!, encoding: r50k_base, token: 5
text: Hallo verden!, encoding: cl100k_base, token: 4 text: Hallo verden!, encoding: cl100k_base, token: 4
text: Hallo verden!, encoding: p50k_base, token: 5 text: Hallo verden!, encoding: p50k_base, token: 5
text: Hallo verden!, encoding: r50k_base, token: 5 text: Hallo verden!, encoding: r50k_base, token: 5
text: Hallo wereld!, encoding: cl100k_base, token: 3 text: Hallo wereld!, encoding: cl100k_base, token: 3
text: Hallo wereld!, encoding: p50k_base, token: 5 text: Hallo wereld!, encoding: p50k_base, token: 5
text: Hallo wereld!, encoding: r50k_base, token: 5 text: Hallo wereld!, encoding: r50k_base, token: 5
text: Hallo verden!, encoding: cl100k_base, token: 4 text: Hallo verden!, encoding: cl100k_base, token: 4
text: Hallo verden!, encoding: p50k_base, token: 5 text: Hallo verden!, encoding: p50k_base, token: 5
text: Hallo verden!, encoding: r50k_base, token: 5 text: Hallo verden!, encoding: r50k_base, token: 5
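
One way to read the table above is as a list of (text, encoding, python count, go count) records in which the last two numbers should always agree. The sketch below encodes a few rows copied from the table and reports any disagreement. The `row` type and the `mismatches` function are made up for this example and are not taken from the project's test code.

```go
package main

import "fmt"

// row holds one line of the comparison table: the token counts reported by
// python tiktoken and by tiktoken-go for the same text and encoding.
type row struct {
	text, encoding string
	python, golang int
}

// rows is a small subset copied from the table above.
var rows = []row{
	{"hallo world!", "cl100k_base", 4, 4},
	{"你好世界!", "cl100k_base", 6, 6},
	{"你好世界!", "p50k_base", 11, 11},
	{"안녕하세요 세계!", "cl100k_base", 10, 10},
	{"안녕하세요 세계!", "r50k_base", 21, 21},
}

// mismatches returns the rows where the two implementations disagree.
func mismatches(rs []row) []row {
	var out []row
	for _, r := range rs {
		if r.python != r.golang {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	bad := mismatches(rows)
	fmt.Printf("%d of %d rows differ\n", len(bad), len(rows)) // prints 0 of 5 rows differ
}
```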

get token by model

python tiktoken golang tiktoken-go
text: hallo world!, model: gpt-4, token: 4 text: hallo world!, model: gpt-4, token: 4
text: hallo world!, model: gpt-3.5-turbo, token: 4 text: hallo world!, model: gpt-3.5-turbo, token: 4
text: hallo world!, model: text-davinci-003, token: 4 text: hallo world!, model: text-davinci-003, token: 4
text: hallo world!, model: text-davinci-002, token: 4 text: hallo world!, model: text-davinci-002, token: 4
text: hallo world!, model: text-davinci-001, token: 4 text: hallo world!, model: text-davinci-001, token: 4
text: hallo world!, model: text-curie-001, token: 4 text: hallo world!, model: text-curie-001, token: 4
text: hallo world!, model: text-babbage-001, token: 4 text: hallo world!, model: text-babbage-001, token: 4
text: hallo world!, model: text-ada-001, token: 4 text: hallo world!, model: text-ada-001, token: 4
text: hallo world!, model: davinci, token: 4 text: hallo world!, model: davinci, token: 4
text: hallo world!, model: curie, token: 4 text: hallo world!, model: curie, token: 4
text: hallo world!, model: babbage, token: 4 text: hallo world!, model: babbage, token: 4
text: hallo world!, model: ada, token: 4 text: hallo world!, model: ada, token: 4
text: hallo world!, model: code-davinci-002, token: 4 text: hallo world!, model: code-davinci-002, token: 4
text: hallo world!, model: code-davinci-001, token: 4 text: hallo world!, model: code-davinci-001, token: 4
text: hallo world!, model: code-cushman-002, token: 4 text: hallo world!, model: code-cushman-002, token: 4
text: hallo world!, model: code-cushman-001, token: 4 text: hallo world!, model: code-cushman-001, token: 4
text: hallo world!, model: davinci-codex, token: 4 text: hallo world!, model: davinci-codex, token: 4
text: hallo world!, model: cushman-codex, token: 4 text: hallo world!, model: cushman-codex, token: 4
text: hallo world!, model: text-davinci-edit-001, token: 4 text: hallo world!, model: text-davinci-edit-001, token: 4
text: hallo world!, model: code-davinci-edit-001, token: 4 text: hallo world!, model: code-davinci-edit-001, token: 4
text: hallo world!, model: text-embedding-ada-002, token: 4 text: hallo world!, model: text-embedding-ada-002, token: 4
text: hallo world!, model: text-similarity-davinci-001, token: 4 text: hallo world!, model: text-similarity-davinci-001, token: 4
text: 你好世界!, model: gpt-4, token: 6 text: 你好世界!, model: gpt-4, token: 6
text: 你好世界!, model: gpt-3.5-turbo, token: 6 text: 你好世界!, model: gpt-3.5-turbo, token: 6
text: 你好世界!, model: text-davinci-003, token: 11 text: 你好世界!, model: text-davinci-003, token: 11
text: 你好世界!, model: text-davinci-002, token: 11 text: 你好世界!, model: text-davinci-002, token: 11
text: 你好世界!, model: text-davinci-001, token: 11 text: 你好世界!, model: text-davinci-001, token: 11
text: 你好世界!, model: text-curie-001, token: 11 text: 你好世界!, model: text-curie-001, token: 11
text: 你好世界!, model: text-babbage-001, token: 11 text: 你好世界!, model: text-babbage-001, token: 11
text: 你好世界!, model: text-ada-001, token: 11 text: 你好世界!, model: text-ada-001, token: 11
text: 你好世界!, model: davinci, token: 11 text: 你好世界!, model: davinci, token: 11
text: 你好世界!, model: curie, token: 11 text: 你好世界!, model: curie, token: 11
text: 你好世界!, model: babbage, token: 11 text: 你好世界!, model: babbage, token: 11
text: 你好世界!, model: ada, token: 11 text: 你好世界!, model: ada, token: 11
text: 你好世界!, model: code-davinci-002, token: 11 text: 你好世界!, model: code-davinci-002, token: 11
text: 你好世界!, model: code-davinci-001, token: 11 text: 你好世界!, model: code-davinci-001, token: 11
text: 你好世界!, model: code-cushman-002, token: 11 text: 你好世界!, model: code-cushman-002, token: 11
text: 你好世界!, model: code-cushman-001, token: 11 text: 你好世界!, model: code-cushman-001, token: 11
text: 你好世界!, model: davinci-codex, token: 11 text: 你好世界!, model: davinci-codex, token: 11
text: 你好世界!, model: cushman-codex, token: 11 text: 你好世界!, model: cushman-codex, token: 11
text: 你好世界!, model: text-davinci-edit-001, token: 11 text: 你好世界!, model: text-davinci-edit-001, token: 11
text: 你好世界!, model: code-davinci-edit-001, token: 11 text: 你好世界!, model: code-davinci-edit-001, token: 11
text: 你好世界!, model: text-embedding-ada-002, token: 6 text: 你好世界!, model: text-embedding-ada-002, token: 6
text: 你好世界!, model: text-similarity-davinci-001, token: 11 text: 你好世界!, model: text-similarity-davinci-001, token: 11
text: こんにちは世界!, model: gpt-4, token: 5 text: こんにちは世界!, model: gpt-4, token: 5
text: こんにちは世界!, model: gpt-3.5-turbo, token: 5 text: こんにちは世界!, model: gpt-3.5-turbo, token: 5
text: こんにちは世界!, model: text-davinci-003, token: 13 text: こんにちは世界!, model: text-davinci-003, token: 13
text: こんにちは世界!, model: text-davinci-002, token: 13 text: こんにちは世界!, model: text-davinci-002, token: 13
text: こんにちは世界!, model: text-davinci-001, token: 13 text: こんにちは世界!, model: text-davinci-001, token: 13
text: こんにちは世界!, model: text-curie-001, token: 13 text: こんにちは世界!, model: text-curie-001, token: 13
text: こんにちは世界!, model: text-babbage-001, token: 13 text: こんにちは世界!, model: text-babbage-001, token: 13
text: こんにちは世界!, model: text-ada-001, token: 13 text: こんにちは世界!, model: text-ada-001, token: 13
text: こんにちは世界!, model: davinci, token: 13 text: こんにちは世界!, model: davinci, token: 13
text: こんにちは世界!, model: curie, token: 13 text: こんにちは世界!, model: curie, token: 13
text: こんにちは世界!, model: babbage, token: 13 text: こんにちは世界!, model: babbage, token: 13
text: こんにちは世界!, model: ada, token: 13 text: こんにちは世界!, model: ada, token: 13
text: こんにちは世界!, model: code-davinci-002, token: 13 text: こんにちは世界!, model: code-davinci-002, token: 13
text: こんにちは世界!, model: code-davinci-001, token: 13 text: こんにちは世界!, model: code-davinci-001, token: 13
text: こんにちは世界!, model: code-cushman-002, token: 13 text: こんにちは世界!, model: code-cushman-002, token: 13
text: こんにちは世界!, model: code-cushman-001, token: 13 text: こんにちは世界!, model: code-cushman-001, token: 13
text: こんにちは世界!, model: davinci-codex, token: 13 text: こんにちは世界!, model: davinci-codex, token: 13
text: こんにちは世界!, model: cushman-codex, token: 13 text: こんにちは世界!, model: cushman-codex, token: 13
text: こんにちは世界!, model: text-davinci-edit-001, token: 13 text: こんにちは世界!, model: text-davinci-edit-001, token: 13
text: こんにちは世界!, model: code-davinci-edit-001, token: 13 text: こんにちは世界!, model: code-davinci-edit-001, token: 13
text: こんにちは世界!, model: text-embedding-ada-002, token: 5 text: こんにちは世界!, model: text-embedding-ada-002, token: 5
text: こんにちは世界!, model: text-similarity-davinci-001, token: 13 text: こんにちは世界!, model: text-similarity-davinci-001, token: 13
text: 안녕하세요 세계!, model: gpt-4, token: 10 text: 안녕하세요 세계!, model: gpt-4, token: 10
text: 안녕하세요 세계!, model: gpt-3.5-turbo, token: 10 text: 안녕하세요 세계!, model: gpt-3.5-turbo, token: 10
text: 안녕하세요 세계!, model: text-davinci-003, token: 21 text: 안녕하세요 세계!, model: text-davinci-003, token: 21
text: 안녕하세요 세계!, model: text-davinci-002, token: 21 text: 안녕하세요 세계!, model: text-davinci-002, token: 21
text: 안녕하세요 세계!, model: text-davinci-001, token: 21 text: 안녕하세요 세계!, model: text-davinci-001, token: 21
text: 안녕하세요 세계!, model: text-curie-001, token: 21 text: 안녕하세요 세계!, model: text-curie-001, token: 21
text: 안녕하세요 세계!, model: text-babbage-001, token: 21 text: 안녕하세요 세계!, model: text-babbage-001, token: 21
text: 안녕하세요 세계!, model: text-ada-001, token: 21 text: 안녕하세요 세계!, model: text-ada-001, token: 21
text: 안녕하세요 세계!, model: davinci, token: 21 text: 안녕하세요 세계!, model: davinci, token: 21
text: 안녕하세요 세계!, model: curie, token: 21 text: 안녕하세요 세계!, model: curie, token: 21
text: 안녕하세요 세계!, model: babbage, token: 21 text: 안녕하세요 세계!, model: babbage, token: 21
text: 안녕하세요 세계!, model: ada, token: 21 text: 안녕하세요 세계!, model: ada, token: 21
text: 안녕하세요 세계!, model: code-davinci-002, token: 21 text: 안녕하세요 세계!, model: code-davinci-002, token: 21
text: 안녕하세요 세계!, model: code-davinci-001, token: 21 text: 안녕하세요 세계!, model: code-davinci-001, token: 21
text: 안녕하세요 세계!, model: code-cushman-002, token: 21 text: 안녕하세요 세계!, model: code-cushman-002, token: 21
text: 안녕하세요 세계!, model: code-cushman-001, token: 21 text: 안녕하세요 세계!, model: code-cushman-001, token: 21
text: 안녕하세요 세계!, model: davinci-codex, token: 21 text: 안녕하세요 세계!, model: davinci-codex, token: 21
text: 안녕하세요 세계!, model: cushman-codex, token: 21 text: 안녕하세요 세계!, model: cushman-codex, token: 21
text: 안녕하세요 세계!, model: text-davinci-edit-001, token: 21 text: 안녕하세요 세계!, model: text-davinci-edit-001, token: 21
text: 안녕하세요 세계!, model: code-davinci-edit-001, token: 21 text: 안녕하세요 세계!, model: code-davinci-edit-001, token: 21
text: 안녕하세요 세계!, model: text-embedding-ada-002, token: 10 text: 안녕하세요 세계!, model: text-embedding-ada-002, token: 10
text: 안녕하세요 세계!, model: text-similarity-davinci-001, token: 21 text: 안녕하세요 세계!, model: text-similarity-davinci-001, token: 21
text: Привет мир!, model: gpt-4, token: 6 text: Привет мир!, model: gpt-4, token: 6
text: Привет мир!, model: gpt-3.5-turbo, token: 6 text: Привет мир!, model: gpt-3.5-turbo, token: 6
text: Привет мир!, model: text-davinci-003, token: 12 text: Привет мир!, model: text-davinci-003, token: 12
text: Привет мир!, model: text-davinci-002, token: 12 text: Привет мир!, model: text-davinci-002, token: 12
text: Привет мир!, model: text-davinci-001, token: 12 text: Привет мир!, model: text-davinci-001, token: 12
text: Привет мир!, model: text-curie-001, token: 12 text: Привет мир!, model: text-curie-001, token: 12
text: Привет мир!, model: text-babbage-001, token: 12 text: Привет мир!, model: text-babbage-001, token: 12
text: Привет мир!, model: text-ada-001, token: 12 text: Привет мир!, model: text-ada-001, token: 12
text: Привет мир!, model: davinci, token: 12 text: Привет мир!, model: davinci, token: 12
text: Привет мир!, model: curie, token: 12 text: Привет мир!, model: curie, token: 12
text: Привет мир!, model: babbage, token: 12 text: Привет мир!, model: babbage, token: 12
text: Привет мир!, model: ada, token: 12 text: Привет мир!, model: ada, token: 12
text: Привет мир!, model: code-davinci-002, token: 12 text: Привет мир!, model: code-davinci-002, token: 12
text: Привет мир!, model: code-davinci-001, token: 12 text: Привет мир!, model: code-davinci-001, token: 12
text: Привет мир!, model: code-cushman-002, token: 12 text: Привет мир!, model: code-cushman-002, token: 12
text: Привет мир!, model: code-cushman-001, token: 12 text: Привет мир!, model: code-cushman-001, token: 12
text: Привет мир!, model: davinci-codex, token: 12 text: Привет мир!, model: davinci-codex, token: 12
text: Привет мир!, model: cushman-codex, token: 12 text: Привет мир!, model: cushman-codex, token: 12
text: Привет мир!, model: text-davinci-edit-001, token: 12 text: Привет мир!, model: text-davinci-edit-001, token: 12
text: Привет мир!, model: code-davinci-edit-001, token: 12 text: Привет мир!, model: code-davinci-edit-001, token: 12
text: Привет мир!, model: text-embedding-ada-002, token: 6 text: Привет мир!, model: text-embedding-ada-002, token: 6
text: Привет мир!, model: text-similarity-davinci-001, token: 12 text: Привет мир!, model: text-similarity-davinci-001, token: 12
text: ¡Hola mundo!, model: gpt-4, token: 4 text: ¡Hola mundo!, model: gpt-4, token: 4
text: ¡Hola mundo!, model: gpt-3.5-turbo, token: 4 text: ¡Hola mundo!, model: gpt-3.5-turbo, token: 4
text: ¡Hola mundo!, model: text-davinci-003, token: 7 text: ¡Hola mundo!, model: text-davinci-003, token: 7
text: ¡Hola mundo!, model: text-davinci-002, token: 7 text: ¡Hola mundo!, model: text-davinci-002, token: 7
text: ¡Hola mundo!, model: text-davinci-001, token: 7 text: ¡Hola mundo!, model: text-davinci-001, token: 7
text: ¡Hola mundo!, model: text-curie-001, token: 7 text: ¡Hola mundo!, model: text-curie-001, token: 7
text: ¡Hola mundo!, model: text-babbage-001, token: 7 text: ¡Hola mundo!, model: text-babbage-001, token: 7
text: ¡Hola mundo!, model: text-ada-001, token: 7 text: ¡Hola mundo!, model: text-ada-001, token: 7
text: ¡Hola mundo!, model: davinci, token: 7 text: ¡Hola mundo!, model: davinci, token: 7
text: ¡Hola mundo!, model: curie, token: 7 text: ¡Hola mundo!, model: curie, token: 7
text: ¡Hola mundo!, model: babbage, token: 7 text: ¡Hola mundo!, model: babbage, token: 7
text: ¡Hola mundo!, model: ada, token: 7 text: ¡Hola mundo!, model: ada, token: 7
text: ¡Hola mundo!, model: code-davinci-002, token: 7 text: ¡Hola mundo!, model: code-davinci-002, token: 7
text: ¡Hola mundo!, model: code-davinci-001, token: 7 text: ¡Hola mundo!, model: code-davinci-001, token: 7
text: ¡Hola mundo!, model: code-cushman-002, token: 7 text: ¡Hola mundo!, model: code-cushman-002, token: 7
text: ¡Hola mundo!, model: code-cushman-001, token: 7 text: ¡Hola mundo!, model: code-cushman-001, token: 7
text: ¡Hola mundo!, model: davinci-codex, token: 7 text: ¡Hola mundo!, model: davinci-codex, token: 7
text: ¡Hola mundo!, model: cushman-codex, token: 7 text: ¡Hola mundo!, model: cushman-codex, token: 7
text: ¡Hola mundo!, model: text-davinci-edit-001, token: 7 text: ¡Hola mundo!, model: text-davinci-edit-001, token: 7
text: ¡Hola mundo!, model: code-davinci-edit-001, token: 7 text: ¡Hola mundo!, model: code-davinci-edit-001, token: 7
text: ¡Hola mundo!, model: text-embedding-ada-002, token: 4 text: ¡Hola mundo!, model: text-embedding-ada-002, token: 4
text: ¡Hola mundo!, model: text-similarity-davinci-001, token: 7 text: ¡Hola mundo!, model: text-similarity-davinci-001, token: 7
text: Hallo Welt!, model: gpt-4, token: 3 text: Hallo Welt!, model: gpt-4, token: 3
text: Hallo Welt!, model: gpt-3.5-turbo, token: 3 text: Hallo Welt!, model: gpt-3.5-turbo, token: 3
text: Hallo Welt!, model: text-davinci-003, token: 5 text: Hallo Welt!, model: text-davinci-003, token: 5
text: Hallo Welt!, model: text-davinci-002, token: 5 text: Hallo Welt!, model: text-davinci-002, token: 5
text: Hallo Welt!, model: text-davinci-001, token: 5 text: Hallo Welt!, model: text-davinci-001, token: 5
text: Hallo Welt!, model: text-curie-001, token: 5 text: Hallo Welt!, model: text-curie-001, token: 5
text: Hallo Welt!, model: text-babbage-001, token: 5 text: Hallo Welt!, model: text-babbage-001, token: 5
text: Hallo Welt!, model: text-ada-001, token: 5 text: Hallo Welt!, model: text-ada-001, token: 5
text: Hallo Welt!, model: davinci, token: 5 text: Hallo Welt!, model: davinci, token: 5
text: Hallo Welt!, model: curie, token: 5 text: Hallo Welt!, model: curie, token: 5
text: Hallo Welt!, model: babbage, token: 5 text: Hallo Welt!, model: babbage, token: 5
text: Hallo Welt!, model: ada, token: 5 text: Hallo Welt!, model: ada, token: 5
text: Hallo Welt!, model: code-davinci-002, token: 5 text: Hallo Welt!, model: code-davinci-002, token: 5
text: Hallo Welt!, model: code-davinci-001, token: 5 text: Hallo Welt!, model: code-davinci-001, token: 5
text: Hallo Welt!, model: code-cushman-002, token: 5 text: Hallo Welt!, model: code-cushman-002, token: 5
text: Hallo Welt!, model: code-cushman-001, token: 5 text: Hallo Welt!, model: code-cushman-001, token: 5
text: Hallo Welt!, model: davinci-codex, token: 5 text: Hallo Welt!, model: davinci-codex, token: 5
text: Hallo Welt!, model: cushman-codex, token: 5 text: Hallo Welt!, model: cushman-codex, token: 5
text: Hallo Welt!, model: text-davinci-edit-001, token: 5 text: Hallo Welt!, model: text-davinci-edit-001, token: 5
text: Hallo Welt!, model: code-davinci-edit-001, token: 5 text: Hallo Welt!, model: code-davinci-edit-001, token: 5
text: Hallo Welt!, model: text-embedding-ada-002, token: 3 text: Hallo Welt!, model: text-embedding-ada-002, token: 3
text: Hallo Welt!, model: text-similarity-davinci-001, token: 5 text: Hallo Welt!, model: text-similarity-davinci-001, token: 5
text: Bonjour le monde!, model: gpt-4, token: 4 text: Bonjour le monde!, model: gpt-4, token: 4
text: Bonjour le monde!, model: gpt-3.5-turbo, token: 4 text: Bonjour le monde!, model: gpt-3.5-turbo, token: 4
text: Bonjour le monde!, model: text-davinci-003, token: 7 text: Bonjour le monde!, model: text-davinci-003, token: 7
text: Bonjour le monde!, model: text-davinci-002, token: 7 text: Bonjour le monde!, model: text-davinci-002, token: 7
text: Bonjour le monde!, model: text-davinci-001, token: 7 text: Bonjour le monde!, model: text-davinci-001, token: 7
text: Bonjour le monde!, model: text-curie-001, token: 7 text: Bonjour le monde!, model: text-curie-001, token: 7
text: Bonjour le monde!, model: text-babbage-001, token: 7 text: Bonjour le monde!, model: text-babbage-001, token: 7
text: Bonjour le monde!, model: text-ada-001, token: 7 text: Bonjour le monde!, model: text-ada-001, token: 7
text: Bonjour le monde!, model: davinci, token: 7 text: Bonjour le monde!, model: davinci, token: 7
text: Bonjour le monde!, model: curie, token: 7 text: Bonjour le monde!, model: curie, token: 7
text: Bonjour le monde!, model: babbage, token: 7 text: Bonjour le monde!, model: babbage, token: 7
text: Bonjour le monde!, model: ada, token: 7 text: Bonjour le monde!, model: ada, token: 7
text: Bonjour le monde!, model: code-davinci-002, token: 7 text: Bonjour le monde!, model: code-davinci-002, token: 7
text: Bonjour le monde!, model: code-davinci-001, token: 7 text: Bonjour le monde!, model: code-davinci-001, token: 7
text: Bonjour le monde!, model: code-cushman-002, token: 7 text: Bonjour le monde!, model: code-cushman-002, token: 7
text: Bonjour le monde!, model: code-cushman-001, token: 7 text: Bonjour le monde!, model: code-cushman-001, token: 7
text: Bonjour le monde!, model: davinci-codex, token: 7 text: Bonjour le monde!, model: davinci-codex, token: 7
text: Bonjour le monde!, model: cushman-codex, token: 7 text: Bonjour le monde!, model: cushman-codex, token: 7
text: Bonjour le monde!, model: text-davinci-edit-001, token: 7 text: Bonjour le monde!, model: text-davinci-edit-001, token: 7
text: Bonjour le monde!, model: code-davinci-edit-001, token: 7 text: Bonjour le monde!, model: code-davinci-edit-001, token: 7
text: Bonjour le monde!, model: text-embedding-ada-002, token: 4 text: Bonjour le monde!, model: text-embedding-ada-002, token: 4
text: Bonjour le monde!, model: text-similarity-davinci-001, token: 7 text: Bonjour le monde!, model: text-similarity-davinci-001, token: 7
text: Ciao mondo!, model: gpt-4, token: 4 text: Ciao mondo!, model: gpt-4, token: 4
text: Ciao mondo!, model: gpt-3.5-turbo, token: 4 text: Ciao mondo!, model: gpt-3.5-turbo, token: 4
text: Ciao mondo!, model: text-davinci-003, token: 5 text: Ciao mondo!, model: text-davinci-003, token: 5
text: Ciao mondo!, model: text-davinci-002, token: 5 text: Ciao mondo!, model: text-davinci-002, token: 5
text: Ciao mondo!, model: text-davinci-001, token: 5 text: Ciao mondo!, model: text-davinci-001, token: 5
text: Ciao mondo!, model: text-curie-001, token: 5 text: Ciao mondo!, model: text-curie-001, token: 5
text: Ciao mondo!, model: text-babbage-001, token: 5 text: Ciao mondo!, model: text-babbage-001, token: 5
text: Ciao mondo!, model: text-ada-001, token: 5 text: Ciao mondo!, model: text-ada-001, token: 5
text: Ciao mondo!, model: davinci, token: 5 text: Ciao mondo!, model: davinci, token: 5
text: Ciao mondo!, model: curie, token: 5 text: Ciao mondo!, model: curie, token: 5
text: Ciao mondo!, model: babbage, token: 5 text: Ciao mondo!, model: babbage, token: 5
text: Ciao mondo!, model: ada, token: 5 text: Ciao mondo!, model: ada, token: 5
text: Ciao mondo!, model: code-davinci-002, token: 5 text: Ciao mondo!, model: code-davinci-002, token: 5
text: Ciao mondo!, model: code-davinci-001, token: 5 text: Ciao mondo!, model: code-davinci-001, token: 5
text: Ciao mondo!, model: code-cushman-002, token: 5 text: Ciao mondo!, model: code-cushman-002, token: 5
text: Ciao mondo!, model: code-cushman-001, token: 5 text: Ciao mondo!, model: code-cushman-001, token: 5
text: Ciao mondo!, model: davinci-codex, token: 5 text: Ciao mondo!, model: davinci-codex, token: 5
text: Ciao mondo!, model: cushman-codex, token: 5 text: Ciao mondo!, model: cushman-codex, token: 5
text: Ciao mondo!, model: text-davinci-edit-001, token: 5 text: Ciao mondo!, model: text-davinci-edit-001, token: 5
text: Ciao mondo!, model: code-davinci-edit-001, token: 5 text: Ciao mondo!, model: code-davinci-edit-001, token: 5
text: Ciao mondo!, model: text-embedding-ada-002, token: 4 text: Ciao mondo!, model: text-embedding-ada-002, token: 4
text: Ciao mondo!, model: text-similarity-davinci-001, token: 5 text: Ciao mondo!, model: text-similarity-davinci-001, token: 5
text: Hej världen!, model: gpt-4, token: 7 text: Hej världen!, model: gpt-4, token: 7
text: Hej världen!, model: gpt-3.5-turbo, token: 7 text: Hej världen!, model: gpt-3.5-turbo, token: 7
text: Hej världen!, model: text-davinci-003, token: 8 text: Hej världen!, model: text-davinci-003, token: 8
text: Hej världen!, model: text-davinci-002, token: 8 text: Hej världen!, model: text-davinci-002, token: 8
text: Hej världen!, model: text-davinci-001, token: 8 text: Hej världen!, model: text-davinci-001, token: 8
text: Hej världen!, model: text-curie-001, token: 8 text: Hej världen!, model: text-curie-001, token: 8
text: Hej världen!, model: text-babbage-001, token: 8 text: Hej världen!, model: text-babbage-001, token: 8
text: Hej världen!, model: text-ada-001, token: 8 text: Hej världen!, model: text-ada-001, token: 8
text: Hej världen!, model: davinci, token: 8 text: Hej världen!, model: davinci, token: 8
text: Hej världen!, model: curie, token: 8 text: Hej världen!, model: curie, token: 8
text: Hej världen!, model: babbage, token: 8 text: Hej världen!, model: babbage, token: 8
text: Hej världen!, model: ada, token: 8 text: Hej världen!, model: ada, token: 8
text: Hej världen!, model: code-davinci-002, token: 8 text: Hej världen!, model: code-davinci-002, token: 8
text: Hej världen!, model: code-davinci-001, token: 8 text: Hej världen!, model: code-davinci-001, token: 8
text: Hej världen!, model: code-cushman-002, token: 8 text: Hej världen!, model: code-cushman-002, token: 8
text: Hej världen!, model: code-cushman-001, token: 8 text: Hej världen!, model: code-cushman-001, token: 8
text: Hej världen!, model: davinci-codex, token: 8 text: Hej världen!, model: davinci-codex, token: 8
text: Hej världen!, model: cushman-codex, token: 8 text: Hej världen!, model: cushman-codex, token: 8
text: Hej världen!, model: text-davinci-edit-001, token: 8 text: Hej världen!, model: text-davinci-edit-001, token: 8
text: Hej världen!, model: code-davinci-edit-001, token: 8 text: Hej världen!, model: code-davinci-edit-001, token: 8
text: Hej världen!, model: text-embedding-ada-002, token: 7 text: Hej världen!, model: text-embedding-ada-002, token: 7
text: Hej världen!, model: text-similarity-davinci-001, token: 8 text: Hej världen!, model: text-similarity-davinci-001, token: 8
text: Hallo wereld!, model: gpt-4, token: 3 text: Hallo wereld!, model: gpt-4, token: 3
text: Hallo wereld!, model: gpt-3.5-turbo, token: 3 text: Hallo wereld!, model: gpt-3.5-turbo, token: 3
text: Hallo wereld!, model: text-davinci-003, token: 5 text: Hallo wereld!, model: text-davinci-003, token: 5
text: Hallo wereld!, model: text-davinci-002, token: 5 text: Hallo wereld!, model: text-davinci-002, token: 5
text: Hallo wereld!, model: text-davinci-001, token: 5 text: Hallo wereld!, model: text-davinci-001, token: 5
text: Hallo wereld!, model: text-curie-001, token: 5 text: Hallo wereld!, model: text-curie-001, token: 5
text: Hallo wereld!, model: text-babbage-001, token: 5 text: Hallo wereld!, model: text-babbage-001, token: 5
text: Hallo wereld!, model: text-ada-001, token: 5 text: Hallo wereld!, model: text-ada-001, token: 5
text: Hallo wereld!, model: davinci, token: 5 text: Hallo wereld!, model: davinci, token: 5
text: Hallo wereld!, model: curie, token: 5 text: Hallo wereld!, model: curie, token: 5
text: Hallo wereld!, model: babbage, token: 5 text: Hallo wereld!, model: babbage, token: 5
text: Hallo wereld!, model: ada, token: 5 text: Hallo wereld!, model: ada, token: 5
text: Hallo wereld!, model: code-davinci-002, token: 5 text: Hallo wereld!, model: code-davinci-002, token: 5
text: Hallo wereld!, model: code-davinci-001, token: 5 text: Hallo wereld!, model: code-davinci-001, token: 5
text: Hallo wereld!, model: code-cushman-002, token: 5 text: Hallo wereld!, model: code-cushman-002, token: 5
text: Hallo wereld!, model: code-cushman-001, token: 5 text: Hallo wereld!, model: code-cushman-001, token: 5
text: Hallo wereld!, model: davinci-codex, token: 5 text: Hallo wereld!, model: davinci-codex, token: 5
text: Hallo wereld!, model: cushman-codex, token: 5 text: Hallo wereld!, model: cushman-codex, token: 5
text: Hallo wereld!, model: text-davinci-edit-001, token: 5 text: Hallo wereld!, model: text-davinci-edit-001, token: 5
text: Hallo wereld!, model: code-davinci-edit-001, token: 5 text: Hallo wereld!, model: code-davinci-edit-001, token: 5
text: Hallo wereld!, model: text-embedding-ada-002, token: 3 text: Hallo wereld!, model: text-embedding-ada-002, token: 3
text: Hallo wereld!, model: text-similarity-davinci-001, token: 5 text: Hallo wereld!, model: text-similarity-davinci-001, token: 5
text: Hallo verden!, model: gpt-4, token: 4 text: Hallo verden!, model: gpt-4, token: 4
text: Hallo verden!, model: gpt-3.5-turbo, token: 4 text: Hallo verden!, model: gpt-3.5-turbo, token: 4
text: Hallo verden!, model: text-davinci-003, token: 5 text: Hallo verden!, model: text-davinci-003, token: 5
text: Hallo verden!, model: text-davinci-002, token: 5 text: Hallo verden!, model: text-davinci-002, token: 5
text: Hallo verden!, model: text-davinci-001, token: 5 text: Hallo verden!, model: text-davinci-001, token: 5
text: Hallo verden!, model: text-curie-001, token: 5 text: Hallo verden!, model: text-curie-001, token: 5
text: Hallo verden!, model: text-babbage-001, token: 5 text: Hallo verden!, model: text-babbage-001, token: 5
text: Hallo verden!, model: text-ada-001, token: 5 text: Hallo verden!, model: text-ada-001, token: 5
text: Hallo verden!, model: davinci, token: 5 text: Hallo verden!, model: davinci, token: 5
text: Hallo verden!, model: curie, token: 5 text: Hallo verden!, model: curie, token: 5
text: Hallo verden!, model: babbage, token: 5 text: Hallo verden!, model: babbage, token: 5
text: Hallo verden!, model: ada, token: 5 text: Hallo verden!, model: ada, token: 5
text: Hallo verden!, model: code-davinci-002, token: 5 text: Hallo verden!, model: code-davinci-002, token: 5
text: Hallo verden!, model: code-davinci-001, token: 5 text: Hallo verden!, model: code-davinci-001, token: 5
text: Hallo verden!, model: code-cushman-002, token: 5 text: Hallo verden!, model: code-cushman-002, token: 5
text: Hallo verden!, model: code-cushman-001, token: 5 text: Hallo verden!, model: code-cushman-001, token: 5
text: Hallo verden!, model: davinci-codex, token: 5 text: Hallo verden!, model: davinci-codex, token: 5
text: Hallo verden!, model: cushman-codex, token: 5 text: Hallo verden!, model: cushman-codex, token: 5
text: Hallo verden!, model: text-davinci-edit-001, token: 5 text: Hallo verden!, model: text-davinci-edit-001, token: 5
text: Hallo verden!, model: code-davinci-edit-001, token: 5 text: Hallo verden!, model: code-davinci-edit-001, token: 5
text: Hallo verden!, model: text-embedding-ada-002, token: 4 text: Hallo verden!, model: text-embedding-ada-002, token: 4
text: Hallo verden!, model: text-similarity-davinci-001, token: 5 text: Hallo verden!, model: text-similarity-davinci-001, token: 5
text: Hallo wereld!, model: gpt-4, token: 3 text: Hallo wereld!, model: gpt-4, token: 3
text: Hallo wereld!, model: gpt-3.5-turbo, token: 3 text: Hallo wereld!, model: gpt-3.5-turbo, token: 3
text: Hallo wereld!, model: text-davinci-003, token: 5 text: Hallo wereld!, model: text-davinci-003, token: 5
text: Hallo wereld!, model: text-davinci-002, token: 5 text: Hallo wereld!, model: text-davinci-002, token: 5
text: Hallo wereld!, model: text-davinci-001, token: 5 text: Hallo wereld!, model: text-davinci-001, token: 5
text: Hallo wereld!, model: text-curie-001, token: 5 text: Hallo wereld!, model: text-curie-001, token: 5
text: Hallo wereld!, model: text-babbage-001, token: 5 text: Hallo wereld!, model: text-babbage-001, token: 5
text: Hallo wereld!, model: text-ada-001, token: 5 text: Hallo wereld!, model: text-ada-001, token: 5
text: Hallo wereld!, model: davinci, token: 5 text: Hallo wereld!, model: davinci, token: 5
text: Hallo wereld!, model: curie, token: 5 text: Hallo wereld!, model: curie, token: 5
text: Hallo wereld!, model: babbage, token: 5 text: Hallo wereld!, model: babbage, token: 5
text: Hallo wereld!, model: ada, token: 5 text: Hallo wereld!, model: ada, token: 5
text: Hallo wereld!, model: code-davinci-002, token: 5 text: Hallo wereld!, model: code-davinci-002, token: 5
text: Hallo wereld!, model: code-davinci-001, token: 5 text: Hallo wereld!, model: code-davinci-001, token: 5
text: Hallo wereld!, model: code-cushman-002, token: 5 text: Hallo wereld!, model: code-cushman-002, token: 5
text: Hallo wereld!, model: code-cushman-001, token: 5 text: Hallo wereld!, model: code-cushman-001, token: 5
text: Hallo wereld!, model: davinci-codex, token: 5 text: Hallo wereld!, model: davinci-codex, token: 5
text: Hallo wereld!, model: cushman-codex, token: 5 text: Hallo wereld!, model: cushman-codex, token: 5
text: Hallo wereld!, model: text-davinci-edit-001, token: 5 text: Hallo wereld!, model: text-davinci-edit-001, token: 5
text: Hallo wereld!, model: code-davinci-edit-001, token: 5 text: Hallo wereld!, model: code-davinci-edit-001, token: 5
text: Hallo wereld!, model: text-embedding-ada-002, token: 3 text: Hallo wereld!, model: text-embedding-ada-002, token: 3
text: Hallo wereld!, model: text-similarity-davinci-001, token: 5 text: Hallo wereld!, model: text-similarity-davinci-001, token: 5
text: Hallo verden!, model: gpt-4, token: 4 text: Hallo verden!, model: gpt-4, token: 4
text: Hallo verden!, model: gpt-3.5-turbo, token: 4 text: Hallo verden!, model: gpt-3.5-turbo, token: 4
text: Hallo verden!, model: text-davinci-003, token: 5 text: Hallo verden!, model: text-davinci-003, token: 5
text: Hallo verden!, model: text-davinci-002, token: 5 text: Hallo verden!, model: text-davinci-002, token: 5
text: Hallo verden!, model: text-davinci-001, token: 5 text: Hallo verden!, model: text-davinci-001, token: 5
text: Hallo verden!, model: text-curie-001, token: 5 text: Hallo verden!, model: text-curie-001, token: 5
text: Hallo verden!, model: text-babbage-001, token: 5 text: Hallo verden!, model: text-babbage-001, token: 5
text: Hallo verden!, model: text-ada-001, token: 5 text: Hallo verden!, model: text-ada-001, token: 5
text: Hallo verden!, model: davinci, token: 5 text: Hallo verden!, model: davinci, token: 5
text: Hallo verden!, model: curie, token: 5 text: Hallo verden!, model: curie, token: 5
text: Hallo verden!, model: babbage, token: 5 text: Hallo verden!, model: babbage, token: 5
text: Hallo verden!, model: ada, token: 5 text: Hallo verden!, model: ada, token: 5
text: Hallo verden!, model: code-davinci-002, token: 5 text: Hallo verden!, model: code-davinci-002, token: 5
text: Hallo verden!, model: code-davinci-001, token: 5 text: Hallo verden!, model: code-davinci-001, token: 5
text: Hallo verden!, model: code-cushman-002, token: 5 text: Hallo verden!, model: code-cushman-002, token: 5
text: Hallo verden!, model: code-cushman-001, token: 5 text: Hallo verden!, model: code-cushman-001, token: 5
text: Hallo verden!, model: davinci-codex, token: 5 text: Hallo verden!, model: davinci-codex, token: 5
text: Hallo verden!, model: cushman-codex, token: 5 text: Hallo verden!, model: cushman-codex, token: 5
text: Hallo verden!, model: text-davinci-edit-001, token: 5 text: Hallo verden!, model: text-davinci-edit-001, token: 5
text: Hallo verden!, model: code-davinci-edit-001, token: 5 text: Hallo verden!, model: code-davinci-edit-001, token: 5
text: Hallo verden!, model: text-embedding-ada-002, token: 4 text: Hallo verden!, model: text-embedding-ada-002, token: 4
text: Hallo verden!, model: text-similarity-davinci-001, token: 5 text: Hallo verden!, model: text-similarity-davinci-001, token: 5

License

MIT

Documentation

Constants

const ENDOFPROMPT string = "<|endofprompt|>"
const ENDOFTEXT string = "<|endoftext|>"
const FIM_MIDDLE string = "<|fim_middle|>"
const FIM_PREFIX string = "<|fim_prefix|>"
const FIM_SUFFIX string = "<|fim_suffix|>"
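
These constants are the literal spellings of the special tokens. As a minimal sketch of how calling code might check input for them before encoding, the helper below scans a string for each literal. `findSpecialTokens` is a hypothetical name and is not part of this package; the constants are redeclared locally only so that the example compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Special-token literals, copied from the constants documented above.
const (
	ENDOFTEXT   = "<|endoftext|>"
	FIM_PREFIX  = "<|fim_prefix|>"
	FIM_MIDDLE  = "<|fim_middle|>"
	FIM_SUFFIX  = "<|fim_suffix|>"
	ENDOFPROMPT = "<|endofprompt|>"
)

// findSpecialTokens is a hypothetical helper (not part of the package).
// It reports which special-token literals occur in text, always in the
// fixed order of the slice below, regardless of their position in text.
func findSpecialTokens(text string) []string {
	var found []string
	for _, tok := range []string{ENDOFTEXT, FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX, ENDOFPROMPT} {
		if strings.Contains(text, tok) {
			found = append(found, tok)
		}
	}
	return found
}

func main() {
	fmt.Println(findSpecialTokens("Hello, world!<|endoftext|>")) // prints [<|endoftext|>]
}
```

A check like this can be useful when the text comes from users and a special token in it should be rejected or escaped rather than silently encoded.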

Variables

var ENCODING_MAP = map[string]*Encoding{}
var MODEL_TO_ENCODING = map[string]string{

	"gpt-4":         "cl100k_base",
	"gpt-3.5-turbo": "cl100k_base",

	"text-davinci-003": "p50k_base",
	"text-davinci-002": "p50k_base",
	"text-davinci-001": "r50k_base",
	"text-curie-001":   "r50k_base",
	"text-babbage-001": "r50k_base",
	"text-ada-001":     "r50k_base",
	"davinci":          "r50k_base",
	"curie":            "r50k_base",
	"babbage":          "r50k_base",
	"ada":              "r50k_base",

	"code-davinci-002": "p50k_base",
	"code-davinci-001": "p50k_base",
	"code-cushman-002": "p50k_base",
	"code-cushman-001": "p50k_base",
	"davinci-codex":    "p50k_base",
	"cushman-codex":    "p50k_base",

	"text-davinci-edit-001": "p50k_edit",
	"code-davinci-edit-001": "p50k_edit",

	"text-embedding-ada-002": "cl100k_base",

	"text-similarity-davinci-001":  "r50k_base",
	"text-similarity-curie-001":    "r50k_base",
	"text-similarity-babbage-001":  "r50k_base",
	"text-similarity-ada-001":      "r50k_base",
	"text-search-davinci-doc-001":  "r50k_base",
	"text-search-curie-doc-001":    "r50k_base",
	"text-search-babbage-doc-001":  "r50k_base",
	"text-search-ada-doc-001":      "r50k_base",
	"code-search-babbage-code-001": "r50k_base",
	"code-search-ada-code-001":     "r50k_base",

	"gpt2": "gpt2",
}
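
The table above maps a model name to the name of its encoding. The sketch below shows one way such a lookup could work, using a small excerpt of the table. `modelToEncoding` and `encodingNameFor` are hypothetical names, and returning an error for an unknown model is an assumption; this page does not document how `EncodingForModel` treats names that are missing from the map.

```go
package main

import "fmt"

// modelToEncoding is a small excerpt of MODEL_TO_ENCODING, copied here
// so that the example is self-contained.
var modelToEncoding = map[string]string{
	"gpt-4":                  "cl100k_base",
	"gpt-3.5-turbo":          "cl100k_base",
	"text-davinci-003":       "p50k_base",
	"davinci":                "r50k_base",
	"text-davinci-edit-001":  "p50k_edit",
	"text-embedding-ada-002": "cl100k_base",
}

// encodingNameFor is a hypothetical helper (not part of the package).
// It resolves a model name to an encoding name and reports unknown
// models as an error; that error behaviour is an assumption.
func encodingNameFor(model string) (string, error) {
	name, ok := modelToEncoding[model]
	if !ok {
		return "", fmt.Errorf("no encoding registered for model %q", model)
	}
	return name, nil
}

func main() {
	name, err := encodingNameFor("gpt-4")
	fmt.Println(name, err) // prints cl100k_base <nil>
}
```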

Types

type CoreBPE

type CoreBPE struct {
	// contains filtered or unexported fields
}

func NewCoreBPE

func NewCoreBPE(encoder map[string]int, specialTokensEncoder map[string]int, pattern string) (*CoreBPE, error)
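
`NewCoreBPE` takes an `encoder` map from token string to rank. As a rough illustration of what byte-pair encoding does with such a table, the sketch below repeatedly merges the adjacent pair with the lowest rank until no pair is left in the table. This is the general BPE merge procedure, not this package's implementation: `bpeEncode` and the toy vocabulary are hypothetical, and the sketch assumes every single byte of the input has its own entry in the table.

```go
package main

import "fmt"

// bpeEncode is a hypothetical sketch of byte-pair merging. It splits piece
// into single bytes, then repeatedly merges the adjacent pair whose
// concatenation has the lowest rank, stopping when no adjacent pair is in
// ranks. It assumes every single byte of piece is present in ranks.
func bpeEncode(piece string, ranks map[string]int) []int {
	parts := make([]string, 0, len(piece))
	for i := 0; i < len(piece); i++ {
		parts = append(parts, piece[i:i+1])
	}
	for len(parts) > 1 {
		best, bestRank := -1, 0
		for i := 0; i+1 < len(parts); i++ {
			r, ok := ranks[parts[i]+parts[i+1]]
			if ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // no mergeable pair remains
		}
		// Merge the pair at best and drop the right-hand element.
		parts[best] = parts[best] + parts[best+1]
		parts = append(parts[:best+1], parts[best+2:]...)
	}
	ids := make([]int, 0, len(parts))
	for _, p := range parts {
		ids = append(ids, ranks[p])
	}
	return ids
}

func main() {
	// Toy vocabulary for illustration only.
	ranks := map[string]int{"a": 0, "b": 1, "c": 2, "ab": 3, "abc": 4}
	fmt.Println(bpeEncode("abcab", ranks)) // prints [4 3]
}
```

In the example, `ab` (rank 3) is merged before `abc` (rank 4), so the input ends up as the two tokens `abc` and `ab`.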

type Encoding

type Encoding struct {
	Name           string
	PatStr         string
	MergeableRanks map[string]int
	SpecialTokens  map[string]int
	ExplicitNVocab int
}
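
The fields of `Encoding` relate to each other through the vocabulary size. In the original Python tiktoken, an explicit vocabulary size, when given, is expected to equal the number of mergeable tokens plus the number of special tokens. Whether this port enforces the same rule is not stated on this page, so the check below is only a sketch under that assumption. The local `encoding` type mirrors the exported fields above, and `checkVocab` is a hypothetical helper.

```go
package main

import "fmt"

// encoding mirrors the exported fields of the Encoding struct above so
// that the example compiles without the package.
type encoding struct {
	Name           string
	PatStr         string
	MergeableRanks map[string]int
	SpecialTokens  map[string]int
	ExplicitNVocab int
}

// checkVocab is a hypothetical helper (not part of the package). When
// ExplicitNVocab is non-zero, it checks that it equals the number of
// mergeable tokens plus special tokens. The rule is borrowed from the
// Python tiktoken and is an assumption for this port.
func checkVocab(e encoding) error {
	if e.ExplicitNVocab == 0 {
		return nil // nothing to check
	}
	n := len(e.MergeableRanks) + len(e.SpecialTokens)
	if n != e.ExplicitNVocab {
		return fmt.Errorf("%s: vocabulary has %d tokens, expected %d", e.Name, n, e.ExplicitNVocab)
	}
	return nil
}

func main() {
	e := encoding{
		Name:           "toy",
		MergeableRanks: map[string]int{"a": 0, "b": 1},
		SpecialTokens:  map[string]int{"<|endoftext|>": 2},
		ExplicitNVocab: 3,
	}
	fmt.Println(checkVocab(e)) // prints <nil>
}
```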

type Tiktoken

type Tiktoken struct {
	// contains filtered or unexported fields
}

func EncodingForModel

func EncodingForModel(modelName string) (*Tiktoken, error)

func GetEncoding

func GetEncoding(encodingName string) (*Tiktoken, error)

func (*Tiktoken) Decode

func (t *Tiktoken) Decode(tokens []int) string

func (*Tiktoken) Encode

func (t *Tiktoken) Encode(text string, allowedSpecial []string, disallowedSpecial []string) []int
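
Because `*Tiktoken` exposes `Encode` and `Decode` with the signatures shown above, calling code can depend on a small interface instead of the concrete type, which makes token counting easy to test without loading a real vocabulary. The sketch below is one way to do that. `tokenizer`, `wordTokenizer`, and `countTokens` are hypothetical names, and `wordTokenizer` is only a stand-in that splits on whitespace; it does not perform BPE.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenizer lists the two methods documented above. A *Tiktoken value has
// methods with these signatures, so it should satisfy this interface.
type tokenizer interface {
	Encode(text string, allowedSpecial []string, disallowedSpecial []string) []int
	Decode(tokens []int) string
}

// wordTokenizer is a stand-in for demonstration only: it assigns one id
// per whitespace-separated word. It is NOT a BPE tokeniser.
type wordTokenizer struct {
	ids   map[string]int
	words []string
}

func (w *wordTokenizer) Encode(text string, _ []string, _ []string) []int {
	var out []int
	for _, word := range strings.Fields(text) {
		id, ok := w.ids[word]
		if !ok {
			id = len(w.words)
			w.ids[word] = id
			w.words = append(w.words, word)
		}
		out = append(out, id)
	}
	return out
}

// Decode joins the words for the given ids with single spaces. It assumes
// every id was produced by an earlier Encode call on the same value.
func (w *wordTokenizer) Decode(tokens []int) string {
	parts := make([]string, 0, len(tokens))
	for _, id := range tokens {
		parts = append(parts, w.words[id])
	}
	return strings.Join(parts, " ")
}

// countTokens works with any tokenizer, including a real *Tiktoken.
func countTokens(tk tokenizer, text string) int {
	return len(tk.Encode(text, nil, nil))
}

func main() {
	tk := &wordTokenizer{ids: map[string]int{}}
	fmt.Println(countTokens(tk, "Hello, world!"))               // prints 2
	fmt.Println(tk.Decode(tk.Encode("Hello, world!", nil, nil))) // prints Hello, world!
}
```

In real use, the value returned by `GetEncoding` or `EncodingForModel` would be passed to `countTokens` in place of the stand-in.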

func (*Tiktoken) SpecialTokenRegex

func (t *Tiktoken) SpecialTokenRegex(disallowedSpecialSet map[string]any) *regexp2.Regexp
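
`SpecialTokenRegex` takes a set of disallowed special tokens and returns a compiled pattern. The exact pattern it builds is not documented here. A plausible construction, sketched below, is an alternation of the escaped token literals. The sketch uses the standard library `regexp` package rather than `regexp2`, and `specialTokenPattern` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// specialTokenPattern is a hypothetical sketch (not the package's method).
// It escapes each token in the set and joins the results with "|" so the
// pattern matches any one of them literally. Tokens are sorted to make the
// pattern deterministic. An empty set yields the empty pattern, which
// matches at every position, so callers should treat that case separately.
func specialTokenPattern(set map[string]any) *regexp.Regexp {
	toks := make([]string, 0, len(set))
	for tok := range set {
		toks = append(toks, regexp.QuoteMeta(tok))
	}
	sort.Strings(toks)
	return regexp.MustCompile(strings.Join(toks, "|"))
}

func main() {
	re := specialTokenPattern(map[string]any{
		"<|endoftext|>":  nil,
		"<|fim_prefix|>": nil,
	})
	fmt.Println(re.FindString("Hello<|endoftext|>")) // prints <|endoftext|>
	fmt.Println(re.MatchString("plain text"))        // prints false
}
```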
