gollum

package module
v0.0.0-...-07a9aa3 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 16, 2023 License: MIT Imports: 12 Imported by: 0

README

GOLLuM

Production-grade LLM tooling. At least, in theory -- stuff changes fast so don't expect stability from this library so much as ideas for your own apps.

Features

  • Automated function dispatch
    • Parses arbitrary Go structs into JSONSchema for OpenAI - and validates when unmarshaling back to your structs
    • Simplified API to generate results from a single prompt or template
  • Highly performant vector store solution with exact search
    • SIMD acceleration for 10x better perf than naive approach, constant memory usage
    • Drop-in integration with OpenAI and other embedding providers
    • Carefully mocked, tested, and benchmarked.
  • Implementation of HyDE (hypothetical documents embeddings) for enhanced retrieval
  • MIT License
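For readers unfamiliar with the term, "exact search" means scoring the query against every stored vector rather than using an approximate index. The sketch below shows only the naive baseline that the feature list compares against; it is not this library's implementation (no SIMD, and it allocates a result slice). The names `scored`, `cosine`, and `topK` are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a vector's index with its similarity to the query.
type scored struct {
	Index int
	Score float32
}

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if either vector has zero norm.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// topK scans every vector (exact search) and returns the k best matches,
// highest score first.
func topK(query []float32, vectors [][]float32, k int) []scored {
	results := make([]scored, 0, len(vectors))
	for i, v := range vectors {
		results = append(results, scored{Index: i, Score: cosine(query, v)})
	}
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Score > results[j].Score
	})
	if k < 0 {
		k = 0
	}
	if k > len(results) {
		k = len(results)
	}
	return results[:k]
}

func main() {
	vectors := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, r := range topK([]float32{1, 0.1}, vectors, 2) {
		fmt.Printf("index=%d score=%.3f\n", r.Index, r.Score)
	}
}
```

Because every vector is visited, the result is exact by construction; the cost is linear in the number of stored vectors, which is where an optimized inner loop would pay off.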

Examples

Dispatch

Function dispatch is a highly simplified and easy way to generate filled structs via an LLM.

type dinnerParty struct {
	Topic       string   `json:"topic" jsonschema:"required" jsonschema_description:"The topic of the conversation"`
	RandomWords []string `json:"random_words" jsonschema:"required" jsonschema_description:"Random words to prime the conversation"`
}
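completer := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
// The third string argument is the system prompt (left empty here); the trailing nil leaves the config unset.
d := gollum.NewOpenAIDispatcher[dinnerParty]("dinner_party", "Given a topic, return random words", "", completer, nil)
output, err := d.Prompt(context.Background(), "Talk to me about dinosaurs")
if err != nil {
	log.Fatal(err)
}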

The result should be a filled `dinnerParty` struct.

expected := dinnerParty{
	Topic:       "dinosaurs",
	RandomWords: []string{"dinosaur", "fossil", "extinct"},
}

Some similar libraries / ideas:

Parsing

Simplest

Imagine you have a function GetWeather --

type getWeatherInput struct {
	Location string `json:"location" jsonschema_description:"The city and state, e.g. San Francisco, CA" jsonschema:"required"`
	Unit     string `json:"unit,omitempty" jsonschema:"enum=celsius,enum=fahrenheit" jsonschema_description:"The unit of temperature"`
}

type getWeatherOutput struct {
    // ...
}

// GetWeather does something, this docstring is annoying but theoretically possible to get
func GetWeather(ctx context.Context, inp getWeatherInput) (out getWeatherOutput, err error) {
    return out, err
}

This is a common pattern for API design, as it is easy to share the getWeatherInput struct (well, imagine if it were public). See, for example, the gRPC service definitions, or the Connect RPC implementation. This means we can simplify the logic greatly by assuming a single input struct.

Now, we can construct the responses:

type getWeatherInput struct {
	Location string `json:"location" jsonschema_description:"The city and state, e.g. San Francisco, CA" jsonschema:"required"`
	Unit     string `json:"unit,omitempty" jsonschema:"enum=celsius,enum=fahrenheit" jsonschema_description:"The unit of temperature"`
}

fi := gollum.StructToJsonSchema("weather", "Get the current weather in a given location", getWeatherInput{})

chatRequest := openai.ChatCompletionRequest{
    Model: "gpt-3.5-turbo-0613",
    Messages: []openai.ChatCompletionMessage{
        {
            Role:    "user",
            Content: "Whats the temperature in Boston?",
        },
    },
    MaxTokens:   256,
    Temperature: 0.0,
    Tools:       []openai.Tool{{Type: "function", Function: openai.FunctionDefinition(fi)}},
    ToolChoice:  "weather",
}

ctx := context.Background()
resp, err := api.SendRequest(ctx, chatRequest)
parser := gollum.NewJSONParser[getWeatherInput](false)
input, err := parser.Parse(ctx, resp.Choices[0].Message.ToolCalls[0].Function.Arguments)

This example steps through all that, end to end. Some of this is 'sort of' pseudo-code, as the OpenAI clients I use haven't implemented support yet for functions, but it should also hopefully show that minimal modifications are necessary to upstream libraries.

It is also possible to go from just the function definition to a fully formed OpenAI FunctionCall. Reflection gives the name of the function for free, and godoc parsing can get the function description too. In practice, though, I think the name and description of a function rarely change, while the inputs change more often. Using this pattern and compiling once makes the most sense to me.
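As a rough illustration of "reflection gives the name of the function for free", the standard library can recover a function's name from its value. This is a sketch, not code from this package; `funcName` is a hypothetical helper, and the `GetWeather` stub only mirrors the example above.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
	"runtime"
	"strings"
)

type getWeatherInput struct{ Location string }
type getWeatherOutput struct{ Temperature float64 }

// GetWeather is a stub standing in for a real implementation.
func GetWeather(ctx context.Context, inp getWeatherInput) (getWeatherOutput, error) {
	return getWeatherOutput{}, nil
}

// funcName returns the unqualified name of fn, or "" if fn is not a function.
func funcName(fn any) string {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		return ""
	}
	f := runtime.FuncForPC(v.Pointer())
	if f == nil {
		return ""
	}
	full := f.Name() // package-qualified, e.g. "main.GetWeather"
	if i := strings.LastIndex(full, "."); i >= 0 {
		return full[i+1:]
	}
	return full
}

func main() {
	fmt.Println(funcName(GetWeather))
}
```

This works cleanly for top-level functions; closures and method values get compiler-generated names, so a real implementation would probably want to reject them. The description still has to come from somewhere else, such as parsed doc comments.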

We should be able to chain the call for the single input and for the ctx + single input case and return it easily.

Recursion on arbitrary structs without explicit definitions

Say you have a struct that has JSON tags defined.

fi := gollum.StructToJsonSchema("ChatCompletion", "Call the OpenAI chat completion API", chatCompletionRequest{})

chatRequest := chatCompletionRequest{
    ChatCompletionRequest: openai.ChatCompletionRequest{
        Model: "gpt-3.5-turbo-0613",
        Messages: []openai.ChatCompletionMessage{
            {
                Role:    openai.ChatMessageRoleSystem,
                Content: "Construct a ChatCompletionRequest to answer the user's question, but using Kirby references. Do not answer the question directly using prior knowledge, you must generate a ChatCompletionRequest that will answer the question.",
            },
            {
                Role:    openai.ChatMessageRoleUser,
                Content: "What is the definition of recursion?",
            },
        },
        MaxTokens:   256,
        Temperature: 0.0,
    },
    Tools: []openai.Tool{
        {
            Type:     "function",
            Function: openai.FunctionDefinition(fi),
        },
    },
}
// Send chatRequest as in the previous example to obtain resp.
parser := gollum.NewJSONParserGeneric[openai.ChatCompletionRequest](false)
input, err := parser.Parse(ctx, []byte(resp.Choices[0].Message.ToolCalls[0].Function.Arguments))

On the first try, this yielded the following result:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are Kirby, a friendly virtual assistant."},
    {"role": "user", "content": "What is the definition of recursion?"}
  ]
}

That's really sick considering that no effort was put into manually creating a new JSON struct, and the original struct didn't have any JSONSchema tags, just JSON serialization tags.
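To make the idea concrete, here is a minimal sketch of how a schema can be derived recursively from nothing but `json` struct tags. It is not how StructToJsonSchema is implemented; `schemaOf`, `schemaFor`, and the `message`/`request` types are hypothetical, and the output is only a small subset of JSON Schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// message and request are hypothetical types that carry only json tags.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type request struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	MaxTokens   int       `json:"max_tokens,omitempty"`
	Temperature float32   `json:"temperature,omitempty"`
}

// schemaOf returns a minimal JSON-Schema-like description of v's type.
func schemaOf(v any) map[string]any {
	t := reflect.TypeOf(v)
	if t == nil {
		return map[string]any{}
	}
	return schemaFor(t)
}

// schemaFor walks t recursively. Self-referential types are not handled
// and would recurse forever; embedded structs are not flattened.
func schemaFor(t reflect.Type) map[string]any {
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	switch t.Kind() {
	case reflect.Struct:
		props := map[string]any{}
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if !f.IsExported() {
				continue
			}
			name := f.Name
			if tag, ok := f.Tag.Lookup("json"); ok {
				tagName, _, _ := strings.Cut(tag, ",")
				if tagName == "-" {
					continue
				}
				if tagName != "" {
					name = tagName
				}
			}
			props[name] = schemaFor(f.Type)
		}
		return map[string]any{"type": "object", "properties": props}
	case reflect.Slice, reflect.Array:
		return map[string]any{"type": "array", "items": schemaFor(t.Elem())}
	case reflect.String:
		return map[string]any{"type": "string"}
	case reflect.Bool:
		return map[string]any{"type": "boolean"}
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return map[string]any{"type": "integer"}
	case reflect.Float32, reflect.Float64:
		return map[string]any{"type": "number"}
	default:
		return map[string]any{} // unconstrained
	}
}

func main() {
	out, err := json.MarshalIndent(schemaOf(request{}), "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
}
```

The nested `messages` field comes out as an array of objects without any extra annotation, which is the property the example above relies on.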

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FunctionInputToTool

func FunctionInputToTool(fi FunctionInput) openai.Tool

Types

type ChatCompleter

type ChatCompleter interface {
	CreateChatCompletion(context.Context, openai.ChatCompletionRequest) (openai.ChatCompletionResponse, error)
}

type Completer

type Completer interface {
	CreateCompletion(context.Context, openai.CompletionRequest) (openai.CompletionResponse, error)
}

type Dispatcher

type Dispatcher[T any] interface {
	// Prompt generates an object of type T from the given prompt.
	Prompt(ctx context.Context, prompt string) (T, error)
	// PromptTemplate generates an object of type T from a given template.
	// The prompt is then a template string that is rendered with the given values.
	PromptTemplate(ctx context.Context, template *template.Template, values interface{}) (T, error)
}

type DocStore

type DocStore interface {
	Insert(context.Context, Document) error
	Retrieve(ctx context.Context, id string) (Document, error)
}

type Document

type Document struct {
	ID        string                 `json:"id"`
	Content   string                 `json:"content,omitempty"`
	Embedding []float32              `json:"embedding,omitempty"`
	Metadata  map[string]interface{} `json:"metadata,omitempty"`
}

func NewDocumentFromString

func NewDocumentFromString(content string) Document

type DummyDispatcher

type DummyDispatcher[T any] struct{}

func NewDummyDispatcher

func NewDummyDispatcher[T any]() *DummyDispatcher[T]

func (*DummyDispatcher[T]) Prompt

func (d *DummyDispatcher[T]) Prompt(ctx context.Context, prompt string) (T, error)

func (*DummyDispatcher[T]) PromptTemplate

func (d *DummyDispatcher[T]) PromptTemplate(ctx context.Context, template *template.Template, values interface{}) (T, error)

type Embedder

type Embedder interface {
	CreateEmbeddings(context.Context, openai.EmbeddingRequest) (openai.EmbeddingResponse, error)
}

type FunctionInput

type FunctionInput struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Parameters  any    `json:"parameters"`
}

func StructToJsonSchema

func StructToJsonSchema(functionName string, functionDescription string, inputStruct interface{}) FunctionInput

func StructToJsonSchemaGeneric

func StructToJsonSchemaGeneric[T any](functionName string, functionDescription string) FunctionInput

type JSONParserGeneric

type JSONParserGeneric[T any] struct {
	// contains filtered or unexported fields
}

JSONParser is a parser that parses arbitrary JSON structs. It is threadsafe and can be used concurrently. The underlying validator is threadsafe as well.

func NewJSONParserGeneric

func NewJSONParserGeneric[T any](validate bool) *JSONParserGeneric[T]

NewJSONParser returns a new JSONParser. Validation is done via jsonschema.

func (*JSONParserGeneric[T]) Parse

func (p *JSONParserGeneric[T]) Parse(ctx context.Context, input []byte) (T, error)
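The shape of Parse (bytes in, typed value out, optional validation) can be sketched with the standard library alone. The sketch below is an assumption-laden stand-in, not this package's code: `strictParser`, `validateWeather`, and `parseWeather` are hypothetical names, and a hand-written check replaces the jsonschema validation the package uses.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

type getWeatherInput struct {
	Location string `json:"location"`
	Unit     string `json:"unit,omitempty"`
}

// strictParser decodes JSON into T and then runs an optional check.
type strictParser[T any] struct {
	validate func(T) error // hypothetical hook; nil skips validation
}

// Parse decodes input into a T, rejecting unknown fields.
func (p strictParser[T]) Parse(ctx context.Context, input []byte) (T, error) {
	var zero T
	if err := ctx.Err(); err != nil {
		return zero, err
	}
	var out T
	dec := json.NewDecoder(bytes.NewReader(input))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		return zero, fmt.Errorf("decode: %w", err)
	}
	if p.validate != nil {
		if err := p.validate(out); err != nil {
			return zero, fmt.Errorf("validate: %w", err)
		}
	}
	return out, nil
}

// validateWeather mirrors the "required" and "enum" tags from the README example.
func validateWeather(in getWeatherInput) error {
	if in.Location == "" {
		return errors.New("location is required")
	}
	switch in.Unit {
	case "", "celsius", "fahrenheit":
		return nil
	}
	return fmt.Errorf("unit %q is not one of celsius, fahrenheit", in.Unit)
}

// parseWeather is a convenience wrapper that parses a raw arguments string.
func parseWeather(raw string) (getWeatherInput, error) {
	p := strictParser[getWeatherInput]{validate: validateWeather}
	return p.Parse(context.Background(), []byte(raw))
}

func main() {
	in, err := parseWeather(`{"location":"Boston, MA","unit":"celsius"}`)
	fmt.Println(in, err)
	_, err = parseWeather(`{"location":"Boston, MA","unit":"kelvin"}`)
	fmt.Println(err)
}
```

Validating after decoding matters because model output is not guaranteed to respect the schema it was given; a typed zero value is returned on any failure so callers cannot accidentally use a half-filled struct.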

type MemoryDocStore

type MemoryDocStore struct {
	Documents map[string]Document
}

MemoryDocStore is a simple in-memory document store. It's functionally a hashmap / inverted-index.

func NewMemoryDocStore

func NewMemoryDocStore() *MemoryDocStore

func NewMemoryDocStoreFromDisk

func NewMemoryDocStoreFromDisk(ctx context.Context, bucket *blob.Bucket, path string) (*MemoryDocStore, error)

func (*MemoryDocStore) Insert

func (m *MemoryDocStore) Insert(ctx context.Context, d Document) error

Insert adds a node to the document store. It overwrites duplicates.

func (*MemoryDocStore) Load

func (m *MemoryDocStore) Load(ctx context.Context, bucket *blob.Bucket, path string) error

Load loads the document store from disk.

func (*MemoryDocStore) Persist

func (m *MemoryDocStore) Persist(ctx context.Context, bucket *blob.Bucket, path string) error

Persist saves the document store to disk.

func (*MemoryDocStore) Retrieve

func (m *MemoryDocStore) Retrieve(ctx context.Context, id string) (Document, error)

Retrieve returns a node from the document store matching an ID.
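The insert-overwrites and retrieve-by-ID behavior described above amounts to a map keyed by document ID. The following is a minimal sketch of that idea, not the package's implementation: `memoryStore`, `document`, and `errNotFound` are hypothetical, the context parameters are omitted for brevity, and how the real store reports a missing ID is not stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// document is a trimmed-down stand-in for a stored document.
type document struct {
	ID      string
	Content string
}

// errNotFound is an assumed error for missing IDs.
var errNotFound = errors.New("document not found")

// memoryStore keeps documents in a map keyed by ID.
type memoryStore struct {
	docs map[string]document
}

func newMemoryStore() *memoryStore {
	return &memoryStore{docs: make(map[string]document)}
}

// Insert stores d, replacing any existing document with the same ID.
func (m *memoryStore) Insert(d document) error {
	m.docs[d.ID] = d
	return nil
}

// Retrieve returns the document stored under id.
func (m *memoryStore) Retrieve(id string) (document, error) {
	d, ok := m.docs[id]
	if !ok {
		return document{}, fmt.Errorf("retrieve %q: %w", id, errNotFound)
	}
	return d, nil
}

func main() {
	s := newMemoryStore()
	_ = s.Insert(document{ID: "a", Content: "first"})
	_ = s.Insert(document{ID: "a", Content: "second"}) // overwrites "first"
	d, err := s.Retrieve("a")
	fmt.Println(d.Content, err)
}
```

This sketch is not safe for concurrent writers; a shared store would need a mutex around the map.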

type Moderator

type Moderator interface {
	Moderations(context.Context, openai.ModerationRequest) (openai.ModerationResponse, error)
}

type OAITool

type OAITool struct {
	// Type is always "function" for now.
	Type     string        `json:"type"`
	Function FunctionInput `json:"function"`
}

type OpenAIDispatcher

type OpenAIDispatcher[T any] struct {
	*OpenAIDispatcherConfig
	// contains filtered or unexported fields
}

OpenAIDispatcher dispatches to any OpenAI compatible model. For any type T and prompt, it will generate and parse the response into T.

func NewOpenAIDispatcher

func NewOpenAIDispatcher[T any](name, description, systemPrompt string, completer ChatCompleter, cfg *OpenAIDispatcherConfig) *OpenAIDispatcher[T]

func (*OpenAIDispatcher[T]) Prompt

func (d *OpenAIDispatcher[T]) Prompt(ctx context.Context, prompt string) (T, error)

func (*OpenAIDispatcher[T]) PromptTemplate

func (d *OpenAIDispatcher[T]) PromptTemplate(ctx context.Context, template *template.Template, values interface{}) (T, error)

PromptTemplate generates an object of type T from a given template. This is mostly a convenience wrapper around Prompt.
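A "convenience wrapper around Prompt" plausibly means: render the template to a string, then hand that string to Prompt. The sketch below shows that flow under stated assumptions: it uses text/template (the signature alone does not say which template package is meant), and `promptTemplate`, `echoPrompt`, and `render` are hypothetical helpers, with an echo function standing in for a model call.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"text/template"
)

// promptTemplate renders tmpl with values and forwards the text to prompt.
func promptTemplate[T any](
	ctx context.Context,
	prompt func(context.Context, string) (T, error),
	tmpl *template.Template,
	values any,
) (T, error) {
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, values); err != nil {
		var zero T
		return zero, fmt.Errorf("render template: %w", err)
	}
	return prompt(ctx, buf.String())
}

// echoPrompt stands in for a model call and returns the prompt unchanged.
func echoPrompt(ctx context.Context, prompt string) (string, error) {
	return prompt, nil
}

// render parses text as a template and runs it through promptTemplate.
func render(text string, values any) (string, error) {
	tmpl, err := template.New("prompt").Parse(text)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	return promptTemplate(context.Background(), echoPrompt, tmpl, values)
}

func main() {
	out, err := render("Talk to me about {{.Topic}}", map[string]string{"Topic": "dinosaurs"})
	fmt.Println(out, err)
}
```

Parsing the template once and reusing it across calls keeps the per-request work down to rendering.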

type OpenAIDispatcherConfig

type OpenAIDispatcherConfig struct {
	Model       *string
	Temperature *float32
	MaxTokens   *int
}
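Pointer-typed fields let a config distinguish "not set" from an explicit zero, which matters for values like a temperature of 0. A small sketch of working with such a struct follows; `dispatcherConfig` mirrors the fields above, while `ptr`, `valueOr`, and the fallback values are hypothetical and say nothing about the defaults this package actually applies.

```go
package main

import "fmt"

// dispatcherConfig mirrors the pointer-typed fields shown above.
type dispatcherConfig struct {
	Model       *string
	Temperature *float32
	MaxTokens   *int
}

// ptr returns a pointer to a copy of v, handy for filling optional fields.
func ptr[T any](v T) *T { return &v }

// valueOr dereferences p, or returns fallback when p is nil.
func valueOr[T any](p *T, fallback T) T {
	if p == nil {
		return fallback
	}
	return *p
}

func main() {
	// Temperature is explicitly 0; Model and MaxTokens are left unset.
	cfg := dispatcherConfig{Temperature: ptr(float32(0))}

	// The fallback values here are placeholders, not the library's defaults.
	fmt.Println(valueOr(cfg.Model, "fallback-model"))
	fmt.Println(valueOr(cfg.Temperature, 0.7))
	fmt.Println(valueOr(cfg.MaxTokens, 256))
}
```

With plain value fields, an explicit temperature of 0 would be indistinguishable from an omitted one and could be silently replaced by a default.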

type Parser

type Parser[T any] interface {
	Parse(ctx context.Context, input []byte) (T, error)
}

Parser is an interface for parsing strings into structs. It is threadsafe and can be used concurrently. The underlying validator is threadsafe as well.

Directories

Path Synopsis
internal
mocks
Package mocks is a generated GoMock package.
Package syncpool provides a generic wrapper around sync.Pool Copied from https://github.com/mkmik/syncpool
