llama

Published: May 8, 2024 License: MIT


LLM Platform

Open Source LLM Platform to build and deploy applications at scale


Integrations & Configuration

LLM Providers
OpenAI Platform

https://platform.openai.com/docs/api-reference

providers:
  - type: openai
    token: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    models:
      gpt-3.5-turbo:
        id: gpt-3.5-turbo-1106

      gpt-4:
        id: gpt-4-1106-preview
        
      text-embedding-ada-002:
        id: text-embedding-ada-002
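With a provider configured, the model aliases above (e.g. `gpt-3.5-turbo`) become addressable through the platform's HTTP API. As a rough sketch, assuming the platform exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the same `http://localhost:8080` base URL used by the index endpoints later in this README (both the path and port are assumptions here):

```python
import json
import urllib.request

# Hypothetical chat request against the platform's OpenAI-compatible API.
# "gpt-3.5-turbo" is the alias from the provider config above, which the
# platform would resolve to the configured id (gpt-3.5-turbo-1106).
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Once the platform is running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the aliases are provider-independent, the same request shape works unchanged when the alias is backed by Azure OpenAI, Ollama, or llama.cpp instead.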
Azure OpenAI Service

https://azure.microsoft.com/en-us/products/ai-services/openai-service

providers:
  - type: openai
    url: https://xxxxxxxx.openai.azure.com
    token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    models:
      gpt-3.5-turbo:
        id: gpt-35-turbo-16k

      gpt-4:
        id: gpt-4-32k
        
      text-embedding-ada-002:
        id: text-embedding-ada-002
Anthropic

https://www.anthropic.com/api

providers:
  - type: anthropic
    token: sk-ant-apixx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    models:
      claude-3-opus:
        id: claude-3-opus-20240229
Ollama

https://ollama.ai

$ ollama start
$ ollama run mistral
providers:
  - type: ollama
    url: http://localhost:11434

    models:
      mistral-7b-instruct:
        id: mistral
LLAMA.CPP

https://github.com/ggerganov/llama.cpp/tree/master/examples/server

# using taskfile.dev
$ task llama:server

# LLAMA.CPP Server
$ llama-server --port 9081 --log-disable --model ./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf

# LLAMA.CPP Server (Multimodal Model)
$ llama-server --port 9081 --log-disable --model ./models/llava-v1.5-7b-Q4_K.gguf --mmproj ./models/llava-v1.5-7b-mmproj-Q4_0.gguf

# using Docker (might be slow)
$ docker run -it --rm -p 9081:9081 -v ./models/:/models/ ghcr.io/ggerganov/llama.cpp:server --host 0.0.0.0 --port 9081 --model /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
providers:
  - type: llama
    url: http://localhost:9081

    models:
      mistral-7b-instruct:
        id: /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
WHISPER.CPP

https://github.com/ggerganov/whisper.cpp/tree/master/examples/server

# using taskfile.dev
$ task whisper:server

# WHISPER.CPP Server
$ whisper-server --port 9083 --convert --model ./models/whisper-ggml-medium.bin
providers:
  - type: whisper
    url: http://localhost:9083

    models:
      whisper:
        id: whisper
Hugging Face

https://huggingface.co/

providers:
  - type: huggingface
    url: https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1

    models:
      mistral-7B-instruct:
        id: tgi
Mimic 3
mkdir -p models/mimic3
chmod 777 models/mimic3
docker run -it -p 59125:59125 -v $(pwd)/models/mimic3:/home/mimic3/.local/share/mycroft/mimic3 mycroftai/mimic3
providers:
  - type: mimic
    url: http://localhost:59125

    models:
      tts-1:
        id: mimic-3
LangChain / LangServe

https://python.langchain.com/docs/langserve

providers:
  - type: langchain
    url: http://your-langchain-server:8000

    models:
      langchain:
        id: default
Vector Databases / Indexes
Chroma

https://www.trychroma.com

# using Docker
$ docker run -it --rm -p 9083:8000 -v chroma-data:/chroma/chroma ghcr.io/chroma-core/chroma
indexes:
  docs:
    type: chroma
    url: http://localhost:9083
    namespace: docs
    embedding: text-embedding-ada-002
Weaviate

https://weaviate.io

# using Docker
$ docker run -it --rm -p 9084:8080 -v weaviate-data:/var/lib/weaviate -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true -e PERSISTENCE_DATA_PATH=/var/lib/weaviate semitechnologies/weaviate
indexes:
  docs:
    type: weaviate
    url: http://localhost:9084
    namespace: Document
    embedding: text-embedding-ada-002
In-Memory
indexes:
  docs:
    type: memory   
    embedding: text-embedding-ada-002
OpenSearch / Elasticsearch
# using Docker
docker run -it --rm -p 9200:9200 -v opensearch-data:/usr/share/opensearch/data -e "discovery.type=single-node" -e DISABLE_SECURITY_PLUGIN=true opensearchproject/opensearch:latest
indexes:
  docs:
    type: elasticsearch
    url: http://localhost:9200
    namespace: docs
Extractors
Text
extractors:
  text:
    type: text
Code

Supported Languages:

  • C#
  • C++
  • Go
  • Java
  • Kotlin
  • JavaScript
  • TypeScript
  • Python
  • Ruby
  • Rust
  • Scala
  • Swift
extractors:
  code:
    type: code
Tesseract

https://tesseract-ocr.github.io

# using Docker
docker run -it --rm -p 9086:8884 hertzg/tesseract-server:latest
extractors:
  tesseract:
    type: tesseract
    url: http://localhost:9086
Unstructured

https://unstructured.io

# using Docker
docker run -it --rm -p 9085:8000 quay.io/unstructured-io/unstructured-api:0.0.64 --port 8000 --host 0.0.0.0
extractors:
  unstructured:
    type: unstructured
    url: http://localhost:9085
Classifications
LLM Classifier
classifiers:
  {classifier-id}:
    type: llm
    model: mistral-7b-instruct
    classes:
      class-1: "...Description when to use Class 1..."
      class-2: "...Description when to use Class 2..."

Use Cases

Retrieval Augmented Generation (RAG)
Configuration
chains:
  qa:
    type: rag
    index: docs
    model: mistral-7b-instruct

    # limit: 10
    # distance: 1

    # filters:
    #  {metadata-key}:
    #    classifier: {classifier-id}
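A configured chain is presumably addressable like any other model, using its id (`qa` above) as the model name. This is a sketch under that assumption, reusing the same hypothetical OpenAI-compatible chat endpoint:

```python
import json
import urllib.request

# Query the RAG chain by its id. The platform would embed the question,
# retrieve matching documents from the "docs" index, and pass them as
# context to mistral-7b-instruct (assumed behavior of the rag chain type).
question = {
    "model": "qa",  # the chain id from the configuration above
    "messages": [
        {"role": "user", "content": "What do the docs say about configuration?"}
    ],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(question).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# with urllib.request.urlopen(req) as resp:
#     answer = json.load(resp)["choices"][0]["message"]["content"]
```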
Index Documents

Using Extractor

POST http://localhost:8080/v1/index/{index-name}/{extractor}
Content-Type: application/pdf
Content-Disposition: attachment; filename="filename.pdf"
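The same extractor upload from Python might look as follows; the index name `docs` and extractor name `unstructured` match the configuration examples in this README, while the file contents are a placeholder standing in for a real PDF read from disk:

```python
import urllib.request

# Placeholder body; in practice: data = open("manual.pdf", "rb").read()
data = b"%PDF-1.7 placeholder"

req = urllib.request.Request(
    "http://localhost:8080/v1/index/docs/unstructured",
    data=data,
    headers={
        "Content-Type": "application/pdf",
        "Content-Disposition": 'attachment; filename="manual.pdf"',
    },
)

# urllib.request.urlopen(req)  # run once the server is up
```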

Using Documents

POST http://localhost:8080/v1/index/{index-name}
[
    {
        "id": "id1",
        "content": "content of document...",
        "metadata": {
          "key1": "value1",
          "key2": "value2"
        }
    },
    {
        "id": "id2",
        "content": "content of document...",
        "metadata": {
          "key1": "value1",
          "key2": "value2"
        }
    }
]
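The same documents payload can be posted from Python; this mirrors the JSON body above against the documented endpoint (the `docs` index name is from the earlier index configuration examples):

```python
import json
import urllib.request

# Documents with ids, content, and free-form metadata, as in the body above.
documents = [
    {"id": "id1", "content": "content of document...",
     "metadata": {"key1": "value1", "key2": "value2"}},
    {"id": "id2", "content": "content of document...",
     "metadata": {"key1": "value1", "key2": "value2"}},
]

req = urllib.request.Request(
    "http://localhost:8080/v1/index/docs",
    data=json.dumps(documents).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# urllib.request.urlopen(req)  # run once the server is up
```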
Function Calling
Hermes Function Calling
providers:
  - type: llama
    url: http://localhost:9081

    models:
      hermes-2-pro:
        id: /models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf
        adapter: hermesfn
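A tool-calling request against this model could then use the OpenAI-style `tools` wire format, on the assumption that the `hermesfn` adapter translates it into Hermes 2 Pro's function-calling prompt. The `get_weather` tool below is a made-up example, as are the endpoint URL and port:

```python
import json
import urllib.request

# Hypothetical OpenAI-style tool definition; the adapter is assumed to
# surface the model's structured function calls as standard tool_calls.
payload = {
    "model": "hermes-2-pro",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
```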
