audio

package
v0.3.0-beta
Warning

This package is not in the latest version of its module.

Published: Jan 14, 2024 | License: MIT | Imports: 6 | Imported by: 0

README

Audio

Bindings for the audio endpoint.

Example

See audio-example.go.

Documentation

Overview

Package audio provides bindings for the audio endpoint. It converts audio into text (transcription and translation) and text into audio (speech).

Index

Constants

const (
	BaseEndpoint         = common.BaseURL + "audio/"
	TransciptionEndpoint = BaseEndpoint + "transcriptions"
	TranslationEndpoint  = BaseEndpoint + "translations"
	SpeechEndpoint       = BaseEndpoint + "speech"
)
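The endpoint constants are built by plain string concatenation, so the resulting URLs are only well formed if common.BaseURL ends with a trailing slash. The sketch below mirrors the constants locally to show the composed values; the baseURL value is an assumption and is not taken from this package.

```go
package main

import "fmt"

// baseURL stands in for common.BaseURL. The value is an assumption;
// the real constant lives in the module's common package.
const baseURL = "https://api.openai.com/v1/"

// These mirror the package constants, including the exported spelling
// TransciptionEndpoint (without the second "r").
const (
	BaseEndpoint         = baseURL + "audio/"
	TransciptionEndpoint = BaseEndpoint + "transcriptions"
	TranslationEndpoint  = BaseEndpoint + "translations"
	SpeechEndpoint       = BaseEndpoint + "speech"
)

func main() {
	// Print each composed endpoint URL.
	for _, url := range []string{TransciptionEndpoint, TranslationEndpoint, SpeechEndpoint} {
		fmt.Println(url)
	}
}
```

Calling code must use the identifier exactly as exported, TransciptionEndpoint, since that is the name the package declares.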
const (
	// TODO: Support non-json return formats.
	ResponseFormatJSON = "json"
	// [deprecated]: Use ResponseFormatJSON instead
	JSONResponseFormat = ResponseFormatJSON
)
const (
	VoiceAlloy   = "alloy"
	VoiceEcho    = "echo"
	VoiceFable   = "fable"
	VoiceOnyx    = "onyx"
	VoiceNova    = "nova"
	VoiceShimmer = "shimmer"

	SpeechFormatMp3  = "mp3"
	SpeechFormatOpus = "opus"
	SpeechFormatAac  = "aac"
	SpeechFormatFlac = "flac"
)

Variables

This section is empty.

Functions

func MakeSpeechRequest

func MakeSpeechRequest(request *SpeechRequest, organizationID *string) ([]byte, error)
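MakeSpeechRequest returns raw bytes rather than a Response. A minimal sketch of one way to consume them follows, assuming the bytes are the encoded audio in the requested format and that a nil organizationID is accepted. To keep the example self-contained, SpeechRequest is mirrored locally and fakeSpeech is a hypothetical stand-in with the same signature as MakeSpeechRequest; the model name "tts-1" is likewise an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// SpeechRequest mirrors the package's request type so the sketch compiles alone.
type SpeechRequest struct {
	Model          string  `json:"model"`
	Input          string  `json:"input"`
	Voice          string  `json:"voice"`
	ResponseFormat string  `json:"response_format,omitempty"`
	Speed          float64 `json:"speed,omitempty"`
}

// speechFunc has the same signature as MakeSpeechRequest.
type speechFunc func(request *SpeechRequest, organizationID *string) ([]byte, error)

// saveSpeech calls makeSpeech, writes the returned bytes to path,
// and reports how many bytes were written.
func saveSpeech(makeSpeech speechFunc, req *SpeechRequest, path string) (int, error) {
	// Assumption: nil means "no organization ID".
	audio, err := makeSpeech(req, nil)
	if err != nil {
		return 0, fmt.Errorf("speech request: %w", err)
	}
	if err := os.WriteFile(path, audio, 0o644); err != nil {
		return 0, fmt.Errorf("write %s: %w", path, err)
	}
	return len(audio), nil
}

// fakeSpeech is a hypothetical stand-in that returns fixed bytes
// instead of calling the API.
func fakeSpeech(request *SpeechRequest, organizationID *string) ([]byte, error) {
	return []byte("not real audio: " + request.Input), nil
}

func main() {
	req := &SpeechRequest{Model: "tts-1", Input: "Hello", Voice: "alloy", ResponseFormat: "mp3"}
	out := filepath.Join(os.TempDir(), "speech-example.mp3")
	n, err := saveSpeech(fakeSpeech, req, out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("wrote %d bytes to %s\n", n, out)
}
```

With the real package imported, the local SpeechRequest mirror and fakeSpeech would be dropped and MakeSpeechRequest passed in their place.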

Types

type Response

type Response struct {
	Text  string                `json:"text"`
	Usage common.ResponseUsage  `json:"usage"`
	Error *common.ResponseError `json:"error,omitempty"`
}

Response structure for both Transcription and Translation requests.
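Because Response carries an optional Error pointer next to Text, a caller may want to check both the returned Go error and the Error field. The sketch below decodes a JSON body into a local mirror of Response. The shape of common.ResponseError is not shown in this documentation, so responseError and its fields are hypothetical, and Usage is kept as raw JSON to avoid guessing at common.ResponseUsage. Whether the package reports API failures through the Error field, the error return, or both is also an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// responseError is a hypothetical stand-in for common.ResponseError.
type responseError struct {
	Message string `json:"message"`
	Type    string `json:"type"`
}

// response mirrors the documented Response struct.
type response struct {
	Text  string          `json:"text"`
	Usage json.RawMessage `json:"usage"`
	Error *responseError  `json:"error,omitempty"`
}

// textOrError decodes a response body and returns the text,
// or an error if the body carries an "error" object.
func textOrError(body []byte) (string, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	if r.Error != nil {
		return "", errors.New(r.Error.Message)
	}
	return r.Text, nil
}

func main() {
	ok := []byte(`{"text":"hello world"}`)
	bad := []byte(`{"error":{"message":"invalid file format","type":"invalid_request_error"}}`)
	for _, body := range [][]byte{ok, bad} {
		text, err := textOrError(body)
		fmt.Printf("text=%q err=%v\n", text, err)
	}
}
```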

func MakeTranscriptionRequest

func MakeTranscriptionRequest(request *TranscriptionRequest, organizationID *string) (*Response, error)

func MakeTranslationRequest

func MakeTranslationRequest(request *TranslationRequest, organizationID *string) (*Response, error)

type ResponseFormat

type ResponseFormat = string

type SpeechRequest

type SpeechRequest struct {
	// One of the available TTS models.
	Model string `json:"model"`

	// The text to generate audio for. The maximum length is 4096 characters.
	Input string `json:"input"`

	// The voice to use when generating the audio.
	Voice string `json:"voice"`

	// The format to return the audio in.
	ResponseFormat ResponseFormat `json:"response_format,omitempty"`

	// The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
	Speed float64 `json:"speed,omitempty"`
}

Request structure for the create speech endpoint.
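The field comments state two limits (input of at most 4096 characters, speed between 0.25 and 4.0), and the omitempty tags mean zero-valued ResponseFormat and Speed are left out of the JSON. A minimal sketch of a client-side check plus encoding follows, using a local mirror of SpeechRequest. The validate and encode helpers are hypothetical, the package itself may not validate anything, and counting "characters" as runes is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf8"
)

// SpeechRequest mirrors the documented struct (ResponseFormat is an alias of string).
type SpeechRequest struct {
	Model          string  `json:"model"`
	Input          string  `json:"input"`
	Voice          string  `json:"voice"`
	ResponseFormat string  `json:"response_format,omitempty"`
	Speed          float64 `json:"speed,omitempty"`
}

// validate checks the limits stated in the field comments.
func validate(r SpeechRequest) error {
	// Assumption: "characters" is counted in runes, not bytes.
	if utf8.RuneCountInString(r.Input) > 4096 {
		return errors.New("input exceeds 4096 characters")
	}
	// Speed == 0 is the zero value; omitempty drops it, so the default applies.
	if r.Speed != 0 && (r.Speed < 0.25 || r.Speed > 4.0) {
		return errors.New("speed must be between 0.25 and 4.0")
	}
	return nil
}

// encode validates the request and returns its JSON encoding.
func encode(r SpeechRequest) (string, error) {
	if err := validate(r); err != nil {
		return "", err
	}
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encode(SpeechRequest{Model: "tts-1", Input: "Hello", Voice: "alloy"})
	fmt.Println(body, err)

	_, err = encode(SpeechRequest{Model: "tts-1", Input: "Hello", Voice: "alloy", Speed: 5})
	fmt.Println(err)
}
```

One consequence of the float64 field with omitempty is that an explicit speed of 0 cannot be distinguished from "unset"; since 0 is outside the documented range, that is harmless here.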

type TranscriptionRequest

type TranscriptionRequest struct {
	// The audio file to transcribe, in one of these formats:
	// mp3, mp4, mpeg, mpga, m4a, wav, or webm.
	// This can be a file path or a URL.
	File string `json:"file"`

	// ID of the model to use. You can use the List models API
	// to see all of your available models, or see our Model
	// overview for descriptions of them.
	Model string `json:"model"`

	// An optional text to guide the model's style or continue a
	// previous audio segment. The prompt should match the audio language.
	Prompt string `json:"prompt,omitempty"`

	// The format of the transcript output, in one of these options:
	// json, text, srt, verbose_json, or vtt.
	ResponseFormat ResponseFormat `json:"response_format,omitempty"`

	// The sampling temperature, between 0 and 1. Higher values like 0.8 will
	// make the output more random, while lower values like 0.2 will make it
	// more focused and deterministic. If set to 0, the model will use log
	// probability to automatically increase the temperature until certain
	// thresholds are hit.
	Temperature *float64 `json:"temperature,omitempty"`

	// The language of the input audio. Supplying the input language in
	// ISO-639-1 format will improve accuracy and latency.
	Language string `json:"language,omitempty"`
}

Request structure for the transcription endpoint.
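Temperature is a *float64 rather than a float64, which lets a caller distinguish "not set" (nil, omitted) from an explicit 0, a value the field comment gives special meaning. The sketch below shows the difference using a local mirror of TranscriptionRequest and encoding/json. The float64Ptr helper is hypothetical, the file and model names are placeholders, and the sketch only illustrates the struct tags: how the package actually encodes the request on the wire is not shown in this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TranscriptionRequest mirrors the documented struct.
type TranscriptionRequest struct {
	File           string   `json:"file"`
	Model          string   `json:"model"`
	Prompt         string   `json:"prompt,omitempty"`
	ResponseFormat string   `json:"response_format,omitempty"`
	Temperature    *float64 `json:"temperature,omitempty"`
	Language       string   `json:"language,omitempty"`
}

// float64Ptr is a hypothetical helper for taking the address of a literal.
func float64Ptr(v float64) *float64 { return &v }

// encode returns the JSON encoding of the request.
func encode(r TranscriptionRequest) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Temperature left nil: the key is omitted entirely.
	unset, _ := encode(TranscriptionRequest{File: "speech.mp3", Model: "whisper-1"})
	fmt.Println(unset)

	// Temperature explicitly 0: the key is present with value 0.
	zero, _ := encode(TranscriptionRequest{
		File:        "speech.mp3",
		Model:       "whisper-1",
		Temperature: float64Ptr(0),
		Language:    "en",
	})
	fmt.Println(zero)
}
```

The same pointer pattern applies to the Temperature field of TranslationRequest.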

type TranslationRequest

type TranslationRequest struct {
	// The audio file to translate, in one of these formats:
	// mp3, mp4, mpeg, mpga, m4a, wav, or webm.
	// This can be a file path or a URL.
	File string `json:"file"`

	// ID of the model to use. You can use the List models API
	// to see all of your available models, or see our Model
	// overview for descriptions of them.
	Model string `json:"model"`

	// An optional text to guide the model's style or continue a
	// previous audio segment. The prompt should be in English.
	Prompt string `json:"prompt,omitempty"`

	// The format of the transcript output, in one of these options:
	// json, text, srt, verbose_json, or vtt.
	ResponseFormat ResponseFormat `json:"response_format,omitempty"`

	// The sampling temperature, between 0 and 1. Higher values like 0.8 will
	// make the output more random, while lower values like 0.2 will make it
	// more focused and deterministic. If set to 0, the model will use log
	// probability to automatically increase the temperature until certain
	// thresholds are hit.
	Temperature *float64 `json:"temperature,omitempty"`
}

Request structure for the Translations endpoint.
