rhino

v2.2.1

This package is not in the latest version of its module.
Published: Jun 7, 2023 License: Apache-2.0 Imports: 14 Imported by: 4

README

Rhino Speech-to-Intent Engine

Made in Vancouver, Canada by Picovoice

Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of interest, in real-time. For example, given a spoken command:

Can I have a small double-shot espresso?

Rhino infers that the user would like to order a drink and emits the following inference result:

{
  "isUnderstood": "true",
  "intent": "orderBeverage",
  "slots": {
    "beverage": "espresso",
    "size": "small",
    "numberOfShots": "2"
  }
}

Rhino is:

  • using deep neural networks trained in real-world environments.
  • compact and computationally-efficient, making it perfect for IoT.
  • self-service. Developers and designers can train custom models using Picovoice Console.

Compatibility

  • Go 1.16+
  • Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), Raspberry Pi, NVIDIA Jetson (Nano) and BeagleBone

Installation

go get github.com/Picovoice/rhino/binding/go/v2

AccessKey

Rhino requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Rhino SDKs. You can get your AccessKey for free: sign up or log in to Picovoice Console to obtain it. Make sure to keep your AccessKey secret.

Usage

To create an instance of the engine with default parameters, pass an AccessKey and a path to a Rhino context file (.rhn) to the NewRhino function, then call Init().

import . "github.com/Picovoice/rhino/binding/go/v2"

const accessKey string = "${ACCESS_KEY}" // obtained from Picovoice Console (https://console.picovoice.ai/)

rhino := NewRhino(accessKey, "/path/to/context/file.rhn")
err := rhino.Init()
if err != nil {
    // handle error
}

The context file is a Speech-to-Intent context created either using Picovoice Console or one of the default contexts available on Rhino's GitHub repository.

The sensitivity of the engine can be tuned using the sensitivity parameter. It is a floating-point number within [0, 1]. A higher sensitivity value results in fewer misses at the cost of (potentially) increasing the erroneous inference rate. You can also override the default Rhino model (.pv), which is required when using a non-English context.

To override these parameters, you can create a Rhino struct directly and then call Init():

import . "github.com/Picovoice/rhino/binding/go/v2"

const accessKey string = "${ACCESS_KEY}" // obtained from Picovoice Console (https://console.picovoice.ai/)

rhino := Rhino{
    AccessKey: accessKey,
    ContextPath: "/path/to/context/file.rhn",
    Sensitivity: 0.7,
    ModelPath: "/path/to/rhino/params.pv"}
err := rhino.Init()
if err != nil {
    // handle error
}

Once initialized, you can start passing in frames of audio for processing. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio. The required sample rate is given by SampleRate, and the number of samples per frame is given by FrameLength.
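The framing requirement above can be sketched as a small helper that splits raw PCM bytes into fixed-size frames. This is a minimal sketch, not part of the Rhino API: the helper name bytesToFrames is hypothetical, little-endian byte order is an assumption (the text only specifies 16-bit linear PCM), and the frame length is passed in as a parameter so the example runs without the library. In real use you would pass FrameLength.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToFrames converts little-endian 16-bit mono PCM bytes into frames of
// exactly frameLength samples. Trailing samples that do not fill a whole
// frame are dropped. (Hypothetical helper, not part of the Rhino package.)
func bytesToFrames(raw []byte, frameLength int) [][]int16 {
	if frameLength <= 0 {
		return nil
	}
	numSamples := len(raw) / 2
	numFrames := numSamples / frameLength
	frames := make([][]int16, 0, numFrames)
	for f := 0; f < numFrames; f++ {
		frame := make([]int16, frameLength)
		for i := range frame {
			offset := 2 * (f*frameLength + i)
			frame[i] = int16(binary.LittleEndian.Uint16(raw[offset : offset+2]))
		}
		frames = append(frames, frame)
	}
	return frames
}

func main() {
	// Five samples (1, -1, 2, -2, 3) with an illustrative frame length of 2;
	// the fifth sample does not fill a frame and is dropped.
	raw := []byte{0x01, 0x00, 0xFF, 0xFF, 0x02, 0x00, 0xFE, 0xFF, 0x03, 0x00}
	for _, frame := range bytesToFrames(raw, 2) {
		fmt.Println(frame)
	}
}
```

Each returned frame has the length that Process expects when frameLength equals FrameLength. Resampling to SampleRate and downmixing to mono are outside the scope of this sketch.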

To feed audio into Rhino, use the Process function in your capture loop. You must have called Init() before calling Process.

// getNextFrameAudio returns the next frame of FrameLength 16-bit mono
// samples recorded at SampleRate.
func getNextFrameAudio() []int16 {
    // get audio frame
}

for {
    isFinalized, err := rhino.Process(getNextFrameAudio())
    if err != nil {
        // handle error
    }
    if isFinalized {
        inference, err := rhino.GetInference()
        if err != nil {
            // handle error
        }
        if inference.IsUnderstood {
            intent := inference.Intent
            slots := inference.Slots
            // add code to take action based on inferred intent and slot values
        } else {
            // add code to handle unsupported commands
        }
    }
}
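As an illustration of the "take action" step, the sketch below dispatches on the intent and slot values from the espresso example at the top of this README. It assumes a local inference type that mirrors the exported fields of RhinoInference so that it runs without the library. The describe function and the default of one shot are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// inference mirrors the exported fields of RhinoInference so this sketch
// runs without the Rhino library.
type inference struct {
	IsUnderstood bool
	Intent       string
	Slots        map[string]string
}

// describe turns an inference into a human-readable action.
// (Hypothetical helper, not part of the Rhino package.)
func describe(inf inference) string {
	if !inf.IsUnderstood {
		return "command not understood"
	}
	switch inf.Intent {
	case "orderBeverage":
		shots := 1 // hypothetical default when the slot is absent or not numeric
		if s, ok := inf.Slots["numberOfShots"]; ok {
			if n, err := strconv.Atoi(s); err == nil {
				shots = n
			}
		}
		return fmt.Sprintf("order: %s %s, %d shot(s)",
			inf.Slots["size"], inf.Slots["beverage"], shots)
	default:
		return "unhandled intent: " + inf.Intent
	}
}

func main() {
	fmt.Println(describe(inference{
		IsUnderstood: true,
		Intent:       "orderBeverage",
		Slots: map[string]string{
			"beverage":      "espresso",
			"size":          "small",
			"numberOfShots": "2",
		},
	}))
}
```

Slot values arrive as strings, so numeric slots such as numberOfShots need explicit parsing before use.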

When done with the engine, resources have to be released explicitly.

rhino.Delete()

Using a defer call to Delete() after Init() is also a good way to ensure cleanup.

Non-English Contexts

To use a non-English context, you need the model file for the corresponding language. The model files for all supported languages are available here.

Demos

Check out the Rhino Go demos here

Documentation

Constants

This section is empty.

Variables

var (
	// Number of audio samples per frame.
	FrameLength int

	// Audio sample rate accepted by Picovoice.
	SampleRate int

	// Rhino version
	Version string
)

Functions

This section is empty.

Types

type PvStatus

type PvStatus int

PvStatus type

const (
	SUCCESS                  PvStatus = 0
	OUT_OF_MEMORY            PvStatus = 1
	IO_ERROR                 PvStatus = 2
	INVALID_ARGUMENT         PvStatus = 3
	STOP_ITERATION           PvStatus = 4
	KEY_ERROR                PvStatus = 5
	INVALID_STATE            PvStatus = 6
	RUNTIME_ERROR            PvStatus = 7
	ACTIVATION_ERROR         PvStatus = 8
	ACTIVATION_LIMIT_REACHED PvStatus = 9
	ACTIVATION_THROTTLED     PvStatus = 10
	ACTIVATION_REFUSED       PvStatus = 11
)

Possible status return codes from the Rhino library

type Rhino

type Rhino struct {

	// AccessKey obtained from Picovoice Console (https://console.picovoice.ai/).
	AccessKey string

	// Absolute path to Rhino's dynamic library.
	LibraryPath string

	// Absolute path to the file containing model parameters.
	ModelPath string

	// Inference sensitivity. A higher sensitivity value results in
	// fewer misses at the cost of (potentially) increasing the erroneous inference rate.
	// Sensitivity should be a floating-point number within [0, 1].
	Sensitivity float32

	// Endpoint duration in seconds. An endpoint is a chunk of silence at the end of an
	// utterance that marks the end of spoken command. It should be a positive number within [0.5, 5]. A lower endpoint
	// duration reduces delay and improves responsiveness. A higher endpoint duration assures Rhino doesn't return inference
	// preemptively in case the user pauses before finishing the request.
	EndpointDurationSec float32

	// Absolute path to the Rhino context file (.rhn).
	ContextPath string

	// If set to `true`, Rhino requires an endpoint (a chunk of silence) after the spoken command.
	// If set to `false`, Rhino tries to detect silence, but if it cannot, it still will provide inference regardless. Set
	// to `false` only if operating in an environment with overlapping speech (e.g. people talking in the background).
	RequireEndpoint bool

	// Once initialized, stores the source of the Rhino context in YAML format. Shows the list of intents,
	// which expressions map to those intents, as well as slots and their possible values.
	ContextInfo string
	// contains filtered or unexported fields
}

Rhino struct
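The documented ranges for Sensitivity and EndpointDurationSec can be checked before the values are assigned to the struct. The following is a minimal sketch. validateParams is a hypothetical helper, not part of the package, and whether Init performs equivalent checks itself is not stated here.

```go
package main

import "fmt"

// validateParams checks the ranges documented on the Rhino struct fields:
// Sensitivity within [0, 1] and EndpointDurationSec within [0.5, 5].
// The negated comparisons also reject NaN.
// (Hypothetical helper, not part of the Rhino package.)
func validateParams(sensitivity, endpointDurationSec float32) error {
	if !(sensitivity >= 0 && sensitivity <= 1) {
		return fmt.Errorf("sensitivity %v is outside [0, 1]", sensitivity)
	}
	if !(endpointDurationSec >= 0.5 && endpointDurationSec <= 5) {
		return fmt.Errorf("endpoint duration %v is outside [0.5, 5]", endpointDurationSec)
	}
	return nil
}

func main() {
	fmt.Println(validateParams(0.7, 1.0)) // both values in range
	fmt.Println(validateParams(1.2, 1.0)) // sensitivity out of range
	fmt.Println(validateParams(0.5, 0.1)) // endpoint duration out of range
}
```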

func NewRhino

func NewRhino(accessKey string, contextPath string) Rhino

Returns a Rhino struct with the given context file and default parameters

func (*Rhino) Delete

func (rhino *Rhino) Delete() error

Releases resources acquired by Rhino

func (*Rhino) GetInference

func (rhino *Rhino) GetInference() (inference RhinoInference, err error)

Gets inference results from Rhino. If the spoken command was understood, the result includes the specific intent name that was inferred and (if applicable) slot keys and specific slot values. Should only be called after Process returns true for isFinalized; otherwise Rhino has not yet reached an inference conclusion. Returns an inference struct with `.IsUnderstood`, `.Intent`, and `.Slots`.

func (*Rhino) Init

func (rhino *Rhino) Init() error

Init function for Rhino. Must be called before calling Process.

func (*Rhino) Process

func (rhino *Rhino) Process(pcm []int16) (isFinalized bool, err error)

Processes a frame of PCM audio with the speech-to-intent engine. isFinalized is true when Rhino has an inference ready to return.

type RhinoError

type RhinoError struct {
	StatusCode PvStatus
	Message    string
}

func (*RhinoError) Error

func (e *RhinoError) Error() string
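Because RhinoError carries a PvStatus, callers can branch on specific status codes with errors.As. The sketch below redeclares PvStatus and RhinoError locally so that it runs without the library. The message format in Error and the isActivationLimit helper are hypothetical. It also assumes that the errors returned by the engine are of type *RhinoError, which these docs do not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirrors of the package types, declared here only so the sketch
// compiles on its own.
type PvStatus int

const ACTIVATION_LIMIT_REACHED PvStatus = 9

type RhinoError struct {
	StatusCode PvStatus
	Message    string
}

// Error uses a hypothetical format; the real package's format may differ.
func (e *RhinoError) Error() string {
	return fmt.Sprintf("rhino error %d: %s", e.StatusCode, e.Message)
}

// isActivationLimit reports whether err is, or wraps, a *RhinoError whose
// status is ACTIVATION_LIMIT_REACHED. (Hypothetical helper.)
func isActivationLimit(err error) bool {
	var rhinoErr *RhinoError
	return errors.As(err, &rhinoErr) && rhinoErr.StatusCode == ACTIVATION_LIMIT_REACHED
}

func main() {
	wrapped := fmt.Errorf("init failed: %w", &RhinoError{
		StatusCode: ACTIVATION_LIMIT_REACHED,
		Message:    "limit reached",
	})
	fmt.Println(isActivationLimit(wrapped))
	fmt.Println(isActivationLimit(errors.New("unrelated failure")))
}
```

errors.As also matches when the RhinoError has been wrapped with %w, so the check keeps working if intermediate layers add context to the error.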

type RhinoInference

type RhinoInference struct {
	// Indicates whether Rhino understood what it heard based on the context
	IsUnderstood bool

	// If IsUnderstood, name of intent that was inferred
	Intent string

	// If IsUnderstood, map of slot keys and values that were inferred
	Slots map[string]string
}
