rhino

package module
v1.6.3
Published: Aug 12, 2021 License: Apache-2.0 Imports: 11 Imported by: 1

README

Rhino Speech-to-Intent Engine

Made in Vancouver, Canada by Picovoice

Rhino is Picovoice's Speech-to-Intent engine. It infers intent directly from spoken commands within a given context of interest, in real time. For example, given a spoken command:

Can I have a small double-shot espresso?

Rhino infers that the user would like to order a drink and emits the following inference result:

{
  "isUnderstood": "true",
  "intent": "orderBeverage",
  "slots": {
    "beverage": "espresso",
    "size": "small",
    "numberOfShots": "2"
  }
}

Rhino is:

  • built on deep neural networks trained in real-world environments.
  • compact and computationally efficient, making it well suited to IoT.
  • self-service. Developers and designers can train custom models using Picovoice Console.

Compatibility

  • Go 1.16+
  • Runs on Linux (x86_64), macOS (x86_64) and Windows (x86_64)

Installation

go get github.com/Picovoice/rhino/binding/go

Usage

To create an instance of the engine with default parameters, pass a path to a Rhino context file (.rhn) to the NewRhino function and then make a call to .Init().

import . "github.com/Picovoice/rhino/binding/go"

rhino := NewRhino("/path/to/context/file.rhn")
err := rhino.Init()
if err != nil {
    // handle error
}

The context file is a Speech-to-Intent context created either using Picovoice Console or one of the default contexts available on Rhino's GitHub repository.

The sensitivity of the engine can be tuned using the sensitivity parameter. It is a floating-point number within [0, 1]. A higher sensitivity value results in fewer misses at the cost of (potentially) increasing the erroneous inference rate. You can also override the default Rhino model (.pv), which needs to be done when using a non-English context.

To override these parameters, you can create a Rhino struct directly and then call Init():

import . "github.com/Picovoice/rhino/binding/go"

rhino := Rhino{
    ContextPath: "/path/to/context/file.rhn",
    Sensitivity: 0.7,
    ModelPath:   "/path/to/rhino/params.pv",
}
err := rhino.Init()
if err != nil {
    // handle error
}
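
Because Sensitivity must lie within [0, 1], it can be worth checking a user-supplied value before placing it in the struct. The following is a minimal sketch of such a check. checkSensitivity is a hypothetical helper and is not part of the rhino package.

```go
package main

import "fmt"

// checkSensitivity is a hypothetical helper (not part of the rhino package).
// It returns an error when s is outside the documented range [0, 1].
// The negated form also rejects NaN, which fails every comparison.
func checkSensitivity(s float32) error {
	if !(s >= 0 && s <= 1) {
		return fmt.Errorf("sensitivity %v is outside [0, 1]", s)
	}
	return nil
}

func main() {
	for _, s := range []float32{0.7, 1.5} {
		if err := checkSensitivity(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

A value that passes the check could then be assigned to the Sensitivity field before calling Init().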

Once initialized, you can start passing in frames of audio for processing. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio. The required sample rate is given by SampleRate, and the number of samples per frame by FrameLength.
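
Audio often arrives as raw bytes rather than as []int16. The sketch below shows one way to split such a buffer into fixed-size frames. It assumes little-endian sample order (as in typical WAV data), which the text above does not state. bytesToFrames is a hypothetical helper, and the frame length of 2 is only for illustration; real code would pass FrameLength.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToFrames is a hypothetical helper (not part of the rhino package).
// It decodes raw little-endian 16-bit mono PCM into frames of frameLength
// samples each. Trailing samples that do not fill a whole frame are dropped.
func bytesToFrames(raw []byte, frameLength int) [][]int16 {
	if frameLength <= 0 {
		return nil
	}
	numSamples := len(raw) / 2
	numFrames := numSamples / frameLength
	frames := make([][]int16, 0, numFrames)
	for f := 0; f < numFrames; f++ {
		frame := make([]int16, frameLength)
		for i := range frame {
			offset := 2 * (f*frameLength + i)
			frame[i] = int16(binary.LittleEndian.Uint16(raw[offset:]))
		}
		frames = append(frames, frame)
	}
	return frames
}

func main() {
	// Five samples (1, -1, 2, 3, 4); with a frame length of 2 the last one is dropped.
	raw := []byte{0x01, 0x00, 0xFF, 0xFF, 0x02, 0x00, 0x03, 0x00, 0x04, 0x00}
	fmt.Println(bytesToFrames(raw, 2))
}
```

Dropping the incomplete tail keeps every frame at exactly the expected length. A streaming reader would more likely carry the leftover samples over to the next read.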

To feed audio into Rhino, use the Process function in your capture loop. You must have called Init() before calling Process.

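// getNextFrameAudio returns the next frame of audio from the capture source.
func getNextFrameAudio() []int16 {
    var frame []int16
    // get audio frame: fill frame with FrameLength samples
    return frame
}

for {
    isFinalized, err := rhino.Process(getNextFrameAudio())
    if err != nil {
        // handle error
    }
    if isFinalized {
        inference, err := rhino.GetInference()
        if err != nil {
            // handle error
        }
        if inference.IsUnderstood {
            intent := inference.Intent
            slots := inference.Slots
            // add code to take action based on inferred intent and slot values
            _, _ = intent, slots
        } else {
            // add code to handle unsupported commands
        }
    }
}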

When done, resources must be released explicitly.

rhino.Delete()

Using a defer call to Delete() after Init() is also a good way to ensure cleanup.
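
The defer pattern can be sketched as follows. To keep the example runnable without the native library, it uses a small engine interface and a fakeEngine stand-in. Both are assumptions made for this sketch; only the Init() error and Delete() error signatures come from the package documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// engine captures the two lifecycle methods documented for *Rhino.
// The interface itself is an assumption made for this sketch.
type engine interface {
	Init() error
	Delete() error
}

// fakeEngine is a stand-in so the sketch runs without the native library.
type fakeEngine struct {
	failInit bool
	deleted  bool
}

func (f *fakeEngine) Init() error {
	if f.failInit {
		return errors.New("init failed")
	}
	return nil
}

func (f *fakeEngine) Delete() error {
	f.deleted = true
	return nil
}

// withEngine initializes e, runs work, and releases e afterwards.
// Delete is deferred only after Init succeeds, so nothing is released
// that was never acquired. An error from Delete is reported only when
// work itself succeeded.
func withEngine(e engine, work func() error) (err error) {
	if err = e.Init(); err != nil {
		return err
	}
	defer func() {
		if delErr := e.Delete(); err == nil {
			err = delErr
		}
	}()
	return work()
}

func main() {
	e := &fakeEngine{}
	err := withEngine(e, func() error { return nil })
	fmt.Println(err, e.deleted)
}
```

Placing the defer immediately after the Init() error check means cleanup still runs when the processing loop returns early or panics.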

Non-English Contexts

To use a non-English context, you need the corresponding model file. The model files for all supported languages are available here.

Demos

Check out the Rhino Go demos here

Documentation

Constants

This section is empty.

Variables

var (
	// Number of audio samples per frame.
	FrameLength = nativeRhino.nativeFrameLength()

	// Audio sample rate accepted by Picovoice.
	SampleRate = nativeRhino.nativeSampleRate()

	// Rhino version
	Version = nativeRhino.nativeVersion()
)

Functions

This section is empty.

Types

type PvStatus

type PvStatus int

PvStatus type

const (
	SUCCESS          PvStatus = 0
	OUT_OF_MEMORY    PvStatus = 1
	IO_ERROR         PvStatus = 2
	INVALID_ARGUMENT PvStatus = 3
	STOP_ITERATION   PvStatus = 4
	KEY_ERROR        PvStatus = 5
	INVALID_STATE    PvStatus = 6
)

Possible status return codes from the Rhino library
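
When logging failures, a readable name is usually more helpful than a bare integer. The sketch below mirrors the status values listed above in a local status type and gives it a String method. The local type, the name table, and the UNKNOWN_STATUS fallback are assumptions for illustration; the documentation above does not say whether PvStatus already provides something similar.

```go
package main

import "fmt"

// status mirrors the documented PvStatus values so the sketch
// compiles without the native library.
type status int

const (
	success         status = 0
	outOfMemory     status = 1
	ioError         status = 2
	invalidArgument status = 3
	stopIteration   status = 4
	keyError        status = 5
	invalidState    status = 6
)

// statusNames maps each known code to the name used in the documentation.
var statusNames = map[status]string{
	success:         "SUCCESS",
	outOfMemory:     "OUT_OF_MEMORY",
	ioError:         "IO_ERROR",
	invalidArgument: "INVALID_ARGUMENT",
	stopIteration:   "STOP_ITERATION",
	keyError:        "KEY_ERROR",
	invalidState:    "INVALID_STATE",
}

// String implements fmt.Stringer. Unknown codes fall back to a numeric form.
func (s status) String() string {
	if name, ok := statusNames[s]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN_STATUS(%d)", int(s))
}

func main() {
	fmt.Println(invalidArgument, status(42))
}
```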

type Rhino

type Rhino struct {

	// Absolute path to the file containing model parameters.
	ModelPath string

	// Inference sensitivity. A higher sensitivity value results in
	// fewer misses at the cost of (potentially) increasing the erroneous inference rate.
	// Sensitivity should be a floating-point number within 0 and 1.
	Sensitivity float32

	// Absolute path to the Rhino context file (.rhn).
	ContextPath string

	// Once initialized, stores the source of the Rhino context in YAML format. Shows the list of intents,
	// which expressions map to those intents, as well as slots and their possible values.
	ContextInfo string
	// contains filtered or unexported fields
}

Rhino struct

func NewRhino

func NewRhino(contextPath string) Rhino

Returns a Rhino struct with the given context file and default parameters

func (*Rhino) Delete

func (rhino *Rhino) Delete() error

Releases resources acquired by Rhino

func (*Rhino) GetInference

func (rhino *Rhino) GetInference() (inference RhinoInference, err error)

Gets inference results from Rhino. If the spoken command was understood, the result includes the specific intent name that was inferred and (if applicable) slot keys and specific slot values. Should only be called after Process returns isFinalized as true; before that, Rhino has not yet reached an inference conclusion. Returns an inference struct with `.IsUnderstood`, `.Intent`, and `.Slots`.

func (*Rhino) Init

func (rhino *Rhino) Init() error

Init function for Rhino. Must be called before calling Process.

func (*Rhino) Process

func (rhino *Rhino) Process(pcm []int16) (isFinalized bool, err error)

Processes a frame of PCM audio with the speech-to-intent engine. isFinalized is true when Rhino has an inference ready to return.

type RhinoInference

type RhinoInference struct {
	// Indicates whether Rhino understood what it heard based on the context
	IsUnderstood bool

	// If IsUnderstood, name of intent that was inferred
	Intent string

	// If IsUnderstood, dictionary of slot keys and values that were inferred
	Slots map[string]string
}
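
As a rough illustration of consuming these fields, the sketch below renders an inference as a single line. The local inference type mirrors the documented RhinoInference fields so the example runs without the native library, and describe is a hypothetical helper. Slot keys are sorted because Go map iteration order is not fixed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inference mirrors the documented RhinoInference fields so the sketch
// runs without the native library.
type inference struct {
	IsUnderstood bool
	Intent       string
	Slots        map[string]string
}

// describe is a hypothetical helper that renders an inference as one line,
// listing slots in sorted key order so the output is deterministic.
func describe(inf inference) string {
	if !inf.IsUnderstood {
		return "not understood"
	}
	keys := make([]string, 0, len(inf.Slots))
	for k := range inf.Slots {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+inf.Slots[k])
	}
	if len(parts) == 0 {
		return inf.Intent
	}
	return inf.Intent + " (" + strings.Join(parts, ", ") + ")"
}

func main() {
	inf := inference{
		IsUnderstood: true,
		Intent:       "orderBeverage",
		Slots: map[string]string{
			"beverage":      "espresso",
			"size":          "small",
			"numberOfShots": "2",
		},
	}
	fmt.Println(describe(inf))
}
```

Note that Intent and Slots are only meaningful when IsUnderstood is true, so the helper checks that flag first.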
