astideepspeech

Version: v0.10.0
Published: Sep 7, 2020 License: MIT Imports: 3 Imported by: 4

README

Golang bindings for Mozilla's DeepSpeech speech-to-text library.

astideepspeech is compatible with version v0.8.0 of DeepSpeech.

Installation

Install DeepSpeech

  • fetch the native_client.<your system>.tar.xz archive for your platform from DeepSpeech's "releases" page, choosing a release compatible with v0.8.0
  • extract its content to /tmp/deepspeech/lib
  • download deepspeech.h from https://github.com/mozilla/DeepSpeech/raw/v0.8.0/native_client/deepspeech.h
  • copy it to /tmp/deepspeech/include
  • export CGO_LDFLAGS="-L/tmp/deepspeech/lib/"
  • export CGO_CXXFLAGS="-I/tmp/deepspeech/include/"
  • export LD_LIBRARY_PATH=/tmp/deepspeech/lib/:$LD_LIBRARY_PATH

Alternatively, copy the downloaded libdeepspeech.so and deepspeech.h files to directories that are searched by default, e.g. /usr/local/lib and /usr/local/include, respectively.

Install astideepspeech

Run the following command:

$ go get -u github.com/asticode/go-astideepspeech/...

Example

Get the pre-trained model and scorer

Run the following commands:

$ mkdir -p /tmp/deepspeech
$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.8.0/deepspeech-0.8.0-models.pbmm
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.8.0/deepspeech-0.8.0-models.scorer

Get the audio files

Run the following commands:

$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.8.0/audio-0.8.0.tar.gz
$ tar xvfz audio-0.8.0.tar.gz

Use the client

Run the following commands (make sure $GOPATH/bin is in your $PATH):

$ cd /tmp/deepspeech
$ deepspeech -model deepspeech-0.8.0-models.pbmm -scorer deepspeech-0.8.0-models.scorer -audio audio/2830-3980-0043.wav

    Text: experience proves this

$ deepspeech -model deepspeech-0.8.0-models.pbmm -scorer deepspeech-0.8.0-models.scorer -audio audio/4507-16021-0012.wav

    Text: why should one halt on the way

$ deepspeech -model deepspeech-0.8.0-models.pbmm -scorer deepspeech-0.8.0-models.scorer -audio audio/8455-210777-0068.wav

    Text: your power is sufficient i said

Documentation

Overview

Package astideepspeech provides bindings for Mozilla's DeepSpeech speech-to-text library.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Version added in v0.7.0

func Version() string

Version returns the version of the DeepSpeech C library. The returned version is a semantic version (SemVer 2.0.0).
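
Since the README states compatibility with DeepSpeech v0.8.0, a program may want to compare the string returned by Version against the release it expects. The following is a minimal sketch, assuming the string starts with the usual MAJOR.MINOR.PATCH fields. sameMajorMinor is a hypothetical helper, not part of this package, and the literal version strings stand in for a real Version() call.

```go
package main

import (
	"fmt"
	"strings"
)

// sameMajorMinor reports whether two semantic version strings share the same
// MAJOR.MINOR prefix. Hypothetical helper, not part of astideepspeech.
func sameMajorMinor(a, b string) bool {
	pa := strings.SplitN(strings.TrimPrefix(a, "v"), ".", 3)
	pb := strings.SplitN(strings.TrimPrefix(b, "v"), ".", 3)
	if len(pa) < 2 || len(pb) < 2 {
		return false
	}
	return pa[0] == pb[0] && pa[1] == pb[1]
}

func main() {
	// In a real program the first argument would be astideepspeech.Version().
	fmt.Println(sameMajorMinor("0.8.0", "v0.8.2")) // true
	fmt.Println(sameMajorMinor("0.9.3", "v0.8.0")) // false
}
```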

Types

type CandidateTranscript added in v0.7.0

type CandidateTranscript C.struct_CandidateTranscript

CandidateTranscript is a single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.

func (*CandidateTranscript) Confidence added in v0.7.0

func (ct *CandidateTranscript) Confidence() float64

Confidence returns the approximated confidence value for this transcript. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcript.
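
When several candidate transcripts are requested, one simple way to choose among them is to keep the one with the highest Confidence value. The sketch below illustrates that selection only. The scored interface, fakeTranscript type and bestIndex function are hypothetical names introduced here, and the confidence values are made up.

```go
package main

import "fmt"

// scored is a hypothetical stand-in listing the one method used here; the
// docs above give *CandidateTranscript a Confidence() float64 method.
type scored interface {
	Confidence() float64
}

// fakeTranscript exists only to exercise bestIndex without the C library.
type fakeTranscript float64

func (f fakeTranscript) Confidence() float64 { return float64(f) }

// bestIndex returns the index of the candidate with the highest confidence,
// or -1 when the slice is empty.
func bestIndex(cands []scored) int {
	best := -1
	for i, c := range cands {
		if best == -1 || c.Confidence() > cands[best].Confidence() {
			best = i
		}
	}
	return best
}

func main() {
	cands := []scored{fakeTranscript(-12.5), fakeTranscript(-3.1), fakeTranscript(-7.0)}
	fmt.Println(bestIndex(cands)) // 1
}
```

Because Confidence is declared on the pointer type, real code would pass the address of each element of the slice returned by Transcripts, for example &transcripts[i].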

func (*CandidateTranscript) NumTokens added in v0.7.0

func (ct *CandidateTranscript) NumTokens() uint

func (*CandidateTranscript) Tokens added in v0.7.0

func (ct *CandidateTranscript) Tokens() []TokenMetadata

type Metadata

type Metadata C.struct_Metadata

Metadata holds an array of CandidateTranscript objects computed by the model.

func (*Metadata) Close

func (m *Metadata) Close()

Close frees the Metadata structure properly.

func (*Metadata) NumTranscripts added in v0.7.0

func (m *Metadata) NumTranscripts() uint

func (*Metadata) Transcripts added in v0.7.0

func (m *Metadata) Transcripts() []CandidateTranscript

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model provides an interface to a trained DeepSpeech model.

func New

func New(modelPath string) (*Model, error)

New creates a new Model. modelPath is the path to the frozen model graph.

func (*Model) BeamWidth added in v0.9.0

func (m *Model) BeamWidth() uint

BeamWidth returns the beam width value used by the model. If SetBeamWidth was not called before, it will return the default value loaded from the model file.

func (*Model) Close

func (m *Model) Close()

Close frees associated resources and destroys the model object.

func (*Model) DisableExternalScorer added in v0.7.0

func (m *Model) DisableExternalScorer() error

DisableExternalScorer disables decoding using an external scorer.

func (*Model) EnableExternalScorer added in v0.7.0

func (m *Model) EnableExternalScorer(scorerPath string) error

EnableExternalScorer enables decoding using an external scorer. scorerPath is the path to the external scorer file.

func (*Model) NewStream added in v0.9.0

func (m *Model) NewStream() (*Stream, error)

NewStream creates a new streaming inference state. If an error is not returned, exactly one of the returned stream's Finish, FinishWithMetadata, or Discard methods must be called later to free resources.

func (*Model) SampleRate added in v0.9.0

func (m *Model) SampleRate() int

SampleRate returns the sample rate that was used to produce the model file.

func (*Model) SetBeamWidth added in v0.9.0

func (m *Model) SetBeamWidth(width uint) error

SetBeamWidth sets the beam width value used by the model. A larger beam width value generates better results at the cost of decoding time.

func (*Model) SetScorerAlphaBeta added in v0.7.0

func (m *Model) SetScorerAlphaBeta(alpha, beta float32) error

SetScorerAlphaBeta sets hyperparameters alpha and beta of the external scorer. alpha is the language model weight. beta is the word insertion weight.

func (*Model) SpeechToText

func (m *Model) SpeechToText(buffer []int16) (string, error)

SpeechToText uses the DeepSpeech model to convert speech to text. buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
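
Audio read from disk usually arrives as bytes, while SpeechToText expects []int16. The sketch below shows one way to do the conversion, assuming the bytes are headerless little-endian 16-bit PCM (the sample layout used in WAV files, with the WAV header already stripped). pcmToInt16 is a hypothetical helper, not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcmToInt16 converts little-endian 16-bit PCM bytes into samples.
// A trailing odd byte, if any, is ignored.
func pcmToInt16(raw []byte) []int16 {
	out := make([]int16, len(raw)/2)
	for i := range out {
		out[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return out
}

func main() {
	raw := []byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80}
	fmt.Println(pcmToInt16(raw)) // [1 -1 -32768]
}
```

The resulting slice only makes sense as input if the recording is mono and its sample rate matches the value reported by SampleRate.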

func (*Model) SpeechToTextWithMetadata

func (m *Model) SpeechToTextWithMetadata(buffer []int16, numResults uint) (*Metadata, error)

SpeechToTextWithMetadata uses the DeepSpeech model to convert speech to text and output results including metadata.

buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). numResults is the maximum number of CandidateTranscript structs to return. The number of transcripts actually returned may be smaller than this. If an error is not returned, the returned metadata's Close method must be called later to free resources.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream represents a streaming inference state.

func (*Stream) Discard added in v0.9.0

func (s *Stream) Discard()

Discard destroys a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

func (*Stream) FeedAudioContent

func (s *Stream) FeedAudioContent(buffer []int16)

FeedAudioContent feeds audio samples to an ongoing streaming inference. buffer is an array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

func (*Stream) Finish added in v0.9.0

func (s *Stream) Finish() (string, error)

Finish computes the final decoding of an ongoing streaming inference and returns the result. This signals the end of an ongoing streaming inference.
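
A typical streaming loop feeds audio in fixed-size chunks and then calls Finish once. The sketch below illustrates that pattern against a small interface so it runs without the C library. audioStream, fakeStream and feedInChunks are hypothetical names; only the two method signatures are taken from the documentation above.

```go
package main

import "fmt"

// audioStream lists the two Stream methods this sketch relies on, with the
// signatures documented above. The interface itself is hypothetical.
type audioStream interface {
	FeedAudioContent(buffer []int16)
	Finish() (string, error)
}

// fakeStream records what it was fed; it stands in for a real *Stream.
type fakeStream struct {
	calls, samples int
}

func (f *fakeStream) FeedAudioContent(buffer []int16) {
	f.calls++
	f.samples += len(buffer)
}

func (f *fakeStream) Finish() (string, error) {
	return fmt.Sprintf("%d samples in %d calls", f.samples, f.calls), nil
}

// feedInChunks feeds samples in slices of at most chunkSize samples, then
// finishes the stream exactly once and returns the final transcript.
func feedInChunks(s audioStream, samples []int16, chunkSize int) (string, error) {
	if chunkSize <= 0 {
		chunkSize = len(samples) // fall back to a single call
	}
	for start := 0; start < len(samples); start += chunkSize {
		end := start + chunkSize
		if end > len(samples) {
			end = len(samples)
		}
		s.FeedAudioContent(samples[start:end])
	}
	return s.Finish()
}

func main() {
	text, err := feedInChunks(&fakeStream{}, make([]int16, 1000), 320)
	fmt.Println(text, err) // 1000 samples in 4 calls <nil>
}
```

The chunk size of 320 samples is arbitrary. With a real stream obtained from NewStream, the same function shape guarantees that Finish is called on every path, which matches the cleanup requirement stated for NewStream.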

func (*Stream) FinishWithMetadata added in v0.9.0

func (s *Stream) FinishWithMetadata(numResults uint) (*Metadata, error)

FinishWithMetadata computes the final decoding of an ongoing streaming inference and returns results including metadata. This signals the end of an ongoing streaming inference. If an error is not returned, the metadata's Close method must be called.

func (*Stream) IntermediateDecode

func (s *Stream) IntermediateDecode() (string, error)

IntermediateDecode computes the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn't currently capable of streaming, so it always starts from the beginning of the audio.
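
Because each intermediate decode restarts from the beginning of the audio, callers that want partial results may prefer to decode only every few chunks rather than after each one. The following is a minimal sketch of that throttling, written against a hypothetical partialDecoder interface whose two methods match the Stream signatures documented here. fakeDecoder and feedWithPartials are illustrative names as well.

```go
package main

import "fmt"

// partialDecoder lists the Stream methods used below. Hypothetical interface.
type partialDecoder interface {
	FeedAudioContent(buffer []int16)
	IntermediateDecode() (string, error)
}

// fakeDecoder counts calls so the sketch can run without the C library.
type fakeDecoder struct {
	fed, decodes int
}

func (f *fakeDecoder) FeedAudioContent(buffer []int16) { f.fed++ }

func (f *fakeDecoder) IntermediateDecode() (string, error) {
	f.decodes++
	return fmt.Sprintf("partial after %d chunks", f.fed), nil
}

// feedWithPartials feeds every chunk but requests an intermediate decode only
// after each group of `every` chunks, passing the result to onPartial.
func feedWithPartials(s partialDecoder, chunks [][]int16, every int, onPartial func(string)) error {
	if every <= 0 {
		every = 1
	}
	for i, c := range chunks {
		s.FeedAudioContent(c)
		if (i+1)%every != 0 {
			continue
		}
		text, err := s.IntermediateDecode()
		if err != nil {
			return err
		}
		onPartial(text)
	}
	return nil
}

func main() {
	chunks := make([][]int16, 10)
	err := feedWithPartials(&fakeDecoder{}, chunks, 4, func(s string) { fmt.Println(s) })
	fmt.Println(err)
}
```

The caller remains responsible for ending the stream with Finish, FinishWithMetadata or Discard afterwards.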

func (*Stream) IntermediateDecodeWithMetadata added in v0.7.0

func (s *Stream) IntermediateDecodeWithMetadata(numResults uint) (*Metadata, error)

IntermediateDecodeWithMetadata computes the intermediate decoding of an ongoing streaming inference, returning results including metadata. numResults is the number of candidate transcripts to return. If an error is not returned, the metadata's Close method must be called.

type TokenMetadata added in v0.7.0

type TokenMetadata C.struct_TokenMetadata

TokenMetadata stores text of an individual token, along with its timing information.

func (*TokenMetadata) StartTime added in v0.7.0

func (tm *TokenMetadata) StartTime() float32

StartTime returns the position of the token in seconds.

func (*TokenMetadata) Text added in v0.7.0

func (tm *TokenMetadata) Text() string

Text returns the text corresponding to this token.
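
Token text and start times can be combined into word-level timings. The sketch below assumes each token carries a single character and that a token whose text is a single space separates words, which is an assumption about the model's alphabet rather than something this page states. The token and word structs and groupWords are hypothetical; token holds plain copies of the values returned by Text and StartTime.

```go
package main

import "fmt"

// token is a hypothetical plain-data copy of what TokenMetadata exposes
// through Text() and StartTime().
type token struct {
	Text      string
	StartTime float32
}

// word is a run of non-space tokens plus the start time of its first token.
type word struct {
	Text      string
	StartTime float32
}

// groupWords joins consecutive non-space tokens into words.
func groupWords(tokens []token) []word {
	var words []word
	inWord := false
	for _, t := range tokens {
		if t.Text == " " {
			inWord = false
			continue
		}
		if !inWord {
			words = append(words, word{StartTime: t.StartTime})
			inWord = true
		}
		words[len(words)-1].Text += t.Text
	}
	return words
}

func main() {
	// Made-up sample data for illustration.
	tokens := []token{{"h", 0.10}, {"i", 0.14}, {" ", 0.20}, {"y", 0.30}, {"o", 0.34}, {"u", 0.38}}
	for _, w := range groupWords(tokens) {
		fmt.Printf("%s starts at %.2fs\n", w.Text, w.StartTime)
	}
}
```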

func (*TokenMetadata) Timestep added in v0.7.0

func (tm *TokenMetadata) Timestep() uint

Timestep returns the position of the token in units of 20ms.
