astideepspeech

Version: v0.7.0
Published: Feb 20, 2020 License: MIT Imports: 2 Imported by: 0

README

Golang bindings for Mozilla's DeepSpeech speech-to-text library.

As of now, astideepspeech is only compatible with version v0.6.0 of DeepSpeech.

Installation

Install DeepSpeech

  • fetch an up-to-date native_client.<your system>.tar.xz matching your system from DeepSpeech's "releases"
  • extract its content to /tmp/deepspeech/lib
  • download deepspeech.h from https://github.com/mozilla/DeepSpeech/raw/v0.6.0/native_client/deepspeech.h
  • copy it to /tmp/deepspeech/include
  • export CGO_LDFLAGS="-L/tmp/deepspeech/lib/"
  • export CGO_CXXFLAGS="-I/tmp/deepspeech/include/"
  • export LD_LIBRARY_PATH=/tmp/deepspeech/lib/:$LD_LIBRARY_PATH

Install astideepspeech

Run the following command:

$ go get -u github.com/asticode/go-astideepspeech/...

Example

Get the pre-trained model

Run the following commands:

$ mkdir -p /tmp/deepspeech
$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
$ tar xvfz deepspeech-0.6.0-models.tar.gz

Get the audio files

Run the following commands:

$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
$ tar xvfz audio-0.6.0.tar.gz

Use the client

Run the following commands (make sure $GOPATH/bin is in your $PATH):

$ cd /tmp/deepspeech
$ deepspeech -model models/output_graph.pb -audio audio/2830-3980-0043.wav -lm models/lm.binary -trie models/trie

    Text: experience proves this

$ deepspeech -model models/output_graph.pb -audio audio/4507-16021-0012.wav -lm models/lm.binary -trie models/trie

    Text: why should one halt on the way
    
$ deepspeech -model models/output_graph.pb -audio audio/8455-210777-0068.wav -lm models/lm.binary -trie models/trie

    Text: your power is sufficient i said

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetVersion added in v0.7.0

func GetVersion() string

GetVersion returns the version string reported by the linked DeepSpeech library.

Types

type Metadata

type Metadata C.struct_Metadata

Metadata represents a DeepSpeech metadata output

func (*Metadata) Close

func (m *Metadata) Close() error

Close frees the Metadata structure properly

func (*Metadata) Confidence

func (m *Metadata) Confidence() float64

func (*Metadata) Items

func (m *Metadata) Items() []MetadataItem

func (*Metadata) NumItems

func (m *Metadata) NumItems() int32

type MetadataItem

type MetadataItem C.struct_MetadataItem

func (*MetadataItem) Character

func (mi *MetadataItem) Character() string

func (*MetadataItem) StartTime

func (mi *MetadataItem) StartTime() float32

func (*MetadataItem) Timestep

func (mi *MetadataItem) Timestep() int
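
The per-character items can be regrouped into words with start times. The following is a minimal sketch under stated assumptions, not part of the package: `charItem`, `word`, and `groupWords` are hypothetical names, and the sketch works on plain values that would be copied from each MetadataItem's Character() and StartTime() results. It also assumes that word boundaries appear as items whose character is a single space.

```go
package main

import "fmt"

// charItem is a hypothetical plain copy of one MetadataItem:
// Char would come from Character() and Start from StartTime().
type charItem struct {
	Char  string
	Start float32
}

// word is a hypothetical result type: the text of one word and the
// start time of its first character.
type word struct {
	Text  string
	Start float32
}

// groupWords splits a character sequence on single-space items and
// records, for each word, the start time of its first character.
func groupWords(items []charItem) []word {
	var words []word
	var text string
	var start float32
	for _, it := range items {
		if it.Char == " " {
			// A space closes the current word, if there is one.
			if text != "" {
				words = append(words, word{Text: text, Start: start})
				text = ""
			}
			continue
		}
		if text == "" {
			start = it.Start
		}
		text += it.Char
	}
	// Flush the last word when the input does not end with a space.
	if text != "" {
		words = append(words, word{Text: text, Start: start})
	}
	return words
}

func main() {
	items := []charItem{
		{"h", 0.5}, {"i", 0.75}, {" ", 1.0}, {"y", 1.25}, {"o", 1.5},
	}
	for _, w := range groupWords(items) {
		fmt.Printf("%s starts at %.2f\n", w.Text, w.Start)
	}
}
```

Working on copied values keeps the grouping logic independent of the native library, so it can be exercised without a model.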

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model represents a DeepSpeech model

func New

func New(modelPath string, beamWidth int) *Model

New creates a new Model

modelPath The path to the frozen model graph.

beamWidth The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

func (*Model) Close

func (m *Model) Close() error

Close closes the model properly

func (*Model) DisableExternalScorer added in v0.7.0

func (m *Model) DisableExternalScorer()

func (*Model) EnableExternalScorer added in v0.7.0

func (m *Model) EnableExternalScorer(scorerPath string)

EnableExternalScorer enables decoding using beam scoring with a KenLM language model.

scorerPath The path to the external scorer file.

func (*Model) GetModelBeamWidth added in v0.7.0

func (m *Model) GetModelBeamWidth() uint

func (*Model) GetModelSampleRate

func (m *Model) GetModelSampleRate() int

GetModelSampleRate reads the sample rate that was used to produce the model file.
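
One use for this value is to reject audio recorded at a different rate before running inference. The sketch below is illustrative only: `wavSampleRate` and `checkRate` are hypothetical helpers, and the parser assumes a canonical RIFF/WAVE header in which the "fmt " chunk directly follows the 12-byte preamble. In real code the second argument of `checkRate` would be the result of GetModelSampleRate.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wavSampleRate reads the sample rate from a canonical RIFF/WAVE header,
// where the "fmt " chunk immediately follows the 12-byte RIFF preamble.
// Files with extra chunks before "fmt " are not handled by this sketch.
func wavSampleRate(header []byte) (int, error) {
	if len(header) < 28 {
		return 0, errors.New("header too short")
	}
	if string(header[0:4]) != "RIFF" || string(header[8:12]) != "WAVE" || string(header[12:16]) != "fmt " {
		return 0, errors.New("not a canonical WAV header")
	}
	// Bytes 24..27 hold the sample rate as a little-endian uint32.
	return int(binary.LittleEndian.Uint32(header[24:28])), nil
}

// checkRate compares the rate of an audio file with the rate the model expects.
func checkRate(fileRate, modelRate int) error {
	if fileRate != modelRate {
		return fmt.Errorf("audio is %d Hz but the model expects %d Hz", fileRate, modelRate)
	}
	return nil
}

func main() {
	// Build a synthetic 44-byte header so the example runs without a file.
	header := make([]byte, 44)
	copy(header[0:], "RIFF")
	copy(header[8:], "WAVE")
	copy(header[12:], "fmt ")
	binary.LittleEndian.PutUint32(header[24:], 16000)

	rate, err := wavSampleRate(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := checkRate(rate, 16000); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sample rates match")
}
```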

func (*Model) SetModelBeamWidth added in v0.7.0

func (m *Model) SetModelBeamWidth(beamWidth uint)

func (*Model) SpeechToText

func (m *Model) SpeechToText(buffer []int16, bufferSize uint) string

SpeechToText uses the DeepSpeech model to perform Speech-To-Text.

buffer A 16-bit, mono raw audio signal at the appropriate sample rate.

bufferSize The number of samples in the audio signal.
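
Audio usually arrives as bytes rather than as []int16, so a conversion step is needed before calling SpeechToText. The sketch below shows one way to do it, assuming the input is raw little-endian 16-bit mono PCM with any container header already stripped. `pcmToSamples` is a hypothetical helper, not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcmToSamples converts raw little-endian 16-bit PCM bytes into samples.
// A trailing odd byte, if any, is ignored.
func pcmToSamples(pcm []byte) []int16 {
	samples := make([]int16, len(pcm)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	return samples
}

func main() {
	samples := pcmToSamples([]byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80})
	// With a loaded model m, the call would then look like:
	//   text := m.SpeechToText(samples, uint(len(samples)))
	fmt.Println(samples, uint(len(samples)))
}
```

Passing `uint(len(samples))` as bufferSize keeps the sample count and the slice length in agreement.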

func (*Model) SpeechToTextWithMetadata

func (m *Model) SpeechToTextWithMetadata(buffer []int16, bufferSize uint) *Metadata

SpeechToTextWithMetadata uses the DeepSpeech model to perform Speech-To-Text and returns extended metadata.

buffer A 16-bit, mono raw audio signal at the appropriate sample rate.

bufferSize The number of samples in the audio signal.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream represents a streaming state

func CreateStream

func CreateStream(mw *Model) *Stream

CreateStream creates a new audio stream

mw The DeepSpeech model to use

func (*Stream) FeedAudioContent

func (s *Stream) FeedAudioContent(buffer []int16, bufferSize uint)

FeedAudioContent feeds audio samples to an ongoing streaming inference.

buffer An array of 16-bit, mono raw audio samples at the appropriate sample rate.

bufferSize The number of samples in buffer.

func (*Stream) FinishStream

func (s *Stream) FinishStream() string

FinishStream signals the end of an audio signal to an ongoing streaming inference and returns the STT result over the whole audio signal.
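
A typical streaming loop feeds audio in fixed-size chunks and then finishes the stream. The following sketch illustrates that pattern without linking the native library. The `audioStream` interface is a hypothetical abstraction that lists only the two method signatures documented above, so a *Stream obtained from CreateStream should satisfy it if those signatures hold. `transcribeInChunks` and `fakeStream` are likewise hypothetical names used for illustration.

```go
package main

import "fmt"

// audioStream lists the two Stream methods this sketch relies on,
// with the signatures documented above.
type audioStream interface {
	FeedAudioContent(buffer []int16, bufferSize uint)
	FinishStream() string
}

// transcribeInChunks feeds samples in slices of at most chunkSize
// samples, then finishes the stream and returns its result.
func transcribeInChunks(s audioStream, samples []int16, chunkSize int) string {
	if chunkSize <= 0 {
		// Fall back to a single chunk holding all samples.
		chunkSize = len(samples)
	}
	for start := 0; start < len(samples); start += chunkSize {
		end := start + chunkSize
		if end > len(samples) {
			end = len(samples)
		}
		chunk := samples[start:end]
		s.FeedAudioContent(chunk, uint(len(chunk)))
	}
	return s.FinishStream()
}

// fakeStream is a stand-in used only to exercise the loop without the
// native library; it records the size of every chunk it receives.
type fakeStream struct {
	sizes []uint
}

func (f *fakeStream) FeedAudioContent(buffer []int16, bufferSize uint) {
	f.sizes = append(f.sizes, bufferSize)
}

func (f *fakeStream) FinishStream() string {
	return fmt.Sprintf("received %d chunks", len(f.sizes))
}

func main() {
	f := &fakeStream{}
	fmt.Println(transcribeInChunks(f, make([]int16, 10), 4))
}
```

Depending on an interface rather than on *Stream directly also makes the chunking logic testable on machines where DeepSpeech is not installed.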

func (*Stream) FinishStreamWithMetadata

func (s *Stream) FinishStreamWithMetadata() *Metadata

FinishStreamWithMetadata signals the end of an audio signal to an ongoing streaming inference and returns extended metadata.

func (*Stream) FreeStream

func (s *Stream) FreeStream()

FreeStream destroys a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

func (*Stream) IntermediateDecode

func (s *Stream) IntermediateDecode() string

IntermediateDecode computes the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn't currently capable of streaming, so it always starts from the beginning of the audio.
