astideepspeech

Version: v0.7.0 (latest) | Published: Feb 20, 2020 | License: MIT | Imports: 2 | Imported by: 0


Repository

github.com/31434116/go-astideepspeech

README ¶

Golang bindings for Mozilla's DeepSpeech speech-to-text library.

As of now, astideepspeech is only compatible with version v0.6.0 of DeepSpeech.

Installation

Install DeepSpeech

- fetch an up-to-date native_client.<your system>.tar.xz matching your system from DeepSpeech's "releases"
- extract its content to /tmp/deepspeech/lib
- download deepspeech.h from https://github.com/mozilla/DeepSpeech/raw/v0.6.0/native_client/deepspeech.h
- copy it to /tmp/deepspeech/include
- export CGO_LDFLAGS="-L/tmp/deepspeech/lib/"
- export CGO_CXXFLAGS="-I/tmp/deepspeech/include/"
- export LD_LIBRARY_PATH=/tmp/deepspeech/lib/:$LD_LIBRARY_PATH

Install astideepspeech

Run the following command:

$ go get -u github.com/asticode/go-astideepspeech/...

Example

Get the pre-trained model

Run the following commands:

$ mkdir -p /tmp/deepspeech
$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
$ tar xvfz deepspeech-0.6.0-models.tar.gz

Get the audio files

Run the following commands:

$ cd /tmp/deepspeech
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
$ tar xvfz audio-0.6.0.tar.gz

Use the client

Run the following commands (make sure $GOPATH/bin is in your $PATH):

$ cd /tmp/deepspeech
$ deepspeech -model models/output_graph.pb -audio audio/2830-3980-0043.wav -lm models/lm.binary -trie models/trie

    Text: experience proves this

$ deepspeech -model models/output_graph.pb -audio audio/4507-16021-0012.wav -lm models/lm.binary -trie models/trie

    Text: why should one halt on the way
    
$ deepspeech -model models/output_graph.pb -audio audio/8455-210777-0068.wav -lm models/lm.binary -trie models/trie

    Text: your power is sufficient i said

Documentation ¶

Index ¶

func GetVersion() string
type Metadata
type MetadataItem
type Model
- func New(modelPath string, beamWidth int) *Model
type Stream
- func CreateStream(mw *Model) *Stream

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GetVersion ¶ added in v0.7.0

func GetVersion() string

GetVersion returns the version of this library and of the linked TensorFlow library.

Types ¶

type Metadata ¶

type Metadata C.struct_Metadata

Metadata represents a DeepSpeech metadata output

func (*Metadata) Close ¶

func (m *Metadata) Close() error

Close frees the Metadata structure properly

func (*Metadata) Confidence ¶

func (m *Metadata) Confidence() float64

func (*Metadata) Items ¶

func (m *Metadata) Items() []MetadataItem

func (*Metadata) NumItems ¶

func (m *Metadata) NumItems() int32

type MetadataItem ¶

type MetadataItem C.struct_MetadataItem

func (*MetadataItem) Character ¶

func (mi *MetadataItem) Character() string

func (*MetadataItem) StartTime ¶

func (mi *MetadataItem) StartTime() float32

func (*MetadataItem) Timestep ¶

func (mi *MetadataItem) Timestep() int
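
The per-character items can be grouped into words with start times. The following is a minimal sketch, not part of the package: `charItem`, `fakeItem`, `word` and `groupWords` are hypothetical names, and only `Character()` and `StartTime()` come from the listing above. It assumes the model emits a single space character between words. Since the accessors have pointer receivers, a caller working with the slice returned by `Items()` would presumably pass `&items[i]` to satisfy the interface.

```go
package main

import "fmt"

// charItem mirrors two MetadataItem accessors from the listing above.
// The interface itself is hypothetical and exists only for this sketch.
type charItem interface {
	Character() string
	StartTime() float32
}

// fakeItem is a stand-in so the sketch runs without the native library.
type fakeItem struct {
	ch string
	t  float32
}

func (f fakeItem) Character() string  { return f.ch }
func (f fakeItem) StartTime() float32 { return f.t }

// word is a decoded word and the start time of its first character.
type word struct {
	Text  string
	Start float32
}

// groupWords joins consecutive characters into words, splitting on spaces.
func groupWords(items []charItem) []word {
	var words []word
	var cur word
	for _, it := range items {
		c := it.Character()
		if c == " " {
			if cur.Text != "" {
				words = append(words, cur)
				cur = word{}
			}
			continue
		}
		if cur.Text == "" {
			cur.Start = it.StartTime()
		}
		cur.Text += c
	}
	if cur.Text != "" {
		words = append(words, cur)
	}
	return words
}

func main() {
	items := []charItem{
		fakeItem{"h", 0.10}, fakeItem{"i", 0.20}, fakeItem{" ", 0.30},
		fakeItem{"y", 0.50}, fakeItem{"o", 0.60},
	}
	for _, w := range groupWords(items) {
		fmt.Printf("%s@%.2f\n", w.Text, w.Start)
	}
}
```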

func New ¶

func New(modelPath string, beamWidth int) *Model

New creates a new Model

modelPath The path to the frozen model graph.

beamWidth The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

func (*Model) Close ¶

func (m *Model) Close() error

Close closes the model properly

func (*Model) DisableExternalScorer ¶ added in v0.7.0

func (m *Model) DisableExternalScorer()

func (*Model) EnableExternalScorer ¶ added in v0.7.0

func (m *Model) EnableExternalScorer(scorerPath string)

EnableExternalScorer enables decoding using beam scoring with a KenLM language model.

scorerPath The path to the language model binary file.

func (*Model) GetModelBeamWidth ¶ added in v0.7.0

func (m *Model) GetModelBeamWidth() uint

func (*Model) GetModelSampleRate ¶

func (m *Model) GetModelSampleRate() int

GetModelSampleRate returns the sample rate that was used to produce the model file.

func (*Model) SetModelBeamWidth ¶ added in v0.7.0

func (m *Model) SetModelBeamWidth(beamWidth uint)

func (*Model) SpeechToText ¶

func (m *Model) SpeechToText(buffer []int16, bufferSize uint) string

SpeechToText uses the DeepSpeech model to perform Speech-To-Text.

buffer A 16-bit, mono raw audio signal at the appropriate sample rate.

bufferSize The number of samples in the audio signal.
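
Audio read from a file or socket usually arrives as bytes, while `SpeechToText` takes `[]int16` samples. A minimal sketch of the conversion follows; it assumes headerless little-endian 16-bit mono PCM, and `bytesToSamples` is a hypothetical helper, not part of this package. The resulting slice and `uint(len(samples))` would then be the `buffer` and `bufferSize` arguments.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToSamples converts raw little-endian 16-bit PCM bytes into int16
// samples. A trailing odd byte, if any, is ignored.
func bytesToSamples(pcm []byte) []int16 {
	samples := make([]int16, len(pcm)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	return samples
}

func main() {
	pcm := []byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80}
	samples := bytesToSamples(pcm)
	// The sample count is what the bufferSize parameter expects.
	fmt.Println(samples, uint(len(samples))) // prints "[1 -1 -32768] 3"
}
```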

func (*Model) SpeechToTextWithMetadata ¶

func (m *Model) SpeechToTextWithMetadata(buffer []int16, bufferSize uint) *Metadata

SpeechToTextWithMetadata uses the DeepSpeech model to perform Speech-To-Text.

buffer A 16-bit, mono raw audio signal at the appropriate sample rate.

bufferSize The number of samples in the audio signal.

type Stream ¶

type Stream struct {
	// contains filtered or unexported fields
}

Stream represents a streaming state

func CreateStream ¶

func CreateStream(mw *Model) *Stream

CreateStream creates a new audio stream

mw The DeepSpeech model to use

func (*Stream) FeedAudioContent ¶

func (s *Stream) FeedAudioContent(buffer []int16, bufferSize uint)

FeedAudioContent feeds audio samples to an ongoing streaming inference.

buffer An array of 16-bit, mono raw audio samples at the appropriate sample rate.

bufferSize The number of samples in buffer.
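
For streaming, samples are typically fed in fixed-size chunks as they arrive. The sketch below shows one way to do that against an interface that mirrors the `FeedAudioContent` signature. `audioFeeder`, `fakeStream` and `feedInChunks` are hypothetical names used only for illustration; going by the signatures listed here, a `*Stream` should satisfy `audioFeeder`.

```go
package main

import "fmt"

// audioFeeder mirrors the FeedAudioContent signature listed above.
type audioFeeder interface {
	FeedAudioContent(buffer []int16, bufferSize uint)
}

// fakeStream is a stand-in that records the size of every chunk it receives,
// so the sketch runs without the native library.
type fakeStream struct {
	sizes []uint
}

func (f *fakeStream) FeedAudioContent(buffer []int16, bufferSize uint) {
	f.sizes = append(f.sizes, bufferSize)
}

// feedInChunks feeds samples in slices of at most chunkSize samples each.
// A non-positive chunkSize feeds nothing.
func feedInChunks(s audioFeeder, samples []int16, chunkSize int) {
	if chunkSize <= 0 {
		return
	}
	for start := 0; start < len(samples); start += chunkSize {
		end := start + chunkSize
		if end > len(samples) {
			end = len(samples)
		}
		chunk := samples[start:end]
		s.FeedAudioContent(chunk, uint(len(chunk)))
	}
}

func main() {
	fs := &fakeStream{}
	feedInChunks(fs, make([]int16, 10), 4)
	fmt.Println(fs.sizes) // prints "[4 4 2]"
}
```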

func (*Stream) FinishStream ¶

func (s *Stream) FinishStream() string

FinishStream signals the end of an audio signal to an ongoing streaming inference and returns the STT result over the whole audio signal.

func (*Stream) FinishStreamWithMetadata ¶

func (s *Stream) FinishStreamWithMetadata() *Metadata

FinishStreamWithMetadata signals the end of an audio signal to an ongoing streaming inference and returns extended metadata.

func (*Stream) FreeStream ¶

func (s *Stream) FreeStream()

FreeStream destroys a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

func (*Stream) IntermediateDecode ¶

func (s *Stream) IntermediateDecode() string

IntermediateDecode computes the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn't currently capable of streaming, so it always starts from the beginning of the audio.
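
Because each intermediate decode restarts from the beginning of the audio, a caller may want to request partial results only every few chunks rather than after every feed. The following sketch illustrates that idea under stated assumptions: `partialDecoder`, `countingStream` and `feedWithPartials` are hypothetical names, and the interface only mirrors the two `*Stream` method signatures listed on this page.

```go
package main

import "fmt"

// partialDecoder mirrors two *Stream method signatures listed on this page.
type partialDecoder interface {
	FeedAudioContent(buffer []int16, bufferSize uint)
	IntermediateDecode() string
}

// countingStream is a stand-in that counts fed samples and decode calls,
// so the sketch runs without the native library.
type countingStream struct {
	fed     int
	decodes int
}

func (c *countingStream) FeedAudioContent(buffer []int16, bufferSize uint) {
	c.fed += int(bufferSize)
}

func (c *countingStream) IntermediateDecode() string {
	c.decodes++
	return fmt.Sprintf("partial after %d samples", c.fed)
}

// feedWithPartials feeds every chunk and asks for an intermediate decode
// only after each group of `every` chunks, limiting the expensive calls.
// A non-positive `every` disables intermediate decoding.
func feedWithPartials(s partialDecoder, chunks [][]int16, every int, onPartial func(string)) {
	for i, chunk := range chunks {
		s.FeedAudioContent(chunk, uint(len(chunk)))
		if every > 0 && (i+1)%every == 0 {
			onPartial(s.IntermediateDecode())
		}
	}
}

func main() {
	chunks := make([][]int16, 5)
	for i := range chunks {
		chunks[i] = make([]int16, 160)
	}
	cs := &countingStream{}
	feedWithPartials(cs, chunks, 2, func(text string) { fmt.Println(text) })
	fmt.Println("decodes:", cs.decodes)
}
```

With five chunks and `every` set to 2, the stand-in is decoded twice instead of five times; the right interval for real audio depends on chunk length and latency needs.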

Source Files ¶

deepspeech.go

Directories ¶

Path	Synopsis
deepspeech
