Documentation ¶
Overview ¶
Package astideepspeech provides bindings for Mozilla's DeepSpeech speech-to-text library.
Index ¶
- func Version() string
- type CandidateTranscript
- type Metadata
- type Model
- func (m *Model) BeamWidth() uint
- func (m *Model) Close()
- func (m *Model) DisableExternalScorer() error
- func (m *Model) EnableExternalScorer(scorerPath string) error
- func (m *Model) NewStream() (*Stream, error)
- func (m *Model) SampleRate() int
- func (m *Model) SetBeamWidth(width uint) error
- func (m *Model) SetScorerAlphaBeta(alpha, beta float32) error
- func (m *Model) SpeechToText(buffer []int16) (string, error)
- func (m *Model) SpeechToTextWithMetadata(buffer []int16, numResults uint) (*Metadata, error)
- type Stream
- func (s *Stream) Discard()
- func (s *Stream) FeedAudioContent(buffer []int16)
- func (s *Stream) Finish() (string, error)
- func (s *Stream) FinishWithMetadata(numResults uint) (*Metadata, error)
- func (s *Stream) IntermediateDecode() (string, error)
- func (s *Stream) IntermediateDecodeWithMetadata(numResults uint) (*Metadata, error)
- type TokenMetadata
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Version ¶
func Version() string
Types ¶
type CandidateTranscript ¶ added in v0.7.0
type CandidateTranscript C.struct_CandidateTranscript
CandidateTranscript is a single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.
func (*CandidateTranscript) Confidence ¶ added in v0.7.0
func (ct *CandidateTranscript) Confidence() float64
Confidence returns the approximated confidence value for this transcript. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcript.
func (*CandidateTranscript) NumTokens ¶ added in v0.7.0
func (ct *CandidateTranscript) NumTokens() uint
func (*CandidateTranscript) Tokens ¶ added in v0.7.0
func (ct *CandidateTranscript) Tokens() []TokenMetadata
type Metadata ¶
type Metadata C.struct_Metadata
Metadata holds an array of CandidateTranscript objects computed by the model.
func (*Metadata) NumTranscripts ¶ added in v0.7.0
func (*Metadata) Transcripts ¶ added in v0.7.0
func (m *Metadata) Transcripts() []CandidateTranscript
type Model ¶
type Model struct {
// contains filtered or unexported fields
}
Model provides an interface to a trained DeepSpeech model.
func (*Model) BeamWidth ¶ added in v0.9.0
func (m *Model) BeamWidth() uint
BeamWidth returns the beam width value used by the model. If SetBeamWidth was not called before, it returns the default value loaded from the model file.
func (*Model) Close ¶
func (m *Model) Close()
Close frees associated resources and destroys the model object.
func (*Model) DisableExternalScorer ¶ added in v0.7.0
func (m *Model) DisableExternalScorer() error
DisableExternalScorer disables decoding using an external scorer.
func (*Model) EnableExternalScorer ¶ added in v0.7.0
func (m *Model) EnableExternalScorer(scorerPath string) error
EnableExternalScorer enables decoding using an external scorer. scorerPath is the path to the external scorer file.
func (*Model) NewStream ¶ added in v0.9.0
func (m *Model) NewStream() (*Stream, error)
NewStream creates a new streaming inference state. If an error is not returned, exactly one of the returned stream's Finish, FinishWithMetadata, or Discard methods must be called later to free resources.
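The "exactly one of Finish, FinishWithMetadata, or Discard" rule above can be sketched as follows. This is a minimal illustration, not the package's own code: audioStream, transcribeChunks, fakeStream, and errRead are hypothetical names introduced here, and the interface only mirrors the documented *Stream method signatures so the sketch compiles without the native DeepSpeech library.

```go
package main

import (
	"errors"
	"fmt"
)

// audioStream mirrors the subset of *Stream methods used here.
// It is a hypothetical helper interface, not part of the package.
type audioStream interface {
	FeedAudioContent(buffer []int16)
	Finish() (string, error)
	Discard()
}

// errRead is an illustrative error for a failing audio source.
var errRead = errors.New("read failed")

// transcribeChunks feeds every chunk returned by next to s and then ends the
// stream exactly once: with Discard if next reports an error, with Finish
// once next reports that no more audio is available.
func transcribeChunks(s audioStream, next func() (chunk []int16, ok bool, err error)) (string, error) {
	for {
		chunk, ok, err := next()
		if err != nil {
			s.Discard() // release the stream without paying for a decode
			return "", err
		}
		if !ok {
			return s.Finish()
		}
		s.FeedAudioContent(chunk)
	}
}

// fakeStream is a stand-in used only to exercise the sketch.
type fakeStream struct {
	samples   int
	finished  int
	discarded int
}

func (f *fakeStream) FeedAudioContent(buffer []int16) { f.samples += len(buffer) }

func (f *fakeStream) Finish() (string, error) {
	f.finished++
	return fmt.Sprintf("%d samples", f.samples), nil
}

func (f *fakeStream) Discard() { f.discarded++ }

func main() {
	chunks := [][]int16{{1, 2, 3}, {4, 5}}
	i := 0
	next := func() ([]int16, bool, error) {
		if i == len(chunks) {
			return nil, false, nil
		}
		i++
		return chunks[i-1], true, nil
	}
	text, err := transcribeChunks(&fakeStream{}, next)
	fmt.Println(text, err)
}
```

With the real package, the value returned by NewStream would presumably be passed in place of fakeStream, since *Stream has the three methods listed in the index.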
func (*Model) SampleRate ¶ added in v0.9.0
func (m *Model) SampleRate() int
SampleRate returns the sample rate that was used to produce the model file.
func (*Model) SetBeamWidth ¶ added in v0.9.0
func (m *Model) SetBeamWidth(width uint) error
SetBeamWidth sets the beam width value used by the model. A larger beam width value generates better results at the cost of decoding time.
func (*Model) SetScorerAlphaBeta ¶ added in v0.7.0
func (m *Model) SetScorerAlphaBeta(alpha, beta float32) error
SetScorerAlphaBeta sets hyperparameters alpha and beta of the external scorer. alpha is the language model weight. beta is the word insertion weight.
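One way the three scorer methods might be combined is sketched below. It assumes that weights are only meaningful once a scorer is enabled, so it enables the scorer first; that ordering is an assumption, not something this page states. scorerModel, configureScorer, fakeModel, and errRejected are hypothetical names, and the path and weight values are placeholders rather than recommended settings.

```go
package main

import (
	"errors"
	"fmt"
)

// scorerModel mirrors the three scorer-related *Model methods so the sketch
// compiles without the native DeepSpeech library. Hypothetical helper type.
type scorerModel interface {
	EnableExternalScorer(scorerPath string) error
	SetScorerAlphaBeta(alpha, beta float32) error
	DisableExternalScorer() error
}

// configureScorer enables the scorer at path and applies the two weights.
// If the weights are rejected it disables the scorer again so the model is
// not left half-configured (a design choice of this sketch).
func configureScorer(m scorerModel, path string, alpha, beta float32) error {
	if err := m.EnableExternalScorer(path); err != nil {
		return fmt.Errorf("enable scorer %q: %w", path, err)
	}
	if err := m.SetScorerAlphaBeta(alpha, beta); err != nil {
		if derr := m.DisableExternalScorer(); derr != nil {
			return fmt.Errorf("set scorer weights: %v; disable scorer: %w", err, derr)
		}
		return fmt.Errorf("set scorer weights: %w", err)
	}
	return nil
}

// errRejected is an illustrative error returned by the fake model.
var errRejected = errors.New("weights rejected")

// fakeModel is a stand-in used only to exercise the sketch.
type fakeModel struct {
	rejectWeights bool
	enabled       bool
}

func (f *fakeModel) EnableExternalScorer(scorerPath string) error {
	f.enabled = true
	return nil
}

func (f *fakeModel) SetScorerAlphaBeta(alpha, beta float32) error {
	if f.rejectWeights {
		return errRejected
	}
	return nil
}

func (f *fakeModel) DisableExternalScorer() error {
	f.enabled = false
	return nil
}

func main() {
	m := &fakeModel{}
	// Placeholder path and weights, chosen only for illustration.
	err := configureScorer(m, "example.scorer", 0.75, 1.85)
	fmt.Println(m.enabled, err)
}
```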
func (*Model) SpeechToText ¶
func (m *Model) SpeechToText(buffer []int16) (string, error)
SpeechToText uses the DeepSpeech model to convert speech to text. buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
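Audio usually arrives as raw bytes, while buffer is a []int16. A minimal conversion sketch follows, assuming the input is headerless, little-endian, 16-bit mono PCM (the byte order is an assumption; this page does not specify it). pcmToInt16 and errOddLength are hypothetical helper names, not part of the package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errOddLength reports input that cannot be split into 16-bit samples.
var errOddLength = errors.New("pcm: byte length is not a multiple of 2")

// pcmToInt16 decodes little-endian 16-bit PCM bytes into signed samples.
func pcmToInt16(raw []byte) ([]int16, error) {
	if len(raw)%2 != 0 {
		return nil, errOddLength
	}
	samples := make([]int16, len(raw)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return samples, nil
}

func main() {
	samples, err := pcmToInt16([]byte{0x01, 0x00, 0xff, 0xff})
	fmt.Println(samples, err) // prints [1 -1] <nil>
}
```

The resulting slice has the element type that SpeechToText and FeedAudioContent accept. Resampling to the rate reported by SampleRate is a separate step that this sketch does not cover.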
func (*Model) SpeechToTextWithMetadata ¶
func (m *Model) SpeechToTextWithMetadata(buffer []int16, numResults uint) (*Metadata, error)
SpeechToTextWithMetadata uses the DeepSpeech model to convert speech to text and output results including metadata.
buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). numResults is the maximum number of CandidateTranscript structs to return; the number actually returned may be smaller. If an error is not returned, the returned metadata's Close method must be called later to free resources.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
Stream represents a streaming inference state.
func (*Stream) Discard ¶ added in v0.9.0
func (s *Stream) Discard()
Discard destroys a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.
func (*Stream) FeedAudioContent ¶
func (s *Stream) FeedAudioContent(buffer []int16)
FeedAudioContent feeds audio samples to an ongoing streaming inference. buffer is an array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
func (*Stream) Finish ¶ added in v0.9.0
func (s *Stream) Finish() (string, error)
Finish computes the final decoding of an ongoing streaming inference and returns the result. This signals the end of an ongoing streaming inference.
func (*Stream) FinishWithMetadata ¶ added in v0.9.0
func (s *Stream) FinishWithMetadata(numResults uint) (*Metadata, error)
FinishWithMetadata computes the final decoding of an ongoing streaming inference and returns results including metadata. This signals the end of an ongoing streaming inference. If an error is not returned, the metadata's Close method must be called.
func (*Stream) IntermediateDecode ¶
func (s *Stream) IntermediateDecode() (string, error)
IntermediateDecode computes the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn't currently capable of streaming, so it always starts from the beginning of the audio.
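Because each intermediate decode restarts from the beginning of the audio, a caller may want to request partial results only occasionally rather than after every chunk. The sketch below shows one possible throttling scheme; partialDecoder, feedWithPartials, and fakeDecoder are hypothetical names, and the interval of five chunks is an arbitrary illustration, not a tuned value.

```go
package main

import "fmt"

// partialDecoder mirrors the two *Stream methods used here so the sketch
// compiles without the native DeepSpeech library. Hypothetical helper type.
type partialDecoder interface {
	FeedAudioContent(buffer []int16)
	IntermediateDecode() (string, error)
}

// feedWithPartials feeds all chunks to s and calls onPartial with an
// intermediate transcript after every `every` chunks. A value of every <= 0
// disables intermediate decoding entirely.
func feedWithPartials(s partialDecoder, chunks [][]int16, every int, onPartial func(string)) error {
	for i, chunk := range chunks {
		s.FeedAudioContent(chunk)
		if every > 0 && (i+1)%every == 0 {
			text, err := s.IntermediateDecode()
			if err != nil {
				return err
			}
			onPartial(text)
		}
	}
	return nil
}

// fakeDecoder is a stand-in that counts how often a decode was requested.
type fakeDecoder struct {
	fed     int
	decodes int
}

func (f *fakeDecoder) FeedAudioContent(buffer []int16) { f.fed += len(buffer) }

func (f *fakeDecoder) IntermediateDecode() (string, error) {
	f.decodes++
	return fmt.Sprintf("partial after %d samples", f.fed), nil
}

func main() {
	chunks := make([][]int16, 10)
	for i := range chunks {
		chunks[i] = make([]int16, 320)
	}
	d := &fakeDecoder{}
	err := feedWithPartials(d, chunks, 5, func(text string) { fmt.Println(text) })
	fmt.Println(d.decodes, err)
}
```

The stream still has to be ended afterwards with Finish, FinishWithMetadata, or Discard; the helper deliberately leaves that to the caller.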
func (*Stream) IntermediateDecodeWithMetadata ¶ added in v0.7.0
func (s *Stream) IntermediateDecodeWithMetadata(numResults uint) (*Metadata, error)
IntermediateDecodeWithMetadata computes the intermediate decoding of an ongoing streaming inference, returning results including metadata. numResults is the number of candidate transcripts to return. If an error is not returned, the metadata's Close method must be called.
type TokenMetadata ¶ added in v0.7.0
type TokenMetadata C.struct_TokenMetadata
TokenMetadata stores text of an individual token, along with its timing information.
func (*TokenMetadata) StartTime ¶ added in v0.7.0
func (tm *TokenMetadata) StartTime() float32
StartTime returns the position of the token in seconds.
func (*TokenMetadata) Text ¶ added in v0.7.0
func (tm *TokenMetadata) Text() string
Text returns the text corresponding to this token.
func (*TokenMetadata) Timestep ¶ added in v0.7.0
func (tm *TokenMetadata) Timestep() uint
Timestep returns the position of the token in units of 20ms.
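Since a timestep is documented as a unit of 20ms, it can be turned into a time.Duration with a single multiplication. The sketch below shows that arithmetic; timestepToDuration and timestepDuration are hypothetical names, and whether the result matches StartTime exactly is not stated on this page.

```go
package main

import (
	"fmt"
	"time"
)

// timestepDuration is the unit length given in the Timestep documentation.
const timestepDuration = 20 * time.Millisecond

// timestepToDuration converts a timestep index into an offset from the
// start of the audio.
func timestepToDuration(timestep uint) time.Duration {
	return time.Duration(timestep) * timestepDuration
}

func main() {
	fmt.Println(timestepToDuration(50)) // prints 1s
}
```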