gosling

command module
v0.1.1
Published: Jul 9, 2022 License: MIT Imports: 7 Imported by: 0

README

gosling

Natural sounding text-to-speech in the terminal (and more).

Pre-requisites

This is NOT intended to be a completely-free, pick-up-and-use TTS solution. In fact, it is simply a wrapper around Google's Cloud Text-to-Speech API.

You will need:

  • A GCP account with billing enabled.
    • Google gives you 1 million characters free every month. That's nearly 10 books a month. It's essentially free for personal use.
    • Once you have a GCP account, enable the TTS API and get a service account.
    • Export service account credentials in your shell. You will need to do this every time you open a new shell. Add it to your shell configuration or make a script to run gosling for convenience.
      export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
      
  • Internet connection every time you need some text spoken to you.
  • I have only tested this on Linux. Commands for playing audio will be different on other platforms.
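
The credentials step above is easy to forget, so a wrapper script can check it before calling gosling. This is only a sketch: check_credentials is a hypothetical helper name, not something gosling provides, and it only verifies that the variable is set and points to a readable file, not that the service account is valid.

```shell
# Hypothetical helper (not part of gosling): fail early when the
# credentials variable is unset or does not point to a readable file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
}
```

A wrapper could then run something like check_credentials && gosling input.txt output.mp3.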

Examples

Simple text with default options

https://user-images.githubusercontent.com/34161949/178104531-73298a8e-753f-4910-94c6-7cea9a85337a.mp4

Numbers and punctuation with default options

(the multiple exclamations are something that I have seen other TTSs struggle with):

Welcome to gosling!!! It has options such as "Pitch adjustment" in the range -20.0 to 20.0, "Speaking rate/speed" in the range 0.25 to 4.0 and "Volume gain" (in dB) in the range -96.0 to 16.0.

https://user-images.githubusercontent.com/34161949/178104603-f8c46b93-4d38-4f71-bdc0-d3d4b3e47b05.mp4

Other languages

Kannada:

https://user-images.githubusercontent.com/34161949/178105235-19e921c7-355b-4e66-8c3e-e962718002aa.mp4

Check out the full voice list. Use WaveNet-based or Neural2-based voices for better quality.

Installation

Pre-built binaries

Go to the latest release, scroll down to "Assets" and download the correct file for your platform. Unzip the file and run the gosling binary inside:

./gosling

If you have Go installed

go install github.com/Samyak2/gosling@latest

Usage

Text file

gosling input.txt output.mp3

Play the resulting output.mp3 file using your audio player.

Standard input

echo "hello there" | gosling - output.mp3

Play audio directly

If you have the play command, which is usually a part of the sox package (sudo dnf install sox on Fedora):

echo "hello there" | gosling - - | play -t mp3 -

If you have the ffplay command, which is a part of ffmpeg:

echo "hello there" | gosling - - | ffplay -nodisp -autoexit -
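
If a script should work with whichever of the two players is installed, it can pick one at run time. A minimal sketch, assuming only the two commands shown above; pick_player is a hypothetical helper name:

```shell
# Hypothetical helper: print the player command line for the first
# available player, preferring play (sox) over ffplay (ffmpeg).
pick_player() {
  if command -v play >/dev/null 2>&1; then
    echo "play -t mp3 -"
  elif command -v ffplay >/dev/null 2>&1; then
    echo "ffplay -nodisp -autoexit -"
  else
    echo "no audio player found (tried play and ffplay)" >&2
    return 1
  fi
}

# Usage (left commented out so that sourcing this file has no side effects):
#   player="$(pick_player)" && echo "hello there" | gosling - - | $player
```

The $player variable is deliberately left unquoted in the usage line so that the shell splits it into the command and its arguments.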

Options

gosling has many options for language, voice, and audio settings.

See gosling --help for all options.

Usage: gosling <input-file> <output-file>

Arguments:
  <input-file>     Text file to read from. Use - for standard input.
  <output-file>    Audio file to write to. Use - for standard output.

Flags:
  -h, --help                            Show context-sensitive help.
  -l, --language-code="en-US"           Language code to use for the synthesis. See full list at: https://cloud.google.com/text-to-speech/docs/voices
  -v, --voice-name="en-US-Wavenet-A"    Voice name to use for the synthesis. Use an empty string to let the GCP API choose. See full list at: https://cloud.google.com/text-to-speech/docs/voices
      --pitch=-3                        Pitch adjustment in the range [-20.0, 20.0]. Use a negative number to decrease the pitch. See:
                                        https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
  -r, --speaking-rate=1.0               Speaking rate/speed in the range [0.25, 4.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
      --volume-gain=0.0                 Volume gain (in dB) in the range [-96.0, 16.0]. See: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig
  -s, --[no-]ssml                       Use if text has SSML. Default is plain text. See: https://cloud.google.com/text-to-speech/docs/basics#speech_synthesis_markup_language_ssml_support
      --service-endpoint=STRING         GCP Service Endpoint. You'll need to set this if you want a Neural2 voice. See: https://cloud.google.com/text-to-speech/docs/endpoints.
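
With --ssml, the input has to be SSML rather than plain text. As a rough starting point, the sketch below escapes the XML special characters in plain text and wraps the result in a <speak> element, the root element of an SSML document. to_ssml is a hypothetical helper name and is not part of gosling; you would normally add tags such as pauses by hand afterwards.

```shell
# Hypothetical helper: read plain text on stdin and print it as a minimal
# SSML document. "&" is escaped first so the other entities stay intact.
to_ssml() {
  printf '<speak>%s</speak>\n' \
    "$(sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')"
}

# Usage (commented out; assumes flags may follow the positional arguments,
# as in the Neural2 example in this README):
#   to_ssml < input.txt | gosling - output.mp3 --ssml
```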

FAQ

The voice sounds too robotic

WaveNet

For the default language (en-US), gosling already uses a WaveNet-based voice (en-US-Wavenet-A). If you use a different language, switch to a WaveNet-based voice for that language with --voice-name.

Neural2

If WaveNet is not good enough, try using a Neural2 voice type (search for Neural2 in the voice list if you need other languages):

gosling input.txt output.mp3 --service-endpoint 'https://us-central1-texttospeech.googleapis.com' -v en-US-Neural2-A

TODO: this endpoint is currently timing out for all TTS requests, not sure why.

If Neural2 isn't good enough either, well... you'll have to take this up with Google.

Why am I getting the error "google: could not find default credentials"?

Either:

  • You did not read the Pre-requisites section.
  • You forgot to export the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell.
  • Something is wrong with your GCP service account. See this page that is also linked from the error.

Why don't --pitch and --volume-gain have short versions?

These options can take negative values, and the command-line parser I use behaves weirdly with negative numbers and short flags. I removed the short versions so that this does not become a pitfall.
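
Since these flags accept negative numbers, a script may want to validate a value before handing it to gosling. A minimal sketch, assuming the ranges printed by gosling --help; in_range is a hypothetical helper name, and awk does the floating-point comparison because plain shell arithmetic only handles integers:

```shell
# Hypothetical helper: succeed when VALUE is a decimal number
# with MIN <= VALUE <= MAX.
# Usage: in_range VALUE MIN MAX
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    ok = (v ~ /^-?[0-9]+(\.[0-9]+)?$/) && (v + 0 >= lo + 0) && (v + 0 <= hi + 0)
    exit !ok
  }'
}

# Usage (commented out so that sourcing this file does not call gosling):
#   pitch=-5
#   in_range "$pitch" -20 20 && gosling input.txt output.mp3 --pitch="$pitch"
```

The --pitch=VALUE form mirrors how the help output prints the default (--pitch=-3). It should keep a negative value attached to its flag, though I have not checked every parser edge case.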

How do I use this with foliate?

I use this script:

#!/bin/bash
# requires gosling and sox
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
gosling - - | play -t mp3 - &
trap 'kill $!; exit 0' INT
wait

Save this script to a file and make it executable with chmod +x /path/to/foliate-gosling.sh.

TODO: this only works with English text. I need to figure out a way to convert FOLIATE_TTS_LANG_LOWER to Google's format.
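
One possible direction for this TODO, as an untested sketch: it assumes FOLIATE_TTS_LANG_LOWER holds a lowercased tag such as en-us, which I have not verified against foliate, and it only handles tags of the form language or language-region. to_google_lang is a hypothetical helper name.

```shell
# Hypothetical helper: turn a lowercased tag like "en-us" into the
# "en-US" form used by --language-code. Tags without a region are
# returned unchanged.
to_google_lang() {
  local tag="$1" lang region
  lang="${tag%%-*}"
  if [ "$lang" = "$tag" ]; then
    printf '%s\n' "$lang"
  else
    region="$(printf '%s' "${tag#*-}" | tr '[:lower:]' '[:upper:]')"
    printf '%s-%s\n' "$lang" "$region"
  fi
}

# Usage inside the script above (commented out). -v '' lets the GCP API
# choose a voice, since the default voice is an en-US one:
#   lang="$(to_google_lang "$FOLIATE_TTS_LANG_LOWER")"
#   gosling - - -l "$lang" -v '' | play -t mp3 - &
```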

But why?

When I'm too lazy to read an article, I use Google Assistant's "read me this article" feature on my phone. It's extremely good, especially with text-only articles. I could not find an alternative on desktop (specifically, Linux).

Yes, there are quite a few text-to-speech apps on Linux. Most of them sound either like R2D2 or like something from the depths of the void. The only one I found that sounds bearable uses an undocumented Google Translate API (probably a ToS violation?). There are also some pre-trained neural-network-based models, but they sound like a person speaking through a very low-bandwidth voice call, and they skip over numbers and abbreviations as if they never existed.

The only text-to-speech that sounded good was Google's. So I thought - "they must have a GCP API for this". And they did. And I hacked this together.

TODO

  • speech-dispatcher support. This will allow using it in Firefox's reader mode, for example.
  • Some pre-processing of raw text - remove extra/unnecessary punctuation, better formatting for numbers, etc.
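
Until pre-processing is built in, a filter in front of gosling can do some of it. The sketch below covers only one case, collapsing runs of repeated exclamation or question marks. squeeze_punct is a hypothetical helper name, and whether this actually improves the spoken output is untested.

```shell
# Hypothetical helper: collapse runs of the same "!" or "?" into one.
# Periods are left alone so that ellipses ("...") are not changed.
squeeze_punct() {
  sed -E 's/([!?])\1+/\1/g'
}

# Usage (commented out so that sourcing this file does not call gosling):
#   squeeze_punct < input.txt | gosling - output.mp3
```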

License

MIT

Documentation

Overview

Command quickstart generates an audio file with the content "Hello, World!".
