DeepSpeech basics
What is DeepSpeech
From Mozilla's GitHub repository for DeepSpeech:
"DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier."
Virtual environment
First, let's create a virtual environment for DeepSpeech.
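A minimal sketch of this step, assuming python3 ships with the built-in venv module. The directory name deepspeech-venv is only an illustrative choice.

```shell
# Assumption: python3 with the built-in venv module is on the PATH.
# The directory name is a hypothetical choice; any path works.
VENV_DIR="./deepspeech-venv"

# Create the environment, then activate it for the current shell session.
python3 -m venv "$VENV_DIR"
. "$VENV_DIR/bin/activate"
```

Once activated, pip installs packages into this directory instead of the system Python.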
Install deepspeech
The only required package is deepspeech
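The install can be sketched as a small helper, assuming pip3 points at the active virtual environment. The function names are hypothetical. The optional version argument is there because the pre-trained model files are published per release, so it can help to pin the package to the same release as the model.

```shell
# Build the pip requirement string; an optional version argument pins the release.
deepspeech_requirement() {
  echo "deepspeech${1:+==$1}"
}

# Install the package into the active virtual environment.
install_deepspeech() {
  pip3 install --upgrade pip
  pip3 install "$(deepspeech_requirement "$1")"
}
```

Calling install_deepspeech with no argument installs the latest release, while install_deepspeech 0.9.3 pins that version.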
Download Model
A pre-trained English model is available for download.
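Here is a sketch of the download, assuming release 0.9.3. I am not certain this matches the version used for the results in this article, and other releases may package the files differently. The helper name download_model is hypothetical.

```shell
# Assumption: release 0.9.3; adjust DS_VERSION to the release you installed.
DS_VERSION="0.9.3"
RELEASE_URL="https://github.com/mozilla/DeepSpeech/releases/download/v${DS_VERSION}"
MODEL_FILE="deepspeech-${DS_VERSION}-models.pbmm"
SCORER_FILE="deepspeech-${DS_VERSION}-models.scorer"

# Download the acoustic model and the external scorer into the current directory.
download_model() {
  curl -LO "${RELEASE_URL}/${MODEL_FILE}"
  curl -LO "${RELEASE_URL}/${SCORER_FILE}"
}
```

Running download_model fetches both files; the model is large, so the download can take a while.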
Download audio files
You can download some example audio files
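A sketch of fetching the sample audio, again assuming release 0.9.3 (the version is my assumption, and download_audio is a hypothetical helper name). The archive unpacks into an audio/ directory of short WAV clips.

```shell
# Assumption: release 0.9.3, matching the model download.
DS_VERSION="0.9.3"
AUDIO_ARCHIVE="audio-${DS_VERSION}.tar.gz"
AUDIO_URL="https://github.com/mozilla/DeepSpeech/releases/download/v${DS_VERSION}/${AUDIO_ARCHIVE}"

# Download the sample archive and extract it into ./audio/.
download_audio() {
  curl -LO "$AUDIO_URL"
  tar xvf "$AUDIO_ARCHIVE"
}
```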
Run inference
We can now transcribe the audio file
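A sketch of the inference call, assuming the 0.9.3 file names and the --model, --scorer and --audio flags of the deepspeech command-line client. Older releases used different language-model flags, so check deepspeech --help if this fails. The helper name transcribe and the sample file in the usage note are assumptions.

```shell
# Assumption: model and scorer from release 0.9.3 sit in the current directory.
MODEL_FILE="deepspeech-0.9.3-models.pbmm"
SCORER_FILE="deepspeech-0.9.3-models.scorer"

# Transcribe one WAV file; the recognized text is printed to stdout.
transcribe() {
  deepspeech --model "$MODEL_FILE" --scorer "$SCORER_FILE" --audio "$1"
}
```

For example, transcribe audio/2830-3980-0043.wav runs it on one of the sample clips.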
If you ran the above command and are using the same model as me, you should see something like "experience proofsless".
Not perfect, but we can try it out on our own voice as well.
Record a wav file
For deepspeech to run inference correctly you will need to record your voice with some specific parameters.
- Sampling rate: 16 kHz
- Channels: 1 (mono)
- Bit rate: 256 kb/s (16-bit samples)
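The three parameters are linked: for uncompressed PCM audio, the bit rate is the sampling rate times the bits per sample times the number of channels. A quick check of that arithmetic, assuming 16-bit samples:

```shell
SAMPLE_RATE=16000      # samples per second (16 kHz)
BITS_PER_SAMPLE=16     # assumption: 16-bit PCM
CHANNELS=1             # mono

# bit rate = sampling rate * bits per sample * channels, converted to kb/s
BIT_RATE_KBPS=$(( SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS / 1000 ))
echo "${BIT_RATE_KBPS} kb/s"   # prints "256 kb/s"
```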
We can achieve this using the sox package.
If you're on Ubuntu: sudo apt install sox
Arch Linux: sudo pacman -S sox
Mac (with Homebrew): brew install sox
After installing sox you should have access to the rec command, which we will use to record our voice.
To begin recording your voice, enter the following command:
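A sketch of the recording step using sox's rec with the -r (sample rate), -c (channels) and -b (bits per sample) options. The helper name record_voice and the default file name my_voice.wav are hypothetical.

```shell
SAMPLE_RATE=16000   # 16 kHz
CHANNELS=1          # mono
BITS=16             # 16-bit samples, which gives 256 kb/s

# Record from the default input device; stop the recording with Ctrl+C.
record_voice() {
  rec -r "$SAMPLE_RATE" -c "$CHANNELS" -b "$BITS" "${1:-my_voice.wav}"
}
```

Calling record_voice writes my_voice.wav to the current directory; pass a different file name as the first argument to change that.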
To make sure you have recorded the audio in the proper format, we can install another package called mediainfo and run it like so:
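A sketch of the check, assuming mediainfo is already installed from your package manager (on Ubuntu, I would expect sudo apt install mediainfo to work). The helper names are hypothetical; the filter simply keeps the report lines that matter for DeepSpeech.

```shell
# Keep only the sampling rate, channel count and bit rate lines of a report.
wav_fields() {
  grep -E 'Sampling rate|Channel\(s\)|Bit rate'
}

# Print the relevant fields for one audio file.
check_wav() {
  mediainfo "$1" | wav_fields
}
```

For example, check_wav my_voice.wav prints just those fields; run mediainfo my_voice.wav directly to see the full report.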
In the output, the audio section should report a sampling rate of 16.0 kHz, 1 channel, and a bit rate of 256 kb/s.
Run inference
Now we can run inference on our own voice data
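A sketch of this last step, assuming the 0.9.3 model files and a recording named my_voice.wav (both are my assumptions, and transcribe_recording is a hypothetical helper). It checks that the recording exists before calling deepspeech.

```shell
# Assumptions: release 0.9.3 model files and a recording called my_voice.wav.
MODEL_FILE="deepspeech-0.9.3-models.pbmm"
SCORER_FILE="deepspeech-0.9.3-models.scorer"
RECORDING="my_voice.wav"

# Transcribe the recording, failing early if the file is missing.
transcribe_recording() {
  if [ ! -f "$RECORDING" ]; then
    echo "Recording not found: $RECORDING" >&2
    return 1
  fi
  deepspeech --model "$MODEL_FILE" --scorer "$SCORER_FILE" --audio "$RECORDING"
}
```

Short, clearly spoken sentences are a good first test, since the accuracy on your own voice may differ from the sample clips.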
Wrapping up
In the next article I'll go over running inference on a GPU