Testing out DeepSpeech for speech-to-text recognition

published Feb 05, 2019 02:05   by admin ( last modified Feb 05, 2019 04:01 )

The input: 16 minutes of heavily compressed, "mushy" speech in British-ish English. I needed a text transcript to use as a script for redoing the audio, so I tried DeepSpeech 0.4.1 with its supplied model.

./bin/deepspeech --model models/output_graph.pb \
--alphabet models/alphabet.txt \
--trie models/trie --lm models/lm.binary \
--audio resampled-mono.wav > 16monoklr.txt
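
Since the tries below differ only in sample rate and channel count, it can help to confirm what a WAV file actually contains before passing it to --audio. The following is a minimal sketch using Python's standard wave module. The function names describe_wav and matches_expected_format are my own, and the default target of 16 kHz, mono, 16-bit is an assumption about what the supplied model expects, not something verified in this test.

```python
import wave


def describe_wav(path):
    """Read the basic header fields of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def matches_expected_format(info, rate=16000, channels=1, width=2):
    """Check the header against a target format.

    The defaults (16 kHz, mono, 16-bit) are an assumption about the
    input the supplied model works best with.
    """
    return (
        info["sample_rate_hz"] == rate
        and info["channels"] == channels
        and info["sample_width_bytes"] == width
    )
```

Calling describe_wav("resampled-mono.wav") and passing the result to matches_expected_format would show whether the resampling step produced the intended format.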

For comparison, here is what Google transcribes from the same file:

this is the Swedish Land Registry logjam products contract blockchain system a Croma way running on s flex so I'd open a number of tabs here for each participant into the smart contract seller buyer sellers and buyers Bank the Land Registry itself and another server which in this case could be the demo…

First try: a 48 kHz stereo WAV file. The result is close to garbage and far too short:

his is the way should under his dream look him potomac changes and i conteanyng on spikes so i oftener of catechism in to the smart contract celeritate sang biting the understreak so i don't know server wichitas be the democrats not stage she told some pecorino is…

Second try: a 16 kHz stereo version of the same file. The result below is the entire contents of the output file:

ohhhhhhhhh ohhhhhhhhh hhhhhhhhow hhhhhhhhow hhhhhhhhow here

Third try: a 16 kHz mono version of the same file. The result is again close to garbage and far too short:

this is these we should under destrem look him or he come look changed and a craney running on specs so i don't get a number of caterers to the smart contract selerier said or sankirtan the understreak self and an observer which he is caste democrat…
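
To put a rough number on "close to garbage", one option is a word error rate: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a plain dynamic-programming implementation. The function name word_error_rate is my own, and using Google's transcript as the reference is an assumption made for illustration, since no hand-checked transcript of the audio exists here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Opening words of two of the transcripts quoted above. Google's output
# is only a stand-in reference, not a verified ground truth.
google_opening = "this is the Swedish Land Registry"
third_try_opening = "this is these we should under"
rate = word_error_rate(google_opening, third_try_opening)
```

A value of 0.0 means the two texts match word for word, and values can exceed 1.0 when the hypothesis is much longer than the reference. Comparing the word counts of the full transcripts would similarly quantify "far too short".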