This is more of a personal project to learn how speech recognition (SR) and AI training work at a low level. (Functionally, it’s pointless; it’s just a self-assigned “homework problem.”)
To do this, I need to record some audio to use as training data.
Recording and chopping up .wav files is easy, but it’s time-consuming. I’m toying with my own teleprompter-like Python app that prompts for a word, records and tags the clip, and saves it for later. However, is there already a good app that automatically creates tagged utterances?
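A minimal sketch of the teleprompter-style loop, using only the standard library. The recorder is passed in as a function so the loop can run without a microphone; `fake_recorder` just generates a tone and is a placeholder. All names here (`prompt_and_record`, `write_wav`, `manifest.json`) and the 16 kHz mono format are my own assumptions, not taken from an existing tool.

```python
import json
import math
import struct
import wave
from pathlib import Path
from typing import Callable, Iterable, List

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, 16-bit PCM


def write_wav(path: Path, samples: List[int], sample_rate: int = SAMPLE_RATE) -> None:
    """Write 16-bit mono PCM samples to a .wav file using only the stdlib."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def prompt_and_record(words: Iterable[str],
                      record_fn: Callable[[str], List[int]],
                      out_dir: Path) -> Path:
    """Prompt for each word, record it, save the clip, and write a label manifest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for i, word in enumerate(words):
        print(f"Say: {word!r}")
        samples = record_fn(word)
        clip = out_dir / f"{i:04d}_{word}.wav"
        write_wav(clip, samples)
        manifest.append({"file": clip.name, "label": word})  # the "tag"
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def fake_recorder(word: str, seconds: float = 0.5) -> List[int]:
    """Placeholder for a microphone: returns a 440 Hz tone. Replace with real capture."""
    n = int(seconds * SAMPLE_RATE)
    return [int(8000 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)) for t in range(n)]
```

For real capture, one option (an assumption, not tested here) is python-sounddevice: `sd.rec(n, samplerate=16000, channels=1, dtype="int16")` followed by `sd.wait()`, then flattening the returned array into a list before passing it to `write_wav`.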
Ideally, words that my own SR system fails to recognize would automatically be turned into tagged audio clips for retraining or fine-tuning.
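One way the auto-tagging idea could look, as a sketch: when the recognizer returns an empty or low-confidence result, save the clip with a JSON sidecar holding the hypothesis and an empty `label` field to fill in by hand before retraining. `Recognition`, `save_for_retraining`, and the 0.6 threshold are hypothetical; they assume your own recognizer reports a confidence between 0 and 1.

```python
import json
import struct
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Recognition:
    """Hypothetical result from your own recognizer (not a real library type)."""
    text: str
    confidence: float  # assumed to be in [0, 1]


def save_for_retraining(samples: List[int], result: Recognition, out_dir: Path,
                        threshold: float = 0.6,
                        sample_rate: int = 16_000) -> Optional[Path]:
    """Save an empty or low-confidence recognition as a clip awaiting a label.

    Returns the .wav path if the clip was saved, or None if the result was
    confident enough to skip.
    """
    if result.text and result.confidence >= threshold:
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    # Number clips by how many are already queued in the directory.
    index = len(list(out_dir.glob("review_*.wav")))
    clip = out_dir / f"review_{index:04d}.wav"
    with wave.open(str(clip), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    # Sidecar metadata: the recognizer's guess plus a label to fill in by hand.
    clip.with_suffix(".json").write_text(json.dumps({
        "file": clip.name,
        "hypothesis": result.text,
        "confidence": result.confidence,
        "label": None,
    }, indent=2))
    return clip
```

Keeping the human-verified `label` separate from the recognizer's `hypothesis` makes it harder to accidentally retrain on the system's own mistakes.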
For my first dataset I’m taking a shortcut in Python by using Google’s speech recognition. Unfortunately, calling an external API defeats the purpose of this project, so I’ll move away from it soon.
People who work with AI usually deal with lots of data, so I figured this was a good place to ask.
I found this as a starting point: https://github.com/cmusphinx/pocketsphinx/blob/master/cython/pocketsphinx/segmenter.py
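If the goal is to split long recordings into utterances automatically, a crude energy-based segmenter is a reasonable first experiment. This is a plain RMS-threshold sketch, not the algorithm the linked segmenter.py uses; the frame size, threshold (on the int16 sample scale), and minimum silence length are assumed values that would need tuning for a real microphone.

```python
import math
from typing import List, Tuple


def segment_by_energy(samples: List[int], sample_rate: int = 16_000,
                      frame_ms: int = 20, threshold: float = 500.0,
                      min_silence_frames: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions whose frame RMS >= threshold.

    A segment ends only after `min_silence_frames` consecutive quiet frames,
    so short pauses inside a word do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    segments: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = f * frame_len  # speech begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # End at the last loud frame, not at the end of the silence.
                end = (f - silent_run + 1) * frame_len
                segments.append((start, end))
                start, silent_run = None, 0
    if start is not None:  # recording ended mid-utterance
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

Each `(start, end)` pair can then be sliced out as `samples[start:end]` and written to its own .wav file for tagging.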