My experience navigating the confusing world of open source voice-to-text.
Getting your computer to talk is easy; getting it to listen is a little harder. In the current era of digital assistants like Siri, Cortana, and Alexa scooping up our conversations and sending them to their corporate masters for analysis and processing (with various international intelligence agencies intercepting them along the way), many of us are uncomfortable with these electronic assistants but worry that we will be left behind if we do not embrace the new interface. Wouldn't it be great to have a talking computer that is completely under your control?
Well, you can, but there is currently no simple, end-to-end open source solution for voice recognition and control. Even Mycroft, the open source AI assistant, lacks an integrated speech-to-text engine and instead relies on cloud processing.
I will survey the current state of open source speech-to-text, including Simon, CMU Sphinx, and VoxForge: where the state of the art stands, how it compares to some of the commercial solutions (DeepMind and others), and how we can help push it forward.
You will learn how the different parts of Simon fit together, what sorts of files the wizard is asking for when you first start it up, and how to train Simon to understand your own voice better. You will also gain a better understanding of some of the underlying technology, such as Hidden Markov Models, and see how to use voice control with a simpler, non-Simon interface like Steve Hickson's VoiceCommand program for the Raspberry Pi.
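To make the Hidden Markov Model idea concrete before the talk, here is a toy forward-algorithm sketch in plain Python. It computes the likelihood of an observation sequence under a two-state model; the states, probabilities, and observations are invented for illustration, but the recursion is the same one an HMM-based recognizer runs over phoneme states and acoustic frames.

    # Toy forward algorithm: the core likelihood computation inside an
    # HMM-based recognizer. All probabilities here are made up.
    states = ["silence", "speech"]

    start_p = {"silence": 0.6, "speech": 0.4}
    trans_p = {
        "silence": {"silence": 0.7, "speech": 0.3},
        "speech":  {"silence": 0.4, "speech": 0.6},
    }
    emit_p = {
        "silence": [0.9, 0.1],  # P(observation | silence)
        "speech":  [0.2, 0.8],  # P(observation | speech)
    }

    def forward(observations):
        """Return P(observation sequence) under the model."""
        # Initialize with start probabilities times the first emission.
        alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
        # Propagate forward one observation at a time.
        for obs in observations[1:]:
            alpha = {
                s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
                   * emit_p[s][obs]
                for s in states
            }
        return sum(alpha.values())

    # Likelihood of a short observation sequence (indices into emit_p tables).
    print(forward([0, 1, 1]))

A recognizer runs this kind of computation for many candidate word models and picks the one that assigns the audio the highest likelihood.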
Discussion and installation of Simon and PocketSphinx (a minimal PocketSphinx test appears after this outline)
Configuration of Simon
Testing
Final discussion
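As a taste of the testing stage, here is a minimal live-transcription sketch using the PocketSphinx Python bindings. It assumes the pocketsphinx package and its bundled default US English model are installed (for example via pip); the exact API has varied between releases, so treat this as illustrative rather than definitive.

    # Minimal live-transcription sketch with the pocketsphinx Python bindings.
    # Assumes `pip install pocketsphinx` and a working default microphone.
    from pocketsphinx import LiveSpeech

    # LiveSpeech listens on the default audio input and yields one
    # recognized hypothesis per detected utterance.
    for phrase in LiveSpeech():
        print(phrase)

Out of the box the accuracy is modest; the talk covers how training and a constrained vocabulary improve results considerably.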
People who like to understand technology and who are willing to work towards a solution.
Basic knowledge of Linux, package managers, and compiling software from source (make, cmake).