The first thing you need to do is build a language model or a grammar.
The grammar can be something simple written in a format called JSGF (JSpeech Grammar Format), and this is the easier way to get a speech recognizer up and running. Alternatively, you can use a statistical language model, which can be built by following the instructions on the Sphinx site. You start from a file of sentences like this:
<s> I WANT A NEXTCUBE ZERO FOUR ZERO </s>
<s> I WANT THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED A NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING THE NEXTCUBE ZERO FOUR ZERO </s>
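A sentence file like the one above is very regular, so it can be generated rather than typed by hand. Here is a minimal sketch in Python, assuming the phrase lists below (copied from the sample sentences); the names build_corpus and write_corpus are hypothetical helpers, not part of Sphinx:

```python
from itertools import product

# Phrase lists copied from the sample sentences; extend them for your own domain.
VERBS = ["I WANT", "I NEED", "I AM LOOKING FOR", "I AM SEEKING"]
ARTICLES = ["A", "THE"]
ITEM = "NEXTCUBE ZERO FOUR ZERO"


def build_corpus(verbs=VERBS, articles=ARTICLES, item=ITEM):
    """Return one '<s> ... </s>' line per verb/article combination."""
    lines = []
    for verb, article in product(verbs, articles):
        # Words are upper-cased; only the sentence boundary markers stay lower case.
        sentence = f"{verb} {article} {item}".upper()
        lines.append(f"<s> {sentence} </s>")
    return lines


def write_corpus(path, lines):
    """Write the corpus to a text file, one sentence per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    for line in build_corpus():
        print(line)
```

Calling write_corpus with a file name of your choice saves the sentences to disk, and that file is what you would then feed to the language model tools described on the Sphinx site.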
A sample JSGF file, modified from the sample on the Sphinx website, is shown next. Note that I’ve put all the words in capitals because the CMU phonetic dictionary lists all its words in caps. Make sure that any language model is in all caps as well, except for the sentence boundary markers <s> and </s>:
#JSGF V1.0;
/** JSGF Grammar for Hello World example */
grammar hello;
public <greet> = (GOOD MORNING | HELLO | HI) (PAUL | RITA | WILL);
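To see exactly which utterances the <greet> rule above accepts, you can expand it by hand. The following Python sketch does not parse JSGF; it simply hard-codes the two groups of alternatives from the rule and lists every combination. The function names expand_greet and lowercase_words are hypothetical:

```python
from itertools import product

# The two groups of alternatives from the <greet> rule, copied by hand.
GREETINGS = ["GOOD MORNING", "HELLO", "HI"]
NAMES = ["PAUL", "RITA", "WILL"]


def expand_greet(greetings=GREETINGS, names=NAMES):
    """Return every utterance the rule accepts: one greeting followed by one name."""
    return [f"{greeting} {name}" for greeting, name in product(greetings, names)]


def lowercase_words(utterances):
    """Return the words that are not fully upper case (these would not match an all-caps dictionary)."""
    return sorted({word for u in utterances for word in u.split() if word != word.upper()})


if __name__ == "__main__":
    for utterance in expand_greet():
        print(utterance)
    print("Words not in caps:", lowercase_words(expand_greet()))
```

The rule accepts three greetings times three names, so nine utterances in all. The lowercase_words check is a quick way to catch words that are not in all caps before handing the grammar to the recognizer.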
Imagine a day when spoken language sentences can be recognized perfectly by a machine.
It suddenly becomes much easier and more natural to issue commands and pose queries in a natural language rather than, say, in XML. Imagine speaking a C program to someone, syntax and all. That would be really weird!
We recently evaluated spoken language programming by combining a speech recognizer with Vaklipi, our fifth-generation programming language.
The results were frankly disappointing.
Here’s a sample conversation between me and my computer:
Cohan: b is equal to two.
Sphinx hears: b is e equal to two.
Vaklipi [error]: I can’t understand …
Cohan: b equals two.
Sphinx hears: b equals two.
Cohan: What is b.
Sphinx hears: What is 8.
Vaklipi: 8.0
Cohan: What is b.
Sphinx hears: What is b.
Thanks to all the people who filled out the surveys we sent out and helped evaluate Vaklipi. We also thank the people who helped us port the system to other languages: Kannada (Ms. K. G. Padma Lekha, Mr. K. G. Srikanta Dani, Dr. K. R. Ganesha, and Mr. Rupesh Kumar G.), Tamil (Mrs. Linda Christy and Dr. S. Carlos), Hindi (Mr. Kartik Asooja of Aiaioo Labs and Mr. Chandra Bhan Asooja), French (M. Sammy Ben Rabah, M. Yann Jouanique et Mme. Fanny Jouanique), German (Hartmut Wege, Judith Klein und Deepica Rao), Chinese (under development – Dright Ho), Japanese (Jojo Baby), Polish (under development – Joanna Lupinska – Asia), and Telugu (under development – Mrs. Meenakshi Jami).