Speech Recognition for the Kinect:
While developing Kin-educate, I found this to be one of the trickiest parts, largely because I couldn't find any complete tutorials other than the Quick Start series on Channel 9, which, if you have checked it out, you will know is helpful but not comprehensive.
Quite a lot of people have asked me how I did the speech recognition in the maths game for Kin-educate, so I thought I would write a quick tutorial that cuts out all the unnecessary bits and focuses on getting speech recognition set up and working quickly and easily. This tutorial assumes you already have a Kinect project set up. If you do, you should be able to copy and paste this code, in order, and you're all set!
Note: you decide what kind of output you would like from the speech recognition. For this example I have used just three text boxes for feedback: textBox1 for the hypothesized result (good for debugging), textBox2 for rejected speech, and textBox3 for the reply when speech is recognized.
Add using statements and references:
//Needed for Func, StringComparison and the LINQ calls used below
using System;
using System.Linq;
//Make sure to add a reference to Microsoft.Kinect in the references
using Microsoft.Kinect;
//Make sure you have the speech SDK installed
//go to add reference, browse, navigate to Program Files, Microsoft SDKs,
//Speech, Assembly, and select Microsoft.Speech.dll
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;
Then, declare your variables and get the speech recognizer:
//Create an instance of your kinect sensor
public KinectSensor CurrentSensor;

//and the speech recognition engine (SRE)
private SpeechRecognitionEngine speechRecognizer;

//Get the speech recognizer (SR)
private static RecognizerInfo GetKinectRecognizer()
{
    Func<RecognizerInfo, bool> matchingFunc = r =>
    {
        string value;
        r.AdditionalInfo.TryGetValue("Kinect", out value);
        return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase)
            && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
    };
    return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
}
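Because GetKinectRecognizer() ends in FirstOrDefault(), it returns null when no installed recognizer matches both the "Kinect" flag and the "en-US" culture. If that happens, it can help to see what is actually installed. The following is only a sketch: RecognizerDiagnostics and PrintInstalledRecognizers are hypothetical names that are not part of the original project, and the output depends entirely on which speech language packs are installed on your machine.

```csharp
using System.Diagnostics;
using Microsoft.Speech.Recognition;

// Hypothetical helper (not part of the original tutorial code).
public static class RecognizerDiagnostics
{
    // Writes one line per installed recognizer to the debug output,
    // showing its name, culture and whether it is flagged for the Kinect.
    public static void PrintInstalledRecognizers()
    {
        foreach (RecognizerInfo r in SpeechRecognitionEngine.InstalledRecognizers())
        {
            string kinectFlag;
            r.AdditionalInfo.TryGetValue("Kinect", out kinectFlag);

            Debug.WriteLine(string.Format(
                "{0} | culture: {1} | Kinect: {2}",
                r.Name,
                r.Culture.Name,
                kinectFlag ?? "(not set)"));
        }
    }
}
```

Calling RecognizerDiagnostics.PrintInstalledRecognizers() once at startup should show in the Visual Studio Output window which cultures you could put in place of "en-US" in the matching function.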
When the window loads, we need to initialize the Kinect sensor:
//When the window loads, initialize the Kinect
public MainWindow()
{
    InitializeComponent();
    InitializeKinect();
}

//Initialize the kinect
private KinectSensor InitializeKinect()
{
    //get the first available sensor and set it to the current sensor variable
    CurrentSensor = KinectSensor.KinectSensors
        .FirstOrDefault(s => s.Status == KinectStatus.Connected);

    //FirstOrDefault returns null if no sensor is connected, so stop here
    if (CurrentSensor == null)
    {
        return null;
    }

    speechRecognizer = CreateSpeechRecognizer();

    //Start the sensor
    CurrentSensor.Start();

    //then run the start method to start streaming audio
    Start();
    return CurrentSensor;
}
Now we need to configure the audio stream:
//Start streaming audio
private void Start()
{
    //set sensor audio source to variable
    var audioSource = CurrentSensor.AudioSource;

    //Set the beam angle mode - the direction the audio beam is pointing
    //we want it to be set to adaptive
    audioSource.BeamAngleMode = BeamAngleMode.Adaptive;

    //turn off echo cancellation and automatic gain control before starting
    //the audio source; AGC in particular should be off for speech recognition
    audioSource.EchoCancellationMode = EchoCancellationMode.None;
    audioSource.AutomaticGainControlEnabled = false;

    //start the audiosource
    var kinectStream = audioSource.Start();

    //configure incoming audio stream: 16 kHz, 16-bit, mono PCM
    speechRecognizer.SetInputToAudioStream(
        kinectStream,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

    //make sure the recognizer does not stop after completing
    speechRecognizer.RecognizeAsync(RecognizeMode.Multiple);
}
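The numbers passed to SpeechAudioFormatInfo above are not independent: 16000 is the sample rate, 16 the bits per sample, 1 the channel count, 32000 the average bytes per second and 2 the block alignment (bytes per sample frame). For uncompressed PCM the last two follow from the first three. The small helper below is a sketch of that arithmetic only; PcmFormatMath is a hypothetical name and not part of the Kinect or Speech SDKs.

```csharp
// Hypothetical helper showing how the PCM format numbers relate to each other.
public static class PcmFormatMath
{
    // Bytes in one sample frame: one sample for every channel.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio produced per second of uncompressed PCM.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the format used in this tutorial, PcmFormatMath.BlockAlign(16, 1) gives 2 and PcmFormatMath.AverageBytesPerSecond(16000, 16, 1) gives 32000, matching the values in the constructor call.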
Here we set the culture, define the words we want our program to recognize, and set up the grammar builder:
//here is the fun part: create the speech recognizer
private SpeechRecognitionEngine CreateSpeechRecognizer()
{
    //set recognizer info
    RecognizerInfo ri = GetKinectRecognizer();

    //GetKinectRecognizer returns null if the Kinect en-US recognizer is not installed
    if (ri == null)
    {
        throw new InvalidOperationException("No Kinect speech recognizer for en-US was found.");
    }

    //create instance of SRE
    SpeechRecognitionEngine sre = new SpeechRecognitionEngine(ri.Id);

    //Now we need to add the words we want our program to recognise
    var grammar = new Choices();
    grammar.Add("hello");
    grammar.Add("goodbye");

    //set culture - language, country/region
    var gb = new GrammarBuilder { Culture = ri.Culture };
    gb.Append(grammar);

    //set up the grammar builder
    var g = new Grammar(gb);
    sre.LoadGrammar(g);

    //Set events for recognizing, hypothesising and rejecting speech
    sre.SpeechRecognized += SreSpeechRecognized;
    sre.SpeechHypothesized += SreSpeechHypothesized;
    sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
    return sre;
}
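As the word list grows, you may want several phrases to trigger the same action. One way to do that, sketched here as an alternative rather than as the approach used in Kin-educate, is to attach a semantic value to each phrase with SemanticResultValue. The class name CommandGrammar, the extra phrase "hi there" and the tags "GREET" and "LEAVE" are all made up for illustration.

```csharp
using Microsoft.Speech.Recognition;

// Hypothetical helper (not part of the original tutorial code).
public static class CommandGrammar
{
    // Builds a grammar in which each spoken phrase carries a command tag.
    public static Grammar Build(RecognizerInfo ri)
    {
        var commands = new Choices();

        // Two different phrases map to the same tag.
        commands.Add(new SemanticResultValue("hello", "GREET"));
        commands.Add(new SemanticResultValue("hi there", "GREET"));
        commands.Add(new SemanticResultValue("goodbye", "LEAVE"));

        // The grammar's culture must match the recognizer's culture.
        var gb = new GrammarBuilder { Culture = ri.Culture };
        gb.Append(commands);

        return new Grammar(gb);
    }
}
```

With a grammar built this way, the recognized handler would switch on e.Result.Semantics.Value.ToString() instead of e.Result.Text, so adding a new phrase for an existing command only means adding one line to the Choices.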
Now all we need to do is set up the methods for hypothesizing, recognizing and rejecting speech:
//if speech is rejected
private void RejectSpeech(RecognitionResult result)
{
    textBox2.Text = "Pardon Moi?";
}

private void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    RejectSpeech(e.Result);
}
I use the hypothesized result for debugging and for tuning the confidence threshold that controls accuracy:
//hypothesized result
private void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    textBox1.Text = "Hypothesized: " + e.Result.Text + " " + e.Result.Confidence;
}
This is where we decide what happens when speech is recognized. The confidence level is set quite low here. Experiment with it to see what suits you best:
//Speech is recognised
private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    //Very important! - change this value to adjust accuracy - the higher the value
    //the more accurate it will have to be, lower it if it is not recognizing you
    if (e.Result.Confidence < .4)
    {
        RejectSpeech(e.Result);
        //stop here so a low-confidence result does not also trigger a reply
        return;
    }

    //and finally, here we set what we want to happen when
    //the SRE recognizes a word
    switch (e.Result.Text.ToUpperInvariant())
    {
        case "HELLO":
            textBox3.Text = "Hi there.";
            break;
        case "GOODBYE":
            textBox3.Text = "Goodbye then.";
            break;
        default:
            break;
    }
}
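If the switch statement starts to get long, the same decision (check the confidence, then look up a reply) can be pulled out into a small lookup table that is easy to test without a Kinect attached. This is a sketch under my own assumptions: ReplyTable, MinConfidence and GetReply are hypothetical names, and the 0.4 threshold simply mirrors the value used in the handler above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original tutorial code).
public static class ReplyTable
{
    // Results below this confidence are treated as rejected.
    public const float MinConfidence = 0.4f;

    // Recognized word -> reply text, compared without regard to case.
    private static readonly Dictionary<string, string> Replies =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "Hi there." },
            { "goodbye", "Goodbye then." }
        };

    // Returns the reply for a recognized word, or null if the result
    // should be rejected (low confidence) or the word has no reply.
    public static string GetReply(string text, float confidence)
    {
        if (confidence < MinConfidence)
        {
            return null;
        }

        string reply;
        return Replies.TryGetValue(text, out reply) ? reply : null;
    }
}
```

Inside the recognized handler you could then call ReplyTable.GetReply(e.Result.Text, e.Result.Confidence) and either show the returned text in textBox3 or call RejectSpeech when it returns null.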
And that is that. You should now have speech recognition working within your Kinect program. Check back for the next blog where I will be expanding upon this by making a speech-based application for controlling your media player!
Contact info:
mickpal_@hotmail.com
michaelpalmer.mp@gmail.com
www.michaelpalmerwebdesign.com
Or, leave a comment on my YouTube channel
Hi! Thanks for article. Can You upload solution? I'm a new in C# and it will be easy if I can download source files.
hi,,
i'm a beginner in kinect..
i wanna ask, can culture in kinect detect another language, such as indonesian??
or it's only detect english word..
how if i wanna detect my word using my language??
thanks for your help
Awesome tutorial, thanks for the help!