Java Prog. Techniques for Games. Kinect 14. Speech Recognition Draft #1 (13th Feb 2012)

Kinect Chapter 14. Speech Recognition

When OpenNI is compared to Microsoft's SDK for the Kinect, two 'drawbacks' are often mentioned: a lack of control over the Kinect's tilt motor, and an inability to access the four-channel microphone array. I tackled the first issue in chapter 6 when I implemented a simple Java class and driver for the motor, accelerometer, and LED. This chapter and the next describe two different approaches to adding speech recognition and microphone capture to OpenNI.

This chapter focuses on speech recognition, using CloudGarden's TalkingJava SDK (http://www.cloudgarden.com/JSAPI/), with speech recorded from the PC's microphone rather than the Kinect's. TalkingJava offers a full implementation of Sun's Java Speech API (JSAPI) for recognition and synthesis, and utilizes Microsoft's Speech API (SAPI) speech engines beneath Java. SAPI is a standard feature on Windows, and comes with a range of simple configuration and testing tools. In particular, I can improve speech recognition accuracy by training the engine to deal specifically with my voice and microphone setup. SAPI is also at the heart of Microsoft's SDK for the Kinect.

Chapter 15 shows how to record directly from the Kinect's microphone array with Java's Sound API. That chapter's 'trick' is to install audio support from Microsoft's SDK, which can co-exist with OpenNI. The SDK's audio driver lets Windows 7 treat the array as a multichannel recording device, making it accessible to Java.

1. Before Programming

Although JSAPI and TalkingJava have features for listing the capabilities of the PC's sound card, for selecting between audio lines and mixers, and for setting parameters such as a microphone's volume, it's generally easier to configure and test audio equipment using OS-level tools.
For example, on my Windows 7 test machine, I couldn't record from one of the audio ports with Java, and was able to discover that the hardware was faulty via OS controls.

Windows 7 (and XP) offers three audio Control Panel controls: a "Sound" control, a "Speech Recognition" control, and a control specific to the sound card ("SigmaTel Audio" on my Windows 7 machine, "Realtek HS Sound Effect Manager" on my XP device). If you're not sure what sound card you're using, then you can find details under the "Sound, video, and game controllers" heading of the "Device Manager" control.

The sound card should be checked first to ensure that the microphone ports are working and switched on, and that their volume isn't muted. Figure 1 shows part of the SigmaTel control panel.

Figure 1. The SigmaTel Control Panel.

It was here that I discovered two problems with my sound card which Java can't detect: firstly, that the front panel microphone wasn't working, and secondly, that the rear panel microphone port would change into a speaker line if the microphone cable was removed!

The "Sound" control on Windows 7 has a Recording tab (see Figure 2) and audio level bars for each input device. These might include different microphones, stereo "line in"s, headsets, or TV tuners.

Figure 2. The Sound Control Panel on Windows 7.

The dark green level bars on the right of the rear microphone (Rear Mic) row in Figure 2 show that it is picking up sound. If more than one microphone is active, then the ones you don't want should be disabled (by right-clicking on them, and using the menu that appears). Alternatively, make sure that the microphone you want is the default OS choice.
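As a complement to the OS controls, the input devices that Java itself can see can be listed with the Java Sound API. What follows is only a minimal sketch: the class name ListCaptureMixers and its method are made up for illustration, and the names it prints depend on the sound card and driver, so they may not match the labels shown in the Windows "Sound" control.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.TargetDataLine;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the chapter's code
class ListCaptureMixers {

    // Returns the names of all mixers that expose at least one
    // capture line (a TargetDataLine), i.e. possible microphone inputs.
    static List<String> captureMixerNames() {
        List<String> names = new ArrayList<>();
        Line.Info captureInfo = new Line.Info(TargetDataLine.class);
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            if (mixer.isLineSupported(captureInfo)) {
                names.add(info.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = captureMixerNames();
        if (names.isEmpty()) {
            System.out.println("No capture devices visible to Java Sound");
        }
        for (String name : names) {
            System.out.println("Capture mixer: " + name);
        }
    }
}
```

If a microphone that is active in the "Sound" control doesn't appear in this list, the problem is more likely to lie with the driver or the port than with the Java code.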
The Properties button in Figure 2 leads to a Properties window (see Figure 3) with tabs for setting the volume and boost, noise controls, and format settings (e.g. the sample rate and number of channels on the Advanced tab). Make a note of these formats since they might be useful in your Java code.

Figure 3. Properties Window for my Microphone.

It's a good idea to check the sound quality of your microphone. One way is via the Listen tab in the Properties window, but I prefer to record a piece of speech, and look at its visualization inside an audio editing tool. Two good free editors that I've tried are Audacity (http://audacity.sourceforge.net/) and Wavosaur (http://www.wavosaur.com/). Figure 4 shows a short recording from the rear microphone using Audacity.

Figure 4. An Audacity Recording.

Sound capture in Audacity is controlled by selecting a microphone input (from the list next to the microphone icon), and then pressing record (the red circle button). When the track in Figure 4 is played back, there's a slight hiss from the microphone, and a background hum from the room's air conditioner, both of which will affect the quality of the speech recognition.

Windows' "Speech Recognition" control is a great resource for making sure that the microphone works with Microsoft's Speech API (SAPI) recognizer engine, and also for training the engine (see Figure 5).

Figure 5. The Speech Recognition Control Panel in Windows 7.

Good quality speech recognition requires that the engine be trained, not only to recognize your voice, but also to deal with background and microphone noise. The Windows 7 training session takes about 10 minutes, but pays dividends later in better recognition accuracy.
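The sample rate and channel count noted from the microphone's Properties window can be tried out in Java before any capture code is written. This is a minimal sketch rather than the chapter's own code: it assumes 16-bit signed little-endian PCM (a common choice, not something read from the Properties window), and the class MicFormatCheck and its two methods are hypothetical names. It simply asks Java Sound whether a capture line with that format is available.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper class, not part of the chapter's code
class MicFormatCheck {

    // Build a PCM format from the values noted in the Properties window.
    // Assumption: 16-bit, signed, little-endian samples.
    static AudioFormat micFormat(float sampleRate, int channels) {
        return new AudioFormat(sampleRate, 16, channels, true, false);
    }

    // True if some installed mixer offers a capture line for this format.
    static boolean captureSupported(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        // Example values only; substitute the ones from the Advanced tab
        AudioFormat format = micFormat(44100f, 2);
        System.out.println(format + " supported for capture: "
                + captureSupported(format));
    }
}
```

A result of false doesn't necessarily mean the microphone is broken; it may only mean that the driver doesn't offer that particular combination of sample rate and channels to Java.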
Audio Tools on Windows XP

I found another kind of speech recognition problem when checking the audio setup on my Windows XP test machine. In XP, speech recognition is handled by the "Speech" control panel (equivalent to the "Speech Recognition" control in Windows 7), as shown in Figure 6.

Figure 6. The Speech Control Panel in Windows XP.

Recognizer training is started via the "Train Profile" button, with the training details stored as the default speech profile. The "Language" drop-down list shows three recognizer engines:

Microsoft English Recognizer v5.1 (seen in Figure 6)
Microsoft English (U.S.) v6.1 Recognizer
SAPI Developer Sample Engine

The "Sample Engine" is a demo, and can't be used for real speech recognition tasks. The version number in "Microsoft English (U.S.) v6.1 Recognizer" refers to the recognizer engine, not the SAPI version (which is v5.1 for Windows XP). There appears to be a problem with the v6.1 engine's installation, since the "Train Profile" button is grayed out (disabled) when that engine is selected, and pressing the Settings button throws up an error dialog box stating that the engine cannot be started. In other words, although my XP machine appears to have three speech recognizers, only "Microsoft English Recognizer v5.1" is actually working.

2. SAPI, JSAPI, and TalkingJava

Figure 7 shows the main elements in a speech recognition application.

Figure 7. Components in a Speech Recognition Application.

SAPI is MS Windows middleware that provides an API for accessing speech engines (both recognizers and synthesizers). It's installed as part of Windows 7 (and Vista), but may not be present on XP, where it comes as part of MS Office 2003. If you don't have Office, SAPI 5.1 is available online as SpeechSDK51MSM.exe at http://www.microsoft.com/download/en/details.aspx?id=10121.
Vista comes with SAPI 5.3, and Windows 7 with SAPI 5.4, but TalkingJava can communicate with any of these versions. A good overview of all the SAPI versions can be found at http://en.wikipedia.org/wiki/Microsoft_Speech_API/. SAPI is also used by Microsoft's SDK for the Kinect, via a .NET managed System.Speech namespace.

SAPI's speech recognition engines currently support eight languages: U.S. English, U.K. English, traditional Chinese, simplified Chinese, Japanese, German, French, and Spanish, with the default determined by your current OS language. To use a different recognizer language, you must install the appropriate language pack, which may be available online from Microsoft, depending on the version of Windows you're using.

Microsoft has a lot of online information about Windows' speech recognition capabilities, starting at http://msdn.microsoft.com/en-us/library/bb756992.aspx.

CloudGarden's TalkingJava (http://www.cloudgarden.com/JSAPI/) acts as a link between Java and SAPI. It works with any SAPI-compliant speech engine, and implements the Java Speech API (JSAPI) specification (http://java.sun.com/products/java-media/speech/). TalkingJava also offers useful additional features, such as redirection of audio data to/from files, and GUI tools for selecting engines (e.g. see Figure 8).

Two drawbacks of TalkingJava are that it's only free for non-commercial use, and that it requires a SAPI-compatible speech engine. Actually, I found this latter condition to be an advantage because of the user-friendly SAPI training tools in Windows.

Oracle (formerly Sun Microsystems) doesn't offer a JSAPI implementation, but hosts some excellent documentation, including a programmer guide at http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/ and API documents at http://java.sun.com/products/java-media/speech/reference/api/.