MicArrayEchoCancellation Walkthrough: C++

Capturing Audio Streams with Acoustic Echo Cancellation and Beamforming

About This Walkthrough

In the Kinect™ for Windows® Software Development Kit (SDK), the MicArrayEchoCancellation sample shows how to capture an audio stream from the microphone array of the Kinect for Xbox 360® sensor by using the MSRKinectAudio Microsoft DirectX® media object (DMO) in a Microsoft DirectShow® graph. This document provides a walkthrough review of the MicArrayEchoCancellation sample.

Resources

For a complete list of documentation for the Kinect for Windows SDK Beta, plus related reference and links to the online forums, see the beta SDK website at: http://kinectforwindows.org

Contents

Introduction
Program Description
Create and Configure the MSRKinectAudio DMO
Select the Kinect Sensor's Microphone Array
Enumerate the Device Index
Determine the Device Index
Record the Captured Stream and Determine the Source Direction
Set Up the Data Buffer
Set the Output Format
Allocate Resources and the Output Buffer
Capture the Audio Stream and Determine Source Direction

License: The Kinect for Windows SDK Beta is licensed for non-commercial use only. By installing, copying, or otherwise using the beta SDK, you agree to be bound by the terms of its license. Read the license.

Disclaimer: This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved.

Microsoft, DirectShow, DirectX, Kinect, MSDN, Windows, and Windows Media are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Introduction

The audio component of the Kinect™ for Xbox 360® sensor is a four-element linear microphone array. An array provides some significant advantages over a single microphone, including more sophisticated acoustic echo cancellation and noise suppression, and the ability to determine the direction of a sound source.
The primary way for C++ applications to access the Kinect sensor's microphone array is through the MSRKinectAudio Microsoft® DirectX® media object (DMO). A DMO is a standard COM object that can be incorporated into a Microsoft DirectShow® graph or a Microsoft Media Foundation topology. The Kinect for Windows® Software Development Kit (SDK) Beta includes an extended version of the Windows microphone array DMO, referred to here as the MSRKinectAudio DMO, to support the Kinect microphone array.

The MSRKinectAudio DMO supports all the standard microphone array functionality, which includes:

- Acoustic echo cancellation (AEC)
- Microphone array processing (MicArray)
- Noise suppression (NS)
- Automatic gain control (AGC)
- Voice activity detection (VAD)
- Sound source localization, which identifies the direction of the source in the horizontal plane
- Beamforming, which allows the array to function as a steerable directional microphone. The DMO supports 11 beams, with fixed directions that range from -50 to +50 degrees in 10-degree increments.

For more information on the standard microphone array, see "Microphone Array Support in Windows Vista" and "How to Build and Use Microphone Arrays for Windows Vista" on the Microsoft Developer Network (MSDN®) website.

Although the internal details for the MSRKinectAudio DMO are different, you use it in much the same way as the standard microphone array DMO, with the following exceptions. The MSRKinectAudio DMO:

- Has its own class identifier (CLSID): CLSID_CMSRKinectAudio.
- Exposes sound source localization functionality through a new interface: ISoundSourceLocalizer.
- Supports an additional microphone array mode, adaptive beamforming, which uses an internal source localizer to automatically determine the beam direction.

The MicArrayEchoCancellation sample shows how to capture an audio stream from the Kinect sensor's microphone array by polling the MSRKinectAudio DMO in source mode.
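The fixed beam layout described above (11 beams from -50 to +50 degrees in 10-degree steps) can be illustrated with a small, self-contained sketch. It is not the DMO's internal algorithm, which this walkthrough does not document. It only shows how a source direction in degrees could be mapped to the nearest of the 11 fixed directions. The names kBeamCount, kMinBeamDeg, kBeamStepDeg, beamAngleDegrees, and nearestBeamIndex are hypothetical and do not appear in the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical constants taken from the beam description in this walkthrough:
// 11 fixed beams from -50 to +50 degrees in 10-degree steps.
const int kBeamCount = 11;
const double kMinBeamDeg = -50.0;
const double kBeamStepDeg = 10.0;

// Returns the direction, in degrees, of the fixed beam with the given index (0..10).
double beamAngleDegrees(int index) {
    assert(index >= 0 && index < kBeamCount);
    return kMinBeamDeg + kBeamStepDeg * index;
}

// Returns the index of the fixed beam closest to sourceDeg.
// Directions outside the -50..+50 range are clamped to the outermost beam.
// This is an illustration only, not the DMO's own selection logic.
int nearestBeamIndex(double sourceDeg) {
    const double maxBeamDeg = kMinBeamDeg + kBeamStepDeg * (kBeamCount - 1);
    const double clamped = std::max(kMinBeamDeg, std::min(sourceDeg, maxBeamDeg));
    return static_cast<int>(std::lround((clamped - kMinBeamDeg) / kBeamStepDeg));
}
```

For example, under these assumptions a source at about -2.9 degrees maps to index 5, whose beam points at 0 degrees, and a source at 75 degrees is clamped to the outermost beam at +50 degrees.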
The application uses AEC to record a high-quality audio stream and beamforming to determine the direction to the sound source. The DMO can also be used with a Microsoft Media Foundation topology. For an example, see "MFAudioFilter Walkthrough: C++ Sample" on the beta SDK website.

Note: DirectShow is COM-based, and this document assumes that you are familiar with how to use COM objects and interfaces. You do not need to know how to implement COM objects. For the basics of how to use COM objects, see "Programming DirectX with COM" on the MSDN website. That MSDN topic is written for DirectX programmers, but the basic principles apply to all COM-based applications.

Program Description

MicArrayEchoCancellation is installed with the Kinect for Windows Software Development Kit (SDK) Beta samples in %KINECTSDK_DIR%\Samples\KinectSDKSamples.zip. MicArrayEchoCancellation is a C++ console application that is implemented in MicArrayEchoCancellation.cpp. The basic program flow is as follows:

1. Create and configure the MSRKinectAudio DMO.
2. Enumerate the available capture devices and select the Kinect sensor's microphone array.
3. Record 10 seconds of audio and determine the source direction as the capture process progresses.

To run MicArrayEchoCancellation, start MicArrayEchoCancellation.exe and follow the instructions in the console window.

Tip: Before attempting to capture audio from the microphone array, you must be actively streaming to the audio render device that is specified for the DMO, typically the system's speakers. Otherwise, the MSRKinectAudio DMO fails. AEC is designed to cancel interfering sounds, so there must be something to cancel. The simplest solution is to start playing a tune on Windows Media® Player before you run the application. The Libraries\Music\Sample Music folder on your Windows PC contains some sample music files.
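Step 3 of the program flow records 10 seconds of audio. As a rough sizing sketch, assuming the 16-kHz, 16-bit mono PCM format that this walkthrough gives for the recorded channel, the amount of raw audio data can be estimated as follows. The PcmFormat type and the bytesPerSecond and bytesForDuration helpers are hypothetical names introduced for illustration. They are not part of the sample or the SDK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type describing an uncompressed PCM stream.
struct PcmFormat {
    std::uint32_t samplesPerSecond;  // for example, 16000 for 16 kHz
    std::uint16_t bitsPerSample;     // for example, 16
    std::uint16_t channels;          // 1 for mono
};

// Returns the number of raw PCM bytes produced per second of audio.
std::uint32_t bytesPerSecond(const PcmFormat& format) {
    // This sketch only handles sample sizes that are whole bytes.
    assert(format.bitsPerSample % 8 == 0);
    return format.samplesPerSecond * (format.bitsPerSample / 8u) * format.channels;
}

// Returns the number of raw PCM bytes needed to hold the given duration.
// The result excludes any container overhead such as a .wav file header.
std::uint64_t bytesForDuration(const PcmFormat& format, std::uint32_t seconds) {
    return static_cast<std::uint64_t>(bytesPerSecond(format)) * seconds;
}
```

With PcmFormat{16000, 16, 1}, this works out to 32,000 bytes per second, or 320,000 bytes of raw audio for a 10-second recording.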
The following is a lightly edited version of the output from a MicArrayEchoCancellation session, where the sound source moved from side to side as capture progressed:

    Start a song in Windows Media Player and then press any key to start recording (echo cancellation processing expects speakers to be producing sound).
    Recording using DMO
    AEC-MicArray is running ... Press "s" to stop
    Position: -0.051290 Confidence: 1.000000 Beam Angle = 0.0000000
    Sound output was written to file: C:\KDK\Samples\Audio\MicArrayEchoCancellation\CPP\AECout.wav

The recording process uses beamforming, which creates a single directional channel from the four microphones in 16-kHz, 16-bit mono pulse code modulation (PCM) format. The channel is oriented to one of the 11 beam directions. MicArrayEchoCancellation uses adaptive beamforming, which automatically selects the beam that is closest to the source direction.

You can use the captured stream for many purposes. MicArrayEchoCancellation simply writes the captured audio stream to AECout.wav, a .wav file that can be played with Windows Media Player.

The rest of this document is a walkthrough of the MicArrayEchoCancellation sample. It describes all of the sample's functionality except for writing the captured stream to a .wav file. For details on that process, see the sample code.

Note: This document includes code excerpts, most of which have been edited for brevity and readability. In particular, most routine error-handling code has been removed. For the complete code, see the MicArrayEchoCancellation sample. Hyperlinks in this walkthrough refer to content on the MSDN website.

Create and Configure the MSRKinectAudio DMO

The application's entry point, _tmain, manages the overall program execution, with private methods handling most details. The first step is to create and configure an instance of the MSRKinectAudio DMO, as