AudioCaptureRaw Walkthrough: C++
Capturing the Raw Audio Stream

About This Walkthrough

In the Kinect™ for Windows® Software Development Kit (SDK) Beta, the AudioCaptureRaw sample uses the Windows Audio Session API (WASAPI) to capture the raw audio stream from the microphone array of the Kinect for Xbox 360® sensor and write it to a .wav file. This document is a walkthrough of the sample.

Resources

For a complete list of documentation for the Kinect for Windows SDK Beta, plus related reference and links to the online forums, see the beta SDK website at: http://www.kinectforwindows.org/

Contents

Introduction
Program Description
Select a Capture Device
Enumerate the Capture Devices
Retrieve the Device Name
Determine the Device Index
Prepare for Audio Capture
Initialize Audio Engine for Capture
Load the Format
Initialize the Audio Engine
Capture an Audio Stream from the Microphone Array
The Primary Thread
The Worker Thread

License: The Kinect for Windows SDK Beta is licensed for non-commercial use only. By installing, copying, or otherwise using the beta SDK, you agree to be bound by the terms of its license. Read the license.

Disclaimer: This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Kinect, LifeChat, MSDN, and Windows are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Introduction

The audio component of the Kinect™ for Xbox 360® sensor is a four-element linear microphone array.
An array provides some significant advantages over a single microphone, including more sophisticated acoustic echo cancellation and noise suppression, and the ability to determine the direction of a sound source.

The primary way for C++ applications to access the Kinect sensor’s microphone array is through the KinectAudio Microsoft® DirectX® Media Object (DMO). However, it is useful for some purposes to simply capture the raw audio streams from the array’s microphones. The Kinect sensor’s microphone array is a standard Windows® multichannel audio-capture device, so you can also capture the audio stream by using the Windows Audio Session API (WASAPI) or by using the microphone array as a standard Windows microphone.

The AudioCaptureRaw sample uses WASAPI to capture the raw audio stream from the Kinect sensor’s microphone array and write it to a .wav file. This document is a walkthrough of the sample. For more information on WASAPI, see "About WASAPI" on the Microsoft Developer Network (MSDN®) website.

Note: WASAPI is COM-based, and this document assumes that you are familiar with the basics of how to use COM objects and interfaces. You do not need to know how to implement COM objects. For the basics of how to use COM objects, see "Programming DirectX with COM" on the MSDN website. This MSDN topic is written for DirectX programmers, but the basic principles apply to all COM-based applications.

Program Description

AudioCaptureRaw is installed with the Kinect for Windows Software Development Kit (SDK) Beta samples in %KINECTSDK_DIR%\Samples\KinectSDKSamples.zip. AudioCaptureRaw is a C++ console application that is implemented in the following files:

AudioCaptureRaw.cpp contains the application’s entry point and manages overall program execution.
WASAPICapture.cpp and its associated header, WASAPICapture.h, implement the CWASAPICapture class, which handles the details of capturing the audio stream.

The AudioCaptureRaw basic program flow is as follows:

1. Enumerate the system’s capture devices and select the appropriate device. Because the system might have multiple audio capture devices, the application enumerates all such devices and has the user specify the appropriate one.
2. Record 10 seconds of audio data from the device.
3. Write the recorded data to a WAVE file: out.wav.

The recording process multiplexes the streams from the microphone channels in an interleaved format (ch 1, ch 2, ch 3, ch 4, ch 1, ch 2, and so on), with each channel’s data in a 16-kilohertz (kHz), 32-bit mono pulse code modulation (PCM) format.

The following is a lightly edited version of the AudioCaptureRaw output for a system with two capture devices, a Microsoft LifeChat® headset and a Kinect sensor:

    WASAPI Capture Shared Timer Driven Sample
    Copyright (c) Microsoft. All Rights Reserved
    Select an output device:
    0: Microphone Array (Kinect USB Audio) ({0.0.1.00000000} {6ed40fd5-a340-4f8a-b324-edac93fa6702})
    1: Headset Microphone (3- Microsoft LifeChat LX-3000 )({0.0.1.00000000} {97721472-fc66-4d63-95a2-86c1044e0893})
    0
    Capture audio data for 10 seconds
    1
    Successfully wrote WAVE data to out.wav

The remainder of this document walks you through the application.

Note: This document includes code examples, most of which have been edited for brevity and readability. In particular, most routine error-handling code has been removed. For the complete code, see the sample. Hyperlinks in this walkthrough refer to content on the MSDN website.

Select a Capture Device

The application’s entry point is wmain, in AudioCaptureRaw.cpp. This function manages the overall program execution, with private functions handling most of the details. WASAPI is COM-based, so AudioCaptureRaw first initializes COM, as follows:

    int wmain()
    {
        ...
        HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
        ...
    }

Tip: Applications that have a graphical user interface (GUI) should use COINIT_APARTMENTTHREADED instead of COINIT_MULTITHREADED.

AudioCaptureRaw next calls the private PickDevice function to select the capture device, as follows:

    bool PickDevice(IMMDevice **DeviceToUse, bool *IsDefaultDevice, ERole *DefaultDeviceRole)
    {
        HRESULT hr;
        IMMDeviceEnumerator *deviceEnumerator = NULL;
        IMMDeviceCollection *deviceCollection = NULL;
        *IsDefaultDevice = false;
        hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER,
                              IID_PPV_ARGS(&deviceEnumerator));
        ...
    }

PickDevice calls the CoCreateInstance function to create a device enumerator object and get a pointer to its IMMDeviceEnumerator interface.

Enumerate the Capture Devices

PickDevice enumerates the system’s capture devices by calling the enumerator object’s IMMDeviceEnumerator::EnumAudioEndpoints method, as follows:

    bool PickDevice(...)
    {
        ...
        hr = deviceEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &deviceCollection);
        ...
    }

The EnumAudioEndpoints parameter values are as follows:

1. A value from the EDataFlow enumeration that indicates the device type. eCapture directs EnumAudioEndpoints to enumerate only capture devices.
2. A DEVICE_STATE_XXX constant that specifies which device states to enumerate. DEVICE_STATE_ACTIVE directs EnumAudioEndpoints to enumerate only active devices.
3. The address of an IMMDeviceCollection interface pointer that receives the collection of enumerated capture devices.

PickDevice then uses the IMMDeviceCollection interface to list the available capture devices and let the user select the appropriate device, presumably the Kinect sensor, as follows:

    bool PickDevice(...)
    {
        UINT deviceCount;
        ...
        hr = deviceCollection->GetCount(&deviceCount);
        for (UINT i = 0 ; i < deviceCount ; i += 1)
        {
            LPWSTR deviceName;
            deviceName = GetDeviceName(deviceCollection, i);
            printf_s(" %d: %S\n", i, deviceName);