audio_low_latency_output_win.h revision 5821806d5e7f356e8fa4b058a389a808ea183019
// Copyright (c) 2012 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// Implementation of AudioOutputStream for Windows using Windows Core Audio
// WASAPI for low latency rendering.
//
// Overview of operation and performance:
//
// - An object of WASAPIAudioOutputStream is created by the AudioManager
//   factory.
// - Next, some thread will call Open(); at that point the underlying
//   Core Audio APIs are utilized to create two WASAPI interfaces called
//   IAudioClient and IAudioRenderClient.
// - Then some thread will call Start(source).
//   A thread called "wasapi_render_thread" is started and this thread waits
//   on an event which is signaled periodically by the audio engine to
//   indicate render events. As a result, OnMoreData() will be called and the
//   registered client is then expected to provide data samples to be played
//   out.
// - At some point, a thread will call Stop(), which stops and joins the
//   render thread and at the same time stops audio streaming.
// - The same thread that called Stop() will call Close(), where we clean up
//   and notify the audio manager, which will likely destroy this object.
// - Initial tests on Windows 7 show that this implementation results in a
//   latency of approximately 35 ms if the selected packet size is less than
//   or equal to 20 ms. Using a packet size of 10 ms does not result in a
//   lower latency but only affects the size of the data buffer in each
//   OnMoreData() callback.
// - A typical total delay of 35 ms consists of three parts:
//    o Audio endpoint device period (~10 ms).
//    o Stream latency between the buffer and endpoint device (~5 ms).
//    o Endpoint buffer (~20 ms to ensure glitch-free rendering).
// - Note that if the user selects a packet size of e.g. 100 ms, the total
//   delay will be approximately 115 ms (10 + 5 + 100).
// - Supports device events using the IMMNotificationClient interface. If
//   streaming has started, a so-called stream switch will take place in the
//   following situations:
//    o The user enables or disables an audio endpoint device from Device
//      Manager or from the Windows multimedia control panel, Mmsys.cpl.
//    o The user adds an audio adapter to the system or removes an audio
//      adapter from the system.
//    o The user plugs an audio endpoint device into an audio jack with
//      jack-presence detection, or removes an audio endpoint device from
//      such a jack.
//    o The user changes the device role that is assigned to a device.
//    o The value of a property of a device changes.
//   Practical/typical example: a user has two audio devices, A and B, where
//   A is a built-in device configured as Default Communication and B is a
//   USB device set as Default device. Audio rendering starts and audio is
//   played through device B since the eConsole role is used by the audio
//   manager in Chrome today. If the user now removes the USB device (B), the
//   removal will be detected and device A will instead become the new default
//   device. Rendering will automatically stop, all resources will be released
//   and a new session will be initialized and started using device A instead.
//   The net effect for the user is that audio automatically switches from
//   device B to device A. The same thing happens if the user re-inserts the
//   USB device.
//
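The delay figures in the overview can be reproduced with simple arithmetic. The sketch below is illustrative only (the helper `TotalOutputDelayMs()` is hypothetical, not part of this header); it assumes the three per-part estimates quoted above:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the delay breakdown in the overview:
// device period + stream latency + max(endpoint buffer, packet size).
int TotalOutputDelayMs(int packet_size_ms) {
  const int kDevicePeriodMs = 10;    // Audio endpoint device period.
  const int kStreamLatencyMs = 5;    // Buffer <-> endpoint device latency.
  const int kEndpointBufferMs = 20;  // Margin for glitch-free rendering.
  return kDevicePeriodMs + kStreamLatencyMs +
         std::max(kEndpointBufferMs, packet_size_ms);
}
```

Packet sizes at or below 20 ms all land on the same ~35 ms total, which matches the observation above that a 10 ms packet does not lower the latency.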
// Implementation notes:
//
// - The minimum supported client is Windows Vista.
// - This implementation is single-threaded, hence:
//    o Construction and destruction must take place from the same thread.
//    o All APIs must be called from the creating thread as well.
// - It is recommended to first acquire the native sample rate of the default
//   output device and then use the same rate when creating this object. Use
//   WASAPIAudioOutputStream::HardwareSampleRate() to retrieve the sample rate.
// - Calling Close() also leads to self-destruction.
// - Stream switching is not supported if the user switches the audio device
//   after Open() has been called but before Start() has been called.
// - Stream switching can fail if streaming starts on one device with a
//   supported format (X) and the new default device - to which we would like
//   to switch - uses another format (Y), which is not supported given the
//   configured audio parameters.
// - The audio device must be opened with the same number of channels as it
//   supports natively (see HardwareChannelCount()); otherwise Open() will
//   fail.
// - Support for 8-bit audio has not yet been verified and tested.
//
// Core Audio API details:
//
// - The public API methods (Open(), Start(), Stop() and Close()) must be
//   called on the constructing thread. The reason is that we want to ensure
//   that the COM environment is the same for all API implementations.
// - Utilized MMDevice interfaces:
//     o IMMDeviceEnumerator
//     o IMMDevice
// - Utilized WASAPI interfaces:
//     o IAudioClient
//     o IAudioRenderClient
// - The stream is initialized in shared mode and the processing of the
//   audio buffer is event driven.
// - The Multimedia Class Scheduler service (MMCSS) is utilized to boost
//   the priority of the render thread.
// - Audio-rendering endpoint devices can have three roles:
//   Console (eConsole), Communications (eCommunications), and Multimedia
//   (eMultimedia). Search for "Device Roles" on MSDN for more details.
// - The actual stream switch is executed on the audio-render thread but it
//   is triggered by an internal MMDevice thread using callback methods
//   in the IMMNotificationClient interface.
//
// Threading details:
//
// - It is assumed that this class is created on the audio thread owned
//   by the AudioManager.
// - It is a requirement to call the following methods on the same audio
//   thread: Open(), Start(), Stop(), and Close().
// - Audio rendering is performed on the audio render thread, owned by this
//   class, and the AudioSourceCallback::OnMoreData() method will be called
//   from this thread. Stream switching also takes place on the audio-render
//   thread.
// - All callback methods from the IMMNotificationClient interface will be
//   called on a Windows-internal MMDevice thread.
//
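The event-driven render loop described above can be sketched in a platform-neutral way. This is a simplified illustration, not the actual implementation: `FakeRenderLoop` is a hypothetical class that uses std::condition_variable in place of the Windows event handles (audio_samples_render_event_, stop_render_event_) and a counter in place of the real AudioSourceCallback::OnMoreData() call:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the render thread: waits for periodic "render"
// signals from a fake audio engine and counts each OnMoreData()-style tick
// until a stop signal arrives.
class FakeRenderLoop {
 public:
  void Start() {
    thread_ = std::thread([this] { Run(); });
  }
  // Models the audio engine signaling audio_samples_render_event_.
  void SignalRenderEvent() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      render_pending_ = true;
    }
    cv_.notify_one();
  }
  // Models Stop(): signal stop_render_event_ and join the render thread.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }
  int callbacks() const { return callbacks_; }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return render_pending_ || stop_; });
      if (stop_)
        return;              // The stop event has priority.
      render_pending_ = false;
      ++callbacks_;          // The real code calls source_->OnMoreData() here.
    }
  }
  std::thread thread_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool render_pending_ = false;
  bool stop_ = false;
  std::atomic<int> callbacks_{0};
};
```

The real implementation additionally registers the thread with MMCSS and handles the stream-switch event; this sketch only shows the wait/render/stop shape.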
// Experimental exclusive mode:
//
// - It is possible to open a stream in exclusive mode by using the
//   --enable-exclusive-audio command line flag.
// - The internal buffering scheme is less flexible for exclusive streams.
//   Hence, some manual tuning will be required before deciding what frame
//   size to use. See the WinAudioOutputTest unit test for more details.
// - If an application opens a stream in exclusive mode, the application has
//   exclusive use of the audio endpoint device that plays the stream.
// - Exclusive mode should only be utilized when the lowest possible latency
//   is important.
// - In exclusive mode, the client can choose to open the stream in any audio
//   format that the endpoint device supports, i.e. not limited to the device's
//   current (default) configuration.
// - Initial measurements on Windows 7 (HP Z600 workstation) have shown that
//   the lowest possible latencies we can achieve on this machine are:
//     o ~3.3333 ms @ 48 kHz <=> 160 audio frames per buffer.
//     o ~3.6281 ms @ 44.1 kHz <=> 160 audio frames per buffer.
// - See http://msdn.microsoft.com/en-us/library/windows/desktop/dd370844(v=vs.85).aspx
//   for more details.
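The frames-per-buffer figures above follow directly from buffer duration = frames / sample rate. A small sketch (the `BufferDurationMs()` helper is illustrative only, not part of this header):

```cpp
#include <cassert>
#include <cmath>

// Duration in milliseconds of a buffer holding |frames| audio frames at
// |sample_rate| Hz.
double BufferDurationMs(int frames, int sample_rate) {
  return 1000.0 * frames / sample_rate;
}
```

For example, 160 frames at 48 kHz is ~3.3333 ms and at 44.1 kHz is ~3.6281 ms, matching the measurements quoted above.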

#ifndef MEDIA_AUDIO_WIN_AUDIO_LOW_LATENCY_OUTPUT_WIN_H_
#define MEDIA_AUDIO_WIN_AUDIO_LOW_LATENCY_OUTPUT_WIN_H_

#include <Audioclient.h>
#include <audiopolicy.h>
#include <MMDeviceAPI.h>

#include <string>

#include "base/compiler_specific.h"
#include "base/memory/scoped_ptr.h"
#include "base/threading/platform_thread.h"
#include "base/threading/simple_thread.h"
#include "base/win/scoped_co_mem.h"
#include "base/win/scoped_com_initializer.h"
#include "base/win/scoped_comptr.h"
#include "base/win/scoped_handle.h"
#include "media/audio/audio_io.h"
#include "media/audio/audio_parameters.h"
#include "media/base/media_export.h"

namespace media {

class AudioManagerWin;

// AudioOutputStream implementation using Windows Core Audio APIs.
// TODO(henrika): Remove IMMNotificationClient implementation now that we have
// AudioDeviceListenerWin; currently just disabled since extraction is extremely
// advanced.
class MEDIA_EXPORT WASAPIAudioOutputStream
    : public IMMNotificationClient,
      public AudioOutputStream,
      public base::DelegateSimpleThread::Delegate {
 public:
  // The ctor takes all the usual parameters, plus |manager|, which is the
  // audio manager creating this object.
  WASAPIAudioOutputStream(AudioManagerWin* manager,
                          const AudioParameters& params,
                          ERole device_role);
  // The dtor is typically called by the AudioManager only and it is usually
  // triggered by calling AudioOutputStream::Close().
  virtual ~WASAPIAudioOutputStream();

  // Implementation of AudioOutputStream.
  virtual bool Open() OVERRIDE;
  virtual void Start(AudioSourceCallback* callback) OVERRIDE;
  virtual void Stop() OVERRIDE;
  virtual void Close() OVERRIDE;
  virtual void SetVolume(double volume) OVERRIDE;
  virtual void GetVolume(double* volume) OVERRIDE;

  // Retrieves the number of channels the audio engine uses for its internal
  // processing/mixing of shared-mode streams for the default endpoint device.
  static int HardwareChannelCount();

  // Retrieves the channel layout the audio engine uses for its internal
  // processing/mixing of shared-mode streams for the default endpoint device.
  // Note that we convert an internal channel layout mask (see ChannelMask())
  // into a Chrome-specific channel layout enumerator in this method, hence
  // the match might not be perfect.
  static ChannelLayout HardwareChannelLayout();

  // Retrieves the sample rate the audio engine uses for its internal
  // processing/mixing of shared-mode streams for the default endpoint device.
  static int HardwareSampleRate(ERole device_role);

  // Returns AUDCLNT_SHAREMODE_EXCLUSIVE if the --enable-exclusive-audio
  // command-line flag is used and AUDCLNT_SHAREMODE_SHARED otherwise (default).
  static AUDCLNT_SHAREMODE GetShareMode();

  bool started() const { return render_thread_.get() != NULL; }

  // Returns the number of channels the audio engine uses for its internal
  // processing/mixing of shared-mode streams for the default endpoint device.
  int GetEndpointChannelCountForTesting() { return format_.Format.nChannels; }

 private:
  // Implementation of IUnknown (trivial in this case). See
  // msdn.microsoft.com/en-us/library/windows/desktop/dd371403(v=vs.85).aspx
  // for details regarding why proper implementations of AddRef(), Release()
  // and QueryInterface() are not needed here.
  STDMETHOD_(ULONG, AddRef)();
  STDMETHOD_(ULONG, Release)();
  STDMETHOD(QueryInterface)(REFIID iid, void** object);

  // Implementation of the abstract interface IMMNotificationClient.
  // Provides notifications when an audio endpoint device is added or removed,
  // when the state or properties of a device change, or when there is a
  // change in the default role assigned to a device. See
  // msdn.microsoft.com/en-us/library/windows/desktop/dd371417(v=vs.85).aspx
  // for more details about the IMMNotificationClient interface.

  // Indicates that the state of an audio endpoint device has changed.
  // This method is only used for diagnostic purposes.
  STDMETHOD(OnDeviceStateChanged)(LPCWSTR device_id, DWORD new_state);

  // The default audio endpoint device for a particular role has changed.
  STDMETHOD(OnDefaultDeviceChanged)(EDataFlow flow, ERole role,
                                    LPCWSTR new_default_device_id);

  // These IMMNotificationClient methods are currently not utilized.
  STDMETHOD(OnDeviceAdded)(LPCWSTR device_id) { return S_OK; }
  STDMETHOD(OnDeviceRemoved)(LPCWSTR device_id) { return S_OK; }
  STDMETHOD(OnPropertyValueChanged)(LPCWSTR device_id,
                                    const PROPERTYKEY key) {
    return S_OK;
  }

  // DelegateSimpleThread::Delegate implementation.
  virtual void Run() OVERRIDE;

  // Issues the OnError() callback to |source_|.
  void HandleError(HRESULT err);

  // The Open() method is divided into these sub methods.
  HRESULT SetRenderDevice();
  HRESULT ActivateRenderDevice();
  bool DesiredFormatIsSupported();
  HRESULT InitializeAudioEngine();

  // Called when the device will be opened in shared mode and use the
  // internal audio engine's mix format.
  HRESULT SharedModeInitialization();

  // Called when the device will be opened in exclusive mode and use the
  // application-specified format.
  HRESULT ExclusiveModeInitialization();

  // Converts a unique endpoint ID to a user-friendly device name.
  std::string GetDeviceName(LPCWSTR device_id) const;

  // Called on the audio render thread when the current audio stream must
  // be re-initialized because the default audio device has changed. This
  // method stops the current renderer, releases and re-creates all WASAPI
  // interfaces, creates a new IMMDevice and re-starts rendering using the
  // new default audio device.
  bool RestartRenderingUsingNewDefaultDevice();

  // Contains the thread ID of the creating thread.
  base::PlatformThreadId creating_thread_id_;

  // Our creator; the audio manager needs to be notified when we close.
  AudioManagerWin* manager_;

  // Rendering is driven by this thread (which has no message loop).
  // All OnMoreData() callbacks will be called from this thread.
  scoped_ptr<base::DelegateSimpleThread> render_thread_;

  // Contains the desired audio format which is set up at construction.
  // Extended PCM waveform format structure based on WAVEFORMATEXTENSIBLE.
  // Use this for multiple channel and hi-resolution PCM data.
  WAVEFORMATPCMEX format_;

  // Copy of the audio format which we know the audio engine supports.
  // It is recommended to ensure that the sample rate in |format_| is identical
  // to the sample rate in |audio_engine_mix_format_|.
  base::win::ScopedCoMem<WAVEFORMATPCMEX> audio_engine_mix_format_;

  bool opened_;

  // Set to true as soon as a new default device is detected, and cleared when
  // streaming has switched from using the old device to the new device.
  // All additional device detections during an active state are ignored to
  // ensure that the ongoing switch can finalize without disruptions.
  bool restart_rendering_mode_;

  // Volume level from 0 to 1.
  float volume_;

  // Size in bytes of each audio frame (e.g. 4 bytes for 16-bit stereo PCM).
  size_t frame_size_;

  // Size in audio frames of each audio packet, where an audio packet
  // is defined as the block of data which the source is expected to deliver
  // in each OnMoreData() callback.
  size_t packet_size_frames_;

  // Size in bytes of each audio packet.
  size_t packet_size_bytes_;

  // Size in milliseconds of each audio packet.
  float packet_size_ms_;

  // Length in audio frames of the audio endpoint buffer.
  size_t endpoint_buffer_size_frames_;

  // Defines the role that the system has assigned to an audio endpoint device.
  ERole device_role_;

  // The sharing mode for the connection.
  // Valid values are AUDCLNT_SHAREMODE_SHARED and AUDCLNT_SHAREMODE_EXCLUSIVE,
  // where AUDCLNT_SHAREMODE_SHARED is the default.
  AUDCLNT_SHAREMODE share_mode_;

  // The channel count set by the client in |params|, which is provided to the
  // constructor. The client must feed the AudioSourceCallback::OnMoreData()
  // callback with PCM data that contains this number of channels.
  int client_channel_count_;

  // Counts the number of audio frames written to the endpoint buffer.
  UINT64 num_written_frames_;

  // Pointer to the client that will deliver audio samples to be played out.
  AudioSourceCallback* source_;

  // An IMMDeviceEnumerator interface which represents a device enumerator.
  base::win::ScopedComPtr<IMMDeviceEnumerator> device_enumerator_;

  // An IMMDevice interface which represents an audio endpoint device.
  base::win::ScopedComPtr<IMMDevice> endpoint_device_;

  // An IAudioClient interface which enables a client to create and initialize
  // an audio stream between an audio application and the audio engine.
  base::win::ScopedComPtr<IAudioClient> audio_client_;

  // The IAudioRenderClient interface enables a client to write output
  // data to a rendering endpoint buffer.
  base::win::ScopedComPtr<IAudioRenderClient> audio_render_client_;

  // The audio engine will signal this event each time a buffer becomes
  // ready to be filled by the client.
  base::win::ScopedHandle audio_samples_render_event_;

  // This event will be signaled when rendering shall stop.
  base::win::ScopedHandle stop_render_event_;

  // This event will be signaled when stream switching shall take place.
  base::win::ScopedHandle stream_switch_event_;

  // Container for retrieving data from AudioSourceCallback::OnMoreData().
  scoped_ptr<AudioBus> audio_bus_;

  DISALLOW_COPY_AND_ASSIGN(WASAPIAudioOutputStream);
};

}  // namespace media

#endif  // MEDIA_AUDIO_WIN_AUDIO_LOW_LATENCY_OUTPUT_WIN_H_