Reading codes of WebRTC: Deep into WebRTC Voice Engine (draft)


Introduction

WebRTC's Voice Engine includes software-based acoustic echo cancellation (AEC), automatic gain control (AGC), noise reduction and noise suppression, as well as hardware access and control across multiple platforms.

My ultimate goal is to wrap an independent module out of WebRTC's Voice Engine for our product. The first task is to get AGC implemented on top of the current KdvMediaSDK implementation, whose interfaces differ considerably from WebRTC's.

Keywords: WebRTC, audio processing, AEC, AGC, noise reduction, noise suppression.

Overall architecture of WebRTC

The overall architecture looks something like this:

Overall architecture of WebRTC (image from http://webrtc.org)

WebRTC Voice Engine – AGC control workflow

WebRTC Voice Engine – AGC control workflow (flowchart)

You can download my original Visio file here:

http://rg4.net/p/webrtc/webrtc.voiceengine.agc.vsd

You can modify and distribute it however you wish, but if you make any improvements to this chart, please send me a copy by mail. That will benefit a lot more people. Thank you.

Target/related source files

Major source files:

• audio_device_wave_win.cc: %WEBRTC%\src\modules\audio_device\main\source\win\audio_device_wave_win.cc
• audio_device_buffer.cc:
• audio_device_utility.cc:
• audio_mixer_manager_win.cc:
• voe_base_impl.cc: %WEBRTC%\src\voice_engine\main\source\voe_base_impl.cc
• transmit_mixer.cc: %WEBRTC%\src\voice_engine\main\source\transmit_mixer.cc
• level_indicator.cc (class AudioLevel): %WEBRTC%\src\voice_engine\main\source\level_indicator.cc

Utility source files:

• event_win_wrapper.cc
• thread_win_wrapper.cc

Detailed interfaces & implementations

audio_device_wave_win.cc

It is responsible for:

• Audio capture
• Get/Set microphone volume (I'm not sure what this volume means, the hardware volume or a virtual volume after audio processing, since it is get/set through audio_device_mixer_manager.cc)

a. Audio capture.

Step 1: Audio capture runs in a dedicated thread whose worker function is ThreadProcess().

[cpp]bool AudioDeviceWindowsWave::ThreadProcess()
{
    // Keep pulling captured data from the driver; RecProc() returns the
    // number of recorded bytes consumed, 0 when there is nothing left.
    while ((nRecordedBytes = RecProc(recTime)) > 0)
    {
    }
}[/cpp]
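As a side note, the capture thread itself is created through the ThreadWrapper utility (see thread_win_wrapper.cc above). The wiring below is a sketch based on the WebRTC code of that era; the thread name and priority are assumptions, not quotes from the source:

[cpp]// Sketch (assumed wiring, not verbatim WebRTC code): a static trampoline
// hands the new thread over to the member function ThreadProcess(), which
// loops until the module is told to stop.
bool AudioDeviceWindowsWave::ThreadFunc(void* context)
{
    return reinterpret_cast<AudioDeviceWindowsWave*>(context)->ThreadProcess();
}

// Somewhere in the recording start-up path:
_ptrThread = ThreadWrapper::CreateThread(ThreadFunc, this, kRealtimePriority,
                                         "webrtc_audio_module_thread");
[/cpp]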

Step 2: Looking into the RecProc() function, all capture parameters and the captured buffer are saved to the member variable |AudioDeviceBuffer* _ptrAudioBuffer|:

[cpp]WebRtc_Word32 AudioDeviceWindowsWave::RecProc(LONGLONG& consumedTime)
{
    ...
    // Store the recorded buffer (no action will be taken if the number of
    // recorded samples is not a full buffer).
    _ptrAudioBuffer->SetRecordedBuffer(_waveHeaderIn[bufCount].lpData, nSamplesRecorded);

    // Check how large the playout and recording buffers are on the sound card.
    // This info is needed by the AEC.
    msecOnPlaySide = GetPlayoutBufferDelay(writtenSamples, playedSamples);
    msecOnRecordSide = GetRecordingBufferDelay(readSamples, recSamples);

    // If we use the alternative playout delay method, skip the clock drift
    // compensation since it will be an unreliable estimate and might degrade
    // AEC performance.
    WebRtc_Word32 drift = (_useHeader > 0) ? 0 : GetClockDrift(playedSamples, recSamples);
    _ptrAudioBuffer->SetVQEData(msecOnPlaySide, msecOnRecordSide, drift);

    if (_AGC)
    {
        WebRtc_UWord32 newMicLevel = _ptrAudioBuffer->NewMicLevel();
        if (newMicLevel != 0)
        {
            // The VQE will only deliver non-zero microphone levels when a
            // change is needed.
            WEBRTC_TRACE(kTraceStream, kTraceUtility, _id,
                         "AGC change of volume: => new=%u", newMicLevel);

            // We store this outside of the audio buffer to avoid
            // having it overwritten by the getter thread.
            _newMicLevel = newMicLevel;
            SetEvent(_hSetCaptureVolumeEvent);
        }
    }
}[/cpp]

b. Get/Set Microphone Volume

Along with the main capture thread, there are two other threads:

::DoGetCaptureVolumeThread()

::DoSetCaptureVolumeThread()

These threads run continuously, waiting for a signal to get or set the capture volume.
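As a rough sketch of the set side (an assumed simplification, not verbatim WebRTC code; the event and member names follow the RecProc() snippet above), the thread blocks on _hSetCaptureVolumeEvent and applies the AGC's new level to the mixer:

[cpp]// Sketch: DoSetCaptureVolumeThread() waits for the event that RecProc()
// signals after the AGC delivers a non-zero level, then applies it.
bool AudioDeviceWindowsWave::DoSetCaptureVolumeThread()
{
    while (true)
    {
        // Block until RecProc() signals that the AGC requested a change.
        if (WaitForSingleObject(_hSetCaptureVolumeEvent, INFINITE) != WAIT_OBJECT_0)
        {
            return false;
        }
        // Apply the level computed by the AGC to the hardware mixer
        // (this ends up in AudioMixerManager::SetMicrophoneVolume()).
        SetMicrophoneVolume(_newMicLevel);
    }
}[/cpp]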

Things I’m still trying to figure out

There are many definitions related to the microphone level, and I am not sure which volume is which. Here are some volume-related definitions that confuse me:

1.    class VoEBaseImpl (voe_base_impl.cc)

What's the difference between currentVoEMicLevel and currentMicLevel in the code below?

These can also be compared to the member variables _oldVoEMicLevel and _oldMicLevel defined in voe_base_impl.cc:

WebRtc_UWord32 _oldVoEMicLevel

WebRtc_UWord32 _oldMicLevel

Where are these variables set or changed?

This code is located in voe_base_impl.cc:

[cpp]WebRtc_Word32 VoEBaseImpl::RecordedDataIsAvailable(
    ...  // other parameters elided
    const WebRtc_UWord32 currentMicLevel,
    WebRtc_UWord32& newMicLevel)
{
    // Will only deal with the volume in adaptive analog mode
    if (isAnalogAGC)
    {
        // Scale from ADM to VoE level range
        if (_audioDevicePtr->MaxMicrophoneVolume(&maxVolume) == 0)
        {
            if (0 != maxVolume)
            {
                currentVoEMicLevel = (WebRtc_UWord16) ((currentMicLevel
                    * kMaxVolumeLevel + (int) (maxVolume / 2))
                    / (maxVolume));
            }
        }
        // We learned that on certain systems (e.g. Linux) the
        // currentVoEMicLevel can be greater than the maxVolumeLevel,
        // therefore we cap the currentVoEMicLevel to the maxVolumeLevel
        // if it turns out that the currentVoEMicLevel is indeed greater
        // than the maxVolumeLevel.
        if (currentVoEMicLevel > kMaxVolumeLevel)
        {
            currentVoEMicLevel = kMaxVolumeLevel;
        }
    }

    // Keep track of whether the mic level has been changed by the AGC; if
    // not, use the old value the AGC returned to let the AGC continue its
    // trend, so eventually the AGC is able to change the mic level. This
    // handles issues with truncation introduced by the scaling.
    if (_oldMicLevel == currentMicLevel)
    {
        currentVoEMicLevel = (WebRtc_UWord16) _oldVoEMicLevel;
    }

    // Perform channel-independent operations
    // (APM, mix with file, record to file, mute, etc.)
    _transmitMixerPtr->PrepareDemux(audioSamples, nSamples, nChannels,
                                    samplesPerSec,
                                    (WebRtc_UWord16) totalDelayMS, clockDrift,
                                    currentVoEMicLevel);

    // Copy the audio frame to each sending channel and perform
    // channel-dependent operations (file mixing, mute, etc.) to prepare
    // for encoding.
    _transmitMixerPtr->DemuxAndMix();

    // Do the encoding and packetize+transmit the RTP packet when encoding
    // is done.
    _transmitMixerPtr->EncodeAndSend();

    // Will only deal with the volume in adaptive analog mode
    if (isAnalogAGC)
    {
        // Scale from VoE to ADM level range
        newVoEMicLevel = _transmitMixerPtr->CaptureLevel();
        if (newVoEMicLevel != currentVoEMicLevel)
        {
            // Add (kMaxVolumeLevel/2) to round the value
            newMicLevel = (WebRtc_UWord32) ((newVoEMicLevel * maxVolume
                + (int) (kMaxVolumeLevel / 2)) / (kMaxVolumeLevel));
        }
        else
        {
            // Pass zero if the level is unchanged
            newMicLevel = 0;
        }
        // Keep track of the value the AGC returns
        _oldVoEMicLevel = newVoEMicLevel;
        _oldMicLevel = currentMicLevel;
    }
    return 0;
}[/cpp]
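To make the two scalings concrete, here is a small self-contained example. The constants are assumptions on my part: kMaxVolumeLevel = 255 (the VoE level range) and maxVolume = 65535 for the Windows wave device, which is consistent with the log output shown later (the VoE level 17 maps to the device level 4369):

[cpp]#include <cstdint>
#include <cstdio>

// Worked example of the ADM<->VoE level scaling in RecordedDataIsAvailable().
// Assumed constants: kMaxVolumeLevel = 255, device maxVolume = 65535.
int main()
{
    const uint32_t kMaxVolumeLevel = 255;  // VoE level range: 0..255
    const uint32_t maxVolume = 65535;      // device (ADM) range: 0..65535

    // ADM -> VoE: scale a hardware mic level into the VoE range (rounded).
    uint32_t currentMicLevel = 4626;
    uint32_t currentVoEMicLevel =
        (currentMicLevel * kMaxVolumeLevel + maxVolume / 2) / maxVolume;
    printf("ADM %u -> VoE %u\n", currentMicLevel, currentVoEMicLevel);  // 18

    // VoE -> ADM: scale the AGC's new level back to the hardware range.
    uint32_t newVoEMicLevel = 17;
    uint32_t newMicLevel =
        (newVoEMicLevel * maxVolume + kMaxVolumeLevel / 2) / kMaxVolumeLevel;
    printf("VoE %u -> ADM %u\n", newVoEMicLevel, newMicLevel);  // 4369

    return 0;
}[/cpp]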

2.    class AudioDeviceWindowsWave (audio_device_wave_win.cc):

[cpp]WebRtc_UWord32                          _newMicLevel;
WebRtc_UWord32                          _minMicVolume;
[/cpp]


Where are these variables set or changed?

_newMicLevel is assigned from |_ptrAudioBuffer->NewMicLevel();| in AudioDeviceWindowsWave::RecProc() while processing AGC.

3.    class TransmitMixer (transmit_mixer.cc)

WebRtc_UWord32 _captureLevel;

The code listed below is the key to the microphone level values, covering both the level before and the level after processing:

[cpp]WebRtc_Word32 TransmitMixer::APMProcessStream(
    const WebRtc_UWord16 totalDelayMS,
    const WebRtc_Word32 clockDrift,
    const WebRtc_UWord16 currentMicLevel)
{
    WebRtc_UWord16 captureLevel(currentMicLevel);

    // Feed the current analog mic level into the AGC.
    if (_audioProcessingModulePtr->gain_control()->set_stream_analog_level(
            captureLevel) == -1)
    {
        WEBRTC_TRACE(kTraceWarning, kTraceVoice, VoEId(_instanceId, -1),
                     "AudioProcessing::set_stream_analog_level(%u) => error",
                     captureLevel);
    }

    // Read back the level the AGC suggests after processing.
    captureLevel =
        _audioProcessingModulePtr->gain_control()->stream_analog_level();

    // Store new capture level (only updated when analog AGC is enabled)
    _captureLevel = captureLevel;

    return 0;
}[/cpp]
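For context, the analog AGC that set_stream_analog_level()/stream_analog_level() talk to has to be enabled in adaptive-analog mode somewhere during setup. The snippet below is a minimal sketch against the APM interface of that era, not the actual Voice Engine setup code, so treat the exact calls as assumptions:

[cpp]// Sketch (assumed setup, APM interface of the same vintage as above):
// put the gain control into adaptive analog mode so stream_analog_level()
// returns the mic level the AGC wants the device to use.
webrtc::AudioProcessing* apm = webrtc::AudioProcessing::Create(0);

// 0..255 matches the VoE level range (kMaxVolumeLevel) used above.
apm->gain_control()->set_analog_level_limits(0, 255);
apm->gain_control()->set_mode(webrtc::GainControl::kAdaptiveAnalog);
apm->gain_control()->Enable(true);
[/cpp]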

4.    class AudioDeviceBuffer

WebRtc_UWord32 _currentMicLevel;

WebRtc_UWord32 _newMicLevel;
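These two members meet in AudioDeviceBuffer::DeliverRecordedData(). The sketch below is an assumed simplification (the member names are my guesses, apart from those quoted elsewhere in this article): the current hardware level goes out through the RecordedDataIsAvailable() callback and the AGC's answer comes back as _newMicLevel:

[cpp]// Sketch (assumed simplification, not verbatim WebRTC code):
WebRtc_Word32 AudioDeviceBuffer::DeliverRecordedData()
{
    WebRtc_UWord32 newMicLevel(0);
    // Forward the captured buffer plus the current hardware mic level
    // to VoEBaseImpl::RecordedDataIsAvailable() ...
    _ptrCbAudioTransport->RecordedDataIsAvailable(
        _recBuffer, _recSamples, _recBytesPerSample, _recChannels,
        _recSampleRate, _totalDelayMS, _clockDrift,
        _currentMicLevel, newMicLevel);
    // ... and keep the AGC's answer where AudioDeviceWindowsWave::RecProc()
    // can read it back via NewMicLevel(); zero means "no change".
    _newMicLevel = newMicLevel;
    return 0;
}[/cpp]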

5.    class

Functional implementation flow:

audio_device_wave_win.cc

Summary

The major code calling the WebRTC APM (Audio Processing Module) is in the APMProcessStream() function of the TransmitMixer class.

For example, the audio processing for the input audio frame (AudioFrame _audioFrame) is done here, calculating the microphone levels at the same time, then outputting the processed frame and the new microphone level.

[cpp]WebRtc_Word32 TransmitMixer::APMProcessStream(
    const WebRtc_UWord16 totalDelayMS,
    const WebRtc_Word32 clockDrift,
    const WebRtc_UWord16 currentMicLevel)
{
    WebRtc_UWord16 captureLevel(currentMicLevel);

    // Feed the current analog mic level into the AGC.
    if (_audioProcessingModulePtr->gain_control()->set_stream_analog_level(
            captureLevel) == -1)
    {
        WEBRTC_TRACE(kTraceWarning, kTraceVoice, VoEId(_instanceId, -1),
                     "AudioProcessing::set_stream_analog_level(%u) => error",
                     captureLevel);
    }

    // Read back the level the AGC suggests after processing.
    captureLevel =
        _audioProcessingModulePtr->gain_control()->stream_analog_level();

    // Store new capture level (only updated when analog AGC is enabled)
    _captureLevel = captureLevel;

    return 0;
}[/cpp]
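Putting the pieces together, the volume round trip traced by the log below follows this call chain (reconstructed from the snippets in this article):

[code]AudioDeviceWindowsWave::RecProc()                    (capture thread)
  -> AudioDeviceBuffer::DeliverRecordedData()
    -> VoEBaseImpl::RecordedDataIsAvailable()        (scale ADM -> VoE)
      -> TransmitMixer::PrepareDemux()
        -> TransmitMixer::APMProcessStream()         (analog AGC runs here)
      <- newMicLevel                                 (scaled VoE -> ADM, 0 = unchanged)
  -> SetEvent(_hSetCaptureVolumeEvent)               (only if the level changed)
AudioDeviceWindowsWave::DoSetCaptureVolumeThread()   (volume thread)
  -> AudioMixerManager::SetMicrophoneVolume(newMicLevel)
[/code]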

And here are some custom log outputs I added to the source code to trace the detailed processing and the microphone level changes during AGC processing:

[code]…

CRITICAL  ; ( 3: 7:13:562 |    4) AUDIO DEVICE:    1    99;      4776; TransmitMixer::PrepareDemux, Near-end Voice Quality Enhancement (APM) processing, currentMicLevel=18 before processing

CRITICAL  ; ( 3: 7:13:562 |    0)        VOICE:    1    99;      4776; TransmitMixer::APMProcessStream, AudioProcessing::set_stream_analog_level(18)

CRITICAL  ; ( 3: 7:13:562 |    1)        VOICE:    1    99;      4776; TransmitMixer::APMProcessStream, AudioProcessing::get_stream_analog_level after processed(17)

CRITICAL  ; ( 3: 7:13:562 |    0) AUDIO DEVICE:    1    99;      4776; TransmitMixer::PrepareDemux,Measure audio level of speech after APM processing, currentMicLevel=18, energy=-1

CRITICAL  ; ( 3: 7:13:562 |    0) AUDIO DEVICE:    1    99;      4776; AudioDeviceBuffer: _ptrCbAudioTransport->RecordedDataIsAvailable return newMicLevel=4369

CRITICAL  ; ( 3: 7:13:562 |    0)      UTILITY:    1    99;      4776; AudioDeviceWindowsWave::RecProc AGC change of volume: => new=4369

CRITICAL  ; ( 3: 7:13:562 |    3) AUDIO DEVICE:    1    99;      5672; AudioMixerManager::SetMicrophoneVolume volume=4369

…..

[/code]
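Note the round trip in the log: the AGC is fed the VoE-range level 18, returns 17, and 17 is scaled back to the device range as 4369 (17 * 65535 / 255), which AudioMixerManager::SetMicrophoneVolume() finally applies.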



2 thoughts on "Reading codes of WebRTC: Deep into WebRTC Voice Engine (draft)"

  • Sam

    Hi Jackie, great article!
    May I ask you something? I'm trying to figure out a way to put another echo canceller after the WebRTC voice engine. What I'd like to do is to test my own AEC by deactivating WebRTC's.
    Where should I start looking for a better understanding of where and how the data from the sound card is pulled and used?

    Thanks a lot!

    • Jacky Wei (post author)

      Hi Sam, it was years ago that I researched WebRTC's audio processing related implementations, and the code I listed here could differ from the latest code.
      If you are still interested in my experience with the WebRTC Audio Engine, you can try to search for code like this:

      s32 CAudioTransportAPI::RecordedDataIsAvailable(const void* audioSamples,
      const uint32_t nSamples,
      const uint8_t nBytesPerSample,
      const uint8_t nChannels,
      const uint32_t samplesPerSec,
      const uint32_t totalDelayMS,
      const int32_t clockDrift,
      const uint32_t currentMicLevel,
      const bool keyPressed,
      uint32_t& newMicLevel)
      {
      ……
      }

      This was the exact callback interface invoked when audio buffers are ready from the sound card.
      To customize the callback or add some specific processing of your own, you need to add a new class that inherits from CAudioTransportAPI and overrides this callback function.