Reading codes of WebRTC: Deep into WebRTC Voice Engine (draft)

Introduction of this document

This module includes software-based acoustic echo cancellation (AEC), automatic gain control (AGC), noise reduction, noise suppression, and hardware access and control across multiple platforms.

My ultimate goal is to wrap WebRTC’s Voice Engine into an independent module for our product. The first task is to get AGC implemented on top of the current KdvMediaSDK implementation (whose interfaces are not quite the same as WebRTC’s).

Keywords: WebRTC, audio processing, AEC, AGC, noise reduction, noise suppression.

Overall architecture of WebRTC

The overall architecture looks something like this:

Overall architecture of WebRTC


(Image from

WebRTC Voice Engine – AGC control workflow

WebRTC Voice Engine - AGC control workflow


You can download my original Visio file here:

You can modify and distribute it however you wish, but if you make any improvements to this chart, please send me a copy by mail. That’ll benefit a lot more people. Thank you.

Target/related source codes

Major source codes:

• %WEBRTC%\src\modules\audio_device\main\source\win\

• %WEBRTC%\src\voice_engine\main\source\

• : %WEBRTC%\src\voice_engine\main\source\

• AudioLevel src\voice_engine\main\source\

Utility source codes:

Detailed interfaces & implementations

It is responsible for:

•  Audio capture

•  Get/Set Microphone Volume (I’m not sure what this volume means: the hardware volume, or a virtual volume after audio processing, only because it is Get/Set through

a. Audio capture.

Step 1: audio capture runs in a thread function named ThreadProcess().

Step 2: inside the RecProc() function, all capture parameters and the captured buffer are saved to the member variable |AudioDeviceBuffer* _ptrAudioBuffer|.

b. Get/Set Microphone Volume

Two other threads run alongside the main capture thread; they are



These threads are always running, waiting for a signal to get or set the capture volume.
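The "always running, waiting for a signal" pattern can be sketched with the C++ standard library. This is only an analogue under my own assumptions: the class name VolumeSetter and all of its members are hypothetical, and the WebRTC source uses its own thread and event wrappers rather than std::thread and std::condition_variable.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a "set capture volume" worker thread: it sleeps
// until another thread posts a new volume, then applies it.
class VolumeSetter {
public:
    VolumeSetter() : worker_([this] { Run(); }) {}

    ~VolumeSetter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from the capture thread: store the request and wake the worker.
    void RequestVolume(uint32_t volume) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = volume;
            has_pending_ = true;
        }
        cv_.notify_one();
    }

    // Volumes applied so far (a real worker would call the mixer API instead).
    std::vector<uint32_t> Applied() {
        std::lock_guard<std::mutex> lock(mutex_);
        return applied_;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Block until there is a request to handle or we are told to stop.
            cv_.wait(lock, [this] { return has_pending_ || stop_; });
            if (has_pending_) {
                applied_.push_back(pending_);
                has_pending_ = false;
            }
            if (stop_) return;
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool has_pending_ = false;
    bool stop_ = false;
    uint32_t pending_ = 0;
    std::vector<uint32_t> applied_;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

The capture thread never touches the mixer itself; it only posts the value and returns, so a slow volume call cannot stall audio capture.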

Things I’m still trying to figure out

There are many definitions related to the microphone level, and I’m not sure which one refers to which volume. Here are the volume-related definitions that confuse me:

1.    class VoEBaseImpl(

What’s the difference between currentVoEMicLevel and currentMicLevel in the code below?

These can also be compared to the variables _oldVoEMicLevel and _oldMicLevel defined in

WebRtc_UWord32 _oldVoEMicLevel

WebRtc_UWord32 _oldMicLevel

Where are these variables’ values set or changed?

This code is located at

2.    class AudioDeviceWindowsWave


WebRtc_UWord32 _newMicLevel

WebRtc_UWord32 _minMicVolume

Where are these variables’ values set or changed?

_newMicLevel value: |_ptrAudioBuffer->NewMicLevel();| in AudioDeviceWindowsWave::RecProc while processing AGC

3.    class TransmitMixer(

WebRtc_UWord32 _captureLevel;

The code listed below is the key to the microphone level values, including the level before processing and the level after processing.

4.    class AudioDeviceBuffer

WebRtc_UWord32 _currentMicLevel;

WebRtc_UWord32 _newMicLevel;

5.    class

Functional implementation flow:


The main code that calls the WebRTC APM (audio processing module) is in the function APMProcessStream() of the TransmitMixer class.

For example, the audio processing for the input audio frame (AudioFrame _audioFrame) is done here; the microphone level is calculated at the same time, and then the processed frame and the new microphone level are output.
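The call order described here (report the current level, process the frame, read back the suggested level) can be sketched as follows. Everything in this block is an assumption for illustration: FakeApm is a hypothetical stand-in, its method names only echo the names that appear in my log output, and its gain rule is invented and is not how the real AGC decides.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the audio processing module. Only the call order
// is meant to be representative; the gain rule is invented for illustration.
class FakeApm {
public:
    void set_stream_analog_level(uint32_t level) { level_ = level; }

    // Lowers the suggested level by one step for a loud frame and raises it
    // by one step for a very quiet frame; otherwise leaves it unchanged.
    void ProcessStream(const std::vector<int16_t>& frame) {
        int peak = 0;
        for (int16_t sample : frame) {
            int magnitude = std::abs(static_cast<int>(sample));
            if (magnitude > peak) peak = magnitude;
        }
        if (peak > 30000 && level_ > 0) {
            --level_;
        } else if (peak < 1000 && level_ < 255) {
            ++level_;
        }
    }

    uint32_t get_stream_analog_level() const { return level_; }

private:
    uint32_t level_ = 0;
};

// The three steps in order: tell the module the level the frame was captured
// at, process the frame, then read the level the module now suggests.
uint32_t ProcessAndGetNewLevel(FakeApm& apm,
                               const std::vector<int16_t>& frame,
                               uint32_t currentMicLevel) {
    apm.set_stream_analog_level(currentMicLevel);
    apm.ProcessStream(frame);
    return apm.get_stream_analog_level();
}
```

With a loud frame and a current level of 18, this sketch returns 17, which is the same shape of change as in my log output, although the real decision logic is certainly different.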

Here are some custom log outputs I added to the source code to trace the detailed processing and the microphone level changes during AGC processing:


CRITICAL  ; ( 3: 7:13:562 |    4) AUDIO DEVICE:    1    99;      4776; TransmitMixer::PrepareDemux, Near-end Voice Quality Enhancement (APM) processing, currentMicLevel=18 before processing

CRITICAL  ; ( 3: 7:13:562 |    0)        VOICE:    1    99;      4776; TransmitMixer::APMProcessStream, AudioProcessing::set_stream_analog_level(18)

CRITICAL  ; ( 3: 7:13:562 |    1)        VOICE:    1    99;      4776; TransmitMixer::APMProcessStream, AudioProcessing::get_stream_analog_level after processed(17)

CRITICAL  ; ( 3: 7:13:562 |    0) AUDIO DEVICE:    1    99;      4776; TransmitMixer::PrepareDemux,Measure audio level of speech after APM processing, currentMicLevel=18, energy=-1

CRITICAL  ; ( 3: 7:13:562 |    0) AUDIO DEVICE:    1    99;      4776; AudioDeviceBuffer: _ptrCbAudioTransport->RecordedDataIsAvailable return newMicLevel=4369

CRITICAL  ; ( 3: 7:13:562 |    0)      UTILITY:    1    99;      4776; AudioDeviceWindowsWave::RecProc AGC change of volume: => new=4369

CRITICAL  ; ( 3: 7:13:562 |    3) AUDIO DEVICE:    1    99;      5672; AudioMixerManager::SetMicrophoneVolume volume=4369
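One observation from the numbers in this log: the level read back after processing is 17, while the value handed to the device is 4369, and 4369 = 17 × 257 = 17 × 65535 / 255. That suggests a linear rescaling between a 0..255 range on the voice engine side and a 0..65535 range on the device side. This is my inference from the values only, not something confirmed against the source, and the function names and the constant below are hypothetical.

```cpp
#include <cstdint>

// Assumed ranges: voice-engine level 0..255, device volume 0..maxVolume.
constexpr uint32_t kMaxVoELevel = 255;

// Device volume -> voice-engine level, rounded to the nearest step.
uint32_t DeviceToVoE(uint32_t deviceVolume, uint32_t maxVolume) {
    if (maxVolume == 0) return 0;  // avoid dividing by zero
    uint64_t scaled = static_cast<uint64_t>(deviceVolume) * kMaxVoELevel;
    return static_cast<uint32_t>((scaled + maxVolume / 2) / maxVolume);
}

// Voice-engine level -> device volume, rounded to the nearest step.
uint32_t VoEToDevice(uint32_t voeLevel, uint32_t maxVolume) {
    uint64_t scaled = static_cast<uint64_t>(voeLevel) * maxVolume;
    return static_cast<uint32_t>((scaled + kMaxVoELevel / 2) / kMaxVoELevel);
}
```

Under these assumptions, VoEToDevice(17, 65535) gives 4369, and DeviceToVoE(4369, 65535) gives 17 again, which matches the pair of values in the log.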



2 thoughts on “Reading codes of WebRTC: Deep into WebRTC Voice Engine(draft)”

  • Sam

    Hi Jackie; great article!
    May I ask you something? I’m trying to figure out a way to put another echo canceller after the WebRTC voice engine. What I’d like to do is test my AEC with WebRTC’s own one deactivated.
    Where should I start looking to better understand where and how the data from the sound card is pulled and used?

    Thanks a lot!

    • Jacky Wei Post author

      Hi Sam, it was years ago that I researched WebRTC’s audio processing implementation, and the code I listed here could differ from the latest code.
      If you are still interested in my experience with the WebRTC Audio Engine, you can try searching the code for something like this:

      s32 CAudioTransportAPI::RecordedDataIsAvailable(const void* audioSamples,
      const uint32_t nSamples,
      const uint8_t nBytesPerSample,
      const uint8_t nChannels,
      const uint32_t samplesPerSec,
      const uint32_t totalDelayMS,
      const int32_t clockDrift,
      const uint32_t currentMicLevel,
      const bool keyPressed,
      uint32_t& newMicLevel)

      This was the exact callback interface invoked when audio buffers are ready from the sound card.
      To customize the callback or add some specific processing of your own, you need to add a new class that inherits from CAudioTransportAPI and overrides this callback function.
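A minimal sketch of that subclassing approach, under stated assumptions: the base class below is a hypothetical reduction of CAudioTransportAPI to just the callback quoted in the reply, s32 is assumed to be a 32-bit signed integer, and CountingTransport is an invented name. The real class has more members than shown here.

```cpp
#include <cstdint>

typedef int32_t s32;  // assumed alias; its real definition is not shown in the reply

// Hypothetical minimal base class: only the callback quoted in the reply.
class CAudioTransportAPI {
public:
    virtual ~CAudioTransportAPI() {}
    virtual s32 RecordedDataIsAvailable(const void* audioSamples,
                                        const uint32_t nSamples,
                                        const uint8_t nBytesPerSample,
                                        const uint8_t nChannels,
                                        const uint32_t samplesPerSec,
                                        const uint32_t totalDelayMS,
                                        const int32_t clockDrift,
                                        const uint32_t currentMicLevel,
                                        const bool keyPressed,
                                        uint32_t& newMicLevel) = 0;
};

// Invented subclass: counts captured samples and asks for no volume change.
class CountingTransport : public CAudioTransportAPI {
public:
    s32 RecordedDataIsAvailable(const void* audioSamples,
                                const uint32_t nSamples,
                                const uint8_t /*nBytesPerSample*/,
                                const uint8_t /*nChannels*/,
                                const uint32_t /*samplesPerSec*/,
                                const uint32_t /*totalDelayMS*/,
                                const int32_t /*clockDrift*/,
                                const uint32_t currentMicLevel,
                                const bool /*keyPressed*/,
                                uint32_t& newMicLevel) override {
        if (audioSamples == nullptr) return -1;
        // Custom processing (for example, your own echo canceller) would go here.
        samplesSeen_ += nSamples;
        newMicLevel = currentMicLevel;  // leave the microphone level as it is
        return 0;
    }

    uint64_t samplesSeen() const { return samplesSeen_; }

private:
    uint64_t samplesSeen_ = 0;
};
```

Because the callback is virtual, the audio device code can keep calling it through a base-class pointer and will reach the custom processing without any other change.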