Step by step to enable x264 with OpenCL – NVIDIA solution

The x264 project added OpenCL acceleration to its implementation around early 2013 (I am not sure of the exact date), and my goal here is to test the video encoding performance of x264 with the OpenCL accelerator enabled.

Test hardware environment: HP Pavilion 14
1. Graphics card: NVIDIA GeForce GT730M.
2. CPU: Intel Core i7-3632QM (Ivy Bridge)

Supporting DXVA 2.0 in DirectShow

This post is forwarded directly from the Microsoft MSDN website:

Earlier this month, I was researching hardware video encoding/decoding support for Linux-based environments, which involves the Intel Media SDK and VA-API (libva).

I happened to see this DXVA-related post on MSDN, so I decided to copy it to my blog.

This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a DirectShow decoder filter. Specifically, it describes the communication between the decoder and the video renderer. This topic does not describe how to implement DXVA decoding.


Performance tests of Intel Media SDK (4.0.026-HSW): Efficiency comparison between different Profiles

720P H.264 video decoding performance

Yesterday I got two test H.264 ES stream files from one of my colleagues.

One was encoded with Main Profile, the other with Baseline Profile.

The weird thing is that decoding the Main Profile file seems much faster than decoding the Baseline Profile file.

To address his concerns, I ran some more tests today.

First, I double-checked these two files together with another High Profile file.

Here is the test result:

Profile    Instances   FPS          CPU          GPU
HP         1           629.63575    <10%         50% ~ 60%
HP         4           854.59665    40% ~ 50%    80%
Main       1           1064.62068   20% ~ 30%    67%
Main       4           1257.23482   40% ~ 50%    90% ~ 93%
Baseline   1           740.99021    <10%         46% ~ 48%
Baseline   4           1241.61537   30% ~ 50%    80% ~ 83%

*CPU usage: max usage on a single core
*GPU usage: max usage of GPU’s bitstream space

According to the test results, the decoding efficiency of the Baseline and Main Profile videos looks really strange.

So I checked these test video streams:

Test video sequence bit stream file:

  • 720p HP: 720p-lakeside-forest.264 (Standard test sequence)
  • 720p Main: 720p-dec-main.264. Video recorded from a Kedacom product; a regular moving video, and clearly ragged.
  • 720p Baseline: 720p-vid2-baseline.264. Video recorded from a Kedacom product; a regular moving video.

Then I decided to transcode the Baseline Profile video (720p-vid2-baseline.264) to a Main Profile stream and run one more Main Profile decoding test, so that it uses the same YUV video sequence as the previous Baseline Profile decoding test.

Since the H.264 profile setting in MSDK is not directly exposed (though it can be set indirectly), I ported x264 to the current test framework (adding a new encoder library based on x264).

Finally, I got two more files:

720p-vid2-baseline-still-baseline.264, whose main encode settings were “baseline” + “veryfast”, “zerolatency”

Command: ./kdvcodec_msdkenc h264 -i 720p-baseline.yuv -o s.264 -w 1280 -h 720 -b 2000 -f 30 -hw -vaapi -u speed -loop 1 -run 1 -test 0 -cache 500

720p-vid2-baseline-to-main.264, whose main encode settings were “main” + “veryfast”, “zerolatency” (profile Main, level 5.2)

Command: ./kdvcodec_msdkenc h264 -i 720p-baseline.yuv -o s.264 -w 1280 -h 720 -b 2000 -f 30 -hw -vaapi -u speed -loop 1 -run 1 -test 2 -cache 500
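For readers who do not have the kdvcodec test framework, roughly the same x264 settings can be expressed with the stock x264 command-line tool. The sketch below only builds the argument list and does not run anything. It assumes the `x264` CLI is installed and that the bitrate of 2000 is in kbit/s; the helper name `build_x264_args` and the output file name are hypothetical and not part of the framework used above.

```python
def build_x264_args(profile, src, dst, width=1280, height=720,
                    fps=30, bitrate_kbps=2000):
    """Build an argument list for the stock x264 CLI (hypothetical helper)."""
    return [
        "x264",
        "--profile", profile,                 # "baseline" or "main"
        "--preset", "veryfast",
        "--tune", "zerolatency",
        "--input-res", f"{width}x{height}",   # raw YUV input needs its size
        "--fps", str(fps),
        "--bitrate", str(bitrate_kbps),       # x264 expects kbit/s
        "-o", dst,
        src,
    ]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(" ".join(build_x264_args("main", "720p-baseline.yuv", "out-main.264")))
```

Passing the list to `subprocess.run` would start the actual encode; only the `profile` argument changes between the two files.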

Now run one more test:

Profile    Instances   FPS          CPU          GPU
Baseline   1           721.26709    20% ~ 30%    67%
Baseline   4           1218.01066   40% ~ 50%    90% ~ 93%
Main       1           620.54006    <10%         46% ~ 48%
Main       4           1136.95245   30% ~ 50%    80% ~ 83%


  1. Decoding a ragged video costs less than decoding a normal video.
  2. Under the same conditions, decoding a Baseline Profile video can be more efficient than decoding a Main Profile video.
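To make the comparison concrete, the short sketch below computes the Baseline-to-Main FPS ratio for both test runs. The FPS figures are copied from the two tables above; the dictionary and function names are only for illustration. A ratio below 1 means the Main Profile stream decoded faster, and a ratio above 1 means the Baseline Profile stream did.

```python
# FPS figures copied from the two result tables in this post.
FIRST_RUN = {
    ("Main", 1): 1064.62068, ("Main", 4): 1257.23482,
    ("Baseline", 1): 740.99021, ("Baseline", 4): 1241.61537,
}
SECOND_RUN = {
    ("Baseline", 1): 721.26709, ("Baseline", 4): 1218.01066,
    ("Main", 1): 620.54006, ("Main", 4): 1136.95245,
}


def baseline_over_main(results, instances):
    """Return Baseline FPS divided by Main FPS for a given instance count."""
    return results[("Baseline", instances)] / results[("Main", instances)]


if __name__ == "__main__":
    for name, results in (("first run", FIRST_RUN), ("second run", SECOND_RUN)):
        for n in (1, 4):
            ratio = baseline_over_main(results, n)
            print(f"{name}, {n} instance(s): Baseline/Main = {ratio:.2f}")
```

With the original files the ratio is below 1 for both instance counts; once both streams are encoded from the same YUV sequence it moves above 1.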

Step by step: build libva 1.2.0 on Ubuntu 12.04 LTS

Having finished the first phase of my GPU research (Intel Media SDK), it’s now time to get into libva. My goal is to produce a set of performance test reports for both MSDK and libva, so that we can finally choose one of them, or both, for product use. If you don’t know libva, or are not sure whether you do, please visit here for more information: or (a copy in RG4.NET)


The first step of the libva research is to download and build libva.

Performance tests of Intel Media SDK (4.0.024-HSW) – Multithread decode

Performance test – multithread decode

Media SDK version

Intel Media SDK (4.0.024-HSW)

Test environments

CPU: Core i7-3770 (Ivy Bridge)

OS: Ubuntu server 12.04 LTS, kernel version 3.2.0-23 x86_64 x86_64 x86_64 GNU/Linux

Test program

kdvcodec_msdkdec_mt (svn: 10)

Working mode

a. Decode with simulated block mode

b. Return with NV12 buffer.

c. Run N decoding threads, opening and caching the H.264 elementary stream from one specified file.

d. Timing: starts before the threads are created and ends after all threads have terminated (so it is only a ballpark estimate).
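The timing scheme in (d) can be sketched as follows. This is not the kdvcodec_msdkdec_mt code; `fake_decode` is a hypothetical stand-in that only counts frames, so the sketch shows the measurement structure rather than real decoder throughput.

```python
import threading
import time


def fake_decode(frame_count, totals, index):
    """Stand-in for a real decoder loop (hypothetical); it only counts frames."""
    decoded = 0
    for _ in range(frame_count):
        decoded += 1
    totals[index] = decoded


def run_benchmark(num_threads, frames_per_thread):
    """Time N worker threads from before creation until after the last join."""
    totals = [0] * num_threads
    start = time.perf_counter()            # clock starts before any thread exists
    threads = [
        threading.Thread(target=fake_decode, args=(frames_per_thread, totals, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start  # clock stops after all threads end
    return sum(totals), elapsed


if __name__ == "__main__":
    frames, seconds = run_benchmark(4, 100_000)
    print(f"{frames} frames in {seconds:.3f} s -> {frames / seconds:.1f} FPS")
```

Because thread creation and teardown fall inside the measured interval, the resulting FPS slightly understates the pure decode rate, which is why the figure is only a ballpark.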

Performance tests of Intel Media SDK (4.0.024-HSW) – Decode

This is a test report for the current (latest) version of kdvcodec_msdkdec, which is based on the latest version of the Intel Media SDK (4.0.024-HSW).

There is a known issue: the NV12 to YV12 (or YUV420P) conversion is very inefficient.
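For context, here is a minimal sketch of what that conversion involves. It is not the code used in kdvcodec_msdkdec. It assumes a tightly packed frame (no stride padding) with even width and height; the function name is hypothetical. NV12 stores the Y plane followed by interleaved U/V samples, while YV12 stores the Y plane, then the whole V plane, then the whole U plane.

```python
def nv12_to_yv12(nv12, width, height):
    """Convert one tightly packed NV12 frame to YV12 (hypothetical helper)."""
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)
    if len(nv12) != y_size + 2 * chroma_size:
        raise ValueError("buffer size does not match an NV12 frame")

    y_plane = nv12[:y_size]
    uv = nv12[y_size:]      # interleaved U0 V0 U1 V1 ...
    u_plane = uv[0::2]      # every even byte is a U sample
    v_plane = uv[1::2]      # every odd byte is a V sample

    # YV12 plane order is Y, V, U; YUV420P (I420) would be Y, U, V.
    return y_plane + v_plane + u_plane


if __name__ == "__main__":
    frame = bytes(range(8)) + bytes([100, 200, 101, 201])  # 4x2 test frame
    print(list(nv12_to_yv12(frame, 4, 2)))
```

The de-interleaving step touches every chroma byte of every frame, which is one reason this conversion can become costly at high frame rates.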

Here are the details.

Step by step research into Intel Media SDK for Linux server

Intel recently released an alpha version of the Media SDK for Linux servers (Intel® Media SDK 2013 for Linux Servers), and I am now researching it. I will record the details of my research in this post. As usual, I cannot guarantee that everything I write here is exactly true and right; it is only a record of my experiments.


Intel® Media SDK 2013 for Linux Servers is an SDK for optimizing datacenter and embedded media applications for Linux operating systems to utilize Intel HD Graphics hardware acceleration capabilities. Now, quickly and easily develop optimized media applications for Linux operating systems such as encode, decode, and transcode for real-time streaming, teleconferencing, and video analytics.

You can visit: to get more information from Intel’s official Media SDK website


  • For Intel Xeon® E3-1285Lv2 and Intel Core™ Processor-based Platforms with Intel HD Graphics
  • Encode, decode, and transcode for server-based streaming
  • Supports Ubuntu* and SUSE* Linux Enterprise Operating Systems
  • Supports H.264, MPEG-2, VC-1 formats


Now let’s set up the development environment step by step and get to work. My hardware environment (CPU) is a 3rd Generation Core (Ivy Bridge); according to Intel’s Release Notes, I can only choose Ubuntu 12.04 Server 64-bit with kernel 3.2.

1. OS Install
Related files:
1. ubuntu-12.04-server-amd64.iso
2. unetbootin-windows-581.exe

According to Intel’s Release Notes, the target OS version matching the Media SDK is Ubuntu 12.04 Server 64-bit with kernel 3.2.
So we choose ubuntu-12.04-server-amd64.iso (we cannot download this version of Ubuntu Server on, if you don’t have it, you can mail me or call me)

1. Get yourself a USB flash drive (4 GB or bigger) and use unetbootin-windows-581.exe to write the ISO to it.
2. Plug the USB drive into the 8000e, power it up, press DEL to enter the BIOS, and choose the newly plugged-in USB drive as the first boot option.
3. Install Ubuntu Server.
4. Make sure your 8000e can access the internet (mail your 8000e’s Ethernet MAC address together with your pre-assigned IP address to 黄阳).
5. I strongly recommend not doing this on the 8000e’s iCF4000 card (which has only 8 GB of ROM), because I got stuck here twice, being told “no space left” while compiling the kernel.
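Since running out of disk space only shows up partway through the kernel build, a quick check beforehand can save time. The sketch below uses Python’s standard `shutil.disk_usage`; the function name is hypothetical, and the amount of space a kernel build really needs is not stated in this post, so the threshold is left to the caller.

```python
import shutil

GIB = 1024 ** 3


def has_enough_space(path, required_gib):
    """Return True if the filesystem holding `path` has the requested free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


if __name__ == "__main__":
    # 20 GiB is an arbitrary illustrative figure, not a measured requirement.
    needed = 20
    if has_enough_space(".", needed):
        print(f"At least {needed} GiB free, OK to proceed.")
    else:
        print(f"Less than {needed} GiB free, pick another disk.")
```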
2. Media SDK install
Related files:
1. haswell.rar, which includes but is not limited to these files:
a. intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz
b. kmd_patched_sources.tar.bz2

Upload haswell.rar to the 8000e and extract it; you will find a file named intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz
1. Extract intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz
2. Go to the extracted files, then find and execute ./

jacky@ubuntu-msdk:~/msdk$ ./
INFO... Install on Ubuntu ...
Error... This script must be run as root!
jacky@ubuntu-msdk:~/msdk$ sudo -i
[sudo] password for jacky:
root@ubuntu-msdk:~# ls
root@ubuntu-msdk:~# cd /home/jacky/msdk/
root@ubuntu-msdk:/home/jacky/msdk# ls
bak  intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz  kmd  MSDK  usr
root@ubuntu-msdk:/home/jacky/msdk# ./
INFO... Install on Ubuntu ...
INFO... Installing New Driver...
INFO... The default media driver is renderless API, do you want to use X11 backend?
press 'y' to use X11 backend, otherwise by default(drm backend, renderless)y
INFO... X11 backend enabled!
INFO... MediaSDK installed successfully in /opt/intel/mediasdk!
INFO... Do you want to install KMD?
press 'y' to confirm, otherwise cancelled.y
INFO... Original i915.ko backuped in kmd_backup/i915.ko.2013-07-02_182315
INFO... Trying to install 3.2.42 kmd...
Error... Kernel module updated failed, due to mismatched kernel 3.2.0-23-generic with pre-build KMD. You have to rebuild kernel with patched files (kmd/source) manually.
INFO... Package installation Done.
root@ubuntu-msdk:/home/jacky/msdk# uname -a
Linux ubuntu-msdk 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

As you can see, I was told to rebuild the kernel, so that is what we do. However, it’s a long way to go…
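The mismatch reported in the log above is between the running kernel (3.2.0-23-generic) and the pre-built KMD (3.2.42). The sketch below shows one way to compare the two version strings. The installer’s real matching rule is not documented in this post, so exact equality of the three-part version is an assumption, and both function names are hypothetical.

```python
import re


def kernel_triple(release):
    """Extract (major, minor, patch) from a release string such as '3.2.0-23-generic'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def kmd_matches(running, prebuilt):
    """Assumed rule: the pre-built module fits only if the version triples are equal."""
    return kernel_triple(running) == kernel_triple(prebuilt)


if __name__ == "__main__":
    running, prebuilt = "3.2.0-23-generic", "3.2.42"
    print(kernel_triple(running), kernel_triple(prebuilt),
          "match" if kmd_matches(running, prebuilt) else "mismatch")
```

On a live system the running release string can be read with `platform.release()` from the standard library.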

3. Install the packages needed to rebuild the kernel:
If you encounter network configuration or routing problems, you can refer to … pid=24641&fromuid=3 for a guide.

sudo apt-get install git g++ make curl
sudo apt-get install ncurses-dev kernel-package

4. Download kernel

git clone git://
cd linux-stable
git checkout v3.2

5. Apply Intel Media SDK patch

jacky@ubuntu-msdk:~/kernel/linux-stable/include$ cd ../drivers/char/agp/
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/char/agp$ cp -r ~/msdk/kmd/source/xcode-ubuntu-kmd-rel/drivers/char/agp/* .
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/char/agp$ cd ../../gpu/
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/gpu$ cp -r ~/msdk/kmd/source/xcode-ubuntu-kmd-rel/drivers/gpu/* .

6. Recompile kernel

make menuconfig
make-kpkg --initrd --append-to-version -01 kernel_image kernel_headers -j8


Start GPU encoding/decoding with VA-API (Video Acceleration API)


The main motivation for VA-API (Video Acceleration API) is to enable hardware accelerated video decode/encode at various entry-points (VLD, IDCT, Motion Compensation etc.) for the prevailing coding standards today (MPEG-2, MPEG-4 ASP/H.263, MPEG-4 AVC/H.264, and VC-1/WMV3). Extending XvMC was considered, but due to its original design for MPEG-2 MotionComp only, it made more sense to design an interface from scratch that can fully expose the video decode capabilities in today’s GPUs.

The current video decode/encode interface is window system independent, so that potentially it can be used with graphics sub-systems other than X. In a nutshell it is basically a scheme to pass various types of data buffers from the application to the GPU for decoding or encoding. Feedback on the API is greatly welcomed, as this is intended to be a community collaborative effort.


The latest releases of libva software can be found at:


libva, an implementation of VA-API for Linux, is now available via git from the following location:

git clone git://

The gstreamer-vaapi elements are available at:

git clone git://


Latest VA-API decode/encode specification can be found at,

Post-processing interface can be found at

Drivers (back-ends) that implement VA-API

  • Broadcom Crystal HD (work-in-progress)
  • Intel Embedded Graphics Drivers (IEGD)
  • Intel Embedded Media and Graphics Drivers (EMGD)
  • Intel GMA500 driver (OEM only)
  • Intel integrated G45 graphics chips
  • IMG VXD375/385 and VXE250/285 video engines
  • VDPAU back-end for NVIDIA and VIA chipsets
  • VIA / S3 Graphics Accelerated Linux Driver
  • XvBA / ATI Graphics Backend (for proprietary driver only)

    Other back-ends are currently under development.

Decoding Hardware with no backend available


Software using VA-API

  • Clutter toolkit (through clutter-gst, thus GStreamer)
  • FFmpeg (upstream SVN tree >= 2010/01/18 / version 0.6.x and onwards)
  • Fluendo video codec pack for Intel Atom (GStreamer)
  • Gnash flash player
  • GStreamer
  • Lightspark flash player
  • MPlayer/VAAPI (`hwaccel-vaapi` branch)
  • MythTV (work-in-progress)
  • RealPlayer for MID
  • Totem movie player (simply requires GStreamer VA-API plug-ins)
  • VideoLAN – VLC media player
  • XBMC
  • Xine

libVA sample code

  • Hardware video decoding acceleration demos
  • Decode sample program
  • Encode sample program
  • Post-processing sample program




Jonathan Bian; Austin Yuan


Integrating Intel® Media SDK with FFmpeg for mux/demuxing and audio encode/decode usages

Download Article and Source Code

Download Integrating Intel® Media SDK with FFmpeg for mux/demuxing and audio encode/decode usages (PDF 568KB)
Download Source Code. (ZIP 98KB) (Note: Licensing terms match Media SDK 2012)


The provided samples intend to illustrate how Intel® Media SDK can be used together with the popular FFmpeg suite of components to perform container muxing and demuxing (splitting). The samples also showcase integration of rudimentary FFmpeg audio decode and encode.

A simple guide to start GPU programming

This is a simple guide to getting started with GPU programming using the CUDA SDK. My working environment is WinXP + VS 2010.

If you are looking for a comprehensive guide to GPU programming, you need to visit

Here we go.

1. Installing CUDA Development Tools

Key steps:

  • Verify the system has a CUDA-capable GPU.
  • Download the NVIDIA CUDA Toolkit.
  • Install the NVIDIA CUDA Toolkit.
  • Test that the installed software runs correctly and communicates with the hardware.


For WinXP(32 bit):

You can choose what to install from the following packages:

  1. CUDA Driver

    The CUDA Driver installation can be done silently or by using a GUI. A silent installation of the driver is done by enabling that feature when choosing what to install.

    • Silent: Only the display driver will be installed.
    • GUI: A window will appear after the CUDA Toolkit installation if you allowed it at the last dialog with the full driver installation UI. You can choose which features you wish to install.

    Note: If you want to install the CUDA Driver for new hardware, and have already installed the CUDA Driver before, you can launch the CUDA Driver installer from the Start Menu under:

    NVIDIA Corporation\CUDA Toolkit\v5.0, or

    NVIDIA Corporation\CUDA Toolkit\v5.0 (64 bit)
  2. CUDA Toolkit

    The CUDA Toolkit installation defaults to

    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v#.#, where #.# is version number 3.2 or higher. This directory contains the following:

    • the compiler executables and runtime libraries
    • the header files needed to compile CUDA programs
    • the library files needed to link CUDA programs
    • the CUDA C Programming Guide, CUDA C Best Practices Guide, documentation for the CUDA libraries, and other CUDA Toolkit-related documentation

    Note: CUDA Toolkit versions 3.1 and earlier installed into C:\CUDA by default, requiring prior CUDA Toolkit versions to be uninstalled before the installation of new versions. Beginning with CUDA Toolkit 3.2, multiple CUDA Toolkit versions can be installed simultaneously.

  3. CUDA Samples

    The CUDA Samples contain source code for many example problems and templates with Microsoft Visual Studio 2008 and 2010 projects.

    For Windows XP, the samples can be found here:

    C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\CUDA Samples\v5.0

    For Windows Vista, Windows 7, and Windows Server 2008, the samples can be found here:

    C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0

2. Compiling CUDA Programs

The project files in the CUDA Samples have been designed to provide simple, one-click builds of the programs that include all source code. To build the 32-bit or 64-bit Windows projects (for release or debug mode), use the provided *.sln solution files for Microsoft Visual Studio 2008 or 2010 (and likewise for the corresponding versions of Microsoft Visual C++ Express Edition). You can use either the solution files located in each of the examples directories in

CUDA Samples\v5.0\C\<category>\<sample_name>

or the global solution files Samples*.sln located in

CUDA Samples\v5.0\C

CUDA Samples are organized according to <category>. Each sample is organized into one of the following folders: (0_Simple, 1_Utilities, 2_Graphics, 3_Imaging, 4_Finance, 5_Simulations, 6_Advanced, 7_CUDALibraries).