Performance tests of Intel Media SDK (4.0.026-HSW): Efficiency comparison between different Profiles

720P H.264 video decoding performance

Yesterday I got two H.264 elementary stream (ES) test files from one of my colleagues.

One was encoded with the Main Profile, the other with the Baseline Profile.

The weird thing is that decoding the Main Profile file seems much faster than decoding the Baseline Profile file.

To address his concerns, I ran some more tests today.

First, I double-checked these two files together with an additional High Profile file.

Here are the test results:

Profile   Instances  FPS         CPU        GPU
HP        1          629.63575   <10%       50% ~ 60%
HP        4          854.59665   40% ~ 50%  80%
Main      1          1064.62068  20% ~ 30%  67%
Main      4          1257.23482  40% ~ 50%  90% ~ 93%
Baseline  1          740.99021   <10%       46% ~ 48%
Baseline  4          1241.61537  30% ~ 50%  80% ~ 83%

*CPU usage: max usage on a single core
*GPU usage: max usage of GPU’s bitstream space
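
To make the table easier to compare, the throughput gain from 1 to 4 instances can be worked out per profile. This is a minimal sketch using the FPS figures from the table above; the helper name `scaling` is only illustrative, and it assumes the FPS column is the aggregate over all instances, which the table does not state.

```python
# FPS figures copied from the table above: profile -> {instances: FPS}.
RESULTS = {
    "HP":       {1: 629.63575, 4: 854.59665},
    "Main":     {1: 1064.62068, 4: 1257.23482},
    "Baseline": {1: 740.99021, 4: 1241.61537},
}

def scaling(fps_by_instances, base=1, scaled=4):
    """Return the FPS ratio between `scaled` and `base` instances."""
    return fps_by_instances[scaled] / fps_by_instances[base]

for profile, fps in RESULTS.items():
    print(f"{profile:<8} x{scaling(fps):.2f} going from 1 to 4 instances")
```

Under that assumption, Baseline gains the most from the extra instances and Main the least.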

According to these results, the relative decoding efficiency of the Baseline and Main Profile videos looks really strange.

So I checked these test video streams:

Test video bitstream files:

  • 720p HP: 720p-lakeside-forest.264 (Standard test sequence)
  • 720p Main: 720p-dec-main.264. Video recorded from a Kedacom product; a regular moving video, and clearly ragged.
  • 720p Baseline: 720p-vid2-baseline.264. Video recorded from a Kedacom product; a regular moving video.

Then I decided to transcode the Baseline Profile video (720p-vid2-baseline.264) to a Main Profile stream and run one more Main Profile decoding test, so that it uses the same YUV video sequence as the Baseline Profile decoding test.

Since the H.264 profile setting in MSDK is not directly exposed (although it can be set indirectly), I ported x264 to the current test framework (adding a new encoder library based on x264).

Finally, I got two more files:

720p-vid2-baseline-still-baseline.264, whose main encode settings were “baseline” + “veryfast” + “zerolatency”

Command: ./kdvcodec_msdkenc h264 -i 720p-baseline.yuv -o s.264 -w 1280 -h 720 -b 2000 -f 30 -hw -vaapi -u speed -loop 1 -run 1 -test 0 -cache 500

720p-vid2-baseline-to-main.264, whose main encode settings were “main” + “veryfast” + “zerolatency” (profile Main, level 5.2)

Command: ./kdvcodec_msdkenc h264 -i 720p-baseline.yuv -o s.264 -w 1280 -h 720 -b 2000 -f 30 -hw -vaapi -u speed -loop 1 -run 1 -test 2 -cache 500

Then I ran one more test:

Profile   Instances  FPS         CPU        GPU
Baseline  1          721.26709   20% ~ 30%  67%
Baseline  4          1218.01066  40% ~ 50%  90% ~ 93%
Main      1          620.54006   <10%       46% ~ 48%
Main      4          1136.95245  30% ~ 50%  80% ~ 83%
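
Since both streams in this second test come from the same YUV source, the gap between them can be expressed as a percentage. This is a small sketch using the FPS numbers above; the helper name `advantage_percent` is made up for illustration.

```python
# FPS from the second test; both streams were encoded from the same YUV source.
BASELINE = {1: 721.26709, 4: 1218.01066}
MAIN = {1: 620.54006, 4: 1136.95245}

def advantage_percent(fast, slow):
    """Return how much higher `fast` is than `slow`, in percent."""
    return (fast / slow - 1.0) * 100.0

for n in (1, 4):
    gain = advantage_percent(BASELINE[n], MAIN[n])
    print(f"{n} instance(s): Baseline decodes {gain:.1f}% faster than Main")
```

That works out to roughly 16% with one instance and about 7% with four.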

Conclusion:

  1. Decoding a ragged video will cost less than decoding a normal video.
  2. Under equal conditions, decoding a Baseline Profile video can be more efficient than decoding a Main Profile video.

Performance tests of Intel Media SDK (4.0.024-HSW) – Multithread decode

Performance test – multithread decode

Media SDK version

Intel Media SDK (4.0.024-HSW)

Test environments

CPU: Intel Core i7-3770 (Ivy Bridge)

OS: Ubuntu Server 12.04 LTS, kernel version 3.2.0-23, x86_64 GNU/Linux

Test program

kdvcodec_msdkdec_mt (svn: 10)

Working mode

a. Decode in simulated blocking mode.

b. Return NV12 buffers.

c. Run N decoding threads, which open and cache the H.264 elementary stream from one specified file.

d. Timing: the clock starts before the threads are created and stops after all threads have terminated (so it is only a ballpark estimate).
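
The timing scheme in item d can be sketched roughly as follows. This is not the code of kdvcodec_msdkdec_mt: `decode_worker` is a hypothetical stand-in that only counts frames, and the point is simply where the clock starts and stops.

```python
import threading
import time

def decode_worker(frames, counter, lock):
    """Hypothetical stand-in for one decoding thread; a real one would call the decoder."""
    for _ in range(frames):
        pass  # placeholder for decoding one frame
    with lock:
        counter[0] += frames

def measure_fps(num_threads, frames_per_thread):
    """Time from before thread creation until every thread has terminated."""
    counter, lock = [0], threading.Lock()
    start = time.perf_counter()  # clock starts before any thread exists
    threads = [
        threading.Thread(target=decode_worker,
                         args=(frames_per_thread, counter, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start  # clock stops after the last join
    total = counter[0]
    fps = total / elapsed if elapsed > 0 else float("inf")
    return fps, total

fps, total = measure_fps(num_threads=4, frames_per_thread=1000)
print(f"decoded {total} frames, about {fps:.0f} fps overall")
```

Because thread creation and teardown fall inside the measured interval, the resulting FPS slightly understates pure decode throughput, which is why it is only a ballpark figure.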

Performance tests of Intel Media SDK (4.0.024-HSW) – Decode

This is a test report for the latest version of kdvcodec_msdkdec, which is based on the latest version of Intel Media SDK (4.0.024-HSW).

There is a known issue: NV12 to YV12 (or YUV420P) conversion is very inefficient.

Here are the details.
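
For reference, the NV12 to YV12 conversion mentioned above boils down to de-interleaving the chroma plane. The sketch below is not the implementation used in kdvcodec_msdkdec; it assumes a tightly packed frame (no row padding) with even width and height, and pure Python like this is far too slow for real use. It only shows the memory layout involved.

```python
def nv12_to_yv12(frame, width, height):
    """Convert one packed NV12 frame (bytes) to YV12 by splitting the UV plane."""
    y_size = width * height
    uv_size = y_size // 2
    if len(frame) != y_size + uv_size:
        raise ValueError("frame size does not match width x height for NV12")
    y_plane = frame[:y_size]
    uv_plane = frame[y_size:]
    u_plane = uv_plane[0::2]  # NV12 stores chroma interleaved as U0 V0 U1 V1 ...
    v_plane = uv_plane[1::2]
    return y_plane + v_plane + u_plane  # YV12 plane order: Y, then V, then U

# Tiny 4x2 example frame: 8 luma bytes followed by 2 interleaved U/V pairs.
sample = bytes(range(8)) + bytes([100, 200, 101, 201])
print(list(nv12_to_yv12(sample, 4, 2)))
```

For YUV420P (I420) the last two planes are swapped, so the result would be Y, then U, then V.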

Step by step research into Intel Media SDK for Linux server

Intel recently released an alpha version of Media SDK for Linux servers (Intel® Media SDK 2013 for Linux Servers), and I am now looking into it. I will record the details of my research in this post. As usual, I cannot guarantee that everything I write here is exactly true and correct; it is only a record of my experiments.

Introduction

Intel® Media SDK 2013 for Linux Servers is an SDK for optimizing datacenter and embedded media applications for Linux operating systems to utilize Intel HD Graphics hardware acceleration capabilities. Now, quickly and easily develop optimized media applications for Linux operating systems such as encode, decode, and transcode for real-time streaming, teleconferencing, and video analytics.

You can visit http://software.intel.com/en-us/vcsource/tools/media-sdk-linux for more information from Intel’s official Media SDK website.

Features

  • For Intel Xeon® E3-1285Lv2 and Intel Core™ Processor-based Platforms with Intel HD Graphics
  • Encode, decode, and transcode for server-based streaming
  • Supports Ubuntu* and SUSE* Linux Enterprise Operating Systems
  • Supports H.264, MPEG-2, VC-1 formats

Experiments

Here I set up the development environment step by step and get it working. My hardware (CPU) is a 3rd Generation Core (Ivy Bridge); according to Intel’s Release Notes, my only choice is Ubuntu 12.04 Server 64-bit with kernel 3.2.

1. OS Install
Related files:
1. ubuntu-12.04-server-amd64.iso
2. unetbootin-windows-581.exe

According to Intel’s Release Notes, the target OS for the Media SDK is Ubuntu 12.04 Server 64-bit with kernel 3.2.
So we choose ubuntu-12.04-server-amd64.iso (this version of Ubuntu Server can no longer be downloaded from ubuntu.com; if you don’t have it, you can mail or call me).

1. Get yourself a USB flash drive (4 GB or bigger) and use unetbootin-windows-581.exe to write the ISO to it.
2. Plug the USB drive into the 8000e, power it up, press DEL to enter the BIOS, and choose the newly plugged-in USB drive as the first boot option.
3. Install Ubuntu Server.
4. Make sure your 8000e can access the internet (mail your 8000e’s Ethernet MAC address together with your pre-assigned IP address to 黄阳).
5. I strongly recommend not installing on the 8000e’s iCF4000 card (which has only 8 GB), because I got stuck here twice with “no space left” errors while compiling the kernel.
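
To avoid running into “no space left” halfway through a kernel build, free space can be checked up front. This is only a sketch: the 16 GiB threshold is a guess on my side (all the post establishes is that the 8 GB card was not enough), and `has_enough_space` is an illustrative helper name.

```python
import shutil

def has_enough_space(path, required_gib):
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= required_gib

# 16 GiB is an assumed threshold, not a measured requirement.
print("enough space for a kernel build:", has_enough_space(".", 16))
```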
2. Media SDK install
Related files:
1. haswell.rar, which includes, but is not limited to, these files:
a. intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz
b. kmd_patched_sources.tar.bz2

Upload haswell.rar to the 8000e and extract it; you will find a file named intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz.
1. Extract intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz.
2. Go to the extracted files, then find and run ./install_media.sh:

jacky@ubuntu-msdk:~/msdk$ ./install_media.sh
INFO... Install on Ubuntu ...
Error... This script must be run as root!
jacky@ubuntu-msdk:~/msdk$ sudo -i
[sudo] password for jacky:
root@ubuntu-msdk:~# ls
root@ubuntu-msdk:~# cd /home/jacky/msdk/
root@ubuntu-msdk:/home/jacky/msdk# ls
bak  install_media.sh  intel-linux-media_ubuntu_16.1.0.8778_64bit.tar.gz  kmd  MSDK  usr
root@ubuntu-msdk:/home/jacky/msdk# ./install_media.sh
INFO... Install on Ubuntu ...
INFO... Installing New Driver...
INFO... The default media driver is renderless API, do you want to use X11 backend?
press 'y' to use X11 backend, otherwise by default(drm backend, renderless)y
INFO... X11 backend enabled!
INFO... MediaSDK installed successfully in /opt/intel/mediasdk!
INFO... Do you want to install KMD?
press 'y' to confirm, otherwise cancelled.y
INFO... Original i915.ko backuped in kmd_backup/i915.ko.2013-07-02_182315
INFO... Trying to install 3.2.42 kmd...
Error... Kernel module updated failed, due to mismatched kernel 3.2.0-23-generic with pre-build KMD. You have to rebuild kernel with patched files (kmd/source) manually.
INFO... Package installation Done.
root@ubuntu-msdk:/home/jacky/msdk# uname -a
Linux ubuntu-msdk 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

As you can see, the installer told me to rebuild the kernel, so that is what we do. However, it’s a long way to go…
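
The mismatch reported in the log above can be illustrated by comparing version numbers. How install_media.sh actually performs this check is not shown in the log, so the exact-match rule below is an assumption; the version strings are taken from the log, and the helper names are made up.

```python
import re

def kernel_version(release):
    """Extract (major, minor, patch) from a release string such as '3.2.0-23-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())

def matches_kmd(release, kmd_version="3.2.42"):
    """Assumed rule: the prebuilt KMD only fits a kernel with exactly its version."""
    return kernel_version(release) == kernel_version(kmd_version)

# On a live system the release string could come from platform.release().
running = "3.2.0-23-generic"
print(running, "matches prebuilt 3.2.42 KMD:", matches_kmd(running))
```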

3. Install the packages needed to rebuild the kernel:
If you run into network configuration or routing problems, you can refer to http://bbs.rosoo.net/forum.php?m … pid=24641&fromuid=3 for a guide.

sudo apt-get install git g++ make curl
sudo apt-get install ncurses-dev kernel-package

4. Download kernel

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git checkout v3.2

5. Apply Intel Media SDK patch

jacky@ubuntu-msdk:~/kernel/linux-stable/include$ cd ../drivers/char/agp/
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/char/agp$ cp -r ~/msdk/kmd/source/xcode-ubuntu-kmd-rel/drivers/char/agp/* .
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/char/agp$ cd ../../gpu/
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/gpu$ cp -r ~/msdk/kmd/source/xcode-ubuntu-kmd-rel/drivers/gpu/* .
jacky@ubuntu-msdk:~/kernel/linux-stable/drivers/gpu$

6. Recompile kernel

make menuconfig
make-kpkg --initrd --append-to-version -01 kernel_image kernel_headers -j8

 

Start GPU encoding/decoding with VA-API (Video Acceleration API)

About

The main motivation for VA-API (Video Acceleration API) is to enable hardware accelerated video decode/encode at various entry-points (VLD, IDCT, Motion Compensation etc.) for the prevailing coding standards today (MPEG-2, MPEG-4 ASP/H.263, MPEG-4 AVC/H.264, and VC-1/WMV3). Extending XvMC was considered, but due to its original design for MPEG-2 MotionComp only, it made more sense to design an interface from scratch that can fully expose the video decode capabilities in today’s GPUs.

The current video decode/encode interface is window system independent, so that potentially it can be used with graphics sub-systems other than X. In a nutshell it is basically a scheme to pass various types of data buffers from the application to the GPU for decoding or encoding. Feedback on the API is greatly welcomed, as this is intended to be a community collaborative effort.

Download

The latest releases of libva software can be found at: http://www.freedesktop.org/software/vaapi/

Git

libva, an implementation of VA-API for Linux, is now available via git from the following location (http://cgit.freedesktop.org/libva/):

git clone git://anongit.freedesktop.org/git/libva

The gstreamer-vaapi elements are available at: https://gitorious.org/vaapi/gstreamer-vaapi

git clone git://gitorious.org/vaapi/gstreamer-vaapi.git

Specification

Latest VA-API decode/encode specification can be found at http://cgit.freedesktop.org/libva/tree/va/va.h

Post-processing interface can be found at http://cgit.freedesktop.org/libva/tree/va/va_x11.h

Drivers (back-ends) that implement VA-API

  • Broadcom Crystal HD (work-in-progress):
    * <http://gitorious.org/crystalhd-video>
    
  • Intel Embedded Graphics Drivers (IEGD):
    * <http://edc.intel.com/Software/Downloads/IEGD/>
    
  • Intel Embedded Media and Graphics Drivers (EMGD):
    * <http://edc.intel.com/Software/Downloads/EMGD/> 
    
  • Intel GMA500 driver (OEM only):
    * <https://launchpad.net/~ubuntu-mobile/+archive/ppa> 
    
  • Intel integrated G45 graphics chips:
    * <http://cgit.freedesktop.org/vaapi/intel-driver> 
    
  • IMG VXD375/385 and VXE250/285 video engines:
    * <http://cgit.freedesktop.org/vaapi/pvr-driver/> 
    
  • VDPAU back-end for NVIDIA and VIA chipsets:
    * <http://cgit.freedesktop.org/vaapi/vdpau-driver/> 
    
  • VIA / S3 Graphics Accelerated Linux Driver:
    * <http://www.s3graphics.com/en/index.aspx> 
    
  • XvBA / ATI Graphics Backend (for proprietary driver only)
    * <http://cgit.freedesktop.org/vaapi/xvba-driver/> 
    

    Other back-ends are currently under development.

Decoding Hardware with no backend available

  • NONE FOR NOW

Software using VA-API

  • Clutter toolkit (through clutter-gst, thus GStreamer):
    * <http://clutter-project.org/> 
    
  • FFmpeg (upstream SVN tree >= 2010/01/18 / version 0.6.x and onwards):
    * <http://ffmpeg.org/> 
    
  • Fluendo video codec pack for Intel Atom (GStreamer):
    * <http://www.fluendo.com/> 
    
  • Gnash flash player:
    * <http://wiki.gnashdev.org/Hardware_Video_decoding> 
    
  • GStreamer:
    * <http://gitorious.org/vaapi/gstreamer-vaapi> 
    
  • Lightspark flash player:
    * <http://lightspark.sourceforge.net/> 
    
  • MPlayer/VAAPI:
    * <http://gitorious.org/vaapi/mplayer> (`hwaccel-vaapi` branch) 
    
  • MythTV (work-in-progress):
    * <http://www.mythtv.org/wiki/VAAPI> 
    
  • RealPlayer for MID:
    * <https://community.helixcommunity.org/Licenses/realplayer_for_mid_faq.html> 
    
  • Totem movie player (simply requires GStreamer VA-API plug-ins):
    * <http://projects.gnome.org/totem/> 
    
  • VideoLAN – VLC media player:
    * <http://www.videolan.org/> 
    
  • XBMC:
    * <http://www.xbmc.org/> 
    
  • Xine:
    * <https://github.com/huceke/xine-lib-vaapi/tree/vaapi> 
    

libVA sample code

  • Hardware video decoding acceleration demos:
    * <http://gitorious.org/hwdecode-demos/> 
    
  • Decode sample program:
    * <http://cgit.freedesktop.org/libva/tree/test/decode/mpeg2vldemo.c> 
    
  • Encode sample program:
    * <http://cgit.freedesktop.org/libva/tree/test/encode/h264encode.c> 
    
  • Post-processing sample program:
    * <http://cgit.freedesktop.org/libva/tree/test/putsurface/putsurface.c> 
    

Architecture

[Figure: Linux_vaAPI.gif (VA-API architecture diagram)]

Contact

Jonathan Bian (jonathan.bian@intel.com); Austin Yuan (shengquan.yuan@intel.com)

From: http://www.freedesktop.org/wiki/Software/vaapi/

Integrating Intel® Media SDK with FFmpeg for mux/demuxing and audio encode/decode usages

Download Article and Source Code

Download Integrating Intel® Media SDK with FFmpeg for mux/demuxing and audio encode/decode usages (PDF 568KB)
Download Source Code. (ZIP 98KB) (Note: Licensing terms match Media SDK 2012)

Introduction

The provided samples intend to illustrate how Intel® Media SDK can be used together with the popular FFmpeg suite of components to perform container muxing and demuxing (splitting). The samples also showcase integration of rudimentary FFmpeg audio decode and encode.