Performance tests of Intel Media SDK (4.0.024-HSW) – Decode


This is a test report on the latest version of kdvcodec_msdkdec, which is based on the latest version of the Intel Media SDK (4.0.024-HSW).

There is one known issue: the NV12 to YV12 (or YUV420P) conversion is very inefficient.

Here are the details.

I. Intel MSDK version

Beta version: 4.0.024-HSW

II. Hardware

Ivy Bridge Core i7-3770. Here are the details (you can view them with the shell command: cat /proc/cpuinfo)

vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
microcode : 0x12
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 6799.77
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

III. OS

Ubuntu 12.04 LTS, kernel version 3.2.0-23

jacky@ubuntu-msdk:~$ uname -a

Linux ubuntu-msdk 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

IV. Test video sequences

  • 1080p.264: total 300 frames
  • 1080P-10M-600.264: total 600 frames

V. Test program

1. Sample program provided by Intel
Main steps and features:

  • Read the 264 file
  • Decode in asynchronous mode
  • Save to a YUV file

This test program's performance is limited by steps that are unnecessary for our purposes, such as reading the 264 ES stream from a file and saving the decoded YUV buffer to a file. The Intel Media SDK's output format is NV12.

So I made a few modifications to bring the test closer to our product's run-time scenario. Here are the modifications:
1. Skip reading the 264 file during decoding by loading the whole file into memory and parsing it into an array of frames (there are bugs here).
2. Implemented two decode modes (async mode and blocking mode).
3. Decoded buffer processing, in the following scenarios:
A. Copy the decoded NV12 to a buffer.
B. Copy the decoded NV12 to a buffer, and memset the NV12's UV data to 0.
C. Copy and convert the NV12 to YUV420P (unoptimized color conversion).
D. Copy and convert the NV12 to YUV420P, and memset the UV data to 0 (unoptimized color conversion).
E. A + save to file.
F. B + save to file.
G. C + save to file.
H. D + save to file.
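
Modification 1 above amounts to loading the whole elementary stream into memory and cutting it at start codes. The following is only a minimal Python sketch of that idea, not the code used in the test program. It assumes the .264 file is an H.264 Annex B byte stream, the function name split_annexb is hypothetical, and it splits into NAL units only; grouping NAL units into complete frames needs slice header parsing, which is omitted here.

```python
def split_annexb(data: bytes) -> list:
    """Split an H.264 Annex B byte stream into NAL units (start codes removed)."""
    start_code = b"\x00\x00\x01"
    units = []
    pos = data.find(start_code)
    while pos != -1:
        start = pos + len(start_code)        # first byte after the start code
        nxt = data.find(start_code, start)   # next start code, or -1 at the end
        end = len(data) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte at the end of
        # the previous unit; a NAL unit never ends in 0x00, so strip zeros.
        unit = data[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units
```

The low five bits of the first byte of each unit give the NAL unit type (for example 7 for SPS, 8 for PPS, 5 for an IDR slice), which is the starting point for grouping units into frames.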

VI. Test results

1MB bitstream per input

Test sequence 1: 1080P, 300 frames

Scenario    A         B         C         D         E         F         G         H
Time (µs)   969758    997308    3330680   3342184   5682973   4660920   5672404   5336507
FPS         309.36    300.81    90.07     89.76     52.79     64.36     52.89     56.22
CPU         Low       Low       High      High      Low       Low       slight    slight

Test sequence 2: 1080P-10M-600.264: total 600 frames

Scenario    A         B         C         D         E         F         G         H
Time (µs)   2233026   2287319   7224942   7281495   13804849  13727656  14136208  14121156
FPS         268.69    262.32    83.05     82.4      43.46     43.71     43.71     42.49
CPU         Low       Low       High      High      Low       Low       slight    slight

Single frame/slice per input

Test sequence 1: 1080P, 300 frames

Scenario    A         B         C         D         E         F         G         H
Time (µs)   1610299   1637120   4406070   4549480   6122828   6198919   6438414   6705004
FPS         186.3     183.25    68.09     65.94     49        48.4      46.6      44.74
CPU         Low       Low       High      High      Low       Low       slight    slight

Test sequence 2: 1080P-10M-600.264: total 600 frames

Scenario    A         B         C         D         E         F         G         H
Time (µs)   2511838   2597502   7577040   7679905   14260726  14330324  14857964  14912612
FPS         238.87    230.99    79.19     78.13     42.07     41.87     40.38     40.23
CPU         Low       Low       High      High      Low       Low       slight    slight
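
In the tables above, the large integer values are consistent with the total elapsed time in microseconds, and the smaller values beneath them with the resulting frames per second. A small sketch of that arithmetic follows; the function name fps is hypothetical, and the microsecond unit is inferred from the numbers rather than stated in the report.

```python
def fps(frames: int, elapsed_us: int) -> float:
    """Frames per second from a frame count and an elapsed time in microseconds."""
    return frames * 1_000_000 / elapsed_us

# Scenario A, 1MB bitstream per input
print(round(fps(300, 969758), 2))    # test sequence 1 -> 309.36
print(round(fps(600, 2233026), 2))   # test sequence 2 -> 268.69
```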

*These results may vary with:
a. The test video sequence, if it is not encoded with the same settings as my test sequences;
b. Chance factors such as OS load or interference from other processes, especially in the E/F/G/H save-to-file scenarios; the italicized results may be evidence of this.

*About CPU usage:
“Not obvious” means the total OS-level CPU usage is below 3%, and it is not clear whether the test program is what is consuming it.

Some suspicious phenomena:

a. CPU usage

b. GPU performance

Interim conclusion for Intel MSDK decode research

The most costly step in the current test program is the colorspace conversion (from NV12 to YUV420P). I’ll try to improve it in the coming days (BTW: the Intel VPP module does not support this conversion).

However, compared with software decoding, even if throughput drops to 150 fps for 1080P decoding, MSDK still looks like a promising technique for us.
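
For reference, the NV12 to YUV420P conversion discussed above only rearranges the chroma samples: NV12 stores a full-resolution Y plane followed by interleaved U and V bytes, while YUV420P stores Y, then all U, then all V. The Python sketch below illustrates the layout change only and is not the test program's code. It assumes a tightly packed frame with even width and height; decoder output surfaces often have a row pitch larger than the width, which this sketch ignores. The function name nv12_to_yuv420p is hypothetical.

```python
def nv12_to_yuv420p(nv12: bytes, width: int, height: int) -> bytes:
    """Convert one tightly packed NV12 frame to planar YUV420P (Y, U, V)."""
    y_size = width * height
    uv_size = y_size // 2                 # interleaved chroma at quarter resolution
    if len(nv12) != y_size + uv_size:
        raise ValueError("unexpected NV12 buffer size")
    y = nv12[:y_size]                     # luma plane is identical in both formats
    uv = nv12[y_size:]                    # U0 V0 U1 V1 ...
    u = uv[0::2]                          # even bytes are U samples
    v = uv[1::2]                          # odd bytes are V samples
    return y + u + v
```

Returning y + v + u instead would give YV12. A 1080p frame in this packed layout is 1920 * 1080 * 3 / 2 = 3110400 bytes, about a third of which has to be deinterleaved for every frame.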
