This is a test report for the current version of kdvcodec_msdkdec, which is based on the latest version of the Intel Media SDK (4.0.024-HSW).
There is one known issue: the NV12 to YV12 (or YUV420P) conversion is very inefficient.
Here are the details.
I. Intel MSDK version
Beta version: 4.0.024-HSW
II. Hardware
Intel Core i7-3770 (Ivy Bridge). Details below (you can view them with the bash command: cat /proc/cpuinfo):
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
microcode : 0x12
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 6799.77
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
III. OS
Ubuntu 12.04 LTS, kernel version 3.2.0-23
jacky@ubuntu-msdk:~$ uname -a
Linux ubuntu-msdk 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
IV. Test video sequences
- 1080p.264: total 300 frames
- 1080P-10M-600.264: total 600 frames
V. Test program
1. Sample program provided by Intel
Major processes & features
- Read the .264 file
- Decode in asynchronous mode
- Save the output to a YUV file
This test program's performance is limited by steps that are unnecessary for our use case, such as reading the H.264 elementary stream (ES) from a file and saving the decoded YUV buffer to a file. The Intel Media SDK's output format is NV12.
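NV12 stores a full-resolution Y plane followed by one half-height plane of interleaved U,V byte pairs, while YUV420P (I420) stores separate U and V planes, and YV12 is the same as I420 with V before U. A minimal, unoptimized sketch of the copy-and-convert step (modes C and D in the list of modifications) could look like this. The function name nv12_to_i420 is hypothetical, and the sketch assumes tightly packed buffers (pitch == width) and even dimensions; real MSDK surfaces have a row pitch that can exceed the width, which would require per-row copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the test program's code): convert one NV12 frame
 * into planar YUV420P (I420: Y plane, then U, then V). Assumes tightly packed
 * buffers (pitch == width) and even width/height. For YV12, swap the U and V
 * destination planes. */
static void nv12_to_i420(const uint8_t *nv12, uint8_t *i420, int width, int height)
{
    const size_t y_size = (size_t)width * (size_t)height;
    const size_t c_size = y_size / 4;        /* each chroma plane is (w/2) x (h/2) */
    const uint8_t *uv = nv12 + y_size;       /* NV12 chroma: interleaved U,V pairs */
    uint8_t *u = i420 + y_size;
    uint8_t *v = u + c_size;

    memcpy(i420, nv12, y_size);              /* luma layout is identical */
    for (size_t i = 0; i < c_size; i++) {    /* split each U,V byte pair */
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

Both formats use width × height × 3/2 bytes per frame (3,110,400 bytes at 1920x1080); only the chroma arrangement differs, so the cost is the byte-by-byte chroma loop plus the extra full-frame copy.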
So I modified it to run a test that better matches our product's run-time scenario. The modifications are:
1. Avoid reading the .264 file during decoding by loading the whole file into memory and parsing it into an array of frames (there are bugs here).
2. Implemented two decode modes (async mode and blocking mode).
3. Decoded-buffer processing, in eight variants:
A. Copy the decoded NV12 to a buffer.
B. Copy the decoded NV12 to a buffer, and memset the NV12's UV data to 0.
C. Copy and convert the NV12 to YUV420P (color conversion not optimized).
D. Copy and convert the NV12 to YUV420P, and memset the UV data to 0 (color conversion not optimized).
E. A + save to file.
F. B + save to file.
G. C + save to file.
H. D + save to file.
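The report does not show the in-memory parser from modification 1 (and notes it has bugs). As a hypothetical sketch of the first step such a parser usually needs, the code below locates H.264 Annex B start codes (00 00 01, or 00 00 00 01) in a buffer and records where each NAL unit begins and how long it is. Grouping NAL units into complete frames (access units) additionally requires looking at NAL types and slice headers, which this sketch does not do; the names nal_span and split_annexb are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t offset;   /* first byte after the start code */
    size_t size;     /* NAL unit size in bytes */
} nal_span;

/* Hypothetical helper: find up to max_out NAL units in an Annex B byte
 * stream. Zero bytes just before a start code (such as the leading 00 of a
 * 4-byte start code) are trimmed from the previous unit. Returns the number
 * of units stored in out[]. */
static size_t split_annexb(const uint8_t *buf, size_t len, nal_span *out, size_t max_out)
{
    size_t n = 0;
    size_t i = 0;
    while (i + 3 <= len) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            if (n > 0) {                          /* close the previous unit */
                size_t end = i;
                while (end > out[n - 1].offset && buf[end - 1] == 0)
                    end--;
                out[n - 1].size = end - out[n - 1].offset;
            }
            if (n == max_out)                     /* no room left: stop here */
                return n;
            out[n].offset = i + 3;
            out[n].size = 0;
            n++;
            i += 3;                               /* skip past the start code */
        } else {
            i++;
        }
    }
    if (n > 0)                                    /* last unit runs to the end */
        out[n - 1].size = len - out[n - 1].offset;
    return n;
}
```

The NAL unit type is the low five bits of the first payload byte (buf[span.offset] & 0x1F); types 1 and 5 are coded slices, which a frame splitter could use as one of its boundary hints.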
VI. Test results
1 MB of bitstream per input
Test sequence 1: 1080P, 300 frames
Mode | A | B | C | D | E | F | G | H
Total time (µs) | 969758 | 997308 | 3330680 | 3342184 | 5682973 | 4660920 | 5672404 | 5336507
FPS | 309.36 | 300.81 | 90.07 | 89.76 | 52.79 | 64.36 | 52.89 | 56.22
CPU | Low | Low | High | High | Low | Low | Slight | Slight
Test sequence 2: 1080P-10M-600.264, 600 frames
Mode | A | B | C | D | E | F | G | H
Total time (µs) | 2233026 | 2287319 | 7224942 | 7281495 | 13804849 | 13727656 | 14136208 | 14121156
FPS | 268.69 | 262.32 | 83.05 | 82.4 | 43.46 | 43.71 | 43.71 | 42.49
CPU | Low | Low | High | High | Low | Low | Slight | Slight
Single frame/slice per input
Test sequence 1: 1080P, 300 frames
Mode | A | B | C | D | E | F | G | H
Total time (µs) | 1610299 | 1637120 | 4406070 | 4549480 | 6122828 | 6198919 | 6438414 | 6705004
FPS | 186.3 | 183.25 | 68.09 | 65.94 | 49 | 48.4 | 46.6 | 44.74
CPU | Low | Low | High | High | Low | Low | Slight | Slight
Test sequence 2: 1080P-10M-600.264, 600 frames
Mode | A | B | C | D | E | F | G | H
Total time (µs) | 2511838 | 2597502 | 7577040 | 7679905 | 14260726 | 14330324 | 14857964 | 14912612
FPS | 238.87 | 230.99 | 79.19 | 78.13 | 42.07 | 41.87 | 40.38 | 40.23
CPU | Low | Low | High | High | Low | Low | Slight | Slight
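The FPS figures in these tables are the frame count divided by the total time. Assuming the time row is in microseconds (300 frames in 969758 µs gives the reported 309.36 fps for mode A of test sequence 1), a small timing helper could look like the sketch below. The report does not say how the program measured time, so the use of CLOCK_MONOTONIC timestamps and the helper names are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes struct timespec / clock_gettime */
#include <stdint.h>
#include <time.h>

/* Microseconds elapsed between two timestamps (e.g. CLOCK_MONOTONIC samples). */
static int64_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    int64_t ns = (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
               + (int64_t)(end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}

/* Frames per second from a frame count and a total time in microseconds. */
static double fps_from_us(long frames, int64_t total_us)
{
    return total_us > 0 ? (double)frames * 1e6 / (double)total_us : 0.0;
}

/* Usage sketch (decode_all_frames() is a hypothetical placeholder):
 *   struct timespec t0, t1;
 *   clock_gettime(CLOCK_MONOTONIC, &t0);
 *   decode_all_frames();
 *   clock_gettime(CLOCK_MONOTONIC, &t1);
 *   printf("%.2f fps\n", fps_from_us(300, elapsed_us(&t0, &t1)));
 */
```

A monotonic clock is the safer choice for this kind of benchmark because it is not affected by wall-clock adjustments during the run.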
*These results may vary depending on:
a. test sequences encoded with settings different from those of my test sequences;
b. chance factors such as OS load or interference from other processes, especially in the E/F/G/H (save-to-file) scenarios; the italicized results may also be evidence of this.
*About CPU usage:
“Not obvious” means total OS-level CPU usage below 3%, and it is not certain that the test program is what consumes it.
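The report does not say how CPU usage was measured. One common way to get a total, OS-level figure on Linux is to sample the aggregate "cpu" line of /proc/stat twice and compare the change in idle time with the change in total time. The sketch below assumes that approach; the type and function names are hypothetical.

```c
#include <stdio.h>

/* Aggregate CPU time from the first ("cpu") line of /proc/stat, in clock ticks. */
typedef struct {
    unsigned long long idle;   /* idle + iowait */
    unsigned long long total;  /* user..steal; guest time is already counted in user */
} cpu_sample;

/* Parse a line like "cpu  4705 356 584 3699 23 23 0 0 0 0". Returns 0 on success. */
static int parse_cpu_line(const char *line, cpu_sample *s)
{
    unsigned long long user = 0, nice = 0, system = 0, idle = 0,
                       iowait = 0, irq = 0, softirq = 0, steal = 0;
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &system, &idle, &iowait, &irq, &softirq, &steal);
    if (n < 4)
        return -1;
    s->idle = idle + iowait;
    s->total = user + nice + system + idle + iowait + irq + softirq + steal;
    return 0;
}

/* Read the aggregate line from /proc/stat (Linux only). Returns 0 on success. */
static int read_cpu_sample(cpu_sample *s)
{
    char line[512];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    int ok = fgets(line, sizeof line, f) != NULL;
    fclose(f);
    return ok ? parse_cpu_line(line, s) : -1;
}

/* Busy percentage across all CPUs between two samples. */
static double cpu_usage_percent(const cpu_sample *before, const cpu_sample *after)
{
    unsigned long long dt = after->total - before->total;
    unsigned long long di = after->idle - before->idle;
    return dt > 0 ? 100.0 * (double)(dt - di) / (double)dt : 0.0;
}
```

Taking one sample before the decode loop and one after it gives the average system-wide load for the run; like the report's figure, it cannot tell which process was responsible.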
Some suspicious phenomena:
a. CPU usage
b. GPU performance
Interim conclusion of the Intel MSDK decode research
The most costly step in the current test program is the color space conversion from NV12 to YUV420P; I'll try to improve it in the coming days (note: the Intel VPP module does not support this conversion).
Even so, compared with software decoding, and even if throughput drops to 150 fps for 1080p decoding, MSDK still looks like a promising technology for us.
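One common way to speed up the NV12 to YUV420P step on this CPU (its flags include sse2, ssse3 and avx) is to deinterleave the UV plane with SIMD instructions instead of one byte at a time. The sketch below is an illustrative SSE2 version, not the approach taken in the test program; the function name is hypothetical, and it falls back to plain C when SSE2 is not enabled at compile time.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: split n_pairs interleaved U,V byte pairs (one chroma
 * row, or a whole tightly packed NV12 chroma plane) into U and V planes. */
static void deinterleave_uv(const uint8_t *uv, uint8_t *u, uint8_t *v, size_t n_pairs)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i low_byte = _mm_set1_epi16(0x00FF);
    for (; i + 16 <= n_pairs; i += 16) {   /* 16 pairs = 32 input bytes per step */
        __m128i a = _mm_loadu_si128((const __m128i *)(uv + 2 * i));
        __m128i b = _mm_loadu_si128((const __m128i *)(uv + 2 * i + 16));
        /* On little-endian x86 each 16-bit lane holds U in its low byte and V
         * in its high byte; mask/shift, then pack 16 lanes back to 16 bytes. */
        __m128i u_bytes = _mm_packus_epi16(_mm_and_si128(a, low_byte),
                                           _mm_and_si128(b, low_byte));
        __m128i v_bytes = _mm_packus_epi16(_mm_srli_epi16(a, 8),
                                           _mm_srli_epi16(b, 8));
        _mm_storeu_si128((__m128i *)(u + i), u_bytes);
        _mm_storeu_si128((__m128i *)(v + i), v_bytes);
    }
#endif
    for (; i < n_pairs; i++) {             /* scalar path and leftover pairs */
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

For pitched surfaces it would be called once per chroma row with width / 2 pairs, alongside a memcpy of the Y plane. Libraries such as libyuv also ship an optimized NV12ToI420 routine that could be benchmarked against this.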