Performance tests of Intel Media SDK (4.0.024-HSW) – Multithread decode


Performance test – multithread decode

Media SDK version

Intel Media SDK (4.0.024-HSW)

Test environments

CPU: Intel Core i7-3770 (Ivy Bridge)

OS: Ubuntu server 12.04 LTS, kernel version 3.2.0-23 x86_64 x86_64 x86_64 GNU/Linux

Test program

kdvcodec_msdkdec_mt (svn: 10)

Working mode

a. Decode in simulated blocking mode.

b. Return the decoded frame as an NV12 buffer.

c. Run N decoding threads. The H.264 elementary stream is opened and cached from one specified file.

d. Timing: the clock starts before the threads are created and stops after all threads have terminated, so the figures are ballpark estimates.
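
The timing approach in item d can be sketched as follows. This is a minimal illustration, not the test program's code: `decode_worker` is a hypothetical stand-in that only counts frames, and the function names are assumptions made for this example.

```python
import threading
import time


def decode_worker(frames_to_decode, results, index):
    """Hypothetical stand-in for one decoding thread; it only counts frames."""
    decoded = 0
    for _ in range(frames_to_decode):
        decoded += 1  # a real worker would feed the decoder here
    results[index] = decoded


def run_benchmark(num_threads, frames_per_thread):
    """Time the whole run: start before thread creation, stop after the last join."""
    results = [0] * num_threads
    start = time.perf_counter()  # clock starts before any thread exists
    threads = [
        threading.Thread(target=decode_worker, args=(frames_per_thread, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start  # clock stops after all threads ended
    total_frames = sum(results)
    return total_frames, elapsed, total_frames / elapsed


if __name__ == "__main__":
    frames, seconds, fps = run_benchmark(num_threads=4, frames_per_thread=1000)
    print(f"output frames: {frames}, fps={fps:.5f}, Total used time: {seconds:.5f} s")
```

Because thread creation and teardown fall inside the measured interval, the overall fps computed this way is slightly pessimistic, which is why the numbers are only a ballpark estimate.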

Command-line arguments

kdvcodec_msdkdec_mt InputBitstream [TestType] [Instance Number] [LOOP]

Options:

[TestType]

0 – Decode only (including copying the decoded buffer from mfxFrameSurface1 to an mfxU8* byte pointer)

1 – Decode and print status

2 – Decode, print status, and save the YUV output to aaa.yuv

[Instance Number]

Start N instances (threads) to run the test. Don't set this value too large if your RAM is limited.

[LOOP]

Number of times the cached stream is fed to the decoder.

Example:

./kdvcodec_msdkdec_mt /home/jacky/test.264 2 4 4
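
For readers who want to script runs, the positional argument layout above can be mirrored in a small parser. This is only a sketch of the documented usage line: the argument names are illustrative, and the default values are placeholders, since the original text does not state what the real program uses when an optional argument is omitted.

```python
import argparse


def build_parser():
    """Mirror: kdvcodec_msdkdec_mt InputBitstream [TestType] [Instance Number] [LOOP]."""
    parser = argparse.ArgumentParser(prog="kdvcodec_msdkdec_mt")
    parser.add_argument("input_bitstream", help="H.264 elementary stream file")
    # The defaults below are placeholders, not the real program's defaults.
    parser.add_argument("test_type", nargs="?", type=int, choices=[0, 1, 2], default=0)
    parser.add_argument("instance_number", nargs="?", type=int, default=1)
    parser.add_argument("loop", nargs="?", type=int, default=1)
    return parser


if __name__ == "__main__":
    # Same arguments as the example command above.
    args = build_parser().parse_args(["/home/jacky/test.264", "2", "4", "4"])
    print(args.input_bitstream, args.test_type, args.instance_number, args.loop)
```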

Test scenario 1: 4 threads

Command

./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 4 4

Description

Run 4 threads, loop 4 times, and output NV12 only

Result

Test results on my PC show that this scenario decodes 500 ~ 530 fps, but CPU usage stays high throughout.

Here is a sample test result:

Overall: 11430, output frames: 9101, fps=524.50256, Total used time: 17.35168 s, 17351679 usec

CPU usage: 40% ~ 60%

jacky@ubuntu-msdk:/opt/workspace/msdk/bin$ ./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 4 4

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

libva info: va_openDriver() returns 0

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

Output Type = 1

libva info: Found init function __vaDriverInit_0_32

Output Type = 1

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

MFX_WRN_VIDEO_PARAM_CHANGED

MFX_WRN_VIDEO_PARAM_CHANGED

(0x00DD4010): 1143, output frames: 2278, fps=131.90689, Total used time: 17.26976 s, 17269757 usec

(0x00DDA1D0): 2286, output frames: 2258, fps=130.49756, Total used time: 17.30301 s, 17303006 usec

(0x00DE00C0): 3429, output frames: 2288, fps=132.09439, Total used time: 17.32095 s, 17320947 usec

(0x00DF7440): 4572, output frames: 2278, fps=131.37719, Total used time: 17.33939 s, 17339388 usec

————————————————-

Overall: 11430, output frames: 9101, fps=524.50256, Total used time: 17.35168 s, 17351679 usec
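
Judging from the logged numbers, the reported fps appears to equal the output frame count divided by the elapsed time. The sketch below parses a summary line and recomputes that ratio as a sanity check. The regular expression and the function name are assumptions made for this example, and the first number after the label is ignored because the original does not explain it.

```python
import re

# Matches the tail shared by the per-thread lines and the "Overall" line.
SUMMARY = re.compile(
    r"output frames: (?P<frames>\d+), fps=(?P<fps>[\d.]+), "
    r"Total used time: (?P<seconds>[\d.]+) s, (?P<usec>\d+) usec"
)


def parse_summary(line):
    """Extract frames and fps from a result line and recompute fps from the raw values."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    frames = int(match["frames"])
    usec = int(match["usec"])
    return {
        "frames": frames,
        "reported_fps": float(match["fps"]),
        "recomputed_fps": frames / (usec / 1_000_000),
    }


if __name__ == "__main__":
    line = ("Overall: 11430, output frames: 9101, fps=524.50256, "
            "Total used time: 17.35168 s, 17351679 usec")
    print(parse_summary(line))
```

The same function can be applied to the result lines of the other scenarios to collect their figures in one place.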

Test scenario 2: 8 threads

Command

./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 8 4

Description

Run 8 threads, loop 4 times, and output NV12 only

Result

Overall: 41148, output frames: 17686, fps=344.67759, Total used time: 51.31172 s, 51311720 usec

CPU usage: 30% ~ 50%

jacky@ubuntu-msdk:/opt/workspace/msdk/bin$ ./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 8 4

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

libva info: VA-API version 0.34.0

Output Type = 1

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

Output Type = 1

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

MFX_WRN_VIDEO_PARAM_CHANGED

MFX_WRN_VIDEO_PARAM_CHANGED

(0x01D52010): 1143, output frames: 2278, fps=44.57920, Total used time: 51.10006 s, 51100059 usec

(0x01D581D0): 2286, output frames: 2164, fps=42.30637, Total used time: 51.15068 s, 51150684 usec

(0x01D5E0C0): 3429, output frames: 2203, fps=43.07164, Total used time: 51.14734 s, 51147344 usec

(0x01D75440): 4572, output frames: 2136, fps=41.71776, Total used time: 51.20122 s, 51201217 usec

(0x01D6A0D0): 5715, output frames: 2264, fps=44.20714, Total used time: 51.21345 s, 51213451 usec

(0x01D8F6B0): 6858, output frames: 2238, fps=43.66062, Total used time: 51.25900 s, 51259003 usec

(0x01D987D0): 8001, output frames: 2253, fps=43.93440, Total used time: 51.28100 s, 51281004 usec

(0x01DA18F0): 9144, output frames: 2152, fps=41.95352, Total used time: 51.29486 s, 51294859 usec

————————————————-

Overall: 41148, output frames: 17686, fps=344.67759, Total used time: 51.31172 s, 51311720 usec

Test scenario 3: 1 thread

Command

./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 1 4

Description

Run 1 thread, loop 4 times, and output NV12 only

Result

Overall: 1143, output frames: 2278, fps=354.90440, Total used time: 6.41863 s, 6418630 usec

CPU usage: 10% ~ 20%

jacky@ubuntu-msdk:/opt/workspace/msdk/bin$ ./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 1 4

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

MFX_WRN_VIDEO_PARAM_CHANGED

MFX_WRN_VIDEO_PARAM_CHANGED

(0x0146C010): 1143, output frames: 2278, fps=355.55328, Total used time: 6.40692 s, 6406916 usec

————————————————-

Overall: 1143, output frames: 2278, fps=354.90440, Total used time: 6.41863 s, 6418630 usec

Test scenario 4: 2 threads

Command

./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 2 20

Description

Run 2 threads, loop 20 times, and output NV12 only

Result

Overall: 3429, output frames: 22785, fps=478.96642, Total used time: 47.57118 s, 47571185 usec

jacky@ubuntu-msdk:/opt/workspace/msdk/bin$ ./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 2 20

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

libva info: va_openDriver() returns 0

Output Type = 1

Output Type = 1

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

MFX_WRN_VIDEO_PARAM_CHANGED

MFX_WRN_VIDEO_PARAM_CHANGED

(0x00A0D010): 1143, output frames: 11398, fps=240.22250, Total used time: 47.44768 s, 47447679 usec

(0x00A131D0): 2286, output frames: 11390, fps=239.48212, Total used time: 47.56096 s, 47560961 usec

————————————————-

Overall: 3429, output frames: 22785, fps=478.96642, Total used time: 47.57118 s, 47571185 usec

Test scenario 5: 4 instances, no copy NV12

Command

./kdvcodec_msdkdec_mt ~/Videos/riverbed_1920x1080_25.h264 0 4 40

Description

Run 4 parallel instances, loop 40 times, and don't output any buffer (to check whether the memcpy is what causes the CPU load)

Result

Overall: 5010, output frames: 39796, fps=594.05649, Total used time: 66.99026 s, 66990262 usec

CPU usage: 40% ~ 70%

jacky@ubuntu-msdk:/opt/workspace/msdk/bin$ ./kdvcodec_msdkdec_mt ~/Videos/riverbed_1920x1080_25.h264 0 4 40

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: Found init function __vaDriverInit_0_32

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

libva info: va_openDriver() returns 0

libva info: va_openDriver() returns 0

Output Type = 1

Output Type = 1

Output Type = 1

libva info: VA-API version 0.34.0

libva info: va_getDriverName() returns 0

libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so

libva info: Found init function __vaDriverInit_0_32

libva info: va_openDriver() returns 0

Output Type = 1

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

VideoInfo:

Resolution: 1920×1088

FPS: 60(2)

MFX_WRN_VIDEO_PARAM_CHANGED

MFX_WRN_VIDEO_PARAM_CHANGED

(0x024CE010): 501, output frames: 9998, fps=150.19414, Total used time: 66.56718 s, 66567176 usec

(0x024D3190): 1002, output frames: 9973, fps=149.44981, Total used time: 66.73144 s, 66731435 usec

(0x024D8280): 1503, output frames: 9936, fps=148.62010, Total used time: 66.85502 s, 66855020 usec

(0x024DE380): 2004, output frames: 9920, fps=148.11366, Total used time: 66.97559 s, 66975591 usec

————————————————-

Overall: 5010, output frames: 39796, fps=594.05649, Total used time: 66.99026 s, 66990262 usec

Test scenario 6: extended test

This test is based on another version of the test program that adds a usleep() call after every input. The goal is to run the decoder closer to real-world conditions, where video frames arrive periodically rather than in an endless, uninterrupted stream.

Command

./kdvcodec_msdkdec_mt ~/Videos/red_kayak_1080p.h264 0 4 40

Description

Run 4 parallel instances, loop 40 times; each decode thread sleeps 30 ms after feeding one frame to the codec.

Result

Overall: 68136, output frames: 7993, fps=295.29221, Total used time: 27.06810 s, 27068103 usec

CPU usage: 5% ~ 10%

(0x02291010): 501, output frames: 498, fps=18.70292, Total used time: 26.62686 s, 26626861 usec

(0x02296190): 1002, output frames: 493, fps=18.37663, Total used time: 26.82755 s, 26827547 usec

(0x022B1660): 3507, output frames: 492, fps=18.32293, Total used time: 26.85159 s, 26851594 usec

(0x022A8FF0): 5010, output frames: 505, fps=18.77590, Total used time: 26.89618 s, 26896182 usec

(0x0229B280): 1503, output frames: 499, fps=18.54937, Total used time: 26.90118 s, 26901183 usec

(0x022A1380): 2004, output frames: 495, fps=18.38211, Total used time: 26.92836 s, 26928363 usec

(0x022AD0D0): 5511, output frames: 492, fps=18.36819, Total used time: 26.78543 s, 26785435 usec

(0x022B5740): 4008, output frames: 506, fps=18.81383, Total used time: 26.89511 s, 26895107 usec

(0x022B9820): 4509, output frames: 503, fps=18.72153, Total used time: 26.86746 s, 26867461 usec

(0x0229F0B0): 3006, output frames: 511, fps=19.08146, Total used time: 26.77992 s, 26779920 usec

(0x022D1C00): 6513, output frames: 499, fps=18.51492, Total used time: 26.95124 s, 26951235 usec

(0x022A5460): 2505, output frames: 496, fps=18.47320, Total used time: 26.84970 s, 26849702 usec

(0x022DDEA0): 8016, output frames: 498, fps=18.48051, Total used time: 26.94731 s, 26947310 usec

(0x022CDB20): 6012, output frames: 518, fps=19.27356, Total used time: 26.87620 s, 26876199 usec

(0x022D9DC0): 7515, output frames: 494, fps=18.36024, Total used time: 26.90597 s, 26905971 usec

(0x022D5CE0): 7014, output frames: 494, fps=18.40993, Total used time: 26.83335 s, 26833345 usec

————————————————-

Overall: 68136, output frames: 7993, fps=295.29221, Total used time: 27.06810 s, 27068103 usec
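
The per-thread figures above can be related to the 30 ms sleep with a little arithmetic. A thread that sleeps 30 ms per frame cannot exceed roughly 33 fps, and the logged rate of about 18.7 fps implies a frame period of about 53 ms. The sketch below computes that breakdown. The function names are made up for this example, and the "non-sleep" remainder lumps together decode time, usleep() overshoot and any waiting, which this arithmetic cannot separate.

```python
def frame_period_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


def pacing_breakdown(per_thread_fps, sleep_ms):
    """Compare an observed per-thread rate with the ceiling set by the sleep alone."""
    period = frame_period_ms(per_thread_fps)
    return {
        "ceiling_fps": 1000.0 / sleep_ms,   # rate if the sleep were the only cost
        "period_ms": period,                # observed time per frame
        "non_sleep_ms": period - sleep_ms,  # everything that is not the sleep
    }


if __name__ == "__main__":
    # 18.70292 fps is the first per-thread figure in the log above;
    # 30 ms is the sleep stated in the description.
    print(pacing_breakdown(18.70292, 30.0))
```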

Test analysis & conclusion

According to the test results, scenario 1 (4 instances) appears to deliver the best performance, decoding 500+ 1080p frames per second; however, its CPU usage is also the highest.

Best performance: scenario 1. Up to 530+ fps (1080p) at 40% ~ 60% CPU usage (reusing the mfxFrameSurface1 pointer may yield 600+ fps).

Best CPU load: scenario 3. CPU usage of 10% ~ 20% while decoding 1080p video at 350+ fps.
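
To make the comparison easier to see, the overall figures from the first five runs can be put side by side with the per-thread throughput. The numbers below are copied from the "Overall" lines in the logs; the helper names are assumptions for this sketch. The transcript of the no-copy run shows a different clip (riverbed), so its row is not directly comparable with the others.

```python
# (threads, overall fps) copied from the "Overall" lines of the first five runs.
SCENARIOS = {
    "scenario 1": (4, 524.50256),
    "scenario 2": (8, 344.67759),
    "scenario 3": (1, 354.90440),
    "scenario 4": (2, 478.96642),
    "scenario 5": (4, 594.05649),  # no NV12 copy; transcript shows a different clip
}


def per_thread_fps(threads, overall_fps):
    """Average throughput of one thread in a run."""
    return overall_fps / threads


def summarize(scenarios):
    """Return (name, threads, overall fps, per-thread fps) for each run."""
    return [
        (name, threads, fps, per_thread_fps(threads, fps))
        for name, (threads, fps) in scenarios.items()
    ]


if __name__ == "__main__":
    for name, threads, fps, each in summarize(SCENARIOS):
        print(f"{name}: {threads} thread(s), {fps:8.2f} fps overall, {each:7.2f} fps per thread")
```

Read this way, overall throughput rises from 1 to 4 threads and then drops at 8 threads, while per-thread throughput falls with every added thread.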

Tips for running this test:

1. Don't run too many instances if you don't have sufficient RAM.
2. Enlarge the shared memory for the graphics card driver and check whether MSDK performance improves further.
