
SSE4 Study and Reference Material

Uploaded: 2018-12-07 09:46:37 | PDF file | 217.82 KB | Popularity: 56
by Kiefer Kuah, Intel Software Solutions Group
April 2007

Abstract

Intel® SSE4 is a new set of Single Instruction Multiple Data (SIMD) instructions that will be introduced in the 45nm Next Generation Intel® Core™2 processor family (Penryn) and improve the performance and energy efficiency of a broad range of applications. This white paper describes how video encoders can use Intel SSE4 instructions to achieve 1.6x to 3.8x performance speedups in integer motion vector search, a frequently used motion estimation function.

Contents

1. Introduction
2. Motion Estimation Using MPSADBW and PHMINPOSUW
3. Results
4. Conclusion
A. SSE2-Optimized Function for 4x4 Blocks
B. Intel® SSE4-Optimized Function for 4x4 Blocks
C. SSE2-Optimized Function for 8x8 Blocks
D. Intel® SSE4-Optimized Function for 8x8 Blocks
E. SSE2-Optimized Function for 16x16 Blocks
F. Intel® SSE4-Optimized Function for 16x16 Blocks

1. Introduction

Intel® Streaming SIMD Extensions 4 (Intel® SSE4) is a new set of Single Instruction Multiple Data (SIMD) instructions designed to improve the performance of various applications, such as video encoders, image processing, and 3D games. Intel SSE4 builds upon the Intel® 64 and IA-32 instruction set, the most popular and broadly used computer architecture for developing 32-bit and 64-bit applications. Intel SSE4 will be introduced in the 45nm Next Generation Intel® Core™2 processor family (Penryn).

This white paper describes how video encoders can benefit from the Intel SSE4 instructions, achieving 1.6x to 3.8x performance speedups in integer motion vector search, a frequently used motion estimation function. Three block sizes, 4x4, 8x8, and 16x16, are used in this paper to represent some of the variations found in motion estimation and to illustrate how the code can be adapted to suit them.

2. Motion Estimation Using MPSADBW and PHMINPOSUW

Motion estimation is one of the main bottlenecks in video encoders. It involves searching reference frames for best matches and often accounts for about 40% of the total CPU cycles consumed by an encoder. The quality of the search is one factor that determines the compression ratio and the video quality of the encoded video. This search operation is often the target of algorithmic and SIMD optimizations to improve the encoding speed.

An unoptimized version of the block matching function for the 4x4 block size is shown in Figure 2-1. The example code in this paper performs only the integer motion vector search of the motion estimation stage.

Figure 2-1. Unoptimized Version of Integer Block Matching Function
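
The listing for Figure 2-1 is not included in this excerpt. As a stand-in, the following is a minimal scalar sketch of what an unoptimized 4x4 integer block matching search typically looks like; it is not the paper's code. It assumes 8-bit luma samples stored row by row with a common stride and uses the sum of absolute differences (SAD) as the matching cost. The names sad_4x4 and search_4x4, the parameter layout, and the exhaustive scan over a rectangular window are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Sum of absolute differences between a 4x4 source block and a 4x4
   candidate block. Both pointers address the top-left sample of a block
   inside a frame whose rows are `stride` bytes apart. */
static unsigned sad_4x4(const unsigned char *src, const unsigned char *ref,
                        int stride)
{
    unsigned sad = 0;
    assert(stride >= 4);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sad += (unsigned)abs(src[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Exhaustive integer search (hypothetical helper): evaluate every candidate
   offset (dx, dy) with 0 <= dx < search_w and 0 <= dy < search_h, and keep
   the offset with the lowest SAD. The caller must make sure the reference
   frame covers (search_w + 3) x (search_h + 3) samples from `ref`. */
static unsigned search_4x4(const unsigned char *src, const unsigned char *ref,
                           int stride, int search_w, int search_h,
                           int *best_x, int *best_y)
{
    unsigned best = UINT_MAX;
    *best_x = 0;
    *best_y = 0;
    for (int dy = 0; dy < search_h; dy++) {
        for (int dx = 0; dx < search_w; dx++) {
            unsigned sad = sad_4x4(src, ref + dy * stride + dx, stride);
            if (sad < best) {   /* strict '<' keeps the first minimum found */
                best = sad;
                *best_x = dx;
                *best_y = dy;
            }
        }
    }
    return best;
}
```

Every candidate position costs 16 subtract, absolute-value, and accumulate steps in this form, and neighbouring candidates reread almost the same reference samples. That repeated work is what makes this search a natural target for SIMD optimization.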