Results for SSE4

The name erroneously given to the Supplemental SSE3 instructions of Intel's Core 2 chips by the public, but not by Intel. See SSE.



 
 
Wikipedia: SSE4

SSE4 is a new instruction set for the Intel Core microarchitecture, initially implemented in the Penryn processor.

It was announced on September 27, 2006 at the Fall 2006 Intel Developer Forum, with vague details in a white paper [1]; more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation [2]. The SSE4 Programming Reference is available from Intel.

SSE4 consists of 54 instructions. A subset of 47 instructions, referred to as SSE4.1 in some Intel documentation, will be available in Penryn. The remaining 7 instructions form a second subset, SSE4.2; the full set (SSE4.1 together with SSE4.2) will first be available in Nehalem. Intel, unusually, credits feedback from developers as playing an important role in the development of the instruction set.

What is now known as Supplemental Streaming SIMD Extensions 3 (SSSE3) was also referred to as SSE4 by fans during development of the Intel Core microarchitecture. This has caused some confusion in the community.

The use of SSE4 instructions is enabled with the -QxS or -QaxS optimization switch in version 10 of the Intel C compiler. The S stands for Swing New Instructions: the string "Intel(R) processors with Swing New Instructions support" can be found in the compiler's binary and library files.

New instructions

Unlike all previous iterations of SSE, SSE4 contains instructions that perform operations not specific to multimedia applications. It features a number of instructions whose action is determined by a constant field and, in a rather surprising move, a set of instructions which take XMM0 as an implicit third operand. In addition, SSE4 has no support for operations on 64-bit MMX registers; SIMD integer operations can be carried out on 128-bit XMM registers only.
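The two operand styles mentioned above can be illustrated with a portable scalar sketch in C. This is only a model of the behaviour, not the instructions themselves: the function names blend_imm_model and blend_var_model are hypothetical, and the mask that the hardware would read implicitly from XMM0 is passed here as an explicit array.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an immediate-controlled blend in the style of BLENDPS:
 * bit i of imm8 selects src[i] when set, otherwise dst[i] is kept.
 * The selection is fixed when the instruction is encoded. */
void blend_imm_model(float dst[4], const float src[4], unsigned imm8)
{
    assert(dst && src);
    for (int i = 0; i < 4; i++) {
        if ((imm8 >> i) & 1u)
            dst[i] = src[i];
    }
}

/* Model of a variable blend in the style of BLENDVPS: the selector is a
 * third operand, which the instruction takes implicitly from XMM0.
 * Here it is the explicit (hypothetical) parameter `mask`; only the top
 * bit of each 32-bit lane is examined. */
void blend_var_model(float dst[4], const float src[4], const uint32_t mask[4])
{
    assert(dst && src && mask);
    for (int i = 0; i < 4; i++) {
        if (mask[i] & 0x80000000u)
            dst[i] = src[i];
    }
}
```

In the first form the choice of lanes is a compile-time constant; in the second it can be computed at run time, for example from the result of a packed comparison.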

Several of these instructions are enabled by the single-cycle shuffle engine in Penryn.

Instruction Description
MPSADBW Compute eight offset sums of absolute differences (i.e. |x0-y0|+|x1-y1|+|x2-y2|+|x3-y3|, |x0-y1|+|x1-y2|+|x2-y3|+|x3-y4|, ...); this operation is extremely important for modern HDTV codecs, and (see [3]) allows an 8x8 block difference to be computed in less than seven cycles. One bit of a three-bit immediate operand indicates whether y0..y10 or y4..y14 should be used from the destination operand; the other two bits select whether x0..x3, x4..x7, x8..x11 or x12..x15 should be used from the source.
PHMINPOSUW Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom to the index of that word in the source.
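PMULLD, PMULDQ Packed 32-bit integer multiplies: PMULLD multiplies four pairs of 32-bit integers and keeps the low 32 bits of each product; PMULDQ multiplies two pairs of signed 32-bit integers to give two full 64-bit products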
DPPS, DPPD Dot product for AOS (Array of Structs) data. This takes an immediate operand consisting of four (or two for DPPD) bits to select which of the entries in the input to multiply and accumulate, and another four (or two for DPPD) to select whether to put 0 or the dot-product in the appropriate field of the output.
BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW Conditional copying of elements from one location to another, based (for the non-V forms) on the bits in an immediate operand, and (for the V forms) on the bits in register XMM0.
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD Packed minimum/maximum for different integer operand types
ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand
INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ The INSERTPS and PINSR instructions read 8, 32 or 64 bits from a register or memory location and insert them into a field in the destination register given by an immediate operand; EXTRACTPS and PEXTR read a field from the source register and store it in an x86 register or memory location. For example, PEXTRD eax, xmm0, 1; EXTRACTPS [addr+4*eax], xmm1, 1 stores field 1 of xmm1 at the address indexed by field 1 of xmm0.
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ Packed sign/zero extension to wider types
PTEST This is similar to the TEST instruction, in that it sets the ZF and CF flags according to a bitwise AND of its operands: ZF is set if the AND of the source and destination is all zeros, and CF is set if the AND of the source with the inverted destination is all zeros.
PCMPEQQ Quadword (64 bits) compare for equality
PACKUSDW Convert signed DWORDs into unsigned WORDs with saturation.
MOVNTDQA Efficient read from write-combining memory area into SSE register; this is useful for retrieving results from peripherals attached to the memory bus.
CRC32 (SSE4.2) Accumulate CRC32 value
PCMPESTRI (SSE4.2) Packed Compare Explicit Length Strings, Return Index
PCMPESTRM (SSE4.2) Packed Compare Explicit Length Strings, Return Mask
PCMPISTRI (SSE4.2) Packed Compare Implicit Length Strings, Return Index
PCMPISTRM (SSE4.2) Packed Compare Implicit Length Strings, Return Mask
PCMPGTQ (SSE4.2) Compare packed signed quadwords (64 bits) for greater than
POPCNT (SSE4.2, optional) Population count (count the number of bits set to 1); shares its opcode with JMPE, the instruction used in Itanium CPUs to escape from IA-32 mode. The POPCNT instruction may also be implemented in some processors that do not support the SSE4 instruction set extensions (such as AMD K10), and a separate CPUID feature bit can be tested to confirm the presence of POPCNT.
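As an illustration of the MPSADBW and PHMINPOSUW entries in the table above, the following is a portable scalar sketch in C of what the two instructions compute. It is a model written from the descriptions, not Intel's code: the function names are hypothetical, the results are written to a separate out array rather than overwriting the destination register, and pairing the two operations to pick the best-matching offset is an assumption about typical use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of MPSADBW: eight sums of absolute differences between one 4-byte
 * group of the source (x) and eight overlapping 4-byte windows of the
 * destination (y). Bit 2 of imm8 picks the window start (y0 or y4);
 * bits 1:0 pick the source group (x0, x4, x8 or x12). */
void mpsadbw_model(uint16_t out[8], const uint8_t y[16],
                   const uint8_t x[16], unsigned imm8)
{
    unsigned y_off = ((imm8 >> 2) & 1u) * 4u;  /* 0 or 4 */
    unsigned x_off = (imm8 & 3u) * 4u;         /* 0, 4, 8 or 12 */
    assert(out && y && x);
    for (unsigned k = 0; k < 8; k++) {
        unsigned sum = 0;
        for (unsigned j = 0; j < 4; j++)
            sum += (unsigned)abs((int)x[x_off + j] - (int)y[y_off + k + j]);
        out[k] = (uint16_t)sum;  /* at most 4 * 255, fits in 16 bits */
    }
}

/* Model of PHMINPOSUW: word 0 receives the smallest unsigned word of the
 * source, word 1 its index (lowest index on ties); the rest are zeroed. */
void phminposuw_model(uint16_t out[8], const uint16_t src[8])
{
    unsigned best = 0;
    assert(out && src);
    for (unsigned i = 1; i < 8; i++) {
        if (src[i] < src[best])
            best = i;
    }
    uint16_t min = src[best];
    for (unsigned i = 0; i < 8; i++)
        out[i] = 0;
    out[0] = min;
    out[1] = (uint16_t)best;
}
```

Feeding the eight sums from mpsadbw_model into phminposuw_model gives the smallest difference and the offset at which it occurs, which is the kind of search a motion-estimation loop performs.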

References

  1. http://www.intel.com/technology/architecture-silicon/sse4-instructions/index.htm
  2. https://intel.wingateweb.com/published/BMAS005/BMAS005_100Eng.pdf
  3. http://softwarecommunity.intel.com/articles/eng/1246.htm

Copyrights:

Computer Desktop Encyclopedia. THIS COPYRIGHTED DEFINITION IS FOR PERSONAL USE ONLY.
All other reproduction is strictly prohibited without permission from the publisher.
© 1981-2008 Computer Language Company Inc.  All rights reserved.
Wikipedia. This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "SSE4".
