"Use of mnemonics" demonstrations
XMM SSE floating point instructions

About the XMM SSE floating point instructions
These are the SSE (streaming SIMD extensions) floating point instructions which use the 128-bit XMM registers. The SSE instructions handle single-precision (32-bit) floating point values. Support for SSE is found in the Intel Pentium III processors and above. See also XMM SSE2 floating point instructions which handle double-precision (64-bit) floating point values.
Before using these instructions in your code you need to check if they are available on the processor which is running your program. This is done by executing CPUID with EAX=1 and then testing bit 25 of EDX. The bit will be set if the SSE instructions can be used.
In the tests the following data declarations are used:-

SINGLEFP1 DD 1.1
          DD 2.2
          DD 3.3
          DD 4.4
SINGLEFP2 DD 10.66
          DD 20.66
          DD 30.66
          DD 40.66
SINGLEFPN DD -1.1
          DD +2.2
          DD +3.3
          DD -4.4
DINTEGER  DD 23,24
DRESULT   DD 0
Since it is possible that the labels which point to the floating point values may not be on a 16-byte boundary, the MOVUPS instruction must be used to transfer the data from memory into an XMM register. MOVUPS (move four unaligned packed single-precision) does not care about alignment. If you specify ALIGN 16 immediately before the relevant data declaration, however, then the assembler will make sure the data is on a 16-byte boundary and the faster MOVAPS (move four aligned packed single-precision) can be used instead. If you get this wrong, that is to say if you use MOVAPS on a memory address which is not on a 16-byte boundary, your program will cause an exception. See more about this (in the case of MOVDQA and MOVDQU which work in the same way). When transferring between registers either MOVUPS or MOVAPS may be used.
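A minimal sketch of the two cases follows. ALIGNEDFP is a hypothetical label which is not one of the declarations used in the tests, and the sketch assumes that the declaration goes in the data section and the instructions in the code section:-

ALIGN 16                     ;put the next declaration on a 16-byte boundary
ALIGNEDFP DD 1.1
          DD 2.2
          DD 3.3
          DD 4.4

MOVAPS XMM0,[ALIGNEDFP]      ;safe because ALIGNEDFP is known to be aligned
MOVUPS XMM1,[SINGLEFP1]      ;safe whatever the alignment of SINGLEFP1
MOVAPS XMM2,XMM1             ;register to register, alignment is not an issue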
The instructions we are looking at here fall largely into two types. The first type of instruction deals with packed floating point numbers. These instructions have "PS" in their mnemonic name, referring to "packed single-precision" and they work on more than one single-precision (32-bit) floating point value at once. The second type of instruction deals with just one floating point value. These instructions have "SS" in their mnemonic name referring to "scalar single-precision". They work on the lowest part of the XMM register only, that is to say the first 32 bits of the register (bits 0 to 31).
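To illustrate the difference, here is a short sketch which uses the test data with ADDPS and ADDSS. The values given in the comments are approximate, since numbers such as 1.1 and 10.66 cannot be held exactly in single-precision:-

MOVUPS XMM0,[SINGLEFP1]      ;XMM0 = 1.1, 2.2, 3.3, 4.4 (lowest first)
MOVUPS XMM1,[SINGLEFP2]      ;XMM1 = 10.66, 20.66, 30.66, 40.66
MOVAPS XMM2,XMM0             ;keep a copy of XMM0
ADDPS  XMM0,XMM1             ;packed: XMM0 = 11.76, 22.86, 33.96, 45.06
MOVAPS XMM0,XMM2             ;restore value in XMM0
ADDSS  XMM0,XMM1             ;scalar: XMM0 = 11.76, 2.2, 3.3, 4.4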
To watch these tests properly you need to set the appropriate breakpoint, start the test and then single step through the instructions. You can then watch how they change the XMM registers. Using GoBug you can make the XMM registers appear in their floating point SSE format using the appropriate button on the toolbar.

SSE instructions:-
Data movement instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Shuffle and unpack instructions
Conversion instructions

SSE Data movement instructions
This demonstrates moving data into the registers and between the registers. MOVUPS and MOVAPS (aligned version), MOVSS, MOVLPS and MOVHPS can also be used to move values out of the registers into memory. MOVMSKPS can be used after a comparison instruction to get the result of the compare into EAX for analysis.
The breakpoint is XMMSSE_FPDATA:-

XMMSSE_FPDATA:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L20                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L20:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1]      ;move four fp values into XMM0
MOVAPS XMM1,XMM0             ;copy to XMM1
MOVHPS XMM3,[SINGLEFP1]      ;move two fp values into XMM3 (high)
MOVLPS XMM3,[SINGLEFP1]      ;move two fp values into XMM3 (low)
MOVLHPS XMM4,XMM0            ;move low two fp values of XMM0 to high half of XMM4
MOVHLPS XMM4,XMM0            ;move high two fp values of XMM0 to low half of XMM4
MOVSS XMM5,[SINGLEFP1]       ;move one fp value into XMM5 (lowest)
MOVSS XMM6,XMM0              ;move one fp value into XMM6 (lowest)
MOVUPS XMM0,[SINGLEFPN]      ;move two -ve, two +ve values into XMM0
MOVMSKPS EAX,XMM0            ;get all sign bits in XMM0 into eax
RET
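The test above only loads the registers. As a sketch of the other direction, the following stores values back to memory and reads the sign mask. FPSTORE is a hypothetical 16-byte buffer which is not part of the test data:-

FPSTORE   DD 0,0,0,0

MOVUPS XMM0,[SINGLEFPN]      ;XMM0 = -1.1, +2.2, +3.3, -4.4 (lowest first)
MOVMSKPS EAX,XMM0            ;bits 0 and 3 set (first and last are negative), EAX=9
MOVUPS [FPSTORE],XMM0        ;store all four fp values
MOVSS  [FPSTORE],XMM0        ;store the lowest fp value only (4 bytes)
MOVLPS [FPSTORE],XMM0        ;store the low two fp values (8 bytes)
MOVHPS [FPSTORE+8],XMM0      ;store the high two fp values (8 bytes)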

SSE Arithmetic instructions
This demonstrates the arithmetic instructions which can work in the XMM registers using single-precision (32-bit) numbers.
The breakpoint is XMMSSE_FPARITH:-

XMMSSE_FPARITH:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L22                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L22:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0        ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1        ;copying to XMM3
ADDPS  XMM0,XMM1        ;add all fp values result in XMM0
MOVAPS XMM0,XMM2        ;restore value in XMM0
SUBPS  XMM0,XMM1        ;subtract all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
ADDSS  XMM0,XMM1        ;add lowest fp value result in XMM0
SUBSS  XMM0,XMM1        ;subtract lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MULPS  XMM0,XMM1        ;multiply all fp values result in XMM0 
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MULSS  XMM0,XMM1        ;multiply lowest fp value result in XMM0
;*******                
MOVAPS XMM0,XMM2        ;restore value in XMM0
DIVPS  XMM0,XMM1        ;divide all fp values result in XMM0 
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
DIVSS  XMM0,XMM1        ;divide lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
RCPPS  XMM0,XMM1        ;get reciprocals of all fp values result in XMM0 
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
RCPSS  XMM0,XMM1        ;get reciprocal of lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
SQRTPS XMM0,XMM1        ;get square roots of all fp values result in XMM0 
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
SQRTSS XMM0,XMM1        ;get square root of lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
RSQRTPS XMM0,XMM1       ;get reciprocals of square roots of all fp values result in XMM0 
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
RSQRTSS XMM0,XMM1       ;get reciprocal of square root of lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MAXPS XMM0,XMM1         ;get numerically greater fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MAXSS XMM0,XMM1         ;get numerically greater of low fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MINPS XMM0,XMM1         ;get numerically smaller fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
MINSS XMM0,XMM1         ;get numerically smaller of low fp values result in XMM0
RET
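Note that RCPPS, RCPSS, RSQRTPS and RSQRTSS give fast approximations (good to about 12 bits) rather than results to full single-precision. One way to see this in the debugger is to compare RCPPS with a true division, as in this sketch. FPONES is a hypothetical declaration which is not part of the test data:-

FPONES    DD 1.0,1.0,1.0,1.0

MOVUPS XMM1,[SINGLEFP2] ;XMM1 = 10.66, 20.66, 30.66, 40.66
RCPPS  XMM0,XMM1        ;XMM0 = approximate reciprocals of the values in XMM1
MOVUPS XMM4,[FPONES]    ;XMM4 = 1.0 in all four positions
DIVPS  XMM4,XMM1        ;XMM4 = reciprocals to full single-precision
;compare XMM0 and XMM4, they should differ slightly in the low digits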

SSE Logical instructions
This demonstrates the logical instructions which can work in the XMM registers using single-precision (32-bit) numbers.
The breakpoint is XMMSSE_FPLOGIC:-

XMMSSE_FPLOGIC:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L24                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L24:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0        ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1        ;copying to XMM3
ANDPS  XMM0,XMM1        ;perform AND on all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
ANDNPS XMM0,XMM1        ;perform AND NOT (inverts XMM0 then ANDs with XMM1) result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
ORPS   XMM0,XMM1        ;perform OR on all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2        ;restore value in XMM0
XORPS  XMM0,XMM1        ;perform XOR on all fp values result in XMM0
RET
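A common use of these instructions is to work on the sign bit (bit 31) of each floating point value. The following is a sketch of that idea and is not part of the demo. FPABSMASK and FPSIGNMASK are hypothetical declarations. The masks are loaded with MOVUPS first because a memory operand given directly to ANDPS or XORPS must be on a 16-byte boundary:-

FPABSMASK  DD 7FFFFFFFh,7FFFFFFFh,7FFFFFFFh,7FFFFFFFh
FPSIGNMASK DD 80000000h,80000000h,80000000h,80000000h

MOVUPS XMM0,[SINGLEFPN]  ;XMM0 = -1.1, +2.2, +3.3, -4.4 (lowest first)
MOVUPS XMM1,[FPABSMASK]  ;mask with every bit set except the sign bit
ANDPS  XMM0,XMM1         ;clear sign bits: XMM0 = 1.1, 2.2, 3.3, 4.4
MOVUPS XMM1,[FPSIGNMASK] ;mask with only the sign bit set
XORPS  XMM0,XMM1         ;flip sign bits: XMM0 = -1.1, -2.2, -3.3, -4.4
XORPS  XMM2,XMM2         ;a register XORed with itself becomes all zeroes (0.0)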

SSE Comparison instructions
This demonstrates the comparison instructions which can work in the XMM registers using single-precision (32-bit) numbers.
You tell CMPPS and CMPSS what to do by specifying an immediate value in the third operand. It is not easy to remember what value does what, so some assemblers (including GoAsm) also provide pseudo-mnemonics in the form recommended by Intel (given here in the comments). COMISS and UCOMISS are somewhat easier to use because they set the ordinary flags, although they only work on one floating point value in the XMM register (contained in bits 0-31).
The breakpoint is XMMSSE_FPCOMP:-

XMMSSE_FPCOMP:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L26                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L26:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVSS  XMM0,XMM1        ;make lowest of XMM0 and XMM1 the same
MOVAPS XMM2,XMM0        ;copying to XMM2
MOVAPS XMM3,XMM1        ;copying to XMM3
;********************* compare instructions working on all four fp values
CMPPS XMM0,XMM1,0       ;=CMPEQPS see whether equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,1       ;=CMPLTPS see whether less than, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,2       ;=CMPLEPS see whether less than or equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,3       ;=CMPUNORDPS see whether unordered, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,4       ;=CMPNEQPS see whether not equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,5       ;=CMPNLTPS see whether not less than, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,6       ;=CMPNLEPS see whether not less than or equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPPS XMM0,XMM1,7       ;=CMPORDPS see whether ordered, result in XMM0
;********************* compare instructions working on lowest only
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,0       ;=CMPEQSS see whether equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,1       ;=CMPLTSS see whether less than, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,2       ;=CMPLESS see whether less than or equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,3       ;=CMPUNORDSS see whether unordered, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,4       ;=CMPNEQSS see whether not equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,5       ;=CMPNLTSS see whether not less than, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,6       ;=CMPNLESS see whether not less than or equal, result in XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
CMPSS XMM0,XMM1,7       ;=CMPORDSS see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPS XMM0,XMM2        ;restore original value to XMM0
COMISS XMM0,XMM1        ;look at lowest only result in eflags
UCOMISS XMM0,XMM1       ;(unordered compare)
MOVUPS XMM0,[SINGLEFPN] ;move two -ve, two +ve values into XMM0
COMISS XMM0,XMM1        ;look at lowest only - result in eflags
UCOMISS XMM0,XMM1       ;(unordered compare)
RET
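The results of these instructions are normally used in one of two ways, sketched here using the test data. CMPPS leaves all ones or all zeroes in each 32-bit position, which MOVMSKPS can collect into EAX. COMISS sets the zero, parity and carry flags in the same way as an unsigned integer compare, with the parity flag meaning "unordered" (one of the values is not a number), so that case is tested first. The return values in EAX are an assumption made only for this sketch:-

MOVUPS XMM0,[SINGLEFP1] ;XMM0 = 1.1, 2.2, 3.3, 4.4 (lowest first)
MOVUPS XMM1,[SINGLEFP2] ;XMM1 = 10.66, 20.66, 30.66, 40.66
CMPPS XMM0,XMM1,1       ;=CMPLTPS all ones where XMM0 value is less than XMM1 value
MOVMSKPS EAX,XMM0       ;EAX=0Fh here since all four values are less
;*******
MOVSS XMM0,[SINGLEFP1]  ;put 1.1 in lowest of XMM0
COMISS XMM0,XMM1        ;compare 1.1 with 10.66
JP >L40                 ;parity flag set, so unordered
JB >L41                 ;carry flag set, so XMM0 is less (taken with this data)
MOV EAX,1               ;XMM0 is greater than or equal to XMM1
RET
L41:
MOV EAX,-1              ;XMM0 is less than XMM1
RET
L40:
XOR EAX,EAX             ;values cannot be compared
RET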

SSE Shuffle and unpack instructions
With these instructions you can move the single-precision (32-bit) floating point values around the XMM registers.
The breakpoint is XMMSSE_SHUFF:-

XMMSSE_SHUFF:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L28                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L28:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0        ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1        ;copying to XMM3
SHUFPS XMM0,XMM1,33h    ;low two results from XMM0 (values 3,0), high two from XMM1 (values 3,0)
SHUFPS XMM0,XMM0,33h    ;same selection but from XMM0 only
MOVAPS XMM0,XMM2        ;restore original value to XMM0
UNPCKHPS XMM0,XMM1      ;interleave high two fp values of XMM0 and XMM1 into XMM0
MOVAPS XMM0,XMM2        ;restore original value to XMM0
UNPCKLPS XMM0,XMM0      ;interleave low two fp values of XMM0 with themselves
RET
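The third operand of SHUFPS is made up of four 2-bit selectors. Bits 0-1 and 2-3 choose which values from the destination go into the low two positions of the result, and bits 4-5 and 6-7 choose which values from the source go into the high two positions. Two common patterns, given here as a sketch which is not part of the demo, are to copy the lowest value to all positions and to reverse the order:-

MOVUPS XMM0,[SINGLEFP1] ;XMM0 = 1.1, 2.2, 3.3, 4.4 (lowest first)
MOVAPS XMM2,XMM0        ;copying to XMM2
SHUFPS XMM0,XMM0,0      ;all selectors 0: XMM0 = 1.1, 1.1, 1.1, 1.1
MOVAPS XMM0,XMM2        ;restore original value to XMM0
SHUFPS XMM0,XMM0,1Bh    ;selectors 3,2,1,0: XMM0 = 4.4, 3.3, 2.2, 1.1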

SSE Conversion instructions
These instructions convert dword integers into single-precision (32-bit) floating point values and vice versa. They should be read together with the SSE2 conversion instructions.
The breakpoint is XMMSSE_CONV:-

XMMSSE_CONV:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h       ;test bit 25 (SSE)
JNZ >L30                ;SSE available
CALL NOSSEFPMESS        ;displays message if SSE not available
RET
L30:
;***** display XMM registers in SSE mode ..
CVTPI2PS XMM0,[DINTEGER]   ;convert 23 and 24 to single-precision fp values
CVTSI2SS XMM1,[DINTEGER]   ;convert 23 only to single-precision fp value
;***** display also MMX registers in dword integer mode ..
CVTPS2PI MM0,XMM0          ;convert 23 and 24 back again from XMM0 into MM0
CVTTPS2PI MM1,XMM0         ;same as above but with truncation
CVTSS2SI EAX,XMM1          ;convert 23 back again from XMM1 into EAX
CVTTSS2SI EDX,XMM1         ;same as above but with truncation
RET
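Since 23 and 24 are whole numbers, the versions with and without truncation give the same result in the test above. The difference shows with a value which has a fractional part, as in this sketch. FPROUND is a hypothetical declaration which is not part of the test data, and the rounded result assumes that the rounding control in MXCSR is at its default (round to nearest):-

FPROUND   DD 2.7

MOVSS XMM0,[FPROUND]       ;put 2.7 in lowest of XMM0
CVTSS2SI EAX,XMM0          ;EAX=3 (rounded to nearest)
CVTTSS2SI EDX,XMM0         ;EDX=2 (truncated towards zero)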