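MMX Instruction Set
The MMX instruction set comprises 57 multimedia instructions. Each instruction operates on several packed data elements at once, and the saturating forms clamp a result that overflows its element range instead of letting it wrap around, so software written for MMX can achieve higher performance. One benefit of MMX is that the operating systems of the time could run MMX programs without modification, because the MMX registers are aliased onto the x87 floating-point registers and are therefore saved and restored by the existing context-switch code. The drawback is equally clear: MMX instructions and x87 floating-point instructions cannot be freely mixed. Code has to switch between the two modes, and frequent switching degrades overall performance.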
http://blog.csdn.net/arau_sh/article/details/7575043
http://blog.csdn.net/arau_sh/article/details/7575059
http://blog.csdn.net/arau_sh/article/details/7575066
MMX指令集(详解)
2012-05-17 09:44
转自
http://blog.csdn.net/dahan_wangtao/article/details/1944153
EMMS MMX状态置空:
将FP特征字置空(全1),使后续浮点指令可以使用浮点寄存器,其他MMX指令自动置FP为全0.本指令应在所有MMX例程结束和调用可含有FP指令的例程时使用,以清除MMX状态.
MOVD mm,r/m32
MOVD r/m32,mm 转移32位数据:
将32位数据从整型寄存器/内存移到MMX寄存器,和反向移动.MOVD不能在MMX寄存器之间,内存之间及整型寄存器之间移动数据.目标操作数为MMX寄存器时,32位源操作数写入目标寄存器的低32位.目标寄存器"0扩展"为64位.源操作数为MMX寄存器时,该寄存器的低32位被写入目标操作数.
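MOVQ mm, mm/m64
MOVQ mm/m64, mm Transfer 64-bit data:
Moves 64 bits between two MMX registers, or between an MMX register and a 64-bit memory operand. Either operand may be an MMX register or a 64-bit memory operand, but MOVQ cannot move data from memory to memory.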
PACKSSWB mm,mm/m64
PACKSSDW mm,mm/m64
有符号饱和方式数据成组:
将MMX寄存器和MMX寄存器/内存单元中的有符号字组变成MMX寄存器的有符号字节组.和将MMX寄存器和MMX寄存器/内存单元中的有符号双字组变成MMX寄存器的有符号字组.(注1)
PACKUSWB mm,mm/m64 无符号饱和方式数据成组
将MMX寄存器和MMX寄存器/内存单元中的有符号字组变成MMX寄存器的无符号字节组.(注1)
PADDB mm,mm/m64
PADDW mm,mm/m64
PADDD mm,mm/m64
环绕方式数据组相加:
按环绕方式将MMX寄存器/内存单元中的字节组(字组,双字组)相加到MMX寄存器中(注1)
PADDSB mm,mm/m64
PADDSW mm,mm/m64
饱和方式有符号数据组相加:
按饱和方式将MMX寄存器/内存单元中的有符号字节组(字组)相加到MMX寄存器中的有符号字节组(字组)数据.(注1)
PADDUSB mm,mm/m64
PADDUSW mm,mm/m64
饱和方式无符号数据组相加:
按饱和方式将MMX寄存器/内存单元中的无符号字节组(字组)相加到MMX寄存器中的无符号字节组(字组)数据.(注1)
PAND mm,mm/m64
逐位逻辑与:
将MMX寄存器/内存单元中的64位数据进行与操作,结果存于MMX寄存器中.
PANDN mm,mm/m64
逐位逻辑与非:
将MMX寄存器中的64位值取反,再将取反后的MMX寄存器与MMX寄存器/内存单元中的64位数据进行与操作,结果存于MMX寄存器中.
PCMPEQB mm,mm/m64
PCMPEQW mm,mm/m64
PCMPEQD mm,mm/m64
成组数据的相等比较:
将MMX寄存器与MMX寄存器/内存单元中的字节组(字组,双字组)数据进行相等比较.
该指令将目标操作数和源操作数的相应数据元素进行比较,相等则目标寄存器的对应数据元素被置为全1,否则置为全0.
e.g. PCMPEQW mm, mm/m64
mm ? ? 0000000000000111 0111000111000111
mm/m64 ? ? 1111111000001100 0111000111000111
Result mm ? ? 0000000000000000 1111111111111111
PCMPGTB mm,mm/m64
PCMPGTW mm,mm/m64
PCMPGTD mm,mm/m64
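Packed greater-than comparison:
Compares the packed signed bytes (words, doublewords) in the MMX register with those in the MMX register/memory operand.
Each element of the destination operand is compared with the corresponding element of the source operand. If the destination element is greater, that element of the destination register is set to all 1s; otherwise it is set to all 0s. (See the previous instruction.)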
PMADDWD mm,mm/m64 数据组(字组)的乘加:
将MMX寄存器与MMX寄存器/内存单元中的字组数据相乘,然后将32位结果逐对相加并作为双字存于MMX寄存器中.
eg:PMADDWD mm,mm/m64
mm ? ? 0111000111000111 0111000111000111
操作 * * * *
mm,mm/m64 ? ? 1000000000000000 0000010000000000
操作 /_____+____/ /______+_____/
mm ? ? 1100100011100011 1001110000000000
PMULHW mm,mm/m64
成组数据(字组)的乘后取高位:
将MMX寄存器与MMX寄存器/内存单元中的有符号字组数据相乘,然后将结果的高16位存入MMX寄存器.
eg:PMULHW mm,mm/m64
mm ? ? 0111000111000111 0111000111000111
操作 * * * *
mm/m64 ? ? 1000000000000000 0000010000000000
操作 High Order High Order High Order High Order
mm ? ? 1100011100011100 0000000111000111
PMULLW mm,mm/m64
成组数据(字组)的乘后取低位:
将MMX寄存器与MMX寄存器/内存单元中的有符号字组数据相乘,然后将结果的低16位存入MMX寄存器.(参考上一条)
POR mm,mm/m64 逐位逻辑或:
将MMX寄存器/内存单元中的64位数据进行或操作,结果存于MMX寄存器中.
PSLLW mm,mm/m64
PSLLD mm,mm/m64
PSLLQ mm,mm/m64
PSLLW mm,imm8
PSLLD mm,imm8
PSLLQ mm,imm8
成组数据的逻辑左移:
将MMX寄存器中的字(双字,四字)数据按MMX寄存器/内存单元指定的个数左移,低位移入0.
将MMX寄存器中的字(双字,四字)数据按8位立即数指定的个数左移,低位移入0.
PSRAW mm,mm/m64
PSRAD mm,mm/m64
PSRAW mm,imm8
PSRAD mm,imm8 成组数据的算术右移:
将MMX寄存器中的字(双字)数据按MMX寄存器/内存单元指定的个数右移,移动中保持符号位.
将MMX寄存器中的字(双字)数据按8位立即数指定的个数右移,移动中保持符号位.
PSRLW mm,mm/m64
PSRLD mm,mm/m64
PSRLQ mm,mm/m64
PSRLW mm,imm8
PSRLD mm,imm8
PSRLQ mm,imm8 成组数据的逻辑右移:
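Shifts each word (doubleword, quadword) in the MMX register right by the bit count given in the MMX register/memory operand; the vacated high-order bits are filled with 0.
Shifts each word (doubleword, quadword) in the MMX register right by the bit count given in the 8-bit immediate; the vacated high-order bits are filled with 0.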
PSUBB mm,mm/m64
PSUBW mm,mm/m64
PSUBD mm,mm/m64 环绕方式成组数据相减:
从MMX寄存器中按字节(字,双字)减去MMX寄存器/内存单元中的字节(字,双字)组.(注1)
PSUBSB mm,mm/m64
PSUBSW mm,mm/m64
饱和方式有符号成组数据相减:
从MMX寄存器中的有符号成组字节(字)组数据减去MMX寄存器/内存单元中的有符号字节(字)组数据.(注1)
PSUBUSB mm,mm/m64
PSUBUSW mm, mm/m64 Unsigned saturating packed subtraction:
从MMX寄存器中的无符号成组字节(字)组数据减去MMX寄存器/内存单元中的无符号字节(字)组数据.(注1)
PUNPCKHBW mm,mm/m64
PUNPCKHWD mm,mm/m64
PUNPCKHDQ mm,mm/m64 高位成组数据分解:
此指令交替取出源操作数和目标操作数的数据元素的高半部分,写入目标操作数中,数据元素的低半部分被忽略.
eg:PUNPCKHBW mm,mm/m64
PUNPCKLBW mm,mm/m64
PUNPCKLWD mm,mm/m64
PUNPCKLDQ mm,mm/m64 低位成组数据分解:
此指令交替取出源操作数和目标操作数的数据元素的低半部分,写入目标操作数中,数据元素的高半部分被忽略.(参考上一条)
PXOR mm,mm/m64 逐位逻辑异或:
将MMX寄存器/内存单元中的64位数据进行异或操作,结果存于MMX寄存器中.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
补充:
PMULHUW - 压缩无符号乘法取高位
操作码
指令
说明
0F E4 /r
PMULHUW mm1, mm2/m64
将 mm1 寄存器与 mm2/m64 中的压缩无符号字整数相乘,并将结果的高 16 位存储到 mm1。
66 0F E4 /r
PMULHUW xmm1, xmm2/m128
将 xmm1 与 xmm2/m128 中的压缩无符号字整数相乘,并将结果的高 16 位存储到 xmm1。
说明
对目标操作数(第一个操作数)与源操作数(第二个操作数)中的压缩无符号字整数执行 SIMD 乘法,并将每个 32 位中间结果的高 16 位存储到目标操作数。(图 3-7 显示使用 64 位操作数时此操作的情况)。源操作数可以是 MMX™ 技术寄存器或 64 位内存位置,也可以是 XMM 寄存器或 128 位内存位置。目标操作数可以是 MMX 或 XMM 寄存器。
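Figure 3-7. PMULHUW and PMULHW instruction operation
Operation
PMULHUW instruction with 64-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Unsigned multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
DEST[15:0] ← TEMP0[31:16];
DEST[31:16] ← TEMP1[31:16];
DEST[47:32] ← TEMP2[31:16];
DEST[63:48] ← TEMP3[31:16];
PMULHUW instruction with 128-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Unsigned multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
TEMP4[31:0] ← DEST[79:64] * SRC[79:64];
TEMP5[31:0] ← DEST[95:80] * SRC[95:80];
TEMP6[31:0] ← DEST[111:96] * SRC[111:96];
TEMP7[31:0] ← DEST[127:112] * SRC[127:112];
DEST[15:0] ← TEMP0[31:16];
DEST[31:16] ← TEMP1[31:16];
DEST[47:32] ← TEMP2[31:16];
DEST[63:48] ← TEMP3[31:16];
DEST[79:64] ← TEMP4[31:16];
DEST[95:80] ← TEMP5[31:16];
DEST[111:96] ← TEMP6[31:16];
DEST[127:112] ← TEMP7[31:16];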
英特尔(R) C++ 编译器等价内部函数
PMULHUW __m64 _mm_mulhi_pu16(__m64 a, __m64 b)
PMULHUW __m128i _mm_mulhi_epu16 ( __m128i a, __m128i b)
影响的标志
无。
保护模式异常
#GP(0) - 如果内存操作数有效地址超出 CS、DS、ES、FS 或 GS 段限制。(仅限 128 位操作)。如果内存操作数未对齐 16 字节边界,不论是哪一段。
#SS(0) - 如果内存操作数有效地址超出 SS 段限制。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
#PF(错误代码) - 如果发生页错误。
#AC(0)(仅限 64 位操作)- 如果启用对齐检查并在当前特权级别为 3 时进行未对齐的内存引用。
实地址模式异常
#GP(0)(仅限 128 位操作)- 如果内存操作数未对齐 16 字节边界,不论是哪一段。如果操作数的任何部分出现在 0 到 FFFFH 的有效地址空间之外。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
虚 8086 模式异常
与“实地址模式”中的异常相同。
#PF(错误代码) - 页错误。
#AC(0)(仅限 64 位操作)- 如果在启用对齐检查的情况下进行未对齐的内存引用。
数值异常
无。
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PACKUSWB - 无符号饱和压缩
操作码
指令
说明
0F 67 /r
PACKUSWB mm, mm/m64
使用饱和运算将 mm 中的 4 个有符号字与 mm/m64 中的 4 个有符号字压缩成 8 个无符号字节,结果放入 mm。
66 0F 67 /r
PACKUSWB xmm1, xmm2/m128
使用饱和运算将 xmm1 与 xmm2/m128 中的有符号字压缩成无符号字节,结果放入 xmm1。
说明
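Packs the 4 signed words in the destination operand (first operand) and the 4 signed words in the source operand (second operand) into 8 unsigned bytes using unsigned saturation, and stores the result in the destination operand (see Figure 3-5). If the signed value of a word is outside the range of an unsigned byte (greater than FFH or less than 00H), the saturated byte value FFH or 00H, respectively, is stored in the destination.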
目标操作数必须是 MMX™ 技术寄存器;源操作数可以是 MMX 寄存器,也可以是四字内存位置。
将源操作数 xmm2/m128 中的八个有符号字与目标操作数 xmm1 中的八个有符号字压缩成十六个无符号字节,结果放入目标寄存器 xmm1。如果字的有符号值大于或小于无符号字节的范围,则对该值执行饱和运算(上溢时为 FFH,下溢时为 00H)。目标操作数是 XMM 寄存器。源操作数可以是 XMM 寄存器或 128 位内存操作数。
Figure 3-5. Operation of the PACKUSWB instruction
Operation
PACKUSWB instruction with 64-bit operands:
DEST[7:0] ← SaturateSignedWordToUnsignedByte DEST[15:0];
DEST[15:8] ← SaturateSignedWordToUnsignedByte DEST[31:16];
DEST[23:16] ← SaturateSignedWordToUnsignedByte DEST[47:32];
DEST[31:24] ← SaturateSignedWordToUnsignedByte DEST[63:48];
DEST[39:32] ← SaturateSignedWordToUnsignedByte SRC[15:0];
DEST[47:40] ← SaturateSignedWordToUnsignedByte SRC[31:16];
DEST[55:48] ← SaturateSignedWordToUnsignedByte SRC[47:32];
DEST[63:56] ← SaturateSignedWordToUnsignedByte SRC[63:48];
PACKUSWB instruction with 128-bit operands:
DEST[7:0] ← SaturateSignedWordToUnsignedByte (DEST[15:0]);
DEST[15:8] ← SaturateSignedWordToUnsignedByte (DEST[31:16]);
DEST[23:16] ← SaturateSignedWordToUnsignedByte (DEST[47:32]);
DEST[31:24] ← SaturateSignedWordToUnsignedByte (DEST[63:48]);
DEST[39:32] ← SaturateSignedWordToUnsignedByte (DEST[79:64]);
DEST[47:40] ← SaturateSignedWordToUnsignedByte (DEST[95:80]);
DEST[55:48] ← SaturateSignedWordToUnsignedByte (DEST[111:96]);
DEST[63:56] ← SaturateSignedWordToUnsignedByte (DEST[127:112]);
DEST[71:64] ← SaturateSignedWordToUnsignedByte (SRC[15:0]);
DEST[79:72] ← SaturateSignedWordToUnsignedByte (SRC[31:16]);
DEST[87:80] ← SaturateSignedWordToUnsignedByte (SRC[47:32]);
DEST[95:88] ← SaturateSignedWordToUnsignedByte (SRC[63:48]);
DEST[103:96] ← SaturateSignedWordToUnsignedByte (SRC[79:64]);
DEST[111:104] ← SaturateSignedWordToUnsignedByte (SRC[95:80]);
DEST[119:112] ← SaturateSignedWordToUnsignedByte (SRC[111:96]);
DEST[127:120] ← SaturateSignedWordToUnsignedByte (SRC[127:112]);
英特尔(R) C++ 编译器等价内部函数
__m64 _mm_packs_pu16(__m64 m1, __m64 m2)
影响的标志
无。
保护模式异常
#GP(0) - 如果内存操作数有效地址超出 CS、DS、ES、FS 或 GS 段限制。(仅限 128 位操作)。如果内存操作数未对齐 16 字节边界,不论是哪一段。
#SS(0) - 如果内存操作数有效地址超出 SS 段限制。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
#PF(错误代码) - 如果发生页错误。
#AC(0)(仅限 64 位操作)- 如果启用对齐检查并在当前特权级别为 3 时进行未对齐的内存引用。
实地址模式异常
#GP(0)(仅限 128 位操作)- 如果内存操作数未对齐 16 字节边界,不论是哪一段。如果操作数的任何部分出现在 0 到 FFFFH 的有效地址空间之外。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
虚 8086 模式异常
与“实地址模式”中的异常相同。
#PF(错误代码) - 页错误。
#AC(0)(仅限 64 位操作)- 如果在启用对齐检查的情况下进行未对齐的内存引用。
---------------------------------------------------------------------------------------------------------------------------------------
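PACKSSWB/PACKSSDW - Pack with Signed Saturation
Opcode | Instruction | Description
0F 63 /r | PACKSSWB mm1, mm2/m64 | Packs the 4 signed word integers in mm1 and the 4 in mm2/m64 into 8 signed byte integers in mm1, using signed saturation.
66 0F 63 /r | PACKSSWB xmm1, xmm2/m128 | Packs the 8 signed word integers in xmm1 and the 8 in xmm2/m128 into 16 signed byte integers in xmm1, using signed saturation.
0F 6B /r | PACKSSDW mm1, mm2/m64 | Packs the 2 signed doubleword integers in mm1 and the 2 in mm2/m64 into 4 signed word integers in mm1, using signed saturation.
66 0F 6B /r | PACKSSDW xmm1, xmm2/m128 | Packs the 4 signed doubleword integers in xmm1 and the 4 in xmm2/m128 into 8 signed word integers in xmm1, using signed saturation.
Description
Packs signed word integers into signed byte integers (PACKSSWB), or signed doubleword integers into signed word integers (PACKSSDW), using signed saturation. PACKSSWB packs the 4 signed words in the destination operand (first operand) and the 4 signed words in the source operand (second operand) into 8 signed bytes in the destination operand. If the signed value of a word is outside the range of a signed byte (greater than 7FH or less than 80H as a signed value), the saturated byte value 7FH or 80H, respectively, is stored in the destination.
PACKSSDW packs the 2 signed doublewords in the destination operand (first operand) and the 2 signed doublewords in the source operand (second operand) into 4 signed words in the destination operand (see Figure 3-4). If the signed value of a doubleword is outside the range of a signed word (greater than 7FFFH or less than 8000H as a signed value), the saturated word value 7FFFH or 8000H, respectively, is stored in the destination.
For the 64-bit forms of PACKSSWB and PACKSSDW, the destination operand must be an MMX technology register; the source operand can be an MMX register or a quadword memory location.
The 128-bit forms pack the signed data elements of the source and destination operands with signed saturation and write the result to the destination operand. The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory operand.
The 128-bit PACKSSWB packs the eight signed words in the source operand and the eight signed words in the destination operand into sixteen signed bytes in the destination operand. A word whose signed value is outside the range of a signed byte is saturated (7FH on overflow, 80H on underflow).
The 128-bit PACKSSDW packs the four signed doublewords in the source operand and the four signed doublewords in the destination operand into eight signed words in the destination register. A doubleword whose signed value is outside the range of a signed word is saturated (7FFFH on overflow, 8000H on underflow).
Figure 3-4. Operation of the PACKSSDW instruction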
Operation
PACKSSWB instruction with 64-bit operands
DEST[7:0] ← SaturateSignedWordToSignedByte DEST[15:0];
DEST[15:8] ← SaturateSignedWordToSignedByte DEST[31:16];
DEST[23:16] ← SaturateSignedWordToSignedByte DEST[47:32];
DEST[31:24] ← SaturateSignedWordToSignedByte DEST[63:48];
DEST[39:32] ← SaturateSignedWordToSignedByte SRC[15:0];
DEST[47:40] ← SaturateSignedWordToSignedByte SRC[31:16];
DEST[55:48] ← SaturateSignedWordToSignedByte SRC[47:32];
DEST[63:56] ← SaturateSignedWordToSignedByte SRC[63:48];
PACKSSDW instruction with 64-bit operands
DEST[15:0] ← SaturateSignedDoublewordToSignedWord DEST[31:0];
DEST[31:16] ← SaturateSignedDoublewordToSignedWord DEST[63:32];
DEST[47:32] ← SaturateSignedDoublewordToSignedWord SRC[31:0];
DEST[63:48] ← SaturateSignedDoublewordToSignedWord SRC[63:32];
PACKSSWB instruction with 128-bit operands
DEST[7:0] ← SaturateSignedWordToSignedByte (DEST[15:0]);
DEST[15:8] ← SaturateSignedWordToSignedByte (DEST[31:16]);
DEST[23:16] ← SaturateSignedWordToSignedByte (DEST[47:32]);
DEST[31:24] ← SaturateSignedWordToSignedByte (DEST[63:48]);
DEST[39:32] ← SaturateSignedWordToSignedByte (DEST[79:64]);
DEST[47:40] ← SaturateSignedWordToSignedByte (DEST[95:80]);
DEST[55:48] ← SaturateSignedWordToSignedByte (DEST[111:96]);
DEST[63:56] ← SaturateSignedWordToSignedByte (DEST[127:112]);
DEST[71:64] ← SaturateSignedWordToSignedByte (SRC[15:0]);
DEST[79:72] ← SaturateSignedWordToSignedByte (SRC[31:16]);
DEST[87:80] ← SaturateSignedWordToSignedByte (SRC[47:32]);
DEST[95:88] ← SaturateSignedWordToSignedByte (SRC[63:48]);
DEST[103:96] ← SaturateSignedWordToSignedByte (SRC[79:64]);
DEST[111:104] ← SaturateSignedWordToSignedByte (SRC[95:80]);
DEST[119:112] ← SaturateSignedWordToSignedByte (SRC[111:96]);
DEST[127:120] ← SaturateSignedWordToSignedByte (SRC[127:112]);
PACKSSDW instruction with 128-bit operands
DEST[15:0] ← SaturateSignedDwordToSignedWord (DEST[31:0]);
DEST[31:16] ← SaturateSignedDwordToSignedWord (DEST[63:32]);
DEST[47:32] ← SaturateSignedDwordToSignedWord (DEST[95:64]);
DEST[63:48] ← SaturateSignedDwordToSignedWord (DEST[127:96]);
DEST[79:64] ← SaturateSignedDwordToSignedWord (SRC[31:0]);
DEST[95:80] ← SaturateSignedDwordToSignedWord (SRC[63:32]);
DEST[111:96] ← SaturateSignedDwordToSignedWord (SRC[95:64]);
DEST[127:112] ← SaturateSignedDwordToSignedWord (SRC[127:96]);
英特尔(R) C++ 编译器等价内部函数
__m64 _mm_packs_pi16(__m64 m1, __m64 m2)
__m64 _mm_packs_pi32 (__m64 m1, __m64 m2)
影响的标志
无。
保护模式异常
#GP(0) - 如果内存操作数有效地址超出 CS、DS、ES、FS 或 GS 段限制。(仅限 128 位操作)。如果内存操作数未对齐 16 字节边界,不论是哪一段。
#SS(0) - 如果内存操作数有效地址超出 SS 段限制。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
#PF(错误代码) - 如果发生页错误。
#AC(0)(仅限 64 位操作)- 如果启用对齐检查并在当前特权级别为 3 时进行未对齐的内存引用。
实地址模式异常
#GP(0)(仅限 128 位操作)- 如果内存操作数未对齐 16 字节边界,不论是哪一段。如果操作数的任何部分出现在 0 到 FFFFH 的有效地址空间之外。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
虚 8086 模式异常
与“实地址模式”中的异常相同。
#PF(错误代码) - 页错误。
#AC(0)(仅限 64 位操作)- 如果在启用对齐检查的情况下进行未对齐的内存引用。
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
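PMULLW - Multiply Packed Signed Integers and Store Low Result
Opcode | Instruction | Description
0F D5 /r | PMULLW mm1, mm2/m64 | Multiplies the packed signed word integers in mm1 and mm2/m64 and stores the low 16 bits of each result in mm1.
66 0F D5 /r | PMULLW xmm1, xmm2/m128 | Multiplies the packed signed word integers in xmm1 and xmm2/m128 and stores the low 16 bits of each result in xmm1.
Description
Performs a SIMD multiply of the packed signed word integers in the destination operand (first operand) and the source operand (second operand), and stores the low 16 bits of each 32-bit intermediate result in the destination operand. (Figure 3-8 shows this operation with 64-bit operands.) The source operand can be an MMX technology register or a 64-bit memory location, or an XMM register or a 128-bit memory location. The destination operand can be an MMX or an XMM register.
Figure 3-8. PMULLW instruction operation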
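Operation
PMULLW instruction with 64-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Signed multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
DEST[15:0] ← TEMP0[15:0];
DEST[31:16] ← TEMP1[15:0];
DEST[47:32] ← TEMP2[15:0];
DEST[63:48] ← TEMP3[15:0];
PMULLW instruction with 128-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Signed multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
TEMP4[31:0] ← DEST[79:64] * SRC[79:64];
TEMP5[31:0] ← DEST[95:80] * SRC[95:80];
TEMP6[31:0] ← DEST[111:96] * SRC[111:96];
TEMP7[31:0] ← DEST[127:112] * SRC[127:112];
DEST[15:0] ← TEMP0[15:0];
DEST[31:16] ← TEMP1[15:0];
DEST[47:32] ← TEMP2[15:0];
DEST[63:48] ← TEMP3[15:0];
DEST[79:64] ← TEMP4[15:0];
DEST[95:80] ← TEMP5[15:0];
DEST[111:96] ← TEMP6[15:0];
DEST[127:112] ← TEMP7[15:0];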
英特尔(R) C++ 编译器等价内部函数
PMULLW __m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
PMULLW __m128i _mm_mullo_epi16 ( __m128i a, __m128i b)
影响的标志
无。
保护模式异常
#GP(0) - 如果内存操作数有效地址超出 CS、DS、ES、FS 或 GS 段限制。(仅限 128 位操作)。如果内存操作数未对齐 16 字节边界,不论是哪一段。
#SS(0) - 如果内存操作数有效地址超出 SS 段限制。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
#PF(错误代码) - 如果发生页错误。
#AC(0)(仅限 64 位操作)- 如果启用对齐检查并在当前特权级别为 3 时进行未对齐的内存引用。
实地址模式异常
#GP(0)(仅限 128 位操作)- 如果内存操作数未对齐 16 字节边界,不论是哪一段。如果操作数的任何部分出现在 0 到 FFFFH 的有效地址空间之外。
#UD - 如果 CR0 中的 EM 设置为 1。(仅限 128 位操作)。如果 CR4 中的 OSFXSR 是 0。(仅限 128 位操作)。如果 CPUID 功能标志 SSE-2 为 0。
#NM - 如果 CR0 中的 TS 设置为 1。
#MF(仅限 64 位操作)- 如果存在未决的 x87 FPU 异常。
虚 8086 模式异常
与“实地址模式”中的异常相同
#PF(错误代码) - 页错误。
#AC(0)(仅限 64 位操作)- 如果在启用对齐检查的情况下进行未对齐的内存引用。
数值异常
无。
-----------------------------------------------------------------------------------------------------------------------------------------------
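PSLLDQ - Shift Double Quadword Left Logical
Opcode | Instruction | Description
66 0F 73 /7 ib | PSLLDQ xmm1, imm8 | Shifts xmm1 left by imm8 bytes, clearing the low-order bytes.
Description
Shifts the destination operand (first operand) left by the number of bytes given in the count operand (second operand). The vacated low-order bytes are cleared (set to all 0s). If the count is greater than 15, the destination operand is set to all 0s. The destination operand is an XMM register. The count operand is an 8-bit immediate.
Operation
TEMP ← COUNT;
IF (TEMP > 15) THEN TEMP ← 16; FI;
DEST ← DEST << (TEMP * 8);
Intel(R) C++ Compiler Intrinsic Equivalent
PSLLDQ __m128i _mm_slli_si128 ( __m128i a, int imm)
Flags Affected
None.
Protected Mode Exceptions
#UD - If EM in CR0 is set. If OSFXSR in CR4 is 0. If the CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set.
Real-Address Mode Exceptions
Same exceptions as in protected mode.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Numeric Exceptions
None.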
-----------------------------------------------------------------------------------------------------------------------------------------------
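PSHUFD - Shuffle Packed Doublewords
Opcode | Instruction | Description
66 0F 70 /r ib | PSHUFD xmm1, xmm2/m128, imm8 | Shuffles the doublewords in xmm2/m128 according to the encoding in imm8 and stores the result in xmm1.
Description
Copies doublewords from the source operand (second operand) and inserts them into the destination operand (first operand) at the positions selected by the order operand (third operand). Figure 3-10 shows the operation of PSHUFD and the encoding of the order operand. Each 2-bit field of the order operand selects the contents of one doubleword position in the destination operand. For example, bits 0 and 1 of the order operand select the contents of doubleword 0 of the destination: their encoding (see the field encoding in Figure 3-10) determines which doubleword of the source operand is copied to doubleword 0 of the destination.
The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate.
Note that this instruction allows one doubleword of the source operand to be copied to more than one doubleword position in the destination operand.
Figure 3-10. PSHUFD instruction operation
Operation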
DEST[31:0] ← (SRC >> (ORDER[1:0] * 32))[31:0];
DEST[63:32] ← (SRC >> (ORDER[3:2] * 32))[31:0];
DEST[95:64] ← (SRC >> (ORDER[5:4] * 32))[31:0];
DEST[127:96] ← (SRC >> (ORDER[7:6] * 32))[31:0];
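The selection rule described for PSHUFD can be sketched as a scalar reference model in C. This is an illustration only, not the hardware instruction or the `_mm_shuffle_epi32` intrinsic: the function name `pshufd_model` and the representation of a 128-bit operand as an array of four `uint32_t` values (index 0 is the lowest doubleword) are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical reference model of PSHUFD.
 * dst[i] receives the source doubleword selected by bits (2*i+1):(2*i)
 * of the 8-bit order operand. A temporary is used so that dst and src
 * may refer to the same array. */
static void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t order)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++) {
        unsigned selector = (order >> (2 * i)) & 3u; /* 2-bit field for lane i */
        tmp[i] = src[selector];
    }
    for (int i = 0; i < 4; i++) {
        dst[i] = tmp[i];
    }
}
```

With an order operand of 1BH (binary 00 01 10 11) the four doublewords are reversed, and with 00H doubleword 0 of the source is broadcast to all four positions.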
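Intel(R) C++ Compiler Intrinsic Equivalent
PSHUFD __m128i _mm_shuffle_epi32(__m128i a, int n)
Flags Affected
None.
Protected Mode Exceptions
#GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0) - If a memory operand effective address is outside the SS segment limit.
#UD - If EM in CR0 is set. If OSFXSR in CR4 is 0. If the CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set.
#PF(fault-code) - If a page fault occurs.
Real-Address Mode Exceptions
#GP(0) - If a memory operand is not aligned on a 16-byte boundary, regardless of segment. If any part of the operand lies outside the effective address space from 0 to FFFFH.
#UD - If EM in CR0 is set. If OSFXSR in CR4 is 0. If the CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set.
Virtual-8086 Mode Exceptions
Same exceptions as in real-address mode.
#PF(fault-code) - For a page fault.
Numeric Exceptions
None.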
Intel Architecture MMX Instruction Set (table layout)
2012-05-17 09:46
Reposted from
http://blog.csdn.net/dahan_wangtao/article/details/1816513
Intel Architecture MMX Instruction Set
Packed arithmetic | Wraparound | Signed saturation | Unsigned saturation
Addition | PADD | PADDS | PADDUS
Subtraction | PSUB | PSUBS | PSUBUS
Multiplication | PMULL/H | |
Multiply and add | PMADD | |
Shift right arithmetic | PSRA | |
Compare | PCMPcc | |
Conversion | Regular | Signed saturation | Unsigned saturation
Pack | | PACKSS | PACKUS
Unpack | PUNPCKL/H | |
Logical operations | Packed | Full 64-bit
And | | PAND
And not | | PANDN
Or | | POR
Exclusive or | | PXOR
Shift left | PSLL | PSLL
Shift right | PSRL | PSRL
Data transfer and memory operations | 32-bit | 64-bit
Register-register move | MOVD | MOVQ
Load from memory | MOVD | MOVQ
Store to memory | MOVD | MOVQ
Other
Empty multimedia state | EMMS
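MMX Instructions
2012-05-17 09:46
Reposted from
http://blog.csdn.net/dahan_wangtao/article/details/1816513
I have recently been studying the multimedia data instructions and compiled this summary of the MMX instructions as notes.
MMX data structures
Multimedia software has the following notable characteristics:
1. Small integer data types (8 bits for graphics data, 16 bits for audio data);
2. Frequent, repetitive computation on small integers (for example, heavily called core algorithms);
3. Many operations are inherently parallel (for example, the same add, subtract, or multiply applied to large amounts of data).
MMX technology defines a basic, general-purpose set of 57 packed-integer instructions.
"Packed integer data" means several 8-, 16-, or 32-bit integers combined into a single 64-bit quantity. MMX instructions operate mainly on such packed data, which comes in four types: packed byte, packed word, packed doubleword, and packed quadword.
- Packed byte: eight bytes combined into one 64-bit quantity;
- Packed word: four words combined into one 64-bit quantity;
- Packed doubleword: two doublewords combined into one 64-bit quantity;
- Packed quadword: a single 64-bit quantity.
A single MMX instruction can therefore process 8, 4, or 2 data elements at once. This is the "single instruction, multiple data" (SIMD) model, and it is the fundamental reason MMX technology improves performance.
To make 64-bit packed integer data convenient to use, MMX technology provides eight 64-bit MMX registers (MM0 to MM7). Only MMX instructions can use the MMX registers.
Note that the MMX registers are randomly accessible, but they are actually implemented on top of the eight floating-point data registers. The floating-point unit (FPU) has eight floating-point registers (FPRs), which are accessed as a stack. Each floating-point data register is 80 bits wide: the upper 16 bits hold the sign and exponent, and the lower 64 bits hold the significand. MMX uses the 64-bit significand field as a randomly accessible 64-bit MMX register.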
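MMX instruction set
1. Arithmetic:
PADD wraparound add
PADDS signed saturating add
PADDUS unsigned saturating add
PSUB wraparound subtract
PSUBS signed saturating subtract
PSUBUS unsigned saturating subtract [byte, word]
PMULHW multiply packed words, keep the high half
PMULLW multiply packed words, keep the low half
PMADDWD multiply packed words and add adjacent products
2. Comparison:
PCMPEQ packed compare for equal [byte, word, doubleword]
PCMPGT packed compare for greater than [byte, word, doubleword]
3. Type conversion:
PACKUSWB pack with unsigned saturation [word to byte]
PACKSS pack with signed saturation [word/doubleword to byte/word]
PUNPCKH unpack high-order elements [byte, word, doubleword to word, doubleword, quadword]
PUNPCKL unpack low-order elements [byte, word, doubleword to word, doubleword, quadword]
4. Logical operations:
PAND packed logical AND
PANDN packed logical AND NOT
POR packed logical OR
PXOR packed logical XOR
5. Shifts:
PSLL packed shift left logical
PSRL packed shift right logical
PSRA packed shift right arithmetic [word, doubleword]
6. Data transfer:
MOV move to/from an MMX register [doubleword/quadword]
7. State clearing:
EMMS empty the MMX state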
Last edited by zzz19760225 on 2016-12-12 at 14:56 ]
MMX Instruction Set
The MMX instruction set comprises 57 multimedia instructions. Each instruction operates on several packed data elements at once, and the saturating forms clamp a result that overflows its element range instead of letting it wrap around, so software written for MMX can achieve higher performance. One benefit of MMX is that the operating systems of the time could run MMX programs without modification, because the MMX registers are aliased onto the x87 floating-point registers and are therefore saved and restored by the existing context-switch code. The drawback is equally clear: MMX instructions and x87 floating-point instructions cannot be freely mixed. Code has to switch between the two modes, and frequent switching degrades overall performance.
http://blog.csdn.net/arau_sh/article/details/7575043
http://blog.csdn.net/arau_sh/article/details/7575059
http://blog.csdn.net/arau_sh/article/details/7575066
MMX Instruction Set (Detailed Explanation)
2012-05-17 09:44
Transferred from
http://blog.csdn.net/dahan_wangtao/article/details/1944153
EMMS Empty MMX State:
Sets the x87 FPU tag word to empty (all 1s) so that subsequent floating-point instructions can use the floating-point registers. Every other MMX instruction automatically sets the tag word to all 0s (valid). Use EMMS at the end of every MMX routine, and before calling any routine that may contain floating-point instructions, to clear the MMX state.
MOVD mm, r/m32
MOVD r/m32, mm Transfer 32-bit Data:
Move 32-bit data from the integer register/memory to the MMX register and vice versa. MOVD cannot move data between MMX registers, between memory, or between integer registers. When the destination operand is an MMX register, the 32-bit source operand is written to the lower 32 bits of the destination register. The destination register is "0-extended" to 64 bits. When the source operand is an MMX register, the lower 32 bits of this register are written to the destination operand.
MOVQ mm, mm/m64
MOVQ mm/m64, mm Transfer 64-bit Data:
Moves 64 bits between two MMX registers, or between an MMX register and a 64-bit memory operand. Either operand may be an MMX register or a 64-bit memory operand, but MOVQ cannot move data from memory to memory.
PACKSSWB mm, mm/m64
PACKSSDW mm, mm/m64
Signed Saturation Data Packing:
Pack the signed word groups in the MMX register and the MMX register/memory unit into the signed byte groups in the MMX register. And pack the signed double word groups in the MMX register and the MMX register/memory unit into the signed word groups in the MMX register. (Note 1)
PACKUSWB mm, mm/m64 Unsigned Saturation Data Packing
Pack the signed word groups in the MMX register and the MMX register/memory unit into the unsigned byte groups in the MMX register. (Note 1)
PADDB mm, mm/m64
PADDW mm, mm/m64
PADDD mm, mm/m64
Wrapping Mode Data Group Addition:
Add the byte groups (word groups, double word groups) in the MMX register/memory unit to the MMX register in wrapping mode. (Note 1)
PADDSB mm, mm/m64
PADDSW mm, mm/m64
Signed Saturation Data Group Addition:
Add the signed byte groups (word groups) in the MMX register/memory unit to the signed byte groups (word groups) data in the MMX register in saturation mode. (Note 1)
PADDUSB mm, mm/m64
PADDUSW mm, mm/m64
Unsigned Saturation Data Group Addition:
Add the unsigned byte groups (word groups) in the MMX register/memory unit to the unsigned byte groups (word groups) data in the MMX register in saturation mode. (Note 1)
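The difference between the wraparound and saturating forms above can be sketched with a scalar model of a single byte lane. This is an illustrative C sketch, not the hardware instructions: the function names `add_wrap_u8`, `add_sat_u8`, and `add_sat_s8` are hypothetical, and the real PADDB, PADDUSB, and PADDSB apply the same rule to all eight byte lanes of a 64-bit operand at once.

```c
#include <stdint.h>

/* PADDB-style wraparound: the carry out of the 8-bit lane is discarded. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* PADDUSB-style unsigned saturation: sums above 0xFF are clamped to 0xFF. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 0xFFu ? 0xFFu : sum);
}

/* PADDSB-style signed saturation: sums are clamped to [-128, 127]. */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) sum = INT8_MAX;
    if (sum < INT8_MIN) sum = INT8_MIN;
    return (int8_t)sum;
}
```

For example, adding 200 and 100 as unsigned bytes gives 44 with wraparound but 255 with unsigned saturation, which is why the saturating forms suit pixel and audio arithmetic.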
PAND mm, mm/m64
Bitwise Logical AND:
Perform an AND operation on the 64-bit data in the MMX register/memory unit, and store the result in the MMX register.
PANDN mm, mm/m64
Bitwise Logical AND NOT:
Invert the 64-bit value in the MMX register, then perform an AND operation on the inverted MMX register and the 64-bit data in the MMX register/memory unit, and store the result in the MMX register.
PCMPEQB mm, mm/m64
PCMPEQW mm, mm/m64
PCMPEQD mm, mm/m64
Group Data Equality Comparison:
Compare the byte groups (word groups, double word groups) data in the MMX register with the MMX register/memory unit.
This instruction compares the corresponding data elements of the destination operand and the source operand. If they are equal, the corresponding data element of the destination register is set to all 1s; otherwise, it is set to all 0s.
e.g. PCMPEQW mm, mm/m64
mm ? ? 0000000000000111 0111000111000111
mm/m64 ? ? 1111111000001100 0111000111000111
Result mm ? ? 0000000000000000 1111111111111111
PCMPGTB mm, mm/m64
PCMPGTW mm, mm/m64
PCMPGTD mm, mm/m64
Group Data Greater Than Comparison:
Compares the packed signed bytes (words, doublewords) in the MMX register with those in the MMX register/memory operand for greater than.
Each element of the destination operand is compared with the corresponding element of the source operand. If the destination element is greater, that element of the destination register is set to all 1s; otherwise it is set to all 0s. (See the previous instruction.)
PMADDWD mm, mm/m64 Multiply and Add of Data Groups (Word Groups):
Multiplies the packed signed words in the MMX register by the corresponding words in the MMX register/memory operand, then adds each adjacent pair of 32-bit products and stores the two sums as doublewords in the MMX register.
e.g. PMADDWD mm, mm/m64
mm ? ? 0111000111000111 0111000111000111
Operation * * * *
mm/m64 ? ? 1000000000000000 0000010000000000
Operation /_____+____/ /______+_____/
mm ? ? 1100100011100011 1001110000000000
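The multiply-and-add step can be checked with a scalar reference model in C. This is a sketch under stated assumptions, not the hardware instruction: the name `pmaddwd_model` is hypothetical, a 64-bit operand is passed as a `uint64_t`, and the conversion from `uint16_t` to `int16_t` assumes the usual two's-complement behavior.

```c
#include <stdint.h>

/* Hypothetical reference model of PMADDWD on 64-bit operands:
 * four signed 16x16-bit products are formed, and each adjacent pair
 * is summed into one 32-bit lane of the result. */
static uint64_t pmaddwd_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 2; lane++) {
        uint32_t sum = 0; /* unsigned so the one overflow case wraps */
        for (int half = 0; half < 2; half++) {
            int shift = 32 * lane + 16 * half;
            int16_t a = (int16_t)(uint16_t)(dest >> shift);
            int16_t b = (int16_t)(uint16_t)(src >> shift);
            sum += (uint32_t)((int32_t)a * (int32_t)b);
        }
        result |= (uint64_t)sum << (32 * lane);
    }
    return result;
}
```

Feeding the two low words of the example above (71C7H and 71C7H multiplied by 8000H and 0400H) into this model yields C8E39C00H, which matches the binary result shown in the example.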
PMULHW mm, mm/m64
Multiply and Take High Order of Group Data (Word Groups):
Multiply the signed word group data in the MMX register with the MMX register/memory unit, then store the high 16 bits of the result in the MMX register.
eg: PMULHW mm, mm/m64
mm ? ? 0111000111000111 0111000111000111
Operation * * * *
mm/m64 ? ? 1000000000000000 0000010000000000
Operation High Order High Order High Order High Order
mm ? ? 1100011100011100 0000000111000111
PMULLW mm, mm/m64
Multiply and Take Low Order of Group Data (Word Groups):
Multiply the signed word group data in the MMX register with the MMX register/memory unit, then store the low 16 bits of the result in the MMX register. (Refer to the previous instruction)
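The high-half and low-half multiplies differ only in which 16 bits of the 32-bit product they keep. The following C sketch models a single word lane; the function names `mulhi_s16`, `mulhi_u16`, and `mullo_16` are hypothetical, and the signed conversion assumes two's-complement behavior. The real instructions apply this to every word lane in parallel.

```c
#include <stdint.h>

/* PMULHW-style lane: signed 16x16 multiply, keep bits 31:16. */
static uint16_t mulhi_s16(uint16_t a, uint16_t b)
{
    int32_t product = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (uint16_t)((uint32_t)product >> 16);
}

/* PMULHUW-style lane: unsigned 16x16 multiply, keep bits 31:16. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(product >> 16);
}

/* PMULLW-style lane: keep bits 15:0. The low half is the same whether
 * the operands are treated as signed or unsigned. */
static uint16_t mullo_16(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    return (uint16_t)product;
}
```

With the operands from the PMULHW example, 71C7H times 8000H gives a high half of C71CH when signed and 38E3H when unsigned, while the low half is 8000H in both cases.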
POR mm, mm/m64 Bitwise Logical OR:
Perform an OR operation on the 64-bit data in the MMX register/memory unit, and store the result in the MMX register.
PSLLW mm, mm/m64
PSLLD mm, mm/m64
PSLLQ mm, mm/m64
PSLLW mm, imm8
PSLLD mm, imm8
PSLLQ mm, imm8
Logical Left Shift of Group Data:
Left shift the word (double word, quad word) data in the MMX register by the number of bits specified by the MMX register/memory unit, and 0s are shifted into the lower bits.
Left shift the word (double word, quad word) data in the MMX register by the number of bits specified by the 8-bit immediate number, and 0s are shifted into the lower bits.
PSRAW mm, mm/m64
PSRAD mm, mm/m64
PSRAW mm, imm8
PSRAD mm, imm8 Arithmetic Right Shift of Group Data:
Right shift the word (double word) data in the MMX register by the number of bits specified by the MMX register/memory unit, and the sign bit is maintained during the shift.
Right shift the word (double word) data in the MMX register by the number of bits specified by the 8-bit immediate number, and the sign bit is maintained during the shift.
PSRLW mm, mm/m64
PSRLD mm, mm/m64
PSRLQ mm, mm/m64
PSRLW mm, imm8
PSRLD mm, imm8
PSRLQ mm, imm8 Logical Right Shift of Group Data:
Shifts each word (doubleword, quadword) in the MMX register right by the bit count given in the MMX register/memory operand; the vacated high-order bits are filled with 0s.
Shifts each word (doubleword, quadword) in the MMX register right by the bit count given in the 8-bit immediate; the vacated high-order bits are filled with 0s.
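The three shift families differ in what fills the vacated bits and in how an oversized count is handled. The C sketch below models one 16-bit lane; the names `psllw_lane`, `psrlw_lane`, and `psraw_lane` are hypothetical. It relies on the documented behavior that a logical shift count above 15 clears a word lane, while an arithmetic shift count above 15 fills the lane with copies of the sign bit.

```c
#include <stdint.h>

/* PSLLW-style lane: logical left shift, zeros enter from the right. */
static uint16_t psllw_lane(uint16_t x, unsigned count)
{
    return count > 15 ? 0 : (uint16_t)((uint32_t)x << count);
}

/* PSRLW-style lane: logical right shift, zeros enter from the left. */
static uint16_t psrlw_lane(uint16_t x, unsigned count)
{
    return count > 15 ? 0 : (uint16_t)(x >> count);
}

/* PSRAW-style lane: arithmetic right shift, the sign bit is replicated.
 * Written without shifting a negative value so the result does not
 * depend on implementation-defined behavior. */
static uint16_t psraw_lane(uint16_t x, unsigned count)
{
    if (count > 15) count = 15;
    uint16_t result = (uint16_t)(x >> count);
    if (x & 0x8000u) {
        result |= (uint16_t)~(0xFFFFu >> count); /* set the vacated bits */
    }
    return result;
}
```

Arithmetic shifts suit signed samples, since they divide by a power of two while preserving the sign; logical shifts suit unsigned fields and bit extraction.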
PSUBB mm, mm/m64
PSUBW mm, mm/m64
PSUBD mm, mm/m64 Wrapping Mode Group Data Subtraction:
Subtract the byte (word, double word) groups in the MMX register/memory unit from the MMX register by byte (word, double word). (Note 1)
PSUBSB mm, mm/m64
PSUBSW mm, mm/m64
Signed Saturation Group Data Subtraction:
Subtracts the packed signed bytes (words) in the MMX register/memory operand from the packed signed bytes (words) in the MMX register, using signed saturation. (Note 1)
PSUBUSB mm, mm/m64
PSUBUSW mm, mm/m64 Unsigned Saturation Group Data Subtraction:
Subtracts the packed unsigned bytes (words) in the MMX register/memory operand from the packed unsigned bytes (words) in the MMX register, using unsigned saturation. (Note 1)
PUNPCKHBW mm, mm/m64
PUNPCKHWD mm, mm/m64
PUNPCKHDQ mm, mm/m64 High Order Group Data Unpacking:
These instructions interleave the data elements from the high half of the destination operand with those from the high half of the source operand and write the result to the destination operand. The elements in the low halves of both operands are ignored.
PUNPCKLBW mm, mm/m64
PUNPCKLWD mm, mm/m64
PUNPCKLDQ mm, mm/m64 Low Order Group Data Unpacking:
These instructions interleave the data elements from the low half of the destination operand with those from the low half of the source operand and write the result to the destination operand. The elements in the high halves of both operands are ignored. (See the previous instruction.)
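The interleaving performed by the unpack instructions can be sketched with a scalar reference model in C. The names `punpcklbw_model` and `punpckhbw_model` are hypothetical, and a 64-bit operand is passed as a `uint64_t` with byte 0 in the least significant position. In the result, each destination byte occupies the even byte position and the matching source byte the odd position.

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKLBW: interleave the four low bytes of
 * dest and src. Result bytes, from lowest to highest, are
 * d0 s0 d1 s1 d2 s2 d3 s3. */
static uint64_t punpcklbw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = (dest >> (8 * i)) & 0xFFu;
        uint64_t s = (src >> (8 * i)) & 0xFFu;
        result |= d << (16 * i);
        result |= s << (16 * i + 8);
    }
    return result;
}

/* Hypothetical model of PUNPCKHBW: the same interleave applied to the
 * four high bytes of each operand. */
static uint64_t punpckhbw_model(uint64_t dest, uint64_t src)
{
    return punpcklbw_model(dest >> 32, src >> 32);
}
```

A common use is zero extension: unpacking against a source of all zeros widens each unsigned byte to a word, so the low bytes AABBCCDDH become 00AA00BB00CC00DDH.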
PXOR mm, mm/m64 Bitwise Logical XOR:
Perform an XOR operation on the 64-bit data in the MMX register/memory unit, and store the result in the MMX register.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Supplementary:
PMULHUW - Packed Unsigned Multiplication Take High Order
Opcode | Instruction | Description
0F E4 /r | PMULHUW mm1, mm2/m64 | Multiplies the packed unsigned word integers in mm1 and mm2/m64 and stores the high 16 bits of each result in mm1.
66 0F E4 /r | PMULHUW xmm1, xmm2/m128 | Multiplies the packed unsigned word integers in xmm1 and xmm2/m128 and stores the high 16 bits of each result in xmm1.
Description
Perform SIMD multiplication on the packed unsigned word integers in the destination operand (the first operand) and the source operand (the second operand), and store the high 16 bits of each 32-bit intermediate result into the destination operand. (Figure 3-7 shows the situation when using 64-bit operands for this operation). The source operand can be an MMX™ technology register or a 64-bit memory location, or an XMM register or a 128-bit memory location. The destination operand can be an MMX or XMM register.
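Figure 3-7. PMULHUW and PMULHW instruction operation
Operation
PMULHUW instruction with 64-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Unsigned multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
DEST[15:0] ← TEMP0[31:16];
DEST[31:16] ← TEMP1[31:16];
DEST[47:32] ← TEMP2[31:16];
DEST[63:48] ← TEMP3[31:16];
PMULHUW instruction with 128-bit operands:
TEMP0[31:0] ← DEST[15:0] * SRC[15:0]; (* Unsigned multiplication *)
TEMP1[31:0] ← DEST[31:16] * SRC[31:16];
TEMP2[31:0] ← DEST[47:32] * SRC[47:32];
TEMP3[31:0] ← DEST[63:48] * SRC[63:48];
TEMP4[31:0] ← DEST[79:64] * SRC[79:64];
TEMP5[31:0] ← DEST[95:80] * SRC[95:80];
TEMP6[31:0] ← DEST[111:96] * SRC[111:96];
TEMP7[31:0] ← DEST[127:112] * SRC[127:112];
DEST[15:0] ← TEMP0[31:16];
DEST[31:16] ← TEMP1[31:16];
DEST[47:32] ← TEMP2[31:16];
DEST[63:48] ← TEMP3[31:16];
DEST[79:64] ← TEMP4[31:16];
DEST[95:80] ← TEMP5[31:16];
DEST[111:96] ← TEMP6[31:16];
DEST[127:112] ← TEMP7[31:16];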
Intel® C++ Compiler Equivalent Intrinsics
PMULHUW __m64 _mm_mulhi_pu16(__m64 a, __m64 b)
PMULHUW __m128i _mm_mulhi_epu16 ( __m128i a, __m128i b)
Affected Flags
None.
Protected Mode Exceptions
#GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 128-bit operations only: if a memory operand is not aligned on a 16-byte boundary, regardless of segment.
#SS(0) - If a memory operand effective address is outside the SS segment limit.
#UD - If EM in CR0 is set. 128-bit operations only: if OSFXSR in CR4 is 0, or if the CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set.
#MF (64-bit operations only) - If there is a pending x87 FPU exception.
#PF(fault-code) - If a page fault occurs.
#AC(0) (64-bit operations only) - If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real Address Mode Exceptions
#GP(0) - 128-bit operations only: if a memory operand is not aligned on a 16-byte boundary, regardless of segment. If any part of the operand lies outside the effective address space from 0 to FFFFH.
#UD - If EM in CR0 is set. 128-bit operations only: if OSFXSR in CR4 is 0, or if the CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set.
#MF (64-bit operations only) - If there is a pending x87 FPU exception.
Virtual 8086 Mode Exceptions
Same exceptions as in real address mode, plus:
#PF(fault-code) - If a page fault occurs.
#AC(0) (64-bit operations only) - If alignment checking is enabled and an unaligned memory reference is made.
Numerical Exceptions
None.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PACKUSWB - Unsigned Saturation Packing
Opcode | Instruction | Description
0F 67 /r | PACKUSWB mm, mm/m64 | Packs the 4 signed words in mm and the 4 signed words in mm/m64 into 8 unsigned bytes in mm, using unsigned saturation.
66 0F 67 /r | PACKUSWB xmm1, xmm2/m128 | Packs the signed words in xmm1 and xmm2/m128 into unsigned bytes in xmm1, using unsigned saturation.
Description
Packs the 4 signed words in the destination operand (first operand) and the 4 signed words in the source operand (second operand) into 8 unsigned bytes using unsigned saturation, and stores the result in the destination operand (see Figure 3-5). If the signed value of a word is outside the range of an unsigned byte (greater than FFH or less than 00H), the saturated byte value FFH or 00H, respectively, is stored in the destination.
For the 64-bit form, the destination operand must be an MMX technology register; the source operand can be an MMX register or a quadword memory location.
The 128-bit form packs the eight signed words in the source operand xmm2/m128 and the eight signed words in the destination operand xmm1 into sixteen unsigned bytes in the destination register xmm1. A word whose signed value is outside the range of an unsigned byte is saturated (FFH on overflow, 00H on underflow). The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory operand.
Figure 3-5. Operation of PACKUSWB Instruction
Operation
PACKUSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToUnsignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToUnsignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToUnsignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToUnsignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToUnsignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToUnsignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToUnsignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToUnsignedByte SRC[63..48];
PACKUSWB instruction with 128-bit operands:
DEST[7..0] ← SaturateSignedWordToUnsignedByte (DEST[15..0]);
DEST[15..8] ← SaturateSignedWordToUnsignedByte (DEST[31..16]);
DEST[23..16] ← SaturateSignedWordToUnsignedByte (DEST[47..32]);
DEST[31..24] ← SaturateSignedWordToUnsignedByte (DEST[63..48]);
DEST[39..32] ← SaturateSignedWordToUnsignedByte (DEST[79..64]);
DEST[47..40] ← SaturateSignedWordToUnsignedByte (DEST[95..80]);
DEST[55..48] ← SaturateSignedWordToUnsignedByte (DEST[111..96]);
DEST[63..56] ← SaturateSignedWordToUnsignedByte (DEST[127..112]);
DEST[71..64] ← SaturateSignedWordToUnsignedByte (SRC[15..0]);
DEST[79..72] ← SaturateSignedWordToUnsignedByte (SRC[31..16]);
DEST[87..80] ← SaturateSignedWordToUnsignedByte (SRC[47..32]);
DEST[95..88] ← SaturateSignedWordToUnsignedByte (SRC[63..48]);
DEST[103..96] ← SaturateSignedWordToUnsignedByte (SRC[79..64]);
DEST[111..104] ← SaturateSignedWordToUnsignedByte (SRC[95..80]);
DEST[119..112] ← SaturateSignedWordToUnsignedByte (SRC[111..96]);
DEST[127..120] ← SaturateSignedWordToUnsignedByte (SRC[127..112]);
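The packing order and unsigned saturation described above can be checked with a scalar model in portable C. This is only a sketch for the 64-bit form: the function names are hypothetical, the operands are passed as arrays with element 0 as the least significant word, and no MMX instruction is actually executed.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed 16-bit word to the unsigned byte range 00H..FFH. */
uint8_t saturate_word_to_ubyte(int16_t w)
{
    if (w < 0)   return 0x00;   /* negative values saturate to 00H */
    if (w > 255) return 0xFF;   /* values above FFH saturate to FFH */
    return (uint8_t)w;
}

/* Scalar model of 64-bit PACKUSWB (hypothetical helper, not an intrinsic).
   dest[0..3] and src[0..3] are the four words of each operand, element 0
   being the least significant. The low four result bytes come from dest,
   the high four from src. */
void packuswb64_model(const int16_t dest[4], const int16_t src[4],
                      uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = saturate_word_to_ubyte(dest[i]);
        out[i + 4] = saturate_word_to_ubyte(src[i]);
    }
}

/* Self-check covering in-range values and both saturation limits. */
void packuswb64_demo(void)
{
    const int16_t dest[4] = { 0, 100, 255, 256 };
    const int16_t src[4]  = { -1, -32768, 32767, 200 };
    uint8_t out[8];

    packuswb64_model(dest, src, out);

    assert(out[0] == 0 && out[1] == 100 && out[2] == 255 && out[3] == 255);
    assert(out[4] == 0 && out[5] == 0 && out[6] == 255 && out[7] == 200);
}
```

The 128-bit form follows the same pattern with eight words per operand and sixteen result bytes.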
Intel® C++ Compiler Equivalent Intrinsics
__m64 _mm_packs_pu16(__m64 m1, __m64 m2)
Affected Flags
None.
Protected Mode Exceptions
#GP(0) - If the effective address of the memory operand is beyond the segment limit of CS, DS, ES, FS, or GS. (Only for 128-bit operations) If the memory operand is not aligned to a 16-byte boundary, regardless of segment.
#SS(0) - If the effective address of the memory operand is beyond the segment limit of SS.
#UD - If EM in CR0 is set to 1. (Only for 128-bit operations) If OSFXSR in CR4 is 0. (Only for 128-bit operations) If CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set to 1.
#MF (Only for 64-bit operations) - If there is a pending x87 FPU exception.
#PF(error code) - If a page fault occurs.
#AC(0) (Only for 64-bit operations) - If alignment checking is enabled and an unaligned memory reference is made when the current privilege level is 3.
Real Address Mode Exceptions
#GP(0) - (Only for 128-bit operations) If the memory operand is not aligned to a 16-byte boundary, regardless of segment. If any part of the operand is outside the valid address space from 0 to FFFFH.
#UD - If EM in CR0 is set to 1. (Only for 128-bit operations) If OSFXSR in CR4 is 0. (Only for 128-bit operations) If CPUID feature flag SSE2 is 0.
#NM - If TS in CR0 is set to 1.
#MF (Only for 64-bit operations) - If there is a pending x87 FPU exception.
Virtual 8086 Mode Exceptions
Same as exceptions in "Real Address Mode".
#PF(error code) - Page fault.
#AC(0) (Only for 64-bit operations) - If an unaligned memory reference is made when alignment checking is enabled.
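Because the 128-bit forms raise #GP(0) when a memory operand is not on a 16-byte boundary, it can help to verify alignment before passing a buffer to such code. The following is a minimal sketch in C11; `is_aligned16` and `demo_operand` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 when p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* A buffer that the compiler is asked to place on a 16-byte boundary,
   as a 128-bit memory operand would require. */
alignas(16) uint8_t demo_operand[32];

/* Self-check: the buffer start and start+16 are aligned, start+1 is not. */
void alignment_demo(void)
{
    assert(is_aligned16(demo_operand));
    assert(!is_aligned16(demo_operand + 1));
    assert(is_aligned16(demo_operand + 16));
}
```

The 64-bit MMX forms have no such 16-byte requirement; for them only the #AC(0) alignment check listed above applies.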
---------------------------------------------------------------------------------------------------------------------------------------
PACKSSWB/PACKSSDW - Signed Saturation Packing
Opcode
Instruction
Description
0F 63 /r
PACKSSWB mm1, mm2/m64
Pack 4 signed words in mm1 and 4 signed words in mm2/m64 into 8 signed byte integers using saturation operation, and the result is put into mm1.
66 0F 63 /r
PACKSSWB xmm1, xmm2/m128
Pack 8 signed words in xmm1 and 8 signed words in xmm2/m128 into 16 signed byte integers using saturation operation, and the result is put into xmm1.
0F 6B /r
PACKSSDW mm1, mm2/m64
Pack 2 signed double word integers in mm1 and 2 signed double word integers in mm2/m64 into 4 signed word integers using saturation operation, and the result is put into mm1.
66 0F 6B /r
PACKSSDW xmm1, xmm2/m128
Pack 4 signed double word integers in xmm1 and 4 signed double word integers in xmm2/m128 into 8 signed word integers using saturation operation, and the result is put into xmm1.
Description
Packs signed word integers into signed byte integers (PACKSSWB), or signed double word integers into signed word integers (PACKSSDW), using signed saturation. In its 64-bit form, the PACKSSWB instruction packs 4 signed words in the destination operand (the first operand) and 4 signed words in the source operand (the second operand) into 8 signed bytes, and stores the result in the destination operand. If the signed value of a word is outside the range of a signed byte (i.e., greater than 7FH or less than 80H), the saturated byte value 7FH or 80H, respectively, is stored in the destination operand.
In its 64-bit form, the PACKSSDW instruction packs 2 signed double words in the destination operand (the first operand) and 2 signed double words in the source operand (the second operand) into 4 signed words, and stores the result in the destination operand (refer to "Figure 3-4"). If the signed value of a double word is outside the range of a signed word (i.e., greater than 7FFFH or less than 8000H), the saturated word value 7FFFH or 8000H, respectively, is stored in the destination operand.
In the 64-bit forms, the destination operand of the PACKSSWB and PACKSSDW instructions must be an MMX™ technology register; the source operand can be an MMX register or a quad-word memory location.
The 128-bit forms pack signed data elements from the destination operand and the source operand using signed saturation, and write the result into the destination operand. The destination operand is an XMM register; the source operand can be an XMM register or a 128-bit memory operand.
In its 128-bit form, the PACKSSWB instruction packs 8 signed words in the destination operand and 8 signed words in the source operand into 16 signed bytes, and stores the result in the destination operand. A word whose signed value lies outside the range of a signed byte is saturated (7FH for overflow, 80H for underflow).
In its 128-bit form, the PACKSSDW instruction packs 4 signed double words in the destination operand and 4 signed double words in the source operand into 8 signed words, and stores the result in the destination register. A double word whose signed value lies outside the range of a signed word is saturated (7FFFH for overflow, 8000H for underflow).
Figure 3-4. Operation of PACKSSDW Instruction
Operation
PACKSSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToSignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToSignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToSignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToSignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToSignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToSignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToSignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToSignedByte SRC[63..48];
PACKSSDW instruction with 64-bit operands:
DEST[15..0] ← SaturateSignedDoublewordToSignedWord DEST[31..0];
DEST[31..16] ← SaturateSignedDoublewordToSignedWord DEST[63..32];
DEST[47..32] ← SaturateSignedDoublewordToSignedWord SRC[31..0];
DEST[63..48] ← SaturateSignedDoublewordToSignedWord SRC[63..32];
PACKSSWB instruction with 128-bit operands:
DEST[7..0] ← SaturateSignedWordToSignedByte (DEST[15..0]);
DEST[15..8] ← SaturateSignedWordToSignedByte (DEST[31..16]);
DEST[23..16] ← SaturateSignedWordToSignedByte (DEST[47..32]);
DEST[31..24] ← SaturateSignedWordToSignedByte (DEST[63..48]);
DEST[39..32] ← SaturateSignedWordToSignedByte (DEST[79..64]);
DEST[47..40] ← SaturateSignedWordToSignedByte (DEST[95..80]);
DEST[55..48] ← SaturateSignedWordToSignedByte (DEST[111..96]);
DEST[63..56] ← SaturateSignedWordToSignedByte (DEST[127..112]);
DEST[71..64] ← SaturateSignedWordToSignedByte (SRC[15..0]);
DEST[79..72] ← SaturateSignedWordToSignedByte (SRC[31..16]);
DEST[87..80] ← SaturateSignedWordToSignedByte (SRC[47..32]);
DEST[95..88] ← SaturateSignedWordToSignedByte (SRC[63..48]);
DEST[103..96] ← SaturateSignedWordToSignedByte (SRC[79..64]);
DEST[111..104] ← SaturateSignedWordToSignedByte (SRC[95..80]);
DEST[119..112] ← SaturateSignedWordToSignedByte (SRC[111..96]);
DEST[127..120] ← SaturateSignedWordToSignedByte (SRC[127..112]);
PACKSSDW instruction with 128-bit operands:
DEST[15..0] ← SaturateSignedDwordToSignedWord (DEST[31..0]);
DEST[31..16] ← SaturateSignedDwordToSignedWord (DEST[63..32]);
DEST[47..32] ← SaturateSignedDwordToSignedWord (DEST[95..64]);
DEST[63..48] ← SaturateSignedDwordToSignedWord (DEST[127..96]);
DEST[79..64] ← SaturateSignedDwordToSignedWord (SRC[31..0]);
DEST[95..80] ← SaturateSignedDwordToSignedWord (SRC[63..32]);
DEST[111..96] ← SaturateSignedDwordToSignedWord (SRC[95..64]);
DEST[127..112] ← SaturateSignedDwordToSignedWord (SRC[127..96]);
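The signed saturation limits (7FH/80H for bytes, 7FFFH/8000H for words) and the element order can be illustrated with a scalar model of the 64-bit forms. This is a sketch in portable C under the assumption that element 0 of each array is the least significant element; all function names are hypothetical and no MMX instruction is executed.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed word to the signed byte range 80H..7FH (-128..127). */
int8_t saturate_word_to_sbyte(int16_t w)
{
    if (w > INT8_MAX) return INT8_MAX;    /* overflow  -> 7FH */
    if (w < INT8_MIN) return INT8_MIN;    /* underflow -> 80H */
    return (int8_t)w;
}

/* Clamp a signed double word to the signed word range 8000H..7FFFH. */
int16_t saturate_dword_to_sword(int32_t d)
{
    if (d > INT16_MAX) return INT16_MAX;  /* overflow  -> 7FFFH */
    if (d < INT16_MIN) return INT16_MIN;  /* underflow -> 8000H */
    return (int16_t)d;
}

/* Scalar model of 64-bit PACKSSWB: low 4 bytes from dest, high 4 from src. */
void packsswb64_model(const int16_t dest[4], const int16_t src[4],
                      int8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = saturate_word_to_sbyte(dest[i]);
        out[i + 4] = saturate_word_to_sbyte(src[i]);
    }
}

/* Scalar model of 64-bit PACKSSDW: low 2 words from dest, high 2 from src. */
void packssdw64_model(const int32_t dest[2], const int32_t src[2],
                      int16_t out[4])
{
    for (int i = 0; i < 2; i++) {
        out[i]     = saturate_dword_to_sword(dest[i]);
        out[i + 2] = saturate_dword_to_sword(src[i]);
    }
}

/* Self-check exercising both saturation limits of each instruction. */
void packss_demo(void)
{
    const int16_t wd[4] = { 0, 127, 128, -128 };
    const int16_t ws[4] = { -129, 32767, -32768, -1 };
    int8_t bytes[8];
    packsswb64_model(wd, ws, bytes);
    assert(bytes[2] == 127 && bytes[3] == -128);
    assert(bytes[4] == -128 && bytes[5] == 127 && bytes[7] == -1);

    const int32_t dd[2] = { 32767, 32768 };
    const int32_t ds[2] = { -32768, -32769 };
    int16_t words[4];
    packssdw64_model(dd, ds, words);
    assert(words[0] == 32767 && words[1] == 32767);
    assert(words[2] == -32768 && words[3] == -32768);
}
```

The 128-bit forms differ only in element count: eight words per operand for PACKSSWB and four double words per operand for PACKSSDW.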
Intel® C++ Compiler Equivalent Intrinsics
__m64 _mm_packs_pi16(__m64 m1, __m64 m2)
__m64 _mm_packs_pi32(__m64 m1, __m64 m2)
Affected Flags
None.
Protected Mode Exceptions
#GP(0) - If the effective address of the memory operand is beyond the segment limit of CS, DS, ES, FS, or GS. (Only for 128-bit operations) If the memory operand is not aligned to a 16-byte boundary, regardless of segment.