Discussion:
x86: Swap destination/source to encode VEX only if possible
H.J. Lu
2018-09-13 13:19:06 UTC
Various moves come in load and store forms, and just like on the GPR
and FPU sides there would better be only one pattern. In some cases this
is not feasible because the opcodes are too different, but quite a few
cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD and
Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse
operand handling there simply needs to match "straight" operand one),
and fix a long-standing, but so far only latent, bug about when to zap
found_reverse_match.
Also once again drop IgnoreSize where pointlessly applied to templates
touched anyway, as well as *word when redundant with Reg*.
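As an illustration of the fold (an example of mine, not taken from the
patch; the register choices are arbitrary): with register operands the
"load" and "store" opcodes of such a move encode the same operation, so a
single template carrying D covers both, and a particular direction can
still be forced with the {load}/{store} pseudo prefixes:

	{load}  movaps %xmm1, %xmm2	# 0F 28 /r: %xmm2 in ModRM.reg, %xmm1 in ModRM.rm
	{store} movaps %xmm1, %xmm2	# 0F 29 /r: %xmm1 in ModRM.reg, %xmm2 in ModRM.rm
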
gas/
* config/tc-i386.c (operand_size_match): Mirror
.reg/.regsimd/.acc handling from forward to reverse case.
(build_vex_prefix): Check first and last operand types are equal
and also consider .d for swapping operands for VEX2 encoding.
(match_template): Clear found_reverse_match on every iteration.
Use Opcode_SIMD_FloatD and Opcode_SIMD_IntD.
* testsuite/gas/i386/pseudos.s,
testsuite/gas/i386/x86-64-pseudos.s: Add kmov* tests.
* testsuite/gas/i386/pseudos.d,
testsuite/gas/i386/x86-64-pseudos.d: Adjust expectations.
opcodes/
* i386-opc.tbl (bndmov, kmovb, kmovd, kmovq, kmovw, movapd,
movaps, movd, movdqa, movdqu, movhpd, movhps, movlpd, movlps,
movq, movsd, movss, movupd, movups, vmovapd, vmovaps, vmovd,
vmovdqa, vmovdqa32, vmovdqa64, vmovdqu, vmovdqu16, vmovdqu32,
Fold load and store templates where possible, adding D. Drop
IgnoreSize where it was pointlessly present. Drop redundant
*word.
* i386-tbl.h: Re-generate.
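The VEX2 point for build_vex_prefix() above can be seen with an extended
register as the source (a sketch, assuming 64-bit mode; the operands are my
choice, not from the testsuite): taking the other direction of a D template
moves the extended register out of ModRM.rm, which would require VEX.B and
hence the 3-byte 0xC4 prefix, into ModRM.reg, whose VEX.R the 2-byte 0xC5
prefix can still express:

	vmovaps %xmm8, %xmm1	# load  form 0F 28: %xmm8 in ModRM.rm  -> needs VEX.B -> 3-byte C4
				# store form 0F 29: %xmm8 in ModRM.reg -> only VEX.R  -> 2-byte C5

The {vex3} and {load}/{store} pseudo prefixes exercised by pseudos.s can
still be used to force the longer prefix or a particular direction.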
On Linux/x86-64, this caused
FAIL: i386 arch 10
FAIL: i386 arch 10 (lzcnt)
FAIL: i386 arch 10 (prefetchw)
FAIL: i386 arch 10 (bdver1)
FAIL: i386 arch 10 (bdver2)
FAIL: i386 arch 10 (bdver3)
FAIL: i386 arch 10 (bdver4)
FAIL: i386 arch 10 (btver1)
FAIL: i386 arch 10 (btver2)
FAIL: i386 noavx-1
FAIL: i386 noavx-3
FAIL: i386 AVX
FAIL: i386 AVX (Intel disassembly)
FAIL: x86-64 arch 2
FAIL: x86-64 arch 2 (lzcnt)
FAIL: x86-64 arch 2 (prefetchw)
FAIL: x86-64 arch 2 (bdver1)
FAIL: x86-64 arch 2 (bdver2)
FAIL: x86-64 arch 2 (bdver3)
FAIL: x86-64 arch 2 (bdver4)
FAIL: x86-64 arch 2 (btver1)
FAIL: x86-64 arch 2 (btver2)
FAIL: x86-64 AVX
FAIL: x86-64 AVX (Intel mode)
Can you fix it today?
as fails to assemble

vzeroall

I checked in the following patch to fix it.
--
H.J.
Jan Beulich
2018-09-13 13:33:47 UTC
Post by H.J. Lu
On Linux/x86-64, this caused
[...]
Can you fix it today?
as fails to assemble
vzeroall
Interesting. I have absolutely no idea why it looks to have worked for
me.
Post by H.J. Lu
I checked in the following patch to fix it.
Thanks a lot!

Jan
