[X86] Make v2i1 and v4i1 legal types without VLX

Summary:
There are a few oddities that occur because v1i1, v8i1, and v16i1 are legal types while v2i1 and v4i1 are not when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument of these operations during type legalization, then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 so we can use a 512-bit operation. Since we need to fill the upper bits of the mask with zeros, we have to do the fill at the promoted type.
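
For reference, here is a minimal standalone sketch (hypothetical, not taken from this patch's tests) of the kind of operation affected: a masked load whose <4 x i1> mask previously went through the promote/widen/truncate dance described above.

; Hypothetical reduced case; compile with e.g.
;   llc -mtriple=x86_64-unknown-unknown -mattr=+avx512f
define <4 x i32> @masked_load_v4i32(<4 x i32>* %p, <4 x i1> %mask) {
  %r = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %mask, <4 x i32> zeroinitializer)
  ret <4 x i32> %r
}
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)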

It would be better if v2i1/v4i1 were legal types so they don't undergo any promotion; then we can widen with zeros directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.

This also fixes an issue where we couldn't properly implement a masked vextractf32x4 from zmm to xmm.
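
One plausible IR shape for that case (hypothetical example; the extract+select idiom and the function name are illustrative only):

define <4 x float> @masked_extract_v4f32(<16 x float> %a, <4 x float> %passthru, i8 %m) {
  ; Take the low 128 bits of the 512-bit vector.
  %e = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ; Use the low 4 bits of %m as the write mask.
  %mb = bitcast i8 %m to <8 x i1>
  %mask = shufflevector <8 x i1> %mb, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %r = select <4 x i1> %mask, <4 x float> %e, <4 x float> %passthru
  ret <4 x float> %r
}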

We now have to support widening more compares to 512 bits to get a mask result out, so new tablegen patterns were added.
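
For example, a compare like this hypothetical reduced case now yields a v4i1 mask directly; without VLX it is selected as a 512-bit compare whose 128-bit operands are inserted into zmm registers:

define <4 x i1> @cmp_v4i32(<4 x i32> %a, <4 x i32> %b) {
  %c = icmp sgt <4 x i32> %a, %b
  ret <4 x i1> %c
}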

I had to hack the legalizer's widening of setcc operands a bit so it doesn't create a setcc returning v4i32, extract from that, and then try to promote the extracted value to v2i1 with a sign extend. Now, if the original setcc's result type is v2i1, we create the setcc with a v4i1 result, extract from that, and don't sign extend at all.
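
A minimal case that exercises this path (hypothetical example): the v2f32 operands get widened to v4f32, the setcc is created with a v4i1 result, and the v2i1 value is extracted from it.

define <2 x i1> @cmp_v2f32(<2 x float> %a, <2 x float> %b) {
  %c = fcmp olt <2 x float> %a, %b
  ret <2 x i1> %c
}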

There's definitely room for improvement with some follow-up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967
Craig Topper 2018-01-07 18:20:37 +00:00
parent 464de6ca09
commit d58c165545
23 changed files with 4685 additions and 8227 deletions


@ -3616,6 +3616,7 @@ SDValue DAGTypeLegalizer::WidenVecOp_SETCC(SDNode *N) {
SDValue InOp0 = GetWidenedVector(N->getOperand(0));
SDValue InOp1 = GetWidenedVector(N->getOperand(1));
SDLoc dl(N);
EVT VT = N->getValueType(0);
// WARNING: In this code we widen the compare instruction with garbage.
// This garbage may contain denormal floats which may be slow. Is this a real
@ -3625,18 +3626,23 @@ SDValue DAGTypeLegalizer::WidenVecOp_SETCC(SDNode *N) {
// Only some of the compared elements are legal.
EVT SVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
InOp0.getValueType());
// The result type is legal; if it's vXi1, keep vXi1 for the new SETCC.
if (VT.getScalarType() == MVT::i1)
SVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1,
SVT.getVectorNumElements());
SDValue WideSETCC = DAG.getNode(ISD::SETCC, SDLoc(N),
SVT, InOp0, InOp1, N->getOperand(2));
SVT, InOp0, InOp1, N->getOperand(2));
// Extract the needed results from the result vector.
EVT ResVT = EVT::getVectorVT(*DAG.getContext(),
SVT.getVectorElementType(),
N->getValueType(0).getVectorNumElements());
VT.getVectorNumElements());
SDValue CC = DAG.getNode(
ISD::EXTRACT_SUBVECTOR, dl, ResVT, WideSETCC,
DAG.getConstant(0, dl, TLI.getVectorIdxTy(DAG.getDataLayout())));
return PromoteTargetBoolean(CC, N->getValueType(0));
return PromoteTargetBoolean(CC, VT);
}


@ -460,7 +460,7 @@ static bool isLegalMaskCompare(SDNode *N, const X86Subtarget *Subtarget) {
// this happens we will use 512-bit operations and the mask will not be
// zero extended.
EVT OpVT = N->getOperand(0).getValueType();
if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32)
if (OpVT.is256BitVector() || OpVT.is128BitVector())
return Subtarget->hasVLX();
return true;


@ -1144,6 +1144,8 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
addRegisterClass(MVT::v8f64, &X86::VR512RegClass);
addRegisterClass(MVT::v1i1, &X86::VK1RegClass);
addRegisterClass(MVT::v2i1, &X86::VK2RegClass);
addRegisterClass(MVT::v4i1, &X86::VK4RegClass);
addRegisterClass(MVT::v8i1, &X86::VK8RegClass);
addRegisterClass(MVT::v16i1, &X86::VK16RegClass);
@ -1171,15 +1173,14 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
setOperationAction(ISD::FP_TO_UINT, MVT::v2i1, Custom);
}
// Extends of v16i1/v8i1 to 128-bit vectors.
setOperationAction(ISD::SIGN_EXTEND, MVT::v16i8, Custom);
setOperationAction(ISD::ZERO_EXTEND, MVT::v16i8, Custom);
setOperationAction(ISD::ANY_EXTEND, MVT::v16i8, Custom);
setOperationAction(ISD::SIGN_EXTEND, MVT::v8i16, Custom);
setOperationAction(ISD::ZERO_EXTEND, MVT::v8i16, Custom);
setOperationAction(ISD::ANY_EXTEND, MVT::v8i16, Custom);
// Extends of v16i1/v8i1/v4i1/v2i1 to 128-bit vectors.
for (auto VT : { MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64 }) {
setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
setOperationAction(ISD::ANY_EXTEND, VT, Custom);
}
for (auto VT : { MVT::v8i1, MVT::v16i1 }) {
for (auto VT : { MVT::v2i1, MVT::v4i1, MVT::v8i1, MVT::v16i1 }) {
setOperationAction(ISD::ADD, VT, Custom);
setOperationAction(ISD::SUB, VT, Custom);
setOperationAction(ISD::MUL, VT, Custom);
@ -1195,9 +1196,12 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
}
setOperationAction(ISD::CONCAT_VECTORS, MVT::v16i1, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v8i1, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v4i1, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v4i1, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v8i1, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v16i1, Custom);
for (auto VT : { MVT::v1i1, MVT::v8i1 })
for (auto VT : { MVT::v1i1, MVT::v2i1, MVT::v4i1, MVT::v8i1 })
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
for (MVT VT : MVT::fp_vector_valuetypes())
@ -1528,41 +1532,6 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
}
if (!Subtarget.useSoftFloat() && Subtarget.hasVLX()) {
addRegisterClass(MVT::v4i1, &X86::VK4RegClass);
addRegisterClass(MVT::v2i1, &X86::VK2RegClass);
for (auto VT : { MVT::v2i1, MVT::v4i1 }) {
setOperationAction(ISD::ADD, VT, Custom);
setOperationAction(ISD::SUB, VT, Custom);
setOperationAction(ISD::MUL, VT, Custom);
setOperationAction(ISD::VSELECT, VT, Expand);
setOperationAction(ISD::TRUNCATE, VT, Custom);
setOperationAction(ISD::SETCC, VT, Custom);
setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
}
// TODO: v8i1 concat should be legal without VLX to support concats of
// v1i1, but we won't legalize it correctly currently without introducing
// a v4i1 concat in the middle.
setOperationAction(ISD::CONCAT_VECTORS, MVT::v8i1, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v4i1, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v4i1, Custom);
for (auto VT : { MVT::v2i1, MVT::v4i1 })
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
// Extends from v2i1/v4i1 masks to 128-bit vectors.
setOperationAction(ISD::ZERO_EXTEND, MVT::v4i32, Custom);
setOperationAction(ISD::ZERO_EXTEND, MVT::v2i64, Custom);
setOperationAction(ISD::SIGN_EXTEND, MVT::v4i32, Custom);
setOperationAction(ISD::SIGN_EXTEND, MVT::v2i64, Custom);
setOperationAction(ISD::ANY_EXTEND, MVT::v4i32, Custom);
setOperationAction(ISD::ANY_EXTEND, MVT::v2i64, Custom);
setTruncStoreAction(MVT::v4i64, MVT::v4i8, Legal);
setTruncStoreAction(MVT::v4i64, MVT::v4i16, Legal);
setTruncStoreAction(MVT::v4i64, MVT::v4i32, Legal);
@ -4945,8 +4914,6 @@ static SDValue getZeroVector(MVT VT, const X86Subtarget &Subtarget,
} else if (VT.getVectorElementType() == MVT::i1) {
assert((Subtarget.hasBWI() || VT.getVectorNumElements() <= 16) &&
"Unexpected vector type");
assert((Subtarget.hasVLX() || VT.getVectorNumElements() >= 8) &&
"Unexpected vector type");
Vec = DAG.getConstant(0, dl, VT);
} else {
unsigned Num32BitElts = VT.getSizeInBits() / 32;
@ -17779,6 +17746,19 @@ static SDValue LowerVSETCC(SDValue Op, const X86Subtarget &Subtarget,
assert(EltVT == MVT::f32 || EltVT == MVT::f64);
#endif
// Custom widen MVT::v2f32 to prevent the default widening
// from getting a result type of v4i32, extracting it to v2i32 and then
// trying to sign extend that to v2i1.
if (VT == MVT::v2i1 && Op1.getValueType() == MVT::v2f32) {
Op0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v4f32, Op0,
DAG.getUNDEF(MVT::v2f32));
Op1 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v4f32, Op1,
DAG.getUNDEF(MVT::v2f32));
SDValue NewOp = DAG.getNode(ISD::SETCC, dl, MVT::v4i1, Op0, Op1, CC);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v2i1, NewOp,
DAG.getIntPtrConstant(0, dl));
}
unsigned Opc;
if (Subtarget.hasAVX512() && VT.getVectorElementType() == MVT::i1) {
assert(VT.getVectorNumElements() <= 16);
@ -24417,8 +24397,8 @@ static SDValue LowerMSCATTER(SDValue Op, const X86Subtarget &Subtarget,
// Mask
// At this point we have promoted mask operand
assert(MaskVT.getScalarSizeInBits() >= 32 && "unexpected mask type");
MVT ExtMaskVT = MVT::getVectorVT(MaskVT.getScalarType(), NumElts);
assert(MaskVT.getScalarType() == MVT::i1 && "unexpected mask type");
MVT ExtMaskVT = MVT::getVectorVT(MVT::i1, NumElts);
// Use the original mask here, do not modify the mask twice
Mask = ExtendToType(N->getMask(), ExtMaskVT, DAG, true);
@ -24427,12 +24407,9 @@ static SDValue LowerMSCATTER(SDValue Op, const X86Subtarget &Subtarget,
Src = ExtendToType(Src, NewVT, DAG);
}
}
// If the mask is "wide" at this point - truncate it to i1 vector
MVT BitMaskVT = MVT::getVectorVT(MVT::i1, NumElts);
Mask = DAG.getNode(ISD::TRUNCATE, dl, BitMaskVT, Mask);
// The mask is killed by scatter, add it to the values
SDVTList VTs = DAG.getVTList(BitMaskVT, MVT::Other);
SDVTList VTs = DAG.getVTList(Mask.getValueType(), MVT::Other);
SDValue Ops[] = {Chain, Src, Mask, BasePtr, Index};
SDValue NewScatter = DAG.getTargetMemSDNode<X86MaskedScatterSDNode>(
VTs, Ops, dl, N->getMemoryVT(), N->getMemOperand());
@ -24455,11 +24432,6 @@ static SDValue LowerMLOAD(SDValue Op, const X86Subtarget &Subtarget,
assert((!N->isExpandingLoad() || ScalarVT.getSizeInBits() >= 32) &&
"Expanding masked load is supported for 32 and 64-bit types only!");
// 4x32, 4x64 and 2x64 vectors of non-expanding loads are legal regardless of
// VLX. These types for exp-loads are handled here.
if (!N->isExpandingLoad() && VT.getVectorNumElements() <= 4)
return Op;
assert(Subtarget.hasAVX512() && !Subtarget.hasVLX() && !VT.is512BitVector() &&
"Cannot lower masked load op.");
@ -24476,16 +24448,12 @@ static SDValue LowerMLOAD(SDValue Op, const X86Subtarget &Subtarget,
Src0 = ExtendToType(Src0, WideDataVT, DAG);
// Mask element has to be i1.
MVT MaskEltTy = Mask.getSimpleValueType().getScalarType();
assert((MaskEltTy == MVT::i1 || VT.getVectorNumElements() <= 4) &&
"We handle 4x32, 4x64 and 2x64 vectors only in this case");
assert(Mask.getSimpleValueType().getScalarType() == MVT::i1 &&
"Unexpected mask type");
MVT WideMaskVT = MVT::getVectorVT(MaskEltTy, NumEltsInWideVec);
MVT WideMaskVT = MVT::getVectorVT(MVT::i1, NumEltsInWideVec);
Mask = ExtendToType(Mask, WideMaskVT, DAG, true);
if (MaskEltTy != MVT::i1)
Mask = DAG.getNode(ISD::TRUNCATE, dl,
MVT::getVectorVT(MVT::i1, NumEltsInWideVec), Mask);
SDValue NewLoad = DAG.getMaskedLoad(WideDataVT, dl, N->getChain(),
N->getBasePtr(), Mask, Src0,
N->getMemoryVT(), N->getMemOperand(),
@ -24514,10 +24482,6 @@ static SDValue LowerMSTORE(SDValue Op, const X86Subtarget &Subtarget,
assert((!N->isCompressingStore() || ScalarVT.getSizeInBits() >= 32) &&
"Expanding masked load is supported for 32 and 64-bit types only!");
// 4x32 and 2x64 vectors of non-compressing stores are legal regardless to VLX.
if (!N->isCompressingStore() && VT.getVectorNumElements() <= 4)
return Op;
assert(Subtarget.hasAVX512() && !Subtarget.hasVLX() && !VT.is512BitVector() &&
"Cannot lower masked store op.");
@ -24532,17 +24496,13 @@ static SDValue LowerMSTORE(SDValue Op, const X86Subtarget &Subtarget,
MVT WideDataVT = MVT::getVectorVT(ScalarVT, NumEltsInWideVec);
// Mask element has to be i1.
MVT MaskEltTy = Mask.getSimpleValueType().getScalarType();
assert((MaskEltTy == MVT::i1 || VT.getVectorNumElements() <= 4) &&
"We handle 4x32, 4x64 and 2x64 vectors only in this case");
assert(Mask.getSimpleValueType().getScalarType() == MVT::i1 &&
"Unexpected mask type");
MVT WideMaskVT = MVT::getVectorVT(MaskEltTy, NumEltsInWideVec);
MVT WideMaskVT = MVT::getVectorVT(MVT::i1, NumEltsInWideVec);
DataToStore = ExtendToType(DataToStore, WideDataVT, DAG);
Mask = ExtendToType(Mask, WideMaskVT, DAG, true);
if (MaskEltTy != MVT::i1)
Mask = DAG.getNode(ISD::TRUNCATE, dl,
MVT::getVectorVT(MVT::i1, NumEltsInWideVec), Mask);
return DAG.getMaskedStore(N->getChain(), dl, DataToStore, N->getBasePtr(),
Mask, N->getMemoryVT(), N->getMemOperand(),
N->isTruncatingStore(), N->isCompressingStore());
@ -24592,12 +24552,9 @@ static SDValue LowerMGATHER(SDValue Op, const X86Subtarget &Subtarget,
Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);
// Mask
MVT MaskBitVT = MVT::getVectorVT(MVT::i1, NumElts);
// At this point we have promoted mask operand
assert(MaskVT.getScalarSizeInBits() >= 32 && "unexpected mask type");
MVT ExtMaskVT = MVT::getVectorVT(MaskVT.getScalarType(), NumElts);
Mask = ExtendToType(Mask, ExtMaskVT, DAG, true);
Mask = DAG.getNode(ISD::TRUNCATE, dl, MaskBitVT, Mask);
assert(MaskVT.getScalarType() == MVT::i1 && "unexpected mask type");
MaskVT = MVT::getVectorVT(MVT::i1, NumElts);
Mask = ExtendToType(Mask, MaskVT, DAG, true);
// The pass-through value
MVT NewVT = MVT::getVectorVT(VT.getScalarType(), NumElts);
@ -24605,7 +24562,7 @@ static SDValue LowerMGATHER(SDValue Op, const X86Subtarget &Subtarget,
SDValue Ops[] = { N->getChain(), Src0, Mask, N->getBasePtr(), Index };
SDValue NewGather = DAG.getTargetMemSDNode<X86MaskedGatherSDNode>(
DAG.getVTList(NewVT, MaskBitVT, MVT::Other), Ops, dl, N->getMemoryVT(),
DAG.getVTList(NewVT, MaskVT, MVT::Other), Ops, dl, N->getMemoryVT(),
N->getMemOperand());
SDValue Extract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT,
NewGather.getValue(0),
@ -30447,7 +30404,7 @@ static SDValue combineBitcast(SDNode *N, SelectionDAG &DAG,
// If this is a bitcast between a MVT::v4i1/v2i1 and an illegal integer
// type, widen both sides to avoid a trip through memory.
if ((VT == MVT::v4i1 || VT == MVT::v2i1) && SrcVT.isScalarInteger() &&
Subtarget.hasVLX()) {
Subtarget.hasAVX512()) {
SDLoc dl(N);
N0 = DAG.getNode(ISD::ANY_EXTEND, dl, MVT::i8, N0);
N0 = DAG.getBitcast(MVT::v8i1, N0);
@ -30458,7 +30415,7 @@ static SDValue combineBitcast(SDNode *N, SelectionDAG &DAG,
// If this is a bitcast between a MVT::v4i1/v2i1 and an illegal integer
// type, widen both sides to avoid a trip through memory.
if ((SrcVT == MVT::v4i1 || SrcVT == MVT::v2i1) && VT.isScalarInteger() &&
Subtarget.hasVLX()) {
Subtarget.hasAVX512()) {
SDLoc dl(N);
unsigned NumConcats = 8 / SrcVT.getVectorNumElements();
SmallVector<SDValue, 4> Ops(NumConcats, DAG.getUNDEF(SrcVT));


@ -2962,46 +2962,77 @@ multiclass avx512_mask_shiftop_w<bits<8> opc1, bits<8> opc2, string OpcodeStr,
defm KSHIFTL : avx512_mask_shiftop_w<0x32, 0x33, "kshiftl", X86kshiftl, SSE_PSHUF>;
defm KSHIFTR : avx512_mask_shiftop_w<0x30, 0x31, "kshiftr", X86kshiftr, SSE_PSHUF>;
multiclass axv512_icmp_packed_no_vlx_lowering<SDNode OpNode, string InstStr> {
def : Pat<(v8i1 (OpNode (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))),
(COPY_TO_REGCLASS (!cast<Instruction>(InstStr##Zrr)
(v16i32 (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src1, sub_ymm)),
(v16i32 (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src2, sub_ymm))), VK8)>;
multiclass axv512_icmp_packed_no_vlx_lowering<SDNode OpNode, string InstStr,
X86VectorVTInfo Narrow,
X86VectorVTInfo Wide> {
def : Pat<(Narrow.KVT (OpNode (Narrow.VT Narrow.RC:$src1),
(Narrow.VT Narrow.RC:$src2))),
(COPY_TO_REGCLASS
(!cast<Instruction>(InstStr##Zrr)
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src1, Narrow.SubRegIdx)),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src2, Narrow.SubRegIdx))),
Narrow.KRC)>;
def : Pat<(v8i1 (and VK8:$mask,
(OpNode (v8i32 VR256X:$src1), (v8i32 VR256X:$src2)))),
def : Pat<(Narrow.KVT (and Narrow.KRC:$mask,
(OpNode (Narrow.VT Narrow.RC:$src1),
(Narrow.VT Narrow.RC:$src2)))),
(COPY_TO_REGCLASS
(!cast<Instruction>(InstStr##Zrrk)
(COPY_TO_REGCLASS VK8:$mask, VK16),
(v16i32 (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src1, sub_ymm)),
(v16i32 (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src2, sub_ymm))),
VK8)>;
(COPY_TO_REGCLASS Narrow.KRC:$mask, Wide.KRC),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src1, Narrow.SubRegIdx)),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src2, Narrow.SubRegIdx))),
Narrow.KRC)>;
}
multiclass axv512_icmp_packed_cc_no_vlx_lowering<SDNode OpNode, string InstStr,
AVX512VLVectorVTInfo _> {
def : Pat<(v8i1 (OpNode (_.info256.VT VR256X:$src1), (_.info256.VT VR256X:$src2), imm:$cc)),
(COPY_TO_REGCLASS (!cast<Instruction>(InstStr##Zrri)
(_.info512.VT (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src1, sub_ymm)),
(_.info512.VT (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src2, sub_ymm)),
imm:$cc), VK8)>;
X86VectorVTInfo Narrow,
X86VectorVTInfo Wide> {
def : Pat<(Narrow.KVT (OpNode (Narrow.VT Narrow.RC:$src1),
(Narrow.VT Narrow.RC:$src2), imm:$cc)),
(COPY_TO_REGCLASS
(!cast<Instruction>(InstStr##Zrri)
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src1, Narrow.SubRegIdx)),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src2, Narrow.SubRegIdx)),
imm:$cc), Narrow.KRC)>;
def : Pat<(v8i1 (and VK8:$mask, (OpNode (_.info256.VT VR256X:$src1),
(_.info256.VT VR256X:$src2), imm:$cc))),
(COPY_TO_REGCLASS (!cast<Instruction>(InstStr##Zrrik)
(COPY_TO_REGCLASS VK8:$mask, VK16),
(_.info512.VT (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src1, sub_ymm)),
(_.info512.VT (INSERT_SUBREG (IMPLICIT_DEF), VR256X:$src2, sub_ymm)),
imm:$cc), VK8)>;
def : Pat<(Narrow.KVT (and Narrow.KRC:$mask,
(OpNode (Narrow.VT Narrow.RC:$src1),
(Narrow.VT Narrow.RC:$src2), imm:$cc))),
(COPY_TO_REGCLASS (!cast<Instruction>(InstStr##Zrrik)
(COPY_TO_REGCLASS Narrow.KRC:$mask, Wide.KRC),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src1, Narrow.SubRegIdx)),
(Wide.VT (INSERT_SUBREG (IMPLICIT_DEF), Narrow.RC:$src2, Narrow.SubRegIdx)),
imm:$cc), Narrow.KRC)>;
}
let Predicates = [HasAVX512, NoVLX] in {
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpgtm, "VPCMPGTD">;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpeqm, "VPCMPEQD">;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpgtm, "VPCMPGTD", v8i32x_info, v16i32_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpeqm, "VPCMPEQD", v8i32x_info, v16i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VCMPPS", avx512vl_f32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VPCMPD", avx512vl_i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpmu, "VPCMPUD", avx512vl_i32_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpgtm, "VPCMPGTD", v4i32x_info, v16i32_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpeqm, "VPCMPEQD", v4i32x_info, v16i32_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpgtm, "VPCMPGTQ", v4i64x_info, v8i64_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpeqm, "VPCMPEQQ", v4i64x_info, v8i64_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpgtm, "VPCMPGTQ", v2i64x_info, v8i64_info>;
defm : axv512_icmp_packed_no_vlx_lowering<X86pcmpeqm, "VPCMPEQQ", v2i64x_info, v8i64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VCMPPS", v8f32x_info, v16f32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VPCMPD", v8i32x_info, v16i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpmu, "VPCMPUD", v8i32x_info, v16i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VCMPPS", v4f32x_info, v16f32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VPCMPD", v4i32x_info, v16i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpmu, "VPCMPUD", v4i32x_info, v16i32_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VCMPPD", v4f64x_info, v8f64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VPCMPQ", v4i64x_info, v8i64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpmu, "VPCMPUQ", v4i64x_info, v8i64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VCMPPD", v2f64x_info, v8f64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpm, "VPCMPQ", v2i64x_info, v8i64_info>;
defm : axv512_icmp_packed_cc_no_vlx_lowering<X86cmpmu, "VPCMPUQ", v2i64x_info, v8i64_info>;
}
// Mask setting all 0s or 1s
@ -3376,8 +3407,15 @@ multiclass mask_move_lowering<string InstrStr, X86VectorVTInfo Narrow,
// Patterns for handling v8i1 selects of 256-bit vectors when VLX isn't
// available. Use a 512-bit operation and extract.
let Predicates = [HasAVX512, NoVLX] in {
defm : mask_move_lowering<"VMOVAPSZ", v4f32x_info, v16f32_info>;
defm : mask_move_lowering<"VMOVDQA32Z", v4i32x_info, v16i32_info>;
defm : mask_move_lowering<"VMOVAPSZ", v8f32x_info, v16f32_info>;
defm : mask_move_lowering<"VMOVDQA32Z", v8i32x_info, v16i32_info>;
defm : mask_move_lowering<"VMOVAPDZ", v2f64x_info, v8f64_info>;
defm : mask_move_lowering<"VMOVDQA64Z", v2i64x_info, v8i64_info>;
defm : mask_move_lowering<"VMOVAPDZ", v4f64x_info, v8f64_info>;
defm : mask_move_lowering<"VMOVDQA64Z", v4i64x_info, v8i64_info>;
}
let Predicates = [HasAVX512] in {


@ -495,6 +495,18 @@ let Predicates = [HasBWI, HasVLX] in {
// If the bits are not zero we have to fall back to explicitly zeroing by
// using shifts.
let Predicates = [HasAVX512] in {
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v2i1 VK2:$mask), (iPTR 0))),
(KSHIFTRWri (KSHIFTLWri (COPY_TO_REGCLASS VK2:$mask, VK16),
(i8 14)), (i8 14))>;
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v4i1 VK4:$mask), (iPTR 0))),
(KSHIFTRWri (KSHIFTLWri (COPY_TO_REGCLASS VK4:$mask, VK16),
(i8 12)), (i8 12))>;
}
let Predicates = [HasAVX512, NoDQI] in {
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v8i1 VK8:$mask), (iPTR 0))),
@ -506,9 +518,7 @@ let Predicates = [HasDQI] in {
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v8i1 VK8:$mask), (iPTR 0))),
(COPY_TO_REGCLASS (KMOVBkk VK8:$mask), VK16)>;
}
let Predicates = [HasVLX, HasDQI] in {
def : Pat<(v8i1 (insert_subvector (v8i1 immAllZerosV),
(v2i1 VK2:$mask), (iPTR 0))),
(KSHIFTRBri (KSHIFTLBri (COPY_TO_REGCLASS VK2:$mask, VK8),
@ -519,17 +529,6 @@ let Predicates = [HasVLX, HasDQI] in {
(i8 4)), (i8 4))>;
}
let Predicates = [HasVLX] in {
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v2i1 VK2:$mask), (iPTR 0))),
(KSHIFTRWri (KSHIFTLWri (COPY_TO_REGCLASS VK2:$mask, VK16),
(i8 14)), (i8 14))>;
def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
(v4i1 VK4:$mask), (iPTR 0))),
(KSHIFTRWri (KSHIFTLWri (COPY_TO_REGCLASS VK4:$mask, VK16),
(i8 12)), (i8 12))>;
}
let Predicates = [HasBWI] in {
def : Pat<(v32i1 (insert_subvector (v32i1 immAllZerosV),
(v16i1 VK16:$mask), (iPTR 0))),


@ -8,11 +8,17 @@ target triple = "x86_64-apple-macosx10.8.0"
define i32 @add(i32 %arg) {
; CHECK-LABEL: for function 'add'
; -- Same size registers --
;CHECK: cost of 1 {{.*}} zext
;CHECK-AVX512: cost of 12 {{.*}} zext
;CHECK-AVX2: cost of 1 {{.*}} zext
;CHECK-AVX: cost of 1 {{.*}} zext
%A = zext <4 x i1> undef to <4 x i32>
;CHECK: cost of 2 {{.*}} sext
;CHECK-AVX512: cost of 12 {{.*}} sext
;CHECK-AVX2: cost of 2 {{.*}} sext
;CHECK-AVX: cost of 2 {{.*}} sext
%B = sext <4 x i1> undef to <4 x i32>
;CHECK: cost of 0 {{.*}} trunc
;CHECK-AVX512: cost of 0 {{.*}} trunc
;CHECK-AVX2: cost of 0 {{.*}} trunc
;CHECK-AVX: cost of 0 {{.*}} trunc
%C = trunc <4 x i32> undef to <4 x i1>
; -- Different size registers --


@ -702,9 +702,10 @@ define <4 x float> @f64to4f32_mask(<4 x double> %b, <4 x i1> %mask) {
; NOVL-LABEL: f64to4f32_mask:
; NOVL: # %bb.0:
; NOVL-NEXT: vpslld $31, %xmm1, %xmm1
; NOVL-NEXT: vpsrad $31, %xmm1, %xmm1
; NOVL-NEXT: vptestmd %zmm1, %zmm1, %k1
; NOVL-NEXT: vcvtpd2ps %ymm0, %xmm0
; NOVL-NEXT: vpand %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vmovaps %zmm0, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NOVL-NEXT: vzeroupper
; NOVL-NEXT: retq
;
@ -743,9 +744,12 @@ define <8 x double> @f32to8f64(<8 x float> %b) nounwind {
define <4 x double> @f32to4f64_mask(<4 x float> %b, <4 x double> %b1, <4 x double> %a1) {
; NOVL-LABEL: f32to4f64_mask:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NOVL-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NOVL-NEXT: vcvtps2pd %xmm0, %ymm0
; NOVL-NEXT: vcmpltpd %ymm2, %ymm1, %ymm1
; NOVL-NEXT: vandpd %ymm0, %ymm1, %ymm0
; NOVL-NEXT: vcmpltpd %zmm2, %zmm1, %k1
; NOVL-NEXT: vmovapd %zmm0, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NOVL-NEXT: retq
;
; VL-LABEL: f32to4f64_mask:
@ -1591,12 +1595,15 @@ define <8 x float> @sbto8f32(<8 x float> %a) {
}
define <4 x float> @sbto4f32(<4 x float> %a) {
; NOVL-LABEL: sbto4f32:
; NOVL: # %bb.0:
; NOVL-NEXT: vxorps %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vcmpltps %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVL-NEXT: retq
; NOVLDQ-LABEL: sbto4f32:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVLDQ-NEXT: vxorps %xmm1, %xmm1, %xmm1
; NOVLDQ-NEXT: vcmpltps %zmm0, %zmm1, %k1
; NOVLDQ-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; NOVLDQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVLDQ-NEXT: vzeroupper
; NOVLDQ-NEXT: retq
;
; VLDQ-LABEL: sbto4f32:
; VLDQ: # %bb.0:
@ -1614,19 +1621,30 @@ define <4 x float> @sbto4f32(<4 x float> %a) {
; VLNODQ-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
; VLNODQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; VLNODQ-NEXT: retq
;
; AVX512DQ-LABEL: sbto4f32:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vcmpltps %zmm0, %zmm1, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%cmpres = fcmp ogt <4 x float> %a, zeroinitializer
%1 = sitofp <4 x i1> %cmpres to <4 x float>
ret <4 x float> %1
}
define <4 x double> @sbto4f64(<4 x double> %a) {
; NOVL-LABEL: sbto4f64:
; NOVL: # %bb.0:
; NOVL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vcmpltpd %ymm0, %ymm1, %ymm0
; NOVL-NEXT: vpmovqd %zmm0, %ymm0
; NOVL-NEXT: vcvtdq2pd %xmm0, %ymm0
; NOVL-NEXT: retq
; NOVLDQ-LABEL: sbto4f64:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NOVLDQ-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; NOVLDQ-NEXT: vcmpltpd %zmm0, %zmm1, %k1
; NOVLDQ-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; NOVLDQ-NEXT: vcvtdq2pd %xmm0, %ymm0
; NOVLDQ-NEXT: retq
;
; VLDQ-LABEL: sbto4f64:
; VLDQ: # %bb.0:
@ -1644,18 +1662,30 @@ define <4 x double> @sbto4f64(<4 x double> %a) {
; VLNODQ-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
; VLNODQ-NEXT: vcvtdq2pd %xmm0, %ymm0
; VLNODQ-NEXT: retq
;
; AVX512DQ-LABEL: sbto4f64:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512DQ-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vcmpltpd %zmm0, %zmm1, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: vcvtdq2pd %xmm0, %ymm0
; AVX512DQ-NEXT: retq
%cmpres = fcmp ogt <4 x double> %a, zeroinitializer
%1 = sitofp <4 x i1> %cmpres to <4 x double>
ret <4 x double> %1
}
define <2 x float> @sbto2f32(<2 x float> %a) {
; NOVL-LABEL: sbto2f32:
; NOVL: # %bb.0:
; NOVL-NEXT: vxorps %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vcmpltps %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVL-NEXT: retq
; NOVLDQ-LABEL: sbto2f32:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVLDQ-NEXT: vxorps %xmm1, %xmm1, %xmm1
; NOVLDQ-NEXT: vcmpltps %zmm0, %zmm1, %k1
; NOVLDQ-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; NOVLDQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVLDQ-NEXT: vzeroupper
; NOVLDQ-NEXT: retq
;
; VLDQ-LABEL: sbto2f32:
; VLDQ: # %bb.0:
@ -1673,19 +1703,31 @@ define <2 x float> @sbto2f32(<2 x float> %a) {
; VLNODQ-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
; VLNODQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; VLNODQ-NEXT: retq
;
; AVX512DQ-LABEL: sbto2f32:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vcmpltps %zmm0, %zmm1, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%cmpres = fcmp ogt <2 x float> %a, zeroinitializer
%1 = sitofp <2 x i1> %cmpres to <2 x float>
ret <2 x float> %1
}
define <2 x double> @sbto2f64(<2 x double> %a) {
; NOVL-LABEL: sbto2f64:
; NOVL: # %bb.0:
; NOVL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
; NOVL-NEXT: vcvtdq2pd %xmm0, %xmm0
; NOVL-NEXT: retq
; NOVLDQ-LABEL: sbto2f64:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVLDQ-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; NOVLDQ-NEXT: vcmpltpd %zmm0, %zmm1, %k1
; NOVLDQ-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; NOVLDQ-NEXT: vcvtdq2pd %xmm0, %xmm0
; NOVLDQ-NEXT: vzeroupper
; NOVLDQ-NEXT: retq
;
; VLDQ-LABEL: sbto2f64:
; VLDQ: # %bb.0:
@ -1703,6 +1745,16 @@ define <2 x double> @sbto2f64(<2 x double> %a) {
; VLNODQ-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
; VLNODQ-NEXT: vcvtdq2pd %xmm0, %xmm0
; VLNODQ-NEXT: retq
;
; AVX512DQ-LABEL: sbto2f64:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vcmpltpd %zmm0, %zmm1, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%cmpres = fcmp ogt <2 x double> %a, zeroinitializer
%1 = sitofp <2 x i1> %cmpres to <2 x double>
ret <2 x double> %1
@ -1925,10 +1977,12 @@ define <8 x double> @ubto8f64(<8 x i32> %a) {
define <4 x float> @ubto4f32(<4 x i32> %a) {
; NOVL-LABEL: ubto4f32:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vpcmpgtd %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vpbroadcastd {{.*#+}} xmm1 = [1,1,1,1]
; NOVL-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVL-NEXT: vpcmpgtd %zmm0, %zmm1, %k1
; NOVL-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0 {%k1} {z}
; NOVL-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVL-NEXT: vzeroupper
; NOVL-NEXT: retq
;
; VL-LABEL: ubto4f32:
@ -1946,9 +2000,10 @@ define <4 x float> @ubto4f32(<4 x i32> %a) {
define <4 x double> @ubto4f64(<4 x i32> %a) {
; NOVL-LABEL: ubto4f64:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vpcmpgtd %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vpsrld $31, %xmm0, %xmm0
; NOVL-NEXT: vpcmpgtd %zmm0, %zmm1, %k1
; NOVL-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0 {%k1} {z}
; NOVL-NEXT: vcvtdq2pd %xmm0, %ymm0
; NOVL-NEXT: retq
;
@ -1969,14 +2024,10 @@ define <2 x float> @ubto2f32(<2 x i32> %a) {
; NOVL: # %bb.0:
; NOVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; NOVL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vpextrb $8, %xmm0, %eax
; NOVL-NEXT: andl $1, %eax
; NOVL-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm1
; NOVL-NEXT: vpextrb $0, %xmm0, %eax
; NOVL-NEXT: andl $1, %eax
; NOVL-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm0
; NOVL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
; NOVL-NEXT: vpcmpltuq %zmm1, %zmm0, %k1
; NOVL-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0 {%k1} {z}
; NOVL-NEXT: vcvtdq2ps %xmm0, %xmm0
; NOVL-NEXT: vzeroupper
; NOVL-NEXT: retq
;
; VL-LABEL: ubto2f32:
@ -1997,10 +2048,8 @@ define <2 x double> @ubto2f64(<2 x i32> %a) {
; NOVL: # %bb.0:
; NOVL-NEXT: vpxor %xmm1, %xmm1, %xmm1
; NOVL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; NOVL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; NOVL-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; NOVL-NEXT: vpbroadcastd {{.*#+}} xmm1 = [1,1,1,1]
; NOVL-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVL-NEXT: vpcmpltuq %zmm1, %zmm0, %k1
; NOVL-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0 {%k1} {z}
; NOVL-NEXT: vcvtudq2pd %ymm0, %zmm0
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NOVL-NEXT: vzeroupper
@ -2020,19 +2069,27 @@ define <2 x double> @ubto2f64(<2 x i32> %a) {
}
define <2 x i64> @test_2f64toub(<2 x double> %a, <2 x i64> %passthru) {
; NOVLDQ-LABEL: test_2f64toub:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: vcvttsd2usi %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm2
; NOVLDQ-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
; NOVLDQ-NEXT: vcvttsd2usi %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm0
; NOVLDQ-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; NOVLDQ-NEXT: vpsllq $63, %xmm0, %xmm0
; NOVLDQ-NEXT: vpsraq $63, %zmm0, %zmm0
; NOVLDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVLDQ-NEXT: vzeroupper
; NOVLDQ-NEXT: retq
; KNL-LABEL: test_2f64toub:
; KNL: # %bb.0:
; KNL-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; KNL-NEXT: vcvttsd2si %xmm2, %eax
; KNL-NEXT: kmovw %eax, %k0
; KNL-NEXT: vcvttsd2si %xmm0, %eax
; KNL-NEXT: andl $1, %eax
; KNL-NEXT: kmovw %eax, %k1
; KNL-NEXT: kshiftrw $1, %k0, %k2
; KNL-NEXT: kshiftlw $1, %k2, %k2
; KNL-NEXT: korw %k1, %k2, %k1
; KNL-NEXT: kshiftrw $1, %k1, %k2
; KNL-NEXT: kxorw %k0, %k2, %k0
; KNL-NEXT: kshiftlw $15, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k0
; KNL-NEXT: kxorw %k1, %k0, %k1
; KNL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; VL-LABEL: test_2f64toub:
; VL: # %bb.0:
@ -2044,13 +2101,47 @@ define <2 x i64> @test_2f64toub(<2 x double> %a, <2 x i64> %passthru) {
;
; AVX512DQ-LABEL: test_2f64toub:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vcvttpd2uqq %zmm0, %zmm0
; AVX512DQ-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512DQ-NEXT: vpsraq $63, %zmm0, %zmm0
; AVX512DQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512DQ-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; AVX512DQ-NEXT: vcvttsd2si %xmm2, %eax
; AVX512DQ-NEXT: kmovw %eax, %k0
; AVX512DQ-NEXT: vcvttsd2si %xmm0, %eax
; AVX512DQ-NEXT: andl $1, %eax
; AVX512DQ-NEXT: kmovw %eax, %k1
; AVX512DQ-NEXT: kshiftrw $1, %k0, %k2
; AVX512DQ-NEXT: kshiftlw $1, %k2, %k2
; AVX512DQ-NEXT: korw %k1, %k2, %k1
; AVX512DQ-NEXT: kshiftrw $1, %k1, %k2
; AVX512DQ-NEXT: kxorw %k0, %k2, %k0
; AVX512DQ-NEXT: kshiftlw $15, %k0, %k0
; AVX512DQ-NEXT: kshiftrw $14, %k0, %k0
; AVX512DQ-NEXT: kxorw %k1, %k0, %k1
; AVX512DQ-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
;
; AVX512BW-LABEL: test_2f64toub:
; AVX512BW: # %bb.0:
; AVX512BW-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512BW-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; AVX512BW-NEXT: vcvttsd2si %xmm2, %eax
; AVX512BW-NEXT: kmovd %eax, %k0
; AVX512BW-NEXT: vcvttsd2si %xmm0, %eax
; AVX512BW-NEXT: andl $1, %eax
; AVX512BW-NEXT: kmovw %eax, %k1
; AVX512BW-NEXT: kshiftrw $1, %k0, %k2
; AVX512BW-NEXT: kshiftlw $1, %k2, %k2
; AVX512BW-NEXT: korw %k1, %k2, %k1
; AVX512BW-NEXT: kshiftrw $1, %k1, %k2
; AVX512BW-NEXT: kxorw %k0, %k2, %k0
; AVX512BW-NEXT: kshiftlw $15, %k0, %k0
; AVX512BW-NEXT: kshiftrw $14, %k0, %k0
; AVX512BW-NEXT: kxorw %k1, %k0, %k1
; AVX512BW-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; AVX512BW-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
%mask = fptoui <2 x double> %a to <2 x i1>
%select = select <2 x i1> %mask, <2 x i64> %passthru, <2 x i64> zeroinitializer
ret <2 x i64> %select
@ -2059,12 +2150,12 @@ define <2 x i64> @test_2f64toub(<2 x double> %a, <2 x i64> %passthru) {
define <4 x i64> @test_4f64toub(<4 x double> %a, <4 x i64> %passthru) {
; NOVL-LABEL: test_4f64toub:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NOVL-NEXT: vcvttpd2udq %zmm0, %ymm0
; NOVL-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NOVL-NEXT: vcvttpd2dq %ymm0, %xmm0
; NOVL-NEXT: vpslld $31, %xmm0, %xmm0
; NOVL-NEXT: vpsrad $31, %xmm0, %xmm0
; NOVL-NEXT: vpmovsxdq %xmm0, %ymm0
; NOVL-NEXT: vpand %ymm1, %ymm0, %ymm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NOVL-NEXT: retq
;
; VL-LABEL: test_4f64toub:
@ -2101,19 +2192,16 @@ define <8 x i64> @test_8f64toub(<8 x double> %a, <8 x i64> %passthru) {
}
define <2 x i64> @test_2f32toub(<2 x float> %a, <2 x i64> %passthru) {
; NOVLDQ-LABEL: test_2f32toub:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: vcvttss2usi %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm2
; NOVLDQ-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; NOVLDQ-NEXT: vcvttss2usi %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm0
; NOVLDQ-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; NOVLDQ-NEXT: vpsllq $63, %xmm0, %xmm0
; NOVLDQ-NEXT: vpsraq $63, %zmm0, %zmm0
; NOVLDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVLDQ-NEXT: vzeroupper
; NOVLDQ-NEXT: retq
; NOVL-LABEL: test_2f32toub:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NOVL-NEXT: vcvttps2dq %xmm0, %xmm0
; NOVL-NEXT: vpslld $31, %xmm0, %xmm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NOVL-NEXT: vzeroupper
; NOVL-NEXT: retq
;
; VL-LABEL: test_2f32toub:
; VL: # %bb.0:
@ -2122,16 +2210,6 @@ define <2 x i64> @test_2f32toub(<2 x float> %a, <2 x i64> %passthru) {
; VL-NEXT: vptestmd %xmm0, %xmm0, %k1
; VL-NEXT: vmovdqa64 %xmm1, %xmm0 {%k1} {z}
; VL-NEXT: retq
;
; AVX512DQ-LABEL: test_2f32toub:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; AVX512DQ-NEXT: vcvttps2uqq %ymm0, %zmm0
; AVX512DQ-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512DQ-NEXT: vpsraq $63, %zmm0, %zmm0
; AVX512DQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%mask = fptoui <2 x float> %a to <2 x i1>
%select = select <2 x i1> %mask, <2 x i64> %passthru, <2 x i64> zeroinitializer
ret <2 x i64> %select
@ -2140,12 +2218,12 @@ define <2 x i64> @test_2f32toub(<2 x float> %a, <2 x i64> %passthru) {
define <4 x i64> @test_4f32toub(<4 x float> %a, <4 x i64> %passthru) {
; NOVL-LABEL: test_4f32toub:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NOVL-NEXT: vcvttps2udq %zmm0, %zmm0
; NOVL-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NOVL-NEXT: vcvttps2dq %xmm0, %xmm0
; NOVL-NEXT: vpslld $31, %xmm0, %xmm0
; NOVL-NEXT: vpsrad $31, %xmm0, %xmm0
; NOVL-NEXT: vpmovsxdq %xmm0, %ymm0
; NOVL-NEXT: vpand %ymm1, %ymm0, %ymm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NOVL-NEXT: retq
;
; VL-LABEL: test_4f32toub:
@ -2195,16 +2273,27 @@ define <16 x i32> @test_16f32toub(<16 x float> %a, <16 x i32> %passthru) {
}
define <2 x i64> @test_2f64tosb(<2 x double> %a, <2 x i64> %passthru) {
; NOVLDQ-LABEL: test_2f64tosb:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: vcvttsd2si %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm2
; NOVLDQ-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
; NOVLDQ-NEXT: vcvttsd2si %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm0
; NOVLDQ-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; NOVLDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVLDQ-NEXT: retq
; KNL-LABEL: test_2f64tosb:
; KNL: # %bb.0:
; KNL-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; KNL-NEXT: vcvttsd2si %xmm2, %eax
; KNL-NEXT: kmovw %eax, %k0
; KNL-NEXT: vcvttsd2si %xmm0, %eax
; KNL-NEXT: andl $1, %eax
; KNL-NEXT: kmovw %eax, %k1
; KNL-NEXT: kshiftrw $1, %k0, %k2
; KNL-NEXT: kshiftlw $1, %k2, %k2
; KNL-NEXT: korw %k1, %k2, %k1
; KNL-NEXT: kshiftrw $1, %k1, %k2
; KNL-NEXT: kxorw %k0, %k2, %k0
; KNL-NEXT: kshiftlw $15, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k0
; KNL-NEXT: kxorw %k1, %k0, %k1
; KNL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; VL-LABEL: test_2f64tosb:
; VL: # %bb.0:
@ -2216,11 +2305,47 @@ define <2 x i64> @test_2f64tosb(<2 x double> %a, <2 x i64> %passthru) {
;
; AVX512DQ-LABEL: test_2f64tosb:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vcvttpd2qq %zmm0, %zmm0
; AVX512DQ-NEXT: vandps %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512DQ-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; AVX512DQ-NEXT: vcvttsd2si %xmm2, %eax
; AVX512DQ-NEXT: kmovw %eax, %k0
; AVX512DQ-NEXT: vcvttsd2si %xmm0, %eax
; AVX512DQ-NEXT: andl $1, %eax
; AVX512DQ-NEXT: kmovw %eax, %k1
; AVX512DQ-NEXT: kshiftrw $1, %k0, %k2
; AVX512DQ-NEXT: kshiftlw $1, %k2, %k2
; AVX512DQ-NEXT: korw %k1, %k2, %k1
; AVX512DQ-NEXT: kshiftrw $1, %k1, %k2
; AVX512DQ-NEXT: kxorw %k0, %k2, %k0
; AVX512DQ-NEXT: kshiftlw $15, %k0, %k0
; AVX512DQ-NEXT: kshiftrw $14, %k0, %k0
; AVX512DQ-NEXT: kxorw %k1, %k0, %k1
; AVX512DQ-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
;
; AVX512BW-LABEL: test_2f64tosb:
; AVX512BW: # %bb.0:
; AVX512BW-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512BW-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
; AVX512BW-NEXT: vcvttsd2si %xmm2, %eax
; AVX512BW-NEXT: kmovd %eax, %k0
; AVX512BW-NEXT: vcvttsd2si %xmm0, %eax
; AVX512BW-NEXT: andl $1, %eax
; AVX512BW-NEXT: kmovw %eax, %k1
; AVX512BW-NEXT: kshiftrw $1, %k0, %k2
; AVX512BW-NEXT: kshiftlw $1, %k2, %k2
; AVX512BW-NEXT: korw %k1, %k2, %k1
; AVX512BW-NEXT: kshiftrw $1, %k1, %k2
; AVX512BW-NEXT: kxorw %k0, %k2, %k0
; AVX512BW-NEXT: kshiftlw $15, %k0, %k0
; AVX512BW-NEXT: kshiftrw $14, %k0, %k0
; AVX512BW-NEXT: kxorw %k1, %k0, %k1
; AVX512BW-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; AVX512BW-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
%mask = fptosi <2 x double> %a to <2 x i1>
%select = select <2 x i1> %mask, <2 x i64> %passthru, <2 x i64> zeroinitializer
ret <2 x i64> %select
@ -2229,9 +2354,11 @@ define <2 x i64> @test_2f64tosb(<2 x double> %a, <2 x i64> %passthru) {
define <4 x i64> @test_4f64tosb(<4 x double> %a, <4 x i64> %passthru) {
; NOVL-LABEL: test_4f64tosb:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NOVL-NEXT: vcvttpd2dq %ymm0, %xmm0
; NOVL-NEXT: vpmovsxdq %xmm0, %ymm0
; NOVL-NEXT: vpand %ymm1, %ymm0, %ymm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NOVL-NEXT: retq
;
; VL-LABEL: test_4f64tosb:
@ -2265,16 +2392,15 @@ define <8 x i64> @test_8f64tosb(<8 x double> %a, <8 x i64> %passthru) {
}
define <2 x i64> @test_2f32tosb(<2 x float> %a, <2 x i64> %passthru) {
; NOVLDQ-LABEL: test_2f32tosb:
; NOVLDQ: # %bb.0:
; NOVLDQ-NEXT: vcvttss2si %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm2
; NOVLDQ-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; NOVLDQ-NEXT: vcvttss2si %xmm0, %rax
; NOVLDQ-NEXT: vmovq %rax, %xmm0
; NOVLDQ-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; NOVLDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
; NOVLDQ-NEXT: retq
; NOVL-LABEL: test_2f32tosb:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NOVL-NEXT: vcvttps2dq %xmm0, %xmm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NOVL-NEXT: vzeroupper
; NOVL-NEXT: retq
;
; VL-LABEL: test_2f32tosb:
; VL: # %bb.0:
@ -2282,14 +2408,6 @@ define <2 x i64> @test_2f32tosb(<2 x float> %a, <2 x i64> %passthru) {
; VL-NEXT: vptestmd %xmm0, %xmm0, %k1
; VL-NEXT: vmovdqa64 %xmm1, %xmm0 {%k1} {z}
; VL-NEXT: retq
;
; AVX512DQ-LABEL: test_2f32tosb:
; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; AVX512DQ-NEXT: vcvttps2qq %ymm0, %zmm0
; AVX512DQ-NEXT: vandps %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%mask = fptosi <2 x float> %a to <2 x i1>
%select = select <2 x i1> %mask, <2 x i64> %passthru, <2 x i64> zeroinitializer
ret <2 x i64> %select
@ -2298,9 +2416,11 @@ define <2 x i64> @test_2f32tosb(<2 x float> %a, <2 x i64> %passthru) {
define <4 x i64> @test_4f32tosb(<4 x float> %a, <4 x i64> %passthru) {
; NOVL-LABEL: test_4f32tosb:
; NOVL: # %bb.0:
; NOVL-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NOVL-NEXT: vcvttps2dq %xmm0, %xmm0
; NOVL-NEXT: vpmovsxdq %xmm0, %ymm0
; NOVL-NEXT: vpand %ymm1, %ymm0, %ymm0
; NOVL-NEXT: vptestmd %zmm0, %zmm0, %k1
; NOVL-NEXT: vmovdqa64 %zmm1, %zmm0 {%k1} {z}
; NOVL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NOVL-NEXT: retq
;
; VL-LABEL: test_4f32tosb:


@ -301,9 +301,10 @@ define <4 x i32> @zext_4x8mem_to_4x32(<4 x i8> *%i , <4 x i1> %mask) nounwind re
; KNL-LABEL: zext_4x8mem_to_4x32:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovzxbd {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxbd {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; KNL-NEXT: vmovdqa32 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x8mem_to_4x32:
@ -322,9 +323,10 @@ define <4 x i32> @sext_4x8mem_to_4x32(<4 x i8> *%i , <4 x i1> %mask) nounwind re
; KNL-LABEL: sext_4x8mem_to_4x32:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovsxbd (%rdi), %xmm1
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxbd (%rdi), %xmm0
; KNL-NEXT: vmovdqa32 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_4x8mem_to_4x32:
@ -489,9 +491,10 @@ define <2 x i64> @zext_2x8mem_to_2x64(<2 x i8> *%i , <2 x i1> %mask) nounwind re
; KNL-LABEL: zext_2x8mem_to_2x64:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovzxbq {{.*#+}} xmm1 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxbq {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_2x8mem_to_2x64:
@ -509,9 +512,10 @@ define <2 x i64> @sext_2x8mem_to_2x64mask(<2 x i8> *%i , <2 x i1> %mask) nounwin
; KNL-LABEL: sext_2x8mem_to_2x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovsxbq (%rdi), %xmm1
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxbq (%rdi), %xmm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_2x8mem_to_2x64mask:
@ -539,10 +543,10 @@ define <4 x i64> @zext_4x8mem_to_4x64(<4 x i8> *%i , <4 x i1> %mask) nounwind re
; KNL-LABEL: zext_4x8mem_to_4x64:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; KNL-NEXT: vpmovzxbq {{.*#+}} ymm1 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero,mem[2],zero,zero,zero,zero,zero,zero,zero,mem[3],zero,zero,zero,zero,zero,zero,zero
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxbq {{.*#+}} ymm0 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero,mem[2],zero,zero,zero,zero,zero,zero,zero,mem[3],zero,zero,zero,zero,zero,zero,zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x8mem_to_4x64:
@ -561,10 +565,10 @@ define <4 x i64> @sext_4x8mem_to_4x64mask(<4 x i8> *%i , <4 x i1> %mask) nounwin
; KNL-LABEL: sext_4x8mem_to_4x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovsxdq %xmm0, %ymm0
; KNL-NEXT: vpmovsxbq (%rdi), %ymm1
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxbq (%rdi), %ymm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_4x8mem_to_4x64mask:
@ -645,9 +649,10 @@ define <4 x i32> @zext_4x16mem_to_4x32(<4 x i16> *%i , <4 x i1> %mask) nounwind
; KNL-LABEL: zext_4x16mem_to_4x32:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovzxwd {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxwd {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; KNL-NEXT: vmovdqa32 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x16mem_to_4x32:
@ -666,9 +671,10 @@ define <4 x i32> @sext_4x16mem_to_4x32mask(<4 x i16> *%i , <4 x i1> %mask) nounw
; KNL-LABEL: sext_4x16mem_to_4x32mask:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovsxwd (%rdi), %xmm1
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxwd (%rdi), %xmm0
; KNL-NEXT: vmovdqa32 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_4x16mem_to_4x32mask:
@ -865,9 +871,10 @@ define <2 x i64> @zext_2x16mem_to_2x64(<2 x i16> *%i , <2 x i1> %mask) nounwind
; KNL-LABEL: zext_2x16mem_to_2x64:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovzxwq {{.*#+}} xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxwq {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_2x16mem_to_2x64:
@ -886,9 +893,10 @@ define <2 x i64> @sext_2x16mem_to_2x64mask(<2 x i16> *%i , <2 x i1> %mask) nounw
; KNL-LABEL: sext_2x16mem_to_2x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovsxwq (%rdi), %xmm1
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxwq (%rdi), %xmm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_2x16mem_to_2x64mask:
@ -917,10 +925,10 @@ define <4 x i64> @zext_4x16mem_to_4x64(<4 x i16> *%i , <4 x i1> %mask) nounwind
; KNL-LABEL: zext_4x16mem_to_4x64:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; KNL-NEXT: vpmovzxwq {{.*#+}} ymm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxwq {{.*#+}} ymm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x16mem_to_4x64:
@ -939,10 +947,10 @@ define <4 x i64> @sext_4x16mem_to_4x64mask(<4 x i16> *%i , <4 x i1> %mask) nounw
; KNL-LABEL: sext_4x16mem_to_4x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovsxdq %xmm0, %ymm0
; KNL-NEXT: vpmovsxwq (%rdi), %ymm1
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxwq (%rdi), %ymm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_4x16mem_to_4x64mask:
@ -1052,9 +1060,10 @@ define <2 x i64> @zext_2x32mem_to_2x64(<2 x i32> *%i , <2 x i1> %mask) nounwind
; KNL-LABEL: zext_2x32mem_to_2x64:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovzxdq {{.*#+}} xmm1 = mem[0],zero,mem[1],zero
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxdq {{.*#+}} xmm0 = mem[0],zero,mem[1],zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_2x32mem_to_2x64:
@ -1073,9 +1082,10 @@ define <2 x i64> @sext_2x32mem_to_2x64mask(<2 x i32> *%i , <2 x i1> %mask) nounw
; KNL-LABEL: sext_2x32mem_to_2x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vpsraq $63, %zmm0, %zmm0
; KNL-NEXT: vpmovsxdq (%rdi), %xmm1
; KNL-NEXT: vpand %xmm1, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxdq (%rdi), %xmm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_2x32mem_to_2x64mask:
@ -1104,10 +1114,10 @@ define <4 x i64> @zext_4x32mem_to_4x64(<4 x i32> *%i , <4 x i1> %mask) nounwind
; KNL-LABEL: zext_4x32mem_to_4x64:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x32mem_to_4x64:
@ -1126,10 +1136,10 @@ define <4 x i64> @sext_4x32mem_to_4x64mask(<4 x i32> *%i , <4 x i1> %mask) nounw
; KNL-LABEL: sext_4x32mem_to_4x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vpsrad $31, %xmm0, %xmm0
; KNL-NEXT: vpmovsxdq %xmm0, %ymm0
; KNL-NEXT: vpmovsxdq (%rdi), %ymm1
; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpmovsxdq (%rdi), %ymm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: sext_4x32mem_to_4x64mask:
@ -1167,10 +1177,10 @@ define <4 x i64> @zext_4x32_to_4x64mask(<4 x i32> %a , <4 x i1> %mask) nounwind
; KNL-LABEL: zext_4x32_to_4x64mask:
; KNL: # %bb.0:
; KNL-NEXT: vpslld $31, %xmm1, %xmm1
; KNL-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; KNL-NEXT: vpand %ymm0, %ymm1, %ymm0
; KNL-NEXT: vmovdqa64 %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: zext_4x32_to_4x64mask:


@ -844,40 +844,20 @@ define i32 @test_insertelement_v32i1(i32 %a, i32 %b, <32 x i32> %x , <32 x i32>
define i8 @test_iinsertelement_v4i1(i32 %a, i32 %b, <4 x i32> %x , <4 x i32> %y) {
; KNL-LABEL: test_iinsertelement_v4i1:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: cmpl %esi, %edi
; KNL-NEXT: setb %al
; KNL-NEXT: vpbroadcastd {{.*#+}} xmm2 = [2147483648,2147483648,2147483648,2147483648]
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpcmpgtd %xmm0, %xmm1, %xmm0
; KNL-NEXT: vpextrb $4, %xmm0, %ecx
; KNL-NEXT: kmovw %ecx, %k0
; KNL-NEXT: vpextrb $0, %xmm0, %ecx
; KNL-NEXT: andl $1, %ecx
; KNL-NEXT: kmovw %ecx, %k1
; KNL-NEXT: kshiftrw $1, %k0, %k2
; KNL-NEXT: kshiftlw $1, %k2, %k2
; KNL-NEXT: korw %k1, %k2, %k1
; KNL-NEXT: kshiftrw $1, %k1, %k2
; KNL-NEXT: kxorw %k0, %k2, %k0
; KNL-NEXT: kshiftlw $15, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k0
; KNL-NEXT: kxorw %k1, %k0, %k0
; KNL-NEXT: vpcmpltud %zmm1, %zmm0, %k0
; KNL-NEXT: kshiftrw $2, %k0, %k1
; KNL-NEXT: kmovw %eax, %k2
; KNL-NEXT: kxorw %k2, %k1, %k1
; KNL-NEXT: kshiftlw $15, %k1, %k1
; KNL-NEXT: kshiftrw $13, %k1, %k1
; KNL-NEXT: kxorw %k0, %k1, %k0
; KNL-NEXT: kshiftrw $3, %k0, %k1
; KNL-NEXT: vpextrb $12, %xmm0, %eax
; KNL-NEXT: kmovw %eax, %k2
; KNL-NEXT: kxorw %k2, %k1, %k1
; KNL-NEXT: kshiftlw $15, %k1, %k1
; KNL-NEXT: kshiftrw $12, %k1, %k1
; KNL-NEXT: kxorw %k0, %k1, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: ## kill: def %al killed %al killed %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_iinsertelement_v4i1:
@ -905,18 +885,11 @@ define i8 @test_iinsertelement_v4i1(i32 %a, i32 %b, <4 x i32> %x , <4 x i32> %y)
define i8 @test_iinsertelement_v2i1(i32 %a, i32 %b, <2 x i64> %x , <2 x i64> %y) {
; KNL-LABEL: test_iinsertelement_v2i1:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: cmpl %esi, %edi
; KNL-NEXT: setb %al
; KNL-NEXT: vmovdqa {{.*#+}} xmm2 = [9223372036854775808,9223372036854775808]
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; KNL-NEXT: vpextrb $0, %xmm0, %ecx
; KNL-NEXT: andl $1, %ecx
; KNL-NEXT: kmovw %ecx, %k0
; KNL-NEXT: kshiftrw $1, %k0, %k1
; KNL-NEXT: kshiftlw $1, %k1, %k1
; KNL-NEXT: korw %k0, %k1, %k0
; KNL-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
; KNL-NEXT: kshiftrw $1, %k0, %k1
; KNL-NEXT: kmovw %eax, %k2
; KNL-NEXT: kxorw %k2, %k1, %k1
@ -925,6 +898,7 @@ define i8 @test_iinsertelement_v2i1(i32 %a, i32 %b, <2 x i64> %x , <2 x i64> %y)
; KNL-NEXT: kxorw %k0, %k1, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: ## kill: def %al killed %al killed %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_iinsertelement_v2i1:
@ -952,15 +926,15 @@ define i8 @test_iinsertelement_v2i1(i32 %a, i32 %b, <2 x i64> %x , <2 x i64> %y)
define zeroext i8 @test_extractelement_v2i1(<2 x i64> %a, <2 x i64> %b) {
; KNL-LABEL: test_extractelement_v2i1:
; KNL: ## %bb.0:
; KNL-NEXT: vmovdqa {{.*#+}} xmm2 = [9223372036854775808,9223372036854775808]
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm0
; KNL-NEXT: vpextrb $0, %xmm0, %eax
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpnleuq %zmm1, %zmm0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: andb $1, %al
; KNL-NEXT: movb $4, %cl
; KNL-NEXT: subb %al, %cl
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_extractelement_v2i1:
@ -981,15 +955,15 @@ define zeroext i8 @test_extractelement_v2i1(<2 x i64> %a, <2 x i64> %b) {
define zeroext i8 @extractelement_v2i1_alt(<2 x i64> %a, <2 x i64> %b) {
; KNL-LABEL: extractelement_v2i1_alt:
; KNL: ## %bb.0:
; KNL-NEXT: vmovdqa {{.*#+}} xmm2 = [9223372036854775808,9223372036854775808]
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm0
; KNL-NEXT: vpextrb $0, %xmm0, %eax
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpnleuq %zmm1, %zmm0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: andb $1, %al
; KNL-NEXT: movb $4, %cl
; KNL-NEXT: subb %al, %cl
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: extractelement_v2i1_alt:
@ -1011,12 +985,13 @@ define zeroext i8 @extractelement_v2i1_alt(<2 x i64> %a, <2 x i64> %b) {
define zeroext i8 @test_extractelement_v4i1(<4 x i32> %a, <4 x i32> %b) {
; KNL-LABEL: test_extractelement_v4i1:
; KNL: ## %bb.0:
; KNL-NEXT: vpbroadcastd {{.*#+}} xmm2 = [2147483648,2147483648,2147483648,2147483648]
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
; KNL-NEXT: vpextrd $3, %xmm0, %eax
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpnleud %zmm1, %zmm0, %k0
; KNL-NEXT: kshiftrw $3, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: andl $1, %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_extractelement_v4i1:
@ -1550,14 +1525,15 @@ define zeroext i8 @test_extractelement_varible_v2i1(<2 x i64> %a, <2 x i64> %b,
; KNL-LABEL: test_extractelement_varible_v2i1:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %edi killed %edi def %rdi
; KNL-NEXT: vmovdqa {{.*#+}} xmm2 = [9223372036854775808,9223372036854775808]
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm0
; KNL-NEXT: vmovdqa %xmm0, -{{[0-9]+}}(%rsp)
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpnleuq %zmm1, %zmm0, %k1
; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: vextracti32x4 $0, %zmm0, -{{[0-9]+}}(%rsp)
; KNL-NEXT: andl $1, %edi
; KNL-NEXT: movl -24(%rsp,%rdi,8), %eax
; KNL-NEXT: movzbl -24(%rsp,%rdi,8), %eax
; KNL-NEXT: andl $1, %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_extractelement_varible_v2i1:
@ -1580,14 +1556,15 @@ define zeroext i8 @test_extractelement_varible_v4i1(<4 x i32> %a, <4 x i32> %b,
; KNL-LABEL: test_extractelement_varible_v4i1:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %edi killed %edi def %rdi
; KNL-NEXT: vpbroadcastd {{.*#+}} xmm2 = [2147483648,2147483648,2147483648,2147483648]
; KNL-NEXT: vpxor %xmm2, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm2, %xmm0, %xmm0
; KNL-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
; KNL-NEXT: vmovdqa %xmm0, -{{[0-9]+}}(%rsp)
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpnleud %zmm1, %zmm0, %k1
; KNL-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: vextracti32x4 $0, %zmm0, -{{[0-9]+}}(%rsp)
; KNL-NEXT: andl $3, %edi
; KNL-NEXT: movl -24(%rsp,%rdi,4), %eax
; KNL-NEXT: movzbl -24(%rsp,%rdi,4), %eax
; KNL-NEXT: andl $1, %eax
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test_extractelement_varible_v4i1:

@ -3004,20 +3004,8 @@ declare <8 x i64> @llvm.x86.avx512.mask.pmulu.dq.512(<16 x i32>, <16 x i32>, <8
define <4 x float> @test_mask_vextractf32x4(<4 x float> %b, <16 x float> %a, i8 %mask) {
; CHECK-LABEL: test_mask_vextractf32x4:
; CHECK: ## %bb.0:
; CHECK-NEXT: vmovd %edi, %xmm2
; CHECK-NEXT: kmovw %edi, %k0
; CHECK-NEXT: kshiftrw $3, %k0, %k1
; CHECK-NEXT: kmovw %k1, %eax
; CHECK-NEXT: kshiftrw $2, %k0, %k1
; CHECK-NEXT: kmovw %k1, %ecx
; CHECK-NEXT: kshiftrw $1, %k0, %k0
; CHECK-NEXT: kmovw %k0, %edx
; CHECK-NEXT: vpinsrb $4, %edx, %xmm2, %xmm2
; CHECK-NEXT: vpinsrb $8, %ecx, %xmm2, %xmm2
; CHECK-NEXT: vpinsrb $12, %eax, %xmm2, %xmm2
; CHECK-NEXT: vextractf32x4 $2, %zmm1, %xmm1
; CHECK-NEXT: vpslld $31, %xmm2, %xmm2
; CHECK-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0
; CHECK-NEXT: kmovw %edi, %k1
; CHECK-NEXT: vextractf32x4 $2, %zmm1, %xmm0 {%k1}
; CHECK-NEXT: retq
%res = call <4 x float> @llvm.x86.avx512.mask.vextractf32x4.512(<16 x float> %a, i32 2, <4 x float> %b, i8 %mask)
ret <4 x float> %res
@ -3028,21 +3016,8 @@ declare <4 x float> @llvm.x86.avx512.mask.vextractf32x4.512(<16 x float>, i32, <
define <4 x i64> @test_mask_vextracti64x4(<4 x i64> %b, <8 x i64> %a, i8 %mask) {
; CHECK-LABEL: test_mask_vextracti64x4:
; CHECK: ## %bb.0:
; CHECK-NEXT: vextractf64x4 $1, %zmm1, %ymm1
; CHECK-NEXT: vmovd %edi, %xmm2
; CHECK-NEXT: kmovw %edi, %k0
; CHECK-NEXT: kshiftrw $3, %k0, %k1
; CHECK-NEXT: kmovw %k1, %eax
; CHECK-NEXT: kshiftrw $2, %k0, %k1
; CHECK-NEXT: kmovw %k1, %ecx
; CHECK-NEXT: kshiftrw $1, %k0, %k0
; CHECK-NEXT: kmovw %k0, %edx
; CHECK-NEXT: vpinsrb $4, %edx, %xmm2, %xmm2
; CHECK-NEXT: vpinsrb $8, %ecx, %xmm2, %xmm2
; CHECK-NEXT: vpinsrb $12, %eax, %xmm2, %xmm2
; CHECK-NEXT: vpslld $31, %xmm2, %xmm2
; CHECK-NEXT: vpmovsxdq %xmm2, %ymm2
; CHECK-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0
; CHECK-NEXT: kmovw %edi, %k1
; CHECK-NEXT: vextracti64x4 $1, %zmm1, %ymm0 {%k1}
; CHECK-NEXT: retq
%res = call <4 x i64> @llvm.x86.avx512.mask.vextracti64x4.512(<8 x i64> %a, i32 1, <4 x i64> %b, i8 %mask)
ret <4 x i64> %res
@ -3053,21 +3028,8 @@ declare <4 x i64> @llvm.x86.avx512.mask.vextracti64x4.512(<8 x i64>, i32, <4 x i
define <4 x i32> @test_maskz_vextracti32x4(<16 x i32> %a, i8 %mask) {
; CHECK-LABEL: test_maskz_vextracti32x4:
; CHECK: ## %bb.0:
; CHECK-NEXT: vmovd %edi, %xmm1
; CHECK-NEXT: kmovw %edi, %k0
; CHECK-NEXT: kshiftrw $3, %k0, %k1
; CHECK-NEXT: kmovw %k1, %eax
; CHECK-NEXT: kshiftrw $2, %k0, %k1
; CHECK-NEXT: kmovw %k1, %ecx
; CHECK-NEXT: kshiftrw $1, %k0, %k0
; CHECK-NEXT: kmovw %k0, %edx
; CHECK-NEXT: vpinsrb $4, %edx, %xmm1, %xmm1
; CHECK-NEXT: vpinsrb $8, %ecx, %xmm1, %xmm1
; CHECK-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
; CHECK-NEXT: vextracti32x4 $2, %zmm0, %xmm0
; CHECK-NEXT: vpslld $31, %xmm1, %xmm1
; CHECK-NEXT: vpsrad $31, %xmm1, %xmm1
; CHECK-NEXT: vpand %xmm0, %xmm1, %xmm0
; CHECK-NEXT: kmovw %edi, %k1
; CHECK-NEXT: vextracti32x4 $2, %zmm0, %xmm0 {%k1} {z}
; CHECK-NEXT: retq
%res = call <4 x i32> @llvm.x86.avx512.mask.vextracti32x4.512(<16 x i32> %a, i32 2, <4 x i32> zeroinitializer, i8 %mask)
ret <4 x i32> %res

@ -498,11 +498,15 @@ entry:
define <4 x i32> @test4(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1, <4 x i64> %y1) {
; KNL-LABEL: test4:
; KNL: ## %bb.0:
; KNL-NEXT: vpcmpgtq %ymm1, %ymm0, %ymm0
; KNL-NEXT: vpmovqd %zmm0, %ymm0
; KNL-NEXT: vpcmpgtq %ymm3, %ymm2, %ymm1
; KNL-NEXT: vpmovqd %zmm1, %ymm1
; KNL-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
; KNL-NEXT: ## kill: def %ymm3 killed %ymm3 def %zmm3
; KNL-NEXT: ## kill: def %ymm2 killed %ymm2 def %zmm2
; KNL-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; KNL-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
; KNL-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; KNL-NEXT: kandnw %k0, %k1, %k1
; KNL-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
@ -517,21 +521,29 @@ define <4 x i32> @test4(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1, <4 x i64> %y1
;
; AVX512BW-LABEL: test4:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: vpcmpgtq %ymm1, %ymm0, %ymm0
; AVX512BW-NEXT: vpmovqd %zmm0, %ymm0
; AVX512BW-NEXT: vpcmpgtq %ymm3, %ymm2, %ymm1
; AVX512BW-NEXT: vpmovqd %zmm1, %ymm1
; AVX512BW-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: ## kill: def %ymm3 killed %ymm3 def %zmm3
; AVX512BW-NEXT: ## kill: def %ymm2 killed %ymm2 def %zmm2
; AVX512BW-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512BW-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512BW-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
; AVX512BW-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; AVX512BW-NEXT: kandnw %k0, %k1, %k1
; AVX512BW-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test4:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: vpcmpgtq %ymm1, %ymm0, %ymm0
; AVX512DQ-NEXT: vpmovqd %zmm0, %ymm0
; AVX512DQ-NEXT: vpcmpgtq %ymm3, %ymm2, %ymm1
; AVX512DQ-NEXT: vpmovqd %zmm1, %ymm1
; AVX512DQ-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: ## kill: def %ymm3 killed %ymm3 def %zmm3
; AVX512DQ-NEXT: ## kill: def %ymm2 killed %ymm2 def %zmm2
; AVX512DQ-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512DQ-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512DQ-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
; AVX512DQ-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; AVX512DQ-NEXT: kandnw %k0, %k1, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%x_gt_y = icmp sgt <4 x i64> %x, %y
@ -544,9 +556,16 @@ define <4 x i32> @test4(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1, <4 x i64> %y1
define <2 x i64> @test5(<2 x i64> %x, <2 x i64> %y, <2 x i64> %x1, <2 x i64> %y1) {
; KNL-LABEL: test5:
; KNL: ## %bb.0:
; KNL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; KNL-NEXT: vpcmpgtq %xmm3, %xmm2, %xmm1
; KNL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; KNL-NEXT: ## kill: def %xmm3 killed %xmm3 def %zmm3
; KNL-NEXT: ## kill: def %xmm2 killed %xmm2 def %zmm2
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpcmpgtq %zmm0, %zmm1, %k0
; KNL-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; KNL-NEXT: kandnw %k1, %k0, %k1
; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test5:
@ -559,16 +578,30 @@ define <2 x i64> @test5(<2 x i64> %x, <2 x i64> %y, <2 x i64> %x1, <2 x i64> %y1
;
; AVX512BW-LABEL: test5:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; AVX512BW-NEXT: vpcmpgtq %xmm3, %xmm2, %xmm1
; AVX512BW-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; AVX512BW-NEXT: ## kill: def %xmm3 killed %xmm3 def %zmm3
; AVX512BW-NEXT: ## kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512BW-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512BW-NEXT: vpcmpgtq %zmm0, %zmm1, %k0
; AVX512BW-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; AVX512BW-NEXT: kandnw %k1, %k0, %k1
; AVX512BW-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test5:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; AVX512DQ-NEXT: vpcmpgtq %xmm3, %xmm2, %xmm1
; AVX512DQ-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
; AVX512DQ-NEXT: ## kill: def %xmm3 killed %xmm3 def %zmm3
; AVX512DQ-NEXT: ## kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512DQ-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vpcmpgtq %zmm0, %zmm1, %k0
; AVX512DQ-NEXT: vpcmpgtq %zmm3, %zmm2, %k1
; AVX512DQ-NEXT: kandnw %k1, %k0, %k0
; AVX512DQ-NEXT: vpmovm2q %k0, %zmm0
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%x_gt_y = icmp slt <2 x i64> %x, %y
%x1_gt_y1 = icmp sgt <2 x i64> %x1, %y1
@ -795,10 +828,17 @@ define <4 x i1> @test11(<4 x i1>%a, <4 x i1>%b, i32 %a1, i32 %b1) {
; KNL-LABEL: test11:
; KNL: ## %bb.0:
; KNL-NEXT: cmpl %esi, %edi
; KNL-NEXT: jg LBB20_2
; KNL-NEXT: ## %bb.1:
; KNL-NEXT: vmovaps %xmm1, %xmm0
; KNL-NEXT: LBB20_2:
; KNL-NEXT: jg LBB20_1
; KNL-NEXT: ## %bb.2:
; KNL-NEXT: vpslld $31, %xmm1, %xmm0
; KNL-NEXT: jmp LBB20_3
; KNL-NEXT: LBB20_1:
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: LBB20_3:
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test11:
@ -818,19 +858,33 @@ define <4 x i1> @test11(<4 x i1>%a, <4 x i1>%b, i32 %a1, i32 %b1) {
; AVX512BW-LABEL: test11:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: cmpl %esi, %edi
; AVX512BW-NEXT: jg LBB20_2
; AVX512BW-NEXT: ## %bb.1:
; AVX512BW-NEXT: vmovaps %xmm1, %xmm0
; AVX512BW-NEXT: LBB20_2:
; AVX512BW-NEXT: jg LBB20_1
; AVX512BW-NEXT: ## %bb.2:
; AVX512BW-NEXT: vpslld $31, %xmm1, %xmm0
; AVX512BW-NEXT: jmp LBB20_3
; AVX512BW-NEXT: LBB20_1:
; AVX512BW-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512BW-NEXT: LBB20_3:
; AVX512BW-NEXT: vptestmd %zmm0, %zmm0, %k1
; AVX512BW-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test11:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: cmpl %esi, %edi
; AVX512DQ-NEXT: jg LBB20_2
; AVX512DQ-NEXT: ## %bb.1:
; AVX512DQ-NEXT: vmovaps %xmm1, %xmm0
; AVX512DQ-NEXT: LBB20_2:
; AVX512DQ-NEXT: jg LBB20_1
; AVX512DQ-NEXT: ## %bb.2:
; AVX512DQ-NEXT: vpslld $31, %xmm1, %xmm0
; AVX512DQ-NEXT: jmp LBB20_3
; AVX512DQ-NEXT: LBB20_1:
; AVX512DQ-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512DQ-NEXT: LBB20_3:
; AVX512DQ-NEXT: vptestmd %zmm0, %zmm0, %k0
; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
%mask = icmp sgt i32 %a1, %b1
%c = select i1 %mask, <4 x i1>%a, <4 x i1>%b
@ -1271,8 +1325,7 @@ define <32 x i16> @test21(<32 x i16> %x , <32 x i1> %mask) nounwind readnone {
define void @test22(<4 x i1> %a, <4 x i1>* %addr) {
; KNL-LABEL: test22:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %ymm0
; KNL-NEXT: vpslld $31, %ymm0, %ymm0
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, (%rdi)
@ -1288,8 +1341,7 @@ define void @test22(<4 x i1> %a, <4 x i1>* %addr) {
;
; AVX512BW-LABEL: test22:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 def %ymm0
; AVX512BW-NEXT: vpslld $31, %ymm0, %ymm0
; AVX512BW-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512BW-NEXT: vptestmd %zmm0, %zmm0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, (%rdi)
@ -1298,8 +1350,7 @@ define void @test22(<4 x i1> %a, <4 x i1>* %addr) {
;
; AVX512DQ-LABEL: test22:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 def %ymm0
; AVX512DQ-NEXT: vpslld $31, %ymm0, %ymm0
; AVX512DQ-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512DQ-NEXT: vptestmd %zmm0, %zmm0, %k0
; AVX512DQ-NEXT: kmovb %k0, (%rdi)
; AVX512DQ-NEXT: vzeroupper
@ -1311,8 +1362,7 @@ define void @test22(<4 x i1> %a, <4 x i1>* %addr) {
define void @test23(<2 x i1> %a, <2 x i1>* %addr) {
; KNL-LABEL: test23:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpsllq $63, %zmm0, %zmm0
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, (%rdi)
@ -1328,8 +1378,7 @@ define void @test23(<2 x i1> %a, <2 x i1>* %addr) {
;
; AVX512BW-LABEL: test23:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512BW-NEXT: vpsllq $63, %zmm0, %zmm0
; AVX512BW-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512BW-NEXT: vptestmq %zmm0, %zmm0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, (%rdi)
@ -1338,8 +1387,7 @@ define void @test23(<2 x i1> %a, <2 x i1>* %addr) {
;
; AVX512DQ-LABEL: test23:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512DQ-NEXT: vpsllq $63, %zmm0, %zmm0
; AVX512DQ-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512DQ-NEXT: vptestmq %zmm0, %zmm0, %k0
; AVX512DQ-NEXT: kmovb %k0, (%rdi)
; AVX512DQ-NEXT: vzeroupper
@ -1390,10 +1438,9 @@ define void @store_v1i1(<1 x i1> %c , <1 x i1>* %ptr) {
define void @store_v2i1(<2 x i1> %c , <2 x i1>* %ptr) {
; KNL-LABEL: store_v2i1:
; KNL: ## %bb.0:
; KNL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm1, %xmm0, %xmm0
; KNL-NEXT: vpsllq $63, %zmm0, %zmm0
; KNL-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL-NEXT: vptestmq %zmm0, %zmm0, %k0
; KNL-NEXT: knotw %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, (%rdi)
; KNL-NEXT: vzeroupper
@ -1409,10 +1456,9 @@ define void @store_v2i1(<2 x i1> %c , <2 x i1>* %ptr) {
;
; AVX512BW-LABEL: store_v2i1:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; AVX512BW-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: vpsllq $63, %zmm0, %zmm0
; AVX512BW-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512BW-NEXT: vptestmq %zmm0, %zmm0, %k0
; AVX512BW-NEXT: knotw %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, (%rdi)
; AVX512BW-NEXT: vzeroupper
@ -1420,10 +1466,9 @@ define void @store_v2i1(<2 x i1> %c , <2 x i1>* %ptr) {
;
; AVX512DQ-LABEL: store_v2i1:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: vpsllq $63, %zmm0, %zmm0
; AVX512DQ-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512DQ-NEXT: vptestmq %zmm0, %zmm0, %k0
; AVX512DQ-NEXT: knotw %k0, %k0
; AVX512DQ-NEXT: kmovb %k0, (%rdi)
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq
@ -1435,10 +1480,9 @@ define void @store_v2i1(<2 x i1> %c , <2 x i1>* %ptr) {
define void @store_v4i1(<4 x i1> %c , <4 x i1>* %ptr) {
; KNL-LABEL: store_v4i1:
; KNL: ## %bb.0:
; KNL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; KNL-NEXT: vpxor %xmm1, %xmm0, %xmm0
; KNL-NEXT: vpslld $31, %ymm0, %ymm0
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: knotw %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, (%rdi)
; KNL-NEXT: vzeroupper
@ -1454,10 +1498,9 @@ define void @store_v4i1(<4 x i1> %c , <4 x i1>* %ptr) {
;
; AVX512BW-LABEL: store_v4i1:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; AVX512BW-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX512BW-NEXT: vpslld $31, %ymm0, %ymm0
; AVX512BW-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512BW-NEXT: vptestmd %zmm0, %zmm0, %k0
; AVX512BW-NEXT: knotw %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, (%rdi)
; AVX512BW-NEXT: vzeroupper
@ -1465,10 +1508,9 @@ define void @store_v4i1(<4 x i1> %c , <4 x i1>* %ptr) {
;
; AVX512DQ-LABEL: store_v4i1:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; AVX512DQ-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX512DQ-NEXT: vpslld $31, %ymm0, %ymm0
; AVX512DQ-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512DQ-NEXT: vptestmd %zmm0, %zmm0, %k0
; AVX512DQ-NEXT: knotw %k0, %k0
; AVX512DQ-NEXT: kmovb %k0, (%rdi)
; AVX512DQ-NEXT: vzeroupper
; AVX512DQ-NEXT: retq

@ -72,9 +72,13 @@ define <8 x i64> @test6_unsigned(<8 x i64> %x, <8 x i64> %y, <8 x i64> %x1) noun
define <4 x float> @test7(<4 x float> %a, <4 x float> %b) {
; KNL-LABEL: test7:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vxorps %xmm2, %xmm2, %xmm2
; KNL-NEXT: vcmpltps %xmm2, %xmm0, %xmm2
; KNL-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: vcmpltps %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmps %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test7:
@ -92,9 +96,13 @@ define <4 x float> @test7(<4 x float> %a, <4 x float> %b) {
define <2 x double> @test8(<2 x double> %a, <2 x double> %b) {
; KNL-LABEL: test8:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vxorpd %xmm2, %xmm2, %xmm2
; KNL-NEXT: vcmpltpd %xmm2, %xmm0, %xmm2
; KNL-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: vcmpltpd %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test8:
@ -537,8 +545,11 @@ define <16 x i8>@test29(<16 x i32> %x, <16 x i32> %y, <16 x i32> %x1, <16 x i32>
define <4 x double> @test30(<4 x double> %x, <4 x double> %y) nounwind {
; KNL-LABEL: test30:
; KNL: ## %bb.0:
; KNL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm2
; KNL-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; KNL-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; KNL-NEXT: vcmpeqpd %zmm1, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: test30:
@ -555,8 +566,13 @@ define <4 x double> @test30(<4 x double> %x, <4 x double> %y) nounwind {
define <2 x double> @test31(<2 x double> %x, <2 x double> %x1, <2 x double>* %yp) nounwind {
; KNL-LABEL: test31:
; KNL: ## %bb.0:
; KNL-NEXT: vcmpltpd (%rdi), %xmm0, %xmm2
; KNL-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vmovupd (%rdi), %xmm2
; KNL-NEXT: vcmpltpd %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test31:
@ -574,8 +590,12 @@ define <2 x double> @test31(<2 x double> %x, <2 x double> %x1, <2 x double>* %yp
define <4 x double> @test32(<4 x double> %x, <4 x double> %x1, <4 x double>* %yp) nounwind {
; KNL-LABEL: test32:
; KNL: ## %bb.0:
; KNL-NEXT: vcmpltpd (%rdi), %ymm0, %ymm2
; KNL-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; KNL-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; KNL-NEXT: vmovupd (%rdi), %ymm2
; KNL-NEXT: vcmpltpd %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: test32:
@ -605,8 +625,13 @@ define <8 x double> @test33(<8 x double> %x, <8 x double> %x1, <8 x double>* %yp
define <4 x float> @test34(<4 x float> %x, <4 x float> %x1, <4 x float>* %yp) nounwind {
; KNL-LABEL: test34:
; KNL: ## %bb.0:
; KNL-NEXT: vcmpltps (%rdi), %xmm0, %xmm2
; KNL-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vmovups (%rdi), %xmm2
; KNL-NEXT: vcmpltps %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmps %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test34:
@ -674,9 +699,12 @@ define <8 x double> @test37(<8 x double> %x, <8 x double> %x1, double* %ptr) nou
define <4 x double> @test38(<4 x double> %x, <4 x double> %x1, double* %ptr) nounwind {
; KNL-LABEL: test38:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; KNL-NEXT: vbroadcastsd (%rdi), %ymm2
; KNL-NEXT: vcmpltpd %ymm2, %ymm0, %ymm2
; KNL-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; KNL-NEXT: vcmpltpd %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; KNL-NEXT: retq
;
; SKX-LABEL: test38:
@ -697,9 +725,13 @@ define <4 x double> @test38(<4 x double> %x, <4 x double> %x1, double* %ptr) nou
define <2 x double> @test39(<2 x double> %x, <2 x double> %x1, double* %ptr) nounwind {
; KNL-LABEL: test39:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vmovddup {{.*#+}} xmm2 = mem[0,0]
; KNL-NEXT: vcmpltpd %xmm2, %xmm0, %xmm2
; KNL-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: vcmpltpd %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test39:
@ -763,9 +795,13 @@ define <8 x float> @test41(<8 x float> %x, <8 x float> %x1, float* %ptr) noun
define <4 x float> @test42(<4 x float> %x, <4 x float> %x1, float* %ptr) nounwind {
; KNL-LABEL: test42:
; KNL: ## %bb.0:
; KNL-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vbroadcastss (%rdi), %xmm2
; KNL-NEXT: vcmpltps %xmm2, %xmm0, %xmm2
; KNL-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; KNL-NEXT: vcmpltps %zmm2, %zmm0, %k1
; KNL-NEXT: vblendmps %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test42:

@ -6,18 +6,12 @@ declare <2 x double> @llvm.x86.avx512.mask.vextractf64x2.512(<8 x double>, i32,
define <2 x double>@test_int_x86_avx512_mask_vextractf64x2_512(<8 x double> %x0, <2 x double> %x2, i8 %x3) {
; CHECK-LABEL: test_int_x86_avx512_mask_vextractf64x2_512:
; CHECK: ## %bb.0:
; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm0
; CHECK-NEXT: vmovd %edi, %xmm2
; CHECK-NEXT: kmovw %edi, %k0
; CHECK-NEXT: kshiftrb $1, %k0, %k0
; CHECK-NEXT: kmovw %k0, %eax
; CHECK-NEXT: vpinsrb $8, %eax, %xmm2, %xmm2
; CHECK-NEXT: vpsllq $63, %xmm2, %xmm2
; CHECK-NEXT: vpsraq $63, %zmm2, %zmm2
; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm1
; CHECK-NEXT: vandpd %xmm0, %xmm2, %xmm2
; CHECK-NEXT: vaddpd %xmm0, %xmm1, %xmm0
; CHECK-NEXT: vaddpd %xmm0, %xmm2, %xmm0
; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm2
; CHECK-NEXT: kmovw %edi, %k1
; CHECK-NEXT: vextractf64x2 $1, %zmm0, %xmm1 {%k1}
; CHECK-NEXT: vextractf64x2 $1, %zmm0, %xmm0 {%k1} {z}
; CHECK-NEXT: vaddpd %xmm2, %xmm1, %xmm1
; CHECK-NEXT: vaddpd %xmm1, %xmm0, %xmm0
; CHECK-NEXT: retq
%res = call <2 x double> @llvm.x86.avx512.mask.vextractf64x2.512(<8 x double> %x0,i32 1, <2 x double> %x2, i8 %x3)
%res2 = call <2 x double> @llvm.x86.avx512.mask.vextractf64x2.512(<8 x double> %x0,i32 1, <2 x double> zeroinitializer, i8 %x3)

@ -11,8 +11,11 @@ define <4 x i64> @test256_1(<4 x i64> %x, <4 x i64> %y) nounwind {
;
; NoVLX-LABEL: test256_1:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqq %ymm1, %ymm0, %ymm2
; NoVLX-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp eq <4 x i64> %x, %y
%max = select <4 x i1> %mask, <4 x i64> %x, <4 x i64> %y
@ -28,8 +31,12 @@ define <4 x i64> @test256_2(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1) nounwind
;
; NoVLX-LABEL: test256_2:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0
; NoVLX-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm2, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp sgt <4 x i64> %x, %y
%max = select <4 x i1> %mask, <4 x i64> %x1, <4 x i64> %y
@ -66,11 +73,12 @@ define <4 x i64> @test256_4(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1) nounwind
;
; NoVLX-LABEL: test256_4:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpbroadcastq {{.*#+}} ymm3 = [9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
; NoVLX-NEXT: vpxor %ymm3, %ymm1, %ymm4
; NoVLX-NEXT: vpxor %ymm3, %ymm0, %ymm0
; NoVLX-NEXT: vpcmpgtq %ymm4, %ymm0, %ymm0
; NoVLX-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0
; NoVLX-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpcmpnleuq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm2, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp ugt <4 x i64> %x, %y
%max = select <4 x i1> %mask, <4 x i64> %x1, <4 x i64> %y
@ -289,12 +297,14 @@ define <4 x i64> @test256_10(<4 x i64> %x, <4 x i64> %y, <4 x i64> %x1, <4 x i64
;
; NoVLX-LABEL: test256_10:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %ymm2, %ymm3, %ymm3
; NoVLX-NEXT: vpcmpeqd %ymm4, %ymm4, %ymm4
; NoVLX-NEXT: vpxor %ymm4, %ymm3, %ymm3
; NoVLX-NEXT: vpcmpgtq %ymm1, %ymm0, %ymm1
; NoVLX-NEXT: vpandn %ymm3, %ymm1, %ymm1
; NoVLX-NEXT: vblendvpd %ymm1, %ymm0, %ymm2, %ymm0
; NoVLX-NEXT: # kill: def %ymm3 killed %ymm3 def %zmm3
; NoVLX-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpcmpleq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpcmpleq %zmm2, %zmm3, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm2, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <4 x i64> %x1, %y1
%mask0 = icmp sle <4 x i64> %x, %y
@ -313,10 +323,14 @@ define <4 x i64> @test256_11(<4 x i64> %x, <4 x i64>* %y.ptr, <4 x i64> %x1, <4
;
; NoVLX-LABEL: test256_11:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq (%rdi), %ymm0, %ymm3
; NoVLX-NEXT: vpcmpgtq %ymm2, %ymm1, %ymm2
; NoVLX-NEXT: vpand %ymm2, %ymm3, %ymm2
; NoVLX-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %ymm3
; NoVLX-NEXT: vpcmpgtq %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpgtq %zmm2, %zmm1, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sgt <4 x i64> %x1, %y1
%y = load <4 x i64>, <4 x i64>* %y.ptr, align 4
@ -362,9 +376,12 @@ define <4 x i64> @test256_13(<4 x i64> %x, <4 x i64> %x1, i64* %yb.ptr) nounwind
;
; NoVLX-LABEL: test256_13:
; NoVLX: # %bb.0:
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpbroadcastq (%rdi), %ymm2
; NoVLX-NEXT: vpcmpeqq %ymm2, %ymm0, %ymm2
; NoVLX-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpcmpeqq %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%yb = load i64, i64* %yb.ptr, align 4
%y.0 = insertelement <4 x i64> undef, i64 %yb, i32 0
@ -437,11 +454,14 @@ define <4 x i64> @test256_16(<4 x i64> %x, i64* %yb.ptr, <4 x i64> %x1, <4 x i64
;
; NoVLX-LABEL: test256_16:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %ymm1, %ymm2, %ymm2
; NoVLX-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; NoVLX-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; NoVLX-NEXT: vpbroadcastq (%rdi), %ymm3
; NoVLX-NEXT: vpcmpgtq %ymm3, %ymm0, %ymm3
; NoVLX-NEXT: vpandn %ymm3, %ymm2, %ymm2
; NoVLX-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpcmpgtq %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpleq %zmm1, %zmm2, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <4 x i64> %x1, %y1
%yb = load i64, i64* %yb.ptr, align 4
@ -550,8 +570,11 @@ define <2 x i64> @test128_1(<2 x i64> %x, <2 x i64> %y) nounwind {
;
; NoVLX-LABEL: test128_1:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm2
; NoVLX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp eq <2 x i64> %x, %y
%max = select <2 x i1> %mask, <2 x i64> %x, <2 x i64> %y
@ -567,8 +590,12 @@ define <2 x i64> @test128_2(<2 x i64> %x, <2 x i64> %y, <2 x i64> %x1) nounwind
;
; NoVLX-LABEL: test128_2:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm2, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp sgt <2 x i64> %x, %y
%max = select <2 x i1> %mask, <2 x i64> %x1, <2 x i64> %y
@ -584,10 +611,12 @@ define <4 x i32> @test128_3(<4 x i32> %x, <4 x i32> %y, <4 x i32> %x1) nounwind
;
; NoVLX-LABEL: test128_3:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm0, %xmm0
; NoVLX-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpled %zmm0, %zmm1, %k1
; NoVLX-NEXT: vpblendmd %zmm2, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp sge <4 x i32> %x, %y
%max = select <4 x i1> %mask, <4 x i32> %x1, <4 x i32> %y
@ -603,11 +632,12 @@ define <2 x i64> @test128_4(<2 x i64> %x, <2 x i64> %y, <2 x i64> %x1) nounwind
;
; NoVLX-LABEL: test128_4:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vmovdqa {{.*#+}} xmm3 = [9223372036854775808,9223372036854775808]
; NoVLX-NEXT: vpxor %xmm3, %xmm1, %xmm4
; NoVLX-NEXT: vpxor %xmm3, %xmm0, %xmm0
; NoVLX-NEXT: vpcmpgtq %xmm4, %xmm0, %xmm0
; NoVLX-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpnleuq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm2, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask = icmp ugt <2 x i64> %x, %y
%max = select <2 x i1> %mask, <2 x i64> %x1, <2 x i64> %y
@ -623,8 +653,12 @@ define <4 x i32> @test128_5(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %yp) nounwin
;
; NoVLX-LABEL: test128_5:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpeqd %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %yp, align 4
%mask = icmp eq <4 x i32> %x, %y
@ -641,8 +675,12 @@ define <4 x i32> @test128_5b(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %yp) nounwi
;
; NoVLX-LABEL: test128_5b:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpeqd %zmm0, %zmm2, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %yp, align 4
%mask = icmp eq <4 x i32> %y, %x
@ -659,8 +697,12 @@ define <4 x i32> @test128_6(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) noun
;
; NoVLX-LABEL: test128_6:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpgtd %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp sgt <4 x i32> %x, %y
@ -677,8 +719,12 @@ define <4 x i32> @test128_6b(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_6b:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpgtd %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp slt <4 x i32> %y, %x
@ -695,10 +741,12 @@ define <4 x i32> @test128_7(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) noun
;
; NoVLX-LABEL: test128_7:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpled %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp sle <4 x i32> %x, %y
@ -715,10 +763,12 @@ define <4 x i32> @test128_7b(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_7b:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpled %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp sge <4 x i32> %y, %x
@ -735,9 +785,12 @@ define <4 x i32> @test128_8(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) noun
;
; NoVLX-LABEL: test128_8:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpminud (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpleud %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp ule <4 x i32> %x, %y
@ -754,10 +807,12 @@ define <4 x i32> @test128_8b(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_8b:
; NoVLX: # %bb.0:
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpmaxud %xmm0, %xmm2, %xmm3
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpnltud %zmm0, %zmm2, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp uge <4 x i32> %y, %x
@ -775,10 +830,14 @@ define <4 x i32> @test128_9(<4 x i32> %x, <4 x i32> %y, <4 x i32> %x1, <4 x i32>
;
; NoVLX-LABEL: test128_9:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm3
; NoVLX-NEXT: vpand %xmm2, %xmm3, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm3 killed %xmm3 def %zmm3
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpeqd %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpcmpeqd %zmm3, %zmm2, %k1 {%k1}
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp eq <4 x i32> %x1, %y1
%mask0 = icmp eq <4 x i32> %x, %y
@ -797,12 +856,14 @@ define <2 x i64> @test128_10(<2 x i64> %x, <2 x i64> %y, <2 x i64> %x1, <2 x i64
;
; NoVLX-LABEL: test128_10:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %xmm2, %xmm3, %xmm3
; NoVLX-NEXT: vpcmpeqd %xmm4, %xmm4, %xmm4
; NoVLX-NEXT: vpxor %xmm4, %xmm3, %xmm3
; NoVLX-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm1
; NoVLX-NEXT: vpandn %xmm3, %xmm1, %xmm1
; NoVLX-NEXT: vblendvpd %xmm1, %xmm0, %xmm2, %xmm0
; NoVLX-NEXT: # kill: def %xmm3 killed %xmm3 def %zmm3
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpcmpleq %zmm1, %zmm0, %k1
; NoVLX-NEXT: vpcmpleq %zmm2, %zmm3, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm2, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <2 x i64> %x1, %y1
%mask0 = icmp sle <2 x i64> %x, %y
@ -821,10 +882,14 @@ define <2 x i64> @test128_11(<2 x i64> %x, <2 x i64>* %y.ptr, <2 x i64> %x1, <2
;
; NoVLX-LABEL: test128_11:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq (%rdi), %xmm0, %xmm3
; NoVLX-NEXT: vpcmpgtq %xmm2, %xmm1, %xmm2
; NoVLX-NEXT: vpand %xmm2, %xmm3, %xmm2
; NoVLX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm3
; NoVLX-NEXT: vpcmpgtq %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpgtq %zmm2, %zmm1, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sgt <2 x i64> %x1, %y1
%y = load <2 x i64>, <2 x i64>* %y.ptr, align 4
@ -844,11 +909,14 @@ define <4 x i32> @test128_12(<4 x i32> %x, <4 x i32>* %y.ptr, <4 x i32> %x1, <4
;
; NoVLX-LABEL: test128_12:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd %xmm1, %xmm2, %xmm2
; NoVLX-NEXT: vpminud (%rdi), %xmm0, %xmm3
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm0, %xmm3
; NoVLX-NEXT: vpandn %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm3
; NoVLX-NEXT: vpcmpleud %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpled %zmm1, %zmm2, %k1 {%k1}
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <4 x i32> %x1, %y1
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
@ -867,9 +935,12 @@ define <2 x i64> @test128_13(<2 x i64> %x, <2 x i64> %x1, i64* %yb.ptr) nounwind
;
; NoVLX-LABEL: test128_13:
; NoVLX: # %bb.0:
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm2
; NoVLX-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm2
; NoVLX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpeqq %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%yb = load i64, i64* %yb.ptr, align 4
%y.0 = insertelement <2 x i64> undef, i64 %yb, i32 0
@ -888,11 +959,12 @@ define <4 x i32> @test128_14(<4 x i32> %x, i32* %yb.ptr, <4 x i32> %x1) nounwind
;
; NoVLX-LABEL: test128_14:
; NoVLX: # %bb.0:
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpbroadcastd (%rdi), %xmm2
; NoVLX-NEXT: vpcmpgtd %xmm2, %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpled %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%yb = load i32, i32* %yb.ptr, align 4
%y.0 = insertelement <4 x i32> undef, i32 %yb, i32 0
@ -912,11 +984,14 @@ define <4 x i32> @test128_15(<4 x i32> %x, i32* %yb.ptr, <4 x i32> %x1, <4 x i32
;
; NoVLX-LABEL: test128_15:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtd %xmm1, %xmm2, %xmm2
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpbroadcastd (%rdi), %xmm3
; NoVLX-NEXT: vpcmpgtd %xmm3, %xmm0, %xmm3
; NoVLX-NEXT: vpandn %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpgtd %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpled %zmm1, %zmm2, %k1 {%k1}
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <4 x i32> %x1, %y1
%yb = load i32, i32* %yb.ptr, align 4
@ -938,11 +1013,14 @@ define <2 x i64> @test128_16(<2 x i64> %x, i64* %yb.ptr, <2 x i64> %x1, <2 x i64
;
; NoVLX-LABEL: test128_16:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpgtq %xmm1, %xmm2, %xmm2
; NoVLX-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm3
; NoVLX-NEXT: vpcmpgtq %xmm3, %xmm0, %xmm3
; NoVLX-NEXT: vpandn %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpgtq %zmm3, %zmm0, %k1
; NoVLX-NEXT: vpcmpleq %zmm1, %zmm2, %k1 {%k1}
; NoVLX-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%mask1 = icmp sge <2 x i64> %x1, %y1
%yb = load i64, i64* %yb.ptr, align 4
@ -963,10 +1041,12 @@ define <4 x i32> @test128_17(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_17:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpneqd %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp ne <4 x i32> %x, %y
@ -983,10 +1063,12 @@ define <4 x i32> @test128_18(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_18:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpcmpeqd (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm3, %xmm3
; NoVLX-NEXT: vpxor %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpneqd %zmm0, %zmm2, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp ne <4 x i32> %y, %x
@ -1003,9 +1085,12 @@ define <4 x i32> @test128_19(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_19:
; NoVLX: # %bb.0:
; NoVLX-NEXT: vpmaxud (%rdi), %xmm0, %xmm2
; NoVLX-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpcmpnltud %zmm2, %zmm0, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp uge <4 x i32> %x, %y
@ -1022,10 +1107,12 @@ define <4 x i32> @test128_20(<4 x i32> %x, <4 x i32> %x1, <4 x i32>* %y.ptr) nou
;
; NoVLX-LABEL: test128_20:
; NoVLX: # %bb.0:
; NoVLX-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; NoVLX-NEXT: vmovdqu (%rdi), %xmm2
; NoVLX-NEXT: vpmaxud %xmm0, %xmm2, %xmm3
; NoVLX-NEXT: vpcmpeqd %xmm3, %xmm2, %xmm2
; NoVLX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpcmpnltud %zmm0, %zmm2, %k1
; NoVLX-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; NoVLX-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; NoVLX-NEXT: retq
%y = load <4 x i32>, <4 x i32>* %y.ptr, align 4
%mask = icmp uge <4 x i32> %y, %x

File diff suppressed because it is too large

@ -48,7 +48,6 @@ define <2 x i64> @ext_i2_2i64(i2 %a0) {
;
; AVX512F-LABEL: ext_i2_2i64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: andb $3, %dil
; AVX512F-NEXT: kmovw %edi, %k1
; AVX512F-NEXT: vpbroadcastq {{.*}}(%rip), %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
@ -98,7 +97,6 @@ define <4 x i32> @ext_i4_4i32(i4 %a0) {
;
; AVX512F-LABEL: ext_i4_4i32:
; AVX512F: # %bb.0:
; AVX512F-NEXT: andb $15, %dil
; AVX512F-NEXT: kmovw %edi, %k1
; AVX512F-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
@ -289,7 +287,6 @@ define <4 x i64> @ext_i4_4i64(i4 %a0) {
;
; AVX512F-LABEL: ext_i4_4i64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: andb $15, %dil
; AVX512F-NEXT: kmovw %edi, %k1
; AVX512F-NEXT: vpbroadcastq {{.*}}(%rip), %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0

@ -200,11 +200,9 @@ define void @test10(i64* %base, <4 x i64> %V, <4 x i1> %mask) {
; KNL: # %bb.0:
; KNL-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; KNL-NEXT: vpslld $31, %xmm1, %xmm1
; KNL-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL-NEXT: vmovdqa %ymm1, %ymm1
; KNL-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL-NEXT: kshiftlw $12, %k0, %k0
; KNL-NEXT: kshiftrw $12, %k0, %k1
; KNL-NEXT: vpcompressq %zmm0, (%rdi) {%k1}
; KNL-NEXT: retq
call void @llvm.masked.compressstore.v4i64(<4 x i64> %V, i64* %base, <4 x i1> %mask)
@ -223,10 +221,9 @@ define void @test11(i64* %base, <2 x i64> %V, <2 x i1> %mask) {
; KNL: # %bb.0:
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL-NEXT: vpsraq $63, %zmm1, %zmm1
; KNL-NEXT: vmovdqa %xmm1, %xmm1
; KNL-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL-NEXT: kshiftlw $14, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k1
; KNL-NEXT: vpcompressq %zmm0, (%rdi) {%k1}
; KNL-NEXT: retq
call void @llvm.masked.compressstore.v2i64(<2 x i64> %V, i64* %base, <2 x i1> %mask)
@ -245,10 +242,9 @@ define void @test12(float* %base, <4 x float> %V, <4 x i1> %mask) {
; KNL: # %bb.0:
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpslld $31, %xmm1, %xmm1
; KNL-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL-NEXT: vmovdqa %xmm1, %xmm1
; KNL-NEXT: vpslld $31, %zmm1, %zmm1
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL-NEXT: kshiftlw $12, %k0, %k0
; KNL-NEXT: kshiftrw $12, %k0, %k1
; KNL-NEXT: vcompressps %zmm0, (%rdi) {%k1}
; KNL-NEXT: retq
call void @llvm.masked.compressstore.v4f32(<4 x float> %V, float* %base, <4 x i1> %mask)
@ -269,11 +265,9 @@ define <2 x float> @test13(float* %base, <2 x float> %src0, <2 x i32> %trigger)
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2
; KNL-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0],xmm2[1],xmm1[2],xmm2[3]
; KNL-NEXT: vpcmpeqq %xmm2, %xmm1, %xmm1
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL-NEXT: vmovaps %xmm1, %xmm1
; KNL-NEXT: vpslld $31, %zmm1, %zmm1
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL-NEXT: vpcmpeqq %zmm2, %zmm1, %k0
; KNL-NEXT: kshiftlw $14, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k1
; KNL-NEXT: vexpandps (%rdi), %zmm0 {%k1}
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL-NEXT: retq
@ -296,11 +290,9 @@ define void @test14(float* %base, <2 x float> %V, <2 x i32> %trigger) {
; KNL-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL-NEXT: vpxor %xmm2, %xmm2, %xmm2
; KNL-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0],xmm2[1],xmm1[2],xmm2[3]
; KNL-NEXT: vpcmpeqq %xmm2, %xmm1, %xmm1
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL-NEXT: vmovaps %xmm1, %xmm1
; KNL-NEXT: vpslld $31, %zmm1, %zmm1
; KNL-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL-NEXT: vpcmpeqq %zmm2, %zmm1, %k0
; KNL-NEXT: kshiftlw $14, %k0, %k0
; KNL-NEXT: kshiftrw $14, %k0, %k1
; KNL-NEXT: vcompressps %zmm0, (%rdi) {%k1}
; KNL-NEXT: retq
%mask = icmp eq <2 x i32> %trigger, zeroinitializer

@ -812,11 +812,12 @@ define <4 x float> @test15(float* %base, <4 x i32> %ind, <4 x i1> %mask) {
; KNL_64-LABEL: test15:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_64-NEXT: vmovdqa %xmm1, %xmm1
; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm2
; KNL_64-NEXT: vpslld $31, %ymm1, %ymm0
; KNL_64-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL_64-NEXT: vgatherqps (%rdi,%zmm2,4), %ymm0 {%k1}
; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $12, %k0, %k0
; KNL_64-NEXT: kshiftrw $12, %k0, %k1
; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm1
; KNL_64-NEXT: vgatherqps (%rdi,%zmm1,4), %ymm0 {%k1}
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 killed %ymm0
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -824,12 +825,13 @@ define <4 x float> @test15(float* %base, <4 x i32> %ind, <4 x i1> %mask) {
; KNL_32-LABEL: test15:
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_32-NEXT: vmovdqa %xmm1, %xmm1
; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $12, %k0, %k0
; KNL_32-NEXT: kshiftrw $12, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm2
; KNL_32-NEXT: vpslld $31, %ymm1, %ymm0
; KNL_32-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL_32-NEXT: vgatherqps (%eax,%zmm2,4), %ymm0 {%k1}
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm1
; KNL_32-NEXT: vgatherqps (%eax,%zmm1,4), %ymm0 {%k1}
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 killed %ymm0
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -864,12 +866,10 @@ define <4 x double> @test16(double* %base, <4 x i32> %ind, <4 x i1> %mask, <4 x
; KNL_64-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_64-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_64-NEXT: vmovdqa %ymm1, %ymm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $12, %k0, %k0
; KNL_64-NEXT: kshiftrw $12, %k0, %k1
; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_64-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_64-NEXT: vgatherqpd (%rdi,%zmm0,8), %zmm2 {%k1}
; KNL_64-NEXT: vmovapd %ymm2, %ymm0
; KNL_64-NEXT: retq
@ -879,13 +879,11 @@ define <4 x double> @test16(double* %base, <4 x i32> %ind, <4 x i1> %mask, <4 x
; KNL_32-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_32-NEXT: vmovdqa %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $12, %k0, %k0
; KNL_32-NEXT: kshiftrw $12, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_32-NEXT: vgatherqpd (%eax,%zmm0,8), %zmm2 {%k1}
; KNL_32-NEXT: vmovapd %ymm2, %ymm0
; KNL_32-NEXT: retl
@ -919,9 +917,10 @@ define <2 x double> @test17(double* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x
; KNL_64-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; KNL_64-NEXT: vpsllq $32, %xmm0, %xmm0
; KNL_64-NEXT: vpsraq $32, %zmm0, %zmm0
; KNL_64-NEXT: vmovdqa %xmm1, %xmm1
; KNL_64-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vgatherqpd (%rdi,%zmm0,8), %zmm2 {%k1}
; KNL_64-NEXT: vmovapd %xmm2, %xmm0
; KNL_64-NEXT: vzeroupper
@ -932,10 +931,11 @@ define <2 x double> @test17(double* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x
; KNL_32-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; KNL_32-NEXT: vpsllq $32, %xmm0, %xmm0
; KNL_32-NEXT: vpsraq $32, %zmm0, %zmm0
; KNL_32-NEXT: vmovdqa %xmm1, %xmm1
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_32-NEXT: vgatherqpd (%eax,%zmm0,8), %zmm2 {%k1}
; KNL_32-NEXT: vmovapd %xmm2, %xmm0
; KNL_32-NEXT: vzeroupper
@ -979,9 +979,10 @@ define void @test18(<4 x i32>%a1, <4 x i32*> %ptr, <4 x i1>%mask) {
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_64-NEXT: vmovdqa %xmm2, %xmm2
; KNL_64-NEXT: vpslld $31, %ymm2, %ymm2
; KNL_64-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_64-NEXT: vpslld $31, %xmm2, %xmm2
; KNL_64-NEXT: vptestmd %zmm2, %zmm2, %k0
; KNL_64-NEXT: kshiftlw $12, %k0, %k0
; KNL_64-NEXT: kshiftrw $12, %k0, %k1
; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -990,10 +991,11 @@ define void @test18(<4 x i32>%a1, <4 x i32*> %ptr, <4 x i1>%mask) {
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm1 killed %xmm1 def %ymm1
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_32-NEXT: vmovdqa %xmm2, %xmm2
; KNL_32-NEXT: vpslld $31, %xmm2, %xmm2
; KNL_32-NEXT: vptestmd %zmm2, %zmm2, %k0
; KNL_32-NEXT: kshiftlw $12, %k0, %k0
; KNL_32-NEXT: kshiftrw $12, %k0, %k1
; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm1
; KNL_32-NEXT: vpslld $31, %ymm2, %ymm2
; KNL_32-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -1022,11 +1024,9 @@ define void @test19(<4 x double>%a1, double* %ptr, <4 x i1>%mask, <4 x i64> %ind
; KNL_64-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; KNL_64-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_64-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_64-NEXT: vmovdqa %ymm1, %ymm1
; KNL_64-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $12, %k0, %k0
; KNL_64-NEXT: kshiftrw $12, %k0, %k1
; KNL_64-NEXT: vscatterqpd %zmm0, (%rdi,%zmm2,8) {%k1}
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -1036,12 +1036,10 @@ define void @test19(<4 x double>%a1, double* %ptr, <4 x i1>%mask, <4 x i64> %ind
; KNL_32-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; KNL_32-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_32-NEXT: vmovdqa %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $12, %k0, %k0
; KNL_32-NEXT: kshiftrw $12, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_32-NEXT: vscatterqpd %zmm0, (%eax,%zmm2,8) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -1073,10 +1071,10 @@ define void @test20(<2 x float>%a1, <2 x float*> %ptr, <2 x i1> %mask) {
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_64-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm2, %xmm2
; KNL_64-NEXT: vpslld $31, %ymm2, %ymm2
; KNL_64-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_64-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vscatterqps %ymm0, (,%zmm1) {%k1}
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -1084,12 +1082,12 @@ define void @test20(<2 x float>%a1, <2 x float*> %ptr, <2 x i1> %mask) {
; KNL_32-LABEL: test20:
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_32-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
; KNL_32-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm2, %xmm2
; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm1
; KNL_32-NEXT: vpslld $31, %ymm2, %ymm2
; KNL_32-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_32-NEXT: vscatterqps %ymm0, (,%zmm1) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -1119,10 +1117,11 @@ define void @test21(<2 x i32>%a1, <2 x i32*> %ptr, <2 x i1>%mask) {
; KNL_64-LABEL: test21:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; KNL_64-NEXT: vmovdqa %xmm2, %xmm2
; KNL_64-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_64-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -1131,10 +1130,11 @@ define void @test21(<2 x i32>%a1, <2 x i32*> %ptr, <2 x i1>%mask) {
; KNL_32: # %bb.0:
; KNL_32-NEXT: vpsllq $32, %xmm1, %xmm1
; KNL_32-NEXT: vpsraq $32, %zmm1, %zmm1
; KNL_32-NEXT: vmovdqa %xmm2, %xmm2
; KNL_32-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_32-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -1170,12 +1170,12 @@ define <2 x float> @test22(float* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x fl
; KNL_64-LABEL: test22:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm2 killed %xmm2 def %ymm2
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_64-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm1, %xmm1
; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_64-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_64-NEXT: vgatherqps (%rdi,%zmm0,4), %ymm2 {%k1}
; KNL_64-NEXT: vmovaps %xmm2, %xmm0
; KNL_64-NEXT: vzeroupper
@ -1184,13 +1184,13 @@ define <2 x float> @test22(float* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x fl
; KNL_32-LABEL: test22:
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm2 killed %xmm2 def %ymm2
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm1, %xmm1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_32-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_32-NEXT: vgatherqps (%eax,%zmm0,4), %ymm2 {%k1}
; KNL_32-NEXT: vmovaps %xmm2, %xmm0
; KNL_32-NEXT: vzeroupper
@ -1225,10 +1225,10 @@ define <2 x float> @test22a(float* %base, <2 x i64> %ind, <2 x i1> %mask, <2 x f
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm2 killed %xmm2 def %ymm2
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_64-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm1, %xmm1
; KNL_64-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vgatherqps (%rdi,%zmm0,4), %ymm2 {%k1}
; KNL_64-NEXT: vmovaps %xmm2, %xmm0
; KNL_64-NEXT: vzeroupper
@ -1238,11 +1238,11 @@ define <2 x float> @test22a(float* %base, <2 x i64> %ind, <2 x i1> %mask, <2 x f
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm2 killed %xmm2 def %ymm2
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm1, %xmm1
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_32-NEXT: vgatherqps (%eax,%zmm0,4), %ymm2 {%k1}
; KNL_32-NEXT: vmovaps %xmm2, %xmm0
; KNL_32-NEXT: vzeroupper
@ -1275,30 +1275,30 @@ declare <2 x i64> @llvm.masked.gather.v2i64.v2p0i64(<2 x i64*>, i32, <2 x i1>, <
define <2 x i32> @test23(i32* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x i32> %src0) {
; KNL_64-LABEL: test23:
; KNL_64: # %bb.0:
; KNL_64-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[0,2,2,3]
; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_64-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm1, %xmm1
; KNL_64-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm2 {%k1}
; KNL_64-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm2[0],zero,xmm2[1],zero
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm1 {%k1}
; KNL_64-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
;
; KNL_32-LABEL: test23:
; KNL_32: # %bb.0:
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
; KNL_32-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[0,2,2,3]
; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm1, %xmm1
; KNL_32-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm2 {%k1}
; KNL_32-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm2[0],zero,xmm2[1],zero
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm1 {%k1}
; KNL_32-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
;
@ -1332,27 +1332,27 @@ define <2 x i32> @test23b(i32* %base, <2 x i64> %ind, <2 x i1> %mask, <2 x i32>
; KNL_64-LABEL: test23b:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_64-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
; KNL_64-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm1, %xmm1
; KNL_64-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm2 {%k1}
; KNL_64-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm2[0],zero,xmm2[1],zero
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[0,2,2,3]
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm1 {%k1}
; KNL_64-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
;
; KNL_32-LABEL: test23b:
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
; KNL_32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm1, %xmm1
; KNL_32-NEXT: vpslld $31, %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm2 {%k1}
; KNL_32-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm2[0],zero,xmm2[1],zero
; KNL_32-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[0,2,2,3]
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm1 {%k1}
; KNL_32-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
;
@ -1433,9 +1433,10 @@ define <2 x i64> @test25(i64* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x i64> %
; KNL_64-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; KNL_64-NEXT: vpsllq $32, %xmm0, %xmm0
; KNL_64-NEXT: vpsraq $32, %zmm0, %zmm0
; KNL_64-NEXT: vmovdqa %xmm1, %xmm1
; KNL_64-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_64-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vpgatherqq (%rdi,%zmm0,8), %zmm2 {%k1}
; KNL_64-NEXT: vmovdqa %xmm2, %xmm0
; KNL_64-NEXT: vzeroupper
@ -1446,10 +1447,11 @@ define <2 x i64> @test25(i64* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x i64> %
; KNL_32-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; KNL_32-NEXT: vpsllq $32, %xmm0, %xmm0
; KNL_32-NEXT: vpsraq $32, %zmm0, %zmm0
; KNL_32-NEXT: vmovdqa %xmm1, %xmm1
; KNL_32-NEXT: vpsllq $63, %xmm1, %xmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm2 {%k1}
; KNL_32-NEXT: vmovdqa %xmm2, %xmm0
; KNL_32-NEXT: vzeroupper
@ -1500,10 +1502,8 @@ define <2 x i64> @test26(i64* %base, <2 x i32> %ind, <2 x i64> %src0) {
; KNL_32-NEXT: vpsllq $32, %xmm0, %xmm0
; KNL_32-NEXT: vpsraq $32, %zmm0, %zmm0
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2
; KNL_32-NEXT: vmovdqa %xmm2, %xmm2
; KNL_32-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_32-NEXT: movb $3, %cl
; KNL_32-NEXT: kmovw %ecx, %k1
; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm1 {%k1}
; KNL_32-NEXT: vmovdqa %xmm1, %xmm0
; KNL_32-NEXT: vzeroupper
@ -1597,10 +1597,8 @@ define void @test28(<2 x i32>%a1, <2 x i32*> %ptr) {
; KNL_32-NEXT: vpsllq $32, %xmm1, %xmm1
; KNL_32-NEXT: vpsraq $32, %zmm1, %zmm1
; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; KNL_32-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2
; KNL_32-NEXT: vmovdqa %xmm2, %xmm2
; KNL_32-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_32-NEXT: movb $3, %al
; KNL_32-NEXT: kmovw %eax, %k1
; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -1686,83 +1684,80 @@ declare <3 x i32> @llvm.masked.gather.v3i32.v3p0i32(<3 x i32*>, i32, <3 x i1>, <
define <3 x i32> @test30(<3 x i32*> %base, <3 x i32> %ind, <3 x i1> %mask, <3 x i32> %src0) {
; KNL_64-LABEL: test30:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm3 killed %xmm3 def %zmm3
; KNL_64-NEXT: vpslld $31, %xmm2, %xmm2
; KNL_64-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_64-NEXT: kmovw %k1, %eax
; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_64-NEXT: vpsllq $2, %ymm1, %ymm1
; KNL_64-NEXT: vpaddq %ymm1, %ymm0, %ymm1
; KNL_64-NEXT: testb $1, %dil
; KNL_64-NEXT: testb $1, %al
; KNL_64-NEXT: # implicit-def: %xmm0
; KNL_64-NEXT: jne .LBB31_1
; KNL_64-NEXT: # %bb.2: # %else
; KNL_64-NEXT: testb $1, %sil
; KNL_64-NEXT: jne .LBB31_3
; KNL_64-NEXT: .LBB31_4: # %else2
; KNL_64-NEXT: testb $1, %dl
; KNL_64-NEXT: jne .LBB31_5
; KNL_64-NEXT: .LBB31_6: # %else5
; KNL_64-NEXT: vmovd %edi, %xmm1
; KNL_64-NEXT: vpinsrb $4, %esi, %xmm1, %xmm1
; KNL_64-NEXT: vpinsrb $8, %edx, %xmm1, %xmm1
; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_64-NEXT: vblendvps %xmm1, %xmm0, %xmm2, %xmm0
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
; KNL_64-NEXT: .LBB31_1: # %cond.load
; KNL_64-NEXT: je .LBB31_2
; KNL_64-NEXT: # %bb.1: # %cond.load
; KNL_64-NEXT: vmovq %xmm1, %rax
; KNL_64-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; KNL_64-NEXT: testb $1, %sil
; KNL_64-NEXT: .LBB31_2: # %else
; KNL_64-NEXT: kshiftrw $1, %k1, %k0
; KNL_64-NEXT: kmovw %k0, %eax
; KNL_64-NEXT: testb $1, %al
; KNL_64-NEXT: je .LBB31_4
; KNL_64-NEXT: .LBB31_3: # %cond.load1
; KNL_64-NEXT: # %bb.3: # %cond.load1
; KNL_64-NEXT: vpextrq $1, %xmm1, %rax
; KNL_64-NEXT: vpinsrd $1, (%rax), %xmm0, %xmm0
; KNL_64-NEXT: testb $1, %dl
; KNL_64-NEXT: .LBB31_4: # %else2
; KNL_64-NEXT: kshiftrw $2, %k1, %k0
; KNL_64-NEXT: kmovw %k0, %eax
; KNL_64-NEXT: testb $1, %al
; KNL_64-NEXT: je .LBB31_6
; KNL_64-NEXT: .LBB31_5: # %cond.load4
; KNL_64-NEXT: # %bb.5: # %cond.load4
; KNL_64-NEXT: vextracti128 $1, %ymm1, %xmm1
; KNL_64-NEXT: vmovq %xmm1, %rax
; KNL_64-NEXT: vpinsrd $2, (%rax), %xmm0, %xmm0
; KNL_64-NEXT: jmp .LBB31_6
; KNL_64-NEXT: .LBB31_6: # %else5
; KNL_64-NEXT: vmovdqa32 %zmm0, %zmm3 {%k1}
; KNL_64-NEXT: vmovdqa %xmm3, %xmm0
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
;
; KNL_32-LABEL: test30:
; KNL_32: # %bb.0:
; KNL_32-NEXT: pushl %esi
; KNL_32-NEXT: .cfi_def_cfa_offset 8
; KNL_32-NEXT: .cfi_offset %esi, -8
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %edx
; KNL_32-NEXT: subl $12, %esp
; KNL_32-NEXT: .cfi_def_cfa_offset 16
; KNL_32-NEXT: vpslld $31, %xmm2, %xmm2
; KNL_32-NEXT: vptestmd %zmm2, %zmm2, %k1
; KNL_32-NEXT: kmovw %k1, %eax
; KNL_32-NEXT: vpslld $2, %xmm1, %xmm1
; KNL_32-NEXT: vpaddd %xmm1, %xmm0, %xmm1
; KNL_32-NEXT: testb $1, %dl
; KNL_32-NEXT: # implicit-def: %xmm0
; KNL_32-NEXT: jne .LBB31_1
; KNL_32-NEXT: # %bb.2: # %else
; KNL_32-NEXT: testb $1, %cl
; KNL_32-NEXT: jne .LBB31_3
; KNL_32-NEXT: .LBB31_4: # %else2
; KNL_32-NEXT: vpaddd %xmm1, %xmm0, %xmm2
; KNL_32-NEXT: testb $1, %al
; KNL_32-NEXT: # implicit-def: %xmm1
; KNL_32-NEXT: je .LBB31_2
; KNL_32-NEXT: # %bb.1: # %cond.load
; KNL_32-NEXT: vmovd %xmm2, %eax
; KNL_32-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; KNL_32-NEXT: .LBB31_2: # %else
; KNL_32-NEXT: kshiftrw $1, %k1, %k0
; KNL_32-NEXT: kmovw %k0, %eax
; KNL_32-NEXT: testb $1, %al
; KNL_32-NEXT: jne .LBB31_5
; KNL_32-NEXT: .LBB31_6: # %else5
; KNL_32-NEXT: vmovd %edx, %xmm1
; KNL_32-NEXT: vpinsrb $4, %ecx, %xmm1, %xmm1
; KNL_32-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_32-NEXT: vblendvps %xmm1, %xmm0, %xmm2, %xmm0
; KNL_32-NEXT: popl %esi
; KNL_32-NEXT: retl
; KNL_32-NEXT: .LBB31_1: # %cond.load
; KNL_32-NEXT: vmovd %xmm1, %esi
; KNL_32-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; KNL_32-NEXT: testb $1, %cl
; KNL_32-NEXT: je .LBB31_4
; KNL_32-NEXT: .LBB31_3: # %cond.load1
; KNL_32-NEXT: vpextrd $1, %xmm1, %esi
; KNL_32-NEXT: vpinsrd $1, (%esi), %xmm0, %xmm0
; KNL_32-NEXT: # %bb.3: # %cond.load1
; KNL_32-NEXT: vpextrd $1, %xmm2, %eax
; KNL_32-NEXT: vpinsrd $1, (%eax), %xmm1, %xmm1
; KNL_32-NEXT: .LBB31_4: # %else2
; KNL_32-NEXT: vmovdqa {{[0-9]+}}(%esp), %xmm0
; KNL_32-NEXT: kshiftrw $2, %k1, %k0
; KNL_32-NEXT: kmovw %k0, %eax
; KNL_32-NEXT: testb $1, %al
; KNL_32-NEXT: je .LBB31_6
; KNL_32-NEXT: .LBB31_5: # %cond.load4
; KNL_32-NEXT: vpextrd $2, %xmm1, %esi
; KNL_32-NEXT: vpinsrd $2, (%esi), %xmm0, %xmm0
; KNL_32-NEXT: jmp .LBB31_6
; KNL_32-NEXT: # %bb.5: # %cond.load4
; KNL_32-NEXT: vpextrd $2, %xmm2, %eax
; KNL_32-NEXT: vpinsrd $2, (%eax), %xmm1, %xmm1
; KNL_32-NEXT: .LBB31_6: # %else5
; KNL_32-NEXT: vmovdqa32 %zmm1, %zmm0 {%k1}
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; KNL_32-NEXT: addl $12, %esp
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
;
; SKX-LABEL: test30:
; SKX: # %bb.0:
@ -2355,11 +2350,9 @@ define <4 x i64> @test_pr28312(<4 x i64*> %p1, <4 x i1> %k, <4 x i1> %k2,<4 x i6
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_64-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_64-NEXT: vmovdqa %ymm1, %ymm1
; KNL_64-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_64-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_64-NEXT: kshiftlw $12, %k0, %k0
; KNL_64-NEXT: kshiftrw $12, %k0, %k1
; KNL_64-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}
; KNL_64-NEXT: vpaddq %ymm1, %ymm1, %ymm0
; KNL_64-NEXT: vpaddq %ymm0, %ymm1, %ymm0
@ -2376,12 +2369,10 @@ define <4 x i64> @test_pr28312(<4 x i64*> %p1, <4 x i1> %k, <4 x i1> %k2,<4 x i6
; KNL_32-NEXT: subl $32, %esp
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %ymm0
; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
; KNL_32-NEXT: vmovdqa %ymm1, %ymm1
; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k0
; KNL_32-NEXT: kshiftlw $12, %k0, %k0
; KNL_32-NEXT: kshiftrw $12, %k0, %k1
; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
; KNL_32-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}
; KNL_32-NEXT: vpaddq %ymm1, %ymm1, %ymm0
; KNL_32-NEXT: vpaddq %ymm0, %ymm1, %ymm0
@ -2547,14 +2538,14 @@ define <2 x float> @large_index(float* %base, <2 x i128> %ind, <2 x i1> %mask, <
; KNL_64-LABEL: large_index:
; KNL_64: # %bb.0:
; KNL_64-NEXT: # kill: def %xmm1 killed %xmm1 def %ymm1
; KNL_64-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; KNL_64-NEXT: vmovaps %xmm0, %xmm0
; KNL_64-NEXT: vmovq %rcx, %xmm2
; KNL_64-NEXT: vmovq %rsi, %xmm3
; KNL_64-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; KNL_64-NEXT: vpslld $31, %ymm0, %ymm0
; KNL_64-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL_64-NEXT: vgatherqps (%rdi,%zmm2,4), %ymm1 {%k1}
; KNL_64-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL_64-NEXT: vptestmq %zmm0, %zmm0, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vmovq %rcx, %xmm0
; KNL_64-NEXT: vmovq %rsi, %xmm2
; KNL_64-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; KNL_64-NEXT: vgatherqps (%rdi,%zmm0,4), %ymm1 {%k1}
; KNL_64-NEXT: vmovaps %xmm1, %xmm0
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -2562,16 +2553,16 @@ define <2 x float> @large_index(float* %base, <2 x i128> %ind, <2 x i1> %mask, <
; KNL_32-LABEL: large_index:
; KNL_32: # %bb.0:
; KNL_32-NEXT: # kill: def %xmm1 killed %xmm1 def %ymm1
; KNL_32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; KNL_32-NEXT: vmovaps %xmm0, %xmm0
; KNL_32-NEXT: vpsllq $63, %xmm0, %xmm0
; KNL_32-NEXT: vptestmq %zmm0, %zmm0, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vmovd {{.*#+}} xmm2 = mem[0],zero,zero,zero
; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm2, %xmm2
; KNL_32-NEXT: vpinsrd $2, {{[0-9]+}}(%esp), %xmm2, %xmm2
; KNL_32-NEXT: vpinsrd $3, {{[0-9]+}}(%esp), %xmm2, %xmm2
; KNL_32-NEXT: vpslld $31, %ymm0, %ymm0
; KNL_32-NEXT: vptestmd %zmm0, %zmm0, %k1
; KNL_32-NEXT: vgatherqps (%eax,%zmm2,4), %ymm1 {%k1}
; KNL_32-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
; KNL_32-NEXT: vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
; KNL_32-NEXT: vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
; KNL_32-NEXT: vgatherqps (%eax,%zmm0,4), %ymm1 {%k1}
; KNL_32-NEXT: vmovaps %xmm1, %xmm0
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl
@ -2700,9 +2691,10 @@ define void @test_scatter_2i32_index(<2 x double> %a1, double* %base, <2 x i32>
; KNL_64-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_64-NEXT: vpsllq $32, %xmm1, %xmm1
; KNL_64-NEXT: vpsraq $32, %zmm1, %zmm1
; KNL_64-NEXT: vmovdqa %xmm2, %xmm2
; KNL_64-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_64-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_64-NEXT: kshiftlw $14, %k0, %k0
; KNL_64-NEXT: kshiftrw $14, %k0, %k1
; KNL_64-NEXT: vscatterqpd %zmm0, (%rdi,%zmm1,8) {%k1}
; KNL_64-NEXT: vzeroupper
; KNL_64-NEXT: retq
@ -2712,10 +2704,11 @@ define void @test_scatter_2i32_index(<2 x double> %a1, double* %base, <2 x i32>
; KNL_32-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; KNL_32-NEXT: vpsllq $32, %xmm1, %xmm1
; KNL_32-NEXT: vpsraq $32, %zmm1, %zmm1
; KNL_32-NEXT: vmovdqa %xmm2, %xmm2
; KNL_32-NEXT: vpsllq $63, %xmm2, %xmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k0
; KNL_32-NEXT: kshiftlw $14, %k0, %k0
; KNL_32-NEXT: kshiftrw $14, %k0, %k1
; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL_32-NEXT: vpsllq $63, %zmm2, %zmm2
; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
; KNL_32-NEXT: vscatterqpd %zmm0, (%eax,%zmm1,8) {%k1}
; KNL_32-NEXT: vzeroupper
; KNL_32-NEXT: retl

@ -99,10 +99,15 @@ define <2 x double> @test6(<2 x i64> %trigger, <2 x double>* %addr, <2 x double>
;
; AVX512F-LABEL: test6:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2
; AVX512F-NEXT: vblendvpd %xmm0, %xmm2, %xmm1, %xmm0
; AVX512F-NEXT: vpcmpeqq %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vblendmpd (%rdi), %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test6:
@ -127,10 +132,15 @@ define <4 x float> @test7(<4 x i32> %trigger, <4 x float>* %addr, <4 x float> %d
;
; AVX512F-LABEL: test7:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2
; AVX512F-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; AVX512F-NEXT: vpcmpeqd %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vblendmps (%rdi), %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test7:
@ -163,10 +173,15 @@ define <4 x i32> @test8(<4 x i32> %trigger, <4 x i32>* %addr, <4 x i32> %dst) {
;
; AVX512F-LABEL: test8:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vpmaskmovd (%rdi), %xmm0, %xmm2
; AVX512F-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; AVX512F-NEXT: vpcmpeqd %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vpblendmd (%rdi), %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test8:
@ -197,9 +212,14 @@ define void @test9(<4 x i32> %trigger, <4 x i32>* %addr, <4 x i32> %val) {
;
; AVX512F-LABEL: test9:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vpmaskmovd %xmm1, %xmm0, (%rdi)
; AVX512F-NEXT: vpcmpeqd %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vmovdqu32 %zmm1, (%rdi) {%k1}
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test9:
@ -237,11 +257,14 @@ define <4 x double> @test10(<4 x i32> %trigger, <4 x double>* %addr, <4 x double
;
; AVX512F-LABEL: test10:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vpmovsxdq %xmm0, %ymm0
; AVX512F-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2
; AVX512F-NEXT: vblendvpd %ymm0, %ymm2, %ymm1, %ymm0
; AVX512F-NEXT: vpcmpeqd %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vblendmpd (%rdi), %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: test10:
@ -277,10 +300,13 @@ define <4 x double> @test10b(<4 x i32> %trigger, <4 x double>* %addr, <4 x doubl
;
; AVX512F-LABEL: test10b:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0
; AVX512F-NEXT: vpmovsxdq %xmm0, %ymm0
; AVX512F-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm0
; AVX512F-NEXT: vpcmpeqd %zmm1, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vmovupd (%rdi), %zmm0 {%k1} {z}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: test10b:
@ -525,11 +551,14 @@ define void @test14(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %val) {
;
; AVX512F-LABEL: test14:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3]
; AVX512F-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; AVX512F-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi)
; AVX512F-NEXT: vpcmpeqq %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vmovups %zmm1, (%rdi) {%k1}
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test14:
@ -569,10 +598,12 @@ define void @test15(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %val) {
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3]
; AVX512F-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; AVX512F-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
; AVX512F-NEXT: vpmaskmovd %xmm1, %xmm0, (%rdi)
; AVX512F-NEXT: vpcmpeqq %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm1[0,2,2,3]
; AVX512F-NEXT: vmovdqu32 %zmm0, (%rdi) {%k1}
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test15:
@ -610,12 +641,15 @@ define <2 x float> @test16(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %
;
; AVX512F-LABEL: test16:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3]
; AVX512F-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; AVX512F-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2
; AVX512F-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; AVX512F-NEXT: vpcmpeqq %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vblendmps (%rdi), %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test16:
@ -659,12 +693,13 @@ define <2 x i32> @test17(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %dst) {
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3]
; AVX512F-NEXT: vpcmpeqq %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; AVX512F-NEXT: vpmaskmovd (%rdi), %xmm0, %xmm2
; AVX512F-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[0,2,2,3]
; AVX512F-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; AVX512F-NEXT: vpcmpeqq %zmm2, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm1[0,2,2,3]
; AVX512F-NEXT: vmovdqu32 (%rdi), %zmm0 {%k1}
; AVX512F-NEXT: vpmovsxdq %xmm0, %xmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test17:
@ -704,9 +739,12 @@ define <2 x float> @test18(<2 x i32> %trigger, <2 x float>* %addr) {
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; AVX512F-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0
; AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,2],zero,zero
; AVX512F-NEXT: vmaskmovps (%rdi), %xmm0, %xmm0
; AVX512F-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
; AVX512F-NEXT: kshiftlw $14, %k0, %k0
; AVX512F-NEXT: kshiftrw $14, %k0, %k1
; AVX512F-NEXT: vmovups (%rdi), %zmm0 {%k1} {z}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test18:
@ -729,8 +767,11 @@ define <4 x float> @load_all(<4 x i32> %trigger, <4 x float>* %addr) {
;
; AVX512F-LABEL: load_all:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
; AVX512F-NEXT: vmaskmovps (%rdi), %xmm0, %xmm0
; AVX512F-NEXT: movw $15, %ax
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovups (%rdi), %zmm0 {%k1} {z}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: load_all:
@ -755,9 +796,12 @@ define <4 x float> @mload_constmask_v4f32(<4 x float>* %addr, <4 x float> %dst)
;
; AVX512F-LABEL: mload_constmask_v4f32:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovaps {{.*#+}} xmm1 = [4294967295,0,4294967295,4294967295]
; AVX512F-NEXT: vmaskmovps (%rdi), %xmm1, %xmm2
; AVX512F-NEXT: vblendvps %xmm1, %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: movw $13, %ax
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovups (%rdi), %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4f32:
@ -789,9 +833,12 @@ define <4 x i32> @mload_constmask_v4i32(<4 x i32>* %addr, <4 x i32> %dst) {
;
; AVX512F-LABEL: mload_constmask_v4i32:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovdqa {{.*#+}} xmm1 = [0,4294967295,4294967295,4294967295]
; AVX512F-NEXT: vpmaskmovd (%rdi), %xmm1, %xmm2
; AVX512F-NEXT: vblendvps %xmm1, %xmm2, %xmm0, %xmm0
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: movw $14, %ax
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqu32 (%rdi), %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4i32:
@ -843,9 +890,11 @@ define <4 x double> @mload_constmask_v4f64(<4 x double>* %addr, <4 x double> %ds
;
; AVX512F-LABEL: mload_constmask_v4f64:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovapd {{.*#+}} ymm1 = [18446744073709551615,18446744073709551615,18446744073709551615,0]
; AVX512F-NEXT: vmaskmovpd (%rdi), %ymm1, %ymm2
; AVX512F-NEXT: vblendvpd %ymm1, %ymm2, %ymm0, %ymm0
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512F-NEXT: movb $7, %al
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovupd (%rdi), %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4f64:
@ -898,9 +947,11 @@ define <4 x i64> @mload_constmask_v4i64(<4 x i64>* %addr, <4 x i64> %dst) {
;
; AVX512F-LABEL: mload_constmask_v4i64:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovdqa {{.*#+}} ymm1 = [18446744073709551615,0,0,18446744073709551615]
; AVX512F-NEXT: vpmaskmovq (%rdi), %ymm1, %ymm2
; AVX512F-NEXT: vblendvpd %ymm1, %ymm2, %ymm0, %ymm0
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512F-NEXT: movb $9, %al
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0 {%k1}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4i64:
@ -950,8 +1001,10 @@ define <4 x double> @mload_constmask_v4f64_undef_passthrough(<4 x double>* %addr
;
; AVX512F-LABEL: mload_constmask_v4f64_undef_passthrough:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovapd {{.*#+}} ymm0 = [18446744073709551615,18446744073709551615,18446744073709551615,0]
; AVX512F-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm0
; AVX512F-NEXT: movb $7, %al
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovupd (%rdi), %zmm0 {%k1} {z}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4f64_undef_passthrough:
@ -979,8 +1032,10 @@ define <4 x i64> @mload_constmask_v4i64_undef_passthrough(<4 x i64>* %addr) {
;
; AVX512F-LABEL: mload_constmask_v4i64_undef_passthrough:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmovdqa {{.*#+}} ymm0 = [0,18446744073709551615,18446744073709551615,0]
; AVX512F-NEXT: vpmaskmovq (%rdi), %ymm0, %ymm0
; AVX512F-NEXT: movb $6, %al
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0 {%k1} {z}
; AVX512F-NEXT: ## kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; SKX-LABEL: mload_constmask_v4i64_undef_passthrough:
@ -1008,8 +1063,11 @@ define void @test21(<4 x i32> %trigger, <4 x i32>* %addr, <4 x i32> %val) {
;
; AVX512F-LABEL: test21:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
; AVX512F-NEXT: vpmaskmovd %xmm1, %xmm0, (%rdi)
; AVX512F-NEXT: ## kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: movw $15, %ax
; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqu32 %zmm1, (%rdi) {%k1}
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: test21:
@ -1225,7 +1283,14 @@ define void @trunc_mask(<4 x float> %x, <4 x float>* %ptr, <4 x float> %y, <4 x
;
; AVX512F-LABEL: trunc_mask:
; AVX512F: ## %bb.0:
; AVX512F-NEXT: vmaskmovps %xmm0, %xmm2, (%rdi)
; AVX512F-NEXT: ## kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: ## kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm1, %k0
; AVX512F-NEXT: kshiftlw $12, %k0, %k0
; AVX512F-NEXT: kshiftrw $12, %k0, %k1
; AVX512F-NEXT: vmovups %zmm0, (%rdi) {%k1}
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; SKX-LABEL: trunc_mask:

@ -8,32 +8,38 @@ target triple = "x86_64-unknown-linux-gnu"
define void @test(<4 x i1> %m, <4 x x86_fp80> %v, <4 x x86_fp80>*%p) local_unnamed_addr {
; KNL-LABEL: test:
; KNL: # %bb.0: # %bb
; KNL-NEXT: vpextrb $0, %xmm0, %eax
; KNL-NEXT: vpslld $31, %xmm0, %xmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: kshiftrw $1, %k0, %k1
; KNL-NEXT: kmovw %k1, %eax
; KNL-NEXT: kshiftrw $2, %k0, %k1
; KNL-NEXT: kshiftrw $1, %k1, %k2
; KNL-NEXT: kmovw %k1, %ecx
; KNL-NEXT: testb $1, %al
; KNL-NEXT: fld1
; KNL-NEXT: fldz
; KNL-NEXT: fld %st(0)
; KNL-NEXT: fcmovne %st(2), %st(0)
; KNL-NEXT: vpextrb $4, %xmm0, %eax
; KNL-NEXT: testb $1, %al
; KNL-NEXT: testb $1, %cl
; KNL-NEXT: fld %st(1)
; KNL-NEXT: fcmovne %st(3), %st(0)
; KNL-NEXT: vpextrb $8, %xmm0, %eax
; KNL-NEXT: kmovw %k2, %eax
; KNL-NEXT: testb $1, %al
; KNL-NEXT: fld %st(2)
; KNL-NEXT: fcmovne %st(4), %st(0)
; KNL-NEXT: vpextrb $12, %xmm0, %eax
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: testb $1, %al
; KNL-NEXT: fxch %st(3)
; KNL-NEXT: fcmovne %st(4), %st(0)
; KNL-NEXT: fstp %st(4)
; KNL-NEXT: fxch %st(3)
; KNL-NEXT: fstpt (%rdi)
; KNL-NEXT: fxch %st(1)
; KNL-NEXT: fstpt 30(%rdi)
; KNL-NEXT: fxch %st(1)
; KNL-NEXT: fstpt 20(%rdi)
; KNL-NEXT: fxch %st(1)
; KNL-NEXT: fstpt 10(%rdi)
; KNL-NEXT: fstpt (%rdi)
; KNL-NEXT: vzeroupper
; KNL-NEXT: retq
;
; SKX-LABEL: test:

@ -10,17 +10,44 @@
;
define void @signum32a(<4 x float>*) {
; AVX-LABEL: signum32a:
; AVX: # %bb.0: # %entry
; AVX-NEXT: vmovaps (%rdi), %xmm0
; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX-NEXT: vcmpltps %xmm1, %xmm0, %xmm2
; AVX-NEXT: vcvtdq2ps %xmm2, %xmm2
; AVX-NEXT: vcmpltps %xmm0, %xmm1, %xmm0
; AVX-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX-NEXT: vsubps %xmm0, %xmm2, %xmm0
; AVX-NEXT: vmovaps %xmm0, (%rdi)
; AVX-NEXT: retq
; AVX1-LABEL: signum32a:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: vmovaps (%rdi), %xmm0
; AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm2
; AVX1-NEXT: vcvtdq2ps %xmm2, %xmm2
; AVX1-NEXT: vcmpltps %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX1-NEXT: vsubps %xmm0, %xmm2, %xmm0
; AVX1-NEXT: vmovaps %xmm0, (%rdi)
; AVX1-NEXT: retq
;
; AVX2-LABEL: signum32a:
; AVX2: # %bb.0: # %entry
; AVX2-NEXT: vmovaps (%rdi), %xmm0
; AVX2-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX2-NEXT: vcmpltps %xmm1, %xmm0, %xmm2
; AVX2-NEXT: vcvtdq2ps %xmm2, %xmm2
; AVX2-NEXT: vcmpltps %xmm0, %xmm1, %xmm0
; AVX2-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX2-NEXT: vsubps %xmm0, %xmm2, %xmm0
; AVX2-NEXT: vmovaps %xmm0, (%rdi)
; AVX2-NEXT: retq
;
; AVX512F-LABEL: signum32a:
; AVX512F: # %bb.0: # %entry
; AVX512F-NEXT: vmovaps (%rdi), %xmm0
; AVX512F-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vcmpltps %zmm1, %zmm0, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm2, %zmm2, %zmm2 {%k1} {z}
; AVX512F-NEXT: vcvtdq2ps %xmm2, %xmm2
; AVX512F-NEXT: vcmpltps %zmm0, %zmm1, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX512F-NEXT: vsubps %xmm0, %xmm2, %xmm0
; AVX512F-NEXT: vmovaps %xmm0, (%rdi)
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
entry:
%1 = load <4 x float>, <4 x float>* %0
%2 = fcmp olt <4 x float> %1, zeroinitializer
@ -33,19 +60,48 @@ entry:
}
define void @signum64a(<2 x double>*) {
; AVX-LABEL: signum64a:
; AVX: # %bb.0: # %entry
; AVX-NEXT: vmovapd (%rdi), %xmm0
; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX-NEXT: vcmpltpd %xmm1, %xmm0, %xmm2
; AVX-NEXT: vpermilps {{.*#+}} xmm2 = xmm2[0,2,2,3]
; AVX-NEXT: vcvtdq2pd %xmm2, %xmm2
; AVX-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX-NEXT: vsubpd %xmm0, %xmm2, %xmm0
; AVX-NEXT: vmovapd %xmm0, (%rdi)
; AVX-NEXT: retq
; AVX1-LABEL: signum64a:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: vmovapd (%rdi), %xmm0
; AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vcmpltpd %xmm1, %xmm0, %xmm2
; AVX1-NEXT: vpermilps {{.*#+}} xmm2 = xmm2[0,2,2,3]
; AVX1-NEXT: vcvtdq2pd %xmm2, %xmm2
; AVX1-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
; AVX1-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX1-NEXT: vsubpd %xmm0, %xmm2, %xmm0
; AVX1-NEXT: vmovapd %xmm0, (%rdi)
; AVX1-NEXT: retq
;
; AVX2-LABEL: signum64a:
; AVX2: # %bb.0: # %entry
; AVX2-NEXT: vmovapd (%rdi), %xmm0
; AVX2-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX2-NEXT: vcmpltpd %xmm1, %xmm0, %xmm2
; AVX2-NEXT: vpermilps {{.*#+}} xmm2 = xmm2[0,2,2,3]
; AVX2-NEXT: vcvtdq2pd %xmm2, %xmm2
; AVX2-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0
; AVX2-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
; AVX2-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX2-NEXT: vsubpd %xmm0, %xmm2, %xmm0
; AVX2-NEXT: vmovapd %xmm0, (%rdi)
; AVX2-NEXT: retq
;
; AVX512F-LABEL: signum64a:
; AVX512F: # %bb.0: # %entry
; AVX512F-NEXT: vmovapd (%rdi), %xmm0
; AVX512F-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vcmpltpd %zmm1, %zmm0, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm2, %zmm2, %zmm2 {%k1} {z}
; AVX512F-NEXT: vcvtdq2pd %xmm2, %xmm2
; AVX512F-NEXT: vcmpltpd %zmm0, %zmm1, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX512F-NEXT: vsubpd %xmm0, %xmm2, %xmm0
; AVX512F-NEXT: vmovapd %xmm0, (%rdi)
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
entry:
%1 = load <2 x double>, <2 x double>* %0
%2 = fcmp olt <2 x double> %1, zeroinitializer
@ -152,11 +208,11 @@ define void @signum64b(<4 x double>*) {
; AVX512F: # %bb.0: # %entry
; AVX512F-NEXT: vmovapd (%rdi), %ymm0
; AVX512F-NEXT: vxorpd %xmm1, %xmm1, %xmm1
; AVX512F-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2
; AVX512F-NEXT: vpmovqd %zmm2, %ymm2
; AVX512F-NEXT: vcmpltpd %zmm1, %zmm0, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm2, %zmm2, %zmm2 {%k1} {z}
; AVX512F-NEXT: vcvtdq2pd %xmm2, %ymm2
; AVX512F-NEXT: vcmpltpd %ymm0, %ymm1, %ymm0
; AVX512F-NEXT: vpmovqd %zmm0, %ymm0
; AVX512F-NEXT: vcmpltpd %zmm0, %zmm1, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: vcvtdq2pd %xmm0, %ymm0
; AVX512F-NEXT: vsubpd %ymm0, %ymm2, %ymm0
; AVX512F-NEXT: vmovapd %ymm0, (%rdi)

@ -6,7 +6,14 @@
define <2 x i1> @shuf2i1_1_0(<2 x i1> %a) {
; AVX512F-LABEL: shuf2i1_1_0:
; AVX512F: # %bb.0:
; AVX512F-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[2,3,0,1]
; AVX512F-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: shuf2i1_1_0:
@ -36,9 +43,16 @@ define <2 x i1> @shuf2i1_1_0(<2 x i1> %a) {
define <2 x i1> @shuf2i1_1_2(<2 x i1> %a) {
; AVX512F-LABEL: shuf2i1_1_2:
; AVX512F: # %bb.0:
; AVX512F-NEXT: movl $1, %eax
; AVX512F-NEXT: vpsllq $63, %xmm0, %xmm0
; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: movq $-1, %rax
; AVX512F-NEXT: vmovq %rax, %xmm1
; AVX512F-NEXT: vpalignr {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],xmm1[0,1,2,3,4,5,6,7]
; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: shuf2i1_1_2:
@ -73,7 +87,14 @@ define <2 x i1> @shuf2i1_1_2(<2 x i1> %a) {
define <4 x i1> @shuf4i1_3_2_10(<4 x i1> %a) {
; AVX512F-LABEL: shuf4i1_3_2_10:
; AVX512F: # %bb.0:
; AVX512F-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0]
; AVX512F-NEXT: vpslld $31, %xmm0, %xmm0
; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,1,0]
; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k1
; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: shuf4i1_3_2_10:

@ -43,10 +43,22 @@ define <8 x i16> @signbit_sel_v8i16(<8 x i16> %x, <8 x i16> %y, <8 x i16> %mask)
}
define <4 x i32> @signbit_sel_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) {
; AVX12F-LABEL: signbit_sel_v4i32:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v4i32:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v4i32:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm3, %k1
; AVX512F-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4i32:
; AVX512VL: # %bb.0:
@ -60,10 +72,22 @@ define <4 x i32> @signbit_sel_v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask)
}
define <2 x i64> @signbit_sel_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask) {
; AVX12F-LABEL: signbit_sel_v2i64:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v2i64:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v2i64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; AVX512F-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v2i64:
; AVX512VL: # %bb.0:
@ -77,10 +101,22 @@ define <2 x i64> @signbit_sel_v2i64(<2 x i64> %x, <2 x i64> %y, <2 x i64> %mask)
}
define <4 x float> @signbit_sel_v4f32(<4 x float> %x, <4 x float> %y, <4 x i32> %mask) {
; AVX12F-LABEL: signbit_sel_v4f32:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v4f32:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v4f32:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm3, %k1
; AVX512F-NEXT: vblendmps %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4f32:
; AVX512VL: # %bb.0:
@ -94,10 +130,22 @@ define <4 x float> @signbit_sel_v4f32(<4 x float> %x, <4 x float> %y, <4 x i32>
}
define <2 x double> @signbit_sel_v2f64(<2 x double> %x, <2 x double> %y, <2 x i64> %mask) {
; AVX12F-LABEL: signbit_sel_v2f64:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v2f64:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v2f64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; AVX512F-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v2f64:
; AVX512VL: # %bb.0:
@ -203,10 +251,21 @@ define <8 x i32> @signbit_sel_v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i32> %mask)
}
define <4 x i64> @signbit_sel_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask) {
; AVX12F-LABEL: signbit_sel_v4i64:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v4i64:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v4i64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; AVX512F-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; AVX512F-NEXT: vpblendmq %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4i64:
; AVX512VL: # %bb.0:
@ -220,10 +279,21 @@ define <4 x i64> @signbit_sel_v4i64(<4 x i64> %x, <4 x i64> %y, <4 x i64> %mask)
}
define <4 x double> @signbit_sel_v4f64(<4 x double> %x, <4 x double> %y, <4 x i64> %mask) {
; AVX12F-LABEL: signbit_sel_v4f64:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v4f64:
; AVX12: # %bb.0:
; AVX12-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v4f64:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %ymm2 killed %ymm2 def %zmm2
; AVX512F-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtq %zmm2, %zmm3, %k1
; AVX512F-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4f64:
; AVX512VL: # %bb.0:
@ -256,8 +326,13 @@ define <4 x double> @signbit_sel_v4f64_small_mask(<4 x double> %x, <4 x double>
;
; AVX512F-LABEL: signbit_sel_v4f64_small_mask:
; AVX512F: # %bb.0:
; AVX512F-NEXT: vpmovsxdq %xmm2, %ymm2
; AVX512F-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
; AVX512F-NEXT: # kill: def %xmm2 killed %xmm2 def %zmm2
; AVX512F-NEXT: # kill: def %ymm1 killed %ymm1 def %zmm1
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 def %zmm0
; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm3, %k1
; AVX512F-NEXT: vblendmpd %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %ymm0 killed %ymm0 killed %zmm0
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4f64_small_mask:
@ -296,12 +371,23 @@ define <8 x double> @signbit_sel_v8f64(<8 x double> %x, <8 x double> %y, <8 x i6
; (2) FIXME: If we don't care about signed-zero (and NaN?), the compare should still get folded.
define <4 x float> @signbit_sel_v4f32_fcmp(<4 x float> %x, <4 x float> %y, <4 x float> %mask) #0 {
; AVX12F-LABEL: signbit_sel_v4f32_fcmp:
; AVX12F: # %bb.0:
; AVX12F-NEXT: vxorps %xmm2, %xmm2, %xmm2
; AVX12F-NEXT: vcmpltps %xmm2, %xmm0, %xmm2
; AVX12F-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12F-NEXT: retq
; AVX12-LABEL: signbit_sel_v4f32_fcmp:
; AVX12: # %bb.0:
; AVX12-NEXT: vxorps %xmm2, %xmm2, %xmm2
; AVX12-NEXT: vcmpltps %xmm2, %xmm0, %xmm2
; AVX12-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX12-NEXT: retq
;
; AVX512F-LABEL: signbit_sel_v4f32_fcmp:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def %xmm1 killed %xmm1 def %zmm1
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 def %zmm0
; AVX512F-NEXT: vxorps %xmm2, %xmm2, %xmm2
; AVX512F-NEXT: vcmpltps %zmm2, %zmm0, %k1
; AVX512F-NEXT: vblendmps %zmm0, %zmm1, %zmm0 {%k1}
; AVX512F-NEXT: # kill: def %xmm0 killed %xmm0 killed %zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: signbit_sel_v4f32_fcmp:
; AVX512VL: # %bb.0: