Commit Graph

39 Commits

Author SHA1 Message Date
Ahmed Bougacha 7dfaaf3891 [Headers][X86] Fix stream_load (movntdqa) to accept const*.
Per Intel intrinsics guide:
- _mm256_stream_load_si256 takes `__m256i const *'
- _mm_stream_load_si128 takes `__m128i *', for no good reason.

Let's accept const* for both.

llvm-svn: 249213
2015-10-02 23:29:26 +00:00
Andrea Di Biagio f9989b04bf Make test more resilient to FastIsel changes. NFC.
Currently FastISel doesn't know how to select vector bitcasts.
During instruction selection, fast-isel always falls back to SelectionDAG 
every time it encounters a vector bitcast.
As a consequence of this, all the 'packed vector shift by immedate count'
test cases in avx2-builtins.c are optimized by the DAGCombiner.
In particular, the DAGCombiner would always fold trivial stack loads of
constant shift counts into the operands of packed shift builtins.

This behavior would start changing as soon as I reapply revision 249121.
That revision would teach x86 fast-isel how to select bitcasts between vector
types of the same size.

As a consequence of that change, fast-isel would less often fall back to
SelectionDAG. More importantly, DAGCombiner would no longer be able to 
simplify the code by folding the stack reload of a constant.

No functional change.

llvm-svn: 249142
2015-10-02 15:10:22 +00:00
Chandler Carruth cbe6411401 Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly
test that our intrinsics behave the same under -fsigned-char and
-funsigned-char.

This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit
elements, and has for a long time. This is fixed in the same way as
SSE4 handles the case.

The other ISA extensions currently work correctly because they use
specific instruction intrinsics. As soon as they are rewritten in terms
of generic IR, they will need to add these special casts. I've added the
necessary testing to catch this however, so we shouldn't have to chase
it down again.

I considered changing the core typedef to be signed, but that seems like
a bad idea. Notably, it would be an ABI break if anyone is reaching into
the innards of the intrinsic headers and passing __v16qi on an API
boundary. I can't be completely confident that this wouldn't happen due
to a macro expanding in a lambda, etc., so it seems much better to leave
it alone. It also matches GCC's behavior exactly.

A fun side note is that for both GCC and Clang, -funsigned-char really
does change the semantics of __v16qi. To observe this, consider:

  % cat x.cc
  #include <smmintrin.h>
  #include <iostream>

  int main() {
    __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    __v16qi b = _mm_set1_epi8(-1);
    std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
  }
  % clang++ -o x x.cc && ./x
  -1, 1
  % clang++ -funsigned-char -o x x.cc && ./x
  0, 1

However, while this may be surprising, both Clang and GCC agree.

Differential Revision: http://reviews.llvm.org/D13324

llvm-svn: 249097
2015-10-01 23:40:12 +00:00
Ahmed Bougacha beb3a9a970 [Headers] Require x86-registered for r245987 codegen tests.
llvm-svn: 245992
2015-08-25 23:42:55 +00:00
Ahmed Bougacha e9abaad105 [Headers][X86] Add -O0 assembly tests for avx2 intrinsics.
We agreed for r245605 that, as long as we don't affect -O0 codegen
too much, it's OK to use native constructs rather than intrinsics.
Let's test that, starting with AVX2 here.

See PR24580.

Differential Revision: http://reviews.llvm.org/D12212

llvm-svn: 245987
2015-08-25 23:09:05 +00:00
Ahmed Bougacha 5e354cb547 [Headers][X86] Use __builtin_shufflevector in AVX2 broadcasts.
This lets us optimize them better. We agreed to remove the intrinsics,
instead of combining them later, as, at -O0, we generate the expected
instructions. Plus, it's a nice cleanup.

Differential Revision: http://reviews.llvm.org/D10556

llvm-svn: 245605
2015-08-20 20:27:21 +00:00
Michael Kuperstein 877f3cbe84 [X86] Add _mm_broadcastsd_pd intrinsic
_mm_broadcastsd_pd is basically an alias for _mm_movedup_pd, however the alias is only available from AVX2 forward.

llvm-svn: 237698
2015-05-19 14:49:14 +00:00
Michael Kuperstein 6168183e04 [X86] Added _mm256_bslli_epi128 and _mm256_bsrli_epi128.
These two intrinsics are alternative names for _mm256_slli_si256 and _mm256_srli_si256, respectively.

llvm-svn: 237693
2015-05-19 13:05:46 +00:00
Sanjay Patel 0a6da5de55 [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This is nearly identical to the v*f128_si256 parts of r231792 and r232052.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the front end fixes for insert/extract128 intrinsics. 
Corresponding LLVM patch to follow.

llvm-svn: 232109
2015-03-12 21:54:24 +00:00
Juergen Ributzka 9baa03fc07 Lower _mm256_broadcastsi128_si256 directly to a vector shuffle.
Originally we were using the same GCC builtins to lower this AVX2 vector
intrinsic. Instead we will now lower it directly to a vector shuffle.

This will not only allow LLVM to generate better code, but it will also allow us
to remove the GCC intrinsics.

Reviewed by Andrea

This is related to rdar://problem/18742778.

llvm-svn: 231081
2015-03-03 17:22:53 +00:00
Manuel Klimek fa27b8861b Make tests independent of llvm variable naming.
llvm-svn: 229484
2015-02-17 09:49:31 +00:00
Craig Topper 96f9a573b5 [X86] Convert palignr builtin handling to use shuffle form of right shift instead of intrinsics. This should allow the instrinsics to removed from the backend.
llvm-svn: 229474
2015-02-17 07:18:01 +00:00
Craig Topper 370644f66e [X86] Teach clang to lower __builtin_ia32_psrldqi256 and __builtin_ia32_pslldqi256 to vector shuffles the backend recognizes. This is a step towards removing the corresponding intrinsics from the backend.
llvm-svn: 229348
2015-02-16 00:42:49 +00:00
Chandler Carruth 2949e548f4 [x86] Clean up the x86 builtin specs to reflect r217310 in LLVM which
made the 8-bit masks actually 8-bit arguments to these intrinsics.

These builtins are a mess. Many were missing the I qualifier which
I added where obviously correct. Most aren't tested, but I've updated
the relevant tests. I've tried to catch all the things that should
become 'c' in this round.

It's also frustrating because the set of these is really ad-hoc and
doesn't really map that cleanly to the set supported by either GCC or
LLVM. Oh well...

llvm-svn: 217311
2014-09-06 10:30:51 +00:00
Filipe Cabecinhas e897d7e7c2 Fixed a few tests and moved a comment to its proper place
llvm-svn: 208665
2014-05-13 05:21:11 +00:00
Filipe Cabecinhas 5d289b48b1 Patched clang to emit x86 blends as shufflevectors.
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.

LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600

Reviewers: eli.friedman, craig.topper, rafael

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D3601

llvm-svn: 208664
2014-05-13 02:37:02 +00:00
Michael J. Spencer 807cf41e2f Fix test to not depend on llvm optimizations.
llvm-svn: 207062
2014-04-24 02:16:29 +00:00
Eli Friedman 3cd55f49ab Fix argument types of some AVX2 intrinsics.
This fix makes our headers consistent with gcc.

PR17312.

llvm-svn: 191248
2013-09-23 23:52:04 +00:00
Juergen Ributzka 2c2dbf4542 Fix the name and the type of the argument for intrinisc
_mm256_broadcastsi128_si256 to align with the Intel documentation.

This fixes bug PR 16581 and rdar:14747994.

llvm-svn: 188609
2013-08-17 16:40:09 +00:00
Manman Ren f865ba0c0e X86: add more GATHER intrinsics in Clang
Support the following intrinsics:
  _mm_i32gather_pd, _mm256_i32gather_pd,
  _mm_i64gather_pd, _mm256_i64gather_pd,
  _mm_i32gather_ps, _mm256_i32gather_ps,
  _mm_i64gather_ps, _mm256_i64gather_ps,
  _mm_i32gather_epi64, _mm256_i32gather_epi64,
  _mm_i64gather_epi64, _mm256_i64gather_epi64,
  _mm_i32gather_epi32, _mm256_i32gather_epi32,
  _mm_i64gather_epi32, _mm256_i64gather_epi32

llvm-svn: 159410
2012-06-29 05:19:13 +00:00
Manman Ren 86c3250b82 X86: add more GATHER intrinsics in Clang
Corrected type for index of _mm256_mask_i32gather_pd
  from 256-bit to 128-bit
Corrected types for src|dst|mask of _mm256_mask_i64gather_ps
  from 256-bit to 128-bit

Support the following intrinsics:
  _mm_mask_i32gather_epi64, _mm256_mask_i32gather_epi64,
  _mm_mask_i64gather_epi64, _mm256_mask_i64gather_epi64,
  _mm_mask_i32gather_epi32, _mm256_mask_i32gather_epi32,
  _mm_mask_i64gather_epi32, _mm256_mask_i64gather_epi32

llvm-svn: 159403
2012-06-29 00:54:35 +00:00
Manman Ren add5e9e289 X86: add GATHER intrinsics (AVX2) in Clang
Support the following intrinsics:
  _mm_mask_i32gather_pd, _mm256_mask_i32gather_pd, _mm_mask_i64gather_pd
  _mm256_mask_i64gather_pd, _mm_mask_i32gather_ps, _mm256_mask_i32gather_ps
  _mm_mask_i64gather_ps, _mm256_mask_i64gather_ps

llvm-svn: 159222
2012-06-26 19:55:09 +00:00
Craig Topper 26e74e50b6 Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shuffle vector.
llvm-svn: 154906
2012-04-17 05:16:56 +00:00
Craig Topper 8e57855ea0 Change _mm256_permute4x64_epi64 and _mm256_permute4x64_pd to use builtin_shufflevector instead of specific builtins. Old builtins will be removed from llvm now that vpermq/vpermpd are supported by shuffle lowering code.
llvm-svn: 154777
2012-04-15 22:18:10 +00:00
Craig Topper e5ea3b0239 Remove vperm2f* and vperm2i builtins. Same effect can be achieved with builtin_shufflevector.
llvm-svn: 150064
2012-02-08 07:33:36 +00:00
Craig Topper 175543ac78 Add last of the AVX2 intrinsics except for gather.
llvm-svn: 147253
2011-12-24 17:20:15 +00:00
Craig Topper a9387739aa Add AVX2 permute intrinsics. Also add parentheses on some macro arguments in other intrinsic headers.
llvm-svn: 147242
2011-12-24 07:55:25 +00:00
Craig Topper 9a01e58447 Add AVX2 intrinsics for FP vbroadcast, vbroadcasti128, and vpblendd.
llvm-svn: 147240
2011-12-24 05:19:47 +00:00
Craig Topper a6fdbd1807 Intrinsics for AVX2 unpack instructions.
llvm-svn: 147237
2011-12-24 03:58:43 +00:00
Craig Topper f4bb952533 More AVX2 intrinsics for shift, psign, some shuffles, and psadbw.
llvm-svn: 147236
2011-12-24 03:28:57 +00:00
Craig Topper 235a365d58 Add AVX2 multiply intrinsics.
llvm-svn: 147219
2011-12-23 08:31:16 +00:00
Craig Topper 1f2460ad43 Add AVX2 intrinsics for max, min, sign extend, and zero extend.
llvm-svn: 147141
2011-12-22 09:18:58 +00:00
Craig Topper a73baa8050 Add a few more AVX2 intrinsics and fix the type strings on a couple SSE intrinsics.
llvm-svn: 147048
2011-12-21 08:35:05 +00:00
Craig Topper 3fe5ac40db Add AVX2 horizontal add/sub intrinsics.
llvm-svn: 147047
2011-12-21 08:17:40 +00:00
Craig Topper a89747dd1e Add AVX2 intrinsics for pavg, pblend, and pcmp instructions. Also remove unneeded builtins for SSE pcmp. Change SSE pcmpeqq and pcmpgtq to not use builtins and just use vector == and >.
llvm-svn: 146969
2011-12-20 09:55:26 +00:00
Eli Friedman 61221b72ee Attempt to fix test in Release builds.
llvm-svn: 146898
2011-12-19 20:09:01 +00:00
Craig Topper a557e1c122 Add AVX2 intrinsics for and, andn, or, and xor.
llvm-svn: 146862
2011-12-19 09:03:48 +00:00
Craig Topper 94aba2c260 More AVX2 intrinsic support including saturating add/sub and palignr.
llvm-svn: 146857
2011-12-19 07:03:25 +00:00
Craig Topper dec792ebb5 Begin adding AVX2 intrinsics. Necessitated increasing the number of bits used to store builtinID when serializing identifier table.
llvm-svn: 146855
2011-12-19 05:04:33 +00:00