Bruno Cardoso Lopes 811c173523 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local Haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a vector parallel bit twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0xF0F0F0F;
v = v + (v >> 8);
v = v + (v >> 16);
v = v & 0x0000003F;
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
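
As a rough illustration of the steps above, here is a minimal scalar sketch in
C for a single 32-bit element. The names popcount32_parallel and
popcount_v4i32 are hypothetical and are not part of this commit; the loop over
four lanes only stands in for what the vector lowering would do on all lanes
at once, and is not the actual x86 lowering code.

```c
#include <stdint.h>

/* Count the set bits of one 32-bit element with the parallel bit-twiddling
 * steps quoted above. Hypothetical helper, for illustration only. */
static inline uint32_t popcount32_parallel(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* per-2-bit counts */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* per-4-bit counts */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* per-byte counts  */
    v = v + (v >> 8);                                 /* fold bytes       */
    v = v + (v >> 16);                                /* fold halves      */
    return v & 0x0000003Fu;                           /* result is 0..32  */
}

/* Scalar stand-in for a v4i32 operation: the same steps applied to each
 * lane. A real vector lowering would use vector shifts, ANDs and adds. */
static inline void popcount_v4i32(const uint32_t in[4], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = popcount32_parallel(in[i]);
}
```

Every step is a shift, mask, add or subtract with no cross-lane dependency,
which is what makes the sequence a candidate for applying to all vector
elements in parallel.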

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 means -march=corei7-avx without ctpop support, and x86_64h means
-march=core-avx2 with ctpop support.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

llvm-svn: 224725
2014-12-22 19:45:43 +00:00
clang Disable trigraphs in microsoft mode by default. Matches cl.exe. 2014-12-22 18:35:03 +00:00
clang-tools-extra Fixed a typo in a comment. NFC. 2014-12-19 15:37:02 +00:00
compiler-rt [Msan] Fix msan_test.cc inclusions to build the unit tests on FreeBSD 2014-12-22 19:14:23 +00:00
debuginfo-tests New round of fixes for "Always compile debuginfo-tests for the host triple" 2014-10-18 23:47:59 +00:00
libclc Remove wrong semi-colons 2014-12-19 09:18:23 +00:00
libcxx Fix PR22000. __bit_iterator::move_backwards. Also make a note that __bit_iterator 2014-12-22 19:10:11 +00:00
libcxxabi Silence warnings in libunwind. 2014-12-21 14:22:00 +00:00
lld [macho] Minor install_name fixes 2014-12-20 09:22:56 +00:00
lldb Add support for frameless function compact unwind encodings on x86_64/i386. 2014-12-22 11:02:02 +00:00
llgo Test commit 2014-12-19 02:45:48 +00:00
llvm [x86] Add vector @llvm.ctpop intrinsic custom lowering 2014-12-22 19:45:43 +00:00
openmp I apologise in advance for the size of this check-in. At Intel we do 2014-10-07 16:25:50 +00:00
polly (diagnostics) fix typo in test... 2014-12-19 17:22:46 +00:00