Sanjay Patel 2fbc4e5c49 transform fadd chains to increase parallelism
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.
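Below is a minimal Python sketch of the three-operation case. It assumes the rewrite keeps the first pair and pairs the last two operands, so (((A + B) + C) + D) becomes ((A + B) + (C + D)). The Leaf/FAdd classes and the reassociate3 helper are hypothetical stand-ins, not LLVM's SelectionDAG API. depth() counts the longest chain of dependent adds.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical expression-tree model; NOT LLVM's SelectionDAG API.
@dataclass(frozen=True)
class Leaf:
    name: str

@dataclass(frozen=True)
class FAdd:
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Leaf, FAdd]

def depth(e: Expr) -> int:
    """Length of the longest chain of dependent adds."""
    if isinstance(e, Leaf):
        return 0
    return 1 + max(depth(e.lhs), depth(e.rhs))

def reassociate3(e: Expr) -> Expr:
    """Rewrite (((A + B) + C) + D) into ((A + B) + (C + D)).

    Anything that does not match the three-add chain shape is returned unchanged.
    """
    if isinstance(e, FAdd) and isinstance(e.lhs, FAdd) and isinstance(e.lhs.lhs, FAdd):
        ab = e.lhs.lhs  # (A + B), kept as-is
        c = e.lhs.rhs
        d = e.rhs
        return FAdd(ab, FAdd(c, d))  # (C + D) no longer waits on (A + B)
    return e

def show(e: Expr) -> str:
    if isinstance(e, Leaf):
        return e.name
    return f"({show(e.lhs)} + {show(e.rhs)})"

a, b, c, d = (Leaf(n) for n in "abcd")
chain = FAdd(FAdd(FAdd(a, b), c), d)
tree = reassociate3(chain)
print(show(chain), depth(chain))  # (((a + b) + c) + d) 3
print(show(tree), depth(tree))    # ((a + b) + (c + d)) 2
```

The dependent-add depth drops from 3 to 2, which is already the minimum for four operands.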

In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case of N-1 dependent operations to roughly N/2 dependent
operations. The optimal balanced binary tree would reduce the chain to ceil(log2(N)).
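The small Python model below makes the N-1 vs. N/2 vs. log2(N) comparison concrete. It is hypothetical and not taken from the patch. It assumes the rewrite is applied repeatedly, so each later pair of operands is summed independently and then folded into a running accumulator.

```python
def chain_depth(n: int) -> int:
    """Fully serial chain ((x0 + x1) + x2) + ...: n-1 dependent adds."""
    return max(n - 1, 0)

def paired_depth(n: int) -> int:
    """Model of the pairwise rewrite applied along the whole chain (an assumption).

    acc = x0 + x1, then acc = acc + (x[i] + x[i+1]) for each later pair.
    Each pair sum is independent of acc, so each pair adds one level.
    """
    if n < 2:
        return 0
    acc = 1  # depth of x0 + x1
    i = 2
    while i + 1 < n:
        pair = 1  # x[i] + x[i+1] can run in parallel with acc
        acc = max(acc, pair) + 1
        i += 2
    if i < n:  # one operand left over when n is odd
        acc += 1
    return acc

def balanced_depth(n: int) -> int:
    """Balanced binary tree: ceil(log2(n)), computed exactly with integers."""
    return (n - 1).bit_length() if n > 0 else 0

for n in (4, 8, 16, 32):
    print(n, chain_depth(n), paired_depth(n), balanced_depth(n))
# 4 3 2 2
# 8 7 4 3
# 16 15 8 4
# 32 31 16 5
```

For N = 4 (a chain of exactly 3 adds), the paired form matches the balanced tree. In this model the two start to diverge at N = 7.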

The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference with this patch when running test-suite on btver2 (x86-64)
with -ffast-math.

We can extend this patch to cover other associative operations such as fmul, fmax, fmin,
integer add, and integer mul.
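Integer add and mul are associative, so reassociating them never changes the result. IEEE-754 floating-point addition rounds after every operation and is not associative. That is why this kind of reassociation is normally limited to relaxed floating-point modes such as -ffast-math. The Python sketch below shows the difference. The operand values are arbitrary and were chosen only to make the rounding visible.

```python
def left_chain(a, b, c, d):
    """((a + b) + c) + d: the original serial order."""
    return ((a + b) + c) + d

def paired(a, b, c, d):
    """(a + b) + (c + d): the reassociated order."""
    return (a + b) + (c + d)

# Integers: addition is associative, so both orders agree.
print(left_chain(1, 2, 3, 4), paired(1, 2, 3, 4))  # 10 10

# Doubles: 1e-16 is below half an ulp of 1.0, so adding it alone is lost,
# but 1e-16 + 1e-16 first is large enough to round 1.0 up.
x = (1.0, 0.0, 1e-16, 1e-16)
print(left_chain(*x), paired(*x))  # 1.0 1.0000000000000002
```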

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Differential Revision: http://reviews.llvm.org/D9232

llvm-svn: 236031
2015-04-28 21:03:22 +00:00
clang [cuda] Preserve TLS storage class of host variable even if it's a 2015-04-28 20:31:49 +00:00
clang-tools-extra Disable clang-tools-extra/test/pp-trace/pp-trace-modules.cpp on win32 for now. Investigating. 2015-04-28 17:31:36 +00:00
compiler-rt [asan] Use dl_iterate_phdr on Android. 2015-04-28 18:50:32 +00:00
debuginfo-tests New round of fixes for "Always compile debuginfo-tests for the host triple" 2014-10-18 23:47:59 +00:00
libclc Fix compilation warnings without cl_khr_fp64 2015-04-24 19:54:17 +00:00
libcxx Removed 'complete' from 2408; updated status 2015-04-28 19:35:36 +00:00
libcxxabi libc++abi: work around layering violation 2015-04-28 02:52:47 +00:00
libunwind unwind: remove inclusion of private_typeinfo.h 2015-04-27 16:51:52 +00:00
lld Use MemoryBufferRef instead of MemoryBuffer&. NFC. 2015-04-27 22:48:51 +00:00
lldb [TestMiVar] Enable one of the tests for GCC. 2015-04-28 19:21:57 +00:00
llgo [llgo] add buildbot-slave config 2015-04-08 01:41:46 +00:00
llvm transform fadd chains to increase parallelism 2015-04-28 21:03:22 +00:00
openmp updated copyright date to 2015 2015-04-16 11:10:17 +00:00
polly Extract IslNodeBuilder into its own file 2015-04-27 12:32:24 +00:00