Commit Graph

11429 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen 7b73528064 Add assertions to verify that the new interval is clear of the interference.
If these inequalities don't hold, we are creating a live range split that won't
allocate.

llvm-svn: 124917
2011-02-05 01:06:36 +00:00
Jakob Stoklund Olesen e8ac8e93a1 Apparently, it is possible for a block with a landing pad successor to have no calls.
In that case we simply ignore the landing pad and split live ranges before the
first terminator.

llvm-svn: 124907
2011-02-04 23:11:13 +00:00
Devang Patel 116a9d7c38 Merge .debug_loc entries whenever possible to reduce debug_loc size.
llvm-svn: 124904
2011-02-04 22:57:18 +00:00
Nick Lewycky d650b30488 Mark that the return is using EAX so that we don't use it for some other
purpose. Fixes PR9080!

llvm-svn: 124903
2011-02-04 22:44:08 +00:00
Jakob Stoklund Olesen 80a2878b5d Be more accurate about live range splitting at the end of blocks.
If interference reaches the last split point, it is effectively live out and
should be marked as 'MustSpill'.

This can make a difference when the terminator uses a register. There is no way
that register can be reused in the outgoing CFG bundle, even if it isn't live
out.

llvm-svn: 124900
2011-02-04 21:42:06 +00:00
Jakob Stoklund Olesen 096bd8837f Add LiveIntervals::getLastSplitPoint().
A live range cannot be split everywhere in a basic block. A split must go before
the first terminator, and if the variable is live into a landing pad, the split
must happen before the call that can throw.

llvm-svn: 124894
2011-02-04 19:33:11 +00:00
Jakob Stoklund Olesen fefe6ebc73 Verify that one of the ranges produced by region splitting is allocatable.
We should not be attempting a region split if it won't lead to at least one
directly allocatable interval. That could cause infinite splitting loops.

llvm-svn: 124893
2011-02-04 19:33:07 +00:00
Andrew Trick d0548ae750 Introducing a new method of tracking register pressure. We can't
precisely track pressure on a selection DAG, but we can at least keep
it balanced. This design accounts for various interesting aspects of
selection DAGS: register and subregister copies, glued nodes, dead
nodes, unused registers, etc.

Added SUnit::NumRegDefsLeft and ScheduleDAGSDNodes::RegDefIter.

Note: I disabled PrescheduleNodesWithMultipleUses when register
pressure is enabled, based on no evidence other than I don't think it
makes sense to have both enabled.

llvm-svn: 124853
2011-02-04 03:18:17 +00:00
Devang Patel 26ffa01889 DebugLoc associated with a machine instruction is used to emit location entries. DebugLoc associated with a DBG_VALUE is used to identify lexical scope of the variable. After register allocation, while inserting DBG_VALUE remember original debug location for the first instruction and reuse it, otherwise dwarf writer may be mislead in identifying the variable's scope.
llvm-svn: 124845
2011-02-04 01:43:25 +00:00
Evan Cheng f7073d1445 Update comments.
llvm-svn: 124843
2011-02-04 01:10:12 +00:00
Jakob Stoklund Olesen 3295a99fe9 Skip unused values.
llvm-svn: 124842
2011-02-04 00:59:23 +00:00
Jakob Stoklund Olesen b336c50c81 Also compute interference intervals for blocks with no uses.
When the live range is live through a block that doesn't use the register, but
that has interference, region splitting wants to split at the top and bottom of
the basic block.

llvm-svn: 124839
2011-02-04 00:39:20 +00:00
Jakob Stoklund Olesen 66d0f39904 Verify kill flags conservatively.
Allow a live range to end with a kill flag, but don't allow a kill flag that
doesn't end the live range.

This makes the machine code verifier more useful during register allocation when
kill flag computation is deferred.

llvm-svn: 124838
2011-02-04 00:39:18 +00:00
Andrew Trick 3f924e4e87 whitespace
llvm-svn: 124827
2011-02-03 23:00:17 +00:00
Jakob Stoklund Olesen 4a6518e6a8 Ensure that the computed interference intervals actually overlap their basic blocks.
llvm-svn: 124815
2011-02-03 20:29:43 +00:00
Jakob Stoklund Olesen db4cf7e4a4 Tweak debug output from SlotIndexes.
llvm-svn: 124814
2011-02-03 20:29:41 +00:00
Jakob Stoklund Olesen d8f62e2a62 Add debug output and asserts to the phi-connecting code.
llvm-svn: 124813
2011-02-03 20:29:39 +00:00
Jakob Stoklund Olesen 8c0254870b Fix coloring bug when mapping values in the middle of a live-through block.
If the found value is not live-through the block, we should only add liveness up
to the requested slot index. When the value is live-through, the whole block
should be colored.

Bug found by SSA verification in the machine code verifier.

llvm-svn: 124812
2011-02-03 20:29:36 +00:00
Jakob Stoklund Olesen f12e120743 Return live range end points from SplitEditor::enter*/leave*.
These end points come from the inserted copies, and can be passed directly to
useIntv. This simplifies the coloring code.

llvm-svn: 124799
2011-02-03 17:04:16 +00:00
Jakob Stoklund Olesen 2b855eb69c Silence an MSVC warning
llvm-svn: 124798
2011-02-03 17:04:12 +00:00
Eric Christopher ede6267993 Reapply this.
llvm-svn: 124779
2011-02-03 06:18:29 +00:00
Eric Christopher 21933539f2 Temporarily revert 124765 in an attempt to find the cycle breaking bootstrap.
llvm-svn: 124778
2011-02-03 05:40:54 +00:00
Jakob Stoklund Olesen dca2917e25 Defer SplitKit value mapping until all defs are available.
The greedy register allocator revealed some problems with the value mapping in
SplitKit. We would sometimes start mapping values before all defs were known,
and that could change a value from a simple 1-1 mapping to a multi-def mapping
that requires ssa update.

The new approach collects all defs and register assignments first without
filling in any live intervals. Only when finish() is called, do we compute
liveness and mapped values. At this time we know with certainty which values map
to multiple values in a split range.

This also has the advantage that we can compute live ranges based on the
remaining uses after rematerializing at split points.

The current implementation has many opportunities for compile time optimization.

llvm-svn: 124765
2011-02-03 00:54:23 +00:00
Devang Patel be933b470a Add support to describe template value parameter in debug info.
llvm-svn: 124755
2011-02-02 22:35:53 +00:00
Devang Patel 3a9e65efb6 Add support to describe template parameter type in debug info.
llvm-svn: 124752
2011-02-02 21:38:25 +00:00
Evan Cheng d42641c6b5 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252

llvm-svn: 124708
2011-02-02 01:06:55 +00:00
Matt Beaumont-Gay 29c8c8fe92 Take Bill Wendling's suggestion for structuring a couple of asserts.
llvm-svn: 124688
2011-02-01 22:12:50 +00:00
Devang Patel 56cc5fdf09 Keep track of incoming argument's location while emitting LiveIns.
llvm-svn: 124611
2011-01-31 21:38:14 +00:00
Richard Osborne 272e084bca Fix bug where ReduceLoadWidth was creating illegal ZEXTLOAD instructions.
llvm-svn: 124587
2011-01-31 17:41:44 +00:00
Anton Korobeynikov fe3a6e049d Clarify the LSDASection NULL check
llvm-svn: 124569
2011-01-30 22:07:31 +00:00
Jakob Stoklund Olesen 9af7afcb7f Respect the -tail-dup-size command line option even when optimizing for size.
This is similar to the -unroll-threshold option. There should be no change in
behavior when -tail-dup-size is not explicit on the llc command line.

llvm-svn: 124564
2011-01-30 20:38:12 +00:00
Benjamin Kramer 946e1522b6 Teach DAGCombine to fold fold (sra (trunc (sr x, c1)), c2) -> (trunc (sra x, c1+c2) when c1 equals the amount of bits that are truncated off.
This happens all the time when a smul is promoted to a larger type.

On x86-64 we now compile "int test(int x) { return x/10; }" into
  movslq  %edi, %rax
  imulq $1717986919, %rax, %rax
  movq  %rax, %rcx
  shrq  $63, %rcx
  sarq  $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
  addl  %ecx, %eax

This fires 96 times in gcc.c on x86-64.

llvm-svn: 124559
2011-01-30 16:38:43 +00:00
Benjamin Kramer 65bb14d368 Add the missing sub identity "A-(A-B) -> B" to DAGCombine.
This happens e.g. for code like "X - X%10" where we lower the modulo operation
to a series of multiplies and shifts that are then subtracted from X, leading to
this missed optimization.

llvm-svn: 124532
2011-01-29 12:34:05 +00:00
Evan Cheng d983eba7dc Re-apply r124518 with fix. Watch out for invalidated iterator.
llvm-svn: 124526
2011-01-29 04:46:23 +00:00
Evan Cheng 65b8ccf6ac Revert r124518. It broke Linux self-host.
llvm-svn: 124522
2011-01-29 02:43:04 +00:00
Evan Cheng d4eff31476 Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand.
llvm-svn: 124518
2011-01-29 01:29:26 +00:00
Evan Cheng aaa9606b2f Revert r124462. There are a few big regressions that I need to fix first.
llvm-svn: 124478
2011-01-28 07:12:38 +00:00
Nick Lewycky 0af77fd45b Fix build with stdcxx by using llvm::next. Patch by Joerg Sonnenberger!
llvm-svn: 124472
2011-01-28 04:00:15 +00:00
Rafael Espindola 6c17d54891 Print the visibility of declarations.
llvm-svn: 124468
2011-01-28 03:20:10 +00:00
Evan Cheng 417fca86c4 - Stop simplifycfg from duplicating "ret" instructions into unconditional
branches. PR8575, rdar://5134905, rdar://8911460.
- Allow codegen tail duplication to dup small return blocks after register
  allocation is done.

llvm-svn: 124462
2011-01-28 02:19:21 +00:00
Andrew Trick c0ca67601a Remove a temporary workaround for a lencod miscompile. Depends on the fix in r124442.
llvm-svn: 124443
2011-01-27 21:28:51 +00:00
Andrew Trick 13bb644fdd VirtRegRewriter fix: update kill flags, which are used by the scavenger.
rdar://problem/8893967: JM/lencod miscompile at -arch armv7 -mthumb -O3

Added ResurrectKill to remove kill flags after we decide to reused a
physical register. And (hopefully) ensure that we call it in all the
right places.

Sorry, I'm not checking in a unit test given that it's a miscompile I
can't reproduce easily with a toy example. Failures in the rewriter
depend on a series of heuristic decisions maked during one of the many
upstream phases in codegen. This case would require coercing regalloc
to generate a couple of rematerialzations in a way that causes the
scavenger to reuse the same register at just the wrong point.

The general way to test this is to implement kill flags
verification. Then we could have a simple, robust compile-only unit
test. That would be worth doing if the whole pass was not about to
disappear. At this point we focus verification work on the next
generation of regalloc.

llvm-svn: 124442
2011-01-27 21:26:43 +00:00
Devang Patel 1cec755494 Speculatively revert r124380.
llvm-svn: 124397
2011-01-27 19:15:01 +00:00
Devang Patel 3b266a2780 While legalizing SDValues do not drop SDDbgValues, trasfer them to new legal nodes.
Take 2. This includes fix for dragonegg crash.

llvm-svn: 124380
2011-01-27 17:43:53 +00:00
Bob Wilson 2d69fb4184 Avoid modifying the OneClassForEachPhysReg map while iterating over it.
Linear scan regalloc is currently assuming that any register aliased with
a member of a regclass must also be in at least one regclass.  That is not
always true.  For example, for X86, RIP is in a regclass but IP is not.
If you're unlucky, this can cause a crash by invalidating the iterator.

llvm-svn: 124365
2011-01-27 07:26:15 +00:00
Matt Beaumont-Gay a148c59231 Try harder to not have unused variables.
llvm-svn: 124350
2011-01-27 02:39:27 +00:00
Matt Beaumont-Gay 0cddbf2bdf Opt-mode -Wunused-variable cleanup
llvm-svn: 124346
2011-01-27 01:47:50 +00:00
Devang Patel 92b7077f9e Reapply 124301
llvm-svn: 124339
2011-01-27 00:13:27 +00:00
Bill Wendling fb4ee9bbde Initialize variable to get rid of clang warning.
llvm-svn: 124331
2011-01-26 22:21:35 +00:00
Devang Patel b370bf329a Revert 124301.
llvm-svn: 124327
2011-01-26 21:41:22 +00:00
Devang Patel 084e0628e0 Revert r124302
llvm-svn: 124320
2011-01-26 21:12:32 +00:00
David Greene bab5e6ed0e [AVX] Add INSERT_SUBVECTOR and support it on x86. This provides a
default implementation for x86, going through the stack in a similr
fashion to how the codegen implements BUILD_VECTOR.  Eventually this
will get matched to VINSERTF128 if AVX is available.

llvm-svn: 124307
2011-01-26 19:13:22 +00:00
Devang Patel a11210b1b8 While legalizing SDValues do not drop SDDbgValues, trasfer them to new legal nodes.
llvm-svn: 124302
2011-01-26 18:55:05 +00:00
Devang Patel 9d4eb2f480 Process valid SDDbgValues even if the node does not have any order assigned.
llvm-svn: 124301
2011-01-26 18:42:32 +00:00
Devang Patel 1448e7c8b6 Refactor.
llvm-svn: 124300
2011-01-26 18:20:04 +00:00
David Greene b6f1611928 [AVX] Support EXTRACT_SUBVECTOR on x86. This provides a default
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similr fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.

llvm-svn: 124292
2011-01-26 15:38:49 +00:00
Jakob Stoklund Olesen b308902024 Rename member variables to follow the rest of LLVM.
No functional change.

llvm-svn: 124257
2011-01-26 00:50:53 +00:00
Devang Patel efc6b16e4b Provide an interface to transfer SDDbgValue from one SDNode to another.
llvm-svn: 124245
2011-01-25 23:27:42 +00:00
Devang Patel 70f8e5962a Resolve DanglingDbgValue of PHI nodes where the use follows dbg.value intrinisic.
llvm-svn: 124203
2011-01-25 18:09:58 +00:00
Devang Patel 04b649d48a This assertion is too restrictive, it does not apply for dangling dbg value nodes (nodes where dbg.value intrinsic preceds use of the value).
llvm-svn: 124202
2011-01-25 18:09:33 +00:00
Anton Korobeynikov b15beb2ae1 Support printing exception section into the current one. This is the case when LSDASection is blank
llvm-svn: 124150
2011-01-24 22:38:40 +00:00
Devang Patel 533479544b Speculatively revert r124138.
llvm-svn: 124142
2011-01-24 20:04:37 +00:00
Devang Patel 8cc5355c90 Resolve DanglingDbgValue of PHI nodes where the use follows dbg.value intrinisic.
llvm-svn: 124138
2011-01-24 19:24:37 +00:00
Andrew Trick a293c49f0d Temporarily workaround JM/lencod miscompile (SIGSEGV).
rdar://problem/8893967

llvm-svn: 124137
2011-01-24 19:08:15 +00:00
Rafael Espindola b3eca9bb71 Add support for the --noexecstack option.
llvm-svn: 124077
2011-01-23 17:55:27 +00:00
Ted Kremenek 3c4408ceb6 Null initialize a few variables flagged by
clang's -Wuninitialized-experimental warning.
While these don't look like real bugs, clang's
-Wuninitialized-experimental analysis is stricter
than GCC's, and these fixes have the benefit
of being general nice cleanups.

llvm-svn: 124073
2011-01-23 17:05:06 +00:00
Rafael Espindola 4b7b7fba38 Delay the creation of eh_frame so that the user can change the defaults.
Add support for SHT_X86_64_UNWIND.

llvm-svn: 124059
2011-01-23 05:43:40 +00:00
Rafael Espindola 0e7e34e476 Remove more duplicated code.
llvm-svn: 124056
2011-01-23 04:43:11 +00:00
Rafael Espindola aea4958ea6 Remove duplicated code.
llvm-svn: 124054
2011-01-23 04:28:49 +00:00
Andrew Trick bd428ec50f Enable support for precise scheduling of the instruction selection
DAG. Disable using "-disable-sched-cycles".

For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.

Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
  operations can have their latency covered. i.e. two independent 4
  cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
  latency-based stalls on their uses will be prioritized by depth before height
  (height is irrelevant if no stalls occur in the schedule below this point).

llvm-svn: 123971
2011-01-21 06:19:05 +00:00
Andrew Trick 47ff14b091 Convert -enable-sched-cycles and -enable-sched-hazard to -disable
flags. They are still not enable in this revision.

Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.

Generalized unit tests to work with sched-cycles.

llvm-svn: 123969
2011-01-21 05:51:33 +00:00
Jakob Stoklund Olesen 8a46e26b8e SplitKit requires that all defs are in place before calling useIntv().
The value mapping gets confused about which original values have multiple new
definitions so they may need phi insertions.

This could probably be simplified by letting enterIntvBefore() take a live range
to be added following the instruction. As long as the range stays inside the
same basic block, value mapping shouldn't be a problem.

llvm-svn: 123926
2011-01-20 17:45:23 +00:00
Jakob Stoklund Olesen 04e6b3bd21 Add LiveIntervalMap::dumpCache() to print out the cache used by the ssa update algorithm.
llvm-svn: 123925
2011-01-20 17:45:20 +00:00
Eric Christopher 37c4a8be72 My editor's indent went crazy. Fix.
llvm-svn: 123909
2011-01-20 08:56:34 +00:00
Eric Christopher 785db078b4 Expand invalid return values for umulo and smulo. Handle these similarly
to add/sub by doing the normal operation and then checking for overflow
afterwards. This generally relies on the DAG handling the later invalid
operations as well.

Fixes the 64-bit part of rdar://8622122 and rdar://8774702.

llvm-svn: 123908
2011-01-20 08:54:28 +00:00
Evan Cheng b8b0ad80a8 Sorry, several patches in one.
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.

Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.

ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
   which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
   a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
   to re-materialize the instruction, allow machine LICM to hoist the set of
   instructions out of the loop and make it possible to CSE them. It's a bit
   hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.

With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.

llvm-svn: 123905
2011-01-20 08:34:58 +00:00
Andrew Trick 2cd1f0beb6 Selection DAG scheduler register pressure heuristic fixes.
Added a check for already live regs before claiming HighRegPressure.
Fixed a few cases of checking the wrong number of successors.
Added some tracing until these heuristics are better understood.

llvm-svn: 123892
2011-01-20 06:21:59 +00:00
Jakob Stoklund Olesen 4060abb4b9 Check that a live range exists before shortening it. This fixes PR8989.
The live range may have been deleted earlier because of rematerialization.

llvm-svn: 123891
2011-01-20 06:20:02 +00:00
Jakob Stoklund Olesen 145755f1d6 Add hidden -verify-coalescing to run the machine code verifier before and after
register coalescing.

llvm-svn: 123890
2011-01-20 06:20:00 +00:00
Jakob Stoklund Olesen 5acd4a6453 Fix bug found by new clang warning.
llvm-svn: 123872
2011-01-20 02:43:19 +00:00
Eric Christopher b2139f655b Use only one API at a time.
llvm-svn: 123866
2011-01-20 01:29:23 +00:00
Eric Christopher bb14f65672 If we can, lower the multiply part of a umulo/smulo call to a libcall
with an invalid type then split the result and perform the overflow check
normally.

Fixes the 32-bit parts of rdar://8622122 and rdar://8774702.

llvm-svn: 123864
2011-01-20 00:29:24 +00:00
Devang Patel 2d9e532a3a Fix debug info for merged global.
llvm-svn: 123862
2011-01-20 00:02:16 +00:00
Jakob Stoklund Olesen 79be8aecba Divert Hopfield network debug output. It is very noisy.
llvm-svn: 123859
2011-01-19 23:14:59 +00:00
Jakob Stoklund Olesen 509089f5b6 Don't accidentally leave small gaps in the live ranges when leaving the active
interval after an instruction. The leaveIntvAfter() method only adds liveness
from the instruction's boundary index to the inserted copy.

Ideally, SplitKit should be smarter about this, perhaps by combining useIntv()
and leaveIntvAfter() into one method that guarantees continuity.

llvm-svn: 123858
2011-01-19 23:14:56 +00:00
Devang Patel 8698f09dbd Fix register address expression. Patch by Ken Dyck.
llvm-svn: 123856
2011-01-19 23:04:47 +00:00
Jakob Stoklund Olesen 9fb04015ff Implement RAGreedy::splitAroundRegion and remove loop splitting.
Region splitting includes loop splitting as a subset, and it is more generic.
The splitting heuristics for variables that are live in more than one block are
now:

1. Try to create a region that covers multiple basic blocks.
2. Try to create a new live range for each block with multiple uses.
3. Spill.

Steps 2 and 3 are similar to what the standard spiller is doing.

llvm-svn: 123853
2011-01-19 22:11:48 +00:00
Jakob Stoklund Olesen 267f6c1ab2 Add RAGreedy methods for splitting live ranges around regions.
Analyze the live range's behavior entering and leaving basic blocks. Compute an
interference pattern for each allocation candidate, and use SpillPlacement to
find an optimal region where that register can be live.

This code is still not enabled.

llvm-svn: 123774
2011-01-18 21:13:27 +00:00
Jeffrey Yasskin 249fcd4499 Remove unused variables found by gcc-4.6's -Wunused-but-set-variable.
llvm-svn: 123707
2011-01-18 00:51:23 +00:00
Stuart Hastings 4fa832aab0 Remove checking that prevented overlapping CALLSEQ_START/CALLSEQ_END
ranges, add legalizer support for nested calls.  Necessary for ARM
byval support.  Radar 7662569.

llvm-svn: 123704
2011-01-18 00:09:27 +00:00
Benjamin Kramer 45d183ccf0 Fix an off-by-one error in ctpop combining.
llvm-svn: 123664
2011-01-17 18:00:28 +00:00
Benjamin Kramer 24c5184dca Add a DAGCombine to turn (ctpop x) u< 2 into (x & x-1) == 0.
This shaves off 4 popcounts from the hacked 186.crafty source.

This is enabled even when a native popcount instruction is available. The
combined code is one operation longer but it should be faster nevertheless.

llvm-svn: 123621
2011-01-17 12:04:57 +00:00
Chris Lattner 2d186574a6 reapply my fix for PR8961 with a tweak to properly handle
multi-instruction sequences like calls.  Many thanks to Jakob for
finding a testcase.

llvm-svn: 123559
2011-01-16 02:27:38 +00:00
Benjamin Kramer bec03ea725 Add an assert so we don't silently miscompile ctpop for bit widths > 128.
llvm-svn: 123549
2011-01-15 21:19:37 +00:00
Benjamin Kramer fff2517edc Reimplement CTPOP legalization with the "best" algorithm from
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

In a silly microbenchmark on a 65 nm core2 this is 1.5x faster than the old
code in 32 bit mode and about 2x faster in 64 bit mode. It's also a lot shorter,
especially when counting 64 bit population on a 32 bit target.

I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.

llvm-svn: 123547
2011-01-15 20:30:30 +00:00
Ted Kremenek e92b6e436d Update CMake build.
llvm-svn: 123491
2011-01-14 22:58:11 +00:00
Dan Gohman abac063b7a Delete an assignment to ThisBB which isn't needed, and tidy up some
comments.

llvm-svn: 123479
2011-01-14 22:26:16 +00:00
Anton Korobeynikov 9be547cfd3 Add a possibility to switch between CFI directives- and table-based frame description emission. Currently all the backends use table-based stuff.
llvm-svn: 123476
2011-01-14 21:58:08 +00:00
Anton Korobeynikov b46ef57de5 Add CFI directives-based frame information emission. Not hooked yet.
llvm-svn: 123474
2011-01-14 21:57:53 +00:00
Anton Korobeynikov 61d167e92b Split stuff as a preparation for CFI directives-based frame information emission
llvm-svn: 123473
2011-01-14 21:57:45 +00:00
Andrew Trick 9ccce77893 Support for precise scheduling of the instruction selection DAG,
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.

Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
  if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
  latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
  for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
  prioritize nodes by their depth rather than height. As long as it
  doesn't stall, height is irrelevant. Depth represents the critical
  path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
  adding them to the available queue.

Related Cleanup: most of the register reduction routines do not need
to be templates.

llvm-svn: 123468
2011-01-14 21:11:41 +00:00
Jakob Stoklund Olesen ab3d6ecbd2 Try for the third time to teach getFirstTerminator() about debug values.
This time let's rephrase to trick gcc-4.3 into not miscompiling.

llvm-svn: 123432
2011-01-14 06:33:45 +00:00
Jakob Stoklund Olesen c38102889f Revert r123419. It still breaks llvm-gcc-i386-linux-selfhost.
llvm-svn: 123423
2011-01-14 02:12:54 +00:00
Chris Lattner 3be81e9bd7 Set the insertion point correctly for instructions generated by load folding:
they should go *before* the new instruction not after it. 

llvm-svn: 123420
2011-01-14 01:33:40 +00:00
Jakob Stoklund Olesen c0767e029d Try again to teach getFirstTerminator() about debug values.
Fix some callers to better deal with debug values.

llvm-svn: 123419
2011-01-14 01:17:53 +00:00
Jakob Stoklund Olesen 088b30aa48 Better terminator avoidance.
This approach also works when the terminator doesn't have a slot index. (Which
can happen??)

llvm-svn: 123413
2011-01-13 23:35:53 +00:00
Jakob Stoklund Olesen 05a0b55e76 Temporary workaround for an i386 crash in LiveDebugVariables.
llvm-svn: 123400
2011-01-13 21:28:55 +00:00
Jakob Stoklund Olesen 4bc5e38960 Teach frame lowering to ignore debug values after the terminators.
llvm-svn: 123399
2011-01-13 21:28:52 +00:00
Devang Patel 35f4ae26d6 Speculatively revert r123384 to make llvm-gcc-i386-linux-selfhost buildbot happy.
llvm-svn: 123389
2011-01-13 19:27:50 +00:00
Jakob Stoklund Olesen 0e233ae183 Teach MachineBasicBlock::getFirstTerminator to ignore debug values.
It will still return an iterator that points to the first terminator or end(),
but there may be DBG_VALUE instructions following the first terminator.

llvm-svn: 123384
2011-01-13 18:41:05 +00:00
Dan Gohman 958620dd6d Fix r123346 to handle scalar types too.
llvm-svn: 123352
2011-01-13 01:06:51 +00:00
Jakob Stoklund Olesen 9472847bcc Add missing space in debug output
llvm-svn: 123351
2011-01-13 00:57:35 +00:00
Dan Gohman 6e017a1134 Apply the patch from PR8958, which allows llc to get slightly
further on the associated testcase before aborting.

llvm-svn: 123346
2011-01-12 23:56:26 +00:00
Jakob Stoklund Olesen 74ded57bb8 Try again enabling LiveDebugVariables.
llvm-svn: 123342
2011-01-12 23:36:21 +00:00
Jakob Stoklund Olesen e63dfeee36 Don't emit a DBG_VALUE for a spill slot that the rewriter decided not to use after all.
llvm-svn: 123339
2011-01-12 23:14:07 +00:00
Jakob Stoklund Olesen 2ffee66e10 Fix braino in dominator tree walk.
llvm-svn: 123338
2011-01-12 23:14:04 +00:00
Jakob Stoklund Olesen 1a3534afc4 Sometimes, old virtual registers can linger on DBG_VALUE instructions.
Make sure we don't crash in that case, but simply turn them into %noreg instead.

llvm-svn: 123335
2011-01-12 22:37:49 +00:00
Jakob Stoklund Olesen 013c4649c0 Teach VirtRegRewriter to update slot indexes when erasing instructions.
It was leaving dangling pointers in the slot index maps.

llvm-svn: 123334
2011-01-12 22:28:51 +00:00
Jakob Stoklund Olesen 71a3853332 Annotate VirtRegRewriter debug output with slot indexes.
llvm-svn: 123333
2011-01-12 22:28:48 +00:00
Jakob Stoklund Olesen 58b6f4d832 Verify slot index ordering.
The slot indexes must be monotonically increasing through the function.

llvm-svn: 123324
2011-01-12 21:27:48 +00:00
Jakob Stoklund Olesen b5b4a5d0ba Verify that machine instruction parent pointers are consistent.
llvm-svn: 123322
2011-01-12 21:27:41 +00:00
Jakob Stoklund Olesen 43812bfa92 The world is not ready for LiveDebugVariables yet.
llvm-svn: 123290
2011-01-11 23:20:33 +00:00
Jakob Stoklund Olesen 8c98495f43 Enable LiveDebugVariables by default.
llvm-svn: 123282
2011-01-11 22:45:28 +00:00
Jakob Stoklund Olesen 803f48bcd1 Don't insert DBG_VALUE instructions after the first terminator.
For one, MachineBasicBlock::getFirstTerminator() doesn't understand what is
happening, and it also makes sense to have all control flow run through the
DBG_VALUE.

llvm-svn: 123277
2011-01-11 22:11:16 +00:00
Devang Patel 447cb38fbe Appropriately truncate debug info range in dwarf output.
This is not yet completely enabled.

llvm-svn: 123274
2011-01-11 21:42:10 +00:00
Eric Christopher 1bb2c00f65 Move ExpandAtomic into the integer expansion routines - it's only used there.
llvm-svn: 123202
2011-01-11 00:36:08 +00:00
Dale Johannesen d2b48119b0 Fix PR 8916 (qv for analysis), at least the immediate problem.
There's an inherent tension in DAGCombine between assuming
that things will be put in canonical form, and the Depth
mechanism that disables transformations when recursion gets
too deep.  It would not surprise me if there's a lot of little
bugs like this one waiting to be discovered.  The mechanism
seems fragile and I'd suggest looking at it from a design viewpoint.

llvm-svn: 123191
2011-01-10 21:53:07 +00:00
Anton Korobeynikov 2f93128109 Rename TargetFrameInfo into TargetFrameLowering. Also, put couple of FIXMEs and fixes here and there.
llvm-svn: 123170
2011-01-10 12:39:04 +00:00
Chris Lattner 6c8b8dd522 fit in 80 cols and use MBB::isSuccessor instead of a hand
rolled std::find.

llvm-svn: 123164
2011-01-10 07:51:31 +00:00
Jakob Stoklund Olesen 2fb5b31578 Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic.
These functions not longer assert when passed 0, but simply return false instead.

No functional change intended.

llvm-svn: 123155
2011-01-10 02:58:51 +00:00
Jakob Stoklund Olesen d82ac37594 Remove MachineRegisterInfo::getLastVirtReg(), it was giving wrong results
when no virtual registers have been allocated.

It was only used to resize IndexedMaps, so provide an IndexedMap::resize()
method such that

 Map.grow(MRI.getLastVirtReg());

can be replaced with the simpler

 Map.resize(MRI.getNumVirtRegs());

This works correctly when no virtuals are allocated, and it bypasses the to/from
index conversions.

llvm-svn: 123130
2011-01-09 21:58:20 +00:00
Chris Lattner 878665b4bc sort this.
llvm-svn: 123129
2011-01-09 21:31:39 +00:00
Jakob Stoklund Olesen b83a6b23dc Teach TargetRegisterInfo how to cram stack slot indexes in with the virtual and
physical register numbers.

This makes the hack used in LiveInterval official, and lets LiveInterval be
oblivious of stack slots.

The isPhysicalRegister() and isVirtualRegister() predicates don't know about
this, so when a variable may contain a stack slot, isStackSlot() should always
be tested first.

llvm-svn: 123128
2011-01-09 21:17:37 +00:00
Jakob Stoklund Olesen d65524da0f Add a forgotten VireReg2IndexFunctor.
llvm-svn: 123123
2011-01-09 18:58:33 +00:00
Cameron Zwarich 8a00d8175b Eliminate some extra hash table lookups.
llvm-svn: 123115
2011-01-09 10:54:21 +00:00
Cameron Zwarich a5910d1b31 Add an informative comment.
llvm-svn: 123114
2011-01-09 10:32:30 +00:00
Jakob Stoklund Olesen 9adf5e09cd Simplify LiveDebugVariables by storing MachineOperand copies locations instead
of using a Location class with the same information.

When making a copy of a MachineOperand that was already stored in a
MachineInstr, it is necessary to clear the parent pointer on the copy. Otherwise
the register use-def lists become inconsistent.

Add MachineOperand::clearParent() to do that. An alternative would be a custom
MachineOperand copy constructor that cleared ParentMI. I didn't want to do that
because of the performance impact.

llvm-svn: 123109
2011-01-09 05:33:21 +00:00
Jakob Stoklund Olesen 3a9e5c29c8 Shrink a BitVector that didn't mean to store bits for all physical registers.
llvm-svn: 123108
2011-01-09 03:45:44 +00:00
Jakob Stoklund Olesen 1331a15b0c Replace TargetRegisterInfo::printReg with a PrintReg class that also works without a TRI instance.
Print virtual registers numbered from 0 instead of the arbitrary
FirstVirtualRegister. The first virtual register is printed as %vreg0.
TRI::NoRegister is printed as %noreg.

llvm-svn: 123107
2011-01-09 03:05:53 +00:00
Jakob Stoklund Olesen 7f93d8d62c Use IndexedMap for MachineRegisterInfo as well. No functional change.
llvm-svn: 123106
2011-01-09 03:05:46 +00:00
Jakob Stoklund Olesen cf4d5ced0f Fix VirtRegMap to use TRI::index2VirtReg and TRI::virtReg2Index instead of
depending on TRI::FirstVirtualRegister.

Also use TRI::printReg instead of printing virtual registers directly.

llvm-svn: 123101
2011-01-08 23:11:07 +00:00
Jakob Stoklund Olesen 6ff70ad356 Fix a MachineVerifier loop that probably didn't mean to skip the last two
virtual registers.

llvm-svn: 123100
2011-01-08 23:11:02 +00:00
Jakob Stoklund Olesen 28d76692b6 Use an IndexedMap for LiveVariables::VirtRegInfo.
Provide MRI::getNumVirtRegs() and TRI::index2VirtReg() functions to allow
iteration over virtual registers without depending on the representation of
virtual register numbers.

llvm-svn: 123098
2011-01-08 23:10:57 +00:00
Jakob Stoklund Olesen 793d7b7626 Use an IndexedMap for LiveOutRegInfo to hide its dependence on TargetRegisterInfo::FirstVirtualRegister.
llvm-svn: 123096
2011-01-08 23:10:50 +00:00
Cameron Zwarich 0939bc3709 Fix coding style.
llvm-svn: 123093
2011-01-08 22:36:53 +00:00
Cameron Zwarich 84986b298a Make more passes preserve dominators (or state that they preserve dominators if
they all ready do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.

The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.

llvm-svn: 123064
2011-01-08 17:01:52 +00:00
Evan Cheng 078b0b095e Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Evan Cheng 6eb516dbea Do not model all INLINEASM instructions as having unmodelled side effects.
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.

This allows memory instructions to be moved around INLINEASM instructions.

llvm-svn: 123044
2011-01-07 23:50:32 +00:00
Devang Patel acbee0b0d9 Speculatively revert r123032.
llvm-svn: 123039
2011-01-07 22:33:41 +00:00
Devang Patel 6381e1584c Appropriately truncate debug info range in dwarf output.
Enable live debug variables pass.

llvm-svn: 123032
2011-01-07 21:30:41 +00:00
Evan Cheng 0638c20e7c DBG_VALUE does not have any side effects; it also makes no sense to mark it cheap as a copy.
llvm-svn: 123031
2011-01-07 21:08:26 +00:00
Bob Wilson 8265d56638 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995
2011-01-07 04:59:04 +00:00
Bob Wilson d23b3d2dfc Fix a comment typo.
llvm-svn: 122994
2011-01-07 04:58:58 +00:00
Bob Wilson f291cb268f Change EXTRACT_SUBVECTOR to require a constant index.
We were never generating any of these nodes with variable indices, and there
was one legalizer function asserting on a non-constant index.  If we ever have
a need to support variable indices, we can add this back again.

llvm-svn: 122993
2011-01-07 04:58:56 +00:00
Bill Wendling 34e2bc0f08 Early exit if we don't have invokes. The 'Unwinds' vector isn't modified unless
we have invokes, so there is no functionality change here.

llvm-svn: 122990
2011-01-07 02:54:45 +00:00
Duncan Sands 61c5708b51 Fix the other problem reported in PR8582. Testcase and patch by
Nadav Rotem.

llvm-svn: 122983
2011-01-06 23:45:22 +00:00
Eric Christopher e516af753b Add some fairly duplicated code to let type legalization split illegal
typed atomics. This will lower exclusively to libcalls at the moment.

llvm-svn: 122979
2011-01-06 22:28:56 +00:00
Devang Patel 70eb982843 Emit 128 bit constant.
This fixes PR 8913 crash.

llvm-svn: 122971
2011-01-06 21:39:25 +00:00
Evan Cheng 3ae2b79aa3 Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.

llvm-svn: 122952
2011-01-06 06:52:41 +00:00
Evan Cheng c052ba7ff3 Revert r122936. I'll re-implement the change.
llvm-svn: 122949
2011-01-06 06:17:53 +00:00
Jakob Stoklund Olesen 70be93a200 Zap the last two -Wself-assign warnings in llvm.
Simplify RALinScan::DowngradeRegister with TRI::getOverlaps while we are there.

llvm-svn: 122940
2011-01-06 01:33:22 +00:00
Jakob Stoklund Olesen 8e236eac74 Add the SpillPlacement analysis pass.
This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.

Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.

The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.

This is a work in progress and still not hooked up to RegAllocGreedy.

llvm-svn: 122938
2011-01-06 01:21:53 +00:00
Evan Cheng 06536e7158 r105228 reduced the memcpy / memset inline limit to 4 with -Os to avoid blowing
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501

llvm-svn: 122936
2011-01-06 01:04:47 +00:00
Evan Cheng ac730dd2d1 Avoid zero extend bit test operands to pointer type if all the masks fit in
the original type of the switch statement key.
rdar://8781238

llvm-svn: 122935
2011-01-06 01:02:44 +00:00
Evan Cheng 260acf32ee Optimize:
r1025 = s/zext r1024, 4
  r1026 = extract_subreg r1025, 4
to:
  r1026 = copy r1024

llvm-svn: 122925
2011-01-05 23:06:49 +00:00
Jakob Stoklund Olesen f3ac733684 Add a hidden command line option to display edge bundle graphs as they are
calculated.

llvm-svn: 122912
2011-01-05 21:50:24 +00:00
Eric Christopher c673b21a87 80-cols.
llvm-svn: 122909
2011-01-05 21:45:56 +00:00
Eric Christopher 988518109d Remove TODO, these appear to be implemented.
llvm-svn: 122849
2011-01-04 22:31:50 +00:00
Jakob Stoklund Olesen f96ae684c4 Turn the EdgeBundles class into a stand-alone machine CFG analysis pass.
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.

This pass is very fast, usually showing up as 0.0% wall time.

llvm-svn: 122832
2011-01-04 21:10:05 +00:00
Cameron Zwarich 5cd3d718f6 Switch to path halving from path compression for a small speedup. This also
makes getLeader() nonrecursive.

llvm-svn: 122811
2011-01-04 16:24:51 +00:00
Cameron Zwarich 82e8332a22 Eliminate repeated allocation of a per-BB DenseMap for a 4.6% reduction of time
spent in StrongPHIElimination on 403.gcc.

llvm-svn: 122803
2011-01-04 06:42:27 +00:00
Owen Anderson 2e28697c60 Clean up a funky pass registration that got passed over when I got rid of static constructors.
llvm-svn: 122795
2011-01-04 00:55:21 +00:00
Cameron Zwarich 18f164f7c9 Use a RecyclingAllocator to allocate values for MachineCSE's ScopedHashTable for
a 28% speedup of MachineCSE time on 403.gcc.

llvm-svn: 122735
2011-01-03 04:07:46 +00:00
Chris Lattner bf0aa927cc split dom frontier handling stuff out to its own DominanceFrontier header,
so that Dominators.h is *just* domtree.  Also prune #includes a bit.

llvm-svn: 122714
2011-01-02 22:09:33 +00:00
Benjamin Kramer 25e6e06e42 Try to reuse the value when lowering memset.
This allows us to compile:
  void test(char *s, int a) {
    __builtin_memset(s, a, 15);
  }
into 1 mul + 3 stores instead of 3 muls + 3 stores.

llvm-svn: 122710
2011-01-02 19:57:05 +00:00
Benjamin Kramer 2fdea4c8f1 Lower the i8 extension in memset to a multiply instead of a potentially long series of shifts and ors.
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.

Example code:
  void test(char *s, int a) {
      __builtin_memset(s, a, 4);
  }
before:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    movl  %eax, %ecx
    shll  $8, %ecx
    orl %eax, %ecx
    movl  %ecx, %eax
    shll  $16, %eax
    orl %ecx, %eax
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret
after:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    imull $16843009, %eax, %eax   ## imm = 0x1010101
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret

llvm-svn: 122707
2011-01-02 19:44:58 +00:00
Cameron Zwarich 2f6dc10ccc Use getVRegDef() instead of def_iterator. This leads to fewer defs being added
with 2-address instructions, for about a 3.5% speedup of StrongPHIElimination on
403.gcc.

llvm-svn: 122635
2010-12-30 00:42:23 +00:00
Cameron Zwarich 329cd49ce6 None of the other pass names in CodeGen have terminating periods.
llvm-svn: 122628
2010-12-29 11:49:10 +00:00
Cameron Zwarich 0507f44669 Instead of processing every instruction when splitting interferences, only
process those instructions that define phi sources. This is a 47% speedup of
StrongPHIElimination compile time on 403.gcc.

llvm-svn: 122627
2010-12-29 11:00:09 +00:00
Cameron Zwarich bfef075140 Add a missing word to a comment.
llvm-svn: 122625
2010-12-29 04:42:39 +00:00
Cameron Zwarich 458fd305d4 Add text explaining an assertion.
llvm-svn: 122617
2010-12-29 03:52:51 +00:00
Cameron Zwarich 6fe33fdd63 Simplify some code in MachineVerifier that was doing the correct thing, but not
in the most obvious way.

llvm-svn: 122610
2010-12-28 23:45:38 +00:00
Cameron Zwarich 146666eabb Revert the optimization in r122596. It is correct for all current targets, but
it relies on assumptions that may not be true in the future.

llvm-svn: 122608
2010-12-28 23:02:56 +00:00
Cameron Zwarich 92f6e4290c Avoid iterating every operand of an instruction in StrongPHIElimination, since
we are only interested in the defs when discovering interferences.

This is a 28% speedup running StrongPHIElimination on 403.gcc.

llvm-svn: 122596
2010-12-28 10:49:33 +00:00
Duncan Sands 496770debc Pacify the compiler. BestWeight cannot in fact be used uninitialized
in this function, but the compiler was warning that it might be when
doing a release build.

llvm-svn: 122595
2010-12-28 10:07:15 +00:00
Cameron Zwarich 5e5cfbe871 Change an assertion to assert what the code actually relies upon.
llvm-svn: 122586
2010-12-27 22:08:42 +00:00
Cameron Zwarich 25d046ce68 Land a first cut at StrongPHIElimination. There are only 5 new test failures
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:

1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.

2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.

3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.

I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.

I also haven't yet run any tests of the quality of the generated code.

llvm-svn: 122582
2010-12-27 10:08:19 +00:00
Cameron Zwarich b95bfe1667 Add knowledge of phi-def and phi-kill valnos to MachineVerifier's predecessor
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.

llvm-svn: 122581
2010-12-27 05:17:23 +00:00
Andrew Trick 5ce945ca3a Minor cleanup related to my latest scheduler changes.
llvm-svn: 122545
2010-12-24 07:10:19 +00:00
Andrew Trick c94056692a Fix a few cases where the scheduler is not checking for phys reg copies. The scheduling node may have a NULL DAG node, yuck.
llvm-svn: 122544
2010-12-24 06:46:50 +00:00
Andrew Trick 10ffc2b6c2 Various bits of framework needed for precise machine-level selection
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.

Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.

Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.

Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.

ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.

ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.

llvm-svn: 122541
2010-12-24 05:03:26 +00:00
Andrew Trick c416ba612b whitespace
llvm-svn: 122539
2010-12-24 04:28:06 +00:00
Cameron Zwarich ab434079d3 Simplify a check for implicit defs and remove a FIXME.
llvm-svn: 122537
2010-12-24 03:09:36 +00:00
Chris Lattner 11a33811b6 flags -> glue for selectiondag
llvm-svn: 122509
2010-12-23 17:24:32 +00:00
Chris Lattner f647e95b9a sdisel flag -> glue.
llvm-svn: 122507
2010-12-23 17:13:18 +00:00
Andrew Trick 528fad91d2 Reorganize ListScheduleBottomUp in preparation for modeling machine cycles and instruction issue.
llvm-svn: 122491
2010-12-23 05:42:20 +00:00
Andrew Trick a52f325c35 Converted LiveRegCycles to LiveRegGens. It's easier to work with and allows multiple nodes per cycle.
llvm-svn: 122474
2010-12-23 04:16:14 +00:00
Andrew Trick 12acde11cb In CheckForLiveRegDef use TRI->getOverlaps.
llvm-svn: 122473
2010-12-23 03:43:21 +00:00
Andrew Trick 033efdf4d7 Fixes PR8823: add-with-overflow-128.ll
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.

llvm-svn: 122472
2010-12-23 03:15:51 +00:00
Jeffrey Yasskin 9b43f33620 Change all self assignments X=X to (void)X, so that we can turn on a
new gcc warning that complains on self-assignments and
self-initializations.

llvm-svn: 122458
2010-12-23 00:58:24 +00:00
Benjamin Kramer 1f4dfbbcb0 DAGCombine add (sext i1), X into sub X, (zext i1) if sext from i1 is illegal. The latter usually compiles into smaller code.
example code:
unsigned foo(unsigned x, unsigned y) {
  if (x != 0) y--;
  return y;
}

before:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    sbbl  %eax, %eax              ## encoding: [0x19,0xc0]
    notl  %eax                    ## encoding: [0xf7,0xd0]
    addl  8(%esp), %eax           ## encoding: [0x03,0x44,0x24,0x08]
    ret                           ## encoding: [0xc3]

after:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    movl  8(%esp), %eax           ## encoding: [0x8b,0x44,0x24,0x08]
    adcl  $-1, %eax               ## encoding: [0x83,0xd0,0xff]
    ret                           ## encoding: [0xc3]

llvm-svn: 122455
2010-12-22 23:17:45 +00:00
Jakob Stoklund Olesen 0acb69d53c When RegAllocGreedy decides to spill the interferences of the current register,
pick the victim with the lowest total spill weight.

llvm-svn: 122445
2010-12-22 22:01:30 +00:00
Jakob Stoklund Olesen 29836e6572 Include a shadow of the original CFG edges in the edge bundle graph.
llvm-svn: 122444
2010-12-22 22:01:28 +00:00
Chris Lattner cafc1e60bb Fix a bug in ReduceLoadWidth that wasn't handling extending
loads properly.  We miscompiled the testcase into:

_test:                                  ## @test
	movl	$128, (%rdi)
	movzbl	1(%rdi), %eax
	ret

Now we get a proper:

_test:                                  ## @test
	movl	$128, (%rdi)
	movsbl	(%rdi), %eax
	movzbl	%ah, %eax
	ret

This fixes PR8757.

llvm-svn: 122392
2010-12-22 08:02:57 +00:00
Chris Lattner 9a499e96eb more cleanups, move a check for "roundedness" earlier to reject
unhanded cases faster and simplify code.

llvm-svn: 122391
2010-12-22 08:01:44 +00:00
Chris Lattner 222374d886 reduce indentation and improve comments, no functionality change.
llvm-svn: 122389
2010-12-22 07:36:50 +00:00
Andrew Trick fbb3ed8774 In DelayForLiveRegsBottomUp, handle instructions that read and write
the same physical register. Simplifies the fix from the previous
checkin r122211.

llvm-svn: 122370
2010-12-21 22:27:44 +00:00
Andrew Trick 2085a96513 whitespace
llvm-svn: 122368
2010-12-21 22:25:04 +00:00
Dale Johannesen a94e36bbee Reapply 122353-122355 with fixes. 122354 was wrong;
the shift type was needed one place, the shift count
type another.  The transform in 123555 had the same
problem.

llvm-svn: 122366
2010-12-21 21:55:50 +00:00
Dale Johannesen 87c47499c6 Revert 122353-122355 for the moment, they broke stuff.
llvm-svn: 122360
2010-12-21 21:22:27 +00:00
Dale Johannesen caf42aa6a4 Add a new transform to DAGCombiner.
llvm-svn: 122355
2010-12-21 20:10:51 +00:00
Dale Johannesen fa5dc82fda Get the type of a shift from the shift, not from its shift
count operand.  These should be the same but apparently are
not always, and this is cleaner anyway.  This improves the
code in an existing test.

llvm-svn: 122354
2010-12-21 20:06:19 +00:00
Dale Johannesen d64931df77 Shift by the word size is invalid IR; don't create it.
llvm-svn: 122353
2010-12-21 20:00:06 +00:00
Chris Lattner 2a7ff99979 fix some typos
llvm-svn: 122349
2010-12-21 18:05:22 +00:00
Stuart Hastings 83cce8e7ab Fix indentation, add comment.
llvm-svn: 122345
2010-12-21 17:16:58 +00:00
Stuart Hastings 8c5bfcaa29 Missing logic for nested CALLSEQ_START/END.
llvm-svn: 122342
2010-12-21 17:07:24 +00:00
Cameron Zwarich 79ebc7186e Incremental progress towards a new implementation of StrongPHIElimination. Most
of the problems with my last attempt were in the updating of LiveIntervals
rather than the coalescing itself. Therefore, I decided to get that right first
by essentially reimplementing the existing PHIElimination using LiveIntervals.

It works correctly, with only a few tests failing (which may not be legitimate
failures) and no new verifier failures (at least as far as I can tell, I didn't
count the number per file).

llvm-svn: 122321
2010-12-21 06:54:43 +00:00
Chris Lattner 3e5fbd74ed rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for
something that just glues two nodes together, even if it is
sometimes used for flags.

llvm-svn: 122310
2010-12-21 02:38:05 +00:00
Chris Lattner 17f906be96 improve "cannot yet select" errors a trivial amount: now
they are just as useless, but at least a bit more gramatical

llvm-svn: 122305
2010-12-21 02:07:03 +00:00
Jakob Stoklund Olesen 2530cd2a4c Add EdgeBundles to SplitKit.
Edge bundles is an annotation on the CFG that turns it into a bipartite directed
graph where each basic block is connected to an outgoing and an ingoing bundle.
These bundles are useful for identifying regions of the CFG for live range
splitting.

llvm-svn: 122301
2010-12-21 01:50:21 +00:00
Jakob Stoklund Olesen 4c278f82c8 Use IntEqClasses to compute connected components of live intervals.
llvm-svn: 122296
2010-12-21 00:48:17 +00:00
Dale Johannesen 0a291a36f2 Cosmetic changes.
llvm-svn: 122259
2010-12-20 20:10:50 +00:00
Cameron Zwarich 4ffda706d0 MachineVerifier should count landing pad successors as basic blocks rather than
out-edges. Fixes PR8824.

llvm-svn: 122228
2010-12-20 04:19:48 +00:00
Cameron Zwarich 660bce67f3 Teach MachineVerifier that early clobber defs begin at USE slots and other defs
begin at DEF slots. Fixes the second half of PR8813.

llvm-svn: 122225
2010-12-20 03:15:20 +00:00
Cameron Zwarich bc2461c5f9 Add a missing check from r122218.
llvm-svn: 122224
2010-12-20 02:59:51 +00:00
Chris Lattner 0b3ca50ebb implement type legalization promotion support for SMULO and UMULO, giving
ARM (and other 32-bit-only) targets support for i8 and i16 overflow 
multiplies.  The generated code isn't great, but this at least fixes
CodeGen/Generic/overflow.ll when running on ARM hosts.

llvm-svn: 122221
2010-12-20 02:05:39 +00:00
Cameron Zwarich fc0c6b1ea9 Don't assume that an instruction ending a register's live range always reads
the register; it may be a dead def instead. Fixes PR8820.

llvm-svn: 122218
2010-12-20 01:22:37 +00:00
Chris Lattner 981afd206b Fix a bug in the scheduler's handling of "unspillable" vregs.
Imagine we see:

EFLAGS = inst1
EFLAGS = inst2 FLAGS
gpr = inst3 EFLAGS

Previously, we would refuse to schedule inst2 because it clobbers
the EFLAGS of the predecessor.  However, it also uses the EFLAGS
of the predecessor, so it is safe to emit.  SDep edges ensure that
the right order happens already anyway.

This fixes 2 testsuite crashes with the X86 patch I'm going to
commit next.

llvm-svn: 122211
2010-12-20 00:55:43 +00:00
Chris Lattner 0cfe884874 the result of CheckForLiveRegDef is dead, remove it.
llvm-svn: 122209
2010-12-20 00:51:56 +00:00
Chris Lattner ed69c6e4b9 reduce indentation, no functionality change.
llvm-svn: 122208
2010-12-20 00:50:16 +00:00
Cameron Zwarich 1b67d6c565 Ignore debug values when performing MachineVerifier liveness checks. Fixes
PR8822.

llvm-svn: 122207
2010-12-20 00:08:10 +00:00
Cameron Zwarich 0b111b1aee Early clobber operands are allowed to be defined at use indices. This fixes one
half of PR8813.

llvm-svn: 122205
2010-12-19 23:50:53 +00:00
Cameron Zwarich 251337e1c4 Fix PR8815 by checking for an explicit clobber def tied to a use operand in
ConnectedVNInfoEqClasses::Classify().

llvm-svn: 122202
2010-12-19 22:12:45 +00:00
Cameron Zwarich 7e24173a3c Fix PR8811 by teaching MachineVerifier about optional defs.
llvm-svn: 122199
2010-12-19 21:37:23 +00:00
Cameron Zwarich b5cec4f11a StrongPHIElimination will never run before TwoAddressInstructionPass.
llvm-svn: 122197
2010-12-19 21:32:29 +00:00
Nick Lewycky 0de20af7ba Add missing standard headers. Patch by Joerg Sonnenberger!
llvm-svn: 122193
2010-12-19 20:43:38 +00:00
Chris Lattner 440b2804ff teach MaskedValueIsZero how to analyze ADDE. This is
enough to teach it that ADDE(0,0) is known 0 except the 
low bit, for example.

llvm-svn: 122191
2010-12-19 20:38:28 +00:00
Cameron Zwarich 713ab37965 Remove some checks for StrongPHIElim. These checks make it impossible to use an
alternative register allocator that does not require LiveIntervals by specifying
it on the command-line for a target that has StrongPHIElimination enabled by
default.

These checks are pretty meaningless anyways, since StrongPHIElimination and
PHIElimination are never used at the same time.

llvm-svn: 122176
2010-12-19 18:03:27 +00:00
Chris Lattner 77a8a71414 fix PR8642: if a critical edge has a PHI value that can trap,
isel is *required* to split the edge.  PHI values get evaluated
on the edge, not in their predecessor block.

llvm-svn: 122170
2010-12-19 04:58:57 +00:00
Jakob Stoklund Olesen 1fa7958eaa Apparently, operandices is not a word.
llvm-svn: 122135
2010-12-18 03:28:32 +00:00
Jakob Stoklund Olesen 3b2966dc7d Teach the inline spiller to attempt folding a load instruction into its single
use before rematerializing the load.

This allows us to produce:

    addps	LCPI0_1(%rip), %xmm2

Instead of:

    movaps	LCPI0_1(%rip), %xmm3
    addps	%xmm3, %xmm2

Saving a register and an instruction. The standard spiller already knows how to
do this.

llvm-svn: 122133
2010-12-18 03:04:14 +00:00
Jakob Stoklund Olesen 2a9f194b00 Tweak debug spew.
llvm-svn: 122132
2010-12-18 03:04:11 +00:00
Jakob Stoklund Olesen 7971a3eaff Check that the register is live-in to the loop header before inserting copies in
the loop predecessors.

The register can be live-out from a predecessor without being live-in to the
loop header if there is a critical edge from the predecessor.

llvm-svn: 122123
2010-12-18 01:06:19 +00:00
Nick Lewycky 1d108cb962 Fix GCC warning:
lib/CodeGen/RegAllocGreedy.cpp:311: error: unused variable 'PhysReg' [-Wunused-variable]

llvm-svn: 122122
2010-12-18 01:05:55 +00:00
Jakob Stoklund Olesen bf4550e3fb Pass a Banner argument to the machine code verifier both from
createMachineVerifierPass and MachineFunction::verify.

The banner is printed before the machine code dump, just like the printer pass.

llvm-svn: 122113
2010-12-18 00:06:56 +00:00
Jakob Stoklund Olesen cf846100d8 Avoid dereferencing end() in collectInterferingVRegs() when there is no
interference.

llvm-svn: 122108
2010-12-17 23:16:38 +00:00
Jakob Stoklund Olesen 2e98ee31b3 Make the -verify-regalloc command line option available to base classes as
RegAllocBase::VerifyEnabled.

Run the machine code verifier in a few interesting places during RegAllocGreedy.

llvm-svn: 122107
2010-12-17 23:16:35 +00:00
Jakob Stoklund Olesen 1740e00104 Enable loop splitting in RegAllocGreedy.
The heuristics split around the largest loop where the current register may be
allocated without interference.

llvm-svn: 122106
2010-12-17 23:16:32 +00:00
Bill Wendling 3fff1fd49b During local stack slot allocation, the materializeFrameBaseRegister function
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.

Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)

<rdar://problem/8782198>

llvm-svn: 122104
2010-12-17 23:09:14 +00:00
Bob Wilson 5408144add Fix a DAGCombiner crash when folding binary vector operations with constant
BUILD_VECTOR operands where the element type is not legal.  I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.

llvm-svn: 122102
2010-12-17 23:06:49 +00:00