Commit Graph

1315 Commits

Author SHA1 Message Date
Brian Gaeke 94e95d2b3e Great renaming: Sparc --> SparcV9
llvm-svn: 11826
2004-02-25 18:44:15 +00:00
Chris Lattner 309327a4b5 Teach the instruction selector how to transform 'array' GEP computations into X86
scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820
2004-02-25 07:00:55 +00:00
Chris Lattner d1ee55d439 * Make the previous patch more efficient by not allocating a temporary MachineInstr
to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)

llvm-svn: 11819
2004-02-25 06:13:04 +00:00
Chris Lattner 4b3514c173 Add a helper to create an addressing mode given all of the pieces.
llvm-svn: 11818
2004-02-25 06:01:07 +00:00
Chris Lattner d825d30f42 add an inefficient way of folding structure and constant array indexes together
into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32's.

llvm-svn: 11817
2004-02-25 03:45:50 +00:00
Chris Lattner f85e33cd79 Implement special case for storing an immediate into memory so that we don't need
an intermediate register.

llvm-svn: 11816
2004-02-25 02:56:58 +00:00
Brian Gaeke 10a32da382 FunctionLiveVarInfo.h moved: include/llvm/CodeGen -> lib/Target/Sparc/LiveVar
llvm-svn: 11804
2004-02-24 19:46:00 +00:00
Chris Lattner b471f0188f Fix some unexpected fallout from the config.h changes. Because the CBE no
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.

llvm-svn: 11803
2004-02-24 18:34:10 +00:00
Alkis Evlogimenos af2de4848e Refactor rewinding code for finding the first terminator of a basic
block into MachineBasicBlock::getFirstTerminator().

This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions where added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).

llvm-svn: 11748
2004-02-23 18:14:48 +00:00
Chris Lattner cb185a34bb Simplify code a bit, don't go off the end of the block, now that the current
block we are in might be empty

llvm-svn: 11744
2004-02-23 07:42:19 +00:00
Chris Lattner 4ffd4443ce We were forgetting to add FP_REG_KILL instructions to basic blocks which will
eventually get an assignment due to elimination of PHIs.

llvm-svn: 11743
2004-02-23 07:29:45 +00:00
Chris Lattner abb9162999 Work around a gas bug. Print '-9223372036854775808' as unsigned.
llvm-svn: 11729
2004-02-23 03:27:05 +00:00
Chris Lattner 7e90628a8a Implement cast fp -> bool
llvm-svn: 11728
2004-02-23 03:21:41 +00:00
Chris Lattner 6590c29971 Stop passing iterators around by reference now that we have ilists!
Implement cast Type::ULongTy -> double

llvm-svn: 11726
2004-02-23 03:10:10 +00:00
Chris Lattner 378157c3d7 Add a new cmove instruction
llvm-svn: 11722
2004-02-23 01:16:05 +00:00
Chris Lattner cdd56634b0 Only insert FP_REG_KILL instructions in MachineBasicBlocks that actually
use FP instructions.  This reduces the number of instructions inserted in
176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which
is typical).  This reduction speeds up the entire code generator.  In the
case of 176.gcc, llc went from taking 31.38s to 24.78s.  The passes that
sped up the most are the register allocator and the 2 live variable analysis
passes, which sped up 2.3, 1.3, and 1.5s respectively.  The asmprinter
pass also sped up because it doesn't print the instructions in comments :)

Note that this patch is likely to expose latent bugs in machine code passes,
because now basicblock can be empty, where they were never empty before.  I
cleaned out regalloclocal, but who knows about linscan :)

llvm-svn: 11717
2004-02-22 19:47:26 +00:00
Alkis Evlogimenos 8358cc573d Move MOTy::UseType enum into MachineOperand. This eliminates the
switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).

llvm-svn: 11715
2004-02-22 19:23:26 +00:00
Chris Lattner fae7564027 Reduce the number of pointless copies inserted due to constant pointer refs.
Also, make an assertion actually fireable!

llvm-svn: 11713
2004-02-22 17:35:42 +00:00
Chris Lattner fa3ebd6ad5 Fix bug in previous checkout: leave the iterator at the first instruction
AFTER the GEP that was emitted.  :(

llvm-svn: 11712
2004-02-22 17:05:38 +00:00
Chris Lattner 6536519f6e Completely rewrite how getelementptr instructions are expanded. This has two
(minor) benefits right now:

1. An extra dummy MOVrr32 is gone.  This move would often be coallesced by
   both allocators anyway.
2. The code now uses the gep_type_iterator to walk the gep, which should future
   proof it a bit.  It still assumes that array indexes are Longs though.

These don't really justify rewriting the code.  The big benefit will come later
though.

llvm-svn: 11710
2004-02-22 07:04:00 +00:00
Alkis Evlogimenos de51c65299 When folding memory operands in machine instructions be careful to
leave register operands with the same use/def flags as the original
instruction.

llvm-svn: 11709
2004-02-22 06:54:26 +00:00
Chris Lattner 5fc6ae2baf Wow this is out of date. When we have _real_ code generator documentation,
this should be folded into it.

llvm-svn: 11705
2004-02-22 05:53:54 +00:00
Chris Lattner 87d72eb23f The two address pass cannot handle two addr instructions where one incoming
value is a physreg and one is a virtreg.  For this reason, disable copy folding
entirely for physregs.  Also, use the new isMoveInstr target hook which gives us
folding of FP moves as well.

llvm-svn: 11700
2004-02-22 04:44:58 +00:00
Chris Lattner 73ffc88a8b It is totally unacceptable to print out (literally) millions of zeros when
compiling 129.compress... so don't!

llvm-svn: 11649
2004-02-20 05:49:22 +00:00
Alkis Evlogimenos 6401b22fc2 Fix argument size for MOVSX and MOVZX instructions.
llvm-svn: 11576
2004-02-18 16:20:40 +00:00
Chris Lattner 30e73e3442 Add support for GlobalAddress's for alkis
llvm-svn: 11560
2004-02-17 18:23:55 +00:00
Alkis Evlogimenos 47ea17a852 These store to memory too.
llvm-svn: 11558
2004-02-17 17:53:48 +00:00
Chris Lattner 49794be442 These store to memory, not read from it.
llvm-svn: 11556
2004-02-17 17:46:50 +00:00
Alkis Evlogimenos cf7b9392ea Instructiosn with 1 memory operand have 4 operands in our
representation.. duh!

llvm-svn: 11554
2004-02-17 15:58:13 +00:00
Alkis Evlogimenos f90da5f346 Align case statements.
llvm-svn: 11552
2004-02-17 15:50:41 +00:00
Alkis Evlogimenos 546513ccfd Add TEST and XCHG memory operand support.
llvm-svn: 11550
2004-02-17 15:48:42 +00:00
Alkis Evlogimenos f08064b714 Add OR and XOR memory operand support.
llvm-svn: 11549
2004-02-17 15:33:14 +00:00
Alkis Evlogimenos 65a5ee86ba Peephole optimize SUBmi{16,32} into SUBmi{16,32}b when immediate is 8
bits wide.

llvm-svn: 11548
2004-02-17 15:14:29 +00:00
Alkis Evlogimenos e9583082a6 ADDmi{16,32} should be in the next case statement.
llvm-svn: 11547
2004-02-17 15:10:11 +00:00
Alkis Evlogimenos e5585328d8 Add memory operand folding support for MUL, DIV, IDIV, NEG, NOT,
MOVSX, and MOVZX.

llvm-svn: 11546
2004-02-17 09:14:23 +00:00
Alkis Evlogimenos 93398df103 Add memory operand folding for CMP{rm,mr,mi}{8,16,32}, INCm{8,16,32}
and DECm{8,16,32} instructions.

llvm-svn: 11545
2004-02-17 08:49:20 +00:00
Alkis Evlogimenos 574c7c9ce9 Add CMP{rm,mr,mi}{8,16,32}, INCm{8,16,32} and DECm{8,16,32} instructions.
llvm-svn: 11544
2004-02-17 08:49:00 +00:00
Alkis Evlogimenos d5ce14ddd1 Add SUB{rm,mr,mi}{8,16,32} instructions.
llvm-svn: 11543
2004-02-17 08:17:40 +00:00
Alkis Evlogimenos 7f615d7e9a Add support for folding memory operands for ADC, SBB and SUB instructions.
llvm-svn: 11541
2004-02-17 08:08:51 +00:00
Alkis Evlogimenos b591e5de31 Add support for ADC{rm.mr}32 and SBB{rm,mr}32.
llvm-svn: 11540
2004-02-17 08:06:31 +00:00
Chris Lattner 7ef6d2fd0e Add a (hidden) option to print instructions that fail to fuse. It's looking
like compares and test's would be the next huge win...

llvm-svn: 11539
2004-02-17 08:03:47 +00:00
Alkis Evlogimenos 6974f4758a Add support for folding memory operands in MOVri{8,16,32} instructions.
llvm-svn: 11538
2004-02-17 07:47:20 +00:00
Chris Lattner c07eeaef6b Expand the repertoire of the forms we can print and encode.
llvm-svn: 11537
2004-02-17 07:40:44 +00:00
Chris Lattner a9363fda17 Disable this peephole for now. We can't keep track of the fact that the immediate is 8 bits,
but the memory reference is full sized.

llvm-svn: 11536
2004-02-17 07:36:32 +00:00
Chris Lattner ca8f1c2716 Add an option to disable spill fusing in the X86 backend
llvm-svn: 11531
2004-02-17 06:30:34 +00:00
Chris Lattner 3abcdf3b90 Fix the mneumonics for the mov instructions to have the source and destination
order in the correct sense!! Arg!

llvm-svn: 11530
2004-02-17 06:28:19 +00:00
Chris Lattner ebd90733b0 Fix the last crimes against nature that used the 'ir' ordering to use the
'ri' ordering instead... no it's not possible to store a register into an
immediate!

llvm-svn: 11529
2004-02-17 06:24:02 +00:00
Chris Lattner 9990476875 GRRR. Move instructions have swapped the order of the r/m operands.
llvm-svn: 11528
2004-02-17 06:20:20 +00:00
Chris Lattner 288e043e1b Rename MOVi[mr] instructions to MOV[rm]i
llvm-svn: 11527
2004-02-17 06:16:44 +00:00
Chris Lattner 4c241855e6 Whoops, got my cases swapped.
llvm-svn: 11526
2004-02-17 06:02:15 +00:00