Commit Graph

60 Commits

Author SHA1 Message Date
David Majnemer 22dff0aafc [COFF] Don't hard-code the load configuration size
The load configuration directory is a structure whose size varies as the
OS gains additional functionality.  To account for this, the structure's
layout begins with a size field; this allows loaders to know which
fields are available.

However, LLD hard-coded the sizes (112 bytes for 64-bit and 64 for
32-bit).  This means that we might not inform the loader of all the
pertinent fields or we might claim that there are more fields than are
actually present.

To correctly account for this, the size field must be loaded from the
_load_config_used symbol.

N.B.  The COFF spec is either wrong or out of date, the load
configuration directory is not correctly documented in the
specification: it omits the size field.

llvm-svn: 263543
2016-03-15 09:48:27 +00:00
Rui Ueyama 489a806965 Update for LLVM function name change.
llvm-svn: 257801
2016-01-14 20:53:50 +00:00
Rui Ueyama dba6b576cf COFF: Rename RoundUpToAlignment -> align.
llvm-svn: 257220
2016-01-08 22:24:26 +00:00
Rui Ueyama 43e12900d9 COFF: Non-external COMDAT sections sholud not be merged by ICF.
If a section symbol is not external, that COMDAT section should never
be merge with other sections in other compilation unit. Previously,
we didn't take visibility into account.

Note that COMDAT sections with non-external visibility makes sense
because they can be removed by dead-stripping.

Fixes https://llvm.org/bugs/show_bug.cgi?id=25686

llvm-svn: 254578
2015-12-03 02:23:33 +00:00
Rui Ueyama 548d22c073 COFF: ICF should not merge sectinos if their alignments are not the same.
There's actually a room to improve this patch. Instead of not merging
sections that have different alignements, we can choose the section that
has the largest alignment requirement among all sections that are otherwise
considered the same. Then all section alignments are satisfied, so we can
merge them.

I don't know if that improvement could make any difference for real-world
input, so I'll leave it alone. Would be interesting to revisit later.

llvm-svn: 248581
2015-09-25 16:50:12 +00:00
Rui Ueyama de88072a00 COFF: Rename Ptr -> Repl.
This pointer points to a replacement for this chunk. Ptr was not a good name.

llvm-svn: 248579
2015-09-25 16:20:24 +00:00
Rui Ueyama 3cb1f5c860 COFF: Rename A.replaceWith(B) -> B.replace(A). NFC.
llvm-svn: 248197
2015-09-21 19:36:51 +00:00
Rui Ueyama 3cfd2bff1e Remove dead code.
llvm-svn: 248105
2015-09-20 01:19:36 +00:00
Rui Ueyama 63bbe84b27 COFF: Make Chunk::writeTo() const. NFC.
This should improve code readability especially because this function
is called inside parallel_for_each.

llvm-svn: 248103
2015-09-19 23:28:57 +00:00
Rui Ueyama 27e9e6540c Remove unused #includes.
llvm-svn: 248081
2015-09-19 02:28:32 +00:00
Rui Ueyama aa95e5a4cc COFF: Parallelize ICF.
The LLD's ICF algorithm is highly parallelizable. This patch does that
using parallel_for_each.

ICF accounted for about one third of total execution time. Previously,
it took 324 ms when self-hosting. Now it takes only 62 ms.

Of course your mileage may vary. My machine is a beefy 24-core Xeon machine,
so you may not see this much speedup. But this optimization should be
effective even for 2-core machine, since I saw speedup (324 ms -> 189 ms)
when setting parallelism parameter to 2.

llvm-svn: 248038
2015-09-18 21:06:34 +00:00
Rui Ueyama c9a6e827bd COFF: Optimize ICF by not creating temporary vectors.
Previously, ICF created a vector for each SectionChunk. The vector
contained pointers to successors, which are namely associative sections
and COMDAT relocation targets. The reason I created vectors is because
I thought that that would make section comparison faster.

It did make the comparison faster. When self-linking, for example, it
saved about 10 ms on each iteration. The time we spent on constructing
the vectors was 124 ms. If we iterate more than 12 times, return from
the investment exceeds the initial cost.

In reality, it usually needs 5 iterations. So we shouldn't construct
the vectors.

llvm-svn: 247963
2015-09-18 01:51:37 +00:00
Rui Ueyama 4dbff20c91 COFF: Fix bug that not all symbols were written to symtab if /opt:noref.
Only live symbols are written to the symbol table. Because isLive()
returned false if dead-stripping was disabled entirely, only
non-COMDAT sections were written to the symbol table. This patch fixes
the issue.

llvm-svn: 247856
2015-09-16 21:40:47 +00:00
Rui Ueyama 92298d5418 COFF: Create ICF class to move code from SectionChunk to ICF. NFC.
This patch defines ICF class and defines ICF-related functions as
members of the class. By doing this we can move code that are
related only to ICF from SectionChunk to the newly-defined class.
This also eliminates a global variable "NextID".

llvm-svn: 247802
2015-09-16 14:19:10 +00:00
Rui Ueyama 9cb2870ce0 ICF: Improve ICF to reduce more sections than before.
This is a patch to make LLD to be on par with MSVC in terms of ICF
effectiveness. MSVC produces a 27.14MB executable when linking LLD.
LLD previously produced a 27.61MB when self-linking. Now the size
is reduced to 27.11MB. Note that without ICF the size is 29.63MB.

In r247387, I implemented an algorithm that handles section graphs
as cyclic graphs and merge them using SCC. The algorithm did not
always work as intended as I demonstrated in r247721. The new
algortihm implemented in this patch is different from the previous
one. If you are interested the details, you want to read the file
comment of ICF.cpp.

llvm-svn: 247770
2015-09-16 03:26:31 +00:00
Rui Ueyama 5b93aa51de COFF: Teach ICF to merge cyclic graphs.
Previously, LLD's ICF couldn't merge cyclic graphs. That was unfortunate
because, in COFF, cyclic graphs are not exceptional at all. That is
pretty common.

In this patch, sections are grouped by Tarjan's strongly connected
component algorithm to get acyclic graphs. And then we try to merge
SCCs whose outdegree is zero, and remove them from the graph. This
makes other SCCs to have outdegree zero, so we can repeat the
process until all SCCs are removed. When comparing two SCCs, we handle
cycles properly.

This algorithm works better than previous one. Previously, self-linking
produced a 29.0MB executable. It now produces a 27.7MB. There's still some
gap compared to MSVC linker which produces a 27.1MB executable for the
same input. So the gap is narrowed, but still LLD is not on par with MSVC.
I'll investigate that later.

llvm-svn: 247387
2015-09-11 04:29:03 +00:00
Rui Ueyama ef907ec82d COFF: Implement a better algorithm for ICF.
Identical COMDAT Folding is a feature to merge COMDAT sections
by contents. Two sections are considered the same if their contents,
relocations, attributes, etc, are all the same.

An interesting fact is that MSVC linker takes "iterations" parameter
for ICF because the algorithm they are using is iterative. Merging
two sections could make more sections to be mergeable because
different relocations could now point to the same section. ICF is
repeated until we get a convergence (until no section can be merged).
This algorithm is not fast. Usually it needs three iterations until a
convergence is obtained.

In the new algorithm implemented in this patch, we consider sections
and relocations as a directed acyclic graph, and we try to merge
sections whose outdegree is zero. Sections with outdegree zero are then
removed from the graph, which makes  other sections to have outdegree
zero. We repeat that until all sections are processed. In this
algorithm, we don't iterate over the same sections many times.

There's an apparent issue in the algorithm -- the section graph is
not guaranteed to be acyclic. It's actually pretty often cyclic.
So this algorithm cannot eliminate all possible duplicates.
That's OK for now because the previous algorithm was not able to
eliminate cycles too. I'll address the issue in a follow-up patch.

llvm-svn: 246878
2015-09-04 21:35:54 +00:00
Rui Ueyama 2dcc23580e COFF: Use section content checksum for ICF.
Previously, we calculated our own hash values for section contents.
Of coruse that's slow because we had to access all bytes in sections.
Fortunately, COFF objects usually contain hash values for COMDAT
sections. We can use that to speed up Identical COMDAT Folding.

llvm-svn: 246869
2015-09-04 20:45:50 +00:00
Rafael Espindola beee25e484 Make these headers as being c++.
llvm-svn: 245050
2015-08-14 14:12:54 +00:00
Rafael Espindola 5c546a1437 COFF: In chunks, store the offset from the start of the output section. NFC.
This is more convenient than the offset from the start of the file as we
don't have to worry about it changing when we move the output section.

This is a port of r245008 from ELF.

llvm-svn: 245018
2015-08-14 03:30:59 +00:00
Rui Ueyama 67fcd1a0c7 COFF: Fix bad #includes.
Writer.h is intended to be included only by Writer.cpp and Driver.cpp.
Use of the header in other files are bad.

llvm-svn: 244106
2015-08-05 19:51:28 +00:00
Rui Ueyama f69ecc1212 COFF: Handle all COMDAT sections as non-GC root.
I don't remember why I thought that only functions are subject
of garbage collection, but the comment here said so, which is
not correct. Moreover, the code just below the comment does not
do what the comment says -- it handles non-COMDAT, non-function
sections as GC root. As a result, it just handles non-COMDAT
sections as GC root.

This patch cleans that up by removing SectionChunk::isRoot and
use isCOMDAT instead.

llvm-svn: 243700
2015-07-30 22:48:45 +00:00
Rui Ueyama eb26e1d03c COFF: Fix SECREL and SECTION relocations.
SECREL should sets the 32-bit offset of the target from the beginning
of *target's* output section. Previously, the offset from the beginning
of source's output section was used instead.

SECTION means the target section's index, and not the source section's
index. This patch fixes that issue too.

llvm-svn: 243535
2015-07-29 16:30:45 +00:00
Rui Ueyama 3dd9372d2b COFF: ARM: Support import functions.
llvm-svn: 243205
2015-07-25 03:39:29 +00:00
Rui Ueyama 237fca1451 COFF: ARM: Implement MOV32T relocation.
llvm-svn: 243201
2015-07-25 03:03:46 +00:00
Rui Ueyama 3afd5bfd7b COFF: Handle base relocation as a tuple of relocation type and RVA. NFC.
On x64 and x86, we use only one base relocation type, so we handled
base relocations just as a list of RVAs. That doesn't work well for
ARM becuase we have to handle two types of base relocations on ARM.
This patch changes the type of base relocation from uint32_t to
{reltype, uint32_t} to make it easy to port this code to ARM.

llvm-svn: 243197
2015-07-25 01:44:32 +00:00
Rui Ueyama 28df04211c COFF: Split ImportThunkChunk into x86 and x64. NFC.
This change should make it easy to port this code to ARM.

llvm-svn: 243195
2015-07-25 01:16:06 +00:00
Rui Ueyama cd3f99b6c5 COFF: Implement Safe SEH support for x86.
An object file compatible with Safe SEH contains a .sxdata section.
The section contains a list of symbol table indices, each of which
is an exception handler function. A safe SEH-enabled executable
contains a list of exception handler RVAs. So, what the linker has
to do to support Safe SEH is basically to read the .sxdata section,
interpret the contents as a list of symbol indices, unique-fy and
sort their RVAs, and then emit that list to .rdata. This patch
implements that feature.

llvm-svn: 243182
2015-07-24 23:51:14 +00:00
Rui Ueyama 3cb895c930 COFF: Fix __ImageBase symbol relocation.
__ImageBase is a special symbol whose value is the image base address.
Previously, we handled __ImageBase symbol as an absolute symbol.

Absolute symbols point to specific locations in memory and the locations
never change even if an image is base-relocated. That means that we
don't have base relocation entries for absolute symbols.

This is not a case for __ImageBase. If an image is base-relocated, its
base address changes, and __ImageBase needs to be shifted as well.
So we have to have base relocations for __ImageBase. That means that
__ImageBase is not really an absolute symbol but a different kind of
symbol.

In this patch, I introduced a new type of symbol -- DefinedRelative.
DefinedRelative is similar to DefinedAbsolute, but it has not a VA but RVA
and is a subject of base relocation. Currently only __ImageBase is of
the new symbol type.

llvm-svn: 243176
2015-07-24 22:58:44 +00:00
Rui Ueyama 33fb2cb11b COFF: Fix base relocations for __imp_ symbols on x86.
Because thunks for dllimported symbols contain absolute addresses on x86,
they need to be relocated at load-time. This bug was a cause of crashes
in DLL initialization routines.

llvm-svn: 242259
2015-07-15 00:25:38 +00:00
Rui Ueyama d4b351f0de COFF: Fix locally-imported symbol's size for x86.
llvm-svn: 241860
2015-07-09 21:15:58 +00:00
Rui Ueyama 11863b4ae1 COFF: Support x86 file header and relocations.
llvm-svn: 241657
2015-07-08 01:45:29 +00:00
Rui Ueyama 661a4e7ab6 COFF: Split writeTo in preparation for supporting 32-bit x86.
llvm-svn: 241638
2015-07-07 22:49:21 +00:00
Rui Ueyama 7a333c66be COFF: Fix locally-imported symbols.
Previously, pointers pointed by locally-imported symbols were broken.
It has only 4 bytes although the correct size is 8 byte. This patch
fixes that bug.

llvm-svn: 241295
2015-07-02 20:33:50 +00:00
Chandler Carruth 59013c387e [opt] Replace the recursive walk for GC with a worklist algorithm.
This flattens the entire liveness walk from a recursive mark approach to
a worklist approach. It also sinks the worklist management completely
out of the SectionChunk and into the Writer by exposing the ability to
iterato over children of a chunk and over the symbol bodies of relocated
symbols. I'm not 100% happy with the API names, so suggestions welcome
there.

This allows us to use a single worklist for the entire recursive walk
and would also be a natural place to take advantage of parallelism at
some future point.

With this, we completely inline away the GC walk into the
Writer::markLive function and it makes it very easy to profile what is
slow. Currently, time is being wasted checking whether a Chunk isa
SectionChunk (it essentially always is), finding (or skipping)
a replacement for a symbol, and chasing pointers between symbols and
their chunks. There are a bunch of things we can do to fix this, and its
easier to do them after this change IMO.

This change alone saves 1-2% of the time for my self-link of lld.exe
(which I'm running and benchmarking on Linux ironically).

Perhaps more notably, we'll no longer blow out the stack for large
links. =]

Just as an FYI, at this point, I/O is starting to really dominate the
profile. Well over 10% of the time appears to be inside the kernel doing
page table silliness. I think a decent chunk of this can be nuked as
well, but it's a little odd as cross-linking in this way isn't really
the primary goal here.

Differential Revision: http://reviews.llvm.org/D10790

llvm-svn: 240995
2015-06-29 21:12:49 +00:00
Rui Ueyama 871847e32d COFF: Fix ICF correctness bug.
When comparing two COMDAT sections, we need to take section values
and associative sections into account. This patch fixes that bug.
It fixes a crash bug of llvm-tblgen when linked with /opt:lldicf.

One thing I don't understand yet is that this logic seems to be
too strict. MSVC linker is able to create more compact executables
(which of course work correctly). With this ICF algorithm, LLD is
able to make executable smaller, but the outputs are larger than
MSVC's. There must be something I'm missing here.

llvm-svn: 240897
2015-06-28 01:30:54 +00:00
Rui Ueyama 7383562bc9 COFF: Align DLL import thunks on 16-byte boundaries.
llvm-svn: 240806
2015-06-26 18:28:56 +00:00
Rui Ueyama 9b921e5dc9 COFF: Merge DefinedRegular and DefinedCOMDAT.
I split them in r240319 because I thought they are different enough
that we should treat them as different types. It turned out that
that was not a good idea. They are so similar that we ended up having
many duplicate code.

llvm-svn: 240706
2015-06-25 22:00:42 +00:00
Rui Ueyama fc510f4cf8 COFF: Devirtualize mark(), markLive() and isCOMDAT().
Only SectionChunk can be dead-stripped. Previously,
all types of chunks implemented these functions,
but their functions were blank.

Likewise, only DefinedRegular and DefinedCOMDAT symbols
can be dead-stripped. markLive() function was implemented
for other symbol types, but they were blank.

I started thinking that the change I made in r240319 was
a mistake. I separated DefinedCOMDAT from DefinedRegular
because I thought that would make the code cleaner, but now
we want to handle them as the same type here. Maybe we
should roll it back.

This change should improve readability a bit as this removes
some dubious uses of reinterpret_cast. Previously, we
assumed that all COMDAT chunks are actually SectionChunks,
which was not very obvious.

llvm-svn: 240675
2015-06-25 19:10:58 +00:00
Rui Ueyama f34c088515 COFF: Simplify. NFC.
llvm-svn: 240666
2015-06-25 17:56:36 +00:00
Rui Ueyama 02c302790f COFF: Don't use COFFHeader->NumberOfRelocations.
The size of the field is 16 bit, so it's inaccurate if the
number of relocations in a section is more than 65535.

llvm-svn: 240661
2015-06-25 17:43:37 +00:00
Rui Ueyama 88e0f9206b COFF: Fix a bug of __imp_ symbol.
The change I made in r240620 was not correct. If a symbol foo is
defined, and if you use __imp_foo, __imp_foo symbol is automatically
defined as a pointer (not just an alias) to foo.

Now that we need to create a chunk for automatically-created symbols.
I defined LocalImportChunk class for them.

llvm-svn: 240622
2015-06-25 03:31:47 +00:00
Rui Ueyama 42aa00b34b COFF: Use COFFObjectFile::getRelocations(). NFC.
llvm-svn: 240614
2015-06-25 00:33:38 +00:00
Rui Ueyama cde92423d7 COFF: Cache raw pointers to relocation tables.
Getting an iterator to the relocation table is very hot operation
in the linker. We do that not only to apply relocations but also
to mark live sections and to do ICF.

libObject's interface is slow. By caching pointers to the first
relocation table entries makes the linker 6% faster to self-link.

We probably need to fix libObject as well.

llvm-svn: 240603
2015-06-24 23:03:17 +00:00
Rui Ueyama ddf71fc370 COFF: Initial implementation of Identical COMDAT Folding.
Identical COMDAT Folding (ICF) is an optimization to reduce binary
size by merging COMDAT sections that contain the same metadata,
actual data and relocations. MSVC link.exe and many other linkers
have this feature. LLD achieves on per with MSVC in terms produced
binary size with this patch.

This technique is pretty effective. For example, LLD's size is
reduced from 64MB to 54MB by enaling this optimization.

The algorithm implemented in this patch is extremely inefficient.
It puts all COMDAT sections into a set to identify duplicates.
Time to self-link with/without ICF are 3.3 and 320 seconds,
respectively. So this option roughly makes LLD 100x slower.
But it's okay as I wanted to achieve correctness first.
LLD is still able to link itself with this optimization.
I'm going to make it more efficient in followup patches.

Note that this optimization is *not* entirely safe. C/C++ require
different functions have different addresses. If your program
relies on that property, your program wouldn't work with ICF.
However, it's not going to be an issue on Windows because MSVC
link.exe turns ICF on by default. As long as your program works
with default settings (or not passing /opt:noicf), your program
would work with LLD too.

llvm-svn: 240519
2015-06-24 04:36:52 +00:00
Peter Collingbourne bd3a29d063 COFF: Remove unused field SectionChunk::SectionIndex.
llvm-svn: 240512
2015-06-24 00:12:36 +00:00
Rui Ueyama 6a60be7749 COFF: Add names for logging/debugging to COMDAT chunks.
Chunks are basically unnamed chunks of bytes, and we don't like
to give them names. However, for logging or debugging, we want to
know symbols names of functions for COMDAT chunks. (For example,
we want to print out "we have removed unreferenced COMDAT section
which contains a function FOOBAR.")

This patch is to do that.

llvm-svn: 240484
2015-06-24 00:00:52 +00:00
Rui Ueyama 588e832d0a COFF: Support base relocations.
PE/COFF executables/DLLs usually contain data which is called
base relocations. Base relocations are a list of addresses that
need to be fixed by the loader if load-time relocation is needed.

Base relocations are in .reloc section.

We emit one base relocation entry for each IMAGE_REL_AMD64_ADDR64
relocation.

In order to save disk space, base relocations are grouped by page.
Each group is called a block. A block starts with a 32-bit page
address followed by 16-bit offsets in the page. That is more
efficient representation of addresses than just an array of 32-bit
addresses.

llvm-svn: 239710
2015-06-15 01:23:58 +00:00
Rui Ueyama 8b33f59bfd COFF: De-virtualize and inline garbage collector functions.
isRoot, isLive and markLive functions are called very frequently.
Previously, they were virtual functions. This patch make them
non-virtual.

Also this patch checks chunk liveness before calling its mark().
Previously, we did that at beginning of markLive(), so the virtual
function would return immediately if it's live. That was inefficient.

llvm-svn: 239458
2015-06-10 04:21:47 +00:00
Rui Ueyama 5b2588ae8c COFF: Print out log messages to stdout.
llvm-svn: 239288
2015-06-08 05:43:50 +00:00