Commit Graph

7387 Commits

Author SHA1 Message Date
Rui Ueyama 5419861a52 Remove a wrong performance optimization.
This is a hack for single thread execution. We are using Color[0] and
Color[1] alternately on each iteration. This optimization is to look
at the next slot as opposted to the current slot to get recent results
early. Turns out that the assumption is wrong, because the other slots
are not always have the most recent values, but instead it may have
stale values of the previous iteration. This patch removes that
performance hack.

llvm-svn: 288527
2016-12-02 18:40:43 +00:00
Rui Ueyama 83ec681a5c Removed a wrong assertion about non-colorable sections.
The assertion asserted that colorable sections can never have
a reference to non-colorable sections, but that was simply wrong.
They can have references to non-colorable sections. If that's the
case, referenced sections must be the same in terms of pointer
comparison.

llvm-svn: 288511
2016-12-02 17:23:58 +00:00
Rui Ueyama 3a618e5606 Port parallel ICF to COFF.
LLD used to take 11.73 seconds to link Clang. Now it is 6.94 seconds.
MSVC link takes 83.02 seconds. Note that ICF is enabled by default on
Windows, so a low latency ICF is more important than in ELF.

llvm-svn: 288487
2016-12-02 08:03:58 +00:00
Rafael Espindola 5708b2f8a6 Ignore R_X86_64_NONE.
It looks like the way dtrace works is

* The user creates .o files that reference magical symbol names.
* dtrace reads those files, collecs the info it needs and changes the
  relocation to R_X86_64_NONE expecting the linker to ignore them.

llvm-svn: 288485
2016-12-02 08:00:09 +00:00
Rui Ueyama 27498b5dd5 Fix a bug in ICF involving COFF associative sections.
Associative sections are sections that need to be linked if their associated
sections are linked. Associative sections are used to append auxiliary data
such as debug info.

Previously, we compared all associative sections when comparing two comdat
sections. Because usually assocative sections are not mergeable sections,
we missed a lot of mergeable sections. MSVC linker doesn't seem to check
the identity of associative sections.

This patch makes LLD to ignore associative sections when doing ICF.

llvm-svn: 288483
2016-12-02 07:46:12 +00:00
Rui Ueyama 1b6bab011c Fix the worse case performance of ICF.
r288228 seems to have regressed ICF performance in some cases in which
a lot of sections are actually mergeable. In r288228, I made a change
to create a Range object for each new color group. So every time we
split a group, we allocated and added a new group to a list of groups.

This patch essentially reverted r288228 with an improvement to
parallelize the original algorithm.

Now the ICF main loop is entirely allocation-free and lock-free.

Just like pre-r288228, we search for group boundaries by linear scan
instead of managing the information using Range class. r288228 was
neutral in performance-wise, and so is this patch.

I confirmed that this produces the exact same result as before
using chromium and clang as tests.

llvm-svn: 288480
2016-12-02 05:35:46 +00:00
Rafael Espindola 103fc28961 Add a test documenting how we handle addends on Elf_Rela.
llvm-svn: 288477
2016-12-02 04:20:47 +00:00
Rafael Espindola 858c092daa Allow duplicated abs symbols with the same value.
This is a fairly reasonable bfd extension since there is one obvious value.

dtrace depends on this feature as it creates multiple absolute
symbols with the same value.

llvm-svn: 288461
2016-12-02 02:58:21 +00:00
Rafael Espindola f4ff80c128 Write the addent to got entries when using Elf_Rel.
llvm-svn: 288451
2016-12-02 01:57:24 +00:00
Rui Ueyama 395859bdb7 Fix undefined behavior.
New items can be added to Ranges here, and that invalidates
an iterater that previously pointed the end of the vector.

llvm-svn: 288443
2016-12-02 00:38:15 +00:00
Rui Ueyama a6cd5fe415 Add an assert instead of ignoring an impossible condition.
llvm-svn: 288419
2016-12-01 21:41:06 +00:00
Rui Ueyama 91ae861af5 Updates file comments and variable names.
Use "color" instead of "group id" to describe the ICF algorithm.

llvm-svn: 288409
2016-12-01 19:45:22 +00:00
Rui Ueyama c1835319c9 Parallelize ICF to make LLD's ICF really fast.
ICF is short for Identical Code Folding. It is a size optimization to
identify two or more functions that happened to have the same contents
to merges them. It usually reduces output size by a few percent.

ICF is slow because it is computationally intensive process. I tried
to paralellize it before but failed because I couldn't make a
parallelized version produce consistent outputs. Although it didn't
create broken executables, every invocation of the linker generated
slightly different output, and I couldn't figure out why.

I think I now understand what was going on, and also came up with a
simple algorithm to fix it. So is this patch.

The result is very exciting. Chromium for example has 780,662 input
sections in which 20,774 are reducible by ICF. LLD previously took
7.980 seconds for ICF. Now it finishes in 1.065 seconds.

As a result, LLD can now link a Chromium binary (output size 1.59 GB)
in 10.28 seconds on my machine with ICF enabled. Compared to gold
which takes 40.94 seconds to do the same thing, this is an amazing
number.

From here, I'll describe what we are doing for ICF, what was the
previous problem, and what I did in this patch.

In ICF, two sections are considered identical if they have the same
section flags, section data, and relocations. Relocations are tricky,
becuase two relocations are considered the same if they have the same
relocation type, values, and if they point to the same section _in
terms of ICF_.

Here is an example. If foo and bar defined below are compiled to the
same machine instructions, ICF can (and should) merge the two,
although their relocations point to each other.

  void foo() { bar(); }
  void bar() { foo(); }

This is not an easy problem to solve.

What we are doing in LLD is some sort of coloring algorithm. We color
non-identical sections using different colors repeatedly, and sections
in the same color when the algorithm terminates are considered
identical. Here is the details:

  1. First, we color all sections using their hash values of section
  types, section contents, and numbers of relocations. At this moment,
  relocation targets are not taken into account. We just color
  sections that apparently differ in different colors.

  2. Next, for each color C, we visit sections having color C to see
  if their relocations are the same. Relocations are considered equal
  if their targets have the same color. We then recolor sections that
  have different relocation targets in new colors.

  3. If we recolor some section in step 2, relocations that were
  previously pointing to the same color targets may now be pointing to
  different colors. Therefore, repeat 2 until a convergence is
  obtained.

Step 2 is a heavy operation. For Chromium, the first iteration of step
2 takes 2.882 seconds, and the second iteration takes 1.038 seconds,
and in total it needs 23 iterations.

Parallelizing step 1 is easy because we can color each section
independently. This patch does that.

Parallelizing step 2 is tricky. We could work on each color
independently, but we cannot recolor sections in place, because it
will break the invariance that two possibly-identical sections must
have the same color at any moment.

Consider sections S1, S2, S3, S4 in the same color C, where S1 and S2
are identical, S3 and S4 are identical, but S2 and S3 are not. Thread
A is about to recolor S1 and S2 in C'. After thread A recolor S1 in
C', but before recolor S2 in C', other thread B might observe S1 and
S2. Then thread B will conclude that S1 and S2 are different, and it
will split thread B's sections into smaller groups wrongly. Over-
splitting doesn't produce broken results, but it loses a chance to
merge some identical sections. That was the cause of indeterminism.

To fix the problem, I made sections have two colors, namely current
color and next color. At the beginning of each iteration, both colors
are the same. Each thread reads from current color and writes to next
color. In this way, we can avoid threads from reading partial
results. After each iteration, we flip current and next.

This is a very simple solution and is implemented in less than 50
lines of code.

I tested this patch with Chromium and confirmed that this parallelized
ICF produces the identical output as the non-parallelized one.

Differential Revision: https://reviews.llvm.org/D27247

llvm-svn: 288373
2016-12-01 17:09:04 +00:00
Sean Silva 2eed75926c Add `isRelExprOneOf` helper
In various places in LLD's hot loops, we have expressions of the form
"E == R_FOO || E == R_BAR || ..." (E is a RelExpr).

Some of these expressions are quite long, and even though they usually go just
a very small number of ways and so should be well predicted, they can still
occupy branch predictor resources harming other parts of the code, or they
won't be predicted well if they overflow branch predictor resources or if the
branches are too dense and the branch predictor can't track them all (the
compiler can in theory avoid this, at a cost in text size). And some of these
expressions are so large and executed so frequently that even when
well-predicted they probably still have a nontrivial cost.

This speedup should be pretty portable. The cost of these simple bit tests is
independent of:

- the target we are linking for
- the distribution of RelExpr's for a given link (which can depend on how the
  input files were compiled)
- what compiler was used to compile LLD (it is just a simple bit test;
  hopefully the compiler gets it right!)
- adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra
  cost checking R_PPC_PLT_OPD on x86-64 builds)

I did some rough measurements on clang-fsds and this patch gives over about 4%
speedup for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5%
for -O0. Sorry, I don't have my current machine set up for doing really
accurate measurements right now.

This also is just a bit cleaner. Thanks for Joerg for suggesting for
this approach.

Differential Revision: https://reviews.llvm.org/D27156

llvm-svn: 288314
2016-12-01 05:43:48 +00:00
Rui Ueyama 10091b0ac2 Simplify ScriptParser.
- Rename currentBuffer -> getCurrentMB to start it with verb.
 - Simplify containsString.
 - Add llvm_unreachable at end of getCurrentMB.

llvm-svn: 288310
2016-12-01 04:36:54 +00:00
Rui Ueyama 3cd22d3104 Do not name a variable Ret which is not a return value.
llvm-svn: 288309
2016-12-01 04:36:51 +00:00
Rui Ueyama b5f1c3ec0c Make get{Line,Column}Number members of StringParser.
This patch also renames currentLocation getCurrentLocation.

llvm-svn: 288308
2016-12-01 04:36:49 +00:00
Rui Ueyama 50fb82743e Split getPos into getLineNumber and getColumnNumber.
llvm-svn: 288306
2016-12-01 03:56:27 +00:00
Rui Ueyama c5cb737584 Dump Codeview type information correctly.
llvm-svn: 288298
2016-12-01 01:22:48 +00:00
Rui Ueyama 9dedfb1fa8 Change how we manage groups in ICF.
Previously, on each iteration in ICF, we scan the entire vector of
input sections to find boundaries of groups having the same ID.

This patch changes the algorithm so that we now have a vector of ranges.
Each range contains a starting index and an ending index of the group.
So we no longer have to search boundaries on each iteration.

Performance-wise, this seems neutral. Instead of searching boundaries,
we now have to maintain ranges. But I think this is more readable
than the previous implementation.

Moreover, this makes easy to parallelize the main loop of ICF,
which I'll do in a follow-up patch.

llvm-svn: 288228
2016-11-30 01:50:03 +00:00
Rui Ueyama 84e65a7ca1 Use StringRefZ explicitly instead of const char *.
This patch is to avoid an implicit conversion from const char * to
StringRefZ, to make it apparent where we are using StringRefZ.

llvm-svn: 288182
2016-11-29 19:11:39 +00:00
Rui Ueyama a13efc2a73 Introduce StringRefZ class to represent null-terminated strings.
StringRefZ is a class to represent a null-terminated string. String
length is computed lazily, so it's more efficient than StringRef to
represent strings in string table.

The motivation of defining this new class is to merge functions
that only differ in string types; we have many constructors that takes
`const char *` or `StringRef`. With StringRefZ, we can merge them.

Differential Revision: https://reviews.llvm.org/D27037

llvm-svn: 288172
2016-11-29 18:05:04 +00:00
Peter Smith de3e73880e [ELF] Add support for static TLS to ARM
The module index dynamic relocation R_ARM_DTPMOD32 is always 1 for an
executable. When static linking and when we know that we are not a shared
object we can resolve the module index relocation statically.
    
The logic in handleNoRelaxTlsRelocation remains the same for Mips as it
has its own custom GOT writing code. For ARM we add the module index
relocation to the GOT when it can be resolved statically.
    
In addition the type of the RelExpr for the static resolution of TlsGotRel
should be R_TLS and not R_ABS as we need to include the size of
the thread control block in the calculation.
    
Addresses the TLS part of PR30218.

Differential revision: https://reviews.llvm.org/D27213

llvm-svn: 288153
2016-11-29 16:23:50 +00:00
George Rimar 9b3ae73fc8 [ELF] - Disable emiting multiple output sections when merging is disabled.
When -O0 is specified, we do not do section merging.
Though before this patch several sections were generated instead
of single, what is useless.

Differential revision: https://reviews.llvm.org/D27041

llvm-svn: 288151
2016-11-29 16:11:09 +00:00
George Rimar 3fb5a6dc9e [ELF] - Add support of proccessing of the rest allocatable synthetic sections from linkerscript.
This change continues what was started by D27040
Now all allocatable synthetics should be available from script side.

Differential revision: https://reviews.llvm.org/D27131

llvm-svn: 288150
2016-11-29 16:05:27 +00:00
Simon Atanasyan 160bf723f5 [ELF][MIPS] Make stable an order of GOT page address entries
llvm-svn: 288137
2016-11-29 13:26:04 +00:00
Simon Atanasyan 198dd205e6 [ELF][MIPS] Add new check to the test case in attempt to investigate Windows build-bot failure
llvm-svn: 288132
2016-11-29 11:19:47 +00:00
Simon Atanasyan 9705ff74ed [ELF][MIPS] Restore Config->Threads for MIPS targets
llvm-svn: 288130
2016-11-29 10:24:00 +00:00
Simon Atanasyan 9fae3b8a2c [ELF][MIPS] Do not change MipsGotSection state in the getPageEntryOffset method
The MipsGotSection::getPageEntryOffset calculates index of GOT entry
with a "page" address. Previously this method changes the state
of MipsGotSection because it modifies PageIndexMap field. That leads
to the unpredictable results if getPageEntryOffset called from multiple threads.

The patch makes getPageEntryOffset constant. To do so it calculates GOT
entry index but does not update PageIndexMap field. Later in the
MipsGotSection::writeTo method linker calculates "page" addresses and
writes them to the output.

llvm-svn: 288129
2016-11-29 10:23:56 +00:00
Simon Atanasyan a0efc4268c [ELF][MIPS] Replace the magic number of GOT header entries by constant. NFC
llvm-svn: 288128
2016-11-29 10:23:50 +00:00
Simon Atanasyan 80f3d9ce93 [ELF][MIPS] Fix calculation of GOT "page address" entries number
If output section which referenced by R_MIPS_GOT_PAGE or R_MIPS_GOT16
relocations is small (less that 0x10000 bytes) and occupies two adjacent
0xffff-bytes pages, current formula gives incorrect number of required "page"
GOT entries. The problem is that in time of calculation we do not know
the section address and so we cannot calculate number of 0xffff-bytes
pages exactly.

This patch fix the formula. Now it gives a correct number of pages in
the worst case when "small" section intersects 0xffff-bytes page
boundary. From the other side, sometimes it adds one more redundant GOT
entry for each output section. But usually number of output sections
referenced by GOT relocations is small.

llvm-svn: 288127
2016-11-29 10:23:46 +00:00
George Rimar 595a763f38 [ELF] - Implemented -N (-omagic) command line option.
-N (-omagic)
  Set the text and data sections to be readable and writable. 
  Also, do not page-align the data segment.

Differential revision: https://reviews.llvm.org/D26888

llvm-svn: 288123
2016-11-29 09:43:51 +00:00
Eugene Leviant 84569e6caa [ELF] Refactor target error messages
Differential revision: https://reviews.llvm.org/D27097

llvm-svn: 288114
2016-11-29 08:05:44 +00:00
Rui Ueyama bf4ddeb97c Style fix.
llvm-svn: 288113
2016-11-29 04:22:57 +00:00
Rui Ueyama c910bc7e19 Simplify "missing argument" error message.
llvm-svn: 288112
2016-11-29 04:17:31 +00:00
Rui Ueyama e5e407beb4 Add comments.
llvm-svn: 288111
2016-11-29 04:17:30 +00:00
Rui Ueyama bd2a812ff0 Print error message header in red.
llvm-svn: 288110
2016-11-29 04:09:08 +00:00
Rafael Espindola f1e245315b Use relocations to fill statically known got entries.
Right now we just remember a SymbolBody for each got entry and
duplicate a bit of logic to decide what value, if any, should be
written for that SymbolBody.

With ARM there will be more complicated values, and it seems better to
just use the relocation code to fill the got entries. This makes it
clear that each entry is filled by the dynamic linker or by the static
linker.

llvm-svn: 288107
2016-11-29 03:45:36 +00:00
Rafael Espindola d3b32df3de Sort. NFC.
llvm-svn: 288102
2016-11-29 03:36:30 +00:00
Rafael Espindola 5498ba38df Add a test.
It would have found a missing case in another patch.

llvm-svn: 288101
2016-11-29 03:30:07 +00:00
George Rimar 1642c5d871 [ELF] - Do not put non exec sections first when -no-rosegment
That unifies handling cases when we have SECTIONS and when
-no-rosegment is given in compareSectionsNonScript()

Now Config->SingleRoRx is used for check, testcase is provided.

llvm-svn: 288022
2016-11-28 10:26:21 +00:00
George Rimar 18a3096282 [ELF] - Set Config->SingleRoRx differently. NFC.
Previously Config->SingleRoRx was set in
createFiles() and used HasSections.

This change moves it to readConfigs at place of
common flags handling, and adds logic that sets
this flag separatelly from ScriptParser if SECTIONS present.

llvm-svn: 288021
2016-11-28 10:11:10 +00:00
George Rimar 63bf011003 [ELF] - Implemented -no-rosegment.
--no-rosegment: Do not put read-only non-executable sections in their own segment

Differential revision: https://reviews.llvm.org/D26889

llvm-svn: 288020
2016-11-28 10:05:20 +00:00
Eugene Leviant ed30ce7ae4 [ELF] Print file:line for 'undefined section' errors
Differential revision: https://reviews.llvm.org/D27108

llvm-svn: 288019
2016-11-28 09:58:04 +00:00
Rafael Espindola 8e67000f1a Always create a PT_ARM_EXIDX if needed.
Unfortunatelly PT_ARM_EXIDX is special. There is no way to create it
from linker scripts, so we have to create it even if PHDRS is used.

This matches bfd and is required for the lld output to survive bfd's strip.

llvm-svn: 288012
2016-11-28 00:40:21 +00:00
Rui Ueyama 1dd86a664f Add paralell_for and use it where appropriate.
When we iterate over numbers as opposed to iterable elements,
parallel_for fits better than parallel_for_each.

llvm-svn: 288002
2016-11-27 19:28:32 +00:00
Rafael Espindola 5fcc99c27d Also skip regular symbol assignment at the start of a script.
Unfortunatelly some scripts look like

kernphys = ...
. = ....

and the expectation in that every orphan section is after the
assignment.

llvm-svn: 287996
2016-11-27 09:44:45 +00:00
Rafael Espindola 7fe4ec9b3a Don't put an orphan before the first . assignment.
This is an horrible special case, but seems to match bfd's behaviour
and is important for avoiding placing an orphan section before the
expected start of the file.

llvm-svn: 287994
2016-11-27 07:39:45 +00:00
Rui Ueyama e8a077badf Change return types of split{Non,}Strings.
They return new vectors, but at the same time they mutate other vectors,
so returning values doesn't make much sense. We should just mutate two
vectors.

llvm-svn: 287979
2016-11-26 15:15:11 +00:00
Rui Ueyama 72b1ee2533 Make getColorDiagnostics return a boolean value instead of an enum.
Config->ColorDiagnostics was of type enum before. Now it is just a
boolean flag. Thanks Rafael for suggestion.

llvm-svn: 287978
2016-11-26 15:10:01 +00:00