[ThinLTO] Add funtions in callees metadata to CallGraphEdges

Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.

* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
  if (x) fptr=f1; return f1(x);
}
int foo(int x) {
  (*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}

(bar.c)
int bar(int x) {
  return foo(x);
}
```

At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.

This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.

One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
  1:100000
    1: foo:100000
        1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).

Reviewers: tejohnson, mehdi_amini, pcc

Reviewed By: tejohnson

Subscribers: inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D44399

llvm-svn: 327358
This commit is contained in:
Taewook Oh 2018-03-13 04:26:58 +00:00
parent 33a2836307
commit 7646e77969
3 changed files with 68 additions and 0 deletions

View File

@ -291,6 +291,18 @@ computeFunctionSummary(ModuleSummaryIndex &Index, const Module &M,
if (!CalledValue || isa<Constant>(CalledValue))
continue;
// Check if the instruction has a callees metadata. If so, add callees
// to CallGraphEdges to reflect the references from the metadata, and
// to enable importing for subsequent indirect call promotion and
// inlining.
if (auto *MD = I.getMetadata(LLVMContext::MD_callees)) {
for (auto &Op : MD->operands()) {
Function *Callee = mdconst::extract_or_null<Function>(Op);
if (Callee)
CallGraphEdges[Index.getOrInsertValueInfo(Callee)];
}
}
uint32_t NumVals, NumCandidates;
uint64_t TotalCount;
auto CandidateProfileData =

View File

@ -0,0 +1,34 @@
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@fptr = internal unnamed_addr global i32 (i32)* @f2, align 8
define dso_local i32 @foo(i32 %x) local_unnamed_addr {
entry:
%0 = load i32 (i32)*, i32 (i32)** @fptr, align 8
%call = tail call i32 %0(i32 %x), !callees !0
ret i32 %call
}
define internal i32 @f2(i32 %x) {
entry:
%tobool = icmp eq i32 %x, 0
br i1 %tobool, label %if.end, label %if.then
if.then: ; preds = %entry
store i32 (i32)* @f1, i32 (i32)** @fptr, align 8
%sub.i = add nsw i32 %x, -1
br label %if.end
if.end: ; preds = %entry, %if.then
%phi.call = phi i32 [ %sub.i, %if.then ], [ -1, %entry ]
ret i32 %phi.call
}
define internal i32 @f1(i32 %x) {
entry:
%sub = add nsw i32 %x, -1
ret i32 %sub
}
!0 = !{i32 (i32)* @f1, i32 (i32)* @f2}

View File

@ -0,0 +1,22 @@
; Do setup work: generate bitcode and combined index
; RUN: opt -module-summary %s -o %t1.bc
; RUN: opt -module-summary %p/Inputs/callees-metadata.ll -o %t2.bc
; RUN: llvm-lto2 run %t1.bc %t2.bc -o %t.o -save-temps \
; RUN: -r=%t1.bc,bar,plx \
; RUN: -r=%t1.bc,foo,l \
; RUN: -r=%t2.bc,foo,pl
; RUN: llvm-dis %t.o.1.3.import.bc -o - | FileCheck %s
; CHECK: define {{.*}} i32 @f1.llvm.0
; CHECK: define {{.*}} i32 @f2.llvm.0
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define dso_local i32 @bar(i32 %x) {
entry:
%call = call i32 @foo(i32 %x)
ret i32 %call
}
declare dso_local i32 @foo(i32)