don't inhibit random field reordering on repr(packed(1))
`inhibit_struct_field_reordering_opt` being false means we exclude this type from random field shuffling. However, `packed(1)` types can still be shuffled! The logic was added in https://github.com/rust-lang/rust/pull/48528 since it's pointless to reorder fields in packed(1) types (there's no padding that could be saved) -- but that shouldn't inhibit `-Zrandomize-layout` (which did not exist at the time).
We could add an optimization elsewhere to not bother sorting the fields for `repr(packed)` types, but I don't think that's worth the effort.
This *does* change the behavior in that we may now reorder fields of `packed(1)` structs (e.g. if there are niches, we'll try to move them to the start/end, according to `NicheBias`). We were always allowed to do that but so far we didn't. Quoting the [reference](https://doc.rust-lang.org/reference/type-layout.html):
> On their own, align and packed do not provide guarantees about the order of fields in the layout of a struct or the layout of an enum variant, although they may be combined with representations (such as C) which do provide such guarantees.
coverage: Memoize and simplify counter expressions
When creating coverage counter expressions as part of coverage instrumentation, we often end up creating obviously-redundant expressions like `c1 + (c0 - c1)`, which is equivalent to just `c0`.
To avoid doing so, this PR checks when we would create an expression matching one of 5 patterns, and uses the simplified form instead:
- `(a - b) + b` → `a`.
- `(a + b) - b` → `a`.
- `(a + b) - a` → `b`.
- `a + (b - a)` → `b`.
- `a - (a - b)` → `b`.
Of all the different ways to combine 3 operands and 2 operators, these are the patterns that allow simplification.
(Some of those patterns currently don't occur in practice, but are included anyway for completeness, to avoid having to add them later as branch coverage and MC/DC coverage support expands.)
---
This PR also adds memoization for newly-created (or newly-simplified) counter expressions, to avoid creating duplicates.
This currently makes no difference to the final mappings, but is expected to be useful for MC/DC coverage of match expressions, as proposed by https://github.com/rust-lang/rust/pull/124278#issuecomment-2106754753.
Some of these cases currently don't occur in practice, but are included for
completeness, and to avoid having to add them later as branch coverage and
MC/DC coverage start building more complex expressions.
Avoid a cast in `ptr::slice_from_raw_parts(_mut)`
Casting to `*const ()` or `*mut ()` is no longer needed after https://github.com/rust-lang/rust/pull/123840 so let's make the MIR smaller (and more inline-able, as seen in the tests).
If [ACP#362](https://github.com/rust-lang/libs-team/issues/362) goes through we can keep calling `ptr::from_raw_parts(_mut)` in these also without the cast, but that hasn't had any libs-api attention yet, so I'm not waiting on it.
never patterns: lower never patterns to `Unreachable` in MIR
This lowers a `!` pattern to "goto Unreachable". Ideally I'd like to read from the place to make it clear that the UB is coming from an invalid value, but that's tricky so I'm leaving it for later.
r? `@compiler-errors` how do you feel about a lil bit of MIR lowering
Casting to `*const ()` or `*mut ()` just bloats the MIR, so let's not.
If ACP#362 goes through we can keep calling `ptr::from_raw_parts(_mut)` in these also without the cast, but that hasn't had any libs-api attention yet, so I'm not waiting on it.
Account for immutably borrowed locals in MIR copy-prop and GVN
For the most part, we consider that immutably borrowed `Freeze` locals still fulfill SSA conditions. As the borrow is immutable, any use of the local will have the value given by the single assignment, and there can be no surprise.
This allows copy-prop to merge a non-borrowed local with a borrowed local. We chose to keep copy-classes heads unborrowed, as those may be easier to optimize in later passes.
This also allows to GVN the value behind an immutable borrow. If a SSA local is borrowed, dereferencing that borrow is equivalent to copying the local's value: re-executing the assignment between the borrow and the dereference would be UB.
r? `@ghost` for perf
deref patterns: lower deref patterns to MIR
This lowers deref patterns to MIR. This is a bit tricky because this is the first kind of pattern that requires storing a value in a temporary. Thanks to https://github.com/rust-lang/rust/pull/123324 false edges are no longer a problem.
The thing I'm not confident about is the handling of fake borrows. This PR ignores any fake borrows inside a deref pattern. We are guaranteed to at least fake borrow the place of the first pointer value, which could be enough, but I'm not certain.
Disable SimplifyToExp in MatchBranchSimplification
Due to the miscompilation mentioned in #124150, We need to disable MatchBranchSimplification temporarily.
To fully resolve this issue, my plan is:
1. Disable SimplifyToExp in MatchBranchSimplification (this PR).
2. Remove all potentially unclear transforms in #124122.
3. Gradually add back the removed transforms (possibly multiple PRs).
r? `@Nilstrieb` or `@oli-obk`
Make `checked` ops emit *unchecked* LLVM operations where feasible
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.
For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.
cc #103299
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them.
For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.
Re-enable the early otherwise branch optimization
Closes#95162. Fixes#119014.
This is the first part of #121397.
An invalid enum discriminant can come from anywhere. We have to check to see if all successors contain the discriminant statement. This should have a pass to hoist instructions.
r? cjgillot
Pass list of defineable opaque types into canonical queries
This eliminates `DefiningAnchor::Bubble` for good and brings the old solver closer to the new one wrt cycles and nested obligations. At that point the difference between `DefiningAnchor::Bind([])` and `DefiningAnchor::Error` was academic. We only used the difference for some sanity checks, which actually had to be worked around in places, so I just removed `DefiningAnchor` entirely and just stored the list of opaques that may be defined.
fixes#108498
fixes https://github.com/rust-lang/rust/issues/116877
* [x] run crater
- https://github.com/rust-lang/rust/pull/122077#issuecomment-2013293931
match lowering: make false edges more precise
When lowering match expressions, we add false edges to hide details of the lowering from borrowck. Morally we pretend we're testing the patterns (and guards) one after the other in order. See the tests for examples. Problem is, the way we implement this today is too coarse for deref patterns.
In deref patterns, a pattern like `deref [1, x]` matches on a `Vec` by creating a temporary to store the output of the call to `deref()` and then uses that to continue matching. Here the pattern has a binding, which we set up after the pre-binding block. Problem is, currently the false edges tell borrowck that the pre-binding block can be reached from a previous arm as well, so the `deref()` temporary may not be initialized. This triggers an error when we try to use the binding `x`.
We could call `deref()` a second time, but this opens the door to soundness issues if the deref impl is weird. Instead in this PR I rework false edges a little bit.
What we need from false edges is a (fake) path from each candidate to the next, specifically from candidate C's pre-binding block to next candidate D's pre-binding block. Today, we link the pre-binding blocks directly. In this PR, I link them indirectly by choosing an earlier node on D's success path. Specifically, I choose the earliest block on D's success path that doesn't make a loop (if I chose e.g. the start block of the whole match (which is on the success path of all candidates), that would make a loop). This turns out to be rather straightforward to implement.
r? `@matthewjasper` if you have the bandwidth, otherwise let me know
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`
Per [#120268](https://github.com/rust-lang/rust/pull/120268#discussion_r1517492060), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` .
I solved some nits to add some comments.
I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead.
r? RalfJung
rename ptr::from_exposed_addr -> ptr::with_exposed_provenance
As discussed on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/To.20expose.20or.20not.20to.20expose/near/427757066).
The old name, `from_exposed_addr`, makes little sense as it's not the address that is exposed, it's the provenance. (`ptr.expose_addr()` stays unchanged as we haven't found a better option yet. The intended interpretation is "expose the provenance and return the address".)
The new name nicely matches `ptr::without_provenance`.
De-LLVM the unchecked shifts [MCP#693]
This is just one part of the MCP (https://github.com/rust-lang/compiler-team/issues/693), but it's the one that IMHO removes the most noise from the standard library code.
Seems net simpler this way, since MIR already supported heterogeneous shifts anyway, and thus it's not more work for backends than before.
r? WaffleLapkin
Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
This is just one part of the MCP, but it's the one that IMHO removes the most noise from the standard library code.
Seems net simpler this way, since MIR already supported heterogeneous shifts anyway, and thus it's not more work for backends than before.
In `ConstructCoroutineInClosureShim`, pass receiver by mut ref, not mut pointer
The receivers were compatible at codegen time, but did not necessarily have the same layouts due to niches, which was caught by miri.
Fixesrust-lang/miri#3400
r? oli-obk
match lowering: consistently merge simple or-patterns
There are two places where we expand or-patterns in match lowering: the main one is `test_candidates_with_or`, and there's one in `match_candidates` that's an optimization for the simple case where the whole pattern is just one or-pattern.
To reduce duplication, we merge or-pattern alternatives into a single block when possible, but we only to that in `test_candidates_with_or`. This PR fixes this oversight and merges them in `match_candidates` too.
This is a part of splitting up https://github.com/rust-lang/rust/pull/122046 into smaller bits.
This makes `-Zprint-type-sizes`'s output easier to read, because the
name of an `async fn` is more immediately recognizable than its span.
I also deleted the comment "FIXME(eddyb) should use `def_span`." because
it appears to have already been fixed by commit 67727aa7c3.
This saves some debug and scope metadata in every single function that calls it.
Normally wouldn't be worth it, but with the derives there's *so* many of these.
Fix validation on substituted callee bodies in MIR inliner
When inlining a coroutine, we will substitute the MIR body with the args of the call. There is code in the MIR validator that attempts to prevent query cycles, and will use the coroutine body directly when it detects that's the body that's being validated. That means that when inlining a coroutine body that has been substituted, it may no longer be parameterized over the original args of the coroutine, which will lead to substitution ICEs.
Fixes#119064
refactor check_{lang,library}_ub: use a single intrinsic
This enacts the plan I laid out [here](https://github.com/rust-lang/rust/pull/122282#issuecomment-1996917998): use a single intrinsic, called `ub_checks` (in aniticpation of https://github.com/rust-lang/compiler-team/issues/725), that just exposes the value of `debug_assertions` (consistently implemented in both codegen and the interpreter). Put the language vs library UB logic into the library.
This makes it easier to do something like https://github.com/rust-lang/rust/pull/122282 in the future: that just slightly alters the semantics of `ub_checks` (making it more approximating when crates built with different flags are mixed), but it no longer affects whether these checks can happen in Miri or compile-time.
The first commit just moves things around; I don't think these macros and functions belong into `intrinsics.rs` as they are not intrinsics.
r? `@saethlin`
Some of the marker statements used by coverage are added during MIR building
for use by the InstrumentCoverage pass (during analysis), and are not needed
afterwards.
misc cleanups from debugging something
rename `instantiate_canonical_with_fresh_inference_vars` to `instantiate_canonical` the substs for the canonical are not solely infer vars as that would be wildly wrong and it is rather confusing to see this method called and think that the entire canonicalization setup is completely broken when it is not 👍
also update region debug printing to be more like the custom impls for Ty/Const, right now regions in debug output are horribly verbose and make it incredibly hard to read but with this atleast boundvars and placeholders when debugging the new solver do not take up excessive amounts of space.
r? `@lcnr`
Remove some only- clauses from mir-opt tests
Derived from https://github.com/rust-lang/rust/pull/122295
Many of these tests were originally codegen tests, and MIR is more trivially portable than LLVM IR. We simply don't need to restrict the platform in most cases.
r? Nadrieril
match lowering: don't collect test alternatives ahead of time
I'm very happy with this one. Before this, when sorting candidates into the possible test branches, we manually computed `usize` indices to determine in which branch each candidate goes. To make this work we had a first pass that collected the possible alternatives we'd have to deal with, and a second pass that actually sorts the candidates.
In this PR, I replace `usize` indices with a dedicated enum. This makes `sort_candidates` easier to follow, and we don't need the first pass anymore.
r? ``@matthewjasper``
Add FileCheck annotations to MIR-opt unnamed-fields tests
Part of #116971
Adds filecheck annotations to unnamed-fields mir-opt tests in `tests/mir-opt/unnamed-fields`
match lowering: Lower bindings in a predictable order
After the recent refactorings, we can now lower bindings in a truly predictable order. The order in https://github.com/rust-lang/rust/pull/120214 was an improvement but not very clear. With this PR, we lower bindings from left to right, with the special case that `x @ pat` is traversed as `pat @ x` (i.e. `x` is lowered after any bindings in `pat`).
This description only applies in the absence of or-patterns. Or-patterns make everything complicated, because the binding place depends on the subpattern. Until I have a better idea I leave them to be handled in whatever weird order arises from today's code.
r? `@matthewjasper`
match lowering: Separate the `bool` case from other integers in `TestKind`
`TestKind::SwitchInt` had a special case for `bool` essentially everywhere it's used, so I made `TestKind::If` to handle the bool case on its own.
r? `@matthewjasper`
Make the success arms of `if lhs || rhs` meet up in a separate block
Extracted from #118305, where this is necessary to avoid introducing a bug when injecting marker statements into the then/else arms.
---
In the previous code (#111752), the success block of `lhs` would jump directly to the success block of `rhs`. However, `rhs_success_block` could already contain statements that are specific to the RHS, and the direct goto causes them to be executed in the LHS success path as well.
This patch therefore creates a fresh block that the LHS and RHS success blocks can both jump to.
---
I think the reason we currently get away with this is that `rhs_success_block` usually doesn't contain anything other than StorageDead statements for locals used by the RHS, and those statements don't seem to cause problems in the LHS success path (which never makes those locals live).
But if we start adding meaningful statements for branch coverage (or MC/DC coverage), it's important to keep the LHS and RHS blocks separate.
In the previous code, the success block of `lhs` would jump directly to the
success block of `rhs`. However, `rhs_success_block` could already contain
statements that are specific to the RHS, and the direct goto causes them to be
executed in the LHS success path as well.
This patch therefore creates a fresh block that the LHS and RHS success blocks
can both jump to.
Add `#[rustc_no_mir_inline]` for standard library UB checks
should help with #121110 and also with #120848
Because the MIR inliner cannot know whether the checks are enabled or not, so inlining is an unnecessary compile time pessimization when debug assertions are disabled. LLVM knows whether they are enabled or not, so it can optimize accordingly without wasting time.
r? `@saethlin`
match lowering: simplify empty candidate selection
In match lowering, `match_simplified_candidates` is tasked with removing candidates that are fully matched and linking them up properly. The code that does that was needlessly complicated; this PR simplifies it.
The overall change isn't big but I split it up into tiny commits to convince myself that I was correctly preserving behavior. The test changes are all due to the first commit. Let me know if you'd prefer me to split up the PR to make reviewing easier.
r? `@matthewjasper`
match lowering: eagerly simplify match pairs
This removes one important complication from match lowering. Before this, match pair simplification (which includes collecting bindings and type ascriptions) was intertwined with the whole match lowering algorithm.
I'm avoiding this by storing in each `MatchPair` the sub-`MatchPair`s that correspond to its subfields. This makes it possible to simplify everything (except or-patterns) in `Candidate::new()`.
This should open up further simplifications. It will also give us proper control over the order of bindings.
r? `@matthewjasper`
Use intrinsics::debug_assertions in debug_assert_nounwind
This is the first item in https://github.com/rust-lang/rust/issues/120848.
Based on the benchmarking in this PR, it looks like, for the programs in our benchmark suite, enabling all these additional checks does not introduce significant compile-time overhead, with the single exception of `Alignment::new_unchecked`. Therefore, I've added `#[cfg(debug_assertions)]` to that one call site, so that it remains compiled out in the distributed standard library.
The trailing commas in the previous calls to `debug_assert_nounwind!` were causing the macro to expand to `panic_nouwnind_fmt`, which requires more work to set up its arguments, and that overhead alone is measured between this perf run and the next: https://github.com/rust-lang/rust/pull/120863#issuecomment-1937423502
match lowering: simplify block creation
Match lowering was doing complicated things with block creation. As far as I can tell it was trying to avoid creating unneeded blocks, but of the three places that start out with `otherwise = &mut None`, two of them called `otherwise.unwrap_or_else(|| self.cfg.start_new_block())` anyway. As far as I can tell the only place where this PR makes a difference is in `lower_match_tree`, which did indeed sometimes avoid creating the unreachable final block + FakeRead. Unless this is important I propose we do the naive thing instead.
I have not checked all the graph isomorphisms by hand, but at a glance the test diff looks sensible.
r? `@matthewjasper`