Make the `LowerState` pass allow operations to remain in the top-level
`arc.model` op after state lowering. This is necessary for lowering the
model op into an `eval` function in the future. Make use of this new
flexibility by inserting logic into the model that detects edges on the
clocks of the `arc.clock_tree` ops. The clock trees no longer trigger on
the clock itself, but are given an "enable" signal that indicates
whether a clock edge has been observed.
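
As a rough illustration of the enable-based triggering described above,
here is a minimal Python sketch. It is not the pass's actual output: the
`Model` class, its fields, and the choice of rising-edge detection are
assumptions made for the example.

```python
class Model:
    """Toy stand-in for a lowered model; all names here are hypothetical."""

    def __init__(self):
        self.prev_clk = 0  # last observed clock value, kept as model state
        self.counter = 0   # a register updated by the clock tree

    def clock_tree(self, enable):
        # The body no longer triggers on the clock itself; it only runs
        # when the enable signal reports that an edge was observed.
        if enable:
            self.counter += 1

    def step(self, clk):
        # Edge detection lives in the model: rising edge = was 0, now 1.
        enable = self.prev_clk == 0 and clk == 1
        self.prev_clk = clk
        self.clock_tree(enable)


model = Model()
for clk in [0, 1, 1, 0, 1]:
    model.step(clk)
print(model.counter)
```

The clock sequence contains two rising edges, so the clock tree body
runs twice even though `step` is called five times.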
In the future, we'll want to schedule the ops in the `arc.model` and
lower it to a separate `eval` function, instead of throwing it away. The
user will then no longer have to call clock functions manually, but can
call a single `eval` function instead. A centralized function
that performs model execution will also allow us to properly simulate
clock edges occurring at the same time -- something which is impossible
today.
Together with the `arc.clock_domain` op, this `eval` function will make
the entire clock detection and grouping a performance optimization
instead of a required transformation. Theoretically, even if we did not
separate state with the same clock into dedicated clock functions, we
would still be able to generate an `eval` function, with all logic inlined.
This will ultimately make the Arc dialect more robust and the transforms
more composable.
To print debug info in MLIR output, the AsmPrinter CLI option
`mlir-print-debuginfo` can be used. However, to print debug info in the
LLVM output, a pass that adds the necessary DILocation attributes has to
be run first. To make this easy to use and uniform regardless of whether
MLIR or LLVM is printed, a `print-debug-info` CLI option is added.
Add two passes to lower a design to a software model.
The `LowerClocksToFuncs` pass outlines all `arc.clock_tree` and
`arc.passthrough` operations into separate MLIR functions. This
conceptually converts clocks from being a signal in the design into a
function that can be called in order to execute the state update
triggered by that clock.
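
To make the "clock as a callable function" idea concrete, here is a
small Python sketch. The function names, the byte offsets, and the use of
a `bytearray` as model storage are all hypothetical; the real pass
produces MLIR functions.

```python
def clock_fn(storage):
    """Hypothetical outlined clock tree: the state update for one clock."""
    # The register at byte 1 captures the input port at byte 0.
    storage[1] = storage[0]


def passthrough_fn(storage):
    """Hypothetical outlined passthrough: paths that feed the outputs."""
    # The output port at byte 2 is the register value plus one (mod 256).
    storage[2] = (storage[1] + 1) & 0xFF


storage = bytearray(3)
storage[0] = 41          # drive the input port
clock_fn(storage)        # the caller "ticks" the clock by calling its function
passthrough_fn(storage)  # refresh the outputs
print(storage[2])
```

The caller decides when a clock ticks simply by choosing when to call
the corresponding function.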
The `LowerArcToLLVM` conversion pass does what its name says: it sets up
a dialect conversion from Arc and the core CIRCT dialects to Func/SCF,
and from there to the LLVM dialect.
Also add an output format option to the arcilator tool that allows for
the direct emission of LLVM IR (as opposed to MLIR's LLVM dialect).
Co-authored-by: Martin Erhart <maerhart@outlook.com>
Co-authored-by: Zachary Yedidia <zyedidia@gmail.com>
Add three passes that implement state allocation. The passes take the
abstract state allocation ops, compute a memory layout for the overall
state of the model, and replace the allocation ops with simple pointer
getter ops that access the allocated piece of memory. The passes operate
as follows:
- `LegalizeStateUpdate` detects read-after-write hazards and introduces
temporary storage locations that allow the read and write ops to
occur without infringing on each other.
- `AllocateState` computes the overall memory layout and replaces
allocation ops with simple accessor ops.
- `PrintStateInfo` emits the memory layout as a JSON file. This allows
other tools to reason about the exact memory layout of the design.
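
The read-after-write hazard that `LegalizeStateUpdate` guards against
can be illustrated with two registers that swap values on a clock edge.
This is a simplified Python sketch with hypothetical names, not the
pass's algorithm.

```python
# Two registers that swap values on every clock edge: a <- b, b <- a.
state = {"a": 1, "b": 2}


def update_naive(s):
    s["a"] = s["b"]  # the write to a ...
    s["b"] = s["a"]  # ... is observed by this read, so b gets the new a


def update_legalized(s):
    tmp_a = s["a"]   # temporary storage holds the old value of a
    s["a"] = s["b"]
    s["b"] = tmp_a   # the read is served from the temporary


naive = dict(state)
update_naive(naive)
legal = dict(state)
update_legalized(legal)
print(naive, legal)
```

Only the legalized version performs the swap; the naive version leaves
both registers holding the old value of `b`.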
State update legalization is still lacking proper handling of memory
reads and writes, which are significantly more involved than the simple
scalar registers. This is left for follow-up work.
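
The layout computation and JSON emission can be sketched as follows.
The byte-granular packing without alignment and the JSON field names
(`numStateBytes`, `states`, `offset`, `numBytes`) are assumptions for
illustration and may not match what the passes actually emit.

```python
import json


def allocate(states):
    """Assign byte offsets to (name, bit_width) pairs, in order."""
    layout, offset = [], 0
    for name, bits in states:
        num_bytes = (bits + 7) // 8  # round the width up to whole bytes
        layout.append({"name": name, "offset": offset, "numBytes": num_bytes})
        offset += num_bytes
    return layout, offset


layout, total = allocate([("clk", 1), ("counter", 32), ("flag", 1)])
info = json.dumps({"numStateBytes": total, "states": layout}, indent=2)
print(info)
```

An external tool, such as a testbench generator, could load this JSON to
locate `counter` inside the model's storage.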
Co-authored-by: Martin Erhart <maerhart@outlook.com>
Co-authored-by: Zachary Yedidia <zyedidia@gmail.com>
Add the `LowerState` pass and accompanying operations to convert a
design from a pure state transfer graph composed of arcs to a more
procedural read-modify-write representation. After this transformation
the design is a significant step closer to becoming a software
model.
The `LowerState` pass proceeds as follows:
- Group all `arc.state` ops according to their clock into
`arc.clock_tree`s. Operations that are on direct input-to-output
passthrough or on state-to-output paths are grouped into an
`arc.passthrough` op.
- Introduce an explicit state storage allocation op for every state op,
memory op, as well as every input and output port of the root module.
- Replace the root `hw.module` with an `arc.model` which no longer has
any input and output ports, but a storage pointer argument instead.
Storage allocation ops represent specific chunks of memory behind this
pointer.
- Split every `arc.state` op with latency >0 up into an `arc.state_read`,
`arc.state` with latency 0, and `arc.state_write` operation. This
essentially breaks `arc.state` up into two parts: a read for all users
of the state, and a transfer function plus write for all operands of
the arc.
In doing so, the sea-of-gates representation of the HW dialect, which
is a pure graph without op ordering constraints, is converted into a
sea-of-clocks, where each group contains the parts of the circuit that
are triggered by the same clock. The actual computation that occurs on
that trigger is represented procedurally in a proper SSA/CFG block.
TL;DR: This goes from "How do the gates connect together?" to "How does
a computer simulate these gates?".
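
The grouping and the read/transfer/write split can be sketched in
Python. The tuple format, the dictionary used as storage, and the helper
`run_clock_tree` are hypothetical; the pass itself operates on MLIR ops.

```python
from collections import defaultdict

# Hypothetical design: each state names its clock, its operand, and a
# transfer function (standing in for the latency-0 arc).
states = [
    ("count", "clk_a", "count", lambda x: x + 1),
    ("shadow", "clk_b", "count", lambda x: x),
    ("twice", "clk_a", "in", lambda x: 2 * x),
]

# Group states by clock, one group per "clock tree".
clock_trees = defaultdict(list)
for name, clock, operand, transfer in states:
    clock_trees[clock].append((name, operand, transfer))


def run_clock_tree(storage, clock):
    # Read all operands first, then apply the transfer functions and
    # write the results: read, modify, write.
    reads = [(name, storage[op], fn) for name, op, fn in clock_trees[clock]]
    for name, value, fn in reads:
        storage[name] = fn(value)


storage = {"in": 5, "count": 0, "shadow": 0, "twice": 0}
run_clock_tree(storage, "clk_a")
run_clock_tree(storage, "clk_b")
print(storage)
```

Each call to `run_clock_tree` plays the role of one trigger of the
corresponding clock.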
Co-authored-by: Martin Erhart <maerhart@outlook.com>
Co-authored-by: Zachary Yedidia <zyedidia@gmail.com>
Add the `arcilator` convenience tool to make experimenting with the
dialect easier. The intended pass sequence performs the full conversion
from a circuit cut along port boundaries (through modules) to a circuit
cut along the state elements (through arcs). The tool simply executes
this pass pipeline.
Co-authored-by: Martin Erhart <maerhart@outlook.com>
Co-authored-by: Zachary Yedidia <zyedidia@gmail.com>