circt/docs/RationaleFIRRTL.md

23 KiB

FIRRTL Dialect Rationale

This document describes various design points of the FIRRTL dialect, why it is the way it is, and current status and progress. This follows in the spirit of other MLIR Rationale docs.

Introduction

The FIRRTL project is an existing open source compiler infrastructure used by the Chisel framework to lower ".fir" files to Verilog. It provides a number of useful compiler passes and infrastructure that allows the development of domain specific passes. The FIRRTL project includes a well documented IR specification that explains the semantics of its IR, an ANTLR grammar includes some extensions beyond it, and a compiler implemented in Scala which we refer to as the Scala FIRRTL Compiler (SFC).

The FIRRTL dialect in CIRCT is designed to provide a drop-in replacement for the SFC for the subset of FIRRTL IR that is produced by Chisel and in common use. The FIRRTL dialect also provides robust support for SFC Annotations.

To achieve these goals, the FIRRTL dialect follows the FIRRTL IR specification and the SFC implementation almost exactly. Where the FIRRTL specification allows for undefined behavior, FIRRTL dialect and its passes will choose the SFC interpretation of specific undefined behavior. The small deviations we do make are discussed below. Early versions of the FIRRTL dialect made heavy deviations from FIRRTL IR and the SFC (see the Type Canonicalization section below). These deviations, while elegant, led to difficult to resolve mismatches with the SFC and the inability to verify FIRRTL IR. The remaining small deviations introduced in the FIRRTL dialect are done to simplify the CIRCT implementation of a FIRRTL compiler and to take advantage of MLIR's various features.

This document generally assumes that you've read and have a basic grasp of the FIRRTL IR spec, and it can be occasionally helpful to refer to the ANTLR grammar.

Status

The FIRRTL dialect and FIR parser is a generally complete implementation of the FIRRTL specification and is actively maintained, tracking new enhancements. The FIRRTL dialect supports some undocumented features and the "CHIRRTL" flavor of FIRRTL IR that is produced from Chisel. The FIRRTL dialect has support for parsing an SFC Annotation file consisting of only local annotations and converting this to operation or argument attributes. Non-local annotation support is planned, but not implemented.

There are some exceptions to the above:

  1. We don't support the Fixed types for fixed point numbers, and some primitives associated with them.
  2. We don't support Interval types

Some of these may be research efforts that didn't gain broad adoption, in which case we don't want to support them. However, if there is a good reason and a community that would benefit from adding support for these, we can do so.

Naming

One of the goals of the FIRRTL compiler is to produce human-readable Verilog which can be easily mapped back to the input FIRRTL. Part of this effort means that we want the naming of Verilog objects to match the names used in the original FIRRTL, and they are predictably transformed during lowering. For example, after bundles are replaced with scalars in the lower-types pass, each field should be prefixed with the bundle name:

circuit Example
  module Example
    reg myreg: { a :UInt<1>, b: UInt<1> }, clock
; firrtl-lower-types =>
circuit Example
  module Example
    reg myreg_a: UInt<1>, clock
    reg myreg_b: UInt<1>, clock

The name transformations applied by the SFC have become part of the documented API, and people rely on the final names to take a certain form.

There are names for temporaries generated by the Chisel and FIRRTL tooling which are not important to maintain. These names are discarded when parsing, which saves memory during compilation. New names are generated at Verilog export time, which has the effect of renumbering intermediate value names. Names generated by Chisel typically look like _T_12, and names generated by the SFC look like _GEN_12. The FIRRTL compiler will not discard these names if the object has an array attribute annotations containing the attribute {class = "firrtl.transforms.DontTouchAnnotation}.

It is common for EDA tools to hide or optimize away entities which have a name beginning with an _. FIRRTL considers these names precious (excluding FIRRTL temporary names) and will maintain them.

Type system

Not using standard types

At one point we tried to use the integer types in the standard dialect, like si42 instead of !firrtl.sint<42>, but we backed away from this. While it originally seemed appealing to use those types, FIRRTL operations generally need to work with "unknown width" integer types (i.e. !firrtl.sint).

Having the known width and unknown width types implemented with two different C++ classes was awkward, led to casting bugs, and prevented having a FIRRTLType class that unified all the FIRRTL dialect types.

Not Canonicalizing Flip Types

An initial version of the FIRRTL dialect relied on canonicalization of flip types according to the following rules:

  1. flip(flip(x)) == x.
  2. flip(analog(x)) == analog(x) since analog types are implicitly bidirectional.
  3. flip(bundle(a,b,c,d)) == bundle(flip(a), flip(b), flip(c), flip(d)) when the bundle has non-passive type or contains an analog type. This forces the flip into the subelements, where it recursively merges with the non-passive subelements and analogs.
  4. flip(vector(a, n)) == vector(flip(a), n) when the vector has non-passive type or analogs. This forces the flip into the element type, generally canceling it out.
  5. bundle(flip(a), flip(b), flip(c), flip(d)) == flip(bundle(a, b, c, d). Due to the other rules, the operand to a flip must be a passive type, so the entire bundle will be passive, and rule #3 won't be recursively reinvoked.

While elegant in a number of ways (e.g., FIRRTL types are guaranteed to have a canonical representation and can be compared using pointer equality, flips partially subsume port directionality and "flow", and analog inputs and outputs are canonicalized to the same representation), this resulted in information loss during canonicalization because the number of flip types can change. Namely, three problems were identified:

  1. Type canonicalization may make illegal operations legal.
  2. The flow of connections could not be verified because flow is a function of the number of flip types.
  3. The directionality of leaves in an aggregate could not be determined.

As an example of the first problem, consider the following circuit:

module Foo:
  output a: { flip a: UInt<1> }
  output b: { a: UInt<1> }

  b <= a

The connection b <= a is illegal FIRRTL due to a type mismatch where { flip a: UInt<1> } is not equal to { a: UInt<1> }. However, type canonicalization would transform this circuit into the following circuit:

module Foo:
  input a: { a: UInt<1> }
  output b: { a: UInt<1> }

  b <= a

Here, the connection b <= a is legal FIRRTL. This then makes it impossible for a type canonical form to be type checked.

As an example of the second problem, consider the following circuit:

module Bar:
  output a: { flip a: UInt<1> }
  input b: { flip b: UInt<1> }

  b <= a

Here, the connection b <= a is illegal FIRRTL because b is a source and a is a sink. However, type canonicalization converts this to the following circuit:

module Bar:
  input a: { a: UInt<1> }
  output b: { b: UInt<1> }

  b <= a

Here, the connect b <= a is legal FIRRTL because b is now a sink and a is now a source. This then makes it impossible for a type canonical form to be flow checked.

As an example of the third problem, consider the following circuit:

module Baz:
  wire a: {flip a: {flip a: UInt<1>}}
  wire b: {flip a: {flip a: UInt<1>}}

  b.a <= a.a

The connection b.a <= a.a, when lowered, results in the reverse connect a.a.a <= b.a.a. However, type canonicalization will remove the flips from the circuit to produce:

module Baz:
  wire a: {a: {a: UInt<1>}}
  wire b: {a: {a: UInt<1>}}

  b.a <= a.a

Here, the connect b.a <= a.a, when lowered, results in the normal connect b.a.a <= a.a.a. Type canonicalization has thereby changed the semantics of connect.

Due to the elegance of type canonicalization, we initially decided that we would use type canonicalization and CIRCT would accept more circuits than the SFC. The third problem (identified much later than the first two) convinced us to remove type canonicalization.

For a historical discussion of type canonicalization see:

Flow

The FIRRTL specification describes the concept of "flow". Flow encodes additional information that determines the legality of operations. FIRRTL defines three different flows: sink, source, and duplex. Module inputs, instance outputs, and nodes are source, module outputs and instance inputs are sink, and wires and registers are duplex. A value with sink flow may only be written to, but not read from (with the exception of module outputs and instance inputs which may be also read from). A value with source flow may be read from, but not written to. A value with duplex flow may be read from or written to.

For FIRRTL connects or partial connect statements, it follows that the left-hand-side must be sink or duplex and the right-hand-side must be source, duplex, or a port/instance sink.

Flow is not represented as a first-class type in CIRCT. We instead provide utilities for computing flow when needed, e.g., for connect statement verification.

Operations

Multiple result firrtl.instance operation

The FIRRTL spec describes instances as returning a bundle type, where each element of the bundle corresponds to one of the ports of the module being instanced. This makes sense in the Scala FIRRTL implementation, given that it does not support multiple ports.

The MLIR FIRRTL dialect takes a different approach, having each element of the bundle result turn into its own distinct result on the firrtl.instance operation. This is made possible by MLIR's robust support for multiple value operands, and makes the IR much easier to analyze and work with.

Module bodies require def-before-use dominance instead of allowing graphs

MLIR allows regions with arbitrary graphs in their bodies, and this is used by the HW dialect to allow direct expression of cyclic graphs etc. While this makes sense for hardware in general, the FIRRTL dialect is intended to be a pragmatic infrastructure focused on lowering of Chisel code to the HW dialect, it isn't intended to be a "generally useful IR for hardware".

We recommend that non-Chisel frontends target the HW dialect, or a higher level dialect of their own creation that lowers to HW as appropriate.

input and output Module Ports

The FIRRTL specification describes two kinds of ports: input and output. In the firrtl.module declaration we track this via an arbitrary precision integer attribute (IntegerAttr) where each bit encodes the directionality of the port at that index.

Originally, we encoded direction as the absence of an outer flip type (input) or presence of an outer flip type (output). This was done as part of the original type canonicalization effort which combined input/output with the type system. However, once type canonicalization was removed flip type only became used in three places: on the types of bundle fields, on the variadic return types of instances or memories, and on ports. The first is the same as the FIRRTL specification. The second is a deviation from the FIRRTL specification, but allowable as it takes advantage of the MLIR's variadic capabilities to simplify the IR. The third was an inelegant abuse of an unrelated concept that added bloat to the type system. Many operations would have to check for an outer flip on ports and immediately discard it.

For this reason, the IntegerAttr encoding implementation was chosen.

For a historical discussion of this issue and its development see:

firrtl.mem

Unlike the SFC, the FIRRTL dialect represents each memory port as a distinct result value of the firrtl.mem operation. Also, the firrtl.mem node does not allow zero port memories for simplicity. Zero port memories are dropped by the .fir file parser.

CHIRRTL Memories

FIRRTL has two different representations of memories: Chisel cmemory operations, smem and cmem, and the standard FIRRTL mem operation. Chisel memory operations exist to make it easy to produce FIRRTL code from Chisel, and closely match the Chisel API for memories. Chisel memories are intended to be replaced with standard FIRRTL memories early in the pipeline. The set of operations related to Chisel memories are often referred to as CHIRRTL.

The main difference between Chisel and FIRRTL memories is that Chisel memories have an operation to add a memory port to a memory, while FIRRTL memories require all ports to be defined up front. Another difference is that Chisel memories have "enable inferrence", and are usually inferred to be enabled where they are declared. The following example shows a CHIRRTL memory declaration, and the standard FIRRTL memory equivalent.

smem mymemory : UInt<4>[8]
when p:
  read mport port0 = mymemory[address], clock
mem mymemory:
    data-type => UInt<4>
    depth => 8
    read-latency => 0
    write-latency => 1
    reader => port0
    read-under-write => undefined

mymemory.port0.en <= p
mymemory.port0.clk <= clock
mymemory.port0.addr <= address

FIRRTL memory operations were created because it was thought that a concrete memory primitive, that looks like an instance, is a better design for a compiler IR. It was originally intended that Chisel would be modified to emit FIRRTL memory operations directly, and the CHIRRTL operations would be retired. The lowering from Chisel memories to FIRRTL memories proved far more complicated than originally envisioned, specifically surrounding the type of ports, inference of enable signals, and inference of clocks.

CHIRRTL operations have since stuck around, but their strange behavior has lead to discussions to remove, improve, or totally redesign them. For some current discussion about this see 1, 2. Since CIRCT is attempting to be a drop in replacement FIRRTL compiler, we are not attempting to implement these new ideas for Chisel memories. Instead, we are trying to implement what exists today.

There is, however, a major compatibility issue with the existing implementation of Chisel memories which made them difficult to support in CIRCT. The FIRRTL specification disallows using any declaration outside of the scope where it is created. This means that a Chisel memory port declared inside of a when block can only be used inside the scope of the when block. Unfortunately, this invariant is not enforced for memory ports, and this leniency has been abused by the Chisel standard library. Due to the way clock and enable inference works, we couldn't just hoist the declaration into the outer scope.

To support escaping memory port definitions, we decided to split the memory port operation into two operations. We created a firrtl.memoryport operation to declare the memory port, and a firrtl.memoryport.access operation to enable the memory port. The following is an example of how FIRRTL translates into the CIRCT dialect:

smem mymem : UInt<1>[8]
when cond:
  infer mport myport = mymem[addr], clock
out <= myport
%mymem = firrtl.seqmem Undefined  : !firrtl.cmemory<uint<1>, 8>
%myport_data, %myport_port = firrtl.memoryport Infer %mymem {name = "myport"}  : (!firrtl.cmemory<uint<1>, 8>) -> (!firrtl.uint<1>, !firrtl.cmemoryport)
firrtl.when %cond  {
  firrtl.memoryport.access %myport_port[%addr], %clock : !firrtl.cmemoryport, !firrtl.uint<3>, !firrtl.clock
}
firrtl.connect %out, %myport_data : !firrtl.uint<1>, !firrtl.uint<1

For a historical discussion of this issue and its development see llvm/circt#1561.

More things are represented as primitives

We describe the mux expression as "primitive", whereas the IR spec and grammar implement it as a special kind of expression.

We do this to simplify the implementation: These expressions have the same structure as primitives, and modeling them as such allows reuse of the parsing logic instead of duplication of grammar rules.

invalid Invalidate Operation is an expression

The FIRRTL spec describes an x is invalid statement that logically computes an invalid value and connects it to x according to flow semantics. This behavior makes analysis and transformation a bit more complicated, because there are now two things that perform connections: firrtl.connect and the x is invalid operation.

To make things easier to reason about, we split the x is invalid operation into two different ops: an firrtl.invalidvalue op that takes no operands and returns an invalid value, and a standard firrtl.connect operation that connects the invalid value to the destination (or a firrtl.attach for analog values). This has the same expressive power as the standard FIRRTL representation but is easier to work with.

During parsing, we break up an x is invalid statement into leaf connections. As an example, consider the following FIRRTL module where a bi-directional aggregate, a is invalidated:

module Foo:
  output a: { a: UInt<1>, flip b: UInt<1> }

  a is invalid

This is parsed into the following MLIR. Here, only a.a is invalidated:

firrtl.module @Foo(out %a: !firrtl.bundle<a: uint<1>, b: flip<uint<1>>>) {
  %0 = firrtl.subfield %a("a") : (!firrtl.bundle<a: uint<1>, b: flip<uint<1>>>) -> !firrtl.uint<1>
  %invalid_ui1 = firrtl.invalidvalue : !firrtl.uint<1>
  firrtl.connect %0, %invalid_ui1 : !firrtl.uint<1>, !firrtl.uint<1>
}

validif represented as a multiplexer

The FIRRTL spec describes a validif(en, x) operation that is used during lowering from high to low FIRRTL. Consider the following example:

c <= invalid
when a:
  c <= b

Lowering will introduce the following intermediate representation in low FIRRTL:

c <= validif(a, b)

Since there is no precedence of this validif being used anywhere in the Chisel/FIRRTL ecosystem thus far and instead is always replaced by its right-hand operand b, the FIRRTL MLIR dialect does not provide such an operation at all. Rather it directly replaces any validif in FIRRTL input with the following equivalent operations:

%0 = firrtl.invalidvalue : !firrtl.uint<42>
%c = firrtl.mux(%a, %b, %0) : (!firrtl.uint<1>, !firrtl.uint<42>, !firrtl.uint<42>) -> !firrtl.uint<42>

A canonicalization then folds this combination of firrtl.invalidvalue and firrtl.mux to the "high" operand of the multiplexer to facilitate downstream transformation passes.

Inline SystemVerilog through verbatim.expr operation

The FIRRTL dialect offers a firrtl.verbatim.expr operation that allows for SystemVerilog expressions to be embedded verbatim in the IR. It is lowered to the corresponding sv.verbatim.expr operation of the underlying SystemVerilog dialect, which embeds it in the emitted output. The operation has a FIRRTL result type, and a variadic number of operands can be accessed from within the inline SystemVerilog source text through string interpolation of {{0}}-style placeholders.

The rationale behind this verbatim operation is to offer an escape hatch analogous to asm ("...") in C/C++ and other languages, giving the user or compiler passes full control of what exactly gets embedded in the output. Usually, though, you would rather add a new operation to the IR to properly represent additional constructs.

As an example, a verbatim expression could be used to interact with yet-unsupported SystemVerilog constructs such as parametrized class typedef members:

firrtl.module @Magic (out %n : !firrtl.uint<32>) {
  %0 = firrtl.verbatim.expr "$bits(SomeClass #(.Param(1))::SomeTypedef)" : !firrtl.uint<32>
  firrtl.connect %n, %0 : !firrtl.uint<32>, !firrtl.uint<32>
}

This would lower through the other dialects to SystemVerilog as you would expect:

module Magic (output [31:0] n);
  assign n = $bits(SomeClass #(.Param(1))::SomeTypedef);
endmodule

Interpretation of Undefined Behavior

The FIRRTL Specification has undefined behavior for certain features. When in doubt, FIRRTL dialect typically chooses to implement undefined behavior in the same manner as the SFC.

Invalid

The SFC has a context-sensitive interpretation of invalid.

When an is invalid statement is used, the SFC will optimize this as a connect to a constant zero if the invalidated component is not assigned to inside a conditional block (when/else). This is an interpretation of invalid as a value that the compiler chooses to connect to a single component.

When an is invalid statement is used to specify the default of a component that is connected to in a conditional block and the conditional block is not complete, then a conditionally valid (validif) statement is generated. The conditionally valid statement connects a value when a condition is true and invalidates the component otherwise. (This is modeled as a multiplexer in the FIRRTL dialect.) When lowered, the SFC treats this invalidation as undefined behavior and will choose the valid path unconditionally. This is an interpretation of invalid as undefined behavior. (See above for more information on validif and the modeling of this as a multiplexer.)

Instead of choosing to aggressively optimize undefined behavior, FIRRTL dialect and its passes use this context-sensitive interpretation of invalid. Folds of primitive operations treat an invalid operand as a zero-valued constant. Folds of multiplexers treat invalid operands as undefined behavior and will optimize away the invalid path.

Propagation of invalid values is handled with extreme caution. Any propagation can cause a later conflation of these two interpretations of invalid and produce subtle bugs.