start tutorial on adding new devices

Howard Mao 2018-05-11 16:36:57 -07:00
parent bba9dea481
commit f415332ba6
8 changed files with 697 additions and 0 deletions


@ -0,0 +1,97 @@
Connecting Devices to Bus
=========================
Now that we have finished designing our peripheral device, we need to
hook it up to the SoC. To do this, we first need to create two traits:
one for the lazy module and one for the module implementation. The lazy
module trait is as follows:

.. code-block:: scala

    trait HasPeripheryInputStream { this: BaseSubsystem =>
      private val portName = "input-stream"
      val streamWidth = pbus.beatBytes * 8
      val inputstream = LazyModule(new InputStream(0x10017000, pbus.beatBytes))
      // MMIO control registers on the peripheral bus
      pbus.toVariableWidthSlave(Some(portName)) { inputstream.regnode }
      // DMA writes go out through the system bus
      sbus.fromPort(Some(portName))() := inputstream.dmanode
      // Completion interrupt goes to the interrupt bus
      ibus.fromSync := inputstream.intnode
    }

The self-type annotation ``this: BaseSubsystem =>`` indicates that this trait
will eventually be mixed into a class that extends ``BaseSubsystem``, which
defines the system bus ``sbus``, the peripheral bus ``pbus``, and the
interrupt bus ``ibus``. We instantiate the ``InputStream`` lazy module with the
base address ``0x10017000``. We then connect the ``pbus`` to its register node,
its DMA node to the ``sbus``, and its interrupt node to the ``ibus``.

The module implementation trait is as follows:

.. code-block:: scala

    trait HasPeripheryInputStreamModuleImp extends LazyModuleImp {
      val outer: HasPeripheryInputStream

      // External decoupled input, driven by the test harness
      val stream_in = IO(Flipped(Decoupled(UInt(outer.streamWidth.W))))
      outer.inputstream.module.io.in <> stream_in

      // Drive stream_in from a model that sends a pre-specified data sequence
      def connectFixedInput(data: Seq[BigInt]): Unit = {
        val fixed = Module(new FixedInputStream(data, outer.streamWidth))
        stream_in <> fixed.io.out
      }
    }

Since the interrupts and memory ports have already been connected in the
lazy module trait, the module implementation trait only needs to create the
external decoupled interface and connect that to the ``InputStream`` module
implementation.

The ``connectFixedInput`` method will be used by the test harness to connect
an input stream model that just sends a pre-specified stream of data.

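
The source of ``FixedInputStream`` is not shown in this tutorial. Purely to
illustrate the ready-valid behavior such a model needs, here is a minimal C++
sketch; the ``FixedStreamModel`` class is hypothetical and is not the Chisel
implementation. It offers the current word while data remains and advances
only on a cycle in which both valid and ready are high.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical cycle-level model of a decoupled source that replays a
    // fixed list of words.
    class FixedStreamModel {
      public:
        explicit FixedStreamModel(std::vector<uint64_t> data)
            : data_(std::move(data)), index_(0) {}

        // valid stays high until every word has been transferred; it never
        // depends on ready.
        bool out_valid() const { return index_ < data_.size(); }

        // The word currently offered; only meaningful while out_valid() holds.
        uint64_t out_bits() const
        {
            assert(out_valid());
            return data_[index_];
        }

        // Call once per clock edge with the consumer's ready signal.
        // A transfer happens only when valid and ready are both high.
        void tick(bool out_ready)
        {
            if (out_valid() && out_ready)
                index_++;
        }

      private:
        std::vector<uint64_t> data_;
        std::size_t index_;
    };

Because valid does not depend on ready, the consumer (here, the
``InputStream`` device) alone decides on which cycles the transfers happen.
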
We can now mix these traits into the SoC design. Open up
``src/main/scala/example/Top.scala`` and add the following:

.. code-block:: scala

    class ExampleTopWithInputStream(implicit p: Parameters) extends ExampleTop
        with HasPeripheryInputStream {
      override lazy val module = new ExampleTopWithInputStreamModule(this)
    }

    class ExampleTopWithInputStreamModule(outer: ExampleTopWithInputStream)
      extends ExampleTopModuleImp(outer)
      with HasPeripheryInputStreamModuleImp

We can then build a simulation using our new SoC by adding a configuration
to ``src/main/scala/example/Configs.scala``. This configuration will cause
the test harness to instantiate an SoC with the ``InputStream`` device
and then connect a fixed input stream model to it.

.. code-block:: scala

    class WithFixedInputStream extends Config((site, here, up) => {
      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
        // Feed a fixed sequence of words into the InputStream device
        top.connectFixedInput(Seq(
          BigInt("1002abcd", 16),
          BigInt("34510204", 16),
          BigInt("10329999", 16),
          BigInt("92101222", 16)))
        top
      }
    })

    class FixedInputStreamConfig extends Config(
      new WithFixedInputStream ++ new BaseExampleConfig)


We can now compile the simulation using VCS.

.. code-block:: shell

    $ cd vsim
    $ make CONFIG=FixedInputStreamConfig

This will produce a ``simv-example-FixedInputStreamConfig`` executable that
can be used to run tests. We will discuss how to write and run those tests in
the next section.


@ -0,0 +1,247 @@
Creating Simulation Model
=========================
So far, we've been using a fixed input stream model to test our device.
Ideally, however, we'd like an input stream that is defined by a software model
and configurable at runtime: we'd like to put the input data in a file and
pass the file name as a command-line argument. A Chisel model is fixed when the
hardware is elaborated, so it cannot read a file chosen at simulation time.
Instead, we'll create the model in Verilog and call out to C++ using the
SystemVerilog DPI-C interface.
First, how do we include Verilog code in a Chisel codebase? We can do this
using the Chisel BlackBox class. This class allows us to define IO ports and
can be used like a regular Chisel module, but the internal implementation is
left to Verilog.

.. code-block:: scala

    // The Verilog parameter DATA_BITS is set from the Scala argument w
    class SimInputStream(w: Int) extends BlackBox(Map("DATA_BITS" -> IntParam(w))) {
      val io = IO(new Bundle {
        val clock = Input(Clock())
        val reset = Input(Bool())
        val out = Decoupled(UInt(w.W))
      })
    }

One key difference in the IO bundle definition is that a ``BlackBox`` has no
implicit ``clock`` and ``reset``, so these signals must be declared explicitly
in its IO bundle. The ``BlackBox`` constructor also takes a map of parameters
that will be passed to the Verilog implementation. To connect the BlackBox in
the test harness, we create a ``connectSimInput`` method in the
``HasPeripheryInputStreamModuleImp`` trait.

.. code-block:: scala

    // Drive stream_in from the SimInputStream Verilog model
    def connectSimInput(clock: Clock, reset: Bool): Unit = {
      val sim = Module(new SimInputStream(outer.streamWidth))
      sim.io.clock := clock
      sim.io.reset := reset
      stream_in <> sim.io.out
    }

We then add a new configuration class in
``src/main/scala/example/Configs.scala`` that calls the ``connectSimInput``
method.

.. code-block:: scala

    class WithSimInputStream extends Config((site, here, up) => {
      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
        top.connectSimInput(clock, reset)
        top
      }
    })

    class SimInputStreamConfig extends Config(
      new WithSimInputStream ++ new BaseExampleConfig)

Now we need to create the Verilog implementation of the ``SimInputStream``
module. Make a new directory ``src/main/resources`` and add ``vsrc`` and ``csrc``
subdirectories under it.

.. code-block:: shell

    $ mkdir -p src/main/resources/{vsrc,csrc}

In the ``vsrc`` directory, create a file called ``SimInputStream.v`` and add
the following code.

.. code-block:: verilog

    // DPI-C functions implemented in SimInputStream.cc
    import "DPI-C" function void input_stream_init
    (
        input string filename,
        input int    data_bits
    );

    import "DPI-C" function void input_stream_tick
    (
        output bit     out_valid,
        input  bit     out_ready,
        output longint out_bits
    );

    module SimInputStream #(DATA_BITS=64) (
        input                  clock,
        input                  reset,
        output                 out_valid,
        input                  out_ready,
        output [DATA_BITS-1:0] out_bits
    );

        bit __out_valid;
        longint __out_bits;

        string filename;
        int data_bits;

        reg __out_valid_reg;
        reg [DATA_BITS-1:0] __out_bits_reg;

        // Open the file named by the +instream=<file> plusarg
        initial begin
            data_bits = DATA_BITS;
            if ($value$plusargs("instream=%s", filename)) begin
                input_stream_init(filename, data_bits);
            end
        end

        // Call into C++ once per cycle and register its outputs
        always @(posedge clock) begin
            if (reset) begin
                __out_valid = 0;
                __out_bits = 0;
                __out_valid_reg <= 0;
                __out_bits_reg <= 0;
            end else begin
                input_stream_tick(
                    __out_valid,
                    out_ready,
                    __out_bits);
                __out_valid_reg <= __out_valid;
                __out_bits_reg <= __out_bits;
            end
        end

        assign out_valid = __out_valid_reg;
        assign out_bits = __out_bits_reg;

    endmodule

The Verilog module defines its ports to match the Chisel BlackBox definition,
but most of the implementation is left to C++ through the DPI functions
``input_stream_init`` and ``input_stream_tick``. We define these functions in
a ``SimInputStream.cc`` file in the ``csrc`` directory.

.. code-block:: c++

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    class InputStream {
      public:
        InputStream(const char *filename, int nbytes);
        ~InputStream(void);

        bool out_valid() { return !complete; }
        uint64_t out_bits() { return data; }
        void tick(bool out_ready);

      private:
        void read_next(void);

        bool complete;
        FILE *file;
        int nbytes;
        uint64_t data;
    };

    InputStream::InputStream(const char *filename, int nbytes)
    {
        // Each word is read into a uint64_t, so at most 8 bytes fit
        if (nbytes < 1 || nbytes > (int) sizeof(this->data)) {
            fprintf(stderr, "Unsupported word size: %d bytes\n", nbytes);
            abort();
        }
        this->nbytes = nbytes;
        this->file = fopen(filename, "rb");
        if (this->file == NULL) {
            fprintf(stderr, "Could not open %s\n", filename);
            abort();
        }
        read_next();
    }

    InputStream::~InputStream(void)
    {
        fclose(this->file);
    }

    // Read the next word; the stream is complete once no full word remains
    void InputStream::read_next(void)
    {
        this->data = 0;
        size_t res = fread(&this->data, this->nbytes, 1, this->file);
        if (res == 0 && ferror(this->file)) {
            perror("fread");
            abort();
        }
        this->complete = (res == 0);
    }

    // Advance to the next word after a ready-valid handshake
    void InputStream::tick(bool out_ready)
    {
        if (out_valid() && out_ready)
            read_next();
    }

    InputStream *stream = NULL;

    extern "C" void input_stream_init(const char *filename, int data_bits)
    {
        stream = new InputStream(filename, data_bits / 8);
    }

    extern "C" void input_stream_tick(
            unsigned char *out_valid,
            unsigned char out_ready,
            long long *out_bits)
    {
        // No +instream= file was given, so never assert valid
        if (stream == NULL) {
            *out_valid = 0;
            *out_bits = 0;
            return;
        }
        stream->tick(out_ready);
        *out_valid = stream->out_valid();
        *out_bits = stream->out_bits();
    }

In the C++ file, we implement an ``InputStream`` class that takes a file name
as its argument. It opens the file, reads the first ``nbytes``-byte word, and
then reads the next word after every ready-valid handshake. The
``input_stream_init`` function constructs an ``InputStream`` object and
assigns it to a global pointer. The ``input_stream_tick`` function updates the
state by calling the ``tick`` method, passing in the inputs from Verilog. It
then assigns values to the Verilog outputs.
You can now build this new configuration in VCS.

.. code-block:: shell

    $ cd vsim
    $ make CONFIG=SimInputStreamConfig

Now create a file to use as the input stream data; random bytes from
``/dev/urandom`` work fine. The test program reads four 64-bit words, so 32
bytes is enough. Pass the file to your simulation through the ``+instream=``
flag, and you should see the data printed out by the ``input-stream.riscv`` test.

.. code-block:: shell

    $ dd if=/dev/urandom of=instream.img bs=32 count=1
    $ hexdump instream.img
    0000000 189b f12a 1cc1 9eb5 b65d bbef 96b6 4949
    0000010 f8c8 636c 76fe 15f3 0665 0ef9 8c5d 3011
    0000020
    $ ./simv-example-SimInputStreamConfig +instream=instream.img ../tests/input-stream.riscv
    9eb51cc1f12a189b
    494996b6bbefb65d
    15f376fe636cf8c8
    30118c5d0ef90665

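
The printed words are the file's bytes taken eight at a time and interpreted
as little-endian 64-bit values: ``fread`` fills a ``uint64_t`` on a
little-endian host such as x86, and the device writes it into little-endian
RISC-V memory. By default, ``hexdump`` groups bytes into 16-bit words in host
byte order, which is why ``189b f12a 1cc1 9eb5`` appears as
``9eb51cc1f12a189b``. The short C++ sketch below reassembles a word in that
order; the ``le_word`` helper is hypothetical and not part of the simulator.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical helper: assemble up to 8 bytes into a word, least
    // significant byte first (little-endian), which is what fread into a
    // uint64_t produces on a little-endian host.
    uint64_t le_word(const uint8_t *bytes, std::size_t nbytes)
    {
        assert(nbytes <= 8);
        uint64_t word = 0;
        for (std::size_t i = 0; i < nbytes; i++)
            word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
        return word;
    }

Feeding it the first eight bytes of the file above (``9b 18 2a f1 c1 1c b5 9e``)
yields ``0x9eb51cc1f12a189b``, the first line printed by the test.
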

docs/DMA-and-Interrupts.rst

@ -0,0 +1,151 @@
DMA and Interrupts
==================
In order to move data from the external input stream to memory, we need to
perform direct memory access (DMA). We can achieve this by giving the device
a ``TLClientNode``. With this node added, the ``LazyModule`` looks like this:

.. code-block:: scala

    class InputStream(
        address: BigInt,
        val beatBytes: Int = 8,
        val maxInflight: Int = 4)
        (implicit p: Parameters) extends LazyModule {
      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
      // MMIO control registers
      val regnode = TLRegisterNode(
        address = Seq(AddressSet(address, 0x3f)),
        device = device,
        beatBytes = beatBytes)
      // TileLink client port used for the DMA writes
      val dmanode = TLClientNode(Seq(TLClientPortParameters(
        Seq(TLClientParameters(
          name = "input-stream",
          sourceId = IdRange(0, maxInflight))))))

      lazy val module = new InputStreamModuleImp(this)
    }

For our ``TLClientNode``, we only need a single port, so we specify a single
set of ``TLClientPortParameters`` and ``TLClientParameters``. We override two
arguments in the ``TLClientParameters`` constructor. The ``name`` is the
name of the port and ``sourceId`` indicates the range of transaction IDs
that can be used in memory requests. The lower bound is inclusive, and the
upper bound is exclusive, so this device can use source IDs from 0 to
``maxInflight - 1``.
In the module implementation, we can now implement a state machine that
sends write requests to memory. We first call ``outer.dmanode.out`` to get
a sequence of output ports. Since we only have one port, we can just take
the first element of this sequence. Each element is a pair of objects.
The first is the physical TileLink bundle, which we can connect to RTL.
The second is a ``TLEdge`` object, which we can use to get extra metadata about
the TileLink port (such as the number of address and data bits).

.. code-block:: scala

    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
      // Physical TileLink bundle and edge metadata for the single DMA port
      val (tl, edge) = outer.dmanode.out(0)
      val addrBits = edge.bundle.addressBits
      val w = edge.bundle.dataBits
      val beatBytes = (w / 8)

      val io = IO(new Bundle {
        val in = Flipped(Decoupled(UInt(w.W)))
      })

      val addr = Reg(UInt(addrBits.W))
      val len = Reg(UInt(addrBits.W))
      val running = RegInit(false.B)
      val complete = RegInit(false.B)

      val s_idle :: s_issue :: s_wait :: Nil = Enum(3)
      val state = RegInit(s_idle)

      // One busy bit per source ID; the lowest free ID is used for the next Put
      val nXacts = outer.maxInflight
      val xactBusy = RegInit(0.U(nXacts.W))
      val xactOnehot = PriorityEncoderOH(~xactBusy)
      val canIssue = (state === s_issue) && !xactBusy.andR

      io.in.ready := canIssue && tl.a.ready
      tl.a.valid := canIssue && io.in.valid
      tl.a.bits := edge.Put(
        fromSource = OHToUInt(xactOnehot),
        toAddress = addr,
        lgSize = log2Ceil(beatBytes).U,
        data = io.in.bits)._2
      tl.d.ready := running && xactBusy.orR

      // Set a bit when a request is sent on A; clear it when its D response returns
      xactBusy := (xactBusy |
        Mux(tl.a.fire(), xactOnehot, 0.U(nXacts.W))) &
        ~Mux(tl.d.fire(), UIntToOH(tl.d.bits.source), 0.U)

      when (state === s_idle && running) {
        assert(addr(log2Ceil(beatBytes)-1,0) === 0.U,
          s"InputStream base address not aligned to ${beatBytes} bytes")
        assert(len(log2Ceil(beatBytes)-1,0) === 0.U,
          s"InputStream length not aligned to ${beatBytes} bytes")
        state := s_issue
      }

      when (io.in.fire()) {
        addr := addr + beatBytes.U
        len := len - beatBytes.U
        when (len === beatBytes.U) { state := s_wait }
      }

      when (state === s_wait && !xactBusy.orR) {
        running := false.B
        complete := true.B
        state := s_idle
      }

      outer.regnode.regmap(
        0x00 -> Seq(RegField(addrBits, addr)),
        0x08 -> Seq(RegField(addrBits, len)),
        0x10 -> Seq(RegField(1, running)),
        0x18 -> Seq(RegField(1, complete)))
    }

The state machine starts in the ``s_idle`` state. In this state, the CPU should
set the ``addr`` and ``len`` registers and then set the ``running`` register to
1. The state machine then moves into the ``s_issue`` state, in which it
forwards data from the ``in`` decoupled interface to memory through the
TileLink `A` channel.
We construct the `A` channel requests using the ``Put`` method in the
``TLEdge`` object we extracted earlier. The ``Put`` method takes a unique
source ID in ``fromSource``, the address to write to in ``toAddress``, the
base-2 logarithm of the size in bytes in ``lgSize``, and the data to be written
in ``data``.
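
For a full-beat write, ``lgSize`` is ``log2Ceil(beatBytes)``, so an 8-byte beat
uses ``lgSize = 3``. The alignment assertions in the ``s_idle`` branch check
that the low ``log2Ceil(beatBytes)`` bits of ``addr`` and ``len`` are zero.
The minimal C++ sketch below models both calculations; the ``log2_ceil`` and
``beat_aligned`` helpers are hypothetical and only mirror the Chisel
expressions used in the module.

.. code-block:: c++

    #include <cassert>
    #include <cstdint>

    // Models log2Ceil for n >= 1: the smallest k with (1 << k) >= n.
    unsigned log2_ceil(uint64_t n)
    {
        assert(n >= 1 && n <= (1ULL << 63));
        unsigned k = 0;
        while ((1ULL << k) < n)
            k++;
        return k;
    }

    // True when the low log2Ceil(beat_bytes) bits of value are zero, which is
    // what addr(log2Ceil(beatBytes)-1, 0) === 0.U checks in hardware.
    bool beat_aligned(uint64_t value, uint64_t beat_bytes)
    {
        uint64_t low_mask = (1ULL << log2_ceil(beat_bytes)) - 1;
        return (value & low_mask) == 0;
    }

With ``beatBytes = 8``, an address such as ``0x80001000`` passes the check,
while ``0x80001004`` would trigger the assertion.
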
The source field must obey one important constraint: only one transaction
with a given source ID can be in flight at a time. Once you send a request on
the `A` channel with a specific source ID, you cannot send another with that
ID until you've received the response for it on the `D` channel.
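
The ``xactBusy`` vector enforces this rule: bit *i* is set while a request
with source ID *i* is outstanding, ``PriorityEncoderOH(~xactBusy)`` picks the
lowest free ID, and a `D` response clears the bit named by
``tl.d.bits.source``. The C++ sketch below models that bookkeeping for IDs
drawn from the half-open range ``IdRange(0, maxInflight)``. The
``SourceTracker`` class is hypothetical; it only illustrates the logic and is
not part of Rocket Chip.

.. code-block:: c++

    #include <cassert>
    #include <cstdint>

    // Hypothetical software model of the xactBusy register.
    // Source IDs come from the half-open range [0, max_inflight).
    class SourceTracker {
      public:
        explicit SourceTracker(unsigned max_inflight)
            : n_(max_inflight), busy_(0)
        {
            assert(max_inflight >= 1 && max_inflight <= 64);
        }

        // Mirrors !xactBusy.andR: at least one ID is free.
        bool can_issue() const { return busy_ != all_ones(); }

        // Mirrors !xactBusy.orR: no requests are outstanding.
        bool idle() const { return busy_ == 0; }

        // Mirrors PriorityEncoderOH(~xactBusy) plus OHToUInt: returns the
        // lowest free ID and marks it busy (an A-channel fire).
        unsigned issue()
        {
            assert(can_issue());
            unsigned id = 0;
            while (busy_ & (1ULL << id))
                id++;
            busy_ |= 1ULL << id;
            return id;
        }

        // A D-channel response for `source` frees that ID again.
        void complete(unsigned source)
        {
            assert(source < n_ && (busy_ & (1ULL << source)));
            busy_ &= ~(1ULL << source);
        }

      private:
        uint64_t all_ones() const
        {
            return n_ == 64 ? ~0ULL : ((1ULL << n_) - 1);
        }

        unsigned n_;
        uint64_t busy_;
    };

With ``maxInflight = 4``, four requests use IDs 0 through 3, after which no
more can be issued until a response frees one; the freed ID is then the next
one handed out.
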
Once all requests have been sent on the `A` channel, the state machine
transitions to the ``s_wait`` state to wait for the remaining responses on
the `D` channel. Once the responses have all returned, the state machine
sets ``running`` to false and ``complete`` to true. The CPU can poll the
``complete`` register to check if the operation has finished.
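
To make this sequencing concrete, here is a simplified, untimed C++ model of
the same ``s_idle``, ``s_issue``, and ``s_wait`` states. It assumes every
write is accepted immediately and that responses arrive whenever
``deliver_responses`` is called, so it ignores the ``maxInflight`` limit.
The ``DmaModel`` class and its methods are hypothetical, and a ``std::map``
stands in for memory.

.. code-block:: c++

    #include <cassert>
    #include <cstdint>
    #include <deque>
    #include <map>

    // Hypothetical, untimed model of the InputStream DMA state machine.
    class DmaModel {
      public:
        enum State { s_idle, s_issue, s_wait };

        explicit DmaModel(uint64_t beat_bytes) : beat_bytes_(beat_bytes) {}

        // CPU side: program addr and len, then set running.
        void start(uint64_t addr, uint64_t len)
        {
            // This model requires a non-empty, beat-aligned transfer.
            assert(len > 0 && addr % beat_bytes_ == 0 && len % beat_bytes_ == 0);
            addr_ = addr;
            len_ = len;
            running_ = true;
            complete_ = false;
        }

        // One step; `in` holds the words offered on the decoupled input.
        void step(std::deque<uint64_t> &in)
        {
            if (state_ == s_idle && running_) {
                state_ = s_issue;
            } else if (state_ == s_issue && !in.empty()) {
                // Like io.in.fire(): write one beat, advance addr, shrink len.
                memory[addr_] = in.front();
                in.pop_front();
                outstanding_++;
                if (len_ == beat_bytes_)
                    state_ = s_wait;
                addr_ += beat_bytes_;
                len_ -= beat_bytes_;
            } else if (state_ == s_wait && outstanding_ == 0) {
                running_ = false;
                complete_ = true;
                state_ = s_idle;
            }
        }

        // Deliver every pending D-channel response at once.
        void deliver_responses() { outstanding_ = 0; }

        State state() const { return state_; }
        bool complete() const { return complete_; }

        std::map<uint64_t, uint64_t> memory; // address -> 64-bit word

      private:
        uint64_t beat_bytes_;
        uint64_t addr_ = 0, len_ = 0;
        unsigned outstanding_ = 0;
        bool running_ = false, complete_ = false;
        State state_ = s_idle;
    };

Because ``complete`` is only set in ``s_wait`` once no requests remain
outstanding, a CPU polling it sees the flag only after every write has been
acknowledged.
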
However, for long-running operations, we would usually like to have the device
notify the CPU through an interrupt. To add an interrupt to the device,
we need to create an ``IntSourceNode`` in the lazy module.

.. code-block:: scala

    val intnode = IntSourceNode(IntSourcePortSimple(resources = device.int))

Then, in the module implementation, we can connect the ``complete`` register
to the interrupt line. That way, the CPU will get interrupted once the
state machine completes. It can clear the interrupt by writing a 0 to the
``complete`` register.

.. code-block:: scala

    // Drive the interrupt line from the complete register
    val (interrupt, _) = outer.intnode.out(0)
    interrupt(0) := complete


@ -0,0 +1,13 @@
Developing New Devices
======================

.. toctree::
   :maxdepth: 2
   :caption: Developing New Devices:

   Getting-Started
   MMIO-mapped-Registers
   DMA-and-Interrupts
   Connecting-Devices-to-Bus
   Running-Test-Software
   Creating-Simulation-Model

docs/Getting-Started.rst

@ -0,0 +1,42 @@
Getting Started
===============
In this tutorial, we will show you how to design a new memory-mapped IO
device, test it in simulation, and then build and run it on FireSim.
To start with, you will need to clone a copy of FireChip, the repository
that aggregates all the target RTL for FireSim. FireSim already contains
FireChip as a submodule under ``target-design/firechip``, but it makes patches
to the codebase so that it will work with the FPGA tools. Therefore, you will
need to clone a clean copy if you want to use FireChip standalone.
Go to https://github.com/firesim/firechip and click the "Fork" button to
fork the repository to your own account. Now clone the new repo to your
local machine and initialize the submodules.

.. code-block:: shell

    $ git clone git@github.com:yourusername/firechip.git
    $ cd firechip
    $ git submodule update --init
    $ cd rocket-chip
    $ git submodule update --init
    $ cd ..

You do not need to install riscv-tools again, since you will reuse the copy
installed with FireSim. Make sure to go into your FireSim directory and source
``sourceme-f1-full.sh`` before you run the rest of the commands in this
tutorial.
Now that everything is checked out, you can build the VCS or Verilator
simulator and run the regression tests to make sure everything is working.

.. code-block:: shell

    $ cd vsim   # or "cd verisim" for Verilator
    $ make      # builds the DefaultExampleConfig
    $ make run-regression-tests

If everything is set up correctly, you should see a bunch of ``*.out`` files
in the ``output/`` directory. If you open these up, they should all say
"Completed after XXXXX cycles" at the end and not have any error messages.


@ -0,0 +1,76 @@
MMIO-mapped Registers
=====================
In this tutorial, we will create a new device which pulls in data from an
externally connected input stream and writes it to memory. We'll create our
device in the file ``src/main/scala/example/InputStream.scala``. The first
thing we need to do is set up some memory-mapped control registers that the
CPU can use to communicate with the device. The easiest way to do this is by
creating a ``TLRegisterNode``, which provides a ``regmap`` method that can be
used to generate the hardware for reading and writing to RTL registers.

.. code-block:: scala

    class InputStream(
        address: BigInt,
        val beatBytes: Int = 8)
        (implicit p: Parameters) extends LazyModule {
      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
      // 64-byte block of memory-mapped control registers
      val regnode = TLRegisterNode(
        address = Seq(AddressSet(address, 0x3f)),
        device = device,
        beatBytes = beatBytes)

      lazy val module = new InputStreamModuleImp(this)
    }

We specify three arguments to the ``TLRegisterNode``
constructor. The first is the address of the device in the memory map.
The address is specified as an ``AddressSet`` containing two values, a base
address and a mask. The bus will route to the device every address that
matches the base address on the bits not set in the mask. In this case, we set
the mask to ``0x3f``, which covers the lower six bits. This means that a 64-byte
region starting from the base address will be routed to this device.
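
A minimal C++ sketch of that matching rule follows. The ``in_address_set``
and ``address_set_size`` helpers are hypothetical, not Rocket Chip APIs, and
the size helper assumes a contiguous low-order mask such as ``0x3f``.

.. code-block:: c++

    #include <cassert>
    #include <cstdint>

    // Hypothetical model of AddressSet(base, mask) membership: an address
    // matches when it equals the base on every bit NOT set in the mask.
    bool in_address_set(uint64_t addr, uint64_t base, uint64_t mask)
    {
        // The base is expected to have no bits set inside the mask.
        assert((base & mask) == 0);
        return ((addr ^ base) & ~mask) == 0;
    }

    // Bytes covered by a contiguous low-order mask (0x3f -> 64 bytes).
    uint64_t address_set_size(uint64_t mask)
    {
        return mask + 1;
    }

For example, with base ``0x10017000`` and mask ``0x3f``, addresses
``0x10017000`` through ``0x1001703f`` match and ``0x10017040`` does not.
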
The second argument to ``TLRegisterNode`` is a ``SimpleDevice`` object, which
provides the name and compatibility string of the device tree entry that will
be created for the peripheral. We won't show how this is used in this tutorial,
but it will be important if you want to create a Linux kernel driver for
the device.
The third argument to ``TLRegisterNode`` is ``beatBytes``, which specifies
the width of the TileLink interface. We will just pass this through from a
class argument.
We want the device to be able to write a specified number of bytes to a
specified location in memory, so we'll provide ``addr`` and ``len`` registers.
We will also want a ``running`` register for the CPU to tell the device to
start operation and a ``complete`` register for the device to signal to the
CPU that it has finished.

.. code-block:: scala

    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
      val addrBits = 64
      val w = 64
      val io = IO(new Bundle {
        // Not used yet
        val in = Flipped(Decoupled(UInt(w.W)))
      })

      val addr = Reg(UInt(addrBits.W))
      val len = Reg(UInt(addrBits.W))
      val running = RegInit(false.B)
      val complete = RegInit(false.B)

      // Map each register to a byte offset within the device's address range
      outer.regnode.regmap(
        0x00 -> Seq(RegField(addrBits, addr)),
        0x08 -> Seq(RegField(addrBits, len)),
        0x10 -> Seq(RegField(1, running)),
        0x18 -> Seq(RegField(1, complete)))
    }

The arguments to ``regmap`` should be a series of mappings from address
offsets to sequences of ``RegField`` objects. The ``RegField`` constructor
takes two arguments, the width of the register field and the RTL register
itself.
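
As a rough software analogy for this register map, the sketch below models
each field as a value of a given width at a fixed byte offset. It assumes that
writes keep only the low ``width`` bits of the data and that the remaining
bits read back as zero; the ``RegFieldModel`` and ``RegMapModel`` names are
hypothetical, and this is not the Rocket Chip ``regmap`` implementation.

.. code-block:: c++

    #include <cassert>
    #include <cstdint>
    #include <map>

    // Hypothetical model of one RegField: a register `width` bits wide.
    struct RegFieldModel {
        unsigned width;
        uint64_t value;

        uint64_t mask() const
        {
            assert(width >= 1 && width <= 64);
            return width == 64 ? ~0ULL : ((1ULL << width) - 1);
        }
        // Assumed behavior: bits above the field width are dropped.
        void write(uint64_t data) { value = data & mask(); }
        uint64_t read() const { return value; }
    };

    // Hypothetical model of the register map: byte offset -> field.
    // at() throws std::out_of_range for an unmapped offset.
    struct RegMapModel {
        std::map<uint64_t, RegFieldModel> fields;

        void write(uint64_t offset, uint64_t data) { fields.at(offset).write(data); }
        uint64_t read(uint64_t offset) const { return fields.at(offset).read(); }
    };

    // Mirrors the offsets and widths passed to regmap in InputStreamModuleImp.
    RegMapModel make_input_stream_regmap()
    {
        RegMapModel m;
        m.fields[0x00] = RegFieldModel{64, 0}; // addr
        m.fields[0x08] = RegFieldModel{64, 0}; // len
        m.fields[0x10] = RegFieldModel{1, 0};  // running
        m.fields[0x18] = RegFieldModel{1, 0};  // complete
        return m;
    }

Under this model, writing ``0xff`` to offset ``0x10`` leaves ``running`` equal
to 1, since only bit 0 belongs to the field.
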


@ -0,0 +1,70 @@
Running Test Software
=====================
To test our input stream device, we want to write an application that uses
the device to write data into memory, then reads the data and prints it out.
In project-template, test software is placed in the ``tests/`` directory,
which includes a Makefile and library code for developing a baremetal program.
We'll create a new file at ``tests/input-stream.c`` with the following code:

.. code-block:: c

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #include "mmio.h"

    #define N 4

    // Register offsets match the regmap in InputStreamModuleImp
    #define INPUTSTREAM_BASE 0x10017000L
    #define INPUTSTREAM_ADDR (INPUTSTREAM_BASE + 0x00)
    #define INPUTSTREAM_LEN (INPUTSTREAM_BASE + 0x08)
    #define INPUTSTREAM_RUNNING (INPUTSTREAM_BASE + 0x10)
    #define INPUTSTREAM_COMPLETE (INPUTSTREAM_BASE + 0x18)

    // Destination buffer for the DMA writes
    uint64_t values[N];

    int main(void)
    {
        // Program the destination address and the length in bytes
        reg_write64(INPUTSTREAM_ADDR, (uint64_t) values);
        reg_write64(INPUTSTREAM_LEN, N * sizeof(uint64_t));
        // Make sure addr and len are written before starting the device
        asm volatile ("fence");
        reg_write64(INPUTSTREAM_RUNNING, 1);

        // Wait for the device to finish, then clear the complete flag
        while (reg_read64(INPUTSTREAM_COMPLETE) == 0) {}
        reg_write64(INPUTSTREAM_COMPLETE, 0);

        for (int i = 0; i < N; i++)
            printf("%016lx\n", values[i]);

        return 0;
    }

This program statically allocates an array for the data to be written to.
It then sets the ``addr`` and ``len`` registers, executes a ``fence``
instruction to make sure they are committed, and then sets the ``running``
register. It then continuously polls the ``complete`` register until it sees
a non-zero value, at which point it knows the data has been written to memory
and is safe to read back.
To compile this program, add "input-stream" to the ``PROGRAMS`` list in
``tests/Makefile`` and run ``make`` from the tests directory.
To run the program, return to the ``vsim/`` directory and run the simulator
executable, passing the newly compiled ``input-stream.riscv`` executable
as an argument.

.. code-block:: shell

    $ cd vsim
    $ ./simv-example-FixedInputStreamConfig ../tests/input-stream.riscv

The program should print out the following:

.. code-block:: text

    000000001002abcd
    0000000034510204
    0000000010329999
    0000000092101222


@ -15,6 +15,7 @@ Welcome to FireSim's documentation!
single-node-sim
cluster-sim
advanced-usage
Developing-New-Devices
Indices and tables
==================