start tutorial on adding new devices
parent
bba9dea481
commit
f415332ba6
|
@ -0,0 +1,97 @@
|
||||||
|
Connecting Devices to Bus
|
||||||
|
=========================
|
||||||
|
|
||||||
|
Now that we have finished designing our peripheral device, we need to
|
||||||
|
hook it up into the SoC. To do this, we first need to create two traits:
|
||||||
|
one for the lazy module and one for the module implementation. The lazy
|
||||||
|
module trait is the following.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
trait HasPeripheryInputStream { this: BaseSubsystem =>
|
||||||
|
private val portName = "input-stream"
|
||||||
|
val streamWidth = pbus.beatBytes * 8
|
||||||
|
val inputstream = LazyModule(new InputStream(0x10017000, pbus.beatBytes))
|
||||||
|
pbus.toVariableWidthSlave(Some(portName)) { inputstream.regnode }
|
||||||
|
sbus.fromPort(Some(portName))() := inputstream.dmanode
|
||||||
|
ibus.fromSync := inputstream.intnode
|
||||||
|
}
|
||||||
|
|
||||||
|
We add the line ``this: BaseSubsystem =>`` to indicate that this trait will
|
||||||
|
eventually be mixed into a class that extends ``BaseSubsystem``, which contains
|
||||||
|
the definition of the system bus ``sbus``, peripheral bus ``pbus``, and
|
||||||
|
interrupt bus ``ibus``. We instantiate the ``InputStream`` lazy module and
|
||||||
|
give it the base address ``0x10017000``. We connect the ``pbus`` to the
register node, the DMA node to the ``sbus``, and the interrupt node to the ``ibus``.
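
The self-type annotation is a plain Scala feature rather than anything specific
to Rocket Chip. As a minimal sketch, the following uses hypothetical stand-in
names (``BaseSystemModel``, ``HasWidget``, ``TopModel``) instead of the real
``BaseSubsystem`` to show how a self-type lets a trait use members of the class
it will later be mixed into:

.. code-block:: scala

    // Stand-in for BaseSubsystem: owns the members the trait wants to use.
    // All names here are hypothetical and only illustrate the language feature.
    abstract class BaseSystemModel {
      val beatBytes: Int = 8
    }

    // The self-type says: "whoever mixes me in must also be a BaseSystemModel".
    // That is what makes `beatBytes` visible inside the trait body.
    trait HasWidget { this: BaseSystemModel =>
      val widgetWidth: Int = beatBytes * 8
    }

    // Compiles because TopModel extends BaseSystemModel.
    class TopModel extends BaseSystemModel with HasWidget

    object SelfTypeDemo {
      def main(args: Array[String]): Unit = {
        val top = new TopModel
        println(s"widgetWidth = ${top.widgetWidth}")
      }
    }

Mixing ``HasWidget`` into a class that does not extend ``BaseSystemModel`` is
rejected at compile time, which is the guarantee the tutorial's trait relies on.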
|
||||||
|
|
||||||
|
The module implementation trait is as follows:
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
trait HasPeripheryInputStreamModuleImp extends LazyModuleImp {
|
||||||
|
val outer: HasPeripheryInputStream
|
||||||
|
|
||||||
|
val stream_in = IO(Flipped(Decoupled(UInt(outer.streamWidth.W))))
|
||||||
|
outer.inputstream.module.io.in <> stream_in
|
||||||
|
|
||||||
|
def connectFixedInput(data: Seq[BigInt]) {
|
||||||
|
val fixed = Module(new FixedInputStream(data, outer.streamWidth))
|
||||||
|
stream_in <> fixed.io.out
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Since the interrupts and memory ports have already been connected in the
|
||||||
|
lazy module trait, the module implementation trait only needs to create the
|
||||||
|
external decoupled interface and connect that to the ``InputStream`` module
|
||||||
|
implementation.
|
||||||
|
|
||||||
|
The ``connectFixedInput`` method will be used by the test harness to connect
|
||||||
|
an input stream model that just sends a pre-specified stream of data.
|
||||||
|
|
||||||
|
We can now mix these traits into the SoC design. Open up
|
||||||
|
``src/main/scala/example/Top.scala`` and add the following:
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class ExampleTopWithInputStream(implicit p: Parameters) extends ExampleTop
|
||||||
|
with HasPeripheryInputStream {
|
||||||
|
override lazy val module = new ExampleTopWithInputStreamModule(this)
|
||||||
|
}
|
||||||
|
|
||||||
|
class ExampleTopWithInputStreamModule(outer: ExampleTopWithInputStream)
|
||||||
|
extends ExampleTopModuleImp(outer)
|
||||||
|
with HasPeripheryInputStreamModuleImp
|
||||||
|
|
||||||
|
|
||||||
|
We can then build a simulation using our new SoC by adding a configuration
|
||||||
|
to ``src/main/scala/example/Configs.scala``. This configuration will cause
|
||||||
|
the test harness to instantiate an SoC with the ``InputStream`` device
|
||||||
|
and then connect a fixed input stream model to it.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class WithFixedInputStream extends Config((site, here, up) => {
|
||||||
|
case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
|
||||||
|
val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
|
||||||
|
top.connectFixedInput(Seq(
|
||||||
|
BigInt("1002abcd", 16),
|
||||||
|
BigInt("34510204", 16),
|
||||||
|
BigInt("10329999", 16),
|
||||||
|
BigInt("92101222", 16)))
|
||||||
|
top
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
class FixedInputStreamConfig extends Config(
|
||||||
|
new WithFixedInputStream ++ new BaseExampleConfig)
|
||||||
|
|
||||||
|
We can now compile the simulation using VCS.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
cd vsim
|
||||||
|
make CONFIG=FixedInputStreamConfig
|
||||||
|
|
||||||
|
This will produce a ``simv-example-FixedInputStreamConfig`` executable that
|
||||||
|
can be used to run tests. We will discuss how to write and run those tests in
|
||||||
|
the next section.
|
|
@ -0,0 +1,247 @@
|
||||||
|
Creating Simulation Model
|
||||||
|
=========================
|
||||||
|
|
||||||
|
So far, we've been using a fixed input stream model to test our device.
|
||||||
|
But, ideally, we'd like an input stream that is defined by a software model
|
||||||
|
and configurable at runtime. We'd like to put the input data in a file and
|
||||||
|
pass it in as a command-line argument. We can't do that in Chisel.
|
||||||
|
We'll have to create the model in Verilog and call out to C++ using the
|
||||||
|
Verilog DPI-C API.
|
||||||
|
|
||||||
|
First, how do we include Verilog code in a Chisel codebase? We can do this
|
||||||
|
using the Chisel BlackBox class. This class allows us to define IO ports and
|
||||||
|
can be used like a regular Chisel module, but the internal implementation is
|
||||||
|
left to Verilog.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class SimInputStream(w: Int) extends BlackBox(Map("DATA_BITS" -> IntParam(w))) {
|
||||||
|
val io = IO(new Bundle {
|
||||||
|
val clock = Input(Clock())
|
||||||
|
val reset = Input(Bool())
|
||||||
|
val out = Decoupled(UInt(w.W))
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
One key difference in the IO bundle definition is that the implicit ``clock``
|
||||||
|
and ``reset`` signals must be explicitly defined in a BlackBox. The BlackBox
|
||||||
|
class also takes a map that defines parameters that will be passed to the
|
||||||
|
Verilog implementation. To connect the BlackBox in the test harness, we should
|
||||||
|
create a ``connectSimInput`` method in the ``HasPeripheryInputStreamModuleImp``
|
||||||
|
trait.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
def connectSimInput(clock: Clock, reset: Bool) {
|
||||||
|
val sim = Module(new SimInputStream(outer.streamWidth))
|
||||||
|
sim.io.clock := clock
|
||||||
|
sim.io.reset := reset
|
||||||
|
stream_in <> sim.io.out
|
||||||
|
}
|
||||||
|
|
||||||
|
We then add a new configuration class in
|
||||||
|
``src/main/scala/example/Configs.scala`` that calls the ``connectSimInput``
|
||||||
|
method.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
|
||||||
|
class WithSimInputStream extends Config((site, here, up) => {
|
||||||
|
case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
|
||||||
|
val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
|
||||||
|
top.connectSimInput(clock, reset)
|
||||||
|
top
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
class SimInputStreamConfig extends Config(
|
||||||
|
new WithSimInputStream ++ new BaseExampleConfig)
|
||||||
|
|
||||||
|
Now we need to create the Verilog implementation of the ``SimInputStream``
|
||||||
|
module. Make a new directory ``src/main/resources`` and add ``vsrc`` and ``csrc``
|
||||||
|
subdirectories under it.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ mkdir -p src/main/resources/{vsrc,csrc}
|
||||||
|
|
||||||
|
In the ``vsrc`` directory, create a file called ``SimInputStream.v`` and add
|
||||||
|
the following code.
|
||||||
|
|
||||||
|
.. code-block:: verilog
|
||||||
|
|
||||||
|
import "DPI-C" function void input_stream_init
|
||||||
|
(
|
||||||
|
input string filename,
|
||||||
|
input int data_bits
|
||||||
|
);
|
||||||
|
|
||||||
|
import "DPI-C" function void input_stream_tick
|
||||||
|
(
|
||||||
|
output bit out_valid,
|
||||||
|
input bit out_ready,
|
||||||
|
output longint out_bits
|
||||||
|
);
|
||||||
|
|
||||||
|
module SimInputStream #(DATA_BITS=64) (
|
||||||
|
input clock,
|
||||||
|
input reset,
|
||||||
|
output out_valid,
|
||||||
|
input out_ready,
|
||||||
|
output [DATA_BITS-1:0] out_bits
|
||||||
|
);
|
||||||
|
|
||||||
|
bit __out_valid;
|
||||||
|
longint __out_bits;
|
||||||
|
string filename;
|
||||||
|
int data_bits;
|
||||||
|
|
||||||
|
reg __out_valid_reg;
|
||||||
|
reg [DATA_BITS-1:0] __out_bits_reg;
|
||||||
|
|
||||||
|
initial begin
|
||||||
|
data_bits = DATA_BITS;
|
||||||
|
if ($value$plusargs("instream=%s", filename)) begin
|
||||||
|
input_stream_init(filename, data_bits);
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
always @(posedge clock) begin
|
||||||
|
if (reset) begin
|
||||||
|
__out_valid = 0;
|
||||||
|
__out_bits = 0;
|
||||||
|
|
||||||
|
__out_valid_reg <= 0;
|
||||||
|
__out_bits_reg <= 0;
|
||||||
|
end else begin
|
||||||
|
input_stream_tick(
|
||||||
|
__out_valid,
|
||||||
|
out_ready,
|
||||||
|
__out_bits);
|
||||||
|
__out_valid_reg <= __out_valid;
|
||||||
|
__out_bits_reg <= __out_bits;
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
assign out_valid = __out_valid_reg;
|
||||||
|
assign out_bits = __out_bits_reg;
|
||||||
|
|
||||||
|
endmodule
|
||||||
|
|
||||||
|
The Verilog module defines its inputs and outputs to match the definition in the
|
||||||
|
Chisel BlackBox. But most of the implementation is left to C++ through the
|
||||||
|
DPI functions ``input_stream_init`` and ``input_stream_tick``. We define
|
||||||
|
these functions in a ``SimInputStream.cc`` file in the ``csrc`` directory.
|
||||||
|
|
||||||
|
.. code-block:: c++
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
|
||||||
|
class InputStream {
|
||||||
|
public:
|
||||||
|
InputStream(const char *filename, int nbytes);
|
||||||
|
~InputStream(void);
|
||||||
|
|
||||||
|
bool out_valid() { return !complete; }
|
||||||
|
uint64_t out_bits() { return data; }
|
||||||
|
void tick(bool out_ready);
|
||||||
|
|
||||||
|
private:
|
||||||
|
void read_next(void);
|
||||||
|
bool complete;
|
||||||
|
FILE *file;
|
||||||
|
int nbytes;
|
||||||
|
uint64_t data;
|
||||||
|
};
|
||||||
|
|
||||||
|
InputStream::InputStream(const char *filename, int nbytes)
|
||||||
|
{
|
||||||
|
this->nbytes = nbytes;
|
||||||
|
this->file = fopen(filename, "rb"); // binary mode: the stream is raw bytes
|
||||||
|
if (this->file == NULL) {
|
||||||
|
fprintf(stderr, "Could not open %s\n", filename);
|
||||||
|
abort();
|
||||||
|
}
|
||||||
|
|
||||||
|
read_next();
|
||||||
|
}
|
||||||
|
|
||||||
|
InputStream::~InputStream(void)
|
||||||
|
{
|
||||||
|
fclose(this->file);
|
||||||
|
}
|
||||||
|
|
||||||
|
void InputStream::read_next(void)
|
||||||
|
{
|
||||||
|
int res;
|
||||||
|
|
||||||
|
this->data = 0;
|
||||||
|
|
||||||
|
res = fread(&this->data, this->nbytes, 1, this->file);
|
||||||
|
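// fread() returns an item count and is never negative; use ferror() to detect a failed read
if (res == 0 && ferror(this->file)) {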
|
||||||
|
perror("fread");
|
||||||
|
abort();
|
||||||
|
}
|
||||||
|
|
||||||
|
this->complete = (res == 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
void InputStream::tick(bool out_ready)
|
||||||
|
{
|
||||||
|
int res;
|
||||||
|
|
||||||
|
if (out_valid() && out_ready)
|
||||||
|
read_next();
|
||||||
|
}
|
||||||
|
|
||||||
|
InputStream *stream = NULL;
|
||||||
|
|
||||||
|
extern "C" void input_stream_init(const char *filename, int data_bits)
|
||||||
|
{
|
||||||
|
stream = new InputStream(filename, data_bits/8);
|
||||||
|
}
|
||||||
|
|
||||||
|
extern "C" void input_stream_tick(
|
||||||
|
unsigned char *out_valid,
|
||||||
|
unsigned char out_ready,
|
||||||
|
long long *out_bits)
|
||||||
|
{
|
||||||
|
stream->tick(out_ready);
|
||||||
|
*out_valid = stream->out_valid();
|
||||||
|
*out_bits = stream->out_bits();
|
||||||
|
}
|
||||||
|
|
||||||
|
In the C++ file, we implement an ``InputStream`` class that takes a file name
|
||||||
|
as its argument. It opens the file and reads ``nbytes`` bytes from it for every
|
||||||
|
ready-valid handshake. The ``input_stream_init`` function constructs an
|
||||||
|
``InputStream`` object and assigns it to a global pointer. The
|
||||||
|
``input_stream_tick`` function updates the state by calling the ``tick``
|
||||||
|
method, passing in the inputs from Verilog. It then assigns values to the
Verilog outputs.
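
For readers more at home in Scala than C++, the ready-valid behaviour of the
C++ model can be restated as a small software model. This is an illustrative
sketch and not part of the simulation build; ``StreamSourceModel`` and its
constructor argument are hypothetical names, and the data comes from an
in-memory sequence instead of a file:

.. code-block:: scala

    // Software sketch of the C++ InputStream class. Hypothetical names; data
    // comes from an in-memory sequence rather than from fread() on a file.
    class StreamSourceModel(words: Seq[Long]) {
      private val it = words.iterator
      private var complete = false
      private var data = 0L

      readNext() // like the C++ constructor, pre-load the first word

      def outValid: Boolean = !complete
      def outBits: Long = data

      // Called once per clock edge with the consumer's ready signal.
      // A word is consumed only when valid and ready are both high.
      def tick(outReady: Boolean): Unit =
        if (outValid && outReady) readNext()

      private def readNext(): Unit =
        if (it.hasNext) data = it.next() else { data = 0L; complete = true }
    }

    object StreamSourceDemo {
      def main(args: Array[String]): Unit = {
        val src = new StreamSourceModel(Seq(0x1002abcdL, 0x34510204L))
        src.tick(outReady = false)   // no handshake: first word is held
        println(f"${src.outBits}%x")
        src.tick(outReady = true)    // handshake: advance to second word
        println(f"${src.outBits}%x")
        src.tick(outReady = true)    // handshake: source is now exhausted
        println(src.outValid)
      }
    }

The point of the sketch is that the word on the output only changes after a
cycle in which both valid and ready were high.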
|
||||||
|
|
||||||
|
You can now build this new configuration in VCS.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ cd vsim
|
||||||
|
$ make CONFIG=SimInputStreamConfig
|
||||||
|
|
||||||
|
Now create a file that can be used as the input stream data. Just getting
|
||||||
|
random bytes from ``/dev/urandom`` would work. Pass this to your simulation
|
||||||
|
through the ``+instream=`` flag, and you should see the data get printed
|
||||||
|
out in the ``input-stream.riscv`` test.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ dd if=/dev/urandom of=instream.img bs=32 count=1
|
||||||
|
$ hexdump instream.img
|
||||||
|
0000000 189b f12a 1cc1 9eb5 b65d bbef 96b6 4949
|
||||||
|
0000010 f8c8 636c 76fe 15f3 0665 0ef9 8c5d 3011
|
||||||
|
0000020
|
||||||
|
$ ./simv-example-SimInputStreamConfig +instream=instream.img ../tests/input-stream.riscv
|
||||||
|
9eb51cc1f12a189b
|
||||||
|
494996b6bbefb65d
|
||||||
|
15f376fe636cf8c8
|
||||||
|
30118c5d0ef90665
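
The printed values look reordered relative to the ``hexdump`` output because
``hexdump`` prints 16-bit words in host byte order by default, while the model
reads eight bytes at a time into a ``uint64_t``. Assuming the simulation host
is little-endian, the conversion can be sketched in Scala using the first eight
bytes from the dump above (the object and method names are illustrative only):

.. code-block:: scala

    import java.nio.{ByteBuffer, ByteOrder}

    object EndianDemo {
      // Interpret 8 bytes as one little-endian 64-bit word, which is what
      // fread() into a uint64_t produces on a little-endian host.
      def leWord(bytes: Array[Byte]): Long =
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong()

      def main(args: Array[String]): Unit = {
        // hexdump showed "189b f12a 1cc1 9eb5": each group is a 16-bit word
        // with its low byte stored first, so the file bytes are:
        val fileBytes = Array(0x9b, 0x18, 0x2a, 0xf1, 0xc1, 0x1c, 0xb5, 0x9e)
          .map(_.toByte)
        println(f"${leWord(fileBytes)}%016x") // prints 9eb51cc1f12a189b
      }
    }

Running ``hexdump -C`` instead shows the bytes in file order, which makes the
correspondence easier to see.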
|
|
@ -0,0 +1,151 @@
|
||||||
|
DMA and Interrupts
|
||||||
|
==================
|
||||||
|
|
||||||
|
In order to move data from the external input stream to memory, we need to
|
||||||
|
perform direct memory access (DMA). We can achieve this by giving the device
|
||||||
|
a ``TLClientNode``. Once we add it, the ``LazyModule`` will look like this:
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class InputStream(
|
||||||
|
address: BigInt,
|
||||||
|
val beatBytes: Int = 8,
|
||||||
|
val maxInflight: Int = 4)
|
||||||
|
(implicit p: Parameters) extends LazyModule {
|
||||||
|
|
||||||
|
val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
|
||||||
|
val regnode = TLRegisterNode(
|
||||||
|
address = Seq(AddressSet(address, 0x3f)),
|
||||||
|
device = device,
|
||||||
|
beatBytes = beatBytes)
|
||||||
|
val dmanode = TLClientNode(Seq(TLClientPortParameters(
|
||||||
|
Seq(TLClientParameters(
|
||||||
|
name = "input-stream",
|
||||||
|
sourceId = IdRange(0, maxInflight))))))
|
||||||
|
|
||||||
|
lazy val module = new InputStreamModuleImp(this)
|
||||||
|
}
|
||||||
|
|
||||||
|
For our ``TLClientNode``, we only need a single port, so we specify a single
|
||||||
|
set of ``TLClientPortParameters`` and ``TLClientParameters``. We override two
|
||||||
|
arguments in the ``TLClientParameters`` constructor. The ``name`` is the
|
||||||
|
name of the port and ``sourceId`` indicates the range of transaction IDs
|
||||||
|
that can be used in memory requests. The lower bound is inclusive, and the
|
||||||
|
upper bound is exclusive, so this device can use source IDs from 0 to
|
||||||
|
``maxInflight - 1``.
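
The half-open convention can be illustrated with a plain Scala model.
``IdRangeModel`` below is a hypothetical stand-in written for this example, not
Rocket Chip's own ``IdRange`` class:

.. code-block:: scala

    // Hypothetical model of a half-open ID range [start, end).
    case class IdRangeModel(start: Int, end: Int) {
      require(start >= 0 && start <= end, "range must be non-negative and ordered")
      def size: Int = end - start
      def contains(id: Int): Boolean = start <= id && id < end
    }

    object IdRangeDemo {
      def main(args: Array[String]): Unit = {
        val maxInflight = 4
        val ids = IdRangeModel(0, maxInflight)
        println(ids.size)          // 4 usable source IDs: 0, 1, 2, 3
        println(ids.contains(3))   // true: maxInflight - 1 is the last valid ID
        println(ids.contains(4))   // false: the upper bound is exclusive
      }
    }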
|
||||||
|
|
||||||
|
In the module implementation, we can now implement a state machine that
|
||||||
|
sends write requests to memory. We first call ``outer.dmanode.out`` to get
|
||||||
|
a sequence of output port tuples. Since we only have one port, we can just
|
||||||
|
pull out the first element of this sequence. For each port, we get a pair of
|
||||||
|
objects. The first is the physical TileLink port, which we can connect to RTL.
|
||||||
|
The second is a ``TLEdge`` object, which we can use to get extra metadata about
|
||||||
|
the TileLink port (like the number of address and data bits).
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
|
||||||
|
val (tl, edge) = outer.dmanode.out(0)
|
||||||
|
val addrBits = edge.bundle.addressBits
|
||||||
|
val w = edge.bundle.dataBits
|
||||||
|
val beatBytes = (w / 8)
|
||||||
|
|
||||||
|
val io = IO(new Bundle {
|
||||||
|
val in = Flipped(Decoupled(UInt(w.W)))
|
||||||
|
})
|
||||||
|
|
||||||
|
val addr = Reg(UInt(addrBits.W))
|
||||||
|
val len = Reg(UInt(addrBits.W))
|
||||||
|
val running = RegInit(false.B)
|
||||||
|
val complete = RegInit(false.B)
|
||||||
|
|
||||||
|
val s_idle :: s_issue :: s_wait :: Nil = Enum(3)
|
||||||
|
val state = RegInit(s_idle)
|
||||||
|
|
||||||
|
val nXacts = outer.maxInflight
|
||||||
|
val xactBusy = RegInit(0.U(nXacts.W))
|
||||||
|
val xactOnehot = PriorityEncoderOH(~xactBusy)
|
||||||
|
val canIssue = (state === s_issue) && !xactBusy.andR
|
||||||
|
|
||||||
|
io.in.ready := canIssue && tl.a.ready
|
||||||
|
tl.a.valid := canIssue && io.in.valid
|
||||||
|
tl.a.bits := edge.Put(
|
||||||
|
fromSource = OHToUInt(xactOnehot),
|
||||||
|
toAddress = addr,
|
||||||
|
lgSize = log2Ceil(beatBytes).U,
|
||||||
|
data = io.in.bits)._2
|
||||||
|
tl.d.ready := running && xactBusy.orR
|
||||||
|
|
||||||
|
xactBusy := (xactBusy |
|
||||||
|
Mux(tl.a.fire(), xactOnehot, 0.U(nXacts.W))) &
|
||||||
|
~Mux(tl.d.fire(), UIntToOH(tl.d.bits.source), 0.U)
|
||||||
|
|
||||||
|
when (state === s_idle && running) {
|
||||||
|
assert(addr(log2Ceil(beatBytes)-1,0) === 0.U,
|
||||||
|
s"InputStream base address not aligned to ${beatBytes} bytes")
|
||||||
|
assert(len(log2Ceil(beatBytes)-1,0) === 0.U,
|
||||||
|
s"InputStream length not aligned to ${beatBytes} bytes")
|
||||||
|
state := s_issue
|
||||||
|
}
|
||||||
|
|
||||||
|
when (io.in.fire()) {
|
||||||
|
addr := addr + beatBytes.U
|
||||||
|
len := len - beatBytes.U
|
||||||
|
when (len === beatBytes.U) { state := s_wait }
|
||||||
|
}
|
||||||
|
|
||||||
|
when (state === s_wait && !xactBusy.orR) {
|
||||||
|
running := false.B
|
||||||
|
complete := true.B
|
||||||
|
state := s_idle
|
||||||
|
}
|
||||||
|
|
||||||
|
outer.regnode.regmap(
|
||||||
|
0x00 -> Seq(RegField(addrBits, addr)),
|
||||||
|
0x08 -> Seq(RegField(addrBits, len)),
|
||||||
|
0x10 -> Seq(RegField(1, running)),
|
||||||
|
0x18 -> Seq(RegField(1, complete)))
|
||||||
|
}
|
||||||
|
|
||||||
|
The state machine starts in the ``s_idle`` state. In this state, the CPU should
|
||||||
|
set the ``addr`` and ``len`` registers and then write 1 to the ``running``
register. The state machine then moves into the ``s_issue`` state, in which it
|
||||||
|
forwards data from the ``in`` decoupled interface to memory through the
|
||||||
|
TileLink `A` channel.
|
||||||
|
|
||||||
|
We construct the `A` channel requests using the ``Put`` method in the
|
||||||
|
``TLEdge`` object we extracted earlier. The ``Put`` method takes a unique
|
||||||
|
source ID in ``fromSource``, the address to write to in ``toAddress``, the
|
||||||
|
base-2 logarithm of the size in bytes in ``lgSize``, and the data to be written
|
||||||
|
in ``data``.
|
||||||
|
|
||||||
|
The source field must observe some constraints. There can only be one
|
||||||
|
transaction with each distinct source ID in flight at a given time.
|
||||||
|
Once you send a request on the `A` channel with a specific source ID,
|
||||||
|
you cannot send another until after you've received the response for it
|
||||||
|
on the `D` channel.
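
The ``xactBusy`` bitmap in the module above enforces this rule in hardware. As
a rough software analogue, the sketch below tracks busy IDs in an integer
bitmask. ``SourceIdTracker`` and its methods are hypothetical names for
illustration, and unlike the hardware it handles one event per call rather than
an issue and a response in the same cycle:

.. code-block:: scala

    // Software analogue of xactBusy / xactOnehot (hypothetical names).
    // Bit i of `busy` is set while a request with source ID i is in flight.
    class SourceIdTracker(nXacts: Int) {
      require(nXacts > 0 && nXacts < 32, "bitmap is held in a single Int")
      private var busy: Int = 0
      private val allBusy: Int = (1 << nXacts) - 1

      def canIssue: Boolean = busy != allBusy   // like !xactBusy.andR
      def idle: Boolean = busy == 0             // like !xactBusy.orR

      // Pick the lowest free ID, as PriorityEncoderOH(~xactBusy) does.
      def issue(): Option[Int] =
        if (!canIssue) None
        else {
          val id = Integer.numberOfTrailingZeros(~busy & allBusy)
          busy |= (1 << id)
          Some(id)
        }

      // A response on the D channel frees its source ID for reuse.
      def complete(id: Int): Unit = {
        require((busy & (1 << id)) != 0, s"source ID $id is not in flight")
        busy &= ~(1 << id)
      }
    }

    object SourceIdDemo {
      def main(args: Array[String]): Unit = {
        val t = new SourceIdTracker(2)
        println(t.issue())   // Some(0)
        println(t.issue())   // Some(1)
        println(t.issue())   // None: both IDs are in flight
        t.complete(0)
        println(t.issue())   // Some(0): ID 0 can be reused after its response
      }
    }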
|
||||||
|
|
||||||
|
Once all requests have been sent on the `A` channel, the state machine
|
||||||
|
transitions to the ``s_wait`` state to wait for the remaining responses on
|
||||||
|
the `D` channel. Once the responses have all returned, the state machine
|
||||||
|
sets ``running`` to false and ``complete`` to true. The CPU can poll the
|
||||||
|
``complete`` register to check if the operation has finished.
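
The control flow described above can be summarised as a small software model.
This is a simplified sketch with hypothetical names (``DmaFsmModel``,
``step``); it abstracts each TileLink handshake to a boolean and is not a
cycle-accurate description of the Chisel module:

.. code-block:: scala

    object DmaFsmModel {
      sealed trait State
      case object Idle extends State
      case object Issue extends State
      case object Wait extends State

      // Registers visible to the state machine (names follow the Chisel code).
      case class Regs(state: State, len: Int, running: Boolean, complete: Boolean)

      // Advance one step. `beatSent` stands for io.in.fire() and
      // `allAcked` for !xactBusy.orR.
      def step(r: Regs, beatBytes: Int, beatSent: Boolean, allAcked: Boolean): Regs =
        r.state match {
          case Idle if r.running =>
            require(r.len % beatBytes == 0, "length must be beat-aligned")
            r.copy(state = Issue)
          case Issue if beatSent =>
            val last = r.len == beatBytes
            r.copy(len = r.len - beatBytes, state = if (last) Wait else Issue)
          case Wait if allAcked =>
            r.copy(state = Idle, running = false, complete = true)
          case _ => r
        }

      def main(args: Array[String]): Unit = {
        var r = Regs(Idle, len = 16, running = true, complete = false)
        r = step(r, 8, beatSent = false, allAcked = true)  // Idle -> Issue
        r = step(r, 8, beatSent = true, allAcked = false)  // first beat sent
        r = step(r, 8, beatSent = true, allAcked = false)  // last beat -> Wait
        r = step(r, 8, beatSent = false, allAcked = true)  // Wait -> Idle
        println(r)
      }
    }

After the four steps the model is back in ``Idle`` with ``len`` at 0,
``running`` cleared, and ``complete`` set, mirroring what the CPU observes when
it polls the device.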
|
||||||
|
|
||||||
|
However, for long-running operations, we would usually like to have the device
|
||||||
|
notify the CPU through an interrupt. To add an interrupt to the device,
|
||||||
|
we need to create an ``IntSourceNode`` in the lazy module.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
val intnode = IntSourceNode(IntSourcePortSimple(resources = device.int))
|
||||||
|
|
||||||
|
Then, in the module implementation, we can connect the ``complete`` register
|
||||||
|
to the interrupt line. That way, the CPU will get interrupted once the
|
||||||
|
state machine completes. It can clear the interrupt by writing a 0 to the
|
||||||
|
``complete`` register.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
val (interrupt, _) = outer.intnode.out(0)
|
||||||
|
|
||||||
|
interrupt(0) := complete
|
|
@ -0,0 +1,13 @@
|
||||||
|
Developing New Devices
|
||||||
|
======================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
:caption: Developing New Devices:
|
||||||
|
|
||||||
|
Getting-Started
|
||||||
|
MMIO-mapped-Registers
|
||||||
|
DMA-and-Interrupts
|
||||||
|
Connecting-Devices-to-Bus
|
||||||
|
Running-Test-Software
|
||||||
|
Creating-Simulation-Model
|
|
@ -0,0 +1,42 @@
|
||||||
|
Getting Started
|
||||||
|
===============
|
||||||
|
|
||||||
|
In this tutorial, we will show you how to design a new memory-mapped IO
|
||||||
|
device, test it in simulation, and then build and run it on FireSim.
|
||||||
|
|
||||||
|
To start with, you will need to clone a copy of FireChip, the repository
|
||||||
|
that aggregates all the target RTL for FireSim. FireSim already contains
|
||||||
|
FireChip as a submodule under ``target-design/firechip``, but it makes patches
|
||||||
|
to the codebase so that it will work with the FPGA tools. Therefore, you will
|
||||||
|
need to clone a clean copy if you want to use FireChip standalone.
|
||||||
|
|
||||||
|
Go to https://github.com/firesim/firechip and click the "Fork" button to
|
||||||
|
fork the repository to your own account. Now clone the new repo to your
|
||||||
|
local machine and initialize the submodules.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ git clone git@github.com:yourusername/firechip.git
|
||||||
|
$ cd firechip
|
||||||
|
$ git submodule update --init
|
||||||
|
$ cd rocket-chip
|
||||||
|
$ git submodule update --init
|
||||||
|
$ cd ..
|
||||||
|
|
||||||
|
You will not need to install the riscv-tools again because you'll just be
|
||||||
|
reusing the one in firesim. So make sure to go into firesim and source
|
||||||
|
``sourceme-f1-full.sh`` before you run the rest of the commands in this
|
||||||
|
tutorial.
|
||||||
|
|
||||||
|
Now that everything is checked out, you can build the VCS or Verilator
|
||||||
|
simulator and run the regression tests to make sure everything is working.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ cd vsim # or "cd verisim" for verilator
|
||||||
|
$ make # builds the DefaultExampleConfig
|
||||||
|
$ make run-regression-tests
|
||||||
|
|
||||||
|
If everything is set up correctly, you should see a bunch of ``*.out`` files
|
||||||
|
in the ``output/`` directory. If you open these up, they should all say
|
||||||
|
"Completed after XXXXX cycles" at the end and not have any error messages.
|
|
@ -0,0 +1,76 @@
|
||||||
|
MMIO-mapped Registers
|
||||||
|
=====================
|
||||||
|
|
||||||
|
In this tutorial, we will create a new device which pulls in data from an
|
||||||
|
externally-connected input stream and writes it to memory. We'll create our
|
||||||
|
device in the file ``src/main/scala/example/InputStream.scala``. The first
|
||||||
|
thing we need to do is set up some memory-mapped control registers that the
|
||||||
|
CPU can use to communicate with the device. The easiest way to do this is by
|
||||||
|
creating a ``TLRegisterNode``, which provides a ``regmap`` method that can be
|
||||||
|
used to generate the hardware for reading and writing to RTL registers.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class InputStream(
|
||||||
|
address: BigInt,
|
||||||
|
val beatBytes: Int = 8)
|
||||||
|
(implicit p: Parameters) extends LazyModule {
|
||||||
|
|
||||||
|
val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
|
||||||
|
val regnode = TLRegisterNode(
|
||||||
|
address = Seq(AddressSet(address, 0x3f)),
|
||||||
|
device = device,
|
||||||
|
beatBytes = beatBytes)
|
||||||
|
|
||||||
|
lazy val module = new InputStreamModuleImp(this)
|
||||||
|
}
|
||||||
|
|
||||||
|
We want to specify or override three arguments in the ``TLRegisterNode``
|
||||||
|
constructor. The first is the address of the device in the memory map.
|
||||||
|
The address is specified as an ``AddressSet`` containing two values, a base
|
||||||
|
address and a mask. The system bus will route all addresses that match the
|
||||||
|
base address on the bits not set in the mask. In this case, we set the
|
||||||
|
mask to ``0x3f``, which sets the lower six bits. This means that a 64-byte
|
||||||
|
region starting from the base address will be routed to this device.
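
The matching rule can be written out in a few lines of plain Scala.
``AddressSetModel`` is a hypothetical stand-in for illustration, not Rocket
Chip's ``AddressSet`` class itself, and the base address is the one used for
the device elsewhere in this tutorial:

.. code-block:: scala

    // Hypothetical model of base/mask address matching.
    // An address matches when it agrees with `base` on every bit that is
    // NOT set in `mask`; the masked bits are "don't care".
    case class AddressSetModel(base: BigInt, mask: BigInt) {
      def contains(addr: BigInt): Boolean = ((addr ^ base) & ~mask) == 0
    }

    object AddressSetDemo {
      def main(args: Array[String]): Unit = {
        val regs = AddressSetModel(BigInt(0x10017000L), BigInt(0x3f))
        println(regs.contains(BigInt(0x10017000L))) // true: first byte
        println(regs.contains(BigInt(0x1001703fL))) // true: last byte (base + 63)
        println(regs.contains(BigInt(0x10017040L))) // false: just past the region
      }
    }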
|
||||||
|
|
||||||
|
The second argument to ``TLRegisterNode`` is a ``SimpleDevice`` object, which
|
||||||
|
provides the name and compatibility string of the device tree entry that will be
|
||||||
|
created for the peripheral. We won't show how this is used in this tutorial,
|
||||||
|
but it will be important if you want to create a Linux kernel driver for
|
||||||
|
the device.
|
||||||
|
|
||||||
|
The third argument to ``TLRegisterNode`` is ``beatBytes``, which specifies
|
||||||
|
the width of the TileLink interface. We will just pass this through from a
|
||||||
|
class argument.
|
||||||
|
|
||||||
|
We want the device to be able to write a specified number of bytes to a
|
||||||
|
specified location in memory, so we'll provide ``addr`` and ``len`` registers.
|
||||||
|
We will also want a ``running`` register for the CPU to signal the device
|
||||||
|
to start operation and a ``complete`` register for the device to signal to
|
||||||
|
the CPU that it has completed.
|
||||||
|
|
||||||
|
.. code-block:: scala
|
||||||
|
|
||||||
|
class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
|
||||||
|
val addrBits = 64
|
||||||
|
val w = 64
|
||||||
|
val io = IO(new Bundle {
|
||||||
|
// Not used yet
|
||||||
|
val in = Flipped(Decoupled(UInt(w.W)))
|
||||||
|
})
|
||||||
|
val addr = Reg(UInt(addrBits.W))
|
||||||
|
val len = Reg(UInt(addrBits.W))
|
||||||
|
val running = RegInit(false.B)
|
||||||
|
val complete = RegInit(false.B)
|
||||||
|
|
||||||
|
outer.regnode.regmap(
|
||||||
|
0x00 -> Seq(RegField(addrBits, addr)),
|
||||||
|
0x08 -> Seq(RegField(addrBits, len)),
|
||||||
|
0x10 -> Seq(RegField(1, running)),
|
||||||
|
0x18 -> Seq(RegField(1, complete)))
|
||||||
|
}
|
||||||
|
|
||||||
|
The arguments to ``regmap`` should be a series of mappings from address
|
||||||
|
offsets to sequences of ``RegField`` objects. The ``RegField`` constructor
|
||||||
|
takes two arguments, the width of the register field and the RTL register
|
||||||
|
itself.
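
From software's point of view, the register map is a table from byte offsets to
fields of a given width. The following sketch models that view in plain Scala.
``RegFieldModel`` and ``RegMapModel`` are hypothetical names and do not
reproduce the ``regmap`` API, which generates hardware rather than a lookup
table:

.. code-block:: scala

    import scala.collection.mutable

    // Hypothetical software view of a register field: a name and a bit width.
    case class RegFieldModel(name: String, width: Int)

    // Maps byte offsets to fields and keeps one stored value per offset.
    class RegMapModel(fields: Map[Int, RegFieldModel]) {
      private val values = mutable.Map[Int, BigInt]().withDefaultValue(BigInt(0))

      // Writes are truncated to the field width, as a narrow register would be.
      def write(offset: Int, data: BigInt): Unit = {
        val f = fields(offset)
        values(offset) = data & ((BigInt(1) << f.width) - 1)
      }

      def read(offset: Int): BigInt = {
        require(fields.contains(offset), s"no register at offset $offset")
        values(offset)
      }
    }

    object RegMapDemo {
      def main(args: Array[String]): Unit = {
        val regs = new RegMapModel(Map(
          0x00 -> RegFieldModel("addr", 64),
          0x08 -> RegFieldModel("len", 64),
          0x10 -> RegFieldModel("running", 1),
          0x18 -> RegFieldModel("complete", 1)))
        regs.write(0x08, BigInt(32))
        regs.write(0x10, BigInt(3))   // only bit 0 is kept in a 1-bit field
        println(regs.read(0x08))      // 32
        println(regs.read(0x10))      // 1
      }
    }

The offsets are spaced eight bytes apart so that each register sits in its own
64-bit word, which matches the ``reg_write64`` accesses used by the test
program.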
|
|
@ -0,0 +1,70 @@
|
||||||
|
Running Test Software
|
||||||
|
=====================
|
||||||
|
|
||||||
|
To test our input stream device, we want to write an application that uses
|
||||||
|
the device to write data into memory, then reads the data and prints it out.
|
||||||
|
|
||||||
|
In project-template, test software is placed in the ``tests/`` directory,
|
||||||
|
which includes a Makefile and library code for developing a baremetal program.
|
||||||
|
We'll create a new file at ``tests/input-stream.c`` with the following code:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
#include "mmio.h"
|
||||||
|
|
||||||
|
#define N 4
|
||||||
|
#define INPUTSTREAM_BASE 0x10017000L
|
||||||
|
#define INPUTSTREAM_ADDR (INPUTSTREAM_BASE + 0x00)
|
||||||
|
#define INPUTSTREAM_LEN (INPUTSTREAM_BASE + 0x08)
|
||||||
|
#define INPUTSTREAM_RUNNING (INPUTSTREAM_BASE + 0x10)
|
||||||
|
#define INPUTSTREAM_COMPLETE (INPUTSTREAM_BASE + 0x18)
|
||||||
|
|
||||||
|
uint64_t values[N];
|
||||||
|
|
||||||
|
int main(void)
|
||||||
|
{
|
||||||
|
reg_write64(INPUTSTREAM_ADDR, (uint64_t) values);
|
||||||
|
reg_write64(INPUTSTREAM_LEN, N * sizeof(uint64_t));
|
||||||
|
asm volatile ("fence");
|
||||||
|
reg_write64(INPUTSTREAM_RUNNING, 1);
|
||||||
|
|
||||||
|
while (reg_read64(INPUTSTREAM_COMPLETE) == 0) {}
|
||||||
|
reg_write64(INPUTSTREAM_COMPLETE, 0);
|
||||||
|
|
||||||
|
for (int i = 0; i < N; i++)
|
||||||
|
printf("%016lx\n", values[i]);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
This program statically allocates an array for the data to be written to.
|
||||||
|
It then sets the ``addr`` and ``len`` registers, executes a ``fence``
|
||||||
|
instruction to make sure they are committed, and then sets the ``running``
|
||||||
|
register. It then continuously polls the ``complete`` register until it sees
|
||||||
|
a non-zero value, at which point it knows the data has been written to memory
|
||||||
|
and is safe to read back.
|
||||||
|
|
||||||
|
To compile this program, add "input-stream" to the ``PROGRAMS`` list in
|
||||||
|
``tests/Makefile`` and run ``make`` from the tests directory.
|
||||||
|
|
||||||
|
To run the program, return to the ``vsim/`` directory and run the simulator
|
||||||
|
executable, passing the newly compiled ``input-stream.riscv`` executable
|
||||||
|
as an argument.
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
|
$ cd vsim
|
||||||
|
$ ./simv-example-FixedInputStreamConfig ../tests/input-stream.riscv
|
||||||
|
|
||||||
|
The program should print out
|
||||||
|
|
||||||
|
.. code-block:: text
|
||||||
|
|
||||||
|
000000001002abcd
|
||||||
|
0000000034510204
|
||||||
|
0000000010329999
|
||||||
|
0000000092101222
|
|
@ -15,6 +15,7 @@ Welcome to FireSim's documentation!
single-node-sim
cluster-sim
advanced-usage
Developing-New-Devices

Indices and tables
==================