start tutorial on adding new devices
parent
bba9dea481
commit
f415332ba6
|
@ -0,0 +1,97 @@
|
|||
Connecting Devices to Bus
|
||||
=========================
|
||||
|
||||
Now that we have finished designing our peripheral device, we need to
|
||||
hook it up into the SoC. To do this, we first need to create two traits:
|
||||
one for the lazy module and one for the module implementation. The lazy
|
||||
module trait is the following.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
trait HasPeripheryInputStream { this: BaseSubsystem =>
|
||||
private val portName = "input-stream"
|
||||
val streamWidth = pbus.beatBytes * 8
|
||||
val inputstream = LazyModule(new InputStream(0x10017000, pbus.beatBytes))
|
||||
pbus.toVariableWidthSlave(Some(portName)) { inputstream.regnode }
|
||||
sbus.fromPort(Some(portName))() := inputstream.dmanode
|
||||
ibus.fromSync := inputstream.intnode
|
||||
}
|
||||
|
||||
We add the line ``this: BaseSubsystem =>`` to indicate that this trait will
|
||||
eventually be mixed into a class that extends ``BaseSubsystem``, which contains
|
||||
the definition of the system bus ``sbus``, peripheral bus ``pbus``, and
|
||||
interrupt bus ``ibus``. We instantiate the ``InputStream`` lazy module and
|
||||
give it the base address ``0x10017000``. We connect the ``pbus`` to the
|
||||
register node, the DMA node to the ``sbus``, and the interrupt node to the ``ibus``.
|
||||
|
||||
The module implementation trait is as follows:
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
trait HasPeripheryInputStreamModuleImp extends LazyModuleImp {
|
||||
val outer: HasPeripheryInputStream
|
||||
|
||||
val stream_in = IO(Flipped(Decoupled(UInt(outer.streamWidth.W))))
|
||||
outer.inputstream.module.io.in <> stream_in
|
||||
|
||||
def connectFixedInput(data: Seq[BigInt]): Unit = {
|
||||
val fixed = Module(new FixedInputStream(data, outer.streamWidth))
|
||||
stream_in <> fixed.io.out
|
||||
}
|
||||
}
|
||||
|
||||
Since the interrupts and memory ports have already been connected in the
|
||||
lazy module trait, the module implementation trait only needs to create the
|
||||
external decoupled interface and connect that to the ``InputStream`` module
|
||||
implementation.
|
||||
|
||||
The ``connectFixedInput`` method will be used by the test harness to connect
|
||||
an input stream model that just sends a pre-specified stream of data.
|
||||
|
||||
We can now mix these traits into the SoC design. Open up
|
||||
``src/main/scala/example/Top.scala`` and add the following:
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class ExampleTopWithInputStream(implicit p: Parameters) extends ExampleTop
|
||||
with HasPeripheryInputStream {
|
||||
override lazy val module = new ExampleTopWithInputStreamModule(this)
|
||||
}
|
||||
|
||||
class ExampleTopWithInputStreamModule(outer: ExampleTopWithInputStream)
|
||||
extends ExampleTopModuleImp(outer)
|
||||
with HasPeripheryInputStreamModuleImp
|
||||
|
||||
|
||||
We can then build a simulation using our new SoC by adding a configuration
|
||||
to ``src/main/scala/example/Configs.scala``. This configuration will cause
|
||||
the test harness to instantiate an SoC with the ``InputStream`` device
|
||||
and then connect a fixed input stream model to it.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class WithFixedInputStream extends Config((site, here, up) => {
|
||||
case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
|
||||
val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
|
||||
top.connectFixedInput(Seq(
|
||||
BigInt("1002abcd", 16),
|
||||
BigInt("34510204", 16),
|
||||
BigInt("10329999", 16),
|
||||
BigInt("92101222", 16)))
|
||||
top
|
||||
}
|
||||
})
|
||||
|
||||
class FixedInputStreamConfig extends Config(
|
||||
new WithFixedInputStream ++ new BaseExampleConfig)
|
||||
|
||||
We can now compile the simulation using VCS.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
cd vsim
|
||||
make CONFIG=FixedInputStreamConfig
|
||||
|
||||
This will produce a ``simv-example-FixedInputStreamConfig`` executable that
|
||||
can be used to run tests. We will discuss how to write and run those tests in
|
||||
the next section.
|
|
@ -0,0 +1,247 @@
|
|||
Creating Simulation Model
|
||||
=========================
|
||||
|
||||
So far, we've been using a fixed input stream model to test our device.
|
||||
But, ideally, we'd like an input stream that is defined by a software model
|
||||
and configurable at runtime. We'd like to put the input data in a file and
|
||||
pass it in as a command-line argument. We can't do that in Chisel.
|
||||
We'll have to create the model in Verilog and call out to C++ using the
|
||||
Verilog DPI-C API.
|
||||
|
||||
First, how do we include Verilog code in a Chisel codebase? We can do this
|
||||
using the Chisel BlackBox class. This class allows us to define IO ports and
|
||||
can be used like a regular Chisel module, but the internal implementation is
|
||||
left to Verilog.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class SimInputStream(w: Int) extends BlackBox(Map("DATA_BITS" -> IntParam(w))) {
|
||||
val io = IO(new Bundle {
|
||||
val clock = Input(Clock())
|
||||
val reset = Input(Bool())
|
||||
val out = Decoupled(UInt(w.W))
|
||||
})
|
||||
}
|
||||
|
||||
One key difference in the IO bundle definition is that the implicit ``clock``
|
||||
and ``reset`` signals must be explicitly defined in a BlackBox. The BlackBox
|
||||
class also takes a map that defines parameters that will be passed to the
|
||||
Verilog implementation. To connect the BlackBox in the test harness, we should
|
||||
create a ``connectSimInput`` method in the ``HasPeripheryInputStreamModuleImp``
|
||||
trait.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
def connectSimInput(clock: Clock, reset: Bool): Unit = {
|
||||
val sim = Module(new SimInputStream(outer.streamWidth))
|
||||
sim.io.clock := clock
|
||||
sim.io.reset := reset
|
||||
stream_in <> sim.io.out
|
||||
}
|
||||
|
||||
We then add a new configuration class in
|
||||
``src/main/scala/example/Configs.scala`` that calls the ``connectSimInput``
|
||||
method.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
|
||||
class WithSimInputStream extends Config((site, here, up) => {
|
||||
case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
|
||||
val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
|
||||
top.connectSimInput(clock, reset)
|
||||
top
|
||||
}
|
||||
})
|
||||
|
||||
class SimInputStreamConfig extends Config(
|
||||
new WithSimInputStream ++ new BaseExampleConfig)
|
||||
|
||||
Now we need to create the Verilog implementation of the ``SimInputStream``
|
||||
module. Make a new directory ``src/main/resources`` and add ``vsrc`` and ``csrc``
|
||||
subdirectories under it.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ mkdir -p src/main/resources/{vsrc,csrc}
|
||||
|
||||
In the ``vsrc`` directory, create a file called ``SimInputStream.v`` and add
|
||||
the following code.
|
||||
|
||||
.. code-block:: verilog
|
||||
|
||||
import "DPI-C" function void input_stream_init
|
||||
(
|
||||
input string filename,
|
||||
input int data_bits
|
||||
);
|
||||
|
||||
import "DPI-C" function void input_stream_tick
|
||||
(
|
||||
output bit out_valid,
|
||||
input bit out_ready,
|
||||
output longint out_bits
|
||||
);
|
||||
|
||||
module SimInputStream #(DATA_BITS=64) (
|
||||
input clock,
|
||||
input reset,
|
||||
output out_valid,
|
||||
input out_ready,
|
||||
output [DATA_BITS-1:0] out_bits
|
||||
);
|
||||
|
||||
bit __out_valid;
|
||||
longint __out_bits;
|
||||
string filename;
|
||||
int data_bits;
|
||||
|
||||
reg __out_valid_reg;
|
||||
reg [DATA_BITS-1:0] __out_bits_reg;
|
||||
|
||||
initial begin
|
||||
data_bits = DATA_BITS;
|
||||
if ($value$plusargs("instream=%s", filename)) begin
|
||||
input_stream_init(filename, data_bits);
|
||||
end
|
||||
end
|
||||
|
||||
always @(posedge clock) begin
|
||||
if (reset) begin
|
||||
__out_valid = 0;
|
||||
__out_bits = 0;
|
||||
|
||||
__out_valid_reg <= 0;
|
||||
__out_bits_reg <= 0;
|
||||
end else begin
|
||||
input_stream_tick(
|
||||
__out_valid,
|
||||
out_ready,
|
||||
__out_bits);
|
||||
__out_valid_reg <= __out_valid;
|
||||
__out_bits_reg <= __out_bits;
|
||||
end
|
||||
end
|
||||
|
||||
assign out_valid = __out_valid_reg;
|
||||
assign out_bits = __out_bits_reg;
|
||||
|
||||
endmodule
|
||||
|
||||
The Verilog module defines its inputs and outputs to match the definition in the
|
||||
Chisel BlackBox. But most of the implementation is left to C++ through the
|
||||
DPI functions ``input_stream_init`` and ``input_stream_tick``. We define
|
||||
these functions in a ``SimInputStream.cc`` file in the ``csrc`` directory.
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdint.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
class InputStream {
|
||||
public:
|
||||
InputStream(const char *filename, int nbytes);
|
||||
~InputStream(void);
|
||||
|
||||
bool out_valid() { return !complete; }
|
||||
uint64_t out_bits() { return data; }
|
||||
void tick(bool out_ready);
|
||||
|
||||
private:
|
||||
void read_next(void);
|
||||
bool complete;
|
||||
FILE *file;
|
||||
int nbytes;
|
||||
uint64_t data;
|
||||
};
|
||||
|
||||
InputStream::InputStream(const char *filename, int nbytes)
|
||||
{
|
||||
this->nbytes = nbytes;
|
||||
this->file = fopen(filename, "rb"); // binary mode: the stream file holds raw bytes
|
||||
if (this->file == NULL) {
|
||||
fprintf(stderr, "Could not open %s\n", filename);
|
||||
abort();
|
||||
}
|
||||
|
||||
read_next();
|
||||
}
|
||||
|
||||
InputStream::~InputStream(void)
|
||||
{
|
||||
fclose(this->file);
|
||||
}
|
||||
|
||||
void InputStream::read_next(void)
|
||||
{
|
||||
int res;
|
||||
|
||||
this->data = 0;
|
||||
|
||||
res = fread(&this->data, this->nbytes, 1, this->file);
|
||||
if (res == 0 && ferror(this->file)) { // fread never returns a negative value
|
||||
perror("fread");
|
||||
abort();
|
||||
}
|
||||
|
||||
this->complete = (res == 0);
|
||||
}
|
||||
|
||||
void InputStream::tick(bool out_ready)
|
||||
{
|
||||
|
||||
|
||||
if (out_valid() && out_ready)
|
||||
read_next();
|
||||
}
|
||||
|
||||
InputStream *stream = NULL;
|
||||
|
||||
extern "C" void input_stream_init(const char *filename, int data_bits)
|
||||
{
|
||||
stream = new InputStream(filename, data_bits/8);
|
||||
}
|
||||
|
||||
extern "C" void input_stream_tick(
|
||||
unsigned char *out_valid,
|
||||
unsigned char out_ready,
|
||||
long long *out_bits)
|
||||
{
|
||||
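// No stream was opened (no +instream argument), so never assert valid.
if (stream == NULL) { *out_valid = 0; *out_bits = 0; return; }
stream->tick(out_ready);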
|
||||
*out_valid = stream->out_valid();
|
||||
*out_bits = stream->out_bits();
|
||||
}
|
||||
|
||||
In the C++ file, we implement an ``InputStream`` class that takes a file name
|
||||
as its argument. It opens the file and reads ``nbytes`` bytes from it for every
|
||||
ready-valid handshake. The ``input_stream_init`` function constructs an
|
||||
``InputStream`` object and assigns it to a global pointer. The
|
||||
``input_stream_tick`` function updates the state by calling the ``tick``
|
||||
method, passing in the inputs from Verilog. It then assigns values to the
|
||||
Verilog outputs.
|
||||
|
||||
You can now build this new configuration in VCS.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cd vsim
|
||||
$ make CONFIG=SimInputStreamConfig
|
||||
|
||||
Now create a file that can be used as the input stream data. Just getting
|
||||
random bytes from ``/dev/urandom`` would work. Pass this to your simulation
|
||||
through the ``+instream=`` flag, and you should see the data get printed
|
||||
out in the ``input-stream.riscv`` test.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ dd if=/dev/urandom of=instream.img bs=32 count=1
|
||||
$ hexdump instream.img
|
||||
0000000 189b f12a 1cc1 9eb5 b65d bbef 96b6 4949
|
||||
0000010 f8c8 636c 76fe 15f3 0665 0ef9 8c5d 3011
|
||||
0000020
|
||||
$ ./simv-example-SimInputStreamConfig +instream=instream.img ../tests/input-stream.riscv
|
||||
9eb51cc1f12a189b
|
||||
494996b6bbefb65d
|
||||
15f376fe636cf8c8
|
||||
30118c5d0ef90665
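The printed words follow from byte order. ``hexdump`` shows the file as 16-bit little-endian words, and the model reads eight bytes at a time directly into a ``uint64_t``. This gives the values above only if the simulation host is little-endian, which is an assumption here. The hypothetical helper below (not part of ``SimInputStream.cc``) is a minimal sketch that makes the packing explicit, so it does not depend on host byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Pack the first nbytes (at most 8) of buf into a 64-bit word, treating
// buf[0] as the least-significant byte (little-endian order).
uint64_t pack_little_endian(const unsigned char *buf, size_t nbytes)
{
    uint64_t word = 0;
    for (size_t i = 0; i < nbytes && i < 8; i++)
        word |= static_cast<uint64_t>(buf[i]) << (8 * i);
    return word;
}
```

Feeding it the first eight bytes of the dump above (``9b 18 2a f1 c1 1c b5 9e``) gives ``0x9eb51cc1f12a189b``, the first line printed by the test.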
|
|
@ -0,0 +1,151 @@
|
|||
DMA and Interrupts
|
||||
==================
|
||||
|
||||
In order to move data from the external input stream to memory, we need to
|
||||
perform direct memory access (DMA). We can achieve this by giving the device
|
||||
a ``TLClientNode``. Once we add it, the ``LazyModule`` will look like this:
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class InputStream(
|
||||
address: BigInt,
|
||||
val beatBytes: Int = 8,
|
||||
val maxInflight: Int = 4)
|
||||
(implicit p: Parameters) extends LazyModule {
|
||||
|
||||
val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
|
||||
val regnode = TLRegisterNode(
|
||||
address = Seq(AddressSet(address, 0x3f)),
|
||||
device = device,
|
||||
beatBytes = beatBytes)
|
||||
val dmanode = TLClientNode(Seq(TLClientPortParameters(
|
||||
Seq(TLClientParameters(
|
||||
name = "input-stream",
|
||||
sourceId = IdRange(0, maxInflight))))))
|
||||
|
||||
lazy val module = new InputStreamModuleImp(this)
|
||||
}
|
||||
|
||||
For our ``TLClientNode``, we only need a single port, so we specify a single
|
||||
set of ``TLClientPortParameters`` and ``TLClientParameters``. We override two
|
||||
arguments in the ``TLClientParameters`` constructor. The ``name`` is the
|
||||
name of the port and ``sourceId`` indicates the range of transaction IDs
|
||||
that can be used in memory requests. The lower bound is inclusive, and the
|
||||
upper bound is exclusive, so this device can use source IDs from 0 to
|
||||
``maxInflight - 1``.
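To make the half-open convention concrete, here is a small software model of such a range. ``IdRangeModel`` is a hypothetical name used only for this sketch. It is not the rocket-chip ``IdRange`` class, only an illustration of the inclusive-start, exclusive-end rule described above.

```cpp
#include <cstdint>

// Hypothetical stand-in for an ID range: start is inclusive, end is exclusive.
struct IdRangeModel {
    uint32_t start;
    uint32_t end;

    // True when id lies in [start, end).
    bool contains(uint32_t id) const { return id >= start && id < end; }

    // Number of distinct IDs in the range.
    uint32_t size() const { return end - start; }
};
```

With ``start = 0`` and ``end = 4`` (the default ``maxInflight``), the model accepts IDs 0 through 3 and rejects 4.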
|
||||
|
||||
In the module implementation, we can now implement a state machine that
|
||||
sends write requests to memory. We first call ``outer.dmanode.out`` to get
|
||||
a sequence of output port tuples. Since we only have one port, we can just
|
||||
pull out the first element of this sequence. For each port, we get a pair of
|
||||
objects. The first is the physical TileLink port, which we can connect to RTL.
|
||||
The second is a ``TLEdge`` object, which we can use to get extra metadata about
|
||||
the TileLink port (like the number of address and data bits).
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
|
||||
val (tl, edge) = outer.dmanode.out(0)
|
||||
val addrBits = edge.bundle.addressBits
|
||||
val w = edge.bundle.dataBits
|
||||
val beatBytes = (w / 8)
|
||||
|
||||
val io = IO(new Bundle {
|
||||
val in = Flipped(Decoupled(UInt(w.W)))
|
||||
})
|
||||
|
||||
val addr = Reg(UInt(addrBits.W))
|
||||
val len = Reg(UInt(addrBits.W))
|
||||
val running = RegInit(false.B)
|
||||
val complete = RegInit(false.B)
|
||||
|
||||
val s_idle :: s_issue :: s_wait :: Nil = Enum(3)
|
||||
val state = RegInit(s_idle)
|
||||
|
||||
val nXacts = outer.maxInflight
|
||||
val xactBusy = RegInit(0.U(nXacts.W))
|
||||
val xactOnehot = PriorityEncoderOH(~xactBusy)
|
||||
val canIssue = (state === s_issue) && !xactBusy.andR
|
||||
|
||||
io.in.ready := canIssue && tl.a.ready
|
||||
tl.a.valid := canIssue && io.in.valid
|
||||
tl.a.bits := edge.Put(
|
||||
fromSource = OHToUInt(xactOnehot),
|
||||
toAddress = addr,
|
||||
lgSize = log2Ceil(beatBytes).U,
|
||||
data = io.in.bits)._2
|
||||
tl.d.ready := running && xactBusy.orR
|
||||
|
||||
xactBusy := (xactBusy |
|
||||
Mux(tl.a.fire(), xactOnehot, 0.U(nXacts.W))) &
|
||||
~Mux(tl.d.fire(), UIntToOH(tl.d.bits.source), 0.U)
|
||||
|
||||
when (state === s_idle && running) {
|
||||
assert(addr(log2Ceil(beatBytes)-1,0) === 0.U,
|
||||
s"InputStream base address not aligned to ${beatBytes} bytes")
|
||||
assert(len(log2Ceil(beatBytes)-1,0) === 0.U,
|
||||
s"InputStream length not aligned to ${beatBytes} bytes")
|
||||
state := s_issue
|
||||
}
|
||||
|
||||
when (io.in.fire()) {
|
||||
addr := addr + beatBytes.U
|
||||
len := len - beatBytes.U
|
||||
when (len === beatBytes.U) { state := s_wait }
|
||||
}
|
||||
|
||||
when (state === s_wait && !xactBusy.orR) {
|
||||
running := false.B
|
||||
complete := true.B
|
||||
state := s_idle
|
||||
}
|
||||
|
||||
outer.regnode.regmap(
|
||||
0x00 -> Seq(RegField(addrBits, addr)),
|
||||
0x08 -> Seq(RegField(addrBits, len)),
|
||||
0x10 -> Seq(RegField(1, running)),
|
||||
0x18 -> Seq(RegField(1, complete)))
|
||||
}
|
||||
|
||||
The state machine starts in the ``s_idle`` state. In this state, the CPU should
|
||||
set the ``addr`` and ``len`` registers and then set the ``running`` register to
|
||||
1. The state machine then moves into the ``s_issue`` state, in which it
|
||||
forwards data from the ``in`` decoupled interface to memory through the
|
||||
TileLink `A` channel.
|
||||
|
||||
We construct the `A` channel requests using the ``Put`` method in the
|
||||
``TLEdge`` object we extracted earlier. The ``Put`` method takes a unique
|
||||
source ID in ``fromSource``, the address to write to in ``toAddress``, the
|
||||
base-2 logarithm of the size in bytes in ``lgSize``, and the data to be written
|
||||
in ``data``.
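As a quick check of the ``lgSize`` arithmetic, the sketch below computes a ceiling base-2 logarithm in software. ``log2_ceil`` is a hypothetical helper written for this illustration. It is meant to mirror what ``log2Ceil`` yields for the values used here, and it is only valid for inputs from 1 to 2^63.

```cpp
#include <cstdint>

// Smallest n such that (1 << n) >= value, for 1 <= value <= 2^63.
unsigned log2_ceil(uint64_t value)
{
    unsigned n = 0;
    while (n < 63 && (uint64_t{1} << n) < value)
        n++;
    return n;
}
```

For an 8-byte beat this gives 3, so each ``Put`` in the state machine writes ``2^3 = 8`` bytes.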
|
||||
|
||||
The source field must observe some constraints. There can only be one
|
||||
transaction with each distinct source ID in flight at a given time.
|
||||
Once you send a request on the `A` channel with a specific source ID,
|
||||
you cannot send another until after you've received the response for it
|
||||
on the `D` channel.
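The ``xactBusy`` bookkeeping that enforces this rule can be modelled in software. The class below is a rough sketch, not the hardware itself. ``SourceIdTracker`` and its method names are invented for illustration, and it assumes at most 32 source IDs. It picks the lowest free ID, as ``PriorityEncoderOH(~xactBusy)`` does, and frees an ID when its response returns.

```cpp
#include <cstdint>

// Software model of xactBusy: bit i set means source ID i is in flight.
class SourceIdTracker {
public:
    explicit SourceIdTracker(unsigned n_ids) : n_ids(n_ids), busy(0) {}

    // True when every ID is in flight (corresponds to xactBusy.andR).
    bool full() const { return busy == mask(); }

    // True when any ID is in flight (corresponds to xactBusy.orR).
    bool any_busy() const { return busy != 0; }

    // Claim the lowest free ID for an A-channel request; -1 if none is free.
    int allocate()
    {
        for (unsigned id = 0; id < n_ids; id++) {
            if (!(busy & (1u << id))) {
                busy |= (1u << id);
                return static_cast<int>(id);
            }
        }
        return -1;
    }

    // Free an ID once its D-channel response has arrived.
    void release(unsigned id) { busy &= ~(1u << id); }

private:
    // Bit mask with one bit per usable ID (n_ids is assumed to be <= 32).
    uint32_t mask() const { return (n_ids >= 32) ? 0xffffffffu : ((1u << n_ids) - 1); }

    unsigned n_ids;
    uint32_t busy;
};
```

With four IDs, a fifth ``allocate()`` call fails until some earlier ID is released. This is the same back-pressure that ``canIssue`` applies in the hardware.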
|
||||
|
||||
Once all requests have been sent on the `A` channel, the state machine
|
||||
transitions to the ``s_wait`` state to wait for the remaining responses on
|
||||
the `D` channel. Once the responses have all returned, the state machine
|
||||
sets ``running`` to false and ``complete`` to true. The CPU can poll the
|
||||
``complete`` register to check if the operation has finished.
|
||||
|
||||
However, for long-running operations, we would usually like to have the device
|
||||
notify the CPU through an interrupt. To add an interrupt to the device,
|
||||
we need to create an ``IntSourceNode`` in the lazy module.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
val intnode = IntSourceNode(IntSourcePortSimple(resources = device.int))
|
||||
|
||||
Then, in the module implementation, we can connect the ``complete`` register
|
||||
to the interrupt line. That way, the CPU will get interrupted once the
|
||||
state machine completes. It can clear the interrupt by writing a 0 to the
|
||||
``complete`` register.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
val (interrupt, _) = outer.intnode.out(0)
|
||||
|
||||
interrupt(0) := complete
|
|
@ -0,0 +1,13 @@
|
|||
Developing New Devices
|
||||
======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Developing New Devices:
|
||||
|
||||
Getting-Started
|
||||
MMIO-mapped-Registers
|
||||
DMA-and-Interrupts
|
||||
Connecting-Devices-to-Bus
|
||||
Running-Test-Software
|
||||
Creating-Simulation-Model
|
|
@ -0,0 +1,42 @@
|
|||
Getting Started
|
||||
===============
|
||||
|
||||
In this tutorial, we will show you how to design a new memory-mapped IO
|
||||
device, test it in simulation, and then build and run it on FireSim.
|
||||
|
||||
To start with, you will need to clone a copy of FireChip, the repository
|
||||
that aggregates all the target RTL for FireSim. FireSim already contains
|
||||
FireChip as a submodule under ``target-design/firechip``, but it makes patches
|
||||
to the codebase so that it will work with the FPGA tools. Therefore, you will
|
||||
need to clone a clean copy if you want to use FireChip standalone.
|
||||
|
||||
Go to https://github.com/firesim/firechip and click the "Fork" button to
|
||||
fork the repository to your own account. Now clone the new repo to your
|
||||
local machine and initialize the submodules.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ git clone git@github.com:yourusername/firechip.git
|
||||
$ cd firechip
|
||||
$ git submodule update --init
|
||||
$ cd rocket-chip
|
||||
$ git submodule update --init
|
||||
$ cd ..
|
||||
|
||||
You will not need to install the riscv-tools again because you'll just be
|
||||
reusing the one in firesim. So make sure to go into firesim and source
|
||||
``sourceme-f1-full.sh`` before you run the rest of the commands in this
|
||||
tutorial.
|
||||
|
||||
Now that everything is checked out, you can build the VCS or Verilator
|
||||
simulator and run the regression tests to make sure everything is working.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cd vsim # or "cd verisim" for verilator
|
||||
$ make # builds the DefaultExampleConfig
|
||||
$ make run-regression-tests
|
||||
|
||||
If everything is set up correctly, you should see a bunch of ``*.out`` files
|
||||
in the ``output/`` directory. If you open these up, they should all say
|
||||
"Completed after XXXXX cycles" at the end and not have any error messages.
|
|
@ -0,0 +1,76 @@
|
|||
MMIO-mapped Registers
|
||||
=====================
|
||||
|
||||
In this tutorial, we will create a new device which pulls in data from an
|
||||
externally-connected input stream and writes it to memory. We'll create our
|
||||
device in the file ``src/main/scala/example/InputStream.scala``. The first
|
||||
thing we need to do is set up some memory-mapped control registers that the
|
||||
CPU can use to communicate with the device. The easiest way to do this is by
|
||||
creating a ``TLRegisterNode``, which provides a ``regmap`` method that can be
|
||||
used to generate the hardware for reading and writing to RTL registers.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class InputStream(
|
||||
address: BigInt,
|
||||
val beatBytes: Int = 8)
|
||||
(implicit p: Parameters) extends LazyModule {
|
||||
|
||||
val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
|
||||
val regnode = TLRegisterNode(
|
||||
address = Seq(AddressSet(address, 0x3f)),
|
||||
device = device,
|
||||
beatBytes = beatBytes)
|
||||
|
||||
lazy val module = new InputStreamModuleImp(this)
|
||||
}
|
||||
|
||||
We want to specify or override three arguments in the ``TLRegisterNode``
|
||||
constructor. The first is the address of the device in the memory map.
|
||||
The address is specified as an ``AddressSet`` containing two values, a base
|
||||
address and a mask. The system bus will route all addresses that match the
|
||||
base address on the bits not set in the mask. In this case, we set the
|
||||
mask to ``0x3f``, which sets the lower six bits. This means that a 64-byte
|
||||
region starting from the base address will be routed to this device.
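The matching rule can be checked with a few lines of ordinary software. The struct below is a hypothetical model written for this tutorial, not the rocket-chip ``AddressSet`` class. It assumes an address matches when it agrees with the base on every bit that is not set in the mask.

```cpp
#include <cstdint>

// Hypothetical model of an address set: an address matches when it agrees
// with base on every bit that is NOT set in mask.
struct AddressSetModel {
    uint64_t base;
    uint64_t mask;

    bool contains(uint64_t addr) const { return ((addr ^ base) & ~mask) == 0; }
};
```

For base ``0x10017000`` and mask ``0x3f``, the model accepts ``0x10017000`` through ``0x1001703f`` and rejects ``0x10017040``, which is the 64-byte window described above.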
|
||||
|
||||
The second argument to ``TLRegisterNode`` is a ``SimpleDevice`` object, which
|
||||
provides the name and compatibility of the device tree entry that will be
|
||||
created for the peripheral. We won't show how this is used in this tutorial,
|
||||
but it will be important if you want to create a Linux kernel driver for
|
||||
the device.
|
||||
|
||||
The third argument to ``TLRegisterNode`` is ``beatBytes``, which specifies
|
||||
the width of the TileLink interface. We will just pass this through from a
|
||||
class argument.
|
||||
|
||||
We want the device to be able to write a specified number of bytes to a
|
||||
specified location in memory, so we'll provide ``addr`` and ``len`` registers.
|
||||
We will also want a ``running`` register so the CPU can signal the device
|
||||
to start operation and a ``complete`` register for the device to signal to
|
||||
the CPU that it has completed.
|
||||
|
||||
.. code-block:: scala
|
||||
|
||||
class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
|
||||
val addrBits = 64
|
||||
val w = 64
|
||||
val io = IO(new Bundle {
|
||||
// Not used yet
|
||||
val in = Flipped(Decoupled(UInt(w.W)))
|
||||
})
|
||||
val addr = Reg(UInt(addrBits.W))
|
||||
val len = Reg(UInt(addrBits.W))
|
||||
val running = RegInit(false.B)
|
||||
val complete = RegInit(false.B)
|
||||
|
||||
outer.regnode.regmap(
|
||||
0x00 -> Seq(RegField(addrBits, addr)),
|
||||
0x08 -> Seq(RegField(addrBits, len)),
|
||||
0x10 -> Seq(RegField(1, running)),
|
||||
0x18 -> Seq(RegField(1, complete)))
|
||||
}
|
||||
|
||||
The arguments to ``regmap`` should be a series of mappings from address
|
||||
offsets to sequences of ``RegField`` objects. The ``RegField`` constructor
|
||||
takes two arguments, the width of the register field and the RTL register
|
||||
itself.
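From the software side, the offsets passed to ``regmap`` form a simple register layout. The struct below is a hypothetical C++ view of that layout, with a name invented for this sketch. It assumes each register occupies its own 8-byte slot, and the compile-time checks confirm that the field offsets match the ones in the ``regmap`` call.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ view of the register block; one 8-byte slot per register.
struct InputStreamRegs {
    uint64_t addr;      // offset 0x00
    uint64_t len;       // offset 0x08
    uint64_t running;   // offset 0x10, only bit 0 is backed by hardware
    uint64_t complete;  // offset 0x18, only bit 0 is backed by hardware
};

static_assert(offsetof(InputStreamRegs, addr) == 0x00, "addr offset");
static_assert(offsetof(InputStreamRegs, len) == 0x08, "len offset");
static_assert(offsetof(InputStreamRegs, running) == 0x10, "running offset");
static_assert(offsetof(InputStreamRegs, complete) == 0x18, "complete offset");
static_assert(sizeof(InputStreamRegs) <= 0x40, "must fit in the 64-byte region");
```

The four registers use 32 of the 64 bytes covered by the ``0x3f`` mask, so there is room for more registers later.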
|
|
@ -0,0 +1,70 @@
|
|||
Running Test Software
|
||||
=====================
|
||||
|
||||
To test our input stream device, we want to write an application that uses
|
||||
the device to write data into memory, then reads the data and prints it out.
|
||||
|
||||
In project-template, test software is placed in the ``tests/`` directory,
|
||||
which includes a Makefile and library code for developing a baremetal program.
|
||||
We'll create a new file at ``tests/input-stream.c`` with the following code:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <stdint.h>
|
||||
|
||||
#include "mmio.h"
|
||||
|
||||
#define N 4
|
||||
#define INPUTSTREAM_BASE 0x10017000L
|
||||
#define INPUTSTREAM_ADDR (INPUTSTREAM_BASE + 0x00)
|
||||
#define INPUTSTREAM_LEN (INPUTSTREAM_BASE + 0x08)
|
||||
#define INPUTSTREAM_RUNNING (INPUTSTREAM_BASE + 0x10)
|
||||
#define INPUTSTREAM_COMPLETE (INPUTSTREAM_BASE + 0x18)
|
||||
|
||||
uint64_t values[N];
|
||||
|
||||
int main(void)
|
||||
{
|
||||
reg_write64(INPUTSTREAM_ADDR, (uint64_t) values);
|
||||
reg_write64(INPUTSTREAM_LEN, N * sizeof(uint64_t));
|
||||
asm volatile ("fence");
|
||||
reg_write64(INPUTSTREAM_RUNNING, 1);
|
||||
|
||||
while (reg_read64(INPUTSTREAM_COMPLETE) == 0) {}
|
||||
reg_write64(INPUTSTREAM_COMPLETE, 0);
|
||||
|
||||
for (int i = 0; i < N; i++)
|
||||
printf("%016lx\n", values[i]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
This program statically allocates an array for the data to be written to.
|
||||
It then sets the ``addr`` and ``len`` registers, executes a ``fence``
|
||||
instruction to make sure they are committed, and then sets the ``running``
|
||||
register. It then continuously polls the ``complete`` register until it sees
|
||||
a non-zero value, at which point it knows the data has been written to memory
|
||||
and is safe to read back.
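The contents of ``mmio.h`` are not shown in this tutorial. As a rough sketch, helpers like ``reg_write64`` and ``reg_read64`` are commonly written as volatile pointer accesses, so that the compiler does not optimize the loads and stores away. The versions below are an assumption about their shape, not a copy of the real header.

```cpp
#include <cstdint>

// Assumed shape of the MMIO helpers: each call performs exactly one
// volatile 64-bit access at the given address.
static inline void reg_write64(uintptr_t addr, uint64_t data)
{
    *reinterpret_cast<volatile uint64_t *>(addr) = data;
}

static inline uint64_t reg_read64(uintptr_t addr)
{
    return *reinterpret_cast<volatile uint64_t *>(addr);
}
```

Without ``volatile``, the compiler could hoist the read of the ``complete`` register out of the polling loop, and the loop would never see the device update it.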
|
||||
|
||||
To compile this program, add ``input-stream`` to the ``PROGRAMS`` list in
|
||||
``tests/Makefile`` and run ``make`` from the tests directory.
|
||||
|
||||
To run the program, return to the ``vsim/`` directory and run the simulator
|
||||
executable, passing the newly compiled ``input-stream.riscv`` executable
|
||||
as an argument.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cd vsim
|
||||
$ ./simv-example-FixedInputStreamConfig ../tests/input-stream.riscv
|
||||
|
||||
The program should print out:
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
000000001002abcd
|
||||
0000000034510204
|
||||
0000000010329999
|
||||
0000000092101222
|
|
@ -15,6 +15,7 @@ Welcome to FireSim's documentation!
|
|||
single-node-sim
|
||||
cluster-sim
|
||||
advanced-usage
|
||||
Developing-New-Devices
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
|