diff --git a/docs/Connecting-Devices-to-Bus.rst b/docs/Connecting-Devices-to-Bus.rst
new file mode 100644
index 00000000..0c824484
--- /dev/null
+++ b/docs/Connecting-Devices-to-Bus.rst
@@ -0,0 +1,97 @@
+Connecting Devices to Bus
+=========================
+
+Now that we have finished designing our peripheral device, we need to
+connect it to the SoC. To do this, we first need to create two traits:
+one for the lazy module and one for the module implementation. The lazy
+module trait is the following.
+
+.. code-block:: scala
+
+    trait HasPeripheryInputStream { this: BaseSubsystem =>
+      private val portName = "input-stream"
+      val streamWidth = pbus.beatBytes * 8
+      val inputstream = LazyModule(new InputStream(0x10017000, pbus.beatBytes))
+      pbus.toVariableWidthSlave(Some(portName)) { inputstream.regnode }
+      sbus.fromPort(Some(portName))() := inputstream.dmanode
+      ibus.fromSync := inputstream.intnode
+    }
+
+The self-type annotation ``this: BaseSubsystem =>`` indicates that this trait
+will eventually be mixed into a class that extends ``BaseSubsystem``, which
+contains the definitions of the system bus ``sbus``, peripheral bus ``pbus``,
+and interrupt bus ``ibus``. We instantiate the ``InputStream`` lazy module and
+give it the base address ``0x10017000``. We connect the ``pbus`` to the
+register node, the DMA node to the ``sbus``, and the interrupt node to the
+``ibus``.
+
+The module implementation trait is as follows:
+
+.. code-block:: scala
+
+    trait HasPeripheryInputStreamModuleImp extends LazyModuleImp {
+      val outer: HasPeripheryInputStream
+
+      val stream_in = IO(Flipped(Decoupled(UInt(outer.streamWidth.W))))
+      outer.inputstream.module.io.in <> stream_in
+
+      def connectFixedInput(data: Seq[BigInt]): Unit = {
+        val fixed = Module(new FixedInputStream(data, outer.streamWidth))
+        stream_in <> fixed.io.out
+      }
+    }
+
+Since the interrupts and memory ports have already been connected in the
+lazy module trait, the module implementation trait only needs to create the
+external decoupled interface and connect it to the ``InputStream`` module
+implementation.
+
+The ``connectFixedInput`` method will be used by the test harness to connect
+an input stream model that just sends a pre-specified stream of data.
+
+We can now mix these traits into the SoC design. Open up
+``src/main/scala/example/Top.scala`` and add the following:
+
+.. code-block:: scala
+
+    class ExampleTopWithInputStream(implicit p: Parameters) extends ExampleTop
+        with HasPeripheryInputStream {
+      override lazy val module = new ExampleTopWithInputStreamModule(this)
+    }
+
+    class ExampleTopWithInputStreamModule(outer: ExampleTopWithInputStream)
+      extends ExampleTopModuleImp(outer)
+      with HasPeripheryInputStreamModuleImp
+
+We can then build a simulation using our new SoC by adding a configuration
+to ``src/main/scala/example/Configs.scala``. This configuration will cause
+the test harness to instantiate an SoC with the ``InputStream`` device
+and then connect a fixed input stream model to it.
+
+.. code-block:: scala
+
+    class WithFixedInputStream extends Config((site, here, up) => {
+      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
+        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
+        top.connectFixedInput(Seq(
+          BigInt("1002abcd", 16),
+          BigInt("34510204", 16),
+          BigInt("10329999", 16),
+          BigInt("92101222", 16)))
+        top
+      }
+    })
+
+    class FixedInputStreamConfig extends Config(
+      new WithFixedInputStream ++ new BaseExampleConfig)
+
+We can now compile the simulation using VCS.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ make CONFIG=FixedInputStreamConfig
+
+This will produce a ``simv-example-FixedInputStreamConfig`` executable that
+can be used to run tests. We will discuss how to write and run those tests in
+the next section.
diff --git a/docs/Creating-Simulation-Model.rst b/docs/Creating-Simulation-Model.rst
new file mode 100644
index 00000000..2e03e484
--- /dev/null
+++ b/docs/Creating-Simulation-Model.rst
@@ -0,0 +1,247 @@
+Creating Simulation Model
+=========================
+
+So far, we've been using a fixed input stream model to test our device.
+Ideally, however, we'd like an input stream that is defined by a software
+model and configurable at runtime. We'd like to put the input data in a file
+and pass it in as a command-line argument. We can't do that in Chisel, since
+Chisel generates synthesizable hardware, and hardware has no way to open a
+file named on the command line. Instead, we'll create the model in Verilog
+and call out to C++ using the SystemVerilog DPI-C interface.
+
+First, how do we include Verilog code in a Chisel codebase? We can do this
+using the Chisel ``BlackBox`` class. This class allows us to define IO ports
+and can be used like a regular Chisel module, but the internal implementation
+is left to Verilog.
+
+.. code-block:: scala
+
+    class SimInputStream(w: Int) extends BlackBox(Map("DATA_BITS" -> IntParam(w))) {
+      val io = IO(new Bundle {
+        val clock = Input(Clock())
+        val reset = Input(Bool())
+        val out = Decoupled(UInt(w.W))
+      })
+    }
+
+One key difference in the IO bundle definition is that the implicit ``clock``
+and ``reset`` signals must be explicitly defined in a BlackBox. The BlackBox
+class also takes a map that defines parameters that will be passed to the
+Verilog implementation. To connect the BlackBox in the test harness, we add a
+``connectSimInput`` method to the ``HasPeripheryInputStreamModuleImp``
+trait.
+
+.. code-block:: scala
+
+    def connectSimInput(clock: Clock, reset: Bool): Unit = {
+      val sim = Module(new SimInputStream(outer.streamWidth))
+      sim.io.clock := clock
+      sim.io.reset := reset
+      stream_in <> sim.io.out
+    }
+
+We then add a new configuration class in
+``src/main/scala/example/Configs.scala`` that calls the ``connectSimInput``
+method.
+
+.. code-block:: scala
+
+    class WithSimInputStream extends Config((site, here, up) => {
+      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
+        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
+        top.connectSimInput(clock, reset)
+        top
+      }
+    })
+
+    class SimInputStreamConfig extends Config(
+      new WithSimInputStream ++ new BaseExampleConfig)
+
+Now we need to create the Verilog implementation of the ``SimInputStream``
+module. Make a new directory ``src/main/resources`` and add ``vsrc`` and
+``csrc`` subdirectories under it.
+
+.. code-block:: shell
+
+    $ mkdir -p src/main/resources/{vsrc,csrc}
+
+In the ``vsrc`` directory, create a file called ``SimInputStream.v`` and add
+the following code.
+
+.. code-block:: verilog
+
+    import "DPI-C" function void input_stream_init
+    (
+        input string filename,
+        input int    data_bits
+    );
+
+    import "DPI-C" function void input_stream_tick
+    (
+        output bit     out_valid,
+        input  bit     out_ready,
+        output longint out_bits
+    );
+
+    module SimInputStream #(DATA_BITS=64) (
+        input clock,
+        input reset,
+        output out_valid,
+        input out_ready,
+        output [DATA_BITS-1:0] out_bits
+    );
+
+        bit __out_valid;
+        longint __out_bits;
+        string filename;
+        int data_bits;
+
+        reg __out_valid_reg;
+        reg [DATA_BITS-1:0] __out_bits_reg;
+
+        initial begin
+            data_bits = DATA_BITS;
+            if ($value$plusargs("instream=%s", filename)) begin
+                input_stream_init(filename, data_bits);
+            end
+        end
+
+        always @(posedge clock) begin
+            if (reset) begin
+                __out_valid = 0;
+                __out_bits = 0;
+
+                __out_valid_reg <= 0;
+                __out_bits_reg <= 0;
+            end else begin
+                input_stream_tick(
+                    __out_valid,
+                    out_ready,
+                    __out_bits);
+                __out_valid_reg <= __out_valid;
+                __out_bits_reg <= __out_bits;
+            end
+        end
+
+        assign out_valid = __out_valid_reg;
+        assign out_bits = __out_bits_reg;
+
+    endmodule
+
+The Verilog defines its inputs and outputs to match the definition in the
+Chisel BlackBox, but most of the implementation is left to C++ through the
+DPI functions ``input_stream_init`` and ``input_stream_tick``. We define
+these functions in a ``SimInputStream.cc`` file in the ``csrc`` directory.
+
+.. code-block:: c++
+
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <stdint.h>
+
+    class InputStream {
+      public:
+        InputStream(const char *filename, int nbytes);
+        ~InputStream(void);
+
+        bool out_valid() { return !complete; }
+        uint64_t out_bits() { return data; }
+        void tick(bool out_ready);
+
+      private:
+        void read_next(void);
+        bool complete;
+        FILE *file;
+        int nbytes;
+        uint64_t data;
+    };
+
+    InputStream::InputStream(const char *filename, int nbytes)
+    {
+        this->nbytes = nbytes;
+        this->file = fopen(filename, "rb");
+        if (this->file == NULL) {
+            fprintf(stderr, "Could not open %s\n", filename);
+            abort();
+        }
+
+        read_next();
+    }
+
+    InputStream::~InputStream(void)
+    {
+        fclose(this->file);
+    }
+
+    void InputStream::read_next(void)
+    {
+        size_t res;
+
+        this->data = 0;
+
+        // fread returns the number of complete items read (0 or 1 here) and
+        // never a negative value, so read errors are detected with ferror().
+        res = fread(&this->data, this->nbytes, 1, this->file);
+        if (res == 0 && ferror(this->file)) {
+            perror("fread");
+            abort();
+        }
+
+        this->complete = (res == 0);
+    }
+
+    void InputStream::tick(bool out_ready)
+    {
+        if (out_valid() && out_ready)
+            read_next();
+    }
+
+    InputStream *stream = NULL;
+
+    extern "C" void input_stream_init(const char *filename, int data_bits)
+    {
+        stream = new InputStream(filename, data_bits/8);
+    }
+
+    extern "C" void input_stream_tick(
+            unsigned char *out_valid,
+            unsigned char out_ready,
+            long long *out_bits)
+    {
+        // No +instream= file was given, so there is never any data to send.
+        if (stream == NULL) {
+            *out_valid = 0;
+            *out_bits = 0;
+            return;
+        }
+
+        stream->tick(out_ready);
+        *out_valid = stream->out_valid();
+        *out_bits = stream->out_bits();
+    }
+
+In the C++ file, we implement an ``InputStream`` class that takes a file name
+as its argument. It opens the file and reads ``nbytes`` bytes from it for
+every ready-valid handshake. The ``input_stream_init`` function constructs an
+``InputStream`` object and assigns it to a global pointer. The
+``input_stream_tick`` function updates the state by calling the ``tick``
+method, passing in the inputs from Verilog. It then assigns values to the
+Verilog outputs. If the simulation is run without ``+instream=``,
+``input_stream_init`` is never called, so ``input_stream_tick`` simply keeps
+``out_valid`` low.
+
+You can now build this new configuration in VCS.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ make CONFIG=SimInputStreamConfig
+
+Now create a file that can be used as the input stream data. Random bytes
+from ``/dev/urandom`` will do. Pass this file to your simulation through the
+``+instream=`` flag, and you should see the data printed out by the
+``input-stream.riscv`` test.
+
+.. code-block:: shell
+
+    $ dd if=/dev/urandom of=instream.img bs=32 count=1
+    $ hexdump instream.img
+    0000000 189b f12a 1cc1 9eb5 b65d bbef 96b6 4949
+    0000010 f8c8 636c 76fe 15f3 0665 0ef9 8c5d 3011
+    0000020
+    $ ./simv-example-SimInputStreamConfig +instream=instream.img ../tests/input-stream.riscv
+    9eb51cc1f12a189b
+    494996b6bbefb65d
+    15f376fe636cf8c8
+    30118c5d0ef90665
+
+The printed values look reordered relative to the ``hexdump`` output because
+``hexdump`` shows the file as 16-bit words in host byte order, while the C++
+model reads each 8-byte chunk into a ``uint64_t``. On a little-endian host,
+each printed value is therefore the corresponding four ``hexdump`` words in
+reverse order; for example, ``189b f12a 1cc1 9eb5`` becomes
+``9eb51cc1f12a189b``.
diff --git a/docs/DMA-and-Interrupts.rst b/docs/DMA-and-Interrupts.rst
new file mode 100644
index 00000000..f75e8934
--- /dev/null
+++ b/docs/DMA-and-Interrupts.rst
@@ -0,0 +1,151 @@
+DMA and Interrupts
+==================
+
+In order to move data from the external input stream to memory, we need to
+perform direct memory access (DMA). We can achieve this by giving the device
+a ``TLClientNode``. Once we add it, the ``LazyModule`` looks like this:
+
+.. code-block:: scala
+
+    class InputStream(
+        address: BigInt,
+        val beatBytes: Int = 8,
+        val maxInflight: Int = 4)
+        (implicit p: Parameters) extends LazyModule {
+
+      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
+      val regnode = TLRegisterNode(
+        address = Seq(AddressSet(address, 0x3f)),
+        device = device,
+        beatBytes = beatBytes)
+      val dmanode = TLClientNode(Seq(TLClientPortParameters(
+        Seq(TLClientParameters(
+          name = "input-stream",
+          sourceId = IdRange(0, maxInflight))))))
+
+      lazy val module = new InputStreamModuleImp(this)
+    }
+
+For our ``TLClientNode``, we only need a single port, so we specify a single
+set of ``TLClientPortParameters`` and ``TLClientParameters``. We override two
+arguments in the ``TLClientParameters`` constructor.
+The ``name`` is the name of the port, and ``sourceId`` indicates the range
+of transaction IDs that can be used in memory requests. The lower bound is
+inclusive and the upper bound is exclusive, so this device can use source
+IDs from 0 to ``maxInflight - 1``.
+
+In the module implementation, we can now build a state machine that
+sends write requests to memory. We first call ``outer.dmanode.out`` to get
+a sequence of output port tuples. Since we only have one port, we can just
+pull out the first element of this sequence. For each port, we get a pair of
+objects. The first is the physical TileLink port, which we can connect to RTL.
+The second is a ``TLEdge`` object, which we can use to get extra metadata about
+the TileLink port (like the number of address and data bits).
+
+.. code-block:: scala
+
+    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
+      val (tl, edge) = outer.dmanode.out(0)
+      val addrBits = edge.bundle.addressBits
+      val w = edge.bundle.dataBits
+      val beatBytes = (w / 8)
+
+      val io = IO(new Bundle {
+        val in = Flipped(Decoupled(UInt(w.W)))
+      })
+
+      val addr = Reg(UInt(addrBits.W))
+      val len = Reg(UInt(addrBits.W))
+      val running = RegInit(false.B)
+      val complete = RegInit(false.B)
+
+      val s_idle :: s_issue :: s_wait :: Nil = Enum(3)
+      val state = RegInit(s_idle)
+
+      val nXacts = outer.maxInflight
+      val xactBusy = RegInit(0.U(nXacts.W))
+      val xactOnehot = PriorityEncoderOH(~xactBusy)
+      val canIssue = (state === s_issue) && !xactBusy.andR
+
+      io.in.ready := canIssue && tl.a.ready
+      tl.a.valid := canIssue && io.in.valid
+      tl.a.bits := edge.Put(
+        fromSource = OHToUInt(xactOnehot),
+        toAddress = addr,
+        lgSize = log2Ceil(beatBytes).U,
+        data = io.in.bits)._2
+      tl.d.ready := running && xactBusy.orR
+
+      xactBusy := (xactBusy |
+        Mux(tl.a.fire(), xactOnehot, 0.U(nXacts.W))) &
+        ~Mux(tl.d.fire(), UIntToOH(tl.d.bits.source), 0.U)
+
+      when (state === s_idle && running) {
+        assert(addr(log2Ceil(beatBytes)-1,0) === 0.U,
+          s"InputStream base address not aligned to ${beatBytes} bytes")
+        assert(len(log2Ceil(beatBytes)-1,0) === 0.U,
+          s"InputStream length not aligned to ${beatBytes} bytes")
+        state := s_issue
+      }
+
+      when (io.in.fire()) {
+        addr := addr + beatBytes.U
+        len := len - beatBytes.U
+        when (len === beatBytes.U) { state := s_wait }
+      }
+
+      when (state === s_wait && !xactBusy.orR) {
+        running := false.B
+        complete := true.B
+        state := s_idle
+      }
+
+      outer.regnode.regmap(
+        0x00 -> Seq(RegField(addrBits, addr)),
+        0x08 -> Seq(RegField(addrBits, len)),
+        0x10 -> Seq(RegField(1, running)),
+        0x18 -> Seq(RegField(1, complete)))
+    }
+
+The state machine starts in the ``s_idle`` state. In this state, the CPU should
+set the ``addr`` and ``len`` registers and then set the ``running`` register
+to 1. The state machine then moves into the ``s_issue`` state, in which it
+forwards data from the ``in`` decoupled interface to memory through the
+TileLink ``A`` channel.
+
+We construct the ``A`` channel requests using the ``Put`` method in the
+``TLEdge`` object we extracted earlier. The ``Put`` method takes a unique
+source ID in ``fromSource``, the address to write to in ``toAddress``, the
+base-2 logarithm of the size in bytes in ``lgSize``, and the data to be written
+in ``data``. ``Put`` returns a pair of a legality flag and the ``A`` channel
+bundle, which is why the code takes its second element with ``._2``.
+
+The source field must observe some constraints. There can only be one
+transaction with each distinct source ID in flight at a given time.
+Once you send a request on the ``A`` channel with a specific source ID,
+you cannot send another until after you've received the response for it
+on the ``D`` channel. The ``xactBusy`` register tracks which IDs are in
+flight: ``PriorityEncoderOH(~xactBusy)`` picks the lowest free ID, the
+corresponding bit is set when a request fires on ``A``, and the bit for
+``tl.d.bits.source`` is cleared when a response fires on ``D``.
+
+Once all requests have been sent on the ``A`` channel, the state machine
+transitions to the ``s_wait`` state to wait for the remaining responses on
+the ``D`` channel. Once the responses have all returned, the state machine
+sets ``running`` to false and ``complete`` to true. The CPU can poll the
+``complete`` register to check if the operation has finished.
+
+However, for long-running operations, we would usually like to have the device
+notify the CPU through an interrupt. To add an interrupt to the device,
+we need to create an ``IntSourceNode`` in the lazy module.
+
+.. code-block:: scala
+
+    val intnode = IntSourceNode(IntSourcePortSimple(resources = device.int))
+
+Then, in the module implementation, we can connect the ``complete`` register
+to the interrupt line. That way, the CPU will get interrupted once the
+state machine completes. It can clear the interrupt by writing a 0 to the
+``complete`` register.
+
+.. code-block:: scala
+
+    val (interrupt, _) = outer.intnode.out(0)
+
+    interrupt(0) := complete
diff --git a/docs/Developing-New-Devices.rst b/docs/Developing-New-Devices.rst
new file mode 100644
index 00000000..af1a5a7b
--- /dev/null
+++ b/docs/Developing-New-Devices.rst
@@ -0,0 +1,13 @@
+Developing New Devices
+======================
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Developing New Devices:
+
+   Getting-Started
+   MMIO-mapped-Registers
+   DMA-and-Interrupts
+   Connecting-Devices-to-Bus
+   Running-Test-Software
+   Creating-Simulation-Model
diff --git a/docs/Getting-Started.rst b/docs/Getting-Started.rst
new file mode 100644
index 00000000..33e1c627
--- /dev/null
+++ b/docs/Getting-Started.rst
@@ -0,0 +1,42 @@
+Getting Started
+===============
+
+In this tutorial, we will show you how to design a new memory-mapped IO
+device, test it in simulation, and then build and run it on FireSim.
+
+To start with, you will need to clone a copy of FireChip, the repository
+that aggregates all the target RTL for FireSim. FireSim already contains
+FireChip as a submodule under ``target-design/firechip``, but it makes patches
+to the codebase so that it will work with the FPGA tools. Therefore, you will
+need to clone a clean copy if you want to use FireChip standalone.
+
+Go to https://github.com/firesim/firechip and click the "Fork" button to
+fork the repository to your own account.
+Now clone the new repository to your local machine and initialize the
+submodules.
+
+.. code-block:: shell
+
+    $ git clone git@github.com:yourusername/firechip.git
+    $ cd firechip
+    $ git submodule update --init
+    $ cd rocket-chip
+    $ git submodule update --init
+    $ cd ..
+
+You will not need to install the riscv-tools again, because you will reuse
+the toolchain already installed with FireSim. Make sure to go into your
+FireSim directory and source ``sourceme-f1-full.sh`` before you run the rest
+of the commands in this tutorial.
+
+Now that everything is checked out, you can build the VCS or Verilator
+simulator and run the regression tests to make sure everything is working.
+
+.. code-block:: shell
+
+    $ cd vsim # or "cd verisim" for Verilator
+    $ make # builds the DefaultExampleConfig
+    $ make run-regression-tests
+
+If everything is set up correctly, you should see a set of ``*.out`` files
+in the ``output/`` directory. If you open these up, they should all say
+"Completed after XXXXX cycles" at the end and not have any error messages.
diff --git a/docs/MMIO-mapped-Registers.rst b/docs/MMIO-mapped-Registers.rst
new file mode 100644
index 00000000..eabf1115
--- /dev/null
+++ b/docs/MMIO-mapped-Registers.rst
@@ -0,0 +1,76 @@
+MMIO-mapped Registers
+=====================
+
+In this tutorial, we will create a new device which pulls in data from an
+externally connected input stream and writes it to memory. We'll create our
+device in the file ``src/main/scala/example/InputStream.scala``. The first
+thing we need to do is set up some memory-mapped control registers that the
+CPU can use to communicate with the device. The easiest way to do this is by
+creating a ``TLRegisterNode``, which provides a ``regmap`` method that can be
+used to generate the hardware for reading and writing to RTL registers.
+
+.. code-block:: scala
+
+    class InputStream(
+        address: BigInt,
+        val beatBytes: Int = 8)
+        (implicit p: Parameters) extends LazyModule {
+
+      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
+      val regnode = TLRegisterNode(
+        address = Seq(AddressSet(address, 0x3f)),
+        device = device,
+        beatBytes = beatBytes)
+
+      lazy val module = new InputStreamModuleImp(this)
+    }
+
+We want to specify or override three arguments in the ``TLRegisterNode``
+constructor. The first is the address of the device in the memory map.
+The address is specified as an ``AddressSet`` containing two values, a base
+address and a mask. The bus will route to this device all addresses that
+match the base address on the bits not set in the mask. In this case, we set
+the mask to ``0x3f``, which sets the lower six bits. This means that a 64-byte
+region starting from the base address will be routed to this device; with
+the base address ``0x10017000`` used later in this tutorial, that is
+addresses ``0x10017000`` through ``0x1001703f``.
+
+The second argument to ``TLRegisterNode`` is a ``SimpleDevice`` object, which
+provides the name and compatible strings of the device tree entry that will
+be created for the peripheral. We won't show how this is used in this
+tutorial, but it will be important if you want to create a Linux kernel
+driver for the device.
+
+The third argument to ``TLRegisterNode`` is ``beatBytes``, which specifies
+the width of the TileLink interface in bytes. We will just pass this through
+from a class argument.
+
+We want the device to be able to write a specified number of bytes to a
+specified location in memory, so we'll provide ``addr`` and ``len`` registers.
+We will also want a ``running`` register for the CPU to signal the device
+to start operation and a ``complete`` register for the device to signal to
+the CPU that it has completed.
+
+.. code-block:: scala
+
+    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
+      val addrBits = 64
+      val w = 64
+      val io = IO(new Bundle {
+        // Not used yet
+        val in = Flipped(Decoupled(UInt(w.W)))
+      })
+      val addr = Reg(UInt(addrBits.W))
+      val len = Reg(UInt(addrBits.W))
+      val running = RegInit(false.B)
+      val complete = RegInit(false.B)
+
+      outer.regnode.regmap(
+        0x00 -> Seq(RegField(addrBits, addr)),
+        0x08 -> Seq(RegField(addrBits, len)),
+        0x10 -> Seq(RegField(1, running)),
+        0x18 -> Seq(RegField(1, complete)))
+    }
+
+The arguments to ``regmap`` should be a series of mappings from byte offsets
+to sequences of ``RegField`` objects. ``RegField``, as used here, takes two
+arguments: the width of the register field and the RTL register itself.
diff --git a/docs/Running-Test-Software.rst b/docs/Running-Test-Software.rst
new file mode 100644
index 00000000..ec202f8b
--- /dev/null
+++ b/docs/Running-Test-Software.rst
@@ -0,0 +1,70 @@
+Running Test Software
+=====================
+
+To test our input stream device, we want to write an application that uses
+the device to write data into memory, then reads the data and prints it out.
+
+In project-template, test software is placed in the ``tests/`` directory,
+which includes a Makefile and library code for developing a bare-metal
+program. We'll create a new file at ``tests/input-stream.c`` with the
+following code:
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <stdint.h>
+
+    #include "mmio.h"
+
+    #define N 4
+    #define INPUTSTREAM_BASE 0x10017000L
+    #define INPUTSTREAM_ADDR (INPUTSTREAM_BASE + 0x00)
+    #define INPUTSTREAM_LEN (INPUTSTREAM_BASE + 0x08)
+    #define INPUTSTREAM_RUNNING (INPUTSTREAM_BASE + 0x10)
+    #define INPUTSTREAM_COMPLETE (INPUTSTREAM_BASE + 0x18)
+
+    uint64_t values[N];
+
+    int main(void)
+    {
+        reg_write64(INPUTSTREAM_ADDR, (uint64_t) values);
+        reg_write64(INPUTSTREAM_LEN, N * sizeof(uint64_t));
+        asm volatile ("fence");
+        reg_write64(INPUTSTREAM_RUNNING, 1);
+
+        while (reg_read64(INPUTSTREAM_COMPLETE) == 0) {}
+        reg_write64(INPUTSTREAM_COMPLETE, 0);
+
+        for (int i = 0; i < N; i++)
+            printf("%016lx\n", values[i]);
+
+        return 0;
+    }
+
+This program statically allocates an array for the data to be written to.
+It then sets the ``addr`` and ``len`` registers, executes a ``fence``
+instruction to make sure they are committed, and then sets the ``running``
+register. It then continuously polls the ``complete`` register until it sees
+a non-zero value, at which point it knows the data has been written to memory
+and is safe to read back. Before printing the values, it writes 0 to
+``complete`` to clear the flag.
+
+To compile this program, add "input-stream" to the ``PROGRAMS`` list in
+``tests/Makefile`` and run ``make`` from the tests directory.
+
+To run the program, return to the ``vsim/`` directory and run the simulator
+executable, passing the newly compiled ``input-stream.riscv`` executable
+as an argument.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ ./simv-example-FixedInputStreamConfig ../tests/input-stream.riscv
+
+The program should print out the four values sent by ``WithFixedInputStream``,
+zero-padded to 64 bits:
+
+.. code-block:: text
+
+    000000001002abcd
+    0000000034510204
+    0000000010329999
+    0000000092101222
diff --git a/docs/index.rst b/docs/index.rst
index bd1c03b4..f7f0bc65 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -15,6 +15,7 @@ Welcome to FireSim's documentation!
    single-node-sim
    cluster-sim
    advanced-usage
+   Developing-New-Devices
 
 Indices and tables
 ==================