start tutorial on adding new devices

2018-05-11 16:36:57 -07:00 · 2018-05-11 16:36:57 -07:00 · f415332ba6
parent bba9dea481
commit f415332ba6
8 changed files with 697 additions and 0 deletions
--- a/docs/Connecting-Devices-to-Bus.rst
+++ b/docs/Connecting-Devices-to-Bus.rst
@ -0,0 +1,97 @@
+Connecting Devices to Bus
+=========================
+
+Now that we have finished designing our peripheral device, we need to
+hook it up into the SoC. To do this, we first need to create two traits:
+one for the lazy module and one for the module implementation. The lazy
+module trait is the following.
+
+.. code-block:: scala
+
+    trait HasPeripheryInputStream { this: BaseSubsystem =>
+      private val portName = "input-stream"
+      val streamWidth = pbus.beatBytes * 8
+      val inputstream = LazyModule(new InputStream(0x10017000, pbus.beatBytes))
+      pbus.toVariableWidthSlave(Some(portName)) { inputstream.regnode }
+      sbus.fromPort(Some(portName))() := inputstream.dmanode
+      ibus.fromSync := inputstream.intnode
+    }
+
+We add the line ``this: BaseSubsystem =>`` to indicate that this trait will
+eventually be mixed into a class that extends ``BaseSubsystem``, which contains
+the definition of the system bus ``sbus``, peripheral bus ``pbus``, and
+interrupt bus ``ibus``. We instantiate the ``InputStream`` lazy module and
+give it the base address ``0x10017000``. We connect the ``pbus`` to the
+register node, the DMA node to the ``sbus``, and the interrupt node to the
+``ibus``.
+
+The module implementation trait is as follows:
+
+.. code-block:: scala
+
+    trait HasPeripheryInputStreamModuleImp extends LazyModuleImp {
+      val outer: HasPeripheryInputStream
+
+      val stream_in = IO(Flipped(Decoupled(UInt(outer.streamWidth.W))))
+      outer.inputstream.module.io.in <> stream_in
+
+      def connectFixedInput(data: Seq[BigInt]): Unit = {
+        val fixed = Module(new FixedInputStream(data, outer.streamWidth))
+        stream_in <> fixed.io.out
+      }
+    }
+
+Since the interrupts and memory ports have already been connected in the
+lazy module trait, the module implementation trait only needs to create the
+external decoupled interface and connect that to the ``InputStream`` module
+implementation.
+
+The ``connectFixedInput`` method will be used by the test harness to connect
+an input stream model that just sends a pre-specified stream of data.
+
+We can now mix these traits into the SoC design. Open up
+``src/main/scala/example/Top.scala`` and add the following:
+
+.. code-block:: scala
+
+    class ExampleTopWithInputStream(implicit p: Parameters) extends ExampleTop
+        with HasPeripheryInputStream {
+      override lazy val module = new ExampleTopWithInputStreamModule(this)
+    }
+
+    class ExampleTopWithInputStreamModule(outer: ExampleTopWithInputStream)
+      extends ExampleTopModuleImp(outer)
+      with HasPeripheryInputStreamModuleImp
+
+
+We can then build a simulation using our new SoC by adding a configuration
+to ``src/main/scala/example/Configs.scala``. This configuration will cause
+the test harness to instantiate an SoC with the ``InputStream`` device
+and then connect a fixed input stream model to it.
+
+.. code-block:: scala
+
+    class WithFixedInputStream extends Config((site, here, up) => {
+      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
+        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
+        top.connectFixedInput(Seq(
+          BigInt("1002abcd", 16),
+          BigInt("34510204", 16),
+          BigInt("10329999", 16),
+          BigInt("92101222", 16)))
+        top
+      }
+    })
+
+    class FixedInputStreamConfig extends Config(
+      new WithFixedInputStream ++ new BaseExampleConfig)
+
+We can now compile the simulation using VCS.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ make CONFIG=FixedInputStreamConfig
+
+This will produce a ``simv-example-FixedInputStreamConfig`` executable that
+can be used to run tests. We will discuss how to write and run those tests in
+the next section.
--- a/docs/Creating-Simulation-Model.rst
+++ b/docs/Creating-Simulation-Model.rst
@ -0,0 +1,247 @@
+Creating Simulation Model
+=========================
+
+So far, we've been using a fixed input stream model to test our device.
+But, ideally, we'd like an input stream that is defined by a software model
+and configurable at runtime. We'd like to put the input data in a file and
+pass it in as a command-line argument. We can't do that in Chisel.
+We'll have to create the model in Verilog and call out to C++ using the
+Verilog DPI-C API.
+
+First, how do we include Verilog code in a Chisel codebase? We can do this
+using the Chisel BlackBox class. This class allows us to define IO ports and
+can be used like a regular Chisel module, but the internal implementation is
+left to Verilog.
+
+.. code-block:: scala
+
+    class SimInputStream(w: Int) extends BlackBox(Map("DATA_BITS" -> IntParam(w))) {
+      val io = IO(new Bundle {
+        val clock = Input(Clock())
+        val reset = Input(Bool())
+        val out = Decoupled(UInt(w.W))
+      })
+    }
+
+One key difference in the IO bundle definition is that the implicit ``clock``
+and ``reset`` signals must be explicitly defined in a BlackBox. The BlackBox
+class also takes a map that defines parameters that will be passed to the
+Verilog implementation. To connect the BlackBox in the test harness, we should
+create a ``connectSimInput`` method in the ``HasPeripheryInputStreamModuleImp``
+trait.
+
+.. code-block:: scala
+
+    def connectSimInput(clock: Clock, reset: Bool): Unit = {
+      val sim = Module(new SimInputStream(outer.streamWidth))
+      sim.io.clock := clock
+      sim.io.reset := reset
+      stream_in <> sim.io.out
+    }
+
+We then add a new configuration class in
+``src/main/scala/example/Configs.scala`` that calls the ``connectSimInput``
+method.
+
+.. code-block:: scala
+
+
+    class WithSimInputStream extends Config((site, here, up) => {
+      case BuildTop => (clock: Clock, reset: Bool, p: Parameters) => {
+        val top = Module(LazyModule(new ExampleTopWithInputStream()(p)).module)
+        top.connectSimInput(clock, reset)
+        top
+      }
+    })
+
+    class SimInputStreamConfig extends Config(
+      new WithSimInputStream ++ new BaseExampleConfig)
+
+Now we need to create the Verilog implementation of the ``SimInputStream``
+module. Make a new directory ``src/main/resources`` and add ``vsrc`` and ``csrc``
+subdirectories under it.
+
+.. code-block:: shell
+
+    $ mkdir -p src/main/resources/{vsrc,csrc}
+
+In the ``vsrc`` directory, create a file called ``SimInputStream.v`` and add
+the following code.
+
+.. code-block:: verilog
+
+    import "DPI-C" function void input_stream_init
+    (
+        input string filename,
+        input int    data_bits
+    );
+
+    import "DPI-C" function void input_stream_tick
+    (
+        output bit     out_valid,
+        input  bit     out_ready,
+        output longint out_bits
+    );
+
+    module SimInputStream #(DATA_BITS=64) (
+        input                  clock,
+        input                  reset,
+        output                 out_valid,
+        input                  out_ready,
+        output [DATA_BITS-1:0] out_bits
+    );
+
+        bit __out_valid;
+        longint __out_bits;
+        string filename;
+        int data_bits;
+
+        reg                 __out_valid_reg;
+        reg [DATA_BITS-1:0] __out_bits_reg;
+
+        initial begin
+            data_bits = DATA_BITS;
+            if ($value$plusargs("instream=%s", filename)) begin
+                input_stream_init(filename, data_bits);
+            end
+        end
+
+        always @(posedge clock) begin
+            if (reset) begin
+                __out_valid = 0;
+                __out_bits = 0;
+
+                __out_valid_reg <= 0;
+                __out_bits_reg <= 0;
+            end else begin
+                input_stream_tick(
+                    __out_valid,
+                    out_ready,
+                    __out_bits);
+                __out_valid_reg <= __out_valid;
+                __out_bits_reg  <= __out_bits;
+            end
+        end
+
+        assign out_valid = __out_valid_reg;
+        assign out_bits  = __out_bits_reg;
+
+    endmodule
+
+The Verilog module defines its inputs and outputs to match the definition in the
+Chisel BlackBox. But most of the implementation is left to C++ through the
+DPI functions ``input_stream_init`` and ``input_stream_tick``. We define
+these functions in a ``SimInputStream.cc`` file in the ``csrc`` directory.
+
+.. code-block:: c++
+
+    #include <stdio.h>
+    #include <stdint.h>
+    #include <stdlib.h>
+
+    class InputStream {
+      public:
+        InputStream(const char *filename, int nbytes);
+        ~InputStream(void);
+
+        bool out_valid() { return !complete; }
+        uint64_t out_bits() { return data; }
+        void tick(bool out_ready);
+
+      private:
+        void read_next(void);
+        bool complete;
+        FILE *file;
+        int nbytes;
+        uint64_t data;
+    };
+
+    InputStream::InputStream(const char *filename, int nbytes)
+    {
+        this->nbytes = nbytes;
+        this->file = fopen(filename, "rb");
+        if (this->file == NULL) {
+            fprintf(stderr, "Could not open %s\n", filename);
+            abort();
+        }
+
+        read_next();
+    }
+
+    InputStream::~InputStream(void)
+    {
+        fclose(this->file);
+    }
+
+    void InputStream::read_next(void)
+    {
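+        size_t res;
+
+        this->data = 0;
+
+        // fread returns the number of items read (0 or 1 here), never a
+        // negative value, so use ferror to tell a read error from end of file.
+        res = fread(&this->data, this->nbytes, 1, this->file);
+        if (res == 0 && ferror(this->file)) {
+            perror("fread");
+            abort();
+        }
+
+        this->complete = (res == 0);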
+    }
+
+    void InputStream::tick(bool out_ready)
+    {
+        // Advance to the next word only when the current one was accepted.
+        if (out_valid() && out_ready)
+            read_next();
+    }
+
+    InputStream *stream = NULL;
+
+    extern "C" void input_stream_init(const char *filename, int data_bits)
+    {
+        stream = new InputStream(filename, data_bits/8);
+    }
+
+    extern "C" void input_stream_tick(
+            unsigned char *out_valid,
+            unsigned char out_ready,
+            long long     *out_bits)
+    {
+        stream->tick(out_ready);
+        *out_valid = stream->out_valid();
+        *out_bits  = stream->out_bits();
+    }
+
+In the C++ file, we implement an ``InputStream`` class that takes a file name
+and a word size in bytes as its arguments. It opens the file and reads
+``nbytes`` bytes from it for every ready-valid handshake. The
+``input_stream_init`` function constructs an ``InputStream`` object and
+assigns it to a global pointer. The ``input_stream_tick`` function updates
+the state by calling the ``tick`` method, passing in the inputs from Verilog.
+It then assigns values to the Verilog outputs.
+
+You can now build this new configuration in VCS.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ make CONFIG=SimInputStreamConfig
+
+Now create a file that can be used as the input stream data. Just getting
+random bytes from ``/dev/urandom`` would work. Pass this to your simulation
+through the ``+instream=`` flag, and you should see the data get printed
+out in the ``input-stream.riscv`` test.
+
+.. code-block:: shell
+
+    $ dd if=/dev/urandom of=instream.img bs=32 count=1
+    $ hexdump instream.img
+    0000000 189b f12a 1cc1 9eb5 b65d bbef 96b6 4949
+    0000010 f8c8 636c 76fe 15f3 0665 0ef9 8c5d 3011
+    0000020
+    $ ./simv-example-SimInputStreamConfig +instream=instream.img ../tests/input-stream.riscv
+    9eb51cc1f12a189b
+    494996b6bbefb65d
+    15f376fe636cf8c8
+    30118c5d0ef90665
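+
+The printed words look byte-reversed relative to the ``hexdump`` listing
+because ``hexdump`` shows the file as 16-bit words by default, while the model
+reads eight bytes at a time into a ``uint64_t``. The sketch below shows the
+mapping in C, assuming a little-endian host. The helper name
+``le64_from_bytes`` is hypothetical and is not part of the tutorial code.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    /* Assembles eight bytes, in file order, into a little-endian 64-bit word.
+     * bytes[0] becomes the least significant byte of the result. */
+    uint64_t le64_from_bytes(const uint8_t bytes[8])
+    {
+        uint64_t word = 0;
+        for (int i = 0; i < 8; i++)
+            word |= (uint64_t) bytes[i] << (8 * i);
+        return word;
+    }
+
+The first eight bytes of the file above are ``9b 18 2a f1 c1 1c b5 9e``, which
+this function turns into ``0x9eb51cc1f12a189b``, the first line of output.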
--- a/docs/DMA-and-Interrupts.rst
+++ b/docs/DMA-and-Interrupts.rst
@ -0,0 +1,151 @@
+DMA and Interrupts
+==================
+
+In order to move data from the external input stream to memory, we need to
+perform direct memory access (DMA). We can achieve this by giving the device
+a ``TLClientNode``. Once we add it, the ``LazyModule`` will look like this:
+
+.. code-block:: scala
+
+    class InputStream(
+        address: BigInt,
+        val beatBytes: Int = 8,
+        val maxInflight: Int = 4)
+        (implicit p: Parameters) extends LazyModule {
+    
+      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
+      val regnode = TLRegisterNode(
+        address = Seq(AddressSet(address, 0x3f)),
+        device = device,
+        beatBytes = beatBytes)
+      val dmanode = TLClientNode(Seq(TLClientPortParameters(
+        Seq(TLClientParameters(
+          name = "input-stream",
+          sourceId = IdRange(0, maxInflight))))))
+    
+      lazy val module = new InputStreamModuleImp(this)
+    }
+
+For our ``TLClientNode``, we only need a single port, so we specify a single
+set of ``TLClientPortParameters`` and ``TLClientParameters``. We override two
+arguments in the ``TLClientParameters`` constructor. The ``name`` is the
+name of the port and ``sourceId`` indicates the range of transaction IDs
+that can be used in memory requests. The lower bound is inclusive, and the
+upper bound is exclusive, so this device can use source IDs from 0 to
+``maxInflight - 1``.
+
+In the module implementation, we can now implement a state machine that
+sends write requests to memory. We first call ``outer.dmanode.out`` to get
+a sequence of output port tuples. Since we only have one port, we can just
+pull out the first element of this sequence. For each port, we get a pair of
+objects. The first is the physical TileLink port, which we can connect to RTL.
+The second is a ``TLEdge`` object, which we can use to get extra metadata about
+the TileLink port (like the number of address and data bits).
+
+.. code-block:: scala
+
+    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
+      val (tl, edge) = outer.dmanode.out(0)
+      val addrBits = edge.bundle.addressBits
+      val w = edge.bundle.dataBits
+      val beatBytes = (w / 8)
+
+      val io = IO(new Bundle {
+        val in = Flipped(Decoupled(UInt(w.W)))
+      })
+
+      val addr = Reg(UInt(addrBits.W))
+      val len = Reg(UInt(addrBits.W))
+      val running = RegInit(false.B)
+      val complete = RegInit(false.B)
+
+      val s_idle :: s_issue :: s_wait :: Nil = Enum(3)
+      val state = RegInit(s_idle)
+
+      val nXacts = outer.maxInflight
+      val xactBusy = RegInit(0.U(nXacts.W))
+      val xactOnehot = PriorityEncoderOH(~xactBusy)
+      val canIssue = (state === s_issue) && !xactBusy.andR
+
+      io.in.ready := canIssue && tl.a.ready
+      tl.a.valid  := canIssue && io.in.valid
+      tl.a.bits   := edge.Put(
+        fromSource = OHToUInt(xactOnehot),
+        toAddress = addr,
+        lgSize = log2Ceil(beatBytes).U,
+        data = io.in.bits)._2
+      tl.d.ready := running && xactBusy.orR
+
+      xactBusy := (xactBusy |
+        Mux(tl.a.fire(), xactOnehot, 0.U(nXacts.W))) &
+        ~Mux(tl.d.fire(), UIntToOH(tl.d.bits.source), 0.U)
+
+      when (state === s_idle && running) {
+        assert(addr(log2Ceil(beatBytes)-1,0) === 0.U,
+          s"InputStream base address not aligned to ${beatBytes} bytes")
+        assert(len(log2Ceil(beatBytes)-1,0) === 0.U,
+          s"InputStream length not aligned to ${beatBytes} bytes")
+        state := s_issue
+      }
+
+      when (io.in.fire()) {
+        addr := addr + beatBytes.U
+        len := len - beatBytes.U
+        when (len === beatBytes.U) { state := s_wait }
+      }
+
+      when (state === s_wait && !xactBusy.orR) {
+        running := false.B
+        complete := true.B
+        state := s_idle
+      }
+
+      outer.regnode.regmap(
+        0x00 -> Seq(RegField(addrBits, addr)),
+        0x08 -> Seq(RegField(addrBits, len)),
+        0x10 -> Seq(RegField(1, running)),
+        0x18 -> Seq(RegField(1, complete)))
+    }
+
+The state machine starts in the ``s_idle`` state. In this state, the CPU should
+set the ``addr`` and ``len`` registers and then set the ``running`` register to
+1. The state machine then moves into the ``s_issue`` state, in which it
+forwards data from the ``in`` decoupled interface to memory through the
+TileLink ``A`` channel.
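+
+Before leaving ``s_idle``, the module asserts that ``addr`` and ``len`` are
+multiples of the beat size. A driver could check the same condition in
+software before setting ``running``. The following is a minimal sketch in C;
+the helper name ``is_beat_aligned`` is hypothetical, and it assumes the beat
+size is a nonzero power of two.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    /* True when value is a multiple of beat_bytes.
+     * Assumes beat_bytes is a nonzero power of two (for example, 8). */
+    bool is_beat_aligned(uint64_t value, uint64_t beat_bytes)
+    {
+        return (value & (beat_bytes - 1)) == 0;
+    }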
+
+We construct the ``A`` channel requests using the ``Put`` method in the
+``TLEdge`` object we extracted earlier.  The ``Put`` method takes a unique
+source ID in ``fromSource``, the address to write to in ``toAddress``, the
+base-2 logarithm of the size in bytes in ``lgSize``, and the data to be written
+in ``data``.
+
+The source field must observe some constraints. There can only be one
+transaction with each distinct source ID in flight at a given time.
+Once you send a request on the ``A`` channel with a specific source ID,
+you cannot send another with the same ID until after you've received the
+response for it on the ``D`` channel.
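+
+The bookkeeping that the ``xactBusy`` register performs can be modelled in
+software as a bitmap with one bit per source ID. The sketch below is a plain C
+model, not generated hardware: the function names are hypothetical, and it
+assumes at most 32 source IDs.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    /* Claims the lowest free source ID out of n_xacts and marks it busy.
+     * Returns the ID, or -1 when every ID is in flight, in which case the
+     * request has to wait. Assumes n_xacts <= 32. */
+    int xact_alloc(uint32_t *busy, int n_xacts)
+    {
+        for (int id = 0; id < n_xacts; id++) {
+            if ((*busy & (UINT32_C(1) << id)) == 0) {
+                *busy |= UINT32_C(1) << id;
+                return id;
+            }
+        }
+        return -1;
+    }
+
+    /* Frees a source ID once its response has arrived on the D channel. */
+    void xact_release(uint32_t *busy, int id)
+    {
+        *busy &= ~(UINT32_C(1) << id);
+    }
+
+With four source IDs, four calls to ``xact_alloc`` return 0 through 3 and a
+fifth returns -1 until one of the IDs is released.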
+
+Once all requests have been sent on the ``A`` channel, the state machine
+transitions to the ``s_wait`` state to wait for the remaining responses on
+the ``D`` channel. Once the responses have all returned, the state machine
+sets ``running`` to false and ``complete`` to true. The CPU can poll the
+``complete`` register to check if the operation has finished.
+
+However, for long-running operations, we would usually like to have the device
+notify the CPU through an interrupt. To add an interrupt to the device,
+we need to create an ``IntSourceNode`` in the lazy module.
+
+.. code-block:: scala
+
+    val intnode = IntSourceNode(IntSourcePortSimple(resources = device.int))
+
+Then, in the module implementation, we can connect the ``complete`` register
+to the interrupt line. That way, the CPU will get interrupted once the
+state machine completes. It can clear the interrupt by writing a 0 to the
+``complete`` register.
+
+.. code-block:: scala
+
+    val (interrupt, _) = outer.intnode.out(0)
+
+    interrupt(0) := complete
--- a/docs/Developing-New-Devices.rst
+++ b/docs/Developing-New-Devices.rst
@ -0,0 +1,13 @@
+Developing New Devices
+======================
+
+.. toctree::
+    :maxdepth: 2
+    :caption: Developing New Devices:
+
+    Getting-Started
+    MMIO-mapped-Registers
+    DMA-and-Interrupts
+    Connecting-Devices-to-Bus
+    Running-Test-Software
+    Creating-Simulation-Model
--- a/docs/Getting-Started.rst
+++ b/docs/Getting-Started.rst
@ -0,0 +1,42 @@
+Getting Started
+===============
+
+In this tutorial, we will show you how to design a new memory-mapped IO
+device, test it in simulation, and then build and run it on FireSim.
+
+To start with, you will need to clone a copy of FireChip, the repository
+that aggregates all the target RTL for FireSim. FireSim already contains
+FireChip as a submodule under ``target-design/firechip``, but it makes patches
+to the codebase so that it will work with the FPGA tools. Therefore, you will
+need to clone a clean copy if you want to use FireChip standalone.
+
+Go to https://github.com/firesim/firechip and click the "Fork" button to
+fork the repository to your own account. Now clone the new repo to your
+local machine and initialize the submodules.
+
+.. code-block:: shell
+
+    $ git clone git@github.com:yourusername/firechip.git
+    $ cd firechip
+    $ git submodule update --init
+    $ cd rocket-chip
+    $ git submodule update --init
+    $ cd ..
+
+You do not need to install riscv-tools again, because you can reuse the
+installation in FireSim. Make sure to go into the FireSim directory and source
+``sourceme-f1-full.sh`` before you run the rest of the commands in this
+tutorial.
+
+Now that everything is checked out, you can build the VCS or Verilator
+simulator and run the regression tests to make sure everything is working.
+
+.. code-block:: shell
+
+    $ cd vsim # or "cd verisim" for verilator
+    $ make # builds the DefaultExampleConfig
+    $ make run-regression-tests
+
+If everything is set up correctly, you should see a bunch of ``*.out`` files
+in the ``output/`` directory. If you open these up, they should all say
+"Completed after XXXXX cycles" at the end and not have any error messages.
--- a/docs/MMIO-mapped-Registers.rst
+++ b/docs/MMIO-mapped-Registers.rst
@ -0,0 +1,76 @@
+MMIO-mapped Registers
+=====================
+
+In this tutorial, we will create a new device which pulls in data from an
+externally-connected input stream and writes it to memory. We'll create our
+device in the file ``src/main/scala/example/InputStream.scala``. The first
+thing we need to do is set up some memory-mapped control registers that the
+CPU can use to communicate with the device. The easiest way to do this is by
+creating a ``TLRegisterNode``, which provides a ``regmap`` method that can be
+used to generate the hardware for reading and writing to RTL registers.
+
+.. code-block:: scala
+
+    class InputStream(
+        address: BigInt,
+        val beatBytes: Int = 8)
+        (implicit p: Parameters) extends LazyModule {
+    
+      val device = new SimpleDevice("input-stream", Seq("example,input-stream"))
+      val regnode = TLRegisterNode(
+        address = Seq(AddressSet(address, 0x3f)),
+        device = device,
+        beatBytes = beatBytes)
+    
+      lazy val module = new InputStreamModuleImp(this)
+    }
+
+We want to specify or override two arguments in the ``TLRegisterNode``  
+constructor. The first is the address of the device in the memory map.
+The address is specified as an ``AddressSet`` containing two values, a base
+address and a mask. The system bus will route all addresses that match the
+base address on the bits not set in the mask. In this case, we set the
+mask to ``0x3f``, which sets the lower six bits. This means that a 64 byte
+region starting from the base address will be routed to this device.
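+
+The matching rule can be sketched in C. This is an illustration of the
+base-and-mask comparison described above, not the routing logic that is
+actually generated; the function name ``address_set_contains`` is
+hypothetical.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    /* Returns true when addr falls inside the region described by
+     * (base, mask). Bits set in mask are "don't care"; every other bit
+     * of addr must equal the corresponding bit of base. */
+    bool address_set_contains(uint64_t base, uint64_t mask, uint64_t addr)
+    {
+        return ((addr ^ base) & ~mask) == 0;
+    }
+
+With a base of ``0x10017000`` and a mask of ``0x3f``, addresses
+``0x10017000`` through ``0x1001703f`` match, while ``0x10017040`` does not.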
+
+The second argument to ``TLRegisterNode`` is a ``SimpleDevice`` object, which
+provides the name and compatibility of the device tree entry that will be
+created for the peripheral. We won't show how this is used in this tutorial,
+but it will be important if you want to create a Linux kernel driver for
+the device.
+
+The third argument to ``TLRegisterNode`` is ``beatBytes``, which specifies
+the width of the TileLink interface. We will just pass this through from a
+class argument.
+
+We want the device to be able to write a specified number of bytes to a
+specified location in memory, so we'll provide ``addr`` and ``len`` registers.
+We will also want a ``running`` register for the CPU to signal the device
+to start operation and a ``complete`` register for the device to signal to
+the CPU that it has completed.
+
+.. code-block:: scala
+
+    class InputStreamModuleImp(outer: InputStream) extends LazyModuleImp(outer) {
+        val addrBits = 64
+        val w = 64
+        val io = IO(new Bundle {
+            // Not used yet
+            val in = Flipped(Decoupled(UInt(w.W)))
+        })
+        val addr = Reg(UInt(addrBits.W))
+        val len = Reg(UInt(addrBits.W))
+        val running = RegInit(false.B)
+        val complete = RegInit(false.B)
+
+        outer.regnode.regmap(
+            0x00 -> Seq(RegField(addrBits, addr)),
+            0x08 -> Seq(RegField(addrBits, len)),
+            0x10 -> Seq(RegField(1, running)),
+            0x18 -> Seq(RegField(1, complete)))
+    }
+
+The arguments to ``regmap`` should be a series of mappings from address
+offsets to sequences of ``RegField`` objects. The ``RegField`` constructor
+takes two arguments, the width of the register field and the RTL register
+itself.
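+
+From the software side, the same register map can be pictured as a C struct
+with one 64-bit slot per register. This is only a sketch for illustration: the
+struct name ``input_stream_regs`` is hypothetical, and the compile-time checks
+simply confirm that the field offsets line up with the offsets passed to
+``regmap``.
+
+.. code-block:: c
+
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical software view of the register map. */
+    struct input_stream_regs {
+        uint64_t addr;      /* offset 0x00 */
+        uint64_t len;       /* offset 0x08 */
+        uint64_t running;   /* offset 0x10, only bit 0 is a real register */
+        uint64_t complete;  /* offset 0x18, only bit 0 is a real register */
+    };
+
+    /* Fail the build if the layout drifts from the regmap offsets. */
+    _Static_assert(offsetof(struct input_stream_regs, addr) == 0x00, "addr");
+    _Static_assert(offsetof(struct input_stream_regs, len) == 0x08, "len");
+    _Static_assert(offsetof(struct input_stream_regs, running) == 0x10, "running");
+    _Static_assert(offsetof(struct input_stream_regs, complete) == 0x18, "complete");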
--- a/docs/Running-Test-Software.rst
+++ b/docs/Running-Test-Software.rst
@ -0,0 +1,70 @@
+Running Test Software
+=====================
+
+To test our input stream device, we want to write an application that uses
+the device to write data into memory, then reads the data and prints it out.
+
+In project-template, test software is placed in the ``tests/`` directory,
+which includes a Makefile and library code for developing a baremetal program.
+We'll create a new file at ``tests/input-stream.c`` with the following code:
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <stdint.h>
+
+    #include "mmio.h"
+
+    #define N 4
+    #define INPUTSTREAM_BASE 0x10017000L
+    #define INPUTSTREAM_ADDR     (INPUTSTREAM_BASE + 0x00)
+    #define INPUTSTREAM_LEN      (INPUTSTREAM_BASE + 0x08)
+    #define INPUTSTREAM_RUNNING  (INPUTSTREAM_BASE + 0x10)
+    #define INPUTSTREAM_COMPLETE (INPUTSTREAM_BASE + 0x18)
+
+    uint64_t values[N];
+
+    int main(void)
+    {
+            reg_write64(INPUTSTREAM_ADDR, (uint64_t) values);
+            reg_write64(INPUTSTREAM_LEN, N * sizeof(uint64_t));
+            asm volatile ("fence");
+            reg_write64(INPUTSTREAM_RUNNING, 1);
+
+            while (reg_read64(INPUTSTREAM_COMPLETE) == 0) {}
+            reg_write64(INPUTSTREAM_COMPLETE, 0);
+
+            for (int i = 0; i < N; i++)
+                    printf("%016lx\n", values[i]);
+
+            return 0;
+    }
+
+This program statically allocates an array for the data to be written to.
+It then sets the ``addr`` and ``len`` registers, executes a ``fence``
+instruction to make sure they are committed, and then sets the ``running``
+register. It then continuously polls the ``complete`` register until it sees
+a non-zero value, at which point it knows the data has been written to memory
+and is safe to read back.
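+
+The ``reg_write64`` and ``reg_read64`` helpers come from ``mmio.h``, which is
+not shown in this tutorial. The sketch below shows what such helpers could look
+like, assuming they are thin wrappers around volatile pointer accesses. The
+``sketch_`` names are hypothetical, and the real header may differ.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    /* Hypothetical stand-in for a 64-bit MMIO write. The volatile qualifier
+     * forces the compiler to perform the store instead of optimizing it away. */
+    static inline void sketch_reg_write64(uintptr_t addr, uint64_t data)
+    {
+        volatile uint64_t *ptr = (volatile uint64_t *) addr;
+        *ptr = data;
+    }
+
+    /* Hypothetical stand-in for a 64-bit MMIO read. Each call performs a
+     * fresh load, which is what makes a polling loop work. */
+    static inline uint64_t sketch_reg_read64(uintptr_t addr)
+    {
+        volatile uint64_t *ptr = (volatile uint64_t *) addr;
+        return *ptr;
+    }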
+
+To compile this program, add "input-stream" to the ``PROGRAMS`` list in
+``tests/Makefile`` and run ``make`` from the tests directory.
+
+To run the program, return to the ``vsim/`` directory and run the simulator
+executable, passing the newly compiled ``input-stream.riscv`` executable
+as an argument.
+
+.. code-block:: shell
+
+    $ cd vsim
+    $ ./simv-example-FixedInputStreamConfig ../tests/input-stream.riscv
+
+The program should print out the following:
+
+.. code-block:: text
+
+    000000001002abcd
+    0000000034510204
+    0000000010329999
+    0000000092101222
--- a/docs/index.rst
+++ b/docs/index.rst
@ -15,6 +15,7 @@ Welcome to FireSim's documentation!
   single-node-sim
   cluster-sim
   advanced-usage
+   Developing-New-Devices

 Indices and tables
 ==================