Merge pull request #400 from firesim/flatten-midas

Flatten the MIDAS submodule into FireSim
David Biancolin 2019-10-23 16:55:12 -07:00 committed by GitHub
commit a7ed32fe31
172 changed files with 75115 additions and 4 deletions

.gitmodules

@@ -4,9 +4,6 @@
[submodule "sw/firesim-software"]
path = sw/firesim-software
url = https://github.com/firesim/firesim-software
[submodule "sim/midas"]
path = sim/midas
url = https://github.com/ucb-bar/midas
[submodule "target-design/chipyard"]
path = target-design/chipyard
url = https://github.com/ucb-bar/project-template
@@ -28,3 +25,6 @@
[submodule "deploy/workloads/coremark/riscv-coremark"]
path = deploy/workloads/coremark/riscv-coremark
url = https://github.com/riscv-boom/riscv-coremark
[submodule "sim/midas/src/main/cc/dramsim2"]
path = sim/midas/src/main/cc/dramsim2
url = https://github.com/firesim/DRAMSim2.git

@@ -1 +0,0 @@
Subproject commit 8401cfa7e20c8331264a5d726698a4ed1994d45e

sim/midas/.gitignore Normal file

@@ -0,0 +1,12 @@
generated
logs
results
target
project/target
*.out
*.swp
*.tmp
*.key
DVEfiles
*~
*#

sim/midas/README.md Normal file

@@ -0,0 +1,89 @@
# Golden Gate (MIDAS II)
Golden Gate is an _optimizing_ FIRRTL compiler for generating FPGA-accelerated simulators
automatically from Chisel-based RTL designs, and is the basis for simulator
compilation in [FireSim](https://fires.im).
Golden Gate is the successor to MIDAS, which was originally based on the
[Strober](http://dl.acm.org/citation.cfm?id=3001151) sample-based energy
simulation framework. Golden Gate differs from prior work in that it is, to our knowledge, the first compiler
to support automatic _multi-model composition_: it can break apart a
block of RTL into a graph of models. Golden Gate uses this feature
to identify and replace FPGA-hostile blocks with multi-host-cycle models that
consume fewer FPGA resources while still exactly representing the behavior of
the source RTL. In [our ICCAD 2019 paper](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf), we leverage this feature to optimize
multi-ported RAMs in order to fit an extra two BOOM cores (six, up from four) on a
Xilinx VU9P.
## Changes From MIDAS
Golden Gate inherits nearly all of the features of MIDAS, including FASED memory timing models, assertion synthesis, and printf synthesis, but there are some notable changes:
### 1. Support for Resource Optimizations
As mentioned above, Golden Gate can identify and optimize FPGA-hostile
structures in the target RTL. This is described at length in [our ICCAD 2019
paper](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf).
Currently Golden Gate only supports optimizing multi-ported memories,
but other resource-reducing optimizations are under development.
### 2. Different Inputs and Invocation Model (FIRRTL Stage)
Golden Gate is not invoked in the same process as the target generator.
Instead, it is invoked as a separate process and provided with three inputs:
1) FIRRTL for the target design
2) Associated FIRRTL annotations for that design
3) A compiler parameterization (derived from Rocket Chip's Config system)
This permits decoupling the target generator from the compiler,
and enables the reuse of the same FIRRTL between multiple simulation or EDA
backends. midas.Compiler will be removed in the next release.
### 3. Endpoints Have Been Replaced With Target-to-Host Bridges
Unlike endpoints, which were instantiated by matching on a Chisel I/O type,
target-to-host bridges (or bridges, for short) are instantiated directly in the
target's RTL (i.e., in Chisel). Unlike endpoints, bridges can be instantiated
anywhere in the module hierarchy, and can more effectively capture
module-hierarchy-dependent parameterization information from the target. This
makes it easier to have multiple instances of the same bridge with different
parameterizations.
### 4. The Input Target Design Must Be Closed
The FIRRTL passed to Golden Gate must expose no dangling I/O (with the exception of one input
clock): instead the target should be wrapped in a module that instantiates the
appropriate bridges. This wrapper module is directly analogous to a test
harness used in software-based RTL simulation. How these bridges are
instantiated is left to the user, but multiple different examples can be found in
FireSim. One benefit of this "closed-world" approach is that the topology of the
simulator (as a network of simulation models) is guaranteed to match the topology
of the input design.
### 5. Different Underlying Dataflow Network Formalism
Golden Gate uses the [_Latency-Insensitive Bounded-Dataflow Network_](https://dl.acm.org/citation.cfm?id=1715781) (LI-BDN)
target formalism. This makes it possible to model combinational paths that
span multiple models, and to prove properties about target-cycle exactness
and deadlock freedom in the resulting simulator.
## Documentation
Golden Gate's documentation is hosted on [FireSim's Read the Docs](https://docs.fires.im).
## Related Publications
* Albert Magyar, David T. Biancolin, Jack Koenig, Sanjit Seshia, Jonathan Bachrach, Krste Asanović, **Golden Gate: Bridging The Resource-Efficiency Gap Between ASICs and FPGA Prototypes**, To appear at ICCAD '19. ([Paper PDF](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf))
* David Biancolin, Sagar Karandikar, Donggyu Kim, Jack Koenig, Andrew Waterman, Jonathan Bachrach, Krste Asanović, **“FASED: FPGA-Accelerated Simulation and Evaluation of DRAM”**, In proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, February 2019. ([Paper PDF](https://people.eecs.berkeley.edu/~biancolin/papers/fased-fpga19.pdf))
* Donggyu Kim, Christopher Celio, Sagar Karandikar, David Biancolin, Jonathan Bachrach, and Krste Asanović, **“DESSERT: Debugging RTL Effectively with State Snapshotting for Error Replays across Trillions of cycles”**, In proceedings of the 28th International Conference on Field Programmable Logic & Applications (FPL 2018), Dublin, Ireland, August 2018. ([IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/8533471))
* Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolić, Randy Katz, Jonathan Bachrach, and Krste Asanović, **“FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud”**, In proceedings of the 45th ACM/IEEE International Symposium on Computer Architecture (ISCA 2018), Los Angeles, June 2018. ([Paper PDF](https://sagark.org/assets/pubs/firesim-isca2018.pdf), [IEEE Xplore](https://ieeexplore.ieee.org/document/8416816)) **Selected as one of IEEE Micros “Top Picks from Computer Architecture Conferences, 2018”.**
* Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, and Krste Asanović, **"Evaluation of RISC-V RTL with FPGA-Accelerated Simulation"**, The First Workshop on Computer Architecture Research with RISC-V (CARRV 2017), Boston, MA, USA, Oct 2017. ([Paper PDF](doc/papers/carrv-2017.pdf))
* Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, and Krste Asanović, **"Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL"**, International Symposium on Computer Architecture (ISCA-2016), Seoul, Korea, June 2016. ([ACM DL](https://dl.acm.org/citation.cfm?id=3001151), [Slides](http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2B-2.pdf))
## Dependencies
This repository depends on the following projects:
* [Chisel](https://github.com/freechipsproject/chisel3): In the current version, target RTL to be transformed by MIDAS must be written in Chisel. Additionally, the MIDAS RTL libraries are all written in Chisel.
* [FIRRTL](https://github.com/freechipsproject/firrtl): Transformations of target-RTL are performed using FIRRTL compiler passes.
* [RocketChip](https://github.com/freechipsproject/rocket-chip): Rocket Chip is not only a chip generator, but also a collection of useful libraries for various hardware designs.
* [barstools](https://github.com/ucb-bar/barstools): Some additional technology-dependent custom transforms (e.g., the macro compiler) are required when Strober energy modelling is enabled.

sim/midas/build.sbt Normal file

@@ -0,0 +1,7 @@
organization := "edu.berkeley.cs"
version := "1.0-SNAPSHOT"
name := "midas"
scalaVersion := "2.12.4"


sim/midas/src/main/cc/.gitignore Normal file

@@ -0,0 +1,3 @@
*.obj
gmp-*
sdfAnnotateInfo


@@ -0,0 +1,119 @@
midas_dir = $(abspath .)
util_dir = $(midas_dir)/utils
bridge_dir = $(midas_dir)/bridges
replay_dir = $(midas_dir)/replay
v_dir = $(abspath ../verilog)
r_dir = $(abspath ../resources)
########################################################################
# Parameters:
# 1) PLATFORM: FPGA platform board (zynq by default)
# 2) DESIGN: The target design of MIDAS
# 3) GEN_DIR: Directory for generated source code
# 4) OUT_DIR: Directory for binary files (GEN_DIR by default)
# 5) DRIVER: Software driver written by the user (not necessary for replay)
# 6) CLOCK_PERIOD (optional): Clock period of tests
########################################################################
ifeq ($(strip $(DESIGN)),)
$(error Define DESIGN, the target design)
endif
ifeq ($(strip $(GEN_DIR)),)
$(error Define GEN_DIR, where all midas-generated code resides)
endif
ifeq ($(filter $(MAKECMDGOALS),vcs-replay $(REPLAY_BINARY)),)
ifeq ($(strip $(DRIVER)),)
$(error Define DRIVER, the source code of the simulation driver)
endif
endif
PLATFORM ?= zynq
OUT_DIR ?= $(GEN_DIR)
CLOCK_PERIOD ?= 1.0
$(info platform: $(PLATFORM))
$(info target design: $(DESIGN))
$(info generated source directory: $(GEN_DIR))
$(info output directory: $(OUT_DIR))
$(info driver source files: $(DRIVER))
$(info clock period: $(CLOCK_PERIOD))
shim := FPGATop
override CXXFLAGS := $(CXXFLAGS) -std=c++11 -Wall -I$(midas_dir)/dramsim2
include $(util_dir)/utils.mk
$(OUT_DIR)/dramsim2_ini: $(r_dir)/dramsim2_ini
ln -sf $< $@
$(OUT_DIR)/$(DESIGN).chain:
$(if $(wildcard $(GEN_DIR)/$(DESIGN).chain),cp $(GEN_DIR)/$(DESIGN).chain $@,)
override CXXFLAGS += -I$(midas_dir) -I$(util_dir)
# The trailing whitespace is important for some reason...
override LDFLAGS := $(LDFLAGS) -L$(GEN_DIR) -lstdc++ -lpthread -lgmp -lmidas
design_v := $(GEN_DIR)/$(shim).v
design_h := $(GEN_DIR)/$(DESIGN)-const.h
design_vh := $(GEN_DIR)/$(DESIGN)-const.vh
driver_h = $(foreach t, $(DRIVER), $(wildcard $(dir $(t))/*.h))
bridge_h := $(wildcard $(bridge_dir)/*.h)
bridge_cc := $(wildcard $(bridge_dir)/*.cc)
bridge_o := $(patsubst $(bridge_dir)/%.cc, $(GEN_DIR)/%.o, $(bridge_cc))
$(bridge_o): $(GEN_DIR)/%.o: $(bridge_dir)/%.cc $(design_h) $(bridge_h)
$(CXX) $(CXXFLAGS) -c -o $@ $< -include $(word 2, $^)
platform_files := simif simif_$(PLATFORM) sample/sample
platform_h := $(addprefix $(midas_dir)/, $(addsuffix .h, $(platform_files)))
platform_cc := $(addprefix $(midas_dir)/, $(addsuffix .cc, $(platform_files) sample/simif_sample))
platform_o := $(addprefix $(GEN_DIR)/, $(addsuffix .o, $(platform_files) sample/simif_sample))
$(platform_o): $(GEN_DIR)/%.o: $(midas_dir)/%.cc $(design_h) $(platform_h)
mkdir -p $(dir $@)
$(CXX) $(CXXFLAGS) -c -o $@ $< -include $(word 2, $^)
$(OUT_DIR)/$(DESIGN)-$(PLATFORM): $(design_h) $(lib) $(DRIVER) $(driver_h) $(platform_o) $(bridge_o)
mkdir -p $(OUT_DIR)
$(CXX) $(CXXFLAGS) -include $< \
-o $@ $(DRIVER) $(dramsim_o) $(lib_o) $(platform_o) $(bridge_o) $(LDFLAGS)
$(PLATFORM): $(OUT_DIR)/$(DESIGN)-$(PLATFORM) $(OUT_DIR)/$(DESIGN).chain
# Sources for building MIDAS-level simulators. Must be defined before sourcing the VCS/Verilator Makefrags
override CFLAGS += -include $(design_h)
# Models of FPGA primitives that are used in host-level sim, but not in FPGATop
sim_fpga_resource_models := $(v_dir)/BUFGCE.v
emul_files := simif simif_emul emul/mmio_$(PLATFORM) sample/sample
emul_h := $(driver_h) $(bridge_h) $(addprefix $(midas_dir)/, $(addsuffix .h, $(emul_files) emul/mmio))
# This includes c sources and static libraries
emul_cc := $(DRIVER) $(bridge_cc) $(addprefix $(midas_dir)/, $(addsuffix .cc, $(emul_files) sample/simif_sample)) $(lib)
emul_v := $(design_vh) $(design_v) $(sim_fpga_resource_models)
# The top-level module must be called out for Verilator
ifeq ($(PLATFORM),zynq)
top_module = ZynqShim
endif
ifeq ($(PLATFORM),f1)
top_module = F1Shim
endif
verilator_conf := rtlsim/ml-verilator-conf.vlt
include rtlsim/Makefrag-verilator
verilator: $(OUT_DIR)/V$(DESIGN) $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
verilator-debug: $(OUT_DIR)/V$(DESIGN)-debug $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
# Add an extra wrapper source for VCS simulators
vcs_wrapper_v := $(v_dir)/emul_$(PLATFORM).v
TB := emul
VCS_FLAGS := -e vcs_main
include rtlsim/Makefrag-vcs
vcs: $(OUT_DIR)/$(DESIGN) $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
vcs-debug: $(OUT_DIR)/$(DESIGN)-debug $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
include $(replay_dir)/replay.mk
.PHONY: $(PLATFORM) verilator verilator-debug vcs vcs-debug


@@ -0,0 +1,20 @@
// See LICENSE for license details.
#include "address_map.h"
AddressMap::AddressMap(
unsigned int r_register_count,
const unsigned int* r_register_addrs,
const char* const* r_register_names,
unsigned int w_register_count,
const unsigned int* w_register_addrs,
const char* const* w_register_names) {
for (size_t i = 0; i < r_register_count; i++) {
r_registers.insert(std::make_pair(r_register_names[i], r_register_addrs[i]));
}
for (size_t i = 0; i < w_register_count; i++) {
w_registers.insert(std::make_pair(w_register_names[i], w_register_addrs[i]));
}
}


@@ -0,0 +1,35 @@
// See LICENSE for license details.
#ifndef __ADDRESS_MAP_H
#define __ADDRESS_MAP_H
#include <map>
// Maps the arrays emitted by the MIDAS compiler to a more useful object that
// can be used to read and write a local set of registers by name
//
// Registers may appear in both R and W lists
class AddressMap
{
public:
AddressMap(
unsigned int read_register_count,
const unsigned int* read_register_addrs,
const char* const* read_register_names,
unsigned int write_register_count,
const unsigned int* write_register_addrs,
const char* const* write_register_names);
// Look up register address based on name
uint32_t r_addr(std::string name) { return r_registers[name]; };
uint32_t w_addr(std::string name) { return w_registers[name]; };
// Check for register presence
bool r_reg_exists(std::string name) { return r_registers.find(name) != r_registers.end(); };
bool w_reg_exists(std::string name) { return w_registers.find(name) != w_registers.end(); };
// Register name -> register addresses
std::map<std::string, uint32_t> r_registers;
std::map<std::string, uint32_t> w_registers;
};
#endif // __ADDRESS_MAP_H


@@ -0,0 +1,58 @@
// See LICENSE for license details.
#ifndef __BRIDGE_DRIVER_H
#define __BRIDGE_DRIVER_H
#include "simif.h"
// Bridge Drivers are the CPU-hosted components of Target-to-Host Bridges. A
// Bridge Driver interacts with its accompanying FPGA-hosted BridgeModule
// using MMIO (via read() and write() methods) or CPU-mastered DMA (via pull()
// and push()).
class bridge_driver_t
{
public:
bridge_driver_t(simif_t* s): sim(s) { }
virtual ~bridge_driver_t() {};
// Initialize BridgeModule state -- this can't be done in the constructor currently
virtual void init() = 0;
// Does work that allows the Bridge to advance in simulation time (one or more cycles)
// The standard FireSim driver calls the tick methods of all registered bridge drivers.
// Bridges whose BridgeModule is free-running need not implement this method
virtual void tick() = 0;
// Indicates the simulation should terminate.
// Tie off to false if the bridge will never call for the simulation to terminate.
virtual bool terminate() = 0;
// If the bridge driver calls for termination, encode a cause here. 0 = PASS.
// All other codes are bridge-implementation defined
virtual int exit_code() = 0;
// The analog of init(), this provides a final opportunity to interact with
// the FPGA before destructors are called at the end of simulation. Useful
// for doing end-of-simulation clean up that requires calling {read,write,push,pull}.
virtual void finish() = 0;
protected:
void write(size_t addr, data_t data) {
sim->write(addr, data);
}
data_t read(size_t addr) {
return sim->read(addr);
}
ssize_t pull(size_t addr, char *data, size_t size) {
return sim->pull(addr, data, size);
}
ssize_t push(size_t addr, char *data, size_t size) {
if (size == 0)
return 0;
return sim->push(addr, data, size);
}
private:
simif_t *sim;
};
#endif // __BRIDGE_DRIVER_H


@@ -0,0 +1,196 @@
// See LICENSE for license details.
#include <iostream>
#include <algorithm>
#include <exception>
#include <stdio.h>
#include "fased_memory_timing_model.h"
void Histogram::init() {
// Read out the initial values
write(enable, 1);
for ( size_t i = 0; i < HISTOGRAM_SIZE; i++) {
write(addr, i);
latency[i] = read64(dataH, dataL, BIN_H_MASK);
}
// Disable readout enable; otherwise histogram updates will be gated
write(enable, 0);
}
void Histogram::finish() {
// Read out the final values and compute deltas against the initial readings
write(enable, 1);
for ( size_t i = 0; i < HISTOGRAM_SIZE; i++) {
write(addr, i);
latency[i] = read64(dataH, dataL, BIN_H_MASK) - latency[i];
}
// Disable readout enable; otherwise histogram updates will be gated
write(enable, 0);
}
void AddrRangeCounter::init() {
nranges = read("numRanges");
range_bytes = new uint64_t[nranges];
write(enable, 1);
for (size_t i = 0; i < nranges; i++) {
write(addr, i);
range_bytes[i] = read64(dataH, dataL, RANGE_H_MASK);
}
write(enable, 0);
}
void AddrRangeCounter::finish() {
write(enable, 1);
for (size_t i = 0; i < nranges; i++) {
write(addr, i);
range_bytes[i] = read64(dataH, dataL, RANGE_H_MASK);
}
write(enable, 0);
}
FASEDMemoryTimingModel::FASEDMemoryTimingModel(
simif_t* sim, AddressMap addr_map, int argc, char** argv,
std::string stats_file_name, size_t mem_size, uint64_t mem_host_offset)
: FpgaModel(sim, addr_map), mem_size(mem_size), mem_host_offset(mem_host_offset) {
std::vector<std::string> args(argv + 1, argv + argc);
for (auto &arg: args) {
if(arg.find("+mm_") == 0) {
auto sub_arg = std::string(arg.c_str() + 4);
size_t delimit_idx = sub_arg.find_first_of("=");
std::string key = sub_arg.substr(0, delimit_idx).c_str();
int value = std::stoi(sub_arg.substr(delimit_idx+1).c_str());
model_configuration[key] = value;
}
}
stats_file.open(stats_file_name, std::ofstream::out);
if(!stats_file.is_open()) {
throw std::runtime_error("Could not open output file: " + stats_file_name);
}
for (auto pair: addr_map.r_registers) {
// Only profile readable registers
if (!addr_map.w_reg_exists((pair.first))) {
// Iterate through substrings to exclude
bool exclude = false;
for (auto &substr: profile_exclusion) {
if (pair.first.find(substr) != std::string::npos) { exclude = true; }
}
if (!exclude) {
profile_reg_addrs.push_back(pair.second);
stats_file << pair.first << ",";
}
}
}
stats_file << std::endl;
if (addr_map.w_reg_exists("hostReadLatencyHist_enable")) {
histograms.push_back(Histogram(sim, addr_map, "hostReadLatency"));
histograms.push_back(Histogram(sim, addr_map, "hostWriteLatency"));
histograms.push_back(Histogram(sim, addr_map, "targetReadLatency"));
histograms.push_back(Histogram(sim, addr_map, "targetWriteLatency"));
histograms.push_back(Histogram(sim, addr_map, "ingressReadLatency"));
histograms.push_back(Histogram(sim, addr_map, "ingressWriteLatency"));
histograms.push_back(Histogram(sim, addr_map, "totalReadLatency"));
histograms.push_back(Histogram(sim, addr_map, "totalWriteLatency"));
}
if (addr_map.w_reg_exists("readRanges_enable")) {
rangectrs.push_back(AddrRangeCounter(sim, addr_map, "read"));
rangectrs.push_back(AddrRangeCounter(sim, addr_map, "write"));
}
}
void FASEDMemoryTimingModel::profile() {
for (auto addr: profile_reg_addrs) {
stats_file << read(addr) << ",";
}
stats_file << std::endl;
}
void FASEDMemoryTimingModel::init() {
for (auto &pair: addr_map.w_registers) {
auto value_it = model_configuration.find(pair.first);
if (value_it != model_configuration.end()) {
write(pair.second, value_it->second);
}
else if (pair.first.find("hostMemOffsetLow") != std::string::npos) {
write(pair.second, mem_host_offset & ((1ULL << 32) - 1));
}
else if (pair.first.find("hostMemOffsetHigh") != std::string::npos) {
write(pair.second, mem_host_offset >> 32);
}
else {
// Iterate through substrings to exclude
bool exclude = false;
for (auto &substr: configuration_exclusion) {
if (pair.first.find(substr) != std::string::npos) { exclude = true; }
}
if (!exclude) {
char buf[100];
sprintf(buf, "No value provided for configuration register: %s", pair.first.c_str());
throw std::runtime_error(buf);
} else {
fprintf(stderr, "Ignoring writeable register: %s\n", pair.first.c_str());
}
}
}
for (auto &hist: histograms) { hist.init(); }
for (auto &rctr: rangectrs) { rctr.init(); }
}
void FASEDMemoryTimingModel::finish() {
for (auto &hist: histograms) { hist.finish(); }
for (auto &rctr: rangectrs) { rctr.finish(); }
std::ofstream histogram_file;
histogram_file.open("latency_histogram.csv", std::ofstream::out);
if(!histogram_file.is_open()) {
throw std::runtime_error("Could not open histogram output file");
}
// Header
for (auto &hist: histograms) {
histogram_file << hist.name << ",";
}
histogram_file << std::endl;
// Data
for (size_t i = 0; i < HISTOGRAM_SIZE; i++) {
for (auto &hist: histograms) {
histogram_file << hist.latency[i] << ",";
}
histogram_file << std::endl;
}
histogram_file.close();
if (!rangectrs.empty()) {
size_t nranges = rangectrs[0].nranges;
std::ofstream rangectr_file;
rangectr_file.open("range_counters.csv", std::ofstream::out);
if (!rangectr_file.is_open()) {
throw std::runtime_error("Could not open range counter file");
}
rangectr_file << "Address,";
for (auto &rctr: rangectrs) {
rangectr_file << rctr.name << ",";
}
rangectr_file << std::endl;
for (size_t i = 0; i < nranges; i++) {
rangectr_file << std::hex << (i * mem_size / nranges) << ",";
for (auto &rctr: rangectrs) {
rangectr_file << std::dec << rctr.range_bytes[i] << ",";
}
rangectr_file << std::endl;
}
rangectr_file.close();
}
stats_file.close();
}


@@ -0,0 +1,114 @@
// See LICENSE for license details.
#ifndef __FASED_MEMORY_TIMING_MODEL_H
#define __FASED_MEMORY_TIMING_MODEL_H
/* This is the widget driver for FASED memory-timing models
*
* FASED instances are FPGA-hosted and only rely on this driver to:
* 1) set runtime-configurable timing parameters before simulation commences
* 2) poll instrumentation registers
*
*/
#include <unordered_map>
#include <set>
#include <fstream>
#include "fpga_model.h"
// MICRO HACKS.
constexpr int HISTOGRAM_SIZE = 1024;
constexpr int BIN_SIZE = 36;
constexpr int RANGE_COUNT_SIZE = 48;
constexpr data_t BIN_H_MASK = (1L << (BIN_SIZE - 32)) - 1;
constexpr data_t RANGE_H_MASK = (1L << (RANGE_COUNT_SIZE - 32)) - 1;
class AddrRangeCounter: public FpgaModel {
public:
AddrRangeCounter(simif_t *sim, AddressMap addr_map, std::string name):
FpgaModel(sim, addr_map), name(name) {};
~AddrRangeCounter(void) { /*delete [] range_bytes;*/ }
void init();
void profile() {}
void finish();
std::string name;
uint64_t *range_bytes;
size_t nranges;
private:
std::string enable = name + "Ranges_enable";
std::string dataH = name + "Ranges_dataH";
std::string dataL = name + "Ranges_dataL";
std::string addr = name + "Ranges_addr";
};
class Histogram: public FpgaModel {
public:
Histogram(simif_t* s, AddressMap addr_map, std::string name): FpgaModel(s, addr_map), name(name) {};
void init();
void profile() {};
void finish();
std::string name;
uint64_t latency[HISTOGRAM_SIZE];
private:
std::string enable = name + "Hist_enable";
std::string dataH = name + "Hist_dataH";
std::string dataL = name + "Hist_dataL";
std::string addr = name + "Hist_addr";
};
class FASEDMemoryTimingModel: public FpgaModel
{
public:
FASEDMemoryTimingModel(simif_t* s, AddressMap addr_map, int argc, char** argv,
std::string stats_file_name, size_t mem_size, uint64_t mem_host_offset);
void init();
void profile();
void finish();
private:
// Saves a map of register names to settings
std::unordered_map<std::string, uint32_t> model_configuration;
std::vector<uint32_t> profile_reg_addrs;
std::ofstream stats_file;
std::vector<Histogram> histograms;
std::vector<AddrRangeCounter> rangectrs;
std::set<std::string> configuration_exclusion {
"Hist_dataL",
"Hist_dataH",
"Hist_addr",
"Hist_enable",
"hostMemOffsetLow",
"hostMemOffsetHigh",
"Ranges_dataL",
"Ranges_dataH",
"Ranges_addr",
"Ranges_enable",
"numRanges"
};
std::set<std::string> profile_exclusion {
"Hist_dataL",
"Hist_dataH",
"Hist_addr",
"Hist_enable",
"hostMemOffsetLow",
"hostMemOffsetHigh",
"Ranges_dataL",
"Ranges_dataH",
"Ranges_addr",
"Ranges_enable",
"numRanges"
};
bool has_latency_histograms() { return histograms.size() > 0; };
size_t mem_size;
uint64_t mem_host_offset;
};
#endif // __FASED_MEMORY_TIMING_MODEL_H


@@ -0,0 +1,60 @@
// See LICENSE for license details.
#ifndef __FPGA_MODEL_H
#define __FPGA_MODEL_H
#include "simif.h"
#include "address_map.h"
/**
* Base class for (handwritten) FPGA-hosted models
*
* These models have two important methods:
*
 * 1) init: Sets the model's runtime configuration (e.g., the latency of a
 *    latency pipe).
 *
 * 2) profile: Gives a default means to read all readable registers in
 *    the model, including programmable registers and instrumentation.
*
*/
class FpgaModel
{
private:
simif_t *sim;
public:
FpgaModel(simif_t* s, AddressMap addr_map): sim(s), addr_map(addr_map) {};
virtual void init() = 0;
virtual void profile() = 0;
virtual void finish() = 0;
protected:
AddressMap addr_map;
void write(size_t addr, data_t data) {
sim->write(addr, data);
}
data_t read(size_t addr) {
return sim->read(addr);
}
void write(std::string reg, data_t data){
sim->write(addr_map.w_addr(reg), data);
}
data_t read(std::string reg){
return sim->read(addr_map.r_addr(reg));
}
uint64_t read64(std::string msw, std::string lsw, data_t upper_word_mask) {
assert(sizeof(data_t) == 4);
uint64_t data = ((uint64_t) (read(msw) & upper_word_mask)) << 32;
return data | read(lsw);
}
};
#endif // __FPGA_MODEL_H


@@ -0,0 +1,46 @@
#ifdef ASSERTBRIDGEMODULE_struct_guard
#include "synthesized_assertions.h"
#include <iostream>
#include <fstream>
synthesized_assertions_t::synthesized_assertions_t(simif_t* sim,
ASSERTBRIDGEMODULE_struct * mmio_addrs): bridge_driver_t(sim) {
this->mmio_addrs = mmio_addrs;
};
synthesized_assertions_t::~synthesized_assertions_t() {
free(this->mmio_addrs);
}
void synthesized_assertions_t::tick() {
if (read(this->mmio_addrs->fire)) {
// Read assertion information
std::vector<std::string> msgs;
std::ifstream file(std::string(TARGET_NAME) + ".asserts");
std::string line;
std::ostringstream oss;
while (std::getline(file, line)) {
if (line == "0") {
msgs.push_back(oss.str());
oss.str(std::string());
} else {
oss << line << std::endl;
}
}
assert_cycle = read(this->mmio_addrs->cycle_low);
assert_cycle |= ((uint64_t)read(this->mmio_addrs->cycle_high)) << 32;
assert_id = read(this->mmio_addrs->id);
std::cerr << msgs[assert_id];
std::cerr << " at cycle: " << assert_cycle << std::endl;
assert_fired = true;
}
}
void synthesized_assertions_t::resume() {
assert_fired = false;
write(this->mmio_addrs->resume, 1);
}
#endif // ASSERTBRIDGEMODULE_struct_guard


@@ -0,0 +1,28 @@
#ifndef __SYNTHESIZED_ASSERTIONS_H
#define __SYNTHESIZED_ASSERTIONS_H
#ifdef ASSERTBRIDGEMODULE_struct_guard
#include "bridge_driver.h"
class synthesized_assertions_t: public bridge_driver_t
{
public:
synthesized_assertions_t(simif_t* sim, ASSERTBRIDGEMODULE_struct * mmio_addrs);
~synthesized_assertions_t();
virtual void init() {};
virtual void tick();
virtual void finish() {};
void resume(); // Clears any set assertions, and allows the simulation to advance
virtual bool terminate() { return assert_fired; };
virtual int exit_code() { return (assert_fired) ? assert_id + 1 : 0; };
private:
bool assert_fired = false;
int assert_id;
uint64_t assert_cycle;
ASSERTBRIDGEMODULE_struct * mmio_addrs;
};
#endif // ASSERTBRIDGEMODULE_struct_guard
#endif //__SYNTHESIZED_ASSERTIONS_H


@@ -0,0 +1,294 @@
#ifdef PRINTBRIDGEMODULE_struct_guard
#include <iomanip>
#include "synthesized_prints.h"
synthesized_prints_t::synthesized_prints_t(
simif_t* sim,
std::vector<std::string> &args,
PRINTBRIDGEMODULE_struct * mmio_addrs,
unsigned int print_count,
unsigned int token_bytes,
unsigned int idle_cycles_mask,
const unsigned int* print_offsets,
const char* const* format_strings,
const unsigned int* argument_counts,
const unsigned int* argument_widths,
unsigned int dma_address):
bridge_driver_t(sim),
mmio_addrs(mmio_addrs),
print_count(print_count),
token_bytes(token_bytes),
idle_cycles_mask(idle_cycles_mask),
print_offsets(print_offsets),
format_strings(format_strings),
argument_counts(argument_counts),
argument_widths(argument_widths),
dma_address(dma_address) {
assert((token_bytes & (token_bytes - 1)) == 0);
assert(print_count > 0);
const char *printfilename = default_filename.c_str();
this->start_cycle = 0;
this->end_cycle = -1ULL;
std::string printfile_arg = std::string("+print-file=");
std::string printstart_arg = std::string("+print-start=");
std::string printend_arg = std::string("+print-end=");
// Emit the printfs unformatted (binary) when writing them to file
std::string binary_arg = std::string("+print-binary");
// Removes the cycle prefix from human-readable output
std::string cycleprefix_arg = std::string("+print-no-cycle-prefix");
// Choose a multiple of token_bytes for the batch size
if (((beat_bytes * desired_batch_beats) % token_bytes) != 0 ) {
this->batch_beats = token_bytes / beat_bytes;
} else {
this->batch_beats = desired_batch_beats;
}
for (auto &arg: args) {
if (arg.find(printfile_arg) == 0) {
printfilename = const_cast<char*>(arg.c_str()) + printfile_arg.length();
}
if (arg.find(printstart_arg) == 0) {
char *str = const_cast<char*>(arg.c_str()) + printstart_arg.length();
this->start_cycle = atol(str);
}
if (arg.find(printend_arg) == 0) {
char *str = const_cast<char*>(arg.c_str()) + printend_arg.length();
this->end_cycle = atol(str);
}
if (arg.find(binary_arg) == 0) {
human_readable = false;
}
if (arg.find(cycleprefix_arg) == 0) {
print_cycle_prefix = false;
}
}
current_cycle = start_cycle; // We won't receive tokens until start_cycle; so fast-forward
this->printfile.open(printfilename, std::ios_base::out | std::ios_base::binary);
if (!this->printfile.is_open()) {
fprintf(stderr, "Could not open print log file: %s\n", printfilename);
abort();
}
this->printstream = &(this->printfile);
widths.resize(print_count);
// Used to reconstruct the relative position of arguments in the flattened argument_widths array
size_t arg_base_offset = 0;
size_t print_bit_offset = 1; // The lsb of the current print in the packed token
for (size_t p_idx = 0; p_idx < print_count; p_idx++ ) {
auto print_args = new print_vars_t;
size_t print_width = 1; // A running total of argument widths for this print, including an enable bit
// Iterate through the arguments for this print
for (size_t arg_idx = 0; arg_idx < argument_counts[p_idx]; arg_idx++) {
size_t arg_width = argument_widths[arg_base_offset + arg_idx];
widths[p_idx].push_back(arg_width);
mpz_t* mask = (mpz_t*)malloc(sizeof(mpz_t));
// Below is equivalent to *mask = (1 << arg_width) - 1
mpz_init(*mask);
mpz_set_ui(*mask, 1);
mpz_mul_2exp(*mask, *mask, arg_width);
mpz_sub_ui(*mask, *mask, 1);
print_args->data.push_back(mask);
print_width += arg_width;
}
size_t aligned_offset = print_bit_offset / gmp_align_bits;
size_t aligned_msw = (print_width + print_bit_offset) / gmp_align_bits;
size_t rounded_size = aligned_msw - aligned_offset + 1;
arg_base_offset += argument_counts[p_idx];
masks.push_back(print_args);
sizes.push_back(rounded_size);
aligned_offsets.push_back(aligned_offset);
bit_offset.push_back(print_bit_offset % gmp_align_bits);
print_bit_offset += print_width;
}
};
synthesized_prints_t::~synthesized_prints_t() {
free(this->mmio_addrs);
for (size_t i = 0 ; i < print_count ; i++) {
delete masks[i];
}
}
void synthesized_prints_t::init() {
// Set the bounds in the widget
write(this->mmio_addrs->startCycleL, this->start_cycle);
write(this->mmio_addrs->startCycleH, this->start_cycle >> 32);
write(this->mmio_addrs->endCycleL, this->end_cycle);
write(this->mmio_addrs->endCycleH, this->end_cycle >> 32);
write(this->mmio_addrs->doneInit, 1);
}
// Accepts the format string and the masked arguments, and emits the formatted
// print to the desired stream
void synthesized_prints_t::print_format(const char* fmt, print_vars_t* vars, print_vars_t* masks) {
size_t k = 0;
if (print_cycle_prefix) {
*printstream << "CYCLE:" << std::setw(13) << current_cycle << " ";
}
while(*fmt) {
if (*fmt == '%' && fmt[1] != '%') {
mpz_t* value = vars->data[k];
char* v = NULL;
if (fmt[1] == 's') {
size_t size;
v = (char*)mpz_export(NULL, &size, 1, sizeof(char), 0, 0, *value);
for (size_t j = 0 ; j < size ; j++) printstream->put(v[j]);
fmt++;
free(v);
} else {
char buf[1024];
switch(*(++fmt)) {
case 'h':
case 'x': gmp_sprintf(buf, "%0*Zx", mpz_sizeinbase(*(masks->data[k]), 16), *value); break;
case 'd': gmp_sprintf(buf, "%*Zd", mpz_sizeinbase(*(masks->data[k]), 10), *value); break;
case 'b': mpz_get_str(buf, 2, *value); break;
default: assert(0); break;
}
(*printstream) << buf;
}
fmt++;
k++;
} else if (*fmt == '%') {
printstream->put(*(++fmt));
fmt++;
} else if (*fmt == '\\' && fmt[1] == 'n') {
printstream->put('\n');
fmt += 2;
} else {
printstream->put(*fmt);
fmt++;
}
}
assert(k == vars->data.size());
}
// Returns true if at least one print in the token is enabled in this cycle
bool has_enabled_print(char * buf) { return (buf[0] & 1); }
// If the token has no enabled prints, returns the number of idle cycles encoded in the MSBs
uint32_t decode_idle_cycles(char * buf, uint32_t mask) {
return (((*((uint32_t*)buf)) & mask) >> 1);
}
// Iterates through the DMA flits (each is one token), checking whether there are enabled prints
void synthesized_prints_t::process_tokens(size_t beats) {
size_t batch_bytes = beats * beat_bytes;
// See FireSim issue #208
// This needs to be page-aligned, as a DMA request that spans a page is
// fractured into a pair, and, for reasons unknown, the first beat of the second
// request is lost. Once aligned, requests larger than a page will be fractured
// into page-sized (64-beat) requests, and these seem to behave correctly.
alignas(4096) char buf[batch_bytes];
uint32_t bytes_received = pull(dma_address, (char*)buf, batch_bytes);
if (bytes_received != batch_bytes) {
    printf("ERR MISMATCH! on reading print tokens. Read %u bytes, wanted %zu bytes.\n",
           bytes_received, batch_bytes);
printf("errno: %s\n", strerror(errno));
exit(1);
}
if (human_readable) {
for (size_t idx = 0; idx < batch_bytes; idx += token_bytes ) {
if (has_enabled_print(&buf[idx])) {
show_prints(&buf[idx]);
current_cycle++;
} else {
current_cycle += decode_idle_cycles(&buf[idx], idle_cycles_mask);
}
}
} else {
printstream->write(buf, batch_bytes);
}
}
// Returns true if the print at the current offset is enabled in this cycle
bool synthesized_prints_t::current_print_enabled(gmp_align_t * buf, size_t offset) {
return (buf[0] & (1LL << (offset)));
}
// Finds enabled prints in a token
void synthesized_prints_t::show_prints(char * buf) {
for (size_t i = 0 ; i < print_count; i++) {
gmp_align_t* data = ((gmp_align_t*)buf) + aligned_offsets[i];
// First bit is enable
if (current_print_enabled(data, bit_offset[i])) {
mpz_t print;
mpz_init(print);
mpz_import(print, sizes[i], -1, sizeof(gmp_align_t), 0, 0, data);
mpz_fdiv_q_2exp(print, print, bit_offset[i] + 1);
print_vars_t vars;
size_t num_args = argument_counts[i];
for (size_t arg = 0 ; arg < num_args ; arg++) {
mpz_t* var = (mpz_t*)malloc(sizeof(mpz_t));
mpz_t* mask = masks[i]->data[arg];
mpz_init(*var);
// *var = print & *mask
mpz_and(*var, print, *mask);
vars.data.push_back(var);
// print = print >> width
mpz_fdiv_q_2exp(print, print, widths[i][arg]);
}
print_format(format_strings[i], &vars, masks[i]);
mpz_clear(print);
}
}
}
void synthesized_prints_t::tick() {
  // Pull batch_tokens from the FPGA if at least that many are available
// Assumes 1:1 token to dma-beat size
size_t beats_available = read(mmio_addrs->outgoing_count);
if (beats_available >= batch_beats) {
process_tokens(batch_beats);
}
}
// This is a little hacky; however, it'll probably work perfectly fine on the
// FPGA, as MMIO read latency is 100+ ns.
int synthesized_prints_t::beats_avaliable_stable() {
size_t prev_beats_available = 0;
size_t beats_avaliable = read(mmio_addrs->outgoing_count);
while (beats_avaliable > prev_beats_available) {
prev_beats_available = beats_avaliable;
beats_avaliable = read(mmio_addrs->outgoing_count);
}
return beats_avaliable;
}
// Pull in any remaining tokens and flush them to file
// WARNING: may not function correctly if the simulator is actively running
void synthesized_prints_t::flush() {
// Wait for the system to settle
size_t beats_available = beats_avaliable_stable();
// If multiple tokens are being packed into a single DMA beat, force the widget
// to write out any incomplete beat
if (token_bytes < beat_bytes) {
write(mmio_addrs->flushNarrowPacket, 1);
while (read(mmio_addrs->outgoing_count) != (beats_available + 1));
beats_available++;
}
if (beats_available) process_tokens(beats_available);
this->printstream->flush();
}
#endif // PRINTBRIDGEMODULE_struct_guard


@ -0,0 +1,97 @@
#ifndef __SYNTHESIZED_PRINTS_H
#define __SYNTHESIZED_PRINTS_H
#ifdef PRINTBRIDGEMODULE_struct_guard
#include <vector>
#include <iostream>
#include <fstream>
#include <gmp.h>
#include "bridge_driver.h"
struct print_vars_t {
std::vector<mpz_t*> data;
~print_vars_t() {
for (auto& e: data) {
mpz_clear(*e);
free(e);
}
}
};
class synthesized_prints_t: public bridge_driver_t
{
public:
synthesized_prints_t(simif_t* sim,
std::vector<std::string> &args,
PRINTBRIDGEMODULE_struct * mmio_addrs,
unsigned int print_count,
unsigned int token_bytes,
unsigned int idle_cycles_mask,
const unsigned int* print_offsets,
const char* const* format_strings,
const unsigned int* argument_counts,
const unsigned int* argument_widths,
unsigned int dma_address);
~synthesized_prints_t();
virtual void init();
virtual void tick();
virtual bool terminate() { return false; };
virtual int exit_code() { return 0; };
void flush();
void finish() { flush(); };
private:
PRINTBRIDGEMODULE_struct * mmio_addrs;
const unsigned int print_count;
const unsigned int token_bytes;
const unsigned int idle_cycles_mask;
const unsigned int* print_offsets;
const char* const* format_strings;
const unsigned int* argument_counts;
const unsigned int* argument_widths;
const unsigned int dma_address;
// DMA batching parameters
const size_t beat_bytes = DMA_DATA_BITS / 8;
// The number of DMA beats to pull off the FPGA on each invocation of tick()
// This will be set based on the ratio of token_size : desired_batch_beats
size_t batch_beats;
// This will be modified to be a multiple of the token size
const size_t desired_batch_beats = 3072;
// Used to define the boundaries in the batch buffer at which we'll
  // initialize GMP types
using gmp_align_t = uint64_t;
const size_t gmp_align_bits = sizeof(gmp_align_t) * 8;
// +arg driven members
std::ofstream printfile; // Used only if the +print-file arg is provided
std::string default_filename = "synthesized-prints.out";
std::ostream* printstream; // Is set to std::cerr otherwise
uint64_t start_cycle, end_cycle; // Bounds between which prints will be emitted
uint64_t current_cycle = 0;
bool human_readable = true;
bool print_cycle_prefix = true;
std::vector<std::vector<size_t>> widths;
std::vector<size_t> sizes;
std::vector<print_vars_t*> masks;
std::vector<size_t> aligned_offsets; // Aligned to gmp_align_t
std::vector<size_t> bit_offset;
bool current_print_enabled(gmp_align_t* buf, size_t offset);
void process_tokens(size_t beats);
void show_prints(char * buf);
void print_format(const char* fmt, print_vars_t* vars, print_vars_t* masks);
// Returns the number of beats available, once two successive reads return the same value
int beats_avaliable_stable();
};
#endif // PRINTBRIDGEMODULE_struct_guard
#endif //__SYNTHESIZED_PRINTS_H

@ -0,0 +1 @@
Subproject commit 2ec7965b2ee051aaff03d5db21c6709aea4dd24e


@ -0,0 +1,20 @@
// See LICENSE for license details.
#ifndef __MMIO_H
#define __MMIO_H
#include <stdint.h>
#include <stddef.h>
class mmio_t
{
public:
virtual void read_req(uint64_t addr, size_t size, size_t len) = 0;
virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) = 0;
virtual bool read_resp(void *data) = 0;
virtual bool write_resp() = 0;
};
void* init(uint64_t memsize, bool dram);
#endif // __MMIO_H


@ -0,0 +1,940 @@
#include "mmio_f1.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include <memory>
#include <cassert>
#include <cmath>
#ifdef VCS
#include <DirectC.h>
#include "midas_context.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif // VM_TRACE
#endif
void mmio_f1_t::read_req(uint64_t addr, size_t size, size_t len) {
mmio_req_addr_t ar(0, addr, size, len);
this->ar.push(ar);
}
void mmio_f1_t::write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) {
int nbytes = 1 << size;
mmio_req_addr_t aw(0, addr, size, len);
this->aw.push(aw);
for (int i = 0; i < len + 1; i++) {
mmio_req_data_t w(((char*) data) + i * nbytes, strb[i], i == len);
this->w.push(w);
}
}
void mmio_f1_t::tick(
bool reset,
bool ar_ready,
bool aw_ready,
bool w_ready,
size_t r_id,
void* r_data,
bool r_last,
bool r_valid,
size_t b_id,
bool b_valid)
{
const bool ar_fire = !reset && ar_ready && ar_valid();
const bool aw_fire = !reset && aw_ready && aw_valid();
const bool w_fire = !reset && w_ready && w_valid();
const bool r_fire = !reset && r_valid && r_ready();
const bool b_fire = !reset && b_valid && b_ready();
if (ar_fire) read_inflight = true;
if (aw_fire) write_inflight = true;
if (w_fire) this->w.pop();
if (r_fire) {
char* dat = (char*)malloc(dummy_data.size());
memcpy(dat, (char*)r_data, dummy_data.size());
mmio_resp_data_t r(r_id, dat, r_last);
this->r.push(r);
}
if (b_fire) {
this->b.push(b_id);
}
}
bool mmio_f1_t::read_resp(void* data) {
if (ar.empty() || r.size() <= ar.front().len) {
return false;
} else {
auto ar = this->ar.front();
size_t word_size = 1 << ar.size;
for (size_t i = 0 ; i <= ar.len ; i++) {
auto r = this->r.front();
assert(i < ar.len || r.last);
memcpy(((char*)data) + i * word_size, r.data, word_size);
free(r.data);
this->r.pop();
}
this->ar.pop();
read_inflight = false;
return true;
}
}
bool mmio_f1_t::write_resp() {
if (aw.empty() || b.empty()) {
return false;
} else {
aw.pop();
b.pop();
write_inflight = false;
return true;
}
}
extern uint64_t main_time;
extern std::unique_ptr<mmio_t> master;
extern std::unique_ptr<mmio_t> dma;
std::unique_ptr<mm_t> slave[4];
void* init(uint64_t memsize, bool dramsim) {
master.reset(new mmio_f1_t(MMIO_WIDTH));
dma.reset(new mmio_f1_t(DMA_WIDTH));
for (int mem_channel_index=0; mem_channel_index < 4; mem_channel_index++) {
slave[mem_channel_index].reset(dramsim ? (mm_t*) new mm_dramsim2_t(1 << MEM_ID_BITS) : (mm_t*) new mm_magic_t);
slave[mem_channel_index]->init(memsize, MEM_WIDTH, 64);
}
return slave[0]->get_data();
}
#ifdef VCS
static const size_t MASTER_DATA_SIZE = MMIO_WIDTH / sizeof(uint32_t);
static const size_t DMA_DATA_SIZE = DMA_WIDTH / sizeof(uint32_t);
static const size_t DMA_STRB_SIZE = (DMA_WIDTH/8 + sizeof(uint32_t) - 1) / sizeof(uint32_t);
static const size_t SLAVE_DATA_SIZE = MEM_WIDTH / sizeof(uint32_t);
extern midas_context_t* host;
extern bool vcs_fin;
extern bool vcs_rst;
extern "C" {
void tick(
vc_handle reset,
vc_handle fin,
vc_handle master_ar_valid,
vc_handle master_ar_ready,
vc_handle master_ar_bits_addr,
vc_handle master_ar_bits_id,
vc_handle master_ar_bits_size,
vc_handle master_ar_bits_len,
vc_handle master_aw_valid,
vc_handle master_aw_ready,
vc_handle master_aw_bits_addr,
vc_handle master_aw_bits_id,
vc_handle master_aw_bits_size,
vc_handle master_aw_bits_len,
vc_handle master_w_valid,
vc_handle master_w_ready,
vc_handle master_w_bits_strb,
vc_handle master_w_bits_data,
vc_handle master_w_bits_last,
vc_handle master_r_valid,
vc_handle master_r_ready,
vc_handle master_r_bits_resp,
vc_handle master_r_bits_id,
vc_handle master_r_bits_data,
vc_handle master_r_bits_last,
vc_handle master_b_valid,
vc_handle master_b_ready,
vc_handle master_b_bits_resp,
vc_handle master_b_bits_id,
vc_handle dma_ar_valid,
vc_handle dma_ar_ready,
vc_handle dma_ar_bits_addr,
vc_handle dma_ar_bits_id,
vc_handle dma_ar_bits_size,
vc_handle dma_ar_bits_len,
vc_handle dma_aw_valid,
vc_handle dma_aw_ready,
vc_handle dma_aw_bits_addr,
vc_handle dma_aw_bits_id,
vc_handle dma_aw_bits_size,
vc_handle dma_aw_bits_len,
vc_handle dma_w_valid,
vc_handle dma_w_ready,
vc_handle dma_w_bits_strb,
vc_handle dma_w_bits_data,
vc_handle dma_w_bits_last,
vc_handle dma_r_valid,
vc_handle dma_r_ready,
vc_handle dma_r_bits_resp,
vc_handle dma_r_bits_id,
vc_handle dma_r_bits_data,
vc_handle dma_r_bits_last,
vc_handle dma_b_valid,
vc_handle dma_b_ready,
vc_handle dma_b_bits_resp,
vc_handle dma_b_bits_id,
vc_handle slave_0_ar_valid,
vc_handle slave_0_ar_ready,
vc_handle slave_0_ar_bits_addr,
vc_handle slave_0_ar_bits_id,
vc_handle slave_0_ar_bits_size,
vc_handle slave_0_ar_bits_len,
vc_handle slave_0_aw_valid,
vc_handle slave_0_aw_ready,
vc_handle slave_0_aw_bits_addr,
vc_handle slave_0_aw_bits_id,
vc_handle slave_0_aw_bits_size,
vc_handle slave_0_aw_bits_len,
vc_handle slave_0_w_valid,
vc_handle slave_0_w_ready,
vc_handle slave_0_w_bits_strb,
vc_handle slave_0_w_bits_data,
vc_handle slave_0_w_bits_last,
vc_handle slave_0_r_valid,
vc_handle slave_0_r_ready,
vc_handle slave_0_r_bits_resp,
vc_handle slave_0_r_bits_id,
vc_handle slave_0_r_bits_data,
vc_handle slave_0_r_bits_last,
vc_handle slave_0_b_valid,
vc_handle slave_0_b_ready,
vc_handle slave_0_b_bits_resp,
vc_handle slave_0_b_bits_id,
vc_handle slave_1_ar_valid,
vc_handle slave_1_ar_ready,
vc_handle slave_1_ar_bits_addr,
vc_handle slave_1_ar_bits_id,
vc_handle slave_1_ar_bits_size,
vc_handle slave_1_ar_bits_len,
vc_handle slave_1_aw_valid,
vc_handle slave_1_aw_ready,
vc_handle slave_1_aw_bits_addr,
vc_handle slave_1_aw_bits_id,
vc_handle slave_1_aw_bits_size,
vc_handle slave_1_aw_bits_len,
vc_handle slave_1_w_valid,
vc_handle slave_1_w_ready,
vc_handle slave_1_w_bits_strb,
vc_handle slave_1_w_bits_data,
vc_handle slave_1_w_bits_last,
vc_handle slave_1_r_valid,
vc_handle slave_1_r_ready,
vc_handle slave_1_r_bits_resp,
vc_handle slave_1_r_bits_id,
vc_handle slave_1_r_bits_data,
vc_handle slave_1_r_bits_last,
vc_handle slave_1_b_valid,
vc_handle slave_1_b_ready,
vc_handle slave_1_b_bits_resp,
vc_handle slave_1_b_bits_id,
vc_handle slave_2_ar_valid,
vc_handle slave_2_ar_ready,
vc_handle slave_2_ar_bits_addr,
vc_handle slave_2_ar_bits_id,
vc_handle slave_2_ar_bits_size,
vc_handle slave_2_ar_bits_len,
vc_handle slave_2_aw_valid,
vc_handle slave_2_aw_ready,
vc_handle slave_2_aw_bits_addr,
vc_handle slave_2_aw_bits_id,
vc_handle slave_2_aw_bits_size,
vc_handle slave_2_aw_bits_len,
vc_handle slave_2_w_valid,
vc_handle slave_2_w_ready,
vc_handle slave_2_w_bits_strb,
vc_handle slave_2_w_bits_data,
vc_handle slave_2_w_bits_last,
vc_handle slave_2_r_valid,
vc_handle slave_2_r_ready,
vc_handle slave_2_r_bits_resp,
vc_handle slave_2_r_bits_id,
vc_handle slave_2_r_bits_data,
vc_handle slave_2_r_bits_last,
vc_handle slave_2_b_valid,
vc_handle slave_2_b_ready,
vc_handle slave_2_b_bits_resp,
vc_handle slave_2_b_bits_id,
vc_handle slave_3_ar_valid,
vc_handle slave_3_ar_ready,
vc_handle slave_3_ar_bits_addr,
vc_handle slave_3_ar_bits_id,
vc_handle slave_3_ar_bits_size,
vc_handle slave_3_ar_bits_len,
vc_handle slave_3_aw_valid,
vc_handle slave_3_aw_ready,
vc_handle slave_3_aw_bits_addr,
vc_handle slave_3_aw_bits_id,
vc_handle slave_3_aw_bits_size,
vc_handle slave_3_aw_bits_len,
vc_handle slave_3_w_valid,
vc_handle slave_3_w_ready,
vc_handle slave_3_w_bits_strb,
vc_handle slave_3_w_bits_data,
vc_handle slave_3_w_bits_last,
vc_handle slave_3_r_valid,
vc_handle slave_3_r_ready,
vc_handle slave_3_r_bits_resp,
vc_handle slave_3_r_bits_id,
vc_handle slave_3_r_bits_data,
vc_handle slave_3_r_bits_last,
vc_handle slave_3_b_valid,
vc_handle slave_3_b_ready,
vc_handle slave_3_b_bits_resp,
vc_handle slave_3_b_bits_id
) {
mmio_f1_t *m, *d;
assert(m = dynamic_cast<mmio_f1_t*>(master.get()));
assert(d = dynamic_cast<mmio_f1_t*>(dma.get()));
assert(DMA_STRB_SIZE <= 2);
uint32_t master_r_data[MASTER_DATA_SIZE];
for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
master_r_data[i] = vc_4stVectorRef(master_r_bits_data)[i].d;
}
uint32_t dma_r_data[DMA_DATA_SIZE];
for (size_t i = 0 ; i < DMA_DATA_SIZE ; i++) {
dma_r_data[i] = vc_4stVectorRef(dma_r_bits_data)[i].d;
}
uint32_t slave_0_w_data[SLAVE_DATA_SIZE];
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
slave_0_w_data[i] = vc_4stVectorRef(slave_0_w_bits_data)[i].d;
}
uint32_t slave_1_w_data[SLAVE_DATA_SIZE];
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
slave_1_w_data[i] = vc_4stVectorRef(slave_1_w_bits_data)[i].d;
}
uint32_t slave_2_w_data[SLAVE_DATA_SIZE];
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
slave_2_w_data[i] = vc_4stVectorRef(slave_2_w_bits_data)[i].d;
}
uint32_t slave_3_w_data[SLAVE_DATA_SIZE];
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
slave_3_w_data[i] = vc_4stVectorRef(slave_3_w_bits_data)[i].d;
}
m->tick(
vcs_rst,
vc_getScalar(master_ar_ready),
vc_getScalar(master_aw_ready),
vc_getScalar(master_w_ready),
vc_4stVectorRef(master_r_bits_id)->d,
master_r_data,
vc_getScalar(master_r_bits_last),
vc_getScalar(master_r_valid),
vc_4stVectorRef(master_b_bits_id)->d,
vc_getScalar(master_b_valid)
);
d->tick(
vcs_rst,
vc_getScalar(dma_ar_ready),
vc_getScalar(dma_aw_ready),
vc_getScalar(dma_w_ready),
vc_4stVectorRef(dma_r_bits_id)->d,
dma_r_data,
vc_getScalar(dma_r_bits_last),
vc_getScalar(dma_r_valid),
vc_4stVectorRef(dma_b_bits_id)->d,
vc_getScalar(dma_b_valid)
);
slave[0]->tick(
vcs_rst,
vc_getScalar(slave_0_ar_valid),
vc_4stVectorRef(slave_0_ar_bits_addr)->d,
vc_4stVectorRef(slave_0_ar_bits_id)->d,
vc_4stVectorRef(slave_0_ar_bits_size)->d,
vc_4stVectorRef(slave_0_ar_bits_len)->d,
vc_getScalar(slave_0_aw_valid),
vc_4stVectorRef(slave_0_aw_bits_addr)->d,
vc_4stVectorRef(slave_0_aw_bits_id)->d,
vc_4stVectorRef(slave_0_aw_bits_size)->d,
vc_4stVectorRef(slave_0_aw_bits_len)->d,
vc_getScalar(slave_0_w_valid),
vc_4stVectorRef(slave_0_w_bits_strb)->d,
slave_0_w_data,
vc_getScalar(slave_0_w_bits_last),
vc_getScalar(slave_0_r_ready),
vc_getScalar(slave_0_b_ready)
);
slave[1]->tick(
vcs_rst,
vc_getScalar(slave_1_ar_valid),
vc_4stVectorRef(slave_1_ar_bits_addr)->d,
vc_4stVectorRef(slave_1_ar_bits_id)->d,
vc_4stVectorRef(slave_1_ar_bits_size)->d,
vc_4stVectorRef(slave_1_ar_bits_len)->d,
vc_getScalar(slave_1_aw_valid),
vc_4stVectorRef(slave_1_aw_bits_addr)->d,
vc_4stVectorRef(slave_1_aw_bits_id)->d,
vc_4stVectorRef(slave_1_aw_bits_size)->d,
vc_4stVectorRef(slave_1_aw_bits_len)->d,
vc_getScalar(slave_1_w_valid),
vc_4stVectorRef(slave_1_w_bits_strb)->d,
slave_1_w_data,
vc_getScalar(slave_1_w_bits_last),
vc_getScalar(slave_1_r_ready),
vc_getScalar(slave_1_b_ready)
);
slave[2]->tick(
vcs_rst,
vc_getScalar(slave_2_ar_valid),
vc_4stVectorRef(slave_2_ar_bits_addr)->d,
vc_4stVectorRef(slave_2_ar_bits_id)->d,
vc_4stVectorRef(slave_2_ar_bits_size)->d,
vc_4stVectorRef(slave_2_ar_bits_len)->d,
vc_getScalar(slave_2_aw_valid),
vc_4stVectorRef(slave_2_aw_bits_addr)->d,
vc_4stVectorRef(slave_2_aw_bits_id)->d,
vc_4stVectorRef(slave_2_aw_bits_size)->d,
vc_4stVectorRef(slave_2_aw_bits_len)->d,
vc_getScalar(slave_2_w_valid),
vc_4stVectorRef(slave_2_w_bits_strb)->d,
slave_2_w_data,
vc_getScalar(slave_2_w_bits_last),
vc_getScalar(slave_2_r_ready),
vc_getScalar(slave_2_b_ready)
);
slave[3]->tick(
vcs_rst,
vc_getScalar(slave_3_ar_valid),
vc_4stVectorRef(slave_3_ar_bits_addr)->d,
vc_4stVectorRef(slave_3_ar_bits_id)->d,
vc_4stVectorRef(slave_3_ar_bits_size)->d,
vc_4stVectorRef(slave_3_ar_bits_len)->d,
vc_getScalar(slave_3_aw_valid),
vc_4stVectorRef(slave_3_aw_bits_addr)->d,
vc_4stVectorRef(slave_3_aw_bits_id)->d,
vc_4stVectorRef(slave_3_aw_bits_size)->d,
vc_4stVectorRef(slave_3_aw_bits_len)->d,
vc_getScalar(slave_3_w_valid),
vc_4stVectorRef(slave_3_w_bits_strb)->d,
slave_3_w_data,
vc_getScalar(slave_3_w_bits_last),
vc_getScalar(slave_3_r_ready),
vc_getScalar(slave_3_b_ready)
);
if (!vcs_fin) host->switch_to();
else vcs_fin = false;
vc_putScalar(master_aw_valid, m->aw_valid());
vc_putScalar(master_ar_valid, m->ar_valid());
vc_putScalar(master_w_valid, m->w_valid());
vc_putScalar(master_w_bits_last, m->w_last());
vc_putScalar(master_r_ready, m->r_ready());
vc_putScalar(master_b_ready, m->b_ready());
vec32 md[MASTER_DATA_SIZE];
md[0].c = 0;
md[0].d = m->aw_id();
vc_put4stVector(master_aw_bits_id, md);
md[0].c = 0;
md[0].d = m->aw_addr();
vc_put4stVector(master_aw_bits_addr, md);
md[0].c = 0;
md[0].d = m->aw_size();
vc_put4stVector(master_aw_bits_size, md);
md[0].c = 0;
md[0].d = m->aw_len();
vc_put4stVector(master_aw_bits_len, md);
md[0].c = 0;
md[0].d = m->ar_id();
vc_put4stVector(master_ar_bits_id, md);
md[0].c = 0;
md[0].d = m->ar_addr();
vc_put4stVector(master_ar_bits_addr, md);
md[0].c = 0;
md[0].d = m->ar_size();
vc_put4stVector(master_ar_bits_size, md);
md[0].c = 0;
md[0].d = m->ar_len();
vc_put4stVector(master_ar_bits_len, md);
md[0].c = 0;
md[0].d = m->w_strb();
vc_put4stVector(master_w_bits_strb, md);
for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
md[i].c = 0;
md[i].d = ((uint32_t*) m->w_data())[i];
}
vc_put4stVector(master_w_bits_data, md);
vc_putScalar(dma_aw_valid, d->aw_valid());
vc_putScalar(dma_ar_valid, d->ar_valid());
vc_putScalar(dma_w_valid, d->w_valid());
vc_putScalar(dma_w_bits_last, d->w_last());
vc_putScalar(dma_r_ready, d->r_ready());
vc_putScalar(dma_b_ready, d->b_ready());
vec32 dd[DMA_DATA_SIZE];
dd[0].c = 0;
dd[0].d = d->aw_id();
vc_put4stVector(dma_aw_bits_id, dd);
dd[0].c = 0;
dd[0].d = d->aw_addr();
dd[1].c = 0;
dd[1].d = d->aw_addr() >> 32;
vc_put4stVector(dma_aw_bits_addr, dd);
dd[0].c = 0;
dd[0].d = d->aw_size();
vc_put4stVector(dma_aw_bits_size, dd);
dd[0].c = 0;
dd[0].d = d->aw_len();
vc_put4stVector(dma_aw_bits_len, dd);
dd[0].c = 0;
dd[0].d = d->ar_id();
vc_put4stVector(dma_ar_bits_id, dd);
dd[0].c = 0;
dd[0].d = d->ar_addr();
dd[1].c = 0;
dd[1].d = d->ar_addr() >> 32;
vc_put4stVector(dma_ar_bits_addr, dd);
dd[0].c = 0;
dd[0].d = d->ar_size();
vc_put4stVector(dma_ar_bits_size, dd);
dd[0].c = 0;
dd[0].d = d->ar_len();
vc_put4stVector(dma_ar_bits_len, dd);
auto strb = d->w_strb();
for (size_t i = 0 ; i < DMA_STRB_SIZE ; i++) {
dd[i].c = 0;
dd[i].d = ((uint32_t*)(&strb))[i];
}
vc_put4stVector(dma_w_bits_strb, dd);
for (size_t i = 0 ; i < DMA_DATA_SIZE ; i++) {
dd[i].c = 0;
dd[i].d = ((uint32_t*) d->w_data())[i];
}
vc_put4stVector(dma_w_bits_data, dd);
vc_putScalar(slave_0_aw_ready, slave[0]->aw_ready());
vc_putScalar(slave_0_ar_ready, slave[0]->ar_ready());
vc_putScalar(slave_0_w_ready, slave[0]->w_ready());
vc_putScalar(slave_0_b_valid, slave[0]->b_valid());
vc_putScalar(slave_0_r_valid, slave[0]->r_valid());
vc_putScalar(slave_0_r_bits_last, slave[0]->r_last());
vc_putScalar(slave_1_aw_ready, slave[1]->aw_ready());
vc_putScalar(slave_1_ar_ready, slave[1]->ar_ready());
vc_putScalar(slave_1_w_ready, slave[1]->w_ready());
vc_putScalar(slave_1_b_valid, slave[1]->b_valid());
vc_putScalar(slave_1_r_valid, slave[1]->r_valid());
vc_putScalar(slave_1_r_bits_last, slave[1]->r_last());
vc_putScalar(slave_2_aw_ready, slave[2]->aw_ready());
vc_putScalar(slave_2_ar_ready, slave[2]->ar_ready());
vc_putScalar(slave_2_w_ready, slave[2]->w_ready());
vc_putScalar(slave_2_b_valid, slave[2]->b_valid());
vc_putScalar(slave_2_r_valid, slave[2]->r_valid());
vc_putScalar(slave_2_r_bits_last, slave[2]->r_last());
vc_putScalar(slave_3_aw_ready, slave[3]->aw_ready());
vc_putScalar(slave_3_ar_ready, slave[3]->ar_ready());
vc_putScalar(slave_3_w_ready, slave[3]->w_ready());
vc_putScalar(slave_3_b_valid, slave[3]->b_valid());
vc_putScalar(slave_3_r_valid, slave[3]->r_valid());
vc_putScalar(slave_3_r_bits_last, slave[3]->r_last());
vec32 sd[SLAVE_DATA_SIZE];
sd[0].c = 0;
sd[0].d = slave[0]->b_id();
vc_put4stVector(slave_0_b_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[0]->b_resp();
vc_put4stVector(slave_0_b_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[0]->r_id();
vc_put4stVector(slave_0_r_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[0]->r_resp();
vc_put4stVector(slave_0_r_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[1]->b_id();
vc_put4stVector(slave_1_b_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[1]->b_resp();
vc_put4stVector(slave_1_b_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[1]->r_id();
vc_put4stVector(slave_1_r_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[1]->r_resp();
vc_put4stVector(slave_1_r_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[2]->b_id();
vc_put4stVector(slave_2_b_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[2]->b_resp();
vc_put4stVector(slave_2_b_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[2]->r_id();
vc_put4stVector(slave_2_r_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[2]->r_resp();
vc_put4stVector(slave_2_r_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[3]->b_id();
vc_put4stVector(slave_3_b_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[3]->b_resp();
vc_put4stVector(slave_3_b_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave[3]->r_id();
vc_put4stVector(slave_3_r_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave[3]->r_resp();
vc_put4stVector(slave_3_r_bits_resp, sd);
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
sd[i].c = 0;
sd[i].d = ((uint32_t*) slave[0]->r_data())[i];
}
vc_put4stVector(slave_0_r_bits_data, sd);
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
sd[i].c = 0;
sd[i].d = ((uint32_t*) slave[1]->r_data())[i];
}
vc_put4stVector(slave_1_r_bits_data, sd);
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
sd[i].c = 0;
sd[i].d = ((uint32_t*) slave[2]->r_data())[i];
}
vc_put4stVector(slave_2_r_bits_data, sd);
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
sd[i].c = 0;
sd[i].d = ((uint32_t*) slave[3]->r_data())[i];
}
vc_put4stVector(slave_3_r_bits_data, sd);
vc_putScalar(reset, vcs_rst);
vc_putScalar(fin, vcs_fin);
main_time++;
}
}
#else
extern PLATFORM_TYPE* top;
#if VM_TRACE
extern VerilatedVcdC* tfp;
#endif // VM_TRACE
void tick() {
mmio_f1_t *m, *d;
assert(m = dynamic_cast<mmio_f1_t*>(master.get()));
assert(d = dynamic_cast<mmio_f1_t*>(dma.get()));
// ASSUMPTION: All models have *no* combinational paths through I/O
// Step 1: Clock lo -> propagate signals between DUT and software models
top->io_master_aw_valid = m->aw_valid();
top->io_master_aw_bits_id = m->aw_id();
top->io_master_aw_bits_addr = m->aw_addr();
top->io_master_aw_bits_size = m->aw_size();
top->io_master_aw_bits_len = m->aw_len();
top->io_master_ar_valid = m->ar_valid();
top->io_master_ar_bits_id = m->ar_id();
top->io_master_ar_bits_addr = m->ar_addr();
top->io_master_ar_bits_size = m->ar_size();
top->io_master_ar_bits_len = m->ar_len();
top->io_master_w_valid = m->w_valid();
top->io_master_w_bits_strb = m->w_strb();
top->io_master_w_bits_last = m->w_last();
top->io_master_r_ready = m->r_ready();
top->io_master_b_ready = m->b_ready();
#if CTRL_DATA_BITS > 64
memcpy(top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#else
memcpy(&top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#endif
top->io_dma_aw_valid = d->aw_valid();
top->io_dma_aw_bits_id = d->aw_id();
top->io_dma_aw_bits_addr = d->aw_addr();
top->io_dma_aw_bits_size = d->aw_size();
top->io_dma_aw_bits_len = d->aw_len();
top->io_dma_ar_valid = d->ar_valid();
top->io_dma_ar_bits_id = d->ar_id();
top->io_dma_ar_bits_addr = d->ar_addr();
top->io_dma_ar_bits_size = d->ar_size();
top->io_dma_ar_bits_len = d->ar_len();
top->io_dma_w_valid = d->w_valid();
top->io_dma_w_bits_strb = d->w_strb();
top->io_dma_w_bits_last = d->w_last();
top->io_dma_r_ready = d->r_ready();
top->io_dma_b_ready = d->b_ready();
#if DMA_DATA_BITS > 64
memcpy(top->io_dma_w_bits_data, d->w_data(), DMA_WIDTH);
#else
memcpy(&top->io_dma_w_bits_data, d->w_data(), DMA_WIDTH);
#endif
top->io_slave_0_aw_ready = slave[0]->aw_ready();
top->io_slave_0_ar_ready = slave[0]->ar_ready();
top->io_slave_0_w_ready = slave[0]->w_ready();
top->io_slave_0_b_valid = slave[0]->b_valid();
top->io_slave_0_b_bits_id = slave[0]->b_id();
top->io_slave_0_b_bits_resp = slave[0]->b_resp();
top->io_slave_0_r_valid = slave[0]->r_valid();
top->io_slave_0_r_bits_id = slave[0]->r_id();
top->io_slave_0_r_bits_resp = slave[0]->r_resp();
top->io_slave_0_r_bits_last = slave[0]->r_last();
top->io_slave_1_aw_ready = slave[1]->aw_ready();
top->io_slave_1_ar_ready = slave[1]->ar_ready();
top->io_slave_1_w_ready = slave[1]->w_ready();
top->io_slave_1_b_valid = slave[1]->b_valid();
top->io_slave_1_b_bits_id = slave[1]->b_id();
top->io_slave_1_b_bits_resp = slave[1]->b_resp();
top->io_slave_1_r_valid = slave[1]->r_valid();
top->io_slave_1_r_bits_id = slave[1]->r_id();
top->io_slave_1_r_bits_resp = slave[1]->r_resp();
top->io_slave_1_r_bits_last = slave[1]->r_last();
top->io_slave_2_aw_ready = slave[2]->aw_ready();
top->io_slave_2_ar_ready = slave[2]->ar_ready();
top->io_slave_2_w_ready = slave[2]->w_ready();
top->io_slave_2_b_valid = slave[2]->b_valid();
top->io_slave_2_b_bits_id = slave[2]->b_id();
top->io_slave_2_b_bits_resp = slave[2]->b_resp();
top->io_slave_2_r_valid = slave[2]->r_valid();
top->io_slave_2_r_bits_id = slave[2]->r_id();
top->io_slave_2_r_bits_resp = slave[2]->r_resp();
top->io_slave_2_r_bits_last = slave[2]->r_last();
top->io_slave_3_aw_ready = slave[3]->aw_ready();
top->io_slave_3_ar_ready = slave[3]->ar_ready();
top->io_slave_3_w_ready = slave[3]->w_ready();
top->io_slave_3_b_valid = slave[3]->b_valid();
top->io_slave_3_b_bits_id = slave[3]->b_id();
top->io_slave_3_b_bits_resp = slave[3]->b_resp();
top->io_slave_3_r_valid = slave[3]->r_valid();
top->io_slave_3_r_bits_id = slave[3]->r_id();
top->io_slave_3_r_bits_resp = slave[3]->r_resp();
top->io_slave_3_r_bits_last = slave[3]->r_last();
#if MEM_DATA_BITS > 64
memcpy(top->io_slave_0_r_bits_data, slave[0]->r_data(), MEM_WIDTH);
memcpy(top->io_slave_1_r_bits_data, slave[1]->r_data(), MEM_WIDTH);
memcpy(top->io_slave_2_r_bits_data, slave[2]->r_data(), MEM_WIDTH);
memcpy(top->io_slave_3_r_bits_data, slave[3]->r_data(), MEM_WIDTH);
#else
memcpy(&top->io_slave_0_r_bits_data, slave[0]->r_data(), MEM_WIDTH);
memcpy(&top->io_slave_1_r_bits_data, slave[1]->r_data(), MEM_WIDTH);
memcpy(&top->io_slave_2_r_bits_data, slave[2]->r_data(), MEM_WIDTH);
memcpy(&top->io_slave_3_r_bits_data, slave[3]->r_data(), MEM_WIDTH);
#endif
top->eval();
#if VM_TRACE
if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
main_time++;
top->clock = 0;
top->eval(); // This shouldn't do much
#if VM_TRACE
if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
main_time++;
// Step 2: Clock high, tick all software models and evaluate DUT with posedge
m->tick(
top->reset,
top->io_master_ar_ready,
top->io_master_aw_ready,
top->io_master_w_ready,
top->io_master_r_bits_id,
#if CTRL_DATA_BITS > 64
top->io_master_r_bits_data,
#else
&top->io_master_r_bits_data,
#endif
top->io_master_r_bits_last,
top->io_master_r_valid,
top->io_master_b_bits_id,
top->io_master_b_valid
);
d->tick(
top->reset,
top->io_dma_ar_ready,
top->io_dma_aw_ready,
top->io_dma_w_ready,
top->io_dma_r_bits_id,
#if DMA_DATA_BITS > 64
top->io_dma_r_bits_data,
#else
&top->io_dma_r_bits_data,
#endif
top->io_dma_r_bits_last,
top->io_dma_r_valid,
top->io_dma_b_bits_id,
top->io_dma_b_valid
);
slave[0]->tick(
top->reset,
top->io_slave_0_ar_valid,
top->io_slave_0_ar_bits_addr,
top->io_slave_0_ar_bits_id,
top->io_slave_0_ar_bits_size,
top->io_slave_0_ar_bits_len,
top->io_slave_0_aw_valid,
top->io_slave_0_aw_bits_addr,
top->io_slave_0_aw_bits_id,
top->io_slave_0_aw_bits_size,
top->io_slave_0_aw_bits_len,
top->io_slave_0_w_valid,
top->io_slave_0_w_bits_strb,
#if MEM_DATA_BITS > 64
top->io_slave_0_w_bits_data,
#else
&top->io_slave_0_w_bits_data,
#endif
top->io_slave_0_w_bits_last,
top->io_slave_0_r_ready,
top->io_slave_0_b_ready
);
slave[1]->tick(
top->reset,
top->io_slave_1_ar_valid,
top->io_slave_1_ar_bits_addr,
top->io_slave_1_ar_bits_id,
top->io_slave_1_ar_bits_size,
top->io_slave_1_ar_bits_len,
top->io_slave_1_aw_valid,
top->io_slave_1_aw_bits_addr,
top->io_slave_1_aw_bits_id,
top->io_slave_1_aw_bits_size,
top->io_slave_1_aw_bits_len,
top->io_slave_1_w_valid,
top->io_slave_1_w_bits_strb,
#if MEM_DATA_BITS > 64
top->io_slave_1_w_bits_data,
#else
&top->io_slave_1_w_bits_data,
#endif
top->io_slave_1_w_bits_last,
top->io_slave_1_r_ready,
top->io_slave_1_b_ready
);
slave[2]->tick(
top->reset,
top->io_slave_2_ar_valid,
top->io_slave_2_ar_bits_addr,
top->io_slave_2_ar_bits_id,
top->io_slave_2_ar_bits_size,
top->io_slave_2_ar_bits_len,
top->io_slave_2_aw_valid,
top->io_slave_2_aw_bits_addr,
top->io_slave_2_aw_bits_id,
top->io_slave_2_aw_bits_size,
top->io_slave_2_aw_bits_len,
top->io_slave_2_w_valid,
top->io_slave_2_w_bits_strb,
#if MEM_DATA_BITS > 64
top->io_slave_2_w_bits_data,
#else
&top->io_slave_2_w_bits_data,
#endif
top->io_slave_2_w_bits_last,
top->io_slave_2_r_ready,
top->io_slave_2_b_ready
);
slave[3]->tick(
top->reset,
top->io_slave_3_ar_valid,
top->io_slave_3_ar_bits_addr,
top->io_slave_3_ar_bits_id,
top->io_slave_3_ar_bits_size,
top->io_slave_3_ar_bits_len,
top->io_slave_3_aw_valid,
top->io_slave_3_aw_bits_addr,
top->io_slave_3_aw_bits_id,
top->io_slave_3_aw_bits_size,
top->io_slave_3_aw_bits_len,
top->io_slave_3_w_valid,
top->io_slave_3_w_bits_strb,
#if MEM_DATA_BITS > 64
top->io_slave_3_w_bits_data,
#else
&top->io_slave_3_w_bits_data,
#endif
top->io_slave_3_w_bits_last,
top->io_slave_3_r_ready,
top->io_slave_3_b_ready
);
top->clock = 1;
top->eval();
}
#endif // VCS


@ -0,0 +1,98 @@
#ifndef __MMIO_F1_H
#define __MMIO_F1_H
#include "mmio.h"
#include <cstring>
#include <vector>
#include <queue>
struct mmio_req_addr_t
{
size_t id;
uint64_t addr;
size_t size;
size_t len;
mmio_req_addr_t(size_t id_, uint64_t addr_, size_t size_, size_t len_):
id(id_), addr(addr_), size(size_), len(len_) { }
};
struct mmio_req_data_t
{
char* data;
size_t strb;
bool last;
mmio_req_data_t(char* data_, size_t strb_, bool last_):
data(data_), strb(strb_), last(last_) { }
};
struct mmio_resp_data_t
{
size_t id;
char* data;
bool last;
mmio_resp_data_t(size_t id_, char* data_, bool last_):
id(id_), data(data_), last(last_) { }
};
class mmio_f1_t: public mmio_t
{
public:
mmio_f1_t(size_t size): read_inflight(false), write_inflight(false) {
dummy_data.resize(size);
}
bool aw_valid() { return !aw.empty() && !write_inflight; }
size_t aw_id() { return aw_valid() ? aw.front().id : 0; }
uint64_t aw_addr() { return aw_valid() ? aw.front().addr : 0; }
size_t aw_size() { return aw_valid() ? aw.front().size : 0; }
size_t aw_len() { return aw_valid() ? aw.front().len : 0; }
bool ar_valid() { return !ar.empty() && !read_inflight; }
size_t ar_id() { return ar_valid() ? ar.front().id : 0; }
uint64_t ar_addr() { return ar_valid() ? ar.front().addr : 0; }
size_t ar_size() { return ar_valid() ? ar.front().size : 0; }
size_t ar_len() { return ar_valid() ? ar.front().len : 0; }
bool w_valid() { return !w.empty(); }
size_t w_strb() { return w_valid() ? w.front().strb : 0; }
bool w_last() { return w_valid() ? w.front().last : false; }
void* w_data() { return w_valid() ? w.front().data : &dummy_data[0]; }
bool r_ready() { return read_inflight; }
bool b_ready() { return write_inflight; }
void tick
(
bool reset,
bool ar_ready,
bool aw_ready,
bool w_ready,
size_t r_id,
void* r_data,
bool r_last,
bool r_valid,
size_t b_id,
bool b_valid
);
virtual void read_req(uint64_t addr, size_t size, size_t len);
virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb);
virtual bool read_resp(void *data);
virtual bool write_resp();
private:
std::queue<mmio_req_addr_t> ar;
std::queue<mmio_req_addr_t> aw;
std::queue<mmio_req_data_t> w;
std::queue<mmio_resp_data_t> r;
std::queue<size_t> b;
bool read_inflight;
bool write_inflight;
std::vector<char> dummy_data;
};
#endif // __MMIO_F1_H


@ -0,0 +1,420 @@
// See LICENSE for license details.
#include "mmio_zynq.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include <memory>
#include <cassert>
#include <cmath>
#ifdef VCS
#include <DirectC.h>
#include "midas_context.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif // VM_TRACE
#endif
void mmio_zynq_t::read_req(uint64_t addr, size_t size, size_t len) {
mmio_req_addr_t ar(0, addr, size, len);
this->ar.push(ar);
}
void mmio_zynq_t::write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) {
int nbytes = 1 << size;
mmio_req_addr_t aw(0, addr, size, len);
this->aw.push(aw);
for (int i = 0; i < len + 1; i++) {
mmio_req_data_t w(((char*) data) + i * nbytes, strb[i], i == len);
this->w.push(w);
}
}
void mmio_zynq_t::tick(
bool reset,
bool ar_ready,
bool aw_ready,
bool w_ready,
size_t r_id,
void* r_data,
bool r_last,
bool r_valid,
size_t b_id,
bool b_valid)
{
const bool ar_fire = !reset && ar_ready && ar_valid();
const bool aw_fire = !reset && aw_ready && aw_valid();
const bool w_fire = !reset && w_ready && w_valid();
const bool r_fire = !reset && r_valid && r_ready();
const bool b_fire = !reset && b_valid && b_ready();
if (ar_fire) read_inflight = true;
if (aw_fire) write_inflight = true;
if (w_fire) this->w.pop();
if (r_fire) {
char* dat = (char*)malloc(dummy_data.size());
memcpy(dat, (char*)r_data, dummy_data.size());
mmio_resp_data_t r(r_id, dat, r_last);
this->r.push(r);
}
if (b_fire) {
this->b.push(b_id);
}
}
bool mmio_zynq_t::read_resp(void* data) {
if (ar.empty() || r.size() <= ar.front().len) {
return false;
} else {
auto ar = this->ar.front();
size_t word_size = 1 << ar.size;
for (size_t i = 0 ; i <= ar.len ; i++) {
auto r = this->r.front();
assert(ar.id == r.id && (i < ar.len || r.last));
memcpy(((char*)data) + i * word_size, r.data, word_size);
free(r.data);
this->r.pop();
}
this->ar.pop();
read_inflight = false;
return true;
}
}
bool mmio_zynq_t::write_resp() {
if (aw.empty() || b.empty()) {
return false;
} else {
assert(aw.front().id == b.front());
aw.pop();
b.pop();
write_inflight = false;
return true;
}
}
extern uint64_t main_time;
extern std::unique_ptr<mmio_t> master;
std::unique_ptr<mm_t> slave;
void* init(uint64_t memsize, bool dramsim) {
master.reset(new mmio_zynq_t);
slave.reset(dramsim ? (mm_t*) new mm_dramsim2_t(1 << MEM_ID_BITS) : (mm_t*) new mm_magic_t);
slave->init(memsize, MEM_WIDTH, 64);
return slave->get_data();
}
#ifdef VCS
static const size_t MASTER_DATA_SIZE = MMIO_WIDTH / sizeof(uint32_t);
static const size_t SLAVE_DATA_SIZE = MEM_WIDTH / sizeof(uint32_t);
extern midas_context_t* host;
extern bool vcs_fin;
extern bool vcs_rst;
extern "C" {
void tick(
vc_handle reset,
vc_handle fin,
vc_handle master_ar_valid,
vc_handle master_ar_ready,
vc_handle master_ar_bits_addr,
vc_handle master_ar_bits_id,
vc_handle master_ar_bits_size,
vc_handle master_ar_bits_len,
vc_handle master_aw_valid,
vc_handle master_aw_ready,
vc_handle master_aw_bits_addr,
vc_handle master_aw_bits_id,
vc_handle master_aw_bits_size,
vc_handle master_aw_bits_len,
vc_handle master_w_valid,
vc_handle master_w_ready,
vc_handle master_w_bits_strb,
vc_handle master_w_bits_data,
vc_handle master_w_bits_last,
vc_handle master_r_valid,
vc_handle master_r_ready,
vc_handle master_r_bits_resp,
vc_handle master_r_bits_id,
vc_handle master_r_bits_data,
vc_handle master_r_bits_last,
vc_handle master_b_valid,
vc_handle master_b_ready,
vc_handle master_b_bits_resp,
vc_handle master_b_bits_id,
vc_handle slave_ar_valid,
vc_handle slave_ar_ready,
vc_handle slave_ar_bits_addr,
vc_handle slave_ar_bits_id,
vc_handle slave_ar_bits_size,
vc_handle slave_ar_bits_len,
vc_handle slave_aw_valid,
vc_handle slave_aw_ready,
vc_handle slave_aw_bits_addr,
vc_handle slave_aw_bits_id,
vc_handle slave_aw_bits_size,
vc_handle slave_aw_bits_len,
vc_handle slave_w_valid,
vc_handle slave_w_ready,
vc_handle slave_w_bits_strb,
vc_handle slave_w_bits_data,
vc_handle slave_w_bits_last,
vc_handle slave_r_valid,
vc_handle slave_r_ready,
vc_handle slave_r_bits_resp,
vc_handle slave_r_bits_id,
vc_handle slave_r_bits_data,
vc_handle slave_r_bits_last,
vc_handle slave_b_valid,
vc_handle slave_b_ready,
vc_handle slave_b_bits_resp,
vc_handle slave_b_bits_id
) {
mmio_zynq_t* m = dynamic_cast<mmio_zynq_t*>(master.get());
assert(m); // the cast must not be a side effect of assert(), which vanishes under NDEBUG
uint32_t master_r_data[MASTER_DATA_SIZE];
for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
master_r_data[i] = vc_4stVectorRef(master_r_bits_data)[i].d;
}
uint32_t slave_w_data[SLAVE_DATA_SIZE];
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
slave_w_data[i] = vc_4stVectorRef(slave_w_bits_data)[i].d;
}
vc_putScalar(master_aw_valid, m->aw_valid());
vc_putScalar(master_ar_valid, m->ar_valid());
vc_putScalar(master_w_valid, m->w_valid());
vc_putScalar(master_w_bits_last, m->w_last());
vc_putScalar(master_r_ready, m->r_ready());
vc_putScalar(master_b_ready, m->b_ready());
vec32 md[MASTER_DATA_SIZE];
md[0].c = 0;
md[0].d = m->aw_id();
vc_put4stVector(master_aw_bits_id, md);
md[0].c = 0;
md[0].d = m->aw_addr();
vc_put4stVector(master_aw_bits_addr, md);
md[0].c = 0;
md[0].d = m->aw_size();
vc_put4stVector(master_aw_bits_size, md);
md[0].c = 0;
md[0].d = m->aw_len();
vc_put4stVector(master_aw_bits_len, md);
md[0].c = 0;
md[0].d = m->ar_id();
vc_put4stVector(master_ar_bits_id, md);
md[0].c = 0;
md[0].d = m->ar_addr();
vc_put4stVector(master_ar_bits_addr, md);
md[0].c = 0;
md[0].d = m->ar_size();
vc_put4stVector(master_ar_bits_size, md);
md[0].c = 0;
md[0].d = m->ar_len();
vc_put4stVector(master_ar_bits_len, md);
md[0].c = 0;
md[0].d = m->w_strb();
vc_put4stVector(master_w_bits_strb, md);
for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
md[i].c = 0;
md[i].d = ((uint32_t*) m->w_data())[i];
}
vc_put4stVector(master_w_bits_data, md);
m->tick(
vcs_rst,
vc_getScalar(master_ar_ready),
vc_getScalar(master_aw_ready),
vc_getScalar(master_w_ready),
vc_4stVectorRef(master_r_bits_id)->d,
master_r_data,
vc_getScalar(master_r_bits_last),
vc_getScalar(master_r_valid),
vc_4stVectorRef(master_b_bits_id)->d,
vc_getScalar(master_b_valid)
);
slave->tick(
vcs_rst,
vc_getScalar(slave_ar_valid),
vc_4stVectorRef(slave_ar_bits_addr)->d,
vc_4stVectorRef(slave_ar_bits_id)->d,
vc_4stVectorRef(slave_ar_bits_size)->d,
vc_4stVectorRef(slave_ar_bits_len)->d,
vc_getScalar(slave_aw_valid),
vc_4stVectorRef(slave_aw_bits_addr)->d,
vc_4stVectorRef(slave_aw_bits_id)->d,
vc_4stVectorRef(slave_aw_bits_size)->d,
vc_4stVectorRef(slave_aw_bits_len)->d,
vc_getScalar(slave_w_valid),
vc_4stVectorRef(slave_w_bits_strb)->d,
slave_w_data,
vc_getScalar(slave_w_bits_last),
vc_getScalar(slave_r_ready),
vc_getScalar(slave_b_ready)
);
vc_putScalar(slave_aw_ready, slave->aw_ready());
vc_putScalar(slave_ar_ready, slave->ar_ready());
vc_putScalar(slave_w_ready, slave->w_ready());
vc_putScalar(slave_b_valid, slave->b_valid());
vc_putScalar(slave_r_valid, slave->r_valid());
vc_putScalar(slave_r_bits_last, slave->r_last());
vec32 sd[SLAVE_DATA_SIZE];
sd[0].c = 0;
sd[0].d = slave->b_id();
vc_put4stVector(slave_b_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave->b_resp();
vc_put4stVector(slave_b_bits_resp, sd);
sd[0].c = 0;
sd[0].d = slave->r_id();
vc_put4stVector(slave_r_bits_id, sd);
sd[0].c = 0;
sd[0].d = slave->r_resp();
vc_put4stVector(slave_r_bits_resp, sd);
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
sd[i].c = 0;
sd[i].d = ((uint32_t*) slave->r_data())[i];
}
vc_put4stVector(slave_r_bits_data, sd);
vc_putScalar(reset, vcs_rst);
vc_putScalar(fin, vcs_fin);
main_time++;
if (!vcs_fin) host->switch_to();
else vcs_fin = false;
}
}
#else
extern PLATFORM_TYPE* top;
#if VM_TRACE
extern VerilatedVcdC* tfp;
#endif // VM_TRACE
void tick() {
mmio_zynq_t* m = dynamic_cast<mmio_zynq_t*>(master.get());
assert(m); // the cast must not be a side effect of assert(), which vanishes under NDEBUG
top->clock = 1;
top->eval();
#if VM_TRACE
if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
main_time++;
top->io_master_aw_valid = m->aw_valid();
top->io_master_aw_bits_id = m->aw_id();
top->io_master_aw_bits_addr = m->aw_addr();
top->io_master_aw_bits_size = m->aw_size();
top->io_master_aw_bits_len = m->aw_len();
top->io_master_ar_valid = m->ar_valid();
top->io_master_ar_bits_id = m->ar_id();
top->io_master_ar_bits_addr = m->ar_addr();
top->io_master_ar_bits_size = m->ar_size();
top->io_master_ar_bits_len = m->ar_len();
top->io_master_w_valid = m->w_valid();
top->io_master_w_bits_strb = m->w_strb();
top->io_master_w_bits_last = m->w_last();
top->io_master_r_ready = m->r_ready();
top->io_master_b_ready = m->b_ready();
#if CTRL_DATA_BITS > 64
memcpy(top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#else
memcpy(&top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#endif
m->tick(
top->reset,
top->io_master_ar_ready,
top->io_master_aw_ready,
top->io_master_w_ready,
top->io_master_r_bits_id,
#if CTRL_DATA_BITS > 64
top->io_master_r_bits_data,
#else
&top->io_master_r_bits_data,
#endif
top->io_master_r_bits_last,
top->io_master_r_valid,
top->io_master_b_bits_id,
top->io_master_b_valid
);
top->io_slave_aw_ready = slave->aw_ready();
top->io_slave_ar_ready = slave->ar_ready();
top->io_slave_w_ready = slave->w_ready();
top->io_slave_b_valid = slave->b_valid();
top->io_slave_b_bits_id = slave->b_id();
top->io_slave_b_bits_resp = slave->b_resp();
top->io_slave_r_valid = slave->r_valid();
top->io_slave_r_bits_id = slave->r_id();
top->io_slave_r_bits_resp = slave->r_resp();
top->io_slave_r_bits_last = slave->r_last();
#if MEM_DATA_BITS > 64
memcpy(top->io_slave_r_bits_data, slave->r_data(), MEM_WIDTH);
#else
memcpy(&top->io_slave_r_bits_data, slave->r_data(), MEM_WIDTH);
#endif
top->clock = 0;
top->eval();
// The slave must be ticked while the clock is low so combinational paths settle before the posedge
slave->tick(
top->reset,
top->io_slave_ar_valid,
top->io_slave_ar_bits_addr,
top->io_slave_ar_bits_id,
top->io_slave_ar_bits_size,
top->io_slave_ar_bits_len,
top->io_slave_aw_valid,
top->io_slave_aw_bits_addr,
top->io_slave_aw_bits_id,
top->io_slave_aw_bits_size,
top->io_slave_aw_bits_len,
top->io_slave_w_valid,
top->io_slave_w_bits_strb,
#if MEM_DATA_BITS > 64
top->io_slave_w_bits_data,
#else
&top->io_slave_w_bits_data,
#endif
top->io_slave_w_bits_last,
top->io_slave_r_ready,
top->io_slave_b_ready
);
#if VM_TRACE
if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
main_time++;
}
#endif // VCS


@ -0,0 +1,100 @@
// See LICENSE for license details.
#ifndef __MMIO_ZYNQ_H
#define __MMIO_ZYNQ_H
#include "mmio.h"
#include <cstring>
#include <vector>
#include <queue>
struct mmio_req_addr_t
{
size_t id;
uint64_t addr;
size_t size;
size_t len;
mmio_req_addr_t(size_t id_, uint64_t addr_, size_t size_, size_t len_):
id(id_), addr(addr_), size(size_), len(len_) { }
};
struct mmio_req_data_t
{
char* data;
size_t strb;
bool last;
mmio_req_data_t(char* data_, size_t strb_, bool last_):
data(data_), strb(strb_), last(last_) { }
};
struct mmio_resp_data_t
{
size_t id;
char* data;
bool last;
mmio_resp_data_t(size_t id_, char* data_, bool last_):
id(id_), data(data_), last(last_) { }
};
class mmio_zynq_t: public mmio_t
{
public:
mmio_zynq_t(): read_inflight(false), write_inflight(false) {
dummy_data.resize(MMIO_WIDTH);
}
bool aw_valid() { return !aw.empty() && !write_inflight; }
size_t aw_id() { return aw_valid() ? aw.front().id : 0; }
uint64_t aw_addr() { return aw_valid() ? aw.front().addr : 0; }
size_t aw_size() { return aw_valid() ? aw.front().size : 0; }
size_t aw_len() { return aw_valid() ? aw.front().len : 0; }
bool ar_valid() { return !ar.empty() && !read_inflight; }
size_t ar_id() { return ar_valid() ? ar.front().id : 0; }
uint64_t ar_addr() { return ar_valid() ? ar.front().addr : 0; }
size_t ar_size() { return ar_valid() ? ar.front().size : 0; }
size_t ar_len() { return ar_valid() ? ar.front().len : 0; }
bool w_valid() { return !w.empty(); }
size_t w_strb() { return w_valid() ? w.front().strb : 0; }
bool w_last() { return w_valid() ? w.front().last : false; }
void* w_data() { return w_valid() ? w.front().data : &dummy_data[0]; }
bool r_ready() { return read_inflight; }
bool b_ready() { return write_inflight; }
void tick
(
bool reset,
bool ar_ready,
bool aw_ready,
bool w_ready,
size_t r_id,
void* r_data,
bool r_last,
bool r_valid,
size_t b_id,
bool b_valid
);
virtual void read_req(uint64_t addr, size_t size, size_t len);
virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb);
virtual bool read_resp(void *data);
virtual bool write_resp();
private:
std::queue<mmio_req_addr_t> ar;
std::queue<mmio_req_addr_t> aw;
std::queue<mmio_req_data_t> w;
std::queue<mmio_resp_data_t> r;
std::queue<size_t> b;
bool read_inflight;
bool write_inflight;
std::vector<char> dummy_data;
};
#endif // __MMIO_ZYNQ_H


@ -0,0 +1,25 @@
// See LICENSE for license details.
#ifndef __VCS_MAIN
#define __VCS_MAIN
extern "C" {
extern int vcs_main(int argc, char** argv);
}
struct target_args_t {
target_args_t(int c, char** v):
argc(c), argv(v) { }
int argc;
char** argv;
};
int target_thread(void *arg) {
target_args_t* targs = reinterpret_cast<target_args_t*>(arg);
int argc = targs->argc;
char** argv = targs->argv;
delete targs;
return vcs_main(argc, argv);
}
#endif // __VCS_MAIN


@ -0,0 +1,273 @@
// See LICENSE for license details.
#ifndef __REPLAY_H
#define __REPLAY_H
#include <vector>
#include <map>
#include <fstream>
#include <sstream>
#include <iostream>
#include <cassert>
#include <gmp.h>
#include "sample/sample.h"
enum PUT_VALUE_TYPE { PUT_DEPOSIT, PUT_FORCE };
static const char* PUT_VALUE_TYPE_STRING[2] = { "LOAD", "FORCE" };
template <class T> class replay_t {
public:
replay_t(): cycles(0L), log(false), pass(true), is_exit(false) {
mpz_init(one);
mpz_set_ui(one, 1);
}
virtual ~replay_t() {
for (auto& sample: samples) delete sample;
samples.clear();
mpz_clear(one);
}
void init(int argc, char** argv) {
std::vector<std::string> args(argv + 1, argv + argc);
for (auto &arg: args) {
if (arg.find("+sample=") == 0) {
load_samples(arg.c_str() + 8);
}
if (arg.find("+match=") == 0) {
load_match_points(arg.c_str() + 7);
}
if (arg.find("+verbose") == 0) {
log = true;
}
}
}
void reset(size_t n) {
size_t id = replay_data.signal_map["reset"];
put_value(replay_data.signals[id], one, PUT_DEPOSIT);
take_steps(n);
}
virtual void replay() {
for (auto& sample: samples) {
reset(5);
std::cerr << " * REPLAY AT CYCLE " << sample->get_cycle() << " * " << std::endl;
for (size_t i = 0 ; i < sample->get_cmds().size() ; i++) {
sample_inst_t* cmd = sample->get_cmds()[i];
if (step_t* p = dynamic_cast<step_t*>(cmd)) {
step(p->n);
}
else if (load_t* p = dynamic_cast<load_t*>(cmd)) {
auto signal = signals[p->type][p->id];
auto width = widths[p->type][p->id];
load(signal, width, *(p->value), PUT_DEPOSIT, p->idx);
}
else if (force_t* p = dynamic_cast<force_t*>(cmd)) {
auto signal = signals[p->type][p->id];
auto width = widths[p->type][p->id];
load(signal, width, *(p->value), PUT_FORCE, -1);
}
else if (poke_t* p = dynamic_cast<poke_t*>(cmd)) {
poke(signals[p->type][p->id], *(p->value));
}
else if (expect_t* p = dynamic_cast<expect_t*>(cmd)) {
pass &= expect(signals[p->type][p->id], *(p->value));
}
}
}
is_exit = true;
}
virtual int finish() {
fprintf(stderr, "[%s] Runs %" PRIu64 " cycles\n",
pass ? "PASS" : "FAIL", cycles);
return exitcode();
}
protected:
struct {
std::vector<T> signals;
std::map<std::string, size_t> signal_map;
} replay_data;
inline bool gate_level() { return !match_map.empty(); }
inline bool done() { return is_exit; }
inline int exitcode() { return pass ? EXIT_SUCCESS : EXIT_FAILURE; }
private:
mpz_t one; // useful constant
uint64_t cycles;
bool log;
bool pass;
bool is_exit;
std::vector<sample_t*> samples;
std::vector<std::vector<std::string>> signals;
std::vector<std::vector<size_t>> widths;
std::map<std::string, std::string> match_map;
void load_samples(const char* filename) {
std::ifstream file(filename);
if (!file) {
fprintf(stderr, "Cannot open %s\n", filename);
exit(EXIT_FAILURE);
}
std::string line;
size_t steps = 0;
sample_t* sample = NULL;
while (std::getline(file, line)) {
std::istringstream iss(line);
size_t type, t, width, id, n;
ssize_t idx;
uint64_t cycles;
std::string signal, valstr, dummy;
mpz_t *value = NULL;
iss >> type;
switch(static_cast<SAMPLE_INST_TYPE>(type)) {
case SIGNALS:
iss >> t >> signal >> width;
while(signals.size() <= t) signals.push_back(std::vector<std::string>());
while(widths.size() <= t) widths.push_back(std::vector<size_t>());
signals[t].push_back(signal);
widths[t].push_back(width);
break;
case CYCLE:
iss >> dummy >> cycles;
sample = new sample_t(cycles);
samples.push_back(sample);
steps = 0;
break;
case STATE_LOAD:
iss >> t >> id >> valstr >> idx;
value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_str(*value, valstr.c_str(), 16);
sample->add_cmd(new load_t(t, id, value, idx));
break;
case FORCE:
iss >> t >> id >> valstr;
value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_str(*value, valstr.c_str(), 16);
sample->add_cmd(new force_t(t, id, value));
break;
case POKE:
iss >> t >> id >> valstr;
value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_str(*value, valstr.c_str(), 16);
sample->add_cmd(new poke_t(t, id, value));
break;
case STEP:
iss >> n;
sample->add_cmd(new step_t(n));
steps += n;
break;
case EXPECT:
iss >> t >> id >> valstr;
value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_str(*value, valstr.c_str(), 16);
if (steps > 1) sample->add_cmd(new expect_t(t, id, value));
break;
default:
break;
}
}
file.close();
}
void load_match_points(const char* filename) {
std::ifstream file(filename);
if (!file) {
fprintf(stderr, "Cannot open %s\n", filename);
exit(EXIT_FAILURE);
}
std::string line;
while (std::getline(file, line)) {
std::istringstream iss(line);
std::string ref, impl;
iss >> ref >> impl;
match_map[ref] = impl;
}
}
virtual void take_steps(size_t) = 0;
virtual void put_value(T& sig, mpz_t& data, PUT_VALUE_TYPE type) = 0;
virtual void get_value(T& sig, mpz_t& data) = 0;
void step(size_t n) {
cycles += n;
if (log) std::cerr << " * STEP " << n << " -> " << cycles << " *" << std::endl;
take_steps(n);
}
T& get_signal(const std::string& node) {
auto it = replay_data.signal_map.find(node);
if (it == replay_data.signal_map.end()) {
std::cerr << "Cannot find " << node << " in the design" << std::endl;
assert(false);
}
return replay_data.signals[it->second];
}
void load_bit(const std::string& ref, mpz_t& bit, PUT_VALUE_TYPE tpe) {
auto it = match_map.find(ref);
if (it != match_map.end()) {
put_value(get_signal(it->second), bit, tpe);
}
}
void load(const std::string& node, size_t width, mpz_t& data, PUT_VALUE_TYPE tpe, int idx) {
std::string name = idx < 0 ? node : node + "[" + std::to_string(idx) + "]";
if (log) {
char* data_str = mpz_get_str(NULL, 16, data);
std::cerr << " * " << PUT_VALUE_TYPE_STRING[tpe] << " " << name
<< " <- 0x" << data_str << " *" << std::endl;
free(data_str);
}
if (!gate_level()) {
put_value(get_signal(name), data, tpe);
} else if (width == 1 && idx < 0) {
load_bit(name, data, tpe);
} else {
for (size_t i = 0 ; i < width ; i++) {
mpz_t bit;
mpz_init(bit);
// bit = (data >> i) & 0x1
mpz_fdiv_q_2exp(bit, data, i);
mpz_and(bit, bit, one);
load_bit(name + "[" + std::to_string(i) + "]", bit, tpe);
mpz_clear(bit);
}
}
}
void poke(const std::string& node, mpz_t& data) {
if (log) {
char* data_str = mpz_get_str(NULL, 16, data);
std::cerr << " * POKE " << node << " <- 0x" << data_str << " *" << std::endl;
free(data_str);
}
put_value(get_signal(node), data, PUT_DEPOSIT);
}
bool expect(const std::string& node, mpz_t& expected) {
mpz_t value;
mpz_init(value);
get_value(get_signal(node), value);
bool pass = mpz_cmp(value, expected) == 0 || cycles <= 1;
if (log) {
char* value_str = mpz_get_str(NULL, 16, value);
char* expected_str = mpz_get_str(NULL, 16, expected);
std::cerr << " * EXPECT " << node
<< " -> 0x" << value_str << " ?= 0x" << expected_str
<< (pass ? " : PASS" : " : FAIL") << " *" << std::endl;
free(value_str);
free(expected_str);
}
return pass;
}
};
#endif //__REPLAY_H


@ -0,0 +1,30 @@
##################################################################################
# Replay Parameters
# 1) TARGET_VERILOG: Verilog file(s) to be replayed (by default $(GEN_DIR)/$(DESIGN).v)
# 2) REPLAY_BINARY: binary used for replay (by default $(OUT_DIR)/$(DESIGN)-replay)
##################################################################################
TARGET_VERILOG ?= $(GEN_DIR)/$(DESIGN).v $(GEN_DIR)/$(DESIGN).macros.v
REPLAY_BINARY ?= $(OUT_DIR)/$(DESIGN)-replay
replay_h := $(midas_dir)/sample/sample.h $(replay_dir)/replay_vpi.h $(replay_dir)/replay.h
replay_cc := $(midas_dir)/sample/sample.cc $(replay_dir)/replay_vpi.cc
ifneq ($(filter $(MAKECMDGOALS),vcs-replay $(REPLAY_BINARY)),)
$(info verilog files: $(TARGET_VERILOG))
$(info replay binary: $(REPLAY_BINARY))
endif
# Compile VCS replay binary
$(REPLAY_BINARY): $(v_dir)/replay.v $(TARGET_VERILOG) $(replay_cc) $(replay_h) $(lib)
mkdir -p $(OUT_DIR)
rm -rf $(GEN_DIR)/$(notdir $@).csrc
rm -rf $(OUT_DIR)/$(notdir $@).daidir
$(VCS) $(VCS_FLAGS) -CFLAGS -I$(replay_dir) \
-Mdir=$(GEN_DIR)/$(notdir $@).csrc +vpi -P $(r_dir)/vpi.tab \
+define+STOP_COND=!replay.reset +define+PRINTF_COND=!replay.reset \
+define+VFRAG=\"$(GEN_DIR)/$(DESIGN).vfrag\" \
-o $@ $< $(TARGET_VERILOG) $(replay_cc) $(lib)
vcs-replay: $(REPLAY_BINARY)
.PHONY: vcs-replay


@ -0,0 +1,229 @@
// See LICENSE for license details.
#include "replay_vpi.h"
#include "emul/vcs_main.h"
void replay_vpi_t::init(int argc, char** argv) {
host = midas_context_t::current();
target_args_t *targs = new target_args_t(argc, argv);
target.init(target_thread, targs);
replay_t::init(argc, argv);
target.switch_to();
}
int replay_vpi_t::finish() {
target.switch_to();
return replay_t::finish();
}
void replay_vpi_t::add_signal(vpiHandle& sig_handle, std::string& wire) {
size_t id = replay_data.signals.size();
replay_data.signals.push_back(sig_handle);
replay_data.signal_map[wire] = id;
}
void replay_vpi_t::probe_bits(vpiHandle& sig_handle, std::string& sigpath, std::string& modname) {
if (gate_level()) {
if (vpi_get(vpiSize, sig_handle) == 1) {
std::string bitpath = sigpath + "[0]";
add_signal(sig_handle, bitpath);
} else {
vpiHandle bit_iter = vpi_iterate(vpiBit, sig_handle);
while (vpiHandle bit_handle = vpi_scan(bit_iter)) {
std::string bitname = vpi_get_str(vpiName, bit_handle);
std::string bitpath = modname + "." + bitname;
add_signal(bit_handle, bitpath);
}
}
}
}
void replay_vpi_t::probe_signals() {
// traverse testbench first
vpiHandle replay_handle = vpi_scan(vpi_iterate(vpiModule, NULL));
vpiHandle reg_iter = vpi_iterate(vpiReg, replay_handle);
vpiHandle net_iter = vpi_iterate(vpiNet, replay_handle);
while (vpiHandle reg_handle = vpi_scan(reg_iter)) {
std::string regname = vpi_get_str(vpiName, reg_handle);
if (regname.find("_delay") != 0) add_signal(reg_handle, regname);
}
while (vpiHandle net_handle = vpi_scan(net_iter)) {
std::string netname = vpi_get_str(vpiName, net_handle);
if (netname.find("_delay") != 0) add_signal(net_handle, netname);
}
vpiHandle syscall_handle = vpi_handle(vpiSysTfCall, NULL);
vpiHandle arg_iter = vpi_iterate(vpiArgument, syscall_handle);
vpiHandle top_handle = vpi_scan(arg_iter);
std::queue<vpiHandle> modules;
size_t offset = std::string(vpi_get_str(vpiFullName, top_handle)).find(".") + 1;
// Start from the top module
modules.push(top_handle);
while (!modules.empty()) {
vpiHandle mod_handle = modules.front();
modules.pop();
std::string modname = std::string(vpi_get_str(vpiFullName, mod_handle)).substr(offset);
if (!vpi_scan(vpi_iterate(vpiPrimitive, mod_handle))) { // Not a gate?
// Iterate its ports
vpiHandle net_iter = vpi_iterate(vpiNet, mod_handle);
while (vpiHandle net_handle = vpi_scan(net_iter)) {
std::string netname = vpi_get_str(vpiName, net_handle);
std::string netpath = modname + "." + netname;
add_signal(net_handle, netpath);
probe_bits(net_handle, netpath, modname);
}
}
// Iterate its regs
vpiHandle reg_iter = vpi_iterate(vpiReg, mod_handle);
while (vpiHandle reg_handle = vpi_scan(reg_iter)) {
std::string regname = vpi_get_str(vpiName, reg_handle);
std::string regpath = modname + "." + regname;
add_signal(reg_handle, regpath);
probe_bits(reg_handle, regpath, modname);
}
// Iterate its mems
vpiHandle mem_iter = vpi_iterate(vpiRegArray, mod_handle);
while (vpiHandle mem_handle = vpi_scan(mem_iter)) {
vpiHandle elm_iter = vpi_iterate(vpiReg, mem_handle);
while (vpiHandle elm_handle = vpi_scan(elm_iter)) {
std::string elmname = vpi_get_str(vpiName, elm_handle);
std::string elmpath = modname + "." + elmname;
add_signal(elm_handle, elmpath);
probe_bits(elm_handle, elmpath, modname);
}
}
// Find DFF
vpiHandle udp_iter = vpi_iterate(vpiPrimitive, mod_handle);
while (vpiHandle udp_handle = vpi_scan(udp_iter)) {
if (vpi_get(vpiPrimType, udp_handle) == vpiSeqPrim) {
add_signal(udp_handle, modname);
}
}
vpiHandle sub_iter = vpi_iterate(vpiModule, mod_handle);
while (vpiHandle sub_handle = vpi_scan(sub_iter)) {
modules.push(sub_handle);
}
}
}
void replay_vpi_t::put_value(vpiHandle& sig, std::string& value, PLI_INT32 flag) {
s_vpi_value value_s;
// s_vpi_time time_s;
value_s.format = vpiHexStrVal;
value_s.value.str = (PLI_BYTE8*) value.c_str();
// time_s.type = vpiScaledRealTime;
// time_s.real = 0.0;
vpi_put_value(sig, &value_s, /*&time_s*/ NULL, flag);
}
void replay_vpi_t::get_value(vpiHandle& sig, std::string& value) {
s_vpi_value value_s;
value_s.format = vpiHexStrVal;
vpi_get_value(sig, &value_s);
value = value_s.value.str;
}
void replay_vpi_t::put_value(vpiHandle& sig, mpz_t& data, PUT_VALUE_TYPE type) {
PLI_INT32 flag;
switch(type) {
case PUT_DEPOSIT: flag = vpiNoDelay; break;
case PUT_FORCE: flag = vpiForceFlag; forces.push(sig); break;
}
size_t value_size;
uint32_t* value = (uint32_t*)mpz_export(NULL, &value_size, -1, sizeof(uint32_t), 0, 0, data);
size_t signal_size = ((vpi_get(vpiSize, sig) - 1) / 32) + 1;
s_vpi_value value_s;
s_vpi_vecval vecval_s[signal_size];
value_s.format = vpiVectorVal;
value_s.value.vector = vecval_s;
for (size_t i = 0 ; i < signal_size ; i++) {
value_s.value.vector[i].aval = i < value_size ? value[i] : 0;
value_s.value.vector[i].bval = 0;
}
vpi_put_value(sig, &value_s, NULL, flag);
}
void replay_vpi_t::get_value(vpiHandle& sig, mpz_t& data) {
size_t signal_size = ((vpi_get(vpiSize, sig) - 1) / 32) + 1;
s_vpi_value value_s;
s_vpi_vecval vecval_s[signal_size];
value_s.format = vpiVectorVal;
value_s.value.vector = vecval_s;
vpi_get_value(sig, &value_s);
uint32_t value[signal_size];
for (size_t i = 0 ; i < signal_size ; i++) {
value[i] = value_s.value.vector[i].aval;
}
mpz_import(data, signal_size, -1, sizeof(uint32_t), 0, 0, value);
}
void replay_vpi_t::take_steps(size_t n) {
for (size_t i = 0 ; i < n ; i++)
target.switch_to();
}
void replay_vpi_t::tick() {
while(!forces.empty()) {
vpi_put_value(forces.front(), NULL, NULL, vpiReleaseFlag);
forces.pop();
}
host->switch_to();
vpiHandle syscall_handle = vpi_handle(vpiSysTfCall, NULL);
vpiHandle arg_iter = vpi_iterate(vpiArgument, syscall_handle);
vpiHandle exit_handle = vpi_scan(arg_iter);
s_vpi_value vexit;
vexit.format = vpiIntVal;
vexit.value.integer = done();
vpi_put_value(exit_handle, &vexit, NULL, vpiNoDelay);
}
static replay_vpi_t* replay = NULL;
extern "C" {
PLI_INT32 init_sigs_calltf(PLI_BYTE8 *user_data) {
replay->probe_signals();
return 0;
}
PLI_INT32 tick_calltf(PLI_BYTE8 *user_data) {
replay->tick();
return 0;
}
PLI_INT32 sim_end_cb(p_cb_data cb_data) {
replay->tick();
return 0;
}
PLI_INT32 tick_compiletf(PLI_BYTE8 *user_data) {
s_cb_data data_s;
data_s.reason = cbEndOfSimulation;
data_s.cb_rtn = sim_end_cb;
data_s.obj = NULL;
data_s.time = NULL;
data_s.value = NULL;
data_s.user_data = NULL;
vpi_free_object(vpi_register_cb(&data_s));
return 0;
}
int main(int argc, char** argv) {
replay = new replay_vpi_t;
replay->init(argc, argv);
replay->replay();
int exitcode = replay->finish();
delete replay;
return exitcode;
}
}


@@ -0,0 +1,36 @@
// See LICENSE for license details.
#ifndef __REPLAY_VPI_H
#define __REPLAY_VPI_H
#include "vpi_user.h"
#include "replay.h"
#include "midas_context.h"
#include <queue>
class replay_vpi_t: public replay_t<vpiHandle> {
public:
replay_vpi_t() { }
virtual ~replay_vpi_t() { }
virtual void init(int argc, char** argv);
virtual int finish();
void probe_signals();
void tick();
private:
std::queue<vpiHandle> forces;
midas_context_t *host;
midas_context_t target;
inline void add_signal(vpiHandle& sig_handle, std::string& path);
inline void probe_bits(vpiHandle& sig_handle, std::string& sigpath, std::string& modname);
void put_value(vpiHandle& sig, std::string& value, PLI_INT32 flag);
void get_value(vpiHandle& sig, std::string& value);
virtual void put_value(vpiHandle& sig, mpz_t& data, PUT_VALUE_TYPE type);
virtual void get_value(vpiHandle& sig, mpz_t& data);
virtual void take_steps(size_t n);
};
#endif // __REPLAY_VPI_H


@@ -0,0 +1,41 @@
# VCS RTL Simulation Makefrag
#
# This makefrag stores common recipes for building RTL simulators with VCS
#
# Compulsory variables:
# All those described in Makefrag-verilator
# vcs_wrapper_v: An additional Verilog wrapper around the DUT, not used in Verilator
# CLOCK_PERIOD: The clock period (in ns) passed to the design via +define+CLOCK_PERIOD
# TB: The top-level module on which the stop and printf conditions are defined
#
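# Example invocation (hypothetical variable values, for illustration only):
#   make DESIGN=MyDesign TB=TestDriver CLOCK_PERIOD=1.0 \
#        vcs_wrapper_v=wrapper.v <out-dir>/MyDesign-debug
#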
VCS ?= vcs -full64
override VCS_FLAGS := -quiet -timescale=1ns/1ps +v2k +rad +vcs+initreg+random +vcs+lic+wait \
-notice -line +lint=all,noVCDE,noONGS,noUI -debug_pp +no_notifier -cpp $(CXX) \
-Mdir=$(GEN_DIR)/$(DESIGN)-debug.csrc \
+vc+list \
-CFLAGS "$(CXXFLAGS) $(CFLAGS) -DVCS -I$(VCS_HOME)/include" \
-LDFLAGS "$(LDFLAGS)" \
-sverilog \
+define+CLOCK_PERIOD=$(CLOCK_PERIOD) \
+define+RANDOMIZE_GARBAGE_ASSIGN \
+define+RANDOMIZE_INVALID_ASSIGN \
+define+STOP_COND=!$(TB).reset \
+define+PRINTF_COND=!$(TB).reset \
$(VCS_FLAGS)
vcs_v := $(emul_v) $(vcs_wrapper_v)
$(OUT_DIR)/$(DESIGN): $(vcs_v) $(emul_cc) $(emul_h)
mkdir -p $(OUT_DIR)
rm -rf $(GEN_DIR)/$(DESIGN).csrc
rm -rf $(OUT_DIR)/$(DESIGN).daidir
$(VCS) $(VCS_FLAGS) \
-o $@ $(vcs_v) $(emul_cc)
$(OUT_DIR)/$(DESIGN)-debug: $(vcs_v) $(emul_cc) $(emul_h)
mkdir -p $(OUT_DIR)
rm -rf $(GEN_DIR)/$(DESIGN)-debug.csrc
rm -rf $(OUT_DIR)/$(DESIGN)-debug.daidir
$(VCS) $(VCS_FLAGS) +define+DEBUG \
-o $@ $(vcs_v) $(emul_cc)


@@ -0,0 +1,37 @@
# Verilator RTL Simulation Makefrag
#
# This makefrag stores common recipes for building RTL simulators with Verilator
#
# Compulsory variables:
# OUT_DIR: See Makefile
# GEN_DIR: See Makefile
# DESIGN: See Makefile
# emul_cc: C++ sources
# emul_h: C++ headers
# emul_v: verilog sources and headers
#
# Verilator Only:
# top_module: The top of the DUT
# (optional) verilator_conf: A Verilator configuration file
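#
# Example invocation (hypothetical variable values, for illustration only):
#   make DESIGN=MyDesign top_module=TestHarness <out-dir>/VMyDesign-debug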
VERILATOR ?= verilator --cc --exe
override VERILATOR_FLAGS := --assert -Wno-STMTDLY -O3 \
-CFLAGS "$(CXXFLAGS) $(CFLAGS)" \
-LDFLAGS "$(LDFLAGS) " \
$(VERILATOR_FLAGS)
$(OUT_DIR)/V$(DESIGN): $(emul_v) $(emul_cc) $(emul_h)
mkdir -p $(OUT_DIR)
rm -rf $(GEN_DIR)/V$(DESIGN).csrc
$(VERILATOR) $(VERILATOR_FLAGS) --top-module $(top_module) -Mdir $(GEN_DIR)/V$(DESIGN).csrc \
-CFLAGS "-include $(GEN_DIR)/V$(DESIGN).csrc/V$(top_module).h" \
-o $@ $(emul_v) $(verilator_conf) $(emul_cc)
$(MAKE) -C $(GEN_DIR)/V$(DESIGN).csrc -f V$(top_module).mk
$(OUT_DIR)/V$(DESIGN)-debug: $(emul_v) $(emul_cc) $(emul_h)
mkdir -p $(OUT_DIR)
rm -rf $(GEN_DIR)/V$(DESIGN)-debug.csrc
$(VERILATOR) $(VERILATOR_FLAGS) --trace --top-module $(top_module) -Mdir $(GEN_DIR)/V$(DESIGN)-debug.csrc \
-CFLAGS "-include $(GEN_DIR)/V$(DESIGN)-debug.csrc/V$(top_module).h" \
-o $@ $(emul_v) $(verilator_conf) $(emul_cc)
$(MAKE) -C $(GEN_DIR)/V$(DESIGN)-debug.csrc -f V$(top_module).mk


@@ -0,0 +1,273 @@
// See LICENSE.SiFive for license details.
// See LICENSE.Berkeley for license details.
#include "verilated.h"
#if VM_TRACE
#include <memory>
#include "verilated_vcd_c.h"
#endif
#include <iostream>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
// Originally from Rocket-Chip, with RISC-V specific stuff stripped out
// For option parsing, which is split across this file and the Verilog, a few
// external files must be pulled in. The list of files and what they provide is
// enumerated:
//
// Biancolin: This will be useful later.
// $(ROCKETCHIP_DIR)/generated-src(-debug)?/$(CONFIG).plusArgs:
// defines:
// - PLUSARG_USAGE_OPTIONS
// variables:
// - static const char * verilog_plusargs
static uint64_t trace_count = 0;
bool verbose;
bool done_reset;
//void handle_sigterm(int sig)
//{
// Biancolin: //TODO
//}
double sc_time_stamp()
{
return trace_count;
}
extern "C" int vpi_get_vlog_info(void* arg)
{
return 0;
}
static void usage(const char * program_name)
{
printf("Usage: %s [VERILOG PLUSARG]...\n",
program_name);
fputs("\
Run a BINARY on the Rocket Chip emulator.\n\
\n\
Mandatory arguments to long options are mandatory for short options too.\n\
\n\
EMULATOR OPTIONS\n\
-c, --cycle-count Print the cycle count before exiting\n\
+cycle-count\n\
-h, --help Display this help and exit\n\
-m, --max-cycles=CYCLES Kill the emulation after CYCLES\n\
+max-cycles=CYCLES\n\
-s, --seed=SEED Use random number seed SEED\n\
-V, --verbose Enable all Chisel printfs (cycle-by-cycle info)\n\
+verbose\n\
", stdout);
#if VM_TRACE == 0
fputs("\
\n\
EMULATOR DEBUG OPTIONS (only supported in debug build -- try `make debug`)\n",
stdout);
#endif
fputs("\
-v, --vcd=FILE, Write vcd trace to FILE (or '-' for stdout)\n\
-x, --dump-start=CYCLE Start VCD tracing at CYCLE\n\
+dump-start\n\
", stdout);
//fputs("\n" PLUSARG_USAGE_OPTIONS, stdout);
}
int main(int argc, char** argv)
{
unsigned random_seed = (unsigned)time(NULL) ^ (unsigned)getpid();
uint64_t max_cycles = -1;
int ret = 0;
bool print_cycles = false;
// Port numbers are 16 bit unsigned integers.
#if VM_TRACE
FILE * vcdfile = NULL;
uint64_t start = 0;
#endif
int verilog_plusargs_legal = 1;
while (1) {
static struct option long_options[] = {
{"cycle-count", no_argument, 0, 'c' },
{"help", no_argument, 0, 'h' },
{"max-cycles", required_argument, 0, 'm' },
{"seed", required_argument, 0, 's' },
{"rbb-port", required_argument, 0, 'r' },
#if VM_TRACE
{"vcd", required_argument, 0, 'v' },
{"dump-start", required_argument, 0, 'x' },
#endif
{"verbose", no_argument, 0, 'V' },
{0, 0, 0, 0 }
};
int option_index = 0;
#if VM_TRACE
int c = getopt_long(argc, argv, "-chm:s:r:v:Vx:", long_options, &option_index);
#else
int c = getopt_long(argc, argv, "-chm:s:r:V", long_options, &option_index);
#endif
if (c == -1) break;
retry:
switch (c) {
// Process long and short EMULATOR options
case '?': usage(argv[0]); return 1;
case 'c': print_cycles = true; break;
case 'h': usage(argv[0]); return 0;
case 'm': max_cycles = atoll(optarg); break;
case 's': random_seed = atoi(optarg); break;
case 'V': verbose = true; break;
#if VM_TRACE
case 'v': {
vcdfile = strcmp(optarg, "-") == 0 ? stdout : fopen(optarg, "w");
if (!vcdfile) {
std::cerr << "Unable to open " << optarg << " for VCD write\n";
return 1;
}
break;
}
case 'x': start = atoll(optarg); break;
#endif
// Process legacy '+' EMULATOR arguments by replacing them with
// their getopt equivalents
case 1: {
std::string arg = optarg;
if (arg.substr(0, 1) != "+") {
optind--;
goto done_processing;
}
if (arg == "+verbose")
c = 'V';
else if (arg.substr(0, 12) == "+max-cycles=") {
c = 'm';
optarg = optarg+12;
}
#if VM_TRACE
else if (arg.substr(0, 12) == "+dump-start=") {
c = 'x';
optarg = optarg+12;
}
#endif
else if (arg.substr(0, 12) == "+cycle-count")
c = 'c';
// If we don't find a legacy '+' EMULATOR argument, it still could be
// a VERILOG_PLUSARG and not an error.
//else if (verilog_plusargs_legal) {
// const char ** plusarg = &verilog_plusargs[0];
// int legal_verilog_plusarg = 0;
// while (*plusarg && (legal_verilog_plusarg == 0)){
// if (arg.substr(1, strlen(*plusarg)) == *plusarg) {
// legal_verilog_plusarg = 1;
// }
// plusarg ++;
// }
// if (!legal_verilog_plusarg) {
// verilog_plusargs_legal = 0;
// } else {
// c = 'P';
// }
// goto retry;
//}
// Not a recognized plus-arg
else {
std::cerr << argv[0] << ": invalid plus-arg (Verilog or HTIF) \""
<< arg << "\"\n";
c = '?';
}
goto retry;
}
case 'P': break; // Nothing to do here, Verilog PlusArg
default:
c = '?';
goto retry;
}
}
done_processing:
if (verbose)
fprintf(stderr, "using random seed %u\n", random_seed);
srand(random_seed);
srand48(random_seed);
Verilated::randReset(2);
Verilated::commandArgs(argc, argv);
TEST_HARNESS *tile = new TEST_HARNESS;
#if VM_TRACE
Verilated::traceEverOn(true); // Verilator must compute traced signals
std::unique_ptr<VerilatedVcdFILE> vcdfd(new VerilatedVcdFILE(vcdfile));
std::unique_ptr<VerilatedVcdC> tfp(new VerilatedVcdC(vcdfd.get()));
if (vcdfile) {
tile->trace(tfp.get(), 99); // Trace 99 levels of hierarchy
tfp->open("");
}
#endif
//signal(SIGTERM, handle_sigterm);
bool dump;
// reset for several cycles to handle pipelined reset
for (int i = 0; i < 10; i++) {
tile->reset = 1;
tile->clock = 0;
tile->eval();
#if VM_TRACE
dump = tfp && trace_count >= start;
if (dump)
tfp->dump(static_cast<vluint64_t>(trace_count * 2));
#endif
tile->clock = 1;
tile->eval();
#if VM_TRACE
if (dump)
tfp->dump(static_cast<vluint64_t>(trace_count * 2 + 1));
#endif
trace_count ++;
}
tile->reset = 0;
done_reset = true;
while (!tile->io_success && trace_count < max_cycles) {
tile->clock = 0;
tile->eval();
#if VM_TRACE
dump = tfp && trace_count >= start;
if (dump)
tfp->dump(static_cast<vluint64_t>(trace_count * 2));
#endif
tile->clock = 1;
tile->eval();
#if VM_TRACE
if (dump)
tfp->dump(static_cast<vluint64_t>(trace_count * 2 + 1));
#endif
trace_count++;
}
#if VM_TRACE
if (tfp)
tfp->close();
if (vcdfile)
fclose(vcdfile);
#endif
if (trace_count == max_cycles)
{
fprintf(stderr, "*** FAILED *** via trace_count (timeout, seed %d) after %ld cycles\n", random_seed, trace_count);
ret = 2;
}
else if (verbose || print_cycles)
{
fprintf(stderr, "Completed after %ld cycles\n", trace_count);
}
if (tile) delete tile;
return ret;
}


@@ -0,0 +1,5 @@
// HACK: Disable MULTIDRIVEN linting, since verilator cannot determine if two
// syntactically different clocks are aliases of one another if they are
// driven by separate ports.
`verilator_config
lint_off -msg MULTIDRIVEN
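// If desired, the waiver could be narrowed to specific files, e.g.
// (hypothetical filename): lint_off -msg MULTIDRIVEN -file "*/FPGATop.v"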


@@ -0,0 +1,158 @@
// See LICENSE for license details.
#include "sample.h"
#include <cassert>
#include <cstring>
#include <fstream>
#include <sstream>
#ifdef ENABLE_SNAPSHOT
std::array<std::vector<std::string>, CHAIN_NUM> sample_t::signals = {};
std::array<std::vector<size_t>, CHAIN_NUM> sample_t::widths = {};
std::array<std::vector<int>, CHAIN_NUM> sample_t::depths = {};
size_t sample_t::chain_len[CHAIN_NUM] = {0};
size_t sample_t::chain_loop[CHAIN_NUM] = {0};
void sample_t::init_chains(std::string filename) {
std::fill(signals.begin(), signals.end(), std::vector<std::string>());
std::fill(widths.begin(), widths.end(), std::vector<size_t>());
std::fill(depths.begin(), depths.end(), std::vector<int>());
std::ifstream file(filename.c_str());
if (!file) {
fprintf(stderr, "Cannot open %s\n", filename.c_str());
exit(EXIT_FAILURE);
}
std::string line;
while (std::getline(file, line)) {
std::istringstream iss(line);
size_t type;
std::string signal;
iss >> type >> signal;
size_t width;
int depth;
iss >> width >> depth;
if (signal == "null") signal = "";
signals[type].push_back(signal);
widths[type].push_back(width);
depths[type].push_back(depth);
chain_len[type] += width;
switch ((CHAIN_TYPE) type) {
case SRAM_CHAIN:
case REGFILE_CHAIN:
if (!signal.empty() && depth > 0) {
chain_loop[type] = std::max(chain_loop[type], (size_t) depth);
}
break;
default:
chain_loop[type] = 1;
break;
}
}
for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
chain_len[t] /= DAISY_WIDTH;
}
file.close();
}
void sample_t::dump_chains(std::ostream& os) {
for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
auto chain_signals = signals[t];
auto chain_widths = widths[t];
for (size_t id = 0 ; id < chain_signals.size() ; id++) {
auto signal = chain_signals[id];
auto width = chain_widths[id];
os << SIGNALS << " " << t << " " <<
(signal.empty() ? "null" : signal) << " " << width << std::endl;
}
}
for (size_t id = 0 ; id < IN_TR_SIZE ; id++) {
os << SIGNALS << " " << IN_TR << " " << IN_TR_NAMES[id] << std::endl;
}
for (size_t id = 0 ; id < OUT_TR_SIZE ; id++) {
os << SIGNALS << " " << OUT_TR << " " << OUT_TR_NAMES[id] << std::endl;
}
for (size_t id = 0, bits_id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
os << SIGNALS << " " << IN_TR_VALID << " " <<
(const char*)IN_TR_READY_VALID_NAMES[id] << "_valid" << std::endl;
os << SIGNALS << " " << IN_TR_READY << " " <<
(const char*)IN_TR_READY_VALID_NAMES[id] << "_ready" << std::endl;
for (size_t k = 0 ; k < (size_t)IN_TR_BITS_FIELD_NUMS[id] ; k++, bits_id++) {
os << SIGNALS << " " << IN_TR_BITS << " " <<
(const char*)IN_TR_BITS_FIELD_NAMES[bits_id] << std::endl;
}
}
for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
os << SIGNALS << " " << OUT_TR_VALID << " " <<
(const char*)OUT_TR_READY_VALID_NAMES[id] << "_valid" << std::endl;
os << SIGNALS << " " << OUT_TR_READY << " " <<
(const char*)OUT_TR_READY_VALID_NAMES[id] << "_ready" << std::endl;
for (size_t k = 0 ; k < (size_t)OUT_TR_BITS_FIELD_NUMS[id] ; k++, bits_id++) {
os << SIGNALS << " " << OUT_TR_BITS << " " <<
(const char*)OUT_TR_BITS_FIELD_NAMES[bits_id] << std::endl;
}
}
}
size_t sample_t::read_chain(CHAIN_TYPE type, const char* snap, size_t start) {
size_t t = static_cast<size_t>(type);
auto chain_signals = signals[t];
auto chain_widths = widths[t];
auto chain_depths = depths[t];
for (size_t i = 0 ; i < chain_loop[type] ; i++) {
for (size_t s = 0 ; s < chain_signals.size() ; s++) {
auto signal = chain_signals[s];
auto width = chain_widths[s];
auto depth = chain_depths[s];
if (!signal.empty()) {
char substr[1025];
assert(width <= 1024);
strncpy(substr, snap+start, width);
substr[width] = '\0';
mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_str(*value, substr, 2);
switch(type) {
case TRACE_CHAIN:
add_cmd(new force_t(type, s, value));
break;
case REGS_CHAIN:
add_cmd(new load_t(type, s, value));
break;
case SRAM_CHAIN:
case REGFILE_CHAIN:
if (static_cast<int>(i) < depth)
add_cmd(new load_t(type, s, value, i));
break;
case CNTR_CHAIN:
add_cmd(new count_t(type, s, value));
break;
default:
break;
}
}
start += width;
}
assert(start % DAISY_WIDTH == 0);
}
return start;
}
sample_t::sample_t(const char* snap, uint64_t _cycle):
cycle(_cycle), force_prev_id(-1) {
size_t start = 0;
for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
CHAIN_TYPE type = static_cast<CHAIN_TYPE>(t);
start = read_chain(type, snap, start);
}
}
sample_t::sample_t(CHAIN_TYPE type, const char* snap, uint64_t _cycle):
cycle(_cycle), force_prev_id(-1) {
read_chain(type, snap);
}
#endif
sample_t::~sample_t() {
for (auto& cmd: cmds) delete cmd;
cmds.clear();
}


@@ -0,0 +1,192 @@
// See LICENSE for license details.
#ifndef __SAMPLE_H
#define __SAMPLE_H
#include <string>
#include <array>
#include <vector>
#include <map>
#include <ostream>
#include <inttypes.h>
#include <gmp.h>
enum SAMPLE_INST_TYPE { SIGNALS, CYCLE, LOAD, FORCE, POKE, STEP, EXPECT, COUNT };
#ifdef ENABLE_SNAPSHOT
enum { IN_TR = CHAIN_NUM,
OUT_TR,
IN_TR_VALID,
IN_TR_READY,
IN_TR_BITS,
OUT_TR_VALID,
OUT_TR_READY,
OUT_TR_BITS };
#endif
struct sample_inst_t {
virtual ~sample_inst_t() {}
virtual std::ostream& dump(std::ostream &os) const = 0;
friend std::ostream& operator<<(std::ostream &os, const sample_inst_t& cmd) {
return cmd.dump(os);
}
};
struct step_t: sample_inst_t {
step_t(size_t n_): n(n_) { }
std::ostream& dump(std::ostream &os) const {
return os << STEP << " " << n << std::endl;
}
const size_t n;
};
struct load_t: sample_inst_t {
load_t(const size_t type, const size_t id, mpz_t* value, const int idx = -1):
type(type), id(id), value(value), idx(idx) { }
~load_t() {
mpz_clear(*value);
free(value);
}
std::ostream& dump(std::ostream &os) const {
char* value_str = mpz_get_str(NULL, 16, *value);
os << LOAD << " " << type << " " << id << " " << value_str << " " << idx << std::endl;
free(value_str);
return os;
}
const size_t type;
const size_t id;
mpz_t* const value;
const int idx;
};
struct force_t: sample_inst_t {
force_t(const size_t type, const size_t id, mpz_t* value):
type(type), id(id), value(value) { }
~force_t() {
mpz_clear(*value);
free(value);
}
std::ostream& dump(std::ostream &os) const {
char* value_str = mpz_get_str(NULL, 16, *value);
os << FORCE << " " << type << " " << id << " " << value_str << std::endl;
free(value_str);
return os;
}
const size_t type;
const size_t id;
mpz_t* const value;
};
struct poke_t: sample_inst_t {
poke_t(const size_t type, const size_t id, mpz_t* value):
type(type), id(id), value(value) { }
~poke_t() {
mpz_clear(*value);
free(value);
}
std::ostream& dump(std::ostream &os) const {
char* value_str = mpz_get_str(NULL, 16, *value);
os << POKE << " " << type << " " << id << " " << value_str << std::endl;
free(value_str);
return os;
}
const size_t type;
const size_t id;
mpz_t* const value;
};
struct expect_t: sample_inst_t {
expect_t(const size_t type, const size_t id, mpz_t* value):
type(type), id(id), value(value) { }
~expect_t() {
mpz_clear(*value);
free(value);
}
std::ostream& dump(std::ostream &os) const {
char* value_str = mpz_get_str(NULL, 16, *value);
os << EXPECT << " " << type << " " << id << " " << value_str << std::endl;
free(value_str);
return os;
}
const size_t type;
const size_t id;
mpz_t* const value;
};
struct count_t: sample_inst_t {
count_t(const size_t type, const size_t id, mpz_t* value):
type(type), id(id), value(value) { }
~count_t() {
mpz_clear(*value);
free(value);
}
std::ostream& dump(std::ostream &os) const {
char* value_str = mpz_get_str(NULL, 16, *value);
os << COUNT << " " << type << " " << id << " " << value_str << std::endl;
free(value_str);
return os;
}
const size_t type;
const size_t id;
mpz_t* const value;
};
class sample_t {
public:
sample_t(uint64_t _cycle): cycle(_cycle) { }
#ifdef ENABLE_SNAPSHOT
sample_t(const char* snap, uint64_t _cycle);
sample_t(CHAIN_TYPE type, const char* snap, uint64_t _cycle);
std::ostream& dump(std::ostream &os) const {
os << CYCLE << " cycle: " << cycle << std::endl;
for (size_t i = 0 ; i < cmds.size() ; i++) {
os << *cmds[i];
}
return os;
}
friend std::ostream& operator<<(std::ostream& os, const sample_t& s) {
return s.dump(os);
}
#endif
virtual ~sample_t();
void add_cmd(sample_inst_t *cmd) { cmds.push_back(cmd); }
inline const uint64_t get_cycle() const { return cycle; }
inline const std::vector<sample_inst_t*>& get_cmds() const { return cmds; }
#ifdef ENABLE_SNAPSHOT
size_t read_chain(CHAIN_TYPE type, const char* snap, size_t start = 0);
static void init_chains(std::string filename);
static void dump_chains(FILE *file);
static void dump_chains(std::ostream &os);
static size_t get_chain_loop(CHAIN_TYPE t) {
return chain_loop[t];
}
static size_t get_chain_len(CHAIN_TYPE t) {
return chain_len[t];
}
#endif
private:
const uint64_t cycle;
std::vector<sample_inst_t*> cmds;
#ifdef ENABLE_SNAPSHOT
std::vector<std::vector<force_t*>> force_bins;
size_t force_bin_idx;
size_t force_prev_id;
static size_t chain_loop[CHAIN_NUM];
static size_t chain_len[CHAIN_NUM];
static std::array<std::vector<std::string>, CHAIN_NUM> signals;
static std::array<std::vector<size_t>, CHAIN_NUM> widths;
static std::array<std::vector<int>, CHAIN_NUM> depths;
#endif
};
#endif // __SAMPLE_H


@@ -0,0 +1,333 @@
// See LICENSE for license details.
#include "simif.h"
#include <fstream>
#include <iostream>
#include <algorithm>
#ifdef ENABLE_SNAPSHOT
void simif_t::init_sampling(int argc, char** argv) {
// Read mapping files
sample_t::init_chains(std::string(TARGET_NAME) + ".chain");
// Init sample variables
sample_file = std::string(TARGET_NAME) + ".sample";
sample_num = 30;
last_sample = NULL;
last_sample_id = 0;
profile = false;
sample_count = 0;
sample_time = 0;
sample_cycle = 0;
snap_cycle = -1ULL;
tracelen = TRACE_MAX_LEN;
trace_count = 0;
std::vector<std::string> args(argv + 1, argv + argc);
for (auto &arg: args) {
if (arg.find("+sample=") == 0) {
sample_file = arg.c_str() + 8;
}
if (arg.find("+samplenum=") == 0) {
sample_num = strtol(arg.c_str() + 11, NULL, 10);
}
if (arg.find("+sample-cycle=") == 0) {
sample_cycle = strtoll(arg.c_str() + 14, NULL, 10);
}
if (arg.find("+tracelen=") == 0) {
tracelen = strtol(arg.c_str() + 10, NULL, 10);
}
if (arg.find("+profile") == 0) {
profile = true;
}
}
assert(tracelen > 2);
write(TRACELEN_ADDR, tracelen);
#ifdef KEEP_SAMPLES_IN_MEM
samples = new sample_t*[sample_num];
for (size_t i = 0 ; i < sample_num ; i++) samples[i] = NULL;
#endif
// flush output traces by sim reset
for (size_t k = 0 ; k < OUT_TR_SIZE ; k++) {
size_t addr = OUT_TR_ADDRS[k];
size_t chunk = OUT_TR_CHUNKS[k];
for (size_t off = 0 ; off < chunk ; off++)
read(addr+off);
}
for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
read((size_t)OUT_TR_READY_ADDRS[id]);
bits_id = !read((size_t)OUT_TR_VALID_ADDRS[id]) ?
bits_id + (size_t)OUT_TR_BITS_FIELD_NUMS[id] :
trace_ready_valid_bits(NULL, false, id, bits_id);
}
}
void simif_t::finish_sampling() {
// tail samples
save_sample();
// dump samples
std::ofstream file(sample_file.c_str(), std::ios_base::out | std::ios_base::trunc);
sample_t::dump_chains(file);
#ifdef KEEP_SAMPLES_IN_MEM
for (size_t i = 0 ; i < sample_num ; i++) {
if (samples[i] != NULL) {
samples[i]->dump(file);
delete samples[i];
}
}
delete[] samples;
#else
for (size_t i = 0 ; i < std::min(sample_num, sample_count) ; i++) {
std::string fname = sample_file + "_" + std::to_string(i);
std::ifstream f(fname.c_str());
std::string line;
while (std::getline(f, line)) {
file << line << std::endl;
}
remove(fname.c_str());
}
#endif
file.close();
fprintf(stderr, "Sample Count: %zu\n", sample_count);
if (profile) {
double sim_time = diff_secs(timestamp(), sim_start_time);
fprintf(stderr, "Simulation Time: %.3f s\n", sim_time);
fprintf(stderr, "Sample Time: %.3f s\n", diff_secs(sample_time, 0));
}
}
static const size_t data_t_chunks = sizeof(data_t) / sizeof(uint32_t);
size_t simif_t::trace_ready_valid_bits(sample_t* sample, bool poke, size_t id, size_t bits_id) {
size_t bits_addr = poke ? (size_t)IN_TR_BITS_ADDRS[id] : (size_t)OUT_TR_BITS_ADDRS[id];
size_t bits_chunk = poke ? (size_t)IN_TR_BITS_CHUNKS[id] : (size_t)OUT_TR_BITS_CHUNKS[id];
size_t num_fields = poke ? (size_t)IN_TR_BITS_FIELD_NUMS[id] : (size_t)OUT_TR_BITS_FIELD_NUMS[id];
data_t *bits_data = new data_t[bits_chunk];
for (size_t off = 0 ; off < bits_chunk ; off++) {
bits_data[off] = read(bits_addr + off);
}
if (sample) {
mpz_t data;
mpz_init(data);
mpz_import(data, bits_chunk, -1, sizeof(data_t), 0, 0, bits_data);
for (size_t k = 0, off = 0 ; k < num_fields ; k++, bits_id++) {
size_t field_width = ((unsigned int*)(
poke ? IN_TR_BITS_FIELD_WIDTHS : OUT_TR_BITS_FIELD_WIDTHS))[bits_id];
mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t)), mask;
mpz_inits(*value, mask, NULL);
// value = data >> off
mpz_fdiv_q_2exp(*value, data, off);
// mask = (1 << field_width) - 1
mpz_set_ui(mask, 1);
mpz_mul_2exp(mask, mask, field_width);
mpz_sub_ui(mask, mask, 1);
// *value = *value & mask
mpz_and(*value, *value, mask);
mpz_clear(mask);
sample->add_cmd(poke ?
(sample_inst_t*) new poke_t(IN_TR_BITS, bits_id, value):
(sample_inst_t*) new expect_t(OUT_TR_BITS, bits_id, value));
off += field_width;
}
mpz_clear(data);
}
delete[] bits_data;
return bits_id;
}
sample_t* simif_t::read_traces(sample_t *sample) {
for (size_t i = 0 ; i < std::min(trace_count, tracelen) ; i++) {
// wire input traces from FPGA
for (size_t id = 0 ; id < IN_TR_SIZE ; id++) {
size_t addr = IN_TR_ADDRS[id];
size_t chunk = IN_TR_CHUNKS[id];
data_t *data = new data_t[chunk];
for (size_t off = 0 ; off < chunk ; off++) {
data[off] = read(addr+off);
}
if (sample) {
mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_import(*value, chunk, -1, sizeof(data_t), 0, 0, data);
sample->add_cmd(new poke_t(IN_TR, id, value));
}
delete[] data;
}
// ready valid input traces from FPGA
for (size_t id = 0, bits_id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
size_t valid_addr = (size_t)IN_TR_VALID_ADDRS[id];
data_t valid_data = read(valid_addr);
if (sample) {
mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_ui(*value, valid_data);
sample->add_cmd(new poke_t(IN_TR_VALID, id, value));
}
bits_id = !valid_data ?
bits_id + (size_t)IN_TR_BITS_FIELD_NUMS[id] :
trace_ready_valid_bits(sample, true, id, bits_id);
}
for (size_t id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
size_t ready_addr = (size_t)OUT_TR_READY_ADDRS[id];
data_t ready_data = read(ready_addr);
if (sample) {
mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_ui(*value, ready_data);
sample->add_cmd(new poke_t(OUT_TR_READY, id, value));
}
}
if (sample) sample->add_cmd(new step_t(1));
// wire output traces from FPGA
for (size_t id = 0 ; id < OUT_TR_SIZE ; id++) {
size_t addr = OUT_TR_ADDRS[id];
size_t chunk = OUT_TR_CHUNKS[id];
data_t *data = new data_t[chunk];
for (size_t off = 0 ; off < chunk ; off++) {
data[off] = read(addr+off);
}
if (sample && i > 0) {
mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_import(*value, chunk, -1, sizeof(data_t), 0, 0, data);
sample->add_cmd(new expect_t(OUT_TR, id, value));
}
delete[] data;
}
// ready valid output traces from FPGA
for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
size_t valid_addr = (size_t)OUT_TR_VALID_ADDRS[id];
data_t valid_data = read(valid_addr);
if (sample) {
mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_ui(*value, valid_data);
sample->add_cmd(new expect_t(OUT_TR_VALID, id, value));
}
bits_id = !valid_data ?
bits_id + (size_t)OUT_TR_BITS_FIELD_NUMS[id] :
trace_ready_valid_bits(sample, false, id, bits_id);
}
for (size_t id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
size_t ready_addr = (size_t)IN_TR_READY_ADDRS[id];
data_t ready_data = read(ready_addr);
if (sample) {
mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
mpz_init(*value);
mpz_set_ui(*value, ready_data);
sample->add_cmd(new expect_t(IN_TR_READY, id, value));
}
}
}
if (sample && sample_cycle > 0) {
sample->add_cmd(new step_t(5)); // to catch assertions in replay
}
return sample;
}
static inline char* int_to_bin(char *bin, data_t value, size_t size) {
for (size_t i = 0 ; i < size; i++) {
bin[i] = ((value >> (size-1-i)) & 0x1) + '0';
}
bin[size] = 0;
return bin;
}
sample_t* simif_t::read_snapshot(bool load) {
std::ostringstream snap;
char bin[DAISY_WIDTH+1];
for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
CHAIN_TYPE type = static_cast<CHAIN_TYPE>(t);
const size_t chain_loop = sample_t::get_chain_loop(type);
const size_t chain_len = sample_t::get_chain_len(type);
for (size_t k = 0 ; k < chain_loop ; k++) {
for (size_t i = 0 ; i < CHAIN_SIZE[t] ; i++) {
switch(type) {
case SRAM_CHAIN:
write(SRAM_RESTART_ADDR + i, 1);
break;
case REGFILE_CHAIN:
write(REGFILE_RESTART_ADDR + i, 1);
break;
default:
break;
}
for (size_t j = 0 ; j < chain_len ; j++) {
// TODO: write arbitrary values
if (load) write(CHAIN_IN_ADDR[t], 0);
data_t value = read(CHAIN_ADDR[t] + i);
if (!load) snap << int_to_bin(bin, value, DAISY_WIDTH);
}
if (load) write(CHAIN_LOAD_ADDR[t], 1);
}
}
}
return load ? NULL : new sample_t(snap.str().c_str(), cycles());
}
void simif_t::save_sample() {
if (last_sample != NULL) {
sample_t* sample = read_traces(last_sample);
#ifdef KEEP_SAMPLES_IN_MEM
if (samples[last_sample_id] != NULL)
delete samples[last_sample_id];
samples[last_sample_id] = sample;
#else
std::string filename = sample_file + "_" + std::to_string(last_sample_id);
std::ofstream file(filename.c_str(), std::ios_base::out | std::ios_base::trunc);
sample->dump(file);
delete sample;
file.close();
#endif
}
}
void simif_t::reservoir_sampling(size_t n) {
if (t % tracelen == 0) {
midas_time_t start_time = 0;
uint64_t record_id = t / tracelen;
uint64_t sample_id = record_id < sample_num ? record_id : gen() % (record_id + 1);
if (sample_id < sample_num) {
sample_count++;
if (profile) start_time = timestamp();
save_sample();
last_sample = read_snapshot();
last_sample_id = sample_id;
trace_count = 0;
if (profile) sample_time += (timestamp() - start_time);
}
}
if (trace_count < tracelen) trace_count += n;
}
void simif_t::deterministic_sampling(size_t n) {
if (((t + n) - sample_cycle <= tracelen || sample_cycle <= t) &&
((last_sample_id + 1) < sample_num)) {
sample_count++;
snap_cycle = t;
fprintf(stderr, "[id: %u] Snapshot at %llu\n",
(unsigned)last_sample_id, (unsigned long long)t);
trace_count = std::min(n, tracelen);
if (last_sample) {
save_sample();
} else {
// flush trace buffer
read_traces(NULL);
}
trace_count = 0;
last_sample_id = last_sample ? last_sample_id + 1 : 0;
last_sample = read_snapshot();
}
}
#endif


@@ -0,0 +1,224 @@
// See LICENSE for license details.
#include "simif.h"
#include <fstream>
#include <algorithm>
midas_time_t timestamp(){
struct timeval tv;
gettimeofday(&tv, NULL);
return 1000000L * tv.tv_sec + tv.tv_usec;
}
double diff_secs(midas_time_t end, midas_time_t start) {
return ((double)(end - start)) / TIME_DIV_CONST;
}
simif_t::simif_t() {
pass = true;
t = 0;
fail_t = 0;
seed = time(NULL); // FIXME: better initial seed?
SIMULATIONMASTER_0_substruct_create;
this->master_mmio_addrs = SIMULATIONMASTER_0_substruct;
LOADMEMWIDGET_0_substruct_create;
this->loadmem_mmio_addrs = LOADMEMWIDGET_0_substruct;
PEEKPOKEBRIDGEMODULE_0_substruct_create;
this->defaultiowidget_mmio_addrs = PEEKPOKEBRIDGEMODULE_0_substruct;
}
void simif_t::init(int argc, char** argv, bool log) {
// Simulation reset
write(this->master_mmio_addrs->SIM_RESET, 1);
while(!done());
this->log = log;
std::vector<std::string> args(argv + 1, argv + argc);
std::string loadmem;
bool fastloadmem = false;
for (auto &arg: args) {
if (arg.find("+fastloadmem") == 0) {
fastloadmem = true;
}
if (arg.find("+loadmem=") == 0) {
loadmem = arg.c_str() + 9;
}
if (arg.find("+seed=") == 0) {
seed = strtoll(arg.c_str() + 6, NULL, 10);
fprintf(stderr, "Using custom SEED: %ld\n", seed);
}
}
gen.seed(seed);
fprintf(stderr, "random min: 0x%llx, random max: 0x%llx\n", gen.min(), gen.max());
if (!fastloadmem && !loadmem.empty()) {
load_mem(loadmem.c_str());
}
#ifdef ENABLE_SNAPSHOT
init_sampling(argc, argv);
#endif
}
uint64_t simif_t::actual_tcycle() {
write(this->defaultiowidget_mmio_addrs->tCycle_latch, 1);
data_t cycle_l = read(this->defaultiowidget_mmio_addrs->tCycle_0);
data_t cycle_h = read(this->defaultiowidget_mmio_addrs->tCycle_1);
return (((uint64_t) cycle_h) << 32) | cycle_l;
}
uint64_t simif_t::hcycle() {
write(this->defaultiowidget_mmio_addrs->hCycle_latch, 1);
data_t cycle_l = read(this->defaultiowidget_mmio_addrs->hCycle_0);
data_t cycle_h = read(this->defaultiowidget_mmio_addrs->hCycle_1);
return (((uint64_t) cycle_h) << 32) | cycle_l;
}
void simif_t::target_reset(int pulse_length) {
poke(reset, 1);
take_steps(pulse_length, true);
poke(reset, 0);
#ifdef ENABLE_SNAPSHOT
// flush I/O traces by target resets
trace_count = std::min((size_t)(pulse_length), tracelen);
read_traces(NULL);
trace_count = 0;
#endif
}
int simif_t::finish() {
#ifdef ENABLE_SNAPSHOT
finish_sampling();
#endif
fprintf(stderr, "Runs %llu cycles\n", actual_tcycle());
fprintf(stderr, "[%s] %s Test", pass ? "PASS" : "FAIL", TARGET_NAME);
if (!pass) { fprintf(stderr, " at cycle %llu", fail_t); }
fprintf(stderr, "\nSEED: %ld\n", seed);
return pass ? EXIT_SUCCESS : EXIT_FAILURE;
}
static const size_t data_t_chunks = sizeof(data_t) / sizeof(uint32_t);
void simif_t::poke(size_t id, mpz_t& value) {
if (log) {
char* v_str = mpz_get_str(NULL, 16, value);
fprintf(stderr, "* POKE %s.%s <- 0x%s *\n", TARGET_NAME, INPUT_NAMES[id], v_str);
free(v_str);
}
size_t size;
data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
for (size_t i = 0 ; i < INPUT_CHUNKS[id] ; i++) {
write(INPUT_ADDRS[id]+i, i < size ? data[i] : 0);
}
}
void simif_t::peek(size_t id, mpz_t& value) {
const size_t size = (const size_t)OUTPUT_CHUNKS[id];
data_t data[size];
for (size_t i = 0 ; i < size ; i++) {
data[i] = read((size_t)OUTPUT_ADDRS[id]+i);
}
mpz_import(value, size, -1, sizeof(data_t), 0, 0, data);
if (log) {
char* v_str = mpz_get_str(NULL, 16, value);
fprintf(stderr, "* PEEK %s.%s -> 0x%s *\n", TARGET_NAME, (const char*)OUTPUT_NAMES[id], v_str);
free(v_str);
}
}
bool simif_t::expect(size_t id, mpz_t& expected) {
mpz_t value;
mpz_init(value);
peek(id, value);
bool pass = mpz_cmp(value, expected) == 0;
if (log) {
char* v_str = mpz_get_str(NULL, 16, value);
char* e_str = mpz_get_str(NULL, 16, expected);
fprintf(stderr, "* EXPECT %s.%s -> 0x%s ?= 0x%s : %s\n",
TARGET_NAME, (const char*)OUTPUT_NAMES[id], v_str, e_str, pass ? "PASS" : "FAIL");
free(v_str);
free(e_str);
}
mpz_clear(value);
return expect(pass, NULL);
}
void simif_t::step(uint32_t n, bool blocking) {
if (n == 0) return;
#ifdef ENABLE_SNAPSHOT
reservoir_sampling(n);
#endif
// take steps
if (log) fprintf(stderr, "* STEP %u -> %llu *\n", n, (unsigned long long)(t + n));
take_steps(n, blocking);
t += n;
}
void simif_t::load_mem(std::string filename) {
fprintf(stdout, "[loadmem] start loading\n");
std::ifstream file(filename.c_str());
if (!file) {
fprintf(stderr, "Cannot open %s\n", filename.c_str());
exit(EXIT_FAILURE);
}
const size_t chunk = MEM_DATA_BITS / 4;
size_t addr = 0;
std::string line;
mpz_t data;
mpz_init(data);
while (std::getline(file, line)) {
assert(line.length() % chunk == 0);
for (int j = line.length() - chunk ; j >= 0 ; j -= chunk) {
mpz_set_str(data, line.substr(j, chunk).c_str(), 16);
write_mem(addr, data);
addr += chunk / 2;
}
}
mpz_clear(data);
file.close();
fprintf(stdout, "[loadmem] done\n");
}
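The loop in load_mem above walks each hex line from its least-significant end, carving off one beat's worth of hex digits per memory write (each pair of digits is one byte). A standalone sketch of that slicing logic, independent of the MMIO plumbing (the helper name `slice_line` is illustrative, not part of MIDAS):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Slice a hex line into beat-sized substrings, least-significant beat first,
// mirroring the j-loop in load_mem (chunk = hex digits per memory beat).
std::vector<std::string> slice_line(const std::string& line, size_t chunk) {
  std::vector<std::string> beats;
  for (int j = (int)line.length() - (int)chunk; j >= 0; j -= (int)chunk)
    beats.push_back(line.substr(j, chunk));
  return beats;
}
```

Each returned substring is then parsed as one beat and written at an address that advances by chunk/2 bytes, matching the loop body above.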
// NB: mpz_export may emit fewer than the requested number of data_t beats
// (e.g., when the value's high-order words are zero), so callers must zero-pad
// the remaining beats themselves.
void simif_t::read_mem(size_t addr, mpz_t& value) {
write(this->loadmem_mmio_addrs->R_ADDRESS_H, addr >> 32);
write(this->loadmem_mmio_addrs->R_ADDRESS_L, addr & ((1ULL << 32) - 1));
const size_t size = MEM_DATA_CHUNK;
data_t data[size];
for (size_t i = 0 ; i < size ; i++) {
data[i] = read(this->loadmem_mmio_addrs->R_DATA);
}
mpz_import(value, size, -1, sizeof(data_t), 0, 0, data);
}
void simif_t::write_mem(size_t addr, mpz_t& value) {
write(this->loadmem_mmio_addrs->W_ADDRESS_H, addr >> 32);
write(this->loadmem_mmio_addrs->W_ADDRESS_L, addr & ((1ULL << 32) - 1));
write(this->loadmem_mmio_addrs->W_LENGTH, 1);
size_t size;
data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
for (size_t i = 0 ; i < MEM_DATA_CHUNK ; i++) {
write(this->loadmem_mmio_addrs->W_DATA, i < size ? data[i] : 0);
}
}
#define MEM_DATA_CHUNK_BYTES (MEM_DATA_CHUNK*sizeof(data_t))
#define ceil_div(a, b) (((a) - 1) / (b) + 1)
void simif_t::write_mem_chunk(size_t addr, mpz_t& value, size_t bytes) {
write(this->loadmem_mmio_addrs->W_ADDRESS_H, addr >> 32);
write(this->loadmem_mmio_addrs->W_ADDRESS_L, addr & ((1ULL << 32) - 1));
size_t num_beats = ceil_div(bytes, MEM_DATA_CHUNK_BYTES);
write(this->loadmem_mmio_addrs->W_LENGTH, num_beats);
size_t size;
data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
for (size_t i = 0 ; i < num_beats * MEM_DATA_CHUNK ; i++) {
write(this->loadmem_mmio_addrs->W_DATA, i < size ? data[i] : 0);
}
}
void simif_t::zero_out_dram() {
write(this->loadmem_mmio_addrs->ZERO_OUT_DRAM, 1);
while(!read(this->loadmem_mmio_addrs->ZERO_FINISHED));
}

@ -0,0 +1,167 @@
// See LICENSE for license details.
#ifndef __SIMIF_H
#define __SIMIF_H
#include <cassert>
#include <cstring>
#include <sstream>
#include <map>
#include <queue>
#include <random>
#ifdef ENABLE_SNAPSHOT
#include "sample/sample.h"
#endif
#include <gmp.h>
#include <sys/time.h>
#define TIME_DIV_CONST 1000000.0
typedef uint64_t midas_time_t;
midas_time_t timestamp();
double diff_secs(midas_time_t end, midas_time_t start);
typedef std::map< std::string, size_t > idmap_t;
typedef std::map< std::string, size_t >::const_iterator idmap_it_t;
class simif_t
{
public:
simif_t();
virtual ~simif_t() { }
private:
// simulation information
bool log;
bool pass;
uint64_t t;
uint64_t fail_t;
// random numbers
uint64_t seed;
std::mt19937_64 gen;
SIMULATIONMASTER_struct * master_mmio_addrs;
LOADMEMWIDGET_struct * loadmem_mmio_addrs;
PEEKPOKEBRIDGEMODULE_struct * defaultiowidget_mmio_addrs;
midas_time_t sim_start_time;
inline void take_steps(size_t n, bool blocking) {
write(this->master_mmio_addrs->STEP, n);
if (blocking) while(!done());
}
virtual void load_mem(std::string filename);
public:
// Simulation APIs
virtual void init(int argc, char** argv, bool log = false);
virtual int finish();
virtual void step(uint32_t n, bool blocking = true);
inline bool done() { return read(this->master_mmio_addrs->DONE); }
// Widget communication
virtual void write(size_t addr, data_t data) = 0;
virtual data_t read(size_t addr) = 0;
virtual ssize_t pull(size_t addr, char *data, size_t size) = 0;
virtual ssize_t push(size_t addr, char *data, size_t size) = 0;
inline void poke(size_t id, data_t value) {
if (log) fprintf(stderr, "* POKE %s.%s <- 0x%x *\n",
TARGET_NAME, INPUT_NAMES[id], value);
write(INPUT_ADDRS[id], value);
}
inline data_t peek(size_t id) {
data_t value = read(OUTPUT_ADDRS[id]);
if (log) fprintf(stderr, "* PEEK %s.%s -> 0x%x *\n",
TARGET_NAME, (const char*)OUTPUT_NAMES[id], value);
return value;
}
inline bool expect(size_t id, data_t expected) {
data_t value = peek(id);
bool pass = value == expected;
if (log) fprintf(stderr, "* EXPECT %s.%s -> 0x%x ?= 0x%x : %s\n",
TARGET_NAME, (const char*)OUTPUT_NAMES[id], value, expected, pass ? "PASS" : "FAIL");
return expect(pass, NULL);
}
inline bool expect(bool pass, const char *s) {
if (log && s) fprintf(stderr, "* %s : %s *\n", s, pass ? "PASS" : "FAIL");
if (this->pass && !pass) fail_t = t;
this->pass &= pass;
return pass;
}
void poke(size_t id, mpz_t& value);
void peek(size_t id, mpz_t& value);
bool expect(size_t id, mpz_t& expected);
// LOADMEM functions
void read_mem(size_t addr, mpz_t& value);
void write_mem(size_t addr, mpz_t& value);
void write_mem_chunk(size_t addr, mpz_t& value, size_t bytes);
void zero_out_dram();
uint64_t get_seed() { return seed; };
// A default reset scheme that holds reset high for pulse_length cycles
void target_reset(int pulse_length = 5);
// Returns an upper bound for the cycle reached by the target
// If using blocking steps, this will be ~equivalent to actual_tcycle()
uint64_t cycles(){ return t; };
// Returns the current target cycle as measured by a hardware counter in the DefaultIOWidget
// (# of reset tokens generated)
uint64_t actual_tcycle();
// Returns the current host cycle as measured by a hardware counter
uint64_t hcycle();
uint64_t rand_next(uint64_t limit) { return gen() % limit; }
#ifdef ENABLE_SNAPSHOT
private:
// sample information
#ifdef KEEP_SAMPLES_IN_MEM
sample_t** samples;
#endif
sample_t* last_sample;
size_t sample_num;
size_t last_sample_id;
std::string sample_file;
uint64_t sample_cycle;
uint64_t snap_cycle;
size_t trace_count;
// profile information
bool profile;
size_t sample_count;
midas_time_t sample_time;
void init_sampling(int argc, char** argv);
void finish_sampling();
void reservoir_sampling(size_t n);
void deterministic_sampling(size_t n);
size_t trace_ready_valid_bits(
sample_t* sample, bool poke, size_t id, size_t bits_id);
inline void save_sample();
protected:
size_t tracelen;
sample_t* read_snapshot(bool load = false);
sample_t* read_traces(sample_t* s);
public:
uint64_t get_snap_cycle() const {
return snap_cycle;
}
uint64_t get_sample_cycle() const {
return sample_cycle;
}
void set_sample_cycle(uint64_t cycle) {
sample_cycle = cycle;
}
void set_trace_count(uint64_t count) {
trace_count = count;
}
#endif
};
#endif // __SIMIF_H
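The snapshotting support declared above relies on reservoir sampling (see `reservoir_sampling(size_t n)`) to retain a bounded, uniformly chosen set of samples from an unbounded stream of simulated cycles. A generic sketch of the algorithm itself, not the MIDAS implementation (which samples hardware state rather than integers):

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Classic reservoir sampling: after processing n items, each item seen so far
// remains in the k-entry reservoir with equal probability k/n.
std::vector<uint64_t> reservoir(const std::vector<uint64_t>& stream, size_t k,
                                uint64_t seed) {
  std::mt19937_64 gen(seed);
  std::vector<uint64_t> res;
  for (size_t n = 0; n < stream.size(); n++) {
    if (res.size() < k) {
      res.push_back(stream[n]);      // fill the reservoir first
    } else {
      size_t j = gen() % (n + 1);    // keep stream[n] with probability k/(n+1)
      if (j < k) res[j] = stream[n];
    }
  }
  return res;
}
```

The appeal for simulation is that the stream length (total cycles) need not be known in advance, which matches how the simulator decides on-the-fly when to capture a snapshot.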

@ -0,0 +1,201 @@
// See LICENSE for license details.
#include "simif_emul.h"
#ifdef VCS
#include "midas_context.h"
#include "emul/vcs_main.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif
#endif
#include <signal.h>
uint64_t main_time = 0;
std::unique_ptr<mmio_t> master;
std::unique_ptr<mmio_t> dma;
#ifdef VCS
midas_context_t* host;
midas_context_t target;
bool vcs_rst = false;
bool vcs_fin = false;
#else
PLATFORM_TYPE* top = NULL;
#if VM_TRACE
VerilatedVcdC* tfp = NULL;
#endif // VM_TRACE
double sc_time_stamp() {
return (double) main_time;
}
extern void tick();
#endif // VCS
void finish() {
#ifdef VCS
vcs_fin = true;
target.switch_to();
#else
#if VM_TRACE
if (tfp) tfp->close();
delete tfp;
#endif // VM_TRACE
#endif // VCS
}
void handle_sigterm(int sig) {
finish();
}
simif_emul_t::~simif_emul_t() { }
void simif_emul_t::init(int argc, char** argv, bool log) {
// Parse args
std::vector<std::string> args(argv + 1, argv + argc);
std::string waveform = "dump.vcd";
std::string loadmem;
bool fastloadmem = false;
bool dramsim = false;
uint64_t memsize = 1L << MEM_ADDR_BITS;
for (auto arg: args) {
if (arg.find("+waveform=") == 0) {
waveform = arg.c_str() + 10;
}
if (arg.find("+loadmem=") == 0) {
loadmem = arg.c_str() + 9;
}
if (arg.find("+fastloadmem") == 0) {
fastloadmem = true;
}
if (arg.find("+dramsim") == 0) {
dramsim = true;
}
if (arg.find("+memsize=") == 0) {
memsize = strtoll(arg.c_str() + 9, NULL, 10);
}
if (arg.find("+fuzz-host-timing=") == 0) {
maximum_host_delay = atoi(arg.c_str() + 18);
}
}
void* mems[1];
mems[0] = ::init(memsize, dramsim);
if (mems[0] && fastloadmem && !loadmem.empty()) {
fprintf(stdout, "[fast loadmem] %s\n", loadmem.c_str());
::load_mem(mems, loadmem.c_str(), MEM_DATA_BITS / 8, 1);
}
signal(SIGTERM, handle_sigterm);
#ifdef VCS
host = midas_context_t::current();
target_args_t *targs = new target_args_t(argc, argv);
target.init(target_thread, targs);
vcs_rst = true;
for (size_t i = 0 ; i < 10 ; i++)
target.switch_to();
vcs_rst = false;
#else
Verilated::commandArgs(argc, argv); // Remember args
top = new PLATFORM_TYPE;
#if VM_TRACE // If emul was invoked with --trace
tfp = new VerilatedVcdC;
Verilated::traceEverOn(true); // Verilator must compute traced signals
VL_PRINTF("Enabling waves: %s\n", waveform.c_str());
top->trace(tfp, 99); // Trace 99 levels of hierarchy
tfp->open(waveform.c_str()); // Open the dump file
#endif // VM_TRACE
top->reset = 1;
for (size_t i = 0 ; i < 10 ; i++) ::tick();
top->reset = 0;
#endif
simif_t::init(argc, argv, log);
}
int simif_emul_t::finish() {
int exitcode = simif_t::finish();
::finish();
return exitcode;
}
void simif_emul_t::advance_target() {
int cycles_to_wait = rand_next(maximum_host_delay) + 1;
for (int i = 0; i < cycles_to_wait; i++) {
#ifdef VCS
target.switch_to();
#else
::tick();
#endif
}
}
void simif_emul_t::wait_write(std::unique_ptr<mmio_t>& mmio) {
while(!mmio->write_resp()) advance_target();
}
void simif_emul_t::wait_read(std::unique_ptr<mmio_t>& mmio, void *data) {
while(!mmio->read_resp(data)) advance_target();
}
void simif_emul_t::write(size_t addr, data_t data) {
size_t strb = (1 << CTRL_STRB_BITS) - 1;
master->write_req(addr << CHANNEL_SIZE, CHANNEL_SIZE, 0, &data, &strb);
wait_write(master);
}
data_t simif_emul_t::read(size_t addr) {
data_t data;
master->read_req(addr << CHANNEL_SIZE, CHANNEL_SIZE, 0);
wait_read(master, &data);
return data;
}
#define MAX_LEN 255
ssize_t simif_emul_t::pull(size_t addr, char* data, size_t size) {
ssize_t len = (size - 1) / DMA_WIDTH;
while (len >= 0) {
size_t part_len = len % (MAX_LEN + 1);
dma->read_req(addr, DMA_SIZE, part_len);
wait_read(dma, data);
len -= (part_len + 1);
addr += (part_len + 1) * DMA_WIDTH;
data += (part_len + 1) * DMA_WIDTH;
}
return size;
}
ssize_t simif_emul_t::push(size_t addr, char *data, size_t size) {
ssize_t len = (size - 1) / DMA_WIDTH;
size_t remaining = size - len * DMA_WIDTH;
size_t strb[len + 1];
size_t *strb_ptr = &strb[0];
for (int i = 0; i < len; i++)
strb[i] = (1LL << DMA_WIDTH) - 1;
if (remaining == DMA_WIDTH)
strb[len] = strb[0];
else
strb[len] = (1LL << remaining) - 1;
while (len >= 0) {
size_t part_len = len % (MAX_LEN + 1);
dma->write_req(addr, DMA_SIZE, part_len, data, strb_ptr);
wait_write(dma);
len -= (part_len + 1);
addr += (part_len + 1) * DMA_WIDTH;
data += (part_len + 1) * DMA_WIDTH;
strb_ptr += (part_len + 1);
}
return size;
}
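Both pull and push above split a transfer into bursts of at most MAX_LEN+1 beats, taking the remainder burst first so that every subsequent burst is full-length. A self-contained sketch of just that burst arithmetic (`burst_lengths` is an illustrative helper, not part of the driver):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split `beats` total beats into burst lengths the way pull()/push() do:
// len holds the highest remaining beat index, and each request carries
// part_len + 1 beats, capped at max_len + 1.
std::vector<size_t> burst_lengths(size_t beats, size_t max_len) {
  std::vector<size_t> bursts;
  long len = (long)beats - 1;
  while (len >= 0) {
    size_t part_len = (size_t)len % (max_len + 1);
    bursts.push_back(part_len + 1);   // AXI-style: burst length = part_len + 1
    len -= (long)(part_len + 1);
  }
  return bursts;
}
```

Taking the remainder first means at most one short burst per transfer, which keeps the DMA engine's bursts as long as possible.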

@ -0,0 +1,38 @@
// See LICENSE for license details.
#ifndef __SIMIF_EMUL_H
#define __SIMIF_EMUL_H
#include <memory>
#include "simif.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include "emul/mmio.h"
// simif_emul_t is a concrete simif_t implementation for software RTL
// simulators, and is the basis for MIDAS-level simulation.
class simif_emul_t : public virtual simif_t
{
public:
simif_emul_t() { }
virtual ~simif_emul_t();
virtual void init(int argc, char** argv, bool log = false);
virtual int finish();
virtual void write(size_t addr, data_t data);
virtual data_t read(size_t addr);
virtual ssize_t pull(size_t addr, char* data, size_t size);
virtual ssize_t push(size_t addr, char* data, size_t size);
private:
// The maximum number of cycles the RTL simulator can advance before
// switching back to the driver process. +fuzz-host-timing= sets this to a
// value > 1, introducing random delays in MMIO (read, write) and DMA
// (push, pull) requests.
int maximum_host_delay = 1;
void advance_target();
void wait_read(std::unique_ptr<mmio_t>& mmio, void *data);
void wait_write(std::unique_ptr<mmio_t>& mmio);
};
#endif // __SIMIF_EMUL_H

@ -0,0 +1,212 @@
#include "simif_f1.h"
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
simif_f1_t::simif_f1_t(int argc, char** argv) {
#ifdef SIMULATION_XSIM
mkfifo(driver_to_xsim, 0666);
fprintf(stderr, "opening driver to xsim\n");
driver_to_xsim_fd = open(driver_to_xsim, O_WRONLY);
fprintf(stderr, "opening xsim to driver\n");
xsim_to_driver_fd = open(xsim_to_driver, O_RDONLY);
#else
slot_id = -1;
std::vector<std::string> args(argv + 1, argv + argc);
for (auto &arg: args) {
if (arg.find("+slotid=") == 0) {
slot_id = atoi((arg.c_str()) + 8);
}
}
if (slot_id == -1) {
fprintf(stderr, "Slot ID not specified. Assuming Slot 0\n");
slot_id = 0;
}
fpga_setup(slot_id);
#endif
}
void simif_f1_t::check_rc(int rc, char * infostr) {
#ifndef SIMULATION_XSIM
if (rc) {
if (infostr) {
fprintf(stderr, "%s\n", infostr);
}
fprintf(stderr, "INVALID RETCODE: %d\n", rc);
fpga_shutdown();
exit(1);
}
#endif
}
void simif_f1_t::fpga_shutdown() {
#ifndef SIMULATION_XSIM
int rc = fpga_pci_detach(pci_bar_handle);
// don't call check_rc because of fpga_shutdown call. do it manually:
if (rc) {
fprintf(stderr, "Failure while detaching from the fpga: %d\n", rc);
}
close(edma_write_fd);
close(edma_read_fd);
#endif
}
void simif_f1_t::fpga_setup(int slot_id) {
#ifndef SIMULATION_XSIM
/*
 * The pci_vendor_id and pci_device_id values below are Amazon's, and are
 * available for use with any F1 FPGA slot.
 * Users may replace these with their own IDs if allocated to them by PCI-SIG.
 */
uint16_t pci_vendor_id = 0x1D0F; /* Amazon PCI Vendor ID */
uint16_t pci_device_id = 0xF000; /* PCI Device ID preassigned by Amazon for F1 applications */
int rc = fpga_pci_init();
check_rc(rc, "fpga_pci_init FAILED");
/* check AFI status */
struct fpga_mgmt_image_info info = {0};
/* get local image description, contains status, vendor id, and device id. */
rc = fpga_mgmt_describe_local_image(slot_id, &info,0);
check_rc(rc, "Unable to get AFI information from slot. Are you running as root?");
/* check to see if the slot is ready */
if (info.status != FPGA_STATUS_LOADED) {
rc = 1;
check_rc(rc, "AFI in Slot is not in READY state !");
}
fprintf(stderr, "AFI PCI Vendor ID: 0x%x, Device ID 0x%x\n",
info.spec.map[FPGA_APP_PF].vendor_id,
info.spec.map[FPGA_APP_PF].device_id);
/* confirm that the AFI that we expect is in fact loaded */
if (info.spec.map[FPGA_APP_PF].vendor_id != pci_vendor_id ||
info.spec.map[FPGA_APP_PF].device_id != pci_device_id) {
fprintf(stderr, "AFI does not show expected PCI vendor id and device ID. If the AFI "
"was just loaded, it might need a rescan. Rescanning now.\n");
rc = fpga_pci_rescan_slot_app_pfs(slot_id);
check_rc(rc, "Unable to update PF for slot");
/* get local image description, contains status, vendor id, and device id. */
rc = fpga_mgmt_describe_local_image(slot_id, &info,0);
check_rc(rc, "Unable to get AFI information from slot");
fprintf(stderr, "AFI PCI Vendor ID: 0x%x, Device ID 0x%x\n",
info.spec.map[FPGA_APP_PF].vendor_id,
info.spec.map[FPGA_APP_PF].device_id);
/* confirm that the AFI that we expect is in fact loaded after rescan */
if (info.spec.map[FPGA_APP_PF].vendor_id != pci_vendor_id ||
info.spec.map[FPGA_APP_PF].device_id != pci_device_id) {
rc = 1;
check_rc(rc, "The PCI vendor id and device of the loaded AFI are not "
"the expected values.");
}
}
/* attach to BAR0 */
pci_bar_handle = PCI_BAR_HANDLE_INIT;
rc = fpga_pci_attach(slot_id, FPGA_APP_PF, APP_PF_BAR0, 0, &pci_bar_handle);
check_rc(rc, "fpga_pci_attach FAILED");
// EDMA setup
char device_file_name[256];
char device_file_name2[256];
sprintf(device_file_name, "/dev/xdma%d_h2c_0", slot_id);
printf("Using xdma write queue: %s\n", device_file_name);
sprintf(device_file_name2, "/dev/xdma%d_c2h_0", slot_id);
printf("Using xdma read queue: %s\n", device_file_name2);
edma_write_fd = open(device_file_name, O_WRONLY);
edma_read_fd = open(device_file_name2, O_RDONLY);
assert(edma_write_fd >= 0);
assert(edma_read_fd >= 0);
#endif
}
simif_f1_t::~simif_f1_t() {
fpga_shutdown();
}
void simif_f1_t::write(size_t addr, uint32_t data) {
// addr is really a 32-bit (4-byte) word address, a holdover from the Zynq implementation
addr <<= 2;
#ifdef SIMULATION_XSIM
uint64_t cmd = (((uint64_t)(0x80000000 | addr)) << 32) | (uint64_t)data;
char * buf = (char*)&cmd;
::write(driver_to_xsim_fd, buf, 8);
#else
int rc = fpga_pci_poke(pci_bar_handle, addr, data);
check_rc(rc, NULL);
#endif
}
uint32_t simif_f1_t::read(size_t addr) {
addr <<= 2;
#ifdef SIMULATION_XSIM
uint64_t cmd = addr;
char * buf = (char*)&cmd;
::write(driver_to_xsim_fd, buf, 8);
int gotdata = 0;
while (gotdata == 0) {
gotdata = ::read(xsim_to_driver_fd, buf, 8);
if (gotdata != 0 && gotdata != 8) {
printf("ERR GOTDATA %d\n", gotdata);
}
}
return *((uint64_t*)buf);
#else
uint32_t value;
int rc = fpga_pci_peek(pci_bar_handle, addr, &value);
check_rc(rc, NULL);
return value & 0xFFFFFFFF;
#endif
}
ssize_t simif_f1_t::pull(size_t addr, char* data, size_t size) {
#ifdef SIMULATION_XSIM
return -1; // TODO
#else
return ::pread(edma_read_fd, data, size, addr);
#endif
}
ssize_t simif_f1_t::push(size_t addr, char* data, size_t size) {
#ifdef SIMULATION_XSIM
return -1; // TODO
#else
return ::pwrite(edma_write_fd, data, size, addr);
#endif
}
uint32_t simif_f1_t::is_write_ready() {
uint64_t addr = 0x4;
#ifdef SIMULATION_XSIM
uint64_t cmd = addr;
char * buf = (char*)&cmd;
::write(driver_to_xsim_fd, buf, 8);
int gotdata = 0;
while (gotdata == 0) {
gotdata = ::read(xsim_to_driver_fd, buf, 8);
if (gotdata != 0 && gotdata != 8) {
printf("ERR GOTDATA %d\n", gotdata);
}
}
return *((uint64_t*)buf);
#else
uint32_t value;
int rc = fpga_pci_peek(pci_bar_handle, addr, &value);
check_rc(rc, NULL);
return value & 0xFFFFFFFF;
#endif
}

@ -0,0 +1,41 @@
#ifndef __SIMIF_F1_H
#define __SIMIF_F1_H
#include "simif.h" // from midas
#ifndef SIMULATION_XSIM
#include <fpga_pci.h>
#include <fpga_mgmt.h>
#endif
class simif_f1_t: public virtual simif_t
{
public:
simif_f1_t(int argc, char** argv);
virtual ~simif_f1_t();
virtual void write(size_t addr, uint32_t data);
virtual uint32_t read(size_t addr);
virtual ssize_t pull(size_t addr, char* data, size_t size);
virtual ssize_t push(size_t addr, char* data, size_t size);
uint32_t is_write_ready();
void check_rc(int rc, char * infostr);
void fpga_shutdown();
void fpga_setup(int slot_id);
private:
char in_buf[MMIO_WIDTH];
char out_buf[MMIO_WIDTH];
#ifdef SIMULATION_XSIM
char * driver_to_xsim = "/tmp/driver_to_xsim";
char * xsim_to_driver = "/tmp/xsim_to_driver";
int driver_to_xsim_fd;
int xsim_to_driver_fd;
#else
// int rc;
int slot_id;
int edma_write_fd;
int edma_read_fd;
pci_bar_handle_t pci_bar_handle;
#endif
};
#endif // __SIMIF_F1_H

@ -0,0 +1,33 @@
// See LICENSE for license details.
#include "simif_zynq.h"
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#define read_reg(r) (dev_vaddr[r])
#define write_reg(r, v) (dev_vaddr[r] = v)
simif_zynq_t::simif_zynq_t() {
int fd = open("/dev/mem", O_RDWR|O_SYNC);
assert(fd != -1);
int host_prot = PROT_READ | PROT_WRITE;
int flags = MAP_SHARED;
uintptr_t pgsize = sysconf(_SC_PAGESIZE);
assert(dev_paddr % pgsize == 0);
dev_vaddr = (uintptr_t*)mmap(0, pgsize, host_prot, flags, fd, dev_paddr);
assert(dev_vaddr != MAP_FAILED);
}
void simif_zynq_t::write(size_t addr, uint32_t data) {
write_reg(addr, data);
__sync_synchronize();
}
uint32_t simif_zynq_t::read(size_t addr) {
__sync_synchronize();
return read_reg(addr);
}

@ -0,0 +1,27 @@
// See LICENSE for license details.
#ifndef __SIMIF_ZYNQ_H
#define __SIMIF_ZYNQ_H
#include "simif.h"
class simif_zynq_t: public virtual simif_t
{
public:
simif_zynq_t();
virtual ~simif_zynq_t() { }
private:
volatile uintptr_t* dev_vaddr;
const static uintptr_t dev_paddr = 0x43C00000;
protected:
virtual void write(size_t addr, uint32_t data);
virtual uint32_t read(size_t addr);
virtual ssize_t pull(size_t addr, char* data, size_t size) {
// Not supported
return 0;
}
virtual ssize_t push(size_t addr, char* data, size_t size) {
// Not supported
return 0;
}
};
#endif // __SIMIF_ZYNQ_H

@ -0,0 +1,83 @@
# See LICENSE for license details.
#
# Makefrag for generating MIDAS's synthesizable unit tests
# Compulsory arguments:
# ROCKETCHIP_DIR: Location of rocket chip source -- to grab verilog sources and simulation makefrags
# TODO: These are provided as resources -- fix.
# SBT: command to invoke sbt
# GEN_DIR: Directory into which to emit the generated Verilog
DESIGN := TestHarness
CONFIG ?= AllUnitTests
OUT_DIR ?= $(GEN_DIR)
TB ?= TestDriver
EMUL ?= vcs
CLOCK_PERIOD ?= 1.0
MAKEFRAG_DIR:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
sim_makefrag_dir := $(MAKEFRAG_DIR)/../rtlsim
vsrc := $(ROCKETCHIP_DIR)/src/main/resources/vsrc
csrc := $(ROCKETCHIP_DIR)/src/main/resources/csrc
# Stupidly guess what this test might depend on
src_path = src/main/scala
scala_srcs := $(shell find $(BASE_DIR) -name "*.scala")
$(GEN_DIR)/$(DESIGN).v $(GEN_DIR)/$(DESIGN).behav_srams.v: $(scala_srcs)
mkdir -p $(@D)
cd $(BASE_DIR) && $(SBT) "runMain midas.unittest.Generator -td $(GEN_DIR) -conf $(CONFIG)"
touch $(GEN_DIR)/$(DESIGN).behav_srams.v
verilog: $(GEN_DIR)/$(DESIGN).v
# Common SW RTL simulation Makefrag arguments
# These aren't required as yet, but will be in the future
#bb_vsrcs = \
# $(vsrc)/ClockDivider2.v \
# $(vsrc)/ClockDivider3.v \
# $(vsrc)/AsyncResetReg.v \
#
#sim_vsrcs = \
# $(bb_vsrcs)
emul_v := $(GEN_DIR)/$(DESIGN).v #$(sim_vsrcs)
emul_h :=
emul_cc :=
# VCS Makefrag arguments
ifeq ($(EMUL),vcs)
vcs_wrapper_v := $(vsrc)/TestDriver.v
VCS_FLAGS = +verbose
include $(sim_makefrag_dir)/Makefrag-vcs
vcs = $(OUT_DIR)/$(DESIGN)
vcs_debug = $(OUT_DIR)/$(DESIGN)-debug
vcs: $(vcs)
vcs-debug: $(vcs_debug)
else
# Verilator Makefrag arguments
top_module := TestHarness
override CFLAGS += -I$(csrc) -include $(csrc)/verilator.h -DTEST_HARNESS=V$(top_module) -std=c++11
override emul_cc += $(sim_makefrag_dir)/generic_vharness.cc
include $(sim_makefrag_dir)/Makefrag-verilator
verilator = $(OUT_DIR)/V$(DESIGN)
verilator_debug = $(OUT_DIR)/V$(DESIGN)-debug
verilator: $(verilator)
verilator-debug: $(verilator_debug)
endif
# Run recipes
run-midas-unittests: $($(EMUL))
cd $(GEN_DIR) && $<
run-midas-unittests-debug: $($(EMUL)_debug)
cd $(GEN_DIR) && $<
.PHONY: run-midas-unittests run-midas-unittests-debug verilog

@ -0,0 +1,2 @@
*.o
*.a

@ -0,0 +1,72 @@
// See LICENSE for license details.
#include "midas_context.h"
#include <stdlib.h>
#include <cassert>
static __thread midas_context_t* cur = NULL;
midas_context_t::midas_context_t()
: creator(NULL), func(NULL), arg(NULL),
mutex(PTHREAD_MUTEX_INITIALIZER),
cond(PTHREAD_COND_INITIALIZER), flag(0)
{
}
midas_context_t* midas_context_t::current()
{
if (cur == NULL)
{
cur = new midas_context_t;
cur->thread = pthread_self();
cur->flag = 1;
}
return cur;
}
void* midas_context_t::wrapper(void* a)
{
midas_context_t* ctx = static_cast<midas_context_t*>(a);
cur = ctx;
ctx->creator->switch_to();
ctx->func(ctx->arg);
return NULL;
}
void midas_context_t::init(int (*f)(void*), void* a)
{
func = f;
arg = a;
creator = current();
assert(flag == 0);
pthread_mutex_lock(&creator->mutex);
creator->flag = 0;
if (pthread_create(&thread, NULL, &midas_context_t::wrapper, this) != 0)
abort();
pthread_detach(thread);
while (!creator->flag)
pthread_cond_wait(&creator->cond, &creator->mutex);
pthread_mutex_unlock(&creator->mutex);
}
midas_context_t::~midas_context_t()
{
assert(this != cur);
}
void midas_context_t::switch_to()
{
assert(this != cur);
cur->flag = 0;
this->flag = 1;
pthread_mutex_lock(&this->mutex);
pthread_cond_signal(&this->cond);
pthread_mutex_unlock(&this->mutex);
pthread_mutex_lock(&cur->mutex);
while (!cur->flag)
pthread_cond_wait(&cur->cond, &cur->mutex);
pthread_mutex_unlock(&cur->mutex);
}
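switch_to above implements a strict handoff: exactly one context owns the flag at a time, and the mutex/condvar pair prevents lost wakeups. The same protocol can be sketched with std:: primitives (this is an illustrative ping-pong between two std::thread's, not the pthread-based midas_context_t itself):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Two threads alternate strictly, each waiting until `running` names it,
// mirroring the flag/cond handoff in midas_context_t::switch_to().
std::vector<int> pingpong(int rounds) {
  std::vector<int> order;
  std::mutex m;
  std::condition_variable cv;
  int running = 0;  // which side may proceed: 0 = main, 1 = worker
  std::thread worker([&] {
    for (int i = 0; i < rounds; i++) {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, [&] { return running == 1; });  // wait for our turn
      order.push_back(1);
      running = 0;          // hand control back before sleeping again
      cv.notify_all();
    }
  });
  for (int i = 0; i < rounds; i++) {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return running == 0; });
    order.push_back(0);
    running = 1;
    cv.notify_all();
  }
  worker.join();
  return order;
}
```

Waiting on a predicate (rather than a bare cond_wait) is what makes the handoff robust against spurious wakeups, just as the `while (!flag)` loops do in the pthread code above.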

@ -0,0 +1,28 @@
// See LICENSE for license details.
#ifndef __CONTEXT_H
#define __CONTEXT_H
#include <pthread.h>
class midas_context_t
{
public:
midas_context_t();
~midas_context_t();
void init(int (*func)(void*), void* arg);
void switch_to();
static midas_context_t* current();
private:
midas_context_t* creator;
int (*func)(void*);
void* arg;
pthread_t thread;
pthread_mutex_t mutex;
pthread_cond_t cond;
volatile int flag;
static void* wrapper(void*);
};
#endif // __CONTEXT_H

View File

@ -0,0 +1,163 @@
// See LICENSE for license details.
#include "mm.h"
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cstring>
#include <string>
#include <cassert>
void mm_base_t::write(uint64_t addr, uint8_t *data) {
addr %= this->size;
uint8_t* base = this->data + addr;
memcpy(base, data, word_size);
}
void mm_base_t::write(uint64_t addr, uint8_t *data, uint64_t strb, uint64_t size)
{
if (addr >= this->size) {
char buf[80];
snprintf(buf, 80, "Out-of-bounds write @ address: 0x%lx Memory size: 0x%lx\n", addr, this->size);
throw(mm_exception(buf));
}
strb &= ((1L << size) - 1) << (addr % word_size);
uint8_t *base = this->data + (addr / word_size) * word_size;
for (int i = 0; i < word_size; i++) {
if (strb & 1)
base[i] = data[i];
strb >>= 1;
}
}
std::vector<char> mm_base_t::read(uint64_t addr)
{
if (addr >= this->size) {
char buf[80];
snprintf(buf, 80, "Out-of-bounds read @ address: 0x%lx Memory size: 0x%lx\n", addr, this->size);
throw(mm_exception(buf));
}
uint8_t *base = this->data + addr;
return std::vector<char>(base, base + word_size);
}
void mm_base_t::init(size_t sz, int wsz, int lsz)
{
assert(wsz > 0 && lsz > 0 && (lsz & (lsz-1)) == 0 && lsz % wsz == 0);
word_size = wsz;
line_size = lsz;
data = new uint8_t[sz];
size = sz;
}
mm_base_t::~mm_base_t()
{
delete [] data;
}
void mm_magic_t::init(size_t sz, int wsz, int lsz)
{
mm_t::init(sz, wsz, lsz);
dummy_data.resize(word_size);
}
void mm_magic_t::tick(
bool reset,
bool ar_valid,
uint64_t ar_addr,
uint64_t ar_id,
uint64_t ar_size,
uint64_t ar_len,
bool aw_valid,
uint64_t aw_addr,
uint64_t aw_id,
uint64_t aw_size,
uint64_t aw_len,
bool w_valid,
uint64_t w_strb,
void *w_data,
bool w_last,
bool r_ready,
bool b_ready)
{
bool ar_fire = !reset && ar_valid && ar_ready();
bool aw_fire = !reset && aw_valid && aw_ready();
bool w_fire = !reset && w_valid && w_ready();
bool r_fire = !reset && r_valid() && r_ready;
bool b_fire = !reset && b_valid() && b_ready;
if (ar_fire) {
uint64_t start_addr = (ar_addr / word_size) * word_size;
for (size_t i = 0; i <= ar_len; i++) {
auto dat = read(start_addr + i * word_size);
rresp.push(mm_rresp_t(ar_id, dat, i == ar_len));
}
}
if (aw_fire) {
store_addr = aw_addr;
store_id = aw_id;
store_count = aw_len + 1;
store_size = 1 << aw_size;
store_inflight = true;
}
if (w_fire) {
write(store_addr, (uint8_t*)w_data, w_strb, store_size);
store_addr += store_size;
store_count--;
if (store_count == 0) {
store_inflight = false;
bresp.push(store_id);
assert(w_last);
}
}
if (b_fire)
bresp.pop();
if (r_fire)
rresp.pop();
cycle++;
if (reset) {
while (!bresp.empty()) bresp.pop();
while (!rresp.empty()) rresp.pop();
cycle = 0;
}
}
void load_mem(void** mems, const char* fn, int line_size, int nchannels)
{
char* m;
int start = 0;
std::ifstream in(fn);
if (!in)
{
std::cerr << "could not open " << fn << std::endl;
exit(EXIT_FAILURE);
}
std::string line;
while (std::getline(in, line))
{
#define parse_nibble(c) ((c) >= 'a' ? (c)-'a'+10 : (c)-'0')
for (int i = line.length()-2, j = 0; i >= 0; i -= 2, j++) {
char data = (parse_nibble(line[i]) << 4) | parse_nibble(line[i+1]);
int addr = start + j;
int channel = (addr / line_size) % nchannels;
m = (char *) mems[channel];
addr = (addr / line_size / nchannels) * line_size + (addr % line_size);
m[addr] = data;
}
start += line.length()/2;
}
}
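This load_mem distributes line-sized blocks round-robin across memory channels: the channel is selected by the block index modulo nchannels, and the address within the channel is the block's position among that channel's blocks. The mapping in isolation (the `channel_map` helper is illustrative, not part of the emulator):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Map a flat byte address to (channel, in-channel address) the way load_mem()
// interleaves line-sized blocks across channels.
std::pair<int, size_t> channel_map(size_t addr, size_t line_size,
                                   int nchannels) {
  int channel = (int)((addr / line_size) % (size_t)nchannels);
  size_t local = (addr / line_size / (size_t)nchannels) * line_size
               + (addr % line_size);                 // offset within the line
  return {channel, local};
}
```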

View File

@ -0,0 +1,172 @@
// See LICENSE for license details.
#ifndef MM_EMULATOR_H
#define MM_EMULATOR_H
#include <stdint.h>
#include <cstring>
#include <queue>
#include <string>
#include <stdexcept>
class mm_exception : public std::runtime_error {
public:
explicit mm_exception(const std::string& msg) :
std::runtime_error(msg), msg_(msg) {};
virtual const char* what() const throw()
{
return msg_.c_str();
};
virtual ~mm_exception() throw () {};
private:
std::string msg_;
};
class mm_base_t
{
public:
mm_base_t(): data(0), size(0) {}
virtual void init(size_t sz, int word_size, int line_size);
virtual void* get_data() { return data; }
virtual size_t get_size() { return size; }
virtual size_t get_word_size() { return word_size; }
virtual size_t get_line_size() { return line_size; }
void write(uint64_t addr, uint8_t *data);
void write(uint64_t addr, uint8_t *data, uint64_t strb, uint64_t size);
std::vector<char> read(uint64_t addr);
virtual ~mm_base_t();
protected:
uint8_t* data;
size_t size;
int word_size;
int line_size;
};
class mm_t: public mm_base_t
{
public:
virtual bool ar_ready() = 0;
virtual bool aw_ready() = 0;
virtual bool w_ready() = 0;
virtual bool b_valid() = 0;
virtual uint64_t b_resp() = 0;
virtual uint64_t b_id() = 0;
virtual bool r_valid() = 0;
virtual uint64_t r_resp() = 0;
virtual uint64_t r_id() = 0;
virtual void *r_data() = 0;
virtual bool r_last() = 0;
virtual void tick
(
bool reset,
bool ar_valid,
uint64_t ar_addr,
uint64_t ar_id,
uint64_t ar_size,
uint64_t ar_len,
bool aw_valid,
uint64_t aw_addr,
uint64_t aw_id,
uint64_t aw_size,
uint64_t aw_len,
bool w_valid,
uint64_t w_strb,
void *w_data,
bool w_last,
bool r_ready,
bool b_ready
) = 0;
};
struct mm_rresp_t
{
uint64_t id;
std::vector<char> data;
bool last;
mm_rresp_t(uint64_t id, std::vector<char> data, bool last)
{
this->id = id;
this->data = data;
this->last = last;
}
mm_rresp_t()
{
this->id = 0;
this->last = false;
}
};
class mm_magic_t : public mm_t
{
public:
mm_magic_t() : store_inflight(false) {}
virtual void init(size_t sz, int word_size, int line_size);
virtual bool ar_ready() { return true; }
virtual bool aw_ready() { return !store_inflight; }
virtual bool w_ready() { return store_inflight; }
virtual bool b_valid() { return !bresp.empty(); }
virtual uint64_t b_resp() { return 0; }
virtual uint64_t b_id() { return b_valid() ? bresp.front() : 0; }
virtual bool r_valid() { return !rresp.empty(); }
virtual uint64_t r_resp() { return 0; }
virtual uint64_t r_id() { return r_valid() ? rresp.front().id: 0; }
virtual void *r_data() { return r_valid() ? &rresp.front().data[0] : &dummy_data[0]; }
virtual bool r_last() { return r_valid() ? rresp.front().last : false; }
virtual void tick
(
bool reset,
bool ar_valid,
uint64_t ar_addr,
uint64_t ar_id,
uint64_t ar_size,
uint64_t ar_len,
bool aw_valid,
uint64_t aw_addr,
uint64_t aw_id,
uint64_t aw_size,
uint64_t aw_len,
bool w_valid,
uint64_t w_strb,
void *w_data,
bool w_last,
bool r_ready,
bool b_ready
);
protected:
bool store_inflight;
uint64_t store_addr;
uint64_t store_id;
uint64_t store_size;
uint64_t store_count;
std::vector<char> dummy_data;
std::queue<uint64_t> bresp;
std::queue<mm_rresp_t> rresp;
uint64_t cycle;
};
void load_mem(void** mems, const char* fn, int line_size, int nchannels);
#endif
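The `mm_magic_t` declaration above implements a zero-latency "magic" memory: `aw_ready` is asserted only while no store is in flight, and `w_ready` only while one is, so each write burst proceeds address-then-data. A toy Python sketch of that write-channel state machine (invented names; an illustration, not the FireSim code itself):

```python
class MagicWritePort:
    """Toy model of mm_magic_t's AW/W/B handshaking: accept an address
    beat, then data beats until w_last, then queue a write response."""
    def __init__(self):
        self.store_inflight = False
        self.store_id = 0
        self.bresp = []          # queued write-response IDs

    def aw_ready(self):
        # A new address beat is accepted only when no store is in flight.
        return not self.store_inflight

    def w_ready(self):
        # Data beats are accepted only while a store is in flight.
        return self.store_inflight

    def tick(self, aw_valid=False, aw_id=0, w_valid=False, w_last=False):
        # Sample the handshakes first, as the C++ tick() does.
        aw_fire = aw_valid and self.aw_ready()
        w_fire = w_valid and self.w_ready()
        if aw_fire:
            self.store_inflight = True
            self.store_id = aw_id
        if w_fire and w_last:
            self.store_inflight = False
            self.bresp.append(self.store_id)
```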


@ -0,0 +1,152 @@
// See LICENSE for license details.
#include "mm_dramsim2.h"
#include "mm.h"
#include <iostream>
#include <fstream>
#include <list>
#include <queue>
#include <cstring>
#include <cstdlib>
#include <cassert>
//#define DEBUG_DRAMSIM2
using namespace DRAMSim;
void mm_dramsim2_t::read_complete(unsigned id, uint64_t address, uint64_t clock_cycle)
{
assert(!rreq[address].empty());
auto req = rreq[address].front();
uint64_t start_addr = (req.addr / word_size) * word_size;
for (size_t i = 0; i < req.len; i++) {
auto dat = read(start_addr + i * word_size);
rresp.push(mm_rresp_t(req.id, dat, (i == req.len - 1)));
}
read_id_busy[req.id] = false;
rreq[address].pop();
}
void mm_dramsim2_t::write_complete(unsigned id, uint64_t address, uint64_t clock_cycle)
{
assert(!wreq[address].empty());
auto b_id = wreq[address].front();
bresp.push(b_id);
write_id_busy[b_id] = false;
wreq[address].pop();
}
void power_callback(double a, double b, double c, double d)
{
//fprintf(stderr, "power callback: %0.3f, %0.3f, %0.3f, %0.3f\n",a,b,c,d);
}
void mm_dramsim2_t::init(size_t sz, int wsz, int lsz)
{
assert(lsz == 64); // assumed by dramsim2
mm_t::init(sz, wsz, lsz);
dummy_data.resize(word_size);
assert(size % (1024*1024) == 0);
mem = getMemorySystemInstance(memory_ini, system_ini, ini_dir, "results", size/(1024*1024));
TransactionCompleteCB *read_cb = new Callback<mm_dramsim2_t, void, unsigned, uint64_t, uint64_t>(this, &mm_dramsim2_t::read_complete);
TransactionCompleteCB *write_cb = new Callback<mm_dramsim2_t, void, unsigned, uint64_t, uint64_t>(this, &mm_dramsim2_t::write_complete);
mem->RegisterCallbacks(read_cb, write_cb, power_callback);
#ifdef DEBUG_DRAMSIM2
fprintf(stderr,"Dramsim2 init successful\n");
#endif
}
bool mm_dramsim2_t::ar_ready() {
return mem->willAcceptTransaction();
}
bool mm_dramsim2_t::aw_ready() {
return mem->willAcceptTransaction() && !store_inflight;
}
void mm_dramsim2_t::tick(
bool reset,
bool ar_valid,
uint64_t ar_addr,
uint64_t ar_id,
uint64_t ar_size,
uint64_t ar_len,
bool aw_valid,
uint64_t aw_addr,
uint64_t aw_id,
uint64_t aw_size,
uint64_t aw_len,
bool w_valid,
uint64_t w_strb,
void *w_data,
bool w_last,
bool r_ready,
bool b_ready)
{
bool ar_fire = !reset && ar_valid && ar_ready();
bool aw_fire = !reset && aw_valid && aw_ready();
bool w_fire = !reset && w_valid && w_ready();
bool r_fire = !reset && r_valid() && r_ready;
bool b_fire = !reset && b_valid() && b_ready;
if (mem->willAcceptTransaction()) {
for (auto it = rreq_queue.begin(); it != rreq_queue.end(); it++) {
if (!read_id_busy[it->id]) {
read_id_busy[it->id] = true;
auto transaction = *it;
rreq[transaction.addr].push(transaction);
mem->addTransaction(false, transaction.addr);
rreq_queue.erase(it);
break;
}
}
}
if (ar_fire) {
rreq_queue.push_back(mm_req_t(ar_id, 1 << ar_size, ar_len + 1, ar_addr));
}
if (aw_fire) {
store_addr = aw_addr;
store_id = aw_id;
store_count = aw_len + 1;
store_size = 1 << aw_size;
store_inflight = true;
}
if (w_fire) {
write(store_addr, (uint8_t*)w_data, w_strb, store_size);
store_addr += store_size;
store_count--;
if (store_count == 0) {
store_inflight = false;
mem->addTransaction(true, store_addr);
wreq[store_addr].push(store_id);
assert(w_last);
}
}
if (b_fire)
bresp.pop();
if (r_fire)
rresp.pop();
mem->update();
cycle++;
if (reset) {
while (!bresp.empty()) bresp.pop();
while (!rresp.empty()) rresp.pop();
cycle = 0;
}
}

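The scan over `rreq_queue` near the top of `tick` exists because DRAMSim2's completion callbacks carry only an address, not an AXI ID: at most one read per ID is issued at a time, and the oldest pending read whose ID is idle goes next. A standalone sketch of that selection policy (hypothetical names, for illustration only):

```python
def pick_next_read(pending, id_busy):
    """Return (index, request) of the oldest pending read whose AXI ID
    has no outstanding transaction, or None if every candidate is busy."""
    for i, req in enumerate(pending):
        if not id_busy[req["id"]]:
            return i, req
    return None

# ID 1 already has a read in flight, so the younger ID-0 read is issued.
pending = [{"id": 1, "addr": 0x100}, {"id": 0, "addr": 0x140}]
busy = {0: False, 1: True}
choice = pick_next_read(pending, busy)
```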


@ -0,0 +1,124 @@
// See LICENSE for license details.
#ifndef _MM_EMULATOR_DRAMSIM2_H
#define _MM_EMULATOR_DRAMSIM2_H
#include "mm.h"
#include <DRAMSim.h>
#include <map>
#include <queue>
#include <list>
#include <stdint.h>
struct mm_req_t {
uint64_t id;
uint64_t size;
uint64_t len;
uint64_t addr;
mm_req_t(uint64_t id, uint64_t size, uint64_t len, uint64_t addr)
{
this->id = id;
this->size = size;
this->len = len;
this->addr = addr;
}
mm_req_t()
{
this->id = 0;
this->size = 0;
this->len = 0;
this->addr = 0;
}
};
class mm_dramsim2_t : public mm_t
{
public:
mm_dramsim2_t(int axi4_ids) :
read_id_busy(axi4_ids, false),
write_id_busy(axi4_ids, false) {};
mm_dramsim2_t(std::string memory_ini, std::string system_ini, std::string ini_dir, int axi4_ids) :
memory_ini(memory_ini),
system_ini(system_ini),
ini_dir(ini_dir),
read_id_busy(axi4_ids, false),
write_id_busy(axi4_ids, false) {};
virtual void init(size_t sz, int word_size, int line_size);
virtual bool ar_ready();
virtual bool aw_ready();
virtual bool w_ready() { return store_inflight; }
virtual bool b_valid() { return !bresp.empty(); }
virtual uint64_t b_resp() { return 0; }
virtual uint64_t b_id() { return b_valid() ? bresp.front() : 0; }
virtual bool r_valid() { return !rresp.empty(); }
virtual uint64_t r_resp() { return 0; }
virtual uint64_t r_id() { return r_valid() ? rresp.front().id: 0; }
virtual void *r_data() { return r_valid() ? &rresp.front().data[0] : &dummy_data[0]; }
virtual bool r_last() { return r_valid() ? rresp.front().last : false; }
virtual void tick
(
bool reset,
bool ar_valid,
uint64_t ar_addr,
uint64_t ar_id,
uint64_t ar_size,
uint64_t ar_len,
bool aw_valid,
uint64_t aw_addr,
uint64_t aw_id,
uint64_t aw_size,
uint64_t aw_len,
bool w_valid,
uint64_t w_strb,
void *w_data,
bool w_last,
bool r_ready,
bool b_ready
);
protected:
DRAMSim::MultiChannelMemorySystem *mem;
uint64_t cycle;
bool store_inflight = false;
std::string memory_ini = "DDR3_micron_64M_8B_x4_sg15.ini";
std::string system_ini = "system.ini";
std::string ini_dir = "dramsim2_ini";
uint64_t store_addr;
uint64_t store_id;
uint64_t store_size;
uint64_t store_count;
std::vector<char> dummy_data;
std::queue<uint64_t> bresp;
// Keep a FIFO of requests per address, since DRAMSim2's completion
// callbacks report only the address: reads or writes to the same address
// from different IDs would otherwise collide.
std::map<uint64_t, std::queue<uint64_t>> wreq;
std::map<uint64_t, std::queue<mm_req_t>> rreq;
std::queue<mm_rresp_t> rresp;
// Track which AXI IDs currently have a transaction in flight so that
// responses for a given ID stay in order.
std::vector<bool> read_id_busy;
std::vector<bool> write_id_busy;
std::list<mm_req_t> rreq_queue;
void read_complete(unsigned id, uint64_t address, uint64_t clock_cycle);
void write_complete(unsigned id, uint64_t address, uint64_t clock_cycle);
};
#endif


@ -0,0 +1,23 @@
# Compile DRAMSim2
dramsim_o := $(foreach f, \
$(patsubst %.cpp, %.o, $(wildcard $(midas_dir)/dramsim2/*.cpp)), \
$(GEN_DIR)/$(notdir $(f)))
$(dramsim_o): $(GEN_DIR)/%.o: $(midas_dir)/dramsim2/%.cpp
$(CXX) $(CXXFLAGS) -DNO_STORAGE -DNO_OUTPUT -Dmain=nomain -c -o $@ $<
ifeq ($(PLATFORM),zynq)
host = arm-xilinx-linux-gnueabi
endif
# Compile utility code
lib_files := mm mm_dramsim2 $(if $(filter $(CXX),cl),,midas_context)
lib_cc := $(addprefix $(util_dir)/, $(addsuffix .cc, $(lib_files)))
lib_o := $(addprefix $(GEN_DIR)/, $(addsuffix .o, $(lib_files)))
$(lib_o): $(GEN_DIR)/%.o: $(util_dir)/%.cc
$(CXX) $(CXXFLAGS) -c -o $@ $<
lib := $(GEN_DIR)/libmidas.a
$(lib): $(lib_o) $(dramsim_o)
$(AR) rcs $@ $^


@ -0,0 +1,58 @@
NUM_BANKS=8
NUM_ROWS=32768
NUM_COLS=2048
DEVICE_WIDTH=4
;in nanoseconds
;#define REFRESH_PERIOD 7800
REFRESH_PERIOD=7800
tCK=1.5 ;*
CL=10 ;*
AL=0 ;*
;AL=3; needs to be tRCD-1 or 0
;RL=(CL+AL)
;WL=(RL-1)
BL=8 ;*
tRAS=24;*
tRCD=10 ;*
tRRD=4 ;*
tRC=34 ;*
tRP=10 ;*
tCCD=4 ;*
tRTP=5 ;*
tWTR=5 ;*
tWR=10 ;*
tRTRS=1; -- RANK PARAMETER, TODO
tRFC=107;*
tFAW=20;*
tCKE=4 ;*
tXP=4 ;*
tCMD=1 ;*
IDD0=100;
IDD1=130;
IDD2P=10;
IDD2Q=70;
IDD2N=70;
IDD3Pf=60;
IDD3Ps=60;
IDD3N=90;
IDD4W=255;
IDD4R=230;
IDD5=305;
IDD6=9;
IDD6L=12;
IDD7=415;
;same bank
;READ_TO_PRE_DELAY=(AL+BL/2+max(tRTP,2)-2)
;WRITE_TO_PRE_DELAY=(WL+BL/2+tWR)
;READ_TO_WRITE_DELAY=(RL+BL/2+tRTRS-WL)
;READ_AUTOPRE_DELAY=(AL+tRTP+tRP)
;WRITE_AUTOPRE_DELAY=(WL+BL/2+tWR+tRP)
;WRITE_TO_READ_DELAY_B=(WL+BL/2+tWTR);interbank
;WRITE_TO_READ_DELAY_R=(WL+BL/2+tRTRS-RL);interrank
Vdd=1.5 ; TODO: double check this
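DRAMSim2 derives the commented-out quantities (RL, WL, and the *_DELAY expressions) from the starred parameters. Evaluating those formulas with the values above, as a sanity check:

```python
# Starred parameters from the ini file (times in cycles except tCK in ns).
tCK, CL, AL, BL = 1.5, 10, 0, 8
tRTP, tWR, tRTRS, tRP, tWTR = 5, 10, 1, 10, 5

RL = CL + AL                             # read latency
WL = RL - 1                              # write latency
read_to_pre = AL + BL // 2 + max(tRTP, 2) - 2
write_to_pre = WL + BL // 2 + tWR
read_to_write = RL + BL // 2 + tRTRS - WL
write_to_read_b = WL + BL // 2 + tWTR    # same rank, other bank
```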


@ -0,0 +1,25 @@
; COPY THIS FILE AND MODIFY IT TO SUIT YOUR NEEDS
NUM_CHANS=1 ; number of *logically independent* channels (i.e. each with a separate memory controller); should be a power of 2
JEDEC_DATA_BUS_BITS=64 ; Always 64 for DDRx; if you want multiple *ganged* channels, set this to N*64
TRANS_QUEUE_DEPTH=32 ; transaction queue, i.e., CPU-level commands such as: READ 0xbeef
CMD_QUEUE_DEPTH=32 ; command queue, i.e., DRAM-level commands such as: CAS 544, RAS 4
EPOCH_LENGTH=100000 ; length of an epoch in cycles (granularity of simulation)
ROW_BUFFER_POLICY=open_page ; close_page or open_page
ADDRESS_MAPPING_SCHEME=scheme2 ;valid schemes 1-7; For multiple independent channels, use scheme7 since it has the most parallelism
SCHEDULING_POLICY=rank_then_bank_round_robin ; bank_then_rank_round_robin or rank_then_bank_round_robin
QUEUING_STRUCTURE=per_rank ;per_rank or per_rank_per_bank
;for true/false, please use all lowercase
DEBUG_TRANS_Q=false
DEBUG_CMD_Q=false
DEBUG_ADDR_MAP=false
DEBUG_BUS=false
DEBUG_BANKSTATE=false
DEBUG_BANKS=false
DEBUG_POWER=false
VIS_FILE_OUTPUT=false
USE_LOW_POWER=true ; go into low power mode when idle?
VERIFICATION_OUTPUT=false ; should be false for normal operation
TOTAL_ROW_ACCESSES=4 ; maximum number of open page requests to send to the same row before forcing a row close (to prevent starvation)

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -0,0 +1,61 @@
#!/usr/bin/python
import optparse
import subprocess
import json
import re
# This script generates json files used to drive the memory configuration
# generator by invoking verilator's verilog preprocessor on the micron-provided
# DRAM-model verilog headers. It does this for all combinations of speedgrade X
# DQ width
parser = optparse.OptionParser()
parser.add_option('-f', '--input-file', dest='input_file', help='The verilog header to parse for DDR timings.')
parser.add_option('-o', '--output-file', dest='output_file', help='The output json file name.')
(options, args) = parser.parse_args()
def call_verilator_preprocessor(filename, speedgrade, width):
#args = ['verilator', '-E', '-D' + width, '-D' + speedgrade, filename]
args = "verilator -E -D{0} -D{1} {2}".format(speedgrade, width, filename)
p = subprocess.Popen(args, shell=True, stdout = subprocess.PIPE)
return p.stdout.readlines()
def get_units(filename):
units = {}
with open(filename, 'rb') as vhf:
for line in vhf.readlines():
m = re.search('parameter\s*(\w*).*?\/\/\s*([()\w]+?)\s*(tCK|ps)', line)
if m:
units[m.group(1)] = m.group(3)
return units
values = {}
speedgrades = [ "sg093", "sg107", "sg125", "sg15E", "sg15", "sg187E", "sg187", "sg25E", "sg25",]
widths = ['x4', 'x8', 'x16']
unit_table = get_units(options.input_file)
for sg in speedgrades:
values[sg] = {};
for width in widths:
lines = call_verilator_preprocessor(options.input_file, sg, width)
values[sg][width] = {};
for line in lines:
line_backup = line
m = re.search('parameter\s*(\w*)\s*=\s*(\w*);', line)
if m:
units = unit_table.get(m.group(1), "none")
try:
values[sg][width][m.group(1)] = { "units" : units,
"value" : int(m.group(2))}
except ValueError:
# Reject string parameters
pass
with open(options.output_file, 'wb') as jsonf:
json.dump(values, jsonf, indent = 4)
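`get_units` assumes the unit (`ps` or `tCK`) appears in the trailing comment of each Micron `parameter` line. Running the script's regex over a made-up example line shows what is extracted (the line itself is hypothetical, not from an actual Micron header):

```python
import re

# Hypothetical Micron-style header line for illustration.
line = "parameter TRRD = 6000; // tRRD ps"

# Same pattern as get_units above: group 1 is the parameter name,
# group 3 is its unit.
m = re.search(r'parameter\s*(\w*).*?\/\/\s*([()\w]+?)\s*(tCK|ps)', line)
name, units = m.group(1), m.group(3)
```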


@ -0,0 +1 @@
*.pyc


@ -0,0 +1,105 @@
#!/usr/bin/env python
# See LICENSE for license details.
import os
import os.path
from subprocess import Popen
import argparse
import shutil
import re
import csv
import numpy as np
import math
parser = argparse.ArgumentParser(
description = 'Run PrimeTime PX for each sample generated by replay-sample.py')
parser.add_argument("-s", "--sample", dest="sample", type=str,
help='sample file', required=True)
parser.add_argument("-d", "--design", dest="design", type=str,
help='design name', required=True)
parser.add_argument("-m", "--make", dest="make", type=str, nargs='*',
help='make command', required=True)
parser.add_argument("--output-dir", dest="output_dir", type=str,
help='output directory for vpd, power', required=True)
parser.add_argument("--trace-dir", dest="trace_dir", type=str,
help='PLSI TRACE directory', required=True)
parser.add_argument("--obj-dir", dest="obj_dir", type=str,
help='PLSI OBJ directory', required=True)
args = parser.parse_args()
""" Read Sample """
num = 0
with open(args.sample) as f:
for line in f:
tokens = line.split(" ")
head = tokens[0]
if head == '1':
assert tokens[1] == 'cycle:'
num += 1
prefix = os.path.basename(os.path.splitext(args.sample)[0])
if not os.path.exists(args.output_dir):
os.makedirs(args.output_dir)
if not os.path.exists(args.trace_dir):
os.makedirs(args.trace_dir)
ids = range(num)
for k in xrange(0, num, 10):
ps = list()
for i in ids[k:k+10]:
""" Copy vpd """
shutil.copy("%s/%s-replay-%d.vpd" % (args.output_dir, prefix, i),
"%s/%s-replay-%d.vpd" % (args.trace_dir, prefix, i))
""" Run PrimeTime PX """
cmd = ["make"] + args.make + \
["SAMPLE=%s/%s-replay-%d.sample" % (args.output_dir, prefix, i)]
ps.append(Popen(cmd, stdout=open(os.devnull, 'wb')))
while any(p.poll() == None for p in ps):
pass
assert all(p.poll() == 0 for p in ps)
""" Read report file """
modules = list()
sample_pwr = dict()
for i in xrange(num):
report_filename = "%s/pt-power/%s-replay-%d/synopsys-pt-workdir/reports/%s_report_power.report" % (
args.obj_dir, prefix, i, args.design)
with open(report_filename) as f:
found = False
for line in f:
tokens = line.split()
if not found:
found = len(tokens) > 0 and tokens[0] == 'Hierarchy'
elif found and len(tokens) >= 6:
module = ' '.join(tokens[:2]) if len(tokens) > 6 else tokens[0]
int_pwr = tokens[-5]
switch_pwr = tokens[-4]
leak_pwr = tokens[-3]
total_pwr = tokens[-2]
percent = tokens[-1]
if not 'clk_gate' in module:
if not module in sample_pwr:
modules.append(module)
sample_pwr[module] = list()
sample_pwr[module].append('0.0' if total_pwr == 'N/A' else total_pwr)
""" Dump power """
csv_filename = "%s/%s-pwr.csv" % (args.output_dir, prefix)
print "[strober] Dump Power at", csv_filename
with open(csv_filename, "w") as f:
writer = csv.writer(f)
writer.writerow(["Modules"] + ["Sample %d (mW)" % i for i in xrange(num)] + [
"Average (mW)", "95% error", "99% error"])
for m in modules:
arr = np.array([1000.0 * float(x) for x in sample_pwr[m]])
avg = np.mean(arr)
var = np.sum(np.power(arr - avg, 2)) / (num - 1) if num > 1 else 0 # sample variance
_95 = 1.96 * math.sqrt(var / num)
_99 = 2.576 * math.sqrt(var / num)
writer.writerow([m] + arr.tolist() + [avg, _95, _99])
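The "95% error" and "99% error" columns are normal-approximation confidence intervals on the mean per-module power across `num` samples. The arithmetic in isolation (a simplified restatement of the loop above):

```python
import math

def confidence_intervals(samples_mw):
    """Mean plus 95%/99% normal-approximation error bars over samples."""
    n = len(samples_mw)
    avg = sum(samples_mw) / n
    # Unbiased sample variance, as in the script (0 when n == 1).
    var = sum((x - avg) ** 2 for x in samples_mw) / (n - 1) if n > 1 else 0
    err95 = 1.96 * math.sqrt(var / n)
    err99 = 2.576 * math.sqrt(var / n)
    return avg, err95, err99
```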


@ -0,0 +1,178 @@
#!/usr/bin/env python
# See LICENSE for license details.
import sys
import os
import tempfile
import subprocess
import argparse
import json
import fm_regex
def initialize_arguments(args):
""" initilize translator arguments """
parser = argparse.ArgumentParser(
description = 'run formality for macros')
parser.add_argument('--paths', type=file, required=True,
help="""macro path analaysis file from Strober's compiler (d.g. <design>.macros.path) """)
parser.add_argument('--ref', nargs='+',
help="""reference verilog file""")
parser.add_argument('--impl', nargs='+',
help="""implementation verilog file""")
parser.add_argument('--match', type=str, required=True,
help="match file to be appended")
""" parse the arguments """
res = parser.parse_args(args)
return res.paths, res.match, res.ref, res.impl
def read_path_file(f):
paths = dict()
try:
for line in f:
tokens = line.split()
module = tokens[0]
path = tokens[1]
if not module in paths:
paths[module] = list()
paths[module].append(path)
finally:
f.close()
return paths
def read_match_file(match_file):
gate_names = dict()
with open(match_file, 'r') as f:
for line in f:
tokens = line.split()
gate_names[tokens[0]] = tokens[1]
return gate_names
def write_tcl(tcl_file, report_file, mem_name, ref_v_files, impl_v_files):
with open(tcl_file, 'w') as f:
""" Don't match name substrings """
f.write("set_app_var name_match_allow_subset_match none\n")
""" No errors from unresolved modules """
f.write("set_app_var hdlin_unresolved_modules black_box\n")
""" Read reference verilog files """
for ref in ref_v_files:
f.write("read_verilog -r %s -work_library WORK\n" % ref)
""" Set top of reference """
f.write("set_top r:/WORK/%s\n" % mem_name)
""" Read implementation verilog files """
for impl in impl_v_files:
f.write("read_verilog -i %s -work_library WORK\n" % impl)
""" Set top of implementation """
f.write("set_top i:/WORK/%s\n" % mem_name)
""" Match """
f.write("match\n")
""" Report match points """
f.write("report_matched_points > %s\n" % report_file)
""" Report unmatch points """
# f.write("report_unmatched_points >> %s\n" % report_file)
""" Finish """
f.write("exit\n")
return
def append_match_file(report_file, match_file, mem, paths, gate_names):
""" construct macro name mapping for the formality report """
macro_map = list()
ref_was_matched = False
with open(report_file, 'r') as f:
for line in f:
if ref_was_matched:
impl_matched = fm_regex.impl_regex.search(line)
if impl_matched:
impl_name = impl_matched.group(1).replace("/", ".")
impl_name = impl_name.replace(mem + ".", "")
ff_matched = fm_regex.ff_regex.match(impl_name)
reg_matched = fm_regex.reg_regex.match(impl_name)
mem_matched = fm_regex.mem_regex.match(impl_name)
if mem_matched:
impl_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
"[" + mem_matched.group(3) + "]"
elif reg_matched:
impl_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
elif ff_matched:
impl_name = ff_matched.group(1)
macro_map.append((ref_name, impl_name))
ref_was_matched = False
else:
ref_matched = fm_regex.ref_regex.search(line)
if ref_matched:
ref_name = ref_matched.group(2).replace("/", ".")
ref_name = ref_name.replace(mem + ".", "")
ff_matched = fm_regex.ff_regex.match(ref_name)
reg_matched = fm_regex.reg_regex.match(ref_name)
mem_matched = fm_regex.mem_regex.match(ref_name)
if mem_matched:
ref_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
"[" + mem_matched.group(3) + "]"
elif reg_matched:
ref_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
elif ff_matched:
ref_name = ff_matched.group(1)
ref_was_matched = True
""" append the name mapping to the match file """
with open(match_file, 'a') as f:
for path in paths:
for ref_name, impl_name in macro_map:
ref_full_name = path + "." + ref_name
if path in gate_names:
impl_mod_path = gate_names[path]
else:
impl_mod_path = path
impl_full_name = impl_mod_path + "." + impl_name
f.write("%s %s\n" % (ref_full_name, impl_full_name))
return
if __name__ == '__main__':
""" parse the arguments """
path_file, match_file, ref_files, impl_files = initialize_arguments(sys.argv[1:])
""" read path file """
paths = read_path_file(path_file)
""" read match file """
gate_names = read_match_file(match_file)
""" create temp dir """
dir_path = tempfile.mkdtemp()
for mem in paths:
""" TCL file path """
tcl_file = os.path.join(dir_path, mem + ".tcl")
""" report file path """
report_file = os.path.join(dir_path, mem + ".rpt")
""" generate TCL script for formality """
write_tcl(tcl_file, report_file, mem, ref_files, impl_files)
""" execute formality """
assert subprocess.call(["fm_shell", "-f", tcl_file]) == 0
""" append mappings to the match file """
append_match_file(report_file, match_file, mem, paths[mem], gate_names)


@ -0,0 +1,109 @@
#!/usr/bin/env python
# See LICENSE for license details.
import fm_regex
import read_svf
import sys
import argparse
def initialize_arguments(args):
""" initilize arguments """
parser = argparse.ArgumentParser(
description = 'Find match map between RTL Names & Gate-Level Names')
parser.add_argument('--match', type=str, required=True,
help="""match output file (RTL name -> Gate-level name)""")
parser.add_argument('--report', type=str, required=True,
help="""report from Snopsys formality (generated by 'match' and 'report_match_points' )""")
parser.add_argument('--svf', type=str, required=True,
help="""decripted svf file from formality (generated in formality_svf/svf.txt)""")
""" parse the arguments """
res = parser.parse_args(args)
return res.report, res.svf, res.match
def read_name_map(report_file, instance_map, change_names):
name_map = list()
with open(report_file, 'r') as f:
ref_was_matched = False
ref_name = ""
for line in f:
if ref_was_matched:
impl_matched = fm_regex.impl_regex.search(line)
if impl_matched:
name_map.append((ref_name, impl_matched.group(1).replace("/", ".")))
ref_was_matched = False
else:
ref_matched = fm_regex.ref_regex.search(line)
if ref_matched:
gate_type = ref_matched.group(1)
ref_name_tokens = ref_matched.group(2).split("/")
ref_name = ref_name_tokens[0]
design = ref_name
for i, token in enumerate(ref_name_tokens[1:]):
if design in change_names:
map = change_names[design]
rtl_name = map[token] if token in map else token
else:
rtl_name = token
ref_name = ref_name + "." + rtl_name
if design in instance_map and i < len(ref_name_tokens[1:]) - 1:
design = instance_map[design][rtl_name]
else:
design = ""
if gate_type == "DFF":
""" D Flip Flops """
ff_matched = fm_regex.ff_regex.match(ref_name)
reg_matched = fm_regex.reg_regex.match(ref_name)
mem_matched = fm_regex.mem_regex.match(ref_name)
if mem_matched:
ref_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
"[" + mem_matched.group(3) + "]"
elif reg_matched:
ref_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
elif ff_matched:
ref_name = ff_matched.group(1)
else:
print ref_name
assert False
elif gate_type == "BBox":
""" Macros """
pass
elif gate_type == "BlPin" or gate_type == "BBPin" or gate_type == "Port":
""" Pins """
bus_matched = fm_regex.bus_regex.search(ref_name)
if bus_matched:
ref_name = bus_matched.group(1) + "[" + bus_matched.group(2) + "]"
else:
assert False
ref_was_matched = True
return name_map
def write_match_file(match_file, name_map):
with open(match_file, 'w') as f:
for ref_name, impl_name in name_map:
f.write("%s %s\n" % (ref_name, impl_name))
return
if __name__ == '__main__':
""" parse the arguments and open files """
report_file, svf_file, match_file = initialize_arguments(sys.argv[1:])
""" read svf file for guidance used in formality """
instance_map, change_names = read_svf.read_svf_file(svf_file)
""" read gate-level names from the formality report file """
name_map = read_name_map(report_file, instance_map, change_names)
""" write the output file """
write_match_file(match_file, name_map)


@ -0,0 +1,27 @@
import re
# See LICENSE for license details.
""" define reference(RTL) name regular expression """
ref_regex = re.compile(r"""
Ref\s+ # is reference(RTL)?
(DFF|BlPin|Port|BBox|BBPin)\w*\s+ # Type
(?:[\w\(\)]*)\s # Matched by (e.g. name)
r:/WORK/ # name prefix
([\w/\[\]\$]*) # RTL(chisel) name
""", re.VERBOSE)
""" define implemntation(gate-level designs) name regular expression """
impl_regex = re.compile(r"""
Impl\s+ # is implementation(gate-level design)?
(?:DFF|BlPin|Port|BBox|BBPin)\w*\s+ # Type
(?:\(-\))?\s+ # Inverted?
(?:[\w\(\)]*)\s # Matched by (e.g. name)
i:/WORK/ # name prefix
([\w/\[\]]*) # gate-level name
""", re.VERBOSE)
ff_regex = re.compile(r"([\w.\$]*)_reg")
reg_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]]")
mem_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]][_\[](\d+)[_\]]")
bus_regex = re.compile(r"([\w.\$]*)[_\[](\d+)[_\]]")
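These patterns map synthesized netlist names back to Chisel-style names: `_reg` suffixes are stripped and flattened indices are re-bracketed. A few hypothetical names run through the same match order the scripts use (memory first, then register, then flip-flop):

```python
import re

# The same patterns defined in fm_regex above.
ff_regex = re.compile(r"([\w.\$]*)_reg")
reg_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]]")
mem_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]][_\[](\d+)[_\]]")

def restore(name):
    """Recover the RTL-style name, checking mem, then reg, then ff."""
    m = mem_regex.match(name)
    if m:
        return "%s[%s][%s]" % m.group(1, 2, 3)
    m = reg_regex.match(name)
    if m:
        return "%s[%s]" % m.group(1, 2)
    m = ff_regex.match(name)
    if m:
        return m.group(1)
    return name
```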


@ -0,0 +1,128 @@
import sys
import argparse
# See LICENSE for license details.
def construct_instance_map(instance_map, tokens):
is_design = False
is_instance = False
is_module = False
for token in tokens:
if token == "{" or token == "}":
pass
elif token == "-design":
is_design = True
elif token == "-instance":
is_instance = True
elif token == "-linked":
is_module = True
elif is_design:
design = token
is_design = False
elif is_instance:
instance = token
is_instance = False
elif is_module:
module = token
is_module = False
else:
print token, ': ', ' '.join(tokens)
assert False
if not design in instance_map:
instance_map[design] = dict()
instance_map[design][instance] = module
return
def uniquify_instances(instance_map, tokens):
state = -1
for token in tokens:
if token == "{" or token == "}":
pass
elif token == "-design":
state = 0
elif state == 0:
""" identify a parent module name """
top = token
state = 1
elif state == 1:
""" identify an child instance name """
design = top
path_tokens = token.split("/")
for path_token in path_tokens[:-1]:
design = instance_map[design][path_token]
instance = path_tokens[-1]
path = '.'.join(path_tokens) # for debugging
state = 2
elif state == 2:
""" identify a child module name """
if not design in instance_map:
instance_map[design] = dict()
instance_map[design][instance] = token
state = 1
else:
print token, ': ', ' '.join(tokens)
assert False
return
def construct_change_names(change_names, tokens):
state = -1
for token in tokens:
if token == '{' or token == '}':
pass
elif token == "-design":
state = 0
elif state == 0:
""" identify a parent module name """
design = token
state = 1
elif state == 1:
""" identify an object type """
is_cell = token == "cell"
state = 2
elif state == 2:
""" identify an rtl name """
rtl_name = token
state = 3
elif state == 3:
if not design in change_names:
change_names[design] = dict()
""" record name changes only for cells """
if is_cell:
change_names[design][token] = rtl_name
state = 1
else:
print token, ': ', ' '.join(tokens)
assert False
return
def read_svf_file(svf_file):
instance_map = dict()
change_names = dict()
with open(svf_file, 'r') as f:
full_line = ""
for line in f:
tokens = line.split()
if len(tokens) == 0:
pass
elif tokens[-1] == '\\':
full_line += ' '.join(tokens[:-1]) + ' '
else:
full_line += ' '.join(tokens)
full_tokens = full_line.split()
guide_cmd = full_tokens[0]
if guide_cmd == "guide_instance_map":
construct_instance_map(instance_map, full_tokens[1:])
elif guide_cmd == "guide_uniquify":
uniquify_instances(instance_map, full_tokens[1:])
elif guide_cmd == "guide_change_names":
construct_change_names(change_names, full_tokens[1:])
full_line = ""
return instance_map, change_names


@ -0,0 +1,77 @@
#!/usr/bin/env python
# See LICENSE for license details.
import os.path
import argparse
from subprocess import Popen
parser = argparse.ArgumentParser(
description = 'Replay each sample in a separate simulation instance')
parser.add_argument("-s", "--sample", dest="sample", type=str,
help='sample files', required=True)
parser.add_argument("-e", "--sim", dest="sim", type=str,
help='simulator executable for sample replays', required=True)
parser.add_argument("-d", "--dir", dest="dir", type=str,
help='output directory for waveform, log', required=True)
parser.add_argument("-m", "--match", dest="match", type=str, default=None,
help='match file generated by fm-match.py', required=False)
parser.add_argument("-n", "--num", dest="num", type=str, default=10,
help='# instances of gate-level simulation', required=False)
args = parser.parse_args()
prefix = os.path.basename(os.path.splitext(args.sample)[0])
abspath = os.path.abspath(args.sim)
dirname = os.path.dirname(abspath)
basename = os.path.basename(abspath)
if not os.path.exists(args.dir):
os.makedirs(args.dir)
""" Split Sample """
prologue = list()
samples = list()
with open(args.sample) as f:
for line in f:
tokens = line.split(" ")
head = tokens[0]
if head == '0':
prologue.append(line)
elif head == '1':
assert tokens[1] == 'cycle:'
samples.append(list(line))
else:
samples[-1].append(line)
""" Save samples """
for i, sample in enumerate(samples):
f = open(os.path.join(args.dir, "%s-replay-%d.sample" % (prefix, i)), 'w')
for line in prologue:
f.write("%s" % line)
for line in sample:
f.write("%s" % line)
f.close()
""" Execute replays """
ids = range(len(samples))
for k in xrange(0, len(samples), args.num):
ps = list()
logs = list()
for i in ids[k:k+min(args.num,len(samples)-k)]:
cmd = ["./%s" % basename,
"+verbose",
"+match=%s" % (args.match) if args.match != None else "",
"+sample=%s/%s-replay-%d.sample" % (args.dir, prefix, i),
"+vcdfile=%s/%s-replay-%d.vcd" % (args.dir, prefix, i),
"+waveform=%s/%s-replay-%d.vpd" % (args.dir, prefix, i)]
print " ".join(cmd)
log = open("%s/%s-replay-%d.out" % (args.dir, prefix, i), 'w')
ps.append(Popen(cmd, cwd=dirname, stderr=log))
logs.append(log)
while any(p.poll() == None for p in ps):
pass
assert all(p.poll() == 0 for p in ps)
for log in logs:
log.close()


@ -0,0 +1,2 @@
$init_sigs call=init_sigs_calltf
$tick call=tick_calltf check=tick_compiletf acc=rw,frc:* acc-=frc:%CELL


@ -0,0 +1,158 @@
// See LICENSE.Berkeley for license details.
package junctions
import Chisel._
import freechips.rocketchip.unittest.UnitTest
class MultiWidthFifo(inW: Int, outW: Int, n: Int) extends Module {
val io = new Bundle {
val in = Decoupled(Bits(width = inW)).flip
val out = Decoupled(Bits(width = outW))
val count = UInt(OUTPUT, log2Up(n + 1))
}
if (inW == outW) {
val q = Module(new Queue(Bits(width = inW), n))
q.io.enq <> io.in
io.out <> q.io.deq
io.count := q.io.count
} else if (inW > outW) {
val nBeats = inW / outW
require(inW % outW == 0, s"MultiWidthFifo: in: $inW not divisible by out: $outW")
require(n % nBeats == 0, s"Cannot store $n output words when output beats is $nBeats")
val wdata = Reg(Vec(n / nBeats, Bits(width = inW)))
val rdata = Vec(wdata.flatMap { indat =>
(0 until nBeats).map(i => indat(outW * (i + 1) - 1, outW * i)) })
val head = Reg(init = UInt(0, log2Up(n / nBeats)))
val tail = Reg(init = UInt(0, log2Up(n)))
val size = Reg(init = UInt(0, log2Up(n + 1)))
when (io.in.fire()) {
wdata(head) := io.in.bits
head := head + UInt(1)
}
when (io.out.fire()) { tail := tail + UInt(1) }
size := MuxCase(size, Seq(
(io.in.fire() && io.out.fire()) -> (size + UInt(nBeats - 1)),
io.in.fire() -> (size + UInt(nBeats)),
io.out.fire() -> (size - UInt(1))))
io.out.valid := size > UInt(0)
io.out.bits := rdata(tail)
io.in.ready := size < UInt(n - nBeats + 1)
io.count := size
} else {
val nBeats = outW / inW
require(outW % inW == 0, s"MultiWidthFifo: out: $outW not divisible by in: $inW")
val wdata = Reg(Vec(n * nBeats, Bits(width = inW)))
val rdata = Vec.tabulate(n) { i =>
Cat(wdata.slice(i * nBeats, (i + 1) * nBeats).reverse)}
val head = Reg(init = UInt(0, log2Up(n * nBeats)))
val tail = Reg(init = UInt(0, log2Up(n)))
val size = Reg(init = UInt(0, log2Up(n * nBeats + 1)))
when (io.in.fire()) {
wdata(head) := io.in.bits
head := head + UInt(1)
}
when (io.out.fire()) { tail := tail + UInt(1) }
size := MuxCase(size, Seq(
(io.in.fire() && io.out.fire()) -> (size - UInt(nBeats - 1)),
io.in.fire() -> (size + UInt(1)),
io.out.fire() -> (size - UInt(nBeats))))
io.count := size >> UInt(log2Up(nBeats))
io.out.valid := io.count > UInt(0)
io.out.bits := rdata(tail)
io.in.ready := size < UInt(n * nBeats)
}
}
class MultiWidthFifoTest extends UnitTest {
val big2little = Module(new MultiWidthFifo(16, 8, 8))
val little2big = Module(new MultiWidthFifo(8, 16, 4))
val bl_send = Reg(init = false.B)
val lb_send = Reg(init = false.B)
val bl_recv = Reg(init = false.B)
val lb_recv = Reg(init = false.B)
val bl_finished = Reg(init = false.B)
val lb_finished = Reg(init = false.B)
val bl_data = Vec.tabulate(4){i => UInt((2 * i + 1) * 256 + 2 * i, 16)}
val lb_data = Vec.tabulate(8){i => UInt(i, 8)}
val (bl_send_cnt, bl_send_done) = Counter(big2little.io.in.fire(), 4)
val (lb_send_cnt, lb_send_done) = Counter(little2big.io.in.fire(), 8)
val (bl_recv_cnt, bl_recv_done) = Counter(big2little.io.out.fire(), 8)
val (lb_recv_cnt, lb_recv_done) = Counter(little2big.io.out.fire(), 4)
big2little.io.in.valid := bl_send
big2little.io.in.bits := bl_data(bl_send_cnt)
big2little.io.out.ready := bl_recv
little2big.io.in.valid := lb_send
little2big.io.in.bits := lb_data(lb_send_cnt)
little2big.io.out.ready := lb_recv
val bl_recv_data_idx = bl_recv_cnt >> UInt(1)
val bl_recv_data = Mux(bl_recv_cnt(0),
bl_data(bl_recv_data_idx)(15, 8),
bl_data(bl_recv_data_idx)(7, 0))
val lb_recv_data = Cat(
lb_data(Cat(lb_recv_cnt, UInt(1, 1))),
lb_data(Cat(lb_recv_cnt, UInt(0, 1))))
when (io.start) {
bl_send := true.B
lb_send := true.B
}
when (bl_send_done) {
bl_send := false.B
bl_recv := true.B
}
when (lb_send_done) {
lb_send := false.B
lb_recv := true.B
}
when (bl_recv_done) {
bl_recv := false.B
bl_finished := true.B
}
when (lb_recv_done) {
lb_recv := false.B
lb_finished := true.B
}
io.finished := bl_finished && lb_finished
val bl_start_recv = Reg(next = bl_send_done)
val lb_start_recv = Reg(next = lb_send_done)
assert(!little2big.io.out.valid || little2big.io.out.bits === lb_recv_data,
"Little to Big data mismatch")
assert(!big2little.io.out.valid || big2little.io.out.bits === bl_recv_data,
"Bit to Little data mismatch")
assert(!lb_start_recv || little2big.io.count === UInt(4),
"Little to Big count incorrect")
assert(!bl_start_recv || big2little.io.count === UInt(8),
"Big to Little count incorrect")
}
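The width conversions exercised by the test above can be summarized with a small software model (illustrative plain Scala, not part of the RTL). Narrow beats are packed with the first beat in the low-order bits of the wide word — this is why the RTL reverses the slice before `Cat` — and a wide word is emitted low-order beat first:

```scala
// Illustrative model of MultiWidthFifo's width conversions.
// Narrow-to-wide: beat i lands at bit offset i * inW of the wide word.
def pack(beats: Seq[Int], inW: Int): Int =
  beats.zipWithIndex.map { case (b, i) => b << (i * inW) }.reduce(_ | _)

// Wide-to-narrow: beats come out low-order first.
def unpack(word: Int, inW: Int, nBeats: Int): Seq[Int] =
  Seq.tabulate(nBeats)(i => (word >> (i * inW)) & ((1 << inW) - 1))
```

This matches the test's expectation that the low byte of a 16-bit word is received before the high byte.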


@ -0,0 +1,117 @@
package junctions
import Chisel._
import freechips.rocketchip.config.Parameters
class ReorderQueueWrite[T <: Data](dType: T, tagWidth: Int) extends Bundle {
val data = dType.cloneType
val tag = UInt(width = tagWidth)
override def cloneType =
new ReorderQueueWrite(dType, tagWidth).asInstanceOf[this.type]
}
class ReorderEnqueueIO[T <: Data](dType: T, tagWidth: Int)
extends DecoupledIO(new ReorderQueueWrite(dType, tagWidth)) {
override def cloneType =
new ReorderEnqueueIO(dType, tagWidth).asInstanceOf[this.type]
}
class ReorderDequeueIO[T <: Data](dType: T, tagWidth: Int) extends Bundle {
val valid = Bool(INPUT)
val tag = UInt(INPUT, tagWidth)
val data = dType.cloneType.asOutput
val matches = Bool(OUTPUT)
override def cloneType =
new ReorderDequeueIO(dType, tagWidth).asInstanceOf[this.type]
}
class ReorderQueue[T <: Data](dType: T, tagWidth: Int,
size: Option[Int] = None, nDeq: Int = 1)
extends Module {
val io = new Bundle {
val enq = new ReorderEnqueueIO(dType, tagWidth).flip
val deq = Vec(nDeq, new ReorderDequeueIO(dType, tagWidth))
}
val tagSpaceSize = 1 << tagWidth
val actualSize = size.getOrElse(tagSpaceSize)
if (tagSpaceSize > actualSize) {
require(tagSpaceSize % actualSize == 0)
val smallTagSize = log2Ceil(actualSize)
val roq_data = Reg(Vec(actualSize, dType))
val roq_tags = Reg(Vec(actualSize, UInt(width = tagWidth - smallTagSize)))
val roq_free = Reg(init = Vec.fill(actualSize)(true.B))
val roq_enq_addr = io.enq.bits.tag(smallTagSize-1, 0)
io.enq.ready := roq_free(roq_enq_addr)
when (io.enq.valid && io.enq.ready) {
roq_data(roq_enq_addr) := io.enq.bits.data
roq_tags(roq_enq_addr) := io.enq.bits.tag >> smallTagSize.U
roq_free(roq_enq_addr) := false.B
}
io.deq.foreach { deq =>
val roq_deq_addr = deq.tag(smallTagSize-1, 0)
deq.data := roq_data(roq_deq_addr)
deq.matches := !roq_free(roq_deq_addr) && roq_tags(roq_deq_addr) === (deq.tag >> smallTagSize.U)
when (deq.valid) {
roq_free(roq_deq_addr) := true.B
}
}
} else if (tagSpaceSize == actualSize) {
val roq_data = Mem(tagSpaceSize, dType)
val roq_free = Reg(init = Vec.fill(tagSpaceSize)(true.B))
io.enq.ready := roq_free(io.enq.bits.tag)
when (io.enq.valid && io.enq.ready) {
roq_data(io.enq.bits.tag) := io.enq.bits.data
roq_free(io.enq.bits.tag) := false.B
}
io.deq.foreach { deq =>
deq.data := roq_data(deq.tag)
deq.matches := !roq_free(deq.tag)
when (deq.valid) {
roq_free(deq.tag) := true.B
}
}
} else {
require(actualSize % tagSpaceSize == 0)
val qDepth = actualSize / tagSpaceSize
val queues = Seq.fill(tagSpaceSize) {
Module(new Queue(dType, qDepth))
}
io.enq.ready := false.B
io.deq.foreach(_.matches := false.B)
io.deq.foreach(_.data := dType.fromBits(UInt(0)))
for ((q, i) <- queues.zipWithIndex) {
when (io.enq.bits.tag === UInt(i)) { io.enq.ready := q.io.enq.ready }
q.io.enq.valid := io.enq.valid && io.enq.bits.tag === UInt(i)
q.io.enq.bits := io.enq.bits.data
val deqReadys = Wire(Vec(nDeq, Bool()))
io.deq.zip(deqReadys).foreach { case (deq, rdy) =>
when (deq.tag === UInt(i)) {
deq.matches := q.io.deq.valid
deq.data := q.io.deq.bits
}
rdy := deq.valid && deq.tag === UInt(i)
}
q.io.deq.ready := deqReadys.reduce(_ || _)
}
}
}
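As a sketch of the direct-mapped case above (tag space equal to queue capacity), the queue behaves like a one-entry-per-tag store: an enqueue stalls while its tag's slot is occupied, and a dequeue by tag frees the slot. `RoqModel` and its method names are invented for this illustration; it is a software model, not the RTL:

```scala
import scala.collection.mutable

// One slot per tag: models ReorderQueue when tagSpaceSize == actualSize.
class RoqModel[T](tagSpace: Int) {
  private val slots = mutable.Map.empty[Int, T] // tag -> pending data
  def enqReady(tag: Int): Boolean = !slots.contains(tag) // roq_free(tag)
  def enq(tag: Int, data: T): Unit = { require(enqReady(tag)); slots(tag) = data }
  def matches(tag: Int): Boolean = slots.contains(tag)
  def deq(tag: Int): T = slots.remove(tag).get // frees the slot
}
```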


@ -0,0 +1,148 @@
// See LICENSE.SiFive for license details.
// See LICENSE.Berkeley for license details.
package junctions
import Chisel._
import freechips.rocketchip.config._
import scala.collection.mutable.HashMap
case class MemAttr(prot: Int, cacheable: Boolean = false)
sealed abstract class MemRegion {
def start: BigInt
def size: BigInt
def numSlaves: Int
def attr: MemAttr
def containsAddress(x: UInt) = UInt(start) <= x && x < UInt(start + size)
}
case class MemSize(size: BigInt, attr: MemAttr) extends MemRegion {
def start = 0
def numSlaves = 1
}
case class MemRange(start: BigInt, size: BigInt, attr: MemAttr) extends MemRegion {
def numSlaves = 1
}
object AddrMapProt {
val R = 0x1
val W = 0x2
val X = 0x4
val RW = R | W
val RX = R | X
val RWX = R | W | X
val SZ = 3
}
class AddrMapProt extends Bundle {
val x = Bool()
val w = Bool()
val r = Bool()
}
case class AddrMapEntry(name: String, region: MemRegion)
object AddrMap {
def apply(elems: AddrMapEntry*): AddrMap = new AddrMap(elems)
}
class AddrMap(
entriesIn: Seq[AddrMapEntry],
val start: BigInt = BigInt(0),
val collapse: Boolean = false) extends MemRegion {
private val slavePorts = HashMap[String, Int]()
private val mapping = HashMap[String, MemRegion]()
def isEmpty = entries.isEmpty
def length = entries.size
def numSlaves = slavePorts.size
val (size: BigInt, entries: Seq[AddrMapEntry], attr: MemAttr) = {
var ind = 0
var base = start
var rebasedEntries = collection.mutable.ArrayBuffer[AddrMapEntry]()
var prot = 0
var cacheable = true
for (AddrMapEntry(name, r) <- entriesIn) {
require (!mapping.contains(name))
base = r.start
r match {
case r: AddrMap =>
val subMap = new AddrMap(r.entries, base, r.collapse)
rebasedEntries += AddrMapEntry(name, subMap)
mapping += name -> subMap
mapping ++= subMap.mapping.map { case (k, v) => s"$name:$k" -> v }
if (r.collapse) {
slavePorts += (name -> ind)
ind += 1
} else {
slavePorts ++= subMap.slavePorts.map {
case (k, v) => s"$name:$k" -> (ind + v)
}
ind += r.numSlaves
}
case _ =>
val e = MemRange(base, r.size, r.attr)
rebasedEntries += AddrMapEntry(name, e)
mapping += name -> e
slavePorts += name -> ind
ind += r.numSlaves
}
base += r.size
prot |= r.attr.prot
cacheable &&= r.attr.cacheable
}
(base - start, rebasedEntries, MemAttr(prot, cacheable))
}
val flatten: Seq[AddrMapEntry] = {
mapping.toSeq.map {
case (name, range: MemRange) => Some(AddrMapEntry(name, range))
case _ => None
}.flatten.sortBy(_.region.start)
}
// checks to see whether any MemRange overlaps within this AddrMap
flatten.combinations(2) foreach {
case (Seq(AddrMapEntry(an, ar), AddrMapEntry(bn, br))) =>
val arEnd = ar.start + ar.size
val brEnd = br.start + br.size
val abOverlaps = ar.start < brEnd && br.start < arEnd
require(!abOverlaps,
s"region $an@0x${ar.start.toString(16)} overlaps region $bn@0x${br.start.toString(16)}")
}
def toRange: MemRange = MemRange(start, size, attr)
def apply(name: String): MemRegion = mapping(name)
def contains(name: String): Boolean = mapping.contains(name)
def port(name: String): Int = slavePorts(name)
def subMap(name: String): AddrMap = mapping(name).asInstanceOf[AddrMap]
def isInRegion(name: String, addr: UInt): Bool = mapping(name).containsAddress(addr)
def isCacheable(addr: UInt): Bool = {
flatten.filter(_.region.attr.cacheable).map(
_.region.containsAddress(addr)
).foldLeft(false.B)(_ || _)
}
def isValid(addr: UInt): Bool = {
flatten.map(_.region.containsAddress(addr)).foldLeft(false.B)(_ || _)
}
def getProt(addr: UInt): AddrMapProt = {
val protForRegion = flatten.map { entry =>
Mux(entry.region.containsAddress(addr),
UInt(entry.region.attr.prot, AddrMapProt.SZ), UInt(0))
}
new AddrMapProt().fromBits(protForRegion.reduce(_|_))
}
override def containsAddress(x: UInt) = {
flatten.map(_.region.containsAddress(x)).reduce(_||_)
}
}
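The overlap assertion in `AddrMap` reduces to the standard half-open interval test: regions `[start, start + size)` collide exactly when each one starts before the other ends. A minimal sketch of that predicate:

```scala
// Two half-open regions overlap iff each starts before the other ends.
def overlaps(aStart: BigInt, aSize: BigInt, bStart: BigInt, bSize: BigInt): Boolean =
  aStart < bStart + bSize && bStart < aStart + aSize
```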


@ -0,0 +1,610 @@
// See LICENSE for license details.
package junctions
import Chisel._
import scala.math.{min, max}
import scala.collection.mutable.ArraySeq
import freechips.rocketchip.util.{DecoupledHelper, ParameterizedBundle, HellaPeekingArbiter}
import freechips.rocketchip.config.{Parameters, Field}
case object NastiKey extends Field[NastiParameters]
case class NastiParameters(dataBits: Int, addrBits: Int, idBits: Int)
trait HasNastiParameters {
implicit val p: Parameters
val nastiExternal = p(NastiKey)
val nastiXDataBits = nastiExternal.dataBits
val nastiWStrobeBits = nastiXDataBits / 8
val nastiXAddrBits = nastiExternal.addrBits
val nastiWIdBits = nastiExternal.idBits
val nastiRIdBits = nastiExternal.idBits
val nastiXIdBits = max(nastiWIdBits, nastiRIdBits)
val nastiXUserBits = 1
val nastiAWUserBits = nastiXUserBits
val nastiWUserBits = nastiXUserBits
val nastiBUserBits = nastiXUserBits
val nastiARUserBits = nastiXUserBits
val nastiRUserBits = nastiXUserBits
val nastiXLenBits = 8
val nastiXSizeBits = 3
val nastiXBurstBits = 2
val nastiXCacheBits = 4
val nastiXProtBits = 3
val nastiXQosBits = 4
val nastiXRegionBits = 4
val nastiXRespBits = 2
def bytesToXSize(bytes: UInt) = MuxLookup(bytes, UInt("b111"), Array(
UInt(1) -> UInt(0),
UInt(2) -> UInt(1),
UInt(4) -> UInt(2),
UInt(8) -> UInt(3),
UInt(16) -> UInt(4),
UInt(32) -> UInt(5),
UInt(64) -> UInt(6),
UInt(128) -> UInt(7)))
}
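The `bytesToXSize` lookup table above encodes the AXI AxSIZE field, which holds log2 of the transfer size in bytes for power-of-two sizes up to 128. A software equivalent (illustrative, not the RTL):

```scala
// AxSIZE encoding: size field = log2(bytes), for power-of-two bytes <= 128.
def bytesToXSize(bytes: Int): Int = {
  require(bytes > 0 && (bytes & (bytes - 1)) == 0 && bytes <= 128)
  java.lang.Integer.numberOfTrailingZeros(bytes)
}
```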
abstract class NastiModule(implicit val p: Parameters) extends Module
with HasNastiParameters
abstract class NastiBundle(implicit val p: Parameters) extends ParameterizedBundle()(p)
with HasNastiParameters
abstract class NastiChannel(implicit p: Parameters) extends NastiBundle()(p)
abstract class NastiMasterToSlaveChannel(implicit p: Parameters) extends NastiChannel()(p)
abstract class NastiSlaveToMasterChannel(implicit p: Parameters) extends NastiChannel()(p)
trait HasNastiMetadata extends HasNastiParameters {
val addr = UInt(width = nastiXAddrBits)
val len = UInt(width = nastiXLenBits)
val size = UInt(width = nastiXSizeBits)
val burst = UInt(width = nastiXBurstBits)
val lock = Bool()
val cache = UInt(width = nastiXCacheBits)
val prot = UInt(width = nastiXProtBits)
val qos = UInt(width = nastiXQosBits)
val region = UInt(width = nastiXRegionBits)
}
trait HasNastiData extends HasNastiParameters {
val data = UInt(width = nastiXDataBits)
val last = Bool()
}
class NastiReadIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
val ar = Decoupled(new NastiReadAddressChannel)
val r = Decoupled(new NastiReadDataChannel).flip
}
class NastiWriteIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
val aw = Decoupled(new NastiWriteAddressChannel)
val w = Decoupled(new NastiWriteDataChannel)
val b = Decoupled(new NastiWriteResponseChannel).flip
}
class NastiIO(implicit p: Parameters) extends NastiBundle()(p) {
val aw = Decoupled(new NastiWriteAddressChannel)
val w = Decoupled(new NastiWriteDataChannel)
val b = Decoupled(new NastiWriteResponseChannel).flip
val ar = Decoupled(new NastiReadAddressChannel)
val r = Decoupled(new NastiReadDataChannel).flip
}
class NastiAddressChannel(implicit p: Parameters) extends NastiMasterToSlaveChannel()(p)
with HasNastiMetadata
class NastiResponseChannel(implicit p: Parameters) extends NastiSlaveToMasterChannel()(p) {
val resp = UInt(width = nastiXRespBits)
}
class NastiWriteAddressChannel(implicit p: Parameters) extends NastiAddressChannel()(p) {
val id = UInt(width = nastiWIdBits)
val user = UInt(width = nastiAWUserBits)
}
class NastiWriteDataChannel(implicit p: Parameters) extends NastiMasterToSlaveChannel()(p)
with HasNastiData {
val id = UInt(width = nastiWIdBits)
val strb = UInt(width = nastiWStrobeBits)
val user = UInt(width = nastiWUserBits)
}
class NastiWriteResponseChannel(implicit p: Parameters) extends NastiResponseChannel()(p) {
val id = UInt(width = nastiWIdBits)
val user = UInt(width = nastiBUserBits)
}
class NastiReadAddressChannel(implicit p: Parameters) extends NastiAddressChannel()(p) {
val id = UInt(width = nastiRIdBits)
val user = UInt(width = nastiARUserBits)
}
class NastiReadDataChannel(implicit p: Parameters) extends NastiResponseChannel()(p)
with HasNastiData {
val id = UInt(width = nastiRIdBits)
val user = UInt(width = nastiRUserBits)
}
object NastiConstants {
val BURST_FIXED = UInt("b00")
val BURST_INCR = UInt("b01")
val BURST_WRAP = UInt("b10")
val RESP_OKAY = UInt("b00")
val RESP_EXOKAY = UInt("b01")
val RESP_SLVERR = UInt("b10")
val RESP_DECERR = UInt("b11")
val CACHE_DEVICE_NOBUF = UInt("b0000")
val CACHE_DEVICE_BUF = UInt("b0001")
val CACHE_NORMAL_NOCACHE_NOBUF = UInt("b0010")
val CACHE_NORMAL_NOCACHE_BUF = UInt("b0011")
def AXPROT(instruction: Bool, nonsecure: Bool, privileged: Bool): UInt =
Cat(instruction, nonsecure, privileged)
def AXPROT(instruction: Boolean, nonsecure: Boolean, privileged: Boolean): UInt =
AXPROT(Bool(instruction), Bool(nonsecure), Bool(privileged))
}
import NastiConstants._
object NastiWriteAddressChannel {
def apply(id: UInt, addr: UInt, size: UInt,
len: UInt = UInt(0), burst: UInt = BURST_INCR)
(implicit p: Parameters) = {
val aw = Wire(new NastiWriteAddressChannel)
aw.id := id
aw.addr := addr
aw.len := len
aw.size := size
aw.burst := burst
aw.lock := false.B
aw.cache := CACHE_DEVICE_NOBUF
aw.prot := AXPROT(false, false, false)
aw.qos := UInt("b0000")
aw.region := UInt("b0000")
aw.user := UInt(0)
aw
}
}
object NastiReadAddressChannel {
def apply(id: UInt, addr: UInt, size: UInt,
len: UInt = UInt(0), burst: UInt = BURST_INCR)
(implicit p: Parameters) = {
val ar = Wire(new NastiReadAddressChannel)
ar.id := id
ar.addr := addr
ar.len := len
ar.size := size
ar.burst := burst
ar.lock := false.B
ar.cache := CACHE_DEVICE_NOBUF
ar.prot := AXPROT(false, false, false)
ar.qos := UInt(0)
ar.region := UInt(0)
ar.user := UInt(0)
ar
}
}
object NastiWriteDataChannel {
def apply(data: UInt, strb: Option[UInt] = None,
last: Bool = true.B, id: UInt = UInt(0))
(implicit p: Parameters): NastiWriteDataChannel = {
val w = Wire(new NastiWriteDataChannel)
w.strb := strb.getOrElse(Fill(w.nastiWStrobeBits, UInt(1, 1)))
w.data := data
w.last := last
w.id := id
w.user := UInt(0)
w
}
}
object NastiReadDataChannel {
def apply(id: UInt, data: UInt, last: Bool = true.B, resp: UInt = UInt(0))(
implicit p: Parameters) = {
val r = Wire(new NastiReadDataChannel)
r.id := id
r.data := data
r.last := last
r.resp := resp
r.user := UInt(0)
r
}
}
object NastiWriteResponseChannel {
def apply(id: UInt, resp: UInt = UInt(0))(implicit p: Parameters) = {
val b = Wire(new NastiWriteResponseChannel)
b.id := id
b.resp := resp
b.user := UInt(0)
b
}
}
class NastiQueue(depth: Int)(implicit p: Parameters) extends Module {
val io = new Bundle {
val in = (new NastiIO).flip
val out = new NastiIO
}
io.out.ar <> Queue(io.in.ar, depth)
io.out.aw <> Queue(io.in.aw, depth)
io.out.w <> Queue(io.in.w, depth)
io.in.r <> Queue(io.out.r, depth)
io.in.b <> Queue(io.out.b, depth)
}
object NastiQueue {
def apply(in: NastiIO, depth: Int = 2)(implicit p: Parameters): NastiIO = {
val queue = Module(new NastiQueue(depth))
queue.io.in <> in
queue.io.out
}
}
class NastiArbiterIO(arbN: Int)(implicit p: Parameters) extends Bundle {
val master = Vec(arbN, new NastiIO).flip
val slave = new NastiIO
override def cloneType =
new NastiArbiterIO(arbN).asInstanceOf[this.type]
}
/** Arbitrate among arbN masters requesting a single slave */
class NastiArbiter(val arbN: Int)(implicit p: Parameters) extends NastiModule {
val io = new NastiArbiterIO(arbN)
if (arbN > 1) {
val arbIdBits = log2Up(arbN)
val ar_arb = Module(new RRArbiter(new NastiReadAddressChannel, arbN))
val aw_arb = Module(new RRArbiter(new NastiWriteAddressChannel, arbN))
val w_chosen = Reg(UInt(width = arbIdBits))
val w_done = Reg(init = true.B)
when (aw_arb.io.out.fire()) {
w_chosen := aw_arb.io.chosen
w_done := false.B
}
when (io.slave.w.fire() && io.slave.w.bits.last) {
w_done := true.B
}
val queueSize = min((1 << nastiXIdBits) * arbN, 64)
val rroq = Module(new ReorderQueue(
UInt(width = arbIdBits), nastiXIdBits, Some(queueSize)))
val wroq = Module(new ReorderQueue(
UInt(width = arbIdBits), nastiXIdBits, Some(queueSize)))
for (i <- 0 until arbN) {
val m_ar = io.master(i).ar
val m_aw = io.master(i).aw
val m_r = io.master(i).r
val m_b = io.master(i).b
val a_ar = ar_arb.io.in(i)
val a_aw = aw_arb.io.in(i)
val m_w = io.master(i).w
a_ar <> m_ar
a_aw <> m_aw
m_r.valid := io.slave.r.valid && rroq.io.deq.head.data === UInt(i)
m_r.bits := io.slave.r.bits
m_b.valid := io.slave.b.valid && wroq.io.deq.head.data === UInt(i)
m_b.bits := io.slave.b.bits
m_w.ready := io.slave.w.ready && w_chosen === UInt(i) && !w_done
}
io.slave.r.ready := io.master(rroq.io.deq.head.data).r.ready
io.slave.b.ready := io.master(wroq.io.deq.head.data).b.ready
rroq.io.deq.head.tag := io.slave.r.bits.id
rroq.io.deq.head.valid := io.slave.r.fire() && io.slave.r.bits.last
wroq.io.deq.head.tag := io.slave.b.bits.id
wroq.io.deq.head.valid := io.slave.b.fire()
assert(!rroq.io.deq.head.valid || rroq.io.deq.head.matches,
"NastiArbiter: read response mismatch")
assert(!wroq.io.deq.head.valid || wroq.io.deq.head.matches,
"NastiArbiter: write response mismatch")
io.slave.w.bits := io.master(w_chosen).w.bits
io.slave.w.valid := io.master(w_chosen).w.valid && !w_done
val ar_helper = DecoupledHelper(
ar_arb.io.out.valid,
io.slave.ar.ready,
rroq.io.enq.ready)
io.slave.ar.valid := ar_helper.fire(io.slave.ar.ready)
io.slave.ar.bits := ar_arb.io.out.bits
ar_arb.io.out.ready := ar_helper.fire(ar_arb.io.out.valid)
rroq.io.enq.valid := ar_helper.fire(rroq.io.enq.ready)
rroq.io.enq.bits.tag := ar_arb.io.out.bits.id
rroq.io.enq.bits.data := ar_arb.io.chosen
val aw_helper = DecoupledHelper(
aw_arb.io.out.valid,
io.slave.aw.ready,
wroq.io.enq.ready)
io.slave.aw.bits <> aw_arb.io.out.bits
io.slave.aw.valid := aw_helper.fire(io.slave.aw.ready, w_done)
aw_arb.io.out.ready := aw_helper.fire(aw_arb.io.out.valid, w_done)
wroq.io.enq.valid := aw_helper.fire(wroq.io.enq.ready, w_done)
wroq.io.enq.bits.tag := aw_arb.io.out.bits.id
wroq.io.enq.bits.data := aw_arb.io.chosen
} else { io.slave <> io.master.head }
}
/** A slave that sends a decode error for every request it receives */
class NastiErrorSlave(implicit p: Parameters) extends NastiModule {
val io = (new NastiIO).flip
when (io.ar.fire()) { printf("Invalid read address %x\n", io.ar.bits.addr) }
when (io.aw.fire()) { printf("Invalid write address %x\n", io.aw.bits.addr) }
val r_queue = Module(new Queue(new NastiReadAddressChannel, 1))
r_queue.io.enq <> io.ar
val responding = Reg(init = false.B)
val beats_left = Reg(init = UInt(0, nastiXLenBits))
when (!responding && r_queue.io.deq.valid) {
responding := true.B
beats_left := r_queue.io.deq.bits.len
}
io.r.valid := r_queue.io.deq.valid && responding
io.r.bits.id := r_queue.io.deq.bits.id
io.r.bits.data := UInt(0)
io.r.bits.resp := RESP_DECERR
io.r.bits.last := beats_left === UInt(0)
r_queue.io.deq.ready := io.r.fire() && io.r.bits.last
when (io.r.fire()) {
when (beats_left === UInt(0)) {
responding := false.B
} .otherwise {
beats_left := beats_left - UInt(1)
}
}
val draining = Reg(init = false.B)
io.w.ready := draining
when (io.aw.fire()) { draining := true.B }
when (io.w.fire() && io.w.bits.last) { draining := false.B }
val b_queue = Module(new Queue(UInt(width = nastiWIdBits), 1))
b_queue.io.enq.valid := io.aw.valid && !draining
b_queue.io.enq.bits := io.aw.bits.id
io.aw.ready := b_queue.io.enq.ready && !draining
io.b.valid := b_queue.io.deq.valid && !draining
io.b.bits.id := b_queue.io.deq.bits
io.b.bits.resp := RESP_DECERR
b_queue.io.deq.ready := io.b.ready && !draining
}
class NastiRouterIO(nSlaves: Int)(implicit p: Parameters) extends Bundle {
val master = (new NastiIO).flip
val slave = Vec(nSlaves, new NastiIO)
override def cloneType =
new NastiRouterIO(nSlaves).asInstanceOf[this.type]
}
/** Take a single Nasti master and route its requests to various slaves
* @param nSlaves the number of slaves
* @param routeSel a function which takes an address and produces
* a one-hot encoded selection of the slave to route to */
class NastiRouter(nSlaves: Int, routeSel: UInt => UInt)(implicit p: Parameters)
extends NastiModule {
val io = new NastiRouterIO(nSlaves)
val ar_route = routeSel(io.master.ar.bits.addr)
val aw_route = routeSel(io.master.aw.bits.addr)
val ar_ready = Wire(init = false.B)
val aw_ready = Wire(init = false.B)
val w_ready = Wire(init = false.B)
val queueSize = min((1 << nastiXIdBits) * nSlaves, 64)
// These reorder queues remember which slave ports requests were sent on
// so that the responses can be sent back in-order on the master
val ar_queue = Module(new ReorderQueue(
UInt(width = log2Up(nSlaves + 1)), nastiXIdBits,
Some(queueSize), nSlaves + 1))
val aw_queue = Module(new ReorderQueue(
UInt(width = log2Up(nSlaves + 1)), nastiXIdBits,
Some(queueSize), nSlaves + 1))
// This queue holds the accepted aw_routes so that we know how to route the write data
val w_queue = Module(new Queue(aw_route, nSlaves))
val ar_helper = DecoupledHelper(
io.master.ar.valid,
ar_queue.io.enq.ready,
ar_ready)
val aw_helper = DecoupledHelper(
io.master.aw.valid,
w_queue.io.enq.ready,
aw_queue.io.enq.ready,
aw_ready)
val w_helper = DecoupledHelper(
io.master.w.valid,
w_queue.io.deq.valid,
w_ready)
def routeEncode(oh: UInt): UInt = Mux(oh.orR, OHToUInt(oh), UInt(nSlaves))
ar_queue.io.enq.valid := ar_helper.fire(ar_queue.io.enq.ready)
ar_queue.io.enq.bits.tag := io.master.ar.bits.id
ar_queue.io.enq.bits.data := routeEncode(ar_route)
aw_queue.io.enq.valid := aw_helper.fire(aw_queue.io.enq.ready)
aw_queue.io.enq.bits.tag := io.master.aw.bits.id
aw_queue.io.enq.bits.data := routeEncode(aw_route)
w_queue.io.enq.valid := aw_helper.fire(w_queue.io.enq.ready)
w_queue.io.enq.bits := aw_route
w_queue.io.deq.ready := w_helper.fire(w_queue.io.deq.valid, io.master.w.bits.last)
io.master.ar.ready := ar_helper.fire(io.master.ar.valid)
io.master.aw.ready := aw_helper.fire(io.master.aw.valid)
io.master.w.ready := w_helper.fire(io.master.w.valid)
val ar_valid = ar_helper.fire(ar_ready)
val aw_valid = aw_helper.fire(aw_ready)
val w_valid = w_helper.fire(w_ready)
val w_route = w_queue.io.deq.bits
io.slave.zipWithIndex.foreach { case (s, i) =>
s.ar.valid := ar_valid && ar_route(i)
s.ar.bits := io.master.ar.bits
when (ar_route(i)) { ar_ready := s.ar.ready }
s.aw.valid := aw_valid && aw_route(i)
s.aw.bits := io.master.aw.bits
when (aw_route(i)) { aw_ready := s.aw.ready }
s.w.valid := w_valid && w_route(i)
s.w.bits := io.master.w.bits
when (w_route(i)) { w_ready := s.w.ready }
}
val ar_noroute = !ar_route.orR
val aw_noroute = !aw_route.orR
val w_noroute = !w_route.orR
val err_slave = Module(new NastiErrorSlave)
err_slave.io.ar.valid := ar_valid && ar_noroute
err_slave.io.ar.bits := io.master.ar.bits
err_slave.io.aw.valid := aw_valid && aw_noroute
err_slave.io.aw.bits := io.master.aw.bits
err_slave.io.w.valid := w_valid && w_noroute
err_slave.io.w.bits := io.master.w.bits
when (ar_noroute) { ar_ready := err_slave.io.ar.ready }
when (aw_noroute) { aw_ready := err_slave.io.aw.ready }
when (w_noroute) { w_ready := err_slave.io.w.ready }
val b_arb = Module(new RRArbiter(new NastiWriteResponseChannel, nSlaves + 1))
val r_arb = Module(new HellaPeekingArbiter(
new NastiReadDataChannel, nSlaves + 1,
// we can unlock if it's the last beat
(r: NastiReadDataChannel) => r.last, rr = true))
val all_slaves = io.slave :+ err_slave.io
for (i <- 0 to nSlaves) {
b_arb.io.in(i) <> all_slaves(i).b
aw_queue.io.deq(i).valid := all_slaves(i).b.fire()
aw_queue.io.deq(i).tag := all_slaves(i).b.bits.id
r_arb.io.in(i) <> all_slaves(i).r
ar_queue.io.deq(i).valid := all_slaves(i).r.fire() && all_slaves(i).r.bits.last
ar_queue.io.deq(i).tag := all_slaves(i).r.bits.id
assert(!aw_queue.io.deq(i).valid || aw_queue.io.deq(i).matches,
s"aw_queue $i tried to dequeue untracked transaction")
assert(!ar_queue.io.deq(i).valid || ar_queue.io.deq(i).matches,
s"ar_queue $i tried to dequeue untracked transaction")
}
io.master.b <> b_arb.io.out
io.master.r <> r_arb.io.out
}
/** Crossbar between multiple Nasti masters and slaves
* @param nMasters the number of Nasti masters
* @param nSlaves the number of Nasti slaves
* @param routeSel a function selecting the slave to route an address to */
class NastiCrossbar(nMasters: Int, nSlaves: Int,
routeSel: UInt => UInt)
(implicit p: Parameters) extends NastiModule {
val io = new Bundle {
val masters = Vec(nMasters, new NastiIO).flip
val slaves = Vec(nSlaves, new NastiIO)
}
if (nMasters == 1) {
val router = Module(new NastiRouter(nSlaves, routeSel))
router.io.master <> io.masters.head
io.slaves <> router.io.slave
} else {
val routers = Vec.fill(nMasters) { Module(new NastiRouter(nSlaves, routeSel)).io }
val arbiters = Vec.fill(nSlaves) { Module(new NastiArbiter(nMasters)).io }
for (i <- 0 until nMasters) {
routers(i).master <> io.masters(i)
}
for (i <- 0 until nSlaves) {
arbiters(i).master <> Vec(routers.map(r => r.slave(i)))
io.slaves(i) <> arbiters(i).slave
}
}
}
class NastiInterconnectIO(val nMasters: Int, val nSlaves: Int)
(implicit p: Parameters) extends Bundle {
/* Note: the interconnect is a slave to the masters and a master to the
 * slaves, hence the seemingly reversed declarations. */
val masters = Vec(nMasters, new NastiIO).flip
val slaves = Vec(nSlaves, new NastiIO)
override def cloneType =
new NastiInterconnectIO(nMasters, nSlaves).asInstanceOf[this.type]
}
abstract class NastiInterconnect(implicit p: Parameters) extends NastiModule()(p) {
val nMasters: Int
val nSlaves: Int
lazy val io = new NastiInterconnectIO(nMasters, nSlaves)
}
class NastiRecursiveInterconnect(
val nMasters: Int, addrMap: AddrMap)
(implicit p: Parameters) extends NastiInterconnect()(p) {
def port(name: String) = io.slaves(addrMap.port(name))
val nSlaves = addrMap.numSlaves
val routeSel = (addr: UInt) =>
Cat(addrMap.entries.map(e => addrMap(e.name).containsAddress(addr)).reverse)
val xbar = Module(new NastiCrossbar(nMasters, addrMap.length, routeSel))
xbar.io.masters <> io.masters
io.slaves <> addrMap.entries.zip(xbar.io.slaves).flatMap {
case (entry, xbarSlave) => {
entry.region match {
case submap: AddrMap if submap.entries.isEmpty =>
val err_slave = Module(new NastiErrorSlave)
err_slave.io <> xbarSlave
None
case submap: AddrMap =>
val ic = Module(new NastiRecursiveInterconnect(1, submap))
ic.io.masters.head <> xbarSlave
ic.io.slaves
case r: MemRange =>
Some(xbarSlave)
}
}
}
}
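The routing scheme used throughout this file can be modeled in plain Scala: `routeSel` produces a one-hot hit vector over the slave address regions, and `routeEncode` maps it to a port index, reserving index `nSlaves` for the error slave when no region matches. The region tuples below are invented for the sketch:

```scala
// One-hot hit vector over (start, size) regions for a given address.
def routeSel(addr: BigInt, regions: Seq[(BigInt, BigInt)]): Seq[Boolean] =
  regions.map { case (start, size) => addr >= start && addr < start + size }

// Index of the hit, or nSlaves (== vector length) for the error slave.
def routeEncode(oh: Seq[Boolean]): Int =
  oh.indexWhere(identity) match {
    case -1 => oh.length // no region hit: route to the error slave
    case i  => i
  }
```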


@ -0,0 +1,90 @@
// See LICENSE for license details.
package midas
import passes.Utils.writeEmittedCircuit
import chisel3.{Data, Bundle, Record, Clock, Bool}
import chisel3.internal.firrtl.Port
import firrtl.ir.Circuit
import firrtl.{Transform, CircuitState}
import firrtl.annotations.Annotation
import firrtl.CompilerUtils.getLoweringTransforms
import firrtl.passes.memlib._
import freechips.rocketchip.config.{Parameters, Field}
import java.io.{File, FileWriter, Writer}
import logger._
// Directory into which output files are dumped. Set by dir argument
case object OutputDir extends Field[File]
// Compiler for Midas Transforms
private class MidasCompiler extends firrtl.Compiler {
def emitter = new firrtl.LowFirrtlEmitter
def transforms =
getLoweringTransforms(firrtl.ChirrtlForm, firrtl.MidForm) ++
Seq(new InferReadWrite) ++
getLoweringTransforms(firrtl.MidForm, firrtl.LowForm)
}
// These next two compilers split LFO from the rest of the lowering
// compilers to schedule around the presence of internal & non-standard WIR
// nodes (Dshlw) present after LFO, which custom transforms can't handle
private class HostTransformCompiler extends firrtl.Compiler {
def emitter = new firrtl.LowFirrtlEmitter
def transforms =
Seq(new firrtl.IRToWorkingIR,
new firrtl.ResolveAndCheck,
new firrtl.HighFirrtlToMiddleFirrtl) ++
getLoweringTransforms(firrtl.MidForm, firrtl.LowForm)
}
// Custom transforms have been scheduled -> do the final lowering
private class LastStageVerilogCompiler extends firrtl.Compiler {
def emitter = new firrtl.VerilogEmitter
def transforms = Seq(new firrtl.LowFirrtlOptimization,
new firrtl.transforms.RemoveReset)
}
object MidasCompiler {
def apply(
chirrtl: Circuit,
targetAnnos: Seq[Annotation],
io: Seq[(String, Data)],
dir: File,
targetTransforms: Seq[Transform], // Run pre-MIDAS transforms, on the target RTL
hostTransforms: Seq[Transform] // Run post-MIDAS transformations
)
(implicit p: Parameters): CircuitState = {
val midasAnnos = Seq(
firrtl.TargetDirAnnotation(dir.getPath()),
InferReadWriteAnnotation)
val midasTransforms = new passes.MidasTransforms(io)(p alterPartial { case OutputDir => dir })
val compiler = new MidasCompiler
val midas = compiler.compile(firrtl.CircuitState(
chirrtl, firrtl.ChirrtlForm, targetAnnos ++ midasAnnos),
targetTransforms :+ midasTransforms)
val postHostTransforms = new HostTransformCompiler().compile(midas, hostTransforms)
val result = new LastStageVerilogCompiler().compileAndEmit(postHostTransforms)
writeEmittedCircuit(result, new File(dir, s"FPGATop.v"))
result
}
// Unlike above, elaborates the target locally, before constructing the target IO Record.
def apply[T <: chisel3.core.UserModule](
w: => T,
dir: File,
targetTransforms: Seq[Transform] = Seq.empty,
hostTransforms: Seq[Transform] = Seq.empty
)
(implicit p: Parameters): CircuitState = {
dir.mkdirs
lazy val target = w
val circuit = chisel3.Driver.elaborate(() => target)
val chirrtl = firrtl.Parser.parse(chisel3.Driver.emit(circuit))
val io = target.getPorts map (p => p.id.instanceName -> p.id)
apply(chirrtl, circuit.annotations.map(_.toFirrtl), io, dir, targetTransforms, hostTransforms)
}
}


@ -0,0 +1,85 @@
// See LICENSE for license details.
package midas
import core._
import widgets._
import platform._
import models._
import strober.core._
import junctions.{NastiKey, NastiParameters}
import freechips.rocketchip.config.{Parameters, Config, Field}
import freechips.rocketchip.unittest.UnitTests
trait PlatformType
case object Zynq extends PlatformType
case object F1 extends PlatformType
case object Platform extends Field[PlatformType]
// Switches to synthesize prints and assertions
case object SynthAsserts extends Field[Boolean]
case object SynthPrints extends Field[Boolean]
// Exclude module instances from assertion and print synthesis
// Tuple of Parent Module (where the instance is instantiated) and the instance name
case object EnableSnapshot extends Field[Boolean]
case object HasDMAChannel extends Field[Boolean]
case object KeepSamplesInMem extends Field[Boolean]
// MIDAS 2.0 Switches
case object GenerateMultiCycleRamModels extends Field[Boolean](false)
// User provided transforms to run before Golden Gate transformations
// These are constructor functions accept a Parameters instance and produce a
// sequence of firrtl Transforms to run
case object TargetTransforms extends Field[Seq[(Parameters) => Seq[firrtl.Transform]]](Seq())
// User provided transforms to run after Golden Gate transformations
case object HostTransforms extends Field[Seq[(Parameters) => Seq[firrtl.Transform]]](Seq())
class SimConfig extends Config((site, here, up) => {
case TraceMaxLen => 1024
case SRAMChainNum => 1
case ChannelLen => 16
case ChannelWidth => 32
case DaisyWidth => 32
case SynthAsserts => false
case SynthPrints => false
case EnableSnapshot => false
case KeepSamplesInMem => true
case CtrlNastiKey => NastiParameters(32, 32, 12)
case DMANastiKey => NastiParameters(512, 64, 6)
case FpgaMMIOSize => BigInt(1) << 12 // 4 KB
case AXIDebugPrint => false
case HostMemChannelNastiKey => NastiParameters(64, 32, 6)
case HostMemNumChannels => 1
case MemNastiKey => site(HostMemChannelNastiKey).copy(
addrBits = chisel3.util.log2Ceil(site(HostMemNumChannels)) + site(HostMemChannelNastiKey).addrBits,
// TODO: We should try to constrain masters to 4 bits of ID space -> but we need to map
// multiple target-ids on a single host-id in the DRAM timing model to support that
idBits = 6
)
})
class ZynqConfig extends Config(new Config((site, here, up) => {
case Platform => Zynq
case HasDMAChannel => false
case MasterNastiKey => site(CtrlNastiKey)
}) ++ new SimConfig)
class ZynqConfigWithSnapshot extends Config(new Config((site, here, up) => {
case EnableSnapshot => true
}) ++ new ZynqConfig)
// we are assuming the host-DRAM size is 2^chAddrBits
class F1Config extends Config(new Config((site, here, up) => {
case Platform => F1
case HasDMAChannel => true
case CtrlNastiKey => NastiParameters(32, 25, 12)
case MasterNastiKey => site(CtrlNastiKey)
case HostMemChannelNastiKey => NastiParameters(64, 34, 16)
case HostMemNumChannels => 4
}) ++ new SimConfig)
class F1ConfigWithSnapshot extends Config(new Config((site, here, up) => {
case EnableSnapshot => true
}) ++ new F1Config)


@ -0,0 +1,76 @@
// See LICENSE for license details.
package midas.unittest
import chisel3._
import chisel3.experimental.RawModule
import firrtl.{ExecutionOptionsManager, HasFirrtlOptions}
import freechips.rocketchip.config.{Parameters, Config, Field}
import midas.widgets.ScanRegister
case object QoRTargets extends Field[Parameters => Seq[RawModule]]
class QoRShim(implicit val p: Parameters) extends Module {
val io = IO(new Bundle {
val scanIn = Input(Bool())
val scanOut = Output(Bool())
val scanEnable = Input(Bool())
})
val modules = p(QoRTargets)(p)
val scanOuts = modules.map({ module =>
val ports = module.getPorts.flatMap({
case chisel3.internal.firrtl.Port(id: Clock, _) => None
case chisel3.internal.firrtl.Port(id, _) => Some(id)
})
ScanRegister(ports, io.scanEnable, io.scanIn)
})
io.scanOut := scanOuts.reduce(_ || _)
}
class Midas2QoRTargets extends Config((site, here, up) => {
case QoRTargets => (q: Parameters) => {
implicit val p = q
Seq(
Module(new midas.models.sram.AsyncMemChiselModel(160, 64, 6, 3))
)
}
})
// Generates synthesizable unit tests for key modules, such as simulation channels
// See: src/main/cc/unittest/Makefile for the downstream RTL-simulation flow
//
// TODO: Make the core of this generator a trait that can be mixed into
// FireSim's ScalaTests for more type safety
object QoRShimGenerator extends App with freechips.rocketchip.util.HasGeneratorUtilities {
case class QoRShimOptions(
configProject: String = "midas.unittest",
config: String = "Midas2QoRTargets") {
val fullConfigClasses: Seq[String] = Seq(configProject + "." + config)
}
trait HasUnitTestOptions {
self: ExecutionOptionsManager =>
var qorOptions = QoRShimOptions()
parser.note("MIDAS Unit Test Generator Options")
parser.opt[String]("config-project")
.abbr("cp")
.valueName("<config-project>")
.foreach { d => qorOptions = qorOptions.copy(configProject = d) }
parser.opt[String]("config")
.abbr("conf")
.valueName("<configClassName>")
.foreach { cfg => qorOptions = qorOptions.copy(config = cfg) }
}
val exOptions = new ExecutionOptionsManager("qor")
with HasChiselExecutionOptions
with HasFirrtlOptions
with HasUnitTestOptions
exOptions.parse(args)
val params = getConfig(exOptions.qorOptions.fullConfigClasses).toInstance
Driver.execute(exOptions, () => new QoRShim()(params))
}


@ -0,0 +1,81 @@
// See LICENSE for license details.
package midas.unittest
import midas.core._
import chisel3._
import firrtl.{ExecutionOptionsManager, HasFirrtlOptions}
import freechips.rocketchip.config.{Parameters, Config, Field}
import freechips.rocketchip.unittest.{UnitTests, TestHarness}
import midas.models.{CounterTableUnitTest, LatencyHistogramUnitTest, AddressRangeCounterUnitTest}
// Unittests
class WithAllUnitTests extends Config((site, here, up) => {
case UnitTests => (q: Parameters) => {
implicit val p = q
val timeout = 2000000
Seq(
Module(new PipeChannelUnitTest(latency = 0, timeout = timeout)),
Module(new PipeChannelUnitTest(latency = 1, timeout = timeout)),
Module(new ReadyValidChannelUnitTest(timeout = timeout)),
Module(new CounterTableUnitTest),
Module(new LatencyHistogramUnitTest),
Module(new AddressRangeCounterUnitTest))
}
})
// Failing tests
class WithTimeOutCheck extends Config((site, here, up) => {
case UnitTests => (q: Parameters) => {
implicit val p = q
Seq(
Module(new PipeChannelUnitTest(timeout = 100))
)
}
})
// Complete configs
class AllUnitTests extends Config(new WithAllUnitTests ++ new midas.SimConfig)
class TimeOutCheck extends Config(new WithTimeOutCheck ++ new midas.SimConfig)
// Generates synthesizable unit tests for key modules, such as simulation channels
// See: src/main/cc/unittest/Makefile for the downstream RTL-simulation flow
//
// TODO: Make the core of this generator a trait that can be mixed into
// FireSim's ScalaTests for more type safety
object Generator extends App with freechips.rocketchip.util.HasGeneratorUtilities {
case class UnitTestOptions(
configProject: String = "midas.unittest",
config: String = "AllUnitTests") {
val fullConfigClasses: Seq[String] = Seq(configProject + "." + config)
}
trait HasUnitTestOptions {
self: ExecutionOptionsManager =>
var utOptions = UnitTestOptions()
parser.note("MIDAS Unit Test Generator Options")
parser.opt[String]("config-project")
.abbr("cp")
.valueName("<config-project>")
.foreach { d => utOptions = utOptions.copy(configProject = d) }
parser.opt[String]("config")
.abbr("conf")
.valueName("<configClassName>")
.foreach { cfg => utOptions = utOptions.copy(config = cfg) }
}
val exOptions = new ExecutionOptionsManager("regressions")
with HasChiselExecutionOptions
with HasFirrtlOptions
with HasUnitTestOptions
exOptions.parse(args)
val params = getConfig(exOptions.utOptions.fullConfigClasses).toInstance
Driver.execute(exOptions, () => new TestHarness()(params))
}


@ -0,0 +1,374 @@
// See LICENSE for license details.
package midas
package core
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.unittest._
import freechips.rocketchip.util.{DecoupledHelper}
import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's
import chisel3._
import chisel3.util._
import chisel3.experimental.{dontTouch, chiselName, MultiIOModule}
import strober.core.{TraceQueue, TraceMaxLen}
import midas.core.SimUtils.{ChLeafType}
// For now use the convention that clock ratios are set with respect to the transformed RTL
trait IsRationalClockRatio {
def numerator: Int
def denominator: Int
def isUnity() = numerator == denominator
def isReciprocal() = numerator == 1
def isIntegral() = denominator == 1
def inverse: IsRationalClockRatio
}
case class RationalClockRatio(numerator: Int, denominator: Int) extends IsRationalClockRatio {
def inverse() = RationalClockRatio(denominator, numerator)
}
case object UnityClockRatio extends IsRationalClockRatio {
val numerator = 1
val denominator = 1
def inverse() = UnityClockRatio
}
case class ReciprocalClockRatio(denominator: Int) extends IsRationalClockRatio {
val numerator = 1
def inverse = IntegralClockRatio(numerator = denominator)
}
case class IntegralClockRatio(numerator: Int) extends IsRationalClockRatio {
val denominator = 1
def inverse = ReciprocalClockRatio(denominator = numerator)
}
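The ratio algebra encoded by these case classes can be exercised in plain Scala. The standalone sketch below (hypothetical `ClockRatioSketch`/`Ratio` names, no Chisel dependency) mirrors the predicates and checks that inversion swaps the integral and reciprocal cases:

```scala
object ClockRatioSketch {
  // Minimal stand-in for IsRationalClockRatio: a positive numerator/denominator pair
  final case class Ratio(numerator: Int, denominator: Int) {
    require(numerator > 0 && denominator > 0)
    def isUnity: Boolean      = numerator == denominator
    def isReciprocal: Boolean = numerator == 1
    def isIntegral: Boolean   = denominator == 1
    def inverse: Ratio        = Ratio(denominator, numerator)
  }

  def demo(): Unit = {
    val half = Ratio(1, 2)                // transformed RTL runs at half rate
    assert(half.isReciprocal && half.inverse.isIntegral)
    assert(half.inverse.inverse == half)  // inversion is an involution
    assert(Ratio(3, 3).isUnity)
  }
}
```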
class PipeChannelIO[T <: ChLeafType](gen: T)(implicit p: Parameters) extends Bundle {
val in = Flipped(Decoupled(gen))
val out = Decoupled(gen)
val trace = Decoupled(gen)
val traceLen = Input(UInt(log2Up(p(TraceMaxLen)+1).W))
override def cloneType = new PipeChannelIO(gen)(p).asInstanceOf[this.type]
}
class PipeChannel[T <: ChLeafType](
val gen: T,
latency: Int,
clockRatio: IsRationalClockRatio = UnityClockRatio
)(implicit p: Parameters) extends Module {
require(clockRatio.isUnity)
require(latency == 0 || latency == 1)
val io = IO(new PipeChannelIO(gen))
val tokens = Module(new Queue(gen, p(ChannelLen)))
tokens.io.enq <> io.in
io.out <> tokens.io.deq
if (latency == 1) {
val initializing = RegNext(reset.toBool)
when(initializing) {
tokens.io.enq.valid := true.B
io.in.ready := false.B
}
}
if (p(EnableSnapshot)) {
io.trace <> TraceQueue(tokens.io.deq, io.traceLen)
} else {
io.trace := DontCare
io.trace.valid := false.B
}
}
class PipeChannelUnitTest(
latency: Int = 0,
numTokens: Int = 4096,
timeout: Int = 50000
)(implicit p: Parameters) extends UnitTest(timeout) {
override val testName = "PipeChannel Unit Test"
val payloadWidth = 8
val dut = Module(new PipeChannel(UInt(payloadWidth.W), latency, UnityClockRatio))
val referenceInput = Wire(UInt(payloadWidth.W))
val referenceOutput = ShiftRegister(referenceInput, latency)
val inputChannelMapping = Seq(IChannelDesc("in", referenceInput, dut.io.in))
val outputChannelMapping = Seq(OChannelDesc("out", referenceOutput, dut.io.out, TokenComparisonFunctions.ignoreNTokens(1)))
io.finished := DirectedLIBDNTestHelper(inputChannelMapping, outputChannelMapping, numTokens)
dut.io.traceLen := DontCare
dut.io.trace.ready := DontCare
}
// A bidirectional token channel wrapping a target-decoupled (ready-valid) interface
// Structurally, this keeps the target bundle intact; however, it should really be thought of as:
// two *independent* token channels
// fwd: DecoupledIO (carries a combined valid-and-payload token)
// - valid -> fwd.hValid
// - ready -> fwd.hReady
// - bits -> {target.valid, target.bits}
//
// rev: DecoupledIO (carries a ready token)
// - valid -> rev.hValid
// - ready -> rev.hReady
// - bits -> target.ready
//
// WARNING: Target.fire() is meaningless unless the fwd and rev channels are
// synchronized and carry valid tokens
class SimReadyValidIO[T <: Data](gen: T) extends Bundle {
val target = EnqIO(gen)
val fwd = new HostReadyValid
val rev = Flipped(new HostReadyValid)
override def cloneType = new SimReadyValidIO(gen).asInstanceOf[this.type]
def fwdIrrevocabilityAssertions(suggestedName: Option[String] = None): Unit = {
val hValidPrev = RegNext(fwd.hValid, false.B)
val hReadyPrev = RegNext(fwd.hReady)
val hFirePrev = hValidPrev && hReadyPrev
val tPrev = RegNext(target)
val prefix = suggestedName match {
case Some(name) => name + ": "
case None => ""
}
assert(!hValidPrev || hFirePrev || fwd.hValid,
s"${prefix}hValid de-asserted without handshake, violating fwd token irrevocability")
assert(!hValidPrev || hFirePrev || tPrev.valid === target.valid,
s"${prefix}tValid transitioned without host handshake, violating fwd token irrevocability")
assert(!hValidPrev || hFirePrev || tPrev.bits.asUInt() === target.bits.asUInt(),
s"${prefix}tBits transitioned without host handshake, violating fwd token irrevocability")
assert(!hFirePrev || tPrev.fire || !tPrev.valid,
s"${prefix}tValid deasserted without prior target handshake, violating target-queue irrevocability")
assert(!hFirePrev || tPrev.fire || !tPrev.valid || tPrev.bits.asUInt() === target.bits.asUInt(),
s"${prefix}tBits transitioned without prior target handshake, violating target-queue irrevocability")
}
def revIrrevocabilityAssertions(suggestedName: Option[String] = None): Unit = {
val prefix = suggestedName match {
case Some(name) => name + ": "
case None => ""
}
val hReadyPrev = RegNext(rev.hReady, false.B)
val hValidPrev = RegNext(rev.hValid)
val tReadyPrev = RegNext(target.ready)
val hFirePrev = hReadyPrev && hValidPrev
assert(hFirePrev || !hReadyPrev || rev.hReady,
s"${prefix}hReady de-asserted, violating token irrevocability")
assert(hFirePrev || !hReadyPrev || tReadyPrev === target.ready,
s"${prefix}tReady de-asserted, violating token irrevocability")
}
// Returns two directioned objects driven by this SimReadyValidIO hw instance
def bifurcate(): (DecoupledIO[ValidIO[T]], DecoupledIO[Bool]) = {
// Can't use bidirectional wires, so we use a dummy module (akin to the identity module)
class BifurcationModule[T <: Data](gen: T) extends MultiIOModule {
val fwd = IO(Decoupled(Valid(gen)))
val rev = IO(Flipped(DecoupledIO(Bool())))
val coupled = IO(Flipped(cloneType))
// Forward channel
fwd.bits.bits := coupled.target.bits
fwd.bits.valid := coupled.target.valid
fwd.valid := coupled.fwd.hValid
coupled.fwd.hReady := fwd.ready
// Reverse channel
rev.ready := coupled.rev.hReady
coupled.target.ready := rev.bits
coupled.rev.hValid := rev.valid
}
val bifurcator = Module(new BifurcationModule(gen))
bifurcator.coupled <> this
(bifurcator.fwd, bifurcator.rev)
}
// Returns two directioned objects which will drive this SimReadyValidIO hw instance
def combine(): (DecoupledIO[ValidIO[T]], DecoupledIO[Bool]) = {
// Can't use bidirectional wires, so we use a dummy module (akin to the identity module)
class CombiningModule[T <: Data](gen: T) extends MultiIOModule {
val fwd = IO(Flipped(DecoupledIO(Valid(gen))))
val rev = IO(Decoupled(Bool()))
val coupled = IO(cloneType)
// Forward channel
coupled.target.bits := fwd.bits.bits
coupled.target.valid := fwd.bits.valid
coupled.fwd.hValid := fwd.valid
fwd.ready := coupled.fwd.hReady
// Reverse channel
coupled.rev.hReady := rev.ready
rev.bits := coupled.target.ready
rev.valid := coupled.rev.hValid
}
val combiner = Module(new CombiningModule(gen))
this <> combiner.coupled
(combiner.fwd, combiner.rev)
}
}
object SimReadyValid {
def apply[T <: Data](gen: T) = new SimReadyValidIO(gen)
}
class ReadyValidTraceIO[T <: Data](gen: T) extends Bundle {
val bits = Decoupled(gen)
val valid = Decoupled(Bool())
val ready = Decoupled(Bool())
override def cloneType = new ReadyValidTraceIO(gen).asInstanceOf[this.type]
}
object ReadyValidTrace {
def apply[T <: Data](gen: T) = new ReadyValidTraceIO(gen)
}
class ReadyValidChannelIO[T <: Data](gen: T)(implicit p: Parameters) extends Bundle {
val enq = Flipped(SimReadyValid(gen))
val deq = SimReadyValid(gen)
val trace = ReadyValidTrace(gen)
val traceLen = Input(UInt(log2Up(p(TraceMaxLen)+1).W))
val targetReset = Flipped(Decoupled(Bool()))
override def cloneType = new ReadyValidChannelIO(gen)(p).asInstanceOf[this.type]
}
class ReadyValidChannel[T <: Data](
gen: T,
n: Int = 2, // Target queue depth
// Clock ratio (N/M) of deq interface (N) vs enq interface (M)
clockRatio: IsRationalClockRatio = UnityClockRatio
)(implicit p: Parameters) extends Module {
require(clockRatio.isUnity, "CDC is not currently implemented")
val io = IO(new ReadyValidChannelIO(gen))
val enqFwdQ = Module(new Queue(ValidIO(gen), 2, flow = true))
enqFwdQ.io.enq.bits.valid := io.enq.target.valid
enqFwdQ.io.enq.bits.bits := io.enq.target.bits
enqFwdQ.io.enq.valid := io.enq.fwd.hValid
io.enq.fwd.hReady := enqFwdQ.io.enq.ready
val deqRevQ = Module(new Queue(Bool(), 2, flow = true))
deqRevQ.io.enq.bits := io.deq.target.ready
deqRevQ.io.enq.valid := io.deq.rev.hValid
io.deq.rev.hReady := deqRevQ.io.enq.ready
val reference = Module(new Queue(gen, n))
val deqFwdFired = RegInit(false.B)
val enqRevFired = RegInit(false.B)
val finishing = DecoupledHelper(
io.targetReset.valid,
enqFwdQ.io.deq.valid,
deqRevQ.io.deq.valid,
(enqRevFired || io.enq.rev.hReady),
(deqFwdFired || io.deq.fwd.hReady))
val targetFire = finishing.fire()
val enqBitsLast = RegEnable(enqFwdQ.io.deq.bits.bits, targetFire)
// enqRev
io.enq.rev.hValid := !enqRevFired
io.enq.target.ready := reference.io.enq.ready
// deqFwd
io.deq.fwd.hValid := !deqFwdFired
io.deq.target.bits := reference.io.deq.bits
io.deq.target.valid := reference.io.deq.valid
io.targetReset.ready := finishing.fire(io.targetReset.valid)
enqFwdQ.io.deq.ready := finishing.fire(enqFwdQ.io.deq.valid)
deqRevQ.io.deq.ready := finishing.fire(deqRevQ.io.deq.valid)
reference.reset := reset.toBool || targetFire && io.targetReset.bits
reference.io.enq.valid := targetFire && enqFwdQ.io.deq.bits.valid
reference.io.enq.bits := Mux(targetFire, enqFwdQ.io.deq.bits.bits, enqBitsLast)
reference.io.deq.ready := targetFire && deqRevQ.io.deq.bits
deqFwdFired := Mux(targetFire, false.B, deqFwdFired || io.deq.fwd.hReady)
enqRevFired := Mux(targetFire, false.B, enqRevFired || io.enq.rev.hReady)
io.trace := DontCare
io.trace.bits.valid := false.B
io.trace.valid.valid := false.B
io.trace.ready.valid := false.B
}
@chiselName
class ReadyValidChannelUnitTest(
numTokens: Int = 4096,
queueDepth: Int = 2,
timeout: Int = 50000
)(implicit p: Parameters) extends UnitTest(timeout) {
override val testName = "ReadyValidChannel Unit Test"
val payloadType = UInt(8.W)
val resetLength = 4
val dut = Module(new ReadyValidChannel(payloadType))
val reference = Module(new Queue(payloadType, queueDepth))
// Generates target-reset tokens
def resetTokenGen(): Bool = {
val resetCount = RegInit(0.U(log2Ceil(resetLength + 1).W))
val outOfReset = resetCount === resetLength.U
resetCount := Mux(outOfReset, resetCount, resetCount + 1.U)
!outOfReset
}
// This checks that the bits field of deq matches even when target valid
// is not asserted. To work around random initialization of the queue's
// mem, it ignores all target-invalid output tokens until every entry of
// the mem has been written at least once.
//
// TODO: Consider initializing all memories to zero even in the unittests,
// as that would more closely match the FPGA
val enqCount = RegInit(0.U(log2Ceil(queueDepth + 1).W))
val memFullyDefined = enqCount === queueDepth.U
enqCount := Mux(!memFullyDefined && reference.io.enq.fire && !reference.reset.toBool, enqCount + 1.U, enqCount)
// Track the target cycle at which all entries are known
val memFullyDefinedCycle = RegInit(1.U(log2Ceil(2*timeout).W))
memFullyDefinedCycle := Mux(!memFullyDefined, memFullyDefinedCycle + 1.U, memFullyDefinedCycle)
def strictPayloadCheck(ref: Data, ch: DecoupledIO[Data]): Bool = {
// hack: coerce the raw Data handles back to the payload type
val refTyped = ref.asTypeOf(refDeqFwd)
val modelTyped = ch.bits.asTypeOf(refDeqFwd)
val deqCount = RegInit(0.U(log2Ceil(numTokens + 1).W))
when (ch.fire) { deqCount := deqCount + 1.U }
// Neglect a comparison if: 1) still under reset 2) mem contents still undefined
val exempt = deqCount < resetLength.U ||
!refTyped.valid && !modelTyped.valid && (deqCount < memFullyDefinedCycle)
val matchExact = ref.asUInt === ch.bits.asUInt
!ch.fire || exempt || matchExact
}
val (deqFwd, deqRev) = dut.io.deq.bifurcate()
val (enqFwd, enqRev) = dut.io.enq.combine()
val refDeqFwd = Wire(Valid(payloadType))
refDeqFwd.bits := reference.io.deq.bits
refDeqFwd.valid := reference.io.deq.valid
val refEnqFwd = Wire(Valid(payloadType))
reference.io.enq.bits := refEnqFwd.bits
reference.io.enq.valid := refEnqFwd.valid
val inputChannelMapping = Seq(IChannelDesc("enqFwd", refEnqFwd, enqFwd),
IChannelDesc("deqRev", reference.io.deq.ready, deqRev),
IChannelDesc("reset" , reference.reset, dut.io.targetReset, Some(resetTokenGen)))
val outputChannelMapping = Seq(OChannelDesc("deqFwd", refDeqFwd, deqFwd, strictPayloadCheck),
OChannelDesc("enqRev", reference.io.enq.ready, enqRev, TokenComparisonFunctions.ignoreNTokens(resetLength)))
io.finished := DirectedLIBDNTestHelper(inputChannelMapping, outputChannelMapping, numTokens)
dut.io.traceLen := DontCare
dut.io.trace.ready.ready := DontCare
dut.io.trace.valid.ready := DontCare
dut.io.trace.bits.ready := DontCare
}


@ -0,0 +1,61 @@
// See LICENSE for license details.
package midas.core
import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's
import chisel3._
import chisel3.util._
import chisel3.experimental.MultiIOModule
trait ClockUtils {
// Assume time is measured in ps
val timeStepBits = 32
}
class GenericClockCrossing[T <: Data](gen: T) extends MultiIOModule with ClockUtils {
val enq = IO(Flipped(Decoupled(gen)))
val deq = IO(Decoupled(gen))
val enqDomainTimeStep = IO(Input(UInt(timeStepBits.W)))
val deqDomainTimeStep = IO(Input(UInt(timeStepBits.W)))
val enqTokens = Queue(enq, 2)
// Deq Domain handling
val residualTime = Reg(UInt(timeStepBits.W))
val hasResidualTime = RegInit(false.B)
val timeToNextEnqEdge = Mux(hasResidualTime, residualTime, enqDomainTimeStep)
val timeToNextDeqEdge = RegInit(0.U(timeStepBits.W))
val enqTokenVisible = timeToNextEnqEdge > timeToNextDeqEdge
val tokenWouldExpire = timeToNextEnqEdge < timeToNextDeqEdge + deqDomainTimeStep
deq.valid := enqTokens.valid && enqTokenVisible
deq.bits := enqTokens.bits
enqTokens.ready := !enqTokenVisible || deq.ready && tokenWouldExpire
val enqTokenExpiring = enqTokens.fire
val deqTokenReleased = deq.fire
// CASE 1: This ENQ token is visible in the current deq token, but not visible in future DEQ tokens
// ENQ N | ENQ N1 |
// ... | DEQ M | DEQ M1 |
when (enqTokenExpiring && deqTokenReleased) {
hasResidualTime := false.B
timeToNextDeqEdge := timeToNextDeqEdge + deqDomainTimeStep - timeToNextEnqEdge
// Case 2: This ENQ token is no longer visible (generally Fast -> Slow)
// ENQ N | ENQ N+1 | ...
// DEQ M | DEQ M+1...
}.elsewhen(enqTokenExpiring) {
hasResidualTime := false.B
timeToNextDeqEdge := timeToNextDeqEdge - timeToNextEnqEdge
// Case 3: This ENQ token is visible in the current and possibly future output tokens
// ENQ M | ...
// ENQ N | ENQ N+1 | ...
}.elsewhen(deqTokenReleased) {
hasResidualTime := true.B
timeToNextDeqEdge := deqDomainTimeStep
residualTime := timeToNextEnqEdge - deqDomainTimeStep
}
}
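One way to read the residual-time bookkeeping above: assuming both domains start aligned at t = 0 and enq token n covers the interval [n * enqStep, (n + 1) * enqStep) ps, each deq token samples the enq token whose interval contains its opening edge. This is an interpretation of the logic, not code from the source; the plain-Scala sketch below (hypothetical names) models it:

```scala
object TimeStepSketch {
  /** Index of the enq-domain token visible at the start of deq token m,
    * assuming both domains share t = 0 and enq token n covers
    * [n * enqStep, (n + 1) * enqStep) picoseconds. */
  def visibleEnqToken(m: Int, enqStep: Int, deqStep: Int): Int =
    (m.toLong * deqStep / enqStep).toInt

  // Fast deq domain (enqStep = 3, deqStep = 1): each enq token is sampled
  // enqStep / deqStep = 3 times, matching the "residual time" cases above.
  def demo(): Seq[Int] =
    (0 until 6).map(m => visibleEnqToken(m, enqStep = 3, deqStep = 1))
}
```

In the slow-deq direction (`enqStep = 1`, `deqStep = 3`) the same function shows enq tokens being skipped, corresponding to Case 2 above.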


@ -0,0 +1,153 @@
// See LICENSE for license details.
package midas
package core
import junctions._
import widgets._
import chisel3._
import chisel3.util._
import chisel3.core.ActualDirection
import chisel3.core.DataMirror.directionOf
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.diplomacy.AddressSet
import freechips.rocketchip.util.{DecoupledHelper}
import scala.collection.mutable
case object DMANastiKey extends Field[NastiParameters]
case object FpgaMMIOSize extends Field[BigInt]
// The AXI4 widths for a single host-DRAM channel
case object HostMemChannelNastiKey extends Field[NastiParameters]
// The number of host-DRAM channels -> all channels must have the same AXI4 widths
case object HostMemNumChannels extends Field[Int]
// The aggregate memory-space seen by masters wanting DRAM
case object MemNastiKey extends Field[NastiParameters]
class FPGATopIO(implicit val p: Parameters) extends WidgetIO {
val dma = Flipped(new NastiIO()(p alterPartial ({ case NastiKey => p(DMANastiKey) })))
val mem = Vec(4, new NastiIO()(p alterPartial ({ case NastiKey => p(HostMemChannelNastiKey) })))
}
// Platform agnostic wrapper of the simulation models for FPGA
class FPGATop(simIoType: SimWrapperChannels)(implicit p: Parameters) extends Module with HasWidgets {
val io = IO(new FPGATopIO)
// Simulation Target
val sim = Module(new SimBox(simIoType.cloneType))
val simIo = sim.io.channelPorts
// This reset is used to return the simulation to time 0.
val master = addWidget(new SimulationMaster)
val simReset = master.io.simReset
sim.io.clock := clock
sim.io.reset := reset.toBool || simReset
sim.io.hostReset := simReset
val memPorts = new mutable.ListBuffer[NastiIO]
case class DmaInfo(name: String, port: NastiIO, size: BigInt)
val dmaInfoBuffer = new mutable.ListBuffer[DmaInfo]
// Instantiate bridge widgets.
simIo.bridgeAnnos.map({ bridgeAnno =>
val widgetChannelPrefix = s"${bridgeAnno.target.ref}"
val widget = addWidget(bridgeAnno.elaborateWidget)
widget.reset := reset.toBool || simReset
widget match {
case model: midas.models.FASEDMemoryTimingModel =>
memPorts += model.io.host_mem
model.hPort.hBits.axi4.aw.bits.user := DontCare
model.hPort.hBits.axi4.aw.bits.region := DontCare
model.hPort.hBits.axi4.ar.bits.user := DontCare
model.hPort.hBits.axi4.ar.bits.region := DontCare
model.hPort.hBits.axi4.w.bits.id := DontCare
model.hPort.hBits.axi4.w.bits.user := DontCare
case peekPoke: PeekPokeBridgeModule =>
peekPoke.io.step <> master.io.step
master.io.done := peekPoke.io.idle
case _ =>
}
widget.hPort.connectChannels2Port(bridgeAnno, simIo)
widget match {
case widget: HasDMA => dmaInfoBuffer += DmaInfo(widget.getWName, widget.dma, widget.dmaSize)
case _ => Nil
}
})
// Host Memory Channels
// Masters = Target memory channels + loadMemWidget
val numMemModels = memPorts.length
val nastiP = p.alterPartial({ case NastiKey => p(MemNastiKey) })
val loadMem = addWidget(new LoadMemWidget(MemNastiKey))
loadMem.reset := reset.toBool || simReset
memPorts += loadMem.io.toSlaveMem
val channelSize = BigInt(1) << p(HostMemChannelNastiKey).addrBits
val hostMemAddrMap = new AddrMap(Seq.tabulate(p(HostMemNumChannels))(i =>
AddrMapEntry(s"memChannel$i", MemRange(i * channelSize, channelSize, MemAttr(AddrMapProt.RW)))))
val mem_xbar = Module(new NastiRecursiveInterconnect(numMemModels + 1, hostMemAddrMap)(nastiP))
io.mem.zip(mem_xbar.io.slaves).foreach({ case (mem, slave) => mem <> NastiQueue(slave)(nastiP) })
memPorts.zip(mem_xbar.io.masters).foreach({ case (mem_model, master) => master <> mem_model })
// Sort the list of DMA ports by address region size, largest to smallest
val dmaInfoSorted = dmaInfoBuffer.sortBy(_.size).reverse.toSeq
// Build up the address map using the sorted list,
// auto-assigning base addresses as we go.
val dmaAddrMap = dmaInfoSorted.foldLeft((BigInt(0), List.empty[AddrMapEntry])) {
case ((startAddr, addrMap), DmaInfo(widgetName, _, reqSize)) =>
// Round up the size to the nearest power of 2
val regionSize = 1 << log2Ceil(reqSize)
val region = MemRange(startAddr, regionSize, MemAttr(AddrMapProt.RW))
(startAddr + regionSize, AddrMapEntry(widgetName, region) :: addrMap)
}._2.reverse
val dmaPorts = dmaInfoSorted.map(_.port)
if (dmaPorts.isEmpty) {
val dmaParams = p.alterPartial({ case NastiKey => p(DMANastiKey) })
val error = Module(new NastiErrorSlave()(dmaParams))
error.io <> io.dma
} else if (dmaPorts.size == 1) {
dmaPorts(0) <> io.dma
} else {
val dmaParams = p.alterPartial({ case NastiKey => p(DMANastiKey) })
val router = Module(new NastiRecursiveInterconnect(
1, new AddrMap(dmaAddrMap))(dmaParams))
router.io.masters.head <> NastiQueue(io.dma)(dmaParams)
dmaPorts.zip(router.io.slaves).foreach { case (dma, slave) => dma <> NastiQueue(slave)(dmaParams) }
}
genCtrlIO(io.ctrl, p(FpgaMMIOSize))
val addrConsts = dmaAddrMap.map {
case AddrMapEntry(name, MemRange(addr, _, _)) =>
(s"${name.toUpperCase}_DMA_ADDR" -> addr.longValue)
}
val headerConsts = addrConsts ++ List[(String, Long)](
"CTRL_ID_BITS" -> io.ctrl.nastiXIdBits,
"CTRL_ADDR_BITS" -> io.ctrl.nastiXAddrBits,
"CTRL_DATA_BITS" -> io.ctrl.nastiXDataBits,
"CTRL_STRB_BITS" -> io.ctrl.nastiWStrobeBits,
// These specify channel widths; used mostly in the test harnesses
"MEM_ADDR_BITS" -> io.mem(0).nastiXAddrBits,
"MEM_DATA_BITS" -> io.mem(0).nastiXDataBits,
"MEM_ID_BITS" -> io.mem(0).nastiXIdBits,
// These are fixed by the AXI4 standard, only used in SW DRAM model
"MEM_SIZE_BITS" -> io.mem(0).nastiXSizeBits,
"MEM_LEN_BITS" -> io.mem(0).nastiXLenBits,
"MEM_RESP_BITS" -> io.mem(0).nastiXRespBits,
"MEM_STRB_BITS" -> io.mem(0).nastiWStrobeBits,
// Address width of the aggregated host-DRAM space
"DMA_ID_BITS" -> io.dma.nastiXIdBits,
"DMA_ADDR_BITS" -> io.dma.nastiXAddrBits,
"DMA_DATA_BITS" -> io.dma.nastiXDataBits,
"DMA_STRB_BITS" -> io.dma.nastiWStrobeBits,
"DMA_WIDTH" -> p(DMANastiKey).dataBits / 8,
"DMA_SIZE" -> log2Ceil(p(DMANastiKey).dataBits / 8)
)
}
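The base-address auto-assignment above can be mirrored in plain Scala; the sketch below (hypothetical names, no Nasti plumbing) reproduces the sort-by-size, round-up-to-power-of-two, pack-from-zero fold:

```scala
object DmaAddrMapSketch {
  final case class Region(name: String, base: BigInt, size: BigInt)

  /** Assign base addresses as the fold above does: largest regions first,
    * each size rounded up to a power of two, bases packed contiguously from 0. */
  def assign(requests: Seq[(String, BigInt)]): Seq[Region] = {
    def roundUpPow2(x: BigInt): BigInt = BigInt(1) << (x - 1).bitLength
    val sorted = requests.sortBy(_._2).reverse
    sorted.foldLeft((BigInt(0), List.empty[Region])) {
      case ((base, acc), (name, reqSize)) =>
        val size = roundUpPow2(reqSize)
        (base + size, Region(name, base, size) :: acc)
    }._2.reverse
  }
}
```

Sorting largest-first keeps every power-of-two-sized region naturally aligned to its own size without inserting padding.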


@ -0,0 +1,31 @@
// See LICENSE for license details.
package midas
package core
import chisel3._
// Adapted from DecoupledIO in Chisel3
class HostDecoupledIO[+T <: Data](gen: T) extends Bundle
{
val hReady = Input(Bool())
val hValid = Output(Bool())
val hBits = gen.cloneType
def fire(): Bool = hReady && hValid
override def cloneType: this.type =
new HostDecoupledIO(gen).asInstanceOf[this.type]
}
/** Adds a ready-valid handshaking protocol to any interface.
* The standard used is that the consumer uses the flipped interface.
*/
object HostDecoupled {
def apply[T <: Data](gen: T): HostDecoupledIO[T] = new HostDecoupledIO(gen)
}
class HostReadyValid extends Bundle {
val hReady = Input(Bool())
val hValid = Output(Bool())
def fire(): Bool = hReady && hValid
}

View File

@ -0,0 +1,115 @@
// See LICENSE for license details.
package midas.core
import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's
import chisel3._
import chisel3.util._
import chisel3.experimental.{chiselName}
// Describes an input channel / input port pair for an LI-BDN unittest
// name: a descriptive channel name
// reference: a hardware handle to the input on the reference RTL
// modelChannel: a hardware handle to the input channel on the model
// tokenGenFunc: an option carrying a function that, when executed,
// generates hardware to produce a new input value each cycle
case class IChannelDesc(
name: String,
reference: Data,
modelChannel: DecoupledIO[Data],
tokenGenFunc: Option[() => Data] = None) {
private def tokenSequenceGenerator(typ: Data): Data =
Cat(Seq.fill((typ.getWidth + 63)/64)(LFSR64()))(typ.getWidth - 1, 0).asTypeOf(typ)
// Generate the testing hardware for a single input channel of a model
@chiselName
def genEnvironment(testLength: Int): Unit = {
val inputGen = tokenGenFunc.getOrElse(() => tokenSequenceGenerator(reference.cloneType))()
// Drive a new input to the reference on every cycle
reference := inputGen
// Drive tokenzied inputs to the model
val inputTokenQueue = Module(new Queue(reference.cloneType, testLength, flow = true))
inputTokenQueue.io.enq.bits := reference
inputTokenQueue.io.enq.valid := true.B
// This provides an irrevocable input token stream
val stickyTokenValid = Reg(Bool())
modelChannel <> inputTokenQueue.io.deq
modelChannel.valid := stickyTokenValid && inputTokenQueue.io.deq.valid
inputTokenQueue.io.deq.ready := stickyTokenValid && modelChannel.ready
when (modelChannel.fire || ~stickyTokenValid) {
stickyTokenValid := LFSR64()(1)
}
}
}
// Describes an output channel / output port pair for an LI-BDN unittest
// name: a descriptive channel name
// reference: a hardware handle to the output on the reference RTL
// modelChannel: a hardware handle to the output channel on the model
// comparisonFunc: a function that elaborates hardware to compare
// an output token Decoupled[Data] to the correct reference output [Data]
case class OChannelDesc(
name: String,
reference: Data,
modelChannel: DecoupledIO[Data],
comparisonFunc: (Data, DecoupledIO[Data]) => Bool = (a, b) => !b.fire || a.asUInt === b.bits.asUInt) {
// Generate the testing hardware for a single output channel of a model
@chiselName
def genEnvironment(testLength: Int): Bool = {
val refOutputs = Module(new Queue(reference.cloneType, testLength, flow = true))
val refIdx = RegInit(0.U(log2Ceil(testLength + 1).W))
val modelIdx = RegInit(0.U(log2Ceil(testLength + 1).W))
val hValidPrev = RegNext(modelChannel.valid, false.B)
val hReadyPrev = RegNext(modelChannel.ready)
val hFirePrev = hValidPrev && hReadyPrev
// Collect outputs from the reference RTL
refOutputs.io.enq.valid := true.B
refOutputs.io.enq.bits := reference
assert(comparisonFunc(refOutputs.io.deq.bits, modelChannel),
s"${name} Channel: Output token traces did not match")
assert(!hValidPrev || hFirePrev || modelChannel.valid,
s"${name} Channel: hValid de-asserted without handshake, violating output token irrevocability")
val modelChannelDone = modelIdx === testLength.U
when (modelChannel.fire) { modelIdx := modelIdx + 1.U }
refOutputs.io.deq.ready := modelChannel.fire
// Fuzz backpressure on the token channel
modelChannel.ready := LFSR64()(1) & !modelChannelDone
// Return the done signal
modelChannelDone
}
}
object TokenComparisonFunctions{
// Ignores the first N output tokens when verifying a token output trace
def ignoreNTokens(numTokens: Int)(ref: Data, ch: DecoupledIO[Data]): Bool = {
val count = RegInit(0.U(log2Ceil(numTokens + 1).W))
val ignoreToken = count < numTokens.U
when (ch.fire && ignoreToken) { count := count + 1.U }
!ch.fire || ignoreToken || ref.asUInt === ch.bits.asUInt
}
}
object DirectedLIBDNTestHelper{
@chiselName
def apply(
inputChannelMapping: Seq[IChannelDesc],
outputChannelMapping: Seq[OChannelDesc],
testLength: Int = 4096): Bool = {
inputChannelMapping.foreach(_.genEnvironment(testLength))
val finished = outputChannelMapping.map(_.genEnvironment(testLength)).foldLeft(true.B)(_ && _)
finished
}
}
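As a software analogue of `ignoreNTokens`, the sketch below (hypothetical names) compares two token traces while exempting a reset prefix, returning the indices of any disagreements:

```scala
object TokenTraceSketch {
  /** Compare model vs. reference token traces, exempting the first
    * `ignore` tokens (e.g. while the DUT drains reset tokens).
    * Returns the indices at which the traces disagree. */
  def mismatches(ref: Seq[BigInt], model: Seq[BigInt], ignore: Int): Seq[Int] =
    ref.zip(model).zipWithIndex.collect {
      case ((r, m), i) if i >= ignore && r != m => i
    }
}
```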


@ -0,0 +1,83 @@
// See LICENSE for license details.
package midas.core
import chisel3._
import chisel3.util._
import chisel3.experimental.{Direction}
import chisel3.experimental.DataMirror.directionOf
import scala.collection.mutable.{ArrayBuffer}
// A collection of useful types and methods for moving between target and host-land interfaces
object SimUtils {
type ChLeafType = Bits
type ChTuple = Tuple2[ChLeafType, String]
type RVChTuple = Tuple2[ReadyValidIO[Data], String]
type ParsePortsTuple = (List[ChTuple], List[ChTuple], List[RVChTuple], List[RVChTuple])
// (Some, None) -> Source channel
// (None, Some) -> Sink channel
// (Some, Some) -> Loop back channel -> two interconnected models
trait PortTuple[T <: Any] {
def source: Option[T]
def sink: Option[T]
def isOutput(): Boolean = sink == None
def isInput(): Boolean = source == None
def isLoopback(): Boolean = source != None && sink != None
}
case class WirePortTuple(source: Option[ReadyValidIO[Data]], sink: Option[ReadyValidIO[Data]])
extends PortTuple[ReadyValidIO[Data]]{
require(source != None || sink != None)
}
// Tuple of forward port and reverse (backpressure) port
type TargetRVPortType = (ReadyValidIO[ValidIO[Data]], ReadyValidIO[Bool])
// A tuple of Options of the above type. _1 => source port _2 => sink port
// Same principle as the wire channel, now with a more complex port type
case class TargetRVPortTuple(source: Option[TargetRVPortType], sink: Option[TargetRVPortType])
extends PortTuple[TargetRVPortType]{
require(source != None || sink != None)
}
def rvChannelNamePair(chName: String): (String, String) = (chName + "_fwd", chName + "_rev")
def rvChannelNamePair(tuple: RVChTuple): (String, String) = rvChannelNamePair(tuple._2)
def prefixWith(prefix: String, base: Any): String =
if (prefix != "") s"${prefix}_${base}" else base.toString
// Returns a list of input and output elements, with their flattened names
def parsePorts(io: Seq[(String, Data)], alsoFlattenRVPorts: Boolean): ParsePortsTuple = {
val inputs = ArrayBuffer[ChTuple]()
val outputs = ArrayBuffer[ChTuple]()
val rvInputs = ArrayBuffer[RVChTuple]()
val rvOutputs = ArrayBuffer[RVChTuple]()
def loop(name: String, data: Data): Unit = data match {
case c: Clock => // skip
case rv: ReadyValidIO[_] => (directionOf(rv.valid): @unchecked) match {
case Direction.Input => rvInputs += (rv -> name)
case Direction.Output => rvOutputs += (rv -> name)
}
if (alsoFlattenRVPorts) rv.elements foreach {case (n, e) => loop(prefixWith(name, n), e)}
case b: Record =>
b.elements foreach {case (n, e) => loop(prefixWith(name, n), e)}
case v: Vec[_] =>
v.zipWithIndex foreach {case (e, i) => loop(prefixWith(name, i), e)}
case b: ChLeafType => (directionOf(b): @unchecked) match {
case Direction.Input => inputs += (b -> name)
case Direction.Output => outputs += (b -> name)
}
}
io.foreach({ case (name, port) => loop(name, port)})
(inputs.toList, outputs.toList, rvInputs.toList, rvOutputs.toList)
}
def parsePorts(io: Data, prefix: String = "", alsoFlattenRVPorts: Boolean = true): ParsePortsTuple =
parsePorts(Seq(prefix -> io), alsoFlattenRVPorts)
def parsePortsSeq(io: Seq[(String, Data)], alsoFlattenRVPorts: Boolean = true): ParsePortsTuple =
parsePorts(io, alsoFlattenRVPorts)
}
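The naming helpers above are pure string functions and can be exercised outside of Chisel; this sketch reproduces their logic verbatim:

```scala
// prefixWith joins a prefix and a leaf name with '_'; rvChannelNamePair
// derives the forward/reverse host-channel names for a ready-valid channel.
def prefixWith(prefix: String, base: Any): String =
  if (prefix != "") s"${prefix}_${base}" else base.toString

def rvChannelNamePair(chName: String): (String, String) =
  (chName + "_fwd", chName + "_rev")
```

So a ready-valid channel named `mem` is carried by the two host channels `mem_fwd` and `mem_rev`, and a leaf `bits` under prefix `io` flattens to `io_bits`.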


@ -0,0 +1,355 @@
// See LICENSE for license details.
package midas
package core
import midas.widgets.BridgeIOAnnotation
import midas.passes.fame
import midas.passes.fame.{FAMEChannelConnectionAnnotation, DecoupledForwardChannel}
import midas.core.SimUtils._
// from rocketchip
import freechips.rocketchip.config.{Parameters, Field}
import chisel3._
import chisel3.util._
import chisel3.experimental.{MultiIOModule, Direction}
import chisel3.experimental.DataMirror.directionOf
import firrtl.annotations.{ReferenceTarget}
import scala.collection.immutable.ListMap
import scala.collection.mutable.{ArrayBuffer}
case object ChannelLen extends Field[Int]
case object ChannelWidth extends Field[Int]
trait HasSimWrapperParams {
implicit val p: Parameters
implicit val channelWidth = p(ChannelWidth)
val traceMaxLen = p(strober.core.TraceMaxLen)
val daisyWidth = p(strober.core.DaisyWidth)
val sramChainNum = p(strober.core.SRAMChainNum)
}
class SimReadyValidRecord(es: Seq[(String, ReadyValidIO[Data])]) extends Record {
val elements = ListMap() ++ (es map { case (name, rv) =>
(directionOf(rv.valid): @unchecked) match {
case Direction.Input => name -> Flipped(SimReadyValid(rv.bits.cloneType))
case Direction.Output => name -> SimReadyValid(rv.bits.cloneType)
}
})
def cloneType = new SimReadyValidRecord(es).asInstanceOf[this.type]
}
class ReadyValidTraceRecord(es: Seq[(String, ReadyValidIO[Data])]) extends Record {
val elements = ListMap() ++ (es map {
case (name, rv) => name -> ReadyValidTrace(rv.bits.cloneType)
})
def cloneType = new ReadyValidTraceRecord(es).asInstanceOf[this.type]
}
// Regenerates the "bits" field of a target ready-valid interface from a list of flattened
// elements that include the "bits_" prefix. This is stripped off.
class PayloadRecord(elms: Seq[(String, Data)]) extends Record {
override val elements = ListMap((elms map { case (name, data) => name.stripPrefix("bits_") -> data.cloneType }):_*)
override def cloneType: this.type = new PayloadRecord(elms).asInstanceOf[this.type]
}
abstract class ChannelizedWrapperIO(chAnnos: Seq[FAMEChannelConnectionAnnotation],
leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port]) extends Record {
def regenTypesFromField(name: String, tpe: firrtl.ir.Type): Seq[(String, ChLeafType)] = tpe match {
case firrtl.ir.BundleType(fields) => fields.flatMap(f => regenTypesFromField(prefixWith(name, f.name), f.tpe))
case firrtl.ir.UIntType(width: firrtl.ir.IntWidth) => Seq(name -> UInt(width.width.toInt.W))
case firrtl.ir.SIntType(width: firrtl.ir.IntWidth) => Seq(name -> SInt(width.width.toInt.W))
case _ => throw new RuntimeException(s"Unexpected type in token payload: ${tpe}.")
}
def regenTypes(refTargets: Seq[ReferenceTarget]): Seq[(String, ChLeafType)] = {
val port = leafTypeMap(refTargets.head.copy(component = Seq()))
val fieldName = refTargets.head.component match {
case firrtl.annotations.TargetToken.Field(fName) :: Nil => fName
case firrtl.annotations.TargetToken.Field(fName) :: fields => fName
case _ => throw new RuntimeException("Expected only a bits field in ReferenceTarget's component.")
}
val bitsField = port.tpe match {
case a: firrtl.ir.BundleType => a.fields.filter(_.name == fieldName).head
case _ => throw new RuntimeException("ReferenceTargets should point at the channel's bundle.")
}
regenTypesFromField("", bitsField.tpe)
}
def regenPayloadType(refTargets: Seq[ReferenceTarget]): Data = {
require(!refTargets.isEmpty)
// Reject all (String -> Data) pairs not included in the refTargets
// Use this to remove target valid
val targetLeafNames = refTargets.map(_.component.reverse.head.value).toSet
val elements = regenTypes(refTargets).filter({ case (name, f) => targetLeafNames(name) })
elements match {
case (name, field) :: Nil => field // If there's only a single field, just pass out the type
case elms => new PayloadRecord(elms)
}
}
def regenWireType(refTargets: Seq[ReferenceTarget]): ChLeafType = {
require(refTargets.size == 1, "FIXME: Handle aggregated wires")
regenTypes(refTargets).head._2
}
val payloadTypeMap: Map[FAMEChannelConnectionAnnotation, Data] = chAnnos.collect({
// Target Decoupled Channels need to have their target-valid ReferenceTarget removed
case ch @ FAMEChannelConnectionAnnotation(_,DecoupledForwardChannel(_,Some(vsrc),_,_),Some(srcs),_) =>
ch -> regenPayloadType(srcs.filterNot(_ == vsrc))
case ch @ FAMEChannelConnectionAnnotation(_,DecoupledForwardChannel(_,_,_,Some(vsink)),_,Some(sinks)) =>
ch -> regenPayloadType(sinks.filterNot(_ == vsink))
}).toMap
val wireTypeMap: Map[FAMEChannelConnectionAnnotation, ChLeafType] = chAnnos.collect({
case ch @ FAMEChannelConnectionAnnotation(_,fame.PipeChannel(_),Some(srcs),_) => ch -> regenWireType(srcs)
case ch @ FAMEChannelConnectionAnnotation(_,fame.PipeChannel(_),_,Some(sinks)) => ch -> regenWireType(sinks)
}).toMap
val wireElements = ArrayBuffer[(String, ReadyValidIO[Data])]()
val wirePortMap: Map[String, WirePortTuple] = chAnnos.collect({
case ch @ FAMEChannelConnectionAnnotation(globalName, fame.PipeChannel(_),sources,sinks) => {
val sinkP = sinks.map({ tRefs =>
val name = tRefs.head.ref.stripSuffix("_bits")
val port = Flipped(Decoupled(wireTypeMap(ch)))
wireElements += name -> port
port
})
val sourceP = sources.map({ tRefs =>
val name = tRefs.head.ref.stripSuffix("_bits")
val port = Decoupled(wireTypeMap(ch))
wireElements += name -> port
port
})
(globalName -> WirePortTuple(sourceP, sinkP))
}
}).toMap
// Looks up a channel based on a channel name
val wireOutputPortMap = wirePortMap.collect({
case (name, portTuple) if portTuple.isOutput => name -> portTuple.source.get
})
val wireInputPortMap = wirePortMap.collect({
case (name, portTuple) if portTuple.isInput => name -> portTuple.sink.get
})
val rvElements = ArrayBuffer[(String, ReadyValidIO[Data])]()
// Using a channel's globalName, look up its associated port tuple
val rvPortMap: Map[String, TargetRVPortTuple] = chAnnos.collect({
case ch @ FAMEChannelConnectionAnnotation(globalName, info@DecoupledForwardChannel(_,_,_,_), leafSources, leafSinks) =>
val sourcePortPair = leafSources.map({ tRefs =>
require(!tRefs.isEmpty, "FIXME: Are empty decoupleds OK?")
val validTRef: ReferenceTarget = info.validSource.getOrElse(throw new RuntimeException(
"Target RV port has leaves but no TRef to a validSource"))
val readyTRef: ReferenceTarget = info.readySink.getOrElse(throw new RuntimeException(
"Target RV port has leaves but no TRef to a readySink"))
val fwdName = validTRef.ref
val fwdPort = Decoupled(Valid(payloadTypeMap(ch)))
val revName = readyTRef.ref
val revPort = Flipped(Decoupled(Bool()))
rvElements ++= Seq((fwdName -> fwdPort), (revName -> revPort))
(fwdPort, revPort)
})
val sinkPortPair = leafSinks.map({ tRefs =>
require(!tRefs.isEmpty, "FIXME: Are empty decoupleds OK?")
val validTRef: ReferenceTarget = info.validSink.getOrElse(throw new RuntimeException(
"Target RV port has payload sinks but no TRef to a validSink"))
val readyTRef: ReferenceTarget = info.readySource.getOrElse(throw new RuntimeException(
"Target RV port has payload sinks but no TRef to a readySource"))
val fwdName = validTRef.ref
val fwdPort = Flipped(Decoupled(Valid(payloadTypeMap(ch))))
val revName = readyTRef.ref
val revPort = Decoupled(Bool())
rvElements ++= Seq((fwdName -> fwdPort), (revName -> revPort))
(fwdPort, revPort)
})
globalName -> TargetRVPortTuple(sourcePortPair, sinkPortPair)
}).toMap
// Looks up a channel based on a channel name
val rvOutputPortMap = rvPortMap.collect({
case (name, portTuple) if portTuple.isOutput => name -> portTuple.source.get
})
val rvInputPortMap = rvPortMap.collect({
case (name, portTuple) if portTuple.isInput => name -> portTuple.sink.get
})
// Looks up a FCCA based on a global channel name
val chNameToAnnoMap = chAnnos.map(anno => anno.globalName -> anno)
}
class TargetBoxIO(val chAnnos: Seq[FAMEChannelConnectionAnnotation],
leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
extends ChannelizedWrapperIO(chAnnos, leafTypeMap) {
val clock = Input(Clock())
val hostReset = Input(Bool())
override val elements = ListMap((wireElements ++ rvElements):_*) ++
// Untokenized ports
ListMap("clock" -> clock, "hostReset" -> hostReset)
override def cloneType: this.type = new TargetBoxIO(chAnnos, leafTypeMap).asInstanceOf[this.type]
}
class TargetBox(chAnnos: Seq[FAMEChannelConnectionAnnotation],
leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port]) extends BlackBox {
val io = IO(new TargetBoxIO(chAnnos, leafTypeMap))
}
class SimWrapperChannels(val chAnnos: Seq[FAMEChannelConnectionAnnotation],
val bridgeAnnos: Seq[BridgeIOAnnotation],
leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
extends ChannelizedWrapperIO(chAnnos, leafTypeMap) {
override val elements = ListMap((wireElements ++ rvElements):_*)
override def cloneType: this.type = new SimWrapperChannels(chAnnos, bridgeAnnos, leafTypeMap).asInstanceOf[this.type]
}
class SimBox(simChannels: SimWrapperChannels) extends BlackBox {
val io = IO(new Bundle {
val clock = Input(Clock())
val reset = Input(Bool())
val hostReset = Input(Bool())
val channelPorts = simChannels.cloneType
})
}
class SimWrapper(chAnnos: Seq[FAMEChannelConnectionAnnotation],
bridgeAnnos: Seq[BridgeIOAnnotation],
leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
(implicit val p: Parameters) extends MultiIOModule with HasSimWrapperParams {
// Remove all FCAs that are loopback channels. All non-loopback FCAs connect
// to bridges and will be presented in the SimWrapper's IO
val bridgeChAnnos = chAnnos.collect({
case fca @ FAMEChannelConnectionAnnotation(_,_,_,None) => fca
case fca @ FAMEChannelConnectionAnnotation(_,_,None,_) => fca
})
val channelPorts = IO(new SimWrapperChannels(bridgeChAnnos, bridgeAnnos, leafTypeMap))
val hostReset = IO(Input(Bool()))
val target = Module(new TargetBox(chAnnos, leafTypeMap))
target.io.hostReset := reset.toBool && hostReset
target.io.clock := clock
import chisel3.core.ExplicitCompileOptions.NotStrict // FIXME
def getPipeChannelType(chAnno: FAMEChannelConnectionAnnotation): ChLeafType = {
target.io.wireTypeMap(chAnno)
}
def genPipeChannel(chAnno: FAMEChannelConnectionAnnotation, latency: Int = 1): PipeChannel[ChLeafType] = {
require(chAnno.sources == None || chAnno.sources.get.size == 1, "Can't aggregate wire-type channels yet")
require(chAnno.sinks == None || chAnno.sinks .get.size == 1, "Can't aggregate wire-type channels yet")
val channel = Module(new PipeChannel(getPipeChannelType(chAnno), latency))
channel suggestName s"PipeChannel_${chAnno.globalName}"
val portTuple = target.io.wirePortMap(chAnno.globalName)
portTuple.source match {
case Some(srcP) => channel.io.in <> srcP
case None => channel.io.in <> channelPorts.elements(s"${chAnno.globalName}_sink")
}
portTuple.sink match {
case Some(sinkP) => sinkP <> channel.io.out
case None => channelPorts.elements(s"${chAnno.globalName}_source") <> channel.io.out
}
channel.io.trace.ready := DontCare
channel.io.traceLen := DontCare
channel
}
// Helper functions to attach legacy SimReadyValidIO to true, dual-channel implementations of target ready-valid
def bindRVChannelEnq[T <: Data](enq: SimReadyValidIO[T], port: TargetRVPortType): Unit = {
val (fwdPort, revPort) = port
enq.fwd.hValid := fwdPort.valid
enq.target.valid := fwdPort.bits.valid
enq.target.bits := fwdPort.bits.bits // target payload is nested inside the host token
fwdPort.ready := enq.fwd.hReady
// Connect up the target-ready token channel
revPort.valid := enq.rev.hValid
revPort.bits := enq.target.ready
enq.rev.hReady := revPort.ready
}
def bindRVChannelDeq[T <: Data](deq: SimReadyValidIO[T], port: TargetRVPortType): Unit = {
val (fwdPort, revPort) = port
deq.fwd.hReady := fwdPort.ready
fwdPort.valid := deq.fwd.hValid
fwdPort.bits.valid := deq.target.valid
fwdPort.bits.bits := deq.target.bits
// Connect up the target-ready token channel
deq.rev.hValid := revPort.valid
deq.target.ready := revPort.bits
revPort.ready := deq.rev.hReady
}
def getReadyValidChannelType(chAnno: FAMEChannelConnectionAnnotation): Data = {
target.io.payloadTypeMap(chAnno)
}
def genReadyValidChannel(chAnno: FAMEChannelConnectionAnnotation): ReadyValidChannel[Data] = {
val chName = chAnno.globalName
val strippedName = chName.stripSuffix("_fwd")
// Determine which bridge this channel belongs to by looking it up with the valid
//val bridgeClockRatio = io.bridges.find(_(rvInterface.valid)) match {
// case Some(bridge) => bridge.clockRatio
// case None => UnityClockRatio
//}
val bridgeClockRatio = UnityClockRatio // TODO: FIXME
// A channel is considered "flipped" if it is sunk by the transformed RTL (sourced by a bridge)
val channel = Module(new ReadyValidChannel(getReadyValidChannelType(chAnno).cloneType))
channel.suggestName(s"ReadyValidChannel_$strippedName")
val enqPortPair = (chAnno.sources match {
case Some(_) => target.io.rvOutputPortMap(chName)
case None => channelPorts.rvInputPortMap(chName)
})
bindRVChannelEnq(channel.io.enq, enqPortPair)
val deqPortPair = (chAnno.sinks match {
case Some(_) => target.io.rvInputPortMap(chName)
case None => channelPorts.rvOutputPortMap(chName)
})
bindRVChannelDeq(channel.io.deq, deqPortPair)
channel.io.trace := DontCare
channel.io.traceLen := DontCare
channel.io.targetReset.bits := false.B
channel.io.targetReset.valid := true.B
channel
}
// Generate all ready-valid channels
val rvChannels = chAnnos.collect({
case ch @ FAMEChannelConnectionAnnotation(_,fame.DecoupledForwardChannel(_,_,_,_),_,_) => genReadyValidChannel(ch)
})
// Generate all wire channels, excluding reset
chAnnos.collect({
case ch @ FAMEChannelConnectionAnnotation(name, fame.PipeChannel(latency),_,_) => genPipeChannel(ch, latency)
})
}
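The loopback-vs-bridge classification at the top of `SimWrapper` can be sketched in plain Scala. Here `ChAnno` is a deliberately simplified stand-in for `FAMEChannelConnectionAnnotation` (illustrative only): a channel with both a source and a sink is a loopback between two models, while a channel missing one side terminates at a bridge and surfaces in the wrapper's IO.

```scala
// Simplified stand-in for FAMEChannelConnectionAnnotation: just a name
// plus optional source/sink leaf lists.
case class ChAnno(name: String, sources: Option[Seq[String]], sinks: Option[Seq[String]])

// Keep only channels that terminate at a bridge (one side absent),
// mirroring the collect over bridgeChAnnos above.
def bridgeChannels(annos: Seq[ChAnno]): Seq[ChAnno] =
  annos.filter(a => a.sources.isEmpty || a.sinks.isEmpty)
```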


@ -0,0 +1,150 @@
package midas
package models
import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle
import junctions._
import midas.widgets._
import Console.{UNDERLINED, RESET}
case class BankConflictConfig(
maxBanks: Int,
maxLatencyBits: Int = 12, // 4K cycles
params: BaseParams) extends BaseConfig {
def elaborate()(implicit p: Parameters): BankConflictModel = Module(new BankConflictModel(this))
}
class BankConflictMMRegIO(cfg: BankConflictConfig)(implicit p: Parameters)
extends SplitTransactionMMRegIO(cfg){
val latency = Input(UInt(cfg.maxLatencyBits.W))
val conflictPenalty = Input(UInt(32.W))
// The mask bits setting determines how many banks are used
val bankAddr = Input(new ProgrammableSubAddr(
maskBits = log2Ceil(cfg.maxBanks),
longName = "Bank Address",
defaultOffset = 13,
defaultMask = (1 << cfg.maxBanks) - 1
))
val bankConflicts = Output(Vec(cfg.maxBanks, UInt(32.W)))
val registers = maxReqRegisters ++ Seq(
(latency -> RuntimeSetting(30,
"Latency",
min = 1,
max = Some((1 << (cfg.maxLatencyBits-1)) - 1))),
(conflictPenalty -> RuntimeSetting(30,
"Bank-Conflict Penalty",
max = Some((1 << (cfg.maxLatencyBits-1)) - 1)))
)
def requestSettings() {
Console.println(s"${UNDERLINED}Generating runtime configuration for Bank-Conflict Model${RESET}")
}
}
class BankConflictIO(cfg: BankConflictConfig)(implicit p: Parameters)
extends SplitTransactionModelIO()(p) {
val mmReg = new BankConflictMMRegIO(cfg)
}
class BankQueueEntry(cfg: BankConflictConfig)(implicit p: Parameters) extends Bundle {
val xaction = new TransactionMetaData
val bankAddr = UInt(log2Ceil(cfg.maxBanks).W)
override def cloneType = new BankQueueEntry(cfg)(p).asInstanceOf[this.type]
}
// Appends a target cycle at which this reference should be complete
class BankConflictReference(cfg: BankConflictConfig)(implicit p: Parameters) extends Bundle {
val reference = new BankQueueEntry(cfg)
val cycle = UInt(cfg.maxLatencyBits.W) // Indicates latency until doneness
val done = Bool() // Set high when the cycle count expires
override def cloneType = new BankConflictReference(cfg)(p).asInstanceOf[this.type]
}
object BankConflictConstants {
val nBankStates = 3
val bankIdle :: bankBusy :: bankPrecharge :: Nil = Enum(nBankStates)
}
import BankConflictConstants._
class BankConflictModel(cfg: BankConflictConfig)(implicit p: Parameters) extends SplitTransactionModel(cfg)(p) {
val longName = "Bank Conflict"
def printTimingModelGenerationConfig {}
/**************************** CHISEL BEGINS *********************************/
// This is the absolute number of banks the model can account for
lazy val io = IO(new BankConflictIO(cfg))
val latency = io.mmReg.latency
val conflictPenalty = io.mmReg.conflictPenalty
val transactionQueue = Module(new DualQueue(
gen = new BankQueueEntry(cfg),
entries = cfg.maxWrites + cfg.maxReads))
transactionQueue.io.enqA.valid := newWReq
transactionQueue.io.enqA.bits.xaction := TransactionMetaData(awQueue.io.deq.bits)
transactionQueue.io.enqA.bits.bankAddr := io.mmReg.bankAddr.getSubAddr(awQueue.io.deq.bits.addr)
transactionQueue.io.enqB.valid := tNasti.ar.fire
transactionQueue.io.enqB.bits.xaction := TransactionMetaData(tNasti.ar.bits)
transactionQueue.io.enqB.bits.bankAddr := io.mmReg.bankAddr.getSubAddr(tNasti.ar.bits.addr)
val bankBusyCycles = Seq.fill(cfg.maxBanks)(RegInit(0.U(cfg.maxLatencyBits.W)))
val bankConflictCounts = RegInit(VecInit(Seq.fill(cfg.maxBanks)(0.U(32.W))))
val newReference = Wire(Decoupled(new BankConflictReference(cfg)))
newReference.valid := transactionQueue.io.deq.valid
newReference.bits.reference := transactionQueue.io.deq.bits
val marginalCycles = latency + VecInit(bankBusyCycles)(transactionQueue.io.deq.bits.bankAddr)
newReference.bits.cycle := tCycle(cfg.maxLatencyBits-1, 0) + marginalCycles
newReference.bits.done := marginalCycles === 0.U
transactionQueue.io.deq.ready := newReference.ready
val refBuffer = CollapsingBuffer(newReference, cfg.maxReads + cfg.maxWrites)
val refList = refBuffer.io.entries
val refUpdates = refBuffer.io.updates
bankBusyCycles.zip(bankConflictCounts).zipWithIndex.foreach({ case ((busyCycles, conflictCount), idx) =>
when(busyCycles > 0.U){
busyCycles := busyCycles - 1.U
}
when(newReference.fire() && newReference.bits.reference.bankAddr === idx.U){
busyCycles := marginalCycles + conflictPenalty
conflictCount := Mux(busyCycles > 0.U, conflictCount + 1.U, conflictCount)
}
})
// Mark the reference as complete
refList.zip(refUpdates).foreach({ case (ref, update) =>
when(tCycle(cfg.maxLatencyBits-1, 0) === ref.bits.cycle) { update.bits.done := true.B }
})
val selector = Module(new Arbiter(refList.head.bits.cloneType, refList.size))
selector.io.in <> refList.map({ entry =>
val candidate = V2D(entry)
candidate.valid := entry.valid && entry.bits.done
candidate
})
// Take the readies from the arbiter, and kill the selected entry
refUpdates.zip(selector.io.in).foreach({ case (ref, sel) =>
when(sel.fire()) { ref.valid := false.B } })
io.mmReg.bankConflicts := bankConflictCounts
val completedRef = selector.io.out.bits.reference
rResp.bits := ReadResponseMetaData(completedRef.xaction)
wResp.bits := WriteResponseMetaData(completedRef.xaction)
wResp.valid := selector.io.out.valid && completedRef.xaction.isWrite
rResp.valid := selector.io.out.valid && !completedRef.xaction.isWrite
selector.io.out.ready := Mux(completedRef.xaction.isWrite, wResp.ready, rResp.ready)
}
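The timing arithmetic in the model above reduces to two expressions: a reference's completion latency (the `marginalCycles` term) and the value the bank's busy counter reloads to. A plain-Scala sketch with illustrative names:

```scala
// Cycles until a reference to a bank completes: base latency plus the
// bank's remaining busy cycles (marginalCycles above).
def requestLatency(latency: Int, busyCycles: Int): Int = latency + busyCycles

// Busy-counter reload value on a new reference: the marginal latency
// plus the configured bank-conflict penalty.
def nextBusyCycles(latency: Int, busyCycles: Int, conflictPenalty: Int): Int =
  requestLatency(latency, busyCycles) + conflictPenalty
```

A reference that finds its bank idle (`busyCycles == 0`) pays only the base latency; a conflicting reference both waits out the busy counter and extends it by the penalty.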


@ -0,0 +1,726 @@
package midas
package models
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle
import chisel3._
import chisel3.util._
import org.json4s._
import org.json4s.native.JsonMethods._
import Console.{UNDERLINED, GREEN, RESET}
import scala.collection.mutable
import scala.io.Source
trait HasDRAMMASConstants {
val maxDRAMTimingBits = 7 // width of a DRAM timing
val tREFIWidth = 14 // Refresh interval. Suffices up to tCK = ~0.5ns (for 64ms, 8192 refresh commands)
val tREFIBits = 14 // Refresh interval. Suffices up to tCK = ~0.5ns (for 64ms, 8192 refresh commands)
val tRFCBits = 10
val numBankStates = 2
val numRankStates = 2
}
object DRAMMasEnums extends HasDRAMMASConstants {
val cmd_nop :: cmd_act :: cmd_pre :: cmd_casw :: cmd_casr :: cmd_ref :: Nil = Enum(6)
val bank_idle :: bank_active :: Nil = Enum(numBankStates)
val rank_active :: rank_refresh :: Nil = Enum(numRankStates)
}
case class JSONField(value: BigInt, units: String)
class DRAMProgrammableTimings extends Bundle with HasDRAMMASConstants with HasProgrammableRegisters
with HasConsoleUtils {
// The most vanilla of DRAM timings
val tAL = UInt(maxDRAMTimingBits.W)
val tCAS = UInt(maxDRAMTimingBits.W)
val tCMD = UInt(maxDRAMTimingBits.W)
val tCWD = UInt(maxDRAMTimingBits.W)
val tCCD = UInt(maxDRAMTimingBits.W)
val tFAW = UInt(maxDRAMTimingBits.W)
val tRAS = UInt(maxDRAMTimingBits.W)
val tREFI = UInt(tREFIBits.W)
val tRC = UInt(maxDRAMTimingBits.W)
val tRCD = UInt(maxDRAMTimingBits.W)
val tRFC = UInt(tRFCBits.W)
val tRRD = UInt(maxDRAMTimingBits.W)
val tRP = UInt(maxDRAMTimingBits.W)
val tRTP = UInt(maxDRAMTimingBits.W)
val tRTRS = UInt(maxDRAMTimingBits.W)
val tWR = UInt(maxDRAMTimingBits.W)
val tWTR = UInt(maxDRAMTimingBits.W)
def tCAS2tCWL(tCAS: BigInt) = {
require(tCAS > 4)
if (tCAS > 12 ) tCAS - 4
else if (tCAS > 9) tCAS - 3
else if (tCAS > 7) tCAS - 2
else if (tCAS > 5) tCAS - 1
else tCAS
}
// Defaults are set to sg093, x8, 2048Mb density (1GHz clock)
val registers = Seq(
tAL -> RuntimeSetting(0,"Additive Latency"),
tCAS -> JSONSetting(14, "CAS Latency", { _("CL_TIME") }),
tCMD -> JSONSetting(1, "Command Transport Time", { lut => 1 }),
tCWD -> JSONSetting(10, "Write CAS Latency", { lut => tCAS2tCWL(lut("CL_TIME")) }),
tCCD -> JSONSetting(4, "Column-to-Column Delay", { _("TCCD") }),
tFAW -> JSONSetting(25, "Four row-Activation Window", { _("TFAW") }),
tRAS -> JSONSetting(33, "Row Access Strobe Delay", { _("TRAS_MIN") }),
tREFI -> JSONSetting(7800,"REFresh Interval", { _("TRFC_MAX")/9 }),
tRC -> JSONSetting(47, "Row Cycle time", { _("TRC") }),
tRCD -> JSONSetting(14, "Row-to-Column Delay", { _("TRCD") }),
tRFC -> JSONSetting(160, "ReFresh Cycle time", { _("TRFC_MIN") }),
tRRD -> JSONSetting(8, "Row-to-Row Delay", { _("TRRD") }),
tRP -> JSONSetting(14, "Row-Precharge delay", { _("TRP") }),
tRTP -> JSONSetting(8, "Read-To-Precharge delay", { lut => lut("TRTP").max(lut("TRTP_TCK")) }),
tRTRS -> JSONSetting(2, "Rank-to-Rank Switching Time", { lut => 2 }), // FIXME
tWR -> JSONSetting(15, "Write-Recovery time", { _("TWR") }),
tWTR -> JSONSetting(8, "Write-To-Read Turnaround Time", { _("TWTR") })
)
def setDependentRegisters(lut: Map[String, JSONField], freqMHz: BigInt) {
val periodPs = 1000000.0/freqMHz.toFloat
// Generate a lookup table of timings in units of tCK (as all programmable
// timings in the model are in units of the controller clock frequency)
val lutTCK = lut.flatMap({
case (name , JSONField(value, "ps")) =>
Some(name -> BigInt(((value.toFloat + periodPs - 1)/periodPs).toInt))
case (name , JSONField(value, "tCK")) => Some(name -> value)
case _ => None
})
registers foreach {
case (elem, reg: JSONSetting) => reg.setWithLUT(lutTCK)
case _ => None
}
}
override def cloneType = new DRAMProgrammableTimings().asInstanceOf[this.type]
}
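The picosecond-to-clock-cycle conversion inside `setDependentRegisters` rounds each timing up to a whole number of controller clock cycles. This standalone sketch mirrors that rounding (same float arithmetic as the code above; the function name is illustrative):

```scala
// Convert a timing in picoseconds to cycles (tCK) at freqMHz, rounding
// up, exactly as setDependentRegisters does for "ps"-unit JSON fields.
def psToTCK(valuePs: BigInt, freqMHz: BigInt): BigInt = {
  val periodPs = 1000000.0 / freqMHz.toFloat
  BigInt(((valuePs.toFloat + periodPs - 1) / periodPs).toInt)
}
```

At 1 GHz (tCK = 1000 ps), a 160000 ps tRFC becomes 160 cycles, matching the default tRFC setting above.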
case class DRAMBackendKey(writeDepth: Int, readDepth: Int, latencyBits: Int)
abstract class DRAMBaseConfig extends BaseConfig with HasDRAMMASConstants {
def dramKey: DramOrganizationParams
def backendKey: DRAMBackendKey
}
abstract class BaseDRAMMMRegIO(cfg: DRAMBaseConfig) extends MMRegIO(cfg) with HasConsoleUtils {
// The default assignment corresponds to a standard open-page policy
// with 8K pages. All available ranks are enabled.
val bankAddr = Input(new ProgrammableSubAddr(
maskBits = cfg.dramKey.bankBits,
longName = "Bank Address",
defaultOffset = 13, // Assume 8KB page size
defaultMask = 7 // DDR3 Has 8 banks
))
val rankAddr = Input(new ProgrammableSubAddr(
maskBits = cfg.dramKey.rankBits,
longName = "Rank Address",
defaultOffset = bankAddr.defaultOffset + log2Ceil(bankAddr.defaultMask + 1),
defaultMask = (1 << cfg.dramKey.rankBits) - 1
))
val defaultRowOffset = rankAddr.defaultOffset + log2Ceil(rankAddr.defaultMask + 1)
val rowAddr = Input(new ProgrammableSubAddr(
maskBits = cfg.dramKey.rowBits,
longName = "Row Address",
defaultOffset = defaultRowOffset,
defaultMask = (cfg.dramKey.dramSize >> defaultRowOffset.toInt) - 1
))
// Page policy 1 = open, 0 = closed
val openPagePolicy = Input(Bool())
// Additional latency added to read data beats after they are received from the devices
val backendLatency = Input(UInt(cfg.backendKey.latencyBits.W))
// Counts the number of misses in the open row buffer
//val rowMisses = Output(UInt(32.W))
val dramTimings = Input(new DRAMProgrammableTimings())
val rankPower = Output(Vec(cfg.dramKey.maxRanks, new RankPowerIO))
// END CHISEL TYPES
val dramBaseRegisters = Seq(
(openPagePolicy -> RuntimeSetting(1, "Open-Page Policy")),
(backendLatency -> RuntimeSetting(2,
"Backend Latency",
min = 1,
max = Some(1 << (cfg.backendKey.latencyBits - 1))))
)
// A list of DDR3 speed grades provided by Micron.
// _1 is used as a key to look up a device, _2 = long name
val speedGrades = Seq(
("sg093" -> "DDR3-2133 (14-14-14) Minimum Clock Period: 938 ps"),
("sg107" -> "DDR3-1866 (13-13-13) Minimum Clock Period: 1071 ps"),
("sg125" -> "DDR3-1600 (11-11-11) Minimum Clock Period: 1250 ps"),
("sg15E" -> "DDR3-1333H (9-9-9) Minimum Clock Period: 1500 ps"),
("sg15" -> "DDR3-1333J (10-10-10) Minimum Clock Period: 1500 ps"),
("sg187U" -> "DDR3-1066F (7-7-7) Minimum Clock Period: 1875 ps"),
("sg187" -> "DDR3-1066G (8-8-8) Minimum Clock Period: 1875 ps"),
("sg25E" -> "DDR3-800E (5-5-5) Minimum Clock Period: 2500 ps"),
("sg25" -> "DDR3-800 (6-6-6) Minimum Clock Period: 2500 ps")
)
// Prompt the user for an address assignment scheme. TODO: Channel bits.
def getAddressScheme(
numRanks: BigInt,
numBanks: BigInt,
numRows: BigInt,
numBytesPerLine: BigInt,
pageSize: BigInt) {
case class SubAddr(
shortName: String,
longName: String,
field: Option[ProgrammableSubAddr],
count: BigInt) {
require(isPow2(count))
val bits = log2Ceil(count)
def set(offset: Int) { field.foreach( _.forceSettings(offset, count - 1) ) }
def legendEntry = s" ${shortName} -> ${longName}"
}
val ranks = SubAddr("L", "Rank Address Bits", Some(rankAddr), numRanks)
val banks = SubAddr("B", "Bank Address Bits", Some(bankAddr), numBanks)
val rows = SubAddr("R", "Row Address Bits", Some(rowAddr), numRows)
val linesPerRow = SubAddr("N", "log2(Lines Per Row)", None, pageSize/numBytesPerLine)
val bytesPerLine= SubAddr("Z", "log2(Bytes Per Line)", None, numBytesPerLine)
// Address schemes
// _1 = long name, _2 = A seq of subfields from address MSBs to LSBs
val addressSchemes = Seq(
"Baseline Open " -> Seq(rows, ranks, banks, linesPerRow, bytesPerLine),
"Baseline Closed " -> Seq(rows, linesPerRow, ranks, banks, bytesPerLine)
)
val legendHeader = s"${UNDERLINED}Legend${RESET}\n"
val legendBody = (addressSchemes.head._2 map {_.legendEntry}).mkString("\n")
val schemeStrings = addressSchemes map { case (name, addrOrder) =>
val shortNameOrder = (addrOrder map { _.shortName }).mkString(" | ")
s"${name} -> ( ${shortNameOrder} ) "
}
val scheme = addressSchemes(requestSeqSelection(
"Select an address assignment scheme:",
schemeStrings,
legendHeader + legendBody + "\nAddress scheme number"))._2
def setSubAddresses(ranges: Seq[SubAddr], offset: Int = 0): Unit = ranges match {
case current :: moreSigFields =>
current.set(offset)
setSubAddresses(moreSigFields, offset + current.bits)
case Nil => None
}
setSubAddresses(scheme.reverse)
}
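`setSubAddresses` walks the chosen scheme from LSB to MSB, placing each field at the running bit offset and consuming log2(count) bits. In plain Scala (power-of-two counts assumed, as the `require` above enforces; names are illustrative):

```scala
// log2 of a power-of-two count.
def log2(n: Int): Int = {
  require(n > 0 && (n & (n - 1)) == 0)
  Integer.numberOfTrailingZeros(n)
}

// Bit offset of each address field, given field counts from LSB to MSB.
def fieldOffsets(countsLsbFirst: Seq[Int]): Seq[Int] =
  countsLsbFirst.scanLeft(0)((off, count) => off + log2(count)).init
```

With a 64 B line and an 8 KiB page (128 lines per row), the bank field of the baseline-open scheme lands at bit offset 13, matching `bankAddr`'s default offset above.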
// Prompt the user for a speedgrade selection. TODO: illegalize SGs based on frequency
def getSpeedGrade(): String = {
speedGrades(requestSeqSelection("Select a speed grade:", speedGrades.unzip._2))._1
}
// Get the parameters (timings, bitwidths, etc.) for a particular device from JSONs in resources/
def lookupPart(density: BigInt, dqWidth: BigInt, speedGrade: String): Map[String, JSONField] = {
val dqKey = "x" + dqWidth.toString
val stream = getClass.getResourceAsStream(s"/midas/models/dram/${density}Mb_ddr3.json")
val lines = Source.fromInputStream(stream).getLines
implicit val formats = org.json4s.DefaultFormats
val json = parse(lines.mkString).extract[Map[String, Map[String, Map[String, JSONField]]]]
json(speedGrade)(dqKey)
}
def setBaseDRAMSettings(): Unit = {
// Prompt the user for overall memory organization of this channel
Console.println(s"${UNDERLINED}Memory system organization${RESET}")
val memorySize = requestInput("Memory system size in GiB", 2)
val numRanks = requestInput("Number of ranks", 1)
val busWidth = requestInput("DRAM data bus width in bits", 64)
val dqWidth = requestInput("Device DQ width", 8)
val devicesPerRank = busWidth / dqWidth
val deviceDensityMib = ((memorySize << 30) * 8 / numRanks / devicesPerRank) >> 20
Console.println(s"${GREEN}Selected Device density (Mib) -> ${deviceDensityMib}${RESET}")
// Select the appropriate device, and look up its parameters in resource JSONs
Console.println(s"\n${UNDERLINED}Device Selection${RESET}")
val freqMHz = requestInput("Clock Frequency in MHz", 1000)
val speedGradeKey = getSpeedGrade()
val lut = lookupPart(deviceDensityMib, dqWidth, speedGradeKey)
val dramTimingSettings = dramTimings.setDependentRegisters(lut, freqMHz)
// Determine the address assignment scheme
Console.println(s"\n${UNDERLINED}Address assignment${RESET}")
val lineSize = requestInput("Line size in Bytes", 64)
val numBanks = 8 // DDR3 Mandated
val pageSize = ((BigInt(1) << lut("COL_BITS").value.toInt) * devicesPerRank * dqWidth ) / 8
val numRows = BigInt(1) << lut("ROW_BITS").value.toInt
getAddressScheme(numRanks, numBanks, numRows, lineSize, pageSize)
}
}
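The device-density arithmetic in `setBaseDRAMSettings` divides total capacity in bits across the ranks and the devices within each rank. A plain-Scala sketch of that computation (illustrative function name):

```scala
// Per-device density in Mib: total bits, split across ranks and the
// busWidth/dqWidth devices composing each rank.
def deviceDensityMib(memGiB: BigInt, numRanks: BigInt, busWidth: BigInt, dqWidth: BigInt): BigInt = {
  val devicesPerRank = busWidth / dqWidth
  ((memGiB << 30) * 8 / numRanks / devicesPerRank) >> 20
}
```

The defaults prompted above (2 GiB, 1 rank, 64-bit bus, x8 devices) give eight devices per rank at 2048 Mib each, i.e. the 2048 Mb parts the timing defaults assume.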
case class DramOrganizationParams(maxBanks: Int, maxRanks: Int, dramSize: BigInt, lineBits: Int = 8) {
require(isPow2(maxBanks))
require(isPow2(maxRanks))
require(isPow2(dramSize))
require(isPow2(lineBits))
def bankBits = log2Up(maxBanks)
def rankBits = log2Up(maxRanks)
def rowBits = log2Ceil(dramSize) - lineBits
def maxRows = 1 << rowBits
}
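The widths `DramOrganizationParams` derives follow directly from log2 of the organization sizes. A plain-Python sketch with illustrative sizes; `log2_ceil` stands in for `chisel3.util.log2Ceil`/`log2Up`, which agree for the power-of-two values used here:

```python
# Sketch of the field widths DramOrganizationParams derives. The sizes below
# are illustrative; lineBits is subtracted directly from the address width.
def log2_ceil(x: int) -> int:
    return (x - 1).bit_length()

max_banks = 8        # DDR3-mandated bank count
max_ranks = 2
dram_size = 1 << 31  # 2 GiB
line_bits = 8

bank_bits = log2_ceil(max_banks)              # 3
rank_bits = log2_ceil(max_ranks)              # 1
row_bits  = log2_ceil(dram_size) - line_bits  # 31 - 8 = 23
max_rows  = 1 << row_bits
print(bank_bits, rank_bits, row_bits)  # 3 1 23
```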
trait CommandLegalBools {
val canCASW = Output(Bool())
val canCASR = Output(Bool())
val canPRE = Output(Bool())
val canACT = Output(Bool())
}
trait HasLegalityUpdateIO {
val key: DramOrganizationParams
import DRAMMasEnums._
val timings = Input(new DRAMProgrammableTimings)
val selectedCmd = Input(cmd_nop.cloneType)
val autoPRE = Input(Bool())
val cmdRow = Input(UInt(key.rowBits.W))
//val burstLength = Input(UInt(4.W)) // TODO: Fixme
}
// Add some scheduler specific metadata to a reference
// TODO factor out different MAS metadata into a mixin
class MASEntry(key: DRAMBaseConfig)(implicit p: Parameters) extends Bundle {
val xaction = new TransactionMetaData
val rowAddr = UInt(key.dramKey.rowBits.W)
val bankAddrOH = UInt(key.dramKey.maxBanks.W)
val bankAddr = UInt(key.dramKey.bankBits.W)
val rankAddrOH = UInt(key.dramKey.maxRanks.W)
val rankAddr = UInt(key.dramKey.rankBits.W)
def decode(from: XactionSchedulerEntry, mmReg: BaseDRAMMMRegIO) {
xaction := from.xaction
bankAddr := mmReg.bankAddr.getSubAddr(from.addr)
bankAddrOH := UIntToOH(bankAddr)
rowAddr := mmReg.rowAddr.getSubAddr(from.addr)
rankAddr := mmReg.rankAddr.getSubAddr(from.addr)
rankAddrOH := UIntToOH(rankAddr)
}
def addrMatch(rank: UInt, bank: UInt, row: Option[UInt] = None): Bool = {
val rowHit = row.foldLeft(true.B)({ case (p, addr) => p && addr === rowAddr })
rank === rankAddr && bank === bankAddr && rowHit
}
override def cloneType = new MASEntry(key)(p).asInstanceOf[this.type]
}
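The optional-row comparison in `MASEntry.addrMatch` folds over an `Option`, so the row check is vacuously true when no row is supplied. A Python sketch of the same predicate (names are illustrative):

```python
# Sketch of MASEntry.addrMatch: when row is None the row check passes
# unconditionally, so one predicate serves bank-level and row-level matches.
def addr_match(entry_rank, entry_bank, entry_row, rank, bank, row=None):
    row_hit = True if row is None else row == entry_row
    return rank == entry_rank and bank == entry_bank and row_hit

assert addr_match(1, 3, 42, rank=1, bank=3)             # row ignored
assert addr_match(1, 3, 42, rank=1, bank=3, row=42)
assert not addr_match(1, 3, 42, rank=1, bank=3, row=7)  # open-row mismatch
```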
class FirstReadyFCFSEntry(key: DRAMBaseConfig)(implicit p: Parameters) extends MASEntry(key)(p) {
val isReady = Bool() //Set when entry hits in open row buffer
  val mayPRE = Bool() // Set when no other entries hit open row buffer
// We only ask for a precharge, if we have permission (no other references hit)
// and the entry isn't personally ready
def wantPRE(): Bool = !isReady && mayPRE // Don't need the dummy args
def wantACT(): Bool = !isReady
override def cloneType = new FirstReadyFCFSEntry(key)(p).asInstanceOf[this.type]
}
// Tracks the state of a bank, including:
// - Whether it's active/idle
// - Open row address
// - Whether CAS, PRE, and ACT commands can be legally issued
//
// A MAS model uses these trackers to filter out illegal commands for this bank
//
// A necessary condition for the controller to issue a CMD that uses this bank
// is that the can{CMD} bit be high. The controller must, of course, ensure all
// extra-bank timing and resource constraints are met. The controller must also
// ensure CAS commands use the open ROW.
class BankStateTrackerO(key: DramOrganizationParams) extends GenericParameterizedBundle(key)
with CommandLegalBools {
import DRAMMasEnums._
val openRow = Output(UInt(key.rowBits.W))
val state = Output(Bool())
def isRowHit(ref: MASEntry): Bool = ref.rowAddr === openRow && state === bank_active
}
class BankStateTrackerIO(val key: DramOrganizationParams) extends GenericParameterizedBundle(key)
with HasLegalityUpdateIO {
val out = new BankStateTrackerO(key)
val cmdUsesThisBank = Input(Bool())
}
class BankStateTracker(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
import DRAMMasEnums._
val io = IO(new BankStateTrackerIO(key))
val state = RegInit(bank_idle)
val openRowAddr = Reg(UInt(key.rowBits.W))
val nextLegalPRE = Module(new DownCounter(maxDRAMTimingBits))
val nextLegalACT = Module(new DownCounter(maxDRAMTimingBits))
val nextLegalCAS = Module(new DownCounter(maxDRAMTimingBits))
Seq(nextLegalPRE, nextLegalCAS, nextLegalACT) foreach { mod =>
mod.io.decr := true.B
mod.io.set.valid := false.B
mod.io.set.bits := DontCare
}
when (io.cmdUsesThisBank) {
switch(io.selectedCmd) {
is(cmd_act) {
assert(io.out.canACT, "Bank Timing Violation: Controller issued activate command illegally")
state := bank_active
openRowAddr := io.cmdRow
nextLegalCAS.io.set.valid := true.B
nextLegalCAS.io.set.bits := io.timings.tRCD - io.timings.tAL - 1.U
nextLegalPRE.io.set.valid := true.B
nextLegalPRE.io.set.bits := io.timings.tRAS - 1.U
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tRC - 1.U
}
is(cmd_casr) {
assert(io.out.canCASR, "Bank Timing Violation: Controller issued CASR command illegally")
when (io.autoPRE) {
state := bank_idle
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tRTP + io.timings.tAL + io.timings.tRP - 1.U
}.otherwise {
nextLegalPRE.io.set.valid := true.B
nextLegalPRE.io.set.bits := io.timings.tRTP + io.timings.tAL - 1.U
}
}
is(cmd_casw) {
assert(io.out.canCASW, "Bank Timing Violation: Controller issued CASW command illegally")
when (io.autoPRE) {
state := bank_idle
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tCWD + io.timings.tAL + io.timings.tWR +
io.timings.tCCD + io.timings.tRP + 1.U
}.otherwise {
nextLegalPRE.io.set.valid := true.B
nextLegalPRE.io.set.bits := io.timings.tCWD + io.timings.tAL + io.timings.tWR +
io.timings.tCCD - 1.U
}
}
is(cmd_pre) {
assert(io.out.canPRE, "Bank Timing Violation: Controller issued PRE command illegally")
state := bank_idle
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tRP - 1.U
}
}
}
io.out.canCASW := (state === bank_active) && nextLegalCAS.io.idle // Controller must check rowAddr
io.out.canCASR := (state === bank_active) && nextLegalCAS.io.idle // Controller must check rowAddr
io.out.canPRE := (state === bank_active) && nextLegalPRE.io.idle
io.out.canACT := (state === bank_idle) && nextLegalACT.io.idle
io.out.state := state
io.out.openRow := openRowAddr
}
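Each `can{CMD}` bit above is gated by a down-counter loaded with `delay - 1` on the triggering command. A behavioral Python sketch of `nextLegalCAS`, with an illustrative `tRCD` not taken from any real speed grade:

```python
# Behavioral sketch of one BankStateTracker legality counter: the tracker
# loads (delay - 1) on the triggering command, and the gated can{CMD} bit
# goes high once the counter idles at zero.
class DownCounter:
    def __init__(self):
        self.count = 0
    def set(self, value):
        self.count = value
    def tick(self):  # one command-clock cycle
        if self.count > 0:
            self.count -= 1
    @property
    def idle(self):
        return self.count == 0

tRCD, tAL = 11, 0           # illustrative timings
next_legal_cas = DownCounter()
next_legal_cas.set(tRCD - tAL - 1)  # mirrors nextLegalCAS.io.set.bits above

cycles = 0
while not next_legal_cas.idle:
    next_legal_cas.tick()
    cycles += 1
print(cycles)  # 10: CAS becomes legal tRCD - tAL - 1 ticks after the load
```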
// Tracks the state of a rank, including:
// - Whether CAS, PRE, and ACT commands can be legally issued
//
// A MAS model uses these trackers to filter out illegal commands for this rank
//
// A necessary condition for the controller to issue a CMD that uses this rank
// is that the can{CMD} bit be high. The controller must, of course, ensure all
// extra-rank timing and resource constraints are met. The controller must also
// ensure CAS commands use the open ROW.
class RankStateTrackerO(key: DramOrganizationParams) extends GenericParameterizedBundle(key)
with CommandLegalBools {
import DRAMMasEnums._
val canREF = Output(Bool())
val wantREF = Output(Bool())
val state = Output(rank_active.cloneType)
val banks = Vec(key.maxBanks, Output(new BankStateTrackerO(key)))
}
class RankStateTrackerIO(val key: DramOrganizationParams) extends GenericParameterizedBundle(key)
with HasLegalityUpdateIO with HasDRAMMASConstants {
val rank = new RankStateTrackerO(key)
val tCycle = Input(UInt(maxDRAMTimingBits.W))
val cmdUsesThisRank = Input(Bool())
val cmdBankOH = Input(UInt(key.maxBanks.W))
}
class RankStateTracker(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
import DRAMMasEnums._
val io = IO(new RankStateTrackerIO(key))
val nextLegalPRE = Module(new DownCounter(maxDRAMTimingBits))
val nextLegalACT = Module(new DownCounter(tRFCBits))
val nextLegalCASR = Module(new DownCounter(maxDRAMTimingBits))
val nextLegalCASW = Module(new DownCounter(maxDRAMTimingBits))
val tREFI = RegInit(0.U(tREFIBits.W))
val state = RegInit(rank_active)
val wantREF = RegInit(false.B)
Seq(nextLegalPRE, nextLegalCASW, nextLegalCASR, nextLegalACT) foreach { mod =>
mod.io.decr := true.B
mod.io.set.valid := false.B
mod.io.set.bits := DontCare
}
val tFAWcheck = Module(new Queue(io.tCycle.cloneType, entries = 4))
tFAWcheck.io.enq.valid := io.cmdUsesThisRank && io.selectedCmd === cmd_act
tFAWcheck.io.enq.bits := io.tCycle + io.timings.tFAW
tFAWcheck.io.deq.ready := io.tCycle === tFAWcheck.io.deq.bits
when (io.cmdUsesThisRank && io.selectedCmd === cmd_act) {
assert(io.rank.canACT, "Rank Timing Violation: Controller issued ACT command illegally")
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tRRD - 1.U
}.elsewhen (io.selectedCmd === cmd_casr) {
assert(!io.cmdUsesThisRank || io.rank.canCASR,
"Rank Timing Violation: Controller issued CASR command illegally")
nextLegalCASR.io.set.valid := true.B
nextLegalCASR.io.set.bits := io.timings.tCCD +
Mux(io.cmdUsesThisRank, 0.U, io.timings.tRTRS) - 1.U
// TODO: tRTRS isn't the correct parameter here, but need a two cycle delay in DDR3
nextLegalCASW.io.set.valid := true.B
nextLegalCASW.io.set.bits := io.timings.tCAS + io.timings.tCCD - io.timings.tCWD +
io.timings.tRTRS - 1.U
}.elsewhen (io.selectedCmd === cmd_casw) {
assert(!io.cmdUsesThisRank || io.rank.canCASW,
"Rank Timing Violation: Controller issued CASW command illegally")
nextLegalCASR.io.set.valid := true.B
nextLegalCASR.io.set.bits := Mux(io.cmdUsesThisRank,
io.timings.tCWD + io.timings.tCCD + io.timings.tWTR - 1.U,
io.timings.tCWD + io.timings.tCCD + io.timings.tRTRS - io.timings.tCAS - 1.U)
// TODO: OST
nextLegalCASW.io.set.valid := true.B
nextLegalCASW.io.set.bits := io.timings.tCCD - 1.U
}.elsewhen (io.cmdUsesThisRank && io.selectedCmd === cmd_pre) {
assert(io.rank.canPRE, "Rank Timing Violation: Controller issued PRE command illegally")
}.elsewhen (io.cmdUsesThisRank && io.selectedCmd === cmd_ref) {
assert(io.rank.canREF, "Rank Timing Violation: Controller issued REF command illegally")
wantREF := false.B
state := rank_refresh
nextLegalACT.io.set.valid := true.B
nextLegalACT.io.set.bits := io.timings.tRFC - 1.U
}
// Refresh can be disabled by setting tREFI = 0
when (tREFI === io.timings.tREFI && io.timings.tREFI =/= 0.U) {
tREFI := 0.U
wantREF := true.B
}.otherwise {
tREFI := tREFI + 1.U
}
when (state === rank_refresh && nextLegalACT.io.current === 1.U) {
state := rank_active
}
val bankTrackers = Seq.fill(key.maxBanks)(Module(new BankStateTracker(key)).io)
io.rank.banks.zip(bankTrackers) foreach { case (out, bank) => out := bank.out }
bankTrackers.zip(io.cmdBankOH.toBools) foreach { case (bank, cmdUsesThisBank) =>
bank.timings := io.timings
bank.selectedCmd := io.selectedCmd
bank.cmdUsesThisBank := cmdUsesThisBank && io.cmdUsesThisRank
bank.cmdRow := io.cmdRow
bank.autoPRE:= io.autoPRE
}
io.rank.canREF := (bankTrackers map { _.out.canACT } reduce { _ && _ })
io.rank.canCASR := nextLegalCASR.io.idle
io.rank.canCASW := nextLegalCASW.io.idle
io.rank.canPRE := nextLegalPRE.io.idle
io.rank.canACT := nextLegalACT.io.idle && tFAWcheck.io.enq.ready
io.rank.wantREF := wantREF
io.rank.state := state
}
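The `tFAWcheck` queue above enforces the four-activates window: each ACT enqueues the cycle at which it ages out (`tCycle + tFAW`), and a fifth ACT is blocked while the four-entry queue is full. A behavioral sketch with an illustrative tFAW:

```python
# Sketch of the tFAW check: a four-entry queue stores, per ACT, the cycle at
# which it leaves the tFAW window; ACT is legal only while the queue has room.
from collections import deque

T_FAW = 20  # illustrative, in command-clock cycles
window = deque()

def can_act(t_cycle):
    while window and window[0] <= t_cycle:  # retire ACTs that aged out
        window.popleft()
    return len(window) < 4

def do_act(t_cycle):
    window.append(t_cycle + T_FAW)

for cycle in range(4):
    assert can_act(cycle)
    do_act(cycle)
assert not can_act(4)  # four ACTs in flight: the fifth must wait
assert can_act(20)     # the ACT issued at cycle 0 ages out at cycle 20
```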
class CommandBusMonitor extends Module {
import DRAMMasEnums._
val io = IO( new Bundle {
val cmd = Input(cmd_nop.cloneType)
val rank = Input(UInt())
val bank = Input(UInt())
val row = Input(UInt())
val autoPRE = Input(Bool())
})
val cycleCounter = RegInit(1.U(32.W))
val lastCommand = RegInit(0.U(32.W))
cycleCounter := cycleCounter + 1.U
when (io.cmd =/= cmd_nop) {
lastCommand := cycleCounter
when (lastCommand + 1.U =/= cycleCounter) { printf("nop(%d);\n", cycleCounter - lastCommand - 1.U) }
}
switch (io.cmd) {
is(cmd_act) {
printf("activate(%d, %d, %d); // %d\n", io.rank, io.bank, io.row, cycleCounter)
}
is(cmd_casr) {
val autoPRE = io.autoPRE
val burstChop = false.B
val column = 0.U // Don't care since we aren't checking data
printf("read(%d, %d, %d, %x, %x); // %d\n",
io.rank, io.bank, column, autoPRE, burstChop, cycleCounter)
}
is(cmd_casw) {
val autoPRE = io.autoPRE
val burstChop = false.B
val column = 0.U // Don't care since we aren't checking data
val mask = 0.U // Don't care since we aren't checking data
val data = 0.U // Don't care since we aren't checking data
printf("write(%d, %d, %d, %x, %x, %d, %d); // %d\n",
io.rank, io.bank, column, autoPRE, burstChop, mask, data, cycleCounter)
}
is(cmd_ref) {
printf("refresh(%d); // %d\n", io.rank, cycleCounter)
}
is(cmd_pre) {
val preAll = false.B
printf("precharge(%d,%d,%d); // %d\n",io.rank, io.bank, preAll, cycleCounter)
}
}
}
class RankRefreshUnitIO(key: DramOrganizationParams) extends GenericParameterizedBundle(key) {
val rankStati = Vec(key.maxRanks, Flipped(new RankStateTrackerO(key)))
  // The user may have instantiated multiple ranks, but may only be modelling a
  // single-rank system. Don't issue refreshes to ranks we aren't modelling
val ranksInUse = Input(UInt(key.maxRanks.W))
val suggestREF = Output(Bool())
val refRankAddr = Output(UInt(key.rankBits.W))
val suggestPRE = Output(Bool())
val preRankAddr = Output(UInt(key.rankBits.W))
val preBankAddr = Output(UInt(key.bankBits.W))
}
class RefreshUnit(key: DramOrganizationParams) extends Module {
val io = IO(new RankRefreshUnitIO(key))
val ranksWantingRefresh = VecInit(io.rankStati map { _.wantREF }).asUInt
val refreshableRanks = VecInit(io.rankStati map { _.canREF }).asUInt & io.ranksInUse
io.refRankAddr := PriorityEncoder(ranksWantingRefresh & refreshableRanks)
io.suggestREF := (ranksWantingRefresh & refreshableRanks).orR
// preRef => a precharge needed before refresh may occur
val preRefBanks = io.rankStati map { rank => PriorityEncoder(rank.banks map { _.canPRE })}
val prechargeableRanks = VecInit(io.rankStati map { rank => rank.canPRE &&
(rank.banks map { _.canPRE } reduce { _ || _ })}).asUInt & io.ranksInUse
io.suggestPRE := (ranksWantingRefresh & prechargeableRanks).orR
io.preRankAddr := PriorityEncoder(ranksWantingRefresh & prechargeableRanks)
io.preBankAddr := PriorityMux(ranksWantingRefresh & prechargeableRanks, preRefBanks)
}
// Outputs for counters fed to Micron's power calculator
// # CASR, CASW is a proxy for cycles of read and write data (assuming fixed burst length)
// 1 - (ACT/(CASR + CASW)) = rank row buffer hit rate
class RankPowerIO extends Bundle {
val allPreCycles = UInt(32.W) // # of cycles the rank has all banks precharged
val numCASR = UInt(32.W) // Assume no burst-chop
val numCASW = UInt(32.W) // Ditto above
val numACT = UInt(32.W)
// TODO
// CKE low & all banks pre
// CKE low & at least one bank active
}
object RankPowerIO {
def apply(): RankPowerIO = {
val w = Wire(new RankPowerIO)
w.allPreCycles := 0.U
w.numCASR := 0.U
w.numCASW := 0.U
w.numACT := 0.U
w
}
}
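The hit-rate formula in the comment above can be computed offline from these counters. A sketch with illustrative counts:

```python
# Row-buffer hit rate from the RankPowerIO counters: every CAS that did not
# require a fresh ACT hit an already-open row. Counts are illustrative.
num_act, num_casr, num_casw = 250, 800, 200
row_buffer_hit_rate = 1 - num_act / (num_casr + num_casw)
print(row_buffer_hit_rate)  # 0.75
```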
class RankPowerMonitor(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
import DRAMMasEnums._
val io = IO(new Bundle {
val stats = Output(new RankPowerIO)
val rankState = Input(new RankStateTrackerO(key))
val selectedCmd = Input(cmd_nop.cloneType)
val cmdUsesThisRank = Input(Bool())
})
val stats = RegInit(RankPowerIO())
when (io.cmdUsesThisRank) {
switch(io.selectedCmd) {
is(cmd_act) {
stats.numACT := stats.numACT + 1.U
}
is(cmd_casw) {
stats.numCASW := stats.numCASW + 1.U
}
is(cmd_casr) {
stats.numCASR := stats.numCASR + 1.U
}
}
}
// This is questionable. Needs to be reevaluated once CKE toggling is accounted for
when (io.rankState.state =/= rank_refresh && ((io.rankState.banks) forall { _.canACT })) {
stats.allPreCycles := stats.allPreCycles + 1.U
}
io.stats := stats
}
class DRAMBackendIO(val latencyBits: Int)(implicit val p: Parameters) extends Bundle {
val newRead = Flipped(Decoupled(new ReadResponseMetaData))
val newWrite = Flipped(Decoupled(new WriteResponseMetaData))
val completedRead = Decoupled(new ReadResponseMetaData)
val completedWrite = Decoupled(new WriteResponseMetaData)
val readLatency = Input(UInt(latencyBits.W))
val writeLatency = Input(UInt(latencyBits.W))
val tCycle = Input(UInt(latencyBits.W))
}
class DRAMBackend(key: DRAMBackendKey)(implicit p: Parameters) extends Module {
val io = IO(new DRAMBackendIO(key.latencyBits))
val rQueue = Module(new DynamicLatencyPipe(new ReadResponseMetaData, key.readDepth, key.latencyBits))
val wQueue = Module(new DynamicLatencyPipe(new WriteResponseMetaData, key.writeDepth, key.latencyBits))
io.completedRead <> rQueue.io.deq
io.completedWrite <> wQueue.io.deq
rQueue.io.enq <> io.newRead
rQueue.io.latency := io.readLatency
wQueue.io.enq <> io.newWrite
wQueue.io.latency := io.writeLatency
Seq(rQueue, wQueue) foreach { _.io.tCycle := io.tCycle }
}
@ -0,0 +1,358 @@
package midas
package models
import chisel3._
import chisel3.util._
import freechips.rocketchip.config.{Parameters, Field}
import junctions._
import midas.widgets._
/** A simple freelist
* @param entries The number of IDS to be managed by the free list
*
* Inputs: freeId. Valid is asserted along side an ID that is to be
* returned to the freelist
*
* Outputs: nextId. The next available ID. Granted on a successful handshake
*/
class FreeList(entries: Int) extends Module {
val io = IO(new Bundle {
val freeId = Flipped(Valid(UInt(log2Up(entries).W)))
val nextId = Decoupled(UInt(log2Up(entries).W))
})
require(entries > 0)
val nextId = RegInit({ val i = Wire(Valid(UInt())); i.valid := true.B;
i.bits := 0.U; i})
io.nextId.valid := nextId.valid
io.nextId.bits := nextId.bits
// Add an extra entry to represent the empty bit. Maybe not necessary?
val ids = RegInit(Vec.tabulate(entries)(i =>
if (i == 0) false.B else true.B))
val next = ids.indexWhere((x:Bool) => x)
when(io.nextId.fire() || ~nextId.valid) {
nextId.bits := next
nextId.valid := ids.exists((x: Bool) => x)
ids(next) := false.B
}
when(io.freeId.valid) {
ids(io.freeId.bits) := true.B
}
}
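A software reference model for `FreeList`, assuming only the documented behavior: the lowest free ID is granted first (matching the `indexWhere` priority), and a freed ID becomes grantable again:

```python
# Simplified software model of FreeList: allocate grants the lowest free ID,
# free returns an ID to the pool. The registered nextId staging is omitted.
class FreeListModel:
    def __init__(self, entries):
        self.free_ids = set(range(entries))
    def allocate(self):
        if not self.free_ids:
            return None
        next_id = min(self.free_ids)  # priority to the lowest index
        self.free_ids.discard(next_id)
        return next_id
    def free(self, ident):
        self.free_ids.add(ident)

fl = FreeListModel(4)
assert fl.allocate() == 0
assert fl.allocate() == 1
fl.free(0)
assert fl.allocate() == 0  # a freed ID becomes grantable again
```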
// This maintains W-W R-R orderings by managing a set of shared physical
// queues based on the NASTI id field.
class RATEntry(vIdWidth: Int, pIdWidth: Int) extends Bundle {
val current = Valid(UInt(vIdWidth.W))
val next = Valid(UInt(pIdWidth.W))
val head = Output(Bool())
def matchHead(id: UInt): Bool = {
(current.bits === id) && head
}
def matchTail(id: UInt): Bool = {
(current.bits === id) && (current.valid) && !next.valid
}
def push(id: UInt) {
next.bits := id
next.valid := true.B
}
def setTranslation(id: UInt) {
current.bits := id
current.valid := true.B
}
def setHead() { head := true.B }
def pop() {
current.valid := false.B
next.valid := false.B
head := false.B
}
override def cloneType() = new RATEntry(vIdWidth, pIdWidth).asInstanceOf[this.type]
}
object RATEntry {
def apply(vIdWidth: Int, pIdWidth: Int) = {
val entry = Wire(new RATEntry(vIdWidth, pIdWidth))
entry.current.valid := false.B
entry.current.bits := DontCare
entry.next.valid := false.B
entry.next.bits := DontCare
entry.head := false.B
entry
}
}
class AllocationIO(vIdWidth: Int, pIdWidth: Int) extends Bundle {
val pId = Output(UInt(pIdWidth.W))
val vId = Input(UInt(vIdWidth.W))
val ready = Output(Bool())
val valid = Input(Bool())
def fire(): Bool = ready && valid
}
class ReorderBuffer(val numVIds: Int, val numPIds: Int) extends Module {
val pIdWidth = log2Up(numPIds)
val vIdWidth = log2Up(numVIds)
val io = IO(new Bundle {
// Free a physical ID
val free = Flipped(Valid(UInt(pIdWidth.W)))
// ID Allocation. Two-way handshake. The next available PId is held on
// nextPId.bits. nextPId.valid == false if there are no free IDs available.
// Allocation occurs when nextPId.fire asserts
val next = new AllocationIO(vIdWidth, pIdWidth)
val trans = new AllocationIO(vIdWidth, pIdWidth)
})
val rat = RegInit(Vec.fill(numPIds)(RATEntry(vIdWidth, pIdWidth)))
val freeList = Module(new FreeList(numPIds))
freeList.io.freeId <> io.free
// PID allocation
io.next.ready := freeList.io.nextId.valid
freeList.io.nextId.ready := io.next.valid
val nextPId = freeList.io.nextId.bits
io.next.pId := nextPId
// Pointer to the child of an entry being freed (it will become the new head)
val nextHeadPtr = WireInit({val w = Wire(Valid(UInt(pIdWidth.W))); w.valid := false.B; w.bits := DontCare; w})
// Pointer to the parent of an entry being appended to a linked-list
val parentEntryPtr = Wire(Valid(UInt()))
parentEntryPtr.bits := rat.onlyIndexWhere(_.matchTail(io.next.vId))
parentEntryPtr.valid := io.next.fire() && rat.exists(_.matchTail(io.next.vId))
for((entry,index) <- rat.zipWithIndex){
// Allocation: Set the pointer of the new entry's parent
when(parentEntryPtr.valid && parentEntryPtr.bits === index.U){
rat(parentEntryPtr.bits).push(nextPId)
}
// Deallocation: Set the head bit of a link list whose head is to be freed
when(nextHeadPtr.valid && (index.U === nextHeadPtr.bits)) {
rat(index).setHead()
}
// Allocation: Add the new entry to the table
when(io.next.fire() && nextPId === index.U) {
rat(index).setTranslation(io.next.vId)
// We set the head bit if no linked-list exists for this vId, or
// if the parent, and thus previous head, is about to be freed.
when (~parentEntryPtr.valid ||
(io.trans.fire() && (io.trans.pId === parentEntryPtr.bits))){
rat(index).setHead()
}
}
// Deallocation: invalidate the entry indexed by io.trans.pId
// Note this exploits last connect semantics to override the pushing
// of new child to this entry when it is about to be freed.
when(io.trans.fire() && (index.U === io.trans.pId)) {
assert(rat(index).head)
nextHeadPtr := rat(index).next
rat(index).pop()
}
}
io.trans.pId := rat.onlyIndexWhere(_.matchHead(io.trans.vId))
io.trans.ready := rat.exists(_.matchHead(io.trans.vId))
}
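The invariant the `ReorderBuffer` maintains is per-virtual-ID FIFO ordering of physical IDs. A software reference model, using a dict of deques rather than the hardware's linked-list RAT encoding:

```python
# Reference for the ReorderBuffer's ordering guarantee: physical IDs
# allocated against a virtual (AXI4) ID translate back in allocation order,
# i.e. one FIFO per vId.
from collections import defaultdict, deque

per_vid = defaultdict(deque)

def alloc(v_id, p_id):   # io.next: append pId to vId's list
    per_vid[v_id].append(p_id)

def translate(v_id):     # io.trans: pop the head of vId's list
    queue = per_vid[v_id]
    return queue.popleft() if queue else None

alloc(2, 7); alloc(2, 3); alloc(1, 5)
assert translate(2) == 7  # W-W / R-R order preserved per ID
assert translate(1) == 5
assert translate(2) == 3
assert translate(0) is None
```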
// Read response staging units only buffer data and last fields of a B payload
class StoredBeat(implicit p: Parameters) extends NastiBundle()(p) with HasNastiData
// Buffers read reponses from the host-memory system in a structure that maintains
// their per-transaction ID ordering.
class ReadEgressResponseIO(implicit p: Parameters) extends NastiBundle()(p) {
val tBits = Output(new NastiReadDataChannel)
val tReady = Input(Bool()) // Really this is part of the input token to the egress unit...
val hValid = Output(Bool())
}
class ReadEgressReqIO(implicit p: Parameters) extends NastiBundle()(p) {
val t = Output(Valid(UInt(p(NastiKey).idBits.W)))
val hValid = Output(Bool())
}
class ReadEgress(maxRequests: Int, maxReqLength: Int, maxReqsPerId: Int)
(implicit val p: Parameters) extends Module {
val io = IO(new Bundle {
val enq = Flipped(Decoupled(new NastiReadDataChannel))
val resp = new ReadEgressResponseIO
val req = Flipped(new ReadEgressReqIO)
})
// The total BRAM state required to implement a maximum length queue for each AXI transaction ID
val virtualState = (maxReqsPerId * (1 << p(NastiKey).idBits) * maxReqLength * p(NastiKey).dataBits)
  // The total BRAM state required to dynamically allocate entries to responses
val physicalState = (maxRequests * maxReqLength * p(NastiKey).dataBits)
  // 0x20000 bits = four 32 Kb BRAMs
val generateTranslation = (virtualState > 0x20000) && (virtualState > physicalState + 0x10000)
// This module fires whenever there is a token available on the request port.
val targetFire = io.req.hValid
// On reset, the egress unit always has a single output token valid, but with invalid target data
val currReqReg = RegInit({
val r = Wire(io.req.t.cloneType)
r.valid := false.B
r.bits := DontCare
r
})
val xactionDone = Wire(Bool())
when (targetFire && io.req.t.valid) {
currReqReg := io.req.t
}.elsewhen (targetFire && xactionDone) {
currReqReg.valid := false.B
}
val xactionStart = targetFire && io.req.t.valid
// Queue address into which to enqueue the host-response
val enqPId = Wire(Valid(UInt()))
// Queue address from which to dequeue the response
val (deqPId: UInt, deqPIdReg: ValidIO[UInt]) = if (generateTranslation) {
val rob = Module(new ReorderBuffer(1 << p(NastiKey).idBits, maxRequests))
val enqPIdReg = RegInit({val i = Wire(Valid(UInt(log2Up(maxRequests).W)))
i.valid := false.B;
i.bits := DontCare;
i})
val deqPIdReg = RegInit({ val r = Wire(Valid(UInt(log2Up(maxRequests).W)));
r.valid := false.B;
r.bits := DontCare;
r })
val translationFailure = currReqReg.valid && ~deqPIdReg.valid
rob.io.trans.vId := Mux(translationFailure, currReqReg.bits, io.req.t.bits)
rob.io.trans.valid := translationFailure || xactionStart
rob.io.free.valid := xactionDone
rob.io.free.bits := deqPIdReg.bits
when(rob.io.trans.fire()) {
deqPIdReg.valid := rob.io.trans.fire()
deqPIdReg.bits := rob.io.trans.pId
}.elsewhen (targetFire && xactionDone) {
deqPIdReg.valid := false.B
}
//Don't initiate another allocation until the current one has finished
rob.io.next.vId := io.enq.bits.id
io.enq.ready := enqPId.valid
assert(enqPId.valid || ~io.enq.valid)
rob.io.next.valid := ~enqPIdReg.valid && io.enq.valid
enqPId.bits := Mux(enqPIdReg.valid, enqPIdReg.bits, rob.io.next.pId)
enqPId.valid := enqPIdReg.valid || rob.io.next.ready
when (io.enq.fire()) {
when (io.enq.bits.last) {
enqPIdReg.valid := false.B
}.elsewhen (~enqPIdReg.valid) {
enqPIdReg.valid := true.B
enqPIdReg.bits := rob.io.next.pId
}
}
// Deq using the translation if first beat, otherwise use the register
val deqPId = Mux(translationFailure || xactionStart, rob.io.trans.pId, deqPIdReg.bits)
(deqPId, deqPIdReg)
} else {
enqPId.bits := io.enq.bits.id
enqPId.valid := io.enq.valid
io.enq.ready := true.B
val deqPId = Mux(xactionStart, io.req.t.bits, currReqReg.bits)
(deqPId, currReqReg)
}
val mQDepth = if (generateTranslation) maxReqLength else maxReqLength * maxReqsPerId
val mQWidth = if (generateTranslation) maxRequests else 1 << p(NastiKey).idBits
val multiQueue = Module(new MultiQueue(new StoredBeat, mQWidth, mQDepth))
multiQueue.io.enq.bits.data := io.enq.bits.data
multiQueue.io.enq.bits.last := io.enq.bits.last
multiQueue.io.enq.valid := io.enq.valid
multiQueue.io.enqAddr := enqPId.bits
multiQueue.io.deqAddr := deqPId
xactionDone := targetFire && currReqReg.valid && deqPIdReg.valid &&
io.resp.tReady && io.resp.tBits.last
io.resp.tBits := NastiReadDataChannel(currReqReg.bits,
multiQueue.io.deq.bits.data, multiQueue.io.deq.bits.last)
io.resp.hValid := ~currReqReg.valid || (deqPIdReg.valid && multiQueue.io.deq.valid)
multiQueue.io.deq.ready := targetFire && currReqReg.valid &&
deqPIdReg.valid && io.resp.tReady
}
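The `generateTranslation` heuristic above trades a per-ID queue array for dynamically allocated storage plus the reorder buffer. A numeric sketch with assumed AXI4 parameters (6 ID bits, 64-bit data, 8-beat bursts, 8 requests per ID, 16 requests total):

```python
# Numeric sketch of the ReadEgress BRAM-sizing heuristic, with assumed
# parameters. virtual = one max-length queue per AXI4 ID; physical =
# dynamically shared storage for the bounded number of in-flight requests.
id_bits, data_bits = 6, 64
max_req_length, max_reqs_per_id, max_requests = 8, 8, 16

virtual_state  = max_reqs_per_id * (1 << id_bits) * max_req_length * data_bits
physical_state = max_requests * max_req_length * data_bits
# 0x20000 bits = four 32 Kb BRAMs
generate_translation = virtual_state > 0x20000 and virtual_state > physical_state + 0x10000
print(virtual_state, physical_state, generate_translation)  # 262144 8192 True
```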
class WriteEgressResponseIO(implicit p: Parameters) extends NastiBundle()(p) {
val tBits = Output(new NastiWriteResponseChannel)
val tReady = Input(Bool())
val hValid = Output(Bool())
}
class WriteEgressReqIO(implicit p: Parameters) extends NastiBundle()(p) {
val t = Output(Valid(UInt(p(NastiKey).idBits.W)))
val hValid = Output(Bool())
}
// Maintains a series of incrementer/decrementers to track the number of
// write acknowledgements returned by the host memory system. No other
// response metadata is stored.
class WriteEgress(maxRequests: Int, maxReqLength: Int, maxReqsPerId: Int)
(implicit val p: Parameters) extends Module {
val io = IO(new Bundle {
val enq = Flipped(Decoupled(new NastiWriteResponseChannel))
val resp = new WriteEgressResponseIO
val req = Flipped(new WriteEgressReqIO)
})
// This module fires whenever there is a token available on the request port.
val targetFire = io.req.hValid
// Indicates whether the egress unit is releasing a transaction
val currReqReg = RegInit({
val r = Wire(io.req.t.cloneType)
r.valid := false.B
r.bits := DontCare
r
})
val haveAck = RegInit(false.B)
when (targetFire && io.req.t.valid) {
currReqReg := io.req.t
}.elsewhen (targetFire && currReqReg.valid && haveAck && io.resp.tReady) {
currReqReg.valid := false.B
}
val ackCounters = Seq.fill(1 << p(NastiKey).idBits)(RegInit(0.U(log2Up(maxReqsPerId + 1).W)))
val notEmpty = VecInit(ackCounters map {_ =/= 0.U})
val retry = currReqReg.valid && !haveAck
val deqId = Mux(retry, currReqReg.bits, io.req.t.bits)
when (retry || targetFire && io.req.t.valid) {
haveAck := notEmpty(deqId)
}
val idMatch = currReqReg.bits === io.enq.bits.id
val do_enq = io.enq.fire()
val do_deq = targetFire && currReqReg.valid && haveAck && io.resp.tReady
ackCounters.zipWithIndex foreach { case (count, idx) =>
when (!(do_deq && do_enq && idMatch)) {
when(do_enq && io.enq.bits.id === idx.U) {
count := count + 1.U
}.elsewhen(do_deq && currReqReg.bits === idx.U) {
count := count - 1.U
}
}
}
io.resp.tBits := NastiWriteResponseChannel(currReqReg.bits)
io.resp.hValid := !currReqReg.valid || haveAck
io.enq.ready := true.B
}
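The per-ID ack counters in `WriteEgress` can be modeled directly. A sketch assuming 2 ID bits:

```python
# Sketch of the WriteEgress bookkeeping: one counter per AXI4 ID counts
# B-channel acks returned early by host memory; a target-side write completes
# once its ID's counter is non-zero.
ack_counters = [0] * (1 << 2)  # 2 ID bits assumed

def host_ack(ident):
    ack_counters[ident] += 1

def target_complete(ident):
    if ack_counters[ident] > 0:
        ack_counters[ident] -= 1
        return True
    return False

host_ack(1); host_ack(1)
assert target_complete(1)
assert not target_complete(0)  # no ack buffered for ID 0 yet
assert target_complete(1)
assert not target_complete(1)  # both acks consumed
```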
trait EgressUnitParameters {
val egressUnitDelay = 1
}
@ -0,0 +1,577 @@
// See LICENSE for license details.
package midas
package models
// From RC
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.{DecoupledHelper}
import freechips.rocketchip.diplomacy.{LazyModule}
import freechips.rocketchip.amba.axi4.{AXI4EdgeParameters, AXI4Bundle}
import junctions._
import chisel3._
import chisel3.util._
import chisel3.experimental.dontTouch
import midas.core._
import midas.widgets._
import midas.passes.{Fame1ChiselAnnotation}
import midas.passes.fame.{HasSerializationHints}
import scala.math.min
import Console.{UNDERLINED, RESET}
import java.io.{File, FileWriter}
// Note: NASTI -> legacy rocket chip implementation of AXI4
case object FasedAXI4Edge extends Field[Option[AXI4EdgeSummary]](None)
case class BaseParams(
// Pessimistically provisions the functional model. Don't be cheap:
// underprovisioning will force the functional model to assert backpressure on
// the target AW, W, or R channels, which may lead to unexpected bandwidth throttling.
maxReads: Int,
maxWrites: Int,
nastiKey: Option[NastiParameters] = None,
edge: Option[AXI4EdgeParameters] = None,
// AREA OPTIMIZATIONS:
// AXI4 bursts (INCR) can be 256 beats in length -- some
// area can be saved if the target design only issues smaller requests
maxReadLength: Int = 256,
maxReadsPerID: Option[Int] = None,
maxWriteLength: Int = 256,
maxWritesPerID: Option[Int] = None,
// DEBUG FEATURES
// Check for collisions in pending reads and writes to the host memory system
// May produce false positives in timing models that reorder requests
detectAddressCollisions: Boolean = false,
// HOST INSTRUMENTATION
stallEventCounters: Boolean = false, // To track causes of target-time stalls
localHCycleCount: Boolean = false, // Host Cycle Counter
latencyHistograms: Boolean = false, // Creates a BRAM histogram of various system latencies
// BASE TIMING-MODEL SETTINGS
// Some(key) instantiates an LLC model in front of the DRAM timing model
llcKey: Option[LLCParams] = None,
// BASE TIMING-MODEL INSTRUMENTATION
xactionCounters: Boolean = true, // Numbers of read and write AXI4 xactions
beatCounters: Boolean = false, // Numbers of read and write beats in AXI4 xactions
targetCycleCounter: Boolean = false, // Redundant in a full simulator; useful for testing
// Number of xactions in flight in a given cycle. Bin N contains the range
// (occupancyHistograms[N-1], occupancyHistograms[N]]
occupancyHistograms: Seq[Int] = Seq(0, 2, 4, 8),
addrRangeCounters: BigInt = BigInt(0)
)
// A serializable summary of the diplomatic edge
case class AXI4EdgeSummary(
maxReadTransfer: Int,
maxWriteTransfer: Int,
idReuse: Option[Int],
maxFlight: Option[Int],
)
object AXI4EdgeSummary {
// Returns max ID reuse; None -> unbounded
private def getIDReuseFromEdge(e: AXI4EdgeParameters): Option[Int] = {
val maxFlightPerMaster = e.master.masters.map(_.maxFlight)
maxFlightPerMaster.reduce( (_,_) match {
case (Some(prev), Some(cur)) => Some(scala.math.max(prev, cur))
case _ => None
})
}
// Returns (maxReadLength, maxWriteLength)
private def getMaxTransferFromEdge(e: AXI4EdgeParameters): (Int, Int) = {
val beatBytes = e.slave.beatBytes
val readXferSize = e.slave.slaves.head.supportsRead.max
val writeXferSize = e.slave.slaves.head.supportsWrite.max
((readXferSize + beatBytes - 1) / beatBytes, (writeXferSize + beatBytes - 1) / beatBytes)
}
// Sums up the maximum number of requests that can be inflight across all masters
// None -> unbounded
private def getMaxTotalFlightFromEdge(e: AXI4EdgeParameters): Option[Int] = {
val maxFlightPerMaster = e.master.masters.map(_.maxFlight)
maxFlightPerMaster.reduce( (_,_) match {
case (Some(prev), Some(cur)) => Some(prev + cur)
case _ => None
})
}
def apply(e: AXI4EdgeParameters): AXI4EdgeSummary = AXI4EdgeSummary(
getMaxTransferFromEdge(e)._1,
getMaxTransferFromEdge(e)._2,
getIDReuseFromEdge(e),
getMaxTotalFlightFromEdge(e))
}
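The ceiling division in `getMaxTransferFromEdge` converts byte transfer sizes to beat counts. A sketch with an assumed edge (8-byte beats, 256-byte max read and 128-byte max write transfers):

```python
# Sketch of getMaxTransferFromEdge's ceiling division from byte sizes to
# beat counts, with assumed edge parameters.
beat_bytes = 8
read_xfer_size, write_xfer_size = 256, 128
max_read_beats  = (read_xfer_size + beat_bytes - 1) // beat_bytes
max_write_beats = (write_xfer_size + beat_bytes - 1) // beat_bytes
print(max_read_beats, max_write_beats)  # 32 16
```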
abstract class BaseConfig {
def params: BaseParams
private def getMaxPerID(e: Option[AXI4EdgeSummary], modelMaxXactions: Int, userMax: Option[Int])(implicit p: Parameters): Int = {
e.flatMap(_.idReuse).getOrElse(min(userMax.getOrElse(modelMaxXactions), modelMaxXactions))
}
def maxReadLength(implicit p: Parameters) = p(FasedAXI4Edge) match {
case Some(e) => e.maxReadTransfer
case _ => params.maxReadLength
}
def maxWriteLength(implicit p: Parameters) = p(FasedAXI4Edge) match {
case Some(e) => e.maxWriteTransfer
case _ => params.maxWriteLength
}
def maxWritesPerID(implicit p: Parameters) = getMaxPerID(p(FasedAXI4Edge), params.maxWrites, params.maxWritesPerID)
def maxReadsPerID(implicit p: Parameters) = getMaxPerID(p(FasedAXI4Edge), params.maxReads, params.maxReadsPerID)
def maxWrites(implicit p: Parameters) = {
val maxFromEdge = p(FasedAXI4Edge).flatMap(_.maxFlight).getOrElse(params.maxWrites)
min(params.maxWrites, maxFromEdge)
}
def maxReads(implicit p: Parameters) = {
val maxFromEdge = p(FasedAXI4Edge).flatMap(_.maxFlight).getOrElse(params.maxReads)
min(params.maxReads, maxFromEdge)
}
def useLLCModel = params.llcKey != None
// Timing model classes implement this function to elaborate the correct module
def elaborate()(implicit p: Parameters): TimingModel
def maxWritesBits(implicit p: Parameters) = log2Up(maxWrites)
def maxReadsBits(implicit p: Parameters) = log2Up(maxReads)
}
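The clamping performed by `getMaxPerID` is plain integer logic: an edge-derived ID-reuse bound wins when known, otherwise the user's request is clamped to the model's structural capacity. A minimal plain-Scala sketch (illustrative name, no Chisel):

```scala
// Mirrors BaseConfig.getMaxPerID: prefer the edge's known ID-reuse bound;
// otherwise clamp the user's request to the model's capacity.
def maxPerID(edgeIDReuse: Option[Int], modelMaxXactions: Int, userMax: Option[Int]): Int =
  edgeIDReuse.getOrElse(math.min(userMax.getOrElse(modelMaxXactions), modelMaxXactions))
```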
// A wrapper bundle around all of the programmable settings in the functional model (as opposed to the timing model).
class FuncModelProgrammableRegs extends Bundle with HasProgrammableRegisters {
val relaxFunctionalModel = Input(Bool())
val registers = Seq(
(relaxFunctionalModel -> RuntimeSetting(0, """Relax functional model""", max = Some(1)))
)
def getFuncModelSettings(): Seq[(String, String)] = {
Console.println(s"${UNDERLINED}Functional Model Settings${RESET}")
setUnboundSettings()
getSettings()
}
}
class FASEDTargetIO(implicit val p: Parameters) extends Bundle {
val axi4 = Flipped(new NastiIO)
val reset = Input(Bool())
}
class MemModelIO(implicit val p: Parameters) extends WidgetIO()(p){
// The default NastiKey is expected to be that of the target
val host_mem = new NastiIO()(p.alterPartial({ case NastiKey => p(MemNastiKey)}))
}
// Need to wrap up all the parameters in a case class for serialization. The edge and width
// were previously passed in via the target's Parameters object
case class CompleteConfig(
userProvided: BaseConfig,
axi4Widths: NastiParameters,
axi4Edge: Option[AXI4EdgeSummary] = None) extends HasSerializationHints {
def typeHints(): Seq[Class[_]] = Seq(userProvided.getClass)
}
class FASEDMemoryTimingModel(completeConfig: CompleteConfig, hostParams: Parameters) extends BridgeModule[HostPortIO[FASEDTargetIO]]()(hostParams) {
val cfg = completeConfig.userProvided
// Reconstitute the parameters object
implicit override val p = hostParams.alterPartial({
case NastiKey => completeConfig.axi4Widths
case FasedAXI4Edge => completeConfig.axi4Edge
})
require(p(NastiKey).idBits <= p(MemNastiKey).idBits,
"Target AXI4 IDs cannot be mapped 1:1 onto host AXI4 IDs"
)
val io = IO(new MemModelIO)
val hPort = IO(HostPort(new FASEDTargetIO))
val tNasti = hPort.hBits.axi4
val tReset = hPort.hBits.reset
val model = cfg.elaborate()
printGenerationConfig
// Debug: Put an optional bound on the number of memory requests we can make
// to the host memory system
val funcModelRegs = Wire(new FuncModelProgrammableRegs)
val ingress = Module(new IngressModule(cfg))
// Drop in a width adapter to handle differences between
// the host and target memory widths
val widthAdapter = Module(LazyModule(
new TargetToHostAXI4Converter(p(NastiKey), p(MemNastiKey))
).module)
val hostMemOffsetWidthOffset = io.host_mem.aw.bits.addr.getWidth - p(CtrlNastiKey).dataBits
val hostMemOffsetLowWidth = if (hostMemOffsetWidthOffset > 0) p(CtrlNastiKey).dataBits else io.host_mem.aw.bits.addr.getWidth
val hostMemOffsetHighWidth = if (hostMemOffsetWidthOffset > 0) hostMemOffsetWidthOffset else 0
val hostMemOffsetHigh = RegInit(0.U(hostMemOffsetHighWidth.W))
val hostMemOffsetLow = RegInit(0.U(hostMemOffsetLowWidth.W))
val hostMemOffset = Cat(hostMemOffsetHigh, hostMemOffsetLow)
attach(hostMemOffsetHigh, "hostMemOffsetHigh", WriteOnly)
attach(hostMemOffsetLow, "hostMemOffsetLow", WriteOnly)
io.host_mem <> widthAdapter.sAxi4
io.host_mem.aw.bits.user := DontCare
io.host_mem.aw.bits.region := DontCare
io.host_mem.ar.bits.user := DontCare
io.host_mem.ar.bits.region := DontCare
io.host_mem.w.bits.id := DontCare
io.host_mem.w.bits.user := DontCare
io.host_mem.ar.bits.addr := widthAdapter.sAxi4.ar.bits.addr + hostMemOffset
io.host_mem.aw.bits.addr := widthAdapter.sAxi4.aw.bits.addr + hostMemOffset
widthAdapter.mAxi4.aw <> ingress.io.nastiOutputs.aw
widthAdapter.mAxi4.ar <> ingress.io.nastiOutputs.ar
widthAdapter.mAxi4.w <> ingress.io.nastiOutputs.w
val readEgress = Module(new ReadEgress(
maxRequests = cfg.maxReads,
maxReqLength = cfg.maxReadLength,
maxReqsPerId = cfg.maxReadsPerID))
readEgress.io.enq <> widthAdapter.mAxi4.r
readEgress.io.enq.bits.user := DontCare
val writeEgress = Module(new WriteEgress(
maxRequests = cfg.maxWrites,
maxReqLength = cfg.maxWriteLength,
maxReqsPerId = cfg.maxWritesPerID))
writeEgress.io.enq <> widthAdapter.mAxi4.b
writeEgress.io.enq.bits.user := DontCare
// Track outstanding requests to the host memory system
val hOutstandingReads = SatUpDownCounter(cfg.maxReads)
hOutstandingReads.inc := io.host_mem.ar.fire()
hOutstandingReads.dec := io.host_mem.r.fire() && io.host_mem.r.bits.last
hOutstandingReads.max := cfg.maxReads.U
val hOutstandingWrites = SatUpDownCounter(cfg.maxWrites)
hOutstandingWrites.inc := io.host_mem.aw.fire()
hOutstandingWrites.dec := io.host_mem.b.fire()
hOutstandingWrites.max := cfg.maxWrites.U
val host_mem_idle = hOutstandingReads.empty && hOutstandingWrites.empty
// By default, disallow all R->W, W->R, and W->W reorderings in host memory
// system. see IngressUnit.scala for more detail
ingress.io.host_mem_idle := host_mem_idle
ingress.io.host_read_inflight := !hOutstandingReads.empty
ingress.io.relaxed := funcModelRegs.relaxFunctionalModel
// Five conditions to execute a target cycle:
// 1: AXI4 tokens are available, and there is space to enqueue a new input token
// 2: Ingress has space for requests snooped in token
val ingressReady = ingress.io.nastiInputs.hReady
// 3: Egress unit has produced the payloads for read response channel
val rReady = readEgress.io.resp.hValid
// 4: Egress unit has produced the payloads for write response channel
val bReady = writeEgress.io.resp.hValid
// 5: If targetReset is asserted the host-memory system must first settle
val tResetReady = (!tReset || host_mem_idle)
// decoupled helper fire currently doesn't support directly passing true/false.B as exclude
val tFireHelper = DecoupledHelper(hPort.toHost.hValid,
hPort.fromHost.hReady,
ingressReady, bReady, rReady, tResetReady)
val targetFire = tFireHelper.fire
// HACK: Feeding valid back on ready and ready back on valid until we figure out
// channel tokenization
hPort.toHost.hReady := tFireHelper.fire
hPort.fromHost.hValid := tFireHelper.fire
ingress.io.nastiInputs.hValid := tFireHelper.fire(ingressReady)
model.tNasti <> tNasti
model.reset := tReset
// Connect up aw to ingress and model
ingress.io.nastiInputs.hBits.aw.valid := tNasti.aw.fire
ingress.io.nastiInputs.hBits.aw.bits := tNasti.aw.bits
// Connect ar to ingress and model
ingress.io.nastiInputs.hBits.ar.valid := tNasti.ar.fire
ingress.io.nastiInputs.hBits.ar.bits := tNasti.ar.bits
// Connect w to ingress and model
ingress.io.nastiInputs.hBits.w.valid := tNasti.w.fire
ingress.io.nastiInputs.hBits.w.bits := tNasti.w.bits
// Connect target-level signals between egress and model
readEgress.io.req.t := model.io.egressReq.r
readEgress.io.req.hValid := targetFire
readEgress.io.resp.tReady := model.io.egressResp.rReady
model.io.egressResp.rBits := readEgress.io.resp.tBits
writeEgress.io.req.t := model.io.egressReq.b
writeEgress.io.req.hValid := targetFire
writeEgress.io.resp.tReady := model.io.egressResp.bReady
model.io.egressResp.bBits := writeEgress.io.resp.tBits
ingress.reset := reset.toBool || tReset && tFireHelper.fire(ingressReady)
readEgress.reset := reset.toBool || tReset && targetFire
writeEgress.reset := reset.toBool || tReset && targetFire
if (cfg.params.localHCycleCount) {
val hCycle = RegInit(0.U(32.W))
hCycle := hCycle + 1.U
attach(hCycle, "hostCycle", ReadOnly)
}
if (cfg.params.stallEventCounters) {
val writeEgressStalls = RegInit(0.U(32.W))
when(!bReady) {
writeEgressStalls := writeEgressStalls + 1.U
}
val readEgressStalls = RegInit(0.U(32.W))
when(!rReady) {
readEgressStalls := readEgressStalls + 1.U
}
val tokenStalls = RegInit(0.U(32.W))
when(!(tResetReady && hPort.toHost.hValid && hPort.fromHost.hReady)) {
tokenStalls := tokenStalls + 1.U
}
val hostMemoryIdleCycles = RegInit(0.U(32.W))
when(host_mem_idle) {
hostMemoryIdleCycles := hostMemoryIdleCycles + 1.U
}
when (targetFire) {
writeEgressStalls := 0.U
readEgressStalls := 0.U
tokenStalls := 0.U
}
attach(writeEgressStalls, "writeStalled", ReadOnly)
attach(readEgressStalls, "readStalled", ReadOnly)
attach(tokenStalls, "tokenStalled", ReadOnly)
}
if (cfg.params.detectAddressCollisions) {
val discardedMSBs = 6
val collision_checker = Module(new AddressCollisionChecker(
cfg.maxReads, cfg.maxWrites, p(NastiKey).addrBits - discardedMSBs))
collision_checker.io.read_req.valid := targetFire && tNasti.ar.fire
collision_checker.io.read_req.bits := tNasti.ar.bits.addr >> discardedMSBs
collision_checker.io.read_done := io.host_mem.r.fire && io.host_mem.r.bits.last
collision_checker.io.write_req.valid := targetFire && tNasti.aw.fire
collision_checker.io.write_req.bits := tNasti.aw.bits.addr >> discardedMSBs
collision_checker.io.write_done := io.host_mem.b.fire
val collision_addr = RegEnable(collision_checker.io.collision_addr.bits,
targetFire & collision_checker.io.collision_addr.valid)
val num_collisions = RegInit(0.U(32.W))
when (targetFire && collision_checker.io.collision_addr.valid) {
num_collisions := num_collisions + 1.U
}
attach(num_collisions, "addrCollision", ReadOnly)
attach(collision_addr, "collisionAddr", ReadOnly)
}
if (cfg.params.latencyHistograms) {
// Measure latency from reception of first read data beat; need
// some state to track when a beat corresponds to the start of a new xaction
val newHRead = RegInit(true.B)
when (readEgress.io.enq.fire && readEgress.io.enq.bits.last) {
newHRead := true.B
}.elsewhen (readEgress.io.enq.fire) {
newHRead := false.B
}
// Latencies of host xactions
val hReadLatencyHist = HostLatencyHistogram(
ingress.io.nastiOutputs.ar.fire,
ingress.io.nastiOutputs.ar.bits.id,
readEgress.io.enq.fire && newHRead,
readEgress.io.enq.bits.id
)
attachIO(hReadLatencyHist, "hostReadLatencyHist_")
val hWriteLatencyHist = HostLatencyHistogram(
ingress.io.nastiOutputs.aw.fire,
ingress.io.nastiOutputs.aw.bits.id,
writeEgress.io.enq.fire,
writeEgress.io.enq.bits.id
)
attachIO(hWriteLatencyHist, "hostWriteLatencyHist_")
// target-time latencies of xactions
val newTRead = RegInit(true.B)
// Measure latency from reception of first read data beat; need
// some state to track when a beat corresponds to the start of a new xaction
when (targetFire) {
when (model.tNasti.r.fire && model.tNasti.r.bits.last) {
newTRead := true.B
}.elsewhen (model.tNasti.r.fire) {
newTRead := false.B
}
}
val tReadLatencyHist = HostLatencyHistogram(
model.tNasti.ar.fire && targetFire,
model.tNasti.ar.bits.id,
model.tNasti.r.fire && targetFire && newTRead,
model.tNasti.r.bits.id,
cycleCountEnable = targetFire
)
attachIO(tReadLatencyHist, "targetReadLatencyHist_")
val tWriteLatencyHist = HostLatencyHistogram(
model.tNasti.aw.fire && targetFire,
model.tNasti.aw.bits.id,
model.tNasti.b.fire && targetFire,
model.tNasti.b.bits.id,
cycleCountEnable = targetFire
)
attachIO(tWriteLatencyHist, "targetWriteLatencyHist_")
// Total host-latency of transactions
val totalReadLatencyHist = HostLatencyHistogram(
model.tNasti.ar.fire && targetFire,
model.tNasti.ar.bits.id,
model.tNasti.r.fire && targetFire && newTRead,
model.tNasti.r.bits.id
)
attachIO(totalReadLatencyHist, "totalReadLatencyHist_")
val totalWriteLatencyHist = HostLatencyHistogram(
model.tNasti.aw.fire && targetFire,
model.tNasti.aw.bits.id,
model.tNasti.b.fire && targetFire,
model.tNasti.b.bits.id
)
attachIO(totalWriteLatencyHist, "totalWriteLatencyHist_")
// Ingress latencies
val iReadLatencyHist = HostLatencyHistogram(
ingress.io.nastiInputs.hBits.ar.fire() && targetFire,
ingress.io.nastiInputs.hBits.ar.bits.id,
ingress.io.nastiOutputs.ar.fire,
ingress.io.nastiOutputs.ar.bits.id
)
attachIO(iReadLatencyHist, "ingressReadLatencyHist_")
val iWriteLatencyHist = HostLatencyHistogram(
ingress.io.nastiInputs.hBits.aw.fire() && targetFire,
ingress.io.nastiInputs.hBits.aw.bits.id,
ingress.io.nastiOutputs.aw.fire,
ingress.io.nastiOutputs.aw.bits.id
)
attachIO(iWriteLatencyHist, "ingressWriteLatencyHist_")
}
if (cfg.params.addrRangeCounters > 0) {
val n = cfg.params.addrRangeCounters
val readRanges = AddressRangeCounter(n, model.tNasti.ar, targetFire)
val writeRanges = AddressRangeCounter(n, model.tNasti.aw, targetFire)
val numRanges = n.U(32.W)
attachIO(readRanges, "readRanges_")
attachIO(writeRanges, "writeRanges_")
attach(numRanges, "numRanges", ReadOnly)
}
val rrespError = RegEnable(io.host_mem.r.bits.resp, 0.U,
io.host_mem.r.bits.resp =/= 0.U && io.host_mem.r.fire)
val brespError = RegEnable(io.host_mem.b.bits.resp, 0.U,
io.host_mem.b.bits.resp =/= 0.U && io.host_mem.b.fire)
// Generate the configuration registers and tie them to the ctrl bus
attachIO(model.io.mmReg)
attachIO(funcModelRegs)
attach(rrespError, "rrespError", ReadOnly)
attach(brespError, "brespError", ReadOnly)
genCRFile()
dontTouch(targetFire)
chisel3.experimental.annotate(Fame1ChiselAnnotation(model, "targetFire"))
getDefaultSettings("runtime.conf")
override def genHeader(base: BigInt, sb: StringBuilder) {
def genCPPmap(mapName: String, map: Map[String, BigInt]): String = {
val prefix = s"const std::map<std::string, int> $mapName = {\n"
map.foldLeft(prefix)((str, kvp) => str + s""" {\"${kvp._1}\", ${kvp._2}},\n""") + "};\n"
}
import midas.widgets.CppGenerationUtils._
super.genHeader(base, sb)
sb.append(CppGenerationUtils.genMacro(s"${getWName.toUpperCase}_target_addr_bits", UInt32(p(NastiKey).addrBits)))
crRegistry.genArrayHeader(wName.getOrElse(name).toUpperCase, base, sb)
}
// Prints out key elaboration time settings
private def printGenerationConfig(): Unit = {
println("Generating a Midas Memory Model")
println(" Max Read Requests: " + cfg.maxReads)
println(" Max Write Requests: " + cfg.maxWrites)
println(" Max Read Length: " + cfg.maxReadLength)
println(" Max Write Length: " + cfg.maxWriteLength)
println(" Max Read ID Reuse: " + cfg.maxReadsPerID)
println(" Max Write ID Reuse: " + cfg.maxWritesPerID)
println("\nTiming Model Parameters")
model.printGenerationConfig
cfg.params.llcKey match {
case Some(key) => key.print()
case None => println(" No LLC Model Instantiated\n")
}
}
// Accepts an elaborated memory model and generates a runtime configuration for it
private def emitSettings(fileName: String, settings: Seq[(String, String)])(implicit p: Parameters): Unit = {
val file = new File(p(OutputDir), fileName)
val writer = new FileWriter(file)
settings.foreach({
case (field, value) => writer.write(s"+mm_${field}=${value}\n")
})
writer.close
}
def getSettings(fileName: String)(implicit p: Parameters) {
println("\nGenerating a Midas Memory Model Configuration File")
val functionalModelSettings = funcModelRegs.getFuncModelSettings()
val timingModelSettings = model.io.mmReg.getTimingModelSettings()
emitSettings(fileName, functionalModelSettings ++ timingModelSettings)
}
def getDefaultSettings(fileName: String)(implicit p: Parameters) {
val functionalModelSettings = funcModelRegs.getDefaults()
val timingModelSettings = model.io.mmReg.getDefaults()
emitSettings(fileName, functionalModelSettings ++ timingModelSettings)
}
}
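The `hostMemOffsetHigh`/`hostMemOffsetLow` pair in the model above exists because the host DRAM address can be wider than a single control-bus register; the offset is therefore programmed through two write-only registers and recombined with `Cat(high, low)`. A plain-Scala sketch of that width arithmetic (illustrative names, not part of the source):

```scala
// Returns (lowRegBits, highRegBits) for a host address hostAddrBits wide,
// programmed over a control bus with ctrlDataBits-wide registers.
def offsetRegWidths(hostAddrBits: Int, ctrlDataBits: Int): (Int, Int) = {
  val spill = hostAddrBits - ctrlDataBits
  if (spill > 0) (ctrlDataBits, spill) else (hostAddrBits, 0)
}

// Software-side recombination, mirroring Cat(hostMemOffsetHigh, hostMemOffsetLow).
def combineOffset(high: BigInt, low: BigInt, lowBits: Int): BigInt =
  (high << lowBits) | low
```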
class FASEDBridge(argument: CompleteConfig)(implicit p: Parameters)
extends BlackBox with Bridge[HostPortIO[FASEDTargetIO], FASEDMemoryTimingModel] {
val io = IO(new FASEDTargetIO)
val bridgeIO = HostPort(io)
val constructorArg = Some(argument)
generateAnnotations()
}
object FASEDBridge {
def apply(axi4: AXI4Bundle, reset: Bool, cfg: CompleteConfig)(implicit p: Parameters): FASEDBridge = {
val ep = Module(new FASEDBridge(cfg)(p.alterPartial({ case NastiKey => cfg.axi4Widths })))
ep.io.reset := reset
import chisel3.core.ExplicitCompileOptions.NotStrict
ep.io.axi4 <> axi4
ep
}
}
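`emitSettings` serializes each programmable register into the plusarg-style `runtime.conf` consumed by the simulation driver. A minimal sketch of the line format it writes (the rendering helper here is hypothetical; the source writes directly through a FileWriter):

```scala
// Each (field, value) pair becomes a "+mm_<field>=<value>" line.
def renderSettings(settings: Seq[(String, String)]): String =
  settings.map { case (field, value) => s"+mm_${field}=${value}\n" }.mkString
```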

View File

@ -0,0 +1,172 @@
package midas
package models
import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import junctions._
import midas.widgets._
import Console.{UNDERLINED, RESET}
case class FIFOMASConfig(
dramKey: DramOrganizationParams,
transactionQueueDepth: Int,
backendKey: DRAMBackendKey = DRAMBackendKey(4, 4, DRAMMasEnums.maxDRAMTimingBits),
params: BaseParams)
extends DRAMBaseConfig {
def elaborate()(implicit p: Parameters): FIFOMASModel = Module(new FIFOMASModel(this)(p))
}
class FIFOMASMMRegIO(val cfg: FIFOMASConfig) extends BaseDRAMMMRegIO(cfg) {
val registers = dramBaseRegisters
def requestSettings() {
Console.println(s"Configuring a First-Come First-Serve Model")
setBaseDRAMSettings()
}
}
class FIFOMASIO(val cfg: FIFOMASConfig)(implicit p: Parameters) extends TimingModelIO()(p) {
val mmReg = new FIFOMASMMRegIO(cfg)
//override def clonetype = new FIFOMASIO(cfg)(p).asInstanceOf[this.type]
}
class FIFOMASModel(cfg: FIFOMASConfig)(implicit p: Parameters) extends TimingModel(cfg)(p)
with HasDRAMMASConstants {
val longName = "FIFO MAS"
def printTimingModelGenerationConfig {}
/**************************** CHISEL BEGINS *********************************/
import DRAMMasEnums._
lazy val io = IO(new FIFOMASIO(cfg))
val timings = io.mmReg.dramTimings
val backend = Module(new DRAMBackend(cfg.backendKey))
val xactionScheduler = Module(new UnifiedFIFOXactionScheduler(cfg.transactionQueueDepth, cfg))
xactionScheduler.io.req <> nastiReq
xactionScheduler.io.pendingAWReq := pendingAWReq.value
xactionScheduler.io.pendingWReq := pendingWReq.value
val currentReference = Queue({
val next = Wire(Decoupled(new MASEntry(cfg)))
next.valid := xactionScheduler.io.nextXaction.valid
next.bits.decode(xactionScheduler.io.nextXaction.bits, io.mmReg)
xactionScheduler.io.nextXaction.ready := next.ready
next
}, 1, pipe = true)
val selectedCmd = WireInit(cmd_nop)
val memReqDone = (selectedCmd === cmd_casr || selectedCmd === cmd_casw)
// Trackers for controller-level structural hazards
val cmdBusBusy = Module(new DownCounter((maxDRAMTimingBits)))
cmdBusBusy.io.decr := true.B
// Trackers for bank-level hazards and timing violations
val rankStateTrackers = Seq.fill(cfg.dramKey.maxRanks)(Module(new RankStateTracker(cfg.dramKey)))
val currentRank = VecInit(rankStateTrackers map { _.io.rank })(currentReference.bits.rankAddr)
val bankMuxes = VecInit(rankStateTrackers map { tracker => tracker.io.rank.banks(currentReference.bits.bankAddr) })
val currentBank = WireInit(bankMuxes(currentReference.bits.rankAddr))
// Command scheduling logic
val cmdRow = currentReference.bits.rowAddr
val cmdRank = WireInit(UInt(cfg.dramKey.rankBits.W), init = currentReference.bits.rankAddr)
val cmdBank = WireInit(currentReference.bits.bankAddr)
val cmdBankOH = UIntToOH(cmdBank)
val currentRowHit = currentBank.state === bank_active && cmdRow === currentBank.openRow
val casAutoPRE = WireInit(false.B)
val canCASW = backend.io.newWrite.ready && currentReference.valid &&
currentRowHit && currentReference.bits.xaction.isWrite && currentBank.canCASW &&
currentRank.canCASW && !currentRank.wantREF
val canCASR = backend.io.newRead.ready && currentReference.valid && currentRowHit &&
!currentReference.bits.xaction.isWrite && currentBank.canCASR && currentRank.canCASR &&
!currentRank.wantREF
val refreshUnit = Module(new RefreshUnit(cfg.dramKey)).io
refreshUnit.ranksInUse := io.mmReg.rankAddr.maskToOH()
refreshUnit.rankStati.zip(rankStateTrackers) foreach { case (refInput, tracker) =>
refInput := tracker.io.rank }
when (refreshUnit.suggestREF) {
selectedCmd := cmd_ref
cmdRank := refreshUnit.refRankAddr
}.elsewhen (refreshUnit.suggestPRE) {
selectedCmd := cmd_pre
cmdRank := refreshUnit.preRankAddr
cmdBank := refreshUnit.preBankAddr
}.elsewhen(io.mmReg.openPagePolicy) {
when (canCASR) {
selectedCmd := cmd_casr
}.elsewhen (canCASW) {
selectedCmd := cmd_casw
}.elsewhen (currentReference.valid && currentBank.canACT && currentRank.canACT && !currentRank.wantREF) {
selectedCmd := cmd_act
}.elsewhen (currentReference.valid && !currentRowHit && currentBank.canPRE && currentRank.canPRE) {
selectedCmd := cmd_pre
}
}.otherwise {
when (canCASR) {
selectedCmd := cmd_casr
casAutoPRE := true.B
}.elsewhen (canCASW) {
selectedCmd := cmd_casw
casAutoPRE := true.B
}.elsewhen (currentReference.valid && currentBank.canACT && currentRank.canACT && !currentRank.wantREF) {
selectedCmd := cmd_act
}
}
rankStateTrackers.zip(UIntToOH(cmdRank).toBools) foreach { case (state, cmdUsesThisRank) =>
state.io.selectedCmd := selectedCmd
state.io.cmdBankOH := cmdBankOH
state.io.cmdRow := cmdRow
state.io.autoPRE := casAutoPRE
state.io.cmdUsesThisRank := cmdUsesThisRank
state.io.timings := timings
state.io.tCycle := tCycle
}
// TODO: sensible mapping to DRAM bus width
cmdBusBusy.io.set.bits := timings.tCMD - 1.U
cmdBusBusy.io.set.valid := (selectedCmd =/= cmd_nop)
currentReference.ready := memReqDone
backend.io.tCycle := tCycle
backend.io.newRead.bits := ReadResponseMetaData(currentReference.bits.xaction)
backend.io.newRead.valid := memReqDone && !currentReference.bits.xaction.isWrite
backend.io.readLatency := timings.tCAS + timings.tAL + io.mmReg.backendLatency
// For writes we send out the acknowledge immediately
backend.io.newWrite.bits := WriteResponseMetaData(currentReference.bits.xaction)
backend.io.newWrite.valid := memReqDone && currentReference.bits.xaction.isWrite
backend.io.writeLatency := 1.U
wResp <> backend.io.completedWrite
rResp <> backend.io.completedRead
// Dump the command stream
val cmdMonitor = Module(new CommandBusMonitor())
cmdMonitor.io.cmd := selectedCmd
cmdMonitor.io.rank := cmdRank
cmdMonitor.io.bank := cmdBank
cmdMonitor.io.row := cmdRow
cmdMonitor.io.autoPRE := casAutoPRE
val powerStats = (rankStateTrackers).zip(UIntToOH(cmdRank).toBools) map {
case (rankState, cmdUsesThisRank) =>
val powerMonitor = Module(new RankPowerMonitor(cfg.dramKey))
powerMonitor.io.selectedCmd := selectedCmd
powerMonitor.io.cmdUsesThisRank := cmdUsesThisRank
powerMonitor.io.rankState := rankState.io.rank
powerMonitor.io.stats
}
io.mmReg.rankPower := VecInit(powerStats)
}
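Setting aside the refresh unit (which always takes precedence), the command selection in FIFOMASModel boils down to a fixed priority chain: CAS-read, then CAS-write, then ACT, then a row-conflict PRE under the open-page policy only; under the closed-page policy, CAS commands auto-precharge instead. A simplified plain-Scala model of that chain (all names and flags here are illustrative, not from the source):

```scala
sealed trait DramCmd
case object CmdNop  extends DramCmd
case object CmdCasR extends DramCmd
case object CmdCasW extends DramCmd
case object CmdAct  extends DramCmd
case object CmdPre  extends DramCmd

// Returns the selected command and whether it auto-precharges.
// Under the closed-page policy, CAS commands auto-PRE and explicit
// row-conflict precharges are never scheduled.
def selectCmd(openPage: Boolean, canCASR: Boolean, canCASW: Boolean,
              canACT: Boolean, rowConflictPRE: Boolean): (DramCmd, Boolean) =
  if (canCASR) (CmdCasR, !openPage)
  else if (canCASW) (CmdCasW, !openPage)
  else if (canACT) (CmdAct, false)
  else if (openPage && rowConflictPRE) (CmdPre, false)
  else (CmdNop, false)
```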

View File

@ -0,0 +1,290 @@
package midas
package models
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle
import chisel3._
import chisel3.util._
import junctions._
import midas.widgets._
import Console.{UNDERLINED, RESET}
case class FirstReadyFCFSConfig(
dramKey: DramOrganizationParams,
schedulerWindowSize: Int,
transactionQueueDepth: Int,
backendKey: DRAMBackendKey = DRAMBackendKey(4, 4, DRAMMasEnums.maxDRAMTimingBits),
params: BaseParams)
extends DRAMBaseConfig {
def elaborate()(implicit p: Parameters): FirstReadyFCFSModel = Module(new FirstReadyFCFSModel(this))
}
class FirstReadyFCFSMMRegIO(val cfg: FirstReadyFCFSConfig) extends BaseDRAMMMRegIO(cfg) {
val schedulerWindowSize = Input(UInt(log2Ceil(cfg.schedulerWindowSize).W))
val transactionQueueDepth = Input(UInt(log2Ceil(cfg.transactionQueueDepth).W))
val registers = dramBaseRegisters ++ Seq(
(schedulerWindowSize -> RuntimeSetting(
default = cfg.schedulerWindowSize,
query = "Reference queue depth",
min = 1,
max = Some(cfg.schedulerWindowSize))),
transactionQueueDepth -> RuntimeSetting(
default = cfg.transactionQueueDepth,
query = "Transaction queue depth",
min = 1,
max = Some(cfg.transactionQueueDepth)))
def requestSettings() {
Console.println(s"Configuring First-Ready First-Come First Serve Model")
setBaseDRAMSettings()
}
}
class FirstReadyFCFSIO(val cfg: FirstReadyFCFSConfig)(implicit p: Parameters) extends TimingModelIO()(p){
val mmReg = new FirstReadyFCFSMMRegIO(cfg)
}
class FirstReadyFCFSModel(cfg: FirstReadyFCFSConfig)(implicit p: Parameters) extends TimingModel(cfg)(p)
with HasDRAMMASConstants {
val longName = "First-Ready FCFS MAS"
def printTimingModelGenerationConfig {}
/**************************** CHISEL BEGINS *********************************/
import DRAMMasEnums._
lazy val io = IO(new FirstReadyFCFSIO(cfg))
val timings = io.mmReg.dramTimings
val backend = Module(new DRAMBackend(cfg.backendKey))
val xactionScheduler = Module(new UnifiedFIFOXactionScheduler(cfg.transactionQueueDepth, cfg))
xactionScheduler.io.req <> nastiReq
xactionScheduler.io.pendingAWReq := pendingAWReq.value
xactionScheduler.io.pendingWReq := pendingWReq.value
// Trackers for controller-level structural hazards
val cmdBusBusy = Module(new DownCounter((maxDRAMTimingBits)))
cmdBusBusy.io.decr := true.B
// Forward declared wires
val selectedCmd = WireInit(cmd_nop)
val memReqDone = (selectedCmd === cmd_casr || selectedCmd === cmd_casw)
// Trackers for DRAM timing violations
val rankStateTrackers = Seq.fill(cfg.dramKey.maxRanks)(Module(new RankStateTracker(cfg.dramKey)))
// Prevents closing a row before a CAS command has been issued for the ready entry.
// Instead of counting ready entries, we keep one bit per bank to indicate presence:
// it is set on activation or when a new ready entry is enqueued, and cleared when a
// memory request retires the last ready entry
val bankHasReadyEntries = RegInit(VecInit(Seq.fill(cfg.dramKey.maxRanks * cfg.dramKey.maxBanks)(false.B)))
// State for the collapsing buffer of pending memory references
val newReference = Wire(Decoupled(new FirstReadyFCFSEntry(cfg)))
newReference.valid := xactionScheduler.io.nextXaction.valid
newReference.bits.decode(xactionScheduler.io.nextXaction.bits, io.mmReg)
// Mark that the new reference hits an open row buffer, in case it missed the broadcast
val rowHitsInRank = VecInit(rankStateTrackers map { tracker =>
VecInit(tracker.io.rank.banks map { _.isRowHit(newReference.bits)}).asUInt })
xactionScheduler.io.nextXaction.ready := newReference.ready
val refBuffer = CollapsingBuffer(
enq = newReference,
depth = cfg.schedulerWindowSize,
programmableDepth = Some(io.mmReg.schedulerWindowSize)
)
val refList = refBuffer.io.entries
val refUpdates = refBuffer.io.updates
// Selects the oldest candidate from all ready references that can legally request a CAS
val columnArbiter = Module(new Arbiter(refList.head.bits.cloneType, refList.size))
def checkRankBankLegality(getField: CommandLegalBools => Bool)(masEntry: FirstReadyFCFSEntry): Bool = {
val bankFields = rankStateTrackers map { rank => VecInit(rank.io.rank.banks map getField).asUInt }
val bankLegal = (Mux1H(masEntry.rankAddrOH, bankFields) & masEntry.bankAddrOH).orR
val rankFields = VecInit(rankStateTrackers map { rank => getField(rank.io.rank) }).asUInt
val rankLegal = (masEntry.rankAddrOH & rankFields).orR
rankLegal && bankLegal
}
def rankWantsRef(rankAddrOH: UInt): Bool =
(rankAddrOH & (VecInit(rankStateTrackers map { _.io.rank.wantREF }).asUInt)).orR
val canLegallyCASR = checkRankBankLegality( _.canCASR ) _
val canLegallyCASW = checkRankBankLegality(_.canCASW) _
val canLegallyACT = checkRankBankLegality(_.canACT) _
val canLegallyPRE = checkRankBankLegality(_.canPRE) _
columnArbiter.io.in <> refList.map({ entry =>
val candidate = V2D(entry)
val canCASR = canLegallyCASR(entry.bits) && backend.io.newRead.ready
val canCASW = canLegallyCASW(entry.bits) && backend.io.newWrite.ready
candidate.valid := entry.valid && entry.bits.isReady &&
Mux(entry.bits.xaction.isWrite, canCASW, canCASR) &&
!rankWantsRef(entry.bits.rankAddrOH)
candidate
})
val entryWantsPRE = refList map { ref => ref.valid && ref.bits.wantPRE() && canLegallyPRE(ref.bits) }
val entryWantsACT = refList map { ref => ref.valid && ref.bits.wantACT() && canLegallyACT(ref.bits) &&
!rankWantsRef(ref.bits.rankAddrOH) }
val preBank = PriorityMux(entryWantsPRE, refList.map(_.bits.bankAddr))
val preRank = PriorityMux(entryWantsPRE, refList.map(_.bits.rankAddr))
val suggestPre = entryWantsPRE reduce {_ || _}
val actRank = PriorityMux(entryWantsACT, refList.map(_.bits.rankAddr))
val actBank = PriorityMux(entryWantsACT, refList.map(_.bits.bankAddr))
val actRow = PriorityMux(entryWantsACT, refList.map(_.bits.rowAddr))
// See if the oldest pending row reference wants a PRE or an ACT
val suggestAct = (entryWantsACT.zip(entryWantsPRE)).foldRight(false.B)({
case ((act, pre), current) => Mux(act, true.B, !pre && current) })
// NB: These are not driven for all command types. For example, when issuing a CAS,
// cmdRow does not correspond to the row of the CAS command, since that row is
// implicit in the state of the bank.
val cmdBank = WireInit(UInt(cfg.dramKey.bankBits.W), init = preBank)
val cmdBankOH = UIntToOH(cmdBank)
val cmdRank = WireInit(UInt(cfg.dramKey.rankBits.W), init = columnArbiter.io.out.bits.rankAddr)
val cmdRow = actRow
val refreshUnit = Module(new RefreshUnit(cfg.dramKey)).io
refreshUnit.ranksInUse := io.mmReg.rankAddr.maskToOH()
refreshUnit.rankStati.zip(rankStateTrackers) foreach { case (refInput, tracker) =>
refInput := tracker.io.rank }
when (refreshUnit.suggestREF) {
selectedCmd := cmd_ref
cmdRank := refreshUnit.refRankAddr
}.elsewhen (refreshUnit.suggestPRE) {
selectedCmd := cmd_pre
cmdRank := refreshUnit.preRankAddr
cmdBank := refreshUnit.preBankAddr
}.elsewhen(columnArbiter.io.out.valid){
selectedCmd := Mux(columnArbiter.io.out.bits.xaction.isWrite, cmd_casw, cmd_casr)
cmdBank := columnArbiter.io.out.bits.bankAddr
}.elsewhen(suggestAct) {
selectedCmd := cmd_act
cmdRank := actRank
cmdBank := actBank
}.elsewhen(suggestPre) {
selectedCmd := cmd_pre
cmdRank := preRank
}
// Remove a reference if it is granted a column access
columnArbiter.io.out.ready := selectedCmd === cmd_casw || selectedCmd === cmd_casr
// Take the readies from the arbiter, and kill the selected entry
val entriesStillReady = refUpdates.zip(columnArbiter.io.in) map { case (ref, sel) =>
when (sel.fire()) { ref.valid := false.B }
// If the entry is not killed, but shares the same open row as the killed reference, return true
!sel.fire() && ref.valid && ref.bits.isReady &&
cmdBank === ref.bits.bankAddr && cmdRank === ref.bits.rankAddr
}
val otherReadyEntries = entriesStillReady reduce { _ || _ }
val casAutoPRE = Mux(io.mmReg.openPagePolicy, false.B, memReqDone && !otherReadyEntries)
// Mark new entries that now hit in a open row buffer
// Or invalidate them if a precharge was issued
refUpdates.foreach({ ref =>
when(cmdRank === ref.bits.rankAddr && cmdBank === ref.bits.bankAddr) {
when (selectedCmd === cmd_act) {
ref.bits.isReady := ref.bits.rowAddr === cmdRow
ref.bits.mayPRE := false.B
}.elsewhen (selectedCmd === cmd_pre) {
ref.bits.isReady := false.B
ref.bits.mayPRE := false.B
}.elsewhen (memReqDone && !otherReadyEntries) {
ref.bits.mayPRE := true.B
}
}
})
val newRefAddrMatch = newReference.bits.addrMatch(cmdRank, cmdBank, Some(cmdRow))
val newRefBankAddrMatch = newReference.bits.addrMatch(cmdRank, cmdBank)
newReference.bits.isReady := // 1) Row just opened or 2) already open && No precharges to that row this cycle
selectedCmd === cmd_act && newRefAddrMatch ||
(rowHitsInRank(newReference.bits.rankAddr) & newReference.bits.bankAddrOH).orR &&
!(memReqDone && casAutoPRE && newRefBankAddrMatch) && !(selectedCmd === cmd_pre && newRefBankAddrMatch)
// Useful only for the open-page policy. In the closed-page policy, precharges
// are always issued as part of auto-PRE commands or in preparation for refresh.
newReference.bits.mayPRE := // Last ready reference serviced or no other ready entries
Mux(io.mmReg.openPagePolicy,
// 1:The last ready request has been made to the bank
newReference.bits.addrMatch(cmdRank, cmdBank) && memReqDone && !otherReadyEntries ||
// 2: There are no ready references, and a precharge is not being issued to the bank this cycle
!bankHasReadyEntries(Cat(newReference.bits.rankAddr, newReference.bits.bankAddr)) &&
!(selectedCmd === cmd_pre && newRefBankAddrMatch),
false.B)
// Check if the broadcasted cmdBank and cmdRank hit a ready entry
when(memReqDone || selectedCmd === cmd_act) {
bankHasReadyEntries(Cat(cmdRank, cmdBank)) := memReqDone && otherReadyEntries || selectedCmd === cmd_act
}
when (newReference.bits.isReady & newReference.fire()){
bankHasReadyEntries(Cat(newReference.bits.rankAddr, newReference.bits.bankAddr)) := true.B
}
rankStateTrackers.zip(UIntToOH(cmdRank).toBools) foreach { case (state, cmdUsesThisRank) =>
state.io.selectedCmd := selectedCmd
state.io.cmdBankOH := cmdBankOH
state.io.cmdRow := cmdRow
state.io.autoPRE := casAutoPRE
state.io.cmdUsesThisRank := cmdUsesThisRank
state.io.timings := timings
state.io.tCycle := tCycle
}
cmdBusBusy.io.set.bits := timings.tCMD - 1.U
cmdBusBusy.io.set.valid := selectedCmd =/= cmd_nop
backend.io.tCycle := tCycle
backend.io.newRead.bits := ReadResponseMetaData(columnArbiter.io.out.bits.xaction)
backend.io.newRead.valid := memReqDone && !columnArbiter.io.out.bits.xaction.isWrite
backend.io.readLatency := timings.tCAS + timings.tAL + io.mmReg.backendLatency
// For writes we send out the acknowledge immediately
backend.io.newWrite.bits := WriteResponseMetaData(columnArbiter.io.out.bits.xaction)
backend.io.newWrite.valid := memReqDone && columnArbiter.io.out.bits.xaction.isWrite
backend.io.writeLatency := 1.U
wResp <> backend.io.completedWrite
rResp <> backend.io.completedRead
// Dump the cmd stream
val cmdMonitor = Module(new CommandBusMonitor())
cmdMonitor.io.cmd := selectedCmd
cmdMonitor.io.rank := cmdRank
cmdMonitor.io.bank := cmdBank
cmdMonitor.io.row := cmdRow
cmdMonitor.io.autoPRE := casAutoPRE
val powerStats = (rankStateTrackers).zip(UIntToOH(cmdRank).toBools) map {
case (rankState, cmdUsesThisRank) =>
val powerMonitor = Module(new RankPowerMonitor(cfg.dramKey))
powerMonitor.io.selectedCmd := selectedCmd
powerMonitor.io.cmdUsesThisRank := cmdUsesThisRank
powerMonitor.io.rankState := rankState.io.rank
powerMonitor.io.stats
}
io.mmReg.rankPower := VecInit(powerStats)
}


@ -0,0 +1,160 @@
package midas
package models
// From RC
import freechips.rocketchip.config.{Parameters}
import freechips.rocketchip.util.{DecoupledHelper}
import junctions._
import chisel3._
import chisel3.util.{Queue}
import midas.core.{HostDecoupled}
import midas.widgets.{SatUpDownCounter}
// The ingress module queues up incoming target requests, and issues them to the
// host memory system.
// NB: The AXI4 imposes no ordering between in flight reads and writes. In the
// event the target-master issues a read and a write to an overlapping memory
// region, host-memory-system reorderings of those requests will result in
// non-deterministic target behavior.
//
// Asserting io.relaxed = true allows the ingress unit to issue requests ASAP. This
// is a safe optimization only for non-chump-city AXI4 masters.
//
// Asserting io.relaxed = false forces the ingress unit to pessimistically
// issue host-memory requests to prevent reorderings, by waiting for the
// host-memory system to go idle before 1) issuing any write, and 2) issuing a
// read if there is a write in flight. (I did not want to add the extra
// complexity of tracking inflight addresses.) This has the effect of forcing
// reads to see the value of the youngest write for which the AW and all W
// beats have been accepted, but no write acknowledgement has been issued.
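As a rough sketch (hypothetical Python, with illustrative names; not part of the module), the two issue-gating policies described above amount to:

```python
def can_issue_read(relaxed, host_idle, read_inflight, oldest_is_read):
    # Relaxed mode: issue the read as soon as it is available.
    if relaxed:
        return True
    # Strict mode: only issue if the host memory system is idle (or is only
    # servicing reads) and the oldest outstanding request is this read.
    return (host_idle or read_inflight) and oldest_is_read

def can_issue_write(relaxed, have_complete_write, host_idle, oldest_is_write):
    # Relaxed mode: issue once the full AW + W transaction has been collected.
    if relaxed:
        return have_complete_write
    # Strict mode: additionally wait for the host memory system to go idle.
    return host_idle and oldest_is_write
```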
trait IngressModuleParameters {
val cfg: BaseConfig
implicit val p: Parameters
  // In general, the only consequence of undersizing these queues is extra
  // wasted host cycles while the model waits for them to drain
val ingressAWQdepth = cfg.maxWrites
val ingressWQdepth = 2*cfg.maxWriteLength
val ingressARQdepth = 4
  // DEADLOCK RISK: if the host memory system accepts only one AW while a W
  // xaction is in flight, and the entire W transaction is not available in the
  // ingress module, the host memory system will drain the WQueue without
  // consuming another AW token. The target will remain stalled and cannot
  // complete the W xaction.
require(ingressWQdepth >= cfg.maxWriteLength)
require(ingressAWQdepth >= cfg.maxWrites)
}
class IngressModule(val cfg: BaseConfig)(implicit val p: Parameters) extends Module
with IngressModuleParameters {
val io = IO(new Bundle {
// This is target valid and not decoupled because the model has handshaked
// the target-level channels already for us
val nastiInputs = Flipped(HostDecoupled((new ValidNastiReqChannels)))
val nastiOutputs = new NastiReqChannels
val relaxed = Input(Bool())
val host_mem_idle = Input(Bool())
val host_read_inflight = Input(Bool())
})
val awQueue = Module(new Queue(new NastiWriteAddressChannel, ingressAWQdepth))
val wQueue = Module(new Queue(new NastiWriteDataChannel, ingressWQdepth))
val arQueue = Module(new Queue(new NastiReadAddressChannel, ingressARQdepth))
// Host request gating -- wait until we have a complete W transaction before
// we issue it.
val wCredits = SatUpDownCounter(cfg.maxWrites)
wCredits.inc := awQueue.io.enq.fire()
wCredits.dec := wQueue.io.deq.fire() && wQueue.io.deq.bits.last
val awCredits = SatUpDownCounter(cfg.maxWrites)
awCredits.inc := wQueue.io.enq.fire() && wQueue.io.enq.bits.last
awCredits.dec := awQueue.io.deq.fire()
// All the sources of host stalls
val tFireHelper = DecoupledHelper(
io.nastiInputs.hValid,
awQueue.io.enq.ready,
wQueue.io.enq.ready,
arQueue.io.enq.ready)
val ingressUnitStall = !tFireHelper.fire(io.nastiInputs.hValid)
// A request is finished when we have both a complete AW and W request
// Only then can we consider issuing the write to host memory system
//
// When we aren't relaxing the ordering, we repurpose the credit counters to
// simply count the number of complete W and AW requests.
val write_req_done = ((awCredits.value > wCredits.value) && wCredits.inc) ||
((awCredits.value < wCredits.value) && awCredits.inc) ||
awCredits.inc && wCredits.inc
when (!io.relaxed) {
Seq(awCredits, wCredits) foreach { _.dec := write_req_done }
}
val read_req_done = arQueue.io.enq.fire()
  // FIFO that tracks the relative order of reads and writes as they are received
// bit 0 = Read, bit 1 = Write
val xaction_order = Module(new DualQueue(Bool(), cfg.maxReads + cfg.maxWrites))
xaction_order.io.enqA.valid := read_req_done
xaction_order.io.enqA.bits := true.B
xaction_order.io.enqB.valid := write_req_done
xaction_order.io.enqB.bits := false.B
val do_hread = io.relaxed ||
(io.host_mem_idle || io.host_read_inflight) && xaction_order.io.deq.valid && xaction_order.io.deq.bits
val do_hwrite = Mux(io.relaxed, !awCredits.empty,
io.host_mem_idle && xaction_order.io.deq.valid && !xaction_order.io.deq.bits)
xaction_order.io.deq.ready := io.nastiOutputs.ar.fire || io.nastiOutputs.aw.fire
val do_hwrite_data_reg = RegInit(false.B)
when (io.nastiOutputs.aw.fire) {
do_hwrite_data_reg := true.B
}.elsewhen (io.nastiOutputs.w.fire && io.nastiOutputs.w.bits.last) {
do_hwrite_data_reg := false.B
}
val do_hwrite_data = Mux(io.relaxed, !wCredits.empty, do_hwrite_data_reg)
io.nastiInputs.hReady := !ingressUnitStall
arQueue.io.enq.bits := io.nastiInputs.hBits.ar.bits
arQueue.io.enq.valid := tFireHelper.fire(arQueue.io.enq.ready) && io.nastiInputs.hBits.ar.valid
io.nastiOutputs.ar <> arQueue.io.deq
io.nastiOutputs.ar.valid := do_hread && arQueue.io.deq.valid
arQueue.io.deq.ready := do_hread && io.nastiOutputs.ar.ready
awQueue.io.enq.bits := io.nastiInputs.hBits.aw.bits
awQueue.io.enq.valid := tFireHelper.fire(awQueue.io.enq.ready) && io.nastiInputs.hBits.aw.valid
wQueue.io.enq.bits := io.nastiInputs.hBits.w.bits
wQueue.io.enq.valid := tFireHelper.fire(wQueue.io.enq.ready) && io.nastiInputs.hBits.w.valid
io.nastiOutputs.aw.bits := awQueue.io.deq.bits
io.nastiOutputs.w.bits := wQueue.io.deq.bits
io.nastiOutputs.aw.valid := do_hwrite && awQueue.io.deq.valid
awQueue.io.deq.ready := do_hwrite && io.nastiOutputs.aw.ready
io.nastiOutputs.w.valid := do_hwrite_data && wQueue.io.deq.valid
wQueue.io.deq.ready := do_hwrite_data && io.nastiOutputs.w.ready
// Deadlock checks.
assert(!(wQueue.io.enq.valid && !wQueue.io.enq.ready &&
Mux(io.relaxed, wCredits.empty, !xaction_order.io.deq.valid)),
"DEADLOCK: Timing model requests w enqueue, but wQueue is full and cannot drain")
assert(!(awQueue.io.enq.valid && !awQueue.io.enq.ready &&
Mux(io.relaxed, awCredits.empty, !xaction_order.io.deq.valid)),
"DEADLOCK: Timing model requests aw enqueue, but is awQueue is full and cannot drain")
}
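The write_req_done expression above fires on the cycle the later half of a write (the AW or the final W beat) arrives. A small Python model of that credit-counter comparison (counter names are illustrative):

```python
class SatCounter:
    """Saturating up-counter, loosely mirroring SatUpDownCounter."""
    def __init__(self, maximum):
        self.value = 0
        self.maximum = maximum

    def inc(self):
        if self.value < self.maximum:
            self.value += 1

def write_req_done(aw_done, w_done, aw_inc, w_inc):
    # A write completes when one side already leads and the lagging side
    # arrives this cycle, or when both halves arrive simultaneously.
    return (((aw_done.value > w_done.value) and w_inc) or
            ((aw_done.value < w_done.value) and aw_inc) or
            (aw_inc and w_inc))

# Drive it with a short event trace: AW first, then W-last, then both at once.
aw, w = SatCounter(8), SatCounter(8)
events = [(True, False), (False, True), (True, True)]  # (aw_inc, w_inc) per cycle
done_cycles = []
for cycle, (aw_inc, w_inc) in enumerate(events):
    if write_req_done(aw, w, aw_inc, w_inc):
        done_cycles.append(cycle)
    if aw_inc:
        aw.inc()
    if w_inc:
        w.inc()
```

The first write completes on cycle 1 (when its W arrives after the AW) and the second on cycle 2 (both halves in the same cycle).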


@ -0,0 +1,73 @@
package midas
package models
import chisel3._
import chisel3.util._
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.ParameterizedBundle
import junctions._
class NastiReqChannels(implicit val p: Parameters) extends ParameterizedBundle {
val aw = Decoupled(new NastiWriteAddressChannel)
val w = Decoupled(new NastiWriteDataChannel)
val ar = Decoupled(new NastiReadAddressChannel)
def fromNasti(n: NastiIO): Unit = {
aw <> n.aw
ar <> n.ar
w <> n.w
}
}
object NastiReqChannels {
def apply(nasti: NastiIO)(implicit p: Parameters): NastiReqChannels = {
val w = Wire(new NastiReqChannels)
w.ar <> nasti.ar
w.aw <> nasti.aw
w.w <> nasti.w
w
}
}
class ValidNastiReqChannels(implicit val p: Parameters) extends ParameterizedBundle {
val aw = Valid(new NastiWriteAddressChannel)
val w = Valid(new NastiWriteDataChannel)
val ar = Valid(new NastiReadAddressChannel)
}
class NastiRespChannels(implicit val p: Parameters) extends ParameterizedBundle {
val b = Decoupled(new NastiWriteResponseChannel)
val r = Decoupled(new NastiReadDataChannel)
}
// Target-level interface
class EgressReq(implicit val p: Parameters) extends ParameterizedBundle
with HasNastiParameters {
val b = Valid(UInt(nastiWIdBits.W))
val r = Valid(UInt(nastiRIdBits.W))
}
// Target-level interface
class EgressResp(implicit val p: Parameters) extends ParameterizedBundle {
val bBits = Output(new NastiWriteResponseChannel)
val bReady = Input(Bool())
val rBits = Output(new NastiReadDataChannel)
val rReady = Input(Bool())
}
// Contains the metadata required to track a transaction as it is requested from the egress unit
class CurrentReadResp(implicit val p: Parameters) extends ParameterizedBundle
with HasNastiParameters {
val id = UInt(nastiRIdBits.W)
val len = UInt(nastiXLenBits.W)
}
class CurrentWriteResp(implicit val p: Parameters) extends ParameterizedBundle
with HasNastiParameters {
val id = UInt(nastiRIdBits.W)
}
class MemModelTargetIO(implicit val p: Parameters) extends ParameterizedBundle {
val nasti = new NastiIO
val reset = Output(Bool())
}


@ -0,0 +1,438 @@
package midas
package models
// NOTE: This LLC model is a *very* crude model of a cache that simply forwards
// misses onto the DRAM model, while short-circuiting hits.
import junctions._
import midas.core._
import midas.widgets._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.{ParameterizedBundle, MaskGen, UIntToOH1}
import chisel3._
import chisel3.util._
import scala.math.min
import Console.{UNDERLINED, RESET}
import java.io.{File, FileWriter}
// State to track reads to DRAM, ~loosely an MSHR
class MSHR(llcKey: LLCParams)(implicit p: Parameters) extends NastiBundle()(p) {
val set_addr = UInt(llcKey.sets.maxBits.W)
val xaction = new TransactionMetaData
val wb_in_flight = Bool()
val acq_in_flight = Bool()
val enabled = Bool() // Set by a runtime configuration register
def valid(): Bool = (wb_in_flight || acq_in_flight) && enabled
def available(): Bool = !valid && enabled
def setCollision(set_addr: UInt): Bool = (set_addr === this.set_addr) && valid
// Call on a MSHR register; sets all pertinent fields (leaving enabled untouched)
def allocate(
new_xaction: TransactionMetaData,
new_set_addr: UInt,
do_acq: Bool,
do_wb: Bool = false.B)(implicit p: Parameters): Unit = {
set_addr := new_set_addr
wb_in_flight := do_wb
acq_in_flight := do_acq
xaction := new_xaction
}
override def cloneType = new MSHR(llcKey)(p).asInstanceOf[this.type]
}
object MSHR {
def apply(llcKey: LLCParams)(implicit p: Parameters): MSHR = {
val w = Wire(new MSHR(llcKey))
w.wb_in_flight := false.B
w.acq_in_flight := false.B
// Initialize to enabled to play nice with assertions
w.enabled := true.B
w.xaction := DontCare
w.set_addr := DontCare
w
}
}
class BlockMetadata(tagBits: Int) extends Bundle {
val tag = UInt(tagBits.W)
val valid = Bool()
val dirty = Bool()
override def cloneType = new BlockMetadata(tagBits).asInstanceOf[this.type]
}
class LLCProgrammableSettings(llcKey: LLCParams) extends Bundle
with HasProgrammableRegisters with HasConsoleUtils {
val wayBits = Input(UInt(log2Ceil(llcKey.ways.maxBits).W))
val setBits = Input(UInt(log2Ceil(llcKey.sets.maxBits).W))
val blockBits = Input(UInt(log2Ceil(llcKey.blockBytes.maxBits).W))
val activeMSHRs = Input(UInt(log2Ceil(llcKey.mshrs.max + 1).W))
// Instrumentation
val misses = Output(UInt(32.W)) // Total accesses is provided by (totalReads + totalWrites)
val writebacks = Output(UInt(32.W)) // Number of dirty lines returned to DRAM
val refills = Output(UInt(32.W)) // Number of clean lines requested from DRAM
val peakMSHRsUsed = Output(UInt(log2Ceil(llcKey.mshrs.max+1).W)) // Peak number of MSHRs used
// Note short-burst writes will produce a refill, whereas releases from caches will not
val registers = Seq(
wayBits -> RuntimeSetting(llcKey.ways.maxBits, "Log2(ways per set)"),
    setBits   -> RuntimeSetting(llcKey.sets.maxBits, "Log2(sets per bank)"),
    blockBits -> RuntimeSetting(llcKey.blockBytes.maxBits, "Log2(cache-block bytes)"),
activeMSHRs -> RuntimeSetting(llcKey.mshrs.max, "Number of MSHRs", min = 1, max = Some(llcKey.mshrs.max))
)
def maskTag(addr: UInt): UInt = (addr >> (blockBits +& setBits))
def maskSet(addr: UInt): UInt = ((addr >> blockBits) & ((1.U << setBits) - 1.U))(llcKey.sets.maxBits-1, 0)
def regenPhysicalAddress(set_addr: UInt, tag_addr: UInt): UInt =
(set_addr << (blockBits)) |
(tag_addr << (blockBits +& setBits))
def setLLCSettings(bytesPerBlock: Option[Int] = None): Unit = {
Console.println(s"\n${UNDERLINED}Last-Level Cache Settings${RESET}")
regMap(blockBits).set(log2Ceil(requestInput("Block size in bytes",
default = llcKey.blockBytes.max,
min = Some(llcKey.blockBytes.min),
max = Some(llcKey.blockBytes.max))))
regMap(setBits).set(log2Ceil(requestInput("Number of sets in LLC",
default = llcKey.sets.max,
min = Some(llcKey.sets.min),
max = Some(llcKey.sets.max))))
regMap(wayBits).set(log2Ceil(requestInput("Set associativity",
default = llcKey.ways.max,
min = Some(llcKey.ways.min),
max = Some(llcKey.ways.max))))
}
}
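The maskTag/maskSet/regenPhysicalAddress helpers above decompose a physical address into tag and set fields at runtime-programmable bit positions, and reassemble a block-aligned address for writebacks. A hypothetical Python restatement of the bit arithmetic:

```python
def mask_tag(addr, block_bits, set_bits):
    # Tag is everything above the set and block-offset fields
    return addr >> (block_bits + set_bits)

def mask_set(addr, block_bits, set_bits):
    # Set index sits just above the block offset
    return (addr >> block_bits) & ((1 << set_bits) - 1)

def regen_physical_address(set_addr, tag_addr, block_bits, set_bits):
    # The block-offset bits are not retained, so the regenerated address
    # is always block-aligned -- exactly what a whole-line writeback needs.
    return (set_addr << block_bits) | (tag_addr << (block_bits + set_bits))
```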
case class WRange(min: Int, max: Int) {
def minBits: Int = log2Ceil(min)
def maxBits: Int = log2Ceil(max)
override def toString(): String = s"[${min},${max}]"
}
case class LLCParams(
ways: WRange = WRange(1, 8),
sets: WRange = WRange(32, 4096),
blockBytes: WRange = WRange(8, 128),
mshrs: WRange = WRange(1, 8)// TODO: check against AXI ID width
) {
def maxTagBits(addrWidth: Int): Int = addrWidth - blockBytes.minBits - sets.minBits
def print(): Unit = {
println(" LLC Parameters:")
println(" Sets: " + sets)
println(" Associativity: " + ways)
println(" Block Size (B): " + blockBytes)
println(" MSHRs: " + mshrs)
println(" Replacement Policy: Random\n")
}
}
class LLCModelIO(val key: LLCParams)(implicit val p: Parameters) extends Bundle {
val req = Flipped(new NastiReqChannels)
val wResp = Decoupled(new WriteResponseMetaData) // to backend
val rResp = Decoupled(new ReadResponseMetaData)
val memReq = new NastiReqChannels // to backing DRAM model
val memRResp = Flipped(Decoupled(new ReadResponseMetaData)) // from backing DRAM model
val memWResp = Flipped(Decoupled(new WriteResponseMetaData))
// LLC runtime configuration
val settings = new LLCProgrammableSettings(key)
}
class LLCModel(cfg: BaseConfig)(implicit p: Parameters) extends NastiModule()(p) {
val llcKey = cfg.params.llcKey.get
val io = IO(new LLCModelIO(llcKey))
require(log2Ceil(llcKey.mshrs.max) <= nastiXIdBits, "Can have at most one MSHR per AXI ID")
val maxTagBits = llcKey.maxTagBits(nastiXAddrBits)
val way_addr_mask = Reverse(MaskGen((llcKey.ways.max - 1).U, io.settings.wayBits, llcKey.ways.max))
  // Rely on initialization of the BRAM to 0 during programming to unset all valid bits in the md_array
val md_array = SyncReadMem(llcKey.sets.max, Vec(llcKey.ways.max, new BlockMetadata(maxTagBits)))
val d_array_busy = Module(new DownCounter(8))
d_array_busy.io.set.valid := false.B
d_array_busy.io.set.bits := DontCare
d_array_busy.io.decr := false.B
val mshr_mask_vec = UIntToOH1(io.settings.activeMSHRs, llcKey.mshrs.max).toBools
val mshrs = RegInit(VecInit(Seq.fill(llcKey.mshrs.max)(MSHR(llcKey))))
// Enable only active MSHRs as requested in the runtime configuration
mshrs.zipWithIndex.foreach({ case (m, idx) => m.enabled := mshr_mask_vec(idx) })
val mshr_available = mshrs.exists({m: MSHR => m.available() })
val mshr_next_idx = mshrs.indexWhere({ m: MSHR => m.available() })
// TODO: Put this on a switch
val mshrs_allocated = mshrs.count({m: MSHR => m.valid})
assert((mshrs_allocated < io.settings.activeMSHRs) || !mshr_available,
"Too many runtime MSHRs exposed given runtime programmable limit")
assert((mshrs_allocated === io.settings.activeMSHRs) || mshr_available,
"Too few runtime MSHRs exposed given runtime programmable limit")
val s2_ar_mem = Module(new Queue(new NastiReadAddressChannel, 2))
val s2_aw_mem = Module(new Queue(new NastiWriteAddressChannel, 2))
val miss_resource_hazard = !mshr_available || !s2_aw_mem.io.enq.ready || !s2_ar_mem.io.enq.ready
val reads = Queue(io.req.ar)
val read_set = io.settings.maskSet(reads.bits.addr)
val read_set_collision = mshrs.exists({ m: MSHR => m.setCollision(read_set) })
val can_deq_read = reads.valid && !read_set_collision && !miss_resource_hazard && io.rResp.ready
val writes = Queue(io.req.aw)
val write_set = io.settings.maskSet(writes.bits.addr)
val write_set_collision = mshrs.exists({ m: MSHR => m.setCollision(write_set) })
val can_deq_write = writes.valid && !write_set_collision && !miss_resource_hazard && mshr_available && io.wResp.ready
val llc_idle :: llc_r_mdaccess :: llc_r_wb :: llc_r_daccess :: llc_w_mdaccess :: llc_w_wb :: llc_w_daccess :: llc_refill :: Nil = Enum(8)
val state = RegInit(llc_idle)
val refill_start = WireInit(false.B)
val read_start = WireInit(false.B)
val write_start = WireInit(false.B)
val set_addr = Mux(write_start, write_set, read_set)
val tag_addr = io.settings.maskTag(Mux(write_start, writes.bits.addr, reads.bits.addr))
// S1 = Tag matches, replacement candidate selection, and replacement policy update
val s1_tag_addr = RegNext(tag_addr)
val s1_set_addr = RegNext(set_addr)
val s1_valid = state === llc_r_mdaccess || state === llc_w_mdaccess
val s1_metadata = {
import Chisel._
md_array.read(set_addr, read_start || write_start)
}
def isHit(m: BlockMetadata): Bool = m.valid && (m.tag === s1_tag_addr)
val hit_ways = VecInit(s1_metadata.map(isHit)).asUInt & way_addr_mask
val hit_way_sel = PriorityEncoderOH(hit_ways)
val hit_valid = hit_ways.orR
def isEmptyWay(m: BlockMetadata): Bool = !m.valid
val empty_ways = VecInit(s1_metadata.map(isEmptyWay)).asUInt & way_addr_mask
val empty_way_sel = PriorityEncoderOH(empty_ways)
val empty_valid = empty_ways.orR
val fill_empty_way = !hit_valid && empty_valid
val lsfr = LFSR16(true.B)
val evict_way_sel = UIntToOH(lsfr(llcKey.ways.maxBits - 1, 0) & ((1.U << io.settings.wayBits) - 1.U))
val evict_way_is_dirty = (VecInit(s1_metadata.map(_.dirty)).asUInt & evict_way_sel).orR
val evict_way_tag = Mux1H(evict_way_sel, s1_metadata.map(_.tag))
val do_evict = !hit_valid && !empty_valid
val evict_dirty_way = do_evict && evict_way_is_dirty
val dirty_line_addr = io.settings.regenPhysicalAddress(s1_set_addr, evict_way_tag)
val selected_way_OH = Mux(hit_valid, hit_way_sel, Mux(empty_valid, empty_way_sel, evict_way_sel)).toBools
val md_update = s1_metadata.zip(selected_way_OH) map { case (md, sel) =>
val next = WireInit(md)
when (sel) {
when (fill_empty_way) {
next.valid := true.B
}
when(state === llc_w_mdaccess) {
next.dirty := true.B
// This also assumes that all md fields in invalid ways are initialized
// to zero during programming. Otherwise we'd need to unset the dirty bit
// on a compulsory miss
}.elsewhen(state === llc_r_mdaccess && do_evict) {
next.dirty := false.B
}
when(do_evict || fill_empty_way) {
next.tag := s1_tag_addr
}
}
next
}
when (s1_valid) {
md_array.write(s1_set_addr, VecInit(md_update))
}
// FIXME: Inner and outer widths are the same
val block_beats = (1.U << (io.settings.blockBits - log2Ceil(nastiXDataBits/8).U))
// AXI4 length; subtract 1
val axi4_block_len = block_beats - 1.U
val read_triggered_refill = state === llc_r_mdaccess && !hit_valid
val write_triggered_refill = state === llc_w_mdaccess && (writes.bits.len < axi4_block_len) &&
!hit_valid
val need_refill = read_triggered_refill || write_triggered_refill
val need_writeback = s1_valid && evict_dirty_way
val allocate_mshr = need_refill || need_writeback
when(allocate_mshr) {
mshrs(mshr_next_idx).allocate(
new_xaction = Mux(state === llc_r_mdaccess,
TransactionMetaData(reads.bits),
TransactionMetaData(writes.bits)),
new_set_addr = s1_set_addr,
do_acq = need_refill,
do_wb = need_writeback)
}
// Refill Issue
// For now always fetch whole cache lines from DRAM, even if fewer beats are required for
// a write-triggered refill
val current_line_addr = io.settings.regenPhysicalAddress(s1_set_addr, s1_tag_addr)
s2_ar_mem.io.enq.bits := NastiReadAddressChannel(
addr = current_line_addr,
id = mshr_next_idx,
size = log2Ceil(nastiXDataBits/8).U,
len = axi4_block_len)
s2_ar_mem.io.enq.valid := need_refill
reads.ready := (state === llc_r_mdaccess)
// Writeback Issue
s2_aw_mem.io.enq.bits := NastiWriteAddressChannel(
addr = dirty_line_addr,
id = mshr_next_idx,
size = log2Ceil(nastiXDataBits/8).U,
len = axi4_block_len)
s2_aw_mem.io.enq.valid := need_writeback
writes.ready := io.req.w.bits.last && io.req.w.fire
io.memReq.ar <> s2_ar_mem.io.deq
io.memReq.aw <> s2_aw_mem.io.deq
io.memReq.w.valid := (state === llc_r_wb || state === llc_w_wb)
io.memReq.w.bits.last := d_array_busy.io.idle
// Handle responses from DRAM
when (io.memWResp.valid) {
mshrs(io.memWResp.bits.id).wb_in_flight := false.B
}
io.memWResp.ready := true.B
when (refill_start) {
mshrs(io.memRResp.bits.id).acq_in_flight := false.B
}
val can_refill = io.memRResp.valid &&
(mshrs(io.memRResp.bits.id).xaction.isWrite || io.rResp.ready)
io.memRResp.ready := refill_start
// Data-array hazard tracking
when (((state === llc_w_mdaccess || state === llc_r_mdaccess) && evict_dirty_way) ||
refill_start) {
d_array_busy.io.set.valid := true.B
d_array_busy.io.set.bits := axi4_block_len
}.elsewhen (state === llc_r_mdaccess && hit_valid) {
d_array_busy.io.set.valid := true.B
d_array_busy.io.set.bits := reads.bits.len
}.elsewhen (state === llc_w_mdaccess && (hit_valid || empty_valid) ||
state === llc_w_wb && (io.memReq.w.fire && io.memReq.w.bits.last)) {
d_array_busy.io.set.valid := true.B
d_array_busy.io.set.bits := writes.bits.len
}
d_array_busy.io.decr := Mux(state === llc_w_wb || state === llc_r_wb,
io.memReq.w.fire,
Mux(state === llc_w_daccess, io.req.w.valid, true.B))
io.req.w.ready := (state === llc_w_daccess) || (state === llc_w_mdaccess && !evict_dirty_way)
io.rResp.valid := (refill_start && !mshrs(io.memRResp.bits.id).xaction.isWrite) ||
(state === llc_r_mdaccess && hit_valid)
io.rResp.bits := Mux(refill_start,
ReadResponseMetaData(mshrs(io.memRResp.bits.id).xaction),
ReadResponseMetaData(reads.bits))
io.wResp.valid := (state === llc_w_mdaccess || state === llc_w_daccess) &&
io.req.w.fire && io.req.w.bits.last
io.wResp.bits := WriteResponseMetaData(writes.bits)
switch (state) {
is (llc_idle) {
when (can_refill) {
state := llc_refill
refill_start := true.B
}.elsewhen(can_deq_read) {
state := llc_r_mdaccess
read_start := true.B
}.elsewhen(can_deq_write) {
state := llc_w_mdaccess
write_start := true.B
}
}
is (llc_r_mdaccess) {
when (hit_valid) {
when(reads.bits.len =/= 0.U) {
state := llc_r_daccess
}.otherwise {
state := llc_idle
}
}.elsewhen (evict_dirty_way) {
state := llc_r_wb
}.otherwise {
state := llc_idle
}
}
is (llc_w_mdaccess) {
when (!evict_dirty_way) {
when (io.req.w.valid && io.req.w.bits.last) {
state := llc_idle
}.otherwise {
state := llc_w_daccess
}
}.otherwise {
state := llc_w_wb
}
}
is (llc_r_wb) {
when(io.memReq.w.fire && io.memReq.w.bits.last) {
state := llc_idle
}
}
is (llc_w_wb) {
when(io.memReq.w.fire && io.memReq.w.bits.last) {
state := llc_w_daccess
}
}
is (llc_w_daccess) {
when (io.req.w.valid && io.req.w.bits.last) {
state := llc_idle
}
}
is (llc_r_daccess) {
when (d_array_busy.io.current === 1.U) {
state := llc_idle
}
}
is (llc_refill) {
when (d_array_busy.io.current === 1.U) {
state := llc_idle
}
}
}
// Instrumentation
val miss_count = RegInit(0.U(32.W))
when (s1_valid && !hit_valid) { miss_count := miss_count + 1.U }
io.settings.misses := miss_count
val wb_count = RegInit(0.U(32.W))
when (s1_valid && evict_dirty_way) { wb_count := wb_count + 1.U }
io.settings.writebacks := wb_count
val refill_count = RegInit(0.U(32.W))
when (state === llc_r_mdaccess && !hit_valid) { refill_count := refill_count + 1.U }
io.settings.refills := refill_count
val peak_mshrs_used = RegInit(0.U(log2Ceil(llcKey.mshrs.max + 1).W))
when (peak_mshrs_used < mshrs_allocated) { peak_mshrs_used := mshrs_allocated }
io.settings.peakMSHRsUsed := peak_mshrs_used
}
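The way selection above has a fixed priority: a tag hit wins, otherwise an empty (invalid) way is filled, otherwise a way is chosen for eviction (via an LFSR in the model), triggering a writeback if the victim is dirty. A hypothetical Python sketch of that priority chain:

```python
import random

def select_way(metadata, tag, way_mask, rng=random):
    """metadata: list of (valid, dirty, tag) per way. Returns (way, action),
    mirroring the model's priority: hit > fill-empty > random eviction."""
    enabled = [i for i, _ in enumerate(metadata) if (way_mask >> i) & 1]
    # 1) Tag match among enabled, valid ways
    for i in enabled:
        valid, _, t = metadata[i]
        if valid and t == tag:
            return i, "hit"
    # 2) Fill the first empty (invalid) way
    for i in enabled:
        if not metadata[i][0]:
            return i, "fill"
    # 3) Evict a random enabled way (the model uses an LFSR instead)
    victim = rng.choice(enabled)
    return victim, "writeback" if metadata[victim][1] else "evict"
```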


@ -0,0 +1,83 @@
package midas
package models
import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.ParameterizedBundle
import junctions._
import midas.widgets._
import Console.{UNDERLINED, RESET}
case class LatencyPipeConfig(params: BaseParams) extends BaseConfig {
def elaborate()(implicit p: Parameters): LatencyPipe = Module(new LatencyPipe(this))
}
class LatencyPipeMMRegIO(cfg: BaseConfig)(implicit p: Parameters) extends SplitTransactionMMRegIO(cfg){
val readLatency = Input(UInt(32.W))
val writeLatency = Input(UInt(32.W))
val registers = maxReqRegisters ++ Seq(
(writeLatency -> RuntimeSetting(30, "Write latency", min = 1)),
(readLatency -> RuntimeSetting(30,"Read latency", min = 1))
)
def requestSettings() {
Console.println(s"${UNDERLINED}Generating a runtime configuration for a latency-bandwidth pipe${RESET}")
}
}
class LatencyPipeIO(val cfg: LatencyPipeConfig)(implicit p: Parameters) extends SplitTransactionModelIO()(p) {
val mmReg = new LatencyPipeMMRegIO(cfg)
}
class WritePipeEntry(implicit val p: Parameters) extends Bundle {
val releaseCycle = UInt(64.W)
val xaction = new WriteResponseMetaData
}
class ReadPipeEntry(implicit val p: Parameters) extends Bundle {
val releaseCycle = UInt(64.W)
val xaction = new ReadResponseMetaData
}
class LatencyPipe(cfg: LatencyPipeConfig)(implicit p: Parameters) extends SplitTransactionModel(cfg)(p) {
lazy val io = IO(new LatencyPipeIO(cfg))
val longName = "Latency Bandwidth Pipe"
def printTimingModelGenerationConfig {}
/**************************** CHISEL BEGINS *********************************/
// Configuration values
val readLatency = io.mmReg.readLatency
val writeLatency = io.mmReg.writeLatency
// ***** Write Latency Pipe *****
// Write delays are applied to the cycle upon which both the AW and W
// transactions have completed. Since multiple AW packets may arrive
// before the associated W packet, we queue them up.
val writePipe = Module(new Queue(new WritePipeEntry, cfg.maxWrites, flow = true))
writePipe.io.enq.valid := newWReq
writePipe.io.enq.bits.xaction := WriteResponseMetaData(awQueue.io.deq.bits)
writePipe.io.enq.bits.releaseCycle := writeLatency + tCycle - egressUnitDelay.U
val writeDone = writePipe.io.deq.bits.releaseCycle <= tCycle
wResp.valid := writePipe.io.deq.valid && writeDone
wResp.bits := writePipe.io.deq.bits.xaction
writePipe.io.deq.ready := wResp.ready && writeDone
// ***** Read Latency Pipe *****
val readPipe = Module(new Queue(new ReadPipeEntry, cfg.maxReads, flow = true))
readPipe.io.enq.valid := nastiReq.ar.fire
readPipe.io.enq.bits.xaction := ReadResponseMetaData(nastiReq.ar.bits)
readPipe.io.enq.bits.releaseCycle := readLatency + tCycle - egressUnitDelay.U
// Release read responses on the appropriate cycle
val readDone = readPipe.io.deq.bits.releaseCycle <= tCycle
rResp.valid := readPipe.io.deq.valid && readDone
rResp.bits := readPipe.io.deq.bits.xaction
readPipe.io.deq.ready := rResp.ready && readDone
}
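Because requests enter each pipe in order and all see the same programmed latency, release order matches arrival order; the queues only need to hold a release cycle per entry. A minimal Python model of that bookkeeping (illustrative only, ignoring the egress-unit delay adjustment):

```python
from collections import deque

def simulate_latency_pipe(arrivals, latency):
    """arrivals: sorted list of (arrival_cycle, request_id).
    Returns (release_cycle, request_id) pairs in release order."""
    pending = deque(arrivals)
    pipe, released = deque(), []
    cycle = 0
    while pending or pipe:
        # Enqueue requests arriving this cycle: releaseCycle := tCycle + latency
        while pending and pending[0][0] == cycle:
            _, rid = pending.popleft()
            pipe.append((cycle + latency, rid))
        # Release head-of-line entries whose release cycle has elapsed
        while pipe and pipe[0][0] <= cycle:
            released.append((cycle, pipe.popleft()[1]))
        cycle += 1
    return released
```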


@ -0,0 +1,163 @@
package midas
package models
import freechips.rocketchip.util.{HasGeneratorUtilities, ParsedInputNames} // For parameter lookup
import freechips.rocketchip.config._
import chisel3._
import org.json4s._
import Console.{UNDERLINED, GREEN, RESET}
import java.io.{File, FileWriter}
// Hacky utilities to get console input from user.
trait HasConsoleUtils {
def requestInput(query: String,
default: BigInt,
min: Option[BigInt] = None,
max: Option[BigInt] = None): BigInt = {
def inner(): BigInt = {
Console.printf(query + s"(${default}):")
var value = default
try {
val line = io.StdIn.readLine()
if (line.length() > 0) {
value = line.toInt
}
        if (max != None && value > max.get) {
          Console.printf(s"Requested integer ${value} exceeds maximum ${max.get}\n")
          value = inner()
        } else if (min != None && value < min.get) {
          Console.printf(s"Requested integer ${value} is less than minimum ${min.get}\n")
          value = inner()
        }
} catch {
case e: java.lang.NumberFormatException => {
Console.println("Please give me an integer!")
value = inner
}
case e: java.io.EOFException => { value = default }
}
value
}
inner
}
// Select from list of possibilities
// Format:
// HEADER
// POS 0
// ...
// POS N-1
// FOOTER (DEFAULT):
def requestSeqSelection(
header: String,
possibilities: Seq[String],
footer: String = "Selection number",
default: BigInt = 0): Int = {
val query = s"${header}\n" + (possibilities.zipWithIndex).foldRight(footer)((head, body) =>
s" ${head._2}) ${head._1}\n" + body)
requestInput(query, default).toInt
}
}
// Runtime settings are programmable registers that change the behavior of a memory model instance.
// These are instantiated in the I/O of the timing model and tied to a Chisel Input
trait IsRuntimeSetting extends HasConsoleUtils {
def default: BigInt
def query: String
def min: BigInt
def max: Option[BigInt]
private var _isSet = false
private var _value: BigInt = 0
def set(value: BigInt) {
require(!_isSet, "Trying to set a programmable register that has already been set.")
_value = value;
_isSet = true
}
def isSet() = _isSet
def getOrElse(alt: =>BigInt): BigInt = if (_isSet) _value else alt
// This prompts the user via the console for setting
def requestSetting(field: Data) {
set(requestInput(query, default, Some(min), max))
}
}
// A vanilla runtime setting of the memory model
case class RuntimeSetting(
default: BigInt,
query: String,
min: BigInt = 0,
max: Option[BigInt] = None) extends IsRuntimeSetting
// A setting whose value can be looked up from a provided table.
case class JSONSetting(
default: BigInt,
query: String,
lookUp: Map[String, BigInt] => BigInt,
min: BigInt = 0,
max: Option[BigInt] = None) extends IsRuntimeSetting {
def setWithLUT(lut: Map[String, BigInt]) = set(lookUp(lut))
}
trait HasProgrammableRegisters extends Bundle {
def registers: Seq[(Data, IsRuntimeSetting)]
lazy val regMap = Map(registers: _*)
def getName(dat: Data): String = {
val name = elements.find(_._2 == dat) match {
case Some((name, elem)) => name
case None => throw new RuntimeException("Could not look up register leaf name")
}
name
}
// Returns the default values for all registered RuntimeSettings
def getDefaults(prefix: String = ""): Seq[(String, String)] = {
val localDefaults = registers map { case (elem, reg) => (s"${prefix}${getName(elem)}" -> s"${reg.default}") }
localDefaults ++ (elements flatMap {
case (name, elem: HasProgrammableRegisters) => elem.getDefaults(s"${prefix}${name}_")
case _ => Seq()
})
}
  // Returns the requested values for all RuntimeSettings; throws an exception if one is unbound
def getSettings(prefix: String = ""): Seq[(String, String)] = {
val localSettings = registers map { case (elem, reg) => {
val name = s"${prefix}${getName(elem)}"
val setting = reg.getOrElse(throw new RuntimeException(s"Runtime Setting ${name} has not been set"))
(name -> setting.toString)
}
}
// Recurse into leaves
localSettings ++ (elements flatMap {
case (name, elem: HasProgrammableRegisters) => elem.getSettings(s"${prefix}${name}_")
case _ => Seq()
})
}
// Requests the users input for all unset RuntimeSettings
def setUnboundSettings(prefix: String = "test") {
// Set all local registers
registers foreach {
case (elem, reg) if !reg.isSet => reg.requestSetting(elem)
case _ => None
}
// Traverse into leaf bundles and set them
elements foreach {
case (name, elem: HasProgrammableRegisters) => elem.setUnboundSettings()
case _ => None
}
}
}
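getDefaults and getSettings recurse through nested register bundles, prefixing each leaf name with an underscore-joined path to it. A hypothetical Python analogue of that flattening:

```python
def get_defaults(registers, children=None, prefix=""):
    """registers: {leaf_name: default}; children: {name: (registers, children)}.
    Returns a flat dict keyed by the underscore-joined hierarchical name."""
    flat = {prefix + name: default for name, default in registers.items()}
    for name, (sub_regs, sub_children) in (children or {}).items():
        flat.update(get_defaults(sub_regs, sub_children, prefix + name + "_"))
    return flat
```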


@ -0,0 +1,71 @@
package midas
package models
import Chisel._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.util._
import freechips.rocketchip.amba.axi4._
import freechips.rocketchip.tilelink._
import freechips.rocketchip.devices.tilelink._
import junctions.NastiParameters
// WARNING: The address widths are totally bungled here. This is intended
// for use with the memory model only
// We're going to rely on truncation of the (sometimes) wider master address
// later on
//
// For identical widths this module becomes passthrough
class TargetToHostAXI4Converter (
mWidths: NastiParameters,
sWidths: NastiParameters,
mMaxTransfer: Int = 128)
(implicit p: Parameters) extends LazyModule
{
implicit val valname = ValName("FASEDWidthAdapter")
val m = AXI4MasterNode(Seq(AXI4MasterPortParameters(
masters = Seq(AXI4MasterParameters(
name = "widthAdapter",
aligned = true,
maxFlight = Some(2),
id = IdRange(0, (1 << mWidths.idBits))))))) // FIXME: Idbits
val s = AXI4SlaveNode(Seq(AXI4SlavePortParameters(
slaves = Seq(AXI4SlaveParameters(
address = Seq(AddressSet(0, (BigInt(1) << mWidths.addrBits) - 1)),
supportsWrite = TransferSizes(1, mMaxTransfer),
supportsRead = TransferSizes(1, mMaxTransfer),
interleavedId = Some(0))), // slave does not interleave read responses
beatBytes = sWidths.dataBits/8)
))
// If no width change necessary, pass through with a buffer
if (mWidths.dataBits == sWidths.dataBits) {
s := m
} else {
// Otherwise we need to convert to TL2 and back
val xbar = LazyModule(new TLXbar)
val error = LazyModule(new TLError(DevNullParams(
Seq(AddressSet(BigInt(1) << mWidths.addrBits, 0xff)), maxAtomic = 1, maxTransfer = 128),
beatBytes = sWidths.dataBits/8))
(xbar.node
:= TLWidthWidget(mWidths.dataBits/8)
:= TLFIFOFixer()
:= AXI4ToTL()
:= AXI4Buffer()
:= m )
error.node := xbar.node
(s := AXI4Buffer()
:= AXI4UserYanker()
:= TLToAXI4()
:= xbar.node)
}
lazy val module = new LazyModuleImp(this) {
val mAxi4 = IO(Flipped(m.out.head._1.cloneType))
m.out.head._1 <> mAxi4
val sAxi4 = IO(s.in.head._1.cloneType)
sAxi4 <> s.in.head._1
}
}


@ -0,0 +1,244 @@
package midas
package models
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.ParameterizedBundle
import junctions._
import chisel3._
import chisel3.util._
import midas.core._
import midas.widgets._
import Console.{UNDERLINED, RESET}
// Automatically bound to simulation memory-mapped registers. Extend this
// bundle to add additional programmable values and instrumentation
abstract class MMRegIO(cfg: BaseConfig) extends Bundle with HasProgrammableRegisters {
val (totalReads, totalWrites) = if (cfg.params.xactionCounters) {
(Some(Output(UInt(32.W))), Some(Output(UInt(32.W))))
} else {
(None, None)
}
val (totalReadBeats, totalWriteBeats) = if (cfg.params.beatCounters) {
(Some(Output(UInt(32.W))), Some(Output(UInt(32.W))))
} else {
(None, None)
}
val llc = if (cfg.useLLCModel) Some(new LLCProgrammableSettings(cfg.params.llcKey.get)) else None
// Instrumentation Registers
val bins = cfg.params.occupancyHistograms match {
case Nil => 0
case binMaximums => binMaximums.size + 1
}
val readOutstandingHistogram = Output(Vec(bins, UInt(32.W)))
val writeOutstandingHistogram = Output(Vec(bins, UInt(32.W)))
val targetCycle = if (cfg.params.targetCycleCounter) Some(Output(UInt(32.W))) else None
// Implemented by each timing model to query runtime values for its
// programmable settings
def requestSettings(): Unit
// Called by MidasMemModel to fetch all programmable settings for the timing
// model. These are concatenated with functional model settings
def getTimingModelSettings(): Seq[(String, String)] = {
// First invoke the timing model specific method
requestSettings()
// Finally set everything that hasn't already been set
llc.foreach({ _.setLLCSettings() })
Console.println(s"\n${UNDERLINED}Remaining Free Parameters${RESET}")
setUnboundSettings()
getSettings()
}
}
abstract class TimingModelIO(implicit val p: Parameters) extends Bundle {
val tNasti = Flipped(new NastiIO)
val egressReq = new EgressReq
val egressResp = Flipped(new EgressResp)
// This sub-bundle contains all the programmable fields of the model
val mmReg: MMRegIO
}
abstract class TimingModel(val cfg: BaseConfig)(implicit val p: Parameters) extends Module
with IngressModuleParameters with EgressUnitParameters with HasNastiParameters {
// Concrete timing models must implement io with the MMRegIO sub-bundle
// containing all of the requisite runtime-settings and instrumentation brought
// out as inputs and outputs respectively. See MMRegIO above.
val io: TimingModelIO
val longName: String
// Implemented by concrete timing models to describe their configuration during
// chisel elaboration
protected def printTimingModelGenerationConfig: Unit
def printGenerationConfig {
println(" Timing Model Class: " + longName)
printTimingModelGenerationConfig
}
/**************************** CHISEL BEGINS *********************************/
// Regulates the return of beats to the target memory system
val tNasti = io.tNasti
// Request channels presented to DRAM models
val nastiReqIden = Module(new IdentityModule(new NastiReqChannels))
val nastiReq = nastiReqIden.io.out
val wResp = Wire(Decoupled(new WriteResponseMetaData))
val rResp = Wire(Decoupled(new ReadResponseMetaData))
val monitor = Module(new MemoryModelMonitor(cfg))
monitor.axi4 := io.tNasti
val tCycle = RegInit(0.U(64.W))
tCycle := tCycle + 1.U
io.mmReg.targetCycle.foreach({ _ := tCycle })
val pendingReads = SatUpDownCounter(cfg.maxReads)
pendingReads.inc := tNasti.ar.fire()
pendingReads.dec := tNasti.r.fire() && tNasti.r.bits.last
val pendingAWReq = SatUpDownCounter(cfg.maxWrites)
pendingAWReq.inc := tNasti.aw.fire()
pendingAWReq.dec := tNasti.b.fire()
val pendingWReq = SatUpDownCounter(cfg.maxWrites)
pendingWReq.inc := tNasti.w.fire() && tNasti.w.bits.last
pendingWReq.dec := tNasti.b.fire()
assert(!tNasti.ar.valid || (tNasti.ar.bits.burst === NastiConstants.BURST_INCR),
"Illegal ar request: memory model only supports incrementing bursts")
assert(!tNasti.aw.valid || (tNasti.aw.bits.burst === NastiConstants.BURST_INCR),
"Illegal aw request: memory model only supports incrementing bursts")
// Release; returns responses to target
val xactionRelease = Module(new AXI4Releaser)
tNasti.b <> xactionRelease.io.b
tNasti.r <> xactionRelease.io.r
io.egressReq <> xactionRelease.io.egressReq
xactionRelease.io.egressResp <> io.egressResp
if (cfg.useLLCModel) {
// Drop the LLC model inline
val llc_model = Module(new LLCModel(cfg))
llc_model.io.settings <> io.mmReg.llc.get
llc_model.io.memRResp <> rResp
llc_model.io.memWResp <> wResp
llc_model.io.req.fromNasti(io.tNasti)
nastiReqIden.io.in <> llc_model.io.memReq
xactionRelease.io.nextWrite <> llc_model.io.wResp
xactionRelease.io.nextRead <> llc_model.io.rResp
} else {
nastiReqIden.io.in.fromNasti(io.tNasti)
xactionRelease.io.nextWrite <> wResp
xactionRelease.io.nextRead <> rResp
}
if (cfg.params.xactionCounters) {
val totalReads = RegInit(0.U(32.W))
val totalWrites = RegInit(0.U(32.W))
when(pendingReads.inc){ totalReads := totalReads + 1.U }
when(pendingAWReq.inc){ totalWrites := totalWrites + 1.U}
io.mmReg.totalReads foreach { _ := totalReads }
io.mmReg.totalWrites foreach { _ := totalWrites }
}
if (cfg.params.beatCounters) {
val totalReadBeats = RegInit(0.U(32.W))
val totalWriteBeats = RegInit(0.U(32.W))
when(tNasti.r.fire){ totalReadBeats := totalReadBeats + 1.U }
when(tNasti.w.fire){ totalWriteBeats := totalWriteBeats + 1.U }
io.mmReg.totalReadBeats foreach { _ := totalReadBeats}
io.mmReg.totalWriteBeats foreach { _ := totalWriteBeats }
}
cfg.params.occupancyHistograms match {
case Nil => Nil
case binMaximums =>
val numBins = binMaximums.size + 1
val readOutstandingHistogram = Seq.fill(numBins)(RegInit(0.U(32.W)))
val writeOutstandingHistogram = Seq.fill(numBins)(RegInit(0.U(32.W)))
def bindHistograms(bins: Seq[UInt], maximums: Seq[Int], count: UInt): Bool = {
(bins.zip(maximums)).foldLeft(false.B)({ case (hasIncremented, (bin, maximum)) =>
when (!hasIncremented && (count <= maximum.U)) {
bin := bin + 1.U
}
hasIncremented || (count <= maximum.U)
})
}
// Append a catch-all maximum to the end of the Seq so the final bin counts all remaining cases
val allBinMaximums = binMaximums :+ ((1 << 30) - 1)
bindHistograms(readOutstandingHistogram, allBinMaximums, pendingReads.value)
bindHistograms(writeOutstandingHistogram, allBinMaximums, pendingAWReq.value)
io.mmReg.readOutstandingHistogram := readOutstandingHistogram
io.mmReg.writeOutstandingHistogram := writeOutstandingHistogram
}
}
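The occupancy-histogram logic above increments exactly one bin per cycle: the first bin whose configured maximum covers the current outstanding-transaction count, with one extra bin catching counts above every configured maximum. A plain-Python sketch of that binning rule (function names are illustrative):

```python
def bin_index(bin_maximums, count):
    """Index of the first bin whose maximum covers `count`; the extra
    final bin catches counts larger than every configured maximum."""
    for i, m in enumerate(bin_maximums):
        if count <= m:
            return i
    return len(bin_maximums)  # overflow bin

def histogram(bin_maximums, counts_per_cycle):
    """Accumulate one increment per observed cycle, as the hardware does."""
    bins = [0] * (len(bin_maximums) + 1)
    for c in counts_per_cycle:
        bins[bin_index(bin_maximums, c)] += 1
    return bins
```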
// A class of simple timing models that has independently programmable bounds on
// the number of reads and writes the model will accept.
//
// This is in contrast to more complex DRAM models that propagate backpressure
// from shared structures back to the AXI4 request channels.
abstract class SplitTransactionMMRegIO(cfg: BaseConfig)(implicit p: Parameters) extends MMRegIO(cfg) {
val readMaxReqs = Input(UInt(log2Ceil(cfg.maxReads+1).W))
val writeMaxReqs = Input(UInt(log2Ceil(cfg.maxWrites+1).W))
val maxReqRegisters = Seq(
(writeMaxReqs -> RuntimeSetting(cfg.maxWrites,
"Maximum number of target-writes the model will accept",
max = Some(cfg.maxWrites))),
(readMaxReqs -> RuntimeSetting(cfg.maxReads,
"Maximum number of target-reads the model will accept",
max = Some(cfg.maxReads)))
)
}
abstract class SplitTransactionModelIO(implicit p: Parameters)
extends TimingModelIO()(p) {
// This sub-bundle contains all the programmable fields of the model
val mmReg: SplitTransactionMMRegIO
}
abstract class SplitTransactionModel(cfg: BaseConfig)(implicit p: Parameters)
extends TimingModel(cfg)(p) {
override val io: SplitTransactionModelIO
pendingReads.max := io.mmReg.readMaxReqs
pendingAWReq.max := io.mmReg.writeMaxReqs
pendingWReq.max := io.mmReg.writeMaxReqs
nastiReq.ar.ready := ~pendingReads.full
nastiReq.aw.ready := ~pendingAWReq.full
nastiReq.w.ready := ~pendingWReq.full
// Recombines AW and W transactions before passing them on to the rest of the model
val awQueue = Module(new Queue(new NastiWriteAddressChannel, cfg.maxWrites, flow = true))
val newWReq = if (!cfg.useLLCModel) {
((pendingWReq.value > pendingAWReq.value) && pendingAWReq.inc) ||
((pendingWReq.value < pendingAWReq.value) && pendingWReq.inc) ||
(pendingWReq.inc && pendingAWReq.inc)
} else {
val memWReqs = SatUpDownCounter(cfg.maxWrites)
val newWReq = ((memWReqs.value > awQueue.io.count) && nastiReq.aw.fire) ||
((memWReqs.value < awQueue.io.count) && memWReqs.inc) ||
(memWReqs.inc && nastiReq.aw.fire)
memWReqs.inc := nastiReq.w.fire && nastiReq.w.bits.last
memWReqs.dec := newWReq
newWReq
}
awQueue.io.enq.bits := nastiReq.aw.bits
awQueue.io.enq.valid := nastiReq.aw.fire()
awQueue.io.deq.ready := newWReq
}
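The `newWReq` expression above detects the cycle in which a complete write (both its AW request and its last W beat) becomes available, by comparing two counters and their increment strobes. A plain-Python behavioral sketch of the same comparison (ignoring the matching decrements on B-channel fire, which cancel out of the comparison):

```python
def complete_writes(events):
    """events: iterable of (aw_fired, w_last_fired) flags, one pair per cycle.
    Returns per-cycle flags marking when a full AW+W pair first completes,
    mirroring the counter comparison used to derive newWReq."""
    aw = w = 0
    out = []
    for aw_inc, w_inc in events:
        # The lagging channel's arrival (or a simultaneous pair) completes a write
        new_req = bool(((w > aw) and aw_inc) or
                       ((w < aw) and w_inc) or
                       (aw_inc and w_inc))
        aw += aw_inc
        w += w_inc
        out.append(new_req)
    return out
```

Whichever channel arrives second triggers the completion, so AW-first, W-first, and simultaneous arrivals all produce exactly one completion per write.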


@ -0,0 +1,62 @@
package midas
package models
import chisel3._
import chisel3.util._
import junctions._
import midas.widgets._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.{ParameterizedBundle, DecoupledHelper}
// Add some scheduler specific metadata to a reference
class XactionSchedulerEntry(implicit p: Parameters) extends NastiBundle()(p) {
val xaction = new TransactionMetaData
val addr = UInt(nastiXAddrBits.W)
}
class XactionSchedulerIO(val cfg: BaseConfig)(implicit val p: Parameters) extends Bundle{
val req = Flipped(new NastiReqChannels)
val nextXaction = Decoupled(new XactionSchedulerEntry)
val pendingWReq = Input(UInt((cfg.maxWrites + 1).W))
val pendingAWReq = Input(UInt((cfg.maxWrites + 1).W))
}
class UnifiedFIFOXactionScheduler(depth: Int, cfg: BaseConfig)(implicit p: Parameters) extends Module {
val io = IO(new XactionSchedulerIO(cfg))
import DRAMMasEnums._
val transactionQueue = Module(new DualQueue(
gen = new XactionSchedulerEntry,
entries = depth))
transactionQueue.io.enqA.valid := io.req.ar.valid
transactionQueue.io.enqA.bits.xaction := TransactionMetaData(io.req.ar.bits)
transactionQueue.io.enqA.bits.addr := io.req.ar.bits.addr
io.req.ar.ready := transactionQueue.io.enqA.ready
transactionQueue.io.enqB.valid := io.req.aw.valid
transactionQueue.io.enqB.bits.xaction := TransactionMetaData(io.req.aw.bits)
transactionQueue.io.enqB.bits.addr := io.req.aw.bits.addr
io.req.aw.ready := transactionQueue.io.enqB.ready
// Accept up to one additional write data request
// TODO: More sensible model; maybe track a write buffer volume
io.req.w.ready := io.pendingWReq <= io.pendingAWReq
val selectedCmd = WireInit(cmd_nop)
val completedWrites = SatUpDownCounter(cfg.maxWrites)
completedWrites.inc := io.req.w.fire && io.req.w.bits.last
completedWrites.dec := io.nextXaction.fire && io.nextXaction.bits.xaction.isWrite
// Prevent release of oldest transaction if it is a write and its data is not yet available
val deqGate = DecoupledHelper(
transactionQueue.io.deq.valid,
io.nextXaction.ready,
(!io.nextXaction.bits.xaction.isWrite || ~completedWrites.empty)
)
io.nextXaction <> transactionQueue.io.deq
io.nextXaction.valid := deqGate.fire(io.nextXaction.ready)
transactionQueue.io.deq.ready := deqGate.fire(transactionQueue.io.deq.valid)
}


@ -0,0 +1,753 @@
package midas
package models
// From RC
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.{ParameterizedBundle, GenericParameterizedBundle, UIntIsOneOf}
import freechips.rocketchip.unittest.UnitTest
import junctions._
import chisel3._
import chisel3.util._
import chisel3.experimental.MultiIOModule
// From MIDAS
import midas.widgets.{D2V, V2D, SkidRegister}
class DualQueue[T <: Data](gen: =>T, entries: Int) extends Module {
val io = IO(new Bundle {
val enqA = Flipped(Decoupled(gen.cloneType))
val enqB = Flipped(Decoupled(gen.cloneType))
val deq = Decoupled(gen.cloneType)
val next = Valid(gen.cloneType)
})
val qA = Module(new Queue(gen.cloneType, (entries+1)/2))
val qB = Module(new Queue(gen.cloneType, entries/2))
qA.io.deq.ready := false.B
qB.io.deq.ready := false.B
val enqPointer = RegInit(false.B)
when (io.enqA.fire() ^ io.enqB.fire()) {
enqPointer := ~enqPointer
}
when(enqPointer ^ ~io.enqA.valid){
qA.io.enq <> io.enqB
qB.io.enq <> io.enqA
}.otherwise{
qA.io.enq <> io.enqA
qB.io.enq <> io.enqB
}
val deqPointer = RegInit(false.B)
when (io.deq.fire()) {
deqPointer := ~deqPointer
}
when(deqPointer){
io.deq <> qB.io.deq
io.next <> D2V(qA.io.deq)
}.otherwise{
io.deq <> qA.io.deq
io.next <> D2V(qB.io.deq)
}
}
class ProgrammableSubAddr(
val maskBits: Int,
val longName: String,
val defaultOffset: BigInt,
val defaultMask: BigInt) extends Bundle with HasProgrammableRegisters {
val offset = UInt(32.W) // TODO:fixme
val mask = UInt(maskBits.W) // Must be contiguous high bits starting from LSB
def getSubAddr(fullAddr: UInt): UInt = (fullAddr >> offset) & mask
// Used to produce a bit vector of enables from a mask
def maskToOH(): UInt = {
val decodings = Seq.tabulate(maskBits)({ i => ((1 << (1 << (i + 1))) - 1).U})
MuxCase(1.U, (mask.toBools.zip(decodings)).reverse)
}
val registers = Seq(
(offset -> RuntimeSetting(defaultOffset,s"${longName} Offset", min = 0)),
(mask -> RuntimeSetting(defaultMask,s"${longName} Mask", max = Some((1 << maskBits) - 1)))
)
def forceSettings(offsetValue: BigInt, maskValue: BigInt) {
regMap(offset).set(offsetValue)
regMap(mask).set(maskValue)
}
}
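`ProgrammableSubAddr` extracts a runtime-programmable bit field: shift the address right by `offset`, then AND with a mask of contiguous low-order ones. A plain-Python sketch of the extraction and of what the mask implies for decode width (function names are illustrative):

```python
def get_sub_addr(full_addr, offset, mask):
    # Shift out the low-order bits, then keep only the masked field
    return (full_addr >> offset) & mask

def mask_to_num_values(mask):
    # The mask must be contiguous ones starting at the LSB; a population
    # count of n implies 2**n decodable sub-address values
    n = bin(mask).count("1")
    assert mask == (1 << n) - 1, "mask must be contiguous ones from the LSB"
    return 1 << n
```

For example, DRAM models use this to pull rank, bank, and row fields out of a flat target address at runtime.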
// A common motif to track inputs in a buffer
trait HasFIFOPointers {
val entries: Int
val do_enq = Wire(Bool())
val do_deq = Wire(Bool())
val enq_ptr = Counter(entries)
val deq_ptr = Counter(entries)
val maybe_full = RegInit(false.B)
val ptr_match = enq_ptr.value === deq_ptr.value
val empty = ptr_match && !maybe_full
val full = ptr_match && maybe_full
when (do_enq) {
enq_ptr.inc()
}
when (do_deq) {
deq_ptr.inc()
}
when (do_enq =/= do_deq) {
maybe_full := do_enq
}
}
class DynamicLatencyPipeIO[T <: Data](gen: T, entries: Int, countBits: Int)
extends QueueIO(gen, entries) {
val latency = Input(UInt(countBits.W))
val tCycle = Input(UInt(countBits.W))
override def cloneType = new DynamicLatencyPipeIO(gen, entries, countBits).asInstanceOf[this.type]
}
// I had to copy this code because critical fields are now private
class DynamicLatencyPipe[T <: Data] (
gen: T,
val entries: Int,
countBits: Int
) extends Module with HasFIFOPointers {
val io = IO(new DynamicLatencyPipeIO(gen, entries, countBits))
// Add the implication on enq.fire to work around target reset problems for now
assert(!io.enq.fire || io.latency =/= 0.U, "DynamicLatencyPipe only supports latencies > 0")
val ram = Mem(entries, gen)
do_enq := io.enq.fire()
do_deq := io.deq.fire()
when (do_enq) {
ram(enq_ptr.value) := io.enq.bits
}
io.enq.ready := !full
io.deq.bits := ram(deq_ptr.value)
val ptr_diff = enq_ptr.value - deq_ptr.value
if (isPow2(entries)) {
io.count := Cat(maybe_full && ptr_match, ptr_diff)
} else {
io.count := Mux(ptr_match,
Mux(maybe_full,
entries.asUInt, 0.U),
Mux(deq_ptr.value > enq_ptr.value,
entries.asUInt + ptr_diff, ptr_diff))
}
val latencies = Reg(Vec(entries, UInt(countBits.W)))
val pendingRegisters = RegInit(VecInit(Seq.fill(entries)(false.B)))
val done = Vec(latencies.zip(pendingRegisters) map { case (lat, pendingReg) =>
val cycleMatch = lat === io.tCycle
when (cycleMatch) { pendingReg := false.B }
cycleMatch || !pendingReg
})
when (do_enq) {
latencies(enq_ptr.value) := io.tCycle + io.latency
pendingRegisters(enq_ptr.value) := io.latency =/= 1.U
}
io.deq.valid := !empty && done(deq_ptr.value)
}
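`DynamicLatencyPipe` is a FIFO whose head only becomes dequeueable once a per-entry latency has elapsed: each entry records its release cycle (enqueue cycle + latency) and is compared against the running cycle count. A plain-Python behavioral model of that contract (class name is illustrative):

```python
from collections import deque

class DynamicLatencyPipe:
    """Behavioral model of the dynamic-latency pipe: each entry records
    its release cycle, and the FIFO head is only dequeueable once the
    cycle counter has reached it."""
    def __init__(self):
        self.q = deque()   # (value, release_cycle) in FIFO order
        self.cycle = 0
    def tick(self):
        self.cycle += 1
    def enq(self, value, latency):
        assert latency > 0, "only latencies > 0 are supported"
        self.q.append((value, self.cycle + latency))
    def deq(self):
        # Head releases only when its latency has fully elapsed
        if self.q and self.q[0][1] <= self.cycle:
            return self.q.popleft()[0]
        return None
```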
// Counts down from a set value; If the set value is less than the present value
// it is ignored.
class DownCounter(counterWidth: Int) extends Module {
val io = IO(new Bundle {
val set = Input(Valid(UInt(counterWidth.W)))
val decr = Input(Bool())
val current = Output(UInt(counterWidth.W))
val idle = Output(Bool())
})
require(counterWidth > 0, "DownCounter must have a width > 0")
val delay = RegInit(0.U(counterWidth.W))
when(io.set.valid && io.set.bits >= delay) {
delay := io.set.bits
}.elsewhen(io.decr && delay =/= 0.U){
delay := delay - 1.U
}
io.idle := delay === 0.U
io.current := delay
}
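The comment above states the `DownCounter` contract: a set is honored only if it does not shorten the current delay, and decrements saturate at zero. A plain-Python step model of that behavior (class name is illustrative):

```python
class DownCounter:
    """Sketch of DownCounter: a set is honored only when it would not
    shorten the current delay; decrements stop at zero. Set takes
    priority over decrement, as in the hardware."""
    def __init__(self):
        self.delay = 0
    def step(self, set_value=None, decr=False):
        if set_value is not None and set_value >= self.delay:
            self.delay = set_value
        elif decr and self.delay != 0:
            self.delay -= 1
        return self.delay
    @property
    def idle(self):
        return self.delay == 0
```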
// While down counter has a local decrementer, this module instead matches against
// a provided cycle count.
class CycleTracker(counterWidth: Int) extends Module {
val io = IO(new Bundle {
val set = Input(Valid(UInt(counterWidth.W)))
val tCycle = Input(UInt(counterWidth.W))
val idle = Output(Bool())
})
require(counterWidth > 0, "CycleTracker must have a width > 0")
val delay = RegInit(0.U(counterWidth.W))
val idle = RegInit(true.B)
when(io.set.valid && io.tCycle =/= io.set.bits) {
delay := io.set.bits
idle := false.B
}.elsewhen(delay === io.tCycle){
idle := true.B
}
io.idle := idle
}
// A collapsing buffer with entries that can be updated. Valid entries trickle
// down through the queue, one entry per cycle.
// Kill is implemented by setting io.update(entry).valid := false.B
//
// NB: Companion object should be used to generate a module instance -> or
// updates must be driven to entries by default for the module to behave
// correctly
class CollapsingBufferIO[T <: Data](private val gen: T, val depth: Int) extends Bundle {
val entries = Output(Vec(depth, Valid(gen)))
val updates = Input(Vec(depth, Valid(gen)))
val enq = Flipped(Decoupled(gen))
val programmableDepth = Input(UInt(log2Ceil(depth+1).W))
}
// Note: Use companion object
class CollapsingBuffer[T <: Data](gen: T, depth: Int) extends Module {
val io = IO(new CollapsingBufferIO(gen, depth))
def linkEntries(entries: Seq[(ValidIO[T], ValidIO[T], Bool)], shifting: Bool): Unit = entries match {
case Nil => throw new RuntimeException("Asked for 0 entry collapsing buffer?")
// Youngest entry, connect up io.enq
case (entry, currentUpdate, lastEntry) :: Nil => {
val shift = shifting || !currentUpdate.valid
entry := Mux(shift, D2V(io.enq), currentUpdate)
io.enq.ready := shift
}
// Default case, a younger stage enqueues into this one
case (entry, currentUpdate, lastEntry) :: tail => {
val youngerUpdate = tail.head._2
val shift = !lastEntry && ( shifting || !currentUpdate.valid)
entry := Mux(shift, youngerUpdate, currentUpdate)
linkEntries(tail, shift)
}
}
val lastEntry = UIntToOH(io.programmableDepth).toBools.take(depth).reverse
val entries = Seq.fill(depth)(
RegInit({val w = Wire(Valid(gen.cloneType)); w.valid := false.B; w.bits := DontCare; w}))
io.entries := entries
linkEntries((entries, io.updates, lastEntry).zipped.toList, false.B)
}
object CollapsingBuffer {
def apply[T <: Data](
enq: DecoupledIO[T],
depth: Int,
programmableDepth: Option[UInt] = None): CollapsingBuffer[T] = {
val buffer = Module(new CollapsingBuffer(enq.bits.cloneType, depth))
// This sets the default that each entry retains its value unless driven by the parent module
(buffer.io.updates).zip(buffer.io.entries).foreach({ case (update, entry) => update := entry })
buffer.io.enq <> enq
buffer.io.programmableDepth := programmableDepth.getOrElse(depth.U)
buffer
}
}
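The collapsing buffer closes one hole per cycle: when an entry is killed (its update driven invalid), every younger entry shifts one slot toward the oldest end, freeing the youngest slot for a new enqueue. A plain-Python sketch of a single cycle of that compaction (the list-of-`None` representation is illustrative):

```python
def collapse_step(entries, enq=None):
    """entries: list ordered oldest -> youngest; None marks a killed slot.
    One hole is closed per cycle: entries younger than the oldest hole
    shift down one slot, and a new item may enter the freed youngest
    slot. With no hole the buffer is full and enq is back-pressured."""
    depth = len(entries)
    if None in entries:
        hole = entries.index(None)  # oldest killed slot absorbs the shift
        shifted = entries[:hole] + entries[hole + 1:] + [enq]
    else:
        shifted = entries[:]        # full: nothing shifts this cycle
    return shifted[:depth]
```

Note that two holes take two cycles to close, matching the one-entry-per-cycle trickle described in the comment above.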
trait HasAXI4Id extends HasNastiParameters { val id = UInt(nastiXIdBits.W) }
trait HasAXI4IdAndLen extends HasAXI4Id { val len = UInt(nastiXLenBits.W) }
trait HasReqMetaData extends HasAXI4IdAndLen { val addr = UInt(nastiXAddrBits.W) }
class TransactionMetaData(implicit val p: Parameters) extends Bundle with HasAXI4IdAndLen {
val isWrite = Bool()
}
object TransactionMetaData {
def apply(id: UInt, len: UInt, isWrite: Bool)(implicit p: Parameters): TransactionMetaData = {
val w = Wire(new TransactionMetaData)
w.id := id
w.len := len
w.isWrite := isWrite
w
}
def apply(x: NastiReadAddressChannel)(implicit p: Parameters): TransactionMetaData =
apply(x.id, x.len, false.B)
def apply(x: NastiWriteAddressChannel)(implicit p: Parameters): TransactionMetaData =
apply(x.id, x.len, true.B)
}
class WriteResponseMetaData(implicit val p: Parameters) extends Bundle with HasAXI4Id
class ReadResponseMetaData(implicit val p: Parameters) extends Bundle with HasAXI4IdAndLen
object ReadResponseMetaData {
def apply(x: HasAXI4IdAndLen)(implicit p: Parameters): ReadResponseMetaData = {
val readMetaData = Wire(new ReadResponseMetaData)
readMetaData.id := x.id
readMetaData.len := x.len
readMetaData
}
// UGH. Will fix when i go to RC's AXI4 impl
def apply(x: NastiReadAddressChannel)(implicit p: Parameters): ReadResponseMetaData = {
val readMetaData = Wire(new ReadResponseMetaData)
readMetaData.id := x.id
readMetaData.len := x.len
readMetaData
}
}
object WriteResponseMetaData {
def apply(x: HasAXI4Id)(implicit p: Parameters): WriteResponseMetaData = {
val writeMetaData = Wire(new WriteResponseMetaData)
writeMetaData.id := x.id
writeMetaData
}
def apply(x: NastiWriteAddressChannel)(implicit p: Parameters): WriteResponseMetaData = {
val writeMetaData = Wire(new WriteResponseMetaData)
writeMetaData.id := x.id
writeMetaData
}
}
class AXI4ReleaserIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
val b = Decoupled(new NastiWriteResponseChannel)
val r = Decoupled(new NastiReadDataChannel)
val egressReq = new EgressReq
val egressResp = Flipped(new EgressResp)
val nextRead = Flipped(Decoupled(new ReadResponseMetaData))
val nextWrite = Flipped(Decoupled(new WriteResponseMetaData))
}
class AXI4Releaser(implicit p: Parameters) extends Module {
val io = IO(new AXI4ReleaserIO)
val currentRead = Queue(io.nextRead, 1, pipe = true)
currentRead.ready := io.r.fire && io.r.bits.last
io.egressReq.r.valid := io.nextRead.fire
io.egressReq.r.bits := io.nextRead.bits.id
io.r.valid := currentRead.valid
io.r.bits := io.egressResp.rBits
io.egressResp.rReady := io.r.ready
val currentWrite = Queue(io.nextWrite, 1, pipe = true)
currentWrite.ready := io.b.fire
io.egressReq.b.valid := io.nextWrite.fire
io.egressReq.b.bits := io.nextWrite.bits.id
io.b.valid := currentWrite.valid
io.b.bits := io.egressResp.bBits
io.egressResp.bReady := io.b.ready
}
class FIFOAddressMatcher(val entries: Int, addrWidth: Int) extends Module with HasFIFOPointers {
val io = IO(new Bundle {
val enq = Flipped(Valid(UInt(addrWidth.W)))
val deq = Input(Bool())
val match_address = Input(UInt(addrWidth.W))
val hit = Output(Bool())
})
val addrs = RegInit(VecInit(Seq.fill(entries)({
val w = Wire(Valid(UInt(addrWidth.W)))
w.valid := false.B
w.bits := DontCare
w
})))
do_enq := io.enq.valid
do_deq := io.deq
assert(!full || (!do_enq || do_deq)) // Since we don't have backpressure, check for overflow
when (do_enq) {
addrs(enq_ptr.value).valid := true.B
addrs(enq_ptr.value).bits := io.enq.bits
}
when (do_deq) {
addrs(deq_ptr.value).valid := false.B
}
io.hit := addrs.exists({entry => entry.valid && entry.bits === io.match_address })
}
class AddressCollisionCheckerIO(addrWidth: Int)(implicit p: Parameters) extends NastiBundle()(p) {
val read_req = Input(Valid(UInt(addrWidth.W)))
val read_done = Input(Bool())
val write_req = Input(Valid(UInt(addrWidth.W)))
val write_done = Input(Bool())
val collision_addr = ValidIO(UInt(addrWidth.W))
}
class AddressCollisionChecker(numReads: Int, numWrites: Int, addrWidth: Int)(implicit p: Parameters)
extends NastiModule()(p) {
val io = IO(new AddressCollisionCheckerIO(addrWidth))
require(isPow2(numReads))
require(isPow2(numWrites))
//val discardedLSBs = 6
//val addrType = UInt(p(NastiKey).addrBits - discardedLSBs)
val read_matcher = Module(new FIFOAddressMatcher(numReads, addrWidth)).io
read_matcher.enq := io.read_req
read_matcher.deq := io.read_done
read_matcher.match_address := io.write_req.bits
val write_matcher = Module(new FIFOAddressMatcher(numReads, addrWidth)).io
write_matcher.enq := io.write_req
write_matcher.deq := io.write_done
write_matcher.match_address := io.read_req.bits
io.collision_addr.valid := io.read_req.valid && write_matcher.hit ||
io.write_req.valid && read_matcher.hit
io.collision_addr.bits := Mux(io.read_req.valid, io.read_req.bits, io.write_req.bits)
}
class CounterReadoutIO(val addrBits: Int) extends Bundle {
val enable = Input(Bool()) // Set when the simulation memory bus wishes to read out the values
val addr = Input(UInt(addrBits.W))
val dataL = Output(UInt(32.W))
val dataH = Output(UInt(32.W))
}
class CounterIncrementIO(val addrBits: Int, val dataBits: Int) extends Bundle {
val enable = Input(Bool())
val addr = Input(UInt(addrBits.W))
// Pass data 2 cycles after enable
val data = Input(UInt(dataBits.W))
}
class CounterTable(addrBits: Int, dataBits: Int) extends Module {
val io = IO(new Bundle {
val incr = new CounterIncrementIO(addrBits, dataBits)
val readout = new CounterReadoutIO(addrBits)
})
require(dataBits > 32)
val memDepth = 1 << addrBits
val counts = Mem(memDepth, UInt(dataBits.W))
val s0_readAddr = Mux(io.readout.enable, io.readout.addr, io.incr.addr)
val s1_readAddr = RegNext(s0_readAddr)
val s1_readData = counts.read(s1_readAddr)
val s1_valid = RegNext(io.incr.enable, false.B)
val s1_readout = RegNext(io.readout.enable)
val s2_valid = RegNext(s1_valid && !s1_readout)
val s2_writeAddr = RegNext(s1_readAddr)
val s2_readData = RegNext(s1_readData)
val s2_writeData = Wire(UInt(dataBits.W))
val s3_valid = RegNext(s2_valid)
val s3_writeData = RegNext(s2_writeData)
val s3_writeAddr = RegNext(s2_writeAddr)
val doBypass = s2_valid && s3_valid && s2_writeAddr === s3_writeAddr
s2_writeData := Mux(doBypass, s3_writeData, s2_readData) + io.incr.data
when (s2_valid) {
counts(s2_writeAddr) := s2_writeData
}
io.readout.dataL := s2_readData(31, 0)
io.readout.dataH := s2_readData(dataBits-1, 32)
}
// Stores a histogram of host latencies in BRAM
// Setting io.readoutEnable ties a read port of the BRAM to a read address that
// can be driven by the simulation bus
//
// WARNING: Will drop bin updates if attempting to read values while host
// transactions are still inflight
class HostLatencyHistogramIO(val idBits: Int, val binAddrBits: Int) extends Bundle {
val reqId = Flipped(ValidIO(UInt(idBits.W)))
val respId = Flipped(ValidIO(UInt(idBits.W)))
val cycleCountEnable = Input(Bool()) // Indicates which cycles the counter should be incremented
val readout = new CounterReadoutIO(binAddrBits)
}
// Defaults will fit in a 36K BRAM
class HostLatencyHistogram (
idBits: Int,
cycleCountBits: Int = 10
) extends Module {
val io = IO(new HostLatencyHistogramIO(idBits, cycleCountBits))
val binSize = 36
// Need a queue for each ID to track the host cycle a request was issued.
val queues = Seq.fill(1 << idBits)(Module(new Queue(UInt(cycleCountBits.W), 1)))
val cycle = RegInit(0.U(cycleCountBits.W))
when (io.cycleCountEnable) { cycle := cycle + 1.U }
// When the host accepts an AW/AR enq the current cycle
(queues map { _.io.enq }).zip(UIntToOH(io.reqId.bits).toBools).foreach({ case (enq, sel) =>
enq.valid := io.reqId.valid && sel
enq.bits := cycle
assert(!(enq.valid && !enq.ready), "Multiple requests issued to same ID")
})
val deqAddrOH = UIntToOH(io.respId.bits)
val reqCycle = Mux1H(deqAddrOH, (queues map { _.io.deq.bits }))
(queues map { _.io.deq }).zip(deqAddrOH.toBools).foreach({ case (deq, sel) =>
deq.ready := io.respId.valid && sel
assert(deq.valid || !deq.ready, "Received an unexpected response")
})
val histogram = Module(new CounterTable(cycleCountBits, binSize))
histogram.io.incr.enable := io.respId.valid
histogram.io.incr.addr := cycle - reqCycle
histogram.io.incr.data := 1.U
io.readout <> histogram.io.readout
}
object HostLatencyHistogram {
def apply(
reqValid: Bool,
reqId: UInt,
respValid: Bool,
respId: UInt,
cycleCountEnable: Bool = true.B,
binAddrBits: Int = 10): CounterReadoutIO = {
require(reqId.getWidth == respId.getWidth)
val histogram = Module(new HostLatencyHistogram(reqId.getWidth, binAddrBits))
histogram.io.reqId.bits := reqId
histogram.io.reqId.valid := reqValid
histogram.io.respId.bits := respId
histogram.io.respId.valid := respValid
histogram.io.cycleCountEnable := cycleCountEnable
histogram.io.readout
}
}
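`HostLatencyHistogram` keeps one single-entry queue per transaction ID to remember the cycle each request was issued; when the matching response arrives, the observed request-to-response latency indexes the bin to increment. A plain-Python behavioral model of that bookkeeping (class name is illustrative; bin saturation and BRAM readout are omitted):

```python
class LatencyHistogram:
    """Sketch of HostLatencyHistogram: remember the cycle each in-flight
    ID was issued, and on the response bump the bin indexed by the
    observed request->response latency."""
    def __init__(self, num_bins):
        self.issue_cycle = {}        # models the one-deep queue per ID
        self.bins = [0] * num_bins
        self.cycle = 0
    def tick(self):
        self.cycle += 1
    def request(self, req_id):
        # The hardware asserts if a second request reuses a busy ID
        assert req_id not in self.issue_cycle, "multiple requests to same ID"
        self.issue_cycle[req_id] = self.cycle
    def response(self, req_id):
        latency = self.cycle - self.issue_cycle.pop(req_id)
        self.bins[latency] += 1
```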
// Pick out the relevant parts of NastiReadAddressChannel or NastiWriteAddressChannel
class AddressRangeCounterRequest(implicit p: Parameters) extends NastiBundle {
val addr = UInt(nastiXAddrBits.W)
val len = UInt(nastiXLenBits.W)
val size = UInt(nastiXSizeBits.W)
}
// Stores count of #bytes requested from each range in BRAM.
// Setting io.readout.enable ties a read port of the BRAM to a read address
// that can be driven by the simulation bus
//
// WARNING: Will drop range updates if attempting to read values while host
// transactions are still being issued
class AddressRangeCounter(nRanges: BigInt)(implicit p: Parameters) extends NastiModule {
val io = IO(new Bundle {
val req = Flipped(ValidIO(new AddressRangeCounterRequest))
val readout = new CounterReadoutIO(log2Ceil(nRanges))
})
require(nRanges > 1)
require(nRanges < (1L << 32))
require(isPow2(nRanges))
val counterBits = 48
val addrMSB = nastiXAddrBits - 1
val addrLSB = nastiXAddrBits - log2Ceil(nRanges)
val s1_len = RegNext(io.req.bits.len)
val s1_size = RegNext(io.req.bits.size)
val s1_bytes = (s1_len + 1.U) << s1_size
val s2_bytes = RegNext(s1_bytes)
val counters = Module(new CounterTable(log2Ceil(nRanges), counterBits))
counters.io.incr.enable := io.req.valid
counters.io.incr.addr := io.req.bits.addr(addrMSB, addrLSB)
counters.io.incr.data := s2_bytes
io.readout <> counters.io.readout
}
object AddressRangeCounter {
def apply[T <: NastiAddressChannel](
n: BigInt, req: DecoupledIO[T], en: Bool)(implicit p: Parameters) = {
val counter = Module(new AddressRangeCounter(n))
counter.io.req.valid := req.fire() && en
counter.io.req.bits.addr := req.bits.addr
counter.io.req.bits.len := req.bits.len
counter.io.req.bits.size := req.bits.size
counter.io.readout
}
}
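`AddressRangeCounter` charges each burst `(len + 1) * 2**size` bytes to the range selected by the top `log2(nRanges)` address bits. A plain-Python sketch of that per-request bookkeeping (function name is illustrative; the pipelining of the byte computation is omitted):

```python
def range_update(addr, burst_len, size, addr_bits, n_ranges):
    """Sketch of AddressRangeCounter's per-request update: an AXI4 burst
    moves (len + 1) beats of 2**size bytes, and the counter index comes
    from the top log2(n_ranges) bits of the address."""
    assert n_ranges & (n_ranges - 1) == 0, "n_ranges must be a power of two"
    num_bytes = (burst_len + 1) << size
    log2_ranges = n_ranges.bit_length() - 1
    index = addr >> (addr_bits - log2_ranges)
    return index, num_bytes
```

For instance, with 4 ranges over a 32-bit address space the top two address bits pick the counter, so each counter covers a 1 GiB quadrant.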
object AddressCollisionCheckMain extends App {
implicit val p = Parameters.empty.alterPartial({case NastiKey => NastiParameters(64,32,4)})
chisel3.Driver.execute(args, () => new AddressCollisionChecker(4,4,16))
}
class CounterTableUnitTest extends UnitTest {
val addrBits = 10
val dataBits = 48
val counters = Module(new CounterTable(addrBits, dataBits))
val (s_start :: s_readInit :: s_incr :: s_readout :: s_done :: Nil) = Enum(5)
val state = RegInit(s_start)
val incrAddrs = VecInit(Seq(0, 0, 4, 0, 4, 16).map(_.U(addrBits.W)))
val incrData = VecInit(Seq(1, 2, 5, 1, 3, 7).map(_.U(dataBits.W)))
val (incrIdx, incrDone) = Counter(state === s_incr, incrAddrs.size)
val readAddrs = VecInit(Seq(0, 4, 8, 16).map(_.U(addrBits.W)))
val readExpected = VecInit(Seq(4, 8, 0, 7).map(_.U(dataBits.W)))
val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readAddrs.size)
val initValues = Reg(Vec(readExpected.size, UInt(dataBits.W)))
counters.io.incr.enable := state === s_incr
counters.io.incr.addr := incrAddrs(incrIdx)
counters.io.incr.data := RegNext(RegNext(incrData(incrIdx)))
counters.io.readout.enable := state.isOneOf(s_readInit, s_readout)
counters.io.readout.addr := readAddrs(readIdx)
val readData = Cat(counters.io.readout.dataH, counters.io.readout.dataL)
val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
val initWriteIdx = RegNext(RegNext(readIdx))
when (initValid) { initValues(initWriteIdx) := readData }
val expectedCount = RegNext(RegNext(readExpected(readIdx)))
val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
val readCount = readData - RegNext(RegNext(initValues(readIdx)))
assert(!readValid || readCount === expectedCount)
when (state === s_start && io.start) { state := s_readInit }
when (state === s_readInit && readDone) { state := s_incr }
when (incrDone) { state := s_readout }
when (state === s_readout && readDone) { state := s_done }
io.finished := state === s_done
}
class LatencyHistogramUnitTest extends UnitTest {
val addrBits = 8
val histogram = Module(new HostLatencyHistogram(2, addrBits))
val dataBits = histogram.binSize
val (s_start :: s_readInit :: s_run :: s_readout :: s_done :: Nil) = Enum(5)
val state = RegInit(s_start)
// The second response comes a cycle after the first,
// with the same amount of time (2 cycles) after the request.
// Therefore, it will require a bypass.
// The third response comes a cycle after the second, but since the number
// of cycles is 1 instead of 2, it will not require a bypass.
// The fourth response also comes 2 cycles after the request,
// but since several cycles have elapsed since the last update, no bypass is needed
val cycleReq = VecInit(Seq(true.B, true.B, false.B, true.B, true.B, false.B, false.B))
val cycleResp = VecInit(Seq(false.B, false.B, true.B, true.B, true.B, false.B, true.B))
val (runIdx, runDone) = Counter(state === s_run, cycleReq.size)
val (reqId, _) = Counter(histogram.io.reqId.valid, 4)
val (respId, _) = Counter(histogram.io.respId.valid, 4)
val readAddrs = VecInit(Seq(1.U(addrBits.W), 2.U(addrBits.W)))
val readExpected = VecInit(Seq(1.U(dataBits.W), 3.U(dataBits.W)))
val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readAddrs.size)
histogram.io.reqId.valid := state === s_run && cycleReq(runIdx)
histogram.io.reqId.bits := reqId
histogram.io.respId.valid := state === s_run && cycleResp(runIdx)
histogram.io.respId.bits := respId
histogram.io.cycleCountEnable := true.B
histogram.io.readout.enable := state.isOneOf(s_readInit, s_readout)
histogram.io.readout.addr := readAddrs(readIdx)
val initValues = Reg(Vec(readExpected.size, UInt(dataBits.W)))
val initWriteIdx = RegNext(RegNext(readIdx))
val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
val readData = Cat(histogram.io.readout.dataH, histogram.io.readout.dataL)
when (initValid) { initValues(initWriteIdx) := readData }
when (state === s_start && io.start) { state := s_readInit }
when (state === s_readInit && readDone) { state := s_run }
when (runDone) { state := s_readout }
when (state === s_readout && readDone) { state := s_done }
val expectedCount = RegNext(RegNext(readExpected(readIdx)))
val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
val readCount = readData - RegNext(RegNext(initValues(readIdx)))
assert(!readValid || readCount === expectedCount)
io.finished := state === s_done
}
class AddressRangeCounterUnitTest(implicit p: Parameters) extends UnitTest {
val nCounters = 8
val nastiP = p.alterPartial({
case NastiKey => NastiParameters(64, 16, 4)
})
val counters = Module(new AddressRangeCounter(nCounters)(nastiP))
val (s_start :: s_readInit :: s_run :: s_readout :: s_done :: Nil) = Enum(5)
val state = RegInit(s_start)
val reqAddrs = VecInit(Seq(
0x00000, 0x2000, 0x4000, 0x6000,
0x4000, 0x2000, 0x0000, 0x6000).map(_.U(16.W)))
val reqSizes = VecInit(Seq(3, 2, 3, 1, 0, 1, 2, 3).map(_.U(3.W)))
val reqLens = VecInit(Seq(0, 4, 5, 3, 1, 9, 4, 6).map(_.U(8.W)))
def computeExpected(idx: Int) = (reqLens(idx) + 1.U) << reqSizes(idx)
val readExpected = VecInit(Seq((0, 6), (1, 5), (2, 4), (3, 7)).map {
case (a, b) => computeExpected(a) + computeExpected(b)
})
val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readExpected.size)
val (runIdx, runDone) = Counter(state === s_run, reqAddrs.size)
val initValues = Reg(Vec(readExpected.size, UInt(counters.counterBits.W)))
val initWriteIdx = RegNext(RegNext(readIdx))
val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
val readData = Cat(counters.io.readout.dataH, counters.io.readout.dataL)
counters.io.req.valid := state === s_run
counters.io.req.bits.addr := reqAddrs(runIdx)
counters.io.req.bits.len := reqLens(runIdx)
counters.io.req.bits.size := reqSizes(runIdx)
counters.io.readout.enable := state.isOneOf(s_readInit, s_readout)
counters.io.readout.addr := readIdx
when (initValid) { initValues(initWriteIdx) := readData }
when (state === s_start && io.start) { state := s_readInit }
when (state === s_readInit && readDone) { state := s_run }
when (runDone) { state := s_readout }
when (state === s_readout && readDone) { state := s_done }
val expectedCount = RegNext(RegNext(readExpected(readIdx)))
val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
val readCount = readData - RegNext(RegNext(initValues(readIdx)))
assert(!readValid || readCount === expectedCount)
io.finished := state === s_done
}
// Checks AXI4 transactions to ensure they conform to the bounds set in the
// memory model configuration, e.g., that maximum burst lengths are respected.
// NOTE: For use only in a FAME1 context
class MemoryModelMonitor(cfg: BaseConfig)(implicit p: Parameters) extends MultiIOModule {
val axi4 = IO(Input(new NastiIO))
assert(!axi4.ar.fire || axi4.ar.bits.len < cfg.maxReadLength.U,
s"Read burst length exceeds memory-model maximum of ${cfg.maxReadLength}")
assert(!axi4.aw.fire || axi4.aw.bits.len < cfg.maxWriteLength.U,
s"Write burst length exceeds memory-model maximum of ${cfg.maxWriteLength}")
}
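A hedged usage sketch of the monitor: it would be instantiated alongside a FAME1-transformed memory timing model and fed the same AXI4 channel it observes. The wrapper and port names below (`timingModel`, `tNasti`) are illustrative assumptions, not names confirmed by this diff:

```scala
// Illustrative only: `timingModel` and its `tNasti` port stand in for the
// FASED memory timing model this monitor would watch in a FAME1 context.
val monitor = Module(new MemoryModelMonitor(cfg))
monitor.axi4 := timingModel.tNasti
```

Because the checks are pure assertions over an `Input` port, the monitor adds no logic to the datapath and can be dropped from synthesis builds.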

@ -0,0 +1,160 @@
package midas.models.sram
import chisel3._
import chisel3.util.{Mux1H, Decoupled, RegEnable, log2Ceil, Enum}
import chisel3.experimental.{MultiIOModule, dontTouch}
//import chisel3.experimental.ChiselEnum
import chisel3.experimental.{DataMirror, requireIsChiselType}
import collection.immutable.ListMap
class AsyncMemModelGen(val depth: Int, val dataWidth: Int) extends ModelGenerator {
assert(depth > 0)
assert(dataWidth > 0)
val emitModel = () => new AsyncMemChiselModel(depth, dataWidth)
val emitRTLImpl = () => new AsyncMemChiselRTL(depth, dataWidth)
}
class AsyncMemChiselRTL(val depth: Int, val dataWidth: Int, val nReads: Int = 2, val nWrites: Int = 2) extends MultiIOModule {
val channels = IO(new RegfileRTLIO(depth, dataWidth, nReads, nWrites))
val data = Mem(depth, UInt(dataWidth.W))
for (i <- 0 until nReads) {
channels.read_resps(i) := data.read(channels.read_cmds(i).addr)
}
for (i <- 0 until nWrites) {
val write_cmd = channels.write_cmds(i)
def collides(c: WriteCmd) = c.active && write_cmd.active && (c.addr === write_cmd.addr)
val collision_detected = channels.write_cmds.drop(i+1).foldLeft(false.B) {
case (detected, cmd) => detected || collides(cmd)
}
when (write_cmd.active && !reset.toBool() && !collision_detected) {
data.write(write_cmd.addr, write_cmd.data)
}
}
}
object AsyncMemChiselModel {
//object ReadState extends ChiselEnum {
// val start, active, generated, responded = Value
//}
object ReadState {
lazy val start :: active :: generated :: responded :: Nil = Enum(4)
}
}
class AsyncMemChiselModel(val depth: Int, val dataWidth: Int, val nReads: Int = 2, val nWrites: Int = 2) extends MultiIOModule {
// FSM states and helper functions
//import AsyncMemChiselModel.ReadState
import AsyncMemChiselModel.ReadState._
val tupleAND = (vals: (Bool, Bool)) => vals._1 && vals._2
val tupleOR = (vals: (Bool, Bool)) => vals._1 || vals._2
// Channelized IO
val channels = IO(new RegfileModelIO(depth, dataWidth, nReads, nWrites))
// Target reset logic
val target_reset_fired = Reg(Bool())
val target_reset_available = target_reset_fired || channels.reset.valid
val target_reset_reg = Reg(Bool())
val target_reset_value = Mux(target_reset_fired, target_reset_reg, channels.reset.bits)
// Host memory implementation
val data = Mem(depth, UInt(dataWidth.W))
val active_read_addr = Wire(UInt())
val active_write_addr = Wire(UInt())
val active_write_data = Wire(UInt())
val active_write_en = Wire(Bool())
val read_data_async = data.read(active_read_addr)
val read_data = RegNext(read_data_async)
when (active_write_en && !target_reset_value && !reset.toBool()) {
data.write(active_write_addr, active_write_data)
}
// Read request management and response data buffering
val read_state = Reg(Vec(nReads, start.cloneType))
val read_resp_data = Reg(Vec(nReads, UInt(dataWidth.W)))
val read_access_req = (read_state zip channels.read_cmds) map { case (s, cmd) => s === start && cmd.valid }
// Don't use a priority encoder: Bools concatenated into UInts are considered hard to QED
val read_access_available = read_access_req.scanLeft(true.B)({ case (open, claim) => open && !claim }).init
val read_access_granted = (read_access_req zip read_access_available) map tupleAND
// Have all reads actually been performed?
val reads_done = read_state.foldLeft(true.B) { case (others_done, s) => others_done && s =/= start }
// This is used to overlap last read and first write -- depends on READ_FIRST implementation
val reads_finishing = (read_state zip channels.read_cmds).foldLeft(true.B) {
case (finishing, (s, cmd)) => finishing && (s =/= start || cmd.fire)
}
// Are all reads done or finishing this cycle?
val outputs_responded_or_firing = (read_state zip channels.read_resps).foldLeft(true.B) {
case (res, (s, resp)) => res && (s === responded || resp.fire)
}
// Write request management
val write_complete = Reg(Vec(nWrites, Bool()))
// Order writes for determinism
val write_prereqs_met = (true.B +: write_complete.init) map { case p => p && reads_done && target_reset_available }
// Are all writes done or finishing this cycle?
val writes_done_or_finishing = (write_complete zip channels.write_cmds).foldLeft(true.B) {
case (res, (complete, cmd)) => res && (complete || cmd.fire)
}
val advance_cycle = outputs_responded_or_firing && writes_done_or_finishing
// Target reset state management
channels.reset.ready := !target_reset_fired
when (advance_cycle || reset.toBool()) {
target_reset_fired := false.B
} .elsewhen (channels.reset.fire) {
target_reset_fired := true.B
target_reset_reg := channels.reset.bits
}
// Read state management
active_read_addr := channels.read_cmds(0).bits.addr
for (i <- 0 until nReads) {
when (read_access_granted(i)) { active_read_addr := channels.read_cmds(i).bits.addr }
channels.read_cmds(i).ready := read_state(i) === start && read_access_available(i)
channels.read_resps(i).bits := Mux(read_state(i) === active, read_data, read_resp_data(i))
channels.read_resps(i).valid := read_state(i) === active || read_state(i) === generated
when (advance_cycle || reset.toBool()) {
read_state(i) := start
} .elsewhen (read_state(i) === start && read_access_granted(i)) {
read_state(i) := active
} .elsewhen (read_state(i) === active) {
read_state(i) := Mux(channels.read_resps(i).fire, responded, generated)
read_resp_data(i) := read_data
} .elsewhen (read_state(i) === generated && channels.read_resps(i).fire) {
read_state(i) := responded
}
}
// Write state management
active_write_addr := channels.write_cmds(0).bits.addr
active_write_data := channels.write_cmds(0).bits.data
active_write_en := false.B
for (i <- 0 until nWrites) {
channels.write_cmds(i).ready := write_prereqs_met(i) && !write_complete(i)
when (advance_cycle || reset.toBool()) {
write_complete(i) := false.B
} .elsewhen (channels.write_cmds(i).fire) {
write_complete(i) := true.B
}
when (channels.write_cmds(i).fire) {
active_write_addr := channels.write_cmds(i).bits.addr
active_write_data := channels.write_cmds(i).bits.data
active_write_en := channels.write_cmds(i).bits.active
}
}
}
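The `scanLeft`-based availability/grant logic above implements a fixed-priority, one-at-a-time arbiter for the single host read port without a priority encoder. A software sketch of the same combinational idea in plain Scala:

```scala
// The port is "available" to requester i iff no lower-indexed requester has
// already claimed it this cycle; a request is granted iff it is raised while
// the port is still available, so at most one grant fires per cycle.
val req = Seq(false, true, true, false)
val available = req.scanLeft(true) { case (open, claim) => open && !claim }.init
val granted = (req zip available).map { case (r, a) => r && a }
// granted == Seq(false, true, false, false): only the first raised request wins
```

The model relies on this to serialize all target-cycle reads through one host-memory read port, retiring one read per host cycle until `reads_done` holds.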
