Merge pull request #400 from firesim/flatten-midas
Flatten the MIDAS submodule into FireSim
This commit is contained in commit a7ed32fe31

@@ -4,9 +4,6 @@
[submodule "sw/firesim-software"]
	path = sw/firesim-software
	url = https://github.com/firesim/firesim-software
[submodule "sim/midas"]
	path = sim/midas
	url = https://github.com/ucb-bar/midas
[submodule "target-design/chipyard"]
	path = target-design/chipyard
	url = https://github.com/ucb-bar/project-template

@@ -28,3 +25,6 @@
[submodule "deploy/workloads/coremark/riscv-coremark"]
	path = deploy/workloads/coremark/riscv-coremark
	url = https://github.com/riscv-boom/riscv-coremark
[submodule "sim/midas/src/main/cc/dramsim2"]
	path = sim/midas/src/main/cc/dramsim2
	url = https://github.com/firesim/DRAMSim2.git
@@ -1 +0,0 @@
Subproject commit 8401cfa7e20c8331264a5d726698a4ed1994d45e
@@ -0,0 +1,12 @@
generated
logs
results
target
project/target
*.out
*.swp
*.tmp
*.key
DVEfiles
*~
*#
@@ -0,0 +1,89 @@
# Golden Gate (MIDAS II)

Golden Gate is an _optimizing_ FIRRTL compiler for automatically generating
FPGA-accelerated simulators from Chisel-based RTL designs, and is the basis
for simulator compilation in [FireSim](https://fires.im).

Golden Gate is the successor to MIDAS, which was originally based on the
[Strober](http://dl.acm.org/citation.cfm?id=3001151) sample-based energy
simulation framework. Golden Gate differs from prior work in that it is, to
our knowledge, the first compiler to support automatic _multi-model
composition_: it can break apart a block of RTL into a graph of models.
Golden Gate uses this feature to identify and replace FPGA-hostile blocks
with multi-host-cycle models that consume fewer FPGA resources while still
exactly representing the behavior of the source RTL. In
[our ICCAD 2019 paper](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf),
we leverage this feature to optimize multi-ported RAMs in order to fit an
extra two BOOM cores (six, up from four) on a Xilinx VU9P.
|
||||
## Changes From MIDAS
|
||||
|
||||
Golden Gate inherits nearly all of the features of MIDAS, including, FASED memory timing models, assertion synthesis, and printf synthesis, but there are some notable changes:
|
||||
|
||||
### 1. Support for Resource Optimizations
|
||||
|
||||
As mentioned above, Golden Gate can identify and optimize FPGA-hostile
|
||||
structures in the target RTL. This is described at length in [our ICCAD2019
|
||||
paper](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf).
|
||||
Currently Golden Gate only supports optimizing multi-ported memories,
|
||||
but other resource-reducing optimizations are under development.
|
||||
|
||||
### 2. Different Inputs and Invocation Model (FIRRTL Stage)

Golden Gate is not invoked in the same process as the target generator.
Instead, it is invoked as a separate process and provided with three inputs:

1) FIRRTL for the target design
2) Associated FIRRTL annotations for that design
3) A compiler parameterization (derived from Rocket Chip's Config system)

This permits decoupling the target generator from the compiler, and enables
the reuse of the same FIRRTL between multiple simulation or EDA backends.
midas.Compiler will be removed in the next release.
### 3. Endpoints Have Been Replaced With Target-to-Host Bridges

Unlike Endpoints, which were instantiated by matching on a Chisel I/O type,
target-to-host bridges (or bridges, for short) are instantiated directly in
the target's RTL (i.e., in Chisel). Unlike endpoints, bridges can be
instantiated anywhere in the module hierarchy, and can more effectively
capture module-hierarchy-dependent parameterization information from the
target. This makes it easier to have multiple instances of the same bridge
with different parameterizations.
### 4. The Input Target Design Must Be Closed

The FIRRTL passed to Golden Gate must expose no dangling I/O (with the
exception of one input clock): instead, the target should be wrapped in a
module that instantiates the appropriate bridges. This wrapper module is
directly analogous to a test harness used in software-based RTL simulation.
How these bridges are instantiated is left to the user, but multiple
examples can be found in FireSim. One benefit of this "closed-world"
approach is that the topology of the simulator (as a network of simulation
models) is guaranteed to match the topology of the input design.
### 5. Different Underlying Dataflow Network Formalism

Golden Gate uses the
[_Latency-Insensitive Bounded-Dataflow Network_](https://dl.acm.org/citation.cfm?id=1715781)
(LI-BDN) target formalism. This makes it possible to model combinational
paths that span multiple models, and to prove properties about target-cycle
exactness and deadlock freedom in the resulting simulator.
## Documentation

Golden Gate's documentation is hosted in [FireSim's Read-The-Docs](https://docs.fires.im).
## Related Publications

* Albert Magyar, David T. Biancolin, Jack Koenig, Sanjit Seshia, Jonathan Bachrach, Krste Asanović, **"Golden Gate: Bridging The Resource-Efficiency Gap Between ASICs and FPGA Prototypes"**, to appear at ICCAD 2019. ([Paper PDF](http://davidbiancolin.github.io/papers/goldengate-iccad19.pdf))
* David Biancolin, Sagar Karandikar, Donggyu Kim, Jack Koenig, Andrew Waterman, Jonathan Bachrach, Krste Asanović, **"FASED: FPGA-Accelerated Simulation and Evaluation of DRAM"**, in proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, February 2019. ([Paper PDF](https://people.eecs.berkeley.edu/~biancolin/papers/fased-fpga19.pdf))
* Donggyu Kim, Christopher Celio, Sagar Karandikar, David Biancolin, Jonathan Bachrach, and Krste Asanović, **"DESSERT: Debugging RTL Effectively with State Snapshotting for Error Replays across Trillions of Cycles"**, in proceedings of the 28th International Conference on Field Programmable Logic & Applications (FPL 2018), Dublin, Ireland, August 2018. ([IEEE Xplore](https://ieeexplore.ieee.org/abstract/document/8533471))
* Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolić, Randy Katz, Jonathan Bachrach, and Krste Asanović, **"FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud"**, in proceedings of the 45th ACM/IEEE International Symposium on Computer Architecture (ISCA 2018), Los Angeles, June 2018. ([Paper PDF](https://sagark.org/assets/pubs/firesim-isca2018.pdf), [IEEE Xplore](https://ieeexplore.ieee.org/document/8416816)) **Selected as one of IEEE Micro's "Top Picks from Computer Architecture Conferences, 2018".**
* Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, and Krste Asanović, **"Evaluation of RISC-V RTL with FPGA-Accelerated Simulation"**, the First Workshop on Computer Architecture Research with RISC-V (CARRV 2017), Boston, MA, USA, October 2017. ([Paper PDF](doc/papers/carrv-2017.pdf))
* Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, and Krste Asanović, **"Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL"**, International Symposium on Computer Architecture (ISCA 2016), Seoul, Korea, June 2016. ([ACM DL](https://dl.acm.org/citation.cfm?id=3001151), [Slides](http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2B-2.pdf))
## Dependencies

This repository depends on the following projects:

* [Chisel](https://github.com/freechipsproject/chisel3): In the current version, target RTL that MIDAS transforms must be written in Chisel. Additionally, the MIDAS RTL libraries are all written in Chisel.
* [FIRRTL](https://github.com/freechipsproject/firrtl): Transformations of target RTL are performed using FIRRTL compiler passes.
* [RocketChip](https://github.com/freechipsproject/rocket-chip): Rocket Chip is not only a chip generator, but also a collection of useful libraries for various hardware designs.
* [barstools](https://github.com/ucb-bar/barstools): Some additional technology-dependent custom transforms (e.g., the macro compiler) are required when Strober energy modeling is enabled.
@@ -0,0 +1,7 @@
organization := "edu.berkeley.cs"

version := "1.0-SNAPSHOT"

name := "midas"

scalaVersion := "2.12.4"
Binary file not shown.
After Width: | Height: | Size: 22 KiB |
Binary file not shown.
@@ -0,0 +1,3 @@
*.obj
gmp-*
sdfAnnotateInfo
@@ -0,0 +1,119 @@
midas_dir = $(abspath .)
util_dir = $(midas_dir)/utils
bridge_dir = $(midas_dir)/bridges
replay_dir = $(midas_dir)/replay
v_dir = $(abspath ../verilog)
r_dir = $(abspath ../resources)

########################################################################
# Parameters:
# 1) PLATFORM: FPGA platform board (zynq by default)
# 2) DESIGN: Target design of MIDAS
# 3) GEN_DIR: Directory for generated source code
# 4) OUT_DIR: Directory for binary files (GEN_DIR by default)
# 5) DRIVER: Software driver written by the user (not necessary for replay)
# 6) CLOCK_PERIOD (optional): Clock period of tests
########################################################################
ifeq ($(strip $(DESIGN)),)
$(error Define DESIGN, the target design)
endif
ifeq ($(strip $(GEN_DIR)),)
$(error Define GEN_DIR, where all MIDAS-generated code resides)
endif
ifeq ($(filter $(MAKECMDGOALS),vcs-replay $(REPLAY_BINARY)),)
ifeq ($(strip $(DRIVER)),)
$(error Define DRIVER, the source code of the simulation driver)
endif
endif

PLATFORM ?= zynq
OUT_DIR ?= $(GEN_DIR)
CLOCK_PERIOD ?= 1.0

$(info platform: $(PLATFORM))
$(info target design: $(DESIGN))
$(info generated source directory: $(GEN_DIR))
$(info output directory: $(OUT_DIR))
$(info driver source files: $(DRIVER))
$(info clock period: $(CLOCK_PERIOD))
shim := FPGATop

override CXXFLAGS := $(CXXFLAGS) -std=c++11 -Wall -I$(midas_dir)/dramsim2

include $(util_dir)/utils.mk

$(OUT_DIR)/dramsim2_ini: $(r_dir)/dramsim2_ini
	ln -sf $< $@

$(OUT_DIR)/$(DESIGN).chain:
	$(if $(wildcard $(GEN_DIR)/$(DESIGN).chain),cp $(GEN_DIR)/$(DESIGN).chain $@,)

override CXXFLAGS += -I$(midas_dir) -I$(util_dir)
# The trailing whitespace is important for some reason...
override LDFLAGS := $(LDFLAGS) -L$(GEN_DIR) -lstdc++ -lpthread -lgmp -lmidas 

design_v := $(GEN_DIR)/$(shim).v
design_h := $(GEN_DIR)/$(DESIGN)-const.h
design_vh := $(GEN_DIR)/$(DESIGN)-const.vh
driver_h = $(foreach t, $(DRIVER), $(wildcard $(dir $(t))/*.h))
bridge_h := $(wildcard $(bridge_dir)/*.h)
bridge_cc := $(wildcard $(bridge_dir)/*.cc)
bridge_o := $(patsubst $(bridge_dir)/%.cc, $(GEN_DIR)/%.o, $(bridge_cc))
$(bridge_o): $(GEN_DIR)/%.o: $(bridge_dir)/%.cc $(design_h) $(bridge_h)
	$(CXX) $(CXXFLAGS) -c -o $@ $< -include $(word 2, $^)

platform_files := simif simif_$(PLATFORM) sample/sample
platform_h := $(addprefix $(midas_dir)/, $(addsuffix .h, $(platform_files)))
platform_cc := $(addprefix $(midas_dir)/, $(addsuffix .cc, $(platform_files) sample/simif_sample))
platform_o := $(addprefix $(GEN_DIR)/, $(addsuffix .o, $(platform_files) sample/simif_sample))

$(platform_o): $(GEN_DIR)/%.o: $(midas_dir)/%.cc $(design_h) $(platform_h)
	mkdir -p $(dir $@)
	$(CXX) $(CXXFLAGS) -c -o $@ $< -include $(word 2, $^)

$(OUT_DIR)/$(DESIGN)-$(PLATFORM): $(design_h) $(lib) $(DRIVER) $(driver_h) $(platform_o) $(bridge_o)
	mkdir -p $(OUT_DIR)
	$(CXX) $(CXXFLAGS) -include $< \
	-o $@ $(DRIVER) $(dramsim_o) $(lib_o) $(platform_o) $(bridge_o) $(LDFLAGS)

$(PLATFORM): $(OUT_DIR)/$(DESIGN)-$(PLATFORM) $(OUT_DIR)/$(DESIGN).chain
# Sources for building MIDAS-level simulators. Must be defined before sourcing
# the VCS/Verilator Makefrags
override CFLAGS += -include $(design_h)

# Models of FPGA primitives that are used in host-level sim, but not in FPGATop
sim_fpga_resource_models := $(v_dir)/BUFGCE.v

emul_files := simif simif_emul emul/mmio_$(PLATFORM) sample/sample
emul_h := $(driver_h) $(bridge_h) $(addprefix $(midas_dir)/, $(addsuffix .h, $(emul_files) emul/mmio))
# This includes C sources and static libraries
emul_cc := $(DRIVER) $(bridge_cc) $(addprefix $(midas_dir)/, $(addsuffix .cc, $(emul_files) sample/simif_sample)) $(lib)
emul_v := $(design_vh) $(design_v) $(sim_fpga_resource_models)

# The top-level module must be called out for Verilator
ifeq ($(PLATFORM),zynq)
top_module = ZynqShim
endif
ifeq ($(PLATFORM),f1)
top_module = F1Shim
endif

verilator_conf := rtlsim/ml-verilator-conf.vlt
include rtlsim/Makefrag-verilator

verilator: $(OUT_DIR)/V$(DESIGN) $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
verilator-debug: $(OUT_DIR)/V$(DESIGN)-debug $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini

# Add an extra wrapper source for VCS simulators
vcs_wrapper_v := $(v_dir)/emul_$(PLATFORM).v
TB := emul
VCS_FLAGS := -e vcs_main
include rtlsim/Makefrag-vcs

vcs: $(OUT_DIR)/$(DESIGN) $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini
vcs-debug: $(OUT_DIR)/$(DESIGN)-debug $(OUT_DIR)/$(DESIGN).chain $(OUT_DIR)/dramsim2_ini

include $(replay_dir)/replay.mk

.PHONY: $(PLATFORM) verilator verilator-debug vcs vcs-debug
@@ -0,0 +1,20 @@
// See LICENSE for license details.

#include "address_map.h"

AddressMap::AddressMap(
    unsigned int r_register_count,
    const unsigned int* r_register_addrs,
    const char* const* r_register_names,
    unsigned int w_register_count,
    const unsigned int* w_register_addrs,
    const char* const* w_register_names) {

  for (size_t i = 0; i < r_register_count; i++) {
    r_registers.insert(std::make_pair(r_register_names[i], r_register_addrs[i]));
  }

  for (size_t i = 0; i < w_register_count; i++) {
    w_registers.insert(std::make_pair(w_register_names[i], w_register_addrs[i]));
  }
}
@@ -0,0 +1,35 @@
// See LICENSE for license details.

#ifndef __ADDRESS_MAP_H
#define __ADDRESS_MAP_H

#include <cstdint>
#include <map>
#include <string>

// Maps the arrays emitted by the MIDAS compiler to a more useful object that
// can be used to read and write a local set of registers by their names.
//
// Registers may appear in both R and W lists.
class AddressMap
{
  public:
    AddressMap(
        unsigned int read_register_count,
        const unsigned int* read_register_addrs,
        const char* const* read_register_names,
        unsigned int write_register_count,
        const unsigned int* write_register_addrs,
        const char* const* write_register_names);

    // Look up register address based on name
    uint32_t r_addr(std::string name) { return r_registers[name]; };
    uint32_t w_addr(std::string name) { return w_registers[name]; };

    // Check for register presence
    bool r_reg_exists(std::string name) { return r_registers.find(name) != r_registers.end(); };
    bool w_reg_exists(std::string name) { return w_registers.find(name) != w_registers.end(); };

    // Register name -> register address
    std::map<std::string, uint32_t> r_registers;
    std::map<std::string, uint32_t> w_registers;
};

#endif // __ADDRESS_MAP_H
@@ -0,0 +1,58 @@
// See LICENSE for license details.

#ifndef __BRIDGE_DRIVER_H
#define __BRIDGE_DRIVER_H

#include "simif.h"

// Bridge Drivers are the CPU-hosted component of a Target-to-Host Bridge. A
// Bridge Driver interacts with its accompanying FPGA-hosted BridgeModule
// using MMIO (via read() and write() methods) or CPU-mastered DMA (via pull()
// and push()).

class bridge_driver_t
{
  public:
    bridge_driver_t(simif_t* s): sim(s) { }
    virtual ~bridge_driver_t() {};
    // Initialize BridgeModule state -- this can't be done in the constructor currently
    virtual void init() = 0;
    // Does work that allows the Bridge to advance in simulation time (one or more cycles).
    // The standard FireSim driver calls the tick methods of all registered bridge drivers.
    // Bridges whose BridgeModule is free-running need not implement this method.
    virtual void tick() = 0;
    // Indicates the simulation should terminate.
    // Tie off to false if the bridge will never call for the simulation to terminate.
    virtual bool terminate() = 0;
    // If the bridge driver calls for termination, encode a cause here. 0 = PASS. All
    // other codes are bridge-implementation defined.
    virtual int exit_code() = 0;
    // The analog of init(), this provides a final opportunity to interact with
    // the FPGA before destructors are called at the end of simulation. Useful
    // for doing end-of-simulation clean up that requires calling {read,write,push,pull}.
    virtual void finish() = 0;

  protected:
    void write(size_t addr, data_t data) {
      sim->write(addr, data);
    }

    data_t read(size_t addr) {
      return sim->read(addr);
    }

    ssize_t pull(size_t addr, char *data, size_t size) {
      return sim->pull(addr, data, size);
    }

    ssize_t push(size_t addr, char *data, size_t size) {
      if (size == 0)
        return 0;
      return sim->push(addr, data, size);
    }

  private:
    simif_t *sim;
};

#endif // __BRIDGE_DRIVER_H
@@ -0,0 +1,196 @@
// See LICENSE for license details.

#include <iostream>
#include <algorithm>
#include <exception>
#include <stdio.h>

#include "fased_memory_timing_model.h"

void Histogram::init() {
  // Read out the initial values
  write(enable, 1);
  for (size_t i = 0; i < HISTOGRAM_SIZE; i++) {
    write(addr, i);
    latency[i] = read64(dataH, dataL, BIN_H_MASK);
  }
  // Disable readout enable; otherwise histogram updates will be gated
  write(enable, 0);
}

void Histogram::finish() {
  // Read out the final values and compute deltas against those read in init()
  write(enable, 1);
  for (size_t i = 0; i < HISTOGRAM_SIZE; i++) {
    write(addr, i);
    latency[i] = read64(dataH, dataL, BIN_H_MASK) - latency[i];
  }
  // Disable readout enable; otherwise histogram updates will be gated
  write(enable, 0);
}

void AddrRangeCounter::init() {
  nranges = read("numRanges");
  range_bytes = new uint64_t[nranges];

  write(enable, 1);
  for (size_t i = 0; i < nranges; i++) {
    write(addr, i);
    range_bytes[i] = read64(dataH, dataL, RANGE_H_MASK);
  }
  write(enable, 0);
}

void AddrRangeCounter::finish() {
  write(enable, 1);
  for (size_t i = 0; i < nranges; i++) {
    write(addr, i);
    range_bytes[i] = read64(dataH, dataL, RANGE_H_MASK);
  }
  write(enable, 0);
}

FASEDMemoryTimingModel::FASEDMemoryTimingModel(
    simif_t* sim, AddressMap addr_map, int argc, char** argv,
    std::string stats_file_name, size_t mem_size, uint64_t mem_host_offset)
    : FpgaModel(sim, addr_map), mem_size(mem_size), mem_host_offset(mem_host_offset) {

  std::vector<std::string> args(argv + 1, argv + argc);
  for (auto &arg: args) {
    if (arg.find("+mm_") == 0) {
      auto sub_arg = std::string(arg.c_str() + 4);
      size_t delimit_idx = sub_arg.find_first_of("=");
      std::string key = sub_arg.substr(0, delimit_idx);
      int value = std::stoi(sub_arg.substr(delimit_idx + 1));
      model_configuration[key] = value;
    }
  }

  stats_file.open(stats_file_name, std::ofstream::out);
  if (!stats_file.is_open()) {
    throw std::runtime_error("Could not open output file: " + stats_file_name);
  }

  for (auto pair: addr_map.r_registers) {
    // Only profile read-only registers
    if (!addr_map.w_reg_exists(pair.first)) {
      // Iterate through substrings to exclude
      bool exclude = false;
      for (auto &substr: profile_exclusion) {
        if (pair.first.find(substr) != std::string::npos) { exclude = true; }
      }
      if (!exclude) {
        profile_reg_addrs.push_back(pair.second);
        stats_file << pair.first << ",";
      }
    }
  }
  stats_file << std::endl;

  if (addr_map.w_reg_exists("hostReadLatencyHist_enable")) {
    histograms.push_back(Histogram(sim, addr_map, "hostReadLatency"));
    histograms.push_back(Histogram(sim, addr_map, "hostWriteLatency"));
    histograms.push_back(Histogram(sim, addr_map, "targetReadLatency"));
    histograms.push_back(Histogram(sim, addr_map, "targetWriteLatency"));
    histograms.push_back(Histogram(sim, addr_map, "ingressReadLatency"));
    histograms.push_back(Histogram(sim, addr_map, "ingressWriteLatency"));
    histograms.push_back(Histogram(sim, addr_map, "totalReadLatency"));
    histograms.push_back(Histogram(sim, addr_map, "totalWriteLatency"));
  }

  if (addr_map.w_reg_exists("readRanges_enable")) {
    rangectrs.push_back(AddrRangeCounter(sim, addr_map, "read"));
    rangectrs.push_back(AddrRangeCounter(sim, addr_map, "write"));
  }
}

void FASEDMemoryTimingModel::profile() {
  for (auto addr: profile_reg_addrs) {
    stats_file << read(addr) << ",";
  }
  stats_file << std::endl;
}

void FASEDMemoryTimingModel::init() {
  for (auto &pair: addr_map.w_registers) {
    auto value_it = model_configuration.find(pair.first);
    if (value_it != model_configuration.end()) {
      write(pair.second, value_it->second);
    }
    else if (pair.first.find("hostMemOffsetLow") != std::string::npos) {
      write(pair.second, mem_host_offset & ((1ULL << 32) - 1));
    }
    else if (pair.first.find("hostMemOffsetHigh") != std::string::npos) {
      write(pair.second, mem_host_offset >> 32);
    }
    else {
      // Iterate through substrings to exclude
      bool exclude = false;
      for (auto &substr: configuration_exclusion) {
        if (pair.first.find(substr) != std::string::npos) { exclude = true; }
      }

      if (!exclude) {
        char buf[100];
        sprintf(buf, "No value provided for configuration register: %s", pair.first.c_str());
        throw std::runtime_error(buf);
      } else {
        fprintf(stderr, "Ignoring writeable register: %s\n", pair.first.c_str());
      }
    }
  }
  for (auto &hist: histograms) { hist.init(); }
  for (auto &rctr: rangectrs) { rctr.init(); }
}

void FASEDMemoryTimingModel::finish() {
  for (auto &hist: histograms) { hist.finish(); }
  for (auto &rctr: rangectrs) { rctr.finish(); }

  std::ofstream histogram_file;
  histogram_file.open("latency_histogram.csv", std::ofstream::out);
  if (!histogram_file.is_open()) {
    throw std::runtime_error("Could not open histogram output file");
  }

  // Header
  for (auto &hist: histograms) {
    histogram_file << hist.name << ",";
  }
  histogram_file << std::endl;
  // Data
  for (size_t i = 0; i < HISTOGRAM_SIZE; i++) {
    for (auto &hist: histograms) {
      histogram_file << hist.latency[i] << ",";
    }
    histogram_file << std::endl;
  }
  histogram_file.close();

  if (!rangectrs.empty()) {
    size_t nranges = rangectrs[0].nranges;
    std::ofstream rangectr_file;

    rangectr_file.open("range_counters.csv", std::ofstream::out);
    if (!rangectr_file.is_open()) {
      throw std::runtime_error("Could not open range counter file");
    }

    rangectr_file << "Address,";
    for (auto &rctr: rangectrs) {
      rangectr_file << rctr.name << ",";
    }
    rangectr_file << std::endl;

    for (size_t i = 0; i < nranges; i++) {
      rangectr_file << std::hex << (i * mem_size / nranges) << ",";
      for (auto &rctr: rangectrs) {
        rangectr_file << std::dec << rctr.range_bytes[i] << ",";
      }
      rangectr_file << std::endl;
    }
    rangectr_file.close();
  }

  stats_file.close();
}
@@ -0,0 +1,114 @@
// See LICENSE for license details.

#ifndef __FASED_MEMORY_TIMING_MODEL_H
#define __FASED_MEMORY_TIMING_MODEL_H

/* This is the widget driver for FASED memory-timing models
 *
 * FASED instances are FPGA-hosted and only rely on this driver to:
 * 1) set runtime-configurable timing parameters before simulation commences
 * 2) poll instrumentation registers
 *
 */

#include <unordered_map>
#include <set>
#include <fstream>

#include "fpga_model.h"

// MICRO HACKS.
constexpr int HISTOGRAM_SIZE = 1024;
constexpr int BIN_SIZE = 36;
constexpr int RANGE_COUNT_SIZE = 48;
constexpr data_t BIN_H_MASK = (1L << (BIN_SIZE - 32)) - 1;
constexpr data_t RANGE_H_MASK = (1L << (RANGE_COUNT_SIZE - 32)) - 1;

class AddrRangeCounter: public FpgaModel {
  public:
    AddrRangeCounter(simif_t *sim, AddressMap addr_map, std::string name):
      FpgaModel(sim, addr_map), name(name) {};
    ~AddrRangeCounter(void) { /*delete [] range_bytes;*/ }

    void init();
    void profile() {}
    void finish();

    std::string name;
    uint64_t *range_bytes;
    size_t nranges;

  private:
    std::string enable = name + "Ranges_enable";
    std::string dataH = name + "Ranges_dataH";
    std::string dataL = name + "Ranges_dataL";
    std::string addr = name + "Ranges_addr";
};

class Histogram: public FpgaModel {
  public:
    Histogram(simif_t* s, AddressMap addr_map, std::string name):
      FpgaModel(s, addr_map), name(name) {};
    void init();
    void profile() {};
    void finish();
    std::string name;
    uint64_t latency[HISTOGRAM_SIZE];

  private:
    std::string enable = name + "Hist_enable";
    std::string dataH = name + "Hist_dataH";
    std::string dataL = name + "Hist_dataL";
    std::string addr = name + "Hist_addr";
};

class FASEDMemoryTimingModel: public FpgaModel
{
  public:
    FASEDMemoryTimingModel(simif_t* s, AddressMap addr_map, int argc, char** argv,
                           std::string stats_file_name, size_t mem_size, uint64_t mem_host_offset);
    void init();
    void profile();
    void finish();

  private:
    // Saves a map of register names to settings
    std::unordered_map<std::string, uint32_t> model_configuration;
    std::vector<uint32_t> profile_reg_addrs;
    std::ofstream stats_file;
    std::vector<Histogram> histograms;
    std::vector<AddrRangeCounter> rangectrs;
    std::set<std::string> configuration_exclusion {
      "Hist_dataL",
      "Hist_dataH",
      "Hist_addr",
      "Hist_enable",
      "hostMemOffsetLow",
      "hostMemOffsetHigh",
      "Ranges_dataL",
      "Ranges_dataH",
      "Ranges_addr",
      "Ranges_enable",
      "numRanges"
    };

    std::set<std::string> profile_exclusion {
      "Hist_dataL",
      "Hist_dataH",
      "Hist_addr",
      "Hist_enable",
      "hostMemOffsetLow",
      "hostMemOffsetHigh",
      "Ranges_dataL",
      "Ranges_dataH",
      "Ranges_addr",
      "Ranges_enable",
      "numRanges"
    };

    bool has_latency_histograms() { return histograms.size() > 0; };
    size_t mem_size;
    uint64_t mem_host_offset;
};

#endif // __FASED_MEMORY_TIMING_MODEL_H
@@ -0,0 +1,60 @@
// See LICENSE for license details.

#ifndef __FPGA_MODEL_H
#define __FPGA_MODEL_H

#include <cassert>

#include "simif.h"
#include "address_map.h"

/**
 * Base class for (handwritten) FPGA-hosted models
 *
 * These models have two important methods:
 *
 * 1) init: Sets their runtime configuration, e.g. the latency of a
 *    latency pipe
 *
 * 2) profile: Gives a default means to read all readable registers in
 *    the model, including programmable registers and instrumentation
 *
 */

class FpgaModel
{
  private:
    simif_t *sim;

  public:
    FpgaModel(simif_t* s, AddressMap addr_map): sim(s), addr_map(addr_map) {};
    virtual void init() = 0;
    virtual void profile() = 0;
    virtual void finish() = 0;

  protected:
    AddressMap addr_map;

    void write(size_t addr, data_t data) {
      sim->write(addr, data);
    }

    data_t read(size_t addr) {
      return sim->read(addr);
    }

    void write(std::string reg, data_t data) {
      sim->write(addr_map.w_addr(reg), data);
    }

    data_t read(std::string reg) {
      return sim->read(addr_map.r_addr(reg));
    }

    uint64_t read64(std::string msw, std::string lsw, data_t upper_word_mask) {
      assert(sizeof(data_t) == 4);
      uint64_t data = ((uint64_t) (read(msw) & upper_word_mask)) << 32;
      return data | read(lsw);
    }
};

#endif // __FPGA_MODEL_H
@ -0,0 +1,46 @@
|
|||
#ifdef ASSERTBRIDGEMODULE_struct_guard
|
||||
|
||||
#include "synthesized_assertions.h"
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
|
||||
|
||||
synthesized_assertions_t::synthesized_assertions_t(simif_t* sim,
|
||||
ASSERTBRIDGEMODULE_struct * mmio_addrs): bridge_driver_t(sim) {
|
||||
this->mmio_addrs = mmio_addrs;
|
||||
};
|
||||
|
||||
synthesized_assertions_t::~synthesized_assertions_t() {
|
||||
free(this->mmio_addrs);
|
||||
}
|
||||
|
||||
void synthesized_assertions_t::tick() {
|
||||
if (read(this->mmio_addrs->fire)) {
|
||||
// Read assertion information
|
||||
std::vector<std::string> msgs;
|
||||
std::ifstream file(std::string(TARGET_NAME) + ".asserts");
|
||||
std::string line;
|
||||
std::ostringstream oss;
|
||||
while (std::getline(file, line)) {
|
||||
if (line == "0") {
|
||||
msgs.push_back(oss.str());
|
||||
oss.str(std::string());
|
||||
} else {
|
||||
oss << line << std::endl;
|
||||
}
|
||||
}
|
||||
assert_cycle = read(this->mmio_addrs->cycle_low);
|
||||
assert_cycle |= ((uint64_t)read(this->mmio_addrs->cycle_high)) << 32;
|
||||
assert_id = read(this->mmio_addrs->id);
|
||||
std::cerr << msgs[assert_id];
|
||||
std::cerr << " at cycle: " << assert_cycle << std::endl;
|
||||
assert_fired = true;
|
||||
}
|
||||
}
|
||||
|
||||
void synthesized_assertions_t::resume() {
|
||||
assert_fired = false;
|
||||
write(this->mmio_addrs->resume, 1);
|
||||
}
|
||||
|
||||
#endif // ASSERTBRIDGEMODULE_struct_guard
|
@@ -0,0 +1,28 @@
#ifndef __SYNTHESIZED_ASSERTIONS_H
#define __SYNTHESIZED_ASSERTIONS_H

#ifdef ASSERTBRIDGEMODULE_struct_guard

#include "bridge_driver.h"

class synthesized_assertions_t: public bridge_driver_t
{
public:
  synthesized_assertions_t(simif_t* sim, ASSERTBRIDGEMODULE_struct * mmio_addrs);
  ~synthesized_assertions_t();
  virtual void init() {};
  virtual void tick();
  virtual void finish() {};
  void resume(); // Clears any set assertions, and allows the simulation to advance
  virtual bool terminate() { return assert_fired; };
  virtual int exit_code() { return (assert_fired) ? assert_id + 1 : 0; };
private:
  bool assert_fired = false;
  int assert_id;
  uint64_t assert_cycle;
  ASSERTBRIDGEMODULE_struct * mmio_addrs;
};

#endif // ASSERTBRIDGEMODULE_struct_guard

#endif //__SYNTHESIZED_ASSERTIONS_H
@@ -0,0 +1,294 @@
#ifdef PRINTBRIDGEMODULE_struct_guard

#include <iomanip>
#include <cstdio>
#include <cstring>
#include <cerrno>

#include "synthesized_prints.h"

synthesized_prints_t::synthesized_prints_t(
  simif_t* sim,
  std::vector<std::string> &args,
  PRINTBRIDGEMODULE_struct * mmio_addrs,
  unsigned int print_count,
  unsigned int token_bytes,
  unsigned int idle_cycles_mask,
  const unsigned int* print_offsets,
  const char* const* format_strings,
  const unsigned int* argument_counts,
  const unsigned int* argument_widths,
  unsigned int dma_address):
    bridge_driver_t(sim),
    mmio_addrs(mmio_addrs),
    print_count(print_count),
    token_bytes(token_bytes),
    idle_cycles_mask(idle_cycles_mask),
    print_offsets(print_offsets),
    format_strings(format_strings),
    argument_counts(argument_counts),
    argument_widths(argument_widths),
    dma_address(dma_address) {
  assert((token_bytes & (token_bytes - 1)) == 0);
  assert(print_count > 0);

  const char *printfilename = default_filename.c_str();

  this->start_cycle = 0;
  this->end_cycle = -1ULL;

  std::string printfile_arg = std::string("+print-file=");
  std::string printstart_arg = std::string("+print-start=");
  std::string printend_arg = std::string("+print-end=");
  // Does not format the printfs before writing them to file
  std::string binary_arg = std::string("+print-binary");
  // Removes the cycle prefix from human-readable output
  std::string cycleprefix_arg = std::string("+print-no-cycle-prefix");

  // Choose a multiple of token_bytes for the batch size
  if (((beat_bytes * desired_batch_beats) % token_bytes) != 0) {
    this->batch_beats = token_bytes / beat_bytes;
  } else {
    this->batch_beats = desired_batch_beats;
  }

  for (auto &arg: args) {
    if (arg.find(printfile_arg) == 0) {
      printfilename = const_cast<char*>(arg.c_str()) + printfile_arg.length();
    }
    if (arg.find(printstart_arg) == 0) {
      char *str = const_cast<char*>(arg.c_str()) + printstart_arg.length();
      this->start_cycle = atol(str);
    }
    if (arg.find(printend_arg) == 0) {
      char *str = const_cast<char*>(arg.c_str()) + printend_arg.length();
      this->end_cycle = atol(str);
    }
    if (arg.find(binary_arg) == 0) {
      human_readable = false;
    }
    if (arg.find(cycleprefix_arg) == 0) {
      print_cycle_prefix = false;
    }
  }
  current_cycle = start_cycle; // We won't receive tokens until start_cycle, so fast-forward

  this->printfile.open(printfilename, std::ios_base::out | std::ios_base::binary);
  if (!this->printfile.is_open()) {
    fprintf(stderr, "Could not open print log file: %s\n", printfilename);
    abort();
  }

  this->printstream = &(this->printfile);

  widths.resize(print_count);
  // Used to reconstruct the relative position of arguments in the flattened argument_widths array
  size_t arg_base_offset = 0;
  size_t print_bit_offset = 1; // The lsb of the current print in the packed token

  for (size_t p_idx = 0; p_idx < print_count; p_idx++) {

    auto print_args = new print_vars_t;
    size_t print_width = 1; // A running total of argument widths for this print, including an enable bit

    // Iterate through the arguments for this print
    for (size_t arg_idx = 0; arg_idx < argument_counts[p_idx]; arg_idx++) {
      size_t arg_width = argument_widths[arg_base_offset + arg_idx];
      widths[p_idx].push_back(arg_width);

      mpz_t* mask = (mpz_t*)malloc(sizeof(mpz_t));
      // Below is equivalent to *mask = (1 << arg_width) - 1
      mpz_init(*mask);
      mpz_set_ui(*mask, 1);
      mpz_mul_2exp(*mask, *mask, arg_width);
      mpz_sub_ui(*mask, *mask, 1);

      print_args->data.push_back(mask);
      print_width += arg_width;
    }

    size_t aligned_offset = print_bit_offset / gmp_align_bits;
    size_t aligned_msw = (print_width + print_bit_offset) / gmp_align_bits;
    size_t rounded_size = aligned_msw - aligned_offset + 1;

    arg_base_offset += argument_counts[p_idx];
    masks.push_back(print_args);
    sizes.push_back(rounded_size);
    aligned_offsets.push_back(aligned_offset);
    bit_offset.push_back(print_bit_offset % gmp_align_bits);

    print_bit_offset += print_width;
  }
};

synthesized_prints_t::~synthesized_prints_t() {
  free(this->mmio_addrs);
  for (size_t i = 0 ; i < print_count ; i++) {
    delete masks[i];
  }
}

void synthesized_prints_t::init() {
  // Set the bounds in the widget
  write(this->mmio_addrs->startCycleL, this->start_cycle);
  write(this->mmio_addrs->startCycleH, this->start_cycle >> 32);
  write(this->mmio_addrs->endCycleL, this->end_cycle);
  write(this->mmio_addrs->endCycleH, this->end_cycle >> 32);
  write(this->mmio_addrs->doneInit, 1);
}

// Accepts the format string and the masked arguments, and emits the formatted
// print to the desired stream
void synthesized_prints_t::print_format(const char* fmt, print_vars_t* vars, print_vars_t* masks) {
  size_t k = 0;
  if (print_cycle_prefix) {
    *printstream << "CYCLE:" << std::setw(13) << current_cycle << " ";
  }
  while (*fmt) {
    if (*fmt == '%' && fmt[1] != '%') {
      mpz_t* value = vars->data[k];
      char* v = NULL;
      if (fmt[1] == 's') {
        size_t size;
        v = (char*)mpz_export(NULL, &size, 1, sizeof(char), 0, 0, *value);
        for (size_t j = 0 ; j < size ; j++) printstream->put(v[j]);
        fmt++;
        free(v);
      } else {
        char buf[1024];
        switch (*(++fmt)) {
          case 'h':
          case 'x': gmp_sprintf(buf, "%0*Zx", mpz_sizeinbase(*(masks->data[k]), 16), *value); break;
          case 'd': gmp_sprintf(buf, "%*Zd", mpz_sizeinbase(*(masks->data[k]), 10), *value); break;
          case 'b': mpz_get_str(buf, 2, *value); break;
          default: assert(0); break;
        }
        (*printstream) << buf;
      }
      fmt++;
      k++;
    } else if (*fmt == '%') {
      printstream->put(*(++fmt));
      fmt++;
    } else if (*fmt == '\\' && fmt[1] == 'n') {
      printstream->put('\n');
      fmt += 2;
    } else {
      printstream->put(*fmt);
      fmt++;
    }
  }
  assert(k == vars->data.size());
}

// Returns true if at least one print in the token is enabled in this cycle
bool has_enabled_print(char * buf) { return (buf[0] & 1); }
// If the token has no enabled prints, return a number of idle cycles encoded in the msbs
uint32_t decode_idle_cycles(char * buf, uint32_t mask) {
  return (((*((uint32_t*)buf)) & mask) >> 1);
}

// Iterates through the DMA flits (each is one token), checking if there are enabled prints
void synthesized_prints_t::process_tokens(size_t beats) {
  size_t batch_bytes = beats * beat_bytes;

  // See FireSim issue #208
  // This needs to be page aligned, as a DMA request that spans a page is
  // fractured into a pair, and for reasons unknown, the first beat of the second
  // request is lost. Once aligned, requests larger than a page will be fractured into
  // page-size (64-beat) requests and these seem to behave correctly.
  alignas(4096) char buf[batch_bytes];

  uint32_t bytes_received = pull(dma_address, (char*)buf, batch_bytes);
  if (bytes_received != batch_bytes) {
    printf("ERR MISMATCH! on reading print tokens. Read %d bytes, wanted %d bytes.\n",
           bytes_received, batch_bytes);
    printf("errno: %s\n", strerror(errno));
    exit(1);
  }

  if (human_readable) {
    for (size_t idx = 0; idx < batch_bytes; idx += token_bytes) {
      if (has_enabled_print(&buf[idx])) {
        show_prints(&buf[idx]);
        current_cycle++;
      } else {
        current_cycle += decode_idle_cycles(&buf[idx], idle_cycles_mask);
      }
    }
  } else {
    printstream->write(buf, batch_bytes);
  }
}

// Returns true if the print at the current offset is enabled in this cycle
bool synthesized_prints_t::current_print_enabled(gmp_align_t * buf, size_t offset) {
  return (buf[0] & (1LL << (offset)));
}

// Finds enabled prints in a token
void synthesized_prints_t::show_prints(char * buf) {
  for (size_t i = 0 ; i < print_count; i++) {
    gmp_align_t* data = ((gmp_align_t*)buf) + aligned_offsets[i];
    // First bit is enable
    if (current_print_enabled(data, bit_offset[i])) {
      mpz_t print;
      mpz_init(print);
      mpz_import(print, sizes[i], -1, sizeof(gmp_align_t), 0, 0, data);
      mpz_fdiv_q_2exp(print, print, bit_offset[i] + 1);

      print_vars_t vars;
      size_t num_args = argument_counts[i];
      for (size_t arg = 0 ; arg < num_args ; arg++) {
        mpz_t* var = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_t* mask = masks[i]->data[arg];
        mpz_init(*var);
        // *var = print & *mask
        mpz_and(*var, print, *mask);
        vars.data.push_back(var);
        // print = print >> width
        mpz_fdiv_q_2exp(print, print, widths[i][arg]);
      }
      print_format(format_strings[i], &vars, masks[i]);
      mpz_clear(print);
    }
  }
}

void synthesized_prints_t::tick() {
  // Pull batch_beats tokens from the FPGA if at least that many are available
  // Assumes 1:1 token to dma-beat size
  size_t beats_available = read(mmio_addrs->outgoing_count);
  if (beats_available >= batch_beats) {
    process_tokens(batch_beats);
  }
}

// This is a little hacky... however it'll probably work perfectly fine on the
// FPGA as mmio read latency is 100+ ns.
int synthesized_prints_t::beats_avaliable_stable() {
  size_t prev_beats_available = 0;
  size_t beats_avaliable = read(mmio_addrs->outgoing_count);
  while (beats_avaliable > prev_beats_available) {
    prev_beats_available = beats_avaliable;
    beats_avaliable = read(mmio_addrs->outgoing_count);
  }
  return beats_avaliable;
}

// Pull in any remaining tokens and flush them to file
// WARNING: may not function correctly if the simulator is actively running
void synthesized_prints_t::flush() {
  // Wait for the system to settle
  size_t beats_available = beats_avaliable_stable();

  // If multiple tokens are being packed into a single DMA beat, force the widget
  // to write out any incomplete beat
  if (token_bytes < beat_bytes) {
    write(mmio_addrs->flushNarrowPacket, 1);
    while (read(mmio_addrs->outgoing_count) != (beats_available + 1));
    beats_available++;
  }

  if (beats_available) process_tokens(beats_available);
  this->printstream->flush();
}

#endif // PRINTBRIDGEMODULE_struct_guard
@@ -0,0 +1,97 @@
#ifndef __SYNTHESIZED_PRINTS_H
#define __SYNTHESIZED_PRINTS_H

#ifdef PRINTBRIDGEMODULE_struct_guard

#include <vector>
#include <iostream>
#include <fstream>
#include <gmp.h>

#include "bridge_driver.h"

struct print_vars_t {
  std::vector<mpz_t*> data;
  ~print_vars_t() {
    for (auto& e: data) {
      mpz_clear(*e);
      free(e);
    }
  }
};

class synthesized_prints_t: public bridge_driver_t
{
public:
  synthesized_prints_t(simif_t* sim,
                       std::vector<std::string> &args,
                       PRINTBRIDGEMODULE_struct * mmio_addrs,
                       unsigned int print_count,
                       unsigned int token_bytes,
                       unsigned int idle_cycles_mask,
                       const unsigned int* print_offsets,
                       const char* const* format_strings,
                       const unsigned int* argument_counts,
                       const unsigned int* argument_widths,
                       unsigned int dma_address);
  ~synthesized_prints_t();
  virtual void init();
  virtual void tick();
  virtual bool terminate() { return false; };
  virtual int exit_code() { return 0; };
  void flush();
  void finish() { flush(); };
private:
  PRINTBRIDGEMODULE_struct * mmio_addrs;
  const unsigned int print_count;
  const unsigned int token_bytes;
  const unsigned int idle_cycles_mask;
  const unsigned int* print_offsets;
  const char* const* format_strings;
  const unsigned int* argument_counts;
  const unsigned int* argument_widths;
  const unsigned int dma_address;

  // DMA batching parameters
  const size_t beat_bytes = DMA_DATA_BITS / 8;
  // The number of DMA beats to pull off the FPGA on each invocation of tick()
  // This will be set based on the ratio of token_size : desired_batch_beats
  size_t batch_beats;
  // This will be modified to be a multiple of the token size
  const size_t desired_batch_beats = 3072;

  // Used to define the boundaries in the batch buffer at which we'll
  // initialize GMP types
  using gmp_align_t = uint64_t;
  const size_t gmp_align_bits = sizeof(gmp_align_t) * 8;

  // +arg driven members
  std::ofstream printfile; // Used only if the +print-file arg is provided
  std::string default_filename = "synthesized-prints.out";

  std::ostream* printstream; // Is set to std::cerr otherwise
  uint64_t start_cycle, end_cycle; // Bounds between which prints will be emitted
  uint64_t current_cycle = 0;
  bool human_readable = true;
  bool print_cycle_prefix = true;

  std::vector<std::vector<size_t>> widths;
  std::vector<size_t> sizes;
  std::vector<print_vars_t*> masks;

  std::vector<size_t> aligned_offsets; // Aligned to gmp_align_t
  std::vector<size_t> bit_offset;

  bool current_print_enabled(gmp_align_t* buf, size_t offset);
  void process_tokens(size_t beats);
  void show_prints(char * buf);
  void print_format(const char* fmt, print_vars_t* vars, print_vars_t* masks);
  // Returns the number of beats available, once two successive reads return the same value
  int beats_avaliable_stable();
};

#endif // PRINTBRIDGEMODULE_struct_guard

#endif //__SYNTHESIZED_PRINTS_H
@@ -0,0 +1 @@
Subproject commit 2ec7965b2ee051aaff03d5db21c6709aea4dd24e
@ -0,0 +1,20 @@
|
|||
// See LICENSE for license details.
|
||||
|
||||
#ifndef __MMIO_H
|
||||
#define __MMIO_H
|
||||
|
||||
#include <stdint.h>
|
||||
#include <stddef.h>
|
||||
|
||||
class mmio_t
|
||||
{
|
||||
public:
|
||||
virtual void read_req(uint64_t addr, size_t size, size_t len) = 0;
|
||||
virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) = 0;
|
||||
virtual bool read_resp(void *data) = 0;
|
||||
virtual bool write_resp() = 0;
|
||||
};
|
||||
|
||||
void* init(uint64_t memsize, bool dram);
|
||||
|
||||
#endif // __MMIO_H
|
@@ -0,0 +1,940 @@
#include "mmio_f1.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include <memory>
#include <cassert>
#include <cmath>
#include <cstring>
#ifdef VCS
#include <DirectC.h>
#include "midas_context.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif // VM_TRACE
#endif

void mmio_f1_t::read_req(uint64_t addr, size_t size, size_t len) {
  mmio_req_addr_t ar(0, addr, size, len);
  this->ar.push(ar);
}

void mmio_f1_t::write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) {
  int nbytes = 1 << size;

  mmio_req_addr_t aw(0, addr, size, len);
  this->aw.push(aw);

  for (int i = 0; i < len + 1; i++) {
    mmio_req_data_t w(((char*) data) + i * nbytes, strb[i], i == len);
    this->w.push(w);
  }
}

void mmio_f1_t::tick(
  bool reset,
  bool ar_ready,
  bool aw_ready,
  bool w_ready,
  size_t r_id,
  void* r_data,
  bool r_last,
  bool r_valid,
  size_t b_id,
  bool b_valid)
{
  const bool ar_fire = !reset && ar_ready && ar_valid();
  const bool aw_fire = !reset && aw_ready && aw_valid();
  const bool w_fire = !reset && w_ready && w_valid();
  const bool r_fire = !reset && r_valid && r_ready();
  const bool b_fire = !reset && b_valid && b_ready();

  if (ar_fire) read_inflight = true;
  if (aw_fire) write_inflight = true;
  if (w_fire) this->w.pop();
  if (r_fire) {
    char* dat = (char*)malloc(dummy_data.size());
    memcpy(dat, (char*)r_data, dummy_data.size());
    mmio_resp_data_t r(r_id, dat, r_last);
    this->r.push(r);
  }
  if (b_fire) {
    this->b.push(b_id);
  }
}

bool mmio_f1_t::read_resp(void* data) {
  if (ar.empty() || r.size() <= ar.front().len) {
    return false;
  } else {
    auto ar = this->ar.front();
    size_t word_size = 1 << ar.size;
    for (size_t i = 0 ; i <= ar.len ; i++) {
      auto r = this->r.front();
      assert(i < ar.len || r.last);
      memcpy(((char*)data) + i * word_size, r.data, word_size);
      free(r.data);
      this->r.pop();
    }
    this->ar.pop();
    read_inflight = false;
    return true;
  }
}

bool mmio_f1_t::write_resp() {
  if (aw.empty() || b.empty()) {
    return false;
  } else {
    aw.pop();
    b.pop();
    write_inflight = false;
    return true;
  }
}

extern uint64_t main_time;
extern std::unique_ptr<mmio_t> master;
extern std::unique_ptr<mmio_t> dma;
std::unique_ptr<mm_t> slave[4];

void* init(uint64_t memsize, bool dramsim) {
  master.reset(new mmio_f1_t(MMIO_WIDTH));
  dma.reset(new mmio_f1_t(DMA_WIDTH));
  for (int mem_channel_index = 0; mem_channel_index < 4; mem_channel_index++) {
    slave[mem_channel_index].reset(dramsim ? (mm_t*) new mm_dramsim2_t(1 << MEM_ID_BITS) : (mm_t*) new mm_magic_t);
    slave[mem_channel_index]->init(memsize, MEM_WIDTH, 64);
  }
  return slave[0]->get_data();
}

#ifdef VCS
static const size_t MASTER_DATA_SIZE = MMIO_WIDTH / sizeof(uint32_t);
static const size_t DMA_DATA_SIZE = DMA_WIDTH / sizeof(uint32_t);
static const size_t DMA_STRB_SIZE = (DMA_WIDTH/8 + sizeof(uint32_t) - 1) / sizeof(uint32_t);
static const size_t SLAVE_DATA_SIZE = MEM_WIDTH / sizeof(uint32_t);
extern midas_context_t* host;
extern bool vcs_fin;
extern bool vcs_rst;
extern "C" {
void tick(
  vc_handle reset,
  vc_handle fin,

  vc_handle master_ar_valid,
  vc_handle master_ar_ready,
  vc_handle master_ar_bits_addr,
  vc_handle master_ar_bits_id,
  vc_handle master_ar_bits_size,
  vc_handle master_ar_bits_len,

  vc_handle master_aw_valid,
  vc_handle master_aw_ready,
  vc_handle master_aw_bits_addr,
  vc_handle master_aw_bits_id,
  vc_handle master_aw_bits_size,
  vc_handle master_aw_bits_len,

  vc_handle master_w_valid,
  vc_handle master_w_ready,
  vc_handle master_w_bits_strb,
  vc_handle master_w_bits_data,
  vc_handle master_w_bits_last,

  vc_handle master_r_valid,
  vc_handle master_r_ready,
  vc_handle master_r_bits_resp,
  vc_handle master_r_bits_id,
  vc_handle master_r_bits_data,
  vc_handle master_r_bits_last,

  vc_handle master_b_valid,
  vc_handle master_b_ready,
  vc_handle master_b_bits_resp,
  vc_handle master_b_bits_id,

  vc_handle dma_ar_valid,
  vc_handle dma_ar_ready,
  vc_handle dma_ar_bits_addr,
  vc_handle dma_ar_bits_id,
  vc_handle dma_ar_bits_size,
  vc_handle dma_ar_bits_len,

  vc_handle dma_aw_valid,
  vc_handle dma_aw_ready,
  vc_handle dma_aw_bits_addr,
  vc_handle dma_aw_bits_id,
  vc_handle dma_aw_bits_size,
  vc_handle dma_aw_bits_len,

  vc_handle dma_w_valid,
  vc_handle dma_w_ready,
  vc_handle dma_w_bits_strb,
  vc_handle dma_w_bits_data,
  vc_handle dma_w_bits_last,

  vc_handle dma_r_valid,
  vc_handle dma_r_ready,
  vc_handle dma_r_bits_resp,
  vc_handle dma_r_bits_id,
  vc_handle dma_r_bits_data,
  vc_handle dma_r_bits_last,

  vc_handle dma_b_valid,
  vc_handle dma_b_ready,
  vc_handle dma_b_bits_resp,
  vc_handle dma_b_bits_id,

  vc_handle slave_0_ar_valid,
  vc_handle slave_0_ar_ready,
  vc_handle slave_0_ar_bits_addr,
  vc_handle slave_0_ar_bits_id,
  vc_handle slave_0_ar_bits_size,
  vc_handle slave_0_ar_bits_len,

  vc_handle slave_0_aw_valid,
  vc_handle slave_0_aw_ready,
  vc_handle slave_0_aw_bits_addr,
  vc_handle slave_0_aw_bits_id,
  vc_handle slave_0_aw_bits_size,
  vc_handle slave_0_aw_bits_len,

  vc_handle slave_0_w_valid,
  vc_handle slave_0_w_ready,
  vc_handle slave_0_w_bits_strb,
  vc_handle slave_0_w_bits_data,
  vc_handle slave_0_w_bits_last,

  vc_handle slave_0_r_valid,
  vc_handle slave_0_r_ready,
  vc_handle slave_0_r_bits_resp,
  vc_handle slave_0_r_bits_id,
  vc_handle slave_0_r_bits_data,
  vc_handle slave_0_r_bits_last,

  vc_handle slave_0_b_valid,
  vc_handle slave_0_b_ready,
  vc_handle slave_0_b_bits_resp,
  vc_handle slave_0_b_bits_id,

  vc_handle slave_1_ar_valid,
  vc_handle slave_1_ar_ready,
  vc_handle slave_1_ar_bits_addr,
  vc_handle slave_1_ar_bits_id,
  vc_handle slave_1_ar_bits_size,
  vc_handle slave_1_ar_bits_len,

  vc_handle slave_1_aw_valid,
  vc_handle slave_1_aw_ready,
  vc_handle slave_1_aw_bits_addr,
  vc_handle slave_1_aw_bits_id,
  vc_handle slave_1_aw_bits_size,
  vc_handle slave_1_aw_bits_len,

  vc_handle slave_1_w_valid,
  vc_handle slave_1_w_ready,
  vc_handle slave_1_w_bits_strb,
  vc_handle slave_1_w_bits_data,
  vc_handle slave_1_w_bits_last,

  vc_handle slave_1_r_valid,
  vc_handle slave_1_r_ready,
  vc_handle slave_1_r_bits_resp,
  vc_handle slave_1_r_bits_id,
  vc_handle slave_1_r_bits_data,
  vc_handle slave_1_r_bits_last,

  vc_handle slave_1_b_valid,
  vc_handle slave_1_b_ready,
  vc_handle slave_1_b_bits_resp,
  vc_handle slave_1_b_bits_id,

  vc_handle slave_2_ar_valid,
  vc_handle slave_2_ar_ready,
  vc_handle slave_2_ar_bits_addr,
  vc_handle slave_2_ar_bits_id,
  vc_handle slave_2_ar_bits_size,
  vc_handle slave_2_ar_bits_len,

  vc_handle slave_2_aw_valid,
  vc_handle slave_2_aw_ready,
  vc_handle slave_2_aw_bits_addr,
  vc_handle slave_2_aw_bits_id,
  vc_handle slave_2_aw_bits_size,
  vc_handle slave_2_aw_bits_len,

  vc_handle slave_2_w_valid,
  vc_handle slave_2_w_ready,
  vc_handle slave_2_w_bits_strb,
  vc_handle slave_2_w_bits_data,
  vc_handle slave_2_w_bits_last,

  vc_handle slave_2_r_valid,
  vc_handle slave_2_r_ready,
  vc_handle slave_2_r_bits_resp,
  vc_handle slave_2_r_bits_id,
  vc_handle slave_2_r_bits_data,
  vc_handle slave_2_r_bits_last,

  vc_handle slave_2_b_valid,
  vc_handle slave_2_b_ready,
  vc_handle slave_2_b_bits_resp,
  vc_handle slave_2_b_bits_id,

  vc_handle slave_3_ar_valid,
  vc_handle slave_3_ar_ready,
  vc_handle slave_3_ar_bits_addr,
  vc_handle slave_3_ar_bits_id,
  vc_handle slave_3_ar_bits_size,
  vc_handle slave_3_ar_bits_len,

  vc_handle slave_3_aw_valid,
  vc_handle slave_3_aw_ready,
  vc_handle slave_3_aw_bits_addr,
  vc_handle slave_3_aw_bits_id,
  vc_handle slave_3_aw_bits_size,
  vc_handle slave_3_aw_bits_len,

  vc_handle slave_3_w_valid,
  vc_handle slave_3_w_ready,
  vc_handle slave_3_w_bits_strb,
  vc_handle slave_3_w_bits_data,
  vc_handle slave_3_w_bits_last,

  vc_handle slave_3_r_valid,
  vc_handle slave_3_r_ready,
  vc_handle slave_3_r_bits_resp,
  vc_handle slave_3_r_bits_id,
  vc_handle slave_3_r_bits_data,
  vc_handle slave_3_r_bits_last,

  vc_handle slave_3_b_valid,
  vc_handle slave_3_b_ready,
  vc_handle slave_3_b_bits_resp,
  vc_handle slave_3_b_bits_id
) {
  mmio_f1_t *m, *d;
  assert(m = dynamic_cast<mmio_f1_t*>(master.get()));
  assert(d = dynamic_cast<mmio_f1_t*>(dma.get()));
  assert(DMA_STRB_SIZE <= 2);

  uint32_t master_r_data[MASTER_DATA_SIZE];
  for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
    master_r_data[i] = vc_4stVectorRef(master_r_bits_data)[i].d;
  }
  uint32_t dma_r_data[DMA_DATA_SIZE];
  for (size_t i = 0 ; i < DMA_DATA_SIZE ; i++) {
    dma_r_data[i] = vc_4stVectorRef(dma_r_bits_data)[i].d;
  }
  uint32_t slave_0_w_data[SLAVE_DATA_SIZE];
  for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
    slave_0_w_data[i] = vc_4stVectorRef(slave_0_w_bits_data)[i].d;
  }

  uint32_t slave_1_w_data[SLAVE_DATA_SIZE];
  for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
    slave_1_w_data[i] = vc_4stVectorRef(slave_1_w_bits_data)[i].d;
  }

  uint32_t slave_2_w_data[SLAVE_DATA_SIZE];
  for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
    slave_2_w_data[i] = vc_4stVectorRef(slave_2_w_bits_data)[i].d;
  }

  uint32_t slave_3_w_data[SLAVE_DATA_SIZE];
  for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
    slave_3_w_data[i] = vc_4stVectorRef(slave_3_w_bits_data)[i].d;
  }

  m->tick(
    vcs_rst,
    vc_getScalar(master_ar_ready),
    vc_getScalar(master_aw_ready),
    vc_getScalar(master_w_ready),
    vc_4stVectorRef(master_r_bits_id)->d,
    master_r_data,
    vc_getScalar(master_r_bits_last),
    vc_getScalar(master_r_valid),
    vc_4stVectorRef(master_b_bits_id)->d,
    vc_getScalar(master_b_valid)
  );

  d->tick(
    vcs_rst,
    vc_getScalar(dma_ar_ready),
    vc_getScalar(dma_aw_ready),
    vc_getScalar(dma_w_ready),
    vc_4stVectorRef(dma_r_bits_id)->d,
    dma_r_data,
    vc_getScalar(dma_r_bits_last),
    vc_getScalar(dma_r_valid),
    vc_4stVectorRef(dma_b_bits_id)->d,
    vc_getScalar(dma_b_valid)
  );

  slave[0]->tick(
    vcs_rst,
    vc_getScalar(slave_0_ar_valid),
    vc_4stVectorRef(slave_0_ar_bits_addr)->d,
    vc_4stVectorRef(slave_0_ar_bits_id)->d,
    vc_4stVectorRef(slave_0_ar_bits_size)->d,
    vc_4stVectorRef(slave_0_ar_bits_len)->d,

    vc_getScalar(slave_0_aw_valid),
    vc_4stVectorRef(slave_0_aw_bits_addr)->d,
    vc_4stVectorRef(slave_0_aw_bits_id)->d,
    vc_4stVectorRef(slave_0_aw_bits_size)->d,
    vc_4stVectorRef(slave_0_aw_bits_len)->d,

    vc_getScalar(slave_0_w_valid),
    vc_4stVectorRef(slave_0_w_bits_strb)->d,
    slave_0_w_data,
    vc_getScalar(slave_0_w_bits_last),

    vc_getScalar(slave_0_r_ready),
    vc_getScalar(slave_0_b_ready)
  );

  slave[1]->tick(
    vcs_rst,
    vc_getScalar(slave_1_ar_valid),
    vc_4stVectorRef(slave_1_ar_bits_addr)->d,
    vc_4stVectorRef(slave_1_ar_bits_id)->d,
    vc_4stVectorRef(slave_1_ar_bits_size)->d,
    vc_4stVectorRef(slave_1_ar_bits_len)->d,

    vc_getScalar(slave_1_aw_valid),
    vc_4stVectorRef(slave_1_aw_bits_addr)->d,
    vc_4stVectorRef(slave_1_aw_bits_id)->d,
    vc_4stVectorRef(slave_1_aw_bits_size)->d,
    vc_4stVectorRef(slave_1_aw_bits_len)->d,

    vc_getScalar(slave_1_w_valid),
    vc_4stVectorRef(slave_1_w_bits_strb)->d,
    slave_1_w_data,
    vc_getScalar(slave_1_w_bits_last),

    vc_getScalar(slave_1_r_ready),
    vc_getScalar(slave_1_b_ready)
  );

  slave[2]->tick(
    vcs_rst,
    vc_getScalar(slave_2_ar_valid),
    vc_4stVectorRef(slave_2_ar_bits_addr)->d,
    vc_4stVectorRef(slave_2_ar_bits_id)->d,
    vc_4stVectorRef(slave_2_ar_bits_size)->d,
    vc_4stVectorRef(slave_2_ar_bits_len)->d,

    vc_getScalar(slave_2_aw_valid),
    vc_4stVectorRef(slave_2_aw_bits_addr)->d,
    vc_4stVectorRef(slave_2_aw_bits_id)->d,
    vc_4stVectorRef(slave_2_aw_bits_size)->d,
    vc_4stVectorRef(slave_2_aw_bits_len)->d,

    vc_getScalar(slave_2_w_valid),
    vc_4stVectorRef(slave_2_w_bits_strb)->d,
    slave_2_w_data,
    vc_getScalar(slave_2_w_bits_last),

    vc_getScalar(slave_2_r_ready),
    vc_getScalar(slave_2_b_ready)
  );

  slave[3]->tick(
    vcs_rst,
    vc_getScalar(slave_3_ar_valid),
    vc_4stVectorRef(slave_3_ar_bits_addr)->d,
    vc_4stVectorRef(slave_3_ar_bits_id)->d,
    vc_4stVectorRef(slave_3_ar_bits_size)->d,
    vc_4stVectorRef(slave_3_ar_bits_len)->d,

    vc_getScalar(slave_3_aw_valid),
    vc_4stVectorRef(slave_3_aw_bits_addr)->d,
    vc_4stVectorRef(slave_3_aw_bits_id)->d,
    vc_4stVectorRef(slave_3_aw_bits_size)->d,
    vc_4stVectorRef(slave_3_aw_bits_len)->d,

    vc_getScalar(slave_3_w_valid),
    vc_4stVectorRef(slave_3_w_bits_strb)->d,
    slave_3_w_data,
    vc_getScalar(slave_3_w_bits_last),

    vc_getScalar(slave_3_r_ready),
    vc_getScalar(slave_3_b_ready)
  );

  if (!vcs_fin) host->switch_to();
  else vcs_fin = false;

  vc_putScalar(master_aw_valid, m->aw_valid());
  vc_putScalar(master_ar_valid, m->ar_valid());
  vc_putScalar(master_w_valid, m->w_valid());
  vc_putScalar(master_w_bits_last, m->w_last());
  vc_putScalar(master_r_ready, m->r_ready());
  vc_putScalar(master_b_ready, m->b_ready());

  vec32 md[MASTER_DATA_SIZE];
  md[0].c = 0;
  md[0].d = m->aw_id();
  vc_put4stVector(master_aw_bits_id, md);
  md[0].c = 0;
  md[0].d = m->aw_addr();
  vc_put4stVector(master_aw_bits_addr, md);
  md[0].c = 0;
  md[0].d = m->aw_size();
  vc_put4stVector(master_aw_bits_size, md);
  md[0].c = 0;
  md[0].d = m->aw_len();
  vc_put4stVector(master_aw_bits_len, md);
  md[0].c = 0;
  md[0].d = m->ar_id();
  vc_put4stVector(master_ar_bits_id, md);
  md[0].c = 0;
  md[0].d = m->ar_addr();
  vc_put4stVector(master_ar_bits_addr, md);
  md[0].c = 0;
  md[0].d = m->ar_size();
  vc_put4stVector(master_ar_bits_size, md);
  md[0].c = 0;
  md[0].d = m->ar_len();
  vc_put4stVector(master_ar_bits_len, md);
  md[0].c = 0;
  md[0].d = m->w_strb();
  vc_put4stVector(master_w_bits_strb, md);

  for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
    md[i].c = 0;
    md[i].d = ((uint32_t*) m->w_data())[i];
  }
  vc_put4stVector(master_w_bits_data, md);

  vc_putScalar(dma_aw_valid, d->aw_valid());
  vc_putScalar(dma_ar_valid, d->ar_valid());
  vc_putScalar(dma_w_valid, d->w_valid());
  vc_putScalar(dma_w_bits_last, d->w_last());
  vc_putScalar(dma_r_ready, d->r_ready());
  vc_putScalar(dma_b_ready, d->b_ready());

  vec32 dd[DMA_DATA_SIZE];
  dd[0].c = 0;
  dd[0].d = d->aw_id();
  vc_put4stVector(dma_aw_bits_id, dd);
  dd[0].c = 0;
  dd[0].d = d->aw_addr();
  dd[1].c = 0;
  dd[1].d = d->aw_addr() >> 32;
|
||||
vc_put4stVector(dma_aw_bits_addr, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->aw_size();
|
||||
vc_put4stVector(dma_aw_bits_size, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->aw_len();
|
||||
vc_put4stVector(dma_aw_bits_len, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->ar_id();
|
||||
vc_put4stVector(dma_ar_bits_id, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->ar_addr();
|
||||
dd[1].c = 0;
|
||||
dd[1].d = d->ar_addr() >> 32;
|
||||
vc_put4stVector(dma_ar_bits_addr, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->ar_size();
|
||||
vc_put4stVector(dma_ar_bits_size, dd);
|
||||
dd[0].c = 0;
|
||||
dd[0].d = d->ar_len();
|
||||
vc_put4stVector(dma_ar_bits_len, dd);
|
||||
|
||||
auto strb = d->w_strb();
|
||||
for (size_t i = 0 ; i < DMA_STRB_SIZE ; i++) {
|
||||
dd[i].c = 0;
|
||||
dd[i].d = ((uint32_t*)(&strb))[i];
|
||||
}
|
||||
vc_put4stVector(dma_w_bits_strb, dd);
|
||||
|
||||
for (size_t i = 0 ; i < DMA_DATA_SIZE ; i++) {
|
||||
dd[i].c = 0;
|
||||
dd[i].d = ((uint32_t*) d->w_data())[i];
|
||||
}
|
||||
vc_put4stVector(dma_w_bits_data, dd);
|
||||
|
||||
vc_putScalar(slave_0_aw_ready, slave[0]->aw_ready());
|
||||
vc_putScalar(slave_0_ar_ready, slave[0]->ar_ready());
|
||||
vc_putScalar(slave_0_w_ready, slave[0]->w_ready());
|
||||
vc_putScalar(slave_0_b_valid, slave[0]->b_valid());
|
||||
vc_putScalar(slave_0_r_valid, slave[0]->r_valid());
|
||||
vc_putScalar(slave_0_r_bits_last, slave[0]->r_last());
|
||||
|
||||
vc_putScalar(slave_1_aw_ready, slave[1]->aw_ready());
|
||||
vc_putScalar(slave_1_ar_ready, slave[1]->ar_ready());
|
||||
vc_putScalar(slave_1_w_ready, slave[1]->w_ready());
|
||||
vc_putScalar(slave_1_b_valid, slave[1]->b_valid());
|
||||
vc_putScalar(slave_1_r_valid, slave[1]->r_valid());
|
||||
vc_putScalar(slave_1_r_bits_last, slave[1]->r_last());
|
||||
|
||||
vc_putScalar(slave_2_aw_ready, slave[2]->aw_ready());
|
||||
vc_putScalar(slave_2_ar_ready, slave[2]->ar_ready());
|
||||
vc_putScalar(slave_2_w_ready, slave[2]->w_ready());
|
||||
vc_putScalar(slave_2_b_valid, slave[2]->b_valid());
|
||||
vc_putScalar(slave_2_r_valid, slave[2]->r_valid());
|
||||
vc_putScalar(slave_2_r_bits_last, slave[2]->r_last());
|
||||
|
||||
vc_putScalar(slave_3_aw_ready, slave[3]->aw_ready());
|
||||
vc_putScalar(slave_3_ar_ready, slave[3]->ar_ready());
|
||||
vc_putScalar(slave_3_w_ready, slave[3]->w_ready());
|
||||
vc_putScalar(slave_3_b_valid, slave[3]->b_valid());
|
||||
vc_putScalar(slave_3_r_valid, slave[3]->r_valid());
|
||||
vc_putScalar(slave_3_r_bits_last, slave[3]->r_last());
|
||||
|
||||
|
||||
vec32 sd[SLAVE_DATA_SIZE];
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[0]->b_id();
|
||||
vc_put4stVector(slave_0_b_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[0]->b_resp();
|
||||
vc_put4stVector(slave_0_b_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[0]->r_id();
|
||||
vc_put4stVector(slave_0_r_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[0]->r_resp();
|
||||
vc_put4stVector(slave_0_r_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[1]->b_id();
|
||||
vc_put4stVector(slave_1_b_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[1]->b_resp();
|
||||
vc_put4stVector(slave_1_b_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[1]->r_id();
|
||||
vc_put4stVector(slave_1_r_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[1]->r_resp();
|
||||
vc_put4stVector(slave_1_r_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[2]->b_id();
|
||||
vc_put4stVector(slave_2_b_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[2]->b_resp();
|
||||
vc_put4stVector(slave_2_b_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[2]->r_id();
|
||||
vc_put4stVector(slave_2_r_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[2]->r_resp();
|
||||
vc_put4stVector(slave_2_r_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[3]->b_id();
|
||||
vc_put4stVector(slave_3_b_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[3]->b_resp();
|
||||
vc_put4stVector(slave_3_b_bits_resp, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[3]->r_id();
|
||||
vc_put4stVector(slave_3_r_bits_id, sd);
|
||||
sd[0].c = 0;
|
||||
sd[0].d = slave[3]->r_resp();
|
||||
vc_put4stVector(slave_3_r_bits_resp, sd);
|
||||
|
||||
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
|
||||
sd[i].c = 0;
|
||||
sd[i].d = ((uint32_t*) slave[0]->r_data())[i];
|
||||
}
|
||||
vc_put4stVector(slave_0_r_bits_data, sd);
|
||||
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
|
||||
sd[i].c = 0;
|
||||
sd[i].d = ((uint32_t*) slave[1]->r_data())[i];
|
||||
}
|
||||
vc_put4stVector(slave_1_r_bits_data, sd);
|
||||
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
|
||||
sd[i].c = 0;
|
||||
sd[i].d = ((uint32_t*) slave[2]->r_data())[i];
|
||||
}
|
||||
vc_put4stVector(slave_2_r_bits_data, sd);
|
||||
for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
|
||||
sd[i].c = 0;
|
||||
sd[i].d = ((uint32_t*) slave[3]->r_data())[i];
|
||||
}
|
||||
vc_put4stVector(slave_3_r_bits_data, sd);
|
||||
vc_putScalar(reset, vcs_rst);
|
||||
vc_putScalar(fin, vcs_fin);
|
||||
|
||||
main_time++;
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
extern PLATFORM_TYPE* top;
|
||||
#if VM_TRACE
|
||||
extern VerilatedVcdC* tfp;
|
||||
#endif // VM_TRACE
|
||||
|
||||
void tick() {
|
||||
mmio_f1_t *m, *d;
|
||||
assert(m = dynamic_cast<mmio_f1_t*>(master.get()));
|
||||
assert(d = dynamic_cast<mmio_f1_t*>(dma.get()));
|
||||
|
||||
// ASSUMPTION: All models have *no* combinational paths through I/O
|
||||
// Step 1: Clock lo -> propagate signals between DUT and software models
|
||||
top->io_master_aw_valid = m->aw_valid();
|
||||
top->io_master_aw_bits_id = m->aw_id();
|
||||
top->io_master_aw_bits_addr = m->aw_addr();
|
||||
top->io_master_aw_bits_size = m->aw_size();
|
||||
top->io_master_aw_bits_len = m->aw_len();
|
||||
|
||||
top->io_master_ar_valid = m->ar_valid();
|
||||
top->io_master_ar_bits_id = m->ar_id();
|
||||
top->io_master_ar_bits_addr = m->ar_addr();
|
||||
top->io_master_ar_bits_size = m->ar_size();
|
||||
top->io_master_ar_bits_len = m->ar_len();
|
||||
|
||||
top->io_master_w_valid = m->w_valid();
|
||||
top->io_master_w_bits_strb = m->w_strb();
|
||||
top->io_master_w_bits_last = m->w_last();
|
||||
|
||||
top->io_master_r_ready = m->r_ready();
|
||||
top->io_master_b_ready = m->b_ready();
|
||||
#if CTRL_DATA_BITS > 64
|
||||
memcpy(top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
|
||||
#else
|
||||
memcpy(&top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
|
||||
#endif
|
||||
|
||||
|
||||
top->io_dma_aw_valid = d->aw_valid();
|
||||
top->io_dma_aw_bits_id = d->aw_id();
|
||||
top->io_dma_aw_bits_addr = d->aw_addr();
|
||||
top->io_dma_aw_bits_size = d->aw_size();
|
||||
top->io_dma_aw_bits_len = d->aw_len();
|
||||
|
||||
top->io_dma_ar_valid = d->ar_valid();
|
||||
top->io_dma_ar_bits_id = d->ar_id();
|
||||
top->io_dma_ar_bits_addr = d->ar_addr();
|
||||
top->io_dma_ar_bits_size = d->ar_size();
|
||||
top->io_dma_ar_bits_len = d->ar_len();
|
||||
|
||||
top->io_dma_w_valid = d->w_valid();
|
||||
top->io_dma_w_bits_strb = d->w_strb();
|
||||
top->io_dma_w_bits_last = d->w_last();
|
||||
|
||||
top->io_dma_r_ready = d->r_ready();
|
||||
top->io_dma_b_ready = d->b_ready();
|
||||
#if DMA_DATA_BITS > 64
|
||||
memcpy(top->io_dma_w_bits_data, d->w_data(), DMA_WIDTH);
|
||||
#else
|
||||
memcpy(&top->io_dma_w_bits_data, d->w_data(), DMA_WIDTH);
|
||||
#endif
|
||||
|
||||
top->io_slave_0_aw_ready = slave[0]->aw_ready();
|
||||
top->io_slave_0_ar_ready = slave[0]->ar_ready();
|
||||
top->io_slave_0_w_ready = slave[0]->w_ready();
|
||||
top->io_slave_0_b_valid = slave[0]->b_valid();
|
||||
top->io_slave_0_b_bits_id = slave[0]->b_id();
|
||||
top->io_slave_0_b_bits_resp = slave[0]->b_resp();
|
||||
top->io_slave_0_r_valid = slave[0]->r_valid();
|
||||
top->io_slave_0_r_bits_id = slave[0]->r_id();
|
||||
top->io_slave_0_r_bits_resp = slave[0]->r_resp();
|
||||
top->io_slave_0_r_bits_last = slave[0]->r_last();
|
||||
top->io_slave_1_aw_ready = slave[1]->aw_ready();
|
||||
top->io_slave_1_ar_ready = slave[1]->ar_ready();
|
||||
top->io_slave_1_w_ready = slave[1]->w_ready();
|
||||
top->io_slave_1_b_valid = slave[1]->b_valid();
|
||||
top->io_slave_1_b_bits_id = slave[1]->b_id();
|
||||
top->io_slave_1_b_bits_resp = slave[1]->b_resp();
|
||||
top->io_slave_1_r_valid = slave[1]->r_valid();
|
||||
top->io_slave_1_r_bits_id = slave[1]->r_id();
|
||||
top->io_slave_1_r_bits_resp = slave[1]->r_resp();
|
||||
top->io_slave_1_r_bits_last = slave[1]->r_last();
|
||||
top->io_slave_2_aw_ready = slave[2]->aw_ready();
|
||||
top->io_slave_2_ar_ready = slave[2]->ar_ready();
|
||||
top->io_slave_2_w_ready = slave[2]->w_ready();
|
||||
top->io_slave_2_b_valid = slave[2]->b_valid();
|
||||
top->io_slave_2_b_bits_id = slave[2]->b_id();
|
||||
top->io_slave_2_b_bits_resp = slave[2]->b_resp();
|
||||
top->io_slave_2_r_valid = slave[2]->r_valid();
|
||||
top->io_slave_2_r_bits_id = slave[2]->r_id();
|
||||
top->io_slave_2_r_bits_resp = slave[2]->r_resp();
|
||||
top->io_slave_2_r_bits_last = slave[2]->r_last();
|
||||
top->io_slave_3_aw_ready = slave[3]->aw_ready();
|
||||
top->io_slave_3_ar_ready = slave[3]->ar_ready();
|
||||
top->io_slave_3_w_ready = slave[3]->w_ready();
|
||||
top->io_slave_3_b_valid = slave[3]->b_valid();
|
||||
top->io_slave_3_b_bits_id = slave[3]->b_id();
|
||||
top->io_slave_3_b_bits_resp = slave[3]->b_resp();
|
||||
top->io_slave_3_r_valid = slave[3]->r_valid();
|
||||
top->io_slave_3_r_bits_id = slave[3]->r_id();
|
||||
top->io_slave_3_r_bits_resp = slave[3]->r_resp();
|
||||
top->io_slave_3_r_bits_last = slave[3]->r_last();
|
||||
#if MEM_DATA_BITS > 64
|
||||
memcpy(top->io_slave_0_r_bits_data, slave[0]->r_data(), MEM_WIDTH);
|
||||
memcpy(top->io_slave_1_r_bits_data, slave[1]->r_data(), MEM_WIDTH);
|
||||
memcpy(top->io_slave_2_r_bits_data, slave[2]->r_data(), MEM_WIDTH);
|
||||
memcpy(top->io_slave_3_r_bits_data, slave[3]->r_data(), MEM_WIDTH);
|
||||
#else
|
||||
memcpy(&top->io_slave_0_r_bits_data, slave[0]->r_data(), MEM_WIDTH);
|
||||
memcpy(&top->io_slave_1_r_bits_data, slave[1]->r_data(), MEM_WIDTH);
|
||||
memcpy(&top->io_slave_2_r_bits_data, slave[2]->r_data(), MEM_WIDTH);
|
||||
memcpy(&top->io_slave_3_r_bits_data, slave[3]->r_data(), MEM_WIDTH);
|
||||
#endif
|
||||
top->eval();
|
||||
#if VM_TRACE
|
||||
if (tfp) tfp->dump((double) main_time);
|
||||
#endif // VM_TRACE
|
||||
main_time++;
|
||||
|
||||
top->clock = 0;
|
||||
top->eval(); // This shouldn't do much
|
||||
#if VM_TRACE
|
||||
if (tfp) tfp->dump((double) main_time);
|
||||
#endif // VM_TRACE
|
||||
main_time++;
|
||||
|
||||
// Step 2: Clock high, tick all software models and evaluate DUT with posedge
|
||||
m->tick(
|
||||
top->reset,
|
||||
top->io_master_ar_ready,
|
||||
top->io_master_aw_ready,
|
||||
top->io_master_w_ready,
|
||||
top->io_master_r_bits_id,
|
||||
#if CTRL_DATA_BITS > 64
|
||||
top->io_master_r_bits_data,
|
||||
#else
|
||||
&top->io_master_r_bits_data,
|
||||
#endif
|
||||
top->io_master_r_bits_last,
|
||||
top->io_master_r_valid,
|
||||
top->io_master_b_bits_id,
|
||||
top->io_master_b_valid
|
||||
);
|
||||
|
||||
d->tick(
|
||||
top->reset,
|
||||
top->io_dma_ar_ready,
|
||||
top->io_dma_aw_ready,
|
||||
top->io_dma_w_ready,
|
||||
top->io_dma_r_bits_id,
|
||||
#if DMA_DATA_BITS > 64
|
||||
top->io_dma_r_bits_data,
|
||||
#else
|
||||
&top->io_dma_r_bits_data,
|
||||
#endif
|
||||
top->io_dma_r_bits_last,
|
||||
top->io_dma_r_valid,
|
||||
top->io_dma_b_bits_id,
|
||||
top->io_dma_b_valid
|
||||
);
|
||||
|
||||
slave[0]->tick(
|
||||
top->reset,
|
||||
top->io_slave_0_ar_valid,
|
||||
top->io_slave_0_ar_bits_addr,
|
||||
top->io_slave_0_ar_bits_id,
|
||||
top->io_slave_0_ar_bits_size,
|
||||
top->io_slave_0_ar_bits_len,
|
||||
|
||||
top->io_slave_0_aw_valid,
|
||||
top->io_slave_0_aw_bits_addr,
|
||||
top->io_slave_0_aw_bits_id,
|
||||
top->io_slave_0_aw_bits_size,
|
||||
top->io_slave_0_aw_bits_len,
|
||||
|
||||
top->io_slave_0_w_valid,
|
||||
top->io_slave_0_w_bits_strb,
|
||||
#if MEM_DATA_BITS > 64
|
||||
top->io_slave_0_w_bits_data,
|
||||
#else
|
||||
&top->io_slave_0_w_bits_data,
|
||||
#endif
|
||||
top->io_slave_0_w_bits_last,
|
||||
|
||||
top->io_slave_0_r_ready,
|
||||
top->io_slave_0_b_ready
|
||||
);
|
||||
slave[1]->tick(
|
||||
top->reset,
|
||||
top->io_slave_1_ar_valid,
|
||||
top->io_slave_1_ar_bits_addr,
|
||||
top->io_slave_1_ar_bits_id,
|
||||
top->io_slave_1_ar_bits_size,
|
||||
top->io_slave_1_ar_bits_len,
|
||||
|
||||
top->io_slave_1_aw_valid,
|
||||
top->io_slave_1_aw_bits_addr,
|
||||
top->io_slave_1_aw_bits_id,
|
||||
top->io_slave_1_aw_bits_size,
|
||||
top->io_slave_1_aw_bits_len,
|
||||
|
||||
top->io_slave_1_w_valid,
|
||||
top->io_slave_1_w_bits_strb,
|
||||
#if MEM_DATA_BITS > 64
|
||||
top->io_slave_1_w_bits_data,
|
||||
#else
|
||||
&top->io_slave_1_w_bits_data,
|
||||
#endif
|
||||
top->io_slave_1_w_bits_last,
|
||||
|
||||
top->io_slave_1_r_ready,
|
||||
top->io_slave_1_b_ready
|
||||
);
|
||||
slave[2]->tick(
|
||||
top->reset,
|
||||
top->io_slave_2_ar_valid,
|
||||
top->io_slave_2_ar_bits_addr,
|
||||
top->io_slave_2_ar_bits_id,
|
||||
top->io_slave_2_ar_bits_size,
|
||||
top->io_slave_2_ar_bits_len,
|
||||
|
||||
top->io_slave_2_aw_valid,
|
||||
top->io_slave_2_aw_bits_addr,
|
||||
top->io_slave_2_aw_bits_id,
|
||||
top->io_slave_2_aw_bits_size,
|
||||
top->io_slave_2_aw_bits_len,
|
||||
|
||||
top->io_slave_2_w_valid,
|
||||
top->io_slave_2_w_bits_strb,
|
||||
#if MEM_DATA_BITS > 64
|
||||
top->io_slave_2_w_bits_data,
|
||||
#else
|
||||
&top->io_slave_2_w_bits_data,
|
||||
#endif
|
||||
top->io_slave_2_w_bits_last,
|
||||
|
||||
top->io_slave_2_r_ready,
|
||||
top->io_slave_2_b_ready
|
||||
);
|
||||
slave[3]->tick(
|
||||
top->reset,
|
||||
top->io_slave_3_ar_valid,
|
||||
top->io_slave_3_ar_bits_addr,
|
||||
top->io_slave_3_ar_bits_id,
|
||||
top->io_slave_3_ar_bits_size,
|
||||
top->io_slave_3_ar_bits_len,
|
||||
|
||||
top->io_slave_3_aw_valid,
|
||||
top->io_slave_3_aw_bits_addr,
|
||||
top->io_slave_3_aw_bits_id,
|
||||
top->io_slave_3_aw_bits_size,
|
||||
top->io_slave_3_aw_bits_len,
|
||||
|
||||
top->io_slave_3_w_valid,
|
||||
top->io_slave_3_w_bits_strb,
|
||||
#if MEM_DATA_BITS > 64
|
||||
top->io_slave_3_w_bits_data,
|
||||
#else
|
||||
&top->io_slave_3_w_bits_data,
|
||||
#endif
|
||||
top->io_slave_3_w_bits_last,
|
||||
|
||||
top->io_slave_3_r_ready,
|
||||
top->io_slave_3_b_ready
|
||||
);
|
||||
|
||||
top->clock = 1;
|
||||
top->eval();
|
||||
}
|
||||
|
||||
#endif // VCS
|
|
@@ -0,0 +1,98 @@
#ifndef __MMIO_F1_H
#define __MMIO_F1_H

#include "mmio.h"
#include <cstring>
#include <vector>
#include <queue>

struct mmio_req_addr_t
{
  size_t id;
  uint64_t addr;
  size_t size;
  size_t len;

  mmio_req_addr_t(size_t id_, uint64_t addr_, size_t size_, size_t len_):
    id(id_), addr(addr_), size(size_), len(len_) { }
};

struct mmio_req_data_t
{
  char* data;
  size_t strb;
  bool last;

  mmio_req_data_t(char* data_, size_t strb_, bool last_):
    data(data_), strb(strb_), last(last_) { }
};

struct mmio_resp_data_t
{
  size_t id;
  char* data;
  bool last;

  mmio_resp_data_t(size_t id_, char* data_, bool last_):
    id(id_), data(data_), last(last_) { }
};

class mmio_f1_t: public mmio_t
{
public:
  mmio_f1_t(size_t size): read_inflight(false), write_inflight(false) {
    dummy_data.resize(size);
  }

  bool aw_valid() { return !aw.empty() && !write_inflight; }
  size_t aw_id() { return aw_valid() ? aw.front().id : 0; }
  uint64_t aw_addr() { return aw_valid() ? aw.front().addr : 0; }
  size_t aw_size() { return aw_valid() ? aw.front().size : 0; }
  size_t aw_len() { return aw_valid() ? aw.front().len : 0; }

  bool ar_valid() { return !ar.empty() && !read_inflight; }
  size_t ar_id() { return ar_valid() ? ar.front().id : 0; }
  uint64_t ar_addr() { return ar_valid() ? ar.front().addr : 0; }
  size_t ar_size() { return ar_valid() ? ar.front().size : 0; }
  size_t ar_len() { return ar_valid() ? ar.front().len : 0; }

  bool w_valid() { return !w.empty(); }
  size_t w_strb() { return w_valid() ? w.front().strb : 0; }
  bool w_last() { return w_valid() ? w.front().last : false; }
  void* w_data() { return w_valid() ? w.front().data : &dummy_data[0]; }

  bool r_ready() { return read_inflight; }
  bool b_ready() { return write_inflight; }

  void tick
  (
    bool reset,
    bool ar_ready,
    bool aw_ready,
    bool w_ready,
    size_t r_id,
    void* r_data,
    bool r_last,
    bool r_valid,
    size_t b_id,
    bool b_valid
  );

  virtual void read_req(uint64_t addr, size_t size, size_t len);
  virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb);
  virtual bool read_resp(void *data);
  virtual bool write_resp();

private:
  std::queue<mmio_req_addr_t> ar;
  std::queue<mmio_req_addr_t> aw;
  std::queue<mmio_req_data_t> w;
  std::queue<mmio_resp_data_t> r;
  std::queue<size_t> b;

  bool read_inflight;
  bool write_inflight;
  std::vector<char> dummy_data;
};

#endif // __MMIO_F1_H
@@ -0,0 +1,420 @@
// See LICENSE for license details.

#include "mmio_zynq.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include <memory>
#include <cassert>
#include <cmath>
#ifdef VCS
#include <DirectC.h>
#include "midas_context.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif // VM_TRACE
#endif

void mmio_zynq_t::read_req(uint64_t addr, size_t size, size_t len) {
  mmio_req_addr_t ar(0, addr, size, len);
  this->ar.push(ar);
}

void mmio_zynq_t::write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb) {
  int nbytes = 1 << size;

  mmio_req_addr_t aw(0, addr, size, len);
  this->aw.push(aw);

  for (int i = 0; i < len + 1; i++) {
    mmio_req_data_t w(((char*) data) + i * nbytes, strb[i], i == len);
    this->w.push(w);
  }
}

void mmio_zynq_t::tick(
  bool reset,
  bool ar_ready,
  bool aw_ready,
  bool w_ready,
  size_t r_id,
  void* r_data,
  bool r_last,
  bool r_valid,
  size_t b_id,
  bool b_valid)
{
  const bool ar_fire = !reset && ar_ready && ar_valid();
  const bool aw_fire = !reset && aw_ready && aw_valid();
  const bool w_fire = !reset && w_ready && w_valid();
  const bool r_fire = !reset && r_valid && r_ready();
  const bool b_fire = !reset && b_valid && b_ready();

  if (ar_fire) read_inflight = true;
  if (aw_fire) write_inflight = true;
  if (w_fire) this->w.pop();
  if (r_fire) {
    char* dat = (char*)malloc(dummy_data.size());
    memcpy(dat, (char*)r_data, dummy_data.size());
    mmio_resp_data_t r(r_id, dat, r_last);
    this->r.push(r);
  }
  if (b_fire) {
    this->b.push(b_id);
  }
}

bool mmio_zynq_t::read_resp(void* data) {
  if (ar.empty() || r.size() <= ar.front().len) {
    return false;
  } else {
    auto ar = this->ar.front();
    size_t word_size = 1 << ar.size;
    for (size_t i = 0 ; i <= ar.len ; i++) {
      auto r = this->r.front();
      assert(ar.id == r.id && (i < ar.len || r.last));
      memcpy(((char*)data) + i * word_size, r.data, word_size);
      free(r.data);
      this->r.pop();
    }
    this->ar.pop();
    read_inflight = false;
    return true;
  }
}

bool mmio_zynq_t::write_resp() {
  if (aw.empty() || b.empty()) {
    return false;
  } else {
    assert(aw.front().id == b.front());
    aw.pop();
    b.pop();
    write_inflight = false;
    return true;
  }
}

extern uint64_t main_time;
extern std::unique_ptr<mmio_t> master;
std::unique_ptr<mm_t> slave;

void* init(uint64_t memsize, bool dramsim) {
  master.reset(new mmio_zynq_t);
  slave.reset(dramsim ? (mm_t*) new mm_dramsim2_t(1 << MEM_ID_BITS) : (mm_t*) new mm_magic_t);
  slave->init(memsize, MEM_WIDTH, 64);
  return slave->get_data();
}

#ifdef VCS
static const size_t MASTER_DATA_SIZE = MMIO_WIDTH / sizeof(uint32_t);
static const size_t SLAVE_DATA_SIZE = MEM_WIDTH / sizeof(uint32_t);
extern midas_context_t* host;
extern bool vcs_fin;
extern bool vcs_rst;
extern "C" {
  void tick(
    vc_handle reset,
    vc_handle fin,

    vc_handle master_ar_valid,
    vc_handle master_ar_ready,
    vc_handle master_ar_bits_addr,
    vc_handle master_ar_bits_id,
    vc_handle master_ar_bits_size,
    vc_handle master_ar_bits_len,

    vc_handle master_aw_valid,
    vc_handle master_aw_ready,
    vc_handle master_aw_bits_addr,
    vc_handle master_aw_bits_id,
    vc_handle master_aw_bits_size,
    vc_handle master_aw_bits_len,

    vc_handle master_w_valid,
    vc_handle master_w_ready,
    vc_handle master_w_bits_strb,
    vc_handle master_w_bits_data,
    vc_handle master_w_bits_last,

    vc_handle master_r_valid,
    vc_handle master_r_ready,
    vc_handle master_r_bits_resp,
    vc_handle master_r_bits_id,
    vc_handle master_r_bits_data,
    vc_handle master_r_bits_last,

    vc_handle master_b_valid,
    vc_handle master_b_ready,
    vc_handle master_b_bits_resp,
    vc_handle master_b_bits_id,

    vc_handle slave_ar_valid,
    vc_handle slave_ar_ready,
    vc_handle slave_ar_bits_addr,
    vc_handle slave_ar_bits_id,
    vc_handle slave_ar_bits_size,
    vc_handle slave_ar_bits_len,

    vc_handle slave_aw_valid,
    vc_handle slave_aw_ready,
    vc_handle slave_aw_bits_addr,
    vc_handle slave_aw_bits_id,
    vc_handle slave_aw_bits_size,
    vc_handle slave_aw_bits_len,

    vc_handle slave_w_valid,
    vc_handle slave_w_ready,
    vc_handle slave_w_bits_strb,
    vc_handle slave_w_bits_data,
    vc_handle slave_w_bits_last,

    vc_handle slave_r_valid,
    vc_handle slave_r_ready,
    vc_handle slave_r_bits_resp,
    vc_handle slave_r_bits_id,
    vc_handle slave_r_bits_data,
    vc_handle slave_r_bits_last,

    vc_handle slave_b_valid,
    vc_handle slave_b_ready,
    vc_handle slave_b_bits_resp,
    vc_handle slave_b_bits_id
  ) {
    mmio_zynq_t* m;
    assert(m = dynamic_cast<mmio_zynq_t*>(master.get()));
    uint32_t master_r_data[MASTER_DATA_SIZE];
    for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
      master_r_data[i] = vc_4stVectorRef(master_r_bits_data)[i].d;
    }
    uint32_t slave_w_data[SLAVE_DATA_SIZE];
    for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
      slave_w_data[i] = vc_4stVectorRef(slave_w_bits_data)[i].d;
    }

    vc_putScalar(master_aw_valid, m->aw_valid());
    vc_putScalar(master_ar_valid, m->ar_valid());
    vc_putScalar(master_w_valid, m->w_valid());
    vc_putScalar(master_w_bits_last, m->w_last());
    vc_putScalar(master_r_ready, m->r_ready());
    vc_putScalar(master_b_ready, m->b_ready());

    vec32 md[MASTER_DATA_SIZE];
    md[0].c = 0;
    md[0].d = m->aw_id();
    vc_put4stVector(master_aw_bits_id, md);
    md[0].c = 0;
    md[0].d = m->aw_addr();
    vc_put4stVector(master_aw_bits_addr, md);
    md[0].c = 0;
    md[0].d = m->aw_size();
    vc_put4stVector(master_aw_bits_size, md);
    md[0].c = 0;
    md[0].d = m->aw_len();
    vc_put4stVector(master_aw_bits_len, md);
    md[0].c = 0;
    md[0].d = m->ar_id();
    vc_put4stVector(master_ar_bits_id, md);
    md[0].c = 0;
    md[0].d = m->ar_addr();
    vc_put4stVector(master_ar_bits_addr, md);
    md[0].c = 0;
    md[0].d = m->ar_size();
    vc_put4stVector(master_ar_bits_size, md);
    md[0].c = 0;
    md[0].d = m->ar_len();
    vc_put4stVector(master_ar_bits_len, md);
    md[0].c = 0;
    md[0].d = m->w_strb();
    vc_put4stVector(master_w_bits_strb, md);

    for (size_t i = 0 ; i < MASTER_DATA_SIZE ; i++) {
      md[i].c = 0;
      md[i].d = ((uint32_t*) m->w_data())[i];
    }
    vc_put4stVector(master_w_bits_data, md);

    m->tick(
      vcs_rst,
      vc_getScalar(master_ar_ready),
      vc_getScalar(master_aw_ready),
      vc_getScalar(master_w_ready),
      vc_4stVectorRef(master_r_bits_id)->d,
      master_r_data,
      vc_getScalar(master_r_bits_last),
      vc_getScalar(master_r_valid),
      vc_4stVectorRef(master_b_bits_id)->d,
      vc_getScalar(master_b_valid)
    );

    slave->tick(
      vcs_rst,
      vc_getScalar(slave_ar_valid),
      vc_4stVectorRef(slave_ar_bits_addr)->d,
      vc_4stVectorRef(slave_ar_bits_id)->d,
      vc_4stVectorRef(slave_ar_bits_size)->d,
      vc_4stVectorRef(slave_ar_bits_len)->d,

      vc_getScalar(slave_aw_valid),
      vc_4stVectorRef(slave_aw_bits_addr)->d,
      vc_4stVectorRef(slave_aw_bits_id)->d,
      vc_4stVectorRef(slave_aw_bits_size)->d,
      vc_4stVectorRef(slave_aw_bits_len)->d,

      vc_getScalar(slave_w_valid),
      vc_4stVectorRef(slave_w_bits_strb)->d,
      slave_w_data,
      vc_getScalar(slave_w_bits_last),

      vc_getScalar(slave_r_ready),
      vc_getScalar(slave_b_ready)
    );

    vc_putScalar(slave_aw_ready, slave->aw_ready());
    vc_putScalar(slave_ar_ready, slave->ar_ready());
    vc_putScalar(slave_w_ready, slave->w_ready());
    vc_putScalar(slave_b_valid, slave->b_valid());
    vc_putScalar(slave_r_valid, slave->r_valid());
    vc_putScalar(slave_r_bits_last, slave->r_last());

    vec32 sd[SLAVE_DATA_SIZE];
    sd[0].c = 0;
    sd[0].d = slave->b_id();
    vc_put4stVector(slave_b_bits_id, sd);
    sd[0].c = 0;
    sd[0].d = slave->b_resp();
    vc_put4stVector(slave_b_bits_resp, sd);
    sd[0].c = 0;
    sd[0].d = slave->r_id();
    vc_put4stVector(slave_r_bits_id, sd);
    sd[0].c = 0;
    sd[0].d = slave->r_resp();
    vc_put4stVector(slave_r_bits_resp, sd);
    for (size_t i = 0 ; i < SLAVE_DATA_SIZE ; i++) {
      sd[i].c = 0;
      sd[i].d = ((uint32_t*) slave->r_data())[i];
    }
    vc_put4stVector(slave_r_bits_data, sd);
    vc_putScalar(reset, vcs_rst);
    vc_putScalar(fin, vcs_fin);

    main_time++;

    if (!vcs_fin) host->switch_to();
    else vcs_fin = false;
  }
}

#else

extern PLATFORM_TYPE* top;
#if VM_TRACE
extern VerilatedVcdC* tfp;
#endif // VM_TRACE

void tick() {
  mmio_zynq_t* m;
  assert(m = dynamic_cast<mmio_zynq_t*>(master.get()));
  top->clock = 1;
  top->eval();
#if VM_TRACE
  if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
  main_time++;

  top->io_master_aw_valid = m->aw_valid();
  top->io_master_aw_bits_id = m->aw_id();
  top->io_master_aw_bits_addr = m->aw_addr();
  top->io_master_aw_bits_size = m->aw_size();
  top->io_master_aw_bits_len = m->aw_len();

  top->io_master_ar_valid = m->ar_valid();
  top->io_master_ar_bits_id = m->ar_id();
  top->io_master_ar_bits_addr = m->ar_addr();
  top->io_master_ar_bits_size = m->ar_size();
  top->io_master_ar_bits_len = m->ar_len();

  top->io_master_w_valid = m->w_valid();
  top->io_master_w_bits_strb = m->w_strb();
  top->io_master_w_bits_last = m->w_last();

  top->io_master_r_ready = m->r_ready();
  top->io_master_b_ready = m->b_ready();
#if CTRL_DATA_BITS > 64
  memcpy(top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#else
  memcpy(&top->io_master_w_bits_data, m->w_data(), MMIO_WIDTH);
#endif

  m->tick(
    top->reset,
    top->io_master_ar_ready,
    top->io_master_aw_ready,
    top->io_master_w_ready,
    top->io_master_r_bits_id,
#if CTRL_DATA_BITS > 64
    top->io_master_r_bits_data,
#else
    &top->io_master_r_bits_data,
#endif
    top->io_master_r_bits_last,
    top->io_master_r_valid,
    top->io_master_b_bits_id,
    top->io_master_b_valid
  );

  top->io_slave_aw_ready = slave->aw_ready();
  top->io_slave_ar_ready = slave->ar_ready();
  top->io_slave_w_ready = slave->w_ready();
  top->io_slave_b_valid = slave->b_valid();
  top->io_slave_b_bits_id = slave->b_id();
  top->io_slave_b_bits_resp = slave->b_resp();
  top->io_slave_r_valid = slave->r_valid();
  top->io_slave_r_bits_id = slave->r_id();
  top->io_slave_r_bits_resp = slave->r_resp();
  top->io_slave_r_bits_last = slave->r_last();
#if MEM_DATA_BITS > 64
  memcpy(top->io_slave_r_bits_data, slave->r_data(), MEM_WIDTH);
#else
  memcpy(&top->io_slave_r_bits_data, slave->r_data(), MEM_WIDTH);
#endif

  top->clock = 0;
  top->eval();

  // Slave should be ticked in clock low for comb paths
  slave->tick(
    top->reset,
    top->io_slave_ar_valid,
    top->io_slave_ar_bits_addr,
    top->io_slave_ar_bits_id,
    top->io_slave_ar_bits_size,
    top->io_slave_ar_bits_len,

    top->io_slave_aw_valid,
    top->io_slave_aw_bits_addr,
    top->io_slave_aw_bits_id,
    top->io_slave_aw_bits_size,
    top->io_slave_aw_bits_len,

    top->io_slave_w_valid,
    top->io_slave_w_bits_strb,
#if MEM_DATA_BITS > 64
    top->io_slave_w_bits_data,
#else
    &top->io_slave_w_bits_data,
#endif
    top->io_slave_w_bits_last,

    top->io_slave_r_ready,
    top->io_slave_b_ready
  );

#if VM_TRACE
  if (tfp) tfp->dump((double) main_time);
#endif // VM_TRACE
  main_time++;
}

#endif // VCS
@ -0,0 +1,100 @@
// See LICENSE for license details.

#ifndef __MMIO_ZYNQ_H
#define __MMIO_ZYNQ_H

#include "mmio.h"
#include <cstring>
#include <vector>
#include <queue>

struct mmio_req_addr_t
{
  size_t id;
  uint64_t addr;
  size_t size;
  size_t len;

  mmio_req_addr_t(size_t id_, uint64_t addr_, size_t size_, size_t len_):
    id(id_), addr(addr_), size(size_), len(len_) { }
};

struct mmio_req_data_t
{
  char* data;
  size_t strb;
  bool last;

  mmio_req_data_t(char* data_, size_t strb_, bool last_):
    data(data_), strb(strb_), last(last_) { }
};

struct mmio_resp_data_t
{
  size_t id;
  char* data;
  bool last;

  mmio_resp_data_t(size_t id_, char* data_, bool last_):
    id(id_), data(data_), last(last_) { }
};

class mmio_zynq_t: public mmio_t
{
public:
  mmio_zynq_t(): read_inflight(false), write_inflight(false) {
    dummy_data.resize(MMIO_WIDTH);
  }

  bool aw_valid() { return !aw.empty() && !write_inflight; }
  size_t aw_id() { return aw_valid() ? aw.front().id : 0; }
  uint64_t aw_addr() { return aw_valid() ? aw.front().addr : 0; }
  size_t aw_size() { return aw_valid() ? aw.front().size : 0; }
  size_t aw_len() { return aw_valid() ? aw.front().len : 0; }

  bool ar_valid() { return !ar.empty() && !read_inflight; }
  size_t ar_id() { return ar_valid() ? ar.front().id : 0; }
  uint64_t ar_addr() { return ar_valid() ? ar.front().addr : 0; }
  size_t ar_size() { return ar_valid() ? ar.front().size : 0; }
  size_t ar_len() { return ar_valid() ? ar.front().len : 0; }

  bool w_valid() { return !w.empty(); }
  size_t w_strb() { return w_valid() ? w.front().strb : 0; }
  bool w_last() { return w_valid() ? w.front().last : false; }
  void* w_data() { return w_valid() ? w.front().data : &dummy_data[0]; }

  bool r_ready() { return read_inflight; }
  bool b_ready() { return write_inflight; }

  void tick
  (
    bool reset,
    bool ar_ready,
    bool aw_ready,
    bool w_ready,
    size_t r_id,
    void* r_data,
    bool r_last,
    bool r_valid,
    size_t b_id,
    bool b_valid
  );

  virtual void read_req(uint64_t addr, size_t size, size_t len);
  virtual void write_req(uint64_t addr, size_t size, size_t len, void* data, size_t *strb);
  virtual bool read_resp(void *data);
  virtual bool write_resp();

private:
  std::queue<mmio_req_addr_t> ar;
  std::queue<mmio_req_addr_t> aw;
  std::queue<mmio_req_data_t> w;
  std::queue<mmio_resp_data_t> r;
  std::queue<size_t> b;

  bool read_inflight;
  bool write_inflight;
  std::vector<char> dummy_data;
};

#endif // __MMIO_ZYNQ_H
@ -0,0 +1,25 @@
// See LICENSE for license details.

#ifndef __VCS_MAIN
#define __VCS_MAIN

extern "C" {
extern int vcs_main(int argc, char** argv);
}

struct target_args_t {
  target_args_t(int c, char** v):
    argc(c), argv(v) { }
  int argc;
  char** argv;
};

int target_thread(void *arg) {
  target_args_t* targs = reinterpret_cast<target_args_t*>(arg);
  int argc = targs->argc;
  char** argv = targs->argv;
  delete targs;
  return vcs_main(argc, argv);
}

#endif // __VCS_MAIN
@ -0,0 +1,273 @@
// See LICENSE for license details.

#ifndef __REPLAY_H
#define __REPLAY_H

#include <vector>
#include <map>
#include <fstream>
#include <sstream>
#include <iostream>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cinttypes> // for PRIu64
#include <gmp.h>
#include "sample/sample.h"

enum PUT_VALUE_TYPE { PUT_DEPOSIT, PUT_FORCE };
static const char* PUT_VALUE_TYPE_STRING[2] = { "LOAD", "FORCE" };

template <class T> class replay_t {
public:
  replay_t(): cycles(0L), log(false), pass(true), is_exit(false) {
    mpz_init(one);
    mpz_set_ui(one, 1);
  }

  virtual ~replay_t() {
    for (auto& sample: samples) delete sample;
    samples.clear();
    mpz_clear(one);
  }

  void init(int argc, char** argv) {
    std::vector<std::string> args(argv + 1, argv + argc);
    for (auto &arg: args) {
      if (arg.find("+sample=") == 0) {
        load_samples(arg.c_str() + 8);
      }
      if (arg.find("+match=") == 0) {
        load_match_points(arg.c_str() + 7);
      }
      if (arg.find("+verbose") == 0) {
        log = true;
      }
    }
  }

  void reset(size_t n) {
    size_t id = replay_data.signal_map["reset"];
    put_value(replay_data.signals[id], one, PUT_DEPOSIT);
    take_steps(n);
  }

  virtual void replay() {
    for (auto& sample: samples) {
      reset(5);
      std::cerr << " * REPLAY AT CYCLE " << sample->get_cycle() << " * " << std::endl;
      for (size_t i = 0 ; i < sample->get_cmds().size() ; i++) {
        sample_inst_t* cmd = sample->get_cmds()[i];
        if (step_t* p = dynamic_cast<step_t*>(cmd)) {
          step(p->n);
        }
        else if (load_t* p = dynamic_cast<load_t*>(cmd)) {
          auto signal = signals[p->type][p->id];
          auto width = widths[p->type][p->id];
          load(signal, width, *(p->value), PUT_DEPOSIT, p->idx);
        }
        else if (force_t* p = dynamic_cast<force_t*>(cmd)) {
          auto signal = signals[p->type][p->id];
          auto width = widths[p->type][p->id];
          load(signal, width, *(p->value), PUT_FORCE, -1);
        }
        else if (poke_t* p = dynamic_cast<poke_t*>(cmd)) {
          poke(signals[p->type][p->id], *(p->value));
        }
        else if (expect_t* p = dynamic_cast<expect_t*>(cmd)) {
          pass &= expect(signals[p->type][p->id], *(p->value));
        }
      }
    }
    is_exit = true;
  }

  virtual int finish() {
    fprintf(stderr, "[%s] Runs %" PRIu64 " cycles\n",
            pass ? "PASS" : "FAIL", cycles);
    return exitcode();
  }

protected:
  struct {
    std::vector<T> signals;
    std::map<std::string, size_t> signal_map;
  } replay_data;

  inline bool gate_level() { return !match_map.empty(); }
  inline bool done() { return is_exit; }
  inline int exitcode() { return pass ? EXIT_SUCCESS : EXIT_FAILURE; }

private:
  mpz_t one; // useful constant
  uint64_t cycles;
  bool log;
  bool pass;
  bool is_exit;
  std::vector<sample_t*> samples;
  std::vector<std::vector<std::string>> signals;
  std::vector<std::vector<size_t>> widths;
  std::map<std::string, std::string> match_map;

  void load_samples(const char* filename) {
    std::ifstream file(filename);
    if (!file) {
      fprintf(stderr, "Cannot open %s\n", filename);
      exit(EXIT_FAILURE);
    }
    std::string line;
    size_t steps = 0;
    sample_t* sample = NULL;
    while (std::getline(file, line)) {
      std::istringstream iss(line);
      size_t type, t, width, id, n;
      ssize_t idx;
      uint64_t cycles;
      std::string signal, valstr, dummy;
      mpz_t *value = NULL;
      iss >> type;
      switch(static_cast<SAMPLE_INST_TYPE>(type)) {
        case SIGNALS:
          iss >> t >> signal >> width;
          while(signals.size() <= t) signals.push_back(std::vector<std::string>());
          while(widths.size() <= t) widths.push_back(std::vector<size_t>());
          signals[t].push_back(signal);
          widths[t].push_back(width);
          break;
        case CYCLE:
          iss >> dummy >> cycles;
          sample = new sample_t(cycles);
          samples.push_back(sample);
          steps = 0;
          break;
        case STATE_LOAD:
          iss >> t >> id >> valstr >> idx;
          value = (mpz_t*)malloc(sizeof(mpz_t));
          mpz_init(*value);
          mpz_set_str(*value, valstr.c_str(), 16);
          sample->add_cmd(new load_t(t, id, value, idx));
          break;
        case FORCE:
          iss >> t >> id >> valstr;
          value = (mpz_t*)malloc(sizeof(mpz_t));
          mpz_init(*value);
          mpz_set_str(*value, valstr.c_str(), 16);
          sample->add_cmd(new force_t(t, id, value));
          break;
        case POKE:
          iss >> t >> id >> valstr;
          value = (mpz_t*)malloc(sizeof(mpz_t));
          mpz_init(*value);
          mpz_set_str(*value, valstr.c_str(), 16);
          sample->add_cmd(new poke_t(t, id, value));
          break;
        case STEP:
          iss >> n;
          sample->add_cmd(new step_t(n));
          steps += n;
          break;
        case EXPECT:
          iss >> t >> id >> valstr;
          value = (mpz_t*)malloc(sizeof(mpz_t));
          mpz_init(*value);
          mpz_set_str(*value, valstr.c_str(), 16);
          if (steps > 1) sample->add_cmd(new expect_t(t, id, value));
          break;
        default:
          break;
      }
    }
    file.close();
  }

  void load_match_points(const char* filename) {
    std::ifstream file(filename);
    if (!file) {
      fprintf(stderr, "Cannot open %s\n", filename);
      exit(EXIT_FAILURE);
    }
    std::string line;
    while (std::getline(file, line)) {
      std::istringstream iss(line);
      std::string ref, impl;
      iss >> ref >> impl;
      match_map[ref] = impl;
    }
  }

  virtual void take_steps(size_t) = 0;
  virtual void put_value(T& sig, mpz_t& data, PUT_VALUE_TYPE type) = 0;
  virtual void get_value(T& sig, mpz_t& data) = 0;

  void step(size_t n) {
    cycles += n;
    if (log) std::cerr << " * STEP " << n << " -> " << cycles << " *" << std::endl;
    take_steps(n);
  }

  T& get_signal(const std::string& node) {
    auto it = replay_data.signal_map.find(node);
    if (it == replay_data.signal_map.end()) {
      std::cerr << "Cannot find " << node << " in the design" << std::endl;
      assert(false);
    }
    return replay_data.signals[it->second];
  }

  void load_bit(const std::string& ref, mpz_t& bit, PUT_VALUE_TYPE tpe) {
    auto it = match_map.find(ref);
    if (it != match_map.end()) {
      put_value(get_signal(it->second), bit, tpe);
    }
  }

  void load(const std::string& node, size_t width, mpz_t& data, PUT_VALUE_TYPE tpe, int idx) {
    std::string name = idx < 0 ? node : node + "[" + std::to_string(idx) + "]";
    if (log) {
      char* data_str = mpz_get_str(NULL, 16, data);
      std::cerr << " * " << PUT_VALUE_TYPE_STRING[tpe] << " " << name
                << " <- 0x" << data_str << " *" << std::endl;
      free(data_str);
    }
    if (!gate_level()) {
      put_value(get_signal(name), data, tpe);
    } else if (width == 1 && idx < 0) {
      load_bit(name, data, tpe);
    } else {
      for (size_t i = 0 ; i < width ; i++) {
        mpz_t bit;
        mpz_init(bit);
        // bit = (data >> i) & 0x1
        mpz_fdiv_q_2exp(bit, data, i);
        mpz_and(bit, bit, one);
        load_bit(name + "[" + std::to_string(i) + "]", bit, tpe);
        mpz_clear(bit);
      }
    }
  }

  void poke(const std::string& node, mpz_t& data) {
    if (log) {
      char* data_str = mpz_get_str(NULL, 16, data);
      std::cerr << " * POKE " << node << " <- 0x" << data_str << " *" << std::endl;
      free(data_str);
    }
    put_value(get_signal(node), data, PUT_DEPOSIT);
  }

  bool expect(const std::string& node, mpz_t& expected) {
    mpz_t value;
    mpz_init(value);
    get_value(get_signal(node), value);
    bool pass = mpz_cmp(value, expected) == 0 || cycles <= 1;
    if (log) {
      char* value_str = mpz_get_str(NULL, 16, value);
      char* expected_str = mpz_get_str(NULL, 16, expected);
      std::cerr << " * EXPECT " << node
                << " -> 0x" << value_str << " ?= 0x" << expected_str
                << (pass ? " : PASS" : " : FAIL") << " *" << std::endl;
      free(value_str);
      free(expected_str);
    }
    return pass;
  }
};

#endif //__REPLAY_H
@ -0,0 +1,30 @@
##################################################################################
# Replay Parameters
# 1) TARGET_VERILOG: verilog file to be replayed (by default $(GEN_DIR)/$(DESIGN).v)
# 2) REPLAY_BINARY: binary file for replay (by default $(OUT_DIR)/$(DESIGN)-replay)
##################################################################################

TARGET_VERILOG ?= $(GEN_DIR)/$(DESIGN).v $(GEN_DIR)/$(DESIGN).macros.v
REPLAY_BINARY ?= $(OUT_DIR)/$(DESIGN)-replay
replay_h := $(midas_dir)/sample/sample.h $(replay_dir)/replay_vpi.h $(replay_dir)/replay.h
replay_cc := $(midas_dir)/sample/sample.cc $(replay_dir)/replay_vpi.cc

ifneq ($(filter $(MAKECMDGOALS),vcs-replay $(REPLAY_BINARY)),)
$(info verilog files: $(TARGET_VERILOG))
$(info replay binary: $(REPLAY_BINARY))
endif

# Compile VCS replay binary
$(REPLAY_BINARY): $(v_dir)/replay.v $(TARGET_VERILOG) $(replay_cc) $(replay_h) $(lib)
	mkdir -p $(OUT_DIR)
	rm -rf $(GEN_DIR)/$(notdir $@).csrc
	rm -rf $(OUT_DIR)/$(notdir $@).daidir
	$(VCS) $(VCS_FLAGS) -CFLAGS -I$(replay_dir) \
	-Mdir=$(GEN_DIR)/$(notdir $@).csrc +vpi -P $(r_dir)/vpi.tab \
	+define+STOP_COND=!replay.reset +define+PRINTF_COND=!replay.reset \
	+define+VFRAG=\"$(GEN_DIR)/$(DESIGN).vfrag\" \
	-o $@ $< $(TARGET_VERILOG) $(replay_cc) $(lib)

vcs-replay: $(REPLAY_BINARY)

.PHONY: vcs-replay
@ -0,0 +1,229 @@
// See LICENSE for license details.

#include "replay_vpi.h"
#include "emul/vcs_main.h"

void replay_vpi_t::init(int argc, char** argv) {
  host = midas_context_t::current();
  target_args_t *targs = new target_args_t(argc, argv);
  target.init(target_thread, targs);
  replay_t::init(argc, argv);
  target.switch_to();
}

int replay_vpi_t::finish() {
  target.switch_to();
  return replay_t::finish();
}

void replay_vpi_t::add_signal(vpiHandle& sig_handle, std::string& wire) {
  size_t id = replay_data.signals.size();
  replay_data.signals.push_back(sig_handle);
  replay_data.signal_map[wire] = id;
}

void replay_vpi_t::probe_bits(vpiHandle& sig_handle, std::string& sigpath, std::string& modname) {
  if (gate_level()) {
    if (vpi_get(vpiSize, sig_handle) == 1) {
      std::string bitpath = sigpath + "[0]";
      add_signal(sig_handle, bitpath);
    } else {
      vpiHandle bit_iter = vpi_iterate(vpiBit, sig_handle);
      while (vpiHandle bit_handle = vpi_scan(bit_iter)) {
        std::string bitname = vpi_get_str(vpiName, bit_handle);
        std::string bitpath = modname + "." + bitname;
        add_signal(bit_handle, bitpath);
      }
    }
  }
}

void replay_vpi_t::probe_signals() {
  // traverse testbench first
  vpiHandle replay_handle = vpi_scan(vpi_iterate(vpiModule, NULL));
  vpiHandle reg_iter = vpi_iterate(vpiReg, replay_handle);
  vpiHandle net_iter = vpi_iterate(vpiNet, replay_handle);
  while (vpiHandle reg_handle = vpi_scan(reg_iter)) {
    std::string regname = vpi_get_str(vpiName, reg_handle);
    if (regname.find("_delay") != 0) add_signal(reg_handle, regname);
  }
  while (vpiHandle net_handle = vpi_scan(net_iter)) {
    std::string netname = vpi_get_str(vpiName, net_handle);
    if (netname.find("_delay") != 0) add_signal(net_handle, netname);
  }

  vpiHandle syscall_handle = vpi_handle(vpiSysTfCall, NULL);
  vpiHandle arg_iter = vpi_iterate(vpiArgument, syscall_handle);
  vpiHandle top_handle = vpi_scan(arg_iter);
  std::queue<vpiHandle> modules;
  size_t offset = std::string(vpi_get_str(vpiFullName, top_handle)).find(".") + 1;

  // Start from the top module
  modules.push(top_handle);

  while (!modules.empty()) {
    vpiHandle mod_handle = modules.front();
    modules.pop();

    std::string modname = std::string(vpi_get_str(vpiFullName, mod_handle)).substr(offset);

    if (!vpi_scan(vpi_iterate(vpiPrimitive, mod_handle))) { // Not a gate?
      // Iterate its ports
      vpiHandle net_iter = vpi_iterate(vpiNet, mod_handle);
      while (vpiHandle net_handle = vpi_scan(net_iter)) {
        std::string netname = vpi_get_str(vpiName, net_handle);
        std::string netpath = modname + "." + netname;
        add_signal(net_handle, netpath);
        probe_bits(net_handle, netpath, modname);
      }
    }

    // Iterate its regs
    vpiHandle reg_iter = vpi_iterate(vpiReg, mod_handle);
    while (vpiHandle reg_handle = vpi_scan(reg_iter)) {
      std::string regname = vpi_get_str(vpiName, reg_handle);
      std::string regpath = modname + "." + regname;
      add_signal(reg_handle, regpath);
      probe_bits(reg_handle, regpath, modname);
    }

    // Iterate its mems
    vpiHandle mem_iter = vpi_iterate(vpiRegArray, mod_handle);
    while (vpiHandle mem_handle = vpi_scan(mem_iter)) {
      vpiHandle elm_iter = vpi_iterate(vpiReg, mem_handle);
      while (vpiHandle elm_handle = vpi_scan(elm_iter)) {
        std::string elmname = vpi_get_str(vpiName, elm_handle);
        std::string elmpath = modname + "." + elmname;
        add_signal(elm_handle, elmpath);
        probe_bits(elm_handle, elmpath, modname);
      }
    }

    // Find DFF
    vpiHandle udp_iter = vpi_iterate(vpiPrimitive, mod_handle);
    while (vpiHandle udp_handle = vpi_scan(udp_iter)) {
      if (vpi_get(vpiPrimType, udp_handle) == vpiSeqPrim) {
        add_signal(udp_handle, modname);
      }
    }

    vpiHandle sub_iter = vpi_iterate(vpiModule, mod_handle);
    while (vpiHandle sub_handle = vpi_scan(sub_iter)) {
      modules.push(sub_handle);
    }
  }
}

void replay_vpi_t::put_value(vpiHandle& sig, std::string& value, PLI_INT32 flag) {
  s_vpi_value value_s;
  // s_vpi_time time_s;
  value_s.format = vpiHexStrVal;
  value_s.value.str = (PLI_BYTE8*) value.c_str();
  // time_s.type = vpiScaledRealTime;
  // time_s.real = 0.0;
  vpi_put_value(sig, &value_s, /*&time_s*/ NULL, flag);
}

void replay_vpi_t::get_value(vpiHandle& sig, std::string& value) {
  s_vpi_value value_s;
  value_s.format = vpiHexStrVal;
  vpi_get_value(sig, &value_s);
  value = value_s.value.str;
}

void replay_vpi_t::put_value(vpiHandle& sig, mpz_t& data, PUT_VALUE_TYPE type) {
  PLI_INT32 flag;
  switch(type) {
    case PUT_DEPOSIT: flag = vpiNoDelay; break;
    case PUT_FORCE: flag = vpiForceFlag; forces.push(sig); break;
  }
  size_t value_size;
  uint32_t* value = (uint32_t*)mpz_export(NULL, &value_size, -1, sizeof(uint32_t), 0, 0, data);
  size_t signal_size = ((vpi_get(vpiSize, sig) - 1) / 32) + 1;
  s_vpi_value value_s;
  s_vpi_vecval vecval_s[signal_size];
  value_s.format = vpiVectorVal;
  value_s.value.vector = vecval_s;
  for (size_t i = 0 ; i < signal_size ; i++) {
    value_s.value.vector[i].aval = i < value_size ? value[i] : 0;
    value_s.value.vector[i].bval = 0;
  }
  vpi_put_value(sig, &value_s, NULL, flag);
}

void replay_vpi_t::get_value(vpiHandle& sig, mpz_t& data) {
  size_t signal_size = ((vpi_get(vpiSize, sig) - 1) / 32) + 1;
  s_vpi_value value_s;
  s_vpi_vecval vecval_s[signal_size];
  value_s.format = vpiVectorVal;
  value_s.value.vector = vecval_s;
  vpi_get_value(sig, &value_s);

  uint32_t value[signal_size];
  for (size_t i = 0 ; i < signal_size ; i++) {
    value[i] = value_s.value.vector[i].aval;
  }
  mpz_import(data, signal_size, -1, sizeof(uint32_t), 0, 0, value);
}

void replay_vpi_t::take_steps(size_t n) {
  for (size_t i = 0 ; i < n ; i++)
    target.switch_to();
}

void replay_vpi_t::tick() {
  while(!forces.empty()) {
    vpi_put_value(forces.front(), NULL, NULL, vpiReleaseFlag);
    forces.pop();
  }
  host->switch_to();
  vpiHandle syscall_handle = vpi_handle(vpiSysTfCall, NULL);
  vpiHandle arg_iter = vpi_iterate(vpiArgument, syscall_handle);
  vpiHandle exit_handle = vpi_scan(arg_iter);
  s_vpi_value vexit;
  vexit.format = vpiIntVal;
  vexit.value.integer = done();
  vpi_put_value(exit_handle, &vexit, NULL, vpiNoDelay);
}

static replay_vpi_t* replay = NULL;

extern "C" {

PLI_INT32 init_sigs_calltf(PLI_BYTE8 *user_data) {
  replay->probe_signals();
  return 0;
}

PLI_INT32 tick_calltf(PLI_BYTE8 *user_data) {
  replay->tick();
  return 0;
}

PLI_INT32 sim_end_cb(p_cb_data cb_data) {
  replay->tick();
  return 0;
}

PLI_INT32 tick_compiletf(PLI_BYTE8 *user_data) {
  s_cb_data data_s;
  data_s.reason = cbEndOfSimulation;
  data_s.cb_rtn = sim_end_cb;
  data_s.obj = NULL;
  data_s.time = NULL;
  data_s.value = NULL;
  data_s.user_data = NULL;
  vpi_free_object(vpi_register_cb(&data_s));
  return 0; // the function is declared PLI_INT32 and must return a value
}

int main(int argc, char** argv) {
  replay = new replay_vpi_t;
  replay->init(argc, argv);
  replay->replay();
  int exitcode = replay->finish();
  delete replay;
  return exitcode;
}

}
@ -0,0 +1,36 @@
// See LICENSE for license details.

#ifndef __REPLAY_VPI_H
#define __REPLAY_VPI_H

#include "vpi_user.h"
#include "replay.h"
#include "midas_context.h"
#include <queue>

class replay_vpi_t: public replay_t<vpiHandle> {
public:
  replay_vpi_t() { }
  virtual ~replay_vpi_t() { }

  virtual void init(int argc, char** argv);
  virtual int finish();
  void probe_signals();
  void tick();

private:
  std::queue<vpiHandle> forces;

  midas_context_t *host;
  midas_context_t target;

  inline void add_signal(vpiHandle& sig_handle, std::string& path);
  inline void probe_bits(vpiHandle& sig_handle, std::string& sigpath, std::string& modname);
  void put_value(vpiHandle& sig, std::string& value, PLI_INT32 flag);
  void get_value(vpiHandle& sig, std::string& value);
  virtual void put_value(vpiHandle& sig, mpz_t& data, PUT_VALUE_TYPE type);
  virtual void get_value(vpiHandle& sig, mpz_t& data);
  virtual void take_steps(size_t n);
};

#endif // __REPLAY_VPI_H
@ -0,0 +1,41 @@
# VCS RTL Simulation Makefrag
#
# This makefrag stores common recipes for building RTL simulators with VCS
#
# Compulsory variables:
#   All those described in Makefrag-verilator
#   vcs_wrapper_v: An additional verilog wrapper around the DUT not used in verilator
#   CLOCK_PERIOD: Self explanatory
#   TB: The top level module on which the stop and printf conditions are defined
#

VCS ?= vcs -full64
override VCS_FLAGS := -quiet -timescale=1ns/1ps +v2k +rad +vcs+initreg+random +vcs+lic+wait \
	-notice -line +lint=all,noVCDE,noONGS,noUI -quiet -debug_pp +no_notifier -cpp $(CXX) \
	-Mdir=$(GEN_DIR)/$(DESIGN)-debug.csrc \
	+vc+list \
	-CFLAGS "$(CXXFLAGS) $(CFLAGS) -DVCS -I$(VCS_HOME)/include" \
	-LDFLAGS "$(LDFLAGS)" \
	-sverilog \
	+define+CLOCK_PERIOD=$(CLOCK_PERIOD) \
	+define+RANDOMIZE_GARBAGE_ASSIGN \
	+define+RANDOMIZE_INVALID_ASSIGN \
	+define+STOP_COND=!$(TB).reset \
	+define+PRINTF_COND=!$(TB).reset \
	$(VCS_FLAGS)

vcs_v := $(emul_v) $(vcs_wrapper_v)

$(OUT_DIR)/$(DESIGN): $(vcs_v) $(emul_cc) $(emul_h)
	mkdir -p $(OUT_DIR)
	rm -rf $(GEN_DIR)/$(DESIGN).csrc
	rm -rf $(OUT_DIR)/$(DESIGN).daidir
	$(VCS) $(VCS_FLAGS) \
	-o $@ $(vcs_v) $(emul_cc)

$(OUT_DIR)/$(DESIGN)-debug: $(vcs_v) $(emul_cc) $(emul_h)
	mkdir -p $(OUT_DIR)
	rm -rf $(GEN_DIR)/$(DESIGN)-debug.csrc
	rm -rf $(OUT_DIR)/$(DESIGN)-debug.daidir
	$(VCS) $(VCS_FLAGS) +define+DEBUG \
	-o $@ $(vcs_v) $(emul_cc)
@ -0,0 +1,37 @@
# Verilator RTL Simulation Makefrag
#
# This makefrag stores common recipes for building RTL simulators with Verilator
#
# Compulsory variables:
#   OUT_DIR: See Makefile
#   GEN_DIR: See Makefile
#   DESIGN: See Makefile
#   emul_cc: C++ sources
#   emul_h: C++ headers
#   emul_v: verilog sources and headers
#
# Verilator Only:
#   top_module: The top of the DUT
#   (optional) verilator_conf: A verilator configuration file

VERILATOR ?= verilator --cc --exe
override VERILATOR_FLAGS := --assert -Wno-STMTDLY -O3 \
	-CFLAGS "$(CXXFLAGS) $(CFLAGS)" \
	-LDFLAGS "$(LDFLAGS)" \
	$(VERILATOR_FLAGS)

$(OUT_DIR)/V$(DESIGN): $(emul_v) $(emul_cc) $(emul_h)
	mkdir -p $(OUT_DIR)
	rm -rf $(GEN_DIR)/V$(DESIGN).csrc
	$(VERILATOR) $(VERILATOR_FLAGS) --top-module $(top_module) -Mdir $(GEN_DIR)/V$(DESIGN).csrc \
	-CFLAGS "-include $(GEN_DIR)/V$(DESIGN).csrc/V$(top_module).h" \
	-o $@ $(emul_v) $(verilator_conf) $(emul_cc)
	$(MAKE) -C $(GEN_DIR)/V$(DESIGN).csrc -f V$(top_module).mk

$(OUT_DIR)/V$(DESIGN)-debug: $(emul_v) $(emul_cc) $(emul_h)
	mkdir -p $(OUT_DIR)
	rm -rf $(GEN_DIR)/V$(DESIGN)-debug.csrc
	$(VERILATOR) $(VERILATOR_FLAGS) --trace --top-module $(top_module) -Mdir $(GEN_DIR)/V$(DESIGN)-debug.csrc \
	-CFLAGS "-include $(GEN_DIR)/V$(DESIGN)-debug.csrc/V$(top_module).h" \
	-o $@ $(emul_v) $(verilator_conf) $(emul_cc)
	$(MAKE) -C $(GEN_DIR)/V$(DESIGN)-debug.csrc -f V$(top_module).mk
@ -0,0 +1,273 @@
// See LICENSE.SiFive for license details.
// See LICENSE.Berkeley for license details.

#include "verilated.h"
#if VM_TRACE
#include <memory>
#include "verilated_vcd_c.h"
#endif
#include <iostream>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

// Originally from Rocket-Chip, with RISC-V specific stuff stripped out

// For option parsing, which is split across this file and the Verilog source,
// a few external files must be pulled in. The list of files and what they
// provide is enumerated:
//
// Biancolin: This will be useful later.
// $(ROCKETCHIP_DIR)/generated-src(-debug)?/$(CONFIG).plusArgs:
//   defines:
//     - PLUSARG_USAGE_OPTIONS
//   variables:
//     - static const char * verilog_plusargs

static uint64_t trace_count = 0;
bool verbose;
bool done_reset;

//void handle_sigterm(int sig)
//{
//  Biancolin: //TODO
//}

double sc_time_stamp()
{
  return trace_count;
}

extern "C" int vpi_get_vlog_info(void* arg)
{
  return 0;
}

static void usage(const char * program_name)
{
  printf("Usage: %s [VERILOG PLUSARG]...\n",
         program_name);
  fputs("\
Run a BINARY on the Rocket Chip emulator.\n\
\n\
Mandatory arguments to long options are mandatory for short options too.\n\
\n\
EMULATOR OPTIONS\n\
  -c, --cycle-count        Print the cycle count before exiting\n\
       +cycle-count\n\
  -h, --help               Display this help and exit\n\
  -m, --max-cycles=CYCLES  Kill the emulation after CYCLES\n\
       +max-cycles=CYCLES\n\
  -s, --seed=SEED          Use random number seed SEED\n\
       automatically.\n\
  -V, --verbose            Enable all Chisel printfs (cycle-by-cycle info)\n\
       +verbose\n\
", stdout);
#if VM_TRACE == 0
  fputs("\
\n\
EMULATOR DEBUG OPTIONS (only supported in debug build -- try `make debug`)\n",
        stdout);
#endif
  fputs("\
  -v, --vcd=FILE,          Write vcd trace to FILE (or '-' for stdout)\n\
  -x, --dump-start=CYCLE   Start VCD tracing at CYCLE\n\
       +dump-start\n\
", stdout);
  //fputs("\n" PLUSARG_USAGE_OPTIONS, stdout);
}

int main(int argc, char** argv)
{
  unsigned random_seed = (unsigned)time(NULL) ^ (unsigned)getpid();
  uint64_t max_cycles = -1;
  int ret = 0;
  bool print_cycles = false;
  // Port numbers are 16 bit unsigned integers.
#if VM_TRACE
  FILE * vcdfile = NULL;
  uint64_t start = 0;
#endif
  int verilog_plusargs_legal = 1;

  while (1) {
    static struct option long_options[] = {
      {"cycle-count", no_argument,       0, 'c' },
      {"help",        no_argument,       0, 'h' },
      {"max-cycles",  required_argument, 0, 'm' },
      {"seed",        required_argument, 0, 's' },
      {"rbb-port",    required_argument, 0, 'r' },
#if VM_TRACE
      {"vcd",         required_argument, 0, 'v' },
      {"dump-start",  required_argument, 0, 'x' },
#endif
      {"verbose",     no_argument,       0, 'V' }
    };
    int option_index = 0;
#if VM_TRACE
    int c = getopt_long(argc, argv, "-chm:s:r:v:Vx:", long_options, &option_index);
#else
    int c = getopt_long(argc, argv, "-chm:s:r:V", long_options, &option_index);
#endif
    if (c == -1) break;
  retry:
    switch (c) {
      // Process long and short EMULATOR options
      case '?': usage(argv[0]); return 1;
      case 'c': print_cycles = true; break;
      case 'h': usage(argv[0]); return 0;
      case 'm': max_cycles = atoll(optarg); break;
      case 's': random_seed = atoi(optarg); break;
      case 'V': verbose = true; break;
#if VM_TRACE
      case 'v': {
        vcdfile = strcmp(optarg, "-") == 0 ? stdout : fopen(optarg, "w");
        if (!vcdfile) {
          std::cerr << "Unable to open " << optarg << " for VCD write\n";
          return 1;
        }
        break;
      }
      case 'x': start = atoll(optarg); break;
#endif
      // Process legacy '+' EMULATOR arguments by replacing them with
      // their getopt equivalents
      case 1: {
        std::string arg = optarg;
        if (arg.substr(0, 1) != "+") {
          optind--;
          goto done_processing;
        }
        if (arg == "+verbose")
          c = 'V';
        else if (arg.substr(0, 12) == "+max-cycles=") {
          c = 'm';
          optarg = optarg+12;
        }
#if VM_TRACE
        else if (arg.substr(0, 12) == "+dump-start=") {
          c = 'x';
          optarg = optarg+12;
        }
#endif
        else if (arg.substr(0, 12) == "+cycle-count")
          c = 'c';
        // If we don't find a legacy '+' EMULATOR argument, it still could be
        // a VERILOG_PLUSARG and not an error.
        //else if (verilog_plusargs_legal) {
        //  const char ** plusarg = &verilog_plusargs[0];
        //  int legal_verilog_plusarg = 0;
        //  while (*plusarg && (legal_verilog_plusarg == 0)){
        //    if (arg.substr(1, strlen(*plusarg)) == *plusarg) {
        //      legal_verilog_plusarg = 1;
        //    }
        //    plusarg ++;
        //  }
        //  if (!legal_verilog_plusarg) {
        //    verilog_plusargs_legal = 0;
        //  } else {
        //    c = 'P';
        //  }
        //  goto retry;
        //}
        // Not a recognized plus-arg
        else {
          std::cerr << argv[0] << ": invalid plus-arg (Verilog or HTIF) \""
                    << arg << "\"\n";
          c = '?';
        }
        goto retry;
      }
      case 'P': break; // Nothing to do here, Verilog PlusArg
      default:
        c = '?';
        goto retry;
    }
  }

done_processing:
  if (verbose)
    fprintf(stderr, "using random seed %u\n", random_seed);

  srand(random_seed);
  srand48(random_seed);

  Verilated::randReset(2);
  Verilated::commandArgs(argc, argv);
  TEST_HARNESS *tile = new TEST_HARNESS;

#if VM_TRACE
  Verilated::traceEverOn(true); // Verilator must compute traced signals
  std::unique_ptr<VerilatedVcdFILE> vcdfd(new VerilatedVcdFILE(vcdfile));
  std::unique_ptr<VerilatedVcdC> tfp(new VerilatedVcdC(vcdfd.get()));
  if (vcdfile) {
    tile->trace(tfp.get(), 99); // Trace 99 levels of hierarchy
    tfp->open("");
  }
#endif

  //signal(SIGTERM, handle_sigterm);

  bool dump;
// reset for several cycles to handle pipelined reset
|
||||
for (int i = 0; i < 10; i++) {
|
||||
tile->reset = 1;
|
||||
tile->clock = 0;
|
||||
tile->eval();
|
||||
#if VM_TRACE
|
||||
dump = tfp && trace_count >= start;
|
||||
if (dump)
|
||||
tfp->dump(static_cast<vluint64_t>(trace_count * 2));
|
||||
#endif
|
||||
tile->clock = 1;
|
||||
tile->eval();
|
||||
#if VM_TRACE
|
||||
if (dump)
|
||||
tfp->dump(static_cast<vluint64_t>(trace_count * 2 + 1));
|
||||
#endif
|
||||
trace_count ++;
|
||||
}
|
||||
tile->reset = 0;
|
||||
done_reset = true;
|
||||
|
||||
while (!tile->io_success && trace_count < max_cycles) {
|
||||
tile->clock = 0;
|
||||
tile->eval();
|
||||
#if VM_TRACE
|
||||
dump = tfp && trace_count >= start;
|
||||
if (dump)
|
||||
tfp->dump(static_cast<vluint64_t>(trace_count * 2));
|
||||
#endif
|
||||
|
||||
tile->clock = 1;
|
||||
tile->eval();
|
||||
#if VM_TRACE
|
||||
if (dump)
|
||||
tfp->dump(static_cast<vluint64_t>(trace_count * 2 + 1));
|
||||
#endif
|
||||
trace_count++;
|
||||
}
|
||||
|
||||
#if VM_TRACE
|
||||
if (tfp)
|
||||
tfp->close();
|
||||
if (vcdfile)
|
||||
fclose(vcdfile);
|
||||
#endif
|
||||
|
||||
if (trace_count == max_cycles)
|
||||
{
|
||||
fprintf(stderr, "*** FAILED *** via trace_count (timeout, seed %d) after %ld cycles\n", random_seed, trace_count);
|
||||
ret = 2;
|
||||
}
|
||||
else if (verbose || print_cycles)
|
||||
{
|
||||
fprintf(stderr, "Completed after %ld cycles\n", trace_count);
|
||||
}
|
||||
|
||||
if (tile) delete tile;
|
||||
return ret;
|
||||
}
|
|
@@ -0,0 +1,5 @@
// HACK: Disable MULTIDRIVEN linting, since Verilator cannot determine if two
// syntactically different clocks are aliases of one another if they are
// driven by separate ports.
`verilator_config
lint_off -msg MULTIDRIVEN
@@ -0,0 +1,158 @@
// See LICENSE for license details.

#include "sample.h"
#include <cassert>
#include <cstring>
#include <fstream>
#include <sstream>

#ifdef ENABLE_SNAPSHOT
std::array<std::vector<std::string>, CHAIN_NUM> sample_t::signals = {};
std::array<std::vector<size_t>, CHAIN_NUM> sample_t::widths = {};
std::array<std::vector<int>, CHAIN_NUM> sample_t::depths = {};
size_t sample_t::chain_len[CHAIN_NUM] = {0};
size_t sample_t::chain_loop[CHAIN_NUM] = {0};

void sample_t::init_chains(std::string filename) {
  std::fill(signals.begin(), signals.end(), std::vector<std::string>());
  std::fill(widths.begin(), widths.end(), std::vector<size_t>());
  std::fill(depths.begin(), depths.end(), std::vector<int>());
  std::ifstream file(filename.c_str());
  if (!file) {
    fprintf(stderr, "Cannot open %s\n", filename.c_str());
    exit(EXIT_FAILURE);
  }
  std::string line;
  while (std::getline(file, line)) {
    std::istringstream iss(line);
    size_t type;
    std::string signal;
    iss >> type >> signal;
    size_t width;
    int depth;
    iss >> width >> depth;
    if (signal == "null") signal = "";
    signals[type].push_back(signal);
    widths[type].push_back(width);
    depths[type].push_back(depth);
    chain_len[type] += width;
    switch ((CHAIN_TYPE) type) {
      case SRAM_CHAIN:
      case REGFILE_CHAIN:
        if (!signal.empty() && depth > 0) {
          chain_loop[type] = std::max(chain_loop[type], (size_t) depth);
        }
        break;
      default:
        chain_loop[type] = 1;
        break;
    }
  }
  for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
    chain_len[t] /= DAISY_WIDTH;
  }
  file.close();
}

void sample_t::dump_chains(std::ostream& os) {
  for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
    auto chain_signals = signals[t];
    auto chain_widths = widths[t];
    for (size_t id = 0 ; id < chain_signals.size() ; id++) {
      auto signal = chain_signals[id];
      auto width = chain_widths[id];
      os << SIGNALS << " " << t << " " <<
        (signal.empty() ? "null" : signal) << " " << width << std::endl;
    }
  }
  for (size_t id = 0 ; id < IN_TR_SIZE ; id++) {
    os << SIGNALS << " " << IN_TR << " " << IN_TR_NAMES[id] << std::endl;
  }
  for (size_t id = 0 ; id < OUT_TR_SIZE ; id++) {
    os << SIGNALS << " " << OUT_TR << " " << OUT_TR_NAMES[id] << std::endl;
  }
  for (size_t id = 0, bits_id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
    os << SIGNALS << " " << IN_TR_VALID << " " <<
      (const char*)IN_TR_READY_VALID_NAMES[id] << "_valid" << std::endl;
    os << SIGNALS << " " << IN_TR_READY << " " <<
      (const char*)IN_TR_READY_VALID_NAMES[id] << "_ready" << std::endl;
    for (size_t k = 0 ; k < (size_t)IN_TR_BITS_FIELD_NUMS[id] ; k++, bits_id++) {
      os << SIGNALS << " " << IN_TR_BITS << " " <<
        (const char*)IN_TR_BITS_FIELD_NAMES[bits_id] << std::endl;
    }
  }
  for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
    os << SIGNALS << " " << OUT_TR_VALID << " " <<
      (const char*)OUT_TR_READY_VALID_NAMES[id] << "_valid" << std::endl;
    os << SIGNALS << " " << OUT_TR_READY << " " <<
      (const char*)OUT_TR_READY_VALID_NAMES[id] << "_ready" << std::endl;
    for (size_t k = 0 ; k < (size_t)OUT_TR_BITS_FIELD_NUMS[id] ; k++, bits_id++) {
      os << SIGNALS << " " << OUT_TR_BITS << " " <<
        (const char*)OUT_TR_BITS_FIELD_NAMES[bits_id] << std::endl;
    }
  }
}

size_t sample_t::read_chain(CHAIN_TYPE type, const char* snap, size_t start) {
  size_t t = static_cast<size_t>(type);
  auto chain_signals = signals[t];
  auto chain_widths = widths[t];
  auto chain_depths = depths[t];
  for (size_t i = 0 ; i < chain_loop[type] ; i++) {
    for (size_t s = 0 ; s < chain_signals.size() ; s++) {
      auto signal = chain_signals[s];
      auto width = chain_widths[s];
      auto depth = chain_depths[s];
      if (!signal.empty()) {
        char substr[1025];
        assert(width <= 1024);
        strncpy(substr, snap+start, width);
        substr[width] = '\0';
        mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_set_str(*value, substr, 2);
        switch(type) {
          case TRACE_CHAIN:
            add_cmd(new force_t(type, s, value));
            break;
          case REGS_CHAIN:
            add_cmd(new load_t(type, s, value));
            break;
          case SRAM_CHAIN:
          case REGFILE_CHAIN:
            if (static_cast<int>(i) < depth)
              add_cmd(new load_t(type, s, value, i));
            break;
          case CNTR_CHAIN:
            add_cmd(new count_t(type, s, value));
            break;
          default:
            break;
        }
      }
      start += width;
    }
    assert(start % DAISY_WIDTH == 0);
  }
  return start;
}

sample_t::sample_t(const char* snap, uint64_t _cycle):
  cycle(_cycle), force_prev_id(-1) {
  size_t start = 0;
  for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
    CHAIN_TYPE type = static_cast<CHAIN_TYPE>(t);
    start = read_chain(type, snap, start);
  }
}

sample_t::sample_t(CHAIN_TYPE type, const char* snap, uint64_t _cycle):
  cycle(_cycle), force_prev_id(-1) {
  read_chain(type, snap);
}
#endif

sample_t::~sample_t() {
  for (auto& cmd: cmds) delete cmd;
  cmds.clear();
}
@@ -0,0 +1,192 @@
// See LICENSE for license details.

#ifndef __SAMPLE_H
#define __SAMPLE_H

#include <string>
#include <array>
#include <vector>
#include <map>
#include <ostream>
#include <inttypes.h>
#include <gmp.h>

enum SAMPLE_INST_TYPE { SIGNALS, CYCLE, LOAD, FORCE, POKE, STEP, EXPECT, COUNT };
#ifdef ENABLE_SNAPSHOT
enum { IN_TR = CHAIN_NUM,
       OUT_TR,
       IN_TR_VALID,
       IN_TR_READY,
       IN_TR_BITS,
       OUT_TR_VALID,
       OUT_TR_READY,
       OUT_TR_BITS };
#endif

struct sample_inst_t {
  virtual ~sample_inst_t() {}
  virtual std::ostream& dump(std::ostream &os) const = 0;
  friend std::ostream& operator<<(std::ostream &os, const sample_inst_t& cmd) {
    return cmd.dump(os);
  }
};

struct step_t: sample_inst_t {
  step_t(size_t n_): n(n_) { }
  std::ostream& dump(std::ostream &os) const {
    return os << STEP << " " << n << std::endl;
  }
  const size_t n;
};

struct load_t: sample_inst_t {
  load_t(const size_t type, const size_t id, mpz_t* value, const int idx = -1):
    type(type), id(id), value(value), idx(idx) { }
  ~load_t() {
    mpz_clear(*value);
    free(value);
  }
  std::ostream& dump(std::ostream &os) const {
    char* value_str = mpz_get_str(NULL, 16, *value);
    os << LOAD << " " << type << " " << id << " " << value_str << " " << idx << std::endl;
    free(value_str);
    return os;
  }

  const size_t type;
  const size_t id;
  mpz_t* const value;
  const int idx;
};

struct force_t: sample_inst_t {
  force_t(const size_t type, const size_t id, mpz_t* value):
    type(type), id(id), value(value) { }
  ~force_t() {
    mpz_clear(*value);
    free(value);
  }
  std::ostream& dump(std::ostream &os) const {
    char* value_str = mpz_get_str(NULL, 16, *value);
    os << FORCE << " " << type << " " << id << " " << value_str << std::endl;
    free(value_str);
    return os;
  }

  const size_t type;
  const size_t id;
  mpz_t* const value;
};

struct poke_t: sample_inst_t {
  poke_t(const size_t type, const size_t id, mpz_t* value):
    type(type), id(id), value(value) { }
  ~poke_t() {
    mpz_clear(*value);
    free(value);
  }
  std::ostream& dump(std::ostream &os) const {
    char* value_str = mpz_get_str(NULL, 16, *value);
    os << POKE << " " << type << " " << id << " " << value_str << std::endl;
    free(value_str);
    return os;
  }

  const size_t type;
  const size_t id;
  mpz_t* const value;
};

struct expect_t: sample_inst_t {
  expect_t(const size_t type, const size_t id, mpz_t* value):
    type(type), id(id), value(value) { }
  ~expect_t() {
    mpz_clear(*value);
    free(value);
  }
  std::ostream& dump(std::ostream &os) const {
    char* value_str = mpz_get_str(NULL, 16, *value);
    os << EXPECT << " " << type << " " << id << " " << value_str << std::endl;
    free(value_str);
    return os;
  }

  const size_t type;
  const size_t id;
  mpz_t* const value;
};

struct count_t: sample_inst_t {
  count_t(const size_t type, const size_t id, mpz_t* value):
    type(type), id(id), value(value) { }
  ~count_t() {
    mpz_clear(*value);
    free(value);
  }
  std::ostream& dump(std::ostream &os) const {
    char* value_str = mpz_get_str(NULL, 16, *value);
    os << COUNT << " " << type << " " << id << " " << value_str << std::endl;
    free(value_str);
    return os;
  }

  const size_t type;
  const size_t id;
  mpz_t* const value;
};

class sample_t {
public:
  sample_t(uint64_t _cycle): cycle(_cycle) { }
#ifdef ENABLE_SNAPSHOT
  sample_t(const char* snap, uint64_t _cycle);
  sample_t(CHAIN_TYPE type, const char* snap, uint64_t _cycle);

  std::ostream& dump(std::ostream &os) const {
    os << CYCLE << " cycle: " << cycle << std::endl;
    for (size_t i = 0 ; i < cmds.size() ; i++) {
      os << *cmds[i];
    }
    return os;
  }

  friend std::ostream& operator<<(std::ostream& os, const sample_t& s) {
    return s.dump(os);
  }
#endif
  virtual ~sample_t();

  void add_cmd(sample_inst_t *cmd) { cmds.push_back(cmd); }

  inline const uint64_t get_cycle() const { return cycle; }
  inline const std::vector<sample_inst_t*>& get_cmds() const { return cmds; }

#ifdef ENABLE_SNAPSHOT
  size_t read_chain(CHAIN_TYPE type, const char* snap, size_t start = 0);

  static void init_chains(std::string filename);
  static void dump_chains(FILE *file);
  static void dump_chains(std::ostream &os);
  static size_t get_chain_loop(CHAIN_TYPE t) {
    return chain_loop[t];
  }
  static size_t get_chain_len(CHAIN_TYPE t) {
    return chain_len[t];
  }
#endif
private:
  const uint64_t cycle;
  std::vector<sample_inst_t*> cmds;
#ifdef ENABLE_SNAPSHOT
  std::vector<std::vector<force_t*>> force_bins;
  size_t force_bin_idx;
  size_t force_prev_id;
  static size_t chain_loop[CHAIN_NUM];
  static size_t chain_len[CHAIN_NUM];
  static std::array<std::vector<std::string>, CHAIN_NUM> signals;
  static std::array<std::vector<size_t>, CHAIN_NUM> widths;
  static std::array<std::vector<int>, CHAIN_NUM> depths;
#endif
};

#endif // __SAMPLE_H
@@ -0,0 +1,333 @@
// See LICENSE for license details.

#include "simif.h"
#include <fstream>
#include <iostream>
#include <algorithm>

#ifdef ENABLE_SNAPSHOT
void simif_t::init_sampling(int argc, char** argv) {
  // Read mapping files
  sample_t::init_chains(std::string(TARGET_NAME) + ".chain");

  // Init sample variables
  sample_file = std::string(TARGET_NAME) + ".sample";
  sample_num = 30;
  last_sample = NULL;
  last_sample_id = 0;
  profile = false;
  sample_count = 0;
  sample_time = 0;
  sample_cycle = 0;
  snap_cycle = -1ULL;
  tracelen = TRACE_MAX_LEN;
  trace_count = 0;

  std::vector<std::string> args(argv + 1, argv + argc);
  for (auto &arg: args) {
    if (arg.find("+sample=") == 0) {
      sample_file = arg.c_str() + 8;
    }
    if (arg.find("+samplenum=") == 0) {
      sample_num = strtol(arg.c_str() + 11, NULL, 10);
    }
    if (arg.find("+sample-cycle=") == 0) {
      sample_cycle = strtoll(arg.c_str() + 14, NULL, 10);
    }
    if (arg.find("+tracelen=") == 0) {
      tracelen = strtol(arg.c_str() + 10, NULL, 10);
    }
    if (arg.find("+profile") == 0) {
      profile = true;
    }
  }

  assert(tracelen > 2);
  write(TRACELEN_ADDR, tracelen);

#ifdef KEEP_SAMPLES_IN_MEM
  samples = new sample_t*[sample_num];
  for (size_t i = 0 ; i < sample_num ; i++) samples[i] = NULL;
#endif

  // flush output traces by sim reset
  for (size_t k = 0 ; k < OUT_TR_SIZE ; k++) {
    size_t addr = OUT_TR_ADDRS[k];
    size_t chunk = OUT_TR_CHUNKS[k];
    for (size_t off = 0 ; off < chunk ; off++)
      read(addr+off);
  }
  for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
    read((size_t)OUT_TR_READY_ADDRS[id]);
    bits_id = !read((size_t)OUT_TR_VALID_ADDRS[id]) ?
      bits_id + (size_t)OUT_TR_BITS_FIELD_NUMS[id] :
      trace_ready_valid_bits(NULL, false, id, bits_id);
  }
}

void simif_t::finish_sampling() {
  // tail samples
  save_sample();

  // dump samples
  std::ofstream file(sample_file.c_str(), std::ios_base::out | std::ios_base::trunc);
  sample_t::dump_chains(file);
#ifdef KEEP_SAMPLES_IN_MEM
  for (size_t i = 0 ; i < sample_num ; i++) {
    if (samples[i] != NULL) {
      samples[i]->dump(file);
      delete samples[i];
    }
  }
  delete[] samples;
#else
  for (size_t i = 0 ; i < std::min(sample_num, sample_count) ; i++) {
    std::string fname = sample_file + "_" + std::to_string(i);
    std::ifstream f(fname.c_str());
    std::string line;
    while (std::getline(f, line)) {
      file << line << std::endl;
    }
    remove(fname.c_str());
  }
#endif
  file.close();

  fprintf(stderr, "Sample Count: %zu\n", sample_count);
  if (profile) {
    double sim_time = diff_secs(timestamp(), sim_start_time);
    fprintf(stderr, "Sample Time: %.3f s\n", diff_secs(sample_time, 0));
  }
}

static const size_t data_t_chunks = sizeof(data_t) / sizeof(uint32_t);

size_t simif_t::trace_ready_valid_bits(sample_t* sample, bool poke, size_t id, size_t bits_id) {
  size_t bits_addr = poke ? (size_t)IN_TR_BITS_ADDRS[id] : (size_t)OUT_TR_BITS_ADDRS[id];
  size_t bits_chunk = poke ? (size_t)IN_TR_BITS_CHUNKS[id] : (size_t)OUT_TR_BITS_CHUNKS[id];
  size_t num_fields = poke ? (size_t)IN_TR_BITS_FIELD_NUMS[id] : (size_t)OUT_TR_BITS_FIELD_NUMS[id];
  data_t *bits_data = new data_t[bits_chunk];
  for (size_t off = 0 ; off < bits_chunk ; off++) {
    bits_data[off] = read(bits_addr + off);
  }
  if (sample) {
    mpz_t data;
    mpz_init(data);
    mpz_import(data, bits_chunk, -1, sizeof(data_t), 0, 0, bits_data);
    for (size_t k = 0, off = 0 ; k < num_fields ; k++, bits_id++) {
      size_t field_width = ((unsigned int*)(
        poke ? IN_TR_BITS_FIELD_WIDTHS : OUT_TR_BITS_FIELD_WIDTHS))[bits_id];
      mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t)), mask;
      mpz_inits(*value, mask, NULL);
      // value = data >> off
      mpz_fdiv_q_2exp(*value, data, off);
      // mask = (1 << field_width) - 1
      mpz_set_ui(mask, 1);
      mpz_mul_2exp(mask, mask, field_width);
      mpz_sub_ui(mask, mask, 1);
      // *value = *value & mask
      mpz_and(*value, *value, mask);
      mpz_clear(mask);
      sample->add_cmd(poke ?
        (sample_inst_t*) new poke_t(IN_TR_BITS, bits_id, value):
        (sample_inst_t*) new expect_t(OUT_TR_BITS, bits_id, value));
      off += field_width;
    }
    mpz_clear(data);
  }

  delete[] bits_data;
  return bits_id;
}

sample_t* simif_t::read_traces(sample_t *sample) {
  for (size_t i = 0 ; i < std::min(trace_count, tracelen) ; i++) {
    // wire input traces from FPGA
    for (size_t id = 0 ; id < IN_TR_SIZE ; id++) {
      size_t addr = IN_TR_ADDRS[id];
      size_t chunk = IN_TR_CHUNKS[id];
      data_t *data = new data_t[chunk];
      for (size_t off = 0 ; off < chunk ; off++) {
        data[off] = read(addr+off);
      }
      if (sample) {
        mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_import(*value, chunk, -1, sizeof(data_t), 0, 0, data);
        sample->add_cmd(new poke_t(IN_TR, id, value));
      }
      delete[] data;
    }

    // ready valid input traces from FPGA
    for (size_t id = 0, bits_id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
      size_t valid_addr = (size_t)IN_TR_VALID_ADDRS[id];
      data_t valid_data = read(valid_addr);
      if (sample) {
        mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_set_ui(*value, valid_data);
        sample->add_cmd(new poke_t(IN_TR_VALID, id, value));
      }
      bits_id = !valid_data ?
        bits_id + (size_t)IN_TR_BITS_FIELD_NUMS[id] :
        trace_ready_valid_bits(sample, true, id, bits_id);
    }
    for (size_t id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
      size_t ready_addr = (size_t)OUT_TR_READY_ADDRS[id];
      data_t ready_data = read(ready_addr);
      if (sample) {
        mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_set_ui(*value, ready_data);
        sample->add_cmd(new poke_t(OUT_TR_READY, id, value));
      }
    }

    if (sample) sample->add_cmd(new step_t(1));

    // wire output traces from FPGA
    for (size_t id = 0 ; id < OUT_TR_SIZE ; id++) {
      size_t addr = OUT_TR_ADDRS[id];
      size_t chunk = OUT_TR_CHUNKS[id];
      data_t *data = new data_t[chunk];
      for (size_t off = 0 ; off < chunk ; off++) {
        data[off] = read(addr+off);
      }
      if (sample && i > 0) {
        mpz_t *value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_import(*value, chunk, -1, sizeof(data_t), 0, 0, data);
        sample->add_cmd(new expect_t(OUT_TR, id, value));
      }
      delete[] data;
    }

    // ready valid output traces from FPGA
    for (size_t id = 0, bits_id = 0 ; id < OUT_TR_READY_VALID_SIZE ; id++) {
      size_t valid_addr = (size_t)OUT_TR_VALID_ADDRS[id];
      data_t valid_data = read(valid_addr);
      if (sample) {
        mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_set_ui(*value, valid_data);
        sample->add_cmd(new expect_t(OUT_TR_VALID, id, value));
      }
      bits_id = !valid_data ?
        bits_id + (size_t)OUT_TR_BITS_FIELD_NUMS[id] :
        trace_ready_valid_bits(sample, false, id, bits_id);
    }
    for (size_t id = 0 ; id < IN_TR_READY_VALID_SIZE ; id++) {
      size_t ready_addr = (size_t)IN_TR_READY_ADDRS[id];
      data_t ready_data = read(ready_addr);
      if (sample) {
        mpz_t* value = (mpz_t*)malloc(sizeof(mpz_t));
        mpz_init(*value);
        mpz_set_ui(*value, ready_data);
        sample->add_cmd(new expect_t(IN_TR_READY, id, value));
      }
    }
  }

  if (sample && sample_cycle > 0) {
    sample->add_cmd(new step_t(5)); // to catch assertions in replay
  }

  return sample;
}

static inline char* int_to_bin(char *bin, data_t value, size_t size) {
  for (size_t i = 0 ; i < size; i++) {
    bin[i] = ((value >> (size-1-i)) & 0x1) + '0';
  }
  bin[size] = 0;
  return bin;
}

sample_t* simif_t::read_snapshot(bool load) {
  std::ostringstream snap;
  char bin[DAISY_WIDTH+1];
  for (size_t t = 0 ; t < CHAIN_NUM ; t++) {
    CHAIN_TYPE type = static_cast<CHAIN_TYPE>(t);
    const size_t chain_loop = sample_t::get_chain_loop(type);
    const size_t chain_len = sample_t::get_chain_len(type);
    for (size_t k = 0 ; k < chain_loop ; k++) {
      for (size_t i = 0 ; i < CHAIN_SIZE[t] ; i++) {
        switch(type) {
          case SRAM_CHAIN:
            write(SRAM_RESTART_ADDR + i, 1);
            break;
          case REGFILE_CHAIN:
            write(REGFILE_RESTART_ADDR + i, 1);
            break;
          default:
            break;
        }
        for (size_t j = 0 ; j < chain_len ; j++) {
          // TODO: write arbitrary values
          if (load) write(CHAIN_IN_ADDR[t], 0);
          data_t value = read(CHAIN_ADDR[t] + i);
          if (!load) snap << int_to_bin(bin, value, DAISY_WIDTH);
        }
        if (load) write(CHAIN_LOAD_ADDR[t], 1);
      }
    }
  }
  return load ? NULL : new sample_t(snap.str().c_str(), cycles());
}

void simif_t::save_sample() {
  if (last_sample != NULL) {
    sample_t* sample = read_traces(last_sample);
#ifdef KEEP_SAMPLES_IN_MEM
    if (samples[last_sample_id] != NULL)
      delete samples[last_sample_id];
    samples[last_sample_id] = sample;
#else
    std::string filename = sample_file + "_" + std::to_string(last_sample_id);
    std::ofstream file(filename.c_str(), std::ios_base::out | std::ios_base::trunc);
    sample->dump(file);
    delete sample;
    file.close();
#endif
  }
}

void simif_t::reservoir_sampling(size_t n) {
  if (t % tracelen == 0) {
    midas_time_t start_time = 0;
    uint64_t record_id = t / tracelen;
    uint64_t sample_id = record_id < sample_num ? record_id : gen() % (record_id + 1);
    if (sample_id < sample_num) {
      sample_count++;
      if (profile) start_time = timestamp();
      save_sample();
      last_sample = read_snapshot();
      last_sample_id = sample_id;
      trace_count = 0;
      if (profile) sample_time += (timestamp() - start_time);
    }
  }
  if (trace_count < tracelen) trace_count += n;
}

void simif_t::deterministic_sampling(size_t n) {
  if (((t + n) - sample_cycle <= tracelen || sample_cycle <= t) &&
      ((last_sample_id + 1) < sample_num)) {
    sample_count++;
    snap_cycle = t;
    fprintf(stderr, "[id: %u] Snapshot at %llu\n",
      (unsigned)last_sample_id, (unsigned long long)t);
    trace_count = std::min(n, tracelen);
    if (last_sample) {
      save_sample();
    } else {
      // flush trace buffer
      read_traces(NULL);
    }
    trace_count = 0;
    last_sample_id = last_sample ? last_sample_id + 1 : 0;
    last_sample = read_snapshot();
  }
}
#endif
@@ -0,0 +1,224 @@
// See LICENSE for license details.

#include "simif.h"
#include <fstream>
#include <algorithm>

midas_time_t timestamp(){
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return 1000000L * tv.tv_sec + tv.tv_usec;
}

double diff_secs(midas_time_t end, midas_time_t start) {
  return ((double)(end - start)) / TIME_DIV_CONST;
}

simif_t::simif_t() {
  pass = true;
  t = 0;
  fail_t = 0;
  seed = time(NULL); // FIXME: better initial seed?
  SIMULATIONMASTER_0_substruct_create;
  this->master_mmio_addrs = SIMULATIONMASTER_0_substruct;
  LOADMEMWIDGET_0_substruct_create;
  this->loadmem_mmio_addrs = LOADMEMWIDGET_0_substruct;
  PEEKPOKEBRIDGEMODULE_0_substruct_create;
  this->defaultiowidget_mmio_addrs = PEEKPOKEBRIDGEMODULE_0_substruct;
}

void simif_t::init(int argc, char** argv, bool log) {
  // Simulation reset
  write(this->master_mmio_addrs->SIM_RESET, 1);
  while(!done());

  this->log = log;
  std::vector<std::string> args(argv + 1, argv + argc);
  std::string loadmem;
  bool fastloadmem = false;
  for (auto &arg: args) {
    if (arg.find("+fastloadmem") == 0) {
      fastloadmem = true;
    }
    if (arg.find("+loadmem=") == 0) {
      loadmem = arg.c_str() + 9;
    }
    if (arg.find("+seed=") == 0) {
      seed = strtoll(arg.c_str() + 6, NULL, 10);
      fprintf(stderr, "Using custom SEED: %ld\n", seed);
    }
  }
  gen.seed(seed);
  fprintf(stderr, "random min: 0x%llx, random max: 0x%llx\n", gen.min(), gen.max());
  if (!fastloadmem && !loadmem.empty()) {
    load_mem(loadmem.c_str());
  }

#ifdef ENABLE_SNAPSHOT
  init_sampling(argc, argv);
#endif
}

uint64_t simif_t::actual_tcycle() {
  write(this->defaultiowidget_mmio_addrs->tCycle_latch, 1);
  data_t cycle_l = read(this->defaultiowidget_mmio_addrs->tCycle_0);
  data_t cycle_h = read(this->defaultiowidget_mmio_addrs->tCycle_1);
  return (((uint64_t) cycle_h) << 32) | cycle_l;
}

uint64_t simif_t::hcycle() {
  write(this->defaultiowidget_mmio_addrs->hCycle_latch, 1);
  data_t cycle_l = read(this->defaultiowidget_mmio_addrs->hCycle_0);
  data_t cycle_h = read(this->defaultiowidget_mmio_addrs->hCycle_1);
  return (((uint64_t) cycle_h) << 32) | cycle_l;
}

void simif_t::target_reset(int pulse_length) {
  poke(reset, 1);
  take_steps(pulse_length, true);
  poke(reset, 0);
#ifdef ENABLE_SNAPSHOT
  // flush I/O traces by target resets
  trace_count = std::min((size_t)(pulse_length), tracelen);
  read_traces(NULL);
  trace_count = 0;
#endif
}

int simif_t::finish() {
#ifdef ENABLE_SNAPSHOT
  finish_sampling();
#endif

  fprintf(stderr, "Runs %llu cycles\n", actual_tcycle());
  fprintf(stderr, "[%s] %s Test", pass ? "PASS" : "FAIL", TARGET_NAME);
  if (!pass) { fprintf(stdout, " at cycle %llu", fail_t); }
  fprintf(stderr, "\nSEED: %ld\n", seed);

  return pass ? EXIT_SUCCESS : EXIT_FAILURE;
}

static const size_t data_t_chunks = sizeof(data_t) / sizeof(uint32_t);

void simif_t::poke(size_t id, mpz_t& value) {
  if (log) {
    char* v_str = mpz_get_str(NULL, 16, value);
    fprintf(stderr, "* POKE %s.%s <- 0x%s *\n", TARGET_NAME, INPUT_NAMES[id], v_str);
    free(v_str);
  }
  size_t size;
  data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
  for (size_t i = 0 ; i < INPUT_CHUNKS[id] ; i++) {
    write(INPUT_ADDRS[id]+i, i < size ? data[i] : 0);
  }
}

void simif_t::peek(size_t id, mpz_t& value) {
  const size_t size = (const size_t)OUTPUT_CHUNKS[id];
  data_t data[size];
  for (size_t i = 0 ; i < size ; i++) {
    data[i] = read((size_t)OUTPUT_ADDRS[id]+i);
  }
  mpz_import(value, size, -1, sizeof(data_t), 0, 0, data);
  if (log) {
    char* v_str = mpz_get_str(NULL, 16, value);
    fprintf(stderr, "* PEEK %s.%s -> 0x%s *\n", TARGET_NAME, (const char*)OUTPUT_NAMES[id], v_str);
    free(v_str);
  }
}

bool simif_t::expect(size_t id, mpz_t& expected) {
  mpz_t value;
  mpz_init(value);
  peek(id, value);
  bool pass = mpz_cmp(value, expected) == 0;
  if (log) {
    char* v_str = mpz_get_str(NULL, 16, value);
    char* e_str = mpz_get_str(NULL, 16, expected);
    fprintf(stderr, "* EXPECT %s.%s -> 0x%s ?= 0x%s : %s\n",
      TARGET_NAME, (const char*)OUTPUT_NAMES[id], v_str, e_str, pass ? "PASS" : "FAIL");
    free(v_str);
    free(e_str);
  }
  mpz_clear(value);
  return expect(pass, NULL);
}

void simif_t::step(uint32_t n, bool blocking) {
  if (n == 0) return;
#ifdef ENABLE_SNAPSHOT
  reservoir_sampling(n);
#endif
  // take steps
  if (log) fprintf(stderr, "* STEP %d -> %llu *\n", n, (t + n));
  take_steps(n, blocking);
  t += n;
}

void simif_t::load_mem(std::string filename) {
  fprintf(stdout, "[loadmem] start loading\n");
  std::ifstream file(filename.c_str());
  if (!file) {
    fprintf(stderr, "Cannot open %s\n", filename.c_str());
    exit(EXIT_FAILURE);
  }
  const size_t chunk = MEM_DATA_BITS / 4;
  size_t addr = 0;
  std::string line;
  mpz_t data;
  mpz_init(data);
  while (std::getline(file, line)) {
    assert(line.length() % chunk == 0);
    for (int j = line.length() - chunk ; j >= 0 ; j -= chunk) {
      mpz_set_str(data, line.substr(j, chunk).c_str(), 16);
      write_mem(addr, data);
      addr += chunk / 2;
    }
  }
  mpz_clear(data);
  file.close();
  fprintf(stdout, "[loadmem] done\n");
}

// NB: mpz_t variables may not export <size> <data_t> beats, if initialized with an array of zeros.
void simif_t::read_mem(size_t addr, mpz_t& value) {
  write(this->loadmem_mmio_addrs->R_ADDRESS_H, addr >> 32);
  write(this->loadmem_mmio_addrs->R_ADDRESS_L, addr & ((1ULL << 32) - 1));
  const size_t size = MEM_DATA_CHUNK;
  data_t data[size];
  for (size_t i = 0 ; i < size ; i++) {
    data[i] = read(this->loadmem_mmio_addrs->R_DATA);
  }
  mpz_import(value, size, -1, sizeof(data_t), 0, 0, data);
}

void simif_t::write_mem(size_t addr, mpz_t& value) {
  write(this->loadmem_mmio_addrs->W_ADDRESS_H, addr >> 32);
  write(this->loadmem_mmio_addrs->W_ADDRESS_L, addr & ((1ULL << 32) - 1));
  write(this->loadmem_mmio_addrs->W_LENGTH, 1);
  size_t size;
|
||||
data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
|
||||
for (size_t i = 0 ; i < MEM_DATA_CHUNK ; i++) {
|
||||
write(this->loadmem_mmio_addrs->W_DATA, i < size ? data[i] : 0);
|
||||
}
|
||||
}
|
||||
|
||||
#define MEM_DATA_CHUNK_BYTES (MEM_DATA_CHUNK*sizeof(data_t))
|
||||
#define ceil_div(a, b) (((a) - 1) / (b) + 1)
|
||||
|
||||
void simif_t::write_mem_chunk(size_t addr, mpz_t& value, size_t bytes) {
|
||||
write(this->loadmem_mmio_addrs->W_ADDRESS_H, addr >> 32);
|
||||
write(this->loadmem_mmio_addrs->W_ADDRESS_L, addr & ((1ULL << 32) - 1));
|
||||
size_t num_beats = ceil_div(bytes, MEM_DATA_CHUNK_BYTES);
|
||||
write(this->loadmem_mmio_addrs->W_LENGTH, num_beats);
|
||||
size_t size;
|
||||
data_t* data = (data_t*)mpz_export(NULL, &size, -1, sizeof(data_t), 0, 0, value);
|
||||
for (size_t i = 0 ; i < num_beats * MEM_DATA_CHUNK ; i++) {
|
||||
write(this->loadmem_mmio_addrs->W_DATA, i < size ? data[i] : 0);
|
||||
}
|
||||
}
|
||||
|
||||
void simif_t::zero_out_dram() {
|
||||
write(this->loadmem_mmio_addrs->ZERO_OUT_DRAM, 1);
|
||||
while(!read(this->loadmem_mmio_addrs->ZERO_FINISHED));
|
||||
}
|
|
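The chunking loop in `load_mem` above is easy to get backwards: each line of the loadmem file is split into fixed-width hex chunks that are consumed right-to-left, so the rightmost chunk lands at the lowest address. A standalone sketch of just that addressing logic (with `MEM_DATA_BITS` and `write_mem` replaced by a width parameter and an output vector, since those are defined elsewhere in MIDAS):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Split one loadmem line the way simif_t::load_mem does: fixed-width hex
// chunks, consumed right-to-left, each chunk covering chunk/2 bytes.
std::vector<std::pair<uint64_t, std::string>>
parse_loadmem_line(const std::string& line, size_t mem_data_bits, uint64_t base_addr) {
    const size_t chunk = mem_data_bits / 4;  // hex chars per memory word
    assert(line.length() % chunk == 0);
    std::vector<std::pair<uint64_t, std::string>> writes;
    uint64_t addr = base_addr;
    for (long j = (long)(line.length() - chunk); j >= 0; j -= (long)chunk) {
        writes.push_back({addr, line.substr(j, chunk)});
        addr += chunk / 2;  // chunk hex chars encode chunk/2 bytes
    }
    return writes;
}
```

With a 32-bit memory word, the line `"deadbeef01234567"` produces a write of `01234567` at the base address and `deadbeef` four bytes above it.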
@ -0,0 +1,167 @@
// See LICENSE for license details.

#ifndef __SIMIF_H
#define __SIMIF_H

#include <cassert>
#include <cstring>
#include <sstream>
#include <map>
#include <queue>
#include <random>
#ifdef ENABLE_SNAPSHOT
#include "sample/sample.h"
#endif
#include <gmp.h>
#include <sys/time.h>
#define TIME_DIV_CONST 1000000.0
typedef uint64_t midas_time_t;

midas_time_t timestamp();

double diff_secs(midas_time_t end, midas_time_t start);

typedef std::map< std::string, size_t > idmap_t;
typedef std::map< std::string, size_t >::const_iterator idmap_it_t;

class simif_t
{
  public:
    simif_t();
    virtual ~simif_t() { }
  private:
    // simulation information
    bool log;
    bool pass;
    uint64_t t;
    uint64_t fail_t;
    // random numbers
    uint64_t seed;
    std::mt19937_64 gen;
    SIMULATIONMASTER_struct * master_mmio_addrs;
    LOADMEMWIDGET_struct * loadmem_mmio_addrs;
    PEEKPOKEBRIDGEMODULE_struct * defaultiowidget_mmio_addrs;
    midas_time_t sim_start_time;

    inline void take_steps(size_t n, bool blocking) {
      write(this->master_mmio_addrs->STEP, n);
      if (blocking) while(!done());
    }
    virtual void load_mem(std::string filename);

  public:
    // Simulation APIs
    virtual void init(int argc, char** argv, bool log = false);
    virtual int finish();
    virtual void step(uint32_t n, bool blocking = true);
    inline bool done() { return read(this->master_mmio_addrs->DONE); }

    // Widget communication
    virtual void write(size_t addr, data_t data) = 0;
    virtual data_t read(size_t addr) = 0;
    virtual ssize_t pull(size_t addr, char *data, size_t size) = 0;
    virtual ssize_t push(size_t addr, char *data, size_t size) = 0;

    inline void poke(size_t id, data_t value) {
      if (log) fprintf(stderr, "* POKE %s.%s <- 0x%x *\n",
                       TARGET_NAME, INPUT_NAMES[id], value);
      write(INPUT_ADDRS[id], value);
    }

    inline data_t peek(size_t id) {
      data_t value = read(((unsigned int*)OUTPUT_ADDRS)[id]);
      if (log) fprintf(stderr, "* PEEK %s.%s -> 0x%x *\n",
                       TARGET_NAME, (const char*)OUTPUT_NAMES[id], value);
      return value;
    }

    inline bool expect(size_t id, data_t expected) {
      data_t value = peek(id);
      bool pass = value == expected;
      if (log) fprintf(stderr, "* EXPECT %s.%s -> 0x%x ?= 0x%x : %s\n",
                       TARGET_NAME, (const char*)OUTPUT_NAMES[id], value, expected, pass ? "PASS" : "FAIL");
      return expect(pass, NULL);
    }

    inline bool expect(bool pass, const char *s) {
      if (log && s) fprintf(stderr, "* %s : %s *\n", s, pass ? "PASS" : "FAIL");
      if (this->pass && !pass) fail_t = t;
      this->pass &= pass;
      return pass;
    }

    void poke(size_t id, mpz_t& value);
    void peek(size_t id, mpz_t& value);
    bool expect(size_t id, mpz_t& expected);

    // LOADMEM functions
    void read_mem(size_t addr, mpz_t& value);
    void write_mem(size_t addr, mpz_t& value);
    void write_mem_chunk(size_t addr, mpz_t& value, size_t bytes);
    void zero_out_dram();

    uint64_t get_seed() { return seed; };

    // A default reset scheme that holds reset high for pulse_length cycles
    void target_reset(int pulse_length = 5);

    // Returns an upper bound for the cycle reached by the target
    // If using blocking steps, this will be ~equivalent to actual_tcycle()
    uint64_t cycles(){ return t; };
    // Returns the current target cycle as measured by a hardware counter in the DefaultIOWidget
    // (# of reset tokens generated)
    uint64_t actual_tcycle();
    // Returns the current host cycle as measured by a hardware counter
    uint64_t hcycle();
    uint64_t rand_next(uint64_t limit) { return gen() % limit; }

#ifdef ENABLE_SNAPSHOT
  private:
    // sample information
#ifdef KEEP_SAMPLES_IN_MEM
    sample_t** samples;
#endif
    sample_t* last_sample;
    size_t sample_num;
    size_t last_sample_id;
    std::string sample_file;
    uint64_t sample_cycle;
    uint64_t snap_cycle;

    size_t trace_count;

    // profile information
    bool profile;
    size_t sample_count;
    midas_time_t sample_time;

    void init_sampling(int argc, char** argv);
    void finish_sampling();
    void reservoir_sampling(size_t n);
    void deterministic_sampling(size_t n);
    size_t trace_ready_valid_bits(
      sample_t* sample, bool poke, size_t id, size_t bits_id);
    inline void save_sample();

  protected:
    size_t tracelen;
    sample_t* read_snapshot(bool load = false);
    sample_t* read_traces(sample_t* s);

  public:
    uint64_t get_snap_cycle() const {
      return snap_cycle;
    }
    uint64_t get_sample_cycle() const {
      return sample_cycle;
    }
    void set_sample_cycle(uint64_t cycle) {
      sample_cycle = cycle;
    }
    void set_trace_count(uint64_t count) {
      trace_count = count;
    }
#endif
};

#endif // __SIMIF_H
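The `expect(bool, const char*)` overload above carries the simulator's pass/fail bookkeeping: `pass` only ever transitions from true to false, and `fail_t` latches the cycle of the *first* failure. A minimal sketch of just that bookkeeping, pulled out of `simif_t` (member names mirror `simif.h`; everything else is omitted):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of simif_t's expect() bookkeeping: pass is sticky-false, and
// fail_t records the cycle count at the first failing expectation only.
class expect_tracker {
public:
    bool pass = true;
    uint64_t t = 0;       // current target cycle
    uint64_t fail_t = 0;  // cycle of the first failure

    bool expect(bool ok) {
        if (pass && !ok) fail_t = t;  // latch only the first failure
        pass &= ok;
        return ok;
    }
};
```

Later failures update neither `pass` (already false) nor `fail_t`, which is why `finish()` can report "FAIL ... at cycle fail_t" for the earliest mismatch.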
@ -0,0 +1,201 @@
// See LICENSE for license details.

#include "simif_emul.h"
#ifdef VCS
#include "midas_context.h"
#include "emul/vcs_main.h"
#else
#include <verilated.h>
#if VM_TRACE
#include <verilated_vcd_c.h>
#endif
#endif
#include <signal.h>

uint64_t main_time = 0;
std::unique_ptr<mmio_t> master;
std::unique_ptr<mmio_t> dma;

#ifdef VCS
midas_context_t* host;
midas_context_t target;
bool vcs_rst = false;
bool vcs_fin = false;
#else
PLATFORM_TYPE* top = NULL;
#if VM_TRACE
VerilatedVcdC* tfp = NULL;
#endif // VM_TRACE
double sc_time_stamp() {
  return (double) main_time;
}
extern void tick();
#endif // VCS

void finish() {
#ifdef VCS
  vcs_fin = true;
  target.switch_to();
#else
#if VM_TRACE
  if (tfp) tfp->close();
  delete tfp;
#endif // VM_TRACE
#endif // VCS
}

void handle_sigterm(int sig) {
  finish();
}

simif_emul_t::~simif_emul_t() { }

void simif_emul_t::init(int argc, char** argv, bool log) {
  // Parse args
  std::vector<std::string> args(argv + 1, argv + argc);
  std::string waveform = "dump.vcd";
  std::string loadmem;
  bool fastloadmem = false;
  bool dramsim = false;
  uint64_t memsize = 1L << MEM_ADDR_BITS;
  for (auto arg: args) {
    if (arg.find("+waveform=") == 0) {
      waveform = arg.c_str() + 10;
    }
    if (arg.find("+loadmem=") == 0) {
      loadmem = arg.c_str() + 9;
    }
    if (arg.find("+fastloadmem") == 0) {
      fastloadmem = true;
    }
    if (arg.find("+dramsim") == 0) {
      dramsim = true;
    }
    if (arg.find("+memsize=") == 0) {
      memsize = strtoll(arg.c_str() + 9, NULL, 10);
    }
    if (arg.find("+fuzz-host-timing=") == 0) {
      maximum_host_delay = atoi(arg.c_str() + 18);
    }
  }

  void* mems[1];
  mems[0] = ::init(memsize, dramsim);
  if (mems[0] && fastloadmem && !loadmem.empty()) {
    fprintf(stdout, "[fast loadmem] %s\n", loadmem.c_str());
    ::load_mem(mems, loadmem.c_str(), MEM_DATA_BITS / 8, 1);
  }

  signal(SIGTERM, handle_sigterm);
#ifdef VCS
  host = midas_context_t::current();
  target_args_t *targs = new target_args_t(argc, argv);
  target.init(target_thread, targs);
  vcs_rst = true;
  for (size_t i = 0 ; i < 10 ; i++)
    target.switch_to();
  vcs_rst = false;
#else
  Verilated::commandArgs(argc, argv); // Remember args

  top = new PLATFORM_TYPE;
#if VM_TRACE // If emul was invoked with --trace
  tfp = new VerilatedVcdC;
  Verilated::traceEverOn(true); // Verilator must compute traced signals
  VL_PRINTF("Enabling waves: %s\n", waveform.c_str());
  top->trace(tfp, 99); // Trace 99 levels of hierarchy
  tfp->open(waveform.c_str()); // Open the dump file
#endif // VM_TRACE

  top->reset = 1;
  for (size_t i = 0 ; i < 10 ; i++) ::tick();
  top->reset = 0;
#endif

  simif_t::init(argc, argv, log);
}

int simif_emul_t::finish() {
  int exitcode = simif_t::finish();
  ::finish();
  return exitcode;
}

void simif_emul_t::advance_target() {
  int cycles_to_wait = rand_next(maximum_host_delay) + 1;
  for (int i = 0; i < cycles_to_wait; i++) {
#ifdef VCS
    target.switch_to();
#else
    ::tick();
#endif
  }
}

void simif_emul_t::wait_write(std::unique_ptr<mmio_t>& mmio) {
  while(!mmio->write_resp()) advance_target();
}

void simif_emul_t::wait_read(std::unique_ptr<mmio_t>& mmio, void *data) {
  while(!mmio->read_resp(data)) advance_target();
}

void simif_emul_t::write(size_t addr, data_t data) {
  size_t strb = (1 << CTRL_STRB_BITS) - 1;
  master->write_req(addr << CHANNEL_SIZE, CHANNEL_SIZE, 0, &data, &strb);
  wait_write(master);
}

data_t simif_emul_t::read(size_t addr) {
  data_t data;
  master->read_req(addr << CHANNEL_SIZE, CHANNEL_SIZE, 0);
  wait_read(master, &data);
  return data;
}

#define MAX_LEN 255

ssize_t simif_emul_t::pull(size_t addr, char* data, size_t size) {
  ssize_t len = (size - 1) / DMA_WIDTH;

  while (len >= 0) {
    size_t part_len = len % (MAX_LEN + 1);

    dma->read_req(addr, DMA_SIZE, part_len);
    wait_read(dma, data);

    len -= (part_len + 1);
    addr += (part_len + 1) * DMA_WIDTH;
    data += (part_len + 1) * DMA_WIDTH;
  }
  return size;
}

ssize_t simif_emul_t::push(size_t addr, char *data, size_t size) {
  ssize_t len = (size - 1) / DMA_WIDTH;
  size_t remaining = size - len * DMA_WIDTH;
  size_t strb[len + 1];
  size_t *strb_ptr = &strb[0];

  for (int i = 0; i < len; i++)
    strb[i] = (1LL << DMA_WIDTH) - 1;

  // Note: set the last strobe directly rather than copying strb[0], which is
  // uninitialized when the transfer fits in a single beat (len == 0)
  if (remaining == DMA_WIDTH)
    strb[len] = (1LL << DMA_WIDTH) - 1;
  else
    strb[len] = (1LL << remaining) - 1;

  while (len >= 0) {
    size_t part_len = len % (MAX_LEN + 1);

    dma->write_req(addr, DMA_SIZE, part_len, data, strb_ptr);
    wait_write(dma);

    len -= (part_len + 1);
    addr += (part_len + 1) * DMA_WIDTH;
    data += (part_len + 1) * DMA_WIDTH;
    strb_ptr += (part_len + 1);
  }

  return size;
}
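The strobe setup in `push()` is the subtle part: a transfer of `size` bytes over beats of `DMA_WIDTH` bytes gets an all-ones strobe for every full beat, and a partial strobe masking only the valid byte lanes on the last beat. A self-contained sketch of just that computation (with `dma_width` as a parameter standing in for `DMA_WIDTH`, and kept below 64 so the shifts stay well-defined):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-beat write strobes as computed in simif_emul_t::push(): one entry per
// beat, all-ones except the final beat, which masks off unused byte lanes.
std::vector<uint64_t> make_strobes(size_t size, size_t dma_width) {
    size_t len = (size - 1) / dma_width;        // index of the last beat
    size_t remaining = size - len * dma_width;  // valid bytes in the last beat
    std::vector<uint64_t> strb(len + 1, (1ULL << dma_width) - 1);
    if (remaining != dma_width)
        strb[len] = (1ULL << remaining) - 1;    // partial final beat
    return strb;
}
```

For example, 10 bytes over 4-byte beats yields strobes `{0xF, 0xF, 0x3}`: two full beats, then two valid bytes on the third.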
@ -0,0 +1,38 @@
// See LICENSE for license details.

#ifndef __SIMIF_EMUL_H
#define __SIMIF_EMUL_H

#include <memory>

#include "simif.h"
#include "mm.h"
#include "mm_dramsim2.h"
#include "emul/mmio.h"

// simif_emul_t is a concrete simif_t implementation for software RTL
// simulators, and the basis for MIDAS-level simulation
class simif_emul_t : public virtual simif_t
{
  public:
    simif_emul_t() { }
    virtual ~simif_emul_t();
    virtual void init(int argc, char** argv, bool log = false);
    virtual int finish();

    virtual void write(size_t addr, data_t data);
    virtual data_t read(size_t addr);
    virtual ssize_t pull(size_t addr, char* data, size_t size);
    virtual ssize_t push(size_t addr, char* data, size_t size);

  private:
    // The maximum number of cycles the RTL simulator can advance before
    // switching back to the driver process. +fuzz-host-timing= sets this to a
    // value > 1, introducing random delays in MMIO (read, write) and DMA
    // (push, pull) requests
    int maximum_host_delay = 1;
    void advance_target();
    void wait_read(std::unique_ptr<mmio_t>& mmio, void *data);
    void wait_write(std::unique_ptr<mmio_t>& mmio);
};

#endif // __SIMIF_EMUL_H
@ -0,0 +1,212 @@
#include "simif_f1.h"
#include <cassert>

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

simif_f1_t::simif_f1_t(int argc, char** argv) {
#ifdef SIMULATION_XSIM
  mkfifo(driver_to_xsim, 0666);
  fprintf(stderr, "opening driver to xsim\n");
  driver_to_xsim_fd = open(driver_to_xsim, O_WRONLY);
  fprintf(stderr, "opening xsim to driver\n");
  xsim_to_driver_fd = open(xsim_to_driver, O_RDONLY);
#else
  slot_id = -1;
  std::vector<std::string> args(argv + 1, argv + argc);
  for (auto &arg: args) {
    if (arg.find("+slotid=") == 0) {
      slot_id = atoi((arg.c_str()) + 8);
    }
  }
  if (slot_id == -1) {
    fprintf(stderr, "Slot ID not specified. Assuming Slot 0\n");
    slot_id = 0;
  }
  fpga_setup(slot_id);
#endif
}

void simif_f1_t::check_rc(int rc, char * infostr) {
#ifndef SIMULATION_XSIM
  if (rc) {
    if (infostr) {
      fprintf(stderr, "%s\n", infostr);
    }
    fprintf(stderr, "INVALID RETCODE: %d\n", rc);
    fpga_shutdown();
    exit(1);
  }
#endif
}

void simif_f1_t::fpga_shutdown() {
#ifndef SIMULATION_XSIM
  int rc = fpga_pci_detach(pci_bar_handle);
  // don't call check_rc here, since it would call fpga_shutdown again; report manually:
  if (rc) {
    fprintf(stderr, "Failure while detaching from the fpga: %d\n", rc);
  }
  close(edma_write_fd);
  close(edma_read_fd);
#endif
}

void simif_f1_t::fpga_setup(int slot_id) {
#ifndef SIMULATION_XSIM
  /*
   * The pci_vendor_id and pci_device_id values below are Amazon's and available
   * to use for a given FPGA slot.
   * Users may replace these with their own if allocated to them by PCI SIG.
   */
  uint16_t pci_vendor_id = 0x1D0F; /* Amazon PCI Vendor ID */
  uint16_t pci_device_id = 0xF000; /* PCI Device ID preassigned by Amazon for F1 applications */

  int rc = fpga_pci_init();
  check_rc(rc, "fpga_pci_init FAILED");

  /* check AFI status */
  struct fpga_mgmt_image_info info = {0};

  /* get local image description, contains status, vendor id, and device id. */
  rc = fpga_mgmt_describe_local_image(slot_id, &info, 0);
  check_rc(rc, "Unable to get AFI information from slot. Are you running as root?");

  /* check to see if the slot is ready */
  if (info.status != FPGA_STATUS_LOADED) {
    rc = 1;
    check_rc(rc, "AFI in Slot is not in READY state !");
  }

  fprintf(stderr, "AFI PCI Vendor ID: 0x%x, Device ID 0x%x\n",
          info.spec.map[FPGA_APP_PF].vendor_id,
          info.spec.map[FPGA_APP_PF].device_id);

  /* confirm that the AFI that we expect is in fact loaded */
  if (info.spec.map[FPGA_APP_PF].vendor_id != pci_vendor_id ||
      info.spec.map[FPGA_APP_PF].device_id != pci_device_id) {
    fprintf(stderr, "AFI does not show expected PCI vendor id and device ID. If the AFI "
            "was just loaded, it might need a rescan. Rescanning now.\n");

    rc = fpga_pci_rescan_slot_app_pfs(slot_id);
    check_rc(rc, "Unable to update PF for slot");
    /* get local image description, contains status, vendor id, and device id. */
    rc = fpga_mgmt_describe_local_image(slot_id, &info, 0);
    check_rc(rc, "Unable to get AFI information from slot");

    fprintf(stderr, "AFI PCI Vendor ID: 0x%x, Device ID 0x%x\n",
            info.spec.map[FPGA_APP_PF].vendor_id,
            info.spec.map[FPGA_APP_PF].device_id);

    /* confirm that the AFI that we expect is in fact loaded after rescan */
    if (info.spec.map[FPGA_APP_PF].vendor_id != pci_vendor_id ||
        info.spec.map[FPGA_APP_PF].device_id != pci_device_id) {
      rc = 1;
      check_rc(rc, "The PCI vendor id and device of the loaded AFI are not "
               "the expected values.");
    }
  }

  /* attach to BAR0 */
  pci_bar_handle = PCI_BAR_HANDLE_INIT;
  rc = fpga_pci_attach(slot_id, FPGA_APP_PF, APP_PF_BAR0, 0, &pci_bar_handle);
  check_rc(rc, "fpga_pci_attach FAILED");

  // EDMA setup
  char device_file_name[256];
  char device_file_name2[256];

  sprintf(device_file_name, "/dev/xdma%d_h2c_0", slot_id);
  printf("Using xdma write queue: %s\n", device_file_name);
  sprintf(device_file_name2, "/dev/xdma%d_c2h_0", slot_id);
  printf("Using xdma read queue: %s\n", device_file_name2);

  edma_write_fd = open(device_file_name, O_WRONLY);
  edma_read_fd = open(device_file_name2, O_RDONLY);
  assert(edma_write_fd >= 0);
  assert(edma_read_fd >= 0);
#endif
}

simif_f1_t::~simif_f1_t() {
  fpga_shutdown();
}

void simif_f1_t::write(size_t addr, uint32_t data) {
  // addr is really a (32-bit) word address because of the zynq implementation
  addr <<= 2;
#ifdef SIMULATION_XSIM
  uint64_t cmd = (((uint64_t)(0x80000000 | addr)) << 32) | (uint64_t)data;
  char * buf = (char*)&cmd;
  ::write(driver_to_xsim_fd, buf, 8);
#else
  int rc = fpga_pci_poke(pci_bar_handle, addr, data);
  check_rc(rc, NULL);
#endif
}

uint32_t simif_f1_t::read(size_t addr) {
  addr <<= 2;
#ifdef SIMULATION_XSIM
  uint64_t cmd = addr;
  char * buf = (char*)&cmd;
  ::write(driver_to_xsim_fd, buf, 8);

  int gotdata = 0;
  while (gotdata == 0) {
    gotdata = ::read(xsim_to_driver_fd, buf, 8);
    if (gotdata != 0 && gotdata != 8) {
      printf("ERR GOTDATA %d\n", gotdata);
    }
  }
  return *((uint64_t*)buf);
#else
  uint32_t value;
  int rc = fpga_pci_peek(pci_bar_handle, addr, &value);
  check_rc(rc, NULL);
  return value & 0xFFFFFFFF;
#endif
}

ssize_t simif_f1_t::pull(size_t addr, char* data, size_t size) {
#ifdef SIMULATION_XSIM
  return -1; // TODO
#else
  return ::pread(edma_read_fd, data, size, addr);
#endif
}

ssize_t simif_f1_t::push(size_t addr, char* data, size_t size) {
#ifdef SIMULATION_XSIM
  return -1; // TODO
#else
  return ::pwrite(edma_write_fd, data, size, addr);
#endif
}

uint32_t simif_f1_t::is_write_ready() {
  uint64_t addr = 0x4;
#ifdef SIMULATION_XSIM
  uint64_t cmd = addr;
  char * buf = (char*)&cmd;
  ::write(driver_to_xsim_fd, buf, 8);

  int gotdata = 0;
  while (gotdata == 0) {
    gotdata = ::read(xsim_to_driver_fd, buf, 8);
    if (gotdata != 0 && gotdata != 8) {
      printf("ERR GOTDATA %d\n", gotdata);
    }
  }
  return *((uint64_t*)buf);
#else
  uint32_t value;
  int rc = fpga_pci_peek(pci_bar_handle, addr, &value);
  check_rc(rc, NULL);
  return value & 0xFFFFFFFF;
#endif
}
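In the `SIMULATION_XSIM` path above, every MMIO transaction is serialized into a single 8-byte command on the FIFO: the upper 32 bits carry the byte address with bit 31 set for writes, and the lower 32 bits carry the write data (zero for reads). A sketch of that packing, factored out of `simif_f1_t::write`/`read` (function names here are illustrative, not part of the driver):

```cpp
#include <cassert>
#include <cstdint>

// 8-byte xsim FIFO command, as built in simif_f1_t::write():
//   [63]    write flag (0x80000000 in the high word)
//   [62:32] byte address
//   [31:0]  write data
uint64_t encode_mmio_write(uint32_t addr, uint32_t data) {
    return (((uint64_t)(0x80000000u | addr)) << 32) | (uint64_t)data;
}

// Reads send only the address; the write flag and data bits stay clear.
uint64_t encode_mmio_read(uint32_t addr) {
    return (uint64_t)addr;
}
```

Note the driver shifts the register index left by 2 before encoding, so the address field is already a byte address when it reaches this packing step.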
@ -0,0 +1,41 @@
#ifndef __SIMIF_F1_H
#define __SIMIF_F1_H

#include "simif.h" // from midas

#ifndef SIMULATION_XSIM
#include <fpga_pci.h>
#include <fpga_mgmt.h>
#endif

class simif_f1_t: public virtual simif_t
{
  public:
    simif_f1_t(int argc, char** argv);
    virtual ~simif_f1_t();
    virtual void write(size_t addr, uint32_t data);
    virtual uint32_t read(size_t addr);
    virtual ssize_t pull(size_t addr, char* data, size_t size);
    virtual ssize_t push(size_t addr, char* data, size_t size);
    uint32_t is_write_ready();
    void check_rc(int rc, char * infostr);
    void fpga_shutdown();
    void fpga_setup(int slot_id);
  private:
    char in_buf[MMIO_WIDTH];
    char out_buf[MMIO_WIDTH];
#ifdef SIMULATION_XSIM
    const char * driver_to_xsim = "/tmp/driver_to_xsim";
    const char * xsim_to_driver = "/tmp/xsim_to_driver";
    int driver_to_xsim_fd;
    int xsim_to_driver_fd;
#else
    // int rc;
    int slot_id;
    int edma_write_fd;
    int edma_read_fd;
    pci_bar_handle_t pci_bar_handle;
#endif
};

#endif // __SIMIF_F1_H
@ -0,0 +1,33 @@
// See LICENSE for license details.

#include "simif_zynq.h"
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define read_reg(r) (dev_vaddr[r])
#define write_reg(r, v) (dev_vaddr[r] = v)

simif_zynq_t::simif_zynq_t() {
  int fd = open("/dev/mem", O_RDWR|O_SYNC);
  assert(fd != -1);

  int host_prot = PROT_READ | PROT_WRITE;
  int flags = MAP_SHARED;
  uintptr_t pgsize = sysconf(_SC_PAGESIZE);
  assert(dev_paddr % pgsize == 0);

  dev_vaddr = (uintptr_t*)mmap(0, pgsize, host_prot, flags, fd, dev_paddr);
  assert(dev_vaddr != MAP_FAILED);
}

void simif_zynq_t::write(size_t addr, uint32_t data) {
  write_reg(addr, data);
  __sync_synchronize();
}

uint32_t simif_zynq_t::read(size_t addr) {
  __sync_synchronize();
  return read_reg(addr);
}
@ -0,0 +1,27 @@
// See LICENSE for license details.

#ifndef __SIMIF_ZYNQ_H
#define __SIMIF_ZYNQ_H

#include "simif.h"

class simif_zynq_t: public virtual simif_t
{
  public:
    simif_zynq_t();
    virtual ~simif_zynq_t() { }

  private:
    volatile uintptr_t* dev_vaddr;
    const static uintptr_t dev_paddr = 0x43C00000;

  protected:
    virtual void write(size_t addr, uint32_t data);
    virtual uint32_t read(size_t addr);
    virtual size_t pread(size_t addr, char* data, size_t size) {
      // Not supported
      return 0;
    }
};

#endif // __SIMIF_ZYNQ_H
@ -0,0 +1,83 @@
# See LICENSE for license details.
#
# Makefrag for generating MIDAS's synthesizable unit tests

# Compulsory arguments:
# ROCKETCHIP_DIR: Location of rocket chip source -- to grab verilog sources and simulation makefrags
# TODO: These are provided as resources -- fix.
# SBT: command to invoke sbt
# GEN_DIR: Directory into which to emit generated verilog

DESIGN := TestHarness
CONFIG ?= AllUnitTests
OUT_DIR ?= $(GEN_DIR)
TB ?= TestDriver
EMUL ?= vcs
CLOCK_PERIOD ?= 1.0

MAKEFRAG_DIR:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
sim_makefrag_dir := $(MAKEFRAG_DIR)/../rtlsim

vsrc := $(ROCKETCHIP_DIR)/src/main/resources/vsrc
csrc := $(ROCKETCHIP_DIR)/src/main/resources/csrc

# Stupidly guess what this test might depend on
src_path = src/main/scala
scala_srcs := $(shell find $(BASE_DIR) -name "*.scala")

$(GEN_DIR)/$(DESIGN).v $(GEN_DIR)/$(DESIGN).behav_srams.v: $(scala_srcs)
	mkdir -p $(@D)
	cd $(BASE_DIR) && $(SBT) "runMain midas.unittest.Generator -td $(GEN_DIR) -conf $(CONFIG)"
	touch $(GEN_DIR)/$(DESIGN).behav_srams.v

verilog: $(GEN_DIR)/$(DESIGN).v

# Common SW RTL simulation Makefrag arguments
# These aren't required as yet, but will be in the future
#bb_vsrcs = \
#	$(vsrc)/ClockDivider2.v \
#	$(vsrc)/ClockDivider3.v \
#	$(vsrc)/AsyncResetReg.v \
#
#sim_vsrcs = \
#	$(bb_vsrcs)

emul_v := $(GEN_DIR)/$(DESIGN).v #$(sim_vsrcs)
emul_h :=
emul_cc :=

# VCS Makefrag arguments
ifeq ($(EMUL),vcs)
vcs_wrapper_v := $(vsrc)/TestDriver.v
VCS_FLAGS = +verbose
include $(sim_makefrag_dir)/Makefrag-vcs

vcs = $(OUT_DIR)/$(DESIGN)
vcs_debug = $(OUT_DIR)/$(DESIGN)-debug

vcs: $(vcs)
vcs-debug: $(vcs_debug)
else

# Verilator Makefrag arguments
top_module := TestHarness
override CFLAGS += -I$(csrc) -include $(csrc)/verilator.h -DTEST_HARNESS=V$(top_module) -std=c++11
override emul_cc += $(sim_makefrag_dir)/generic_vharness.cc

include $(sim_makefrag_dir)/Makefrag-verilator

verilator = $(OUT_DIR)/V$(DESIGN)
verilator_debug = $(OUT_DIR)/V$(DESIGN)-debug

verilator: $(verilator)
verilator-debug: $(verilator_debug)
endif

# Run recipes
run-midas-unittests: $($(EMUL))
	cd $(GEN_DIR) && $<

run-midas-unittests-debug: $($(EMUL)_debug)
	cd $(GEN_DIR) && $<

.PHONY: run-midas-unittests run-midas-unittests-debug verilog
@ -0,0 +1,2 @@
*.o
*.a
@ -0,0 +1,72 @@
// See LICENSE for license details.

#include "midas_context.h"
#include <stdlib.h>
#include <cassert>

static __thread midas_context_t* cur = NULL;

midas_context_t::midas_context_t()
  : creator(NULL), func(NULL), arg(NULL),
    mutex(PTHREAD_MUTEX_INITIALIZER),
    cond(PTHREAD_COND_INITIALIZER), flag(0)
{
}

midas_context_t* midas_context_t::current()
{
  if (cur == NULL)
  {
    cur = new midas_context_t;
    cur->thread = pthread_self();
    cur->flag = 1;
  }
  return cur;
}

void* midas_context_t::wrapper(void* a)
{
  midas_context_t* ctx = static_cast<midas_context_t*>(a);
  cur = ctx;
  ctx->creator->switch_to();

  ctx->func(ctx->arg);
  return NULL;
}

void midas_context_t::init(int (*f)(void*), void* a)
{
  func = f;
  arg = a;
  creator = current();

  assert(flag == 0);

  pthread_mutex_lock(&creator->mutex);
  creator->flag = 0;
  if (pthread_create(&thread, NULL, &midas_context_t::wrapper, this) != 0)
    abort();
  pthread_detach(thread);
  while (!creator->flag)
    pthread_cond_wait(&creator->cond, &creator->mutex);
  pthread_mutex_unlock(&creator->mutex);
}

midas_context_t::~midas_context_t()
{
  assert(this != cur);
}

void midas_context_t::switch_to()
{
  assert(this != cur);
  cur->flag = 0;
  this->flag = 1;
  pthread_mutex_lock(&this->mutex);
  pthread_cond_signal(&this->cond);
  pthread_mutex_unlock(&this->mutex);
  pthread_mutex_lock(&cur->mutex);
  while (!cur->flag)
    pthread_cond_wait(&cur->cond, &cur->mutex);
  pthread_mutex_unlock(&cur->mutex);
}
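`midas_context_t::switch_to()` above implements cooperative scheduling between the driver and the VCS target thread: each context owns a flag guarded by a mutex/condition-variable pair, the switcher raises the other side's flag and then blocks until its own flag comes back up, so exactly one context runs at a time. The same handshake, sketched with C++11 primitives instead of raw pthreads (a standalone analogue, not the class itself):

```cpp
#include <condition_variable>
#include <mutex>

// One cooperative context: a runnable flag plus its guard, mirroring the
// mutex/cond/flag triple in midas_context_t.
struct context {
    std::mutex m;
    std::condition_variable cv;
    bool flag = false;
};

// Hand control from `self` to `other`, then block until control returns --
// the shape of midas_context_t::switch_to().
void switch_to(context& self, context& other) {
    {
        std::lock_guard<std::mutex> lk(other.m);
        other.flag = true;            // mark the other context runnable
        other.cv.notify_one();
    }
    std::unique_lock<std::mutex> lk(self.m);
    self.cv.wait(lk, [&] { return self.flag; });
    self.flag = false;                // flag consumed; we run again
}
```

Because each side clears its own flag only after waking, a spurious wakeup cannot let both contexts run concurrently, which is what makes the VCS co-simulation deterministic.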
@ -0,0 +1,28 @@
// See LICENSE for license details.

#ifndef __CONTEXT_H
#define __CONTEXT_H

#include <pthread.h>

class midas_context_t
{
  public:
    midas_context_t();
    ~midas_context_t();
    void init(int (*func)(void*), void* arg);
    void switch_to();
    static midas_context_t* current();
  private:
    midas_context_t* creator;
    int (*func)(void*);
    void* arg;

    pthread_t thread;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    volatile int flag;
    static void* wrapper(void*);
};

#endif // __CONTEXT_H
@@ -0,0 +1,163 @@
// See LICENSE for license details.

#include "mm.h"
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cstring>
#include <string>
#include <cassert>

void mm_base_t::write(uint64_t addr, uint8_t *data) {
  addr %= this->size;

  uint8_t* base = this->data + addr;
  memcpy(base, data, word_size);
}

void mm_base_t::write(uint64_t addr, uint8_t *data, uint64_t strb, uint64_t size)
{
  if (addr > this->size) {
    char buf[80];
    snprintf(buf, 80, "Out-of-bounds write @ address: 0x%lx Memory size: 0x%lx\n", addr, this->size);
    throw(mm_exception(buf));
  }

  strb &= ((1L << size) - 1) << (addr % word_size);
  uint8_t *base = this->data + (addr / word_size) * word_size;
  for (int i = 0; i < word_size; i++) {
    if (strb & 1)
      base[i] = data[i];
    strb >>= 1;
  }
}

std::vector<char> mm_base_t::read(uint64_t addr)
{
  if (addr > this->size) {
    char buf[80];
    snprintf(buf, 80, "Out-of-bounds read @ address: 0x%lx Memory size: 0x%lx\n", addr, this->size);
    throw(mm_exception(buf));
  }
  uint8_t *base = this->data + addr;
  return std::vector<char>(base, base + word_size);
}

void mm_base_t::init(size_t sz, int wsz, int lsz)
{
  assert(wsz > 0 && lsz > 0 && (lsz & (lsz-1)) == 0 && lsz % wsz == 0);
  word_size = wsz;
  line_size = lsz;
  data = new uint8_t[sz];
  size = sz;
}

mm_base_t::~mm_base_t()
{
  delete [] data;
}

void mm_magic_t::init(size_t sz, int wsz, int lsz)
{
  mm_t::init(sz, wsz, lsz);
  dummy_data.resize(word_size);
}

void mm_magic_t::tick(
  bool reset,
  bool ar_valid,
  uint64_t ar_addr,
  uint64_t ar_id,
  uint64_t ar_size,
  uint64_t ar_len,

  bool aw_valid,
  uint64_t aw_addr,
  uint64_t aw_id,
  uint64_t aw_size,
  uint64_t aw_len,

  bool w_valid,
  uint64_t w_strb,
  void *w_data,
  bool w_last,

  bool r_ready,
  bool b_ready)
{
  bool ar_fire = !reset && ar_valid && ar_ready();
  bool aw_fire = !reset && aw_valid && aw_ready();
  bool w_fire = !reset && w_valid && w_ready();
  bool r_fire = !reset && r_valid() && r_ready;
  bool b_fire = !reset && b_valid() && b_ready;

  if (ar_fire) {
    uint64_t start_addr = (ar_addr / word_size) * word_size;
    for (size_t i = 0; i <= ar_len; i++) {
      auto dat = read(start_addr + i * word_size);
      rresp.push(mm_rresp_t(ar_id, dat, i == ar_len));
    }
  }

  if (aw_fire) {
    store_addr = aw_addr;
    store_id = aw_id;
    store_count = aw_len + 1;
    store_size = 1 << aw_size;
    store_inflight = true;
  }

  if (w_fire) {
    write(store_addr, (uint8_t*)w_data, w_strb, store_size);
    store_addr += store_size;
    store_count--;

    if (store_count == 0) {
      store_inflight = false;
      bresp.push(store_id);
      assert(w_last);
    }
  }

  if (b_fire)
    bresp.pop();

  if (r_fire)
    rresp.pop();

  cycle++;

  if (reset) {
    while (!bresp.empty()) bresp.pop();
    while (!rresp.empty()) rresp.pop();
    cycle = 0;
  }
}

void load_mem(void** mems, const char* fn, int line_size, int nchannels)
{
  char* m;
  int start = 0;
  std::ifstream in(fn);
  if (!in)
  {
    std::cerr << "could not open " << fn << std::endl;
    exit(EXIT_FAILURE);
  }

  std::string line;
  while (std::getline(in, line))
  {
    #define parse_nibble(c) ((c) >= 'a' ? (c)-'a'+10 : (c)-'0')
    for (int i = line.length()-2, j = 0; i >= 0; i -= 2, j++) {
      char data = (parse_nibble(line[i]) << 4) | parse_nibble(line[i+1]);
      int addr = start + j;
      int channel = (addr / line_size) % nchannels;
      m = (char *) mems[channel];
      addr = (addr / line_size / nchannels) * line_size + (addr % line_size);
      m[addr] = data;
    }
    start += line.length()/2;
  }
}
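As an aside, the strobe-masked `mm_base_t::write` above is the core of how a partial AXI write beat touches only the enabled byte lanes. An illustrative Python sketch of the same arithmetic (not part of this commit; names are hypothetical):

```python
# Sketch of the strobe-masked write in mm_base_t::write: each strobe bit
# enables one byte lane within the word-aligned buffer.
def strb_write(mem, word_size, addr, data, strb, size):
    # Keep only the strobe bits covered by this beat, positioned at the
    # byte offset of addr within its word (mirrors the C++ masking).
    strb &= ((1 << size) - 1) << (addr % word_size)
    base = (addr // word_size) * word_size
    for i in range(word_size):
        if strb & 1:
            mem[base + i] = data[i]
        strb >>= 1
    return mem

# A 4-byte beat at byte offset 2 of an 8-byte word; only lanes 2..5 land.
mem = bytearray(8)
beat = bytearray(b"\x00\x00\xaa\xbb\xcc\xdd\x00\x00")
strb_write(mem, 8, 2, beat, 0xFF, 4)
```

Even though the incoming strobe is 0xFF, the mask restricts the write to the four lanes the beat actually covers.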
@@ -0,0 +1,172 @@
// See LICENSE for license details.

#ifndef MM_EMULATOR_H
#define MM_EMULATOR_H

#include <stdint.h>
#include <cstring>
#include <vector>
#include <queue>
#include <string>
#include <stdexcept>

class mm_exception : public std::runtime_error {
 public:
  explicit mm_exception(const std::string& msg) :
    std::runtime_error(msg), msg_(msg) {};

  virtual const char* what() const throw()
  {
    return msg_.c_str();
  };

  virtual ~mm_exception() throw () {};

 private:
  std::string msg_;
};

class mm_base_t
{
 public:
  mm_base_t(): data(0), size(0) {}
  virtual void init(size_t sz, int word_size, int line_size);
  virtual void* get_data() { return data; }
  virtual size_t get_size() { return size; }
  virtual size_t get_word_size() { return word_size; }
  virtual size_t get_line_size() { return line_size; }

  void write(uint64_t addr, uint8_t *data);
  void write(uint64_t addr, uint8_t *data, uint64_t strb, uint64_t size);
  std::vector<char> read(uint64_t addr);

  virtual ~mm_base_t();

 protected:
  uint8_t* data;
  size_t size;
  int word_size;
  int line_size;
};

class mm_t: public mm_base_t
{
 public:
  virtual bool ar_ready() = 0;
  virtual bool aw_ready() = 0;
  virtual bool w_ready() = 0;
  virtual bool b_valid() = 0;
  virtual uint64_t b_resp() = 0;
  virtual uint64_t b_id() = 0;
  virtual bool r_valid() = 0;
  virtual uint64_t r_resp() = 0;
  virtual uint64_t r_id() = 0;
  virtual void *r_data() = 0;
  virtual bool r_last() = 0;

  virtual void tick
  (
    bool reset,

    bool ar_valid,
    uint64_t ar_addr,
    uint64_t ar_id,
    uint64_t ar_size,
    uint64_t ar_len,

    bool aw_valid,
    uint64_t aw_addr,
    uint64_t aw_id,
    uint64_t aw_size,
    uint64_t aw_len,

    bool w_valid,
    uint64_t w_strb,
    void *w_data,
    bool w_last,

    bool r_ready,
    bool b_ready
  ) = 0;
};

struct mm_rresp_t
{
  uint64_t id;
  std::vector<char> data;
  bool last;

  mm_rresp_t(uint64_t id, std::vector<char> data, bool last)
  {
    this->id = id;
    this->data = data;
    this->last = last;
  }

  mm_rresp_t()
  {
    this->id = 0;
    this->last = false;
  }
};

class mm_magic_t : public mm_t
{
 public:
  mm_magic_t() : store_inflight(false) {}

  virtual void init(size_t sz, int word_size, int line_size);

  virtual bool ar_ready() { return true; }
  virtual bool aw_ready() { return !store_inflight; }
  virtual bool w_ready() { return store_inflight; }
  virtual bool b_valid() { return !bresp.empty(); }
  virtual uint64_t b_resp() { return 0; }
  virtual uint64_t b_id() { return b_valid() ? bresp.front() : 0; }
  virtual bool r_valid() { return !rresp.empty(); }
  virtual uint64_t r_resp() { return 0; }
  virtual uint64_t r_id() { return r_valid() ? rresp.front().id : 0; }
  virtual void *r_data() { return r_valid() ? &rresp.front().data[0] : &dummy_data[0]; }
  virtual bool r_last() { return r_valid() ? rresp.front().last : false; }

  virtual void tick
  (
    bool reset,

    bool ar_valid,
    uint64_t ar_addr,
    uint64_t ar_id,
    uint64_t ar_size,
    uint64_t ar_len,

    bool aw_valid,
    uint64_t aw_addr,
    uint64_t aw_id,
    uint64_t aw_size,
    uint64_t aw_len,

    bool w_valid,
    uint64_t w_strb,
    void *w_data,
    bool w_last,

    bool r_ready,
    bool b_ready
  );

 protected:
  bool store_inflight;
  uint64_t store_addr;
  uint64_t store_id;
  uint64_t store_size;
  uint64_t store_count;
  std::vector<char> dummy_data;
  std::queue<uint64_t> bresp;

  std::queue<mm_rresp_t> rresp;

  uint64_t cycle;
};

void load_mem(void** mems, const char* fn, int line_size, int nchannels);
#endif
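The `load_mem` routine declared above stripes a hex memory image across channels: consecutive `line_size`-byte lines rotate through the channels, and each channel packs its own lines contiguously. An illustrative Python sketch of that address arithmetic (not part of this commit; the function name is hypothetical):

```python
# Map a flat byte address to (channel, channel-local offset), mirroring the
# interleaving arithmetic inside load_mem in mm.cc.
def channel_and_offset(addr, line_size, nchannels):
    # Which channel owns the line containing this address.
    channel = (addr // line_size) % nchannels
    # Where that line lands within the channel's contiguous storage.
    local = (addr // line_size // nchannels) * line_size + (addr % line_size)
    return channel, local
```

With `line_size=64` and four channels, byte 64 is the first byte of channel 1, while byte 256 wraps back around to the second line of channel 0.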
@@ -0,0 +1,152 @@
// See LICENSE for license details.

#include "mm_dramsim2.h"
#include "mm.h"
#include <iostream>
#include <fstream>
#include <list>
#include <queue>
#include <cstring>
#include <cstdlib>
#include <cassert>

//#define DEBUG_DRAMSIM2

using namespace DRAMSim;

void mm_dramsim2_t::read_complete(unsigned id, uint64_t address, uint64_t clock_cycle)
{
  assert(!rreq[address].empty());
  auto req = rreq[address].front();
  uint64_t start_addr = (req.addr / word_size) * word_size;
  for (size_t i = 0; i < req.len; i++) {
    auto dat = read(start_addr + i * word_size);
    rresp.push(mm_rresp_t(req.id, dat, (i == req.len - 1)));
  }
  read_id_busy[req.id] = false;
  rreq[address].pop();
}

void mm_dramsim2_t::write_complete(unsigned id, uint64_t address, uint64_t clock_cycle)
{
  assert(!wreq[address].empty());
  auto b_id = wreq[address].front();
  bresp.push(b_id);
  write_id_busy[b_id] = false;
  wreq[address].pop();
}

void power_callback(double a, double b, double c, double d)
{
  //fprintf(stderr, "power callback: %0.3f, %0.3f, %0.3f, %0.3f\n",a,b,c,d);
}

void mm_dramsim2_t::init(size_t sz, int wsz, int lsz)
{
  assert(lsz == 64); // assumed by dramsim2
  mm_t::init(sz, wsz, lsz);

  dummy_data.resize(word_size);

  assert(size % (1024*1024) == 0);
  mem = getMemorySystemInstance(memory_ini, system_ini, ini_dir, "results", size/(1024*1024));

  TransactionCompleteCB *read_cb = new Callback<mm_dramsim2_t, void, unsigned, uint64_t, uint64_t>(this, &mm_dramsim2_t::read_complete);
  TransactionCompleteCB *write_cb = new Callback<mm_dramsim2_t, void, unsigned, uint64_t, uint64_t>(this, &mm_dramsim2_t::write_complete);
  mem->RegisterCallbacks(read_cb, write_cb, power_callback);

#ifdef DEBUG_DRAMSIM2
  fprintf(stderr, "Dramsim2 init successful\n");
#endif
}

bool mm_dramsim2_t::ar_ready() {
  return mem->willAcceptTransaction();
}

bool mm_dramsim2_t::aw_ready() {
  return mem->willAcceptTransaction() && !store_inflight;
}

void mm_dramsim2_t::tick(
  bool reset,

  bool ar_valid,
  uint64_t ar_addr,
  uint64_t ar_id,
  uint64_t ar_size,
  uint64_t ar_len,

  bool aw_valid,
  uint64_t aw_addr,
  uint64_t aw_id,
  uint64_t aw_size,
  uint64_t aw_len,

  bool w_valid,
  uint64_t w_strb,
  void *w_data,
  bool w_last,

  bool r_ready,
  bool b_ready)
{
  bool ar_fire = !reset && ar_valid && ar_ready();
  bool aw_fire = !reset && aw_valid && aw_ready();
  bool w_fire = !reset && w_valid && w_ready();
  bool r_fire = !reset && r_valid() && r_ready;
  bool b_fire = !reset && b_valid() && b_ready;

  if (mem->willAcceptTransaction()) {
    for (auto it = rreq_queue.begin(); it != rreq_queue.end(); it++) {
      if (!read_id_busy[it->id]) {
        read_id_busy[it->id] = true;
        auto transaction = *it;
        rreq[transaction.addr].push(transaction);
        mem->addTransaction(false, transaction.addr);
        rreq_queue.erase(it);
        break;
      }
    }
  }

  if (ar_fire) {
    rreq_queue.push_back(mm_req_t(ar_id, 1 << ar_size, ar_len + 1, ar_addr));
  }

  if (aw_fire) {
    store_addr = aw_addr;
    store_id = aw_id;
    store_count = aw_len + 1;
    store_size = 1 << aw_size;
    store_inflight = true;
  }

  if (w_fire) {
    write(store_addr, (uint8_t*)w_data, w_strb, store_size);
    store_addr += store_size;
    store_count--;

    if (store_count == 0) {
      store_inflight = false;
      mem->addTransaction(true, store_addr);
      wreq[store_addr].push(store_id);
      assert(w_last);
    }
  }

  if (b_fire)
    bresp.pop();

  if (r_fire)
    rresp.pop();

  mem->update();
  cycle++;

  if (reset) {
    while (!bresp.empty()) bresp.pop();
    while (!rresp.empty()) rresp.pop();
    cycle = 0;
  }
}
@@ -0,0 +1,124 @@
// See LICENSE for license details.

#ifndef _MM_EMULATOR_DRAMSIM2_H
#define _MM_EMULATOR_DRAMSIM2_H

#include "mm.h"
#include <DRAMSim.h>
#include <map>
#include <queue>
#include <list>
#include <stdint.h>

struct mm_req_t {
  uint64_t id;
  uint64_t size;
  uint64_t len;
  uint64_t addr;

  mm_req_t(uint64_t id, uint64_t size, uint64_t len, uint64_t addr)
  {
    this->id = id;
    this->size = size;
    this->len = len;
    this->addr = addr;
  }

  mm_req_t()
  {
    this->id = 0;
    this->size = 0;
    this->len = 0;
    this->addr = 0;
  }
};

class mm_dramsim2_t : public mm_t
{
 public:
  mm_dramsim2_t(int axi4_ids) :
    read_id_busy(axi4_ids, false),
    write_id_busy(axi4_ids, false) {};
  mm_dramsim2_t(std::string memory_ini, std::string system_ini, std::string ini_dir, int axi4_ids) :
    memory_ini(memory_ini),
    system_ini(system_ini),
    ini_dir(ini_dir),
    read_id_busy(axi4_ids, false),
    write_id_busy(axi4_ids, false) {};

  virtual void init(size_t sz, int word_size, int line_size);

  virtual bool ar_ready();
  virtual bool aw_ready();
  virtual bool w_ready() { return store_inflight; }
  virtual bool b_valid() { return !bresp.empty(); }
  virtual uint64_t b_resp() { return 0; }
  virtual uint64_t b_id() { return b_valid() ? bresp.front() : 0; }
  virtual bool r_valid() { return !rresp.empty(); }
  virtual uint64_t r_resp() { return 0; }
  virtual uint64_t r_id() { return r_valid() ? rresp.front().id : 0; }
  virtual void *r_data() { return r_valid() ? &rresp.front().data[0] : &dummy_data[0]; }
  virtual bool r_last() { return r_valid() ? rresp.front().last : false; }

  virtual void tick
  (
    bool reset,

    bool ar_valid,
    uint64_t ar_addr,
    uint64_t ar_id,
    uint64_t ar_size,
    uint64_t ar_len,

    bool aw_valid,
    uint64_t aw_addr,
    uint64_t aw_id,
    uint64_t aw_size,
    uint64_t aw_len,

    bool w_valid,
    uint64_t w_strb,
    void *w_data,
    bool w_last,

    bool r_ready,
    bool b_ready
  );

 protected:
  DRAMSim::MultiChannelMemorySystem *mem;
  uint64_t cycle;

  bool store_inflight = false;
  std::string memory_ini = "DDR3_micron_64M_8B_x4_sg15.ini";
  std::string system_ini = "system.ini";
  std::string ini_dir = "dramsim2_ini";

  uint64_t store_addr;
  uint64_t store_id;
  uint64_t store_size;
  uint64_t store_count;
  std::vector<char> dummy_data;
  std::queue<uint64_t> bresp;

  // Keep a FIFO of IDs that made reads to an address, since DRAMSim2 doesn't
  // track it. Reads or writes to the same address from different IDs can
  // collide.
  std::map<uint64_t, std::queue<uint64_t>> wreq;
  std::map<uint64_t, std::queue<mm_req_t>> rreq;
  std::queue<mm_rresp_t> rresp;

  // Track inflight requests by marking each AXI ID busy while its request is
  // outstanding; reads that cannot issue yet wait in rreq_queue.
  std::vector<bool> read_id_busy;
  std::vector<bool> write_id_busy;
  std::list<mm_req_t> rreq_queue;

  void read_complete(unsigned id, uint64_t address, uint64_t clock_cycle);
  void write_complete(unsigned id, uint64_t address, uint64_t clock_cycle);
};

#endif
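The per-address queues above exist because DRAMSim2's completion callback reports only an address, never the AXI ID that issued the request. A minimal Python sketch of the recovery scheme (illustrative only, not part of this commit; names are hypothetical):

```python
# DRAMSim2 completes same-address transactions in issue order, so a FIFO
# keyed by address is enough to recover the originating AXI ID.
from collections import defaultdict, deque

pending = defaultdict(deque)

def issue(addr, axi_id):
    # Record the ID at issue time, in order.
    pending[addr].append(axi_id)

def complete(addr):
    # On the address-only callback, the oldest issuer for that address
    # is the one completing.
    return pending[addr].popleft()

issue(0x1000, 3)
issue(0x1000, 7)
```

This is also why the C++ code marks each ID busy until its completion arrives: two outstanding reads on the same ID and address would otherwise be indistinguishable.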
@@ -0,0 +1,23 @@
# Compile DRAMSim2
dramsim_o := $(foreach f, \
		$(patsubst %.cpp, %.o, $(wildcard $(midas_dir)/dramsim2/*.cpp)), \
		$(GEN_DIR)/$(notdir $(f)))
$(dramsim_o): $(GEN_DIR)/%.o: $(midas_dir)/dramsim2/%.cpp
	$(CXX) $(CXXFLAGS) -DNO_STORAGE -DNO_OUTPUT -Dmain=nomain -c -o $@ $<

ifeq ($(PLATFORM),zynq)
host = arm-xilinx-linux-gnueabi
endif

# Compile utility code
lib_files := mm mm_dramsim2 $(if $(filter $(CXX),cl),,midas_context)
lib_cc := $(addprefix $(util_dir)/, $(addsuffix .cc, $(lib_files)))
lib_o := $(addprefix $(GEN_DIR)/, $(addsuffix .o, $(lib_files)))

$(lib_o): $(GEN_DIR)/%.o: $(util_dir)/%.cc
	$(CXX) $(CXXFLAGS) -c -o $@ $<

lib := $(GEN_DIR)/libmidas.a

$(lib): $(lib_o) $(dramsim_o)
	$(AR) rcs $@ $^
@@ -0,0 +1,58 @@
NUM_BANKS=8
NUM_ROWS=32768
NUM_COLS=2048
DEVICE_WIDTH=4

;in nanoseconds
;#define REFRESH_PERIOD 7800
REFRESH_PERIOD=7800
tCK=1.5 ;*

CL=10 ;*
AL=0 ;*
;AL=3; needs to be tRCD-1 or 0
;RL=(CL+AL)
;WL=(RL-1)
BL=8 ;*
tRAS=24 ;*
tRCD=10 ;*
tRRD=4 ;*
tRC=34 ;*
tRP=10 ;*
tCCD=4 ;*
tRTP=5 ;*
tWTR=5 ;*
tWR=10 ;*
tRTRS=1 ; -- RANK PARAMETER, TODO
tRFC=107 ;*
tFAW=20 ;*
tCKE=4 ;*
tXP=4 ;*

tCMD=1 ;*

IDD0=100;
IDD1=130;
IDD2P=10;
IDD2Q=70;
IDD2N=70;
IDD3Pf=60;
IDD3Ps=60;
IDD3N=90;
IDD4W=255;
IDD4R=230;
IDD5=305;
IDD6=9;
IDD6L=12;
IDD7=415;

;same bank
;READ_TO_PRE_DELAY=(AL+BL/2+max(tRTP,2)-2)
;WRITE_TO_PRE_DELAY=(WL+BL/2+tWR)
;READ_TO_WRITE_DELAY=(RL+BL/2+tRTRS-WL)
;READ_AUTOPRE_DELAY=(AL+tRTP+tRP)
;WRITE_AUTOPRE_DELAY=(WL+BL/2+tWR+tRP)
;WRITE_TO_READ_DELAY_B=(WL+BL/2+tWTR) ;interbank
;WRITE_TO_READ_DELAY_R=(WL+BL/2+tRTRS-RL) ;interrank

Vdd=1.5 ; TODO: double check this
@@ -0,0 +1,25 @@
; COPY THIS FILE AND MODIFY IT TO SUIT YOUR NEEDS

NUM_CHANS=1 ; number of *logically independent* channels (i.e. each with a separate memory controller); should be a power of 2
JEDEC_DATA_BUS_BITS=64 ; Always 64 for DDRx; if you want multiple *ganged* channels, set this to N*64
TRANS_QUEUE_DEPTH=32 ; transaction queue, i.e., CPU-level commands such as: READ 0xbeef
CMD_QUEUE_DEPTH=32 ; command queue, i.e., DRAM-level commands such as: CAS 544, RAS 4
EPOCH_LENGTH=100000 ; length of an epoch in cycles (granularity of simulation)
ROW_BUFFER_POLICY=open_page ; close_page or open_page
ADDRESS_MAPPING_SCHEME=scheme2 ; valid schemes 1-7; For multiple independent channels, use scheme7 since it has the most parallelism
SCHEDULING_POLICY=rank_then_bank_round_robin ; bank_then_rank_round_robin or rank_then_bank_round_robin
QUEUING_STRUCTURE=per_rank ; per_rank or per_rank_per_bank

; for true/false, please use all lowercase
DEBUG_TRANS_Q=false
DEBUG_CMD_Q=false
DEBUG_ADDR_MAP=false
DEBUG_BUS=false
DEBUG_BANKSTATE=false
DEBUG_BANKS=false
DEBUG_POWER=false
VIS_FILE_OUTPUT=false

USE_LOW_POWER=true ; go into low power mode when idle?
VERIFICATION_OUTPUT=false ; should be false for normal operation
TOTAL_ROW_ACCESSES=4 ; maximum number of open page requests to send to the same row before forcing a row close (to prevent starvation)
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,61 @@
#!/usr/bin/python
import optparse
import subprocess
import json
import re

# This script generates json files used to drive the memory configuration
# generator by invoking verilator's verilog preprocessor on the micron-provided
# DRAM-model verilog headers. It does this for all combinations of speedgrade x
# DQ width.

parser = optparse.OptionParser()
parser.add_option('-f', '--input-file', dest='input_file', help='The verilog header to parse for DDR timings.')
parser.add_option('-o', '--output-file', dest='output_file', help='The output json file name.')
(options, args) = parser.parse_args()

def call_verilator_preprocessor(filename, speedgrade, width):
    #args = ['verilator', '-E', '-D' + width, '-D' + speedgrade, filename]
    args = "verilator -E -D{0} -D{1} {2}".format(speedgrade, width, filename)
    p = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE)
    return p.stdout.readlines()

def get_units(filename):
    units = {}
    with open(filename, 'rb') as vhf:
        for line in vhf.readlines():
            m = re.search('parameter\s*(\w*).*?\/\/\s*([()\w]+?)\s*(tCK|ps)', line)
            if m:
                units[m.group(1)] = m.group(3)

    return units

values = {}

speedgrades = ["sg093", "sg107", "sg125", "sg15E", "sg15", "sg187E", "sg187", "sg25E", "sg25"]
widths = ['x4', 'x8', 'x16']

unit_table = get_units(options.input_file)

for sg in speedgrades:
    values[sg] = {}
    for width in widths:
        lines = call_verilator_preprocessor(options.input_file, sg, width)
        values[sg][width] = {}
        for line in lines:
            m = re.search('parameter\s*(\w*)\s*=\s*(\w*);', line)
            if m:
                units = unit_table.get(m.group(1), "none")
                try:
                    values[sg][width][m.group(1)] = {"units": units,
                                                     "value": int(m.group(2))}
                except ValueError:
                    # Reject string parameters
                    pass

with open(options.output_file, 'wb') as jsonf:
    json.dump(values, jsonf, indent=4)

@@ -0,0 +1 @@
*.pyc
@@ -0,0 +1,105 @@
#!/usr/bin/env python

# See LICENSE for license details.

import os
import os.path
from subprocess import Popen
import argparse
import shutil
import re
import csv
import numpy as np
import math

parser = argparse.ArgumentParser(
    description = 'Run PrimeTime PX for each sample generated by replay-sample.py')
parser.add_argument("-s", "--sample", dest="sample", type=str,
    help='sample file', required=True)
parser.add_argument("-d", "--design", dest="design", type=str,
    help='design name', required=True)
parser.add_argument("-m", "--make", dest="make", type=str, nargs='*',
    help='make command', required=True)
parser.add_argument("--output-dir", dest="output_dir", type=str,
    help='output directory for vpd, power', required=True)
parser.add_argument("--trace-dir", dest="trace_dir", type=str,
    help='PLSI TRACE directory', required=True)
parser.add_argument("--obj-dir", dest="obj_dir", type=str,
    help='PLSI OBJ directory', required=True)
args = parser.parse_args()

""" Read Sample """
num = 0
with open(args.sample) as f:
    for line in f:
        tokens = line.split(" ")
        head = tokens[0]
        if head == '1':
            assert tokens[1] == 'cycle:'
            num += 1

prefix = os.path.basename(os.path.splitext(args.sample)[0])
if not os.path.exists(args.output_dir):
    os.makedirs(args.output_dir)
if not os.path.exists(args.trace_dir):
    os.makedirs(args.trace_dir)

ids = range(num)
for k in xrange(0, num, 10):
    ps = list()

    for i in ids[k:k+10]:
        """ Copy vpd """
        shutil.copy("%s/%s-replay-%d.vpd" % (args.output_dir, prefix, i),
                    "%s/%s-replay-%d.vpd" % (args.trace_dir, prefix, i))

        """ Run PrimeTime PX """
        cmd = ["make"] + args.make + \
            ["SAMPLE=%s/%s-replay-%d.sample" % (args.output_dir, prefix, i)]
        ps.append(Popen(cmd, stdout=open(os.devnull, 'wb')))

    while any(p.poll() == None for p in ps):
        pass

    assert all(p.poll() == 0 for p in ps)

""" Read report file """
modules = list()
sample_pwr = dict()

for i in xrange(num):
    report_filename = "%s/pt-power/%s-replay-%d/synopsys-pt-workdir/reports/%s_report_power.report" % (
        args.obj_dir, prefix, i, args.design)
    with open(report_filename) as f:
        found = False
        for line in f:
            tokens = line.split()
            if not found:
                found = len(tokens) > 0 and tokens[0] == 'Hierarchy'
            elif found and len(tokens) >= 6:
                module = ' '.join(tokens[:2]) if len(tokens) > 6 else tokens[0]
                int_pwr = tokens[-5]
                switch_pwr = tokens[-4]
                leak_pwr = tokens[-3]
                total_pwr = tokens[-2]
                percent = tokens[-1]
                if not 'clk_gate' in module:
                    if not module in sample_pwr:
                        modules.append(module)
                        sample_pwr[module] = list()
                    sample_pwr[module].append('0.0' if total_pwr == 'N/A' else total_pwr)

""" Dump power """
csv_filename = "%s/%s-pwr.csv" % (args.output_dir, prefix)
print "[strober] Dump Power at", csv_filename
with open(csv_filename, "w") as f:
    writer = csv.writer(f)
    writer.writerow(["Modules"] + ["Sample %d (mW)" % i for i in xrange(num)] + [
        "Average (mW)", "95% error", "99% error"])
    for m in modules:
        arr = np.array([1000.0 * float(x) for x in sample_pwr[m]])
        avg = np.mean(arr)
        var = np.sum(np.power(arr - avg, 2)) / (num - 1) if num > 1 else 0 # sample variance
        _95 = 1.96 * math.sqrt(var / num)
        _99 = 2.576 * math.sqrt(var / num)
        writer.writerow([m] + arr.tolist() + [avg, _95, _99])
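The per-module error columns in the script above come from a sample-variance estimate with a normal approximation: the 95% half-width is 1.96 * sqrt(s^2 / n). An illustrative Python 3 sketch of the same computation (not part of this commit; the function name is hypothetical):

```python
# Average and z-based confidence half-width over power samples, mirroring
# the sample-variance math in the report loop above.
import math

def conf_interval(samples, z=1.96):
    n = len(samples)
    avg = sum(samples) / n
    # Unbiased sample variance (divide by n-1), zero for a single sample.
    var = sum((x - avg) ** 2 for x in samples) / (n - 1) if n > 1 else 0.0
    return avg, z * math.sqrt(var / n)

avg, err95 = conf_interval([10.0, 12.0, 11.0, 13.0])
```

Using z = 2.576 instead of 1.96 yields the 99% column.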
@@ -0,0 +1,178 @@
#!/usr/bin/env python

# See LICENSE for license details.

import sys
import os
import tempfile
import subprocess
import argparse
import json
import fm_regex

def initialize_arguments(args):
    """ initialize translator arguments """
    parser = argparse.ArgumentParser(
        description = 'run formality for macros')
    parser.add_argument('--paths', type=file, required=True,
        help="""macro path analysis file from Strober's compiler (e.g. <design>.macros.path)""")
    parser.add_argument('--ref', nargs='+',
        help="""reference verilog file""")
    parser.add_argument('--impl', nargs='+',
        help="""implementation verilog file""")
    parser.add_argument('--match', type=str, required=True,
        help="match file to be appended")

    """ parse the arguments """
    res = parser.parse_args(args)

    return res.paths, res.match, res.ref, res.impl

def read_path_file(f):
    paths = dict()
    try:
        for line in f:
            tokens = line.split()
            module = tokens[0]
            path = tokens[1]
            if not module in paths:
                paths[module] = list()
            paths[module].append(path)

    finally:
        f.close()

    return paths

def read_match_file(match_file):
    gate_names = dict()
    with open(match_file, 'r') as f:
        for line in f:
            tokens = line.split()
            gate_names[tokens[0]] = tokens[1]

    return gate_names

def write_tcl(tcl_file, report_file, mem_name, ref_v_files, impl_v_files):
    with open(tcl_file, 'w') as f:
        """ Don't match name substrings """
        f.write("set_app_var name_match_allow_subset_match none\n")

        """ No errors from unresolved modules """
        f.write("set_app_var hdlin_unresolved_modules black_box\n")

        """ Read reference verilog files """
        for ref in ref_v_files:
            f.write("read_verilog -r %s -work_library WORK\n" % ref)

        """ Set top of reference """
        f.write("set_top r:/WORK/%s\n" % mem_name)

        """ Read implementation verilog files """
        for impl in impl_v_files:
            f.write("read_verilog -i %s -work_library WORK\n" % impl)

        """ Set top of implementation """
        f.write("set_top i:/WORK/%s\n" % mem_name)

        """ Match """
        f.write("match\n")

        """ Report matched points """
        f.write("report_matched_points > %s\n" % report_file)

        """ Report unmatched points """
        # f.write("report_unmatched_points >> %s\n" % report_file)

        """ Finish """
        f.write("exit\n")

    return

def append_match_file(report_file, match_file, mem, paths, gate_names):
    """ construct macro name mapping from the formality report """
    macro_map = list()

    ref_was_matched = False

    with open(report_file, 'r') as f:
        for line in f:
            if ref_was_matched:
                impl_matched = fm_regex.impl_regex.search(line)
                if impl_matched:
                    impl_name = impl_matched.group(1).replace("/", ".")
                    impl_name = impl_name.replace(mem + ".", "")
                    ff_matched = fm_regex.ff_regex.match(impl_name)
                    reg_matched = fm_regex.reg_regex.match(impl_name)
                    mem_matched = fm_regex.mem_regex.match(impl_name)
                    if mem_matched:
                        impl_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
                                    "[" + mem_matched.group(3) + "]"
                    elif reg_matched:
                        impl_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
                    elif ff_matched:
                        impl_name = ff_matched.group(1)

                    macro_map.append((ref_name, impl_name))
                    ref_was_matched = False

            else:
                ref_matched = fm_regex.ref_regex.search(line)
                if ref_matched:
                    ref_name = ref_matched.group(2).replace("/", ".")
                    ref_name = ref_name.replace(mem + ".", "")
                    ff_matched = fm_regex.ff_regex.match(ref_name)
                    reg_matched = fm_regex.reg_regex.match(ref_name)
                    mem_matched = fm_regex.mem_regex.match(ref_name)
                    if mem_matched:
                        ref_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
                                   "[" + mem_matched.group(3) + "]"
                    elif reg_matched:
                        ref_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
                    elif ff_matched:
                        ref_name = ff_matched.group(1)

                    ref_was_matched = True

    """ append the name mapping to the match file """
    with open(match_file, 'a') as f:
        for path in paths:
            for ref_name, impl_name in macro_map:
                ref_full_name = path + "." + ref_name
                if path in gate_names:
                    impl_mod_path = gate_names[path]
                else:
                    impl_mod_path = path
                impl_full_name = impl_mod_path + "." + impl_name
                f.write("%s %s\n" % (ref_full_name, impl_full_name))

    return

if __name__ == '__main__':
    """ parse the arguments """
    path_file, match_file, ref_files, impl_files = initialize_arguments(sys.argv[1:])

    """ read path file """
    paths = read_path_file(path_file)

    """ read match file """
    gate_names = read_match_file(match_file)

    """ create temp dir """
    dir_path = tempfile.mkdtemp()

    for mem in paths:
        """ TCL file path """
        tcl_file = os.path.join(dir_path, mem + ".tcl")

        """ report file path """
        report_file = os.path.join(dir_path, mem + ".rpt")

        """ generate TCL script for formality """
        write_tcl(tcl_file, report_file, mem, ref_files, impl_files)

        """ execute formality """
        assert subprocess.call(["fm_shell", "-f", tcl_file]) == 0

        """ append mappings to the match file """
        append_match_file(report_file, match_file, mem, paths[mem], gate_names)
@@ -0,0 +1,109 @@
#!/usr/bin/env python

# See LICENSE for license details.

import fm_regex
import read_svf
import sys
import argparse

def initialize_arguments(args):
    """ initialize arguments """
    parser = argparse.ArgumentParser(
        description = 'Find match map between RTL Names & Gate-Level Names')
    parser.add_argument('--match', type=str, required=True,
        help="""match output file (RTL name -> Gate-level name)""")
    parser.add_argument('--report', type=str, required=True,
        help="""report from Synopsys Formality (generated by 'match' and 'report_match_points')""")
    parser.add_argument('--svf', type=str, required=True,
        help="""decrypted svf file from Formality (generated in formality_svf/svf.txt)""")

    """ parse the arguments """
    res = parser.parse_args(args)

    return res.report, res.svf, res.match

def read_name_map(report_file, instance_map, change_names):
    name_map = list()

    with open(report_file, 'r') as f:
        ref_was_matched = False
        ref_name = ""
        for line in f:
            if ref_was_matched:
                impl_matched = fm_regex.impl_regex.search(line)
                if impl_matched:
                    name_map.append((ref_name, impl_matched.group(1).replace("/", ".")))
                ref_was_matched = False

            else:
                ref_matched = fm_regex.ref_regex.search(line)
                if ref_matched:
                    gate_type = ref_matched.group(1)
                    ref_name_tokens = ref_matched.group(2).split("/")
                    ref_name = ref_name_tokens[0]
                    design = ref_name
                    for i, token in enumerate(ref_name_tokens[1:]):
                        if design in change_names:
                            map = change_names[design]
                            rtl_name = map[token] if token in map else token
                        else:
                            rtl_name = token
                        ref_name = ref_name + "." + rtl_name
                        if design in instance_map and i < len(ref_name_tokens[1:]) - 1:
                            design = instance_map[design][rtl_name]
                        else:
                            design = ""

                    if gate_type == "DFF":
                        """ D Flip Flops """
                        ff_matched = fm_regex.ff_regex.match(ref_name)
                        reg_matched = fm_regex.reg_regex.match(ref_name)
                        mem_matched = fm_regex.mem_regex.match(ref_name)
                        if mem_matched:
                            ref_name = mem_matched.group(1) + "[" + mem_matched.group(2) + "]" +\
                                "[" + mem_matched.group(3) + "]"
                        elif reg_matched:
                            ref_name = reg_matched.group(1) + "[" + reg_matched.group(2) + "]"
                        elif ff_matched:
                            ref_name = ff_matched.group(1)
                        else:
                            print ref_name
                            assert False

                    elif gate_type == "BBox":
                        """ Macros """
                        pass

                    elif gate_type == "BlPin" or gate_type == "BBPin" or gate_type == "Port":
                        """ Pins """
                        bus_matched = fm_regex.bus_regex.search(ref_name)
                        if bus_matched:
                            ref_name = bus_matched.group(1) + "[" + bus_matched.group(2) + "]"

                    else:
                        assert False

                    ref_was_matched = True

    return name_map

def write_match_file(match_file, name_map):
    with open(match_file, 'w') as f:
        for ref_name, impl_name in name_map:
            f.write("%s %s\n" % (ref_name, impl_name))

    return

if __name__ == '__main__':
    """ parse the arguments and open files """
    report_file, svf_file, match_file = initialize_arguments(sys.argv[1:])

    """ read svf file for guidance used in formality """
    instance_map, change_names = read_svf.read_svf_file(svf_file)

    """ read gate-level names from the formality report file """
    name_map = read_name_map(report_file, instance_map, change_names)

    """ write the output file """
    write_match_file(match_file, name_map)
@@ -0,0 +1,27 @@
import re

# See LICENSE for license details.

""" define reference (RTL) name regular expression """
ref_regex = re.compile(r"""
    Ref\s+                            # is reference(RTL)?
    (DFF|BlPin|Port|BBox|BBPin)\w*\s+ # Type
    (?:[\w\(\)]*)\s                   # Matched by (e.g. name)
    r:/WORK/                          # name prefix
    ([\w/\[\]\$]*)                    # RTL(chisel) name
""", re.VERBOSE)

""" define implementation (gate-level design) name regular expression """
impl_regex = re.compile(r"""
    Impl\s+                             # is implementation(gate-level design)?
    (?:DFF|BlPin|Port|BBox|BBPin)\w*\s+ # Type
    (?:\(-\))?\s+                       # Inverted?
    (?:[\w\(\)]*)\s                     # Matched by (e.g. name)
    i:/WORK/                            # name prefix
    ([\w/\[\]]*)                        # gate-level name
""", re.VERBOSE)

ff_regex = re.compile(r"([\w.\$]*)_reg")
reg_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]]")
mem_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]][_\[](\d+)[_\]]")
bus_regex = re.compile(r"([\w.\$]*)[_\[](\d+)[_\]]")
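The three `_reg` patterns are tried from most to least specific by the matching scripts so that a two-dimensional memory name is not mistaken for a plain register. A small self-contained sketch (the signal names below are invented, not from any real report) shows how a mangled flop name is rewritten back into Chisel-style indexing:

```python
import re

# Same patterns as in fm_regex.py above.
ff_regex = re.compile(r"([\w.\$]*)_reg")
reg_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]]")
mem_regex = re.compile(r"([\w.\$]*)_reg[_\[](\d+)[_\]][_\[](\d+)[_\]]")

def rewrite(name):
    # Try the most specific pattern first, mirroring the
    # mem -> reg -> ff precedence used in fm-match.py.
    m = mem_regex.match(name)
    if m:
        return "%s[%s][%s]" % m.groups()
    m = reg_regex.match(name)
    if m:
        return "%s[%s]" % m.groups()
    m = ff_regex.match(name)
    if m:
        return m.group(1)
    return name

assert rewrite("dut.ram_reg[3][5]") == "dut.ram[3][5]"
assert rewrite("io_count_reg_7_") == "io_count[7]"
assert rewrite("dut.valid_reg") == "dut.valid"
```

Because `match` only anchors at the start of the string, a name like `ram_reg[3][5]` would also satisfy `reg_regex`; checking `mem_regex` first is what makes the precedence correct.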
@@ -0,0 +1,128 @@
import sys
import argparse
# See LICENSE for license details.


def construct_instance_map(instance_map, tokens):
    is_design = False
    is_instance = False
    is_module = False
    for token in tokens:
        if token == "{" or token == "}":
            pass
        elif token == "-design":
            is_design = True
        elif token == "-instance":
            is_instance = True
        elif token == "-linked":
            is_module = True
        elif is_design:
            design = token
            is_design = False
        elif is_instance:
            instance = token
            is_instance = False
        elif is_module:
            module = token
            is_module = False
        else:
            print token, ': ', ' '.join(tokens)
            assert False

    if not design in instance_map:
        instance_map[design] = dict()
    instance_map[design][instance] = module
    return

def uniquify_instances(instance_map, tokens):
    state = -1
    for token in tokens:
        if token == "{" or token == "}":
            pass
        elif token == "-design":
            state = 0
        elif state == 0:
            """ identify a parent module name """
            top = token
            state = 1
        elif state == 1:
            """ identify a child instance name """
            design = top
            path_tokens = token.split("/")
            for path_token in path_tokens[:-1]:
                design = instance_map[design][path_token]
            instance = path_tokens[-1]
            path = '.'.join(path_tokens) # for debugging
            state = 2
        elif state == 2:
            """ identify a child module name """
            if not design in instance_map:
                instance_map[design] = dict()
            instance_map[design][instance] = token
            state = 1
        else:
            print token, ': ', ' '.join(tokens)
            assert False

    return

def construct_change_names(change_names, tokens):
    state = -1
    for token in tokens:
        if token == '{' or token == '}':
            pass
        elif token == "-design":
            state = 0
        elif state == 0:
            """ identify a parent module name """
            design = token
            state = 1
        elif state == 1:
            """ identify an object type """
            is_cell = token == "cell"
            state = 2
        elif state == 2:
            """ identify an rtl name """
            rtl_name = token
            state = 3
        elif state == 3:
            if not design in change_names:
                change_names[design] = dict()
            """ record name changes only for cells """
            if is_cell:
                change_names[design][token] = rtl_name
            state = 1
        else:
            print token, ': ', ' '.join(tokens)
            assert False

    return

def read_svf_file(svf_file):
    instance_map = dict()
    change_names = dict()

    with open(svf_file, 'r') as f:
        full_line = ""
        for line in f:
            tokens = line.split()
            if len(tokens) == 0:
                pass
            elif tokens[-1] == '\\':
                full_line += ' '.join(tokens[:-1]) + ' '
            else:
                full_line += ' '.join(tokens)
                full_tokens = full_line.split()
                guide_cmd = full_tokens[0]

                if guide_cmd == "guide_instance_map":
                    construct_instance_map(instance_map, full_tokens[1:])
                elif guide_cmd == "guide_uniquify":
                    uniquify_instances(instance_map, full_tokens[1:])
                elif guide_cmd == "guide_change_names":
                    construct_change_names(change_names, full_tokens[1:])

                full_line = ""

    return instance_map, change_names
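`read_svf_file` first joins backslash-continued SVF lines into one whole command before dispatching on the `guide_*` keyword. A minimal standalone sketch of just that joining step (the input lines here are invented, schematic SVF, not real Formality output):

```python
def join_continued(lines):
    # Collapse backslash-continued lines into whole commands,
    # mirroring the tokenizing loop in read_svf_file above.
    full_line = ""
    commands = []
    for line in lines:
        tokens = line.split()
        if len(tokens) == 0:
            continue
        elif tokens[-1] == '\\':
            # Drop the trailing backslash and keep accumulating.
            full_line += ' '.join(tokens[:-1]) + ' '
        else:
            full_line += ' '.join(tokens)
            commands.append(full_line.split())
            full_line = ""
    return commands

cmds = join_continued([
    "guide_uniquify -design Top \\",
    "  { core/dcache DCache }",
    "",
    "guide_change_names -design DCache { cell data_reg data_reg_0_ }",
])
assert cmds[0][0] == "guide_uniquify"
assert len(cmds) == 2
```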
@@ -0,0 +1,77 @@
#!/usr/bin/env python

# See LICENSE for license details.

import os.path
import argparse
from subprocess import Popen

parser = argparse.ArgumentParser(
    description = 'Replay each sample in a separate simulation instance')
parser.add_argument("-s", "--sample", dest="sample", type=str,
    help='sample files', required=True)
parser.add_argument("-e", "--sim", dest="sim", type=str,
    help='simulator executable for sample replays', required=True)
parser.add_argument("-d", "--dir", dest="dir", type=str,
    help='output directory for waveform, log', required=True)
parser.add_argument("-m", "--match", dest="match", type=str, default=None,
    help='match file generated by fm-match.py', required=False)
parser.add_argument("-n", "--num", dest="num", type=int, default=10,
    help='# instances of gate-level simulation', required=False)
args = parser.parse_args()

prefix = os.path.basename(os.path.splitext(args.sample)[0])
abspath = os.path.abspath(args.sim)
dirname = os.path.dirname(abspath)
basename = os.path.basename(abspath)
if not os.path.exists(args.dir):
    os.makedirs(args.dir)

""" Split Sample """
prologue = list()
samples = list()
with open(args.sample) as f:
    for line in f:
        tokens = line.split(" ")
        head = tokens[0]
        if head == '0':
            prologue.append(line)
        elif head == '1':
            assert tokens[1] == 'cycle:'
            samples.append([line])
        else:
            samples[-1].append(line)

""" Save samples """
for i, sample in enumerate(samples):
    f = open(os.path.join(args.dir, "%s-replay-%d.sample" % (prefix, i)), 'w')
    for line in prologue:
        f.write("%s" % line)
    for line in sample:
        f.write("%s" % line)
    f.close()

""" Execute replays """
ids = range(len(samples))
for k in xrange(0, len(samples), args.num):
    ps = list()
    logs = list()
    for i in ids[k:k+min(args.num, len(samples)-k)]:
        cmd = ["./%s" % basename,
               "+verbose",
               "+match=%s" % (args.match) if args.match != None else "",
               "+sample=%s/%s-replay-%d.sample" % (args.dir, prefix, i),
               "+vcdfile=%s/%s-replay-%d.vcd" % (args.dir, prefix, i),
               "+waveform=%s/%s-replay-%d.vpd" % (args.dir, prefix, i)]
        print " ".join(cmd)
        log = open("%s/%s-replay-%d.out" % (args.dir, prefix, i), 'w')
        ps.append(Popen(cmd, cwd=dirname, stderr=log))
        logs.append(log)

    while any(p.poll() == None for p in ps):
        pass

    assert all(p.poll() == 0 for p in ps)

    for log in logs:
        log.close()
@@ -0,0 +1,2 @@
$init_sigs call=init_sigs_calltf
$tick call=tick_calltf check=tick_compiletf acc=rw,frc:* acc-=frc:%CELL
@@ -0,0 +1,158 @@
// See LICENSE.Berkeley for license details.

package junctions

import Chisel._
import freechips.rocketchip.unittest.UnitTest

class MultiWidthFifo(inW: Int, outW: Int, n: Int) extends Module {
  val io = new Bundle {
    val in = Decoupled(Bits(width = inW)).flip
    val out = Decoupled(Bits(width = outW))
    val count = UInt(OUTPUT, log2Up(n + 1))
  }

  if (inW == outW) {
    val q = Module(new Queue(Bits(width = inW), n))
    q.io.enq <> io.in
    io.out <> q.io.deq
    io.count := q.io.count
  } else if (inW > outW) {
    val nBeats = inW / outW

    require(inW % outW == 0, s"MultiWidthFifo: in: $inW not divisible by out: $outW")
    require(n % nBeats == 0, s"Cannot store $n output words when output beats is $nBeats")

    val wdata = Reg(Vec(n / nBeats, Bits(width = inW)))
    val rdata = Vec(wdata.flatMap { indat =>
      (0 until nBeats).map(i => indat(outW * (i + 1) - 1, outW * i)) })

    val head = Reg(init = UInt(0, log2Up(n / nBeats)))
    val tail = Reg(init = UInt(0, log2Up(n)))
    val size = Reg(init = UInt(0, log2Up(n + 1)))

    when (io.in.fire()) {
      wdata(head) := io.in.bits
      head := head + UInt(1)
    }

    when (io.out.fire()) { tail := tail + UInt(1) }

    size := MuxCase(size, Seq(
      (io.in.fire() && io.out.fire()) -> (size + UInt(nBeats - 1)),
      io.in.fire() -> (size + UInt(nBeats)),
      io.out.fire() -> (size - UInt(1))))

    io.out.valid := size > UInt(0)
    io.out.bits := rdata(tail)
    io.in.ready := size < UInt(n - nBeats + 1)
    io.count := size
  } else {
    val nBeats = outW / inW

    require(outW % inW == 0, s"MultiWidthFifo: out: $outW not divisible by in: $inW")

    val wdata = Reg(Vec(n * nBeats, Bits(width = inW)))
    val rdata = Vec.tabulate(n) { i =>
      Cat(wdata.slice(i * nBeats, (i + 1) * nBeats).reverse)}

    val head = Reg(init = UInt(0, log2Up(n * nBeats)))
    val tail = Reg(init = UInt(0, log2Up(n)))
    val size = Reg(init = UInt(0, log2Up(n * nBeats + 1)))

    when (io.in.fire()) {
      wdata(head) := io.in.bits
      head := head + UInt(1)
    }

    when (io.out.fire()) { tail := tail + UInt(1) }

    size := MuxCase(size, Seq(
      (io.in.fire() && io.out.fire()) -> (size - UInt(nBeats - 1)),
      io.in.fire() -> (size + UInt(1)),
      io.out.fire() -> (size - UInt(nBeats))))

    io.count := size >> UInt(log2Up(nBeats))
    io.out.valid := io.count > UInt(0)
    io.out.bits := rdata(tail)
    io.in.ready := size < UInt(n * nBeats)
  }
}

class MultiWidthFifoTest extends UnitTest {
  val big2little = Module(new MultiWidthFifo(16, 8, 8))
  val little2big = Module(new MultiWidthFifo(8, 16, 4))

  val bl_send = Reg(init = false.B)
  val lb_send = Reg(init = false.B)
  val bl_recv = Reg(init = false.B)
  val lb_recv = Reg(init = false.B)
  val bl_finished = Reg(init = false.B)
  val lb_finished = Reg(init = false.B)

  val bl_data = Vec.tabulate(4){i => UInt((2 * i + 1) * 256 + 2 * i, 16)}
  val lb_data = Vec.tabulate(8){i => UInt(i, 8)}

  val (bl_send_cnt, bl_send_done) = Counter(big2little.io.in.fire(), 4)
  val (lb_send_cnt, lb_send_done) = Counter(little2big.io.in.fire(), 8)

  val (bl_recv_cnt, bl_recv_done) = Counter(big2little.io.out.fire(), 8)
  val (lb_recv_cnt, lb_recv_done) = Counter(little2big.io.out.fire(), 4)

  big2little.io.in.valid := bl_send
  big2little.io.in.bits := bl_data(bl_send_cnt)
  big2little.io.out.ready := bl_recv

  little2big.io.in.valid := lb_send
  little2big.io.in.bits := lb_data(lb_send_cnt)
  little2big.io.out.ready := lb_recv

  val bl_recv_data_idx = bl_recv_cnt >> UInt(1)
  val bl_recv_data = Mux(bl_recv_cnt(0),
    bl_data(bl_recv_data_idx)(15, 8),
    bl_data(bl_recv_data_idx)(7, 0))

  val lb_recv_data = Cat(
    lb_data(Cat(lb_recv_cnt, UInt(1, 1))),
    lb_data(Cat(lb_recv_cnt, UInt(0, 1))))

  when (io.start) {
    bl_send := true.B
    lb_send := true.B
  }

  when (bl_send_done) {
    bl_send := false.B
    bl_recv := true.B
  }

  when (lb_send_done) {
    lb_send := false.B
    lb_recv := true.B
  }

  when (bl_recv_done) {
    bl_recv := false.B
    bl_finished := true.B
  }

  when (lb_recv_done) {
    lb_recv := false.B
    lb_finished := true.B
  }

  io.finished := bl_finished && lb_finished

  val bl_start_recv = Reg(next = bl_send_done)
  val lb_start_recv = Reg(next = lb_send_done)

  assert(!little2big.io.out.valid || little2big.io.out.bits === lb_recv_data,
    "Little to Big data mismatch")
  assert(!big2little.io.out.valid || big2little.io.out.bits === bl_recv_data,
    "Big to Little data mismatch")

  assert(!lb_start_recv || little2big.io.count === UInt(4),
    "Little to Big count incorrect")
  assert(!bl_start_recv || big2little.io.count === UInt(8),
    "Big to Little count incorrect")
}
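The unit test above expects the wide-to-narrow FIFO to emit the low-order slice of each input word first, since `rdata` is assembled from bit ranges starting at `outW * i`. A quick Python model of that repacking order (a sketch of the data ordering only, not of the FIFO's flow control), using the same `bl_data` pattern as the test, `(2*i + 1) * 256 + 2*i`:

```python
def wide_to_narrow(words, in_w=16, out_w=8):
    # Split each in_w-bit word into in_w/out_w beats,
    # least-significant slice first, matching how rdata is
    # built in MultiWidthFifo's inW > outW branch.
    beats = []
    mask = (1 << out_w) - 1
    for word in words:
        for i in range(in_w // out_w):
            beats.append((word >> (out_w * i)) & mask)
    return beats

# High byte 2i+1, low byte 2i, as in the Chisel test's bl_data.
bl_data = [(2 * i + 1) * 256 + 2 * i for i in range(4)]
assert wide_to_narrow(bl_data) == list(range(8))
```

Emitting the bytes 0 through 7 in order is exactly what `bl_recv_data` in the test checks: even counts select bits 7:0, odd counts select bits 15:8.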
@@ -0,0 +1,117 @@
package junctions

import Chisel._
import freechips.rocketchip.config.Parameters

class ReorderQueueWrite[T <: Data](dType: T, tagWidth: Int) extends Bundle {
  val data = dType.cloneType
  val tag = UInt(width = tagWidth)

  override def cloneType =
    new ReorderQueueWrite(dType, tagWidth).asInstanceOf[this.type]
}

class ReorderEnqueueIO[T <: Data](dType: T, tagWidth: Int)
    extends DecoupledIO(new ReorderQueueWrite(dType, tagWidth)) {

  override def cloneType =
    new ReorderEnqueueIO(dType, tagWidth).asInstanceOf[this.type]
}

class ReorderDequeueIO[T <: Data](dType: T, tagWidth: Int) extends Bundle {
  val valid = Bool(INPUT)
  val tag = UInt(INPUT, tagWidth)
  val data = dType.cloneType.asOutput
  val matches = Bool(OUTPUT)

  override def cloneType =
    new ReorderDequeueIO(dType, tagWidth).asInstanceOf[this.type]
}

class ReorderQueue[T <: Data](dType: T, tagWidth: Int,
    size: Option[Int] = None, nDeq: Int = 1)
    extends Module {
  val io = new Bundle {
    val enq = new ReorderEnqueueIO(dType, tagWidth).flip
    val deq = Vec(nDeq, new ReorderDequeueIO(dType, tagWidth))
  }

  val tagSpaceSize = 1 << tagWidth
  val actualSize = size.getOrElse(tagSpaceSize)

  if (tagSpaceSize > actualSize) {
    require(tagSpaceSize % actualSize == 0)

    val smallTagSize = log2Ceil(actualSize)

    val roq_data = Reg(Vec(actualSize, dType))
    val roq_tags = Reg(Vec(actualSize, UInt(width = tagWidth - smallTagSize)))
    val roq_free = Reg(init = Vec.fill(actualSize)(true.B))
    val roq_enq_addr = io.enq.bits.tag(smallTagSize-1, 0)

    io.enq.ready := roq_free(roq_enq_addr)

    when (io.enq.valid && io.enq.ready) {
      roq_data(roq_enq_addr) := io.enq.bits.data
      roq_tags(roq_enq_addr) := io.enq.bits.tag >> smallTagSize.U
      roq_free(roq_enq_addr) := false.B
    }

    io.deq.foreach { deq =>
      val roq_deq_addr = deq.tag(smallTagSize-1, 0)

      deq.data := roq_data(roq_deq_addr)
      deq.matches := !roq_free(roq_deq_addr) && roq_tags(roq_deq_addr) === (deq.tag >> smallTagSize.U)

      when (deq.valid) {
        roq_free(roq_deq_addr) := true.B
      }
    }
  } else if (tagSpaceSize == actualSize) {
    val roq_data = Mem(tagSpaceSize, dType)
    val roq_free = Reg(init = Vec.fill(tagSpaceSize)(true.B))

    io.enq.ready := roq_free(io.enq.bits.tag)

    when (io.enq.valid && io.enq.ready) {
      roq_data(io.enq.bits.tag) := io.enq.bits.data
      roq_free(io.enq.bits.tag) := false.B
    }

    io.deq.foreach { deq =>
      deq.data := roq_data(deq.tag)
      deq.matches := !roq_free(deq.tag)

      when (deq.valid) {
        roq_free(deq.tag) := true.B
      }
    }
  } else {
    require(actualSize % tagSpaceSize == 0)

    val qDepth = actualSize / tagSpaceSize
    val queues = Seq.fill(tagSpaceSize) {
      Module(new Queue(dType, qDepth))
    }

    io.enq.ready := false.B
    io.deq.foreach(_.matches := false.B)
    io.deq.foreach(_.data := dType.fromBits(UInt(0)))

    for ((q, i) <- queues.zipWithIndex) {
      when (io.enq.bits.tag === UInt(i)) { io.enq.ready := q.io.enq.ready }
      q.io.enq.valid := io.enq.valid && io.enq.bits.tag === UInt(i)
      q.io.enq.bits := io.enq.bits.data

      val deqReadys = Wire(Vec(nDeq, Bool()))
      io.deq.zip(deqReadys).foreach { case (deq, rdy) =>
        when (deq.tag === UInt(i)) {
          deq.matches := q.io.deq.valid
          deq.data := q.io.deq.bits
        }
        rdy := deq.valid && deq.tag === UInt(i)
      }
      q.io.deq.ready := deqReadys.reduce(_ || _)
    }
  }
}
@@ -0,0 +1,148 @@
// See LICENSE.SiFive for license details.
// See LICENSE.Berkeley for license details.

package junctions

import Chisel._
import freechips.rocketchip.config._
import scala.collection.mutable.HashMap

case class MemAttr(prot: Int, cacheable: Boolean = false)

sealed abstract class MemRegion {
  def start: BigInt
  def size: BigInt
  def numSlaves: Int
  def attr: MemAttr

  def containsAddress(x: UInt) = UInt(start) <= x && x < UInt(start + size)
}

case class MemSize(size: BigInt, attr: MemAttr) extends MemRegion {
  def start = 0
  def numSlaves = 1
}

case class MemRange(start: BigInt, size: BigInt, attr: MemAttr) extends MemRegion {
  def numSlaves = 1
}

object AddrMapProt {
  val R = 0x1
  val W = 0x2
  val X = 0x4
  val RW = R | W
  val RX = R | X
  val RWX = R | W | X
  val SZ = 3
}

class AddrMapProt extends Bundle {
  val x = Bool()
  val w = Bool()
  val r = Bool()
}

case class AddrMapEntry(name: String, region: MemRegion)

object AddrMap {
  def apply(elems: AddrMapEntry*): AddrMap = new AddrMap(elems)
}

class AddrMap(
    entriesIn: Seq[AddrMapEntry],
    val start: BigInt = BigInt(0),
    val collapse: Boolean = false) extends MemRegion {
  private val slavePorts = HashMap[String, Int]()
  private val mapping = HashMap[String, MemRegion]()

  def isEmpty = entries.isEmpty
  def length = entries.size
  def numSlaves = slavePorts.size

  val (size: BigInt, entries: Seq[AddrMapEntry], attr: MemAttr) = {
    var ind = 0
    var base = start
    var rebasedEntries = collection.mutable.ArrayBuffer[AddrMapEntry]()
    var prot = 0
    var cacheable = true
    for (AddrMapEntry(name, r) <- entriesIn) {
      require (!mapping.contains(name))
      base = r.start

      r match {
        case r: AddrMap =>
          val subMap = new AddrMap(r.entries, base, r.collapse)
          rebasedEntries += AddrMapEntry(name, subMap)
          mapping += name -> subMap
          mapping ++= subMap.mapping.map { case (k, v) => s"$name:$k" -> v }
          if (r.collapse) {
            slavePorts += (name -> ind)
            ind += 1
          } else {
            slavePorts ++= subMap.slavePorts.map {
              case (k, v) => s"$name:$k" -> (ind + v)
            }
            ind += r.numSlaves
          }
        case _ =>
          val e = MemRange(base, r.size, r.attr)
          rebasedEntries += AddrMapEntry(name, e)
          mapping += name -> e
          slavePorts += name -> ind
          ind += r.numSlaves
      }

      base += r.size
      prot |= r.attr.prot
      cacheable &&= r.attr.cacheable
    }
    (base - start, rebasedEntries, MemAttr(prot, cacheable))
  }

  val flatten: Seq[AddrMapEntry] = {
    mapping.toSeq.map {
      case (name, range: MemRange) => Some(AddrMapEntry(name, range))
      case _ => None
    }.flatten.sortBy(_.region.start)
  }

  // checks to see whether any MemRange overlaps within this AddrMap
  flatten.combinations(2) foreach {
    case (Seq(AddrMapEntry(an, ar), AddrMapEntry(bn, br))) =>
      val arEnd = ar.start + ar.size
      val brEnd = br.start + br.size
      val abOverlaps = ar.start < brEnd && br.start < arEnd
      require(!abOverlaps,
        s"region $an@0x${ar.start.toString(16)} overlaps region $bn@0x${br.start.toString(16)}")
  }

  def toRange: MemRange = MemRange(start, size, attr)
  def apply(name: String): MemRegion = mapping(name)
  def contains(name: String): Boolean = mapping.contains(name)
  def port(name: String): Int = slavePorts(name)
  def subMap(name: String): AddrMap = mapping(name).asInstanceOf[AddrMap]
  def isInRegion(name: String, addr: UInt): Bool = mapping(name).containsAddress(addr)

  def isCacheable(addr: UInt): Bool = {
    flatten.filter(_.region.attr.cacheable).map(
      _.region.containsAddress(addr)
    ).foldLeft(false.B)(_ || _)
  }

  def isValid(addr: UInt): Bool = {
    flatten.map(_.region.containsAddress(addr)).foldLeft(false.B)(_ || _)
  }

  def getProt(addr: UInt): AddrMapProt = {
    val protForRegion = flatten.map { entry =>
      Mux(entry.region.containsAddress(addr),
        UInt(entry.region.attr.prot, AddrMapProt.SZ), UInt(0))
    }
    new AddrMapProt().fromBits(protForRegion.reduce(_|_))
  }

  override def containsAddress(x: UInt) = {
    flatten.map(_.region.containsAddress(x)).reduce(_||_)
  }
}
@@ -0,0 +1,610 @@
/// See LICENSE for license details.

package junctions

import Chisel._
import scala.math.{min, max}
import scala.collection.mutable.ArraySeq
import freechips.rocketchip.util.{DecoupledHelper, ParameterizedBundle, HellaPeekingArbiter}
import freechips.rocketchip.config.{Parameters, Field}

case object NastiKey extends Field[NastiParameters]

case class NastiParameters(dataBits: Int, addrBits: Int, idBits: Int)

trait HasNastiParameters {
  implicit val p: Parameters
  val nastiExternal = p(NastiKey)
  val nastiXDataBits = nastiExternal.dataBits
  val nastiWStrobeBits = nastiXDataBits / 8
  val nastiXAddrBits = nastiExternal.addrBits
  val nastiWIdBits = nastiExternal.idBits
  val nastiRIdBits = nastiExternal.idBits
  val nastiXIdBits = max(nastiWIdBits, nastiRIdBits)
  val nastiXUserBits = 1
  val nastiAWUserBits = nastiXUserBits
  val nastiWUserBits = nastiXUserBits
  val nastiBUserBits = nastiXUserBits
  val nastiARUserBits = nastiXUserBits
  val nastiRUserBits = nastiXUserBits
  val nastiXLenBits = 8
  val nastiXSizeBits = 3
  val nastiXBurstBits = 2
  val nastiXCacheBits = 4
  val nastiXProtBits = 3
  val nastiXQosBits = 4
  val nastiXRegionBits = 4
  val nastiXRespBits = 2

  def bytesToXSize(bytes: UInt) = MuxLookup(bytes, UInt("b111"), Array(
    UInt(1) -> UInt(0),
    UInt(2) -> UInt(1),
    UInt(4) -> UInt(2),
    UInt(8) -> UInt(3),
    UInt(16) -> UInt(4),
    UInt(32) -> UInt(5),
    UInt(64) -> UInt(6),
    UInt(128) -> UInt(7)))
}

abstract class NastiModule(implicit val p: Parameters) extends Module
  with HasNastiParameters
abstract class NastiBundle(implicit val p: Parameters) extends ParameterizedBundle()(p)
  with HasNastiParameters

abstract class NastiChannel(implicit p: Parameters) extends NastiBundle()(p)
abstract class NastiMasterToSlaveChannel(implicit p: Parameters) extends NastiChannel()(p)
abstract class NastiSlaveToMasterChannel(implicit p: Parameters) extends NastiChannel()(p)

trait HasNastiMetadata extends HasNastiParameters {
  val addr = UInt(width = nastiXAddrBits)
  val len = UInt(width = nastiXLenBits)
  val size = UInt(width = nastiXSizeBits)
  val burst = UInt(width = nastiXBurstBits)
  val lock = Bool()
  val cache = UInt(width = nastiXCacheBits)
  val prot = UInt(width = nastiXProtBits)
  val qos = UInt(width = nastiXQosBits)
  val region = UInt(width = nastiXRegionBits)
}

trait HasNastiData extends HasNastiParameters {
  val data = UInt(width = nastiXDataBits)
  val last = Bool()
}

class NastiReadIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
  val ar = Decoupled(new NastiReadAddressChannel)
  val r = Decoupled(new NastiReadDataChannel).flip
}

class NastiWriteIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
  val aw = Decoupled(new NastiWriteAddressChannel)
  val w = Decoupled(new NastiWriteDataChannel)
  val b = Decoupled(new NastiWriteResponseChannel).flip
}

class NastiIO(implicit p: Parameters) extends NastiBundle()(p) {
  val aw = Decoupled(new NastiWriteAddressChannel)
  val w = Decoupled(new NastiWriteDataChannel)
  val b = Decoupled(new NastiWriteResponseChannel).flip
  val ar = Decoupled(new NastiReadAddressChannel)
  val r = Decoupled(new NastiReadDataChannel).flip
}

class NastiAddressChannel(implicit p: Parameters) extends NastiMasterToSlaveChannel()(p)
  with HasNastiMetadata

class NastiResponseChannel(implicit p: Parameters) extends NastiSlaveToMasterChannel()(p) {
  val resp = UInt(width = nastiXRespBits)
}

class NastiWriteAddressChannel(implicit p: Parameters) extends NastiAddressChannel()(p) {
  val id = UInt(width = nastiWIdBits)
  val user = UInt(width = nastiAWUserBits)
}

class NastiWriteDataChannel(implicit p: Parameters) extends NastiMasterToSlaveChannel()(p)
    with HasNastiData {
  val id = UInt(width = nastiWIdBits)
  val strb = UInt(width = nastiWStrobeBits)
  val user = UInt(width = nastiWUserBits)
}

class NastiWriteResponseChannel(implicit p: Parameters) extends NastiResponseChannel()(p) {
  val id = UInt(width = nastiWIdBits)
  val user = UInt(width = nastiBUserBits)
}

class NastiReadAddressChannel(implicit p: Parameters) extends NastiAddressChannel()(p) {
  val id = UInt(width = nastiRIdBits)
  val user = UInt(width = nastiARUserBits)
}

class NastiReadDataChannel(implicit p: Parameters) extends NastiResponseChannel()(p)
    with HasNastiData {
  val id = UInt(width = nastiRIdBits)
  val user = UInt(width = nastiRUserBits)
}

object NastiConstants {
  val BURST_FIXED = UInt("b00")
  val BURST_INCR = UInt("b01")
  val BURST_WRAP = UInt("b10")

  val RESP_OKAY = UInt("b00")
  val RESP_EXOKAY = UInt("b01")
  val RESP_SLVERR = UInt("b10")
  val RESP_DECERR = UInt("b11")

  val CACHE_DEVICE_NOBUF = UInt("b0000")
  val CACHE_DEVICE_BUF = UInt("b0001")
  val CACHE_NORMAL_NOCACHE_NOBUF = UInt("b0010")
  val CACHE_NORMAL_NOCACHE_BUF = UInt("b0011")

  def AXPROT(instruction: Bool, nonsecure: Bool, privileged: Bool): UInt =
    Cat(instruction, nonsecure, privileged)

  def AXPROT(instruction: Boolean, nonsecure: Boolean, privileged: Boolean): UInt =
    AXPROT(Bool(instruction), Bool(nonsecure), Bool(privileged))
}

import NastiConstants._

object NastiWriteAddressChannel {
  def apply(id: UInt, addr: UInt, size: UInt,
      len: UInt = UInt(0), burst: UInt = BURST_INCR)
      (implicit p: Parameters) = {
    val aw = Wire(new NastiWriteAddressChannel)
    aw.id := id
    aw.addr := addr
    aw.len := len
    aw.size := size
    aw.burst := burst
    aw.lock := false.B
    aw.cache := CACHE_DEVICE_NOBUF
    aw.prot := AXPROT(false, false, false)
    aw.qos := UInt("b0000")
    aw.region := UInt("b0000")
    aw.user := UInt(0)
    aw
  }
}

object NastiReadAddressChannel {
  def apply(id: UInt, addr: UInt, size: UInt,
      len: UInt = UInt(0), burst: UInt = BURST_INCR)
      (implicit p: Parameters) = {
    val ar = Wire(new NastiReadAddressChannel)
    ar.id := id
    ar.addr := addr
    ar.len := len
    ar.size := size
    ar.burst := burst
    ar.lock := false.B
    ar.cache := CACHE_DEVICE_NOBUF
    ar.prot := AXPROT(false, false, false)
    ar.qos := UInt(0)
    ar.region := UInt(0)
    ar.user := UInt(0)
    ar
  }
}

object NastiWriteDataChannel {
  def apply(data: UInt, strb: Option[UInt] = None,
      last: Bool = true.B, id: UInt = UInt(0))
      (implicit p: Parameters): NastiWriteDataChannel = {
    val w = Wire(new NastiWriteDataChannel)
    w.strb := strb.getOrElse(Fill(w.nastiWStrobeBits, UInt(1, 1)))
    w.data := data
    w.last := last
    w.id := id
    w.user := UInt(0)
|
||||
w
|
||||
}
|
||||
}
|
||||
|
||||
object NastiReadDataChannel {
|
||||
def apply(id: UInt, data: UInt, last: Bool = true.B, resp: UInt = UInt(0))(
|
||||
implicit p: Parameters) = {
|
||||
val r = Wire(new NastiReadDataChannel)
|
||||
r.id := id
|
||||
r.data := data
|
||||
r.last := last
|
||||
r.resp := resp
|
||||
r.user := UInt(0)
|
||||
r
|
||||
}
|
||||
}
|
||||
|
||||
object NastiWriteResponseChannel {
|
||||
def apply(id: UInt, resp: UInt = UInt(0))(implicit p: Parameters) = {
|
||||
val b = Wire(new NastiWriteResponseChannel)
|
||||
b.id := id
|
||||
b.resp := resp
|
||||
b.user := UInt(0)
|
||||
b
|
||||
}
|
||||
}
|
||||
|
||||
class NastiQueue(depth: Int)(implicit p: Parameters) extends Module {
|
||||
val io = new Bundle {
|
||||
val in = (new NastiIO).flip
|
||||
val out = new NastiIO
|
||||
}
|
||||
|
||||
io.out.ar <> Queue(io.in.ar, depth)
|
||||
io.out.aw <> Queue(io.in.aw, depth)
|
||||
io.out.w <> Queue(io.in.w, depth)
|
||||
io.in.r <> Queue(io.out.r, depth)
|
||||
io.in.b <> Queue(io.out.b, depth)
|
||||
}
|
||||
|
||||
object NastiQueue {
|
||||
def apply(in: NastiIO, depth: Int = 2)(implicit p: Parameters): NastiIO = {
|
||||
val queue = Module(new NastiQueue(depth))
|
||||
queue.io.in <> in
|
||||
queue.io.out
|
||||
}
|
||||
}
|
||||
|
||||
class NastiArbiterIO(arbN: Int)(implicit p: Parameters) extends Bundle {
|
||||
val master = Vec(arbN, new NastiIO).flip
|
||||
val slave = new NastiIO
|
||||
override def cloneType =
|
||||
new NastiArbiterIO(arbN).asInstanceOf[this.type]
|
||||
}
|
||||
|
||||
/** Arbitrate among arbN masters requesting to a single slave */
|
||||
class NastiArbiter(val arbN: Int)(implicit p: Parameters) extends NastiModule {
|
||||
val io = new NastiArbiterIO(arbN)
|
||||
|
||||
if (arbN > 1) {
|
||||
val arbIdBits = log2Up(arbN)
|
||||
|
||||
val ar_arb = Module(new RRArbiter(new NastiReadAddressChannel, arbN))
|
||||
val aw_arb = Module(new RRArbiter(new NastiWriteAddressChannel, arbN))
|
||||
|
||||
val w_chosen = Reg(UInt(width = arbIdBits))
|
||||
val w_done = Reg(init = true.B)
|
||||
|
||||
when (aw_arb.io.out.fire()) {
|
||||
w_chosen := aw_arb.io.chosen
|
||||
w_done := false.B
|
||||
}
|
||||
|
||||
when (io.slave.w.fire() && io.slave.w.bits.last) {
|
||||
w_done := true.B
|
||||
}
|
||||
|
||||
val queueSize = min((1 << nastiXIdBits) * arbN, 64)
|
||||
|
||||
val rroq = Module(new ReorderQueue(
|
||||
UInt(width = arbIdBits), nastiXIdBits, Some(queueSize)))
|
||||
|
||||
val wroq = Module(new ReorderQueue(
|
||||
UInt(width = arbIdBits), nastiXIdBits, Some(queueSize)))
|
||||
|
||||
for (i <- 0 until arbN) {
|
||||
val m_ar = io.master(i).ar
|
||||
val m_aw = io.master(i).aw
|
||||
val m_r = io.master(i).r
|
||||
val m_b = io.master(i).b
|
||||
val a_ar = ar_arb.io.in(i)
|
||||
val a_aw = aw_arb.io.in(i)
|
||||
val m_w = io.master(i).w
|
||||
|
||||
a_ar <> m_ar
|
||||
a_aw <> m_aw
|
||||
|
||||
m_r.valid := io.slave.r.valid && rroq.io.deq.head.data === UInt(i)
|
||||
m_r.bits := io.slave.r.bits
|
||||
|
||||
m_b.valid := io.slave.b.valid && wroq.io.deq.head.data === UInt(i)
|
||||
m_b.bits := io.slave.b.bits
|
||||
|
||||
m_w.ready := io.slave.w.ready && w_chosen === UInt(i) && !w_done
|
||||
}
|
||||
|
||||
io.slave.r.ready := io.master(rroq.io.deq.head.data).r.ready
|
||||
io.slave.b.ready := io.master(wroq.io.deq.head.data).b.ready
|
||||
|
||||
rroq.io.deq.head.tag := io.slave.r.bits.id
|
||||
rroq.io.deq.head.valid := io.slave.r.fire() && io.slave.r.bits.last
|
||||
wroq.io.deq.head.tag := io.slave.b.bits.id
|
||||
wroq.io.deq.head.valid := io.slave.b.fire()
|
||||
|
||||
assert(!rroq.io.deq.head.valid || rroq.io.deq.head.matches,
|
||||
"NastiArbiter: read response mismatch")
|
||||
assert(!wroq.io.deq.head.valid || wroq.io.deq.head.matches,
|
||||
"NastiArbiter: write response mismatch")
|
||||
|
||||
io.slave.w.bits := io.master(w_chosen).w.bits
|
||||
io.slave.w.valid := io.master(w_chosen).w.valid && !w_done
|
||||
|
||||
val ar_helper = DecoupledHelper(
|
||||
ar_arb.io.out.valid,
|
||||
io.slave.ar.ready,
|
||||
rroq.io.enq.ready)
|
||||
|
||||
io.slave.ar.valid := ar_helper.fire(io.slave.ar.ready)
|
||||
io.slave.ar.bits := ar_arb.io.out.bits
|
||||
ar_arb.io.out.ready := ar_helper.fire(ar_arb.io.out.valid)
|
||||
rroq.io.enq.valid := ar_helper.fire(rroq.io.enq.ready)
|
||||
rroq.io.enq.bits.tag := ar_arb.io.out.bits.id
|
||||
rroq.io.enq.bits.data := ar_arb.io.chosen
|
||||
|
||||
val aw_helper = DecoupledHelper(
|
||||
aw_arb.io.out.valid,
|
||||
io.slave.aw.ready,
|
||||
wroq.io.enq.ready)
|
||||
|
||||
io.slave.aw.bits <> aw_arb.io.out.bits
|
||||
io.slave.aw.valid := aw_helper.fire(io.slave.aw.ready, w_done)
|
||||
aw_arb.io.out.ready := aw_helper.fire(aw_arb.io.out.valid, w_done)
|
||||
wroq.io.enq.valid := aw_helper.fire(wroq.io.enq.ready, w_done)
|
||||
wroq.io.enq.bits.tag := aw_arb.io.out.bits.id
|
||||
wroq.io.enq.bits.data := aw_arb.io.chosen
|
||||
|
||||
} else { io.slave <> io.master.head }
|
||||
}
|
||||
|
||||
/** A slave that send decode error for every request it receives */
|
||||
class NastiErrorSlave(implicit p: Parameters) extends NastiModule {
|
||||
val io = (new NastiIO).flip
|
||||
|
||||
when (io.ar.fire()) { printf("Invalid read address %x\n", io.ar.bits.addr) }
|
||||
when (io.aw.fire()) { printf("Invalid write address %x\n", io.aw.bits.addr) }
|
||||
|
||||
val r_queue = Module(new Queue(new NastiReadAddressChannel, 1))
|
||||
r_queue.io.enq <> io.ar
|
||||
|
||||
val responding = Reg(init = false.B)
|
||||
val beats_left = Reg(init = UInt(0, nastiXLenBits))
|
||||
|
||||
when (!responding && r_queue.io.deq.valid) {
|
||||
responding := true.B
|
||||
beats_left := r_queue.io.deq.bits.len
|
||||
}
|
||||
|
||||
io.r.valid := r_queue.io.deq.valid && responding
|
||||
io.r.bits.id := r_queue.io.deq.bits.id
|
||||
io.r.bits.data := UInt(0)
|
||||
io.r.bits.resp := RESP_DECERR
|
||||
io.r.bits.last := beats_left === UInt(0)
|
||||
|
||||
r_queue.io.deq.ready := io.r.fire() && io.r.bits.last
|
||||
|
||||
when (io.r.fire()) {
|
||||
when (beats_left === UInt(0)) {
|
||||
responding := false.B
|
||||
} .otherwise {
|
||||
beats_left := beats_left - UInt(1)
|
||||
}
|
||||
}
|
||||
|
||||
val draining = Reg(init = false.B)
|
||||
io.w.ready := draining
|
||||
|
||||
when (io.aw.fire()) { draining := true.B }
|
||||
when (io.w.fire() && io.w.bits.last) { draining := false.B }
|
||||
|
||||
val b_queue = Module(new Queue(UInt(width = nastiWIdBits), 1))
|
||||
b_queue.io.enq.valid := io.aw.valid && !draining
|
||||
b_queue.io.enq.bits := io.aw.bits.id
|
||||
io.aw.ready := b_queue.io.enq.ready && !draining
|
||||
io.b.valid := b_queue.io.deq.valid && !draining
|
||||
io.b.bits.id := b_queue.io.deq.bits
|
||||
io.b.bits.resp := RESP_DECERR
|
||||
b_queue.io.deq.ready := io.b.ready && !draining
|
||||
}
|
||||
|
||||
class NastiRouterIO(nSlaves: Int)(implicit p: Parameters) extends Bundle {
|
||||
val master = (new NastiIO).flip
|
||||
val slave = Vec(nSlaves, new NastiIO)
|
||||
override def cloneType =
|
||||
new NastiRouterIO(nSlaves).asInstanceOf[this.type]
|
||||
}
|
||||
|
||||
/** Take a single Nasti master and route its requests to various slaves
|
||||
* @param nSlaves the number of slaves
|
||||
* @param routeSel a function which takes an address and produces
|
||||
* a one-hot encoded selection of the slave to write to */
|
||||
class NastiRouter(nSlaves: Int, routeSel: UInt => UInt)(implicit p: Parameters)
|
||||
extends NastiModule {
|
||||
|
||||
val io = new NastiRouterIO(nSlaves)
|
||||
|
||||
val ar_route = routeSel(io.master.ar.bits.addr)
|
||||
val aw_route = routeSel(io.master.aw.bits.addr)
|
||||
|
||||
val ar_ready = Wire(init = false.B)
|
||||
val aw_ready = Wire(init = false.B)
|
||||
val w_ready = Wire(init = false.B)
|
||||
|
||||
val queueSize = min((1 << nastiXIdBits) * nSlaves, 64)
|
||||
|
||||
// These reorder queues remember which slave ports requests were sent on
|
||||
// so that the responses can be sent back in-order on the master
|
||||
val ar_queue = Module(new ReorderQueue(
|
||||
UInt(width = log2Up(nSlaves + 1)), nastiXIdBits,
|
||||
Some(queueSize), nSlaves + 1))
|
||||
val aw_queue = Module(new ReorderQueue(
|
||||
UInt(width = log2Up(nSlaves + 1)), nastiXIdBits,
|
||||
Some(queueSize), nSlaves + 1))
|
||||
// This queue holds the accepted aw_routes so that we know how to route the
|
||||
val w_queue = Module(new Queue(aw_route, nSlaves))
|
||||
|
||||
val ar_helper = DecoupledHelper(
|
||||
io.master.ar.valid,
|
||||
ar_queue.io.enq.ready,
|
||||
ar_ready)
|
||||
|
||||
val aw_helper = DecoupledHelper(
|
||||
io.master.aw.valid,
|
||||
w_queue.io.enq.ready,
|
||||
aw_queue.io.enq.ready,
|
||||
aw_ready)
|
||||
|
||||
val w_helper = DecoupledHelper(
|
||||
io.master.w.valid,
|
||||
w_queue.io.deq.valid,
|
||||
w_ready)
|
||||
|
||||
def routeEncode(oh: UInt): UInt = Mux(oh.orR, OHToUInt(oh), UInt(nSlaves))
|
||||
|
||||
ar_queue.io.enq.valid := ar_helper.fire(ar_queue.io.enq.ready)
|
||||
ar_queue.io.enq.bits.tag := io.master.ar.bits.id
|
||||
ar_queue.io.enq.bits.data := routeEncode(ar_route)
|
||||
|
||||
aw_queue.io.enq.valid := aw_helper.fire(aw_queue.io.enq.ready)
|
||||
aw_queue.io.enq.bits.tag := io.master.aw.bits.id
|
||||
aw_queue.io.enq.bits.data := routeEncode(aw_route)
|
||||
|
||||
w_queue.io.enq.valid := aw_helper.fire(w_queue.io.enq.ready)
|
||||
w_queue.io.enq.bits := aw_route
|
||||
w_queue.io.deq.ready := w_helper.fire(w_queue.io.deq.valid, io.master.w.bits.last)
|
||||
|
||||
io.master.ar.ready := ar_helper.fire(io.master.ar.valid)
|
||||
io.master.aw.ready := aw_helper.fire(io.master.aw.valid)
|
||||
io.master.w.ready := w_helper.fire(io.master.w.valid)
|
||||
|
||||
val ar_valid = ar_helper.fire(ar_ready)
|
||||
val aw_valid = aw_helper.fire(aw_ready)
|
||||
val w_valid = w_helper.fire(w_ready)
|
||||
val w_route = w_queue.io.deq.bits
|
||||
|
||||
io.slave.zipWithIndex.foreach { case (s, i) =>
|
||||
s.ar.valid := ar_valid && ar_route(i)
|
||||
s.ar.bits := io.master.ar.bits
|
||||
when (ar_route(i)) { ar_ready := s.ar.ready }
|
||||
|
||||
s.aw.valid := aw_valid && aw_route(i)
|
||||
s.aw.bits := io.master.aw.bits
|
||||
when (aw_route(i)) { aw_ready := s.aw.ready }
|
||||
|
||||
s.w.valid := w_valid && w_route(i)
|
||||
s.w.bits := io.master.w.bits
|
||||
when (w_route(i)) { w_ready := s.w.ready }
|
||||
}
|
||||
|
||||
val ar_noroute = !ar_route.orR
|
||||
val aw_noroute = !aw_route.orR
|
||||
val w_noroute = !w_route.orR
|
||||
|
||||
val err_slave = Module(new NastiErrorSlave)
|
||||
err_slave.io.ar.valid := ar_valid && ar_noroute
|
||||
err_slave.io.ar.bits := io.master.ar.bits
|
||||
err_slave.io.aw.valid := aw_valid && aw_noroute
|
||||
err_slave.io.aw.bits := io.master.aw.bits
|
||||
err_slave.io.w.valid := w_valid && w_noroute
|
||||
err_slave.io.w.bits := io.master.w.bits
|
||||
|
||||
when (ar_noroute) { ar_ready := err_slave.io.ar.ready }
|
||||
when (aw_noroute) { aw_ready := err_slave.io.aw.ready }
|
||||
when (w_noroute) { w_ready := err_slave.io.w.ready }
|
||||
|
||||
val b_arb = Module(new RRArbiter(new NastiWriteResponseChannel, nSlaves + 1))
|
||||
val r_arb = Module(new HellaPeekingArbiter(
|
||||
new NastiReadDataChannel, nSlaves + 1,
|
||||
// we can unlock if it's the last beat
|
||||
(r: NastiReadDataChannel) => r.last, rr = true))
|
||||
|
||||
val all_slaves = io.slave :+ err_slave.io
|
||||
|
||||
for (i <- 0 to nSlaves) {
|
||||
b_arb.io.in(i) <> all_slaves(i).b
|
||||
aw_queue.io.deq(i).valid := all_slaves(i).b.fire()
|
||||
aw_queue.io.deq(i).tag := all_slaves(i).b.bits.id
|
||||
|
||||
r_arb.io.in(i) <> all_slaves(i).r
|
||||
ar_queue.io.deq(i).valid := all_slaves(i).r.fire() && all_slaves(i).r.bits.last
|
||||
ar_queue.io.deq(i).tag := all_slaves(i).r.bits.id
|
||||
|
||||
assert(!aw_queue.io.deq(i).valid || aw_queue.io.deq(i).matches,
|
||||
s"aw_queue $i tried to dequeue untracked transaction")
|
||||
assert(!ar_queue.io.deq(i).valid || ar_queue.io.deq(i).matches,
|
||||
s"ar_queue $i tried to dequeue untracked transaction")
|
||||
}
|
||||
|
||||
io.master.b <> b_arb.io.out
|
||||
io.master.r <> r_arb.io.out
|
||||
}
|
||||
|
||||
/** Crossbar between multiple Nasti masters and slaves
|
||||
* @param nMasters the number of Nasti masters
|
||||
* @param nSlaves the number of Nasti slaves
|
||||
* @param routeSel a function selecting the slave to route an address to */
|
||||
class NastiCrossbar(nMasters: Int, nSlaves: Int,
|
||||
routeSel: UInt => UInt)
|
||||
(implicit p: Parameters) extends NastiModule {
|
||||
val io = new Bundle {
|
||||
val masters = Vec(nMasters, new NastiIO).flip
|
||||
val slaves = Vec(nSlaves, new NastiIO)
|
||||
}
|
||||
|
||||
if (nMasters == 1) {
|
||||
val router = Module(new NastiRouter(nSlaves, routeSel))
|
||||
router.io.master <> io.masters.head
|
||||
io.slaves <> router.io.slave
|
||||
} else {
|
||||
val routers = Vec.fill(nMasters) { Module(new NastiRouter(nSlaves, routeSel)).io }
|
||||
val arbiters = Vec.fill(nSlaves) { Module(new NastiArbiter(nMasters)).io }
|
||||
|
||||
for (i <- 0 until nMasters) {
|
||||
routers(i).master <> io.masters(i)
|
||||
}
|
||||
|
||||
for (i <- 0 until nSlaves) {
|
||||
arbiters(i).master <> Vec(routers.map(r => r.slave(i)))
|
||||
io.slaves(i) <> arbiters(i).slave
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
class NastiInterconnectIO(val nMasters: Int, val nSlaves: Int)
|
||||
(implicit p: Parameters) extends Bundle {
|
||||
/* This is a bit confusing. The interconnect is a slave to the masters and
|
||||
* a master to the slaves. Hence why the declarations seem to be backwards. */
|
||||
val masters = Vec(nMasters, new NastiIO).flip
|
||||
val slaves = Vec(nSlaves, new NastiIO)
|
||||
override def cloneType =
|
||||
new NastiInterconnectIO(nMasters, nSlaves).asInstanceOf[this.type]
|
||||
}
|
||||
|
||||
abstract class NastiInterconnect(implicit p: Parameters) extends NastiModule()(p) {
|
||||
val nMasters: Int
|
||||
val nSlaves: Int
|
||||
|
||||
lazy val io = new NastiInterconnectIO(nMasters, nSlaves)
|
||||
}
|
||||
|
||||
class NastiRecursiveInterconnect(
|
||||
val nMasters: Int, addrMap: AddrMap)
|
||||
(implicit p: Parameters) extends NastiInterconnect()(p) {
|
||||
def port(name: String) = io.slaves(addrMap.port(name))
|
||||
val nSlaves = addrMap.numSlaves
|
||||
val routeSel = (addr: UInt) =>
|
||||
Cat(addrMap.entries.map(e => addrMap(e.name).containsAddress(addr)).reverse)
|
||||
|
||||
val xbar = Module(new NastiCrossbar(nMasters, addrMap.length, routeSel))
|
||||
xbar.io.masters <> io.masters
|
||||
|
||||
io.slaves <> addrMap.entries.zip(xbar.io.slaves).flatMap {
|
||||
case (entry, xbarSlave) => {
|
||||
entry.region match {
|
||||
case submap: AddrMap if submap.entries.isEmpty =>
|
||||
val err_slave = Module(new NastiErrorSlave)
|
||||
err_slave.io <> xbarSlave
|
||||
None
|
||||
case submap: AddrMap =>
|
||||
val ic = Module(new NastiRecursiveInterconnect(1, submap))
|
||||
ic.io.masters.head <> xbarSlave
|
||||
ic.io.slaves
|
||||
case r: MemRange =>
|
||||
Some(xbarSlave)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
|
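As an illustration of the helper constructors above (this sketch is not part of the commit; the module and address are hypothetical), a master issuing a single-beat write would typically build its AW and W beats through the `apply` methods, letting the defaults fill in `len`, `burst`, `cache`, and `prot`:

```scala
// Hypothetical usage sketch, in the same pre-chisel3 style as the file above.
class WriteOneBeat(implicit p: Parameters) extends NastiModule {
  val io = new Bundle { val nasti = new NastiIO }

  // AW: id 0, address 0x1000, size 3 (2^3 = 8-byte beats), single-beat INCR burst
  io.nasti.aw.valid := true.B
  io.nasti.aw.bits := NastiWriteAddressChannel(
    id = UInt(0), addr = UInt(0x1000), size = UInt(3))

  // W: strb defaults to all-ones, last defaults to true.B
  io.nasti.w.valid := true.B
  io.nasti.w.bits := NastiWriteDataChannel(data = UInt(0xdead))

  // Accept the B response; leave the read channels idle
  io.nasti.b.ready := true.B
  io.nasti.ar.valid := false.B
  io.nasti.r.ready := false.B
}
```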
@ -0,0 +1,90 @@

// See LICENSE for license details.

package midas

import passes.Utils.writeEmittedCircuit

import chisel3.{Data, Bundle, Record, Clock, Bool}
import chisel3.internal.firrtl.Port
import firrtl.ir.Circuit
import firrtl.{Transform, CircuitState}
import firrtl.annotations.Annotation
import firrtl.CompilerUtils.getLoweringTransforms
import firrtl.passes.memlib._
import freechips.rocketchip.config.{Parameters, Field}
import java.io.{File, FileWriter, Writer}
import logger._

// Directory into which output files are dumped. Set by dir argument
case object OutputDir extends Field[File]

// Compiler for Midas Transforms
private class MidasCompiler extends firrtl.Compiler {
  def emitter = new firrtl.LowFirrtlEmitter
  def transforms =
    getLoweringTransforms(firrtl.ChirrtlForm, firrtl.MidForm) ++
    Seq(new InferReadWrite) ++
    getLoweringTransforms(firrtl.MidForm, firrtl.LowForm)
}

// These next two compilers split LFO from the rest of the lowering
// compilers to schedule around the presence of internal & non-standard WIR
// nodes (Dshlw) present after LFO, which custom transforms can't handle
private class HostTransformCompiler extends firrtl.Compiler {
  def emitter = new firrtl.LowFirrtlEmitter
  def transforms =
    Seq(new firrtl.IRToWorkingIR,
        new firrtl.ResolveAndCheck,
        new firrtl.HighFirrtlToMiddleFirrtl) ++
    getLoweringTransforms(firrtl.MidForm, firrtl.LowForm)
}

// Custom transforms have been scheduled -> do the final lowering
private class LastStageVerilogCompiler extends firrtl.Compiler {
  def emitter = new firrtl.VerilogEmitter
  def transforms = Seq(new firrtl.LowFirrtlOptimization,
                       new firrtl.transforms.RemoveReset)
}

object MidasCompiler {
  def apply(
      chirrtl: Circuit,
      targetAnnos: Seq[Annotation],
      io: Seq[(String, Data)],
      dir: File,
      targetTransforms: Seq[Transform], // Run pre-MIDAS transforms, on the target RTL
      hostTransforms: Seq[Transform]    // Run post-MIDAS transformations
    )
    (implicit p: Parameters): CircuitState = {
    val midasAnnos = Seq(
      firrtl.TargetDirAnnotation(dir.getPath()),
      InferReadWriteAnnotation)
    val midasTransforms = new passes.MidasTransforms(io)(p alterPartial { case OutputDir => dir })
    val compiler = new MidasCompiler
    val midas = compiler.compile(firrtl.CircuitState(
      chirrtl, firrtl.ChirrtlForm, targetAnnos ++ midasAnnos),
      targetTransforms :+ midasTransforms)

    val postHostTransforms = new HostTransformCompiler().compile(midas, hostTransforms)
    val result = new LastStageVerilogCompiler().compileAndEmit(postHostTransforms)

    writeEmittedCircuit(result, new File(dir, s"FPGATop.v"))
    result
  }

  // Unlike above, elaborates the target locally, before constructing the target IO Record.
  def apply[T <: chisel3.core.UserModule](
      w: => T,
      dir: File,
      targetTransforms: Seq[Transform] = Seq.empty,
      hostTransforms: Seq[Transform] = Seq.empty
    )
    (implicit p: Parameters): CircuitState = {
    dir.mkdirs
    lazy val target = w
    val circuit = chisel3.Driver.elaborate(() => target)
    val chirrtl = firrtl.Parser.parse(chisel3.Driver.emit(circuit))
    val io = target.getPorts map (p => p.id.instanceName -> p.id)
    apply(chirrtl, circuit.annotations.map(_.toFirrtl), io, dir, targetTransforms, hostTransforms)
  }
}
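The second `apply` overload above elaborates the target itself and then drives the full three-stage flow (MidasCompiler, HostTransformCompiler, LastStageVerilogCompiler). A hedged sketch of a caller, where `TargetTop` is a stand-in name for some `chisel3` module and the output directory is arbitrary:

```scala
// Hypothetical caller sketch (not part of this commit).
import java.io.File
import freechips.rocketchip.config.Parameters

object ExampleDriver {
  def compile(params: Parameters): firrtl.CircuitState = {
    implicit val p = params
    // Elaborates TargetTop, runs the MIDAS transforms, and emits
    // FPGATop.v into generated-src/ as a side effect.
    MidasCompiler(new TargetTop, new File("generated-src"))
  }
}
```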
@ -0,0 +1,85 @@

// See LICENSE for license details.

package midas

import core._
import widgets._
import platform._
import models._
import strober.core._
import junctions.{NastiKey, NastiParameters}
import freechips.rocketchip.config.{Parameters, Config, Field}
import freechips.rocketchip.unittest.UnitTests

trait PlatformType
case object Zynq extends PlatformType
case object F1 extends PlatformType
case object Platform extends Field[PlatformType]
// Switches to synthesize prints and assertions
case object SynthAsserts extends Field[Boolean]
case object SynthPrints extends Field[Boolean]
// Exclude module instances from assertion and print synthesis
// Tuple of Parent Module (where the instance is instantiated) and the instance name
case object EnableSnapshot extends Field[Boolean]
case object HasDMAChannel extends Field[Boolean]
case object KeepSamplesInMem extends Field[Boolean]

// MIDAS 2.0 Switches
case object GenerateMultiCycleRamModels extends Field[Boolean](false)
// User provided transforms to run before Golden Gate transformations
// These are constructor functions that accept a Parameters instance and produce a
// sequence of firrtl Transforms to run
case object TargetTransforms extends Field[Seq[(Parameters) => Seq[firrtl.Transform]]](Seq())
// User provided transforms to run after Golden Gate transformations
case object HostTransforms extends Field[Seq[(Parameters) => Seq[firrtl.Transform]]](Seq())

class SimConfig extends Config((site, here, up) => {
  case TraceMaxLen => 1024
  case SRAMChainNum => 1
  case ChannelLen => 16
  case ChannelWidth => 32
  case DaisyWidth => 32
  case SynthAsserts => false
  case SynthPrints => false
  case EnableSnapshot => false
  case KeepSamplesInMem => true
  case CtrlNastiKey => NastiParameters(32, 32, 12)
  case DMANastiKey => NastiParameters(512, 64, 6)
  case FpgaMMIOSize => BigInt(1) << 12 // 4 KB
  case AXIDebugPrint => false
  case HostMemChannelNastiKey => NastiParameters(64, 32, 6)
  case HostMemNumChannels => 1

  case MemNastiKey => site(HostMemChannelNastiKey).copy(
    addrBits = chisel3.util.log2Ceil(site(HostMemNumChannels)) + site(HostMemChannelNastiKey).addrBits,
    // TODO: We should try to constrain masters to 4 bits of ID space -> but we need to map
    // multiple target-ids onto a single host-id in the DRAM timing model to support that
    idBits = 6
  )
})

class ZynqConfig extends Config(new Config((site, here, up) => {
  case Platform => Zynq
  case HasDMAChannel => false
  case MasterNastiKey => site(CtrlNastiKey)
}) ++ new SimConfig)

class ZynqConfigWithSnapshot extends Config(new Config((site, here, up) => {
  case EnableSnapshot => true
}) ++ new ZynqConfig)

// We are assuming the host-DRAM size is 2^chAddrBits
class F1Config extends Config(new Config((site, here, up) => {
  case Platform => F1
  case HasDMAChannel => true
  case CtrlNastiKey => NastiParameters(32, 25, 12)
  case MasterNastiKey => site(CtrlNastiKey)
  case HostMemChannelNastiKey => NastiParameters(64, 34, 16)
  case HostMemNumChannels => 4
}) ++ new SimConfig)

class F1ConfigWithSnapshot extends Config(new Config((site, here, up) => {
  case EnableSnapshot => true
}) ++ new F1Config)
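The `Config` classes above compose with `++`, and a lookup consults the leftmost matching partial function first, so `F1Config`'s overrides shadow `SimConfig`'s defaults. A small hedged sketch of reading fields back out of an instantiated config (the object name is hypothetical):

```scala
// Hypothetical sketch (not part of this commit): field lookup on Parameters.
import freechips.rocketchip.config.Parameters

object ConfigInspection {
  // Under (new F1Config).toInstance this yields 4; under plain SimConfig, 1,
  // because F1Config's partial function takes precedence over SimConfig's.
  def hostChannels(p: Parameters): Int = p(HostMemNumChannels)
}
```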
@ -0,0 +1,76 @@

// See LICENSE for license details.
package midas.unittest

import chisel3._
import chisel3.experimental.RawModule
import firrtl.{ExecutionOptionsManager, HasFirrtlOptions}

import freechips.rocketchip.config.{Parameters, Config, Field}
import midas.widgets.ScanRegister

case object QoRTargets extends Field[Parameters => Seq[RawModule]]
class QoRShim(implicit val p: Parameters) extends Module {
  val io = IO(new Bundle {
    val scanIn = Input(Bool())
    val scanOut = Output(Bool())
    val scanEnable = Input(Bool())
  })

  val modules = p(QoRTargets)(p)
  val scanOuts = modules.map({ module =>
    val ports = module.getPorts.flatMap({
      case chisel3.internal.firrtl.Port(id: Clock, _) => None
      case chisel3.internal.firrtl.Port(id, _) => Some(id)
    })
    ScanRegister(ports, io.scanEnable, io.scanIn)
  })
  io.scanOut := scanOuts.reduce(_ || _)
}

class Midas2QoRTargets extends Config((site, here, up) => {
  case QoRTargets => (q: Parameters) => {
    implicit val p = q
    Seq(
      Module(new midas.models.sram.AsyncMemChiselModel(160, 64, 6, 3))
    )
  }
})

// Generates synthesizable unit tests for key modules, such as simulation channels
// See: src/main/cc/unittest/Makefile for the downstream RTL-simulation flow
//
// TODO: Make the core of this generator a trait that can be mixed into
// FireSim's ScalaTests for more type safety
object QoRShimGenerator extends App with freechips.rocketchip.util.HasGeneratorUtilities {

  case class QoRShimOptions(
      configProject: String = "midas.unittest",
      config: String = "Midas2QoRTargets") {
    val fullConfigClasses: Seq[String] = Seq(configProject + "." + config)
  }

  trait HasUnitTestOptions {
    self: ExecutionOptionsManager =>
    var qorOptions = QoRShimOptions()
    parser.note("MIDAS Unit Test Generator Options")
    parser.opt[String]("config-project")
      .abbr("cp")
      .valueName("<config-project>")
      .foreach { d => qorOptions = qorOptions.copy(configProject = d) }
    parser.opt[String]("config")
      .abbr("conf")
      .valueName("<configClassName>")
      .foreach { cfg => qorOptions = qorOptions.copy(config = cfg) }
  }

  val exOptions = new ExecutionOptionsManager("qor")
    with HasChiselExecutionOptions
    with HasFirrtlOptions
    with HasUnitTestOptions

  exOptions.parse(args)

  val params = getConfig(exOptions.qorOptions.fullConfigClasses).toInstance
  Driver.execute(exOptions, () => new QoRShim()(params))
}
@ -0,0 +1,81 @@

// See LICENSE for license details.

package midas.unittest

import midas.core._

import chisel3._
import firrtl.{ExecutionOptionsManager, HasFirrtlOptions}

import freechips.rocketchip.config.{Parameters, Config, Field}
import freechips.rocketchip.unittest.{UnitTests, TestHarness}
import midas.models.{CounterTableUnitTest, LatencyHistogramUnitTest, AddressRangeCounterUnitTest}

// Unit tests
class WithAllUnitTests extends Config((site, here, up) => {
  case UnitTests => (q: Parameters) => {
    implicit val p = q
    val timeout = 2000000
    Seq(
      Module(new PipeChannelUnitTest(latency = 0, timeout = timeout)),
      Module(new PipeChannelUnitTest(latency = 1, timeout = timeout)),
      Module(new ReadyValidChannelUnitTest(timeout = timeout)),
      Module(new CounterTableUnitTest),
      Module(new LatencyHistogramUnitTest),
      Module(new AddressRangeCounterUnitTest))
  }
})

// Failing tests
class WithTimeOutCheck extends Config((site, here, up) => {
  case UnitTests => (q: Parameters) => {
    implicit val p = q
    Seq(
      Module(new PipeChannelUnitTest(timeout = 100)))
  }
})

// Complete configs
class AllUnitTests extends Config(new WithAllUnitTests ++ new midas.SimConfig)
class TimeOutCheck extends Config(new WithTimeOutCheck ++ new midas.SimConfig)

// Generates synthesizable unit tests for key modules, such as simulation channels
// See: src/main/cc/unittest/Makefile for the downstream RTL-simulation flow
//
// TODO: Make the core of this generator a trait that can be mixed into
// FireSim's ScalaTests for more type safety
object Generator extends App with freechips.rocketchip.util.HasGeneratorUtilities {

  case class UnitTestOptions(
      configProject: String = "midas.unittest",
      config: String = "AllUnitTests") {
    val fullConfigClasses: Seq[String] = Seq(configProject + "." + config)
  }

  trait HasUnitTestOptions {
    self: ExecutionOptionsManager =>
    var utOptions = UnitTestOptions()
    parser.note("MIDAS Unit Test Generator Options")
    parser.opt[String]("config-project")
      .abbr("cp")
      .valueName("<config-project>")
      .foreach { d => utOptions = utOptions.copy(configProject = d) }
    parser.opt[String]("config")
      .abbr("conf")
      .valueName("<configClassName>")
      .foreach { cfg => utOptions = utOptions.copy(config = cfg) }
  }

  val exOptions = new ExecutionOptionsManager("regressions")
    with HasChiselExecutionOptions
    with HasFirrtlOptions
    with HasUnitTestOptions

  exOptions.parse(args)

  val params = getConfig(exOptions.utOptions.fullConfigClasses).toInstance
  Driver.execute(exOptions, () => new TestHarness()(params))
}
@@ -0,0 +1,374 @@
// See LICENSE for license details.

package midas
package core

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.unittest._
import freechips.rocketchip.util.{DecoupledHelper}
import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's

import chisel3._
import chisel3.util._
import chisel3.experimental.{dontTouch, chiselName, MultiIOModule}

import strober.core.{TraceQueue, TraceMaxLen}
import midas.core.SimUtils.{ChLeafType}

// For now, use the convention that clock ratios are set with respect to the transformed RTL
trait IsRationalClockRatio {
  def numerator: Int
  def denominator: Int
  def isUnity() = numerator == denominator
  def isReciprocal() = numerator == 1
  def isIntegral() = denominator == 1
  def inverse: IsRationalClockRatio
}

case class RationalClockRatio(numerator: Int, denominator: Int) extends IsRationalClockRatio {
  def inverse = RationalClockRatio(denominator, numerator)
}

case object UnityClockRatio extends IsRationalClockRatio {
  val numerator = 1
  val denominator = 1
  def inverse = UnityClockRatio
}

case class ReciprocalClockRatio(denominator: Int) extends IsRationalClockRatio {
  val numerator = 1
  def inverse = IntegralClockRatio(numerator = denominator)
}

case class IntegralClockRatio(numerator: Int) extends IsRationalClockRatio {
  val denominator = 1
  def inverse = ReciprocalClockRatio(denominator = numerator)
}
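As a plain-Scala illustration of what these ratios encode (a hypothetical `Ratio` helper, not part of the commit): a model clocked at ratio N/M relative to the transformed RTL advances N model cycles for every M host cycles, and `inverse` flips the perspective.

```scala
// Hypothetical helper, independent of Chisel: cycle conversion for a
// rational clock ratio numerator/denominator.
case class Ratio(numerator: Int, denominator: Int) {
  def inverse: Ratio = Ratio(denominator, numerator)
  // Model cycles elapsed after `hostCycles` host cycles.
  def modelCycles(hostCycles: Int): Int = hostCycles * numerator / denominator
}

val half = Ratio(1, 2) // model runs at half the host rate
assert(half.modelCycles(10) == 5)
assert(half.inverse.modelCycles(10) == 20)
assert(Ratio(1, 1).modelCycles(7) == 7) // unity ratio is the identity
```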

class PipeChannelIO[T <: ChLeafType](gen: T)(implicit p: Parameters) extends Bundle {
  val in  = Flipped(Decoupled(gen))
  val out = Decoupled(gen)
  val trace = Decoupled(gen)
  val traceLen = Input(UInt(log2Up(p(TraceMaxLen)+1).W))
  override def cloneType = new PipeChannelIO(gen)(p).asInstanceOf[this.type]
}

class PipeChannel[T <: ChLeafType](
    val gen: T,
    latency: Int,
    clockRatio: IsRationalClockRatio = UnityClockRatio
  )(implicit p: Parameters) extends Module {

  require(clockRatio.isUnity)
  require(latency == 0 || latency == 1)

  val io = IO(new PipeChannelIO(gen))
  val tokens = Module(new Queue(gen, p(ChannelLen)))
  tokens.io.enq <> io.in
  io.out <> tokens.io.deq

  if (latency == 1) {
    val initializing = RegNext(reset.toBool)
    when(initializing) {
      tokens.io.enq.valid := true.B
      io.in.ready := false.B
    }
  }

  if (p(EnableSnapshot)) {
    io.trace <> TraceQueue(tokens.io.deq, io.traceLen)
  } else {
    io.trace := DontCare
    io.trace.valid := false.B
  }
}

class PipeChannelUnitTest(
    latency: Int = 0,
    numTokens: Int = 4096,
    timeout: Int = 50000
  )(implicit p: Parameters) extends UnitTest(timeout) {

  override val testName = "PipeChannel Unit Test"
  val payloadWidth = 8
  val dut = Module(new PipeChannel(UInt(payloadWidth.W), latency, UnityClockRatio))
  val referenceInput  = Wire(UInt(payloadWidth.W))
  val referenceOutput = ShiftRegister(referenceInput, latency)

  val inputChannelMapping  = Seq(IChannelDesc("in", referenceInput, dut.io.in))
  val outputChannelMapping = Seq(OChannelDesc("out", referenceOutput, dut.io.out, TokenComparisonFunctions.ignoreNTokens(1)))

  io.finished := DirectedLIBDNTestHelper(inputChannelMapping, outputChannelMapping, numTokens)

  dut.io.traceLen := DontCare
  dut.io.trace.ready := DontCare
}

// A bidirectional token channel wrapping a target-decoupled (ready-valid) interface.
// Structurally, this keeps the target bundle intact; however, it should really be thought of as
// two *independent* token channels:
// fwd: DecoupledIO (carries a combined valid-and-payload token)
//  - valid -> fwd.hValid
//  - ready -> fwd.hReady
//  - bits  -> {target.valid, target.bits}
//
// rev: DecoupledIO (carries a ready token)
//  - valid -> rev.hValid
//  - ready -> rev.hReady
//  - bits  -> target.ready
//
// WARNING: target.fire() is meaningless unless the fwd and rev channels are
// synchronized and carry valid tokens

class SimReadyValidIO[T <: Data](gen: T) extends Bundle {
  val target = EnqIO(gen)
  val fwd = new HostReadyValid
  val rev = Flipped(new HostReadyValid)
  override def cloneType = new SimReadyValidIO(gen).asInstanceOf[this.type]

  def fwdIrrevocabilityAssertions(suggestedName: Option[String] = None): Unit = {
    val hValidPrev = RegNext(fwd.hValid, false.B)
    val hReadyPrev = RegNext(fwd.hReady)
    val hFirePrev  = hValidPrev && hReadyPrev
    val tPrev = RegNext(target)
    val prefix = suggestedName match {
      case Some(name) => name + ": "
      case None => ""
    }
    assert(!hValidPrev || hFirePrev || fwd.hValid,
      s"${prefix}hValid de-asserted without handshake, violating fwd token irrevocability")
    assert(!hValidPrev || hFirePrev || tPrev.valid === target.valid,
      s"${prefix}tValid transitioned without host handshake, violating fwd token irrevocability")
    assert(!hValidPrev || hFirePrev || tPrev.bits.asUInt() === target.bits.asUInt(),
      s"${prefix}tBits transitioned without host handshake, violating fwd token irrevocability")
    assert(!hFirePrev || tPrev.fire || !tPrev.valid,
      s"${prefix}tValid deasserted without prior target handshake, violating target-queue irrevocability")
    assert(!hFirePrev || tPrev.fire || !tPrev.valid || tPrev.bits.asUInt() === target.bits.asUInt(),
      s"${prefix}tBits transitioned without prior target handshake, violating target-queue irrevocability")
  }

  def revIrrevocabilityAssertions(suggestedName: Option[String] = None): Unit = {
    val prefix = suggestedName match {
      case Some(name) => name + ": "
      case None => ""
    }
    val hReadyPrev = RegNext(rev.hReady, false.B)
    val hValidPrev = RegNext(rev.hValid)
    val tReadyPrev = RegNext(target.ready)
    val hFirePrev  = hReadyPrev && hValidPrev
    assert(hFirePrev || !hReadyPrev || rev.hReady,
      s"${prefix}hReady de-asserted, violating token irrevocability")
    assert(hFirePrev || !hReadyPrev || tReadyPrev === target.ready,
      s"${prefix}tReady de-asserted, violating token irrevocability")
  }

  // Returns two directioned objects driven by this SimReadyValidIO hw instance
  def bifurcate(): (DecoupledIO[ValidIO[T]], DecoupledIO[Bool]) = {
    // Can't use bidirectional wires, so we use a dummy module (akin to the identity module)
    class BifurcationModule[T <: Data](gen: T) extends MultiIOModule {
      val fwd = IO(Decoupled(Valid(gen)))
      val rev = IO(Flipped(DecoupledIO(Bool())))
      val coupled = IO(Flipped(cloneType))
      // Forward channel
      fwd.bits.bits  := coupled.target.bits
      fwd.bits.valid := coupled.target.valid
      fwd.valid      := coupled.fwd.hValid
      coupled.fwd.hReady := fwd.ready
      // Reverse channel
      rev.ready := coupled.rev.hReady
      coupled.target.ready := rev.bits
      coupled.rev.hValid   := rev.valid
    }
    val bifurcator = Module(new BifurcationModule(gen))
    bifurcator.coupled <> this
    (bifurcator.fwd, bifurcator.rev)
  }

  // Returns two directioned objects which will drive this SimReadyValidIO hw instance
  def combine(): (DecoupledIO[ValidIO[T]], DecoupledIO[Bool]) = {
    // Can't use bidirectional wires, so we use a dummy module (akin to the identity module)
    class CombiningModule[T <: Data](gen: T) extends MultiIOModule {
      val fwd = IO(Flipped(DecoupledIO(Valid(gen))))
      val rev = IO(Decoupled(Bool()))
      val coupled = IO(cloneType)
      // Forward channel
      coupled.target.bits  := fwd.bits.bits
      coupled.target.valid := fwd.bits.valid
      coupled.fwd.hValid   := fwd.valid
      fwd.ready := coupled.fwd.hReady
      // Reverse channel
      coupled.rev.hReady := rev.ready
      rev.bits  := coupled.target.ready
      rev.valid := coupled.rev.hValid
    }
    val combiner = Module(new CombiningModule(gen))
    this <> combiner.coupled
    (combiner.fwd, combiner.rev)
  }
}

object SimReadyValid {
  def apply[T <: Data](gen: T) = new SimReadyValidIO(gen)
}

class ReadyValidTraceIO[T <: Data](gen: T) extends Bundle {
  val bits  = Decoupled(gen)
  val valid = Decoupled(Bool())
  val ready = Decoupled(Bool())
  override def cloneType = new ReadyValidTraceIO(gen).asInstanceOf[this.type]
}

object ReadyValidTrace {
  def apply[T <: Data](gen: T) = new ReadyValidTraceIO(gen)
}

class ReadyValidChannelIO[T <: Data](gen: T)(implicit p: Parameters) extends Bundle {
  val enq = Flipped(SimReadyValid(gen))
  val deq = SimReadyValid(gen)
  val trace = ReadyValidTrace(gen)
  val traceLen = Input(UInt(log2Up(p(TraceMaxLen)+1).W))
  val targetReset = Flipped(Decoupled(Bool()))
  override def cloneType = new ReadyValidChannelIO(gen)(p).asInstanceOf[this.type]
}

class ReadyValidChannel[T <: Data](
    gen: T,
    n: Int = 2, // Target queue depth
    // Clock ratio (N/M) of deq interface (N) vs enq interface (M)
    clockRatio: IsRationalClockRatio = UnityClockRatio
  )(implicit p: Parameters) extends Module {
  require(clockRatio.isUnity, "CDC is not currently implemented")

  val io = IO(new ReadyValidChannelIO(gen))
  val enqFwdQ = Module(new Queue(ValidIO(gen), 2, flow = true))
  enqFwdQ.io.enq.bits.valid := io.enq.target.valid
  enqFwdQ.io.enq.bits.bits  := io.enq.target.bits
  enqFwdQ.io.enq.valid      := io.enq.fwd.hValid
  io.enq.fwd.hReady := enqFwdQ.io.enq.ready

  val deqRevQ = Module(new Queue(Bool(), 2, flow = true))
  deqRevQ.io.enq.bits  := io.deq.target.ready
  deqRevQ.io.enq.valid := io.deq.rev.hValid
  io.deq.rev.hReady := deqRevQ.io.enq.ready

  val reference = Module(new Queue(gen, n))
  val deqFwdFired = RegInit(false.B)
  val enqRevFired = RegInit(false.B)

  val finishing = DecoupledHelper(
    io.targetReset.valid,
    enqFwdQ.io.deq.valid,
    deqRevQ.io.deq.valid,
    (enqRevFired || io.enq.rev.hReady),
    (deqFwdFired || io.deq.fwd.hReady))

  val targetFire = finishing.fire()
  val enqBitsLast = RegEnable(enqFwdQ.io.deq.bits.bits, targetFire)
  // enqRev
  io.enq.rev.hValid := !enqRevFired
  io.enq.target.ready := reference.io.enq.ready

  // deqFwd
  io.deq.fwd.hValid := !deqFwdFired
  io.deq.target.bits  := reference.io.deq.bits
  io.deq.target.valid := reference.io.deq.valid

  io.targetReset.ready := finishing.fire(io.targetReset.valid)
  enqFwdQ.io.deq.ready := finishing.fire(enqFwdQ.io.deq.valid)
  deqRevQ.io.deq.ready := finishing.fire(deqRevQ.io.deq.valid)

  reference.reset := reset.toBool || targetFire && io.targetReset.bits
  reference.io.enq.valid := targetFire && enqFwdQ.io.deq.bits.valid
  reference.io.enq.bits  := Mux(targetFire, enqFwdQ.io.deq.bits.bits, enqBitsLast)
  reference.io.deq.ready := targetFire && deqRevQ.io.deq.bits

  deqFwdFired := Mux(targetFire, false.B, deqFwdFired || io.deq.fwd.hReady)
  enqRevFired := Mux(targetFire, false.B, enqRevFired || io.enq.rev.hReady)

  io.trace := DontCare
  io.trace.bits.valid  := false.B
  io.trace.valid.valid := false.B
  io.trace.ready.valid := false.B
}

@chiselName
class ReadyValidChannelUnitTest(
    numTokens: Int = 4096,
    queueDepth: Int = 2,
    timeout: Int = 50000
  )(implicit p: Parameters) extends UnitTest(timeout) {
  override val testName = "ReadyValidChannel Unit Test"

  val payloadType = UInt(8.W)
  val resetLength = 4

  val dut = Module(new ReadyValidChannel(payloadType))
  val reference = Module(new Queue(payloadType, queueDepth))

  // Generates target-reset tokens
  def resetTokenGen(): Bool = {
    val resetCount = RegInit(0.U(log2Ceil(resetLength + 1).W))
    val outOfReset = resetCount === resetLength.U
    resetCount := Mux(outOfReset, resetCount, resetCount + 1.U)
    !outOfReset
  }

  // This ensures that the bits field of deq matches even if target valid
  // is not asserted. To work around random initialization of the queue's
  // mem, it neglects all target-invalid output tokens until all entries of
  // the mem have been written once.
  //
  // TODO: Consider initializing all memories to zero even in the unittests, as
  // that will more closely match the FPGA
  val enqCount = RegInit(0.U(log2Ceil(queueDepth + 1).W))
  val memFullyDefined = enqCount === queueDepth.U
  enqCount := Mux(!memFullyDefined && reference.io.enq.fire && !reference.reset.toBool, enqCount + 1.U, enqCount)

  // Track the target cycle at which all entries are known
  val memFullyDefinedCycle = RegInit(1.U(log2Ceil(2*timeout).W))
  memFullyDefinedCycle := Mux(!memFullyDefined, memFullyDefinedCycle + 1.U, memFullyDefinedCycle)

  def strictPayloadCheck(ref: Data, ch: DecoupledIO[Data]): Bool = {
    // hack: fix the types
    val refTyped   = ref.asTypeOf(refDeqFwd)
    val modelTyped = ch.bits.asTypeOf(refDeqFwd)

    val deqCount = RegInit(0.U(log2Ceil(numTokens + 1).W))
    when (ch.fire) { deqCount := deqCount + 1.U }

    // Neglect a comparison if: 1) still under reset 2) mem contents still undefined
    val exempt = deqCount < resetLength.U ||
      !refTyped.valid && !modelTyped.valid && (deqCount < memFullyDefinedCycle)
    val matchExact = ref.asUInt === ch.bits.asUInt

    !ch.fire || exempt || matchExact
  }

  val (deqFwd, deqRev) = dut.io.deq.bifurcate()
  val (enqFwd, enqRev) = dut.io.enq.combine()

  val refDeqFwd = Wire(Valid(payloadType))
  refDeqFwd.bits  := reference.io.deq.bits
  refDeqFwd.valid := reference.io.deq.valid
  val refEnqFwd = Wire(Valid(payloadType))
  reference.io.enq.bits  := refEnqFwd.bits
  reference.io.enq.valid := refEnqFwd.valid

  val inputChannelMapping = Seq(IChannelDesc("enqFwd", refEnqFwd, enqFwd),
                                IChannelDesc("deqRev", reference.io.deq.ready, deqRev),
                                IChannelDesc("reset" , reference.reset, dut.io.targetReset, Some(resetTokenGen)))

  val outputChannelMapping = Seq(OChannelDesc("deqFwd", refDeqFwd, deqFwd, strictPayloadCheck),
                                 OChannelDesc("enqRev", reference.io.enq.ready, enqRev, TokenComparisonFunctions.ignoreNTokens(resetLength)))

  io.finished := DirectedLIBDNTestHelper(inputChannelMapping, outputChannelMapping, numTokens)

  dut.io.traceLen := DontCare
  dut.io.trace.ready.ready := DontCare
  dut.io.trace.valid.ready := DontCare
  dut.io.trace.bits.ready := DontCare
}
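The token exchange above can be summarized in software: a target cycle of the queue "fires" only once all input tokens (enq-fwd, deq-rev, reset) have arrived and both output tokens have been accepted, and the queue then updates exactly as the reference would. A minimal plain-Scala sketch of one fired target cycle (hypothetical `TokenModel`, not FireSim code):

```scala
import scala.collection.mutable

// Hypothetical software model of one fired target cycle of the channel's
// reference queue. Each call consumes one token per input channel and
// produces the (deqValid, deqBits, enqReady) output tokens for that cycle.
class TokenModel(depth: Int) {
  private val q = mutable.Queue[Int]()
  def step(enqValid: Boolean, enqBits: Int, deqReady: Boolean, reset: Boolean): (Boolean, Option[Int], Boolean) = {
    val deqValid = q.nonEmpty
    val deqBits  = q.headOption
    val enqReady = q.size < depth
    if (reset) q.clear()
    else {
      if (deqValid && deqReady) q.dequeue() // dequeue before enqueue, as in a non-flow Queue
      if (enqValid && enqReady) q.enqueue(enqBits)
    }
    (deqValid, deqBits, enqReady)
  }
}
```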
@@ -0,0 +1,61 @@
// See LICENSE for license details.

package midas.core

import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's

import chisel3._
import chisel3.util._
import chisel3.experimental.MultiIOModule

trait ClockUtils {
  // Assume time is measured in ps
  val timeStepBits = 32
}

class GenericClockCrossing[T <: Data](gen: T) extends MultiIOModule with ClockUtils {
  val enq = IO(Flipped(Decoupled(gen)))
  val deq = IO(Decoupled(gen))
  val enqDomainTimeStep = IO(Input(UInt(timeStepBits.W)))
  val deqDomainTimeStep = IO(Input(UInt(timeStepBits.W)))

  val enqTokens = Queue(enq, 2)

  // Deq domain handling
  val residualTime = Reg(UInt(timeStepBits.W))
  val hasResidualTime = RegInit(false.B)
  val timeToNextEnqEdge = Mux(hasResidualTime, residualTime, enqDomainTimeStep)
  val timeToNextDeqEdge = RegInit(0.U(timeStepBits.W))

  val enqTokenVisible = timeToNextEnqEdge > timeToNextDeqEdge
  val tokenWouldExpire = timeToNextEnqEdge < timeToNextDeqEdge + deqDomainTimeStep

  deq.valid := enqTokens.valid && enqTokenVisible
  deq.bits  := enqTokens.bits
  enqTokens.ready := !enqTokenVisible || deq.ready && tokenWouldExpire

  val enqTokenExpiring = enqTokens.fire
  val deqTokenReleased = deq.fire

  // Case 1: This ENQ token is visible in the current DEQ token, but not in future DEQ tokens
  // ENQ N     | ENQ N+1 |
  //   ... | DEQ M | DEQ M+1 |
  when (enqTokenExpiring && deqTokenReleased) {
    hasResidualTime := false.B
    timeToNextDeqEdge := timeToNextDeqEdge + deqDomainTimeStep - timeToNextEnqEdge
  // Case 2: This ENQ token is no longer visible (generally fast -> slow)
  // ENQ N | ENQ N+1 | ...
  // DEQ M       | DEQ M+1...
  }.elsewhen(enqTokenExpiring) {
    hasResidualTime := false.B
    timeToNextDeqEdge := timeToNextDeqEdge - timeToNextEnqEdge
  // Case 3: This ENQ token is visible in the current and possibly future output tokens
  // ENQ N           | ...
  // DEQ M | DEQ M+1 | ...
  }.elsewhen(deqTokenReleased) {
    hasResidualTime := true.B
    timeToNextDeqEdge := deqDomainTimeStep
    residualTime := timeToNextEnqEdge - deqDomainTimeStep
  }
}
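The residual-time bookkeeping above reduces to a timeline calculation: if each enq token spans `enqStep` ps and each deq edge falls every `deqStep` ps, the enq token visible at a deq edge is the one whose span covers that instant. A standalone sketch (hypothetical `crossTimeline` helper, not part of the commit):

```scala
// For each deq-domain token edge, compute the index of the enq-domain token
// visible at that edge. Enq token i covers (i*enqStep, (i+1)*enqStep] ps;
// the k-th deq edge falls at k*deqStep ps.
def crossTimeline(enqStep: Int, deqStep: Int, nDeqTokens: Int): Seq[Int] =
  (1 to nDeqTokens).map { k =>
    val deqTime = k * deqStep
    (deqTime + enqStep - 1) / enqStep - 1 // ceil(deqTime/enqStep) - 1
  }

// Slow -> fast: each enq token is replayed for two deq tokens.
assert(crossTimeline(2, 1, 4) == Seq(0, 0, 1, 1))
// Fast -> slow: every other enq token expires unseen.
assert(crossTimeline(1, 2, 2) == Seq(1, 3))
```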
@@ -0,0 +1,153 @@
// See LICENSE for license details.

package midas
package core

import junctions._
import widgets._
import chisel3._
import chisel3.util._
import chisel3.core.ActualDirection
import chisel3.core.DataMirror.directionOf
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.diplomacy.AddressSet
import freechips.rocketchip.util.{DecoupledHelper}

import scala.collection.mutable

case object DMANastiKey extends Field[NastiParameters]
case object FpgaMMIOSize extends Field[BigInt]

// The AXI4 widths for a single host-DRAM channel
case object HostMemChannelNastiKey extends Field[NastiParameters]
// The number of host-DRAM channels -> all channels must have the same AXI4 widths
case object HostMemNumChannels extends Field[Int]
// The aggregate memory-space seen by masters wanting DRAM
case object MemNastiKey extends Field[NastiParameters]

class FPGATopIO(implicit val p: Parameters) extends WidgetIO {
  val dma = Flipped(new NastiIO()(p alterPartial ({ case NastiKey => p(DMANastiKey) })))
  val mem = Vec(4, new NastiIO()(p alterPartial ({ case NastiKey => p(HostMemChannelNastiKey) })))
}

// Platform-agnostic wrapper of the simulation models for the FPGA
class FPGATop(simIoType: SimWrapperChannels)(implicit p: Parameters) extends Module with HasWidgets {
  val io = IO(new FPGATopIO)
  // Simulation target
  val sim = Module(new SimBox(simIoType.cloneType))
  val simIo = sim.io.channelPorts
  // This reset is used to return the simulation to time 0.
  val master = addWidget(new SimulationMaster)
  val simReset = master.io.simReset

  sim.io.clock := clock
  sim.io.reset := reset.toBool || simReset
  sim.io.hostReset := simReset

  val memPorts = new mutable.ListBuffer[NastiIO]
  case class DmaInfo(name: String, port: NastiIO, size: BigInt)
  val dmaInfoBuffer = new mutable.ListBuffer[DmaInfo]

  // Instantiate bridge widgets.
  simIo.bridgeAnnos.map({ bridgeAnno =>
    val widgetChannelPrefix = s"${bridgeAnno.target.ref}"
    val widget = addWidget(bridgeAnno.elaborateWidget)
    widget.reset := reset.toBool || simReset
    widget match {
      case model: midas.models.FASEDMemoryTimingModel =>
        memPorts += model.io.host_mem
        model.hPort.hBits.axi4.aw.bits.user   := DontCare
        model.hPort.hBits.axi4.aw.bits.region := DontCare
        model.hPort.hBits.axi4.ar.bits.user   := DontCare
        model.hPort.hBits.axi4.ar.bits.region := DontCare
        model.hPort.hBits.axi4.w.bits.id   := DontCare
        model.hPort.hBits.axi4.w.bits.user := DontCare
      case peekPoke: PeekPokeBridgeModule =>
        peekPoke.io.step <> master.io.step
        master.io.done := peekPoke.io.idle
      case _ =>
    }
    widget.hPort.connectChannels2Port(bridgeAnno, simIo)

    widget match {
      case widget: HasDMA => dmaInfoBuffer += DmaInfo(widget.getWName, widget.dma, widget.dmaSize)
      case _ => Nil
    }
  })

  // Host memory channels
  // Masters = target memory channels + loadMemWidget
  val numMemModels = memPorts.length
  val nastiP = p.alterPartial({ case NastiKey => p(MemNastiKey) })
  val loadMem = addWidget(new LoadMemWidget(MemNastiKey))
  loadMem.reset := reset.toBool || simReset
  memPorts += loadMem.io.toSlaveMem

  val channelSize = BigInt(1) << p(HostMemChannelNastiKey).addrBits
  val hostMemAddrMap = new AddrMap(Seq.tabulate(p(HostMemNumChannels))(i =>
    AddrMapEntry(s"memChannel$i", MemRange(i * channelSize, channelSize, MemAttr(AddrMapProt.RW)))))

  val mem_xbar = Module(new NastiRecursiveInterconnect(numMemModels + 1, hostMemAddrMap)(nastiP))

  io.mem.zip(mem_xbar.io.slaves).foreach({ case (mem, slave) => mem <> NastiQueue(slave)(nastiP) })
  memPorts.zip(mem_xbar.io.masters).foreach({ case (mem_model, master) => master <> mem_model })

  // Sort the list of DMA ports by address region size, largest to smallest
  val dmaInfoSorted = dmaInfoBuffer.sortBy(_.size).reverse.toSeq
  // Build up the address map using the sorted list,
  // auto-assigning base addresses as we go.
  val dmaAddrMap = dmaInfoSorted.foldLeft((BigInt(0), List.empty[AddrMapEntry])) {
    case ((startAddr, addrMap), DmaInfo(widgetName, _, reqSize)) =>
      // Round up the size to the nearest power of 2
      val regionSize = 1 << log2Ceil(reqSize)
      val region = MemRange(startAddr, regionSize, MemAttr(AddrMapProt.RW))

      (startAddr + regionSize, AddrMapEntry(widgetName, region) :: addrMap)
  }._2.reverse
  val dmaPorts = dmaInfoSorted.map(_.port)

  if (dmaPorts.isEmpty) {
    val dmaParams = p.alterPartial({ case NastiKey => p(DMANastiKey) })
    val error = Module(new NastiErrorSlave()(dmaParams))
    error.io <> io.dma
  } else if (dmaPorts.size == 1) {
    dmaPorts(0) <> io.dma
  } else {
    val dmaParams = p.alterPartial({ case NastiKey => p(DMANastiKey) })
    val router = Module(new NastiRecursiveInterconnect(
      1, new AddrMap(dmaAddrMap))(dmaParams))
    router.io.masters.head <> NastiQueue(io.dma)(dmaParams)
    dmaPorts.zip(router.io.slaves).foreach { case (dma, slave) => dma <> NastiQueue(slave)(dmaParams) }
  }

  genCtrlIO(io.ctrl, p(FpgaMMIOSize))

  val addrConsts = dmaAddrMap.map {
    case AddrMapEntry(name, MemRange(addr, _, _)) =>
      (s"${name.toUpperCase}_DMA_ADDR" -> addr.longValue)
  }

  val headerConsts = addrConsts ++ List[(String, Long)](
    "CTRL_ID_BITS"   -> io.ctrl.nastiXIdBits,
    "CTRL_ADDR_BITS" -> io.ctrl.nastiXAddrBits,
    "CTRL_DATA_BITS" -> io.ctrl.nastiXDataBits,
    "CTRL_STRB_BITS" -> io.ctrl.nastiWStrobeBits,
    // These specify channel widths; used mostly in the test harnesses
    "MEM_ADDR_BITS"  -> io.mem(0).nastiXAddrBits,
    "MEM_DATA_BITS"  -> io.mem(0).nastiXDataBits,
    "MEM_ID_BITS"    -> io.mem(0).nastiXIdBits,
    // These are fixed by the AXI4 standard, only used in the SW DRAM model
    "MEM_SIZE_BITS"  -> io.mem(0).nastiXSizeBits,
    "MEM_LEN_BITS"   -> io.mem(0).nastiXLenBits,
    "MEM_RESP_BITS"  -> io.mem(0).nastiXRespBits,
    "MEM_STRB_BITS"  -> io.mem(0).nastiWStrobeBits,
    // Address width of the aggregated host-DRAM space
    "DMA_ID_BITS"    -> io.dma.nastiXIdBits,
    "DMA_ADDR_BITS"  -> io.dma.nastiXAddrBits,
    "DMA_DATA_BITS"  -> io.dma.nastiXDataBits,
    "DMA_STRB_BITS"  -> io.dma.nastiWStrobeBits,
    "DMA_WIDTH"      -> p(DMANastiKey).dataBits / 8,
    "DMA_SIZE"       -> log2Ceil(p(DMANastiKey).dataBits / 8)
  )
}
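The DMA address-map fold above can be isolated as a pure function: sort the requested regions largest-first, round each size up to the next power of two, and pack them contiguously from address 0. A self-contained sketch (hypothetical `assignDmaRegions` helper, independent of the rocket-chip `AddrMap` types):

```scala
// Hypothetical standalone version of the DMA base-address assignment.
// Returns name -> (baseAddr, regionSize), with sizes rounded to powers of two.
def log2Ceil(x: BigInt): Int = (x - 1).bitLength

def assignDmaRegions(sizes: Map[String, BigInt]): Map[String, (BigInt, BigInt)] = {
  val sorted = sizes.toSeq.sortBy(-_._2) // largest region first
  sorted.foldLeft((BigInt(0), Map.empty[String, (BigInt, BigInt)])) {
    case ((base, acc), (name, reqSize)) =>
      val regionSize = BigInt(1) << log2Ceil(reqSize) // round up to power of 2
      (base + regionSize, acc + (name -> (base, regionSize)))
  }._2
}
```

Sorting largest-first keeps every power-of-two region naturally aligned to its own size when packed from zero, which is why the commit sorts before folding.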
@@ -0,0 +1,31 @@
// See LICENSE for license details.

package midas
package core

import chisel3._

// Adapted from DecoupledIO in Chisel3
class HostDecoupledIO[+T <: Data](gen: T) extends Bundle {
  val hReady = Input(Bool())
  val hValid = Output(Bool())
  val hBits  = gen.cloneType
  def fire(): Bool = hReady && hValid
  override def cloneType: this.type =
    new HostDecoupledIO(gen).asInstanceOf[this.type]
}

/** Adds a ready-valid handshaking protocol to any interface.
  * The standard used is that the consumer uses the flipped interface.
  */
object HostDecoupled {
  def apply[T <: Data](gen: T): HostDecoupledIO[T] = new HostDecoupledIO(gen)
}

class HostReadyValid extends Bundle {
  val hReady = Input(Bool())
  val hValid = Output(Bool())
  def fire(): Bool = hReady && hValid
}
@@ -0,0 +1,115 @@
// See LICENSE for license details.

package midas.core

import freechips.rocketchip.tilelink.LFSR64 // Better than chisel's

import chisel3._
import chisel3.util._
import chisel3.experimental.{chiselName}

// Describes an input channel / input port pair for an LI-BDN unittest
// name: a descriptive channel name
// reference: a hardware handle to the input on the reference RTL
// modelChannel: a hardware handle to the input channel on the model
// tokenGenFunc: an option carrying a function that, when executed,
//   generates hardware to produce a new input value each cycle
case class IChannelDesc(
    name: String,
    reference: Data,
    modelChannel: DecoupledIO[Data],
    tokenGenFunc: Option[() => Data] = None) {

  private def tokenSequenceGenerator(typ: Data): Data =
    Cat(Seq.fill((typ.getWidth + 63)/64)(LFSR64()))(typ.getWidth - 1, 0).asTypeOf(typ)

  // Generate the testing hardware for a single input channel of a model
  @chiselName
  def genEnvironment(testLength: Int): Unit = {
    val inputGen = tokenGenFunc.getOrElse(() => tokenSequenceGenerator(reference.cloneType))()

    // Drive a new input to the reference on every cycle
    reference := inputGen

    // Drive tokenized inputs to the model
    val inputTokenQueue = Module(new Queue(reference.cloneType, testLength, flow = true))
    inputTokenQueue.io.enq.bits  := reference
    inputTokenQueue.io.enq.valid := true.B

    // This provides an irrevocable input token stream
    val stickyTokenValid = Reg(Bool())
    modelChannel <> inputTokenQueue.io.deq
    modelChannel.valid := stickyTokenValid && inputTokenQueue.io.deq.valid
    inputTokenQueue.io.deq.ready := stickyTokenValid && modelChannel.ready

    when (modelChannel.fire || ~stickyTokenValid) {
      stickyTokenValid := LFSR64()(1)
    }
  }
}

// Describes an output channel / output port pair for an LI-BDN unittest
// name: a descriptive channel name
// reference: a hardware handle to the output on the reference RTL
// modelChannel: a hardware handle to the output channel on the model
// comparisonFunc: a function that elaborates hardware to compare
//   an output token Decoupled[Data] to the correct reference output [Data]
case class OChannelDesc(
    name: String,
    reference: Data,
    modelChannel: DecoupledIO[Data],
    comparisonFunc: (Data, DecoupledIO[Data]) => Bool = (a, b) => !b.fire || a.asUInt === b.bits.asUInt) {

  // Generate the testing hardware for a single output channel of a model
  @chiselName
  def genEnvironment(testLength: Int): Bool = {
    val refOutputs = Module(new Queue(reference.cloneType, testLength, flow = true))
    val refIdx   = RegInit(0.U(log2Ceil(testLength + 1).W))
    val modelIdx = RegInit(0.U(log2Ceil(testLength + 1).W))

    val hValidPrev = RegNext(modelChannel.valid, false.B)
    val hReadyPrev = RegNext(modelChannel.ready)
    val hFirePrev  = hValidPrev && hReadyPrev

    // Collect outputs from the reference RTL
    refOutputs.io.enq.valid := true.B
    refOutputs.io.enq.bits  := reference

    assert(comparisonFunc(refOutputs.io.deq.bits, modelChannel),
      s"${name} Channel: Output token traces did not match")
    assert(!hValidPrev || hFirePrev || modelChannel.valid,
      s"${name} Channel: hValid de-asserted without handshake, violating output token irrevocability")

    val modelChannelDone = modelIdx === testLength.U
    when (modelChannel.fire) { modelIdx := modelIdx + 1.U }
    refOutputs.io.deq.ready := modelChannel.fire

    // Fuzz backpressure on the token channel
    modelChannel.ready := LFSR64()(1) & !modelChannelDone

    // Return the done signal
    modelChannelDone
  }
}

object TokenComparisonFunctions {
  // Ignores the first N output tokens when verifying a token output trace
  def ignoreNTokens(numTokens: Int)(ref: Data, ch: DecoupledIO[Data]): Bool = {
    val count = RegInit(0.U(log2Ceil(numTokens + 1).W))
    val ignoreToken = count < numTokens.U
    when (ch.fire && ignoreToken) { count := count + 1.U }
    !ch.fire || ignoreToken || ref.asUInt === ch.bits.asUInt
  }
}

object DirectedLIBDNTestHelper {
  @chiselName
  def apply(
      inputChannelMapping:  Seq[IChannelDesc],
      outputChannelMapping: Seq[OChannelDesc],
      testLength: Int = 4096): Bool = {
    inputChannelMapping.foreach(_.genEnvironment(testLength))
    val finished = outputChannelMapping.map(_.genEnvironment(testLength)).foldLeft(true.B)(_ && _)
    finished
  }
}
@ -0,0 +1,83 @@
|
|||
// See LICENSE for license details.
|
||||
|
||||
package midas.core
|
||||
|
||||
import chisel3._
|
||||
import chisel3.util._
|
||||
import chisel3.experimental.{Direction}
|
||||
import chisel3.experimental.DataMirror.directionOf
|
||||
|
||||
import scala.collection.mutable.{ArrayBuffer}
|
||||
|
||||
// A collection of useful types and methods for moving between target and host-land interfaces
|
||||
object SimUtils {
|
||||
type ChLeafType = Bits
|
||||
type ChTuple = Tuple2[ChLeafType, String]
|
||||
type RVChTuple = Tuple2[ReadyValidIO[Data], String]
|
||||
type ParsePortsTuple = (List[ChTuple], List[ChTuple], List[RVChTuple], List[RVChTuple])
|
||||
|
||||
// (Some, None) -> Source channel
|
||||
// (None, Some) -> Sink channel
|
||||
// (Some, Some) -> Loop back channel -> two interconnected models
|
||||
trait PortTuple[T <: Any] {
|
||||
def source: Option[T]
|
||||
def sink: Option[T]
|
||||
def isOutput(): Boolean = sink == None
|
||||
def isInput(): Boolean = source == None
|
||||
def isLoopback(): Boolean = source != None && sink != None
|
||||
}

case class WirePortTuple(source: Option[ReadyValidIO[Data]], sink: Option[ReadyValidIO[Data]])
    extends PortTuple[ReadyValidIO[Data]] {
  require(source != None || sink != None)
}

// Tuple of forward port and reverse (backpressure) port
type TargetRVPortType = (ReadyValidIO[ValidIO[Data]], ReadyValidIO[Bool])

// A tuple of Options of the above type. _1 => source port, _2 => sink port
// Same principle as the wire channel, now with a more complex port type
case class TargetRVPortTuple(source: Option[TargetRVPortType], sink: Option[TargetRVPortType])
    extends PortTuple[TargetRVPortType] {
  require(source != None || sink != None)
}

def rvChannelNamePair(chName: String): (String, String) = (chName + "_fwd", chName + "_rev")
def rvChannelNamePair(tuple: RVChTuple): (String, String) = rvChannelNamePair(tuple._2)

def prefixWith(prefix: String, base: Any): String =
  if (prefix != "") s"${prefix}_${base}" else base.toString

// Returns a list of input and output elements, with their flattened names
def parsePorts(io: Seq[(String, Data)], alsoFlattenRVPorts: Boolean): ParsePortsTuple = {
  val inputs = ArrayBuffer[ChTuple]()
  val outputs = ArrayBuffer[ChTuple]()
  val rvInputs = ArrayBuffer[RVChTuple]()
  val rvOutputs = ArrayBuffer[RVChTuple]()

  def loop(name: String, data: Data): Unit = data match {
    case c: Clock => // skip
    case rv: ReadyValidIO[_] =>
      (directionOf(rv.valid): @unchecked) match {
        case Direction.Input  => rvInputs += (rv -> name)
        case Direction.Output => rvOutputs += (rv -> name)
      }
      if (alsoFlattenRVPorts) rv.elements foreach { case (n, e) => loop(prefixWith(name, n), e) }
    case b: Record =>
      b.elements foreach { case (n, e) => loop(prefixWith(name, n), e) }
    case v: Vec[_] =>
      v.zipWithIndex foreach { case (e, i) => loop(prefixWith(name, i), e) }
    case b: ChLeafType =>
      (directionOf(b): @unchecked) match {
        case Direction.Input  => inputs += (b -> name)
        case Direction.Output => outputs += (b -> name)
      }
  }
  io.foreach({ case (name, port) => loop(name, port) })
  (inputs.toList, outputs.toList, rvInputs.toList, rvOutputs.toList)
}

def parsePorts(io: Data, prefix: String = "", alsoFlattenRVPorts: Boolean = true): ParsePortsTuple =
  parsePorts(Seq(prefix -> io), alsoFlattenRVPorts)

def parsePortsSeq(io: Seq[(String, Data)], alsoFlattenRVPorts: Boolean = true): ParsePortsTuple =
  parsePorts(io, alsoFlattenRVPorts)

}
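The flattened-name and channel-name conventions above can be exercised in plain Scala (no Chisel required; the port names here are invented for illustration):

```scala
// Copies of the two pure helpers from SimUtils above.
def prefixWith(prefix: String, base: Any): String =
  if (prefix != "") s"${prefix}_${base}" else base.toString

def rvChannelNamePair(chName: String): (String, String) =
  (chName + "_fwd", chName + "_rev")

// A nested field "bits.addr" under a port "mem" flattens to "mem_bits_addr":
val flat = prefixWith(prefixWith("mem", "bits"), "addr") // "mem_bits_addr"
// A ready-valid channel named "mem" yields a forward/reverse channel pair:
val (fwd, rev) = rvChannelNamePair("mem") // ("mem_fwd", "mem_rev")
```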

@@ -0,0 +1,355 @@
// See LICENSE for license details.

package midas
package core

import midas.widgets.BridgeIOAnnotation
import midas.passes.fame
import midas.passes.fame.{FAMEChannelConnectionAnnotation, DecoupledForwardChannel}
import midas.core.SimUtils._

// from rocketchip
import freechips.rocketchip.config.{Parameters, Field}

import chisel3._
import chisel3.util._
import chisel3.experimental.{MultiIOModule, Direction}
import chisel3.experimental.DataMirror.directionOf
import firrtl.annotations.{ReferenceTarget}

import scala.collection.immutable.ListMap
import scala.collection.mutable.{ArrayBuffer}

case object ChannelLen extends Field[Int]
case object ChannelWidth extends Field[Int]

trait HasSimWrapperParams {
  implicit val p: Parameters
  implicit val channelWidth = p(ChannelWidth)
  val traceMaxLen = p(strober.core.TraceMaxLen)
  val daisyWidth = p(strober.core.DaisyWidth)
  val sramChainNum = p(strober.core.SRAMChainNum)
}

class SimReadyValidRecord(es: Seq[(String, ReadyValidIO[Data])]) extends Record {
  val elements = ListMap() ++ (es map { case (name, rv) =>
    (directionOf(rv.valid): @unchecked) match {
      case Direction.Input  => name -> Flipped(SimReadyValid(rv.bits.cloneType))
      case Direction.Output => name -> SimReadyValid(rv.bits.cloneType)
    }
  })
  def cloneType = new SimReadyValidRecord(es).asInstanceOf[this.type]
}

class ReadyValidTraceRecord(es: Seq[(String, ReadyValidIO[Data])]) extends Record {
  val elements = ListMap() ++ (es map {
    case (name, rv) => name -> ReadyValidTrace(rv.bits.cloneType)
  })
  def cloneType = new ReadyValidTraceRecord(es).asInstanceOf[this.type]
}

// Regenerates the "bits" field of a target ready-valid interface from a list of flattened
// elements that include the "bits_" prefix. This prefix is stripped off.
class PayloadRecord(elms: Seq[(String, Data)]) extends Record {
  override val elements = ListMap((elms map { case (name, data) => name.stripPrefix("bits_") -> data.cloneType }):_*)
  override def cloneType: this.type = new PayloadRecord(elms).asInstanceOf[this.type]
}

abstract class ChannelizedWrapperIO(chAnnos: Seq[FAMEChannelConnectionAnnotation],
    leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port]) extends Record {

  def regenTypesFromField(name: String, tpe: firrtl.ir.Type): Seq[(String, ChLeafType)] = tpe match {
    case firrtl.ir.BundleType(fields) => fields.flatMap(f => regenTypesFromField(prefixWith(name, f.name), f.tpe))
    case firrtl.ir.UIntType(width: firrtl.ir.IntWidth) => Seq(name -> UInt(width.width.toInt.W))
    case firrtl.ir.SIntType(width: firrtl.ir.IntWidth) => Seq(name -> SInt(width.width.toInt.W))
    case _ => throw new RuntimeException(s"Unexpected type in token payload: ${tpe}.")
  }

  def regenTypes(refTargets: Seq[ReferenceTarget]): Seq[(String, ChLeafType)] = {
    val port = leafTypeMap(refTargets.head.copy(component = Seq()))
    val fieldName = refTargets.head.component match {
      case firrtl.annotations.TargetToken.Field(fName) :: Nil => fName
      case firrtl.annotations.TargetToken.Field(fName) :: fields => fName
      case _ => throw new RuntimeException("Expected only a bits field in ReferenceTarget's component.")
    }

    val bitsField = port.tpe match {
      case a: firrtl.ir.BundleType => a.fields.filter(_.name == fieldName).head
      case _ => throw new RuntimeException("ReferenceTargets should point at the channel's bundle.")
    }

    regenTypesFromField("", bitsField.tpe)
  }

  def regenPayloadType(refTargets: Seq[ReferenceTarget]): Data = {
    require(!refTargets.isEmpty)
    // Reject all (String -> Data) pairs not included in the refTargets.
    // Use this to remove target valid
    val targetLeafNames = refTargets.map(_.component.reverse.head.value).toSet
    val elements = regenTypes(refTargets).filter({ case (name, f) => targetLeafNames(name) })
    elements match {
      case (name, field) :: Nil => field // If there's only a single field, just pass out the type
      case elms => new PayloadRecord(elms)
    }
  }

  def regenWireType(refTargets: Seq[ReferenceTarget]): ChLeafType = {
    require(refTargets.size == 1, "FIXME: Handle aggregated wires")
    regenTypes(refTargets).head._2
  }

  val payloadTypeMap: Map[FAMEChannelConnectionAnnotation, Data] = chAnnos.collect({
    // Target Decoupled Channels need to have their target-valid ReferenceTarget removed
    case ch @ FAMEChannelConnectionAnnotation(_, DecoupledForwardChannel(_, Some(vsrc), _, _), Some(srcs), _) =>
      ch -> regenPayloadType(srcs.filterNot(_ == vsrc))
    case ch @ FAMEChannelConnectionAnnotation(_, DecoupledForwardChannel(_, _, _, Some(vsink)), _, Some(sinks)) =>
      ch -> regenPayloadType(sinks.filterNot(_ == vsink))
  }).toMap

  val wireTypeMap: Map[FAMEChannelConnectionAnnotation, ChLeafType] = chAnnos.collect({
    case ch @ FAMEChannelConnectionAnnotation(_, fame.PipeChannel(_), Some(srcs), _) => ch -> regenWireType(srcs)
    case ch @ FAMEChannelConnectionAnnotation(_, fame.PipeChannel(_), _, Some(sinks)) => ch -> regenWireType(sinks)
  }).toMap

  val wireElements = ArrayBuffer[(String, ReadyValidIO[Data])]()

  val wirePortMap: Map[String, WirePortTuple] = chAnnos.collect({
    case ch @ FAMEChannelConnectionAnnotation(globalName, fame.PipeChannel(_), sources, sinks) => {
      val sinkP = sinks.map({ tRefs =>
        val name = tRefs.head.ref.stripSuffix("_bits")
        val port = Flipped(Decoupled(wireTypeMap(ch)))
        wireElements += name -> port
        port
      })
      val sourceP = sources.map({ tRefs =>
        val name = tRefs.head.ref.stripSuffix("_bits")
        val port = Decoupled(wireTypeMap(ch))
        wireElements += name -> port
        port
      })
      (globalName -> WirePortTuple(sourceP, sinkP))
    }
  }).toMap

  // Looks up a channel based on a channel name
  val wireOutputPortMap = wirePortMap.collect({
    case (name, portTuple) if portTuple.isOutput => name -> portTuple.source.get
  })

  val wireInputPortMap = wirePortMap.collect({
    case (name, portTuple) if portTuple.isInput => name -> portTuple.sink.get
  })

  val rvElements = ArrayBuffer[(String, ReadyValidIO[Data])]()

  // Using a channel's globalName, look up its associated port tuple
  val rvPortMap: Map[String, TargetRVPortTuple] = chAnnos.collect({
    case ch @ FAMEChannelConnectionAnnotation(globalName, info @ DecoupledForwardChannel(_,_,_,_), leafSources, leafSinks) =>
      val sourcePortPair = leafSources.map({ tRefs =>
        require(!tRefs.isEmpty, "FIXME: Are empty decoupleds OK?")
        val validTRef: ReferenceTarget = info.validSource.getOrElse(throw new RuntimeException(
          "Target RV port has leaves but no TRef to a validSource"))
        val readyTRef: ReferenceTarget = info.readySink.getOrElse(throw new RuntimeException(
          "Target RV port has leaves but no TRef to a readySink"))

        val fwdName = validTRef.ref
        val fwdPort = Decoupled(Valid(payloadTypeMap(ch)))
        val revName = readyTRef.ref
        val revPort = Flipped(Decoupled(Bool()))
        rvElements ++= Seq((fwdName -> fwdPort), (revName -> revPort))
        (fwdPort, revPort)
      })

      val sinkPortPair = leafSinks.map({ tRefs =>
        require(!tRefs.isEmpty, "FIXME: Are empty decoupleds OK?")
        val validTRef: ReferenceTarget = info.validSink.getOrElse(throw new RuntimeException(
          "Target RV port has payload sinks but no TRef to a validSink"))
        val readyTRef: ReferenceTarget = info.readySource.getOrElse(throw new RuntimeException(
          "Target RV port has payload sinks but no TRef to a readySource"))

        val fwdName = validTRef.ref
        val fwdPort = Flipped(Decoupled(Valid(payloadTypeMap(ch))))
        val revName = readyTRef.ref
        val revPort = Decoupled(Bool())
        rvElements ++= Seq((fwdName -> fwdPort), (revName -> revPort))
        (fwdPort, revPort)
      })
      globalName -> TargetRVPortTuple(sourcePortPair, sinkPortPair)
  }).toMap

  // Looks up a channel based on a channel name
  val rvOutputPortMap = rvPortMap.collect({
    case (name, portTuple) if portTuple.isOutput => name -> portTuple.source.get
  })

  val rvInputPortMap = rvPortMap.collect({
    case (name, portTuple) if portTuple.isInput => name -> portTuple.sink.get
  })

  // Looks up an FCCA based on a global channel name
  val chNameToAnnoMap = chAnnos.map(anno => anno.globalName -> anno)
}

class TargetBoxIO(val chAnnos: Seq[FAMEChannelConnectionAnnotation],
    leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
    extends ChannelizedWrapperIO(chAnnos, leafTypeMap) {

  val clock = Input(Clock())
  val hostReset = Input(Bool())
  override val elements = ListMap((wireElements ++ rvElements):_*) ++
    // Untokenized ports
    ListMap("clock" -> clock, "hostReset" -> hostReset)
  override def cloneType: this.type = new TargetBoxIO(chAnnos, leafTypeMap).asInstanceOf[this.type]
}

class TargetBox(chAnnos: Seq[FAMEChannelConnectionAnnotation],
    leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port]) extends BlackBox {
  val io = IO(new TargetBoxIO(chAnnos, leafTypeMap))
}

class SimWrapperChannels(val chAnnos: Seq[FAMEChannelConnectionAnnotation],
    val bridgeAnnos: Seq[BridgeIOAnnotation],
    leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
    extends ChannelizedWrapperIO(chAnnos, leafTypeMap) {

  override val elements = ListMap((wireElements ++ rvElements):_*)
  override def cloneType: this.type = new SimWrapperChannels(chAnnos, bridgeAnnos, leafTypeMap).asInstanceOf[this.type]
}

class SimBox(simChannels: SimWrapperChannels) extends BlackBox {
  val io = IO(new Bundle {
    val clock = Input(Clock())
    val reset = Input(Bool())
    val hostReset = Input(Bool())
    val channelPorts = simChannels.cloneType
  })
}

class SimWrapper(chAnnos: Seq[FAMEChannelConnectionAnnotation],
    bridgeAnnos: Seq[BridgeIOAnnotation],
    leafTypeMap: Map[ReferenceTarget, firrtl.ir.Port])
    (implicit val p: Parameters) extends MultiIOModule with HasSimWrapperParams {

  // Remove all FCCAs that are loopback channels. All non-loopback FCCAs connect
  // to bridges and will be presented in the SimWrapper's IO
  val bridgeChAnnos = chAnnos.collect({
    case fca @ FAMEChannelConnectionAnnotation(_,_,_,None) => fca
    case fca @ FAMEChannelConnectionAnnotation(_,_,None,_) => fca
  })

  val channelPorts = IO(new SimWrapperChannels(bridgeChAnnos, bridgeAnnos, leafTypeMap))
  val hostReset = IO(Input(Bool()))
  val target = Module(new TargetBox(chAnnos, leafTypeMap))

  target.io.hostReset := reset.toBool && hostReset
  target.io.clock := clock
  import chisel3.core.ExplicitCompileOptions.NotStrict // FIXME

  def getPipeChannelType(chAnno: FAMEChannelConnectionAnnotation): ChLeafType = {
    target.io.wireTypeMap(chAnno)
  }

  def genPipeChannel(chAnno: FAMEChannelConnectionAnnotation, latency: Int = 1): PipeChannel[ChLeafType] = {
    require(chAnno.sources == None || chAnno.sources.get.size == 1, "Can't aggregate wire-type channels yet")
    require(chAnno.sinks == None || chAnno.sinks.get.size == 1, "Can't aggregate wire-type channels yet")

    val channel = Module(new PipeChannel(getPipeChannelType(chAnno), latency))
    channel suggestName s"PipeChannel_${chAnno.globalName}"

    val portTuple = target.io.wirePortMap(chAnno.globalName)
    portTuple.source match {
      case Some(srcP) => channel.io.in <> srcP
      case None => channel.io.in <> channelPorts.elements(s"${chAnno.globalName}_sink")
    }

    portTuple.sink match {
      case Some(sinkP) => sinkP <> channel.io.out
      case None => channelPorts.elements(s"${chAnno.globalName}_source") <> channel.io.out
    }

    channel.io.trace.ready := DontCare
    channel.io.traceLen := DontCare
    channel
  }

  // Helper functions to attach legacy SimReadyValidIO to true, dual-channel implementations of target ready-valid
  def bindRVChannelEnq[T <: Data](enq: SimReadyValidIO[T], port: TargetRVPortType): Unit = {
    val (fwdPort, revPort) = port
    enq.fwd.hValid := fwdPort.valid
    enq.target.valid := fwdPort.bits.valid
    enq.target.bits := fwdPort.bits.bits // Yeah, i know
    fwdPort.ready := enq.fwd.hReady

    // Connect up the target-ready token channel
    revPort.valid := enq.rev.hValid
    revPort.bits := enq.target.ready
    enq.rev.hReady := revPort.ready
  }

  def bindRVChannelDeq[T <: Data](deq: SimReadyValidIO[T], port: TargetRVPortType): Unit = {
    val (fwdPort, revPort) = port
    deq.fwd.hReady := fwdPort.ready
    fwdPort.valid := deq.fwd.hValid
    fwdPort.bits.valid := deq.target.valid
    fwdPort.bits.bits := deq.target.bits

    // Connect up the target-ready token channel
    deq.rev.hValid := revPort.valid
    deq.target.ready := revPort.bits
    revPort.ready := deq.rev.hReady
  }

  def getReadyValidChannelType(chAnno: FAMEChannelConnectionAnnotation): Data = {
    target.io.payloadTypeMap(chAnno)
  }

  def genReadyValidChannel(chAnno: FAMEChannelConnectionAnnotation): ReadyValidChannel[Data] = {
    val chName = chAnno.globalName
    val strippedName = chName.stripSuffix("_fwd")
    // Determine which bridge this channel belongs to by looking it up with the valid
    //val bridgeClockRatio = io.bridges.find(_(rvInterface.valid)) match {
    //  case Some(bridge) => bridge.clockRatio
    //  case None => UnityClockRatio
    //}
    val bridgeClockRatio = UnityClockRatio // TODO: FIXME
    // A channel is considered "flipped" if it's sunk by the transformed RTL (sourced by a bridge)
    val channel = Module(new ReadyValidChannel(getReadyValidChannelType(chAnno).cloneType))

    channel.suggestName(s"ReadyValidChannel_$strippedName")

    val enqPortPair = (chAnno.sources match {
      case Some(_) => target.io.rvOutputPortMap(chName)
      case None => channelPorts.rvInputPortMap(chName)
    })
    bindRVChannelEnq(channel.io.enq, enqPortPair)

    val deqPortPair = (chAnno.sinks match {
      case Some(_) => target.io.rvInputPortMap(chName)
      case None => channelPorts.rvOutputPortMap(chName)
    })
    bindRVChannelDeq(channel.io.deq, deqPortPair)

    channel.io.trace := DontCare
    channel.io.traceLen := DontCare
    channel.io.targetReset.bits := false.B
    channel.io.targetReset.valid := true.B
    channel
  }

  // Generate all ready-valid channels
  val rvChannels = chAnnos.collect({
    case ch @ FAMEChannelConnectionAnnotation(_, fame.DecoupledForwardChannel(_,_,_,_), _, _) => genReadyValidChannel(ch)
  })

  // Generate all wire channels, excluding reset
  chAnnos.collect({
    case ch @ FAMEChannelConnectionAnnotation(name, fame.PipeChannel(latency), _, _) => genPipeChannel(ch, latency)
  })
}

@@ -0,0 +1,150 @@
package midas
package models

import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle
import junctions._
import midas.widgets._

import Console.{UNDERLINED, RESET}

case class BankConflictConfig(
    maxBanks: Int,
    maxLatencyBits: Int = 12, // 4K cycles
    params: BaseParams) extends BaseConfig {

  def elaborate()(implicit p: Parameters): BankConflictModel = Module(new BankConflictModel(this))
}

class BankConflictMMRegIO(cfg: BankConflictConfig)(implicit p: Parameters)
    extends SplitTransactionMMRegIO(cfg) {
  val latency = Input(UInt(cfg.maxLatencyBits.W))
  val conflictPenalty = Input(UInt(32.W))
  // The mask bits setting determines how many banks are used
  val bankAddr = Input(new ProgrammableSubAddr(
    maskBits = log2Ceil(cfg.maxBanks),
    longName = "Bank Address",
    defaultOffset = 13,
    defaultMask = (1 << cfg.maxBanks) - 1
  ))

  val bankConflicts = Output(Vec(cfg.maxBanks, UInt(32.W)))

  val registers = maxReqRegisters ++ Seq(
    (latency -> RuntimeSetting(30,
      "Latency",
      min = 1,
      max = Some((1 << (cfg.maxLatencyBits-1)) - 1))),
    (conflictPenalty -> RuntimeSetting(30,
      "Bank-Conflict Penalty",
      max = Some((1 << (cfg.maxLatencyBits-1)) - 1)))
  )

  def requestSettings() {
    Console.println(s"${UNDERLINED}Generating runtime configuration for Bank-Conflict Model${RESET}")
  }
}

class BankConflictIO(cfg: BankConflictConfig)(implicit p: Parameters)
    extends SplitTransactionModelIO()(p) {
  val mmReg = new BankConflictMMRegIO(cfg)
}

class BankQueueEntry(cfg: BankConflictConfig)(implicit p: Parameters) extends Bundle {
  val xaction = new TransactionMetaData
  val bankAddr = UInt(log2Ceil(cfg.maxBanks).W)
  override def cloneType = new BankQueueEntry(cfg)(p).asInstanceOf[this.type]
}

// Appends a target cycle at which this reference should be complete
class BankConflictReference(cfg: BankConflictConfig)(implicit p: Parameters) extends Bundle {
  val reference = new BankQueueEntry(cfg)
  val cycle = UInt(cfg.maxLatencyBits.W) // Indicates latency until doneness
  val done = Bool() // Set high when the cycle count expires
  override def cloneType = new BankConflictReference(cfg)(p).asInstanceOf[this.type]
}

object BankConflictConstants {
  val nBankStates = 3
  val bankIdle :: bankBusy :: bankPrecharge :: Nil = Enum(nBankStates)
}

import BankConflictConstants._

class BankConflictModel(cfg: BankConflictConfig)(implicit p: Parameters) extends SplitTransactionModel(cfg)(p) {

  val longName = "Bank Conflict"
  def printTimingModelGenerationConfig {}
  /**************************** CHISEL BEGINS *********************************/
  // This is the absolute number of banks the model can account for
  lazy val io = IO(new BankConflictIO(cfg))

  val latency = io.mmReg.latency
  val conflictPenalty = io.mmReg.conflictPenalty

  val transactionQueue = Module(new DualQueue(
    gen = new BankQueueEntry(cfg),
    entries = cfg.maxWrites + cfg.maxReads))

  transactionQueue.io.enqA.valid := newWReq
  transactionQueue.io.enqA.bits.xaction := TransactionMetaData(awQueue.io.deq.bits)
  transactionQueue.io.enqA.bits.bankAddr := io.mmReg.bankAddr.getSubAddr(awQueue.io.deq.bits.addr)

  transactionQueue.io.enqB.valid := tNasti.ar.fire
  transactionQueue.io.enqB.bits.xaction := TransactionMetaData(tNasti.ar.bits)
  transactionQueue.io.enqB.bits.bankAddr := io.mmReg.bankAddr.getSubAddr(tNasti.ar.bits.addr)

  val bankBusyCycles = Seq.fill(cfg.maxBanks)(RegInit(0.U(cfg.maxLatencyBits.W)))
  val bankConflictCounts = RegInit(VecInit(Seq.fill(cfg.maxBanks)(0.U(32.W))))

  val newReference = Wire(Decoupled(new BankConflictReference(cfg)))
  newReference.valid := transactionQueue.io.deq.valid
  newReference.bits.reference := transactionQueue.io.deq.bits
  val marginalCycles = latency + VecInit(bankBusyCycles)(transactionQueue.io.deq.bits.bankAddr)
  newReference.bits.cycle := tCycle(cfg.maxLatencyBits-1, 0) + marginalCycles
  newReference.bits.done := marginalCycles === 0.U
  transactionQueue.io.deq.ready := newReference.ready

  val refBuffer = CollapsingBuffer(newReference, cfg.maxReads + cfg.maxWrites)
  val refList = refBuffer.io.entries
  val refUpdates = refBuffer.io.updates

  bankBusyCycles.zip(bankConflictCounts).zipWithIndex.foreach({ case ((busyCycles, conflictCount), idx) =>
    when(busyCycles > 0.U){
      busyCycles := busyCycles - 1.U
    }

    when(newReference.fire() && newReference.bits.reference.bankAddr === idx.U){
      busyCycles := marginalCycles + conflictPenalty
      conflictCount := Mux(busyCycles > 0.U, conflictCount + 1.U, conflictCount)
    }
  })

  // Mark the reference as complete
  refList.zip(refUpdates).foreach({ case (ref, update) =>
    when(tCycle(cfg.maxLatencyBits-1, 0) === ref.bits.cycle) { update.bits.done := true.B }
  })

  val selector = Module(new Arbiter(refList.head.bits.cloneType, refList.size))
  selector.io.in <> refList.map({ entry =>
    val candidate = V2D(entry)
    candidate.valid := entry.valid && entry.bits.done
    candidate
  })

  // Take the readies from the arbiter, and kill the selected entry
  refUpdates.zip(selector.io.in).foreach({ case (ref, sel) =>
    when(sel.fire()) { ref.valid := false.B } })

  io.mmReg.bankConflicts := bankConflictCounts

  val completedRef = selector.io.out.bits.reference

  rResp.bits := ReadResponseMetaData(completedRef.xaction)
  wResp.bits := WriteResponseMetaData(completedRef.xaction)
  wResp.valid := selector.io.out.valid && completedRef.xaction.isWrite
  rResp.valid := selector.io.out.valid && !completedRef.xaction.isWrite
  selector.io.out.ready := Mux(completedRef.xaction.isWrite, wResp.ready, rResp.ready)
}

@@ -0,0 +1,726 @@
package midas
package models

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle
import chisel3._
import chisel3.util._

import org.json4s._
import org.json4s.native.JsonMethods._

import Console.{UNDERLINED, GREEN, RESET}
import scala.collection.mutable
import scala.io.Source

trait HasDRAMMASConstants {
  val maxDRAMTimingBits = 7 // width of a DRAM timing
  val tREFIWidth = 14 // Refresh interval. Suffices up to tCK = ~0.5ns (for 64ms, 8192 refresh commands)
  val tREFIBits = 14 // Refresh interval. Suffices up to tCK = ~0.5ns (for 64ms, 8192 refresh commands)
  val tRFCBits = 10
  val numBankStates = 2
  val numRankStates = 2
}

object DRAMMasEnums extends HasDRAMMASConstants {
  val cmd_nop :: cmd_act :: cmd_pre :: cmd_casw :: cmd_casr :: cmd_ref :: Nil = Enum(6)
  val bank_idle :: bank_active :: Nil = Enum(numBankStates)
  val rank_active :: rank_refresh :: Nil = Enum(numRankStates)
}

case class JSONField(value: BigInt, units: String)

class DRAMProgrammableTimings extends Bundle with HasDRAMMASConstants with HasProgrammableRegisters
    with HasConsoleUtils {
  // The most vanilla of DRAM timings
  val tAL = UInt(maxDRAMTimingBits.W)
  val tCAS = UInt(maxDRAMTimingBits.W)
  val tCMD = UInt(maxDRAMTimingBits.W)
  val tCWD = UInt(maxDRAMTimingBits.W)
  val tCCD = UInt(maxDRAMTimingBits.W)
  val tFAW = UInt(maxDRAMTimingBits.W)
  val tRAS = UInt(maxDRAMTimingBits.W)
  val tREFI = UInt(tREFIBits.W)
  val tRC = UInt(maxDRAMTimingBits.W)
  val tRCD = UInt(maxDRAMTimingBits.W)
  val tRFC = UInt(tRFCBits.W)
  val tRRD = UInt(maxDRAMTimingBits.W)
  val tRP = UInt(maxDRAMTimingBits.W)
  val tRTP = UInt(maxDRAMTimingBits.W)
  val tRTRS = UInt(maxDRAMTimingBits.W)
  val tWR = UInt(maxDRAMTimingBits.W)
  val tWTR = UInt(maxDRAMTimingBits.W)

  def tCAS2tCWL(tCAS: BigInt) = {
    require(tCAS > 4)
    if (tCAS > 12) tCAS - 4
    else if (tCAS > 9) tCAS - 3
    else if (tCAS > 7) tCAS - 2
    else if (tCAS > 5) tCAS - 1
    else tCAS
  }

  // Defaults are set to sg093, x8, 2048Mb density (1 GHz clock)
  val registers = Seq(
    tAL   -> RuntimeSetting(0, "Additive Latency"),
    tCAS  -> JSONSetting(14, "CAS Latency", { _("CL_TIME") }),
    tCMD  -> JSONSetting(1, "Command Transport Time", { lut => 1 }),
    tCWD  -> JSONSetting(10, "Write CAS Latency", { lut => tCAS2tCWL(lut("CL_TIME")) }),
    tCCD  -> JSONSetting(4, "Column-to-Column Delay", { _("TCCD") }),
    tFAW  -> JSONSetting(25, "Four row-Activation Window", { _("TFAW") }),
    tRAS  -> JSONSetting(33, "Row Access Strobe Delay", { _("TRAS_MIN") }),
    tREFI -> JSONSetting(7800, "REFresh Interval", { _("TRFC_MAX")/9 }),
    tRC   -> JSONSetting(47, "Row Cycle time", { _("TRC") }),
    tRCD  -> JSONSetting(14, "Row-to-Column Delay", { _("TRCD") }),
    tRFC  -> JSONSetting(160, "ReFresh Cycle time", { _("TRFC_MIN") }),
    tRRD  -> JSONSetting(8, "Row-to-Row Delay", { _("TRRD") }),
    tRP   -> JSONSetting(14, "Row-Precharge delay", { _("TRP") }),
    tRTP  -> JSONSetting(8, "Read-To-Precharge delay", { lut => lut("TRTP").max(lut("TRTP_TCK")) }),
    tRTRS -> JSONSetting(2, "Rank-to-Rank Switching Time", { lut => 2 }), // FIXME
    tWR   -> JSONSetting(15, "Write-Recovery time", { _("TWR") }),
    tWTR  -> JSONSetting(8, "Write-To-Read Turnaround Time", { _("TWTR") })
  )

  def setDependentRegisters(lut: Map[String, JSONField], freqMHz: BigInt) {
    val periodPs = 1000000.0/freqMHz.toFloat
    // Generate a lookup table of timings in units of tCK (as all programmable
    // timings in the model are in units of the controller clock frequency)
    val lutTCK = lut.flatMap({
      case (name, JSONField(value, "ps")) =>
        Some(name -> BigInt(((value.toFloat + periodPs - 1)/periodPs).toInt))
      case (name, JSONField(value, "tCK")) => Some(name -> value)
      case _ => None
    })

    registers foreach {
      case (elem, reg: JSONSetting) => reg.setWithLUT(lutTCK)
      case _ => None
    }
  }

  override def cloneType = new DRAMProgrammableTimings().asInstanceOf[this.type]
}
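The tCAS2tCWL bucketing above maps a CAS latency to a write CAS latency. A standalone copy (plain Scala, detached from the Bundle purely for illustration) makes the thresholds easy to check against the defaults:

```scala
// Same thresholds as DRAMProgrammableTimings.tCAS2tCWL above.
def tCAS2tCWL(tCAS: BigInt): BigInt = {
  require(tCAS > 4)
  if (tCAS > 12) tCAS - 4
  else if (tCAS > 9) tCAS - 3
  else if (tCAS > 7) tCAS - 2
  else if (tCAS > 5) tCAS - 1
  else tCAS
}

// The sg093 default CL of 14 gives a CWL of 10, matching the tCWD
// JSONSetting default of 10 in the registers table above:
// tCAS2tCWL(14) == 10; likewise tCAS2tCWL(8) == 6 and tCAS2tCWL(5) == 5.
```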
|
||||
|
||||
case class DRAMBackendKey(writeDepth: Int, readDepth: Int, latencyBits: Int)
|
||||
|
||||
abstract class DRAMBaseConfig extends BaseConfig with HasDRAMMASConstants {
|
||||
def dramKey: DramOrganizationParams
|
||||
def backendKey: DRAMBackendKey
|
||||
}
|
||||
|
||||
abstract class BaseDRAMMMRegIO(cfg: DRAMBaseConfig) extends MMRegIO(cfg) with HasConsoleUtils {
|
||||
|
||||
// The default assignment corresponde to a standard open-page policy
|
||||
// with 8K pages. All available ranks are enabled.
|
||||
val bankAddr = Input(new ProgrammableSubAddr(
|
||||
maskBits = cfg.dramKey.bankBits,
|
||||
longName = "Bank Address",
|
||||
defaultOffset = 13, // Assume 8KB page size
|
||||
defaultMask = 7 // DDR3 Has 8 banks
|
||||
))
|
||||
|
||||
val rankAddr = Input(new ProgrammableSubAddr(
|
||||
maskBits = cfg.dramKey.rankBits,
|
||||
longName = "Rank Address",
|
||||
defaultOffset = bankAddr.defaultOffset + log2Ceil(bankAddr.defaultMask + 1),
|
||||
defaultMask = (1 << cfg.dramKey.rankBits) - 1
|
||||
))
|
||||
|
||||
val defaultRowOffset = rankAddr.defaultOffset + log2Ceil(rankAddr.defaultMask + 1)
|
||||
val rowAddr = Input(new ProgrammableSubAddr(
|
||||
maskBits = cfg.dramKey.rowBits,
|
||||
longName = "Row Address",
|
||||
defaultOffset = defaultRowOffset,
|
||||
defaultMask = (cfg.dramKey.dramSize >> defaultRowOffset.toInt) - 1
|
||||
))
|
||||
|
||||
// Page policy 1 = open, 0 = closed
|
||||
val openPagePolicy = Input(Bool())
|
||||
// Additional latency added to read data beats after it's received from the devices
|
||||
val backendLatency = Input(UInt(cfg.backendKey.latencyBits.W))
|
||||
|
||||
// Counts the number of misses in the open row buffer
|
||||
//val rowMisses = Output(UInt(32.W))
|
||||
val dramTimings = Input(new DRAMProgrammableTimings())
|
||||
val rankPower = Output(Vec(cfg.dramKey.maxRanks, new RankPowerIO))
|
||||
|
||||
|
||||
// END CHISEL TYPES
|
||||
val dramBaseRegisters = Seq(
|
||||
(openPagePolicy -> RuntimeSetting(1, "Open-Page Policy")),
|
||||
(backendLatency -> RuntimeSetting(2,
|
||||
"Backend Latency",
|
||||
min = 1,
|
||||
max = Some(1 << (cfg.backendKey.latencyBits - 1))))
|
||||
)
|
||||
|
||||
// A list of DDR3 speed grades provided by micron.
|
||||
// _1 = is used as a key to look up a device, _2 = long name
|
||||
val speedGrades = Seq(
|
||||
("sg093" -> "DDR3-2133 (14-14-14) Minimum Clock Period: 938 ps"),
|
||||
("sg107" -> "DDR3-1866 (13-13-13) Minimum Clock Period: 1071 ps"),
|
||||
("sg125" -> "DDR3-1600 (11-11-11) Minimum Clock Period: 1250 ps"),
|
||||
("sg15E" -> "DDR3-1333H (9-9-9) Minimum Clock Period: 1500 ps"),
|
||||
("sg15" -> "DDR3-1333J (10-10-10) Minimum Clock Period: 1500 ps"),
|
||||
("sg187U" -> "DDR3-1066F (7-7-7) Minimum Clock Period: 1875 ps"),
|
||||
("sg187" -> "DDR3-1066G (8-8-8) Minimum Clock Period: 1875 ps"),
|
||||
("sg25E" -> "DDR3-800E (5-5-5) Minimum Clock Period: 2500 ps"),
|
||||
("sg25" -> "DDR3-800 (6-6-6) Minimum Clock Period: 2500 ps")
|
||||
)
|
||||
|
||||
  // Prompt the user for an address assignment scheme. TODO: Channel bits.
  def getAddressScheme(
      numRanks: BigInt,
      numBanks: BigInt,
      numRows: BigInt,
      numBytesPerLine: BigInt,
      pageSize: BigInt) {

    case class SubAddr(
        shortName: String,
        longName: String,
        field: Option[ProgrammableSubAddr],
        count: BigInt) {
      require(isPow2(count))
      val bits = log2Ceil(count)
      def set(offset: Int) { field.foreach( _.forceSettings(offset, count - 1) ) }
      def legendEntry = s"  ${shortName} -> ${longName}"
    }

    val ranks = SubAddr("L", "Rank Address Bits", Some(rankAddr), numRanks)
    val banks = SubAddr("B", "Bank Address Bits", Some(bankAddr), numBanks)
    val rows = SubAddr("R", "Row Address Bits", Some(rowAddr), numRows)
    val linesPerRow = SubAddr("N", "log2(Lines Per Row)", None, pageSize/numBytesPerLine)
    val bytesPerLine = SubAddr("Z", "log2(Bytes Per Line)", None, numBytesPerLine)

    // Address schemes
    // _1 = long name, _2 = a seq of subfields from address MSBs to LSBs
    val addressSchemes = Seq(
      "Baseline Open  " -> Seq(rows, ranks, banks, linesPerRow, bytesPerLine),
      "Baseline Closed" -> Seq(rows, linesPerRow, ranks, banks, bytesPerLine)
    )

    val legendHeader = s"${UNDERLINED}Legend${RESET}\n"
    val legendBody = (addressSchemes.head._2 map {_.legendEntry}).mkString("\n")

    val schemeStrings = addressSchemes map { case (name, addrOrder) =>
      val shortNameOrder = (addrOrder map { _.shortName }).mkString(" | ")
      s"${name} -> ( ${shortNameOrder} )"
    }

    val scheme = addressSchemes(requestSeqSelection(
      "Select an address assignment scheme:",
      schemeStrings,
      legendHeader + legendBody + "\nAddress scheme number"))._2

    def setSubAddresses(ranges: Seq[SubAddr], offset: Int = 0): Unit = ranges match {
      case current :: moreSigFields =>
        current.set(offset)
        setSubAddresses(moreSigFields, offset + current.bits)
      case Nil =>
    }
    setSubAddresses(scheme.reverse)
  }
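The recursive offset assignment in `setSubAddresses` can be sketched in plain Scala, without Chisel: walk the scheme from its LSB end, handing each field the running bit offset of the fields below it. The object name and the field widths below are hypothetical examples, not values taken from the model.

```scala
object AddressSchemeSketch {
  // Hypothetical sub-address field: a name and a power-of-two entry count
  case class Field(name: String, count: BigInt) {
    require((count & (count - 1)) == 0, s"$name count must be a power of two")
    val bits = if (count <= 1) 0 else (count - 1).bitLength // log2(count)
  }

  // Mirrors the "Baseline Open" ordering: rows | ranks | banks | linesPerRow | bytesPerLine
  val scheme = Seq(
    Field("rows", BigInt(1) << 14),
    Field("ranks", 2),
    Field("banks", 8),
    Field("linesPerRow", 16),
    Field("bytesPerLine", 64))

  // Like setSubAddresses(scheme.reverse): accumulate bit offsets from the LSB up
  def offsets(fields: Seq[Field], base: Int = 0): Seq[(String, Int)] = fields match {
    case f +: rest => (f.name, base) +: offsets(rest, base + f.bits)
    case _ => Nil
  }

  val assigned = offsets(scheme.reverse).toMap
}
```

With these example widths, bytesPerLine lands at bit 0, linesPerRow at bit 6, banks at bit 10, ranks at bit 13, and rows start at bit 14.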

  // Prompt the user for a speedgrade selection. TODO: illegalize SGs based on frequency
  def getSpeedGrade(): String = {
    speedGrades(requestSeqSelection("Select a speed grade:", speedGrades.unzip._2))._1
  }

  // Get the parameters (timings, bitwidths, etc.) for a particular device from JSONs in resources/
  def lookupPart(density: BigInt, dqWidth: BigInt, speedGrade: String): Map[String, JSONField] = {
    val dqKey = "x" + dqWidth.toString
    val stream = getClass.getResourceAsStream(s"/midas/models/dram/${density}Mb_ddr3.json")
    val lines = Source.fromInputStream(stream).getLines
    implicit val formats = org.json4s.DefaultFormats
    val json = parse(lines.mkString).extract[Map[String, Map[String, Map[String, JSONField]]]]
    json(speedGrade)(dqKey)
  }

  def setBaseDRAMSettings(): Unit = {
    // Prompt the user for the overall memory organization of this channel
    Console.println(s"${UNDERLINED}Memory system organization${RESET}")
    val memorySize = requestInput("Memory system size in GiB", 2)
    val numRanks = requestInput("Number of ranks", 1)
    val busWidth = requestInput("DRAM data bus width in bits", 64)
    val dqWidth = requestInput("Device DQ width", 8)

    val devicesPerRank = busWidth / dqWidth
    val deviceDensityMib = ((memorySize << 30) * 8 / numRanks / devicesPerRank) >> 20
    Console.println(s"${GREEN}Selected Device density (Mib) -> ${deviceDensityMib}${RESET}")

    // Select the appropriate device, and look up its parameters in the resource JSONs
    Console.println(s"\n${UNDERLINED}Device Selection${RESET}")
    val freqMHz = requestInput("Clock Frequency in MHz", 1000)
    val speedGradeKey = getSpeedGrade()

    val lut = lookupPart(deviceDensityMib, dqWidth, speedGradeKey)
    val dramTimingSettings = dramTimings.setDependentRegisters(lut, freqMHz)

    // Determine the address assignment scheme
    Console.println(s"\n${UNDERLINED}Address assignment${RESET}")
    val lineSize = requestInput("Line size in Bytes", 64)

    val numBanks = 8 // DDR3 mandated
    val pageSize = ((BigInt(1) << lut("COL_BITS").value.toInt) * devicesPerRank * dqWidth) / 8
    val numRows = BigInt(1) << lut("ROW_BITS").value.toInt
    getAddressScheme(numRanks, numBanks, numRows, lineSize, pageSize)
  }
}
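As a sanity check on the density arithmetic in `setBaseDRAMSettings`, the same calculation can be run in plain Scala with the prompt defaults (2 GiB, one rank, 64-bit bus, x8 devices). The object name here is just for illustration.

```scala
object DensitySketch {
  val memorySizeGiB = BigInt(2)  // "Memory system size in GiB" default
  val numRanks      = BigInt(1)  // "Number of ranks" default
  val busWidth      = BigInt(64) // "DRAM data bus width in bits" default
  val dqWidth       = BigInt(8)  // "Device DQ width" default

  // A 64-bit bus built from x8 parts needs eight devices per rank
  val devicesPerRank = busWidth / dqWidth

  // Total capacity in bits, divided across ranks and devices, expressed in Mib
  val deviceDensityMib = ((memorySizeGiB << 30) * 8 / numRanks / devicesPerRank) >> 20
}
```

For these defaults the result is 2048 Mib: a 2 GiB single-rank channel of x8 devices requires 2 Gib parts.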

case class DramOrganizationParams(maxBanks: Int, maxRanks: Int, dramSize: BigInt, lineBits: Int = 8) {
  require(isPow2(maxBanks))
  require(isPow2(maxRanks))
  require(isPow2(dramSize))
  require(isPow2(lineBits))
  def bankBits = log2Up(maxBanks)
  def rankBits = log2Up(maxRanks)
  def rowBits = log2Ceil(dramSize) - lineBits
  def maxRows = 1 << rowBits
}

trait CommandLegalBools {
  val canCASW = Output(Bool())
  val canCASR = Output(Bool())
  val canPRE = Output(Bool())
  val canACT = Output(Bool())
}

trait HasLegalityUpdateIO {
  val key: DramOrganizationParams
  import DRAMMasEnums._
  val timings = Input(new DRAMProgrammableTimings)
  val selectedCmd = Input(cmd_nop.cloneType)
  val autoPRE = Input(Bool())
  val cmdRow = Input(UInt(key.rowBits.W))
  //val burstLength = Input(UInt(4.W)) // TODO: Fixme
}

// Add some scheduler-specific metadata to a reference
// TODO: factor out different MAS metadata into a mixin
class MASEntry(key: DRAMBaseConfig)(implicit p: Parameters) extends Bundle {
  val xaction = new TransactionMetaData
  val rowAddr = UInt(key.dramKey.rowBits.W)
  val bankAddrOH = UInt(key.dramKey.maxBanks.W)
  val bankAddr = UInt(key.dramKey.bankBits.W)
  val rankAddrOH = UInt(key.dramKey.maxRanks.W)
  val rankAddr = UInt(key.dramKey.rankBits.W)

  def decode(from: XactionSchedulerEntry, mmReg: BaseDRAMMMRegIO) {
    xaction := from.xaction
    bankAddr := mmReg.bankAddr.getSubAddr(from.addr)
    bankAddrOH := UIntToOH(bankAddr)
    rowAddr := mmReg.rowAddr.getSubAddr(from.addr)
    rankAddr := mmReg.rankAddr.getSubAddr(from.addr)
    rankAddrOH := UIntToOH(rankAddr)
  }

  def addrMatch(rank: UInt, bank: UInt, row: Option[UInt] = None): Bool = {
    val rowHit = row.foldLeft(true.B)({ case (p, addr) => p && addr === rowAddr })
    rank === rankAddr && bank === bankAddr && rowHit
  }

  override def cloneType = new MASEntry(key)(p).asInstanceOf[this.type]
}

class FirstReadyFCFSEntry(key: DRAMBaseConfig)(implicit p: Parameters) extends MASEntry(key)(p) {
  val isReady = Bool() // Set when this entry hits in the open row buffer
  val mayPRE = Bool()  // Set when no other entries hit the open row buffer

  // We only ask for a precharge if we have permission (no other references hit)
  // and the entry isn't personally ready
  def wantPRE(): Bool = !isReady && mayPRE // Don't need the dummy args
  def wantACT(): Bool = !isReady
  override def cloneType = new FirstReadyFCFSEntry(key)(p).asInstanceOf[this.type]
}

// Tracks the state of a bank, including:
//  - Whether it's active or idle
//  - The open row address
//  - Whether CAS, PRE, and ACT commands can be legally issued
//
// A MAS model uses these trackers to filter out commands that are illegal for
// this bank.
//
// A necessary condition for the controller to issue a command that uses this
// bank is that the corresponding can{CMD} bit is high. The controller must, of
// course, also ensure that all extra-bank timing and resource constraints are
// met, and that CAS commands use the open row.

class BankStateTrackerO(key: DramOrganizationParams) extends GenericParameterizedBundle(key)
    with CommandLegalBools {

  import DRAMMasEnums._
  val openRow = Output(UInt(key.rowBits.W))
  val state = Output(Bool())

  def isRowHit(ref: MASEntry): Bool = ref.rowAddr === openRow && state === bank_active
}

class BankStateTrackerIO(val key: DramOrganizationParams) extends GenericParameterizedBundle(key)
    with HasLegalityUpdateIO {
  val out = new BankStateTrackerO(key)
  val cmdUsesThisBank = Input(Bool())
}

class BankStateTracker(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
  import DRAMMasEnums._
  val io = IO(new BankStateTrackerIO(key))

  val state = RegInit(bank_idle)
  val openRowAddr = Reg(UInt(key.rowBits.W))

  val nextLegalPRE = Module(new DownCounter(maxDRAMTimingBits))
  val nextLegalACT = Module(new DownCounter(maxDRAMTimingBits))
  val nextLegalCAS = Module(new DownCounter(maxDRAMTimingBits))

  Seq(nextLegalPRE, nextLegalCAS, nextLegalACT) foreach { mod =>
    mod.io.decr := true.B
    mod.io.set.valid := false.B
    mod.io.set.bits := DontCare
  }

  when (io.cmdUsesThisBank) {
    switch(io.selectedCmd) {
      is(cmd_act) {
        assert(io.out.canACT, "Bank Timing Violation: Controller issued activate command illegally")
        state := bank_active
        openRowAddr := io.cmdRow
        nextLegalCAS.io.set.valid := true.B
        nextLegalCAS.io.set.bits := io.timings.tRCD - io.timings.tAL - 1.U
        nextLegalPRE.io.set.valid := true.B
        nextLegalPRE.io.set.bits := io.timings.tRAS - 1.U
        nextLegalACT.io.set.valid := true.B
        nextLegalACT.io.set.bits := io.timings.tRC - 1.U
      }
      is(cmd_casr) {
        assert(io.out.canCASR, "Bank Timing Violation: Controller issued CASR command illegally")
        when (io.autoPRE) {
          state := bank_idle
          nextLegalACT.io.set.valid := true.B
          nextLegalACT.io.set.bits := io.timings.tRTP + io.timings.tAL + io.timings.tRP - 1.U
        }.otherwise {
          nextLegalPRE.io.set.valid := true.B
          nextLegalPRE.io.set.bits := io.timings.tRTP + io.timings.tAL - 1.U
        }
      }
      is(cmd_casw) {
        assert(io.out.canCASW, "Bank Timing Violation: Controller issued CASW command illegally")
        when (io.autoPRE) {
          state := bank_idle
          nextLegalACT.io.set.valid := true.B
          nextLegalACT.io.set.bits := io.timings.tCWD + io.timings.tAL + io.timings.tWR +
            io.timings.tCCD + io.timings.tRP + 1.U
        }.otherwise {
          nextLegalPRE.io.set.valid := true.B
          nextLegalPRE.io.set.bits := io.timings.tCWD + io.timings.tAL + io.timings.tWR +
            io.timings.tCCD - 1.U
        }
      }
      is(cmd_pre) {
        assert(io.out.canPRE, "Bank Timing Violation: Controller issued PRE command illegally")
        state := bank_idle
        nextLegalACT.io.set.valid := true.B
        nextLegalACT.io.set.bits := io.timings.tRP - 1.U
      }
    }
  }

  io.out.canCASW := (state === bank_active) && nextLegalCAS.io.idle // Controller must check rowAddr
  io.out.canCASR := (state === bank_active) && nextLegalCAS.io.idle // Controller must check rowAddr
  io.out.canPRE := (state === bank_active) && nextLegalPRE.io.idle
  io.out.canACT := (state === bank_idle) && nextLegalACT.io.idle
  io.out.state := state
  io.out.openRow := openRowAddr
}

// Tracks the state of a rank, including:
//  - Whether CAS, PRE, and ACT commands can be legally issued
//
// A MAS model uses these trackers to filter out commands that are illegal for
// this rank. As with the bank trackers, the can{CMD} bits are necessary but
// not sufficient conditions for issuing a command: the controller must still
// ensure all extra-rank timing and resource constraints are met.

class RankStateTrackerO(key: DramOrganizationParams) extends GenericParameterizedBundle(key)
    with CommandLegalBools {
  import DRAMMasEnums._
  val canREF = Output(Bool())
  val wantREF = Output(Bool())
  val state = Output(rank_active.cloneType)
  val banks = Vec(key.maxBanks, Output(new BankStateTrackerO(key)))
}

class RankStateTrackerIO(val key: DramOrganizationParams) extends GenericParameterizedBundle(key)
    with HasLegalityUpdateIO with HasDRAMMASConstants {
  val rank = new RankStateTrackerO(key)
  val tCycle = Input(UInt(maxDRAMTimingBits.W))
  val cmdUsesThisRank = Input(Bool())
  val cmdBankOH = Input(UInt(key.maxBanks.W))
}

class RankStateTracker(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
  import DRAMMasEnums._

  val io = IO(new RankStateTrackerIO(key))

  val nextLegalPRE = Module(new DownCounter(maxDRAMTimingBits))
  val nextLegalACT = Module(new DownCounter(tRFCBits))
  val nextLegalCASR = Module(new DownCounter(maxDRAMTimingBits))
  val nextLegalCASW = Module(new DownCounter(maxDRAMTimingBits))
  val tREFI = RegInit(0.U(tREFIBits.W))
  val state = RegInit(rank_active)
  val wantREF = RegInit(false.B)

  Seq(nextLegalPRE, nextLegalCASW, nextLegalCASR, nextLegalACT) foreach { mod =>
    mod.io.decr := true.B
    mod.io.set.valid := false.B
    mod.io.set.bits := DontCare
  }

  val tFAWcheck = Module(new Queue(io.tCycle.cloneType, entries = 4))
  tFAWcheck.io.enq.valid := io.cmdUsesThisRank && io.selectedCmd === cmd_act
  tFAWcheck.io.enq.bits := io.tCycle + io.timings.tFAW
  tFAWcheck.io.deq.ready := io.tCycle === tFAWcheck.io.deq.bits

  when (io.cmdUsesThisRank && io.selectedCmd === cmd_act) {
    assert(io.rank.canACT, "Rank Timing Violation: Controller issued ACT command illegally")
    nextLegalACT.io.set.valid := true.B
    nextLegalACT.io.set.bits := io.timings.tRRD - 1.U

  }.elsewhen (io.selectedCmd === cmd_casr) {
    assert(!io.cmdUsesThisRank || io.rank.canCASR,
      "Rank Timing Violation: Controller issued CASR command illegally")
    nextLegalCASR.io.set.valid := true.B
    nextLegalCASR.io.set.bits := io.timings.tCCD +
      Mux(io.cmdUsesThisRank, 0.U, io.timings.tRTRS) - 1.U

    // TODO: tRTRS isn't the correct parameter here, but we need a two-cycle delay in DDR3
    nextLegalCASW.io.set.valid := true.B
    nextLegalCASW.io.set.bits := io.timings.tCAS + io.timings.tCCD - io.timings.tCWD +
      io.timings.tRTRS - 1.U

  }.elsewhen (io.selectedCmd === cmd_casw) {
    assert(!io.cmdUsesThisRank || io.rank.canCASW,
      "Rank Timing Violation: Controller issued CASW command illegally")
    nextLegalCASR.io.set.valid := true.B
    nextLegalCASR.io.set.bits := Mux(io.cmdUsesThisRank,
      io.timings.tCWD + io.timings.tCCD + io.timings.tWTR - 1.U,
      io.timings.tCWD + io.timings.tCCD + io.timings.tRTRS - io.timings.tCAS - 1.U)

    // TODO: OST
    nextLegalCASW.io.set.valid := true.B
    nextLegalCASW.io.set.bits := io.timings.tCCD - 1.U

  }.elsewhen (io.cmdUsesThisRank && io.selectedCmd === cmd_pre) {
    assert(io.rank.canPRE, "Rank Timing Violation: Controller issued PRE command illegally")

  }.elsewhen (io.cmdUsesThisRank && io.selectedCmd === cmd_ref) {
    assert(io.rank.canREF, "Rank Timing Violation: Controller issued REF command illegally")
    wantREF := false.B
    state := rank_refresh
    nextLegalACT.io.set.valid := true.B
    nextLegalACT.io.set.bits := io.timings.tRFC - 1.U
  }

  // Refresh can be disabled by setting tREFI = 0
  when (tREFI === io.timings.tREFI && io.timings.tREFI =/= 0.U) {
    tREFI := 0.U
    wantREF := true.B
  }.otherwise {
    tREFI := tREFI + 1.U
  }

  when (state === rank_refresh && nextLegalACT.io.current === 1.U) {
    state := rank_active
  }

  val bankTrackers = Seq.fill(key.maxBanks)(Module(new BankStateTracker(key)).io)
  io.rank.banks.zip(bankTrackers) foreach { case (out, bank) => out := bank.out }

  bankTrackers.zip(io.cmdBankOH.toBools) foreach { case (bank, cmdUsesThisBank) =>
    bank.timings := io.timings
    bank.selectedCmd := io.selectedCmd
    bank.cmdUsesThisBank := cmdUsesThisBank && io.cmdUsesThisRank
    bank.cmdRow := io.cmdRow
    bank.autoPRE := io.autoPRE
  }

  io.rank.canREF := (bankTrackers map { _.out.canACT } reduce { _ && _ })
  io.rank.canCASR := nextLegalCASR.io.idle
  io.rank.canCASW := nextLegalCASW.io.idle
  io.rank.canPRE := nextLegalPRE.io.idle
  io.rank.canACT := nextLegalACT.io.idle && tFAWcheck.io.enq.ready
  io.rank.wantREF := wantREF
  io.rank.state := state
}

class CommandBusMonitor extends Module {
  import DRAMMasEnums._
  val io = IO(new Bundle {
    val cmd = Input(cmd_nop.cloneType)
    val rank = Input(UInt())
    val bank = Input(UInt())
    val row = Input(UInt())
    val autoPRE = Input(Bool())
  })

  val cycleCounter = RegInit(1.U(32.W))
  val lastCommand = RegInit(0.U(32.W))
  cycleCounter := cycleCounter + 1.U
  when (io.cmd =/= cmd_nop) {
    lastCommand := cycleCounter
    when (lastCommand + 1.U =/= cycleCounter) { printf("nop(%d);\n", cycleCounter - lastCommand - 1.U) }
  }

  switch (io.cmd) {
    is(cmd_act) {
      printf("activate(%d, %d, %d); // %d\n", io.rank, io.bank, io.row, cycleCounter)
    }
    is(cmd_casr) {
      val autoPRE = io.autoPRE
      val burstChop = false.B
      val column = 0.U // Don't care since we aren't checking data
      printf("read(%d, %d, %d, %x, %x); // %d\n",
        io.rank, io.bank, column, autoPRE, burstChop, cycleCounter)
    }
    is(cmd_casw) {
      val autoPRE = io.autoPRE
      val burstChop = false.B
      val column = 0.U // Don't care since we aren't checking data
      val mask = 0.U // Don't care since we aren't checking data
      val data = 0.U // Don't care since we aren't checking data
      printf("write(%d, %d, %d, %x, %x, %d, %d); // %d\n",
        io.rank, io.bank, column, autoPRE, burstChop, mask, data, cycleCounter)
    }
    is(cmd_ref) {
      printf("refresh(%d); // %d\n", io.rank, cycleCounter)
    }
    is(cmd_pre) {
      val preAll = false.B
      printf("precharge(%d, %d, %d); // %d\n", io.rank, io.bank, preAll, cycleCounter)
    }
  }
}

class RankRefreshUnitIO(key: DramOrganizationParams) extends GenericParameterizedBundle(key) {
  val rankStati = Vec(key.maxRanks, Flipped(new RankStateTrackerO(key)))
  // The user may have instantiated multiple ranks but be modelling only a
  // single-rank system. Don't issue refreshes to ranks we aren't modelling.
  val ranksInUse = Input(UInt(key.maxRanks.W))
  val suggestREF = Output(Bool())
  val refRankAddr = Output(UInt(key.rankBits.W))
  val suggestPRE = Output(Bool())
  val preRankAddr = Output(UInt(key.rankBits.W))
  val preBankAddr = Output(UInt(key.bankBits.W))
}

class RefreshUnit(key: DramOrganizationParams) extends Module {
  val io = IO(new RankRefreshUnitIO(key))

  val ranksWantingRefresh = VecInit(io.rankStati map { _.wantREF }).asUInt
  val refreshableRanks = VecInit(io.rankStati map { _.canREF }).asUInt & io.ranksInUse

  io.refRankAddr := PriorityEncoder(ranksWantingRefresh & refreshableRanks)
  io.suggestREF := (ranksWantingRefresh & refreshableRanks).orR

  // preRef => a precharge is needed before the refresh may occur
  val preRefBanks = io.rankStati map { rank => PriorityEncoder(rank.banks map { _.canPRE }) }

  val prechargeableRanks = VecInit(io.rankStati map { rank => rank.canPRE &&
    (rank.banks map { _.canPRE } reduce { _ || _ })}).asUInt & io.ranksInUse

  io.suggestPRE := (ranksWantingRefresh & prechargeableRanks).orR
  io.preRankAddr := PriorityEncoder(ranksWantingRefresh & prechargeableRanks)
  io.preBankAddr := PriorityMux(ranksWantingRefresh & prechargeableRanks, preRefBanks)
}

// Outputs for counters used to feed Micron's power calculator.
// The # of CASR and CASW commands is a proxy for cycles of read and write data
// (assuming a fixed burst length), and
//   1 - (ACT / (CASR + CASW)) = rank row-buffer hit rate
class RankPowerIO extends Bundle {
  val allPreCycles = UInt(32.W) // # of cycles the rank has all banks precharged
  val numCASR = UInt(32.W) // Assume no burst-chop
  val numCASW = UInt(32.W) // Ditto above
  val numACT = UInt(32.W)

  // TODO
  // CKE low & all banks pre
  // CKE low & at least one bank active
}

object RankPowerIO {
  def apply(): RankPowerIO = {
    val w = Wire(new RankPowerIO)
    w.allPreCycles := 0.U
    w.numCASR := 0.U
    w.numCASW := 0.U
    w.numACT := 0.U
    w
  }
}

class RankPowerMonitor(key: DramOrganizationParams) extends Module with HasDRAMMASConstants {
  import DRAMMasEnums._
  val io = IO(new Bundle {
    val stats = Output(new RankPowerIO)
    val rankState = Input(new RankStateTrackerO(key))
    val selectedCmd = Input(cmd_nop.cloneType)
    val cmdUsesThisRank = Input(Bool())
  })
  val stats = RegInit(RankPowerIO())

  when (io.cmdUsesThisRank) {
    switch(io.selectedCmd) {
      is(cmd_act) {
        stats.numACT := stats.numACT + 1.U
      }
      is(cmd_casw) {
        stats.numCASW := stats.numCASW + 1.U
      }
      is(cmd_casr) {
        stats.numCASR := stats.numCASR + 1.U
      }
    }
  }

  // This is questionable. Needs to be reevaluated once CKE toggling is accounted for
  when (io.rankState.state =/= rank_refresh && ((io.rankState.banks) forall { _.canACT })) {
    stats.allPreCycles := stats.allPreCycles + 1.U
  }

  io.stats := stats
}

class DRAMBackendIO(val latencyBits: Int)(implicit val p: Parameters) extends Bundle {
  val newRead = Flipped(Decoupled(new ReadResponseMetaData))
  val newWrite = Flipped(Decoupled(new WriteResponseMetaData))
  val completedRead = Decoupled(new ReadResponseMetaData)
  val completedWrite = Decoupled(new WriteResponseMetaData)
  val readLatency = Input(UInt(latencyBits.W))
  val writeLatency = Input(UInt(latencyBits.W))
  val tCycle = Input(UInt(latencyBits.W))
}

class DRAMBackend(key: DRAMBackendKey)(implicit p: Parameters) extends Module {
  val io = IO(new DRAMBackendIO(key.latencyBits))
  val rQueue = Module(new DynamicLatencyPipe(new ReadResponseMetaData, key.readDepth, key.latencyBits))
  val wQueue = Module(new DynamicLatencyPipe(new WriteResponseMetaData, key.writeDepth, key.latencyBits))

  io.completedRead <> rQueue.io.deq
  io.completedWrite <> wQueue.io.deq
  rQueue.io.enq <> io.newRead
  rQueue.io.latency := io.readLatency
  wQueue.io.enq <> io.newWrite
  wQueue.io.latency := io.writeLatency
  Seq(rQueue, wQueue) foreach { _.io.tCycle := io.tCycle }
}
@ -0,0 +1,358 @@
package midas
package models

import chisel3._
import chisel3.util._
import freechips.rocketchip.config.{Parameters, Field}
import junctions._
import midas.widgets._

/** A simple freelist
  * @param entries The number of IDs to be managed by the free list
  *
  * Inputs: freeId. Valid is asserted alongside an ID that is to be
  * returned to the freelist
  *
  * Outputs: nextId. The next available ID. Granted on a successful handshake
  */
class FreeList(entries: Int) extends Module {
  val io = IO(new Bundle {
    val freeId = Flipped(Valid(UInt(log2Up(entries).W)))
    val nextId = Decoupled(UInt(log2Up(entries).W))
  })
  require(entries > 0)
  val nextId = RegInit({ val i = Wire(Valid(UInt())); i.valid := true.B; i.bits := 0.U; i })

  io.nextId.valid := nextId.valid
  io.nextId.bits := nextId.bits
  // Add an extra entry to represent the empty bit. Maybe not necessary?
  val ids = RegInit(Vec.tabulate(entries)(i =>
    if (i == 0) false.B else true.B))
  val next = ids.indexWhere((x: Bool) => x)

  when(io.nextId.fire() || ~nextId.valid) {
    nextId.bits := next
    nextId.valid := ids.exists((x: Bool) => x)
    ids(next) := false.B
  }

  when(io.freeId.valid) {
    ids(io.freeId.bits) := true.B
  }
}
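The allocation discipline the FreeList implements (grant the lowest-indexed free ID; freed IDs become grantable again) can be modelled in a few lines of plain Scala. This is an illustrative sketch of the intended behavior only, not of the Chisel module's cycle timing; the class name is hypothetical.

```scala
// Software model of a freelist: IDs 0 until entries start free,
// allocate() grants the smallest free ID, release() returns one.
class FreeListModel(entries: Int) {
  private val free = scala.collection.mutable.SortedSet[Int]((0 until entries): _*)

  // Returns the next available ID, or None when the list is exhausted
  def allocate(): Option[Int] = {
    val id = free.headOption
    id.foreach(free -= _)
    id
  }

  def release(id: Int): Unit = free += id
}
```

A two-entry model grants 0, then 1, then nothing until an ID is released, at which point that ID is grantable again.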

// This maintains W-W and R-R orderings by managing a set of shared physical
// queues based on the NASTI id field.
class RATEntry(vIdWidth: Int, pIdWidth: Int) extends Bundle {
  val current = Valid(UInt(vIdWidth.W))
  val next = Valid(UInt(pIdWidth.W))
  val head = Output(Bool())

  def matchHead(id: UInt): Bool = {
    (current.bits === id) && head
  }

  def matchTail(id: UInt): Bool = {
    (current.bits === id) && (current.valid) && !next.valid
  }
  def push(id: UInt) {
    next.bits := id
    next.valid := true.B
  }
  def setTranslation(id: UInt) {
    current.bits := id
    current.valid := true.B
  }

  def setHead() { head := true.B }
  def pop() {
    current.valid := false.B
    next.valid := false.B
    head := false.B
  }
  override def cloneType() = new RATEntry(vIdWidth, pIdWidth).asInstanceOf[this.type]
}

object RATEntry {
  def apply(vIdWidth: Int, pIdWidth: Int) = {
    val entry = Wire(new RATEntry(vIdWidth, pIdWidth))
    entry.current.valid := false.B
    entry.current.bits := DontCare
    entry.next.valid := false.B
    entry.next.bits := DontCare
    entry.head := false.B
    entry
  }
}

class AllocationIO(vIdWidth: Int, pIdWidth: Int) extends Bundle {
  val pId = Output(UInt(pIdWidth.W))
  val vId = Input(UInt(vIdWidth.W))
  val ready = Output(Bool())
  val valid = Input(Bool())

  def fire(): Bool = ready && valid
}

class ReorderBuffer(val numVIds: Int, val numPIds: Int) extends Module {
  val pIdWidth = log2Up(numPIds)
  val vIdWidth = log2Up(numVIds)
  val io = IO(new Bundle {
    // Free a physical ID
    val free = Flipped(Valid(UInt(pIdWidth.W)))
    // ID allocation. Two-way handshake: the next available pId is presented on
    // next.pId, next.ready is deasserted if there are no free IDs available,
    // and allocation occurs when next.fire() asserts
    val next = new AllocationIO(vIdWidth, pIdWidth)
    val trans = new AllocationIO(vIdWidth, pIdWidth)
  })

  val rat = RegInit(Vec.fill(numPIds)(RATEntry(vIdWidth, pIdWidth)))
  val freeList = Module(new FreeList(numPIds))
  freeList.io.freeId <> io.free

  // PID allocation
  io.next.ready := freeList.io.nextId.valid
  freeList.io.nextId.ready := io.next.valid
  val nextPId = freeList.io.nextId.bits
  io.next.pId := nextPId

  // Pointer to the child of an entry being freed (it will become the new head)
  val nextHeadPtr = WireInit({ val w = Wire(Valid(UInt(pIdWidth.W))); w.valid := false.B; w.bits := DontCare; w })

  // Pointer to the parent of an entry being appended to a linked list
  val parentEntryPtr = Wire(Valid(UInt()))
  parentEntryPtr.bits := rat.onlyIndexWhere(_.matchTail(io.next.vId))
  parentEntryPtr.valid := io.next.fire() && rat.exists(_.matchTail(io.next.vId))

  for ((entry, index) <- rat.zipWithIndex) {
    // Allocation: set the pointer of the new entry's parent
    when(parentEntryPtr.valid && parentEntryPtr.bits === index.U) {
      rat(parentEntryPtr.bits).push(nextPId)
    }
    // Deallocation: set the head bit of a linked list whose head is to be freed
    when(nextHeadPtr.valid && (index.U === nextHeadPtr.bits)) {
      rat(index).setHead()
    }
    // Allocation: add the new entry to the table
    when(io.next.fire() && nextPId === index.U) {
      rat(index).setTranslation(io.next.vId)
      // We set the head bit if no linked list exists for this vId, or
      // if the parent, and thus previous head, is about to be freed.
      when (~parentEntryPtr.valid ||
            (io.trans.fire() && (io.trans.pId === parentEntryPtr.bits))) {
        rat(index).setHead()
      }
    }
    // Deallocation: invalidate the entry at index == io.trans.pId.
    // Note this exploits last-connect semantics to override the pushing
    // of a new child onto this entry when it is about to be freed.
    when(io.trans.fire() && (index.U === io.trans.pId)) {
      assert(rat(index).head)
      nextHeadPtr := rat(index).next
      rat(index).pop()
    }
  }

  io.trans.pId := rat.onlyIndexWhere(_.matchHead(io.trans.vId))
  io.trans.ready := rat.exists(_.matchHead(io.trans.vId))
}

// Read-response staging units only buffer the data and last fields of an R payload
class StoredBeat(implicit p: Parameters) extends NastiBundle()(p) with HasNastiData

// Buffers read responses from the host-memory system in a structure that maintains
// their per-transaction-ID ordering.
class ReadEgressResponseIO(implicit p: Parameters) extends NastiBundle()(p) {
  val tBits = Output(new NastiReadDataChannel)
  val tReady = Input(Bool()) // Really this is part of the input token to the egress unit...
  val hValid = Output(Bool())
}

class ReadEgressReqIO(implicit p: Parameters) extends NastiBundle()(p) {
  val t = Output(Valid(UInt(p(NastiKey).idBits.W)))
  val hValid = Output(Bool())
}
|
||||
|
||||
class ReadEgress(maxRequests: Int, maxReqLength: Int, maxReqsPerId: Int)
|
||||
(implicit val p: Parameters) extends Module {
|
||||
val io = IO(new Bundle {
|
||||
val enq = Flipped(Decoupled(new NastiReadDataChannel))
|
||||
val resp = new ReadEgressResponseIO
|
||||
val req = Flipped(new ReadEgressReqIO)
|
||||
})
|
||||
|
||||
// The total BRAM state required to implement a maximum length queue for each AXI transaction ID
|
||||
val virtualState = (maxReqsPerId * (1 << p(NastiKey).idBits) * maxReqLength * p(NastiKey).dataBits)
|
||||
// The total BRAM state required to dynamically allocate a entres to responses
|
||||
  val physicalState = (maxRequests * maxReqLength * p(NastiKey).dataBits)
  // 0x20000 = 4 32 Kb BRAMs
  val generateTranslation = (virtualState > 0x20000) && (virtualState > physicalState + 0x10000)

  // This module fires whenever there is a token available on the request port.
  val targetFire = io.req.hValid

  // On reset, the egress unit always has a single output token valid, but with invalid target data
  val currReqReg = RegInit({
    val r = Wire(io.req.t.cloneType)
    r.valid := false.B
    r.bits := DontCare
    r
  })

  val xactionDone = Wire(Bool())
  when (targetFire && io.req.t.valid) {
    currReqReg := io.req.t
  }.elsewhen (targetFire && xactionDone) {
    currReqReg.valid := false.B
  }

  val xactionStart = targetFire && io.req.t.valid
  // Queue address into which to enqueue the host-response
  val enqPId = Wire(Valid(UInt()))
  // Queue address from which to dequeue the response
  val (deqPId: UInt, deqPIdReg: ValidIO[UInt]) = if (generateTranslation) {
    val rob = Module(new ReorderBuffer(1 << p(NastiKey).idBits, maxRequests))
    val enqPIdReg = RegInit({
      val i = Wire(Valid(UInt(log2Up(maxRequests).W)))
      i.valid := false.B
      i.bits := DontCare
      i
    })

    val deqPIdReg = RegInit({
      val r = Wire(Valid(UInt(log2Up(maxRequests).W)))
      r.valid := false.B
      r.bits := DontCare
      r
    })
    val translationFailure = currReqReg.valid && ~deqPIdReg.valid

    rob.io.trans.vId := Mux(translationFailure, currReqReg.bits, io.req.t.bits)
    rob.io.trans.valid := translationFailure || xactionStart
    rob.io.free.valid := xactionDone
    rob.io.free.bits := deqPIdReg.bits

    when(rob.io.trans.fire()) {
      deqPIdReg.valid := rob.io.trans.fire()
      deqPIdReg.bits := rob.io.trans.pId
    }.elsewhen (targetFire && xactionDone) {
      deqPIdReg.valid := false.B
    }

    // Don't initiate another allocation until the current one has finished
    rob.io.next.vId := io.enq.bits.id
    io.enq.ready := enqPId.valid
    assert(enqPId.valid || ~io.enq.valid)
    rob.io.next.valid := ~enqPIdReg.valid && io.enq.valid
    enqPId.bits := Mux(enqPIdReg.valid, enqPIdReg.bits, rob.io.next.pId)
    enqPId.valid := enqPIdReg.valid || rob.io.next.ready
    when (io.enq.fire()) {
      when (io.enq.bits.last) {
        enqPIdReg.valid := false.B
      }.elsewhen (~enqPIdReg.valid) {
        enqPIdReg.valid := true.B
        enqPIdReg.bits := rob.io.next.pId
      }
    }
    // Deq using the translation if first beat, otherwise use the register
    val deqPId = Mux(translationFailure || xactionStart, rob.io.trans.pId, deqPIdReg.bits)
    (deqPId, deqPIdReg)
  } else {
    enqPId.bits := io.enq.bits.id
    enqPId.valid := io.enq.valid
    io.enq.ready := true.B
    val deqPId = Mux(xactionStart, io.req.t.bits, currReqReg.bits)
    (deqPId, currReqReg)
  }

  val mQDepth = if (generateTranslation) maxReqLength else maxReqLength * maxReqsPerId
  val mQWidth = if (generateTranslation) maxRequests else 1 << p(NastiKey).idBits
  val multiQueue = Module(new MultiQueue(new StoredBeat, mQWidth, mQDepth))

  multiQueue.io.enq.bits.data := io.enq.bits.data
  multiQueue.io.enq.bits.last := io.enq.bits.last
  multiQueue.io.enq.valid := io.enq.valid
  multiQueue.io.enqAddr := enqPId.bits
  multiQueue.io.deqAddr := deqPId

  xactionDone := targetFire && currReqReg.valid && deqPIdReg.valid &&
    io.resp.tReady && io.resp.tBits.last

  io.resp.tBits := NastiReadDataChannel(currReqReg.bits,
    multiQueue.io.deq.bits.data, multiQueue.io.deq.bits.last)
  io.resp.hValid := ~currReqReg.valid || (deqPIdReg.valid && multiQueue.io.deq.valid)
  multiQueue.io.deq.ready := targetFire && currReqReg.valid &&
    deqPIdReg.valid && io.resp.tReady
}

class WriteEgressResponseIO(implicit p: Parameters) extends NastiBundle()(p) {
  val tBits = Output(new NastiWriteResponseChannel)
  val tReady = Input(Bool())
  val hValid = Output(Bool())
}

class WriteEgressReqIO(implicit p: Parameters) extends NastiBundle()(p) {
  val t = Output(Valid(UInt(p(NastiKey).idBits.W)))
  val hValid = Output(Bool())
}

// Maintains a series of incrementer/decrementers to track the number of
// write acknowledgements returned by the host memory system. No other
// response metadata is stored.
class WriteEgress(maxRequests: Int, maxReqLength: Int, maxReqsPerId: Int)
    (implicit val p: Parameters) extends Module {
  val io = IO(new Bundle {
    val enq = Flipped(Decoupled(new NastiWriteResponseChannel))
    val resp = new WriteEgressResponseIO
    val req = Flipped(new WriteEgressReqIO)
  })

  // This module fires whenever there is a token available on the request port.
  val targetFire = io.req.hValid

  // Indicates whether the egress unit is releasing a transaction
  val currReqReg = RegInit({
    val r = Wire(io.req.t.cloneType)
    r.valid := false.B
    r.bits := DontCare
    r
  })
  val haveAck = RegInit(false.B)
  when (targetFire && io.req.t.valid) {
    currReqReg := io.req.t
  }.elsewhen (targetFire && currReqReg.valid && haveAck && io.resp.tReady) {
    currReqReg.valid := false.B
  }

  val ackCounters = Seq.fill(1 << p(NastiKey).idBits)(RegInit(0.U(log2Up(maxReqsPerId + 1).W)))
  val notEmpty = VecInit(ackCounters map {_ =/= 0.U})
  val retry = currReqReg.valid && !haveAck
  val deqId = Mux(retry, currReqReg.bits, io.req.t.bits)
  when (retry || targetFire && io.req.t.valid) {
    haveAck := notEmpty(deqId)
  }

  val idMatch = currReqReg.bits === io.enq.bits.id
  val do_enq = io.enq.fire()
  val do_deq = targetFire && currReqReg.valid && haveAck && io.resp.tReady
  ackCounters.zipWithIndex foreach { case (count, idx) =>
    when (!(do_deq && do_enq && idMatch)) {
      when(do_enq && io.enq.bits.id === idx.U) {
        count := count + 1.U
      }.elsewhen(do_deq && currReqReg.bits === idx.U) {
        count := count - 1.U
      }
    }
  }

  io.resp.tBits := NastiWriteResponseChannel(currReqReg.bits)
  io.resp.hValid := !currReqReg.valid || haveAck
  io.enq.ready := true.B
}

trait EgressUnitParameters {
  val egressUnitDelay = 1
}
@ -0,0 +1,577 @@
// See LICENSE for license details.
package midas
package models

// From RC
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.{DecoupledHelper}
import freechips.rocketchip.diplomacy.{LazyModule}
import freechips.rocketchip.amba.axi4.{AXI4EdgeParameters, AXI4Bundle}
import junctions._

import chisel3._
import chisel3.util._
import chisel3.experimental.dontTouch

import midas.core._
import midas.widgets._
import midas.passes.{Fame1ChiselAnnotation}
import midas.passes.fame.{HasSerializationHints}

import scala.math.min
import Console.{UNDERLINED, RESET}

import java.io.{File, FileWriter}

// Note: NASTI -> legacy rocket chip implementation of AXI4
case object FasedAXI4Edge extends Field[Option[AXI4EdgeSummary]](None)

case class BaseParams(
  // Pessimistically provisions the functional model. Don't be cheap:
  // underprovisioning will force the functional model to assert backpressure on
  // target AW, W, or R channels, which may lead to unexpected bandwidth throttling.
  maxReads: Int,
  maxWrites: Int,
  nastiKey: Option[NastiParameters] = None,
  edge: Option[AXI4EdgeParameters] = None,

  // AREA OPTIMIZATIONS:
  // AXI4 bursts(INCR) can be 256 beats in length -- some
  // area can be saved if the target design only issues smaller requests
  maxReadLength: Int = 256,
  maxReadsPerID: Option[Int] = None,
  maxWriteLength: Int = 256,
  maxWritesPerID: Option[Int] = None,

  // DEBUG FEATURES
  // Check for collisions in pending reads and writes to the host memory system
  // May produce false positives in timing models that reorder requests
  detectAddressCollisions: Boolean = false,

  // HOST INSTRUMENTATION
  stallEventCounters: Boolean = false, // To track causes of target-time stalls
  localHCycleCount: Boolean = false, // Host Cycle Counter
  latencyHistograms: Boolean = false, // Creates a BRAM histogram of various system latencies

  // BASE TIMING-MODEL SETTINGS
  // Some(key) instantiates an LLC model in front of the DRAM timing model
  llcKey: Option[LLCParams] = None,

  // BASE TIMING-MODEL INSTRUMENTATION
  xactionCounters: Boolean = true, // Numbers of read and write AXI4 xactions
  beatCounters: Boolean = false, // Numbers of read and write beats in AXI4 xactions
  targetCycleCounter: Boolean = false, // Redundant in a full simulator; useful for testing

  // Number of xactions in flight in a given cycle. Bin N contains the range
  // (occupancyHistograms[N-1], occupancyHistograms[N]]
  occupancyHistograms: Seq[Int] = Seq(0, 2, 4, 8),
  addrRangeCounters: BigInt = BigInt(0)
)
// A serializable summary of the diplomatic edge
case class AXI4EdgeSummary(
  maxReadTransfer: Int,
  maxWriteTransfer: Int,
  idReuse: Option[Int],
  maxFlight: Option[Int]
)

object AXI4EdgeSummary {
  // Returns max ID reuse; None -> unbounded
  private def getIDReuseFromEdge(e: AXI4EdgeParameters): Option[Int] = {
    val maxFlightPerMaster = e.master.masters.map(_.maxFlight)
    maxFlightPerMaster.reduce( (_,_) match {
      case (Some(prev), Some(cur)) => Some(scala.math.max(prev, cur))
      case _ => None
    })
  }
  // Returns (maxReadLength, maxWriteLength)
  private def getMaxTransferFromEdge(e: AXI4EdgeParameters): (Int, Int) = {
    val beatBytes = e.slave.beatBytes
    val readXferSize = e.slave.slaves.head.supportsRead.max
    val writeXferSize = e.slave.slaves.head.supportsWrite.max
    ((readXferSize + beatBytes - 1) / beatBytes, (writeXferSize + beatBytes - 1) / beatBytes)
  }

  // Sums up the maximum number of requests that can be inflight across all masters
  // None -> unbounded
  private def getMaxTotalFlightFromEdge(e: AXI4EdgeParameters): Option[Int] = {
    val maxFlightPerMaster = e.master.masters.map(_.maxFlight)
    maxFlightPerMaster.reduce( (_,_) match {
      case (Some(prev), Some(cur)) => Some(prev + cur)
      case _ => None
    })
  }

  def apply(e: AXI4EdgeParameters): AXI4EdgeSummary = AXI4EdgeSummary(
    getMaxTransferFromEdge(e)._1,
    getMaxTransferFromEdge(e)._2,
    getIDReuseFromEdge(e),
    getMaxTotalFlightFromEdge(e))
}

abstract class BaseConfig {
  def params: BaseParams

  private def getMaxPerID(e: Option[AXI4EdgeSummary], modelMaxXactions: Int, userMax: Option[Int])(implicit p: Parameters): Int = {
    e.flatMap(_.idReuse).getOrElse(min(userMax.getOrElse(modelMaxXactions), modelMaxXactions))
  }

  def maxReadLength(implicit p: Parameters) = p(FasedAXI4Edge) match {
    case Some(e) => e.maxReadTransfer
    case _ => params.maxReadLength
  }

  def maxWriteLength(implicit p: Parameters) = p(FasedAXI4Edge) match {
    case Some(e) => e.maxWriteTransfer
    case _ => params.maxWriteLength
  }

  def maxWritesPerID(implicit p: Parameters) = getMaxPerID(p(FasedAXI4Edge), params.maxWrites, params.maxWritesPerID)
  def maxReadsPerID(implicit p: Parameters) = getMaxPerID(p(FasedAXI4Edge), params.maxReads, params.maxReadsPerID)

  def maxWrites(implicit p: Parameters) = {
    val maxFromEdge = p(FasedAXI4Edge).flatMap(_.maxFlight).getOrElse(params.maxWrites)
    min(params.maxWrites, maxFromEdge)
  }

  def maxReads(implicit p: Parameters) = {
    val maxFromEdge = p(FasedAXI4Edge).flatMap(_.maxFlight).getOrElse(params.maxReads)
    min(params.maxReads, maxFromEdge)
  }

  def useLLCModel = params.llcKey != None

  // Timing model classes implement this function to elaborate the correct module
  def elaborate()(implicit p: Parameters): TimingModel

  def maxWritesBits(implicit p: Parameters) = log2Up(maxWrites)
  def maxReadsBits(implicit p: Parameters) = log2Up(maxReads)
}


// A wrapper bundle around all of the programmable settings in the functional model (!timing model).
class FuncModelProgrammableRegs extends Bundle with HasProgrammableRegisters {
  val relaxFunctionalModel = Input(Bool())

  val registers = Seq(
    (relaxFunctionalModel -> RuntimeSetting(0, """Relax functional model""", max = Some(1)))
  )

  def getFuncModelSettings(): Seq[(String, String)] = {
    Console.println(s"${UNDERLINED}Functional Model Settings${RESET}")
    setUnboundSettings()
    getSettings()
  }
}

class FASEDTargetIO(implicit val p: Parameters) extends Bundle {
  val axi4 = Flipped(new NastiIO)
  val reset = Input(Bool())
}

class MemModelIO(implicit val p: Parameters) extends WidgetIO()(p){
  // The default NastiKey is expected to be that of the target
  val host_mem = new NastiIO()(p.alterPartial({ case NastiKey => p(MemNastiKey)}))
}

// Need to wrap up all the parameters in a case class for serialization. The edge and width
// were previously passed in via the target's Parameters object
case class CompleteConfig(
  userProvided: BaseConfig,
  axi4Widths: NastiParameters,
  axi4Edge: Option[AXI4EdgeSummary] = None) extends HasSerializationHints {
  def typeHints(): Seq[Class[_]] = Seq(userProvided.getClass)
}

class FASEDMemoryTimingModel(completeConfig: CompleteConfig, hostParams: Parameters) extends BridgeModule[HostPortIO[FASEDTargetIO]]()(hostParams) {
  val cfg = completeConfig.userProvided
  // Reconstitute the parameters object
  implicit override val p = hostParams.alterPartial({
    case NastiKey => completeConfig.axi4Widths
    case FasedAXI4Edge => completeConfig.axi4Edge
  })

  require(p(NastiKey).idBits <= p(MemNastiKey).idBits,
    "Target AXI4 IDs cannot be mapped 1:1 onto host AXI4 IDs"
  )

  val io = IO(new MemModelIO)
  val hPort = IO(HostPort(new FASEDTargetIO))
  val tNasti = hPort.hBits.axi4
  val tReset = hPort.hBits.reset

  val model = cfg.elaborate()
  printGenerationConfig

  // Debug: Put an optional bound on the number of memory requests we can make
  // to the host memory system
  val funcModelRegs = Wire(new FuncModelProgrammableRegs)
  val ingress = Module(new IngressModule(cfg))

  // Drop in a width adapter to handle differences between
  // the host and target memory widths
  val widthAdapter = Module(LazyModule(
    new TargetToHostAXI4Converter(p(NastiKey), p(MemNastiKey))
  ).module)

  val hostMemOffsetWidthOffset = io.host_mem.aw.bits.addr.getWidth - p(CtrlNastiKey).dataBits
  val hostMemOffsetLowWidth = if (hostMemOffsetWidthOffset > 0) p(CtrlNastiKey).dataBits else io.host_mem.aw.bits.addr.getWidth
  val hostMemOffsetHighWidth = if (hostMemOffsetWidthOffset > 0) hostMemOffsetWidthOffset else 0
  val hostMemOffsetHigh = RegInit(0.U(hostMemOffsetHighWidth.W))
  val hostMemOffsetLow = RegInit(0.U(hostMemOffsetLowWidth.W))
  val hostMemOffset = Cat(hostMemOffsetHigh, hostMemOffsetLow)
  attach(hostMemOffsetHigh, "hostMemOffsetHigh", WriteOnly)
  attach(hostMemOffsetLow, "hostMemOffsetLow", WriteOnly)

  io.host_mem <> widthAdapter.sAxi4
  io.host_mem.aw.bits.user := DontCare
  io.host_mem.aw.bits.region := DontCare
  io.host_mem.ar.bits.user := DontCare
  io.host_mem.ar.bits.region := DontCare
  io.host_mem.w.bits.id := DontCare
  io.host_mem.w.bits.user := DontCare
  io.host_mem.ar.bits.addr := widthAdapter.sAxi4.ar.bits.addr + hostMemOffset
  io.host_mem.aw.bits.addr := widthAdapter.sAxi4.aw.bits.addr + hostMemOffset

  widthAdapter.mAxi4.aw <> ingress.io.nastiOutputs.aw
  widthAdapter.mAxi4.ar <> ingress.io.nastiOutputs.ar
  widthAdapter.mAxi4.w <> ingress.io.nastiOutputs.w

  val readEgress = Module(new ReadEgress(
    maxRequests = cfg.maxReads,
    maxReqLength = cfg.maxReadLength,
    maxReqsPerId = cfg.maxReadsPerID))

  readEgress.io.enq <> widthAdapter.mAxi4.r
  readEgress.io.enq.bits.user := DontCare

  val writeEgress = Module(new WriteEgress(
    maxRequests = cfg.maxWrites,
    maxReqLength = cfg.maxWriteLength,
    maxReqsPerId = cfg.maxWritesPerID))

  writeEgress.io.enq <> widthAdapter.mAxi4.b
  writeEgress.io.enq.bits.user := DontCare

  // Track outstanding requests to the host memory system
  val hOutstandingReads = SatUpDownCounter(cfg.maxReads)
  hOutstandingReads.inc := io.host_mem.ar.fire()
  hOutstandingReads.dec := io.host_mem.r.fire() && io.host_mem.r.bits.last
  hOutstandingReads.max := cfg.maxReads.U
  val hOutstandingWrites = SatUpDownCounter(cfg.maxWrites)
  hOutstandingWrites.inc := io.host_mem.aw.fire()
  hOutstandingWrites.dec := io.host_mem.b.fire()
  hOutstandingWrites.max := cfg.maxWrites.U

  val host_mem_idle = hOutstandingReads.empty && hOutstandingWrites.empty
  // By default, disallow all R->W, W->R, and W->W reorderings in host memory
  // system. see IngressUnit.scala for more detail
  ingress.io.host_mem_idle := host_mem_idle
  ingress.io.host_read_inflight := !hOutstandingReads.empty
  ingress.io.relaxed := funcModelRegs.relaxFunctionalModel

  // Five conditions to execute a target cycle:
  // 1: AXI4 tokens are available, and there is space to enqueue a new input token
  // 2: Ingress has space for requests snooped in token
  val ingressReady = ingress.io.nastiInputs.hReady
  // 3: Egress unit has produced the payloads for read response channel
  val rReady = readEgress.io.resp.hValid
  // 4: Egress unit has produced the payloads for write response channel
  val bReady = writeEgress.io.resp.hValid
  // 5: If targetReset is asserted the host-memory system must first settle
  val tResetReady = (!tReset || host_mem_idle)

  // decoupled helper fire currently doesn't support directly passing true/false.B as exclude
  val tFireHelper = DecoupledHelper(hPort.toHost.hValid,
    hPort.fromHost.hReady,
    ingressReady, bReady, rReady, tResetReady)

  val targetFire = tFireHelper.fire
  // HACK: Feeding valid back on ready and ready back on valid until we figure out
  // channel tokenization
  hPort.toHost.hReady := tFireHelper.fire
  hPort.fromHost.hValid := tFireHelper.fire
  ingress.io.nastiInputs.hValid := tFireHelper.fire(ingressReady)

  model.tNasti <> tNasti
  model.reset := tReset
  // Connect up aw to ingress and model
  ingress.io.nastiInputs.hBits.aw.valid := tNasti.aw.fire
  ingress.io.nastiInputs.hBits.aw.bits := tNasti.aw.bits

  // Connect ar to ingress and model
  ingress.io.nastiInputs.hBits.ar.valid := tNasti.ar.fire
  ingress.io.nastiInputs.hBits.ar.bits := tNasti.ar.bits

  // Connect w to ingress and model
  ingress.io.nastiInputs.hBits.w.valid := tNasti.w.fire
  ingress.io.nastiInputs.hBits.w.bits := tNasti.w.bits

  // Connect target-level signals between egress and model
  readEgress.io.req.t := model.io.egressReq.r
  readEgress.io.req.hValid := targetFire
  readEgress.io.resp.tReady := model.io.egressResp.rReady
  model.io.egressResp.rBits := readEgress.io.resp.tBits

  writeEgress.io.req.t := model.io.egressReq.b
  writeEgress.io.req.hValid := targetFire
  writeEgress.io.resp.tReady := model.io.egressResp.bReady
  model.io.egressResp.bBits := writeEgress.io.resp.tBits

  ingress.reset := reset.toBool || tReset && tFireHelper.fire(ingressReady)
  readEgress.reset := reset.toBool || tReset && targetFire
  writeEgress.reset := reset.toBool || tReset && targetFire


  if (cfg.params.localHCycleCount) {
    val hCycle = RegInit(0.U(32.W))
    hCycle := hCycle + 1.U
    attach(hCycle, "hostCycle", ReadOnly)
  }

  if (cfg.params.stallEventCounters) {
    val writeEgressStalls = RegInit(0.U(32.W))
    when(!bReady) {
      writeEgressStalls := writeEgressStalls + 1.U
    }

    val readEgressStalls = RegInit(0.U(32.W))
    when(!rReady) {
      readEgressStalls := readEgressStalls + 1.U
    }

    val tokenStalls = RegInit(0.U(32.W))
    when(!(tResetReady && hPort.toHost.hValid && hPort.fromHost.hReady)) {
      tokenStalls := tokenStalls + 1.U
    }

    val hostMemoryIdleCycles = RegInit(0.U(32.W))
    when(host_mem_idle) {
      hostMemoryIdleCycles := hostMemoryIdleCycles + 1.U
    }

    when (targetFire) {
      writeEgressStalls := 0.U
      readEgressStalls := 0.U
      tokenStalls := 0.U
    }
    attach(writeEgressStalls, "writeStalled", ReadOnly)
    attach(readEgressStalls, "readStalled", ReadOnly)
    attach(tokenStalls, "tokenStalled", ReadOnly)
  }

  if (cfg.params.detectAddressCollisions) {
    val discardedMSBs = 6
    val collision_checker = Module(new AddressCollisionChecker(
      cfg.maxReads, cfg.maxWrites, p(NastiKey).addrBits - discardedMSBs))
    collision_checker.io.read_req.valid := targetFire && tNasti.ar.fire
    collision_checker.io.read_req.bits := tNasti.ar.bits.addr >> discardedMSBs
    collision_checker.io.read_done := io.host_mem.r.fire && io.host_mem.r.bits.last

    collision_checker.io.write_req.valid := targetFire && tNasti.aw.fire
    collision_checker.io.write_req.bits := tNasti.aw.bits.addr >> discardedMSBs
    collision_checker.io.write_done := io.host_mem.b.fire

    val collision_addr = RegEnable(collision_checker.io.collision_addr.bits,
      targetFire & collision_checker.io.collision_addr.valid)

    val num_collisions = RegInit(0.U(32.W))
    when (targetFire && collision_checker.io.collision_addr.valid) {
      num_collisions := num_collisions + 1.U
    }

    attach(num_collisions, "addrCollision", ReadOnly)
    attach(collision_addr, "collisionAddr", ReadOnly)
  }

  if (cfg.params.latencyHistograms) {

    // Measure latency from reception of first read data beat; need
    // some state to track when a beat corresponds to the start of a new xaction
    val newHRead = RegInit(true.B)
    when (readEgress.io.enq.fire && readEgress.io.enq.bits.last) {
      newHRead := true.B
    }.elsewhen (readEgress.io.enq.fire) {
      newHRead := false.B
    }
    // Latencies of host xactions
    val hReadLatencyHist = HostLatencyHistogram(
      ingress.io.nastiOutputs.ar.fire,
      ingress.io.nastiOutputs.ar.bits.id,
      readEgress.io.enq.fire && newHRead,
      readEgress.io.enq.bits.id
    )
    attachIO(hReadLatencyHist, "hostReadLatencyHist_")

    val hWriteLatencyHist = HostLatencyHistogram(
      ingress.io.nastiOutputs.aw.fire,
      ingress.io.nastiOutputs.aw.bits.id,
      writeEgress.io.enq.fire,
      writeEgress.io.enq.bits.id
    )
    attachIO(hWriteLatencyHist, "hostWriteLatencyHist_")

    // target-time latencies of xactions
    val newTRead = RegInit(true.B)
    // Measure latency from reception of first read data beat; need
    // some state to track when a beat corresponds to the start of a new xaction
    when (targetFire) {
      when (model.tNasti.r.fire && model.tNasti.r.bits.last) {
        newTRead := true.B
      }.elsewhen (model.tNasti.r.fire) {
        newTRead := false.B
      }
    }

    val tReadLatencyHist = HostLatencyHistogram(
      model.tNasti.ar.fire && targetFire,
      model.tNasti.ar.bits.id,
      model.tNasti.r.fire && targetFire && newTRead,
      model.tNasti.r.bits.id,
      cycleCountEnable = targetFire
    )
    attachIO(tReadLatencyHist, "targetReadLatencyHist_")

    val tWriteLatencyHist = HostLatencyHistogram(
      model.tNasti.aw.fire && targetFire,
      model.tNasti.aw.bits.id,
      model.tNasti.b.fire && targetFire,
      model.tNasti.b.bits.id,
      cycleCountEnable = targetFire
    )
    attachIO(tWriteLatencyHist, "targetWriteLatencyHist_")

    // Total host-latency of transactions
    val totalReadLatencyHist = HostLatencyHistogram(
      model.tNasti.ar.fire && targetFire,
      model.tNasti.ar.bits.id,
      model.tNasti.r.fire && targetFire && newTRead,
      model.tNasti.r.bits.id
    )
    attachIO(totalReadLatencyHist, "totalReadLatencyHist_")

    val totalWriteLatencyHist = HostLatencyHistogram(
      model.tNasti.aw.fire && targetFire,
      model.tNasti.aw.bits.id,
      model.tNasti.b.fire && targetFire,
      model.tNasti.b.bits.id
    )
    attachIO(totalWriteLatencyHist, "totalWriteLatencyHist_")

    // Ingress latencies
    val iReadLatencyHist = HostLatencyHistogram(
      ingress.io.nastiInputs.hBits.ar.fire() && targetFire,
      ingress.io.nastiInputs.hBits.ar.bits.id,
      ingress.io.nastiOutputs.ar.fire,
      ingress.io.nastiOutputs.ar.bits.id
    )
    attachIO(iReadLatencyHist, "ingressReadLatencyHist_")

    val iWriteLatencyHist = HostLatencyHistogram(
      ingress.io.nastiInputs.hBits.aw.fire() && targetFire,
      ingress.io.nastiInputs.hBits.aw.bits.id,
      ingress.io.nastiOutputs.aw.fire,
      ingress.io.nastiOutputs.aw.bits.id
    )
    attachIO(iWriteLatencyHist, "ingressWriteLatencyHist_")
  }

  if (cfg.params.addrRangeCounters > 0) {
    val n = cfg.params.addrRangeCounters
    val readRanges = AddressRangeCounter(n, model.tNasti.ar, targetFire)
    val writeRanges = AddressRangeCounter(n, model.tNasti.aw, targetFire)
    val numRanges = n.U(32.W)

    attachIO(readRanges, "readRanges_")
    attachIO(writeRanges, "writeRanges_")
    attach(numRanges, "numRanges", ReadOnly)
  }

  val rrespError = RegEnable(io.host_mem.r.bits.resp, 0.U,
    io.host_mem.r.bits.resp =/= 0.U && io.host_mem.r.fire)
  val brespError = RegEnable(io.host_mem.b.bits.resp, 0.U,
    io.host_mem.b.bits.resp =/= 0.U && io.host_mem.b.fire)

  // Generate the configuration registers and tie them to the ctrl bus
  attachIO(model.io.mmReg)
  attachIO(funcModelRegs)
  attach(rrespError, "rrespError", ReadOnly)
  attach(brespError, "brespError", ReadOnly)

  genCRFile()
  dontTouch(targetFire)
  chisel3.experimental.annotate(Fame1ChiselAnnotation(model, "targetFire"))
  getDefaultSettings("runtime.conf")

  override def genHeader(base: BigInt, sb: StringBuilder) {
    def genCPPmap(mapName: String, map: Map[String, BigInt]): String = {
      val prefix = s"const std::map<std::string, int> $mapName = {\n"
      map.foldLeft(prefix)((str, kvp) => str + s""" {\"${kvp._1}\", ${kvp._2}},\n""") + "};\n"
    }
    import midas.widgets.CppGenerationUtils._
    super.genHeader(base, sb)

    sb.append(CppGenerationUtils.genMacro(s"${getWName.toUpperCase}_target_addr_bits", UInt32(p(NastiKey).addrBits)))

    crRegistry.genArrayHeader(wName.getOrElse(name).toUpperCase, base, sb)
  }

  // Prints out key elaboration time settings
  private def printGenerationConfig(): Unit = {
    println("Generating a Midas Memory Model")
    println("   Max Read Requests: " + cfg.maxReads)
    println("   Max Write Requests: " + cfg.maxWrites)
    println("   Max Read Length: " + cfg.maxReadLength)
    println("   Max Write Length: " + cfg.maxWriteLength)
    println("   Max Read ID Reuse: " + cfg.maxReadsPerID)
    println("   Max Write ID Reuse: " + cfg.maxWritesPerID)

    println("\nTiming Model Parameters")
    model.printGenerationConfig
    cfg.params.llcKey match {
      case Some(key) => key.print()
      case None => println("  No LLC Model Instantiated\n")
    }
  }

  // Accepts an elaborated memory model and generates a runtime configuration for it
  private def emitSettings(fileName: String, settings: Seq[(String, String)])(implicit p: Parameters): Unit = {
    val file = new File(p(OutputDir), fileName)
    val writer = new FileWriter(file)
    settings.foreach({
      case (field, value) => writer.write(s"+mm_${field}=${value}\n")
    })
    writer.close
  }

  def getSettings(fileName: String)(implicit p: Parameters) {
    println("\nGenerating a Midas Memory Model Configuration File")
    val functionalModelSettings = funcModelRegs.getFuncModelSettings()
    val timingModelSettings = model.io.mmReg.getTimingModelSettings()
    emitSettings(fileName, functionalModelSettings ++ timingModelSettings)
  }

  def getDefaultSettings(fileName: String)(implicit p: Parameters) {
    val functionalModelSettings = funcModelRegs.getDefaults()
    val timingModelSettings = model.io.mmReg.getDefaults()
    emitSettings(fileName, functionalModelSettings ++ timingModelSettings)
  }
}

class FASEDBridge(argument: CompleteConfig)(implicit p: Parameters)
    extends BlackBox with Bridge[HostPortIO[FASEDTargetIO], FASEDMemoryTimingModel] {
  val io = IO(new FASEDTargetIO)
  val bridgeIO = HostPort(io)
  val constructorArg = Some(argument)
  generateAnnotations()
}

object FASEDBridge {
  def apply(axi4: AXI4Bundle, reset: Bool, cfg: CompleteConfig)(implicit p: Parameters): FASEDBridge = {
    val ep = Module(new FASEDBridge(cfg)(p.alterPartial({ case NastiKey => cfg.axi4Widths })))
    ep.io.reset := reset
    import chisel3.core.ExplicitCompileOptions.NotStrict
    ep.io.axi4 <> axi4
    ep
  }
}
@ -0,0 +1,172 @@
package midas
package models

import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import junctions._
import midas.widgets._

import Console.{UNDERLINED, RESET}

case class FIFOMASConfig(
  dramKey: DramOrganizationParams,
  transactionQueueDepth: Int,
  backendKey: DRAMBackendKey = DRAMBackendKey(4, 4, DRAMMasEnums.maxDRAMTimingBits),
  params: BaseParams)
  extends DRAMBaseConfig {

  def elaborate()(implicit p: Parameters): FIFOMASModel = Module(new FIFOMASModel(this)(p))
}

class FIFOMASMMRegIO(val cfg: FIFOMASConfig) extends BaseDRAMMMRegIO(cfg) {
  val registers = dramBaseRegisters

  def requestSettings() {
    Console.println(s"Configuring a First-Come First-Serve Model")
    setBaseDRAMSettings()
  }
}

class FIFOMASIO(val cfg: FIFOMASConfig)(implicit p: Parameters) extends TimingModelIO()(p) {
  val mmReg = new FIFOMASMMRegIO(cfg)
  //override def clonetype = new FIFOMASIO(cfg)(p).asInstanceOf[this.type]
}

class FIFOMASModel(cfg: FIFOMASConfig)(implicit p: Parameters) extends TimingModel(cfg)(p)
    with HasDRAMMASConstants {

  val longName = "FIFO MAS"
  def printTimingModelGenerationConfig {}
  /**************************** CHISEL BEGINS *********************************/
  import DRAMMasEnums._

  lazy val io = IO(new FIFOMASIO(cfg))
  val timings = io.mmReg.dramTimings

  val backend = Module(new DRAMBackend(cfg.backendKey))
  val xactionScheduler = Module(new UnifiedFIFOXactionScheduler(cfg.transactionQueueDepth, cfg))
  xactionScheduler.io.req <> nastiReq
  xactionScheduler.io.pendingAWReq := pendingAWReq.value
  xactionScheduler.io.pendingWReq := pendingWReq.value

  val currentReference = Queue({
    val next = Wire(Decoupled(new MASEntry(cfg)))
    next.valid := xactionScheduler.io.nextXaction.valid
    next.bits.decode(xactionScheduler.io.nextXaction.bits, io.mmReg)
    xactionScheduler.io.nextXaction.ready := next.ready
    next
  }, 1, pipe = true)

  val selectedCmd = WireInit(cmd_nop)
  val memReqDone = (selectedCmd === cmd_casr || selectedCmd === cmd_casw)

// Trackers controller-level structural hazards
|
||||
val cmdBusBusy = Module(new DownCounter((maxDRAMTimingBits)))
|
||||
cmdBusBusy.io.decr := true.B
|
||||
|
||||
// Trackers for bank-level hazards and timing violations
|
||||
val rankStateTrackers = Seq.fill(cfg.dramKey.maxRanks)(Module(new RankStateTracker(cfg.dramKey)))
|
||||
val currentRank = VecInit(rankStateTrackers map { _.io.rank })(currentReference.bits.rankAddr)
|
||||
val bankMuxes = VecInit(rankStateTrackers map { tracker => tracker.io.rank.banks(currentReference.bits.bankAddr) })
|
||||
val currentBank = WireInit(bankMuxes(currentReference.bits.rankAddr))
|
||||
|
||||
// Command scheduling logic
|
||||
val cmdRow = currentReference.bits.rowAddr
|
||||
val cmdRank = WireInit(UInt(cfg.dramKey.rankBits.W), init = currentReference.bits.rankAddr)
|
||||
val cmdBank = WireInit(currentReference.bits.bankAddr)
|
||||
val cmdBankOH = UIntToOH(cmdBank)
|
||||
val currentRowHit = currentBank.state === bank_active && cmdRow === currentBank.openRow
|
||||
val casAutoPRE = WireInit(false.B)
|
||||
|
||||
val canCASW = backend.io.newWrite.ready && currentReference.valid &&
|
||||
currentRowHit && currentReference.bits.xaction.isWrite && currentBank.canCASW &&
|
||||
currentRank.canCASW && !currentRank.wantREF
|
||||
|
||||
val canCASR = backend.io.newRead.ready && currentReference.valid && currentRowHit &&
|
||||
!currentReference.bits.xaction.isWrite && currentBank.canCASR && currentRank.canCASR &&
|
||||
!currentRank.wantREF
|
||||
|
||||
val refreshUnit = Module(new RefreshUnit(cfg.dramKey)).io
|
||||
refreshUnit.ranksInUse := io.mmReg.rankAddr.maskToOH()
|
||||
refreshUnit.rankStati.zip(rankStateTrackers) foreach { case (refInput, tracker) =>
|
||||
refInput := tracker.io.rank }
|
||||
|
||||
when (refreshUnit.suggestREF) {
|
||||
selectedCmd := cmd_ref
|
||||
cmdRank := refreshUnit.refRankAddr
|
||||
}.elsewhen (refreshUnit.suggestPRE) {
|
||||
selectedCmd := cmd_pre
|
||||
cmdRank := refreshUnit.preRankAddr
|
||||
cmdBank := refreshUnit.preBankAddr
|
||||
}.elsewhen(io.mmReg.openPagePolicy) {
|
||||
when (canCASR) {
|
||||
selectedCmd := cmd_casr
|
||||
}.elsewhen (canCASW) {
|
||||
selectedCmd := cmd_casw
|
||||
}.elsewhen (currentReference.valid && currentBank.canACT && currentRank.canACT && !currentRank.wantREF) {
|
||||
selectedCmd := cmd_act
|
||||
}.elsewhen (currentReference.valid && !currentRowHit && currentBank.canPRE && currentRank.canPRE) {
|
||||
selectedCmd := cmd_pre
|
||||
}
|
||||
}.otherwise {
|
||||
when (canCASR) {
|
||||
selectedCmd := cmd_casr
|
||||
casAutoPRE := true.B
|
||||
}.elsewhen (canCASW) {
|
||||
selectedCmd := cmd_casw
|
||||
casAutoPRE := true.B
|
||||
}.elsewhen (currentReference.valid && currentBank.canACT && currentRank.canACT && !currentRank.wantREF) {
|
||||
selectedCmd := cmd_act
|
||||
}
|
||||
}
|
||||
|
||||
rankStateTrackers.zip(UIntToOH(cmdRank).toBools) foreach { case (state, cmdUsesThisRank) =>
|
||||
state.io.selectedCmd := selectedCmd
|
||||
state.io.cmdBankOH := cmdBankOH
|
||||
state.io.cmdRow := cmdRow
|
||||
state.io.autoPRE := casAutoPRE
|
||||
state.io.cmdUsesThisRank := cmdUsesThisRank
|
||||
state.io.timings := timings
|
||||
state.io.tCycle := tCycle
|
||||
}
|
||||
|
||||
// TODO: sensible mapping to DRAM bus width
|
||||
|
||||
cmdBusBusy.io.set.bits := timings.tCMD - 1.U
|
||||
cmdBusBusy.io.set.valid := (selectedCmd =/= cmd_nop)
|
||||
|
||||
currentReference.ready := memReqDone
|
||||
|
||||
backend.io.tCycle := tCycle
|
||||
backend.io.newRead.bits := ReadResponseMetaData(currentReference.bits.xaction)
|
||||
backend.io.newRead.valid := memReqDone && !currentReference.bits.xaction.isWrite
|
||||
backend.io.readLatency := timings.tCAS + timings.tAL + io.mmReg.backendLatency
|
||||
|
||||
// For writes we send out the acknowledge immediately
|
||||
backend.io.newWrite.bits := WriteResponseMetaData(currentReference.bits.xaction)
|
||||
backend.io.newWrite.valid := memReqDone && currentReference.bits.xaction.isWrite
|
||||
backend.io.writeLatency := 1.U
|
||||
|
||||
wResp <> backend.io.completedWrite
|
||||
rResp <> backend.io.completedRead
|
||||
|
||||
// Dump the command stream
|
||||
val cmdMonitor = Module(new CommandBusMonitor())
|
||||
cmdMonitor.io.cmd := selectedCmd
|
||||
cmdMonitor.io.rank := cmdRank
|
||||
cmdMonitor.io.bank := cmdBank
|
||||
cmdMonitor.io.row := cmdRow
|
||||
cmdMonitor.io.autoPRE := casAutoPRE
|
||||
|
||||
val powerStats = (rankStateTrackers).zip(UIntToOH(cmdRank).toBools) map {
|
||||
case (rankState, cmdUsesThisRank) =>
|
||||
val powerMonitor = Module(new RankPowerMonitor(cfg.dramKey))
|
||||
powerMonitor.io.selectedCmd := selectedCmd
|
||||
powerMonitor.io.cmdUsesThisRank := cmdUsesThisRank
|
||||
powerMonitor.io.rankState := rankState.io.rank
|
||||
powerMonitor.io.stats
|
||||
}
|
||||
|
||||
io.mmReg.rankPower := VecInit(powerStats)
|
||||
}
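The open-page arm of the command-selection priority above (CAS read, then CAS write, then ACT, then PRE for the single oldest reference) can be sketched behaviorally. This Python model is illustrative only and not part of the commit; `Bank`, `Ref`, and the command strings are hypothetical stand-ins for the Chisel enums, and refresh scheduling, rank-level legality, and DRAM timing constraints are omitted.

```python
# Behavioral sketch of the FIFO MAS open-page command selection.
# All names here are illustrative; the RTL above is the source of truth.

class Bank:
    def __init__(self, active=False, open_row=None):
        self.active, self.open_row = active, open_row

class Ref:
    def __init__(self, row, is_write):
        self.row, self.is_write = row, is_write

def select_command(ref, bank, refresh_wanted):
    """Pick the next DRAM command for the single oldest reference."""
    if refresh_wanted:
        return "REF"
    row_hit = bank.active and bank.open_row == ref.row
    if row_hit:
        # Column accesses go first; the reference is either a read or a write
        return "CASW" if ref.is_write else "CASR"
    if not bank.active:
        return "ACT"   # open the needed row
    return "PRE"       # close the conflicting open row

print(select_command(Ref(3, False), Bank(True, 3), False))  # row hit: CASR
print(select_command(Ref(3, False), Bank(True, 7), False))  # conflict: PRE
```

Under the closed-page arm, the same CAS selections additionally raise `casAutoPRE`, so the row is closed as soon as the column access retires.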
@ -0,0 +1,290 @@
package midas
package models

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.GenericParameterizedBundle

import chisel3._
import chisel3.util._

import junctions._
import midas.widgets._

import Console.{UNDERLINED, RESET}

case class FirstReadyFCFSConfig(
    dramKey: DramOrganizationParams,
    schedulerWindowSize: Int,
    transactionQueueDepth: Int,
    backendKey: DRAMBackendKey = DRAMBackendKey(4, 4, DRAMMasEnums.maxDRAMTimingBits),
    params: BaseParams)
  extends DRAMBaseConfig {

  def elaborate()(implicit p: Parameters): FirstReadyFCFSModel = Module(new FirstReadyFCFSModel(this))
}

class FirstReadyFCFSMMRegIO(val cfg: FirstReadyFCFSConfig) extends BaseDRAMMMRegIO(cfg) {
  val schedulerWindowSize = Input(UInt(log2Ceil(cfg.schedulerWindowSize).W))
  val transactionQueueDepth = Input(UInt(log2Ceil(cfg.transactionQueueDepth).W))

  val registers = dramBaseRegisters ++ Seq(
    (schedulerWindowSize -> RuntimeSetting(
        default = cfg.schedulerWindowSize,
        query = "Reference queue depth",
        min = 1,
        max = Some(cfg.schedulerWindowSize))),
    transactionQueueDepth -> RuntimeSetting(
        default = cfg.transactionQueueDepth,
        query = "Transaction queue depth",
        min = 1,
        max = Some(cfg.transactionQueueDepth)))

  def requestSettings() {
    Console.println(s"Configuring a First-Ready First-Come First-Served Model")
    setBaseDRAMSettings()
  }
}

class FirstReadyFCFSIO(val cfg: FirstReadyFCFSConfig)(implicit p: Parameters) extends TimingModelIO()(p) {
  val mmReg = new FirstReadyFCFSMMRegIO(cfg)
}

class FirstReadyFCFSModel(cfg: FirstReadyFCFSConfig)(implicit p: Parameters) extends TimingModel(cfg)(p)
    with HasDRAMMASConstants {

  val longName = "First-Ready FCFS MAS"
  def printTimingModelGenerationConfig {}
  /**************************** CHISEL BEGINS *********************************/

  import DRAMMasEnums._
  lazy val io = IO(new FirstReadyFCFSIO(cfg))

  val timings = io.mmReg.dramTimings

  val backend = Module(new DRAMBackend(cfg.backendKey))
  val xactionScheduler = Module(new UnifiedFIFOXactionScheduler(cfg.transactionQueueDepth, cfg))
  xactionScheduler.io.req <> nastiReq
  xactionScheduler.io.pendingAWReq := pendingAWReq.value
  xactionScheduler.io.pendingWReq := pendingWReq.value

  // Trackers for controller-level structural hazards
  val cmdBusBusy = Module(new DownCounter((maxDRAMTimingBits)))
  cmdBusBusy.io.decr := true.B

  // Forward-declared wires
  val selectedCmd = WireInit(cmd_nop)
  val memReqDone = (selectedCmd === cmd_casr || selectedCmd === cmd_casw)

  // Trackers for DRAM timing violations
  val rankStateTrackers = Seq.fill(cfg.dramKey.maxRanks)(Module(new RankStateTracker(cfg.dramKey)))

  // Prevents closing a row before a CAS command has been issued for the ready entry.
  // Instead of counting ready entries, we keep a bit to indicate their presence:
  // it is set on activation or on enqueuing a new ready entry, and unset when a
  // memory request retires the last ready entry.
  val bankHasReadyEntries = RegInit(VecInit(Seq.fill(cfg.dramKey.maxRanks * cfg.dramKey.maxBanks)(false.B)))

  // State for the collapsing buffer of pending memory references
  val newReference = Wire(Decoupled(new FirstReadyFCFSEntry(cfg)))
  newReference.valid := xactionScheduler.io.nextXaction.valid
  newReference.bits.decode(xactionScheduler.io.nextXaction.bits, io.mmReg)

  // Mark whether the new reference hits an open row buffer, in case it missed the broadcast
  val rowHitsInRank = VecInit(rankStateTrackers map { tracker =>
    VecInit(tracker.io.rank.banks map { _.isRowHit(newReference.bits) }).asUInt })

  xactionScheduler.io.nextXaction.ready := newReference.ready

  val refBuffer = CollapsingBuffer(
    enq = newReference,
    depth = cfg.schedulerWindowSize,
    programmableDepth = Some(io.mmReg.schedulerWindowSize)
  )
  val refList = refBuffer.io.entries
  val refUpdates = refBuffer.io.updates

  // Selects the oldest candidate from all ready references that can legally request a CAS
  val columnArbiter = Module(new Arbiter(refList.head.bits.cloneType, refList.size))

  def checkRankBankLegality(getField: CommandLegalBools => Bool)(masEntry: FirstReadyFCFSEntry): Bool = {
    val bankFields = rankStateTrackers map { rank => VecInit(rank.io.rank.banks map getField).asUInt }
    val bankLegal = (Mux1H(masEntry.rankAddrOH, bankFields) & masEntry.bankAddrOH).orR
    val rankFields = VecInit(rankStateTrackers map { rank => getField(rank.io.rank) }).asUInt
    val rankLegal = (masEntry.rankAddrOH & rankFields).orR
    rankLegal && bankLegal
  }

  def rankWantsRef(rankAddrOH: UInt): Bool =
    (rankAddrOH & (VecInit(rankStateTrackers map { _.io.rank.wantREF }).asUInt)).orR

  val canLegallyCASR = checkRankBankLegality(_.canCASR) _
  val canLegallyCASW = checkRankBankLegality(_.canCASW) _
  val canLegallyACT = checkRankBankLegality(_.canACT) _
  val canLegallyPRE = checkRankBankLegality(_.canPRE) _

  columnArbiter.io.in <> refList.map({ entry =>
    val candidate = V2D(entry)
    val canCASR = canLegallyCASR(entry.bits) && backend.io.newRead.ready
    val canCASW = canLegallyCASW(entry.bits) && backend.io.newWrite.ready
    candidate.valid := entry.valid && entry.bits.isReady &&
      Mux(entry.bits.xaction.isWrite, canCASW, canCASR) &&
      !rankWantsRef(entry.bits.rankAddrOH)
    candidate
  })

  val entryWantsPRE = refList map { ref => ref.valid && ref.bits.wantPRE() && canLegallyPRE(ref.bits) }
  val entryWantsACT = refList map { ref => ref.valid && ref.bits.wantACT() && canLegallyACT(ref.bits) &&
    !rankWantsRef(ref.bits.rankAddrOH) }

  val preBank = PriorityMux(entryWantsPRE, refList.map(_.bits.bankAddr))
  val preRank = PriorityMux(entryWantsPRE, refList.map(_.bits.rankAddr))
  val suggestPre = entryWantsPRE reduce { _ || _ }

  val actRank = PriorityMux(entryWantsACT, refList.map(_.bits.rankAddr))
  val actBank = PriorityMux(entryWantsACT, refList.map(_.bits.bankAddr))
  val actRow = PriorityMux(entryWantsACT, refList.map(_.bits.rowAddr))

  // See if the oldest pending row reference wants a PRE or an ACT
  val suggestAct = (entryWantsACT.zip(entryWantsPRE)).foldRight(false.B)({
    case ((act, pre), current) => Mux(act, true.B, !pre && current) })

  // NB: These are not driven for all command types. E.g., when issuing a CAS, cmdRow
  // will not correspond to the row of the CAS command, since that is implicit
  // in the state of the bank.
  val cmdBank = WireInit(UInt(cfg.dramKey.bankBits.W), init = preBank)
  val cmdBankOH = UIntToOH(cmdBank)
  val cmdRank = WireInit(UInt(cfg.dramKey.rankBits.W), init = columnArbiter.io.out.bits.rankAddr)
  val cmdRow = actRow

  val refreshUnit = Module(new RefreshUnit(cfg.dramKey)).io
  refreshUnit.ranksInUse := io.mmReg.rankAddr.maskToOH()
  refreshUnit.rankStati.zip(rankStateTrackers) foreach { case (refInput, tracker) =>
    refInput := tracker.io.rank }

  when (refreshUnit.suggestREF) {
    selectedCmd := cmd_ref
    cmdRank := refreshUnit.refRankAddr
  }.elsewhen (refreshUnit.suggestPRE) {
    selectedCmd := cmd_pre
    cmdRank := refreshUnit.preRankAddr
    cmdBank := refreshUnit.preBankAddr
  }.elsewhen (columnArbiter.io.out.valid) {
    selectedCmd := Mux(columnArbiter.io.out.bits.xaction.isWrite, cmd_casw, cmd_casr)
    cmdBank := columnArbiter.io.out.bits.bankAddr
  }.elsewhen (suggestAct) {
    selectedCmd := cmd_act
    cmdRank := actRank
    cmdBank := actBank
  }.elsewhen (suggestPre) {
    selectedCmd := cmd_pre
    cmdRank := preRank
  }

  // Remove a reference if it is granted a column access
  columnArbiter.io.out.ready := selectedCmd === cmd_casw || selectedCmd === cmd_casr

  // Take the readies from the arbiter, and kill the selected entry
  val entriesStillReady = refUpdates.zip(columnArbiter.io.in) map { case (ref, sel) =>
    when (sel.fire()) { ref.valid := false.B }
    // If the entry is not killed, but shares the same open row as the killed reference, return true
    !sel.fire() && ref.valid && ref.bits.isReady &&
      cmdBank === ref.bits.bankAddr && cmdRank === ref.bits.rankAddr
  }

  val otherReadyEntries = entriesStillReady reduce { _ || _ }
  val casAutoPRE = Mux(io.mmReg.openPagePolicy, false.B, memReqDone && !otherReadyEntries)

  // Mark entries that now hit in an open row buffer,
  // or invalidate them if a precharge was issued
  refUpdates.foreach({ ref =>
    when (cmdRank === ref.bits.rankAddr && cmdBank === ref.bits.bankAddr) {
      when (selectedCmd === cmd_act) {
        ref.bits.isReady := ref.bits.rowAddr === cmdRow
        ref.bits.mayPRE := false.B
      }.elsewhen (selectedCmd === cmd_pre) {
        ref.bits.isReady := false.B
        ref.bits.mayPRE := false.B
      }.elsewhen (memReqDone && !otherReadyEntries) {
        ref.bits.mayPRE := true.B
      }
    }
  })

  val newRefAddrMatch = newReference.bits.addrMatch(cmdRank, cmdBank, Some(cmdRow))
  val newRefBankAddrMatch = newReference.bits.addrMatch(cmdRank, cmdBank)
  newReference.bits.isReady := // 1) Row just opened or 2) already open && no precharges to that row this cycle
    selectedCmd === cmd_act && newRefAddrMatch ||
    (rowHitsInRank(newReference.bits.rankAddr) & newReference.bits.bankAddrOH).orR &&
    !(memReqDone && casAutoPRE && newRefBankAddrMatch) && !(selectedCmd === cmd_pre && newRefBankAddrMatch)

  // Useful only for the open-page policy. In the closed-page policy, precharges
  // are always issued as part of auto-PRE commands or in preparation for refresh.
  newReference.bits.mayPRE := // Last ready reference serviced or no other ready entries
    Mux(io.mmReg.openPagePolicy,
      // 1: The last ready request has been made to the bank
      newReference.bits.addrMatch(cmdRank, cmdBank) && memReqDone && !otherReadyEntries ||
      // 2: There are no ready references, and a precharge is not being issued to the bank this cycle
      !bankHasReadyEntries(Cat(newReference.bits.rankAddr, newReference.bits.bankAddr)) &&
      !(selectedCmd === cmd_pre && newRefBankAddrMatch),
      false.B)

  // Check if the broadcast cmdBank and cmdRank hit a ready entry
  when (memReqDone || selectedCmd === cmd_act) {
    bankHasReadyEntries(Cat(cmdRank, cmdBank)) := memReqDone && otherReadyEntries || selectedCmd === cmd_act
  }

  when (newReference.bits.isReady && newReference.fire()) {
    bankHasReadyEntries(Cat(newReference.bits.rankAddr, newReference.bits.bankAddr)) := true.B
  }

  rankStateTrackers.zip(UIntToOH(cmdRank).toBools) foreach { case (state, cmdUsesThisRank) =>
    state.io.selectedCmd := selectedCmd
    state.io.cmdBankOH := cmdBankOH
    state.io.cmdRow := cmdRow
    state.io.autoPRE := casAutoPRE
    state.io.cmdUsesThisRank := cmdUsesThisRank

    state.io.timings := timings
    state.io.tCycle := tCycle
  }

  cmdBusBusy.io.set.bits := timings.tCMD - 1.U
  cmdBusBusy.io.set.valid := selectedCmd =/= cmd_nop

  backend.io.tCycle := tCycle
  backend.io.newRead.bits := ReadResponseMetaData(columnArbiter.io.out.bits.xaction)
  backend.io.newRead.valid := memReqDone && !columnArbiter.io.out.bits.xaction.isWrite
  backend.io.readLatency := timings.tCAS + timings.tAL + io.mmReg.backendLatency

  // For writes we send out the acknowledgement immediately
  backend.io.newWrite.bits := WriteResponseMetaData(columnArbiter.io.out.bits.xaction)
  backend.io.newWrite.valid := memReqDone && columnArbiter.io.out.bits.xaction.isWrite
  backend.io.writeLatency := 1.U

  wResp <> backend.io.completedWrite
  rResp <> backend.io.completedRead

  // Dump the command stream
  val cmdMonitor = Module(new CommandBusMonitor())
  cmdMonitor.io.cmd := selectedCmd
  cmdMonitor.io.rank := cmdRank
  cmdMonitor.io.bank := cmdBank
  cmdMonitor.io.row := cmdRow
  cmdMonitor.io.autoPRE := casAutoPRE

  val powerStats = (rankStateTrackers).zip(UIntToOH(cmdRank).toBools) map {
    case (rankState, cmdUsesThisRank) =>
      val powerMonitor = Module(new RankPowerMonitor(cfg.dramKey))
      powerMonitor.io.selectedCmd := selectedCmd
      powerMonitor.io.cmdUsesThisRank := cmdUsesThisRank
      powerMonitor.io.rankState := rankState.io.rank
      powerMonitor.io.stats
  }

  io.mmReg.rankPower := VecInit(powerStats)
}
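The core of the first-ready FCFS arbitration implemented above is: service the oldest reference that already hits an open row; only when no reference is ready, activate (or precharge) on behalf of the oldest pending reference. This Python model is illustrative and not part of the commit; it omits PRE selection, refresh, and the rank/bank legality checks the RTL performs.

```python
# Simplified first-ready FCFS selection over an oldest-first window.
# Each reference is a (bank, row, ready) tuple; "ready" means its row
# is open and a CAS could legally issue this cycle.

def first_ready_fcfs(refs):
    # 1) The oldest ready reference wins a column access (CAS)
    for i, (bank, row, ready) in enumerate(refs):
        if ready:
            return ("CAS", i)
    # 2) Otherwise the oldest reference gets its row activated
    if refs:
        return ("ACT", 0)
    return ("NOP", None)

# A younger row hit (index 2) is served before older row misses,
# which is exactly what distinguishes FR-FCFS from plain FCFS
refs = [(0, 5, False), (1, 2, False), (0, 7, True)]
print(first_ready_fcfs(refs))
```

In the RTL, step 1 is the `columnArbiter` over `refList`, and step 2 is the `entryWantsACT`/`entryWantsPRE` priority chain with `suggestAct` deciding between the two.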
@ -0,0 +1,160 @@
package midas
package models

// From RC
import freechips.rocketchip.config.{Parameters}
import freechips.rocketchip.util.{DecoupledHelper}
import junctions._

import chisel3._
import chisel3.util.{Queue}

import midas.core.{HostDecoupled}
import midas.widgets.{SatUpDownCounter}

// The ingress module queues up incoming target requests, and issues them to the
// host memory system.

// NB: AXI4 imposes no ordering between in-flight reads and writes. In the
// event the target master issues a read and a write to an overlapping memory
// region, host-memory-system reorderings of those requests will result in
// non-deterministic target behavior.
//
// Asserting io.relaxed = true allows the ingress unit to issue requests ASAP. This
// is a safe optimization only for non-chump-city AXI4 masters.
//
// Asserting io.relaxed = false forces the ingress unit to pessimistically
// issue host-memory requests to prevent reorderings, by waiting for the
// host-memory system to go idle before 1) issuing any write, 2) issuing a read
// if there is a write in flight. (I did not want to add the extra complexity of
// tracking in-flight addresses.) This has the effect of forcing reads to see
// the value of the youngest write for which the AW and all W beats have been
// accepted, but no write acknowledgement has been issued.

trait IngressModuleParameters {
  val cfg: BaseConfig
  implicit val p: Parameters
  // In general the only consequence of undersizing these is more wasted
  // host cycles while the model waits for them to drain
  val ingressAWQdepth = cfg.maxWrites
  val ingressWQdepth = 2*cfg.maxWriteLength
  val ingressARQdepth = 4

  // DEADLOCK RISK: if the host memory system accepts only one AW while a W
  // transaction is in flight, and the entire W transaction is not available in the
  // ingress module, the host memory system will drain the W queue without
  // consuming another AW token. The target will remain stalled and cannot
  // complete the W transaction.
  require(ingressWQdepth >= cfg.maxWriteLength)
  require(ingressAWQdepth >= cfg.maxWrites)
}

class IngressModule(val cfg: BaseConfig)(implicit val p: Parameters) extends Module
    with IngressModuleParameters {
  val io = IO(new Bundle {
    // This is target-valid and not decoupled because the model has already
    // handshaked the target-level channels for us
    val nastiInputs = Flipped(HostDecoupled((new ValidNastiReqChannels)))
    val nastiOutputs = new NastiReqChannels
    val relaxed = Input(Bool())
    val host_mem_idle = Input(Bool())
    val host_read_inflight = Input(Bool())
  })

  val awQueue = Module(new Queue(new NastiWriteAddressChannel, ingressAWQdepth))
  val wQueue = Module(new Queue(new NastiWriteDataChannel, ingressWQdepth))
  val arQueue = Module(new Queue(new NastiReadAddressChannel, ingressARQdepth))

  // Host request gating -- wait until we have a complete W transaction before
  // we issue it.
  val wCredits = SatUpDownCounter(cfg.maxWrites)
  wCredits.inc := awQueue.io.enq.fire()
  wCredits.dec := wQueue.io.deq.fire() && wQueue.io.deq.bits.last
  val awCredits = SatUpDownCounter(cfg.maxWrites)
  awCredits.inc := wQueue.io.enq.fire() && wQueue.io.enq.bits.last
  awCredits.dec := awQueue.io.deq.fire()

  // All the sources of host stalls
  val tFireHelper = DecoupledHelper(
    io.nastiInputs.hValid,
    awQueue.io.enq.ready,
    wQueue.io.enq.ready,
    arQueue.io.enq.ready)

  val ingressUnitStall = !tFireHelper.fire(io.nastiInputs.hValid)

  // A request is finished when we have both a complete AW and W request.
  // Only then can we consider issuing the write to the host memory system.
  //
  // When we aren't relaxing the ordering, we repurpose the credit counters to
  // simply count the number of complete W and AW requests.
  val write_req_done = ((awCredits.value > wCredits.value) && wCredits.inc) ||
                       ((awCredits.value < wCredits.value) && awCredits.inc) ||
                       awCredits.inc && wCredits.inc

  when (!io.relaxed) {
    Seq(awCredits, wCredits) foreach { _.dec := write_req_done }
  }

  val read_req_done = arQueue.io.enq.fire()

  // FIFO that tracks the relative order of reads and writes as they are received.
  // Enqueue side A carries reads (true), side B carries writes (false).
  val xaction_order = Module(new DualQueue(Bool(), cfg.maxReads + cfg.maxWrites))
  xaction_order.io.enqA.valid := read_req_done
  xaction_order.io.enqA.bits := true.B
  xaction_order.io.enqB.valid := write_req_done
  xaction_order.io.enqB.bits := false.B

  val do_hread = io.relaxed ||
    (io.host_mem_idle || io.host_read_inflight) && xaction_order.io.deq.valid && xaction_order.io.deq.bits

  val do_hwrite = Mux(io.relaxed, !awCredits.empty,
    io.host_mem_idle && xaction_order.io.deq.valid && !xaction_order.io.deq.bits)

  xaction_order.io.deq.ready := io.nastiOutputs.ar.fire || io.nastiOutputs.aw.fire

  val do_hwrite_data_reg = RegInit(false.B)
  when (io.nastiOutputs.aw.fire) {
    do_hwrite_data_reg := true.B
  }.elsewhen (io.nastiOutputs.w.fire && io.nastiOutputs.w.bits.last) {
    do_hwrite_data_reg := false.B
  }

  val do_hwrite_data = Mux(io.relaxed, !wCredits.empty, do_hwrite_data_reg)

  io.nastiInputs.hReady := !ingressUnitStall

  arQueue.io.enq.bits := io.nastiInputs.hBits.ar.bits
  arQueue.io.enq.valid := tFireHelper.fire(arQueue.io.enq.ready) && io.nastiInputs.hBits.ar.valid

  io.nastiOutputs.ar <> arQueue.io.deq
  io.nastiOutputs.ar.valid := do_hread && arQueue.io.deq.valid
  arQueue.io.deq.ready := do_hread && io.nastiOutputs.ar.ready

  awQueue.io.enq.bits := io.nastiInputs.hBits.aw.bits
  awQueue.io.enq.valid := tFireHelper.fire(awQueue.io.enq.ready) && io.nastiInputs.hBits.aw.valid
  wQueue.io.enq.bits := io.nastiInputs.hBits.w.bits
  wQueue.io.enq.valid := tFireHelper.fire(wQueue.io.enq.ready) && io.nastiInputs.hBits.w.valid

  io.nastiOutputs.aw.bits := awQueue.io.deq.bits
  io.nastiOutputs.w.bits := wQueue.io.deq.bits

  io.nastiOutputs.aw.valid := do_hwrite && awQueue.io.deq.valid
  awQueue.io.deq.ready := do_hwrite && io.nastiOutputs.aw.ready

  io.nastiOutputs.w.valid := do_hwrite_data && wQueue.io.deq.valid
  wQueue.io.deq.ready := do_hwrite_data && io.nastiOutputs.w.ready

  // Deadlock checks.
  assert(!(wQueue.io.enq.valid && !wQueue.io.enq.ready &&
      Mux(io.relaxed, wCredits.empty, !xaction_order.io.deq.valid)),
    "DEADLOCK: Timing model requests w enqueue, but wQueue is full and cannot drain")

  assert(!(awQueue.io.enq.valid && !awQueue.io.enq.ready &&
      Mux(io.relaxed, awCredits.empty, !xaction_order.io.deq.valid)),
    "DEADLOCK: Timing model requests aw enqueue, but awQueue is full and cannot drain")
}
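The `write_req_done` expression above fires exactly when a write transaction becomes complete on both channels: either the lagging side (the AW beat or the final W beat) catches up with the other, or both arrive in the same cycle. The same predicate can be rendered in plain Python; this is an illustration, not part of the commit.

```python
# Mirror of the RTL's write-completion predicate. aw_credits counts
# complete W bursts still awaiting their AW beat; w_credits counts AW
# beats still awaiting their W burst (the counters cannot both be
# nonzero for long, since completion decrements both).

def write_req_done(aw_credits, w_credits, aw_inc, w_inc):
    """True when a write has both its AW beat and all W beats accepted."""
    return ((aw_credits > w_credits) and w_inc) or \
           ((aw_credits < w_credits) and aw_inc) or \
           (aw_inc and w_inc)

# Simultaneous arrival of the AW beat and the last W beat completes a write
print(write_req_done(aw_credits=0, w_credits=0, aw_inc=True, w_inc=True))
```

When `io.relaxed` is low, the RTL reuses this signal to decrement both credit counters together, turning them into counts of complete-but-unissued write halves.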
@ -0,0 +1,73 @@
package midas
package models

import chisel3._
import chisel3.util._
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.ParameterizedBundle
import junctions._

class NastiReqChannels(implicit val p: Parameters) extends ParameterizedBundle {
  val aw = Decoupled(new NastiWriteAddressChannel)
  val w  = Decoupled(new NastiWriteDataChannel)
  val ar = Decoupled(new NastiReadAddressChannel)

  def fromNasti(n: NastiIO): Unit = {
    aw <> n.aw
    ar <> n.ar
    w  <> n.w
  }
}

object NastiReqChannels {
  def apply(nasti: NastiIO)(implicit p: Parameters): NastiReqChannels = {
    val w = Wire(new NastiReqChannels)
    w.ar <> nasti.ar
    w.aw <> nasti.aw
    w.w  <> nasti.w
    w
  }
}

class ValidNastiReqChannels(implicit val p: Parameters) extends ParameterizedBundle {
  val aw = Valid(new NastiWriteAddressChannel)
  val w  = Valid(new NastiWriteDataChannel)
  val ar = Valid(new NastiReadAddressChannel)
}

class NastiRespChannels(implicit val p: Parameters) extends ParameterizedBundle {
  val b = Decoupled(new NastiWriteResponseChannel)
  val r = Decoupled(new NastiReadDataChannel)
}

// Target-level interface
class EgressReq(implicit val p: Parameters) extends ParameterizedBundle
    with HasNastiParameters {
  val b = Valid(UInt(nastiWIdBits.W))
  val r = Valid(UInt(nastiRIdBits.W))
}

// Target-level interface
class EgressResp(implicit val p: Parameters) extends ParameterizedBundle {
  val bBits = Output(new NastiWriteResponseChannel)
  val bReady = Input(Bool())
  val rBits = Output(new NastiReadDataChannel)
  val rReady = Input(Bool())
}

// Contains the metadata required to track a transaction as it is requested from the egress unit
class CurrentReadResp(implicit val p: Parameters) extends ParameterizedBundle
    with HasNastiParameters {
  val id = UInt(nastiRIdBits.W)
  val len = UInt(nastiXLenBits.W)
}
class CurrentWriteResp(implicit val p: Parameters) extends ParameterizedBundle
    with HasNastiParameters {
  val id = UInt(nastiRIdBits.W)
}

class MemModelTargetIO(implicit val p: Parameters) extends ParameterizedBundle {
  val nasti = new NastiIO
  val reset = Output(Bool())
}
@ -0,0 +1,438 @@
package midas
package models

// NOTE: This LLC model is a *very* crude model of a cache that simply forwards
// misses onto the DRAM model, while short-circuiting hits.
import junctions._

import midas.core._
import midas.widgets._

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.{ParameterizedBundle, MaskGen, UIntToOH1}

import chisel3._
import chisel3.util._

import scala.math.min
import Console.{UNDERLINED, RESET}

import java.io.{File, FileWriter}

// State to track reads to DRAM; loosely, an MSHR
class MSHR(llcKey: LLCParams)(implicit p: Parameters) extends NastiBundle()(p) {
  val set_addr = UInt(llcKey.sets.maxBits.W)
  val xaction = new TransactionMetaData
  val wb_in_flight = Bool()
  val acq_in_flight = Bool()
  val enabled = Bool() // Set by a runtime configuration register

  def valid(): Bool = (wb_in_flight || acq_in_flight) && enabled
  def available(): Bool = !valid && enabled
  def setCollision(set_addr: UInt): Bool = (set_addr === this.set_addr) && valid

  // Call on an MSHR register; sets all pertinent fields (leaving enabled untouched)
  def allocate(
      new_xaction: TransactionMetaData,
      new_set_addr: UInt,
      do_acq: Bool,
      do_wb: Bool = false.B)(implicit p: Parameters): Unit = {
    set_addr := new_set_addr
    wb_in_flight := do_wb
    acq_in_flight := do_acq
    xaction := new_xaction
  }

  override def cloneType = new MSHR(llcKey)(p).asInstanceOf[this.type]
}

object MSHR {
  def apply(llcKey: LLCParams)(implicit p: Parameters): MSHR = {
    val w = Wire(new MSHR(llcKey))
    w.wb_in_flight := false.B
    w.acq_in_flight := false.B
    // Initialize to enabled to play nice with assertions
    w.enabled := true.B
    w.xaction := DontCare
    w.set_addr := DontCare
    w
  }
}

class BlockMetadata(tagBits: Int) extends Bundle {
  val tag = UInt(tagBits.W)
  val valid = Bool()
  val dirty = Bool()
  override def cloneType = new BlockMetadata(tagBits).asInstanceOf[this.type]
}

class LLCProgrammableSettings(llcKey: LLCParams) extends Bundle
    with HasProgrammableRegisters with HasConsoleUtils {
  val wayBits = Input(UInt(log2Ceil(llcKey.ways.maxBits).W))
  val setBits = Input(UInt(log2Ceil(llcKey.sets.maxBits).W))
  val blockBits = Input(UInt(log2Ceil(llcKey.blockBytes.maxBits).W))
  val activeMSHRs = Input(UInt(log2Ceil(llcKey.mshrs.max + 1).W))

  // Instrumentation
  val misses = Output(UInt(32.W)) // Total accesses is provided by (totalReads + totalWrites)
  val writebacks = Output(UInt(32.W)) // Number of dirty lines returned to DRAM
  val refills = Output(UInt(32.W)) // Number of clean lines requested from DRAM
  val peakMSHRsUsed = Output(UInt(log2Ceil(llcKey.mshrs.max + 1).W)) // Peak number of MSHRs used
  // Note: short-burst writes will produce a refill, whereas releases from caches will not

  val registers = Seq(
    wayBits -> RuntimeSetting(llcKey.ways.maxBits, "Log2(ways per set)"),
    setBits -> RuntimeSetting(llcKey.sets.maxBits, "Log2(sets per bank)"),
    blockBits -> RuntimeSetting(llcKey.blockBytes.maxBits, "Log2(cache-block bytes)"),
    activeMSHRs -> RuntimeSetting(llcKey.mshrs.max, "Number of MSHRs", min = 1, max = Some(llcKey.mshrs.max))
  )

  def maskTag(addr: UInt): UInt = (addr >> (blockBits +& setBits))
  def maskSet(addr: UInt): UInt = ((addr >> blockBits) & ((1.U << setBits) - 1.U))(llcKey.sets.maxBits - 1, 0)
  def regenPhysicalAddress(set_addr: UInt, tag_addr: UInt): UInt =
    (set_addr << (blockBits)) |
    (tag_addr << (blockBits +& setBits))

  def setLLCSettings(bytesPerBlock: Option[Int] = None): Unit = {
    Console.println(s"\n${UNDERLINED}Last-Level Cache Settings${RESET}")

    regMap(blockBits).set(log2Ceil(requestInput("Block size in bytes",
      default = llcKey.blockBytes.max,
      min = Some(llcKey.blockBytes.min),
      max = Some(llcKey.blockBytes.max))))
    regMap(setBits).set(log2Ceil(requestInput("Number of sets in LLC",
      default = llcKey.sets.max,
      min = Some(llcKey.sets.min),
      max = Some(llcKey.sets.max))))
    regMap(wayBits).set(log2Ceil(requestInput("Set associativity",
      default = llcKey.ways.max,
      min = Some(llcKey.ways.min),
      max = Some(llcKey.ways.max))))
  }
}
|
||||
|
||||
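The `maskTag`/`maskSet`/`regenPhysicalAddress` helpers above split a physical address into {tag, set, block-offset} fields using the runtime-programmed widths. A software sketch of the same arithmetic, with illustrative fixed widths (64 B blocks, 1024 sets) rather than values from any real runtime configuration:

```python
# Sketch of the LLC address decomposition; widths are illustrative assumptions.
BLOCK_BITS = 6   # log2(64 B cache block)
SET_BITS = 10    # log2(1024 sets)

def mask_set(addr: int) -> int:
    # Set index: bits [BLOCK_BITS + SET_BITS - 1 : BLOCK_BITS]
    return (addr >> BLOCK_BITS) & ((1 << SET_BITS) - 1)

def mask_tag(addr: int) -> int:
    # Tag: everything above the set index
    return addr >> (BLOCK_BITS + SET_BITS)

def regen_physical_address(set_addr: int, tag_addr: int) -> int:
    # Rebuild a block-aligned line address from {tag, set}; the block offset is lost
    return (set_addr << BLOCK_BITS) | (tag_addr << (BLOCK_BITS + SET_BITS))

addr = 0x12345
line = regen_physical_address(mask_set(addr), mask_tag(addr))
assert line == addr & ~((1 << BLOCK_BITS) - 1)  # round-trips up to the block offset
```

As in the hardware, regenerating the address recovers only the cache-line base: the block offset is intentionally dropped because writebacks and refills always operate on whole lines.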
case class WRange(min: Int, max: Int) {
  def minBits: Int = log2Ceil(min)
  def maxBits: Int = log2Ceil(max)
  override def toString(): String = s"[${min},${max}]"
}

case class LLCParams(
    ways:       WRange = WRange(1, 8),
    sets:       WRange = WRange(32, 4096),
    blockBytes: WRange = WRange(8, 128),
    mshrs:      WRange = WRange(1, 8) // TODO: check against AXI ID width
  ) {

  def maxTagBits(addrWidth: Int): Int = addrWidth - blockBytes.minBits - sets.minBits

  def print(): Unit = {
    println("  LLC Parameters:")
    println("    Sets:               " + sets)
    println("    Associativity:      " + ways)
    println("    Block Size (B):     " + blockBytes)
    println("    MSHRs:              " + mshrs)
    println("    Replacement Policy: Random\n")
  }
}
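`maxTagBits` sizes the tag store for the worst case over all runtime configurations: the smallest block size and fewest sets leave the most address bits for the tag. A sketch of that calculation, using the default `WRange` minimums above:

```python
from math import ceil, log2

def log2ceil(x: int) -> int:
    # Mirrors Chisel's log2Ceil
    return ceil(log2(x))

def max_tag_bits(addr_width: int, min_block_bytes: int = 8, min_sets: int = 32) -> int:
    # Widest possible tag = address width minus the narrowest offset/index fields
    return addr_width - log2ceil(min_block_bytes) - log2ceil(min_sets)

assert max_tag_bits(32) == 32 - 3 - 5  # 24 tag bits for a 32-bit address
```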
class LLCModelIO(val key: LLCParams)(implicit val p: Parameters) extends Bundle {
  val req   = Flipped(new NastiReqChannels)
  val wResp = Decoupled(new WriteResponseMetaData) // to backend
  val rResp = Decoupled(new ReadResponseMetaData)
  val memReq   = new NastiReqChannels                         // to backing DRAM model
  val memRResp = Flipped(Decoupled(new ReadResponseMetaData)) // from backing DRAM model
  val memWResp = Flipped(Decoupled(new WriteResponseMetaData))

  // LLC runtime configuration
  val settings = new LLCProgrammableSettings(key)
}
class LLCModel(cfg: BaseConfig)(implicit p: Parameters) extends NastiModule()(p) {
  val llcKey = cfg.params.llcKey.get
  val io = IO(new LLCModelIO(llcKey))

  require(log2Ceil(llcKey.mshrs.max) <= nastiXIdBits, "Can have at most one MSHR per AXI ID")

  val maxTagBits = llcKey.maxTagBits(nastiXAddrBits)
  val way_addr_mask = Reverse(MaskGen((llcKey.ways.max - 1).U, io.settings.wayBits, llcKey.ways.max))

  // Rely on initialization of BRAM to 0 during programming to unset all valid bits in the md_array
  val md_array = SyncReadMem(llcKey.sets.max, Vec(llcKey.ways.max, new BlockMetadata(maxTagBits)))
  val d_array_busy = Module(new DownCounter(8))
  d_array_busy.io.set.valid := false.B
  d_array_busy.io.set.bits  := DontCare
  d_array_busy.io.decr      := false.B

  val mshr_mask_vec = UIntToOH1(io.settings.activeMSHRs, llcKey.mshrs.max).toBools
  val mshrs = RegInit(VecInit(Seq.fill(llcKey.mshrs.max)(MSHR(llcKey))))
  // Enable only active MSHRs as requested in the runtime configuration
  mshrs.zipWithIndex.foreach({ case (m, idx) => m.enabled := mshr_mask_vec(idx) })

  val mshr_available = mshrs.exists({ m: MSHR => m.available() })
  val mshr_next_idx  = mshrs.indexWhere({ m: MSHR => m.available() })

  // TODO: Put this on a switch
  val mshrs_allocated = mshrs.count({ m: MSHR => m.valid })
  assert((mshrs_allocated < io.settings.activeMSHRs) || !mshr_available,
    "Too many runtime MSHRs exposed given runtime programmable limit")
  assert((mshrs_allocated === io.settings.activeMSHRs) || mshr_available,
    "Too few runtime MSHRs exposed given runtime programmable limit")
  val s2_ar_mem = Module(new Queue(new NastiReadAddressChannel, 2))
  val s2_aw_mem = Module(new Queue(new NastiWriteAddressChannel, 2))
  val miss_resource_hazard = !mshr_available || !s2_aw_mem.io.enq.ready || !s2_ar_mem.io.enq.ready

  val reads = Queue(io.req.ar)
  val read_set = io.settings.maskSet(reads.bits.addr)
  val read_set_collision = mshrs.exists({ m: MSHR => m.setCollision(read_set) })
  val can_deq_read = reads.valid && !read_set_collision && !miss_resource_hazard && io.rResp.ready

  val writes = Queue(io.req.aw)
  val write_set = io.settings.maskSet(writes.bits.addr)
  val write_set_collision = mshrs.exists({ m: MSHR => m.setCollision(write_set) })
  val can_deq_write = writes.valid && !write_set_collision && !miss_resource_hazard && mshr_available && io.wResp.ready

  val llc_idle :: llc_r_mdaccess :: llc_r_wb :: llc_r_daccess :: llc_w_mdaccess :: llc_w_wb :: llc_w_daccess :: llc_refill :: Nil = Enum(8)

  val state = RegInit(llc_idle)
  val refill_start = WireInit(false.B)
  val read_start   = WireInit(false.B)
  val write_start  = WireInit(false.B)

  val set_addr = Mux(write_start, write_set, read_set)
  val tag_addr = io.settings.maskTag(Mux(write_start, writes.bits.addr, reads.bits.addr))

  // S1 = Tag matches, replacement candidate selection, and replacement policy update
  val s1_tag_addr = RegNext(tag_addr)
  val s1_set_addr = RegNext(set_addr)
  val s1_valid = state === llc_r_mdaccess || state === llc_w_mdaccess
  val s1_metadata = {
    import Chisel._
    md_array.read(set_addr, read_start || write_start)
  }
  def isHit(m: BlockMetadata): Bool = m.valid && (m.tag === s1_tag_addr)
  val hit_ways    = VecInit(s1_metadata.map(isHit)).asUInt & way_addr_mask
  val hit_way_sel = PriorityEncoderOH(hit_ways)
  val hit_valid   = hit_ways.orR

  def isEmptyWay(m: BlockMetadata): Bool = !m.valid
  val empty_ways    = VecInit(s1_metadata.map(isEmptyWay)).asUInt & way_addr_mask
  val empty_way_sel = PriorityEncoderOH(empty_ways)
  val empty_valid   = empty_ways.orR

  val fill_empty_way = !hit_valid && empty_valid

  val lsfr = LFSR16(true.B)
  val evict_way_sel = UIntToOH(lsfr(llcKey.ways.maxBits - 1, 0) & ((1.U << io.settings.wayBits) - 1.U))
  val evict_way_is_dirty = (VecInit(s1_metadata.map(_.dirty)).asUInt & evict_way_sel).orR
  val evict_way_tag = Mux1H(evict_way_sel, s1_metadata.map(_.tag))

  val do_evict = !hit_valid && !empty_valid
  val evict_dirty_way = do_evict && evict_way_is_dirty
  val dirty_line_addr = io.settings.regenPhysicalAddress(s1_set_addr, evict_way_tag)

  val selected_way_OH = Mux(hit_valid, hit_way_sel, Mux(empty_valid, empty_way_sel, evict_way_sel)).toBools

  val md_update = s1_metadata.zip(selected_way_OH) map { case (md, sel) =>
    val next = WireInit(md)
    when (sel) {
      when (fill_empty_way) {
        next.valid := true.B
      }
      when (state === llc_w_mdaccess) {
        next.dirty := true.B
        // This also assumes that all md fields in invalid ways are initialized
        // to zero during programming. Otherwise we'd need to unset the dirty bit
        // on a compulsory miss
      }.elsewhen (state === llc_r_mdaccess && do_evict) {
        next.dirty := false.B
      }
      when (do_evict || fill_empty_way) {
        next.tag := s1_tag_addr
      }
    }
    next
  }
  when (s1_valid) {
    md_array.write(s1_set_addr, VecInit(md_update))
  }
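The way-selection logic above applies a fixed priority: use the hitting way, else fill an invalid (empty) way, else evict a pseudo-randomly chosen victim (the hardware draws from a 16-bit LFSR). A behavioral sketch of that priority chain, with `random.randrange` standing in for the LFSR:

```python
import random

# Behavioral sketch of the LLC way-selection priority: hit > fill-empty > evict.
def select_way(metadata, tag, num_ways):
    hits = [i for i, m in enumerate(metadata) if m["valid"] and m["tag"] == tag]
    if hits:
        return hits[0], "hit"
    empties = [i for i, m in enumerate(metadata) if not m["valid"]]
    if empties:
        return empties[0], "fill"   # compulsory miss into an invalid way
    return random.randrange(num_ways), "evict"  # random replacement (LFSR in HW)

ways = [{"valid": True, "tag": 7}, {"valid": False, "tag": 0}]
assert select_way(ways, 7, 2) == (0, "hit")
assert select_way(ways, 9, 2) == (1, "fill")
```

An eviction additionally checks the victim's dirty bit to decide whether a writeback must precede the refill, which is what `evict_dirty_way` gates in the model.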
  // FIXME: Inner and outer widths are the same
  val block_beats = (1.U << (io.settings.blockBits - log2Ceil(nastiXDataBits/8).U))
  // AXI4 length; subtract 1
  val axi4_block_len = block_beats - 1.U

  val read_triggered_refill = state === llc_r_mdaccess && !hit_valid
  val write_triggered_refill = state === llc_w_mdaccess && (writes.bits.len < axi4_block_len) &&
    !hit_valid
  val need_refill = read_triggered_refill || write_triggered_refill
  val need_writeback = s1_valid && evict_dirty_way

  val allocate_mshr = need_refill || need_writeback

  when (allocate_mshr) {
    mshrs(mshr_next_idx).allocate(
      new_xaction = Mux(state === llc_r_mdaccess,
        TransactionMetaData(reads.bits),
        TransactionMetaData(writes.bits)),
      new_set_addr = s1_set_addr,
      do_acq = need_refill,
      do_wb  = need_writeback)
  }
  // Refill Issue
  // For now always fetch whole cache lines from DRAM, even if fewer beats are required for
  // a write-triggered refill
  val current_line_addr = io.settings.regenPhysicalAddress(s1_set_addr, s1_tag_addr)
  s2_ar_mem.io.enq.bits := NastiReadAddressChannel(
    addr = current_line_addr,
    id   = mshr_next_idx,
    size = log2Ceil(nastiXDataBits/8).U,
    len  = axi4_block_len)
  s2_ar_mem.io.enq.valid := need_refill

  reads.ready := (state === llc_r_mdaccess)

  // Writeback Issue
  s2_aw_mem.io.enq.bits := NastiWriteAddressChannel(
    addr = dirty_line_addr,
    id   = mshr_next_idx,
    size = log2Ceil(nastiXDataBits/8).U,
    len  = axi4_block_len)
  s2_aw_mem.io.enq.valid := need_writeback

  writes.ready := io.req.w.bits.last && io.req.w.fire

  io.memReq.ar <> s2_ar_mem.io.deq
  io.memReq.aw <> s2_aw_mem.io.deq
  io.memReq.w.valid := (state === llc_r_wb || state === llc_w_wb)
  io.memReq.w.bits.last := d_array_busy.io.idle
  // Handle responses from DRAM
  when (io.memWResp.valid) {
    mshrs(io.memWResp.bits.id).wb_in_flight := false.B
  }
  io.memWResp.ready := true.B

  when (refill_start) {
    mshrs(io.memRResp.bits.id).acq_in_flight := false.B
  }
  val can_refill = io.memRResp.valid &&
    (mshrs(io.memRResp.bits.id).xaction.isWrite || io.rResp.ready)
  io.memRResp.ready := refill_start

  // Data-array hazard tracking
  when (((state === llc_w_mdaccess || state === llc_r_mdaccess) && evict_dirty_way) ||
      refill_start) {
    d_array_busy.io.set.valid := true.B
    d_array_busy.io.set.bits  := axi4_block_len
  }.elsewhen (state === llc_r_mdaccess && hit_valid) {
    d_array_busy.io.set.valid := true.B
    d_array_busy.io.set.bits  := reads.bits.len
  }.elsewhen (state === llc_w_mdaccess && (hit_valid || empty_valid) ||
      state === llc_w_wb && (io.memReq.w.fire && io.memReq.w.bits.last)) {
    d_array_busy.io.set.valid := true.B
    d_array_busy.io.set.bits  := writes.bits.len
  }

  d_array_busy.io.decr := Mux(state === llc_w_wb || state === llc_r_wb,
    io.memReq.w.fire,
    Mux(state === llc_w_daccess, io.req.w.valid, true.B))

  io.req.w.ready := (state === llc_w_daccess) || (state === llc_w_mdaccess && !evict_dirty_way)

  io.rResp.valid := (refill_start && !mshrs(io.memRResp.bits.id).xaction.isWrite) ||
    (state === llc_r_mdaccess && hit_valid)
  io.rResp.bits := Mux(refill_start,
    ReadResponseMetaData(mshrs(io.memRResp.bits.id).xaction),
    ReadResponseMetaData(reads.bits))

  io.wResp.valid := (state === llc_w_mdaccess || state === llc_w_daccess) &&
    io.req.w.fire && io.req.w.bits.last
  io.wResp.bits := WriteResponseMetaData(writes.bits)
  switch (state) {
    is (llc_idle) {
      when (can_refill) {
        state := llc_refill
        refill_start := true.B
      }.elsewhen (can_deq_read) {
        state := llc_r_mdaccess
        read_start := true.B
      }.elsewhen (can_deq_write) {
        state := llc_w_mdaccess
        write_start := true.B
      }
    }
    is (llc_r_mdaccess) {
      when (hit_valid) {
        when (reads.bits.len =/= 0.U) {
          state := llc_r_daccess
        }.otherwise {
          state := llc_idle
        }
      }.elsewhen (evict_dirty_way) {
        state := llc_r_wb
      }.otherwise {
        state := llc_idle
      }
    }
    is (llc_w_mdaccess) {
      when (!evict_dirty_way) {
        when (io.req.w.valid && io.req.w.bits.last) {
          state := llc_idle
        }.otherwise {
          state := llc_w_daccess
        }
      }.otherwise {
        state := llc_w_wb
      }
    }
    is (llc_r_wb) {
      when (io.memReq.w.fire && io.memReq.w.bits.last) {
        state := llc_idle
      }
    }
    is (llc_w_wb) {
      when (io.memReq.w.fire && io.memReq.w.bits.last) {
        state := llc_w_daccess
      }
    }
    is (llc_w_daccess) {
      when (io.req.w.valid && io.req.w.bits.last) {
        state := llc_idle
      }
    }
    is (llc_r_daccess) {
      when (d_array_busy.io.current === 1.U) {
        state := llc_idle
      }
    }
    is (llc_refill) {
      when (d_array_busy.io.current === 1.U) {
        state := llc_idle
      }
    }
  }
  // Instrumentation
  val miss_count = RegInit(0.U(32.W))
  when (s1_valid && !hit_valid) { miss_count := miss_count + 1.U }
  io.settings.misses := miss_count

  val wb_count = RegInit(0.U(32.W))
  when (s1_valid && evict_dirty_way) { wb_count := wb_count + 1.U }
  io.settings.writebacks := wb_count

  val refill_count = RegInit(0.U(32.W))
  when (state === llc_r_mdaccess && !hit_valid) { refill_count := refill_count + 1.U }
  io.settings.refills := refill_count

  val peak_mshrs_used = RegInit(0.U(log2Ceil(llcKey.mshrs.max + 1).W))
  when (peak_mshrs_used < mshrs_allocated) { peak_mshrs_used := mshrs_allocated }
  io.settings.peakMSHRsUsed := peak_mshrs_used
}
@ -0,0 +1,83 @@
package midas
package models

import chisel3._
import chisel3.util._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.ParameterizedBundle
import junctions._
import midas.widgets._

import Console.{UNDERLINED, RESET}

case class LatencyPipeConfig(params: BaseParams) extends BaseConfig {
  def elaborate()(implicit p: Parameters): LatencyPipe = Module(new LatencyPipe(this))
}

class LatencyPipeMMRegIO(cfg: BaseConfig)(implicit p: Parameters) extends SplitTransactionMMRegIO(cfg) {
  val readLatency  = Input(UInt(32.W))
  val writeLatency = Input(UInt(32.W))

  val registers = maxReqRegisters ++ Seq(
    (writeLatency -> RuntimeSetting(30, "Write latency", min = 1)),
    (readLatency  -> RuntimeSetting(30, "Read latency",  min = 1))
  )

  def requestSettings() {
    Console.println(s"${UNDERLINED}Generating a runtime configuration for a latency-bandwidth pipe${RESET}")
  }
}

class LatencyPipeIO(val cfg: LatencyPipeConfig)(implicit p: Parameters) extends SplitTransactionModelIO()(p) {
  val mmReg = new LatencyPipeMMRegIO(cfg)
}

class WritePipeEntry(implicit val p: Parameters) extends Bundle {
  val releaseCycle = UInt(64.W)
  val xaction = new WriteResponseMetaData
}

class ReadPipeEntry(implicit val p: Parameters) extends Bundle {
  val releaseCycle = UInt(64.W)
  val xaction = new ReadResponseMetaData
}

class LatencyPipe(cfg: LatencyPipeConfig)(implicit p: Parameters) extends SplitTransactionModel(cfg)(p) {
  lazy val io = IO(new LatencyPipeIO(cfg))

  val longName = "Latency Bandwidth Pipe"
  def printTimingModelGenerationConfig {}
  /**************************** CHISEL BEGINS *********************************/
  // Configuration values
  val readLatency  = io.mmReg.readLatency
  val writeLatency = io.mmReg.writeLatency

  // ***** Write Latency Pipe *****
  // Write delays are applied to the cycle upon which both the AW and W
  // transactions have completed. Since multiple AW packets may arrive
  // before the associated W packet, we queue them up.
  val writePipe = Module(new Queue(new WritePipeEntry, cfg.maxWrites, flow = true))

  writePipe.io.enq.valid := newWReq
  writePipe.io.enq.bits.xaction := WriteResponseMetaData(awQueue.io.deq.bits)
  writePipe.io.enq.bits.releaseCycle := writeLatency + tCycle - egressUnitDelay.U

  val writeDone = writePipe.io.deq.bits.releaseCycle <= tCycle
  wResp.valid := writePipe.io.deq.valid && writeDone
  wResp.bits  := writePipe.io.deq.bits.xaction
  writePipe.io.deq.ready := wResp.ready && writeDone

  // ***** Read Latency Pipe *****
  val readPipe = Module(new Queue(new ReadPipeEntry, cfg.maxReads, flow = true))

  readPipe.io.enq.valid := nastiReq.ar.fire
  readPipe.io.enq.bits.xaction := ReadResponseMetaData(nastiReq.ar.bits)
  readPipe.io.enq.bits.releaseCycle := readLatency + tCycle - egressUnitDelay.U
  // Release read responses on the appropriate cycle
  val readDone = readPipe.io.deq.bits.releaseCycle <= tCycle
  rResp.valid := readPipe.io.deq.valid && readDone
  rResp.bits  := readPipe.io.deq.bits.xaction
  readPipe.io.deq.ready := rResp.ready && readDone
}
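The latency pipe above stamps each accepted transaction with a release cycle (arrival cycle plus the programmed latency) and withholds it until the model's cycle counter reaches that stamp, preserving FIFO order. A minimal behavioral sketch of the same queue discipline:

```python
from collections import deque

# Behavioral sketch of the latency pipe: stamp on enqueue, release when due.
class LatencyPipe:
    def __init__(self, latency: int):
        self.latency = latency
        self.queue = deque()  # entries of (release_cycle, transaction)

    def enq(self, t_cycle: int, xaction):
        self.queue.append((t_cycle + self.latency, xaction))

    def deq(self, t_cycle: int):
        # Only the head may be released, and only once its cycle has arrived
        if self.queue and self.queue[0][0] <= t_cycle:
            return self.queue.popleft()[1]
        return None

pipe = LatencyPipe(latency=30)
pipe.enq(0, "read-A")
assert pipe.deq(29) is None      # not yet due
assert pipe.deq(30) == "read-A"  # released exactly `latency` cycles later
```

The hardware version additionally subtracts `egressUnitDelay` from the stamp to account for cycles spent in the egress unit; that constant is elided here.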
@ -0,0 +1,163 @@
package midas
package models

import freechips.rocketchip.util.{HasGeneratorUtilities, ParsedInputNames} // For parameter lookup
import freechips.rocketchip.config._

import chisel3._

import org.json4s._

import Console.{UNDERLINED, GREEN, RESET}
import java.io.{File, FileWriter}

// Hacky utilities to get console input from the user.
trait HasConsoleUtils {
  def requestInput(query: String,
      default: BigInt,
      min: Option[BigInt] = None,
      max: Option[BigInt] = None): BigInt = {
    def inner(): BigInt = {
      Console.printf(query + s" (${default}): ")
      var value = default
      try {
        val line = io.StdIn.readLine()
        if (line.length() > 0) {
          value = line.toInt
        }
        if (max != None && value > max.get) {
          Console.println(s"Requested integer ${value} exceeds maximum ${max.get}")
          value = inner()
        } else if (min != None && value < min.get) {
          Console.println(s"Requested integer ${value} is less than minimum ${min.get}")
          value = inner()
        }
      } catch {
        case e: java.lang.NumberFormatException => {
          Console.println("Please give me an integer!")
          value = inner()
        }
        case e: java.io.EOFException => { value = default }
      }
      value
    }
    inner()
  }
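`requestInput` retries until the user supplies a parseable integer within `[min, max]`, falling back to the default on empty input. A pure-function sketch of that validation loop, with a list of canned responses standing in for the console:

```python
# Sketch of the bounded console-input logic: default on empty input,
# skip non-integers, and re-prompt (next canned response) when out of range.
def request_input(lines, default, lo=None, hi=None):
    for line in lines:
        try:
            value = int(line) if line else default
        except ValueError:
            continue  # "Please give me an integer!"
        if hi is not None and value > hi:
            continue  # exceeds maximum: re-prompt
        if lo is not None and value < lo:
            continue  # below minimum: re-prompt
        return value
    return default  # EOF: fall back to the default

assert request_input([""], default=30) == 30
assert request_input(["abc", "900", "64"], default=8, lo=1, hi=128) == 64
```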
  // Select from a list of possibilities
  // Format:
  //   HEADER
  //     POS 0
  //     ...
  //     POS N-1
  //   FOOTER (DEFAULT):
  def requestSeqSelection(
      header: String,
      possibilities: Seq[String],
      footer: String = "Selection number",
      default: BigInt = 0): Int = {

    val query = s"${header}\n" + (possibilities.zipWithIndex).foldRight(footer)((head, body) =>
      s"  ${head._2}) ${head._1}\n" + body)

    requestInput(query, default).toInt
  }
}
// Runtime settings are programmable registers that change the behavior of a memory
// model instance. These are instantiated in the I/O of the timing model and tied
// to a Chisel Input
trait IsRuntimeSetting extends HasConsoleUtils {
  def default: BigInt
  def query: String
  def min: BigInt
  def max: Option[BigInt]

  private var _isSet = false
  private var _value: BigInt = 0

  def set(value: BigInt) {
    require(!_isSet, "Trying to set a programmable register that has already been set.")
    _value = value
    _isSet = true
  }

  def isSet() = _isSet

  def getOrElse(alt: => BigInt): BigInt = if (_isSet) _value else alt

  // This prompts the user via the console for a setting
  def requestSetting(field: Data) {
    set(requestInput(query, default, Some(min), max))
  }
}

// A vanilla runtime setting of the memory model
case class RuntimeSetting(
    default: BigInt,
    query: String,
    min: BigInt = 0,
    max: Option[BigInt] = None) extends IsRuntimeSetting

// A setting whose value can be looked up from a provided table.
case class JSONSetting(
    default: BigInt,
    query: String,
    lookUp: Map[String, BigInt] => BigInt,
    min: BigInt = 0,
    max: Option[BigInt] = None) extends IsRuntimeSetting {

  def setWithLUT(lut: Map[String, BigInt]) = set(lookUp(lut))
}
trait HasProgrammableRegisters extends Bundle {
  def registers: Seq[(Data, IsRuntimeSetting)]

  lazy val regMap = Map(registers: _*)

  def getName(dat: Data): String = {
    val name = elements.find(_._2 == dat) match {
      case Some((name, elem)) => name
      case None => throw new RuntimeException("Could not look up register leaf name")
    }
    name
  }

  // Returns the default values for all registered RuntimeSettings
  def getDefaults(prefix: String = ""): Seq[(String, String)] = {
    val localDefaults = registers map { case (elem, reg) => (s"${prefix}${getName(elem)}" -> s"${reg.default}") }
    localDefaults ++ (elements flatMap {
      case (name, elem: HasProgrammableRegisters) => elem.getDefaults(s"${prefix}${name}_")
      case _ => Seq()
    })
  }

  // Returns the requested values for all RuntimeSettings; throws an exception if one is unbound
  def getSettings(prefix: String = ""): Seq[(String, String)] = {
    val localSettings = registers map { case (elem, reg) => {
        val name = s"${prefix}${getName(elem)}"
        val setting = reg.getOrElse(throw new RuntimeException(s"Runtime Setting ${name} has not been set"))
        (name -> setting.toString)
      }
    }
    // Recurse into leaves
    localSettings ++ (elements flatMap {
      case (name, elem: HasProgrammableRegisters) => elem.getSettings(s"${prefix}${name}_")
      case _ => Seq()
    })
  }

  // Requests the user's input for all unset RuntimeSettings
  def setUnboundSettings(prefix: String = "test") {
    // Set all local registers
    registers foreach {
      case (elem, reg) if !reg.isSet => reg.requestSetting(elem)
      case _ => None
    }
    // Traverse into leaf bundles and set them
    elements foreach {
      case (name, elem: HasProgrammableRegisters) => elem.setUnboundSettings()
      case _ => None
    }
  }
}
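`getSettings`/`getDefaults` flatten a hierarchy of programmable-register bundles into a flat name/value list by prefixing each nested bundle's registers with `<bundleName>_`. A sketch of that recursion over plain dictionaries (register and bundle names here are illustrative, not taken from the source):

```python
# Sketch of hierarchical setting flattening with "_"-joined name prefixes.
def get_settings(registers, children, prefix=""):
    # Local registers first, under the current prefix
    out = [(prefix + name, str(value)) for name, value in registers]
    # Then recurse into nested bundles, extending the prefix
    for child_name, (child_regs, child_children) in children.items():
        out += get_settings(child_regs, child_children, f"{prefix}{child_name}_")
    return out

llc = ([("wayBits", 3), ("setBits", 10)], {})
top = get_settings([("readLatency", 30)], {"llc": llc})
assert top == [("readLatency", "30"), ("llc_wayBits", "3"), ("llc_setBits", "10")]
```

This is the same naming scheme the runtime configuration file relies on, so a nested LLC register surfaces as a single flat key.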
@ -0,0 +1,71 @@
package midas
package models

import Chisel._
import freechips.rocketchip.config.Parameters
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.util._
import freechips.rocketchip.amba.axi4._
import freechips.rocketchip.tilelink._
import freechips.rocketchip.devices.tilelink._
import junctions.NastiParameters

// WARNING: The address widths are totally bungled here. This is intended
// for use with the memory model only.
// We're going to rely on truncation of the (sometimes) wider master address
// later on.
//
// For identical widths this module becomes a passthrough
class TargetToHostAXI4Converter(
    mWidths: NastiParameters,
    sWidths: NastiParameters,
    mMaxTransfer: Int = 128)
    (implicit p: Parameters) extends LazyModule {

  implicit val valname = ValName("FASEDWidthAdapter")
  val m = AXI4MasterNode(Seq(AXI4MasterPortParameters(
    masters = Seq(AXI4MasterParameters(
      name      = "widthAdapter",
      aligned   = true,
      maxFlight = Some(2),
      id        = IdRange(0, (1 << mWidths.idBits))))))) // FIXME: Idbits

  val s = AXI4SlaveNode(Seq(AXI4SlavePortParameters(
    slaves = Seq(AXI4SlaveParameters(
      address       = Seq(AddressSet(0, (BigInt(1) << mWidths.addrBits) - 1)),
      supportsWrite = TransferSizes(1, mMaxTransfer),
      supportsRead  = TransferSizes(1, mMaxTransfer),
      interleavedId = Some(0))), // slave does not interleave read responses
    beatBytes = sWidths.dataBits/8)
  ))

  // If no width change is necessary, pass through with a buffer
  if (mWidths.dataBits == sWidths.dataBits) {
    s := m
  } else {
    // Otherwise we need to convert to TL2 and back
    val xbar = LazyModule(new TLXbar)
    val error = LazyModule(new TLError(DevNullParams(
      Seq(AddressSet(BigInt(1) << mWidths.addrBits, 0xff)), maxAtomic = 1, maxTransfer = 128),
      beatBytes = sWidths.dataBits/8))

    (xbar.node
      := TLWidthWidget(mWidths.dataBits/8)
      := TLFIFOFixer()
      := AXI4ToTL()
      := AXI4Buffer()
      := m)
    error.node := xbar.node
    (s := AXI4Buffer()
      := AXI4UserYanker()
      := TLToAXI4()
      := xbar.node)
  }

  lazy val module = new LazyModuleImp(this) {
    val mAxi4 = IO(Flipped(m.out.head._1.cloneType))
    m.out.head._1 <> mAxi4
    val sAxi4 = IO(s.in.head._1.cloneType)
    sAxi4 <> s.in.head._1
  }
}
@ -0,0 +1,244 @@
package midas
package models

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.ParameterizedBundle
import junctions._

import chisel3._
import chisel3.util._

import midas.core._
import midas.widgets._

import Console.{UNDERLINED, RESET}

// Automatically bound to simulation-memory-mapped registers. Extend this
// bundle to add additional programmable values and instrumentation
abstract class MMRegIO(cfg: BaseConfig) extends Bundle with HasProgrammableRegisters {
  val (totalReads, totalWrites) = if (cfg.params.xactionCounters) {
    (Some(Output(UInt(32.W))), Some(Output(UInt(32.W))))
  } else {
    (None, None)
  }
  val (totalReadBeats, totalWriteBeats) = if (cfg.params.beatCounters) {
    (Some(Output(UInt(32.W))), Some(Output(UInt(32.W))))
  } else {
    (None, None)
  }

  val llc = if (cfg.useLLCModel) Some(new LLCProgrammableSettings(cfg.params.llcKey.get)) else None

  // Instrumentation Registers
  val bins = cfg.params.occupancyHistograms match {
    case Nil => 0
    case binMaximums => binMaximums.size + 1
  }

  val readOutstandingHistogram  = Output(Vec(bins, UInt(32.W)))
  val writeOutstandingHistogram = Output(Vec(bins, UInt(32.W)))

  val targetCycle = if (cfg.params.targetCycleCounter) Some(Output(UInt(32.W))) else None

  // Implemented by each timing model to query runtime values for its
  // programmable settings
  def requestSettings(): Unit

  // Called by MidasMemModel to fetch all programmable settings for the timing
  // model. These are concatenated with functional model settings
  def getTimingModelSettings(): Seq[(String, String)] = {
    // First invoke the timing-model-specific method
    requestSettings()
    // Finally set everything that hasn't already been set
    llc.foreach({ _.setLLCSettings() })
    Console.println(s"\n${UNDERLINED}Remaining Free Parameters${RESET}")
    setUnboundSettings()
    getSettings()
  }
}
abstract class TimingModelIO(implicit val p: Parameters) extends Bundle {
  val tNasti = Flipped(new NastiIO)
  val egressReq  = new EgressReq
  val egressResp = Flipped(new EgressResp)
  // This sub-bundle contains all the programmable fields of the model
  val mmReg: MMRegIO
}

abstract class TimingModel(val cfg: BaseConfig)(implicit val p: Parameters) extends Module
    with IngressModuleParameters with EgressUnitParameters with HasNastiParameters {

  // Concrete timing models must implement io with the MMRegIO sub-bundle
  // containing all of the requisite runtime settings and instrumentation, brought
  // out as inputs and outputs respectively. See MMRegIO above.
  val io: TimingModelIO
  val longName: String
  // Implemented by concrete timing models to describe their configuration during
  // Chisel elaboration
  protected def printTimingModelGenerationConfig: Unit

  def printGenerationConfig {
    println("  Timing Model Class: " + longName)
    printTimingModelGenerationConfig
  }
  /**************************** CHISEL BEGINS *********************************/
  // Regulates the return of beats to the target memory system
  val tNasti = io.tNasti
  // Request channels presented to DRAM models
  val nastiReqIden = Module(new IdentityModule(new NastiReqChannels))
  val nastiReq = nastiReqIden.io.out
  val wResp = Wire(Decoupled(new WriteResponseMetaData))
  val rResp = Wire(Decoupled(new ReadResponseMetaData))

  val monitor = Module(new MemoryModelMonitor(cfg))
  monitor.axi4 := io.tNasti

  val tCycle = RegInit(0.U(64.W))
  tCycle := tCycle + 1.U
  io.mmReg.targetCycle.foreach({ _ := tCycle })

  val pendingReads = SatUpDownCounter(cfg.maxReads)
  pendingReads.inc := tNasti.ar.fire()
  pendingReads.dec := tNasti.r.fire() && tNasti.r.bits.last

  val pendingAWReq = SatUpDownCounter(cfg.maxWrites)
  pendingAWReq.inc := tNasti.aw.fire()
  pendingAWReq.dec := tNasti.b.fire()

  val pendingWReq = SatUpDownCounter(cfg.maxWrites)
  pendingWReq.inc := tNasti.w.fire() && tNasti.w.bits.last
  pendingWReq.dec := tNasti.b.fire()

  assert(!tNasti.ar.valid || (tNasti.ar.bits.burst === NastiConstants.BURST_INCR),
    "Illegal ar request: memory model only supports incrementing bursts")

  assert(!tNasti.aw.valid || (tNasti.aw.bits.burst === NastiConstants.BURST_INCR),
    "Illegal aw request: memory model only supports incrementing bursts")

  // Release; returns responses to the target
  val xactionRelease = Module(new AXI4Releaser)
  tNasti.b <> xactionRelease.io.b
  tNasti.r <> xactionRelease.io.r
  io.egressReq <> xactionRelease.io.egressReq
  xactionRelease.io.egressResp <> io.egressResp

  if (cfg.useLLCModel) {
    // Drop the LLC model inline
    val llc_model = Module(new LLCModel(cfg))
    llc_model.io.settings <> io.mmReg.llc.get
    llc_model.io.memRResp <> rResp
    llc_model.io.memWResp <> wResp
    llc_model.io.req.fromNasti(io.tNasti)
    nastiReqIden.io.in <> llc_model.io.memReq
    xactionRelease.io.nextWrite <> llc_model.io.wResp
    xactionRelease.io.nextRead <> llc_model.io.rResp
  } else {
    nastiReqIden.io.in.fromNasti(io.tNasti)
    xactionRelease.io.nextWrite <> wResp
    xactionRelease.io.nextRead <> rResp
  }
  if (cfg.params.xactionCounters) {
    val totalReads  = RegInit(0.U(32.W))
    val totalWrites = RegInit(0.U(32.W))
    when (pendingReads.inc) { totalReads  := totalReads + 1.U }
    when (pendingAWReq.inc) { totalWrites := totalWrites + 1.U }
    io.mmReg.totalReads  foreach { _ := totalReads }
    io.mmReg.totalWrites foreach { _ := totalWrites }
  }

  if (cfg.params.beatCounters) {
    val totalReadBeats  = RegInit(0.U(32.W))
    val totalWriteBeats = RegInit(0.U(32.W))
    when (tNasti.r.fire) { totalReadBeats  := totalReadBeats + 1.U }
    when (tNasti.w.fire) { totalWriteBeats := totalWriteBeats + 1.U }
    io.mmReg.totalReadBeats  foreach { _ := totalReadBeats }
    io.mmReg.totalWriteBeats foreach { _ := totalWriteBeats }
  }

  cfg.params.occupancyHistograms match {
    case Nil => Nil
    case binMaximums =>
      val numBins = binMaximums.size + 1
      val readOutstandingHistogram  = Seq.fill(numBins)(RegInit(0.U(32.W)))
      val writeOutstandingHistogram = Seq.fill(numBins)(RegInit(0.U(32.W)))

      def bindHistograms(bins: Seq[UInt], maximums: Seq[Int], count: UInt): Bool = {
        (bins.zip(maximums)).foldLeft(false.B)({ case (hasIncremented, (bin, maximum)) =>
          when (!hasIncremented && (count <= maximum.U)) {
            bin := bin + 1.U
          }
          hasIncremented || (count <= maximum.U)
        })
      }

      // Append the largest UInt representable to the end of the Seq to catch remaining cases
      val allBinMaximums = binMaximums :+ -1
      bindHistograms(readOutstandingHistogram,  binMaximums, pendingReads.value)
      bindHistograms(writeOutstandingHistogram, binMaximums, pendingAWReq.value)
      io.mmReg.readOutstandingHistogram  := readOutstandingHistogram
      io.mmReg.writeOutstandingHistogram := writeOutstandingHistogram
  }
}
|
||||
// A class of simple timing models that has independently programmable bounds on
|
||||
// the number of reads and writes the model will accept.
|
||||
//
|
||||
// This is in contrast to more complex DRAM models that propogate backpressure
|
||||
// from shared structures back to the AXI4 request channels.
|
||||
abstract class SplitTransactionMMRegIO(cfg: BaseConfig)(implicit p: Parameters) extends MMRegIO(cfg) {
|
||||
val readMaxReqs = Input(UInt(log2Ceil(cfg.maxReads+1).W))
|
||||
val writeMaxReqs = Input(UInt(log2Ceil(cfg.maxWrites+1).W))
|
||||
|
||||
val maxReqRegisters = Seq(
|
||||
(writeMaxReqs -> RuntimeSetting(cfg.maxWrites,
|
||||
"Maximum number of target-writes the model will accept",
|
||||
max = Some(cfg.maxWrites))),
|
||||
(readMaxReqs -> RuntimeSetting(cfg.maxReads,
|
||||
"Maximum number of target-reads the model will accept",
|
||||
max = Some(cfg.maxReads)))
|
||||
)
|
||||
}
|
||||
|
||||
abstract class SplitTransactionModelIO(implicit p: Parameters)
|
||||
extends TimingModelIO()(p) {
|
||||
// This sub-bundle contains all the programmable fields of the model
|
||||
val mmReg: SplitTransactionMMRegIO
|
||||
}
|
||||
|
||||
abstract class SplitTransactionModel(cfg: BaseConfig)(implicit p: Parameters)
|
||||
extends TimingModel(cfg)(p) {
|
||||
override val io: SplitTransactionModelIO
|
||||
|
||||
pendingReads.max := io.mmReg.readMaxReqs
|
||||
pendingAWReq.max := io.mmReg.writeMaxReqs
|
||||
pendingWReq.max := io.mmReg.writeMaxReqs
|
||||
nastiReq.ar.ready := ~pendingReads.full
|
||||
nastiReq.aw.ready := ~pendingAWReq.full
|
||||
nastiReq.w.ready := ~pendingWReq.full
|
||||
|
||||
//recombines AW and W transactions before passing them onto the rest of the model
|
||||
val awQueue = Module(new Queue(new NastiWriteAddressChannel, cfg.maxWrites, flow = true))
|
||||
|
||||
val newWReq = if (!cfg.useLLCModel) {
|
||||
((pendingWReq.value > pendingAWReq.value) && pendingAWReq.inc) ||
|
||||
((pendingWReq.value < pendingAWReq.value) && pendingWReq.inc) ||
|
||||
(pendingWReq.inc && pendingAWReq.inc)
|
||||
} else {
|
||||
val memWReqs = SatUpDownCounter(cfg.maxWrites)
|
||||
val newWReq = ((memWReqs.value > awQueue.io.count) && nastiReq.aw.fire) ||
|
||||
((memWReqs.value < awQueue.io.count) && memWReqs.inc) ||
|
||||
(memWReqs.inc && nastiReq.aw.fire)
|
||||
|
||||
memWReqs.inc := nastiReq.w.fire && nastiReq.w.bits.last
|
||||
memWReqs.dec := newWReq
|
||||
newWReq
|
||||
}
|
||||
|
||||
awQueue.io.enq.bits := nastiReq.aw.bits
|
||||
awQueue.io.enq.valid := nastiReq.aw.fire()
|
||||
awQueue.io.deq.ready := newWReq
|
||||
}
|
|
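The occupancy histograms above increment exactly one bin per cycle: the first bin whose configured maximum is at least the current occupancy count, with an extra bin (`numBins = binMaximums.size + 1`) implied for counts beyond every maximum. A plain-Scala sketch of that selection rule, outside of Chisel (names are illustrative, not from the RTL):

```scala
// Software model of the histogram binding: given ascending bin maximums and an
// occupancy count, pick the first bin whose maximum covers the count.
// Counts exceeding every maximum fall into the final overflow bin.
object OccupancyBinning {
  def binIndex(binMaximums: Seq[Int], count: Int): Int = {
    val idx = binMaximums.indexWhere(count <= _)
    if (idx == -1) binMaximums.size else idx // overflow bin
  }

  def histogram(binMaximums: Seq[Int], counts: Seq[Int]): Seq[Int] = {
    val bins = Array.fill(binMaximums.size + 1)(0)
    counts.foreach { c => bins(binIndex(binMaximums, c)) += 1 }
    bins.toSeq
  }
}
```

For example, with maximums `Seq(0, 2, 8)`, an occupancy of 3 lands in bin 2 and an occupancy of 9 lands in the overflow bin.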
@@ -0,0 +1,62 @@
package midas
package models

import chisel3._
import chisel3.util._
import junctions._
import midas.widgets._

import freechips.rocketchip.config.Parameters
import freechips.rocketchip.util.{ParameterizedBundle, DecoupledHelper}

// Add some scheduler-specific metadata to a reference
class XactionSchedulerEntry(implicit p: Parameters) extends NastiBundle()(p) {
  val xaction = new TransactionMetaData
  val addr = UInt(nastiXAddrBits.W)
}

class XactionSchedulerIO(val cfg: BaseConfig)(implicit val p: Parameters) extends Bundle {
  val req = Flipped(new NastiReqChannels)
  val nextXaction = Decoupled(new XactionSchedulerEntry)
  val pendingWReq = Input(UInt((cfg.maxWrites + 1).W))
  val pendingAWReq = Input(UInt((cfg.maxWrites + 1).W))
}

class UnifiedFIFOXactionScheduler(depth: Int, cfg: BaseConfig)(implicit p: Parameters) extends Module {
  val io = IO(new XactionSchedulerIO(cfg))

  import DRAMMasEnums._

  val transactionQueue = Module(new DualQueue(
    gen = new XactionSchedulerEntry,
    entries = depth))

  transactionQueue.io.enqA.valid := io.req.ar.valid
  transactionQueue.io.enqA.bits.xaction := TransactionMetaData(io.req.ar.bits)
  transactionQueue.io.enqA.bits.addr := io.req.ar.bits.addr
  io.req.ar.ready := transactionQueue.io.enqA.ready

  transactionQueue.io.enqB.valid := io.req.aw.valid
  transactionQueue.io.enqB.bits.xaction := TransactionMetaData(io.req.aw.bits)
  transactionQueue.io.enqB.bits.addr := io.req.aw.bits.addr
  io.req.aw.ready := transactionQueue.io.enqB.ready
  // Accept up to one additional write data request
  // TODO: More sensible model; maybe track a write buffer volume
  io.req.w.ready := io.pendingWReq <= io.pendingAWReq

  val selectedCmd = WireInit(cmd_nop)
  val completedWrites = SatUpDownCounter(cfg.maxWrites)
  completedWrites.inc := io.req.w.fire && io.req.w.bits.last
  completedWrites.dec := io.nextXaction.fire && io.nextXaction.bits.xaction.isWrite

  // Prevent release of the oldest transaction if it is a write and its data is not yet available
  val deqGate = DecoupledHelper(
    transactionQueue.io.deq.valid,
    io.nextXaction.ready,
    (!io.nextXaction.bits.xaction.isWrite || ~completedWrites.empty)
  )

  io.nextXaction <> transactionQueue.io.deq
  io.nextXaction.valid := deqGate.fire(io.nextXaction.ready)
  transactionQueue.io.deq.ready := deqGate.fire(transactionQueue.io.deq.valid)
}
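The `deqGate` above releases the oldest queued transaction unless it is a write whose data beats have not yet fully arrived, tracked by the `completedWrites` saturating counter. A plain-Scala sketch of just that release rule (illustrative names, not part of the RTL):

```scala
// Software model of the scheduler's release gate: a write may only be
// released once at least one complete write-data burst has been received.
case class Xaction(isWrite: Boolean, id: Int)

class ReleaseGate {
  private var completedWrites = 0            // mirrors the SatUpDownCounter
  def writeDataDone(): Unit = completedWrites += 1  // a last W beat arrived
  def canRelease(head: Xaction): Boolean =
    !head.isWrite || completedWrites > 0
  def release(head: Xaction): Unit =
    if (head.isWrite) completedWrites -= 1
}
```

Reads are never blocked by this gate; only writes wait on their data.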
@@ -0,0 +1,753 @@
package midas
package models

// From RC
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.util.{ParameterizedBundle, GenericParameterizedBundle, UIntIsOneOf}
import freechips.rocketchip.unittest.UnitTest
import junctions._

import chisel3._
import chisel3.util._
import chisel3.experimental.MultiIOModule

// From MIDAS
import midas.widgets.{D2V, V2D, SkidRegister}

class DualQueue[T <: Data](gen: =>T, entries: Int) extends Module {
  val io = IO(new Bundle {
    val enqA = Flipped(Decoupled(gen.cloneType))
    val enqB = Flipped(Decoupled(gen.cloneType))
    val deq = Decoupled(gen.cloneType)
    val next = Valid(gen.cloneType)
  })

  val qA = Module(new Queue(gen.cloneType, (entries+1)/2))
  val qB = Module(new Queue(gen.cloneType, entries/2))
  qA.io.deq.ready := false.B
  qB.io.deq.ready := false.B

  val enqPointer = RegInit(false.B)
  when (io.enqA.fire() ^ io.enqB.fire()) {
    enqPointer := ~enqPointer
  }

  when(enqPointer ^ ~io.enqA.valid){
    qA.io.enq <> io.enqB
    qB.io.enq <> io.enqA
  }.otherwise{
    qA.io.enq <> io.enqA
    qB.io.enq <> io.enqB
  }

  val deqPointer = RegInit(false.B)
  when (io.deq.fire()) {
    deqPointer := ~deqPointer
  }

  when(deqPointer){
    io.deq <> qB.io.deq
    io.next <> D2V(qA.io.deq)
  }.otherwise{
    io.deq <> qA.io.deq
    io.next <> D2V(qB.io.deq)
  }
}
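`DualQueue` presents two enqueue ports that feed one ordered dequeue stream, ping-ponging between two half-depth queues so that dequeue order tracks arrival order. A plain-Scala behavioral sketch of that contract (illustrative only; the RTL's ordering when both ports fire in the same cycle depends on its internal pointer state, and here port A is simply treated as older):

```scala
// Software model of DualQueue's intent: two producers merged into one
// arrival-ordered stream, one dequeue per "cycle".
class DualQueueModel[T] {
  private val q = scala.collection.mutable.Queue.empty[T]
  // One cycle's worth of enqueues; either port may be idle.
  def enqCycle(a: Option[T], b: Option[T]): Unit = {
    a.foreach(q.enqueue(_))
    b.foreach(q.enqueue(_))
  }
  def deq(): Option[T] = if (q.nonEmpty) Some(q.dequeue()) else None
}
```

The hardware version achieves the same ordering without a single deep queue, which is what lets the scheduler above enqueue an AR and an AW in the same host cycle.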
class ProgrammableSubAddr(
    val maskBits: Int,
    val longName: String,
    val defaultOffset: BigInt,
    val defaultMask: BigInt) extends Bundle with HasProgrammableRegisters {
  val offset = UInt(32.W) // TODO: fixme
  val mask = UInt(maskBits.W) // Must be contiguous high bits starting from LSB
  def getSubAddr(fullAddr: UInt): UInt = (fullAddr >> offset) & mask

  // Used to produce a bit vector of enables from a mask
  def maskToOH(): UInt = {
    val decodings = Seq.tabulate(maskBits)({ i => ((1 << (1 << (i + 1))) - 1).U })
    MuxCase(1.U, (mask.toBools.zip(decodings)).reverse)
  }

  val registers = Seq(
    (offset -> RuntimeSetting(defaultOffset, s"${longName} Offset", min = 0)),
    (mask -> RuntimeSetting(defaultMask, s"${longName} Mask", max = Some((1 << maskBits) - 1)))
  )

  def forceSettings(offsetValue: BigInt, maskValue: BigInt): Unit = {
    regMap(offset).set(offsetValue)
    regMap(mask).set(maskValue)
  }
}
// A common motif to track inputs in a buffer
trait HasFIFOPointers {
  val entries: Int
  val do_enq = Wire(Bool())
  val do_deq = Wire(Bool())

  val enq_ptr = Counter(entries)
  val deq_ptr = Counter(entries)
  val maybe_full = RegInit(false.B)

  val ptr_match = enq_ptr.value === deq_ptr.value
  val empty = ptr_match && !maybe_full
  val full = ptr_match && maybe_full

  when (do_enq) {
    enq_ptr.inc()
  }
  when (do_deq) {
    deq_ptr.inc()
  }
  when (do_enq =/= do_deq) {
    maybe_full := do_enq
  }
}
class DynamicLatencyPipeIO[T <: Data](gen: T, entries: Int, countBits: Int)
    extends QueueIO(gen, entries) {
  val latency = Input(UInt(countBits.W))
  val tCycle = Input(UInt(countBits.W))

  override def cloneType = new DynamicLatencyPipeIO(gen, entries, countBits).asInstanceOf[this.type]
}

// I had to copy this code because critical fields are now private
class DynamicLatencyPipe[T <: Data] (
    gen: T,
    val entries: Int,
    countBits: Int
  ) extends Module with HasFIFOPointers {
  val io = IO(new DynamicLatencyPipeIO(gen, entries, countBits))

  // Add the implication on enq.fire to work around target reset problems for now
  assert(!io.enq.fire || io.latency =/= 0.U, "DynamicLatencyPipe only supports latencies > 0")
  val ram = Mem(entries, gen)
  do_enq := io.enq.fire()
  do_deq := io.deq.fire()

  when (do_enq) {
    ram(enq_ptr.value) := io.enq.bits
  }

  io.enq.ready := !full
  io.deq.bits := ram(deq_ptr.value)

  val ptr_diff = enq_ptr.value - deq_ptr.value
  if (isPow2(entries)) {
    io.count := Cat(maybe_full && ptr_match, ptr_diff)
  } else {
    io.count := Mux(ptr_match,
      Mux(maybe_full,
        entries.asUInt, 0.U),
      Mux(deq_ptr.value > enq_ptr.value,
        entries.asUInt + ptr_diff, ptr_diff))
  }

  val latencies = Reg(Vec(entries, UInt(countBits.W)))
  val pendingRegisters = RegInit(VecInit(Seq.fill(entries)(false.B)))
  val done = VecInit(latencies.zip(pendingRegisters) map { case (lat, pendingReg) =>
    val cycleMatch = lat === io.tCycle
    when (cycleMatch) { pendingReg := false.B }
    cycleMatch || !pendingReg
  })

  when (do_enq) {
    latencies(enq_ptr.value) := io.tCycle + io.latency
    pendingRegisters(enq_ptr.value) := io.latency =/= 1.U
  }

  io.deq.valid := !empty && done(deq_ptr.value)
}
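Behaviorally, `DynamicLatencyPipe` tags each enqueued entry with a release cycle (`tCycle + latency`) and only lets it dequeue once the cycle counter reaches that tag. A plain-Scala sketch of the same contract, ignoring the RTL's wrap-around counters and capacity limit (names are illustrative):

```scala
// Software model of a dynamic-latency pipe: FIFO order, but each entry
// becomes visible only `latency` ticks after it was enqueued.
class LatencyPipeModel[T] {
  private var cycle = 0L
  private val q = scala.collection.mutable.Queue.empty[(T, Long)] // (payload, releaseCycle)
  def tick(): Unit = cycle += 1
  def enq(x: T, latency: Int): Unit = {
    require(latency > 0, "only latencies > 0 are supported, as in the RTL")
    q.enqueue((x, cycle + latency))
  }
  def deq(): Option[T] =
    if (q.nonEmpty && q.head._2 <= cycle) Some(q.dequeue()._1) else None
}
```

This is the primitive timing models use to charge a per-request delay without stalling enqueue.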
// Counts down from a set value; if the set value is less than the present value
// it is ignored.

class DownCounter(counterWidth: Int) extends Module {
  val io = IO(new Bundle {
    val set = Input(Valid(UInt(counterWidth.W)))
    val decr = Input(Bool())
    val current = Output(UInt(counterWidth.W))
    val idle = Output(Bool())
  })

  require(counterWidth > 0, "DownCounter must have a width > 0")
  val delay = RegInit(0.U(counterWidth.W))
  when(io.set.valid && io.set.bits >= delay) {
    delay := io.set.bits
  }.elsewhen(io.decr && delay =/= 0.U){
    delay := delay - 1.U
  }
  io.idle := delay === 0.U
  io.current := delay
}
// While DownCounter has a local decrementer, this module instead matches against
// a provided cycle count.
class CycleTracker(counterWidth: Int) extends Module {
  val io = IO(new Bundle {
    val set = Input(Valid(UInt(counterWidth.W)))
    val tCycle = Input(UInt(counterWidth.W))
    val idle = Output(Bool())
  })

  require(counterWidth > 0, "CycleTracker must have a width > 0")
  val delay = RegInit(0.U(counterWidth.W))
  val idle = RegInit(true.B)
  when(io.set.valid && io.tCycle =/= io.set.bits) {
    delay := io.set.bits
    idle := false.B
  }.elsewhen(delay === io.tCycle){
    idle := true.B
  }
  io.idle := idle
}
// A collapsing buffer with entries that can be updated. Valid entries trickle
// down through the queue, one entry per cycle.
// Kill is implemented by setting io.updates(entry).valid := false.B
//
// NB: The companion object should be used to generate a module instance;
// otherwise, updates must be driven from entries by default for the module to
// behave correctly.
class CollapsingBufferIO[T <: Data](private val gen: T, val depth: Int) extends Bundle {
  val entries = Output(Vec(depth, Valid(gen)))
  val updates = Input(Vec(depth, Valid(gen)))
  val enq = Flipped(Decoupled(gen))
  val programmableDepth = Input(UInt(log2Ceil(depth+1).W))
}

// Note: Use the companion object
class CollapsingBuffer[T <: Data](gen: T, depth: Int) extends Module {
  val io = IO(new CollapsingBufferIO(gen, depth))

  def linkEntries(entries: Seq[(ValidIO[T], ValidIO[T], Bool)], shifting: Bool): Unit = entries match {
    case Nil => throw new RuntimeException("Asked for a 0-entry collapsing buffer?")
    // Youngest entry; connect up io.enq
    case (entry, currentUpdate, lastEntry) :: Nil => {
      val shift = shifting || !currentUpdate.valid
      entry := Mux(shift, D2V(io.enq), currentUpdate)
      io.enq.ready := shift
    }
    // Default case: a younger stage enqueues into this one
    case (entry, currentUpdate, lastEntry) :: tail => {
      val youngerUpdate = tail.head._2
      val shift = !lastEntry && (shifting || !currentUpdate.valid)
      entry := Mux(shift, youngerUpdate, currentUpdate)
      linkEntries(tail, shift)
    }
  }

  val lastEntry = UIntToOH(io.programmableDepth).toBools.take(depth).reverse
  val entries = Seq.fill(depth)(
    RegInit({val w = Wire(Valid(gen.cloneType)); w.valid := false.B; w.bits := DontCare; w}))
  io.entries := entries

  linkEntries((entries, io.updates, lastEntry).zipped.toList, false.B)
}

object CollapsingBuffer {
  def apply[T <: Data](
      enq: DecoupledIO[T],
      depth: Int,
      programmableDepth: Option[UInt] = None): CollapsingBuffer[T] = {

    val buffer = Module(new CollapsingBuffer(enq.bits.cloneType, depth))
    // This sets the default that each entry retains its value unless driven by the parent module
    (buffer.io.updates).zip(buffer.io.entries).foreach({ case (update, entry) => update := entry })
    buffer.io.enq <> enq
    buffer.io.programmableDepth := programmableDepth.getOrElse(depth.U)
    buffer
  }
}
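The collapsing buffer's steady-state behavior is simple to state even though the RTL shifts one entry per cycle: killed entries vanish and younger entries slide down to fill the holes, freeing space for new enqueues. A plain-Scala sketch of that end state (instantaneous collapse rather than one-per-cycle; names are illustrative):

```scala
// Software model of a collapsing buffer: filter out killed entries, keep
// survivors in age order, and accept enqueues while below capacity.
class CollapsingBufferModel[T](depth: Int) {
  private var entries = Vector.empty[T]   // oldest first
  // An "update" pass; entries failing the predicate are killed and collapse away.
  def update(keep: T => Boolean): Unit = entries = entries.filter(keep)
  def enq(x: T): Boolean =
    if (entries.size < depth) { entries = entries :+ x; true } else false
  def snapshot: Vector[T] = entries
}
```

In the hardware, the DRAM scheduler uses this structure to retire or reprioritize in-flight commands without compacting the whole buffer combinationally.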
trait HasAXI4Id extends HasNastiParameters { val id = UInt(nastiXIdBits.W) }
trait HasAXI4IdAndLen extends HasAXI4Id { val len = UInt(nastiXLenBits.W) }
trait HasReqMetaData extends HasAXI4IdAndLen { val addr = UInt(nastiXAddrBits.W) }

class TransactionMetaData(implicit val p: Parameters) extends Bundle with HasAXI4IdAndLen {
  val isWrite = Bool()
}

object TransactionMetaData {
  def apply(id: UInt, len: UInt, isWrite: Bool)(implicit p: Parameters): TransactionMetaData = {
    val w = Wire(new TransactionMetaData)
    w.id := id
    w.len := len
    w.isWrite := isWrite
    w
  }

  def apply(x: NastiReadAddressChannel)(implicit p: Parameters): TransactionMetaData =
    apply(x.id, x.len, false.B)

  def apply(x: NastiWriteAddressChannel)(implicit p: Parameters): TransactionMetaData =
    apply(x.id, x.len, true.B)
}

class WriteResponseMetaData(implicit val p: Parameters) extends Bundle with HasAXI4Id
class ReadResponseMetaData(implicit val p: Parameters) extends Bundle with HasAXI4IdAndLen

object ReadResponseMetaData {
  def apply(x: HasAXI4IdAndLen)(implicit p: Parameters): ReadResponseMetaData = {
    val readMetaData = Wire(new ReadResponseMetaData)
    readMetaData.id := x.id
    readMetaData.len := x.len
    readMetaData
  }
  // UGH. Will fix when I go to RC's AXI4 impl
  def apply(x: NastiReadAddressChannel)(implicit p: Parameters): ReadResponseMetaData = {
    val readMetaData = Wire(new ReadResponseMetaData)
    readMetaData.id := x.id
    readMetaData.len := x.len
    readMetaData
  }
}

object WriteResponseMetaData {
  def apply(x: HasAXI4Id)(implicit p: Parameters): WriteResponseMetaData = {
    val writeMetaData = Wire(new WriteResponseMetaData)
    writeMetaData.id := x.id
    writeMetaData
  }

  def apply(x: NastiWriteAddressChannel)(implicit p: Parameters): WriteResponseMetaData = {
    val writeMetaData = Wire(new WriteResponseMetaData)
    writeMetaData.id := x.id
    writeMetaData
  }
}
class AXI4ReleaserIO(implicit val p: Parameters) extends ParameterizedBundle()(p) {
  val b = Decoupled(new NastiWriteResponseChannel)
  val r = Decoupled(new NastiReadDataChannel)
  val egressReq = new EgressReq
  val egressResp = Flipped(new EgressResp)
  val nextRead = Flipped(Decoupled(new ReadResponseMetaData))
  val nextWrite = Flipped(Decoupled(new WriteResponseMetaData))
}

class AXI4Releaser(implicit p: Parameters) extends Module {
  val io = IO(new AXI4ReleaserIO)

  val currentRead = Queue(io.nextRead, 1, pipe = true)
  currentRead.ready := io.r.fire && io.r.bits.last
  io.egressReq.r.valid := io.nextRead.fire
  io.egressReq.r.bits := io.nextRead.bits.id
  io.r.valid := currentRead.valid
  io.r.bits := io.egressResp.rBits
  io.egressResp.rReady := io.r.ready

  val currentWrite = Queue(io.nextWrite, 1, pipe = true)
  currentWrite.ready := io.b.fire
  io.egressReq.b.valid := io.nextWrite.fire
  io.egressReq.b.bits := io.nextWrite.bits.id
  io.b.valid := currentWrite.valid
  io.b.bits := io.egressResp.bBits
  io.egressResp.bReady := io.b.ready
}
class FIFOAddressMatcher(val entries: Int, addrWidth: Int) extends Module with HasFIFOPointers {
  val io = IO(new Bundle {
    val enq = Flipped(Valid(UInt(addrWidth.W)))
    val deq = Input(Bool())
    val match_address = Input(UInt(addrWidth.W))
    val hit = Output(Bool())
  })

  val addrs = RegInit(VecInit(Seq.fill(entries)({
    val w = Wire(Valid(UInt(addrWidth.W)))
    w.valid := false.B
    w.bits := DontCare
    w
  })))
  do_enq := io.enq.valid
  do_deq := io.deq

  assert(!full || (!do_enq || do_deq)) // Since we don't have backpressure, check for overflow
  when (do_enq) {
    addrs(enq_ptr.value).valid := true.B
    addrs(enq_ptr.value).bits := io.enq.bits
  }

  when (do_deq) {
    addrs(deq_ptr.value).valid := false.B
  }

  io.hit := addrs.exists({ entry => entry.valid && entry.bits === io.match_address })
}

class AddressCollisionCheckerIO(addrWidth: Int)(implicit p: Parameters) extends NastiBundle()(p) {
  val read_req = Input(Valid(UInt(addrWidth.W)))
  val read_done = Input(Bool())
  val write_req = Input(Valid(UInt(addrWidth.W)))
  val write_done = Input(Bool())
  val collision_addr = ValidIO(UInt(addrWidth.W))
}

class AddressCollisionChecker(numReads: Int, numWrites: Int, addrWidth: Int)(implicit p: Parameters)
    extends NastiModule()(p) {
  val io = IO(new AddressCollisionCheckerIO(addrWidth))

  require(isPow2(numReads))
  require(isPow2(numWrites))
  //val discardedLSBs = 6
  //val addrType = UInt(p(NastiKey).addrBits - discardedLSBs)

  val read_matcher = Module(new FIFOAddressMatcher(numReads, addrWidth)).io
  read_matcher.enq := io.read_req
  read_matcher.deq := io.read_done
  read_matcher.match_address := io.write_req.bits

  val write_matcher = Module(new FIFOAddressMatcher(numWrites, addrWidth)).io
  write_matcher.enq := io.write_req
  write_matcher.deq := io.write_done
  write_matcher.match_address := io.read_req.bits

  io.collision_addr.valid := io.read_req.valid && write_matcher.hit ||
                             io.write_req.valid && read_matcher.hit
  io.collision_addr.bits := Mux(io.read_req.valid, io.read_req.bits, io.write_req.bits)
}
class CounterReadoutIO(val addrBits: Int) extends Bundle {
  val enable = Input(Bool()) // Set when the simulation memory bus wishes to read out the values
  val addr = Input(UInt(addrBits.W))
  val dataL = Output(UInt(32.W))
  val dataH = Output(UInt(32.W))
}

class CounterIncrementIO(val addrBits: Int, val dataBits: Int) extends Bundle {
  val enable = Input(Bool())
  val addr = Input(UInt(addrBits.W))
  // Pass data 2 cycles after enable
  val data = Input(UInt(dataBits.W))
}

class CounterTable(addrBits: Int, dataBits: Int) extends Module {
  val io = IO(new Bundle {
    val incr = new CounterIncrementIO(addrBits, dataBits)
    val readout = new CounterReadoutIO(addrBits)
  })

  require(dataBits > 32)
  val memDepth = 1 << addrBits

  val counts = Mem(memDepth, UInt(dataBits.W))

  val s0_readAddr = Mux(io.readout.enable, io.readout.addr, io.incr.addr)
  val s1_readAddr = RegNext(s0_readAddr)
  val s1_readData = counts.read(s1_readAddr)
  val s1_valid = RegNext(io.incr.enable, false.B)
  val s1_readout = RegNext(io.readout.enable)

  val s2_valid = RegNext(s1_valid && !s1_readout)
  val s2_writeAddr = RegNext(s1_readAddr)
  val s2_readData = RegNext(s1_readData)
  val s2_writeData = Wire(UInt(dataBits.W))

  val s3_valid = RegNext(s2_valid)
  val s3_writeData = RegNext(s2_writeData)
  val s3_writeAddr = RegNext(s2_writeAddr)

  val doBypass = s2_valid && s3_valid && s2_writeAddr === s3_writeAddr
  s2_writeData := Mux(doBypass, s3_writeData, s2_readData) + io.incr.data

  when (s2_valid) {
    counts(s2_writeAddr) := s2_writeData
  }

  io.readout.dataL := s2_readData(31, 0)
  io.readout.dataH := s2_readData(dataBits-1, 32)
}
// Stores a histogram of host latencies in BRAM.
// Setting io.readout.enable ties a read port of the BRAM to a read address that
// can be driven by the simulation bus.
//
// WARNING: Will drop bin updates if attempting to read values while host
// transactions are still in flight.

class HostLatencyHistogramIO(val idBits: Int, val binAddrBits: Int) extends Bundle {
  val reqId = Flipped(ValidIO(UInt(idBits.W)))
  val respId = Flipped(ValidIO(UInt(idBits.W)))
  val cycleCountEnable = Input(Bool()) // Indicates which cycles the counter should be incremented
  val readout = new CounterReadoutIO(binAddrBits)
}

// Defaults will fit in a 36K BRAM
class HostLatencyHistogram (
    idBits: Int,
    cycleCountBits: Int = 10
  ) extends Module {
  val io = IO(new HostLatencyHistogramIO(idBits, cycleCountBits))
  val binSize = 36
  // Need a queue for each ID to track the host cycle a request was issued.
  val queues = Seq.fill(1 << idBits)(Module(new Queue(UInt(cycleCountBits.W), 1)))

  val cycle = RegInit(0.U(cycleCountBits.W))
  when (io.cycleCountEnable) { cycle := cycle + 1.U }

  // When the host accepts an AW/AR, enqueue the current cycle
  (queues map { _.io.enq }).zip(UIntToOH(io.reqId.bits).toBools).foreach({ case (enq, sel) =>
    enq.valid := io.reqId.valid && sel
    enq.bits := cycle
    assert(!(enq.valid && !enq.ready), "Multiple requests issued to same ID")
  })

  val deqAddrOH = UIntToOH(io.respId.bits)
  val reqCycle = Mux1H(deqAddrOH, (queues map { _.io.deq.bits }))
  (queues map { _.io.deq }).zip(deqAddrOH.toBools).foreach({ case (deq, sel) =>
    deq.ready := io.respId.valid && sel
    assert(deq.valid || !deq.ready, "Received an unexpected response")
  })

  val histogram = Module(new CounterTable(cycleCountBits, binSize))
  histogram.io.incr.enable := io.respId.valid
  histogram.io.incr.addr := cycle - reqCycle
  histogram.io.incr.data := 1.U

  io.readout <> histogram.io.readout
}

object HostLatencyHistogram {
  def apply(
      reqValid: Bool,
      reqId: UInt,
      respValid: Bool,
      respId: UInt,
      cycleCountEnable: Bool = true.B,
      binAddrBits: Int = 10): CounterReadoutIO = {
    require(reqId.getWidth == respId.getWidth)
    val histogram = Module(new HostLatencyHistogram(reqId.getWidth, binAddrBits))
    histogram.io.reqId.bits := reqId
    histogram.io.reqId.valid := reqValid
    histogram.io.respId.bits := respId
    histogram.io.respId.valid := respValid
    histogram.io.cycleCountEnable := cycleCountEnable
    histogram.io.readout
  }
}
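`HostLatencyHistogram` records the cycle at which each request ID was issued and, on the matching response, increments the bin addressed by the elapsed cycle count. A plain-Scala sketch of that bookkeeping, assuming (as the depth-1 queues enforce) at most one outstanding request per ID; the modulo mirrors how the RTL's bin address simply wraps at the counter width (names are illustrative):

```scala
// Software model of the host-latency histogram.
class LatencyHistogramModel(nBins: Int) {
  private val issued = scala.collection.mutable.Map.empty[Int, Long] // id -> issue cycle
  val bins: Array[Long] = Array.fill(nBins)(0L)

  def request(id: Int, cycle: Long): Unit = {
    require(!issued.contains(id), "multiple requests issued to same ID")
    issued(id) = cycle
  }

  def response(id: Int, cycle: Long): Unit = {
    val lat = (cycle - issued.remove(id).get).toInt
    bins(lat % nBins) += 1 // wrap, like a binAddrBits-wide subtraction
  }
}
```

Reading the bins out over the simulation bus corresponds to the `CounterReadoutIO` path above.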
// Pick out the relevant parts of NastiReadAddressChannel or NastiWriteAddressChannel
class AddressRangeCounterRequest(implicit p: Parameters) extends NastiBundle {
  val addr = UInt(nastiXAddrBits.W)
  val len = UInt(nastiXLenBits.W)
  val size = UInt(nastiXSizeBits.W)
}

// Stores count of #bytes requested from each range in BRAM.
// Setting io.readout.enable ties a read port of the BRAM to a read address
// that can be driven by the simulation bus.
//
// WARNING: Will drop range updates if attempting to read values while host
// transactions are being issued.

class AddressRangeCounter(nRanges: BigInt)(implicit p: Parameters) extends NastiModule {
  val io = IO(new Bundle {
    val req = Flipped(ValidIO(new AddressRangeCounterRequest))
    val readout = new CounterReadoutIO(log2Ceil(nRanges))
  })

  require(nRanges > 1)
  require(nRanges < (1L << 32))
  require(isPow2(nRanges))

  val counterBits = 48
  val addrMSB = nastiXAddrBits - 1
  val addrLSB = nastiXAddrBits - log2Ceil(nRanges)

  val s1_len = RegNext(io.req.bits.len)
  val s1_size = RegNext(io.req.bits.size)
  val s1_bytes = (s1_len + 1.U) << s1_size
  val s2_bytes = RegNext(s1_bytes)

  val counters = Module(new CounterTable(log2Ceil(nRanges), counterBits))
  counters.io.incr.enable := io.req.valid
  counters.io.incr.addr := io.req.bits.addr(addrMSB, addrLSB)
  counters.io.incr.data := s2_bytes
  io.readout <> counters.io.readout
}

object AddressRangeCounter {
  def apply[T <: NastiAddressChannel](
      n: BigInt, req: DecoupledIO[T], en: Bool)(implicit p: Parameters) = {
    val counter = Module(new AddressRangeCounter(n))
    counter.io.req.valid := req.fire() && en
    counter.io.req.bits.addr := req.bits.addr
    counter.io.req.bits.len := req.bits.len
    counter.io.req.bits.size := req.bits.size
    counter.io.readout
  }
}
object AddressCollisionCheckMain extends App {
  implicit val p = Parameters.empty.alterPartial({ case NastiKey => NastiParameters(64, 32, 4) })
  chisel3.Driver.execute(args, () => new AddressCollisionChecker(4, 4, 16))
}
class CounterTableUnitTest extends UnitTest {
  val addrBits = 10
  val dataBits = 48
  val counters = Module(new CounterTable(addrBits, dataBits))

  val (s_start :: s_readInit :: s_incr :: s_readout :: s_done :: Nil) = Enum(5)
  val state = RegInit(s_start)

  val incrAddrs = VecInit(Seq(0, 0, 4, 0, 4, 16).map(_.U(addrBits.W)))
  val incrData = VecInit(Seq(1, 2, 5, 1, 3, 7).map(_.U(dataBits.W)))
  val (incrIdx, incrDone) = Counter(state === s_incr, incrAddrs.size)

  val readAddrs = VecInit(Seq(0, 4, 8, 16).map(_.U(addrBits.W)))
  val readExpected = VecInit(Seq(4, 8, 0, 7).map(_.U(dataBits.W)))
  val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readAddrs.size)
  val initValues = Reg(Vec(readExpected.size, UInt(dataBits.W)))

  counters.io.incr.enable := state === s_incr
  counters.io.incr.addr := incrAddrs(incrIdx)
  counters.io.incr.data := RegNext(RegNext(incrData(incrIdx)))

  counters.io.readout.enable := state.isOneOf(s_readInit, s_readout)
  counters.io.readout.addr := readAddrs(readIdx)

  val readData = Cat(counters.io.readout.dataH, counters.io.readout.dataL)
  val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
  val initWriteIdx = RegNext(RegNext(readIdx))

  when (initValid) { initValues(initWriteIdx) := readData }

  val expectedCount = RegNext(RegNext(readExpected(readIdx)))
  val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
  val readCount = readData - RegNext(RegNext(initValues(readIdx)))

  assert(!readValid || readCount === expectedCount)

  when (state === s_start && io.start) { state := s_readInit }
  when (state === s_readInit && readDone) { state := s_incr }
  when (incrDone) { state := s_readout }
  when (state === s_readout && readDone) { state := s_done }

  io.finished := state === s_done
}
class LatencyHistogramUnitTest extends UnitTest {
  val addrBits = 8
  val histogram = Module(new HostLatencyHistogram(2, addrBits))
  val dataBits = histogram.binSize

  val (s_start :: s_readInit :: s_run :: s_readout :: s_done :: Nil) = Enum(5)
  val state = RegInit(s_start)

  // The second response comes a cycle after the first,
  // with the same amount of time (2 cycles) after the request.
  // Therefore, it will require a bypass.
  // The third response comes a cycle after the second, but since the number
  // of cycles is 1 instead of 2, it will not require a bypass.
  // The fourth response also comes 2 cycles after the request,
  // but since several cycles have elapsed since the last update, no bypass is needed.
  val cycleReq = VecInit(Seq(true.B, true.B, false.B, true.B, true.B, false.B, false.B))
  val cycleResp = VecInit(Seq(false.B, false.B, true.B, true.B, true.B, false.B, true.B))

  val (runIdx, runDone) = Counter(state === s_run, cycleReq.size)
  val (reqId, _) = Counter(histogram.io.reqId.valid, 4)
  val (respId, _) = Counter(histogram.io.respId.valid, 4)

  val readAddrs = VecInit(Seq(1.U(addrBits.W), 2.U(addrBits.W)))
  val readExpected = VecInit(Seq(1.U(dataBits.W), 3.U(dataBits.W)))
  val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readAddrs.size)

  histogram.io.reqId.valid := state === s_run && cycleReq(runIdx)
  histogram.io.reqId.bits := reqId
  histogram.io.respId.valid := state === s_run && cycleResp(runIdx)
  histogram.io.respId.bits := respId
  histogram.io.cycleCountEnable := true.B
  histogram.io.readout.enable := state.isOneOf(s_readInit, s_readout)
  histogram.io.readout.addr := readAddrs(readIdx)

  val initValues = Reg(Vec(readExpected.size, UInt(dataBits.W)))
  val initWriteIdx = RegNext(RegNext(readIdx))
  val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
  val readData = Cat(histogram.io.readout.dataH, histogram.io.readout.dataL)

  when (initValid) { initValues(initWriteIdx) := readData }

  when (state === s_start && io.start) { state := s_readInit }
  when (state === s_readInit && readDone) { state := s_run }
|
||||
when (runDone) { state := s_readout }
|
||||
when (state === s_readout && readDone) { state := s_done }
|
||||
|
||||
val expectedCount = RegNext(RegNext(readExpected(readIdx)))
|
||||
val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
|
||||
val readCount = readData - RegNext(RegNext(initValues(readIdx)))
|
||||
|
||||
assert(!readValid || readCount === expectedCount)
|
||||
|
||||
io.finished := state === s_done
|
||||
}
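As a cross-check of the expected histogram counts in this test, the stimulus can be replayed in a small Python model (an illustrative sketch, not the Chisel implementation; it assumes each response pairs in order with the oldest outstanding request):

```python
from collections import deque

def latency_histogram(cycle_req, cycle_resp):
    """Pair each response in-order with the oldest outstanding request;
    count responses per observed latency (in cycles)."""
    histogram = {}
    outstanding = deque()
    for cycle, (req, resp) in enumerate(zip(cycle_req, cycle_resp)):
        if req:
            outstanding.append(cycle)
        if resp:
            latency = cycle - outstanding.popleft()
            histogram[latency] = histogram.get(latency, 0) + 1
    return histogram

# Stimulus from the unit test: three 2-cycle latencies and one 1-cycle latency,
# matching readExpected (count 1 at bin 1, count 3 at bin 2).
cycle_req  = [True, True, False, True, True, False, False]
cycle_resp = [False, False, True, True, True, False, True]
print(latency_histogram(cycle_req, cycle_resp))  # {2: 3, 1: 1}
```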

class AddressRangeCounterUnitTest(implicit p: Parameters) extends UnitTest {
  val nCounters = 8
  val nastiP = p.alterPartial({
    case NastiKey => NastiParameters(64, 16, 4)
  })
  val counters = Module(new AddressRangeCounter(nCounters)(nastiP))

  val (s_start :: s_readInit :: s_run :: s_readout :: s_done :: Nil) = Enum(5)
  val state = RegInit(s_start)

  val reqAddrs = VecInit(Seq(
    0x00000, 0x2000, 0x4000, 0x6000,
    0x4000, 0x2000, 0x0000, 0x6000).map(_.U(16.W)))
  val reqSizes = VecInit(Seq(3, 2, 3, 1, 0, 1, 2, 3).map(_.U(3.W)))
  val reqLens = VecInit(Seq(0, 4, 5, 3, 1, 9, 4, 6).map(_.U(8.W)))

  def computeExpected(idx: Int) = (reqLens(idx) + 1.U) << reqSizes(idx)

  val readExpected = VecInit(Seq((0, 6), (1, 5), (2, 4), (3, 7)).map {
    case (a, b) => computeExpected(a) + computeExpected(b)
  })

  val (readIdx, readDone) = Counter(state.isOneOf(s_readInit, s_readout), readExpected.size)
  val (runIdx, runDone) = Counter(state === s_run, reqAddrs.size)

  val initValues = Reg(Vec(readExpected.size, UInt(counters.counterBits.W)))
  val initWriteIdx = RegNext(RegNext(readIdx))
  val initValid = RegNext(RegNext(state === s_readInit, false.B), false.B)
  val readData = Cat(counters.io.readout.dataH, counters.io.readout.dataL)

  counters.io.req.valid := state === s_run
  counters.io.req.bits.addr := reqAddrs(runIdx)
  counters.io.req.bits.len := reqLens(runIdx)
  counters.io.req.bits.size := reqSizes(runIdx)
  counters.io.readout.enable := state.isOneOf(s_readInit, s_readout)
  counters.io.readout.addr := readIdx

  when (initValid) { initValues(initWriteIdx) := readData }
  when (state === s_start && io.start) { state := s_readInit }
  when (state === s_readInit && readDone) { state := s_run }
  when (runDone) { state := s_readout }
  when (state === s_readout && readDone) { state := s_done }

  val expectedCount = RegNext(RegNext(readExpected(readIdx)))
  val readValid = RegNext(RegNext(state === s_readout, false.B), false.B)
  val readCount = readData - RegNext(RegNext(initValues(readIdx)))

  assert(!readValid || readCount === expectedCount)

  io.finished := state === s_done
}
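The expected readout values in this test follow AXI sizing arithmetic: each burst transfers `(len + 1)` beats of `2**size` bytes, accumulated into the counter whose address range the request falls in. A small Python model reproduces them (a sketch only; mapping a counter to the high-order address bits is an assumption about how AddressRangeCounter partitions the space):

```python
def range_counter_totals(addrs, sizes, lens, n_counters=8, addr_bits=16):
    """Accumulate bytes per address range: counter i covers the i-th
    equal slice of the address space (mapping assumed from high bits)."""
    shift = addr_bits - (n_counters.bit_length() - 1)  # 16 - log2(8) = 13
    totals = [0] * n_counters
    for addr, size, burst_len in zip(addrs, sizes, lens):
        totals[addr >> shift] += (burst_len + 1) << size  # beats * bytes/beat
    return totals

# Stimulus from the unit test above
addrs = [0x0000, 0x2000, 0x4000, 0x6000, 0x4000, 0x2000, 0x0000, 0x6000]
sizes = [3, 2, 3, 1, 0, 1, 2, 3]
lens  = [0, 4, 5, 3, 1, 9, 4, 6]
print(range_counter_totals(addrs, sizes, lens)[:4])  # [28, 40, 50, 64]
```

The four totals match the test's `readExpected` pairs, e.g. counter 0 sums requests 0 and 6: `(0+1)<<3 + (4+1)<<2 = 28`.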

// Checks AXI4 transactions to ensure they conform to the bounds
// set in the memory model configuration, e.g. that maximum burst lengths are respected.
// NOTE: For use only in a FAME1 context
class MemoryModelMonitor(cfg: BaseConfig)(implicit p: Parameters) extends MultiIOModule {
  val axi4 = IO(Input(new NastiIO))

  assert(!axi4.ar.fire || axi4.ar.bits.len < cfg.maxReadLength.U,
    s"Read burst length exceeds memory-model maximum of ${cfg.maxReadLength}")
  assert(!axi4.aw.fire || axi4.aw.bits.len < cfg.maxWriteLength.U,
    s"Write burst length exceeds memory-model maximum of ${cfg.maxWriteLength}")
}
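The monitor's bound checks can be sketched behaviorally in plain Python (an illustrative model, not the monitor itself; the `check_burst` helper and its arguments are hypothetical names):

```python
def check_burst(channel, burst_len, max_len):
    """Mirror of the monitor's assertion: a transaction is in bounds
    only when its burst-length field is strictly below the configured
    maximum, matching the `len < max` comparison in the RTL."""
    if not burst_len < max_len:
        raise AssertionError(
            f"{channel} burst length exceeds memory-model maximum of {max_len}")

check_burst("Read", 7, max_len=8)       # in bounds: no error
try:
    check_burst("Write", 8, max_len=8)  # out of bounds: rejected
except AssertionError as e:
    print(e)
```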

@@ -0,0 +1,160 @@
package midas.models.sram

import chisel3._
import chisel3.util.{Mux1H, Decoupled, RegEnable, log2Ceil, Enum}
import chisel3.experimental.{MultiIOModule, dontTouch}
//import chisel3.experimental.ChiselEnum
import chisel3.experimental.{DataMirror, requireIsChiselType}
import collection.immutable.ListMap

class AsyncMemModelGen(val depth: Int, val dataWidth: Int) extends ModelGenerator {
  assert(depth > 0)
  assert(dataWidth > 0)
  val emitModel = () => new AsyncMemChiselModel(depth, dataWidth)
  val emitRTLImpl = () => new AsyncMemChiselRTL(depth, dataWidth)
}

class AsyncMemChiselRTL(val depth: Int, val dataWidth: Int, val nReads: Int = 2, val nWrites: Int = 2) extends MultiIOModule {
  val channels = IO(new RegfileRTLIO(depth, dataWidth, nReads, nWrites))
  val data = Mem(depth, UInt(dataWidth.W))

  for (i <- 0 until nReads) {
    channels.read_resps(i) := data.read(channels.read_cmds(i).addr)
  }

  for (i <- 0 until nWrites) {
    val write_cmd = channels.write_cmds(i)
    def collides(c: WriteCmd) = c.active && write_cmd.active && (c.addr === write_cmd.addr)
    val collision_detected = channels.write_cmds.drop(i+1).foldLeft(false.B) {
      case (detected, cmd) => detected || collides(cmd)
    }

    when (write_cmd.active && !reset.toBool() && !collision_detected) {
      data.write(write_cmd.addr, write_cmd.data)
    }
  }
}
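The collision handling in the write loop can be modeled behaviorally in Python (an illustrative sketch, not part of the source; the tuple encoding of write commands is hypothetical): a write on port `i` is suppressed whenever a higher-indexed port actively targets the same address in the same host cycle, so the last writer wins.

```python
def apply_writes(mem, write_cmds):
    """write_cmds: list of (active, addr, data) tuples, one per write port.
    A write is dropped when a LATER port actively targets the same address,
    mirroring collision_detected in AsyncMemChiselRTL."""
    for i, (active, addr, data) in enumerate(write_cmds):
        collision = any(a and addr2 == addr
                        for a, addr2, _ in write_cmds[i + 1:])
        if active and not collision:
            mem[addr] = data
    return mem

# Both ports hit address 0 in the same host cycle: port 1's data lands.
mem = apply_writes({}, [(True, 0, 0xAA), (True, 0, 0xBB), (False, 1, 0xCC)])
```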

object AsyncMemChiselModel {
  //object ReadState extends ChiselEnum {
  //  val start, active, generated, responded = Value
  //}
  object ReadState {
    lazy val start :: active :: generated :: responded :: Nil = Enum(4)
  }
}

class AsyncMemChiselModel(val depth: Int, val dataWidth: Int, val nReads: Int = 2, val nWrites: Int = 2) extends MultiIOModule {

  // FSM states and helper functions
  //import AsyncMemChiselModel.ReadState
  import AsyncMemChiselModel.ReadState._
  val tupleAND = (vals: (Bool, Bool)) => vals._1 && vals._2
  val tupleOR = (vals: (Bool, Bool)) => vals._1 || vals._2

  // Channelized IO
  val channels = IO(new RegfileModelIO(depth, dataWidth, nReads, nWrites))

  // Target reset logic
  val target_reset_fired = Reg(Bool())
  val target_reset_available = target_reset_fired || channels.reset.valid
  val target_reset_reg = Reg(Bool())
  val target_reset_value = Mux(target_reset_fired, target_reset_reg, channels.reset.bits)

  // Host memory implementation
  val data = Mem(depth, UInt(dataWidth.W))
  val active_read_addr = Wire(UInt())
  val active_write_addr = Wire(UInt())
  val active_write_data = Wire(UInt())
  val active_write_en = Wire(Bool())
  val read_data_async = data.read(active_read_addr)
  val read_data = RegNext(read_data_async)
  when (active_write_en && !target_reset_value && !reset.toBool()) {
    data.write(active_write_addr, active_write_data)
  }

  // Read request management and response data buffering
  val read_state = Reg(Vec(nReads, start.cloneType))
  val read_resp_data = Reg(Vec(nReads, UInt(dataWidth.W)))
  val read_access_req = (read_state zip channels.read_cmds) map { case (s, cmd) => s === start && cmd.valid }

  // Don't use priority encoder because bools catted to ints considered hard to QED
  val read_access_available = read_access_req.scanLeft(true.B)({ case (open, claim) => open && !claim }).init
  val read_access_granted = (read_access_req zip read_access_available) map tupleAND

  // Have all reads actually been performed?
  val reads_done = read_state.foldLeft(true.B) { case (others_done, s) => others_done && s =/= start }

  // This is used to overlap last read and first write -- depends on READ_FIRST implementation
  val reads_finishing = (read_state zip channels.read_cmds).foldLeft(true.B) {
    case (finishing, (s, cmd)) => finishing && (s =/= start || cmd.fire)
  }

  // Are all reads done or finishing this cycle?
  val outputs_responded_or_firing = (read_state zip channels.read_resps).foldLeft(true.B) {
    case (res, (s, resp)) => res && (s === responded || resp.fire)
  }

  // Write request management
  val write_complete = Reg(Vec(nWrites, Bool()))

  // Order writes for determinism
  val write_prereqs_met = (true.B +: write_complete.init) map { case p => p && reads_done && target_reset_available }

  // Are all writes done or finishing this cycle?
  val writes_done_or_finishing = (write_complete zip channels.write_cmds).foldLeft(true.B) {
    case (res, (complete, cmd)) => res && (complete || cmd.fire)
  }

  val advance_cycle = outputs_responded_or_firing && writes_done_or_finishing

  // Target reset state management
  channels.reset.ready := !target_reset_fired
  when (advance_cycle || reset.toBool()) {
    target_reset_fired := false.B
  } .elsewhen (channels.reset.fire) {
    target_reset_fired := true.B
    target_reset_reg := channels.reset.bits
  }

  // Read state management
  active_read_addr := channels.read_cmds(0).bits.addr
  for (i <- 0 until nReads) {
    when (read_access_granted(i)) { active_read_addr := channels.read_cmds(i).bits.addr }

    channels.read_cmds(i).ready := read_state(i) === start && read_access_available(i)
    channels.read_resps(i).bits := Mux(read_state(i) === active, read_data, read_resp_data(i))
    channels.read_resps(i).valid := read_state(i) === active || read_state(i) === generated

    when (advance_cycle || reset.toBool()) {
      read_state(i) := start
    } .elsewhen (read_state(i) === start && read_access_granted(i)) {
      read_state(i) := active
    } .elsewhen (read_state(i) === active) {
      read_state(i) := Mux(channels.read_resps(i).fire, responded, generated)
      read_resp_data(i) := read_data
    } .elsewhen (read_state(i) === generated && channels.read_resps(i).fire) {
      read_state(i) := responded
    }
  }

  // Write state management
  active_write_addr := channels.write_cmds(0).bits.addr
  active_write_data := channels.write_cmds(0).bits.data
  active_write_en := false.B
  for (i <- 0 until nWrites) {
    channels.write_cmds(i).ready := write_prereqs_met(i) && !write_complete(i)
    when (advance_cycle || reset.toBool()) {
      write_complete(i) := false.B
    } .elsewhen (channels.write_cmds(i).fire) {
      write_complete(i) := true.B
    }
    when (channels.write_cmds(i).fire) {
      active_write_addr := channels.write_cmds(i).bits.addr
      active_write_data := channels.write_cmds(i).bits.data
      active_write_en := channels.write_cmds(i).bits.active
    }
  }

}
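The `scanLeft` arbitration used for read claims (chosen over a priority encoder, per the comment in the model) grants exactly the lowest-indexed active requester per host cycle. A small Python model of that logic (illustrative only, using `itertools.accumulate` as the analogue of `scanLeft`):

```python
from itertools import accumulate

def grant(requests):
    """Slot i is available only if no lower-indexed slot is requesting,
    so exactly the first active requester is granted.
    Mirrors: scanLeft(true.B)({ case (open, claim) => open && !claim }).init"""
    available = list(accumulate(requests,
                                lambda open_, claim: open_ and not claim,
                                initial=True))[:-1]
    return [r and a for r, a in zip(requests, available)]

print(grant([False, True, True, False]))  # [False, True, False, False]
```

Only the first active request (index 1) is granted; the second requester must wait for a later host cycle.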