NEMU (NJU Emulator) is a simple but complete full-system emulator designed for teaching purposes.
It originally supported x86, mips32, riscv64, and riscv32.
This repo only guarantees support for riscv64.
The main features of NEMU include
a small monitor with a simple debugger
  single step
  register/memory examination
  expression evaluation without the support of symbols
  watch point
  differential testing against reference design (e.g. QEMU)
  snapshot
CPU core with support of most common ISAs
  x86
    real mode is not supported
    x87 floating point instructions are not supported
  mips32
    CP1 floating point instructions are not supported
  riscv32
    only RV32IM
  riscv64
    rv64gcbhk currently
    rv64gcbhkv in the near future
memory
paging
  TLB is optional (but necessary for mips32)
  protection is not supported for most ISAs, but PMP is supported for riscv64
NEMU checkpoints are not compatible with GEM5’s SE checkpoints or m5 checkpoints.
NEMU cannot produce GEM5’s SE checkpoints or m5 checkpoints
NEMU cannot run GEM5’s SE checkpoints or m5 checkpoints
Producing a checkpoint in M-mode is NOT recommended
Please DO NOT
Please don’t run a SimPoint bbv.gz with NEMU, XS-GEM5, or XiangShan processor, because it is not bootable
Please don’t open a new issue without reading the doc
Please don’t open a new issue without searching the issue list
Please don’t open a new issue about building Linux in NEMU’s issue list; please head to the XiangShan doc instead
The role of NEMU in XiangShan ecosystem
NEMU plays the following roles in XiangShan ecosystem:
In reference mode, NEMU is the golden model of XiangShan processor (paper: MINJIE; code to adapt NEMU with XiangShan: Difftest)
In standalone mode, NEMU is able to produce SimPoint BBVs and checkpoints for XS-GEM5 and XiangShan processor.
In standalone mode, NEMU can also be used as a profiler for large programs.
Workflows: How to use NEMU in XiangShan
Run in reference mode
NEMU can be used as a reference design
to validate the correctness of XiangShan processor or XS-GEM5.
The typical workflow is as follows.
Concrete instructions are described in Section build-NEMU-as-ref.
graph TD;
build["Build NEMU in reference mode"]
so[/"./build/riscv64-nemu-interpreter-so"/]
cosim["Run XS-GEM5 or XiangShan processor, turn on difftest, specify riscv64-nemu-interpreter-so as reference design"]
build-->so
so-->cosim
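The co-simulation step in the diagram above can be pictured as a lockstep comparison: after every instruction, the design under test and the reference model must agree on architectural state. The following is a minimal Python sketch of that idea only. `ToyCore`, `step`, `state`, and `difftest` are hypothetical names made up for illustration; they are not the Difftest or NEMU API, and the toy instruction format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Toy machine (hypothetical): a PC, 4 registers, and (dest, src, imm) add instructions."""
    program: list
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)
    buggy: bool = False  # hypothetical switch that injects a bug into the design under test

    def step(self):
        dest, src, imm = self.program[self.pc]
        value = self.regs[src] + imm
        if self.buggy and self.pc == 2:
            value += 1  # injected error in the third instruction
        self.regs[dest] = value
        self.pc += 1

    def state(self):
        return (self.pc, tuple(self.regs))


def difftest(dut, ref, max_steps):
    """Step both models in lockstep; return the index of the first mismatch, or None."""
    for i in range(max_steps):
        dut.step()
        ref.step()
        if dut.state() != ref.state():
            return i
    return None


program = [(1, 0, 5), (2, 1, 3), (3, 2, 1), (0, 3, 0)]
good = difftest(ToyCore(program), ToyCore(program), len(program))
bad = difftest(ToyCore(program, buggy=True), ToyCore(program), len(program))
print(good, bad)  # None 2
```

Comparing after every step is what lets the first divergent instruction be pinpointed, rather than only detecting a wrong final result.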
graph TD;
am["Build a baremetal app with AM"]
linux["Build a Linux image containing user app"]
baremetal[/"Image of baremetal app or OS"/]
run["Run image with NEMU, XS-GEM5, or XiangShan processor"]
am-->baremetal
linux-->baremetal
baremetal-->run
Run in standalone to produce checkpoints
Most enterprise users and researchers are more interested in running larger workloads,
such as SPECCPU, on XS-GEM5 or XiangShan processor.
To reduce the time spent in detailed simulation, NEMU serves as a checkpoint producer.
The flow for producing and running checkpoints is as follows.
The detailed instructions for each step are described in Section Howto.
graph TD;
linux["Build a Linux image containing NEMU trap app and user app"]
bin[/"Image containing Linux and app"/]
profiling["Boot image with NEMU with SimPoint profiling"]
bbv[/"SimPoint BBV, a .gz file"/]
cluster["Cluster BBV with SimPoint"]
points[/"SimPoint sampled points and weights"/]
take_cpt["Boot image with NEMU to produce checkpoints"]
checkpoints[/"Checkpoints, several .gz files of memory image"/]
run["Run checkpoints with XS-GEM5 or XiangShan processor"]
linux-->bin
bin-->profiling
profiling-->bbv
bbv-->cluster
cluster-->points
points-->take_cpt
take_cpt-->checkpoints
checkpoints-->run
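After the last step of this flow, per-checkpoint results are commonly combined with the SimPoint weights to estimate whole-program behavior. The sketch below shows that weighted average in Python. It is an illustration under assumptions: `weighted_cpi` is a hypothetical helper, not part of NEMU or SimPoint, and the weights and CPI values are made-up placeholders rather than measurements.

```python
def weighted_cpi(results):
    """Estimate whole-program CPI from (weight, cpi) pairs, one pair per checkpoint.

    Weights are normalized first, so they do not have to sum to exactly 1.
    """
    total_weight = sum(weight for weight, _ in results)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weight * cpi for weight, cpi in results) / total_weight


# Placeholder values for three checkpoints: (SimPoint weight, CPI from detailed simulation).
results = [(0.5, 1.0), (0.25, 2.0), (0.25, 4.0)]
estimate = weighted_cpi(results)
print(estimate)  # 2.0
```

Averaging CPI (rather than IPC) with these weights is the natural choice here because every interval covers the same number of instructions.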
Howto
Install dependencies
The installation commands differ between distributions because each uses its own package management tool.
For Ubuntu, users can install the dependencies with the following command:
What is NOT supported
Run in standalone mode without checkpoint
The typical flow for running workloads is similar for NEMU, XS-GEM5, and XiangShan processor. All of them only support full-system simulation. To prepare workloads for full-system simulation, users need to either build a baremetal app or run user programs in an operating system.
Use NEMU as reference design
Build reference.so
To build NEMU as reference design, run
./build/riscv64-nemu-interpreter-so is the reference design. Specifically, xxx-ref_defconfig varies for different ISA extensions.
Cosimulation
To test XS-GEM5 against NEMU, refer to the doc of XS-GEM5 Difftest.
To test XiangShan processor against NEMU, run
Details can be found in the tutorial of XiangShan.
Workloads
As described in the workflow, NEMU either takes a baremetal app or an operating system image as input.
For baremetal app, Abstract Machine is a light-weight baremetal library. Common simple apps like coremark and dhrystone can be built with Abstract Machine.
To build an operating system image, please read the doc to build Linux.
Then modify NEMU_HOME and BBL_PATH in $NEMU_HOME/scripts/checkpoint_example/checkpoint_env.sh and the workload parameter passed to the function in each example script to get started.
SimPoint profiling and checkpoint
Please read the doc to generate checkpoints
Run a checkpoint with XS-GEM5 or XiangShan processor
Run a checkpoint with XiangShan processor
Run checkpoints with XS-GEM5: the doc to run XS-GEM5
FAQ
Why can’t I produce a checkpoint in M-mode?
Read the source code of GCPT restorer
We restore a checkpoint in M mode, and the PC used to return to user mode is stored in the EPC register. This recovery method breaks the architectural state (EPC) if the checkpoint is produced in M mode. In contrast, if the checkpoint is produced in S mode or U mode, the return process is just like a normal trap return, which does not break the architectural state.
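The explanation above can be illustrated with a toy model. This is only a sketch of the reasoning, not the GCPT restorer's code: `Snapshot`, `restore`, and `state_preserved` are hypothetical names, the EPC register is assumed to be the M-mode `mepc`, and the addresses are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    priv: str  # privilege mode when the checkpoint was taken: "M", "S", or "U"
    pc: int    # PC of the next instruction to execute
    mepc: int  # value of the M-mode exception PC at that moment


def restore(snap: Snapshot) -> Snapshot:
    """Toy restorer (assumption): running in M-mode, it writes the resume PC
    into mepc and then performs a trap return to snap.priv."""
    resume_pc = snap.pc
    return Snapshot(priv=snap.priv, pc=resume_pc, mepc=resume_pc)


def state_preserved(snap: Snapshot) -> bool:
    """True if the resumed code cannot tell that it was checkpointed."""
    after = restore(snap)
    if (after.priv, after.pc) != (snap.priv, snap.pc):
        return False
    # mepc is an M-mode register: S/U-mode code cannot read it, so only a
    # checkpoint taken in M-mode can observe that it was overwritten.
    return snap.priv != "M" or after.mepc == snap.mepc


user_cpt = Snapshot(priv="U", pc=0x10000, mepc=0x80000040)
machine_cpt = Snapshot(priv="M", pc=0x80000100, mepc=0x10000)
print(state_preserved(user_cpt), state_preserved(machine_cpt))  # True False
```

In the M-mode case the interrupted code may still need the old `mepc` value, for example to return from a trap handler, which is why the overwritten register matters there.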
Cannot build/run NEMU on cpt-bk or tracing branch
Please use master branch. The checkpoint related code is not merged from tracing branch into master
How to run a checkpoint with XiangShan processor?
First, make sure you have obtained a checkpoint.gz, not a bbv.gz. Then, see the doc to run checkpoints.
bbv.gz is empty
First, make sure the interval size is smaller than the total instruction count of the application. Second, it is not necessary to produce checkpoints for small applications with few intervals.
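A quick way to check the first point is to count how many complete intervals the application contains. The Python sketch below assumes that the profiler emits one BBV record per completed interval; `count_full_intervals` is a hypothetical helper and the instruction counts are placeholders.

```python
def count_full_intervals(total_instructions, interval_size):
    """Number of complete profiling intervals in a run of total_instructions."""
    if interval_size <= 0:
        raise ValueError("interval_size must be positive")
    return total_instructions // interval_size


# Placeholder workload: a program that retires 30M instructions.
too_large = count_full_intervals(30_000_000, 40_000_000)
reasonable = count_full_intervals(30_000_000, 10_000_000)
print(too_large, reasonable)  # 0 3
```

With a 40M interval the 30M-instruction program never completes an interval, which under the stated assumption leaves nothing to write to bbv.gz.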
How to pick an interval size for SimPoint?
The typical sampling interval size used in architecture research is 10M-200M instructions, while the typical warmup interval size is 20M-100M instructions. The right choice depends on your cache size and use case. For example, when studying cache’s temporal locality, it is better to use a larger interval size (>=50M).
How long does a 40M simulation take?
The simulation time depends on the IPC of the application and the complexity of the CPU model. For Verilator simulation of XiangShan processor, the simulation time varies from hours to days. For XS-GEM5, the simulation time typically ranges from 6 minutes to 1 hour.
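The dependence on IPC can be made concrete with a back-of-the-envelope estimate: the number of simulated cycles is the instruction count divided by IPC, and wall-clock time is that divided by the simulator's speed. In the Python sketch below, `estimate_seconds` is a hypothetical helper and the speed of 2000 simulated cycles per second is an arbitrary placeholder, not a measured figure for any simulator.

```python
def estimate_seconds(instructions, ipc, cycles_per_second):
    """Wall-clock seconds to simulate `instructions` at a given IPC and simulator speed."""
    if ipc <= 0 or cycles_per_second <= 0:
        raise ValueError("ipc and cycles_per_second must be positive")
    simulated_cycles = instructions / ipc
    return simulated_cycles / cycles_per_second


# 40M instructions at a placeholder speed of 2000 simulated cycles per second.
fast_app = estimate_seconds(40_000_000, ipc=1.0, cycles_per_second=2_000)
slow_app = estimate_seconds(40_000_000, ipc=0.5, cycles_per_second=2_000)
print(fast_app / 3600, slow_app / 3600)  # hours for each case
```

Halving the IPC doubles the number of cycles to simulate, so the same 40M instructions take twice as long at a fixed simulator speed.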
Error when building Linux, riscv-pk, or OpenSBI
First, check the FAQs of building Linux kernel for XiangShan
Then, search for a solution in the issue list of NEMU and the issue list of XiangShan doc.
Finally, if you cannot find a solution, please open a new issue in XiangShan doc.