65 lines
3.0 KiB
ReStructuredText
65 lines
3.0 KiB
ReStructuredText
.. _debugging-hanging-simulators:
|
|
|
|
Debugging a Hanging Simulator
|
|
=============================
|
|
|
|
A common symptom of a failing simulation is that appears to hang. Debugging this is
|
|
especially daunting in FireSim because it's not immediately obvious if it's a bug in the
|
|
target, or somewhere in the host. To make it easier to identify the problem, the
|
|
simulation driver includes a polling watchdog that tracks for simulation progress, and
|
|
periodically updates an output file, ``heartbeat.csv``, with a target cycle count and a
|
|
timestamp. When debugging these issues, we always encourage the use of metasimulation to
|
|
try reproducing the failure if possible. We outline three common cases in the section
|
|
below.
|
|
|
|
Case 1: Target hang.
|
|
--------------------
|
|
|
|
**Symptoms:** There is no output from the target (i.e., the uartlog might cease), but
|
|
simulated time continues to advance (``heartbeat.csv`` will be periodically updated).
|
|
Simulator instrumentation (TracerV, printf) may continue to produce new output.
|
|
|
|
**Causes:** Typically, a bug in the target RTL. However, bridge bugs leading to
|
|
erroneous token values will also produce this behavior.
|
|
|
|
**Next steps:** You can deploy the full suite of FireSim's debugging tools for failures
|
|
of this nature, since assertion synthesis, printf synthesis, and other target-side
|
|
features still function. Assume there is a bug in the target RTL and trace back the
|
|
failure to a bridge if applicable.
|
|
|
|
Case 2: Simulator hang due to FPGA-side token starvation.
|
|
---------------------------------------------------------
|
|
|
|
**Symptoms:** The driver's main loop spins freely, as no bridge gets new work to do. As
|
|
a result, the polling interval quickly elapses and the simulation is torn down due to a
|
|
lack of forward progress.
|
|
|
|
**Causes:** Generally, a bug in a bridge implementation (ex. the BridgeModule has
|
|
accidentally dequeued a token without producing a new output token; the BridgeModule is
|
|
waiting on a driver interaction that never occurs).
|
|
|
|
**Next steps:** These are the trickiest to solve. Try to identify the bridge that's
|
|
responsible by removing unnecessary ones, using an AutoILA, and adding printfs to
|
|
BridgeDriver sources. Target-side debugging utilities may be used to identify
|
|
problematic target behavior, but tend not to be useful for identifying the root cause.
|
|
|
|
Case 3: Simulator hang due to driver-side deadlock.
|
|
---------------------------------------------------
|
|
|
|
**Symptoms:** The loss of all output, notably, ``heartbeat.csv`` ceases to be further
|
|
updated.
|
|
|
|
**Causes:** Generally, a bridge driver bug. For example, the driver may be busy waiting
|
|
on some output from the FPGA, but the FPGA-hosted part of the simulator has stalled
|
|
waiting for tokens.
|
|
|
|
**Next Steps:** Identify the buggy driver using printfs or attaching to the running
|
|
simulator using GDB.
|
|
|
|
Simulator Heartbeat PlusArgs
|
|
----------------------------
|
|
|
|
``+heartbeat-polling-interval=<int>``: Specifies the number of round trips through the
|
|
simulator main loop before polling the FPGA's target cycle counter. Disable the
|
|
heartbeat by setting this to -1.
|