Manager Tasks
=============

This page outlines all of the tasks that the FireSim manager supports.

.. _firesim-managerinit:

``firesim managerinit``
-----------------------

This is a setup command that does the following:

- Back up existing config files if they exist (``config_runtime.yaml``,
  ``config_build.yaml``, ``config_build_recipes.yaml``, and ``config_hwdb.yaml``).
- Replace the default config files (``config_runtime.yaml``, ``config_build.yaml``,
  ``config_build_recipes.yaml``, and ``config_hwdb.yaml``) with clean example versions.

Then, do platform-specific init steps for the given ``--platform``.

.. tabs::

    .. tab::

        ``f1``

        - Run ``aws configure``, prompt for credentials
        - Prompt the user for email address and subscribe them to
          notifications for their own builds.
        - Set up the ``config_runtime.yaml`` and ``config_build.yaml``
          files with AWS run/build farm arguments.

    .. tab::

        All other platforms

        This includes platforms such as: ``xilinx_alveo_u200``,
        ``xilinx_alveo_u250``, ``xilinx_alveo_u280``, ``xilinx_vcu118``,
        and ``rhsresearch_nitefury_ii``.

        - Set up the ``config_runtime.yaml`` and ``config_build.yaml``
          files with externally provisioned run/build farm arguments.

You can re-run this whenever you want to get clean configuration files.

.. note::

    For ``f1``, you can just hit Enter when prompted for ``aws configure`` credentials
    and your email address, and both will keep your previously specified values.

If you run this command by accident and didn't mean to overwrite your configuration
files, you'll find backed-up versions in
``firesim/deploy/sample-backup-configs/backup*``.

.. _firesim-buildbitstream:

``firesim buildbitstream``
--------------------------

This command builds a FireSim bitstream using a **Build Farm** from the Chisel RTL for
the configurations that you specify. The process of defining configurations to build is
explained in the documentation for :ref:`config-build` and :ref:`config-build-recipes`.

For each config, the build process entails:

.. tabs::

    .. tab::

        F1

        #. [Locally] Run the elaboration process for your hardware
           configuration
        #. [Locally] FAME-1 transform the design with MIDAS
        #. [Locally] Attach simulation models (I/O widgets, memory model,
           etc.)
        #. [Locally] Emit Verilog to run through the FPGA Flow
        #. Use a build farm configuration to launch/use build hosts for
           each configuration you want to build
        #. [Local/Remote] Prep build hosts, copy generated Verilog for
           hardware configuration to build instance
        #. [Local or Remote] Run Vivado Synthesis and P&R for the
           configuration
        #. [Local/Remote] Copy back all output generated by Vivado
           including the final tar file
        #. [Local/AWS Infra] Submit the tar file to the AWS backend for
           conversion to an AFI
        #. [Local] Wait for the AFI to become available, then notify the
           user of completion by email

    .. tab::

        XDMA-based On-Prem.

        #. [Locally] Run the elaboration process for your hardware
           configuration
        #. [Locally] FAME-1 transform the design with MIDAS
        #. [Locally] Attach simulation models (I/O widgets, memory model,
           etc.)
        #. [Locally] Emit Verilog to run through the FPGA Flow
        #. Use a build farm configuration to launch/use build hosts for
           each configuration you want to build
        #. [Local/Remote] Prep build hosts, copy generated Verilog for
           hardware configuration to build instance
        #. [Local or Remote] Run Vivado Synthesis and P&R for the
           configuration
        #. [Local/Remote] Copy back all output generated by Vivado
           (including ``bit`` bitstream)

    .. tab::

        Vitis-based On-Prem.

        #. [Locally] Run the elaboration process for your hardware
           configuration
        #. [Locally] FAME-1 transform the design with MIDAS
        #. [Locally] Attach simulation models (I/O widgets, memory model,
           etc.)
        #. [Locally] Emit Verilog to run through the FPGA Flow
        #. Use a build farm configuration to launch/use build hosts for
           each configuration you want to build
        #. [Local/Remote] Prep build hosts, copy generated Verilog for
           hardware configuration to build instance
        #. [Local or Remote] Run Vitis Synthesis and P&R for the
           configuration
        #. [Local/Remote] Copy back all output generated by Vitis
           (including the ``bitstream_tar`` containing the ``xclbin``
           bitstream)

This process happens in parallel for all of the builds you specify. The command exits
once all builds have completed (on F1, you are also notified as each INDIVIDUAL build
completes), and its exit code indicates whether all builds passed or a build failed.

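Because the result is reported through the exit code, a script can wrap the command and
act on the outcome. The following is a minimal sketch, not part of the FireSim manager:
the helper name ``run_and_report`` is hypothetical, and it assumes the usual Unix
convention that an exit code of zero means all builds passed and any non-zero code
means at least one build failed.

.. code-block:: python

    import subprocess


    def run_and_report(argv):
        """Run a command and return (exit_code, summary).

        Assumption: exit code 0 means every build passed; any other
        value means at least one build failed.
        """
        result = subprocess.run(argv)
        if result.returncode == 0:
            return 0, "all builds passed"
        return result.returncode, "at least one build failed"


    # Hypothetical usage from a script on the manager instance:
    #   code, summary = run_and_report(["firesim", "buildbitstream"])
    #   print(summary)
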
.. note::

    **It is highly recommended that you either run this command in a** ``screen`` **or
    use** ``mosh`` **to access the manager instance. Builds will not finish if the
    manager is killed due to ssh disconnection from the manager instance.**

When you run a build for a particular configuration, a directory named
``LAUNCHTIME-CONFIG_TRIPLET-BUILD_NAME`` is created in
``firesim/deploy/results-build/``. This directory will contain:

.. tabs::

    .. tab::

        F1

        - ``AGFI_INFO``: Describes the state of the AFI being built,
          while the manager is running. Upon build completion, this
          contains the AGFI/AFI that was produced, along with its
          metadata.
        - ``cl_firesim``: This directory is essentially the Vivado
          project that built the FPGA image, in the state it was in when
          the Vivado build process completed. This contains reports,
          stdout from the build, and the final tar file produced by
          Vivado. This also contains a copy of the generated Verilog
          (``FireSim-generated.sv``) used to produce this build.

    .. tab::

        XDMA-based On-Prem.

        The Vivado project collateral that built the FPGA image, in the
        state it was in when the Vivado build process completed. This
        contains reports, ``stdout`` from the build, and the final
        ``bitstream_tar`` bitstream/metadata file produced by Vivado. This
        also contains a copy of the generated Verilog
        (``FireSim-generated.sv``) used to produce this build.

    .. tab::

        Vitis-based On-Prem.

        The Vitis project collateral that built the FPGA image, in the
        state it was in when the Vitis build process completed. This
        contains reports, ``stdout`` from the build, and the final
        ``bitstream_tar`` produced from the Vitis-generated ``xclbin``
        bitstream. This also contains a copy of the generated Verilog
        (``FireSim-generated.sv``) used to produce this build.

If this command is cancelled by a SIGINT, it will prompt for confirmation that you want
to terminate the build instances. If you respond in the affirmative, it will move
forward with the termination. If you do not want to have to confirm the termination
(e.g. you are using this command in a script), you can give the command the
``--forceterminate`` command line argument. For example, the following will terminate
all build instances in the build farm without prompting for confirmation if a SIGINT is
received:

.. code-block:: bash

    firesim buildbitstream --forceterminate

.. _firesim-builddriver:

``firesim builddriver``
-----------------------

For FPGA-based simulations (when ``metasimulation_enabled`` is ``false`` in
``config_runtime.yaml``), this command will build the host-side simulation driver
without requiring any simulation hosts to be launched or reachable. For complicated
designs, running this before running ``firesim launchrunfarm`` can reduce the time that
FPGA hosts spend idling while waiting for the driver build.

For metasimulations (when ``metasimulation_enabled`` is ``true`` in
``config_runtime.yaml``), this command will build the entire software simulator without
requiring any simulation hosts to be launched or reachable. This is useful, for example,
if you are using FireSim metasimulations as your primary simulation tool while
developing target RTL, since it allows you to run the Chisel build flow and iterate on
your design without launching/setting up extra machines to run simulations.

.. _firesim-tar2afi:

``firesim tar2afi``
-------------------

.. note::

    Can only be used for the F1 platform.

This command can be used to run only steps 9 and 10 (submitting the tar file to the AWS
backend and waiting for the AFI) of an aborted ``firesim buildbitstream`` for F1 that
has been manually corrected. ``firesim tar2afi`` assumes that you have a
``firesim/deploy/results-build/LAUNCHTIME-CONFIG_TRIPLET-BUILD_NAME/cl_firesim``
directory tree that can be submitted to the AWS backend for conversion to an AFI.

When using this command, you also need to provide the ``--launchtime LAUNCHTIME``
command line argument, specifying an already existing LAUNCHTIME.

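Before re-running with ``--launchtime``, it can help to check which result directories
for that LAUNCHTIME actually contain a ``cl_firesim`` tree. The sketch below is an
illustration only and is not part of the manager: the helper name
``find_tar2afi_candidates`` is hypothetical, and it assumes that LAUNCHTIME is the
literal prefix of the directory name, followed by a hyphen, as in the naming scheme
described for ``results-build``.

.. code-block:: python

    from pathlib import Path


    def find_tar2afi_candidates(results_build_dir, launchtime):
        """List build result directories for `launchtime` that hold a cl_firesim tree.

        Assumption: directory names start with the LAUNCHTIME string
        followed by a hyphen.
        """
        root = Path(results_build_dir)
        return sorted(
            d for d in root.glob(f"{launchtime}-*")
            if (d / "cl_firesim").is_dir()
        )

Directories that match the LAUNCHTIME but lack ``cl_firesim`` are skipped, since
there is nothing in them to submit.
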
This command will run for the configurations specified in :ref:`config-build` and
:ref:`config-build-recipes` as with :ref:`firesim-buildbitstream`. You will likely
want to comment out build recipe names that successfully completed the
:ref:`firesim-buildbitstream` process before running this command.

.. _firesim-shareagfi:

``firesim shareagfi``
---------------------

.. note::

    Can only be used for the F1 platform.

This command allows you to share AGFIs that you have already built (that are listed in
:ref:`config-hwdb`) with other users. It will take the named hardware configurations
that you list in the ``agfis_to_share`` section of ``config_build.yaml``, grab the
respective AGFIs for each from ``config_hwdb.yaml``, and share them across all F1
regions with the users listed in the ``share_with_accounts`` section of
``config_build.yaml``. You can also specify ``public: public`` in
``share_with_accounts`` to make the AGFIs public.

You must own the AGFIs in order to do this -- this will NOT let you share AGFIs that
someone else owns and gave you access to.

.. _firesim-launchrunfarm:

``firesim launchrunfarm``
-------------------------

.. note::

    Can only be used for the F1 platform.

This command launches a **Run Farm** on AWS EC2 on which you run simulations. Run farms
consist of a set of **run farm instances** that can be spawned on AWS EC2. The
``run_farm`` mapping in ``config_runtime.yaml`` determines the run farm used and its
configuration (see :ref:`config-runtime`). The ``base_recipe`` key/value pair specifies
the default set of arguments to use for a particular run farm type. To change the run
farm type, a new ``base_recipe`` file must be provided from ``deploy/run-farm-recipes``.
You can override the arguments given by a ``base_recipe`` by adding keys/values to the
``recipe_arg_overrides`` mapping. These keys/values must follow the same mapping
structure as the ``args`` mapping. Overrides are applied recursively, such that all
keys/values present in the override args replace the default arguments given by the
``base_recipe``. In the case of sequences, an overridden sequence completely replaces
the corresponding sequence in the default args.

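The override rule described above (mappings merge recursively, sequences are replaced
wholesale) can be sketched as follows. This is an illustration of the rule, not the
manager's actual implementation: the function name ``apply_overrides`` is hypothetical,
and the ``hosts`` and ``spot`` keys and all values in the usage comment are made up
for the example.

.. code-block:: python

    import copy


    def apply_overrides(base_args, overrides):
        """Return a copy of base_args with overrides applied recursively.

        Nested mappings are merged key by key. Any other value, including
        a sequence, replaces the corresponding default entirely.
        """
        merged = copy.deepcopy(base_args)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = apply_overrides(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged


    # Hypothetical usage (key names and values are illustrative only):
    #   base = {"hosts": [{"f1.16xlarge": 1}, {"f1.2xlarge": 2}],
    #           "spot": {"spot_interruption_behavior": "terminate",
    #                    "spot_max_price": "ondemand"}}
    #   apply_overrides(base, {"hosts": [{"f1.2xlarge": 1}],
    #                          "spot": {"spot_max_price": "1.0"}})

In this usage, the overriding ``hosts`` list replaces the default list rather than
being appended to it, while inside ``spot`` only ``spot_max_price`` changes.
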
An AWS EC2 run farm consists of AWS instances like ``f1.16xlarge``, ``f1.4xlarge``,
``f1.2xlarge``, and ``m4.16xlarge`` instances. Before you run the command, you define
the number of each that you want in the ``recipe_arg_overrides`` section of
``config_runtime.yaml`` or in the ``base_recipe`` itself.

A launched run farm is tagged with a ``run_farm_tag``, which is used to disambiguate
multiple parallel run farms; that is, you can have many run farms running, each running
a different experiment at the same time, each with its own unique ``run_farm_tag``. One
convenient feature to add to your AWS management panel is the column for
``fsimcluster``, which contains the ``run_farm_tag`` value. You can see how to do that
in the :ref:`fsimcluster-aws-panel` section.

The other options in the ``run_farm`` section, ``run_instance_market``,
``spot_interruption_behavior``, and ``spot_max_price``, define *how* instances in the
run farm are launched. See the documentation for ``config_runtime.yaml`` for more
details on other arguments (see :ref:`config-runtime`).

**ERRATA**: One current requirement is that you must define a target config in the
``target_config`` section of ``config_runtime.yaml`` that does not require more
resources than the run farm you are trying to launch. Thus, you should also set up your
``target_config`` parameters before trying to launch the corresponding run farm. This
requirement will be removed in the future.

Once you set up your configuration and call ``firesim launchrunfarm``, the command will
launch the run farm. If all succeeds, you will see the command print out instance IDs
for the correct number/types of instances (you do not need to pay attention to these or
record them). If an error occurs, it will be printed to the console.

.. warning::

    On AWS EC2, once you run this command, your run farm will continue to run until you
    call ``firesim terminaterunfarm``. This means you will be charged for the running
    instances in your run farm until you call ``terminaterunfarm``. You are responsible
    for ensuring that instances are only running when you want them to be by checking
    the AWS EC2 Management Panel.

.. _firesim-terminaterunfarm:

``firesim terminaterunfarm``
----------------------------

.. note::

    Can only be used for the F1 platform.

This command terminates some or all of the instances in the Run Farm defined in your
``config_runtime.yaml`` file by the ``run_farm`` ``base_recipe``, depending on the
command line arguments you supply.

By default, running ``firesim terminaterunfarm`` will terminate ALL instances with the
specified ``run_farm_tag``. When you run this command, it will prompt for confirmation
that you want to terminate the listed instances. If you respond in the affirmative, it
will move forward with the termination.

If you do not want to have to confirm the termination (e.g. you are using this command
in a script), you can give the command the ``--forceterminate`` command line argument.
For example, the following will TERMINATE ALL INSTANCES IN THE RUN FARM WITHOUT
PROMPTING FOR CONFIRMATION:

.. code-block:: bash

    firesim terminaterunfarm --forceterminate

The ``--terminatesome=INSTANCE_TYPE:COUNT`` flag additionally allows you to terminate
only some (``COUNT``) of the instances of a particular type (``INSTANCE_TYPE``) in a
particular Run Farm.

Here are some examples:

.. code-block:: bash

    # start with 2 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges

    firesim terminaterunfarm --terminatesome=f1.16xlarge:1 --forceterminate

    # now, we have: 1 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges

.. code-block:: bash

    # start with 2 f1.16xlarges, 2 f1.2xlarges, 2 m4.16xlarges

    firesim terminaterunfarm --terminatesome=f1.16xlarge:1 --terminatesome=f1.2xlarge:2 --forceterminate

    # now, we have: 1 f1.16xlarges, 0 f1.2xlarges, 2 m4.16xlarges

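The arithmetic behind the examples above can be sketched as a small helper that takes
the current instance counts and a list of ``INSTANCE_TYPE:COUNT`` values. This is an
illustration only, not the manager's code: both function names are hypothetical, and
the handling of repeated types (counts are summed) and of requests larger than the
running count (clamped to zero) are assumptions.

.. code-block:: python

    def parse_terminatesome(specs):
        """Turn ["f1.16xlarge:1", ...] into {"f1.16xlarge": 1, ...}.

        Assumption: repeated entries for the same type are summed.
        """
        counts = {}
        for spec in specs:
            instance_type, _, count = spec.rpartition(":")
            if not instance_type or not count.isdigit():
                raise ValueError(f"expected INSTANCE_TYPE:COUNT, got {spec!r}")
            counts[instance_type] = counts.get(instance_type, 0) + int(count)
        return counts


    def remaining_after_terminate(running, specs):
        """Return the instance counts left after applying the given specs.

        Assumption: asking for more instances than are running leaves zero.
        """
        remaining = dict(running)
        for instance_type, count in parse_terminatesome(specs).items():
            remaining[instance_type] = max(0, remaining.get(instance_type, 0) - count)
        return remaining
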
.. warning::

    On AWS EC2, once you call ``launchrunfarm``, you will be charged for running
    instances in your Run Farm until you call ``terminaterunfarm``. You are responsible
    for ensuring that instances are only running when you want them to be by checking
    the AWS EC2 Management Panel.

.. _firesim-infrasetup:

``firesim infrasetup``
----------------------

Once you have launched a Run Farm and set up all of your configuration options, the
``infrasetup`` command will build all components necessary to run the simulation and
deploy those components to the machines in the Run Farm. Here is a rough outline of what
the command does:

- Constructs the internal representation of your simulation. This is a tree of
  components in the simulation (simulated server blades, switches).
- For each type of server blade, rebuilds the software simulation driver by querying the
  bitstream metadata to get the build-quadruplet or using its override.
- For each type of switch in the simulation, generates the switch model binary.
- For each host instance in the Run Farm, collects information about all the resources
  necessary to run a simulation on that host instance, then copies files and flashes
  FPGAs with the required bitstream.

Details about setting up your simulation configuration can be found in
:ref:`config-runtime`.

**Once you run a simulation, you should re-run** ``firesim infrasetup`` **before
starting another one, even if it is the same exact simulation on the same Run Farm.**

You can see detailed output from an example run of ``infrasetup`` in the
:ref:`single-node-sim` and :ref:`cluster-sim` Getting Started Guides.

.. _firesim-boot:

``firesim boot``
----------------

Once you have run ``firesim infrasetup``, this command will actually start simulations.
It begins by launching all switches (if they exist in your simulation config), then
launches all server blade simulations. This simply launches simulations and then exits
-- it does not perform any monitoring.

This command is useful if you want to launch a simulation and then interact with it by
hand (i.e. by directly interacting with the console).

.. _firesim-kill:

``firesim kill``
----------------

Given a simulation configuration and simulations running on a Run Farm, this command
force-terminates all components of the simulation. Importantly, this does not allow any
outstanding changes to the filesystem in the simulated systems to be committed to the
disk image.

.. _firesim-runworkload:

``firesim runworkload``
-----------------------

This command is the standard tool that lets you launch simulations, monitor the progress
of workloads running on them, and collect results automatically when the workloads
complete. To call this command, you must have first called ``firesim infrasetup`` to
set up all required simulation infrastructure on the remote nodes.

This command will first create a directory in ``firesim/deploy/results-workload/`` named
``LAUNCH_TIME-WORKLOADNAME``, into which results will be copied as simulations
complete. This command will then automatically call ``firesim boot`` to start
simulations. Then, it polls all the instances in the Run Farm every 10 seconds to
determine the state of the simulated system. If it notices that a simulation has
shut down (i.e. the simulation disappears from the output of ``screen -ls``), it will
automatically copy back all results from the simulation, as defined in the workload
configuration (see the :ref:`deprecated-defining-custom-workloads` section).

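The poll-and-copy behavior described above can be sketched as a simple loop. This is
an illustration of the idea under stated assumptions, not the manager's implementation:
``monitor_simulations`` and its parameters are hypothetical, ``list_running`` stands in
for whatever inspects the output of ``screen -ls`` on the run farm hosts, and
``copy_results`` stands in for the step that copies results back.

.. code-block:: python

    import time


    def monitor_simulations(sim_names, list_running, copy_results,
                            poll_interval=10.0, sleep=time.sleep):
        """Poll until every simulation has shut down, copying results as each finishes.

        list_running() returns the names of simulations that are still running.
        copy_results(name) is called once for each simulation that has disappeared.
        """
        pending = set(sim_names)
        finished = []
        while pending:
            running = set(list_running())
            # Any pending simulation that is no longer listed has shut down.
            for name in sorted(pending - running):
                copy_results(name)
                finished.append(name)
            pending &= running
            if pending:
                sleep(poll_interval)
        return finished

Passing ``sleep`` as a parameter keeps the loop easy to exercise without waiting for
real time to pass.
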
For non-networked simulations, it will wait for ALL simulations to complete (copying
back results as each workload completes), then exit.

For globally-cycle-accurate networked simulations, the global simulation will stop when
any single node powers off. Thus, for these simulations, ``runworkload`` will copy back
results from all nodes and force them to terminate by calling ``kill`` when ANY SINGLE
ONE of them shuts down cleanly.

A simulation shuts down cleanly when the workload running on the simulator calls
``poweroff``.

.. _firesim-runcheck:

``firesim runcheck``
--------------------

This command is provided to let you debug configuration options without launching
instances. In addition to the output produced at the command line and in the log, you
will find a PDF diagram of the topology you specify, annotated with information about
the workloads, hardware configurations, and abstract host mappings for each simulation
(and optionally, switch) in your design. These diagrams are located in
``firesim/deploy/generated-topology-diagrams/``, named after your topology.

Here is an example of such a diagram (click to expand/zoom, it will likely be illegible
without expanding):

.. figure:: runcheck_example.png
    :scale: 50 %
    :alt: Example diagram from running ``firesim runcheck``

    Example diagram for an 8-node cluster with one ToR switch

.. _firesim-enumeratefpgas:

``firesim enumeratefpgas``
--------------------------

.. note::

    Can only be used for XDMA-based On-Premises platforms.

This command should be run once for each on-premises Run Farm you plan to use that
contains XDMA-based FPGAs. When run, the command will generate a file
(``/opt/firesim-db.json``) on each Run Farm Machine in the run farm that contains a
mapping from the FPGA ID used for JTAG programming to the PCIe ID used to run
simulations for each FPGA attached to the machine.

If you ever change the physical layout of a Run Farm Machine in your Run Farm (e.g.,
which PCIe slot the FPGAs are attached to), you will need to re-run this command.