finish cluster tutorial
This commit is contained in:
parent
fffecbe6e0
commit
de0eaba226
|
@ -1,22 +1,35 @@
|
|||
Running a Cluster Simulation
|
||||
===============================
|
||||
|
||||
|
||||
TODO
|
||||
|
||||
Now that we've completed the setup of our manager instance, it's time to run
|
||||
a simulation! In this section, we will simulate **1 target node**, for which we
|
||||
will need a single ``f1.2xlarge`` (1 FPGA) instance.
|
||||
Now, let's move on to simulating a cluster of eight nodes, interconnected
|
||||
by a network with one 8-port Top-of-Rack (ToR) switch and 200 Gbps, 2μs links.
|
||||
This will require one ``f1.16xlarge`` (8 FPGA) instance.
|
||||
|
||||
Make sure you are ``ssh`` or ``mosh``'d into your manager instance and have sourced
|
||||
``sourceme-f1-manager.sh`` before running any of these commands.
|
||||
|
||||
Returning to a clean configuration
|
||||
-------------------------------------
|
||||
|
||||
If you already ran the single-node tutorial, let's return to a clean FireSim
|
||||
manager configuration by doing the following:
|
||||
|
||||
::
|
||||
|
||||
cd firesim/deploy
|
||||
cp backup-sample-configs/sample_config_runtime.ini config_runtime.ini
|
||||
|
||||
|
||||
Building target software
|
||||
------------------------
|
||||
|
||||
In these instructions, we'll assume that you want to boot Linux on your
|
||||
simulated node. To do so, we'll need to build our FireSim-compatible RISC-V
|
||||
If you already built target software during the single-node tutorial, you can
|
||||
skip to the next part (Setting up the manager configuration). If you haven't followed the single-node tutorial,
|
||||
continue with this section.
|
||||
|
||||
In these instructions, we'll assume that you want to boot Linux on each of the
|
||||
nodes in your
|
||||
simulated cluster. To do so, we'll need to build our FireSim-compatible RISC-V
|
||||
Linux distro. You can do this like so:
|
||||
|
||||
::
|
||||
|
@ -52,100 +65,28 @@ you have not modified it):
|
|||
.. include:: /../deploy/sample-backup-configs/sample_config_runtime.ini
|
||||
:code: ini
|
||||
|
||||
We'll need to modify a couple of these lines.
|
||||
For the 8-node cluster simulation, the defaults in this file are exactly what
|
||||
we want. Let's outline the important parameters:
|
||||
|
||||
First, let's tell the manager to use the correct numbers and types of instances.
|
||||
You'll notice that in the ``[runfarm]`` section, the manager is configured to
|
||||
launch a Run Farm named ``mainrunfarm``, consisting of one ``f1.16xlarge`` and
|
||||
no ``m4.16xlarge``\ s or ``f1.2xlarge``\ s. The tag specified here allows the
|
||||
manager to differentiate amongst many parallel run farms (each running
|
||||
a workload) that you may be operating -- but more on that later.
|
||||
|
||||
Since we only want to simulate a single node, let's switch to using one
|
||||
``f1.2xlarge`` and no ``f1.16xlarge``\s. To do so, change this section to:
|
||||
|
||||
::
|
||||
|
||||
[runfarm]
|
||||
# per aws restrictions, this tag cannot be longer than 255 chars
|
||||
runfarmtag=mainrunfarm
|
||||
f1_16xlarges=0
|
||||
m4_16xlarges=0
|
||||
f1_2xlarges=1
|
||||
* ``f1_16xlarges=1``: This tells the manager that we want to launch one ``f1.16xlarge`` when we call the ``launchrunfarm`` command.
|
||||
* ``topology=example_8config``: This tells the manager to use the topology named ``example_8config`` which is defined in ``deploy/runtools/user_topology.py``. This topology simulates an 8-node cluster with one ToR switch.
|
||||
* ``defaulthwconfig=firesim-quadcore-nic-ddr3-llc4mb``: This tells the manager to use a quad-core Rocket Chip configuration with 4 MB of L2 and 16 GB of DDR3, with a NIC, for each of the simulated nodes in the topology.
|
||||
|
||||
You'll see other parameters here, like ``runinstancemarket``,
|
||||
``spotinterruptionbehavior``, and ``spotmaxprice``. If you're an experienced
|
||||
AWS user, you can see what these do by looking at the (advanced configuration
|
||||
section TODO). Otherwise, don't change them.
|
||||
|
||||
Now, let's change the ``[targetconfig]`` section to model the correct target design.
|
||||
By default, it is set to model an 8-node cluster with a cycle-accurate network.
|
||||
Instead, we want to model a single-node with no network. To do so, we will need
|
||||
to change a few items in this section:
|
||||
|
||||
::
|
||||
|
||||
[targetconfig]
|
||||
topology=no_net_config
|
||||
no_net_num_nodes=1
|
||||
linklatency=6405
|
||||
switchinglatency=10
|
||||
netbandwidth=200
|
||||
|
||||
# This references a section from config_hwconfigs.ini
|
||||
# In homogeneous configurations, use this to set the hardware config deployed
|
||||
# for all simulators
|
||||
defaulthwconfig=firesim-quadcore-no-nic-ddr3-llc4mb
|
||||
|
||||
|
||||
Note that we changed three of the parameters here: ``topology`` is now set to
|
||||
``no_net_config``, indicating that we do not want a network. Then,
|
||||
``no_net_num_nodes`` is set to ``1``, indicating that we only want to simulate
|
||||
one node. Lastly, we changed ``defaulthwconfig`` from
|
||||
``firesim-quadcore-nic-ddr3-llc4mb`` to
|
||||
``firesim-quadcore-no-nic-ddr3-llc4mb``. Notice the subtle difference in this
|
||||
last option? All we did is switch to a hardware configuration that does not
|
||||
have a NIC. This hardware configuration models a Quad-core Rocket Chip with 4
|
||||
MB of L2 cache and 16 GB of DDR3, and **no** network interface card.
|
||||
|
||||
We will leave the last section (``[workload]``) unchanged here, since we do
|
||||
want to run Linux on our simulated system. The ``terminateoncompletion``
|
||||
feature is an advanced feature that you can learn more about in the (advanced
|
||||
configuration TODO) section.
|
||||
As in the single-node tutorial, we will leave the last section (``[workload]``)
|
||||
unchanged here, since we do want to run Linux on our simulated system. The
|
||||
``terminateoncompletion`` feature is an advanced feature that you can learn
|
||||
more about in the (advanced configuration TODO) section.
|
||||
|
||||
As a final sanity check, your ``config_runtime.ini`` file should now look like this:
|
||||
|
||||
::
|
||||
|
||||
# RUNTIME configuration for the FireSim Simulation Manager
|
||||
# See docs/Configuration-Details.rst for documentation of all of these params.
|
||||
|
||||
[runfarm]
|
||||
runfarmtag=mainrunfarm
|
||||
|
||||
f1_16xlarges=0
|
||||
m4_16xlarges=0
|
||||
f1_2xlarges=1
|
||||
|
||||
runinstancemarket=ondemand
|
||||
spotinterruptionbehavior=terminate
|
||||
spotmaxprice=ondemand
|
||||
|
||||
[targetconfig]
|
||||
topology=no_net_config
|
||||
no_net_num_nodes=1
|
||||
linklatency=6405
|
||||
switchinglatency=10
|
||||
netbandwidth=200
|
||||
|
||||
# This references a section from config_hwconfigs.ini
|
||||
# In homogeneous configurations, use this to set the hardware config deployed
|
||||
# for all simulators
|
||||
defaulthwconfig=firesim-quadcore-no-nic-ddr3-llc4mb
|
||||
|
||||
[workload]
|
||||
workloadname=linux-uniform.json
|
||||
terminateoncompletion=no
|
||||
.. include:: /../deploy/sample-backup-configs/sample_config_runtime.ini
|
||||
:code: ini
|
||||
|
||||
|
||||
Launching a Simulation!
|
||||
|
@ -157,8 +98,6 @@ our single-node simulation, let's actually launch an instance and run it!
|
|||
Starting the Run Farm
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
||||
|
||||
First, we will tell the manager to launch our Run Farm, as we specified above.
|
||||
When you do this, you will start getting charged for the running EC2 instances
|
||||
(in addition to your manager).
|
||||
|
@ -173,20 +112,20 @@ You should expect output like the following:
|
|||
|
||||
::
|
||||
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim launchrunfarm
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: launchrunfarm
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim launchrunfarm
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: launchrunfarm
|
||||
|
||||
Waiting for instance boots: f1.16xlarges
|
||||
Waiting for instance boots: m4.16xlarges
|
||||
Waiting for instance boots: f1.2xlarges
|
||||
i-0d6c29ac507139163 booted!
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-19-43-launchrunfarm-B4Q2ROAK0JN9EDE4.log
|
||||
Waiting for instance boots: f1.16xlarges
|
||||
i-09e5491cce4d5f92d booted!
|
||||
Waiting for instance boots: m4.16xlarges
|
||||
Waiting for instance boots: f1.2xlarges
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--06-05-53-launchrunfarm-ZGVP753DSU1Y9Q6R.log
|
||||
|
||||
|
||||
The output will rapidly progress to ``Waiting for instance boots: f1.2xlarges``
|
||||
and then take a minute or two while your ``f1.2xlarge`` instance launches.
|
||||
The output will rapidly progress to ``Waiting for instance boots: f1.16xlarges``
|
||||
and then take a minute or two while your ``f1.16xlarge`` instance launches.
|
||||
Once the launches complete, you should see the instance id printed and the instance
|
||||
will also be visible in your AWS EC2 Management console. The manager will tag
|
||||
the instances launched with this operation with the value you specified above
|
||||
|
@ -200,7 +139,8 @@ Setting up the simulation infrastructure
|
|||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The manager will also take care of building and deploying all software
|
||||
components necessary to run your simulation. The manager will also handle
|
||||
components necessary to run your simulation (including switches for the networked
|
||||
case). The manager will also handle
|
||||
flashing FPGAs. To tell the manager to setup our simulation infrastructure,
|
||||
let's run:
|
||||
|
||||
|
@ -213,35 +153,60 @@ For a complete run, you should expect output like the following:
|
|||
|
||||
::
|
||||
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim infrasetup
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: infrasetup
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim infrasetup
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: infrasetup
|
||||
|
||||
Building FPGA software driver for FireSim-FireSimRocketChipQuadCoreConfig-FireSimDDR3FRFCFSLLC4MBConfig
|
||||
Building switch model binary for switch switch0
|
||||
[172.30.2.178] Executing task 'instance_liveness'
|
||||
[172.30.2.178] Checking if host instance is up...
|
||||
[172.30.2.178] Executing task 'infrasetup_node_wrapper'
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 0.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 1.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 2.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 3.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 4.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 5.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 6.
|
||||
[172.30.2.178] Copying FPGA simulation infrastructure for slot: 7.
|
||||
[172.30.2.178] Installing AWS FPGA SDK on remote nodes.
|
||||
[172.30.2.178] Unloading EDMA Driver Kernel Module.
|
||||
[172.30.2.178] Copying AWS FPGA EDMA driver to remote node.
|
||||
[172.30.2.178] Clearing FPGA Slot 0.
|
||||
[172.30.2.178] Clearing FPGA Slot 1.
|
||||
[172.30.2.178] Clearing FPGA Slot 2.
|
||||
[172.30.2.178] Clearing FPGA Slot 3.
|
||||
[172.30.2.178] Clearing FPGA Slot 4.
|
||||
[172.30.2.178] Clearing FPGA Slot 5.
|
||||
[172.30.2.178] Clearing FPGA Slot 6.
|
||||
[172.30.2.178] Clearing FPGA Slot 7.
|
||||
[172.30.2.178] Flashing FPGA Slot: 0 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 1 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 2 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 3 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 4 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 5 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 6 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Flashing FPGA Slot: 7 with agfi: agfi-09e85ffabe3543903.
|
||||
[172.30.2.178] Loading EDMA Driver Kernel Module.
|
||||
[172.30.2.178] Copying switch simulation infrastructure for switch slot: 0.
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--06-07-33-infrasetup-2Z7EBCBIF2TSI66Q.log
|
||||
|
||||
Building FPGA software driver for FireSimNoNIC-FireSimRocketChipQuadCoreConfig-FireSimDDR3FRFCFSLLC4MBConfig
|
||||
[172.30.2.174] Executing task 'instance_liveness'
|
||||
[172.30.2.174] Checking if host instance is up...
|
||||
[172.30.2.174] Executing task 'infrasetup_node_wrapper'
|
||||
[172.30.2.174] Copying FPGA simulation infrastructure for slot: 0.
|
||||
[172.30.2.174] Installing AWS FPGA SDK on remote nodes.
|
||||
[172.30.2.174] Unloading EDMA Driver Kernel Module.
|
||||
[172.30.2.174] Copying AWS FPGA EDMA driver to remote node.
|
||||
[172.30.2.174] Clearing FPGA Slot 0.
|
||||
[172.30.2.174] Flashing FPGA Slot: 0 with agfi: agfi-0eaa90f6bb893c0f7.
|
||||
[172.30.2.174] Loading EDMA Driver Kernel Module.
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-32-02-infrasetup-9DJJCX29PF4GAIVL.log
|
||||
|
||||
Many of these tasks will take several minutes, especially on a clean copy of
|
||||
the repo. The console output here contains the "user-friendly" version of the
|
||||
the repo (in particular, ``f1.16xlarges`` usually take a couple of minutes to
|
||||
start, so don't be alarmed if you're stuck at ``Checking if host instance is
|
||||
up...``) . The console output here contains the "user-friendly" version of the
|
||||
output. If you want to see detailed progress as it happens, ``tail -f`` the
|
||||
latest logfile in ``firesim/deploy/logs/``.
|
||||
|
||||
At this point, the ``f1.2xlarge`` instance in our Run Farm has all the infrastructure
|
||||
necessary to run a simulation.
|
||||
At this point, the ``f1.16xlarge`` instance in our Run Farm has all the
|
||||
infrastructure necessary to run everything in our simulation.
|
||||
|
||||
So, let's launch our simulation!
|
||||
|
||||
|
||||
Running a simulation!
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
@ -252,88 +217,126 @@ Finally, let's run our simulation! To do so, run:
|
|||
firesim runworkload
|
||||
|
||||
|
||||
This command boots up a simulation and prints out the live status of the simulated
|
||||
nodes every 10s. When you do this, you will initially see output like:
|
||||
|
||||
TODO
|
||||
|
||||
|
||||
If you don't look quickly, you might miss it! After that, you'll get a live
|
||||
status page:
|
||||
This command boots up the 8-port switch simulation and then starts 8 Rocket Chip
|
||||
FPGA Simulations, then prints out the live status of the simulated
|
||||
nodes and switch every 10s. When you do this, you will initially see output like:
|
||||
|
||||
::
|
||||
|
||||
FireSim Simulation Status @ 2018-05-19 00:38:56.062737
|
||||
--------------------------------------------------------------------------------
|
||||
This workload's output is located in:
|
||||
/home/centos/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform/
|
||||
This run's log is located in:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-38-52-runworkload-JS5IGTV166X169DZ.log
|
||||
This status will update every 10s.
|
||||
--------------------------------------------------------------------------------
|
||||
Instances
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.174 | Terminated: False
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Switches
|
||||
--------------------------------------------------------------------------------
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Nodes/Jobs
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.174 | Job: linux-uniform0 | Sim running: True
|
||||
--------------------------------------------------------------------------------
|
||||
Summary
|
||||
--------------------------------------------------------------------------------
|
||||
1/1 instances are still running.
|
||||
1/1 simulations are still running.
|
||||
--------------------------------------------------------------------------------
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim runworkload
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: runworkload
|
||||
|
||||
Creating the directory: /home/centos/firesim-new/deploy/results-workload/2018-05-19--06-28-43-linux-uniform/
|
||||
[172.30.2.178] Executing task 'instance_liveness'
|
||||
[172.30.2.178] Checking if host instance is up...
|
||||
[172.30.2.178] Executing task 'boot_switch_wrapper'
|
||||
[172.30.2.178] Starting switch simulation for switch slot: 0.
|
||||
[172.30.2.178] Executing task 'boot_simulation_wrapper'
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 0.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 1.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 2.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 3.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 4.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 5.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 6.
|
||||
[172.30.2.178] Starting FPGA simulation for slot: 7.
|
||||
[172.30.2.178] Executing task 'monitor_jobs_wrapper'
|
||||
|
||||
|
||||
This will only exit once all of the simulated nodes have shut down. So, let's let it
|
||||
run and open another ssh connection to the manager instance. From there, ``cd`` into
|
||||
your firesim directory again and ``source sourceme-f1-manager.sh`` again to get
|
||||
our ssh key setup. To access our simulated system, ssh into the IP address being
|
||||
printed by the status page, **from your manager instance**. In our case, from
|
||||
the above output, we see that our simulated system is running on the instance with
|
||||
IP ``172.30.2.174``. So, run:
|
||||
If you don't look quickly, you might miss it, because it will be replaced with
|
||||
a live status page once simulations are kicked-off:
|
||||
|
||||
::
|
||||
|
||||
FireSim Simulation Status @ 2018-05-19 06:28:56.087472
|
||||
--------------------------------------------------------------------------------
|
||||
This workload's output is located in:
|
||||
/home/centos/firesim-new/deploy/results-workload/2018-05-19--06-28-43-linux-uniform/
|
||||
This run's log is located in:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--06-28-43-runworkload-ZHZEJED9MDWNSCV7.log
|
||||
This status will update every 10s.
|
||||
--------------------------------------------------------------------------------
|
||||
Instances
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.178 | Terminated: False
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Switches
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.178 | Switch name: switch0 | Switch running: True
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Nodes/Jobs
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform1 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform0 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform3 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform2 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform5 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform4 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform7 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform6 | Sim running: True
|
||||
--------------------------------------------------------------------------------
|
||||
Summary
|
||||
--------------------------------------------------------------------------------
|
||||
1/1 instances are still running.
|
||||
8/8 simulations are still running.
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
|
||||
In cycle-accurate networked mode, this will only exit when any ONE of the
|
||||
simulated nodes shuts down. So, let's let it run and open another ssh
|
||||
connection to the manager instance. From there, ``cd`` into your firesim
|
||||
directory again and ``source sourceme-f1-manager.sh`` again to get our ssh key
|
||||
setup. To access our simulated system, ssh into the IP address being printed by
|
||||
the status page, **from your manager instance**. In our case, from the above
|
||||
output, we see that our simulated system is running on the instance with IP
|
||||
``172.30.2.178``. So, run:
|
||||
|
||||
::
|
||||
|
||||
[RUN THIS ON YOUR MANAGER INSTANCE!]
|
||||
ssh 172.30.2.174
|
||||
ssh 172.30.2.178
|
||||
|
||||
This will log you into the instance running the simulation. Then, to attach to the
|
||||
console of the simulated system, run:
|
||||
This will log you into the instance running the simulation. On this machine,
|
||||
run ``screen -ls`` to get the list of all running simulation components.
|
||||
Attaching to the screens ``fsim0`` to ``fsim7`` will let you attach to the
|
||||
consoles of any of the 8 simulated nodes. You'll also notice an additional
|
||||
screen for the switch, however by default there is no interesting output printed
|
||||
here for performance reasons.
|
||||
|
||||
For example, if we want to enter commands into node zero, we can attach
|
||||
to its console like so:
|
||||
|
||||
::
|
||||
|
||||
screen -r fsim0
|
||||
|
||||
Voila! You should now see Linux booting on the simulated system and then be prompted
|
||||
Voila! You should now see Linux booting on the simulated node and then be prompted
|
||||
with a Linux login prompt, like so:
|
||||
|
||||
|
||||
::
|
||||
|
||||
[truncated Linux boot output]
|
||||
[ 0.020000] VFS: Mounted root (ext2 filesystem) on device 254:0.
|
||||
[ 0.020000] devtmpfs: mounted
|
||||
[ 0.020000] Freeing unused kernel memory: 140K
|
||||
[ 0.020000] This architecture does not have kernel memory protection.
|
||||
mount: mounting sysfs on /sys failed: No such device
|
||||
Starting logging: OK
|
||||
Starting mdev...
|
||||
mdev: /sys/dev: No such file or directory
|
||||
modprobe: can't change directory to '/lib/modules': No such file or directory
|
||||
Initializing random number generator... done.
|
||||
Starting network: ip: SIOCGIFFLAGS: No such device
|
||||
ip: can't find device 'eth0'
|
||||
FAIL
|
||||
Starting dropbear sshd: OK
|
||||
[truncated Linux boot output]
|
||||
[ 0.020000] Registered IceNet NIC 00:12:6d:00:00:02
|
||||
[ 0.020000] VFS: Mounted root (ext2 filesystem) on device 254:0.
|
||||
[ 0.020000] devtmpfs: mounted
|
||||
[ 0.020000] Freeing unused kernel memory: 140K
|
||||
[ 0.020000] This architecture does not have kernel memory protection.
|
||||
mount: mounting sysfs on /sys failed: No such device
|
||||
Starting logging: OK
|
||||
Starting mdev...
|
||||
mdev: /sys/dev: No such file or directory
|
||||
modprobe: can't change directory to '/lib/modules': No such file or directory
|
||||
Initializing random number generator... done.
|
||||
Starting network: OK
|
||||
Starting dropbear sshd: OK
|
||||
|
||||
Welcome to Buildroot
|
||||
buildroot login:
|
||||
Welcome to Buildroot
|
||||
buildroot login:
|
||||
|
||||
If you also ran the single-node no-nic simulation you'll notice a difference
|
||||
in this boot output -- here, Linux sees the NIC and its assigned MAC address and
|
||||
automatically brings up the ``eth0`` interface at boot.
|
||||
|
||||
Now, you can login to the system! The username is ``root`` and the password is
|
||||
``firesim``. At this point, you should be presented with a regular console,
|
||||
|
@ -368,13 +371,13 @@ You should see output like the following from the simulation console:
|
|||
::
|
||||
|
||||
# poweroff -f
|
||||
[ 12.456000] reboot: Power down
|
||||
[ 3.748000] reboot: Power down
|
||||
Power off
|
||||
time elapsed: 468.8 s, simulation speed = 88.50 MHz
|
||||
*** PASSED *** after 41492621244 cycles
|
||||
Runs 41492621244 cycles
|
||||
[PASS] FireSimNoNIC Test
|
||||
SEED: 1526690334
|
||||
time elapsed: 360.5 s, simulation speed = 37.82 MHz
|
||||
*** PASSED *** after 13634406804 cycles
|
||||
Runs 13634406804 cycles
|
||||
[PASS] FireSim Test
|
||||
SEED: 1526711978
|
||||
Script done, file is uartlog
|
||||
|
||||
[screen is terminating]
|
||||
|
@ -385,52 +388,105 @@ from the manager:
|
|||
|
||||
::
|
||||
|
||||
FireSim Simulation Status @ 2018-05-19 00:46:50.075885
|
||||
--------------------------------------------------------------------------------
|
||||
This workload's output is located in:
|
||||
/home/centos/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform/
|
||||
This run's log is located in:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-38-52-runworkload-JS5IGTV166X169DZ.log
|
||||
This status will update every 10s.
|
||||
--------------------------------------------------------------------------------
|
||||
Instances
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.174 | Terminated: False
|
||||
Instance IP: 172.30.2.178 | Terminated: False
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Switches
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.178 | Switch name: switch0 | Switch running: True
|
||||
--------------------------------------------------------------------------------
|
||||
Simulated Nodes/Jobs
|
||||
--------------------------------------------------------------------------------
|
||||
Instance IP: 172.30.2.174 | Job: linux-uniform0 | Sim running: False
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform1 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform0 | Sim running: False
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform3 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform2 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform5 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform4 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform7 | Sim running: True
|
||||
Instance IP: 172.30.2.178 | Job: linux-uniform6 | Sim running: True
|
||||
--------------------------------------------------------------------------------
|
||||
Summary
|
||||
--------------------------------------------------------------------------------
|
||||
1/1 instances are still running.
|
||||
0/1 simulations are still running.
|
||||
7/8 simulations are still running.
|
||||
--------------------------------------------------------------------------------
|
||||
Teardown required, manually tearing down...
|
||||
[172.30.2.178] Executing task 'kill_switch_wrapper'
|
||||
[172.30.2.178] Killing switch simulation for switchslot: 0.
|
||||
[172.30.2.178] Executing task 'kill_simulation_wrapper'
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 0.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 1.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 2.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 3.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 4.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 5.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 6.
|
||||
[172.30.2.178] Killing FPGA simulation for slot: 7.
|
||||
[172.30.2.178] Executing task 'screens'
|
||||
Confirming exit...
|
||||
[172.30.2.178] Executing task 'monitor_jobs_wrapper'
|
||||
[172.30.2.178] Slot 0 completed! copying results.
|
||||
[172.30.2.178] Slot 1 completed! copying results.
|
||||
[172.30.2.178] Slot 2 completed! copying results.
|
||||
[172.30.2.178] Slot 3 completed! copying results.
|
||||
[172.30.2.178] Slot 4 completed! copying results.
|
||||
[172.30.2.178] Slot 5 completed! copying results.
|
||||
[172.30.2.178] Slot 6 completed! copying results.
|
||||
[172.30.2.178] Slot 7 completed! copying results.
|
||||
[172.30.2.178] Killing switch simulation for switchslot: 0.
|
||||
FireSim Simulation Exited Successfully. See results in:
|
||||
/home/centos/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform/
|
||||
/home/centos/firesim-new/deploy/results-workload/2018-05-19--06-39-35-linux-uniform/
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-38-52-runworkload-JS5IGTV166X169DZ.log
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--06-39-35-runworkload-4CDB78E3A4IA9IYQ.log
|
||||
|
||||
In the cluster case, you'll notice that shutting down ONE simulator causes the
|
||||
whole simulation to be torn down -- this is because we currently do not implement
|
||||
any kind of "disconnect" mechanism to remove one node from a globally-cycle-accurate
|
||||
simulation.
|
||||
|
||||
If you take a look at the workload output directory given in the manager output (in this case, ``/home/centos/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform/``), you'll see the following:
|
||||
If you take a look at the workload output directory given in the manager output (in this case, ``/home/centos/firesim-new/deploy/results-workload/2018-05-19--06-39-35-linux-uniform/``), you'll see the following:
|
||||
|
||||
::
|
||||
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform$ ls -la */*
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 00:46 linux-uniform0/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 00:46 linux-uniform0/os-release
|
||||
-rw-rw-r-- 1 centos centos 7316 May 19 00:46 linux-uniform0/uartlog
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy/results-workload/2018-05-19--06-39-35-linux-uniform$ ls -la */*
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform0/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform0/os-release
|
||||
-rw-rw-r-- 1 centos centos 7476 May 19 06:45 linux-uniform0/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform1/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform1/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform1/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform2/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform2/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform2/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform3/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform3/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform3/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform4/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform4/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform4/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform5/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform5/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform5/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform6/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform6/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform6/uartlog
|
||||
-rw-rw-r-- 1 centos centos 797 May 19 06:45 linux-uniform7/memory_stats.csv
|
||||
-rw-rw-r-- 1 centos centos 125 May 19 06:45 linux-uniform7/os-release
|
||||
-rw-rw-r-- 1 centos centos 8157 May 19 06:45 linux-uniform7/uartlog
|
||||
-rw-rw-r-- 1 centos centos 153 May 19 06:45 switch0/switchlog
|
||||
|
||||
|
||||
What are these files? They are specified to the manager in a configuration file
|
||||
(``firesim/deploy/workloads/linux-uniform.json``) as files that we want
|
||||
automatically copied back to our manager after we run a simulation, which is
|
||||
useful for running benchmarks automatically. The advanced workloads section TODO
|
||||
will describe this process in detail.
|
||||
useful for running benchmarks automatically. Note that there is a directory for
|
||||
each simulated node and each simulated switch in the cluster. The advanced
|
||||
workloads section TODO will describes this process in detail.
|
||||
|
||||
For now, let's wrap-up our tutorial by terminating the ``f1.2xlarge`` instance
|
||||
For now, let's wrap-up our tutorial by terminating the ``f1.16xlarge`` instance
|
||||
that we launched. To do so, run:
|
||||
|
||||
::
|
||||
|
@ -447,13 +503,14 @@ Which should present you with the following:
|
|||
|
||||
IMPORTANT!: This will terminate the following instances:
|
||||
f1.16xlarges
|
||||
[]
|
||||
['i-09e5491cce4d5f92d']
|
||||
m4.16xlarges
|
||||
[]
|
||||
f1.2xlarges
|
||||
['i-0d6c29ac507139163']
|
||||
[]
|
||||
Type yes, then press enter, to continue. Otherwise, the operation will be cancelled.
|
||||
|
||||
|
||||
You must type ``yes`` then hit enter here to have your instances terminated. Once
|
||||
you do so, you will see:
|
||||
|
||||
|
@ -464,12 +521,11 @@ you do so, you will see:
|
|||
yes
|
||||
Instances terminated. Please confirm in your AWS Management Console.
|
||||
The full log of this run is:
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--00-51-54-terminaterunfarm-T9ZAED3LJUQQ3K0N.log
|
||||
/home/centos/firesim-new/deploy/logs/2018-05-19--06-50-37-terminaterunfarm-3VF0Z2KCAKKDY0ZU.log
|
||||
|
||||
**At this point, you should always confirm in your AWS management console that
|
||||
the instance is in the shutting-down or terminated states. You are ultimately
|
||||
responsible for ensuring that your instances are terminated appropriately.**
|
||||
|
||||
Congratulations on running your first FireSim simulation! At this point, you can
|
||||
check-out some of the advanced features of FireSim in the sidebar to the left,
|
||||
or you can continue on with the cluster simulation tutorial.
|
||||
Congratulations on running a cluster FireSim simulation! At this point, you can
|
||||
check-out some of the advanced features of FireSim in the sidebar to the left.
|
||||
|
|
|
@ -252,11 +252,21 @@ Finally, let's run our simulation! To do so, run:
|
|||
This command boots up a simulation and prints out the live status of the simulated
|
||||
nodes every 10s. When you do this, you will initially see output like:
|
||||
|
||||
TODO
|
||||
::
|
||||
|
||||
centos@ip-172-30-2-111.us-west-2.compute.internal:~/firesim-new/deploy$ firesim runworkload
|
||||
FireSim Manager. Docs: http://docs.fires.im
|
||||
Running: runworkload
|
||||
|
||||
If you don't look quickly, you might miss it! After that, you'll get a live
|
||||
status page:
|
||||
Creating the directory: /home/centos/firesim-new/deploy/results-workload/2018-05-19--00-38-52-linux-uniform/
|
||||
[172.30.2.174] Executing task 'instance_liveness'
|
||||
[172.30.2.174] Checking if host instance is up...
|
||||
[172.30.2.174] Executing task 'boot_simulation_wrapper'
|
||||
[172.30.2.174] Starting FPGA simulation for slot: 0.
|
||||
[172.30.2.174] Executing task 'monitor_jobs_wrapper'
|
||||
|
||||
If you don't look quickly, you might miss it, since it will get replaced with a
|
||||
live status page:
|
||||
|
||||
::
|
||||
|
||||
|
|
Loading…
Reference in New Issue