OneLab - Future Internet Testbeds

nepi-ng - in real scale

Contents

We have grouped on this page a few more elaborate examples:

Objective

This script is an extension of the B series, but the number of nodes involved is now a parameter of the experiment (the --max option, that defaults to 5).

We will thus use the first m nodes in the system, and

  • initialize the wireless ad-hoc network on all those nodes, as we had done in the B series,
  • then run as many pings as there are pairs of nodes - that is to say m*(m-1)/2 pings,
  • the results of these pings are then retrieved locally in files named PING-i-j
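The steps above amount to enumerating all unordered pairs of node numbers; a minimal sketch of that enumeration (the `ping_pairs` helper is hypothetical, for illustration only) looks like this:

```python
from itertools import combinations

def ping_pairs(max_nodes):
    """Enumerate all (i, j) pairs with i < j among the first max_nodes
    nodes, together with the result file each ping produces."""
    return [
        (i, j, "PING-{:02d}-{:02d}".format(i, j))
        for i, j in combinations(range(1, max_nodes + 1), 2)
    ]

pairs = ping_pairs(5)
print(len(pairs))       # 5*4/2 = 10 pings
print(pairs[0][2])      # PING-01-02
```

With the default --max 5 this yields 10 pings, one per pair.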

--parallel : 3 modes: sequential, full parallel, partially parallel

This script implements 3 different strategies, mostly for the purpose of this tutorial:

sequential (default) :

  • check that we have the lease,
  • then prepare all nodes (init their wireless ad-hoc network) in parallel,
  • then sequentially run all the pings

full parallel (with the -p 0 option) :

Same as above, but all the pings run at the same time. As you will realize if you try to run the script yourself, this strategy only copes with something like 8 nodes. With a higher number of nodes, the need for simultaneous open connections becomes too great, and the script stops behaving as expected.

parallel / limited window (with e.g. the -p 20 option) :

This illustrates the ability of an asynciojobs scheduler to limit the number of simultaneous jobs. So if you say -p 20, at most 20 pings will run at the same time.
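The real limiting is done inside asynciojobs when orchestrate() receives a jobs_window argument; this stdlib-only sketch illustrates the same idea with a semaphore capping how many coroutines are in flight at once (fake_ping and run_with_window are made up for the example):

```python
import asyncio

async def run_with_window(jobs, window):
    # a semaphore caps how many coroutines run simultaneously,
    # which is essentially what jobs_window does in asynciojobs
    semaphore = asyncio.Semaphore(window)
    async def guarded(job):
        async with semaphore:
            return await job()
    return await asyncio.gather(*(guarded(job) for job in jobs))

# measure the highest number of fake pings in flight at once
peak = running = 0

async def fake_ping():
    global peak, running
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)
    running -= 1

asyncio.run(run_with_window([fake_ping] * 10, 3))
print(peak)     # never exceeds the window of 3
```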

--dry-run

This script also implements a dry-run mode. If the -n option is provided, the scheduler is merely printed out in text form, and a .dot file is also produced, as well as a .png file if you have dot installed on your system.

Here are the outputs obtained with --max 4. As you can see on these figures, the only difference between the two strategies is the addition of one required relationship per ping job. This is precisely what the Sequence object is about; see more details on Sequence in its own documentation.



Sequential scheduler with --max 4



Parallel scheduler with --max 4
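What Sequence does can be sketched in plain Python (a simplified illustration, not the actual asynciojobs implementation - FakeJob and sequence are made up for the example):

```python
# Sequence essentially chains each job to its predecessor
# through the `required` relationship.

class FakeJob:
    def __init__(self, label):
        self.label = label
        self.required = set()

def sequence(*jobs):
    # add one required relationship per job, pointing at its predecessor
    for previous, job in zip(jobs, jobs[1:]):
        job.required.add(previous)
    return jobs

ping1, ping2, ping3 = FakeJob('ping1'), FakeJob('ping2'), FakeJob('ping3')
sequence(ping1, ping2, ping3)
print([sorted(j.label for j in job.required)
       for job in (ping1, ping2, ping3)])
# [[], ['ping1'], ['ping2']] - one extra requirement per job
```

This is exactly the one-edge-per-ping difference visible between the two figures above.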

troubleshooting: --verbose and --debug

The script implements 2 options to increase verbosity:

  • -v/--verbose sets verbosity of the ssh layer, like we have done throughout these tutorials, and
  • -d/--debug sets verbosity on the scheduler and jobs objects.

That's one way to do it, but your specific needs may lead you to make other choices.

The code

#!/usr/bin/env python3

import os

from argparse import ArgumentParser

from asynciojobs import Scheduler, Sequence, PrintJob

from apssh import SshNode, LocalNode, SshJob
from apssh import Run, RunScript, Pull
from apssh import TimeColonFormatter

##########
gateway_hostname  = 'faraday.inria.fr'
gateway_username  = 'inria_r2lab.tutorial'
# a fixed amount of time that we wait for once all the nodes
# have their wireless interface configured
settle_delay      = 10
# this should be amply enough to talk to another
ping_timeout      = 5

parser = ArgumentParser()
parser.add_argument("-s", "--slice", default=gateway_username,
                    help="specify an alternate slicename, default={}"
                         .format(gateway_username))
parser.add_argument("-l", "--load-images", default=False, action='store_true',
                    help = "enable to load the default image on nodes before the exp")
parser.add_argument("-w", "--wifi-driver", default='ath9k',
                    choices = ['iwlwifi', 'ath9k'],
                    help="specify which driver to use")

parser.add_argument("-m", "--max", default=5, type=int,
                    help="will run on all nodes between 1 and this number")
parser.add_argument("-p", "--parallel", default=None, type=int,
                    help="""run in parallel, with this value as the
                    limit to the number of simultaneous pings - -p 0 means no limit""")
parser.add_argument("-t", "--ping-timeout", default=ping_timeout,
                    help="specify timeout for each individual ping - default={}".format(ping_timeout))

parser.add_argument("-n", "--dry-run", default=False, action='store_true',
                    help="do not run anything, just print out scheduler, and generate .dot file")
parser.add_argument("-v", "--verbose-ssh", default=False, action='store_true',
                    help="run ssh in verbose mode")
parser.add_argument("-d", "--debug", default=False, action='store_true',
                    help="run jobs and engine in verbose mode")
args = parser.parse_args()

gateway_username = args.slice
verbose_ssh = args.verbose_ssh
verbose_jobs = args.debug
ping_timeout = args.ping_timeout
wireless_driver   = args.wifi_driver

### the list of node numbers (integers), starting at 1
# of course it would make sense to come up with a less rustic way of
# selecting target nodes
node_ids = range(1, args.max+1)

# convenience
def fitname(id):
    return "fit{:02d}".format(id)

########## the nodes involved
faraday = SshNode(hostname = gateway_hostname, username = gateway_username,
                  formatter = TimeColonFormatter(), verbose = verbose_ssh)

# this is a python dictionary that lets us retrieve a node object
# from its id
node_index = {
    id: SshNode(gateway = faraday,
                hostname = fitname(id),
                username = "root",
                formatter=TimeColonFormatter(),
                verbose = verbose_ssh)
    for id in node_ids
}

########## the global scheduler
scheduler = Scheduler(verbose = verbose_jobs)

##########
check_lease = SshJob(
    scheduler = scheduler,
    node = faraday,
    verbose = verbose_jobs,
    critical = True,
    command = Run("rhubarbe leases --check"),
)

########## load images if requested

green_light = check_lease

if args.load_images:
    negated_node_ids = [ "~{}".format(id) for id in node_ids ]
    # replace green_light in this case
    green_light = SshJob(
        node = faraday,
        required = check_lease,
        critical = True,
        scheduler = scheduler,
        verbose = verbose_jobs,
        commands = [
            Run("rhubarbe", "off", "-a", *negated_node_ids),
            Run("rhubarbe", "load", "-i", "ubuntu", *node_ids),
            Run("rhubarbe", "wait", *node_ids)
        ]
    )

##########
# setting up the wireless interface on all nodes
#
# this is a python feature known as a list comprehension
# we just create as many SshJob instances as we have
# (id, SshNode) couples in node_index
# and gather them all in init_wireless_jobs
# they all depend on green_light
init_wireless_jobs = [
    SshJob(
        scheduler = scheduler,
        required = green_light,
        node = node,
        verbose = verbose_jobs,
        label = "init {}".format(id),
        command = RunScript(
            "B3-wireless.sh", "init-ad-hoc-network",
            wireless_driver, "foobar", 2412,
        ))
    for id, node in node_index.items() ]


########## let the wireless network settle
settle_wireless_job = PrintJob(
    "Let the wireless network settle",
    sleep = settle_delay,
    scheduler = scheduler,
    required = init_wireless_jobs,
    label = "settling")

##########
# create all the ping jobs, i.e. max*(max-1)/2
# this again is a python list comprehension
# see the 2 `for` clauses at the bottom
#
# notice that these SshJob instances are not yet added
# to the scheduler, we will add them later on
# depending on the sequential/parallel strategy

pings = [
    SshJob(
        node = nodei,
        required = settle_wireless_job,
        label = "ping {} -> {}".format(i, j),
        verbose = verbose_jobs,
        commands = [
            Run("echo {} '->' {}".format(i, j)),
            RunScript("B3-wireless.sh", "my-ping",
                      "10.0.0.{}".format(j), ping_timeout,
                      ">", "PING-{:02d}-{:02d}".format(i, j)),
            Pull(remotepaths = "PING-{:02d}-{:02d}".format(i, j), localpath="."),
        ]
    )
    # looping on the source
    for i, nodei in node_index.items()
    # and on the destination
    for j, nodej in node_index.items()
    # and keep only half of the couples
    if j > i
]

if args.parallel is None:
    # with the sequential strategy, we just need to
    # create a Sequence out of the list of pings
    # Sequence will add the required relationships
    scheduler.add(Sequence(*pings, scheduler=scheduler))
    # for running sequentially we impose no limit on the scheduler
# that will be limited anyway by the very structure
    # of the required graph
    jobs_window = None
else:
    # with the parallel strategy
    # we just need to insert all the ping jobs
    # as each already has its required OK
    scheduler.update(pings)
    # this time the value in args.parallel is the one
    # to use as the jobs_limit; if 0 then inch'allah
    jobs_window = args.parallel

# finally - i.e. when all pings are done
# we can list the current contents of our local directory
SshJob(
    node = LocalNode(),
    scheduler = scheduler,
    required = pings,
    verbose = verbose_jobs,
    commands = [
        Run("ls", "-l", "PING*")
    ]
)

#
# dry-run mode
# show the scheduler using list(details=True)
# also generate a .dot file, and attempt to
# transform it into a .png - should work if graphviz is installed
# but don't run anything of course
#
if args.dry_run:
    print("==================== COMPLETE SCHEDULER")
    # -n + -v = max details
    scheduler.list(details=verbose_jobs)
    suffix = "par" if args.parallel is not None else "seq"
    if args.load_images:
        suffix += "-load"
    filename = "multi-ping-{}-{}".format(suffix, args.max)
    print("Creating dot file: {filename}.dot".format(filename=filename))
    scheduler.export_as_dotfile(filename+".dot")
    # try to run dot
    command = "dot -Tpng -o {filename}.png {filename}.dot".format(filename=filename)
    print("Trying to run dot to create {filename}.png".format(filename=filename))
    retcod = os.system(command)
    if retcod == 0:
        print("{filename}.png OK".format(filename=filename))
    else:
        print("Could not create {filename}.png - do you have graphviz installed?"
              .format(filename=filename))
    # in dry-run mode we are done
    exit(0)

ok = scheduler.orchestrate(jobs_window=jobs_window)
# give details if it failed
ok or scheduler.debrief()

# return something useful to your OS
exit(0 if ok else 1)

How to run it

How to produce the 2 figures above:

# sequential
./multi-ping.py -m 4 -n
# parallel
./multi-ping.py -m 4 -p 0 -n