How can we use JUBE to automate the build process of a given HPC
application?
Objectives
Define an initial workflow.
Create the benchmarks systems description automatically.
Writing an initial workflow
Mapping our experience with JUBE so far to the steps involved in
compiling GROMACS, we can define two different steps:
downloading and unpacking the GROMACS sources
building the application
Discussion
Why don’t we have more or fewer steps defined?
If we defined all actions from the episode on manually building
GROMACS in a single step, we would download the GROMACS sources again
for every build.
If we separated the configure and the make
phases, they would appear independent, yet every make phase requires
exactly one configure phase. Therefore, these two actions
should be part of the same step.
Create an XML configuration for a GROMACS workflow named
gromacs.xml that implements the two steps described above.
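A minimal sketch of what such a configuration could look like is shown here. The GROMACS version (2024.5), the download URL, the CMake options, and the number of parallel build jobs are assumptions chosen for illustration; adapt them to the values used when building GROMACS manually.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="gromacs" outpath="bench_run">
    <comment>Download and build GROMACS (illustrative sketch)</comment>

    <!-- Step 1: download and unpack the sources (assumed version and URL) -->
    <step name="download">
      <do>wget https://ftp.gromacs.org/gromacs/gromacs-2024.5.tar.gz</do>
      <do>tar xzf gromacs-2024.5.tar.gz</do>
    </step>

    <!-- Step 2: configure, build, and install together in one step -->
    <step name="build" depend="download">
      <!-- "download" is the link JUBE creates to the work directory of the
           step this one depends on. "$$" escapes the dollar sign, so the
           shell sees $PWD. The CMake options are assumptions. -->
      <do>cmake -S download/gromacs-2024.5 -B build -DCMAKE_INSTALL_PREFIX=$$PWD/install/gromacs-2024.5 -DGMX_BUILD_OWN_FFTW=ON</do>
      <do>cmake --build build --parallel 8</do>
      <do>cmake --install build</do>
    </step>
  </benchmark>
</jube>
```

The workflow can then be started with `jube run gromacs.xml`.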
After executing this workflow we can make several observations:
Some of the strings now hardcoded are part of a pattern that
could be represented as a parameter instead.
Downloading and unpacking is done as part of the workflow into
the workflow directory tree. Additional runs will download and unpack
the sources again.
So let’s address these issues one at a time.
Reducing code duplication in the specification
In several locations, the version of GROMACS is referenced: in the
source archive, the unpacked source directory, and the installation
path. If we changed the version in the future, we would have to update
several locations in the workflow configuration. We can reduce this
duplication by creating an appropriate parameterset.
Edit your initial GROMACS workflow configuration and use parameters
for key variables in your workflow, to make this more flexible.
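One possible way to do this is sketched below as a fragment that belongs inside the benchmark element. The parameter names other than the parameterset name gromacs_pset, as well as the version and URL, are hypothetical choices for illustration.

```xml
<!-- All names derived from the version are defined once, in one place -->
<parameterset name="gromacs_pset">
  <parameter name="gromacs_version">2024.5</parameter>
  <parameter name="gromacs_name">gromacs-${gromacs_version}</parameter>
  <parameter name="gromacs_archive">${gromacs_name}.tar.gz</parameter>
  <parameter name="gromacs_url">https://ftp.gromacs.org/gromacs/${gromacs_archive}</parameter>
</parameterset>

<step name="download">
  <!-- Make the parameters of gromacs_pset available in this step -->
  <use>gromacs_pset</use>
  <do>wget ${gromacs_url}</do>
  <do>tar xzf ${gromacs_archive}</do>
</step>
```

With this layout, switching to a different GROMACS version only requires changing the value of gromacs_version.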
By default, JUBE will create a specific directory for each
workpackage to ensure that independent workpackages do not interfere
with each other. However, sometimes it is helpful to break out of this
safety net.
Caution
The sandboxing of individual workflow runs and their workpackages is
done for good reason: to limit potential interactions among independent
workpackages and ensure consistency and reproducibility.
Only deviate from this when you have a very strong argument to do
so.
Next to workflow-specific parameters defined by the workflow itself,
JUBE also defines variables containing information about the current
workflow run. These variables can be referenced just as any parameter
defined as part of a parameterset. One of these variables is
$jube_benchmark_home, and it contains the absolute path to
the location of the workflow specification.
Using this variable, we can now define the installation path
outside of the directory structure referenced by outpath.
However, any path outside of JUBE’s run directory tree can be
accessed (and potentially written to) by multiple workflow runs.
Therefore, you will need to take precautions not to overwrite
installations accidentally.
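As a sketch, the installation path could be defined as a parameter based on $jube_benchmark_home. The parameter name gromacs_install_dir and the install subdirectory are assumptions for illustration. Including the version in the path is one simple precaution that keeps different versions from overwriting each other.

```xml
<parameterset name="gromacs_pset">
  <parameter name="gromacs_version">2024.5</parameter>
  <!-- jube_benchmark_home is provided by JUBE; the "install" subdirectory
       and the parameter name are assumed for this example -->
  <parameter name="gromacs_install_dir">${jube_benchmark_home}/install/gromacs-${gromacs_version}</parameter>
</parameterset>

<step name="build" depend="download">
  <use>gromacs_pset</use>
  <!-- The install prefix now points outside of the run directory tree -->
  <do>cmake -S download/gromacs-${gromacs_version} -B build -DCMAKE_INSTALL_PREFIX=${gromacs_install_dir}</do>
  <do>cmake --build build --parallel 8</do>
  <do>cmake --install build</do>
</step>
```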
We have now installed GROMACS externally, but we have yet to automate the
decision whether to build it or to use the installed version. This can be
handled with the active attribute. Steps and do tags (and a
few others) accept an attribute active whose value can be
true, false, or any parsable boolean
Python expression. When it evaluates to false, the respective
entity is disabled. When it evaluates to true, the respective
entity remains enabled (just as if no active attribute had
been given).
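The following fragment sketches how the active attribute might be used on individual do tags. The parameterset control_pset and its parameters are hypothetical and exist only to illustrate the three kinds of values.

```xml
<parameterset name="control_pset">
  <parameter name="run_checks">false</parameter>
  <parameter name="num_tasks" type="int">4</parameter>
</parameterset>

<step name="example">
  <use>control_pset</use>
  <!-- No active attribute: the action is always enabled -->
  <do>echo "always executed"</do>
  <!-- The parameter is substituted first, so this becomes active="false" -->
  <do active="$run_checks">echo "executed only if run_checks is true"</do>
  <!-- After substitution this is the Python expression "4 > 1" -->
  <do active="$num_tasks &gt; 1">echo "executed only for more than one task"</do>
</step>
```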
To build GROMACS only when no complete install is available, we
need
an indicator that a previous install was successful,
the evaluation of that indicator,
an expression to use as the value for the active
attribute, and
an action to remove any preexisting installation (that may be
incomplete).
For this purpose we add a final do action to the
build step that creates a file indicating that this step was
completed and, by transitivity, that all prior do actions
completed successfully. Furthermore, we create a
parameter as part of the gromacs_pset that
indicates whether this file exists in the target directory. We use a
parameter here because of its ease of use when checking for the
existence of a file as part of a shell expression. This parameter can
then be referenced in the appropriate do actions in the
build step.
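Put together, these pieces could look like the sketch below. The marker file name .install_complete and the parameter names gromacs_marker and gromacs_installed are hypothetical. The sketch also assumes that the shell parameter prints the Python-style literals True or False, so that the substituted value of the active attribute ("not True" or "not False") is a valid Python expression.

```xml
<parameterset name="gromacs_pset">
  <parameter name="gromacs_version">2024.5</parameter>
  <parameter name="gromacs_install_dir">${jube_benchmark_home}/install/gromacs-${gromacs_version}</parameter>
  <!-- Indicator: a file created only after a complete install (assumed name) -->
  <parameter name="gromacs_marker">${gromacs_install_dir}/.install_complete</parameter>
  <!-- Evaluation of the indicator: the shell prints True or False -->
  <parameter name="gromacs_installed" mode="shell">if [ -f "${gromacs_marker}" ]; then echo True; else echo False; fi</parameter>
</parameterset>

<step name="build" depend="download">
  <use>gromacs_pset</use>
  <!-- Remove any preexisting, possibly incomplete installation -->
  <do active="not ${gromacs_installed}">rm -rf ${gromacs_install_dir}</do>
  <do active="not ${gromacs_installed}">cmake -S download/gromacs-${gromacs_version} -B build -DCMAKE_INSTALL_PREFIX=${gromacs_install_dir}</do>
  <do active="not ${gromacs_installed}">cmake --build build --parallel 8</do>
  <do active="not ${gromacs_installed}">cmake --install build</do>
  <!-- Final action: mark the installation as complete -->
  <do active="not ${gromacs_installed}">touch ${gromacs_marker}</do>
</step>
```

On the first run the marker file does not exist, so all actions are enabled. On later runs with the same version the parameter evaluates to True and the build actions are skipped.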
Key Points
Group actions that belong together and have a 1:1 relationship in a
single step.
Basing parameter values on other parameter values can help reduce code
duplication and increase the flexibility and maintainability of your workflow.
You can generate build files from templates using dynamic values
from parameter sets.