Introduction
- Workflow automation aids in getting reproducible results.
- JUBE enables execution and management of complex workflows on HPC systems.
- JUBE simplifies exploration of a large parameter space of measurements.
- JUBE automatically isolates the individual workpackages of a run in separate directories, so that concurrent workpackages cannot overwrite each other's data or unintentionally reuse intermediate data.
- JUBE does not intrinsically create fully reproducible workflows; the user has to manually record any parameters that make the workflow reproducible.
An example application
- HPC applications may need to be downloaded and built before use.
- HPC applications may have additional dependencies.
- HPC applications often do not have an automatic dependency management system (like `pip` or `conda`).
- HPC applications may need serial preparation of inputs.
- Steps of individual HPC workflows may occur at different levels of parallelism.
Working with JUBE
- A JUBE workflow can be in one of two states: `running` or `completed`.
- A workflow is completed if all steps of the workflow are completed (even if some steps may have failed).
- A JUBE workflow is started with `jube run`.
- JUBE executes the workflow until the end, or until an asynchronous task is discovered.
- If JUBE exits while asynchronous tasks are still pending, the user needs to call `jube continue` to check for their completion and to run dependent workflow steps, until overall completion is achieved.
- The user can analyze output and print results at any time.
- Comma-separated values in a parameter definition are tokenized and create individual workpackages. Using multiple comma-separated values for parameters enables easy creation of parameter spaces.
An initial workflow specification
- Group actions that belong together and have a 1:1 relationship into a single step.
- Basing parameter values on other parameter values can help avoid code duplication and increase the flexibility and maintainability of your workflow.
- You can generate build files from templates using dynamic values from parameter sets.
Spanning parameter spaces
- Comma-separated values in a parameter definition are tokenized and create individual workpackages.
- Using multiple comma-separated values for parameters enables easy creation of parameter spaces.
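
The expansion described above can be pictured as a Cartesian product over the tokenized values. The following Python sketch is not JUBE code and does not reflect how JUBE is implemented; it only illustrates the idea. The function name `expand`, the parameter names, and the whitespace stripping of tokens are assumptions made for this illustration.

```python
from itertools import product


def expand(parameters, separator=","):
    """Return one dict per combination of the tokenized parameter values.

    Illustration only: `expand` is a hypothetical helper, not a JUBE API.
    """
    names = list(parameters)
    # Split every value at the separator; a value without a separator
    # yields a single token. Stripping whitespace is a simplification.
    token_lists = [
        [token.strip() for token in str(parameters[name]).split(separator)]
        for name in names
    ]
    # Every combination of tokens corresponds to one workpackage.
    return [dict(zip(names, combo)) for combo in product(*token_lists)]


# Hypothetical parameters: 3 values x 2 values x 1 value = 6 workpackages.
workpackages = expand(
    {"nodes": "1,2,4", "tasks_per_node": "8,16", "compiler": "gcc"}
)
for workpackage in workpackages:
    print(workpackage)
```

Adding one more value to any parameter multiplies the number of workpackages, so large parameter spaces grow quickly.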
Using templates
- Filesets let you easily copy templates into the workpackage space.
- Substitution sets perform simple text-based substitutions in templates.
- Using parameter values in substitution sets enables easy value manipulation during template generation.
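
The text-based substitution described above can be sketched in a few lines of Python. This is an illustration of the concept, not JUBE's implementation: the helper `substitute`, the placeholder strings `#NODES#` and `#TASKS#`, and the template content are all hypothetical.

```python
def substitute(template_text, substitutions):
    """Replace every occurrence of each source string by its destination.

    Illustration only: `substitute` is a hypothetical helper, not a JUBE API.
    """
    result = template_text
    for source, dest in substitutions.items():
        result = result.replace(source, str(dest))
    return result


# Hypothetical job script template with two placeholders.
template = "#SBATCH --nodes=#NODES#\n#SBATCH --ntasks-per-node=#TASKS#\n"

# Hypothetical parameter values, as they might apply to one workpackage.
parameters = {"nodes": 4, "tasks_per_node": 16}

# The destination values are taken from the parameters.
substitutions = {
    "#NODES#": parameters["nodes"],
    "#TASKS#": parameters["tasks_per_node"],
}

job_script = substitute(template, substitutions)
print(job_script)
```

Because the replacement is purely textual, placeholders should be chosen so that they cannot collide with other content of the template.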
Including external data
- `from` inside a `use` clause allows including an external set as is.
- `init_with` allows initializing child sets with the values of a parent set, which can then be modified and extended.
- `include` allows including arbitrary parts of external configuration files.
Running an application
- JUBE provides a default job script template for different schedulers.
- JUBE allows for asynchronous execution of workflows using the `done_file` attribute.
- Asynchronous JUBE executions depend on the asynchronous task generating files in order to continue.
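
The file-based completion check described above can be pictured with the following Python sketch. It only illustrates the idea of a marker file and is not JUBE's implementation; the helper `step_is_done` and the file name `ready` are assumptions made for this example.

```python
from pathlib import Path
import tempfile


def step_is_done(work_dir, done_file="ready"):
    """Return True once the marker file exists in the work directory.

    Illustration only: hypothetical helper and file name, not a JUBE API.
    """
    return (Path(work_dir) / done_file).is_file()


with tempfile.TemporaryDirectory() as work_dir:
    # Before the asynchronous task has finished, no marker file exists.
    before = step_is_done(work_dir)

    # The asynchronous task (e.g. a batch job) would create the file
    # as its last action; here this is simulated directly.
    (Path(work_dir) / "ready").touch()

    # A later check finds the file, so dependent steps could proceed.
    after = step_is_done(work_dir)

print(before, after)
```

In this picture, the task itself is responsible for creating the file: if it fails before doing so, the step never appears as done.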
Output analysis and presentation
- JUBE allows for definition of patterns to retrieve values from a workflow step.
- Patterns can be defined as regular expressions or Python expressions.
- JUBE allows for the generation of pretty-printed tables and CSV output.
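
The idea of extracting values with a regular expression and presenting them as CSV can be sketched in plain Python. This is not JUBE code: the output text, the pattern, and the column names `nodes` and `runtime_s` are hypothetical and chosen only for illustration.

```python
import csv
import io
import re

# Hypothetical output of one workflow step for two workpackages,
# keyed by the (hypothetical) number of nodes used.
outputs = {
    1: "Starting run\nRuntime: 12.5 s\nDone\n",
    2: "Starting run\nRuntime: 7.25 s\nDone\n",
}

# Regular expression with one capture group for the value of interest.
runtime_pattern = re.compile(r"Runtime:\s+([0-9]+\.?[0-9]*)\s+s")

rows = []
for nodes, text in sorted(outputs.items()):
    match = runtime_pattern.search(text)
    runtime = float(match.group(1)) if match else None
    rows.append({"nodes": nodes, "runtime_s": runtime})

# Write the collected values in CSV format.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["nodes", "runtime_s"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the extraction (pattern) separate from the presentation (table or CSV) means the same values can be reported in several formats without re-running the workflow.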
Thoughts on reproducibility
- Identify aspects of the workflow that are important for reproducibility.
- Obtain values for these either through parameters or patterns.
- Add actions to the workflow that help in recording information through patterns.
Wrap-up and outlook
- JUBE is a very flexible workflow management system.
- JUBE’s advanced features can be further customized with Python and Perl expressions.