Output analysis and presentation
Last updated on 2025-06-27
Estimated time: 20 minutes
Overview
Questions
- How can you retrieve output from your HPC application and workflow?
- How can you generate tables and CSV output from JUBE workflow parameters?
Objectives
- Generate regular expression patterns to analyse the output of a step.
- Generate a basic table from benchmark parameters and output.
Prerequisite
If you cannot execute GROMACS via the generated batch script to generate output, you can download an example error log job.err and place it in the work directory of your run stage (jube_run/<id>/000002_run/work/). Furthermore, you need to let JUBE know that the execution was successful with the following command executed in the same directory.
Then continue the workflow to complete it.
Analyzing workflow output
Next to parameters, which are evaluated at use time in the steps of a workflow, another way to automatically obtain and store information about a workflow execution is to analyze the output generated during execution. JUBE provides an easy way to parse workflow output with regular expressions and to store the values of matched patterns in variables that can be included in later result tables.
Patterns are defined as part of pattern sets that can be applied to specific files during analysis. As with parameters, the names of patterns need to be unique for the whole workflow. When included in result tables, their names can be shortened or changed completely to better fit the generated table.
Challenge
For the workflow defined, we have two steps with potentially valuable information: build and run. Check the outputs in the corresponding workpackages and identify interesting output files to parse for meaningful data.
- The cmake_configure.log in the build workpackage contains output from CMake that provides some insight on which libraries were found and used for the build.
- The stdout in the run workpackage contains the output of the job submission command.
- The job.err in the run workpackage contains the GROMACS output.
Patterns are defined as regular expressions as part of a pattern set. When we take the following snippets from cmake_configure.log
OUTPUT
-- The C compiler identification is IntelLLVM 2024.2.0
-- The CXX compiler identification is IntelLLVM 2024.2.0
...
-- Detected best SIMD instructions for this CPU - AVX_512
-- Enabling 512-bit AVX-512 SIMD instructions using CXX flags: -march=skylake-avx512
...
-- Using external FFT library - Intel MKL
...
we can already identify some interesting information that is worth extracting.
Each pattern can contain an arbitrary number of wildcards, but must contain exactly one matching operator (), which defines the value of the pattern. JUBE also defines several patterns for common elementary types such as numbers and individual words:
- $jube_pat_int: integer number with matching operator
- $jube_pat_nint: integer number without matching operator
- $jube_pat_fp: floating point number with matching operator
- $jube_pat_nfp: floating point number without matching operator
- $jube_pat_wrd: word with matching operator
- $jube_pat_nwrd: word without matching operator
- $jube_pat_bl: blank space (variable length) without matching operator
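To get a feeling for how such patterns behave before putting them into a pattern set, the matching can be tried out with plain Python regular expressions. The following is a minimal sketch, not JUBE code: the constants PAT_WRD, PAT_NWRD and PAT_BL are assumed stand-ins for the corresponding JUBE patterns (JUBE's exact definitions may differ), and the regular expressions are illustrative guesses applied to the cmake_configure.log snippets shown above.

```python
import re

# Assumed stand-ins for JUBE's predefined patterns (not JUBE's exact definitions).
PAT_WRD = r"(\S+)"   # word, with matching operator
PAT_NWRD = r"\S+"    # word, without matching operator
PAT_BL = r"\s+"      # blank space of variable length, without matching operator

# Lines taken from the cmake_configure.log snippets above.
LOG = """\
-- The C compiler identification is IntelLLVM 2024.2.0
-- The CXX compiler identification is IntelLLVM 2024.2.0
-- Detected best SIMD instructions for this CPU - AVX_512
-- Enabling 512-bit AVX-512 SIMD instructions using CXX flags: -march=skylake-avx512
-- Using external FFT library - Intel MKL
"""

# Each pattern contains exactly one capturing group, which defines its value.
PATTERNS = {
    "cmake_c_compiler_id":
        rf"The C compiler identification is{PAT_BL}{PAT_WRD}",
    "cmake_c_compiler_version":
        rf"The C compiler identification is{PAT_BL}{PAT_NWRD}{PAT_BL}{PAT_WRD}",
    "SIMD_detected":
        rf"Detected best SIMD instructions for this CPU -{PAT_BL}{PAT_WRD}",
    "SIMD_flags":
        rf"SIMD instructions using CXX flags:{PAT_BL}{PAT_WRD}",
    # The library name contains a blank, so capture the rest of the line.
    "FFT_detected": r"Using external FFT library - (.+)",
}


def extract(patterns, text):
    """Return the value of the single capturing group for every pattern that matches."""
    result = {}
    for name, regex in patterns.items():
        match = re.search(regex, text)
        if match:
            result[name] = match.group(1)
    return result


values = extract(PATTERNS, LOG)
for name, value in values.items():
    print(f"{name}: {value}")
```

Using the non-matching word pattern to skip the compiler name is what allows the version pattern to keep a single matching operator.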
Callout
While all steps of a JUBE workflow need to be defined before the workflow is started and cannot be altered after that, patterns and result tables can be updated while the workflow is active and even after a workflow has completed. This enables an iterative approach for defining patterns and result tables for previously unknown output.
To use the defined patterns, JUBE needs an analyser
specifying which patterns to use for which file or which patterns to use
globally.
We can then start and test the analyser with the analyse command. However, as we modified the workflow configuration by adding new patterns and analyser definitions, we need to tell JUBE to update its information about the workflow. This can be done with the -u (for update) argument followed by the updated workflow specification.
OUTPUT
######################################################################
# Analyse benchmark "GROMACS" id: 35
######################################################################
>>> Start analyse
>>> Analyse finished
>>> Analyse data storage: jube_run/000035/analyse.xml
######################################################################
Without the definition of a result table, we cannot visualise this directly, but we can investigate the data storage given in the output and check which patterns were matched for which workpackage.
XML
<?xml version="1.0" encoding="UTF-8"?>
<analyse>
<analyser name="gromacs_analyser">
<step name="build">
<workpackage id="1">
<pattern name="cmake_c_compiler_id" type="string">IntelLLVM</pattern>
<pattern name="cmake_c_compiler_id_first" type="string">IntelLLVM</pattern>
<pattern name="cmake_c_compiler_id_cnt" type="int">1</pattern>
<pattern name="cmake_c_compiler_id_last" type="string">IntelLLVM</pattern>
<pattern name="cmake_c_compiler_version" type="string">2024.2.0</pattern>
<pattern name="cmake_c_compiler_version_first" type="string">2024.2.0</pattern>
<pattern name="cmake_c_compiler_version_cnt" type="int">1</pattern>
<pattern name="cmake_c_compiler_version_last" type="string">2024.2.0</pattern>
<pattern name="cmake_cxx_compiler_id" type="string">IntelLLVM</pattern>
<pattern name="cmake_cxx_compiler_id_first" type="string">IntelLLVM</pattern>
<pattern name="cmake_cxx_compiler_id_cnt" type="int">1</pattern>
<pattern name="cmake_cxx_compiler_id_last" type="string">IntelLLVM</pattern>
<pattern name="cmake_cxx_compiler_version" type="string">2024.2.0</pattern>
<pattern name="cmake_cxx_compiler_version_first" type="string">2024.2.0</pattern>
<pattern name="cmake_cxx_compiler_version_cnt" type="int">1</pattern>
<pattern name="cmake_cxx_compiler_version_last" type="string">2024.2.0</pattern>
<pattern name="SIMD_detected" type="string">AVX_512</pattern>
<pattern name="SIMD_detected_first" type="string">AVX_512</pattern>
<pattern name="SIMD_detected_cnt" type="int">1</pattern>
<pattern name="SIMD_detected_last" type="string">AVX_512</pattern>
<pattern name="SIMD_flags" type="string">-march=skylake-avx512</pattern>
<pattern name="SIMD_flags_first" type="string">-march=skylake-avx512</pattern>
<pattern name="SIMD_flags_cnt" type="int">1</pattern>
<pattern name="SIMD_flags_last" type="string">-march=skylake-avx512</pattern>
<pattern name="FFT_detected" type="string">Intel MKL</pattern>
<pattern name="FFT_detected_first" type="string">Intel MKL</pattern>
<pattern name="FFT_detected_cnt" type="int">1</pattern>
<pattern name="FFT_detected_last" type="string">Intel MKL</pattern>
</workpackage>
</step>
</analyser>
</analyse>
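Reading the data storage by eye becomes tedious once several workpackages are involved. As a rough sketch, the file can also be inspected with Python's standard xml.etree.ElementTree module. The helper collect_patterns is a hypothetical name introduced here, and the embedded XML is a shortened excerpt of the listing above; the structure is assumed to follow that listing.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the data storage shown above.
ANALYSE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<analyse>
  <analyser name="gromacs_analyser">
    <step name="build">
      <workpackage id="1">
        <pattern name="cmake_c_compiler_id" type="string">IntelLLVM</pattern>
        <pattern name="cmake_c_compiler_id_cnt" type="int">1</pattern>
        <pattern name="SIMD_detected" type="string">AVX_512</pattern>
        <pattern name="FFT_detected" type="string">Intel MKL</pattern>
      </workpackage>
    </step>
  </analyser>
</analyse>
"""


def collect_patterns(xml_bytes):
    """Return {(step name, workpackage id): {pattern name: value}}.

    Values are converted according to the type attribute of each pattern;
    anything that is not int or float is kept as a string.
    """
    casts = {"int": int, "float": float}
    root = ET.fromstring(xml_bytes)
    matches = {}
    for step in root.iter("step"):
        for workpackage in step.iter("workpackage"):
            key = (step.get("name"), int(workpackage.get("id")))
            matches[key] = {
                pattern.get("name"): casts.get(pattern.get("type"), str)(pattern.text)
                for pattern in workpackage.iter("pattern")
            }
    return matches


matches = collect_patterns(ANALYSE_XML.encode("utf-8"))
for (step_name, wp_id), patterns in matches.items():
    print(f"step={step_name} workpackage={wp_id}")
    for name, value in patterns.items():
        print(f"  {name} = {value!r}")
```

To inspect a real run, the bytes of the file reported by the analyse command (jube_run/000035/analyse.xml in the output above) can be passed instead of the embedded excerpt.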
JUBE also tracks multiple matches per pattern and stores them in “shadow” patterns with additional suffixes, such as _first, _last and _cnt in the listing above. You can find a more detailed description in the JUBE Glossary under ‘statistical values’. For numerical statistics (e.g., min, max, avg, std), the pattern needs to be of type int or float. A unit, specified as a string, can also be stored with a pattern.
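The idea behind these derived values can be illustrated with a few lines of Python. This is only a sketch of the concept, not JUBE's implementation: the function shadow_values and the pattern name step_time are hypothetical, the suffixes _first, _last and _cnt are taken from the listing above, and the suffix spelling for the numerical statistics (_min, _max, _avg) is an assumption based on the names mentioned in the text. The standard deviation is left out because its exact definition is not stated here.

```python
from statistics import mean


def shadow_values(name, matches):
    """Derive suffixed values from all matches of one pattern.

    Numerical statistics are only added when every match is an int or float,
    mirroring the requirement that the pattern must be of a numerical type.
    """
    if not matches:
        return {}
    values = {
        f"{name}_first": matches[0],
        f"{name}_last": matches[-1],
        f"{name}_cnt": len(matches),
    }
    if all(isinstance(match, (int, float)) for match in matches):
        values[f"{name}_min"] = min(matches)
        values[f"{name}_max"] = max(matches)
        values[f"{name}_avg"] = mean(matches)
    return values


# Hypothetical numerical pattern that matched three times in one file.
numeric = shadow_values("step_time", [1.5, 2.5, 2.0])
print(numeric)

# A string pattern only receives the non-numerical values.
text = shadow_values("cmake_c_compiler_id", ["IntelLLVM"])
print(text)
```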
Generating a result table
With our first successful matches, we can now define our first result table for build-related information in a result specification.
OUTPUT
gromacs_build:
| compiler | compiler version | SIMD | FFT |
|-----------|------------------|---------|-----------|
| IntelLLVM | 2024.2.0 | AVX_512 | Intel MKL |
Challenge
Add an additional patternset, analyser and result definition to generate an additional table similar to the following:
OUTPUT
gromacs_run:
| wp | gromacs_core_time[s] | gromacs_wall_time[s] | gromacs_core_perf[ns/day] | gromacs_wall_perf[hours/ns] |
|----|----------------------|----------------------|---------------------------|-----------------------------|
| 2 | 89.664 | 3.736 | 89.664 | 3.736 |
The wp column references a JUBE variable identifying the workpackage of the row. This makes it easier to identify the right directory of the workpackage in case multiple run steps were executed.
Callout
Each table is written to a separate file in the result/ directory, named after the table with the extension .dat.
Tables can be either pretty-printed or written in CSV (comma-separated values) format, the latter being the default.
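The difference between the two formats can be sketched in a few lines of Python. The functions to_csv and to_pretty are hypothetical helpers written for this illustration and do not reflect how JUBE renders tables internally; in particular, the left alignment of the cells in the pretty layout is an assumption. The data is the build table shown earlier.

```python
import csv
import io


def to_csv(header, rows):
    """Render the table as comma-separated values, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_pretty(header, rows):
    """Render the table with padded columns and a rule below the header."""
    cells = [[str(cell) for cell in row] for row in [header] + rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]

    def fmt(row):
        padded = (cell.ljust(width) for cell, width in zip(row, widths))
        return "| " + " | ".join(padded) + " |"

    rule = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([fmt(cells[0]), rule] + [fmt(row) for row in cells[1:]]) + "\n"


header = ["compiler", "compiler version", "SIMD", "FFT"]
rows = [["IntelLLVM", "2024.2.0", "AVX_512", "Intel MKL"]]

csv_text = to_csv(header, rows)
pretty_text = to_pretty(header, rows)
print(csv_text)
print(pretty_text)
```

The CSV form is easier to process with other tools, while the pretty form is easier to read in a terminal.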
Key Points
- JUBE allows the definition of patterns to retrieve values from the output of a workflow step.
- Patterns can be defined as regular expressions or Python expressions.
- JUBE can generate result tables in pretty-printed and CSV format.