Output analysis and presentation

Last updated on 2025-06-27

Overview

Questions

  • How can you retrieve output from your HPC application and workflow?
  • How can you generate tables and CSV output from JUBE workflow parameters?

Objectives

  • Generate regular expression patterns to analyse the output of a step.
  • Generate a basic table from benchmark parameters and output.

Prerequisite

If you cannot execute GROMACS via the generated batch script to produce output, you can download an example error log job.err and place it in the work directory of your run stage (jube_run/<id>/000002_run/work/). Furthermore, you need to let JUBE know that the execution was successful by running the following command in the same directory.

SH

jube_run/<id>/000002_run/work/$ touch ready

Then continue the workflow to complete it.

Analyzing workflow output


Next to parameters, which are evaluated at use time in the steps of a workflow, another way to automatically obtain and store information about a workflow execution is to analyse the output generated during execution. JUBE provides an easy way to parse workflow output with regular expressions and to store the values of matched patterns in variables that can be included in later result tables.

Patterns are defined as part of pattern sets that can be applied to specific files during analysis. As with parameters, the names of patterns need to be unique for the whole workflow. When included in result tables, their names can be shortened or changed completely to better fit the generated table.

Challenge

For the workflow defined, we have two steps with potentially valuable information: build and run. Check the outputs in the corresponding workpackages and identify interesting output files to parse for meaningful data.

  • The cmake_configure.log in the build workpackage contains output from CMake that provides some insight on which libraries were found and used for the build.
  • The stdout in the run workpackage contains the output of the job submission command.
  • The job.err in the run workpackage contains the GROMACS output.

Patterns are defined as regular expressions as part of a pattern set. When we take the following snippets from cmake_configure.log

OUTPUT

-- The C compiler identification is IntelLLVM 2024.2.0
-- The CXX compiler identification is IntelLLVM 2024.2.0
...
-- Detected best SIMD instructions for this CPU - AVX_512
-- Enabling 512-bit AVX-512 SIMD instructions using CXX flags:  -march=skylake-avx512
...
-- Using external FFT library - Intel MKL
...

we can already identify several pieces of information that are worth extracting.
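Before writing patterns into the workflow specification, it can help to try candidate regular expressions against the snippet in plain Python, since JUBE patterns use Python regular expression syntax. The following is a minimal sketch: the pattern names mirror identifiers that appear in this lesson, but the regular expressions themselves and the helper `extract` are assumptions made for illustration, not the patterns used by the lesson's workflow.

```python
import re

# Lines copied from the cmake_configure.log excerpt above.
log = """\
-- The C compiler identification is IntelLLVM 2024.2.0
-- The CXX compiler identification is IntelLLVM 2024.2.0
-- Detected best SIMD instructions for this CPU - AVX_512
-- Enabling 512-bit AVX-512 SIMD instructions using CXX flags:  -march=skylake-avx512
-- Using external FFT library - Intel MKL
"""

# Hypothetical regular expressions (assumptions for illustration).
# Each one contains exactly one capturing group (), which defines the value.
patterns = {
    "cmake_c_compiler_id": r"The C compiler identification is (\S+)",
    "cmake_c_compiler_version": r"The C compiler identification is \S+ (\S+)",
    "SIMD_detected": r"Detected best SIMD instructions for this CPU - (\S+)",
    "SIMD_flags": r"SIMD instructions using CXX flags:\s+(.+)",
    "FFT_detected": r"Using external FFT library - (.+)",
}


def extract(text, patterns):
    """Return the first match of every pattern found in the text."""
    result = {}
    for name, regex in patterns.items():
        match = re.search(regex, text)
        if match:
            result[name] = match.group(1)
    return result


values = extract(log, patterns)
for name, value in values.items():
    print(f"{name}: {value}")
```

A pattern that yields nothing here will most likely yield nothing in JUBE either, so this kind of quick check can shorten the iteration on unfamiliar output.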

Each pattern can contain an arbitrary number of wildcards, but must contain exactly one matching operator (), which defines the value of the pattern. JUBE also defines several patterns for common elementary types such as numbers and individual words:

  • $jube_pat_int: integer number w/ matching operator
  • $jube_pat_nint: integer number w/o matching operator
  • $jube_pat_fp: floating point number w/ matching operator
  • $jube_pat_nfp: floating point number w/o matching operator
  • $jube_pat_wrd: word w/ matching operator
  • $jube_pat_nwrd: word w/o matching operator
  • $jube_pat_bl: blank space (variable length) w/o matching operator
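The difference between the variants with and without a matching operator can be tried out in Python. The sketch below uses simplified stand-ins for the predefined patterns; these regular expressions, the helper `expand`, and the sample line are assumptions for illustration, and the exact definitions should be taken from the JUBE documentation. The variants without a matching operator are written as non-capturing groups, so they can be combined freely while the full pattern still has exactly one capturing group.

```python
import re
from string import Template

# Simplified stand-ins (assumptions) for JUBE's predefined patterns.
JUBE_LIKE = {
    "jube_pat_int": r"([+-]?\d+)",
    "jube_pat_nint": r"(?:[+-]?\d+)",
    "jube_pat_fp": r"([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)",
    "jube_pat_nfp": r"(?:[+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)",
    "jube_pat_wrd": r"(\S+)",
    "jube_pat_nwrd": r"(?:\S+)",
    "jube_pat_bl": r"(?:\s+)",
}


def expand(pattern):
    """Replace $jube_pat_* placeholders with their regular expressions."""
    return Template(pattern).safe_substitute(JUBE_LIKE)


# Made-up output line, used only to demonstrate the mechanics.
line = "timing: 1.5 42"

# Capture the first number.
first = re.search(expand(r"timing:$jube_pat_bl$jube_pat_fp"), line).group(1)

# Skip the first number without capturing it, then capture the second one.
second = re.search(
    expand(r"timing:$jube_pat_bl$jube_pat_nfp$jube_pat_bl$jube_pat_int"), line
).group(1)

print(first, second)
```

Using the non-capturing variants for everything that should only be skipped keeps the pattern at exactly one matching operator.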

Callout

While all steps of a JUBE workflow need to be defined before the workflow is started and cannot be altered after that, patterns and result tables can be updated while the workflow is active and even after a workflow has completed. This enables an iterative approach for defining patterns and result tables for previously unknown output.

To use the defined patterns, JUBE needs an analyser specifying which patterns to use for which file or which patterns to use globally.

We can then start and test the analyser with the analyse command. However, as we modified the workflow configuration by adding new patterns and analyser definitions, we need to tell JUBE to update its information about the workflow. This can be done with the -u (for update) argument followed by the updated workflow specification.

OUTPUT

######################################################################
# Analyse benchmark "GROMACS" id: 35
######################################################################
>>> Start analyse
>>> Analyse finished
>>> Analyse data storage: jube_run/000035/analyse.xml
######################################################################

Without the definition of a result table, we cannot visualise this directly, but we can investigate the data storage given in the output and check which patterns were matched for which workpackage.

XML

<?xml version="1.0" encoding="UTF-8"?>
<analyse>
  <analyser name="gromacs_analyser">
    <step name="build">
      <workpackage id="1">
        <pattern name="cmake_c_compiler_id" type="string">IntelLLVM</pattern>
        <pattern name="cmake_c_compiler_id_first" type="string">IntelLLVM</pattern>
        <pattern name="cmake_c_compiler_id_cnt" type="int">1</pattern>
        <pattern name="cmake_c_compiler_id_last" type="string">IntelLLVM</pattern>
        <pattern name="cmake_c_compiler_version" type="string">2024.2.0</pattern>
        <pattern name="cmake_c_compiler_version_first" type="string">2024.2.0</pattern>
        <pattern name="cmake_c_compiler_version_cnt" type="int">1</pattern>
        <pattern name="cmake_c_compiler_version_last" type="string">2024.2.0</pattern>
        <pattern name="cmake_cxx_compiler_id" type="string">IntelLLVM</pattern>
        <pattern name="cmake_cxx_compiler_id_first" type="string">IntelLLVM</pattern>
        <pattern name="cmake_cxx_compiler_id_cnt" type="int">1</pattern>
        <pattern name="cmake_cxx_compiler_id_last" type="string">IntelLLVM</pattern>
        <pattern name="cmake_cxx_compiler_version" type="string">2024.2.0</pattern>
        <pattern name="cmake_cxx_compiler_version_first" type="string">2024.2.0</pattern>
        <pattern name="cmake_cxx_compiler_version_cnt" type="int">1</pattern>
        <pattern name="cmake_cxx_compiler_version_last" type="string">2024.2.0</pattern>
        <pattern name="SIMD_detected" type="string">AVX_512</pattern>
        <pattern name="SIMD_detected_first" type="string">AVX_512</pattern>
        <pattern name="SIMD_detected_cnt" type="int">1</pattern>
        <pattern name="SIMD_detected_last" type="string">AVX_512</pattern>
        <pattern name="SIMD_flags" type="string">-march=skylake-avx512</pattern>
        <pattern name="SIMD_flags_first" type="string">-march=skylake-avx512</pattern>
        <pattern name="SIMD_flags_cnt" type="int">1</pattern>
        <pattern name="SIMD_flags_last" type="string">-march=skylake-avx512</pattern>
        <pattern name="FFT_detected" type="string">Intel MKL</pattern>
        <pattern name="FFT_detected_first" type="string">Intel MKL</pattern>
        <pattern name="FFT_detected_cnt" type="int">1</pattern>
        <pattern name="FFT_detected_last" type="string">Intel MKL</pattern>
      </workpackage>
    </step>
  </analyser>
</analyse>
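For larger runs, reading the data storage by eye becomes tedious. The following sketch lists the matched patterns per analyser, step and workpackage using Python's standard library. It assumes the element and attribute names shown in the listing above; the helper `matched_patterns` is hypothetical, and the embedded XML is an abbreviated copy of that listing so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the analyse.xml listing above.
xml_text = """
<analyse>
  <analyser name="gromacs_analyser">
    <step name="build">
      <workpackage id="1">
        <pattern name="cmake_c_compiler_id" type="string">IntelLLVM</pattern>
        <pattern name="cmake_c_compiler_id_cnt" type="int">1</pattern>
        <pattern name="SIMD_detected" type="string">AVX_512</pattern>
        <pattern name="FFT_detected" type="string">Intel MKL</pattern>
      </workpackage>
    </step>
  </analyser>
</analyse>
"""


def matched_patterns(root):
    """Map (analyser, step, workpackage id) to a dict of pattern values."""
    result = {}
    for analyser in root.iter("analyser"):
        for step in analyser.iter("step"):
            for wp in step.iter("workpackage"):
                key = (analyser.get("name"), step.get("name"), int(wp.get("id")))
                result[key] = {p.get("name"): p.text for p in wp.iter("pattern")}
    return result


found = matched_patterns(ET.fromstring(xml_text.strip()))
for (analyser, step, wp_id), values in found.items():
    print(f"{analyser} / {step} / workpackage {wp_id}")
    for name, value in values.items():
        print(f"  {name} = {value}")
```

To inspect a real run, the root element could instead be obtained with `ET.parse(path).getroot()`, where `path` is the data storage file reported by the analyse command.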

JUBE also tracks multiple matches per pattern and stores them in “shadow” patterns with additional suffixes. You can find a more detailed description in the JUBE Glossary under ‘statistical values’. For numerical statistics (e.g., min, max, avg, std) the pattern needs to be of type int or float. In addition, a unit can be stored with a pattern as a string.
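The idea behind these suffixed values can be sketched in a few lines of Python. This is an illustration of the concept, not JUBE's implementation: the helper `shadow_values` and the sample output are made up, the plain pattern name is assumed to hold the first match, and the standard deviation is computed as a population standard deviation, which may differ from JUBE's definition.

```python
import re
import statistics


def shadow_values(name, regex, text, numeric=False):
    """Collect all matches of one pattern and derive suffixed values."""
    matches = re.findall(regex, text)  # one capturing group: list of strings
    if not matches:
        return {}
    values = [float(m) for m in matches] if numeric else matches
    result = {
        name: values[0],  # assumption: the plain name holds the first match
        f"{name}_first": values[0],
        f"{name}_last": values[-1],
        f"{name}_cnt": len(values),
    }
    if numeric:
        # Numerical statistics only make sense for int or float patterns.
        result[f"{name}_min"] = min(values)
        result[f"{name}_max"] = max(values)
        result[f"{name}_avg"] = statistics.fmean(values)
        # Assumption: population standard deviation.
        result[f"{name}_std"] = statistics.pstdev(values)
    return result


# Made-up output with three matches of the same pattern.
sample = "step time: 2.0\nstep time: 4.0\nstep time: 6.0\n"
stats = shadow_values("step_time", r"step time: ([+-]?\d*\.?\d+)", sample, numeric=True)
for key, value in stats.items():
    print(f"{key}: {value}")
```

For a string pattern, only the first, last and count values are meaningful, which matches the suffixes visible in the build workpackage above.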

Generating a result table


With our first successful matches, we can now define our first result table for build-related information in a result specification.

OUTPUT

gromacs_build:
|  compiler | compiler version |    SIMD |       FFT |
|-----------|------------------|---------|-----------|
| IntelLLVM |         2024.2.0 | AVX_512 | Intel MKL |

Challenge

Add an additional patternset, analyser and result definition to generate an additional table similar to the following:

OUTPUT

gromacs_run:
| wp | gromacs_core_time[s] | gromacs_wall_time[s] | gromacs_core_perf[ns/day] | gromacs_wall_perf[hours/ns] |
|----|----------------------|----------------------|---------------------------|-----------------------------|
|  2 |               89.664 |                3.736 |                    89.664 |                       3.736 |

The wp column references a JUBE variable identifying the workpackage of the row. This makes it easier to identify the right directory of the workpackage in case multiple run steps were executed.

Callout

Each table is written to a separate file in the result/ directory, named after the table with the extension .dat.

Tables can be either pretty-printed or written in CSV (comma-separated values) format, with CSV being the default.
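To make the difference between the two styles concrete, the sketch below renders the gromacs_build row from this lesson in both forms. The functions `to_csv` and `to_pretty` are hypothetical helpers written for this illustration and are not part of JUBE; the pretty layout simply imitates the table shown earlier.

```python
import csv
import io


def to_csv(header, rows):
    """Render a table as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_pretty(header, rows):
    """Render a table with right-aligned, padded columns."""
    # Column width is the longest cell in each column, header included.
    widths = [max(len(str(cell)) for cell in column) for column in zip(header, *rows)]

    def line(cells):
        padded = (str(cell).rjust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    separator = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([line(header), separator] + [line(row) for row in rows])


header = ["compiler", "compiler version", "SIMD", "FFT"]
rows = [["IntelLLVM", "2024.2.0", "AVX_512", "Intel MKL"]]

print(to_csv(header, rows))
print(to_pretty(header, rows))
```

CSV is the better choice when the table is processed further by scripts or spreadsheets, while the pretty-printed form is easier to read in a terminal.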

Key Points

  • JUBE allows the definition of patterns to retrieve values from the output of a workflow step.
  • Patterns can be defined as regular expressions or Python expressions.
  • JUBE can generate result tables in pretty-printed or CSV format.