Introduction

The High Throughput Computing (HTC) service on the SCIGNE platform is based on a computing cluster optimised for processing data-parallel tasks. This cluster is connected to a larger infrastructure, the European Grid Infrastructure (EGI).

Computing jobs are managed using DIRAC. DIRAC allows you to manage your computations both on the SCIGNE platform’s cluster and on the EGI grid infrastructure.

After a brief introduction to how a computing grid works, this documentation explains how to submit, monitor and retrieve the results of your calculations using DIRAC.

The Computing Grid

This section presents the different services involved in submitting a job on the computing grid. The interaction between these services during a job's workflow is illustrated in the figure below. The acronyms are detailed in the table The main services of the computing grid.

_images/job_workflow.png

Job workflow on the grid infrastructure

Users manage computing jobs with DIRAC. Before submitting a job, large data files (> 10 MB) must be copied to the Storage Element (SE). How the SE works is detailed in the documentation about grid storage. These data will then be accessible from the compute nodes.
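For example, once you have a valid proxy (see below), an input file can be copied to the SE with gfal-copy. The endpoint and destination path below are illustrative and should be adapted to your VO and storage namespace:

$ gfal-copy file://$(pwd)/input.dat https://sbgdcache.in2p3.fr/vo.scigne.fr/lab/pname/input.dat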

Once the data is available, the job can be submitted to the DIRAC service, which selects the site(s) where the computation will run. DIRAC makes this selection based on site availability and the computation prerequisites specified by the user. After selecting a site, DIRAC submits the job to the Computing Element (CE). The CE distributes the jobs to the Worker Nodes (WN). When the computation is complete and the results have been copied to an SE, the CE collects the execution information and sends it back to DIRAC.

While the computation is in progress, the user can query DIRAC to monitor its status.

The main services of the computing grid

UI

A UI (User Interface) is a workstation or server where DIRAC is installed and configured. It enables users to:

  • manage jobs (submission, status monitoring, cancellation)

  • retrieve the results of calculations

  • copy, replicate, or delete data on the grid

DIRAC

DIRAC is the server with which users interact when submitting, monitoring, and retrieving the results of calculations. This server selects the grid sites that match the computation requirements. It interacts mainly with the Computing Element (CE) to submit jobs, transfer sandboxes, and monitor the status of a calculation.

SE

The SE (Storage Element) is the component of the grid that manages data storage. It is used to store input data needed by jobs and to retrieve the results they produce. The SE is accessible via various transfer protocols.

CE

The CE (Computing Element) is the server that interacts directly with the queue manager of the computing cluster. It receives jobs from DIRAC and dispatches them to the Worker Nodes (WN) for execution.

WN

The WN (Worker Node) is the server that executes the job. It connects to the SE to retrieve the input data required for the job and can also copy the results back to the SE once the job is complete.

Job Management with DIRAC

The DIRAC software can be used either through a command-line client or via the DIRAC web interface. This section explains how to use the command-line interface.

Prerequisites for Using DIRAC

In order to submit a job, the following two prerequisites must be met:

  • Have a workstation with the DIRAC client installed.

  • Have a valid certificate registered with a Virtual Organisation (VO). For more information, see the document Managing a certificate.

DIRAC Client

The installation of the DIRAC client is explained in the DIRAC documentation.

For users with an IPHC account, the DIRAC client is pre-installed on the UI servers. To set up the DIRAC environment, simply do the following:

$ source /cvmfs/dirac.egi.eu/dirac/bashrc_egi
$ export X509_CERT_DIR=/etc/grid-security/certificates
$ export X509_VOMS_DIR=/etc/grid-security/vomsdir
$ export X509_VOMSES=/etc/vomses
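Once the environment is loaded, you can check that the client is available, for example by displaying its version:

$ dirac-version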

Certificate

The private and public parts of the certificate must be placed in the ${HOME}/.globus directory on the server from which jobs will be submitted. These files must have read permissions restricted to the owner only:

$ ls -l $HOME/.globus
-r-------- 1 user group 1935 Feb 16  2025 usercert.pem
-r-------- 1 user group 1920 Feb 16  2025 userkey.pem
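If needed, the permissions can be restricted with chmod (assuming the two files are already in place):

$ chmod 400 ${HOME}/.globus/usercert.pem ${HOME}/.globus/userkey.pem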

In this documentation, we use the VO vo.scigne.fr, which is the regional VO. You should replace this VO name with the one you are using (e.g., biomed).

Before performing any operations on the regional grid, you need to have a valid proxy. It can be generated with the following command:

$ dirac-proxy-init -g scigne_user -M

The -g option specifies the group to which you belong, and the -M option indicates that the VOMS extension should be added to the proxy.

The following command can be used to check the validity of your proxy:

$ dirac-proxy-info
subject      : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom/CN=1234567890
issuer       : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom
identity     : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom
timeleft     : 23:53:43
DIRAC group  : scigne_user
rfc          : False
path         : /tmp/x509up_u1000
username     : pnom
properties   : NormalUser
VOMS         : True
VOMS fqan    : ['/vo.scigne.fr']

Submit a job

Submitting a job with DIRAC requires a text file containing instructions in JDL (Job Description Language) format. This file specifies the characteristics of the job and its requirements.

The example JDL file shown below is fully functional and can be used to submit a simple job. It will be referred to as myjob.jdl throughout this documentation.

JobName = "mysimplejob";
Executable = "/bin/bash";
Arguments = "myscript.sh";
StdOutput = "stdout.out";
StdError = "stderr.err";
InputSandbox = { "myscript.sh" };
OutputSandbox = { "stdout.out", "stderr.err" };
VirtualOrganisation = "vo.scigne.fr";

Each attribute in the example JDL file has a specific role:

  • JobName specifies the name of the job.

  • Executable defines the command to be executed on the compute nodes.

  • Arguments lists the arguments passed to the program defined by the Executable attribute. The contents of the myscript.sh script represent what you would type interactively to perform your job.

  • StdOutput specifies the name of the file where standard output will be redirected.

  • StdError specifies the name of the file where error messages will be redirected.

  • InputSandbox lists the files sent alongside the JDL file to the DIRAC server. These files can be used by the Worker Nodes (WN) running the job. Note that the total size of the InputSandbox is limited: if the combined size of the files exceeds 10 MB, it is recommended to use a Storage Element (SE) or to pre-install the required software on the compute nodes.

  • OutputSandbox specifies the files to be retrieved after the job has finished. By default, we recommend retrieving the files stdout.out and stderr.err, which contain the standard output and error output, respectively. These files will be downloaded from DIRAC. As with the InputSandbox, the total volume must not exceed 10 MB. If the combined size of the files in the OutputSandbox exceeds this limit, you should copy the output files (results, logs from the job) to a Storage Element (SE) and retrieve them later to avoid truncation.

  • VirtualOrganisation specifies the VO in which the job will run. For the regional grid, this is vo.scigne.fr.

An example of the myscript.sh script is shown below:

#!/bin/sh

echo "=====  Begin  ====="
date
echo "The program is running on $HOSTNAME"
date
dd if=/dev/urandom of=fichier.txt bs=1G count=1
gfal-copy file://`pwd`/fichier.txt https://sbgdcache.in2p3.fr/vo.scigne.fr/lab/pname/fichier.txt
echo "=====  End  ====="

The executables used in the myscript.sh script must be installed on the grid. Software installation is managed by the platform support team. If you need specific software, please contact us.

The gfal-copy command copies the generated file to the storage element sbgdcache.in2p3.fr. Details about the storage service can be found in the dedicated documentation.
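The result can later be downloaded from the SE to the UI by reversing the source and destination of gfal-copy; the path below follows the example script above and is illustrative:

$ gfal-copy https://sbgdcache.in2p3.fr/vo.scigne.fr/lab/pname/fichier.txt file://$(pwd)/fichier.txt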

Once the myjob.jdl and myscript.sh files have been created, you can submit the job to the grid using the following command:

$ dirac-wms-job-submit myjob.jdl
JobID = 79342906

It is important to keep the job identifier 79342906. This identifier is used to track the different stages of a job and to retrieve the OutputSandbox.
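If you script your submissions, the identifier can be captured directly from the command output. A minimal sketch, assuming the output format shown above:

$ JOBID=$(dirac-wms-job-submit myjob.jdl | awk '/JobID/ {print $3}')
$ echo ${JOBID}
79342906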

Monitoring the Status of a Job

To check the status of a job, use the following command with the job identifier:

$ dirac-wms-job-status 79342906
JobID=79342906 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;

Once the calculation has completed, the previous command will display:

$ dirac-wms-job-status 79342906
JobID=79342906 Status=Done; MinorStatus=Execution Complete; Site=LCG.SBG.fr;

The different states that a job’s status can take are detailed in the figure The different states of a DIRAC job below.

Retrieving the Results of a Job

To retrieve the files specified in the OutputSandbox parameter, use the following command:

$ dirac-wms-job-get-output 79342906

You can check the output files using:

$ ls -rtl /home/user/79342906
total 12
-rw-r--r-- 1 user group 73 Jul 25 15:12 StdOut

_images/job_states.png

The different states of a DIRAC job.

Advanced Use of DIRAC

This section explains how to submit parametric tasks.

Submitting a Parametric Set of Jobs

Submitting a parametric task means submitting a set of jobs that share the same JDL file but each use a different value of a parameter. This approach is very useful when you need to run a series of jobs that differ only by the value of a single parameter (for example, a numeric value used in input or output file names).

JobName = "Test_param_%n";
JobGroup = "Param_Test_1";
Executable = "/bin/sh";
Arguments = "myparamscript.sh %s";
Parameters = 10;
ParameterStart = 0;
ParameterStep = 1;
StdOutput = "stdout.out";
StdError = "stderr.err";
InputSandbox = { "myscript.sh" };
OutputSandbox = { "stdout.out", "stderr.err" };
VirtualOrganisation = "vo.scigne.fr";

This JDL file will generate 10 jobs, with the parameter taking the values 0 to 9 (starting at ParameterStart and incremented by ParameterStep). An example of the myparamscript.sh script is shown after the list. The following placeholders can be used in the JDL file:

  • %n: the iteration number

  • %s: the parameter value

  • %j: the job identifier
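A minimal sketch of myparamscript.sh, which receives the parameter value (%s) as its first argument; the processing step is purely illustrative:

#!/bin/sh

# The parameter value substituted for %s is passed as the first argument
PARAM=$1
echo "Running iteration with parameter value ${PARAM}"
# Illustrative step: use the parameter in an output file name
echo "result for parameter ${PARAM}" > result_${PARAM}.txt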

$ dirac-wms-job-submit params.jdl
JobID = [79343433, 79343434, 79343435, 79343436, 79343437, 79343438, 79343439, 79343440, 79343441, 79343442]
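The status of the whole set can then be checked by passing several job identifiers to dirac-wms-job-status:

$ dirac-wms-job-status 79343433 79343434 79343435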

Using COMDIRAC Commands

The COMDIRAC commands provide simplified and efficient access to the DIRAC system. They allow easier management of jobs and data transfers across multiple virtual organizations (VOs).
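As an illustration only, a typical session might look like the sketch below; the exact command names and options depend on the COMDIRAC version installed, so check the COMDIRAC documentation before use:

$ dsub /bin/echo "Hello world"   # submit a simple job directly from the command line
$ dstat                          # list your recent jobs and their status
$ doutput 79342906               # retrieve the output sandbox of a finished job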

Supplementary Documentation

The following links provide access to further documentation for more advanced use of DIRAC and the computing grid: