Introduction
============

The High Throughput Computing (*HTC*) service on the `SCIGNE platform `_ is
based on a computing cluster optimised for processing data-parallel tasks.
This cluster is connected to a larger infrastructure, the `EGI European Grid
Infrastructure `_. Computing jobs are managed with `DIRAC `_, which lets you
run your computations both on the SCIGNE platform's cluster and on the EGI
grid infrastructure.

After a brief introduction to how a computing grid works, this documentation
explains how to submit, monitor and retrieve the results of your calculations
using DIRAC.

The Computing Grid
==================

This section presents the different services involved in submitting a job on
the computing grid. The interaction between these services during a job's
workflow is illustrated in the figure below. The acronyms are detailed in the
table `The main services of the computing grid <#grid-services>`_.

.. figure:: _static/job_workflow.png

   Job workflow on the grid infrastructure

Users manage computing jobs with DIRAC. Before submitting a job, large data
files (> 10 MB) must be copied to the Storage Element (SE). How the SE works
is detailed in the `documentation about grid storage `_. These data will then
be accessible from the compute nodes. Once the data is available, the job can
be submitted to the DIRAC service, which selects the site(s) where the
computation will run. DIRAC makes this selection based on site availability
and the computation prerequisites specified by the user. After selecting a
site, DIRAC submits the job to the Computing Element (CE). The CE distributes
the jobs to the Worker Nodes (WN). When the computation is complete and the
results have been copied to an SE, the CE collects the execution information
and sends it back to DIRAC. While the computation is in progress, the user
can query DIRAC to monitor its status.

.. _grid-services:

.. table:: The main services of the computing grid

   +---------+----------------------------------------------------------------+
   | Element | Role                                                           |
   +=========+================================================================+
   | UI      | A UI (*User Interface*) is a workstation or server where       |
   |         | DIRAC is installed and configured. It enables users to:        |
   |         |                                                                |
   |         | - manage jobs (submission, status monitoring, cancellation)    |
   |         |                                                                |
   |         | - retrieve the results of calculations                         |
   |         |                                                                |
   |         | - copy, replicate, or delete data on the grid                  |
   +---------+----------------------------------------------------------------+
   | DIRAC   | DIRAC is the server with which users interact when submitting, |
   |         | monitoring, and retrieving the results of calculations. This   |
   |         | server selects the grid sites that match the computation       |
   |         | requirements. It interacts mainly with the Computing           |
   |         | Element (CE) to submit jobs, transfer sandboxes, and           |
   |         | monitor the status of a calculation.                           |
   +---------+----------------------------------------------------------------+
   | SE      | The SE (*Storage Element*) is the component of the grid that   |
   |         | manages data storage. It is used to store input data needed by |
   |         | jobs and to retrieve the results they produce. The SE is       |
   |         | accessible via various transfer protocols.                     |
   +---------+----------------------------------------------------------------+
   | CE      | The CE (*Computing Element*) is the server that interacts      |
   |         | directly with the queue manager of the computing cluster. It   |
   |         | receives jobs from DIRAC and dispatches them to the Worker     |
   |         | Nodes (WN) for execution.                                      |
   +---------+----------------------------------------------------------------+
   | WN      | The WN (*Worker Node*) is the server that executes the job. It |
   |         | connects to the SE to retrieve the input data required for the |
   |         | job and can also copy the results back to the SE once the job  |
   |         | is complete.                                                   |
   +---------+----------------------------------------------------------------+

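In practice, each step of this workflow corresponds to one of the commands
presented in the rest of this guide. The sequence below is only a preview:
the input file name, the storage path and the job identifier are
placeholders, and the storage endpoint is the one used in the examples later
on. Every command is explained in detail in the following sections.

.. code-block:: console

   $ gfal-copy file://$PWD/input.dat https://sbgdcache.in2p3.fr/vo.scigne.fr/lab/pname/input.dat  # copy large input data to the SE
   $ dirac-wms-job-submit myjob.jdl      # submit the job described in myjob.jdl to DIRAC
   $ dirac-wms-job-status <JobID>        # follow the job while it runs on a CE/WN
   $ dirac-wms-job-get-output <JobID>    # retrieve the OutputSandbox once the job is Done
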
Job Management with DIRAC
=========================

The DIRAC software can be used either through a command-line client or via
the `DIRAC web interface `_. This section explains how to use the
command-line interface.

Prerequisites for Using DIRAC
-----------------------------

In order to submit a job, the following two prerequisites must be met:

- Have a workstation with the DIRAC client installed.

- Have a valid certificate registered with a Virtual Organisation (*VO*).
  For more information, see the document `Managing a certificate `_.

DIRAC Client
++++++++++++

The installation of the DIRAC client is explained in the
`DIRAC documentation `_. For users with an IPHC account, the DIRAC client is
already pre-installed on the *UI* servers. To set up the DIRAC environment,
simply do the following:

.. code-block:: console

   $ source /cvmfs/dirac.egi.eu/dirac/bashrc_egi
   $ export X509_CERT_DIR=/etc/grid-security/certificates
   $ export X509_VOMS_DIR=/etc/grid-security/vomsdir
   $ export X509_VOMSES=/etc/vomses

Certificate
+++++++++++

The private and public parts of the certificate must be placed in the
``${HOME}/.globus`` directory on the server from which jobs will be
submitted. These files must have read permissions restricted to the owner
only:

.. code-block:: console

   $ ls -l $HOME/.globus
   -r-------- 1 user group 1935 Feb 16 2025 usercert.pem
   -r-------- 1 user group 1920 Feb 16 2025 userkey.pem

In this documentation, we use the VO ``vo.scigne.fr``, which is the regional
VO. You should replace this VO name with the one you are using (e.g.,
``biomed``).

Before performing any operations on the regional grid, you need to have a
valid proxy. It can be generated with the following command:

.. code-block:: console

   $ dirac-proxy-init -g scigne_user -M

The ``-g`` option specifies the group to which you belong, and the ``-M``
option indicates that the *VOMS* extension should be added to the proxy. The
following command can be used to check the validity of your proxy:

.. code-block:: console

   $ dirac-proxy-info
   subject     : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom/CN=1234567890
   issuer      : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom
   identity    : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=NOM Prenom
   timeleft    : 23:53:43
   DIRAC group : scigne_user
   rfc         : False
   path        : /tmp/x509up_u1000
   username    : pnom
   properties  : NormalUser
   VOMS        : True
   VOMS fqan   : ['/vo.scigne.fr']

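Because most DIRAC operations fail once the proxy has expired, it can be
convenient to check the proxy from a script before launching a series of
jobs. The snippet below is only a minimal sketch, not an official DIRAC
feature: it parses the ``timeleft`` field printed by ``dirac-proxy-info``
(shown above) and renews the proxy when less than two hours remain. The
two-hour threshold is an arbitrary choice, and the group name is the one used
in the examples of this guide.

.. code-block:: bash

   #!/bin/bash
   # Minimal sketch: renew the proxy when it is missing or about to expire.
   # It relies only on the "timeleft : HH:MM:SS" line printed by
   # dirac-proxy-info; the 2-hour threshold is an arbitrary choice.
   timeleft=$(dirac-proxy-info 2>/dev/null | awk '/timeleft/ {print $3}')
   hours=${timeleft%%:*}
   if [ -z "$timeleft" ] || [ "${hours#0}" -lt 2 ]; then
       echo "Proxy missing or expiring soon, renewing it..."
       dirac-proxy-init -g scigne_user -M
   fi
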
Submit a Job
------------

Submitting a job with DIRAC requires a text file containing instructions in
JDL (Job Description Language) format. This file specifies the
characteristics of the job and its requirements. The example JDL file shown
below is fully functional and can be used to submit a simple job. It will be
referred to as ``myjob.jdl`` throughout this documentation.

.. code::

   JobName = "mysimplejob";
   Executable = "/bin/bash";
   Arguments = "myscript.sh";
   StdOutput = "stdout.out";
   StdError = "stderr.err";
   InputSandbox = { "myscript.sh" };
   OutputSandbox = { "stdout.out", "stderr.err" };
   VirtualOrganisation = "vo.scigne.fr";

Each attribute in the example JDL file has a specific role:

- *JobName* specifies the name of the job.

- *Executable* defines the command to be executed on the compute nodes.

- *Arguments* lists the arguments passed to the program defined by the
  *Executable* attribute. The contents of the ``myscript.sh`` script
  represent what you would type interactively to perform your job.

- *StdOutput* specifies the name of the file where standard output will be
  redirected.

- *StdError* specifies the name of the file where error messages will be
  redirected.

- *InputSandbox* lists the files sent alongside the JDL file to the DIRAC
  server. These files can be used by the Worker Nodes (WN) running the job.
  Note that the total size of the *InputSandbox* is limited. If the total
  size of all files exceeds 10 MB, it is recommended to use a Storage
  Element (SE) or to pre-install the required software on the compute nodes.

- *OutputSandbox* specifies the files to be retrieved after the job has
  finished. By default, we recommend retrieving the files ``stdout.out`` and
  ``stderr.err``, which contain the standard output and error output,
  respectively. These files will be downloaded from DIRAC. As with the
  *InputSandbox*, the total volume must not exceed 10 MB. If the combined
  size of the files in the *OutputSandbox* exceeds this limit, you should
  copy the output files (results, logs from the job) to a Storage Element
  (SE) and retrieve them later to avoid truncation.

- *VirtualOrganisation* specifies the VO in which the job will be run. For
  the regional grid, this is ``vo.scigne.fr``.

An example of the ``myscript.sh`` script is shown below:

.. code-block:: bash

   #!/bin/sh
   echo "===== Begin ====="
   date
   echo "The program is running on $HOSTNAME"
   date
   dd if=/dev/urandom of=fichier.txt bs=1G count=1
   gfal-copy file://`pwd`/fichier.txt https://sbgdcache.in2p3.fr/vo.scigne.fr/lab/pname/fichier.txt
   echo "===== End ====="

The executables used in the ``myscript.sh`` script must be installed on the
grid. Software installation is managed by the platform support team. If you
need specific software, please `contact us `_. The ``gfal-copy`` command
copies the generated file to the storage element ``sbgdcache.in2p3.fr``.
Details about the storage service can be found in the
`dedicated documentation `_.

Once the ``myjob.jdl`` and ``myscript.sh`` files have been created, you can
submit the job to the grid using the following command:

.. code-block:: console

   $ dirac-wms-job-submit myjob.jdl
   JobID = 79342906

It is important to keep the job identifier (``79342906`` in this example):
it is used to track the different stages of the job and to retrieve the
*OutputSandbox*.

Monitoring the Status of a Job
------------------------------

To check the status of a job, use the following command with the job
identifier:

.. code-block:: console

   $ dirac-wms-job-status 79342906
   JobID=79342906 Status=Waiting; MinorStatus=Pilot Agent Submission; Site=ANY;

Once the calculation has completed, the previous command will display:

.. code-block:: console

   $ dirac-wms-job-status 79342906
   JobID=79342906 Status=Done; MinorStatus=Execution Complete; Site=LCG.SBG.fr;

The different states that a job can go through are shown in the figure *The
different states of a DIRAC job* in the next section.

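When several jobs have been submitted, checking each identifier by hand
quickly becomes tedious. The script below is a minimal sketch built only on
the ``dirac-wms-job-status`` command shown above: it polls a list of job
identifiers stored in a hypothetical ``jobids.txt`` file (one identifier per
line) until all of them have reached a final state. Treating ``Done``,
``Failed`` and ``Killed`` as the final states, as well as the five-minute
polling interval, are assumptions made for this example.

.. code-block:: bash

   #!/bin/bash
   # Minimal sketch: poll the status of the job IDs listed in jobids.txt
   # (one per line, a hypothetical bookkeeping file) until all of them
   # have reached a final state.
   while true; do
       pending=0
       while read -r jobid; do
           line=$(dirac-wms-job-status "$jobid")
           echo "$line"
           # Done, Failed and Killed are treated as final states (assumption).
           echo "$line" | grep -qE 'Status=(Done|Failed|Killed);' || pending=$((pending + 1))
       done < jobids.txt
       [ "$pending" -eq 0 ] && break
       echo "${pending} job(s) not finished yet, checking again in 5 minutes..."
       sleep 300
   done
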
Retrieving the Results of a Job
-------------------------------

To retrieve the files specified in the *OutputSandbox* parameter, use the
following command:

.. code-block:: console

   $ dirac-wms-job-get-output 79342906

You can check the output files using:

.. code-block:: console

   $ ls -rtl /home/user/79342906
   total 12
   -rw-r--r-- 1 user group 73 juil. 25 15:12 StdOut

.. figure:: _static/job_states.png

   The different states of a DIRAC job.

Advanced Use of DIRAC
=====================

This section explains how to submit parametric tasks.

Submitting a Parametric Set of Jobs
-----------------------------------

Submitting a *parametric* task means submitting a set of jobs that share the
same JDL file but use a different parameter value for each job. This
approach is very useful when you need to run a series of jobs that differ
only by the value of a single parameter (for example, a numeric value used
in input or output file names).

.. code::

   JobName = "Test_param_%n";
   JobGroup = "Param_Test_1";
   Executable = "/bin/sh";
   Arguments = "myparamscript.sh %s";
   Parameters = 10;
   ParameterStart = 0;
   ParameterStep = 1;
   StdOutput = "stdout.out";
   StdError = "stderr.err";
   InputSandbox = { "myparamscript.sh" };
   OutputSandbox = { "stdout.out", "stderr.err" };
   VirtualOrganisation = "vo.scigne.fr";

This JDL file will generate 10 jobs, with parameter values ranging from 0 to
9. The following placeholders can be used in the JDL file:

- ``%n``: the iteration number

- ``%s``: the parameter value

- ``%j``: the job identifier

Submitting this file returns one job identifier per generated job:

.. code-block:: console

   $ dirac-wms-job-submit params.jdl
   JobID = [79343433, 79343434, 79343435, 79343436, 79343437, 79343438, 79343439, 79343440, 79343441, 79343442]

Using COMDIRAC Commands
=======================

The `COMDIRAC `_ commands provide simplified and efficient access to the
DIRAC system. They allow easier management of jobs and data transfers across
multiple virtual organisations (VOs). A short, illustrative example of a
COMDIRAC session is sketched at the end of this page.

Supplementary Documentation
===========================

The following links provide access to further documentation for more
advanced use of DIRAC and the computing grid:

- `Official DIRAC documentation `_

- `DIRAC documentation on the France Grilles website `_ (*in French*)

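As announced in the COMDIRAC section above, the following is a minimal
sketch of what a COMDIRAC session can look like. The command names
(``dconfig``, ``dinit``, ``dsub``, ``dstat``, ``doutput``) belong to the
COMDIRAC extension, but their exact options and behaviour depend on the
version installed on your UI; treat this as an illustration to be checked
against the official COMDIRAC documentation rather than a verified recipe.

.. code-block:: console

   $ dconfig            # review the local COMDIRAC configuration (default VO, group, ...)
   $ dinit              # create a proxy, similarly to dirac-proxy-init
   $ dsub myjob.jdl     # submit the JDL file used earlier in this guide
   $ dstat              # list the status of your recent jobs
   $ doutput 79342906   # retrieve the OutputSandbox of a finished job
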