Introduction
============

The High Throughput Computing (*HTC*) service on the `SCIGNE platform `_ is
based on a computing cluster optimised for processing data-parallel tasks.
This cluster is connected to a larger infrastructure, the `EGI European Grid
Infrastructure `_. Although the recommended way to interact with the HTC
service is to use the `DIRAC software `_, it is also possible to use the
*Computing Element* (**CE**) of the SCIGNE platform directly to manage your
jobs. The CE software installed at SCIGNE is `ARC CE `_.

After a brief introduction to how a computing grid works, this documentation
explains how to submit, monitor and retrieve the results of your calculations
with the ARC CE client.

The Computing Grid
==================

This section presents the different services involved in submitting a job on
the computing grid. The interaction between these services during a job's
workflow is illustrated in the figure below. The acronyms are detailed in the
table `The main services of the computing grid <#grid-services>`_.

.. figure:: _static/job_workflow_arcce.png

   Job workflow on the grid infrastructure

Users manage computing jobs with DIRAC. Before submitting a job, large data
files (> 10 MB) must be copied to the Storage Element (SE). How the SE works
is detailed in the `documentation about grid storage `_. These data will then
be accessible from the compute nodes. Once the data is available, the job can
be submitted to the DIRAC service, which selects the site(s) where the
computation will run. DIRAC makes this selection based on site availability
and the calculation prerequisites specified by the user. After selecting a
site, DIRAC submits the job to the Computing Element (CE). The CE distributes
the jobs to the Worker Nodes (WN). When the computation is complete and the
results have been copied to an SE, the CE collects the execution information
and sends it back to DIRAC. While the computation is in progress, the user
can query DIRAC to monitor its status.

.. _grid-services:

.. table:: The main services of the computing grid

   +---------+----------------------------------------------------------------+
   | Element | Role                                                           |
   +=========+================================================================+
   | UI      | A UI (*User Interface*) is a workstation or server where      |
   |         | DIRAC is installed and configured. It enables users to:       |
   |         |                                                                |
   |         | - manage jobs (submission, status monitoring, cancellation)   |
   |         |                                                                |
   |         | - retrieve the results of calculations                        |
   |         |                                                                |
   |         | - copy, replicate, or delete data on the grid                 |
   +---------+----------------------------------------------------------------+
   | SE      | The SE (*Storage Element*) is the component of the grid that  |
   |         | manages data storage. It is used to store input data needed by|
   |         | jobs and to retrieve the results they produce. The SE is      |
   |         | accessible via various transfer protocols.                    |
   +---------+----------------------------------------------------------------+
   | ARC CE  | The ARC CE (*Computing Element*) is the server that interacts |
   |         | directly with the queue manager of the computing cluster. It  |
   |         | is used to submit jobs to the local cluster.                   |
   +---------+----------------------------------------------------------------+
   | WN      | The WN (*Worker Node*) is the server that executes the job. It|
   |         | connects to the SE to retrieve the input data required for the|
   |         | job and can also copy the results back to the SE once the job |
   |         | is complete.                                                   |
   +---------+----------------------------------------------------------------+

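
When the ARC CE is used directly, as in the rest of this guide, the
user-visible side of this workflow boils down to a handful of client
commands. The sketch below is only a preview of the commands detailed in the
following sections; ``input.dat``, the destination path on the SE and
``<jobid>`` are placeholders.

.. code-block:: console

   $ # Create a proxy, then copy large input data to an SE (placeholder names)
   $ arcproxy -S vo.scigne.fr -c validityPeriod=24h -c vomsACvalidityPeriod=24h
   $ gfal-copy file://$(pwd)/input.dat root://sbgdcache.in2p3.fr/vo.scigne.fr/lab/name/input.dat
   $ # Submit the job, monitor it and retrieve its outputs
   $ arcsub -c sbgce1.in2p3.fr myjob.xrsl
   $ arcstat <jobid>
   $ arcget <jobid>
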
Prerequisites
=============

In order to submit a job to an ARC CE, the following two prerequisites must
be met:

- Have a workstation with the ARC CE client installed.
- Have a valid certificate registered with a Virtual Organisation (*VO*). For
  more information, see the document `Managing a certificate `_.

ARC CE Client
-------------

Jobs are managed with the ARC CE client, a command-line tool. If you have an
account at IPHC, you can simply use the UI server, where the ARC CE client is
already installed. It is also possible to install the ARC CE client directly
on your GNU/Linux workstation:

* With Ubuntu:

  .. code-block:: console

     $ sudo apt install nordugrid-arc-client

* With RedHat and its derivatives (e.g. AlmaLinux):

  .. code-block:: console

     $ dnf install -y epel-release
     $ dnf update
     $ dnf install -y nordugrid-arc-client

Certificate
-----------

The private and public parts of the certificate must be placed in the
``${HOME}/.globus`` directory on the server from which jobs will be
submitted. These files must have read permissions restricted to the owner
only:

.. code-block:: console

   $ ls -l $HOME/.globus
   -r-------- 1 user group 1935 Feb 16 2025 usercert.pem
   -r-------- 1 user group 1920 Feb 16 2025 userkey.pem

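
If the permissions on these two files are broader than shown above, they can
be restricted to the owner with ``chmod``:

.. code-block:: console

   $ chmod 400 ${HOME}/.globus/usercert.pem ${HOME}/.globus/userkey.pem
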
In this documentation, we use the VO ``vo.scigne.fr``, which is the regional
VO. You should replace this VO name with the one you are using (e.g.,
``biomed``).

Job Management with the ARC CE Client
=====================================

Proxy generation
----------------

Before submitting a job to the computing grid, you must generate a valid
*proxy*. This temporary certificate allows you to authenticate with all grid
services. It is generated with the following command:

.. code-block:: console

   $ arcproxy -S vo.scigne.fr -c validityPeriod=24h -c vomsACvalidityPeriod=24h

The **-S** option selects the VO. The following command shows the validity of
your proxy:

.. code-block:: console

   $ arcproxy -I
   Subject: /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=Your Name/CN=265213680
   Issuer: /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=Your Name
   Identity: /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=Your Name
   Time left for proxy: 23 hours 59 minutes 48 seconds
   Proxy path: /tmp/x509up_u1234
   Proxy type: X.509 Proxy Certificate Profile RFC compliant impersonation proxy - RFC inheritAll proxy
   Proxy key length: 2048
   Proxy signature: sha384
   ====== AC extension information for VO vo.scigne.fr ======
   VO        : vo.scigne.fr
   subject   : /DC=org/DC=terena/DC=tcs/C=FR/O=Centre national de la recherche scientifique/CN=Your Name
   issuer    : /DC=org/DC=terena/DC=tcs/C=FR/L=Paris/O=Centre national de la recherche scientifique/CN=voms-scigne.ijclab.in2p3.fr
   uri       : voms-scigne.ijclab.in2p3.fr:443
   attribute : /vo.scigne.fr
   Time left for AC: 11 hours 59 minutes 52 seconds

Job submission
--------------

The submission of a job with the ARC CE client requires a text file
containing instructions in `xRSL format `_. This file specifies the
characteristics of the job and its requirements. The xRSL file presented in
the example below is fully functional and can be used to run a simple job.
The xRSL file is named ``myjob.xrsl`` in this document.

.. code::

   &
   (executable = "/bin/bash")
   (arguments = "myscript.sh")
   (jobName = "mysimplejob")
   (inputFiles = ("myscript.sh" ""))
   (stdout = "stdout")
   (stderr = "stderr")
   (gmlog = "simplejob.log")
   (wallTime = "240")
   (runTimeEnvironment = "ENV/PROXY")
   (count = "1")
   (countpernode = "1")

Each attribute in this example has a specific role:

- *executable* defines the command to be executed on the compute nodes. It
  can be the name of a shell script.
- *arguments* specifies the arguments to pass to the program defined by the
  *executable* attribute; in this example, ``myscript.sh`` is a bash script
  that is passed as an argument to the **bash** command.
- *jobName* defines the name of the job.
- *inputFiles* indicates the files to send to the ARC CE that are needed for
  the computation. The ``""`` after the file name indicates that the file is
  available locally, in the directory from which the ARC CE commands are
  executed.
- *stdout* specifies the name of the file where standard output will be
  redirected.
- *stderr* specifies the name of the file where error messages will be
  redirected.
- *gmlog* is the name of the directory containing diagnostic information
  about the job.
- *wallTime* is the maximum wall-clock execution time allowed for the job.
- *runTimeEnvironment* indicates the execution environment required by the
  job.
- *count* is an integer indicating the number of tasks executed
  simultaneously (this parameter is greater than 1 for MPI jobs).
- *countpernode* is an integer indicating how many tasks should run on a
  single server. If all tasks need to run on the same server, the values of
  the *count* and *countpernode* attributes should be identical.

An example of the ``myscript.sh`` script is shown below:

.. code:: bash

   #!/bin/sh
   echo "===== Begin ====="
   date
   echo "The program is running on $HOSTNAME"
   date
   dd if=/dev/urandom of=fichier.txt bs=1G count=1
   gfal-copy file://`pwd`/fichier.txt root://sbgdcache.in2p3.fr/vo.scigne.fr/lab/name/fichier.txt
   echo "===== End ====="

The executables used in the ``myscript.sh`` script need to be installed on
the grid. The installation of this software is carried out by the support
team of the SCIGNE platform. Do not hesitate to `contact the team `_ for any
enquiry.

The ``gfal-copy`` command copies files to an SE. It is recommended to use the
``sbgdcache.in2p3.fr`` SE when submitting jobs at SCIGNE, as it is the
closest SE. The usage of the SE is described in the `dedicated guide `_.

Once the ``myjob.xrsl`` and ``myscript.sh`` files are created, the job can be
submitted to the ARC CE with the following command:

.. code-block:: console

   $ arcsub -c sbgce1.in2p3.fr myjob.xrsl
   Job submitted with jobid: gsiftp://sbgce1.in2p3.fr:2811/jobs/VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn

The **-c** option selects the ARC CE server. The identifier of the job is
stored in a local database: ``${HOME}/.arc/jobs.dat``.

Monitoring job status
---------------------

To monitor the status of a job, use the following command with the job
identifier:

.. code-block:: console

   $ arcstat gsiftp://sbgce1.in2p3.fr:2811/jobs/VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn
   Job: gsiftp://sbgce1.in2p3.fr:2811/jobs/VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn
    Name: mysimplejob
    State: Finishing

   Status of 1 jobs was queried, 1 jobs returned information

In the case of a short job, you may have to wait some time before obtaining
an updated status of the job.

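
Two further commands from the same ARC client suite can be useful at this
stage, although they are not otherwise covered in this guide: ``arcstat -a``
reports the status of every job recorded in the local job list, and
``arckill`` cancels a job that is no longer needed (check ``arcstat --help``
and ``arckill --help`` if your client version differs). A short sketch,
reusing the job identifier from above:

.. code-block:: console

   $ # status of all jobs known to the local job list
   $ arcstat -a
   $ # cancel a job that is no longer needed
   $ arckill gsiftp://sbgce1.in2p3.fr:2811/jobs/VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn
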
Getting the output files
------------------------

Once the job is finished (job status: *Finished*), the content of the output
files, described by the *stdout* and *stderr* attributes, can be retrieved
with:

.. code-block:: console

   $ arcget gsiftp://sbgce1.in2p3.fr:2811/jobs/VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn

The files are placed in the
``VnhGHkl9RUnhJKeJ7oqv78RnGBFKFvDGEDDmxDFKDmABFKDmnKeGDn`` directory.

Supplementary Documentation
===========================

The reference below points to documentation that gives a deeper view of the
ARC CE client and the computing grid:

- `ARC CE official documentation `_