...
| | Serial | MPI | OpenMP | MPI+OpenMP |
|---|---|---|---|---|
| Fortran | | | | |
| C | | | | |
| C++ | | | | |
Running Jobs on Lotus
Running programs on the cluster is done by interacting with the job scheduling system. Lotus uses the SLURM job scheduler for managing both batch jobs and interactive runs. You should not run computationally intensive tasks on the login nodes – use the compute nodes.
If you have special needs for running jobs on the cluster, please contact the cluster support staff for assistance. Submitting large quantities of jobs (especially short jobs) can degrade scheduler responsiveness for all users.
Requesting Interactive Resources
You can request an interactive session by using the `srun` command:
```
srun --pty --nodes=2 --ntasks-per-node=48 -t 30:00 --wait=0 /bin/bash
```
This command requests two full compute nodes with 48 cores each (96 cores total) for 30 minutes. When the request is granted, you will automatically be logged into the assigned node and can work normally. If you would like to run a parallel program from within the interactive job, you can use `srun` without any options:

```
srun myprog
```
Submitting Batch Jobs
To submit a job using a batch file, create a short text file in the style of the following examples, updating where necessary to reflect your program and parameters. You can add additional `#SBATCH` lines to send email notifications (`--mail-user`) and so on; see `man sbatch` for more information. To submit your job, use the `sbatch` command:

```
sbatch jobfile
```
The `jobfile` is the file you create; it contains the SLURM resource specifications and shell commands. Several examples are provided below.
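For comparison with the parallel examples below, a minimal serial job follows the same pattern. This is a sketch; the job and program names are hypothetical:

```bash
#!/bin/bash
#SBATCH --job-name="hello_serial"             # hypothetical job name
#SBATCH --output="hello_serial.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --export=ALL
#SBATCH -t 00:10:00

./hello_serial                                # hypothetical serial program
```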
MPI Job
This job runs on 2 compute nodes with 48 cores each (96 cores total); each core is assigned a single MPI rank.
```bash
#!/bin/bash
#SBATCH --job-name="hellompi"
#SBATCH --output="hellompi.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

srun ./hello_mpi
```
OpenMP Job
This job requests a single compute node and uses 48 threads for all OpenMP parallel sections. Non-hybrid OpenMP jobs only work when all threads run on the same node (i.e. `--nodes` must be 1).
```bash
#!/bin/bash
#SBATCH --job-name="hello_openmp"
#SBATCH --output="hello_openmp.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

export OMP_NUM_THREADS=48
./hello_openmp
```
Hybrid MPI-OpenMP Job
This job requests 2 nodes and 96 processors in total. It launches 2 MPI ranks per node (4 MPI processes in total), with each process using 24 OpenMP threads.
```bash
#!/bin/bash
#SBATCH --job-name="hellohybrid"
#SBATCH --output="hello_hybrid.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=24    # allocate 24 cores to each MPI rank
#SBATCH --export=ALL
#SBATCH -t 01:30:00

export OMP_NUM_THREADS=24
srun --cpus-per-task=$OMP_NUM_THREADS ./hello_hybrid
```
SLURM No-Requeue Option
SLURM will requeue jobs if there is a node failure or if your job is preempted. In some cases, this may cause input or output files that should be preserved to be overwritten. You may request that your job not be automatically requeued by adding the following line to your batch file:
```bash
#SBATCH --no-requeue
```
Monitoring Job Status
Users can monitor their jobs using the `squeue` command:
```
[user1@lotus-login01]$ squeue -u user1
 JOBID PARTITION     NAME  USER ST    TIME NODES NODELIST(REASON)
256556   compute hellompi user1  R 0:03:57     2 compute[01-02]
256555   compute hellompi user1  R 0:14:44     2 compute[03-04]
```
This shows two jobs currently running in the compute partition and the compute nodes they are assigned to. You can use additional options to customize the display:

- `-i <interval>` repeats the display every `<interval>` seconds
- `-j <joblist>` shows information for specific jobs only
Users can cancel jobs using the `scancel` command:

```
[user1@lotus-login01]$ scancel <jobid>
```
Storage Considerations
Lotus has a single storage server with a total of 504TB of disk in a ZFS RAID-Z2 filesystem. This filesystem is the primary storage location for all data on the cluster. Programs that perform a lot of file I/O operations in parallel may have poor performance with this storage design. Lotus does not have any storage that uses a parallel filesystem.
Each compute node has access to 512GB of local SSD storage, which can be used for checkpointing and for programs that benefit from fast local storage. The latency of local SSD access is several orders of magnitude lower than that of the shared network filesystem. Users may use the `/scratch` filesystem on each compute node for temporary storage. Scratch storage space is reclaimed after your job completes.
Software
Users may request that new software packages be added to the cluster if they would benefit multiple users or research groups. If you would like specific software installed, please contact the research computing support staff.