Language   Serial    MPI       OpenMP        MPI+OpenMP
Fortran    flang     mpif90    flang -mp     mpif90 -mp
C          clang     mpicc     clang -mp     mpicc -mp
C++        clang++   mpicxx    clang++ -mp   mpicxx -mp
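
As a quick illustration of the compiler commands listed above, a build script could pick the command from the language and the parallel model. This is only a sketch: the function name compiler_for, its argument spellings, and the file name hello_hybrid.c are made up for this example, and the commands it prints are exactly the ones listed above.

```shell
#!/bin/bash
# compiler_for LANGUAGE MODEL
# Prints the compile command listed above for that combination.
# LANGUAGE: fortran | c | cxx      MODEL: serial | mpi | openmp | hybrid
# (Function name and argument spellings are illustrative, not site conventions.)
compiler_for() {
    local base flag=""
    case "$1:$2" in
        fortran:serial|fortran:openmp) base="flang" ;;
        fortran:mpi|fortran:hybrid)    base="mpif90" ;;
        c:serial|c:openmp)             base="clang" ;;
        c:mpi|c:hybrid)                base="mpicc" ;;
        cxx:serial|cxx:openmp)         base="clang++" ;;
        cxx:mpi|cxx:hybrid)            base="mpicxx" ;;
        *) echo "unknown combination: $1 $2" >&2; return 1 ;;
    esac
    # OpenMP builds (pure or hybrid) add the -mp flag.
    case "$2" in openmp|hybrid) flag=" -mp" ;; esac
    echo "${base}${flag}"
}

# Example: show the build line for a hybrid C program (placeholder file name).
echo "$(compiler_for c hybrid) -o hello_hybrid hello_hybrid.c"
```

The function only prints the command, so it can be tried on any machine before being used in a real build.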

Running Jobs on Lotus

Programs are run on the cluster through the job scheduling system. Lotus uses the SLURM job scheduler to manage both batch jobs and interactive runs. Do not run computationally intensive tasks on the login nodes; use the compute nodes instead.

If you have special requirements for running jobs on the cluster, please contact the cluster support staff for help. Submitting large numbers of jobs (especially short jobs) can degrade scheduler responsiveness for all users.

Requesting Interactive Resources

You can request an interactive session by using the srun command.

srun --pty --nodes=2 --ntasks-per-node=48 -t 30:00 --wait=0 /bin/bash

This command requests two full compute nodes with 48 cores each (96 cores in total) for 30 minutes. When the request is granted, you are automatically logged in to the assigned node and can work normally. To run a parallel program from within the interactive job, use srun without any options:

srun myprog
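
The core count in the request above is simply nodes multiplied by tasks per node. The following sketch assembles the same srun request from variables so the numbers are easy to change. The helper name interactive_cmd is hypothetical, and it prints the command rather than running it.

```shell
#!/bin/bash
# interactive_cmd NODES TASKS_PER_NODE MINUTES
# Prints (does not run) an srun request like the one shown above.
# The helper name is illustrative only.
interactive_cmd() {
    local nodes="$1" tasks_per_node="$2" minutes="$3"
    echo "srun --pty --nodes=${nodes} --ntasks-per-node=${tasks_per_node}" \
         "-t ${minutes}:00 --wait=0 /bin/bash"
}

nodes=2
tasks_per_node=48
echo "total cores: $(( nodes * tasks_per_node ))"
interactive_cmd "$nodes" "$tasks_per_node" 30
```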

Submitting Batch Jobs

To submit a job using a batch file, create a short text file in the style of the following examples, updating it where necessary to reflect your program and parameters. You can add further SBATCH lines, for example to send email notifications (--mail-user); see man sbatch for more information. To submit your job, use the sbatch command:

sbatch jobfile

The jobfile is the file you create and contains the SLURM resource specifications and shell commands. Several examples are provided below.
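
On success, sbatch prints a line of the form "Submitted batch job <id>". A script that needs the job ID afterwards (for example to pass it to squeue or scancel) can extract it as sketched below. The helper name jobid_from_sbatch is made up for this example, and the sample line stands in for real sbatch output.

```shell
#!/bin/bash
# sbatch prints "Submitted batch job <id>" on success; the ID is the 4th word.
# (Helper name is illustrative only.)
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage on the cluster (jobfile is the batch file described above):
#   jobid=$(sbatch jobfile | jobid_from_sbatch)
#   squeue -j "$jobid"

# Demonstration with a sample line, so the sketch runs without a scheduler:
echo "Submitted batch job 256556" | jobid_from_sbatch
```

sbatch also has a --parsable option that prints only the job ID, which avoids the parsing step.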

MPI Job

This job runs on 2 compute nodes with 48 cores each (96 cores in total). Each core is assigned a single MPI rank.

#!/bin/bash
#SBATCH --job-name="hellompi"
#SBATCH --output="hellompi.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

srun ./hello_mpi
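
To confirm that a job received the layout it asked for, the batch file can print it before launching the program. This is a minimal sketch, assuming the variables SLURM_JOB_NUM_NODES and SLURM_NTASKS are set inside the job, as SLURM normally does. The function name report_layout is hypothetical, and the fallback values exist only so the snippet also runs outside a job.

```shell
#!/bin/bash
# Print the MPI layout SLURM granted to this job.
# SLURM_JOB_NUM_NODES and SLURM_NTASKS are normally set by SLURM inside a job;
# the :-1 fallbacks only keep the sketch runnable outside a job.
report_layout() {
    local nodes="${SLURM_JOB_NUM_NODES:-1}"
    local ranks="${SLURM_NTASKS:-1}"
    echo "nodes=${nodes} ranks=${ranks} ranks_per_node=$(( ranks / nodes ))"
}

report_layout
```

For the MPI job above, the expected report is 2 nodes, 96 ranks, and 48 ranks per node.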

OpenMP Job

This job requests a single compute node and uses 48 threads for all OpenMP parallel sections. A pure (non-hybrid) OpenMP program runs as one process whose threads share memory, so it only works when all of its threads are on the same node (i.e. --nodes must be 1).

#!/bin/bash
#SBATCH --job-name="hello_openmp"
#SBATCH --output="hello_openmp.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

export OMP_NUM_THREADS=48
./hello_openmp
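
Instead of hard-coding the thread count, the batch file can derive it from the allocation, so that the two cannot drift apart when the resource request is edited. A minimal sketch, assuming SLURM_CPUS_ON_NODE is set inside the job (SLURM provides it for batch jobs); the fallback to 1 is only there so the snippet runs outside a job.

```shell
#!/bin/bash
# Match the OpenMP thread count to the CPUs SLURM allocated on this node.
# SLURM_CPUS_ON_NODE is set inside a job; fall back to 1 elsewhere.
export OMP_NUM_THREADS="${SLURM_CPUS_ON_NODE:-1}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
# ./hello_openmp   # program from the example above
```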

Hybrid MPI-OpenMP Job

This job requests 2 nodes and 96 cores in total. It launches 2 MPI ranks per node (4 MPI processes in total), with each process using 24 OpenMP threads.

#!/bin/bash
#SBATCH --job-name="hellohybrid"
#SBATCH --output="hello_hybrid.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=24
#SBATCH --export=ALL
#SBATCH -t 01:30:00

export OMP_NUM_THREADS=24
srun --cpus-per-task=$OMP_NUM_THREADS ./hello_hybrid
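
The thread count in a hybrid job follows from the node size: ranks per node multiplied by threads per rank should equal the cores per node (2 x 24 = 48 here). The sketch below does that arithmetic and rejects layouts that do not divide evenly. The function name threads_per_rank is made up for this example.

```shell
#!/bin/bash
# threads_per_rank CORES_PER_NODE RANKS_PER_NODE
# Prints the OpenMP thread count per MPI rank that exactly fills a node.
# (Function name is illustrative only.)
threads_per_rank() {
    local cores="$1" ranks="$2"
    if (( ranks <= 0 || cores % ranks != 0 )); then
        echo "cores per node ($cores) must be a positive multiple of ranks per node ($ranks)" >&2
        return 1
    fi
    echo $(( cores / ranks ))
}

threads_per_rank 48 2    # layout from the example above
threads_per_rank 48 4    # alternative layout: 4 ranks per node
```

The result can be assigned to OMP_NUM_THREADS and passed to srun as --cpus-per-task, as in the batch file above.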

SLURM No-Requeue Option

SLURM will requeue jobs if there is a node failure or if your job is preempted. In some cases, this may cause input or output files that should be preserved to be overwritten. You may request that your job not be automatically requeued by adding the following line to your batch file:

#SBATCH --no-requeue
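
If you would rather let a job be requeued but still protect its files, one option is to include the restart count in the output file names. This is a sketch under assumptions, not a site recommendation. SLURM sets SLURM_RESTART_COUNT once a job has been requeued or restarted. The function name output_name and the file naming pattern are hypothetical.

```shell
#!/bin/bash
# SLURM sets SLURM_RESTART_COUNT when a job has been requeued or restarted;
# it is unset on the first run, so default to 0.
# (Function name and file naming pattern are illustrative only.)
output_name() {
    local prefix="$1"
    local restart="${SLURM_RESTART_COUNT:-0}"
    echo "${prefix}.${SLURM_JOB_ID:-nojob}.run${restart}.out"
}

output_name results
```

Each restart then writes to a new file instead of overwriting the previous one.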

Monitoring Job Status

Users can monitor their jobs using the squeue command.

[user1@lotus-login01]$ squeue -u user1
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
256556   compute hellompi    user1     R    0:03:57      4 compute[01-02]
256555   compute hellompi    user1     R    0:14:44      4 compute[03-04]

This shows two jobs that are currently running in the compute partition and which compute nodes they are assigned to. You can use additional options to customize your display:

  • -i <interval> repeats the report every <interval> seconds

  • -j <joblist> shows information for specific jobs only
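
When many jobs are queued, a per-state summary is often easier to read than the full listing. The sketch below counts jobs by state code. It relies on the squeue options -h (no header) and -o "%t" (compact state code only); the helper name count_states is made up for this example, and the sample input stands in for real squeue output.

```shell
#!/bin/bash
# count_states: read one job state code per line (R, PD, ...) and print
# a "STATE COUNT" line for each state. (Helper name is illustrative only.)
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# On the cluster:
#   squeue -h -u "$USER" -o "%t" | count_states

# Demonstration with sample states, so the sketch runs without a scheduler:
printf 'R\nR\nPD\n' | count_states
```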

Users can cancel jobs using the scancel command:

[user1@lotus-login01]$ scancel <jobid>

Storage Considerations

Lotus has a single storage server with a total of 504 TB of disk in a ZFS RAID-Z2 filesystem. This filesystem is the primary storage location for all data on the cluster. Lotus does not have any storage that uses a parallel filesystem, so programs that perform a lot of file I/O in parallel may perform poorly with this storage design.

Each compute node has 512 GB of local SSD storage, which can be used for checkpointing and by programs that benefit from fast local storage. The latency for local SSD access is several orders of magnitude lower than for the shared network filesystem. Users may use the /scratch filesystem on each compute node for temporary storage. Scratch space is reclaimed after your job completes, so copy anything you need to keep back to the shared filesystem before the job ends.
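
A common pattern for using local scratch is to run the program inside a private scratch directory and copy its output back to shared storage before the job ends. The following is a minimal sketch for a single-node job. The function name run_in_scratch, the results path, and the program name are placeholders, and with several nodes each node has its own /scratch, which this sketch does not handle.

```shell
#!/bin/bash
# run_in_scratch SCRATCH_ROOT RESULTS_DIR COMMAND [ARGS...]
# Runs COMMAND inside a private directory under SCRATCH_ROOT, then copies
# everything it wrote to RESULTS_DIR and removes the scratch directory.
# (Function name is illustrative only.)
run_in_scratch() {
    local scratch_root="$1" results_dir="$2"
    shift 2
    local workdir status
    workdir="$(mktemp -d "${scratch_root}/job.XXXXXX")" || return 1
    ( cd "$workdir" && "$@" )
    status=$?
    # Copy results back before the scratch space is reclaimed.
    mkdir -p "$results_dir" && cp -r "$workdir"/. "$results_dir"/
    rm -rf "$workdir"
    return "$status"
}

# Usage inside a batch job (paths and program name are placeholders).
# Give the program as an absolute path, because the command runs from
# inside the scratch directory:
#   run_in_scratch /scratch "$HOME/results/$SLURM_JOB_ID" "$HOME/bin/myprog"
```

The function returns the exit status of the command, so the batch file can still detect a failed run after the results have been copied.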

Software

Users may request that new software packages be added to the cluster if they would benefit multiple users or research groups. If you would like specific software installed, please contact the research computing support staff.