Lotus / HPC User Guide

Technical Details

System Component	Configuration
Compute Nodes
CPU Type	AMD EPYC 7352
Sockets	2
Cores/socket	24
Clock speed	2.3 GHz
Memory	256 GB RAM
Local Storage	512GB Micron 1300 SSD (/scratch)
Memory Bandwidth	409.6 GB/s
System
Total Compute Nodes	44
Total Compute Cores	2,112
Total Memory	11.6 TB
Total Storage	342 TB
Interconnect	Mellanox Infiniband EDR
Link bandwidth	100 Gb/s
MPI Latency	1.64 µs
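The aggregate figures in the table follow from the per-node values. The short Bash calculation below reproduces the total core count and the per-node memory bandwidth. It assumes eight DDR4-3200 memory channels per socket, each moving 8 bytes per transfer. That matches AMD's specification for the EPYC 7352, but the table itself does not state it.

```shell
#!/bin/bash
# Recompute aggregate figures from the per-node values in the table above.
nodes=44
sockets=2
cores_per_socket=24

total_cores=$(( nodes * sockets * cores_per_socket ))
echo "Total compute cores: ${total_cores}"

# Assumption: 8 DDR4-3200 channels per socket, 8 bytes per transfer.
channels_per_socket=8
megatransfers_per_s=3200
bytes_per_transfer=8

# Peak bandwidth per node in MB/s (Bash arithmetic is integer-only).
bw_mb_per_s=$(( sockets * channels_per_socket * megatransfers_per_s * bytes_per_transfer ))
printf 'Peak memory bandwidth per node: %d.%d GB/s\n' \
    $(( bw_mb_per_s / 1000 )) $(( bw_mb_per_s % 1000 / 100 ))
```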

Systems Software Environment

Software	Description
Operating System	CentOS Linux 7.9
Cluster Management	Scyld Clusterware 11.0
Compilers	AOCC, GCC, Clang, Go
Parallel Frameworks	Open MPI, MVAPICH2, Sandia OpenSHMEM

System Access

To use the Lotus cluster, you must first request an account on the system. Rhodes College faculty, students, and staff should request access using the following form:

Rhodes HPC Cluster Access Request Form

Non-Rhodes users must have a guest researcher account that is sponsored by a Rhodes faculty or staff member.

Lotus is accessed through Secure Shell (SSH) at the following host:

lotus.rhodes.edu

For on-campus users:

At the prompt in your terminal window, type the following command to log in, omitting the $ and replacing “user” with your username:

$ ssh user@lotus.rhodes.edu

For off-campus users:

Direct SSH access is not permitted from off campus. Users can either connect through a VPN and then SSH to the cluster, or log in to a virtual desktop at http://desktops.rhodes.edu and use PuTTY to access the cluster. For more information on using these resources, see the Getting Started information.

Notes:

When you log in to lotus.rhodes.edu, you will be directed to either lotus-login01 or lotus-login02. These machines have identical hardware and software configurations.

You may add your SSH public key to ~/.ssh/authorized_keys on Lotus to enable password-less login; ECDSA, RSA, and Ed25519 keys are supported. Please protect your private keys with a strong passphrase. You can use ssh-agent to avoid retyping the passphrase every time you connect.

Hosts which attempt to connect very frequently (many times per second) may be blocked temporarily in order to improve system security. If you are blocked, wait 15 minutes and try again.
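You can combine the key-based login and ssh-agent tips above into a single shortcut by adding a host alias to the SSH client configuration on your own computer. The Bash sketch below appends one to ~/.ssh/config, and only does so if the alias is not already present. The alias name lotus, the username placeholder user, and the key path ~/.ssh/id_ed25519 are assumptions you should adjust.

```shell
#!/bin/bash
# Run on your own computer, not on Lotus.
# Assumptions: "lotus" is a hypothetical alias, "user" stands for your Rhodes
# username, and ~/.ssh/id_ed25519 is the private key you want to use.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
config="$HOME/.ssh/config"
touch "$config"
chmod 600 "$config"

# Append the alias only once, so re-running the script is harmless.
if ! grep -q '^Host lotus$' "$config"; then
    cat >> "$config" <<'EOF'

Host lotus
    HostName lotus.rhodes.edu
    User user
    IdentityFile ~/.ssh/id_ed25519
    AddKeysToAgent yes
EOF
fi
```

After this, ssh-keygen -t ed25519 creates the key pair (choose a strong passphrase when prompted). ssh-copy-id -i ~/.ssh/id_ed25519.pub lotus installs the public key in ~/.ssh/authorized_keys on Lotus, and ssh lotus logs in. Both commands must be run from campus or over the VPN. With AddKeysToAgent yes, a running ssh-agent keeps the key after its first use, so you are not asked for the passphrase again in that session.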

Modules

The cluster provides the modules system for loading specific software packages and environments. Module commands can update your shell environment to automatically find optional tools, compilers, and libraries that you may need to support your application. Modules also provide a flexible mechanism for maintaining several versions of the same software or specific combinations of dependent software packages. New modules can be added upon request.

To list all of the available modules on the system, use the following command:

module avail

To load a specific module, use the load command:

module load mvapich2

This loads the MVAPICH2 MPI library into your environment and replaces any other MPI version that was previously loaded. A module command affects only the shell in which it runs. You may wish to add specific module commands to the batch files you submit as jobs, or to shell configuration files that are read at login (typically .bashrc or .zshrc).
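Below is a sketch of the shell-configuration approach, written as an excerpt that could go in ~/.bashrc on Lotus. The module names are the ones used in this guide. The type module guard is an optional addition: it stops the same ~/.bashrc from printing errors on machines that do not provide the modules system.

```shell
# Excerpt for ~/.bashrc on Lotus (a sketch; adjust the module names to your work).
# "type module" succeeds only where the modules system is initialized, so this
# block is skipped on machines that do not provide it.
if type module >/dev/null 2>&1; then
    module load gcc        # GNU toolchain (normally loaded by default)
    module load mvapich2   # MVAPICH2 MPI library
fi
```

Every new interactive shell reads ~/.bashrc, so modules loaded there apply to all of your sessions. Loading them in individual batch files instead keeps each job's environment explicit.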

Other useful module commands are listed below:

Command	Description
`module list`	List the modules that are currently loaded
`module avail`	List the modules that are available to be loaded
`module display <module_name>`	Show the environment variables modified by the <module_name> module
`module load <module_name>`	Load the module <module_name> into the environment
`module unload <module_name>`	Remove the module <module_name> from the environment
`module swap <mod1> <mod2>`	Replace <mod1> with <mod2> in the environment

Job Charging and Queue Limits

Currently, the cluster is operating under a free use billing model. There are no explicit time allocations for the cluster or enforced limits on overall usage of the system. This use model is subject to change depending on how usage evolves over time.

This resource is a shared, campus-wide resource. We ask that you use the system in a manner that is consistent with campus community standards and respect the shared nature of the system.

Jobs are subject to the following limits:

Maximum wall clock time for a single job is 48 hours
Jobs may request up to the maximum number of cores on the system (2,112)
Jobs may request up to the maximum number of nodes on the system (44)
Users may have at most 128 jobs queued at a time
Jobs may be preempted to support priority jobs (e.g. a paper deadline) or for emergency maintenance.

Compiling

All hosts in the cluster have access to the GNU, AOCC (AMD), and Clang compilers, along with multiple MPI implementations (Open MPI and MVAPICH2). The default compiler is GCC 10.2.0, which was itself built with AMD Rome-specific optimizations (-march=znver2). GCC and AOCC can both generate Advanced Vector Extensions 2 (AVX2) instructions; each 256-bit AVX2 instruction operates on four double-precision (or eight single-precision) values at once. AVX2 is not enabled by default for the programs you compile; enable it by setting the appropriate compiler flags.

Using GCC

The GNU GCC compiler family can be loaded with the module system (it is loaded by default):

module load gcc

To compile a program with the GNU toolchain use the following commands:

	Serial	MPI	OpenMP	MPI+OpenMP
Fortran	`gfortran`	`mpif90`	`gfortran -fopenmp`	`mpif90 -fopenmp`
C	`gcc`	`mpicc`	`gcc -fopenmp`	`mpicc -fopenmp`
C++	`g++`	`mpicxx`	`g++ -fopenmp`	`mpicxx -fopenmp`

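To compile your programs with AVX2 instructions, add the -march=znver2 flag, which targets the Zen 2 architecture of the EPYC 7352 processors. (The -march=core-avx2 flag also enables AVX2, but it tunes the code for Intel Haswell CPUs.) You will probably want to combine this with the usual optimization flags (e.g. -O3).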

For more information on the GNU compilers, check the manual pages:

man gcc or man g++ or man gfortran

Using AOCC (AMD compiler)

The AMD Optimizing C/C++ Compiler (AOCC) is available and can be loaded with the module system:

module load aocc

To compile a program with the AMD toolchain use the following commands:

	Serial	MPI	OpenMP	MPI+OpenMP
Fortran	`flang`	`mpif90`	`flang -fopenmp`	`mpif90 -fopenmp`
C	`clang`	`mpicc`	`clang -fopenmp`	`mpicc -fopenmp`
C++	`clang++`	`mpicxx`	`clang++ -fopenmp`	`mpicxx -fopenmp`

Running Jobs on Lotus

Running programs on the cluster is done by interacting with the job scheduling system. Lotus uses the SLURM job scheduler for managing both batch jobs and interactive runs. You should not run computationally intensive tasks on the login nodes – use the compute nodes.

If you have special needs for running jobs on the cluster, please contact the cluster support staff for help. Submitting large numbers of jobs (especially short jobs) can slow scheduler response for all users.

Requesting Interactive Resources

You can request an interactive session by using the srun command.

srun --pty --nodes=2 --ntasks-per-node=48 -t 30:00 --wait=0 /bin/bash

This command requests two full compute nodes with 48 cores each (96 cores in total) for 30 minutes. When the request is granted, an interactive shell starts on the first assigned node, where you can work normally. To run a parallel program from within the interactive job, use srun without any options:

srun myprog

Submitting Batch Jobs

To submit a job using a batch file, create a short text file in the style of the following examples, updating it as necessary to reflect your program and parameters. You can add further #SBATCH lines for other options, such as email notifications (--mail-type and --mail-user); see man sbatch for more information. To submit your job, use the sbatch command:

sbatch jobfile

The jobfile is the file you create and contains the SLURM resource specifications and shell commands. Several examples are provided below.
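Sometimes one job should start only after another finishes successfully. For that case, sbatch's --parsable option (which prints just the job ID) can be combined with --dependency=afterok:<jobid>. The helper below is a hypothetical convenience function, and the job file names prep.sh and analysis.sh are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: submit a job file with sbatch and print only its job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster".
submit() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"   # strip the optional ";cluster" suffix
}

# Example use (prep.sh and analysis.sh are placeholder job files):
#   first=$(submit prep.sh)
#   sbatch --dependency=afterok:"$first" analysis.sh
```

With afterok, the second job stays pending until the first one completes with exit code 0.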

MPI Job

This job runs on 2 compute nodes with 48 cores each (96 cores in total), with one MPI rank assigned to each core.

#!/bin/bash
#SBATCH --job-name="hellompi"
#SBATCH --output="hellompi.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

srun ./hello_mpi

OpenMP Job

This job requests a single compute node and uses 48 threads for all OpenMP parallel regions. A pure (non-hybrid) OpenMP program runs as a single process whose threads share memory, so all of its threads must be on the same node (i.e. --nodes must be 1).

#!/bin/bash
#SBATCH --job-name="hello_openmp"
#SBATCH --output="hello_openmp.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
#SBATCH --export=ALL
#SBATCH -t 01:30:00

# Run one OpenMP thread per core reserved by --cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello_openmp

Hybrid MPI-OpenMP Job

This job requests 2 nodes and 96 cores in total. It launches 2 MPI ranks per node (4 MPI processes in total), and each process uses 24 OpenMP threads.

#!/bin/bash
#SBATCH --job-name="hellohybrid"
#SBATCH --output="hello_hybrid.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
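#SBATCH --ntasks-per-node=2
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=24
#SBATCH --export=ALL
#SBATCH -t 01:30:00

# Give each MPI rank one OpenMP thread per core reserved by --cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpus-per-task=$OMP_NUM_THREADS ./hello_hybrid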
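Ranks per node multiplied by threads per rank should not exceed the 48 cores of a node, so a hybrid job file can check the layout before launching. The Bash sketch below does this with SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK. Slurm sets these variables only when --ntasks-per-node and --cpus-per-task are requested. The fallback values 2 and 24 mirror the example above and also let the snippet run outside a job. Setting the standard OpenMP variables OMP_PLACES and OMP_PROC_BIND is an optional addition that pins threads to cores.

```shell
#!/bin/bash
# Sketch: verify the hybrid layout fits on one 48-core Lotus node, then set
# up OpenMP. Defaults mirror the example above (2 ranks x 24 threads per node).
cores_per_node=48
ranks_per_node=${SLURM_NTASKS_PER_NODE:-2}
threads_per_rank=${SLURM_CPUS_PER_TASK:-24}

if [ $(( ranks_per_node * threads_per_rank )) -gt "$cores_per_node" ]; then
    echo "error: ${ranks_per_node} ranks x ${threads_per_rank} threads" \
         "exceeds ${cores_per_node} cores per node" >&2
    exit 1
fi

export OMP_NUM_THREADS=$threads_per_rank
# Optional: bind each thread to its own core, near the rank's first thread.
export OMP_PLACES=cores
export OMP_PROC_BIND=close
echo "Using ${ranks_per_node} ranks per node with ${OMP_NUM_THREADS} threads each"
# srun --cpus-per-task=$OMP_NUM_THREADS ./hello_hybrid   # launch step, as above
```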

SLURM No-Requeue Option

SLURM will requeue jobs if there is a node failure or if your job is preempted. In some cases, requeueing may overwrite input or output files that should be preserved. You can request that your job not be automatically requeued by adding the following line to your batch file:

#SBATCH --no-requeue
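If a job does need to tolerate requeueing, another approach is to keep each attempt's output separate. Slurm sets SLURM_RESTART_COUNT when a job has been requeued or restarted, so a job file can include it in the output directory name, as in the Bash sketch below. The results.* naming scheme and the commented-out myprog command are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name="requeue_safe"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH -t 01:00:00

# SLURM_RESTART_COUNT is unset (or 0) on a job's first run, so default it to 0.
attempt=${SLURM_RESTART_COUNT:-0}
# Outside Slurm, SLURM_JOB_ID is unset; "local" keeps the sketch runnable anywhere.
outdir="results.${SLURM_JOB_ID:-local}.attempt${attempt}"
mkdir -p "$outdir"
echo "Writing output to ${outdir}"
# srun ./myprog --output-dir "$outdir"   # hypothetical program and option
```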

Monitoring Job Status

Users can monitor their jobs using the squeue command.

[user1@lotus-login01]$ squeue -u user1
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
256556   compute hellompi    user1     R    0:03:57      2 compute[01-02]
256555   compute hellompi    user1     R    0:14:44      2 compute[03-04]

This shows two jobs running in the compute partition and the compute nodes assigned to each. You can use additional options to customize the display:

-i <interval> repeats the display every <interval> seconds
-j <joblist> shows information only for the listed jobs (a comma-separated list of job IDs)

Users can cancel jobs using the scancel command:

[user1@lotus-login01]$ scancel <jobid>
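A script on the login node sometimes needs to wait for a job before continuing. The hypothetical helper below polls squeue -h -j <jobid> until the job no longer appears (-h suppresses the header line). Keep the interval generous, because frequent polling adds load to the scheduler that all users share.

```shell
#!/bin/bash
# Hypothetical helper: wait until a job no longer appears in the queue.
# Usage: wait_for_job <jobid> [interval_seconds]
wait_for_job() {
    local jobid=$1 interval=${2:-60}
    # Once the job has left the queue, squeue prints nothing on stdout (it may
    # report an invalid job ID on stderr), and the loop ends.
    while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
}
```

A temporary squeue failure also ends the loop. Confirm the job's final state (for example with sacct -j <jobid>) before relying on its output.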

Storage Considerations

Lotus has a single storage server with a total of 504TB of disk in a ZFS RAID-Z2 filesystem. This filesystem is the primary storage location for all data on the cluster. Because Lotus does not have a parallel filesystem, programs that perform many file I/O operations in parallel may perform poorly on this storage.

Each compute node has 512GB of local SSD storage, which can be used for checkpointing and for programs that benefit from fast local storage. Local SSD access has much lower latency than access to the shared network filesystem. Users may use the /scratch filesystem on each compute node for temporary storage. Scratch space is reclaimed after your job completes, so copy any results you need back to the shared filesystem before the job ends.
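A common pattern for using node-local scratch is: stage the input in, run, copy the results back, and clean up. The Bash job file below sketches that pattern. The input file input.txt is hypothetical, sort stands in for your real program, and the fallback to $TMPDIR or /tmp exists only so the sketch can also be tried off-cluster. Removing the scratch directory on exit is an extra precaution alongside the automatic reclamation.

```shell
#!/bin/bash
#SBATCH --job-name="scratch_demo"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -t 00:30:00

submit_dir=$PWD

# Prefer node-local /scratch; fall back to $TMPDIR or /tmp when it is not
# writable, so the sketch can also be tried off-cluster.
scratch_root=/scratch
[ -w "$scratch_root" ] || scratch_root=${TMPDIR:-/tmp}
workdir=$(mktemp -d "${scratch_root}/${USER:-user}.${SLURM_JOB_ID:-local}.XXXXXX")
# Delete the scratch directory when the script exits.
trap 'rm -rf "$workdir"' EXIT

# Stage input onto local storage (input.txt is a hypothetical input file,
# created here only if it does not already exist).
[ -f input.txt ] || printf '3\n1\n2\n' > input.txt
cp input.txt "$workdir"/

# Do the I/O-heavy work on local storage; "sort" stands in for your program.
cd "$workdir"
sort -n input.txt > output.txt

# Copy results back to the shared filesystem before the job ends.
cp output.txt "$submit_dir"/
cd "$submit_dir"
```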

Software

Users may request that new software packages be added to the cluster if they may benefit multiple users or research groups. If you would like specific software installed, please contact the research computing support staff.