Gerstein Lab Computing in HPC

From GersteinInfo



General Usage Notes

Much of the Gerstein lab's computational power and storage resides on McCleary, the high-performance computing (HPC) system maintained by the Yale Center for Research Computing (https://research.computing.yale.edu). This page explains how to access and use these resources, especially for new HPC users.


Getting Access

New users who have not used HPC before should first request an account using the form at http://research.computing.yale.edu/support/hpc/account-request. On the form, answer "no" to "Are you a Principal Investigator", enter Mark Gerstein as the Principal Investigator and mihali.felipe@yale.edu as the PI or delegate email, and select "McCleary" in the cluster checklist.

After submitting the form, email Mihali so that he can approve your access. You will then need to generate a public-private key pair and upload the public key to YCRC to enable ssh access. See https://docs.ycrc.yale.edu/clusters-at-yale/access/ssh/ for instructions on uploading your key (follow the steps listed under "Generate Your Key Pair on macOS and Linux").

Students joining the lab who already have HPC access through a different lab (such as rotation students) should email YCRC at hpc@yale.edu and ask for their existing account to be connected to the lab's resources.


Logging into the cluster

McCleary can be accessed in two ways. From the command line, you can ssh into the cluster with ssh <netid>@mccleary.ycrc.yale.edu. If you prefer a graphical user interface, you can log in with Open OnDemand through a web browser at https://ood-mccleary.ycrc.yale.edu/. Either way, you will need to complete two-factor authentication during login, and if you are off campus you will first need to connect to the Yale VPN (see https://docs.ycrc.yale.edu/clusters-at-yale/access/vpn/ for more details).
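To avoid retyping the host name, you can add a host alias to your ssh client configuration. The sketch below assumes OpenSSH; the alias name mccleary and the default key path ~/.ssh/id_rsa are hypothetical choices, so substitute the key you generated and uploaded under "Getting Access". The function only prints the stanza, so you can review it before appending it to ~/.ssh/config.

```shell
# Print an OpenSSH client config stanza for McCleary.
# Usage: mccleary_ssh_config <netid> [identity_file]
# The alias "mccleary" and the default key path are illustrative choices.
mccleary_ssh_config() {
    if [ -z "${1:-}" ]; then
        echo "usage: mccleary_ssh_config <netid> [identity_file]" >&2
        return 1
    fi
    netid=$1
    if [ -n "${2:-}" ]; then
        key=$2
    else
        key='~/.ssh/id_rsa'   # literal "~": ssh expands it when reading the config
    fi
    printf 'Host mccleary\n'
    printf '    HostName mccleary.ycrc.yale.edu\n'
    printf '    User %s\n' "$netid"
    printf '    IdentityFile %s\n' "$key"
}
```

For example, mccleary_ssh_config ab123 >> ~/.ssh/config, after which ssh mccleary connects as ab123. Two-factor authentication and, off campus, the VPN still apply.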


Navigating the cluster

There are several locations within McCleary where you can store data files and software packages, as described below. See https://docs.ycrc.yale.edu/data/hpc-storage/ for more information on these drives.

  • Home directory: This is your main directory when logging into the cluster (also accessible at ~/). This is ideal for small files and software packages, and is limited to 125 GB and 500k files per person.
  • Project directory: This directory is accessible at ~/project. It is useful for smaller data files and software files, and is the default install location for conda packages. As this is limited to 4 TB of storage and 5M files for the entire lab, please do not store large data files in this directory.
  • Scratch directory: This directory is accessible at ~/palmer_scratch. It is useful for large files and temporary files that do not need to be permanently saved on the cluster. It is limited to 10 TB of storage and 15M files for the entire lab, and any files older than 60 days will be deleted.
  • Gibbs storage directory: This is the main data directory for the Gerstein lab, located at /gpfs/gibbs/pi/gerstein/. In most cases, all of your main data and analysis files and folders should be kept here. If you are a new user, please create a directory here for yourself. As of December 2024, this drive contains approx. 1.5 PB of storage space, limited to 110M files for the lab.
  • Palmer storage directory: This is an auxiliary data directory for the Gerstein lab, located at /vast/palmer/pi/gerstein/. This directory is useful for any large files, especially if there is limited room on Gibbs. If you are a new user, please create a directory here for yourself. As of December 2024, this drive contains approx. 140 TB of storage space, limited to 50M files for the lab.
  • Slayman storage directory: This is a temporary, read-only directory currently located at /gpfs/gibbs/pi/slayman/slayman-gerstein, containing files previously stored on the now-defunct Slayman system. These files will be migrated to Gibbs in early 2025.
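Creating your personal directories on Gibbs and Palmer takes one mkdir each. The sketch below wraps that in a function; naming the directory after your NetID (defaulting to $USER) follows the <netid> convention used elsewhere on this page, and the GIBBS_ROOT and PALMER_ROOT overrides are hypothetical conveniences, not YCRC settings.

```shell
# Create (or confirm) your personal directories on Gibbs and Palmer.
# Usage: make_lab_dirs [netid]   (defaults to $USER)
# GIBBS_ROOT and PALMER_ROOT are illustrative overrides; by default the
# lab paths listed above are used.
make_lab_dirs() {
    user=${1:-${USER:-}}
    if [ -z "$user" ]; then
        echo "usage: make_lab_dirs <netid>" >&2
        return 1
    fi
    for root in "${GIBBS_ROOT:-/gpfs/gibbs/pi/gerstein}" \
                "${PALMER_ROOT:-/vast/palmer/pi/gerstein}"; do
        # Stop instead of creating a stray tree if the lab path is mistyped.
        if [ ! -d "$root" ]; then
            echo "make_lab_dirs: $root not found" >&2
            return 1
        fi
        mkdir -p "$root/$user" || return 1
        echo "$root/$user"
    done
}
```

Running it a second time is harmless: mkdir -p leaves existing directories untouched, and the function simply prints both paths again.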

To see how much free space is available in each of these locations, use the getquota command on HPC.
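getquota reports usage for each storage location as a whole. To see which of your own folders account for most of the space or the file count, the standard du and find tools are enough. The helper below is a hypothetical sketch; on large trees it can take a while and adds metadata load to the shared file system, so run it sparingly.

```shell
# Print size and entry count (files plus directories) for each directory given.
# Usage: dir_usage <dir> [dir...]
dir_usage() {
    for d in "$@"; do
        # find lists the directory itself and everything below it.
        entries=$(find "$d" | wc -l | tr -d ' ')
        size=$(du -sh "$d" | cut -f1)
        printf '%s\t%s\t%s entries\n' "$d" "$size" "$entries"
    done
}
```

For example, dir_usage /gpfs/gibbs/pi/gerstein/<netid>/* prints one tab-separated line per item in your Gibbs directory.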

Controlling Data Access by ACL

To restrict access to a directory in Gibbs or Palmer (for instance, when working with protected-access data), first create the directory and set its permissions with chmod 700 so that only you can access it.

To allow access to 'directory' by user 'netid':

$ setfacl -m u:<netid>:rx <directory>

Use getfacl to examine ACLs:

$ getfacl <directory>
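The steps above can be combined into one function. This is a sketch rather than a YCRC-provided tool: share_protected_dir is a hypothetical name, and the -R (apply to existing contents), -d (default ACL inherited by new files) and capital X (execute only on directories and already-executable files) choices go slightly beyond the single setfacl line above, so drop them if you only want to open the top-level directory. The order matters: chmod runs before setfacl, because changing the group permission bits afterwards also changes the ACL mask and can silently remove the access you granted.

```shell
# Create a directory only you can access, then grant one user read access.
# Usage: share_protected_dir <directory> <netid>
share_protected_dir() {
    dir=${1:-}
    netid=${2:-}
    if [ -z "$dir" ] || [ -z "$netid" ]; then
        echo "usage: share_protected_dir <directory> <netid>" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    # Owner-only permissions first: a later chmod of the group bits would
    # also reset the ACL mask and could cancel the grant below.
    chmod 700 "$dir" || return 1
    # r = read; X = execute only on directories (so they can be entered)
    # and on files that are already executable. -R covers existing contents.
    setfacl -R -m "u:$netid:rX" "$dir" || return 1
    # Default ACL: files and folders created here later inherit the entry.
    setfacl -d -m "u:$netid:rX" "$dir" || return 1
    getfacl "$dir"
}
```

The other user also needs execute permission on every parent directory along the path in order to reach the shared directory.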


Installing software

Most bioinformatics software packages can be installed and used on the cluster without root access. Here are a few tips for using software on McCleary:

  • Many common programming languages/libraries (Python, R, Matlab) and bioinformatics packages (such as Samtools, BWA, bedtools, bcftools, etc.) are already installed on the cluster. Use module avail to list all available software packages on the cluster, and module load <package_name> to load a package into memory. See https://docs.ycrc.yale.edu/applications/modules/ for more information.
  • McCleary also provides Miniconda for creating your own R- or Python-based environments. To create a new environment, first run module load miniconda, and then run conda create -n <add_name_here> <add_packages_here>. (This step may require additional memory; if it is killed, see "Submitting Jobs" below to run an interactive job with more memory.) To activate your environment, run module load miniconda and then conda activate <environment_name>. See https://docs.ycrc.yale.edu/clusters-at-yale/guides/conda/ for more information.
    • Note that the default storage location for conda environments is ~/project. If that space is full, you can create an environment in Gibbs or Palmer by giving its full path: conda create --prefix /gpfs/gibbs/pi/gerstein/<netid>/<env_name> <add_packages_here>. Use --prefix instead of -n, since conda does not accept both. Then activate the environment by its full path, e.g. conda activate /gpfs/gibbs/pi/gerstein/<netid>/<env_name>.
  • To build or use Docker images on McCleary, you will need Apptainer (formerly Singularity), which runs containers without root access. You can convert an existing image from Docker Hub with apptainer build <image_name>.sif docker://<image_name> (run this inside an interactive job). You can then run commands from a script with apptainer exec <image_name>.sif <command_within_image>. See https://docs.ycrc.yale.edu/clusters-at-yale/guides/containers/ for more information, including how to build your own container.
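Because conda's --prefix and -n options cannot be combined, storing environments on Gibbs means choosing a full path for each one. A minimal sketch, assuming you have already run module load miniconda in the same shell and that you keep environments in a conda_envs folder inside your personal Gibbs directory (a hypothetical layout; any path you own works):

```shell
# Create a conda environment on Gibbs instead of ~/project.
# Run "module load miniconda" first so that conda is available.
# Usage: create_gibbs_env <env_name> [packages...]
create_gibbs_env() {
    if [ $# -lt 1 ]; then
        echo "usage: create_gibbs_env <env_name> [packages...]" >&2
        return 1
    fi
    env_name=$1
    shift
    # Hypothetical layout: a conda_envs/ folder in your personal Gibbs directory.
    prefix="${GIBBS_ROOT:-/gpfs/gibbs/pi/gerstein}/$USER/conda_envs/$env_name"
    # --prefix takes the full environment path and replaces -n/--name.
    conda create --yes --prefix "$prefix" "$@" || return 1
    echo "Activate with: conda activate $prefix"
}
```

For example, create_gibbs_env myenv python=3.11 numpy creates the environment and prints the conda activate command to use in later sessions and batch scripts.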


Compute Resources and Partitions

McCleary provides many CPU and GPU nodes for Gerstein lab members, including general Yale-wide resources and dedicated nodes for Gerstein lab use only. These are organized into partitions for job submission (see below). Note each partition's limits on job length and memory: a job that does not fit within them may not run, or may be delayed until more resources are available.

The notes below describe the available resources and partitions as of December 2024; use them to decide which partition best suits your job. See https://docs.ycrc.yale.edu/clusters/mccleary/ for up-to-date information.

  • Dedicated Gerstein lab nodes
    • pi_gerstein partition: Contains 84 CPUs and 1056 GB memory. Jobs can be submitted for up to 7 days.
    • pi_gerstein_gpu partition: Contains one A100 node (4 GPUs, 80 GB memory) and two RTX3090 nodes (12 total GPUs, 48 GB memory). Jobs can be submitted for up to 7 days.
  • Yale-wide nodes
    • day partition: 1844 CPUs with 26 TB total memory. Jobs can be submitted for up to 24 hours, with up to 256 CPUs and 3000 GB memory per user (512 CPUs and 6000 GB memory total for the lab; max 983 GB memory per job).
    • week partition: 968 CPUs with 14 TB total memory. Jobs can be submitted for up to 7 days, with up to 192 CPUs and 2949 GB memory per user (max 983 GB memory per job).
    • long partition: 108 CPUs with 540 GB total memory. Jobs can be submitted for up to 28 days, with up to 36 CPUs per user and 180 GB memory per job.
    • bigmem partition: 268 CPUs with 23 TB total memory. Jobs can be submitted for up to 24 hours, with up to 32 CPUs and 3960 GB memory per user.
    • gpu partition: Contains 9x A5000, 3x A100, 3x RTX3090, and 4x RTX5000 nodes, each with 4 GPUs and between 16 and 84 GB memory. Jobs can be submitted for up to 2 days, with a limit of 12 GPUs per user (24 GPUs for the lab).
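The limits above can be read as a simple filter on wall time and memory. The helper below is a hypothetical sketch for CPU jobs only: it takes whole hours and GB per job and prints each partition whose listed limits would allow the request. For pi_gerstein and bigmem the notes give node or per-user totals rather than a per-job cap, so those totals stand in as approximations; per-user CPU caps, queue load and GPU partitions are ignored, and the numbers should be rechecked against the YCRC page when they change.

```shell
# List the CPU partitions whose limits (as summarized above, December 2024)
# would accept a job of the given wall time and memory.
# Usage: eligible_partitions <hours> <mem_gb>   (whole numbers only)
eligible_partitions() {
    hours=$1
    mem_gb=$2
    found=
    # Each entry is name:max_hours:max_mem_gb. The pi_gerstein and bigmem
    # memory values are node/per-user totals used as approximate per-job caps.
    for spec in day:24:983 bigmem:24:3960 week:168:983 pi_gerstein:168:1056 long:672:180; do
        name=${spec%%:*}
        rest=${spec#*:}
        max_hours=${rest%%:*}
        max_mem=${rest#*:}
        if [ "$hours" -le "$max_hours" ] && [ "$mem_gb" -le "$max_mem" ]; then
            echo "$name"
            found=1
        fi
    done
    # Non-zero status when nothing fits.
    [ -n "$found" ]
}
```

For example, eligible_partitions 48 500 prints week and pi_gerstein, while a 30-day request (720 hours) prints nothing and returns a non-zero status, a sign the job needs to be split or checkpointed.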


Submitting Jobs

To run a software command, most tasks will require the Slurm interface to submit a job to a partition on McCleary. Some useful commands for doing this are described here. Also see https://docs.ycrc.yale.edu/clusters-at-yale/job-scheduling/ for more information on using Slurm on McCleary.

  • To run a basic Slurm job, create a bash script with an SBATCH metadata header, followed by the commands you wish to run. An example is below:

#!/bin/bash
#SBATCH --output <job_name>.out
#SBATCH --job-name <job_name>
# Number of CPU cores (threads) for a multithreaded program;
# for MPI jobs with several processes, use --ntasks instead
#SBATCH --cpus-per-task=2
# Memory per CPU core
#SBATCH --mem-per-cpu=20G
# Expected run time; must fit within the partition's time limit (e.g. 24 hours on day)
#SBATCH -t 72:00:00
# Optional: email when the job starts, ends or fails
#SBATCH --mail-type=ALL
#SBATCH --partition=<partition_name>

module load <module_name>

<Add job commands here>
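Once the script is saved (job.sh below is a hypothetical file name), submit it with sbatch. The sketch uses sbatch --parsable so that only the job ID is printed, which makes it easy to reuse with the standard Slurm commands listed in the comments. The salloc line is one way to get the interactive job mentioned under "Installing software"; its partition, cores, memory and time are placeholders to adjust.

```shell
# Submit a batch script and print its job ID.
# Usage: submit_job <script>
submit_job() {
    script=${1:-}
    if [ ! -f "$script" ]; then
        echo "submit_job: script not found: $script" >&2
        return 1
    fi
    # --parsable prints "<jobid>" or "<jobid>;<cluster>"; keep only the ID.
    jobid=$(sbatch --parsable "$script") || return 1
    echo "${jobid%%;*}"
}

# Typical follow-up commands:
#   jobid=$(submit_job job.sh)
#   squeue -u "$USER"                                      # your queued/running jobs
#   sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS  # status and peak memory
#   scancel "$jobid"                                       # cancel the job
#   salloc -p day -c 2 --mem=16G -t 2:00:00                # interactive session
```

Checking MaxRSS from sacct after a test run is a practical way to choose a realistic --mem-per-cpu value for later jobs.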

BAM/FASTQ internal data archive

Data Transfers to external locations

https://docs.ycrc.yale.edu/clusters-at-yale/data/transfer/
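For copies from your own computer, rsync over ssh is a common choice. The sketch below assumes rsync is installed locally and, by default, uses the login host from "Logging into the cluster"; the page linked above describes YCRC's transfer options, which may point to a different host (set MCCLEARY_HOST, a hypothetical override, to change it). Two-factor authentication and, off campus, the VPN still apply.

```shell
# Copy the contents of a local folder into a directory on McCleary.
# Usage: push_to_mccleary <local_dir> <netid> <remote_dir>
push_to_mccleary() {
    src=${1:-}
    netid=${2:-}
    dest=${3:-}
    if [ -z "$src" ] || [ -z "$netid" ] || [ -z "$dest" ]; then
        echo "usage: push_to_mccleary <local_dir> <netid> <remote_dir>" >&2
        return 1
    fi
    host=${MCCLEARY_HOST:-mccleary.ycrc.yale.edu}
    # -a copies recursively and preserves permissions and timestamps,
    # -v lists files, -P shows progress and keeps partial files so a rerun
    # can resume. The trailing "/" on the source copies its contents.
    rsync -avP "${src%/}/" "$netid@$host:$dest/"
}
```

For example, push_to_mccleary ./results ab123 /gpfs/gibbs/pi/gerstein/ab123/results. Rerunning the same command only transfers files that have changed.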


Additional Training

See https://research.computing.yale.edu/training/ for training materials on using McCleary. YCRC also announces its workshops and training sessions in monthly emails.


Data maintenance after off-boarding

Help/issues

Current Hardware {OLD}

The data below is kept for historical reasons (last updated in 2020).

Compute

Grace

33 nodes, 672 cores purchased in 2015

Farnam

2 nodes, 64 cores purchased in 2012 (to be shut down in 2019)

11 nodes, 308 cores purchased in 2018

1 large memory node (1.5TB RAM), 40 cores purchased in 2018

3 GPU nodes (2xNVIDIA K80), 60 cores purchased in 2016

2 GPU nodes (2xNVIDIA P100), 56 cores purchased in 2018

1 GPU node (4xNVIDIA TITAN V), 8 cores purchased in 2018

Total: 20 nodes, 536 cores

  • 32 nodes were shut down in September 2018

OpenStack

1 director node, 8 cores

3 controller nodes, 24 cores

3 ceph nodes, 24 cores

5 compute nodes, 40 cores

Storage

Loomis (mounted on grace)

3 TB default allocation

130 TB purchased in 2014

170 TB purchased in 2015

100 TB purchased in 2016

Total: 403 TB, 93% used (27 TB free)

  • 30 TB loan from HPC (ending Jan 2019)

Farnam (mounted on both farnam and grace)

4 TB default allocation

90 TB purchased in 2013 (will retire in July 2019)

276 TB purchased in 2016

757 TB purchased in 2018

Total: 1127 TB, 83% used (193 TB free)

  • Due to the storage capacity limit, the actual capacity will be 1017 TB + 90 TB (loan from HPC). The loan will be taken away in July 2019.

OpenStack

Ceph nodes, 163TB

Compute nodes, 2TB

Director node, 2.2TB

Controller node, 2TB

70 TB with 10GB connection to farnam

Actual CPU Usage

Grace

Grace Shared 258,948 h (equivalent to ~120 cores at 100% utilization)

Grace Dedicated 371,982 h (equivalent to ~175 cores at 100% utilization)

Grace Scavenge 362,890 h (equivalent to ~165 cores at 100% utilization)

Grace Total 993,820 h (~462/672 cores)

Farnam

Farnam Shared 108,789 h (equivalent to ~50 cores at 100% utilization)

Farnam Dedicated 262,642 h (equivalent to ~120 cores at 100% utilization)

Farnam Dedicated - GPU 13,250 h

Farnam Scavenge 29,527 h (equivalent to 13 cores at 100% utilization)

Farnam Total 414,208 h (~183/536 cores)
