HPC Computing

The High-Performance Computing (HPC) group maintains a wide range of computational resources to fit your needs. All are Linux compute clusters, each attached to large storage platforms to support data-intensive research.

Each cluster is configured to meet the needs of its main users, with varying CPU and GPU configurations. Currently, Lilac is the primary cluster designated for new HPC research requests. Requests for alternate clusters are reviewed on a case-by-case basis depending on the research requirements and cluster availability.

For access to HPC systems, including instructions on requesting an account, see the Getting Started page.

Research HPC

The MSK and Sloan Kettering Institute research community has access to three HPC systems, each addressing a distinct set of computational requirements. Most users will work on Lilac.

The Lilac HPC cluster runs an LSF queueing system, which supports a variety of job submission techniques.
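As an illustration of one common submission technique, the sketch below writes a minimal LSF batch script and shows how it would be submitted with bsub. The job name, core count, memory, and wall-time values are placeholders, not Lilac's actual defaults or queue policies; consult the cluster documentation (and `bqueues`) for the real values.

```shell
# Create a minimal LSF job script. All resource values below are
# illustrative placeholders, not Lilac's actual configuration.
cat > myjob.lsf <<'EOF'
#!/bin/bash
#BSUB -J example_job        # job name
#BSUB -n 4                  # number of cores
#BSUB -R "rusage[mem=4]"    # memory per core (units are site-configurable)
#BSUB -W 01:00              # wall-clock limit (HH:MM)
#BSUB -o %J.out             # stdout file (%J expands to the job ID)
#BSUB -e %J.err             # stderr file
echo "Running on $(hostname)"
EOF

# On a login node such as lilac.mskcc.org, the script would be
# submitted and monitored with:
#   bsub < myjob.lsf
#   bjobs
```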

A live view of system activity on Lilac is available online.

Lilac Cluster 2018

Compute

  • Primary login host: lilac.mskcc.org
  • 91 compute nodes with GPUs (GPU configurations vary by node):

    name     qty  model                     CPU                cores  RAM (GB)  GPUs                  NVMe  net (Gbps)
    ld01-07    7  NVIDIA DGX-1              2*Xeon E5-2698 v4    40     512     8*Tesla V100          -     10
    lg01-06    6  Exxact TXR231-1000R       2*Xeon E5-2699 v3    36     512     4*GeForce GTX TitanX  -     10
    lp01-35   35  Exxact TXR231-1000R       2*Xeon E5-2680 v3    24     512     4*GeForce GTX 1080Ti  -     10
    ls01-18   18  Supermicro SYS-7048GR-TR  2*Xeon E5-2697 v4    36     512     4*GeForce GTX 1080    2T    10
    lt01-08    8  Supermicro SYS-7048GR-TR  2*Xeon E5-2697 v4    36     512     4*GeForce GTX 1080Ti  2T    10
    lt09-22   14  Supermicro SYS-7048GR-TR  2*Xeon E5-2697 v4    36     512     4*GeForce GTX 1080Ti  2T    25
    lv01       1  Supermicro SYS-7048GR-TR  2*Xeon E5-2697 v4    36     512     4*Tesla V100          2T    25

Storage

  • /data (compute): 2.6 PB, GPFS v5.1, 4 NSD servers, 4 NetApp E5760, connected at 16*25 Gbps
  • /warm (non-compute): 1.5 PB, GPFS v5.1, 4 NSD servers, 4 Dell MD3460, connected at 4*10 Gbps

Network

  • Arista 7328x 100 Gbps switch

Bioinformatics HPC

High-Performance Genomics Computing Systems Access

These computing resources serve investigators in the Computational Biology Center and other SKI investigators who need significant Linux compute and storage resources in association with the Center for Molecular Oncology or the Bioinformatics Core. The systems support processing of sequence data generated by IGO.

Three computational clusters support Bioinformatics operations. These systems are known as Lux, Luna, and Juno. Lux primarily supports processing of raw genomic data. Luna is currently the main computational system used within the Bioinformatics Core and by members of the Center for Molecular Oncology. For members of these labs, these systems provide access to shared research data, applications, and packages to support bioinformatics analysis pipelines.

The Juno cluster is the successor to the Luna cluster. It runs a mixture of CentOS 6 and 7, whereas the Luna cluster runs CentOS 6 only. Juno has the same Isilon storage available as Luna. However, Juno’s CentOS 7 nodes have an additional large /juno filesystem. Users are currently testing Juno and preparing to migrate off Luna.
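Because Juno mixes CentOS 6 and CentOS 7 nodes, and only the CentOS 7 nodes mount /juno, it can help to confirm which release a given node is running. A minimal sketch using the standard release files (no cluster-specific assumptions):

```shell
# Determine the OS release of the current node. CentOS 7 ships
# /etc/os-release; CentOS 6 predates it and only has /etc/redhat-release.
if [ -r /etc/os-release ]; then
    . /etc/os-release
    os_desc="OS: ${NAME} ${VERSION_ID}"
elif [ -r /etc/redhat-release ]; then
    os_desc="OS: $(cat /etc/redhat-release)"
else
    os_desc="OS: unknown"
fi
echo "$os_desc"
```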

A live view of system activity on Juno is available online.

Luna Cluster 2018

Compute

  • CentOS 6 cluster login host: luna.mskcc.org
  • Additional compute servers: selene.mskcc.org, rhea.mskcc.org, phos.mskcc.org, and pluto.mskcc.org
  • 98 compute nodes:

    name    qty  model                     CPU                cores  RAM (GB)  GPUs  NVMe  net (Gbps)
    s01-24   24  HPE DL160G8               2*Xeon E5-2660      16     384     none  none  10
    t01-02    2  HPE DL580G8               4*Xeon E7-4820 v2   32    1536     none  none  10
    u01-36   36  HPE DL160G9               2*Xeon E5-2640 v3   16     256     none  none  10
    w01       1  HPE DL160G9               2*Xeon E5-2640 v4   16     256     none  none  10
    x11-24   24  Supermicro SYS-6018R-TDW  2*Xeon E5-2640 v4   20     256     none  2x2T  25
    y01-03    3  Supermicro SYS-6018R-TDW  2*Xeon E5-2640 v4   20     512     none  2x2T  25

  • CentOS 7 cluster login host: juno.mskcc.org
  • 11 compute nodes:

    name     qty  model                     CPU                cores  RAM (GB)  GPUs  NVMe  net (Gbps)  OS
    jx01-10   10  Supermicro SYS-6018R-TDW  2*Xeon E5-2640 v4   20     256     none  2x2T  25          CentOS 7
    ju14       1  HPE DL160G9               2*Xeon E5-2640 v3   16     256     none  none  10          CentOS 6

Storage

  • 3.5 PB, 28-node Isilon cluster, connected at 28*10 GbE:
    • /home: 100 GB limit; for scripts only, no huge files; frequent mirrored backups
    • /ifs/work: fast disk, less space; for active projects
    • /ifs/res: slower disk, more space; for long-term storage of sequence data
    • /ifs/archive: read-only; GCL FASTQ
    • /opt/common: popular third-party programs
    • /common/data: data, genome assemblies, GTFs, etc.
  • /juno (CentOS 7 only): 2.6 PB, GPFS v5.1, 4 NSD servers, 4 NetApp E5760, connected at 16*25 GbE

Network

  • Arista 7328x 100 Gbps switch

Clinical HPC

The Department of Molecular Pathology

Scientists here analyze tumor DNA and shed light on the complex molecular changes that occur in cancer. These advances are enabling doctors to improve diagnoses and develop personalized treatments. There are extraordinary opportunities to provide insights into what causes cancer to form or progress and to suggest strategies for blocking it.

To support this analytical process, a computational infrastructure has been put in place that provides the storage, compute, and networking requirements to effectively process genomic data to completion for use by doctors and geneticists. This system is known as Phoenix, and it is an isolated compute cluster dedicated to clinical sequence processing.

Access to this system requires a direct association with the Department of Molecular Pathology. Account requests must come from Aijaz Syed, Ahmet Zehir, or Michael Berger. For more information, contact us.

Clinical Cluster Architecture

The Phoenix cluster consists of ten compute nodes, two head nodes, a three-node Isilon storage cluster, a pair of Arista 7050 10 GbE switches, and a serial console switch. The equipment is installed in cabinets 16JJ and 16KK in the New Jersey data center.

The compute nodes are eight "thin" HP ProLiant DL160 Gen8 servers with 128 GB of memory each and two "fat" HP ProLiant DL580 G7 servers with 1 TB of memory each. In total, the cluster provides 208 cores of compute power. The head nodes are two HP ProLiant DL380s with 128 GB of memory. The cluster has an internal 85 TB Isilon storage cluster for sharing data and software among compute nodes.

The cluster compute nodes are connected to a pair of private 10 GbE switched networks that do not physically connect to any MSK network. The only nodes that connect to the MSK LAN are the cluster head nodes, the Isilon storage cluster, and the LIMS servers.

The cluster’s pair of private switches are Arista 7050 10 GbE units, connected to each other by a trunked 80 Gb link.

Cluster Storage

  • approximately 600 TB of Isilon storage (mirrored in two data centers)

Cluster Head Nodes: phoenix-h1, phoenix-h2

The cluster head nodes are a pair of HP ProLiant DL380 Gen8 servers with 16 cores and 128 GB of memory each. They serve as the SGE qmaster for the cluster, as well as the DNS servers (master and slave), PXE boot server, Puppet master, Foreman server, and DHCP server.

AD authentication is provided via CentrifyDC software on the servers. Users in the AD groups bicadmin, dmppower, dmpseq, and dmpalys are allowed to log on. Only users in the AD groups bicadmin and dmppower are allowed to submit jobs via the qsub command.
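For users in the permitted AD groups, job submission on Phoenix goes through SGE's qsub. The sketch below writes a minimal SGE job script; the resource values and the parallel environment name (smp) are illustrative assumptions, not Phoenix's actual configuration.

```shell
# Create a minimal SGE job script. All resource values below are
# illustrative placeholders, not Phoenix's actual configuration.
cat > myjob.sge <<'EOF'
#!/bin/bash
#$ -N example_job          # job name
#$ -cwd                    # run from the submission directory
#$ -pe smp 4               # request 4 slots in an SMP parallel environment
#$ -l h_vmem=4G            # per-slot memory limit
#$ -o example_job.out      # stdout file
#$ -e example_job.err      # stderr file
echo "Running on $(hostname)"
EOF

# On a Phoenix head node, the script would be submitted and
# monitored with:
#   qsub myjob.sge
#   qstat
```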

Cluster Compute Thin Nodes: p01 to p08

Currently, there are eight DL160 Gen8 servers, each with 2x Intel Xeon E5-2670 CPUs (2.60 GHz, eight cores, 20 MB cache, 115 W), 128 GB of memory, a 300 GB (15k rpm) drive, a 1 TB (7.2k rpm) drive, and a two-port 10 Gb 530FLR-SFP adapter. Each server's eth0 and eth1 are bonded into a 10 Gb link to the private network 10.1.0.0/24, and a single 1 Gb (iLO) link connects to the management network 172.22.242.0/25.

Cluster Compute Fat Nodes: pa1, pa2

There are two DL580 G7 servers, each with 4x Intel Xeon E7-4870 CPUs (2.40 GHz, ten cores, 30 MB cache, 130 W), 1 TB of memory, a 300 GB (15k rpm) drive, a 7 TB (7.2k rpm) drive, and one dual-port 10 Gb NC524 SFP module. Each server's eth0 and eth1 are bonded into a 10 Gb link to the private network 10.1.0.0/24, and a single 1 Gb (iLO) link connects to the management network 172.22.242.0/25.