The High-Performance Computing (HPC) group maintains a wide range of computational resources to fit your needs. All are Linux compute clusters, each attached to large storage platforms to support data-intensive research.
Each cluster is configured to meet the needs of its main users, with varying CPU and GPU configurations. Currently, Lilac is the primary cluster designated for new HPC research requests. Requests for other clusters are reviewed case by case, depending on the research requirements and cluster availability.
The MSK and Sloan Kettering Institute research community has access to three HPC systems, each suited to a different set of computational requirements. Saba and Hal are being consolidated into the larger Lilac cluster, so most users will operate on Lilac.
The Lilac HPC cluster runs an LSF queueing system, which supports a variety of job submission techniques.
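For example, batch jobs on an LSF cluster are typically submitted with `bsub`. The sketch below is illustrative only; the job name, resource values, and log paths are assumptions, not Lilac-specific settings.

```shell
#!/bin/bash
# Illustrative LSF batch script; resource values and GPU options are
# assumptions and may differ from Lilac's actual configuration.
#BSUB -J example_job        # job name
#BSUB -n 4                  # number of cores
#BSUB -W 02:00              # wall-clock limit (hh:mm)
#BSUB -o output_%J.log      # stdout; %J expands to the job ID
#BSUB -e error_%J.log       # stderr

echo "Running on $(hostname)"
```

Submit the script with `bsub < job.lsf`, monitor it with `bjobs`, and cancel it with `bkill <jobid>`.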
Cluster Technical Specifications
- a total of 2,520 Intel compute cores and 102 NVIDIA GTX 1080 and GTX 1080 Ti GPUs
- 26 Supermicro SYS-7048GR-TR nodes, ls01..18, each with four NVIDIA GTX 1080 GPUs
- six nodes, lg01..06, each with four GPUs per node (either NVIDIA GTX 1080 or Titan X)
- 36 nodes, lp01-lp36, with 32 GTX 1080 Ti GPUs and eight GTX 1080 GPUs
- sda/120 GB and sdb/1 TB drives
- dual power supplies
- 4U in size
- 10 G switch
- Name: hal-7308
- Arista switches
- /data (compute) has 3.84 PB (3,840 TB) of raw storage and 2.5 PB of usable storage
- /warm (noncompute) has 1.5 PB of usable storage
- 10 arrays
- 60 disks (8 TB or 10 TB each) per array
Computing resources are available for investigators in the Computational Biology Center and other SKI investigators who need access to significant Linux compute and storage resources in association with the Marie-Josée and Henry R. Kravis Center for Molecular Oncology (CMO) or the Bioinformatics Core. These systems support the initial processing of sequence data that’s generated by the Integrated Genomics Operation.
Two computational clusters support bioinformatics operations: Lux and Luna. Lux primarily supports the processing of raw genomic data. Luna is the main computational system used by the Bioinformatics Core and members of the CMO. These systems provide these labs with access to shared research data, applications, and packages that support various pipelines for bioinformatics analysis.
- luna.cbio.mskcc.org (login host): HP DL380 Gen8, one Xeon E5-2650 v2 @ 2.60 GHz, and 64 GB RAM
- 62 compute nodes, with 1,024 cores (2,048 threads)
- u01-u36: 36 HP ProLiant DL160 Gen9, dual eight-core Xeon E5-2640 v3s @ 2.60 GHz, and 256 GB RAM per node
- s01-s24: 24 HP ProLiant DL160 Gen8, dual eight-core Xeon E5-2660s @ 2.20 GHz, and 384 GB RAM per node
- t01-t02: two HP ProLiant DL580 Gen8, quad eight-core Xeon E7-4820 v2s @ 2.00 GHz, and 1.5 TB RAM per node
- nodes have 800 GB at /scratch/$USER
- solisi (Isilon array): 4.4 PB (NL and X)
Luna is the head node for submitting jobs to the cluster. The following directories are available:
- /home has a 100 GB limit; for scripts only, no huge files; frequent mirrored backups
- /ifs/work is fast disk with less space, for ongoing projects; about 10 TB per lab
- /ifs/res is slow disk with more space, for long-term storage of sequence data
- /ifs/archive is read-only; GCL fastqs
- /opt/common holds binaries and popular third-party programs
- /common/data has data, genome assemblies, GTFs, etc.
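To illustrate how these directories are typically used together, here is a sketch of a batch script for an SGE-style scheduler of the kind Luna's head node fronts. The job name, slot counts, and lab paths below are hypothetical placeholders, not actual Luna configuration.

```shell
#!/bin/bash
# Hypothetical SGE batch script; resource values and the lab path
# (mylab/project) are placeholders, not actual Luna configuration.
#$ -N align_sample          # job name
#$ -pe smp 8                # eight slots on one node
#$ -l h_vmem=4G             # memory per slot
#$ -o logs/align.out        # stdout
#$ -e logs/align.err        # stderr
#$ -cwd

# Stage intermediate files on the node-local scratch disk (800 GB per node),
# then copy results back to the lab's /ifs/work area for ongoing projects.
SCRATCH=/scratch/$USER/$JOB_ID
mkdir -p "$SCRATCH"
# ... run the analysis in $SCRATCH ...
cp -r "$SCRATCH"/results /ifs/work/mylab/project/
rm -rf "$SCRATCH"
```

The pattern of computing in /scratch and copying results to /ifs/work keeps heavy I/O off the shared filesystem while the job runs.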
Scientists here analyze tumor DNA and shed light on the complex molecular changes that occur in cancer. These advances are enabling doctors to improve diagnoses and develop personalized treatments. There are extraordinary opportunities to provide insights into what causes cancer to form or progress and to suggest strategies for blocking it.
To support this analytical process, a computational infrastructure has been put in place that provides the storage, compute, and networking requirements to effectively process genomic data to completion for use by doctors and geneticists. This system is known as Phoenix, and it is an isolated compute cluster dedicated to clinical sequence processing.
Access to this system requires a direct association with the Department of Molecular Pathology. Account requests must come from Aijaz Syed, Ahmet Zehir, or Michael Berger. For more information, contact us.
Clinical Cluster Architecture
The Phoenix cluster consists of ten compute nodes, two head nodes, a three-node Isilon storage cluster, a pair of Arista 7050 10 GbE switches, and a serial console switch. The equipment is installed in cabinets 16JJ and 16KK in the New Jersey data center.
The compute nodes are eight HP ProLiant DL160 Gen8 “thin” nodes with 128 GB of memory each and two HP ProLiant DL580 G7 “fat” nodes with 1 TB of memory each. In total, the cluster provides 208 compute cores. The head nodes are two HP ProLiant DL380s with 128 GB of memory. The cluster has an internal 85 TB Isilon storage cluster for sharing data and software among compute nodes.
The cluster compute nodes are connected to a pair of private 10 GbE switched networks that do not physically connect to any MSK network. The only nodes that connect to the MSK LAN are the cluster head nodes, the Isilon storage cluster, and the LIMS servers.
The cluster’s pair of private switches are Arista 7050 10 GbE units, connected to each other via a trunked 80 Gb link.
- approximately 600 TB of Isilon storage (mirrored in two data centers)
Cluster Head Node: Phoenix-h1, phoenix-h2
The cluster head nodes are a pair of HP ProLiant DL380 Gen8 servers with 16 cores and 128 GB of memory each. They act as the SGE qmaster, DNS servers (master and slave), PXE boot server, Puppet master, Foreman server, and DHCP server for the cluster.
AD authentication is provided via CentrifyDC software on the servers. Users in the AD groups bicadmin, dmppower, dmpseq, and dmpalys are allowed to log on. Only users in the AD groups bicadmin and dmppower are allowed to submit jobs via qsub.
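A minimal sketch of submitting work to the Phoenix qmaster, assuming standard SGE `qsub` semantics; the job name, resource values, and script name are illustrative, not Phoenix-specific.

```shell
# Submit a batch script to the SGE qmaster (all values illustrative;
# run_step.sh is a placeholder script name).
qsub -N pipeline_step \
     -pe smp 4 \
     -l h_vmem=8G \
     -o step.out -e step.err \
     run_step.sh

# Check the queue and, if needed, remove a job:
qstat -u "$USER"
# qdel <jobid>
```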
Cluster Compute Thin Node: p01 to p08
Currently, there are eight DL160 Gen8 servers, each with 2x Intel Xeon E5-2670 CPUs (2.60 GHz, eight cores, 20 MB cache, 115 W); 128 GB of memory; 300 GB (15K RPM) and 1 TB (7.2K RPM) hard drives; and a two-port 10 Gb 530FLR-SFP adapter. Each server’s eth0 and eth1 are bonded into a 10 Gb link to the private network 10.1.0.0/24, with a single 1 Gb link (iLO) to the management network 172.22.242.0/25.
Cluster Compute Fat Node: pa1, pa2
There are two DL580 G7 servers, each with 4x Intel Xeon E7-4870 CPUs (2.40 GHz, ten cores, 30 MB cache, 130 W); 1 TB of memory; 300 GB (15K RPM) and 7 TB (7.2K RPM) hard drives; and one dual-port 10 Gb NC524 SFP module. Each server’s eth0 and eth1 are bonded into a 10 Gb link to the private network 10.1.0.0/24, with a single 1 Gb link (iLO) to the management network 172.22.242.0/25.