Duke Compute Cluster (DCC)

2025 DSS Bootcamp

Colin Rundel

DCC Basics

What is the DCC?

The Duke Compute Cluster is a general-purpose high-performance/high-throughput computing installation, fitted with software used for a broad array of scientific projects. With a few notable exceptions, applications on the cluster are Free and Open Source Software.

Quick facts:

  • 1360 nodes which, combined, have more than 45,000 vCPUs, 980 GPUs, and 270 TB of RAM

  • Interconnects are 10 Gbps or 40 Gbps

  • Utilizes a 7 Petabyte Isilon file system

  • Runs AlmaLinux 9 and uses SLURM as the job scheduler

Cluster Appropriate Use

Users of the cluster agree to only run jobs that relate to the research mission of Duke University. Use of the cluster for the following activities is prohibited:

  • Financial gain

  • Commercial or business use

  • Unauthorized use or storage of copyright-protected or proprietary resources

  • Unauthorized mining of data on or off campus (including many web scraping techniques)

Data Security

Users of the cluster are responsible for the data they introduce to the cluster and must follow all applicable Duke (including IRB), school, and departmental policies on data management and data use.

Security and compliance provisions on the cluster are sufficient to meet the Duke data classification standard for public or restricted data.

Use of sensitive data (e.g. legally protected data such as PHI or FERPA) or data bound by certain restrictions in data use agreements is not allowed.

As a shared resource, privacy on the cluster is constrained and users of the cluster must conduct themselves in a manner that respects other researchers’ privacy. Cluster support staff have access to all data on the cluster and may inspect elements of the system from time to time. Metadata on the cluster and utilization by group (and sometimes user) will be made available to all cluster users and Duke stakeholders.

Data Classification Standard

See Data Classification Standard for more details:

  • Sensitive - this is data that Duke is either required by law to protect, or which Duke protects to mitigate institutional risk. (SSNs, CCNs, PHI, etc.)

  • Restricted - this is data that is not necessarily for public consumption, but also does not fit into the Sensitive category. Duke may have a proprietary obligation to protect Restricted data, but disclosure would not significantly harm the university. (NDA data, financial transaction data, etc.)

  • Public - all other data, which can be made accessible to the general public.

Storage on the DCC

Path                      Size               Description                                                Backups
/work/<netid>             650 TB             Unpartitioned, high-speed volume shared across all users   None
/cwork/<netid>            830 TB             High-speed volume                                          None
/hpc/home/<netid>         25 GB              Personal scripts and other environment setup               None
/hpc/group/<groupname>    1 TB (expandable)  Private to each lab group                                  7-day snapshot
/hpc/dctrl/<netid>        500 GB             Private to each PhD student                                7-day snapshot
/datacommons/<groupname>  Fee-based          Archival storage                                           Optional 30-day backup
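
A couple of standard shell commands are enough to see how much of each volume you are using. A minimal sketch, with <netid> and <groupname> standing in for your own values:

    # Summarize your usage of each writable location (can be slow on large trees)
    du -sh /hpc/home/<netid> /work/<netid> /hpc/group/<groupname>

    # Report the size and free space of the filesystem backing a path
    df -h /hpc/group/<groupname>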

Storage usage

  • /home - personal scripts and configuration files, environment setup information

  • /group - software installations, lab-specific scripts, and moderately sized data sets or intermediate results that are needed for longer than 75 days.

  • /work - large data sets under analysis, intermediate results. Files older than 75 days are automatically purged! (See the sketch after this list for spotting purge candidates.)

  • /cwork - experimental version of /work for large data sets under analysis, intermediate data, and preliminary results. Files older than 75 days are automatically purged!

  • /datacommons - long term storage for source data and results data

  • /scratch - node-specific high-speed storage; data must be deleted when the job is complete. Sizes vary by node, so use with caution.
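
Since the 75-day purge on /work and /cwork is automatic, it is worth checking periodically what is at risk. A minimal sketch, assuming your files live under /work/$USER and GNU find is available:

    # List files not modified in the last 75 days (i.e. purge candidates)
    find /work/$USER -type f -mtime +75 -ls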

Accessing the DCC

All access to the DCC compute nodes is managed through the SLURM job scheduler.

Traditionally, users would log in to the cluster via SSH and then submit jobs to the scheduler.

This is still the preferred approach for large-scale jobs but is beyond the scope of today’s workshop. If this is of interest to you, I strongly recommend RC training.
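
For context, the traditional workflow looks roughly like the sketch below. The login host is the one given in the RC documentation; the script contents and resource values are purely illustrative.

    ssh <netid>@dcc-login.oit.duke.edu    # log in to a DCC login node

    # example.sh - a minimal SLURM batch script
    #!/bin/bash
    #SBATCH --partition=common
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    #SBATCH --time=01:00:00
    hostname    # the actual work goes here

    sbatch example.sh    # submit the script to the scheduler
    squeue -u <netid>    # check on your queued and running jobs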

Open OnDemand

For interactive use the current recommended approach is to use the OnDemand web-based interface.

OnDemand Demo

Accessing OnDemand

Accounts & Partitions

Your account will belong to different groups and those groups will have access to different partitions on the DCC.

OnDemand will generally let you know which partitions you have access to, but requesting a partition you do not have access to is a common reason for a job to fail to run.

General use partitions:

  • interactive for short interactive sessions; default resources are low.
  • common for jobs that will run on the DCC core nodes.
  • gpu-common for jobs that will run on the DCC GPU nodes.
  • scavenger for jobs that will run on lab-owned nodes at “low priority” (kill-and-requeue preemption).
  • scavenger-gpu for GPU jobs that will run on lab-owned nodes at “low priority” (kill-and-requeue preemption).
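
If a job will not start, it is worth confirming what you actually have access to. A minimal sketch using standard SLURM commands (the account and partition names are illustrative):

    # List partitions along with their availability and time limits
    sinfo -o "%P %a %l"

    # Show the account/partition associations for your user
    sacctmgr show associations user=<netid> format=account,partition

    # Request an illustrative interactive session on the common partition
    srun --partition=common --account=<groupname> --pty bash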

Limitations

Interactive partition:

  • 10 CPUs, 64 GB memory

All other partitions:

  • Total memory per user account: 1.5 TB
  • Total CPUs per user account: 400
  • Queued jobs per user per account: 400

Important Job Options

Stat OnDemand has been customized and looks a little different, but the core job details are basically the same.

After account and partition, the other important job options are:

  • Session duration - how long the job will run before it is killed. It is better to pick a longer duration than you think you need and cancel the job if you finish early.

  • Session size - how many CPUs and how much memory (in GB) to ask for. Keep in mind the partition limits, and note that larger requests may take longer to schedule. We’ve provided defaults that should work for most use cases, but custom values can be set under “Advanced Options”.
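
For reference, these form fields map directly onto scheduler options; if you eventually move to batch scripts, the equivalents look roughly like this (values illustrative):

    #SBATCH --time=04:00:00      # session duration (walltime before the job is killed)
    #SBATCH --cpus-per-task=4    # session size: CPUs
    #SBATCH --mem=16G            # session size: memory

If you finish early, scancel <jobid> releases the resources back to the cluster.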

App Container

This is another advanced option - we provide pre-built containers for each of the available tools, with a preconfigured environment that has Python, R, and Julia installed (including common packages).

The hope is that these will meet most of your needs, but you can also create your own containers if something more specialized is needed.

If you would like to explore this, see the RC Containers documentation for details or come chat with me.
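
If you do go down this road, note that HPC clusters like the DCC typically run containers with Singularity/Apptainer rather than Docker. A minimal sketch, with the image name purely illustrative:

    # Build a local image from a Docker Hub source, then run a script inside it
    singularity pull my-env.sif docker://rocker/r-ver:4.4
    singularity exec my-env.sif Rscript analysis.R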