2025 DSS Bootcamp
The Duke Compute Cluster (DCC) is a general-purpose high-performance/high-throughput computing installation, fitted with software used for a broad array of scientific projects. With a few notable exceptions, applications on the cluster are generally Free and Open Source Software.
Quick facts:
1360 nodes, which combined provide more than 45,000 vCPUs, 980 GPUs, and 270 TB of RAM
Interconnects are 10 Gbps or 40 Gbps
Utilizes a 7 PB Isilon file system
Runs AlmaLinux 9, with SLURM as the job scheduler
Users of the cluster agree to only run jobs that relate to the research mission of Duke University. Use of the cluster for the following activities is prohibited:
Financial gain
Commercial or business use
Unauthorized use or storage of copyright-protected or proprietary resources
Unauthorized mining of data on or off campus (including many web scraping techniques)
Users of the cluster are responsible for the data they introduce to the cluster and must follow all applicable Duke (including IRB), school, and departmental policies on data management and data use.
Security and compliance provisions on the cluster are sufficient to meet the Duke data classification standard for public or restricted data.
Use of sensitive data (e.g. legally protected data such as PHI or FERPA) or data bound by certain restrictions in data use agreements is not allowed.
As a shared resource, privacy on the cluster is constrained and users of the cluster must conduct themselves in a manner that respects other researchers’ privacy. Cluster support staff have access to all data on the cluster and may inspect elements of the system from time to time. Metadata on the cluster and utilization by group (and sometimes user) will be made available to all cluster users and Duke stakeholders.
See Data Classification Standard for more details:
Sensitive - this is data that Duke is either required by law to protect, or which Duke protects to mitigate institutional risk. (SSNs, CCNs, PHI, etc.)
Restricted - this is data that is not necessarily for public consumption, but also does not fit into the Sensitive category. Duke may have a proprietary obligation to protect Restricted data, but disclosure would not significantly harm the university. (NDA data, financial transaction data, etc.)
Public - All other data, which can be accessible to the general public.
Path | Size | Description | Backups |
---|---|---|---|
/work/&lt;netid&gt; | 650 TB | Unpartitioned, high speed volume, shared across all users | None |
/cwork/&lt;netid&gt; | 830 TB | High speed volume | None |
/hpc/home/&lt;netid&gt; | 25 GB | Personal scripts and other environment setup | None |
/hpc/group/&lt;groupname&gt; | 1 TB (expandable) | Private to each lab group | 7-day snapshot |
/hpc/dctrl/&lt;netid&gt; | 500 GB | Private to each PhD student | 7-day snapshot |
/datacommons/&lt;groupname&gt; | Fee-based | Archival storage | Optional 30-day backup |
/home
- personal scripts and configuration files, environment setup information
/group
- software installations, lab specific scripts, moderately sized data sets or intermediate results that are needed for longer than 75 days.
/work
- large data sets under analysis, intermediate results. Files older than 75 days are automatically purged! (A short sketch for spotting files at risk of purging follows this list.)
/cwork
- experimental version of /work for large data sets under analysis, intermediate data, and preliminary results. Files older than 75 days are automatically purged!
/datacommons
- long term storage for source data and results data
/scratch
- node-specific high-speed storage; data must be deleted when the job is complete. Sizes vary by node, so use with caution.
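Because of the 75-day purge window on /work and /cwork, it can be handy to check what is about to age out. Below is a minimal Python sketch (not an official RC tool) that lists files under a directory whose modification time is older than 75 days; the /work/&lt;netid&gt; path is a placeholder for your own directory, and the actual purge may key off a different timestamp, so treat this as a rough check only.

```python
import os
import time
from pathlib import Path

# Placeholder path -- replace with your own /work/<netid> or /cwork/<netid> directory.
work_dir = Path("/work") / os.environ.get("USER", "<netid>")

PURGE_DAYS = 75  # purge window described above
now = time.time()

for path in work_dir.rglob("*"):
    if path.is_file():
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days > PURGE_DAYS:
            print(f"{age_days:6.1f} days old: {path}")
```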
All access to the DCC compute nodes is managed through the SLURM job scheduler.
Traditionally, users would log in to the cluster via SSH and then submit jobs to the scheduler.
This is still the preferred approach for large-scale jobs but is beyond the scope of today's workshop. If this is of interest to you, I strongly recommend RC training.
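To give a flavor of what batch submission looks like (again, see RC training for the real details), here is a minimal sketch that writes a SLURM batch script and submits it with `sbatch` from a login node. The partition name, resource requests, and `my_analysis.py` are placeholder assumptions, not DCC defaults.

```python
import subprocess
from pathlib import Path

# A tiny SLURM batch script. The partition, time, CPU, and memory values below
# are placeholders -- check which partitions and limits your account can use.
batch_script = """\
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --partition=common
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --output=demo_%j.out

python my_analysis.py   # my_analysis.py is a hypothetical script of your own
"""

Path("demo.sh").write_text(batch_script)

# Submit from a DCC login node; sbatch prints the assigned job ID on success.
result = subprocess.run(["sbatch", "demo.sh"], capture_output=True, text=True, check=True)
print(result.stdout.strip())
```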
For interactive use the current recommended approach is to use the OnDemand web-based interface.
Open OnDemand gives access to tools like Jupyter Notebooks, RStudio, and VS Code on the DCC through your browser
Currently there are two flavors of OnDemand available: the general DCC OnDemand and a customized Stat OnDemand.
Your account will belong to one or more groups, and those groups have access to different partitions on the DCC.
OnDemand generally will let you know which partitions you have access to, but requesting a partition you cannot use is a common reason a job fails to run (a command-line check is sketched after the partition list below).
General use partitions:
Interactive partition:
All other partitions:
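If a job will not start, one quick check from a terminal session is to ask SLURM's accounting database which accounts and partitions your NetID is associated with. This sketch just shells out to `sacctmgr`; the fields shown are standard SLURM ones, but the exact association layout on the DCC may differ, so treat it as an assumption and compare with what OnDemand reports.

```python
import getpass
import subprocess

# Ask SLURM accounting for the account/partition associations of the current user.
# Account and Partition are standard sacctmgr fields; an empty partition usually
# means the association is not limited to a specific partition.
user = getpass.getuser()
cmd = [
    "sacctmgr", "--noheader", "--parsable2",
    "show", "associations", f"user={user}",
    "format=Account,Partition",
]
output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

for line in output.strip().splitlines():
    account, partition = line.split("|")
    print(f"account={account or '-'}  partition={partition or '(not partition-specific)'}")
```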
Stat OnDemand has been customized and looks a little different, but the core job details are basically the same.
After account and partition, the other important job options are:
Session duration - how long the job will run before it is killed; it is better to pick longer than you think you need and then cancel the job if you finish early.
Session size - how many CPUs and GB of memory to ask for. Keep in mind partition limits; larger jobs may take longer to schedule. We’ve provided defaults that should work for most use cases, but custom values can be set under “Advanced Options”.
This is another advanced option - we have pre-built containers for each of the tools above, providing a preconfigured environment with Python, R, and Julia installed (including common packages).
The hope is that these will meet most of your needs, but you can also create your own containers if something more specialized is needed.
If you would like to explore this, see the RC Containers documentation for details or come chat with me.
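As a quick sanity check once a session starts, a few lines of Python will show which packages are actually available in the environment you launched; the package names below are just examples, not a list of what the DCC containers ship.

```python
# Report versions of a few example packages in the current environment.
from importlib import metadata

for pkg in ("numpy", "pandas", "scipy", "matplotlib"):
    try:
        print(f"{pkg} {metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed in this environment")
```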