COSMA job queues
A list of available queues can be found using the sinfo -a -s command. Partitions are summarised here:
| Partition | Nodes | Node details | GPU info | Network | Comments |
|---|---|---|---|---|---|
| cosma8 | 528, m8001-m8528 | 128 cores, 1TB | No | HDR 200g | Variants: -rome, -milan, -prince, -pauper, -serial |
| cosma7 | 220, m7229-m7448 | 28 cores, 512GB | No | EDR 100g | Variants: -prince, -pauper |
| cosma7-rp | 222, m7001-m7222 | 28 cores, 512GB | No | Rockport 100g | Variants: -prince, -pauper |
| cosma8-shm | 2, mad04,05 | 128 cores, 4TB | Up to 3x A100 | HDR 200g | Composable fabric |
| cosma8-shm2 | 2, ga004,06 | 128,64 cores, 1TB | MI100, 2x MI210 | HDR 200g | AMD GPUs |
| cosma8-shm3 | 2, mad08,09 | 192 cores, 768GB | No | HDR 200g | AMD Genoa |
| cosma8-ska | 1, mad07 | 56 cores, 4TB | No | HDR 200g | NVDIMMs |
| cosma8-ska2 | 4, mad01,03,04,05 | 56, 48, 128, 128 cores, 3, 6, 4, 4TB | 3x A100 | EDR/HDR | mad04,05 composable fabric |
| cosma8-highram | See cosma8-shm | | | | |
| cosma8-draper | 1, ga007 | 96 cores, 2TB | 8x MI300X | HDR200 | AMD GPU |
| cosma8-dine2 | 8, gc001-gc008 | 64 cores, 2TB | A30, V100 | HDR200 | Composable fabric |
| cosma7-mad | 1, mad01 | 56 cores, 3TB | No | EDR 100g | |
| cosma7-shm | 2, mad01,02 | 56, 112 cores, 3, 1.5TB | No | EDR 100g | |
| cosma7-shm2 | 1, mad03 | 48 cores, 6TB | No | EDR 100g | NVDIMMs |
| cosma5 | 8, m5001-m5008 | 256 cores, 1.5TB | No | EDR 100g | Replacement of original COSMA5 |
| cosma-analyse | 3, mad01,02,03 | 56,112,48 cores, 3,1.5,6TB | No | EDR 100g | |
| bluefield1 | 18, b101-b124 with gaps | 32 cores, 512GB | No | HDR200 | Including Bluefield2 DPUs |
| dine | See bluefield1 | | | | |
| dine2 | See cosma8-dine2 | | | | |
| gracehopper | 1, gn003 | 72 cores, 512GB | H100 GPU | HDR200 | Arm CPU |
| mi300x | See cosma8-draper | | | | |

Nodes available for direct SSH (from a login node), for prototyping etc:
gn002 - Grace Hopper, cf. the gracehopper partition
mad06 - 128 cores, 1TB RAM, Milan-X (large cache), part of the cosma8-shm composible fabric with up to 3 A100 GPUs
ga008 - 96 cores, 500G, 4x AMD MI300A GPUs
COSMA8 queues
There are a number of COSMA8 queues: cosma8, cosma8-prince, cosma8-pauper, cosma8-serial, cosma8-shm, cosma8-shm2, cosma8-shm3, cosma8-draper, cosma8-dine2, cosma8-milan, cosma8-rome, cosma8-highram (and a few others).
If you are unsure, submit to cosma8 (assuming you have access - see below).
Because COSMA8 contains two generations of processors (Rome and Milan), if you want to only submit to one of those (there is a 10-20% performance difference between them), you can submit to the specific cosma8-milan or cosma8-rome partitions.
If your job does not require a full node (of 128 cores), you can submit to the cosma8-serial queue.
If you have a valid reason for a run time longer than 3 days, or for higher priority, you can ask for access to the -prince queues. The -prince queues allow 30-day run times, and can be used by large jobs to avoid the need for reservations and to increase system efficiency.
The -pauper queues are for low priority jobs, primarily for projects that have used up their allocation (which will automatically be demoted to this queue). They are limited to a 24 hour runtime.
Other COSMA8 queues are smaller specialised compute partitions, mentioned in more detail below.
Access to COSMA8
You can use the scontrol show partition=PARTITION command to show details about each queue, and work out which groups/projects you would need to be in to submit jobs to them.
Access to the DiRAC COSMA8 system is only for DiRAC projects with a time allocation on this system. Each COSMA8 compute node has 128 cores and 1TB RAM. If you request fewer than 128 cores for a job using the cosma8-serial queue, you will not be granted exclusive access to a node, and may therefore share a node with other jobs. Unless you specify a memory requirement in your SLURM script (e.g. #SBATCH --mem=800G), you will be given a fraction of the 1TB equal to the number of cores you request divided by 128. If you do not wish to share a node, use the #SBATCH --exclusive flag (or don’t submit to the cosma8-serial queue). The cosma8-shm queue provides access to two large 4TB nodes, each with 128 cores.
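As a rough illustration of the command above, the sketch below wraps it in a small helper that keeps only the fields which usually determine who may submit and for how long. The helper name partition_access is hypothetical, and the field names (AllowGroups, AllowAccounts, MaxTime) are assumed from standard Slurm output; the fields shown on COSMA may differ.

```shell
# Hypothetical helper: print the access-related fields of one partition.
# Field names are assumed from standard `scontrol show partition` output.
partition_access() {
    scontrol show partition="$1" \
        | tr ' ' '\n' \
        | grep -E '^(AllowGroups|AllowAccounts|MaxTime)='
}

# Example (run on a login node):
#   partition_access cosma8
```

Comparing the reported groups against the output of the id command should indicate whether your account is in a group that can submit.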
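The default memory share described above can be sketched as follows. This is a minimal example, not a recommended configuration: the project code PROJECT, the core count, the memory request and the run time are all placeholders, and the arithmetic assumes that 1TB is counted as 1024GB.

```shell
#!/bin/bash
# Shared-node job on cosma8-serial; all values below are placeholders.
#SBATCH -p cosma8-serial
#SBATCH -A PROJECT
#SBATCH --ntasks=16
# Explicit memory request; leave this line out to receive the default share.
#SBATCH --mem=200G
#SBATCH --time=01:00:00

# Default share when --mem is omitted: (cores / 128) of the node's 1TB.
# Assumes 1TB is counted as 1024GB.
default_mem_gb() {
    echo $(( $1 * 1024 / 128 ))
}

echo "Default memory for 16 cores: $(default_mem_gb 16)GB"
```

Under that assumption a 16-core request would default to 128GB, so a job needing 200GB has to ask for it explicitly.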
cosma8-shm, cosma8-shm2, cosma8-shm3
The cosma8-shm, cosma8-shm2 and cosma8-shm3 queues are small queues with non-standard hardware.
cosma8-shm contains:
mad04: 128 cores, 4TB RAM, 0-3 NVIDIA A100 GPUs, AMD Rome processors
mad05: 128 cores, 4TB RAM, 0-3 NVIDIA A100 GPUs, AMD Rome processors
It also used to include mad06 (Milan-X architecture, 128 cores, 1TB RAM, large (768MB) L3 cache, with 0-3 NVIDIA A100 GPUs), though this is now a more general-purpose node available for direct ssh from a login node.
The A100 GPUs are “composable”, that is, they can be allocated to different servers upon demand using a Liqid fabric. By default, mad04 and mad05 contain one GPU each (as does login8b). However, should more than one GPU be required for a particular workload, this can be arranged (at the expense of moving them from the other nodes).
cosma8-shm2 contains:
ga004: 128 cores, 1TB RAM, AMD MI100 GPU, AMD Rome processors
ga005: 64 cores, 1TB RAM, 2x AMD MI200 GPU, AMD Rome processors
ga006: 64 cores, 1TB RAM, 2x AMD MI200 GPU, AMD Rome processors
Should you wish to submit to a particular node, or exclude a particular node (e.g. to ensure that you have access to a particular type of GPU), you can use the #SBATCH --exclude=ga004 directive within your batch script (which in this case would exclude ga004, thus ensuring you have access to a node with 2x AMD MI200 GPUs), or #SBATCH --nodelist=ga004 (which in this case would submit to only the ga004 node).
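A minimal batch script using these directives might look like the sketch below. The project code PROJECT and the run time are placeholders, and the final command is only a stand-in for a real workload.

```shell
#!/bin/bash
#SBATCH -p cosma8-shm2
# Avoid ga004 so that the job lands on a node with 2x MI200 GPUs.
#SBATCH --exclude=ga004
# Alternatively, remove the line above and enable this one (by deleting one
# leading '#') to run only on ga004:
##SBATCH --nodelist=ga004
# Placeholder project code and run time; replace with your own.
#SBATCH -A PROJECT
#SBATCH --time=01:00:00

# Report where the job ran; replace with your own workload.
echo "Running on $(uname -n)"
```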
cosma8-shm3 contains:
mad08: 192 cores, 768GB RAM, Genoa processors
mad09: 192 cores, 768GB RAM, Genoa processors
cosma8-highram
The cosma8-highram queue uses the same nodes as the cosma8-shm queue, but is for DiRAC projects.
COSMA7 queues: cosma7, cosma7-rp, -pauper and -prince
Access to the DiRAC COSMA7 machine is provided using the two queues cosma7 and cosma7-rp (and their increased or reduced priority versions, cosma7-pauper, cosma7-prince, cosma7-rp-pauper and cosma7-rp-prince). These two queues provide access to half of COSMA7, with 224 nodes in each. cosma7 uses a 100Gbit/s InfiniBand fabric, while cosma7-rp uses a 100Gbit/s Rockport Ethernet fabric (6D torus topology).
All the queues are configured so that exclusive access to nodes is enforced for every job. This means that no jobs share a compute node. Therefore, if you only need a single core, your project allocation will still be charged for using 28 cores.
These queues are mainly for DiRAC projects that have been assigned time at Durham. The jobs best matched to their capabilities are MPI, OpenMP or hybrid jobs using up to 28 cores per node, with a maximum memory of 512GB per node. Jobs that can run using fewer resources than a single node should be packaged into a batch job with appropriate internal process control so that together they fill a node.
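To make the charging rule concrete, the sketch below computes what an exclusive-node job would cost. The function name is hypothetical, and expressing the charge in core-hours is an assumption; the text above only states that all 28 cores of a node are charged.

```shell
# Hypothetical helper: cost of a job on cosma7 or cosma7-rp, where whole
# nodes are charged regardless of how many cores the job actually uses.
# Core-hours as the accounting unit is an assumption.
charged_core_hours() {
    local nodes=$1 hours=$2
    local cores_per_node=28
    echo $(( nodes * cores_per_node * hours ))
}

# A single-core job that holds one node for 10 hours:
charged_core_hours 1 10    # prints 280
```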
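One simple form of the internal process control mentioned above is to start one background task per core and wait for all of them to finish. The sketch below assumes 28 independent serial tasks; run_task is a hypothetical stand-in for the real program, and the project code and run time are placeholders.

```shell
#!/bin/bash
#SBATCH -p cosma7
#SBATCH --nodes=1
#SBATCH --time=02:00:00
# Placeholder project code; replace with your own.
#SBATCH -A PROJECT

# Stand-in for one serial unit of work; replace the body with your program.
run_task() {
    echo "task $1 finished"
}

# Start one background task per core on the 28-core node, each writing to
# its own log file, then block until every task has completed.
ntasks=28
for i in $(seq 1 "$ntasks"); do
    run_task "$i" > "task_${i}.log" &
done
wait
```

The job then occupies the node only for as long as its slowest task, rather than each task being charged for a full node on its own.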
In addition to the hardware limits the queues have the following limits and priorities:
| Name | Priority | Maximum run time | Maximum cores |
|---|---|---|---|
| cosma7 and cosma7-rp | Normal | 72 hours | unlimited |
| -pauper | Low | 24 hours | unlimited |
| -prince | Highest | 30 days | 4096 |

The pauper and prince variants of the queues share the same resources, so the order in which jobs run is decided by a number of factors. Jobs in higher priority queues will always run before lower priority jobs. It may not always appear that way, however, because jobs from lower priority queues may run as back-fills: this is allowed when a lower priority job will complete before the resources needed for a higher priority one become available. Setting a run-time limit for your job may therefore get it completed more quickly. See the FAQ for how to make use of back-filling.
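As a sketch of how a run-time limit is set, the batch script below requests 6 hours instead of leaving the job at the 72-hour queue maximum. The node count, the time value and the project code PROJECT are placeholders chosen for illustration.

```shell
#!/bin/bash
#SBATCH -p cosma7
#SBATCH --nodes=2
# Request only the time the job realistically needs (6 hours here), so the
# scheduler can consider it for a back-fill slot.
#SBATCH --time=06:00:00
# Placeholder project code; replace with your own.
#SBATCH -A PROJECT

# Replace with your own workload.
echo "Job started at $(date)"
```

A job that overruns its requested time is terminated, so the limit should include a sensible margin.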
COSMA6 queues: cosma6, cosma6-pauper and cosma6-prince
COSMA6 was retired in April 2023 after 11 years of service.
COSMA5 queues: cosma5, cosma-test
Access to the Durham COSMA5 machine is provided using the cosma5 queue. This queue is comprised of the new (2024, 2025) COSMA5 nodes, a total of 8 nodes each with 256 cores and 1.5TB RAM. This partition is non-exclusive, i.e. jobs will share a node with other jobs unless exclusive access is explicitly requested. You are therefore advised to explicitly request the amount of RAM your job will use, otherwise a default of around 6GB per core will be allocated.
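The default of around 6GB per core corresponds to 1.5TB shared across 256 cores. A minimal sketch of an explicit memory request is shown below; the task count, the 8G figure and the run time are examples only.

```shell
#!/bin/bash
#SBATCH -p cosma5
#SBATCH --ntasks=32
# Ask for memory explicitly rather than relying on the default of roughly
# 6GB per core. 8G per core is only an example value.
#SBATCH --mem-per-cpu=8G
# To avoid sharing the node altogether, enable this line instead (by
# deleting one leading '#'):
##SBATCH --exclusive
#SBATCH --time=04:00:00

# Replace with your own workload.
echo "Tasks requested: ${SLURM_NTASKS:-not running under Slurm}"
```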
The old COSMA5 nodes (16 cores and 128GB RAM per node) entered service in 2012 and were retired in 2025. There were originally 420 nodes, with 6720 cores in total.
This queue is mainly for Durham projects that have been assigned time at Durham. The jobs best matched to its capabilities are MPI, OpenMP or hybrid jobs using up to 256 cores per node, with a maximum memory of 1.5TB per node.
In addition to the hardware limits the queues have the following limits and priorities:
| Name | Priority | Maximum run time | Maximum cores | Cores per node | Nodes | RAM per node |
|---|---|---|---|---|---|---|
| cosma5 | Normal | 72 hours | unlimited | 256 | 8 | 1.5TB |

The cosma5-test queue is for testing purposes only.
Higher priority jobs will run first, though lower priority jobs with shorter runtimes may run as back-fill jobs, making use of available nodes before a larger job is due to start. Setting a run-time limit for your job may get it completed more quickly. See the FAQ descriptions for how to make use of back-filling.
The quarterly allocation can be found in the COSMA usage pages; you’ll need your COSMA username and password to see these. However, COSMA5 is no longer listed there as no DiRAC allocations are awarded on it.
Jobs within the same queue are scheduled using a fairshare arrangement so each user initially has the same priority. This is then weighted using a resources used formula.
Note that the order of running can again be affected by back-filling (but that will only work if a job is given a run time) and using fewer resources than other jobs.
cosma8-ska, cosma8-ska2 queues
The cosma8-ska and cosma8-ska2 queues are for SKA projects.
cosma8-ska is a single node (mad07) with 4TB RAM and 128 cores.
cosma8-ska2 has 4 high RAM nodes (mad01, mad03, mad04, mad05), with 3, 6, 4 and 4TB RAM respectively, and between 56 and 128 cores.
cosma8-draper and mi300x queues
The cosma8-draper and mi300x queues both contain the same single node, a system with 8x MI300X GPUs.
Direct access to an MI300A system (4x GPUs) is available via direct ssh to ga008 from a login node.
The cosma8-draper queue is named in honour of Peter Draper, who retired from COSMA technical support after many years in 2026.
cosma8-dine2, dine2
The cosma8-dine2 and dine2 partitions contain 8 nodes, each with 2TB RAM and 64 cores (Intel Sapphire Rapids). Each node also contains between 0 and 12 GPUs: this is a composable system where GPUs can be allocated to nodes upon request. There are a total of 8x A30 GPUs and 4x V100 GPUs.
cosma7-mad
A single node, mad01, with 56 cores and 3TB RAM.
cosma7-shm
Two nodes, mad01 and mad02, with 56 cores and 3TB RAM, and 112 cores and 1.5TB RAM, respectively.
cosma7-shm2
A single node, mad03, with 48 cores and 6TB RAM.
cosma-analyse
A partition for analysis of cosmological datasets, including mad01 (56 cores, 3TB), mad02 (112 cores, 1.5TB), mad03 (48 cores, 6TB).
bluefield1, dine
Both are 18-node partitions (previously 24 nodes), with each node having 32 cores and 512GB RAM. This includes gitlab-runners, and so can be used for CI/CD. The nodes are b[101-105,107-115,116-119].
gracehopper
A single Grace-Hopper node, gn003. The gn002 node is also Grace-Hopper, available for direct ssh from a login node for code testing, development, etc.
Other queues
There are a number of other queues which may be of relevance. If in doubt, please ask.