HPC Hardware Lab @Durham

The HPC Hardware Laboratory @Durham is used for prototyping and developing new systems and technologies. It is hosted within, yet independent of, the COSMA HPC facility, and receives funding from Durham University, industry partners, UKRI (e.g. through ExCALIBUR) and DiRAC. Many of the systems under test are available for benchmarking. Details of available nodes and GPUs can be found here.

To request access to any of the systems within the hardware lab, please create an account and request to join the corresponding project code. Please contact cosma-support@durham.ac.uk when you request an account so that it can be set up correctly.

The Hardware Lab features the following systems and technologies:
  • DINE and DINE2

  • GPU compute nodes

  • Composable Infrastructure

  • Rockport Network Fabric

  • Quantum Annealing (DWAVE)

  • CPU Compute Lab

  • Storage Lab

  • Solar Power

  • Immersion Cooling

  • Heat Storage

There are a number of other hardware lab facilities around the UK. If you are looking for specific hardware which is not on this list, a good starting point to search for available facilities is the collection maintained by the SHAREing project.

DINE and DINE2

DINE and DINE2 are experimental test clusters for exploring new hardware and networking technologies.

DINE

DINE is equipped with 24 nodes, each containing an NVIDIA BlueField-2 Data Processing Unit (DPU) with HDR200 InfiniBand connectivity.
It was previously equipped with BlueField-1 and Rockport network cards. To use this facility, join the do009 project and submit jobs to the bluefield1 Slurm partition.
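
As an illustration of the submission step above, the following Python sketch builds the `sbatch` command for a DINE job. It assumes that the do009 project code is passed to Slurm with `--account`; the script name `my_job.sh` and the default time limit are hypothetical placeholders. The sketch only prints the command, so it can be checked before being run on a login node.

```python
import shlex
from typing import List


def dine_sbatch_command(script: str, nodes: int = 1, time: str = "00:30:00",
                        account: str = "do009",
                        partition: str = "bluefield1") -> List[str]:
    """Build an sbatch argument list for a job on the DINE cluster.

    Assumption: the project code is given to Slurm as the account.
    """
    if not 1 <= nodes <= 24:
        # DINE has 24 nodes, so larger requests can never be satisfied.
        raise ValueError("DINE has 24 nodes; request between 1 and 24")
    return [
        "sbatch",
        f"--partition={partition}",
        f"--account={account}",
        f"--nodes={nodes}",
        f"--time={time}",
        script,  # hypothetical job script name supplied by the caller
    ]


if __name__ == "__main__":
    cmd = dine_sbatch_command("my_job.sh", nodes=2)
    # Print a shell-ready command; on a login node it could be submitted
    # with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems if the command is later passed to `subprocess.run`.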

DINE2

The DINE2 cluster is an 8-node cluster equipped with a CerIO composable fabric, which allows GPUs to be attached to servers on demand. To use the DINE2 cluster as a Hardware Lab user, apply to join the do015 project.

GPU compute

We maintain multiple generations of GPU architecture from multiple vendors.

AMD GPU nodes

We have several generations of AMD Instinct MI-series GPUs:

  • MI100

  • MI210

  • MI300A

  • MI300X

There are two nodes, each with two AMD MI210 GPUs, available through the cosma8-shm2 partition. This partition also contains a node with one AMD MI100 GPU. To target a particular GPU type, select or avoid specific nodes with the Slurm --nodelist or --exclude options.
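
A minimal Python sketch of pinning a job to, or away from, particular nodes in the cosma8-shm2 partition. The node name `NODE_MI100` is a hypothetical placeholder; the real node names can be listed with `sinfo -N -p cosma8-shm2`. Note that `--nodelist` requires every listed node to be part of the allocation, so list a single node to target one GPU type.

```python
from typing import List, Optional, Sequence


def shm2_sbatch_command(script: str,
                        nodelist: Optional[Sequence[str]] = None,
                        exclude: Optional[Sequence[str]] = None,
                        partition: str = "cosma8-shm2") -> List[str]:
    """Build an sbatch argument list that includes or excludes named nodes."""
    if nodelist and exclude and set(nodelist) & set(exclude):
        raise ValueError("a node cannot be both requested and excluded")
    cmd = ["sbatch", f"--partition={partition}"]
    if nodelist:
        # --nodelist (-w): every listed node must be in the allocation
        cmd.append("--nodelist=" + ",".join(nodelist))
    if exclude:
        # --exclude (-x): never place the job on these nodes
        cmd.append("--exclude=" + ",".join(exclude))
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Hypothetical node name: replace with a real one from sinfo.
    print(shm2_sbatch_command("gpu_job.sh", exclude=["NODE_MI100"]))
    print(shm2_sbatch_command("gpu_job.sh", nodelist=["NODE_MI100"]))
```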

For the MI300 GPUs, either submit to the mi300x queue, or ssh directly from a login node to the ga008 node (MI300X).

NVIDIA GPU nodes

V100, A100 and H100 nodes are available, including H100 GPUs paired with Intel x86 CPUs and NVIDIA Grace Hopper nodes (Arm CPU).

Intel GPU nodes

A Ponte Vecchio GPU node was once available but is currently offline.

Tenstorrent Blackhole node (RISC-V)

A Tenstorrent Blackhole server containing four Blackhole cards is available for interactive use.

Tenstorrent cards are designed specifically for AI and machine-learning workloads, unlike GPUs, which have been adapted over time from their original purpose of graphics rendering. Currently, PyTorch, ONNX and TensorFlow models can be ported to the Tenstorrent architecture relatively easily, but software support is otherwise at an early stage of development.

Composable infrastructure

COSMA contains a composable GPU and RAM system, attached to the cosma8-shm queue. If you need to use these resources in a different configuration, please contact cosma-support.

COSMA also hosts an 8-node system based on a CerIO composable fabric, with eight NVIDIA A30 GPUs that can be allocated flexibly, so that a single node can be composed with up to eight GPUs.

Rockport network fabric

The Rockport network fabric is a 6D torus network with 100 Gbit/s connectivity, installed on half of COSMA7. To use it, please submit to the cosma7-rp Slurm partition.
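
To illustrate why a high-dimensional torus keeps paths short, the sketch below computes the minimum hop count between two nodes. Each dimension is a ring that wraps around, so the distance along one dimension is min(|x - y|, k - |x - y|), and the shortest path is the sum over all six dimensions. The 3x3x3x3x3x3 shape is a hypothetical example, not the actual layout of the COSMA7 Rockport fabric.

```python
from typing import Sequence


def torus_hops(a: Sequence[int], b: Sequence[int], dims: Sequence[int]) -> int:
    """Minimum hop count between coordinates a and b in a torus.

    Each dimension is a ring of size k, so the distance between positions
    x and y is min(|x - y|, k - |x - y|); the shortest path through the
    whole torus is the sum of these per-dimension distances.
    """
    if not (len(a) == len(b) == len(dims)):
        raise ValueError("coordinates and dims must have the same length")
    total = 0
    for x, y, k in zip(a, b, dims):
        if not (0 <= x < k and 0 <= y < k):
            raise ValueError("coordinate out of range for its dimension")
        d = abs(x - y)
        total += min(d, k - d)  # go the short way round the ring
    return total


# Hypothetical 6D shape with 3**6 = 729 positions (not the real layout).
DIMS = (3, 3, 3, 3, 3, 3)

if __name__ == "__main__":
    # Farthest pair in this 6D torus versus a single 729-node ring.
    print(torus_hops((0,) * 6, (2,) * 6, DIMS))
    print(torus_hops((0,), (364,), (729,)))
```

With this shape the farthest pair of nodes is only 6 hops apart, whereas a single ring of the same 729 nodes would need up to 364 hops.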

DWAVE quantum

The ExCALIBUR project funded access to a D-Wave quantum annealing system, administered by COSMA staff. This access has now expired, but if you are interested in using such a system, please contact cosma-support.

CPU compute lab

A wide variety of processor technologies is available for testing and benchmarking, and we aim to keep this hardware at the cutting edge.

Storage lab

Our storage laboratory includes prototype and production file systems of various types and technologies.

Solar

Our solar installation provides power directly usable by COSMA, allowing us to study the interplay between solar energy generation and HPC power usage.

Immersion cooling

We have a prototype immersion cooling tank and provide access to it for HPC technical support teams from across the UK, to lower the barrier to entry for this technology and to build experience in using it.

Heat storage

We are investigating inter-seasonal storage of waste data centre heat in abandoned, flooded coal mines beneath the data centre site, so that this heat can be used to warm buildings during the colder winter months.