HPC Engineer

Overview

We are looking for a highly skilled High-Performance Computing (HPC) Engineer to join our team for a 6-month contract with potential for extension. The successful candidate will work closely with the scientific community to enhance computational workflows and maintain robust HPC infrastructure, employing DevOps principles and Infrastructure-as-Code methodologies. This hybrid role allows for three days on-site and two days remote, starting on November 24th.

Responsibilities

Design, implement, and maintain secure and scalable HPC infrastructure using Infrastructure-as-Code tools such as Terraform.
Develop, deliver, and support advanced research computing services and applications.
Apply Site Reliability Engineering principles to ensure high availability and performance across HPC environments.
Troubleshoot and resolve complex technical challenges affecting platform and user workloads.

Requirements

10+ years of hands-on experience in designing and operating large-scale computing environments (HPC, HTC, or Big Compute).
Proven ability to drive innovation and integrate emerging technologies into HPC solutions.
Administration experience with cluster and workload management software (e.g., Slurm, LSF, Grid Engine).
Strong knowledge of Linux system administration, TCP/IP networking, and storage systems.
Experience managing parallel file systems (e.g., Weka, GPFS, Lustre).
Hands-on experience with private cloud platforms (e.g., OpenStack).
Proficiency with configuration management tools (e.g., Ansible, Salt, Puppet).
Strong scripting skills in Bash and Python for automation and systems management.

Skills	Bash, Python, Agile
Location	Greater London
Type	Hybrid
Source	LinkedIn
Posted	04/11/25