HPC Engineer

Apply

Overview

We are looking for a highly skilled High-Performance Computing (HPC) Engineer to join our team for a 6-month contract with potential for extension. The successful candidate will work closely with the scientific community to enhance computational workflows and maintain robust HPC infrastructure, employing DevOps principles and Infrastructure-as-Code methodologies. This hybrid role allows for three days on-site and two days remote, starting on November 24th.

Responsibilities

  • Design, implement, and maintain secure and scalable HPC infrastructure using Infrastructure-as-Code tools such as Terraform.
  • Develop, deliver, and support advanced research computing services and applications.
  • Apply Site Reliability Engineering principles to ensure high availability and performance across HPC environments.
  • Troubleshoot and resolve complex technical challenges affecting platform and user workloads.

Requirements

  • 10+ years of hands-on experience in designing and operating large-scale computing environments (HPC, HTC, or Big Compute).
  • Proven ability to drive innovation and integrate emerging technologies into HPC solutions.
  • Administration experience with cluster and workload management software (e.g., Slurm, LSF, Grid Engine).
  • Strong knowledge of Linux system administration, TCP/IP networking, and storage systems.
  • Experience managing parallel file systems (e.g., Weka, GPFS, Lustre).
  • Hands-on experience with private cloud platforms (e.g., OpenStack).
  • Proficiency with configuration management tools (e.g., Ansible, Salt, Puppet).
  • Strong scripting skills in Bash and Python for automation and systems management.
SkillsBash, Python, Agile
LocationGreater London
TypeHybrid
SourceLinkedIn
Posted04/11/25