HPC Engineer

Overview

We are looking for a skilled High-Performance Computing (HPC) Engineer to join our team. This role involves collaborating with the scientific community to optimize computational workflows and provide robust HPC services and infrastructure. The ideal candidate will bring extensive technical knowledge and a commitment to delivering high-quality, automated solutions. You will be instrumental in developing and maintaining advanced computing platforms and in applying DevOps principles to them.

Responsibilities

  • Design, implement, and maintain secure and scalable HPC infrastructure using Infrastructure-as-Code tools like Terraform.
  • Develop, deliver, and support advanced research computing services and applications.
  • Apply Site Reliability Engineering principles to ensure high availability and reliability of HPC environments.
  • Troubleshoot and resolve complex technical issues impacting platform and user workloads.
  • Drive innovation by integrating emerging technologies into HPC solutions.
  • Manage and administer cluster and workload management software.
  • Utilize scripting skills in Bash and Python for automation and systems management.
  • Cultivate productive relationships with third-party suppliers.

Requirements

  • 10+ years of experience in designing, operating, or engineering large-scale computing environments.
  • Proven track record of integrating emerging technologies into HPC solutions.
  • Experience with cluster and workload management software (e.g., Slurm, LSF, Grid Engine).
  • Strong Linux system administration skills and understanding of TCP/IP networking.
  • Experience with managing parallel file systems (e.g., Weka, GPFS, Lustre).
  • Hands-on experience with private cloud platforms (e.g., OpenStack).
  • Proficiency in configuration management tools (e.g., Ansible, Salt, Puppet).
  • Experience in DevOps environments utilizing agile methodologies.

Skills: Bash, Python
Location: Greater London
Type: Hybrid
Source: LinkedIn
Posted: 03/11/25