Site Reliability Engineer

Overview

As a Site Reliability Engineer, you will play a key role in ensuring the availability, performance, and resilience of critical infrastructure services. Working in a hybrid environment, you will collaborate with software engineers and system administrators to improve system observability, automate processes, and address reliability challenges. This position offers the chance to work on high-impact projects within a technically sophisticated team committed to delivering exceptional service.

Responsibilities

  • Partner with software engineers to improve system reliability and performance.
  • Automate processes in collaboration with system administrators to reduce manual tasks.
  • Develop and enhance monitoring, logging, and observability tools to proactively identify issues.
  • Support the ongoing improvement of development environments to meet delivery and quality standards.
  • Research and implement new tools and architectures to enhance scalability and resilience.
  • Build and maintain CI/CD pipelines to streamline deployment processes.
  • Expand expertise across both cloud-based and on-premises environments.

Requirements

  • Proven experience with Terraform and modern configuration management tools (e.g., Ansible, Chef).
  • Strong skills in Docker and a container orchestration platform (Kubernetes, OpenShift, or Docker Swarm).
  • Hands-on experience with CI/CD pipeline development (e.g., Jenkins).
  • Deep knowledge of monitoring and observability tools (e.g., Grafana, Prometheus, InfluxDB).
  • Solid understanding of Linux, network security, SQL, and AWS services (e.g., EC2, S3, RDS, Lambda).
  • Familiarity with message queueing systems (e.g., RabbitMQ).
  • Experience with Azure environments is a plus.
  • Strong programming ability in languages such as Python, Java, or Go is preferred.

Skills: Java, Python, SQL, AWS, Azure
Location: Hereford
Type: Hybrid
Rate: £500-£600/day
Source: LinkedIn
Posted: 06/11/25