Overview
We are seeking a skilled Site Reliability Engineer (SRE) to support the operations of a growing SME in the Defence sector. The successful contractor will collaborate with cross-functional teams to improve system reliability and performance, working on-site three days each week. The role centres on service reliability, with a strong emphasis on cloud technologies and modern development practices.
Responsibilities
- Implement reliability engineering practices to improve system performance.
- Collaborate with development teams to streamline CI/CD processes.
- Use configuration management tools to automate system deployment and management.
- Manage containerized applications using Docker, Kubernetes, and OpenShift.
- Monitor system performance using Prometheus and Grafana.
- Support infrastructure as code initiatives using Terraform.
- Administer Linux systems and write and maintain shell scripts.
- Work with AWS services, including EC2, RDS, and S3.
Requirements
- Active DV clearance, or eligibility to obtain it.
- Proficiency with configuration management tools such as Ansible or Chef.
- Experience working with Docker containers, Kubernetes, and OpenShift.
- Familiarity with Terraform for infrastructure management.
- Knowledge of CI/CD methodologies and tools, particularly Jenkins.
- Experience monitoring systems using Prometheus or Grafana.
- Strong understanding of SQL and Linux administration.
- Shell scripting experience.