Overview
We are seeking a Senior Site Reliability Engineer (SRE) Team Lead to take on a key leadership role within a dynamic team. In this hands-on position, the contractor will provide technical guidance, mentor fellow engineers, and drive reliability and performance improvements across critical cloud infrastructure. The role involves collaborating with various teams to foster a culture of operational excellence, and it requires occasional visits to client offices in Central London.
Responsibilities
- Provide technical leadership for the Site Reliability Engineering function, focusing on reliability and performance improvements.
- Lead, mentor, and coach a team of SREs and engineers to promote operational excellence.
- Engage in hands-on design, implementation, and support of cloud infrastructure and automation initiatives.
- Define and implement service-level objectives (SLOs) and error budgets, aligning engineering priorities with business goals.
- Architect, maintain, and optimize distributed systems in an AWS cloud environment.
- Drive change management across infrastructure and operational processes.
- Champion Infrastructure as Code and automation practices to improve efficiency.
- Collaborate with development teams to embed reliability best practices throughout the software development lifecycle.
Requirements
- Proven experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles, with leadership experience.
- Hands-on technical expertise, combined with the ability to mentor other engineers.
- Extensive experience with AWS cloud services and architectures.
- Strong Linux/Unix systems administration skills.
- Proficiency in scripting or programming languages such as Python, Bash, Go, or Java.
- Experience with Infrastructure as Code tools, including Terraform and CloudFormation.
- Familiarity with containerization and orchestration technologies, including Docker and Kubernetes.
- Experience with observability and monitoring platforms such as Datadog and Splunk.