Site Reliability Engineer

Overview

Lead SRE / Observability Engineering Lead (Outside IR35 Contract / Remote)

  • Location: Bristol / London HQ – Largely Remote (Occasional Travel)
  • Day Rate: Outside IR35 – £650 to £750 p/d
  • Duration: 3-6 Months Initial – with intention to extend
  • Payment Terms: Monthly

Our client is a FTSE100 Wealth/Asset Management firm seeking to engage a Lead SRE Engineer (Observability SME) to support the implementation and instrumentation of their new Observability solution. This role will be critical in delivering against our Digital OKRs by embedding observability best practices, frameworks, and tooling across digital platforms and engineering teams.

Key Responsibilities:

  • Strategy & Roadmap: Define and drive the observability roadmap in alignment with business priorities and digital platform objectives. Champion observability-by-design standards across the Software Development Lifecycle (SDLC).
  • Reliability & Performance: Establish and manage SLIs, SLOs, and error budgets to track and improve system reliability. Support capacity and availability planning through real-time telemetry and predictive analytics.
  • Instrumentation & Runbooks: Design and implement observability runbooks covering metrics, logs, traces, synthetics, and customer journey monitoring. Set standards for instrumentation, dashboards, alerting, and enable teams to self-serve their system metrics and traces.
  • Implementation & Enablement: Assist digital engineering teams in implementing synthetic monitoring, health checks, observable CI/CD pipelines, distributed tracing, and cloud-native monitoring patterns. Partner with SRE, DevOps, and engineering teams to embed observability into digital platforms and services.
  • Collaboration & Culture: Promote a culture of data-driven decision-making and operational excellence. Drive adoption of observability best practices, standards, and governance across teams.

Technical Expertise

Core Experience:

  • 10+ years in engineering roles, with at least 5 years in SRE, Observability, or DevOps functions.
  • Proven track record implementing observability solutions in cloud-native environments (AWS, Azure, or GCP).
  • Hands-on proficiency with observability tools such as Datadog, Grafana, Prometheus, OpenTelemetry.
  • Strong knowledge of distributed systems, microservices, and container orchestration (Kubernetes, Docker).
  • Experience with automation and Infrastructure as Code (Terraform, Ansible) and CI/CD pipelines.
  • Familiarity with performance engineering, capacity planning, resilience testing, and telemetry-based insights.
  • Experience supporting cloud-native platforms and modern application architectures.
  • Proficiency in programming and scripting languages such as Python or Go.
  • Experience building and managing enterprise-grade observability solutions.
  • Strong understanding of secrets management, RBAC, audit logging, compliance, and secure infrastructure practices.

Responsibilities

  • Define and drive the observability roadmap aligned with business objectives.
  • Establish and manage SLIs, SLOs, and error budgets to enhance system reliability.
  • Design and implement observability runbooks for metrics, logs, traces, and monitoring.
  • Assist engineering teams in implementing various observability solutions and patterns.
  • Promote a culture of data-driven decision-making and operational excellence.
  • Drive adoption of observability best practices across teams.

Requirements

  • 10+ years of experience in engineering roles, with 5+ years in SRE, Observability, or DevOps.
  • Proven experience implementing observability solutions in cloud-native environments (AWS, Azure, GCP).
  • Hands-on expertise with observability tools such as Datadog, Grafana, and Prometheus.
  • Strong knowledge of microservices and container orchestration (Kubernetes, Docker).
  • Experience with Infrastructure as Code tools (Terraform, Ansible) and CI/CD pipelines.
  • Proficiency in programming/scripting languages such as Python or Go.
  • Familiarity with performance engineering and telemetry-based insights.
  • Understanding of secure infrastructure practices and compliance standards.

Skills: Python, AWS, GCP, Azure
Location: City Of Bristol
Type: On-site
Rate: £650-£750/day
Source: LinkedIn
Posted: 05/11/25