Overview
We are seeking a Senior Platform Engineer with expertise in Kubernetes and infrastructure as code to enhance our internal Agentic AI platform. This is a hybrid role. The successful candidate will build and maintain platform capabilities while ensuring operational health and security compliance. The role involves close collaboration with cross-functional teams to implement automated workflows and manage the platform's underlying infrastructure.
Responsibilities
- Build new platform capabilities from architectural designs.
- Maintain operational health of the platform, including monitoring and incident response.
- Design and enforce security policies across the platform.
- Establish and monitor the platform observability stack to ensure system reliability.
- Implement GitOps-based delivery automation so that all platform changes are version-controlled.
- Manage secrets rotation, certificate lifecycle, and identity configuration.
- Ensure the health of platform data services, including managing backup schedules and conducting failover tests.
- Participate in high-stakes operational procedures to keep systems running securely.
Requirements
- Deep experience with Kubernetes cluster operations, ideally with RKE2, EKS, or GKE.
- Proficiency in Helm, RBAC design, and managing multi-namespace workloads.
- Experience deploying and managing secrets at scale using platforms like HashiCorp Vault.
- Familiarity with policy-as-code tools such as OPA/Rego or Kyverno for admission control.
- Experience with GitOps tools like Fleet, ArgoCD, or Flux in production environments.
- Solid knowledge of Linux platform engineering, including TLS, PKI, and networking fundamentals.
- Experience deploying and operating API gateways such as Kong or Envoy.