Overview
The Senior Kubernetes Platform Engineer will play a pivotal role in enhancing the Kubernetes-based platform infrastructure at a prominent higher education institution. Working within a close-knit engineering team, the contractor will focus on hands-on development and optimization of the platform while also mentoring junior engineers. The position offers an opportunity to directly shape the infrastructure that supports essential functions for students and researchers.
Responsibilities
- Design, build, and maintain the on-premises Kubernetes-based platform infrastructure.
- Assess technical debt and infrastructure gaps, then prioritize and sequence the resulting work sensibly.
- Operate and enhance the GitOps pipeline using Argo CD, Argo Workflows, and Helm-managed workloads.
- Manage the lifecycle of Talos nodes, including updates and machine configuration changes.
- Maintain and troubleshoot Cilium network configurations and the Ceph storage cluster.
- Manage the GPU infrastructure, including the NVIDIA device plugin and GPU Operator.
- Improve observability through Prometheus and Grafana, developing alert rules and dashboards.
- Provide technical mentorship to junior engineers and help upskill colleagues.
Requirements
- Proven experience with Kubernetes at the cluster level, including building and operating infrastructure.
- Strong understanding of Kubernetes networking, particularly the non-policy aspects of CNI.
- Extensive Linux OS-level experience, including systemd, journald, and process management.
- Proficiency in scripting and automation using Python, Go, or shell.
- Solid GitOps experience with tools like Argo CD or Flux, and a deep understanding of the GitOps model.
- Expertise in infrastructure as code with Terraform, Helm, or Ansible.
- Experience with observability tools like Prometheus and Grafana, including alert and dashboard design.
- Demonstrated ability to mentor junior engineers and produce clear documentation.