Overview
The MLOps Engineer will support a London-based client in scaling their machine learning capabilities by moving models from development into a stable production environment. The role involves close collaboration with Data Scientists to turn prototypes into reliable production workloads and to maintain the overall health of the machine learning pipeline. The contractor will implement CI/CD best practices and manage performance monitoring across machine learning services.
Responsibilities
- Build, improve, and maintain the end-to-end machine learning pipeline from data ingestion to deployment.
- Collaborate with Data Scientists to convert prototypes into production-ready workloads.
- Set up continuous integration and continuous deployment (CI/CD) pipelines for model training, testing, and deployment using Python-based tooling.
- Manage model performance monitoring, drift tracking, and retraining workflows.
- Provision infrastructure as code using Terraform or similar tools, preferably in Azure environments.
- Maintain containerised workloads with Docker and Kubernetes.
- Implement observability solutions including logging, alerting, and performance metrics for ML services.
- Troubleshoot production issues to enhance platform stability and performance.
Requirements
- Strong proficiency in Python for production environments.
- Experience with Azure for machine learning workloads, including compute, networking, and storage.
- Familiarity with MLOps tools such as MLflow, Azure ML, or Kubeflow.
- Solid background in continuous integration and deployment using GitHub Actions or Azure DevOps.
- Experience with containerisation and container orchestration using Docker and Kubernetes.
- Understanding of how to support Data Scientists throughout the model lifecycle.
- Experience with large-scale data or machine learning projects is beneficial.
- Knowledge of data engineering fundamentals is a plus.