Role overview
Backend & MLOps Engineer
We are seeking a versatile Backend & MLOps Engineer to join our Core AI Development Team. The team designs and builds tools that enable the firm to leverage Large Language Models (LLMs) in daily workflows, including enterprise Retrieval-Augmented Generation (RAG) systems and data ingestion pipelines. In this hybrid role, you will split your time between building robust backend systems in Python and managing the infrastructure, CI/CD pipelines, and observability platforms that keep our AI solutions running reliably at scale.
What you'll work on
- Design, implement, and maintain scalable build/release pipelines to support rapid development and deployment of ML-powered applications and APIs.
- Ensure the stability and scalability of our Kubernetes-based infrastructure supporting AI/LLM workloads.
- Contribute to the development of enterprise RAG solutions, data ingestion pipelines, and other AI-driven tools.
- Develop and manage metrics and monitoring systems in DataDog to track system and model performance and reliability.
- Automate workflows and continuously improve CI/CD processes for ML-driven services.
- Troubleshoot and resolve infrastructure-related issues, ensuring minimal downtime for AI systems.
- Implement best practices for security, reliability, and scalability in ML production environments.
What we're looking for
- Experience with cloud platforms (AWS, GCP, Azure) for infrastructure management and application deployment.
- Familiarity with AI/LLM-related workflows, tools, and interfaces, including model serving and prompt management.
- Experience with data ingestion pipelines or event-driven architectures.