Role overview

This is an advanced, high-impact role focused on the infrastructure that powers our ML systems. You are the architect responsible for the entire model lifecycle, from feature engineering to production monitoring. Your primary mission is to design and build scalable, automated systems, including feedback loops for continuous model improvement and point-in-time (PIT) correct feature pipelines.

You will be the lead authority on MLOps, building the monitoring systems that track data drift, model drift, and data quality drift in near real time.

The ideal candidate is an expert in PySpark and Databricks who thinks about ML as an end-to-end, automated system, not just a model.

What you'll work on

Design and implement core ML infrastructure, including automated feedback loops and scalable point-in-time (PIT) join logic for feature engineering.
Develop and own the complete model monitoring framework, creating automated alerts to detect data drift, model drift, and data quality drift.
Leverage MLflow (Tracking, Registry, and Deployment) to manage the entire model lifecycle, including the champion-challenger process.
Build and maintain scalable data processing and feature engineering pipelines using PySpark.
Implement and manage our Feature Store, ensuring proper feature versioning and consistency between training and serving.
Drive the hyperparameter tuning process at scale using libraries like Hyperopt.
Collaborate with data scientists to productionize, calibrate, and release new model versions.

What we're looking for

5+ years of experience as a Machine Learning Engineer, MLOps Engineer, or Data Engineer with a heavy ML focus.
Must-Have: Exceptional proficiency in PySpark for large-scale data analysis, aggregation, and data quality checks.
Must-Have: Proven expertise in designing and implementing complex ML infrastructure, such as feedback loops and point-in-time (PIT) joins.
Must-Have: Hands-on experience with Feature Stores and a strong understanding of feature versioning for production systems.
Must-Have: Deep expertise with MLflow, including Tracking, Registry, and Deployment (or similar tools such as Kubeflow or SageMaker).
Must-Have: Demonstrable experience with hyperparameter tuning libraries (e.g., Hyperopt, Ray Tune).
Must-Have: Strong practical understanding of model performance monitoring and techniques for detecting data drift, model drift, and data quality drift.
Must-Have: Deep knowledge of the Databricks platform.

Nice-to-Have: Experience with Spark Structured Streaming or other streaming data technologies.
Nice-to-Have: Familiarity with calibration techniques (e.g., Isotonic Regression).

Tags & focus areas


Full-time, Machine Learning, Data Science, MLOps, Data Engineer, AI

Machine Learning Engineer MLOps Infrastructure
