Role overview
Job description: ML Engineer
Location: Bangalore
Work: Onsite Monday to Thursday, remote on Friday
Contract to Hire: 4-month contract, converting to full-time employment
Job Title
Machine Learning Engineer (Data Engineering Focus), Databricks, Retail Grocery
Overview
We are seeking a Machine Learning Engineer with strong Data Engineering skills to build and operationalize scalable data and ML solutions on Databricks running on Google Cloud Platform (GCP). This role focuses on developing end-to-end data pipelines, feature engineering, and production ML workflows that power critical retail grocery use cases such as demand forecasting, personalization, promotions, pricing, and inventory optimization.
You will work across data engineering, data science, and platform teams to deliver reliable, production-grade ML systems at scale.
Key Responsibilities
Data Engineering & Platform
- Design, build, and optimize batch and streaming data pipelines using Databricks (Spark / Structured Streaming).
- Develop robust ETL/ELT pipelines ingesting retail data (POS, transactions, customer, inventory, promotions, supplier data).
- Implement Delta Lake tables with best practices for performance, schema evolution, and data quality.
- Orchestrate pipelines using Databricks Workflows and/or Cloud Composer (Airflow).
- Ensure data reliability, observability, and cost efficiency across pipelines.
Machine Learning Engineering
- Build and productionize ML pipelines using Databricks MLflow, Databricks Feature Store, and Spark ML / Python ML frameworks.
- Collaborate with data scientists to convert experiments into scalable, reusable ML pipelines.
- Deploy and manage batch and real-time inference workflows within Databricks.
- Optimize model training and inference for performance and cost.
MLOps & Best Practices
- Implement ML lifecycle management using MLflow (experiment tracking, model registry, versioning).
- Enable CI/CD for data and ML pipelines using Git-based workflows.
- Monitor model performance, data drift, and pipeline health.
- Enforce best practices around testing, code quality, and reproducibility.
Retail Analytics & Collaboration
- Partner with business, analytics, and product teams to translate retail grocery use cases into data and ML solutions.
- Provide technical guidance on Spark optimization, data modeling, and ML architecture.
- Contribute to platform standards and reusable components.
Required Qualifications
- 3+ years of experience in Data Engineering and/or Machine Learning Engineering.
- Strong hands-on experience with Databricks:
  - Apache Spark (PySpark / Spark SQL)
  - Delta Lake
  - MLflow
- Strong proficiency in Python and SQL.
- Experience building production-grade data pipelines at scale.
- Solid understanding of ML concepts, feature engineering, and model evaluation.
- Experience deploying ML models in distributed environments.
Preferred
- Experience with Databricks on GCP.
- Familiarity with Cloud Composer (Airflow).
- Experience with BigQuery, Pub/Sub, or GCP storage services.
- Retail, grocery, e-commerce, or CPG domain experience.
- Experience with demand forecasting, recommendation systems, or pricing models.
- Exposure to real-time/streaming ML use cases.
What Success Looks Like
- ML solutions move from prototype to production efficiently.
- Data pipelines are scalable, reliable, and cost-optimized.
- Business teams rely on ML outputs for core retail decisions.
- The Databricks platform supports multiple ML use cases with minimal friction.
Tech Stack
Cloud & Platform:
GCP, Databricks
Data Processing & Storage:
Apache Spark (PySpark, Spark SQL), Delta Lake
Programming Languages:
Python, SQL
Machine Learning & MLOps:
MLflow, Databricks Feature Store, Spark ML
Orchestration & Streaming:
Databricks Workflows, Airflow (Cloud Composer), Pub/Sub
Analytics & Data Services:
BigQuery