Role overview

Job description: ML Engineer

Location: Bangalore

Work: Onsite Monday to Thursday, remote on Friday

Contract to Hire: 4-month contract, converting to full-time employment

Job Title

Machine Learning Engineer (Data Engineering Focus), Databricks, Retail Grocery

Overview

We are seeking a Machine Learning Engineer with strong Data Engineering skills to build and operationalize scalable data and ML solutions on Databricks running on Google Cloud Platform (GCP). This role focuses on developing end-to-end data pipelines, feature engineering, and production ML workflows that power critical retail grocery use cases such as demand forecasting, personalization, promotions, pricing, and inventory optimization.

You will work across data engineering, data science, and platform teams to deliver reliable, production-grade ML systems at scale.

Key Responsibilities

Data Engineering & Platform

Design, build, and optimize batch and streaming data pipelines using Databricks (Spark / Structured Streaming).
Develop robust ETL/ELT pipelines ingesting retail data (POS, transactions, customer, inventory, promotions, supplier data).
Implement Delta Lake tables with best practices for performance, schema evolution, and data quality.
Orchestrate pipelines using Databricks Workflows and/or Cloud Composer (Airflow).
Ensure data reliability, observability, and cost efficiency across pipelines.

Machine Learning Engineering

Build and productionize ML pipelines using Databricks MLflow, Databricks Feature Store, and Spark ML / Python ML frameworks.
Collaborate with data scientists to convert experiments into scalable, reusable ML pipelines.
Deploy and manage batch and real-time inference workflows within Databricks.
Optimize model training and inference for performance and cost.

MLOps & Best Practices

Implement ML lifecycle management using MLflow (experiment tracking, model registry, versioning).
Enable CI/CD for data and ML pipelines using Git-based workflows.
Monitor model performance, data drift, and pipeline health.
Enforce best practices around testing, code quality, and reproducibility.

Retail Analytics & Collaboration

Partner with business, analytics, and product teams to translate retail grocery use cases into data and ML solutions.
Provide technical guidance on Spark optimization, data modeling, and ML architecture.
Contribute to platform standards and reusable components.

Required Qualifications

3+ years of experience in Data Engineering and/or Machine Learning Engineering.
Strong hands-on experience with Databricks: Apache Spark (PySpark / Spark SQL), Delta Lake, and MLflow.
Strong proficiency in Python and SQL.
Experience building production-grade data pipelines at scale.
Solid understanding of ML concepts, feature engineering, and model evaluation.
Experience deploying ML models in distributed environments.

Preferred Qualifications

Experience with Databricks on GCP.
Familiarity with Cloud Composer (Airflow).
Experience with BigQuery, Pub/Sub, or GCP storage services.
Retail, grocery, e-commerce, or CPG domain experience.
Experience with demand forecasting, recommendation systems, or pricing models.
Exposure to real-time/streaming ML use cases.

What Success Looks Like

ML solutions move from prototype to production efficiently.
Data pipelines are scalable, reliable, and cost-optimized.
Business teams rely on ML outputs for core retail decisions.
The Databricks platform supports multiple ML use cases with minimal friction.

Tech Stack

Cloud & Platform:

GCP, Databricks

Data Processing & Storage:

Apache Spark (PySpark, Spark SQL), Delta Lake

Programming Languages:

Python, SQL

Machine Learning & MLOps:

MLflow, Databricks Feature Store, Spark ML

Orchestration & Streaming:

Databricks Workflows, Airflow (Cloud Composer), Pub/Sub

Analytics & Data Services:

BigQuery
