Role overview
We are seeking a Senior MLOps Engineer to shape the technical direction of our Training and Inference Optimization team. In this high-impact role, you will architect the infrastructure that powers our next-generation AI models. You will bridge the gap between systems programming and machine learning, optimizing large-scale LLM training via NVIDIA NeMo and building ultra-high-throughput serving systems using vLLM, TensorRT-LLM, and SGLang.
Your mission is to ensure our models are not only state-of-the-art but also production-hardened, cost-efficient, and performant at scale.
What you'll work on
- Training Infrastructure: Architect and maintain scalable distributed training pipelines using NVIDIA NeMo/Nemotron/Megatron-Bridge. You will optimize GPU utilization, manage complex checkpointing strategies, and implement automated fault tolerance for long-running jobs.
- Inference Orchestration: Lead the deployment of LLMs using vLLM, TensorRT-LLM, or SGLang. You will implement and tune cutting-edge techniques, including PagedAttention, continuous batching, and advanced quantization (AWQ/FP8), to maximize throughput and minimize TPOT (Time Per Output Token).
- Workload Orchestration: Utilize SLURM/Flyte/Ray/SkyPilot to manage and scale ML workloads across diverse cloud providers and on-prem clusters, ensuring seamless resource shifting and cost-effective execution.
- Lifecycle Management: Standardize model tracking, versioning, and promotion workflows using MLflow (or a similar tool), ensuring reproducible training runs and a clear path from research to production.
- Performance Engineering: Conduct deep-dive profiling and bottleneck analysis across the full stack, from CUDA kernels and NCCL collective communications to Python-level orchestration.
- Efficiency & Cost Governance: Monitor and optimize cloud and on-prem GPU expenditures through intelligent scaling policies and high-density resource packing.
- Technical Leadership: Set the bar for engineering excellence. You will drive the roadmap, perform rigorous code reviews, and mentor junior and mid-level engineers.
What we're looking for
- Active contributions to relevant open-source projects (vLLM, SGLang, SkyPilot, or NeMo).
- Proven track record with model compression (sparsity, distillation, or quantization).
- Experience writing or optimizing custom Triton kernels.
- Expertise in ML observability stacks (Prometheus, Grafana, Jaeger).