
Senior MLOps Engineer, Training & Inference Optimization

Multiverse Computing · Donostia-San Sebastián, PV, ES

Actively hiring · Posted 12 days ago

Role overview

We are seeking a Senior MLOps Engineer to steer the technical vision of our Training and Inference Optimization team. In this high-impact role, you will architect the infrastructure that powers our next-generation AI models. You will bridge the gap between systems programming and machine learning, optimizing large-scale LLM training via NVIDIA NeMo and building ultra-high-throughput serving systems using vLLM, TensorRT-LLM, and SGLang.

Your mission is to ensure our models are not only state-of-the-art but also production-hardened, cost-efficient, and performant at scale.

What you'll work on

  • Training Infrastructure: Architect and maintain scalable distributed training pipelines using NVIDIA NeMo/Nemotron/Megatron-Bridge. You will optimize GPU utilization, manage complex checkpointing strategies, and implement automated fault tolerance for long-running jobs.
  • Inference Orchestration: Lead the deployment of LLMs using vLLM, TensorRT-LLM, or SGLang. You will implement and tune cutting-edge techniques, including PagedAttention, continuous batching, and advanced quantization (AWQ/FP8), to maximize throughput and minimize TPOT (time per output token).
  • Workload Orchestration: Utilize SLURM/Flyte/Ray/SkyPilot to manage and scale ML workloads across diverse cloud providers and on-prem clusters, ensuring seamless resource shifting and cost-effective execution.
  • Lifecycle Management: Standardize model tracking, versioning, and transition workflows using MLflow (or a similar tool), ensuring reproducible training runs and a clear path from research to production.
  • Performance Engineering: Conduct deep-dive profiling and bottleneck analysis across the full stack, from CUDA kernels and NCCL collective communications to Python-level orchestration.
  • Efficiency & Cost Governance: Monitor and optimize cloud and on-prem GPU expenditures through intelligent scaling policies and high-density resource packing.
  • Technical Leadership: Set the bar for engineering excellence. You will drive the roadmap, perform rigorous code reviews, and mentor junior and mid-level engineers.
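For candidates unfamiliar with the serving metrics named above: TPOT measures steady-state decode latency per token, while TTFT (time to first token) captures prefill latency. A minimal sketch of how these are computed from token arrival timestamps follows; the function and field names are illustrative, not taken from vLLM, TensorRT-LLM, or SGLang.

```python
# Illustrative helper for the serving metrics this role optimizes:
# TTFT (time to first token) and TPOT (time per output token).
# Names are hypothetical, not from any specific serving framework.

def serving_metrics(request_start: float, token_timestamps: list[float]) -> dict:
    """Compute TTFT and TPOT from a request's token arrival times (seconds)."""
    if not token_timestamps:
        raise ValueError("need at least one token")
    ttft = token_timestamps[0] - request_start
    n = len(token_timestamps)
    # TPOT averages the inter-token latency after the first token arrives.
    tpot = (token_timestamps[-1] - token_timestamps[0]) / (n - 1) if n > 1 else 0.0
    return {"ttft_s": ttft, "tpot_s": tpot, "tokens": n}

# Example: request issued at t=0.0, first token at 0.25 s,
# then one token every 50 ms for 10 tokens total.
m = serving_metrics(0.0, [0.25 + 0.05 * i for i in range(10)])
print(m["ttft_s"], round(m["tpot_s"], 3))  # prints: 0.25 0.05
```

In practice dashboards track percentiles of both TTFT and TPOT, since batch-level optimizations such as continuous batching can move the two metrics in opposite directions.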

What we're looking for

  • Active contributions to relevant open-source projects (vLLM, SGLang, SkyPilot, or NeMo).
  • Proven track record with model compression (Sparsity, Distillation, or Quantization).
  • Experience writing or optimizing custom Triton kernels.
  • Expertise in ML observability stacks (Prometheus, Grafana, Jaeger).

Tags & focus areas

Used for matching and alerts on DevFound
AI · MLOps