Doghouse Recruitment
AI

Machine Learning Engineer

Actively hiring Posted 2 months ago

Role overview

AI/ML Solutions Architect – Distributed Training & GPU Infrastructure

Location:
Remote from anywhere in the U.S.

Salary:
Up to $230k base + bonus + RSUs, depending on seniority

Join a fast-moving AI infrastructure team working on the cutting edge of large-scale ML workloads. This role is ideal for engineers who enjoy solving deep technical challenges in distributed training, multi-GPU systems, and scalable AI inference infrastructure. You'll work directly with AI-focused clients, helping them get the most out of modern GPUs (H100, B200, etc.) and ML frameworks like PyTorch and JAX.

What you'll work on

Work alongside world-class engineers building the infrastructure behind next-gen AI systems. As part of the customer solutions team, you'll:

  • Design and deploy high-performance ML pipelines across hundreds to thousands of GPUs
  • Guide customers in optimizing distributed training and inference setups
  • Deliver tech talks, contribute to whitepapers, and gather feedback for product teams
  • Work cross-functionally with engineering, product, and R&D to shape our AI platform

What we're looking for

  • 5+ years in ML infrastructure, MLOps, or similar roles
  • Deep experience with PyTorch or TensorFlow and multi-node training
  • Strong understanding of ML pipeline design, performance tuning, and deployment
  • Hands-on experience with Kubernetes, Slurm, Terraform, Git, and Docker
  • Programming in Python (Go, Java, or C++ a plus)

We’re looking for hands-on engineers who understand real-world ML problems and love building scalable, robust systems. If you thrive at the intersection of infrastructure and AI, this is your next move.

Tags & focus areas

Full-time · Remote · Machine Learning · PyTorch