Machine Learning Engineer

Doghouse Recruitment

Actively hiring. Posted 2 months ago.

Role overview

AI/ML Solutions Architect – Distributed Training & GPU Infrastructure

Location:
Remote from anywhere in the U.S.

Salary:
Up to $230k base + bonus + RSUs, depending on seniority

Join a fast-moving AI infrastructure team working on the cutting edge of large-scale ML workloads. This role is ideal for engineers who enjoy solving deep technical challenges in distributed training, multi-GPU systems, and scalable AI inference infrastructure. You'll work directly with AI-focused clients, helping them get the most out of modern GPUs (H100, B200, etc.) and ML frameworks like PyTorch and JAX.

What you'll work on

Work alongside world-class engineers building the infrastructure behind next-gen AI systems. As part of the customer solutions team, you'll:

Design and deploy high-performance ML pipelines across hundreds/thousands of GPUs
Guide customers in optimizing distributed training and inference setups
Deliver tech talks, contribute to whitepapers, and gather feedback for product teams
Work cross-functionally with engineering, product, and R&D to shape our AI platform

What we're looking for

5+ years in ML infrastructure, MLOps, or similar roles
Deep experience with PyTorch or TensorFlow and multi-node training
Strong understanding of ML pipeline design, performance tuning, and deployment
Hands-on experience with Kubernetes, Slurm, Terraform, Git, and Docker
Programming in Python (Go, Java, or C++ a plus)

We’re looking for hands-on engineers who understand real-world ML problems and love building scalable, robust systems. If you thrive at the intersection of infrastructure and AI, this is your next move.

Tags & focus areas

Full-time, Remote, Machine Learning, PyTorch
