Role overview
- Not a junior ML role
- Not a pure research position
- Not a “prompt engineer” role
- Not a position limited to consuming existing platforms
What you'll work on
- Design, deploy, and operate production-grade generative AI platforms
- Serve and scale large language models using modern inference frameworks (e.g., vLLM, SGLang, TensorRT-LLM)
- Own platform reliability, including on-call responsibilities, incident response, and performance tuning
- Build and operate Kubernetes-based infrastructure for AI workloads
- Partner with application teams consuming the AI platform and support production use cases
- Evaluate and integrate developer productivity tools (Copilot agents, Cursor, etc.) responsibly
What we're looking for
- Experience serving LLMs at scale using raw or managed GPUs
- Familiarity with inference optimization and cost-performance tradeoffs
- Background in one of the following:
  - Large tech companies
  - Research labs or higher-ed research environments
  - Enterprise platforms with strict reliability requirements
- Daily, practical use of GenAI developer tools (Copilot, Cursor, Windsurf, etc.)
Tags & focus areas
Used for matching and alerts on DevFound: Contract, Remote, AI, Machine Learning, Generative AI