Role Overview
As a core member of our AI engineering team, you will design, develop, and optimize cutting-edge AI models and workloads that run natively on our high-performance GPU clusters. You will leverage our state-of-the-art infrastructure to train, fine-tune, and serve massive-scale models with exceptional efficiency, and you will collaborate across infrastructure, product, and research teams to align hardware capabilities with real-world AI demands, driving breakthroughs in performance, scalability, and innovation.
What you'll work on
- Design, implement, and train state-of-the-art ML models for high-impact applications (e.g., NLP, Computer Vision, Network Optimization).
- Optimize AI workloads for extreme performance and scalability on large-scale GPU systems such as the NVIDIA GB200 NVL72, using tools such as Dynamo, vLLM, and advanced inference engines.
- Partner with cross-functional teams to co-design hardware-software solutions that maximize AI processing efficiency.
- Build robust tools, data pipelines, evaluation frameworks, and deployment systems.
- Track and incorporate the latest AI research and technological advancements.
- Contribute to product requirements documents (PRDs) and agile execution (sprint planning and delivery).
- Champion a culture of humility, bold innovation, and high-velocity product delivery.
What we're looking for
- Master's or PhD in Computer Science, AI/ML, or a related discipline.
- Experience with Large Language Models (LLMs), Generative AI, or Computer Vision.
- Familiarity with distributed training frameworks and techniques (e.g., Ray, DeepSpeed, Megatron-LM).
- Proven expertise in optimizing models for GPU inference (e.g., TensorRT, Triton Inference Server).
- Knowledge of MLOps tools and practices (Kubeflow, MLflow, etc.).