Role overview
As a core member of our AI engineering team, you will design, develop, and optimize cutting-edge AI models and workloads that run natively on our high-performance GPU clusters. You will leverage our state-of-the-art infrastructure to train, fine-tune, and serve massive-scale models with high efficiency, and collaborate across infrastructure, product, and research teams to align hardware capabilities with real-world AI demands, driving breakthroughs in performance, scalability, and innovation.
Responsibilities
Design, implement, and train state-of-the-art ML models for high-impact applications (e.g., NLP, Computer Vision, Network Optimization).
Optimize AI workloads for extreme performance and scalability on large-scale GPU systems like GB200 NVL72, using tools such as Dynamo, vLLM, and advanced inference engines.
Partner with cross-functional teams to co-design hardware-software solutions that maximize AI processing efficiency.
Build robust tools, data pipelines, evaluation frameworks, and deployment systems.
Track and incorporate the latest AI research and technological advancements.
Contribute to product requirements documents (PRDs) and agile execution (sprint planning and delivery).
Champion a culture of humility, bold innovation, and high-velocity product delivery.
What we're looking for
Bachelor's degree in Computer Science, Electrical Engineering, Mathematics, Statistics, or a related technical field.
3+ years of hands-on experience in machine learning, deep learning, and software engineering.
Proficiency in Python; experience with C/C++.
Strong working knowledge of major AI/ML frameworks (PyTorch, TensorFlow, JAX, or similar).
Solid foundation in data structures, algorithms, and software design principles.
Preferred qualifications
Master's or PhD in Computer Science, AI/ML, or a related discipline.
Experience with Large Language Models (LLMs), Generative AI, or Computer Vision.
Familiarity with distributed training frameworks and techniques (e.g., Ray, DeepSpeed, Megatron-LM).
Proven expertise optimizing models for GPU inference (e.g., TensorRT, Triton Inference Server).
Knowledge of MLOps tools and practices (Kubeflow, MLflow, etc.).
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Information Technology and Engineering
Industries
Technology, Information and Media, Software Development, and IT Services and IT Consulting