**Role Overview**
**Member of Technical Staff, Machine Learning**
*San Francisco, CA (on-site, M-F)*
Our client is a cutting-edge AI startup in the Bay Area developing highly efficient foundation models for real-world deployment across devices. The team is rapidly growing, deeply technical, and focused on building top-tier large language model (LLM) architectures with practical impact.
As a Member of Technical Staff, you’ll drive innovation in large-scale model training, infrastructure, and optimization. You’ll collaborate closely with a small team of seasoned researchers and engineers, advancing state-of-the-art LLMs for efficient deployment at scale.
**Responsibilities:**
* Design, implement, and optimize large-scale pretraining and post-training pipelines for language models
* Tackle challenges in model parallelism, distributed training, and low-level hardware/software co-design
* Monitor, maintain, and troubleshoot large-scale training and inference workloads end-to-end
* Collaborate on advancing core model architectures, inference optimizations, and custom hardware design
* Contribute to open-source community initiatives and research publications
* Analyze and streamline data pipelines, instruction data curation, and evaluation methods
* Apply advanced optimization theory to improve model performance
**Qualifications:**
* Degree in Computer Science, Electrical Engineering, or related technical field (or equivalent practical experience)
* Hands-on experience in machine learning research centered on LLMs, efficient AI systems, or large-scale model training
* Strong proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
* Expertise with distributed training, parallelization strategies, and large-scale computational infrastructure
* Understanding of low-level GPU optimizations, CUDA, or similar technologies
**Preferred Skills:**
* Previous work at leading research labs or high-impact contributions to community AI projects
* Experience with custom hardware, FPGA/ASIC design, or maximizing training throughput
* Familiarity with open-source inference engines (e.g., llama.cpp, vLLM, Triton)
* Academic publications in optimization, LLM training, or AI infrastructure
* Prior work optimizing models for edge or device-level deployment