**Role Overview**
**Member of Technical Staff, Machine Learning**
*San Francisco, CA (on-site, M-F)*
Our client is a cutting-edge AI startup in the Bay Area developing highly efficient foundation models for real-world deployment across devices. The team is rapidly growing, deeply technical, and focused on building top-tier large language model (LLM) architectures with practical impact.
As a Member of Technical Staff, you’ll drive innovation in large-scale model training, infrastructure, and optimization. You’ll collaborate closely with a small team of seasoned researchers and engineers, advancing state-of-the-art LLMs for efficient deployment at scale.
**Responsibilities:**
* Design, implement, and optimize large-scale pretraining and post-training pipelines for language models
* Tackle challenges in model parallelism, distributed training, and low-level hardware/software co-design
* Monitor, maintain, and troubleshoot large-scale training and inference workloads end-to-end
* Collaborate on advancing core model architectures, inference optimizations, and custom hardware design
* Contribute to open-source community initiatives and research publications
* Analyze and streamline data pipelines, instruction data curation, and evaluation methods
* Apply advanced optimization theory to improve model performance
**Qualifications:**
* Degree in Computer Science, Electrical Engineering, or related technical field (or equivalent practical experience)
* Hands-on experience in machine learning research centered on LLMs, efficient AI systems, or large-scale model training
* Strong proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
* Expertise with distributed training, parallelization strategies, and large-scale computational infrastructure
* Understanding of low-level GPU optimizations, CUDA, or similar technologies
**Preferred Skills:**
* Previous work at leading research labs or high-impact contributions to community AI projects
* Experience with custom hardware, FPGA/ASIC design, or maximizing training throughput
* Familiarity with open-source inference engines (e.g., llama.cpp, vLLM, Triton)
* Academic publications in optimization, LLM training, or AI infrastructure
* Prior work optimizing models for edge or device-level deployment