
Machine Learning Engineer

RecruitSeq · San Francisco, CA

Actively hiring · Posted 3 months ago

Role overview

Member of Technical Staff, Machine Learning

San Francisco, CA (On-Site M-F)

Our client is a cutting-edge AI startup in the Bay Area developing highly efficient foundation models for real-world deployment across devices. The team is rapidly growing, highly technical, and focused on building top-tier large language model (LLM) architectures with practical impact.

As a Member of Technical Staff, you'll drive innovation in large-scale model training, infrastructure, and optimization. You'll collaborate closely with a small team of seasoned researchers and engineers, advancing state-of-the-art LLMs for efficient deployment at scale.

Responsibilities:

  • Design, implement, and optimize large-scale pretraining and post-training pipelines for language models
  • Tackle challenges in model parallelism, distributed training, and low-level hardware/software co-design
  • Monitor, maintain, and troubleshoot massive training and inference workloads end-to-end
  • Collaborate on advancing core model architectures, inference optimizations, and custom hardware design
  • Contribute to open-source community initiatives and research publications
  • Analyze and streamline data pipelines, instruction data curation, and evaluation methods
  • Apply advanced optimization theory to improve model performance

Qualifications:

  • Degree in Computer Science, Electrical Engineering, or related technical field (or equivalent practical experience)
  • Hands-on experience in machine learning research centered on LLMs, efficient AI systems, or large-scale model training
  • Strong proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Expertise with distributed training, parallelization strategies, and large-scale computational infrastructure
  • Understanding of low-level GPU optimizations, CUDA, or similar technologies

Preferred Skills:

  • Previous work at leading research labs or high-impact contributions to community AI projects
  • Experience with custom hardware, FPGA/ASIC design, or maximizing training throughput
  • Familiarity with open-source inference engines (e.g., llama.cpp, vLLM, Triton)
  • Academic publications in optimization, LLM training, or AI infrastructure
  • Prior work optimizing models for edge or device-level deployment

Tags & focus areas

Full-time · AI · Machine Learning · Deep Learning · Generative AI · PyTorch · TensorFlow