Role overview
Job Title: LLM Engineer
Job Type: Contract (W2 Only)
Contract Duration: ASAP through 12/31/2025 (with good potential for extension into 2026)
Work Location: San Jose, CA (hybrid; onsite 2 days per week)
Work Schedule/Hours: Monday–Friday, 8 hours per day, 40 hours per week (standard business hours)
Compensation: $85 to $95 per hour
Overview:
A leading Big Four consulting firm is seeking a highly skilled LLM Engineer to design, train, and optimize large language models that drive cutting-edge applications in generative AI and natural language understanding. This role offers the opportunity to work on advanced model development, scalable deployment systems, and innovative research alongside cross-functional product and engineering teams.
Responsibilities:
Model Development & Optimization
- Design, train, fine-tune, and evaluate large language models (LLMs) to ensure high performance, efficiency, and alignment with research or product goals.
- Optimize model architectures, tokenization strategies, and data pipelines to enhance throughput and model accuracy.
Systems Integration & Deployment
- Build and maintain scalable inference pipelines for production environments.
- Optimize serving infrastructure using techniques such as quantization, caching, pruning, and distillation.
- Integrate trained models into enterprise applications, APIs, or end-user products.
Research & Cross-Functional Collaboration
- Lead experimentation with new architectures, retrieval-augmented generation (RAG) frameworks, and prompt-engineering techniques.
- Collaborate closely with product managers, data scientists, and ML operations teams to translate research into production-grade solutions.
- Stay current with advancements in transformer architectures, fine-tuning methods, and LLM safety/alignment best practices.
Qualifications:
Required:
- High school diploma or GED required; Bachelor’s degree or higher preferred.
- 5+ years of experience in machine learning, NLP, or large-scale model development.
- Strong understanding of deep learning frameworks such as PyTorch or TensorFlow.
- Experience building, training, or fine-tuning large language models (e.g., GPT, LLaMA, PaLM, Falcon).
- Solid programming skills in Python, with experience in distributed training and cloud-based ML infrastructure (AWS, GCP, or Azure).
- Strong problem-solving and communication skills, with the ability to work cross-functionally in fast-paced environments.
Preferred:
- Experience with retrieval systems, vector databases, or RAG pipelines.
- Familiarity with model alignment, evaluation metrics, and responsible AI practices.