AI Engineer, AI System Calibration & Optimization

Robots & Pencils · Seattle, WA

Actively hiring · Posted 3 months ago

Role overview

Robots & Pencils is seeking an outcome-oriented AI Engineer to partner with a strategic client on a high-impact AI system calibration and optimization engagement. You'll embed directly with the client's AI and product engineering teams to improve the accuracy, reliability, and transparency of their Azure-hosted, fine-tuned GPT model through systematic prompt optimization and RAG calibration.

As an AI Engineer, you'll serve as a technical thought partner, actively coding and drawing on your software engineering experience to build calibration pipelines, optimize prompts with dedicated optimization frameworks, and establish repeatable improvement workflows. You'll work on-site with the client, driving measurable outcomes that maximize the performance of their AI system.

What you'll work on

  • Embed with a strategic client as their technical partner for AI system calibration and prompt optimization.
  • Build production-grade calibration systems using Python within the client's Azure environment.
  • Implement the DSPy framework and GEPA optimizer to systematically improve prompt quality and retrieval performance (see the sketch after this list).
  • Design and develop Golden Dataset curation workflows using Azure Data Labeling, establishing gold/silver data tier schemas.
  • Create evaluation frameworks to measure model accuracy, precision/recall, latency, and hallucination rates.
  • Architect prompt optimization pipelines for retrieval, context synthesis, and answer generation tailored to client needs.
  • Own the path to production: evaluation pipelines, Azure ML workflows, KPI dashboards, and optimization automation.
  • Iterate rapidly on client feedback and KPI results, translating business goals into technical calibration improvements.
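
To make the calibration loop concrete, below is a minimal sketch of a DSPy program compiled with the GEPA optimizer, as referenced above. The model string, example data, metric, and the exact GEPA constructor arguments are assumptions (they vary across DSPy releases); treat this as the shape of the workflow, not the client's actual pipeline.

    import dspy

    # Assumption: the client's Azure-hosted GPT deployment is reachable via a
    # LiteLLM-style model string; replace the placeholder with a real deployment.
    lm = dspy.LM("azure/<gpt-deployment-name>")
    dspy.configure(lm=lm)

    class AnswerWithContext(dspy.Signature):
        """Answer a question from retrieved context."""
        context: str = dspy.InputField()
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    program = dspy.ChainOfThought(AnswerWithContext)

    # Toy example standing in for the curated Golden Dataset.
    trainset = [
        dspy.Example(context="Widget A torque spec: 12 Nm.",
                     question="What is the torque spec for Widget A?",
                     answer="12 Nm").with_inputs("context", "question"),
    ]

    def answer_match(gold, pred, trace=None, pred_name=None, pred_trace=None):
        # Minimal correctness score; a production metric would also cover
        # precision/recall, latency, and hallucination checks.
        return float(gold.answer.strip().lower() in pred.answer.strip().lower())

    # Constructor arguments are an assumption based on recent DSPy releases;
    # consult the installed version's GEPA documentation before running.
    optimizer = dspy.GEPA(metric=answer_match, auto="light", reflection_lm=lm)
    optimized_program = optimizer.compile(program, trainset=trainset, valset=trainset)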

What we're looking for

  • Direct, hands-on experience with the DSPy framework and the GEPA optimizer.
  • Understanding of systematic optimization principles: evolutionary algorithms, Bayesian optimization, multi-objective optimization, and Pareto efficiency.
  • Familiarity with prompt optimization frameworks and methods; experience with any of MIPROv2, TextGrad, EvoPrompt, AutoPrompt, or reinforcement learning approaches (GRPO, PPO).
  • Experience with LLM-as-judge patterns and automated evaluation pipelines.
  • Knowledge of advanced RAG patterns (Adaptive RAG, Self-RAG, Corrective RAG) and retrieval evaluation methods such as MRR, NDCG, and precision@k (a small metric sketch follows this list).
  • Understanding of agentic AI patterns (ReAct, Chain-of-Thought, Tool Use) and their application in RAG systems.
  • Experience building evaluation dashboards with Azure Monitor, Application Insights, or similar observability tools.
  • Familiarity with MLOps practices: model versioning, experiment tracking, and metric logging for evaluation systems.
  • Experience with AWS or GCP AI/ML platforms (Bedrock, SageMaker, Vertex AI) and cross-cloud architecture patterns.
  • Experience with product catalog systems, cross-reference matching, or e-commerce search optimization.
  • Background in manufacturing, industrial equipment, or technical specification systems.
  • Prior consulting or professional services experience with enterprise clients.
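
Since several of the retrieval requirements above hinge on concrete metrics, here is a small, self-contained sketch of precision@k, reciprocal rank (the per-query term of MRR), and NDCG over a ranked result list. Document IDs and relevance labels are illustrative only.

    import math

    def precision_at_k(retrieved, relevant, k):
        """Fraction of the top-k retrieved document IDs that are relevant."""
        return sum(1 for doc in retrieved[:k] if doc in relevant) / k

    def reciprocal_rank(retrieved, relevant):
        """1 / rank of the first relevant document (0 if none appears)."""
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                return 1.0 / rank
        return 0.0

    def ndcg_at_k(retrieved, relevance_by_doc, k):
        """Normalized discounted cumulative gain over graded relevance labels."""
        gains = [relevance_by_doc.get(doc, 0.0) for doc in retrieved[:k]]
        dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
        ideal = sorted(relevance_by_doc.values(), reverse=True)[:k]
        idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
        return dcg / idcg if idcg > 0 else 0.0

    # Hypothetical query result: IDs in ranked order plus graded relevance labels.
    retrieved = ["d3", "d7", "d1", "d9"]
    labels = {"d1": 3.0, "d3": 1.0}
    relevant = {doc for doc, grade in labels.items() if grade > 0}

    print(precision_at_k(retrieved, relevant, k=3))   # 2 of top 3 are relevant
    print(reciprocal_rank(retrieved, relevant))       # first relevant doc at rank 1
    print(ndcg_at_k(retrieved, labels, k=3))

Averaging reciprocal_rank over a query set gives MRR; the same loop structure feeds the KPI dashboards described above.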

Tags & focus areas

Used for matching and alerts on DevFound
Full-time · Remote · AI · AI Engineer · Machine Learning · MLOps · Generative AI