Role overview

At RubyLabs, we’re seeking a senior AI Engineer (Node.js / Next.js / TypeScript) to shape our AI infrastructure and drive production-ready LLM experiences. You’ll work in a modern stack, making data-driven decisions around model performance, reliability, and cost.

You’ll own advanced prompt systems, structured outputs, and complex LLM workflows built with LangChain or LlamaIndex. Observability, debugging, and evaluation are core to the role: you’ll use Langfuse and AI gateways such as OpenRouter to continuously improve model quality and operational efficiency. You’ll take full ownership of key AI features, from experimentation to live production.

What you'll work on

Advanced Prompt Engineering: Designing complex, dynamic prompt templates with conditional logic and efficiently reusing information and context within prompts to maximize generation quality and reasoning.
Structured Outputs & Schemas: Implementing structured-output mechanisms (JSON mode, function calling, Zod/JSON schemas) so that AI outputs are predictable and integrate cleanly into application logic.
Prompt Engineering & Evaluations: Building robust evaluation pipelines and using Langfuse to collect feedback and score the quality of responses in real time.
Tracing & Debugging: Performing deep debugging of complex LLM chains using Langfuse traces to identify bottlenecks and optimize for cost, latency, and context window usage.
AI A/B Testing: Running systematic experiments across different models via OpenRouter (e.g., comparing Claude 3.5 Sonnet vs. GPT-4o) and analyzing results based on quantitative metrics.
Data-Driven Decisions: Making deployment decisions for new prompts or models strictly based on quantitative benchmarks and trace data, rather than intuition.
Output Scoring & Analysis: Developing scoring systems to analyze the “Problem Solution” chain and identify root causes of hallucinations or logic errors using Langfuse analytics.
Model Performance & Fine-Tuning: Regularly re-evaluating model performance as new architectures emerge and performing fine-tuning when necessary to meet specific domain requirements.

What we're looking for

Fine-Tuning: Practical experience in fine-tuning models for specific domain tasks or JSON compliance.
RAG Architecture: Understanding how to build and optimize Retrieval-Augmented Generation systems, including indexing, retrieval, and re-ranking.
Python: Basic knowledge for working with data science scripts or AI evaluation libraries.

Interview process

After you submit your application, we conduct a thorough review, which typically takes 3 to 5 days but may occasionally take longer because of the volume of applications received. If we see a potential fit, we proceed with the following steps:

Recruiter Screening (40 minutes)
Technical Interview (60 minutes)
Final Interview (30 minutes)

Tags & focus areas

Full-time, Remote, AI, AI Engineer, Generative AI
