Role overview

Please do NOT apply to this role through LinkedIn! ... US Only

Let’s face it, nobody cares about a puffed-up job description telling you how “cutting edge” the company is.

So let’s make this about you.

It’s about you being a builder... and currently not building anything that actually works under pressure.
It’s about you craving deeper technical challenges, but you’re too busy duct-taping brittle prompts to chase that dream.
It’s about you knowing there's a better way—one rooted in consistency, metrics, and clarity—but your current team thinks “talking nicely to the model” is enough.
It’s about you wanting ownership over cognitive systems that behave as predictably as real code.

…And it’s about you tackling the hard problems—like output reliability, schema enforcement, and evaluation pipelines—that actually move AI from magic to product.

Your Problems Solved!
This is not a "creative writing" gig. You’ll engineer deterministic behavior inside non-deterministic systems. Your mission is to architect prompts that yield consistent, structured outputs (primarily JSON), integrate directly into Intertru.ai’s API-driven stack, and hold up under real-world usage—every time.

By designing golden datasets, regression test suites, and evaluation metrics, you will define what "reliable AI" means for our platform. You’ll analyze performance across models like GPT-4o, Claude 3.5 Sonnet, and open-source alternatives to select the best engine for each job. Your systems-thinking mindset will help our platform move faster, scale smarter, and stay accurate as usage grows.

In the first 90 days, you will conquer the following:
30-Day Deliverables:

Audit all current production prompts and identify at least 3 key inconsistency risks across our core flows (accuracy, structure, or latency).
Create and document a standardized JSON schema format for all structured outputs and enforce it across at least 2 primary use cases.
Build a "Golden Dataset" of 100+ validated prompt input/output examples to serve as a foundation for regression testing.

60-Day Deliverables:

Launch automated regression testing for at least 3 production prompts, with a defined performance scoring system (accuracy, structure adherence, token cost).
Benchmark at least 2 LLMs (e.g., GPT-4o vs. Claude 3.5) for latency, cost, and output consistency on key use cases; present your findings and a recommendation.
Collaborate with engineering to integrate prompt versioning into Git or a prompt ops platform; document the rollback protocol and train the team on it.

90-Day Deliverables:

Design and deploy a full prompt evaluation suite (Eval) that runs automatically on every prompt update and validates outputs against the golden datasets.
Optimize at least 2 prompts for lower token usage while maintaining ≥95% accuracy on critical structured outputs.
Publish a 90-day prompt performance report tracking model behavior, format consistency, and improvement over time.

Soft Skills
You thrive in ambiguity, love systems thinking, and obsess over repeatability. Your communication is crisp and direct—because clarity builds trust across engineering and product teams. You’re the person who calls out vague requirements and replaces them with testable criteria. You have a scientific mindset, a relentless curiosity, and a deep intolerance for flakiness in AI behavior. You play well with builders but know how to lead when it’s time to ship.

Technical Skills
You bring hands-on experience with OpenAI, Anthropic, or open-source LLMs through their APIs, not just chat interfaces. You’ve deployed prompt chains that use advanced techniques like chain-of-thought (CoT), few-shot, and tree-of-thought (ToT) prompting to solve complex problems. You know how to enforce strict JSON/XML structures in responses and how to test them for integrity. You’re fluent in Python and can build eval harnesses using tools like LangChain, DSPy, or custom logic. You can benchmark models, dial in temperature, and debug prompt behavior under load.

What we're looking for

About Us
Intertru.ai is a method-driven interviewing accuracy platform that uses AI and the HireOS® methodology to ensure that your company hires the strongest people, every time. We released our MVP in November and have already secured paying customers. We are bootstrapped, pre-funded, and solving massive challenges—especially around customer success delivery and execution excellence.

We’re managing, but we’re also pushing limits every day.
What really makes us a place where you’ll thrive? Our Core Values. We live by them. We hire by them. We build by them. https://www.intertru.ai

Our core values guide our commitment to excellence:
You First
We prioritize and empower your growth. We care deeply about you and your business. Our actions demonstrate genuine empathy, understanding, and solutions that deliver measurable value to you. Also, we say please, thank you, and hold the door open for others.
Dig Deeper
We listen to understand, not to respond, and it is in our nature to ask the difficult questions that let us attack the root of the issue. We consistently hone our expertise using targeted data, making sure our solutions are proven in the real world with concrete results.
Commit Action
We set measurable goals, hold each other accountable, and take corrective action. Our approach is to initiate communication that uncovers issues before they become real problems. We operate from win-win and walk away from win-lose situations.
Re-Evolutionary
We are people who defy the shackles of conventional constraints. We reject “best practices,” “conventional wisdom,” or “status quo” perspectives that produce mediocre results. With a relentless spirit of rebellion, we celebrate failure and challenge one another to break boundaries and rewrite the rules of what real success means.

Here’s your Call To Action!
If you’ve read this far, you have grit—and we should meet. While a resume is important, we want to get to know you as a person.

Please email your resume to [email protected] and answer these three questions:

What motivated you to respond to this role?
What do you feel you are capable of achieving?
How can we best reach you for a conversation this week?

We only respond to people who invest the time in providing well-thought-out (non-AI) responses!

Seniority level

Not Applicable

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Technology, Information and Internet

Tags & focus areas

Founding AI Prompt Engineer