Role overview
We are seeking an AI Engineer with hands-on experience with open-source language models, local inference, RAG systems, agent architectures, function/tool calling, MCP (Model Context Protocol), and end-to-end data pipelines.
This role requires someone who can:
● Understand business processes deeply,
● Communicate effectively with non-technical team members,
● Architect AI solutions that are stable, scalable, and actually useful in day-to-day operations.
This is a design → build → deploy → iterate role where your work will directly impact core business workflows.
What you'll work on
● Design, build, and deploy AI-powered tools and assistants that support clinical, staffing, scheduling, analytics, and administrative workflows.
● Work with open-source LLMs (LLaMA, Mistral, Gemma, etc.) and local inference runtimes (Ollama, vLLM, Text Generation Inference).
● Implement RAG pipelines using embeddings, vector databases (Chroma, Qdrant, Weaviate, pgvector), and retrieval heuristics tailored to business context.
● Build multi-tool / function-calling agents, including execution planning, state management, and iterative reasoning flows.
● Architect and integrate MCP-based agents with internal systems, CRMs, databases, analytics dashboards, and forms/workflows.
● Develop and maintain data pipelines for ingestion, cleaning, semantic indexing, embedding generation, storage, and scheduled refresh.
● Deploy models and pipelines across local and cloud environments, including AWS, containerized GPU servers, and macOS-based AI compute.
● Optimize inference performance through caching, batching, and request routing, and balance cost vs. latency trade-offs.
● Collaborate directly with non-technical staff to gather requirements and translate real operational needs into practical AI tools.
● Document workflows, maintain best practices, and train internal users on effective tool usage.
What we're looking for
● Familiarity with healthcare operations, scheduling, and staffing workflows.
● Knowledge of HIPAA and of security practices for handling PHI/PII.
● Experience with Apple Silicon GPU/ML workloads (e.g., Mac Studio-based compute clusters).