Role overview
About the Role
As an AI Engineer, you will design, build, and operate agentic AI systems end-to-end, from concept to production. You’ll work on multi-agent orchestration, Retrieval-Augmented Generation (RAG), evaluation frameworks, and AI guardrails to build safe, reliable, and high-performing systems.
You will collaborate cross-functionally with product, ML, and design teams—bringing ideas to life through strong engineering execution, clear communication, and a low-ego, problem-solving mindset.
What we're looking for
1. RAG Development & Optimization
- Design and implement Retrieval-Augmented Generation pipelines to ground LLMs in enterprise or domain-specific data.
- Make strategic decisions on chunking strategy, embedding models, and retrieval mechanisms to balance context precision, recall, and latency.
- Work with vector databases (Qdrant, Weaviate, pgvector, Pinecone) and embedding frameworks (OpenAI, Hugging Face, Instructor, etc.).
- Diagnose and iterate on challenges like chunk size trade-offs, retrieval quality, context window limits, and grounding accuracy, using structured evaluation and metrics (a minimal retrieval sketch follows this list).
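For a sense of the trade-offs above, here is a minimal, framework-agnostic sketch of the retrieval step: fixed-size chunking with overlap plus cosine-similarity top-k lookup over normalised embeddings. The `embed` function, chunk size, and overlap below are illustrative assumptions rather than a prescribed stack; swap in whichever embedding model and parameters fit the use case.

```python
from dataclasses import dataclass
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: plug in the chosen embedding model (OpenAI, Hugging Face, Instructor, ...)."""
    raise NotImplementedError

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap.

    Smaller chunks raise precision but lose surrounding context; larger chunks
    raise recall but dilute relevance and consume more of the context window.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

@dataclass
class Retriever:
    chunks: list[str]
    vectors: np.ndarray  # shape (n_chunks, dim), L2-normalised

    @classmethod
    def from_documents(cls, docs: list[str]) -> "Retriever":
        chunks = [c for doc in docs for c in chunk(doc)]
        vecs = embed(chunks)
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        return cls(chunks, vecs)

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        q = embed([query])[0]
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q  # cosine similarity against every chunk
        return [self.chunks[i] for i in np.argsort(-scores)[:k]]
```

In production the similarity search is usually delegated to a vector database (Qdrant, Weaviate, pgvector, Pinecone), but the precision/recall/latency trade-offs are the same ones this sketch exposes.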
2. Chatbot Quality & Evaluation Frameworks
- Establish comprehensive evaluation frameworks for LLM applications, combining quantitative metrics (BLEU, ROUGE, response time) with qualitative methods (human evaluation, LLM-as-a-judge scoring of relevance, coherence, and user satisfaction).
- Implement continuous monitoring and automated regression testing using tools like LangSmith, LangFuse, Arize, or custom evaluation harnesses (a simplified harness is sketched after this list).
- Identify and prevent quality degradation, hallucinations, or factual inconsistencies before production release.
- Collaborate with design and product to define success metrics and user feedback loops for ongoing improvement.
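As a rough illustration of a custom evaluation harness (LangSmith, LangFuse, and Arize provide richer, hosted versions of this), the sketch below runs a small regression set through the application, scores grounding with a cheap keyword check, and delegates relevance/coherence scoring to an LLM-as-a-judge callable. `run_app` and `llm_judge` are assumed stand-ins for the real system under test and the judge model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    must_mention: list[str]  # facts a grounded answer should contain

def keyword_recall(answer: str, must_mention: list[str]) -> float:
    """Cheap quantitative proxy for grounding: fraction of expected facts mentioned."""
    if not must_mention:
        return 1.0
    hits = sum(1 for fact in must_mention if fact.lower() in answer.lower())
    return hits / len(must_mention)

def run_regression(
    cases: list[EvalCase],
    run_app: Callable[[str], str],           # the chatbot / RAG app under test
    llm_judge: Callable[[str, str], float],  # (question, answer) -> 0..1 quality score
    min_recall: float = 0.8,
    min_judge: float = 0.7,
) -> list[dict]:
    """Gate a release: fail if any case drops below the quality bar."""
    results = []
    for case in cases:
        answer = run_app(case.question)
        recall = keyword_recall(answer, case.must_mention)
        judged = llm_judge(case.question, answer)
        results.append({
            "question": case.question,
            "keyword_recall": recall,
            "judge_score": judged,
            "passed": recall >= min_recall and judged >= min_judge,
        })
    failed = [r for r in results if not r["passed"]]
    assert not failed, f"{len(failed)} regression case(s) below the quality bar"
    return results
```

Wired into CI, a harness along these lines catches quality degradation, hallucinations, and factual regressions before a release ships.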
3. Guardrails, Safety & Responsible AI
- Implement multi-layered guardrails across input validation, output filtering, prompt engineering, re-ranking, and abstention (“I don’t know”) strategies.
- Use frameworks such as Guardrails AI, NeMo Guardrails, or Llama Guard to ensure compliance, safety, and brand integrity; a simplified, framework-free sketch of this layering follows this list.
- Build policy-driven safety systems for handling sensitive data, user content, and edge cases with clear escalation paths.
- Balance safety, user experience, and helpfulness, knowing when to block, rephrase, or gracefully decline responses.
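The snippet below is a deliberately simplified, framework-free sketch of that layering: PII redaction on input, a policy check on both input and output, and abstention when an answer is not grounded. The regex, blocked-topic list, and the `generate` / `grounded_in_context` callables are illustrative assumptions; Guardrails AI, NeMo Guardrails, and Llama Guard formalise and extend these stages.

```python
import re
from typing import Callable

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
BLOCKED_TOPICS = ("credit card number", "social security number")  # illustrative policy only

def redact_pii(text: str) -> str:
    """Input layer: strip obvious PII before it reaches the model or the logs."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def violates_policy(text: str) -> bool:
    """Input/output layer: flag requests or responses touching disallowed topics."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def guarded_answer(
    question: str,
    generate: Callable[[str], str],              # the underlying LLM call
    grounded_in_context: Callable[[str], bool],  # e.g. a retrieval-support check
) -> str:
    question = redact_pii(question)
    if violates_policy(question):
        return "I can't help with that request."                 # block
    answer = generate(question)
    if violates_policy(answer):
        return "I can't share that information."                 # output filter
    if not grounded_in_context(answer):
        return "I don't know based on the available documents."  # abstention
    return answer
```

The real engineering work is deciding, per product and per policy, which layer should block, which should rephrase, and which should gracefully decline.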
4. Multi-Agent Systems & Orchestration
- Design and operate multi-agent workflows using orchestration frameworks such as LangGraph, AutoGen, CrewAI, or Haystack (a framework-agnostic sketch follows this list).
- Coordinate routing logic, task delegation, and parallel vs. sequential agent execution to handle complex reasoning or multi-step tasks.
- Build observability and debugging tools for tracking agent interactions, performance, and cost optimization.
- Evaluate trade-offs around latency, reliability, and scalability in production-grade multi-agent environments.
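The sketch below shows routing and the parallel-versus-sequential trade-off in plain asyncio, independent of any framework; LangGraph, AutoGen, CrewAI, and Haystack provide production-grade versions of the same pattern, and the router/worker callables here are placeholders.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]  # an agent: async (task text) -> result text

async def run_parallel(task: str, agents: list[Agent]) -> list[str]:
    """Fan out independent sub-tasks; latency ~ slowest agent, cost = sum of all calls."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))

async def run_sequential(task: str, agents: list[Agent]) -> str:
    """Chain dependent steps; each agent refines the previous agent's output."""
    result = task
    for agent in agents:
        result = await agent(result)
    return result

async def orchestrate(task: str, router: Agent, workers: dict[str, Agent]) -> str:
    """Route a task to a specialist agent, with a trace hook for observability and cost tracking."""
    route = (await router(task)).strip().lower()
    worker = workers.get(route, workers["general"])
    print(f"[trace] routed task to '{route}'")  # replace with real tracing/metrics
    return await worker(task)
```

Whether steps run in parallel or sequentially is exactly the latency/reliability/cost trade-off called out above, and the trace hook is where observability and debugging tooling plugs in.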
- Strong proficiency in Python (FastAPI, Flask, asyncio); GCP experience is good to have.
- Demonstrated hands-on RAG implementation experience with specific tools, models, and evaluation metrics.
- Practical knowledge of agentic frameworks (LangGraph, LangChain) and evaluation ecosystems (LangFuse, LangSmith).
- Excellent communication skills, proven ability to collaborate cross-functionally, and a low-ego, ownership-driven work ethic.
- Experience in traditional AI/ML workflows — e.g., model training, feature engineering, and deployment of ML models (scikit-learn, TensorFlow, PyTorch).
- Familiarity with retrieval optimization, prompt tuning, and tool-use evaluation.
- Background in observability and performance profiling for large-scale AI systems.
- Understanding of security and privacy principles for AI systems (PII redaction, authentication/authorization, RBAC)
- Exposure to enterprise chatbot systems, LLMOps pipelines, and continuous model evaluation in production.