Role overview
The Opportunity
While 2024 was the year of AI experimentation, 2026 is the year of Agentic AI. Our client, a Financial Services leader in Fulton Market, is moving beyond simple chatbots to build autonomous AI agents capable of multi-step reasoning and enterprise-scale task execution.
We are looking for an AI Engineer who doesn't just "call an API" but understands how to build resilient, cost-effective, and secure production systems. You will be joining a high-priority squad tasked with automating core business logic using the latest LLM orchestration frameworks.
Key Responsibilities
- Agent Orchestration: Design and deploy multi-agent systems using frameworks like LangGraph, CrewAI, or PydanticAI to handle complex, non-linear workflows.
- Advanced RAG Pipelines: Implement and optimize Retrieval-Augmented Generation (RAG) using vector databases (Pinecone, Weaviate, or Milvus) and advanced reranking strategies.
- Model Optimization: Fine-tune open-source models (Llama 3/4, Mistral) using LoRA/QLoRA for domain-specific tasks while maintaining low latency.
- AI FinOps & Guardrails: Implement token-cost monitoring and security guardrails (Zero Trust integration) to ensure LLM outputs are safe, compliant, and within budget.
- Production Engineering: Containerize AI microservices using Docker/Kubernetes and set up CI/CD pipelines for model deployment and monitoring (MLOps).
Technical Requirements
- Python Mastery: 5+ years of Python development (including experience with Python 3.12+ features).
- LLM Experience: Proven track record of shipping LLM-powered applications to production (OpenAI, Claude, or local hosting).
- Data Architecture: Strong SQL skills and experience with unstructured data processing.
- Chicago Connection: Must be able to commute to the Chicago office 3 days a week to collaborate with the engineering leadership team.