Role overview
ABOUT OUR CLIENT
Our Client is a global digital transformation and technology solutions leader founded in 2009 and headquartered in the United States, with operations across North America, India, and the Philippines. They partner with more than 160 organizations worldwide — from innovative startups to Fortune 500 enterprises — delivering tailored solutions across software development, data analytics, AI, customer experience, and digital operations. Recognized for excellence and innovation, they’ve been honored with multiple international awards and certifications, including recognition from Deloitte for outstanding growth and performance in technology. With a team of over 2,000 professionals spanning engineering, AI/ML, digital marketing, and data science, Our Client empowers businesses through a holistic approach that integrates technology, process design, and analytics to drive measurable growth and operational excellence.
ABOUT THE ROLE
This is a high-impact contract role of approximately 20 hours per week for an engineer passionate about building real-time, multimodal AI systems. You’ll design and optimize pipelines that fuse speech, vision, and large language models into seamless, reactive systems. Your work will help bring AI to life, enabling experiences measured in milliseconds, not minutes.
What we're looking for
* Architect ultra-low-latency AI systems integrating speech-to-text, language models, text-to-speech, and computer vision
* Develop real-time streaming and inference pipelines using WebRTC, WebSockets, and gRPC
* Design and integrate conversational flows with grounding, emotional tone, and memory
* Deploy and optimize GPU workloads at scale using Docker, Kubernetes, and Triton
* Build hybrid agent architectures combining LLMs, vision models, and custom logic
* Train, fine-tune, and optimize AI models across speech, vision, and transformer domains
* Develop retrieval-augmented generation (RAG) pipelines and multi-agent orchestration
* Write clean, modular, production-grade code that ships fast and scales elegantly
* Collaborate cross-functionally to build living, interactive AI products
QUALIFICATIONS
* Expertise in speech AI, including streaming STT/TTS pipelines and latency tuning
* Experience integrating LLMs for conversational AI, prompt design, and guardrails
* Strong background in real-time engineering: WebRTC, sockets, gRPC, GPU streaming
* Proficiency in computer vision models and techniques such as YOLO, SAM, and object tracking
* Hands-on experience with AI orchestration tools such as LangChain, Langflow, or CrewAI
* Advanced skills in ML infrastructure (Docker, Kubernetes, cloud GPU optimization)
* Fluency in Python (PyTorch/TensorFlow), TypeScript/Node, FastAPI, and API design
* Strong systems-thinking mindset — able to design agents that act, not just respond
* Experience with model quantization, distillation, or Triton Inference Server
* Edge deployment expertise (Jetson, ARM, mobile models)
* Background in audio DSP, emotion recognition, or prosody modeling
* Experience building agent “personality engines” or affective AI systems
CONTRACT DETAILS
* Part-time contract: approximately 20 hours per week
* Duration: Ongoing
* Work arrangement: Remote
WHY THIS ROLE
This is not a maintenance role — it’s invent-the-product territory. You’ll have the chance to define the next interface of computing, helping to build AI that feels alive and responsive. If you’ve dreamed of creating real-time agents, multimodal copilots, or embodied intelligence, this is your opportunity to turn that vision into reality.