**Role Overview**
**Job Description:**
* Deploy and secure an on-premises AI infrastructure for hosting large language models (LLMs)
* Install, configure, and maintain AI model-serving frameworks on internal GPU-enabled servers
* Develop and maintain robust, scalable APIs that provide internal access to AI capabilities and integrate seamlessly with enterprise applications and data systems
* Collaborate on the implementation of a Retrieval-Augmented Generation (RAG) pipeline and AI agents to automate business workflows
**Requirements:**
* Bachelor's degree or higher in computer science, electrical/computer engineering, or a related field
* Minimum 4 years of experience in a systems engineering, DevOps, or MLOps role
* Proficiency in Linux server administration
* Strong working knowledge of GPU-accelerated compute environments
* Proficiency in Python for scripting, automation, and building AI/ML data pipelines
* Experience deploying LLMs or generative AI models in production environments
* Working knowledge of RAG architectures, including vector databases, embedding models, and retrieval strategies