Job Summary
We are looking for an experienced Data Engineer to support AI Factory initiatives by building and maintaining data infrastructure for AI, automation, and GenAI applications.
The role will focus on designing data pipelines, preparing knowledge bases, managing vector databases, and enabling enterprise data sources for AI applications such as chatbots, AI assistants, and RAG-based solutions.
The ideal candidate will have strong experience in Python, SQL, Azure data services, ETL/ELT pipelines, data modelling, and AI-ready data preparation. Experience in public sector, higher education, research, or large enterprise environments is highly preferred.
Key Responsibilities
- Design, build, and maintain data pipelines to ingest, transform, and prepare data for AI applications
- Develop and maintain knowledge bases and vector databases for RAG systems
- Implement data quality checks, monitoring, and validation for AI data sources
- Build connectors and integrations with enterprise data platforms and systems
- Optimise data storage, retrieval, and performance for GenAI applications
- Support semantic search, embeddings, and retrieval workflows for LLM-based applications
- Document data architectures and data flows, and maintain data catalogues
- Collaborate with AI, application, and business teams to understand data requirements for chatbots and AI assistants
- Ensure data pipelines are scalable, reliable, and aligned with governance and compliance requirements
Required Qualifications and Skills
- Minimum 5 years of experience in data engineering
- Strong programming experience in Python
- Strong expertise in SQL
- Experience with relational and NoSQL databases
- Hands-on experience with Azure data services, such as:
  - Azure Data Factory
  - Azure Synapse
  - Azure Databricks
- Knowledge of ETL/ELT patterns and data pipeline orchestration
- Experience with data modelling and schema design
- Experience building and maintaining scalable data pipelines
- Ability to work with enterprise data sources and complex integration environments
- Strong documentation and communication skills
Preferred Qualifications
- Experience preparing data for LLM or GenAI applications
- Knowledge of Retrieval-Augmented Generation (RAG)
- Experience with chunking, embeddings, semantic search, and retrieval
- Experience with vector databases such as:
  - Pinecone
  - Weaviate
  - FAISS
  - Azure AI Search
  - Similar vector database platforms
- Experience with Azure OpenAI or similar GenAI platforms
- Familiarity with data governance, data privacy, and compliance requirements
- Experience with Google Cloud Platform data services such as:
  - BigQuery
  - Cloud Storage
  - Dataflow
  - Pub/Sub
- Experience in public sector, higher education, research, government, or multi-entity enterprise environments
Experience Requirement
Candidates must demonstrate recent experience, preferably within the last 18 months, in enterprise-scale data, digital, AI, or automation projects of comparable scope and complexity.
Experience in any of the following environments will be considered an advantage:
- Public sector organisations
- Government entities
- Higher education institutions
- Research organisations
- State-owned enterprises
- Large multi-entity organisations
Job Type: Permanent
Pay: Up to QAR 20,000 per month
Work Location: In person