FDM Group

AI/ML Platform Engineer

FDM Group · London, ENG, GB

Actively hiring · Posted 1 day ago

Role overview

We are a business and technology consultancy and one of the UK's leading graduate employers, recruiting the brightest talent to become the innovators of tomorrow. We have centres across Europe, North America and Asia-Pacific, and a global workforce of over 2,500 employees. FDM has grown strongly over the years, has firmly established itself as an award-winning employer, and is listed on the FTSE4Good Index.

What you'll work on

AI Platform & Infrastructure

  • Engineer and operate the integration layer between front-end applications and AI solutions via API management and compute provisioning.
  • Design, implement, and maintain scalable, resilient infrastructure for AI/ML workloads, including performance testing and capacity planning.
  • Evaluate capacity and load patterns to optimise reliability of services within a Microsoft Services Architecture.
  • Manage authentication, identity, and role-based access controls across the platform.
  • Add, remove, and configure service components using Infrastructure as Code (IaC), following networking best practices.

ML Pipelines & Model Operations

  • Create and manage end-to-end ML pipelines for training, scoring, and deployment, including logic for domain-specific AI system behaviours (for example in-season, off-season, pre-game, in-game, and post-game).
  • Support model development workflows including pre-training, fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG).
  • Analyse data to identify patterns, trends, and insights that inform model development and tuning.
  • Enable continuous experimentation and comparison against baseline models.

Monitoring, Reliability & Data Operations

  • Monitor incoming data to detect data drift; trigger model retraining and configure rollback procedures for disaster recovery.
  • Monitor operational and ML-related issues by comparing model inputs, exploring model-specific metrics, and managing alerts on ML/AI platform components.
  • Manage data pipelines for training and scoring workloads.
  • Design and implement resilience patterns to ensure high availability and fault tolerance.

Tags & focus areas

AI · Machine Learning · MLOps