Role overview

Greetings

We're seeking a hands-on MLOps Solution Architect to design and implement scalable, secure, and cost-effective ML platforms on AWS. You'll lead the end-to-end architecture for model training, CI/CD pipelines, deployment strategies, monitoring, and governance across teams of data scientists and engineers.

Location: Toronto, ON (hybrid, 3 days onsite per week)

Client: One of the largest banks in Canada

Duration: Long-term contract

What we're looking for

Architect MLOps frameworks using AWS SageMaker, EKS, ECR, CodePipeline, and Step Functions
Design pipelines for data prep, training, evaluation, registry, and automated deployment
Integrate MLflow or SageMaker Model Registry for model tracking and lifecycle management
Implement model serving strategies: batch, online, A/B, shadow, and canary rollouts
Set up monitoring with CloudWatch, Evidently AI, Prometheus, or WhyLabs
Establish governance: lineage, audit trails, model approvals, and access controls (IAM, KMS)
Drive standardization across MLOps templates and Infrastructure as Code (Terraform or CloudFormation)
Collaborate with Data Engineering and DevOps to align ML pipelines with enterprise architecture

Must-Have Skills

14+ years of experience in ML/AI platform design and data infrastructure
Deep expertise in AWS services:
Compute: EC2, EKS, Batch, Lambda
Storage: S3, Lake Formation, Glue Catalog
Pipeline: Step Functions, CodePipeline, Airflow
Training/Serving: SageMaker (Studio, Training, Model Registry, Endpoints)
Monitoring: CloudWatch, CloudTrail, Prometheus
Security: IAM, Secrets Manager, KMS, VPC
Proficient in Python and Infrastructure as Code (Terraform, CloudFormation)
Experience building and deploying models in production environments (CI/CD)
Familiar with data versioning (DVC, Delta Lake) and experiment tracking (MLflow)
Strong understanding of containerization (Docker) and Kubernetes-based serving (EKS)
Excellent communication and stakeholder management

Knowledge of generative AI and LLM deployment using Amazon Bedrock or custom endpoints
Familiarity with event-driven pipelines using SNS/SQS or Kinesis
Model performance optimization with GPU instances and autoscaling
Cost governance and monitoring for ML workloads
Experience in financial or regulated industries (governance, model risk)

Best regards,

Tags & focus areas

Solution Engineer, Architecture, AWS, Docker, Kubernetes, Python, Terraform, Airflow, MLflow