Iris Software Inc.
AI

MLOps Solution Architect

Iris Software Inc. · Toronto, ON · $98k - $120k

Actively hiring · Posted 4 months ago

Role overview

We're seeking a hands-on MLOps Solution Architect to design and implement scalable, secure, and cost-effective ML platforms on AWS. You'll lead the end-to-end architecture for model training, CI/CD pipelines, deployment strategies, monitoring, and governance across teams of data scientists and engineers.

Location: Toronto, ON (Hybrid, 3 days onsite per week)

Client: One of the largest banks in Canada

Duration: Long-term contract

What we're looking for

  • Architect MLOps frameworks using AWS SageMaker, EKS, ECR, CodePipeline, and Step Functions
  • Design pipelines for data prep, training, evaluation, registry, and automated deployment
  • Integrate MLflow or SageMaker Model Registry for model tracking and lifecycle management
  • Implement model serving strategies: batch, online, A/B, shadow, and canary rollouts
  • Set up monitoring with CloudWatch, Evidently AI, Prometheus, or WhyLabs
  • Establish governance: lineage, audit trails, model approvals, and access controls (IAM, KMS)
  • Drive standardization across MLOps templates and Infrastructure as Code (Terraform or CloudFormation)
  • Collaborate with Data Engineering and DevOps to align ML pipelines with enterprise architecture

Must-Have Skills

  • 14+ years of experience in ML/AI platform design and data infrastructure
  • Deep expertise in AWS services:
      • Compute: EC2, EKS, Batch, Lambda
      • Storage: S3, Lake Formation, Glue Catalog
      • Pipeline: Step Functions, CodePipeline, Airflow
      • Training/Serving: SageMaker (Studio, Training, Model Registry, Endpoints)
      • Monitoring: CloudWatch, CloudTrail, Prometheus
      • Security: IAM, Secrets Manager, KMS, VPC
  • Proficient in Python and infrastructure scripting (Terraform, CloudFormation)
  • Experience building and deploying models in production environments (CI/CD)
  • Familiar with data versioning (DVC, Delta Lake) and experiment tracking (MLflow)
  • Strong understanding of containerization (Docker, EKS) and Kubernetes-based serving
  • Excellent communication and stakeholder management
  • Knowledge of Generative AI and LLM deployment using AWS Bedrock or custom endpoints
  • Familiarity with event-driven pipelines using SNS/SQS or Kinesis
  • Model performance optimization with GPU instances and autoscaling
  • Cost governance and monitoring for ML workloads
  • Experience in financial or regulated industries (governance, model risk)
