Role overview
The Role
You'll own the entire deployment pipeline and model serving infrastructure. This is a hybrid DevOps + MLOps role – you'll ensure our application deploys reliably AND that our AI models (both frontier and local) are served efficiently.
Our cost optimization strategy requires routing between expensive frontier models (Claude, GPT) and cost-effective local models (Llama, Mistral) based on task complexity. You'll build and own this infrastructure.
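To give a flavor of the kind of routing logic you'd own, here is a minimal sketch in Python. The model names, threshold, and length-based complexity heuristic are illustrative assumptions, not our actual implementation; in practice the complexity estimate and routing policy would be far more involved.

```python
# Minimal sketch of complexity-based model routing (illustrative only).
# Model names, the 0.5 threshold, and the heuristic are assumptions for this example.

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a real complexity score; here, just prompt length scaled to [0, 1]."""
    return min(len(prompt) / 2000, 1.0)

def route(prompt: str) -> str:
    """Send simple tasks to a cost-effective local model and complex tasks to a frontier API."""
    if estimate_complexity(prompt) < 0.5:
        return "local-llama"       # e.g. Llama served via vLLM/Ollama on our GPU instances
    return "frontier-claude"       # frontier API reserved for genuinely hard tasks

print(route("Summarize this sentence."))  # -> "local-llama"
```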
What you'll work on
DevOps
- CI/CD pipelines – Automated build, test, and deploy on every push
- Infrastructure as code – Terraform/Pulumi for reproducible environments
- Monitoring & alerting – Know when things break before customers do
- Incident response – Own uptime and reliability
- Daily deploys – Enable the team to ship to production every day safely
MLOps
- Model serving infrastructure – Deploy and serve LLMs (local and API-based)
- Model router – Build the abstraction layer that routes requests to appropriate models
- GPU infrastructure – Manage inference servers for local models (Llama, Mistral)
- Cost optimization – Track and optimize model usage costs (see the sketch after this list)
- Model versioning – Safe rollouts and rollbacks for prompt/model changes
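As a rough picture of the cost-tracking side, the sketch below accumulates per-request spend by model. The prices are placeholder values for illustration, not real provider pricing, and the model names are hypothetical; a production version would pull pricing from configuration and emit metrics to our observability stack.

```python
# Minimal sketch of per-request cost tracking. Prices and model names are illustrative.
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative USD prices per 1K tokens; real values would come from provider pricing/config.
PRICE_PER_1K_TOKENS = {
    "frontier-claude": {"input": 0.003, "output": 0.015},
    "local-llama": {"input": 0.0, "output": 0.0},  # amortized GPU cost tracked separately
}

@dataclass
class UsageTracker:
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Compute the cost of one request and add it to the running total for that model."""
        prices = PRICE_PER_1K_TOKENS[model]
        cost = (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]
        self.totals[model] += cost
        return cost

tracker = UsageTracker()
tracker.record("frontier-claude", input_tokens=1200, output_tokens=300)
print(dict(tracker.totals))
```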
Platform
- Developer experience – Make the team faster through better tooling
- Scaling – Prepare infrastructure for growth
Security (Critical)
- Infrastructure security – Server hardening, network security, firewall configuration, VPC design
- Secrets management – Vault, AWS Secrets Manager, or similar; no secrets in code
- Access control – IAM policies, least-privilege principles, SSO integration
- Vulnerability scanning – Automated scanning in CI/CD, dependency audits, container scanning
- Intrusion detection – CloudTrail, GuardDuty, or similar; alert on suspicious activity
- Encryption – Data at rest and in transit; key management
- Incident response – Work with fractional CISO to implement detection, containment, and recovery procedures
- Compliance – Support audits and maintain security documentation
Quality & Testing Infrastructure
- CI/CD quality gates – Automated tests run on every push; bad code doesn't deploy
- Test environment management – Staging environments that mirror production
- LLM output monitoring – Track hallucinations, wrong tool calls, and response quality in production
- Security scanning – Automated vulnerability scanning in CI pipeline
- Alerting & anomaly detection – Know when something breaks before customers do
Tech Stack
Current
- Cloud: AWS (EC2, RDS, S3, Lambda)
- Containers: Docker
- CI/CD: GitHub Actions
- Database: PostgreSQL (RDS)
- Caching: Redis
You'll Build
- Model serving: vLLM, Ollama, or similar for local inference
- GPU compute: AWS/GCP GPU instances or dedicated inference providers
- Model routing: Custom abstraction layer for model selection
- Observability: Datadog, Grafana, or similar for unified monitoring
What we're looking for
- MLOps experience – Model deployment, serving, monitoring
- GPU infrastructure – Managing inference workloads
- Experience with LLM serving (vLLM, TGI, Ollama)
- Kubernetes experience
- Cost optimization mindset
- Experience serving both frontier APIs and local models
- LangChain/LangSmith or similar LLM observability tooling
- Startup experience – comfort with ambiguity and speed
- Texas location