Role overview
The Role
You'll own the entire deployment pipeline and model serving infrastructure. This is a hybrid DevOps + MLOps role – you'll ensure our application deploys reliably AND that our AI models (both frontier and local) are served efficiently.
Our cost optimization strategy requires routing between expensive frontier models (Claude, GPT) and cost-effective local models (Llama, Mistral) based on task complexity. You'll build and own this infrastructure.
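To give a flavor of the kind of routing logic you'd own, here is a minimal sketch in Python. The model names, threshold, and length-based complexity heuristic are illustrative assumptions, not our actual implementation; in practice the complexity estimate and routing policy would be far more involved.

```python
# Minimal sketch of complexity-based model routing (illustrative only).
# Model names, the 0.5 threshold, and the heuristic are assumptions for this example.

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a real complexity score; here, just prompt length scaled to [0, 1]."""
    return min(len(prompt) / 2000, 1.0)

def route(prompt: str) -> str:
    """Send simple tasks to a cost-effective local model and complex tasks to a frontier API."""
    if estimate_complexity(prompt) < 0.5:
        return "local-llama"       # e.g. Llama served via vLLM/Ollama on our GPU instances
    return "frontier-claude"       # frontier API reserved for genuinely hard tasks

print(route("Summarize this sentence."))  # -> "local-llama"
```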
What you'll work on
DevOps
- CI/CD pipelines – Automated build, test, and deploy on every push
- Infrastructure as code – Terraform/Pulumi for reproducible environments
- Monitoring & alerting – Know when things break before customers do
- Incident response – Own uptime and reliability
- Daily deploys – Enable the team to ship to production every day safely
MLOps
- Model serving infrastructure – Deploy and serve LLMs (local and API-based)
- Model router – Build the abstraction layer that routes requests to appropriate models
- GPU infrastructure – Manage inference servers for local models (Llama, Mistral)
- Cost optimization – Track and optimize model usage costs (see the sketch after this list)
- Model versioning – Safe rollouts and rollbacks for prompt/model changes
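As a rough picture of the cost-tracking side, the sketch below accumulates per-request spend by model. The prices are placeholder values for illustration, not real provider pricing, and the model names are hypothetical; a production version would pull pricing from configuration and emit metrics to our observability stack.

```python
# Minimal sketch of per-request cost tracking. Prices and model names are illustrative.
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative USD prices per 1K tokens; real values would come from provider pricing/config.
PRICE_PER_1K_TOKENS = {
    "frontier-claude": {"input": 0.003, "output": 0.015},
    "local-llama": {"input": 0.0, "output": 0.0},  # amortized GPU cost tracked separately
}

@dataclass
class UsageTracker:
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Compute the cost of one request and add it to the running total for that model."""
        prices = PRICE_PER_1K_TOKENS[model]
        cost = (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]
        self.totals[model] += cost
        return cost

tracker = UsageTracker()
tracker.record("frontier-claude", input_tokens=1200, output_tokens=300)
print(dict(tracker.totals))
```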
Platform
- Developer experience – Make the team faster through better tooling
- Scaling – Prepare infrastructure for growth
Security (Critical)
- Infrastructure security – Server hardening, network security, firewall configuration, VPC design
- Secrets management – Vault, AWS Secrets Manager, or similar; no secrets in code
- Access control – IAM policies, least-privilege principles, SSO integration
- Vulnerability scanning – Automated scanning in CI/CD, dependency audits, container scanning
- Intrusion detection – CloudTrail, GuardDuty, or similar; alert on suspicious activity
- Encryption – Data at rest and in transit; key management
- Incident response – Work with fractional CISO to implement detection, containment, and recovery procedures
- Compliance – Support audits and maintain security documentation
Quality & Testing Infrastructure
- CI/CD quality gates – Automated tests run on every push; bad code doesn't deploy
- Test environment management – Staging environments that mirror production
- LLM output monitoring – Track hallucinations, wrong tool calls, and response quality in production
- Security scanning – Automated vulnerability scanning in CI pipeline
- Alerting & anomaly detection – Know when something breaks before customers do
Tech Stack
Current
- Cloud: AWS (EC2, RDS, S3, Lambda)
- Containers: Docker
- CI/CD: GitHub Actions
- Database: PostgreSQL (RDS)
- Caching: Redis
You'll Build
- Model serving: vLLM, Ollama, or similar for local inference
- GPU compute: AWS/GCP GPU instances or dedicated inference providers
- Model routing: Custom abstraction layer for model selection
- Observability: Datadog, Grafana, or similar for unified monitoring
What we're looking for
- MLOps experience – Model deployment, serving, monitoring
- GPU infrastructure – Managing inference workloads
- Experience with LLM serving (vLLM, TGI, Ollama)
- Kubernetes experience
- Cost optimization mindset
- Experience serving both frontier APIs and local models
- LangChain/LangSmith or similar LLM observability tooling
- Startup experience – comfort with ambiguity and speed
- Texas location