Role overview
- Industry experience as a Data Engineer, Machine Learning Engineer, or Data Scientist, working with data infrastructure, distributed systems, and fault-tolerant data pipelines.
- Experience deploying models and infrastructure on Kubernetes.
- Experience with infrastructure tools for provisioning, deployment, and monitoring, such as Terraform, AWS, Docker, and Datadog.
- Experience with heterogeneous data sources and data models, including MongoDB, PostgreSQL, Redis, and Neo4j.
- Ownership of problems end-to-end, with a willingness to pick up whatever context is needed.
- Comfort working as part of a fast-moving team, where perfectionism can sometimes be at odds with pragmatism.
- A desire to dig into problems across the stack, whether that means chasing networking issues, performance bottlenecks, or memory leaks, or simply reading unfamiliar code to figure out where issues might lurk.
- A strong belief that high-quality data is crucial to producing state-of-the-art machine learning systems.
What you'll work on
- Design data pipelines that handle large-scale data ingestion, including how this data is stored and processed, with robust support for filtering, pre-processing, and versioning (a minimal sketch of one such step follows this list).
- Build out data infrastructure to train large neural networks using self-supervised and contrastive learning.
- Build and refine custom data labeling services that directly influence the quality of our iris recognition engine.
- Work closely with internal stakeholders to accommodate their data usage needs.
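To make the first bullet concrete, here is a minimal, self-contained Python sketch of what a single ingestion step with filtering, pre-processing, and content-addressed versioning might look like. Everything in it (the `IngestRecord` type, the `ingest_batch` helper, the size threshold, the hash-based version key) is a hypothetical illustration for this posting, not a description of the actual stack.

```python
"""Hypothetical ingestion step: filter, pre-process, and version records.

All names and thresholds here are illustrative assumptions, not part of
any specific pipeline described in the role.
"""
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class IngestRecord:
    source_id: str
    payload: dict


def passes_filters(record: IngestRecord) -> bool:
    # Hypothetical quality gate: drop empty or oversized payloads.
    raw = json.dumps(record.payload)
    return 0 < len(raw) <= 1_000_000


def preprocess(record: IngestRecord) -> IngestRecord:
    # Example normalization step: lowercase all string fields.
    cleaned = {
        k: v.lower() if isinstance(v, str) else v
        for k, v in record.payload.items()
    }
    return IngestRecord(record.source_id, cleaned)


def version_key(record: IngestRecord) -> str:
    # Content-addressed version: identical payloads share a key,
    # so re-ingesting unchanged data is a no-op.
    blob = json.dumps(asdict(record), sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]


def ingest_batch(records: list[IngestRecord], out_dir: Path) -> int:
    """Filter, pre-process, and write each record once per content version."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for rec in records:
        if not passes_filters(rec):
            continue
        rec = preprocess(rec)
        path = out_dir / f"{rec.source_id}-{version_key(rec)}.json"
        if not path.exists():  # skip content already versioned
            path.write_text(json.dumps(asdict(rec), sort_keys=True))
            written += 1
    return written


if __name__ == "__main__":
    batch = [IngestRecord("cam-01", {"label": "IRIS", "score": 0.97})]
    print(ingest_batch(batch, Path("/tmp/ingest-demo")))
```

Deriving the version key from the record's content is one common way to make re-ingestion idempotent: an unchanged payload maps to the same key and is skipped, while any edit produces a new versioned file.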
Tags & focus areas
Machine Learning, Infrastructure, Kubernetes, Docker, Terraform, AWS, Remote, Engineer