OpenAI
AI

Reliability Engineer

OpenAI · San Francisco, California, United States · $103k - $123k

Actively hiring · Posted almost 2 years ago

Role overview

You may be a good fit for this role if you:


  • Enjoy seeking out and addressing bottlenecks and areas for performance improvement in our systems.

  • Utilize Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management.

  • Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.

  • Have a track record of accelerating engineering reliability by empowering your fellow engineers with excellent tooling and systems.

  • Help create a diverse, equitable, and inclusive culture that makes everyone feel welcome while enabling radical candor and the challenging of groupthink.

  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.

What we're looking for


  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).

  • Proven experience as a reliability engineer or in a similar role at a fast-paced, rapidly scaling company.

  • Strong proficiency in cloud infrastructure.

  • Proficiency in programming/scripting languages.

  • Experience with containerization technologies and container orchestration platforms like Kubernetes.

  • Knowledge of IaC tools such as Terraform or CloudFormation.

  • Excellent problem-solving and troubleshooting skills.

  • Strong communication and collaboration skills.

  • Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack.

  • Experience with microservices architecture and service mesh technologies.

  • Knowledge of security best practices in cloud environments.

This role is exclusively based in our San Francisco HQ. We offer relocation assistance to new employees.

