Role overview
Reddit is a community of communities where people can dive into anything through experiences built around their interests, hobbies, and passions. Our mission is to bring community, belonging, and empowerment to everyone in the world. Reddit users submit, vote, and comment on content, stories, and discussions about the topics they care about most. With more than 50 million people visiting 100,000+ communities daily, it is home to the most open and authentic conversations on the internet. From pets to parenting, skincare to stocks, there’s a community for everybody on Reddit. For more information, visit redditinc.com.
Our community of users generates over 65B analytics events per day, each of which is ingested by the Data Infrastructure team into a data warehouse that sees 55,000+ daily queries.
As a Data Infrastructure Engineer, you will build and maintain the systems and tools used across Reddit to generate, ingest, and store petabytes of raw data. You will collaborate with your team and with partner teams like Machine Learning and Ads to create and improve scalable, fault-tolerant, self-serve systems. You will also develop standards and frameworks that ensure a high level of data quality, helping shape the data culture across all of Reddit!
How You Will Contribute
- Refine and maintain our data infrastructure technologies to support real-time analysis of hundreds of millions of users.
- Own the data pipeline that surfaces 65B+ daily events to all teams, along with the tools we use for ingestion, storage, and data-quality improvement.
- Support warehousing, analytics, and ML customers who rely on our data pipeline for analysis, modeling, and reporting.
- Build data pipelines with distributed streaming tools such as Kafka, Kinesis, Flink, or Spark (see the sketch after this list).
- Ship quality code to enable scalable, fault-tolerant, and resilient services in a multi-cloud architecture.
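For a rough sense of the kind of pipeline work described above, here is a minimal sketch of a streaming consumer in Python, assuming the open-source kafka-python client; the topic name, broker address, consumer group, and event fields are all hypothetical illustrations, not Reddit's actual setup.

```python
# Minimal sketch: consume analytics events from Kafka and apply a
# simple data-quality gate before they would reach the warehouse.
# All names below (topic, broker, group, fields) are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "analytics-events",                   # hypothetical topic name
    bootstrap_servers=["broker-1:9092"],  # hypothetical broker address
    group_id="warehouse-ingest",          # consumer group for the ingest job
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Drop records missing the fields downstream consumers rely on.
    if "event_type" in event and "timestamp" in event:
        # Stand-in for a warehouse write.
        print(event["event_type"], event["timestamp"])
```

In practice a job like this would write to the warehouse and typically run under a framework such as Flink or Spark rather than a bare consumer loop; the sketch only illustrates the ingest-validate-store shape of the work.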
What we're looking for
- 3+ years of coding experience in a production setting writing clean, maintainable, and well-tested code.
- Experience with object-oriented programming languages such as Scala, Python, Go, or Java.
- Degree in Computer Science or equivalent technical field.
- Experience working with Terraform, Helm, Prometheus, Docker, Kubernetes, and CI/CD.
- Excellent communication skills to collaborate with stakeholders in engineering, data science, machine learning, and product.
Benefits
- Comprehensive health benefits
- 401k matching
- Workspace benefits for your home office
- Personal & professional development funds
- Family planning support
- Flexible vacation & Reddit Global Days Off
- 4+ months paid parental leave
- Paid volunteer time off