Job Description
Description
As a Site Reliability Engineering Manager, your responsibilities include:
- Lead SRE teams responsible for reliability and performance of on-prem and cloud-based services
- Lead and grow the engineers on your team
- Manage staging and production environments with the goal of maximizing availability
- Promote observability of systems for monitoring, alerting, and metrics reporting
- Advocate best practices of reliability engineering
Minimum Qualifications
- 10+ years of experience with large-scale distributed systems
- Demonstrable success leading engineering teams, ideally SRE or Production Engineering
- Knowledge of core operating system principles, networking fundamentals, and systems management
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
- Strong leadership capabilities, with excellent problem-solving and decision-making skills
- 10+ years of professional experience in an engineering leadership position
Preferred Qualifications
- Bachelor’s or Master’s degree in computer science or an equivalent field, with 10+ years of experience