Job Description
Summary
Description
You will strive to improve the stability, security, efficiency, and scalability of a 24/7 global service. You will participate in on-call rotations; our SRE teams are geographically distributed to provide follow-the-sun support. You will use your strong troubleshooting ability daily to isolate issues and resolve their root causes through investigative analysis. The role also requires building and maintaining accurate, up-to-date documentation of system configuration, providing code reviews, and mentoring new team members.
An ideal candidate is a focused, independent problem-solver who can deftly handle multiple competing priorities and deliver solutions in a timely manner.
Minimum Qualifications
- A strong sense of ownership and integrity, demonstrated through clear communication and collaboration.
- Sophisticated knowledge of one or more of the following: Kubernetes, containerisation systems, or public cloud infrastructure (AWS, GCP).
- Proficiency in programming in Go, Python, or a similar language to automate tasks.
- Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, or Spinnaker).
Preferred Qualifications
- Working knowledge of multi-tier applications and their dependencies, including load balancing, TCP/IP networking, web services, LDAP, and DNS.
- Proficiency with web server administration, including Apache and Nginx.
- Knowledge of database design, support, and administration, including Postgres, MySQL, and HBase.
- Network administration and troubleshooting.
- Good interpersonal skills shown through previous projects or assignments.