posted Jun 01
Senior Site Reliability Engineer
Job Location: San Francisco, California
Salary: $150,000 - $200,000 a year
Job Description
• Maintain, improve, scale and secure our AWS/GCP infrastructure and Linux systems
• Assist our development teams in running, packaging, deploying and troubleshooting applications
• Work with developers on streamlining deployment processes with Jenkins and other CI/CD tooling
• Build, maintain, monitor and improve our Kubernetes clusters
• Work with development teams on migrating applications to Kubernetes
• Be responsible for maintaining and improving multiple internal services, for example Kubernetes, Prometheus and ELK
• Monitor, triage and respond to alerts in our high-availability environments
• Participate in design and code reviews, and ensure that the foundation for our services is best in class
• Evaluate new technologies, and design and implement them as appropriate
• Identify automation opportunities and implement them by building custom solutions or using off-the-shelf ones
Qualifications
• 5+ years of experience working in cloud-based systems operations as an SRE or DevOps engineer
• First-hand experience with configuration management and infrastructure as code (Ansible, Puppet, Terraform)
• Proficient in SRE methodologies such as capacity planning and disaster recovery testing to ensure the scalability, resilience, and availability of critical services
• A strong understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.)
• Experienced in managing production workloads and skilled in using monitoring tools to detect issues early
• Comfortable participating in on-call rotations and conducting thorough root cause analyses to keep systems running smoothly
• Proficiency in at least one programming language
• Committed to supporting teammates, especially during challenging times, and excited about working in a close-knit, growing team. Approachable, empathetic, and proactive in promoting collaboration and innovation
• Works well independently and can accomplish tasks without constant supervision
• Production experience building and maintaining Kubernetes clusters
• Bonus: Ability to understand Go, Rust, C++ and TypeScript source code
Benefits
• Competitive health, dental & vision coverage
• Flexible time off + 15 company holidays, including a company-wide holiday break
• Paid parental leave
• Life & AD&D insurance
• Short- & long-term disability
• FSA & Dependent Care Accounts
• 401(k) (4% match)
• Employee Assistance Program
• Monthly gym allowance
• Daily lunch and snacks in-office
• L&D budget of $1,500/year
• Company retreats
