posted Jun 01

Senior Site Reliability Engineer

Ansible AWS Azure Bash Cloud Docker GCP Go Grafana Java Kotlin Kubernetes Prometheus Python Terraform Unix senior

Job Location: San Francisco, California

Salary: $125,000 - $200,000 a year

Job Description

• Monitoring and Alerting: Set up and maintain monitoring systems to track the health and performance of applications and infrastructure. Create and manage alerting mechanisms to detect and respond to issues quickly.
• Incident Response: Handle incidents and outages, working to resolve them swiftly and minimize downtime. Perform root cause analysis to prevent future occurrences and improve system resilience.
• Automation: Develop tools and scripts to automate repetitive tasks, such as deployments, monitoring, and scaling, to increase efficiency and reduce human error.
• Performance Optimization: Analyze system performance and identify bottlenecks or areas for improvement. Work with development teams to optimize code and infrastructure for better performance and resource utilization.
• Capacity Planning: Plan for future growth by analyzing current usage trends and forecasting resource needs. Ensure that systems can handle increased load without compromising performance or reliability.
• Service Level Objectives (SLOs) and Service Level Agreements (SLAs): Define and measure SLOs and SLAs to set expectations for system reliability and performance. Track these metrics and work to meet or exceed the defined standards.
• Incident Management and Postmortems: After incidents, conduct postmortems to document what went wrong, what was done to fix it, and how to prevent similar incidents in the future. This process supports continuous improvement and learning from failures.
• Collaboration with Development Teams: Work closely with software developers to integrate reliability and performance into the development process. Provide guidance on best practices and assist with designing resilient systems.
• Security and Compliance: Ensure that systems are secure and compliant with relevant regulations and standards. Implement security measures, monitor for vulnerabilities, and respond to security incidents.
• Continuous Improvement: Continuously look for ways to improve system reliability, performance, and efficiency. Stay current with industry trends and advancements in order to adopt best practices and technologies.
• Participate in an on-call rotation.

Qualifications

• 5+ years of experience in site reliability, including incident response, incident management, automation, and performance optimization
• 5+ years of experience with cloud platforms (AWS preferred)
• 4+ years of experience working with DevOps technologies such as Docker, Kubernetes, Helm, and Terraform
• 4+ years of experience developing and maintaining CI/CD pipelines
• 4+ years of experience using a scripting language such as Python or Bash
• Experience coding in Kotlin or another JVM language is a plus

Benefits

• Mission and Impact: Grindr is building the global gayborhood in your pocket. Your role will impact the lives of millions of LGBTQ+ people around the world. Through our success, we are making a world where the lives of our community are free, equal, and just.
• Family Insurance: Insurance premium coverage for health, dental, and vision for you and partial coverage for your dependents.
• Retirement Savings: Generous 401(k) plan with 6% match and immediate vesting in the U.S.
• Compensation: Industry-competitive compensation and eligibility for company bonus and equity programs.
• Queer-Inclusive Benefits: Industry-leading gender-affirming offerings with up to 90% cost coverage, access to Included Health, monthly stipends for HRT, and more.
• Additional Benefits: Flexible vacation policy; monthly stipends for cell phone, internet, wellness, food, and commuting; breakfast and lunch provided onsite; and a yearly travel and leisure stipend.
