posted Jun 05

Senior DevOps Infrastructure Engineer (US)

Ansible, Bash, Cloud, Go, Grafana, Kubernetes, MySQL, Prometheus, Puppet, Python, SDLC, VMware, senior

Job Location: Remote

Salary: $120,000 - $155,000 a year

Job Description

• Administration - Participate in maintenance and operations of our production environment, including patching, deployment, server administration, and troubleshooting, either using configuration-as-code tooling or manually.
• Reliability & Performance - Ensure reliability, availability, and performance of services. Respond to incidents and resolve them before they become customer-impacting.
• Projects - Deliver complex solutions that traverse all layers of the technology stack: Operating System, Virtualisation, Network, Storage & Cloud.
• Data Centre - Participate in and coordinate on-site deployments of critical hardware, including servers and storage.
• Collaboration - Work closely with teammates and with software and security teams to rapidly meet customer, business, and compliance needs.
• Automation - Drive the automation of operational tasks, and ensure our infrastructure is more like cattle than pets.
• Observability - Develop and maintain internal tools and commercial or OSS tools to improve system health, performance, and deployment.
• Continuous Improvement - Drive never-ending improvement in SRE processes, tools, and methodologies. Take a leading role in blameless post-mortems to avoid repeat issues or mistakes, and clearly document all lessons learned for others. If you love writing actionable documentation, we’d love to set up an interview.
• On-Call - Participate in a rotating 24x7 on-call schedule with your team to ensure availability of services across the production environment.

Qualifications

• 5+ years of experience in Site Reliability Engineering, DevOps, System Administration, or similar roles.
• Deep experience working in colocation facilities. We have a hybrid footprint, and if you have only worked in the public cloud space, this role is not a great fit for you.
• Experience using Puppet, Ansible, or other common configuration-as-code tooling to deploy and configure systems.
• Strong familiarity with Linux systems (any distro is fine, but we have a preference for RHEL downstreams).
• Experience using Proxmox, VMware, or KVM as virtualization platforms for large-scale production environments.
• Experience administering enterprise-grade SANs and load balancers is necessary to be successful in this role.
• Demonstrated proficiency in one or more scripting or programming languages (e.g., Python, Go, Bash/Zsh).
• Multiple years of experience proactively implementing and responding to infrastructure, application, and network alerts using industry-standard or homebrew toolchains.
• Strong problem-solving skills and experience working in extremely high-availability production environments (99.95% or greater) with high performance requirements are required.

Benefits

• A remote-first culture!
• Flex PTO
• Health, Dental and Vision Insurance
• 13 Paid Holidays
• Company volunteer days
