Architect, Site Reliability Engineer
OwnBackup is one of the fastest-growing global SaaS companies. With over 6,000 customers, we are ranked on the Forbes Cloud 100 as one of the world's top private cloud companies and have raised over $500 million in funding from Alkeon Capital, B Capital Group, BlackRock Private Equity Partners, Insight Partners and others.
Architect, SRE is a new role at OwnBackup reporting to our Director, SRE. We are seeking someone who will design and implement our global, multi-cloud monitoring and observability tooling across the SaaS application and infrastructure to ensure high availability and reliability of systems and applications.
This individual is responsible for availability, reliability, performance, monitoring and appropriate incident response for infrastructure and applications across 14 regions globally and multiple SaaS applications, supporting over 6,000 customers.
Your Day-to-Day Role
- Design and implement an observability platform, working closely with DevOps, infrastructure, production engineering, security and software engineering teams across product areas, to increase supportability and help these teams become independent in troubleshooting production issues.
- Implement instrumentation and monitoring solutions that capture a comprehensive set of metrics, logs, and traces, and that enable us to quickly identify and troubleshoot issues.
- Work with senior engineering and quality assurance team members to build tools and testing strategies for problem prevention, detection, and chaos testing in production.
- Provide design reviews for engineering teams to ensure their systems are observable, scalable and reliable and subsequently perform Production Readiness Reviews before new features or products are introduced to production.
- Lead the definition and tracking of product Error Budgets, including the creation and maintenance of the SRE dashboard that also tracks our Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Automate toilsome production tasks to improve operational reliability and efficiency.
- Support incident response by helping to troubleshoot production incidents in real time.
- Lead root cause investigations and improve service reliability through blameless post-incident reviews.
- Conduct ongoing SRE training sessions for new hires across various departments in Product Development.
Your Work Experience
- Minimum 8-10 years of experience as a Senior DevOps Engineer, SRE, Infrastructure Engineer, Platform Architect or equivalent
- Experience defining SRE roadmap for organizations
- Passion for learning cutting-edge technology
- Excellent communication and interpersonal skills, ability to work and coordinate between multiple teams, ability to work independently
- Proficiency in Linux / Unix
- Experience working with B2B SaaS, live production environments
- Extensive experience with public cloud technologies - AWS and Azure experience preferred. AWS GovCloud experience is a plus.
- Experience with configuration management and infrastructure-as-code tools (Ansible, Terraform)
- Experience in CI/CD and related tooling (GitLab and Jenkins)
- Experience with Containerization (Docker, K8s)
- Experience in one or more of the following scripting languages: Bash, Python, Ruby
- Experience working in a Global, 24x7x365 organization
- Fluent in English - working with global teams
- Database management experience (MariaDB, Postgres)
- Experience with DevOps tools (Jenkins, GitLab CI, Nexus, Docker, K8s, etc)
- An eye for spotting complicated processes and a desire to automate them out of existence
Role will require
- International Travel at least 1x per quarter
- NJ Office attendance 2-3 times per week to interact with team
- Early hours at times to overlap with Israel timezone
- Pager/On-Call Duties which may include nights/weekends
This is a full-time position. The ideal candidate will work out of our Englewood Cliffs, NJ office to maximize collaboration and interaction with the business. Travel may be required.
OwnBackup is dedicated to creating an environment where employees thrive, which is why base pay is only one part of the total compensation package that is provided to compensate and recognize employees for their work. This role may also be eligible for unlimited PTO, generous medical benefits, a 401(k) savings plan with a 4% employer match, discretionary bonuses/incentives, and stock options. We also offer catered lunches in the office five days a week, a full fitness center, and free shuttle bus service to and from New York City.
Creating an environment where employees thrive also means making sure every employee feels accepted. As we scale to help all types of companies protect precious data, our team must reflect the diversity we serve. OwnBackup is an Equal Opportunity Employer and we believe that every employee in the company brings a unique perspective that they can and should contribute in order to make an impact every day. We strive to be one team and one culture that builds trust through transparency. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, age, national origin, protected veteran status or disability status.
A Bit About Us
OwnBackup is a leading SaaS data protection platform for some of the largest SaaS ecosystems in the world, including Salesforce, Microsoft Dynamics 365, and ServiceNow. Through capabilities like data security, backup and recovery, archiving, and sandbox seeding, OwnBackup empowers thousands of organizations worldwide to manage and protect the mission-critical data that drives their business.