About ESO
ESO is a rapidly growing technology company that is passionate about improving community health and safety through the power of data. We provide software applications, interoperability and data management solutions to emergency medical services, fire departments and hospitals.
We’re small enough to be nimble and fun, but big enough to be a great, stable place to work. We serve more than 10,000 customers from our offices in the US, Canada and Northern Ireland.
About the role
ESO is seeking a Site Reliability Engineering Manager to join our team. As a Manager of Site Reliability Engineering, you will focus on enhancing observability solutions across our cloud-based estate to improve the resiliency of our mission-critical applications. You will also manage the engineers who monitor and remediate real-time production escalations in accordance with our product SLAs and SLOs.
This role reports to the Director of SRE and will require you to:
- Provide functional management of SRE team members.
- Deliver performance feedback and compensation reviews.
- Mentor, set goals and support career planning for your team members.
- Oversee daily planning, escalations, and triage meetings to support operations.
- Collaborate with other teams to ensure systems are operating in accordance with SLAs, SLOs and error budgets.
- Proactively develop automation solutions to support improved MTTR and self-healing in a distributed systems environment.
- Develop runbooks for routine production operations.
- Craft monitoring & observability solutions that drive meaningful action and insight.
- Identify and present solutions that can improve the performance and reliability of ESO’s systems.
- Report on core operational metrics that provide a trended view of the health of ESO’s applications.
More about what you’ll be doing
The right candidate for this role has experience hiring, mentoring, and managing technical execution using Agile methodologies. As a member of engineering leadership, you will collaborate frequently with development teams and business stakeholders throughout the SDLC to ensure ESO’s products are well-architected for performance, scalability, security, and high availability.
ESO’s infrastructure runs predominantly in Azure, using IaaS, PaaS and serverless technologies. You should be familiar with modern DevOps pipelines that promote zero-downtime deployment strategies and have a strong understanding of Infrastructure as Code practices that prevent configuration drift between environments.
Your qualifications
You are a creative technologist who leads by example and sets the pace and tone for building a high-performance team. The essentials to be successful in this role:
- 2-3+ years managing a high-performing technical team
- Experience with Agile development methodologies (Scrum / Kanban)
- Operations experience supporting mission-critical, customer-facing systems
- Prior experience as a Software, SRE or DevOps engineer
- Hands-on experience with cloud services (Azure, AWS, or GCP)
- Strong knowledge of Relational Databases (MSSQL)
- Experience with monitoring and observability tooling
- Familiarity with Continuous Integration and Continuous Deployment
- Prior experience using source control
The things that are a plus for this role:
- Experience with Microsoft Azure Serverless and PaaS technologies
- Experience with Docker Containers and Infrastructure as Code
- Experience in Healthcare / Public Safety sectors