Site Reliability Engineer

QGenda

2021-12-03 08:52:19

Job location: Atlanta, Georgia, United States

Job type: Full-time

Job industry: Engineering

Job description

Title: Site Reliability Engineer

Classification (FLSA): Exempt
Position type: Full time
Reports to: Director, Site Reliability Engineering

Summary / Objective:
QGenda is a fast-growing Atlanta-based healthcare software company with an amazing corporate culture, where we strive to be the best place to be a customer. Our software is used by thousands of hospital departments around the world to automatically generate optimized physician work schedules that accommodate complex business rules and accurately schedule the appropriate medical provider based on their skill level, specialty, availability, and preferences.
As a Site Reliability Engineer, you will work with our product development teams to increase the scalability, reliability, and performance of our systems. You'll build and extend existing automation for configuration and monitoring of our AWS-hosted applications. You'll evaluate new AWS services and tools to determine whether they could be used in our environments. You'll bring a focus on platform health and monitoring to allow us to deliver the best possible experience for our customers.

Key Responsibilities:

  • Assist in Development Operations
    • Partner with software engineering teams to make sure scalability and reliability are designed into and implemented in new features and products
    • Promote fundamentals of site reliability across the Product Development department and the organization as a whole
    • Work closely with development and operations teams to build highly available, cost-effective systems
  • Build and Maintain Infrastructure
    • Write automation code for provisioning and operating infrastructure
    • Oversee infrastructure, including its provisioning, for customer-facing applications hosted in AWS within production and pre-production environments
    • Maintain an understanding of new cloud computing capabilities on Amazon Web Services and look for opportunities to utilize those capabilities for our products
  • Ensure Application Uptime and Performance
    • Use extensive metrics to identify issues before they impact our customers
    • Establish end-to-end monitoring and alerting on all critical aspects of the system to ensure SLAs are met and to get proactive notifications of possible issues for all systems
    • Design platforms for extremely high uptime metrics and ensure that our production SLAs are measured, monitored and maintained
    • Identify underlying root causes and provide recommendations or solutions for long-term, permanent fixes to critical production issues
    • Participate in service capacity planning and demand forecasting, software performance analysis and system tuning
  • Assure High Security Across the Application and Organization
    • Troubleshoot problems across the entire cloud-based stack (network, databases, and application) and build automation to prevent problem recurrence
    • Develop effective documentation, tooling, and alerts to both identify and address reliability risks
  • Participate in on-call rotation with other team members on the Development Team
Knowledge, Skills and Abilities:
  • Advanced proficiency with at least one scripting or programming language, preferably Ruby or Python
  • Solid Linux administration experience; experience with Windows and Active Directory is a plus
  • Strong experience supporting applications running Ruby, Python or PHP
  • Experience with Nginx, Apache, Docker or similar technologies
  • Hands-on experience building infrastructure and supporting applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache
  • Strong understanding of networking and DNS
  • Familiarity with configuration management and infrastructure as code (IaC) tools such as Ansible, Terraform or CloudFormation
  • Availability for off-hours deployment and upgrades of production systems during release and maintenance windows
  • Firm understanding of and experience with Agile and Scrum SDLC processes
  • Experience using a distributed version control system (Git preferred) for check-ins, branching, merging, pull requests, code review, etc.
  • Knowledge of CI/CD best practices and tools such as AWS CodeBuild, Jenkins and TeamCity
  • Experience designing and delivering secure, high-performance and highly available cloud services
  • Experience working with stakeholders to define and track SLIs, SLOs and SLAs using metrics and monitoring to ensure the objectives are met or exceeded
Education / Professional Certifications or Licenses Required:
  • Bachelor's degree (B.S. preferred) from a major university in a related field
Work Environment / Physical Demands / Travel Requirements:
Computer-based work environment.
Sitting and standing for extended periods.
Lifting of 5-10 pounds.

Awards:
  • 2018 - EY Entrepreneur of the Year
  • 2018 - GA Fast 40
  • 2018 - Deloitte Technology Fast 500
  • 2018 - Glassdoor Top 50 CEO
  • 2019 - GA Fast 40
  • 2019 - AJC Best Places to Work
  • 2020 - Deloitte Technology Fast 500
  • 2020 - AJC Best Places to Work
Compensation & Perks:
  • Competitive Salary
  • Bonus Eligible
  • 401k Employer Match
  • Pluralsight Subscription
Great Benefits & Culture:
  • Full Health and Dental (QGenda pays 100% of the individual premiums)
  • Employee-centric work culture
  • Work remotely when needed
  • 3 "Flex Hours" per week
  • Relaxed vacation policy
  • Company outings
  • Costco membership
  • Casual dress
  • Opportunity to be part of a fast-growing software company with hundreds of customers and thousands of users around the world.
Applicants must be currently authorized to work in the United States on a full-time basis.