Site Reliability Engineer
Insight Global
2021-12-03 07:35:29
Saint Louis, Missouri, United States
Job type: fulltime
Job industry: Banking & Financial Services
Job description
Day-to-Day:
An employer is looking to hire an Infrastructure Reliability Engineer based in St. Louis, MO. This person will work on the BIS IRE (Infrastructure Reliability Engineering) team, which is responsible for 24x7 infrastructure support of a number of business-critical services that underpin the overall performance of a multi-billion-dollar business segment. This Reliability Engineer will contribute to the availability, security, and regulatory compliance of these services, as well as the customer experience associated with their domain. The role also ensures the reliable, agile, and continuous flow of change required to optimize the performance of the platform ecosystem and of customer service.
The role will have the following responsibilities:
-Identify and propose strategic service and system changes, enhancements, and developments, and liaise with the appropriate third-party and software vendors, users, and IT teams (Cloud, Virtualization, Storage, Networking, BigData, Middleware, DB, etc.).
-Deliver best-of-breed enhancements to the environment, including but not limited to automation, monitoring, and alerting.
-Architect, design, and drive enhancements to the environment through to implementation.
-Proactively ensure the highest levels of systems and infrastructure availability.
-Understand the existing complex environment, readily identify problem areas, and carry out successful implementations to resolve or mitigate them.
-Apply the best of Site Reliability Engineering, DevOps, CI/CD and Infrastructure as Code practices to simultaneously increase change velocity and platform stability.
-Monitor performance, identify issues and possible solutions, and work with support engineers to provide resolution.
-Participate in the design of information and operational support systems.
-Install, configure, test and maintain operating systems, application software and system management tools.
-Provide an advanced level of support to the existing environment when required.
-Maintain security, backup, and redundancy strategies.
-Write and maintain custom scripts to increase system efficiency and reduce the manual intervention required for routine tasks.
-Escalate risks and issues and provide frequent status reports for management.
-Write and maintain appropriate documentation for manual and automated processes.
-Manage, deliver and monitor a number of key improvements to the current systems and infrastructure.
-Actively report on project progress, raising risks and managing deliverables often with conflicting priorities and timescales.
Must Haves:
-3+ years' experience in an IT or technology support role
-Solid interpersonal skills and comfort interacting with stakeholders at both the infrastructure and application levels
-Willing to step up and drive major incidents involving production environments with a strong sense of urgency
-Scripting experience in at least one of the following: PowerShell, Bash, Python, Perl, Ruby
-Systems configuration and administration: Windows and/or Linux
-Large-scale networking systems knowledge
-Ability to analyze how components of a distributed system work together using a broad range of skills and tools
Plusses:
-Puppet, Ansible, AWS, IaC (Terraform, CloudFormation), Docker, Kubernetes, Jenkins, Splunk, Kibana, Grafana, Rundeck, RegEx
-Network+ or equivalent training