Site Reliability Engineer

TechnoSmarts, Inc.

2021-12-03 09:02:07

Job location: Tallahassee, Florida, United States

Job type: full-time

Job industry: Banking & Financial Services

Job description

Direct full-time client hire

100% remote.

Client: An international data, analytics, and technology company and a major player in the global economy. It provides business and consumer financial data, HR solutions, and fraud prevention services that help employers, employees, financial institutions, and government agencies make critical decisions. In operation for over 100 years, with over 4,000 employees.


Comprehensive benefits, competitive salary, plus bonus up to 10%.

Overview:

Seeking a Site Reliability Engineer (SRE) to build and run large-scale, distributed, fault-tolerant systems. Ensure that internal and external services meet or exceed reliability and performance expectations while adhering to established engineering principles. Build and run production systems and engineer solutions to operational problems. Will be responsible for overall system operation, using a variety of tools and approaches to solve a broad set of problems, such as limiting time spent on operational work, conducting blameless postmortems, and proactively identifying and preventing potential outages.

Role:

  • Engage in and improve the software development lifecycle - from inception and design, through development, deployment, operation and refinement.
  • Influence and design infrastructure, architecture, standards and methods for large-scale systems.
  • Support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews.
  • Maintain services during deployment and in production by measuring and monitoring key performance and service level indicators including availability, latency, and overall system health.
  • Automate system scalability and continually work to improve system resiliency, performance and efficiency.
  • Practice sustainable incident response as part of an on-call rotation and through blameless postmortems.
  • Remediate tasks within corrective action plans via sustainable, preventative, and automated measures whenever possible.

Qualifications:

  • BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required.
  • 4 to 7+ years of public cloud experience required.
  • Google Cloud platform experience in a professional setting required.
  • Experience managing infrastructure as code via tools such as Terraform or CloudFormation required.
  • Proficiency with continuous integration and continuous delivery tooling and practices.
  • Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js.
  • System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.).
  • Proficiency with monitoring tools like Stackdriver, Datadog, or AppDynamics.
