Gather

Site Reliability Engineer

Job Description

Posted on: July 31, 2024

We are seeking an exceptional DevOps/Site Reliability Engineer (SRE) to join our team. In this role, you will oversee our application stack and cloud infrastructure and keep our services orchestrated and managed reliably. You will design, develop, and maintain the internal automation tools we rely on to manage our service lifecycle. You will also diagnose and resolve runtime issues across the tiers of our hosting stack.

Responsibilities

  • Manage containerized applications using technologies like Docker and Kubernetes
  • Implement monitoring, logging, and proactive issue identification
  • Architect and manage cloud-based infrastructure on GCP
  • Design resilient infrastructure for high availability and disaster recovery
  • Automate infrastructure setup and configuration using tools like Terraform, Ansible, or Puppet
  • Handle incident response and contribute to problem-solving efforts when necessary
  • Foster collaboration between development and operations teams

Nice To Have

  • Establish and optimize CI/CD pipelines for automated software delivery

Job Requirements

Your qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 3+ years of experience building and maintaining cloud-native production infrastructure
  • Strong passion for meticulously documenting and automating intricate data systems
  • Proficiency in Infrastructure as Code (IaC) practices
  • A solid grasp of current monitoring solutions and techniques
  • Expertise in at least one modern programming language, such as Python or Go
  • Team player driven to ensure the highest level of product quality
  • Desirable: previous experience troubleshooting and improving production infrastructure