Senior Site Reliability Engineer

An exciting opportunity has arisen for a Senior Site Reliability Engineer to join a forward-thinking technology organisation in Bangkok, where your expertise will be pivotal in shaping the reliability, scalability, and operational maturity of a cutting-edge cloud platform. This role is perfect for someone who thrives on combining software engineering with systems engineering principles to deliver highly available, resilient, and secure services on AWS.

You will have the opportunity to drive meaningful change by leading reliability initiatives, advancing automation, and playing a key role in designing, building, and operating resilient AWS cloud infrastructure that supports business-critical services and enables long-term platform scalability and growth.

What you'll do:

• Design, implement, and maintain highly available, scalable, and fault-tolerant cloud infrastructure and platform services using AWS technologies.
• Establish, monitor, and continuously improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and reliability metrics to ensure consistent service delivery.
• Proactively identify potential reliability risks within the platform and implement preventive measures to minimise service disruptions for end users.
• Drive operational excellence initiatives by automating processes and optimising system stability through engineering-driven solutions.
• Architect comprehensive observability solutions including monitoring, logging, tracing, and alerting capabilities across all layers of the platform.
• Develop actionable operational metrics and dashboards that provide deep insights into system health and performance trends.
• Lead incident response activities for critical production environments by providing technical leadership during major incidents and facilitating post-incident reviews.
• Conduct thorough root cause analyses following incidents to ensure corrective actions are implemented effectively and lessons learned are shared across teams.
• Develop and maintain automation frameworks as well as self-healing capabilities to reduce manual effort and eliminate operational toil.
• Collaborate with Engineering, DevOps, Security, Architecture, and Product teams to influence system design decisions and promote operational readiness throughout the software development lifecycle.

What you bring:

• Minimum 6 years of experience in Cloud Infrastructure, Platform Engineering, DevOps, or related roles, including at least 2 years of hands-on experience as a Site Reliability Engineer (SRE) supporting and maintaining production-critical environments.
• Proven track record of implementing observability solutions (metrics, logging, and tracing) and incident management processes.
• Demonstrated expertise in AWS cloud architecture including deployment of scalable services using best practice methodologies.
• Hands-on proficiency with Infrastructure-as-Code tools such as Terraform is required; familiarity with AWS CloudFormation or AWS CDK is advantageous.
• Solid experience building CI/CD pipelines using Git-based workflows and deployment automation tools.
• Strong understanding of containerisation technologies, especially Kubernetes; experience managing Amazon EKS clusters is highly valued.
• Experience conducting post-incident root cause analysis and driving operational improvements based on the findings.
• Excellent analytical and troubleshooting skills, plus strong communication skills for stakeholder engagement.

What sets this company apart:
This organisation stands out for its commitment to fostering an inclusive workplace where knowledge sharing and professional development are prioritised at every stage of your career. Employees benefit from flexible working opportunities designed to support both personal wellbeing and professional aspirations, enabling you to thrive inside and outside the office. If you're looking for a place where your contributions make a real difference, and where your career can progress alongside some of the brightest minds in technology, this could be the perfect fit for you.

What's next:
If you are ready to take your expertise in site reliability engineering to new heights within a collaborative environment focused on operational excellence, this is your moment!

Apply today by clicking on the link provided; we look forward to connecting with passionate professionals eager to make an impact.

Due to the high volume of applications, our team will only be in touch if your application is shortlisted.

Robert Walters Recruitment (Thailand) Limited
Recruitment License No.: น. 1188 / 2551
