AWS Operational Support Engineer
Position description
Provide technical support and maintenance for applications and infrastructure hosted on Amazon Web Services (AWS).
Monitor system performance, troubleshoot issues, implement best practices for scalability and reliability, and ensure smooth operation of cloud environments while collaborating with development and operations teams.
Responsibilities
- Monitor and optimize AWS-based production environments, ensuring high availability, performance, and resilience.
- Manage observability tools such as CloudWatch, X-Ray, Datadog, and custom dashboards to provide real-time visibility across infrastructure and applications.
- Lead the response to critical production incidents, including troubleshooting, root cause analysis, mitigation, and post-mortem reporting.
- Maintain and improve disaster recovery processes, failover procedures, backup strategies, and resilience testing.
- Support ECS clusters, EC2 instances, load balancers, databases, RabbitMQ, and S3 services.
- Troubleshoot and support CI/CD pipelines, Infrastructure as Code deployments, and production releases.
- Collaborate with development teams to improve application performance, database queries, and operational reliability.
- Manage IAM roles, security configurations, certificates, and access controls according to best practices.
- Create and maintain operational documentation, runbooks, and incident procedures.
Requirements
- Proven experience in AWS operations and production support environments.
- Strong knowledge of EC2, ECS, S3, IAM, VPC, CloudWatch, Route 53, Load Balancers, and Infrastructure as Code tools such as CloudFormation or AWS CDK.
- Experience with Docker, ECS, CI/CD pipelines, Jenkins, and AWS Code services.
- Knowledge of Aurora PostgreSQL, MongoDB Atlas, RabbitMQ, and monitoring tools such as Datadog and X-Ray.
- Strong troubleshooting skills across infrastructure, networking, databases, and cloud services.
- Experience with disaster recovery, failover testing, backup strategies, and resilience practices.
- Scripting and automation skills using Python and Bash.
- Ability to read and troubleshoot Java, Python, or TypeScript code.
- Experience managing production incidents, performing root cause analysis, and maintaining operational runbooks.
- Strong communication, analytical, and documentation skills.


