Site Reliability Engineer, Zalo
Full-time
A Database Reliability Engineer (DRE) at Zalo plays a crucial role in ensuring the constant availability, optimal performance, and robust scalability of ZA's in-house database systems. This position blends the skills of a traditional database administrator with the principles of software engineering and site reliability engineering (SRE). DREs are proactive problem-solvers who leverage automation, deep technical expertise, and a collaborative mindset to build and maintain resilient and efficient data infrastructure.
🤖 What you will do
- System Reliability and Availability: Design, build, and maintain highly available and fault-tolerant database systems. Develop and implement strategies for disaster recovery, backup, and restore processes to minimize downtime and data loss.
- Performance and Scalability: Proactively monitor database performance, identifying and resolving bottlenecks. Optimize queries, tune database configurations, and plan for future capacity needs to ensure the system can handle growing data volumes and user loads.
- Automation and Tooling: Develop and implement automation for routine database tasks, such as provisioning, configuration management, and patching. Build and maintain tools to improve the observability and manageability of the database environment.
- Incident Response and Troubleshooting: Serve as a primary point of contact for database-related incidents. Troubleshoot and resolve complex production issues, conducting root cause analysis to prevent recurrence. Participate in on-call rotations.
- Collaboration and Consultation: Work closely with software development teams to advise on database design, schema changes, and query optimization. Collaborate with infrastructure and SRE teams to ensure the database environment aligns with overall system architecture and reliability goals.
- Security and Compliance: Implement and maintain security best practices for databases, including access control, encryption, and auditing. Ensure compliance with relevant data protection regulations.
- Documentation and Knowledge Sharing: Create and maintain comprehensive documentation for database architecture, processes, and procedures. Share knowledge and best practices with other engineering teams.
👾 What you will need
- Proven experience in database administration, database engineering, or a similar role. Experience with SRE principles is highly desirable.
- Experience with NoSQL databases such as MongoDB, Cassandra, Redis, and Scylla.
- Proficiency in programming languages such as C++, Python, or Java.
- Strong understanding of cloud platforms (AWS, Google Cloud, Azure) and their database services (e.g., RDS, Aurora, Cloud SQL).
- Experience with infrastructure-as-code tools like Terraform or Ansible.
- Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
- Familiarity with containerization and orchestration technologies (Docker, Kubernetes).