
AI Engineer, Zalo AI

Hồ Chí Minh
Full-time

🤖 What you will do

  • Optimizing the latency and throughput of model inference;
  • Building reliable production serving systems that serve millions of users.

👾 What you will need

  • Experience with programming languages such as C++ and Python;
  • Solid knowledge of Data Structures and Algorithms;
  • Proficiency with deep learning frameworks and inference runtimes such as PyTorch and TensorRT;
  • Experience with system optimizations for model serving, such as batching, caching, load balancing, and model parallelism;
  • Experience with algorithmic optimizations for inference, such as quantization, distillation, and speculative decoding;
  • Experience with HTTP, gRPC, and Triton Inference Server;
  • Experience with large-scale, high-concurrency production serving;
  • Ability to quickly learn new technologies, frameworks, and algorithms.

Nice to have: 

  • Experience with low-level optimizations for inference, such as GPU kernels;
  • Experience building solutions with MLOps tools and frameworks such as Kubernetes and Kubeflow.

Our interview process is all about getting to know each other. Come prepared to showcase your hard work, skills, and achievements, and to get a better understanding of what it’s like to work at Zalo Group.