AI Engineer, Zalo AI
Full-time
🤖 What you will do
- Optimizing the latency and throughput of model inference;
- Building reliable production serving systems for millions of users;
👾 What you will need
- Experience with programming languages such as C++ and Python;
- Solid knowledge of Data Structures and Algorithms;
- Proficiency with deep learning frameworks and inference runtimes such as PyTorch and TensorRT;
- Experience with system optimizations for model serving, such as batching, caching, load balancing, and model parallelism;
- Experience with algorithmic optimizations for inference, such as quantization, distillation, and speculative decoding;
- Experience with HTTP, gRPC, and Triton Inference Server;
- Experience with large-scale, high-concurrency production serving;
- Ability to quickly learn new technologies, frameworks, and algorithms;
Nice to have:
- Experience with low-level optimizations for inference, such as writing custom GPU kernels;
- Experience building solutions with MLOps and infrastructure tools such as Kubernetes and Kubeflow;