- Experience building and operating ML infrastructure or model serving systems in production
- Proficiency in Golang or Python, with strong systems engineering fundamentals
- Hands-on experience with Kubernetes and container orchestration at scale
- Familiarity with ML serving frameworks such as Ray Serve, NVIDIA Triton, or TorchServe
- Understanding of distributed systems, API design, and system reliability
- Strong collaboration and communication skills in a remote-first environment