- Drive key aspects of the hardware development lifecycle, including feasibility studies, hardware bring-up, validation, deployment, and ongoing production support.
- Perform platform integration, performance characterization, and system-level debugging across compute infrastructure.
- Develop and execute rigorous evaluation and stress-testing strategies for server platforms.
- Support BIOS/BMC firmware qualification, hardware health monitoring, and automation tooling.
- Partner with hardware vendors and internal infrastructure teams to investigate component- and system-level failures.
- 5+ years of hardware engineering, system architecture, or infrastructure validation experience.
- BS/MS in Electrical Engineering, Computer Engineering, Computer Science, or equivalent practical experience.
- Strong understanding of Linux-based server environments, modern data center technologies, and server architecture (PCIe, NVMe, Ethernet/InfiniB, memory subsystems, accelerator-based platforms).
- Hands-on experience testing and validating CPU, Memory, Storage, and high-speed networking subsystems in a Linux environment.
- Proficiency in Python, Go, or Bash for developing hardware validation tools and automation scripts.
- Strong experience debugging complex server issues remotely, including analysis of kernel logs, hardware registers, and bus-level traces/captures.
- Familiarity with GPU platforms, AI accelerators, or high-performance compute (HPC) systems is a plus.
- Decisive and effective at tracking hardware issues from identification through to fleet-wide resolution.
- Excellent oral and written communication skills; able to translate complex hardware constraints into actionable insights for software teams.
- Strong interpersonal skills with the ability to lead cross-functional projects.
- Willing to travel occasionally to data centers or vendor sites.