Head of Platform/AI Cluster Management - System Integrator (San Francisco) Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)

Job Tags

Full time
