Sunday, February 3, 2013

Unable to restart pbs_mom on nodes

I was unable to restart pbs_mom on one of the compute nodes. A look at the log file under /var/spool/torque/mom_logs shows:

 # less /var/spool/torque/mom_logs

pbs_mom;Svr;pbs_mom;LOG_ERROR::pbs_mom, Unable to get my full hostname for grapefruit.local.spms.ntu.edu.sg error -1

Once a node has this type of error, the Torque server will not be able to manage pbs_mom on that node.

The fix is simple: resolve the discrepancy between the hostname information the Torque server has for the node and the hostname the Torque client (the compute node) reports for itself. Check that /etc/hostname or /etc/resolv.conf has the necessary information. See "Changing the hostname on CentOS" for how to change the hostname.
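
Before restarting pbs_mom, it can help to confirm that the node is able to resolve its own name. The following is only a rough sketch, assuming a Linux node where the hostname and getent commands are available. check_host is a hypothetical helper written for this post and is not part of Torque.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): report whether a name
# resolves through the system resolver (/etc/hosts, DNS, ...).
check_host() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "resolves"
    else
        echo "unresolved"
    fi
}

# Name the node currently believes it has.
name=$(hostname)
echo "short hostname: $name"

# hostname -f needs a working lookup, so a failure here points at
# the same kind of problem pbs_mom is complaining about.
echo "full hostname:  $(hostname -f 2>/dev/null || echo '(lookup failed)')"

echo "lookup of $name: $(check_host "$name")"
```

If the last line reports "unresolved", or the full hostname lookup fails, fix the hostname configuration first and then try restarting pbs_mom again.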
