Usually the error above indicates a problem with pbs_mom on the compute node.
Step 1: Check the queue and find the node the job landed on
# qstat -a (summary of all jobs)
# qstat -n (also lists the nodes each job is running on)
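If you only want the node names for a single job, here is a small sketch. It reads the `exec_host` attribute from `qstat -f` rather than parsing the `qstat -n` table. It assumes your PBS flavour (Torque or PBS Pro) prints `exec_host` as `node/slot+node/slot...` on one line; very long values can wrap, and this sketch does not handle that. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list the compute nodes a PBS/Torque job is running on.
# Assumes `qstat -f` prints an "exec_host = node/slot+node/slot..." line
# on a single line. nodes_from_exec_host and job_nodes are hypothetical names.

# Turn an exec_host string such as "node1/0+node1/1+node2/0" into
# one unique hostname per line, keeping first-seen order.
nodes_from_exec_host() {
  local exec_host=$1
  # Split on '+', drop the "/slot" suffix, skip blanks, de-duplicate.
  tr '+' '\n' <<< "$exec_host" | cut -d/ -f1 | awk 'NF && !seen[$0]++'
}

# Look up the exec_host attribute of one job and print its nodes.
job_nodes() {
  local job_id=$1
  local exec_host
  exec_host=$(qstat -f "$job_id" | awk -F' = ' '/^[[:space:]]*exec_host =/ {print $2}')
  nodes_from_exec_host "$exec_host"
}
```

Usage: `job_nodes <job_id>` prints each node once, which gives you the host to log in to in the later steps.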
Step 2: Try to delete the job as the cluster administrator.
# qdel <job_id>
If you still cannot delete the job, go on to Step 3.
Step 3: Try restarting pbs_mom on the client (the compute node)
# service pbs_mom restart
Step 4: If Step 3 does not help, the cause may be a network connection issue or a hardware problem. Try logging in to the node:
# ssh compute_node_1
If you cannot log in, you will have to connect to the server through remote KVM to investigate.
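Steps 3 and 4 can be combined into one check run from the head node. This is a minimal sketch that assumes key-based root SSH to the compute nodes; `recover_node` and the `SSH_CMD` override are hypothetical names, not part of PBS.

```bash
#!/usr/bin/env bash
# Sketch of Steps 3-4 from the head node: confirm the compute node answers
# over SSH, then restart pbs_mom on it. Assumes passwordless (key-based)
# root SSH. recover_node and SSH_CMD are hypothetical names.

# Allow the ssh command to be overridden (e.g. with a wrapper or stub).
SSH_CMD=${SSH_CMD:-ssh}

recover_node() {
  local node=$1
  # BatchMode=yes stops ssh from prompting for a password;
  # ConnectTimeout bounds the wait on an unreachable node.
  if ! $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
    echo "$node: SSH failed; check network/hardware via remote KVM" >&2
    return 1
  fi
  # Node is reachable: restart pbs_mom on it (the Step 3 command).
  $SSH_CMD "$node" service pbs_mom restart
}
```

Usage: `recover_node compute_node_1`. A non-zero exit status means the node did not answer over SSH, so the next step is the remote KVM console.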