If the pbs_mom on a compute node is lost and cannot be recovered (due to hardware or network failure), a job that was running there can stay stuck in the qstat output. To purge it, do the following on the PBS server:
1. Shut down the pbs_server daemon on the PBS server
# service pbs_server stop
2. Remove the job spool files that belong to the hung job ID (for example, 4444)
# rm /var/spool/torque/server_priv/jobs/4444.headnode.SC
# rm /var/spool/torque/server_priv/jobs/4444.headnode.JB
3. Start the pbs_server daemon
# service pbs_server start
4. Restart the Maui daemon
# service maui restart
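The four steps above can be wrapped in a small script. This is only a sketch under assumptions: the function names job_spool_files and purge_job and the JOBS_DIR variable are made up for illustration, and the spool file naming (<jobid>.<server>.SC and <jobid>.<server>.JB) is taken from the 4444.headnode example above, so check what is actually in your jobs directory before running it.

```shell
#!/bin/sh
# Sketch: purge a hung job's spool files from a TORQUE server.
# Assumption: spool files live in JOBS_DIR and are named
# <jobid>.<server>.SC and <jobid>.<server>.JB, as in the example above.
JOBS_DIR="${JOBS_DIR:-/var/spool/torque/server_priv/jobs}"

# Print the two spool file paths for a job.
# $1 = numeric job ID (e.g. 4444), $2 = server name (e.g. headnode)
job_spool_files() {
    for ext in SC JB; do
        printf '%s/%s.%s.%s\n' "$JOBS_DIR" "$1" "$2" "$ext"
    done
}

# Stop pbs_server, remove the spool files, then bring the daemons back up.
# Must be run as root on the PBS server.
purge_job() {
    service pbs_server stop || return 1
    job_spool_files "$1" "$2" | while IFS= read -r f; do
        rm -f -- "$f"
    done
    service pbs_server start || return 1
    service maui restart
}
```

After sourcing the script as root, running job_spool_files 4444 headnode prints the two paths without touching anything, which is a cheap way to confirm the names before calling purge_job 4444 headnode.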
References:
- Deleting PBS/Maui Jobs