Tuesday, July 17, 2012

Issues arising when nodes belong to multiple queues in Torque

I noticed that with Torque/MAUI, when compute nodes belong to more than one queue, the scheduler can wrongly conclude that the resource pool does not have sufficient resources for a job.

I'm using Torque 2.5.3 and MAUI 3.3.1.

For example, suppose /var/spool/torque/server_priv/nodes assigns your nodes to queues like this:
node01 np=8 queue1 queue2
node02 np=8 queue1 queue2
node03 np=8 queue2 queue3
node04 np=8 queue2 queue3

Suppose you then submit a job to queue2 with something like:

$ qsub -q queue2 -l nodes=3:ppn=8 -v file=my_mpi_file openmpi.sh

All four nodes (32 processors in total) are tagged with queue2, so a request for 3 nodes with 8 processors each should fit, yet MAUI somehow decides that the resources are not enough. One of the best ways to diagnose the issue is to use checkjob. See Using MAUI checkjob command
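To make the expected arithmetic concrete, here is a minimal Python sketch that reads the example nodes file and checks whether a queue has enough tagged nodes for a request. It is only an illustration: parse_nodes and fits are hypothetical helper names, the sketch looks at static capacity only (not at what is currently running), and it assumes every bare word after the node name is a queue label.

```python
# Sketch only: static capacity check against the example nodes file.
# parse_nodes() and fits() are hypothetical helpers, not part of Torque or MAUI.

NODES_TEXT = """\
node01 np=8 queue1 queue2
node02 np=8 queue1 queue2
node03 np=8 queue2 queue3
node04 np=8 queue2 queue3
"""


def parse_nodes(text):
    """Return {node_name: (np, set_of_labels)} from nodes-file text."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        np = 1  # assumption: one processor when np= is not given
        labels = set()
        for field in fields[1:]:
            if field.startswith("np="):
                np = int(field[3:])
            elif "=" not in field:
                # assumption: bare words are queue labels
                labels.add(field)
        nodes[name] = (np, labels)
    return nodes


def fits(nodes, queue, want_nodes, want_ppn):
    """True if enough nodes tagged `queue` each have at least want_ppn processors."""
    eligible = [
        name
        for name, (np, labels) in nodes.items()
        if queue in labels and np >= want_ppn
    ]
    return len(eligible) >= want_nodes


nodes = parse_nodes(NODES_TEXT)
# The request from the qsub line: nodes=3:ppn=8 on queue2
print("queue2 can hold nodes=3:ppn=8:", fits(nodes, "queue2", 3, 8))
```

On paper the request fits, which is why the "insufficient resources" verdict from MAUI is surprising.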

It is recommended that each compute node be tagged to only one queue, to prevent Torque/MAUI from miscalculating the available resources.
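If you want to check an existing nodes file against this recommendation, a small Python sketch like the one below can list the nodes that carry more than one label. The helper name multi_queue_nodes is hypothetical, and the sketch assumes that every bare word after the node name is a queue label, which may not hold if you also use node properties for other purposes.

```python
# Sketch only: flag nodes that carry more than one queue label.
# multi_queue_nodes() is a hypothetical helper, not part of Torque or MAUI.

NODES_TEXT = """\
node01 np=8 queue1 queue2
node02 np=8 queue1 queue2
node03 np=8 queue2 queue3
node04 np=8 queue2 queue3
"""


def multi_queue_nodes(text):
    """Return {node_name: [labels]} for nodes with two or more bare-word labels."""
    flagged = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # key=value fields such as np=8 are settings, not labels
        labels = [f for f in fields[1:] if "=" not in f]
        if len(labels) > 1:
            flagged[fields[0]] = labels
    return flagged


flagged = multi_queue_nodes(NODES_TEXT)
for name, labels in flagged.items():
    print(name, "is tagged to several queues:", ", ".join(labels))
```

In practice you would read the text from /var/spool/torque/server_priv/nodes instead of the inline string.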


