Here are some excerpts:
Host status
Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status.
1. bhosts
Displays the current status of the host:
STATUS | DESCRIPTION |
ok | Host is available to accept and run new batch jobs |
unavail | Host is down, or LIM and sbatchd are unreachable. |
unreach | LIM is running but sbatchd is unreachable. |
closed | Host will not accept new jobs. Use bhosts -l to display the reasons. |
unlicensed | Host does not have a valid license. |
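As a quick way to apply the table above, the following sketch tallies hosts per STATUS value. It assumes the default bhosts column layout (host name in column 1, STATUS in column 2) and reads the listing from standard input. The function name count_by_status and the host comp099 are hypothetical and used only for illustration.

```shell
# Summarize hosts per STATUS from bhosts output read on stdin.
# Assumes the default layout: HOST_NAME in column 1, STATUS in column 2.
count_by_status() {
    awk 'NR > 1 && NF >= 2 { count[$2]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Sample rows in the default layout (comp099 is a made-up closed host).
status_rows='HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
comp027 ok - 16 0 0 0 0 0
comp028 ok - 16 0 0 0 0 0
comp099 closed - 16 16 16 0 0 0'

printf '%s\n' "$status_rows" | count_by_status
```

On a live cluster the same function could be fed directly, for example with bhosts | count_by_status.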
2. bhosts -l
Displays the reason a host is closed. A closed host does not accept new batch jobs:
$ bhosts -l
HOST  node001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Adm      60.00     -     16      0      0      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
              r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
 Total         0.0   0.0   0.0    0%   0.0     0    0 28656  324G   16G   60G  3e+05   4e+05
 Reserved      0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

              processes clockskew netcard iptotal  cpuhz cachesize diskvolume
 Total            404.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
 Reserved           0.0       0.0     0.0     0.0    0.0       0.0        0.0

              processesroot   ipmi powerconsumption ambienttemp cputemp
 Total                396.0   -1.0             -1.0        -1.0    -1.0
 Reserved               0.0    0.0              0.0         0.0     0.0

              aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
 Total        17.0    25.0   128.0    10.0    272.0      48.0   48.0       50.0
 Reserved      0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

              gambit geom_trans tgrid fluent_par
 Total          50.0       50.0  50.0      193.0
 Reserved        0.0        0.0   0.0        0.0
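In the sample above the closed reason appears as a suffix on the status (closed_Adm, which LSF uses for a host closed by an administrator). The sketch below pulls the host name and that status out of bhosts -l output. It assumes the layout shown in the sample: a line starting with HOST, then a header line starting with STATUS, followed by the value line. The function name host_status_long is hypothetical.

```shell
# Print "host status" pairs from bhosts -l output read on stdin.
# Assumes the value line directly follows the line that starts with STATUS.
host_status_long() {
    awk '$1 == "HOST"   { host = $2; next }
         $1 == "STATUS" { want = 1; next }
         want           { print host, $1; want = 0 }'
}

# First three lines of the sample output shown above.
long_sample='HOST node001
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
closed_Adm 60.00 - 16 0 0 0 0 0 -'

printf '%s\n' "$long_sample" | host_status_long
```

With a working LSF installation, bhosts -l | host_status_long should list every host together with its detailed status.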
3. bhosts -X
Displays condensed host groups in an uncondensed format, with one line per host:
$ bhosts -X
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            ok              -     16      0      0      0      0      0
comp029            ok              -     16      0      0      0      0      0
comp030            ok              -     16      0      0      0      0      0
comp031            ok              -     16      0      0      0      0      0
comp032            ok              -     16      0      0      0      0      0
comp033            ok              -     16      0      0      0      0      0
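Because the per-host listing puts MAX and NJOBS side by side, free job slots can be estimated as MAX minus NJOBS. The sketch below does that for hosts in the ok state. It assumes the column order shown above (STATUS in column 2, MAX in column 4, NJOBS in column 5) and skips hosts whose MAX is not numeric. The function name free_slots is hypothetical.

```shell
# Print "host free_slots" for each ok host with a numeric MAX.
# Column positions assume the layout above: $2 STATUS, $4 MAX, $5 NJOBS.
free_slots() {
    awk 'NR > 1 && $2 == "ok" && $4 ~ /^[0-9]+$/ { print $1, $4 - $5 }'
}

# Two rows taken from the sample output above.
slot_rows='HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
comp027 ok - 16 0 0 0 0 0
comp028 ok - 16 0 0 0 0 0'

printf '%s\n' "$slot_rows" | free_slots
```

For the two sample rows this prints 16 free slots for each host, since no jobs are running on them.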
4. bhosts -l hostID
Displays all information about a specific server host, such as the CPU factor and the load thresholds used to start, suspend, and resume jobs:
# bhosts -l comp067
HOST  comp067
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      0      0      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
              r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
 Total         0.0   0.0   0.0    0%   0.0     0    0 13032  324G   16G   60G  3e+05   4e+05
 Reserved      0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

              processes clockskew netcard iptotal  cpuhz cachesize diskvolume
 Total            406.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
 Reserved           0.0       0.0     0.0     0.0    0.0       0.0        0.0

              processesroot   ipmi powerconsumption ambienttemp cputemp
 Total                399.0   -1.0             -1.0        -1.0    -1.0
 Reserved               0.0    0.0              0.0         0.0     0.0

              aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
 Total        18.0    25.0   128.0    10.0    272.0      47.0   47.0       50.0
 Reserved      0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

              gambit geom_trans tgrid fluent_par
 Total          50.0       50.0  50.0      193.0
 Reserved        0.0        0.0   0.0        0.0

 LOAD THRESHOLD USED FOR SCHEDULING:
           r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
 loadSched    -     -     -    -     -     -    -    -     -     -     -
 loadStop     -     -     -    -     -     -    -    -     -     -     -

            root maxroot processes clockskew netcard iptotal cpuhz cachesize
 loadSched     -       -         -         -       -       -     -         -
 loadStop      -       -         -         -       -       -     -         -

           diskvolume processesroot ipmi powerconsumption ambienttemp cputemp
 loadSched          -             -    -                -           -       -
 loadStop           -             -    -                -           -       -
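In the threshold block above every loadSched and loadStop entry is a dash, which is how the listing shows an index with no threshold set. The sketch below checks whether any threshold is set for a host. It assumes the rows begin with the literal labels loadSched and loadStop as in the sample. The function name threshold_state is hypothetical.

```shell
# Report whether any load threshold is set in bhosts -l output read on stdin.
# A "-" entry is treated as "no threshold set for this index".
threshold_state() {
    awk '$1 == "loadSched" || $1 == "loadStop" {
             for (i = 2; i <= NF; i++) if ($i != "-") found = 1
         }
         END { print (found ? "configured" : "none") }'
}

# First threshold group from the sample output above.
threshold_rows='r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -'

printf '%s\n' "$threshold_rows" | threshold_state
```

For the sample host this prints none. A host with any numeric entry in either row would be reported as configured.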
5. lsload
Displays the current state of the host:
STATUS | DESCRIPTION |
ok | Host is available to accept and run batch jobs and remote tasks. |
-ok | LIM is running but RES is unreachable. |
busy | Does not affect batch jobs, only used for remote task placement (i.e., lsrun). The value of a load index exceeded a threshold (configured in lsf.cluster.cluster_name, displayed by lshosts -l). Indices that exceed thresholds are identified with an asterisk (*). |
lockW | Does not affect batch jobs, only used for remote task placement (i.e., lsrun). Host is locked by a run window (configured in lsf.cluster.cluster_name, displayed by lshosts -l). |
lockU | Will not accept new batch jobs or remote tasks. An LSF administrator or root explicitly locked the host using lsadmin limlock, or an exclusive batch job (bsub -x) is running on the host. Running jobs are not affected. Use lsadmin limunlock to unlock LIM on the local host. |
unavail | Host is down, or LIM is unavailable. |
unlicensed | The host does not have a valid license. |
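The lsload states differ mainly in whether they matter for batch jobs or only for remote task placement. The sketch below encodes one reading of the table above as a lookup. Treating unavail and unlicensed as "no new batch jobs" is an interpretation rather than something the table states outright. The table says nothing about batch jobs for -ok, so that state falls through to the default branch. The function name batch_effect is hypothetical.

```shell
# Map an lsload status to its effect on batch jobs, following the table above.
batch_effect() {
    case "$1" in
        ok)                       echo "accepts batch jobs and remote tasks" ;;
        busy|lockW)               echo "batch jobs unaffected" ;;
        lockU|unavail|unlicensed) echo "no new batch jobs" ;;
        *)                        echo "not stated in the table" ;;
    esac
}

# Show the mapping for a few states.
for s in ok busy lockU -ok; do
    printf '%s: %s\n' "$s" "$(batch_effect "$s")"
done
```

A script that filters lsload output could call batch_effect on the status column to decide whether a host still takes batch work.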