Sunday, June 28, 2015

Resolving unreach or unavail nodes in OpenLava-3.0

After configuring OpenLava-3.0 from the tarball according to the OpenLava – Getting Started Guide, and after applying the fix from LIM is Down Error Messages for OpenLava-3.0, you may still see errors such as:

HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
compute-c00        unreach         -     16      0      0      0      0      0
headnode-h00       ok              -     16      0      0      0      0      0

Suggestions:
  1. Check the permissions on the folder where openlava-3.0 resides. Make sure the user and group openlava exist on both the HeadNode and the ComputeNode, and that they have permission on the folder:
    drwxr-xr-x. 10 openlava openlava 4096 Jun 26 00:32 openlava-3.0
  2. Install pdsh on all the compute nodes (see Installing pdsh to issue commands to a group of nodes in parallel in CentOS), then use pdcp to copy /etc/passwd, /etc/shadow and /etc/group to all the nodes:
    # pdcp -a /etc/passwd /etc
    # pdcp -a /etc/shadow /etc
    # pdcp -a /etc/group /etc
  3. Make sure /etc/hosts reflects the short hostnames of the cluster on both the HeadNode and the ComputeNode. Refrain from putting 2 hostnames on one line.
  4. Check your firewall settings. Make sure ports 6322:6325 are open.
  5. Ensure the clocks on the clients and the HeadNode are synchronized with the designated NTP server. If the NTP
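
For suggestion 3, a quick way to spot lines that carry more than one hostname is a small awk filter. This is only a rough sketch: multi_name_lines is a made-up helper name, not something that ships with OpenLava, and it only looks at the file you hand it.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenLava): print every line of a
# hosts-style file that maps one address to more than one hostname.
multi_name_lines() {
    # $1: path to the hosts file to inspect, e.g. /etc/hosts
    awk '
        { sub(/#.*/, "") }   # drop comments; awk re-splits the fields
        NF > 2 { print }     # address followed by two or more names
    ' "$1"
}

# Example use on a node:
#   multi_name_lines /etc/hosts
```

Any line it prints is a candidate for splitting into one hostname per line. Run it on the HeadNode and on each ComputeNode, since the files may differ.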

Thursday, June 25, 2015

LIM is Down Error Messages for OpenLava-3.0

After configuring OpenLava-3.0 from the tarball according to the OpenLava – Getting Started Guide, I was encountering errors like:
# lsid
openlava project 3.0, June 25 2015
ls_getclustername(): LIM is down; try later

Debugging:
# service openlava stop

# vim /usr/local/openlava-3.0/etc/lsf.conf

# /usr/local/openlava-3.0/sbin/lim -2

Solution (Check first):
Check what the following commands return:
# hostname -s
# hostname -f

In your /etc/hosts, you may want to change the entry to something like this. It solved my issue:
127.0.0.1   headnode-h00 localhost
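
To confirm that the name printed by hostname -s really appears in /etc/hosts, something like the check below may help. It is a minimal sketch: host_in_hosts is a hypothetical helper name of my own, and it only reads the file you give it, so it says nothing about DNS.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenLava): succeed if NAME is listed
# as a hostname (any field after the address) in a hosts-style file.
host_in_hosts() {
    # $1: hostname to look for, $2: path to the hosts file
    awk -v name="$1" '
        { sub(/#.*/, "") }                                   # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Example use on a node:
#   host_in_hosts "$(hostname -s)" /etc/hosts || echo "short hostname missing"
```

With the /etc/hosts line shown above, the check would pass for headnode-h00 and fail for a name that is not listed.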