Friday, December 2, 2016

Error polling HP CQ with status WORK REQUEST FLUSHED ERROR status on LSF Platform

I was encountering "Error polling HP CQ with status WORK REQUEST FLUSHED ERROR status" during an OpenMPI run, and it was occurring randomly.

I suspected a node issue, so I checked the LSF log /opt/lsf/log/sbatchd.log.comp001 on the affected node. The entries below show that it is an authentication issue with Active Directory (AD). I'm using Centrify.

acctMapTo: No valid user name found for job 149044, userName(mr_x) failed:Success
runEexec: getOSUid_() failed. Bad user ID


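I closed the host so that LSF would stop dispatching jobs to it:
$ badmin hclose comp001

and then restarted the Centrify services. Alternatively, you can reboot the node if you want a clean start. Once authentication works again, reopen the host with badmin hopen comp001.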

After that, the OpenMPI job could run again.
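
The two log signatures quoted above can be searched for automatically when several nodes need checking. The following is a minimal sketch, not an LSF tool: the function name find_auth_failures is hypothetical, and the patterns are taken only from the log excerpt in this post, so other LSF versions may word these messages differently.

```python
import re

# Signatures copied from the sbatchd log excerpt in this post (assumption:
# other LSF versions may phrase these messages differently).
AUTH_FAILURE_PATTERNS = [
    re.compile(r"acctMapTo: No valid user name found for job \d+"),
    re.compile(r"getOSUid_\(\) failed"),
]


def find_auth_failures(lines):
    """Return (line_number, text) pairs for lines matching a failure signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in AUTH_FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the log file (for example /opt/lsf/log/sbatchd.log.comp001) and pass the file object to find_auth_failures; any hits suggest the node should be closed and its AD authentication checked.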

Thursday, November 3, 2016

IBM Platform Cluster Manager Community Edition

IBM Platform Cluster Manager Community Edition has been released at no charge.


Platform Cluster Manager Community Edition is easy-to-use, powerful cluster management software for technical computing users. It delivers a comprehensive set of functions to help manage hardware and software from the infrastructure level. It automates the deployment of the operating system and software components, and complex activities, such as application cluster creation and maintenance of a system.

Platform Cluster Manager Community Edition uses a centralized user interface from which system administrators can manage a complex cluster as a single system. It offers the flexibility for users to add customized features that are based on specific requirements of their environment. It also provides a kit framework for easy software deployment, and it can set up a multitenant, multi-cluster environment.

To download, go here

Supported Platforms

Management Node   Compute Node
CentOS 6.6        CentOS 6.6, CentOS 6.5
RHEL 6.7          RHEL 6.7, RHEL 6.6, RHEL 6.5, RHEL 5.11
                  CentOS 6.6, CentOS 6.5, CentOS 5.11
                  RHELSC 6.6, RHELSC 6.5, RHELSC 5.11
RHEL 7.1          RHEL 7.1, RHEL 7.0, RHEL 6.6, RHEL 6.5, RHEL 5.11
                  CentOS 7.0, CentOS 6.6, CentOS 6.5, CentOS 5.11
                  RHELSC 7.0, RHELSC 6.6, RHELSC 6.5

For more information, see IBM Platform Cluster Manager Community Edition

Tuesday, October 25, 2016

Kernel Local Privilege Escalation - CVE-2016-5195

Taken from Red Hat (https://access.redhat.com/security/vulnerabilities/2706661)

Background Information
A race condition was found in the way the Linux kernel's memory subsystem handled the copy-on-write (COW) breakage of private read-only memory mappings. An unprivileged local user could use this flaw to gain write access to otherwise read-only memory mappings and thus increase their privileges on the system.


This could be abused by an attacker to modify existing setuid files with instructions to elevate privileges. An exploit using this technique has been found in the wild. This flaw affects most modern Linux distributions.

Red Hat Product Security has rated this update as having a security impact of Important.

Impacted Products:
The following Red Hat Product versions are impacted:
•    Red Hat Enterprise Linux 5
•    Red Hat Enterprise Linux 6
•    Red Hat Enterprise Linux 7
•    Red Hat Enterprise MRG 2
•    Red Hat Openshift Online v2

Attack Description and Impact: This flaw allows an attacker with a local system account to modify on-disk binaries, bypassing the standard permission mechanisms that would prevent modification without an appropriate permission set. This is achieved by racing the madvise(MADV_DONTNEED) system call while having the page of the executable mmapped in memory.

Take Action: All Red Hat customers running the affected versions of the kernel are strongly recommended to update the kernel as soon as patches are available. Details about impacted packages as well as recommended mitigation are noted below. A system reboot is required in order for the kernel update to be applied.

Mitigation: Please reference bug 1384344 (https://bugzilla.redhat.com/show_bug.cgi?id=1384344#c13) for detailed mitigation steps.

Updates for Affected Products:
A kpatch for customers running Red Hat Enterprise Linux 7.2 or greater will be available. Please open a support case to gain access to the kpatch.

For more details about what a kpatch is, please refer to "Is live kernel patching (kpatch) supported in RHEL 7?" at https://access.redhat.com/solutions/2206511



Monday, October 17, 2016

Offline Nodes in MOAB

Change the State of MOAB Client Nodes

To offline the nodes

# mnodectl -m state=drained node1

To flush the nodes
# mnodectl -m state=flush node1

To reserve the nodes
# mnodectl -m state=reserved node1

To delete nodes
# mnodectl -d node1
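
When the same state change has to be applied to several nodes, the command lines can be generated rather than typed one by one. This is only a sketch: the helper name mnodectl_commands and the action labels are hypothetical, and the mnodectl arguments are exactly those listed above with nothing else assumed about the MOAB CLI. It prints the commands instead of running them.

```python
import shlex

# Arguments copied from the commands listed in this post; the action
# labels ("offline", "flush", ...) are hypothetical names for illustration.
NODE_ACTIONS = {
    "offline": ["-m", "state=drained"],
    "flush": ["-m", "state=flush"],
    "reserve": ["-m", "state=reserved"],
    "delete": ["-d"],
}


def mnodectl_commands(action, nodes):
    """Build one mnodectl command line per node for the given action."""
    if action not in NODE_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return [shlex.join(["mnodectl", *NODE_ACTIONS[action], node]) for node in nodes]


# Print the commands for review; nothing is executed here.
for command in mnodectl_commands("offline", ["node1", "node2"]):
    print(command)
```

Reviewing the printed lines before pasting them into a root shell keeps a typo in a node name from draining or deleting the wrong node.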

Friday, October 14, 2016

LAMMPS Tools and Packmol with Intel Fortran

PACKMOL information can be obtained from http://www.ime.unicamp.br/~martinez/packmol/userguide.shtml#conv

Installation instructions can be found at http://www.ime.unicamp.br/~martinez/packmol/userguide.shtml#comp

1. Compile Packmol with Intel Fortran
# tar -zxvf packmol.tar.gz
# cd packmol
# ./configure ifort
# make

2. LAMMPS Tools
# git clone https://github.com/jdevemy/lammps-tools.git
# cd lammps-tools
# python setup.py build
# sudo python setup.py install

3. Make sure your Python has the libraries that create_conf imports (sys, os, logging, argparse, math, random)

4. Make sure Python can find the lammps-tools library (if you install lammps-tools), then run create_conf
# export PYTHONPATH=/home/user1/Downloads/lammps-tools-master/lib
# ./create_conf
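
The module check in step 3 can be done with a few lines of Python before running create_conf. This is a minimal sketch; the helper name missing_modules is hypothetical, and the module list is the one given in step 3. All six modules belong to the Python standard library, so a non-empty result would point to a broken or stripped-down Python installation rather than a missing package.

```python
import importlib.util

# Modules named in step 3 as imported by create_conf.
REQUIRED_MODULES = ["sys", "os", "logging", "argparse", "math", "random"]


def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that will execute create_conf, since a different Python on the PATH may give a different answer.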