Linux Toolkits: 2015

Saturday, December 26, 2015

Disabling selinux without restart on CentOS

Run the following as root

# setenforce 0

To verify that SELinux has switched from enforcing to permissive mode, which only logs warnings and does not block anything

# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
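
Note that sestatus still reports "Mode from config file: enforcing", so the system goes back to enforcing mode at the next reboot. If you want the permissive mode to persist, here is a minimal sketch. It assumes the configuration lives in /etc/selinux/config (the usual location on CentOS) and that GNU sed is available; the function name set_selinux_config_mode is my own and is not part of any SELinux tool.

```shell
# Hypothetical helper: rewrite the SELINUX= line in an SELinux config file.
# Usage: set_selinux_config_mode <mode> [config_file]
set_selinux_config_mode() {
    mode=$1
    conf=${2:-/etc/selinux/config}   # assumed default location
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Keep a backup copy, then replace only the SELINUX= line.
    # SELINUXTYPE= is left untouched because the pattern requires "=" right after SELINUX.
    cp -p "$conf" "$conf.bak" &&
        sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$conf"
}

# Example (as root): set_selinux_config_mode permissive
```

The backup copy makes it easy to go back if the edit turns out to be unwanted.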

Wednesday, December 9, 2015

Managing database size with Platform LSF HPC 3.2

From IBM Technote.

Tuesday, December 8, 2015

Open Source Service Management Software - OTRS

If you are looking for an Open Source Helpdesk or Service Management Software, you may want to consider OTRS Free – The Flexible Open Source Service Management Software.

Monday, November 23, 2015

Configure Error when compiling GCC 5.2.0 on CentOS 6.6

When I compiled GCC according to Compiling GNU 5.2.0 on CentOS 6, I got this error.

configure: error: 
I suspect your system does not have 32-bit development libraries (libc and headers). 
If you have them, rerun configure with --enable-multilib. 
If you do not have them, and want to build a 64-bit-only compiler, rerun configure with --disable-multilib.

The solution is to install the 32-bit glibc development packages

# yum install glibc-devel*.i686

Thursday, November 12, 2015

Too many content sets for certificate Red Hat Enterprise Linux, Standard (up to 2 sockets) 3 year. A newer client may be available to address this problem.

If you encounter the error "Too many content sets for certificate Red Hat Enterprise Linux, Standard (up to 2 sockets) 3 year. A newer client may be available to address this problem" on RHEL 6.3 and below, try the following steps.

Step 1: You need to be able to do a yum install first. To do that, register with RHN Classic at the command prompt

# rhn_register

Step 2: Update subscription-manager

# yum update subscription-manager

Step 3: Subscribe to the Subscription Pool

# subscription-manager subscribe --pool=xxxxxxxxxxxxxxxxxxxxxx

Not loading "rhnplugin" plugin, as it is disabled

I first registered the system with Red Hat Subscription Management by doing a

# subscription-manager register

After that I did a

# yum clean all
# yum repolist -v
..... 
Not loading "rhnplugin" plugin, as it is disabled
.....

Finally, I did a

# subscription-manager list --available

Select the right pool and subscribe to it

# subscription-manager subscribe --pool=xxxxxxxxxxxxxxxxxxxxxx

If it still does not work, fall back to RHN Classic

# rhn_register

You can check whether it works by

# yum -d10 repolist

Repo-id      : rhel-x86_64-server-6
Repo-name    : Red Hat Enterprise Linux Server (v. 6 for 64-bit x86_64)
Repo-updated : Tue Nov 10 21:10:32 2015
Repo-pkgs    : 16,298
Repo-size    : 28 G
Repo-baseurl : https://xmlrpc.rhn.redhat.com/XMLRPC/GET-REQ/rhel-x86_64-server-6
Repo-expire  : 21,600 second(s) (last: Thu Nov 12 09:19:23 2015)
Repo-exclude : ibutils-libs*
Repo-excluded: 12

Wednesday, November 4, 2015

How to Display OpenSM Logs

This is an interesting video from Mellanox Academy on How to Display OpenSM Logs

Thursday, October 29, 2015

LibGL Error. Unable to load driver swrast_dri.so

If you encounter this error when you run applications such as MATLAB

libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast

You need to install mesa-libGLU*.x86_64 and mesa-libGLU*.i686

# yum install mesa-libGLU*.i686 mesa-libGLU*.x86_64

Tuesday, October 27, 2015

kipmi0 taking excessive CPU resources

The article kipmi0 problem provides a good explanation of the kipmi0 problem, but its fix did not work for me. Instead, this modification of the kipmid parameters did

# echo 50 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us

This keeps kipmi0 to 5% of the CPU

Saturday, October 3, 2015

MAUI secondary client lost connection to primary MAUI host

If you are using a secondary submission node for Torque and wish to use MAUI as the scheduler, you may encounter this error

ERROR:    lost connection to server
ERROR:    cannot request service (status)

The solution is quite simple: make sure the time on the two hosts is the same. A time difference is the most common cause of a lost connection.
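
A quick way to compare the clocks is to take the epoch time on both hosts and look at the difference. The sketch below assumes you can ssh from the submission node to the MAUI host; the function name clock_skew_ok and the 5-second default tolerance are illustrative assumptions, not MAUI settings.

```shell
# Hypothetical helper: succeed if two epoch timestamps differ by at most
# max_skew seconds (default 5, an arbitrary tolerance chosen for illustration).
clock_skew_ok() {
    local_ts=$1
    remote_ts=$2
    max_skew=${3:-5}
    diff=$((local_ts - remote_ts))
    # Take the absolute value of the difference.
    [ "$diff" -lt 0 ] && diff=$((-diff))
    [ "$diff" -le "$max_skew" ]
}

# Example, with "headnode" standing in for your MAUI host:
#   clock_skew_ok "$(date +%s)" "$(ssh headnode date +%s)" || echo "clocks differ"
```

If the check fails, synchronize both hosts against the same NTP server before looking any further.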

Related Information:

Bad UID for job execution MSG=ruserok failed validating user1 from ServerNode while configuring Submission Node in Torque

Wednesday, September 23, 2015

How To Tune Your Linux Server Using mlnx_tune Tool

This is a wonderful video from Mellanox

How To Tune Your Linux Server Using mlnx_tune Tool from Mellanox Technologies on Vimeo.

Monday, August 31, 2015

Troubleshooting and Error Messages Tips for Platform and OpenLava

These Troubleshooting and Error Messages Tips are written for Platform, but they can be used for OpenLava as well. Do take a look and digest.

Troubleshooting and Error Messages

Wednesday, August 26, 2015

udev: renamed network interface eth0 to eth2

I encountered this error when I started the network

# dmesg |grep eth0 
udev: renamed network interface eth0 to eth2

This occurs when you clone the system or change the NIC hardware and the OS still retains the old interface information in /etc/udev/rules.d/70-persistent-net.rules.

Just delete /etc/udev/rules.d/70-persistent-net.rules and reboot the system so that Linux can rebuild /etc/udev/rules.d/70-persistent-net.rules to match the replaced NIC hardware.

# rm  /etc/udev/rules.d/70-persistent-net.rules
# reboot

Failed to connect to FastX Server

Do take a look at Failed to start a secure connection to the server

Basically, to fix this issue, run the command from your linux machine:

# killall fastx_server

Relaunch the FastX Client

Thursday, August 20, 2015

No irq handler for vector on CentOS 6

If you are getting error messages like the following in /var/log/messages on your CentOS 6, you may want to see the Red Hat Bugzilla entry "No irq handler for vector" error, sluggish system

Aug 20 11:22:58 node1 kernel: do_IRQ: 2.135 No irq handler for vector (irq -1)

Step 1: Edit the GRUB bootloader configuration. Add pci=nomsi,noaer to the end of the kernel options

# vim /boot/grub/menu.lst

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS 6 (2.6.32-504.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-504.el6.x86_64 ro root=/dev/mapper/vg_cherry-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_cherry/lv_root SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_cherry/lv_swap rd_NO_DM rhgb quiet pci=nomsi,noaer
        initrd /initramfs-2.6.32-504.el6.x86_64.img
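
If you have to do this on many nodes, editing by hand gets tedious. Here is a minimal sketch that appends an argument to every kernel line of a legacy GRUB config, skipping lines that already carry it. It assumes the config is at /boot/grub/menu.lst; the function name add_kernel_arg is hypothetical, so do try it on a copy first.

```shell
# Hypothetical helper: append a kernel argument to every "kernel" line of a
# legacy GRUB config, unless the argument is already present on that line.
# Usage: add_kernel_arg <argument> [config_file]
add_kernel_arg() {
    arg=$1
    conf=${2:-/boot/grub/menu.lst}   # assumed default location
    # Keep a backup copy and rebuild the config from it.
    cp -p "$conf" "$conf.bak" || return 1
    awk -v arg="$arg" '
        # Match kernel lines whose space-separated words do not include arg.
        /^[ \t]*kernel[ \t]/ && index(" " $0 " ", " " arg " ") == 0 {
            sub(/[ \t]+$/, "")      # drop trailing whitespace
            $0 = $0 " " arg         # append the new argument
        }
        { print }
    ' "$conf.bak" > "$conf"
}

# Example (as root): add_kernel_arg pci=nomsi,noaer
```

Running it a second time leaves the file unchanged, so it is safe to call from a node setup script.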

Step 2: Disable irqbalance daemon

# service irqbalance stop
# chkconfig --level 2345 irqbalance off

Step 3: Reboot the System

# reboot

Monday, August 17, 2015

Using Python 2 on JupyterHub

If you are installing and configuring JupyterHub, do take a look at Basic Setup and Configuration of JupyterHub with Python-3.4.3

By default, JupyterHub uses Python 3. However, you may want to use Python 2 on JupyterHub as well.

Step 1: Install latest version of Python-2
You may want to see Installing and Compiling Python 2.7.8 on CentOS 5. You can apply this for CentOS 6

Step 2: Remember to install iPython2 and iPython[notebook] on Python-2
Do take a look at Installing scipy and other scientific packages using pip3 for Python 3.4.1 for some similar ideas

Step 3: Install Python KernelSpec for Python 2

# /usr/local/python-2.7.10/bin/python2 -m IPython kernelspec install-self
# /usr/local/python-3.4.3/bin/python3 -m IPython kernelspec install-self

Step 4: Restart JupyterHub

# jupyterhub

Friday, August 14, 2015

Fixing Rsync out of memory Issues

If you are doing an rsync and encounter an error like rsync out of memory, you may want to take a look at this article (Rsync out of memory? Try this...). You need to add an additional parameter (--no-inc-recursive) to the rsync command.

According to the article, the out of memory failure occurred when rsync attempted to load all the filenames and info into RAM at startup. For example,

# rsync -lH -rva --no-inc-recursive --progress gromacs remote_server:/usr/local

References:

Commonly Used rsync Arguments

Wednesday, August 12, 2015

Cannot retrieve metalink for repository: epel. Please verify its path and try again

I was doing a yum install and I encountered an error

# yum install libstdc++-4.4.7-16.el6.x86_64
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
Error: Cannot retrieve metalink for repository: epel. 
Please verify its path and try again

The correct fix is to update your SSL certificates.

# yum upgrade ca-certificates --disablerepo=epel -y

Run the yum install again. It should work now.

# yum install libstdc++-4.4.7-16.el6.x86_64

Error when installing libXrender for CentOS 6

If you encounter an issue such as this, there is apparently an incompatibility between libXrender-0.9.8-2.1.el6.i686 and libXrender-0.9.7-2.el6.x86_64

# yum install libXrender

Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem. Eg.:

1. You have an upgrade for libXrender which is missing some
dependency that another package requires. Yum is trying to
solve this by installing an older version of libXrender of the
different architecture. If you exclude the bad architecture
yum will tell you what the root cause is (which package
requires what). You can try redoing the upgrade with
--exclude libXrender.otherarch ... this should give you an error
message showing the root cause of the problem.

2. You have multiple architectures of libXrender installed, but
yum can only see an upgrade for one of those arcitectures.
If you don't want/need both architectures anymore then you
can remove the one with the missing update and everything
will work.

3. You have duplicate versions of libXrender installed already.
You can use "yum check" to get yum show these errors.

...you can also use --setopt=protected_multilib=false to remove
this checking, however this is almost never the correct thing to
do as something else is very likely to go wrong (often causing
much more problems).

Protected multilib versions: libXrender-0.9.8-2.1.el6.i686 != libXrender-0.9.7-2.el6.x86_64
You could try using --skip-broken to work around the problem

To resolve the issue, you have to install libXrender-0.9.8-2.1.el6.x86_64

# yum install libXrender-0.9.8-2.1.el6.x86_64

Finally, do a

# yum install libXrender

Error when installing libXi for CentOS 6

If you encounter an issue such as this, there is apparently an incompatibility between libXi-1.7.2-2.2.el6.i686 and libXi-1.6.1-3.el6.x86_64

# yum install libXi

Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem. Eg.:

1. You have an upgrade for libXi which is missing some
dependency that another package requires. Yum is trying to
solve this by installing an older version of libXi of the
different architecture. If you exclude the bad architecture
yum will tell you what the root cause is (which package
requires what). You can try redoing the upgrade with
--exclude libXi.otherarch ... this should give you an error
message showing the root cause of the problem.

2. You have multiple architectures of libXi installed, but
yum can only see an upgrade for one of those arcitectures.
If you don't want/need both architectures anymore then you
can remove the one with the missing update and everything
will work.

3. You have duplicate versions of libXi installed already.
You can use "yum check" to get yum show these errors.

Protected multilib versions: libXi-1.7.2-2.2.el6.i686 != libXi-1.6.1-3.el6.x86_64
You could try using --skip-broken to work around the problem

To resolve the issue, we have to do a

# yum install libXi-1.7.2-2.2.el6.x86_64

and you will see quite a list of updates to some core libraries. Finally, you can do a

# yum install libXi

Error when installing libstdc++ for CentOS 6

If you encounter an issue such as this, there is apparently an incompatibility between libstdc++-4.4.7-16.el6.i686 and libstdc++-4.4.7-11.el6.x86_64.

# yum install libstdc++.so.6

Resolving Dependencies
--> Running transaction check
---> Package libstdc++.i686 0:4.4.7-16.el6 will be installed
--> Finished Dependency Resolution
Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem. Eg.:

1. You have an upgrade for libstdc++ which is missing some
dependency that another package requires. Yum is trying to
solve this by installing an older version of libstdc++ of the
different architecture. If you exclude the bad architecture
yum will tell you what the root cause is (which package
requires what). You can try redoing the upgrade with
--exclude libstdc++.otherarch ... this should give you an error
message showing the root cause of the problem.

2. You have multiple architectures of libstdc++ installed, but
yum can only see an upgrade for one of those arcitectures.
If you don't want/need both architectures anymore then you
can remove the one with the missing update and everything
will work.

3. You have duplicate versions of libstdc++ installed already.
You can use "yum check" to get yum show these errors.

Protected multilib versions: libstdc++-4.4.7-16.el6.i686 != libstdc++-4.4.7-11.el6.x86_64
You could try using --skip-broken to work around the problem
** Found 1 pre-existing rpmdb problem(s), 'yum check' output follows:
1:emacs-23.1-25.el6.x86_64 has missing requires of libotf.so.0()(64bit)

To resolve the issues, do a

# yum install libstdc++-4.4.7-16.el6.x86_64

and you will see quite a list of updates to some core libraries. Finally, you can do a

# yum install libstdc++
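
The libXrender, libXi and libstdc++ fixes all follow the same pattern: take the version on the i686 side of the "Protected multilib versions" line and install the x86_64 package at that version. A minimal sketch of pulling the package name out of the yum output is below. It assumes, as in the cases here, that the i686 package is the newer one and is listed first; the function name multilib_fix_package is hypothetical.

```shell
# Hypothetical helper: read yum output on stdin and print the x86_64 package
# name at the version of the i686 package from the
# "Protected multilib versions: A.i686 != B.x86_64" line.
multilib_fix_package() {
    sed -n 's/^Protected multilib versions: \(.*\)\.i686 != .*\.x86_64$/\1.x86_64/p'
}

# Example:
#   yum install libXi 2>&1 | multilib_fix_package
# prints the package to install first, which you can then pass to yum install.
```

If the line has the architectures the other way round, the function prints nothing, and you should read the yum message by hand.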

Thursday, July 23, 2015

GPFS Client Node cannot be added to the GPFS cluster

At the NSD Node, I issued the command

# mmaddnode -N node1
Thu Jul 23 13:40:12 SGT 2015: mmaddnode: Processing node node1
mmaddnode: Node node1 was not added to the cluster.
The node appears to already belong to a GPFS cluster.
mmaddnode: mmaddnode quitting.  None of the specified nodes are valid.
mmaddnode: Command failed.  Examine previous error messages to determine cause.

If we do a mmlscluster, the node is not in the cluster

# mmlscluster |grep node1

If the node is not in the cluster, issue this command on the client node that could not be added:

# mmdelnode -f
mmdelnode: All GPFS configuration files on node goldsvr1 have been removed.

Reissue the mmaddnode command.

References:

Node cannot be added to the GPFS cluster

permission denied on .gvfs on CentOS 6

If a user runs into this issue, which is often noticed when doing an rsync, do take a look at the forum thread permission denied on .gvfs

The solution I found was to umount the .gvfs

# umount /home/useraccount/.gvfs

Saturday, July 4, 2015

Deleting Users' Crontab

Users' crontabs are stored in /var/spool/cron/
If you need to clean out a user's crontab, you can delete the user's file there.
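
The usual way to remove a user's crontab is crontab -r -u username as root. If you would rather keep a copy before it disappears, a minimal sketch is below. The function name remove_user_crontab and the backup directory /root/crontab-backup are assumptions of mine, not anything cron itself provides.

```shell
# Hypothetical helper: move a user's crontab file out of the spool directory
# into a backup directory instead of deleting it outright.
# Usage: remove_user_crontab <user> [spool_dir] [backup_dir]
remove_user_crontab() {
    user=$1
    spool=${2:-/var/spool/cron}
    backup_dir=${3:-/root/crontab-backup}   # assumed location, change as needed
    [ -f "$spool/$user" ] || { echo "no crontab for $user" >&2; return 1; }
    # Timestamp the backup so repeated removals do not overwrite each other.
    mkdir -p "$backup_dir" &&
        mv "$spool/$user" "$backup_dir/$user.$(date +%Y%m%d%H%M%S)"
}

# Example (as root): remove_user_crontab user1
```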

Sunday, June 28, 2015

Resolving unreach or unavail nodes in OpenLava-3.0

After configuring OpenLava-3.0 using the tar ball following the OpenLava – Getting Started Guide, and after fixing the LIM is Down Error Messages for OpenLava-3.0, you may still see errors like

HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
compute-c00     unreach              -     16      0      0      0      0      0
headnode-h00     ok              -     16      0      0      0      0      0

Suggestions:

Check the permissions on the folder where openlava-3.0 resides. Make sure that both the HeadNode and the ComputeNode have the user and group openlava, and that openlava has permission on the folder
```
drwxr-xr-x. 10 openlava openlava 4096 Jun 26 00:32 openlava-3.0
```
Install pdsh on all the compute nodes. See Installing pdsh to issue commands to a group of nodes in parallel in CentOS. Then use pdcp to copy /etc/passwd, /etc/shadow and /etc/group to all the nodes
```
# pdcp -a /etc/passwd /etc
# pdcp -a /etc/shadow /etc
# pdcp -a /etc/group /etc
```
Make sure your /etc/hosts reflects the short hostnames of the cluster on both the HeadNode and the ComputeNode. Refrain from putting 2 hostnames per line.
Check your firewall settings. Make sure the ports 6322:6325 are opened.
Ensure NTP is synchronized across the clients and the HeadNode with the designated NTP Server.

Thursday, June 25, 2015

LIM is Down Error Messages for OpenLava-3.0

After configuring OpenLava-3.0 using the tar ball and following the instruction according to the OpenLava – Getting Started Guide

I was encountering errors like

# lsid
openlava project 3.0, June 25 2015
ls_getclustername(): LIM is down; try later

Debugging:

# service openlava stop

# vim /usr/local/openlava-3.0/etc/lsf.conf

# /usr/local/openlava-3.0/sbin/lim -2

Solution (Check first):
Check the output of

# hostname -s
# hostname -f

In your /etc/hosts, you may want to change to something like this. It solved my issues

127.0.0.1   headnode-h00 localhost
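
A small check that the name printed by hostname -s actually appears in /etc/hosts can save some time here. The sketch below is only an illustration; the function name host_in_hosts_file is hypothetical, and it looks at whole-line comments only, so a name mentioned in a trailing comment would still count as a match.

```shell
# Hypothetical helper: succeed if a host name appears as a name field
# (second column onwards) on a non-comment line of a hosts file.
# Usage: host_in_hosts_file <name> [hosts_file]
host_in_hosts_file() {
    name=$1
    hosts=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[ \t]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                       # exit 0 only when found
    ' "$hosts"
}

# Example:
#   host_in_hosts_file "$(hostname -s)" || echo "short hostname missing from /etc/hosts"
```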

Friday, May 29, 2015

Inappropriate ioctl for device MSG=cannot create job file for Torque

I encountered this error on the cluster.

qsub: submit error (PBS_Server System error: 
Inappropriate ioctl for device MSG=cannot create job file 
/var/spool/torque/server_priv/jobs/497741.headnode-h00.cluster.com 
(28 - No space left on device))

I did a df -h and noticed that there was still space. But when I did a df -i, I noticed that the IUse% was almost 100%, which means the inodes were used up. To find the directories that hold the most files

# find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n
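
The find command prints the parent directory (%h) of every file on the filesystem, and uniq -c counts how many times each directory appears. On a big filesystem the list gets long, so here is a small variant that shows only the busiest directories. It assumes GNU find (for -printf); the function name top_inode_dirs is hypothetical.

```shell
# Hypothetical helper: list the N directories under a path that contain
# the most entries, highest count first.
# Usage: top_inode_dirs [path] [N]
top_inode_dirs() {
    dir=${1:-/}
    n=${2:-10}
    # -xdev keeps find on one filesystem; %h is the directory holding each entry.
    find "$dir" -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n -r | head -n "$n"
}

# Example (as root): top_inode_dirs /var 5
```

A directory with hundreds of thousands of small files (spool or session directories are typical) is usually the culprit.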

References:

Find where inodes are being used

Thursday, May 21, 2015

Beware of Trojanized version of Putty SSH client distributed in the Wild

Summary
Reports say that a trojanized version of the open source SSH client PuTTY has been found being distributed in the wild.

Attacks
According to the report, the attack appears to occur in the following manner

The victim performs a search for PuTTY on a search engine.
The search engine provides multiple results for PuTTY. Instead of selecting the official home page for PuTTY, the victim unknowingly selects a compromised website.
The compromised website redirects the user several times, ultimately connecting them to an IP address in the United Arab Emirates. This site provides the user with the fake version of PuTTY to download.

Mitigation

Always ensure that you download the software only from the author's or publisher's official homepage.
Check the software's “About” information. According to the report, the malicious version can be recognized from what is shown there.

Friday, May 15, 2015

Buffer Overflow vulnerability within the QEMU system emulator

Red Hat Product Security is now aware of a 'buffer overflow' vulnerability within the QEMU system emulator, which is widely installed and used for virtualization purposes on Linux systems. QEMU is also used by Red Hat’s cloud and virtualization products.

The vulnerability is known as VENOM and is assigned the identifier CVE-2015-3456.

This vulnerability affects the Floppy Disk Controller (FDC) emulation implemented in QEMU and could cause VM guests to crash the host's hypervisor and potentially facilitate arbitrary code execution on the host via guests. Even if the guest does not explicitly enable an FDC, all x86 and x86_64 guests are vulnerable.

For more detailed information, do take a look at the Red Hat Security Blog: VENOM, don't get bitten

Resolving downed Interface Group on NetApp Cluster-Mode

1. Check the status of the ports on the node

netapp-cluster1::*> network port show
                                      Auto-Negot  Duplex     Speed (Mbps)
Node   Port   Role         Link   MTU Admin/Oper  Admin/Oper Admin/Oper
------ ------ ------------ ---- ----- ----------- ---------- ------------
netapp-cluster1-01
       a0a    data         down  1500  true/-     auto/-      auto/-
       e0a    data         up    1500  true/true  full/full   auto/1000
       e0b    data         up    1500  true/true  full/full   auto/1000
       e0c    data         up    1500  true/true  full/full   auto/100
       e0d    data         up    1500  true/true  full/full   auto/1000
2. If you have a LIF that should be on that node, do the following. The purpose is to let another node within the cluster be the home node for the data and mgmt LIFs while you bring the interface group down and up

netapp-cluster1::*> net int modify -vserver vs_StorageVNode11  -lif vs_StorageVNode11_data1 -home-node netapp-cluster1-02 -home-port a0a
netapp-cluster1::*> net int modify -vserver vs_StorageVNode11  -lif vs_StorageVNode11_mgmt1 -home-node netapp-cluster1-02 -home-port a0a
netapp-cluster1::*> net int revert *

3. Remove port e0c from the interface group, then bring the e0c port down and up

netapp-cluster1::*> ifgrp remove-port -node netapp-cluster1-01 -ifgrp a0a -port e0c
netapp-cluster1::*> net port modify -node netapp-cluster1-01 -port e0c -up-admin false
netapp-cluster1::*> net port modify -node netapp-cluster1-01 -port e0c -up-admin true
netapp-cluster1::*> net port show -node netapp-cluster1-01 -port a0a,e0c

Once e0c shows up at auto/1000, add the port back to the interface group and return the LIFs to netapp-cluster1-01

netapp-cluster1::*> ifgrp add-port -node netapp-cluster1-01 -ifgrp a0a -port e0c
netapp-cluster1::*> net port show -node netapp-cluster1-01 -port a0a
netapp-cluster1::*> net int modify -vserver vs_StorageVNode11 -lif vs_StorageVNode11_data1 -home-node netapp-cluster1-01 -home-port a0a
netapp-cluster1::*> net int modify -vserver vs_StorageVNode11 -lif vs_StorageVNode11_mgmt1 -home-node netapp-cluster1-01 -home-port a0a
netapp-cluster1::*> net int revert *

Wednesday, May 13, 2015

Node, port and Lif Information for NetApp

Useful References

How to determine the node, port, or lif to which a client should be connected
https://kb.netapp.com/support/index?page=content&id=1013873&locale=en_US&access=s
How to determine why a lif is on a certain port or node
https://kb.netapp.com/support/index?page=content&id=1013874&locale=en_US&access=s
Enabling and reverting LIFs to home ports
https://library.netapp.com/ecmdocs/ECMP1636041/html/GUID-7865FB3E-F57B-4976-803D-A87F2F760342.html

Friday, May 8, 2015

Open-Source Remote Desktop Solution X2Go

X2Go is an interesting Remote Desktop Solution for Linux and has the following features (from their website)

Graphical Remote Desktop that works well over both low bandwidth and high bandwidth connections
The ability to disconnect and reconnect to a session, even from another client
Support for sound
Support for as many simultaneous users as the computer's resources will support (NX3 free edition limited you to 2.)
Traffic is securely tunneled over SSH
File Sharing from client to server
Printer Sharing from client to server
Easily select from multiple desktop environments (e.g., MATE, GNOME, KDE)
Remote support possible via Desktop Sharing
The ability to access single applications by specifying the name of the desired executable in the client configuration or selecting one of the pre-defined common applications

Tuesday, May 5, 2015

Calculate the fingerprint of a key file

# ssh-keygen -l -f id_rsa.pub
2048 ....................................................     yournode@headnode.com

Thursday, April 30, 2015

How to find relocated N series downloads from IBM Portals

If you are looking for N Series downloads from IBM, you can find them via the link How to find relocated N series downloads

Tuesday, April 28, 2015

Error qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host

I encountered this error when following the Torque Administration Guide. To mitigate the error, remember to put the Torque binaries in the PATH

export PATH=$PATH:/opt/torque/x86_64/bin:/opt/torque/x86_64/sbin

Remember to source the file :)

If it still does not work after the above change, just create the pbs_server database manually

pbs_server -t create

Manually check that the pbs_server database is created

[root@headenode torque-4.2.10]# ps -afe|grep "pbs_server -t create"
root     26644     1  0 16:06 ?        00:00:01 pbs_server -t create
root     30318  2682  0 16:21 pts/0    00:00:00 grep pbs_server -t create

File Test Operators

If you are writing BASH scripts, you may want to look at the File Test Operators chapter of the Advanced Bash-Scripting Guide.
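
As a small illustration of a few of these operators (-e exists, -d is a directory, -f is a regular file, -s has a size greater than zero), here is a minimal sketch. The function name describe_path is made up for this example.

```shell
# Hypothetical helper: describe a path using common file test operators.
describe_path() {
    p=$1
    if [ ! -e "$p" ]; then
        echo "missing"            # -e: path exists
    elif [ -d "$p" ]; then
        echo "directory"          # -d: path is a directory
    elif [ -f "$p" ] && [ ! -s "$p" ]; then
        echo "empty file"         # -f: regular file, -s: size greater than zero
    elif [ -f "$p" ]; then
        echo "regular file"
    else
        echo "other"              # device, socket, pipe and so on
    fi
}

# Example: describe_path /etc/hosts
```

Remember to quote the path inside the brackets, otherwise names with spaces will break the test.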

Simple BASH script to setup shared SSH keys on Cluster

Do take a look at Simple BASH script to setup shared SSH keys on Cluster.

This can be run in the user directory to allow passwordless access through nodes in the cluster.

SSH-keygen non-interactive command

If you want a non-interactive ssh-keygen command direct from ssh-keygen itself, see

# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

Wednesday, April 22, 2015

Installing scipy and other scientific packages using pip3 for Python 3.4.1

I wanted to install the packages using pip3. Before you can successfully install the python packages, do note that you have to make sure the following packages are found in your CentOS 6.

# yum install blas blas-devel lapack lapack-devel numpy

After you install Python according to Compiling and Configuring Python 3.4.1 on CentOS, the packages that I want to install are numpy, scipy, matplotlib, ipython, ipython[notebook], pandas, sympy and nose

# /usr/local/python-3.4.1/bin/pip install numpy
# /usr/local/python-3.4.1/bin/pip install scipy
# /usr/local/python-3.4.1/bin/pip install matplotlib
# /usr/local/python-3.4.1/bin/pip install ipython[notebook]
# /usr/local/python-3.4.1/bin/pip install pandas
# /usr/local/python-3.4.1/bin/pip install sympy
# /usr/local/python-3.4.1/bin/pip install nose

Monday, April 20, 2015

Accessing LSF batch job ID and array ID

The variables are as follows:

LSB_JOBID: LSF assigned job ID
LSB_BATCH_JID: Array job ID. Includes job ID and array index number
LSB_JOBINDEX: Job array index
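
One common use of these variables is to give every job, or every element of a job array, its own output file. A minimal sketch follows. It assumes LSB_JOBINDEX is unset or 0 when the job is not part of an array; the function name lsf_output_name and the file naming scheme are my own.

```shell
# Hypothetical helper: build a per-task output file name from the LSF variables.
# Usage: lsf_output_name [prefix]
lsf_output_name() {
    prefix=${1:-result}
    # Assumption: LSB_JOBINDEX is unset or 0 for a non-array job.
    if [ -n "${LSB_JOBINDEX:-}" ] && [ "$LSB_JOBINDEX" != "0" ]; then
        echo "${prefix}.${LSB_JOBID}.${LSB_JOBINDEX}.out"
    else
        echo "${prefix}.${LSB_JOBID}.out"
    fi
}

# Example inside a job script ("./my_program" is a placeholder):
#   ./my_program > "$(lsf_output_name run)"
```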

References:

Accessing LSF batch job ID and array ID within job environment

Thursday, April 16, 2015

APT30 and the Mechanics of a long-running cyber espionage operation

FireEye has released a report detailing the movements, methods and motives of a cyber-hacking group named “APT30”, which specializes in compromising corporations, institutions and government agencies in the South East Asian geographical area.

For more information, do take a look at https://www2.fireeye.com/rs/fireye/images/rpt-apt30.pdf

Saturday, April 4, 2015

Using ibdiagnet to generate topology of the network.

You can use the ibdiagnet to generate the topology of the IB Network simply by using the "-w" switch

# ibdiagnet -w /var/tmp/ibdiagnet2/topology.top
.....
.....
-I- ibdiagnet database file   : /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
-I- LST file                  : /var/tmp/ibdiagnet2/ibdiagnet2.lst
-I- Topology file             : /var/tmp/ibdiagnet2/topology.top
-I- Subnet Manager file       : /var/tmp/ibdiagnet2/ibdiagnet2.sm
-I- Ports Counters file       : /var/tmp/ibdiagnet2/ibdiagnet2.pm
-I- Nodes Information file    : /var/tmp/ibdiagnet2/ibdiagnet2.nodes_info
-I- Partition keys file       : /var/tmp/ibdiagnet2/ibdiagnet2.pkey
-I- Alias guids file          : /var/tmp/ibdiagnet2/ibdiagnet2.aguid

# vim /var/tmp/ibdiagnet2/topology.top

# This topology file was automatically generated by IBDM

SX6036G Left-Leaf-SW03
U1/P1 -4x-14G-> HCA_1 mtlacad05 U1/P1
U1/P17 -4x-14G-> SX6012 Right-Spine-SW02 U1/P2
U1/P18 -4x-14G-> SX6012 Left-Spine-SW01 U1/P2
U1/P2 -4x-14G-> HCA_1 mtlacad07 U1/P1
U1/P3 -4x-14G-> HCA_1 mtlacad03 U1/P1
U1/P4 -4x-14G-> HCA_1 mtlacad04 U1/P1
U1/P6 -4x-14G-> HCA_1 mtlacad06 U1/P1
.....
.....

Monday, March 30, 2015

Using ibdev2netdev to quickly identify ports

ibdev2netdev is a nice tool to quickly identify which IB device port maps to which network interface, such as ib0

[root@headnode-h99 ~]# ibdev2netdev
mlx4_0 port 1 ==> ib0 (Up)
mlx4_0 port 2 ==> ib1 (Down)

Tools for Performance Test for IB

ibportstate

Enables the querying of the logical link and physical port states of an IB Port.
Displays information such as LinkSpeed, LinkWidth and extended link speed
Allows adjusting of link speed that is enabled on any IB Port

# ibportstate LID PortNumber

# Port info: Lid 15 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................15
SMLid:...........................1
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
# MLNX ext Port info: Lid 15 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00

Friday, March 27, 2015

Leap Second on 30th June 2015 and effects on CentOS and RHEL

At 11:59 p.m. on June 30, clocks will count up all the way to 60 seconds. That will allow the Earth's spin to catch up with atomic time.

Background - http://www.usatoday.com/story/tech/2015/01/08/computer-chaos-feares/21433363/

All of Red Hat Enterprise Linux 4, 5, 6 & 7 will be affected.

*Resolve Leap Second Issues in Red Hat Enterprise Linux
https://access.redhat.com/articles/15145

*Are we susceptible to a leap second event?
https://access.redhat.com/articles/199563

*Labs: Leap Second Issue Detector
https://access.redhat.com/labs/leapsecond/

Basic Configuration of Octopus 4.1.2 with OpenMPI on CentOS 6

Octopus is a scientific program aimed at the ab initio virtual experimentation on a hopefully ever-increasing range of system types. Electrons are described quantum-mechanically within density-functional theory (DFT), in its time-dependent form (TDDFT) when doing simulations in time. Nuclei are described classically as point particles. Electron-nucleus interaction is described within the pseudopotential approximation.

Do take a look at the installation writeup by linuxcluster Basic Configuration of Octopus 4.1.2 with OpenMPI on CentOS 6

Saturday, March 21, 2015

Unable to Submit via Torque Submission Node - Socket_Connect Error for Torque 4.2.7

I am using Torque Server version 4.2.7 and was trying to configure a Submission Node. Here is a sample of my qmgr -c "p s" output. The firewall already allows the necessary traffic.

# qmgr -c "p s"
.......... 
set server acl_hosts = submission_node.cluster.spms.ntu.edu.sg
set server acl_hosts += head_node.cluster.spms.ntu.edu.sg
set server submit_hosts = submission_node.cluster.spms.ntu.edu.sg
set server submit_hosts += head_node.cluster.spms.ntu.edu.sg
set server allow_node_submit = True 
.......

After I ssh into the submission_node and try to submit as a normal user, I get these errors. Yes, the submission_node has been configured as a conventional client.

socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
socket_connect error (VERIFY THAT trqauthd IS RUNNING)
Error in connection to trqauthd (15137)-[could not connect to unix socket /tmp/trqauthd-unix: 111]
Unable to communicate with head_node(10.10.10.20)
Communication failure. qsub: cannot connect to server head_node (errno=15137) could not connect to trqauthd

Taking a look at the Torque 4.2.7 documentation, the documentation mentioned that you have to make sure the submission node have trqauthd script at /etc/init.d if you are using RH / CentOS. You can easily scp the /etc/init.d/trqauthd to the submision node

From the head_node

# scp -v /etc/init.d/trqauthd root@submssion_node:/etc/init.d/

Create a /etc/hosts.equiv file

# touch /etc/hosts.equiv

Put the host name of the Submission_Node in the /etc/hosts.equiv of the head_node

submission_node

At the Submission_Node, start the trqauthd service

# service trqauthd start

Now try submitting as a normal user.
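
If you repeat this on several nodes, the /etc/hosts.equiv step above can be made repeatable with a small helper. This is only a sketch: the function name add_equiv_host is my own and is not part of Torque, and it assumes one host name per line in the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): append HOST to FILE only if an
# identical line is not already present. FILE defaults to /etc/hosts.equiv;
# pass another path as the second argument to try it out safely first.
add_equiv_host() {
    host=$1
    file=${2:-/etc/hosts.equiv}
    # Create the file if it does not exist yet
    touch "$file" || return 1
    # -x matches the whole line, -F treats the host name as a fixed string
    if grep -qxF -- "$host" "$file"; then
        return 0
    fi
    printf '%s\n' "$host" >> "$file"
}
```

On the head_node, as root, you would call it as add_equiv_host submission_node. Running it a second time leaves the file unchanged.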

Tuesday, March 17, 2015

Where to download Intel Compiler?

I often have to google for a while before I can locate the download site for our purchased Intel Compiler. Here is the link just in case I forget again. Just log on and you can access the Intel Compilers.

https://registrationcenter.intel.com/RegCenter/MyProducts.aspx

Enabling Massive Multi-GPU Scaling and Peering

Do take a look at http://www.cirrascale.com/ for high density Multi-GPU Scaling and Peering.

Monday, March 9, 2015

Enabling Predictive Cache Statistics (PCS) for Data OnTap 8.2p

* node1 is the controller currently primary to the aggregate/vol/LUN.

Step 1: Enable PCS

node1::> node run -node node1
node1::> options flexscale.enable pcs
node1::> options flexscale.enable
flexscale.enable pcs    <- you should see this
node1::> options flexscale.pcs_size 330GB    <- based on 3 x 200GB SSD RAID4

Step 2: Run a representative workload

Step 3: Collect data throughout the process

node1::>stats show -p flexscale-access

NetApp recommends issuing this command through an SSH connection and logging the output throughout the observation period because you want to capture and observe the peak performance of the system and the cache. This output can also be easily imported into spreadsheet software, graphed, and so on.

This process initially provides information on the “cold” state of the emulated cache. That is, no data is in the cache at the start of the test, and the cache is filled as the workload runs. The best time to observe the emulated cache is once it is filled, or “warmed”, as this will be the point when it enters a steady state. Filling the emulated cache can take a considerable amount of time and depends greatly on the workload.

Sunday, March 8, 2015

Using Tuned to tune CentOS 6 System

Tuned is a dynamic adaptive system tuning daemon. According to the manual page:

tuned is a dynamic adaptive system tuning daemon that tunes system settings dynamically depending on usage. For each hardware subsystem a specific monitoring plugin collects data periodically. This information is then used by tuning plugins to change system settings to lower or higher power saving modes in order to adapt to the current usage. Currently monitoring and tuning plugins for CPU, ethernet network and ATA harddisk devices are implemented.

Using Tuned

1. Installing tuned

# yum install tuned

2. To view a list of available tuning profiles

[root@myCentOS ~]# tuned-adm list
Available profiles:
- laptop-ac-powersave
- server-powersave
- laptop-battery-powersave
- desktop-powersave
- virtual-host
- virtual-guest
- enterprise-storage
- throughput-performance
- latency-performance
- spindown-disk
- default

3. Tuning to a specific profile

# tuned-adm profile latency-performance
Switching to profile 'latency-performance'
Applying deadline elevator: dm-0 dm-1 dm-2 sda             [  OK  ]
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                                [  OK  ]
Calling '/etc/ktune.d/tunedadm.sh start':                  [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned:                                            [  OK  ]

4. Checking current tuned profile used and its status

# tuned-adm active
Current active profile: latency-performance
Service tuned: enabled, running
Service ktune: enabled, running

5. Turning off the tuned daemon

# tuned-adm off
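
If you want to check the active profile from a script, the output shown in step 4 can be parsed with sed. This is a minimal sketch: the function name active_profile is my own, and it assumes the "Current active profile:" line looks exactly like the CentOS 6 output above (other tuned versions may print it differently).

```shell
#!/bin/sh
# Hypothetical helper: read `tuned-adm active` output on stdin and print
# only the profile name. Assumes the "Current active profile: NAME" format
# shown in step 4; prints nothing if no such line is found.
active_profile() {
    sed -n 's/^Current active profile: //p'
}
```

Use it as tuned-adm active | active_profile, for example inside a test like [ "$(tuned-adm active | active_profile)" = "latency-performance" ].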

References:

Tuning Your System With Tuned (http://servicesblog.redhat.com)

Compiling Gromacs 5.0.4 on CentOS 6

Compiling Gromacs has never been easier now that it uses cmake. There are a few assumptions.

Use MKL and Intel Compilers
Use OpenMPI as the MPI-of-choice. The necessary PATH and LD_LIBRARY_PATH have been placed in .bashrc
We will use SINGLE precision for speed, and build an MPI-enabled, mdrun-only binary
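
Before configuring, it can save time to confirm that the compilers and MPI wrapper from the assumptions above are really on your PATH. Here is a small sketch; the function name require_tools is my own, and the tool names in the usage line (icc, icpc, mpicc) are what I would expect from an Intel Compiler plus OpenMPI setup, so adjust them to yours.

```shell
#!/bin/sh
# Hypothetical helper: report every named tool that is not found on PATH.
# Returns 0 when all tools are present, 1 if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, require_tools icc icpc mpicc && echo "ok to configure". If anything is reported missing, check the PATH lines in your .bashrc first.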

Here is my configuration using the Intel Compilers

# tar xfz gromacs-5.0.4.tar.gz
# cd gromacs-5.0.4
# mkdir build
# cd build

# /usr/local/cmake-3.1.3/bin/cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-5.0.4 -DGMX_MPI=on -DGMX_FFT_LIBRARY=mkl \
-DGMX_DOUBLE=off -DGMX_BUILD_MDRUN_ONLY=on -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc

# make
# make check
# sudo make install
# source /usr/local/gromacs-5.0.4/bin/GMXRC

References:

Compiling Gromacs 5.0.4 on CentOS 6 (linuxcluster.wordpress.com)

Friday, March 6, 2015

FREAK (Factoring Attack on RSA-EXPORT Keys) Attack

The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography (weak export cipher suites), which can then be decrypted.

It is recommended to update to the latest software patches. OpenSSL (CVE-2015-0204): versions before 1.0.1k are vulnerable.
For non-OpenSSL servers, disable support for any export cipher suites and known insecure ciphers on your web server.
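
To see whether a cipher list still contains export suites, you can filter it for names starting with EXP, which is how OpenSSL names its export-grade suites (for example EXP-RC4-MD5). The sketch below is only a quick local check under that naming assumption; the function name export_ciphers is my own, and it does not replace a proper scan of the server.

```shell
#!/bin/sh
# Hypothetical helper: take a colon-separated cipher list (the format
# printed by `openssl ciphers`) and print the export-grade entries, one
# per line. Prints nothing when the list contains no EXP* suites.
export_ciphers() {
    printf '%s\n' "$1" | tr ':' '\n' | grep '^EXP'
}
```

For example, export_ciphers "$(openssl ciphers 'ALL')" lists the export suites your local OpenSSL build still knows about. An empty result means none are compiled in or enabled for that cipher string.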

Solutions:

Use the latest version of Chrome/IE/Mozilla instead of the Android Browser and Safari.
Check if your site is vulnerable. SSL Labs - https://www.ssllabs.com/ssltest/

References:

FREAK Attack - https://freakattack.com/
Graham Cluley - https://grahamcluley.com/2015/03/freak-attack-what-is-it-heres-what-you-need-to-know/
Recommended Configuration - https://wiki.mozilla.org/Security/Server_Side_TLS#Recommended_configurations

do_vfs_lock: VFS is out of sync with lock manager for CentOS 5

If you are seeing the "do_vfs_lock: VFS is out of sync with lock manager" messages on your screen or in your log file, here is the explanation according to the RedHat Site:

The message will be printed whenever there is locking contention (two or more processes trying to lock the same file) and the mount had nolock specified.

The RHEL-5 code prints the message unconditionally, while on the upstream code it is a debugging message, so it won't be seen on normal operation there.

Do take a look at your /etc/fstab and the mount options. You should remove the "nolock" option.
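
To find the affected entries quickly, you can list the NFS mount points in /etc/fstab that carry the nolock option. This is a sketch with a function name of my own (nfs_nolock_mounts); it assumes the usual whitespace-separated fstab layout with the file system type in the third field and the options in the fourth.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every NFS entry in an
# fstab-style file that has "nolock" among its comma-separated options.
# The file defaults to /etc/fstab; comment lines are skipped.
nfs_nolock_mounts() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ && ("," $4 ",") ~ /,nolock,/ { print $2 }' \
        "${1:-/etc/fstab}"
}
```

Run nfs_nolock_mounts with no arguments to check /etc/fstab. Mounts made by hand or by the automounter are not covered, so also check the output of mount for nolock.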

References:

Many "do_vfs_lock: VFS is out of sync with lock manager" messages on a "-o nolock" NFS mount in RHEL?

Thursday, February 26, 2015

Red Hat Forum 2014 - Singapore

Watch Red Hat Forum 2014 Sessions Now, On Demand. See Red Hat Forum 2014 – Singapore

Wednesday, February 25, 2015

Samba Remote Code Execution Vulnerability

An uninitialized pointer use flaw was found in the Samba daemon (smbd). A malicious Samba client could send specially crafted netlogon packets that, when processed by smbd, could potentially lead to arbitrary code execution with the privileges of the user running smbd (by default, the root user).

For more details about the vulnerability or information on updating your Samba connections, see
CVE Page: https://access.redhat.com/security/cve/CVE-2015-0240
KCS Article: https://access.redhat.com/articles/1346913
KCS Solution: https://access.redhat.com/solutions/1351573

Workaround / Advice
It is recommended to update to the latest software patches.

Other references:
Please refer to the TNAS report 24 February 2015 (Ref: 24022015-02) for additional information
Samba - https://www.samba.org/samba/security/CVE-2015-0240
US-Cert - https://www.us-cert.gov/ncas/current-activity/2015/02/24/Samba-Remote-Code-Execution-Vulnerability
Tripwire - http://www.tripwire.com/state-of-security/vulnerability-management/vert-threat-alert-samba-remote-code-execution/
