Thursday, December 27, 2012

Prepending number of lines to standard output using nl

nl copies files to standard output with line numbers added. It is flexible: it can number only non-blank lines and can left- or right-justify the numbers.

Usage 1: To add line numbers to all lines
$ nl -ba /etc/hosts.allow 

     1  #
     2  # hosts.allow   This file describes the names of the hosts which are
     3  #               allowed to use the local INET services, as decided
     4  #               by the '/usr/sbin/tcpd' server.
     5  #
     6
where -ba => prepend numbers to all lines

Usage 2: To add line numbers to non-blank lines only
$ nl -bt /etc/hosts.allow
     
     1  #
     2  # hosts.allow   This file describes the names of the hosts which are
     3  #               allowed to use the local INET services, as decided
     4  #               by the '/usr/sbin/tcpd' server.
     5  #
where -bt => prepend numbers to non-blank lines only

Usage 3: Format the numbering to Left Justify
$ nl -bt -nln /etc/hosts.allow

1       #
2       # hosts.allow   This file describes the names of the hosts which are
3       #               allowed to use the local INET services, as decided
4       #               by the '/usr/sbin/tcpd' server.
5       #
where -bt => prepend numbers to non-blank lines only
-nln => left-justify the numbers

Usage 4: Format the numbering to Right Justify
$ nl -bt -nrn /etc/hosts.allow

     1  #
     2  # hosts.allow   This file describes the names of the hosts which are
     3  #               allowed to use the local INET services, as decided
     4  #               by the '/usr/sbin/tcpd' server.
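     5  #
where -bt => prepend numbers to non-blank lines only
-nrn => right-justify the numbers without leading zeros (this is the default format)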
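
The four usages can be compared side by side on a small throwaway file. This is only a sketch, assuming a POSIX-compatible nl (such as the one in GNU coreutils); the sample file and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical three-line sample with a blank line in the middle.
sample=$(mktemp)
printf 'first\n\nthird\n' > "$sample"

# -ba numbers every line, including the blank one.
all_lines=$(nl -ba "$sample")

# -bt numbers only the non-blank lines.
text_lines=$(nl -bt "$sample")

# -nln left-justifies the number; -nrn (the default) right-justifies it.
left=$(nl -bt -nln "$sample")
right=$(nl -bt -nrn "$sample")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$all_lines" "$text_lines" "$left" "$right"
rm -f "$sample"
```

Since -nrn is the default format, the last two blocks of output should differ only in where the number sits within its column.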

Wednesday, December 26, 2012

Switching between Ethernet and Infiniband using Virtual Protocol Interconnect (VPI)

This short writeup is a summary of the article Switching between Ethernet and Infiniband using Virtual Protocol Interconnect (VPI). For the hardware side you will need the QSA Adapter (QSFP+ to SFP+ adapter), which Mellanox describes as the world's first solution for the QSFP to SFP+ conversion challenge, connecting 40Gb/s or Infiniband QSFP ports to 10G/1G networks. For more information on the hardware, see Quad to Serial Small Form Factor Pluggable (QSA) Adapter.


For the full article, see Switching between Ethernet and Infiniband using Virtual Protocol Interconnect (VPI)

Overview
mlx4 is the low level driver implementation for the ConnectX adapters designed by Mellanox Technologies. The ConnectX can operate as an InfiniBand adapter, as an Ethernet NIC, or as a Fibre Channel HBA. The driver in OFED 1.4 supports Infiniband and Ethernet NIC configurations. To accommodate the supported configurations, the driver is split into three modules:
  1. mlx4_core
    Handles low-level functions like device initialization and firmware commands processing. Also controls resource allocation so that the InfiniBand and Ethernet functions can share the device without interfering with each other.
  2. mlx4_ib
    Handles InfiniBand-specific functions and plugs into the InfiniBand midlayer
  3. mlx4_en
    A new 10G driver named mlx4_en was added to drivers/net/mlx4. It handles Ethernet specific functions and plugs into the netdev mid-layer.
Using Virtual Protocol Interconnect (VPI) to switch between Ethernet and Infiniband
Loading Drivers
  1. The VPI driver is a combination of the Mellanox ConnectX HCA Ethernet and Infiniband drivers. It supplies the user with the ability to run Infiniband and Ethernet protocols on the same HCA.
  2. Check that the mlx4_en driver is set to load: ensure that MLX4_EN_LOAD=yes is set in /etc/infiniband/openib.conf
    # vim /etc/infiniband/openib.conf
    # Load MLX4_EN module
    MLX4_EN_LOAD=yes
  3. If MLX4_EN_LOAD=no is set, the Ethernet driver can still be loaded manually by running
    # /sbin/modprobe mlx4_en
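The check in step 2 can also be scripted rather than done by eye in vim. A minimal sketch, assuming the setting appears as a plain KEY=value line; the function name and the OPENIB_CONF override variable are hypothetical and not part of OFED.

```shell
#!/bin/sh
# Succeed if the given openib.conf-style file sets MLX4_EN_LOAD=yes.
# mlx4_en_load_enabled is a hypothetical helper; only the key name comes from the text.
mlx4_en_load_enabled() {
    grep -q '^[[:space:]]*MLX4_EN_LOAD=yes[[:space:]]*$' "$1"
}

# OPENIB_CONF is a made-up override so the sketch can be pointed at a test file.
conf=${OPENIB_CONF:-/etc/infiniband/openib.conf}
if [ -r "$conf" ] && mlx4_en_load_enabled "$conf"; then
    echo "mlx4_en is set to load at startup"
else
    echo "mlx4_en is not enabled in $conf; load it manually with: /sbin/modprobe mlx4_en"
fi
```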
Port Management / Driver Switching
  1. Show Port Configuration
    # /sbin/connectx_port_config -s
    --------------------------------
    Port configuration for PCI device: 0000:16:00.0 is:
    eth
    eth
    --------------------------------
  2. Looking at saved configuration
    # vim /etc/infiniband/connectx.conf
  3. Switching between Ethernet and Infiniband
    # /sbin/connectx_port_config
  4. Configuration supported by VPI
    - The following configurations are supported by VPI:
     Port1 = eth   Port2 = eth
     Port1 = ib    Port2 = ib
     Port1 = auto  Port2 = auto
     Port1 = ib    Port2 = eth
     Port1 = ib    Port2 = auto
     Port1 = auto  Port2 = eth
    
      Note: the following options are not supported:
     Port1 = eth   Port2 = ib
     Port1 = eth   Port2 = auto
     Port1 = auto  Port2 = ib
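The supported and unsupported combinations above can be captured in a small lookup, which is handy before scripting a port change. This is only a sketch: vpi_config_supported is a hypothetical helper and not part of the Mellanox tools.

```shell
#!/bin/sh
# Check a Port1/Port2 pair against the combinations listed as supported above.
# vpi_config_supported is a hypothetical helper, not a Mellanox command.
vpi_config_supported() {
    case "$1,$2" in
        eth,eth|ib,ib|auto,auto|ib,eth|ib,auto|auto,eth) return 0 ;;
        *) return 1 ;;
    esac
}

# Try one supported and one unsupported pair.
for pair in "ib eth" "eth ib"; do
    set -- $pair
    if vpi_config_supported "$1" "$2"; then
        echo "Port1=$1 Port2=$2: supported"
    else
        echo "Port1=$1 Port2=$2: not supported"
    fi
done
```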
For more information, see
  1. ConnectX-3 VPI Single and Dual QSFP+ Port Adapter Card User Manual (pdf)
  2. Open Fabrics Enterprise Distribution (OFED) ConnectX driver (mlx4) in OFED 1.4 Release Notes

Friday, December 21, 2012

Quad to Serial Small Form Factor Pluggable (QSA) Adapter


Quad to Serial Small Form Factor Pluggable (QSA) Adapter designed by Mellanox Technologies is the world’s first solution for the QSFP to SFP+ conversion challenge.

The QSA enables smooth, cost-effective, connections between Virtual Protocol Interconnect® (VPI) or 40 Gigabit Ethernet adapters using contemporary QSFP ports and 1 or 10 Gigabit Ethernet networks using existing SFP or SFP+ based cabling. Similarly Ethernet switches with 40Gb/s QSFP ports can connect to servers with 10Gb/s Ethernet NIC ports using QSA.

For more information, see Quad to Serial Small Form Factor Pluggable (QSA) Adapter from Mellanox

Thursday, December 20, 2012

Using getent to query /etc/nsswitch.conf

The getent program queries the databases configured in /etc/nsswitch.conf, so it returns entries from local files as well as from any other source listed there. Some typical uses:

Example 1: To get the passwd entry for user1, you would run
# getent passwd user1
user1:x:604:100:User 1:/home/user1:/bin/bash

Example 2: To get the hosts entry for node1, you would run
# getent hosts node1
192.168.1.5     node1.private.mycluster.com node1

Example 3: To get the group entry for gaussian, you would run
# getent group gaussian
gaussian:x:501:user1,user2
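
Because getent prints colon-separated records, individual fields are easy to pick out in a script. A minimal sketch, assuming the usual passwd(5) field order; passwd_field is a hypothetical helper and root is used only because it exists on practically every system.

```shell
#!/bin/sh
# Print one colon-separated field of a user's passwd entry, as resolved by getent.
# passwd_field is a hypothetical helper; field numbers follow the passwd(5) layout.
passwd_field() {
    getent passwd "$1" | cut -d: -f"$2"
}

user=root
echo "uid:   $(passwd_field "$user" 3)"
echo "home:  $(passwd_field "$user" 6)"
echo "shell: $(passwd_field "$user" 7)"
```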

For more information, see getent Linux manual pages

Wednesday, December 19, 2012

Information for configuring Microsoft HPC Server

If you are looking to configure Microsoft Windows HPC Server 2008 R2, you may want to take a look at the resources here:

  1. Windows HPC Server 2008 R2 - Step by Step (pdf) (Resource Kit)
  2. Windows HPC Server 2008 R2

Tuesday, December 18, 2012

Commercial Solution for Check-pointing by Smart Suspend

If you are looking for a commercial checkpoint solution where executing jobs can be reliably suspended and resumed at will, you may want to take a look at SmartSuspend by Jaryba. According to the SmartSuspend features page on its website:

Jaryba SmartSuspend (SSR) is a grid workload management solution that enables executing jobs to be reliably suspended and resumed at will. While suspended, the job's hardware (CPU and memory) and license resources are released, making those resources available to other jobs. As a user space technology, SSR achieves this without any modification to the underlying operating system (OS) or the applications under management. Licenses, memory and CPU are cleanly reacquired when a job is resumed.

For more information on how SmartSuspend works, see
  1. How SmartSuspend Suspend Works
  2. Suspension Examples
  3. Using SmartSuspend

Monday, December 17, 2012

QUEST 1.3.0 and forrtl severe (173) error

If you are running code that uses QUEST 1.3.0 compiled with Intel XE, you may encounter the error

forrtl: severe (173): 
A pointer passed to DEALLOCATE points to an array that cannot be deallocated

Do note that QUEST used to work with Intel's ifort, but Intel has tightened its standards for memory allocation/deallocation, hence the error you see. It is recommended that you use gfortran for the compilation.

Wednesday, December 12, 2012

Good Redbook read - IBM Platform Computing Solutions



A good Redbook read on IBM Platform Computing Solutions. The abstract, taken from the site:

This IBM® Platform Computing Solutions Redbooks® publication is the first book to describe each of the available offerings that are part of the IBM portfolio of Cloud, analytics, and High Performance Computing (HPC) solutions for our clients. This IBM Redbooks publication delivers descriptions of the available offerings from IBM Platform Computing that address challenges for our clients in each industry. We include a few implementation and testing scenarios with selected solutions............

The chapters are as follows:

Chapter 1. Introduction to IBM Platform Computing
Chapter 2. Technical computing software portfolio
Chapter 3. Planning
Chapter 4. IBM Platform Load Sharing Facility (LSF) product family
Chapter 5. IBM Platform Symphony
Chapter 6. IBM Platform High Performance Computing
Chapter 7. IBM Platform Cluster Manager Advanced Edition
Appendix A. IBM Platform Computing Message Passing Interface
Appendix B. Troubleshooting examples
Appendix C. IBM Platform Load Sharing Facility add-ons and examples
Appendix D. Getting started with KVM provisioning

Monday, December 10, 2012

Modifying default template for user settings in Linux

If you wish to add or modify a standard template for new users, put the files in /etc/skel. The /etc/skel directory acts as a container for the typical .bashrc, .bash_profile or other scripts that you want every new user to have by default. In CentOS, you would typically see

drwxr-xr-x   3 root root  4096 Oct 31 13:12 .
drwxr-xr-x 126 root root 12288 Dec 11 22:48 ..
-rw-r--r--   1 root root    33 Jan 22  2009 .bash_logout
-rw-r--r--   1 root root   290 Oct 31 13:12 .bash_profile
-rw-r--r--   1 root root   176 Jan 22  2009 .bash_profile.old
-rw-r--r--   1 root root   461 Oct 31 13:12 .bashrc
-rw-r--r--   1 root root   124 Jan 22  2009 .bashrc.old
-rw-r--r--   1 root root   515 Jun 15  2008 .emacs
drwxr-xr-x   4 root root  4096 Sep  9  2010 .mozilla
-rw-r--r--   1 root root   658 Sep 22  2009 .zshrc

So when you run useradd, it applies the defaults described in Default values used by useradd command and copies the template files found in /etc/skel into the new user's home directory.
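
The skeleton copy can be tried out safely in scratch directories, without root access and without touching the real /etc/skel. The sketch below only approximates what useradd does when it creates a home directory (the real command also sets ownership for the new user); the directories and the alias are made up for illustration.

```shell
#!/bin/sh
# Simulate the skeleton copy in scratch directories so no root access is needed.
# skel and new_home stand in for /etc/skel and the new user's home directory.
skel=$(mktemp -d)
new_home=$(mktemp -d)

# A site-wide default, as you might add to /etc/skel.
printf 'alias ll="ls -l"\n' > "$skel/.bashrc"

# Copy everything, including dotfiles, preserving attributes.
cp -a "$skel/." "$new_home/"

# The new home now contains the template file.
ls -A "$new_home"
```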

Sunday, December 9, 2012

Black Screen when reconnecting back to old VNC Server when hostname was changed

When I was reconnecting to an old VNC session, I got a black screen and the session was unresponsive. There was no way to get back to the contents of the screen. Prior to reconnecting, the hostname on the VNC Server had been changed.

VNC uses the hostname and the session id to identify a session. You can see this in the contents of ~/.vnc/

$ ls ~/.vnc/

headnode-h00.mycluster.sg:33.pid
headnode-h00.mycluster.sg:33.log
headnode-h00.mycluster.sg:40.pid
headnode-h00.mycluster.sg:40.log
headnode-h00.mycluster.sg:42.log

To get back to a session, and assuming your VNC Server is running and reachable over the network, check that the hostname of the server has not been accidentally changed.

$ hostname

headnode-h00.mycluster.sg
If the hostname has changed, do read the blog entries
  1. Changing the hostname on CentOS 
  2. Another look at Changing hostname for CentOS 
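
The comparison between the current hostname and the names recorded in ~/.vnc/ can be automated. A minimal sketch, assuming the host:id.pid naming shown in the listing above; stale_vnc_sessions is a hypothetical helper and not part of any VNC package.

```shell
#!/bin/sh
# List sessions whose pid file was created under a different hostname.
# Arguments: the .vnc directory and the hostname to compare against.
stale_vnc_sessions() {
    vnc_dir=$1
    current_host=$2
    for pid_file in "$vnc_dir"/*.pid; do
        [ -e "$pid_file" ] || continue          # no pid files at all
        name=$(basename "$pid_file" .pid)       # e.g. host.example:33
        session_host=${name%:*}                 # strip the :id suffix
        if [ "$session_host" != "$current_host" ]; then
            echo "$name"
        fi
    done
}

stale_vnc_sessions "$HOME/.vnc" "$(hostname)"
```

Any name it prints belongs to a session started under a hostname that no longer matches the server.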

Thursday, December 6, 2012

libstdc++.so.5()(64bit) is needed

If you have an error, for example something like this

libstdc++.so.5()(64bit) is needed by gpfs.base-3.4.0-0.x86_64
libstdc++.so.5(CXXABI_1.2)(64bit) is needed by gpfs.base-3.4.0-0.x86_64
libstdc++.so.5(GLIBCPP_3.2)(64bit) is needed by gpfs.base-3.4.0-0.x86_64
libstdc++.so.5(GLIBCPP_3.2.2)(64bit) is needed by gpfs.base-3.4.0-0.x86_64

The error is caused by the missing legacy compatibility library compat-libstdc++. On CentOS, install it with

# yum install 'compat-libstdc++*'

Tuesday, December 4, 2012

Debugging gmond issue quickly

If you need to debug gmond issues quickly, use the command

# /usr/sbin/gmond --debug=9

loaded module: core_metrics
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module: net_module
loaded module: proc_module
loaded module: sys_module
loaded module: multicpu_module
udp_recv_channel mcast_join=NULL mcast_if=NULL port=8649 bind=NULL
tcp_accept_channel bind=NULL port=8649
Unable to create tcp_accept_channel. Exiting.
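
In the output above, gmond exits because it cannot create the TCP accept channel on port 8649. The log does not say why; one common cause, offered here as an assumption, is that another process is already listening on that port. The sketch below checks for that by filtering `ss -ltn` style output; port_is_listening is a hypothetical helper, and the sample text stands in for real output.

```shell
#!/bin/sh
# Read `ss -ltn` style output on stdin and succeed if a listening socket's
# local address ends in the given port. port_is_listening is a hypothetical helper.
port_is_listening() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Real usage (assumes the ss tool from iproute2 is installed):
#   ss -ltn | port_is_listening 8649 && echo "port 8649 is already in use"
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128          0.0.0.0:8649      0.0.0.0:*'
if printf '%s\n' "$sample" | port_is_listening 8649; then
    echo "port 8649 is already in use"
fi
```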

For more information and example see: 
  1. Ganglia Node unable to update Gmetad Node 
  2. Gmond dead but subsys locked for ganglia monitoring daemon

Monday, December 3, 2012

Default values used by useradd command

When a user issues a useradd command, useradd reads /etc/default/useradd and /etc/login.defs to determine its default values. To display the values in /etc/default/useradd, see Displaying defaults for useradd

Do also read Modifying default template for user settings in Linux, which covers the template files that are added for new users.

To read the /etc/login.defs,
# vim /etc/login.defs

# Password aging controls:
#
#       PASS_MAX_DAYS   Maximum number of days a password may be used.
#       PASS_MIN_DAYS   Minimum number of days allowed between password changes.
#       PASS_MIN_LEN    Minimum acceptable password length.
#       PASS_WARN_AGE   Number of days warning given before a password expires.
#
PASS_MAX_DAYS   99999
PASS_MIN_DAYS   0
PASS_MIN_LEN    5
PASS_WARN_AGE   7

#
# Min/max values for automatic uid selection in useradd
#
UID_MIN                   500
UID_MAX                 60000

#
# Min/max values for automatic gid selection in groupadd
#
GID_MIN                   500
GID_MAX                 60000

#
# If defined, this command is run when removing a user.
# It should remove any at/cron/print jobs etc. owned by
# the user to be removed (passed as the first argument).
#
#USERDEL_CMD    /usr/sbin/userdel_local

#
# If useradd should create home directories for users by default
# On RH systems, we do. This option is overridden with the -m flag on
# useradd command line.
#
CREATE_HOME     yes

# The permission mask is initialized to this value. If not specified,
# the permission mask will be initialized to 022.
UMASK           077

# This enables userdel to remove user groups if no members exist.
#
USERGROUPS_ENAB yes

# Use MD5 or DES to encrypt password? Red Hat use MD5 by default.
MD5_CRYPT_ENAB yes

ENCRYPT_METHOD MD5
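
Individual settings can be pulled out of the file without opening an editor. A minimal sketch, assuming the whitespace-separated "KEY value" layout shown above; login_defs_value is a hypothetical helper, and the scratch file holds a few of the values from the listing so the example does not depend on the local /etc/login.defs.

```shell
#!/bin/sh
# Print the value of a key from a login.defs-style file; commented lines never match.
# login_defs_value is a hypothetical helper; the file argument defaults to /etc/login.defs.
login_defs_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "${2:-/etc/login.defs}"
}

# Scratch copy holding a few of the settings shown above.
defs=$(mktemp)
printf '#USERDEL_CMD /usr/sbin/userdel_local\nUID_MIN 500\nUID_MAX 60000\nCREATE_HOME yes\n' > "$defs"
echo "UID range: $(login_defs_value UID_MIN "$defs")-$(login_defs_value UID_MAX "$defs")"
```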