Tuesday, August 30, 2011

Tweaking the Linux Kernel to manage memory and swap usage

This writeup assumes you are tuning the kernel to minimise swap usage and maximise use of physical memory. This tuning is especially worth considering for high-performance MPI applications, where good low-latency parallelism between nodes is essential.

In addition, this writeup also helps you to “kill” runaway memory applications. For more information, see Tweaking the Linux Kernel to manage memory and swap (Linux Cluster)

Monday, August 29, 2011

Network File System (NFS) in High Performance Network

This article "High Performance (NFS) in High Performance" by Carnegie Mellon is a very interesting article about NFS performance. Do take a look. Here is a summary of their findings:

  1. For point-to-point throughput, IP over InfiniBand (Connected Mode) is comparable to native InfiniBand.
  2. When the disk is the bottleneck, NFS benefits from neither IPoIB nor RDMA.
  3. When the disk is not the bottleneck, NFS benefits significantly from both IPoIB and RDMA. RDMA is better than IPoIB by ~20%.
  4. As the number of concurrent read operations increases, the aggregate throughput achieved with both IPoIB and RDMA improves significantly, with no disadvantage for IPoIB.

Sunday, August 28, 2011

High NFS Load causing echo 0 > /proc/sys/kernel/hung_task_timeout_secs

Note that numerous simultaneous writes by NFS clients to the NFS server can cause a tremendous performance penalty and system lock-ups, as described below. If you use the "top" utility, you will notice that the load can be extremely high as numerous system locks are queued.

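One of my researchers was running an intensive load on the NFS server that eventually triggered the kernel's hung-task warning (the message that suggests "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" to disable it). Before that, I saw "rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket" in the log file (error -104 is ECONNRESET, connection reset by peer).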

To solve the problem, you have to reduce the load on the NFS server or tune its settings. You may want to take a look at Configuring NFS Server for Performance. A longer-term solution is to move to a parallel file system.

Sometimes the problem is caused by other factors, such as drivers. You may want to take a look at Upgrading of Broadcom Drivers to resolve eth0 NIC SerDES Link is Down.
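If you want to check how often a server is hitting these symptoms, here is a minimal sketch that counts both kinds of message in a syslog file. It assumes the kernel logs to /var/log/messages (the usual CentOS location) and that hung-task warnings contain the text "blocked for more than"; the function name nfs_stress_report is hypothetical.

```shell
#!/bin/sh
# nfs_stress_report: hypothetical helper that counts kernel hung-task warnings
# and nfsd send errors in a syslog file (default: /var/log/messages).
nfs_stress_report() {
    log=${1:-/var/log/messages}
    # Hung-task warnings read "INFO: task <name>:<pid> blocked for more than N seconds."
    # grep -c prints 0 and exits 1 on no match; "|| true" keeps that harmless.
    hung=$(grep -c 'blocked for more than' "$log" || true)
    # nfsd send failures read "rpc-srv/tcp: nfsd: got error -104 when sending ..."
    errors=$(grep -c 'nfsd: got error' "$log" || true)
    echo "hung_tasks=$hung nfsd_send_errors=$errors"
}
```

Running nfs_stress_report /var/log/messages prints the two counts in the form hung_tasks=N nfsd_send_errors=M; a count that keeps climbing during heavy client writes points at the overload described above.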

Saturday, August 27, 2011

'Devastating' Apache bug leaves servers exposed

Devs race to fix weakness disclosed in 2007

Maintainers of the Apache webserver are racing to patch a severe weakness that allows an attacker to use a single PC to completely crash a system and was first diagnosed 54 months ago.

Attack code dubbed “Apache Killer” that exploits the vulnerability in the way Apache handles HTTP-based range requests was published Friday on the Full-disclosure mailing list. By sending servers running versions 1.3 and 2 of Apache multiple GET requests containing overlapping byte ranges, an attacker can consume all memory on a target system.

“The behaviour when compressing the streams is devastating and can end up in rendering the underlying operating system unusable when the requests are sent parallely,” Kingcope, the researcher credited with writing and publishing the proof-of-concept attack code, wrote Wednesday on Apache's Bugzilla discussion list. “Symptoms are swapping to disk and killing of processes including but not solely httpd processes.”

The denial-of-service attack works by abusing the routine web clients use to download only certain parts, or byte ranges, of an HTTP document from an Apache server. By stacking an HTTP header with multiple ranges, an attacker can easily cause a system to malfunction. On Wednesday morning, Apache developers said they expect to release a patch in the next 96 hours.

The Apache advisory contains several workarounds that admins can deploy in the meantime.
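To make "stacking an HTTP header with multiple ranges" concrete, here is a small sketch that counts the byte ranges in a Range header value and flags headers with too many of them. It only illustrates the idea behind such workarounds and is not the advisory's own rule; the function names and the limit of 5 ranges are arbitrary assumptions.

```shell
#!/bin/sh
# count_ranges: hypothetical helper that counts the comma-separated byte
# ranges in a Range header value such as "bytes=0-,5-0,5-1,5-2".
count_ranges() {
    value=${1#bytes=}                  # drop the "bytes=" unit prefix
    # Keep only the commas, count them, and add one for the first range.
    commas=$(printf '%s' "$value" | tr -cd ',' | wc -c)
    echo $((commas + 1))
}

# is_suspicious_range: succeeds when a header carries more ranges than a
# limit (default 5, an arbitrary threshold chosen for illustration).
is_suspicious_range() {
    limit=${2:-5}
    [ "$(count_ranges "$1")" -gt "$limit" ]
}
```

A normal download resume sends one or two ranges, while the published attack code sent hundreds of overlapping ones, which is why even a crude count separates the two cases.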

The susceptibility of Apache's range handling to crippling DoS attacks was disclosed in January 2007 by Michal Zalewski, a security researcher who has since taken a job with Google. He said at the time that both Apache and Microsoft's competing IIS webserver were vulnerable to crippling DoS attacks because of the programs' “bizarro implementation” of range header functionality based on the HTTP/1.1 standard.

“Combined with the functionality of window scaling (as per RFC 1323), it is my impression that a lone, short request can be used to trick the server into firing gigabytes of bogus data into the void, regardless of the server file size, connection count, or keep-alive request number limits implemented by the administrator,” Zalewski wrote. “Whoops?”

In an email to The Register on Wednesday, Zalewski wrote: “Not sure why they haven't done something about it back then, probably just haven't noticed in absence of an exploit.”

The episode challenges the conventional wisdom repeated by many proponents of open-source software that flaws in freely available software get fixed faster than in proprietary code because everyday users are free to inspect the source code and report any vulnerabilities they find. Assuming that claim is true, the four-year weakness in Apache's range-handling feature would appear to be an obvious exception.

About 235 million websites use Apache, making it the most widely used webserver with about 66 percent of the entire internet, according to figures released last month by Netcraft. IIS ranked second with more than 60 million sites, or about 17 percent.

In a statement issued several hours after this article was published, Microsoft spokesman Jerry Bryant said: "IIS 6.0 and later versions are not susceptible to this type of denial-of-service due to built in restrictions."

Trustwave's SpiderLabs has provided a detailed technical analysis, along with instructions for mitigating attacks using the open-source ModSecurity firewall.

Friday, August 26, 2011

Infiniband HOWTO

I stumbled on this short but wonderful Infiniband HOWTO tutorial by Guy Coates. Although the article is written for Debian, you can apply it to CentOS.

Thursday, August 25, 2011

IBM HPC Management Suite for Cloud

Do take a look at the IBM HPC Management Suite for Cloud. According to the site, its comprehensive set of tools:
  1. Provisions bare metal high performance compute clusters for technical computing and analysis workloads.
  2. Consolidates the infrastructure for efficient sharing of HPC resources.
  3. Accesses the HPC infrastructure through an on-demand, self-service web portal optimized for HPC users and administrators.
  4. Achieves rapid image deployment and resource management using diskless provisioning.
  5. Centralizes user and energy management, usage metering and accounting.

Wednesday, August 24, 2011

Strange /etc/sysconfig/network-scripts/ifcfg-eth*.bak problems on CentOS

I think there is a bug in Kudzu that sometimes renames the network settings in /etc/sysconfig/network-scripts/ifcfg-eth* to /etc/sysconfig/network-scripts/ifcfg-eth*.bak, causing loss of network connectivity after a reboot. The original settings are replaced with DHCP settings. In other words, the original ifcfg-eth0 becomes ifcfg-eth0.bak, and a new ifcfg-eth0 is created with DHCP settings.

The cause of this issue appears to be the Kudzu service. Once I stopped and disabled the service, my network configuration remained intact during bootup.

# chkconfig --level 2345 kudzu off

# service kudzu stop

For more information, see the CentOS discussion thread http://www.centos.org/modules/newbb/viewtopic.php?topic_id=8376
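Before disabling Kudzu, it can help to confirm that a machine actually shows this symptom. Here is a minimal sketch that lists every ifcfg-ethN file that now uses DHCP while an ifcfg-ethN.bak copy sits next to it. The function name find_kudzu_swaps is hypothetical, and matching on BOOTPROTO=dhcp is my assumption about how the replacement file looks.

```shell
#!/bin/sh
# find_kudzu_swaps: hypothetical helper that lists interfaces whose
# ifcfg-ethN file uses DHCP while an ifcfg-ethN.bak copy also exists.
# The directory defaults to the CentOS network-scripts location.
find_kudzu_swaps() {
    dir=${1:-/etc/sysconfig/network-scripts}
    for bak in "$dir"/ifcfg-eth*.bak; do
        [ -e "$bak" ] || continue              # the glob matched nothing
        cur=${bak%.bak}                        # ifcfg-eth0.bak -> ifcfg-eth0
        [ -e "$cur" ] || continue
        # BOOTPROTO may be written as dhcp or "dhcp"; match case-insensitively.
        if grep -qi '^BOOTPROTO=["]*dhcp' "$cur"; then
            echo "${cur##*/}"
        fi
    done
}
```

If it prints ifcfg-eth0, compare that file with ifcfg-eth0.bak and, if the .bak holds your real settings, move it back (mv ifcfg-eth0.bak ifcfg-eth0) before running service network restart.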

Sunday, August 21, 2011

How to check whether crontab is working for CentOS

To check whether cron is working, here are a few pointers you may wish to take note of:

1. Check whether the crond service is running
# service crond status
crond (pid  2873) is running...

Or if you prefer to use ps
# ps -ef | grep cron

2. Check your log file.
# tail -20 /var/log/cron
You should see something like this:
Aug 22 00:41:01 yourserver crond[29827]: (root) CMD ("/root/yourscripts.sh" 2> /usr/local/yourscript_rsync/yourscript_rsync.errors > /dev/null)
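If the log is long, a small helper that pulls out the most recent run of one particular script saves some scrolling. This is a sketch assuming the CentOS log location /var/log/cron and the CMD line format shown above; last_cron_run is a hypothetical name.

```shell
#!/bin/sh
# last_cron_run: hypothetical helper that prints the most recent crond log
# entry whose CMD line mentions the given script (default log: /var/log/cron).
last_cron_run() {
    script=$1
    log=${2:-/var/log/cron}
    # -F matches the script path literally; tail keeps only the newest match.
    grep -F "CMD" "$log" | grep -F "$script" | tail -n 1
}
```

For example, last_cron_run /root/yourscripts.sh prints the newest matching line, or nothing at all if crond has never run that script.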

Wednesday, August 17, 2011

Stuck at " running /sbin/loader " on CentOS 5 using PS/2 keyboard

I was installing CentOS 5.5 on an IBM x3550 M3, but the installation got stuck at "running /sbin/loader". I waited for 15 minutes, but it never moved past "running /sbin/loader".

After reading some forum posts that raised the same issue, I suspected that my PS/2 keyboard, connected through a PS/2-to-USB converter, was the problem.

I changed to a USB keyboard and the issue was gone.

Tuesday, August 16, 2011

Green Memory and SSDs

Interesting information on Green Memory and SSDs from Samsung.

From Samsung
"In a server, distribution of power is directly related with its memory density and as such the need to reduce memory power consumption is becoming increasingly critical. When memory density increases among different servers, its contribution to total system power increases in other words, more memory requires a server to allocate more power for memory usage....."
  1. 30nm class Green DDR3
  2. Green SSD

SGI Acquires OpenCFD Ltd., the Leader In Open Source Computational Fluid Dynamics (CFD) Software

Interesting news. For more information, read the announcement:

SGI Acquires OpenCFD Ltd., the Leader In Open Source Computational Fluid Dynamics (CFD) Software
    Read also:
    The OpenFOAM Foundation

    Monday, August 15, 2011

    RAMSpeed cache and memory benchmarking tool

    RAMspeed is a free open source command line utility to measure cache and memory performance of computer systems. For more information on the algorithm and other information, do read the RAMSpeed site.

    RAMspeed is more accurate than many other benchmarking tools, more customisable, open source, compact, and gives you much more information to analyse. Some people may say that the lack of a graphical interface is a large drawback, but it may be considered an advantage as well.

    Here is the download information, taken from the RAMSpeed site:

    RAMspeed (UNIX) v2.6.0 (August, 2009) — for uniprocessor machines running UNIX-like operating systems. The source code is available for download (76Kb).

    RAMspeed/SMP (UNIX) v3.5.0 (August, 2009) — for multiprocessor machines running UNIX-like operating systems and supporting System V IPC extensions. The source code is available for download (78Kb).

    RAMspeed (DOS) v2.5.0 (August, 2009) — for DOS as well as 32-bit Windows operating systems (95 to 2003; i386 only). Both the source code and a pre-compiled executable are available for download (109Kb).

    RAMspeed (Win32) v1.1.1 (August, 2009) — for 32-bit as well as 64-bit Windows operating systems (95 to Vista; i386 or amd64). Both the source code and a pre-compiled executable are available for download (71Kb). 

    Friday, August 12, 2011

    Memory Management - Preventing the kernel from dishing out more memory than required

    Those of us who run computational jobs have probably seen memory get eaten up by buggy or stray applications. Hopefully the kernel kills the culprit, but you may also have seen the kernel fail to kill it, leaving the server in limbo.

    Let's say we want to ensure that the kernel only gives out a bounded amount of memory to processes, for example roughly equal to the physical memory. Then we set the following in /etc/sysctl.conf or /etc/sysctl.d/myapp.conf.

    Assume you have 10GB of swap and 20GB of RAM, and you want the kernel to stop granting memory once total committed memory reaches 18GB. The commit limit is swap size + (overcommit_ratio / 100) * RAM size, so 10GB + 0.4 * 20GB = 18GB, which gives overcommit_ratio = 40.

    So in /etc/sysctl.conf, the configuration will be:
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 40
    Note: overcommit_ratio is a percentage, so 40 means 40/100 of RAM. For an explanation of vm.overcommit_memory = 2, see Tweaking Linux Kernel Overcommit Behaviour for memory.

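    Once committed memory reaches 18GB, the kernel refuses further allocations (they fail with ENOMEM, so malloc() returns NULL). A runaway process therefore typically fails its allocation instead of dragging the server into heavy swapping, and the OOM killer rarely needs to step in.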

    Another example: if your RAM and swap sizes are the same and you want the commit limit to equal exactly the size of physical memory, set the ratio to 0, since the limit then becomes swap + 0 * RAM = swap size = RAM size:
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 0
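Both examples come from the same formula, so you can work the ratio out backwards from the limit you want. A minimal sketch, assuming all sizes are given in the same unit (GB here); overcommit_ratio_for is a hypothetical helper name.

```shell
#!/bin/sh
# overcommit_ratio_for: hypothetical helper that inverts the mode-2 formula
#   CommitLimit = swap + RAM * overcommit_ratio / 100
# to find the ratio for a target limit. Usage: overcommit_ratio_for TARGET SWAP RAM
overcommit_ratio_for() {
    target=$1 swap=$2 ram=$3
    if [ "$target" -lt "$swap" ]; then
        echo "target is below swap size; no ratio can reach it" >&2
        return 1
    fi
    # Integer division rounds down, so the resulting limit never exceeds the target.
    echo $(( (target - swap) * 100 / ram ))
}
```

overcommit_ratio_for 18 10 20 prints 40 (the first example) and overcommit_ratio_for 20 20 20 prints 0 (the second). After editing /etc/sysctl.conf, sysctl -p loads the new values without a reboot.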

    For more information, do read
    1. When Linux runs out of memory 
    2. vm.overcommit_memory = 2, vm.overcommit_ratio = 0

    Thursday, August 11, 2011

    Tweaking Linux Kernel Overcommit Behaviour for memory

    You can change how the Linux kernel handles memory overcommit. The setting goes in /etc/sysctl.conf or /etc/sysctl.d/myconfig.conf.

    You will write something like
    vm.overcommit_memory = (an integer from 0 to 2)
    0 - the kernel uses predefined heuristics when deciding whether to allow an overcommit. This is the default.
    1 - the kernel always overcommits, without checking (very dangerous).
    2 - the kernel prevents overcommits from exceeding a certain value. In this mode, the total committed memory cannot exceed the swap space size + overcommit_ratio percent of RAM size. By default, overcommit_ratio is 50.
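To see the mode 2 formula in numbers, here is a minimal sketch that computes the commit limit from sizes in kB. It ignores huge pages (which the kernel subtracts from RAM), so treat it as an approximation; the function names are hypothetical. On a live system the kernel reports its own figure as CommitLimit in /proc/meminfo, next to Committed_AS (memory currently committed), and the limit is only enforced when vm.overcommit_memory = 2.

```shell
#!/bin/sh
# commit_limit_kb: hypothetical helper computing the mode-2 commit limit
#   swap + RAM * overcommit_ratio / 100
# from sizes in kB. Huge pages are ignored, so this is an approximation.
commit_limit_kb() {
    swap_kb=$1 ram_kb=$2 ratio=${3:-50}   # 50 is the kernel's default ratio
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# commit_limit_from_meminfo: reads MemTotal and SwapTotal (kB) from a
# meminfo-style file; the ratio defaults to the running kernel's setting.
commit_limit_from_meminfo() {
    meminfo=${1:-/proc/meminfo}
    ratio=${2:-$(cat /proc/sys/vm/overcommit_ratio)}
    ram=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    swap=$(awk '/^SwapTotal:/ {print $2}' "$meminfo")
    commit_limit_kb "$swap" "$ram" "$ratio"
}
```

With 10GB of swap and 20GB of RAM expressed in kB, commit_limit_kb 10485760 20971520 40 prints 18874368, i.e. 18GB; compare it with grep CommitLimit /proc/meminfo on the real machine.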

    For an example, see Memory Management - Preventing the kernel from dishing out more memory than required

    Wednesday, August 10, 2011

    Selecting default OS or kernel in GRUB boot loader

    If you wish to change the default OS or kernel in the GRUB boot loader, you can easily make the change in grub.conf:

    # vim /boot/grub/grub.conf

    Change default=0 to 1 (or another number) to boot that kernel or OS instead. Entries are counted from 0 in the order of the title lines.

    # grub.conf generated by anaconda
    # Note that you do not have to rerun grub after making changes to this file
    # NOTICE:  You do not have a /boot partition.  This means that
    #          all kernel and initrd paths are relative to /, eg.
    #          root (hd0,0)
    #          kernel /boot/vmlinuz-version ro root=/dev/sda1
    #          initrd /boot/initrd-version.img
    default=0
    title CentOS (2.6.18-238.19.1.el5.centos.plus)
            root (hd0,0)
            kernel /boot/vmlinuz-2.6.18-238.19.1.el5.centos.plus ro root=LABEL=/ rhgb quiet
            initrd /boot/initrd-2.6.18-238.19.1.el5.centos.plus.img
    title CentOS (2.6.18-238.19.1.el5)
            root (hd0,0)
            kernel /boot/vmlinuz-2.6.18-238.19.1.el5 ro root=LABEL=/ rhgb quiet
            initrd /boot/initrd-2.6.18-238.19.1.el5.img

    Tuesday, August 9, 2011

    Encountering cannot restore segment prot after reloc during mpirun

    If you encounter the error "cannot restore segment prot after reloc: permission denied" during an mpirun, it is because SELinux is enabled on the server. In CentOS, you can disable SELinux by changing its configuration under /etc/selinux.

    For the change to take effect, you need to reboot.

    If you do not wish to reboot just yet, you can use this command to switch the running system to permissive mode, so SELinux stops enforcing its policy:
    # /usr/sbin/setenforce 0
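On a stock CentOS 5 install, the persistent setting is normally the SELINUX= line in /etc/selinux/config (treat that path as an assumption and check your system). Here is a minimal sketch that switches it while keeping a backup of the original file; set_selinux_mode is a hypothetical helper name, and it relies on GNU sed's -i option.

```shell
#!/bin/sh
# set_selinux_mode: hypothetical helper that rewrites the SELINUX= line in an
# SELinux config file (assumed default: /etc/selinux/config).
set_selinux_mode() {
    mode=$1                          # enforcing, permissive or disabled
    conf=${2:-/etc/selinux/config}
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unknown SELinux mode: $mode" >&2; return 1 ;;
    esac
    cp "$conf" "$conf.orig"          # keep a backup before editing
    # Only the SELINUX= line changes; SELINUXTYPE= does not match this pattern.
    sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$conf"
}
```

For example, run set_selinux_mode disabled and then reboot; getenforce shows the mode currently in effect.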

    Monday, August 8, 2011

    Diagnosing stuck OpenMPI jobs with mpirun

    I was testing a simple Open MPI job and it hung without any output. To diagnose why mpirun was stuck, I used the --debug-daemons flag:

     $ mpirun --debug-daemons -np 24 -host c18,c19,c20  hello_world_mpi

    [c18.cluster.spms.ntu.edu.sg:09182] [[21720,0],1] orted_cmd: received exit
    [c19.cluster.spms.ntu.edu.sg:13800] [[21720,0],2] orted_cmd: received exit
    [c19.cluster.spms.ntu.edu.sg:13800] [[21720,0],2] orted: finalizing
    [c18.cluster.spms.ntu.edu.sg:09182] [[21720,0],1] orted: finalizing
    [c20.cluster.spms.ntu.edu.sg:08614] [[21720,0],3] orted_cmd: received exit
    [c20.cluster.spms.ntu.edu.sg:08614] [[21720,0],3] orted: finalizing 

    I finally realised that mpirun had "run away" to the "IBM RNDIS/CDC ETHER" interface (usb0), which I promptly shut down with ifdown usb0.

    After that, the job ran smoothly.

    Sunday, August 7, 2011

    Infiniband versus Ethernet myths and misconceptions

    I have written my thoughts on the Infiniband versus Ethernet myths and misconceptions. For more information, see Infiniband versus Ethernet myths and misconceptions from Linux Cluster Blog

    Only three critical myths are discussed:
    1. Opinion 1: Infiniband has lower latency than Ethernet
    2. Opinion 2: QDR-IB has higher bandwidth than 10GbE
    3. Opinion 3: IB switches scale better than 10GbE

    Most of my material is taken from the Chelsio white paper "Eight myths about InfiniBand" (WP 09-10).

      Saturday, August 6, 2011

      Server Nodes File Configuration for Torque

      If your Torque nodes have different hardware specifications, it is quite useful to tag them with properties in $TORQUEHOME/server_priv/nodes, so that you can request specific nodes by these features when submitting jobs. For example:

      c001 np=8     64G     
      c002 np=12   126G   

      For more information, do look at Torque Server Node File Configuration
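To double-check which nodes carry a given property before submitting, here is a small sketch that scans the nodes file. It assumes properties are whitespace-separated words after the node name, as in the example above; nodes_with_property is a hypothetical helper.

```shell
#!/bin/sh
# nodes_with_property: hypothetical helper that lists the node names in a
# Torque nodes file carrying a given property (e.g. 64G).
nodes_with_property() {
    prop=$1
    nodes=${2:-$TORQUEHOME/server_priv/nodes}
    awk -v p="$prop" '
        /^[ \t]*#/ || NF == 0 { next }        # skip comments and blank lines
        { for (i = 2; i <= NF; i++) if ($i == p) { print $1; break } }
    ' "$nodes"
}
```

A job can then ask for those nodes by property, e.g. qsub -l nodes=1:ppn=8:64G job.sh, where job.sh stands for your own job script.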

      Friday, August 5, 2011

      mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting

      If you see the error "mlx4: There is a mismatch between the kernel and the userspace libraries: Kernel does not support XRC. Exiting", it means that the kernel modules and the userspace libraries are mismatched.

      Solution: Reinstall the OFED kernel modules. For more information, see Installing Voltaire QDR Infiniband Drivers for CentOS 5.4

      If you have upgraded your Linux kernel, you may want to reinstall the OFED kernel modules for the upgraded kernel, or boot back into your original kernel.

      Wednesday, August 3, 2011

      Understanding memory usage with top

      If you type "top" on the console, you will see information similar to this:

      Mem:  12299804k total,  1505328k used, 10794476k free,   196024k buffers

      How do you interpret this, especially the buffers figure? The server above has 12GB of RAM.
      1. About 200MB of memory is in buffers. Buffering speeds up disk access by caching written data in memory and by reading in more than is needed in anticipation of the next read. If the memory is needed for more important purposes, such as applications, it will be released.
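Here is a minimal sketch that parses a Mem: line like the one above and subtracts buffers from used memory, giving a rough idea of what applications themselves are holding. It assumes the older procps top format with values suffixed by k; parse_top_mem is a hypothetical helper.

```shell
#!/bin/sh
# parse_top_mem: hypothetical helper that reads a top "Mem:" line (values in kB)
# from stdin and reports used memory with and without the reclaimable buffers.
parse_top_mem() {
    awk '/^[ \t]*Mem:/ {
        gsub(/k/, "")                       # "12299804k" becomes "12299804"
        total = $2; used = $4; free = $6; buffers = $8
        printf "total=%d used=%d free=%d buffers=%d used_minus_buffers=%d\n",
               total, used, free, buffers, used - buffers
    }'
}
```

For example, top -b -n 1 | parse_top_mem. For the line above it reports buffers=196024 (about 191MB) and used_minus_buffers=1309304 (about 1.25GB).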