Wednesday, February 29, 2012

Best Practices for VMware vSphere 5 on NetApp Data ONTAP


This updated technical report, Best Practices for VMware vSphere 5, covers 3 main areas:
  1. How the new and updated technologies from VMware and NetApp enable you to virtualize demanding business-critical applications
  2. The role of the best practice TR as it relates to the rest of the joint VMware and NetApp solutions documents in the NetApp technical library
  3. How to enable coordinated and automated cross-domain operations where resources are consumed dynamically while complying with operational standards
An interesting extract from the article:


With VMware vSphere 5 you can deploy “monster VMs”—virtual machines configured with a maximum of 32 virtual CPUs and 1TB of memory. To put the new configurations of a “monster VM” in perspective, in vSphere 4.1 a virtual machine was limited to 8 virtual CPUs and 255GB of memory. The VMware engineering team has outdone itself in delivering such an increase in computing capacity.

NetApp has also enhanced the performance of the dedupe intelligence within the storage caches of its arrays. Dedupe-aware storage controller cache is unique to NetApp and is a feature that we developed and brought to market specifically for use with server virtualization. If you’re familiar with the performance benefits of VMware Transparent Page Sharing, then just imagine that technology in the cache of a storage array. NetApp is the only array that allows data cached from one VM to be accessed when servicing a subsequent request by another VM......

Interesting related technical report "NetApp Storage Best Practices for VMware vSphere" (pdf)

Tuesday, February 28, 2012

Using Torque to set up a Queue to direct users to a subset of resources

If you are running clusters, you may want to set up a Torque queue that directs users to a subset of resources. For example, I may wish to direct users who need specific resources such as MATLAB to a particular queue.

More information can be found at Torque Documents 4.1 "4.1.4 Mapping a Queue to a subset of Resources"


....The simplest method is using default_resources.neednodes on an execution queue, setting it to a particular node attribute. Maui/Moab will use this information to ensure that jobs in that queue will be assigned nodes with that attribute...... 

For example, if you are creating a queue for users of MATLAB
qmgr -c "create queue matlab"
qmgr -c "set queue matlab queue_type = Execution"
qmgr -c "set queue matlab resources_default.neednodes = matlab"
qmgr -c "set queue matlab enabled = True"
qmgr -c "set queue matlab started = True"

For the nodes you are assigning to the queue, do update the node properties. A good example can be found at 3.2 Nodes Properties

To add new properties on-the-fly,
qmgr -c "set node node001 properties += matlab"

(if you are adding additional properties to the nodes)

To remove properties on-the-fly
qmgr -c "set node node001 properties -= matlab"
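When several nodes need the same property, the qmgr command above can be generated in a loop. This is a minimal sketch that only prints the commands so they can be reviewed first; the function name tag_nodes_cmds and the node name node002 are hypothetical.

```shell
#!/bin/sh
# Print (do not run) the qmgr commands that add a property to each node.
# Usage: tag_nodes_cmds PROPERTY NODE...
tag_nodes_cmds() {
    prop=$1
    shift
    for node in "$@"; do
        printf 'qmgr -c "set node %s properties += %s"\n' "$node" "$prop"
    done
}

tag_nodes_cmds matlab node001 node002
```

Once the printed commands look right, they can be piped to sh on the Torque server, for example tag_nodes_cmds matlab node001 node002 | sh.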


Saturday, February 25, 2012

PCI Device Resource Allocation Failure noticed during Boot Time for IBM Server 3650

After plugging in an HBA, a Quad-Port Ethernet Card and a Low Latency TOE Card in addition to my RAID Controller, I saw the error "PCI Device Resource Allocation Failure" at boot time. But I did not pay much attention.

It was only after the system had booted that I realized my Low Latency TOE Card was up but could not communicate. I thought it was a card or cable issue and changed both accordingly, but the effect was the same: still no communication. Very puzzling.

On the IBM website, there is an article that might give a hint: 1801 PCI resource error using CTRL-H with 2x MegaRAID 8480 - IBM System x.

From what I infer from the article, the error is due to insufficient ROM space for that many devices, so unneeded ROM BIOS apps and PXE boot options need to be turned off.

What I did was to.....

Disable Planar Ethernet PXE/DHCP
  1. Press F1 when prompted on boot
  2. Go to Setup Option
  3. At the option "Planar Ethernet PXE/DHCP", set it to Disabled

If the above does not eliminate the error, do the following
Disable unneeded PCI ROM Slot
  1. Press F1 when prompted on boot
  2. Go to Advanced Setup
  3. Select PCI Settings
  4. Select PCI ROM Control Execution
Within the "PCI ROM Control Execution" Panel, select the slot number for the secondary controller and disable that slot.

Reboot, and the error should be gone.


For more information see
  1. 1801 PCI resource error using CTRL-H with 2x MegaRAID 8480 - IBM System x.
  2. Connecting ESX to SAN: PCI Device resource allocation failure
  3. x3650 PCI Device Resource Allocation Failure

Friday, February 24, 2012

HP Z1 Workstation Revealed - The Power without the Tower


Introducing the HP Z1, the world’s first all-in-one workstation with a 27” (diagonal) display that snaps open to let you swap out parts and make upgrades. No tools required.

This workstation is simply cool.....Coming in April 20

  1. HP Z1 Workstation Reveal 
  2. Creation of The Power Without The Tower 
  3. NVIDIA Joins Forces with HP to Free the Workstation from its Tower

Thursday, February 23, 2012

EU wants Europe to be supercomputing superpower


The European Commission today unveiled plans to make Europe a leading light in high-performance computing (HPC).
The EC said there had been a “relative decline in HPC use and capabilities,” but it hopes that will be reversed with a doubling of investment in supercomputing......

For the rest of the article, do read EU wants Europe to be supercomputing superpower (IT Pro)

Sunday, February 19, 2012

How to replace failed disk on N-Series Filer

I had 2 failed disks in 1 aggregate on my N-Series (N3600) or NetApp FAS2050. I was really thankful that NetApp uses RAID-6 with dual parity. Here are the steps that I took to quickly replace my failed disks.

Of course, I knew that the disks had failed because I saw amber lights on them. I verified this with the syslog messages on my Filer, which confirmed that the 2 disks had failed.

NetApp Filers have hot-swappable disks, so it is safe to pull out the failed disks. Do verify that the disks have failed though. Once you have replaced the disks, the new disks should automatically be assigned to the correct controller. If they are not, you have to do the following

Controller-A> disk show -n

 DISK       OWNER               POOL
------     -------             ------
0a.76    Not Owned              NONE    

You should see that the disk is unowned. I'm assuming you are on the console of the controller you are assigning the disk to. Just type

Controller-A> disk assign 0a.76

In your syslog file, you should see the disk being rebuilt. If somehow the N-Series Filer does not accept the command above, you have to unown the disk, go to the console of the right controller and assign the disk again. To unown, use the following command

Controller-A> disk assign 0a.76 -s unowned -f

Thursday, February 16, 2012

Using Kernel Samepage Merging with KVM

For the original and full write-up, do look at Using KSM (Kernel Samepage Merging) with KVM.

In short, from the article
Kernel SamePage Merging is a recent linux kernel feature which combines identical memory pages from multiple processes into one copy on write memory region. Because kvm guest virtual machines run as processes under linux, this feature provides the memory overcommit feature to kvm so important to hypervisors for more efficient use of memory......

Pointer 1. Verifying Kernel KSM Support
# grep KSM /boot/config-$(uname -r)

You should see something like this if KSM is enabled
CONFIG_KSM=y

You should also see a directory for KSM at /sys/kernel/mm/ksm (screenshot taken from Linux-KVM).

Pointer 2: By default, KSM is limited to 2000 kernel pages.

To verify, type the following command
# cat /sys/kernel/mm/ksm/max_kernel_pages
You should see
2000

Pointer 3: Verifying KVM Support for Samepage Merging

From the article.....
In order for your KVM guests to take advantage of KSM, your version of qemu-kvm must explicitly request from the kernel that identical pages be merged using the new madvise interface. The patch for this feature was added to the kvm development tree just recently following the kvm-88 release. If you’re compiling kvm yourself you can verify whether your version of kvm will support KSM by inspecting exec.c source file for the following lines of code

#ifdef MADV_MERGEABLE
        madvise(new_block->host, size, MADV_MERGEABLE);
#endif

If you don’t see these lines in your exec.c file then your kvm process will still run fine but it won’t take advantage of KSM.

Pointer 4: Run multiple similar guests
.......With multiple virtual machines running, you can verify that KSM is working by inspecting the following file to see how many pages are being shared between your kvm guests.

If the value is greater than zero, KSM is in use
# cat /sys/kernel/mm/ksm/pages_sharing
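
The checks above can be combined into one small helper. This is only a sketch: the function name ksm_report is hypothetical, and it assumes the run and pages_sharing files are present under /sys/kernel/mm/ksm, which depends on your kernel. The directory can be overridden so the function can be tried against a copy of the files.

```shell
#!/bin/sh
# Summarise KSM state from sysfs.
# Usage: ksm_report [KSM_DIR]   (KSM_DIR defaults to /sys/kernel/mm/ksm)
ksm_report() {
    dir=${1:-/sys/kernel/mm/ksm}
    if [ ! -d "$dir" ]; then
        echo "KSM sysfs directory not found: $dir"
        return 1
    fi
    run=$(cat "$dir/run")
    sharing=$(cat "$dir/pages_sharing")
    echo "run=$run pages_sharing=$sharing"
    # A pages_sharing value above zero means pages are being merged.
    if [ "$sharing" -gt 0 ]; then
        echo "KSM is merging pages"
    else
        echo "KSM is not merging pages"
    fi
}

ksm_report || true
```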

Monday, February 13, 2012

ld cannot find -lmkl_intel_ilp64.a

I encountered an error while compiling NWChem 6.1. The error I received during the compilation was

ld: cannot find -lmkl_intel_ilp64.a

This error occurs when the MKL path does not contain the required library, which is libmkl_intel_ilp64.a or libmkl_intel_ilp64.so. You should modify the -L....... to point to the right MKL path. For example, see

BLASOPT="-L/opt/intel/mkl/10.2.4.032/lib/em64t -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread"
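
Before rebuilding, it can help to confirm that the directory given to -L really holds the library. Here is a minimal sketch; the function name has_lib is hypothetical, and the path in the example call is simply the one from the BLASOPT line above.

```shell
#!/bin/sh
# Report whether lib<NAME>.so or lib<NAME>.a exists in a directory,
# which is what the linker searches for when given -L<DIR> -l<NAME>.
# Usage: has_lib DIR NAME
has_lib() {
    dir=$1
    name=$2
    for ext in so a; do
        if [ -f "$dir/lib$name.$ext" ]; then
            echo "found $dir/lib$name.$ext"
            return 0
        fi
    done
    echo "missing lib$name in $dir"
    return 1
}

has_lib /opt/intel/mkl/10.2.4.032/lib/em64t mkl_intel_ilp64 || true
```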

Friday, February 10, 2012

How to fix a broken yum for CentOS

When we run the yum command, we encounter an error such as the one below.

# yum update
Loaded plugins: downloadonly, fastestmirror
Loading mirror speeds from cached hostfile
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in ?
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 309, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 178, in main
    result, resultmsgs = base.doCommands()
  File "/usr/share/yum-cli/cli.py", line 345, in doCommands
    self._getTs(needTsRemove)
  File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 101, in _getTs
    self._getTsInfo(remove_only)
  File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 112, in _getTsInfo
    pkgSack = self.pkgSack
  File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 661, in <lambda>
    pkgSack = property(fget=lambda self: self._getSacks(),
  File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 501, in _getSacks
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.4/site-packages/yum/repos.py", line 232, in populateSack
    self.doSetup()
  File "/usr/lib/python2.4/site-packages/yum/repos.py", line 79, in doSetup
    self.ayum.plugins.run('postreposetup')
  File "/usr/lib/python2.4/site-packages/yum/plugins.py", line 179, in run
    func(conduitcls(self, self.base, conf, **kwargs))
  File "/usr/lib/yum-plugins/fastestmirror.py", line 181, in postreposetup_hook
    all_urls = FastestMirror(all_urls).get_mirrorlist()
  File "/usr/lib/yum-plugins/fastestmirror.py", line 333, in get_mirrorlist
    self._poll_mirrors()
  File "/usr/lib/yum-plugins/fastestmirror.py", line 376, in _poll_mirrors
    pollThread.start()
  File "/usr/lib64/python2.4/threading.py", line 416, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
 

Possible Solution 1 (which solved my problem immediately): Clean the yum cache, then disable the fastestmirror plugin by setting enabled=0 in its configuration file.
# yum clean all
# vim /etc/yum/pluginconf.d/fastestmirror.conf

enabled=0
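
If you prefer not to edit the file by hand, the same change can be made with sed. This is a sketch under the assumption that the file already contains an enabled= line; the function name disable_plugin is hypothetical, and it keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Set enabled=0 in a yum plugin configuration file, keeping a .bak copy.
# Usage: disable_plugin CONF_FILE
disable_plugin() {
    conf=$1
    cp "$conf" "$conf.bak"
    # Rewrite only the line that starts with "enabled"; other lines are copied as-is.
    sed 's/^enabled[[:space:]]*=.*/enabled=0/' "$conf.bak" > "$conf"
}
```

Run it as root with disable_plugin /etc/yum/pluginconf.d/fastestmirror.conf.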

Possible Solution 2: The good old yum clean all. (Clean everything)
# yum clean all
Loaded plugins: downloadonly, fastestmirror
Cleaning up Everything
Cleaning up list of fastest mirrors

Possible Solution 3: Rebuild the yum database
# yum clean all
# rm -f /var/lib/rpm/__db*
# rpm --rebuilddb
# yum update
Hope this helps.

Wednesday, February 8, 2012

Allowing users to mount CD and DVD ROM

Step 1: Find out the device name of the CD-ROM or DVD-ROM on your Linux box. If you are using CentOS, you may want to take a look at Checking Device Name of CD-ROM in CentOS

Step 2: Make a directory to mount the DVD ROM
# mkdir -p /media/dvdrom 


Step 3: Update your /etc/fstab file
/dev/dvd                /media/dvdrom           iso9660 ro,user,noauto  0 0 


Step 4: Test the setup as a typical user. Put in a CD-ROM or DVD-ROM
$ mount /media/dvdrom


Step 5: To unmount, just simply type
$ umount /media/dvdrom
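
If the mount fails for an ordinary user, the fstab options are worth checking first. The sketch below looks for the user (or users) option on the entry for a mount point; the function name user_mountable is hypothetical, and the fstab path can be overridden for testing.

```shell
#!/bin/sh
# Succeed if the fstab entry for MOUNT_POINT carries the "user" or "users" option.
# Usage: user_mountable MOUNT_POINT [FSTAB]
user_mountable() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "user" || opts[i] == "users") found = 1
        }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}

if user_mountable /media/dvdrom; then
    echo "/media/dvdrom can be mounted by ordinary users"
else
    echo "/media/dvdrom is not user-mountable"
fi
```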


Tuesday, February 7, 2012

Using Ethtool to manage Ethernet Card

Ethtool is really a Swiss Army knife for diagnosing, managing and manipulating network cards. I use it rather often. Here are some of the usages I rely on most.

Usage 1: ethtool eth0
# ethtool eth0

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes

Interesting information includes the "Link detected" status, Duplex, Speed, Supported ports etc. I find the link detection particularly useful; it is one of the checks I use most. :)
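
To pull a single field such as "Link detected" out of that output in a script, a small awk filter is enough. This is a sketch only: the function name ethtool_field is hypothetical, and the example feeds it a captured sample instead of a live interface.

```shell
#!/bin/sh
# Extract one "Key: value" field from "ethtool IFACE" output read on stdin.
# Usage: ethtool eth0 | ethtool_field "Link detected"
ethtool_field() {
    awk -F': ' -v key="$1" '
        { k = $1; sub(/^[ \t]+/, "", k) }   # strip the leading indentation
        k == key { print $2; exit }
    '
}

# Example with a captured sample of the output shown above.
sample='Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Link detected: yes'
printf '%s\n' "$sample" | ethtool_field "Link detected"
```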


Usage 2: Checking of Ethernet Drivers Version
# ethtool -i eth0

driver: bnx2
version: 2.0.23b
firmware-version: bc 1.9.6
bus-info: 0000:03:00.0


Usage 3: Identify the Physical Network Card (Blink LED Port of NIC Card)
# ethtool -p eth0


Usage 4: Display pause parameters (Auto-negotiation, RX and TX) of Network Card
# ethtool -a eth0

Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on


Usage 5: Show offload properties of Network Card.
# ethtool -k eth0

Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off


For more information, do look at these excellent resources
  1. Ethtool man page
  2. 9 Linux ethtool Examples to Manipulate Ethernet Card (NIC Card)

Unable to remove mpi-selector for OFED 1.5.3 during uninstallation

If you encounter the error "cannot remove mpi-selector" when uninstalling or even installing the OFED package, it is best to exit the installer and do a yum remove mpi-selector. Apparently, there are a number of dependencies on it from BLAS, ScaLAPACK etc.
# yum remove mpi-selector


Later you can reinstall the dependencies like BLAS, SCALAPACK etc

Monday, February 6, 2012

vncserver for CentOS 6

To install tigervnc on CentOS 6, just do
# yum install vnc-server

Sunday, February 5, 2012

Checking Device Name of CD-ROM in CentOS

If you want to check the device name of the CD-ROM in CentOS 6, you can install the package wodim, which is actually a command-line CD/DVD recording program

# yum install wodim


To check the device name of CD-ROM / DVD-ROM, just type
# wodim --devices

wodim: Overview of accessible drives (1 found) :
-------------------------------------------------------------------------
 0  dev='/dev/scd0'     rwrw-- : 'NECVMWar' 'VMware IDE CDR10'
-------------------------------------------------------------------------


For CentOS 5, wodim does not seem to be in the commonly used repositories. But you can easily deduce the device name from the /dev directory. It is usually /dev/dvd or /dev/cdrom. Just check what the symbolic links lead to...
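
One way to follow those symbolic links is readlink -f. This is a minimal sketch, assuming the GNU readlink that ships with CentOS; the function name resolve_links is hypothetical.

```shell
#!/bin/sh
# Print the real device behind each given symlink that exists.
# Usage: resolve_links PATH...
resolve_links() {
    for link in "$@"; do
        if [ -e "$link" ]; then
            echo "$link -> $(readlink -f "$link")"
        fi
    done
}

resolve_links /dev/dvd /dev/cdrom
```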