Monday, October 31, 2011

Set higher MTU for vSwitch and Virtual Distributed Switch

 Refer to KB: iSCSI and Jumbo Frames configuration on ESX 3.x and ESX 4.x for more details

Any Ethernet frame carrying a payload larger than 1500 bytes (the standard MTU) is a Jumbo Frame. ESX supports frames up to 9 KB (9000 bytes).

 To set the MTU size for the vSwitch, run the command:

# esxcfg-vswitch -m (MTU_Number) (Vswitch)

where MTU_Number = 9000, Vswitch = Name of the Vswitch

This command sets the MTU for all uplinks on that vSwitch. Set the MTU size to the largest MTU size among all the virtual network adapters connected to the vSwitch.
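
One way to confirm that jumbo frames actually pass end to end is a ping with the don't-fragment bit set and a payload sized to fill the MTU. The sketch below works out that payload, assuming a 20-byte IPv4 header and an 8-byte ICMP header, and prints the matching vmkping command. The helper name jumbo_payload and the address 192.168.1.10 are placeholders of mine, not values from the KB article.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one frame for a given MTU,
# assuming a 20-byte IPv4 header and an 8-byte ICMP header.
jumbo_payload() {
    echo $(( $1 - 20 - 8 ))
}

mtu=9000
target=192.168.1.10   # placeholder: use the IP of your storage target

# -d sets the don't-fragment bit, -s sets the payload size.
echo "vmkping -d -s $(jumbo_payload "$mtu") $target"
```

If the printed command fails on the host while a plain vmkping succeeds, some device in the path is probably still at a smaller MTU.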



Refer to KB: Enabling Jumbo Frames for VMkernel ports in a virtual distributed switch


Run this command to change the MTU size for an individual VMkernel port on the distributed switch:

# esxcfg-vmknic -m 9000 -v (port number) -s (dvs Switch name)

For example:
# esxcfg-vmknic -m 9000 -v 115 -s "NewLAN-DVS"


To enable Jumbo Frames on a VMkernel port from vCenter Server:

1) Click Home > Hosts and Clusters > Host > Configuration > Networking.
2) Navigate to the vSphere Distributed Switch tab.
3) Click the VMkernel port (eg: vmk1)
4) Click Manage Virtual Adapters.
5) Select the vmk interface and click Edit.
6) Under the NIC settings, change the MTU value to 9000.
7) Click OK.

Saturday, October 29, 2011

Error in Compiling GotoBLAS2 in Westmere Chipsets

GotoBLAS2 uses new algorithms and memory techniques for optimal performance of the BLAS routines.

When I tried compiling GotoBLAS2 on my Westmere chipsets, following "02QuickInstall.txt", I got this error:

../kernel/x86_64/gemm_ncopy_4.S: Assembler messages:
../kernel/x86_64/gemm_ncopy_4.S:192: Error: undefined symbol `RPREFETCHSIZE' in operation
...........
...........
...........

gcc -O2 -Wall -m64 -DF_INTERFACE_INTEL -fPIC  -DSMP_SERVER -DMAX_CPU_NUMBER=8 -DASMNAME=strmm_kernel_RN -DASMFNAME=strmm_kernel_RN_ -DNAME=strmm_kernel_RN_ -DCNAME=strmm_kernel_RN -DCHAR_NAME=\"strmm_kernel_RN_\" -DCHAR_CNAME=\"strmm_kernel_RN\" -I.. -UDOUBLE  -UCOMPLEX -c -DTRMMKERNEL -UDOUBLE -UCOMPLEX -ULEFT -UTRANSA ../kernel/x86_64/gemm_kernel_8x4_sse3.S -o strmm_kernel_RN.o
make[1]: *** [sgemm_oncopy.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/root/GotoBLAS2/kernel'


I was quite puzzled as to why the compilation did not work. I googled and found a wonderful answer in "Trouble compiling GotoBLAS2 on newer CPU". Basically, you will need to run:

gmake clean
gmake TARGET=NEHALEM

Eventually you will get something like

 GotoBLAS build complete.

  OS               ... Linux
  Architecture     ... x86_64
  BINARY           ... 64bit
  C compiler       ... GCC  (command line : gcc)
  Fortran compiler ... INTEL  (command line : ifort)
  Library Name     ... libgoto2_nehalemp-r1.13.a (Multi threaded; Max num-threads is 8)



According to "Trouble compiling GotoBLAS2 on newer CPU", the problem appears to be that newer CPUs (Intel X5650 in my case) are not detected properly by the CPU ID routine in GotoBLAS2.

The problem with gemm_ncopy_4.S arises because it defines RPREFETCHSIZE and WPREFETCHSIZE using #ifdef statements depending on CPU type. There is an entry for #ifdef GENERIC, but that was not set for me in config.h.

Friday, October 28, 2011

IBM DS4000 (FastT) Storage Manager Client on Windows

Boy, this blog post, "How to Install IBM DS4000 (FastT) Storage Manager Client on Windows", is really helpful for finding information and the software for the IBM DS4000 (FastT) Storage Manager Client. It is even easier than searching the IBM website.

......
1) Download the software.
For Windows XP, 2000, 2003, or 2008, on a 32-bit platform, click here.
For Windows Vista , 2003, 2008, on a 64-bit platform, click here
For Windows Vista 32-bit, click here.

If these links don’t work for you, try navigating IBM’s site:
www.ibm.com > support & downloads > fixes, updates, and drivers
Category > (under SYSTEMS) system storage
Product Family > disk systems, Product > DS4800 or whichever DS4000
Select Operating System and Click GO
Click Downloads on the Support and Downloads box.
.............

For more information do go to How to Install IBM DS4000 (FastT) Storage Manager Client on Windows

Wednesday, October 26, 2011

Comparison of File System

Wikipedia has done a wonderful job of comparing various file systems in Comparison of file systems. For those who are assessing file system offerings, do read it.

A shorter but less comprehensive reference is File System Primer from Novell.

For a comparison between the Lustre File System and Panasas ActiveScale, see Performance comparison of the Cluster File Systems at the Intel CRT-DC.

Monday, October 24, 2011

Looking for solution.... CPU soft lockup detected for CentOS 5 and IBM BladeCentre

One of my IBM BladeCenter nodes hangs, and the log shows this message:
"BUG: soft lockup - CPU#1 stuck for 10s! [rpciod/1:3646]"

I have been looking around and found a reference, though not a solution. I have seen it cited in other forums as well:
RHEL 5.X CPU soft lockup detected in PAGE_LOCK_ANON_VMA - IBM BladeCenter and System x 

Symptom
While running the Red Hat Enterprise Linux 5.x (RHEL5) family of products, the kernel may report the following error:
BUG: soft lockup - CPU#XX stuck for 10s!

Where XX can be the number of any processor in the system.
The associated stack backtrace points to page_lock_anon_vma as the code running at the time of the soft lockup detection.

Affected configurations
The system is configured with at least one of the following:
  • Red Hat Enterprise Linux 5, any update
This tip is not system specific.
This tip is not option specific.
Note: This does not imply that the network operating system will work under all combinations of hardware and software. Please see the compatibility page for more information: http://www.ibm.com/servers/eserver/serverproven/compat/us/

Solution
This behavior will be corrected in subsequent families of Red Hat products. For more information, contact Red Hat at the following URL:
http://www.redhat.com/about/contact/

Sunday, October 23, 2011

What is nearline SAS Hard Disk?

"Nearline" refers to hard disks with a lower rotational speed, which usually means high-capacity SATA drives. On the other hand, "SAS is an enterprise-class drive which supposedly has a more robust mechanical specification and a controller/firmware optimized for high-volume I/O, manageability, and better error detection and correction".

So nearline SAS really means standard consumer-class SATA drives with a SAS interface.

Large capacity with enterprise interface.....Not too bad.

For more information on Nearline, see http://en.wikipedia.org/wiki/Nearline_storage

Saturday, October 22, 2011

Myricom DBL 2.0 Achieves Lowest UDP and TCP Latency for High Frequency Trading

For the full article, see Myricom DBL 2.0 Achieves Lowest UDP and TCP Latency for High Frequency Trading

Myricom DBL 2.0 software has benchmarked application-to-application UDP latency of under 3.5 microseconds and transparent sockets TCP latency of 4.0 microseconds. For HFT applications, DBL enables unmatched networking performance for UDP multicast and TCP order execution, all over industry-standard 10-Gigabit Ethernet
 .......
DBL reduces latency by microseconds for existing applications running on standard TCP/UDP Ethernet networks. With the DBL solution, end-users can achieve extreme performance without rewriting their applications or resorting to specialty networks such as Infiniband. DBL provides transparent acceleration in both Linux and Windows environments.
......
In addition to extremely low UDP and TCP communication latency, DBL 2.0 delivers repeatable low latency, rather than unpredictable and variable latency performance found with competing solutions. Repeatable low latency performance is critical, as packet delay or loss in mission-critical trading and order environments can be devastating to the traders' bottom line.

Wednesday, October 19, 2011

What is SCSI RDMA Protocol (SRP)?

"The SCSI RDMA Protocol (SRP) is an emerging industry standard protocol for utilizing block storage devices over an InfiniBand™ fabric. The use of RDMA makes higher throughput and lower latency possible than what is possible through e.g. the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware."

With SRP, you can use InfiniBand as an alternative interconnect instead of relying on Fibre Channel. The advantages of InfiniBand are obvious: it offers very high throughput and low latency, which are important for read- and write-heavy workloads.

Using dd to test and analyse read and write performance

According to Wikipedia, dd is a common UNIX program whose primary purpose is the low-level copying of raw data. dd has many uses, but in this post we will use it to test and analyse the read and write performance of a file system.

# dd if=/dev/zero of=/home/myaccount/outfile bs=4M count=4096 

4096+0 records in
4096+0 records out
17179869184 bytes (17 GB) copied, 433.088 seconds, 39.7 MB/s

if = input file
of = output file
bs = block size
count = number of blocks of size bs to copy (here 4096 x 4M = 16 GiB)
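
The figures in the dd report can be cross-checked by hand. A small sketch, using only the values from the run above (the function names are my own): the total is bs multiplied by count, and the throughput is that total divided by the elapsed seconds, with 1 MB taken as 1,000,000 bytes as in dd's summary line.

```shell
#!/bin/sh
# Total bytes written: block size in MiB times 1024*1024, times block count.
total_bytes() {
    echo $(( $1 * 1024 * 1024 * $2 ))
}

# Throughput in MB/s, where 1 MB = 1000000 bytes.
throughput_mb() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

bytes=$(total_bytes 4 4096)
echo "$bytes bytes"
echo "$(throughput_mb "$bytes" 433.088) MB/s"
```

Both results agree with the dd output: 17179869184 bytes and 39.7 MB/s.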

Tuesday, October 18, 2011

Malformed database image issue with yum

Today I was doing a yum install after updating my LVM and I hit a "malformed database image" issue. This error can be easily rectified. Just do a

# yum clean dbcache

Then do a
# yum check

Monday, October 17, 2011

Extend LVM on Vmware Linux Guest

One of my mirrors ran out of space today. I came across an excellent article, How to extend LVM on Vmware Guest running Linux, on Edward's Blog. I tried his tutorial and it worked without a hitch.


Saturday, October 15, 2011

Which File System Blocksize is suitable for my system?

Taken from IBM Developer Network "File System Blocksize"

Although the article refers to the General Parallel File System (GPFS), there are many good pointers that system administrators can take note of.

Here are some excerpts from the article........ 

This is a question many system administrators ask before preparing a system: how do you choose a blocksize for your file system? IBM Developer Network (File System Blocksize) recommends the following block sizes for various types of application.


IO Type                 Application Examples                                                     Blocksize
Large Sequential IO     Scientific Computing, Digital Media                                      1MB to 4MB
Relational Database     DB2, Oracle                                                              512KB
Small I/O Sequential    General File Service, File based Analytics, Email, Web Applications      256KB
Special*                Special                                                                  16KB-64KB

What if I do not know my application IO profile?
Often you do not have good information on the nature of the IO profile, or the applications are so diverse that it is difficult to optimize for one or the other. There are generally two approaches to designing for this type of situation: separation or compromise.

Separation
In this model you create two file systems, one with a large file system blocksize for sequential applications and one with a smaller block size for small file applications. You can gain benefits from having file systems of two different block sizes even on a single type of storage. Or you can use different types of storage for each file system to further optimize to the workload. In either case the idea is that you provide two file systems to your end users, for scratch space on a compute cluster for example. Then the end users can run tests themselves by pointing the application to one file system or the other and determining by direct testing which is best for their workload. In this situation you may have one file system optimized for sequential IO with a 1MB blocksize and one for more random workloads at 256KB block size.

Compromise
In this situation you either do not have sufficient information on workloads (i.e. end users won't think about IO performance) or enough storage for multiple file systems. In this case it is generally recommended to go with a blocksize of 256KB or 512KB depending on the general workloads and storage model used. With a 256KB block size you will still get good sequential performance (though not necessarily peak marketing numbers) and you will get good performance and space utilization with small files (256KB has minimum allocation of 8KB to a file). This is a good configuration for multi-purpose research workloads where the application developers are focusing on their algorithms more than IO optimization.
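
The 8KB figure quoted for a 256KB blocksize is consistent with each block being divided into 32 sub-blocks, the smallest unit that can be allocated to a file. Assuming that 1/32 ratio also applies to the other block sizes in the table (an assumption drawn from that one figure, not something the excerpt states), the minimum allocation can be tabulated with a short sketch. min_alloc_kb is just an illustrative helper name.

```shell
#!/bin/sh
# Minimum allocation in KB for a blocksize given in KB,
# assuming each block is split into 32 sub-blocks.
min_alloc_kb() {
    echo $(( $1 / 32 ))
}

for bs in 64 256 512 1024 4096; do
    echo "blocksize ${bs}KB -> minimum allocation $(min_alloc_kb "$bs")KB"
done
```

Under that assumption, the larger the blocksize, the more space a small file occupies, which is the space-utilization trade-off the excerpt describes.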

Friday, October 14, 2011

How to check FileSystem Block Size on Linux

In case you wish to find out what block size your file system is using, you can use the following commands to check:

# tune2fs -l /dev/sda1 | grep -i 'block size'
Block size:1024

# blockdev --getbsz /dev/sda1
1024

Thursday, October 13, 2011

Tuning rsize and wsize on NFS for a 10GbE network

Taken from Myricom Site "Do you have recommendations for tuning NFS on a 10GbE network"

  1. Use a recent Linux kernel, 2.6.19 or later. CentOS 6 is a good candidate.
  2. In /etc/fstab, you can set rsize=1048576,wsize=1048576
  3. You can use the above buffer sizes with NFSv3
  4. Do note that for Linux kernel 2.6.18 and below, the rsize and wsize is 32KB.
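
As a sketch of item 2, the snippet below prints an /etc/fstab entry with both sizes set to 1 MiB (1048576 bytes). The server name nfsserver, the export /export/data and the mount point /mnt/data are placeholders of mine. The nfsvers=3 option reflects item 3.

```shell
#!/bin/sh
# 1 MiB expressed in bytes, the value used for both rsize and wsize.
size=$(( 1024 * 1024 ))

# Build one fstab line: device, mount point, type, options, dump, pass.
nfs_fstab_line() {
    printf '%s %s nfs nfsvers=3,rsize=%s,wsize=%s 0 0\n' "$1" "$2" "$size" "$size"
}

nfs_fstab_line nfsserver:/export/data /mnt/data
```

After mounting, it is worth checking the options listed in /proc/mounts, since the client and server may negotiate smaller values than the ones requested.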

Monday, October 10, 2011

Using mpstat to display SMP CPU statistics

mpstat is a command-line utility that reports CPU-related statistics. On CentOS, mpstat is installed as part of the sysstat package (http://sebastien.godard.pagesperso-orange.fr/)
# yum install sysstat
1. mpstat is very straightforward. Use the command below. On my 32-core machine,
# mpstat -P ALL
11:10:11 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:10:13 PM  all   40.75    0.00    0.03    0.00    0.00    0.00    0.00   59.22   1027.50
11:10:13 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1000.50
11:10:13 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     16.50
11:10:13 PM    9    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   10    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   11    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   12  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00     10.50
11:10:13 PM   13    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   14   99.50    0.00    0.50    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   15  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   17    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   19  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   20  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   21    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   22    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   23    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   24  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   25  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   26    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
11:10:13 PM   27  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
11:10:13 PM   28    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
where
CPU - Processor number. The keyword all indicates that statistics are calculated as averages among all processors.
%user - Show the percentage of CPU utilization that occurred while executing at the user level (application).
%nice - Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
%sys - Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing interrupts or softirqs.
%iowait - Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%irq - Show the percentage of time spent by the CPU or CPUs to service interrupts.
%soft - Show the percentage of time spent by the CPU or CPUs to service softirqs. A softirq (software interrupt) is one of up to 32 enumerated software interrupts which can run on multiple CPUs at once.
%steal - Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%idle - Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
intr/s - Show the total number of interrupts received per second by the CPU or CPUs.


2. Getting an average from mpstat. To get an average, you have to supply the interval and count arguments. In this example, the interval is 2 seconds and the count is 5.
# mpstat -P ALL 2 5
At the end of the statistics report, you will see an average
Average:     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
Average:     all   40.76    0.00    0.03    0.00    0.00    0.00    0.00   59.21   1047.50
Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1000.60
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:       2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:       4   99.90    0.00    0.10    0.00    0.00    0.00    0.00    0.00      0.00
Average:       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:       7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:       8    0.00    0.00    0.10    0.00    0.00    0.00    0.00   99.90     17.30
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      10    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      12   99.90    0.00    0.00    0.00    0.00    0.10    0.00    0.00     29.70
Average:      13    0.00    0.00    0.10    0.00    0.00    0.00    0.00   99.90      0.00
Average:      14   99.50    0.00    0.50    0.00    0.00    0.00    0.00    0.00      0.00
Average:      15  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      19  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      20  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      23    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      24  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      25   99.90    0.00    0.10    0.00    0.00    0.00    0.00    0.00      0.00
Average:      26    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      27  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
Average:      30  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
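
With 32 rows it is easier to let awk summarise the Average block. The sketch below counts the CPUs whose %idle is under 50. It assumes the column layout shown above, where %idle is the 10th field on Average lines (other sysstat versions may print different columns), and it reads a few sample rows from the report rather than calling mpstat, so it runs anywhere. busy_cpus is an illustrative name.

```shell
#!/bin/sh
# Count per-CPU "Average:" rows whose %idle (field 10) is below 50.
# Field 2 is the CPU number; the "all" row and the header row are skipped.
busy_cpus() {
    awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ && $10 + 0 < 50 { n++ }
         END { print n + 0 }'
}

busy_cpus <<'EOF'
Average:     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
Average:     all   40.76    0.00    0.03    0.00    0.00    0.00    0.00   59.21   1047.50
Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1000.60
Average:       2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
Average:       4   99.90    0.00    0.10    0.00    0.00    0.00    0.00    0.00      0.00
EOF
```

On a live system the same function can be fed directly, for example with mpstat -P ALL 2 5 | busy_cpus.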

Sunday, October 9, 2011

Outline of File Hierarchy Systems in RHEL, CentOS and SL

The location of files and directories in RHEL and its clones is based on the Filesystem Hierarchy Standard (FHS) guidelines. For more information on the FHS, do read the Filesystem Hierarchy Standard.

  1. /bin/ (Essential Commands for admins and users)
  2. /usr/bin/ (Common commands for admins and users)
  3. /sbin/ (Essential commands for admins)
  4. /usr/sbin (Common commands for admins)
  5. /tmp/ (Temporary files for all users)
  6. /usr/local/ (Location for locally-installed software independent of operating system updates)
  7. /usr/share/man (Manual Pages)
  8. /usr/src (source code)
  9. /var/ (variable files such as spool and log files)
  10. /var/log/ (Log files)
  11. /etc/ (Configuration files)
  12. /proc/ (Kernel virtual file system)
  13. /dev/ (Device file)
Much of this information is derived from "Red Hat Enterprise Linux Administration Unleashed"

Saturday, October 8, 2011

Avoiding DNS Lookup for Apache 2

If you wish to avoid DNS lookups for client machines, which slow down Apache, set the HostNameLookups directive to Off in /etc/httpd/conf/httpd.conf

HostNameLookups Off

Thursday, October 6, 2011

A good read - Dissecting shared libraries

This article, "Dissecting shared libraries" from IBM DeveloperWorks, is a good read if you wish to have a deeper understanding of shared libraries.

Monday, October 3, 2011

Troubleshooting Blade Management Module connectivity issues

This article is a sub-set of the full document from IBM "Troubleshooting Management Module connectivity issues"



Solution

The Management Module (MM) and the Advanced Management Module (AMM) are the central points of management for the IBM BladeCenter chassis. As such, when the MM is not responsive, the ability to perform normal management on the chassis is significantly compromised. This document covers four different symptoms related to MM connectivity failures: (1) cannot login to the web or telnet interface because of USERID and/or PASSWORD failures, (2) cannot get any network response from the MM, (3) the MM responds to network pings, but either the web interface or telnet interface does not respond, and (4) MM failover does not work.

Throughout this document, "MM" will be used to mean either the MM or AMM. The term AMM will only be used to point out any differences between the two.

When troubleshooting MM connectivity problems, there are a few common procedures that are used in several situations.



Reset the IPaddress of the MM (this procedure does not work on the AMM)

When the MM is restored to its default TCPIP configuration, the Ethernet port on the MM will attempt to get a DHCP address. Disconnect the Ethernet cable if this is not wanted. With the Ethernet cable disconnected, the MM will search for a DHCP server for five minutes, then timeout and take the address 192.168.70.125/255.255.255.0.

Before resetting the MM to its default configuration, have a laptop local to the chassis that can connect to the MM with a cross-over cable (the AMM supports either cable type). Make sure that the laptop is configured with the IPaddress 192.168.70.100/255.255.255.0 so it will not conflict with any address on the chassis. To reset the TCPIP address on the MM, insert a paper clip into the hole on the back of the MM labeled "IPreset" until it depresses the button inside. Hold it there for just under three seconds, then remove the paper clip. That resets the MM's Ethernet interface to its default configuration.



Reset the IPaddress of the AMM using the serial cable

The AMM has a port for ethernet and serial connectivity. The serial port is at the top of the AMM, just above the video connection. To connect to the serial port, insert one end of a straight-through ethernet cable in the AMM serial port. Attach the other end of the cable to the serial dongle whose pinouts are described in the AMM Installation Guide ("Serial connection," near the end of Chapter 3).

The default serial settings for the AMM are 57k, 8 data bits, No parity, 1 stop bit, flow control off. Once connected to the serial console, login as usual. Create a basic config for the external interface with the following commands (system: x is either system:mm 1 for the AMM in slot 1 or system:mm 2 for the AMM in slot 2).

use static ip: ifconfig -eth0 -c static -T system:mm x

IPaddress: ifconfig -eth0 -i ip-address -T system:mm x

subnet mask: ifconfig -eth0 -s subnet mask -T system:mm x

gateway: ifconfig -eth0 -g IPaddress of gateway -T system:mm x

They can be combined into one long command as follows:

ifconfig -eth0 -i ip_address -s subnet mask -g IPof gateway -c static -T system:mm x






Reset the MM to its default configuration

One should remember that resetting the MM to defaults turns off the external ports for all four I/O modules, which will cut off all network and fibre connectivity. Therefore, this operation should only be done when the chassis is in a maintenance window and can be off-line for a short period of time. Also, when the MM is restored to its default configuration, it will attempt to get a DHCP address. Disconnect the Ethernet cable if a DHCP address is not wanted. The MM will search for a DHCP server for five minutes, then timeout and take the address 192.168.70.125/255.255.255.0. Before resetting the MM to its default configuration, have a laptop local to the chassis that can connect to the MM with a cross-over cable (the AMM supports either cable type). Make sure that the laptop is configured with the IPaddress 192.168.70.100/255.255.255.0 so it will not conflict with any default address on the chassis.

If the MM is accepting web logins, the default configuration can be restored in the web GUI at:

Select (MM) MM Control, click Restore Defaults, and then click Restore Defaults

Select (AMM) MM Control, click Configuration Mgmt, then click Restore Defaults or click Restore Defaults Preserve Logs

If neither login service is working, the default configuration can be restored by accessing the back of the MM. On the back of the MM, there is a pin hole that is large enough for a paper clip. It is labeled "IPreset." In addition to resetting the IPaddress, pushing a paper clip in for the right amount of time resets the entire MM configuration back to its defaults. To reset the Management Module to the default configuration, including the default login name "USERID" and password "PASSW0RD," push a paper clip into the pin hole until it hits the button inside and hold it. The amount of time required to hold the pin in varies as follows:

MM with 82D firmware or earlier = push in for 5 seconds, then release the pin for 5 seconds, then push it in for another 10 seconds. The timing is quite precise, make sure a watch with a second hand is available. When the reset starts, the fans will ramp up to full speed, which is clearly audible.

AMM or MM with 82F firmware or later = push in the pin and hold it for 10 seconds. When the reset starts, the fans will ramp up to full speed, which is clearly audible.








Remove and reinsert the MM

Troubleshooting the MM sometimes requires physically removing it from the slot and re-inserting it. Before removing it, note whether the green Ethernet LED or amber LED is lit. In normal operation with an Ethernet cable connected, the Ethernet LED will be on, and the amber LED will be off. The amber LED will come on briefly when the MM is powered on or reset. It is also a good idea to examine the female connectors when the MM is removed to confirm they have not been damaged. When both MMs are removed, the fans ramp up to full speed.

This is clearly audible. When re-inserting the MM, listen to hear if the fans return to the previous noise level. If they do, that indicates that the MM has completed its POST process. If they do not, that indicates that there is some other problem with the chassis that the MM is trying to address. For a visual indication that the MM is working correctly, look at the MM directly.

After the MM is re-inserted and an Ethernet cable connected, confirm the status of the green Ethernet LED and the amber LED. If the amber LED stays on, that indicates a fault in the MM.





Symptom 1: Cannot login due to bad userid or password

If a user makes five unsuccessful login attempts, the MM will stop accepting logins for a period of time. Two minutes is the default lockout time, though this is configurable in the MM interface at MM Control then click Login Profiles.

If login fails through both the web and telnet interface, resetting the MM to the default login of "USERID" and "PASSW0RD" can be accomplished by following the procedure "Reset the MM to its default configuration." The default login ID and password are case sensitive, and in "PASSW0RD" a zero is used for the letter "O."

If USERID/PASSWORD login problems still exist after resetting the MM to defaults, contact IBM support. If the MM does not have network connectivity after resetting defaults, follow the steps below for the appropriate symptom.







Symptom 2: MM does not respond to any network connection
If the MM does not respond to any remote network connection, troubleshooting will need to be done at the chassis. The first step is to find a laptop that can login to other MMs and connect it to the MM with a cross-over cable (either a cross-over or straight through can be used for the AMM). Verify that the IPconfiguration on the laptop puts it in the same subnet as the MM, and verify that the laptop is not running a local firewall. Try to connect to the MM via a web browser, telnet, and ping. Depending on the results you get, take the following steps:

If the laptop has complete access to the MM when connected locally, then the previous connectivity problems are most likely due to network problems on the customer's LAN, or the other workstation the customer used to access this MM.

If the laptop can ping the MM, but cannot connect via web browser or telnet, go to Symptom 3.

If the laptop cannot ping the MM, take the following steps to try and restore connectivity.

Clear the arp cache on the laptop.

If the chassis has a redundant MM, fail over to it and attempt to connect to it.

If the chassis only has one MM, move it to the other slot, following the procedure "Remove and reinsert the MM."

Follow the procedure "Reset the IPaddress of the MM" or "Reset the IPaddress for the AMM using the serial cable."

Follow the procedure "Reset the MM to its default configuration"

If these all fail, contact IBM support for assistance.



Symptom 3: Cannot connect to the MM using the web browser/telnet/ssh, but can ping the MM
The MM runs a few network servers that enable users to login and manage the chassis. If basic connectivity via 'ping' is functioning, but one or more of the login services is not working (for example, web server, telnet server), the problem is due to a configuration error or firmware defects. It is never a hardware failure. When the MM will respond to a ping, but any one of the login services does not respond, take the following steps:

Ensure that a supported web browser is being used.

If possible, verify whether the MM is running the network servers on their default network ports. If all logins fail, check with the administrator for the BladeCenter. If it is possible to login to the web interface, select MM Control and click Port Assignments. There is no way to get that information in the telnet interface.

Verify whether this workstation can connect to other MMs. If it cannot, the problem is most likely due to a firewall running on the client workstation or the network. Shutdown any firewalls on the client machine and try again. If the client still has problems connecting to multiple MMs, consult the network administrator for the LAN.

Restart the MM. If the MM responds to network logins after it has been restarted, this is most likely an MM firmware defect. Download the changelog for the current MM or AMM code and see if any similar issues have been resolved. If not, contact IBM Support for additional assistance.

At this point, troubleshooting must continue with a laptop or other workstation local to the MM. Find a laptop which can connect to other MMs, and connect it directly to the MM with a crossover cable (both cross-over and straight-through work for the AMM). Verify that the Ethernet link is up and the laptop is configured so it is on the same subnet as the MM. If the laptop can ping the MM, attempt to login to the MM with a supported web browser. If that works, contact the network administrator for assistance troubleshooting the network.

If the MM still does not allow logins over the WEB interface at this point, restore the MM to its defaults with the procedure "Reset the MM to its default configuration." If this does not restore connectivity, contact IBM Support.



Symptom 4: Failover of MM to redundant MM does not work

When there are two MMs in a chassis, one MM is active and the second MM is on standby. When a user initiates a failover from the primary to the redundant MM, the primary sends a message to the redundant MM to become the primary, then reboots itself. On rare occasions, this does not work. When it does not, take the following steps to resolve it:

Examine the MM Event log and the MM BIST log to see if any errors have been detected.

Physically remove the primary MM and see if the redundant MM boots successfully. If it does not, move it to the other slot and see if it can boot in that slot.

Reset the MM to defaults using the procedure "Reset the MM to its default configuration."

Repeat the failover process with both MMs. If it still does not work, contact IBM support.