Monday, October 3, 2011

Troubleshooting Blade Management Module connectivity issues

This article is a sub-set of the full document from IBM "Troubleshooting Management Module connectivity issues"



Solution

The Management Module (MM) and the Advanced Management Module (AMM) are the central points of management for the IBM BladeCenter chassis. When the MM is not responsive, the ability to manage the chassis is significantly compromised. This document covers four symptoms related to MM connectivity failures: (1) cannot log in to the web or telnet interface because of user ID or password failures; (2) cannot get any network response from the MM; (3) the MM responds to network pings, but the web interface or telnet interface does not respond; and (4) MM failover does not work.

Throughout this document, "MM" will be used to mean either the MM or AMM. The term AMM will only be used to point out any differences between the two.

When troubleshooting MM connectivity problems, there are a few common procedures that are used in several situations.



Reset the IP address of the MM (this procedure does not work on the AMM)

When the MM is restored to its default TCP/IP configuration, the Ethernet port on the MM will attempt to get a DHCP address. Disconnect the Ethernet cable if this is not wanted. With the Ethernet cable disconnected, the MM will search for a DHCP server for five minutes, then time out and take the address 192.168.70.125/255.255.255.0.

Before resetting the MM to its default configuration, have a laptop local to the chassis that can connect to the MM with a cross-over cable (the AMM supports either cable type). Make sure that the laptop is configured with the IP address 192.168.70.100/255.255.255.0 so it will not conflict with any address on the chassis. To reset the TCP/IP configuration on the MM, insert a paper clip into the hole on the back of the MM labeled "IPreset" until it depresses the button inside. Hold it there for just under three seconds, then remove the paper clip. That resets the MM's Ethernet interface to its default configuration.
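The address check described above can be sketched in a few lines of Python. This is only an illustration using the two addresses quoted in the procedure; the function names same_subnet and conflicts are made up for this example.

```python
import ipaddress

# Addresses quoted in the procedure above (255.255.255.0 is a /24 mask).
MM_DEFAULT = ipaddress.ip_interface("192.168.70.125/255.255.255.0")
LAPTOP = ipaddress.ip_interface("192.168.70.100/255.255.255.0")


def same_subnet(a, b):
    """Return True when both interfaces belong to the same IP network."""
    return a.network == b.network


def conflicts(a, b):
    """Return True when both interfaces use the same host address."""
    return a.ip == b.ip


if __name__ == "__main__":
    print("network:", MM_DEFAULT.network)
    print("same subnet:", same_subnet(LAPTOP, MM_DEFAULT))
    print("address conflict:", conflicts(LAPTOP, MM_DEFAULT))
```

The laptop can reach the MM over the direct cable only if the first check is true and the second is false.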



Reset the IP address of the AMM using the serial cable

The AMM has ports for both Ethernet and serial connectivity. The serial port is at the top of the AMM, just above the video connection. To connect to the serial port, insert one end of a straight-through Ethernet cable in the AMM serial port. Attach the other end of the cable to the serial dongle whose pinouts are described in the AMM Installation Guide ("Serial connection," near the end of Chapter 3).

The default serial settings for the AMM are 57600 baud (57k), 8 data bits, no parity, 1 stop bit, flow control off. Once connected to the serial console, log in as usual. Create a basic configuration for the external interface with the following commands (the target system:mm x is either system:mm 1 for the AMM in slot 1 or system:mm 2 for the AMM in slot 2).

use static IP: ifconfig -eth0 -c static -T system:mm x

IP address: ifconfig -eth0 -i ip_address -T system:mm x

subnet mask: ifconfig -eth0 -s subnet_mask -T system:mm x

gateway: ifconfig -eth0 -g gateway_ip -T system:mm x

They can be combined into one long command as follows:

ifconfig -eth0 -i ip_address -s subnet_mask -g gateway_ip -c static -T system:mm x
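Because a mistyped address on the serial console can leave the AMM unreachable again, it may help to validate the values before typing the command. The following Python sketch builds the combined command string in the form shown above. The function name amm_static_ip_command and the example addresses are hypothetical, and the exact target syntax should be checked against the AMM command-line reference for the firmware in use.

```python
import ipaddress


def amm_static_ip_command(ip, netmask, gateway, slot):
    """Build the combined ifconfig command for the AMM in slot 1 or 2.

    Raises ValueError if an address is malformed or the gateway is not
    on the same subnet as the AMM address.
    """
    if slot not in (1, 2):
        raise ValueError("slot must be 1 or 2")
    # ip_interface validates both the address and the netmask.
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not in {iface.network}")
    # Flag order follows the combined command given in this article.
    return (
        f"ifconfig -eth0 -i {iface.ip} -s {iface.netmask} "
        f"-g {gw} -c static -T system:mm {slot}"
    )


if __name__ == "__main__":
    # Example values only; substitute the addresses for your network.
    print(amm_static_ip_command("10.0.0.50", "255.255.255.0", "10.0.0.1", 1))
```

The gateway check is a convenience added here, not something the article requires; it catches the common mistake of pairing an address with a gateway from a different subnet.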






Reset the MM to its default configuration

Remember that resetting the MM to defaults turns off the external ports for all four I/O modules, which will cut off all network and fibre connectivity. Therefore, this operation should only be done when the chassis is in a maintenance window and can be offline for a short period of time. Also, when the MM is restored to its default configuration, it will attempt to get a DHCP address. Disconnect the Ethernet cable if a DHCP address is not wanted. The MM will search for a DHCP server for five minutes, then time out and take the address 192.168.70.125/255.255.255.0. Before resetting the MM to its default configuration, have a laptop local to the chassis that can connect to the MM with a cross-over cable (the AMM supports either cable type). Make sure that the laptop is configured with the IP address 192.168.70.100/255.255.255.0 so it will not conflict with any default address on the chassis.

If the MM is accepting web logins, the default configuration can be restored in the web GUI as follows:

MM: select MM Control, click Restore Defaults, and then click Restore Defaults

AMM: select MM Control, click Configuration Mgmt, then click Restore Defaults or click Restore Defaults Preserve Logs

If neither login service is working, the default configuration can be restored by accessing the back of the MM. On the back of the MM, there is a pin hole that is large enough for a paper clip. It is labeled "IPreset." In addition to resetting the IPaddress, pushing a paper clip in for the right amount of time resets the entire MM configuration back to its defaults. To reset the Management Module to the default configuration, including the default login name "USERID" and password "PASSW0RD," push a paper clip into the pin hole until it hits the button inside and hold it. The amount of time required to hold the pin in varies as follows:

MM with 82D firmware or earlier = push in for 5 seconds, then release the pin for 5 seconds, then push it in for another 10 seconds. The timing is quite precise, so make sure a watch with a second hand is available. When the reset starts, the fans will ramp up to full speed, which is clearly audible.

AMM or MM with 82F firmware or later = push in the pin and hold it for 10 seconds. When the reset starts, the fans will ramp up to full speed, which is clearly audible.








Remove and reinsert the MM

Troubleshooting the MM sometimes requires physically removing it from the slot and re-inserting it. Before removing it, note whether the green Ethernet LED or the amber LED is lit. In normal operation with an Ethernet cable connected, the Ethernet LED will be on and the amber LED will be off. The amber LED will come on briefly when the MM is powered on or reset. While the MM is removed, it is also a good idea to examine the female connectors to confirm they have not been damaged.

When both MMs are removed, the fans ramp up to full speed. This is clearly audible. When re-inserting the MM, listen to hear if the fans return to the previous noise level. If they do, that indicates that the MM has completed its POST process. If they do not, that indicates that there is some other problem with the chassis that the MM is trying to address. For a visual indication that the MM is working correctly, look at the MM directly.

After the MM is re-inserted and an Ethernet cable connected, confirm the status of the green Ethernet LED and the amber LED. If the amber LED stays on, that indicates a fault in the MM.





Symptom 1: Cannot login due to bad userid or password

If a user makes five unsuccessful login attempts, the MM will stop accepting logins for a period of time. Two minutes is the default lockout time, though this is configurable in the MM interface under MM Control, Login Profiles.
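The lockout behavior can be pictured with a small Python model. This is not the MM firmware's logic, only an illustration of the rule described above; the class name LoginLockout is made up, and it assumes that the failures are counted consecutively and that a successful login clears the count.

```python
class LoginLockout:
    """Toy model: lock out logins after a number of failed attempts."""

    def __init__(self, max_failures=5, lockout_seconds=120):
        self.max_failures = max_failures        # five attempts per the article
        self.lockout_seconds = lockout_seconds  # two-minute default per the article
        self.failures = 0
        self.locked_until = None

    def is_locked(self, now):
        """Return True if logins are refused at time `now` (in seconds)."""
        return self.locked_until is not None and now < self.locked_until

    def record_failure(self, now):
        """Count a failed login; start the lockout when the limit is hit."""
        if self.is_locked(now):
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.locked_until = now + self.lockout_seconds
            self.failures = 0

    def record_success(self):
        """Assumption: a successful login clears the failure count."""
        self.failures = 0


if __name__ == "__main__":
    lock = LoginLockout()
    for second in range(5):
        lock.record_failure(second)
    print("locked right after 5th failure:", lock.is_locked(5))
    print("locked two minutes later:", lock.is_locked(4 + 120))
```

In practice this means that repeated guessing at the password only extends the outage; wait out the lockout period before trying the default credentials.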

If login fails through both the web and telnet interfaces, the MM can be reset to the default login of "USERID" and "PASSW0RD" by following the procedure "Reset the MM to its default configuration." The default login ID and password are case sensitive, and in "PASSW0RD" a zero is used in place of the letter "O."

If user ID or password login problems still exist after resetting the MM to defaults, contact IBM support. If the MM does not have network connectivity after resetting defaults, follow the steps below for the appropriate symptom.







Symptom 2: MM does not respond to any network connection
If the MM does not respond to any remote network connection, troubleshooting will need to be done at the chassis. The first step is to find a laptop that can log in to other MMs and connect it to the MM with a cross-over cable (either a cross-over or straight-through cable can be used for the AMM). Verify that the IP configuration on the laptop puts it in the same subnet as the MM, and verify that the laptop is not running a local firewall. Try to connect to the MM via a web browser, telnet, and ping. Depending on the results you get, take the following steps:

If the laptop has complete access to the MM when connected locally, then the previous connectivity problems are most likely due to network problems on the customer's LAN, or the other workstation the customer used to access this MM.

If the laptop can ping the MM, but cannot connect via web browser or telnet, go to Symptom 3.

If the laptop cannot ping the MM, take the following steps to try and restore connectivity.

Clear the ARP cache on the laptop.

If the chassis has a redundant MM, fail over to it and attempt to connect to it.

If the chassis only has one MM, move it to the other slot, following the procedure "Remove and reinsert the MM."

Follow the procedure "Reset the IP address of the MM" or "Reset the IP address of the AMM using the serial cable."

Follow the procedure "Reset the MM to its default configuration."

If these all fail, contact IBM support for assistance.
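The decision logic for the local laptop test can be summarized as a small Python function. This is only a restatement of the three cases above; the function name next_step and the wording of the returned strings are made up for this sketch, and the three inputs are the results observed by hand from the laptop.

```python
def next_step(ping_ok, web_ok, telnet_ok):
    """Map the local laptop test results to the next troubleshooting step."""
    if ping_ok and web_ok and telnet_ok:
        # Full local access: the MM itself is healthy.
        return "Check the customer's LAN or the workstation originally used."
    if ping_ok:
        # Ping works, but at least one login service does not.
        return "Go to Symptom 3."
    # No ping response at all from the directly connected laptop.
    return (
        "Clear the ARP cache, try the redundant MM or the other slot, "
        "reset the IP address, then reset the MM to defaults."
    )


if __name__ == "__main__":
    print(next_step(ping_ok=True, web_ok=False, telnet_ok=False))
```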



Symptom 3: Cannot connect to the MM using the web browser/telnet/ssh, but can ping the MM
The MM runs a few network servers that enable users to log in and manage the chassis. If basic connectivity via 'ping' is functioning, but one or more of the login services is not working (for example, the web server or telnet server), the problem is due to a configuration error or a firmware defect. It is never a hardware failure. When the MM responds to a ping, but any one of the login services does not respond, take the following steps:

Ensure that a supported web browser is being used.

If possible, verify whether the MM is running the network servers on their default network ports. If all logins fail, check with the administrator for the BladeCenter. If it is possible to login to the web interface, select MM Control and click Port Assignments. There is no way to get that information in the telnet interface.

Verify whether this workstation can connect to other MMs. If it cannot, the problem is most likely due to a firewall running on the client workstation or the network. Shut down any firewalls on the client machine and try again. If the client still has problems connecting to multiple MMs, consult the network administrator for the LAN.

Restart the MM. If the MM responds to network logins after it has been restarted, this is most likely an MM firmware defect. Download the changelog for the current MM or AMM code and see if any similar issues have been resolved. If not, contact IBM Support for additional assistance.

At this point, troubleshooting must continue with a laptop or other workstation local to the MM. Find a laptop which can connect to other MMs, and connect it directly to the MM with a cross-over cable (both cross-over and straight-through cables work for the AMM). Verify that the Ethernet link is up and the laptop is configured so it is on the same subnet as the MM. If the laptop can ping the MM, attempt to log in to the MM with a supported web browser. If that works, contact the network administrator for assistance troubleshooting the network.

If the MM still does not allow logins over the web interface at this point, restore the MM to its defaults with the procedure "Reset the MM to its default configuration." If this does not restore connectivity, contact IBM Support.
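To see quickly which login services answer, a workstation can attempt a plain TCP connection to each service port. The Python sketch below assumes the conventional ports for web (80), telnet (23), and SSH (22); the MM's actual ports may have been changed under Port Assignments, so adjust the table accordingly. The names DEFAULT_PORTS, port_open, and probe_services are made up for this example, and the address in the usage line is the MM factory default quoted earlier.

```python
import socket

# Assumed conventional ports; the MM's ports can be reassigned.
DEFAULT_PORTS = {"web": 80, "telnet": 23, "ssh": 22}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe_services(host, ports=DEFAULT_PORTS, check=port_open):
    """Return a dict mapping each service name to True/False."""
    return {name: check(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in probe_services("192.168.70.125").items():
        print(f"{name}: {'responding' if ok else 'no response'}")
```

A port that accepts the connection only shows that the server is listening; a login can still fail afterward, so this narrows the problem rather than replacing the steps above.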



Symptom 4: Failover of MM to redundant MM does not work

When there are two MMs in a chassis, one MM is active and the second MM is on standby. When a user initiates a failover from the primary to the redundant MM, the primary sends a message to the redundant MM to become the primary, then reboots itself. On rare occasions, this does not work. When it does not, take the following steps to resolve it:

Examine the MM Event log and the MM BIST log to see if any errors have been detected.

Physically remove the primary MM and see if the redundant MM boots successfully. If it does not, move it to the other slot and see if it can boot in that slot.

Reset the MM to defaults using the procedure "Reset the MM to its default configuration."

Repeat the failover process with both MMs. If it still does not work, contact IBM support.