Monday, December 16, 2013

GPFS NSD Nodes stuck in Arbitrating Mode

One of our GPFS NSD nodes was stuck in arbitrating mode indefinitely. The most noticeable symptom was that users were able to log in but unable to do an "ls" of their own directories. There are few articles on this problem. You can get a quick diagnosis by looking at one of the NSD nodes; for this kind of issue, run mmdiag --waiters first.
# mmdiag --waiters 

.....
.....
0x7FB0C0013D10 waiting 27176.264845756 seconds, SharedHashTabFetchHandlerThread: 
on ThCond 0x1C0000F9B78 (0x1C0000F9B78) (TokenCondvar), reason 'wait for SubToken to become stable'
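
When the waiter list is long, it can help to filter it down to the threads that have been waiting the longest. The following is a minimal sketch, not part of GPFS: long_waiters is a hypothetical helper name, the 300-second default is an arbitrary assumption, and it assumes each waiter line starts with "<address> waiting <seconds> seconds," as in the output above. Other GPFS releases may print waiters in a different format.

```shell
# Hypothetical helper (not a GPFS command): print only the waiters that
# have been waiting longer than a threshold given in seconds.
# Reads `mmdiag --waiters` style text on stdin.
# Assumption: lines look like "<address> waiting <seconds> seconds, <thread>: ..."
long_waiters() {
    threshold="${1:-300}"   # default threshold is an arbitrary assumption
    awk -v t="$threshold" '
        # field 2 is the word "waiting", field 3 the elapsed seconds,
        # field 4 starts with "seconds"
        $2 == "waiting" && $4 ~ /^seconds/ && ($3 + 0) > t { print }
    '
}

# Usage on an NSD node (requires GPFS to be installed):
#   mmdiag --waiters | long_waiters 600
```

A waiter that has been stuck for thousands of seconds, like the SharedHashTabFetchHandlerThread above, will pass such a filter, while short-lived waiters from normal activity are dropped. Note that the filter prints only the line carrying the wait time; if the "reason" text is wrapped onto a following line, look it up in the full output.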
References:
  1. IZ17622: GPFS DEADLOCK WAITING FOR SUBTOKEN TO BECOME STABLE CAUSES HANG
  2. GPFS File System Deadlock

For more information on the resolution, see GPFS NSD Nodes stuck in Arbitrating Mode (Linux Cluster)
