One of our GPFS NSD nodes was stuck indefinitely in arbitrating mode. One
noticeable symptom was that users were able to log in but unable to run
"ls" on their own directories. Few articles cover this problem, but you
can get a quick diagnosis by looking at one of the NSD nodes. For this
kind of issue, run mmdiag --waiters first:
# mmdiag --waiters
.....
.....
0x7FB0C0013D10 waiting 27176.264845756 seconds, SharedHashTabFetchHandlerThread:
on ThCond 0x1C0000F9B78 (0x1C0000F9B78) (TokenCondvar), reason 'wait for SubToken to become stable'
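When the waiter list is long, it can help to filter it for threads that have been waiting an unusually long time. The following is a minimal Python sketch that parses text in the two-line format shown above. The function names, the 300-second threshold, and the regular expressions are assumptions made for illustration; other GPFS releases may print waiters differently, so treat it as a starting point rather than a tested tool.

```python
import re

# Patterns derived only from the sample entry above (assumption: other
# GPFS releases may format waiter lines differently).
HEADER_RE = re.compile(r"(0x[0-9A-Fa-f]+) waiting (\d+(?:\.\d+)?) seconds, ([^:]+):")
REASON_RE = re.compile(r"reason '([^']*)'")


def parse_waiters(text):
    """Parse waiter entries into dicts with seconds, thread and reason."""
    waiters = []
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            # A header line starts a new waiter entry.
            waiters.append({
                "seconds": float(header.group(2)),
                "thread": header.group(3).strip(),
                "reason": None,
            })
        reason = REASON_RE.search(line)
        # Attach the first reason seen to the most recent waiter.
        if reason and waiters and waiters[-1]["reason"] is None:
            waiters[-1]["reason"] = reason.group(1)
    return waiters


def long_waiters(text, threshold=300.0):
    """Return waiters at or above threshold seconds, longest first.

    The 300-second default is an arbitrary illustrative value.
    """
    hits = [w for w in parse_waiters(text) if w["seconds"] >= threshold]
    return sorted(hits, key=lambda w: w["seconds"], reverse=True)
```

To use it, save the output of mmdiag --waiters to a file on the NSD node, read the file into a string, and pass that string to long_waiters(). A waiter such as the one above, stuck for more than 27,000 seconds on 'wait for SubToken to become stable', would appear at the top of the result.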
References:
- IZ17622: GPFS DEADLOCK WAITING FOR SUBTOKEN TO BECOME STABLE CAUSES HANG
- GPFS File System Deadlock
For more information on the resolution, see
GPFS NSD Nodes stuck in Arbitrating Mode (Linux Cluster)