I encountered this error on my compute nodes using Torque 4.2.5.
pbs_mom.29384;Svr;pbs_mom;LOG_ERROR::read_tcp_reply, Mismatching protocols. Expected protocol 4 but read reply for 0
This error message is quite misleading. I first looked at the network protocols in use, which were InfiniBand (IB) and Ethernet.
When I ran pbsnodes -l, all the compute nodes were reported as down.
# pbsnodes -l
node-c00 down
node-c01 down
.....
.....
After some troubleshooting, I realised that the error was caused by inconsistent use of the short and long hostnames. In my /etc/hosts, I listed the long hostname first for each compute node, and that is the name the Torque server picks up.
192.168.1.2 node-c00.cluster.com node-c00
......
......
But on each of the client nodes, in /etc/sysconfig/network, I used the short hostname. This mismatch confused the Torque server.
HOSTNAME=node-c00
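The mismatch described above can be checked with a few lines of shell. This is only a minimal sketch: the function names canonical_name and check_hostname are made up for illustration and are not part of Torque, and it assumes that the first name on the matching /etc/hosts line is the one that matters, as was the case here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Torque.

# Print the first name that a hosts file lists on the line containing
# the given hostname (as either the first name or an alias).
# Usage: canonical_name <hosts-file> <hostname>
canonical_name() {
    awk -v want="$2" '
        { sub(/#.*/, "") }                    # drop comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)         # field 1 is the IP address
                if ($i == want) { print $2; exit }
        }
    ' "$1"
}

# Compare the name a node reports for itself with the first name
# listed for it in the hosts file.
# Usage: check_hostname <hosts-file> <hostname>
check_hostname() {
    canon=$(canonical_name "$1" "$2")
    if [ -z "$canon" ]; then
        echo "MISSING: $2 not found in $1"
        return 2
    elif [ "$canon" != "$2" ]; then
        echo "MISMATCH: node reports $2, hosts file lists $canon first"
        return 1
    fi
    echo "OK: $2"
}
```

On a compute node it could be called as check_hostname /etc/hosts "$(hostname)". With the files shown above, it would report a mismatch between node-c00 and node-c00.cluster.com.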
To correct this, change HOSTNAME on each client node to the long name
HOSTNAME=node-c00.cluster.com
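Note that /etc/sysconfig/network is read at boot, so if the node is not rebooted, the running hostname also has to be set (for example with the hostname command). Then restart pbs_mom on the client node and the nodes should come back up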
# service pbs_mom restart
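To confirm the fix, pbsnodes -l can be run again on the server. As a rough sketch, the small filter below counts the nodes still reported as down. The function name count_down_nodes is made up for illustration, and it assumes the two-column "name state" output shown earlier.

```shell
#!/bin/sh
# Hypothetical helper, not part of Torque.
# Count the nodes that a `pbsnodes -l` listing (read from stdin)
# reports with a state containing "down".
count_down_nodes() {
    awk '$2 ~ /down/ { n++ } END { print n + 0 }'
}
```

It could be used as pbsnodes -l | count_down_nodes, and should print 0 once every node has reconnected.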