Wednesday, February 26, 2014

Using mcelog to detect CPU and memory issues on CentOS 6

mcelog is a daemon that collects and decodes Machine Check Exception (MCE) data on x86-64 machines.

According to the mcelog website:

The mcelog daemon accounts memory and some other errors in various ways. mcelog --client can be used to query a running daemon. The daemon can also execute triggers when configurable error thresholds are exceeded. This is used to implement a range of automatic predictive failure analysis algorithms: including bad page offlining and automatic cache error handling. User defined actions can be also configured.
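
As a rough sketch of the client query mentioned above, the wrapper below only calls mcelog --client when the daemon appears to be running. The function name query_mcelogd is made up for this example, and the service name mcelogd is assumed to match the one CentOS 6 uses; adjust both to your setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of mcelog): query the running daemon, if any.
query_mcelogd() {
    # "service mcelogd status" is assumed to exit 0 when the daemon is running
    if service mcelogd status >/dev/null 2>&1; then
        # ask the running daemon for the errors it has accounted so far
        mcelog --client
    else
        echo "mcelogd is not running"
        return 1
    fi
}
```

Source the function and run query_mcelogd as root. If the daemon is not running, it prints a short message instead of failing on the client call.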

On CentOS 6, mcelog is installed by default. If it is missing, you can install it with yum:
# yum install mcelog
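
Before relying on mcelog, it can help to confirm that the binary is present and that the CPU is supported. The following is a minimal sketch, assuming mcelog is on the PATH; the function name check_mcelog and its messages are invented for illustration. It uses the --supported flag, which exits with 0 when the CPU is supported.

```shell
#!/bin/bash
# Hypothetical helper: report whether mcelog is installed and usable here.
check_mcelog() {
    # is the mcelog command available at all?
    if ! command -v mcelog >/dev/null 2>&1; then
        echo "mcelog: not installed"
        return 1
    fi
    # --supported does nothing except set the exit code
    if mcelog --supported >/dev/null 2>&1; then
        echo "mcelog: machine checks supported"
    else
        echo "mcelog: machine checks not supported on this CPU"
        return 2
    fi
}
```

Running check_mcelog after the install gives a quick yes/no answer without touching the log file.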

CentOS already ships a cron job that runs an hourly check. Take a look at /etc/cron.hourly/mcelog.cron. It should look something like this:
#!/bin/bash

# do not run if mcelogd is running
service mcelogd status >& /dev/null
[ $? -eq 0 ] && exit 0

# is mcelog supported?
/usr/sbin/mcelog --supported >& /dev/null
if [ $? -eq 1 ]; then
       exit 1;
fi

/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog

To view the logged errors, take a look at /var/log/mcelog:
# less /var/log/mcelog

To follow the log in real time:
# tail -f /var/log/mcelog
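
Since the cron job only appends to /var/log/mcelog when mcelog has something to report, a quick check for a non-empty file tells you whether anything has been logged. This is only a sketch; the function name mcelog_log_status and its messages are made up for this example, and the default path is the one used by the cron script.

```shell
#!/bin/bash
# Hypothetical helper: summarize the state of the mcelog log file.
mcelog_log_status() {
    # default to the file the hourly cron job appends to
    local log="${1:-/var/log/mcelog}"
    local lines
    if [ ! -e "$log" ]; then
        echo "no log file at $log"
    elif [ -s "$log" ]; then
        # non-empty file: mcelog has written something
        lines=$(wc -l < "$log")
        echo "$log has $((lines)) lines of mcelog output"
    else
        echo "$log is empty: nothing logged so far"
    fi
}
```

Call mcelog_log_status with no argument to check the default log, or pass another path as the first argument.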
