Monitoring the load on your server

Submitted by Julian Stokes on 25 November, 2013 - 15:32

One of the keys to running a web/mail server is to try to be aware of potential issues before they become problems. That means stepping away from the glamorous world of graphics and flashy website cleverness into the murky depths of server monitoring.

Every morning begins with the perusal of the "Logwatch" email, prepared in the early hours by the machine itself, summarising and detailing what has been going on in the past 24 hours. We see all sorts of information here, from the number of emails sent and rejected as spam to various failed attempts to gain access to the server by guessing user names and passwords.

But that's only good as far as it goes. Logwatch is about the past. What if something's going on right now? I need to know about it as it's happening. Being told about a mishap in a message tomorrow morning will not help my customers now. My aim is, as far as possible, to address any potential problems before my customers notice!

One way of checking that everything is OK is to monitor the load on the server, to see how hard it is working. This is where a load monitoring script comes in. Every minute of every day it checks that the server load has not passed a predetermined threshold, and if it has, it fires off an email to me with useful information about what is going on.
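As a rough illustration of the threshold check described above, here is a minimal sketch that assumes a Linux system, where the first field of /proc/loadavg is the 1-minute load average. The function names `load_int` and `load_exceeds` are made up for this example and are not part of the script shown later.

```shell
#!/bin/bash
# Sketch only: function names and the LIMIT value are illustrative.

# Print the whole-number part of a load figure such as "2.75".
load_int() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when the whole-number part of the load
# is strictly greater than the limit.
load_exceeds() {
    [ "$(load_int "$1")" -gt "$2" ]
}

LIMIT=1

# On Linux the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r one_min _ < /proc/loadavg
    if load_exceeds "$one_min" "$LIMIT"; then
        echo "1-minute load $one_min is above the limit of $LIMIT"
    else
        echo "1-minute load $one_min is within the limit of $LIMIT"
    fi
fi
```

Because only the whole-number part is compared, a limit of 1 is not treated as exceeded until the load reaches 2.00. Pick the limit with that in mind, and with the number of CPU cores on the machine.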

I got my load monitoring script from http://linuxadministrator.pro/blog/?p=293
And a very good script it is too. Copying it saved me a great deal of hard work, but as with so many things it didn't just work "out of the tin". I had to do a little fettling and correct a minor mistake. Being a good netizen I have informed the author, but I would like to take the opportunity to share with you my corrected version so that I can possibly save you even more time!

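The mistake in the original version is a missing '--', which should appear in front of the email address that is to receive the alerts. When a file is attached with -a, mutt needs the '--' to tell where the list of attachments ends and the recipients begin.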
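To make the role of the '--' separator concrete, the sketch below builds the mutt argument list in a bash array and prints it, one argument per line, instead of sending any mail. The function name, the subject, the attachment path and the address are all placeholders invented for the example.

```shell
#!/bin/bash
# Illustration only: prints the mutt argument list rather than sending mail.
# The subject, attachment path and address are placeholders.

build_mutt_args() {
    local subject=$1 attachment=$2 recipient=$3
    # Everything after -a is taken as an attachment until "--" is seen,
    # so the recipient has to come after the separator.
    MUTT_ARGS=(-s "$subject" -a "$attachment" -- "$recipient")
}

build_mutt_args "ALERT" /tmp/example_attachment admin@example.com
printf '%s\n' "${MUTT_ARGS[@]}"
```

With the array built this way, the real call would be `/usr/bin/mutt "${MUTT_ARGS[@]}" < /tmp/load.txt`.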

############### START ###############

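    #!/bin/bash

    CUR_TIME=$(date +"%A %b %e %r")
    HOSTNAME=$(hostname)

    # Whole-number part of the 1-minute load average, e.g. "2" for 2.75.
    Load_AVG=$(uptime | cut -d'l' -f2 | awk '{print $3}' | cut -d. -f1)
    # The 1, 5 and 15 minute load averages, with all commas removed.
    LOAD_CUR=$(uptime | cut -d'l' -f2 | awk '{print $3 " " $4 " " $5}' | sed 's/,//g')

    # Set the following value according to your requirements.
    LIMIT=1

    if [ "$Load_AVG" -gt "$LIMIT" ]
    then

    # Snapshot of the process tree, sent as an attachment.
    # BSD-style options (no leading dash) avoid a syntax warning from some ps versions.
    /bin/ps auxf >> /root/process_out

    # Body of the alert email.
    echo "Current Time :: $CUR_TIME" >> /tmp/load.txt
    echo "Current Load Average :: $LOAD_CUR" >> /tmp/load.txt
    echo "The list of currently running processes is attached." >> /tmp/load.txt
    echo "Kindly log in to your server and check." >> /tmp/load.txt

    # The '--' separates the attachment from the recipient address.
    /usr/bin/mutt -s "ALERT!!! High 1 minute load on '$HOSTNAME'" -a /root/process_out -- putyouremailaddresshere < /tmp/load.txt

    fi

    # Clean up the temporary files whether or not an alert was sent.
    /bin/rm -f /tmp/load.txt
    /bin/rm -f /root/process_out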

    ############### END ###############
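
The post does not say how the script is run every minute. One common way, assumed here, is a cron job. The sketch below prints a suitable crontab entry. The path /root/loadmon.sh and the function name are hypothetical, so substitute wherever you saved the script.

```shell
#!/bin/bash
# Sketch: the script path is hypothetical; adjust it to wherever you saved the script.
SCRIPT_PATH=/root/loadmon.sh

# Build a crontab entry that runs the given command once a minute.
every_minute_entry() {
    printf '* * * * * %s\n' "$1"
}

# Print the entry so it can be reviewed first.
every_minute_entry "$SCRIPT_PATH"

# To append it to the current user's crontab, uncomment the next line:
# ( crontab -l 2>/dev/null; every_minute_entry "$SCRIPT_PATH" ) | crontab -
```

Remember to make the script executable (`chmod +x`) before cron tries to run it.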