Nagios

Overview

SoCS hosts a Nagios server at “monitor.socs.uoguelph.ca”. Nagios is an open-source monitoring service that SoCS uses to monitor the status of all other servers. It identifies when security updates are needed and flags high resource consumption, server errors, and more.

Configuration Information

Nagios runs in a near-default configuration. NRPE is used to monitor all Linux servers; each monitored server has nagios-nrpe-server and the Nagios plugins installed. Client configuration is kept in /etc/nagios-nrpe-server/nrpe_local.cfg. Ansible (nagios-client role) pushes this file to each client, along with plugins that check Debian packages and free memory. Depending on the server's configuration, the disk check may need to be modified to account for multiple disks or for disks with different device names (e.g. sda1, hda1, vda1).
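
As an illustration of the disk check adjustment described above, the excerpt below sketches what the relevant nrpe_local.cfg entries might look like. It is not a copy of the file the nagios-client role deploys: the command name check_disk, the plugin directory /usr/lib/nagios/plugins (the usual location on Debian), the 20%/10% thresholds, and the device names are all assumptions chosen for the example.

```ini
# Hypothetical excerpt of nrpe_local.cfg (not the deployed file).
# Assumed: plugin directory, command name, thresholds, and device names.

# Server with a single virtual disk named vda1:
# warn below 20% free space, go critical below 10%.
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/vda1

# Alternative for a server with more than one disk: -p may be repeated.
# Use this line instead of the one above, not in addition to it.
#command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1 -p /dev/sdb1
```

After changing an entry like this, the NRPE service on the client generally needs a restart to pick it up. The result can then be checked from the Nagios server with the check_nrpe plugin, for example `check_nrpe -H <client> -c check_disk`, assuming the plugin is installed there.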

The following are monitored on all servers:

Other things to monitor are server-specific.

TODO