Nagios

Overview

SoCS hosts a Nagios server at “monitor.socs.uoguelph.ca”. Nagios is an open-source monitoring service that SoCS uses to monitor the status of all other servers. It helps identify when security updates are needed and flags high resource consumption, server errors, and more.

Configuration Information

Nagios runs in a relatively default configuration. NRPE is used to monitor all Linux servers; each monitored server has nagios-nrpe-server and the Nagios plugins installed. Client configuration is kept in /etc/nagios-nrpe-server/nrpe_local.cfg. This file is pushed via Ansible (nagios-client role) to each client, along with plugins that check Debian packages and free memory. Depending on the server, the disk check may need to be modified to account for multiple disks or for disks with different device names (e.g. sda1, hda1, vda1).
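As a rough illustration of the disk check adjustment, the sketch below renders an NRPE command definition for one or more devices. The helper name check_disk_command, the plugin directory /usr/lib/nagios/plugins, and the 20%/10% thresholds are assumptions for illustration only; the actual values live in the nagios-client role and may differ.

```python
# Hypothetical helper: build the check_disk line for nrpe_local.cfg.
# The plugin directory and thresholds are assumptions, not the values
# used by the nagios-client role.
PLUGIN_DIR = "/usr/lib/nagios/plugins"


def check_disk_command(devices, warn="20%", crit="10%"):
    """Return one NRPE command definition covering every device in `devices`."""
    if not devices:
        raise ValueError("at least one device or mount point is required")
    # check_disk accepts -p more than once, one per device or mount point.
    paths = " ".join(f"-p {device}" for device in devices)
    return (
        f"command[check_disk]={PLUGIN_DIR}/check_disk "
        f"-w {warn} -c {crit} {paths}"
    )


# A single-disk server and a server with two virtio disks.
print(check_disk_command(["/dev/sda1"]))
print(check_disk_command(["/dev/vda1", "/dev/vdb1"]))
```

Generating the line from a device list keeps the per-server difference down to one variable, which is the kind of value an Ansible host variable could supply.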

The following are monitored on all servers:

Load
Number of Processes
Memory Free
Total Users
Debian Package Update Status
Disk Space
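The Memory Free check in the list above relies on a custom plugin. The sketch below is not that plugin; it only shows the general shape of a Nagios plugin, which prints one status line and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the 20%/10% thresholds, and the use of MemAvailable from /proc/meminfo are assumptions.

```python
# Minimal sketch of a free-memory check in the Nagios plugin style.
# Thresholds and names are illustrative assumptions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin exit codes


def percent_available(meminfo_text):
    """Return available memory as a percentage of total memory."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # first token is the number
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def evaluate(pct, warn=20.0, crit=10.0):
    """Map a percentage to (exit code, status line)."""
    if pct < crit:
        return CRITICAL, f"MEMORY CRITICAL - {pct:.1f}% available"
    if pct < warn:
        return WARNING, f"MEMORY WARNING - {pct:.1f}% available"
    return OK, f"MEMORY OK - {pct:.1f}% available"


def main(path="/proc/meminfo"):
    """Read memory figures, print the status line, return the exit code."""
    try:
        with open(path) as handle:
            pct = percent_available(handle.read())
    except (OSError, KeyError, ValueError) as exc:
        print(f"MEMORY UNKNOWN - {exc!r}")
        return UNKNOWN
    code, message = evaluate(pct)
    # Text after the pipe is performance data that Nagios can graph.
    print(f"{message} | available_pct={pct:.1f}%")
    return code

# When installed as a plugin, the script would end with:
#   raise SystemExit(main())
```

Keeping the parsing and threshold logic in separate functions makes it possible to test the plugin without touching a live server.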

Other checks are server-specific:

SSL Certificate expiry
SLAPD Replication to ensure primary and secondary LDAP are in sync
HTTP-Server to check that Apache/Nginx is functioning
ZFS Health for Proxmox, to ensure the pool is healthy and has free space
IMAP and SMTP Health for mail server
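To illustrate how a check such as SSL Certificate expiry could work, here is a minimal sketch using only the Python standard library. It is not the check Nagios runs here; the function names and the 30-day and 7-day thresholds are assumptions.

```python
# Minimal sketch of a certificate expiry check.
# Thresholds and function names are illustrative assumptions.
import socket
import ssl
import time

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def days_remaining(not_after, now=None):
    """Days until expiry, given a notAfter string such as
    'Jan 21 00:00:00 2030 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def classify(days, warn_days=30, crit_days=7):
    """Map days remaining to a Nagios exit code."""
    if days < crit_days:
        return CRITICAL
    if days < warn_days:
        return WARNING
    return OK


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host and return the certificate's notAfter field.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A caller would combine the pieces as classify(days_remaining(fetch_not_after(host))) and treat a verification error as CRITICAL.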

TODO

Consider whether a secondary Nagios instance is needed to monitor the main Nagios server.
Look into Host Groups for possible improvements.
Update the plugins for Windows monitoring, which are not currently functional.
