Nagios
Overview
SoCS hosts a Nagios server at “monitor.socs.uoguelph.ca”. Nagios is an open-source monitoring service, and within SoCS it is used to monitor the status of all other servers. It helps identify when security updates are needed and flags high resource consumption, server errors, and more.
Configuration Information
Nagios is used in a largely default configuration. All Linux servers are monitored through NRPE: each monitored server has nagios-nrpe-server installed along with the Nagios plugins. The NRPE configuration is kept in /etc/nagios-nrpe-server/nrpe_local.cfg. This file is pushed to each client via Ansible (the nagios-client role), together with plugins that check Debian package status and free memory. Depending on the server's configuration, the disk check may need to be modified to account for multiple disks or for disks with different device names (e.g. sda1, hda1, vda1).
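The free-memory plugin pushed by the nagios-client role is not reproduced on this page. As a minimal sketch of how such a check could work, assuming a Linux kernel that exposes MemAvailable in /proc/meminfo, the hypothetical Python plugin below follows the standard Nagios plugin conventions: a one-line status message with optional performance data after a "|", and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the 20%/10% thresholds are illustrative assumptions, not the plugin actually deployed.

```python
"""Hypothetical free-memory check in the style of a Nagios plugin (sketch only)."""

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def check_free_memory(meminfo: dict[str, int], warn_pct: float,
                      crit_pct: float) -> tuple[int, str]:
    """Return (exit_code, status_line); thresholds are percent of memory available."""
    total = meminfo.get("MemTotal")
    avail = meminfo.get("MemAvailable")
    if not total or avail is None:
        return UNKNOWN, "MEMORY UNKNOWN - MemTotal/MemAvailable not found"
    free_pct = 100.0 * avail / total
    # Performance data: label=value[UOM];warn;crit
    perf = f"free_pct={free_pct:.1f}%;{warn_pct};{crit_pct}"
    if free_pct <= crit_pct:
        return CRITICAL, f"MEMORY CRITICAL - {free_pct:.1f}% free | {perf}"
    if free_pct <= warn_pct:
        return WARNING, f"MEMORY WARNING - {free_pct:.1f}% free | {perf}"
    return OK, f"MEMORY OK - {free_pct:.1f}% free | {perf}"


def main(path: str = "/proc/meminfo") -> int:
    """Print the status line and return the exit code (hypothetical 20%/10% thresholds)."""
    try:
        with open(path) as f:
            meminfo = parse_meminfo(f.read())
    except OSError as exc:
        print(f"MEMORY UNKNOWN - {exc}")
        return UNKNOWN
    code, message = check_free_memory(meminfo, warn_pct=20.0, crit_pct=10.0)
    print(message)
    return code
```

To run as an actual NRPE plugin, the script would end with sys.exit(main()) so that the exit code reaches the Nagios server, and it would be referenced from a command[...] line in nrpe_local.cfg.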
The following are monitored on all servers:
- Load
- Number of Processes
- Memory Free
- Total Users
- Debian Package Update Status
- Disk Space
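For servers with several disks, or with device names other than sda1, one option is to generate a separate check_disk command per mounted partition instead of editing a single entry by hand. The sketch below assumes the plugins live in /usr/lib/nagios/plugins (the Debian default) and uses a hypothetical check_disk_<device> naming scheme with example 20%/10% free-space thresholds. It only builds the text of the NRPE command[...] lines; it does not modify any file.

```python
"""Hypothetical helper: build per-partition check_disk entries for nrpe_local.cfg."""
import re

PLUGIN_DIR = "/usr/lib/nagios/plugins"  # assumed Debian plugin path

# Matches sdXN / hdXN / vdXN partitions only; LVM (/dev/mapper/...) and NVMe
# devices are deliberately skipped in this sketch.
DEVICE_RE = re.compile(r"^/dev/([shv]d[a-z]+\d+)$")


def partitions_from_mounts(text: str) -> list[str]:
    """Return unique partition names (e.g. 'sda1') from /proc/mounts content."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = DEVICE_RE.match(fields[0])
        # The same device can be mounted more than once; keep the first occurrence.
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def disk_check_lines(devices: list[str], warn: str = "20%",
                     crit: str = "10%") -> list[str]:
    """Return one NRPE command definition per partition."""
    return [
        f"command[check_disk_{dev}]={PLUGIN_DIR}/check_disk "
        f"-w {warn} -c {crit} -p /dev/{dev}"
        for dev in devices
    ]
```

Each generated command could then be called from the Nagios server by name, for example check_nrpe -H <host> -c check_disk_sda1.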
The following checks are server-specific:
- SSL Certificate Expiry
- SLAPD Replication, to ensure the primary and secondary LDAP servers are in sync
- HTTP Server, to check that Apache/Nginx is functioning
- ZFS Health, on Proxmox hosts, to ensure the pool is healthy and has free space
- IMAP and SMTP Health, on the mail server
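For the ZFS check on Proxmox hosts, a minimal sketch is shown below, assuming the zpool command is available to the NRPE user (in practice it may need sudo). It reads zpool list in scripted mode (-H: tab-separated, no header) with the name, health and capacity properties, reports CRITICAL for any pool whose health is not ONLINE, and applies hypothetical 80%/90% capacity thresholds. It is an illustration, not the check currently configured.

```python
"""Hypothetical ZFS pool check for Proxmox hosts (sketch only)."""
import subprocess

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def read_zpool_list() -> str:
    """Run zpool list in scripted mode and return its raw output."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health,capacity"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def evaluate_pools(output: str, warn_cap: int = 80,
                   crit_cap: int = 90) -> tuple[int, str]:
    """Map 'zpool list -H -o name,health,capacity' output to a Nagios status."""
    pools = [line.split("\t") for line in output.splitlines() if line.strip()]
    if not pools:
        return UNKNOWN, "ZFS UNKNOWN - no pools found"
    worst, problems = OK, []
    for fields in pools:
        if len(fields) != 3:
            return UNKNOWN, f"ZFS UNKNOWN - unexpected line: {' '.join(fields)}"
        name, health, cap = fields
        # Any state other than ONLINE (DEGRADED, FAULTED, ...) is treated as critical.
        if health != "ONLINE":
            worst = max(worst, CRITICAL)
            problems.append(f"{name} is {health}")
        used = cap.rstrip("%")
        if used.isdigit():  # capacity can be "-" for unavailable pools
            pct = int(used)
            if pct >= crit_cap:
                worst = max(worst, CRITICAL)
                problems.append(f"{name} {pct}% full")
            elif pct >= warn_cap:
                worst = max(worst, WARNING)
                problems.append(f"{name} {pct}% full")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[worst]
    detail = ", ".join(problems) if problems else f"{len(pools)} pool(s) healthy"
    return worst, f"ZFS {label} - {detail}"
```

Keeping the parsing in evaluate_pools separate from read_zpool_list makes the status logic testable on sample output without a ZFS pool present.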
TODO
- Consider whether a secondary Nagios instance is needed to monitor the main Nagios server.
- Look into Host Groups for possible improvements.
- Update the plugins for Windows monitoring, which are not currently functional.