




Nagios

Overview

SoCS hosts a Nagios server at “monitor.socs.uoguelph.ca”. Nagios is an open-source monitoring tool, and within SoCS it monitors the status of all other servers. It helps identify when security updates are needed and flags high resource consumption, server errors, and more.

Configuration Information

Nagios runs in a largely default configuration, which suits our needs. All Linux servers are monitored with NRPE (Nagios Remote Plugin Executor): each monitored server has nagios-nrpe-server and the Nagios plugins installed. The NRPE configuration is kept in /etc/nagios-nrpe-server/nrpe_local.cfg. This file is pushed via Ansible (nagios-client role) to each client, along with plugins that check Debian package status and free memory.
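Each check in nrpe_local.cfg is an NRPE command definition of the form command[name]=/path/to/plugin arguments. The Python below is only a sketch of what those entries look like; in practice the nagios-client Ansible role would more likely template the file directly. Assumptions: the Debian default plugin directory /usr/lib/nagios/plugins, placeholder thresholds (not the production values), and the names check_mem and check_apt_pkgs, which are hypothetical stand-ins for the custom memory and Debian package plugins.

```python
"""Render NRPE command definitions for nrpe_local.cfg (illustrative sketch).

Assumptions: Debian default plugin directory; thresholds are placeholders;
check_mem and check_apt_pkgs are hypothetical names for the custom plugins.
"""

PLUGIN_DIR = "/usr/lib/nagios/plugins"

# NRPE command name -> (plugin executable, arguments). Thresholds are placeholders.
CHECKS = {
    "check_load": ("check_load", "-w 15,10,5 -c 30,25,20"),
    "check_total_procs": ("check_procs", "-w 250 -c 400"),
    "check_users": ("check_users", "-w 5 -c 10"),
    "check_disk_root": ("check_disk", "-w 20% -c 10% -p /"),
    "check_mem": ("check_mem", "-w 10 -c 5"),        # hypothetical custom plugin
    "check_apt_pkgs": ("check_apt_pkgs", ""),        # hypothetical custom plugin
}


def render_nrpe_config(checks: dict[str, tuple[str, str]],
                       plugin_dir: str = PLUGIN_DIR) -> str:
    """Return lines of the form command[name]=/plugin/dir/plugin args."""
    lines = []
    for name, (plugin, args) in checks.items():
        cmd = f"{plugin_dir}/{plugin}"
        if args:  # omit the trailing space for plugins called without arguments
            cmd += f" {args}"
        lines.append(f"command[{name}]={cmd}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_nrpe_config(CHECKS), end="")
```

On the Nagios server side, each entry would then be invoked by name through check_nrpe, e.g. check_nrpe -H <client> -c check_load.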

The following are monitored on all servers:

  • Load
  • Number of Processes
  • Memory Free
  • Total Users
  • Debian Package Update Status
  • Disk Space
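Every check above is a plugin that prints one status line and reports its result through the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a rough illustration, here is a minimal free-memory check written in that style. It is an assumption-laden sketch, not the plugin the nagios-client role actually ships: it reads Linux /proc/meminfo and compares MemAvailable against percentage thresholds chosen here for illustration.

```python
"""Minimal Nagios-style free-memory check (sketch, not the deployed plugin).

Assumptions: Linux /proc/meminfo is available; thresholds are percentages of
MemAvailable relative to MemTotal and are illustrative only.
"""

import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo contents into {field: first numeric value (kB)}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def check_memory(meminfo: dict[str, int], warn_pct: float,
                 crit_pct: float) -> tuple[int, str]:
    """Return (exit code, status line) from the percentage of memory available."""
    try:
        total = meminfo["MemTotal"]
        available = meminfo["MemAvailable"]
    except KeyError as missing:
        return UNKNOWN, f"MEMORY UNKNOWN - {missing.args[0]} not found"
    if total <= 0:
        return UNKNOWN, "MEMORY UNKNOWN - MemTotal is zero"
    pct_free = 100.0 * available / total
    if pct_free <= crit_pct:
        code = CRITICAL
    elif pct_free <= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text after "|" is Nagios performance data: label=value;warn;crit
    return code, (f"MEMORY {STATUS_NAMES[code]} - {pct_free:.1f}% free"
                  f" | free_pct={pct_free:.1f}%;{warn_pct};{crit_pct}")


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        status, message = check_memory(parse_meminfo(f.read()), warn_pct=10, crit_pct=5)
    print(message)
    sys.exit(status)
```

MemAvailable is used rather than MemFree because the kernel's estimate includes reclaimable page cache, so a busy but healthy server is less likely to raise false alarms.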

The following checks are server-specific:

  • SSL certificate expiry
  • SLAPD replication, to ensure the primary and secondary LDAP servers are in sync
  • HTTP server, to check that Apache/Nginx is functioning
  • ZFS health, for Proxmox, to ensure the pool is healthy and has free space
  • IMAP and SMTP health, for the mail server
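For the SSL certificate expiry check, the logic amounts to comparing the certificate's notAfter date with the current time. The sketch below shows that comparison using Python's ssl module; the 30/14-day thresholds are assumptions, the hostname is only an example target, and the production setup may well rely on a stock plugin instead (for example the -C option of check_http).

```python
"""Sketch of an SSL certificate expiry check with Nagios exit codes.

Assumptions: 30/14-day thresholds are placeholders; the deployed check may
use a stock plugin rather than custom code.
"""

import socket
import ssl
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server certificate's notAfter field, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_expiry(not_after: str, now: float, warn_days: int = 30,
                 crit_days: int = 14) -> tuple[int, str]:
    """Classify the remaining certificate lifetime (in days) against thresholds."""
    days_left = (ssl.cert_time_to_seconds(not_after) - now) / 86400
    if days_left <= crit_days:
        code = CRITICAL
    elif days_left <= warn_days:
        code = WARNING
    else:
        code = OK
    return code, f"certificate expires in {days_left:.0f} days ({not_after})"


if __name__ == "__main__":
    try:
        expiry = fetch_not_after("monitor.socs.uoguelph.ca")
    except ssl.SSLCertVerificationError as err:
        # An already-expired or otherwise invalid certificate fails verification.
        print(f"CRITICAL - certificate verification failed: {err}")
        raise SystemExit(CRITICAL)
    except OSError as err:
        print(f"UNKNOWN - {err}")
        raise SystemExit(UNKNOWN)
    status, message = check_expiry(expiry, time.time())
    print(message)
    raise SystemExit(status)
```

Keeping the network fetch separate from check_expiry means the threshold logic can be tested without contacting a server.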
TODO
  • Consider if there is a need for a secondary Nagios implementation to monitor the main Nagios server.
  • Look into Host Groups for possible improvements.
  • Update the Windows monitoring plugins, which are not currently functional.
sysadmin/services/nagios.1712685997.txt.gz · Last modified: 2024/04/09 18:06 by kjohns23