


Nagios Cleanup

  • Nagios monitors the health of the SoCS servers.
  • Some of the checks had not been calibrated in a while, particularly those for the NoMachine/Linux cluster.
  • We went through the list of checks reporting as Critical and determined whether each condition actually warranted an alert. For example, the process-count check alerts at 200 processes by default, but it is not uncommon for the Linux servers to run ten times that many.
  • Alert thresholds were adjusted as required. This should keep false positives from burying the alerts that need an immediate response.
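The threshold logic described above can be sketched as a small Nagios-style check in Python. This is a minimal illustration, not the check the SoCS servers actually run. It assumes a Linux host (processes are counted from the numeric PID directories under /proc), uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and treats a count *above* a threshold as triggering that state, as check_procs does with its -w/-c options. The default thresholds of 2500/4000 and the function names are hypothetical placeholders, not the values chosen during the cleanup.

```python
"""Minimal Nagios-style process-count check (illustrative sketch).

Assumptions: Linux host with /proc; thresholds are hypothetical examples.
"""
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def count_processes(proc_root: str = "/proc") -> int:
    """Count processes by counting the numeric (PID) directories under /proc."""
    return sum(1 for name in os.listdir(proc_root) if name.isdigit())


def evaluate(count: int, warn: int, crit: int) -> tuple[int, str]:
    """Map a process count to a Nagios status code and a one-line message.

    A count strictly greater than a threshold triggers that state, so a
    count equal to the warning threshold is still OK.
    """
    if count > crit:
        code = CRITICAL
    elif count > warn:
        code = WARNING
    else:
        code = OK
    # Performance data after the pipe: label=value;warn;crit;min
    message = f"PROCS {LABELS[code]}: {count} processes | procs={count};{warn};{crit};0"
    return code, message


def main(argv: list[str]) -> int:
    """Read optional warn/crit thresholds from argv, print status, return exit code."""
    # Hypothetical defaults, scaled for servers that routinely run
    # a couple of thousand processes.
    warn = int(argv[1]) if len(argv) > 1 else 2500
    crit = int(argv[2]) if len(argv) > 2 else 4000
    try:
        count = count_processes()
    except OSError as exc:
        print(f"PROCS UNKNOWN: cannot read /proc ({exc})")
        return UNKNOWN
    code, message = evaluate(count, warn, crit)
    print(message)
    return code
```

To use it as an actual plugin, save it as a script, add `import sys` and an `if __name__ == "__main__": sys.exit(main(sys.argv))` guard, and point a Nagios command definition at it. Keeping the threshold comparison in `evaluate` makes it easy to try recalibrated values (e.g. `evaluate(2100, 150, 200)` reports CRITICAL, while `evaluate(2100, 2500, 4000)` reports OK) without touching a live host.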
sysadmin/projects/w23/nagioscleanup.txt · Last modified: 2023/03/15 15:55 by kjohns23