Maintenance mode

Monitoring servers, services and connections is great. It enables proactive management, notification and escalation, and it improves root cause analysis.

One big challenge is the number of notifications being sent and the relevance of those notifications. A well set-up environment sends notifications when problems arise or a negative trend is detected: signals for the administrator to get out of his lazy chair.
Most environments, however, send more notifications than needed, and many of them are irrelevant. This has a negative effect: the mailbox fills up rapidly and the value of each message decreases.

An example of poor planning in a monitoring environment is the reboot schedule. When servers (terminal servers in particular) are periodically rebooted or re-deployed, they may be unreachable once in a while. The monitoring software assumes the server is in trouble, raises an alert and sends notifications.

But a planned reboot or re-deployment of a server is an intentional action by an administrator. There is no reason to raise an alert or send a notification.

For these situations most monitoring software has a “maintenance mode”. An administrator can specify that a specific server (or service) is in maintenance for a specified duration which suppresses the notifications. Great!

But setting this maintenance mode each time a reboot is scheduled is a lot of manual work. The same goes for a deployment: an administrator does not want to set maintenance mode in the monitoring software, then deploy the machine, and then go back to the monitoring software to disable maintenance mode. Why not? Because administrators are lazy (and so am I)!

The solution comes with scripts. There are scripts available that can be configured with parameters and that place a server or service in maintenance mode. By calling these scripts before and after a reboot (or deployment), the server can be placed in maintenance mode, and taken out of it again, without manual action.
This results in fewer irrelevant notifications.
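
The before-and-after pattern described above can be sketched in Python as a context manager. This is only an illustration: the functions start_maintenance, stop_maintenance and reboot, and the host name "ts01", are hypothetical stand-ins for whatever script your monitoring software provides. They are not part of SCOM or Nagios.

```python
from contextlib import contextmanager


@contextmanager
def maintenance_mode(host, start, stop):
    """Keep `host` in maintenance mode while the wrapped block runs."""
    start(host)
    try:
        yield
    finally:
        # Always leave maintenance mode, even if the reboot or deployment fails.
        stop(host)


# Hypothetical stand-ins: in practice these would call the script that
# talks to the monitoring software. Here they only record what happened.
calls = []


def start_maintenance(host):
    calls.append(("start", host))


def stop_maintenance(host):
    calls.append(("stop", host))


def reboot(host):
    calls.append(("reboot", host))


with maintenance_mode("ts01", start_maintenance, stop_maintenance):
    reboot("ts01")

print(calls)
```

The try/finally makes sure a failed deployment does not leave the server silently stuck in maintenance mode.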

Microsoft System Center Operations Manager (SCOM)
SCOM can be controlled using a PowerShell script. There are multiple scripts on the internet; one of them can be found here.
A GUI based tool, the “SCOM Remote Maintenance Mode Scheduler”, can be found here.

Nagios (including GroundWork)
Nagios is a bit more difficult to control since it is web-based. There are Perl scripts available, but these have to be executed on the Nagios server, while the best result is achieved if the script is run from the target server itself (or the deployment server). A Windows-based script (VBScript) is available and is named nagios_downtime.vbs.

Keep in mind that the server / service-name you specify is CaSE SensItVe. If the name does not match the name in Nagios, an error is thrown: “Error 403 – Sorry, but you are not authorized to commit the specified command.”
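
Because a case mismatch produces a misleading authorization error, it can help to check the name before calling Nagios. A minimal Python sketch, assuming you keep your own list of host names exactly as they are defined in Nagios. The function resolve_host_name and the example names are hypothetical and not part of nagios_downtime.vbs.

```python
def resolve_host_name(requested, known_hosts):
    """Return `requested` if it matches a known host exactly, else explain why not."""
    if requested in known_hosts:
        return requested
    # Look for a name that differs only in upper / lower case.
    lowered = requested.lower()
    candidates = [name for name in known_hosts if name.lower() == lowered]
    if candidates:
        raise ValueError(
            f"{requested!r} differs in case from the Nagios name {candidates[0]!r}"
        )
    raise ValueError(f"{requested!r} is not a known host")


known = ["TS01", "fileserver"]  # names as defined in Nagios (example values)
print(resolve_host_name("TS01", known))
```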

Besides the name, the date and time format is very important. Check the date format in the VBS script (nagiosDateFormat=). It has to match the date_format setting in nagios.cfg.
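
To illustrate why the format matters, the Python sketch below renders the same downtime window for each value that date_format in nagios.cfg accepts (us, euro, iso8601 and strict-iso8601). The function downtime_window and the example date are made up for this illustration. How the VBS script itself builds its timestamps is not shown here.

```python
from datetime import datetime, timedelta

# strftime patterns for the values of date_format in nagios.cfg.
NAGIOS_DATE_FORMATS = {
    "us": "%m-%d-%Y %H:%M:%S",
    "euro": "%d-%m-%Y %H:%M:%S",
    "iso8601": "%Y-%m-%d %H:%M:%S",
    "strict-iso8601": "%Y-%m-%dT%H:%M:%S",
}


def downtime_window(start, minutes, date_format):
    """Return (start, end) of a downtime window as strings in the given format."""
    pattern = NAGIOS_DATE_FORMATS[date_format]
    end = start + timedelta(minutes=minutes)
    return start.strftime(pattern), end.strftime(pattern)


start = datetime(2011, 3, 4, 22, 0, 0)  # example value: 4 March 2011, 22:00
for name in NAGIOS_DATE_FORMATS:
    print(name, downtime_window(start, 30, name))
```

Note that 4 March is written as 03-04-2011 in the us format and as 04-03-2011 in the euro format, so a mismatch can shift the downtime to a different day without any error being shown.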

To control Nagios remotely, the configuration needs to be altered: Nagios blocks external access by default, so it has to be enabled. This can be achieved by disabling Guava Single Sign-On in the httpd.conf. You can read more about the configuration here.
Creating a .htaccess file is not always necessary, but in some situations it is. How to create a .htaccess file for Nagios can be read here.

Ingmar Verheij