Real-Time Monitoring with Nagios



September 6, 2024 - A common question is whether Nagios Monitoring Solutions provide real-time monitoring. The short answer is yes.

 

Is real-time monitoring always better, though? It may be surprising, but while real-time monitoring is useful in many cases, it is not the best technique for every monitoring use case.

In this article, you'll learn how you can implement real-time monitoring with Nagios as well as use cases for when real-time monitoring is not beneficial.

 

How Nagios Does Real-Time Monitoring

There are three ways that you can utilize real-time monitoring with Nagios solutions:

1. SNMP Traps

SNMP traps are a classic example of real-time monitoring. Nagios XI can be configured to receive SNMP traps, which gives you real-time notifications about the devices you're monitoring. As a result, XI can notify the right people the instant it receives a trap. For example, you might have a switch configured to send a trap to XI when a cable is plugged into an interface. When the cable goes in, the device sends the trap to XI immediately, and XI sends out the notification.

2. Passive Checks

Passive checks are another method XI can use to monitor devices in real time. With active checks, XI reaches out for information on the devices it's monitoring. Passive checks work the opposite way: monitored devices schedule their own checks and send the results back to XI. Because a monitored system, unlike the XI server, doesn't need to schedule thousands or tens of thousands of Service checks, it can easily run its own checks more frequently, such as every 30 seconds.
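To make the idea of a device "sending the results back" concrete, here is a minimal Python sketch that builds a passive service check result in the external-command format used by Nagios Core (PROCESS_SERVICE_CHECK_RESULT). The host name, service name, and plugin output are hypothetical, and how the line actually reaches your XI server (for example, through an agent or a command file) depends on your setup and is not shown here.

```python
import time


def format_passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one passive service check result line.

    return_code follows the usual plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2, or 3")
    if timestamp is None:
        # The monitored device stamps the result with its own clock.
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


# Hypothetical host, service, and output, with a fixed timestamp for readability.
line = format_passive_service_result(
    "web01", "Disk IO", 0, "OK - 120 IOPS", timestamp=1725600000
)
print(line)
# [1725600000] PROCESS_SERVICE_CHECK_RESULT;web01;Disk IO;0;OK - 120 IOPS
```

The monitored device decides when to run the check and only reports the outcome, which is why it can afford a short interval such as 30 seconds.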

Passive checks are also beneficial because they can lessen the load on your XI server by reducing the number of active checks. In order to keep track of any issues that arise in your monitored devices, XI performs many active checks over intervals, which increases the load on XI, especially when all your checks are set to one-minute intervals. By dispersing some of that load onto your devices through passive checks, XI is able to run more efficiently, and you're able to quickly receive notifications about changes in status.
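A rough back-of-the-envelope calculation shows why this matters. The Python sketch below uses hypothetical numbers (10,000 checks, 4,000 of them moved to passive) and a deliberately simple model in which scheduling load is just the number of active checks divided by the check interval; real server load depends on many other factors.

```python
def active_checks_per_second(num_checks, interval_seconds):
    """Average number of active checks the server must start each second."""
    return num_checks / interval_seconds


TOTAL_CHECKS = 10_000      # hypothetical environment size
INTERVAL_SECONDS = 60      # one-minute check interval
MOVED_TO_PASSIVE = 4_000   # hypothetical number of checks converted to passive

before = active_checks_per_second(TOTAL_CHECKS, INTERVAL_SECONDS)
after = active_checks_per_second(TOTAL_CHECKS - MOVED_TO_PASSIVE, INTERVAL_SECONDS)

print(f"All active:       {before:.1f} checks/second")
print(f"With passive mix: {after:.1f} checks/second")
# All active:       166.7 checks/second
# With passive mix: 100.0 checks/second
```

In this simplified model, the server still has to process incoming passive results, but it no longer has to schedule and execute those checks itself.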

3. Nagios Cross-Platform Agent

Beyond networking devices and other SNMP-enabled devices, you can also use the Nagios Cross-Platform Agent (NCPA) with XI to receive real-time alerts from server infrastructure. With this agent, you can monitor server statistics for major operating systems (Windows, Linux, and Mac) with active or passive checks and graph most general system information in real time. NCPA provides a passive configuration option for delivering this real-time data.

All these methods allow you to utilize real-time monitoring in XI. With these capabilities, you can gain valuable insights about what you are monitoring; however, real-time monitoring won't always give you the best value.

 

When Real-Time Monitoring Is Not Helpful

When you have real-time monitoring capabilities, a monitoring system will take immediate action upon receiving a piece of information. For instance, a monitoring solution could receive an SNMP trap and instantly send out a notification. Another definition used in the industry treats “real-time” monitoring as a continuous flow of performance data delivered at very short (perhaps one-second) intervals.

Whatever your definition, real-time monitoring can be helpful in certain situations, but it's not the best way to monitor in every situation.

 

As a use case for when real-time monitoring isn't a benefit, think about a virtual machine Host that shows a spike in disk I/O among data that is otherwise consistent and normal. Do we want to notify the monitoring team or any other team about this spike in the data? No. We wouldn't want to wake up an on-call technician at 3 a.m. about this situation. This is a transitory spike in a performance metric. It's not persistent. Lots of performance metrics spike momentarily and recover normal performance.

Using real-time monitoring to immediately send out a notification about a performance data spike like the one in this use case leads to a problem called notification fatigue (or, in the case of waking a sleeping on-call technician, physical fatigue). Teams get so overwhelmed by notifications that aren't meaningful that they start to ignore them. That's not good for the organization, because those teams may also ignore truly critical issues.

Additionally, definitions of “real-time” monitoring that include some sense of streams of data at very short intervals are problematic in two ways:

1. This kind of definition emphasizes the wrong part of performance data notifications. The absolute length of the interval between data points is not what is most important; the persistence of the issue over time is.

2. Sampling performance metrics too frequently can have a significant negative impact on the performance of the devices being monitored. When you're checking devices at short intervals, those devices will spend more time responding to checks from the monitoring tool and less time responding to actual requests. As a result, devices won't function optimally.

These problems are why XI has built-in check logic that can be configured to only notify the team about persistent issues. When disk I/O, CPU utilization, or bandwidth rises above a specified threshold and remains there for a certain amount of time, that's when we might want to wake up a technician.
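The idea of notifying only on persistent issues can be illustrated with a short Python sketch. This is a simplified model, not XI's actual implementation: it assumes a notification fires only after a metric exceeds a threshold for a set number of consecutive samples, which is similar in spirit to requiring several failed check attempts before alerting. The sample values, threshold, and function name are all hypothetical.

```python
def first_notification_index(samples, threshold, required_consecutive):
    """Return the index of the sample that would trigger a notification.

    A notification fires only once the metric has stayed above the
    threshold for `required_consecutive` samples in a row. Returns None
    if that never happens.
    """
    streak = 0
    for index, value in enumerate(samples):
        if value > threshold:
            streak += 1
            if streak >= required_consecutive:
                return index
        else:
            # The metric recovered, so the problem was not persistent.
            streak = 0
    return None


# Hypothetical disk I/O utilization samples (percent), one per check.
transient_spike = [20, 22, 95, 21, 23, 20]
sustained_problem = [20, 92, 94, 96, 97, 95]

print(first_notification_index(transient_spike, 90, 3))    # None
print(first_notification_index(sustained_problem, 90, 3))  # 3
```

The momentary spike never produces a notification, while the sustained problem triggers one at the third consecutive high sample, so the on-call technician is only woken for the second case.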

 

Conclusion

Real-time monitoring is useful in the right situations, but it's important to keep in mind that it's not the only way to monitor. Especially in cases where we want to limit notifications to persistent problems, a “real-time” focus can lead to real notification fatigue for teams.

If you're curious about using real-time and near real-time monitoring with Nagios XI, download a free 30-day trial today.

 

Nagios XI and all Nagios solutions are available in Romania through Simple IT, Nagios Partner in Romania.

 

 

About Simple IT

 

SIMPLE IT is a distributor for software solutions and hardware appliances, adding value with consulting, training, implementation, configuration and support services, backed by certified specialists, in order to offer the best IT experience to customers and partners. For more information, please visit www.simpleit.com.ro.