Monitoring DO network stability

September 21, 2019
Monitoring Networking

Every once in a while, either because of unforeseen networking issues or because of DO maintenance that I forgot to schedule as downtime in Nagios, I get a blast of alerts as connectivity checks time out.

Is there a good endpoint to monitor within DO’s ownership that can serve as a way to keep Nagios noise down?

My current plan is to make all of my monitored hosts dependent on the upstream DO “host”, so that if the top-level one fails, I get one notification that my droplet endpoints may be impacted, rather than a separate reachability failure for each individual endpoint.
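To make the intended behaviour concrete, here is a rough Python sketch of that suppression rule. It is a simplified model, not Nagios itself: Nagios expresses this with the `parents` directive on host definitions and its own UNREACHABLE state, and supports multiple parents. The host names (`do-gateway`, `web1`, `db1`) and the function name are hypothetical.

```python
def hosts_to_alert(down, parents):
    """Return the down hosts that should notify.

    A down host is suppressed when any ancestor in its parent chain
    is also down, since the ancestor's alert already covers it.
    `parents` maps each host to its single upstream parent (assumed model).
    """
    alerts = set()
    for host in down:
        seen = {host}              # guards against accidental parent loops
        parent = parents.get(host)
        suppressed = False
        while parent is not None and parent not in seen:
            if parent in down:
                suppressed = True  # an upstream failure explains this one
                break
            seen.add(parent)
            parent = parents.get(parent)
        if not suppressed:
            alerts.add(host)
    return alerts
```

With `parents = {"web1": "do-gateway", "db1": "do-gateway"}`, an outage that takes down all three hosts would produce a single alert for `do-gateway`, while `web1` failing on its own would still alert.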

This problem happens both when monitoring from the inside out and from the outside in. Droplets monitored across regions can be listed as “down” due to subsequent failures, and the monitoring droplet’s checks against non-DO-hosted items also fail due to network turbulence.

Is there any DO-level endpoint I can monitor as an overall “regional status health” item that can inform the connectivity status of individual droplets?

1 Answer

I think you should be able to accomplish that by running a ping check against the network gateway that is set on the droplet.
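As a minimal sketch of such a check, assuming a Linux droplet with the iputils `ping` binary and an IPv4 default route, something like the following could find the gateway and ping it. The function names are my own, and the return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import socket
import struct
import subprocess


def default_gateway(route_table):
    """Return the IPv4 default gateway from /proc/net/route text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Default route: destination 0.0.0.0 with the RTF_GATEWAY (0x2) flag.
        if destination == "00000000" and flags & 0x2:
            # Assumes a little-endian host (e.g. x86-64), where the kernel
            # prints the address bytes in reverse order.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


def ping(host, count=3, timeout_s=2):
    """Return True if at least one echo reply comes back (iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_gateway(route_file="/proc/net/route"):
    """Return (nagios_exit_code, message) for the gateway ping check."""
    with open(route_file) as fh:
        gateway = default_gateway(fh.read())
    if gateway is None:
        return 3, "UNKNOWN - no default gateway found"
    if ping(gateway):
        return 0, f"OK - gateway {gateway} is reachable"
    return 2, f"CRITICAL - gateway {gateway} did not answer ping"
```

To use it as a plugin, a small wrapper would print the message and exit with the returned code. If the gateway does not answer ICMP at all, this check would need a different probe.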

While you could monitor a higher level “network” abstraction, it wouldn’t necessarily solve your problem, because the issue could be isolated to a hypervisor, network switch, or some other network device between you and the final router.

So if you were monitoring, say, connectivity at AMS, but the issue was with the hypervisor your droplet was on, the check would come back as “Network OK” while the droplet could still be affected.
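One way to act on this is to combine the local gateway check with a check against something further away, and read the two results together. The sketch below is only an illustration of that reasoning; the category names and the function are hypothetical, and the inference is a rough heuristic rather than a diagnosis.

```python
from enum import Enum


class FaultScope(Enum):
    HEALTHY = "no fault seen"
    LOCAL = "near the droplet (hypervisor, switch, or gateway)"
    UPSTREAM = "beyond the gateway (regional or external network)"


def classify(gateway_ok, remote_ok):
    """Guess where a fault sits from two reachability results.

    gateway_ok: the droplet can ping its own default gateway.
    remote_ok:  the droplet can reach a target outside its local network.
    """
    if not gateway_ok:
        # The first hop is unreachable, so results for anything
        # further away say little about the wider network.
        return FaultScope.LOCAL
    if not remote_ok:
        return FaultScope.UPSTREAM
    return FaultScope.HEALTHY
```

A regional check alone only covers the `remote_ok` side, which is why it can report healthy while the droplet itself is cut off.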
