How To Set Up Alerts with DigitalOcean Monitoring
DigitalOcean Monitoring provides visibility into resource usage across your infrastructure through a variety of metrics. Alert policies allow you to configure customized thresholds for individual resources in order to define healthy usage. Notifications are sent out to let you know when usage exceeds the threshold so that you can respond to changes quickly.
In this guide, we will walk through how to set up alert policies and configure notifications for your DigitalOcean Droplets.
Configure the Target Droplets
Like other DigitalOcean Monitoring features, alert policies and notifications rely on information provided by the DigitalOcean Agent, a small, open-source program that gathers metrics. Before you set up alert policies, install the Agent on each of the participating Droplets.
In this guide, we will be using an Ubuntu 16.04 Droplet to demonstrate how to configure alerting policies and notifications. To quickly get the Agent up and running on the Droplet, follow the instructions below. If you would like additional instructions, our "How To Install and Use the DigitalOcean Agent for Monitoring" article provides more in-depth information about the Agent and a more complete walkthrough.
For new Droplets, the Agent can be installed during the creation process by selecting Monitoring under the Select additional options section of the create page.
For existing Droplets, the Agent can be installed by logging into the Droplet and typing:
- curl -sSL https://agent.digitalocean.com/install.sh | sh
Once the DigitalOcean Agent is installed on your Droplet, configure a sudo user for administrative purposes. You can follow our initial server setup guide for Ubuntu 16.04 to create an administrative user.
When you are ready to continue, visit the DigitalOcean control panel.
Create an Alert Policy
Once the Agent has been installed on your Droplets, you can begin creating alert policies. In the control panel, click Monitoring in the top menu:
On the page that follows, click on the Create alert policy button. If you don't currently have any alert policies in place, you will see a welcome screen with the button in the center:
If you have active alert policies, the button will be on the right-hand side:
You will be taken to the alert policy creation page.
Choosing a Metric and Setting a Threshold
Each alert policy is composed of a metric, a threshold, and an alert interval. These can be configured within the Select metric & set threshold section.
The first step in defining an alert policy is to choose the metric you wish to build the policy around. The following metrics are supported:
- CPU: The percentage of total CPU used on the Droplet, out of 100%
- Bandwidth — Inbound: The amount of incoming traffic to the Droplet, in MBps
- Bandwidth — Outbound: The amount of outgoing traffic from the Droplet, in MBps
- Disk — Read: The read activity for the Droplet's disks, in MB/s
- Disk — Write: The write activity for the Droplet's disks, in MB/s
- Memory Utilization: The percentage of total memory being used, out of 100%
- Disk Utilization: The percentage of total storage (including attached block storage) being used, out of 100%
Additional information about each metric can be found in our "Glossary of DigitalOcean Metrics and Terminology".
Select a metric you'd like to configure an alert policy around. For this guide, we'll select CPU:
In the next field, choose whether you want to receive an alert when the resource utilization is above or is below the threshold we will set. In most scenarios, alerting when usage climbs above the threshold is the more helpful option, since high usage tells us that the current resources may no longer be sufficient. For this reason, we will choose is above for our policy:
Next, set the actual threshold value for triggering an alert. This will be a specific value or a percentage, depending on the metric you selected. Since for this example we are creating a CPU alert policy, this will be a percentage.
Note: The threshold value you should choose depends largely on what your intentions are for the alert policy and the amount of usage that is considered normal for that resource in your infrastructure. For example, the normal CPU usage for a computing cluster is likely to be different from that for a web app. Likewise, a threshold designed to give you an early indicator that you may need to scale out might be different from an emergency threshold that should be addressed immediately.
For our example, we'll use 70% for our alert policy, which indicates substantial, but not excessive utilization:
The next field represents the alert interval. Metrics are averaged over the selected interval to decide whether to trigger an alert. The interval can range from 5 minutes to 1 day. We will choose 5 min so that we are alerted quickly if the average usage exceeds our threshold.
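To illustrate how the three settings interact, here is a minimal sketch of the decision described above: average the samples collected during the interval, then compare the average with the threshold. The function name `should_alert` and the idea of one sample per minute are assumptions made for illustration; this is not how the monitoring service itself is implemented.

```shell
# Hypothetical helper: decide whether a window of samples should raise an alert.
# Usage: should_alert <above|below> <threshold> <sample>...
# Returns 0 if an alert should fire, 1 if not, 2 if the input is unusable.
should_alert() {
    local comparison=$1 threshold=$2
    shift 2
    [ "$#" -gt 0 ] || return 2   # no samples: cannot decide
    # Average the samples with awk, then compare the average with the threshold.
    printf '%s\n' "$@" | awk -v cmp="$comparison" -v limit="$threshold" '
        { sum += $1 }
        END {
            avg = sum / NR
            if (cmp == "above") exit ((avg > limit) ? 0 : 1)
            if (cmp == "below") exit ((avg < limit) ? 0 : 1)
            exit 2   # unknown comparison
        }'
}

# Five made-up CPU samples averaging 78%, checked against "is above" 70%.
if should_alert above 70 65 72 80 85 88; then
    echo "alert"
else
    echo "ok"
fi
```

Because the decision is based on the average, a single brief spike above 70% does not trigger the policy unless it pulls the whole interval's average over the threshold.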
Now that we've defined the parameters of our alert policy, we can tie our policy to specific Droplets.
Applying the Policy to Droplets
The Select Droplets or Tags section includes a field where you apply the alert policy to specific Droplets or groups of Droplets.
In the provided field, you can input the names of individual Droplets, Droplet tags, or a mixture of these two identifiers. Adding Droplets by name allows you to target individual resources unambiguously. Adding tags to an alert policy provides flexibility in deciding which Droplets are covered by the policy by adding or removing tags from Droplets.
Select the Droplet or Droplets that you wish to apply the alert policy to. Remember that only Droplets with the Agent installed will be monitored:
Once the alert policy has been associated with at least one Droplet or tag, we can continue.
Selecting the Alert Notification Method
When an alert is triggered, a notification is sent. DigitalOcean can currently send notifications to your DigitalOcean account email address, to Slack, or to both. At least one notification method must be selected.
By default, the email associated with the current DigitalOcean account is selected. At this time, it is not possible to set an alternative email address for alert notifications. If you set up Slack notifications, you can optionally uncheck email notifications to turn them off.
If you are part of a Slack organization, you can choose to connect your Slack account to receive notifications in Slack. Click the Connect Slack button to authorize DigitalOcean to create notifications within your Slack organization:
On the authorization page that follows, you can select any Slack teams you are authenticated to or log into a different team. You can then choose to notify Slackbot (which will send messages only to you), notify a channel, or notify any person or group through direct messages.
For this demonstration, we will use the DigitalOcean account email notification, so we do not need to modify any of the settings in this section.
Choosing an Alert Name and Saving the Policy
Finally, choose a name for the alert policy. This name will be used to identify this specific alert policy when notifications are sent, so it is important to choose a unique and descriptive value.
We will call our alert Test CPU Alert:
When you are finished, click the Create alert button to create the alert policy.
Your alert policy will be created. You will be taken back to the Monitoring index page, where your new alert will be visible:
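A policy like this one can in principle also be created without the control panel. The sketch below assumes the DigitalOcean API v2 endpoint `POST /v2/monitoring/alerts` and its field names (`type`, `compare`, `value`, `window`, `entities`, `alerts`); none of these are covered in this guide, so check them against the current API reference before relying on them. The Droplet ID and email address are placeholders, and the helper names `alert_policy_json` and `create_alert_policy` are hypothetical.

```shell
# Build the JSON body for a CPU alert policy.
# Field names are assumptions based on the DigitalOcean API v2.
# Usage: alert_policy_json <droplet_id> <email> <threshold_percent>
alert_policy_json() {
    local droplet_id=$1 email=$2 threshold=$3
    printf '{"type":"v1/insights/droplet/cpu","description":"Test CPU Alert",'
    printf '"compare":"GreaterThan","value":%s,"window":"5m","enabled":true,' "$threshold"
    printf '"entities":["%s"],"tags":[],"alerts":{"email":["%s"],"slack":[]}}\n' "$droplet_id" "$email"
}

# Send the policy. DIGITALOCEAN_TOKEN must hold an API token for your account.
# This function is only defined here; nothing is sent until you call it.
create_alert_policy() {
    curl -sS -X POST "https://api.digitalocean.com/v2/monitoring/alerts" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${DIGITALOCEAN_TOKEN:?set DIGITALOCEAN_TOKEN first}" \
        -d "$(alert_policy_json "$@")"
}

# Print the payload for review before sending anything (placeholder values).
alert_policy_json 12345678 admin@example.com 70
```

Printing the payload first makes it easy to confirm that the metric, threshold, and interval match what was chosen in the control panel. As noted earlier, the email address should be the one associated with your DigitalOcean account.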
Now that we have an alert policy, we can trigger it to test our notifications.
Trigger an Alert
After creating a new alert policy, it is important to test it. We need to ensure that it triggers correctly and that we are able to receive notifications when an alert is triggered. Log into the Ubuntu 16.04 Droplet covered by your alert policy to continue. It may be best to create and add a new Droplet to your policy for testing if you are concerned about interrupting services on a production host.
To generate the necessary CPU usage to trigger an alert, we will use a tool called stress, which is available in Ubuntu's default package repositories. Update the apt package index and install the utility by typing:
- sudo apt-get update
- sudo apt-get install stress
Once stress is installed, run this command to occupy your Droplet's CPUs:
- stress -c `nproc --all`
This will start a worker process for each of your Droplet's processors to generate high CPU usage metrics. Leave the process running so that the usage exceeds the configured threshold for the alert interval we selected.
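If you would rather not leave the load running indefinitely, stress also accepts a `--timeout` option that stops the workers after a fixed duration. The sketch below only prints the command so you can review it before running it; the helper name `stress_command` and the 600-second duration (twice the five-minute interval) are illustrative choices, not recommendations from this guide.

```shell
# Print a stress command that loads every CPU for a fixed duration.
# Usage: stress_command <seconds>
stress_command() {
    local seconds=$1
    local cpus
    # Fall back to a single worker if nproc is unavailable.
    cpus=$(nproc --all 2>/dev/null || echo 1)
    printf 'stress --cpu %s --timeout %ss\n' "$cpus" "$seconds"
}

# 600 seconds: twice the five-minute alert interval, to leave some margin.
stress_command 600
```

Copy the printed command into the Droplet's terminal to start the timed run. The workers exit on their own when the timeout expires, which also starts the clock on the resolution notification.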
Check for Notifications
Since our alert policy triggers when the average CPU usage over a five-minute interval exceeds the threshold, we should expect to receive our first notification a few minutes after starting stress. Keep an eye on the email address associated with your DigitalOcean account.
The notification email will indicate the name of the alert policy in the subject line.
The body of the email shows the value of the metric that triggered the alert as well as the threshold that was broken. A link to the alerting Droplet's page in the control panel and the Droplet's IP address are also provided:
Now that we have successfully triggered the alert, we can stop our CPU-hungry process to test the resolution notification.
Resolve the Alert
Alerts are resolved automatically when the average resource use over the alert interval falls back into the expected range. At this time, alerts cannot be manually resolved or acknowledged.
Let's consider our example. Our CPU alert policy was configured with a threshold of 70% and an alert interval of five minutes. This means that the alert will be resolved when the average of CPU metrics collected during the last five minutes falls below the 70% threshold. This provides a good balance that allows a Droplet to transition out of the triggered state when there is a reasonable expectation that a new alert won't be triggered again immediately afterwards.
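To make the resolution rule concrete, the following sketch replays a series of made-up CPU readings through a five-sample sliding window and prints the resulting state once a full window is available. The one-reading-per-minute rate, the readings themselves, and the helper name `replay_alert_state` are all assumptions for illustration; the monitoring service may sample and evaluate differently.

```shell
# Replay CPU samples through a sliding window and report the alert state.
# Usage: replay_alert_state <threshold> <window_size> <sample>...
replay_alert_state() {
    local threshold=$1 window=$2
    shift 2
    printf '%s\n' "$@" | awk -v limit="$threshold" -v size="$window" '
        {
            samples[NR] = $1
            sum += $1
            if (NR > size) sum -= samples[NR - size]   # drop the oldest sample
            if (NR >= size) {
                avg = sum / size
                state = (avg > limit) ? "triggered" : "ok"
                printf "minute %d: avg %.1f%% -> %s\n", NR, avg, state
            }
        }'
}

# stress runs during minutes 1-5 (100% CPU), then is stopped (5% CPU).
replay_alert_state 70 5 100 100 100 100 100 5 5 5 5 5
```

With these readings, the average is still 81% one minute after stress stops, so the policy remains triggered. It falls to 62% at minute 7, which is when the alert would resolve under these assumptions.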
To resolve our alert, we need to stop our stress process to allow the CPU usage to drop below the threshold for a while. If you haven't already, on your Ubuntu 16.04 Droplet, press CTRL+C to stop the stress process and return to the command prompt.
After the average CPU usage over five minutes falls below 70%, the Droplet will transition out of the triggered state and you will be sent a resolution email.
Again, the subject line will mention the alert policy by name. The body of the resolution email reports the total time that the Droplet was in the triggered state, the current average of the metric over the alert interval, and the Droplet's IP address. A link is included to the alert policy in case you would like to make any changes:
The current alert has now been resolved. The alert policy will continue to monitor the Droplet's CPU usage going forward.
In this guide, we've demonstrated how to configure alert policies to monitor Droplet resources and notify us by email when usage exceeds a customizable threshold. We've applied the alert policy to a Droplet with the DigitalOcean Agent installed, and used a CPU-intensive program to trigger an alert. We've also confirmed that both alert and resolution notifications work as expected.
To learn more about DigitalOcean Monitoring, check out the following articles: