An Introduction to DigitalOcean Monitoring
DigitalOcean Monitoring gives administrators greater insight into their infrastructure's resource usage with improved graphs, configurable alert policies, and integrated notifications. Droplet graphs provide up to the minute visualization of current and historical resource consumption across a variety of metrics. Alert policies allow users to configure thresholds for specific Droplet-level resources. If usage exceeds those thresholds, email or Slack notifications are dispatched.
DigitalOcean Monitoring is built to help users respond promptly to critical situations where resources have been exhausted and infrastructure health may be affected. However, it can also provide early insight into changing usage patterns to assist with planning and proactive measures to avert future problems altogether.
In this guide, we’ll provide an overview of DigitalOcean Monitoring and discuss how it can help you to better manage your infrastructure resources.
What Is DigitalOcean Monitoring?
DigitalOcean Monitoring is a free, opt-in service that provides insight into resource usage across your infrastructure. Monitoring is composed of a few different components which, together, increase visibility into the operational health of your servers.
Droplet graphs are a visual representation of system-level metrics to provide a high-level overview of resource usage. This can help you understand how your resource usage is changing over time, how different resource levels correlate, and which processes are contributing to those levels. You can learn more by reading our "How To Track Droplet Performance with DigitalOcean Droplet Graphs" article.
Alert policies are user-created rules that define thresholds for resource consumption. When a usage exceeds the threshold, notifications are dispatched through email or Slack. Our "How To Set Up Alerts with DigitalOcean Monitoring" article provides more detail on how to get started with alerts.
A few things you should know about DigitalOcean Monitoring:
- Price: Free
- Region support: Available in all regions
- Requirements: The DigitalOcean Agent must be installed on all participating Droplets
- Graphed metrics: System and user CPU usage, memory usage, disk read, disk write, disk usage, incoming and outgoing public bandwidth, incoming and outgoing private bandwidth, top processes by CPU and memory
- Alert-capable metrics: CPU, incoming bandwidth, outgoing bandwidth, disk read, disk write, memory usage, disk usage
- Notification methods: Email and Slack
When Should I Use Alert Policies and Notifications?
DigitalOcean Monitoring is built to increase your awareness of Droplet-level resource utilization. The gathered metrics and alerting can help you track the operational health of your infrastructure. In other words, DigitalOcean Monitoring is concerned with the data sources that best indicate that your server has the resource capacity to satisfy its current workload.
Because the Agent focuses on system-level data, it is not suitable for application performance monitoring (APM) tasks like tracking application errors, website connections, or response times. These require a monitoring system with a different focus, which could be used in addition to DigitalOcean Monitoring.
Enabling the DigitalOcean Agent
All Monitoring features require a small, open-source DigitalOcean Agent program to be installed on participating Droplets. The Agent gathers relevant metrics and forwards them to the DigitalOcean Monitoring service. To learn more about what metrics the Agent gathers and how it works, read our guide on "How To Install and Use the DigitalOcean Agent for Monitoring".
The Agent currently supports Ubuntu 14.04 and higher, Debian 8, and CentOS 6 and higher. The automatic Monitoring checkbox is not available for Fedora, but the installation script can be used.
You can opt-in to Monitoring and automatically install the Agent when creating Droplets by checking the Monitoring box:
If Monitoring wasn't selected during creation, you can manually install the Agent by logging into the Droplet and typing:
- curl -sSL https://agent.digitalocean.com/install.sh | sh
Once you have opted in, you can start creating alert policies.
Creating Alert Policies in the DigitalOcean Control Panel
You can create an alert policy at any time by clicking the Monitoring link in the main navigation in the DigitalOcean control panel and then clicking the Create alert policy button:
You can also follow the link from a Droplet's Graph page:
You will be taken to the Create alert policy page, where you can define a new alert policy:
The alert creation process is organized around four main decisions:
- Setting the metric and threshold
- Selecting Droplets or Tags
- Setting alerts
- Naming the alert policy
Once you're satisfied with your choices, you'll click the button to create the alert. Let's take a look at each of the above decisions in more detail.
Setting the Metric and Threshold
The pattern for defining an alert policy is the same for all metrics:
- Choose the metric. The options include CPU, Inbound Bandwidth, Outbound Bandwidth, Disk Read, Disk Write, Memory Utilization and Disk Utilization. For more about these metrics, see the Monitoring Glossary.
- Indicate whether you want to be notified when usage is above or below the threshold. The default value of is above is usually appropriate.
- Specify the usage threshold itself as either a percentage of the total available capacity being used, or as a usage rate depending on the selected metric. An appropriate value here will depend on the metric, the goal of the alert, and the typical server usage patterns.
- Pick how long a Droplet must exceed the threshold before a notification is triggered. The intervals include 5 minutes, 10 minutes, 30 minutes, 1 hour, 6 hours, and 1 day.
For example, we may have emergency alerts set to notify us if Disk Utilization is above 90% for more than 10 minutes. Since the current capacity is nearly exhausted, this type of alert would be helpful to prompt urgent action from on-call staff.
But to help with capacity planning, we might also choose to be alerted when Disk Utilization is above 70% for more than one day. This would serve as an early indicator of capacity issues so that we can plan to add more storage before there's an emergency. This second type of alert may tell us that we should consider adding additional storage when we get to the office on Monday (if we anticipate that the rate of consumption won't outpace the remaining storage before that time):
If both of the above alerts were in place, we would still get notified if usage climbed above 90%, which would be helpful if our consumption estimates were incorrect.
After defining the metric and threshold, the next step is to select the Droplets where the alert policy should be applied.
Applying the Policy to Droplets
The policy can be applied to Droplets by name or by tag. When a Droplet is added by name, it is managed from the alert policy page. This can work well with smaller infrastructures or with policies that are specific to just a few Droplets.
When Droplets are added by tag, only the tag needs to be managed on the alert policy form. Afterward, any Droplet with the Agent installed and categorized using the tag of the policy will automatically be monitored. Using tags to apply alert policies can be helpful when managing many Droplets since the tags can be added to the Droplet as part of the creation process:
Nothing will prevent a Droplet from being added to a policy multiple times, by both name and tag, for example. This will not result in multiple notifications for that Droplet when an alert is triggered. For each Droplet, an alert will be sent once per policy, regardless of how many ways the Droplet may have qualified for that policy.
Configuring the Notification Method
To create a policy, you must select at least one of the two possible notification methods: Email or Slack. The first choice, checked by default, is the verified email address of the account you're using when you create the policy.
At this time, you cannot add or change the alert email address for an individual account.
When you create an alert policy from a Team account, you'll have the option to select any of your teammates as email recipients for an alert.
When you click Add more recipients, you'll be given a list of your team's email addresses. Select each address individually or use the checkbox by the Team members header to select or deselect all the addresses on the list.
In addition to email notification, Slack users can connect to a Slack team, then choose a channel for notifications:
Once you've authorized the link between DigitalOcean Monitoring and a Slack team, that connection will be available and enabled by default the next time you create an alert policy. If you choose to unlink in a new alert policy, you'll be able to select a different channel or a different team without affecting any previous connections.
Naming and Creating the Alert Policy
The final step before creating the policy is to assign a name. A default subject line is provided for each metric, but in order to more easily distinguish among policies, we encourage you to customize the name.
The text you put in the "Monitor name" field will:
- Identify the policy on the Monitoring index page
- Form part of the subject line of the email alert
Subject: DigitalOcean monitoring triggered: Disk Utilization is high on a server tagged 'Database'
When everything is configured, clicking the Create alert policy button will create the policy and kick off the evaluation of incoming data.
The new policy will appear on the Monitoring page under a section called Alert Policies:
Once you have an alert policy, it is important to understand how the monitoring system collects data and evaluates alert states.
Data Point Collection and Alert States
When a policy is first created, it may take a few minutes before the evaluation of incoming data begins. After that slight delay, data will be evaluated at regular intervals.
If the average of the data points in the alert interval exceeds the threshold, an alert is triggered. In our example, once monitoring begins, after 1440 minutes (one day), the monitoring will average out the data points over that period to determine the percentage of disk usage. If the average indicates that disk usage is above 70%, we would receive a notification.
This same data point evaluation process is used to determine when an alert has been resolved. Data points continue to be collected at regular intervals. Each time a new metric is received, the oldest point drops off, the newest is added, and the average of the threshold interval is evaluated. This means that if a threshold was barely exceeded and a new data point comes in that brings the new average below the threshold, a resolution notification could be triggered without much delay.
With our disk example, let's say that a log rotation policy deletes an old and particularly large log file, causing the threshold to go down dramatically. We will receive the resolution notification in the same channels where we received the alert notification (unless, of course, we've edited the policy in the interim).
At this time, it is not possible to manually resolve or acknowledge an alert. Alerts are automatically resolved when resource usage falls back to an acceptable level according to the alert policy.
Receiving Notifications and Viewing Active Alerts
When an alert is triggered according to the process outlined above, a notification is sent using the chosen mediums. You will be notified once per configured medium when an alert has been triggered. A second notification is sent when the alert has been resolved.
Each notification includes the name of the alert, the name and IP address of the triggering Droplet, and a link to the triggering Droplet's page in the control panel. Additionally, notifications about triggered alerts include the alert policy parameters and the average resource usage at the time the alert was triggered. Resolution notifications include the length of the alert event and the current average resource usage.
Alerts in the Control Panel
If an alert is triggered, a new section in the Monitoring interface will be displayed called Triggered Alerts. This section is only visible when there are active alerts:
This section of the page displays the active alerts, including each of the Droplets that are currently above the usage threshold. Once the alert has been resolved, the entry will drop out of the Triggered Alerts section. If there are no longer any active alerts, the Triggered Alerts section will be hidden.
If you've selected email notifications, you will receive a notification email when an alert is triggered:
Example ContentSubject: DigitalOcean monitoring triggered: Disk Utilization is high on a server tagged 'Database' Disk Utilization is currently at 82.63%, above threshold of 70.00% for more than 1 day. View droplet Database-01: https://cloud.digitalocean.com/droplets/12345678 IP: 203.0.113.1
Once the alert has been resolved, a similar resolution email will be sent:
Example ContentSubject: DigitalOcean monitoring resolved: Disk Utilization is high on a server tagged 'Database' The monitor was triggered for more than 1 day. Disk Utilization is currently at 69.70%. View droplet Database-01: https://cloud.digitalocean.com/droplets/12345678 IP: 203.0.113.1
This indicates that the alert has been resolved.
If you've enabled Slack notifications, you will receive a notification in Slack in the team and channel selected in the alert policy:
Once the average resource consumption has dipped below the threshold again, a similar Slack notification will be sent indicating that the alert has been resolved:
Again, this message indicates that the alert has been resolved.
DigitalOcean Monitoring provides up-to-date information about your infrastructure health to enable timely, data-driven decision making. Droplet graphs give you historical insight into how usage patterns have developed over time, while alert policies and notifications let you know when usage falls outside of the parameters you find acceptable.
To learn more about DigitalOcean Monitoring, check out the guides below: