What tools exist to help me understand the performance of my Droplet?

November 12, 2014

I am using a Droplet as an inexpensive way to supplement our existing (external) network monitoring of a few web applications, using Python mechanize scripts. I occasionally see poor performance (requests taking 70,000 ms to complete rather than the usual 400 ms to 500 ms). This could be an application issue, but it could also be a performance issue on the Droplet (or somewhere in between). Are there any tools available to help me understand whether there was a momentary performance problem with my Droplet?

(We use other monitoring tools that give us more information, but it is still challenging to debug intermittent issues.)
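
For context, each check is essentially just a timed HTTP fetch. A stripped-down version of the check (the URL is a placeholder, and details like error handling are omitted) looks roughly like this:

    import time
    import mechanize

    # Placeholder URL for one of the monitored web applications.
    URL = "https://app.example.com/health"

    br = mechanize.Browser()
    br.set_handle_robots(False)  # the monitored pages aren't meant for crawlers

    start = time.time()
    response = br.open(URL)
    body = response.read()       # read the body so transfer time is counted
    elapsed_ms = (time.time() - start) * 1000

    print("%s -> HTTP %s in %.0f ms" % (URL, response.code, elapsed_ms))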

1 Answer

If a request takes 70 seconds to complete, it's your code. There's almost no way DO staff wouldn't notice a droplet host having such serious performance issues.

  • Thanks. I understand that. My question is more about how I can get visibility into network and droplet performance. We are not monitoring our own code, unfortunately, so we are limited in our ability to use something like New Relic.

    When this external monitoring does show poor performance that isn't corroborated by our internal monitoring data (which shows sub-second response times), it points to a likely networking issue ... but the question is where?

  • What is your code written in? There are various profilers you can use to get insight into where your code is spending its time; a quick example is sketched at the end of this comment.

    If you insist that it's not the code, there are external monitoring services like uptimerobot.
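
    For example, with Python's built-in cProfile (run_check is just a placeholder for whatever function performs the request):

        import cProfile
        import pstats

        def run_check():
            # Placeholder for the code that performs and times the HTTP request.
            pass

        # Profile one run and show the ten most expensive calls by cumulative time.
        cProfile.run("run_check()", "check.prof")
        pstats.Stats("check.prof").sort_stats("cumulative").print_stats(10)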

  • It's not my code, unfortunately. If it was, I'd have fixed the issue by now. ;)

    I'm monitoring a couple of third-party applications. These applications do the same thing, are hosted on the same infrastructure, and show different performance characteristics. We use tools like uptimerobot and dotcom monitor, but aren't fully satisfied with the data we can access or their reporting. So, I wrote my own script to collect data that we can analyze with Tableau or Excel to produce some richer reporting. It tests the time to respond to an HTTP request, tracks the HTTP status code (e.g. 200), and does some keyword matching to ensure the correct content is returned; a simplified version is sketched at the end of this comment.

    My suspicion is that my employer's network is to blame for intermittent poor performance. I'm just looking for some sort of insight into the performance of my Droplets and the Digital Ocean network. I love my Droplet and am very happy with performance in general, but I think it is unlikely that there are never any problems on the Digital Ocean systems.

    If I can't get access to any sort of metrics, that's fine - I feel the value Digital Ocean offers is extremely high - I was just wondering if there were performance metrics that I could access that I wasn't aware of.
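
    The simplified version mentioned above (the URL, keyword, and output path are illustrative, not the real values):

        import csv
        import time
        import mechanize

        # Illustrative values only.
        URL = "https://app.example.com/login"
        KEYWORD = "Sign in"
        CSV_PATH = "checks.csv"

        br = mechanize.Browser()
        br.set_handle_robots(False)

        start = time.time()
        try:
            response = br.open(URL)
            status = response.code
            body = response.read().decode("utf-8", "replace")
        except Exception as exc:
            status, body = "error: %s" % exc, ""
        elapsed_ms = int((time.time() - start) * 1000)

        # One row per check: timestamp, URL, HTTP status, elapsed time, keyword hit.
        with open(CSV_PATH, "a") as f:
            csv.writer(f).writerow(
                [time.strftime("%Y-%m-%dT%H:%M:%S"), URL, status,
                 elapsed_ms, KEYWORD in body])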

  • Hi,

    What metric do you want to monitor? You can use tools like "mtr" to try and diagnose issues local to your employer's network.

  • Sorry - I just realized why I confused you. "We are not monitoring our own code" means "it is not our code that we are monitoring". We are monitoring applications, but we did not write them.

  • Our networking team has plenty of tools that they use to monitor performance, but they're not showing the slowdowns we see in the script run from Digital Ocean. That may be because of where on the network their tools are running from. It may also be because they are sampling performance less frequently than my script is.

    Because of that, however, they don't trust my statistics. Without more information, it's hard to make a convincing argument. Maybe I'll set up a machine on AWS and merge all the data for future use. It would be one more layer of circumstantial evidence, but the likelihood of DigitalOcean and Amazon having issues at the same time is low.

  • What is the question at this point?

  • My original question was whether there are tools I can use to understand Digital Ocean network performance and the performance/health of my Droplet after the fact. It sounds like there aren't, so I think we're done here.

  • I suggested some tools above actually, such as "mtr". It's hard to be precise because your problem is very vague.

  • The problem is actually quite specific. I am creating a log file of the performance of HTTP requests from a Droplet to web servers hosted on my employer's network. That performance is normally pretty good, but occasionally unacceptable. I receive notifications from my system (or from a user) that performance is slow - meaning, performance WAS slow in the past and may not be slow in the present. Use of a traceroute tool in the present is of limited usefulness to me because it can't give me any data on what WAS the case in the past at the time the user or my script detected an issue. It is only useful if the problem is persistent, and it is not.

    I was hoping DigitalOcean had some tools to help me understand overall network health (or Droplet/host health) at the time a slowdown is detected in my monitoring data. It does not.
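
    One workaround is to collect that history from the Droplet itself, for example by logging periodic mtr reports. A rough sketch (assuming the mtr package is installed; the host, path, and interval are arbitrary):

        import subprocess
        import time

        TARGET = "app.example.com"             # monitored host (placeholder)
        LOG_PATH = "/var/log/mtr_history.log"  # arbitrary output location

        # Every 5 minutes, append a 10-cycle mtr report so there is path/latency
        # history to check against slow samples in the HTTP log later.
        while True:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            report = subprocess.check_output(
                ["mtr", "--report", "--report-cycles", "10", TARGET])
            with open(LOG_PATH, "ab") as out:
                out.write(("\n=== %s ===\n" % stamp).encode())
                out.write(report)
            time.sleep(300)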

  • Hi,

    The problem might be specific from your point of view, but since we have no example site to check it out with, nor any idea what metrics you're actually talking about, it's hard to recommend anything other than wild guesses.

    What does "Good" mean? What does "Bad" mean? What tools are you using to generate your metrics? Which metrics? What does a bad log look like, and what does a good log look like? These questions are crucial to answer if you want to dig deeper into the problem; otherwise there's really nothing to go on from my point of view.

    As for DigitalOcean themselves, you have to realize that they are an unmanaged provider, which means you're the one responsible for administering the server they give you. They give you CPU, IOPS, and bandwidth graphs, but beyond that you're free to install any of the countless tools available for monitoring Linux performance yourself; one small do-it-yourself example is sketched at the end of this comment. I'm also happy to point you to better tools, but without any real description of the problem the best I can do is guess.

    EDIT: For instance, latency measured by your Python scripts might differ from what ICMP ping shows, etc.
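
    For instance, a few lines of Python logging the Droplet's own load average give you host-side history to line up against your request timings afterwards (the path and interval are arbitrary):

        import time

        LOG_PATH = "/var/log/droplet_load.csv"  # arbitrary output location

        # Append the timestamped 1-minute load average every 60 seconds.
        # /proc/loadavg is standard on Linux, so no extra packages are needed.
        while True:
            with open("/proc/loadavg") as f:
                load1 = f.read().split()[0]
            with open(LOG_PATH, "a") as out:
                out.write("%s,%s\n" % (time.strftime("%Y-%m-%dT%H:%M:%S"), load1))
            time.sleep(60)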

  • I wasn't looking for a managed service, I was looking for something from Digital Ocean that I could look at to understand if there was some issue local to my Droplet at a particular time of day that might have impacted the time elapsed between HTTP request and response. I don't understand why that is so difficult for you to grasp.

    The difficulty is that we're trying to debug an intermittent issue. I think Digital Ocean services are awesome, but I also know that no infrastructure is perfect. If Digital Ocean performance is normal 99.999% of the time, how do I know that an atypical result in my logs is a real issue in the app I'm monitoring and not the 0.001% of the time when performance is slower than normal? In this case, I'm not too suspicious that there is any problem at Digital Ocean, but I would like some data to confirm that.

    Google offers this for a variety of their services. For example: https://code.google.com/status/appengine/detail/memcache/2014/11/13#ae-trust-detail-memcache-get-latency

    Sometimes tools like that are built into a service but are hard to find. I was hoping Digital Ocean offered something similar for network throughput and I just wasn't aware of it. If not, and if someone at Digital Ocean is watching this, I hope they consider it in the future. Some historical metrics would be better than none. If I was using their service for more significant hosting, I'd want to understand if a customer complaint is caused by something I can directly impact (my code or configuration) or something out of my hands (Digital Ocean infrastructure). I hope that is clear enough for you.

  • Hey @pinneycolton! We don't currently provide tools like the GAE example you linked to. We do have plans for a major overhaul of our status site that will let us provide more fine-grained information about our individual datacenters, rather than the more global view it currently shows. Thanks for the feedback!

  • Thanks for the info!

  • Apologies if you find my replies confusing to parse. Nothing has been "difficult to grasp" from my point of view.

    If you think the host is a problem, then create a new droplet and run your tests on it. Droplets naturally end up on different hosts by way of a spreading algorithm. You can also ask DigitalOcean staff to move your droplet to a different hypervisor, I believe, which would have an equivalent effect. You can also create a droplet in a datacenter close to the one you're using, if such a DC is available for that location (NYC has 3 different DCs, and so does AMS).

    The closest thing to what you're asking is the status page, which will generally have a message written to it if there's an issue affecting one of their locations. I know that in the past when there was a network issue, something was written on that page.

    Beyond that, I doubt DigitalOcean is willing to go further and offer up, for example, the CPU/network graphs of their hypervisors. I certainly haven't heard of an unmanaged provider that goes that far.
