Update 0120 UTC 5 June – We want to clarify that all customer details shared in this post have been approved by the customer in advance. We would never share such company information without express permission.
On May 29, DigitalOcean customer Raisup’s account was locked and their resources were powered down due to a false positive generated by our anti-fraud and abuse automation system. Mistakes in handling that false positive led to a second lock, after which the customer was sent a communication permanently denying access to the account. The account owner used Twitter to call attention to the mistake. Shortly thereafter, DigitalOcean investigated the issue, and the Raisup account was unlocked and powered back on. We'd like to apologize and share more details about exactly what happened.
The initial account lock and resource power down resulted from an automated service that monitors for cryptocurrency mining activity (Droplet CPU loads and Droplet creation behavior). These signals, coupled with a number of account-level signals (including payment history and current run rate compared to total payments), are used to determine whether automated action is warranted to minimize the impact of potentially fraudulent high CPU loads on other customers. Before any action is taken against an account, automated safeties are checked to avoid acting without warning on a customer that is in good standing.
Unfortunately, in this case the safeties were insufficient to prevent automated action. Because the customer was running on credit, they did not have a clear payment history, which meant that one of the primary safeties (payment history) was not triggered. The automated service created a support ticket on behalf of the customer to allow for rapid communication regarding the action.
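The decision flow described above can be sketched roughly as follows. This is a minimal illustration and not DigitalOcean's actual implementation: the `Account` fields, the `CPU_THRESHOLD` and `BURST_SIZE` values, and all function names are hypothetical. It only shows why a safety keyed on payment history offers no protection to a customer running on credit.

```python
from dataclasses import dataclass

# Hypothetical values; the real system's signals and thresholds are not public.
CPU_THRESHOLD = 0.95
BURST_SIZE = 5


@dataclass
class Account:
    avg_cpu_load: float        # mean CPU load across new Droplets, 0.0 to 1.0
    droplets_created: int      # Droplets created within a short window
    successful_payments: int   # number of past paid invoices


def looks_like_mining(account: Account) -> bool:
    """Usage signals: very high CPU load plus a burst of Droplet creates."""
    return (account.avg_cpu_load >= CPU_THRESHOLD
            and account.droplets_created >= BURST_SIZE)


def payment_safety_holds(account: Account) -> bool:
    """Safety: an established payment history blocks automated action."""
    return account.successful_payments > 0


def should_auto_lock(account: Account) -> bool:
    # Automated action happens only when the signals fire and no safety holds.
    return looks_like_mining(account) and not payment_safety_holds(account)


# A customer running on credit has no paid invoices, so the safety never applies.
credit_customer = Account(avg_cpu_load=1.0, droplets_created=10, successful_payments=0)
paying_customer = Account(avg_cpu_load=1.0, droplets_created=10, successful_payments=12)
```

With identical workloads, `should_auto_lock(credit_customer)` is true while `should_auto_lock(paying_customer)` is false, which mirrors the gap described in this incident.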
Upon recognizing that their resources had been powered off and the account locked, the customer replied to the ticket created for communication on the action. An Abuse Operations agent re-enabled the account 12 hours after the initial ticket. However, the agent mistakenly did not flag the account as approved for the CPU-intensive activity that was the cause of the initial flag.
On May 30, the same automated service acted on the account a second time because the safety flag was absent. Upon a second review by a different Abuse Operations agent (roughly 29 hours after the customer responded to the second flag), the agent failed to recognize that this was a false positive and denied the customer any further access to the account. This action triggered the final “access denied” communication to the customer. At this point, the customer initiated the series of tweets to gain the attention of DigitalOcean.
After further investigation, the Droplets were powered back on, access to the account was restored, and the appropriate safeties were flagged. DigitalOcean leadership initiated communication with the customer to extend apologies, offer credit, and fully explain what happened to resolve the issue.
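To illustrate why the first unlock did not hold, here is a small sketch that replays the sequence of lock, manual unlock, and renewed load. It is an assumption-laden model, not the real system: `AccountState`, `automated_check`, `agent_unlock`, `replay`, and the 0.95 threshold are hypothetical, and `allow_high_cpu` merely stands in for the approval flag that was left unset.

```python
from dataclasses import dataclass


@dataclass
class AccountState:
    locked: bool = False
    allow_high_cpu: bool = False  # stand-in for the high CPU approval flag


def automated_check(state: AccountState, cpu_load: float) -> None:
    """Lock the account on very high CPU load unless it has been approved."""
    if cpu_load >= 0.95 and not state.allow_high_cpu:
        state.locked = True


def agent_unlock(state: AccountState, approve_high_cpu: bool) -> None:
    """Manual unlock; the approval flag is only set if the agent chooses to."""
    state.locked = False
    if approve_high_cpu:
        state.allow_high_cpu = True


def replay(approve_high_cpu: bool) -> bool:
    """Replay lock, unlock, and resumed workload; return the final lock state."""
    state = AccountState()
    automated_check(state, cpu_load=1.0)   # first automated lock
    agent_unlock(state, approve_high_cpu)  # agent reviews and unlocks
    automated_check(state, cpu_load=1.0)   # customer workload resumes
    return state.locked
```

In this model `replay(False)` ends with the account locked again, while `replay(True)` leaves it unlocked: the unlock alone is not enough, because the same workload will trip the same check.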
2019-05-29 16:43 UTC – Customer creates a batch of 10 Droplets, rapidly generating ~100% CPU load across all new worker Droplets.
2019-05-29 18:24 UTC – Cryptocurrency mining mitigation detects suspicious behavior, including very high CPU utilization on an account with no payment history, which results in an account lock. As a part of this lock a support ticket is automatically created on the customer’s behalf.
2019-05-29 18:37 UTC – Customer replies back to the ticket with a request to unlock.
2019-05-30 06:43 UTC – Action is taken due to the customer reaching out on social media and to Support. Support routes the issue to Abuse Ops. Account is unlocked by the responding Abuse Ops agent and a reply is sent by email, 12 hours after the customer responded. The Allow High Cpu Usage flag is not set as part of the unlock.
2019-05-30 09:49 UTC – Three hours after the customer powers their Droplets back on, CPU usage on the same worker Droplets spikes back to 100%, and the cryptocurrency mining mitigation locks the account and powers it down again. Customer replies to the new Verification support ticket within 20 minutes.
2019-05-31 15:32 UTC – 29 hours after the customer’s response, the account is denied reactivation. Abuse Ops agent (different from initial agent) cites the link to an older account, connected through a shared SSH key, as additional justification for making the decision to deny access.
2019-05-31 19:21 UTC – Social escalation leads to the account being unlocked/powered back on.
2019-05-31 – Communication across multiple channels (Twitter, Hacker News, other media outlets) occurs to provide apologies and clarity on the situation. Customer is directly contacted by DO staff to offer apologies, situational insight, and credit.
2019-06-01 – Customer responds to direct contact, acknowledging the apology.
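The intervals quoted in this post can be checked directly from the timestamps in the timeline. The sketch below uses only those published timestamps; the helper names `utc` and `elapsed` are ours. The exact time of the customer's second reply is not given (only "within 20 minutes" of the second lock), so the second interval is measured from the lock itself.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry as a UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def elapsed(start: datetime, end: datetime) -> tuple:
    """Return the gap between two events as (hours, minutes)."""
    minutes = int((end - start).total_seconds()) // 60
    return divmod(minutes, 60)


first_lock = utc("2019-05-29 18:24")
first_reply = utc("2019-05-29 18:37")
first_unlock = utc("2019-05-30 06:43")
second_lock = utc("2019-05-30 09:49")
denial = utc("2019-05-31 15:32")
final_unlock = utc("2019-05-31 19:21")

# Customer reply to first unlock: the "12 hours" in the timeline.
reply_to_unlock = elapsed(first_reply, first_unlock)

# Second lock to denial: an upper bound on the "29 hours" response time.
lock_to_denial = elapsed(second_lock, denial)

# First lock to final unlock: total span of the incident.
total_span = elapsed(first_lock, final_unlock)

for label, (h, m) in [("reply to unlock", reply_to_unlock),
                      ("second lock to denial", lock_to_denial),
                      ("first lock to final unlock", total_span)]:
    print(f"{label}: {h}h {m}m")
```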
This situation involved failures across people, process, and technology:
The safeties intended to prevent fraud and abuse algorithms from taking automated action on a healthy, non-abusive customer were inadequate for a customer lacking payment history.
There were a number of issues and missteps that contributed to the incident. To prevent similar incidents from occurring in the future, we are considering the following measures:
We wanted to share the specific details around this incident as accurately and quickly as possible to give the community insight into what happened and how we handled it. We recognize the impact this had on a customer and how it represented a breach of trust for the community, and for that we are deeply sorry. We have a number of takeaways to address the technical, process, and people missteps that led to this failure. The entire team at DigitalOcean values and remains committed to the global community of developers.
Chief Technical Officer