
NY1 (Equinix) Power Issue Postmortem

By DigitalOcean

Published: November 24, 2013
4 min read

Initial Power Issue

Last night at approximately 8:43 PM EST, our monitoring systems notified us that a large number of servers were down in our NY1 region. We immediately began troubleshooting and confirmed that network connectivity was unavailable. We then tried to access the equipment and found that most of it was back online about a minute later. However, when we reviewed the logs, the uptime for each device we accessed was now measured in minutes instead of days, indicating that it had been rebooted.
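The uptime check described above can be sketched in Python. This is only an illustration, not the tooling we used: it assumes Linux hosts that expose `/proc/uptime`, and the host names, sample values, and one-hour threshold are all hypothetical.

```python
from datetime import timedelta
from typing import Dict, List


def parse_proc_uptime(text: str) -> timedelta:
    """Parse the contents of Linux's /proc/uptime into a timedelta.

    The file holds two numbers: seconds since boot and aggregate idle
    seconds. Only the first one is needed here.
    """
    seconds_since_boot = float(text.split()[0])
    return timedelta(seconds=seconds_since_boot)


def recently_rebooted(uptimes: Dict[str, timedelta], threshold: timedelta) -> List[str]:
    """Return the hosts whose uptime is below the threshold, sorted by name."""
    return sorted(host for host, up in uptimes.items() if up < threshold)


if __name__ == "__main__":
    # Hypothetical samples: host names and values are made up for illustration.
    samples = {
        "hv-01": "412.37 1530.02",         # roughly 7 minutes of uptime
        "hv-02": "3801600.00 9000000.00",  # 44 days of uptime
    }
    uptimes = {host: parse_proc_uptime(raw) for host, raw in samples.items()}
    print(recently_rebooted(uptimes, threshold=timedelta(hours=1)))
```

A host that reports minutes of uptime when it had been running for days is a strong hint that it lost power or was restarted, which is the signal we relied on here.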

We then contacted the Equinix datacenter to get more information. We suspected a power failure and needed to confirm whether it was limited to our equipment or part of something larger. On the phone, we received confirmation that there had been a large power failure at the facility.

Immediate Response

All of our staff were on high alert at this point, as we had to review each piece of hardware that had been rebooted, and we dispatched people to the datacenter as well. Lev, our Director of Datacenter Operations, is currently in Amsterdam opening our latest facility, so Anthony and Moisey went instead and were on site within the hour.

Initial Contact from Equinix

An hour later we received official confirmation from Equinix via email that there was in fact a power failure incident at the facility.

INCIDENT SUMMARY: UPS 7 Failure

INCIDENT DESCRIPTION:

Equinix IBX Site Engineer reports that UPS 7 failed causing disruptions to customer equipment. UPS 7 is back online. Engineers are currently investigating the issue.

Next update will be when a significant change to the situation occurs.

Information coming directly from Equinix was limited; however, with our engineers on site, we also had a chance to discuss the power issue with other customers of the datacenter and gather more information.

Informally, what we suspect is that UPS 7 was responsible for cleaning the dirty power that comes in from the public grid and turning it into stable power, which is then distributed throughout the datacenter. There was in fact a hardware failure of UPS 7, which should have triggered an automatic switch to a redundant UPS (which they do have on site), but that switch failed to occur. It is very likely that more than one UPS handles inbound power, as only about half of the datacenter experienced a failure.

When the redundancy failed and another UPS did not take over, power was effectively cut off to the equipment. UPS 7 then hard rebooted and came back online, which restored the flow of power to the equipment; however, there was an interruption of several minutes in between.

While we were on site, we saw power engineers arrive at the facility about 3-4 hours later to investigate what caused the initial failure of UPS 7 and why the redundant power switching systems did not operate as they were supposed to.

Power Restored - Restoring the Cloud in NY1

Losing power to hypervisors is the worst-case scenario, because an abrupt interruption in power doesn’t give the disk drives a chance to flush their caches, which increases the likelihood of filesystem corruption. We began checking every single hypervisor to ensure that it had booted successfully, and we did find several systems that needed manual intervention.

We did not need to recover or rebuild any RAIDs during the process. Some systems did fail to boot, reporting that they could not find a RAID configuration, but we suspected that this was related to the way they lost power. We powered off those systems and removed the power cords to ensure that everything would reset correctly, then reseated the physical SSDs and powered the systems back on.

Given that the network was also affected, we had to ensure that all of the top-of-rack switches converged back onto the network successfully. We found three switches that needed manual intervention to reconnect them to both cores. One of those switches also had a failed 10GbE GBIC, which we replaced with a spare. Once that was completed, the network layer was back in full operation.

Once all of the physical hypervisors were back online, we proceeded to power on all of the virtual machines that resided on those systems. We approached this systematically so that we could give 100% focus to each step of the process. After the virtual machines were back online, we began notifying customers who had opened tickets that the majority of the work was now complete and asked them to let us know if they saw any issues.

Please Contact Us

With all of the hypervisors back online, the networking issues resolved, and all virtual machines booted, we asked customers who were still having issues to open a ticket so that we could troubleshoot with them. A small percentage of virtual machines had dirty filesystems that required an fsck to bring them back online. If you are not familiar with fsck, please reach out to us so we can help with that process.
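For customers who want to check whether a filesystem was left dirty before reaching out, here is a minimal Python sketch. It rests on assumptions that go beyond this post: that the virtual machine uses an ext2/ext3/ext4 filesystem, and that you have already captured the text printed by `tune2fs -l` for the device. The sample text below is made up for illustration.

```python
from typing import Optional


def filesystem_state(tune2fs_output: str) -> Optional[str]:
    """Extract the 'Filesystem state' field from `tune2fs -l` output.

    Returns the value (for example 'clean' or 'not clean'), or None if
    the field is not present in the text.
    """
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    return None


def needs_fsck(state: Optional[str]) -> bool:
    """Conservatively treat anything other than exactly 'clean' as suspect.

    An unknown state (None) is also treated as needing a check.
    """
    return state != "clean"


if __name__ == "__main__":
    # Hypothetical excerpt of `tune2fs -l` output, made up for illustration.
    sample = (
        "Filesystem volume name:   <none>\n"
        "Filesystem state:         not clean\n"
    )
    state = filesystem_state(sample)
    print(state, needs_fsck(state))
```

Note that fsck should only be run against a filesystem that is not mounted, so the repair itself is usually done from a recovery environment. If you are unsure, open a ticket and we will walk through it with you.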
