Server outage on Jan 23rd - Webdrive incident report

Written by Courtney Brown on January 24th, 2013.

We have received an incident report from Webdrive, our primary server hosting facility, regarding yesterday's server outages (January 23rd). It contains the following information:

At approximately 13:15 NZDT, one of the redundant network switches connecting the cloud hypervisors failed (cause currently unknown). An engineer was dispatched to investigate and found the device non-responsive (dead). While the engineer was returning to collect replacement equipment, a notification was received that the second redundant switch connecting the cloud hypervisors had also failed. The first failed switch was replaced and service was restored. The second failed switch is currently being replaced.

It is extremely rare for two separate devices to fail within such a short window. The exact cause of the hardware failures is currently unknown and will be investigated further.

Our current focus is on ensuring that both switches are replaced and service is fully restored (with full failover capability).

This outage affected all of our Webdrive-hosted servers, which host the majority of Zeald websites.
