A few weeks back I received a letter from our local electric utility informing me of scheduled overnight outages due to service work in my neighborhood. The day came and went without the announced outage. This is not unusual; it has happened before, so I didn’t think much of it.
A few days later I received an early morning call that our office email and Internet access were down. I had not received any NAGIOS alerts overnight but I thought that the utility had finally gotten around to doing the scheduled work and that the power had failed overnight and exhausted the UPSes.
I walked the caller through the UPS restart process and found that the UPSes were already running fine. However, all the equipment in our mail server rack was powered off. This includes our firewall (which explains the Internet outage) as well as the NAGIOS monitoring machine (which explains why I did not receive any alerts) and the mail server. Hmm… the plot thickens.
I ask them to switch the KVM to the file server machine and the KVM is unresponsive; it is ‘stuck’ on the email server screen. It is also making a very odd ‘screeching’ noise. Oh boy…
I have them power up the firewall and monitoring machines and I begin to get a flood of alerts. Oddly, none of them are related to a power failure. The file server starts up normally; the mail server, however, refuses to power up. With my co-workers temporarily distracted by the restored Internet access, I jump in my car and race to the office.
When I arrive, the server still will not power up, so I decide to move its drives to another machine (this is almost trivial with a Mac server). The drives are mounted facing the front of the server and are accessible via push-bars, each of which releases one of the three drives.
I push the first bar to release the drive, and water gushes out all over my shoes. Oh frack…
My office is located on the top floor of a renovated 19th-century shoe factory. The space is wonderful, with exposed brick, soaring ceilings and rough-hewn beams that would make Paul Bunyan proud. The roof, however, is not quite so wonderful. In the past we’ve had water ‘incidents’ that have claimed a color printer and caused some discoloration of the floor in the server room. The building management had assured me that the roof had been fixed. Apparently not so much. Judging from the splash patterns in the rack where the mail server is located, a leak formed directly over the rack. The water primarily hit the upper components in the rack: the screeching KVM switch, an external hard drive (this will be important later) and the mail server. I’m guessing that it also found its way down to the other machines in the rack, causing them to shut down, but the UPS somehow dodged the watery bullet.
Miraculously, only the mail server was toasted by the water; the other machines (with the exception of the KVM, which is totaled) started up normally.
After pulling the remaining drives (yes, I covered the machines below the mail server with plastic to prevent further damage), I removed them from the ‘sleds’ and put them in front of a fan to dry. My next stop was the backup system console.
A quick check indicated that I had a good ‘full’ backup from the prior weekend and a good ‘incremental’ backup from the night before the deluge.
It was at this point that I uttered the line that would later prove to be a curse… “We have a good backup. It could be worse…”
It was… much worse.
More later…