Adventures in Disaster Recovery
By TheEmperor | March 15, 2008
The last three days have been filled with issues that underscore the need for proper disaster recovery measures in all things. One of our clients had a complete meltdown of their domain controller. We had installed some ScanRouter software while working on getting their Ricoh scanner to scan to a network drive. The software installation failed, and the vendor requested a complete uninstall and reinstall. After the reinstall, the server required a reboot. As is our normal procedure, I was going to wait until Friday afternoon to reboot, so that if something went wrong we would have the full weekend to repair it.
Unfortunately, on Wednesday the office manager noticed a message saying that the server needed a reboot. One unscheduled reboot later, the server was throwing LSASS errors and no one could log in. I logged in to the terminal server, did a few things remotely, and then rebooted the domain controller. That put it down for the count: LSASS errors on boot caused it to reboot endlessly. I put in a 19-hour day, including a 5-hour call to Microsoft, attempting to repair the server and then beginning the reinstall. I finished up at 5:30am and went to get some rest, calling in one of my co-workers to finish the reinstall of Exchange and the users. Today I went back in to finish re-adding all of the machines to the domain and to make sure all of their custom software worked. So after about 40 man-hours of work, the system is completely back up and running. In my opinion, that’s 36 man-hours too many.
How should that situation have gone? The second the system failed to reboot, we should have been able to drop a system image in place, restore the OS to a previous state, and go about our business with no more worries. Why couldn’t we? Because, like most consulting companies, we don’t have a universal disaster recovery policy. Of course, I’ll be recommending one to the company owners on Monday and trying to get it approved and implemented.
The disaster recovery plan I’m going to pitch involves using Acronis to take monthly images of the OS and system-critical components such as the system state. With such an image, restoring the OS to a working state would have taken only an hour or so. With all of the data on the drive still intact, the entire process would have been nearly painless instead of grueling.
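A monthly image only helps if it actually exists when the server dies, so it is worth checking the age of the newest image on a schedule. Here is a rough Python sketch of that check. It is not part of Acronis and is only an illustration: the folder path, the 31-day threshold, and the assumption that images are stored as `.tib` files are all mine, so adjust them to match the real setup.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Assumption: "monthly" means the newest image should be at most 31 days old.
MAX_IMAGE_AGE = timedelta(days=31)


def newest_image_time(image_dir: Path, pattern: str = "*.tib") -> Optional[datetime]:
    """Return the modification time of the newest image file, or None if none match."""
    times = [datetime.fromtimestamp(p.stat().st_mtime) for p in image_dir.glob(pattern)]
    return max(times) if times else None


def image_is_stale(newest: Optional[datetime], now: datetime,
                   max_age: timedelta = MAX_IMAGE_AGE) -> bool:
    """True when there is no image at all, or the newest one is older than max_age."""
    return newest is None or now - newest > max_age


if __name__ == "__main__":
    # Hypothetical location where one server's images are stored.
    image_dir = Path("D:/Images/DC01")
    newest = newest_image_time(image_dir)
    if image_is_stale(newest, datetime.now()):
        print(f"WARNING: no image newer than {MAX_IMAGE_AGE.days} days in {image_dir}")
    else:
        print(f"OK: newest image in {image_dir} was written {newest:%Y-%m-%d}")
```

Run from a scheduled task, something like this would flag a server whose images have quietly stopped being taken, long before anyone needs to restore from one.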
I’ll have a complete article on Acronis up sometime next week for anyone interested in using it for their own disaster recovery procedures.
Topics: acronis, disaster, disaster recovery, server crash, system image, system recovery, system restore, windows crash