I used to dream of the day when I had fixed everything and our IT outages would stop. Of course, that is silly. Like other healthcare organizations, we add applications to the portfolio every year as new solutions address previously underautomated areas. Most of these are not core parts of the IT architecture; they are supplemental systems, such as documentation systems for clinical departments (e.g., rehab) and contract modeling systems.
With the increase in the number of applications in the portfolio comes complexity. In addition, our infrastructure is becoming much more complicated, with a more sophisticated network, changing virtualization technologies, and complex storage.
So, our IT Operations philosophy is to perform a Root Cause Analysis on every critical service interruption. Our Root Cause Analysis asks three things:
- How can we prevent this type of outage in the future?
- How can we detect this type of outage in the future?
- How can we respond to this type of outage more quickly?
The last two questions are important. Even if the cause of a service interruption is a simple fix, sooner or later stuff is going to hit the fan. When it does, we want our IT folks to see it right away and to already be telling our customers how we are fixing the problem before they call us.