This post consists of questions posed by a healthcare organization looking to implement a data-warehousing solution. The answers come from Dale Sanders, senior vice president at Health Catalyst and CIO mentor and senior technology advisor for the Cayman Islands Health Services Authority. Sanders, a former CIO at Northwestern Medical Faculty Foundation, has spent the last 15 years applying IT to improve quality and efficiency while also reducing expenses.
The questions were posed by representatives from eHealth, the IT department of a healthcare system.
Q: We want to proceed with a data warehousing initiative in increments, with budgets of 300-500K at a time. Steps must be opportunistic in order to avoid mistakes due to poor planning and a lack of insight. We need to know which crucial decisions we have to make now. There are some cultural issues with regard to data ownership and data sharing. We want to ensure we don’t make fundamental mistakes early on.
Although eHealth is the technology department, there are pockets of data, analysts, and developers that sit outside of eHealth. The Center for Health Policy has its own data repository, governance, and processes. We would like to label our top operational systems (ADT, Emergency). By demonstrating results, we will be able to fund a system expansion. From the business perspective, if we had the top 20 data marts, we could start conforming the dimensions that are common to them. This will support buy-in by reducing maintenance and offering more comprehensive reporting.
A: One of the challenges you will have is to decide whether to go back to the source systems or to extract from your existing data marts. In either case, it is essential to identify which core dimensions of analysis are most important (provider ID, patient ID, etc.). For these core dimensions, you need to define naming conventions, data types, values, etc. You’ll need to add the attributes to the source systems or the data marts; probably the latter.
The first thing you should do is define approximately 20 dimensions (standards and naming conventions). I call this the core data element of the “Bus Architecture.” Having this in place will allow you to query across all of the different systems, whether through a source system or a data mart, without remodeling the data.
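To make the idea concrete, here is a minimal Python sketch of conforming two core dimensions across two sources. The column names (`PAT_NUM`, `PatientID`, and so on), the source labels, and the `conform` helper are all hypothetical, chosen only for illustration. The point is that the standardized attributes are added alongside the source columns, which are left exactly as the source system modeled them.

```python
# Hypothetical conformance spec: each core dimension has one standard name,
# one data type, and a per-source mapping to the local column name.
CORE_DIMENSIONS = {
    "patient_id": {
        "type": int,
        "sources": {"adt": "PAT_NUM", "emergency": "PatientID"},
    },
    "provider_id": {
        "type": int,
        "sources": {"adt": "ATTENDING_ID", "emergency": "ProviderNo"},
    },
}


def conform(row, source):
    """Return a copy of row with core dimensions added under standard names.

    The original source columns are kept untouched; nothing is remodeled.
    """
    out = dict(row)
    for std_name, spec in CORE_DIMENSIONS.items():
        src_col = spec["sources"].get(source)
        if src_col is not None and src_col in row:
            out[std_name] = spec["type"](row[src_col])
    return out


# Illustrative rows from two hypothetical source systems.
adt_row = {"PAT_NUM": "1001", "ATTENDING_ID": "77", "ADMIT_DT": "2014-03-01"}
ed_row = {"PatientID": 1001, "ProviderNo": "82", "TriageLevel": 3}

adt_conformed = conform(adt_row, "adt")
ed_conformed = conform(ed_row, "emergency")
```

After this step both rows carry `patient_id` and `provider_id` with the same name and type, so a query can relate them without either source being reshaped.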
Q: What is the best approach to ensure that we cover standards for the systems in the data warehouse (demographics, naming conventions, etc.)? We currently have the standards, but we don’t know if all of the source systems are using them.
A: First you should consider the ROI for the data acquisition. You want to go after the high-visibility and high-value data sources first. My mantra for organizations is: “No data governance before it is required.” Organizations often try to govern too much, too early. You do not want to govern just for the sake of governing. Initially, allow for and tolerate a certain amount of ambiguity in data governance. Focus those limited governance activities on increasing access to data, improving data quality, and increasing the data literacy of the organization. The key is to start small and build some success stories. Others will be attracted to that, and the governance structure will coalesce around those successes rather than inhibit the evolution of the data warehouse.
One concept that is a bit counterintuitive to most health IT types is that you do not have to conform every dimension that a transaction-system standard like HL7 might cover. Here’s an example: a team got wrapped around itself standardizing the data types and lengths for patient and provider names that did not have any analytics use cases. All that effort was unnecessary because analysts rarely join across source systems based on name; they use a numeric person identifier, an MPI, instead. Do not try to conform all of the dimensions and adopt all the standards that are out there, all at once. Focus on a small number of core dimensions of analysis. Over time, as the number of analytic use cases expands, you can standardize as you need to. Don’t try to boil the ocean.
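The name example can be sketched in a few lines of Python. The records, field names, and the `join_on_mpi` helper below are invented for illustration. The two sources format patient names differently, yet the join still works because it relies only on the shared numeric identifier.

```python
# Two hypothetical sources that never agreed on a name format.
lab_results = [
    {"mpi": 1001, "patient_name": "SMITH, JOHN A", "test": "HbA1c", "value": 7.9},
    {"mpi": 1002, "patient_name": "DOE,JANE", "test": "HbA1c", "value": 6.1},
]
encounters = [
    {"mpi": 1001, "patient_name": "John Smith", "department": "Emergency"},
    {"mpi": 1003, "patient_name": "Ann Lee", "department": "ADT"},
]


def join_on_mpi(left, right):
    """Inner-join lab rows to encounter rows on the numeric MPI only."""
    index = {}
    for row in right:
        index.setdefault(row["mpi"], []).append(row)
    joined = []
    for lab in left:
        for enc in index.get(lab["mpi"], []):
            joined.append({
                "mpi": lab["mpi"],
                "test": lab["test"],
                "value": lab["value"],
                "department": enc["department"],
            })
    return joined


matches = join_on_mpi(lab_results, encounters)
```

Here `matches` contains the single row for MPI 1001, even though that patient's name is stored as "SMITH, JOHN A" in one source and "John Smith" in the other.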
Q: How do you approach bringing in data sources while integrating enterprise information standards?
A: The data model issue is the greatest cause of failure for healthcare enterprise data warehouses (EDW) in the U.S. There are four data models to consider with a healthcare EDW:
1) Star schema, advocated by Ralph Kimball
2) Enterprise Information Model/Corporate Information Model, advocated by Bill Inmon and Claudia Imhoff
3) i2b2, which is a variation on a star schema designed at Harvard Medical School to facilitate information exchange between academic medical centers
4) The “Late Binding Bus Architecture” that I advocate. This is the approach we used at Intermountain, Northwestern and another 20 to 30 organizations across the U.S. This model is significantly different from the others, because there is very little data modeling that goes on. The focus is on data relating, not data modeling. Why remodel the data when the source systems have already modeled it for you?
When defining high-value data sources, consider the following:
- The volume and breadth of data. Simply put, when there is a lot of data, there is usually value, analytically.
- What are the pressing issues in your organization or patient population? Start where there is a hot item of analysis or a vexing issue, and where there is strong leadership committed to solving it. By addressing these high-profile questions first, you will attract other opportunities for analytic use cases.
- Analytic needs and strategy for the patient population in the province.
Q: What would the warehouse look like, and where would business rules be applied?
A: Business rules in a data warehouse can be applied at one of six “binding points” in the flow of data, from the source system (Binding Point 1) to the visualization layer (Binding Point 6). The fundamental healthcare value equation is quality divided by the cost of production. Clinical effectiveness analytics, which is the attempt to measure clinical quality, effectiveness, and adherence to best practices, is becoming a greater focus of the healthcare analytics community across North America, so this is something to take into account when designing the warehouse. The rules around quality are changing all the time, so you want to bind to those rules very late in the flow of data in the EDW. The rules around cost are not quite as volatile, even though the data quality is terrible in the US, so you can bind to cost rules earlier in the architecture. In Canada, I suspect you would need to build the infrastructure around the numerator (quality) first, and then build the data content to support the denominator (cost of production) over the upcoming years. In the US, the cost of care is an increasing concern.
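As a rough illustration of early versus late binding, the Python sketch below binds a cost rule when the data is loaded and leaves the quality rule as a swappable function applied only at query time. The field names, the cost-to-charge ratio, and the two threshold rules are hypothetical placeholders, not clinical guidance and not a description of any particular product.

```python
# Raw facts arrive from the source system with no quality rule applied.
observations = [
    {"patient_id": 1, "hba1c": 6.8, "charge": 120.0},
    {"patient_id": 2, "hba1c": 7.6, "charge": 95.0},
    {"patient_id": 3, "hba1c": 8.4, "charge": 210.0},
]

# Early binding: a relatively stable cost rule is applied once, at load time.
COST_TO_CHARGE_RATIO = 0.4  # hypothetical value for illustration


def load(rows):
    """Store each row with a derived cost column bound during loading."""
    return [dict(r, cost=round(r["charge"] * COST_TO_CHARGE_RATIO, 2)) for r in rows]


# Late binding: volatile quality rules live outside the stored data,
# so a rule change needs no reload and no remodeling.
def controlled_v1(row):
    return row["hba1c"] < 7.0  # hypothetical earlier definition


def controlled_v2(row):
    return row["hba1c"] < 8.0  # hypothetical revised definition


def controlled_rate(rows, rule):
    """Share of rows meeting the supplied quality rule, computed at query time."""
    return sum(1 for r in rows if rule(r)) / len(rows)


warehouse = load(observations)
rate_v1 = controlled_rate(warehouse, controlled_v1)
rate_v2 = controlled_rate(warehouse, controlled_v2)
```

Switching from `controlled_v1` to `controlled_v2` changes the reported rate from one in three to two in three without touching the stored rows, which is the practical benefit of binding volatile rules late.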
Q: What are your thoughts on what governance should be in place and how it would develop as the EDW develops?
A: There are several things you can do to successfully evolve and gain buy-in for data governance structures:
- Communicate broadly so that stakeholders know you are building a new, integrated EDW. This will alleviate the knee-jerk concerns that an EDW can evoke.
- Consider publishing a three-year roadmap that outlines development of content, security, auditing and layers of the EDW to the executive sponsors and stakeholders. This roadmap can also demonstrate the evolution of the data stewardship and data governance structures.
- Consider involving the CIO heavily as part of the governance structure to help break down barriers of access to data.
- I’ll share with you what I call the Library Metaphor. There isn’t a lot of need for governance of a library while it is being constructed. It isn’t until books and periodicals are ordered that governance becomes an issue. Access and security come after the building. Over time, those working with the EDW as a career become librarians who don’t always understand how the content is being used, but know that people are accessing it and doing useful work. As the community’s literacy with the data expands, so will the need to expand the library. The same is true with the EDW: start out with the core data, and then users will ask for new and different content that needs to be expressed in different ways. As librarians, we need to be aware of and in tune with the community we are supporting. It makes no sense to build a highly complex and capable data warehouse in a community that has little or no data literacy.
Q: Regarding the roles for those working directly with the EDW, would they come from a core group or from different functional areas?
A: I recommend a staffing model consisting of 60 percent regionally or centrally assigned personnel and 40 percent business-unit assigned. This holds true both for projects and operationally. This model balances nicely for the evolution of the EDW. It allows for business cases to percolate up while allowing for centralized analytic use cases. You definitely want to allow for both.