Health Care Analytics: How to begin your analysis?

This blog will help you take your first step into healthcare analytics: first by understanding a research study, and then by using that knowledge to uncover the important factors that can act as stepping stones for you, as part of an analysis team, to build predictive models.



A while back, my Health Care Analytics professor gave me an assignment to create a new model for treating a respiratory disease, modeled after a successful pilot conducted at Allegheny General Hospital in Pittsburgh. One of the tasks in the assignment was to outline an analytics database to manage the project. The database was required to have the 10 most important data elements that the analysis team would need to begin its analysis. In simple terms: given a population, what do I need to know about that population to gain insights that help my organization improve its quality and experience of care and reduce the per capita cost of health care?

The assignment took a while to finish and I ended up getting a good grade as well, but more importantly, I thought this would be a good starting point for someone who wants to explore health care analytics!

For this article, I will be using COPD (chronic obstructive pulmonary disease) as the example. If you’ve already heard of it, you’re all set! If not, don’t be thrown off by the name: all you need to know to understand this article is that it’s a chronic lung disease with breathlessness as one of its main symptoms.

One last thing before we start: although some of the points I’ll be listing mention COPD here and there, many of them are still relevant irrespective of the disease or patient population that you’re trying to analyze.


Here are the elements:



1) We would want to know the COPD severity of every patient as there is a direct relationship between COPD severity and its cost of care. This makes it a very important element for our analysis. The estimated direct and indirect costs of COPD exacerbations in the US are $52.4 billion and a majority of this is due to hospital admissions. With severity stratification, separate bundles can be designed (for a bundled payment approach). Follow-ups can be scheduled in the form of disease education, counseling sessions, addressing behavioral health concerns, and pulmonologist consultation. This strategy would reduce inpatient admissions and readmission rates.




2) We would also want to know which patients do not strictly adhere to their medication or scheduled pulmonologist visits. Identifying this population is critical because, in the long run, such patients can drive up the cost of outpatient care, emergency room visits, and hospitalizations. Is it the medication burden that keeps these patients from adhering to the plan, or are there other factors? Strategies can then be designed to answer such questions, with more focus on educating this population, so that poor adherence to the care path is minimized and quality of care improves.


3) How many of the COPD patients get readmitted? As inpatient admissions are a major contributing factor towards the total cost of care for COPD patients, it is imperative to analyze this attribute. Is there a comorbidity burden on the patients which leads to readmissions or is it because of lack of post-discharge support? A higher readmission rate affects the cost and directly relates to the quality of care. For COPD patients, this could be reduced by better inpatient education, better discharge instructions, counseling on smoking cessation, and post-discharge support.


Comorbidity is the presence of one or more additional health conditions co-occurring with a primary condition. Source: Wikipedia
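
The readmission question in point 3 can be made concrete with a small calculation. What follows is only a rough sketch: the 30-day window, the `admissions` records, and the function name `readmission_rate` are all assumptions made for illustration, and every discharge is treated as an index discharge, which is a simplification of how readmission measures are usually defined.

```python
from datetime import date

# Hypothetical inpatient stays: (patient_id, admit_date, discharge_date).
admissions = [
    ("P1", date(2023, 1, 5), date(2023, 1, 9)),
    ("P1", date(2023, 1, 20), date(2023, 1, 24)),  # 11 days after the previous discharge
    ("P2", date(2023, 2, 1), date(2023, 2, 3)),
    ("P3", date(2023, 3, 1), date(2023, 3, 4)),
    ("P3", date(2023, 6, 15), date(2023, 6, 18)),  # well outside a 30-day window
]


def readmission_rate(admissions, window_days=30):
    """Share of discharges followed by another admission of the
    same patient within `window_days` (an assumed window)."""
    stays_by_patient = {}
    for patient_id, admit, discharge in admissions:
        stays_by_patient.setdefault(patient_id, []).append((admit, discharge))

    discharges = 0
    readmitted = 0
    for stays in stays_by_patient.values():
        stays.sort()  # chronological order by admit date
        for i, (_, discharge) in enumerate(stays):
            discharges += 1
            if i + 1 < len(stays):
                next_admit = stays[i + 1][0]
                if 0 <= (next_admit - discharge).days <= window_days:
                    readmitted += 1
    return readmitted / discharges


rate = readmission_rate(admissions)
print(f"Readmission rate: {rate:.0%}")
```

Splitting the same rate by comorbidity count or by whether post-discharge support was provided would be one way to start answering the "why" part of the question.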



4) It will be helpful to identify the services, procedures, and facilities that the costliest patients consume. We can start by identifying these for the patients who account for 50% of the costs. The next step would be to look for alternative procedures that might be less expensive, less resource-intensive, and just as effective. For example, if the costliest patients are also the ones who get readmitted the most, the analysis in point 3 will be helpful. Moreover, if we extract the COPD severity level of these patients and find that the ones at greater risk are also the costliest, it would reinforce our analysis in point 1.
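
Finding "the patients who account for 50% of the costs" is a simple ranking exercise. Here is a minimal sketch, assuming we already have one total cost figure per patient; the `costs` values and the helper name `top_cost_patients` are made up for illustration.

```python
def top_cost_patients(cost_by_patient, share=0.5):
    """Return the smallest group of highest-cost patients whose
    combined cost reaches `share` of total spending."""
    total = sum(cost_by_patient.values())
    ranked = sorted(cost_by_patient.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    running = 0.0
    for patient_id, cost in ranked:
        if running >= share * total:
            break  # target share already covered
        selected.append(patient_id)
        running += cost
    return selected


# Hypothetical annual cost of care per patient, in dollars.
costs = {"P1": 52000, "P2": 4000, "P3": 18000, "P4": 1500, "P5": 9500, "P6": 15000}

high_cost = top_cost_patients(costs)
print(high_cost)
```

Once this group is known, its services, procedures, and facilities can be pulled from the claims data and compared against the rest of the population.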


5) A substantial part of health care costs can be attributed to the increasing consumption of emergency services. Emergency department (ED) use is more expensive to the healthcare system than a visit to a primary care physician. We would like to identify the COPD patients who consume emergency services the most. If our analysis suggests that high-severity cases (stages C and D) are also the ones who consume ED services the most, we can then devise specialized bundles for such cases, which could comprise more frequent follow-ups, pulmonologist visits, etc. The goal would be to reduce ED consumption for high-severity cases.
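
To check whether ED use really tracks severity, one option is to compare average ED visits per patient across severity groups. The sketch below assumes severity is recorded as a letter from A to D and that a yearly ED visit count is available per patient; the `patients` records and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, severity_group, ed_visits_last_12_months).
patients = [
    ("P1", "A", 0),
    ("P2", "B", 1),
    ("P3", "C", 3),
    ("P4", "D", 5),
    ("P5", "D", 3),
    ("P6", "A", 1),
]


def mean_ed_visits_by_severity(patients):
    """Average number of ED visits per patient within each severity group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [visit sum, patient count]
    for _, severity, visits in patients:
        totals[severity][0] += visits
        totals[severity][1] += 1
    return {sev: s / n for sev, (s, n) in sorted(totals.items())}


ed_by_severity = mean_ed_visits_by_severity(patients)
print(ed_by_severity)
```

If the averages climb with severity, as they do in this toy data, that supports designing separate bundles for the high-severity groups.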


6) Which patients are obtaining services across different providers? The next step would be to analyze whether such patients go through duplicate or redundant procedures across providers. If they do, this could mean that a provider network does not maintain medical data in a standardized manner (it may be stored as free text, for example), which makes it difficult for another provider’s system to interpret. This could lead to inconsistencies in the network, drive up the cost, and hurt the quality of care.
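
One way to look for possibly redundant procedures is to flag cases where the same patient receives the same procedure from two different providers within a short period. This is only a sketch: the 30-day window, the `claims` layout, and the procedure labels are assumptions, and only consecutive events for a patient and procedure are compared.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims: (patient_id, provider_id, procedure_code, service_date).
claims = [
    ("P1", "HOSP_A", "SPIROMETRY", date(2023, 3, 1)),
    ("P1", "CLINIC_B", "SPIROMETRY", date(2023, 3, 10)),  # same test, other provider, 9 days later
    ("P1", "HOSP_A", "CHEST_XRAY", date(2023, 3, 1)),
    ("P2", "HOSP_A", "SPIROMETRY", date(2023, 1, 5)),
    ("P2", "CLINIC_B", "SPIROMETRY", date(2023, 9, 5)),   # months apart, likely a routine repeat
]


def possible_duplicates(claims, window_days=30):
    """Flag the same procedure done for the same patient by two
    different providers within `window_days` of each other."""
    grouped = defaultdict(list)
    for patient_id, provider_id, code, service_date in claims:
        grouped[(patient_id, code)].append((service_date, provider_id))

    flagged = []
    for (patient_id, code), events in grouped.items():
        events.sort()  # chronological order
        for (d1, prov1), (d2, prov2) in zip(events, events[1:]):
            if prov1 != prov2 and (d2 - d1).days <= window_days:
                flagged.append((patient_id, code, prov1, prov2))
    return flagged


dupes = possible_duplicates(claims)
print(dupes)
```

A flagged pair is not proof of waste; it is a prompt to check whether the second provider could see the first provider's result.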


7) It would be interesting to have some information on age, gender, and health plan coverage of patients and their participation in drug plans provided by the insurance company. This becomes critical for us because if we are able to find patterns in such factors, then we can use such metrics for better risk stratification and analysis.


8) Data related to patient throughput across multiple health record systems can be really helpful. Better throughput could mean that the hospital system is better prepared to deal with patients at an individual level and is taking full advantage of the Health Information Exchange (HIE). This directly reflects quality of care, through fewer medical errors, and efficiency, through the elimination of unnecessary processes. On the other hand, little or no improvement in patient throughput could raise questions about data standardization and consistency. It would also be helpful to know how frequently the hospital system queries the HIE database, as doing so can help it be better prepared for a patient.

9) We would want to analyze the overall health complexity of a patient in the form of comorbidity information. This enables proper resource allocation according to the health care needs of the patient. Taking the solution at AGH Pittsburgh as an example, patients in the COPD care path who accounted for 30% of the total spending were not associated with COPD at all. Hence, analyzing this parameter is particularly helpful when defining what constitutes a ‘bundle’ (again, this is for a bundled payment approach) for a COPD patient.



10) Lastly, we would want to identify patients who are on palliative care support. Analyzing this specific population will help us find patterns in which severities of COPD are referred to palliative care and whether palliative care referrals at early stages lead to better outcomes. Better outcomes with palliative care could suggest that good quality of care is being delivered. Palliative care could then be added to the ‘bundle’ for a better care path.


Palliative care is a specialized medical care for people living with a serious illness. This type of care is focused on providing relief from the symptoms and stress of the illness. The goal is to improve quality of life for both the patient and the family.  Source:
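
Since the original task was to outline an analytics database, the ten elements above can be pulled together into a single per-patient record. The sketch below is just one possible layout: every field name and type is an assumption chosen for illustration, not the schema used in the assignment or the pilot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatientRecord:
    """One row per patient; field names are hypothetical."""
    patient_id: str
    copd_severity: str                  # 1) severity level
    adherent_to_care_plan: bool         # 2) medication / visit adherence
    readmissions_last_12_months: int    # 3) readmissions
    total_cost_of_care: float           # 4) cost, to find the costliest patients
    ed_visits_last_12_months: int       # 5) emergency service use
    provider_ids: List[str] = field(default_factory=list)   # 6) providers seen
    age: Optional[int] = None           # 7) demographics and coverage
    gender: Optional[str] = None
    health_plan: Optional[str] = None
    in_drug_plan: Optional[bool] = None
    hie_queries_last_12_months: int = 0  # 8) HIE usage as a throughput signal
    comorbidities: List[str] = field(default_factory=list)  # 9) health complexity
    on_palliative_care: bool = False    # 10) palliative care support


record = PatientRecord(
    patient_id="P1",
    copd_severity="D",
    adherent_to_care_plan=False,
    readmissions_last_12_months=2,
    total_cost_of_care=52000.0,
    ed_visits_last_12_months=5,
    provider_ids=["HOSP_A", "CLINIC_B"],
    comorbidities=["heart failure"],
)
print(record)
```

In a real project these fields would more likely live in several linked tables (patients, encounters, claims), but a flat record like this is enough to start exploring the questions in the list.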


To conclude, I would like to say that healthcare has become a data-intensive industry, now more than ever, due to rigorous record-keeping, digitization of health records, and a tonne of regulatory requirements. Healthcare providers are always on the lookout for talented analysts who can transform this abundant information into actionable insights to achieve better health outcomes for their patient population and eliminate inefficiencies. For people who are targeting such roles, or for data analysts in general, I hope you found this article helpful!

