Patient Selection Algorithm Development for Cardiac Resynchronization Therapy

Improving treatment outcomes through comprehensive patient selection

The Challenge

Cardiac Resynchronization Therapy (CRT) is a novel treatment modality for patients with moderate to severe heart failure caused by left bundle branch block (LBBB), an abnormality of the heart's electrical system in which the left ventricle's contraction is delayed. Systolic function is reduced as a result, leading to the reduced ejection fraction (<35%) that is the hallmark of heart failure. CRT involves implanting an electrical device, similar to a pacemaker and often also including a cardioverter-defibrillator, that uses electrical pulses to synchronize the function of the two ventricles. For appropriate patients, CRT can be a life-saving therapeutic option. However, the intervention carries considerable risk and significant expense, and its benefits are limited to a small group of suitable heart failure patients. Patient selection is therefore paramount. To support it, a major university hospital's interventional cardiology service commissioned Starschema to analyze its CRT program, which had been running for 12 years at the time, and to determine selection criteria that identify the patients most likely to benefit from CRT and least likely to suffer perioperative or postoperative complications.

One of the most significant challenges was the diversity of data available on each patient: quantitative information (e.g. lab tests), information extracted from operative reports and EHR (Electronic Health Record) systems, ECG recordings (including long-term Holter monitoring), prescription information, ejection fraction determined by cardiac MRI, and echocardiograms. In addition, where a patient had died, expert clinicians comprehensively encoded the cause and circumstances of death using ICD-10 and classified each death as heart failure related, device-related (e.g. postoperative infections) or unrelated.

Our Approach

To manage this diverse array of data, a data lake was constructed that could accommodate structured, unstructured and binary data alike. This enabled our data scientists to rapidly and efficiently access all data assets pertaining to a patient regardless of format.

Using advanced feature engineering techniques, the data was standardized and encoded in a format that could be used in a multifactorial survival model. For ECG data, dynamic time warping was used to align individual beats and the coefficients of Daubechies wavelet transforms were used as features, while echocardiograms were analyzed using a deep autoencoder. Given the vast amount of time series data, including several ECG recordings for each of thousands of patients, we relied on automated feature engineering using deep feature synthesis (DFS), which allowed us to generate and select synthetic features from large amounts of data for optimum information content and representativeness.
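
To make the beat-alignment step more concrete, the following is a minimal sketch of classic dynamic time warping in plain Python. It is an illustration only, not the project's actual pipeline: the function name `dtw_align`, the absolute-difference cost and the toy "beats" are all assumptions made for this example, and a real ECG workflow would operate on segmented, filtered signals with an optimized library.

```python
def dtw_align(a, b):
    """Return (distance, path) aligning sequences a and b with classic DTW.

    Hypothetical helper for illustration: cost is the absolute difference
    between samples, and path is a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only (b sample repeats)
                cost[i][j - 1],      # advance in b only (a sample repeats)
            )

    # Walk back from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: the second "beat" is the first one with a stretched onset.
beat = [0, 1, 2, 1, 0]
stretched_beat = [0, 0, 1, 2, 1, 0]
distance, path = dtw_align(beat, stretched_beat)
print(distance, path)
```

Because the warping path maps every sample of one beat onto a sample of the other, beats of different durations can be brought onto a common time base before features such as wavelet coefficients are extracted.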

Finally, a survival model was built that stratified risk of perioperative, postoperative cardiac and postoperative non-cardiac death. This highlighted the most determinative factors of survival, including obvious ones (such as age, physical status and comorbidities) and less obvious ones (such as QRS complex width and the presence of atrial fibrillation). This allowed the creation of a patient selection scoring algorithm that used LIME (Local Interpretable Model-Agnostic Explanations) to show the physician not merely that a particular patient was assigned a particular score, but also what values that score was based on. Ultimately, patient selection decisions are made by clinicians, and with decisions that have such a vast effect on individual lives, the appropriate role of machine learning is to distill a vast amount of clinical information and advise the clinician of the factors that militate for and against the intervention.
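
The idea behind a LIME-style explanation can be sketched in a few lines: perturb the patient's feature values, query the black-box model on the perturbed points, and fit a proximity-weighted linear surrogate whose coefficients indicate how each feature moves the score locally. The code below is a simplified illustration in plain Python rather than the `lime` library or the project's model. The names `explain_locally` and `toy_risk`, the Gaussian perturbation scheme, the kernel and the toy coefficients are all assumptions, and the toy coefficients carry no clinical meaning.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain_locally(predict, instance, scales, n_samples=500,
                    kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to `predict` near `instance`.

    Returns (intercept, coefficients). Each coefficient is the local change
    in the prediction per one `scale` unit of the corresponding feature.
    """
    rng = random.Random(seed)
    k = len(instance)
    # Accumulate the weighted least-squares normal equations:
    # (X^T W X) beta = X^T W y, with X = [1, z_1, ..., z_k].
    xtwx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xtwy = [0.0] * (k + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # standardized offsets
        x = [v + zi * s for v, zi, s in zip(instance, z, scales)]
        y = predict(x)
        # Exponential kernel: perturbations closer to the instance count more.
        weight = math.exp(-sum(zi * zi for zi in z) / kernel_width ** 2)
        row = [1.0] + z
        for p in range(k + 1):
            xtwy[p] += weight * row[p] * y
            for q in range(k + 1):
                xtwx[p][q] += weight * row[p] * row[q]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]


def toy_risk(features):
    """Hypothetical black-box score; coefficients are arbitrary."""
    age_years, qrs_ms = features
    logit = 0.05 * (age_years - 70) - 0.02 * (qrs_ms - 150)
    return 1.0 / (1.0 + math.exp(-logit))


intercept, weights = explain_locally(toy_risk, [70.0, 150.0], [10.0, 20.0])
for name, w in zip(["age_years", "qrs_ms"], weights):
    print(f"{name}: {w:+.3f} per scale unit")
```

The signed surrogate coefficients play the role of the per-factor contributions shown to the clinician: their sign indicates whether a factor pushes the score up or down for this patient, and their magnitude indicates by how much relative to the other factors.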


The objective of this project was to create a clinically useful tool that could distill a vast amount of patient information of widely varying sparsity and provide clinicians with a pre-implantation likelihood of perioperative mortality and long-term survival. The clinical reception of a tool that allows reasoned, data-based decision-making and rapidly summarizes a patient's entire record in a single predictive model was overwhelmingly positive.

When compared to clinical trials of CRT, such as the COMPANION, CARE-HF and MIRACLE/MIRACLE-ICD trials, the trial outcomes were in near-complete agreement with the model. However, unlike the clinical algorithms and patient selection guidelines laid down by these studies, the model we provided did not merely calculate suitability, but also explained why a patient would be suitable or unsuitable for CRT and quantified the relative weight of each of those factors. Guidelines and clinical algorithms usually treat factors as being of equal weight, whereas a quantitative model can help the clinician make better decisions by also highlighting the relative impact of each factor on outcomes within the population.

Accurate patient selection improves outcomes, prevents inappropriate treatment and saves lives. With a transparent and easily interpretable predictor, clinicians can now make treatment decisions with greater confidence and weigh the factors for and against a particular intervention in view of an intuitive explanation of relative weights and contributions.

Technologies used

● Python
● scikit-learn
● TensorFlow
● Deep Feature Synthesis
● LIME (Local Interpretable Model-Agnostic Explanations)

Skills used

● Clinical analytics and population health
● Deep Feature Synthesis and feature engineering
● Machine learning
● Model explanation and analysis
