Patient Selection Algorithm Development for Cardiac Resynchronization Therapy

Improving treatment outcomes through comprehensive patient selection

The Challenge

Cardiac Resynchronization Therapy (CRT) is a novel treatment modality for patients with moderate to severe heart failure caused by left bundle branch block (LBBB). LBBB is an abnormality of the heart's electrical system in which the left ventricle's contraction is delayed; systolic function is reduced as a result, leading to the reduced ejection fraction (<35%) that is a hallmark of heart failure. Similar to a pacemaker, and often also including a cardioverter-defibrillator, CRT involves implanting a device that uses electrical pulses to synchronize the function of the two ventricles. For appropriate patients, CRT can be a life-saving therapeutic option. However, the intervention carries considerable risks and significant expense, and its benefits are limited to a small group of suitable heart failure patients. Given these issues, patient selection is paramount.

Starschema was therefore commissioned by a major university hospital's interventional cardiology service to analyze its CRT program, which had been running for 12 years at the time, and to determine patient selection criteria that identify the patients most likely to benefit from CRT and least likely to suffer perioperative or postoperative complications.

One of the most significant challenges was the diversity of data available on each patient: quantitative information (e.g. lab tests), information extracted from operative reports and EHR (Electronic Health Record) systems, ECG recordings (including long-term Holter monitoring), prescription information, ejection fraction determined by cardiac MRI, and echocardiograms. In addition, where mortality had occurred, expert clinicians comprehensively encoded the cause of death and the circumstances surrounding it using ICD-10, and each mortality event was classified as heart failure related, device-related (e.g. postoperative infections) or unrelated.

Our Approach

To manage this diverse array of data, a data lake was constructed that could accommodate structured, unstructured and binary data alike. This enabled our data scientists to rapidly and efficiently access all data assets pertaining to a patient regardless of format.

Using advanced feature engineering techniques, data was standardized and encoded in a format that could then be used in a multifactorial survival model. For ECG data, dynamic time warping was used to align individual beats, and the coefficients of Daubechies wavelet transforms were used as features, while echocardiograms were analyzed using a deep autoencoder. Given the vast amount of time series data, including several ECG recordings for each of thousands of patients, we relied on automated feature engineering using deep feature synthesis (DFS), which allowed us to generate and select synthetic features from large amounts of data for optimum information content and representativeness.
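
As a rough illustration of the two ECG steps named above, the sketch below computes a dynamic time warping cost between two beats and one level of the Haar transform, which is the simplest member of the Daubechies wavelet family. It is a minimal, dependency-free sketch and not the project's pipeline: the function names, the toy beats and the choice of the Haar wavelet are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def haar_step(signal):
    """One level of the Haar (Daubechies-1) transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Two hypothetical beats (made-up values): same shape, the second delayed by one sample.
beat_a = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
beat_b = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

shift_cost = dtw_distance(beat_a, beat_b)                        # cost after warping
pointwise_cost = sum(abs(x - y) for x, y in zip(beat_a, beat_b))  # cost without warping
approx, detail = haar_step(beat_a)
features = approx + detail  # wavelet coefficients used as a feature vector
```

In this toy case the warped cost is lower than the sample-by-sample cost, because warping absorbs the one-sample delay. That is the reason beats are aligned before their wavelet coefficients are compared.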

Finally, a survival model was built that stratified the risk of perioperative, postoperative cardiac and postoperative non-cardiac death. This highlighted the most determinative factors of survival, including obvious ones (such as age, physical status and comorbidities) and less obvious ones (such as QRS complex width and the presence of atrial fibrillation). This allowed the creation of a patient selection scoring algorithm that leveraged the LIME (Local Interpretable Model-Agnostic Explanations) technique to indicate to the physician not merely that a particular patient was assigned a particular score, but also which values that score was based on. In the end, patient selection decisions are made by clinicians, and with decisions that have such a vast effect on individual lives, the appropriate role of machine learning is to distill a vast amount of clinical information and advise the clinician of the factors that militate for and against the intervention.
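
The text does not say which survival model was used. As a minimal sketch of the underlying idea, estimating survival from follow-up data in which some patients are censored (still alive at last contact), the example below implements the Kaplan-Meier estimator and applies it to two strata. The strata names and all follow-up values are invented for illustration, and the estimator stands in for whatever multifactorial model the project actually fitted.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit)
    events:    1 if death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months for two made-up risk strata.
strata = {
    "lower_risk": ([12, 24, 36, 48, 60], [0, 1, 0, 0, 1]),
    "higher_risk": ([3, 6, 6, 18, 30], [1, 1, 0, 1, 0]),
}
curves = {name: kaplan_meier(d, e) for name, (d, e) in strata.items()}
```

Censored patients lower the number at risk without counting as deaths, which is what distinguishes a survival estimate from a simple mortality proportion. Separating several causes of death, as the project did, would need a competing-risks treatment that this sketch does not attempt.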


The objective of this project was to create a clinically useful tool that could distill a vast amount of patient information of often varying sparsity and provide clinicians with a pre-implantation likelihood of perioperative mortality and long-term survival. The clinical reception of a tool that allows reasoned, data-based decision-making and rapidly summarizes a patient's entire record in a single predictive model was overwhelmingly positive.

When compared with clinical trials of CRT, such as the COMPANION, CARE-HF and MIRACLE/MIRACLE-ICD trials, the model was in near-complete agreement with the trial outcomes. However, unlike the clinical algorithms and patient selection guidelines laid down by these studies, the model we provided did not merely calculate suitability, but also explained why a patient would be suitable or unsuitable for CRT and quantified the relative weight of each of those factors. Guidelines and clinical algorithms usually treat factors as being of equal weight, whereas a quantitative model can help the clinician make better decisions by also highlighting the relative impact of each factor on outcomes within the population.

Accurate patient selection improves outcomes, prevents inappropriate treatment and saves lives. With a transparent and easily interpretable predictor, clinicians can now make treatment decisions with greater confidence and weigh the factors for and against a particular intervention in view of an intuitive explanation of relative weights and contributions.
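
The notion of per-patient relative weights can be sketched as a local linear surrogate, which is the core idea behind LIME: perturb one patient's feature values, weight the perturbed samples by their proximity to the original, and fit a weighted linear model to the black-box predictions. The code below is a simplified, dependency-free approximation of that idea and not the `lime` library or the project's code. The `black_box` function, the feature values and the scales are hypothetical, and LIME's feature selection and discretization steps are left out.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def local_weights(predict, instance, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.

    Returns one coefficient per feature, per unit of that feature's scale.
    """
    rng = random.Random(seed)
    k = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xtwy = [0.0] * (k + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # perturbation in scaled units
        x = [v + s * dz for v, s, dz in zip(instance, scales, z)]
        w = math.exp(-sum(dz * dz for dz in z) / kernel_width ** 2)  # proximity kernel
        row = [1.0] + z  # intercept plus scaled perturbations
        y = predict(x)
        for a in range(k + 1):
            xtwy[a] += w * row[a] * y
            for b in range(k + 1):
                xtwx[a][b] += w * row[a] * row[b]
    return solve(xtwx, xtwy)[1:]  # drop the intercept

# Hypothetical black-box score over three made-up features (not clinical values).
def black_box(x):
    return 2.0 * x[0] - 3.0 * x[1] + 0.5 * x[2]

weights = local_weights(black_box, instance=[1.0, 2.0, 3.0], scales=[1.0, 2.0, 1.0])
```

The sign of each weight says whether a factor pushes the score up or down for this particular patient, and the magnitude gives its relative influence near that patient's values. Because the toy `black_box` is itself linear, the surrogate recovers its coefficients multiplied by the chosen scales; for a nonlinear model the weights would be only a local approximation.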

Technologies used

● Python
● scikit-learn
● TensorFlow
● Deep Feature Synthesis
● LIME (Local Interpretable Model-Agnostic Explanations)

Skills used

● Clinical analytics and population health
● Deep Feature Synthesis and feature engineering
● Machine learning
● Model explanation and analysis
