Patient Selection Algorithm Development for Cardiac Resynchronization Therapy

Improving treatment outcomes through comprehensive patient selection

The Challenge

Cardiac Resynchronization Therapy (CRT) is a novel treatment modality for patients with moderate to severe heart failure caused by left bundle branch block (LBBB), an abnormality of the heart's electrical system in which the left ventricle's contraction is delayed. The delay reduces systolic function, leading to the reduced ejection fraction (<35%) that is the hallmark of heart failure. Similar to a pacemaker, and often also including a cardioverter-defibrillator, CRT involves implanting a device that uses electrical pulses to synchronize the function of the two ventricles. For appropriate patients, CRT can be a life-saving therapeutic option. However, the intervention carries considerable risk and significant expense, and its benefits are limited to a small group of suitable heart failure patients. Given these issues, patient selection is paramount. Starschema was commissioned by a major university hospital's interventional cardiology service to analyze its CRT program, which had been running for 12 years at the time, and to determine selection criteria that identify the patients most likely to benefit from CRT and least likely to suffer perioperative or postoperative complications.

One of the most significant challenges was the diversity of data available on each patient: quantitative information (e.g. lab tests), information extracted from operative reports and Electronic Health Record (EHR) systems, ECG recordings (including long-term Holter monitoring), prescription information, ejection fraction determined by cardiac MRI, and echocardiograms. In addition, where a patient had died, expert clinicians comprehensively encoded the cause and circumstances of death using ICD-10 and classified each mortality event as heart failure-related, device-related (e.g. postoperative infections) or unrelated.

Our Approach

To manage this diverse array of data, a data lake was constructed that could accommodate structured, unstructured and binary data alike. This enabled our data scientists to rapidly and efficiently access all data assets pertaining to a patient regardless of format.

Using advanced feature engineering techniques, data was standardized and encoded in a format that could be used in a multifactorial survival model. For ECG data, dynamic time warping was used to align individual beats, and the coefficients of Daubechies wavelet transforms served as features, while echocardiograms were analyzed using a deep autoencoder. Given the vast amount of time series data, including several ECG recordings for each of thousands of patients, we relied on automated feature engineering using deep feature synthesis (DFS), which allowed us to generate and select synthetic features from large amounts of data for optimum information content and representativeness.
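
To make the ECG step more concrete, here is a minimal sketch of beat alignment by dynamic time warping followed by a wavelet decomposition. It is an illustration under stated assumptions, not the project's pipeline: the function names, the toy beat values and the choice of the Haar wavelet (the simplest member of the Daubechies family, also known as db1) are ours, and the case study does not say which Daubechies order or which libraries were used for this step.

```python
import math


def dtw_path(a, b):
    """Return (total cost, warping path) for aligning sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def warp_to_template(beat, template):
    """Resample `beat` onto the template's time axis using the DTW path."""
    _, path = dtw_path(template, beat)
    buckets = [[] for _ in template]
    for ti, bi in path:
        buckets[ti].append(beat[bi])
    # Average all beat samples that were matched to the same template sample.
    return [sum(b) / len(b) for b in buckets]


def haar_dwt(signal):
    """One level of the Haar (db1) wavelet transform; needs an even length."""
    s = math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) / s for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / s for k in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Coefficients ordered as [approx_n, detail_n, ..., detail_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details = detail + details
    return approx + details


# Toy data (hypothetical): the same beat shape, recorded one sample early.
template = [0, 0, 1, 5, 1, 0, 0, 0]
beat = [0, 1, 5, 1, 0, 0, 0, 0]
aligned = warp_to_template(beat, template)
features = haar_features(aligned, levels=3)
```

Aligning every beat to a common template before the transform means that each coefficient position describes the same part of the cardiac cycle across beats, which is what makes the coefficients comparable as model inputs.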

Finally, a survival model was built that stratified the risk of perioperative, postoperative cardiac and postoperative non-cardiac death. The model highlighted the most determinative factors of survival, including obvious ones (such as age, physical status and comorbidities) and less obvious ones (such as QRS complex width and the presence of atrial fibrillation). This allowed the creation of a patient selection scoring algorithm that used LIME (Local Interpretable Model-Agnostic Explanations) to show the physician not merely the score assigned to a particular patient, but also the values on which that score is based. In the end, patient selection decisions are made by clinicians. With decisions that have such a vast effect on individual lives, the appropriate role of machine learning is to distill a vast amount of clinical information and advise the clinician of the factors that weigh for and against the intervention.
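
The idea behind a LIME-style explanation can be sketched in a few lines: perturb one patient's feature values, query the black-box model, and fit a distance-weighted linear surrogate whose coefficients indicate which features drive the score locally. The sketch below is a simplified illustration and does not use the `lime` package or the project's survival model. `toy_risk`, its coefficients, the patient values and the perturbation scales are all invented for the example and have no clinical meaning.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explain_locally(model, instance, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns (intercept, weights); weights[k] is the local change in the model
    output per `scales[k]` units of change in feature k.
    """
    rng = random.Random(seed)
    p = len(instance)
    xtwx = [[0.0] * (p + 1) for _ in range(p + 1)]
    xtwy = [0.0] * (p + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]  # standardized offsets
        x = [instance[k] + z[k] * scales[k] for k in range(p)]
        y = model(x)
        # Samples close to the instance count more than distant ones.
        w = math.exp(-sum(v * v for v in z) / kernel_width ** 2)
        row = [1.0] + z
        for i in range(p + 1):
            xtwy[i] += w * row[i] * y
            for j in range(p + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve_linear(xtwx, xtwy)  # weighted least squares via normal equations
    return beta[0], beta[1:]


def toy_risk(x):
    """Stand-in black box. Coefficients are invented, NOT clinical estimates."""
    age, qrs_ms, ef_pct = x
    score = 0.04 * (age - 65) - 0.02 * (qrs_ms - 150) - 0.05 * (ef_pct - 30)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical patient: age (years), QRS width (ms), ejection fraction (%).
patient = [72.0, 160.0, 28.0]
scales = [10.0, 20.0, 5.0]  # hypothetical perturbation scale per feature
intercept, weights = explain_locally(toy_risk, patient, scales)
ranked = sorted(zip(["age", "qrs_ms", "ef_pct"], weights), key=lambda t: -abs(t[1]))
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

The ranked weights are the kind of output a clinician could read next to the score: each one states how strongly, and in which direction, a feature pushes this particular patient's prediction.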


The objective of this project was to create a clinically useful tool that could distill a vast amount of patient information of widely varying sparsity and provide clinicians with a pre-implantation likelihood of perioperative mortality and long-term survival. Clinicians responded overwhelmingly positively to a tool that supports reasoned, data-based decision-making and rapidly summarizes a patient's entire record in a single predictive model.

When compared to clinical trials of CRT, such as the COMPANION, CARE-HF and MIRACLE/MIRACLE-ICD trials, the model was in near-complete agreement with the trial outcomes. However, unlike the clinical algorithms and patient selection guidelines laid down by these studies, the model we provided did not merely calculate suitability, but also explained why a patient would be suitable or unsuitable for CRT and quantified the relative weight of each of those factors. Guidelines and clinical algorithms usually treat factors as being of equal weight, whereas a quantitative model can help the clinician make better decisions by also highlighting the relative impact of each factor on outcomes within the population.

Accurate patient selection improves outcomes, prevents inappropriate treatment and saves lives. With a transparent and easily interpretable predictor, clinicians can now make treatment decisions with greater confidence and weigh the factors for and against a particular intervention in view of an intuitive explanation of relative weights and contributions.

Technologies used

● Python
● scikit-learn
● TensorFlow
● Deep Feature Synthesis
● LIME (Local Interpretable Model-Agnostic Explanations)

Skills used

● Clinical analytics and population health
● Deep Feature Synthesis and feature engineering
● Machine learning
● Model explanation and analysis
