Improving Predictive Accuracy

Practice Area

  • Data Science

Business Impact

  • 20-25% improvement in predictive accuracy


Challenges

  • Limited data source utilization
  • Inadequate predictive model


Technologies

  • Python


Allergic rhinitis (AR) affects almost one in every ten Americans. Many of them rely on mobile applications to make informed decisions about potential exposure to outdoor allergens. Such applications deliver value to the customer by aiding their assessment of exposure risk and thus reducing their overall symptom burden, while also serving as an avenue for manufacturers of anti-allergic preparations to learn more about their target customer demographic, in particular their experience with AR.

Our client, a Fortune 50 healthcare conglomerate, offers an application that predicts symptom severity, and wanted to improve its predictive model.

The application relied on limited data and an unmaintained model based on a simple regression algorithm to make predictions. As a result, there was considerable room for improvement in its predictive accuracy.

There were three key challenges involved in enabling the application to deliver more value for users. First, the metric for measuring predictive accuracy was inadequate: it reflected neither class imbalance (the severity classes in this data are not evenly distributed) nor the ordering of symptom severity. Second, improving predictive accuracy would require adding and preparing new input variables. Third, the client had no documentation or in-house expertise for the legacy model, which meant that Starschema would first have to reverse-engineer the model, then improve it with more effective, cutting-edge predictive algorithms.

The Starschema team identified the most appropriate metric to measure model performance and replaced the legacy metric with it. The new, custom-adjusted metric reflects both relative class imbalances and the ordering of symptom severity, and it also served as a baseline for evaluating the effectiveness of the developments that followed.
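The case study does not name the metric that was chosen. As an illustration only, the sketch below uses macro-averaged mean absolute error, one common metric that accounts for both properties: the absolute error respects the ordering of severity levels, and averaging per class keeps rare classes from being drowned out. The severity scale and the sample predictions are hypothetical.

```python
from collections import defaultdict


def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error computed per true class, then averaged over classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one observation")
    errors_by_class = defaultdict(list)
    for truth, pred in zip(y_true, y_pred):
        # Distance between levels, so a near miss costs less than a far miss.
        errors_by_class[truth].append(abs(truth - pred))
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    # Every class counts equally, no matter how many samples it has.
    return sum(per_class) / len(per_class)


def accuracy(y_true, y_pred):
    """Plain share of exact matches, shown for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical severity scale: 0 = none, 1 = mild, 2 = moderate, 3 = severe.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 3, 3]
always_none = [0] * 10                      # never predicts the rare class
near_miss = [0, 0, 0, 0, 0, 0, 0, 0, 2, 2]  # one level off on severe days

# Plain accuracy cannot tell the two predictors apart...
acc_always_none = accuracy(y_true, always_none)
acc_near_miss = accuracy(y_true, near_miss)

# ...while the macro-averaged MAE penalizes the predictor that is further off.
mae_always_none = macro_averaged_mae(y_true, always_none)
mae_near_miss = macro_averaged_mae(y_true, near_miss)
```

In this toy data both predictors score the same plain accuracy, but the macro-averaged error separates them, which is the behavior a metric for imbalanced, ordered classes needs.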

Updating the data sources entailed two main tasks. The first involved making better use of existing sources. The team found that, by combining strongly correlated symptoms into single variables, they could reduce statistical noise and decrease the model’s complexity, improving its overall robustness. The second was expanding the range of inputs – which had previously comprised only key symptoms and pollen counts – with weather and patient treatment information to give the application a more comprehensive foundation for predictions.
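One way to combine strongly correlated symptoms can be sketched as follows. This is a minimal illustration, not the team's actual procedure: the symptom names, the sample scores, the 0.9 threshold, and the choice to average merged columns are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def merge_correlated(columns, threshold=0.9):
    """Greedily group columns that correlate with a group's first member,
    then replace each group by the row-wise average of its members."""
    groups = []
    for name in columns:
        for group in groups:
            if pearson(columns[group[0]], columns[name]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found, start a new one
    merged = {}
    for group in groups:
        rows = zip(*(columns[name] for name in group))
        merged["+".join(group)] = [sum(row) / len(row) for row in rows]
    return merged


# Hypothetical daily symptom scores reported by one user.
symptoms = {
    "sneezing": [0, 1, 2, 3, 2, 1],
    "runny_nose": [0, 1, 2, 3, 3, 1],
    "itchy_eyes": [3, 0, 1, 0, 2, 0],
}
merged = merge_correlated(symptoms)
```

Here the two nasal symptoms collapse into one averaged variable while the unrelated eye symptom stays separate, so the model sees two inputs instead of three.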

In addition, identifying symptoms that typically co-occur made it possible to determine whether the symptoms a user is experiencing follow a typically allergic, atypical or mixed pattern. This way, the application can provide higher-quality feedback while requiring less manual input from the user.

The most important step involved feature engineering: deriving secondary variables from the existing feature variables. For the client’s application, this increased predictive accuracy because it gave the model temporal information, enabling it to consider trends over time in addition to point-in-time values. Dimensionality reduction and feature selection further improved predictive accuracy by simplifying the model.
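Deriving trend features from point data can be sketched like this. The feature names, the three-day window, and the pollen counts are hypothetical; the case study does not list the features that were actually engineered.

```python
def trend_features(series, window=3):
    """Turn a daily series into one feature row per day with enough history.

    Each row holds the latest value (point data) plus two derived trend
    features: the rolling mean and the change across the window.
    """
    if window < 2:
        raise ValueError("window must cover at least two days")
    rows = []
    for i in range(window - 1, len(series)):
        recent = series[i - window + 1 : i + 1]
        rows.append(
            {
                "value": recent[-1],
                "rolling_mean": sum(recent) / window,
                "change": recent[-1] - recent[0],
            }
        )
    return rows


# Hypothetical daily pollen counts.
pollen = [10, 20, 60, 50, 30]
rows = trend_features(pollen, window=3)
```

With these features the model can distinguish a count of 50 on a rising trend from the same count on a falling one, which a single point value cannot express.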

The Starschema team then rebuilt the underlying machine learning model from scratch, basing it on a Gradient Boosting Regressor algorithm, and changed the programming language from Java to Python.
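The principle behind gradient boosting for regression can be shown with a small pure-Python sketch: each round fits a weak learner (here a one-split decision stump) to the current residuals and adds a damped version of it to the ensemble. This illustrates the technique only and is not the client's model, which would more plausibly be built with a library implementation such as scikit-learn's `GradientBoostingRegressor`. The single pollen feature, the severity values, and the hyperparameters are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    values = sorted(set(xs))
    if len(values) < 2:
        mean = sum(residuals) / len(residuals)
        return values[0], mean, mean  # nothing to split on
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared error: each round fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the damped contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical training data: pollen count -> reported symptom severity.
xs = [5, 10, 20, 40, 80, 120]
ys = [0.2, 0.4, 1.0, 1.8, 2.6, 2.9]

model = fit_gradient_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

On this toy data the boosted ensemble fits the training points far better than the mean-only baseline; a real evaluation would of course use held-out data and the project's own metric.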

Starschema delivered the solution in two months. The new data sources and ML model resulted in a consistent 20-25% uplift in the application’s predictive accuracy. The application now makes significantly more accurate predictions about symptom severity for the next three days based on pollen and weather data, as well as symptom and treatment data from the user.

The project also paved the way for future developments that will increase the application’s value. Users will benefit from further improvements in predictive accuracy thanks to the addition of air quality data, while the introduction of sales data will make it possible to index the start of allergy season and help the client understand how it affects sales of allergy symptom relief products.
