Improving Predictive Accuracy

Practice Area

  • Data Science

Business Impact

  • 20-25% improvement in predictive accuracy


Challenges

  • Limited data source utilization
  • Inadequate predictive model


Technologies

  • Python


Allergic rhinitis (AR) affects almost one in every ten Americans. Many of them rely on mobile applications to make informed decisions about potential exposure to outdoor allergens. These applications help users assess their exposure risk and thus reduce their overall symptom burden. They also give manufacturers of anti-allergic preparations a way to learn more about their target customers, in particular their experience with AR.

Our client, a Fortune 50 healthcare conglomerate, offers an application that predicts symptom severity and wanted to improve its predictive model.



The application relied on limited data and an unmaintained model based on a simple regression algorithm to make predictions. As a result, there was considerable room for improvement in its predictive accuracy.

There were three key challenges involved in enabling the application to deliver more value for users. First, the metric for measuring predictive accuracy was inadequate: it reflected neither the class imbalance in the data (the symptom severity classes were not evenly distributed) nor the ordering of symptom severity. Second, improving predictive accuracy would require adding and preparing new input variables. Third, the client had no documentation or in-house expertise for the legacy model, which meant that Starschema would first have to reverse-engineer the model, then improve it using more effective, cutting-edge predictive algorithms.



The Starschema team identified the most appropriate metric to measure model performance and replaced the legacy metric with it. The new, custom-adjusted metric reflects both relative class imbalances and the ordering of symptom severity, and it also served as a baseline for evaluating the effectiveness of the developments that followed.
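The case study does not name the metric that was chosen. As a purely illustrative sketch, one metric with both properties is the macro-averaged mean absolute error: the absolute error respects the ordering of severity levels (predicting "none" for a "severe" day costs more than predicting "moderate"), and averaging per class keeps rare severe days from being drowned out by common mild ones. The severity scale, the sample labels and the function name below are assumptions made for the example.

```python
from collections import defaultdict


def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error per true class, averaged with equal weight per class."""
    errors_by_class = defaultdict(list)
    for actual, predicted in zip(y_true, y_pred):
        errors_by_class[actual].append(abs(actual - predicted))
    per_class_mae = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class_mae) / len(per_class_mae)


# Hypothetical ordinal labels: 0 = none, 1 = mild, 2 = moderate, 3 = severe.
# Most days are symptom-free, so the classes are imbalanced.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
always_zero = [0] * len(y_true)  # a model that always predicts "none"

plain_accuracy = sum(a == p for a, p in zip(y_true, always_zero)) / len(y_true)
score = macro_averaged_mae(y_true, always_zero)

print(plain_accuracy)  # looks acceptable despite missing every symptomatic day
print(score)           # penalizes the missed classes, weighted by how far off they are
```

In this toy data, plain accuracy rewards the trivial "always none" model for the imbalance, while the class-averaged ordinal error exposes it. Lower values are better for this metric.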

Updating the data sources entailed two main tasks. The first involved making better use of existing sources. The team found that, by combining strongly correlated symptoms into a single input, they could reduce statistical noise and decrease the model’s complexity to improve its overall robustness. The second involved expanding the range of inputs, which had previously comprised only key symptoms and pollen counts, with weather and patient treatment information to give the application a more comprehensive foundation for predictions.

In addition, identifying symptoms that typically co-occur made it possible to classify the symptoms a user is experiencing as typically allergic, atypical or mixed. This way, the application can provide higher-quality feedback while requiring less manual input from the user.
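How the symptoms were actually combined is not described. One simple reading, sketched here under that assumption, is to compute the Pearson correlation between two symptom series and replace them with their average when the correlation exceeds a threshold. The symptom names, the scores and the 0.8 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def merge_if_correlated(a, b, threshold=0.8):
    """Return one averaged series if a and b are strongly correlated, else None."""
    if pearson(a, b) >= threshold:
        return [(x + y) / 2 for x, y in zip(a, b)]
    return None


# Hypothetical daily symptom scores on a 0-3 scale.
sneezing = [0, 1, 2, 3, 2, 1, 0]
runny_nose = [0, 1, 3, 3, 2, 1, 0]
itchy_eyes = [2, 0, 1, 0, 3, 0, 2]

nasal = merge_if_correlated(sneezing, runny_nose)      # strongly correlated: merged
unrelated = merge_if_correlated(sneezing, itchy_eyes)  # not correlated: left apart
```

Merging two near-duplicate inputs into one gives the model fewer, less noisy variables to fit, which is the robustness gain the paragraph above refers to.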

The most important step was feature engineering: deriving secondary variables from the existing input variables. For the client’s application, this increased predictive accuracy because it enabled the model to consider trends over time in addition to single-point readings. Dimensionality reduction and feature selection further improved predictive accuracy by simplifying the model.
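The specific features the team derived are not listed. As a minimal sketch of turning point data into trend information, the function below takes a daily series and adds a rolling mean and a day-over-day change to each reading. The pollen values, the window length and the feature names are assumptions for illustration.

```python
def trend_features(series, window=3):
    """Build one feature row per day that has a full window of history."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a change")
    rows = []
    for i in range(window - 1, len(series)):
        recent = series[i - window + 1 : i + 1]
        rows.append({
            "value": series[i],                      # the original point reading
            "rolling_mean": sum(recent) / window,    # smoothed recent level
            "change": series[i] - series[i - 1],     # day-over-day trend
        })
    return rows


# Hypothetical daily pollen counts.
pollen = [10, 20, 60, 90, 30]
features = trend_features(pollen)

for row in features:
    print(row)
```

With features like these, two days with the same pollen count can be told apart by whether the count is rising or falling, which a model fed only point readings cannot do.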

The Starschema team then rebuilt the underlying machine learning model from scratch, based on a Gradient Boosting Regressor algorithm, and migrated the implementation from Java to Python.
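The principle behind gradient boosting for regression can be sketched in a few lines: start from the mean of the target, then repeatedly fit a small tree to the current residuals and add a damped version of its prediction. The code below is a teaching sketch using one-split trees on a single feature, not the team's model or any library's implementation. The data, the number of rounds and the learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree on a 1-D feature that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals, the negative gradient of squared loss."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the damped contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Hypothetical training data: pollen count versus symptom severity.
pollen = [10, 20, 30, 40, 50, 60, 70, 80]
severity = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

model = fit_gradient_boosting(pollen, severity)
mean_severity = sum(severity) / len(severity)
baseline_mse = sum((y - mean_severity) ** 2 for y in severity) / len(severity)
boosted_mse = sum(
    (y - predict(model, x)) ** 2 for x, y in zip(pollen, severity)
) / len(severity)
```

In practice a library implementation with multi-feature trees, such as the `GradientBoostingRegressor` class in scikit-learn, would be used instead of hand-written stumps.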



Starschema delivered the solution in two months. The new data sources and ML model resulted in a consistent 20-25% uplift in the application’s predictive accuracy. The application now makes significantly more accurate predictions about symptom severity for the next three days based on pollen and weather data, as well as symptom and treatment data from the user.

The project also paved the way for future developments that will increase the application’s value. Users will benefit from further improvement in predictive accuracy thanks to the addition of air quality data, while the introduction of sales data will enable the indexing of the start of allergy season to help the client find out how it impacts the sales of allergy symptom relief products.

Ask the Expert

Eszter Windhager-Pokol

Data Science Team Lead

Eszter holds a degree in Applied Mathematics and has years of experience supporting data-driven decision-making as a consultant, with additional experience researching collaborative filtering and developing user behavior analytics products for IT security purposes. Eszter regularly holds data science trainings for business users and teaches Mastering the Process of Data Science at CEU as a visiting faculty instructor.
