Background
Allergic rhinitis (AR) affects almost one in every ten Americans. Many of them rely on mobile applications to make informed decisions about potential exposure to outdoor allergens. Such applications deliver value to the customer by aiding their assessment of exposure risk and thus reducing their overall symptom burden, while also serving as an avenue for manufacturers of anti-allergic preparations to learn more about their target customer demographic, in particular their experience with AR.
Our client, a Fortune 50 healthcare conglomerate, offers an application to provide predictions about symptom severity, and wanted to improve its predictive model.
There were three key challenges involved in enabling the application to deliver more value for users. First, the metric for measuring predictive accuracy was inadequate: it reflected neither the class imbalance in the data (the severity classes were not evenly distributed in this case) nor the ordering of symptom severity. Second, improving predictive accuracy would require adding and preparing new input variables. And third, there was no documentation or in-house expertise available for the legacy model at the client, which meant that Starschema would first have to reverse-engineer the model, then improve it by using more effective, cutting-edge predictive algorithms.
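To illustrate the first challenge, here is a minimal sketch comparing plain accuracy with a macro-averaged mean absolute error. The text does not say which metric the team adopted, so the choice of metric, the 0 to 3 severity scale and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of exact matches; ignores class balance and ordering."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error computed per true class, then averaged over classes."""
    errors_by_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors_by_class[t].append(abs(p - t))
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class) / len(per_class)


# Hypothetical data: 8 days with severity 0 (none), 2 days with severity 3 (severe).
y_true = [0] * 8 + [3] * 2
always_none = [0] * 10          # never predicts a severe day
near_miss = [0] * 8 + [2] * 2   # predicts "moderate" on the severe days

print(accuracy(y_true, always_none), accuracy(y_true, near_miss))
print(macro_averaged_mae(y_true, always_none), macro_averaged_mae(y_true, near_miss))
```

Both sets of predictions score the same accuracy, yet the macro-averaged error separates them: it weights the rare severe class equally with the common one, and it penalizes a prediction more the further it lands from the true severity level.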
Updating the data sources entailed two main tasks. The first involved making better use of existing sources. The team found that, by combining strongly correlated symptoms, they could reduce statistical noise and decrease the model’s complexity, improving its overall robustness. The second involved expanding the range of inputs, which had previously comprised only key symptoms and pollen counts, with weather and patient treatment information to give the application a more comprehensive foundation for predictions.
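The idea of combining strongly correlated symptoms can be sketched as follows. This is an illustration only: the symptom names, the sample scores, the 0.9 threshold and the choice to merge by averaging are assumptions, not details taken from the project.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def merge_correlated(symptoms, threshold=0.9):
    """Greedy grouping: a symptom joins the first group whose first member
    it correlates with at or above the threshold; each group is averaged."""
    groups = []
    for name, series in symptoms.items():
        for group in groups:
            if pearson(symptoms[group[0]], series) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return {
        "+".join(group): [mean(day) for day in zip(*(symptoms[n] for n in group))]
        for group in groups
    }


# Hypothetical daily symptom scores on a 0 to 3 scale.
symptoms = {
    "sneezing": [0, 1, 2, 3, 2, 1],
    "runny_nose": [0, 1, 2, 3, 3, 1],
    "itchy_eyes": [2, 0, 1, 0, 2, 1],
}
merged = merge_correlated(symptoms)
print(list(merged))
```

Here the two nasal symptoms collapse into one composite input while the uncorrelated one stays separate, so the model sees two inputs instead of three.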
In addition, identifying symptoms that typically co-occur made it possible to determine whether the symptoms that the user is experiencing fall into a typically allergic, atypical or mixed regime. This way, the application can provide higher-quality feedback while requiring less manual input from the user.
The most important step involved feature engineering, in which secondary variables are derived from existing feature variables. For the purposes of the client’s application, this meant an increase in predictive accuracy, as it enabled the model to consider temporal information, that is, trends over time in addition to single-point readings. Dimensionality reduction and feature selection helped further improve predictive accuracy by simplifying the model.
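As a rough illustration of deriving trend features from point data, the sketch below adds a rolling mean and a day-over-day change to a series of readings. The feature names, the three-day window and the pollen counts are hypothetical; the text does not list the features the team actually engineered.

```python
def add_trend_features(values, window=3):
    """Return one feature row per reading: the raw value, the mean over the
    last `window` readings (fewer at the start), and the change since the
    previous reading (0.0 for the first one)."""
    rows = []
    for i, value in enumerate(values):
        recent = values[max(0, i - window + 1): i + 1]
        rows.append({
            "value": value,
            "rolling_mean": sum(recent) / len(recent),
            "delta_1d": value - values[i - 1] if i > 0 else 0.0,
        })
    return rows


# Hypothetical daily pollen counts.
pollen = [10, 20, 60, 90]
rows = add_trend_features(pollen)
for row in rows:
    print(row)
```

A model given only `value` sees each day in isolation; with `rolling_mean` and `delta_1d` it can also tell a rising count from a falling one.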
The Starschema team then rebuilt, from scratch, the underlying machine learning model based on a Gradient Boosting Regressor algorithm and changed the programming language from Java to Python.
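The text names the algorithm but not the library or settings used. To show the general idea behind gradient boosting for regression, here is a toy, single-feature sketch in plain Python that repeatedly fits a one-split tree (a stump) to the current residuals. It is not the client's model; the data, the number of rounds and the learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        preds = [p + learning_rate * (left if x <= threshold else right)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical training data: pollen count -> reported symptom severity.
xs = [10, 20, 40, 80, 120, 160]
ys = [0.2, 0.5, 1.1, 1.9, 2.6, 2.9]
model = fit_gradient_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each round corrects part of the error left by the previous rounds, which is why the boosted error on the training data ends up below that of the constant baseline. A production model would use a maintained library implementation, many features and held-out data for evaluation.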
The project also paved the way for future developments that will increase the application’s value. Users will benefit from further improvement in predictive accuracy thanks to the addition of air quality data, while the introduction of sales data will enable the indexing of the start of allergy season to help the client find out how it impacts the sales of allergy symptom relief products.