Data Science

The Knowledge Dividend of LLMs: A Pragmatic Perspective

Learn what LLMs like GPT-4 know, how they know it and what this means for users with this pragmatic look at knowledge in large language models.

A No-Nonsense Approach to Large Language Models for the Enterprise pt. 3

See the results of experiments with GPT and other LLMs in enterprise use cases to set realistic expectations and get the most from the technology.

Avoid the Pitfalls of Causality Analysis

Learn the basics of the most popular root cause analysis methods and find out how to apply them to uncover KPI drivers and improve decision-making.

A No-Nonsense Approach to Large Language Models for the Enterprise pt. 2

Learn how OpenAI and its open-source competitors fare in terms of performance, price and security in an enterprise context—based on real-world experiments.

Profitable Location Data Monetization — 3 Lessons from a Telco Company

Get best practices for turning your organizational data into a lucrative revenue stream, based on actual project experience at a major telco company.

A No-Nonsense Approach to Large Language Models for the Enterprise pt. 1

Ignore the hype around large language models like ChatGPT and find out from data scientists where the real opportunities lie for the enterprise.

Overcoming Data Science Challenges in Biosensor Analytics

AI and biosensor analytics will change healthcare as we know it, if companies can deliver the right solutions. See how this is playing out.

Consumer Goods R&D with Automated Product Stability Forecasting

See how a Fortune 50 company used an ML-driven solution to digitalize testing and improve the speed and cost-effectiveness of research and development.

Asemantic Induction of Hallucinations in Large Language Models

See how you can get GPT-4 to hallucinate and what it tells us about how GPT and similar language models arrive at their outputs.

Can We Put a Lie Detector on ChatGPT?

Find out how a ChatGPT-like generative AI solution could be equipped to make it a reliable source of information for business use cases and beyond.

Data for the Next Pandemic

Data has made all the difference in this pandemic. Learn how we need to prepare for the next one using modern methods of infectious disease modeling.

Five Healthcare AI/ML Trends to Watch for in 2023

Learn about the technologies and approaches transforming healthcare, including AI in patient safety, decentralized clinical trials and more.

Data Science Trends to Rule 2022

Stay ahead of data science trends and learn how to approach the most promising technologies to drive innovation and gain competitive advantage.

8 Best Practices for Working with Your Data Science Vendor — from Data Scientists

Get practical advice from Starschema data scientists to optimize your workflows for better productivity and results from your next project.

The COVID Tracking Project is Shutting Down in a Week. What Next?

The COVID Tracking Project has been one of the most successful citizen-driven data collection projects in history. Driven by The Atlantic and supported by an army of volunteers, it has collected the nuggets of information about testing and case counts, often beating federal and state authorities to the race...

Fighting the COVID-19 pandemic with data and context
The COVID-19 outbreak is in many ways an outlier. It emerged with unusual speed, spread rapidly throughout the globe and has elicited a public health response that is unprecedented in recent…
COVID-19 and the first war of data science
In the subtitle of his remarkable history about the race for the nuclear bomb, science writer and historian of science Jim Baggott referred to World War II as the “first war of physics”. Today, the…
Arguing with Edward Snowden
I recently read Edward Snowden’s Permanent Record during my holiday. I think it is a great book that I highly recommend to basically anyone; however, it is particularly interesting for IT folks…
Text preprocessing in different languages for Natural Language Processing in Python
In the first part, I outlined text pre-processing principles based on a framework from an academic article. The underlying goal of all these techniques was to reduce text data dimensionality but…
Predictive maintenance helped win a war. Now, it can help you outpace the competition.
The year is 1943. Britain’s survival still hangs by a thin, precarious thread, despite America joining the war effort. Just a few months ago, in February 1942, two German battleships, the…
A comprehensive guide to text pre-processing with Python
This is Part 1 of a pair of tutorials on text pre-processing in Python. In this first part, I’ll lay out the theoretical foundations. In the second part, I’ll demonstrate the steps described below…
Answering the big questions (this time, in chemistry)
Why do some molecules have undesirable biological effects, but others don’t? A model can tell us which do and which don’t, but model introspection can go one step further: it can tell us why.
Create a map of Budapest districts colored by income using folium in Python

PART 2 – Ever wondered how to draw a map of less common geographical areas? And color them based on some data? This pair of tutorials shows how to build this from scratch! First, you need to construct the…

JIT fast! Supercharge tensor processing in Python with JIT compilation
At Starschema, we’re constantly looking for ways to speed up some of the computationally intensive tasks we’re dealing with. Since a good amount of our work involves image processing, this means…
Digging deeper into ensemble learning

PART 2 – Have you ever wondered how combining weak predictors can yield a strong predictor? Ensemble Learning is the answer! This is the second of a pair of articles in which I will explore ensemble learning…

Combine your machine learning models for better out-of-sample accuracy

PART 1 – Have you ever wondered how combining weak predictors can yield a strong predictor? Ensemble Learning is the answer! This is the first of a pair of articles in which I will explore ensemble learning…

Quantifying hard retinal exudates using Growing Neural Gas algorithms
Diabetic retinopathy is a major cause of blindness in the developed world. Read how an uncommon neural network algorithm can be used to quantify the extent of disease.
Self-Organising Feature Maps for fun and profit
This is Part 2 of a three-part series on competitive neural networks. You can find Part 1, an introduction to competitive neural networks, here. Part 3, which looks at a different competitive…
Funderstanding competitive neural networks
Funderstanding is a little term I came up with a few years ago for fun ways of understanding complex concepts. The typical university way of teaching something is by laying the theoretical…