Data Engineering

Enabling Scalable Machine Learning in Snowflake using SQL, Python and Bodo

Most database engines have begun to supplement their SQL capabilities with Python query support, so that their more data-science-inclined users can embed advanced statistics or machine learning code into query pipelines or data visualization tools like Tableau. Snowflake has been bucking that trend, until now.

Testing SQL Pool Performance in Azure Synapse Analytics

The feature set of Synapse Analytics is considerably richer than that of a “plain” old Azure SQL Database, but what benefits do we get from the smallest dedicated pool?

Introducing the Starschema Worldwide Address Data Set in Snowflake

We’re happy to announce that, as part of our ongoing effort to democratize data, we’ve taken over as the provider of The Worldwide Address Data Set, a free and open global address collection on the Snowflake Data Marketplace.

Vaccine Tracking Added to The Starschema COVID-19 Epidemiological Dataset

In our effort to help organizations assess contingency plans and make informed, data-driven decisions in real time as they respond to the global health emergency, we’ve added vaccine tracking from the University of Oxford to the Starschema COVID-19 Epidemiological Dataset.

StarSnow: HTTP Client for Snowflake SQL

Snowflake is an extremely SQL-friendly database: you can ingest, transform, and access your structured and semi-structured data directly from your SQL code. However, as a cloud-only data platform, it has some fundamental restrictions...

Monitoring with JMX: How to Integrate Tableau Server with InfluxDB

Getting JMX metrics from Tableau Server with jmxtrans, storing them in InfluxDB and accessing them in Grafana

Monitor your infrastructure with InfluxDB and Grafana on Kubernetes

Monitoring your infrastructure and applications is a must-have if you play your game seriously. Overseeing your entire landscape, running servers, cloud spends, VMs, containers, and the applications…

Getting through a challenging age with data

Following the implosion of the US housing bubble, when the mid-2000s Great Recession began to hit markets worldwide, many enterprises found themselves in dire straits. In the ensuing crisis…

Deploying TabPy in Enterprise: Scaling and Hardening in Kubernetes

There are so many tutorials out in the wild about how to take an application, containerize it and run it in your enterprise’s on-prem/public cloud securely, but hey, this will be yet another one…

Mining your Tableau logs with Apache Drill

Tableau Server and Desktop log each and every action you perform. The log data is a gold mine for people eager to understand what is happening under the hood and why. However, there is no easy way…

Text preprocessing in different languages for Natural Language Processing in Python

In the first part, I outlined text pre-processing principles based on a framework from an academic article. The underlying goal of all these techniques was to reduce text data dimensionality but…

Scaling out Tableau Extracts — Building a distributed Tableau Hyper Cluster

Tableau Hyper Database (“Extract”) is a great engine; it’s one of the reasons people are obsessed with Tableau analytics. However, being a single node database server, it has its limits (performance…