The Starschema COVID-19 Data Set collates a range of important resources for assessing the impact and severity of, and the response to, the COVID-19 pandemic. The data is available free of charge, both as flat files in AWS S3 and as a shareable data source through the Snowflake Data Marketplace. Detailed information about the content of the data set is available in the project’s GitHub repository. The METADATA table in the Snowflake Data Marketplace share contains detailed column-level information about the tables that make up the share, as well as comments from data originators that help users make sense of the data.
As a wide range of organizations, from NGOs and governments to public health authorities and enterprises, struggle to adapt to this new world, the Starschema COVID-19 Data Set can provide accurate, up-to-date intelligence to support real-time, data-driven decision-making. This single source of truth is “analytics-ready” and integrates with other data sources, so you can analyze the progression of the COVID-19 pandemic over time, in any context. Because the data is aligned along widely used identifiers (e.g. ISO 3166 geographies), data from disparate sources can be unified easily, and users are spared the work of reconciling sources that often use different identifiers or definitions.
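To illustrate why shared identifiers make unification easy, here is a minimal sketch of joining pandemic data to an organization's own data on an ISO 3166-1 country code. The field names (`iso3166_1`, `confirmed`, `employees`), the helper `join_on_iso`, and all values are hypothetical placeholders chosen for illustration; they are not taken from the data set's actual schema.

```python
# Illustrative placeholder values, not real figures from the data set.
# Field names ("iso3166_1", "confirmed", "employees") are hypothetical.
covid_rows = [
    {"iso3166_1": "HU", "confirmed": 100},
    {"iso3166_1": "DE", "confirmed": 250},
    {"iso3166_1": "FR", "confirmed": 300},
]
office_rows = [
    {"iso3166_1": "HU", "employees": 40},
    {"iso3166_1": "DE", "employees": 120},
    {"iso3166_1": "US", "employees": 75},
]


def join_on_iso(left, right, key="iso3166_1"):
    """Left-join two lists of dicts on a shared ISO 3166 code."""
    # Index the right-hand rows by their code for constant-time lookup.
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        # Keep every left row; fields missing on the right stay absent.
        joined.append({**match, **row})
    return joined


merged = join_on_iso(office_rows, covid_rows)
for row in merged:
    print(row["iso3166_1"], row["employees"], row.get("confirmed"))
```

Because both sides use the same code, no mapping table or name matching is needed. Rows with no counterpart (here the hypothetical `US` office) are kept and simply lack the `confirmed` field, which makes coverage gaps easy to spot.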
While we included the most reliable and trustworthy sources, not all data is created equal. This is particularly true of data reported by countries and states that have unequal access to resources, assign different definitions to the same metric, and have governments that may influence reporting to suit their political agendas. This means it takes work to analyze the data and to build models that you can have confidence in and that provide real analytical value for making better decisions.
New relevant and reliable data sources will be added to this data set as they become available, and the data set will be continually updated and revised. The pandemic is a moving target and will remain so for the foreseeable future. A good model can help identify trends, alert us when things are changing, show us how fast they are changing, and do so at various levels of detail.
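As a rough sketch of what "how fast things are changing" can mean in practice, the snippet below smooths a daily series with a trailing moving average and then computes the relative change between consecutive smoothed values. The function names, the 7-day window, and the input numbers are assumptions made for illustration only; this is a simple descriptive calculation, not a full epidemiological model.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def pct_change(values):
    """Relative change between consecutive values (None if previous is 0)."""
    return [
        (curr - prev) / prev if prev else None
        for prev, curr in zip(values, values[1:])
    ]


# Placeholder daily new-case counts, purely illustrative.
daily_new = [10, 12, 15, 11, 18, 20, 24, 30, 28, 35]

smoothed = moving_average(daily_new, window=7)  # dampens day-to-day noise
growth = pct_change(smoothed)                   # positive values = rising trend

print(smoothed)
print(growth)
```

Smoothing before measuring change reduces the effect of reporting artifacts such as weekend dips, which would otherwise show up as spurious swings in the growth figures.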
These models, and the visualizations and dashboards they power, can be particularly helpful in evaluating:
- supply chain dynamics
- demand planning
- HR and location vulnerabilities
- financial impacts
By integrating the Starschema COVID-19 Data Set with other related data, both internal and external, executives and managers can understand the impacts at a deeper level and make business-critical, data-driven decisions based on answers to newly relevant questions:
- What geographic areas are affected, how badly, when will they begin to normalize, and how quickly?
- What areas are at risk?
- What areas are threatened by a possible recurrence?
- How are government policies affecting each area and what effect will potential future policy changes have on the business?
- Who in your organization is at risk and how does this risk affect the capabilities of the organization?
- Who can be reassigned to ensure the most important functions and projects aren’t impacted?
- How is working from home affecting projects, for better and for worse?
- Which projects are at risk now and which are likely to be in the future?
- How does working from home impact operational costs?
- What supply chains, distribution centers, and customer channels are at risk now, and which are likely to become at risk in the future?
Through the work of collating, curating, and unifying the data, we developed a nuanced understanding of the data, its biases, and how best to work with it to gain meaningful insights. Our solution teams are led by a senior data scientist with long-standing expertise in clinical epidemiology and the analysis of viral outbreaks.