
Data Marketplace Architecture

Case Study

Background

One of the world’s largest investment, advisory, and risk management firms provides portfolio management software that helps customers visualize and act on investment data. To build on the software’s success, the company set out to create a marketplace for financial datasets that uses the technology behind the software. During development, however, the company was unsatisfied with how long it took to onboard new datasets and make them usable on the platform: the underlying issue caused month-long delays before a dataset could be shared on the data marketplace. To ensure customer satisfaction at launch, the company wanted to improve the platform’s performance and engaged Starschema to design and implement a solution.

Practice Area

  • Data Engineering

Business Impact

  • Faster insights via reduced data load times
  • Improved customer satisfaction through better product performance

Challenges

  • Moving from legacy architecture and batch processing to a modern stream-based architecture
  • Ensuring the system could handle the large increase in data volume that comes with a stream-based architecture
  • Designing and deploying a metadata-driven ingestion system so that data stewards need less technical knowledge to onboard datasets
  • Building data pipelines on highly volatile infrastructure
  • Accommodating a wide range of data types
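The idea behind metadata-driven ingestion is that a data steward describes a dataset declaratively, and generic code turns that description into the loading logic, so no hand-written pipeline is needed per dataset. The case study does not describe Starschema's actual implementation, so the Python below is only a minimal sketch under stated assumptions: `DatasetSpec`, `build_ingestion_sql`, `FILE_FORMATS`, and the stage and table names are all hypothetical. It emits Snowflake `CREATE TABLE` and `COPY INTO` statements, and for streaming datasets wraps the load in a Snowpipe `CREATE PIPE ... AUTO_INGEST = TRUE`, which is one possible continuous-loading mechanism, not necessarily the one used here.

```python
from dataclasses import dataclass, field

# Hypothetical lookup from the file formats a steward may declare to
# Snowflake FILE_FORMAT options. Supporting a new data type means adding a row.
FILE_FORMATS = {
    "csv": "TYPE = 'CSV' SKIP_HEADER = 1",
    "json": "TYPE = 'JSON'",
    "parquet": "TYPE = 'PARQUET'",
}


@dataclass
class DatasetSpec:
    """Metadata a data steward fills in to onboard one dataset (hypothetical schema)."""
    name: str
    target_schema: str
    stage_path: str                      # Snowflake stage location, e.g. "@vendor_stage/prices/"
    file_format: str                     # key into FILE_FORMATS
    columns: dict[str, str] = field(default_factory=dict)  # column name -> Snowflake type
    streaming: bool = False              # True: load continuously via a pipe instead of one-off COPY


def build_ingestion_sql(spec: DatasetSpec) -> list[str]:
    """Translate a DatasetSpec into the Snowflake SQL statements that load it."""
    if spec.file_format not in FILE_FORMATS:
        raise ValueError(f"unsupported file format: {spec.file_format}")
    if not spec.columns:
        raise ValueError("at least one column must be declared")

    table = f"{spec.target_schema}.{spec.name}"
    column_defs = ", ".join(f"{col} {typ}" for col, typ in spec.columns.items())
    create_table = f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})"

    copy = (
        f"COPY INTO {table} FROM {spec.stage_path} "
        f"FILE_FORMAT = ({FILE_FORMATS[spec.file_format]})"
    )
    # Semi-structured formats are mapped onto the declared columns by name.
    if spec.file_format in ("json", "parquet"):
        copy += " MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"

    if spec.streaming:
        # AUTO_INGEST requires an external stage with cloud event notifications configured.
        pipe = f"CREATE PIPE IF NOT EXISTS {table}_pipe AUTO_INGEST = TRUE AS {copy}"
        return [create_table, pipe]
    return [create_table, copy]


if __name__ == "__main__":
    spec = DatasetSpec(
        name="daily_prices",
        target_schema="raw",
        stage_path="@vendor_stage/prices/",
        file_format="csv",
        columns={"trade_date": "DATE", "ticker": "VARCHAR", "close": "NUMBER(18,4)"},
    )
    for stmt in build_ingestion_sql(spec):
        print(stmt + ";")
```

Keeping format handling in a lookup table is what lowers the technical bar for stewards: onboarding a dataset means filling in a spec, and accommodating a new data type is a one-line change rather than a new pipeline.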

Technologies

  • Snowflake
  • dbt
  • Python
  • Apache Airflow
  • DataHub
  • Kubernetes

Challenge

Solution

Outcome

Ask the Expert

Anjan Banerjee

Field CTO

Anjan works with cloud technologies and data warehouses. He has extensive experience building data orchestration pipelines, designing cloud-native solutions, and solving business-critical problems for many multinational companies. He uses infrastructure as code to increase the speed, consistency, and accuracy of cloud deployments.