Building a Data Warehouse. Best practice and advanced techniques | by Mike Shakhomirov | Feb, 2024 – Towards Data Science

Best practice and advanced techniques for beginners

In this story, I would like to talk about data warehouse design and how we organise the process. Data modelling is an essential part of data engineering: it defines the database structure, the schemas we use and the data materialisation strategies for analytics. Designed in the right way, it helps ensure our data warehouse runs efficiently, meeting all business requirements and cost-optimisation targets. We will touch on some well-known best practices in data warehouse design, using the dbt tool as an example, and take a closer look at how to organise the build process, test our datasets and use advanced techniques with macros for better workflow integration and deployment.

Let's say we have a data warehouse and lots of SQL that deals with the data we have in it.

In my case it is Snowflake, a great tool and one of the most popular solutions on the market right now, definitely among the top three for this purpose.

So how do we structure our data warehouse project? Consider the starter project folder structure below. This is what we get after we run the dbt init command.
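For reference, a freshly initialised dbt project looks roughly like this (the table_a and table_b model file names below come from this story's example, not from dbt's default scaffold):

```
my_dwh_project/
├── dbt_project.yml        # project-level configuration
├── models/
│   └── example/
│       ├── table_a.sql    # one SQL file per warehouse object
│       ├── table_b.sql
│       └── schema.yml     # tests and documentation for these models
├── macros/
├── seeds/
├── snapshots/
├── tests/
└── analyses/
```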

At the moment we can see only one model folder called example with table_a and table_b objects. These can be any data warehouse objects that relate to each other in a certain way, i.e. a view, a table, a dynamic table, etc.
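As a minimal sketch, each such object is just a SQL file with a config block that tells dbt how to materialise it. The source and column names below are assumptions for illustration, not from the original project:

```sql
-- models/example/table_a.sql
-- Materialised as a view here; 'table' or another strategy works the same way.
{{ config(materialized='view') }}

select
    id,
    payload,
    created_at
from {{ source('raw', 'events') }}  -- hypothetical raw source defined in a sources YAML
```

Because the materialisation lives in config rather than in the SQL itself, switching an object from a view to a table later is a one-line change.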

When we start building our data warehouse, the number of these objects will inevitably grow, and it is best practice to keep them organised.

A simple way of doing this is to split the model folder structure into base models (basic row-level transformations) and analytics models. In the analytics subfolder, we would typically have data deeply enriched and
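This base/analytics split can also be reflected in dbt_project.yml, so each subfolder gets a sensible default materialisation. The project and schema names below are assumptions for illustration:

```yaml
# dbt_project.yml (fragment)
models:
  my_dwh_project:
    base:
      +schema: base
      +materialized: view   # light row-level transformations, cheap to keep as views
    analytics:
      +schema: analytics
      +materialized: table  # enriched, business-ready datasets
```

Folder-level defaults like these keep individual model files free of repetitive config, while any single model can still override them with its own config block.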
