SSIS Dimensions, Fact Table, Star Schema and SnowFlake - Theory Part 1

Introduction

Friends, welcome tolearning msbi tutorial website . I'm your host Gurunatha.D a trainer and developer at #Questpond, we love learning and teaching new technologies like MSBI, MVC5, Sharepoint and so on. We do take training all new technologies C#, SQL Server, MSBI, Sharepoint, WCF, WPF, WWF, Design Pattern, UML, Software Architecture and so on.

About us : We are Questpond and we love training and spreading knowledge across India and as well as in foreign countries like USA, UK, AUS, Nepal etc. We conduct training for C#, .NET ,ASP.NET, Sql Server, AngularJS, Design patterns, MSBI, MVC 5, WCF, SharePoint, Entity Framework, Web API, JSon JQuery. If you have any doubts or query feel free to touch base with us on questpond@yahoo.com, questpond@questpond.com.

This session is more about theory ranther than practical here we have concentrated less on practical and more on theory, this was necessary because to understand our next article on SSAS cube in a practical way you need to understand this article theory part. So request you to understand this article carefully.

What is Fact Table.

Fact table also called as measure table. It is located center of a snowflake or star schema and surrounded by dimensions. Fact table mostly have foreign keys to dimension table. It consist of two types of columns one of those facts and another one of foreign keys. Data analysis is usualy done on numbers or numeric values like number of apartments, total no of population, total amount of cars, total sales and so on. To take analysis it is necessary to have numbers and this fact table have those numbers which helps us to do nalysis. In simple words fact table have those numbers on which we tend to do analysis and forecasting.

What are Dimensions.

Dimensions are also part of star schema or snowflake. They are surrounded to fact table to make star like design structure and their table rows are uniquely identified by a single key field. Dimensions are nothing but reference which speaks about fact table. For example number of apartments with a reference of city table : Means only number of apartments will not justify anything or there is no mearning of no of apartments unless and until if we give any reference to it like number of apartments in Mumbai where "Mumbai" data is a part of "city table" or "dimension table". So defining dimensions are context which speaks about fact table or measure table. Dimension and fact table are connected by primary key and foreign key. Both table joining is considered on these keys.

What is Star Schema in Data Warehousing.

Star schema is a design made by fact table at center and surrounded by dimensions table the resulting diagram resembles a star. In this design model every dimension table is directly connected to fact table as shown in below image. Star schema is used mostly in data warehouse where we have million, billion or trillions of data and it is optimized for querying large data sets to support to support OLAP cubes / analytical applications. Here is data is connected by using a foreign key relationship, dimension table has primary key and fact table has foreign key to identify each record properly.

What is Snowflake Schema in Data Warehousing.

Snowflake schema is also formed using fact table at and surrounded by dimensions table the resulting diagram resembles a snowflake. Only difference is here all dimension table are connected to fact table directly, some dimension table may be connected to each other and some may connected to fact table. The only difference between start schema and snowflake schema is that there all dimension table directly connected to fact table and here out of many only few may be connected to fact table and rest other dimension tables are connected to each other. All these joining data is made using foreign key and primary key only. Diagram representation is shown below.

Star schema vs snowflake schema

Here we see the differences between Star schema & snowflake schema and try to see which is better.

Difference 1

1. Snowflake schema uses normalized data model means here you will not find and unnecessary redundancy of data thus helps to eliminate amount of data. If there no redundancy then it is easier to maintain and update.

1. Star Schema uses de-normalized data model means here you will find lots of redundant data. Hence to managing data in this model is less easy.

Conclusion : If you are looking for ease of maintenance with no repetition data then go for SnowFlake model.

Difference 2

2. snowflake data model has unique data with no repetition. To present data with no redundancy it makes lots of joins in data-warehouse. Hence you need write more complex queries which will have more number of joins. Snowflake schema is complex just because few dimension tables are connected to each other and other few connected to fact table. So writing query becomes more complex.

2. Star Schema is a simplest model which has lower query complexity and easy to understand why because all dimension in a schema is directly connected to face table.

Conclusion : If you need lower query complexity then go for Star Schema here you will have less number of query joins.

Difference 3

3. Since Snowflake model has more joins it is well understood that it will have more complex primary-key and foreign-key relationship due to this more number of joins. If there are more number of joins then to fetch millions of data it will take longer time for query execution, hence project becomes slow.

3. Since Star schema model has less number of joins it is well understood that it will take less time for query execution and hence project becomes fast.

Conclusion : If you want to improve query execution time to fetch millions of data then select star schema model.

Difference 4

4. Higher number of joins in snowflake schema because here few dimension table are connected to each other and other few to fact table which increases number of joins.

4. Less number of joins in star schema because all dimension tables are only connected to fact table

Conclusion : Read difference 2 & 3

Difference 5

5. It is very much possible that in snowflake design model one or two or few dimension will get connect to each other and remaining will get connect to fact table.

5. In star schema design model all dimension table must connect to fact table .

Hope you have understood this theory based article in our next upcoming article we understand in a practical way using an example of how to create star schema design model and snowflake design model.

Help us to share in Facebook, twitter and google plus.

+91-22-66752917
+91 9967590707
questpond@questpond.com / questpond@yahoo.com