How to implement ETL Process using SSIS with an example

Introduction

Hello Friends, Welcome to learn msbi step by step. Before we start our article session let me breif you little bit about us. We are #Questpond - A specialized E-Learning firm since past 15 years. We are dedicated and experts in Microsoft technologies. We do take online and as well as offline trainings for Freshers / Intermediate / Experts. We cover all corners of Microsoft technologies from C# ASP.NET to Microsoft CRM. Our mentors (teaching body) honoured with MVP (Microsoft's Most Valued Professional) endorsement.

So if you want to learn anything step by step like ASP.NET, C#, SQL Server, MSBI, Sharepoint, AngularJS, MVC, Microsoft CRM etc. Do let us know via mail or phone-call. OR we do also provide self-learning materials i.e. complete DOTNET DVD pack you can order that from our price page or order now page.

Apart from paid package we do provide free-learning materials (Videos & Articles) to kick start with MSBI and other topics.

Now coming back to topic, In this article session we will understand how to implement ETL i.e. (Extraction, Transformation and Loading) process in Data Warehouse using microsoft tool SSIS (SQL Server Integration Services).

Before we start let's understand the Data Warehouse & ETL process theory part using real time example.

What is Data Warehouse ?

As name implies Data warehouse, It is warehouse for database to store large aggregated data collected from wide range of sources within an organization. Source can be soft files, database files or some excel files.

For example : Baskin-Robbins (Famous for world's largest chain of ice cream specialty shops) has many shops in India as well as across the world. Let's say there is a Baskin-Robbins shop in our area and it has its own system of saving customer visit and product purchase history. So these data must be stored in a excel. Once in a week all these area-data is been collected and stored in a centralized city-data center which is nothing data-warehouse for all small-small areas. Same way all this city-data must be collected and stored in a state-data. A large data store which is accumulated from wide-range of souces is known as Data War.ehouse

I hope you have understood about data-warehouse.Now let's understand ETL process.

Difference between Database and Data warehouse

  • Data warehouse is one kind of database or a large database. Data warehouse is formed using multiple databases.
  • Database with respect Data warehouse, It helps to store data/information for a particular entity.
  • Database is used for Online Transactional Processing (OLTP).
  • Data warehouse is used for Online Analytical Processing (OLAP)
  • OLTP is a decentralized system normally used in Internet websites, banks, airlines, to avoid single points of failure and to spread the volume between multiple servers. This system is good to control and run fundamental business tasks.
  • OLAP is centralized system to help with planning, problem solving, and decision support. Queries are often very complex and relatively used for low volume of transaction.
  • Database tables are always in a normalized structure.
  • Data Warehouse tables are always in a de-normalized structure.
  • Normalization Database - Designed in such a way that same column data is not repeated or in simple words there will not be any redundant data. Here you will lots of Joins using foreign-key and primary-key
  • De-Normalization Database where you get more repeated data. Datamase mnagement becomes easy. No need to joins using foreign-key and primary-key.
  • Normalized Database - In terms performance due to many joins it affects the performance.
  • De-Normalized Database - Due to less number of joins or some time no-joins it improves the performance.

What is an ETL process ?

ETL stands for Extraction, Transformation and Loading. It is a process in data warehousing to extract data, transform data and load data to final source. ETL covers a process of how the data are loaded from the source system to the data warehouse. Let us briefly describe each step of the ETL process.

Extraction

Extraction is the first step of ETL process where data from different sources like txt file, XML file, Excel file or various sources collected.

Transformation

Transformation is the second step of ETL process where all collected data is been transformed into same format i.e. format can be anything as per our requirement before loading it to data-warehouse i.e. it may be data-type format, data merge format, splitting format, alphabet joining format, currency format etc.

Loading

Final step of ETL process, The big chunck of data which is collected from various sources and transformed then finally load to our data warehouse.

I hope you have understood this ETL process now let's see ETL process along with some real time example.

To do ETL process in data-ware house we will be using Microsoft SSIS tool.

ETL process with SSIS Step by Step using example

We do this example by keeping baskin robbins (India) company in mind i.e. customer data which is maintained by small small outlet in an excel file and finally sending that excel file to USA (main branch) as total sales per month. This data is necessary at head quaters (main branch) to track performance of each outlet.

So here also we will do same thing i.e. We will collect customer product purchase sales data from small-small outlet (In an Excel Format) - Extraction

Since baskin robbins is located in USA we need to convert or transform product purchase amount to USD currency and we will also convert product name to uppercase for unique representation - Transformation

Finally loading this transofrmed data to database / datawarehouse (SQL Server Database) - Loading

Step 1 :- Extract Data From Excel File

Before you read this steps kindly make sure you have installed microsoft business intelligence along with SQL Server.

I thin step we will create a simple excel file with a columns names as CustomerCode, CustomerName, ProductPurchase, Quantity, Amount, CustomerVisitedDate respectively.

Add some data as shown in below image.

Give a nice name and save it your computer.

Now open your SQL Server data tools if you don't have SQL server data tools installed i request to read our first article i.e How to install MSBI step by step. if you already MSBI data tool just open it and go to FILE -> NEW -> PROJECT.

Since we are doing ETL process and that process comes under SSIS so we need to create Integration Services so choose that -> Integration Service Project.

Give a nice name and create a project.

Now go to SSIS Toolbox and drag and drop "Data Flow Task" to control panel as show in below image.

Just double click on "Data Flow Task" to take you to "Data Flow".

Now go to SSIS Toolbox and from Other Souces tab just drap and drop Excel Souces. Why excel source because our inital data which we want to extract it is in excel format.

So just drag excel file as shown below image and right click and rename it so that if any developer reads it can easily able to understand.

Now right click on that excel source -> Edit -> Click on New button - Browse Excel file from your computer as shown in below image.

Since our first column of excel file is having column names so we need to check this below check box as you see in above image.

Now select Data access mode as "Table on view" then select Excel sheet name from drop down.

If you want to preview you can also do that by clicking preview button.

Finally click on OK button. So now your excel source is ready.It means we have successfully extracted our excel data file to SSIS excel data source.

Step 2 : Transform Data (Convert to US currency and Upper case)

As you now in our excel file we have column name called "Amount" and that amount is in Indian currency. So we need to convert that Indian amount to USD amount so to do that we will drag and drop "Derived Column" from SSIS toolbox.

Now if you see on Excel Source file box there are two arrows "Red" and Blue". Just drag that "Blue" arrow and join it to "Derived column" as shown in below image and rename that "Derived column".

Now right click on "Derived column" i.e ConvertingAmounttoUSD and click on Edit as show below image.

Now in the below Derived Column Name give a new column name "USDAmount" and Derived column select "add a new column" and in the Expressing give the Formula i.e. "Excel Column (Amount / 60)" and Data-type select as double precision float.

Why "Amount / 60" this expression is it because we want to current Indian Amount to USD currency so that's why we a added a new column "USDAmount" and added calculated values to this new column.

Finally save it.

We have now USD amount, next step we need stand representation of product name i.e. in UPPER CASE so for that we will add another "Derived Column" same way. Now here we need to drag "Blue" arrow from "ConvertingAmounttoUSD" to new derived column ("CapitalProductName") as shown below image.

Same way right click on "CapitalProductName" go to EDIT choose column "ProductPurchase" from columns section and drag it to "Derive Column Name. In the Derived Column select "Replace 'ProductPurchase'.

Expression you can either choose from string function of you can type it i.e. UPPER (Column name). finally click on OK button.

So as you can see we have completed data transformation part i.e. Convert Amount to USD and Changed Product Names to UPPER case.

Step 3 : Loading Data To SQL Server

Before you start this step just open up your SQL server management studio and create a new database if need or just a new table with same excel column names as shown below image

Now go to your SSDT and from SSIS toolbox under Other Destination select "ADO.NET Destination"and drag it to "Data Flow" and drag arrow from "CapitalProductName" to "ADO.NET Destination".

Right click and click on edit add your server name if you do not know your extact server name go to sql server management studio -> File -> click on connect and copy that server name. Now come back to SSDT and click New button -> Again New -> give here server name.Once you give server name automatically database dropdown will populate. Select database and click on OK button. Finally choose table as shown in below image and save it (OK button).

So what we did here it that we have created a path from where excel data flow from excel file -> Transform USD & UPPER CASE -> SQL Server Data Warehouse.

Final step just go to DEBUG and click on Start button from top menu of SSDT or just click on F5. Automatically data will flow from Excel Source -> SQL Server.

This is how ETL process is executed using SSIS.

I hope you have understood the article if you still have any doubts feel free to ask us on below mentioned details.

To quick start on MSBI, request you to watch our Tutorial Videos for free and read more MSBI articles. If you are looking for MSBI Training with hands-on project then visit our training page and contact us.

If you want to complete .NET DVD Pack which includes AngularJS, Sharepoint, MVC, MVP, MSBI, SQL Server, C#, ASP.NET, Java, Testing, WCF, WPF then visit our Order now page.

+91-22-66752917
+91 9967590707
questpond@questpond.com / questpond@yahoo.com