Data Warehouse Architecture

February 07, 2017

Basic steps to develop Data Warehouse Architecture

1. Extracting the transactional data from the data sources into a staging area

2. Transforming the transactional data

3. Loading the transformed data into a dimensional database

4. Building pre-calculated summary values to speed up report generation

5. Building (or purchasing) a front-end reporting tool

Extracting Transactional Data

A large part of building a DW is pulling data from various data sources and placing it in a central storage area. In fact, this can be the most difficult step to accomplish due to the reasons mentioned earlier.

Fortunately for many small to mid-size companies, Microsoft has come up with an excellent tool for data extraction. Data Transformation Services (DTS), which is part of Microsoft SQL Server 7.0 and 2000, allows you to import and export data from any OLE DB or ODBC-compliant database as long as you have an appropriate provider.

Transforming Transactional Data

Most companies have their data spread out in a number of various database management systems: MS Access, MS SQL Server, Oracle, Sybase, and so on. Many companies will also have much of their data in flat files, spreadsheets, mail systems and other types of data stores. When building a data warehouse, you need to relate data from all of these sources and build some type of a staging area that can handle data extracted from any of these source systems.

Creating a Dimensional Model

The third step in building a data warehouse is coming up with a dimensional model. Most modern transactional systems are built using the relational model. The relational database is highly normalized; when designing such a system, you try to get rid of repeating columns and make all columns dependent on the primary key of each table.

The relational systems perform well in the On-Line Transaction Processing (OLTP) environment. On the other hand, they perform rather poorly in the reporting (and especially DW) environment, in which joining multiple huge tables just is not the best idea.

Loading the Data

After you've built a dimensional model, it's time to populate it with the data in the staging database. This step only sounds trivial. It might involve combining several columns together or splitting one field into several columns. You might have to perform several lookups before calculating certain values for your dimensional model.

Generating Precalculated Summary Values

The next step is generating the precalculated summary values which are commonly referred to as aggregations. This step has been tremendously simplified by SQL Server Analysis Services (or OLAP Services, as it is referred to in SQL Server 7.0).After you have populated your dimensional database, SQL Server Analysis Services does all the aggregate generation work for you.

Building (or Purchasing) a Front-End Reporting Tool

After you've built the dimensional database and the aggregations you can decide how sophisticated your reporting tools need to be. If you just need the drill-down capabilities, and your users have Microsoft Office 2000 on their desktops, the Pivot Table Service of Microsoft Excel 2000 will do the job.

The cost of building a custom reporting (and OLAP) tool will usually outweigh the purchase price of a third-party tool. That is not to say that OLAP tools are cheap (not in the least!).

Architectural components of data warehouse

The architecture consists of various interconnected elements:

ü Operational and external database layer – the source data for the DW

ü Information access layer – the tools the end user access to extract and analyze the data

ü Data access layer – the interface between the operational and information access layers

ü Metadata layer – the data directory or repository of metadata information

Additional layers are:

Process management layer – the scheduler or job controller.

Application messaging layer – the ―middleware‖ that transports information around the firm.

Physical data warehouse layer – where the actual data used in the DSS are located.

Data staging layer – all of the processes necessary to select, edit, summarize and load warehouse data from the operational and external data bases.

Data Warehouse System Architecture

Data warehouse system architecture Two-Tiered

This is the next step of evolution before we get onto a Data warehouse.

This is the next evolution state for an enterprise. This involves a Data Mart OR set of Independent Data Marts sitting on top of a single OR multiple staging databases. Data Mart is evolved as a quick solution to a business need. This results in Individual businesses and system owners creating their own data mart to meet their specific needs. Data-Mart does not work in isolation and it needs a staging database to get the data and OLAP tools to analyze the data contained in the data-mart. This leads to various variants of this topology.

Data warehouse system architecture Three-Tiered

Generally the data warehouses adopt the three-tier architecture. Following are the three tiers of data warehouse architecture.

Bottom Tier - The bottom tier of the architecture is the data warehouse database server. It is the relational database system. We use the back end tools and utilities to feed data into bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.

Middle Tier - In the middle tier we have OLAP Server. The OLAP Server can be implemented in either of the following ways.

· By relational OLAP (ROLAP), this is an extended relational database management system. The ROLAP maps the operations on multidimensional data to standard relational operations.

· By Multidimensional OLAP (MOLAP) model, which directly implements multidimensional data and operations?

Top-Tier - This tier is the front-end client layer. This layer holds the query tools and reporting tool, analysis tools and data mining tools.

Search This Blog

IT & Programming Information

Data Warehouse Architecture

Comments

Post a Comment

Popular posts from this blog

પટેલ સમાજનો ઈતિહાસ જાણો : કોણ અને ક્યાંથી આવ્યા હતા પાટીદારો

Java Event Delegation Model, Listener and Adapter Classes

Cluster Computing & Grid Computing