Data Warehouse Architecture
Basic steps to develop
Data Warehouse Architecture
1. Extracting the transactional data from the data
sources into a staging area
2. Transforming the transactional data
3.
Loading the transformed data into a dimensional database
4. Building pre-calculated summary values to speed up
report generation
5.
Building (or purchasing) a front-end reporting tool
Extracting
Transactional Data
A
large part of building a DW is pulling data from various data sources and
placing it in a central storage area. In fact, this can be the most
difficult step to accomplish due to the reasons mentioned earlier.
Fortunately
for many small to mid-size companies, Microsoft has come up with an excellent
tool for data extraction. Data Transformation Services (DTS), which is part of
Microsoft SQL Server 7.0 and 2000, allows you to import and export data from
any OLE DB or ODBC-compliant database as long as you have an appropriate
provider.
Transforming
Transactional Data
Most
companies have their data spread out in a number of various database management
systems: MS Access, MS SQL Server, Oracle, Sybase, and so on. Many companies
will also have much of their data in flat files, spreadsheets, mail systems and
other types of data stores. When building a data warehouse, you need to relate
data from all of these sources and build some type of a staging area that can
handle data extracted from any of these source systems.
Creating
a Dimensional Model
The
third step in building a data warehouse is coming up with a dimensional
model. Most modern transactional systems are built using the relational
model. The relational database is highly normalized; when designing such a
system, you try to get rid of repeating columns and make all columns dependent
on the primary key of each table.
The
relational systems perform well in the On-Line Transaction Processing (OLTP)
environment. On the other hand, they perform rather poorly in the reporting
(and especially DW) environment, in which joining multiple huge tables just is not
the best idea.
Loading
the Data
After
you've built a dimensional model, it's time to populate it with the data in
the staging database. This step only sounds trivial. It might involve combining
several columns together or splitting one field into several columns. You might
have to perform several lookups before calculating certain values for your
dimensional model.
Generating
Precalculated Summary Values
The
next step is generating the precalculated summary values which are
commonly referred to as aggregations. This step has been tremendously
simplified by SQL Server Analysis Services (or OLAP Services, as it is referred
to in SQL Server 7.0).After you have populated your dimensional database, SQL
Server Analysis Services does all the aggregate generation work for you.
Building
(or Purchasing) a Front-End Reporting Tool
After you've built the
dimensional database and the aggregations you can decide how sophisticated your
reporting tools need to be. If you just need the drill-down
capabilities, and your users have Microsoft Office 2000 on their desktops, the
Pivot Table Service of Microsoft Excel 2000 will do the job.
The cost of building a
custom reporting (and OLAP) tool will usually outweigh the purchase price of a
third-party tool. That is not to say that OLAP tools are cheap (not in the
least!).
The
architecture consists of various interconnected elements:
ü Operational and external database layer – the source data for the DW
ü Information access layer – the tools the end user access to extract and analyze
the data
ü Data access layer – the interface between the operational and
information access layers
ü
Metadata
layer – the data directory or
repository of metadata information
Additional
layers are:
Process
management layer – the scheduler or
job controller.
Application
messaging layer – the ―middleware‖
that transports information around the firm.
Physical
data warehouse layer – where the
actual data used in the DSS are located.
Data
staging layer – all of the processes
necessary to select, edit, summarize and load warehouse data from the
operational and external data bases.
Data
Warehouse System Architecture
Data
warehouse system architecture Two-Tiered
This
is the next step of evolution before we get onto a Data warehouse.
This
is the next evolution state for an enterprise. This involves a Data Mart OR set
of Independent Data Marts sitting on top of a single OR multiple staging
databases. Data Mart is evolved as a quick solution to a business need. This
results in Individual businesses and system owners creating their own data mart
to meet their specific needs. Data-Mart does not work in isolation and it needs
a staging database to get the data and OLAP tools to analyze the data contained
in the data-mart. This leads to various variants of this topology.
Data
warehouse system architecture Three-Tiered
Generally
the data warehouses adopt the three-tier architecture. Following are the three
tiers of data warehouse architecture.
Bottom
Tier - The bottom tier of the
architecture is the data warehouse database server. It is the relational
database system. We use the back end tools and utilities to feed data into
bottom tier. These back end tools and utilities perform the Extract, Clean,
Load, and refresh functions.
Middle
Tier - In the middle tier we have OLAP
Server. The OLAP Server can be implemented in either of the following ways.
·
By relational
OLAP (ROLAP), this is an extended relational database management system. The
ROLAP maps the operations on multidimensional data to standard relational
operations.
·
By
Multidimensional OLAP (MOLAP) model, which directly implements multidimensional
data and operations?
Top-Tier
- This tier is the front-end client
layer. This layer holds the query tools and reporting tool, analysis tools and
data mining tools.
Comments
Post a Comment