What Is a Data Warehouse?

What Is a Data Warehouse?

Data warehouse is a relational database that is designed for query and analysis. Data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.

The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.

This helps in:

·        Maintaining historical records
·       Analyzing the data to gain a better understanding of the business and to improve the business

Operational and Informational systems

Operational Systems:

An operational system is a term used in data warehousing to refer to a system that is used to process the day-to-day transactions of an organization. These systems are designed in a manner that processing of day-to-day transactions is performed efficiently and the integrity of the transactional data is preserved.

Sometimes operational systems are referred to as operational databases, transaction processing systems, or online transaction processing systems (OLTP). However, the use of the last two terms as synonyms may be confusing, because operational systems can be batch processing systems as well.

Any Enterprise must necessarily maintain a lot of data about its operation. This is its "Operational Data".

Organization
Probably
Manufacturing Company
Product data
Bank
Account Data
Hospital
Patient Data
University
Student Data
Government Department
Planning data

Informational Systems

Information system, an integrated set of components for collecting, storing, and processing data and for delivering information, knowledge, and digital products. Business firms and other organizations rely on information systems to carry out and manage their operations, interact with their customers and suppliers, and compete in the marketplace. For instance, corporations use information systems to reach their potential customers with targeted messages over the Web, to process financial accounts, and to manage their human resources.

Components of information systems

The main components of information systems are computer hardware and software, telecommunications, databases and data warehouses, human resources, and procedures. The hardware, software, and telecommunications constitute information technology (IT), which is now ingrained in the operations and management of organizations.

OLTP and DSS Systems

OLTP (Online Transaction Processing)

Online Transaction processing database applications are optimal for managing changing data, and usually have a large number of users who will be simultaneously performing transactions that change real-time data. Although individual requests by users for data tend to reference few records, many of these requests are being made at the same time. Common examples of these types of databases are airline ticketing systems and banking transaction systems.

The primary concerns in this type of application are concurrency and atomicity.
Concurrency controls in a database system ensure that two users cannot change the same data, or that one user cannot change a piece of data before another user is done with it. For example, if you are talking to an airline ticket agent to reserve the last available seat on a flight and the agent begins the process of reserving the seat in your name, another agent should not be able to tell another passenger that the seat is available.

Atomicity ensures that all of the steps involved in a transaction complete successfully as a group. If any step fails, no other steps should be completed. For example, a banking transaction may involve two steps: taking funds out of your checking account and placing them into your savings account. If the step that removes the funds from your checking account succeeds, you want to make sure that the funds are placed into your savings account or put back into your checking account.

Transaction processing system databases should be designed to promote:

·        Good data placement.
·        Short transactions to minimize long-term locks and improve concurrency.
·        Online backup.
·        High normalization of the database.
·        Little or no historical or aggregated data
·        Careful use of indexes.

DSS (Decision Support System)

Decision-support database applications are optimal for data queries that do not change data. For example, a company can periodically summarize its sales data by date, sales region, or product and store this information in a separate database to be used for analysis by senior management.

To make business decisions, users need to be able to determine trends in sales quickly by querying the data based on various criteria. However, they do not need to change this data. The tables in a decision-support database are heavily indexed, and the raw data is often preprocessed and organized to support the various types of queries to be used. Because the users are not changing data, concurrency and atomicity issues are not a concern; the data is changed only by periodic, bulk updates made during off-hour, low-traffic times in the database.

Decision-support system databases should be designed to promote:

Heavy Indexing

Decision-support systems have low update requirements but large volumes of data. Use many indexes to improve query performance.

Denormalization of the database

Introduce pre-aggregated or summarized data to satisfy common query requirements and improve query response times.

Use of a star or snowflake schema to organize the data within the database.

Characteristics of Data Warehouse

Some data is denormalized for simplification and to improve performance.

Large amounts of historical data are used.
Queries often retrieve large amounts of data.
Both planned and ad-hoc queries are common.
The data load is controlled.

Subject Oriented:

Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales.

Integrated:

Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure.
Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

Data Warehouse software and hardware architecture

In computing, the term data warehouse appliance (DWA) was coined by Foster Hinshaw to define a new category of computer architecture for data warehousing (DW) specifically targeted for Big Data Analytics and Discovery that is (a) simple to use (not a pre-configuration) and (b) very high performance for this workload. A DWA includes an integrated set of servers, storage, operating system(s), and DBMS.

In marketing, the term has evolved to include pre-installed and pre-optimized hardware and software as well as similar software-only systems promoted as easy to install on specific recommended hardware configurations or preconfigured as a complete system. These are marketing uses of the term and do not reflect the technical definition.

At its core, a DWA is designed specifically for high performance big data analytics and is delivered as an easy-to-use packaged solution. The internal software (and often hardware) constructs of a DWA differ significantly from a traditional stack in that they are written for a target workload and not a generic general purpose workload.

The data warehouse appliance (DWA) has several characteristics which differentiate that architecture from similar machines in a data center, such as an enterprise data warehouse (EDW).

1.     A DWA has a very tight integration of its internal components which are optimized for "data-centric" operations in contrast to "compute-centric" operations. The latter tend to emphasize number of CPU's, cores and network bandwidth.

2.     A DWA is trivial to use and install. In contrast to a "pre-configuration" of components, a DWA has very few configuration switches or options.

3.     A DWA is optimized for analytics on Big Data.


Most DW appliances use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel.

Comments

Popular posts from this blog

પટેલ સમાજનો ઈતિહાસ જાણો : કોણ અને ક્યાંથી આવ્યા હતા પાટીદારો

Python HTML Generator using Yattag Part 1

Python HTML Generator using Yattag Part 2