Tuesday, July 7, 2009

Notes for Data Warehousing

1. Type of Data
a. Operational Data (OLTP)
  • Data that works
  • Frequently updated and queried
  • Normalized for efficient search and update
  • Fragmented, with local relevance
  • Point queries: each query accesses individual tables.
What is the salary of John?
What is the phone number of the person who is in charge of Dept A?
How many people are rated as excellent?
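
A minimal sketch of what such point queries could look like, using Python's built-in sqlite3 module. The employee table, its columns, and the sample rows are assumptions made up for illustration; they are not part of the notes.

```python
import sqlite3

# Hypothetical operational table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT PRIMARY KEY, salary INTEGER, rating TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("John", 5000, "excellent"),
        ("Mary", 6200, "good"),
        ("Ann", 5800, "excellent"),
    ],
)

# Point query: fetch a single row by its key.
john_salary = conn.execute(
    "SELECT salary FROM employee WHERE name = ?", ("John",)
).fetchone()[0]

# A small count that still touches only one table.
excellent_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE rating = ?", ("excellent",)
).fetchone()[0]

print(john_salary, excellent_count)
```

Each query reads one table and returns a single value, which is the access pattern a normalized operational schema is tuned for.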

b. Historical Data (OLAP)
  • Data that tells
  • Very infrequently updated
  • Integrated data set with global relevance.
  • Analytical queries that require huge amounts of aggregation.
  • Performance is an issue: queries need a quick response time.
What is the trend over the past 2 years?
What is the summary of something?
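
By contrast, an analytical query aggregates over many rows. The sketch below assumes a hypothetical sales table with made-up figures and totals the amounts per year to show a two-year trend.

```python
import sqlite3

# Hypothetical historical table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_year INTEGER, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2008, "east", 100),
        (2008, "west", 150),
        (2009, "east", 130),
        (2009, "west", 170),
    ],
)

# Analytical query: scan all rows and aggregate them per year.
yearly_totals = conn.execute(
    "SELECT sale_year, SUM(amount) FROM sales "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()

print(yearly_totals)
```

Unlike a point query, this reads every row of the table, which is why response time becomes the main concern as the history grows.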

2. What is Data Warehousing?
  • An infrastructure to manage historical data
  • Designed to support OLAP queries that make heavy use of aggregation
  • Post-retrieval processing (reporting)
3. OLTP --> Data cleaning and Integration --> Data Warehousing

4. Data Marts:
  • Segments of OLTP
  • A data warehouse is a collection of data marts
5. Data Cleaning

a. Dirty Data
  • Lack of standardization
  • Missing or duplicate data
  • Inconsistent.
b. Issues of data cleaning
  • Cannot be fully automated
  • GIGO (garbage in, garbage out)
  • Requires considerable knowledge of the data
  • Complex
c. Steps of the cleaning process
  • Data analysis
  • Definition of transformation rules
  • Rule verification
  • Transformation
  • Backflow: re-populate the sources with the cleaned data
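
The transformation and verification steps above can be sketched roughly as follows. The records, the standardize rule, and the "UNKNOWN" placeholder are all assumptions chosen for illustration; real rules depend on knowledge of the data.

```python
# Hypothetical dirty records: inconsistent casing and spacing,
# a duplicate person, and a missing department.
raw_records = [
    {"name": " John Smith ", "dept": "sales"},
    {"name": "john smith", "dept": "Sales"},
    {"name": "Mary Jones", "dept": None},
]


def standardize(record):
    """Assumed transformation rule: normalize spacing and casing."""
    name = " ".join(record["name"].split()).title()
    dept = record["dept"].strip().upper() if record["dept"] else "UNKNOWN"
    return {"name": name, "dept": dept}


def clean(records):
    """Apply the rule to every record and drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        rec = standardize(record)
        key = (rec["name"], rec["dept"])
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned


cleaned_records = clean(raw_records)

# Rule verification on the sample: no stray whitespace, no missing department.
assert all(r["name"] == r["name"].strip() for r in cleaned_records)
assert all(r["dept"] for r in cleaned_records)

print(cleaned_records)
```

The two John Smith rows only collapse into one after standardization, which is why the rules are defined and verified before duplicates are removed.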
6. Integration
Schema: forming an integrated schema structure from different data sources, then cleaning and managing the data that comes from them.
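
As a rough illustration of schema integration, the sketch below maps records from two hypothetical sources onto one shared structure. The source names (hr, payroll), the field names, and the FIELD_MAPS table are assumptions, not something the notes specify.

```python
# Hypothetical source records that describe the same kind of entity
# with different field names.
hr_rows = [{"emp_name": "John", "dept_code": "A"}]
payroll_rows = [{"employee": "Mary", "department": "B"}]

# Assumed mapping from each source's fields to the integrated schema.
FIELD_MAPS = {
    "hr": {"emp_name": "name", "dept_code": "dept"},
    "payroll": {"employee": "name", "department": "dept"},
}


def to_integrated(row, source):
    """Rename a row's fields to the integrated schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    out = {target: row[src] for src, target in mapping.items()}
    out["source"] = source
    return out


integrated = [to_integrated(r, "hr") for r in hr_rows] + [
    to_integrated(r, "payroll") for r in payroll_rows
]

print(integrated)
```

Keeping the source tag on each row makes it possible to trace a warehouse record back to the system it came from.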
