Data warehouse Design Using Oracle
Unit Objective:
This unit will cover various schema design of the data warehousing modelling. It will describe schema table such as fact table and dimension table. The practical part explains details of creating dimension table using oracle statement level, hierarchy and attributes. Displaying dimension, validation of dimension and altering dimension.
1.1 Fact table
Figure 1-2 Fact Table
1.2 Dimension Table
Dimension is a collection of reference information about a measurable in the fact table.
The primary key column of the dimension table has uniquely identifies each dimension record or row.
The dimension tables are organized has descriptive attributes. For example, a customer dimension’s attributes could include first and last name, birth date, gender, Qualification, Address etc.,
1.3 Star Schema
Star Schema a Multidimensional data representation of relational database schema
A star schema contains two types of tables – Fact table and dimension table.
A Fact table in the middle connected to a set of dimension tables
In a star schema, a dimension table will not have any parent table
Each dimension table have primary key that corresponds exactly to one of the components of the composite key in the fact table
steps of designing star schema
1. Identify the business process. i.e Fact table For example - sales, reservation
2. Identify the measures or fact data i.e in the sales fact table net profit, sales_income, No_of _unit_sold.
3. Identify the Dimension table related to fact table. For example product, Time, Location,Branch
4. List of attributes or column in each dimension table
5. Find the lowest level of aggregation or summary analysis in the fact table. Ex. Total sales income.
star schema
1.3.1 Snowflake Schema
Snowflake Schema is a refinement of star schema where some dimensional hierarchy is normalized into third normal form and forms a set of smaller dimension tables.
Snowflake schema keeps same fact table structure as star schema.
In the dimension, it has multiple levels with multiple hierarchies. From each hierarchy of levels any one level can be attached to Fact Table.
Mostly lowest level hierarchy is attached to Fact Table. These hierarchies help to drill down the data from topmost hierarchies to the lowermost hierarchies.
the tables which describe the dimensions will
already exist and will typically be normalized.
Snow Flakes
1.3.2. Galaxy schema or Fact cancellation schema
- Multiple fact tables share dimension tables.
- This schema is viewed as collection of stars hence called galaxy schema or fact constellation.
- Sophisticated application requires such schema.
-
For Data warehouse the fact constellation schema is
commonly used since it can model multiple, interrelated subjects.
galaxy
1.4 CREATE DIMENSION statement
Before creating dimension, first create product table as a base table and then create dimension table product_dim.
conn hr/hr
Create table product( product_id number(5) primary key,prod_category varchar2(15),product_subcategory varchar2(15),prod_category_desc varchar2(20),Prod_subcate_desc varchar2(20),prod_price number(8,3),prod_status varchar2(5),brand_name varchar2(15),supplier_name varchar2(20));
The SQL CREATE DIMENSION statement supports the specification of levels, hierarchies and attributes.
SYNTAX:
CREATE DIMENSION <DIMENSION_NAME>
LEVEL <LEVEL NAME> IS<TABLE_NAME.COLUMN NAME>
HIERARCHY <HIERARCHY NAME>
(<CHILD LEVEL_NAME> CHILD OF <PARENT LEVEL_NAME>)
ATTRIBUTE <LEVEL NAME> DETERMINES (<COLUMN NAME>)
1.5 LEVEL
LEVEL is a first part of a Dimension declaration, which involves the specification of levels.
All Dimension names are unique in the database, dimension is a database object like table, index, view etc., If product_dim is already exist then for adding or modifying level use ALTER dimension otherwise drop dimension product_dim again re-create CREATE DIMENSION product_dim
Most dimensions involve multiple levels, in the example there are three Levels namely pid, prod_sub_cate, prod_cate
CREATE DIMENSION product_dim
LEVEL pid IS (product.product_id)
LEVEL prod_sub_Cate IS (product.product_subcategory)
LEVEL prod_cate IS (product.prod_category)
create Level
Suppose HR user don't privilege have to create dimension, then grant create dimension privilege to Hr user then create dimension.
1.5.1 Identical Level name
All the level name in the dimension to be unique not identical. If both the level has the same level name. So, it gives error message.
Identical Level name
1.5.2. Don't repeat the same column as level in different dimension
Using same column not allow to create as level even though the dimension is different and level name is different. Therefore only one time the column name of the table is allowed to create dimension. Duplicating the column for level and dimension is not allowed.
same column name don't allow to create level
1.6 Hierarchies
Hierarchies are used for data aggregation in data warehousing, it is logical structures that use ordered levels of organizing data.
Example, time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level.
Using hierarchies to drill down into data in the data warehousing to view different levels of granularity.
For example, city is a child of state (because you can aggregate city-level data up to state).
A hierarchical relationship is a functional dependency from one level of a hierarchy to the next level in the hierarchy, the CHILD OF relationship denotes that each child's level value is associated with one and only one parent level value
If the columns of a parent level and child level are in different relations, then the connection between them also requires a 1:n join relationship
Column of each hierarchy level are non-null that hierarchical integrity should be maintained.
Hierarchy level cannot be joined to itself either directly or indirectly.
A single dimension definition can contain multiple hierarchies.
Non-hierarchical dimension are referred as “Flat Dimension”, there is only a single level in a dimension
Here, we already created product_dim, so ALTER dimension add hierarchy or otherwise drop dimension product_dim and re-create product_dim again.
ALTER DIMENSION product_dim Add
HIERARCHY prod_rollup
( pid CHILD OF
prod_sub_cate CHILD OF
prod_cate)
Hierarchy
1.6.1. Hierarchy Using Join key
- Join key is used in snowflakes schema for joining parent dimension and child dimension.
- Hierarchy level cannot be joined to itself either directly or indirectly.
step 1: drop the brand_name from product table because it is normalized into another dimension called brand table
alter table product drop column brand_name;
step 2: Normalize the product table and create new table brand with three column
create table brand(brand_id number(5) primary key, brand_name varchar2(15),brand_desc varchar2(20));
Step 3: Alter the table product and create foreign key relationship between product and brand table.
alter table product
add
brand_id number(5) references brand(brand_id);
step 4: create level for brand table primary key
step 5: Alter dimension add the level brand id
ALTER DIMENSION PROCUT_DIM
ADD
LEVEL BID IS BRAND.BRAND_ID;
step 6: create hierarchy brand_rollup using join key with product brand_id and brand.brand_id
ALTER DIMENSION PROCUT_DIM
ADD
HIERARCHY BRAND_ROLLUP
(PID CHILD OF BID
JOIN KEY PRODUCT.BRAND_ID REFERENCES BID);
1.7 Attributes
- The ATTRIBUTE clause defines functional dependency relationships involving dimension levels and non source columns in dimension tables.
- The attribute_clause specify the columns that are uniquely determined by a hierarchy level. There must be a 1:1 attribute relationship between hierarchy levels and their dependent dimension attributes
The dependent_columns need not have been specified in the level_clause. For example prod_status is not specified in the level name.
ALTER DIMENSION PRODUCT_DIM ADD
ATTRIBUTE PID DETERMINES PROD_STATUS;
OR
ATTRIBUTE PID DETERMINES PRODUCT.PROD_STATUS;
The columns in level must all come from the same table as the dependent_columns.
ALTER DIMENSION PRODUCT_DIM ADD
ATTRIBUTE PID DETERMINES BRAND.BRAND_DESC;
because level pid from product table but attributes brand_desc from brand table.
Attribute level table differ from column table
1.7.1 Extended_attribute_clause
The only difference is that this clause assigns a name to the attribute that is different from the level name.
If the extended_attribute_clause is used to create multiple columns determined by a hierarchy level.
ALTER DIMENSION PRODUCT_DIM
ADD
ATTRIBUTE SUPP LEVEL PID DETERMINES SUPPLIER_NAME;
In the above example SUPP is attribute name.
1.8 ALTER DIMENSION
If dimension is already exist then using alter dimension statement it is easy to include hierarchy or level or attributes into the dimension. In the same manner removing or modifying the level, hierarchy or attribute also possible using alter dimension...drop statement.
1.8.1 Alter Dimension using ADD
Add level
ALTER DIMENSION PRODUCT_DIM
ADD
LEVEL PPRICE IS PRODUCT.PROD_PRICE;
Add Hierarchy
ALTER DIMENSION PRODUCT_DIM
ADD
HIERARCHY PRICE_ROLLUP
(PID CHILD OF PROD_CATE CHILD OF PPRICE);
Add Attribute
ALTER DIMENSION PRODUCT_DIM
ADD
ATTRIBUTE psubcate level pid DETERMINES
(product.product_subcategory, Prod_subcate_desc);
1.8.2 Alter Dimension using DROP
CASCADE and RESTRICT is used for restricting DROP IN ALTER DIMENSION.
cascade and restrict using drop dimension
you can drop one attribute column without dropping them all.
The following statement illustrates how you can drop a single column without dropping all columns:
ALTER DIMENSION product_dim
DROP ATTRIBUTE psubcate LEVEL pid COLUMN Prod_subcate_desc;
1.9. DROP Dimension
A dimension is removed using the DROP DIMENSION statement. For example:
DROP DIMENSION <Dimension name>
DROP DIMENSION product_dim;
Drop dimension
1.10 Display Dimension
As mentioned earlier in the database schema, the dimension is one of the objects like table, index. This statement gives how many user objects created by the user, here in where condition given object type is dimension. So it will display how many dimension created by user. Like "select * from tab" show how many tables are existed in the current user same manner it will show how many dimensions are created by the user.
SQL> SET LINESIZE 400;
SQL> select object_type,object_name FROM user_objects where object_type='DIMENSION';
OBJECT_TYPE OBJECT_NAME
------------------- ---------------------------------------------------------------------
DIMENSION PRODUCT_DIM
1.11 Dimension status
“All_dimensions” gives the number of dimension created by the
user and also tells the status of each dimension table whether it is, ERROR,
NEEDS_COMPILE, VALID etc.
dimension status
1.12 Retrieve Dimension
To view the definition of a dimension, use the DESCRIBE_DIMENSION procedure in the DBMS_DIMENSION package.
SET SERVEROUTPUT ON FORMAT WRAPPED; --to improve the display of info
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION ('PRODUCT_DIM');
it will display how many level, hierarchy and attributes are available within the dimension.
Retrieve dimension
1.13 Validate Dimension
For verifying the validity of the relationship specified in the dimensions use DBMS_DIMENSION.VALIDATE_DIMENSION procedure periodically.
This procedure is easy to use and has only four parameters:
* dimension: who created the dimension or owner of the dimension object
* incremental: set to TRUE to check only the new rows for tables of this dimension.
* check_nulls: set to TRUE to verify that all columns that are not in the levels containing a SKIP WHEN NULL clauses are not null.
* statement_id: user-supplied unique identifiers to identify the result of each run of the procedure
In this example dimension='hr.product_dim',
incremental = FALSE, check_nulls=TRUE, statement_id='error catch'
EXECUTE DBMS_DIMENSION.VALIDATE_DIMENSION('hr.product_dim',FALSE, TRUE,'error catch');
1.13.1 Dimension_Exceptions
Before running the VALIDATE_DIMENSION procedure, need to create a local table, DIMENSION_EXCEPTIONS.
Dimension Exception
If the VALIDATE_DIMENSION procedure encounters
any errors, the errors are stored in DIMENSION_EXCEPTIONS table. Querying
this table will identify the exceptions that were found.
For validating Dimension
Step 1: check whether dimension_exception table is exit or not, then it will be created only once.
creating dimension exception
Step 2: suppose it is not exist, then create the dimension_exceptions table.
create table dimension_exceptions(STATEMENT_ID VARCHAR2(25),OWNER VARCHAR2(25),TABLE_NAME VARCHAR2(20),DIMENSION_NAME VARCHAR2(25),RELATIONSHIP VARCHAR2(25),BAD_ROWID VARCHAR2(35));
Step 3: Execute the validate_dimension procedure
EXECUTE DBMS_DIMENSION.VALIDATE_DIMENSION('hr.product_dim',FALSE, TRUE,'error catch');
Step 4: check there is any error in the dimension exception stored
display dimension exception
At present, there is no error occurred in the dimension_exceptions table.
Step 5: check with errors
Try to insert two records in the base table store.
SQL> Insert into product(PRODUCT_ID,PROD_CATEGORY) values (11,'Electronics');
1 row created.
SQL> insert into product (PRODUCT_ID,PROD_CATEGORY) values(11,'Cosmetics');
insert into product (PRODUCT_ID,PROD_CATEGORY) values(11,'Cosmetics')
*
ERROR at line 1:
ORA-00001: unique constraint (HR.SYS_C0011199) violated
Select * from dimension_exceptions;
Error Dimension Exception
1.13.2 Revalidate Dimension status
Suppose after creating the dimension for product table called “product_dim”, alter the structure of the base table any column data type or size of the column.
Describe product table
Modify the prod_category varchar2(15) into prod_category varchar2(20);
select dimension_name, compile_state from all_dimensions;
Alter table product modify (prod_category varchar2(20));
compile state
To revalidate the dimension “product_dim”, use the COMPILE option; now check the status of dimension again, the status has changed as “VALID”
Alter dimension Product_dim compile;
select dimension_name, compile_state from all_dimensions;
product compile state
Unit summary
At the end of the unit, student can learn how to design schema and how to create, modify and display the dimension. They also understand status of dimension.
Practical Exercise
Using hr schema,
Create dimension patientinfo_dim using patient table. The patient table has following columns patient id, patient name, DOB, gender, Address, disease, test, medicine attributes.
The patient table normalized into doctor table which has following attributes doctor id, doctor name, yrs_of_experience and specialization.
Design patientinfo_dim has hierarchy name as "medicine_rollup" which has following level patient_id, patient_name, disease, test and medicine.
create attributes "patient_info" which has columns are DOB, gender.
Do the following :
1) Include attribute as "Address" in patientinfo_dim
2) Display hierarchy and level of patientinfo_dim
3) Display how many dimensions are created by user hr.
4) Remove the level "test" from patientinfo_dim
5) Display the patient-dim status
6) Include "patient_dr_rollup" hierarchy in patientinfo_dim dimension for joining the patient and doctor table
7) Remove the gender column from patient_info attribute
8) Change the disease column size as varchar2(40) in base table product.
9) Check the productinfo_dim validation status and write appropriate statement to execute "compile_state" of patientinfo_dim as valid.
10) insert the below given values in the Patient table, patiend_id column is primary key column
patient_id Patient_name Gender
P11 Sara F
P11 Wafa F
While inserting check and store dimension Exception information.
Question
1. Explain design steps of Star schema
2. Differentiae star and snowflake schema
3. what type of column data available in fact table
4. How level is related in hierarchy using create dimension statement.
Video Tutorial Link
YouTube Tutorial 2: Create Dimension Snow Flakes Schema
YouTube Tutorial 3: Create Dimension Statement Syntax
YouTube Tutorial 4: Create Level
YouTube Tutorial 5- Identical Level Name
YouTube Tutorial 6: Create Hierarchy
YouTube Tutorial 7: Hierarchy using Join Key
YouTube Tutorial 8 : Create Attribute
YouTube Tutorial 9: Alter Dimension using ADD
YouTube Tutorial 10: Alter Dimension Using Drop
YouTube Tutorial 11: Drop Dimension
Lecturer Notes
Lecturer Notes 1: Create dimension Star Schema
Lecturer Notes 2: Create Dimension Snowflakes Schema
Lecturer Notes 3: Create Dimension Statement Syntax
Lecturer Notes 4: Create Dimension Level
Lecturer Notes- OER Unit 1 Schema Design
References
http://www.vertabelo.com/blog/technical-articles/data-warehouse-modeling-star-schema-vs-snowflake-schema
https://learndatamodeling.com/blog/designing-star-schema/
https://docs.oracle.com/cd/B28359_01/server.111/b28313/dimen.htm