Msc: 2007

4.7 Normal Forms

There are 6 types of normal forms.

1. First Normal Form (1NF).

Any repeating groups have been removed, so there is a single value at the intersection of each row and column of the table.

2. Second Normal Form (2NF).

Any partial functional dependencies have been removed.

3. Boyce-Codd Normal Form (BCNF).

Any remaining anomalies that result from functional dependencies have been removed.

4. Third Normal Form (3NF).

Any transitive dependencies have been removed.

5. Fourth Normal Form (4NF).

Any multi-valued dependencies have been removed.

6. Fifth Normal Form (5NF).

Any remaining anomalies have been removed.

4.7 Normal Forms
4.7.1 First Normal Form (1NF)

To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted

An example of un-normalized form (UNF) and first normal form (1NF) comparison is shown as follow.

UNF

– Job-Seeker(Jregno, Jsname, (JsQual, Subject, level, award), (Pemp, employer, addr, pay))

1 NF

– Job-Seeker(Jregno, Jsname)
– (JsQual, Subject, level, award)
– (Pemp, employer, addr, pay)

Another example to show the conversion of the following table: FIRST

Figure 4.9(a)

To convert it to 1NF there are two solutions:

First solution: Repeat the supplier number for each part supplied. (This is not the best table design since it consumes more space.)

Figure 4.9(b)

Second solution: Split the relation into two separate relations. (This is better since it contains less redundant information.).

Figure 4.9(c)

4.7 Normal Forms
4.7.2 Second Normal Form (2NF)

To be in Second Normal Form (2NF) the relation must be:

in 1NF and
each non-key attribute must be dependent on the whole key (not a subset of the key or partial dependency).

2NF conversion

(Pemp, employer, addr, pay) =>(Pemp, pay), (employer, addr)

Definition. An attribute is a non-key attribute if it is not part of the primary key.

Consider the relation scheme:

FIRST(S#, STATUS, CITY, P#, QTY)

with the meaning “a supplier denoted S#, with a STATUS and located in CITY supplies a part denoted by P# in quantity QTY” and the following dependency rules:

S# -> CITY
S#, P# -> QTY
CITY -> STATUS

Problems in the FIRST relation:

(1) The table is redundant which will result in inconsistencies whenever the table is updated (i.e., more cumbersome updates, thus possible inconsistent data).

(2) Insertion can cause a null primary key, if only S# and CITY is inserted.

(3) Deletion of the 9th tuple wll cause to delete the fact that S3 is in Paris (for table in Figure 4.9b)

The problem arises because the primary key for the relation is (S#, P#) but CITY and STATUS are only dependent on S# (i.e., partial dependency).

To overcome this problem, split the first relation into two relations

Figure 4.10

4.7 Normal Forms
4.7.3 Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a superkey or the FD is trivial.

Let R be a relation schema

X be a subset of the attributes of R
A be an attribute of R

R is in BCNF if

for every FD X -> A, one of the following is true

A is a subset of X (it is a trivial FD), or X is a super-key

Theoretically we must check each FD

X -> A in F+

It is actually sufficient to check

whether the LHS of each FD in F+ is a super-key.

Consider:

STUDENT-ID, MAJOR -> ADVISER
ADVISER -> MAJOR

Non-BCNF

R(STUDENT-ID, MAJOR, ADVISER)

BCNF (Solution: split into two relations)

R1(STUDENT-ID, ADVISER)
R2(ADVISER, MAJOR)

Example on testing of BCNF

R (A, B, C, D), with F = { A -> B, B -> C}
Decompose R into R1(A,B) and R2(A,C,D)

What is the key for R2 ?

Neither of the dependencies in F, A -> B, B -> C, contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. In fact, dependency A -> C in F+ (transitivity) shows R2 is not in BCNF.

Lossless-join Decomposition into BCNF

Given schema R

W Ì Schema R
A is a single attribute
W -> A is a FD that violates BCNF

decompose R into

R - A, and WA

If either is not in BCNF, decompose recursively

the decomposition is lossless-join

Example of BCNF Decomposition

R = (branch-name, branch-city, assets,
customer-name, loan-number, amount)

F = {branch-name -> assets, branch-city
loan-number -> amount, branch-name}

Key = {loan-number, customer-name}

Decomposition

R (F1)

R1 = (branch-name, branch-city, assets)
R2 = (branch-name, customer-name, loan-number, amount)

R2 (F2)

R3 = (branch-name, loan-number, amount)
R4 = (customer-name, loan-number)

Final decomposition

R1, R3, R4

Figure 4.11

Figure 4.12

If there are multiple dependencies (in a relation) that violate BCNF. There may be different ways to decompose, resulting in different collections of BCNF relations. A designer has to choose one based on the semantics of the applications.

4.7 Normal Forms
4.7.4 Third Normal Form (3NF)

To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist within the relation. (i.e. the only determinants it contains are candidate keys)

A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another non-key attribute).

Consider:

The relation SECOND which is already in 2NF still has the processing anomalies.

Problems:

(1) The fact that a city has a certain status cannot be entered unless there is a supplier in that city.

(2) If we delete the row containing S2 and S3 then we lose the fact that Paris has the status 10
(for table in Figure 4.10)

(3) The status of city is repeated for each occurrence of the city.

The problem arises because STATUS is transitively dependent on S# via CITY.

The solution is to split the SECOND relation into two separate relations using the relational algebra projection operator.

Figure 4.13

“Performance is the only reason for not going into 3NF”
Relations in 3NF are sufficient for most practical database applications. However, 3NF does not guarantee that all anomalies have been removed.

Let

R be a relation schema
X be a subset of the attributes of R
A be an attribute of R

R is in 3NF if

for every FD X -> A, one of the following is true

A -> X (it is a trivial FD), or
X is a super-key, or
A is part of some key for R. {not for BCNF}

BCNF is stronger than 3NF and every BCNF relation is also in 3NF

Example of violation of 3NF

Consider the table in figure 4.14. The key is SID, and the functional dependencies are SID -> Building and
Building -> Fee. By transitivity, it implies SID -> Fee.

Figure 4.14

The key of HOUSING is SID is a single attribute. This implies the relation is 2NF since both Building and Fee are determined by SID). However, HOUSING has anomalies due to transitivity.

If the second tuple is deleted, the whole record of fee for Ingersoll is lost. This is deletion anomaly. Furthermore, we cannot insert the fee of another hall unless a student is committed to provide SID for the table. This is insertion anomaly.

The problems can be resolved by splitting the table into 2 as shown in figure 4.15.

Figure 4.15

Example on 3NF vs BCNF

Reserves ( sailorID, boatID, day, creditCardID )

Consider FD sailorID -> creditCardID

That is, a sailor uses a unique credit card to pay for reservations

sailorID is not a superkey
creditCardID is not part of a key

Hence, this relation is not in 3NF.

But if we also know FD creditCardID -> sailorID

That is, credit cards uniquely identify the owner

is also a key

FD1 : SailorID -> creditCardID does not violate 3NF because creditCardID is part of a key
FD2 : creditCardID -> SailorID does not violate 3NF

Reserves is in 3NF but in all tuples with same SailorID value, the same
pair is redundantly stored

Hence, Reserves is not in BCNF

Examples of Decomposition of Relations

Consider

Banker-info-schema = (branch-name, customer-name, banker-name, office-number)

The functional dependencies for this relation schema are:

banker-name -> branch-name, office-number
customer-name, branch-name -> banker-name

The key is: {customer-name, branch-name}

Consider

(branch-name, customer-name, banker-name, office-number)

Is it in 3NF?

banker-name -> branch-name office-number

Not trivial
Not superkey
NOT part of a candidate key

customer-name branch-name -> banker-name

Not trivial
Not superkey
NOT Part of a candidate key

Figure 4.16

Applying 3NF to Banker-info-schema

Banker-office-schema = (banker-name, branch-name, office-number)

Banker-schema = (customer-name, branch-name, banker-name)

Example

Consider the following table

Figure 4.17

The following are 2 FDs

I -> (I,N,L,R,W,H)
R -> W

Since R is not a superkey and W is not part of any key, it is a violation of 3NF.

Decompose INLRWH into INLRH & RW
INLRH has {hkid, name, lot, rating, hourly_wage}

FD I -> INLRH
I is a key for INLRH

RW has {rating, hours_worked}

FD R -> W
R is a key for RW

Both schemas are in BCNF.This decomposition was guided by the FD (R -> W) that violated 3NF

Method: remove W from the main schema

Figure 4.18

4.7 Normal Forms
4.7.5 Fourth Normal Form (4NF)

A relation is in Fourth Normal Form (4NF) if it is in Boyce-Codd Normal Form and contains no multi-valued dependency.

Consider the following course-teacher-text relation:

Figure 4.19

4NF: Solution (split CTX Relation into two relations TEACHER and TEXT

Figure 4.20

4.7 Normal Forms
4.7.6 Fifth Normal Form (5NF)

A relation is in Fifth Normal Form (5NF) if it is in Fourth Normal Form and does not have a join dependency. A join dependency: a relation that has a join dependency cannot be divided into two (or more) relations such that the resulting tables can be recombined to form the original table.

In general, the normalization process includes:

Eliminating repeating groups.
Removing partially-dependent columns.
Removing transitively-dependent columns.

Eliminating repeating groups

When a field in a given row contains more than one value for each occurrence of the primary key, then that group of data items is called a repeating group. This is a violation of the first normal form, which does not allow multi-valued attributes.

Refer to the DEPARTMENT table. For any occurrence of a given primary key, if a column can have more than one value, then this set of values is a repeating group. Therefore, the first row, where DEPT_NO = "100," contains a repeating group in the DEPT_LOCATIONS column.

DEPT_NO

DEPARTMENT

HEAD_DEPT

BUDGET

DEPT_LOCATIONS

100 Sales 000 1000000 Monterey, Santa Cruz, Salinas

600 Engineering 120 1100000 San Francisco

900 Finance 000 400000 Monterey

In the next example, even if you change the attribute to represent only one location, for every occurrence of the primary key "100," all of the columns contain repeating information except for DEPT_LOCATION, so this is still a repeating group.

DEPT_NO

DEPARTMENT

HEAD_DEPT

BUDGET

DEPT_LOCATIONS

100 Sales 000 1000000 Monterey

100 Sales 000 1000000 Santa Cruz

100 Sales 000 1000000 Salinas

600 Engineering 120 1100000 San Francisco

900 Finance 000 400000 Monterey

To normalize this table, we could eliminate the DEPT_LOCATION attribute from the DEPARTMENT table, and create another table called DEPT_LOCATIONS. We could then create a primary key that is a combination of DEPT_NO and DEPT_LOCATION. Now a distinct row exists for each location of the department, and we have eliminated the repeating groups.

DEPT_NO DEPT_LOCATION

100 Monterey

100 Santa Cruz

600 San Francisco

100 Salinas

Removing partially-dependent columns

Another important step in the normalization process is to remove any non-key columns that are dependent on only part of a composite key. Such columns are said to have a partial key dependency. Non-key columns provide information about the subject, but do not uniquely define it.

For example, suppose you wanted to locate an employee by project, and you created the PROJECT table with a composite primary key of EMP_NO and PROJ_ID.

EMP_NO PROJ_ID LAST_NAME PROJ_NAME PROJ_DESC PRODUCT

44 DGPII Smith Automap blob data hardware

47 VBASE Jenner Video database blob data software

24 HWRII Stevens Translator upgrade blob data software

The problem with this table is that PROJ_NAME, PROJ_DESC, and PRODUCT are attributes of PROJ_ID, but not EMP_NO, and are therefore only partially dependent on the EMP_NO/PROJ_ID primary key. This is also true for LAST_NAME because it is an attribute of EMP_NO, but does not relate to PROJ_ID. To normalize this table, we would remove the EMP_NO and LAST_NAME columns from the PROJECT table, and create another table called EMPLOYEE_PROJECT that has EMP_NO and PROJ_ID as a composite primary key. Now a unique row exists for every project that an employee is assigned to.

Removing transitively-dependent columns

The third step in the normalization process is to remove any non-key columns that depend upon other non-key columns. Each non-key column must be a fact about the primary key column. For example, suppose we added TEAM_LEADER_ID and PHONE_EXT to the PROJECT table, and made PROJ_ID the primary key. PHONE_EXT is a fact about TEAM_LEADER_ID, a non-key column, not about PROJ_ID, the primary key column.

PROJ_ID TEAM_LEADER_ID PHONE_EXT PROJ_NAME PROJ_DESC PRODUCT

DGPII 44 4929 Automap blob data hardware

VBASE 47 4967 Video database blob data software

HWRII 24 4668 Translator upgrade blob data software

To normalize this table, we would remove PHONE_EXT, change TEAM_LEADER_ID to TEAM_LEADER, and make TEAM_LEADER a foreign key referencing EMP_NO in the EMPLOYEE table.

PROJ_ID

TEAM_LEADER

PROJ_NAME

PROJ_DESC

PRODUCT

DGPII 44 Automap blob data hardware

VBASE 47 Video database blob data software

HWRII 24 Trans. Upgrade blob data software

EMP_NO LAST_NAME FIRST_NAME DEPT_NO JOB_CODE PHONE_EXT SALARY

24 Smith John 100 Eng 4968 64000

48 Carter Catherine 900 Sales 4967 72500

36 Smith Jane 600 Admin 4800 37500

Msc

Friday, October 26, 2007

1-5NF

How to normalize a database?

Labels

Blog Archive

DEPT_NO	DEPARTMENT	HEAD_DEPT	BUDGET	DEPT_LOCATIONS
100	Sales	000	1000000	Monterey, Santa Cruz, Salinas
600	Engineering	120	1100000	San Francisco
900	Finance	000	400000	Monterey

DEPT_NO	DEPT_LOCATION
100	Monterey
100	Santa Cruz
600	San Francisco
100	Salinas

EMP_NO	PROJ_ID	LAST_NAME	PROJ_NAME	PROJ_DESC	PRODUCT
44	DGPII	Smith	Automap	blob data	hardware
47	VBASE	Jenner	Video database	blob data	software
24	HWRII	Stevens	Translator upgrade	blob data	software

PROJ_ID	TEAM_LEADER_ID	PHONE_EXT	PROJ_NAME	PROJ_DESC	PRODUCT
DGPII	44	4929	Automap	blob data	hardware
VBASE	47	4967	Video database	blob data	software
HWRII	24	4668	Translator upgrade	blob data	software

PROJ_ID	TEAM_LEADER	PROJ_NAME	PROJ_DESC	PRODUCT
DGPII	44	Automap	blob data	hardware
VBASE	47	Video database	blob data	software
HWRII	24	Trans. Upgrade	blob data	software

EMP_NO	LAST_NAME	FIRST_NAME	DEPT_NO	JOB_CODE	PHONE_EXT	SALARY
24	Smith	John	100	Eng	4968	64000
48	Carter	Catherine	900	Sales	4967	72500
36	Smith	Jane	600	Admin	4800	37500