Friday, October 26, 2007

1-5NF

4.7 Normal Forms


There are 6 types of normal forms.

1. First Normal Form (1NF).

Any repeating groups have been removed, so there is a single value at the intersection of each row and column of the table.

2. Second Normal Form (2NF).

Any partial functional dependencies have been removed.

3. Boyce-Codd Normal Form (BCNF).

Any remaining anomalies that result from functional dependencies have been removed.

4. Third Normal Form (3NF).

Any transitive dependencies have been removed.

5. Fourth Normal Form (4NF).

Any multi-valued dependencies have been removed.

6. Fifth Normal Form (5NF).

Any remaining anomalies have been removed.




4.7 Normal Forms
4.7.1 First Normal Form (1NF)


To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted

An example of un-normalized form (UNF) and first normal form (1NF) comparison is shown as follow.

UNF

– Job-Seeker(Jregno, Jsname, (JsQual, Subject, level, award), (Pemp, employer, addr, pay))

1 NF

– Job-Seeker(Jregno, Jsname)
– (JsQual, Subject, level, award)
– (Pemp, employer, addr, pay)

Another example to show the conversion of the following table: FIRST

Figure 4.9(a)

To convert it to 1NF there are two solutions:

First solution: Repeat the supplier number for each part supplied. (This is not the best table design since it consumes more space.)

Figure 4.9(b)

Second solution: Split the relation into two separate relations. (This is better since it contains less redundant information.).

Figure 4.9(c)

4.7 Normal Forms
4.7.2 Second Normal Form (2NF)


To be in Second Normal Form (2NF) the relation must be:

  • in 1NF and
  • each non-key attribute must be dependent on the whole key (not a subset of the key or partial dependency).

2NF conversion

(Pemp, employer, addr, pay) =>(Pemp, pay), (employer, addr)

Definition. An attribute is a non-key attribute if it is not part of the primary key.

Consider the relation scheme:

FIRST(S#, STATUS, CITY, P#, QTY)

with the meaning “a supplier denoted S#, with a STATUS and located in CITY supplies a part denoted by P# in quantity QTY” and the following dependency rules:

S# -> CITY
S#, P# -> QTY
CITY -> STATUS

Problems in the FIRST relation:


(1) The table is redundant which will result in inconsistencies whenever the table is updated (i.e., more cumbersome updates, thus possible inconsistent data).

(2) Insertion can cause a null primary key, if only S# and CITY is inserted.

(3) Deletion of the 9th tuple wll cause to delete the fact that S3 is in Paris (for table in Figure 4.9b)

The problem arises because the primary key for the relation is (S#, P#) but CITY and STATUS are only dependent on S# (i.e., partial dependency).

To overcome this problem, split the first relation into two relations

Figure 4.10

4.7 Normal Forms
4.7.3 Boyce-Codd Normal Form (BCNF)


A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a superkey or the FD is trivial.

Let R be a relation schema

  • X be a subset of the attributes of R
  • A be an attribute of R

R is in BCNF if

for every FD X -> A, one of the following is true

  • A is a subset of X (it is a trivial FD), or X is a super-key

Theoretically we must check each FD

X -> A in F+

It is actually sufficient to check

whether the LHS of each FD in F+ is a super-key.

Consider:

STUDENT-ID, MAJOR -> ADVISER
ADVISER -> MAJOR

Non-BCNF

R(STUDENT-ID, MAJOR, ADVISER)

BCNF (Solution: split into two relations)

R1(STUDENT-ID, ADVISER)
R2(ADVISER, MAJOR)

Example on testing of BCNF

R (A, B, C, D), with F = { A -> B, B -> C}
Decompose R into R1(A,B) and R2(A,C,D)

What is the key for R2 ?

Neither of the dependencies in F, A -> B, B -> C, contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. In fact, dependency A -> C in F+ (transitivity) shows R2 is not in BCNF.

Lossless-join Decomposition into BCNF


Given schema R

  • W Ì Schema R
  • A is a single attribute
  • W -> A is a FD that violates BCNF

decompose R into

R - A, and WA

If either is not in BCNF, decompose recursively

the decomposition is lossless-join

Example of BCNF Decomposition

R = (branch-name, branch-city, assets,
customer-name, loan-number, amount)


F = {branch-name -> assets, branch-city
loan-number -> amount, branch-name}

Key = {loan-number, customer-name}

Decomposition

R (F1)

  • R1 = (branch-name, branch-city, assets)
  • R2 = (branch-name, customer-name, loan-number, amount)

R2 (F2)

  • R3 = (branch-name, loan-number, amount)
  • R4 = (customer-name, loan-number)

Final decomposition

R1, R3, R4

Figure 4.11

Figure 4.12

If there are multiple dependencies (in a relation) that violate BCNF. There may be different ways to decompose, resulting in different collections of BCNF relations. A designer has to choose one based on the semantics of the applications.

4.7 Normal Forms
4.7.4 Third Normal Form (3NF)


To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist within the relation. (i.e. the only determinants it contains are candidate keys)

A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another non-key attribute).

Consider:

The relation SECOND which is already in 2NF still has the processing anomalies.

Problems:

(1) The fact that a city has a certain status cannot be entered unless there is a supplier in that city.

(2) If we delete the row containing S2 and S3 then we lose the fact that Paris has the status 10
(for table in Figure 4.10)

(3) The status of city is repeated for each occurrence of the city.

The problem arises because STATUS is transitively dependent on S# via CITY.

The solution is to split the SECOND relation into two separate relations using the relational algebra projection operator.

Figure 4.13

“Performance is the only reason for not going into 3NF”
Relations in 3NF are sufficient for most practical database applications. However, 3NF does not guarantee that all anomalies have been removed.

Let

  • R be a relation schema
  • X be a subset of the attributes of R
  • A be an attribute of R

R is in 3NF if

for every FD X -> A, one of the following is true

  • A -> X (it is a trivial FD), or
  • X is a super-key, or
  • A is part of some key for R. {not for BCNF}

BCNF is stronger than 3NF and every BCNF relation is also in 3NF


Example of violation of 3NF


Consider the table in figure 4.14. The key is SID, and the functional dependencies are SID -> Building and
Building -> Fee. By transitivity, it implies SID -> Fee.


Figure 4.14

The key of HOUSING is SID is a single attribute. This implies the relation is 2NF since both Building and Fee are determined by SID). However, HOUSING has anomalies due to transitivity.

If the second tuple is deleted, the whole record of fee for Ingersoll is lost. This is deletion anomaly. Furthermore, we cannot insert the fee of another hall unless a student is committed to provide SID for the table. This is insertion anomaly.

The problems can be resolved by splitting the table into 2 as shown in figure 4.15.


Figure 4.15

Example on 3NF vs BCNF


Reserves ( sailorID, boatID, day, creditCardID )

Consider FD sailorID -> creditCardID

That is, a sailor uses a unique credit card to pay for reservations

  • sailorID is not a superkey
  • creditCardID is not part of a key

Hence, this relation is not in 3NF.

But if we also know FD creditCardID -> sailorID

That is, credit cards uniquely identify the owner

  • is also a key

  • FD1 : SailorID -> creditCardID does not violate 3NF because creditCardID is part of a key
  • FD2 : creditCardID -> SailorID does not violate 3NF

Reserves is in 3NF but in all tuples with same SailorID value, the same
pair is redundantly stored

Hence, Reserves is not in BCNF

Examples of Decomposition of Relations


Consider

Banker-info-schema = (branch-name, customer-name, banker-name, office-number)

The functional dependencies for this relation schema are:

  • banker-name -> branch-name, office-number
  • customer-name, branch-name -> banker-name

The key is: {customer-name, branch-name}

Consider

(branch-name, customer-name, banker-name, office-number)

Is it in 3NF?

banker-name -> branch-name office-number

  • Not trivial
  • Not superkey
  • NOT part of a candidate key

customer-name branch-name -> banker-name

  • Not trivial
  • Not superkey
  • NOT Part of a candidate key


Figure 4.16

Applying 3NF to Banker-info-schema

Banker-office-schema = (banker-name, branch-name, office-number)

Banker-schema = (customer-name, branch-name, banker-name)

Example

Consider the following table

Figure 4.17

The following are 2 FDs

  • I -> (I,N,L,R,W,H)
  • R -> W

Since R is not a superkey and W is not part of any key, it is a violation of 3NF.

Decompose INLRWH into INLRH & RW
INLRH has {hkid, name, lot, rating, hourly_wage}

  • FD I -> INLRH
  • I is a key for INLRH

RW has {rating, hours_worked}

  • FD R -> W
  • R is a key for RW

Both schemas are in BCNF.This decomposition was guided by the FD (R -> W) that violated 3NF

Method: remove W from the main schema

Figure 4.18

4.7 Normal Forms
4.7.5 Fourth Normal Form (4NF)


A relation is in Fourth Normal Form (4NF) if it is in Boyce-Codd Normal Form and contains no multi-valued dependency.

Consider the following course-teacher-text relation:

Figure 4.19

4NF: Solution (split CTX Relation into two relations TEACHER and TEXT

Figure 4.20

4.7 Normal Forms
4.7.6 Fifth Normal Form (5NF)


A relation is in Fifth Normal Form (5NF) if it is in Fourth Normal Form and does not have a join dependency. A join dependency: a relation that has a join dependency cannot be divided into two (or more) relations such that the resulting tables can be recombined to form the original table.

No comments: