Friday, October 26, 2007

1-5NF

4.7 Normal Forms


There are 6 types of normal forms.

1. First Normal Form (1NF).

Any repeating groups have been removed, so there is a single value at the intersection of each row and column of the table.

2. Second Normal Form (2NF).

Any partial functional dependencies have been removed.

3. Boyce-Codd Normal Form (BCNF).

Any remaining anomalies that result from functional dependencies have been removed.

4. Third Normal Form (3NF).

Any transitive dependencies have been removed.

5. Fourth Normal Form (4NF).

Any multi-valued dependencies have been removed.

6. Fifth Normal Form (5NF).

Any remaining anomalies have been removed.




4.7 Normal Forms
4.7.1 First Normal Form (1NF)


To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted

An example of un-normalized form (UNF) and first normal form (1NF) comparison is shown as follow.

UNF

– Job-Seeker(Jregno, Jsname, (JsQual, Subject, level, award), (Pemp, employer, addr, pay))

1 NF

– Job-Seeker(Jregno, Jsname)
– (JsQual, Subject, level, award)
– (Pemp, employer, addr, pay)

Another example to show the conversion of the following table: FIRST

Figure 4.9(a)

To convert it to 1NF there are two solutions:

First solution: Repeat the supplier number for each part supplied. (This is not the best table design since it consumes more space.)

Figure 4.9(b)

Second solution: Split the relation into two separate relations. (This is better since it contains less redundant information.).

Figure 4.9(c)

4.7 Normal Forms
4.7.2 Second Normal Form (2NF)


To be in Second Normal Form (2NF) the relation must be:

  • in 1NF and
  • each non-key attribute must be dependent on the whole key (not a subset of the key or partial dependency).

2NF conversion

(Pemp, employer, addr, pay) =>(Pemp, pay), (employer, addr)

Definition. An attribute is a non-key attribute if it is not part of the primary key.

Consider the relation scheme:

FIRST(S#, STATUS, CITY, P#, QTY)

with the meaning “a supplier denoted S#, with a STATUS and located in CITY supplies a part denoted by P# in quantity QTY” and the following dependency rules:

S# -> CITY
S#, P# -> QTY
CITY -> STATUS

Problems in the FIRST relation:


(1) The table is redundant which will result in inconsistencies whenever the table is updated (i.e., more cumbersome updates, thus possible inconsistent data).

(2) Insertion can cause a null primary key, if only S# and CITY is inserted.

(3) Deletion of the 9th tuple wll cause to delete the fact that S3 is in Paris (for table in Figure 4.9b)

The problem arises because the primary key for the relation is (S#, P#) but CITY and STATUS are only dependent on S# (i.e., partial dependency).

To overcome this problem, split the first relation into two relations

Figure 4.10

4.7 Normal Forms
4.7.3 Boyce-Codd Normal Form (BCNF)


A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a superkey or the FD is trivial.

Let R be a relation schema

  • X be a subset of the attributes of R
  • A be an attribute of R

R is in BCNF if

for every FD X -> A, one of the following is true

  • A is a subset of X (it is a trivial FD), or X is a super-key

Theoretically we must check each FD

X -> A in F+

It is actually sufficient to check

whether the LHS of each FD in F+ is a super-key.

Consider:

STUDENT-ID, MAJOR -> ADVISER
ADVISER -> MAJOR

Non-BCNF

R(STUDENT-ID, MAJOR, ADVISER)

BCNF (Solution: split into two relations)

R1(STUDENT-ID, ADVISER)
R2(ADVISER, MAJOR)

Example on testing of BCNF

R (A, B, C, D), with F = { A -> B, B -> C}
Decompose R into R1(A,B) and R2(A,C,D)

What is the key for R2 ?

Neither of the dependencies in F, A -> B, B -> C, contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. In fact, dependency A -> C in F+ (transitivity) shows R2 is not in BCNF.

Lossless-join Decomposition into BCNF


Given schema R

  • W Ì Schema R
  • A is a single attribute
  • W -> A is a FD that violates BCNF

decompose R into

R - A, and WA

If either is not in BCNF, decompose recursively

the decomposition is lossless-join

Example of BCNF Decomposition

R = (branch-name, branch-city, assets,
customer-name, loan-number, amount)


F = {branch-name -> assets, branch-city
loan-number -> amount, branch-name}

Key = {loan-number, customer-name}

Decomposition

R (F1)

  • R1 = (branch-name, branch-city, assets)
  • R2 = (branch-name, customer-name, loan-number, amount)

R2 (F2)

  • R3 = (branch-name, loan-number, amount)
  • R4 = (customer-name, loan-number)

Final decomposition

R1, R3, R4

Figure 4.11

Figure 4.12

If there are multiple dependencies (in a relation) that violate BCNF. There may be different ways to decompose, resulting in different collections of BCNF relations. A designer has to choose one based on the semantics of the applications.

4.7 Normal Forms
4.7.4 Third Normal Form (3NF)


To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist within the relation. (i.e. the only determinants it contains are candidate keys)

A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another non-key attribute).

Consider:

The relation SECOND which is already in 2NF still has the processing anomalies.

Problems:

(1) The fact that a city has a certain status cannot be entered unless there is a supplier in that city.

(2) If we delete the row containing S2 and S3 then we lose the fact that Paris has the status 10
(for table in Figure 4.10)

(3) The status of city is repeated for each occurrence of the city.

The problem arises because STATUS is transitively dependent on S# via CITY.

The solution is to split the SECOND relation into two separate relations using the relational algebra projection operator.

Figure 4.13

“Performance is the only reason for not going into 3NF”
Relations in 3NF are sufficient for most practical database applications. However, 3NF does not guarantee that all anomalies have been removed.

Let

  • R be a relation schema
  • X be a subset of the attributes of R
  • A be an attribute of R

R is in 3NF if

for every FD X -> A, one of the following is true

  • A -> X (it is a trivial FD), or
  • X is a super-key, or
  • A is part of some key for R. {not for BCNF}

BCNF is stronger than 3NF and every BCNF relation is also in 3NF


Example of violation of 3NF


Consider the table in figure 4.14. The key is SID, and the functional dependencies are SID -> Building and
Building -> Fee. By transitivity, it implies SID -> Fee.


Figure 4.14

The key of HOUSING is SID is a single attribute. This implies the relation is 2NF since both Building and Fee are determined by SID). However, HOUSING has anomalies due to transitivity.

If the second tuple is deleted, the whole record of fee for Ingersoll is lost. This is deletion anomaly. Furthermore, we cannot insert the fee of another hall unless a student is committed to provide SID for the table. This is insertion anomaly.

The problems can be resolved by splitting the table into 2 as shown in figure 4.15.


Figure 4.15

Example on 3NF vs BCNF


Reserves ( sailorID, boatID, day, creditCardID )

Consider FD sailorID -> creditCardID

That is, a sailor uses a unique credit card to pay for reservations

  • sailorID is not a superkey
  • creditCardID is not part of a key

Hence, this relation is not in 3NF.

But if we also know FD creditCardID -> sailorID

That is, credit cards uniquely identify the owner

  • is also a key

  • FD1 : SailorID -> creditCardID does not violate 3NF because creditCardID is part of a key
  • FD2 : creditCardID -> SailorID does not violate 3NF

Reserves is in 3NF but in all tuples with same SailorID value, the same
pair is redundantly stored

Hence, Reserves is not in BCNF

Examples of Decomposition of Relations


Consider

Banker-info-schema = (branch-name, customer-name, banker-name, office-number)

The functional dependencies for this relation schema are:

  • banker-name -> branch-name, office-number
  • customer-name, branch-name -> banker-name

The key is: {customer-name, branch-name}

Consider

(branch-name, customer-name, banker-name, office-number)

Is it in 3NF?

banker-name -> branch-name office-number

  • Not trivial
  • Not superkey
  • NOT part of a candidate key

customer-name branch-name -> banker-name

  • Not trivial
  • Not superkey
  • NOT Part of a candidate key


Figure 4.16

Applying 3NF to Banker-info-schema

Banker-office-schema = (banker-name, branch-name, office-number)

Banker-schema = (customer-name, branch-name, banker-name)

Example

Consider the following table

Figure 4.17

The following are 2 FDs

  • I -> (I,N,L,R,W,H)
  • R -> W

Since R is not a superkey and W is not part of any key, it is a violation of 3NF.

Decompose INLRWH into INLRH & RW
INLRH has {hkid, name, lot, rating, hourly_wage}

  • FD I -> INLRH
  • I is a key for INLRH

RW has {rating, hours_worked}

  • FD R -> W
  • R is a key for RW

Both schemas are in BCNF.This decomposition was guided by the FD (R -> W) that violated 3NF

Method: remove W from the main schema

Figure 4.18

4.7 Normal Forms
4.7.5 Fourth Normal Form (4NF)


A relation is in Fourth Normal Form (4NF) if it is in Boyce-Codd Normal Form and contains no multi-valued dependency.

Consider the following course-teacher-text relation:

Figure 4.19

4NF: Solution (split CTX Relation into two relations TEACHER and TEXT

Figure 4.20

4.7 Normal Forms
4.7.6 Fifth Normal Form (5NF)


A relation is in Fifth Normal Form (5NF) if it is in Fourth Normal Form and does not have a join dependency. A join dependency: a relation that has a join dependency cannot be divided into two (or more) relations such that the resulting tables can be recombined to form the original table.

How to normalize a database?

In general, the normalization process includes:

  • Eliminating repeating groups.
  • Removing partially-dependent columns.
  • Removing transitively-dependent columns.

Eliminating repeating groups

When a field in a given row contains more than one value for each occurrence of the primary key, then that group of data items is called a repeating group. This is a violation of the first normal form, which does not allow multi-valued attributes.

Refer to the DEPARTMENT table. For any occurrence of a given primary key, if a column can have more than one value, then this set of values is a repeating group. Therefore, the first row, where DEPT_NO = "100," contains a repeating group in the DEPT_LOCATIONS column.

DEPT_NO
DEPARTMENT
HEAD_DEPT
BUDGET
DEPT_LOCATIONS
100 Sales 000 1000000 Monterey, Santa Cruz, Salinas
600 Engineering 120 1100000 San Francisco
900 Finance 000 400000 Monterey

In the next example, even if you change the attribute to represent only one location, for every occurrence of the primary key "100," all of the columns contain repeating information except for DEPT_LOCATION, so this is still a repeating group.

DEPT_NO
DEPARTMENT
HEAD_DEPT
BUDGET
DEPT_LOCATIONS
100 Sales 000 1000000 Monterey
100 Sales 000 1000000 Santa Cruz
100 Sales 000 1000000 Salinas
600 Engineering 120 1100000 San Francisco
900 Finance 000 400000 Monterey

To normalize this table, we could eliminate the DEPT_LOCATION attribute from the DEPARTMENT table, and create another table called DEPT_LOCATIONS. We could then create a primary key that is a combination of DEPT_NO and DEPT_LOCATION. Now a distinct row exists for each location of the department, and we have eliminated the repeating groups.

DEPT_NO DEPT_LOCATION
100 Monterey
100 Santa Cruz
600 San Francisco
100 Salinas

Removing partially-dependent columns

Another important step in the normalization process is to remove any non-key columns that are dependent on only part of a composite key. Such columns are said to have a partial key dependency. Non-key columns provide information about the subject, but do not uniquely define it.

For example, suppose you wanted to locate an employee by project, and you created the PROJECT table with a composite primary key of EMP_NO and PROJ_ID.

EMP_NO PROJ_ID LAST_NAME PROJ_NAME PROJ_DESC PRODUCT
44 DGPII Smith Automap blob data hardware
47 VBASE Jenner Video database blob data software
24 HWRII Stevens Translator upgrade blob data software

The problem with this table is that PROJ_NAME, PROJ_DESC, and PRODUCT are attributes of PROJ_ID, but not EMP_NO, and are therefore only partially dependent on the EMP_NO/PROJ_ID primary key. This is also true for LAST_NAME because it is an attribute of EMP_NO, but does not relate to PROJ_ID. To normalize this table, we would remove the EMP_NO and LAST_NAME columns from the PROJECT table, and create another table called EMPLOYEE_PROJECT that has EMP_NO and PROJ_ID as a composite primary key. Now a unique row exists for every project that an employee is assigned to.

Removing transitively-dependent columns

The third step in the normalization process is to remove any non-key columns that depend upon other non-key columns. Each non-key column must be a fact about the primary key column. For example, suppose we added TEAM_LEADER_ID and PHONE_EXT to the PROJECT table, and made PROJ_ID the primary key. PHONE_EXT is a fact about TEAM_LEADER_ID, a non-key column, not about PROJ_ID, the primary key column.

PROJ_ID TEAM_LEADER_ID PHONE_EXT PROJ_NAME PROJ_DESC PRODUCT
DGPII 44 4929 Automap blob data hardware
VBASE 47 4967 Video database blob data software
HWRII 24 4668 Translator upgrade blob data software

To normalize this table, we would remove PHONE_EXT, change TEAM_LEADER_ID to TEAM_LEADER, and make TEAM_LEADER a foreign key referencing EMP_NO in the EMPLOYEE table.

PROJ_ID
TEAM_LEADER
PROJ_NAME
PROJ_DESC
PRODUCT
DGPII 44 Automap blob data hardware
VBASE 47 Video database blob data software
HWRII 24 Trans. Upgrade blob data software

EMP_NO LAST_NAME FIRST_NAME DEPT_NO JOB_CODE PHONE_EXT SALARY
24 Smith John 100 Eng 4968 64000
48 Carter Catherine 900 Sales 4967 72500
36 Smith Jane 600 Admin 4800 37500