Friday, October 26, 2007

How to normalize a database?

In general, the normalization process includes:

  • Eliminating repeating groups.
  • Removing partially-dependent columns.
  • Removing transitively-dependent columns.

Eliminating repeating groups

When a field in a given row contains more than one value for each occurrence of the primary key, then that group of data items is called a repeating group. This is a violation of the first normal form, which does not allow multi-valued attributes.

Refer to the DEPARTMENT table. For any occurrence of a given primary key, if a column can have more than one value, then this set of values is a repeating group. Therefore, the first row, where DEPT_NO = "100," contains a repeating group in the DEPT_LOCATIONS column.

DEPT_NO
DEPARTMENT
HEAD_DEPT
BUDGET
DEPT_LOCATIONS
100 Sales 000 1000000 Monterey, Santa Cruz, Salinas
600 Engineering 120 1100000 San Francisco
900 Finance 000 400000 Monterey

In the next example, even if you change the attribute to represent only one location, for every occurrence of the primary key "100," all of the columns contain repeating information except for DEPT_LOCATION, so this is still a repeating group.

DEPT_NO
DEPARTMENT
HEAD_DEPT
BUDGET
DEPT_LOCATIONS
100 Sales 000 1000000 Monterey
100 Sales 000 1000000 Santa Cruz
100 Sales 000 1000000 Salinas
600 Engineering 120 1100000 San Francisco
900 Finance 000 400000 Monterey

To normalize this table, we could eliminate the DEPT_LOCATION attribute from the DEPARTMENT table, and create another table called DEPT_LOCATIONS. We could then create a primary key that is a combination of DEPT_NO and DEPT_LOCATION. Now a distinct row exists for each location of the department, and we have eliminated the repeating groups.

DEPT_NO DEPT_LOCATION
100 Monterey
100 Santa Cruz
600 San Francisco
100 Salinas

Removing partially-dependent columns

Another important step in the normalization process is to remove any non-key columns that are dependent on only part of a composite key. Such columns are said to have a partial key dependency. Non-key columns provide information about the subject, but do not uniquely define it.

For example, suppose you wanted to locate an employee by project, and you created the PROJECT table with a composite primary key of EMP_NO and PROJ_ID.

EMP_NO PROJ_ID LAST_NAME PROJ_NAME PROJ_DESC PRODUCT
44 DGPII Smith Automap blob data hardware
47 VBASE Jenner Video database blob data software
24 HWRII Stevens Translator upgrade blob data software

The problem with this table is that PROJ_NAME, PROJ_DESC, and PRODUCT are attributes of PROJ_ID, but not EMP_NO, and are therefore only partially dependent on the EMP_NO/PROJ_ID primary key. This is also true for LAST_NAME because it is an attribute of EMP_NO, but does not relate to PROJ_ID. To normalize this table, we would remove the EMP_NO and LAST_NAME columns from the PROJECT table, and create another table called EMPLOYEE_PROJECT that has EMP_NO and PROJ_ID as a composite primary key. Now a unique row exists for every project that an employee is assigned to.

Removing transitively-dependent columns

The third step in the normalization process is to remove any non-key columns that depend upon other non-key columns. Each non-key column must be a fact about the primary key column. For example, suppose we added TEAM_LEADER_ID and PHONE_EXT to the PROJECT table, and made PROJ_ID the primary key. PHONE_EXT is a fact about TEAM_LEADER_ID, a non-key column, not about PROJ_ID, the primary key column.

PROJ_ID TEAM_LEADER_ID PHONE_EXT PROJ_NAME PROJ_DESC PRODUCT
DGPII 44 4929 Automap blob data hardware
VBASE 47 4967 Video database blob data software
HWRII 24 4668 Translator upgrade blob data software

To normalize this table, we would remove PHONE_EXT, change TEAM_LEADER_ID to TEAM_LEADER, and make TEAM_LEADER a foreign key referencing EMP_NO in the EMPLOYEE table.

PROJ_ID
TEAM_LEADER
PROJ_NAME
PROJ_DESC
PRODUCT
DGPII 44 Automap blob data hardware
VBASE 47 Video database blob data software
HWRII 24 Trans. Upgrade blob data software

EMP_NO LAST_NAME FIRST_NAME DEPT_NO JOB_CODE PHONE_EXT SALARY
24 Smith John 100 Eng 4968 64000
48 Carter Catherine 900 Sales 4967 72500
36 Smith Jane 600 Admin 4800 37500


No comments: