Normalization is a process of organizing data in a database to avoid data redundancy and to eliminate insert anomaly, delete anomaly, and update anomaly.
Here, anomaly refers to error
Now, Data redundancy means similar data in multiple rows. Let’s take an example to explain those anomalies.
When you updating the data in a table, you should have to update it in one place. Consider Fig-a, Employee ID 1 has an address called Address 1. Now, Name 1 move to a new address from Address 1 to Address 1a. In this database(Fig-a), you have to change data in multiple rows(i.e., first and second rows).
Here, it says that you want to change data in a table but it will update the data in multiple places then that is called update anomaly.
When you deleting the data in a table, you should have to delete the particular data only. Consider Fig-a, Employee ID 3 working in a department called Department 2. Now, the company has decided to remove Department 2 from the departments. In this database(Fig-a), Employee ID 3 data are also deleted(i.e., fourth row).
Here, it says that you want to delete a particular data in a table but it will delete other's data also then that is called delete anomaly.
When you inserting the data in a table, you should have to insert all the data in a row. Consider Fig-a, Employee ID 5, and 6 are taking as an intern in the company. After the completion of their internship, they will be allocated to their respective departments. Now they have no departments. (i.e., sixth and seventh row).
Here, it says that you want to add a new row in the table, then the row should be “ fully completed no one is empty ”.
Types of Normalization:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form:
Each column in a table must have atomic(single) value which means we have to eliminate multiple values in a column for that we have to break a table into two or more tables. Let’s take an example,
Here, the colour column has two values so it is not satisfying the 1NF. To satisfy 1NF, we have to break it into 2 tables.
Now, the database table satisfying 1NF.
Second Normal Form:
In 2NF, it should satisfy the following two conditions.
- It should satisfy the first normal form.
- No non-prime attribute is dependent on the proper subset of any candidate key in a table
Candidate key : A minimal set of attributes that can uniquely identify the row and it has to be not-null. Example - Student roll number in the student table.
Prime attribute : An attribute that is a part of any candidate key.
Example - Student roll number is a candidate key and student name is a prime attribute.
Non-prime attribute : An attribute that is not part of any candidate key.
Example - Student roll number is a candidate key and student address is a non-prime attribute.
Here, Customer ID and Store ID both are the primary keys which mean composite primary key and the location depends on the Store ID which is a part of a primary key. So, it is not satisfying the 2NF. Let’s break it into two tables.
Now, Store ID depends on the Customer ID in the purchase table and Location depends on the Store ID in the store table.
Third Normal Form:
In 3NF, it should satisfy the following two conditions.
- It should satisfy the second normal form.
- At least one of the functional dependencies between X and Y (X ->Y) should be happening.
-> X is a super key of the table.
-> Y is a prime attribute of the table.
Super key : A set of one or more attributes within a table whose values can uniquely identify the row. A candidate key is a super key but vice versa is not true.
In this table, Book ID determines Genre ID and Genre ID determines Genre type which means Book ID determines Genre type via Genre ID and we have a transitive functional dependency( X -> Z -> Y ). Here, the Genre type is a prime attribute. Let’s break it out,
Now, the book table has a functional dependency of X -> Z and the Genre table has a functional dependency of Z -> Y.
X — Book ID
Y — Genre ID
Z — Genre type
Boyce-Codd Normal Form:
It is an advanced version of 3NF that why it is also called as 3.5NF, it should satisfy the following two conditions.
- It should satisfy the third normal form.
- It should be in functional dependency X ->Y which is “ X should be the super key of the table ”.
In this table, the unique combination of Student and Course determines the Teacher and also Teacher determines the Course. The problem is, the teacher is not a super key but it determines the course. To satisfy the BCNF, we have to break the table.
Fourth Normal Form:
In 4NF, it should satisfy the following two conditions.
- It should satisfy the Boyce-Codd normal form.
- It should not contain more than one multivalued dependency.
Here, key columns are Student, major, and Hobby but this table defines two multivalued dependencies. The student defines a major and it also defines a hobby. So, let’s divide it.
Now, it is divided into two tables which are a major table and a hobby table.
Fifth Normal Form:
If a database is said to be in 5NF then it should satisfy the fourth normal form and if we decompose a table into furthermore tables to avoid redundancy and anomalies, then we can rejoin all the tables through candidate keys, “ we should not lose the original data and we should not add any new data also ”. Let’s take an example,
In this table, we have seller, company, and product columns. Here, Adhitya has the same company but different products, and Abhinav have different companies and different products. Now, we divide this table into two tables which are the seller-company table and seller-product table.
If we combine those two tables, we will lose some data. Here, Aman has only one company which is coca-cola and he has only one product so, we can join Aman’s details easily. Adhitya has only one company and he has three products so he definitely would have bought those products in one company then we can join these details also. Next, Abhinav has two companies and he has two products here, we can’t join Abhinav details without a company-product table.
Now, by referring to the Company-Product table we can easily join Abhinav details also.
To create an accurate database, we have to do a normalization process at a hierarchical level.