What is Normalisation in a database?


Let's consider an example to illustrate normalization in a database.

Suppose we have a database that stores information about books and authors. Initially, we might design a single table to hold all the information:

Table: Books

| Book ID | Title | Author | Year Published | Genre |
|---|---|---|---|---|
| 1 | Book A | Author X | 2005 | Fiction |
| 2 | Book B | Author Y | 2010 | Non-Fiction |
| 3 | Book C | Author X | 2015 | Fiction |

In this scenario, we can observe some data redundancy. The author's name "Author X" is repeated for multiple books. If an author changes their name or other details, we would need to update all the relevant rows, which can be error-prone and inefficient.

To address this, we can normalize the database by creating separate tables and establishing relationships between them. Here's an example of a normalized design:

Table: Authors

| Author ID | Author Name |
|---|---|
| 1 | Author X |
| 2 | Author Y |

Table: Books

| Book ID | Title | Year Published | Genre | Author ID |
|---|---|---|---|---|
| 1 | Book A | 2005 | Fiction | 1 |
| 2 | Book B | 2010 | Non-Fiction | 2 |
| 3 | Book C | 2015 | Fiction | 1 |

In this normalized design, we have moved the authors' information into its own "Authors" table. Each author is assigned a unique "Author ID", which is then used as a foreign key in the "Books" table to link each book to its author.

By normalizing the database, we eliminate data redundancy. Now, if we need to update an author's information, we only need to modify a single row in the "Authors" table.
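To make this concrete, here is a minimal sketch of the normalized design using Python's built-in `sqlite3` module. The snake_case table and column names and the replacement name "Author Z" are assumptions chosen for illustration; the rows are the ones from the tables above.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
    CREATE TABLE authors (
        author_id   INTEGER PRIMARY KEY,
        author_name TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id        INTEGER PRIMARY KEY,
        title          TEXT NOT NULL,
        year_published INTEGER,
        genre          TEXT,
        author_id      INTEGER NOT NULL REFERENCES authors(author_id)
    );
""")

conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(1, "Author X"), (2, "Author Y")],
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Book A", 2005, "Fiction", 1),
        (2, "Book B", 2010, "Non-Fiction", 2),
        (3, "Book C", 2015, "Fiction", 1),
    ],
)

# Renaming the author is a single-row update, however many books they wrote.
# "Author Z" is a made-up replacement name.
cursor = conn.execute(
    "UPDATE authors SET author_name = ? WHERE author_id = ?",
    ("Author Z", 1),
)
rows_changed = cursor.rowcount
conn.commit()

print(rows_changed)  # 1
```

In the original single-table design, the same rename would have had to touch every book row that carried the author's name.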

Why Normalisation?

Normalisation also provides other benefits. For example, if we want to query all books written by a specific author, we can easily join the "Books" and "Authors" tables using the "Author ID" column.
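The join described above might look like the following sketch, again using Python's `sqlite3`. The helper names `make_library` and `books_by_author`, and the snake_case column names, are hypothetical; only the data comes from the example tables.

```python
import sqlite3


def make_library():
    """Build the example Authors and Books tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (
            author_id   INTEGER PRIMARY KEY,
            author_name TEXT NOT NULL
        );
        CREATE TABLE books (
            book_id        INTEGER PRIMARY KEY,
            title          TEXT NOT NULL,
            year_published INTEGER,
            genre          TEXT,
            author_id      INTEGER NOT NULL REFERENCES authors(author_id)
        );
        INSERT INTO authors VALUES (1, 'Author X'), (2, 'Author Y');
        INSERT INTO books VALUES
            (1, 'Book A', 2005, 'Fiction', 1),
            (2, 'Book B', 2010, 'Non-Fiction', 2),
            (3, 'Book C', 2015, 'Fiction', 1);
    """)
    return conn


def books_by_author(conn, author_name):
    """Return the titles written by the named author, in Book ID order."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books AS b
        JOIN authors AS a ON a.author_id = b.author_id
        WHERE a.author_name = ?
        ORDER BY b.book_id
        """,
        (author_name,),
    )
    return [title for (title,) in rows]


conn = make_library()
print(books_by_author(conn, "Author X"))  # ['Book A', 'Book C']
```

The query filters on the author's name but matches rows through "Author ID", so it keeps working if the name is later changed in the "Authors" table.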

By breaking down the data into separate tables and establishing relationships, normalization helps ensure data consistency, reduce data redundancy, and facilitate efficient querying and maintenance of the database.

What is Atomicity?

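In the context of transactions, atomicity means that a transaction is treated as a single, indivisible unit of work: either all of its operations take effect, or none of them do. (This is a different use of the word from the "atomic values" of first normal form, discussed below.) Atomicity is enforced by the database management system (DBMS) through transaction management mechanisms such as transaction logs and rollback operations. If any part of a transaction fails or encounters an error, the DBMS rolls back the entire transaction, undoing all the changes made during that transaction.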

For example, consider a banking system where a transaction transfers funds from one account to another. Atomicity ensures that the withdrawal from the sender's account and the deposit into the recipient's account either both take effect or neither does. If an error occurs during either operation, the entire transaction is rolled back, and the accounts are left unchanged.
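The transfer scenario can be sketched with Python's `sqlite3`, where using the connection as a context manager commits on success and rolls back if an exception escapes the block. The `accounts` table, the `transfer` and `balances` helpers, and the opening balances are all assumptions made up for this illustration. The deposit is deliberately applied first so that a failed withdrawal shows the deposit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)  -- no overdrafts
    )
""")
with conn:  # commit the opening balances
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])


def transfer(conn, sender, recipient, amount):
    """Move `amount` between accounts; return True if committed, False if rolled back."""
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                (amount, recipient),
            )
            # Violates the CHECK constraint if the sender cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT account_id, balance FROM accounts"))


succeeded = transfer(conn, 1, 2, 30)   # enough funds: both updates are committed
rejected = transfer(conn, 1, 2, 500)   # insufficient funds: the deposit is undone too
print(succeeded, rejected, balances(conn))  # True False {1: 70, 2: 80}
```

After the rejected transfer, the recipient's balance is back at 80 even though the deposit statement had already run, which is the all-or-nothing behavior described above.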

By maintaining atomicity, databases can ensure data consistency and integrity, providing reliable and predictable behavior when executing transactions. It helps to prevent data corruption and maintain the accuracy and reliability of the database.

What are normal forms (1NF, 2NF, and 3NF)?

Let's dive deeper into each of the normal forms (1NF, 2NF, and 3NF) with examples.

  1. First Normal Form (1NF):

    • 1NF ensures that each column in a table contains only atomic values and there are no repeating groups or arrays within the table.

    • Atomic values cannot be further divided. Each value in a column should be indivisible.

    • Here's an example to illustrate achieving 1NF:

Table: Students

| Student ID | Name | Courses |
|---|---|---|
| 1 | John Doe | Math, Science |
| 2 | Jane Smith | English, Math |

In the initial design, the "Courses" column contains multiple values separated by commas. To achieve 1NF, we need to break down the courses into separate rows:

Table: Students

| Student ID | Name | Course |
|---|---|---|
| 1 | John Doe | Math |
| 1 | John Doe | Science |
| 2 | Jane Smith | English |
| 2 | Jane Smith | Math |

By separating the courses into individual rows, we ensure that every column holds a single atomic value and eliminate the repeating groups within the table.
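As a small illustration of this step, the sketch below turns the comma-separated "Courses" values into one row per student and course in plain Python. The function name `to_first_normal_form` is hypothetical; the rows are the ones from the tables above.

```python
# Rows of the initial design: (student_id, name, comma-separated courses)
unnormalized = [
    (1, "John Doe", "Math, Science"),
    (2, "Jane Smith", "English, Math"),
]


def to_first_normal_form(rows):
    """Emit one (student_id, name, course) row per course listed for a student."""
    return [
        (student_id, name, course.strip())
        for student_id, name, courses in rows
        for course in courses.split(",")
    ]


students = to_first_normal_form(unnormalized)
for row in students:
    print(row)
# (1, 'John Doe', 'Math')
# (1, 'John Doe', 'Science')
# (2, 'Jane Smith', 'English')
# (2, 'Jane Smith', 'Math')
```

In the resulting table, no single column identifies a row any more; the pair ("Student ID", "Course") does.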

  2. Second Normal Form (2NF):

    • 2NF builds upon 1NF and requires that all non-key attributes in a table are dependent on the entire primary key.

    • Non-key attributes should depend on the entire primary key, not just part of it.

    • Here's an example to illustrate achieving 2NF:

Table: Orders

| Order ID | Customer ID | Product ID | Product Name | Quantity |
|---|---|---|---|---|
| 1 | 101 | 1 | Book A | 2 |
| 2 | 101 | 2 | Book B | 1 |
| 3 | 102 | 1 | Book A | 3 |

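In this example, the primary key is composed of both "Order ID" and "Product ID". However, "Product Name" depends on "Product ID" only, not on the entire primary key. (Strictly speaking, "Customer ID" likewise depends on "Order ID" alone and would also be moved out in a full decomposition; this example focuses on "Product Name".) To achieve 2NF, we can split the table into two: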

Table: Orders

| Order ID | Customer ID | Product ID | Quantity |
|---|---|---|---|
| 1 | 101 | 1 | 2 |
| 2 | 101 | 2 | 1 |
| 3 | 102 | 1 | 3 |

Table: Products

| Product ID | Product Name |
|---|---|
| 1 | Book A |
| 2 | Book B |

By creating a separate "Products" table, we ensure that the non-key attribute "Product Name" is dependent on the entire primary key of the "Products" table.
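The split can be checked with a short plain-Python sketch: it separates the product names from the order rows and then rejoins them to confirm that no information was lost. The function names `decompose` and `rejoin` are hypothetical, and the sketch assumes each "Product ID" maps to exactly one "Product Name", as in the example data.

```python
# Rows of the original table:
# (order_id, customer_id, product_id, product_name, quantity)
original = [
    (1, 101, 1, "Book A", 2),
    (2, 101, 2, "Book B", 1),
    (3, 102, 1, "Book A", 3),
]


def decompose(rows):
    """Split rows into order lines and a product_id -> product_name lookup."""
    products = {}
    order_lines = []
    for order_id, customer_id, product_id, product_name, quantity in rows:
        # Each product name is stored once, keyed by its ID
        # (assumes one name per product ID).
        products.setdefault(product_id, product_name)
        order_lines.append((order_id, customer_id, product_id, quantity))
    return order_lines, products


def rejoin(order_lines, products):
    """Rebuild the original rows by looking each product name up by its ID."""
    return [
        (order_id, customer_id, product_id, products[product_id], quantity)
        for order_id, customer_id, product_id, quantity in order_lines
    ]


order_lines, products = decompose(original)
print(products)                                   # {1: 'Book A', 2: 'Book B'}
print(rejoin(order_lines, products) == original)  # True
```

Because "Book A" now appears once in the lookup instead of once per order, renaming a product is a single change.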

  3. Third Normal Form (3NF):

    • 3NF extends 2NF and ensures that there are no transitive dependencies between non-key attributes.

    • Non-key attributes should depend only on the primary key and not on other non-key attributes.

    • Here's an example to illustrate achieving 3NF:

Table: Employees

| Employee ID | Name | Department | Department Location |
|---|---|---|---|
| 1 | John Doe | HR | New York |
| 2 | Jane Smith | IT | San Francisco |
| 3 | Alex Johnson | Sales | London |

In this example, the "Department Location" attribute depends on the "Department" attribute, which is not the primary key. To achieve 3NF, we can split the table into two:

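Table: Employees

| Employee ID | Name | Department |
|---|---|---|
| 1 | John Doe | HR |
| 2 | Jane Smith | IT |
| 3 | Alex Johnson | Sales |

Table: Departments

| Department | Department Location |
|---|---|
| HR | New York |
| IT | San Francisco |
| Sales | London |

The "Employees" table keeps the "Department" column as a foreign key into "Departments"; without it, we could no longer tell which department an employee belongs to. By creating a separate "Departments" table, we eliminate the transitive dependency that ran from "Employee ID" through "Department" to "Department Location", ensuring that non-key attributes depend only on the primary key of their own table.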
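A minimal `sqlite3` sketch of this design follows. It assumes the "Employees" table keeps a `department` column as a foreign key into "Departments" so the two tables can be joined; the snake_case names, the `location_of` helper, and the new location "Boston" are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
    CREATE TABLE departments (
        department          TEXT PRIMARY KEY,
        department_location TEXT NOT NULL
    );
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Assumed foreign key linking each employee to a department.
        department  TEXT NOT NULL REFERENCES departments(department)
    );
    INSERT INTO departments VALUES
        ('HR', 'New York'), ('IT', 'San Francisco'), ('Sales', 'London');
    INSERT INTO employees VALUES
        (1, 'John Doe', 'HR'), (2, 'Jane Smith', 'IT'), (3, 'Alex Johnson', 'Sales');
""")


def location_of(conn, employee_id):
    """Look up an employee's location through their department."""
    row = conn.execute(
        """
        SELECT d.department_location
        FROM employees AS e
        JOIN departments AS d ON d.department = e.department
        WHERE e.employee_id = ?
        """,
        (employee_id,),
    ).fetchone()
    return row[0] if row else None


before = location_of(conn, 1)

# Relocating HR is one update in "departments"; no employee row changes.
# "Boston" is a hypothetical new location.
with conn:
    conn.execute(
        "UPDATE departments SET department_location = ? WHERE department = ?",
        ("Boston", "HR"),
    )

after = location_of(conn, 1)
print(before, "->", after)  # New York -> Boston
```

Since the location is stored once per department, two employees in the same department can never end up with conflicting locations.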
