Let's consider an example to illustrate normalization in a database.
Suppose we have a database that stores information about books and authors. Initially, we might design a single table to hold all the information:
Table: Books
Book ID | Title | Author | Year Published | Genre |
1 | Book A | Author X | 2005 | Fiction |
2 | Book B | Author Y | 2010 | Non-Fiction |
3 | Book C | Author X | 2015 | Fiction |
In this scenario, we can observe some data redundancy. The author's name "Author X" is repeated for multiple books. If an author changes their name or other details, we would need to update all the relevant rows, which can be error-prone and inefficient.
To address this, we can normalize the database by creating separate tables and establishing relationships between them. Here's an example of a normalized design:
Table: Authors
Author ID | Author Name |
1 | Author X |
2 | Author Y |
Table: Books
Book ID | Title | Year Published | Genre | Author ID |
1 | Book A | 2005 | Fiction | 1 |
2 | Book B | 2010 | Non-Fiction | 2 |
3 | Book C | 2015 | Fiction | 1 |
In this normalized design, we have moved the authors' information into its own "Authors" table. Each author is assigned a unique "Author ID", which is then used as a foreign key in the "Books" table to link each book to its author.
By normalizing the database, we eliminate data redundancy. Now, if we need to update an author's information, we only need to modify a single row in the "Authors" table.
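The normalized design above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the snake_case table and column names, the in-memory database, and the new author name are assumptions made for the example.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE authors (
    author_id   INTEGER PRIMARY KEY,
    author_name TEXT NOT NULL
);
CREATE TABLE books (
    book_id        INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    year_published INTEGER,
    genre          TEXT,
    author_id      INTEGER NOT NULL REFERENCES authors(author_id)
);
""")

conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Author X"), (2, "Author Y")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", [
    (1, "Book A", 2005, "Fiction", 1),
    (2, "Book B", 2010, "Non-Fiction", 2),
    (3, "Book C", 2015, "Fiction", 1),
])

# Renaming an author changes exactly one row, however many books they wrote.
cur = conn.execute("UPDATE authors SET author_name = ? WHERE author_id = ?",
                   ("Author X (renamed)", 1))
rows_changed = cur.rowcount
conn.commit()
```

Because the books refer to the author only by "author_id", both of Author X's books pick up the new name without being modified themselves.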
Why Normalization?
Normalization also keeps common queries straightforward. For example, if we want to find all books written by a specific author, we can join the "Books" and "Authors" tables on the "Author ID" column.
By breaking down the data into separate tables and establishing relationships, normalization helps ensure data consistency, reduce data redundancy, and facilitate efficient querying and maintenance of the database.
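As a rough sketch of the join described above, the following Python snippet uses the built-in sqlite3 module. The helper name books_by_author and the snake_case column names are hypothetical and were chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT NOT NULL);
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(author_id)
);
INSERT INTO authors VALUES (1, 'Author X'), (2, 'Author Y');
INSERT INTO books VALUES (1, 'Book A', 1), (2, 'Book B', 2), (3, 'Book C', 1);
""")

def books_by_author(conn, author_name):
    """Return the titles written by the named author, ordered by book ID."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books AS b
        JOIN authors AS a ON a.author_id = b.author_id
        WHERE a.author_name = ?
        ORDER BY b.book_id
        """,
        (author_name,),
    )
    return [title for (title,) in rows]

titles = books_by_author(conn, "Author X")
```

With the sample rows from the tables above, the call returns the two titles linked to Author X.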
What is Atomicity?
Atomicity is the property that a transaction is treated as a single, indivisible unit of work: either all of its operations take effect, or none of them do. The database management system (DBMS) enforces atomicity through transaction management mechanisms, such as transaction logs and rollback operations. If any part of a transaction fails or encounters an error, the DBMS rolls back the entire transaction, undoing all the changes made during that transaction.
For example, consider a banking system where a transaction involves transferring funds from one account to another. Atomicity ensures that if the withdrawal from the sender's account is successful, the deposit into the recipient's account will also be completed. If any error occurs during either operation, the entire transaction is rolled back, and the accounts are left unchanged.
By maintaining atomicity, a database never leaves a half-finished transaction behind. This prevents partially applied updates from corrupting the data and makes transaction behavior reliable and predictable.
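The funds-transfer scenario can be sketched with Python's sqlite3 module, whose connection object commits on success and rolls back on an exception when used as a context manager. The accounts table, the CHECK constraint that forbids negative balances, and the transfer helper are assumptions made for this illustration. The deposit is deliberately applied first so that a failing withdrawal shows the earlier deposit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, sender, recipient, amount):
    """Move funds between accounts; return True if committed, False if rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                (amount, recipient))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, sender))
        return True
    except sqlite3.IntegrityError:
        # The withdrawal violated the CHECK constraint, so the deposit was undone too.
        return False

def balances(conn):
    """Return a mapping of account ID to current balance."""
    return dict(conn.execute("SELECT account_id, balance FROM accounts"))

ok = transfer(conn, 1, 2, 30)       # enough funds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # overdraft: both updates are rolled back
```

After the second call, the balances are the same as they were before it, even though the deposit statement had already run inside the failed transaction.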
What are normal forms (1NF, 2NF, and 3NF)?
Let's dive deeper into each of the normal forms (1NF, 2NF, and 3NF) with examples.
First Normal Form (1NF):
1NF ensures that each column in a table contains only atomic values and that there are no repeating groups or arrays within the table.
An atomic value is one that cannot be meaningfully divided further: each cell holds a single value, not a list of values.
Here's an example to illustrate achieving 1NF:
Table: Students
Student ID | Name | Courses |
1 | John Doe | Math, Science |
2 | Jane Smith | English, Math |
In the initial design, the "Courses" column contains multiple values separated by commas. To achieve 1NF, we need to break down the courses into separate rows:
Table: Students
Student ID | Name | Course |
1 | John Doe | Math |
1 | John Doe | Science |
2 | Jane Smith | English |
2 | Jane Smith | Math |
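By separating the courses into individual rows, we ensure that every cell holds a single atomic value and eliminate the repeating groups within the table. Note that "atomic" here refers to indivisible column values, which is a different concept from the transaction atomicity described earlier. Each row is now identified by the combination of "Student ID" and "Course".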
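The same transformation can be sketched in plain Python. The function name to_first_normal_form and the tuple layout are hypothetical; the sketch assumes courses are separated by commas, as in the table above.

```python
def to_first_normal_form(rows):
    """Expand (student_id, name, 'Course1, Course2') rows into one row per course."""
    result = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            # strip() removes the space that follows each comma
            result.append((student_id, name, course.strip()))
    return result

students = [
    (1, "John Doe", "Math, Science"),
    (2, "Jane Smith", "English, Math"),
]

students_1nf = to_first_normal_form(students)
```

Each resulting tuple carries exactly one course, matching the four rows of the 1NF table.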
Second Normal Form (2NF):
2NF builds upon 1NF and requires that every non-key attribute in a table depends on the entire primary key, not just part of it.
This matters only when the primary key is composite, that is, made up of more than one column.
Here's an example to illustrate achieving 2NF:
Table: Orders
Order ID | Customer ID | Product ID | Product Name | Quantity |
1 | 101 | 1 | Book A | 2 |
2 | 101 | 2 | Book B | 1 |
3 | 102 | 1 | Book A | 3 |
In this example, the primary key is composed of both "Order ID" and "Product ID". However, the "Product Name" is dependent on the "Product ID" only, not the entire primary key. To achieve 2NF, we can split the table into two:
Table: Orders
Order ID | Customer ID | Product ID | Quantity |
1 | 101 | 1 | 2 |
2 | 101 | 2 | 1 |
3 | 102 | 1 | 3 |
Table: Products
Product ID | Product Name |
1 | Book A |
2 | Book B |
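By creating a separate "Products" table, we ensure that the non-key attribute "Product Name" depends on the entire primary key of its table, which is now just "Product ID". Note that "Customer ID" still depends only on "Order ID", which is also just part of the composite key. A strict 2NF design would therefore move it into its own table keyed by "Order ID", leaving an order-items table that holds "Order ID", "Product ID", and "Quantity".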
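One way to spot a partial dependency is to check whether part of the key already determines a non-key column in the data at hand. The sketch below is a hypothetical helper, not a standard library function. Sample rows can only contradict a dependency, never prove it, so the result should be read as a hint rather than a guarantee.

```python
def determines(rows, determinant, dependent):
    """Return False if the rows contradict the dependency determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # Remember the first value seen for this key and compare later rows against it.
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True

orders = [
    {"order_id": 1, "customer_id": 101, "product_id": 1, "product_name": "Book A", "quantity": 2},
    {"order_id": 2, "customer_id": 101, "product_id": 2, "product_name": "Book B", "quantity": 1},
    {"order_id": 3, "customer_id": 102, "product_id": 1, "product_name": "Book A", "quantity": 3},
]

# "product_id" alone is consistent with determining "product_name": a partial dependency.
name_from_product = determines(orders, ["product_id"], "product_name")
# "product_id" alone does not determine "quantity".
quantity_from_product = determines(orders, ["product_id"], "quantity")

# Decompose: product names move to their own table, keyed by product ID.
products = {row["product_id"]: row["product_name"] for row in orders}
order_items = [
    {col: value for col, value in row.items() if col != "product_name"}
    for row in orders
]
```

The products mapping now stores each product name once, and order_items keeps only the columns shown in the split "Orders" table.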
Third Normal Form (3NF):
3NF extends 2NF and ensures that there are no transitive dependencies between non-key attributes.
Non-key attributes should depend only on the primary key and not on other non-key attributes.
Here's an example to illustrate achieving 3NF:
Table: Employees
Employee ID | Name | Department | Department Location |
1 | John Doe | HR | New York |
2 | Jane Smith | IT | San Francisco |
3 | Alex Johnson | Sales | London |
In this example, the "Department Location" attribute depends on the "Department" attribute, which is not the primary key. To achieve 3NF, we can split the table into two:
Table: Employees
Employee ID | Name | Department |
1 | John Doe | HR |
2 | Jane Smith | IT |
3 | Alex Johnson | Sales |
Table: Departments
Department | Department Location |
HR | New York |
IT | San Francisco |
Sales | London |
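By creating a separate "Departments" table, we remove the transitive dependency in which "Department Location" depended on "Employee ID" only indirectly, through "Department". Each department's location is now stored once, and every non-key attribute depends only on the primary key of its own table.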
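The 3NF design can be sketched with Python's sqlite3 module. The sketch assumes that the "Employees" table keeps a "Department" column as a foreign key, since an employee could not otherwise be linked to a department. The snake_case names, the employee_location helper, and the new location "Austin" are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE departments (
    department TEXT PRIMARY KEY,
    location   TEXT NOT NULL
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT NOT NULL REFERENCES departments(department)
);
INSERT INTO departments VALUES
    ('HR', 'New York'), ('IT', 'San Francisco'), ('Sales', 'London');
INSERT INTO employees VALUES
    (1, 'John Doe', 'HR'), (2, 'Jane Smith', 'IT'), (3, 'Alex Johnson', 'Sales');
""")

def employee_location(conn, employee_id):
    """Look up an employee's department location via a join; None if not found."""
    row = conn.execute(
        """
        SELECT d.location
        FROM employees AS e
        JOIN departments AS d ON d.department = e.department
        WHERE e.employee_id = ?
        """,
        (employee_id,),
    ).fetchone()
    return row[0] if row else None

before = employee_location(conn, 2)

# Moving a department is a single-row update; no employee rows are touched.
conn.execute("UPDATE departments SET location = ? WHERE department = ?", ("Austin", "IT"))
conn.commit()

after = employee_location(conn, 2)
```

Because the location lives only in the departments table, the update is visible for every employee in that department without any risk of inconsistent copies.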