When you design a database, you first want to normalize it. The main purpose is to avoid data duplication, because duplicate data takes up unnecessary space and is harder to maintain. (For the full set of normalisation rules, see e.g. http://en.wikipedia.org/wiki/Database_normalization)
E.g. suppose you want to store information about your customers. You want to store their address to send them promotional material, and you also want to store which products they have bought so far. If you put all of that in one table, you would repeat the customer's address for every article they bought. When a customer moves, you then have to remember to update the address on every one of their records, or the data becomes inconsistent.
So you normalize this bit and create, for example, a table with customer number + customer name + customer street + zip code/postal code, a second table with zip code + city, a third table with customer number + product number, a fourth table with product number + product description + vendor number, and so on.
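For what it's worth, here is a minimal sketch of that normalized design in SQLite from Python; the table and column names are invented for this example.

    import sqlite3

    # Illustrative only: hypothetical table and column names for the
    # normalized design described above.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE zip (
        zip_code TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_no   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        street        TEXT,
        zip_code      TEXT REFERENCES zip(zip_code)
    );
    CREATE TABLE product (
        product_no  INTEGER PRIMARY KEY,
        description TEXT,
        vendor_no   INTEGER
    );
    CREATE TABLE purchase (
        customer_no INTEGER REFERENCES customer(customer_no),
        product_no  INTEGER REFERENCES product(product_no)
    );
    """)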
Now look at the I/O involved in getting at that data. When all the data sits in one table, reading it normally involves fewer I/O operations, and is therefore faster, than reading data spread over multiple tables, which requires jumping back and forth between indexes and data records. And although I/O performance has improved tremendously since the early days, it is still the slowest component in a computer.
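To make the difference concrete, here is a rough sketch (assuming the hypothetical schema above, plus an invented customer_purchases table for the one-table case) of the two kinds of read.

    import sqlite3

    # Assuming the hypothetical schema sketched above: reading one customer's
    # purchases together with the full address takes a four-way join ...
    QUERY_NORMALIZED = """
    SELECT c.customer_name, c.street, z.city, pr.description
    FROM customer c
    JOIN zip z       ON z.zip_code     = c.zip_code
    JOIN purchase pu ON pu.customer_no = c.customer_no
    JOIN product pr  ON pr.product_no  = pu.product_no
    WHERE c.customer_no = ?
    """

    # ... whereas with everything in one wide (hypothetical) table the same
    # read is a single scan of that table.
    QUERY_ONE_TABLE = """
    SELECT customer_name, street, city, description
    FROM customer_purchases
    WHERE customer_no = ?
    """

    def fetch_purchases(conn: sqlite3.Connection, customer_no: int, query: str):
        return conn.execute(query, (customer_no,)).fetchall()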
Online Analytical Processing (OLAP) databases usually do batch updates followed by many reads, and they often gain performance from denormalisation, i.e. moving back from complete normalisation towards a design that requires fewer tables.
In the above example, putting both the zip code and the city in the customer address table would make sense, especially since the relation between zip code and city is not volatile (i.e. does not normally change).
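A minimal sketch of that particular denormalisation, again with invented names: the city is simply carried along in the customer table, so reading an address no longer touches the zip table.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Hypothetical denormalised customer table: the city is stored with each
    # customer instead of being looked up via the zip table.
    conn.execute("""
    CREATE TABLE customer (
        customer_no   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        street        TEXT,
        zip_code      TEXT,
        city          TEXT  -- duplicated on purpose; zip->city rarely changes
    )
    """)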
Computers with slow I/O subsystems may also benefit from denormalisation.
Denormalisation is basically the process of finding the right balance between avoiding data duplication and ensuring database performance.
Normalizing data means eliminating redundant information from a table and organizing the data so that future changes to the table are easier. Denormalization means allowing redundancy in a table. The main benefit of denormalization is improved performance with simplified data retrieval and manipulation.
Denormalization is done to increase the read performance of a database by reducing the number of joins needed to retrieve data. It involves duplicating data across tables to minimize the need for complex joins, which can result in faster query processing. However, denormalization can lead to data redundancy and potential data inconsistency risks.
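As a rough sketch of the inconsistency risk, assuming a hypothetical orders table that duplicates the customer's city for faster reads: a change of city has to reach every copy, and any row the update misses is silently wrong.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Hypothetical denormalized orders table: the customer's city is copied
    -- into every order row so that reports need no join.
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER,
        city        TEXT
    );
    INSERT INTO orders VALUES (1, 42, 'Utrecht'), (2, 42, 'Utrecht');
    """)

    # When the customer moves, every duplicated copy must be updated together;
    # any copy the update misses leaves the table contradicting itself.
    with conn:
        conn.execute("UPDATE orders SET city = 'Leiden' WHERE customer_no = 42")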
None; it uses denormalization.
Generally, to optimize the performance of SELECT queries and to minimize the joins used in them.
Denormalization can improve database performance by reducing the need for complex joins, which can speed up query response times. It simplifies data retrieval by consolidating related information into fewer tables, making it easier for applications to access the data they need. Additionally, denormalization can enhance read-heavy workloads, as it often results in fewer disk I/O operations. However, it may lead to data redundancy and increased complexity in ensuring data consistency.
compromises that include denormalization
In a relational database, a constraint between two sets of attributes is known as a functional dependency. Determining functional dependencies is vital for normalization, denormalization and the relational model in general.
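For example, zip code -> city is a functional dependency: each zip code determines exactly one city. A quick, purely illustrative way to check that property over a set of rows:

    # Hypothetical illustration: zip_code -> city is a functional dependency,
    # i.e. each zip code value determines exactly one city value.
    def functionally_determines(rows, x, y):
        seen = {}
        for row in rows:
            if row[x] in seen and seen[row[x]] != row[y]:
                return False  # same x value maps to two different y values
            seen[row[x]] = row[y]
        return True

    rows = [
        {"zip_code": "1012", "city": "Amsterdam"},
        {"zip_code": "3011", "city": "Rotterdam"},
        {"zip_code": "1012", "city": "Amsterdam"},
    ]
    print(functionally_determines(rows, "zip_code", "city"))  # True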
Schema programming involves defining the structure and relationships of data in a database. Key concepts include defining data types, relationships between tables, and constraints to ensure data integrity. Principles include normalization to reduce redundancy and improve efficiency, and denormalization for performance optimization.
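A minimal sketch of those ideas in SQLite, with invented table and column names: data types, a foreign-key relationship between tables, and a CHECK constraint guarding integrity.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript("""
    CREATE TABLE vendor (
        vendor_no INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE product (
        product_no  INTEGER PRIMARY KEY,              -- data type + key
        description TEXT NOT NULL,
        price       REAL CHECK (price >= 0),          -- integrity constraint
        vendor_no   INTEGER NOT NULL
                    REFERENCES vendor(vendor_no)      -- relationship between tables
    );
    """)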
In a datamart, data is typically denormalized rather than normalized. This approach is used to optimize query performance and simplify data retrieval for analytical purposes. Denormalization combines data from multiple tables into fewer tables, which can improve read efficiency and speed up reporting. However, the specific design may vary based on the requirements and use cases of the datamart.
If you meant the disadvantages of normalization, these are the main ones.
More tables to join: by spreading your data over more tables, you increase the need to join tables.
Tables contain codes instead of real data: repeated data is stored as codes rather than meaningful values, so there is always a need to go to a lookup table for the actual value.
The data model is difficult to query against: the data model is optimized for applications, not for ad hoc querying.
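A small, made-up illustration of the "codes plus lookup table" point: the orders table stores only a status code, so showing a human-readable status always needs a join to the lookup table.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE order_status (code TEXT PRIMARY KEY, label TEXT NOT NULL);
    INSERT INTO order_status VALUES ('N', 'New'), ('S', 'Shipped'), ('C', 'Cancelled');
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, status_code TEXT);
    INSERT INTO orders VALUES (1, 'S'), (2, 'N');
    """)

    # The order rows only carry the codes; showing a readable status always
    # means a trip to the lookup table.
    for row in conn.execute("""
        SELECT o.order_no, s.label
        FROM orders o
        JOIN order_status s ON s.code = o.status_code
    """):
        print(row)  # (1, 'Shipped') and (2, 'New')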
Systematic data duplication, i.e. denormalization, costs extra storage for the redundant copies, extra processing to keep those copies up to date, and risks inconsistency when denormalized data is changed in one place but not the other.
Denormalization is the process of taking data points from many different tables and combining them into larger, single table(s).

Example:

Table: CustomerContact
  Column: ContactID
  Column: FirstName
  Column: LastName

Table: CustomerAddresses
  Column: AddressID
  Column: House Num
  Column: Street
  Column: Suffix
  Column: Prefix
  Column: FK_ContactID

... could be combined into:

Table: Customers
  Column: CustomerID
  Column: FirstName
  Column: LastName
  Column: House Num
  Column: Street
  Column: Suffix
  Column: Prefix

This is often done to increase the read efficiency that is sometimes lacking in a relational database. It is most common in reporting and data warehouse environments, where writes, deletes, updates and deadlocks aren't usually an issue but it is important that reports run quickly, with fewer expensive join operations.

The drawback is the inherent data redundancy. In the example above, a customer with more than one address would have their name repeated in the Customers table once for every address, whereas in the normalized design their name appears only once in CustomerContact and they simply have multiple CustomerAddresses records. At large scale, this kind of architecture can have a significant disk-space impact. Also, suppose the customer's name changes: in the denormalized version this means changing the name on many Customers records instead of changing the single CustomerContact record once.

Note: this is a very simple explanation. Much more detail is needed to fully understand the pros and cons of normalization vs. denormalization and the reasons for adopting either architecture. It is recommended to first understand the following concepts:
1. Relational database design
2. Foreign keys, primary keys, uniqueness
3. Normalization and normal forms (first normal form, second normal form, third normal form, etc.)
4. Summing, grouping, aggregation
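A rough sketch of that combination in SQLite (names taken from the example above, column list trimmed for brevity): the wide Customers table is built straight from a join of the two normalized tables.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE CustomerContact (
        ContactID INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT
    );
    CREATE TABLE CustomerAddresses (
        AddressID    INTEGER PRIMARY KEY,
        Street       TEXT,
        FK_ContactID INTEGER REFERENCES CustomerContact(ContactID)
    );

    -- Denormalize: one wide row per (contact, address) pair.
    CREATE TABLE Customers AS
    SELECT c.ContactID AS CustomerID, c.FirstName, c.LastName, a.Street
    FROM CustomerContact c
    JOIN CustomerAddresses a ON a.FK_ContactID = c.ContactID;
    """)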