Data vault is one of the newer data modeling approach and its designed to support agility and scale. Typical data warehouse design approaches require a lot of changes to be made at the 3NF layer to conform the data that is coming from multiple sources. Data Vault aims at building this layer in a more efficient manner by keeping the changes to the existing structure to a minimum . In this post and the next series of posts we can look at how to use the data vault approach.
Data vault focuses on using business keys to create a business -centric model for data warehousing. This makes it easier to represent the way businesses integrate , connect and access information in the same manner as the business does.
There are three basic entities that are derived from the source systems and these are Hubs, Links and Satellites. Lets look at each of them in detail.
Hubs . The first step in a data vault design is to think of what defines a business entity and what is the corresponding business key. For example this could be a User with the business key being user id or in case of an employee it would be the employee id. This uniquely identifies the entity and this business key goes into the Hub . The hub only stores the business key and some metadata. We will get into the metadata later. In the example below , we will be just storing the userid or the employee id in the hub table . Other attributes like first name , last name , age etc will go into another entity called as the satellite.
Links Hubs are linked to each other to represent transactions or relationships in the real word . Links are entities that tie Hubs ( business entities ) together. For example Employees belong to a department , in this case a link entity will join the dept hub to the employee hub and it depicts a relationship . Lets take another example a user could access a web page and add a product to the shopping cart. In this case the user hub can be linked to the product hub and the order hub with a link entity . This link entity represents the transaction.
Satellites – These are separate entities that add more business context to the Hub entity and the link entity. The Hub entity for e.g. employee hub captures only the business key employee id , but there is a whole lot of employee attributes like his name , age , gender, title , pay etc. that needs to be captured as well. This is where all of the information goes . Links will also have its own satellites , For e.g. in case user _ product link entity , the satellite connected to the link will capture the details of the specific transaction i.e. the date when the product was added to the cart , the quantity, the price , or any other details that go in the transaction. In the case of the link entity depicting a relationship like in the case of the employee to department , the satellite connected that particular link entity will show the date where the employee started with the particular department , the role in the particular department and or any other contextual information required for the entity.
in conclusion you just have to remember this
Hubs – > Business keys
Links -> Relationships / Transaction data
Satellites -> Attributes / description for the above two
This post provides a very high level introductory overview of data vault. I will get into a more details in subsequent posts.
One thought on “Data Vault – introduction”
Comments are closed.