Data Vault – Links

Links link Hubs and represent relationships or transactions. Links therefore contain the hash key of each connected Hub long with some metadata. So in the case of employee and Department hubs , we will have a empdeptlink that will have the following fields

As you can see the link table stores the Hash key of the employee hub table and Dept Hub table

the primary key of the link table is the Hash key which is really the hash code for the combination of all the Business Keys in the link.

Just like Hubs , the load date and the record source are the only two meta data fields that are added to the link table.

Notice there is no slowly changing dimension logic built for links or hubs , those are captured in the satellite entities .

A Link consists of two or more foreign keys. These can be hash keys from Hubs or from other Links. The primary key of a Link table is the hash value calculated over all the foreign keys together with the load date. The foreign keys are, of course, hash values themselves because they reference the hash keys of the Hub tables

there are two other optional fields that may be added to the link table. one is the Last seen date , the logic for which we have described in the post for Hubs and the other is the Dependent Child key . In case of a customer placing an order , the order line number would be a dependent child key , since that affects the grain of the data – like quantity and amount etc. This is also called as a degenerate field . These fields cannot stand on its own like a hub and has no meaning unless you look at the context and have no descriptors on their own. The dependent child key is also used as an identifying element of the link structure so the hash key is derived from the business keys of the referenced hubs and dependent child key

The link table doesnot have any descriptive information so in the above example the link table does not have any information of the line item quantity or price etc. these details are stored in the satellite table for link . Link acts like a bridge table to represent transcations between the hubs and it essentially implements a many to many relationship between hubs. One to many relationship is a subset of many to many relationship. Link should go down to the lowest level of detail and this establishes the grain of the data warehouse and in modern data warehouse its best to always go down to the lowest avaibale grain .

We will look at Satellite next