In-memory databases

Notes from reading the paper "Main Memory Database Systems"

There are two kinds of database systems: memory-resident database systems and disk-resident database systems (DRDBs). If the cache of a DRDB is large enough, copies of the data will be in memory at all times, but the system still does not take full advantage of that memory. Its index structures (B-trees) are designed for disk access even though the data is in memory. Applications may also have to access data through a buffer manager as if the data were on disk. For example, every time an application wants to access a given tuple, its disk address has to be computed, and then the buffer manager is invoked to check whether the corresponding block is in memory. Once the block is found, the tuple is copied into an application tuple buffer, where it is actually examined. In a memory-resident database, the tuple can be accessed directly by its memory address. Newer applications convert a tuple or object into an in-memory representation and give the application a direct pointer to it, a technique called swizzling.

With regard to locking for concurrency control: because memory access is fast, locks are held only for a very short time, so there is no significant advantage to small lock granules (a specific cell or column) over large ones (the entire table). In the extreme case the lock granule can be the entire database, which makes execution serial. That is highly desirable because the costs of concurrency control (setting locks, releasing locks, coping with deadlock, CPU cache flushes, etc.) are almost eliminated. In a disk-based system, the locks are kept in a hash table, and the disk copy of the data carries no lock information. In a memory database system, this information can be encoded in the object itself, with a bit or two reserved for it.
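To make the difference in access paths concrete, here is a minimal Python sketch. It is only an illustration of the idea, not code from the paper: `Row`, `BufferManager`, `fetch`, and `BLOCK_SIZE` are hypothetical names, and a Python object reference stands in for a raw memory pointer.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # tuples per block; an arbitrary value chosen for illustration


@dataclass
class Row:
    key: int
    value: str


class BufferManager:
    """Toy disk-style access path: address computation, block lookup, copy."""

    def __init__(self, rows):
        # Assume every block is already cached in memory.
        self.blocks = {}
        for i, row in enumerate(rows):
            self.blocks.setdefault(i // BLOCK_SIZE, []).append(row)
        self.lookups = 0

    def fetch(self, row_id):
        # Compute the "disk address" of the tuple: block number and slot.
        block_no, slot = divmod(row_id, BLOCK_SIZE)
        # Ask whether the block is in memory (always true in this toy).
        self.lookups += 1
        src = self.blocks[block_no][slot]
        # Copy the tuple into an application-side buffer before use.
        return Row(src.key, src.value)


rows = [Row(i, f"v{i}") for i in range(10)]

# Disk-resident style: every access goes through the buffer manager.
bm = BufferManager(rows)
copy = bm.fetch(6)

# Memory-resident style: the application holds a direct reference (the
# "swizzled" pointer) and uses it with no lookup and no copy.
direct = rows[6]
```

Both paths return the same data, but the first pays for an address computation, a lookup, and a copy on every access, even though nothing ever touches a disk.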

In an in-memory database, any need to write to a transaction log on disk becomes a bottleneck. There are different approaches to this problem: carve out some of the memory to hold the log and flush it at the end of the transaction, or use group commits, where the log page is written out only once it is full.
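The group commit idea can be sketched roughly as follows. This is a toy model under my own assumptions, not the paper's design: `GroupCommitLog`, `page_size`, and `sink` are hypothetical names, and a Python list stands in for the log file on disk.

```python
class GroupCommitLog:
    """Buffers log records in memory and writes them out one page at a time."""

    def __init__(self, page_size, sink):
        self.page_size = page_size  # records per log page (hypothetical unit)
        self.sink = sink            # stands in for the log file on disk
        self.buffer = []
        self.flushes = 0

    def commit(self, txn_id, record):
        # Append the commit record; write only when the page fills up.
        self.buffer.append((txn_id, record))
        if len(self.buffer) >= self.page_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One write covers the records of several transactions.
        self.sink.append(list(self.buffer))
        self.buffer.clear()
        self.flushes += 1


disk = []
log = GroupCommitLog(page_size=3, sink=disk)
for txn in range(7):
    log.commit(txn, f"update row {txn}")
log.flush()  # write out the partially filled last page
```

Seven commits cost three writes here instead of seven. The trade-off is that a transaction is not durable until the page holding its record has been flushed.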

In a main-memory database, index structures like B-trees, which are designed for block-oriented storage, lose much of their appeal. Hashing provides fast lookup and update, but may not be as space-efficient as a tree. The T-tree is designed specifically for memory-resident databases. Since pointers are of uniform size, we can use fixed-length structures for building indexes that rely on pointers. In an in-memory database, query processing techniques that assume sequential access also lose their appeal. For example, sort-merge join gains little: random access is cheap, so there is no need to sort the data first.
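As one illustration of joining without sorting, here is a small hash join in Python. Hash join is my choice of example and not necessarily the technique the paper favors; `hash_join` and the sample relations are hypothetical. The index stores references to the rows rather than copies, which is cheap when everything is in memory.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Join two lists of dict rows on equal key values, with no sorting."""
    # Build phase: index the left relation by key, storing row references.
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    # Probe phase: one random-access lookup per row of the right relation.
    return [(l, r) for r in right for l in index.get(r[right_key], [])]


departments = [{"dept_id": 1, "name": "eng"}, {"dept_id": 2, "name": "ops"}]
employees = [
    {"emp": "ann", "dept_id": 2},
    {"emp": "bob", "dept_id": 1},
    {"emp": "cy", "dept_id": 3},  # no matching department
]

pairs = hash_join(departments, employees, "dept_id", "dept_id")
```

Neither input is sorted, and each probe is a direct lookup. On disk that access pattern would be expensive, but in memory it is not.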

The rest of the paper covers the different attempts at building an in-memory database system, with some specific characteristics of each. Overall, it is a great introduction to in-memory databases from a historical perspective and is still very relevant, since I have not seen much commercialization of this kind of database other than HANA, which is terribly expensive.