From Only SQL to No-SQL

Before we start, welcome to unstructured data. We will quickly review where current relational databases are lagging and need of new generation No-SQL databases. Examples are MongoDB, Cassandra, HBASE.

                                                     Relational Databases are designed in times when memory costs were high and disk seek times are high. So the design of  database was focused on optimizing the memory required for string data. Hence Normalization of data model was promoted(No data redundancy). Normalized data model is also optimized for Inserts/Updates. Although Index are supported for sake of faster access,  primary concern was saving the space. However, we currently live in a world where memory costs are so cheap such that faster reads, Real-time systems are the need of hour. In current situations, we can compromise on data redundancy for the sake of faster response times.

More bad news is that the same relational databases were used for hosting data warehouses where data will be stored in de-normalized form. Hence clearly Traditional Relational Databases which are designed for Normalized data models, faster Insert/updates are not an option for hosting OLAP systems which stored data in de-normalized form, rarely updates data. However we were forced to live that and to mitigate this we had come up with idea of storing pre-computed data. This led to creation of Cubes, aggregates and continuous refreshing overhead of them, which took out concept of real-time away and made data models difficult to modify.

SAP came up with In-memory Appliance HANA which is columnar store(Most of the columns often have less distinct values) Distributed Database(Scale-out approach) optimized for doing aggregations, joins on the fly without seeking the disk every time. Its clearly a game changer for OLAP systems and results are evident about its success. But still HANA follows traditional rows/columns & Tables as data storage model.

How about creating a data storage model specific to data being stored/Access patterns such that data read times are optimized rather than trying to fit every data under sun into column/row model, B-trees paradigm.

For example, storing DNA data sets into row/column is not optimized for reading them. Storing web pages. PDF documents into row/column paradigm is clearly not optimized.  This triggered the inception of new generation data bases which are mostly based on concept of <Key, value> pairs. Hence there is no rule that all rows of a table should have same number of columns/structure. Table can be related to concept of Collection in terms of No-SQL data bases. Instead of just cramming documents/videos/unstructured into a BLOB/CLOB fields, No-SQL databases have specialized options for string unstructured data. They promote de-normalzed models for faster access, support inverted index and processing of unstructured data while following ACID properties in a distributed environment. No-SQL databases clearly gives us more options for efficient storage of unstructured data.

However while we change our thinking from relational world to No-SQL world, we have to be careful in making our choices. Of all the storage models available,  we have to choose one such that it is optimized for your data.For example, Document model or key value pairs, Time To Live(TTL), Indexes have to be chosen such that they are optimized for data we are storing/retrieving.

Happy No-SQL!!!!

Comments

Popular posts from this blog

SAP V1 V2 V3 JOBS

Virtualization with SAP HANA

Star Join node Vs Analytic view && Attribute View vs Calculation Dimension View