SAP HANA RAMBLINGS


SAP HANA is a leading IMDB (in-memory DB, not the movies one:)). It works by keeping all required data in RAM. The idea of an in-memory engine grew out of mounting pressure from BW customers for faster query execution times, and a similar story played out with APO CIF queues. SAP answered with liveCache for APO and the BW Accelerator, both of which work by keeping data in memory. Their success prompted SAP to develop a fully operational in-memory database, SAP HANA. Falling RAM prices were also a major enabler.

SAP HANA is designed for both parallel processing and distributed processing.

When a query is passed to the HANA DB engine, HANA examines the query's generated execution plan. It then identifies the parts of the query that can be computed in parallel, thereby reducing response time.
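The idea of splitting one aggregate over several workers can be sketched in a few lines of Python. This is only a toy illustration of the principle, not HANA's actual execution engine; the function name and worker count are my own.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=4):
    """Split a column into chunks and sum the chunks concurrently,
    mimicking how independent parts of a query plan run in parallel."""
    chunk = (len(values) + workers - 1) // workers
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its own chunk; the partial results are combined.
        return sum(pool.map(sum, parts))

print(parallel_sum(list(range(1, 101))))  # 5050
```

Each chunk is independent, so the partial sums can run on separate CPUs and only the final combine step is serial.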

Also, HANA is deployed on infrastructure built for disaster tolerance and high availability. SAP teams up with hardware vendors like HP, Hitachi and IBM to develop this infrastructure. These vendors provide SMP machines, i.e. machines in which 2 or 4 CPUs work by sharing the same main memory. Combining several such machines into a distributed landscape provides two advantages:
1. When one node is down, the system is still available.
2. Adding more nodes/CPUs increases scalability.

Columnar Tables : We have seen traditional RDBMS, where data is stored in rows. These systems are better suited for transactional workloads, where the whole record is required for processing and the workload is dominated by inserts, deletes and updates. For analytic/reporting purposes we don't need all 200 fields of a sales order table; we might be interested in only net sales, tax amounts and discounts. Also, in analytic systems, processing happens more at the aggregate level than at the record level: we are mostly interested in SUM, MIN and MAX values of particular columns. In short, we want aggregated column values rather than whole records, so a traditional RDBMS is not an ideal choice for analytic systems. In an RDBMS, data is stored physically on spinning discs; each disc is composed of tracks, and the rows are arranged sequentially along the tracks.

When we ask the DB engine to calculate the SUM of a particular column, it goes to each row, offsets to the column within that row, reads the column value, then rotates the disc to the next row and repeats. The total time therefore equals the rotation time between rows plus the read time for each column value. And because the data is stored on an HDD, its read times are far slower than reads from main memory (RAM).
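The contrast between the two layouts can be shown with a toy example (purely illustrative, not HANA internals): the same sales orders stored row-wise and column-wise, and a SUM over one field. The field names here are invented for the sketch.

```python
# Row store: a list of whole records, like tracks on a disc.
rows = [
    {"order": 1, "net": 100.0, "tax": 19.0},
    {"order": 2, "net": 250.0, "tax": 47.5},
    {"order": 3, "net": 80.0,  "tax": 15.2},
]
# SUM(net) must visit every record and pick out the one field we need.
total_row = sum(r["net"] for r in rows)

# Column store: each column sits together as one contiguous array.
columns = {
    "order": [1, 2, 3],
    "net":   [100.0, 250.0, 80.0],
    "tax":   [19.0, 47.5, 15.2],
}
# SUM(net) is a tight sequential scan over a single array.
total_col = sum(columns["net"])

print(total_row, total_col)  # 430.0 430.0
```

Both layouts give the same answer, but the column layout only touches the bytes of the `net` column, while the row layout drags every unrelated field past the reader.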

In column tables, data is stored column-wise, and compression and dictionary encoding techniques are employed. After carefully examining many customers' data, SAP concluded that many table columns contain a small number of distinct values repeated throughout the column, such as material colors and properties. It was estimated that master data table sizes could be reduced by 80% with compression. Also, instead of processing text values, we can assign an integer code to each distinct value, thereby making operations like COUNT faster.
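A minimal sketch of dictionary encoding as described above (the function and sample values are my own, not HANA code):

```python
def dictionary_encode(column):
    """Replace repeated text values with small integer codes plus a
    lookup dictionary holding each distinct value exactly once."""
    dictionary = sorted(set(column))                 # distinct values
    code = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [code[v] for v in column]

colors = ["red", "blue", "red", "red", "blue", "green", "red"]
dictionary, encoded = dictionary_encode(colors)
print(dictionary)  # ['blue', 'green', 'red']
print(encoded)     # [2, 0, 2, 2, 0, 1, 2]

# COUNT of "red" becomes a cheap integer comparison over the codes:
print(encoded.count(dictionary.index("red")))  # 4
```

The column now stores seven small integers instead of seven strings, and the more a value repeats, the bigger the saving.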
     As the column values are stored sequentially, operations like SUM, MIN and MAX let the reader proceed sequentially, so read times are lowered. On top of that, the data resides in RAM, which is far faster to access. And because the column data is stored sequentially, separate indexes are usually not required.

All the factors explained above, parallel processing, distributed computing, data compression, dictionary encoding and in-memory residence, make aggregate operations very fast. Hence HANA DB is ideal for analytic workloads, at lightning speed: your enterprise data can be searched/analysed at the speed we use Google.

Note: Although compression and encoding take extra processing time, the overhead is very small compared to row-store processing times. Since the data resides in main memory, SAP has mechanisms to write data changes to a persistent area so nothing is lost on a failure. We will see HANA memory management in another post.

If column tables are so efficient and prominent in HANA DB, why are there row tables?

HANA is not only a DB for analytic operations; it supports transactional systems too, hence the traditional row tables are supported. SAP's vision is to run both analytic and transactional systems on one DB. Then you can do analytic operations on real-time data seamlessly, and think about real-time analytics rather than working with pre-aggregated data that is updated only at month end.





