Project Transformation: Push persistent BW data flows to HANA Virtual Data Marts, consume HANA models via VirtualProvider
Project Transformation is about redesigning BW data flows while leveraging HANA's number-crunching power.
Goals:
1. Utilize HANA in-memory computing by pushing Start/End/Field routines down to the database layer
2. Reduce intermediate staging DSOs, thereby reducing the data footprint and avoiding activation times
3. Promote the virtual data mart layer
4. Provide users with up-to-date data rather than pre-computed data
5. Enable users to query at item level rather than at a summarized level (Cubes)
1. Utilize HANA in-memory computing by pushing Start/End/Field routines down to the database layer
Routines in a data flow are included in the generated transformation program, and the DTP spends a lot of its runtime processing them on the ABAP application server; an ABAP program typically spends more than 50% of its execution time on the application server. These routines can be efficiently pushed down to the HANA layer by creating stored procedures and calling them from the routines, or we can create a Calculation view, Analytic view, or stored procedure that does the job of the transformation itself.
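As a minimal sketch of the stored-procedure option (the schema, table, and field names below are illustrative placeholders, not the actual objects from our system), a field-level derivation pushed into SQLScript could look like this:

CREATE PROCEDURE "BW_PUSHDOWN"."P_TRANSFORM_SALES" (
    OUT et_result TABLE ( "DOC_NUMBER" NVARCHAR(10),
                          "AMOUNT"     DECIMAL(17,2) ) )
LANGUAGE SQLSCRIPT SQL SECURITY INVOKER READS SQL DATA AS
BEGIN
  -- Transformation logic runs inside HANA instead of the ABAP routine
  et_result = SELECT "DOC_NUMBER",
                     ( "AMOUNT" / 100 ) * 8 AS "AMOUNT"  -- placeholder formula
              FROM "SAPBW"."/BIC/ASRCDSO00";             -- source DSO active table (placeholder name)
END;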
We have a data flow like the one below, where we get data from multiple DataSources into a target DSO. To avoid the staging DSOs, we decided to create a calculation view for every transformation.
Each calculation view contains the SQL required to replace the Start/End/Field routines we already have. We identified the sequence in which these transformations need to execute (as there are dependencies among them).
Let's say the sequence is Transformation 1, 2, 3 and 4; then we create four calculation views, one per level. The output of the previous level's calculation view (transformation) becomes the input of the next level's calculation view (transformation).
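As an illustration of this chaining (the package path, view names, and columns are assumed placeholders), the scripted view for Transformation 2 simply reads from the column view generated for Transformation 1 under _SYS_BIC:

/* Scripted calculation view CV_TRANSFORM_2 (sketch) */
BEGIN
  -- Level 2 consumes the column view generated for level 1;
  -- the package path and column list are illustrative only.
  var_out = SELECT "DOC_NUMBER",
                   "MATERIAL",
                   "AMOUNT"
            FROM "_SYS_BIC"."project.transform/CV_TRANSFORM_1";
END;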
So at the end of Transformation 4 we have the required data, which can be loaded directly into the target Cube.
To expose the data from the HANA models to BW, we create a VirtualProvider based on the HANA model. If we want to materialize the data, a DTP from the VirtualProvider to the target Cube works well.
(How to load data from HANA models into BW InfoProviders is discussed in the next section.)
With this approach we could eliminate the staging DSOs, thereby avoiding redundancy in data storage, and all transformation logic is executed at the database level rather than on the ABAP application server.
Note: Why we chose a scripted calculation view: We could use Analytic views too, but we have time-dependent master data and are bound to perform a temporal join. However, a temporal join can only be performed on DATE or TIMESTAMP fields, whereas all our DATS and TIMS fields from ABAP are stored as NVARCHAR(8) at the database level. This was already discussed here.
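In a scripted view we can express the time-dependent lookup ourselves, since YYYYMMDD strings in NVARCHAR(8) columns compare correctly with BETWEEN. A sketch with assumed placeholder names:

-- Manual "temporal join": DATEFROM/DATETO are NVARCHAR(8) in YYYYMMDD format,
-- so lexicographic comparison with BETWEEN selects the correct time slice.
SELECT src."DOC_NUMBER",
       md."CUSTGRP"
FROM   "SAPBW"."/BIC/ASRCDSO00" src        -- source DSO active table (placeholder)
JOIN   "SAPBW"."/BIC/QCUSTOMER" md         -- time-dependent master data (Q) table (placeholder)
  ON   md."CUSTOMER" = src."CUSTOMER"
 AND   src."CALDAY" BETWEEN md."DATEFROM" AND md."DATETO";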
We frequently do lookups, transform columns, and filter data in routines. Let's look at one way to push these routines down.
Let's say we have a transformation between our source DSO and an intermediate staging DSO. While moving data from the source DSO to the target DSO, we need to enrich the source data with new attributes looked up from 3 InfoObjects (done in the Start and End routines) and transform a column (done via a Field routine).
We created a scripted calculation view that accesses the source DSO's active table and the lookup InfoObjects' master data views. It essentially performs a JOIN between the source DSO active table and the 3 lookup tables, and the field routine logic is included in the SELECT clause of the JOIN statement.
We have a field routine that derives the RESULT value based on nested IF conditions like the ones below.
* Field routine: derive RESULT from the source structure fields.
IF STR-A = 1000 AND STR-B = 'N'.
  IF STR-C = 'J' AND STR-D = 'K'.
    RESULT = '1000'.
  ELSEIF STR-C = 'M' AND STR-D = 'N'.
    RESULT = '2000'.
  ELSE.
    RESULT = '3000'.
  ENDIF.
ENDIF.
We can achieve the same result by writing a SELECT statement with a CASE expression, like the one below.
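A sketch of that CASE-based SELECT (the table name is an assumed placeholder; the values follow the routine above):

SELECT "A", "B", "C", "D",
       CASE
         WHEN "A" = 1000 AND "B" = 'N' AND "C" = 'J' AND "D" = 'K' THEN '1000'
         WHEN "A" = 1000 AND "B" = 'N' AND "C" = 'M' AND "D" = 'N' THEN '2000'
         WHEN "A" = 1000 AND "B" = 'N'                             THEN '3000'
       END AS "RESULT"
FROM "SAPBW"."/BIC/ASRCDSO00";   -- source DSO active table (placeholder name)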
Formulas like RESULT = ( STR-A / 100 ) * 8 can be replaced as shown below.
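For example (same assumed placeholder table as above):

SELECT ( "A" / 100 ) * 8 AS "RESULT"   -- arithmetic pushed into the SELECT list
FROM "SAPBW"."/BIC/ASRCDSO00";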
Lookups can be replaced by writing a JOIN between the DSO active table and the InfoObject master data views.
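A sketch of such a lookup join, with all object and field names as assumed placeholders:

SELECT src."DOC_NUMBER",
       src."CUSTOMER",
       md."CUSTGRP" AS "CUST_GROUP"               -- renamed to match the target field
FROM   "SAPBW"."/BIC/ASRCDSO00" src               -- source DSO active table (placeholder)
LEFT OUTER JOIN "SAPBW"."/BIC/PCUSTOMER" md       -- InfoObject master data table/view (placeholder)
  ON   md."CUSTOMER" = src."CUSTOMER";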
The AS clause helps us when the source and target field names aren't the same.
Now, once we are done with all the calculation views, it's time to load the data from the column views into the target Cube.
Create a VirtualProvider based on the HANA model and assign InfoObjects to the fields of the HANA view. Then create a DTP from the VirtualProvider to the target Cube to load the data.
Trivia: We found that the DTP from the VirtualProvider executes fast. If data from the target Cube is read frequently, it is better to materialize the data than to read large volumes from the VirtualProvider every time. For real-time data, however, this materialization approach doesn't work.
Limitations of a VirtualProvider on a HANA model: The HANA model and the VirtualProvider are not synced, so if the underlying HANA model changes, the changes won't be reflected in the VirtualProvider. To pick up the changes we need to create a new InfoProvider, which creates problems for all queries and Open Hubs that reference the old InfoProvider.
Also, if the HANA model has input parameters, they can be assigned to a characteristic of the VirtualProvider; we then create a restriction for this characteristic in the BEx query.
The user input is then passed on to the HANA query. The limitation, however, is that we can only pass single values; lists of values and ranges can't be passed.
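For reference, this is how such a single value reaches a calculation view input parameter when the column view is queried directly (the view and parameter names are assumed placeholders):

SELECT "DOC_NUMBER", "AMOUNT"
FROM "_SYS_BIC"."project.transform/CV_TRANSFORM_4"
     ('PLACEHOLDER' = ('$$P_COMP_CODE$$', '1000'));   -- only one single value can be handed over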
Data flow after transformation: