Informatica DEI/BDM v10.5 – Dynamic Mapping from Relational DB to Hadoop

Objective:

By the end of this article, you will be able to:

  1. Understand the purpose of a dynamic mapping
  2. Design, configure, and implement a dynamic mapping in a real scenario
  3. Compare similar services available across Informatica

Overview:

By definition, a dynamic mapping handles changes to source objects, target objects, and transformation logic at run time. Its ultimate purpose is to avoid the manual work of creating and synchronizing source and target objects, which reduces development time, effort, and human error. The content of this blog is based on Informatica BDM/DEI version 10.5 (the latest version at the time of writing).

Business Use Case: 

Many use cases can be achieved using dynamic mapping; here are a few.

  1. Database to Database/Cloud/Hadoop Migration
  2. Initial historical data load while building a Data Warehouse/Data Lake from scratch

Design:

For this article we are going to consider the use case of loading data from relational databases (Oracle, SQL Server and MySQL) into Hadoop Hive tables using a dynamic mapping. The Hive tables will be created dynamically at run time. This use case is apt for an initial/historical one-time load where data for hundreds of tables needs to be migrated from multiple sources to a single destination (for instance, an Operational Data Store or the raw layer of a Data Lake).

[Image: Design overview]

Operation:

Create a source object, a target object, and a mapping in the Developer tool.

Create an empty data object for the source. Here we have selected the JDBC connection, which can connect to different RDBMSs, instead of the native connectors.
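For reference, typical JDBC URL formats for the three source databases in this design are shown below. The host names, ports and database names are placeholders; the exact URL format and driver class depend on the JDBC driver you register with the connection.

    # Oracle (thin driver)
    jdbc:oracle:thin:@//oradb01:1521/ORCLPDB1

    # Microsoft SQL Server
    jdbc:sqlserver://mssql01:1433;databaseName=sales_db

    # MySQL
    jdbc:mysql://mysql01:3306/sales_db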

[Image 02: Source data object with JDBC connection]

In the source data object properties, select the check box “At run time, get data object columns from the data source”. This fetches the source object’s columns from the data source at run time.

 

[Image 03: Source data object run-time properties]

Create the target object and set the Link Type to “Link dynamic port based on the mapping flow”.

To create the target table at run time, set the Target Schema Strategy to ‘CREATE’ and supply a DDL query in the Advanced properties, for example:

    CREATE TABLE <schema_name>.<table_name> ({INFA_COLUMN_LIST}) STORED AS PARQUET LOCATION "/user/warehouse/<table_name>"
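As an illustration only (the schema, table and columns below are hypothetical), at run time this template could expand into a statement such as:

    CREATE TABLE ods_raw.customer (
      customer_id   INT,
      customer_name STRING,
      created_at    TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION "/user/warehouse/customer"

Here {INFA_COLUMN_LIST} has been resolved to the column definitions propagated from the dynamic source.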

 

[Image 04: Target data object properties with Target Schema Strategy]

Here the parameter values in the mapping are kept as ‘Default’. The actual values will be passed at run time when the workflow starts.

[Image: Mapping parameters set to ‘Default’]

This dynamic mapping will run on the Spark engine in the Hadoop environment.

[Image: Mapping run-time environment (Hadoop/Spark)]

Configuration:

Parameter values can be passed at run time using a parameter set or individual parameter values. In this article, we are going to proceed with a parameter set.

Create a parameter set and add the dynamic mapping to it so that all of its parameter keys are included. The values here remain ‘Default’ because the actual values will be passed at run time.
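For illustration, a parameter set for this mapping might contain entries along the following lines (the parameter names are hypothetical; yours will match the parameters defined in your mapping):

    Parameter            Value
    p_src_connection     Default
    p_src_table_name     Default
    p_tgt_schema_name    Default
    p_tgt_table_name     Default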

[Image: Parameter set]

Create a workflow with a Command Task.

Command Task – used to set the parameter values at run time and kick off the mapping using the built-in infacmd utility.

[Image: Workflow with Command Task]

Sample Script File:

[Image: Sample script file]
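A minimal sketch of what such a command-task script could look like is shown below, assuming a deployed application and a pre-built parameter file per table. The domain, service, application, mapping and path names are illustrative assumptions, and the infacmd option names should be verified against the infacmd ms runMapping reference for your version.

    #!/bin/bash
    # Illustrative sketch only -- names, paths and options are assumptions.
    # The location of infacmd.sh may differ in your installation.

    INFA_HOME=/opt/informatica/10.5
    DOMAIN=Domain_Dev
    DIS=DIS_Dev
    APP=app_dynamic_rdbms_to_hive
    MAPPING=m_dynamic_rdbms_to_hive
    INPUT_FILE=/home/infa/scripts/table_list.txt   # see the sample input file below

    # One record per table: <source_connection>,<source_table>,<target_table>
    while IFS=',' read -r SRC_CONN SRC_TABLE TGT_TABLE; do
        echo "Loading ${SRC_TABLE} (connection ${SRC_CONN}) into ${TGT_TABLE}"

        # A per-table XML parameter file is assumed to have been generated from
        # these values beforehand (Developer tool parameter file format).
        PARAM_FILE="/home/infa/params/param_${TGT_TABLE}.xml"

        # Avoid hard-coding the password in real scripts; use a secured mechanism.
        "${INFA_HOME}/isp/bin/infacmd.sh" ms runMapping \
            -dn "${DOMAIN}" -sn "${DIS}" \
            -un Administrator -pd '<password>' \
            -a "${APP}" -m "${MAPPING}" \
            -pf "${PARAM_FILE}"
    done < "${INPUT_FILE}"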

Sample Input File:

[Image: Sample input file]
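Along the same lines, a hypothetical input file for the sketch above could hold one record per table, carrying the values that drive the mapping parameters:

    JDBC_ORACLE_SALES,SALES.CUSTOMER,customer
    JDBC_SQLSERVER_HR,HR.EMPLOYEES,employees
    JDBC_MYSQL_WEB,web.click_log,click_log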

Create an application, add the mapping and parameter set objects to it, and deploy the application to the Data Integration Service (the design objects themselves remain stored in the Model Repository Service, MRS).

[Image: Application with mapping and parameter set]

Comparing Dynamic Mapping Against Alternative Solutions:

| Scenario | Informatica DEI – Dynamic Mapping | Informatica Mass Ingestion | Informatica Data Synchronization |
| --- | --- | --- | --- |
| Can handle semi-structured and unstructured data? | Yes | Yes | No – suited to relational DB, flat file, Salesforce |
| Can do insert, update and delete? | Partially – depends on the Hive storage format | Partially – only truncate and insert possible | Yes |
| Can run in a Hadoop environment? | Yes | Yes | No – only in the secure native environment |
| Can control sequential and parallel load? | Yes | No – parallel only | No – only one table at a time |
| Can run on the Spark and Blaze engines? | Yes | No – MapReduce only | No – only in the secure native environment |

Limitation:

Here are a few limitations that we have encountered.

  1. A dynamic mapping can’t handle the MEDIUMINT data type from a MySQL database.
  2. There is no option to dynamically concatenate all the incoming columns at once. For instance, if your use case is to compute an MD5 hash over the concatenation of all incoming columns, this is not possible in a dynamic mapping.

Conclusion:

A dynamic mapping can help your business in various ways: it reduces cost, effort, and human error, and it increases reusability and productivity. In conclusion, dynamic mapping may be used where Data Synchronization and Mass Ingestion have challenges in meeting the requirements.

Please feel free to comment and share your experience with dynamic mapping if you know of any other limitations or features that are not mentioned here. We will be glad to respond and update the blog based on such feedback.