Introduction – Oracle Data Integrator (ODI)
"Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information."
Data integration ensures that information is timely, accurate and consistent across complex systems.
Types of Integration:
- Data Integration (large-volume data integration)
- Event Integration (Event-Driven Architecture)
- Service Integration (Service-Oriented Architecture)
- Oracle Data Integrator features an active integration platform that includes all styles of data integration: data-based, event-based, and service-based
- It is capable of transforming large volumes of data efficiently
- It processes events in real time through its advanced Change Data Capture (CDC) capability
- It provides data services to the Oracle SOA Suite
ETL tools operate by first extracting the data from various sources, transforming the data on a proprietary, middle-tier ETL engine, and then loading the transformed data into the target data warehouse or integration server.
Here data must be moved over the network twice:
Once between the source and the ETL server
Again between the ETL server and the target data warehouse
The ETL engine performs data transformations on a row-by-row basis, which can be inefficient in some situations. Moreover, if you want to ensure referential integrity by comparing data flow references against values from the target data warehouse, the referenced data must be downloaded from the target to the engine, further increasing network traffic.
In contrast, the E-LT approach extracts data from the sources, loads it into tables on the destination server, and then transforms the data on the target RDBMS using native SQL operators.
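As a sketch of the E-LT idea above, the following uses sqlite3 (standing in for the target RDBMS) with made-up table names: raw rows are loaded into a staging table first, and the transformation runs as one set-based SQL statement inside the target itself, with no middle-tier engine.

```python
import sqlite3

# Illustrative E-LT sketch; sqlite stands in for the target RDBMS and
# the table/column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
conn.execute("CREATE TABLE dw_sales (region TEXT, total REAL)")

# Extract + Load: raw rows land on the target as-is
extracted = [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", extracted)

# Transform: a single set-based statement runs inside the target database
conn.execute("""
    INSERT INTO dw_sales (region, total)
    SELECT region, SUM(amount) FROM stg_sales GROUP BY region
""")
print(conn.execute("SELECT region, total FROM dw_sales ORDER BY region").fetchall())
# → [('APAC', 75.0), ('EMEA', 150.0)]
```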
Data can be classified into two types:
- Analytical Data (Data Warehouse)
- Transactional Data
Transactional Data:
- It is run-time or day-to-day data
- It is current and detailed
- It is used to run the business
- It is stored in an OLTP (OnLine Transaction Processing) system
- The source of transactional data is an application
- Examples: ATM transactions, share market transactions, etc.
Analytical Data:
- It is used to analyze the business
- It is historical and summarized data
- It is stored in an OLAP system or data warehouse
- The source of analytical data is OLTP
Data Warehouse Concepts
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
It usually contains historical data derived from transaction data, but it can include data from other sources.
It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
Characteristics of a data warehouse (William Inmon):
- Subject Oriented
- Integrated
- Nonvolatile
- Time Variant
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "What were our sales for the first quarter of last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
Data Warehousing Objects
Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas.
Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeric and additive, that can be analyzed and examined. Examples include sales, cost, and profit.
Dimension tables, also known as lookup or reference tables, contain the relatively static data in the warehouse. Examples are customers or products.
Creating and Managing Topology for Oracle Technology
1. Creating Physical Architecture
a. The physical architecture defines the different elements of the information system, as well as their characteristics taken into account by Oracle Data Integrator
b. Each type of database (Oracle, DB2, etc.) or file format (XML, Flat File), or application software is represented in Oracle Data Integrator by a technology
c. The physical components that store and expose structured data are defined as "Data Servers"
d. A data server is always linked to a single technology. A data server stores information according to a specific technical logic, which is declared in the physical schemas attached to this data server
Process to create Physical Architecture:
i. Open ODI Studio
ii. Connect to the Master Repository as the SUPERVISOR user
iii. Go to the Topology Navigator
iv. Expand Technologies and select the Oracle technology
v. Right-click on it and select New Data Server
vi. In the Definition tab, fill in the details as shown in the screenshot below
vii. In the JDBC tab, fill in the details as shown in the screenshot below
JDBC Driver: Select "oracle.jdbc.OracleDriver"
JDBC URL: jdbc:oracle:thin:@<host>:<port>:<sid>
viii. Click Test Connection – click Test – click OK
ix. The data server creation is now complete
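The thin JDBC URL from step vii is just a string assembled from host, port, and SID; a tiny hypothetical helper makes the format explicit (the connection values are placeholders, not a real server):

```python
# Hypothetical helper that assembles the Oracle thin JDBC URL used in the
# Data Server definition; host/port/sid values below are placeholders.
def oracle_thin_url(host: str, port: int, sid: str) -> str:
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"

print(oracle_thin_url("localhost", 1521, "orcl"))
# → jdbc:oracle:thin:@localhost:1521:orcl
```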
i. Right-click on the Data Server
ii. Select New Physical Schema
iii. Enter the details as mentioned below:
Schema (Schema) – Data Schema → the schema where ODI inserts and updates the target tables
Schema (Work Schema) – Work Schema → the schema where all temporary tables (E$_, C$_, …) are created and dropped by ODI
iv. Click Save – click OK
2. Creating Context
Contexts bring together components of the physical architecture (the real Architecture) of the information system with components of the Oracle Data Integrator logical architecture (the Architecture on which the user works)
Process to Create:
i. Go to the Topology Navigator
ii. Expand the Context accordion
iii. Click on New Context
iv. Enter the details:
Name: Context Name
Code: Unique among the various repositories
v. Click Save
3. Creating Logical Architecture
a. The logical architecture allows a user to identify as a single logical schema a group of similar physical schemas – that is, schemas containing datastores that are structurally identical but located in different physical locations
b. Logical Schemas, like their physical counterpart, are attached to a technology
c. All the components developed in Oracle Data Integrator are designed on top of the logical architecture. For example, a data model is always attached to a logical schema
Process to Create Logical Architecture:
i. Go to the Topology Navigator
ii. Expand Logical Architecture
iii. Expand Technologies and right-click on the Oracle technology
iv. Click on New Logical Schema
v. Select the physical schema for each context
vi. Click Save
1. What is the difference between ODI 11g and ODI 12c?
Ans: The biggest change is that the 11g "interface" was replaced in 12c by the flow-based "mapping", built in a new mapping editor with reusable components such as Join, Filter, Split, and Pivot. ODI 12c also introduced in-session parallelism and the colocated standalone agent.
2. Different types of Repositories?
Ans: There are two types of repositories:
1. Master Repository – holds information related to Topology and Security
2. Work Repository – stores information related to Designer and Operator
3.Different types of Work Repositories?
Ans: There are two types of Work Repositories:
Execution Work Repository – stores only execution objects (scenarios, load plans, and logs); used in production
Development Work Repository – additionally stores development objects (projects, models); used where development is done
4.What is the hierarchy of Master and Work Repositories?
Ans: We can have one Master and 'n' Work repositories, but each Work repository should be attached to only one Master Repository.
5.What is the difference between ETL and ELT?
Ans: ETL requires a middle-tier server engine, whereas ELT does not, which reduces cost. Network traffic is also higher for ETL, whereas ELT generates less network traffic in comparison.
6.What is meant by OLAP and OLTP?
- OLAP – OnLine Analytical Processing – maintains historical data
- OLTP – OnLine Transaction Processing – daily data (business data)
7.How many types of Navigators and their usage?
Ans: Four Types of Navigators are available.
- Designer – Development
- Operator – Monitoring
- Topology – Configuration
- Security – Security management (users and privileges)
8.What is significant difference between Physical Schema and Logical Schema?
Ans: Physical Schema = Logical Schema + Context
For example, consider an organization A whose branches are A1, A2, and A3.
The structures of the A1, A2, and A3 schemas are the same, but they are located on different servers. By end of day (EOD), all the data stored in A1, A2, and A3 must be stored in A.
For the above scenario, the developer develops one mapping with one logical schema, 3 physical schemas (A1, A2, and A3), and 3 contexts (A1_CTX, A2_CTX, and A3_CTX). While executing the mapping, if he selects A1_CTX it loads from A1 (i.e., Logical Schema + A1_CTX = A1, and so on). That means we can reuse the same code to pick data from different schemas.
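The resolution rule in this answer can be sketched as a plain lookup table; the logical schema name ORG_LS is invented, while the physical schema and context names come from the example above:

```python
# Sketch of how ODI resolves (logical schema, context) -> physical schema,
# modeled as a dictionary. "ORG_LS" is a made-up logical schema name.
TOPOLOGY = {
    ("ORG_LS", "A1_CTX"): "A1",
    ("ORG_LS", "A2_CTX"): "A2",
    ("ORG_LS", "A3_CTX"): "A3",
}

def resolve(logical_schema: str, context: str) -> str:
    return TOPOLOGY[(logical_schema, context)]

# The same mapping code targets a different physical schema per context:
print(resolve("ORG_LS", "A1_CTX"))  # → A1
print(resolve("ORG_LS", "A3_CTX"))  # → A3
```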
9.What is an Agent and different types of Agents?
Ans: An ODI agent is a run-time component which orchestrates the data integration process.
There are three types of agents:
Standalone Agent – lightweight
J2EE Agent – high availability, scalability, security, and better manageability
Colocated Standalone Agent – a combination of both the standalone and J2EE agents
10.What is Context and it’s purpose?
Ans: Contexts bring together components of the physical architecture (the real Architecture) of the information system with components of the Oracle Data Integrator logical architecture (the Architecture on which the user works).
For example, contexts may correspond to different execution environments (Development, Test and Production) or different execution locations (Boston Site, New-York Site, and so forth.) where similar physical resource exist.
11.What is an interface/Mapping?
Ans: Mappings are the logical and physical organization of your data sources, targets, and the transformations through which the data flows from source to target. We create and manage mappings using the mapping editor, a new feature of ODI 12c (in ODI 11g they were called interfaces).
12.What is a variable and different types of variables?
Ans: A variable is an object that stores a single value. This value can be a string, a number, or a date. The variable value is stored in Oracle Data Integrator. It can be used in several places in your projects, and its value can be updated at run-time.
A variable can be used in a Package through these step types:
- Declare Variable
- Refresh Variable
- Set Variable (with Assign and Increment functions)
- Evaluate Variable
13.What is knowledge module and different types of KM’s?
Ans: Knowledge Modules (KMs) are code templates. Each KM is dedicated to an individual task in the overall data integration process.
Types of KM’s:
- LKM (Loading KM) – loads heterogeneous data to the staging area
- IKM (Integration KM) – integrates data from the staging area to the target
- RKM (Reverse-engineering KM) – retrieves metadata to the Oracle Data Integrator work repository
- SKM (Service KM) – generates data manipulation web services
- CKM (Check KM) – checks constraints
- JKM (Journalizing KM) – used for CDC (Change Data Capture)
14.Different types of Reverse Engineering?
Ans: There are two types of reverse engineering:
Standard Reverse Engineering – uses the technology's JDBC driver to retrieve metadata
Customized Reverse Engineering – uses an RKM (Reverse-engineering Knowledge Module)
15.What is scenario?
Ans: A scenario is the compiled, frozen form of a package, mapping, procedure, or variable. The scenario code (the language generated) is frozen, and all subsequent modifications of the components which contributed to creating it will not change it in any way.
16. What is difference between scenario and package?
Ans: A scenario is frozen code; we can't make any modifications to it. A package is a development object – an organized sequence of steps that can still be edited, and from which a scenario is generated.
17. What is Load Plan and different types of Load Plans?
A Load Plan is an executable object in Oracle Data Integrator that can contain a hierarchy of steps that can be executed conditionally, in parallel, or in series.
Types of load plan steps: serial steps, parallel steps, run-scenario steps, case/when/else (conditional) steps, and exception steps.
18. What is the difference between a Package and a Load Plan?
Ans: A package runs as a single session on one agent, while a load plan orchestrates several scenarios as separate sessions, with built-in restartability, exception handling, and serial/parallel step organization.
19. How to perform exception Handling in ODI?
Ans: Using OdiSendMail and exceptions in Load Plans.
ODI processes are typically at the mercy of the entire IT infrastructure's health. If source or target systems become unavailable, or if network incidents occur, ODI processes will be impacted. When this is the case, it is important to make sure that ODI can notify the proper individuals. This section reviews the techniques that are available in ODI to guarantee that such notifications occur reliably. When you create an ODI Package, one of the ODI Tools available to you for notification is OdiSendMail.
If we handle the exceptions at the load plan level, there is no need to send the notification emails from the original package. What we must do now is create two separate packages: one to process the data, and another one to handle the exceptions.
There are two main aspects to a load plan:
Steps: a set of scenarios that must be orchestrated, whether executions are serialized or parallelized
Exceptions: a series of scenarios that can be used in case one of the steps fails to execute properly.
This exception will run a dedicated scenario whose sole purpose is to send a notification email.
Load Plans Exception
When you edit the properties of the steps of the load plans, you can choose whether they will trigger the execution of an exception scenario or not. For each step, including the root step, you can define what the behavior will be in case of error:
Choose the exception scenario to run (or leave blank)
Run the exception scenario (if one is selected) and stop the execution of the load plan at this point
Run the exception scenario (if one is selected) and proceed with the rest of the load plan.
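The per-step behaviors listed above can be modeled with a toy loop (none of these names are ODI APIs; the `on_error` field mirrors the stop-vs-proceed step property, and the exception scenario here is just logged):

```python
# Toy model of load plan step error handling: each step may name an
# exception scenario and either stop or continue on error. All names
# are illustrative, not ODI APIs.
def run_load_plan(steps):
    log = []
    for step in steps:
        try:
            step["run"]()
            log.append(f"{step['name']}: ok")
        except Exception:
            log.append(f"{step['name']}: failed")
            if step.get("exception_scenario"):
                log.append(f"ran exception scenario {step['exception_scenario']}")
            if step.get("on_error") == "stop":
                break  # stop the load plan at this point
    return log

steps = [
    {"name": "LOAD_DIM", "run": lambda: None},
    {"name": "LOAD_FACT", "run": lambda: 1 / 0,          # simulated failure
     "exception_scenario": "SEND_MAIL", "on_error": "stop"},
    {"name": "PUBLISH", "run": lambda: None},             # never reached
]
print(run_load_plan(steps))
# → ['LOAD_DIM: ok', 'LOAD_FACT: failed', 'ran exception scenario SEND_MAIL']
```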
20. How to perform parallel execution and serial execution packages in ODI?
Ans: You can arrange the packages to run in parallel, serially, or based on specific conditions using load plans.
21.Can we reverse engineer Flat Files?
Ans: Yes, we can reverse engineer the flat files.
22.What are the advantages of standalone and J2EE agent?
Ans: The standalone agent is lightweight and easy to install close to the data it processes; the J2EE agent provides high availability, scalability, security, and better manageability through the application server.
23. What is the difference between Static and Flow control?
Ans: Flow control checks the incoming data flow against constraints before it is written to the target, isolating error rows into E$ error tables; static control checks the data already present in the target table after (or independently of) the load. Both rely on the CKM.
24. What is the difference between lookup and Join?
Ans: A lookup behaves basically like a left outer join: all rows from the driving source are kept even when no matching row exists in the lookup table.
25. How to load data from XML file to Relational database table?
Example – XML to ORACLE Table using ODI
How to load data from XML to ORACLE Table using ODI 12c?
<?xml version="1.0" encoding="UTF-8"?>
Target Table Structure:
CREATE TABLE REPLICA.EMPINFO
Prerequisites:
- Know how to do the topology configuration for the Oracle technology
- Know how to create a model for Oracle
- Expand Physical Architecture –> expand Technologies –> Right click on XML Technology and create new Dataserver
- In the definition provide the name of Dataserver (for e.g., XML_DS)
- Go to the JDBC tab and fill it in as mentioned below
JDBC Driver: com.sunopsis.jdbc.driver.xml.SnpsXmlDriver
For JDBC URL: jdbc:snps:xml?f=<filename>[&s=<schema>&<property>=<value>…]
For example: jdbc:snps:xml?f=G:\ODI_FILES\ODI_XML\employee.xml&re=employeeDetails
Save it and test the connection. You will get successful connection prompt
Right-click on the data server and create a Physical Schema (the schema name defaults to the first five characters of the file name if you did not set the s property in the JDBC URL)
Expand Technologies — Expand XML — Right click and new logical Schema (XML_LS)
Link Logical Schema to Physical Schema
Create a model for the XML technology as mentioned below
- Right-click on the model and click on Reverse Engineer
- It will create two datastores as mentioned below
The standard reverse-engineering process will automatically reverse-engineer the table from the relational schema generated by the XML driver. Note that these tables automatically include:
Primary keys (PK columns) to preserve parent-child elements relationships
Foreign keys (FK columns) to preserve parent-child elements relationships
Order identifier (ORDER columns) to preserve the order of elements in the XML file
- Import the knowledge modules LKM SQL to SQL and IKM SQL Control Append
- Create a mapping – drag and drop the XML datastore (EMPLOYEEINFO) and the table datastore from the Oracle model (EMPINFO), and auto-map them
- For Employee Id, map to the EMPLOYEEINFOORDER column
- Go to the Physical tab and select the above-mentioned KMs
- Execute the mapping
Types of Variables in ODI
A variable’s value is stored in Oracle Data Integrator. This value may change during the execution.
Declare Variable: When a variable is used in a Package, Oracle strongly recommends that you insert a Declare Variable step in the Package. This step explicitly declares the variable in the Package.
Refresh Variable: This variable step refreshes the variable by running the query specified in the variable definition.
Set Variable: There are two functions for this step:
Assign sets the current value of a variable.
Increment increases or decreases a numeric value by the specified amount.
Evaluate Variable: This variable step type compares the value of the variable with a given value according to an operator. If the condition is met, then the evaluation step is true; otherwise it is false. This step allows for branching in Packages.
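A toy simulation of the four step types, with the refresh "query" replaced by a Python function (all names below are invented, not ODI APIs):

```python
# Toy illustration of the Package variable steps: Declare, Refresh,
# Set (Assign/Increment), and Evaluate. All names are invented.
variables = {}

def declare(name):              # Declare Variable: make the variable known
    variables.setdefault(name, None)

def refresh(name, query):       # Refresh Variable: run a "query", store result
    variables[name] = query()

def set_var(name, value):       # Set Variable (Assign)
    variables[name] = value

def increment(name, amount):    # Set Variable (Increment)
    variables[name] += amount

def evaluate(name, op, value):  # Evaluate Variable: true/false for branching
    return {"=": variables[name] == value, ">": variables[name] > value}[op]

declare("ROW_COUNT")
refresh("ROW_COUNT", lambda: 10)  # stands in for a SQL count query
increment("ROW_COUNT", 5)
print(evaluate("ROW_COUNT", ">", 12))  # → True
```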
27. What is sequence and different types of Sequence available in ODI?
- A sequence is a variable automatically incremented when used. Between two uses the value is persistent.
- Standard sequences – whose current values are stored in the Repository.
- Specific sequences – whose current values are stored in an RDBMS table cell. ODI reads the value, locks the row, and updates the row after the last increment.
- Native sequences – that map to an RDBMS-managed sequence.
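A specific sequence can be sketched with sqlite3 standing in for the RDBMS: the current value lives in a table cell and each use increments it inside a transaction (ODI additionally locks the row explicitly; the table and sequence names here are made up):

```python
import sqlite3

# Sketch of a "specific sequence": the current value is stored in a
# table cell and updated on each use. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE odi_seq (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO odi_seq VALUES ('CUSTOMER_SEQ', 0)")

def next_value(name: str) -> int:
    with conn:  # transaction: increment then read back the persisted value
        conn.execute("UPDATE odi_seq SET value = value + 1 WHERE name = ?", (name,))
        return conn.execute("SELECT value FROM odi_seq WHERE name = ?",
                            (name,)).fetchone()[0]

print(next_value("CUSTOMER_SEQ"), next_value("CUSTOMER_SEQ"))  # → 1 2
```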
28. What are User Functions?
User functions allow you to define customized functions for
which you will define technology-dependent implementations. They are usable in
mappings and procedures.
29.What is Marker?
A component of a project may be flagged in order to reflect a methodology or
organization. Flags are defined using markers. These markers are organized into
groups, and can be applied to most objects in a project
30.What is Scenario?
When a package, mapping, procedure, or variable component has been fully
developed, it is compiled into a scenario. A scenario is the execution unit for production. You can generate and regenerate scenarios in ODI.
Scenarios can be scheduled for automated execution
31.What is meant by load balancing and how to do setup for it?
Ans: Load Balancing:
Oracle Data Integrator allows you to load balance parallel session execution between a set of linked physical agents.
An agent's load is determined at a given time by the ratio (number of running sessions / maximum number of sessions) for this agent.
To setup load balancing:
1. Define a set of physical agents, and link them in a hierarchy of agents
2. Start all the physical agents corresponding to the agents defined in the topology.
3. Run the executions on the root agent of your hierarchy. Oracle Data Integrator will
balance the load of the executions between its linked agents.
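The load ratio described above can be computed directly; the agent names and session counts below are hypothetical, and the root agent would delegate new sessions to the least-loaded child:

```python
# Agent load = running sessions / maximum sessions; pick the least loaded.
# Agent names and counts are invented for illustration.
agents = {
    "AGENT_A": {"running": 3, "max": 10},  # load 0.30
    "AGENT_B": {"running": 1, "max": 4},   # load 0.25
    "AGENT_C": {"running": 5, "max": 10},  # load 0.50
}

def load(agent: str) -> float:
    a = agents[agent]
    return a["running"] / a["max"]

least_loaded = min(agents, key=load)
print(least_loaded)  # → AGENT_B
```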
32. What is an agent and types of agents available in ODI?
At design time, developers generate scenarios from the business rules that they have designed. The code of these scenarios is then retrieved from the repository by the Run-Time Agent.
This agent then connects to the data servers and orchestrates the code execution on these servers. It retrieves the return codes and messages for the
execution, as well as additional logging information – such as the number of processed records, execution time etc. – in the Repository.
The Agent comes in 3 different flavors:
- The Java EE Agent can be deployed as a web application and benefit from the features of an application server.
- The Standalone Agent runs in a simple Java Machine and can be deployed where needed to perform the integration flows.
- Colocated Standalone Agents can be installed on the source or target systems. They can be managed using Oracle Enterprise Manager and must be configured within an Oracle WebLogic domain. Colocated Standalone Agents can run on a separate machine from the Oracle WebLogic Administration Server.
1. How do you promote code from one environment to another?
Ans: We can promote the code in 2 ways:
a. Import and Export of scenarios
b. Smart Import and Smart Export of objects
2. What are the different ways of promoting code?
a. Import and export scenarios
b. Smart Import and Smart Export Objects
3. What is the difference between Smart Import/Export and Import/Export of scenarios?
Ans: With plain Import and Export, we import or export only that particular object or scenario, not its dependent objects. Import and Export of scenarios is recommended for higher environments (like UAT and PROD) because we no longer do development there.
Smart Import and Smart Export move the code along with its dependent objects. This type of migration is recommended for lower environments (like DEV and QA) where we need to do development.
4. What is the approach you will follow for converting columns into rows?
Ans: Use the Unpivot component introduced in ODI 12c.
5. What is the approach you will follow for converting rows into columns?
Ans: Use the Pivot component introduced in ODI 12c.
6. How to load different departments' records into different tables?
Ans: Using the Split component in ODI 12c we can achieve it.
7. How to load multiple files to single target table using single interface/Mapping in ODI?
Ans: By using variables to loop through the files and a single mapping that loads each file to the target table.
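A minimal sketch of that loop, with csv files and sqlite3 standing in for the ODI file technology and the target table (all names invented):

```python
import csv
import os
import sqlite3
import tempfile

# "One mapping, many files": a variable-style loop feeds each file name
# to the same load logic, which appends to a single target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_emp (emp_id INTEGER, name TEXT)")

def load_file(path):  # plays the role of the reusable mapping
    with open(path, newline="") as f:
        rows = [(int(r[0]), r[1]) for r in csv.reader(f)]
    conn.executemany("INSERT INTO target_emp VALUES (?, ?)", rows)

# create three sample source files
tmpdir = tempfile.mkdtemp()
for i, rows in enumerate([[(1, "Ann")], [(2, "Bob")], [(3, "Cid")]]):
    with open(os.path.join(tmpdir, f"emp_{i}.csv"), "w", newline="") as f:
        csv.writer(f).writerows(rows)

# the "file name variable" takes each value in turn
for fname in sorted(os.listdir(tmpdir)):
    load_file(os.path.join(tmpdir, fname))

print(conn.execute("SELECT COUNT(*) FROM target_emp").fetchone()[0])  # → 3
```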
RANK is a built-in analytic function which is used to rank a record within a group of rows. Its return type is NUMBER, and it serves both aggregate and analytic purposes in SQL.
Aggregate syntax: RANK(expression) WITHIN GROUP (ORDER BY expression [ASC | DESC] NULLS [FIRST | LAST])
Analytic syntax: RANK() OVER (PARTITION BY expression ORDER BY expression)
ROW_NUMBER() assigns unique numbers to each row within the partition, given the ORDER BY clause.
RANK() behaves like ROW_NUMBER(), except that "equal" rows are ranked the same.
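The difference shows up on tied values; sqlite3 (window functions require SQLite 3.25+) can demonstrate it with invented sample data — note how the tied scores share rank 1 and the next rank skips to 3, whereas ROW_NUMBER() would simply return 1, 2, 3:

```python
import sqlite3

# RANK() on tied values; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("a", 90), ("b", 90), ("c", 80)])

rows = conn.execute("""
    SELECT score, RANK() OVER (ORDER BY score DESC) AS rnk
    FROM scores ORDER BY score DESC
""").fetchall()
print(rows)
# → [(90, 1), (90, 1), (80, 3)]
```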
Primary Key Vs Unique Key
Primary Key:
- There can be only one primary key in a table
- In most DBMSs it cannot be NULL – e.g. MySQL adds NOT NULL
- The primary key is the unique identifier of a record
Unique Key:
- There can be more than one unique key in a table
- A unique key column can contain NULL values
- It can be a candidate key
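The NULL behavior called out above can be demonstrated with sqlite3 (note that SQLite in particular allows multiple NULLs in a UNIQUE column; other DBMSs vary on this point):

```python
import sqlite3

# A UNIQUE column accepts NULLs, while duplicate non-NULL values fail.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    id   INTEGER PRIMARY KEY,  -- exactly one primary key per table
    code TEXT UNIQUE           -- a unique key; NULLs are allowed
)""")
conn.execute("INSERT INTO t VALUES (1, 'A')")
conn.execute("INSERT INTO t VALUES (2, NULL)")  # NULL in a unique column: ok
conn.execute("INSERT INTO t VALUES (3, NULL)")  # second NULL: ok in SQLite
try:
    conn.execute("INSERT INTO t VALUES (4, 'A')")  # duplicate unique value
except sqlite3.IntegrityError as e:
    print(e)  # → UNIQUE constraint failed: t.code
```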
Surrogate and Natural Key
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
A natural key is a key from within the data.
A surrogate key is a key that we introduce into the data so as to be able to identify a particular piece of information easily. Usually, it is the unique identifier for each record.
1. What are different types of Schemas?
Ans: Star Schema and Snow Flake schema
2. What is the difference between a Snowflake and a Star schema?
Ans: In a star schema the dimension tables are denormalized and join directly to the fact table; in a snowflake schema the dimensions are normalized into multiple related tables, which saves space but adds joins.
3. What is a Fact and a Dimension?
Ans: A fact is a (usually numeric, additive) measure of a business event, stored in a fact table; a dimension provides the descriptive context (who, what, where, when) used to analyze the facts.
4. Different types of Facts?
Ans: Additive, semi-additive, and non-additive facts; factless fact tables are also used.
5. Different types of Dimensions?
Ans: Commonly: conformed, junk, degenerate, role-playing, and slowly changing dimensions.
6. What is meant by CDC?
Ans: Change Data Capture (CDC) identifies and captures only the data that has changed in the source (inserts, updates, deletes) so that only those changes are processed downstream.
7. Different types of CDC?
Ans: In ODI, journalizing supports two modes: Simple (tracks changes per datastore) and Consistent Set (tracks changes across a set of related datastores, preserving referential consistency).
8.What is meant by SCD?
A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
9. Different types of SCD?
We have 6 types of slowly changing dimensions, but types 1, 2, and 3 are the most commonly used.
Type 0 – Fixed Dimension: no changes allowed
Type 1 – No History
Type 2 – Row Versioning
Type 3 – Previous Value Column
Type 4 – History Table
Type 6 – Hybrid SCD
Note: there is no Type 5 in the commonly agreed definitions.
Slowly Changing Dimensions Explained
When organising a data warehouse into Kimball-style star schemas, you relate fact records to a specific dimension record with its related attributes. But what if the information in the dimension changes?
Do you now associate all fact records with the new value? Do you ignore the change to keep historical accuracy?
Or do you treat facts before the dimension change differently to those after?
It is this decision that determines whether to make your dimension a slowly changing one. There are several different types of SCD depending on how you treat incoming change.
Type 0 – Fixed Dimension
- No changes allowed, dimension never changes
Type 1 – No History
- Update record directly, there is no record of historical values, only current state
Type 2 – Row Versioning
- Track changes as version records with current flag & active dates and other metadata
Type 3 – Previous Value column
- Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur
Type 4 – History Table
- Show current value in dimension table but track all changes in separate table
Type 6 – Hybrid SCD
- Utilise techniques from SCD Types 1, 2 and 3 to track change
In reality, only types 0, 1 and 2 are widely used, with the others reserved for very specific requirements. Confusingly, there is no SCD type 5 in commonly agreed definitions.
After you have implemented your chosen dimension type, you can then point your fact records at the relevant business or surrogate key. Surrogate keys in these examples relate to a specific historical version of the record, removing join complexity from later data structures.
We have a very simple 'customer' dimension, with just 2 attributes – Customer Name and Country:
ID | Customer Name | Country
1 | Bob | United Kingdom
However, Bob has just informed us that he has now moved to the US and we want to update our dimension record to reflect this. We can see how the different SCD types will handle this change and the pro/cons of each method.
Type 0 – Fixed Dimension
Our table remains the same. This means our existing reports will continue to show the same figures; maybe it is a business requirement that each customer is always allocated to the country they signed up from.
All future transactions associated with Bob will also be allocated to the 'United Kingdom' country.
Type 1 – No History
The table is updated to reflect Bob's new country:
ID | Customer Name | Country
1 | Bob | United States
All fact records associated with Bob will now be associated with the 'United States' country, regardless of when they occurred.
We often just want to see the current value of a dimension attribute – it could be that the only dimension changes that occur are corrections to mistakes, or maybe there is no requirement for historical reporting.
Type 2 – Row Versioning
In order to support Type 2 changes, we need to add four columns to our table:
· Surrogate Key – the original ID will no longer be sufficient to identify the specific record we require; we therefore need to create a new ID that the fact records can join to specifically.
· Current Flag – a quick method of returning only the current version of each record
· Start Date – the date from which the specific historical version is active
· End Date – the date to which the specific historical version is active
With these elements in place, our table will now look like:
SK | ID | Customer Name | Country | Current Flag | Start Date | End Date
1 | 1 | Bob | United Kingdom | N | (sign-up date) | (date of move)
2 | 1 | Bob | United States | Y | (date of move) | NULL
This method is very powerful – you maintain the history for the entire record and can easily perform change-over-time analysis. However, it also comes with more maintenance overhead, increased storage requirements, and potential performance impacts if used on very large dimensions.
Type 2 is the most common method of tracking change in data warehouses.
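A minimal SCD Type 2 load step for the Bob example might look like the sketch below (sqlite3 for illustration; dates are simplified to strings, and the expire-then-insert logic is one common variant, not ODI's generated code):

```python
import sqlite3

# SCD Type 2 sketch: expire the current row, then insert a new version
# with its own surrogate key. Dates and schema are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    sk INTEGER PRIMARY KEY AUTOINCREMENT,   -- surrogate key
    id INTEGER, name TEXT, country TEXT,
    current_flag INTEGER, start_date TEXT, end_date TEXT)""")
conn.execute("""INSERT INTO dim_customer
    (id, name, country, current_flag, start_date, end_date)
    VALUES (1, 'Bob', 'United Kingdom', 1, '2015-01-01', NULL)""")

def scd2_update(cust_id, new_country, change_date):
    with conn:
        # 1) expire the current version
        conn.execute("""UPDATE dim_customer SET current_flag = 0, end_date = ?
                        WHERE id = ? AND current_flag = 1""",
                     (change_date, cust_id))
        # 2) insert the new version, copying unchanged attributes
        conn.execute("""INSERT INTO dim_customer
            (id, name, country, current_flag, start_date, end_date)
            SELECT id, name, ?, 1, ?, NULL FROM dim_customer
            WHERE id = ? AND end_date = ?""",
                     (new_country, change_date, cust_id, change_date))

scd2_update(1, 'United States', '2020-06-01')
for row in conn.execute(
        "SELECT sk, country, current_flag FROM dim_customer ORDER BY sk"):
    print(row)
# → (1, 'United Kingdom', 0)
#   (2, 'United States', 1)
```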
Type 3 – Previous Value Column
Here, we add a new column called "Previous Country" to track what the last value of the attribute was:
ID | Customer Name | Country | Previous Country
1 | Bob | United States | United Kingdom
Note how this will only provide a single historical value for Country. If the customer changes his name, we will not be able to track it without adding a new column. Likewise, if Bob moved country again, we would either need to add further "Previous Previous Country" columns or lose the fact that he once lived in the United Kingdom.
Type 4 – History Table
There is no change to our existing table here; we simply update the record as if a Type 1 change had occurred. However, we simultaneously maintain a history table to keep track of these changes.
Our dimension table reads:
ID | Customer Name | Country
1 | Bob | United States
Whilst our Type 4 history table is created as:
SK | ID | Customer Name | Country | Start Date | End Date
1 | 1 | Bob | United Kingdom | (sign-up date) | (date of move)
2 | 1 | Bob | United States | (date of move) | NULL
Depending on your requirements, you may place both ID and Surrogate Key onto the fact record so that you can optimise performance whilst maintaining functionality.
Separating the historical data makes your dimensions smaller and therefore reduces complexity and improves performance if the majority of uses only need the current value.
However, if you do require historical values, this structure adds complexity and data redundancy overheads. It is generally assumed that the system will use Type 1 or Type 2 rather than Type 4.
Type 6 – Hybrid SCD
The 'Hybrid' method simply takes SCD Types 1, 2, and 3 and applies all the techniques. We maintain a history of all changes whilst simultaneously updating a "current value" column on all records.
This gives you the ability to provide an element of change comparison without additional calculation, whilst still maintaining a full, detailed history of all changes in the system.
Personally, if this requirement came up, I would avoid the data redundancy of this extra column and simply calculate the current value using the LAST_VALUE() window function at run-time, although this depends on your priorities between data storage and direct querying performance.
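That run-time calculation can be sketched with LAST_VALUE() over the Type 2 history (sqlite3 with SQLite 3.25+; note the explicit frame clause, without which LAST_VALUE only sees rows up to the current one):

```python
import sqlite3

# Derive the "current value" from Type 2 history with LAST_VALUE()
# instead of storing an extra column. Sample history rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER, country TEXT, start_date TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [
    (1, "United Kingdom", "2015-01-01"),
    (1, "United States", "2020-06-01"),
])

rows = conn.execute("""
    SELECT id, country, start_date,
           LAST_VALUE(country) OVER (
               PARTITION BY id ORDER BY start_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS current_country
    FROM dim_customer ORDER BY start_date
""").fetchall()
print(rows)
# → [(1, 'United Kingdom', '2015-01-01', 'United States'),
#    (1, 'United States', '2020-06-01', 'United States')]
```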