What is the main difference between Inmon and Kimball?

Basically speaking, Inmon professes the top-down approach (build the enterprise data warehouse first), while Kimball relies on the bottom-up approach (build the data marts first).

According to Ralph Kimball…

Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is formed from the data marts through conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling done at a local, departmental level.

He follows the bottom-up approach, i.e. first create individual Data Marts from the existing sources and then combine them into a Data Warehouse.

KIMBALL – First Data Marts – then, combined, the Data Warehouse.

According to Bill Inmon…

Inmon believes in creating a data warehouse on a subject-by-subject-area basis. Hence the development of the data warehouse can start with data from one subject area, and other subject areas can be added as the need arises. Point-of-sale (POS) data, for example, can be added later if management decides it is necessary.

He follows the top-down approach, i.e. first create the Data Warehouse from the existing sources and then create the individual Data Marts.

INMON – First Data warehouse – Later – Data Marts.

The main difference is:

Kimball: follows Dimensional Modeling.

Inmon: follows E-R Modeling.

Kimball: create the data marts first, then combine them to form a data warehouse.

Inmon: create the data warehouse first, then the data marts.

What is difference between Views and Materialized Views?

Views:

• Stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.

• This is a pseudo table: it does not store data in the database, it is just a stored query.

Materialized Views:

• Stores the results of the SQL query in table form in the database. The SQL statement executes only once, and after that every time you run the query the stored result set is used. The advantage is quick query results.

• These are similar to a view, but the results are stored permanently in the database and are often refreshed periodically; they are useful for aggregation and summarization of data.
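
As a minimal Oracle-style sketch (table and column names are hypothetical, and refresh options vary by database):

CREATE VIEW dept_sal_v AS
SELECT deptno, SUM(sal) AS total_sal
FROM emp
GROUP BY deptno;
-- Only the query text is stored; it runs every time dept_sal_v is referenced.

CREATE MATERIALIZED VIEW dept_sal_mv
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT deptno, SUM(sal) AS total_sal
FROM emp
GROUP BY deptno;
-- The result set itself is stored and reused until the materialized view is refreshed.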

What is Junk Dimension? What is the difference between Junk Dimension and Degenerate Dimension?

Junk Dimension:

Columns which are rarely used or not used at all (typically miscellaneous flags and indicators) are combined to form a separate dimension; this dimension is called a Junk Dimension.

Degenerate Dimension:

A Degenerate Dimension is data that is dimensional in nature but is stored in the fact table rather than in a dimension table.

Example:

EMP table has empno, ename, sal, job, deptno

But

We are taking only the columns empno and ename from the EMP table and forming a dimension from them; this is called a Degenerate Dimension.
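
As an illustrative sketch only (all table and column names are hypothetical): a junk dimension collects leftover low-cardinality flag columns into one small table, while a degenerate dimension such as an invoice number stays in the fact table with no dimension table of its own.

CREATE TABLE order_junk_dim (
  junk_sk          NUMBER PRIMARY KEY,   -- surrogate key of the junk dimension
  payment_type     VARCHAR2(10),         -- rarely analysed flag-style columns
  gift_wrap_flag   CHAR(1),
  rush_order_flag  CHAR(1)
);

CREATE TABLE sales_fact (
  date_sk      NUMBER,
  product_sk   NUMBER,
  junk_sk      NUMBER,                  -- foreign key to the junk dimension
  invoice_no   VARCHAR2(20),            -- degenerate dimension: dimensional data kept in the fact table
  sale_amount  NUMBER
);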

How to list Top 10 salaries, without using Rank Transformation?

By using a Sorter Transformation with SAL as the sorted port (in descending order), an Expression Transformation to assign a running row number, and then a Filter Transformation to pass only the first 10 records.
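
For reference only, the same result can be cross-checked with a simple Oracle-style query (assuming the EMP table used in the earlier example):

SELECT *
FROM (SELECT empno, ename, sal FROM emp ORDER BY sal DESC)
WHERE ROWNUM <= 10;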

What is Data Warehousing?

The process of making operational data available to business managers and decision support systems is called Data Warehousing.

How do you handle two sessions in Informatica?

Using a link condition.

If the first session succeeds, the second session runs automatically.
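
For example (session name hypothetical), the link between the two sessions in the Workflow Manager can carry a condition such as $s_Session1.Status = SUCCEEDED, so that the second session starts only when the first one completes successfully.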

What is the purpose of using UNIX commands in Informatica? Which UNIX commands are generally used with Informatica?

The Informatica server is usually hosted on UNIX-based machines, so data loading and file handling have to be done there. Commands such as egrep, grep and rm are generally used, and knowledge of UNIX is an advantage.
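
As a hedged illustration (file paths and names are hypothetical): grep -i "error" sess_load_sales.log can scan a session log for failures, egrep "ERROR|WARNING" *.log broadens the search across several logs, and rm -f /staging/*.bad can clean up reject files before a new run.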

How to create a Slowly Changing Dimension in Informatica?

Select all rows. Cache the existing target as a lookup table. Compare the logical key column in the source against the corresponding column in the target lookup table. If the keys match, compare the remaining source columns against the corresponding target columns, and flag new rows and changed rows.

Create two data flows: one for new rows and one for changed rows. Generate a primary key for each new row. Insert the new rows into the target and update the changed rows in the target, overwriting the existing rows.

Transformations used:

SQ –> Lookup (1. Connected Lookup, or 2. Unconnected Lookup) –> Expression –> Router –> Update Strategy –> Target (instances).

What is the difference between SQL Overriding in Source Qualifier and Lookup Transformation?

The major difference is that any type of join can be used in a Source Qualifier SQL override, whereas in a Lookup SQL override only an equi-join can be used.
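
For example, a Source Qualifier override can freely use an outer join (reusing the sample EMP table and a hypothetical DEPT table; column names are illustrative), whereas a Lookup override must keep its lookup condition as a simple equality on the lookup ports:

SELECT e.empno, e.ename, e.sal, d.dname
FROM emp e
LEFT OUTER JOIN dept d ON e.deptno = d.deptno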

How will you update the row without using Update Strategy Transformation?

You can set the session-level property "Treat Source Rows As" to UPDATE or INSERT, so that the records are written without using an Update Strategy transformation in the mapping.

In the target, there is an Update Override option for updating records using the non-key columns. Using this, we can update the records without using an Update Strategy Transformation.
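
A minimal sketch of such a Target Update Override (target and column names are hypothetical; the :TU. prefix refers to the ports of the target instance):

UPDATE T_EMP
SET SAL = :TU.SAL, JOB = :TU.JOB
WHERE ENAME = :TU.ENAME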

How we do performance tuning in Informatica?

Performance tuning is done in several stages; first we check for bottlenecks in the following order:

Target, Source, Mapping, Session, System, and depending upon which level has the bottleneck, we rectify it.

Explain how scheduling is done in real time in Informatica?

Scheduling of Informatica jobs can be done by the following ways:

Informatica Workflow Manager, using cron in UNIX, or using an external scheduler such as OpCon.
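
For instance, a UNIX cron entry can invoke the pmcmd utility to start a workflow every night at 2 AM (service, domain, folder, workflow and credential names are all hypothetical):

0 2 * * * pmcmd startworkflow -sv INT_SVC -d DOM_DEV -u admin -p password -f FLD_SALES wf_daily_load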

What is the definition of Normalized and Denormalized?

Normalization:

Normalization is the process of removing redundancies. OLTP systems use normalization.

Denormalization:

Denormalization is the process of allowing redundancies. OLAP/DWH systems use denormalization and hold a greater level of detailed data (each and every transaction).

Why fact table is in normal form?

A Fact Table consists of the measurements of the business and the foreign keys of the dimension tables, as per the business rules.

Basically the fact table consists of the Index keys of the dimension/lookup tables and the measures.

So whenever we have the keys in a table that itself implies that the table is in the normal form.

What is difference between E-R Modeling and Dimensional modeling?

E-R Modeling is used for normalizing the OLTP database design. It revolves around the Entities and their relationships to capture the overall process of the system.

In E-R Modeling the data is in normalized form, so there are more joins, which may adversely affect system performance.

Dimensional modeling/Multi-Dimensional Modeling is used for de-normalizing the ROLAP/MOLAP design. It revolves around Dimensions (point of analysis) for decision making and not to capture the process.

In Dimensional Modeling the data is denormalized, so there are fewer joins, which improves system performance.

What is Conformed Fact?

A Dimension table which is used by more than one fact table is known as a Conformed Dimension.

Conformed facts are facts that are allowed to have the same name in separate tables and can be combined and compared mathematically. When the relationships between the facts and dimensions are in 3NF and can work with any type of join, the schema is called a conformed schema, and the members of that schema are called conformed…

What are the Methodologies of Data Warehousing?

Every company has a methodology of its own, but to name a few, the SDLC Methodology and AIM Methodology are commonly used standards. Other methodologies are AMM, World Class Methodology and many more.

Most of the time, data warehouse design uses one of two kinds of schemas: the Star Schema and the Snowflake Schema. Most projects follow either the Star Schema or the Snowflake Schema.

There are two Methodologies:

1. Ralph Kimball – First Data Marts, then the Enterprise Data Warehouse.

2. Bill Inmon – First the Enterprise Data Warehouse, then Data Marts from the EDWH.

Regarding the methodologies in data warehousing, there are mainly two models:

1. Ralph Kimball Model

The Kimball model is always structured as a denormalized structure.

2. Bill Inmon Model

The Inmon model is structured as a normalized structure.

Depending on its requirements, a company's DWH will choose one of the above models.

A DWH can be built by two methods:

1. Top-down method

In the top-down approach, the enterprise data warehouse is prepared first, and the individual departments' data (data marts) is derived from it.

Data is first loaded into the Data Warehouse and then into the Data Marts.

2. Bottom-up method

In the bottom-up approach, each department's data is first gathered, cleansed and transformed into its own data mart, and the individual departments' data marts are then combined into the enterprise data warehouse.

Data is first loaded into the Data Marts and then into the Data Warehouse.

What is Data Warehousing Hierarchy?

Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
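
As a small illustration (table and column names are hypothetical), a time hierarchy lets the same fact data be aggregated at several levels, for example with a ROLLUP:

SELECT d.year, d.quarter, d.month, SUM(f.sale_amount) AS sales
FROM sales_fact f
JOIN date_dim d ON f.date_sk = d.date_sk
GROUP BY ROLLUP (d.year, d.quarter, d.month);
-- Produces subtotals at the month, quarter and year levels, plus a grand total.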

What are the Data Validation Strategies for Data Mart validation?

Data Validation is to make sure that the loaded data is accurate and meets the business requirements.

Strategies are different methods followed to meet the validation requirements.

What are the data types present in BO? And what happens if…

There are three object types: Dimension, Measure, and Detail; these are called object types in Business Objects (BO). A View is not the same thing as an Alias in the universe: a View exists at the database level, whereas an Alias is a different name given to the same table and is used to resolve loops in the universe.

The different data types in business objects are:

1. Character

2. Date

3. Long text

4. Number

Dimension, Measure and Detail are the object types; the data types are character, date and numeric.

What is a Surrogate Key? Where do we use it, and why?

Surrogate Key is the primary key for the Dimensional table.

A Surrogate Key is a system-generated artificial primary key value. It is mainly used to handle "critical columns" in a DWH; a critical column is a column whose value gets updated in the source OLTP systems, so its change history must be preserved in the DWH. Surrogate keys are the keys that join dimension tables and fact tables, and they are the solution to the critical-column problem.

Example: a customer purchases different items in different locations; for this situation we have to maintain historical data.

Any natural candidate key can be replaced by a surrogate key.

By using surrogate keys we can introduce multiple rows for the same entity in the data warehouse and thereby maintain historical data.

A Surrogate Key is a unique identification key; it is like an artificial or alternative key to the production key. The production key may be an alphanumeric or composite key, whereas the surrogate key is always a single numeric key. Assume the production key is an alphanumeric field: if you create an index on such a field it occupies more space, so it is not advisable to join or index on it, because data warehouse fact tables generally hold historical data and are linked to many dimension tables. If the key is a numeric field, performance is higher.

Surrogate Key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key, also known as an artificial or identity key, for the dimension tables' primary keys. The surrogate key can be generated with an Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values.

It is useful because the natural primary key (e.g. "Customer Number" in the "Customer Table") can change, and such changes make updates more difficult.
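
A minimal sketch of how a surrogate key can be generated with an Oracle sequence (all names are hypothetical):

CREATE SEQUENCE customer_dim_seq;

CREATE TABLE customer_dim (
  customer_sk    NUMBER PRIMARY KEY,   -- surrogate key, a single numeric value
  customer_no    VARCHAR2(20),         -- natural/production key from the OLTP system
  customer_name  VARCHAR2(100),
  eff_start_dt   DATE,
  eff_end_dt     DATE,
  current_flag   CHAR(1)
);

INSERT INTO customer_dim (customer_sk, customer_no, customer_name, eff_start_dt, current_flag)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Acme Ltd', SYSDATE, 'Y');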

What is Workflow?

A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

What are Worklets?

A worklet is an object that represents a set of tasks.

When to create Worklets?

Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets.

Where to use Worklets?

You can run worklets inside a workflow. The workflow that contains the worklet is called the “parent workflow”. You can also nest a worklet in another worklet.

What is Workflow Monitor?

You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow or task in Gantt Chart view or Task view.

Actions:

You can run, stop, abort, and resume workflows from the Workflow Monitor.

You can view the log file and Performance Data.

Slowly Changed Dimension: It is a dimension which changes slowly over time.

Slowly Changed Dimension types and their mappings:

• SCD Type 1 Slowly Changing Dimension – Inserts new dimensions; overwrites existing dimensions with the changed dimensions (shows current data only).

• SCD Type 2/Version Data Slowly Changing Dimension – Inserts new and changed dimensions; creates a version number and increments the primary key to track changes.

• SCD Type 2/Flag Current Slowly Changing Dimension – Inserts new and changed dimensions; flags the current version and increments the primary key to track changes.

• SCD Type 2/Effective Date Range Slowly Changing Dimension – Inserts new and changed dimensions; creates an effective date range to track changes.

• SCD Type 3 Slowly Changing Dimension – Inserts new dimensions; updates changed values in existing dimensions; optionally uses the load date to track changes.
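
Continuing the hypothetical customer_dim sketch from the surrogate key answer, an SCD Type 2 (flag/date-range) change can be applied in SQL roughly like this:

-- Close out the current version of a customer whose attributes changed.
UPDATE customer_dim
SET eff_end_dt = SYSDATE, current_flag = 'N'
WHERE customer_no = 'C-1001' AND current_flag = 'Y';

-- Insert the new version with a fresh surrogate key; history is preserved as a separate row.
INSERT INTO customer_dim (customer_sk, customer_no, customer_name, eff_start_dt, current_flag)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Acme Limited', SYSDATE, 'Y');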

What is difference between OLTP and OLAP?

OLTP – On-Line Transaction Processing; OLAP – On-Line Analytical Processing.

• OLTP: continuously updated data. OLAP: read-only data.

• OLTP: tables are in normalized form. OLAP: partially normalized/denormalized tables.

• OLTP: single-record access. OLAP: multiple records accessed for analysis purposes.

• OLTP: holds current data. OLAP: holds current and historical data.

• OLTP: records are maintained using a primary key field. OLAP: records are based on a surrogate key field.

• OLTP: tables or records can be deleted. OLAP: records cannot be deleted.

• OLTP: complex data model. OLAP: simplified data model.

What is the difference between Data Mart and Data Warehouse?

DATA MART: A scaled-down version of the data warehouse that addresses only one subject, such as the Sales department or the HR department. It has one fact table with multiple dimension tables, e.g. [Sales Department], [HR Department] and [Manufacturing Department] as separate marts. Small organizations prefer a Data Mart.

DATA WAREHOUSE: A database management system that facilitates on-line analytical processing by allowing the data to be viewed in different dimensions or perspectives to provide decision support. It has more than one fact table and multiple dimension tables, e.g. [Sales Department, HR Department, Manufacturing Department] together. Bigger organizations prefer a Data Warehouse.

What is difference between Dimension Table & Fact Table?

DIMENSION TABLE:

• Provides the context/descriptive information for the fact table measurements.

• Structure: a surrogate key, one or more other fields that compose the natural key (nk), and a set of attributes.

• In a schema, more dimension tables are present than fact tables.

• A surrogate key is used to prevent primary key (pk) violations (to store historical data).

• Provides entry points to the data.

• Values of fields are in numeric and text representation.

FACT TABLE:

• Provides the measurements of an enterprise.

• Structure: foreign keys (fk), degenerate dimensions and measurements.

• The size of a fact table is larger than that of a dimension table.

• In a schema, fewer fact tables are observed compared to dimension tables.

• A composite of the degenerate dimension fields acts as the primary key.

• Values of the fields are always in numeric or integer form.

What is the difference between RDBMS SCHEMA & DWH SCHEMA?

Schema is nothing but the systematic arrangement of tables.

In OLTP it will be normalized and in DWH it will be denormalized.

RDBMS SCHEMA:

• Used for OLTP systems

• Normalized

• Extracting data and solving complex problems is difficult

DWH SCHEMA:

• Used for OLAP systems

• Denormalized

• Extracting data and solving complex problems can be done easily

What is a Cube in Data Warehousing Concept?

A cube is a logical schema which contains facts and dimensions.

Cubes are a multidimensional view of a Data Warehouse or Data Mart. A cube is designed in a logical way to drill up, drill down, slice-and-dice, etc., which enables business users to understand the trend of the business. It is good to base the design of the cube on a star schema so as to facilitate its effective use. Every part of the cube is a logical representation of a combination of facts and dimension attributes.

What is a Linked Cube?

A cube can be partitioned in 3 ways.

1. Replicate

2. Transparent

3. Linked

In a linked cube the data cells are linked to another analytical database. If an end-user clicks on a data cell, the request actually links through to that other analytical database.

What is Partitioning a Cube?

Partitioning a cube is mainly used for optimization.

Example:

You may have 5 GB of data for a report; if you specify the cube size as 2 GB, then when the cube exceeds 2 GB a second cube (partition) is automatically created to store the remaining data.

What is Incremental Loading?

Incremental loading means loading only the ongoing changes from the OLTP system, rather than reloading all the data every time.
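
A hedged sketch of an incremental extract (all table and column names are hypothetical; it assumes the source rows carry a last_updated timestamp and the warehouse records the time of the last successful load):

SELECT o.order_id, o.customer_id, o.amount, o.last_updated
FROM orders o
WHERE o.last_updated > (SELECT MAX(last_load_time)
                        FROM etl_load_control
                        WHERE job_name = 'ORDERS_LOAD');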

What is an Aggregate Table?

An aggregate table contains the measure values aggregated (grouped or summed up) to some level of the hierarchy.

What are the possible Data Marts in Retail sales?

Product Information and Sales Information.

What is the main difference between the OLTP Schema and the Data Warehouse Schema?

• RDBMS: Normalized. DWH: Denormalized.

• RDBMS: More number of transactions. DWH: Less number of transactions.

• RDBMS: Less time for query execution. DWH: More time for query execution.

• RDBMS: More number of users. DWH: Less number of users.

• RDBMS: Has insert, delete and update transactions. DWH: Will not have many inserts, deletes and updates.

What are the various ETL tools in the Market?

1. Informatica PowerCenter

2. Ab Initio

3. DataStage

4. Oracle Warehouse Builder

5. Hyperion Essbase

6. BO Data Integrator

7. SAS ETL

8. MS DTS

9. Pervasive Data Junction

10. Cognos DecisionStream

11. SQL*Loader

12. Data Integrator (Business Objects)

13. Sunopsis

What is Dimensional Modeling?

Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables – Fact tables and Dimension tables.

Fact table contains the facts/measurements of the business, e.g. sales, revenue, profit, etc., and the Dimension table contains the context or descriptive attributes of those measurements.

How to find the number of success, rejected and bad records in the same mapping?

First we separate this data using an Expression Transformation, which is used to flag each row with 1 or 0. The condition is as follows:

IIF(NOT IS_DATE(Hiredate,'DD-MON-YY') OR ISNULL(Empno) OR ISNULL(Name) OR ISNULL(Hiredate) OR ISNULL(Sex), 1, 0)

FLAG=1 is considered invalid data and FLAG=0 is considered valid data. This data is routed into the next transformation using a Router Transformation. Here we add two user-defined groups: one with FLAG=1 for the invalid data and the other with FLAG=0 for the valid data.

The FLAG=1 data is forwarded to an Expression Transformation. Here we take one variable port and two output ports: one for incrementing a count and the other for flagging the row.