Basically Speaking, Inmon Professes the Snowflake Schema While Kimball Relies on the Star Schema
What is the main difference between Inmon and Kimball?

Basically speaking, Inmon professes the Snowflake Schema while Kimball relies on the Star Schema.

According to Ralph Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is the set of data marts tied together by conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at the local departmental level. He follows a bottom-up approach, i.e. he first creates individual data marts from the existing sources and then creates the data warehouse.
Kimball: first the data marts, then combined into the data warehouse.

According to Bill Inmon:
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from one subject area, and other subject areas can be added as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary. He follows a top-down approach, i.e. he first creates the data warehouse from the existing sources and then creates individual data marts.
Inmon: first the data warehouse, later the data marts.

The main difference is:
Kimball: follows dimensional modeling, creating data marts first and then combining them to form a data warehouse.
Inmon: follows ER modeling, creating the data warehouse first and then the data marts.

What is the difference between Views and Materialized Views?

Views:
• A view stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
• It is a pseudo table: its data is not stored in the database; it is just a query.

Materialized Views:
• A materialized view stores the results of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used until the materialized view is refreshed. Pros include quick query results.
• These are similar to a view, but the results are stored permanently in the database; they are often useful for aggregation and summarization of data.

What is a Junk Dimension? What is the difference between a Junk Dimension and a Degenerate Dimension?

Junk Dimension: Columns that are rarely used or not used at all are grouped together to form a dimension, which is called a Junk Dimension.
Degenerate Dimension: A Degenerate Dimension is data that is dimensional in nature but stored in a fact table, with no dimension table of its own. Example: the EMP table has empno, ename, sal, job, deptno. If a column such as empno is kept in the fact table as a dimensional attribute without a separate dimension table behind it, it is a Degenerate Dimension.

How to list the top 10 salaries without using the Rank Transformation?

By using a Sorter Transformation with SAL as the sorted port (in descending order) and a Filter Transformation to get the first 10 records.

What is Data Warehousing?

The process of making operational data available to business managers and decision support systems is called Data Warehousing.

How do you handle two sessions in Informatica?

Using a link condition. If the first session succeeds, the second session runs automatically.

What is the purpose of using UNIX commands in Informatica? Which UNIX commands are generally used with Informatica?

We mostly work with UNIX-based servers, and that is where the data has to be loaded. Commands such as egrep, grep, and rm are used, so knowledge of UNIX is an advantage.

How to create a Slowly Changing Dimension in Informatica?

Select all rows. Cache the existing target as a lookup table. Compare the logical key column in the source against the corresponding column in the target lookup table. If the keys match, compare the source columns against the corresponding target columns, and flag new rows and changed rows. Create two data flows: one for new rows and the other for changed rows. Generate a primary key for each new row.
Insert new rows into the target and update changed rows in the target, overwriting the existing rows.
Transformations used: SQ -> 1 Connected Lookup -> target 2 Unconnected Lookup -> Expression -> Router -> Update Strategy -> target (instance).

What is the difference between SQL overriding in the Source Qualifier and in the Lookup Transformation?

The major difference is that we can use any type of join in the SQL override of a Source Qualifier, but in a Lookup we can use only an equi-join in the SQL override.

How will you update a row without using the Update Strategy Transformation?

You can set the session-level property “Treat Source Rows As” to UPDATE or INSERT, which updates the record without using an Update Strategy in the mapping. In the target, there is also an Update Override option for updating records using non-key columns. Using this option, we can update records without using the Update Strategy Transformation.

How do we do performance tuning in Informatica?

Performance tuning is done in several stages. We check for bottlenecks in the following order: Target, Source, Mapping, Session, System. Depending on which level has the bottleneck, we rectify it.

Explain scheduling in real time in Informatica.

Scheduling of Informatica jobs can be done in the following ways: Informatica Workflow Manager, cron in UNIX, or the Opcon Scheduler.

What is the definition of Normalized and Denormalized?

Normalization: Normalization is the process of removing redundancies. OLTP uses the normalization process.
Denormalization: Denormalization is the process of allowing redundancies. OLAP/DWH uses denormalization to hold a greater level of detailed data (each and every transaction).

Why is a fact table in normal form?

A fact table consists of measurements of business requirements and foreign keys of dimension tables, as per the business rules. Basically, the fact table consists of the index keys of the dimension/lookup tables and the measures.
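The shape of such a fact table can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table and column names (dim_product, dim_date, fact_sales, and so on) are hypothetical and not taken from any particular warehouse.

```python
import sqlite3

# In-memory database; all table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the foreign keys
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_sk   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_sk        INTEGER PRIMARY KEY,
        calendar_month TEXT NOT NULL
    );
    -- The fact table holds only dimension keys and measures.
    CREATE TABLE fact_sales (
        product_sk INTEGER NOT NULL REFERENCES dim_product(product_sk),
        date_sk    INTEGER NOT NULL REFERENCES dim_date(date_sk),
        quantity   INTEGER NOT NULL,
        amount     REAL NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "Widget"), (2, "Gadget")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?)",
    [(20240101, "2024-01"), (20240201, "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240101, 3, 30.0), (1, 20240201, 1, 10.0), (2, 20240101, 2, 50.0)],
)
conn.commit()

# Measures are aggregated by joining the fact table to a dimension on its key.
totals = conn.execute(
    """
    SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_sk = f.product_sk
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()
print(totals)
```

Descriptive attributes such as the product name live only in the dimension tables; the fact rows repeat nothing but keys and numbers.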
Whenever a table consists only of such keys and their measures, the table is in effect already in normal form.

What is the difference between E-R Modeling and Dimensional Modeling?

E-R Modeling is used for normalizing the OLTP database design. It revolves around entities and their relationships to capture the overall process of the system. In E-R Modeling the data is in normalized form, so queries need a larger number of joins, which may adversely affect system performance.
Dimensional Modeling (Multi-Dimensional Modeling) is used for denormalizing the ROLAP/MOLAP design. It revolves around dimensions (points of analysis) for decision making, not around capturing the process. In Dimensional Modeling the data is denormalized, so queries need fewer joins, which improves system performance.

What is a Conformed Fact?

A dimension table which is used by more than one fact table is known as a Conformed Dimension. Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically. When the relationships between the facts and dimensions are in 3NF and can work with any type of join, the schema is called a Conformed Schema, and the members of that schema are called so…

What are the Methodologies of Data Warehousing?

Every company has a methodology of its own. To name a few, the SDLC Methodology and the AIM Methodology are commonly used standards. Other methodologies are AMM, World Class Methodology, and many more. Most of the time, we use Ralph Kimball's methodologies for data warehousing design. There are two kinds of schemas: Star Schema and Snowflake Schema. Almost everyone follows either the Star Schema or the Snowflake Schema.

There are two methodologies:
1. Ralph Kimball: first data marts, then the enterprise data warehouse.
2. Bill Inmon: first the enterprise data warehouse, then data marts from the EDWH.

Regarding the methodologies in data warehousing, there are mainly two models:
1. Ralph Kimball Model: The Kimball Model is always structured as a denormalized structure.
2.
Bill Inmon Model: The Inmon Model is structured as a normalized structure.
Depending on its requirements, a company will choose one of the above models for its DWH.

A DWH can be built with two methods:
1. Top-down Method: The top-down approach means preparing the individual departments' data (data marts) from the enterprise DWH. Data is first loaded into the data warehouse and then loaded into the data marts.
2. Bottom-up Method: The bottom-up approach means first gathering each department's data, cleansing and transforming it, and then loading all the individual departments' data into the enterprise data warehouse. Data is first loaded into the data marts and then loaded into the data warehouse.

What is a Data Warehousing Hierarchy?

Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.

What are Data Validation Strategies for Data Marts Version?

Data validation makes sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.

What are the data types present in BO? N what happens I…

Dimension, Measure, and Detail are often listed as data types, but they are called object types in Business Objects (BO). An Alias is different from a View in the universe: a View is defined at the database level, whereas an Alias is a different name given to the same table in order to resolve loops in the universe.
The different data types in Business Objects are:
1. Character
2. Date
3. Long text
4. Number
Dimension, Measure, and Detail are object types.
The data types are character, date, and numeric.

What is a Surrogate Key? Where do we use it, and why?

A Surrogate Key is the primary key of the dimension table.
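As a rough sketch of the idea, the snippet below loads a small customer dimension in which the surrogate key is a system-generated integer, independent of the business key that arrives from the source system. It uses Python's built-in sqlite3 module; the table name dim_customer, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_customer_dim(rows):
    """Load (business_key, name) pairs into a dimension keyed by a surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dim_customer (
            customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key
            customer_id   TEXT NOT NULL,                     -- business key from the source
            customer_name TEXT NOT NULL
        )
        """
    )
    # The surrogate key is never supplied by the source; the database generates it.
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return conn


# The business key "C-100" appears twice, e.g. after the customer was renamed.
conn = build_customer_dim(
    [("C-100", "Acme"), ("C-200", "Globex"), ("C-100", "Acme Corp")]
)
dim_rows = conn.execute(
    "SELECT customer_sk, customer_id, customer_name "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
for row in dim_rows:
    print(row)
```

Because each row gets its own surrogate key, the dimension can hold more than one version of the same business key, and fact tables can reference a specific version through a compact integer.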