Journal of Computer and Engineering Technology 5 (1) © Nabu Research Academy, 2018

Identify the Difference Between Database and Data Warehouse

Nur Rachman Dzakiyullah, Dep. of Information Technology, Universitas Aisyiyah Yogyakarta,

[email protected],

Karrar Abdulameer Albo Baqer, National Aerospace University Kharkiv Aviation Institute,

[email protected].

Abstract: A database is designed, built, and maintained to support an information system. The dramatic increase in the transactions of governments and companies has been met by an increase in their databases, data storage, and the queries used to retrieve data from those databases. These organizations use information processing systems to store data about their everyday activities. However, information processing systems rely on online transaction processing (OLTP) in the database, which is not easy for government and company users to access. Moreover, the database was not designed to support a multi-dimensional view. Therefore, the need for a multi-dimensional view, Online Analytical Processing (OLAP), and less time-consuming report generation led to the concept of the data warehouse. This study reviews the database and the data warehouse and identifies the differences between them.

Keywords-Database, data warehouse, OLTP, OLAP.

I. Introduction

An operational database is the database of record, consisting of system-specific reference data and event data belonging to a transaction-update system (Mohammed, Hasson, Shawkat and Al-khafaji, 2012; Sankaran, Suresh, Gupta, Nesamoney and Mukhopadhyay, 1998). It may also contain system control data, such as indicators, flags, and counters. The operational database is the source of data for the data warehouse (Jaber, Ghani, Suryana, Mohammed and Abbas, 2015; Khurram, S., 2008). It contains detailed data used to run the daily operations of the enterprise. These data change constantly as updates are made, and they reflect the current value of the last transaction (Hasso, P., 2009). An operational database contains enterprise data that are up to date and modifiable. In an enterprise data management system, the operational database can be seen as the counterpart of the decision support database, which contains non-modifiable data extracted for the purposes of statistical analysis (Mohammed, Ibrahim and Nadzir, 2016; Charles, J., and Grry, P., 2008). For example, a decision support database provides data from which the average wages of different types of employees can be determined, whereas the operational database contains the same data as used to calculate the amount to pay each employee depending on the number of days worked (Mohammed and Anad, 2014; Marotta, A., and Ruggia, R., 2002).

The conceptual representation of database contents for different targets has justified the need to retrieve and generate a historical perspective on those targets. Thus, current data warehouses are customized to be more flexible in handling the huge volume of incoming information for different data targets. Recently, the field of decision support systems (DSS) has aimed to identify the required trends without needing to look through individual records in isolation. Earlier findings were concerned with the importance of decision support queries relative to Online Transaction Processing (OLTP) queries (Kang, 2002; Mohammed, Ibrahim, Shawkat and Hasson, 2013). The job of earlier on-line operational systems was to perform transaction and query processing, so they are also termed on-line transaction processing (OLTP) systems. Data warehouse systems serve users or knowledge workers in the role of data analysis and decision-making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are called on-line analytical processing (OLAP) systems. According to Inmon (2003), a data warehouse is a “subject-oriented, integrated, time-varying, non-volatile data collection, which is used primarily in organizational decision-making process”. In other words, the data warehouse is a non-changing collection of data about a logical part of the business organization (Tim, 2004). These data are usually quantitative results over some period of time, such as a day or a month. To facilitate decision support, the data warehouse differs in several ways from a traditional OLTP system (Tim, 2004). There are many reasons to distinguish between a data warehouse and a traditional database.
Data warehouses support OLAP, whose functional and performance requirements are quite different from those of the OLTP applications supported by operational databases (Shi, H., et al., 2007). OLTP applications usually computerize clerical data processing tasks such as order entry and banking transactions, which form the day-to-day operations of the organization (Shi, H., et al., 2007). These tasks are structured and repetitive, and consist of short, isolated transactions (Mendelzon, 2000). The transactions require detailed, up-to-date data, and read or update records that are accessed, as a rule, by their primary keys. Operational databases are usually hundreds of megabytes to gigabytes in size. Consistency and the ability to recover the database are critical, and transaction throughput is the key performance metric (Pasha, A., et al., 2004). To understand data warehousing further, the following four characteristics must be defined:

Subject-oriented: Data warehouses are built around broad, non-overlapping subjects such as customer, order, product, vendor, and time, rather than around systems, functions, and processes such as customer billing, order entry, and accounts payable. OLTP systems, on the other hand, are organized around processes (Kimball, R. and Ross, M., 2002).

Integrated: Data is extracted from multiple, autonomous, heterogeneous sources and is integrated through consistent measurement of variables, naming conventions, and physical data definitions (Chen, X., et al., 2008). The sources can be OLTP systems supporting different organizational needs. Such sources are mainly relational database systems such as Oracle and Ingres. They may also include mainframe database applications written in COBOL, or even flat files of data. They may further include data from external sources, such as weather or market conditions, which do not necessarily form part of the company database (Gary, P., and Greg, W., 2006).

Time-variant: Data warehouse data is extracted from operational systems and archived. This archival and the resulting historical value give the data warehouse an element of time as part of its structure. In fact, almost all data warehouse applications have time as one of their dimensions (Paplpanas, T., 2000).

Nonvolatile: Since the data in the data warehouse is a snapshot of the corporation's data at a specific point in time, the data is relatively constant and does not change much over time (Rob, P., and Carlos, C., 2000).
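The time-variant and nonvolatile characteristics can be illustrated with a minimal sketch. It assumes a hypothetical `account` table standing in for the operational database and a hypothetical `account_snapshot` table standing in for the warehouse; the table names, dates, and amounts are invented for illustration and do not come from any cited system. The operational row is overwritten in place, while the warehouse only ever receives dated copies.

```python
import sqlite3

def demo_snapshots():
    """Contrast an updatable operational table with append-only warehouse snapshots."""
    con = sqlite3.connect(":memory:")
    # Operational table (hypothetical): holds only the current value.
    con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
    # Warehouse table (hypothetical): one row per account per snapshot date.
    con.execute(
        "CREATE TABLE account_snapshot ("
        "snapshot_date TEXT, account_id INTEGER, balance REAL, "
        "PRIMARY KEY (snapshot_date, account_id))"
    )
    con.execute("INSERT INTO account VALUES (1, 100.0)")

    daily_changes = [("2018-01-01", 0.0), ("2018-01-02", -30.0), ("2018-01-03", 50.0)]
    for day, delta in daily_changes:
        # OLTP-style transaction: the current value is updated in place.
        con.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (delta,))
        # Warehouse load: a dated copy is appended; earlier rows are never changed.
        con.execute(
            "INSERT INTO account_snapshot SELECT ?, id, balance FROM account",
            (day,),
        )

    current = con.execute("SELECT balance FROM account WHERE id = 1").fetchall()
    history = con.execute(
        "SELECT snapshot_date, balance FROM account_snapshot ORDER BY snapshot_date"
    ).fetchall()
    con.close()
    return current, history
```

Calling `demo_snapshots()` returns a single current balance from the operational table and three dated rows from the snapshot table, so only the warehouse side can answer a question such as what the balance was on 2018-01-02.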

II. Data Warehouse Tools and Techniques

Data warehouses feature huge storage capacities that allow data to be collected without deletion or update. Data warehouses also employ various tools and techniques. These tools can be used to clean, structure, integrate, model, and mine data, and to view it multi-dimensionally. Once these tools have been applied, the data are ready to be used in a DSS. Several tools and techniques are described below.

Extract, Transform and Load (ETL): An ETL tool is used to extract clean data and information from multiple DBs. The tool then transforms the retrieved data into a form suitable for the data warehouse using special rules and tables. The transformed data are finally loaded into the corresponding space in the data warehouse.

Database Management System (DBMS): The DBMS systematically stores data inside the warehouse following structures such as the star schema and the snowflake schema. The star schema uses de-normalized dimension tables, whereas the snowflake schema uses normalized dimension tables. Such structures readily allow a multi-dimensional view. They also facilitate easy access to data and quick responses to queries (Inmon, 2003).

Metadata: Metadata is used to structure information within data warehouses. It provides descriptions and explanations for the flexible use and management of information resources. Metadata is also referred to as “data about data” because it provides detailed descriptions of the data rather than the data itself.

Data Mart: The data mart is a small repository of data and information that is built to store retrieved data according to their respective departments. The data mart is used to provide easy access to data and quick responses to queries. Moreover, the data mart can improve department performance (Kimball and Ross, 2002).
OLAP, Data Mining and Online Analytical Mining (OLAM): OLAP (Online Analytical Processing) is used to analyze data and information through multidimensional views of a cube. Thus, an OLAP cube provides a three-dimensional description of information. The data mining (DM) tool is used to extract new knowledge from large quantities of data. OLAM is a new tool that combines DM and OLAP to obtain information suitable for a DSS.

Decision Support System (DSS): The DSS aids in the decision-making process within an organization. Specifically, the DSS obtains information, knowledge, reports, and analytical data via OLAP, DM, and OLAM. This system assists decision makers in recognizing the available decisions. Moreover, the DSS provides decision makers with useful reports on, and possible solutions to, unstructured and semi-structured problems.

Ad-hoc Query: This technique allows data warehouse users to create their own queries. Ad hoc queries are important because they extend the OLAP data model for the DW.

Figure 1 illustrates the data warehouse and the use of tools and techniques in its platform.

Figure 1: General architecture of the common data warehouse (Turban, Aronson, Liang & Sharda, 2011).
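The ETL, star schema, and OLAP steps described in this section can be sketched end to end as follows. This is only an illustrative sketch under stated assumptions: the source rows, the cleaning rules in `transform`, the table names (`dim_month`, `dim_product`, `dim_region`, `fact_sales`), and the helper functions are all hypothetical and are not taken from the cited works or from any particular ETL product.

```python
import sqlite3

# Hypothetical rows as they might arrive from different operational sources.
SOURCE_ROWS = [
    {"date": "2018-01-05", "product": " Laptop ", "region": "north", "amount": "1200"},
    {"date": "2018-01-17", "product": "laptop", "region": "South", "amount": "900"},
    {"date": "2018-02-02", "product": "Phone", "region": "NORTH", "amount": "400"},
    {"date": "2018-02-09", "product": "phone", "region": "south", "amount": None},
]

def transform(row):
    """Transform step: clean one source row, or return None if it is unusable."""
    if row["amount"] is None:
        return None  # assumed rule: rows without an amount are discarded
    return {
        "month": row["date"][:7],
        "product": row["product"].strip().lower(),
        "region": row["region"].strip().lower(),
        "amount": float(row["amount"]),
    }

def dimension_key(con, table, value):
    """Return the surrogate key of a dimension value, inserting it if it is new."""
    # Table names come only from the fixed tuples in this module, never from input.
    con.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (value,))
    return con.execute(f"SELECT id FROM {table} WHERE name = ?", (value,)).fetchone()[0]

def load_star_schema(rows):
    """Load step: one fact table surrounded by three de-normalized dimension tables."""
    con = sqlite3.connect(":memory:")
    for dim in ("dim_month", "dim_product", "dim_region"):
        con.execute(f"CREATE TABLE {dim} (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    con.execute(
        "CREATE TABLE fact_sales (month_id INTEGER, product_id INTEGER, "
        "region_id INTEGER, amount REAL)"
    )
    for raw in rows:
        clean = transform(raw)
        if clean is None:
            continue
        con.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (
                dimension_key(con, "dim_month", clean["month"]),
                dimension_key(con, "dim_product", clean["product"]),
                dimension_key(con, "dim_region", clean["region"]),
                clean["amount"],
            ),
        )
    return con

def sales_by(con, dimension):
    """OLAP-style roll-up of the fact table along one dimension."""
    if dimension not in ("month", "product", "region"):
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT d.name, SUM(f.amount) FROM fact_sales f "
        f"JOIN dim_{dimension} d ON d.id = f.{dimension}_id "
        f"GROUP BY d.name ORDER BY d.name"
    )
    return con.execute(query).fetchall()
```

With `con = load_star_schema(SOURCE_ROWS)`, the call `sales_by(con, "region")` returns `[("north", 1600.0), ("south", 900.0)]`. The same fact table can be rolled up by month or by product without reloading the data, which is the multi-dimensional view that the star schema is meant to support.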

III. Database vs Data Warehouse

DB tasks are structured and repetitive; they consist of short, isolated operations (Mendelzon, 2000). DB operations require detailed, up-to-date data, and read or update records that are accessed, as a rule, by their primary keys. DBs generally have a storage size of hundreds of megabytes to gigabytes (Pasha, Nasir & Shahzad, 2004). Data warehouses address several needs; their design thus requires many approaches (Marcin, 2009). The most critical concept of data warehousing is the saving of data and information in one place under one data structure model. For example, in a decision support system, integrated data are retrieved from DB platforms, and the retrieved data are thereafter structured by a data model (Kimball & Ross, 2002). Therefore, a data warehouse is built to extract clean data from different sources, model and store the data in one place, and subsequently employ OLAP to provide a multi-dimensional view of the data. The data used in data warehouses differ from those used in operational DBs. Data warehouses use an ETL tool to collect clean data from DBs, whereas database data are generated from daily system operations (Levene & Loizou, 2001). Data warehousing involves more managed, better integrated, and clearer data than DBs; consequently, the data tables in a data warehouse differ from those in a DB, because the DW uses a proper data structure model (Santhi & Jigeesh, 2010). Table 1 shows the difference between operational DBs and data warehouses in terms of data.

Table 1. Data in Operational Database Versus Data in Common Data Warehouse (Liedes & Wolski, 2006)

| Operational data | Data warehouse data |
| --- | --- |
| application oriented | subject oriented |
| detailed | summarized, otherwise refined |
| accurate, as of the moment of access | represents values over time, snapshots |
| assists the office community | assists the managerial community |
| can be updated | is not updated |
| run repetitively and non-reflectively | run heuristically |
| requirements for processing understood before initial development | requirements for processing not completely understood before development |
| compatible with the Software Development Life Cycle | completely different life cycle |
| performance sensitive (immediate response required when entering a transaction) | performance relaxed (immediacy not required) |
| accessed a unit at a time (limited number of data elements for a single record) | accessed a set at a time (many records of many data elements) |
| transaction driven | analysis driven |
| control of update a major concern in terms of ownership | control of update no issue |
| semi-availability | high availability |
| managed in its entirety | managed by subsets |
| non-redundancy | redundancy is a fact of life |
| static structure; variable contents | flexible structure |
| small amount of data used in a process | large amount of data used in a process |
| high speed performance with complex data | slow speed performance with complex data |
| ability to deal with multi-system patterns such as complex business grid | difficulties in dealing with multi-system patterns |
| to monitor and give feedback on several business tasks | to assist planning and decision making |
| short and fast inserts and updates initiated by end users | periodic long-running batch jobs refresh the data |
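Two of the rows in Table 1, unit-at-a-time versus set-at-a-time access and transaction-driven versus analysis-driven use, can be made concrete with a short sketch. The `orders` table, its columns, and both functions are hypothetical and chosen only for illustration; a single in-memory table stands in for both workloads, whereas in practice the analytical query would normally run against the warehouse copy.

```python
import sqlite3

def build_orders():
    """Create a small hypothetical orders table for the comparison."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
        "status TEXT, amount REAL)"
    )
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "open", 20.0),
            (2, "bob", "open", 35.0),
            (3, "alice", "shipped", 15.0),
            (4, "carol", "open", 50.0),
        ],
    )
    return con

def ship_order(con, order_id):
    """Transaction-driven access: touch one record, located by its primary key."""
    with con:  # commits on success, rolls back if an exception is raised
        cur = con.execute(
            "UPDATE orders SET status = 'shipped' WHERE order_id = ?", (order_id,)
        )
    return cur.rowcount  # number of rows changed (0 if the order does not exist)

def revenue_per_customer(con):
    """Analysis-driven access: read many records and summarize them."""
    return con.execute(
        "SELECT customer, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
```

`ship_order` is short, changes a single row, and must respond immediately, while `revenue_per_customer` scans the whole table and returns one summary row per customer.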

IV. Conclusion

The dramatic increase in the transactions of governments and companies has been met by an increase in their databases, data storage, and the queries used to retrieve data from those databases. As a result, the database is not able to handle all of these data and information. Moreover, the relational database was not designed to support a multi-dimensional view. The need for a multi-dimensional view, Online Analytical Processing (OLAP), and less time-consuming report generation led to the concept of the data warehouse. This study reviewed the database and the data warehouse and identified the differences between them. This can help researchers better understand the differences between these two concepts of storing information. Nowadays, large companies use a new technology which is called . More studies are needed in order to understand this new technology.

References

Charles, J., and Grry, P. (2008). Security: Preventing Enterprise Data Leaks at the Source. Oracle Corporation.

Chen, X., Neil, P., & Neil, E. (2008). Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance. ICDE 2008, pp. 1409-1411.

Gary, P., and Greg, W. (2006). What Would an Exemplary Entrepreneurship Dataset Look Like? Imperatives and Opportunities for Research, October 26-27, 2006, Washington, DC.

Hasso, P. (2009). A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database. SIGMOD’09, June 29–July 2, 2009, Providence, Rhode Island, USA. ACM 978-1-60558-551-2/09/06.

Inmon, W. H. (2003). Building the Data Warehouse. Wiley Dreamtech India (P) Limited.

Jaber, M. M., Ghani, M. K. A., Suryana, N., Mohammed, M. A., & Abbas, T. (2015). Flexible data warehouse parameters: Toward building an integrated architecture. International Journal of Computer Theory and Engineering, 7(5), 349.

Kang (2002). Exploiting Versions for On-Line Data Warehouse Maintenance in MOLAP Servers, Proc. of VLDB Conference, China 2002.

Khurram, S. (2008). Semi-star Schema for Operational and Analytical Requirements of SMEs. International Journal of Management and Decision Making (IJMDM): Special Issue on “Decision Support System and Knowledge Management in SME’s”, Greece, 2008.

Kimball, R. & Ross, M. (2002). The Data Warehouse Toolkit, Second Edition, Wiley, 2002.

Levene, M., & Loizou, G. (2001). Guaranteeing no interaction between functional dependencies and tree-like inclusion dependencies. Theoretical Computer Science, 254:683–690, 2001.

Liedes, A., and Wolski, A. (2006). SIREN: A memory-conserving, snapshot-consistent checkpoint algorithm for in-memory databases. In ICDE ’06, page 99, 2006.

Marcin, G. (2009). Extended Cascaded Star Schema and ECOLAP Operations for Spatial Data Warehouse. Intelligent Data Engineering and Automated Learning - IDEAL 2009.

Marotta, A., & Ruggia, R. (2002). Data Warehousing Design: A Schema-transformation Approach. In Proceedings of the 22nd International Conference of the Chilean Computer Science Society, 2002. SCCC 2002, IEEE Computer Society, Atacama, Chile, 6-8 November 2002, pp. 153-161.

Mendelzon (2000). Temporal Queries in OLAP, Proceedings of VLDB Conference, Egypt, 2000.

Mohammed, M. A., & Anad, M. M. (2014). Data warehouse for human resource by Ministry of Higher Education and Scientific Research. In Computer, Communications, and Control Technology (I4CT), 2014 International Conference on (pp. 176-181). IEEE.

Mohammed, M. A., Hasson, A. R., Shawkat, A. R., & Al-khafaji, N. J. (2012). E-government architecture uses data warehouse techniques to increase information sharing in Iraqi universities. In E-Learning, E-Management and E-Services (IS3e), 2012 IEEE Symposium on (pp. 1-5). IEEE.

Mohammed, M. A., Ibrahim, H., & Nadzir, M. M. (2016). Data Warehouse as an Influence Factor in Information Sharing in Public Organization. Journal of Engineering and Applied Sciences, 11(3), 655-661.

Mohammed, M., Ibrahim, H. B., Shawkat, A. R., & Hasson, A. R. (2013). Implementation of Data warehouse Architecture for E-government of Malaysian Public Universities to Increase Information sharing between them. In Proceedings of the International Conference on Rural ICT Development (pp. 1-8).

Paplpanas, T. (2000). Knowledge Discovery in Data Warehouses. ACM SIGMOD Record, 29(3), pp. 88-100.

Pasha, A., Nasir, A., & Shahzad, K. (2004). Semi-Star Modeling Schema for Managing Data Warehouse Consistency. National Conference on Emerging Technologies 2004.

Rob, P., & Carlos, C. (2000). Database Systems: Design, Implementation, and Management, Course Technology, Cambridge.

Sankaran, M., Suresh, S, Gupta, S, Nesamoney, D., & Mukhopadhyay, P. (1998). Apparatus and Method for Capturing and Propagating Changes from an Operational Database to Data Marts: WO Patent 1,998,050,868.

Santhi, K., and Jigeesh, N. (2010). A Virtual Data Warehouse for Manufacturing Industry. Social science research network 2010.

Shi, H., Tung, C., & Jia, S. (2007). Data warehouse enhancement: A semantic cube model approach. Information Sciences 177 (2007) 2238–2254.

Systems (1995). Retrieved on 28 March 2010, from (www.redbrick.com).

Tim, M. (2004). Reconsidering Multi-Dimensional Schemas. SIGMOD Record, Vol. 33, No. 1, March 2004.