<<

Revista Informatica Economică, nr. 3 (43)/2007 91

Database Vs Warehouse

Manole VELICANU, Bucharest, Romania, [email protected] Gheorghe MATEI, Bucharest, Romania, [email protected]

Data warehouse technology includes a set of concepts and methods that offer the users useful information for decision making. The necessity to build a arises from the ne- cessity to improve the quality of information in the organization. The date proceeding from different sources, having a variety of forms – both structured and unstructured, are filtered according to business rules and are integrated in a single large data collection. Using infor- matics solutions, managers have understood that data stored in operational systems – includ- ing , are an informational gold mine that must be exploited. Data warehouses have been developed to answer the increasing demands for complex analysis, which could not be properly achieved with operational databases. The present paper emphasizes some of the cri- teria that information application developers can use in order to choose between a solution or a data warehouse one. Keywords: data warehouse, database, database management systems, information systems, data organisation in externe memory, .

he need for databases and dataware- pany results. T houses The organisation of large volumes of data The possibility for users to efficiently access has evolved from files to database and latter data through analytical queries is of extreme to data warehouses. The pre-requisite of stor- importance for the competitive advantage of ing and processing larger and larger volumes companies. The transfer and sharing of data of data has led to the design of analytical sys- within the organisation, among departments tems based on data warehouses. and different locations as well as among The purpose of such a system is to provide business partners is also important. Solutions analysts with an integrated and consistent are more and more numerous due to the mul- on all the data relevant for the com- titude of systems which may be integrated pany. Based on the data systematised and with decision support systems: databases, consolidated in such data warehouses, com- data warehouses, data marts, business intelli- prehensive analyses of the performance of gence solutions, enterprise-based applica- the company may be derived, various data tions. Managers who will succeed are those correlations may be identified, together with who will implement decision support sys- trends which forecast future developments, tems, business performance monitoring ap- as well as solutions for the improvement of plications, executive information systems or business. business intelligence solutions [INMO05a]. The increased need to anticipate changes in Through these technologies, companies learn market conditions and customer choices re- what has happened to their business, why has quires the development of intelligent busi- happened and what will happen; and all this, ness plans, which involves access to neces- plus the expertise and intuition of users gen- sary information. Most of this information erate competitive advantages. can be found in transactional systems, rela- Organisations must store and process more tional databases included. The ability to and more data, which is more and more di- transform data into information, information versified. The need to use this data as a re- into knowledge and knowledge into action source for the organisation, as decision sup- [VELU03] is a must, so that companies may port, has triggered the ongoing improvement be competitive in an ever-changing economic of information systems. The better the data of environment. The solution to all such prob- a company is organised, the better the com- lems is the development of a data warehouse. 92 Revista Informatica Economică, nr. 3 (43)/2007

Transactional databases provide an answer to data to process - is obvious in the content of operational requirements, while data ware- the database. houses provide an answer to analysis re- The data warehouse includes only informa- quirements, thus offering the possibility for tion which shall be employed in information high quality analyses and complex ad-hoc and analysis processing, while the opera- queries through user friendly interfaces. tional database includes the detailed data re- The fundamental criterion for the organisa- quired for processing purposes, but which is tion of data in data warehouses is the subject not relevant for management or analysis. (the field of activity), while the fundamental The subject-orientation of a data warehouse criterion for databases is the application. allows the development of a decision-making The data warehouse concept is a logical ar- process through an incremental process, chitectural approach to extracting opera- which integrates different subjects into a sin- tional data and transforming it into accurate gle structure. For example, when a customer historical information to support the deci- is included in several operational databases, sion-making process. where he is differently defined, the customer is defined only once within the data ware- Comparative view of the features of data house and is viewed by all users in the same warehouses and databases way. The features of two data organization mode Integration in external memory, data warehouses and da- Operational databases are designed at differ- tabases, can be seen from the definitions pro- ent times, by different teams, in different vided below. ways. Thus, from a functional point of view, A database is an application-oriented collec- the database cannot be employed for analysis tion of data that is organised, structured, co- and reporting purposes. Data warehouse is an herent, with minimum and controlled redun- enterprise project. It includes data from all or dancy, which may be accessed by several most operational databases of the organisa- users in due time [VELU03]. tion, which is stored consistent to allow ana- A data warehouse is a subject-oriented col- lysts to focus on the use of data, not on its re- lection of data that is integrated, time-variant, liability and consistency. Consistency of data non-volatile, which may be used to support is very important for databases and is ensured the decision-making process [INMO05b]. by the objective of the database management We may thus infer the main features of data system - DBMS regarding . warehouses and databases, which shall be de- Consistency also applies to data warehouses scribed below. with respect to: field names, code systems, Subject-orientation date representations, variable measurements, Data organisation in data warehouses is physical attributes etc., so that the reports based on areas of interest, on the major sub- generated for various departments or differ- jects of the organisation: customers, prod- ent times shall include the same results. ucts, activities etc. Databases organise data Time-variant based on enterprise applications resulted The value of operational data in databases is from its functions. updated periodically and shows the current The main objective of a data warehouse is to status. On the other hand, for the information support the decision-making system, focus- requirements of economic analysis based on ing on the subjects of the organisation. In- data warehouses, historical data is of the es- formation is organised on these subjects. The sence, as it shows the trends for accurate objective of a database is to support the op- forecasting. The regular loading of data from erational system and information is organised operational databases makes the data in data on applications and processes. All data items warehouses time-variant. Data in data ware- which refer to the same subject or event in houses accurately shows the status at differ- the real world are linked and the orientation - ent moments, thus providing a historical view Revista Informatica Economică, nr. 3 (43)/2007 93 of date. This makes data warehouses differ- hoc queries, mainly for management pur- ent from operational databases, where data poses. should show the status at the time of access. 2) Functional requirements are different: op- In databases, data is updated with each new erational databases mainly focus on data se- transaction and former values are usually curity and coherence, which makes queries lost. Operational databases rarely keep his- slow, special ad-hoc, mainly in the case of torical data and this only for short times, as unpredicted criteria, while in data ware- their purpose is to keep current data. As op- houses is usually. These queries, specific to posed to these systems, data warehouses are economic analysis, may significantly com- not updated, but data is loaded periodically to promise the performance of the operational show the history of data. This allows for the system, due to the lack of predictable in- identification of trends, as well as for com- dexes, as is the case of data warehouses. parisons between different time periods. The 3) Although most operational systems and time horizon of data warehouses is signifi- data warehouses are built on relational tech- cantly longer compared to that of operational nologies, their design is substantially differ- databases, providing information from a his- ent, as their purpose is also different. Opera- torical perspective (5-10 years). That is why, tional databases are designed for online any structure in the data warehouse includes, and their main objec- either explicitly or implicitly, the time ele- tive refers to the efficient storing of a large ment, to identify a certain feature at a certain amount of transactional data. They include time, which is not mandatory for databases. current information on day-to-day activities No volatility and process-oriented information which is Data in data warehouses is static, not dy- subject to updating. As a result, data is dy- namic as is the case with operational sys- namic and thus, very volatile. The tasks of tems. As data warehouses show operational such systems are structured and repetitive data at a certain time, data will not be up- and are made up of current, short and isolated dated once loaded in data warehouses. As a transactions, which include detailed data. result, an identical query made after one year These transactions read or update few re- based on the same reference data will yield cordings – tens at most, mainly accessed the same result. In operational databases, in- based on their primary keys. Operational da- formation is volatile, as queries focus on cur- tabases reach sizes from hundreds of mega- rent data. Data is updated on an ongoing ba- bytes to gigabytes. Their consistency is es- sis, usually on a transactional basis. Any sential and refers to rapid transaction proc- transaction processed involves updating: add- essing. ing new records, modifying or deleting exist- As opposed to transactional databases, data ing other. warehouses are designed to be the support of decision-making systems. They are designed Differences between data warehouses and to facilitate , not efficient stor- operational databases ing, and the only operations performed refer Operational databases and data warehouses to the initial data loading, data access and its are mostly based on the same technological refreshment. Historical, summarised and support: both are data collections, both func- consolidated data is more important than de- tion based on keys, indexes and views, both tailed data. Data warehouses include consoli- is based to a . dated data from several operational systems, Nevertheless, the two systems are different, with long time horizons, and have an inte- as the criteria described below shows. grated and evolutive vision. As data is static 1) From a functional point of view: opera- and non-volatile, the size of data warehouses tional databases process transactions, provid- may reach in time hundreds of gigabytes, ing answers to operational requirements, terabytes or even petabytes. Many ad-hoc while data warehouses are used based on ad- queries may be made and millions of records 94 Revista Informatica Economică, nr. 3 (43)/2007 may be accessed, as many joins and aggrega- tems. Depending on the complexity of proc- tions may be performed. Information is sub- essing requirements, response times ranging ject-oriented, as data warehouses provide a from several seconds to days are allowed. multidimensional view on data, based on an 7) Data integrity is seen differently. Integrity intuitive model, designed to meet the re- constraints are established for the verification quirements of data analysis and decision of input data in operational databases. These makers. constraints are not necessary in the case of 4) Another difference refers to the status data warehouses, as data has been verified shown by data. Data warehouses show the and filtered before loading, and historical status of data at different moments in time, data will not be updated following its loading thus providing historical outlook. This is dif- in the data warehouse. In operational data- ferent from operational databases, where data bases, a transaction must lead the data collec- shows the current status at the time of access. tion from one consistent status to another 5) Optimisation is an issue for both systems, consistent one. This involves complex but in a different way. While operational da- mechanisms for data integrity maintenance tabases are designed to provide data process- systems: data logs, data restore, detection of ing optimisation and security, data ware- blockings, backup and recovery. These houses optimise analyses and the economic mechanisms are useless in the case of data significance of data. Operational databases warehouses. Treating of updating anomalies can be said to be optimised for writing, while are not as important as in the case of transac- data warehouses are optimised for reading. tional systems, as data warehouses are spe- To this purpose, multidimensional modelling cialised and optimised for the fast retrieval of is used for the design of data warehouses to large volumes of data, and updating refers make queries for the analysis and summary only to the regular add of new data. of large amounts of data more efficient. The 8) Another difference between the two types structure of a data warehouse is simple, intui- of systems refers to the mechanisms required tive and easy to understand by non-expert us- for users’ concurrent access. As data ware- ers, as opposed to the structure of an opera- houses are not updated, transaction manage- tional database, which is designed based on ment, concurrent access management and the entity-relationship model, by specific other such mechanisms integrated in the da- techniques which are complex and difficult tabase management system are used only in to understand. On the contrary, multidimen- the initial loading stage and for subsequent sional modelling involves the denormalisa- add, due to the fact that they are expensive tion of tables. This enters controlled data re- from the point of view of response time. dundancy, allows analyses from different These mechanisms may be disabled during points of view and different levels of detail. the current use of data warehouses. The free- 6) Categories of users are different. Opera- dom thus generated may be employed for the tional databases are meant for a large number optimisation of data access by: denormalisa- of users, from different categories. Auto- tion, summarisation, data access statistics, mated processes are repetitive, as processing index dynamic reorganisation etc. requirements are known before the initial de- 9) Backup and recovery strategies are differ- velopment. The system should immediately ent for the two types of systems. Most data in provide answers to any query or to any new data warehouses is historical data, which is transaction. As opposed to these systems, non-variant and does not require repetitive data warehouses are used by a small number saving. New data can be saved at the time of of users, namely by managers and business loading. It is advisable for data to be saved analysts. Processes are heuristic, as require- from intermediary databases in certain cases ments are not completely known before the in order to minimise impact on the perform- initial development. Response requirements ance of data warehouses. Recovery policies are more lax compared to operational sys- may also be different in the case of data Revista Informatica Economică, nr. 3 (43)/2007 95 warehouses as opposed to operational data- References bases, depending on how critical permanent, [1] [DEVL97] Barry Devlin, Data Ware- seamless access to data warehouses is for the house from Architecture to Implementation, organisation. In actual database backup and Addison-Wesley, 1997 recovery task is for DBMS. In actual data [2] [ENWIKI] Enciclopedia Wikipedia, warehouse this task is for database adminis- Data Warehouse, www.en.wikipedia.org trator. [3] [GAVA99] S. Gatziu, A. Vavouras, Data 10) Not only data organisation is different in Warehousing, Concepts andMechanisms, data warehouse from operational databases, Revista Informatik-Informatique, Nr. 1/1999. but the interfaces used as well. Data ware- [4] [INMO05a] , Data Ware- houses support analytical processing by house and Decision Support Systems, 2005, OLAP – On-Line Analytical Processing, www.dssresoures.com which differs from a functional point of view [5] [INMO05b] W. H. Inmon, Building the from transactional processing applications. Data Warehouse, 4th Edition,Wiley Publish- The attempt to perform analytical processing ing, Inc., Indianapolis, 2005 and comprehensive queries in operational da- [6] [KIMB02] , The Data tabases will only reduce performance. Warehouse Toolkit, Practical Techniques Conclusion. The differences shown above are For Building Dimensional Data Warehouse, part of the reasons why data warehouses are 2nd Edition, John Wiley&Sons, New York, built separately from operational databases. 2002 The separation of the two systems ensures [7] [TUAR01] E. Turban, J. Aronson, Ting- the scalability of business intelligence solu- Peng Liang, Decision Support Systems and tions as well as their ability to answer rapidly Intelligent Systems, 7th Edition, Prentice Hall, and efficiently to queries on the company. Inc., 2004. Data warehouses allow comprehensive [8] [VELI05] M. Velicanu, Dicţionar expli- analyses, as the structures of data collections: cativ al sistemelor de baze de date, ed. are more simple – only necessary informa- Economică, 2005. tion is retain, are standardised – structures [9] [VELU03] M. Velicanu, I. Lungu ş.a., are well documented, and are denormalised Sisteme de baze de date – teorie şipractică, – there are fewer joins between data collec- ed. Petrion, 2003. tions.