Mediators in the Architecture of Future Information Systems
Total Page:16
File Type:pdf, Size:1020Kb
Mediators in the Architecture of Future Information Systems Gio Wiederhold Stanford University Septemb er An edited version of this rep ort was published in The IEEE Computer Magazine March Abstract The installation of highsp eed networks using optical b er and high bandwidth messsage forwarding gateways is changing the physical capabilities of information systems These capabilities must b e complemented with corresp onding software systems advances to obtain a real b enet Without smart software we will gain access to more data but not improve access to the typ e and quality of information needed for decision making To develop the concepts needed for future information systems we mo del information pro cessing as an interaction of data and knowledge This mo del provides criteria for a highlevel functional partitioning These partitions are mapp ed into information pro cess ing mo dules The mo dules are assigned to no des of the distributed information systems A central role is assigned to mo dules that mediate between the users workstations and data re sources Mediators contain the administrative and technical knowledge to create information needed for decisionmaking Software which mediates is common to day but the structure the interfaces and implementations vary greatly so that automation of integration is awkward By formalizing and implementing mediation we establish a partitioned information sys tems architecture which is of managable complexity and can deliver much of the p ower that technology puts into our reach The partitions and mo dules map into the p owerful distributed hardw are that is b ecoming available We refer to the mo dules that p erform these services in a sharable and comp osable wayasmediators We will present conceptual requirements that must b e placed on mediators to assure eective largescale information systems The mo dularity in this architecture is not only a goal but also enables the goal to b e reached since these systems will need autonomous mo dules to p ermit growth and enable them to survive in a rapidly changing world The intent of this pap er is to provide a conceptual framework for many distinct eorts The concepts provide a direction for an information pro cessing systems in the foreseeable future We also indicate some subtasks that are of research concern to us In the long range the exp erience gathered by diverse eorts may lead to a new layer of highlevel communication standards Intro duction Computerbased information systems connected to worldwide highsp eed networks provide increasingly rapid access to a wide variety of data resources Mayo This technology op ens up p ossibilities of access to data requiring capabilities for assimilation and analysis which greatly exceed what wenowhave in hand Without intelligent pro cessing these advances will have only a minor b enet to the user at a decisionmaking level That brave user will be swamp ed with illdened data of unknown origin The problems We nd twotyp es of problems for single databases a primary hindrance for enduser access is the volume of data that is b ecoming available the lack of abstraction and the need to understand the representation of data the ma jor concern when combining information from multiple databases the mismatch encountered in information representation and structure We will expand on those two issues Volume and Abstraction The volume of data can b e reduced by selection It is not coincidental that SELECT is the principal op eration of relational database management systems but selected data is still at to o ne a level of detail to b e useful for decision making Further reduction is achieved by bringing data to higher levels of abstraction Aggregation op erations as COUNT AVERAGE SD MAX MIN etc provide some computational faciliti es for abstraction but anysuch abstraction is formulated within the application using some domain knowledge Typ e of abstraction Example Base data Abstraction Granularity Sales detail Product summaries Generalization Product data Product type Temp oral Daily sales Seasonally adjusted monthly sales Relative Product cost Inflation adjusted trends Exception recognition Accounting detail Evidence of fraud Path computation Airline schedules Trip duration and cost Figure Abstraction functions For most base data more than one abstraction must b e supp orted for the salesmanager the aggregation is by sales region while for marketing aggregations by customer income are appropriate Examples of required abstraction typ es are given in Fig Computational requirements for abstraction are often complicated The groupings to dene abstractions may for instance involve recursive closures These cannot b e sp ecied with current database query languages Application programs are then written byspe cialists to reduce the data The use of sp ecic datapro cessing programs as intermediaries diminishes exibility and resp onsiveness for the end user Now the knowledge that creates the abstractions is hidden and hard to share and reuse Mismatch Data obtained from remote and autonomous sources will often not match in terms of naming scop e granularity of abstractions temp oral bases and domain denitions as listed in Figure The dierences shown in the examples must b e resolved b efore automatic pro cessing can join these values Typ e of mismatch Example Domains Key dierence Alan TuringThe Enigma versus QATH reference for reader reference for librarian Scop e dierence employees paid versus employees available includes retirees includes consultants Abstraction grain personal income versus family income from employment for taxation Temp oral basis monthly budget versus weekly production central oce factory Domain semantics postal codes versus town names one can cover multiple places can have multiple codes Value semantics excessive pay versus excessive pay per internal revenue servive per board of directors Figure Mismatches in data resources Without an extended pro cessing paradigm as prop osed in this pap er the information needed to initiate actions will b e hidden in ever larger volumes of detail scrollable on ever larger screens in ever smaller fonts In essence the gap b etween information and data will yet b e wider than it is now Knowing that information exists and is accessible creates exp ectations by endusers Finding that it is not available in a useful form or that it cannot b e combined with other data creates confusion and frustration Webelieve that the ob jection made by some users ab out computerbased systems that they create information overload is voiced b ecause those users get to o much of the wrong kind of data Use of a model In order to visualize the requirements we place on future information systems we consider the activitie s that are carried out to day whenever decisions are b eing made The making of informed decisions requires the application of a variety of knowledge to information ab out the state of the world To clarify the distinction of data and and knowledge in our mo del we restate a denition from WiederholdB Data describ es sp ecic instances and events Data may gathered automatically or clerically The correctness of data can b e veried visavis the real world Knowledge describ es abstract classes Each class typically covers many instances Exp erts are needed to gather and formalize knowledge Data can b e used to disproveknowledge To manage the variety of knowledge we employ sp ecialists and to manage the volume of data we segment our databases In these partitions partial results are pro duced abstracted and ltered The problem of making decisions is now reduced to the issue of chosing and evaluating the signicance of the pieces of information derived in those partitions and fusing the imp ortant p ortions For example an investment decision for a manufacturer will dep end on the fusion of information on its own pro duction capabilityversus that of others of its sales exp erience in related pro ducts of the market for the conceived pro duct at a certain price of the costto price ratio appropriate for the size of the market and its cost of the funds to b e invested Sp ecialists will consider these diverse topics and each sp ecialist will consult multiple data resources to supp ort their claims The decision maker will fuse that information using considerations of risk and longrange ob jectives in combining the results An eective architecture for future information systems must supp ort automated in formation acquisition pro cesses for decisionmaking activiti es By default we approachthe solution in an manner analogous seen in humanbased supp ort systems While we mo del the system based on abstractions from the world of human information pro cessing wedonot constrain the architecture to b e anthrop omorphic and to mimic human b ehavior There are many asp ects of human b ehavior that wehave not yet b een able to capture and formalize adequately Our goal is merely to dene an architecture wholly comp osed of pieces of soft ware that are available or app ear to b e attainable in a mo dest timeframe saytenyears The demands of the software in terms of pro cessing cost should b e such that mo dern hardware can deal with it Current state We are not starting from a zero base Systems are b ecoming available now with the capabil ities envisaged byVannevar Bush for his memex in Bush We can select and scroll information on our workstation displays wehave remote access to data and can presentthe values on one of multiple windows we can insert do cuments into les in our workstation and we