Efficient and Secure Statistical Computing in Office Applications

Gökhan Aydınlı, School of Business and Economics, Humboldt-Universität zu Berlin, CASE, SFB 373, Spandauer Straße 1, D-10178 Berlin, [email protected]

Overview

"Let's not kid ourselves: the most widely used piece of software for statistics is Excel." This quote from B.D. Ripley soberly describes the current demand for statistical software. Not only students of economics, management science and related fields but industry in particular asks for intuitive, efficient and secure software for statistical data analysis, though not at the price of high implementation costs and a steep learning curve. We try to contribute to this pursuit and argue in favor of spreadsheet applications as an appropriate interface to matrix-oriented statistical languages. We present the add-ins MD*ReX and RExcel, two statistical environments embedded in Microsoft Excel via (D)COM clients, the former based on the XploRe client/server architecture and the latter on R as a numerical-statistical "methods server". We emphasize the productivity gain available from combining the computational power of a statistical programming environment with the direct manipulation facilities of spreadsheet programs like Excel. We also want to stimulate discussion of how to secure the communication in such a client/server environment.

Spreadsheets and Statistics

The use of electronic spreadsheets as the primary software tool for teaching management science modeling techniques and quantitative methods in economics and finance has undoubtedly played a key role in the increasing impact of quantitative lectures in graduate programs. Researchers suggest that the ability to extract data from various sources and the ability to embed analytical decision models within larger systems are two of the most valuable skills for business students entering today's IT-dominated workplace. But today's industry-standard spreadsheet hardly meets the requirements of modern statistical analysis and efficient method proliferation: standardization, transparency and reproducibility.

A vast literature has evolved over the last decade on methods of teaching and proliferating statistics, especially in management science. A considerable portion of this literature concentrates on spreadsheets as a means of teaching quantitative skills. The suggested fields of application range from introductory statistics to more elaborate examples such as decision support models or Markov chain Monte Carlo (MCMC) simulations. Despite, or because of, the known deficiencies of Excel in the field of statistics, the literature points to various additional solutions that overcome Excel's computational drawbacks. Excel was never designed to be a full-blown statistical package, so we cannot expect functionality similar to that of professional statistical programs. On the other hand, having powerful statistical methods available directly within Excel can turn it into a well-known and convenient front end to scientific statistical engines.
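Computational drawbacks of the kind referred to above are often numerical rather than functional. The following Python sketch is a generic illustration of how a textbook formula can lose accuracy in floating-point arithmetic. It does not claim to reproduce any particular Excel routine, and the function names and test data are chosen here purely for illustration.

```python
def variance_one_pass(xs):
    """Sample variance via the 'calculator' formula (sum x^2 - n*mean^2)/(n-1).

    Subtracting two huge, nearly equal numbers cancels most significant digits.
    """
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def variance_two_pass(xs):
    """Sample variance from squared deviations about the mean (more stable)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Illustrative data: small spread around a large offset.
# The exact sample variance is 30 (deviations from the mean are -6, -3, 3, 6).
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

naive = variance_one_pass(data)
stable = variance_two_pass(data)

print("one-pass formula:", naive)
print("two-pass formula:", stable)
```

On this input the two-pass version recovers the exact value, while the one-pass version is dominated by rounding error. A dedicated statistical engine behind the spreadsheet is one way to avoid depending on such formulas.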

Glueing the Pieces Together

The Component Object Model (COM) architecture allows code libraries to be embedded into any application that implements a COM client interface; the programs of the Microsoft Office suite, for example, provide COM connectivity. To achieve this, one has to develop "glue" code in the client application. COM add-ins make it possible to use one shared "glue" code library across Office applications. Our aim is to provide an environment that offers an intuitive and easy approach to statistical and quantitative methods without giving up accuracy, transparency and efficiency. We therefore propose a multi-level architecture based on the computing paradigm of client/server applications and add-ins; the COM and COM add-in approaches implement such a client/server model. The statistical engine functions as a method repository and provides scalable computing power for applications such as multi-asset option hedging, Value at Risk (VaR) or educational purposes. This architecture also makes the statistical engine easy to exchange.

Furthermore, we are aware of different modes of usage, so it seems desirable to account for various user profiles: the methods developer (teacher), who needs direct access to the statistical engine (e.g. through the command line utility in MD*ReX); the sophisticated methods user (graduate student), who seeks a macro editor (e.g. the XploRe Direct utility in MD*ReX); and of course the naive user (undergraduate student), who is accustomed to a menu-driven interface with dialogues and menu options (e.g. the MD*ReX Toolbar). RExcel offers the same functionality.
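The separation between glue code and an exchangeable statistical engine can be sketched in a few lines of Python. This is purely illustrative and is not the MD*ReX or RExcel implementation: the names `StatEngine`, `LocalEngine`, `SpreadsheetGlue`, `call` and `summarize_range` are hypothetical, and a real add-in would reach XploRe or R through COM rather than through an in-process object.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence
import statistics


class StatEngine(ABC):
    """Hypothetical interface of a 'methods server' seen from the client."""

    @abstractmethod
    def call(self, method: str, data: List[float]) -> float:
        """Run the named method on the data and return a scalar result."""


class LocalEngine(StatEngine):
    """Stand-in engine backed by Python's statistics module (assumption)."""

    _methods = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "stdev": statistics.stdev,
    }

    def call(self, method: str, data: List[float]) -> float:
        if method not in self._methods:
            raise ValueError(f"unknown method: {method}")
        return float(self._methods[method](data))


class SpreadsheetGlue:
    """Hypothetical glue layer: turns a cell range into an engine call."""

    def __init__(self, engine: StatEngine) -> None:
        # The glue depends only on the interface, so engines can be swapped.
        self.engine = engine

    def summarize_range(
        self, cell_range: Sequence[Sequence[Optional[float]]], method: str
    ) -> float:
        # Flatten the rectangular block and skip empty cells (None).
        values = [float(c) for row in cell_range for c in row if c is not None]
        return self.engine.call(method, values)


# A 3x2 block of cells with one empty cell, as a spreadsheet might deliver it.
sheet = [[1.0, 2.0], [3.0, None], [4.0, 10.0]]
glue = SpreadsheetGlue(LocalEngine())
mean_value = glue.summarize_range(sheet, "mean")
median_value = glue.summarize_range(sheet, "median")
print(mean_value, median_value)
```

Because `SpreadsheetGlue` only knows the `StatEngine` interface, replacing `LocalEngine` with a class that forwards calls to a remote server would leave the spreadsheet side unchanged, which is the exchangeability argued for above.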

Do we need Security?

Data transfer in distributed or even remote environments raises security questions, in particular the risk of snooping. COM's RPC-based security architecture provides authentication, but encryption only on a per-packet basis. Further privacy might be achieved via secure Virtual Private Networks. MD*ReX provides its own protocol, MD*Crypt. This Java-based COM interface could provide key-based encryption via Secure Sockets (SSL).
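As a rough illustration of encrypting the client/server channel, the sketch below uses Python's standard `ssl` module to wrap a client socket in TLS before any data is sent. It is not MD*Crypt and makes no claim about that protocol; the function names `make_client_context` and `send_request`, as well as the host and port in the usage note, are hypothetical placeholders.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with certificate and hostname checks on."""
    ctx = ssl.create_default_context()  # defaults: verify cert and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocols
    return ctx


def send_request(host: str, port: int, payload: bytes,
                 timeout: float = 10.0) -> bytes:
    """Send one request over an encrypted connection and return the reply.

    Assumes a server that speaks TLS on the given port and answers with
    at most 4096 bytes; both are assumptions made for this sketch.
    """
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # The handshake happens here; nothing is sent in clear text after it.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(payload)
            return tls.recv(4096)


context = make_client_context()
print(context.verify_mode, context.check_hostname)
```

A call such as `send_request("stats.example.org", 4433, b"mean 1 2 3")` would then carry the data over the encrypted channel, given a matching server. Keeping encryption in the transport layer leaves the statistical requests themselves unchanged.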

REFERENCES

Aydınlı, G. (2001), Net Based Spreadsheets in Quantitative Finance, in W. Härdle, T. Kleinow, G. Stahl: Applied Quantitative Finance, Springer Verlag, Heidelberg.
Aydınlı et al. (2001), ReX: Linking XploRe to Excel, Computational Statistics & Data Analysis, Vol. 12, p. 27-37.
McCullough, B.D., Wilson, B. (1999), On the Accuracy of Statistical Procedures in Microsoft Excel, Computational Statistics & Data Analysis, Vol. 31, p. 27-37.
Neuwirth, E., Baier, T. (2001), Embedding R in Standard Software, and the other way round, DSC 2001 Proceedings of the 2nd International Workshop on Distributed Statistical Computing.
www.-stat.de, www.r-project.org, www.md-rex.com, md-crypt.com

SUMMARY

We have argued in favor of spreadsheets as an appropriate interface to matrix-oriented statistical languages. We have presented the add-ins MD*ReX and RExcel, two statistical environments embedded in Microsoft Excel via COM clients, based respectively on the XploRe client/server architecture and on R as a numerical and statistical "methods server". We have emphasized the productivity gain available from combining the computational power of a statistical programming environment with the manipulation facilities available in spreadsheets like Excel. Likewise, we have stimulated discussion of how to secure the communication in such a client/server environment.