Probability Distributions in Library and Information Science: A Historical and Practitioner Viewpoint

Stephen J. Bensman
LSU Libraries, Louisiana State University, Baton Rouge, LA 70803. E-mail: [email protected]

This paper has a dual character dictated by its twofold purpose. First, it is a speculative historiographic essay containing an attempt to fix the present position of library and information science within the context of the probabilistic revolution that has been encompassing all of science. Second, it comprises a guide to practitioners engaged in statistical research in library and information science. There are pointed out the problems of utilizing statistical methods in library and information science because of the highly and positively skewed distributions that dominate this discipline. Biostatistics are indicated as the source of solutions for these problems, and the solutions are then traced back to the British biometric revolution of 1865–1950, during the course of which modern inferential statistics were created. The thesis is presented that science has been undergoing a probabilistic revolution for over 200 years, and it is stated that this revolution is now coming to library and information science, as general stochastic models replace specific, empirical informetric laws. An account is given of the historical development of the counting distributions and laws of error applicable in statistical research in library and information science, and it is stressed that these distributions and laws are not specific to library and information science but are inherent in all biological and social phenomena. Urquhart’s Law is used to give a practical demonstration of the distributions. The difficulties of precisely fitting data to theoretical probability models in library and information science because of the inherent fuzziness of the sets are discussed, and the paper concludes with the description of a simple technique for identifying and dealing with the skewed distributions in library and information science. Throughout the paper, emphasis is placed on the relevance of research in library and information science to social problems, both past and present.

© 2000 John Wiley & Sons, Inc. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 51(9):816–833, 2000

Introduction

This paper has a dual character dictated by its twofold purpose. First, it is a speculative historiographic essay, in which I attempt to describe the present state of library and information science in terms of the overall development of science. To do this, I connect the history of library and information science with the history of probability and statistics. Second, the paper is intended to serve as a practical guide to persons doing statistical research in library and information science. Thus, the components comprising the duality of this paper are closely interconnected.

I came to this topic as a result of recently completed research (Bensman, 1996; Bensman & Wilder, 1998) on the market for scientific information. In this research, I wanted to solve what seemed a simple problem: What role does scientific value play in the price libraries pay for scientific journals? To solve this problem, I had to use parametric statistical techniques such as correlation and regression, and these techniques immediately confronted me with what seemed to be an extremely difficult problem. These techniques are based on the assumption of the normal distribution, whereas library and information science data do not conform to the normal distribution but are dominated by horrendously skewed distributions. I realized that there is a need to connect the information science laws with the probability distributions, on which statistics are based, in some easily understandable manner, as an aid to persons conducting statistical investigations of the problems afflicting libraries. This need is becoming especially pressing, as computers are not only making much more data available but also making simpler highly sophisticated statistical analyses through spreadsheets and software such as SAS.

To obtain help in this matter, I contacted the LSU Department of Experimental Statistics, which assigned me as an adviser an ecologist named Jay Geaghan. Jay suggested that I read a manual entitled Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates (Elliott, 1977). It was an eye-opener in two respects. First, the manual introduced me to the system of probability distributions, with which biologists model patterns in nature, showing how to test for them and transform them for standard parametric statistical operations. Second, it pointed out that the key model for the skewed distributions dominating biological phenomena is the negative binomial distribution. This jarred a memory of Price (1976) describing the negative binomial distribution as the model for the double-edged Matthew Effect, which Robert K. Merton and his students, Jonathan and Stephen Cole and Harriet Zuckerman, had placed at the basis of the social stratification of science. In my previous writings (Bensman, 1982, 1985), I had posited the double-edged Matthew Effect as underlying the skewed patterns of library use.

The research on the scientific information market necessitated solving many complex statistical problems. In seeking the solutions for these problems, I noticed a dual pattern. First, most of these problems had already been solved in biostatistics. Second, most of the works presenting these solutions were British. The Elliott manual, for example, was published by the British Freshwater Biological Association, and based on samples of benthic invertebrates in the English Lake District. It also dawned on me that bibliometrics as a discipline had also risen to a great extent in Britain. Being a historian by training, my interest was naturally piqued, and I decided to write a book that would not only present a history of this development but would also be an aid to persons doing statistical research in library and information science. Such an approach seemed particularly beneficial, because, like myself, many persons in the library field understand things better in terms of their historical development than mathematically.

The Probability Distributions Affecting Library and Information Science

In the analysis of the production, dissemination, use, and evaluation of human knowledge, we are basically dealing with three discrete or counting distributions and two continuous laws of error. The counting distributions are the following: (1) the binomial, which models uniformity and whose characteristic is that the variance is less than the mean; (2) the Poisson, which models randomness and whose characteristic is that variance equals the mean; and (3) the negative binomial, which models concentration and whose characteristic is that the variance is greater than the mean. I hasten to add that the negative binomial is only the most useful of a series of contagious distributions, and, depending on the circumstances, it can change into the beta binomial, Poisson, or logarithmic series.

To help explain the idea of a law of error, I will present to you my concept of a statistical model. A statistical model is a mental construct of reality that is logically designed to test a hypothesis. It is centered on a hypothetical point, from which deviations are measured according to a law of error. Depending on the size of the deviation from the hypothetical point on which the model is centered, one accepts or rejects the hypothesis being tested. In statistical textbooks, the law of error is the normal distribution, and the hypo- [...] is the geometric mean, from which deviations are measured in logarithmic units. The negative binomial can be transformed into an approximation of the lognormal distribution.

Library and Information Science within the Context of the Historical Relationship of Probability and Statistics to Science as a Whole

The history of probability and statistics is too complex to be adequately summarized in a paper such as this. Therefore, I will restrict myself to reviewing the theses of two key books on this subject. Together, these books validate the view presented in a two-volume collection of essays published by MIT Press and entitled The Probabilistic Revolution (1987): that since 1800, the world has been experiencing a scientific revolution, in which the mathematical theory of probability has been adopted in discipline after discipline. This probabilistic revolution is coming to library and information science, as specific, empirical, bibliometric laws are being replaced by general stochastic models. Of primary importance in this transition has been the seminal work on bibliometric distributions by Bertram C. Brookes and Abraham Bookstein.

For his part, Brookes (1977, 1979, 1984) concentrated on Bradford’s Law of Scattering, which he explored theoretically as a very mixed Poisson model. Coming to regard Bradford’s Law as a new calculus for the social sciences, he found it almost identical mathematically to other empirical bibliometric laws, suspecting of these laws that “beneath their confusions there lurks a simple distribution which embraces them all but which remains to be identified” (Brookes, 1984, p. 39). He reduced these various laws to a single law, which he modeled two ways as “the Inverse Square Law of frequencies” and “the Log Law of ranks.” The main features of Brookes’ hypothesis of a single distribution arising from a mixed Poisson process were endorsed by Bookstein. In his work, Bookstein (1990, 1995, 1997) posited through mathematical analysis that the various bibliometric laws together with Pareto’s law on income are variants of a single distribution, in spite of marked differences in their appearance. Seeking a way to deal with these distributions, Bookstein (1997) came to the following conclusion:

I have argued . . . that one important mechanism for surviving in an ambiguous world is to create functional forms that are not too seriously affected by imperfect conceptualization. In this article I pushed this notion further, and looked at suitable random components for the underlying stable expectations.
