References NEW SOFTWARE TOOLS Anderson, James P. (1980), “Computer security threat monitoring and surveillance,” Technical report, James Mosaic Displays in P. Anderson Co., Fort Washington, PA, April 1980. S-PLUS: A General Computer Immune Systems, www.cs.unm.edu/forrest/ Implementation and a (accessed June 23, 1998) Case Study. COAST project, www.cs.purdue.edu/coast/ By John W. Emerson (accessed June 23, 1998) Introduction DuMouchel, W. and Schonlau, M. (1998), “A compar- Hartigan and Kleiner (1981) introduced the mosaic as ison of test statistics for computer intrusion detection a graphical method for displaying counts in a contin- based on principal components regression of transition gency table. Later, they defined a mosaic as “a graphi- probabilities,” In Proceedings of the 30th Symposium cal display of cross-classified data in which each count on the Interface: Computing Science and Statistics, (to is represented by a rectangle of area proportional to the appear). count” (Hartigan and Kleiner 1984). Mosaics have been implemented in SAS (see Friendly 1992) as a graphi- Emerald, phlox.csl.sri.com/emerald/ cal tool for fitting log-linear models. Interactive mosaic plots (see Theus 1997a, b) have been implemented in (accessed June 23, 1998) Java. A third implementation is available in MANET, a data-visualization software package specifically for Intrusion Detection for Large Networks, seclab.cs.ucdavis.edu/arpa/ the Macintosh. No general implementation has been available in S-PLUS, one of the most popular statisti- (accessed June 23, 1998) cal packages. Netranger, The implementation presented in this article, while www.wheelgroup.com/netrangr/1netrang.html lacking the modelling features of Friendly’s SAS im- (accessed June 23, 1998) plementation, provides a simply specified function for mosaics displaying the joint distribution of any number Martin Theus of categorical variables. As an illustration, this article
AT&T Labs-Research examines patterns in television viewer data. A four-way
[email protected] table of 825 cells represents Nielsen tele- vision ratings (number of viewers) broken down by day, time, network, and switching behavior (changing chan- Matthias Schonlau nels, turning the television off, or staying with the cur- AT&T Labs-Research and rent channel) for the week starting November 6, 1995. National Institute of Simple patterns in the data appearing in the mosaic sup- Statistical Sciences port intuitive explanations of viewer behavior. [email protected] The Data
Nielsen Media Research maintains a sample of over 5,000 households nationwide, installing a Nielsen Peo- ple Meter (NPM) for each television set in the house- hold. The sample is designed to reflect the demographic composition of viewers nationwide, and uses 1990 Cen- sus data to achieve the desired result. Nielsen summa- rizes the stream of minute-by-minute measurements to provide quarter-hour viewing measurements (defined as the channel being watched at the midpoint of each
Vol.9 No.1 Statistical Computing & Statistical Graphics Newsletter 17 Monday Tuesday Wednesday Thursday Friday 8:00 8:15 8:30 8:45 9:00 9:15 9:30 9:45 10:00 10:15 10:30
21.7% 20.9% 19.7% 20.1% 17.6% 9.1% 9.4% 9.5% 9.6% 9.6% 9.6% 9.4% 9.3% 8.6% 8.3% 7.6%
Figure 1. (a) Mosaic of the week’s aggregate audience by day (lefthand panel); and (b) mosaic of the week’s aggregate audience by time (righthand panel).
quarter-hour block) for each viewer in the sample. (De- Shading of the tiles resulting from the inclusion tails are presented in Nielsen’s National Reference Sup- of the final variable in the mosaic may be speci- plement 1995.) fied, if desired. The amount of space separating A “TV guide” of the prime-time programming of the the tiles at each level of the mosaic may also be four major networks (ABC, CBS, NBC, and FOX) for customized. the weekdays starting Monday, November 6, 1995 ap- The documentation and S-PLUS code are available – pears in Figure 2. During any quarter-hour, the indi- details are provided at the end of the article. The basic vidual is observed watching a major network channel, algorithm, an efficient recursive procedure, proceeds as a non-network channel, or not watching television. At follows: 10:00 however, FOX ends its network programming, so Nielsen does not record individuals watching FOX af- 1. Initialize the parameters and the graphics device
ter 10:00. I confine this study to a subset consisting of – the lower left and upper right corners of the plot
x y x y