Popular Article Popular Kheti Volume -2, Issue-1 (January-March), 2014 Available online at www.popularkheti.info © 2014 popularkheti.info ISSN:2321-0001

Use of Statistical Packages for Designing and Analysis of Experiments in
Latika Sharma and Nitu Mehta* (Ranka)
Dept. of Agricultural Economics & Management, Rajasthan College of Agriculture, MPUAT, Udaipur-313001, India
*Email of corresponding author: [email protected]

Many statistical packages are available for analyzing the data generated from an experiment, but few software packages are available for generating the design itself. By generation of a design we mean an algorithm that is capable of producing designs already available in the literature; it has not yet been possible to generate new designs using computer algorithms, and this remains an open area of research. The purpose of the present article is to introduce the various statistical software packages that are useful for the design and analysis of experiments.

Introduction

Statistical science is concerned with the twin aspects of the theory of design of experiments and sample surveys, and with drawing valid inferences therefrom using various statistical techniques and methods. The art of drawing valid conclusions depends on how the data have been collected and analyzed. Depending upon the objective of the study, one has to choose an appropriate statistical procedure to test the hypothesis. When the number of observations is large, or when the researcher is interested in multifarious aspects of a study, such calculations are very tedious and time-consuming on a desk calculator. It is therefore essential that the manpower engaged in teaching and research be trained in the application of various statistical techniques through the use of a computer. An attempt has been made to cover computer-aided analysis (using various statistical packages) related to descriptive statistics, tests of significance, design and analysis of experiments, nonparametric methods, forecasting through time-series models and some financial analysis.

In agricultural research, the key questions to be answered are generally expressed as hypotheses that have to be verified or disproved through experimentation. Whenever we want to ascertain the validity of an assertion, we need to generate data and then draw valid conclusions on the basis of those data. Thus, any experimentation has two major components: designing the experiment, and analyzing the data generated to draw meaningful and valid conclusions. In the earlier days of experimentation, designs were generated in such a way that the data were easy to analyze. But with the advent of high-speed computers, mere ease of analysis cannot be a strong reason for generating and ultimately using a design. With many statistical

Popular Kheti ISSN:2321-0001 112 Popular Sharma and Mehta, 2014, Pop. Kheti, 2(1):112-117 Article

software packages available, analyzing the data is no longer a problem worth naming. Other considerations now go into the choice of a design for a given experimental situation. The design should be cost-effective, keeping in view the scarce and expensive resources. It should provide precise estimates of the comparisons of interest to the experimenter. It should be able to absorb various shocks such as loss of data, presence of outliers, interchange and/or exchange of treatments and model inadequacy, besides providing as small an experimental error as possible, or in other words as small a coefficient of variation (CV) as possible.

As far as the analysis of the data generated is concerned, many statistical software packages come in very handy. But as far as generation of the design is concerned, not many software packages are available. By generation of the design we mean an algorithm that is capable of generating a design already available in the literature. Using computer algorithms, it has not been possible to generate new designs; this is an open area of research. The purpose of the present article is therefore to introduce the various statistical software packages that are useful for the design and analysis of experiments.
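To make the CV criterion mentioned above concrete, the following Python sketch computes it for a randomized complete block design as 100 × sqrt(error mean square) / grand mean. This is an illustration only: the function name `rcbd_cv` and the yield figures are made up for this example and do not come from any of the packages discussed.

```python
from math import sqrt

def rcbd_cv(yields):
    """Coefficient of variation (%) for a randomized complete block design.

    yields[i][j] is the response of treatment i in block j
    (a hypothetical data layout chosen for this sketch).
    """
    t = len(yields)       # number of treatments
    b = len(yields[0])    # number of blocks
    grand_total = sum(sum(row) for row in yields)
    grand_mean = grand_total / (t * b)
    correction = grand_total ** 2 / (t * b)

    # Partition the total sum of squares into treatment, block and error parts
    total_ss = sum(y * y for row in yields for y in row) - correction
    treatment_ss = sum(sum(row) ** 2 for row in yields) / b - correction
    block_totals = [sum(row[j] for row in yields) for j in range(b)]
    block_ss = sum(bt ** 2 for bt in block_totals) / t - correction

    error_ss = total_ss - treatment_ss - block_ss
    error_ms = error_ss / ((t - 1) * (b - 1))
    return 100.0 * sqrt(error_ms) / grand_mean

# Made-up yields: 3 treatments (rows) by 4 blocks (columns)
data = [
    [20.0, 22.0, 21.0, 23.0],
    [25.0, 26.0, 27.0, 26.0],
    [30.0, 31.0, 29.0, 32.0],
]
cv = rcbd_cv(data)
print(f"CV = {cv:.2f}%")
```

A smaller CV indicates a smaller experimental error relative to the mean response, which is the sense in which the text prefers designs with a low CV.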

2. Statistical Packages for Designing and Analysis of Experiments

Software packages are useful for creating cutting-edge methodologies and building revealing graphics that can lead to important discoveries from experimental data. Computers can also help in cataloguing designs, generating a design and generating its randomized layout, besides analyzing the data generated so that statistically meaningful conclusions can be drawn. Computers may also be used in the classroom for teaching the design and analysis of experiments. A number of software packages are available for the design and analysis of experiments. Some of these statistical packages are Statistical Analysis System (SAS), JMP, Statistical Package for Social Sciences (SPSS), SYSTAT, GENSTAT, GLIM, MS-EXCEL, Design-Expert Software, MICROSTAT, MSTATC and Statistical Package for Block Designs (SPBD).
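As an illustration of what "generation of the randomized layout of the design" involves, here is a minimal Python sketch for a randomized complete block design, in which every treatment appears once in each block in an independently shuffled order. The function name `rcbd_layout`, the treatment labels and the seed are assumptions made for this example; the packages listed above use their own procedures.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)   # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)      # randomize plot positions within this block
        layout.append(order)
    return layout

layout = rcbd_layout(["T1", "T2", "T3", "T4"], n_blocks=3, seed=2014)
for block_no, order in enumerate(layout, start=1):
    print(f"Block {block_no}: {' '.join(order)}")
```

Recording the seed with the field plan allows the same layout to be regenerated later when the data are analyzed.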

SAS

SAS/STAT software, an integral component of the SAS System, provides extensive statistical capabilities with tools for both specialized and enterprise-wide analytical needs. Ready-to-use procedures handle a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis and nonparametric analysis.

1. Analysis of Variance: Analysis of variance is a technique for analyzing experimental data. With SAS/STAT software, you can perform analysis of variance for balanced or unbalanced designs, multivariate analysis of variance and repeated measurements analysis of variance. You can also fit general linear models and mixed models for a variety of data situations, including random effects, repeated measurements and unbalanced designs.

2. Regression: Regression analysis examines the relationship between a response variable and a set of explanatory variables. The relationship is expressed as an equation that predicts the response variable from a function of the explanatory variables and a set of parameters. SAS/STAT software offers a general regression procedure that uses least squares to estimate the parameters, includes nine different model selection methods, such as stepwise regression, and produces a variety of diagnostic measures. More specialized procedures fit generalized linear models, mixed linear models, nonlinear models and quadratic response surface models.

3. Categorical Data Analysis: Categorical data are those where the outcome of interest reflects categories rather than the typical interval scale. The data are often presented in tabular form, known as contingency tables. With SAS/STAT software, you can investigate the association in a contingency table and produce measures that indicate the strength of that association. You can also use parametric models to investigate the variation of a function of the outcome variable across the various levels of the contingency table, analyzing functions such as means, logits and proportions. Typical analyses include log-linear models and bioassay analysis.

4. Multivariate Analysis: Multivariate analysis encompasses a wide variety of methods for modelling data with two or more response variables, or for identifying relationships among several variables without designating particular variables as response or explanatory variables. You can use common factor analysis to explain the correlations among a set of variables in terms of a limited number of unobservable, or latent, variables. Principal component analysis summarizes a large number of variables with a small number of linear combinations. SAS/STAT software also performs canonical correlation, discriminant analysis, path analysis and structural equation modelling.

5. Nonparametric Analysis: Nonparametric analysis provides methods for analyzing data that do not require specific distributional assumptions such as normality. Many nonparametric methods are based on the ranks of the observations. SAS/STAT software performs nonparametric analysis of variance, including the Kruskal-Wallis, Wilcoxon-Mann-Whitney and Friedman tests, as well as other rank tests for balanced or unbalanced one-way or two-way designs. Exact probabilities are computed for many nonparametric statistics.

6. Other Statistical Components in the SAS System: Several other components of the SAS System also provide statistical support. SAS/INSIGHT software is a highly interactive tool for data visualization and interactive data analysis. SAS/QC software provides tools for statistical quality improvement, including tools for statistical quality control and design of experiments. SAS/ETS software includes tools for econometrics and time series analysis. SAS/IML software is a powerful matrix-programming language with extensive mathematical operators and built-in functions that allow you to program statistical algorithms easily. SAS/OR software provides a wide range of optimization methods with numerous statistical applications.
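To show what a rank-based method such as the Kruskal-Wallis test computes, the sketch below calculates its H statistic in plain Python. It is not SAS code and makes no claim about how SAS implements the test; the function names and the yield figures are hypothetical, and the tie correction is deliberately omitted to keep the example short.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Made-up yields for three treatments, three plots each
groups = [
    [12.1, 13.4, 11.8],
    [14.2, 15.1, 14.9],
    [16.3, 17.0, 16.8],
]
h = kruskal_wallis_h(groups)
print(f"H = {h:.2f}")
```

For samples that are not too small, H is usually compared with a chi-square distribution with (number of groups - 1) degrees of freedom; for very small samples exact probabilities, as mentioned above, are preferable.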


JMP

JMP is statistical discovery software for the Apple Macintosh, the Power Macintosh and Windows. JMP is built around dynamic statistical graphics: it shows each statistical result graphically as well as in reports. It helps us understand the data better and reveals patterns, or interesting points that do not fit the patterns, which can lead to important discoveries. It works on the principle of discover more, interact more and understand more. Like SAS, JMP uses a general computational approach that works for any general linear model, balanced or unbalanced, complete or with missing observations. One can have main effects, interactions and nested terms in the model. It offers least squares means, user-defined contrasts and saved output. It also helps the user to identify the important main effects and interactions in screening trials, and the optimum combination and critical points in response surface methodology. It also helps in the analysis of experiments with mixtures.

SPSS (Statistical Package for Social Sciences)

SPSS is a software package used for conducting statistical analysis, manipulating data and generating tables and graphs that summarize data. It is a comprehensive and flexible statistical analysis and data management product produced by IBM SPSS, Inc. in Chicago, Illinois. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and to conduct complex statistical analyses. SPSS is available on several platforms: Windows, Macintosh and UNIX systems. Among its features are modules for statistical data analysis, including descriptive statistics such as plots, frequencies, charts and lists, as well as sophisticated inferential and multivariate statistical procedures such as analysis of variance (ANOVA), factor analysis and categorical data analysis. SPSS is particularly well suited to survey research, though it is by no means limited to this field.

SYSTAT

SYSTAT provides a powerful statistical and graphical analysis system in a graphical user interface environment with descriptive menus, toolbars and dialog boxes. It offers numerous statistical features, from simple descriptive statistics to highly sophisticated statistical algorithms. Taking advantage of the enhanced user interface and environment, SYSTAT offers many major performance enhancements for speed and ease of use. Most tasks can be accomplished simply by pointing and clicking the mouse, and SYSTAT makes extensive use of drag-and-drop and right-click mouse functionality. SYSTAT's intuitive Windows interface and flexible command language are designed to make your research more efficient. You can quickly locate advanced options through clear, comprehensive dialogs. SYSTAT also offers a huge data worksheet for powerful data handling. SYSTAT handles most of the popular data formats, such as Excel, SPSS, SAS, BMDP, MINITAB, S-Plus, Statistica, JMP and ASCII. All matrix operations and computations are menu driven.

GENSTAT

GENSTAT is an interactive general-purpose statistical package with a flexible command language. It covers most of the standard statistical analysis procedures and can easily be extended for newer techniques. It is particularly useful for the analysis of designed experiments and for fitting linear models (including regression) and general linear models. It also covers most multivariate techniques, time-series analysis and optimization.

GLIM (Generalized Linear Interactive Modelling)

GLIM is an interactive, command-driven package that is primarily concerned with fitting generalized linear models. This means that it covers regression, ANOVA, probit and logit analysis, and log-linear models.
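As an example of a generalized linear model, the following Python sketch fits a logit model with one explanatory variable by Newton-Raphson iteration, which for this model coincides with iteratively reweighted least squares. It is not GLIM syntax; the function name `fit_logit`, the dose levels and the responses are hypothetical, and no safeguards against non-convergence (for example, perfectly separated data) are included.

```python
from math import exp

def fit_logit(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(a + b * xi)))
            w = p * (1.0 - p)          # binomial variance weight
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Made-up bioassay-style data: dose level and response (1 = responded)
doses = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
a, b = fit_logit(doses, responses)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

A positive slope here means that the estimated probability of response increases with dose, which is the typical question in the probit and logit analyses mentioned above.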

Design-Expert Software

This is software for the design of experiments on Windows. Its main features are response surface designs and designs for experiments with mixtures.

Very important contributions have been made in the field of design of experiments, but these designs have not found favor with experimenters in the National Agricultural Research System (NARS). The reasons are not far to seek. Generating the layout of the design, and a randomized layout at that, and subsequently analyzing the data generated may be a stumbling block for experimenters. A need was therefore felt to develop indigenous software packages that provide a catalogue of designs from which the experimenter may choose a design for the experiment, produce a randomized layout of the design and then analyze the data. Such a package must be user-friendly and operable without the aid of a manual, with online help and details available within the package. With these points in view, the Indian Agricultural Statistics Research Institute has initiated work on the development of statistical software packages for cataloguing and generating designs, along with analysis of the data generated.

Stata

It is an increasingly popular and powerful package. It is my runaway favorite. Strengths include the ease of using a command-line interface, a menu-based interface or saved "do" files of commands. The menu-based interface produces the appropriate command-line commands, so it is a great way to improve your understanding of Stata's command language. The Stata documentation is first rate both in its examples and in its presentation of the underlying econometric theory. There is also a large Stata user community which (1) powers the Statalist mailing list, where members are very generous in helping to answer questions both simple and complex, and (2) contributes a huge number of user-written commands which one can easily add to one's Stata toolkit. Several of these commands make it very simple to produce ready-to-publish tables of results, summary data, correlation tables, etc., which saves one from having to do post-formatting in Excel or Word. With Stata's Mata language, aimed mostly at matrix manipulation, one can craft powerful and fast programs. Stata's main command language is quite powerful in itself, but Mata can be pre-compiled and is especially good at complex calculations. Stata can produce attractive graphs of many types. Because there are so many options, it can be a little overwhelming at first (a good chance to use the menus to learn), and Stata does not do 3-D graphics (the several user-written modules that do so are quite basic). Additionally, it is not primarily focused on data manipulation.

Conclusion

The statistical analysis packages are a mixture of command-driven and menu-driven software. Data are entered from the keyboard. Data file operations include sorting, file merging and case deletion. Experiments can also be randomized to assist in experiment planning. Results include descriptive statistics, correlation and regression, ANOVA, multivariate analysis, nonparametric analysis and other statistical components. These statistical software packages are useful for the design and analysis of experiments, making the job easier and faster.
