TKR College of Engineering and Technology Department of Computer Science and Engineering PE4-S1- INTRODUCTION TO ANALYTICS

Model Questions -1 Part-A
1. List the Data types in R
• Logical • Numeric • Integer • Complex • Character • Raw
2. Explain R loops
A loop statement allows us to execute a statement or group of statements multiple times.
• repeat loop - Executes a sequence of statements repeatedly; the exit condition is tested inside the loop body, so the body runs at least once.
• while loop - Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
• for loop - Executes the loop body once for each element of a vector or sequence.
Control statements:
• break statement - Terminates the loop and transfers execution to the statement immediately following the loop.
• next statement - Skips the remainder of the current iteration and moves on to the next iteration of the loop.
3. What is a Data Frame?
• Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain a different mode of data: the first column can be numeric while the second column can be character and the third column can be logical. A data frame is a list of vectors of equal length. Data frames are created using the data.frame() function and display data along with header information.
• A data frame is used for storing data tables; it is a list of vectors of equal length.

4. Explain the concept of Reading Datasets
We can import datasets from various sources and file types, for example:
• .csv or .txt format
• Big data tool – Impala
CSV File: The sample data can also be in comma separated values (CSV) format. Each cell inside such a data file is separated by a special character, which usually is a comma, although other characters can be used as well. The first row of the data file should contain the column names instead of the actual data. Here is a sample of the expected format.
Col1,Col2,Col3
100,a1,b1
200,a2,b2
300,a3,b3
After we copy and paste the data above into a file named "mydata.csv" with a text editor, we can read the data with the function read.csv. In R, data can be read in two ways: from local disk or from the web.
From disk: if the data file location on local disk is known, use the read.csv() or read.table() functions; if the path is not specified, use file.choose().
> mydata = read.csv("mydata.csv") # read csv file
> mydata
From web: the URL of the data is passed to the read.csv() or read.table() functions.
5. What is R? Why use R? Justify
R is a flexible and powerful open-source implementation of the language S (for statistics) developed by John Chambers and others at Bell Labs.
Five reasons to learn and use R:
• R is open source and completely free. R community members regularly contribute packages to increase R's functionality.
• R is as good as commercially available statistical packages like SPSS, SAS, and Minitab.
• R has extensive statistical and graphing capabilities. R provides hundreds of built-in statistical functions as well as its own built-in programming language.
• R is used in teaching and performing computational statistics. It is the language of choice for many academics who teach computational statistics.
• Getting help from the R user community is easy. There are readily available online tutorials, data sets, and discussion forums about R.
6. Write short notes on R looping and control statements.
A loop statement allows us to execute a statement or group of statements multiple times.
• repeat loop - Executes a sequence of statements repeatedly; the exit condition is tested inside the loop body.
• while loop - Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
• for loop - Executes the loop body once for each element of a vector or sequence.
Control statements:
• break statement - Terminates the loop and transfers execution to the statement immediately following the loop.
• next statement - Skips the remainder of the current iteration and moves on to the next iteration of the loop.
7. What are the challenges in analysis of data?
Data analytics is extremely important for risk managers. It improves decision-making, increases accountability, benefits financial health, and helps employees predict losses and monitor performance. Common challenges include:
• The amount of data being collected
• Collecting meaningful and real-time data
• Visual representation of data
• Data from multiple sources
• Inaccessible data
• Poor quality data
• Pressure from the top
• Lack of support
• Confusion or anxiety
• Budget
• Shortage of skills
• Scaling data analysis

8. Justify R as a data analytics software
• R allows practicing a wide variety of statistical and graphical techniques like linear and nonlinear modeling, time-series analysis, classification, classical statistical tests, clustering, etc. R is a highly extensible and easy-to-learn language.
• R has extensive statistical and graphing capabilities. R provides hundreds of built-in statistical functions as well as its own built-in programming language.
• R is used in teaching and performing computational statistics. It is the language of choice for many academics who teach computational statistics.

9. What is a Data set?
A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. Data sets can also consist of a collection of documents or files.
10. What is an R Script? Give example.
• An R script is simply a text file containing (almost) the same commands that you would enter on the command line of R. "Almost" refers to the fact that if you are using sink() to send the output to a file, you will have to enclose some commands in print() to get the same output as on the command line.
Example: The R script(s) and data view. The R script is where you keep a record of your work.
To create a new R script file: 1) File -> New -> R Script, 2) Click on the icon with the "+" sign and select "R Script", or 3) Use the shortcut Ctrl+Shift+N.
------Part-B

1. What is the difference between an array and a matrix? Explain with examples.
Arrays
An array object (or simply array) contains a collection of elements of the same type, each of which is indexed (i.e., identified) by a number, or in the multi-dimensional case by one number per dimension. To use an array in R we: 1. construct the array object, specifying its dimensions (the number of elements along each dimension); 2. access the elements of the array through their indices in order to assign or obtain their values (as if they were single variables).
Matrix
A matrix is a collection of elements of the same type, organized in the form of a table. Each element is indexed by a pair of numbers that identify the row and the column of the element. In R, a matrix is simply an array with exactly two dimensions and is created with the matrix() function.

• We have two different options for constructing matrices or arrays: either we use the creator functions matrix() and array(), or we simply change the dimensions of an existing vector using the dim() function. For example, you make an array with four columns, three rows, and two "tables" like this:
> my.array <- array(1:24, dim=c(3,4,2))
In the above example, "my.array" is the name we have given the array, and "<-" is the assignment operator. There are 24 units in this array, written "1:24", and they are divided over three dimensions "(3, 4, 2)". Although the rows are given as the first dimension, the tables are filled column-wise. So, for arrays, R fills the columns, then the rows, and then the rest.
Alternatively, you could just add the dimensions using the dim() function. This is a little hack that goes a bit faster than using the array() function; it's especially useful if you have your data already in a vector. (This little trick also works for creating matrices, by the way, because a matrix is nothing more than an array with only two dimensions.) Say you already have a vector with the numbers 1 through 24, like this:
> my.vector <- 1:24
You can easily convert that vector to an array exactly like my.array simply by assigning the dimensions, like this:
> dim(my.vector) <- c(3,4,2)
Arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions.
Matrix
A matrix is a collection of elements of the same type, organized in the form of a table. Each element is indexed by a pair of numbers that identify the row and the column of the element. A matrix is created in R with the matrix() function:
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow=2, ncol=3, byrow = TRUE)
print(M)
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"
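As a brief illustrative sketch (not part of the original answer), the structures created above can be compared and their elements accessed by index; the values in the comments are simply what R returns for these objects:
# Build the array two ways and confirm the results are identical
my.array <- array(1:24, dim = c(3, 4, 2))
my.vector <- 1:24
dim(my.vector) <- c(3, 4, 2)
identical(my.array, my.vector)   # TRUE - both are 3x4x2 arrays
my.array[2, 3, 1]                # row 2, column 3, table 1 -> 8
# A matrix is just a two-dimensional array
M2 <- matrix(1:6, nrow = 2, ncol = 3)
M2[2, 3]                         # row 2, column 3 -> 6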

2) Explain the procedure to create an R script to combine two data sets.
Combining data sets in R
merge(): merges two data frames (data sets) horizontally. In most cases, we join two data frames by one or more common key variables (i.e., an inner join).
To merge two data frames by ID: total <- merge(dataframeA, dataframeB, by="ID")
To merge on more than one criterion, e.g. to merge two data frames by ID and Country: total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
To join two data frames (data sets) vertically, use the rbind() function. The two data frames must have the same variables, but they do not have to be in the same order. Example: total <- rbind(dataframeA, dataframeB)
plyr package: tools for splitting, applying and combining data. We use rbind.fill() from the plyr package in R. It binds or combines a list of data frames, filling missing columns with NA. Example: rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")]). Here all the missing values will be filled with NA.

Example4: To create a matrix having the data 6, 2, 10 & 1, 3, -2
Step 1: create two vectors xr1, xr2
> xr1 <- c( 6, 2, 10)
> xr2 <- c(1, 3, -2)
> x <- rbind (xr1, xr2) ## binds the vectors into rows of a matrix (2x3)
> x
[,1] [,2] [,3]
xr1 6 2 10
xr2 1 3 -2
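As an illustrative addition, a small hedged sketch of the merge() and rbind() calls described above, using two toy data frames (dfA, dfB and dfC are hypothetical names, not from the original answer):
dfA <- data.frame(ID = c(1, 2, 3), Score = c(85, 90, 78))
dfB <- data.frame(ID = c(2, 3, 4), Grade = c("A", "B", "C"))
# Horizontal combination: inner join on the common key ID
merge(dfA, dfB, by = "ID")       # keeps only the rows with ID 2 and 3
# Vertical combination: both data frames must have the same variables
dfC <- data.frame(ID = c(5, 6), Score = c(70, 88))
rbind(dfA, dfC)                  # five rows, columns ID and Score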



3) What is RStudio?. Explain its features R Studio is an Integrated Development Environment (IDE) for R Language with advanced and more user-friendly GUI. R Studio allows the user to run R in a more user-friendly environment. It is open-source (i.e. free) and available at http://www.rstudio.com/

The R Studio screen has four windows: 1. Console. 2. Workspace and history. 3. Files, plots, packages and help. 4. The R script(s) and data view.
The R script is where you keep a record of your work. To create a new R script file: 1) File -> New -> R Script, 2) Click on the icon with the "+" sign and select "R Script", or 3) Use the shortcut Ctrl+Shift+N.
Console: The console is where you can type commands and see output.
Workspace tab: The workspace tab shows all the active objects. It stores any object, value, function or anything you create during your R session. In the example below, if you click on the dotted squares you can see the data on a screen to the left.
R itself is:
• A programming language for graphics and statistical computations
• Available freely under the GNU public license
• Used in data mining and statistical analysis
• Includes time series analysis, linear and nonlinear modeling among others
• Supported by a very active community and package contributions
• Usable with very little programming language knowledge
• Downloadable from http://www.r-project.org/ (open source)

4) How do you understand learning objectives? Explain

Understanding Learning objectives, Introduction to work & meeting requirements, Time Management, Work management & prioritization, Quality & Standards Adherence.
Understanding Learning objectives: The benefits of this course include:
• Efficient and Effective time management
• Efficient – Meeting timelines
• Effective – Meeting requirement for desired output
• Awareness of the SSC (Sector Skill Council) environment and time zone understanding
• Awareness of the SSC environment and importance of meeting timelines to handoffs
Review the course objectives listed above. To fulfil these objectives today, we'll be conducting a number of hands-on activities. Hopefully we can open up some good conversations and some of you can share your experiences so that we can make this session as interactive as possible. Your participation will be crucial to your learning experience and that of your peers here in the session today.
5) Explain R functions and R loops with examples
A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and the user can create their own functions. In R, a function is an object, so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions. The function in turn performs its task and returns control to the interpreter, as well as any result, which may be stored in other objects.
Function Definition
An R function is created by using the keyword function. The basic syntax of an R function definition is as follows:
function_name <- function(arg_1, arg_2, ...) {
  Function body
}
Function Components
The different parts of a function are −

 Function Name − This is the actual name of the function. It is stored in R environment as an object with this name.

 Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.

 Function Body − The function body contains a collection of statements that defines what the function does.

 Return Value − The return value of a function is the last expression in the function body to be evaluated.
R has many in-built functions which can be directly called in the program without defining them first. We can also create and use our own functions, referred to as user-defined functions.
Built-in Functions
Simple examples of in-built functions are seq(), mean(), max(), sum(x) and paste(...) etc. They are directly called by user-written programs.
# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
# Find mean of numbers from 25 to 82.
print(mean(25:82))
# Find sum of numbers from 41 to 68.
print(sum(41:68))
When we execute the above code, it produces the following result −
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
[1] 53.5
[1] 1526
User-defined Functions
We can create user-defined functions in R. They are specific to what a user wants and once created they can be used like the built-in functions. Below is an example of how a function is created and used.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
# Calling the function, e.g. new.function(6), prints the squares 1, 4, 9, 16, 25, 36.
R Loops
There may be a situation when you need to execute a block of code several times. In general, statements are executed sequentially: the first statement in a function is executed first, followed by the second, and so on. Programming languages provide various control structures that allow for more complicated execution paths. A loop statement allows us to execute a statement or group of statements multiple times; the loop forms available in R are described below.

1 repeat loop
Executes a sequence of statements repeatedly; the exit condition is tested inside the loop body, so the body must contain a break statement.
v <- c("Hello","loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt+1
  if(cnt > 5) {
    break
  }
}
When the above code is compiled and executed, it produces the following result −
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"

2 while loop
Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
v <- c("Hello","while loop")
cnt <- 2
while (cnt < 7) {
  print(v)
  cnt = cnt + 1
}
When the above code is compiled and executed, it produces the following result −
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"

3 for loop
Executes the loop body once for each element of a vector or sequence, so there is no separate loop-variable bookkeeping to write.
v <- LETTERS[1:4]
for ( i in v) {
  print(i)
}
When the above code is compiled and executed, it produces the following result −
[1] "A"
[1] "B"
[1] "C"
[1] "D"

Loop Control Statements
Loop control statements change execution from its normal sequence. When execution leaves a scope, all automatic objects that were created in that scope are destroyed. R supports the following control statements.

1 break statement
Terminates the loop statement and transfers execution to the statement immediately following the loop.
2 next statement
Skips the remainder of the current iteration and moves control to the next iteration of the loop (similar to continue in other languages).
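A short hedged sketch (the vector v below is only illustrative) showing break and next used inside a for loop:
v <- 1:10
for (i in v) {
  if (i %% 2 == 0) next   # skip even numbers and move to the next iteration
  if (i > 7) break        # stop the loop entirely once i exceeds 7
  print(i)                # prints 1, 3, 5, 7
}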

6) Explain briefly about the types of Datasets with their syntax and example
In contrast to other programming languages like C and Java, in R the variables are not declared as some data type. The variables are assigned with R-objects and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are −

• Vectors • Lists • Matrices • Arrays • Factors • Data Frames
The simplest of these objects is the vector object and there are six data types of these atomic vectors, also termed the six classes of vectors. The other R-objects are built upon the atomic vectors.
Data types: • Logical • Numeric • Integer • Complex • Character • Raw

When you want to create a vector with more than one element, you should use the c() function, which combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))
When we execute the above code, it produces the following result −
[1] "red" "green" "yellow"
[1] "character"
Lists
A list is an R-object which can contain many different types of elements inside it, like vectors, functions and even another list.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
When we execute the above code, it produces the following result −
[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x) .Primitive("sin")
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"
Arrays
While matrices are confined to two dimensions, arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the example below we create an array with two elements which are 3x3 matrices each.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
When we execute the above code, it produces the following result −
, , 1

[,1] [,2] [,3] [1,] "green" "yellow" "green" [2,] "yellow" "green" "yellow" [3,] "green" "yellow" "green"

, , 2

[,1] [,2] [,3] [1,] "yellow" "green" "yellow" [2,] "green" "yellow" "green" [3,] "yellow" "green" "yellow"

7) Write short notes on Outliers. Discuss in detail with an example.
Normally we use a box plot and a scatter plot to find outliers from a graphical representation.
An outlier is a point or an observation that deviates significantly from the other observations.
Reasons for outliers: experimental errors or "special circumstances".
Outlier detection tests are used to check for outliers. Outlier treatments are of three types:
• Retention
• Exclusion
• Other treatment methods
The outliers package in R can be used to detect and treat outliers in data.
Outlier detection from graphical representation: scatter plot and box plot.
Example
# Inject outliers into data.
cars1 <- cars[1:30, ] # original data
cars_outliers <- data.frame(speed=c(19,19,20,20,20), dist=c(190, 186, 210, 220, 218)) # introduce outliers.
cars2 <- rbind(cars1, cars_outliers) # data with outliers
# Plot of data with outliers.
par(mfrow=c(1, 2))
plot(cars2$speed, cars2$dist, xlim=c(0, 28), ylim=c(0, 230), main="With Outliers", xlab="speed", ylab="dist", pch="*", col="red", cex=2)
abline(lm(dist ~ speed, data=cars2), col="blue", lwd=3, lty=2)
# Plot of original data without outliers. Note the change in slope (angle) of the best fit line.
plot(cars1$speed, cars1$dist, xlim=c(0, 28), ylim=c(0, 230), main="Outliers removed \n A much better fit!", xlab="speed", ylab="dist", pch="*", col="red", cex=2)
abline(lm(dist ~ speed, data=cars1), col="blue", lwd=3, lty=2)
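Beyond the plots, a hedged numeric check can use boxplot.stats(), whose $out component lists the points outside the box-plot whiskers; cars2 is the data frame with injected outliers created above:
# Values flagged by the box-plot rule (beyond 1.5 * IQR from the quartiles)
outlier_values <- boxplot.stats(cars2$dist)$out
print(outlier_values)
# Visual confirmation with a box plot
boxplot(cars2$dist, main = "Distances with outliers", boxwex = 0.3)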

8) Explain about Quality and Standard Adherence
Adherence to Quality Standards. Licensee agrees that the nature and quality of all goods and services provided by Licensee in connection with the use of the Intellectual Property shall conform to the standards set by Licensee for its own goods and services ("Quality Standards").
Quality means different things to different people, but if you go back to first principles, quality can be defined as 'fitness for purpose'. To elaborate, it is really around the surety of a product or a service fulfilling what it is intended to do.

As a global leader in providing quality, safety and testing services for over 130 years globally, at Intertek we have a thorough understanding of the parameters that can help businesses achieve highest level of quality assurance throughout the supply chain

Quality standards are defined as documents that provide requirements, specifications, guidelines, or characteristics that can be used consistently to ensure that materials, products, processes, and services are fit for their purpose.

Standards provide organizations with the shared vision, understanding, procedures, and vocabulary needed to meet the expectations of their stakeholders. Because standards present precise descriptions and terminology, they offer an objective and authoritative basis for organizations and consumers around the world to communicate and conduct business

9) Discuss the following
a) Data types in R. Give examples.
In analytics, data is classified at a very broad level as Quantitative (numeric) and Qualitative (character/factor).
• Numeric data includes the digits 0-9, the decimal point "." and the negative "-" sign.
• Character data is everything except numeric data, for example names, gender etc.
For example, "1, 2, 3…" are quantitative data while "Good", "Bad" etc. are qualitative data. We can convert qualitative data into quantitative data using ordinal values; for example, "Good" can be rated as 9 while "Average" can be rated as 5 and "Bad" can be rated as 0.
Data Type – Example – Verify
Logical – TRUE, FALSE – v <- TRUE; print(class(v)) gives [1] "logical"
Numeric – 12.3, 5, 999 – v <- 23.5; print(class(v)) gives [1] "numeric"
Integer – 2L, 34L, 0L – v <- 2L; print(class(v)) gives [1] "integer"
Complex – 3 + 2i – v <- 2+5i; print(class(v)) gives [1] "complex"
Character – 'a', "good", "TRUE", '23.4' – v <- "TRUE"; print(class(v)) gives [1] "character"
Raw – "Hello" is stored as 48 65 6c 6c 6f – v <- charToRaw("Hello"); print(class(v)) gives [1] "raw"
mode() or class(): these are used to find the type of the data object assigned. Example: assign several different objects and check the class (storage mode) of each object.
# Declare variables of different types:
my_numeric <- 42
my_character <- "forty-two"
my_logical <- FALSE
print(class(my_numeric))   # [1] "numeric"
print(class(my_character)) # [1] "character"
print(class(my_logical))   # [1] "logical"
b) Quality and Standard Adherence [5]
Adherence to Quality Standards. Licensee agrees that the nature and quality of all goods and services provided by Licensee in connection with the use of the Intellectual Property shall conform to the standards set by Licensee for its own goods and services ("Quality Standards").
Quality means different things to different people, but if you go back to first principles, quality can be defined as 'fitness for purpose'. To elaborate, it is really around the surety of a product or a service fulfilling what it is intended to do.

As a global leader in providing quality, safety and testing services for over 130 years globally, at Intertek we have a thorough understanding of the parameters that can help businesses achieve highest level of quality assurance throughout the supply chain

Quality standards are defined as documents that provide requirements, specifications, guidelines, or characteristics that can be used consistently to ensure that materials, products, processes, and services are fit for their purpose. Standards provide organizations with the shared vision, understanding, procedures, and vocabulary needed to meet the expectations of their stakeholders. Because standards present precise descriptions and terminology, they offer an objective and authoritative basis for organizations and consumers around the world to communicate and conduct business.

10) Articulate briefly the concept of Reading Datasets and the types of Datasets with their syntax and examples
We can import datasets from various sources and file types, for example:
• .csv or .txt format
• Big data tool – Impala
CSV File: The sample data can also be in comma separated values (CSV) format. Each cell inside such a data file is separated by a special character, which usually is a comma, although other characters can be used as well. The first row of the data file should contain the column names instead of the actual data. Here is a sample of the expected format.
Col1,Col2,Col3
100,a1,b1
200,a2,b2
300,a3,b3
After we copy and paste the data above into a file named "mydata.csv" with a text editor, we can read the data with the function read.csv. In R, data can be read in two ways: from local disk or from the web.
From disk: if the data file location on local disk is known, use the read.csv() or read.table() functions; if the path is not specified, use file.choose().
> mydata = read.csv("mydata.csv") # read csv file
> mydata
From web: the URL of the data is passed to the read.csv() or read.table() functions.
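A short hedged sketch of the two remaining reading styles mentioned above; the web address shown is only a placeholder, not a real data source:
> mydata <- read.csv(file.choose())   # browse for the file when the path is not known
> webdata <- read.csv("http://example.com/mydata.csv")   # pass the URL directly
> head(webdata)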

Generally, while doing programming in any programming language, you need to use various variables to store various information. Variables are nothing but reserved memory locations to store values. This means that, when you create a variable, you reserve some space in memory. You may like to store information of various data types like character, wide character, integer, floating point, double floating point, Boolean etc. Based on the data type of a variable, the interpreter allocates memory and decides what can be stored in the reserved memory. In contrast to other programming languages like C and Java, in R the variables are not declared as some data type. The variables are assigned with R-objects and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are

• Vectors • Lists • Matrices • Arrays • Factors • Data Frames
The simplest of these objects is the vector object and there are six data types of these atomic vectors, also termed the six classes of vectors. The other R-objects are built upon the atomic vectors.
Data Type – Example – Verify

Logical TRUE, FALSE v <- TRUE print(class(v)) it produces the following result − [1] "logical"

Numeric 12.3, 5, 999 v <- 23.5 print(class(v)) it produces the following result − [1] "numeric"

Integer 2L, 34L, 0L v <- 2L print(class(v)) it produces the following result − [1] "integer"

Complex 3 + 2i v <- 2+5i print(class(v)) it produces the following result − [1] "complex"

Character 'a', "good", "TRUE", '23.4' v <- "TRUE" print(class(v)) it produces the following result − [1] "character"

Raw "Hello" is stored as 48 65 6c 6c 6f v <- charToRaw("Hello") print(class(v)) it produces the following result − [1] "raw" • In R programming, the very basic data types are the R-objects called vectors which hold elements of different classes as shown above. Please note in R the number of classes is not confined to only the above six types. For example, we can use many atomic vectors and create an array whose class will become array

11 a) Explain R data frames with example
Data Frames: Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical. It is a list of vectors of equal length. Data frames are created using the data.frame() function. It displays data along with header information.
To retrieve data in a particular cell: enter its row and column coordinates in the single square bracket "[ ]" operator, as dataframe[row, column]. Example: to retrieve the cell value from the first row, second column of mtcars:
> mtcars[1,2]
# Create the data frame.
> BMI <- data.frame(gender = c("Male", "Male","Female"), height = c(152, 171.5, 165), weight = c(81,93, 78), Age = c(42,38,26))
> print(BMI)
gender height weight Age
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
Data1:
Height GPA
66 3.80
62 3.78
63 3.88
70 3.72
74 3.69
> student.ht <- c( 66, 62, 63, 70, 74)
> student.gpa <- c( 3.80, 3.78, 3.88, 3.72, 3.69)
> student.data1 <- data.frame(student.ht, student.gpa)
> student.data1
student.ht student.gpa
1 66 3.80
2 62 3.78
3 63 3.88
4 70 3.72
5 74 3.69
> plot(student.ht, student.gpa)
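A brief hedged sketch of retrieving rows and columns from the BMI data frame created above:
BMI$height                   # the height column as a vector: 152.0 171.5 165.0
BMI[1, ]                     # the first row (all columns)
BMI[ , c("gender", "Age")]   # only the gender and Age columns
subset(BMI, Age > 30)        # rows where Age is greater than 30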

b) Write an R program for matrix multiplication
R has two multiplication operators for matrices. The first is denoted by *, which is the same as a simple multiplication sign; this operation does a simple element-by-element multiplication of the two matrices. The second operator is denoted by %*% and it performs a true matrix multiplication between the two matrices.
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Multiply the matrices element by element.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)
# Divide the matrices element by element.
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)

When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of multiplication
[,1] [,2] [,3]
[1,] 15 0 6
[2,] 18 36 24
Result of division
[,1] [,2] [,3]
[1,] 0.6 -Inf 0.6666667
[2,] 4.5 0.4444444 1.5000000
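The %*% operator mentioned above is not used in the program itself; as an illustrative sketch, since matrix1 and matrix2 are both 2x3, one of them has to be transposed for a true matrix product to be defined:
# True matrix multiplication: (2x3) %*% (3x2) gives a 2x2 result
result <- matrix1 %*% t(matrix2)
print(result)
# [1,]  21  5
# [2,]  63 78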

12 a) What are the looping structures available in R?

• In R, functions and loop structures are created using for and while, along with conditional if-else statements. FOR Loop: a for loop is used to repeat an action for every value in a vector. The "for" loop in R: for(i in values){... do something ...}

This for loop consists of the following parts: the keyword for, followed by parentheses; an identifier between the parentheses (in this example we use i, but that can be any object name you like); and the keyword in, which follows the identifier.

Syntax: for (val in sequence) { statement}

Example: To count the number of even numbers in a vector.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if(val %% 2 == 0) count = count+1
}
print(count)
Output
[1] 3

While loop while (test_expression) { statement}

Example:
> i <- 1
> while (i < 6) {
+ print(i)
+ i = i+1
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

IF-ELSE statement
if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else
  statement4

Example1:
x <- -5
y <- if(x > 0) 5 else 6
y
[1] 6
Example3:
x <- 0
if (x < 0) {
  print("Negative number")
} else if (x > 0) {
  print("Positive number")
} else
  print("Zero")
Output: [1] "Zero"

b) How do you find outliers in R? [5]
An outlier is a point or an observation that deviates significantly from the other observations.
Reasons for outliers: experimental errors or "special circumstances".
Outlier detection tests are used to check for outliers. Outlier treatments are of three types:
• Retention
• Exclusion
• Other treatment methods
The outliers package in R can be used to detect and treat outliers in data.
Outlier detection from graphical representation: scatter plot and box plot.

Outliers and Missing Data treatment: Missing Values • In R, missing values are represented by the symbol NA (not available). • Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). • R uses the same symbol for character and numeric data. • To Test missing values: is.na () function

Example:
y <- c(1,2,3,NA)
is.na(y)
[1] FALSE FALSE FALSE TRUE
mean(y) ## arithmetic functions on missing values return NA
[1] NA
To remove missing values: the na.omit() function.
newdata <- na.omit(y)
Alternative method using na.rm=TRUE:
mean(y, na.rm=TRUE)
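A hedged sketch of one common follow-up treatment, replacing each missing value with the mean of the observed values (using the vector y from the example above):
y <- c(1, 2, 3, NA)
y[is.na(y)] <- mean(y, na.rm = TRUE)   # the NA is replaced by 2
print(y)                               # 1 2 3 2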

13 a) What is an array and what is a matrix? Discuss with examples. [5]

Arrays
An array object (or simply array) contains a collection of elements of the same type, each of which is indexed (i.e., identified) by a number, or in the multi-dimensional case by one number per dimension. To use an array in R we: 1. construct the array object, specifying its dimensions (the number of elements along each dimension); 2. access the elements of the array through their indices in order to assign or obtain their values (as if they were single variables).
Matrix
A matrix is a collection of elements of the same type, organized in the form of a table. Each element is indexed by a pair of numbers that identify the row and the column of the element. In R, a matrix is simply an array with exactly two dimensions and is created with the matrix() function.

• We have two different options for constructing matrices or arrays: either we use the creator functions matrix() and array(), or we simply change the dimensions of an existing vector using the dim() function. For example, you make an array with four columns, three rows, and two "tables" like this:
> my.array <- array(1:24, dim=c(3,4,2))
In the above example, "my.array" is the name we have given the array, and "<-" is the assignment operator. There are 24 units in this array, written "1:24", and they are divided over three dimensions "(3, 4, 2)". Although the rows are given as the first dimension, the tables are filled column-wise. So, for arrays, R fills the columns, then the rows, and then the rest.
Alternatively, you could just add the dimensions using the dim() function. This is a little hack that goes a bit faster than using the array() function; it's especially useful if you have your data already in a vector. (This little trick also works for creating matrices, by the way, because a matrix is nothing more than an array with only two dimensions.) Say you already have a vector with the numbers 1 through 24, like this:
> my.vector <- 1:24
You can easily convert that vector to an array exactly like my.array simply by assigning the dimensions, like this:
> dim(my.vector) <- c(3,4,2)
Arrays can be of any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions.
Matrix
A matrix is a collection of elements of the same type, organized in the form of a table. Each element is indexed by a pair of numbers that identify the row and the column of the element. A matrix is created in R with the matrix() function:
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow=2, ncol=3, byrow = TRUE)
print(M)
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"

b) Explain the procedure to create an R script to combine two data sets. [5]

Combining Data sets in R • To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).

For example, to merge two data frames by ID: total <- merge(dataframeA, dataframeB, by="ID") • To merge on more than one criterion we pass the argument as follows:

To merge two data frames by ID and Country: total <- merge(dataframeA, dataframeB, by=c("ID","Country")) • To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. For example, total <- rbind(dataframeA, dataframeB). If dataframeA has variables that dataframeB does not, then either: 1. Delete the extra variables in dataframeA, or 2. Create the additional variables in dataframeB and set them to NA (missing) before joining them with rbind(). We use the cbind() function to combine data by column; the syntax is the same as rbind(). Plyr package: tools for splitting, applying and combining data. We use rbind.fill() from the plyr package in R. It binds or combines a list of data frames, filling missing columns with NA.

For example, rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")]). Here all the missing values will be filled with NA.

OR

14 a) What is RStudio? Explain its features. [5]
• R Studio is an Integrated Development Environment (IDE) for the R language with an advanced and more user-friendly GUI. R Studio allows the user to run R in a more user-friendly environment. It is open-source (i.e. free) and available at http://www.rstudio.com/
The R Studio screen has four windows: 1. Console. 2. Workspace and history. 3. Files, plots, packages and help. 4. The R script(s) and data view.
The R script is where you keep a record of your work. To create a new R script file: 1) File -> New -> R Script, 2) Click on the icon with the "+" sign and select "R Script", or 3) Use the shortcut Ctrl+Shift+N.
Console: The console is where you can type commands and see output.
Workspace tab: The workspace tab shows all the active objects. It stores any object, value, function or anything you create during your R session. In the example below, if you click on the dotted squares you can see the data on a screen to the left.
History tab: The history tab shows a list of commands used so far and keeps a record of all previous commands. It helps when testing and running processes. Here you can either save the whole list or you can select the commands you want and send them to an R script to keep track of your work. In this example, we select all and click on the "To Source" icon; a window on the left will open with the list of commands. Make sure to save the 'untitled1' file as an *.R script.
R Studio features:
• RStudio runs on most desktops or on a server and is accessed over the web
• RStudio integrates the tools you use with R into a single environment
• RStudio includes powerful coding tools designed to enhance your productivity
• RStudio enables rapid navigation to files and functions
• RStudio makes it easy to start new or find existing projects
• RStudio has integrated support for Git and Subversion
• RStudio supports authoring HTML, PDF, Word documents, and slide shows
• RStudio supports interactive graphics with Shiny and ggvis

b) How do you understand learning objectives? Explain [5]
Understanding Learning objectives: The benefits of this course include:
• Efficient and Effective time management
• Efficient – Meeting timelines
• Effective – Meeting requirement for desired output
• Awareness of the SSC (Sector Skill Council) environment and time zone understanding
• Awareness of the SSC environment and importance of meeting timelines to handoffs
Review the course objectives listed above. To fulfil these objectives today, we'll be conducting a number of hands-on activities. Hopefully we can open up some good conversations and some of you can share your experiences so that we can make this session as interactive as possible. Your participation will be crucial to your learning experience and that of your peers here in the session today.
Question: Please share your thoughts on the following.
A. Time is perishable – Cannot be created or recovered
B. Managing is the only option – Prioritize
Importance of Time Management. The first part of this session discusses the following:
• Plan better, avoid wastage
• Understand the timelines of the deliverables. Receiving the hand-off from upstream teams at the right time is critical to start self contribution and ensure passing the deliverables to the downstream team.
• It is important to value others' time as well to ensure overall organizational timelines are met
• Share the perspective of how important time is, specifically in a global time zone mapping scenario
Suggested Responses:
• Time management has to be looked at an organizational level and not just an individual level
• These aspects teach us how to build the blocks of time management
------END OF UNIT 1 ------

Model Questions- Unit-2

1. Define Expected value? • The expected value of a random variable is the long-run average value of repetitions of the experiment it represents. Expected value is also known as the expectation, mathematical expectation, EV, mean, or first moment. • Expected value of a discrete random variable is the probability-weighted average of all possible values • Continuous random variables is the sum replaced by an integral and the probabilities by probability densities

2. What are the roles of a Team member?

• Communicate
• Don't Blame Others
• Support Group Members' Ideas
• No Bragging – Don't be full of yourself
• Listen Actively
• Get Involved
• Coach, Don't Demonstrate
• Provide Constructive Criticism
• Try To Be Positive
• Value Your Group's Ideas
3. Define Bivariate Random variable
A random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable is a real-valued function defined on the points of a sample space.
Bivariate Random Variable: Bivariate Random Variables are those variables having only 2 possible outcomes, for example the flip of a coin. For example, if we toss a coin 10 times and we get heads 8 times, then we cannot say whether the 11th time the coin is tossed we will get a head or a tail, but we are sure that we will get either a head or a tail.

4. Write the difference between Team work and Individual work
Team work vs. Individual work
Team Work:
• Agree on goals/milestones
• Establish tasks to be completed
• Communicate / monitor progress
• Solve problems
• Interpret results
• Agree completion of projects
Individual work:
• Work on tasks
• Work on new / revised tasks
Team Development: Team building is any activity that builds and strengthens the team as a team.
5. What is Probability?
• Probability is the chance of occurrence of an event: P(A) = S/P, where S is the number of favourable (positive) outcomes and P is the population size or total number of outcomes.
• A probability distribution describes how the values of a random variable are distributed.

6. Write short notes on Bivariate Random variables
• A random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable is a real-valued function defined on the points of a sample space.
Bivariate Random Variable: Bivariate Random Variables are those variables having only 2 possible outcomes, for example the flip of a coin. For example, if we toss a coin 10 times and we get heads 8 times, then we cannot say whether the 11th time the coin is tossed we will get a head or a tail, but we are sure that we will get either a head or a tail.
6. List the types of variables available in R
• R variables are of an R object type and are mostly vectors (lists of data), which can be numeric or text. A variable in R can store an atomic vector, a group of atomic vectors or a combination of many R objects.
• Vectors • Lists • Matrices • Arrays • Factors • Data Frames

7. Define effective communication skills
Effective communication is a mutual understanding of the message. Effective communication is essential to workplace effectiveness. The purpose of building communication skills is to achieve greater understanding and meaning between people and to build a climate of trust, openness, and support. A big part of working well with other people is communicating effectively.
8. Define Continuous Uniform distribution.
The continuous uniform distribution is the probability distribution of random number selection from the continuous interval between a and b. Its density function is f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

9. List the objectives of team work and individual work.
Team work vs. Individual work
Team Work:
• Agree on goals/milestones
• Establish tasks to be completed
• Communicate / monitor progress
• Solve problems
• Interpret results
• Agree completion of projects
Individual work:
• Work on tasks
• Work on new / revised tasks
Team Development: Team building is any activity that builds and strengthens the team as a team.
------Part-B

1 a) How do you summarize data in a data set? Discuss with suitable examples. [5]

To summarize data in R Studio we mainly use two functions, summary() and aggregate().
Summary statistics - summarizing data with R:
Example1:
> grass
  rich graze
1 12 mow
2 15 mow
3 17 mow
4 11 mow
5 15 mow
6 8 unmow
7 9 unmow
8 7 unmow
9 9 unmow
a) summary(): gives the summary statistics of a data object in terms of min, max, 1st quartile, 3rd quartile and mean/median values.
> x <- c(1,2,3,4,5,6,7,8,9,10,11,12)
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.75 6.50 6.50 9.25 12.00
> summary(grass)
rich graze
Min. : 7.00 mow :5
1st Qu.: 9.00 unmow:4
Median :11.00
Mean :11.44
3rd Qu.:15.00
Max. :17.00
> summary(graze)
Length Class Mode
9 character character
b) str(): gives the structure of a data object in terms of the class of the object, the number of observations, and each variable's class with sample data.
c) tail(): gives the last 6 observations of the given data object. Example3: > tail(iris) > tail(mtcars)
d) head(): displays the top 6 observations of a dataset.
e) names(): returns the column names.
f) nrow(): returns the number of observations in the given dataset.
g) fix(): opens the given dataset for editing, e.g. > fix(mydFrame1)
h) with(): evaluates an expression using the columns of a data frame directly, so the $ operator is not needed.
i) aggregate(): gives the summary statistic of a specific column with respect to the different levels of a class (factor) attribute: aggregate(x ~ y, data, mean), where x is numeric and y is of factor type.
> aggregate(rich ~ graze, grass, mean)
graze rich
1 mow 14.00
2 unmow 8.25
j) subset(): subsets the data based on a condition: subset(data, x > 7, select = c(x, y)), where x is one of the variables in data and select gives the columns to keep in the specified order.
> subset(grass, rich > 7, select = c(graze, rich))
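A hedged sketch of the remaining inspection functions listed above, applied to the built-in iris and mtcars data sets:
str(iris)                 # structure: 150 observations of 5 variables, with their classes
head(iris)                # first 6 rows
names(iris)               # column names
nrow(iris)                # number of observations: 150
with(mtcars, mean(mpg))   # refer to the mpg column without the $ operator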

b) Explain how you can find the mean, mode and median for the iris data set. [5]

In the iris data set, check whether Sepal Length is normally distributed or not. To find out whether Sepal Length is normally distributed we use two commands, qqnorm() and qqline(). qqnorm() shows the actual distribution of the data, while qqline() shows the line on which the data would lie if they were normally distributed. The deviation of the plot from the line shows that the data is not normally distributed.
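A hedged sketch of this normality check for the Sepal.Length column of the built-in iris data set:
x <- iris$Sepal.Length
qqnorm(x, main = "Normal Q-Q plot of Sepal.Length")  # actual distribution of the data
qqline(x)                                            # reference line for a normal distribution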

Mean: the mean is the average of the numbers (the vector x used here is defined under Mode below).
sum(x)/length(x)
## [1] 5.8
#?mean # function in base R
mean(x)
## [1] 5.8

Median: the middle number, given that the numbers are in order (sorted).
sort(x)
## [1] 1 2 2 2 7 8 8 9 9 10
#?median
median(x)
## [1] 7.5

Mode: the number which appears most often in a set of numbers.
# There is no function in base R to find the mode of a set of numbers
x <- c(8,2,7,1,2,9,8,2,10,9)

# Find the mode using a frequency table
x
## [1] 8 2 7 1 2 9 8 2 10 9
#?table
y <- table(x)
y
## x
## 1 2 7 8 9 10
## 1 3 1 2 2 1
names(y)[which(y==max(y))]
## [1] "2"
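To apply the same three measures to the iris data set itself, the helper below is an illustrative sketch (getmode is a hypothetical name; base R has no built-in mode function):
getmode <- function(v) {
  tab <- table(v)                         # frequency of each distinct value
  as.numeric(names(tab)[which.max(tab)])  # value with the highest frequency
}
mean(iris$Sepal.Length)     # 5.843333
median(iris$Sepal.Length)   # 5.8
getmode(iris$Sepal.Length)  # 5, the most frequent sepal length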

OR
2 a) What are random and bivariate random variables? Explain through examples [5]
Random & Bivariate Random Variables
Random Variable:

• A random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables.

• A random variable is a real-valued function defined on the points of a sample space.

• Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates. • For Example, • If we toss a coin for 10 times and we get heads 8 times then we cannot say that the 11th time if coin is tossed then we get a head or a tail. But we are sure that we will either get a head or a tail. Bivariate Random Variable:

• Bivariate Random Variables are those variables having only 2 possible outcomes. For example flip of coin

b) Explain Frequentist tests and Bayesian tests [5]
Tests of univariate normality include D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–Darling test, the Cramér–von Mises criterion, the Lilliefors test for normality (itself an adaptation of the Kolmogorov–Smirnov test), the Shapiro–Wilk test, Pearson's chi-squared test, and the Shapiro–Francia test. A 2011 paper from The Journal of Statistical Modeling and Analytics concludes that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests. Some published works recommend the Jarque–Bera test, but it is not without weakness: it has low power for distributions with short tails, especially for bimodal distributions. Other authors have declined to include its data in their studies because of its poor overall performance. Historically, the third and fourth standardized moments (skewness and kurtosis) were some of the earliest tests for normality. The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and of the range to the standard deviation. More recent tests of normality include the energy test (Székely and Rizzo) and the tests based on the empirical characteristic function (ecf), e.g. Epps and Pulley, Henze–Zirkler, and the BHEP test. The energy and the ecf tests are powerful tests that apply for testing univariate or multivariate normality and are statistically consistent against general alternatives. The normal distribution has the highest entropy of any distribution for a given standard deviation. There are a number of normality tests based on this property, the first attributable to Vasicek.

Bayesian tests:

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-informative priors are used. Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives. This approach has been extended by Farrell and Rogers-Stewart

3 a) What is Probability? [5]
Probability
Probability is the chance of occurrence of an event: P(A) = S/P, where S is the number of favourable (positive) outcomes and P is the population size or total number of outcomes. A probability distribution describes how the values of a random variable are distributed. For example, the collection of all possible outcomes of a sequence of coin tosses is known to follow the binomial distribution, whereas the means of sufficiently large samples of a data population are known to resemble the normal distribution. Since the characteristics of these theoretical distributions are well understood, they can be used to make statistical inferences about the entire data population as a whole.
For example, consider the probability of drawing the ace of diamonds from a pack of 52 cards when 1 card is pulled out at random. "At random" means that there is no biased treatment of any card and the result is totally random. So:
No. of aces of diamonds in a pack = S = 1
Total no. of possible outcomes = total no. of cards in the pack = 52
Probability of the positive outcome = S/P = 1/52
That is, we have a 1.92% chance of a positive outcome.
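A quick hedged check of this arithmetic in R, together with a simple simulation of repeated single-card draws (the numeric deck labelling is only illustrative, with card 1 standing for the ace of diamonds):
1 / 52                                   # 0.01923..., about 1.92%
draws <- sample(1:52, size = 100000, replace = TRUE)
mean(draws == 1)                         # should be close to 0.0192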

b) What is Professionalism? How do you exhibit Professionalism? [5]
Professionalism
• Professionalism is the competence or set of skills that are expected from a professional.

• Professionalism determines how a person is perceived by his employer, co-workers, and casual acquaintances. How long does it take for someone to form an opinion about you?
• Studies have proved that it takes just six seconds for a person to form an opinion about another person.
How does someone form an opinion about you?
Eye Contact – Maintaining eye contact with a person or the audience says that you are confident. It says that you are someone who can be trusted and with whom contact can be maintained.
Handshake – Grasp the other person's hand firmly and shake it a few times. This shows that you are enthusiastic.
Posture – Stand straight but not rigid; this will showcase that you are receptive and not very rigid in your thoughts.
Clothing – Appropriate clothing says that you are a leader with a winning potential.
How to exhibit professionalism:
• Empathy
• Positive Attitude
• Teamwork
• Professional Language
• Knowledge
• Punctual
• Confident
• Emotionally stable

4. Explain about Central Limit Theorem[10] Central Limit Theorem The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution. More specifically, where X1, …, Xn are independent and identically distributed random variables with the same arbitrary distribution, zero mean, and variance σ2; and Z is their mean scaled by

Z = √n · (1/n) · Σᵢ₌₁ⁿ Xᵢ

Then, as n increases, the probability distribution of Z will tend to the normal distribution with zero mean and variance (σ2). The central limit theorem also implies that certain distributions can be approximated by the normal distribution, for example: • The binomial distribution B(n, p) is approximately normal with mean np and variance np(1−p) for large n and for p not too close to zero or one. • The Poisson distribution with parameter λ is approximately normal with mean λ and variance λ, for large values of λ. • The chi-squared distribution χ2(k) is approximately normal with mean k and variance 2k, for large k. • The Student's t-distribution t(ν) is approximately normal with mean 0 and variance 1 when ν is large
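As a hedged illustration of the theorem, the following R sketch draws repeated samples from a clearly non-normal population and shows that the sample means look normal; the exponential population, the sample size and the number of repetitions are illustrative assumptions, not part of the original text:

set.seed(42)
n    <- 1000    # observations per sample
reps <- 5000    # number of repeated samples

# Population that is clearly not normal: exponential with mean 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The distribution of the sample means is approximately normal
hist(sample_means, breaks = 40, main = "Distribution of sample means")
qqnorm(sample_means)
qqline(sample_means)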

5) Explain Probability Distribution and its types. How do you use Probability distribution in R? [10] Probability distribution: describes how the values of a random variable are distributed. The probability distribution for a random variable X gives the possible values for X and the probabilities associated with each possible value (i.e., the likelihood that the values will occur). The methods used to specify discrete probability distributions are similar to (but slightly different from) those used to specify continuous probability distributions. Binomial distribution: the collection of all possible outcomes of a sequence of coin tosses. Normal distribution: the means of sufficiently large samples of a data population. • Because the characteristics of these theoretical distributions are well understood, they can be used to make statistical inferences on the entire data population as a whole.

Example: Probability of the ace of Diamonds in a pack of 52 cards when 1 card is pulled out at random. "At random" means that there is no biased treatment. No. of Aces of Diamonds in a pack = S = 1; Total no. of possible outcomes = Total no. of cards in the pack = 52; Probability of a positive outcome = S/P = 1/52. That is, we have about a 1.92% chance of a positive outcome.

Probability Distribution Function (PDF): It defines the probability of outcomes based on certain conditions. Based on these conditions, there are several commonly used types of PDFs. Types of Probability Distribution: • Binomial Distribution • Poisson Distribution • Continuous Uniform Distribution • Exponential Distribution • Normal Distribution • Chi-squared Distribution • Student t Distribution • F Distribution Binomial Distribution The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is f(x) = nCx · p^x · (1 − p)^(n − x), where x = 0, 1, 2, . . . , n. Problem Ex: Find the probability of getting 3 doublets when a pair of fair dice is thrown 10 times. Poisson Distribution The Poisson distribution is the probability distribution of independent event occurrences in an interval. If λ is the mean occurrence per interval, then the probability of having x occurrences within a given interval is f(x) = (λ^x · e^(−λ)) / x!, where x = 0, 1, 2, . . . Problem: If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute. Solution: The probability of having sixteen or fewer cars crossing the bridge in a particular minute is given by the function ppois. Normal Distribution The normal distribution is defined by the following probability density function, where μ is the population mean and σ2 is the variance: f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)). If a random variable X follows the normal distribution, then we write X ~ N(μ, σ2). In particular, the normal distribution with μ = 0 and σ = 1 is called the standard normal distribution, and is denoted as N(0,1). It can be graphed as follows. Figure 1 shows the normal distribution of sample data. The shape of a normal curve is highly dependent on the standard deviation.

Normal distribution is a continuous distribution that is "bell-shaped". • Data are often assumed to be normal. • Normal distributions can estimate probabilities over a continuous interval of data values. Properties: The normal distribution f(x), with any mean μ and any positive standard deviation σ, has the following properties: It is symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution. It is unimodal: its first derivative is positive for x < μ, negative for x > μ, and zero only at x = μ. Its density has two inflection points (where the second derivative of f is zero and changes sign), located one standard deviation away from the mean, at x = μ − σ and x = μ + σ. Its density is log-concave. • Its density is infinitely differentiable, indeed supersmooth of order 2. Its second derivative f′′(x) is equal to its derivative with respect to its variance σ2.

Normal Distribution in R: Description: Density, distribution function, quantile function and random generation for the normal distribution with mean equal to mean and standard deviation equal to sd. The normal distribution is important because of the Central Limit Theorem, which states that the distribution of the means of all possible samples of size n from a population with mean μ and variance σ2 approaches a normal distribution with mean μ and variance σ2/n as n approaches infinity. OR 6) How do you summarize data with R? [5]  To summarize data in R Studio we mainly use two functions, summary() and aggregate(). Using the Summary command Summary Statistics - Summarizing data with R: Example1: > grass rich graze 1 12 mow 2 15 mow 3 17 mow 4 11 mow 5 15 mow 6 8 unmow 7 9 unmow 8 7 unmow 9 9 unmow a) summary(): It gives the summary statistics of a data object in terms of min, max, 1st Quartile, 3rd Quartile and mean/median values. > x<-c(1,2,3,4,5,6,7,8,9,10,11,12) > summary(x) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 3.75 6.50 6.50 9.25 12.00 > summary(grass) rich graze Min. : 7.00 mow :5 1st Qu.: 9.00 unmow:4 Median :11.00 Mean :11.44 3rd Qu.:15.00 Max. :17.00 > summary(graze) Length Class Mode 9 character character b) str(): It gives the structure of a data object in terms of the class of the object, the number of observations, and each variable's class with sample data. c) tail(): It gives the last 6 observations of the given data object. Example3: > tail(iris) > tail(mtcars) d) head(): It displays the top 6 observations from the dataset. e) names(): It returns the column names. f) nrow(): It returns the number of observations in the given dataset. g) fix(): To edit (fix) the data in the given dataset in a spreadsheet-style editor, e.g. fix(iris). > fix(mydFrame1) h) with(): To avoid repeating the $ prefix along with attribute names. i) aggregate(): To get the summary statistic of a specific column with respect to the different levels in the class attribute: aggregate(x ~ y, data, mean), where x is numeric and y is of factor type. > aggregate(rich~graze, grass, mean) graze rich 1 mow 14.00 2 unmow 8.25 j) subset(): To subset the data based on a condition: subset(data, x > 7, select = c(x, y)), where x is one of the variables in data and select gives the chosen columns in the specified order. > subset(grass, rich>7, select=c(graze,rich)) See the sketch below for a runnable version of these commands.
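A runnable version of the commands above, as a minimal sketch. The small 'grass' data frame is reconstructed here from the printed output shown earlier (it is not a built-in R dataset):

# Reconstruct the example data frame used above
grass <- data.frame(rich  = c(12, 15, 17, 11, 15, 8, 9, 7, 9),
                    graze = factor(c(rep("mow", 5), rep("unmow", 4))))

summary(grass)                                    # five-number summary and factor counts
str(grass)                                        # structure: 9 obs. of 2 variables
head(grass); tail(grass)                          # first and last observations
names(grass); nrow(grass)                         # column names and number of rows
aggregate(rich ~ graze, grass, mean)              # mean richness per grazing level
subset(grass, rich > 7, select = c(graze, rich))  # rows meeting a condition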

b)What are the fundamentals to build a Team? [5] Team Development • Team building is any activity that builds and strengthens the team as a team. Team building fundamentals • Clear Expectations – Vision/Mission • Context – Background – Why participation in Teams? • Commitment – dedication – Service as valuable to Organization & Own • Competence – Capability – Knowledge • Charter – agreement – Assigned area of responsibility • Control – Freedom & Limitations • Collaboration – Team work • Communication • Consequences – Accountable for rewards • Coordination • Cultural Change Roles of team member • Communicate • Don't Blame Others • Support Group Member's Ideas • No Bragging(Arrogant) – No Full of yourself • Listen Actively • Get Involved • Coach, Don't Demonstrate • Provide Constructive Criticism • Try To Be Positive • Value Your Group's Ideas

7) What is summarization of data? Explain different types of summarization in R[10] Summary Statistics - Summarizing data with R: Example1: > grass rich graze 1 12 mow 2 15 mow 3 17 mow 4 11 mow 5 15 mow 6 8 unmow 7 9 unmow 8 7 unmow 9 9 unmow a) summary(): It gives the summary statistics of data object in terms of min, max,1st Quartile and 3rd Quartile mean/median values. > x<-c(1,2,3,4,5,6,7,8,9,10,11,12) > summary(x) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 3.75 6.50 6.50 9.25 12.00 > summary(grass) Rich graze Min. : 7.00 mow :5 1st Qu.: 9.00 unmow:4 Median :11.00 Mean :11.44 3rd Qu.:15.00 Max. :17.00 > summary(graze) Length Class Mode 9 character character b) str(): It gives the structure of data object in terms of class of object, No. of observations and each variable class and sample data. Example2: > str(mtcars) 'data.frame': 32 obs. of 11 variables: $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... $ cyl : num 6 6 4 6 8 6 8 4 4 6 ... $ disp: num 160 160 108 258 360 ... $ hp : num 110 110 93 110 175 105 245 62 95 123 ... $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ... $ wt : num 2.62 2.88 2.32 3.21 3.44 ... $ qsec: num 16.5 17 18.6 19.4 17 ... $ vs : num 0 0 1 1 0 1 0 1 1 1 ... $ am : num 1 1 1 0 0 0 0 0 0 0 ... $ gear: num 4 4 4 3 3 3 3 4 4 4 ... $ carb: num 4 4 1 1 2 1 4 2 2 4 ... > str(grass) 'data.frame': 9 obs. of 2 variables: $ rich : int 12 15 17 11 15 8 9 7 9 $ graze: Factor w/ 2 levels "mow","unmow": 1 1 1 1 1 2 2 c) Tail(): It gives the last 6 observations of the given data object. Example3: > tail(iris) > tail(mtcars) mpg cyl disp hp drat wt qsec vs am gear carb Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2 > tail(HairEyeColor,2) [1] 7 8 > tail(state.x77,2) Population Income Illiteracy Life Exp Murder HS Grad Frost Area Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464 Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203 > tail(grass) rich graze 4 11 mow 5 15 mow 6 8 unmow 7 9 unmow 8 7 unmow 9 9 unmow d) Head(): It displays the top 6 observations from dataset Example: > head(iris) e) Names(): It returns the coloum names > names(mtcars) [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb" >names(grass) graze rich f) nrow(): It returns the number of observations in the given dataset. > dim(mtcars) [1] 32 11 > nrow(mtcars) [1] 32 > ncol(mtcars) [1] 11 >nrow(iris) 9 g) fix(iris): To fix the data in the given dataset. > fix(mydF ------Methods to Summarise Data in R 1. apply Apply function returns a vector or array or list of values obtained by applying a function to either rows or columns. This is the simplest of all the function which can do this job. However this function is very specific to collapsing either row or column. m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2) apply(m, 1, mean) [1] 6 7 8 9 10 11 12 13 14 15 apply(m, 2, mean) [1] 5.5 15.5 2. lapply ―lapply‖ returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.‖ l <- list(a = 1:10, b = 11:20) lapply(l, mean) $a [1] 5.5 $b [1] 15.5

3. sapply ―sapply‖ does the same thing as apply but returns a vector or matrix. Let‘s consider the last example again. l <- list(a = 1:10, b = 11:20) l.mean <- sapply(l, mean) class(l.mean) [1] "numeric"

4. tapply So far, none of the functions we discussed can do what SQL can achieve. Here is a function which completes the palette for R. Usage is tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE), where X is "an atomic object, typically a vector" and INDEX is "a list of one or more factors, each of the same length as X". Here is an example which will make the usage clear. attach(iris) # mean petal length by species tapply(iris$Petal.Length, Species, mean) setosa versicolor virginica 1.462 4.260 5.552

5. by Now comes a slightly more complicated algorithm. Function 'by' is an object-oriented wrapper for 'tapply' applied to data frames. Hopefully the example will make it more clear. attach(iris) by(iris[, 1:4], Species, colMeans) Species: setosa Sepal.Length Sepal.Width Petal.Length Petal.Width 5.006 3.428 1.462 0.246 ------Species: versicolor Sepal.Length Sepal.Width Petal.Length Petal.Width 5.936 2.770 4.260 1.326 ------Species: virginica Sepal.Length Sepal.Width Petal.Length Petal.Width 6.588 2.974 5.552 2.026 6. sqldf If you found any of the above statements difficult, don't panic. I bring you a life line which you can use anytime. Let's fit the SQL queries into R. Here is a way you can do the same. library(sqldf) attach(iris) summarization <- sqldf('select Species, avg("Petal.Length") as "Petal.Length_mean" from iris where Species is not null group by Species') And it's done. Wasn't it simple enough? One setback of this approach is the amount of time it takes to execute. In case you are interested in getting speed and the same results, read the next section. 7. ddply Fastest of all we discussed. You will need an additional package. Let's do exactly what we did in the tapply section. library(plyr) attach(iris) # mean petal length by species ddply(iris,"Species",summarise, Petal.Length_mean = mean(Petal.Length))

We can also use packages such as dplyr, data.table to summarize data. Here‘s a complete tutorial on useful packages for data manipulation in R – Faster Data Manipulation with these 7 R Packages. In general if you are trying to add this summarisation step in the middle of a process and need a table as output, you need to go for sqldf or ddply. ―ddply‖ in these cases is faster but will not give you options beyond just grouping. ―sqldf‖ has all features you need to summarize the data in SQL statements. In case you are interested in using function similar to pivot tables or transposing the tables, you can consider using ―reshape‖. We have covered a few examples of the same in our article – comprehensive guide for data exploration in R.
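As a hedged sketch of the dplyr approach mentioned above (the pipe style and column names follow the iris example used earlier; this is one possible phrasing, not the only one):

library(dplyr)

# Mean petal length by species, equivalent to the tapply/ddply examples above
iris %>%
  group_by(Species) %>%
  summarise(Petal.Length_mean = mean(Petal.Length))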

8) Give an account on probability distribution and its computation in R [10] Probability is the chance of occurrence of an event: P(A) = S/P, where S is the number of favourable (positive) outcomes and P is the total number of possible outcomes. A probability distribution describes how the values of a random variable are distributed. For example, the collection of all possible outcomes of a sequence of coin tosses is known to follow the binomial distribution, whereas the means of sufficiently large samples of a data population are known to resemble the normal distribution. Since the characteristics of these theoretical distributions are well understood, they can be used to make statistical inferences on the entire data population as a whole. For example, consider the probability of drawing the ace of Diamonds from a pack of 52 cards when 1 card is pulled out at random. "At random" means that there is no biased treatment of any card and the result is totally random. So, No. of Aces of Diamonds in a pack = S = 1; Total no. of possible outcomes = Total no. of cards in the pack = 52; Probability of a positive outcome = S/P = 1/52. That is, we have about a 1.92% chance of a positive outcome. Probability Distribution There are 2 types of distribution functions: 1. Discrete 2. Continuous. A Probability Distribution Function or PDF is the function that defines the probability of outcomes based on certain conditions. Based on these conditions, there are several commonly used types of PDFs. Types of Probability Distribution: • Binomial Distribution • Poisson Distribution • Continuous Uniform Distribution • Exponential Distribution • Normal Distribution • Chi-squared Distribution • Student t Distribution • F Distribution Normal Distribution We come now to the most important continuous probability density function and perhaps the most important probability distribution of any sort, the normal distribution. On several occasions, we have observed its occurrence in graphs from, apparently, widely differing sources: the sums when three or more dice are thrown; the binomial distribution for large values of n; and the hypergeometric distribution. There are many other examples as well and several reasons, which will appear here, to call this distribution "normal." If X has this density, we say that X has a normal probability distribution. A graph of a normal distribution, where we have chosen μ = 0 and σ = 1, appears in the figure below. The shape of a normal curve is highly dependent on the standard deviation. Importance of Normal Distribution: Normal distribution is a continuous distribution that is "bell-shaped". Data are often assumed to be normal. Normal distributions can estimate probabilities over a continuous interval of data values. Binomial Distribution The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is f(x) = nCx · p^x · (1 − p)^(n − x), where x = 0, 1, 2, . . . , n. Poisson Distribution The Poisson distribution is the probability distribution of independent event occurrences in an interval. If λ is the mean occurrence per interval, then the probability of having x occurrences within a given interval is f(x) = (λ^x · e^(−λ)) / x!, where x = 0, 1, 2, . . .
Problem If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute. Solution The probability of having sixteen or less cars crossing the bridge in a particular minute is given by the function ppois
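A short R sketch of how these probabilities can actually be computed with the standard base-R distribution functions; the normal-distribution call at the end uses illustrative numbers that are not part of the original text:

# Binomial: probability of exactly 3 doublets in 10 throws of a pair of fair dice
# (a doublet, both dice showing the same face, has probability 1/6 per throw)
dbinom(3, size = 10, prob = 1/6)

# Poisson: probability of 17 or more cars in a minute when the mean is 12
1 - ppois(16, lambda = 12)     # P(X >= 17) = 1 - P(X <= 16)

# Normal: P(X <= 84) for X ~ N(mean = 72, sd = 15.2), an illustrative example
pnorm(84, mean = 72, sd = 15.2)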

9) How do you summarize data in a data set? –Discuss with suitable examples.[5] Summary Statistics - Summarizing data with R: Example1: > grass rich graze 1 12 mow 2 15 mow 3 17 mow 4 11 mow 5 15 mow 6 8 unmow 7 9 unmow 8 7 unmow 9 9 unmow a) summary(): It gives the summary statistics of data object in terms of min, max,1st Quartile and 3rd Quartile mean/median values. > x<-c(1,2,3,4,5,6,7,8,9,10,11,12) > summary(x) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 3.75 6.50 6.50 9.25 12.00 > summary(grass) Rich graze Min. : 7.00 mow :5 1st Qu.: 9.00 unmow:4 Median :11.00 Mean :11.44 3rd Qu.:15.00 Max. :17.00 > summary(graze) Length Class Mode 9 character character > summary(grass$graze) mow unmow 5 4 b) str(): It gives the structure of data object in terms of class of object, No. of observations and each variable class and sample data. Example2: > str(mtcars) 'data.frame': 32 obs. of 11 variables: $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... $ cyl : num 6 6 4 6 8 6 8 4 4 6 ... $ disp: num 160 160 108 258 360 ... $ hp : num 110 110 93 110 175 105 245 62 95 123 ... $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ... $ wt : num 2.62 2.88 2.32 3.21 3.44 ... $ qsec: num 16.5 17 18.6 19.4 17 ... $ vs : num 0 0 1 1 0 1 0 1 1 1 ... $ am : num 1 1 1 0 0 0 0 0 0 0 ... $ gear: num 4 4 4 3 3 3 3 4 4 4 ... $ carb: num 4 4 1 1 2 1 4 2 2 4 ... > str(grass) 'data.frame': 9 obs. of 2 variables: $ rich : int 12 15 17 11 15 8 9 7 9 $ graze: Factor w/ 2 levels "mow","unmow": 1 1 1 1 1 2 2 c) Tail(): It gives the last 6 observations of the given data object. Example3: > tail(iris) > tail(mtcars) mpg cyl disp hp drat wt qsec vs am gear carb Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2 > tail(HairEyeColor,2) [1] 7 8 > tail(state.x77,2) Population Income Illiteracy Life Exp Murder HS Grad Frost Area Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464 Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203 b)Explain how can you find out the mean, mode and median for iris data set. [5]

In the Iris data set, check whether Sepal Length is normally distributed or not. Use: To find whether Sepal Length is normally distributed or not we use 2 commands - qqnorm() & qqline(). qqnorm() shows the actual distribution of the data, while qqline() shows the line on which the data would lie if the data were normally distributed. The deviation of the plot from the line shows that the data is not normally distributed, as illustrated in the sketch below.
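A minimal sketch of these commands on the built-in iris data (the Shapiro–Wilk call is an optional extra check, not part of the original question):

# Normal Q-Q plot for Sepal.Length, with the reference line
qqnorm(iris$Sepal.Length)
qqline(iris$Sepal.Length)

# Optional formal test of normality (Shapiro-Wilk, discussed earlier in this unit)
shapiro.test(iris$Sepal.Length)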

Suppose we want to segregate the data of flowers having Sepal length greater than 7 and Sepal width greater than 3 simultaneously. Solution: When we have to use more than 1 condition we combine them with &, as shown below.
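A minimal sketch of combining the two conditions with & on the built-in iris data (column names as in the standard iris dataset):

# Flowers with Sepal.Length > 7 and Sepal.Width > 3 at the same time
subset(iris, Sepal.Length > 7 & Sepal.Width > 3)

# Equivalent indexing form
iris[iris$Sepal.Length > 7 & iris$Sepal.Width > 3, ]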

OR 10) What is random and bivariate random variable? Explain through examples[5] Random & Bivariate Random Variables Random Variable:

• A random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables.

• A random variable is a real-valued function defined on the points of a sample space.

Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates

For example, if we toss a coin 10 times and get heads 8 times, we still cannot say whether the 11th toss will give a head or a tail. But we are sure that we will get either a head or a tail.
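A small simulation sketch of this idea in R (the seed and the number of tosses are illustrative choices):

# Simulate 10 tosses of a fair coin; the 11th toss cannot be predicted from these
set.seed(1)
sample(c("Head", "Tail"), size = 10, replace = TRUE)

# The number of heads in 10 tosses is itself a random variable, Binomial(10, 0.5)
rbinom(1, size = 10, prob = 0.5)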

Bivariate Random Variable: A bivariate random variable is a pair of random variables (X, Y) defined on the same sample space and considered jointly. For example, when a coin is flipped twice, the pair (outcome of the first flip, outcome of the second flip) is a bivariate random variable. b) Explain Frequentist tests and Bayesian tests [5]

Frequentist tests:

Tests of univariate normality include D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–Darling test, the Cramér–von Mises criterion, the Lilliefors test for normality (itself an adaptation of the Kolmogorov–Smirnov test), the Shapiro–Wilk test, the Pearson's chi-squared test, and the Shapiro–Francia test. A 2011 paper from The Journal of Statistical Modeling and Analytics concludes that Shapiro–Wilk has the best power for a given significance, followed closely by Anderson–Darling, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests. Some published works recommend the Jarque–Bera test, but it is not without weakness: it has low power for distributions with short tails, especially for bimodal distributions. Other authors have declined to include its data in their studies because of its poor overall performance. Historically, the third and fourth standardized moments (skewness and kurtosis) were some of the earliest tests for normality. The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and of the range to the standard deviation. More recent tests of normality include the energy test (Székely and Rizzo) and the tests based on the empirical characteristic function (ecf) (e.g. Epps and Pulley, Henze–Zirkler, BHEP test). The energy and the ecf tests are powerful tests that apply for testing univariate or multivariate normality and are statistically consistent against general alternatives. The normal distribution has the highest entropy of any distribution for a given standard deviation. There are a number of normality tests based on this property, the first attributable to Vasicek.

Bayesian tests:

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-informative priors are used. Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives. This approach has been extended by Farrell and Rogers-Stewart ------END OF UNIT-2 ------

Model Questions- Unit-3 1. Define NO SQL?  A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.  NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages

2. Write the differences of NO SQL and SQL

Differences between SQL and NoSQL databases:

Types
  SQL Database: One type (SQL database) with minor variations.
  NoSQL Database: Many different types, including key-value stores, document databases, wide-column stores, and graph databases.

Data storage model
  SQL Database: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables, and then joined together when more complex queries are executed.
  NoSQL Database: Varies based on database type. For example, key-value stores function similarly to SQL databases, but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" columns. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.

Examples
  SQL Database: MySQL, Postgres, Oracle Database.
  NoSQL Database: MongoDB, Cassandra, HBase, Neo4j.

Schemas
  SQL Database: Structure and data types are fixed in advance. To store information about a new data item, the entire database must be altered, during which time the database must be taken offline.
  NoSQL Database: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.

(The differences are not limited to these.)

3. Explain how to Read R output in Excel  First create a csv output from an R data.frame, then read this file in Excel. There is one function that you need to know: write.table. You might also want to consider write.csv, which uses "." for the decimal point and a comma for the separator, and write.csv2, which uses a comma for the decimal point and a semicolon for the separator. x <- cbind(rnorm(20),runif(20)) colnames(x) <- c("A","B") write.table(x,"your_path",sep=",",row.names=FALSE)

4. List NO SQL database examples.  A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.  NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages Examples of NOSQL are . MongoDB . Cassandra, . HBase, . Neo4j

5. Explain how SQL is used in R. SQL using R:  sqldf is an R package for running SQL statements on data frames. To load the "sqldf" package we use the step below: library(sqldf) # Use the titanic data set data(titanic3, package="PASWR") colnames(titanic3) head(titanic3) 6. Write short notes on R connector. There are different approaches in R to connect with Excel to perform read, write and execute activities.

XLConnect: It might be slow for a large dataset but is very powerful otherwise. require(XLConnect) wb <- loadWorkbook("myfile.xlsx") myDf <- readWorksheet(wb, sheet = "Sheet1", header = TRUE) xlsx: This package requires a Java runtime to install, so it is suitable for Java-supported environments. Prefer read.xlsx2() over read.xlsx(); it is significantly faster for large datasets. require(xlsx) read.xlsx2("myfile.xlsx", sheetName = "Sheet1")

7. Give the uses of NOSQL databases  A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.  When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address: • Large volumes of structured, semi-structured, and unstructured data • Agile sprints, quick iteration, and frequent code pushes • Object-oriented programming that is easy to use and flexible • Efficient, scale-out architecture instead of expensive, monolithic architecture 8. How NOSQL is faster than SQL? NoSQL databases are specifically designed for unstructured data which can be document-oriented, column-oriented, graph-based, etc. In this case, a particular data entity is stored together and not partitioned.  A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications 9. What do you mean by data manipulation?-give examples.  Data manipulation is the process of changing data to make it easier to read or be more organized. For example, a log of data could be organized in alphabetical order, making individual entries easier to locate SQL  Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE NOSQL- Through object-oriented APIs 10. State the difference between NOSQL and SQL Differences SQL Database NO SQL Database Types One type (SQL database) Many different types with minor variations including key-value stores, document databases, wide- column stores, and graph databases Data storage model Individual records (e.g., Varies based on database "employees") are stored as type. For example, key- rows in tables, with each value stores function column storing a specific similarly to SQL databases, piece of data about that but have only two columns record (e.g., "manager," ("key" and "value"), with "date hired," etc.), much more complex information like a spreadsheet. Separate sometimes stored within the data types are stored in "value" columns. Document separate tables, and then databases do away with the joined together when more table-and-row model complex queries are altogether, storing all executed relevant data together in single "document" in JSON, XML, or another format, which can nest values hierarchically Examples MySQL, Postgres, Oracle MongoDB, Cassandra, Database HBase, Neo4j

Schemas Structure and data types are Typically dynamic. Records fixed in advance. To store can add new information on information about a new the fly, and unlike SQL data item, the entire table rows, dissimilar data database must be altered, can be stored together as during which time the necessary. For some database must be taken databases (e.g., wide- offline column stores), it is somewhat more challenging to add new fields dynamically ------Part-B 1 a) What did you understand about SQL using R?Explain[5] SQL is a database query language - a language designed specifically for interacting with a database. It offers syntax for extracting data, updating data, replacing data, creating data, etc. For our purposes, it will typically be used when accessing data off a server database. If the database isn‘t too large, you can grab the entire data set and stick it in adata.frame. However, often the data are quite large so you interact with it piecemeal via SQL.

There are various database implementations (SQLite, Microsoft SQL Server, PostgreSQL, etc) which are database management software which use SQL to access the data. The method of connecting with each database may differ, but they support SQL (specifically they support ANSI SQL) and often extend it in subtle ways. This means that in general, SQL written to access a SQLite database may not work to access a PostgreSQL database. sqldf package library(sqldf)

The sqldf package is incredibly simple, from R's point of view. There is a single function we are concerned about: sqldf. Passed to this function is a SQL statement, such as sqldf('SELECT age, circumference FROM Orange WHERE Tree = 1 ORDER BY circumference ASC') ## Warning: Quoted identifiers should have class SQL, use DBI::SQL() if the ## caller performs the quoting. ## age circumference ## 1 118 30 ## 2 484 58 ## 3 664 87 ## 4 1004 115 ## 5 1231 120 ## 6 1372 142 ## 7 1582 145

(Note: The above warning is due to some compatibility issues between sqldf and RSQLite and shouldn‘t affect anything.)

SQL Queries There are a large number of major SQL commands. Queries are accomplished with the SELECT command. First a note about convention: by convention, SQL syntax is written in all UPPER CASE and variable names/database names are written in lower case. Technically, SQL syntax is case insensitive, so it can be written in lower case or otherwise. Note however that R is not case insensitive, so variable names and data frame names must have proper capitalization. Hence sqldf("SELECT * FROM iris") sqldf("select * from iris") are equivalent, but this would fail (assuming you haven't created a new object called "IRIS"): sqldf("SELECT * from IRIS")

The basic syntax for SELECT is SELECT variable1, variable2 FROM data

For example, data(BOD) BOD ## Time demand ## 1 1 8.3 ## 2 2 10.3 ## 3 3 19.0 ## 4 4 16.0 ## 5 5 15.6 ## 6 7 19.8 sqldf('SELECT demand FROM BOD') ## demand ## 1 8.3 ## 2 10.3 ## 3 19.0 ## 4 16.0 ## 5 15.6 ## 6 19.8 sqldf('SELECT Time, demand from BOD') ## Time demand ## 1 1 8.3 ## 2 2 10.3 ## 3 3 19.0 ## 4 4 16.0 ## 5 5 15.6 ## 6 7 19.8

A quick sidenote: SQL does not like variables with . in their name. If you have any, refer to the variable wrapped in quotes, such as iris1 <- sqldf('SELECT Petal.Width FROM iris') ## Error in rsqlite_send_query(conn@ptr, statement): no such column: Petal.Width iris2 <- sqldf('SELECT "Petal.Width" FROM iris')

Wildcard A wild card can be passed to extract everything. bod2 <- sqldf('SELECT * FROM BOD') bod2 ## Time demand ## 1 1 8.3 ## 2 2 10.3 ## 3 3 19.0 ## 4 4 16.0 ## 5 5 15.6 ## 6 7 19.8

LIMIT To control the number of results returned, use LIMIT #. sqldf('SELECT * FROM iris LIMIT 5') ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa

ORDER BY To order variables, use the syntax ORDER BY var1 {ASC/DESC}, var2 {ASC/DESC} where the choice of ASC for ascending or DESC for descending is made per variable. sqldf("SELECT * FROM Orange ORDER BY age ASC, circumference DESC LIMIT 5") ## Tree age circumference ## 1 2 118 33 ## 2 4 118 32 ## 3 1 118 30 ## 4 3 118 30 ## 5 5 118 30

WHERE Conditional statements can be added via WHERE: sqldf('SELECT demand FROM BOD WHERE Time < 3') ## demand ## 1 8.3 ## 2 10.3

Both AND and OR are valid, along with parentheses to affect the order of operations. sqldf('SELECT * FROM rock WHERE (peri > 5000 AND shape < .05) OR perm > 1000') ## area peri shape perm ## 1 5048 941.543 0.328641 1300 ## 2 1016 308.642 0.230081 1300 ## 3 5605 1145.690 0.464125 1300 ## 4 8793 2280.490 0.420477 1300

There are few more complicated ways to use WHERE:

IN WHERE IN is used similar to R‘s %in%. It also supports NOT. sqldf('SELECT * FROM BOD WHERE Time IN (1,7)') ## Time demand ## 1 1 8.3 ## 2 7 19.8 sqldf('SELECT * FROM BOD WHERE Time NOT IN (1,7)') ## Time demand ## 1 2 10.3 ## 2 3 19.0 ## 3 4 16.0 ## 4 5 15.6

LIKE LIKE can be thought of as a weak regular expression command. It only allows the single wildcard % which matches any number of characters. For example, to extract the data where the feed ends with ―bean‖: sqldf('SELECT * FROM chickwts WHERE feed LIKE "%bean" LIMIT 5') ## weight feed ## 1 179 horsebean ## 2 160 horsebean ## 3 136 horsebean ## 4 227 horsebean ## 5 217 horsebean sqldf('SELECT * FROM chickwts WHERE feed NOT LIKE "%bean" LIMIT 5') ## weight feed ## 1 309 linseed ## 2 229 linseed ## 3 181 linseed ## 4 141 linseed ## 5 260 linseed

Aggregated data Select statements can create aggregated data using AVG, MEDIAN, MAX, MIN, and SUM as functions in the list of variables to select. The GROUP BY statement can be added to aggregate by groups. AS can name the resulting column. sqldf("SELECT AVG(circumference) FROM Orange") ## AVG(circumference) ## 1 115.8571 sqldf("SELECT tree, AVG(circumference) AS meancirc FROM Orange GROUP BY tree") ## Tree meancirc ## 1 1 99.57143 ## 2 2 135.28571 ## 3 3 94.00000 ## 4 4 139.28571 ## 5 5 111.14286

Counting data SELECT COUNT() returns the number of observations. Passing * or nothing returns total rows, passing a variable name returns the number of non-NA entries. AS works as well. d <- data.frame(a = c(1,1,1), b = c(1,NA,NA)) d ## a b ## 1 1 1 ## 2 1 NA ## 3 1 NA sqldf("SELECT COUNT() as numrows FROM d") ## numrows ## 1 3 sqldf("SELECT COUNT(b) FROM d") ## COUNT(b) ## 1 1 b)Distinguish between SQL and NoSQL [5]

Differences between SQL and NoSQL databases, row by row:

1. Types
  SQL Database: One type (SQL database) with minor variations.
  NoSQL Database: Many different types, including key-value stores, document databases, wide-column stores, and graph databases.

2. Development History
  SQL Database: Developed in the 1970s to deal with the first wave of data storage applications.
  NoSQL Database: Developed in the 2000s to deal with limitations of SQL databases, particularly concerning scale, replication and unstructured data storage.

3. Examples
  SQL Database: MySQL, Postgres, Oracle Database.
  NoSQL Database: MongoDB, Cassandra, HBase, Neo4j.

4. Data Storage Model
  SQL Database: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables, and then joined together when more complex queries are executed. For example, "offices" might be stored in one table, and "employees" in another. When a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.
  NoSQL Database: Varies based on database type. For example, key-value stores function similarly to SQL databases, but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" columns. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.

5. Schemas
  SQL Database: Structure and data types are fixed in advance. To store information about a new data item, the entire database must be altered, during which time the database must be taken offline.
  NoSQL Database: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.

6. Scaling
  SQL Database: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required.
  NoSQL Database: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.

7. Development Model
  SQL Database: Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database).
  NoSQL Database: Open-source.

8. Supports Transactions
  SQL Database: Yes, updates can be configured to complete entirely or not at all.
  NoSQL Database: Yes, updates can be configured to complete entirely or not at all.

9. Data Manipulation
  SQL Database: Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE ...
  NoSQL Database: Through object-oriented APIs.

10. Consistency
  SQL Database: Can be configured for strong consistency.
  NoSQL Database: Depends on product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra).

OR 2 Write short notes on the following a) Professionalism [5] What is professionalism: • Professionalism is the competence or set of skills that are expected from a professional

• Professionalism determines how a person is perceived by his employer, co-workers, and casual contacts.

• How long does it take for someone to form an opinion about you? • Studies have proved that it just takes six seconds for a person to form an opinion about another person.

How does someone form an opinion about you…… Eye Contact – Maintaining eye contact with a person or the audience says that you are confident. It says that you are someone who can be trusted and hence can maintain contact with you. Handshake – Grasp the other person‘s hand firmly and shake it a few times. This shows that you are enthusiastic. Posture – Stand straight but not rigid, this will showcase that you are receptive and not very rigid in your thoughts. Clothing – Appropriate clothing says that you are a leader with a winning potential. How to exhibit professionalism… • Empathy • Positive Attitude • Teamwork • Professional Language

b) Effective communication skills [5] Effective Communication We would probably all agree that effective communication is essential to workplace effectiveness. And yet, we probably don‘t spend much time thinking about how we communicate, and how we might improve our communication skills. The purpose of building communication skills is to achieve greater understanding and meaning between people and to build a climate of trust, openness, and support. To a large degree, getting our work done involves working with other people. And a big part of working well with other people is communicating effectively. Sometimes we just don‘t realize how critical effective communication is to getting the job done. So, let‘s have an experience that reminds us of the importance of effective communication. Actually, this experience is a challenge to achieve a group result without any communication at all! Let‘s give it a shot. What is Effective Communication? We cannot not communicate. The question is: Are we communicating what we intend to communicate? Does the message we send match the message the other person receives? Impression = Expression Real communication or understanding happens only when the receiver‘s impression matches what the sender intended through his or her expression.So the goal of effective communication is a mutual understanding of the message. There are three main forms of Communication: 1. Verbal communication 2. Non verbal communication 3. Written communication

Verbal Communication Verbal communication refers to the use of sounds and language to relay a message. It serves as a vehicle for expressing desires, ideas and concepts and is vital to the processes of learning and teaching. In combination with nonverbal forms of communication, verbal communication acts as the primary tool for expression between two or more people Non Verbal Communication How do we communicate without words??? • We communicate a lot to each other outside what we say. • We create confusion when our verbal and nonverbal don‘t match &When verbal and nonverbal messages don‘t match, we tend to ―listen‖ to the nonverbal one.

(Intuitively, we generally view others' "body language" as a more reliable indicator of their attitudes and feelings than their words.) • We can learn to read the meanings of nonverbal behaviors. o The key is discovering an individual's behavior patterns; there is predictability to their meaning. o However, be careful: people can mask their feelings. o Also, trying to read something into every movement others make can get in the way of effective interactions.

3. a)Explain how SQL is used in R[5] One type (SQL database) with minor variations Developed in 1970s to deal with first wave of data storage applications Examples: MySQL, Postgres, Oracle Database Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables, and then joined together when more complex queries are executed. For example, "offices" might be stored in one table, and "employees" in another. When a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.

Schemas: Structure and data types are fixed in advance. To store information about a new data item, the entire database must be altered, during which time the database must be taken offline. Development model: Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database). Data manipulation: Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE ... Consistency: Can be configured for strong consistency. SQL using R: sqldf is an R package for running SQL statements on data frames. To load the "sqldf" package we use the step below: library(sqldf) # Use the titanic data set data(titanic3, package="PASWR") colnames(titanic3) head(titanic3) Example queries are sketched below.
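A hedged sketch of queries that could follow, assuming the PASWR package is installed and that titanic3 contains the columns pclass, sex and survived (standard in that dataset, but treat the names as assumptions):

library(sqldf)
data(titanic3, package = "PASWR")

# Passenger count and survival rate by class
sqldf("SELECT pclass, COUNT(*) AS n, AVG(survived) AS survival_rate
       FROM titanic3 GROUP BY pclass")

# Passenger count by sex
sqldf("SELECT sex, COUNT(*) AS n FROM titanic3 GROUP BY sex")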

b) Define and explain the term NOSQL [5] A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages. There have been various approaches to classify NoSQL databases, each with different categories and subcategories, some of which overlap When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address: • Large volumes of structured, semi-structured, and unstructured data • Agile sprints, quick iteration, and frequent code pushes • Object-oriented programming that is easy to use and flexible • Efficient, scale-out architecture instead of expensive, monolithic architecture A basic classification based on data model, with examples: • Column:Accumulo, Cassandra, Druid, HBase, Vertica • Document:Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx • Key-value:CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog • Multi-model:OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

OR 4. a)Illustrate the NOSQL database classification based on data model with examples[5]

NoSQL originally refers to "non SQL" or "non-relational" databases; such systems are also called "Not only SQL" to emphasize that they may support SQL-like query languages. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. Classification of NoSQL databases based on data model, with examples: Document: Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx

Example : Parent-Child Relationship–Embedded Entity Here is an example of denormalization of the SALES_ITEM schema in a Document database:

{ "_id": "123", "date": "10/10/2017", "ship_status": "backordered", "orderitems": [ { "itemid": "4348", "price": 10.00 }, { "itemid": "5648", "price": 15.00 } ] }

• Key-value: CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog • Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

Here is a document model for the tree shown above (there are multiple ways to represent trees): { "_id": "USA", “type”:”state”, "children": ["TN",”FL] "parent": null } { "_id": "TN", “type”:”state”, "children": ["Nashville”,”Memphis”] "parent": "USA” } { "_id": "FL", “type”:”state”, "children": ["Miami”,”Jacksonville”] "parent": "USA” } { "_id": "Nashville", “type”:”city”, "children": [] "parent": "TN” }

Each document is a tree node, with the row key equal to the node id. The parent field stores the parent node id. The children field stores an array of children node ids. A secondary index on the parent and children fields allows the parent or children nodes to be found quickly, as in the sketch below.
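A hedged sketch of how such lookups might be issued from R, assuming a local MongoDB instance and the mongolite package; the collection and database names here are illustrative, not from the original text:

library(mongolite)

# Connect to an assumed local MongoDB collection holding the tree documents
nodes <- mongo(collection = "nodes", db = "treedb",
               url = "mongodb://localhost:27017")

# Children of node "TN": query on the parent field (uses the secondary index)
nodes$find('{"parent": "TN"}')

# Parent of "Nashville": look up its document and project only the parent field
nodes$find('{"_id": "Nashville"}', fields = '{"parent": true}')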

b)List the steps for connecting R to NOSQL database. [5] Excel and R integration with R connector 1 – Read Excel spreadsheet in R gdata: it requires you to install additional Perl libraries on Windows platforms but it‘s very powerful. require(gdata) myDf<- read.xls ("myfile.xlsx"), sheet = 1, header = TRUE) RODBC: This is reported for completeness only. It‘s rather dated; there are better ways to interact with Excel nowadays. XLConnect: It might be slow for large dataset but very powerful otherwise. require (XLConnect) wb<- loadWorkbook("myfile.xlsx") myDf<- readWorksheet(wb, sheet = "Sheet1", header = TRUE) xlsx: Prefer the read.xlsx2() over read.xlsx(), it‘s significantly faster for large dataset. require(xlsx) read.xlsx2("myfile.xlsx", sheetName = "Sheet1") xlsReadWrite: Available for Windows only. It‘s rather fast but doesn‘t support .xlsx files which is a serious drawback. It has been removed from CRAN lately. read.table(―clipboard‖): It allows to copy data from Excel and read it directly in R. This is the quick and dirty R/Excel interaction but it‘s very useful in some cases. myDf<- read.table("clipboard") 2 – Read R output in Excel First create a csv output from an R data.frame then read this file in Excel. There is one function that you need to know it‘swrite.table. You might also want to consider: write.csv which uses ―.‖ for the decimal point and a comma for the separator and write.csv2 which uses a comma for the decimal point and a semicolon for the separator. x <- cbind(rnorm(20),runif(20)) colnames(x) <- c("A","B") write.table(x,"your_path",sep=",",row.names=FALSE) 3 – Execute R code in VBA RExcel is from my perspective the best suited tool but there is at least one alternative. You can run a batch file within the VBA code. If R.exe is in your PATH, the general syntax for the batch file (.bat) is: R CMD BATCH [options] myRScript.R Here‘s an example of how to integrate the batch file above within your VBA code. 4 – Execute R code from an Excel spreadsheet Rexcel is the only tool I know for the task. Generally speaking once you installed RExcel you insert the excel code within a cell and execute from RExcel spreadsheet menu. See the RExcel references below for an example. 5 – Execute VBA code in R This is something I came across but I never tested it myself. This is a two steps process. First write a VBscript wrapper that calls the VBA code. Second run the VBscript in R with the system or shell functions. The method is described in full details here. 6 – Fully integrate R and Excel RExcel is a project developped by Thomas Baier and Erich Neuwirth, ―making R accessible from Excel and allowing to use Excel as a frontend to R‖. It allows communication in both directions: Excel to R and R to Excel and covers most of what is described above and more. I‘m not going to put any example of RExcel use here as the topic is largely covered elsewhere but I will show you where to find the relevant information. There is a wiki for installing RExcel and an excellent tutorial available here. I also recommend the following two documents: RExcel – Using R from within Excel and High-Level Interface Between R and Excel. They both give an in-depth view of RExcel capabilities

5) Explain the term NO SQL. Write the differences of NO SQL and SQL [10]

Define NoSQL Database: NoSQL originally referred to "non SQL" or "non-relational"; it is also expanded as "Not only SQL" to emphasize that such systems may support SQL-like query languages. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications.
Benefits of NoSQL Database: NoSQL databases are more scalable and provide superior performance. The NoSQL data model addresses several issues that the relational model is not designed to address:
• Large volumes of structured, semi-structured, and unstructured data
• Agile sprints, quick iteration, and frequent code pushes
• Object-oriented programming that is easy to use and flexible
• Efficient, scale-out architecture instead of expensive, monolithic architecture
Classification of NoSQL databases based on data model, with examples:
• Document: Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx
• Key-value: CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS
• Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
Difference between SQL and NoSQL Databases:
1. Types
SQL Database: One type (SQL database) with minor variations.
NoSQL Database: Many different types, including key-value stores, document databases, wide-column stores, and graph databases.
2. Development History
SQL Database: Developed in the 1970s to deal with the first wave of data storage applications.
NoSQL Database: Developed in the 2000s to deal with the limitations of SQL databases, particularly concerning scale, replication, and unstructured data storage.
3. Examples
SQL Database: MySQL, Postgres, Oracle Database.
NoSQL Database: MongoDB, Cassandra, HBase, Neo4j.
4. Data Storage Model
SQL Database: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables and then joined together when more complex queries are executed. For example, "offices" might be stored in one table and "employees" in another; when a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.
NoSQL Database: Varies based on database type. For example, key-value stores function similarly to SQL databases but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" columns. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.
5. Schemas
SQL Database: Structure and data types are fixed in advance. To store information about a new data item, the entire database must be altered, during which time the database must be taken offline.
NoSQL Database: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.
6. Scaling
SQL Database: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required.
NoSQL Database: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.
7. Development Model
SQL Database: Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database).
NoSQL Database: Open-source.
8. Supports Transactions
SQL Database: Yes, updates can be configured to complete entirely or not at all.
NoSQL Database: Yes, updates can be configured to complete entirely or not at all.
9. Data Manipulation
SQL Database: Specific language using SELECT, INSERT, and UPDATE statements, e.g. SELECT fields FROM table WHERE ...
NoSQL Database: Through object-oriented APIs.
10. Consistency
SQL Database: Can be configured for strong consistency.
NoSQL Database: Depends on the product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra).
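As a small illustration of the SQL side of this comparison (rows 9 and 10 above), the sketch below runs an ordinary SQL query from R against an in-memory SQLite database. It assumes the DBI and RSQLite packages are installed; the table name and query are made up for the example.
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")   # in-memory SQLite database
dbWriteTable(con, "mtcars", mtcars)               # copy the built-in mtcars data frame into a SQL table
dbGetQuery(con, "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl")   # result comes back as a data frame
dbDisconnect(con)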

OR 6 a)Explain how to Read R output in Excel. [5]

Excel and R integration with the R connector: there are different approaches in R to connect with Excel to perform read, write, and execute activities. To read R output in Excel, first create a CSV output from an R data.frame and then read this file in Excel. The one function you need to know is write.table. You might also want to consider write.csv, which uses "." for the decimal point and a comma as the separator, and write.csv2, which uses a comma for the decimal point and a semicolon as the separator.
x <- cbind(rnorm(20), runif(20))
colnames(x) <- c("A","B")
write.table(x, "your_path", sep = ",", row.names = FALSE)
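A minimal sketch of the same idea with write.csv and write.csv2 (the file names are placeholders chosen for the example):
x <- data.frame(A = rnorm(20), B = runif(20))
write.csv(x, "mydata_point.csv", row.names = FALSE)    # "." as decimal point, "," as separator
write.csv2(x, "mydata_comma.csv", row.names = FALSE)   # "," as decimal point, ";" as separator
Either file can then be opened directly in Excel.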

b)Explain how to Read Excel spreadsheet in R [5]

Read Excel spreadsheet in R: multiple packages are available to access an Excel sheet from R.
1. gdata: This package requires you to install additional Perl libraries on Windows platforms, but it is very powerful.
require(gdata)
myDf <- read.xls("myfile.xlsx", sheet = 1, header = TRUE)
2. XLConnect: It might be slow for a large dataset but is very powerful otherwise.
require(XLConnect)
wb <- loadWorkbook("myfile.xlsx")
myDf <- readWorksheet(wb, sheet = "Sheet1", header = TRUE)
3. xlsx: This package requires a Java runtime to install, so it is suitable for Java-supported environments. Prefer read.xlsx2() over read.xlsx(); it is significantly faster for large datasets.
require(xlsx)
read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
Lab activity, example R script:
install.packages("rJava")
install.packages("xlsx")
require(xlsx)
> read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
Sno Sname Marks Attendance Contactno Mailid
1 sri 45 45 988776655 [email protected]
2 vas 78 78 435465768 [email protected]
3 toni 34 46 -117845119 [email protected]
4 mac 90 89 -671156006 [email protected]
5 ros 25 23 -1224466893 [email protected]
xlsReadWrite: Available for Windows only. It is rather fast but does not support .xlsx files, which is a serious drawback. It has been removed from CRAN lately.
read.table("clipboard"): It allows you to copy data from Excel and read it directly in R. This is the quick-and-dirty R/Excel interaction, but it is very useful in some cases.
myDf <- read.table("clipboard")
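Beyond the packages listed above, one commonly used alternative (not part of the original list, so treat this as an optional suggestion) is the readxl package, which requires neither Perl nor Java:
# Assumes the readxl package is installed; "myfile.xlsx" is the same placeholder file used above
library(readxl)
myDf <- read_excel("myfile.xlsx", sheet = 1)
head(myDf)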

7) Write a R program to illustrate database connectivity [10]
Excel and R integration with the R connector: there are different approaches in R to connect with Excel to perform read, write, and execute activities.
Read Excel spreadsheet in R: multiple packages are available to access an Excel sheet from R.
1. gdata: This package requires you to install additional Perl libraries on Windows platforms, but it is very powerful.
require(gdata)
myDf <- read.xls("myfile.xlsx", sheet = 1, header = TRUE)
2. XLConnect: It might be slow for a large dataset but is very powerful otherwise.
require(XLConnect)
wb <- loadWorkbook("myfile.xlsx")
myDf <- readWorksheet(wb, sheet = "Sheet1", header = TRUE)
3. xlsx: This package requires a Java runtime to install, so it is suitable for Java-supported environments. Prefer read.xlsx2() over read.xlsx(); it is significantly faster for large datasets.
require(xlsx)
read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
Lab activity, example R script:
install.packages("rJava")
install.packages("xlsx")
require(xlsx)
> read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
Sno Sname Marks Attendance Contactno Mailid
1 sri 45 45 988776655 [email protected]
2 vas 78 78 435465768 [email protected]
3 toni 34 46 -117845119 [email protected]
4 mac 90 89 -671156006 [email protected]
5 ros 25 23 -1224466893 [email protected]
xlsReadWrite: Available for Windows only. It is rather fast but does not support .xlsx files, which is a serious drawback. It has been removed from CRAN lately.
read.table("clipboard"): It allows you to copy data from Excel and read it directly in R. This is the quick-and-dirty R/Excel interaction, but it is very useful in some cases.
myDf <- read.table("clipboard")
To go the other way, first create a CSV output from an R data.frame and then read this file in Excel. The one function you need to know is write.table. You might also want to consider write.csv, which uses "." for the decimal point and a comma as the separator, and write.csv2, which uses a comma for the decimal point and a semicolon as the separator.
x <- cbind(rnorm(20), runif(20))
colnames(x) <- c("A","B")
write.table(x, "your_path", sep = ",", row.names = FALSE)
Execute R code in VBA: RExcel is, from my perspective, the best suited tool, but there is at least one alternative. You can run a batch file within the VBA code. If R.exe is in your PATH, the general syntax for the batch file (.bat) is:
R CMD BATCH [options] myRScript.R
The batch file above can then be launched from within the VBA code.
OR 8) Write about stored procedures in NOSQL. Explain with examples [10]
Before we understand NoSQL we will see how SQL is used in R.
SQL using R: sqldf is an R package for running SQL statements on data frames. To load the sqldf package:
library(sqldf)
# Use the titanic data set
data(titanic3, package = "PASWR")
colnames(titanic3)
head(titanic3)
NO SQL: A NoSQL (originally referring to "non SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.
There have been various approaches to classify NoSQL databases, each with different categories and subcategories, some of which overlap The Benefits of NoSQL When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address: • Large volumes of structured, semi-structured, and unstructured data • Agile sprints, quick iteration, and frequent code pushes • Object-oriented programming that is easy to use and flexible • Efficient, scale-out architecture instead of expensive, monolithic architecture

A basic classification based on data model, with examples: • Column:Accumulo, Cassandra, Druid, HBase, Vertica • Document:Clusterpoint, Apache CouchDB, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx • Key-value:CouchDB, Oracle NoSQL Database, Dynamo, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog • Multi-model:OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB Connecting R to NoSQL databases Lab Activity1: No SQL Example: R script to access a XML file: Step1: Install Packages plyr,XML Step2: Take xml file url Step3: create XML Internal Document type object in R using xmlParse() Step4 :Convert xml object to list by using xmlToList() Step5: convert list object to data frame by using ldply(xl, data.frame) install.packages("XML") install.packages("plyr") > fileurl<-"http://www.w3schools.com/xml/simple.xml" > doc<-xmlParse(fileurl,useInternalNodes=TRUE) > class(doc) [1] "XMLInternalDocument" "XMLAbstractDocument" > doc Belgian Waffles $5.95 Two of our famous Belgian Waffles with plenty of real maple syrup 650 > xl<-xmlToList(doc) > class(xl) [1] "list" > xl $food $food$name [1] "Belgian Waffles" $food$price [1] "$5.95" $food$description [1] "Two of our famous Belgian Waffles with plenty of real maple syrup" $food$calories [1] "650" $food > data<-ldply(xl, data.frame) > head(data) .id name price 1 food Belgian Waffles $5.95 2 food Strawberry Belgian Waffles $7.95 3 food Berry-Berry Belgian Waffles $8.95 4 food French Toast $4.50 5 food Homestyle Breakfast $6.95
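The XML lab above works with non-tabular data stored in a file; to reach an actual NoSQL store such as MongoDB from R, the mongolite package can be used. This is a sketch only: mongolite is not part of the original lab activity, and the connection URL, database, and collection names are illustrative assumptions.
# install.packages("mongolite")   # if not already installed
library(mongolite)
# Assumes a MongoDB server is running locally on the default port
m <- mongo(collection = "cars", db = "testdb", url = "mongodb://localhost")
m$insert(mtcars)        # store the data frame as JSON-like documents
m$find('{"cyl" : 6}')   # query documents back into an R data frame
m$drop()                # remove the example collection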

9 a) What did you understand about SQL using R? Explain [5]
A SQL (relational) database has the following characteristics. There is essentially one type (the SQL database) with minor variations, developed in the 1970s to deal with the first wave of data storage applications; examples are MySQL, Postgres, and Oracle Database. Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables and then joined together when more complex queries are executed; for example, "offices" might be stored in one table and "employees" in another, and when a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary. Structure and data types are fixed in advance: to store information about a new data item, the entire database must be altered, during which time the database must be taken offline. The development model is a mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database). Data is manipulated through a specific language using SELECT, INSERT, and UPDATE statements, e.g. SELECT fields FROM table WHERE ..., and the database can be configured for strong consistency.
SQL using R: sqldf is an R package for running SQL statements on data frames. To load the sqldf package and query the titanic data set:
library(sqldf)
# Use the titanic data set
data(titanic3, package = "PASWR")
colnames(titanic3)
head(titanic3)
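A self-contained sketch of sqldf in action is shown below; the small data frame is made up for illustration (reusing the sample student names from the Excel lab above), since the titanic3 data set requires the PASWR package to be installed.
library(sqldf)
students <- data.frame(name  = c("sri", "vas", "toni", "mac", "ros"),
                       marks = c(45, 78, 34, 90, 25))
# An ordinary SQL query run directly against the data frame
sqldf("SELECT name, marks FROM students WHERE marks > 40 ORDER BY marks DESC")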

b)Distinguish between SQL and NoSQL [5]

Difference between SQL and NoSQL Databases:
1. Types
SQL Database: One type (SQL database) with minor variations.
NoSQL Database: Many different types, including key-value stores, document databases, wide-column stores, and graph databases.
2. Development History
SQL Database: Developed in the 1970s to deal with the first wave of data storage applications.
NoSQL Database: Developed in the 2000s to deal with the limitations of SQL databases, particularly concerning scale, replication, and unstructured data storage.
3. Examples
SQL Database: MySQL, Postgres, Oracle Database.
NoSQL Database: MongoDB, Cassandra, HBase, Neo4j.
4. Data Storage Model
SQL Database: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables and then joined together when more complex queries are executed. For example, "offices" might be stored in one table and "employees" in another; when a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.
NoSQL Database: Varies based on database type. For example, key-value stores function similarly to SQL databases but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" columns. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.
5. Schemas
SQL Database: Structure and data types are fixed in advance. To store information about a new data item, the entire database must be altered, during which time the database must be taken offline.
NoSQL Database: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.
6. Scaling
SQL Database: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required.
NoSQL Database: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.
7. Development Model
SQL Database: Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database).
NoSQL Database: Open-source.
8. Supports Transactions
SQL Database: Yes, updates can be configured to complete entirely or not at all.
NoSQL Database: Yes, updates can be configured to complete entirely or not at all.
9. Data Manipulation
SQL Database: Specific language using SELECT, INSERT, and UPDATE statements, e.g. SELECT fields FROM table WHERE ...
NoSQL Database: Through object-oriented APIs.
10. Consistency
SQL Database: Can be configured for strong consistency.
NoSQL Database: Depends on the product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra).

OR 10) Write short notes on the following a) Professionalism [5] What is professionalism.. • Professionalism is the competence or set of skills that are expected from a professional • Professionalism determines how a person is perceived by his employer, co-workers, and casual contacts. • How long does it take for someone to form an opinion about you? • Studies have proved that it just takes six seconds for a person to form an opinion about another person.

How does someone form an opinion about you.. Eye Contact – Maintaining eye contact with a person or the audience says that you are confident. It says that you are someone who can be trusted and hence can maintain contact with you. Handshake – Grasp the other person‘s hand firmly and shake it a few times. This shows that you are enthusiastic. Posture – Stand straight but not rigid, this will showcase that you are receptive and not very rigid in your thoughts. Clothing – Appropriate clothing says that you are a leader with a winning potential. How to exhibit professionalism.. • Empathy • Positive Attitude • Teamwork • Professional Language

b) Effective communication skills [5] Effective Communication We would probably all agree that effective communication is essential to workplace effectiveness. And yet, we probably don‘t spend much time thinking about how we communicate, and how we might improve our communication skills. The purpose of building communication skills is to achieve greater understanding and meaning between people and to build a climate of trust, openness, and support. To a large degree, getting our work done involves working with other people. And a big part of working well with other people is communicating effectively. Sometimes we just don‘t realize how critical effective communication is to getting the job done. So, let‘s have an experience that reminds us of the importance of effective communication. Actually, this experience is a challenge to achieve a group result without any communication at all! Let‘s give it a shot. What is Effective Communication.. We cannot not communicate. The question is: Are we communicating what we intend to communicate? Does the message we send match the message the other person receives? Impression = Expression Real communication or understanding happens only when the receiver‘s impressionmatches what the sender intended through his or her expression.So the goal of effective communication is a mutual understanding of the message. There are three main forms of Communication: 1. Verbal communication 2. Non verbal communication 3. Written communication Verbal Communication Verbal communication refers to the use of sounds and language to relay a message. It serves as a vehicle for expressing desires, ideas and concepts and is vital to the processes of learning and teaching. In combination with nonverbal forms of communication, verbal communication acts as the primary tool for expression between two or more people Non Verbal Communication How do we communicate without words??? • We communicate a lot to each other outside what we say. • We create confusion when our verbal and nonverbal messages don‘t match &When verbal and nonverbal messages don‘t match, we tend to ―listen‖ to the nonverbal one.

(Intuitively, we generally view others' "body language" as a more reliable indicator of their attitudes and feelings than their words.) • We can learn to read the meanings of nonverbal behaviors. The key is discovering an individual's behavior patterns; there is predictability to their meaning. However, be careful: people can mask their feelings. Also, trying to read something into every movement others make can get in the way of effective interactions. ------END OF UNIT-3------

Model Questions- Unit-4

1. Explain Regression residuals
The residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest. Because a linear regression model is not always appropriate for the data, you should assess the appropriateness of the model by defining residuals and examining residual plots.
Residuals: The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.
Residual = Observed value - Predicted value, i.e. e = y – ŷ
Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
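A quick sketch of these definitions in R, using the built-in cars data set purely for illustration:
fit <- lm(dist ~ speed, data = cars)   # simple linear regression
e <- resid(fit)                        # residuals e = y - ŷ
sum(e)    # essentially zero, up to rounding error
mean(e)   # also essentially zero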

2. Define Multiple Linear Regression?
Multiple linear regression is an extension of simple linear regression used to predict an outcome variable (y) on the basis of multiple distinct predictor variables (x). The "b" values are called the regression weights (or beta coefficients).
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data = mydata)
summary(fit)  # show results
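A small worked sketch with the built-in mtcars data (the choice of predictors here is arbitrary and only for illustration):
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)   # three predictor variables
summary(fit)   # regression weights (b values), R-squared, p-values
coef(fit)      # just the estimated coefficients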

3. Explain Auto Correlation

Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a signal with itself at different points in time; informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time lag.
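In R, the sample autocorrelation of a series can be inspected with the base acf() function; a minimal sketch using the built-in AirPassengers time series:
acf(AirPassengers, lag.max = 24)    # correlogram of autocorrelations at lags 1..24
acf(AirPassengers, plot = FALSE)    # the same values printed instead of plotted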

4. Define Regression Modeling

Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed

5. Define OLS Regression? OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data. This is applied in both simple linear and multiple regression where the common assumptions are (1) The model is linear in the coefficients of the predictor with an additive random error term (2) The random error terms are • normally distributed with 0 mean and • a variance that doesn't change as the values of the predictor covariates (i.e. IVs) change 6. Define Multicollinearity? In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regressions may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors

7. How correlation is used for data analysis?  Correlation is defined in terms of the variance of x, the variance of y, and the covariance of x and y (the way the two vary together; the way they co-vary) on the assumption that both variables are normally distributed. • Correlation explains how one or more variables are related to each other. These variables can be input data features which have beenused to forecast our target variable. ... It means that when the value of one variable increases then the value of the other variable(s) also increases. In this way correlation can be used for data analytics.
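For example, in R (the built-in mtcars data set is used here purely for illustration):
cor(mtcars$wt, mtcars$mpg)            # strong negative correlation: heavier cars give lower mileage
cor(mtcars[, c("mpg", "wt", "hp")])   # correlation matrix for several variables
cov(mtcars$wt, mtcars$mpg) / (sd(mtcars$wt) * sd(mtcars$mpg))   # same value from r = cov(x, y)/(sx * sy)
cor(mtcars$wt, mtcars$mpg, method = "spearman")                  # non-parametric (Spearman) alternative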

8. Expand ANOVA and list its unique features  Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups).  In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. In R, to perform ANOVA test the built in function is anova()
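A minimal sketch with the built-in PlantGrowth data (plant weight under a control and two treatment groups):
fit <- aov(weight ~ group, data = PlantGrowth)   # one-way ANOVA
summary(fit)                                     # F statistic and p-value
anova(lm(weight ~ group, data = PlantGrowth))    # equivalent ANOVA table via lm() and anova()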

9. What do you mean by Heteroscedasticity?
A collection of random variables is heteroscedastic (or 'heteroskedastic', from Ancient Greek hetero "different" and skedasis "dispersion") if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity.
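A common informal check in R is to plot residuals against fitted values and look for a funnel shape; a formal check such as the Breusch-Pagan test is available in the lmtest package. This is a sketch assuming lmtest is installed; the cars model is only an illustration.
fit <- lm(dist ~ speed, data = cars)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")  # spread should look roughly constant
abline(h = 0, lty = 2)
# install.packages("lmtest")   # if not already installed
lmtest::bptest(fit)            # Breusch-Pagan test for heteroscedasticity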

10. What did you understand about residuals? Describe shortly.
The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.
Residual = Observed value - Predicted value, i.e. e = y – ŷ
Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
------Part-B

1) Discuss in detail on Basic regression analysis. Give examples.[10]

Basic Regression Analysis
Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). In simple regression, we try to determine whether there is a relationship between two variables. It is assumed that there is a high degree of correlation between the two variables chosen for use in regression. In R we use the lm() function to do simple regression modeling. For example,
> fit <- lm(data$petal_length ~ data$petal_width)
When we call "fit" as below
> fit
we get the intercept "C" and the slope "m" of the equation Y = mX + C. The fit information displays four charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage. Below are the various graphs representing values of regression.
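The four charts mentioned above are R's standard diagnostic plots for a fitted lm object. A minimal sketch is shown below; it uses the built-in iris data with its standard column names, since the lower-case data object in the example above is not defined here.
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
par(mfrow = c(2, 2))   # arrange the four diagnostic plots in a 2 x 2 grid
plot(fit)              # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage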

OLS Regression OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data. This is applied in both simple linear and multiple regression where the common assumptions are (1) The model is linear in the coefficients of the predictor with an additive random error term (2) The random error terms are normally distributed with 0 mean and a variance that doesn't change as the values of the predictor covariates (i.e. IVs) change Regression Modeling Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation

Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.

In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems

OR 2) What is Autocorrelation and Multicollinearity ? Explain [10] Autocorrelation Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time lag. Let X be some repeatable process, and i be some point in time after the start of that process. (i may be an integer for a discrete-time process or a real number for a continuous-time process.) Then Xi is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μi and variance σi2 for all times i. Then the definition of the autocorrelation between times s and t is

where "E" is the expected value operator Test: - The traditional test for the presence of first-order autocorrelation is the Durbin–Watson statistic or, if the explanatory variables include a lagged dependent variable, Durbin's h statistic. The Durbin- Watson can be linearly mapped however to the Pearson correlation between values and their lags.

A more flexible test, covering autocorrelation of higher orders and applicable whether or not the regressors include lags of the dependent variable, is the Breusch–Godfrey test. This involves an auxiliary regression, wherein the residuals obtained from estimating the model of interest are regressed on (a) the original regressors and (b) k lags of the residuals, where k is the order of the test. The simplest version of the test statistic from this auxiliary regression is TR², where T is the sample size and R² is the coefficient of determination. Under the null hypothesis of no autocorrelation, this statistic is asymptotically distributed as χ² with k degrees of freedom.
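Both tests are available in R through the lmtest package; the sketch below assumes lmtest is installed and uses an arbitrary model on the built-in cars data for illustration.
# install.packages("lmtest")   # if not already installed
library(lmtest)
fit <- lm(dist ~ speed, data = cars)
dwtest(fit)              # Durbin-Watson test for first-order autocorrelation
bgtest(fit, order = 2)   # Breusch-Godfrey test up to order k = 2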

Multicollinearity
In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. In case of perfect multicollinearity the predictor matrix is singular and therefore cannot be inverted. Under these circumstances, for a general linear model y = Xβ + ε, the ordinary least-squares estimator (X'X)⁻¹X'y does not exist because X'X cannot be inverted.
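One common indicator, discussed in the list below, is the variance inflation factor VIF_j = 1 / (1 - R_j²), where R_j² is the coefficient of determination from regressing predictor j on all the other predictors. It can be computed with the car package; this is a sketch assuming car is installed, with an arbitrary mtcars model for illustration.
# install.packages("car")   # if not already installed
fit <- lm(mpg ~ wt + disp + hp, data = mtcars)
car::vif(fit)   # values above roughly 5-10 are usually taken to indicate problematic multicollinearity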

Test:- Indicators that multicollinearity may be present in a model: 1) Large changes in the estimated regression coefficients when a predictor variable is added or deleted 2) Insignificant regression coefficients for the affected variables in the multiple regression, but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test) 3) If a multivariable regression finds an insignificant coefficient of a particular explanator, yet a simple linear regression of the explained variable on this explanatory variable shows its coefficient to be significantly different from zero, this situation indicates multicollinearity in the multivariable regression. 4) Some authors have suggested a formal detection-tolerance or the variance inflation factor (VIF for multicollinearity Where is the coefficient of determination of a regression of explanator j on all the other explanators. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a multicollinearity problem 5) Condition number test: The standard measure of ill-conditioning in a matrix is the condition index. It will indicate that the inversion of the matrix is numerically unstable with finite-precision numbers (standard computer floats and doubles). This indicates the potential sensitivity of the computed inverse to small changes in the original matrix. The Condition Number is computed by finding the square root of (the maximum eigenvalue divided by the minimum eigenvalue). If the Condition Number is above 30, the regression may have significant multicollinearity; multicollinearity exists if, in addition, two or more of the variables related to the high condition number have high proportions of variance explained. One advantage of this method is that it also shows which variables are causing the problem. 6) Farrar–Glauber test: If the variables are found to be orthogonal, there is no multicollinearity; if the variables are not orthogonal, then multicollinearity is present. C. Robert Wichers has argued that Farrar–Glauber partial correlation test is ineffective in that a given partial correlation may be compatible with different multicollinearity patterns. The Farrar–Glauber test has also been criticized by other researchers. 7) Perturbing the data. Multicollinearity can be detected by adding random noise to the data and re- running the regression many times and seeing how much the coefficients change. 8) Construction of a correlation matrix among the explanatory variables will yield indications as to the likelihood that any given couplet of right-hand-side variables is creating multicollinearity problems. Correlation values (off-diagonal elements) of at least .4 are sometimes interpreted as indicating a multicollinearity problem. This procedure is, however, highly problematic and cannot be recommended. Intuitively, correlation describes a bivariate relationship, whereas collinearity is a multivariate phenomenon ------3). Explain about Multiple Linear Regression with a suitable example [10] Introduction to Multiple Regression: The multiple regression is the relationship between several independent or predictor variables and a dependent or criterion variable. For example: A real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of appeal of the house. 
Once this information has been compiled for various houses it would be interesting to see whether and how these measures relate to the price for which a house is sold. For example, you might learn that the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). Lab Activity: Multiple Regression model on Iris Data set Step1: Subset the numeric data from iris dataset Step2: Find the correlation among all variables Step3: Find the formula based on highly correlated variables Step4: call the glm() >iris1<-iris[1:4] > cor(iris1) Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411 Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259 Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654 Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000 Hence (sepal length, petal length),(petal width, sepal length) and (petal length ,petal width) are showing high correlation. > glm(formula=iris$Petal.Length~iris$Petal.Width+iris$Sepal.Length) Call: glm(formula = iris$Petal.Length ~ iris$Petal.Width + iris$Sepal.Length) Coefficients: (Intercept) iris$Petal.Width iris$Sepal.Length -1.5071 1.7481 0.5423 Degrees of Freedom: 149 Total (i.e. Null); 147 Residual Null Deviance: 464.3 Residual Deviance: 23.9 AIC: 158.2 Hence the formula found from model is iris$Petal.Length = 1.7481*iris$Petal.Width + 0.5423*iris$Sepal.Length-1.5071 study2:Multiple linear regression on MS application data

The p-value is less than 0.05, which means we reject the null hypothesis. The degrees of freedom are 142. For other examples use the link: http://www.ats.ucla.edu/stat/r/dae/rreg.htm and also refer to the book Practical Regression and Anova using R. Now, to create a linear model of the effect of Body Weight and Sex on Heart Weight, we use multiple regression modeling.

So we can say that 65% variation in Heart Weight can be explained by the model. The equation becomes y=4.07x-0.08y-0.41 Dummy Variables: In regression analysis, a dummy variable (also known as an indicator variable, design variable, Boolean indicator, categorical variable, binary variable, or qualitative variable) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. Dummy variables are used as devices to sort data into mutually exclusive categories (such as smoker/non-smoker, etc.). In other words, Dummy variables are "proxy" variables or numeric stand-ins for qualitative facts in a regression model. In regression analysis, the dependent variables may be influenced not only by quantitative variables (income, output, prices, etc.), but also by qualitative variables (gender, religion, geographic region, etc.). A dummy independent variable (also called a dummy explanatory variable) which for some observation has a value of 0 will cause that variable's coefficient to have no role in influencing the dependent variable, while when the dummy takes on a value 1 its coefficient acts to alter the intercept OR 4). Explain the concept of Basic Regression Analysis with a suitable example [10] Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). Regression modeling or analysis is a statistical process for estimating the relationships among variables. The main focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').The value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. The simple linear equation Y=mX+C , intercept ―C‖ and the slope ―m‖ . The below plot shows the linear regression

In Regression Model, both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). In simple regression; we can determine a relationship between two variables. It is assumed that there is a high degree of correlation between the two variables chosen for use in regression. In R , lm () function to do simple regression modeling Linear Regression for finding the relation between petal length and petal width in IRIS dataset: > fit <- lm(iris$Petal.Length ~ iris$Petal.Width) >fit Call: lm(formula = iris$Petal.Length ~ iris$Petal.Width) Coefficients: (Intercept) iris$Petal.Width 1.084 2.230 We get the intercept ―C‖ and the slope ―m‖ of the equation – Y=mX+C. Here m=2.230 and C=1.084 now we found the linear equation between petal length and petal width is iris$Petal.Length=2.230* iris$Petal.Width+1.084 Visualization of fit data: The fit information displays four charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage. Below are the various graphs representing values of regression

5 a)Explain Regression Modelling. [5] Regression Modeling: Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). Understand influence of changes in dependent variable: More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, i.e the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. Estimation of continuous response variables: Regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems Regression analysis uses: It is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation. Parametric and non-parametric regression: Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional. Performance of regression analysis : The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally

b)Explain in detail about Regression residuals [5] Regression residuals: The residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest. Because a linear regression model is not always appropriate for the data, assess the appropriateness of the model by defining residuals and examining residual plots. Residuals: The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual. Residual = Observed value - Predicted value e = y – ŷ x 60 70 80 85 95 y 70 65 70 95 85 ŷ 65.411 71.849 78.288 81.507 87.945 e 4.589 -6.849 -8.288 13.493 -2.945 Both the sum and the mean of the residuals are equal to zero. That is, Σ e = 0 and e = 0. The above table shows inputs and outputs from a simple linear regression analysis. Residual Plots: A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate How to find Residuals and plot them? ##Finding Residuals examples x=c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,26,31,52,21,8,18,5,18,26, 27,26,32,2,59,58,19,14,16,9,23,28,34,70,69,54,39,9,21,54,26) y = c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,56,39,89,31,43, 29,55, 81,82,82,85,59,74,80,88,29,58,71,60,86,91,72,89,80,84,54,71,75,84,79) m1 <- lm(y~x) #Create a linear model resid(m1) #List of residuals > resid(m1) #List of residuals OR 6) Explain the concept of Basic Regression Analysis with a suitable example [10]

Basic Regression Analysis
Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). In simple regression, we try to determine whether there is a relationship between two variables. It is assumed that there is a high degree of correlation between the two variables chosen for use in regression. In R we use the lm() function to do simple regression modeling. For example,
> fit <- lm(data$petal_length ~ data$petal_width)
When we call "fit" as below
> fit
we get the intercept "C" and the slope "m" of the equation Y = mX + C. The fit information displays four charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage. Below are the various graphs representing values of regression.

OLS Regression OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data. This is applied in both simple linear and multiple regression where the common assumptions are (1) The model is linear in the coefficients of the predictor with an additive random error term (2) The random error terms are normally distributed with 0 mean and a variance that doesn't change as the values of the predictor covariates (i.e. IVs) change Regression Modeling  Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.  Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.  The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. 
However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.  In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems 7) Explain regression analysis. Give example [10] Regression Analysis:  Regression modeling or analysis is a statistical process for estimating the relationships among variables. The main focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').The value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Regression analysis is a very widely used statistical tool to establish a relationship model between two variables. One of these variable is called predictor variable whose value is gathered through experiments. The other variable is called response variable whose value is derived from the predictor variable.

 In Linear Regression these two variables are related through an equation, where exponent (power) of both these variables is 1. Mathematically a linear relationship represents a straight line when plotted as a graph. A non-linear relationship where the exponent of any variable is not equal to 1 creates a curve. The general mathematical equation for a linear regression is – y = mx + c Following is the description of the parameters used − y is the response variable. x is the predictor variable. m(slope) and c(intercept) are constants which are called the coefficients. In R , lm () function to do simple regression modeling. The simple linear equation Y=mX+C , intercept ―C‖ and the slope ―m‖ . The below plot shows the linear regression Visualization of fit data: The fit information displays four charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage. Assumptions of OLS Regression Ordinary least squares (OLS) Method: Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses and the predicted responses by the linear approximation of the data. Assumptions of regression modeling: For both simple linear and multiple regressions where the common assumptions are a) The model is linear in the coefficients of the predictor with an additive random error term b) The random error terms are • Normally distributed with 0 mean and • A variance that doesn't change as the values of the predictor covariates change Regression Modeling: Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). Understand influence of changes in dependent variable: More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, i.e the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. • Estimation of continuous response variables: Regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems. Regression analysis uses: It is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. 
Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms nof these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation. • Parametric and non-parametric regression: Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. • Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional. • Performance of regression analysis : The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since nthe true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally OR 8 a)Justify the importance of correlation for prediction. [5] The Correlation is a measure of association between two variables. Correlations are Positive and negative which are ranging between +1 and -1. • Positive correlation (0 to +1) example: Earning and expenditure • Negative correlation (-1 to 0) example : Speed and time In R , correlation between x and y is by using cor(x,y) function •Measure of association between variables •Positive and negative correlation, ranging between +1 and -1 •Positive correlation example: •Earning and expenditure •Negative correlation example •Speed and time •Parametric – normal distribution and homogenous variance •Pearson correlation •Non parametric – no assumptions, nominal variables •Spearman correlation Correlation and Covariance: With two continuous variables, x and y, the question naturally arises as to whether their values are correlated with each other (remembering, of course, that correlation does not imply causation). Correlation is defined in terms of the variance of x, the variance of y, and the covariance of x and y (the way the two vary together; the way they co-vary) on the assumption that both variables are normally distributed. We have symbols already for the two variances, s2x and s2y. We denote the covariance of x and y by cov(x, y), after which the correlation coefficient r is defined as b)Explain OLS regression [5] OLS Regression OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data. This is applied in both simple linear and multiple regression where the common assumptions are (1) The model is linear in the coefficients of the predictor with an additive random error term (2) The random error terms are • normally distributed with 0 mean and • a variance that doesn't change as the values of the predictor covariates (i.e. 
IVs) change Regression Modeling • Regression modeling or analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.

• Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusory or false relationships, so caution is advisable; for example, correlation does not imply causation.

Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional. The performance of regression analysis methods in practice depends on the form of the data-generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.

In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems.
9) Discuss in detail Basic regression analysis. Give examples. [10]
Basic Regression Analysis
Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables (i.e. real numbers with decimal places – things like heights, weights, volumes, or temperatures). In simple regression, we try to determine whether there is a relationship between two variables. It is assumed that there is a high degree of correlation between the two variables chosen for use in regression. In R we use the lm() function to do simple regression modeling. For example,
> fit <- lm(data$petal_length ~ data$petal_width)
When we call "fit" as below
> fit
we get the intercept "C" and the slope "m" of the equation Y = mX + C. Plotting the fitted model produces four diagnostic charts: Residuals vs. Fitted, Normal Q-Q, Scale-Location, and Residuals vs. Leverage, as illustrated in the sketch below.
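A runnable sketch of the same workflow, using R's built-in iris data set as a stand-in for the data frame called "data" in the answer above (the column names differ, and the prediction value is an arbitrary illustration):

fit <- lm(Petal.Length ~ Petal.Width, data = iris)     # simple OLS regression
summary(fit)            # coefficients, R-squared, residual standard error
par(mfrow = c(2, 2))    # arrange the four diagnostic plots in a 2 x 2 grid
plot(fit)               # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
predict(fit, newdata = data.frame(Petal.Width = 1.5))  # predicted petal length for a new width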

OLS Regression OLS:- Ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data. This is applied in both simple linear and multiple regression where the common assumptions are (1) The model is linear in the coefficients of the predictor with an additive random error term (2) The random error terms are • normally distributed with 0 mean and • a variance that doesn't change as the values of the predictor covariates (i.e. IVs) change

OR 10) What are Autocorrelation and Multicollinearity? Explain [10]

Autocorrelation
Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a signal with itself at different points in time (that is what the "cross" stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time-domain signals. In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time lag. Let X be some repeatable process, and i be some point in time after the start of that process. (i may be an integer for a discrete-time process or a real number for a continuous-time process.) Then X_i is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μ_i and variance σ_i² for all times i. Then the definition of the autocorrelation between times s and t is
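Written out, the standard form of this definition is

\[ R(s,t) \;=\; \frac{E\left[(X_t - \mu_t)(X_s - \mu_s)\right]}{\sigma_t\,\sigma_s} \]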

where "E" is the expected value operator Test: - • The traditional test for the presence of first-order autocorrelation is the Durbin–Watson statistic or, if the explanatory variables include a lagged dependent variable, Durbin's h statistic. The Durbin-Watson can be linearly mapped however to the Pearson correlation between values and their lags.

• A more flexible test, covering autocorrelation of higher orders and applicable whether or not the regressors include lags of the dependent variable, is the Breusch–Godfrey test. This involves an auxiliary regression, wherein the residuals obtained from estimating the model of interest are regressed on (a) the original regressors and (b) k lags of the residuals, where k is the order of the test. The simplest version of the test statistic from this auxiliary regression is TR², where T is the sample size and R² is the coefficient of determination. Under the null hypothesis of no autocorrelation, this statistic is asymptotically distributed as χ² with k degrees of freedom.
Multicollinearity
In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. In the case of perfect multicollinearity the predictor matrix is singular and therefore cannot be inverted. Under these circumstances, for a general linear model y = Xβ + ε, the ordinary least squares estimator does not exist.
Test: Indicators that multicollinearity may be present in a model:
1) Large changes in the estimated regression coefficients when a predictor variable is added or deleted.
2) Insignificant regression coefficients for the affected variables in the multiple regression, but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test).
3) If a multivariable regression finds an insignificant coefficient for a particular explanator, yet a simple linear regression of the explained variable on this explanatory variable shows its coefficient to be significantly different from zero, this situation indicates multicollinearity in the multivariable regression.
------END OF UNIT-4------

Model Questions- Unit-5 Part-A 1. What is meant by Manufacturing?  Manufacturing is the production of merchandise for use or sale using labour and machines, tools, chemical and biological processing, or formulation. The term may refer to a range of human activity, from handicraft to high tech, but is most commonly applied to industrial production, in which raw materials are transformed into finished goods on a large scale. Such finished goods may be used for manufacturing other, more complex products, such as aircraft, household appliances or automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end users – the "consumers"

2. Explain Smart Utilities  S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures 3. What are the problems related to a project? The following are examples of project issues.  Poor Quality. Deliverables are low quality causing delays or rejection of deliverables.  Scope Creep. ...  Change Management. ...  Benefit Shortfall. ...  Design Issues. ...  Integration. ...  Technical Issues. ...  Resistance to Change

4. Explain about Engineering Design  The engineering design process is a methodical series of steps that engineers use in creating functional products and processes. The process is highly iterative - parts of the process often need to be repeated many times before production phase can be entered - though the part(s) that get iterated and the number of such cycles in any given project can be highly variable 5. What are problems related to a Business?

• Uncertainty about the future. ... • Financial management. ... • Monitoring performance. ... • Regulation and compliance. ... • Competencies and recruiting the right talent. ... • Technology. ... • Exploding data. ... • Customer service
6. What type of data analysis is feasible for automotive department activities? The automotive industry comprises a wide range of companies and organizations involved in the design, development, manufacturing, marketing, and selling of motor vehicles; some of them are called automakers. It is one of the world's most important economic sectors by revenue. The automotive industry does not include industries dedicated to the maintenance of automobiles following delivery to the end-user, such as automobile repair shops and motor fuel filling stations

7. Name different business problems related to various businesses.

• Uncertainty about the future, Financial management. ... • Monitoring performance, Regulation and compliance. ... • Competencies and recruiting the right talent, Technology. ... • Exploding data, Customer service
9. How do you understand the business problem related to engineering?  The BA process can solve problems and identify opportunities to improve business performance. In the process, organizations may also determine strategies to guide operations and help achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed as a problem of strategy choice requiring a solution.
4. Write short notes on Smart Utilities: S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures

10. What is Tech System?

The application of scientific knowledge for practical purposes, especially in industry. Technology can be the knowledge of techniques, processes, and the like, or it can be embedded in machines which can be operated without detailed knowledge of their workings. The human species' use of technology began with the conversion of natural resources into simple tools. Technology has many effects. It has helped develop more advanced economies (including today's global economy) and has allowed the rise of a leisure class. ------

Part-B 1) Explain SMART utilities in detail [10] S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures. When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so stored data can be copied to another storage device, preventing data loss, and the failing drive can be replaced Understand the business problem related to engineering, Identify the critical issues. Set business objectives. The BA process can solve problems and identify opportunities to improve business performance. In the process, organizations may also determine strategies to guide operations and help achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities can be viewed as a problem of strategy choice requiring a solution

OR 2) Why should we gather requirements before starting the implementation of the project? Explain [10]

Requirements gathering: Gather all the data related to the business objective. There are many different approaches that can be used to gather information about a business. They include the following:
• Review business plans, existing models and other documentation
• Interview subject area experts
• Conduct fact-finding meetings
• Analyze application systems, forms, artifacts, reports, etc.

The business analyst should use one-on-one interviews early in the business analysis project to gauge the strengths and weaknesses of potential project participants and to obtain basic information about the business. Large meetings are not a good use of time for data gathering. Facilitated work sessions are a good mechanism for validating and refining "draft" requirements. They are also useful to prioritize final business requirements. Group dynamics can often generate even better ideas. Primary or local data is collected by the business owner and can be collected by survey, focus group or observation. Third-party static data is purchased in bulk without a specific intent in mind. While easy to get (if you have the cash), this data is not specific to your business and can be tough to sort through, as you often get quite a bit more data than you need to meet your objective. Dynamic data is collected through a third-party process in near real-time from an event for a specific purpose (read into that: very expensive). There are three key questions you need to ask before making a decision about the best method for your firm:
• What is the timeline required to accomplish your business objective?
• What is your required return on investment?
• Is the data collection for a stand-alone event or part of a broader data collection effort?

How to interpret Data to make it useful for Business Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. BI can be used to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions include priorities, goals and directions at the broadest level. In all cases, BI is most effective when it combines data derived from the market in which a company operates (external data) with data from company sources internal to the business such as financial and operations data (internal data). When combined, external and internal data can provide a more complete picture which, in effect, creates an "intelligence" that cannot be derived by any singular set of data Business intelligence is made up of an increasing number of components including

• Multidimensional aggregation and allocation • Denormalization, tagging and standardization • Realtime reporting with analytical alert • A method of interfacing with unstructured data sources • Group consolidation, budgeting and rolling forecasts • Statistical inference and probabilistic simulation • Key performance indicators optimization • Version control and process management • Open item management

3). Explain the concept of Understanding systems and Engineering Design[10] Engineering Design: The engineering design process is a methodical series of steps that engineers use in creating functional products and processes. The process is highly iterative - parts of the process often need to be repeated many times before production phase can be entered - though the part(s) that get iterated and the number of such cycles in any given project can be highly variable. One framing of the engineering design process delineates the following stages: research, conceptualization, feasibility assessment, establishing design requirements, preliminary design, detailed design, production planning and tool design, and production

Manufacturing: Manufacturing is the production of merchandise for use or sale using labour and machines, tools, chemical and biological processing, or formulation. The term may refer to a range of human activity, from handicraft to high tech, but is most commonly applied to industrial production, in which raw materials are transformed into finished goods on a large scale. Such finished goods may be used for manufacturing other, more complex products, such as aircraft, household appliances or automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end users – the "consumers". The manufacturing sector is closely connected with engineering and industrial design. Examples of manufacturers: North America: General Motors Corporation, General Electric, Procter & Gamble, General Dynamics, Boeing, Pfizer, and Precision Castparts. Europe: Volkswagen Group, Siemens, and Michelin. Asia: Sony, Huawei, Lenovo, Toyota, Samsung, and Bridgestone.
OR 4) Comparison of business analytics and organization decision-making processes with a flow chart in detail [10]
The BA process can solve problems and identify opportunities to improve business performance. In this process, organizations may also determine strategies to guide operations and help to achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed as a problem of strategy choice requiring a solution.

Business intelligence (BI): Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. BI can be used to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions include priorities, goals and directions at the broadest level. In all cases, BI is most effective when it combines data derived from the market in which a company operates (external data) with data from company sources internal to the business such as financial and operations data (internal data). When combined, external and internal data can provide a more complete picture which, in effect, creates an "intelligence" that cannot be derived by any singular set of data.
Purpose of Business Intelligence: Business intelligence can be applied to the following business purposes, in order to drive business value.
• Measurement: program that creates a hierarchy of performance metrics (see also Metrics Reference Model) and benchmarking that informs business leaders about progress towards business goals (business process management).
• Analytics: program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. Frequently involves data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, data lineage, complex event processing and prescriptive analytics.
• Reporting/enterprise reporting: program that builds infrastructure for strategic reporting to serve the strategic management of a business, not operational reporting. Frequently involves data visualization, executive information systems and OLAP.
• Collaboration/collaboration platform: program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.
• Knowledge management: program to make the company data-driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge management leads to learning management and regulatory compliance.
5) Discuss the comparison of business analytics and organization decision-making processes with a flow chart [10]
The business analyst should use one-on-one interviews early in the business analysis project to gauge the strengths and weaknesses of potential project participants and to obtain basic information about the business. Large meetings are not a good use of time for data gathering.
Facilitated work sessions are a good mechanism for validating and refining "draft" requirements. They are also useful to prioritize final business requirements. Group dynamics can often generate even better ideas. Primary or local data is collected by the business owner and can be collected by survey, focus group or observation. Third-party static data is purchased in bulk without a specific intent in mind. While easy to get (if you have the cash), this data is not specific to your business and can be tough to sort through, as you often get quite a bit more data than you need to meet your objective. Dynamic data is collected through a third-party process in near real-time from an event for a specific purpose (read into that: very expensive). There are three key questions you need to ask before making a decision about the best method for your firm:
• What is the timeline required to accomplish your business objective?
• What is your required return on investment?
• Is the data collection for a stand-alone event or part of a broader data collection effort?
In the process, organizations may also determine strategies to guide operations and help achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed as a problem of strategy choice requiring a solution.

Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. BI can be used to support a wide range of business decisions ranging from operational to strategic.

Basic operating decisions include product positioning or pricing. Strategic business decisions include priorities, goals and directions at the broadest level. In all cases, BI is most effective when it combines data derived from the market in which a company operates (external data) with data from company sources internal to the business such as financial and operations data (internal data). When combined, external and internal data can provide a more complete picture which, in effect, creates an "intelligence" that cannot be derived by any singular set of data.
OR 6) What is meant by Manufacturing? Explain Smart Utilities [10]
Manufacturing: Manufacturing is the production of goods for use or sale using labour and machines, tools, chemical and biological processing, or formulation. The term may refer to a range of human activity, from handicraft to high tech, but is most commonly applied to industrial production, in which raw materials are transformed into finished goods on a large scale. Such finished goods may be used for manufacturing other, more complex products, such as aircraft, household appliances or automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end users – the "consumers". Manufacturing takes place under all types of economic systems. In a free market economy, manufacturing is usually directed toward the mass production of products for sale to consumers at a profit. In a collectivist economy, manufacturing is more frequently directed by the state to supply a centrally planned economy. In mixed market economies, manufacturing occurs under some degree of government regulation. Modern manufacturing includes all intermediate processes required for the production and integration of a product's components. Some industries, such as semiconductor and steel manufacturers, use the term fabrication instead. The manufacturing sector is closely connected with engineering and industrial design. Examples of major manufacturers in North America include General Motors Corporation, General Electric, Procter & Gamble, General Dynamics, Boeing, Pfizer, and Precision Castparts. Examples in Europe include Volkswagen Group, Siemens, and Michelin. Examples in Asia include Sony, Huawei, Lenovo, Toyota, Samsung, and Bridgestone.
Smart Utilities: S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures. When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so stored data can be copied to another storage device, preventing data loss, and the failing drive can be replaced. Smart Utility Systems is a leading provider of Software-as-a-Service (SaaS) solutions for Customer Engagement, Mobile Workforce, and Big Data Analytics to the Energy and Utility sector. It helps utilities improve their operational efficiency and maximize revenue realization through mobile and cloud technologies.

7) Write a short note on understanding engineering design and manufacturing [10]
The engineering design process is a series of steps that engineers follow to come up with a solution to a problem. Many times the solution involves designing a product (like a machine or computer code) that meets certain criteria and/or accomplishes a certain task. The engineering design process is a methodical series of steps that engineers use in creating functional products and processes. The process is highly iterative - parts of the process often need to be repeated many times before the production phase can be entered - though the part(s) that get iterated and the number of such cycles in any given project can be highly variable. The engineering design process describes the following stages: 1) Research 2) Conceptualization 3) Feasibility assessment 4) Establishing design requirements 5) Preliminary design 6) Detailed design 7) Production planning and tool design, and 8) Production.
1. Research: Research is a careful and detailed study into a specific problem, concern, or issue using the scientific method. Research can be about anything, and we hear about all different types of research in the news. Cancer research has produced headlines such as 'Breakthrough Cancer-Killing Treatment Has No Side Effects in Mice' and 'Baby Born with HIV Cured.' Each of these began with an issue or a problem (such as cancer or HIV), and they had a question, like, 'Does medication X reduce cancerous tissue or HIV infections?' But so far this only says what research has done (sort of like saying baking leads to apple pie; it doesn't really tell you anything other than that the two are connected). To begin researching something, you have to have a problem, concern, or issue that has turned into a question. These can come from observing the world, prior research, professional literature, or from peers. Research really begins with the right question, because your question must be answerable. Questions like, 'How can I cure cancer?' aren't really answerable with a study; they are too vague and not testable.
2. Conceptualization: Conceptualization is the mental process of organizing one's observations and experiences into meaningful and coherent wholes. In research, conceptualization produces an agreed-upon meaning for a concept for the purposes of research. Different researchers may conceptualize a concept slightly differently. Conceptualization describes the indicators we'll use to measure the concept and the different aspects of the concept.
3. Feasibility assessment: Feasibility studies are almost always conducted where large sums are at stake. Also called feasibility analysis. For example, to ensure that a manufacturing facility can make a new item, engineers launch a feasibility study to determine the actual steps required to build the product.
4. Design requirements: The product/component to be analysed is characterised in terms of: functional requirements, the objective of the materials selection process, constraints imposed by the requirements of the application, plus the free variable, which is usually one of the geometric dimensions of the product/component, such as thickness, which enables the constraints to be satisfied and the objective function to be maximised or minimised, depending on the application.
Hence, the design requirement of the part/component is defined in terms of function, objective and constraints.
5. Preliminary design: The preliminary design, or high-level design, often bridges a gap between design conception and detailed design, particularly in cases where the level of conceptualization achieved during ideation is not sufficient for full evaluation. In this task, the overall system configuration is defined, and schematics, diagrams, and layouts of the project may provide early project configuration. This notably varies a lot by field, industry, and product. During detailed design and optimization, the parameters of the part being created will change, but the preliminary design focuses on creating the general framework to build the project on.
6. Detailed design: The detailed design phase, which may also include procurement of materials, further elaborates each aspect of the project/product by complete description through solid modelling, drawings, as well as specifications.
7. Production planning and tool design: Production planning and tool design consists of planning how to mass-produce the product and which tools should be used in the manufacturing process. Tasks to complete in this step include selecting materials, selection of the production processes, determination of the sequence of operations, and selection of tools such as jigs, fixtures, and metal-cutting and metal- or plastics-forming tools. This task also involves additional prototype testing iterations to ensure the mass-produced version meets qualification testing standards.
8. Production: Production is a process of workers combining various material inputs and immaterial inputs (plans, know-how) in order to make something for consumption (the output). It is the act of creating output, a good or service which has value and contributes to the utility of individuals.
Manufacturing: Manufacturing is the production of goods for use or sale using labour and machines, tools, chemical and biological processing, or formulation. The term may refer to a range of human activity, from handicraft to high tech, but is most commonly applied to industrial production, in which raw materials are transformed into finished goods on a large scale. Such finished goods may be used for manufacturing other, more complex products, such as aircraft, household appliances or automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end users – the "consumers". Manufacturing takes place under all types of economic systems. In a free market economy, manufacturing is usually directed toward the mass production of products for sale to consumers at a profit. In a collectivist economy, manufacturing is more frequently directed by the state to supply a centrally planned economy. In mixed market economies, manufacturing occurs under some degree of government regulation. Modern manufacturing includes all intermediate processes required for the production and integration of a product's components. Some industries, such as semiconductor and steel manufacturers, use the term fabrication instead. The manufacturing sector is closely connected with engineering and industrial design. Examples of major manufacturers in North America include General Motors Corporation, General Electric, Procter & Gamble, General Dynamics, Boeing, Pfizer, and Precision Castparts. Examples in Europe include Volkswagen Group, Siemens, and Michelin.
Examples in Asia include Sony, Huawei, Lenovo, Toyota, Samsung, and Bridgestone.
OR 8) Compare and contrast the manufacturing and production activities data analysis with respect to technology. [10]
Production: Production is a process of workers combining various material inputs and immaterial inputs (plans, know-how) in order to make something for consumption (the output). It is the act of creating output, a good or service which has value and contributes to the utility of individuals.
Manufacturing: Manufacturing is the production of goods for use or sale using labour and machines, tools, chemical and biological processing, or formulation. The term may refer to a range of human activity, from handicraft to high tech, but is most commonly applied to industrial production, in which raw materials are transformed into finished goods on a large scale. Such finished goods may be used for manufacturing other, more complex products, such as aircraft, household appliances or automobiles, or sold to wholesalers, who in turn sell them to retailers, who then sell them to end users – the "consumers". Manufacturing takes place under all types of economic systems. In a free market economy, manufacturing is usually directed toward the mass production of products for sale to consumers at a profit. In a collectivist economy, manufacturing is more frequently directed by the state to supply a centrally planned economy. In mixed market economies, manufacturing occurs under some degree of government regulation. Modern manufacturing includes all intermediate processes required for the production and integration of a product's components. Some industries, such as semiconductor and steel manufacturers, use the term fabrication instead. The manufacturing sector is closely connected with engineering and industrial design. Examples of major manufacturers in North America include General Motors Corporation, General Electric, Procter & Gamble, General Dynamics, Boeing, Pfizer, and Precision Castparts. Examples in Europe include Volkswagen Group, Siemens, and Michelin. Examples in Asia include Sony, Huawei, Lenovo, Toyota, Samsung, and Bridgestone.
Production lines: A production line is an arrangement in a factory in which a thing being manufactured is passed through a set linear sequence of mechanical or manual operations. A production line is a set of sequential operations established in a factory whereby materials are put through a refining process to produce an end-product that is suitable for onward consumption, or components are assembled to make a finished article.
Technology: The application of scientific knowledge for practical purposes, especially in industry. Technology can be the knowledge of techniques, processes, and the like, or it can be embedded in machines which can be operated without detailed knowledge of their workings. The human species' use of technology began with the conversion of natural resources into simple tools. The prehistoric discovery of how to control fire and the later Neolithic Revolution increased the available sources of food, and the invention of the wheel helped humans to travel in and control their environment. Developments in historic times, including the printing press, the telephone, and the Internet, have lessened physical barriers to communication and allowed humans to interact freely on a global scale. The steady progress of military technology has brought weapons of ever-increasing destructive power, from clubs to nuclear weapons. Technology has many effects.
It has helped develop more advanced economies (including today's global economy) and has allowed the rise of a leisure class. Many technological processes produce unwanted byproducts known as pollution and deplete natural resources to the detriment of Earth's environment. Various implementations of technology influence the values of a society and new technology often raises new ethical questions. Examples include the rise of the notion of efficiency in terms of human productivity, and the challenges of bioethics. Philosophical debates have arisen over the use of technology, with disagreements over whether technology improves the human condition or worsens it. Neo-Luddism, anarcho- primitivism, and similar reactionary movements criticise the pervasiveness of technology in the modern world, arguing that it harms the environment and alienates people; proponents of ideologies such as transhumanism and techno-progressivism view continued technological progress as beneficial to society and the human condition. Until recently, it was believed that the development of technology was restricted only to human beings, but 21st century scientific studies indicate that other primates and certain dolphin communities have developed simple tools and passed their knowledge to other generations

9) Explain SMART utilities in detail [10] SMART Utilities: S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures. When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so stored data can be copied to another storage device, preventing data loss, and the failing drive can be replaced

The BA process can solve problems and identify opportunities to improve business performance. In the process, organizations may also determine strategies to guide operations and help achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities can be viewed as a problem of strategy choice requiring a solution

OR 10) Why should we gather requirements before starting the implementation of the project? Explain [10] Requirements gathering: Gather all the data related to the business objective

There are many different approaches that can be used to gather information about a business. They include the following:
• Review business plans, existing models and other documentation
• Interview subject area experts
• Conduct fact-finding meetings
• Analyze application systems, forms, artifacts, reports, etc.

The business analyst should use one-on-one interviews early in the business analysis project to gauge the strengths and weaknesses of potential project participants and to obtain basic information about the business. Large meetings are not a good use of time for data gathering. Facilitated work sessions are a good mechanism for validating and refining "draft" requirements. They are also useful to prioritize final business requirements. Group dynamics can often generate even better ideas. Primary or local data is collected by the business owner and can be collected by survey, focus group or observation. Third-party static data is purchased in bulk without a specific intent in mind. While easy to get (if you have the cash), this data is not specific to your business and can be tough to sort through, as you often get quite a bit more data than you need to meet your objective. Dynamic data is collected through a third-party process in near real-time from an event for a specific purpose (read into that: very expensive). There are three key questions you need to ask before making a decision about the best method for your firm:
• What is the timeline required to accomplish your business objective?
• What is your required return on investment?
• Is the data collection for a stand-alone event or part of a broader data collection effort?
How to interpret data to make it useful for business: Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics. BI can be used to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions include priorities, goals and directions at the broadest level. In all cases, BI is most effective when it combines data derived from the market in which a company operates (external data) with data from company sources internal to the business such as financial and operations data (internal data). When combined, external and internal data can provide a more complete picture which, in effect, creates an "intelligence" that cannot be derived by any singular set of data.

Business intelligence can be applied to the following business purposes, in order to drive business value.  Measurement – program that creates a hierarchy of performance metrics (see also Metrics Reference Model) and benchmarking that informs business leaders about progress towards business goals (business process management).  Analytics – program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. Frequently involves: data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, data lineage, complex event processing and prescriptive analytics.  Reporting/enterprise reporting – program that builds infrastructure for strategic reporting to serve the strategic management of a business, not operational reporting. Frequently involves data visualization, executive information system and OLAP.  Collaboration/collaboration platform – program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.  Knowledge management – program to make the company data-driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge management leads to learning management and regulatory compliance

------END OF IA------