<p>Job Oriented Engineering Course in Business Intelligence and Data Analytics</p>
<table>
<tr><th>Sr. No.</th><th>Subject</th></tr>
<tr><td>1</td><td>Statistical Analysis Techniques</td></tr>
<tr><td>2</td><td>Big Data Analytics</td></tr>
<tr><td>3</td><td>Machine Learning</td></tr>
<tr><td>4</td><td>Computer Programming</td></tr>
<tr><td>5</td><td>Professional Development Courses</td></tr>
<tr><td>6</td><td>Project</td></tr>
</table>
<p>STATISTICAL ANALYSIS TECHNIQUES</p>
<p>UNIT I - DATA PREPROCESSING: Reading and getting data into R: ordered and unordered factors, arrays and matrices, lists and data frames, reading data from files. Data preprocessing: handling incomplete or incorrect data, handling missing values, subsetting, sorting, transforming scale, determining percentiles, removing noise, removing inconsistencies, transformations, standardizing, min-max normalization, z-score standardization</p>
<p>UNIT II - DESCRIPTIVE STATISTICS: Populations and samples, Sampling techniques; Data classification, Tabulation, Frequency and Graphical representation; Measures of central value: Arithmetic mean, Geometric mean, Harmonic mean, Mode, Median, Quartiles, Deciles, Percentiles; Measures of variation: Range, IQR, Quartile deviation, Mean deviation, Standard deviation; Measures of association: coefficient of variation, ANOVA, correlation, outliers; Measures of shape: Skewness, Moments and Kurtosis</p>
<p>UNIT III - INFERENTIAL STATISTICS AND HYPOTHESIS TESTING: Random variable, probability distributions, joint probability function, Sampling distribution of the mean, Central Limit Theorem, Standard Error; Estimation: Point and Interval Estimates, Confidence Intervals, level of confidence, sample size; Hypothesis Testing: Level of significance, p-value, z-test, t-test, chi-square test, one-tailed and two-tailed tests, uses of the t-distribution, F-distribution, χ² distribution; Conditional probability, expectation, independence, Bayes' rule</p>
<p>UNIT IV - PREDICTIVE ANALYTICS: Predictive modeling and analysis: Regression Analysis, Multicollinearity, Correlation analysis, Rank correlation coefficient, Multiple correlation, Least squares, Curve fitting and goodness of fit, Residual 
analysis, Logistic regression</p>
<p>UNIT V - EXPLORATORY DATA ANALYSIS AND VISUALIZATION: Boxplot, scatter plot, histogram, model visualization, clustering and classification. Bringing data to life with visuals using R, Excel and tools like Tableau. Introduction to graphical analysis: plot() function, displaying multivariate data, matrix plots, multiple plots in one window, exporting graphs, using graphics parameters</p>
<p>UNIT VI - TIME SERIES FORECASTING: Forecasting models for time series; time series data, components of time series; time series forecasting methods: Simple Moving Average, simple exponential smoothing, double exponential smoothing (Holt's method), triple exponential smoothing (Holt-Winters method)</p>
<p>BIG DATA ANALYTICS</p>
<p>UNIT I - INTRODUCTION AND HADOOP ARCHITECTURE: Big Data and its importance, Apache Hadoop and the Hadoop ecosystem, Moving data in and out of Hadoop, Hadoop Architecture, Hadoop daemons, Schedulers, Hadoop 2.0 New Features, YARN Cluster Setup, SSH and Hadoop Configuration</p>
<p>UNIT II - HDFS AND MAPREDUCE: Introduction to distributed file systems, Common Hadoop shell commands, Hadoop storage: HDFS, blocks, replication, HDFS commands; Hadoop MapReduce paradigm, Map and Reduce tasks, inputs and outputs of MapReduce, Data Serialization, Map-side and Reduce-side joins, writing MapReduce jobs in Java, running MapReduce jobs in local / pseudo-distributed / cluster mode, Data Locality, Shuffling and sorting</p>
<p>UNIT III - PIG: Pig fundamentals, MapReduce vs. Pig, data types, programming constructs, execution modes, Grunt Shell, Scripts, Built-in Functions, Relational Join Operators, Core Relational Operators, How to write UDFs in Pig</p>
<p>UNIT IV - HIVE AND HIVEQL: Hive Architecture and Installation, Hive vs. RDBMS, Built-in Hive Functions, HiveQL: Querying Data, Sorting and Aggregating, Joins and Subqueries, How to write UDFs in Hive</p>
<p>UNIT V - HBASE: HBase concepts, Schema Design, HBase Shell, HBase Java API for CRUD Operations, HBase vs. 
RDBMS; Introduction to ZooKeeper, Oozie, Flume, Sqoop</p>
<p>UNIT VI - SPARK: Spark Introduction, Framework, Installation, Spark with MapReduce, Spark SQL with DataFrames, Spark ML</p>
<p>MACHINE LEARNING</p>
<p>UNIT I - DATA PREPROCESSING: Text preprocessing, stop word removal, stemming, Dimensionality Reduction, Feature Selection algorithms, TF-IDF computation</p>
<p>UNIT II - CLASSIFICATION: Supervised learning, Bayesian Classification, k-Nearest Neighbors (k-NN), Decision trees, Support Vector Machines, Neural Networks, Multi-label classification; Overfitting/Underfitting, bagging/boosting and ensemble methods, Classifier performance measures, confusion matrix, Cross-validation</p>
<p>UNIT III - CLUSTERING AND OUTLIER ANALYSIS: Unsupervised learning, K-means algorithm, other techniques, Interpretation of clusters and validation; Introduction to outlier mining, Applications, Detection Techniques</p>
<p>UNIT IV - ASSOCIATION MINING: Association rule mining, Apriori algorithm, Market Basket Analysis, Associative Classification</p>
<p>UNIT V - TEXT ANALYTICS: Introduction, text mining operations, Categorization, Clustering, Information extraction, Text mining applications</p>
<p>UNIT VI - BUSINESS INTELLIGENCE: What is a data warehouse, need for a data warehouse, architecture, Data Integration, data marts, OLTP vs. OLAP, Multidimensional Modeling: Star and snowflake schemas, Data cubes, Enterprise Reporting, OLAP operations, Data Cube Computation and Data Generalization, Data lake, Recent trends</p>
<p>[Note: analysis using tools like R / SCILAB / WEKA / MEKA]</p>