# Job Oriented Engineering Course in Business Intelligence and Data Analytics

1 / Statistical Analysis Techniques

2 / Big Data Analytics

3 / Machine Learning

4 / Computer Programming

5 / Professional Development Courses

6 / Project

**STATISTICAL ANALYSIS TECHNIQUES**

UNIT I - DATA PREPROCESSING

Reading and getting data into R – ordered and unordered factors – arrays and matrices – lists and data frames – reading data from files

Data Preprocessing: handling incomplete or incorrect data, handling missing values, subsetting, sorting, transforming scale, determining percentiles, removing noise, removing inconsistencies, transformations, standardizing, min-max normalization, z-score standardization
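The two rescaling techniques listed last in this unit can be illustrated with a short sketch. The unit itself works in R; Python is used here only for illustration. The function names and the sample `marks` data are hypothetical, and the z-score uses the population standard deviation, which is one of two common conventions.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Centre values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample data: mean 5, population standard deviation 2.
marks = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = min_max_normalize(marks)          # first value 0.0, last value 1.0
standardized = z_score_standardize(marks)  # first value -1.5, last value 2.0
```

Min-max normalization keeps the shape of the distribution but is sensitive to extreme values, while z-score standardization expresses each value as a distance from the mean in standard-deviation units.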

UNIT II - DESCRIPTIVE STATISTICS

Populations and samples, Sampling Techniques - Data classification, Tabulation, Frequency and Graphic representation; Measures of central value: Arithmetic mean, Geometric mean, Harmonic mean, Mode, Median, Quartiles, Deciles, Percentiles; Measures of variation: Range, IQR, Quartile deviation, Mean deviation, Standard deviation; Measures of association: coefficient of variation, ANOVA, correlation, outliers; Measures of shape: Skewness, Moments and Kurtosis

UNIT III - INFERENTIAL STATISTICS AND HYPOTHESIS TESTING

Random variable, probability distributions, joint probability function, Sampling distribution of mean, Central Limit Theorem, Standard Error

Estimation - Point and Interval Estimates, Confidence Intervals, level of confidence, sample size

Hypothesis Testing - Level of significance, p-value, z-test, t-test, chi-square test, one- and two-tailed tests, uses of t-distribution, F-distribution, χ² distribution

Conditional probability, expectation, independence, Bayes' rule

UNIT IV - PREDICTIVE ANALYTICS

Predictive modeling and Analysis - Regression Analysis, Multicollinearity, Correlation analysis, Rank correlation coefficient, Multiple correlation, Least squares, Curve fitting and goodness of fit, Residual analysis, Logistic regression

UNIT V - EXPLORATORY DATA ANALYSIS AND VISUALIZATION

Boxplot, scatter plot, histogram, model visualization, clustering and classification

Data visualization using R, Excel and tools like Tableau; Introduction to graphical analysis – plot() function – displaying multivariate data – matrix plots – multiple plots in one window – exporting graphs – using graphics parameters

UNIT VI - TIME SERIES FORECASTING

Forecasting Models for Time Series: Time series data, components of time series, time series forecasting methods - Simple moving average, Simple exponential smoothing, Double exponential smoothing (Holt's method), Triple exponential smoothing (Holt-Winters method)
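As an illustration of the two simplest methods in this unit, the sketch below computes a simple moving average and a simple exponential smoothing of a short series. It is a minimal Python sketch, not course material: the function names and the `demand` series are hypothetical, and the smoothing is initialised with the first observation, which is one common choice among several.

```python
def simple_moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def simple_exponential_smoothing(series, alpha):
    """Smoothed level s_t = alpha * x_t + (1 - alpha) * s_(t-1), with s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series with a steady upward trend.
demand = [10.0, 20.0, 30.0, 40.0]
sma = simple_moving_average(demand, window=2)          # [15.0, 25.0, 35.0]
ses = simple_exponential_smoothing(demand, alpha=0.5)  # [10.0, 15.0, 22.5, 31.25]
```

Both outputs lag behind the trending series, which is the limitation that the double (Holt) and triple (Holt-Winters) variants address by adding trend and seasonal components.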

**BIG DATA ANALYTICS**

UNIT I - INTRODUCTION TO HADOOP ARCHITECTURE

Big Data and its importance, Apache Hadoop and the Hadoop ecosystem, Moving Data in and out of Hadoop, Hadoop Architecture, Hadoop daemons, Schedulers, Hadoop 2.0 New Features, YARN

Cluster Setup, SSH and Hadoop Configuration

UNIT II - HDFS AND MAPREDUCE

Introduction to distributed file system, Common Hadoop Shell commands, Hadoop Storage: HDFS, blocks, replication, HDFS commands

Hadoop MapReduce paradigm, Map and Reduce tasks, inputs and outputs of MapReduce - Data Serialization, Map-side and Reduce-side Joins, writing MapReduce jobs in Java, Running MapReduce jobs in local / pseudo-distributed / cluster mode, Data Locality, Shuffling and sorting

UNIT III - PIG

Pig fundamentals, MapReduce vs. Pig, data types, programming constructs, execution modes, Grunt Shell, Script, Built-in Functions, Relational Join Operators, Core Relational Operators, How to write UDFs in Pig

UNIT IV - HIVE AND HIVEQL

Hive Architecture and Installation, Hive vs RDBMS, Built-in Hive Functions, HiveQL - Querying Data - Sorting and Aggregating, Joins and Subqueries, How to write UDFs in Hive

UNIT V - HBASE

HBase concepts, Schema Design, HBase Shell, HBase Java API for CRUD Operations, HBase vs. RDBMS

Introduction to ZooKeeper, Oozie, Flume, Sqoop

UNIT VI - SPARK

Spark Introduction, Framework, Installation, Spark with MapReduce, Spark SQL with DataFrames, Spark ML

**MACHINE LEARNING**

UNIT I - DATA PREPROCESSING

Text preprocessing, stop word removal, stemming, Dimensionality Reduction, Feature Selection algorithms, TF-IDF computation
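The TF-IDF computation named above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: term frequency is the relative count within a document, inverse document frequency is the natural log of (number of documents / documents containing the term), and tokenization is a plain whitespace split. Other weighting variants exist, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical three-document corpus.
corpus = ["big data analytics", "big data tools", "machine learning"]
scores = tf_idf(corpus)
```

In this corpus "analytics" scores higher than "big" in the first document because it occurs in only one document, and a term that appears in every document receives a weight of zero.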

UNIT II - CLASSIFICATION

Supervised learning, Bayesian Classification, k-Nearest Neighbors (k-NN), Decision tree, Support Vector Machines, Neural Networks, Multi-label classification

Overfitting/Underfitting, bagging/boosting and ensemble methods, Classifier performance measures, confusion matrix, Cross validation
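To make the classifier performance measures concrete, the sketch below tallies a binary confusion matrix and derives precision and recall from it. It is a minimal Python illustration with hypothetical function names and made-up labels, not output from any real classifier.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Tally true/false positives and negatives for one positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six messages.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

counts = confusion_counts(actual, predicted, positive="spam")
precision, recall = precision_recall(counts)
```

Here the tally is 2 true positives, 1 false positive, 1 false negative and 2 true negatives, so precision and recall both come to 2/3.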

UNIT III - CLUSTERING AND OUTLIER ANALYSIS

Unsupervised learning, K-means algorithm, other techniques, Interpretation of clusters and validation

Introduction to outlier mining, Applications, Detection Techniques

UNIT IV - ASSOCIATION MINING

Association rule mining, Apriori algorithm, Market Basket Analysis, Associative Classification

UNIT V - TEXT ANALYTICS

Introduction, text mining operations, Categorization, Clustering, Information extraction, Text mining applications

UNIT VI - BUSINESS INTELLIGENCE

What is a data warehouse, need for a data warehouse, architecture, Data Integration, data marts, OLTP vs OLAP, Multidimensional Modeling: Star and snowflake schemas, Data cubes, Enterprise Reporting

OLAP operations, Data Cube Computation and Data Generalization, Data lake

Recent trends

[Note: analysis using tools like R / SCILAB / WEKA / MEKA]