Oracle Data Mining Topics

1 DEVELOPER

2 INTRODUCTION

3 HISTORY

4 FUNCTIONS

Oracle Data Mining

 Developer(s) : Oracle Corporation

 Stable release : 11gR2 / September, 2009

 Type : data mining and analytics

 License : proprietary

INTRODUCTION

• Oracle Data Mining (ODM) is an option of the Enterprise Edition (EE) of Oracle Corporation's relational database management system (RDBMS).

• It contains several data mining and data analysis algorithms for classification, prediction, regression, associations, feature selection, anomaly detection, feature extraction, and specialized analytics.

• It provides the means to create, manage, and operationally deploy data mining models inside the database environment.

• ODM exposes its data mining functionality as native SQL functions within Oracle Database.

• Oracle Data Mining enables users to discover insights hidden in their data while building on existing investments in Oracle Database technology.

• With Oracle Data Mining, you can build and apply predictive models that help you target your best customers, develop detailed customer profiles, and find and prevent fraud.

• Oracle Data Mining, a component of the Oracle Advanced Analytics Option, helps companies better "compete on analytics."

• The workflow-based Oracle Data Miner GUI, an extension to SQL Developer, lets data analysts explore their data, build and evaluate models, apply them to new data, and save and share their analytical methodologies.

• Data analysts and application developers can use the SQL APIs to build applications that automatically mine star-schema data to build and deploy predictive models, delivering real-time results and predictions throughout the enterprise.

• Because the data, models, and results remain in the Oracle Database, data movement is eliminated, information latency is minimized, and security is maintained.

• Additionally, Oracle Data Mining models can be included in SQL queries and embedded in applications to offer improved business intelligence.

• Data analysts can quickly access their Oracle data using the Oracle Data Miner 11g Release 2 graphical user interface and explore it to find patterns, relationships, and hidden insights.

• Oracle Data Mining provides a collection of in-database data mining algorithms that solve a wide range of business problems.

• Anyone who can access data stored in an Oracle Database can access Oracle Data Mining results (predictions, recommendations, and discoveries) using Oracle Business Intelligence solutions.

HISTORY

Oracle Data Mining was first introduced in 2002, and its releases are named according to the corresponding Oracle Database release:

– Oracle Data Mining 9iR2 (9.2.0.1.0 - May 2002)

– Oracle Data Mining 10gR1 (10.1.0.2.0 - February 2004)

– Oracle Data Mining 10gR2 (10.2.0.1.0 - July 2005)

– Oracle Data Mining 11gR1 (11.1 - September 2007)

– Oracle Data Mining 11gR2 (11.2 - September 2009)

FUNCTIONS

As of release 11gR1, Oracle Data Mining contains the following data mining functions:

 Data transformation and model analysis:

• Data sampling, binning, discretization, and other data transformations.

• Model exploration, evaluation and analysis.

 Feature selection (Attribute Importance):

• Minimum description length (MDL).

 Classification:

• Naive Bayes (NB).

• Generalized linear model (GLM) for logistic regression.

• Support Vector Machine (SVM).

• Decision Trees (DT).

 Regression:

• Support Vector Machine (SVM).

• Generalized linear model (GLM) for multiple regression.

 Anomaly detection:

• One-class Support Vector Machine (SVM).

 Feature extraction:

• Non-negative matrix factorization (NMF).

 Text and spatial mining:

• Combined text and non-text columns of input data.

• Spatial/GIS data.

 Clustering:

• Enhanced k-means (EKM).

• Orthogonal Partitioning Clustering (O-Cluster).[2][3]

 Association rule learning:

• Item sets and association rules (AM).
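As a rough illustration of what the association function computes, the sketch below enumerates frequent item sets and derives rules scored by support and confidence. It is a brute-force, pure-Python toy for small data, assuming transactions are given as sets of item names; it is not Oracle's in-database implementation, and the function names and sample basket data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every item set whose support
    (fraction of transactions containing it) is at least min_support.
    Brute-force enumeration: fine for toy data, exponential in general."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                result[candidate] = count / n
                found_any = True
        if not found_any:
            # Every subset of a frequent set is frequent (the Apriori
            # property), so no larger item set can qualify either.
            break
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules A -> B from each frequent item set, where
    confidence(A -> B) = support(A and B together) / support(A)."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # a is in itemsets because it is a subset of a frequent set.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.6)
for a, b, s, c in rules:
    print(f"{sorted(a)} -> {sorted(b)}  support={s:.2f}  confidence={c:.2f}")
```

Here the rule butter -> bread has confidence 1.0 because every basket containing butter also contains bread, while {butter, milk} never reaches the 0.5 support threshold and so yields no rule.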

• Data mining (the advanced analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

• Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.

• In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

• A prediction (Latin præ-, "before," and dicere, "to say") or forecast is a statement about the way things will happen in the future, often but not always based on experience or knowledge. While there is much overlap between prediction and forecast, a prediction may be a statement that some outcome is expected, while a forecast is more specific, and may cover a range of possible outcomes.

• In statistics, regression analysis is a statistical technique for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.

• Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness.

• In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features.

• Anomaly detection, also referred to as outlier detection, refers to detecting patterns in a given data set that do not conform to an established normal behavior. The patterns thus detected are called anomalies and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, changes, deviations, surprises, aberrations, peculiarities, intrusions, etc.

• In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.
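To make the regression definition above concrete, here is a minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. This is the textbook closed-form estimator, shown for illustration only; it is not ODM's GLM implementation, and the function names and sample data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.
    slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x),
    where S_xx and S_xy are sums of centered squares and cross-products."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Apply the fitted model to a new observation."""
    return intercept + slope * x


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope, predict(intercept, slope, 5.0))
```

The dependent variable is `ys` and the single independent variable is `xs`; multiple regression generalizes the same least-squares idea to several independent variables.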

INPUT SOURCES AND DATA PREPARATION

• Most Oracle Data Mining functions accept as input one relational table or view.

• Flat data can be combined with transactional data through the use of nested columns, enabling mining of data involving one-to-many relationships (e.g. a star schema).

• The full functionality of SQL, including its handling of dates and spatial data, can be used when preparing data for data mining.

• Oracle Data Mining distinguishes numerical, categorical, and unstructured (text) attributes.

• The product also provides utilities for data preparation steps prior to model building, such as outlier treatment, discretization, normalization, and binning (grouping values into intervals).
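Three of the preparation steps listed above (outlier treatment, normalization, and binning) can be sketched in a few lines of plain Python. This assumes a single numeric column held in a list; the clipping rule (mean plus or minus k population standard deviations), the function names, and the sample values are illustrative choices, not ODM's own transformation routines or defaults.

```python
from statistics import mean, pstdev


def clip_outliers(values, k=3.0):
    """Outlier treatment by clipping (winsorizing): values beyond
    mean +/- k population standard deviations are pulled back to that bound."""
    m, s = mean(values), pstdev(values)
    low, high = m - k * s, m + k * s
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Binning: map each value to a bin number 1..n_bins using
    n_bins intervals of equal width spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin rather than bin n_bins + 1.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


incomes = [10, 11, 12, 13, 100]          # hypothetical, with one extreme value
clipped = clip_outliers(incomes, k=1.5)  # 100 is pulled down to roughly 82.3
scaled = min_max_normalize([10, 20, 30])
ages = [18, 22, 35, 41, 58, 70]
age_bins = equal_width_bins(ages, n_bins=4)
print(clipped, scaled, age_bins)
```

With k=3.0 the single extreme income would not be clipped at all, because it inflates the very standard deviation it is measured against; quantile-based trimming avoids that effect.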

GRAPHICAL USER INTERFACE: ORACLE DATA MINER

• Oracle Data Mining can be accessed through Oracle Data Miner, a GUI client that provides access to the data mining functions and to structured templates called Mining Activities, which automatically prescribe the order of operations, perform the required data transformations, and set model parameters.

• The user interface can also automatically generate the Java and/or SQL code associated with the data mining activities.

• The Java Code Generator is an extension to Oracle JDeveloper.

• There is also an independent interface, the Spreadsheet Add-In for Predictive Analytics, which provides access to the Oracle Data Mining Predictive Analytics PL/SQL package from Microsoft Excel.
