Probabilistic Semantic Web Mining Using Artificial Neural Analysis 1.1 Data, Information, and Knowledge

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010 Probabilistic Semantic Web Mining Using Artificial Neural Analysis 1.1 Data, Information, and Knowledge 1. T. KRISHNA KISHORE, 2. T.SASI VARDHAN, AND 3. N. LAKSHMI NARAYANA Abstract Most of the web user’s requirements are 1.1.1 Data search or navigation time and getting correctly matched result. These constrains can be satisfied Data are any facts, numbers, or text that with some additional modules attached to the can be processed by a computer. Today, existing search engines and web servers. This paper organizations are accumulating vast and growing proposes that powerful architecture for search amounts of data in different formats and different engines with the title of Probabilistic Semantic databases. This includes: Web Mining named from the methods used. Operational or transactional data such as, With the increase of larger and larger sales, cost, inventory, payroll, and accounting collection of various data resources on the World No operational data, such as industry Wide Web (WWW), Web Mining has become one sales, forecast data, and macro economic data of the most important requirements for the web Meta data - data about the data itself, such users. Web servers will store various formats of as logical database design or data dictionary data including text, image, audio, video etc., but definitions servers can not identify the contents of the data. 1.1.2 Information These search techniques can be improved by The patterns, associations, or relationships adding some special techniques including semantic among all this data can provide information. For web mining and probabilistic analysis to get more example, analysis of retail point of sale transaction accurate results. data can yield information on which products are Semantic web mining technique can provide selling and when. meaningful search of data resources by eliminating 1.1.3 Knowledge useless information with mining process. In this technique web servers will maintain Meta Information can be converted into information of each and every data resources knowledge about historical patterns and future available in that particular web server. This will trends. For example, summary information on retail help the search engine to retrieve information that supermarket sales can be analyzed in light of is relevant to user given input string. promotional efforts to provide knowledge of This paper proposing the idea of combing consumer buying behavior. Thus, a manufacturer these two techniques Semantic web mining and or retailer could determine which items are most Probabilistic analysis for efficient and accurate susceptible to promotional efforts. search results of web mining. SPF can be 1.1.4 Data Warehouses calculated by considering both semantic accuracy Dramatic advances in data capture, and syntactic accuracy of data with the input string. processing power, data transmission, and storage This will be the deciding factor for producing capabilities are enabling organizations to integrate results. their various databases into data warehouses. Data 1. Concept of Data Mining warehousing is defined as a process of centralized 294 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010 data management and retrieval. Data warehousing, analytical software are available: statistical, like data mining, is a relatively new term although machine learning, and neural networks. Generally, the concept itself has been around for years. Data any of four types of relationships are sought: warehousing represents an ideal vision of Classes: Stored data is used to locate data in maintaining a central repository of all predetermined groups. For example, a restaurant organizational data. chain could mine customer purchase data to 1.2 What can data mining do? determine when customers visit and what they Data mining is primarily used today by typically order. This information could be used companies with a strong consumer focus - retail, to increase traffic by having daily specials. financial, communication, and marketing Clusters: Data items are grouped according to organizations. It enables these companies to logical relationships or consumer preferences. determine relationships among "internal" factors For example, data can be mined to identify such as price, product positioning, or staff skills, market segments or consumer affinities. and "external" factors such as economic indicators, Associations: Data can be mined to identify competition, and customer demographics. And, it associations. The beer-diaper example is an enables them to determine the impact on sales, example of associative mining. customer satisfaction, and corporate profits. Sequential patterns: Data is mined to anticipate Finally, it enables them to "drill down" into behavior patterns and trends. For example, an summary information to view detail transactional outdoor equipment retailer could predict the data. likelihood of a backpack being purchased based With data mining, a retailer could use on a consumer's purchase of sleeping bags and point-of-sale records of customer purchases to send hiking shoes. targeted promotions based on an individual's Data mining process includes various internal purchase history. By mining demographic data operations as shown in Fig 1 from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments. For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual Fig 1: Data Mining Process customers. American Express can suggest products 1.3.1 Data mining consists of five major to its cardholders based on analysis of their monthly expenditures. elements: 1.3 How does data mining work? Extract, transform, and load transaction While large-scale information technology data onto the data warehouse system. has been evolving separate transaction and Store and manage the data in a analytical systems, data mining provides the link multidimensional database system. between the two. Data mining software analyzes Provide data access to business analysts relationships and patterns in stored transaction data and information technology professionals. based on open-ended user queries. Several types of Analyze the data by application software. 295 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010 Present the data in a useful format, such as Data/information extraction: Our focus a graph or table. will be on extraction of structured data from Web 1.3.2 Different levels of analysis are pages, such as products and search results. available: Extracting such data allows one to provide services. Two main types of techniques, machine Artificial neural networks: Non-linear learning and automatic extraction are covered. predictive models that learn through training and Web information integration and resemble biological neural networks in structure. schema matching: Although the Web contains a Genetic algorithms: Optimization techniques huge amount of data, each web site (or even page) that use process such as genetic combination, represents similar information differently. How to mutation, and natural selection in a design based identify or match semantically similar data is a on the concepts of natural evolution. very important problem with many practical Decision trees: Tree-shaped structures that applications. Some existing techniques and represent sets of decisions. These decisions problems are examined. generate rules for the classification of a dataset. Opinion extraction from online sources: Specific decision tree methods include There are many online opinion sources, e.g., Classification and Regression Trees (CART) and customer reviews of products, forums, blogs and Chi Square Automatic Interaction Detection chat rooms. Mining opinions (especially consumer (CHAID). CART and CHAID are decision tree opinions) is of great importance for marketing techniques used for classification of a dataset. intelligence and product benchmarking. We will They provide a set of rules that you can apply to introduce a few tasks and techniques to mine such a new (unclassified) dataset to predict which sources. records will have a given outcome. CART Knowledge synthesis: Concept segments a dataset by creating 2-way splits while hierarchies or ontology are useful in many CHAID segments using chi square tests to create applications. However, generating them manually multi-way splits. CART typically requires less is very time consuming. A few existing methods data preparation than CHAID. that explores the information redundancy of the Nearest neighbor method: A technique that Web will be presented. The main application is to classifies each record in a dataset based on a synthesize and organize the pieces of information combination of the classes of the k record(s) on the Web to give the user a coherent picture of most similar to it in a historical dataset (where k the topic domain.. 1). Sometimes called the k-nearest neighbor Segmenting Web pages and detecting technique. noise: In many Web applications, one only wants Rule induction: The extraction of useful if-then the main content of the Web page without rules from data based on statistical significance. advertisements, navigation links, copyright

Probabilistic Semantic Web Mining Using Artificial Neural Analysis 1.1 Data, Information, and Knowledge

Different Aspects of Web Log Mining

Towards Semantic Web Mining

Analysis of Web Logs and Web User in Web Mining

Data Mining Techniques for Social Media Analysis

Review of Various Web Page Ranking Algorithms in Web Structure Mining

Text and Web Mining a Big Challenge for Data Mining

Web Mining.Pdf

Journey from Data Mining to Web Mining to Big Data

A Survey on Web Multimedia Mining

Data, Text, and Web Mining for Business Intelligence

An XML-Based Approach for Warehousing and Analyzing

Web Text Mining for News by Classification