
Participatory Educational Research (PER), Special Issue 2016-III, pp. 140-151, November 2016. Available online at http://www.partedres.com. ISSN: 2148-6123

Using the Techniques of Data Mining and Text Mining in Educational Research

Mustafa ERGÜN*
Afyon Kocatepe University, Faculty of Educational Sciences, Department of Educational Sciences

Abstract

Today there are many specialized ways of storing and retrieving data. Photographs, films, articles and similar materials can all be kept in electronic settings. Because such storage offers enormous opportunity, the phenomenon is often called an information explosion. Search engines retrieve relevant data based on keywords entered by Internet users, and companies use the data they gather to make sales offers. There are computer programs that deliver relevant texts to users over the Internet; these are also used by websites. Users' emails and the messages they share on social media such as Facebook and Twitter can be analysed for security and counter-terrorism purposes. Scientific research has many dedicated search engines, including GoPubMed in the field of biomedicine. In short, computer-assisted text analysis, which provides search capabilities through various technologies and software, has become widespread.

The data held in libraries and in digital settings are mostly unstructured texts, audio files and visual files. These data must be transformed into information for researchers, a matter of real significance for new research techniques. New computer software is therefore needed to analyse such data correctly and systematically. In educational research the use of text-, audio- and image-based data has become frequent in recent years, and new techniques for handling such data have appeared in parallel. Audio and textual data, however, call for different analyses than visual data (films, photographs, etc.). The use of text mining has been expanding.
Text mining has been used, for instance, in customer relationship management. Public institutions also use it to detect fraudulent data. It can analyse medical reports, economic reports and unpublished research, and terrorism, aggravated theft and other criminal offences can be traced with text mining techniques. Uncovering patterns in text mining is similar to factor analysis: patterns in texts are uncovered by computer. As in qualitative research techniques that examine frequency, correlation and significance, the frequencies of the words used in a text and their connectedness (correlations and connections) are identified. A further goal is to represent these data visually. Text analysis can reveal the feelings and views of individuals, and the same can be done with audio and photographic data sets. In all scientific research the basic requirements are sound measurement and clean, error-free, structured data sets; the reliability of a study's results rests on these qualities.

It is thought that text mining will become easier in the future, because Web 3.0 technologies, also called the semantic web, would analyse texts and objects on the Internet together with their meaning. In other words, texts would be put onto the Internet after text mining had already been applied: metadata vocabulary groups (ontologies) and groups of words and concepts (maps) would be ready within the texts. The basic aim of Web 3.0, or the semantic web, is to turn data into usable, connected information; the technology would assume the role of an assistant that knows everything. At present, search engines merely list the websites containing the given words. With Web 3.0, or the semantic web, the engines will interpret websites and offer only those that are truly relevant.
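The semantic-web idea described above, namely content annotated with machine-readable meaning, rests on statements expressed as subject-predicate-object triples. The following is a minimal sketch in Python using plain tuples rather than any RDF library; the resources and predicates are invented for illustration and do not come from the article.

```python
# Statements stored as (subject, predicate, object) triples, so that
# software can follow meaning instead of merely matching keywords.
# All names below are invented examples.

triples = [
    ("ErgunM2016", "hasTopic", "text mining"),
    ("ErgunM2016", "publishedIn", "Participatory Educational Research"),
    ("text mining", "isSubfieldOf", "data mining"),
]

def query(store, subject=None, predicate=None, obj=None):
    """Return every triple matching the fields that are not None."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# What is the resource ErgunM2016 about?
print(query(triples, subject="ErgunM2016", predicate="hasTopic"))
# Which resources are declared subfields of data mining?
print(query(triples, predicate="isSubfieldOf", obj="data mining"))
```

A semantic-web search engine works in this spirit: rather than scanning pages for the string "data mining", it can follow the `isSubfieldOf` relation and return resources about text mining as well.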
The Web Ontology Language will serve as a standard that identifies common vocabulary and terminology, and the Resource Description Framework (RDF) will be placed in files containing the vocabulary hierarchy and the relations among objects.

Key words: educational research, data mining, text mining, coding, concept maps, reliability

Introduction

Today it is possible to store and retrieve official and personal data; everyone can keep photographs, films and notes in electronic settings, and the amount of such data grows every day (some call this the "information boom"). Many search engines can scan electronic data by key terms or by content, and some sales companies use such data to make their "best" offers. Some computer programs supply their clients with textual data collected from Internet sources; such programs are mostly used by sales companies. Today emails, posts on Facebook and Twitter, and blogs are monitored for security purposes. In science, information-based search engines such as GoPubMed are employed. In short, computer-assisted text analysis is used in many different fields, and a variety of techniques make it possible to analyse such data.

Most of the data stored in libraries and digital settings are unstructured text, audio and image files. Turning such data into information for researchers is very significant in terms of research methods, and techniques and software are needed to analyse these data correctly and systematically. Educational research increasingly relies on textual and audio-visual data, and the necessary research techniques have been developed in parallel. Audio and textual data can be analysed simultaneously, whereas films and photographs must be considered separately.

Content analysis

Content analysis is one of the traditional research methods used in the social sciences and media studies. It may use both qualitative and quantitative data.
It has many subcategories, including sentence analysis, speech or discourse analysis, thematic analysis and vocabulary analysis. These techniques consider the frequency of words, word associations, conceptual associations, differences in how individuals use concepts, and the grouping of words. Coding is followed by the process called thematic analysis.

There are three major types of content analysis. The first is conventional content analysis, in which the coding categories are extracted from the text itself. The second, directed content analysis, derives its codes from theory or hypotheses and then analyses them. The third is summative content analysis, which searches for, compares and interprets key words (Hsieh & Shannon, 2005).

Both descriptive and interpretive research designs can be employed. Descriptive analysis is mostly used when the data are to be treated strictly as given; if the data are to be interpreted more freely, interpretive designs are employed. Content analysis is far more common in descriptive designs, and thematic analysis in interpretive ones: content analysis identifies the codes, while thematic analysis produces concepts and themes from those codes and then interprets them.

For data obtained from structured interviews and survey questionnaires, content analysis is often carried out with Excel-based software. Frequencies and percentages are computed, cross tabulations are built, and chi-square tests are calculated to support sound interpretation; the data can also be presented graphically.

Both people and computers can better process and use structured and/or listed information. Examples of structured data include student records, hospital records, bank records, and so on, in which the relevant data about a person or an event are recorded and listed.
Information stored in databases built with SQL or Access can easily be searched, found and classified (Akpınar, 2000); whether information stored in Excel files counts as structured is a matter of debate. Such data can be queried in different ways to obtain useful information. Computers generally produce "related terms" from the morphological characteristics of words rather than from their meaning, and the relations obtained in this way can be presented graphically.

Software used for content analysis includes Concordance, Crawdad Technologies LLC, Diction, General Inquirer, Hamlet II, INTEXT, Leximancer, Minnesota Contextual Content Analysis (MCCA), Profiler Plus, PROTAN, SALT Software, PASW Text Analytics for Surveys, T-LAB Tools for Text Analysis, TABARI, TACT (Text Analysis Computing Tools), Tapor Tools, Text Analysis Tool, Textalyser, Textanz, TextArc, TEXTPACK, TextQuest, VBPro, Wordcruncher, WORDij, Wordle, WordStat and Yoshikoder. So far only a limited number of texts have been analysed in this field; different types of texts, together with audio-based and visual information, still need to be examined.

Data Mining

Search engines return a great deal of relevant and irrelevant data, and it is hard to reach the relevant items. Because such data are not structured, accessing them takes longer. This information takes the form of natural-language text, sound or visuals; it is estimated that more than 80% of computer-based data are unstructured. Numerous firms and institutions have stored their records in electronic settings and become "rich in terms of raw data, but poor in terms of having information". For these firms and institutions it is hard to be successful in a competitive market and to