Development of a Simple Graphical Interface Based Software for Machine Learning and Data Visualization

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-8 Issue-2, July 2019 Development of a Simple Graphical Interface Based Software for Machine Learning and Data Visualization Ogunleye G.O., Fashoto S.G, Daramola C.Y., Ogundele L.A., Ojewumi T.O., Adewole Timilehin Abstract: Machine learning has become one of the foremost Clustering algorithms such as Mean Shift and KMeans; and techniques used for extracting knowledge from large amounts of deep learning algorithms such as the Artificial Neural data. The programming expertise required to implement machine Network. The technical know-how required to implement learning algorithms has led to the rise of software products that the various algorithms available in machine learning makes simplify the process. Many of these systems however, have it a challenge for non-programmers to be able to make use sacrificed simplicity as they evolved and included more features. In this study, a machine learning software with a simple of them. Therefore, it is necessary to provide a graphical graphical user interface was developed with a special focus on user interface (GUI) based application which hides the enhancing usability. The system made use of basic graphical implementation details of these algorithms from the user and interface elements such as buttons and textboxes. Comparison of also provide visualizations, therefore making it easy for non- the system with other similar open-source tools revealed that the technical users to use. There are various existing open developed system showed an improvement in usability over the source software packages that provide graphical user other tools. interface functionalities for machine learning tasks such as Keywords: GUI, Software, Machine Learning, data WEKA, RapidMiner, KNIME, and Orange. visualization The Waikato Environment for Knowledge Analysis, WEKA [25] is one such system. It provides easy access to I. INTRODUCTION machine learning tasks such as data preprocessing, Due to the increase in data generating devices and classification, regression, clustering, visualization and systems, there has been an exponential increase in the feature selection with both GUI and command line interface. amount of data stored in electronic format in the past few RapidMiner Studio [4] is a similar platform developed by decades[23]. As the volume of data being generated is RapidMiner. It has both proprietary and open source increasing, so is the amount of effort required for human versions. It also provides an integrated environment for easy beings to manually process them. This led to the preprocessing, machine learning, and predictive analytics. introduction of machine learning, an attempt to automate the However, some of its features are limited in the open source data analysis and processing procedures. version. Machine learning can be defined as the art of giving KNIME (Konstanz Information Miner) is a data analytics, machines the ability to detect meaningful patterns in data reporting and integration platform that uses modular data and make decisions based on these patterns without being pipelining to implement its machine learning components. explicitly programmed [1]. This involves giving the While these applications have their advantages, the steep machine a training data, generating a model from this learning curve associated with them due to their large training data and then using this model to predict future amount of functionalities and also the large amount of data. This is very useful for tasks that are too complex for system memory they occupy has created the need to produce explicit programming and tasks that require adaptivity. a simpler software focused on accommodating beginners. The field of Machine learning is one of the fastest The primary purpose of this paper is to develop an growing fields of computer science, with a wide range of application which simplifies machine learning tasks by applications. These applications include, but are not limited abstracting them into simple GUI operations. The software to: speech recognition (Deng and Li[3], fraud detection [1], presented in this paper will provide users without the recommender systems [5], anti-spam filters, medical knowledge needed to write programs the opportunity to diagnosis[3], bioinformatics [2], and computer vision. perform basic machine learning and visualization tasks by There are many algorithms available to perform these simplifying these tasks into GUI operations. It will also help tasks. Examples are classification algorithms such as K- experienced machine learning programmers who wish to nearest neighbors, Support Vector Classifier and Naïve quickly build machine learning models for small tasks to Bayes classifier; Regression algorithms such as Linear save time. The algorithms to be considered in the proposed Regression and Support Vector Regression; software are limited to: i. K-nearest neighbors, ii.Support Revised Manuscript Received on July 15, 2019. vector classifier, iii. Linear regression, iv. Ridge regression, Ogunleye G.O., Department of Computer Science, Federal University, v. Lasso regression, vi. Support vector regression, vii. K- Oye-Ekiti, Ekiti State, Nigeria. Fashoto S.G, Department of Computer Science, University of Means, viii. Mean shift, ix. Artificial neural network. Swaziland, Kwaluseni, Swaziland Daramola C.Y., Department of Computer Science, Federal University, II. RELATED WORKS Oye-Ekiti, Ekiti State, Nigeria Ogundele L.A., Department of Computer Science, College of 2.1 Overview Education, Ilesa, Osun State, Nigeria Ojewumi T.O., Department of Computer Science, Redeemer’s As computer technology is University, Ede, Osun State, Nigeria advancing, more people are Adewole Timilehin, Department of Computer Science, Federal University, Oye-Ekiti, Ekiti State, Nigeria making use of computer Published By: Retrieval Number: B3426078219/19©BEIESP Blue Eyes Intelligence Engineering DOI: 10.35940/ijrte.B3426.078219 3770 & Sciences Publication Development of a Simple Graphical Interface Based Software for Machine Learning and Data Visualization systems for various tasks. This implies that more people al.[14]) and Scikit-learn [10] are examples of general without much knowledge of programming need to perform purpose tools while KNIME [17] is a tool with focus on tasks that require expert programming knowledge. As a computations in Chemistry but with general purpose result, many efforts have been made to solve this problem applications. by creating software tools that can significantly simplify 2.2.4 Target users these tasks [13]. Tools made for non-technical users tend to be more The tools created for machine learning have varied intuitive, visually-oriented and less technical. Such tools in many aspects such as type (library, platform, framework), also feature reduced functionalities in order to avoid feature intended users (beginners, experts), interface (CLI, GUI, clutter. This ensures that simplicity, ease of use and API), domain (business, computational biology, statistics, learnability are maintained. general), licensing (open-source, proprietary) and Tools made for experienced users on the other hand usually functionality among many others. In the rest of this chapter, favor functionality and flexibility more than ease of use and some of these characteristics are discussed and some learnability. existing tools are reviewed. 2.3 Existing Machine Learning Tools 2.2 Features of Machine Learning Tools In the past couple of decades, machine learning has become 2.2.1 Software type a popular method for performing tasks that require the Programming languages are the lower level tools for extraction of information from datasets [6]. Several software machine learning. The choice of programming language tools have been developed to make machine learning used for a machine learning project may have a huge impact programming easier and faster. They are in the form of: on the results. This can be due to factors such as ease of use, i) Programming languages, speed, scalability, extensibility, availability of libraries, data ii) Libraries/Packages, structures and other programming structures [8]. Some of iii) Graphical User Interface-based the commonly used programming languages are Python, R, applications, platforms and toolkits, C++ and Java. iv) Combinations of any of these. Libraries and packages are collections of facilities 2.3.1 Programming Languages such as functions and methods that can be called by the user. Python Libraries usually provide only discrete facilities, that is, Python is one of the most popular programming language their facilities are specialized for specific tasks. for machine learning because of its simple syntax, its Platforms and toolkits provide more functionality modular architecture, and its huge repository of libraries. It than libraries, often enough to complete whole projects. can be challenging though for beginners to master the They can contain their own integrated development language because of its steep learning curve. environments. Their features are loosely coupled, enabling R Statistical Programming Language the user to tie them together in various ways [20]. R is a similar language designed mainly for statistical 2.2.2 User Interface computing, data mining and machine learning. This domain- User interface design is one of the central issues on the specific nature of the language together with its huge usability of a software having a huge impact on the repository of libraries positioned

Development of a Simple Graphical Interface Based Software for Machine Learning and Data Visualization

Quick Tour of Machine Learning (機器 )

Understanding Support Vector Machines

Using a Support Vector Machine to Analyze a DNA Microarray∗

Arxiv:1612.07993V1 [Stat.ML] 23 Dec 2016 Until Recently, No Package Providing Multiple Semi-Supervised Learning Meth- Ods Was Available in R

Arxiv:1210.6293V1 [Cs.MS] 23 Oct 2012 So the Finally, Accessible

Sb2b Statistical Machine Learning Hilary Term 2017

Xgboost: a Scalable Tree Boosting System

Computational Tools for Big Data — Python Libraries

LIBSVM: a Library for Support Vector Machines

Likelihood Algorithm Afterapplying PCA on LISS 3 Image Using Python Platform

Scikit-Learn

Liblinear: a Library for Large Linear Classification