Integrating Web Site Services Into Application Through User Interface
JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 22 No. 1 (2014), pp. 137-153

Artur Opaliński
Gdansk University of Technology
Faculty of Power and Control Engineering
Narutowicza 11/12, 80-233 Gdansk
Artur.Opalinski-at-pg-gda-pl

Abstract. The issue of integrating applications which are only accessible through a visual user interface is not thoroughly researched. Integration of web applications running remotely and controlled by separate organizations becomes even more complicated, as their user interface can display differently in different browsers or change without prior notification as a result of application maintenance. While possible, it is generally not common for web sites to provide web services through standard mechanisms like SOAP, RPC, or REST, for administrative and especially security reasons. Programmatic use of the capabilities of numerous public sites which only provide a web user interface to their services is very appealing, as they may significantly extend the functionality of other applications. This paper presents research on employing existing software of various purposes to integrate web sites using their user interface. With the selected method, some capabilities of Moodle are expanded by integrating a remote Moodle server with a local application, to create team-work support tools.

Keywords: browser automation, web user interface, application integration, web extraction.

1. Introduction

Companies and organizations are building information systems by integrating previously independent applications, together with new developments. This integration process has to deal with existing applications, which can only be used through their specific interfaces, and often cannot be modified. In many cases, the cost of rewriting an application would be prohibitive.
The problem discussed in this paper arose from work on extending the web-based Moodle e-learning environment into a collaboration tool [1]. There are free Moodle installations [2] as well as installations available locally at universities [1], which make it possible to use an existing, external Moodle installation in an administration-free model. This model provides easy access to real data and users and avoids the administrative burden associated with maintaining the web site. The work aims at adapting Moodle to support programming teamwork by allowing programmers, e.g.:
• to self-enroll in the team,
• to make choices regarding work, e.g. concerning assignment to sub-groups or selection of a programming goal,
• to upload code by deadlines and to get automatic correctness checks.

Moodle [3] already offers most of the components needed for the above programmers' tasks (Fig. 1), i.e. the user database, choice results data, course activities and the user file store. It does not, however, integrate the necessary information and does not run correctness checks on uploaded files.

Batch processing of uploaded source code files, depicted in Fig. 2, is largely independent of the interactive Moodle functionality, so it can be handled by a separate application running remotely, completely outside the Moodle environment and only loosely coupled with Moodle for initial and final data transfer (Fig. 1). In the administration-free model, the only available coupling to Moodle is its web user interface (WebUI).

This refines the issue of extending current Moodle capability into the issue of integrating web site services into another batch-processing application through the web user interface of the former. It should be noted that this is not about integrating a user interface into the batch-processing application; rather, the integration goes in the opposite direction. The issue is therefore different from integrating through portals [4].

Figure 1. General idea of application integration, showing Moodle components used for programmers collaboration

Equipping an application with an interface to web sites opens access to a tremendous number of online services, including online compilers [5], online code duplication or software plagiarism detection tools [6], online file format converters [7], etc.

The issue cannot be solved with web services, which aim at interoperable machine-to-machine interaction by providing an interface described in a machine-processable format, as Moodle does not offer them.

A web user interface is not easily processed by a machine, and moreover its format on public web sites is generally unstable: unannounced changes are introduced relatively frequently, which humans accommodate easily but which constitute an issue for automated processing. Proper tools are therefore needed for flexibly interfacing an application to a web site through that site's WebUI. This paper reviews tools and methods that make such interfacing possible and presents the final solution chosen.

2. State of the Art

The issue of integrating applications through a web user interface (WebUI) has very little published research. Making software a commodity by developing an industry of reusable components was set as a goal in the early days of software engineering [8]. The term middleware has circulated with similar meanings at least since the famous Garmisch NATO Conference in 1968 [9].

Figure 2. Information sources, steps of data preparation and processing of uploaded source code files in the integrated programmers collaboration tool

Middleware in its various forms is preferred over the point-to-point architecture for modern Enterprise Application Integration [10][11][12], but its focus remains on application interaction and logical data integration rather than on interfacing, so it is far from a solution to the problem considered here.

Web mashups pursue a similar goal of integrating the increasingly tremendous amount of information and services available on the Web, distributed across different platforms, to provide unified services together, even ad hoc [13]. Web mashups are not assumed to access components over WebUI, but instead rely on established APIs through existing lightweight (RSS/Atom, REST) and future specialized Web Services [14][15] for data and service acquisition. Research concentrates on the combined use of such disjoint information [16][17][18] rather than on its acquisition interface.

Web scraping is an umbrella term for various techniques of extracting web data. Web scraping can be described as collecting target structured data from information presented in human-readable form. The challenge is to locate information in static HTML documents, rather than to control the remote site so that it generates a specific report or performs a requested service. To find information on a web page, wrappers [20] use a regular expression search, web page information coordinates based on the DOM tree and XPath [19], or a combination thereof. To account for changes in web page structure over time or to make the solutions more universal, self-adapting wrappers [21], detection of similar page elements [22][19], and methods for selecting essential content [23][24] are researched. The survey [25] summarizes the many aspects of web data extraction.
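
The two location techniques named above, regular expression search and DOM-tree coordinates addressed with XPath, can be contrasted in a minimal sketch. The page fragment, its element names and both function names are hypothetical and serve only as an illustration; they are not taken from Moodle or from any of the cited wrappers. The sketch also assumes well-formed markup, because the Python standard library parser used here accepts only XML and supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (an assumption for illustration).
# Real pages are rarely valid XML and usually need a tolerant HTML parser.
PAGE = """
<html><body>
  <table id="participants">
    <tr><td class="name">Alice</td><td class="group">Team A</td></tr>
    <tr><td class="name">Bob</td><td class="group">Team B</td></tr>
  </table>
</body></html>
"""


def extract_with_regex(page):
    """Regular expression search: simple, but tied to the exact markup text."""
    pattern = re.compile(
        r'<td class="name">(.*?)</td><td class="group">(.*?)</td>'
    )
    return pattern.findall(page)


def extract_with_xpath(page):
    """Address cells by their coordinates in the DOM tree (XPath subset)."""
    root = ET.fromstring(page.strip())
    rows = root.findall(".//table[@id='participants']/tr")
    return [
        (row.findtext("td[@class='name']"), row.findtext("td[@class='group']"))
        for row in rows
    ]


if __name__ == "__main__":
    print(extract_with_regex(PAGE))
    print(extract_with_xpath(PAGE))
```

Both functions return the same list of (name, group) pairs for this fragment. The regular expression stops matching as soon as whitespace or an extra attribute is inserted between the cells, whereas the XPath variant tolerates such changes but still depends on the table keeping its id and class attributes. This is the kind of fragility that the self-adapting wrappers mentioned above try to reduce.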
While extracting information from static web pages is a crucial element of application integration over WebUI, web pages are increasingly dynamic and build on a mixture of languages, including HTML, but also JavaScript, Java applets, etc. Thus it is often not possible to extract information without some form of interaction with the active page components [26].

[27] explores the visual regularity of data records and data items on the web page to automatically extract structured results (database records) from deep web pages, and thus avoids analysis of complex web page source files. [28] regards elements in the rendered page as objects in 2D space, as humans do. But these solutions, while more user interface-oriented, still do not pursue WebUI interactivity.

Few approaches take advantage of the interactivity of the user interface to some degree: to automate frequent tasks in the web browser, WebVCR [29] and Chickenfoot [30] record and replay user actions, or record shortcuts to web content in smart bookmarks for later replay. The commercial Lixto [31] supports scripting languages and dynamic content. It can be used to extract web data based on visually generated wrappers, coded in its internal Elog language. SIKULI [32] is a visual technology to automate and test graphical user interfaces (GUI) using images (screenshots), but it requires running a browser locally and displaying its window on a graphical screen, which may not always be present. Similarly, AutoIT [33] is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting. [34] introduced a flexible integration framework to ease the GUI-oriented integration of COTS applications. None of the described approaches, while close to the problem, offers solutions ready for integrating applications over WebUI.

3. Tools available

Some widespread tools can be adapted to integrate the services provided by a Moodle site into a local batch-processing application. These tools are wget (or similar software: curl), lynx, and Selenium. GNU wget [35] is a software package for retrieving files using HTTP, HTTPS and FTP. It is a non-interactive command line tool, so it may easily be called