Web Resource Changes Monitoring System Development
Total Page:16
File Type:pdf, Size:1020Kb
Web Resource Changes Monitoring System Development Lyubomyr Chyrun[0000-0002-9448-1751]1, Yevhen Burov[0000-0001-8653-1520]2, Bohdan Rusyn[0000-0001-8654-2270]3, Liubomyr Pohreliuk[0000-0003-1482-5532]4, Oleh Oleshek[0000-0002-2222-8773]5, Aleksandr Gozhyj [0000-0002-2641-5947]6, Іgor Bobyk[0000-0002-0424-1720]7 1IT Step University, Lviv, Ukraine 2,5,7Lviv Polytechnic National University, Lviv, Ukraine 3-4Karpenko Physico-Mechanical Institute of the NAS Ukraine 6Petro Mohyla Black Sea National University, Nikolaev, Ukraine [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Abstract. The detection of changes in the web content and notifying users about them in timely manner is an important service for both the owners of the web sites and ordinary users. The system created automatically captures and monitors the differences between the content of the web page at a given mo- ment of time and compares them with saved earlier. The system periodically parses the contents of files on a server, available for public access according to the given pattern. It compares its contents with previously stored version and in case if discrepancies are discovered, immediately notifies the user. Thus, the user will not have to monitor manually the specified Internet resources. Keywords: Web resource, content, SEO technology, information technology, web technology, changes monitoring, web server, intelligent system, operating system, web page, automated monitoring system, neural network, content downloader, automated monitoring, software product, monitoring system, trademark office, computer science, neural network, intelligent automated mon- itoring system, user interface, global network, internet resource, web service, web site, regular expression pattern, apache http server, electronic content commerce system, server part 1 Introduction With every year, the information technology industry gains more and more rapid de- velopment. Smart systems for monitoring, analyzing and managing things are being introduced everywhere. However, the development of information technology is not limited only to this area. All related industries are also developing at a rapid pace. In order to increase productivity and improve the quality of the final product, the human labor is replaced by automation [1]. A circle is creating: the development of the IT sphere stimulates other sectors of employment, which in theirs turn stimulate the de- velopment of IT. That is why this process is becoming increasingly widespread with every year. It is logical that during this rapid development, we have a desire to keep up with the new trends and to develop at the same pace. To meet these needs, we need to constantly “consume” and “absorb” a huge amount of fresh information from mul- tiple sources. Unfortunately, the development of our abilities to perform this process cannot be improved by purchasing new, more powerful computers [2]. For today, it is not enough to read just published books or magazines to keep a finger on the pulse of the modern world [3]. The main reason of this is the huge amount of resources, need- ed for the publication of even one textbook, the main of which is time. The time that is spent on the design, printing and distribution of this publication. Fortunately, we have the Internet for more rational use of opportunities. Exactly thanks to this the global web has gained such popularity. As the main source of information the Internet opens for us lots of data venues. Similarly, the content sites are trying to attract the user with all possible palettes of web site design and functionality [4]. For this pur- pose, owners of Internet resources don’t spare the forces and facilities. In the face of intense competition, there is a need to constantly maintain the quality of the product at the proper level [5]. To achieve this aim it is necessary to invest not only in the devel- opment of something new, but also in preserving the already acquired that is protect- ing the site from unwanted changes and supervising the correct the work of all its components [6]. As an option we can design a lot of data processing, create a bunch of checks to protect against penetration of the unauthorized code into a page. But none method or even a combination of them all will allow to be sure for one hundred percent in the security of the initial data [7]. That is why the logical decision is not to going blindly for the unachievable aim, but to take care of possible solutions of inevi- table problems as they appears [8]. An intelligent system of automated monitoring of changes in web resources serves this purpose. 2 Analytical review of literary and other sources 2.1 Subversion (SVN) The first systems that provided the tracking of changes in the documents were the version control systems [9]. One of them is Subversion or SVN. Officially it was introduced in 2004 by CollabNet and represents a free, centralized version control system. It was built on the basis of CVS, but without its errors and disadvantages. At that time CVS was an unofficial standard among developers [10]. The work with SVN is not much different from the work with other centralized version control systems. Users copy files from the central repository, works with these copies and synchronize them with those that are left in the main repository, commenting on new edits. The ability of two or more users working with the same files is a useful feature. This is provided by creating local copies of the original files. However, this product also has the function of blocking access to a particular file, so that only one user could work with it. Also, the smart decision of this system is the use of the mechanism of delta-compression – the saving of not the original and final contents of the file, but only the differences between them [11]. The key features that SVN system provides for its users [12]: smart tracking of changes of all in a given directory – the file transfers, their re- naming or attributes editing are fixed; convenient mechanism of synchronization of the proper files; emulation procedure: ─ the making of new branches takes place by copying the old ones and adding new edits; ─ automatic finding of differences and their unification during branches merging; edits, no matter how large are they, are fixed in a single repository in the atomic transactions representation; transferring and processing only the differences between files, but not themselves; the same qualitative processing of both binary and text files; two storage formats: ─ repository – database; ─ repository – a set of regular files; the ability to write commits in any language; the ability to create a reflection copy of the repository; Although the SVN system is based on CVS, however, there is little left from it. Here is a list of the major differences [13]: the way of working with text and binary files was improved; boundaries in which the system tracks changes were increased – now it can control not only files but also directories; a set of properties of each file was added to the control; it saves network traffic during transferring files to the repository that are not com- pletely modified but only different parts; the ability to lock files during work was created, so that nobody can makes changes in parallel. For a better understanding of what a version control system represents and what pos- sibilities it offers, see Fig. 1 [14]. Fig. 1. Visualization of a simple Subversion project This figure shows the way of project development. It resembles the structure of a tree with a multitude of branches. Exactly from here the name for the mechanism for cre- ating various temporary versions of one software product is created. The trunk of the project is marked by green color, by yellow – the branches. Branches are created at the time of development for some additional functions in the system. In the case of their successful realization additional branches are merged into the trunk. This pro- cess is called synchronization and is indicated by a red arrow in this figure. Blue – branches of the project that will no longer change, are called tags. Discontinued de- velopment branch is marked by purple [15]. The process of user interaction with the SVN system can be divided into the following parts [16]: 1. Connect to a central repository. 2. Download the latest version of the project from the repository or creation a new one. 3. If the project was created earlier – make the necessary changes to the working copy. If this is a new project – work with it as with any other project, not tracked by this system. 4. Commit of accomplished actions and providing a textual description of what was made. 5. Transfer the completed commit to the repository. Due to the strict observance of this plan the history of the project development is cre- ated with the possibility at any time to return to a certain stage of development and go to a different way [17]. 2.2 SiteLock SiteLock is protecting web pages in real time [18]. This product was introduced in 2009 as the start-up of two programmers – Neville Feather and Scott Lovell, which eventually turned into an advanced security in the field of web security. This product was developed to expand the ability to protect resources written in WordPress CMS. It extends the set of standard features by its own, which will be useful to both the web developers and the owners of Internet resources. The main advantage of this software is the enormous variety of website scanning tools that prevent most problems for ap- pearing.