<<


JOURNAL OF INFORMATION SYSTEMS & OPERATIONS MANAGEMENT

Vol. 12 No. 2 December 2018

EDITURA UNIVERSITARĂ Bucureşti

Foreword

Welcome to the Journal of Information Systems & Operations Management (ISSN 1843-4711; IDB indexation: ProQuest, REPEC, QBE, EBSCO, COPERNICUS). This journal is an open access journal published twice a year by the Romanian-American University. The published articles focus on IT&C and belong to national and international researchers and professors who want to share their research results, ideas and expertise, as well as to Ph.D. students who want to improve their knowledge and present their emerging doctoral research. Being a challenging and favorable medium for scientific discussions, all the issues of the journal contain articles dealing with current topics from computer science, economics, management, IT&C, etc. Furthermore, JISOM encourages the cross-disciplinary research of national and international researchers and welcomes the contributions which give a special “touch and flavor” to the mentioned fields. Each article undergoes a double-blind review by an internationally and nationally recognized pool of reviewers. JISOM thanks all the authors who contributed to this journal by submitting their work to be published, and also thanks all the reviewers who helped and spared their valuable time in reviewing and evaluating the manuscripts. Last but not least, JISOM aims at being one of the distinguished journals in the mentioned fields.

Looking forward to receiving your contributions,

Best wishes,
Virgil Chichernea, Ph.D.
Editor-in-Chief

JOURNAL OF INFORMATION SYSTEMS & OPERATIONS MANAGEMENT

GENERAL MANAGER Professor Ovidiu Folcuţ

EDITOR IN CHIEF Professor Virgil Chichernea

MANAGING EDITORS Professor George Căruţaşu Lecturer Gabriel Eugen Garais

EDITORIAL BOARD

Academician Gheorghe Păun, Romanian Academy
Academician Mircea Stelian Petrescu, Romanian Academy
Professor Eduard Radaceanu, Romanian Technical Academy
Professor Pauline Cushman, James Madison University, U.S.A.
Professor Ramon Mata-Toledo, James Madison University, U.S.A.
Professor Allan Berg, University of Dallas, U.S.A.
Professor Kent Zimmerman, James Madison University, U.S.A.
Professor Traian Muntean, Université Aix-Marseille II, France
Associate Professor Susan Kruc, James Madison University, U.S.A.
Associate Professor Mihaela Paun, Louisiana Tech University, U.S.A.
Professor Cornelia Botezatu, Romanian-American University
Professor Ion Ivan, Academy of Economic Studies
Professor Radu Şerban, Academy of Economic Studies
Professor Ion Smeureanu, Academy of Economic Studies
Professor Floarea Năstase, Academy of Economic Studies
Professor Sergiu Iliescu, University “Politehnica” Bucharest
Professor Victor Patriciu, National Technical Defence University
Professor Lucia Rusu, University “Babes-Bolyai” Cluj-Napoca
Associate Professor Sanda Micula, University “Babes-Bolyai” Cluj-Napoca
Associate Professor Ion Bucur, University “Politehnica” Bucharest
Professor Costin Boiangiu, University “Politehnica” Bucharest
Associate Professor Irina Fagarasanu, University “Politehnica” Bucharest
Professor Viorel Marinescu, Technical University of Civil Engineering Bucharest

Associate Professor Cristina Coculescu, Romanian-American University
Associate Professor Daniela Crisan, Romanian-American University
Associate Professor Alexandru Tabusca, Romanian-American University
Associate Professor Alexandru Pirjan, Romanian-American University

Senior Staff Text Processing:
Lecturer Justina Lavinia Stănică, Romanian-American University
Lecturer Mariana Coancă, Romanian-American University
PhD student Dragos-Paul Pop, Academy of Economic Studies

JISOM journal details 2018

1. Category 2010 (by CNCSIS): B+
2. CNCSIS Code: 844
3. Complete title / IDB title: JOURNAL OF INFORMATION SYSTEMS & OPERATIONS MANAGEMENT
4. ISSN (print and/or electronic): 1843-4711
5. Frequency: SEMESTRIAL
6. Journal website (direct link to journal section): http://JISOM.RAU.RO
7. IDB indexation (direct link to journal section / search interface): ProQuest; EBSCO; REPEC (http://ideas.repec.org/s/rau/jisomg.html); COPERNICUS (http://journals.indexcopernicus.com/karta.php?action=masterlist&id=5147); QBE

Contact

First name and last name Virgil CHICHERNEA, PhD Professor

Phone +4-0729-140815 | +4-021-2029513

E-mail [email protected] [email protected]

ISSN: 1843-4711 The Proceedings of Journal ISOM Vol. 12 No. 2

CONTENTS

Editorial

Costin-Anton BOIANGIU, Ana-Karina NAZARE, Andreea Dorina RACOVIŢĂ, Iulia-Cristina STĂNICĂ – AN OPTICAL APPROACH FOR DECODING THE MYSTERIOUS VOYNICH MANUSCRIPT – 235

Lucia RUSU, Ervin GERŐCS-SZÁSZ – EXTENDED ERP USING RESTFUL WEB SERVICES CASE STUDY: WINMENTOR ENTERPRISE® – 249

Adrian ATANASIU, Bogdan GHIMIŞ – THE FACTORIZATION OF X^255 – 1 IN Z2[X] – 257

Valentin Gabriel MITREA, Mihai-Cristian PÎRVU, Mihai-Lucian VONCILĂ, Costin-Anton BOIANGIU – A VOTING-BASED IMAGE SEGMENTATION SYSTEM – 265

Ana-Maria Mihaela IORDACHE, Ionela-Cătălina ZAMFIR – ECONOMIC INDICATORS AND HUMAN DEVELOPMENT INDEX – 281

Andrei Alexandru ALDEA, Radu Gabriel CORIU, Ştefan-Vlad PRAJICĂ, Răzvan-Ştefan BRÎNZEA, Costin-Anton BOIANGIU – DOCUMENT LAYOUT ANALYSIS SYSTEM – 292

Andreea BARBU, Bogdan TIGANOAIA – CUSTOMER LIFETIME VALUE AND CUSTOMER LOYALTY – 303

Răzvan-Costin DRAGOMIR, Costin-Anton BOIANGIU – VOTING-BASED HDR COMPRESSION – 312

Oana CĂPLESCU, Costin-Anton BOIANGIU – APPLYING FILTERS TO IMPROVE PEOPLE AND OBJECTS RECOGNITION USING AN API – 326

Alin ZAMFIROIU, Daniel SAVU, Andrei LEONTE – UNIT TESTING FOR INTERNET OF THINGS – 335

Diana-Elena BELMEGA, Costin-Anton BOIANGIU – A SURVEY ON SOUND-BASED GAMES – 349

Roxana Ştefania BÎRSANU – THE USE OF ELECTRONIC RESOURCES IN THE PROCESS OF FOREIGN LANGUAGE TEACHING/LEARNING – 360

Andreea BARBU, Bogdan TIGANOAIA – ROMANIA’S ENERGETIC SYSTEM – 372

Elena BANTAŞ, Costin-Anton BOIANGIU – IMAGE RECOLORING FOR COLOR-DEFICIENT VIEWERS – 383

Ana-Maria Mihaela IORDACHE – TAKING DECISION BASED ON THE REGRESSION USING EXCEL 2016 – 391

Andrei LEICA, Mihai Bogdan VOICESCU, Răzvan-Ştefan BRÎNZEA, Costin-Anton BOIANGIU – FULLY CONVOLUTIONAL NEURAL NETWORKS FOR IMAGE SEGMENTATION – 400

Georgiana-Rodica CHELU, Marius-Adrian GHIDEL, Denisa-Gabriela OLTEANU, Costin-Anton BOIANGIU, Ion BUCUR – MULTI-ALGORITHM IMAGE DENOISING – 411

Alexandru PRUNCU, Cezar GHIMBAS, Radu BOERU, Vlad NECULAE, Costin-Anton BOIANGIU – MAJORITY VOTING IMAGE BINARIZATION – 422

Laura SAVU – DEVELOPING VIRTUAL REALITY AND AUGMENTED REALITY PROJECTS WITH UNITY3D – 431


AN OPTICAL APPROACH FOR DECODING THE MYSTERIOUS VOYNICH MANUSCRIPT

Costin-Anton BOIANGIU 1* Ana-Karina NAZARE 2 Andreea Dorina RACOVIŢĂ 3 Iulia-Cristina STĂNICĂ 4

ABSTRACT

The current paper presents several optical approaches used for investigating the decoding of the Voynich Manuscript. Over the past century, there have been numerous decipherment claims, but none of them can be accepted as the true solution. Most of them focus on the syntax analysis of the text, trying to correlate the Voynich text with well-known languages. In the current paper, we present a short history and description of the manuscript, as well as an innovative technique used for attempting its decoding. Our paper presents the use of several optical approaches, such as various distortions and word scrambling, in the attempt of correctly identifying voynichese words with the help of OCR.

KEYWORDS: Voynich Manuscript; optical devices; OCR; decipherment.

1. INTRODUCTION

The Voynich manuscript is a mysterious document, allegedly dated from the 15th century (based on radiocarbon dating) [1]. The book is written in an unknown language and contains many illustrations which can be used to divide the manuscript into six sections: herbal, astronomical, biological, cosmological, pharmaceutical and recipes [2]. There are approximately 240 pages left of the manuscript, as some others might have been lost over time. The author, date and origin of the book are not known; the name is based on that of the Polish book dealer Wilfrid Voynich, who came into the possession of the manuscript in 1912. Over time, before being bought by Voynich, the manuscript had numerous owners, such as Czech scientists, Emperor Rudolf II or members of the Jesuit order. Currently, it is kept in the Beinecke Rare Book & Manuscript Library of Yale University [3]. In order to study the text of the Voynich Manuscript, a transcription was made using the EVA (Extensible Voynich Alphabet), further influencing the numerical analysis, ‘word’

1* corresponding author, Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 3 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 4 Engineer, “Politehnica” University of Bucharest, 060042 Bucharest, Romania, [email protected]

235

length, distribution and statistics. The text was analyzed and several observations were made along the years with regard to characters, words, sentences, paragraphs and sections [3]. Regarding the characters, it was observed that some of them appear generally at the end of lines, while others appear almost always at the beginning of paragraphs or on the first lines. Also, some characters seem to have a separating or conjunctive function and are often found together. It is believed that some characters have the role of final letters, with varying frequencies [4]. Some characters appear very rarely and only on some pages of the manuscript, an unusual fact for most languages. In terms of words, the same ones can be repeated two, three or more times in a phrase [5]. Also, many words differ by only one character and are placed in proximity to each other. Throughout the text, there are only a few words of one single character. Although it is not certain from the manuscript what constitutes a ‘word’ in the grammatical sense or whether the spaces are separators between words, analysts found 8114 different words and a total of 37919 tokens in the whole manuscript. Due to these uncertainties, the word statistics are the least reliable of all the statistics. However, although tokens appear to have a normal frequency distribution, they may represent syllables, as well as single characters, instead of real words. Additionally, there is a high number of words that appear only once in the whole manuscript [6]. Our paper consists of four sections: the first one is the introduction, in the second one we analyze the current attempts of decoding the manuscript, the third one presents our proposed approaches and the final one draws the conclusions of the research.

2. DECIPHERMENT CLAIMS

Over the years, a relatively high number of decipherment claims have appeared. Unfortunately, none of them could be labeled as valid - some of them manage to identify just a few words, while others claim to hold the key for the decoding of the entire manuscript. We will further present a few examples of deciphering claims, based on various criteria.

2.1. Old theories

There have been many theories about the meaning of the Voynich Manuscript since its discovery over a century ago. One of the earliest theories for deciphering the manuscript comes from the philosopher William Romaine Newbold. His claim is based on micrography, sustaining that the letters have no real meaning, as they are actually composed of many small signs, which can be seen only under a magnifying glass. This theory was dismissed as being too speculative [7]. Joseph Martin Feely claimed it is a book written by Roger Bacon. By using substitution and starting from some words related to specific illustrations, he claimed that it is written in a medieval, extremely abbreviated Latin [5]. Leonell C. Strong suggested that, after using some sort of double arithmetical progression, he discovered that the manuscript is written in a medieval form of English [8]. Both theories were considered extremely subjective.

236


2.2. Recent theories

Recent theories have appeared, made by various linguists and researchers all over the world. In 2014, Dr Arthur Tucker came up with a new theory: he identified a series of proper nouns, such as plants and constellations, based on their representative drawings. He suggests that the mysterious language of the document is Nahuatl (an Aztec language), but he wasn’t able to identify anything else except for those specific nouns [9]. Stephen Bax also sustained in 2014 that he succeeded in identifying several nouns (constellations or plants), by using a technique similar to the one used to decipher the hieroglyphs [10]. One of the most recent claims in deciphering the manuscript was made by Agnieszka Kałużna and Jacek Syguła. Their research, published in 2017 and entitled “The Key to The Voynich Manuscript”, is based on correlating the symbols from the Voynich Manuscript with prefixes, suffixes or abbreviations from the Latin alphabet. They give some examples of translations by passing through numerous languages, such as Latin, Greek, French or Italian [11]. Another claim of decipherment comes from Canadian researchers, who used artificial intelligence to identify the language of the manuscript. They claim that it is written in ancient Hebrew and that the words are actually alphagrams (words which must have their letters arranged alphabetically). Their results seem to be remarkable, with 80% of the words making sense in Hebrew and the first sentence of the manuscript being identified as “She made recommendations to the priest, man of the house and me and people” [12]. After performing a quick OCR on some Voynich pages and setting the language as Hebrew, some words were identified, with various meanings: “ancestors”, “forgive”, “take a look”, but these can be pure coincidences, as they are not necessarily correlated with the images or the theme of their corresponding chapter.

2.3. Hoax theories

Given the fact that none of the proposed theories was accepted as a valid decipherment of the Voynich Manuscript, people started to believe that it could be a hoax. Even if the parchment is allegedly dated from the 15th century, based on radiocarbon dating, it could have been used and written on centuries after being prepared. Some researchers say that Voynich himself created the manuscript and that the now missing pages were removed later in order to present Roger Bacon as the intended author [13]. A famous debate arose between two scientists: Montemurro and Rugg. Gordon Rugg, a computing expert, claims that by using a card with randomly cut holes and moving it across various syllables, one can obtain a language that follows the statistical rules of true languages. On the other hand, physicist Marcelo Montemurro disagrees with the previously mentioned theory, stating that the manuscript itself is too complex and contains too many statistical similarities between its sections (regarding both words and images) [13] [14].

237


3. PROPOSED THEORY AND POSSIBLE ALGORITHM

Taking into account the fact that most decipherment techniques focused on searching for the meaning of the voynichese characters and words by correlating them with other languages, we considered that a totally different approach was needed. It is possible that several optical instruments (mirrors, lenses) were used in order to create the Voynich manuscript. We propose the simulation of different types of image transformations by using computer software in the attempt of correctly identifying the meaning of the Voynich manuscript.

3.1. Optical instruments for distortions

In the beginning, the best solution was to implement parameterized transformations which could be applied as filters to images containing selected chunks of text. Prior to implementing the transformations as equations, they were tested using specialized tools from dedicated programs (such as the Kaleidoscope tool from KrazyDad [15] or Adobe Photoshop filters). The kaleidoscope effect (figure 1), while quite beautiful and interesting, does not promise successful results in actually distorting the content, and the process of applying this effect on single words in order to encode such rich text is time consuming and highly unlikely to have been used.

Figure 1. Kaleidoscope effect on Voynich words

238


Figure 2. Polar coordinates distortion Figure 3. Wave (sine) distortion

We have tested several Photoshop effects, such as pinch, polar coordinates (rectangular to polar - figure 2), ripple, shear, spherize, twirl, wave (sine) - figure 3, zigzag, lens correction. Pinch, for example, could be simulated in reality using a conical mirror. The transformation of polar coordinates from rectangular to polar corresponds to the use of a cylindrical mirror. Spherize, shear, ripple and wave can be simulated using lenses of specific shapes. Given the fact that the effects applied in Photoshop on entire pages of the manuscript didn’t give any relevant results, we decided to implement ourselves several geometric distortions which can be simulated using optical instruments available in the 15th century. They are: mirroring, rotation, twirl, fisheye, inverse conical deformation, inverse cylindrical deformation, skew and perspective transformation. There are two approaches to this system. Either the text is considered ciphered and the deciphering method consists only in applying the inverse deformation, or the text has been encrypted in such a manner that it can be read only by applying a direct transformation (representing the physical object - the decipher key). Therefore we identify three classes of deformations to be applied:
• Direct optical deformations (mirroring, rotation, twirl, fisheye)
• Inverse optical deformations (inverse cylindrical and conical deformations and, again, mirroring and rotation)
• General optical deformations (skew, perspective)
In order to test the implemented transformations, we selected four high-resolution images from the Voynich Manuscript, which only contain text, as the drawings were irrelevant for our distortion techniques.

239


All complex distortions have customizable parameters, such as the twirl angle, the radius of the cone/cylinder base. For each transform and distortion technique, we can specify some width and height values and select a region of interest (ROI) from the original image. The distortion can, therefore, be applied also on that specific window, not just on the entire image. We decided to create an interactive application, where the user can specify the wanted parameters and choose the center of the ROI using the mouse click (figure 4). The interface, as well as the loading and saving of the images were done using OpenCV as an external library.
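As an illustration of such a parameterized distortion, the sketch below (ours, not the authors' code; the file name, center, radius and angle are made-up example values) applies a twirl to a page image through OpenCV's remap:

```python
import cv2
import numpy as np

def twirl(image, center, radius, angle):
    """Twirl distortion: sample points are rotated around `center`,
    strongest at the center and fading to zero at `radius`."""
    h, w = image.shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    dx, dy = xs - center[0], ys - center[1]
    r = np.sqrt(dx * dx + dy * dy)
    theta = np.arctan2(dy, dx) + angle * np.clip(1.0 - r / radius, 0.0, 1.0)
    map_x = center[0] + r * np.cos(theta)
    map_y = center[1] + r * np.sin(theta)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)

page = cv2.imread("voynich_page.jpg")                        # hypothetical input file
out = twirl(page, center=(400.0, 300.0), radius=250.0, angle=1.2)
cv2.imwrite("voynich_twirl.png", out)
```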

Figure 4. Output of the program - Twirl effect

Figure 5. Skew results

240


Figure 6. Perspective transformation results

3.2. Scrambling

Another approach taken into consideration is the scrambling of entire words or word parts situated in pseudo-random positions or around key positions marked by special characters. A certain number of ROIs are scrambled all around the page, in order to see if we can obtain some meaning to the text. The user can select in the interface the number of patches that will be scrambled, interchanging the location of each pair. The software saves the resulting images in a separate folder so that OCR can be applied to recognize words. A preliminary step is providing an input set for the scrambling operations. This is represented by a set of words and word sections which must be detected in the source image and extracted. Word extraction was realized by identifying contours and extracting bounding rectangles in which words are identified. Scrambling is then applied on the detected words, with a fixed height and variable word length to select entire or partial words.
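A rough sketch of this patch-swapping step is given below (our illustration, not the paper's implementation); it assumes word bounding boxes of the form produced by the extraction step described next:

```python
import random

def scramble_words(image, boxes, num_pairs, seed=0):
    """Swap the pixel content of randomly chosen pairs of word bounding boxes.
    `boxes` are (x, y, w, h) tuples, e.g. as returned by the word-extraction step."""
    rng = random.Random(seed)
    out = image.copy()
    chosen = rng.sample(boxes, 2 * num_pairs)
    for (x1, y1, w1, h1), (x2, y2, w2, h2) in zip(chosen[0::2], chosen[1::2]):
        w, h = min(w1, w2), min(h1, h2)          # crop both patches to a common size
        patch = out[y1:y1 + h, x1:x1 + w].copy()
        out[y1:y1 + h, x1:x1 + w] = out[y2:y2 + h, x2:x2 + w]
        out[y2:y2 + h, x2:x2 + w] = patch
    return out
```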

Word extraction

The implemented word-extraction algorithm is a low-level image processing feature extraction algorithm [16]. As its initial set of data it receives a high-resolution image of a whole, colored manuscript page and outputs a set of bounding rectangles, one for each individual word. The first step in extracting words from the image is processing the image in order to easily identify regions of interest. Firstly, the image is converted to grayscale, as only the morphological information is needed, and then filtered using a bilateral filter, in order to reduce noise (such as the parchment texture artefacts). Next, the image is binarized and inverted (for further processing), using an adaptive threshold method offered by OpenCV. After thresholding the image, it is necessary to apply morphological operations of dilation and erosion respectively, in order to connect letters and disconnect words and lines, thus obtaining blobs for individual words (Figure 5. B).

241


The dilation is horizontal and the erosion is vertical, because the letters within each word are disconnected, the rows of text are quite compact, and vertically neighboring words need to be separated. Lastly, by applying a custom function, the contours of the blobs are obtained. For each contour whose length is greater than a threshold calculated from the original dimensions of the source image, its straight bounding rectangle is determined (Figure 5. C) and only this will be used for subsequent processing [17]. The algorithm also detects regions that do not correspond to words (such as drawings, sheet edges etc.), or even groups of connected words. Therefore, when selecting the correct bounding rectangles, several conditions must be taken into account. The following features were established through simple observation and direct testing, even though their limitations sometimes eliminate word correspondents from the final data set. Rectangles having an area which is too large or too small, or having a height more than three times larger than the average height of the rectangles (assumed to be the average height of a word), are overlooked. Another condition is that words should have a greater length than height. Finally, for each filtered rectangle, a mask from the binarized source image is selected. A ratio of non-zero pixels is calculated for each ROI, and if this ratio is greater than 0.45 (assuming words have a greater than 45% text surface), the rectangle is accepted.
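The pipeline described above could look roughly like this in OpenCV (an illustrative reconstruction assuming OpenCV 4; the kernel sizes, threshold block size and area limits are guesses, not the authors' values):

```python
import cv2
import numpy as np

def extract_word_boxes(path, min_area=150, max_area=50000, fill_ratio=0.45):
    """Detect rough word bounding boxes on a manuscript page scan."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    smooth = cv2.bilateralFilter(gray, 9, 75, 75)              # attenuate parchment texture
    binary = cv2.adaptiveThreshold(smooth, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 25, 15)
    binary = cv2.dilate(binary, np.ones((1, 9), np.uint8))     # horizontal: join letters into words
    binary = cv2.erode(binary, np.ones((3, 1), np.uint8))      # vertical: keep text lines apart
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if not (min_area < w * h < max_area) or w <= h:        # discard drawings, edges, tall blobs
            continue
        if cv2.countNonZero(binary[y:y + h, x:x + w]) / float(w * h) > fill_ratio:
            boxes.append((x, y, w, h))
    return boxes
```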

Figure 5. Image processing steps in detecting words. A. Part of source image B. Binarized and dilated image. C.1., C.2. Identified contours and bounding rectangles D. Two of the identified words

242


OCR

Tesseract OCR is probably the most famous optical character recognition system, developed initially by Hewlett-Packard and currently sponsored by Google. We used the API on each image saved after performing the scrambling, in order to see if any text can be extracted from it. There were no conclusive results, so further investigations are needed.
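A minimal way to run this step, assuming the pytesseract wrapper around the Tesseract API (the paper does not state which binding or language model was used; "heb" mirrors the Hebrew experiment mentioned in section 2.2):

```python
import glob
import pytesseract
from PIL import Image

# The scrambled images are assumed to be saved as PNG files in a "scrambled" folder.
for path in sorted(glob.glob("scrambled/*.png")):
    text = pytesseract.image_to_string(Image.open(path), lang="heb").strip()
    if text:
        print(path, "->", text)
```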

Template matching

Template matching is a well-known technique used in image processing, which has the goal of finding a sub-image (template) in a bigger image (search area). The process implies translating the template over the search area and calculating the similarity between each translated window and the original template [18]. An essential step is represented by the way the similarity measure is chosen in order to quantify the “matching”. Some of the most used similarity metrics include cross correlation and sum of absolute differences [19]. We used OpenCV in order to perform the template matching. By sliding the template window from left to right and top to bottom, a result matrix is created, holding the similarity value obtained at each position (x, y). We used a trackbar with a certain threshold in order to accept a lower or higher degree of similarity between the wanted characters (template) and the specific Voynich page area. Template matching was essential in order to identify gallows characters (characters which rise above the other characters) (figure 6).

Figure 6. Voynich gallows characters

We used template matching both on pages of the Voynich Manuscript (figure 7) and on various Medieval English manuscripts (figure 8). The interesting fact is that the template matching technique returned positive results also on some Medieval English pages, showing similarities between characters.
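A sketch of this matching step with OpenCV (illustrative only; the file names and the 0.7 acceptance threshold are assumptions standing in for the trackbar value):

```python
import cv2
import numpy as np

page = cv2.imread("voynich_page.jpg", cv2.IMREAD_GRAYSCALE)
gallows = cv2.imread("gallows_template.png", cv2.IMREAD_GRAYSCALE)

scores = cv2.matchTemplate(page, gallows, cv2.TM_CCOEFF_NORMED)
threshold = 0.7                                   # would be driven by the trackbar in the tool
ys, xs = np.where(scores >= threshold)            # top-left corners of accepted matches
h, w = gallows.shape
marked = cv2.cvtColor(page, cv2.COLOR_GRAY2BGR)
for x, y in zip(xs, ys):
    cv2.rectangle(marked, (int(x), int(y)), (int(x) + w, int(y) + h), (0, 0, 255), 1)
cv2.imwrite("gallows_matches.png", marked)
```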

243


Figure 7. Template matching on a Voynich page

Figure 8. Template matching on a Medieval English manuscript

3.3. Mirrors

Our third approach for deciphering the Voynich manuscript consists in using mirrors of various sizes and shapes (rectangular, oval) in order to transform the text (figure 9). We tried to place the mirrors in different ways: facing each other (figure 10), perpendicular to each other (figure 11), or at an acute or obtuse angle. When positioning the mirrors facing each other, a special phenomenon called “infinite reflection” is produced, but in the end, the original image is obtained. A single mirror produces one image, two mirrors perpendicular to each other produce three images, while two mirrors at an acute angle produce more than three

244

images. The mirrors were also positioned in different spots on a page: in the upper part, at the middle or at the bottom. We also tried to fold the page and position the mirrors between words, between lines or next to gallows characters. An interesting result (figure 12a) was obtained on a specific Voynich page, which contained symmetrical text and images from the beginning (figure 12b).

Figure 9. Single mirror placed at the top Figure 10. Two mirrors facing each other

245


Figure 11. Two perpendicular mirrors

Figure 12. Symmetrical images. Left: Mirrored drawings; Right: Original Voynich page.

246


Our tests with the mirrors did not, however, lead to any conclusive results. Even if the text might be mirrored (as we also simulated through computer software) or written from right to left (similar to Hebrew writing), the images obtained did not bring any known meaning to the mysterious writing.

4. CONCLUSIONS

After experimenting with a wide variety of simulated distortion devices, we can conclude that it is unlikely that the Voynich manuscript was written using such an optical instrument (mirrors or lenses). Other techniques, such as word scrambling based on keywords or gallows characters, didn’t show conclusive results either. Combining the power of artificial intelligence and OCR while trying to identify meaningful words and sentences seems to represent the best option to finally decode the mysterious manuscript.

REFERENCES

[1] Steindl, Klaus; Sulzer, Andreas (2011). "The Voynich Code - The World's Mysterious Manuscript".
[2] Schmeh, Klaus (2011). "The Voynich Manuscript: The Book Nobody Can Read". Skeptical Inquirer.
[3] Zandbergen, René. "Voynich MS - Long tour: Known history of the manuscript". Voynich.nu.
[4] Tiltman, John (1967). "The Voynich Manuscript, The Most Mysterious Manuscript in the World". NSA Technical Journal 12, pp. 41-85.
[5] D'Imperio, M. E. (1978). "The Voynich Manuscript: An Elegant Enigma". National Security Agency, pp. 1-152.
[6] Reddy, Sravana; Knight, Kevin (2011). "What we know about the Voynich Manuscript". Proceedings of the ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities.
[7] "Penn Biographies - William Romaine Newbold (1865-1926)". University of Pennsylvania.
[8] Winter, Jay (October 17, 2015). "The Complete Voynich Manuscript Digitally Enhanced Researchers Edition". Lulu Press, pp. 1-259.
[9] Flood, Alison (2014). "New clue to Voynich manuscript mystery". Science and Nature.
[10] Bax, Stephen (2014). "A proposed partial decoding of the Voynich script".
[11] Kałużna, Agnieszka; Syguła, Jacek; Jaśkiewicz, Grzegorz (2017). "The Key to The Voynich Manuscript".
[12] Hauer, Bradley; Kondrak, Grzegorz (2018). "Decoding Anagrammed Texts Written in an Unknown Language and Script". Transactions of the Association for Computational Linguistics.

247


[13] Rugg, Gordon; Taylor, Gavin (2016). "Hoaxing statistical features of the Voynich Manuscript". Cryptologia Journal 41, pp. 247-268.
[14] Montemurro, Marcelo A.; Zanette, Damian H. (2013). "Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis". PLOS ONE Journal.
[15] Krazy Dad (2017). Make your own kaleidoscope. Available online: https://krazydad.com/kaleido/, Accessed: April 27, 2018.
[16] Nixon, Mark S.; Aguado, Alberto S. (2012). "Feature Extraction & Image Processing for Computer Vision". Academic Press, 2012.
[17] OpenCV Official documentation: Image Thresholding, Finding contours in your image, Contour features.
[18] Kim, Yongmin; Sun, Shijun; Park, HyunWook (2004). "Template Matching Using Correlative Autopredicative Search". University of Washington.
[19] Yang, Ruigang (2016). "Object recognition and template matching". University of Kentucky, USA.

248


EXTENDED ERP USING RESTFUL WEB SERVICES CASE STUDY: WINMENTOR ENTERPRISE®

Lucia RUSU 1* Ervin GERŐCS-SZÁSZ 2

ABSTRACT

Enterprise Resource Planning systems are dominated by the explosion of Internet technologies and Web evolution. Concepts like the extended enterprise, ERP II and ERP III have become more attractive and suitable for the newest business processes. The purpose of this paper is to examine the new generation of ERP and the REST concept used in ERP II implementations. The paper offers a concrete case study using WinMENTOR ENTERPRISE® implemented with REST as an ERP II solution for e-commerce in a company selling bio products.

KEYWORDS: Extended ERP, RESTful web services, e-commerce

1. INTRODUCTION

The origins of Enterprise Resource Planning (ERP) go back to 1960, when the first inventory management and control systems appeared. A brief review points out the evolution: 1970 - MRP (Material Requirements Planning) systems, 1980 - MRP II (Manufacturing Resource Planning) systems, then in 1990 Enterprise Resource Planning (ERP). Internet development and companies' extension via e-business solutions forced another ERP expansion (starting around the millennium) to Extended ERP/ERP II systems. From then on, terms like Enterprise Systems (ES), Enterprise Information Systems (EIS) and Enterprise Applications (EA) describe better the actual processes in companies. ERP solves internal application integration: back-office and front-office (Espinoza and Windahl, 2008). ERP II proposes a web-centric solution, which involves a Web platform and new modules like e-business, business intelligence (BI), cloud SaaS and thin client-server. Extended ERP solves inter-organizational integration across the whole supply chain, both customers and suppliers, and also offers open source ERP systems as a solution for small enterprises (Mullaney, 2012). Web 2.0, 3.0 and 4.0 facilities were used in Extended ERP, and from 2010 software vendors and developers offer a Post-Modern ERP or Entire Resource Planning (ERP III). This approach includes BI and analytics, RFID, performance IT tools, the Internet of Things (IoT), ecosystems, in-memory technologies, mobility, and increased integrated functionality (Wood, 2011).

1* corresponding author, PhD., Professor, "Babes Bolyai" University, Cluj-Napoca, Romania, [email protected] 2 Software Developer, L&E Solutions, Cluj-Napoca, [email protected]

249


This paper presents a solution for Extended ERP implementation for a medium-size company which uses an e-commerce solution as its main relation with customers. After a short introduction, section 2 describes the main issues of ERP systems and RESTful Web services. The next section offers a practical implementation of WinMENTOR ENTERPRISE® with RESTful Web services, together with a functional evaluation and results based on quantified business parameters. Conclusions and future work are described in section 4.

2. THEORETICAL FUNDAMENTALS

2.1. From ERP to ERPII and ERPIII

For the argumentation of ERP evolution we find several definitions, all of which agree that ERP is the core system in every ES software. Motiwalla & Thompson's approach (2012) puts the accent on the internal value chain and, from that point of view, "ERP systems are the specific kind of enterprise systems to integrate data across and be comprehensive in supporting all the major functions of the organization". If we analyse an ERP from the supply chain perspective, it "is a modular software package for integrating data, processes, and information technology, in real-time, across internal and external value chains" (Shang & Seddon, 2002). Another comprehensive definition for ERP II underlines that it "extends the foundation ERP system's functionalities such as finances, distribution, manufacturing, human resources and payroll to customer relationship management, supply chain management, sales-force automation, and Internet-enabled integrated e-commerce and e-business" (IGI-Global, 2017).

Table 1: ERP, extended ERP and their functionalities (de Búrca et al., 2005).
Functionality (ERP): Extended ERP additions
Procurement: e-procurement, SCM
Production: SCM, CRM, supplier web
Sales: e-commerce, SCM, CRM
Distribution: e-commerce, SCM

As a synthesis, Table 1 shows how ERP II extends the business process functions with e-procurement, SCM, CRM, e-commerce and supplier web, and Figure 1 shows in a relevant manner how customers and suppliers are placed in the central role of the companies' business, linked together with the central database's information, and how they cooperate with the companies via the SCM, CRM, e-commerce and e-procurement modules.

250


Figure 1. Conceptualization of the extended ERP (Espinoza and Windahl, 2008)

The ERP marketplace is dominated by SAP, which has offered a revolutionary approach to business starting with MRP in the '70s, followed by SAP R/3, SAP ECC, SAP Business One and SAP HANA. The second serious competitor is Oracle, but both have serious impediments to implementation: high price and complexity. Microsoft is another competitor, with Microsoft Dynamics AX, an ERP system fit for Windows systems and for companies in quest of project and financial software. All are fit for large companies. For small and medium companies we find: Infor, with a solution for discrete production (Infor Discrete Manufacturing Essentials), Industrial and Financial Systems IFS, Abas ERP for manufacturing and distribution, Epicor, Syspro, BatchMaster ERP, Sage (Shaul, 2015). Romanian solutions for ERP systems are dominated by Borg and WinMENTOR ENTERPRISE®, followed by Clarvision and other ERP systems.

2.2. RESTful Web Services

The concept of REST (Representational State Transfer) was introduced by Roy Fielding. It describes a new architectural style of Web applications and network systems (Sun, 2009). For gaining extensibility of work on the Internet, Web servers, clients, and intermediaries share four principles, which Fielding calls REST constraints:
1. Identification of resources
2. Manipulation of resources through representations
3. Self-descriptive messages
4. Hypermedia as the engine of application state (HATEOAS)
In fact these principles form a consistent metaphor of systems and interactions on the Web. In more detail, resources (1) can be identified with everything that can be named by a target of hypertext (e.g., a file, a script, a collection of resources). The client receives a representation (2) of that resource as a response to a request for it. The representation of the resource may have a different format than the resource on the server. Manipulation of resources is done via messages on the Web, as HTTP methods.

251


All links or URIs keep the state of any client-server interaction in the hypermedia (4) which is exchanged. The client and the server exchange state information through messages and are maintained stateless (Fielding, 2000). Another comparison study between RESTful and WS-* services was focused on three levels: 1) architectural principles, 2) conceptual decisions, and 3) technology decisions (Pautasso et al., 2008). Both support architectural principles consisting of protocol layering, heterogeneity and loose coupling to location (or dynamic late binding). As conceptual decisions, Pautasso et al. compare 9 different designer decisions; RESTful services make 8 of them, while WS-* makes only 5, offering more alternatives than RESTful services. The technology decision level shows 10 solutions for both styles. Relevant principles for systems available on the Web allow identifying four system properties of RESTful services: 1) uniform interface, 2) addressability, 3) statelessness, and 4) connectedness. WS-* services offer three of these four properties, while RESTful Web services include all of them, in resources, URIs, representations, and links (Richardson and Ruby, 2007).

3. EXTENDED ERP IMPLEMENTATION USING RESTFUL WEB SERVICES

ERP implementations are widely acknowledged even in small and medium-sized enterprises. The diversification of web solutions and technologies and the explosion of e-commerce alternatives have spurred the modernization and development of enterprise solutions towards ERP II or ERP III. Beyond the schemes presented in paragraph 2.1, we aim to offer a concrete ERP II implementation solution using WinMENTOR ENTERPRISE® and RESTful web services (WME®, 2018). Starting with 2008, TH JUNIOR SRL offers a modularized and fully integrated ERP - WinMENTOR ENTERPRISE® (WME®), based on Oracle DB technology as a host central database. During the following few years WME® was continuously developed from a classic ERP solution (containing accounting, finance, human resource management, production management, logistic management, CRM and SCM) to a modern approach linking together several newer modules: DataWareHouse, WME® Analytics (BI), WME® EDI, Import/Export Module, Web-Rest-Server (WME®, 2018). Thus, the developers offer a solution for the ERP II approach, and WME® can be implemented successfully in corporate companies or in small and medium-size companies which decided to extend their business via e-commerce. We will present the characteristics of the main modules involved in implementing the ERP II solution for a company producing bio products, which sells them on the Internet.

252


Figure 2. Configuration option for REST Web service

The RESTful Web service allows other virtual applications to communicate with the Oracle Database (DB) of WinMENTOR ENTERPRISE® (WME®). The functions contained in this server offer the following features for virtual businesses (WinMENTOR®, 2018):
• Update functions, which refer to: article, new client, new order, new order from different management, customer outgoing, vendor inputs, house or bank transactions; this gives the advantage of using the central database in the virtual business.
• Query functions: client-side databases, active discount criteria, earnings, item information, order information, client order status, price promotions; these are designed to facilitate access to the virtual client's information in the DB.
Another category of functions is focused on production process query functions: release stage in production, subunits nomenclature, status queries, inventory, accumulated inventory, not discharged on management. The EDI (Electronic Data Interchange) module is used for importing documents from other applications, or exporting documents in various formats. It allows the export of payments to banks in MT103 format, the import of payments made, bank or POS receipts, the import of invoices, notices, orders, transfers, teaching notes, etc. It also facilitates the import of client orders in XML, TecNET and EdiNET formats and is permanently subject to customization, depending on customers' demands. The link between both modules and the Oracle DB is shown in Figure 2, and a sample of the order status of virtual customers is presented in Figure 3.
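As an illustration of how an external e-commerce application might consume such a service over HTTP, the sketch below issues a single query; the base URL, endpoint path and response handling are assumptions for the example, not the documented WME® Web-Rest-Server API:

```python
import requests

# Endpoint path, parameters and response fields are illustrative assumptions,
# not the documented WME Web-Rest-Server API.
BASE_URL = "https://erp.example.com/wme/rest"

def get_order_status(order_id, api_key):
    """Query a (hypothetical) order-status resource exposed by the REST server."""
    response = requests.get(
        f"{BASE_URL}/orders/{order_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()        # HTTP errors surface as exceptions
    return response.json()

if __name__ == "__main__":
    print(get_order_status("CMD-2018-0042", api_key="demo-key"))
```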

253


Figure 3. Orders status

Commercial documents related to e-commerce activity (especially invoices and warranty certificates) are automatically generated in PDF format, according to customer requirements, sent to every customer via email and stored in special directories (Figure 4).

Figure 4. Commercial Documents

4. EVALUATION OF THE PROPOSED SOLUTION

The practical implementation of the WME® as an ERP II solution was done for the benefit of a bio-company that did not have an earlier ERP solution and wanted to develop its virtual business in parallel. The WME® and Web Rest Server solutions allow the company to generate more than 1,000 bills per month, with a single operator, and also to process billing documents for virtual payments in less than an hour. This means that the documents received by the banks are processed by the REST and EDI modules, to enable the automatic generation of the accounting notes related to the collections of the issued bills.

254


Commercial documents per month: Jan 679, Feb 735, Mar 768, Jun 812, Jul 896, Aug 923, Sep 998, Oct 1024.

Figure 5. Dynamic of Commercial Documents

Since January 2018, when the implementation was completed, and until October of the same year, the number of automated business documents has increased from 679 to 1024 (Figure 5). The automatic generation of PDF bills and warranty certificates and their automatic, just-in-time sending to all customers are essential to eliminate manual effort at the customer's company. The major benefits consist in the increased number of automated documents over a period of time, besides the reduced number of employees. The company has only one operator in the delivery process, and a fast courier company is charged to deliver the products requested by customers. By acquiring another ERP II module, WME Analytics (BI under Qlik), the company could optimize its decision making. This would increase efficient management and business evolution. With this solution, the client benefitting from the implementation estimates that it will be able to increase the activity volume by at least 10 times.

CONCLUSIONS

This paper presents an extended enterprise implementation for a company specialized in bio production, which offers an e-commerce solution as the main customer relationship tool for developing its business. The core of the implementation is WinMENTOR ENTERPRISE®, a Romanian ERP solution, which is extended with RESTful Web services and EDI modules. In this manner all the commercial documents are generated as PDF files and sent via email to customers. The payment process is linked with the central Oracle database via the Web-REST and EDI modules. By implementing an ERP II solution, the company automated the order processing and payment processes, increased its sales and offered a modern solution to its customers. Besides all this, the company reduced the number of sales employees, increased its business indicators and started an initiative to become a paperless company.

255


ACKNOWLEDGMENTS

This research was supported by TH JUNIOR SRL, which offered the WinMENTOR ENTERPRISE® software solution for the development of our work.

REFERENCES

[1] de Búrca, Sean; Fynes, Brian; Marshall, Donna. Strategic technology adoption: extending ERP across the supply chain. The Journal of Enterprise Information Management, Vol. 18, No. 4, pp. 427-440, (2005)
[2] Fielding, R., Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, Technical report, University of California, Irvine, (2000)
[3] https://www.igi-global.com/dictionary/extended-erp/10650
[4] Richardson, L. and Ruby, S. RESTful Web Services. O'Reilly, (2007)
[5] Motiwalla, L., and Thompson, J. Enterprise Systems for Management, (2nd ed.), Prentice Hall, (2012)
[6] Mullaney, E., The Difference Between ERP And ERP II, Enterprise Resource Planning (ERP) and Analytics Software, SAP Business One Enterprise Software, (2012)
[7] Espinoza, Pablo; Windahl, Torbjörn, How do Organizations Expand their ERP beyond its core capabilities, PhD thesis, Mälardalen University, School of Sustainable Development of Society and Technology, (2008)
[8] Adamczyk, Paul; Smith, Patrick H.; Johnson, Ralph E.; Hafiz, Munawar, REST and Web Services: In Theory and In Practice, (2015)
[9] Pautasso, C., Zimmermann, O. and Leymann, F., Restful web services vs. "big" web services: making the right architectural decision. In WWW '08: Proceedings of the 17th International Conference on World Wide Web, pages 805-814, New York, NY, USA, ACM, (2008)
[10] Shang, S., and Seddon, P.B., "Assessing and Managing the Benefits of Enterprise Systems: The Business Manager's Perspective", Information Systems Journal, 12 (4), pp. 271-299, (2002)
[11] Shaul, C., Top 10 ERP Solutions, http://erpselectionhelp.com/top-10-erp-solutions/, (2015)
[12] Sun, B., A multi-tier architecture for building RESTful Web services, IBM, 2009, ibm.com/developerWorks/
[13] WinMENTOR®, https://portal.winmentor.ro/wme/p/module, (2018)
[14] Wood, B., ERP vs. ERP II vs. ERP III Future Enterprise Applications, http://www.r3now.com/erp-vs-erp-ii-vs-erp-iii-future-enterprise-applications/, (2010)

256


THE FACTORIZATION OF X^255 – 1 IN Z2[X]

Adrian ATANASIU 1* Bogdan GHIMIŞ 2

ABSTRACT

AES (Rijndael) is considered the most prolific and widely used ([2]) encryption algorithm, and it has deep roots in Galois field theory. The mathematical operations that occur are done in a special finite field - GF(2^8) - that is obtained by factorizing Z2[X] over the polynomial 1 + X + X^3 + X^4 + X^8. We have been wondering why that polynomial has been chosen and if there are some hidden properties of that polynomial that others don't have. In this paper, we are going to look into the structure of GF(2^8) and try to find some answers regarding this choice made by the authors of AES.

1. INTRODUCTION

First, we are going to present a short set of basic mathematical concepts, followed by the inner workings of AES, pointing out its computations using the GF(2^8) field. Finally, we will look into other papers published by the authors of AES and see how they solved another similar problem.

2. MATHEMATICAL PRELIMINARIES

2.1. Groups and Rings

Definition 2.1.1. A group (G, *) is defined by a set G and a binary operation * on the set, that obeys the following properties:
• the binary operation * is closed on G (taking any two elements x, y from G and applying the binary operation, the result is still an element of G)
• the binary operation * is associative (i.e. (x * y) * z = x * (y * z) for all x, y, z in G)
• there exists an identity element in G, noted 1 (i.e. 1 * x = x * 1 = x for all x in G)
• each element in G has an inverse (i.e. for every x in G there is an element x' in G such that x * x' = x' * x = 1)
Definition 2.1.2. An abelian group (G, *) is a group where the binary operation is also commutative.

1* corresponding author, Professor, PhD, University of Bucharest, [email protected] 2 University of Bucharest, [email protected]

257


Definition 2.1.3. In a group (G, *), a subset S generates G if any element of G can be expressed as a combination of elements of S using the binary operation *.
Definition 2.1.4. A group (G, *) is called cyclic if it can be generated by a single element, which means that all the elements are actually "powers" of a single item α, called a generator ([5]).
Definition 2.1.5. A ring (R, +, *) is defined by a set R and two binary operations (additive and multiplicative), so that:
• (R, +) is an abelian group, with the identity element noted 0
• (R, *) is a monoid, with the identity element noted 1
• the multiplication is distributive over the addition: x * (y + z) = x * y + x * z and (y + z) * x = y * x + z * x, for all x, y, z in R
Definition 2.1.6. A commutative ring is a ring in which the multiplicative operation is commutative.

2.2. Fields

Definition 2.2.1. A field (K, +, *) respects the following properties:
• (K, +, *) is a commutative ring
• every element x in K \ {0} has a multiplicative inverse x^-1 with x * x^-1 = 1
Definition 2.2.2. A finite field is a field K with a finite number of elements. This number is called the order of K and denoted by ord(K).
Theorem 2.2.3. If K is a finite field then ord(K) = p^n, where p is a prime number and n is a positive integer.
Usually we shall work with the field Z2 = {0, 1}, where the addition is XOR and the multiplication is defined by x * y = 1 iff x = y = 1.
Definition 2.2.4. A polynomial ring K[X] in variable X over a field K is the set of polynomials P(X) = a_0 + a_1*X + a_2*X^2 + ... + a_n*X^n, with a_i in K, having as operations the usual addition and multiplication of polynomials. The degree (deg) of a polynomial represents the largest power of X for which the coefficient a_n is not null. The fundamental result used here is the following: for every two polynomials P and Q, with Q ≠ 0, there are (unique) polynomials q (quotient) and r (remainder) so that:
• P = q * Q + r
• deg(r) < deg(Q)
Furthermore, we can define the greatest common divisor (gcd) and the least common multiple (lcm) for polynomials. We can calculate them using - first - Euclid's algorithm (for the gcd), then (for the lcm) the relation gcd(P, Q) * lcm(P, Q) = P * Q.
Definition 2.2.5. An irreducible polynomial is a polynomial that cannot be written as a product of nontrivial polynomials over the same field.

258


Definition 2.2.6. A root of a polynomial P is an element r so that P(r) = 0, where P(r) = a_0 + a_1*r + a_2*r^2 + ... + a_n*r^n.
Definition 2.2.7. A minimal polynomial of a value α is the polynomial m of lowest degree such that α is a root of m.
Definition 2.2.8. A primitive polynomial is a polynomial that generates all elements of an extension field.
In order to construct an extension of a field K, we will need:
• the polynomial ring K[X],
• an irreducible polynomial f in K[X].
Then the quotient ring K[X] / f is defined as follows:
K[X] / f = {r | there is P in K[X] so that P = q * f + r, deg(r) < deg(f)}.
We say that r = P (r equals P) modulo the irreducible polynomial f. Therefore K[X] / f will contain all polynomials of degree less than deg(f).

Theorem 2.2.9. Let K[X] be a ring of polynomials and f an irreducible polynomial. Then (K[X] / f, +, *) is a field, where the product is performed modulo the polynomial f.

In particular, for K = Z2 and for an irreducible polynomial f of degree 8, we define GF(2^8) = Z2[X] / f as the set of bytes, having a field structure that depends on the chosen polynomial f. We specify that the byte a0a1a2...a7 corresponds to the polynomial a_0 + a_1*X + a_2*X^2 + ... + a_7*X^7. The field GF(2^8) = Z2[X] / (1 + X + X^3 + X^4 + X^8) has 2^8 elements, and the polynomial 1 + X (first row in Annex 2) can be chosen as generator for the multiplicative group (GF(2^8)*, *).
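As a small worked example (ours, not taken from the paper) of reduction in this representation, using the AES modulus:

```latex
% In GF(2^8) = Z_2[X]/(1 + X + X^3 + X^4 + X^8) the modulus vanishes, so
% X^8 \equiv 1 + X + X^3 + X^4 (all coefficients are taken mod 2).
\begin{align*}
X^7 \cdot X &= X^8 \equiv 1 + X + X^3 + X^4,\\
(1 + X^7)(1 + X) &= 1 + X + X^7 + X^8
  \equiv 1 + X + X^7 + (1 + X + X^3 + X^4) = X^3 + X^4 + X^7.
\end{align*}
```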

3. ADVANCED ENCRYPTION STANDARD

AES is a symmetric block cipher encryption algorithm with which one can partition the data into blocks, encrypt it and then send it through an insecure channel. Being a symmetric encryption algorithm, it uses the same encryption key for encrypting and decrypting the data. Encryption steps:
• Key-Expansion step (the symmetric key is used to derive Round-Keys)
• In the first round we execute AddRoundKey
• In the next (9, 11 or 13) rounds the following operations take place:
o SubBytes
o ShiftRows
o MixColumns
o AddRoundKey
• In the final round, all operations are performed except MixColumns:
o SubBytes

259


o ShiftRows
o AddRoundKey
In the SubBytes step, a substitution box (S-box) is used, in which each byte is swapped with another one in a deterministic fashion, using a lookup table. This lookup table was derived from the multiplicative inverse over GF(2^8), followed by an affine transformation. This is the only nonlinear step of the algorithm ([1]). The MixColumns step ensures that if only one bit of the input text is modified, at least half of the output bits would change ([4]). Most of the operations of this algorithm take place in the finite field GF(2^8) using the irreducible polynomial 1 + X + X^3 + X^4 + X^8. All the irreducible polynomials of degree 8 over Z2 are irreducible factors of X^255 - 1, and it is because of this that the factorization is particularly interesting.

4. ABOUT IRREDUCIBLE FACTORS OF X^255 - 1

There are 30 irreducible polynomials of degree 8 over Z2; they can be found in Annex 1. Each one of them could have been used for the AES encryption system. One possible reason for which the peculiar polynomial 1 + X + X^3 + X^4 + X^8 has been chosen is the fact that it has only five terms, and it is the first polynomial in lexicographical order (among all irreducible polynomials of degree 8, this one has the smallest exponents). We can also note that the polynomial 1 + X + X^3 + X^4 + X^8, although irreducible, is not primitive: X is not a generator of GF(2^8)* (the multiplicative group of Z2[X] / (1 + X + X^3 + X^4 + X^8)), its period being 51. One of AES's inventors, Vincent Rijmen, along with Paulo Barreto, has proposed a hash function, WHIRLPOOL ([6]), which is based on AES. This hash function uses the same Galois field GF(2^8), but uses 1 + X^2 + X^3 + X^4 + X^8 as the irreducible polynomial. It is specified by the authors that this polynomial was chosen because it was the first polynomial listed in Table C from [3] for which the primitive polynomial generates the whole GF(2^8). Another possible reason for choosing the first polynomial for Rijndael is processing speed on 8 and 32 bit processors. In the original specification of this algorithm, it is asserted that the operations that take place in this field can be very efficient both for 8-bit processors (smartcards) and for 32-bit processors (PCs) ([1]). Moreover, the construction of the S-boxes for AES was made in such a manner that the polynomials are simple, but there also exists an algebraic complexity when working in GF(2^8) ([1]). Rijndael's authors initially considered that the S-box should be the mapping x => x^-1 in GF(2^8), but the algebraic complexity was weak and some attacks (e.g. the interpolation attack) could be performed. Because of that, an affine transformation was added.
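The period of 51 can be checked with a short computation (our sketch, not the paper's code); polynomials over Z2 are encoded as integers whose bit i holds the coefficient of X^i, so the AES modulus is 0x11B and the WHIRLPOOL modulus is 0x11D:

```python
def gf2_mulmod(a, b, mod):
    """Multiply two Z2[X] polynomials (bit i = coefficient of X^i) modulo `mod`."""
    deg = mod.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:
            a ^= mod                    # reduce as soon as the degree reaches deg
    return result

def order_of_x(mod):
    """Multiplicative order of X in Z2[X]/(mod)."""
    acc, k = 0b10, 1                    # acc = X^1
    while acc != 1:
        acc = gf2_mulmod(acc, 0b10, mod)
        k += 1
    return k

print(order_of_x(0x11B))                # AES modulus 1 + X + X^3 + X^4 + X^8: prints 51
print(order_of_x(0x11D))                # WHIRLPOOL modulus 1 + X^2 + X^3 + X^4 + X^8: prints 255
```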

260


4.1. The factorization of X^255 - 1

For the factorization of X^255 - 1, we consider GF(2^8) = Z2[X] / (1 + X^2 + X^3 + X^4 + X^8), using the first primitive polynomial listed in Annex 2. Because 1 + X^2 + X^3 + X^4 + X^8 is a primitive polynomial, we can compute the factorization of X^255 - 1. First of all, if α is a root of 1 + X^2 + X^3 + X^4 + X^8 = 0, we shall compute the minimal polynomials of each power of α.
The result is a table pairing each minimal polynomial with its set of conjugate roots α^i, α^(2i), α^(4i), ...: the 30 irreducible polynomials of degree 8 (eight roots each), three polynomials of degree 4 (four roots each), one polynomial of degree 2 (two roots) and the polynomial 1 + X (whose single root is α^0 = 1) - 35 minimal polynomials in total, covering all 255 powers of α.
In order to compute the minimal polynomial of a value α^i, we first compute its order k, i.e. for α^i we compute α^i, α^(2i), α^(4i), ..., until α^(2^k * i) = α^i (modulo 1 + X^2 + X^3 + X^4 + X^8).

261


Because α^(2^k * i) = α^i (so m(α^i) = 0) and because m = a_0 + a_1*X + a_2*X^2 + ... + a_(k-1)*X^(k-1) + X^k, we will solve the linear system m(α^i) = 0 in the unknown coefficients a_0, ..., a_(k-1), where each power α^(j*i) is expressed as a binary vector of length 8 over Z2. These polynomials are all the irreducible factors of X^255 - 1. To verify that, we can compute their least common multiple: because gcd(P, Q) = 1 for any pair of these polynomials, the lcm is directly their product, that is 1 + X^255 (or X^255 - 1, which is the same in the binary case).
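The structure of this factorization can be cross-checked independently (again our sketch, not the paper's program): the root sets above are exactly the cyclotomic cosets of 2 modulo 255, so counting them must yield 35 minimal polynomials, 30 of degree 8, with 255 roots in total:

```python
def cyclotomic_cosets(n=255):
    """Group the exponents 0..n-1 into classes {i, 2i, 4i, ...} (mod n); each class
    is the exponent set of the conjugate roots of one minimal polynomial."""
    seen, cosets = set(), []
    for i in range(n):
        if i in seen:
            continue
        coset, j = [], i
        while j not in coset:
            coset.append(j)
            j = (2 * j) % n
        seen.update(coset)
        cosets.append(coset)
    return cosets

cosets = cyclotomic_cosets()
print(len(cosets))                                  # 35 minimal polynomials in total
print(sum(len(c) == 8 for c in cosets))             # 30 of them have degree 8
print(sum(len(c) for c in cosets))                  # 255 roots, matching X^255 - 1
```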

BIBLIOGRAPHY

[1] Joan Daemen, Vincent Rijmen - The Rijndael Block Cipher - AES Proposal, https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf, 1999
[2] Mansoor Ebrahim, Shujaat Khan, Umer Bin Khalid - Symmetric Algorithm Survey: A Comparative Analysis - International Journal of Computer Applications (0975 - 8887), 2014
[3] Rudolf Lidl, Harald Niederreiter - Introduction to Finite Fields and their Applications - Cambridge University Press, 1986, p. 378, Table C.
[4] Claude Shannon - A Mathematical Theory of Cryptography, 1945, part 3, p. 92.
[5] https://en.wikipedia.org/wiki/Finite_field
[6] https://web.archive.org/web/20170817134205/http://www.larc.usp.br:80/~pbarreto/WhirlpoolPage.html

Annex 1. All irreducible polynomials of degree 8 over Z2:

[List of the 30 irreducible polynomials of degree 8 over Z2; the exponents were lost in extraction. The entry marked (AES) is 1 + X + X^3 + X^4 + X^8, the polynomial used by Rijndael.]


[List continued; the entry marked (WHIRLPOOL) is 1 + X^2 + X^3 + X^4 + X^8, the polynomial used by the WHIRLPOOL hash function.]

These polynomials were found in an incremental manner, starting from the irreducible polynomials of degree less than or equal to 2. The method used for finding the irreducible polynomials of degree n was to generate the set of all polynomials of that degree and then remove from that set the reducible ones. A reducible polynomial of degree n can be obtained by taking two polynomials, one of degree p and one of degree q, where p > 0, q > 0, p + q = n, and multiplying them together.
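A compact sketch of this sieve is shown below (our own Python code; polynomials over Z2 are encoded as bit masks, with bit i holding the coefficient of X^i). It recovers the count of 30 irreducible polynomials of degree 8 and confirms that the AES polynomial is among them.

```python
# Sieve: a degree-8 polynomial over Z2 is reducible iff it is the product of two
# polynomials of degrees p, q > 0 with p + q = 8.
def clmul(a, b):
    """Carry-less (Z2[X]) multiplication of polynomials encoded as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

reducible = set()
for p in range(1, 8):                       # degree of the first factor
    q = 8 - p                               # degree of the second factor
    for a in range(1 << p, 1 << (p + 1)):   # all polynomials of exact degree p
        for b in range(1 << q, 1 << (q + 1)):
            reducible.add(clmul(a, b))

irreducible = [f for f in range(1 << 8, 1 << 9) if f not in reducible]
print(len(irreducible))        # expected: 30, matching Annex 1
print(0x11B in irreducible)    # AES polynomial 1 + X + X^3 + X^4 + X^8 -> True
```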

Annex 2. All primitive polynomials which can generate GF(2^8) = Z2[X] / (1 + X + X^3 + X^4 + X^8)

The number in each entry corresponds (in base 10) to the vector representation of the polynomial given on its right. For example, for X + X^2 + X^3 the vector representation is [0, 1, 1, 1, 0, 0, 0, 0], that is 1110 in binary, and (1110)_2 = (14)_10.

The polynomial column of the table was lost in extraction; the 128 primitive elements, given by their base-10 values, are:
3, 5, 6, 9, 11, 14, 17, 18, 19, 20, 23, 24, 25, 26, 28, 30, 31, 33, 34, 35, 39, 40, 42, 44, 48, 49, 60, 62, 63, 65, 69, 70, 71, 72, 73, 75, 76, 78, 79, 82, 84, 86, 87, 88, 89, 90, 91, 95, 100, 101, 104, 105, 109, 110, 112, 113, 118, 119, 121, 122, 123, 126, 129, 132, 134, 135, 136, 138, 142, 143, 144, 147, 149, 150, 152, 153, 155, 157, 160, 164, 165, 166, 167, 169, 170, 172, 173, 178, 180, 183, 184, 185, 186, 190, 191, 192, 193, 196, 200, 201, 206, 207, 208, 214, 215, 218, 220, 221, 222, 226, 227, 229, 230, 231, 233, 234, 235, 238, 240, 241, 244, 245, 246, 248, 251, 253, 254, 255.


We remark that 50% of the elements of GF(2^8) (128 out of 256) are primitive and can generate the whole field.


A VOTING-BASED IMAGE SEGMENTATION SYSTEM

Valentin Gabriel MITREA 1* Mihai-Cristian PÎRVU 2 Mihai-Lucian VONCILĂ 3 Costin-Anton BOIANGIU 4

ABSTRACT

Image segmentation is an important topic in the field of Computer Vision and has numerous practical applications. It is often used as a preprocessing step for other higher level image processing algorithms such as: text analysis, object identification, feature extraction, etc. However, there is no image segmentation technique that can produce perfect results on any type of image. Numerous algorithms exist and each has its upsides and downsides depending on the input. This paper proposes two voting algorithms that combine the results of some well-known segmentation techniques into a final output which aims to be, in as many cases as possible, better than the individual segmentations.

KEYWORDS: image processing, segmentation, voting technique, region growing, cluster, graph, mean shift

1. INTRODUCTION

Image segmentation represents a class of image processing algorithms that have the purpose of organizing an input image into groups of pixels. These groups are formed based on some criterion such as color intensity. Therefore, all the pixels from a particular segment have to be similar to each other. Considering the fact that no image segmentation technique can provide ideal results on any given image, we can split the segmentation algorithms into two categories: the ones that have the tendency to perform oversegmentation and the ones that have the tendency to perform undersegmentation. Oversegmentation means dividing the image into a very large number of segments. This has the advantage of focusing on details, but the end result will be affected by the noise present in the image.

1* corresponding author, Engineer, "Politehnica" University of Bucharest, Bucharest, Romania, [email protected] 2 Engineer, "Politehnica" University of Bucharest, Bucharest, Romania, [email protected] 3 Engineer, "Politehnica" University of Bucharest, 060042 Bucharest, Romania, [email protected] 4 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


On the other hand, undersegmentation means dividing the image into a small number of segments. This has the advantage of removing noise at the cost of ignoring some details from the image. In Figure 1, the popular Lena image can be seen segmented two times. On the second row, there is an oversegmented result (to the left) and an undersegmented result (to the right).

Figure 1. Comparison between oversegmentation and undersegmentation

The goal of this paper is to present a voting technique that combines the results of some well-known image segmentation algorithms. On their own, these algorithms do not provide optimal results, but by combining them through the voting technique, a final, improved output is expected to be generated.

2. RELATED WORK

Ana Fred proposes in [1] a majority voting combination of clustering algorithms. The idea is that a set of partitions is obtained by running multiple clustering algorithms and, then, the outputs are merged into a single result. To achieve this, the results of the clustering algorithms are mapped into a matrix, called the co-association matrix, where each pair (i, j) represents the number of times the elements i and j were found in the same cluster. Fred's approach represented a starting point in [2] for the implementation of a voting image segmentation algorithm. In order to produce a meaningful result, the co-association matrix was used in combination with a weighted voting technique.

3. SEGMENTATION ALGORITHMS

In this section, there will be a discussion about the segmentation algorithms that generate the partial results which are used in our voting algorithms. The algorithms chosen for implementation are: Region-Growing, Superpixel, Graph-Based and Mean-Shift. The selection was made in order to have a combination of oversegmentation and undersegmentation techniques. This way, as many features from the input image as possible are covered.


3.1. Region-growing

Region-Growing [3] is based on the fact that pixels which belong to the same region of an image must have similar properties. A region is created by starting from a single generator and iteratively adding neighbors which verify a criterion. The criterion chosen for the implementation is that the color intensity difference between the neighbor pixel and the seed must be lower than a threshold value. The algorithm starts from an input image where each pixel is labeled as not belonging to any region. While there are still pixels not allocated to a segment, the algorithm picks one and constructs a new region around it. Pixels are considered neighbors based on the four-neighboring connectivity rule. When the algorithm execution finishes, each pixel will be marked with an index that represents the region to which it belongs. The main steps of the algorithm are the following:
• label = 0
• mark each pixel in the image as unlabeled
• while there are still unlabeled pixels
  o seed = one of the unlabeled pixels
  o Q = queue of pixels
  o labels[seed] = label
  o add seed to Q
  o while Q is not empty
    - pixel = front of Q (remove it from Q)
    - for each neighbor of pixel
      - if neighbor is unlabeled and neighbor is similar to seed
        o labels[neighbor] = label
        o add neighbor to Q
  o label = label + 1
A very important step of the algorithm is the selection of a new generator pixel. There are two implemented techniques for this choice: random and pseudo-intelligent. The first method will randomly pick a seed from the remaining unlabeled pixels, while the second will favor generator pixels that have a high number of similar neighbors and leave the others to the end.

3.2. Superpixel

The Superpixel algorithm [4] is a variant of the K-means clustering algorithm, where the input is an image and the output is a clustered image, with values for each position corresponding to the index of the cluster that position has been assigned to.

The K-means clustering algorithm is used in general to partition a dataset {x1…xn} (each xi is a D-dimensional point) into K clusters, where K is considered a hyper-parameter which is given. Each cluster j has a mean value µj (also called centroid) and each point xi is governed by a 1-of-K vector ri,j, where ri,j = 1 means that xi is in cluster j. The goal of the algorithm is to assign the points to a cluster such that the distance of any point xi to its cluster mean µj is as small as possible. Formally, this can be interpreted as to minimize the following function, called the distortion measure:


J = Σi Σj r(i,j) · ||xi - µj||^2

This function is minimized using an iterative approach with two steps, called the Expectation and Maximization steps, which are executed until a convergence criterion is achieved. Initially, the values µj are chosen at random, but this can be altered to provide faster convergence and better results. In the Expectation step, each point is reassigned to the closest centroid based on a distance function:

r(i,j) = 1 if j = argmin_k ||xi - µk||^2, and r(i,j) = 0 otherwise.

Then, in the Maximization step, the centroid of each cluster is updated by taking the mean value of the points currently assigned to that cluster:

µj = ( Σi r(i,j) · xi ) / ( Σi r(i,j) )

For the image segmentation problem, K-means is used as follows. Each pixel I(i,j) represents a point xi in the formulation above. The clustering is done in the color space, thus both the distance function and the centroids are represented in this space. Having a good initialization mechanism is one of the most important aspects of the K-means algorithm. A potential choice for initialization is to ensure that, while the centroids are initialized at random, the remaining K-1 are also placed as far away from the already chosen ones as possible. In practice, it is enough to reach a state where only a small percentage of the pixels change their centroids between two iterations, because reaching a fully convergent state might take too long and the improvement would be too slim to outweigh the time. A common value is 5% of the image. Below is an example of the algorithm's output with different K values, on a satellite image, showing how the algorithm can be used for undersegmentation and oversegmentation just by varying its only parameter.
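A minimal NumPy sketch of this clustering step is given below (our own simplification, not the paper's implementation): K-means is run directly on the pixel colors of an H x W x 3 image and stops when fewer than 5% of the pixels change their centroid between two iterations.

```python
import numpy as np

def kmeans_pixels(image, k, change_threshold=0.05, rng=np.random.default_rng(0)):
    """Cluster the colors of an H x W x 3 image into k clusters; returns labels and centroids."""
    pixels = image.reshape(-1, 3).astype(np.float64)            # one row per pixel
    centroids = pixels[rng.choice(len(pixels), k, replace=False)]
    labels = np.zeros(len(pixels), dtype=np.int64)
    while True:
        # Expectation step: assign every pixel to its closest centroid (squared distances)
        d2 = ((pixels ** 2).sum(1)[:, None] + (centroids ** 2).sum(1)[None, :]
              - 2.0 * pixels @ centroids.T)
        new_labels = d2.argmin(axis=1)
        # Maximization step: move each centroid to the mean of its assigned pixels
        for j in range(k):
            members = pixels[new_labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        changed = np.mean(new_labels != labels)
        labels = new_labels
        if changed < change_threshold:                          # relaxed convergence criterion
            return labels.reshape(image.shape[:2]), centroids
```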

Original image    Clustered image (K=4)


Clustered image (K=8)    Clustered image (K=20)
Figure 2. Superpixel segmented images

3.3. Graph-based

The algorithm chosen for implementation in this paper is a variation of the one proposed by Felzenszwalb and Huttenlocher [5]. The said algorithm works with two basic structures: a graph, which will contain the information relating the final segments of the processed image, and a disjoint set, used for optimizing the search inside the graph. The Graph-Based technique converts all the pixels of an image into nodes of a graph and places edges between these nodes based on a four-neighboring connectivity rule. The edges represent the dissimilarity between two neighboring pixels. In order to obtain a segmented image as an end result, the algorithm cuts some edges based on a certain threshold. The algorithm also takes into account the fact that after segmenting the image some regions may end up being very small, thus it may be run multiple times to post-process the aforementioned small components. The current implementation takes as input the image we want to segment, the minimum size for each segmented region and the threshold at which to cut a certain edge in the graph, and returns an image in which the pixel values have been changed based on the obtained regions. The main steps of the algorithm are the following:
• convert pixels from image to graph nodes
• create an edge between every two nodes based on a four-neighboring connectivity rule, with the weight of those edges being the distance between the colors of the two pixels
• create a disjoint set as follows
  o number of elements in set = number of nodes in graph
  o for each element in set
    - element parent = element
    - element threshold = threshold
  o for each edge in graph
    - element1 = find set element for node1 of the edge
    - element2 = find set element for node2 of the edge


    - if edge weight <= element1 threshold and edge weight <= element2 threshold then
      • combine element1 and element2 in the set
      • combined element threshold += edge weight
  o repeat to reduce the number of small regions
• create the final image based on the elements in the set
Due to the fact that the algorithm has a tendency towards undersegmentation, running it once or twice is usually the best approach, as further runs after that will not guarantee reducing the number of small regions and will just result in a performance drop.
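For illustration, a sketch of the disjoint-set structure with per-component thresholds is given below (our own Python code; the threshold update follows Felzenszwalb and Huttenlocher's original rule τ(C) = k / |C| rather than the simplified increment in the pseudocode above, and the scale parameter k is an assumption).

```python
# Disjoint set with a per-component merging threshold: edges are processed in
# increasing weight order and two components are joined only if the edge is no
# heavier than both of their thresholds.
class DisjointSet:
    def __init__(self, n, k):
        self.parent = list(range(n))
        self.size = [1] * n
        self.threshold = [float(k)] * n     # initial threshold: k / |C| with |C| = 1
        self.k = float(k)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, weight):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if weight <= self.threshold[ra] and weight <= self.threshold[rb]:
            if self.size[ra] < self.size[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]
            # processed in increasing order, so `weight` is the largest internal edge
            self.threshold[ra] = weight + self.k / self.size[ra]

def segment(num_pixels, edges, k=300):
    """edges: list of (weight, node1, node2); returns the root label of every pixel."""
    ds = DisjointSet(num_pixels, k)
    for weight, a, b in sorted(edges):
        ds.union(a, b, weight)
    return [ds.find(i) for i in range(num_pixels)]
```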

3.4. Mean-Shift

For Mean-Shift [6], the implementation found in the OpenCV library [7] was picked. The algorithm was selected to compensate for some of the inaccurate results the other algorithms might produce. As a general idea, the algorithm works with two spaces: the image space (i.e. the space representing the distances between the image pixels) and the color space (i.e. the space representing the distances between the values of the image pixels). In order to create different regions from an input image, it fills the aforementioned spaces with windows that cover all the values, and then it tries to shift these windows, taking into consideration the mean of the covered pixels, until it obtains the final segmented image. The algorithm stops when it reaches convergence across all windows. The execution of Mean-Shift takes an input image to be segmented, the radiuses of both windows used for convergence and a maximum number of levels for the Gaussian pyramid, and offers as output a segmented image in which the pixel colors have been shifted. As mentioned, the algorithm tends to be slow sometimes, thus it is provided with two different termination criteria: convergence or a maximum number of iterations to run. This ends up favoring performance to the detriment of stability. Due to the fact that the position of the windows is random, running the algorithm multiple times might end up giving different results. However, the differences tend to be minimal if the values of the parameters are chosen correctly.
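For reference, the OpenCV call looks roughly as follows (the file name is hypothetical and the parameter values are the secondary set discussed in section 4; everything else is our glue code).

```python
import cv2

image = cv2.imread("input.png")                       # hypothetical input file
# terminate on convergence (epsilon) or after a maximum number of iterations
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# spatial radius = 40, color radius = 40, maximum Gaussian-pyramid level = 5
segmented = cv2.pyrMeanShiftFiltering(image, 40, 40, maxLevel=5, termcrit=criteria)
cv2.imwrite("segmented.png", segmented)
```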

4. PARAMETER ESTIMATION

This chapter deals with the estimation of the parameters required by each algorithm described in section 3. The metric used to compute them will be described first, then there will be a discussion about each algorithm independently, covering which values were found to work best and how they were picked. In order to perform the estimations, the NYU Depth dataset [8] was used, which offers a large set of input images together with already segmented images. These labeled images serve as the ground truth in the calculations. The results of the segmentation and voting algorithms will be made to be as close as possible to the labels in the dataset.

The result of a segmentation algorithm on an image I is an image RI that has the same shape as the initial image, but instead of having an intensity value for each position, it has the index of the region that position is part of. The problem that arises here is that if two different algorithms are run on the same image, the indexes of the same region will not correspond, and additional processing must be done. One approach is to create a similarity matrix as described in [1]. Thus, the labels from the dataset and the outputs of each algorithm are processed in the following manner: these images, instead of having an indexed label for each position, will contain the mean intensity value of that region. Formally, for every region S and every position (i, j) in S:

R'(i,j) = ( Σ over (a,b) in S of I(a,b) ) / |S|

This computation can be seen in Figure 3, where there is an image from the dataset, its label and the calculated mean label. It can be observed that in the mean label each region looks roughly like in the initial image, but it is still clustered. The algorithm can now work on this resulting image and apply its voting techniques on the intensity values directly, instead of working with arbitrary labels.

Image Label Mean Label

Figure 3. Mean Labeled image

Consider there is a Mean Segmented image from a segmentation algorithm A, called RA, and a Mean Labeled image from the dataset, called RL. Then, the comparison metric is defined as the Euclidean distance between the two results:

L = sqrt( Σ over (i,j,c) of ( RA(i,j,c) - RL(i,j,c) )^2 )

It should be noted that for a 640x480x3 image, as those in the NYU Depth dataset, the L value can vary from 0, where the result is identical to the label, to sqrt(640 · 480 · 3 · 255^2) = sqrt(59,927,040,000) = 244,800, when the label has only 0 values and the result has only 255 values (or the other way around). A second note that should be made here is that, while the label does a good job at segmenting the image, it will usually compute the regions using a more semantic approach, rather than an intensity-based one (which is how the algorithms we use work). This can result in pretty big differences between the two results, but, nonetheless, the value L itself can be used to obtain the best parameters for the algorithms and to compare the algorithms with each other. Generally, a lower value means a better result and this is the metric we use when picking the parameters. Now that the metric is defined, we must discuss which parameters are varied for each algorithm and which values were chosen. For the Region-Growing algorithm, we vary the threshold and whether the initial seeds are chosen at random or using a more intelligent approach. For the Superpixel algorithm, there is only one parameter to vary, and that is the number of clusters it produces using the K-means technique.


For Graph-Based, there are two parameters: the minimum size and the denominator used in the standard deviation formula. Finally, for Mean-Shift, there are 3 parameters to find: the spatial and color radiuses and the maximum level of the Gaussian pyramid. In order to obtain the ideal parameters, 150 images were picked at random from the dataset and the algorithms were applied while computing the mean L value, as defined earlier, for each given set of parameters. Below are the results for each algorithm and, as can be seen, even a small change can significantly improve the quality of the segmentation. In the end, only the best results were kept and used in the voting scheme, which will be described in the next section.
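The evaluation described above can be sketched as follows (our own function names; both images are H x W x 3 NumPy arrays and the label map is an H x W integer array).

```python
import numpy as np

def mean_label_image(image, labels):
    """Replace every pixel with the mean color of the region it belongs to."""
    result = np.zeros_like(image, dtype=np.float64)
    for region in np.unique(labels):
        mask = labels == region
        result[mask] = image[mask].mean(axis=0)     # mean color of the region, broadcast back
    return result

def l_distance(mean_a, mean_b):
    """Euclidean distance L between two mean-labeled images (0 = identical)."""
    diff = mean_a.astype(np.float64) - mean_b.astype(np.float64)
    return float(np.sqrt((diff ** 2).sum()))
```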

Figure 4. Region-Growing parameter choice results

For Region-Growing, the best results were obtained using a threshold of 25 and random seed pixels.

Figure 5. Superpixel parameter choice results


For Superpixel, as can be seen, a smaller number of clusters yields the best results. Thus, only 5 clusters are used as the parameter for the algorithm.

Figure 6. Graph-Based parameter choice results

For the Graph-Based algorithm, it can be seen that using a minimum size of 2 and a denominator of 5 gives the best segmentations.

Figure 7. Mean-Shift parameter choice results

Finally, for Mean-Shift, the best values found were a spatial radius of 80, a color radius of 80 and a maximum level of 5. It should be noted that, visually, these values, while having the lowest L value, did not provide the best segmentation. So, a secondary set of parameters was selected: (40, 40, 5).


5. VOTING ALGORITHMS

A voting scheme is used to combine the results of the 4 algorithms described in chapter 3. This scheme takes the best from each algorithm and tries to obtain a better segmentation in the end. In order to do this, two voting methods have been implemented: Region Based (which is similar to the Region-Growing algorithm) and Weights Based (which uses a weighted sum scheme, followed by a re-clustering algorithm). The algorithms are completely independent from each other, so they can be executed in parallel. Also, all 4 segmentation algorithms are run using the parameters estimated in chapter 4, thus the assumption can be made that each algorithm produces the best possible individual result. With this assumption, it can be implied that the combination of these partial results will lead to the best final result. The voting methods we use are executed on the Mean Segmented images computed by each segmentation technique. This highlights how applying the voting algorithm improves the result compared to each algorithm individually.

5.1. Region Based

The Region-Based technique takes the Mean Segmented output of each segmentation algorithm and tries to build regions using the information gathered from those partial results. The algorithm is based on the Region-Growing segmentation described in section 3.1, with a few differences. The first one is that, in the seed selection step, the algorithm simply takes the next unlabeled pixel from the image. For example, if pixels (0, 0) and (0, 1) are already labeled, then the generator pixel will be set as pixel (0, 2). The second one is that, when deciding if a neighboring pixel should be added to the current region, the program does not look at the color intensities of the neighbor and seed pixels. Instead, it looks at the regions created by each of the algorithms and decides if the neighbor should be added to the current region. The decision function checks if both Graph-Based and Mean-Shift say that the evaluated neighbor should be in the same region as the current position and, if they both disagree, the two pixels will be in separate segments in the final image. This is because those two algorithms have the tendency to produce undersegmentation, and their capability of splitting segments should be trusted. Furthermore, the decision function checks if both Region-Growing and Superpixel say that the evaluated neighbor should be in the same region as the current position and, if they both agree, the two pixels will be in the same segment in the final image. This is because those two algorithms have the tendency to produce oversegmentation, so, if they say that two pixels are in the same region, then that is quite likely to be correct.
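A possible reading of this decision function is sketched below (our own code; same_region is an assumed helper that reports whether a given algorithm placed two pixels in the same segment, and the final fall-back branch is our assumption, since the text only specifies the two explicit rules).

```python
def should_merge(p, q, same_region):
    """Decide whether pixels p and q belong to the same voted region."""
    under_gb = same_region("graph-based", p, q)
    under_ms = same_region("mean-shift", p, q)
    over_rg = same_region("region-growing", p, q)
    over_sp = same_region("superpixel", p, q)

    # the undersegmenting algorithms are trusted when they both split p and q
    if not under_gb and not under_ms:
        return False
    # the oversegmenting algorithms are trusted when they both keep p and q together
    if over_rg and over_sp:
        return True
    # fall back to the joint opinion of the undersegmenting algorithms (our assumption)
    return under_gb and under_ms
```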

5.2. Weights Based

The second voting method we used is a bit different from the first one. The Mean Segmented image is also used, but in another way. It can be considered that each algorithm should contribute in a linear fashion, based on how well it performed during the parameter estimation. Thus, the final image will be a weighted sum of the results of all algorithms:

V = wR * R + wS * S + wG * G + wM * M

In the above formula, the indexes R, S, G and M represent each of the 4 algorithms: Region-Growing, Superpixel, Graph-Based and Mean-Shift. As can be seen, the final voted image (V) is a linear combination of the partial results, so it is only needed to find the weights for this sum. In order to do this, the parameters computed in section 4 were kept in place and then each algorithm was run independently on a batch of images from the dataset. For each image, the program computed the L value and chose the weights in an inversely proportional manner to the corresponding L values, normalized by their sum (the exact formula was lost in extraction). To better explain this, a simple example: consider that there are 3 algorithms A, B and C with the L values 1, 1 and 3. Since a small L value means a better result, wA and wB need to receive greater weights than wC. Using this scheme on a batch of 150 random images from the dataset results in the following weights: wR = 0.258, wS = 0.253, wG = 0.273 and wM = 0.214. These are the values we use when applying the linear sum formula to compute the final segmentation image. In Figure 8, the weights computation and how they change at each iteration can be seen, starting from all of them being equal to 0.25 and reaching the values presented above.
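The weighted vote itself reduces to a few lines (sketch with our own names; the weights are the ones reported above and the partial results are assumed to be the four Mean Segmented images of identical shape).

```python
import numpy as np

WEIGHTS = {"region_growing": 0.258, "superpixel": 0.253,
           "graph_based": 0.273, "mean_shift": 0.214}

def weighted_vote(partial_results):
    """partial_results: dict mapping each algorithm name to its Mean Segmented image."""
    voted = np.zeros_like(next(iter(partial_results.values())), dtype=np.float64)
    for name, image in partial_results.items():
        voted += WEIGHTS[name] * image.astype(np.float64)
    return voted   # a final K-means pass then re-clusters this image (section 5.2)
```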

Figure 8. Weights computation


The image computed from the results of all the weighted sums might contain pixels that are from the same region, but have a slight variation in intensity. So, another instance of K-means will be run on this image in order to obtain a final segmented image.

6. RESULTS

In this chapter, the results of running the voting techniques on several test images are presented. The voting schemes performed well in general, giving good results on various images. Each figure showcases the output of each segmentation algorithm, as well as the final output of the two voting algorithms. In Figures 9, 10, 11 and 12 the results of running the voting algorithms on images from the dataset can be seen. The algorithms were run on 150 random pictures from the dataset and the following mean loss values were obtained:

Algorithm                      Mean L value
Region-Growing segmentation    42156.7879
Superpixel segmentation        42409.3569
Graph-Based segmentation       52854.0294
Mean-Shift segmentation        39288.1205
Region Based voting            43254.0269
Weights Based voting           41479.8401

So, as seen from the table, the voting algorithms often give good results in comparison to the labels from the dataset.


Figure 9. Test execution on image 1

Figure 10. Test execution on image 2


Figure 11. Test execution on image 3

Figure 12. Test execution on image 4


7. CONCLUSIONS AND FUTURE WORK

The voting algorithms presented in this paper are promising and they yielded good results on the tested images. Therefore, it can be concluded that using a voting scheme to combine the results of various segmentation algorithms is viable. As future work, other segmentation algorithms can be implemented to be used in the voting scheme. Also, a voting method that is based on region indices as presented in [2] can be used. Furthermore, other color spaces can be used during the segmentation process such as the CIE-Lab color space which is more adequate for detection of similar pixels as seen by the human eye. Also, different voting-based methods like those presented in [9][10][11] may be adapted to be used in an image segmentation scenario.

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] L. N. Fred, Finding consistent clusters in data partitions, Lecture Notes in Computer Science vol. 2096, Springer-Verlag, London, pp. 309-318, 2001
[2] Costin-Anton Boiangiu, Radu Ioanitescu, "Voting-Based Image Segmentation", The Proceedings of Journal ISOM Vol. 7 No. 2 / December 2013 (Journal of Information Systems, Operations Management), pp. 211-220, ISSN 1843-4711
[3] Y. L. Chang, X. Li, "Adaptive Image Region-Growing", IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 868-872, 1994
[4] Achanta, Radhakrishna, et al., SLIC Superpixels, EPFL-REPORT-149300, 2010
[5] Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Efficient Graph-Based Image Segmentation, 2004
[6] D. Comaniciu, P. Meer, "Mean-Shift: a robust approach towards feature space analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, IEEE Computer Society, Washington DC, pp. 603-619, 2002
[7] Kanglai Qian, Mean-Shift Segmentation using OpenCV, Available at: https://github.com/qiankanglai/opencv.meanshift - Accessed on: 22 January, 2018
[8] Nathan Silberman, Datasets, Available at: https://cs.nyu.edu/~silberman/datasets/ - Accessed on: 22 January, 2018
[9] Costin-Anton Boiangiu, Radu Ioanitescu, Razvan-Costin Dragomir, "Voting-Based OCR System", The Proceedings of Journal ISOM, Vol. 10 No. 2 / December 2016 (Journal of Information Systems, Operations Management), pp. 470-486, ISSN 1843-4711


[10] Costin-Anton Boiangiu, Mihai Simion, Vlad Lionte, Zaharescu Mihai, "Voting Based Image Binarization", The Proceedings of Journal ISOM Vol. 8 No. 2 / December 2014 (Journal of Information Systems, Operations Management), pp. 343-351, ISSN 1843-4711
[11] Costin-Anton Boiangiu, Paul Boglis, Georgiana Simion, Radu Ioanitescu, "Voting-Based Layout Analysis", The Proceedings of Journal ISOM Vol. 8 No. 1 / June 2014 (Journal of Information Systems, Operations Management), pp. 39-47, ISSN 1843-4711


ECONOMIC INDICATORS AND HUMAN DEVELOPMENT INDEX

Ana-Maria Mihaela IORDACHE 1* Ionela-Cătălina ZAMFIR 2

ABSTRACT

The Human Development Index (HDI) is a very interesting index that may show the development level of a country. A country with a high HDI and a high life expectancy at birth should be a country with a developed economy, low unemployment rates, good import and export indicators and a business-favorable economic environment. The main hypothesis tested in this paper is whether there is a relation between the HDI and the other types of indicators mentioned, starting from the idea that in a cybernetic system there is interdependence between all the events that take place. Two types of dimension reduction analyses are used to reduce the size of the 22 variables from the dataset, while K-means and Ward's method are used to classify the observations into four classes. A confusion matrix calculated between the new classes (K-means algorithm) and the known classes (from the HDI) confirms the tested hypothesis, with a certain "accuracy rate".

KEYWORDS: Dimension reduction analyses, HDI, K-Means, Ward's method, correlation

JEL CLASSIFICATION: C38, F63, O11, O15

INTRODUCTION AND LITERATURE REVIEW

Is it possible to create a model that shows the measure of economic and social development for worldwide countries? Does this have any connection with the HDI? This is the main idea of this paper, considering that the HDI is a well-known index that considers variables like the education, life expectancy and gross national income indexes. Considering indicators that are more oriented towards economic, business and labor development, the authors try to reveal the connection between the HDI, which is already computed and known, and new aggregated indicators, computed using analyses that reduce the dataset dimension. The concern about the HDI started years ago. In 2002, Biswas and Caliendo used a variable reduction analysis (considering only one principal component) and the three indicators that compose the HDI: gross domestic product per capita, life expectancy at birth and education, for creating a similar indicator, named a metric for international human development. Their findings were similar to the HDI and the authors conclude that taking into

1* corresponding author, Lecturer, Phd, Romanian-American University, Bucharest, [email protected] 2 associate professor, Phd, Bucharest University of Economic Studies


account the PCA is a more complex technique that brings more "straightforwardness" by generating optimal weights. On the other hand, Ranis, Stewart and Samman (2006) try to identify 11 categories of human development over 30 indicators. The categories are "mental well-being, empowerment, political freedom, social relations, community well-being, inequalities, work conditions, leisure conditions, political security, economic security and environmental conditions" (Ranis et al., 2006). The authors used rank-order correlations among the variables from each category and identified the most relevant indicators for each category. In 2004, Montenegro started from the idea that, except for income per capita, there is no rule to establish the most relevant variables that define economic development. The author defines an economically developed country as a country with "high income per capita and a good income distribution" (Montenegro, 2004), where the terms high and good are understood differently by each person. After using the GDP per capita and Gini coefficients for a dataset of countries in order to develop an index, the author concludes with the recommendation to have a common methodology for Gini, a recommendation addressed to the "United Nations, World Bank or the IMF" (Montenegro, 2004). Later, in 2011, Abraham and Ahmed used data from 1975 to 2008, GDP as economic growth and the HDI as social development, in order to identify "the disequilibrium between the variables" in time, using an error correction model (ECM) as methodology. Estimating a regression model with GDP and HDI, the authors show a non-significant short-term relationship between these variables, but a very significant equilibrium coefficient for the long-term relationship, so that the policies "aimed at accelerating growth would have a negative impact on human development in the short run but in the long run, equilibrium will be restored by HDI adjusting upwards or downward to correct the equilibrium error" (Abraham, Ahmed, 2011). In 2014, Hajdouva, Andrejovsky and Beslerova started from the idea that global experience does not confirm that the economic development of countries comes with an increasing trend in the quality of life. To study this, the authors chose 10 countries and analyzed the relations "between the quality of life and environmental quality". By considering three clusters for all 10 countries, the authors compare the HDI with other indicators like the corruption perception index (CPI), the environmental performance index (EPI) and GDP, and establish a model for future research that takes into account indexes and variables like HDI, EPI, CPI and GDP. The research is divided into several sections: the introduction and literature review present the most relevant studies in this area of interest and the assumption that there is a connection between the HDI and economic, business, trade and employment indicators, a connection revealed by new indicators (components, factors). The methodologies section briefly presents the statistical background for testing the assumption, while the data selection and description part presents the dataset used in the research. The last two parts are the results and interpretation, where the main results are presented, and the conclusions, with final details about this paper.


METHODOLOGIES

Principal components analysis (PCA) and factor analysis (FA) are two of the main dimension reduction analyses. Both share the idea of reducing the dimension of the variables matrix while keeping as much information as possible. The main difference between them is that PCA relies on an optimization problem that maximizes the variance that each component takes from all variables, while factor analysis relies on the assumption that there exists a factorial model which, with a small number of factors, can explain the patterns between correlated variables. The model for principal components is (Dunteman, 1989):
- The first principal component is a linear combination of all X variables: W1 = Σi a1i · Xi. The construction of this component takes into consideration that the variance of W1 (noted by λ1) "is maximized given the constraint that the sum of squared weights is one, and a1 is an eigenvector associated to the first eigenvalue of the covariance matrix" (Dunteman, 1989).
- For the next principal component we should identify another eigenvector, for the second eigenvalue, which maximizes the variance of W2. There is no correlation between it and the first principal component.
- This method continues until all n principal components are computed (as many as the original variables), each of them having less variance and carrying less information from the X dataset than the previous component.
The idea of FA relies on the model "x = Λf + e, for a p-element vector x, a p x k matrix Λ of loadings, a k-element vector f of scores and a p-element vector e of errors" [12]. If we consider the correlation matrix as Σ = ΛΛ' + Ψ, "the fit is done by optimizing the log likelihood assuming multivariate normality over the uniquenesses" [12]. Therefore, the scores might be written as f = Λ'Σ^(-1)x using Thomson's method, while "Bartlett's method minimizes the sum of squares of standardized errors over the choice of f" [12]. On the other side, the K-means algorithm is an unsupervised learning algorithm that divides the dataset into a known number of classes, taking into account the maximization of the variance between classes and the minimization of the variance inside each individual class. This algorithm identifies one of the four classes of development, in the same way as the HDI: low, medium, high and very high development. Statistically, the steps of the K-means algorithm are [13]:
- Initially, we know the number of clusters. The k initial classes are formed "randomly" with k observations from the data.
- Each remaining observation is then associated to one of the k clusters previously formed, using in general the method of the lowest distance between the initial centroid and each observation.

1 https://stat.ethz.ch/R-manual/R-devel/library/stats/html/factanal.html 2 https://en.wikipedia.org/wiki/K-means_clustering


- New centroids are computed after all the observations have been grouped into the k classes.
- "The algorithm repeats, until the convergence reached" [13].
In comparison with the K-means algorithm, Ward's method (Ruxanda, 2009) is an ascending hierarchical classification method that follows the general criterion of classification: at each classification step, the two classes that have the smallest sum of squares of deviations, compared to the other pairs of clusters, are merged. The idea behind this method is maximizing the homogeneity of the clusters.

DATA SELECTION AND DESCRIPTION

The World Bank database is the main source of data. The indicators considered are for 2017 and reflect mostly the trade, employment, business and economic indicators that are relevant for analyzing the development degree of each country and for comparing the new results with the HDI. The table with the considered indicators (for 2017) and their codes is:

Table 1. Indicators used for models

Code   Name
I1     "Age dependency ratio (% of working-age population)"
I2     "Cost of business start-up procedures (% of GNI per capita)"
I3     "Cost to export, compliance (US$)"
I4     "Cost to import, border compliance (US$)"
I5     "Employers, total (% of total employment)"
I6     "Employment in agriculture (% of total employment)"
I7     "Employment in industry (% of total employment)"
I8     "Employment in services (% of total employment)"
I9     "GDP per capita growth (annual %)"
I10    "Labor tax and contributions (% of commercial profits)"
I11    "Merchandise exports (current US$)"
I12    "Merchandise imports (current US$)"
I13    "Merchandise trade (% of GDP)"
I14    "Net migration"
I15    "Population growth (annual %)"
I16    "Profit (% of commercial profits)"
I17    "Rural population growth (annual %)"
I18    "Start-up procedures to register a business (number)"
I19    "Tax payments (number)"
I20    "Time required to start a business (days)"
I21    "Urban population growth (annual %)"
I22    "Wage and salaried workers, total (% of total employment)"

From the 209 initial countries that had available data for 2017, only 144 were left after the removal of outliers. Each indicator has a code from I1 to I22, in the order mentioned above. All variables are standardized before being used in further models.

1 https://data.worldbank.org/


RESULTS AND INTERPRETATION

For creating aggregated indicators, some level of correlation between the selected variables is necessary, so the first step is to identify the correlations between variables, which can confirm the utility of both the principal component and the factor analyses.

Figure 1. The correlation matrix between original variables

The figure above shows the correlation matrix between the considered indicators. High correlations appear between indicators like the employment in different areas (agriculture, industry and services), the growth of the urban population and I22. From these correlations it can be noticed that the variables split into two major components: a population component, which includes variables like employment, population growth, labor tax and contributions, and a trade and economic component, including variables such as the cost of export and import, the gross domestic product per capita growth and the number of tax payments. The correlation matrix above shows that the dimension reduction analyses make sense, both analyses being methods to eliminate informational redundancy (there are no correlations between factors or between principal components).


Figure 2. PCA statistics

The figure above presents the main principal components analysis statistics. According to the Kaiser criterion for choosing the number of principal components (PCs), 6 of the 22 components may be considered for the model, because their variance is higher than one, a fact that is confirmed by the scree plot: starting with component 7, the slope becomes almost insignificant and the amount of information brought by each new component decreases very much. Therefore, out of 100% of the information, the 72% contained by the first six components is enough to provide relevant conclusions.
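The component-selection step can be sketched with scikit-learn as follows (our own code; `data` is assumed to be a table with the 144 countries as rows and the indicators I1-I22 as columns).

```python
# Standardize the indicators, run PCA and keep the components whose eigenvalue
# exceeds 1, following the Kaiser criterion discussed above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def kaiser_components(data):
    standardized = StandardScaler().fit_transform(data)
    pca = PCA().fit(standardized)
    eigenvalues = pca.explained_variance_
    keep = int(np.sum(eigenvalues > 1.0))            # Kaiser criterion
    explained = pca.explained_variance_ratio_[:keep].sum()
    print(f"kept {keep} components explaining {explained:.0%} of the variance")
    return PCA(n_components=keep).fit_transform(standardized)
```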

Figure 3. FA statistics

If PCA is a way to reduce the number of correlated variables, the FA idea is that there is a factorial model that fits a number of variables. In this respect, the KMO (Kaiser-Meyer-Olkin) measure calculated for the standardized variables has a value closer to 1 (0.71) and shows the utility of using FA. Moreover, Bartlett's sphericity test shows the rejection of the null hypothesis, which states that the correlation matrix is the identity matrix (i.e., that the variables are orthogonal, without correlations); rejecting it means that the variables are indeed correlated. Both the KMO measure and Bartlett's test show that the assumption that a factorial model exists between the variables holds, so the chosen number of five factors (57% of the total variability) can explain the patterns between the variables.

Figure 4. Classes centroids using K-Means

The centroids of each of the four classes resulting from the K-means algorithm are interpreted in terms of factors and principal components. Each factor or component "takes" information from all 22 selected variables (more or less, depending on the coefficients or eigenvectors), but, taking into account the correlations between the observed variables and the newly computed aggregates (factors or components), it is easy to determine the name of each class, based on the average values above. In this respect, the classes for the principal components analysis (W) are (as level of development) 1 = high, 2 = low, 3 = medium, 4 = very high, while the classes for the extracted factors are (as level of development) 1 = low, 2 = medium, 3 = high, 4 = very high.


Figure 5. Classes (K-Means classification) representation in factorial (left) and principal (right) plan

The figure above shows the graphical representation of all four classes in two dimensions: the first two factors (left side of the figure) and the first two PCs (right side of the figure). Even if the amount of information explained by the first two factors, or taken by the first two PCs, is not very high compared to five factors and six components, the clarity with which the four classes are distinguished is remarkable.

Figure 6. Confusion matrix for factors and principal components using K-Means

The confusion matrices are among the most important outputs, because they represent the connection between the class indicated by the HDI, also named the original class, and the new (predicted) class computed using the K-means algorithm and the new datasets: principal components (W) and factors (F). From this point of view, knowing the signification of the new classes from above, it is possible to estimate the "accuracy" degree, similar to supervised learning techniques, but with a different signification here. If we consider the six principal components, then there is an approximately 67% connection between the classification output obtained on the new variables and the HDI classification. On the other side, using factors, the percentage is only about 42%. The difference between these results comes from the amount of total information that each dataset takes from the original variables, as well as from the method applied to reduce the data dimensionality: the difference between an optimization problem and maximum likelihood estimation.

Figure 7. Ward's dendrogram using principal components

The figure above shows the dendrogram obtained by applying Ward's hierarchical method on the principal components. The red squares on the graph mark the four classes.

Figure 8. Confusion matrix for factors and principal components using Ward

It is interesting to compare classification methods like K-means and Ward's method from the confusion matrix point of view. Theoretically, K-means provides better classification results, because it is an algorithm that runs until a stop condition is fulfilled, but Ward's classification method is known to provide classification results similar to those of such an algorithm. In this respect, we presented above the confusion matrices, for both principal components (w2) and factors (f2). According to the classes' centroids and the meaning of the factors/components, the new names of the classes obtained on principal components are 1 = low, 2 = high, 3 = very high, 4 = medium development level, and the percentage of "correct classification" is about 51% (lower than for the K-means classification); for factors the classes are 1 = low, 2 = very high, 3 = medium, 4 = high development, and the percentage is about 49%, higher than for the K-means classification.

CONCLUSIONS

Finally, the article demonstrates the connection between the HDI and the most relevant indicators from trade, employment, business and economics, taking into consideration the majority of worldwide countries. The dimension reduction methods (PCA and FA) were used both to synthesize the information from the 22 variables and to create new indicators. Further analyses used these new indicators. Both the K-means algorithm and Ward's method were used to classify the 144 countries into 4 classes and then to compare the newly obtained classes with the HDI, in order to see the connection between the HDI and the newly proposed models. In this respect, the confusion matrix estimates this connection in terms of the "correct" classification percentage. For further analyses, we propose to study the degree of development of each group of countries by including more than 22 indicators, adding social, education, health, poverty or financial indicators.

REFERENCES

[1] Abraham, T.W., Ahmed, U.A. (2011), Economic Growth and Human Development Index in Nigeria: An Error Correction Model Approach, International Journal of Administration and Development Studies, University of Maiduguri, Nigeria, Vol. 2, No. 1, pp. 239-254, ISSN: 2141-5226
[2] Biswas, B., Caliendo, F. (2002), A Multivariate Analysis of the Human Development Index, Economic Research Institute Study Papers, Paper 244.
[3] Dunteman, G.H. (1989), Principal Components Analysis, Ed. SAGE, 1989, ISBN 0803931042, 9780803931046.
[4] Hajdouva, Z., Andrejovsky, P., Beslerova, S. (2014), Development of quality of life economic indicators with regard to the environment, Procedia - Social and Behavioral Sciences, Vol. 110 (2014), pp. 747-754
[5] Montenegro, A. (2004), An Economic Development Index, Development and Comp Systems, University Library of Munich, Germany, https://EconPapers.repec.org/RePEc:wpa:wuwpdc:0404010
[6] Ranis, G., Stewart, F., Samman, E. (2006), Human Development: Beyond the Human Development Index, Journal of Human Development, Vol. 7, No. 3, pp. 323-358


[7] Ruxanda, Ghe. (2009), Analiza multidimensionala a datelor [Multidimensional Data Analysis], Academia de Studii Economice, Scoala Doctorala, Bucuresti, 2009
[8] http://databank.worldbank.org/data/home.aspx
[9] https://data.worldbank.org/
[10] https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index
[11] http://hdr.undp.org/en/composite/HDI
[12] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/factanal.html
[13] https://en.wikipedia.org/wiki/K-means_clustering


DOCUMENT LAYOUT ANALYSIS SYSTEM

Andrei Alexandru ALDEA 1* Radu Gabriel CORIU 2 Ştefan-Vlad PRAJICĂ 3 Răzvan-Ştefan BRÎNZEA 4 Costin-Anton BOIANGIU 5

ABSTRACT

The need to process large amounts of printed physical data has led to the development of automated solutions for scanning and converting such documents into an editable text format. Following the layout analysis process, the different areas (blocks) of the document can be labeled by content - text, image, tables. Such an analysis of the document is referred to as geometric analysis. A different approach is that of a logical layout analysis, or semantic analysis, in which text blocks are labeled according to their role inside the document - titles, footnotes etc. Identifying sections correctly, numbering pages and arranging them in the correct order are standard requirements for OCR.

KEYWORDS: document layout analysis, area Voronoi diagrams, voting system, whitespace cover

1. INTRODUCTION

Layout analysis - the process of identifying and labeling the regions of interest contained in scanned text documents - is a prerequisite for documents intended to be converted to electronic format with optical character recognition algorithms. Even if this task is simpler than image segmentation, it still poses difficulties for background structure analysis. There are two main approaches to layout analysis [1]: • bottom-up: iteratively parses a document based on its pixel information. Generally, these approaches first parse the document into regions. These regions are then grouped into words, rows, and then into text blocks. • top-down: Approaches of this type attempt to iteratively separate the document into columns and blocks based on whitespace and geometric information.

1* corresponding author, Engineer, Ubisoft Romania, Bucharest, Romania, [email protected] 2 Engineer, Sparkware Technologies Romania, Bucharest, Romania, [email protected] 3 Engineer, Tangoe Romania, Bucharest, Romania, [email protected] 4 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 5 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


This paper presents an approach for document layout analysis system, featuring three algorithms - one top-down and two bottom-up algorithms, and a voter used to aggregate the generated results. Due to their error masking capability, voting algorithms are used in a wide range of commercial and research applications.

2. RESEARCH STANDARD AND APPROACHES

To begin with, we consider a recent development in document layout analysis, described by Breuel in [2]. Traditional layout analysis methods generally begin by attempting a global and complete segmentation of the document in distinct geometric regions corresponding to entities such as columns, titles, and paragraphs, using proximity, texture, or white space. Although these sections can then be processed individually with good results, obtaining a segmentation that correctly mirrors the document’s layout is a task that is very difficult to generalize. The implementation depicted in [2] uses accurate and optimal algorithms combined with robust statistical models to model and analyze the layout of the pages. The first step is to determine the background structure of the pages by assuming that there is a collection of rectangles in the plane, bounded by a given bounding box. The main idea is to find a rectangle that maximizes Q(T) (where Q is the evaluation function, often just the area of the rectangle) among all possible bounding rectangles, without overlapping any rectangle in the plane. Figure 1 illustrates a partial application of this idea.
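A compact sketch of this branch-and-bound search is given below (our own Python code, not the implementation from [2]): rectangles are (x0, y0, x1, y1) tuples, obstacles are the foreground bounding boxes, and Q is taken to be the area.

```python
# Largest axis-aligned rectangle inside a bounding box that overlaps no obstacle:
# pop the candidate with the best upper bound; if it is empty it is the answer,
# otherwise split it around one contained obstacle into four sub-rectangles.
import heapq

def area(r):
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def max_empty_rectangle(bound, obstacles):
    heap = [(-area(bound), bound)]                 # priority queue ordered by -Q(r)
    while heap:
        _, r = heapq.heappop(heap)
        inside = [o for o in obstacles if overlaps(r, o)]
        if not inside:
            return r                               # first fully empty rectangle is optimal
        x0, y0, x1, y1 = r
        px0, py0, px1, py1 = inside[0]             # split around one obstacle
        for cand in ((x0, y0, px0, y1), (px1, y0, x1, y1),
                     (x0, y0, x1, py0), (x0, py1, x1, y1)):
            if area(cand) > 0:
                heapq.heappush(heap, (-area(cand), cand))
    return None

# toy usage: two "text columns" as obstacles inside a 100x100 page
print(max_empty_rectangle((0, 0, 100, 100), [(10, 10, 40, 90), (60, 10, 90, 90)]))
# expected: (40, 0, 60, 100) -> the gutter between the two columns
```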

Figure 1. Partial coverage of white space using the greedy algorithm described in [2]


However, this step is insufficient in determining useful background information about the page. Next, a set of evaluation criteria will be used to determine the regions of white space that are “meaningful” (those that separate text). The requirement is that the identified rectangles are bounded by a minimum number of connected components on each side.

Figure 2. Automatically identified column delimiters

The previous idea can be used to identify white space aligned with the document's axes, but an algorithm for identifying arbitrary whitespace rectangles can eliminate the need to correct the rotation of the page. It works somewhat analogously to the previous algorithm, receiving a list of foreground shapes as input (polygons or bounding boxes of words or characters) and returning the maximal white space rectangles that do not overlap and which meet certain parameter requirements (orientation, width, height, position). The next step is to identify text lines, which can often prove problematic for complex layouts with multiple columns of varying widths. The idea is to use the column delimiters detected during the earlier processing steps as "obstacles" for a global branch-and-bound text line detection algorithm. This approach yields far better results compared to those of the traditional global or local methods. Finally, determining the correct order in which the document blocks are read depends not only on the geometric layout of the document, but also on the linguistic and semantic content. A general approach is to use the following two ordering criteria, as demonstrated in Figure 3: 1. The line segment a precedes b if their x-coordinate regions overlap and the line segment a is above the line segment b inside the page.


2. The line segment a precedes b if a is entirely to the left of b and there is no line segment c whose y-coordinate lies between those of a and b and whose x-coordinate range overlaps both a and b.

Figure 3. Topological sorting of text lines

3. VOTING-BASED LAYOUT ANALYSIS SYSTEM

In the following section, we propose a system for reliable document layout analysis, using three different algorithms and combining their results to better identify regions of interest and the emplacement of text areas within the document.

3.1. Overview

Figure 4. Overview of the layout analysis system - the voter aggregates the results of the layout analysis algorithms


3.2. Heuristic algorithm

The first algorithm uses a top-down approach which attempts to detect text through morphological modifications of the document, while also eliminating images. Based on some simple observations on multiple scanned documents, we defined the algorithm as presented below. A Canny transformation is applied to the grayscale image to detect edges and remove color differences. The unnecessary details of the full image are discarded by downscaling to a maximum size (width or height) of 256 pixels. After experimenting on multiple samples, this value has proven sufficient to obtain good results, while significantly speeding up the processing. After downscaling, we apply a dilate-erode transformation with a 3x3 square kernel to merge small elements into bigger blocks. To eliminate images, we added a preprocessing step in which we use OpenCV's findContours to obtain a hierarchy of shapes within the document. Then, the tree of contours is searched bottom-up and masks are generated. These masks are used to calculate the entropy of the different areas of the scanned document. Each region with an entropy of more than 2.5 is considered an image and discarded (set to 0). This matches the intuition that a region containing text generally has only 2 colors, which can be represented on one bit, but may sometimes have slightly more than that; thus, the threshold of 2.5 yields good results. After the image removal step, we move on to the actual detection of the layout (text paragraphs). A binarized, downscaled version of the image is intersected with the output of the previous steps, by applying a bitwise AND on the two images. The result is morphologically modified by a dilate-erode step. On the resulting image, the contours are once again detected and their bounding rectangles are the final regions that represent the output of the algorithm.
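The pipeline above can be sketched with OpenCV and NumPy roughly as follows (our own simplified code; the values 256, 3x3 and 2.5 are the ones stated in the text, while the Canny thresholds and function names are assumptions).

```python
import cv2
import numpy as np

def region_entropy(gray, mask):
    """Shannon entropy of the gray-level histogram inside a binary mask."""
    values = gray[mask > 0]
    hist = np.bincount(values, minlength=256).astype(np.float64)
    if hist.sum() == 0:
        return 0.0
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def detect_text_regions(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 100, 200)                        # assumed Canny thresholds
    scale = min(1.0, 256.0 / max(gray.shape))                # downscale to max 256 px
    small = cv2.resize(edges, None, fx=scale, fy=scale)
    kernel = np.ones((3, 3), np.uint8)
    morphed = cv2.erode(cv2.dilate(small, kernel), kernel)   # dilate-erode step
    contours, _ = cv2.findContours(morphed, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    gray_small = cv2.resize(gray, morphed.shape[::-1])
    regions = []
    for contour in contours:
        mask = np.zeros_like(morphed)
        cv2.drawContours(mask, [contour], -1, 255, thickness=-1)
        if region_entropy(gray_small, mask) <= 2.5:          # entropy > 2.5 -> image, drop
            regions.append(cv2.boundingRect(contour))        # (x, y, w, h)
    return regions
```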

Figure 5. Document through different stages of processing: original, morphed Canny, morphed Canny with images removed and final output regions


3.3. Geometrical algorithm

The following algorithm, first introduced by Breuel in [3] and further discussed in [4], is a bottom-up heuristic approach. It attempts to find individual letters and then merge their bounding rectangles into text paragraphs. The original document is passed through a Canny edge detection algorithm. The generated result is blurred and then a findContours function is applied. This step finds all the individual characters in the document. Afterwards, we apply an incremental merge of the bounding rectangles: all rectangles with a certain ratio of black to white pixels are considered text and merged. After multiple runs of the algorithm, we determined that a constant ratio of 0.2 yields the best results. This process is applied iteratively until a preset number of final bounding rectangles is reached. The generated output forms the final paragraphs which will be considered in the voting process.
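A minimal sketch of this bottom-up approach is shown below. The 0.2 fill ratio follows the text, while the nearest-neighbour merge rule, the target block count and the helper names are assumptions made for illustration.

```python
# Illustrative sketch of the bottom-up merge of character bounding boxes.
import cv2
import numpy as np

def fill_ratio(binary, box):
    x, y, w, h = box
    roi = binary[y:y + h, x:x + w]
    return float((roi > 0).mean()) if roi.size else 0.0

def union(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = min(ax, bx), min(ay, by)
    x1, y1 = max(ax + aw, bx + bw), max(ay + ah, by + bh)
    return (x0, y0, x1 - x0, y1 - y0)

def geometric_layout(gray, target=10, ratio=0.2):
    edges = cv2.blur(cv2.Canny(gray, 100, 200), (3, 3))
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Keep only boxes whose foreground/background ratio marks them as text.
    boxes = [b for b in boxes if fill_ratio(binary, b) > ratio]

    def center(b):
        return np.array([b[0] + b[2] / 2.0, b[1] + b[3] / 2.0])

    # Repeatedly merge the closest pair of boxes until `target` boxes remain.
    while len(boxes) > target:
        i, j = min(((i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))),
                   key=lambda ij: np.linalg.norm(center(boxes[ij[0]]) - center(boxes[ij[1]])))
        merged = union(boxes[i], boxes[j])
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
    return boxes
```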

3.4. Voronoi algorithm

In this section, we will briefly describe a method based on the approximated area Voronoi diagram for solving the page segmentation task of a layout analysis system [5]. Voronoi edges can be found between all adjacent connected components, which means that each component is represented as a set of adjacent Voronoi areas. In order to achieve page segmentation, Voronoi edges - effectively, boundaries between various components in the document - are selected as a result of three main stages:
1. Through labeling, the connected components in the document are detected.
2. The area Voronoi diagram is generated.
3. The unnecessary Voronoi edges are pruned. Two criteria are used to determine the edges that can be eliminated:
• Minimum distance - edges found inside narrow spaces (such as spaces between characters) can be removed.
• Area ratio - used because minimum distance is efficient for preserving edges between columns (which have thick white areas) but not as efficient for cleaning the boundaries between components.
Figure 6 shows the equivalent Delaunay triangulation of the area Voronoi diagram of a document processed as shown above.


Figure 6. The Delaunay triangulation of a processed document
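For illustration only, the sketch below approximates the idea with a point Voronoi/Delaunay structure built on connected-component centroids (rather than the approximated area Voronoi diagram used in [5]) and prunes edges with the minimum-distance criterion; the threshold value is an assumption.

```python
# Simplified point-Voronoi approximation of the segmentation idea:
# Delaunay edges between nearby components (e.g. characters in the same word)
# are pruned, and the remaining edges correspond to retained boundaries.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def segmentation_edges(binary, min_dist=40.0):
    # 1. Connected components via labeling.
    _, _, _, centroids = cv2.connectedComponentsWithStats(binary)
    points = centroids[1:]                     # skip the background component
    # 2. Delaunay triangulation (the dual of the Voronoi diagram).
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    # 3. Prune edges between components that are close to each other.
    kept = [(a, b) for a, b in edges
            if np.linalg.norm(points[a] - points[b]) >= min_dist]
    return points, kept
```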

4. VOTING METHODS

With the use of a voting algorithm, a higher degree of accuracy can be obtained. Instead of running a single generic algorithm, several algorithms with some degree of similarity are used in parallel and their results are combined. This approach can mask the errors that the individual methods may produce for specific inputs. Such voting systems can be grouped by very diverse criteria, such as being implemented in hardware or in software, or by the nature of the working environment (synchronous or asynchronous), but in this case the most relevant classification is by their functionality [6]. Generic voters either choose one of the generated results or combine them to produce a new one. In this category we include the following:
• unanimity voters: all generated results are in agreement;
• majority voters: at least a majority of the generated results agree;
• plurality voters: m-out-of-n voting where m is less than a strict majority;
• median voters: they produce a correct result as long as the number of faulty inputs does not exceed a given maximum.
In the presented system, the outputs of the algorithms presented in sections 3.2, 3.3 and 3.4 are collected and the voting is done on the identified rectangular surfaces. Votes are accumulated for every surface and, after using a majority voter and a unanimity voter, the aggregated results are compared with each other and against the individual performance of the algorithms (Figure 7).
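One simple way to realise such a voter is to rasterise each algorithm's rectangles into a binary mask and keep the pixels selected by at least a given number of algorithms (2-out-of-3 for the majority voter, 3-out-of-3 for the unanimity voter). This is only a sketch of the aggregation step, not the full surface-level bookkeeping of the actual system.

```python
# Pixel-mask voting over the rectangle sets produced by the three algorithms.
import numpy as np

def rasterize(rects, shape):
    mask = np.zeros(shape, dtype=np.uint8)
    for x, y, w, h in rects:
        mask[y:y + h, x:x + w] = 1
    return mask

def vote(rect_sets, shape, threshold):
    votes = sum(rasterize(r, shape) for r in rect_sets)
    return (votes >= threshold).astype(np.uint8)

# Usage:
#   majority  = vote([heuristic, geometric, voronoi], img.shape[:2], 2)
#   unanimity = vote([heuristic, geometric, voronoi], img.shape[:2], 3)
```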


Figure 7. Comparison between the original document, voting from all the algorithms and unanimity voted regions

5. CONCLUSIONS

To benchmark our system we chose the LRDE dataset [7], which was also used in [8]. The dataset provides documents as input files and ground truths for OCR detection. We measured the processing time of each of the three algorithms for every image in the set, as well as the accuracy of the majority vote (2-out-of-3) and that of the unanimity vote (all algorithms in agreement). The success rate was calculated as the ratio between the number of pixels on which the generated output agrees with the ground-truth image and the total number of pixels in the image. The unanimity voting method yielded more consistent results: the average accuracy was 93.66%, with a minimum of 54.91% and a maximum of 98.79%. The majority voting yielded a more uniform histogram of results, with an average accuracy of 87.15%, a minimum of 46.76% and a maximum of 98.08%. The images in the LRDE dataset are 2516 x 3272 pixels. The average running times over the 126-image dataset were 686 ms for the heuristic algorithm, 5449 ms for the geometrical algorithm and 2813 ms for the Voronoi algorithm. As a future development, the presented research will be integrated into a full document image analysis system, thus combining several previously developed voting-based processing stages [9][10][11][12] in order to further improve the automatic detection accuracy.


Figure 8. Comparison of the majority and unanimity voting methods


Figure 9. Comparison of running times of the algorithms over the whole dataset

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] A. Antonacopoulos, C. Clausner, C. Papadopoulos, S. Pletschacher, "ICDAR2015 Competition on Recognition of Documents with Complex Layouts – RDCL2015", Proceedings of the 13th International Conference on Document Analysis and Recognition, Nancy, France, August 2015, pp. 1151-1155.
[2] T. M. Breuel, "Document Layout Analysis", Image Understanding and Pattern Recognition Group, Technische Universität Kaiserslautern (TUK).
[3] T. M. Breuel, "Two geometric algorithms for layout analysis", Proceedings of the Fifth International Workshop on Document Analysis Systems, Princeton, NJ, 2002, LNCS 2423, pp. 188-199.
[4] U. Kumar, J. Raheja, "Document Presentation Engine for Indian OCR - A Document Layout Analysis Application", International Journal of Recent Trends in Engineering (IJRTE), Vol. 3, No. 3, May 2010, pp. 182-186.


[5] K. Kise, A. Sato, M. Iwata, "Segmentation of Page Images Using the Area Voronoi Diagram", Computer Vision and Image Understanding, ISSN: 1077-3142, vol. 70, issue 3, pp. 370-382.
[6] P. Babul Saheb, K. Subbarao, S. Phani Kumar, "A Survey on Voting Algorithms Used In Safety Critical Systems", International Journal of Engineering and Computer Science, ISSN: 2319-7242, vol. 2, issue 7, pp. 2272-2275.
[7] LRDE Document Binarization Dataset, Available at: https://www.lrde.epita.fr/wiki/Olena/DatasetDBD, Accessed: 1 March 2018.
[8] M. Soua, A. Benchekroun, R. Kachouri, M. Akil, "Real-time text extraction based on the page layout analysis system", Proc. SPIE 10223, Real-Time Image and Video Processing 2017, 1022305, Anaheim, CA, April 2017.
[9] Costin-Anton Boiangiu, Radu Ioanitescu, Razvan-Costin Dragomir, "Voting-Based OCR System", Journal of Information Systems & Operations Management, Vol. 10, No. 2, December 2016, pp. 470-486, ISSN 1843-4711.
[10] Costin-Anton Boiangiu, Mihai Simion, Vlad Lionte, Mihai Zaharescu, "Voting-Based Image Binarization", Journal of Information Systems & Operations Management, Vol. 8, No. 2, December 2014, pp. 343-351, ISSN 1843-4711.
[11] Costin-Anton Boiangiu, Paul Boglis, Georgiana Simion, Radu Ioanitescu, "Voting-Based Layout Analysis", Journal of Information Systems & Operations Management, Vol. 8, No. 1, June 2014, pp. 39-47, ISSN 1843-4711.
[12] Costin-Anton Boiangiu, Radu Ioanitescu, "Voting-Based Image Segmentation", Journal of Information Systems & Operations Management, Vol. 7, No. 2, December 2013, pp. 211-220, ISSN 1843-4711.


CUSTOMER LIFETIME VALUE AND CUSTOMER LOYALTY

Andreea BARBU 1* Bogdan TIGANOAIA 2

ABSTRACT

The goal of this paper is to highlight the value of services and the customer lifetime value, focusing on the benefits that a customer lifetime value study brings to the design of marketing strategies used in the services sector. The paper also presents a card-based loyalty strategy used by a cinema that applies a Customer Relationship Management program to determine the customer lifetime value in the three stages of the customer's lifecycle.

KEYWORDS: customer loyalty, loyalty cards, customer lifetime value, the value of services, service performance

INTRODUCTION

In recent years, the services sector has begun to occupy an increasingly important place in the economy, attracting the attention of many researchers. Thus, more and more interesting papers about the services sector and its importance in the economy began to appear. Over the years, many researchers, including Philip Kotler (2003), have formulated different definitions for the concept of services, some very succinct, some very comprehensive. For example, one of the most comprehensive definitions is given by Christian Grönroos (1983), who defined the service as a single activity or a group of activities that are more or less tangible and that occur when a buyer and a vendor interact, while the simplest definition of the service concept refers to the intangibility of an element that offers utility to the beneficiary (Ionescu, S., 2004). Most of the definitions highlight both the intangibility of services and the relationship between the buyer and the supplier, which is often decisive in providing the service. After 1990, researchers began to use the concept of value that can be obtained through the re-engineering of services. This implies focusing on identifying the key processes that are necessary for the provision of services, while all the other activities are carried out in order to satisfy the customer. In the service industry, there are quite a few industrialized services, which is why, at first sight, the term re-engineering may seem forced for the services industry.

1* corresponding author, teaching Assistant Eng., Ph.D. Student, University POLITEHNICA of Bucharest, [email protected] 2 Assoc. Prof. Eng. Ph.D., University POLITEHNICA of Bucharest, [email protected]


In recent years, re-engineering has increasingly been seen as a response to the accelerating changes in society that have an impact on the organization, meaning that businesses change the way they operate in order to remain competitive. According to Michael Hammer (2001), customers and processes are the most important attributes of re-engineering, the client being one of the key elements of the business. Organizations have also begun to turn to customers by establishing strategies for evaluating the quality of services, including customer satisfaction.

EXPERIMENTAL

Customer lifetime value

One of the concerns of companies is the focus on customers who have a higher rate of profitability, forgetting to have a long-term vision in which even the segments with lower profitability can generate, in time, large profits. Most companies want to have as many satisfied customers as possible, while also following the customer lifetime value. According to Krstevski and Mancheski (2016), the concept of customer lifetime value can be summed up as the present value of a customer, a value based on the future cash flows attributed to the relationship. It is also worth mentioning the lifecycle of a client in terms of the profitability of a firm: an individual goes from being a potential customer or suspect to a cold customer, a warm customer, an occasional customer and then a loyal customer (Claeyssen, Y. et al., 2009). The cold customer level can be reached by identifying the potential client and informing him about the company's products and services. The cold customer can become a warm customer if he is interested in the information he receives and wants to purchase from that company. The objective of the organizations is to continue this growth and to make the transition from occasional customer to loyal customer, bringing long-term profitability to the company. Thereby, firms have to deal with customer segmentation and treatment groups throughout the three main stages of their life cycle, namely: attracting new customers, retaining the existing customers and developing customer relationships. Companies must determine the factors that contribute to increased profitability and discover how to apply them to all categories of customers in order to achieve maximum profitability. According to Barlow (Claeyssen, Y. et al., 2009), loyalty can be defined as a strategy which identifies, maintains and increases the performance obtained from the best clients with the help of an added-value, long-term focused relationship. Thus, even though a company can manage in the short term by attracting new customers, in the medium and long term it must move toward customer loyalty. Carmen Bălan (2007) also discusses the substantiation of a marketing strategy based on customer lifetime value measurement in an article published in the Marketing Online Magazine. Customer loyalty strategies are aimed at two axes: an economic added value and an affective added value, which are used in order to persuade loyal customers to resist the


offers proposed by its competitors. These axes are used by firms to gratify both the rational and the emotional side of customer behavior, which determine the individual's attachment to these organizations. The economic value can refer to an activity, a product or a business, and it may be value for the manufacturer or value for the customer. There is a difference in how the customer and the manufacturer think about value, this difference being presented in Figure 1 (Conti, T., 1992). The customer value is the extent to which customers perceive that a product provides good satisfaction, and it is measured in money (value for money - VFM) or by the rate of customer retention. The value may be apparent or real (undistorted by inflation), residual value (which remains from a resource that is about to be exhausted) or marginal value (related to the intended utility of the existence or consumption of an additional unit of an entity). Also, each activity adds value. Thus, the added value is the value that a company brings by processing a product or obtaining a service, calculated as the difference between price and cost.

Figure 1. The meaning of value (T. Conti, 1992)

Creating value

Creating value and growing value require the development of links between the strategies used by a company in its work (e.g. links between the market entry strategy and the promotion strategy). Thus, the main links between strategies within a company are (Moss-Kanter, R., 2006):


• Strategic connections - the use of the management technology, the experience in a field, the value chain structure and the value system structure;
• Competition connections - between customers, distribution channels and the brand used;
• Technological connections - of the technologies used for serving, the serving processes, the research methods for new services and the exchange of information (information technology);
• Operative connections - regarding the extension of services, the use of excess serving capacity, the purchase of materials (negotiating power) and the use of personnel.

Particularities of services strategy

The environment influences services strategies through the demographic evolution of customers, social and economic factors, customer requirements and expectations, prices, existing legislation, technological developments and the performance of suppliers. On the other hand, the internal factors of the organization providing services are represented by the quality of services perceived by customers, the efficiency of sales, costs, the satisfaction and security of both employees and customers, process management or even productivity targets. After setting the vision and mission, companies must determine the main issues of a service. These issues have the following characteristics: they are important for the customer; they create high costs if the quality is poor; they appear frequently; they have a substantial impact; they create significant delays. The key issues on which most services depend are (Braduţan, 2012):
• increasing the level of services (quality);
• increasing access to services for all customers;
• acting in a manner that ensures the safety of customers and operators;
• ensuring a minimum cost for the service operations;
• developing communication between customers, suppliers, legislators and others;
• giving clear and complete explanations and ensuring a fair application of rules and regulations.
Through the analysis of the relationship between companies and their competitors, it can be said that elements such as the quality of the services, their range and their prices can be considered the most important factors that could influence the customers' behavior. Their behavior could also be influenced if different strategies were applied. Lately, one of the most common strategies for companies in the service sector is the customer loyalty strategy. One of the main purposes of implementing loyalty programs is to prevent the migration of customers to competitors. Any company knows that by taking care of this aspect it actually contributes to improving its profitability. In this regard, we must follow the evolution of the company's profitability in the medium and long term by calculating the customer value in the loyalty stage over a succession of periods. This value can be obtained as the difference between the revenues and the costs of this phase, from which we deduct the nonpayment rate (Allard, C., 2003).


Case study- Customer lifetime value for a cinema

To determine the customer lifetime value in the case of a cinema that applies a Customer Relationship Management program and a card-based loyalty strategy, we use a method proposed by C. Allard (2003) in his book Management de la valeur client. In this regard, we chose to do a study on a cinema from Bucharest, aiming to attract 100.000 new customers among the youth over a period of 1 year. We used estimated data which has a demonstrative role in supporting the importance of loyalty strategies based on loyalty cards. Next, we present the method of determining the customer value for the cinema under study and its development in the three stages of the customer's lifecycle.

RESULTS

Step 1 - Attracting customers

The management of the cinema decided to contact 100.000 young people per year in order to inform them about the prices and the offers of the cinema. In this stage of attracting customers, they distributed flyers and magazines such as Seven Nights in faculties and in some public places, the cost of these materials being 1,5 lei (0,33 euro) per potential customer. The simulation is made considering that a movie ticket costs 18 lei (4 euro).

Table 1. Results obtained in Step 1 - attracting customers

Code | Description | Calculation method | Value (lei) | Value (euro)
A | The number of potential customers | Variable | 100.000 | 100.000
B | Cost per contact | Variable | 1,5 | 0,33
C | Total cost | A*B | 150.000 | 33.333,33
D | The percentage representing the number of people who were interested in the offer | Variable | 0,8 | 0,8
E | The number of potential contracts | A*D | 80.000 | 80.000
F | The percentage of signed contracts | Variable | 0,9 | 0,9
G | The number of new customers | E*F | 72.000 | 72.000
H | The global agreement rate | G/A | 0,72 | 0,72
J | The cost of attracting a customer | B/H | 2,083 | 0,46
K | The revenue generated by concluding a contract | Variable | 18 | 4
L | The revenue generated in the attraction phase | G*K | 1.296.000 | 288.000
M | Gross margin rate | Variable | 0,5 | 0,5
N | Gross margin per customer | K*M | 9 | 2
P | Total gross margin | L*M | 648.000 | 144.000
Q | Return on investment | P/C | 4,32 | 4,32
R | Customer's value | N-J | 6,92 | 1,54
T | The customer lifetime value | P-C | 498.000 | 110.666,66
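The Step 1 figures in Table 1 can be reproduced with the following short calculation (values in lei; variable names follow the table codes):

```python
# Sketch reproducing the Step 1 (attraction phase) arithmetic from Table 1.
A = 100_000        # potential customers contacted
B = 1.5            # cost per contact (lei)
C = A * B          # total cost = 150,000
D, F = 0.8, 0.9    # interest rate, contract-signing rate
G = int(A * D * F)         # new customers = 72,000
H = G / A                  # global agreement rate = 0.72
J = B / H                  # cost of attracting a customer ~ 2.083
K, M = 18, 0.5             # ticket revenue, gross margin rate
L = G * K                  # revenue in the attraction phase = 1,296,000
P = L * M                  # total gross margin = 648,000
T = P - C                  # customer lifetime value in this phase = 498,000
print(f"CLV (attraction phase): {T:,.0f} lei, ROI = {P / C:.2f}")
```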

Step 2 – Actions to improve customer loyalty

The management of the cinema tries to build customer loyalty by offering loyalty points and special prices on certain days and at certain hours. The cinema offers 1 point for every 1 leu paid; for example, if you buy a ticket that costs 10 lei, you will get 10 points. To pay for a product or a ticket with loyalty points, after handing your loyalty card to the cashier, you must mention that you want to make the payment with loyalty points. The same rule applies if you want to buy something from the cinema bar. Also, on Tuesdays the single ticket price for loyalty-card holders is 13 lei, and the same offer is valid from Monday to Friday for digital movies screened before 17:00.

Table 2. Results obtained in Step 2 - actions to improve customer loyalty

Code | Description | Calculation method | Value (lei) | Value (euro)
A | The monthly revenue brought by a customer | Variable | 26 | 5,78
B | The monthly rate of losing customers | Variable | 0,1 | 0,1
C | Gross margin | Variable | 0,5 | 0,5
D | The retention cost of a customer | Variable | 3 | 0,67
E | The nonpayment rate | Variable | 0 | 0
F | The number of customers at the start of the loyalty phase | Variable | 72.000 | 72.000
G | The number of lost customers | F*B | 7.200 | 7.200
H | The number of customers at the end of the phase | F-G | 64.800 | 64.800
J | Total revenues | H*A | 1.684.800 | 374.400
K | Gross margin | J*C | 842.400 | 187.200
L | Customers retention cost | F*D | 216.000 | 48.000
M | The cost of the nonpayment rate | J*E | 0 | 0
N | The customer lifetime value in the loyalty phase | K-L-M | 626.400 | 139.200


Step 3 - Developing relationships with loyal customers

Table 3. Results obtained in Step 3 - developing relationships with loyal customers

Code | Description | Calculation method | Value (lei) | Value (euro)
A | The number of loyal customers | Variable | 64.800 | 64.800
B | The percentage of customers exposed to new deals | Variable | 0,95 | 0,95
C | The number of customers exposed to new offers | A*B | 61.560 | 61.560
D | The acceptance rate of new offers | Variable | 0,9 | 0,9
E | The number of signed contracts | C*D | 55.404 | 55.404
F | The revenue generated by new offers | Variable | 15 | 3,33
G | Total revenues generated | E*F | 831.060 | 184.680
H | Gross margin rate of an offer | Variable | 0,5 | 0,5
J | Gross margin generated by new offers | G*H | 415.530 | 92.340
K | Cost per customer and new offer | Variable | 6 | 1,33
L | The cost of the new offers | C*K | 369.360 | 82.080
M | Nonpayment rate | Variable | 0 | 0
N | The cost of the nonpayment rate | G*M | 0 | 0
P | The customer lifetime value | J-L-N | 46.170 | 10.260

In the third stage of the customer's lifecycle, the cinema proposes to customers that they purchase products from the bar as a standard menu of 15 lei. The percentage of customers exposed to this offer is 95% and the acceptance rate is 90%, since nearly all young people who come to the cinema buy juice and popcorn.

DISCUSSION

As we can see in Table 1, corresponding to the attraction phase, knowing that the gross agreement rate is 80% and the net agreement rate is 90%, at the end of this simulation using a Customer Relationship Management program we obtained a customer lifetime value of 498.000 lei (about 110.667 euro). In the loyalty phase, as we can see in Table 2, taking into consideration the presented data - a monthly income of 26 lei (assuming that the client comes to the cinema twice a month, taking advantage of the promotional offers), a 10% rate of customer loss and a 0% nonpayment rate - the cinema using the Customer Relationship Management program generates a customer lifetime value in the loyalty stage of 626.400 lei (139.200 euro), an increase of about 25% compared to the value obtained in the attraction phase. This result helps us demonstrate that it is more advantageous for a company, in terms of costs and revenues, to make existing customers loyal to the cinema and to pay more attention to them, not only to capture new customers.


Table 3, corresponding to the phase of developing relationships with loyal customers, shows that at this stage of the customer lifecycle the customer lifetime value is 46.170 lei (10.260 euro), much lower than the values previously determined, but not negligible.

CONCLUSIONS

When it comes to the services sector, determining the customer lifetime value is very important because it helps to establish certain characteristics of the service. This research helps us weigh the costs that a particular service involves against the benefits derived from it, based on customer lifetime value studies. Through the analysis of other research papers from the services sector, the authors observed that loyalty strategies have a very important role for companies, the easiest way to achieve loyalty being loyalty cards. Using loyalty cards brings benefits to consumers, who enjoy special promotions, discounts or gifts, but also to the companies that use these loyalty strategies. Thus, according to this case study of a cinema, a loyalty strategy based on loyalty cards is essential to the firm: it not only helps build customer loyalty and increase market share, but also increases sales volume and profit, both in the medium and in the long term. The values obtained in this paper after studying the usage of a loyalty strategy highlight the importance of implementing such a strategy in the service sector. The final results demonstrated that in the customer loyalty phase the value of customers increased by 25,78% compared to that obtained in the attraction phase. In this case, by using loyalty strategies such as loyalty cards, the customers' satisfaction will rise, because they are treated differently and receive personalized offers which make them feel important. In conclusion, if we increase the customer lifetime value by maximizing customer satisfaction, then we can say that we contribute directly to increasing the firm's profitability.

REFERENCES

[1] Allard, C., 2003. Management de la Valeur Client, Ed. Dunod, Paris.
[2] Bălan, C., 2007. Substantiation of the Marketing Strategy Based on Customer Value Measurement, Marketing Online Magazine, Vol. 1, No. 3, 2007. [Online] Available from http://www.edumark.ase.ro/RePEc/rmko/3/10.pdf, Accessed 2017-03-15.
[3] Braduţan, S., 2012. Strategic Organization of IT and Telecommunication Services, Bulletin of the Transilvania University of Braşov, Vol. 5 (54), No. 1, Series V: Economic Sciences.
[4] Claeyssen, Y., Deydier, A., Riquet, Y., 2009. Multichannel direct marketing. The prospecting, loyalty and regaining of the customer, Polirom, Iaşi.
[5] Conti, T., 1992. Come costruire la qualita totale, Ed. Sperling & Kupfer, Milano.


[6] Fernandes, T., Calamote, A., 2016. Unfairness in consumer services: Outcomes of differential treatment of new and existing clients, Journal of Retailing and Consumer Services, Volume 28, January 2016, Pages 36-44.
[7] Grönroos, C., 1983. Strategic Management and Marketing in the Service Sector, Marketing Science Institute, Boston, MA.
[8] Hammer, M., 2001. The Agenda: What Every Business Must Do to Dominate the Decade, Random House Business.
[9] Ionescu, S., 2004. Reengineering of Services, Ed. ASE, Bucharest.
[10] Kotler, Ph., 2003. Marketing Insights from A to Z - 80 Concepts Every Manager Needs to Know, John Wiley and Sons, New Jersey.
[11] Krstevski, D., Mancheski, G., 2016. Managerial Accounting: Modeling Customer Lifetime Value - An Application in the Telecommunication Industry, European Journal of Business and Social Sciences, Volume 5, Pages 64-77.
[12] Moss-Kanter, R., 2006. On the Frontiers of Management, Ed. Meteor Business, Bucharest.


VOTING-BASED HDR COMPRESSION

Răzvan-Costin DRAGOMIR 1* Costin-Anton BOIANGIU 2

ABSTRACT

During the past few decades, highly sensitive image sensors able to take high dynamic range photographs have been developed. Unfortunately, screens have not kept pace with this rising technology and what is currently faced is the inability to display HDR images on conventional displays. As a workaround, many techniques that attempt to compress the original image so it can be displayed on a conventional screen have been proposed, but their results differ greatly depending on the input image. What this paper aims to achieve is an improvement in the overall results of such techniques by implementing a voting system between existing tone-mapping algorithms, based on alternating the algorithms or modifying their internal input parameters. In addition, the paper also presents a new and improved tone-mapping algorithm, resembling the existing ones but faster, which yields satisfying results.

KEYWORDS: High Dynamic Range Images, HDR Compression, Voting Processing

1. INTRODUCTION

A highly important characteristic of an image or a video captured with a conventional device is the degree to which the real scene is reproduced. In the field of image processing, the main element controlling the fidelity of the captured image is the luminance of the objects composing the scene. This property describes the quantity of light radiated, reflected or passing through those objects. In conventional photography, the luminance of the captured image differs from that of the real scene due to the difference in their dynamic intervals (the ratio between the lowest and greatest possible brightness values). In other words, due to the limited dynamic interval of conventional photographs, there is a significant loss of information compared to the original scene. With technology on the rise (mainly because there is a high demand for high quality images in certain fields, amongst which autonomous cars or assisted parking sensors), the advent of HDR ("high dynamic range") devices did not come as a surprise. These devices greatly overcome the drawbacks of their conventional counterparts, such as LDR ("low dynamic range") and SDR ("standard dynamic range") devices.

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


There remains, of course, the problem of displaying these HDR images. Up until now, the transition between conventional and HDR TVs, screens and phones has not been completed. Therefore, the aim of this paper is to reproduce an HDR image in LDR format in such a manner that it both improves the level of detail present in the image and allows it to be displayed on a device with a limited dynamic range. Far from being a new research field, as it has been studied for approximately two decades, it can still benefit from improvements. The technique is known as "tone mapping" and, when applied to an HDR image, it perceptually approximates the real scene by preserving a certain level of detail and contrast. What propelled this technology again, after a promising start at the beginning of the 2000s, was the advent of mobile phone integrated cameras that offered the advantages of expensive dedicated cameras at an affordable price. Most of the tone mapping algorithms [1][2][3][4] try to compress the dynamic range of the real scene and to reproduce it in a limited dynamic range. Unfortunately, most methods account for a specific class of images or have implicit control parameters based on individual statistics, and finding the appropriate values for these parameters generally poses difficulties. Taking all the above into consideration, this paper aims to describe a new method of obtaining an LDR image from an HDR one by varying different algorithms and their corresponding configuration parameters. As tone mapping algorithms do yield the desired results, each for a certain type of input image, the proposed method relies on a voting system. By combining the aforementioned algorithms, the hope is to obtain an improved limited dynamic range image. In addition to the voting system, a tone mapping algorithm was also devised, starting from the existing ones, that has a good time complexity and yields satisfying results.

2. IMAGE PROPERTIES

Luminance is a photometric measure which represents the intensity of the light emitted, reflected or passing through the objects in a scene; it is basically an indicator of how bright an object in that scene is. One of the most well-known image formats, the RGB format, lacks such an indicator, but it can either be converted into a format which does have it, or the luminance can be computed according to the formulas given by the IEC 61966-2-1 sRGB standard (Figure 1).


Figure 1. Luminance formula applied on RGB image (left-right): original image; gray-scale image (luminance)
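For illustration, the luminance referenced above can be computed with the Rec. 709 weights used by the IEC 61966-2-1 (sRGB) standard; the sketch below assumes the RGB channels are already linear and normalised to [0, 1], omitting the linearisation step for brevity.

```python
# Relative luminance of a linear-RGB image using the Rec. 709 weights.
import numpy as np

def luminance(rgb):                       # rgb: H x W x 3 array, float in [0, 1]
    weights = np.array([0.2126, 0.7152, 0.0722])
    return rgb @ weights                  # H x W luminance (grayscale) map
```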

3. GAMMA CORRECTION

Gamma correction is an operator that controls the brightness of the entire scene and it is defined by the following expression:

L1(x, y) = L0(x, y)^γ    (1)

where (x, y) is the pixel corresponding to row x, column y of the scene's matrix representation, L0 is the current luminance of the scene, L1 is the new value of the luminance and γ is the translation exponent. According to (1), if γ = 1, the output image is identical to the input image. If γ < 1, the operation is called gamma compression and it is used for brightening the scene. On the contrary, if γ > 1, the operation is called gamma expansion and it is used for darkening the scene.
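Equation (1) translates directly into code; the sketch below assumes luminance values normalised to [0, 1].

```python
# Gamma correction as defined in equation (1): L1(x, y) = L0(x, y) ** gamma.
# gamma < 1 brightens the scene (gamma compression), gamma > 1 darkens it.
import numpy as np

def gamma_correct(luminance, gamma=1.0):
    return np.power(np.clip(luminance, 0.0, 1.0), gamma)
```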


Figure 2. Gamma correction example (left-right) γ = 1, γ = 0.2, γ = 1.2, γ = 2.2

4. DYNAMIC GAMMA

Any device able to capture or display images has a characteristic called dynamic gamma, defined as the ratio between the highest and the lowest pixel value. These values also determine the upper and lower boundaries of the dynamic range. Unfortunately, conventional devices have a significantly smaller dynamic gamma (300:1 on average) than a real scene. This drawback is precisely what led to the advent of HDR technology, both in capturing and in displaying devices.

Table 1. Dynamic gamma of different devices

Device | Dynamic gamma
LCD | 250:1 – 1750:1
Human eye | 1000:1 – 15000:1
DSLR (Nikon D810) | 28500:1
(Red Weapon 8K) | 92000:1

5. EXPOSURE TIME

The exposure time is the amount of time in which the sensor inside the camera is exposed to light before the image is formed. It is usually measured in seconds or fractions of seconds. By varying the exposure time, a photographer can obtain underexposed and thus darker images or overexposed and thus overall brighter images. Both under and overexposure lead to a decrease in the visible level of details in an image.

6. OBTAINING HDR IMAGES

The most widely used approach for obtaining HDR images belongs to Paul Debevec [8] and consists of capturing consecutive frames of the same scene at different exposure times and combining them into a single image. The resulting image is composed of pixels whose values are proportional to the values of the real scene's luminance.
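Using OpenCV's photo module, Debevec's method can be sketched as follows; the file names and exposure times below are illustrative only.

```python
# Merge several LDR frames, taken at known exposure times, into one HDR
# radiance map with Debevec's method (OpenCV photo module).
import cv2
import numpy as np

files = ["exp_1_30.jpg", "exp_1_8.jpg", "exp_1_2.jpg", "exp_2.jpg"]   # hypothetical frames
times = np.array([1 / 30, 1 / 8, 1 / 2, 2.0], dtype=np.float32)      # exposure times (s)
images = [cv2.imread(f) for f in files]

response = cv2.createCalibrateDebevec().process(images, times)    # camera response curve
hdr = cv2.createMergeDebevec().process(images, times, response)   # 32-bit radiance map
cv2.imwrite("scene.hdr", hdr)                                      # Radiance HDR output
```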


Figure 3. Images with different exposure levels increasing linearly from top-left to bottom-right

Figure 4. Image obtained by applying Debevec's algorithm on the frames from Figure 3. Note that the resulting image is very detailed and no region is either too dark or too bright.


7. PECULIARITIES

Both capturing and converting an HDR image into an LDR image are subject to anomalies. A possible abnormality is caused by the involuntary movement of the camera between frames and can easily be solved, for example, by fixing the camera on a steady support. Another anomaly is caused by moving objects in the scene. Since the latter is harder to control, peculiarities such as the ghost effect are likely to appear.

Figure 5. Moving object in a scene


Figure 6. (top-bottom) The ghost effect; the same image inverted to highlight the artifacts

Another undesired artifact is the halo effect caused by inverting the contrast between an object and the surrounding details. It can be observed around small objects and it is caused by local algorithms that average the values of bright pixels and their neighbors.

8. LOGARITHMS AND EXPONENTIALS IN HDR

Since using linear functions did not yield the expected results when it came to HDR compression, the focus shifted towards logarithmic and exponential functions. The logarithm's base plays a vital role: the higher the base, the more strongly high values are mapped to lower ones. The opposite happens for the exponential function. Applied to HDR images, certain combinations of logarithmic and exponential functions determine a compression of the dynamic range. Darker areas are dealt with by exponentials in order to brighten them, while brighter areas are subject to logarithmic operations meant to darken them and reveal details.


Figure 7. Applying a logarithmic function to an image

Figure 8. Applying an exponential function to an image

Note the amount of detail revealed by applying logarithmic or exponential functions to the images in Figures 7 and 8.

9. DETERMINING THE QUALITY OF AN IMAGE

Tone-mapping algorithms can yield unexpected and sometimes undesired results such as ghost and halo effects. For this reason, it is of great importance to assess the quality of the output image, and this can be done by comparing it to a reference or ideal image.

PSNR. The "peak signal-to-noise ratio" represents the ratio between the maximum power of a signal and the power of the noise corrupting that signal:

PSNR = 20 * log10( max(Ir) / sqrt(MSE) )    (2)

MSE = (1 / (n * m)) * Σ(x, y) ( Ir(x, y) - It(x, y) )^2    (3)


where Ir and It are the reference and test images, (x, y) is the pixel corresponding to row x, column y of the scene's matrix representation, n and m are the total numbers of rows and columns, and max(Ir) is the maximum value of the pixels in the reference image. The bigger the PSNR measure, the better the quality of the image. The main drawback of this measure is that it does not take human sight into account and therefore cannot provide correct estimates.

SSIM. A measure that does account for the functioning of the human eye (defined by the HVS model) is the "Structural similarity index measurement" [12].

TMQI. The "Tone Mapped Image Quality Index" is a measure proposed by Z. Wang and it is based on a modified structural similarity index and a naturalness function based on statistics acquired from natural images.
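Equations (2) and (3) can be implemented directly, for example:

```python
# Direct implementation of equations (2) and (3).
import numpy as np

def psnr(reference, test):
    reference = reference.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((reference - test) ** 2)                   # equation (3)
    if mse == 0:
        return float("inf")                                  # identical images
    return 20 * np.log10(reference.max() / np.sqrt(mse))     # equation (2)
```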

10. SIMILAR METHODS

Tone mapping algorithms

Tone mapping algorithms reduce large dynamic ranges so the resulting image can be easily displayed on a standard screen. These algorithms can be split into two categories: local and global. A local operator will change the brightness using the current brightness of the selected pixel and a set of properties of the surrounding pixels. In the case of global algorithms, the brightness compression function is the same for all the pixels, in contrast to the local algorithms, where it varies depending on the picture fragment.

Similar voting methods

Using the idea of voting, the final results are enhanced, thus mitigating the disadvantages of some algorithms. Voting algorithms are being successfully used for image processing methods whose results strongly depend on the input images. Among them are: the binarization method, where images are converted from grayscale to black and white using a threshold chosen by the algorithm, image segmentation and OCR ("Optical Character Recognition"). The following method can be used in OCR: at first, the image areas that contain text are identified, then on each area a variable number of preprocessing filters are applied. For each filter, the OCR engine can successfully recognize a number of characters with an accuracy score. After that, the voting algorithm picks the best fitting filter for each area and the final result is obtained by combining all the picture fragments.


11. THE TONE MAPPING ALGORITHM

The proposed method is a tone mapping algorithm based on a compression function for large dynamic intervals that is applied on the entire image in the case of the global approach, or on input image blocks if a local approach is favored. The function is a combination of logarithms and exponentials. The logarithmic function restricts a value interval when applied, while the exponential one expands that interval. By using them at the same time, the brightness can be increased for dark image areas or decreased if the areas are too bright. The first step of the algorithm is to establish the HDR image luminance. The RGB color format has no variable that contains the brightness value, so it needs to be calculated for each pixel. The next step is to ascertain the minimum brightness value, the maximum brightness value, the average of the logarithm of the brightness values and the maximum difference in the logarithmic space of the brightness matrix. Using these values, the current luminance is translated into an LDR one using the following formula:

The new luminance value at (x, y) is obtained from the current luminance value at (x, y) through a compression function that combines logarithmic and exponential mappings of the statistics above (equations 4 and 5), together with a control parameter taking values in [0, 2] that adjusts the luminance of the entire scene (the default is 0.9). The last step of the algorithm is replacing the current luminance with the calculated one. A local version of this algorithm can be obtained using image segmentation and then applying the global method of brightness translation into the LDR domain on each picture block obtained by segmentation. For the local approach, the segmentation is made on 80x80 blocks. The blocks must not overlap and their reunion should produce the input picture. If the situation requires it, the blocks can be bigger or smaller, depending on the remaining pixels. After the image segmentation, the global method is applied on each block. At the block level, the results were satisfying: the details were clearer, but when scaling the image the quality decreased, and there were visible brightness differences between neighboring blocks.
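Because the exact form of equations (4) and (5) cannot be recovered from the text above, the following is only a generic global logarithmic tone-mapping sketch in the spirit of the description: it uses the log-average and the maximum of the luminance together with a brightness parameter (here called alpha) taking values in [0, 2], with a default of 0.9.

```python
# Generic global logarithmic tone mapping (a sketch, not the paper's exact
# equations (4)-(5)): compress the luminance into [0, 1] using its log-average,
# its maximum and a brightness parameter alpha.
import numpy as np

def global_tonemap(lum, alpha=0.9, eps=1e-6):
    log_avg = np.exp(np.mean(np.log(lum + eps)))     # log-average luminance
    scaled = alpha * lum / log_avg                    # exposure adjustment
    l_max = scaled.max()
    return np.log1p(scaled) / np.log1p(l_max)         # compressed LDR luminance
```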


Figure 9. Different intensity blocks (left-right): "memorial.hdr", "oxford_church.hdr"

A weighted average between the maximum values of each block applied on the entire picture was able to solve this issue.

12. ALGORITHM IMPLEMENTATION EXAMPLE

For ease of implementation, the algorithm can be split into four modules: an input module, a preprocessing module, a voting module and an output module. The modules communicate with each other through the voting module. The algorithm's architecture resembles the LDR image creation technique, the major differences being found in the voting and preprocessing modules. The input module reads the input data and starts the voting algorithms. There are two ways of reading the input image: by using an HDR image, or by using a set of standard images, each having its own exposure level, which will be merged into an HDR image using a fusion algorithm. Detection and correction methods can also be used to reduce the number of artefacts that may occur during the creation of an HDR image. The set should be made using a static scene to avoid movement blur between frames. To obtain an accurate result from the fusion algorithm, the exposure level used for each frame should be known. The input data should contain three channels (red, green and blue), whether it is an HDR picture or a picture set. The preprocessing module consists of three submodules: a submodule that contains the implementations of the tone mapping algorithms, another which contains the control parameters for the algorithms, and a submodule that computes the current luminance matrix. As a first step, the module computes the brightness matrix. Then the matrix is sent to the next step, where the main tone mapping algorithms are applied. The parameters needed may vary for each algorithm, but the luminance matrix is used by all of them.


Voting Module

The output module merges the image blocks resulted from the segmentation, then changes the current luminance with the one resulting from the algorithm and prints the resulting LDR image which has a restricted dynamic interval.

13. THE RESULTS

According to the tests, the results show that the global version of the algorithm runs faster, but does not produce an acceptable contrast between colors, while the local algorithm outputs images with a strong contrast but has a greater time complexity (which could be omitted). During the program’s execution, it can be noticed that the Durand operator makes the scene seem artistic and farther away but emphasizes the details in the center of the image, while Drago's operator makes it seem closer, but the colors are lighter. The tests have proven that the running time of the global method is logarithmic. The tests were made on input HDR images, not on LDR image sets to avoid any resulting artefacts that may result from movement. The program was run both globally and locally. For the local case, on the blocks where the ReinhardTMO algorithm won, there is a minor intensity difference between the adjacent blocks. The global voting time consists of executing the five algorithms and obtaining the results for each of them. In the case of a global vote, this is the running time of the voting system. The local voting time is the voting time of executing the entire set of algorithms on each block resulted from the image segmentation. The total running time is the sum of the global voting time and the local voting time.

14. CONCLUSIONS AND FURTHER WORK

The paper has been divided into two parts. The purpose of the first part was to demonstrate the viability of a voting-based method able to choose between different tone mapping algorithms in order to improve the result. The results were indeed satisfying, but the running time was higher. In the second part of the work, a tone mapping algorithm with a local and a global version was proposed, which yields satisfying results in a small amount of time for images with medium-sized dynamic intervals. A possible future step would be to reduce the time complexity with the help of multi-threading programming. Another step is to develop a better method for reducing the difference in intensity between adjacent blocks. This paper concludes the research carried out by the first author during the master's studies at the Faculty of Automatic Control and Computers of the "Politehnica" University of Bucharest, thus continuing the work presented in [26].


ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] E. Reinhard, M. M. Stark, P. Shirley and J. A. Ferwerda, "Photographic tone reproduction for digital images," ACM Transactions on Graphics (TOG), vol. 21, no. 3, pp. 267-276, 2002.
[2] F. Drago, K. Myszkowski, T. Annen and N. Chiba, "Adaptive Logarithmic Mapping for Displaying High Contrast Scenes," Computer Graphics Forum, vol. 22, no. 3, 2003.
[3] J. Kuang, G. M. Johnson and M. D. Fairchild, "iCAM06: A refined image appearance model for HDR image rendering," Journal of Visual Communication and Image Representation, vol. 18, no. 5, pp. 406-414, 2007.
[4] F. Durand and J. Dorsey, "Fast Bilateral Filtering for the Display of High-Dynamic-Range Images," ACM Transactions on Graphics (TOG), vol. 21, no. 3, 2002.
[5] International Electrotechnical Commission, "IEC 61966-2-1: Multimedia systems and equipment - Colour measurement and management - Part 2-1: Colour management - Default RGB colour space - sRGB," 1999.
[6] "Cambridge in Colour," [Online]. Available: http://www.cambridgeincolour.com/tutorials/cameras-vshuman-eye.htm. [Accessed: August 2017].
[7] "Camera Sensor Ratings," DxOMark, [Online]. Available: https://www.dxomark.com/Cameras/Ratings. [Accessed: August 2017].
[8] P. Debevec, "Recovering High Dynamic Range Radiance Maps from Photographs," [Online]. Available: http://www.pauldebevec.com/Research/HDR/. [Accessed: August 2017].
[9] S. Désiré and S. Abhilash, "Ghost Detection and Removal in High Dynamic Range Images," Signal Processing: Image Communication, 2012.
[10] A. Horé and D. Ziou, "Image quality metrics: PSNR vs. SSIM," Proceedings of the International Conference on Pattern Recognition (ICPR), 2010.
[11] G. Wetzstein, "The Human Visual System," [Online]. Available: https://stanford.edu/class/ee267/lectures/lecture5.pdf. [Accessed: August 2017].
[12] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, 2002.


[13] Z. Wang, E. P. Simoncelli and A. C. Bovik, "Multiscale structural similarity for image quality assessment," Signals, Systems and Computers, vol. 2, 2003.
[14] Z. Wang and H. Yeganeh, "Objective Quality Assessment of Tone-Mapped Images," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 657-667, 2013.
[15] C.-A. Boiangiu, M. Simion, V. Lionte and M. Zaharescu, "Voting-Based Image Binarization," Journal of Information Systems & Operations Management, 2014.
[16] C.-A. Boiangiu and R. Ioanitescu, "Voting-Based Image Segmentation," Journal of Information Systems & Operations Management, 2013.
[17] C.-A. Boiangiu, R. Ioanitescu and R.-C. Dragomir, "Voting-Based OCR System," Journal of Information Systems & Operations Management, p. 470, 2016.
[18] F. Banterle, "HDR Toolbox for Matlab," [Online]. Available: https://github.com/banterle/HDR_Toolbox. [Accessed: August 2017].
[19] F. Banterle, A. Artusi, K. Debattista and A. Chalmers, Advanced High Dynamic Range Imaging: Theory and Practice, AK Peters (CRC Press), 2011.
[20] Z. Wang and H. Yeganeh, "TMQI: Tone Mapped Image Quality Index," [Online]. Available: https://ece.uwaterloo.ca/~z70wang/research/tmqi/. [Accessed: August 2017].
[21] Z. Farbman, R. Fattal, D. Lischinski and R. Szeliski, "Edge-Preserving Decompositions for Multi-Scale Tone and Detail Manipulation," [Online]. Available: http://www.cs.huji.ac.il/~danix/epd/. [Accessed: 2017].
[22] "pfstools - High dynamic range images and video," [Online]. Available: http://pfstools.sourceforge.net/index.html. [Accessed: August 2017].
[23] Y. Al-Najjar and Der Chen Soong, "Comparison of Image Quality Assessment: PSNR, HVS, SSIM, UIQI," International Journal of Scientific & Engineering Research, vol. 3, no. 8, 2012.
[24] Peter, "The HDR Image," [Online]. Available: http://thehdrimage.com/tag/halos-around-trees-in-hdr/. [Accessed: August 2017].
[25] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, 2009.
[26] Răzvan-Costin Dragomir, "HDR Compression Using Voting," Dissertation Thesis, Unpublished Work, Bucharest, 2017.


APPLYING PHOTOGRAPHS FILTERS TO IMPROVE PEOPLE AND OBJECTS RECOGNITION USING AN API

Oana CĂPLESCU 1* Costin-Anton BOIANGIU 2

ABSTRACT

In a world where computer science has evolved so much that artificial intelligence can recognize objects and people's reactions from a photo, relying on technology makes surviving in a modern world so much easier. The most important factor in automatic content recognition is the quality of the image - quality which can be improved in many ways. A high contrast level in a photograph is a characteristic that ensures that more objects will be recognized, because Microsoft's Computer Vision API identifies content in the provided photographs using the shapes of the targets. It is clear that making an image's objects more "visible" by applying various effects to it will increase the algorithm's recognition rate.

KEYWORDS: API, visual recognition, Computer Vision, voting system, knowledge library

1. INTRODUCTION

Using the cloud-based Computer Vision API, anyone can analyze the content of their images. The API "provides developers with access to advanced algorithms for processing images and returning information" [1], and its "algorithms can analyze visual content in different ways based on inputs and user choices" [1].

A. Faces

The computer vision algorithms can focus on many kinds of output information, depending on the needed results. One of these algorithms [2] can detect human faces and analyze their characteristics, such as age (approximated using facial feature techniques) and gender, and it returns the rectangle of each face - useful for pictures containing multiple persons. All of this visual output is actually a subset of the metadata generated for each face to describe its content. The output for the following image is:
[
  { "age": 28, "gender": "Female", "faceRectangle": { "left": 447, "top": 195, "width": 162, "height": 162 } },
  { "age": 10, "gender": "Male", "faceRectangle": { "left": 355, "top": 87, "width": 143, "height": 143 } }
]

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


Figure 1. Example of a picture’s analysis resulted from Computer Vision API

B. Tagging images

The main focus of this paper is image tagging, which refers to the API's "understanding" (or, better said, matching) of various objects found in the provided images. The tagging consists of two main steps:
• Matching the object against the knowledge library: for an input image, Computer Vision API returns tags "based on more than 2000 recognizable objects" [3], for example living beings, scenery, house belongings and actions.
• For each match, the algorithm provides a confidence grade that describes how sure it is about the recognition of the elements inside the current tag.
Generally, the information provided by the algorithm has a confidence grade around 95-99%, but of course there are images which cannot be recognized at that level. For example, for the following image not all the information is graded with a 99% confidence. The Computer Vision API output for the picture below is:
{
  "tags": [ "train", "platform", "station", "building", "indoor", "subway", "track", "walking", "waiting", "pulling", "board", "people", "man", "luggage", "standing", "holding", "large", "woman", "yellow", "suitcase" ],
  "captions": [ { "text": "people waiting at a train station", "confidence": 0.8331026 } ]
}
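As an illustration of how such an analysis request might be issued (the endpoint, API version, region and visual-feature names below are placeholders and should be checked against the current Azure documentation):

```python
# Hedged sketch of calling the Computer Vision "analyze" endpoint with
# the `requests` library; the region, API version and key are placeholders.
import requests

ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/analyze"
KEY = "<subscription-key>"

def analyze(image_path):
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/octet-stream",
    }
    params = {"visualFeatures": "Description,Tags,Faces,Color"}
    with open(image_path, "rb") as f:
        response = requests.post(ENDPOINT, headers=headers, params=params, data=f)
    response.raise_for_status()
    return response.json()       # JSON with tags, captions and confidences
```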


Figure 2. On each tag, a confidence information is also added which describes (as a percentage) how sure the algorithm is regarding its prediction.

A human operator can see and understand that in this photo the man and the little girl are not waiting for a train but are rather leaving the metro platform. The reason why the algorithm suggested that they are waiting for the train (with a confidence grade of 83%) is that usually on a subway platform people are waiting for the train, or that in all of its previously processed pictures people were waiting for a train. This is because the algorithm, based on artificial intelligence, is being continuously trained by the images it processes, becoming more accurate after each category of photos. According to Microsoft's documentation [3], "When tags are ambiguous or not common knowledge, the API response provides 'hints' to clarify the meaning of the tag in context of a known setting." Unfortunately, it appears that at this point English is the only supported language for image description.

C. Optical Character Recognition - OCR

Computer Vision also provides output (and thus support) for Optical Character Recognition technology, which detects the text in input images and extracts the identified text into a machine-readable character stream that can be used in many ways.


According to Microsoft's documentation [4], "OCR supports 25 languages". The supported languages are: "Arabic, Chinese Simplified, Chinese Traditional, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Cyrillic and Latin), Slovak, Spanish, Swedish, and Turkish." Microsoft based the development of these algorithms on the raw output, but they did not use voting as a way of analyzing the data.

2. VOTING PROCESS

To improve the final result, the following voting systems have been used (a sketch of both rules is given after this list):
• majority voting: efficient, and it does not require a large set of data to be analyzed. Its rule is that every vote has a fixed weight and a fixed probability of occurrence, taking into account the minimum confidence set at the beginning of the voting and a delta value which decides whether the confidence obtained after applying an effect is accepted as an input to the voting process - each effect produces its own result and improves its own recognized information within the photos.
• weighted voting: needs knowledge about the original pictures (it considers previous inputs); in this case the criterion is established as follows: if the output's confidence is higher than the original one (hence the need for the previous input), it is considered for the voting process.
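The sketch below is one possible reading of the two rules described above; the acceptance thresholds, the simple-majority rule and all identifier names (Vote, majorityVote, weightedVote, minConfidence, delta) are illustrative assumptions, not the exact procedure used by the authors.

#include <map>
#include <string>
#include <vector>

// One recognized tag produced by one processed version of the image.
struct Vote {
    std::string tag;
    double confidence;
};

// Majority voting: a tag counts as a vote if its confidence reaches at
// least minConfidence - delta, and it is accepted when a simple majority
// of the processed versions report it. (Requires C++17.)
std::vector<std::string> majorityVote(const std::vector<std::vector<Vote>>& runs,
                                      double minConfidence, double delta) {
    std::map<std::string, int> counts;
    for (const auto& run : runs)
        for (const auto& v : run)
            if (v.confidence >= minConfidence - delta)
                ++counts[v.tag];

    std::vector<std::string> accepted;
    const int needed = static_cast<int>(runs.size()) / 2 + 1;  // simple majority
    for (const auto& [tag, count] : counts)
        if (count >= needed)
            accepted.push_back(tag);
    return accepted;
}

// Weighted voting: a tag from a processed image is kept only if its
// confidence exceeds the confidence reported for the original image.
std::vector<Vote> weightedVote(const std::vector<Vote>& original,
                               const std::vector<Vote>& processed) {
    std::map<std::string, double> baseline;
    for (const auto& v : original) baseline[v.tag] = v.confidence;

    std::vector<Vote> kept;
    for (const auto& v : processed) {
        auto it = baseline.find(v.tag);
        if (it == baseline.end() || v.confidence > it->second)
            kept.push_back(v);
    }
    return kept;
}

The delta term simply loosens the fixed minimum-confidence threshold, so a tag that an effect reports slightly below the chosen minimum can still take part in the vote.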

3. METHOD DESCRIPTION

Several voting-based processing methods have been developed over time [9][10][11][12]. The integrated ideas are, of course, specific to each application's desired outcome, but there are several key elements in common: the voting is performed on processed versions of the input, and none of the processing methods used can, by itself, guarantee optimal results. The analysis process is further explained in the following diagram:


Figure 3. Voting process from the initial image until the graphs which describe the improvement of the effects applied on the photos

A. Analyzing process

The API function receives an image as input (the image must be at most 4 MB), analyzes it and returns the following information as a JSON file, using tags:
• Description: an array of tags generally describing the image (such as "outdoor", "water", "tree"), captions that also describe the image (e.g. "text": "a beach with palm trees and a body of water") and, most importantly, the "confidence", which is a real number used instead of a percentage (e.g. 0.984310746);
• Tags: describe the objects found in the picture, as pairs of "name" and "confidence" values;
• Image format and image dimensions;
• Black and white flag: a Boolean value;
• Dominant colors: for background, foreground and accent.
Each piece of information provided by the algorithms has a confidence value assigned to it, which describes how confident the algorithm is about that information. A sketch of how such a response can be read is given below.
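For illustration only, the following sketch extracts the (name, confidence) tag pairs from a response laid out as above; it assumes the single-header nlohmann/json library is available, and the sample string is a made-up fragment, not an actual API response.

#include <nlohmann/json.hpp>   // single-header JSON library (assumed available)
#include <iostream>
#include <string>
#include <vector>

using nlohmann::json;

struct TagInfo {
    std::string name;
    double confidence;
};

// Extracts the (name, confidence) pairs from a response shaped like the
// output described above; missing fields are simply skipped.
std::vector<TagInfo> parseTags(const std::string& rawJson) {
    std::vector<TagInfo> result;
    json response = json::parse(rawJson);
    auto it = response.find("tags");
    if (it == response.end()) return result;
    for (const auto& tag : *it) {
        if (tag.contains("name") && tag.contains("confidence"))
            result.push_back({ tag["name"].get<std::string>(),
                               tag["confidence"].get<double>() });
    }
    return result;
}

int main() {
    // Made-up fragment in the same shape as the API output quoted earlier.
    std::string raw = R"({"tags":[{"name":"train","confidence":0.98},
                                  {"name":"platform","confidence":0.93}]})";
    for (const auto& t : parseTags(raw))
        std::cout << t.name << " -> " << t.confidence << "\n";
    return 0;
}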


1) Contrast increase: one of the most frequently applied effects is an increase of contrast, which makes the output image much clearer than the original input and therefore much easier to understand (both for the human eye and for Computer Vision); the impact is that the number of recognized objects increases (a minimal sketch of such an adjustment is given after this list).
2) Chromatic effect: another important aspect for Microsoft's Computer Vision analysis is color. Because objects are searched for in images based on past recognition, the algorithms analyze color: a beach is white to light brown, water is turquoise, green or blue shaded - all of these are patterns which train the algorithm to better recognize elements and objects.
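As a rough illustration of the contrast step in point 1), the function below applies a simple linear contrast stretch to an 8-bit grayscale buffer before the image would be re-submitted to the API; the mid-gray pivot and the factor are arbitrary assumptions, and the actual experiments used photo-editor effects rather than this code.

#include <algorithm>
#include <cstdint>
#include <vector>

// Linear contrast increase around mid-gray (128): factor > 1 stretches the
// histogram, factor < 1 compresses it. Values are clamped to [0, 255].
std::vector<std::uint8_t> increaseContrast(const std::vector<std::uint8_t>& pixels,
                                           double factor) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        double v = 128.0 + factor * (static_cast<double>(p) - 128.0);
        out.push_back(static_cast<std::uint8_t>(std::clamp(v, 0.0, 255.0)));
    }
    return out;
}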

4. RESULTS

Figure 4. Voting process on a beachside photograph on which all above effects have been applied


In this case it is very important to observe that, unlike in the original photo and in the one with the chromatic effect, in the picture with the contrast effect applied the algorithm recognized another element: the fact that it is a daytime picture. Of course, this can only be voted as the best result in majority voting, as its confidence is not higher than 0.95 (the value chosen for weighted voting). Considering a photo of higher quality (about 10 times larger than the previous one - 2 MB), it can easily be seen that the first three recognized objects have a confidence close to the ideal value - the 0.95 used for majority voting - while still remaining above it, whereas the only picture in which the furniture was not recognized is the original one (which was indeed a bit blurry - it becomes obvious that the applied effects improved this photograph).

Figure 5. Voting process on a living room photograph on which all above effects have been applied

It can be seen that each face was easily recognized as either a man's or a woman's face, and the ages (which are not visible in the current figure because the space available was too small for the algorithm) were more exact than in the original photo. Unfortunately, not much else was recognized - for example trees or other scenery objects. It can also be noticed that one detail of the boy was not recognized, but it was not recognized in the original photo either, although it is safe to say that this iteration has a higher description confidence (0.9885462) than the original one (0.9802546). Another important aspect is that the age of the people in the photo was more accurate after applying both effects (compared to applying only the contrast or only the chromatic one).


Figure 6. Output with both the contrast and the chromatic effect applied, on people's face recognition using Microsoft's Computer Vision

5. FUTURE WORK

After obtaining very good results when applying a number of effects to a single photo (such as contrast, saturation and highlights, combined with another complete effect - like chromatic [5] or Ludwig [6]), it can be concluded that, in order to achieve better recognition, the image's resolution needs to remain as high as possible after the effects have been applied - considering that the effects applied using the Canva Editor [7] and the BeFunky photo effects [8] deteriorate the image's quality. As future work, a good approach would be to apply these effects in Photoshop, as the quality would not be deteriorated and a visible impact on Computer Vision's recognition is expected.

6. CONCLUSIONS

Although the chromatic effect's results seemed promising at first, we can conclude that its output does not have that much impact: while it is an important part of the weighted voting process, majority voting could not be concluded for it, as the confidence produced by this effect did not offer any usable results. From the contrast effect's "point of view", the results were clear and well defined in both weighted voting and majority voting, which indicates an important improvement in the results of Computer Vision's algorithms.


REFERENCES

[1] Computer Vision API - Microsoft documentation, Available at: https://docs.microsoft.com/en-us/computer-vision, Accessed on 1 March 2018
[2] Detecting and analyzing faces - Microsoft documentation, Available at: https://docs.microsoft.com/en-us/computer-vision/faces, Accessed on 1 March 2018
[3] Tagging images - Microsoft documentation, Available at: https://docs.microsoft.com/en-us/computer-vision/tagging, Accessed on 1 March 2018
[4] Detecting and analyzing printed text found in images - Microsoft documentation, Available at: https://docs.microsoft.com/en-us/computer-vision/ocr, Accessed on 1 March 2018
[5] How do I replicate the Chromatic effect of the Instagram filter, Available at: https://www.youtube.com/chromatic-effect, Accessed on 1 March 2018
[6] How do I replicate the Ludwig effect of the Instagram filter, Available at: https://www.quora.com/Instagram-filter-Ludwig, Accessed on 1 March 2018
[7] Canva - Photo Editor, Available at: https://www.canva.com/photo-editor/, Accessed on 1 March 2018
[8] BeFunky - Photo Effects, Available at: https://www.befunky.com/photo-effects/, Accessed on 1 March 2018
[9] Costin-Anton Boiangiu, Radu Ioanitescu, Razvan-Costin Dragomir, "Voting-Based OCR System", The Proceedings of Journal ISOM, Vol. 10 No. 2 / December 2016 (Journal of Information Systems, Operations Management), pp. 470-486, ISSN 1843-4711.
[10] Costin-Anton Boiangiu, Mihai Simion, Vlad Lionte, Mihai Zaharescu, "Voting-Based Image Binarization", The Proceedings of Journal ISOM Vol. 8 No. 2 / December 2014 (Journal of Information Systems, Operations Management), pp. 343-351, ISSN 1843-4711.
[11] Costin-Anton Boiangiu, Paul Boglis, Georgiana Simion, Radu Ioanitescu, "Voting-Based Layout Analysis", The Proceedings of Journal ISOM Vol. 8 No. 1 / June 2014 (Journal of Information Systems, Operations Management), pp. 39-47, ISSN 1843-4711.
[12] Costin-Anton Boiangiu, Radu Ioanitescu, "Voting-Based Image Segmentation", The Proceedings of Journal ISOM Vol. 7 No. 2 / December 2013 (Journal of Information Systems, Operations Management), pp. 211-220, ISSN 1843-4711.


UNIT TESTING FOR INTERNET OF THINGS

Alin ZAMFIROIU 1* Daniel SAVU 2 Andrei LEONTE 3

ABSTRACT

The expansion of the Internet of Things and the migration to cloud computing highlight the need to test, starting from the software engineering phase, each module that is part of them. This comes from the fact that each application, module or device is developed by a different provider, which makes software testing more challenging. We will evaluate the need for unit testing for the Internet of Things from the point of view of a tester, especially a QA (quality assurance) tester. In the conclusion we will highlight the need for IoT testing and for further research.

KEYWORDS: Testing, Unit Testing, Application, Internet of Things, IoT Architecture

1. INTRODUCTION

Before going into further detail, we will focus on what we should know and understand about the concept of the Internet of Things. It is important to know the difference between the Internet and the WWW (World Wide Web) - the two terms are often used as synonyms. The Internet is the physical connection between one point and another, while the World Wide Web is the interface that makes information flow, being the layer that sits on top of the Internet. While the World Wide Web has been growing continuously, the Internet has been on a steady path since the beginning. The Internet of Things (IoT) is the evolution of the Internet, and we are experiencing it right now. People become more proactive than reactive simply by having the possibility to be aware of their environment. We evolve as people by communicating, so this is the next step of the digital world. Devices will also evolve through communication, being part of the Internet of Things.

1* corresponding author, senior researcher, National Institute for Research and Development in Informatics, lecturer PhD, The Bucharest University of Economic Studies, Bucharest, [email protected] 2 assistant researcher, National Institute for Research and Development in Informatics, Bucharest, [email protected] 3 PhD student, University “Politehnica” of Bucharest, Bucharest, [email protected]


Figure 1. The point in time when more things were connected than people via internet

The term Internet of Things was first mentioned in 1999 by a member of an RFID (Radio Frequency Identification) development department, but the term has more value and meaning in the present. We can agree with the definition given by the Cisco Internet Business Solutions Group (IBSG): IoT is simply the point in time when more "things or objects" were connected to the Internet than people [1]. Consider that around us there is another world, in which billions of devices are interconnected over IP (Internet Protocol) networks. And not only devices and electronic objects with a high level of technological development, but also things such as furniture, clothing, trees and even animals are connected [2], [3].

Figure 2. Internet of things

The Internet of Things is the new era of the Internet, the evolution of the Internet. Objects use the power of the Internet to make themselves recognizable, becoming capable of obtaining information and of making context-based decisions.


As the IoT evolves, it will unlock new possibilities for us as consumers, making our day-to-day life easier and offering life-enhancing services, and it will also boost productivity for enterprises. Regarding our lives as consumers, the connectivity provided by the Internet of Things will converge into a higher quality of life by increasing the quality of security, health, education and energy.

Figure 3. Fields that will be impacted by IoT expansion

The Internet of Things is currently in the spotlight. Your devices are now giving you advice on how to improve your life, and your watch advises you on your fitness and on what to eat. Our smartphones can open doors for us and lock them if we forget, but we have not yet thought about testing each of these systems that we interact with. We should consider ways of testing and analyze methods of testing Internet of Things devices. The challenges of the IoT environment go beyond devices, hardware and sensors and relate to big data, which is the real issue for testers. The high number of sensor interactions makes it a challenge for testers to create the environment needed for testing [4]. Even considering that the hardware and protocols are tested by the hardware developers, it becomes harder to understand the application intelligence as a whole [5]. To establish a point of view, we will assume the role of tester from the QA (quality assurance) perspective, even though QA eventually means going through all the stages of testing. We should consider testing IoT devices before they reach maturity, taking into consideration the fact that these devices transmit huge amounts of data, part of which can be considered critical. In the IoT world, sensors, devices and applications form an ecosystem; if we apply testing thinking only about the QA convergence of hardware and software, it is not enough - the simple validation of the functionality of each system individually is not sufficient in a complex architecture [6], [7]. As we mentioned, taking each system individually is still not enough; a working system may not collaborate with all the systems it interacts with. For example, a shipment tracking device will need to be able to communicate with different back-ends and will need specific algorithms in order to assure delivery within the defined parameters.


2. IOT TESTING AREAS

Cognizant and collaborators [8] defined the following types of tests to be performed within an IoT system:
• Connectivity testing: aims at testing the wireless signal to determine what happens when the connection is poor or when several devices are trying to communicate at the same time.
• Security testing: has as objectives confidentiality, authorization and authentication.
• Functionality testing: validates the correct operation of the IoT system.
• Performance testing: validates communication and computing capabilities. Stress testing is done to determine the number of simultaneous connections that can be supported by a particular device.
• Exploratory testing: also known as user experience testing.
• Compatibility testing: verifies the correct operation with respect to different protocols and configurations.
The important areas for testing IoT systems are presented in Figure 4.

Figure 4. IoT testing areas

There are some reports on IoT testing procedures [8], [9], [10]. Most reports focus on performance testing [11], [12], [13], IoT resource emulation and IoT testbed deployments [14], [15].


2.1. Connectivity

The main objective of connectivity testing is to ensure the connection between the objects and the communication infrastructure. Seamless connectivity is an element of critical importance in an IoT network. All devices must always be connected to each other and to other systems, such as IoT servers. The success of an IoT system depends on how well the devices and the hub are connected. Even a split-second connection loss may lead to inaccurate data, which will make the system unusable. Thus:
• the device must be permanently connected to the hub, even if the hub is in sleeping / power saving mode;
• the device must regularly send ping messages to ensure the connection is in place.
As for the information exchanged between users and devices, IoT products implement APIs (Application Program Interfaces). These interfaces are programs responsible for receiving a message and for replying to it through a new message. There are currently several tools available for testing APIs. These tools can simulate a message sent / received by a device, which helps validate information accuracy. Connectivity testing is done both online and offline. The online test analyzes the connection between devices and applications, data transfer and network security. The offline test analyzes what happens when the network is not available. For vital applications, devices such as pacemakers or health monitors must work continuously, regardless of the state of the network. The device must have the ability to store and process the data collected while offline, and then transmit it when the network connection is restored (a toy unit test for this store-and-forward behavior is sketched below). Often the network connection is intermittent or uncertain. Thus, if the connection is unexpectedly lost, it is important for the user to be sure that their data is saved and stored correctly and that it is provided when the connection is restored. Devices have to communicate with each other. Attention should be paid to the different methods of communication and to the information exchanged between devices and users. During connectivity tests, the context in which the devices will be used (network type, signal strength, weather conditions, etc.) should be considered and it must be checked whether the device operates under these conditions.
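A toy illustration of the store-and-forward requirement above: the device model, its method names and the assert-based checks are all invented for this sketch and do not correspond to any particular product or test framework.

#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy device model: readings are buffered while the network is down and
// flushed to the hub once the connection is restored.
class DeviceBuffer {
public:
    void setOnline(bool online) { online_ = online; }

    // Returns the number of readings actually delivered to the hub.
    int send(const std::string& reading) {
        buffer_.push_back(reading);
        if (!online_) return 0;
        int delivered = static_cast<int>(buffer_.size());
        buffer_.clear();                 // assume the hub acknowledged them
        return delivered;
    }

    std::size_t pending() const { return buffer_.size(); }

private:
    bool online_ = false;
    std::deque<std::string> buffer_;
};

int main() {
    DeviceBuffer dev;
    assert(dev.send("t=21.5C") == 0);    // offline: nothing delivered
    assert(dev.pending() == 1);          // ...but the reading is stored
    dev.setOnline(true);
    assert(dev.send("t=21.7C") == 2);    // reconnect: backlog + new reading sent
    assert(dev.pending() == 0);
    return 0;
}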

2.2. Security

Security testing verifies that access to protected data is granted only to authorized parties and that unauthorized access is restricted. It verifies the security of information, the confidentiality and the reliability of the system in order to guarantee the quality of the IoT environment. Because connected devices store sensitive information, security testing ensures the correctness of the steps taken to ensure safety and confidentiality. It is important to validate the user through authentication and to have data privacy checks as part of the security tests.


Security testing covers confidentiality, autonomy, control, and protection against espionage. Appropriate security and penetration tests are essential because weak security measures can lead to the loss of sensitive personal information. In the case of IoT devices, in addition to the theft of private information, cybercriminals can also attack home security systems or in-vehicle systems to cause accidents. According to a study by Hewlett Packard, 70% of existing IoT devices are vulnerable to security issues for the following reasons: the lack of data encryption, the lack of minimum password requirements, and unprotected access to the user interface. The most commonly reported security issues include: confidentiality issues, insufficient authorization, lack of transport encryption, insecure Web interfaces, inadequate software protection. With billions of built-in sensors, it is essential to address data privacy and security issues within the IoT ecosystem. The different types of security testing requirements are: identification and authentication, data protection, data encryption, data storage security in the local and remote cloud. Unauthorized access to devices or information must be avoided. Security testing is often ignored because of the market pressure on companies to continuously launch new products. Another reason why this type of testing is neglected is the lack of understanding of security tests by IoT object manufacturers. There are two main types of security testing:
• Static tests: performed either manually or through code-examining tools. The objectives of these tests are analyzing the programming language / code developed for the device and identifying whether the programmer has complied with best coding practices and the code does not breach security;
• Dynamic tests: the device is checked during normal operation. The tools used look for authentication problems, simulate hacking attacks, indicate invalid device memory usage, etc.
Security issues cover various security breach scenarios where the IoT device is used as a weak point of access to the network, as well as possible security breaches leading to user harm or violation of confidentiality.

2.3. Functionality

Functionality testing includes end-to-end testing of the IoT ecosystem to ensure that the system generates the desired results and behaviors according to business requirements. Functionality testing targets Web sites, user interface and back-end. The purpose of these tests is to verify the application's functions to see if they meet all the functional requirements. It analyzes customer requirements and how the consumer wants the output, based on IoT application-specific inputs. Functionality testing is one of the most important methods for any software project and will continue to be extremely important as the Internet of Things expands, requiring powerful test management solutions. The strategy for functional testing of IoT must start by creating virtual devices that can simulate real-world environments and connectivity. One example is the Nest Home


Simulator Testing Tool that works with Nest products so that it can quickly and easily test system events and sensor status. IoT companies should consider using a virtual environment for functional testing of their products. There are a considerable number of challenges and obstacles that developers and researchers have to face in the testing of IoT products. Therefore, it is important that they focus on the main components of the products. Similar functionality testing principles apply to both IoT automation products and IoT Web sites. As a first step, basic components that require functional testing need to be identified. The identified components must be tested on both local and mobile devices. The key factor for functional testing of IoT is a real-life simulation environment using real devices or simulators to test these core components. Modes of functional testing of IoT devices may differ from product to product as the Internet of Things market expands. There are a significant number of challenges and obstacles that manufacturing companies have to face during the testing process. That is why it is important to plan ahead and create the necessary tools to simulate the real-life environments. In addition, innovations are needed to ensure the quality and security of IoT products. There are cases of negative or positive testing. In case of positive testing, the application is verified based on valid input data. Negative testing is performed to make it clear that the application does not work when invalid input data is provided. When conducting test cases, staff involved should consider aspects such as body movements, voice commands, and sensor usage.

2.4. Performance

Performance testing aims to test the behavior of IoT devices across the network and to test the internal capabilities of the embedded systems and the network communication. Performance testing allows you to evaluate the promptness of a communications network and the internal computing capabilities of the embedded software system. The primary objective of this type of test is to determine the relationship between the object and the software with which it interacts and to standardize the association between them. Performance testing validates the hardware and software components of a device based on multiple test cases. It evaluates whether an application can handle the projected increase in user traffic, data volume, frequency of transactions, etc., therefore addressing scalability issues. Compatibility should be validated by analyzing interactions between sensors in order to ensure effectiveness in a real IoT environment. Evaluators should consider factors such as network bandwidth, latency, packet loss, number of concurrent users, etc., as these factors influence performance significantly. For example, a physical object may not respond to a user's command. In order to avoid connection problems that can affect device performance, tests with different types of networks and data streams can be performed. Network activity needs to be analyzed in detail, paying particular attention to the speed of data transfer from one network layer to another. It is also necessary to verify that data is transmitted and stored correctly, even in the event of an unexpected service interruption. In order to evaluate the overall performance of the IoT application and to validate the response time for different load rates, the code needs to be optimized and various scenarios, such as battery discharge, memory reduction and switching between different networks, must be followed. Testing the performance of IoT devices involves the following aspects:
• each authenticated device within range must be able to connect to the hub;
• the device must be able to send any amount of data to the hub (as required);
• if the data sent by the device exceeds a predefined quantity, data transfer should only be initiated after confirmation is received from the hub;
• the device must be able to send data, even in case of power supply problems.
These issues need to be resolved as soon as possible. Performance testing targets three main levels:
• system level: processing, analysis, database;
• application level;
• network and gateway level: testing technologies such as Wi-Fi, Bluetooth, Z-Wave, RFID, NFC, and protocols such as HTTP, CoAP, MQTT and other IoT-specific protocols.
Tests can estimate the built-in software capacity and the reliability and stability of the communications networks under certain operating conditions.

2.5. Exploratory

Exploratory testing is the process of investigating an application through logically created but ad-hoc tests that allow you to study and understand the application's features and operations. Exploratory testing is conducted in an empirical manner, in which it is possible to study and evaluate the application in a different way than through predefined test procedures. Exploratory testing of software included in the Internet of Things combines the logical-cyber world with the physical world. The success of any software, including Internet of Things software, is determined by its users. Even an IoT application that meets all requirements can be a failure if it does not gain the trust of the target audience. In this context, exploratory tests are important because they allow you to determine how an application behaves when it is used by the end user. Exploratory testing is a type of testing that is performed from the user's perspective. By looking at the software under development from this perspective, staff involved in testing can have a better picture of how the application behaves in real terms. This is particularly important because not only the ability of IoT applications to communicate and interact with a wide range of devices can be evaluated, but also their ability to significantly improve end-user living conditions.


Internet of Things devices can be affected by new types of software errors that require new testing approaches. Staff involved in testing activities need to know and use new exploratory testing models to tackle both physical and cyber worlds. Exploratory tests help developers learn more about the functionality of a program and find out if the requirements were correct and fully understood.

2.6. Compatibility

It is an essential step in IoT testing that evaluates the interaction between IoT software and various intelligent devices, platforms, network layers and operating systems. It aims to guarantee the scalability and security of data exchanges and ensure the compatibility of communications protocols. Considering the complexity of the IoT system architecture, compatibility testing appears to be a pressing necessity. This type of testing must always be done in the real world and not in the virtual environment. Because IoT systems include a multitude of devices, sensors, protocols and platforms that are constantly updated, the number of possible combinations is extremely high. There are a lot of devices that can be connected via IoT systems. These devices have a very diverse software and hardware configuration. Therefore, it is essential that compatibility testing involves a comprehensive test matrix to cope with the complex architecture of Internet of Things. At a technical level there is a wide range of quality attributes: compatibility, installation and use of resources. These must be checked to provide objective test results to the client. Also, by testing compatibility, the way the different devices communicate with each other and with the digital environment is evaluated. Various validation options, such as hardware and encryption compatibility checking, and compliance with security standards from device level to network layer are being performed within this type of testing. Testing features such as: operating system versions, browser types and versions, generations of devices, communication (e.g., Bluetooth). Testing compatibility verifies whether the functions are working correctly in different configurations, combinations of device versions, protocols versions, mobile devices, and mobile operating system versions.

3. UNIT TESTING

Testing may go unnoticed in the IT field; in fact, the only reason one may not know what testing is, is not being part of the software engineering industry. It may not be given the importance it deserves, but its purpose - checking whether a piece of software does what it is supposed to do on paper - remains very important in the development process. It is basically the acceptance step which determines that the software works at the required quality level. Testing may be seen from different perspectives. Functional testing validates that the main functions fulfill the requirements; system testing validates the most common usage paths before releasing; performance testing approaches a different angle, by testing the system under certain conditions of load and stress.


Figure 5. Unit testing is in the center of all testing techniques

Unit testing, on the other hand, verifies that each part, even the smallest unit, corresponds with the defined documentation and APIs. Having the certainty that everything runs and corresponds after unit testing, we can go further and validate the software. Taking this into consideration, we can state that unit testing is the basis for any other form of testing. In essence, unit testing means testing an application, a function or a class, and it is usually automated. From our perspective, when we talk about IoT devices, unit testing should start at the level of the interconnection of the different objects / devices.

4. UNIT TESTING FOR IOT SYSTEMS

If we consider an IoT architecture, by applying the principles of testing and considering its steps, unit testing represents device testing, but from different perspectives such as performance, security, compatibility and usability of each component [15], [16]. To make unit testing easier to understand in the IoT scenario, we will split the environment into two main categories: the device interaction layer and the user interaction layer. From the perspective of device interaction, there are some required elements that need to be present in testing:
Interoperability. Devices must have the possibility to work among other devices, other developments and implementations.
Standards. Devices and sensors must respect the standard and be validated as working in conformance with the established standard for their category regarding communication protocols and quality. This is the point where the devices need to be tested before putting them on the market.


Security. This has become a sensitive topic. Security must be assured regarding data protection, authentication, data encryption, cloud storage and back-up [17], [18].
The user interaction layer is where the device and the end user communicate. This may vary with the level of experience of the user. Test areas may be:
Network capability. Devices must be able to communicate, so testing different network modes is a must. It is also necessary that the communication methods fulfill the user's needs while keeping energy consumption in mind.
Usability. Of course, the user's knowledge influences the level of usage of a device, but the response of a device must stay within certain parameters in order to offer the end user the capability to understand the interaction between different machines / devices.
Services and back-end development. The user interface and interaction may be the key to a system working properly, but the IoT system as a whole must have a complex analytical engine to ensure a higher level of user experience [19], [20], [21].
When we talk about IoT unit testing, these are the areas that a tester should cover for each device in the ecosystem. The software engineering part will always be covered by the manufacturers, so the 'unit' from the IoT perspective is the device / object itself, which should be tested within the architecture it will be part of. If we have a system based on sensors with an Arduino board, we can use ArduinoUnit to develop unit tests [22]. To use this framework it is necessary to have the Arduino IDE and to install ArduinoUnit from Menu -> Sketch -> Include Library -> Manage Libraries (Figure 6).

Figure 6. Install ArduinoUnit library

To use ArduinoUnit in a program, we have to include its header file:

#include <ArduinoUnit.h>

and afterwards create tests for our methods. Suppose we have a method that calculates the sum of two numbers (Figure 7).

Figure 7. The tested method


To create the Right test (in the sense of the Right-BICEP principle) for this method, we implement the test shown in Figure 8.

Figure 8. The Right test for the sum() method

In the loop() method, all we have to do in order to run the implemented tests is to call the run method of the Test class:

Test::run();
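Since Figures 7 and 8 are not reproduced here, the complete sketch below is only a plausible reconstruction of what such a method and its "Right" test could look like with ArduinoUnit; the function name, the test name and the chosen values are assumptions.

#include <ArduinoUnit.h>

// The method under test (reconstruction of Figure 7).
int sum(int a, int b) {
  return a + b;
}

// A "Right" test: checks that the result is right for representative inputs
// (reconstruction of Figure 8).
test(sum_returns_correct_result) {
  assertEqual(sum(2, 3), 5);
  assertEqual(sum(-1, 1), 0);
}

void setup() {
  Serial.begin(9600);   // ArduinoUnit reports test results over the serial port
}

void loop() {
  Test::run();          // runs every test defined with the test() macro
}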

5. CONCLUSION

A basic IoT system will borrow some testing techniques from the software engineering field in order to validate its applications. For an IoT ecosystem, unit testing will be linked to testing each device's functionality with respect to the ecosystem it is supposed to work within. Considering this, the focus of testers will have to be on the following:
• performance testing regarding network capabilities, ensuring the level of communication;
• security level;
• intercompatibility;
• exploratory testing for the user experience.
The progress made by the Internet of Things in the last couple of years and the investments made in infrastructure show that this is the future of the Internet. As the functionality and multi-domain usage of the IoT grow in the consumer and even enterprise markets, QA and tester teams should gear up and keep pace with digitization. Skill and training on the testing side will have a huge influence on how the IoT expands its world of interconnectable devices. Blending the Internet of Things into enterprises will require testers to develop broader skills in order to go beyond traditional functional testing and think about integration testing of software, big data and the components of IoT ecosystems. Taking into consideration all of the above and the fact that testing for the Internet of Things is still limited, there are some areas that should be taken into consideration when we talk about IoT testing:
• interoperability testing;
• testing the Internet of Things ecosystem under limited connectivity;
• techniques for the standardization of platforms and configuration possibilities.
Another important step that should be explored is the automation of integration testing, but this is hardly possible until IoT reaches maturity.


ACKNOWLEDGEMENT

This paper presents results obtained within the PN-III-P1-1.2-PCCDI-2017-0272 ATLAS project ("Hub inovativ pentru tehnologii avansate de securitate cibernetică / Innovative Hub for Advanced Cyber Security Technologies "), financed by UEFISCDI through the PN III –"Dezvoltarea sistemului national de cercetare-dezvoltare", PN-III-P1-1.2-PCCDI- 2017-1 program.

REFERENCES

[1] Keyur K Patel, Sunil M Patel, Internet of Things-IoT: Definition, Characteristics, Architecture, Enabling Technologies, Application & Future Challenges, 2016
[2] GSMA, URL: https://www.gsma.com/iot/wp-content/uploads/2014/08/cl_iot_wp_07_14.pdf, 2014
[3] Dave Evans, The Internet of Things: How the Next Evolution of the Internet Is Changing Everything, April 2011
[4] URL: https://blog.testlodge.com/testing-the-internet-of-things/, 2017
[5] Patrick Kua, Unit Testing, URL: https://www.thekua.com/publications/AppsUnitTesting.pdf
[6] URL: https://www.360logica.com/blog/internet-things-iot-testing-challenges-considerations/
[7] Kelly Hill, Testing the internet of things: making the IoT work
[8] Cognizant (2016). The internet of things: QA unleashed. https://www.cognizant.com/InsightsWhitepapers/the-internet-of-things-qa-unleashed-codex1233.pdf, 2016.
[9] Bloem, J. (2016). IoTMap - Testing in an IoT environment. Sogeti Publisher.
[10] RCR-Wireless (2016). Testing the internet of things: Making the IoT work. https://www.2j-antennae.com/files/1479994838.pdf
[11] Lunardi, W. T., de Matos, E., Tiburski, R., Amaral, L. A., Marczak, S., and Hessel, F. Context-based search engine for industrial IoT: Discovery, search, selection, and usage of devices. In 2015 IEEE 20th Conference on Emerging Technologies Factory Automation (ETFA), pages 1-8, 2015.
[12] Thangavel, D., Ma, X., Valera, A., Tan, H. X., and Tan, C. K. Y. Performance evaluation of MQTT and CoAP via a common middleware. In Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014 IEEE Ninth International Conference on, pages 1-6, 2014.
[13] Vandikas, K. and Tsiatsis, V. Performance evaluation of an IoT platform. Proceedings - 2014 8th International Conference on Next Generation Mobile Applications, Services and Technologies, NGMAST 2014, pages 141-146, 2014.


[14] Adjih, C., Baccelli, E., Fleury, E., Harter, G., Mitton, N., Noel, T., Pissard-Gibollet, R., Saint-Marcel, F., Schreiner, G., Vandaele, J., and Watteyne, T. FIT IoT-LAB: A large scale open experimental IoT testbed. In IEEE World Forum on Internet of Things, WF-IoT 2015 - Proceedings, pages 459-464, 2016.
[15] Sanchez, L., Munoz, L., Galache, J. A., Sotres, P., Santana, J. R., Gutierrez, V., Ramdhany, R., Gluhak, A., Krco, S., Theodoridis, E., and Pfisterer, D. SmartSantander: IoT experimentation over a smart city testbed. Computer Networks, 61:217-238, 2014.
[16] Esquiagola, J., Costa, L., Calcina, P., Fedrecheski, G. and Zuffo, M. Performance Testing of an Internet of Things Platform. In Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security (IoTBDS 2017), 309-314, ISBN: 978-989-758-245-5, 2017.
[17] Kiruthika, J., Khaddaj, S. Software Quality Issues and Challenges of Internet of Things. In 2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), IEEE, 176-179, 2015.
[18] Marinissen, E. J., Zorian, Y., Konijnenburg, M., Huang, C. T., Hsieh, P. H., Cockburn, P. and Verbauwhede, I. IoT: Source of test challenges. In 2016 21st IEEE European Test Symposium (ETS), IEEE, 1-10, 2016.
[19] Xu, T., Wendt, J. B. and Potkonjak, M. Security of IoT systems: Design challenges and opportunities. In Proceedings of the 2014 IEEE/ACM International Conference on Computer-Aided Design, IEEE, 417-423, 2014.
[20] Sicari, S., Rizzardi, A., Grieco, L. A. and Coen-Porisini, A. Security, privacy and trust in Internet of Things: The road ahead. Computer Networks, 76, 146-164, 2015.
[21] Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M. and Ayyash, M. Internet of things: A survey on enabling technologies, protocols, and applications. IEEE Communications Surveys & Tutorials, 17(4), 2347-2376, 2015.
[22] ArduinoUnit, https://github.com/mmurdoch/arduinounit


A SURVEY ON SOUND-BASED GAMES

Diana-Elena BELMEGA 1* Costin-Anton BOIANGIU 2

ABSTRACT

This paper aims to explore the realm of sound-based games and to discover this field's originality, challenges and advancements. Fully discovering and enjoying a sound-based game means not only paying attention to the various sounds that "surround" the player, but also trying to construct in the player's imagination the world that generated them. This is a world where imagination, music composition, sound generation, imaginative plots and many modern technologies intersect. In recent years, a consistent list of titles from all the game territories (action, adventure, survival thriller, stealth thriller, mystery, horror, psychological, first-person shooter, and puzzle-based adventure) has emerged. The paper at hand makes a novelty-based selection of representative games and tries to understand their main concepts and their contributions to the genre.

KEYWORDS: Sound-Based Games, Sense Substitution, Binaural Audio, Sonic Landscape, Soundscape, VR Headset, Doppler Effect.

1. INTRODUCTION

1.1. About sound-based games

While having sound in a game is common nowadays, when referring to sound-based games, one means that these games put an emphasis on the sound and use it as one of the main mechanisms of the game. One example is using sound intensity to guide the player towards its source in a dark environment.

1.2. Action game

As explained in the book Fundamentals of Game Design, by Ernest Adams, an action game is a game in which the player’s physical skills and coordination are tested [1]. These games usually require quick reactions and good hand-eye coordination. Various challenges, such as puzzles, conflict, and quick-time-events (during which the player must press the correct buttons at the right time) arise during gameplay [1].

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


2. STATE OF THE ART

This chapter introduces notions that regard sound-based games and presents examples of such games.

2.1. Lurking

Lurking [2] is a sound-based survival thriller game, created by a team of four students from the DigiPen Institute of Technology Singapore. The game uses sound to create audio pulses, which reveal the environment surrounding the player for a limited amount of time [2]. This is the only way to discover the appearance of the surroundings.

Figure 1. The graphics of the game. The circular shapes depict audio pulses. Image taken from [2].


Figure 2. Red contours depicting danger. Image taken from [2].

The premise of the game is that the player must escape the location they start in, while avoiding creatures that follow him or her based on the sounds made. The creatures' goal is to kill the player. When the player walks or runs, sounds of different intensities are emitted, which help the player see in front of themselves. The player can also create sound without moving, by pressing a button to create audio pulses or by using the microphone's input. One should be careful about the amount of noise made, because noises help the antagonists figure out the player's whereabouts. The graphics of Lurking are minimal (Figures 1 and 2), using mostly white contours on a black background, red for signaling danger and light yellow for certain items that may be of interest to the player.

2.2. Stifled

The game, Stifled [2], a follow-up to Lurking, is a sound-based stealth thriller by Gattai Games, released in 2017, for the Playstation 4 and the Playstation VR. It maintains the same concept of sound being simultaneously an aid in exploration and a risk of attracting threats as its predecessor [3]. In addition to the familiar gameplay from Lurking, this game supports the use of the Playstation VR headset, which enables the player to “see” through the eyes of the character they’re playing. Thus, a more immersive experience is created, as the player feels as if they are in the game. Stifled uses the same minimal graphics as Lurking (a black background and white and colored lines to draw the environment), but also added 3D rendered and realistically colored objects throughout the game (as can be seen from official screenshots at [3] and in figure 3).


Figure 3. A location in Stifled using realistic 3D elements. Image taken from [3].

2.3. Perception

The game Perception [4], a first-person psychological game developed by TheDeepEndGames, introduces the story of a visually impaired woman, Cassie, who explores a haunted house. Using the premise of blindness, the developers introduced the echolocation mechanic to create a perception of the surrounding world, as anything that is reached by the emitted sounds lights up with a light blue tint (Figure 4) [4]. As in the previously presented games, there is a downside to using audio signals as a replacement for vision, because it increases the risk of attracting creatures that the character is not prepared to face. This is summarized by the game's creative director, Bill Gardner, who said that the game is "about carefully weighing the risk and reward of creating noise", and about being aware of one's surroundings [5]. The character's only defense mechanisms are running, hiding, and using distractions. The game presents challenges like noise sources that need to be handled with care, such as a room full of bubble wrap, which requires careful navigation, or radios and talking dolls [5]. Perception contains 3D graphical elements in muted colors, which add more realism and set the mood. The player's character being blind, the visuals are dimly colored, usually in very few different colors at a time (as can be seen in the game's trailers and screenshots) [4].


Figure 4. The use of echolocation revealing the surroundings of the player. Image taken from [4].

2.4. BlindSide

A game by Aaron Rasmussen and Michael T. Astolfi, BlindSide [6] is an audio-only horror game, which won the "Most Innovative" category award at the Games for Change festival in 2013 [7]. There is no visual environment, the surroundings being depicted by the sounds the players hear in their headphones. The inspiration for the game comes from the experience of one of the game's co-developers, Aaron Rasmussen, with temporary blindness that resulted from an explosion during a high school chemistry class [6]. In this game, the player identifies as Case, an assistant professor who, along with his girlfriend, wakes up blind in a dangerous city, which they must get out of [6]. The environment offers only auditory feedback, alongside simplistic light signals that indicate the direction of the sound source. By using the pitch of the sound to signal danger to the player, the game takes advantage of the Doppler Effect [7]. A notable technical detail, available in the iOS version of the game, is the use of the phone's gyroscope to sense the direction the player wants to head in [8].

2.5. Dark Echo

Dark Echo [9], developed by RAC7 Games, is an atmospheric, adventure puzzle game, available for the Android, iOS, and PC platforms [10]. It consists of 80 levels [10] in which the player’s end goal is to find the exit, while being engulfed in complete darkness. The following description of the game is based on the authors’ personal experience after playing 11 levels of the game on an Android mobile phone.


Figure 5. The player’s movement is illustrated by footsteps, while the white lines represent the noise made by the walking player. The red lines depict danger. Image taken from [9].

The game is in 2D space and the player sees everything from above. As there is no light available, at first, one can see only two footprints, representing the player’s character. Playing on an Android system, the player can tap in the center of the screen to create a sound. The resulting sounds are depicted as white lines that reflect and bounce off the walls, thus revealing the surroundings. If the player holds and then releases the finger, a louder sound is created, which travels further and therefore, reveals more. By moving, the character’s footsteps also create sounds, but they are not as strong as those created by tapping or holding and releasing in the center of the screen. The character moves towards where the player holds their finger on the screen. If the player just taps in one direction, the character moves one step per tap, thus making less noise. Sometimes, the player encounters objects depicted by yellow lines, which trigger an event, like a pathway opening. The player must be careful about the amount of noise created because creatures are trying to kill him or her, and they sometimes follow their target by the noise coming from his or her direction. Red lines illustrate the creatures’ noise, so that one can know where danger lies.

2.6. The Papa Sangre series

Papa Sangre [14] is a horror sound-based game for the iOS, developed by Somethin’ Else and released in 2010 [15]. It was described as “the first binaural real-time, 3D audio engine implemented on a handheld device” [15]. The game features no visuals except for a 2D user interface, being based entirely on sound, and must be played with headphones on. A description of the game follows, after the authors watched a video of the gameplay [16].


Binaural audio consists of sound recorded by two microphones placed in ear-like cavities on the right and left sides of a dummy head [13]. Through this technique, sounds are recorded exactly as human ears would hear them. As a result, the listener can localize the sounds in a 360° space [13]. The game starts with the following premise: the player must save his or her loved one from the kingdom of Papa Sangre, by leaving the world he or she lives in. Afterwards, an eerie sound engulfs the player and it can be heard as the player crashes to the ground and reaches the “land of the dead, the land ruled by Papa Sangre” [16], where it is pitch black dark [15]. The voice of an entity then instructs the player how to move and turn around [16]. The 2D user interface is used to delimit the screen in zones that can be touched to input a certain action. Moving is realized by tapping the left and right sides of the screen, taps which are turned into steps. Tapping quickly makes the player run, but running too fast makes the character trip. Turning right or left can be done by swiping the upper half of the screen in the desired direction. One can get accustomed to this by turning around to face the entity, guiding themselves by the direction of her voice. The game introduces chambers in which one must collect “the trail of music” [16] that leads them to Papa Sangre. The musical notes indicate their position by emitting a short repetitive sound, like a “beep”. The player must collect them by walking towards them. After collecting all the notes, the “sound of the light” appears, which represents the exit of the chamber. The “sound of the light” [16] resembles a twinkle, a sound that loops until the player goes through the exit. Monsters also lurk in the chamber. The guiding voice presents each monster’s sound, so the player can recognize and avoid it. If the player runs into a monster, one can hear the terrified sounds of the player’s character as him or her dies [16]. By listening to the gameplay with headphones on [16], the game proves to be an interesting and immersive experience because of the quality binaural audio. The sound designer of Papa Sangre, Nick Ryan, explained that they had to limit the number of sounds heard at a time to three, because otherwise “it was too confusing” [15]. Papa Sangre II is the sequel to the game, released in 2013 [17]. It uses the same concepts as the first game, while also introducing new features: by tapping the top of the screen, the player can use the character’s hands for actions such as opening doors, clapping to scare monsters and firing a gun [17]. These new abilities therefore give the player the opportunity to also defend themselves against danger instead of always running away from it [17]. The other new feature of the game involves the use of the mobile device’s gyroscope to link the physical turns of the device to the player’s character turning in the game [17].

2.7. Shades of Doom

Shades of Doom [18], developed by GMA Games and released in 2001 [19], is both a first-person shooter and an audio game, aimed at visually impaired people. It features dynamic and multi-layered sound and "3D sound with up to 32 simultaneous sounds" [18]. It also uses the Doppler Effect to create a "realistic movement sound" [18]. The authors of this paper experienced a small demo of the game and will use that experience to briefly describe it. The player finds themselves in a top-secret military base, where they must shut down an "ill-fated experiment" [18]. The player's character is equipped with weapons, a medical kit, and a computer, which can analyze the surroundings and guide the person playing the game [18]. The game is self-voicing [18]: a robotic voice reads aloud the menu and options. That voice also gives indications to the player during the game (e.g. to use the arrow keys to select the difficulty level of the game or to point out a nearby obstacle). The echo of the character's footsteps tells the player that they are moving, while the sound of the wind and of the equipment are also designed to guide the player in the right direction [18]. Bumping sounds are made whenever the character runs into an obstacle. The authors of this paper found the game difficult to play, as it was not clear what the game's controls were. Furthermore, the authors could not easily understand what the guiding voice was saying because of its robotic quality. However, the game version experienced by the authors is a demo, version 1.2. Based on a gameplay video [20] of the game's 2.0 version, the developers replaced the robotic voice with an improved, clearer one.

2.8. Game design concepts

In their paper [11], Friberg and Gärdenfors talk about the TiM project (Tactile Interactive Multimedia), in which the Stockholm International Toy Research Centre (SITREC) developed three audio games aimed at visually impaired people. Their goal was to develop games that were both suitable for blind people and aesthetically pleasing through the audio used [11]. They developed three games, two of which (Mudsplat and Tim's Journey) are of interest to our research because of the design concepts they employ. Mudsplat [11] is a game in which the player must defeat monsters that throw mud at them. They can do so by using a water hose to attack the monsters. The game comprises 25 levels, divided into 5 different settings, also known as "worlds", which distinguish themselves through the background music [11]. In the last level of each setting, the player must defeat the "boss", "an extra tough and difficult monster" [11]. Being a game without any visual aid, the player must be able to create a mental image of the level based only on the audio of the game. In order to achieve this, "the sounds are heard from a first-person perspective" [11] and there is adaptive variation of volume to give the player a sense of movement in space. Not all sounds are self-explanatory. Therefore, links between sounds and concepts, such as how big and dangerous a monster is, must be established early in the game [11]. Tim's Journey, the second game of interest developed by SITREC, creates a complex world out of sounds, a sound landscape, and encourages its exploration, its goal being that of "unravelling a mystery" [11]. The world is an island, with different "scenes", such as a forest or a mill [11].


Unlike other games for visually impaired people, which avoid blending different sounds together and using ornamental soundtracks "to facilitate navigation" [11], Tim's Journey's sounds contribute to a pleasant soundtrack. Each sound is treated as a musical component: by their position, sound objects "reflect musical structures such as themes, choruses and bridges, and all sounds fit into percussive and melodic patterns" [11]. Important sounds stand out by their usage frequency in the soundtrack, perceived intensity and majestic "presence" [11]. The game provides navigational aids to help the player discover the world, such as foghorns positioned at predefined locations in all 4 cardinal directions, non-player characters that "provide information or clues to the plot", footstep sounds that indicate the "kind of surface the player is currently walking on" and a device called the "Ambience Reductor", which can temporarily lower the volume of all sounds that the player cannot directly interact with [11].

SITREC suggested a categorization system for the sounds used in the games they developed. The sounds can be divided into [11]:
• Avatar sounds – sound effects emitted by the actions of the player's character (e.g. footsteps, touching objects)
• Object sounds – used to indicate an object's presence to the player; can be short and recurring or long and continuous
• Character sounds – made by non-player characters
• Ornamental sounds – do not convey gameplay information, but are used to enhance the atmosphere of the game (e.g. ambient music)
• Instructions – recorded speech used to help the player solve tasks

The paper mentions two well-established auditory interface design methods: the "auditory icon" and the "earcon" [11]. The former employs sounds that are "as recognizable as possible", mostly based on authentic recordings, while the latter creates a link between short musical phrases and a concept [11]. The presence of objects in a sound game can be represented either by a continuous, looped sound emitted by each object or by associating the brief sound of an object with a form of impact [11]. The first method would create a confusing environment with many overlapping sounds, while the second is usually not sufficient to give the player an accurate spatial localization of the object [11]. SITREC's game Tim's Journey uses a middle ground between the two methods: static objects emit brief sounds that are looped rhythmically [11]. Because these games have no graphical elements, some sounds could mistakenly be interpreted as an object. Therefore, it is important to make the objects that allow for user interaction stand out through the kind of sound they emit [11].

Friberg and Gärdenfors also refer to the film sound theories of Michel Chion, which can provide a useful framework for designing game audio. Chion presents three ways in which humans listen: reduced, causal and semantic listening [11]. Causal listening means that one is trying to identify the source of the sound, while semantic listening is used when deciphering speech or other coded information, like Morse code [11]. Reduced listening refers to the way people listen when they analyze the qualities of the sound (pitch, rhythm, harmonies), disregarding the source [11]. These modes of listening can be coupled with the types of sounds described above: causal listening is useful for identifying the source of object sounds, semantic listening for understanding instructions and reduced listening for enjoying ornamental sounds, like music [11].
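
The object-presence technique described above – brief sounds looped rhythmically, heard from a first-person perspective – can be illustrated with a minimal sketch; the attenuation curve and beep interval are illustrative assumptions, not values taken from the TiM games.

```python
# Sketch of the "middle ground" object-presence technique described above:
# each static object repeats a brief sound on a rhythmic loop, and its
# playback volume falls off with distance to give a first-person sense of
# space. The interval and attenuation constants are illustrative only.
import math

def object_volume(listener_xy, object_xy, rolloff: float = 0.25) -> float:
    """Inverse-distance attenuation clamped to the 0..1 range."""
    dist = math.dist(listener_xy, object_xy)
    return min(1.0, 1.0 / (1.0 + rolloff * dist))

def schedule_beeps(duration_s: float, interval_s: float = 1.5):
    """Times at which an object's brief sound should be (re)triggered."""
    t = 0.0
    while t < duration_s:
        yield t
        t += interval_s

# Example: a note-like object 8 units away beeps every 1.5 s at reduced volume.
vol = object_volume((0.0, 0.0), (8.0, 0.0))
for t in schedule_beeps(6.0):
    print(f"t={t:.1f}s  play beep at volume {vol:.2f}")
```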

4. CONCLUSION

Based on the games introduced in the state-of-the-art section and the game design concepts discussed above, sound-based games employ sound to both explore and describe the environment. Sounds that describe objects, characters, setting and events should be carefully chosen so that they achieve their purpose of immersing the player and creating a sonic landscape or "soundscape" [11] for the player. Paul Bennun [17], the Chief Creative Officer of Somethin' Else (the developer of the Papa Sangre games), said that "the graphics card in your head is way better than anything you can get on any computing device" [17], emphasizing how powerful a tool one's imagination can be. Sound-based games engage the player's imagination to enhance, and in some cases create, the game's world. As mentioned in the paper by Connors, Yazzolino, Sánchez and Merabet [21], sounds can be employed to help people create a 3D map of the space around them. In conclusion, as demonstrated by the games and design concepts presented in this paper, sounds can be used, through a variety of techniques, to create fun and interesting experiences for both sighted and non-sighted users.

REFERENCES

[1] Adams E. Fundamentals of game design. Pearson Education. 2014.
[2] Lurking, URL: http://www.lurking-game.com/, Accessed 21-01-2018
[3] Gattai Games, URL: http://gattaigames.com/press/sheet.php?p=stifled, Accessed 04-02-2018
[4] Perception Official Website, URL: http://www.thedeependgames.com/, Accessed 02-02-2018
[5] Megan Farokhmanesh, Perception is a sound-based game of hide-and-seek, URL: https://www.polygon.com/2016/3/25/11305420/perception-the-deep-end-games, Accessed 21-01-2018
[6] BlindSide – The Audio Adventure Game, URL: http://www.blindsidegame.com/, Accessed 02-02-2018
[7] Games For Change BlindSide, URL: http://www.gamesforchange.org/game/blindside/, Accessed 02-02-2018
[8] BlindSide Audio Adventure Game - Play Demo, URL: https://www.youtube.com/watch?v=X4wodbgogtM, Accessed 02-02-2018
[9] Dark Echo, URL: http://www.darkechogame.com/, Accessed 04-02-2018
[10] Dark Echo on Steam, URL: http://store.steampowered.com/app/368650/Dark_Echo/, Accessed 02-02-2018
[11] Friberg J, Gärdenfors D. Audio games: new perspectives on game audio. In Proceedings of the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology 2004 Sep 2 (pp. 148-154). ACM.
[12] Unity, URL: https://unity3d.com/, Accessed 05-02-2018
[13] Mona Lalwani, Surrounded by sound: how 3D audio hacks your brain, 12-02-2015, URL: https://www.theverge.com/2015/2/12/8021733/3d-audio-3dio-binaural-immersive-vr-sound-times-square-new-york, Accessed 21-02-2018
[14] Audio Games for Gamers | Projects | Somethin' Else, URL: http://www.somethinelse.com/projects/gaming-for-gamers-audio-games/, Accessed 05-02-2018
[15] Jemima Kiss, Papa Sangre: The sonic iPhone horror game you've been looking for, 20-12-2010, URL: https://www.theguardian.com/technology/pda/2010/dec/20/papa-sangre-game-audio, Accessed 06-02-2018
[16] Let's play "Papa Sangre" - Longplay Walkthrough Part 1 - iPad Air, URL: https://www.youtube.com/watch?v=jxXwotksBdM, Accessed 06-02-2018
[17] Andrew Webster, Gaming in darkness: 'Papa Sangre II' is a terrifying world made entirely of sound, 31-10-2013, URL: https://www.theverge.com/2013/10/31/5048298/papa-sangre-ii-is-a-terrifying-world-made-of-sound, Accessed 06-02-2018
[18] Shades of Doom, URL: http://www.gmagames.com/sod.html, Accessed 06-02-2018
[19] Audio Games, your resource for audiogames, games for the blind, games for the visually impaired! URL: http://www.audiogames.net/db.php?id=shadesofdoom, Accessed 06-02-2018
[20] *True Blind* Let's Play [Shades of Doom] story and level1, URL: https://www.youtube.com/watch?v=FMoRm8XAbQw, Accessed 06-02-2018
[21] Connors EC, Yazzolino LA, Sánchez J, Merabet LB. Development of an audio-based virtual gaming environment to assist with navigation skills in the blind. Journal of Visualized Experiments: JoVE. 2013(73).


THE USE OF ELECTRONIC RESOURCES IN THE PROCESS OF FOREIGN LANGUAGE TEACHING/LEARNING

Roxana Ştefania BÎRSANU 1*

ABSTRACT

The educational process has benefited greatly from technological progress, which is constantly reshaping the dynamics of the teaching-learning context. The computer, mobile devices and the Internet have become constant aids to instruction in many parts of the world, for a wide range of disciplines and learning objectives. This paper refers to the formal teaching/learning context and focuses on the use of the Internet and ICTs as resources in foreign language classrooms. By resorting to these valuable resources, teachers take on a secondary role and encourage learners to exert a higher level of independence and collaboration, which increases their creativity and autonomy. We will take a closer look at the main components of the teaching/learning environment (the people, the tools and the environment) and will highlight a series of advantages, but also potential drawbacks and/or problems related to the management of these resources in language classes.

KEYWORDS: language teaching/learning, Internet, web-based content, learner autonomy

1. INTRODUCTION

One thing that nobody can deny or ignore in today's teaching/learning environment is that all the aspects and actors involved in the process are exposed to a number of pressures that drive constant change. Such changes mainly concern two coordinates of the educational process: the delivered content – or, more precisely, the methods and practices of delivering the information – and trainers' approach to the learners' needs, which are constantly shifting focus. During recent decades, the so-called information revolution, which actually meant the insertion of computer technologies into almost all fields of human activity, has also reached the classroom and, like any other form of revolution, it has brought along challenges and advantages. Undeniably, however, technology is of great assistance to both ends of the educational process, i.e. to teaching and to learning. "Technology is becoming an integral part of pedagogy. When educational objectives are clearly defined, the place and role of technological tools used appears to be natural and appropriate. It can be said that the existing problem of technology integration into the educational sphere has been turned into a greater one – realizing the learning process and pedagogy in a new way" (Geladze 2015, 68).

1* corresponding author, Lecturer PhD, Romanian-American University, Bucharest, [email protected]


Nowadays, huge amounts of money are spent by educational institutions and governments on research related to the effective use of digital technologies in the teaching process. The process consists of three important components which, in order to ensure its success, have to strike a balance between newer pedagogical methods and the accurate and inspired use of conventional ones. These components are: the people (the trainers and the students), the tools (the materials used in the classroom; in this particular case, we will focus on the Internet as a source of information) and the environment (the location where educational activities actually take place) (Luckin et al. 2012, 10). These components form the focus of this paper, viewed from the perspective of the broader electronic context of the teaching/learning continuum. With respect to the electronic resources mentioned herein, "electronic" is to be understood in the sense provided by the Oxford Dictionary: "carried out or accessed by means of a computer or other electronic device, especially over a network".

The advent of the Internet has completely remodeled the manner in which computers can be used in language learning. "Part library, part publishing house, part telephone, part interactive television, the Web represents one of the most diverse and revolutionary media in human history. It is already starting to transform academia, business and entertainment; there seems little doubt that it will eventually have a profound impact on education as well" (Warschauer & Healey 1998, 64). The authors were visionaries, since, two decades after this prediction, language teaching and learning rely heavily on the availability and richness of the Web and on a plethora of electronic devices for effective outcomes of the educational process.

In 2005, UNESCO published a report called Integrating ICTs into the Curriculum: Analytical Catalogue of Key Publications, which mainly deals with the successful integration of ICT in education. The Catalogue is extremely useful for researchers in the field of ICT use in the classroom, but also for teachers who are interested in discovering new technological devices, approaches and methods and in integrating them into the traditional classroom. The Catalogue is organised into chapters covering topics such as technology integration into specific subjects, suggestions and strategies for the integration of ICT in education or obstacles to effective ICT integration in a learning environment. The works reviewed in the Catalogue range from books to book chapters, online articles and slide shows of relevance for the above-mentioned topics.

Foreign language teaching has undergone a number of changes in the last thirty years. Focus has shifted from mere linguistic proficiency to personal, customised instruction, the authenticity of materials, a more pronounced learner orientation and the development of communicative skills. Technological advances have also greatly impacted the acquisition of foreign or second languages. In this context, computer-assisted language learning (CALL) is of great relevance, as is computer-mediated communication (CMC). CALL is an innovative pedagogical approach which differs from the traditional approach to language teaching in that it makes use of the web and of computers for designing language tasks. Language teaching can benefit in various ways from the use of the World Wide Web: for access to authentic materials, for communication with native speakers, as a medium for making students' productions available to a larger audience and for supplementing language drills (vocabulary, grammar, etc.).


2. COMPONENTS OF THE LEARNING PROCESS

a. The people

Teachers have become increasingly aware that, regardless of whether their classroom is a conventional or a virtual one, they should make full use of the advantages presented by electronic devices to improve the efficiency of their methods of imparting information to their students. For this, they need a certain level of technological literacy, which sometimes may represent a genuine challenge, especially for teachers who belong to a generation that prefers conventional teaching methods and practices. This suggests the need to train the trainers, who may thus become more knowledgeable in how they can use technology in order to meet the learning needs of their students. Once they acquire the skills necessary to employ the technological devices available at their institution, they need to establish how best to use them in a manner that is truly beneficial for the learning outcome.

Education institutions constantly encourage teachers to resort to Information and Communication Technology (ICT) in the educational process. Sometimes, although the organization or institution stimulates the use of such modern methods and techniques, the problem may reside with the trainers/teachers. Some of them may resist using ICT because of the rapid technological progress, which entails that they, too, have to be permanently updated on the most recent technological devices that suit the pedagogical purposes of their disciplines. There is also a psychological barrier: teachers manifest a certain degree of reluctance in using ICT because their students have a high level of ICT competence and they may feel outperformed by the students, the most obvious result being loss of knowledge-based authority in class. Although the role of the teacher is diminished to a certain degree when the Internet is used in the classroom, his/her position as a guide and facilitator is not to be neglected. The amount of information and resources available online for class and independent activities would be useless unless the teacher assists learners in selecting relevant information, monitors the performance of requested activities and eventually assesses the learners' progress and the results of their work and research.

The student profile is also extremely important when using technology. One of the most challenging categories for teachers is represented by the so-called millennials, or Y generation, people born between the early 1980s and 2000. In the educational environment, researchers have identified a gap between learners' expectations and teachers' performance. With this generation of digital natives, the learner profile has changed considerably compared to previous ones. The main driver of this change is precisely their use of technology, which has become an embedded part of their everyday personal and professional life. The learning style of this younger category of learners has undergone a number of mutations compared to older generations, mutations that need to be considered both when designing teaching materials and when sharing the information with the class. Millennials have a preference for a variety of learning strategies and express greater interest in materials that are mostly visual and auditory. In this respect, resorting to resources available on YouTube or on various digital platforms, PowerPoint presentations or collaborative educational tools records the highest rate of success among these learners.

Another aspect that should be taken into account is that, with the Y generation, a pragmatic approach to content delivery is of utmost relevance. They are no longer interested in information for its own sake, but in the broader context of its significance for their life. That is why authentic materials are best used in their case, and teachers may easily tap into what the Internet has to offer in terms of authentic materials. Newspapers, case studies and reports are readily accessible over the Internet and, given that learners are technology savvy, trainers only need a good Internet connection and a laboratory where learners can access the indicated information. These materials can be successfully used as starting points for individual or group assignments. In web-based language learning, traditional aids such as dictionaries and textbooks are successfully replaced by various other materials such as online dictionaries, educational networks and various dedicated web pages, which are highly appreciated by learners due to their variety, the time saved when searching for vocabulary and pronunciation and the fact that learners are able to use multiple skills simultaneously.

Quoting M. Warschauer (1997), Wagle mentions a number of principles that teachers should consider when resorting to web-based activities for their class:
• One such principle refers to the careful analysis of the objectives targeted by the proposed activities. Before resorting to the Internet, teachers should first ponder whether this resource is truly more useful than the textbook, for example.
• Integration is essential. No great progress can be achieved if learners are merely required to have computer and Internet literacy. Teachers should strive to integrate the practical activities into the general objectives and design of their course.
• Teachers should never underestimate the potential technical problems associated with the use of the Internet and of computers in the classroom. There are students with little knowledge of how to use the Internet, which means that teachers must first train them in this skill. Then there are the issues related to computer/Internet connection availability, equipment malfunction etc.
• Teachers should constantly assist their students with the proposed activities. Even when they use the Internet as a resource, teachers should also make printed materials available to their students, organise brief training sessions and stimulate collaborative work in pairs or groups when solving web-based assignments.
• One final suggestion is to involve students in the decision-making process. Although using the WWW can be stimulating and entertaining for learners due to a multitude of factors, the above-mentioned problems can generate frustration and confusion. That is why learners should be consulted in class on the progress they think they are making and on the efficiency of activities performed with the help of ICT and the WWW.


b. The tools

The Internet is an extremely valuable resource for teachers. It offers not only content that they can access in order to prepare their own materials, but also specialised sites dedicated to trainers and learners, which contain data that can be processed and organised so as to be used as drills and worksheets in the classroom. Because language teachers are under constant pressure to present fresh and interesting materials, they can successfully resort to web pages, which can prove to be a great tool for planning language classes and assignments attractive enough to capture learners' attention.

There are several reasons why the Internet can be successfully used in the teaching activity. One is the relevance of the materials available, most of them up to date, which engages learners' attention precisely due to their connection with reality and everyday life. Another major advantage is the richness of resources; there are plenty of specialised sites to choose from, each offering something for a specific need of the classroom. Another aspect not to be neglected is the fact that the Internet has become increasingly accessible; it is a constant in learners' lives and, since they are highly accustomed to using it in their free time, most of them face no difficulty in using it for professional purposes as well. Finally, the very nature of the Internet recommends it as a highly versatile and dynamic learning tool; deciding where specific information can be found and how to use it in a given context encourages students' independence and assists the development of decision-making skills.

There is a variety of means through which foreign language teachers can use this highly useful resource. One is to enhance the visual nature of their lessons/lectures. Images and photos are valuable additions to any informative content and they are particularly effective in the case of learners in younger age categories. Since language classes also have an important cultural component, maps and images depicting local traditions and gastronomy can represent a good starting point for debates and comparisons with learners' own culture. To further tap into the visual component, video materials can be used for a variety of purposes: to expose learners to various dialects and accents, to check pronunciation, to introduce learners to important figures in the culture of the language they are studying through interviews and short documentaries about their life and activities, and, why not, to liven up the learning process.

Another idea would be to invite people to interact with the learners remotely. This is actually one of the greatest features of the Internet, namely that it allows live interaction among communicators spread all over the world. Depending on the age of the learners, the remote guests can be young people of the same age from another country, with whom to exchange ideas and impressions on a given topic, or a specialist in a particular field who could make a presentation or organise a workshop on a topic of interest for the learners. The plethora of applications available now – FaceTime, Skype, WhatsApp etc. – facilitates these interventions at no spectacular cost for the education institution. The above-mentioned applications can also be used to create dedicated groups where members can discuss their projects, assignments and the extra resources they have identified.
This is a good means to foster collaboration among the members of a learning group, and it also gives them the opportunity to work as a team, to brainstorm and to share information that may prove valuable for all those involved. Finally, working over the Internet provides trainers with the opportunity to share documents, files and materials with entire groups of learners and colleagues. Virtual staffrooms and message boards are spaces where teachers/trainers communicate and share their experience with colleagues from all over the world. Joining such groups is a great opportunity to interact with professionals who may be, or may have been, faced with similar problems and who are willing to share their experience and knowledge, which creates a sense of belonging to a larger community on which one can rely for advice and suggestions.

One important goal of the teaching activity is to foster collaboration among students, which helps them develop skills of great relevance for real-world situations, such as teamwork, leadership or effective communication. This objective can be reached with the help of a number of collaborative tools and applications. TalkBoard is a whiteboard application that allows users to write or draw on the board; it is particularly useful for visual learners and for foreign language learners. Padlet can be used as a form of digital bulletin board which enables collaborative interactions: teacher and students can add content on a topic (documents, videos, illustrations etc.) and everyone present can reply in real time. Through Google Apps for Education, such as Google Docs, learners can share content and work together on specific documents. Google Drawings assists learners in creating various diagrams collaboratively, while Google Slides allows them to design presentations together on a given topic. The main advantage for the teacher is that s/he can provide immediate feedback and keep track of students' progress over time. For learners, the main advantages of using such applications are that they can request further details on a topic, ask questions, express their opinions and share their productions.

In respect of the web pages that can be used in foreign language teaching (FLT), we should mention that there are web sites dedicated to teachers, where they communicate with colleagues and get information and suggestions on issues such as designing materials, devising assessment methods and strategies, drafting lesson plans, directions to other useful web pages, etc. On the other hand, there are pages with activities, exercises and drills that can be used both in the classroom, under the teacher's guidance and monitoring, and in an out-of-school environment, as individual study.

Synchronous and asynchronous tools are also successfully used in language teaching, especially in e-learning environments. Real-time or synchronous communication is possible through the Internet, via chat media such as Web chat programs or Internet Relay Chat, or through software for local area networks. Students engage in exchanges with all the peers from their class or in smaller study groups. In these exchanges, the teacher plays a diminished role, while the exchange is not dominated by vocal students. Moreover, research has shown that computer-mediated discussions display a higher level of lexical and semantic complexity, which is due to the fact that students have more time to plan their interventions than in oral communication (Warschauer 1998, 63). The transcripts of such discussions can be used subsequently for assessment or grammar analysis.
Given the heightened level of complexity of the language used, language teachers find computer-assisted discussion a useful activity to complement oral debates. In asynchronous communication, the exchange between learners and teachers (turned from educators into facilitators of the learning process) mostly occurs via e-mail, but also through conferencing systems, bulletin boards or message board forums.


Asynchronous education makes use of elements such as audio, video and text messages through which teachers share content with their students, the greatest advantages being flexibility and the fact that teachers are not bound by constraints of time and location. For learners, asynchronous learning provides several advantages. One is that they can access the materials at any time and, since feedback is not requested instantly, they have enough time to formulate an elaborate and well-organised response. Learners set their own learning pace and rely less on their memory and more on critical thinking. There are also emotional and technical advantages: "as there is less pressure than a real time encounter, the affective filter remains low and learners can respond more innovatively and creatively. The chances of getting irritated by technological problems, like low speed and non-connectivity are the least, as ample time to attempt e-tivities is available" (Perveen 2016, 22).

c. The environment

Regardless of whether it is one-to-one, individual or one-to-many, the teaching/learning experience would be nonexistent in the absence of a learning environment. In formal education, the "learning environment refers to the diverse physical locations, context and cultures in which students learn. [...] The term also encompasses the culture of a school or class [...] as well as the ways in which teachers may organize an educational setting to facilitate learning, e.g. by conducting classes in relevant natural ecosystems, grouping desks in specific ways, decorating the walls with learning materials or utilizing audio, visual, and digital technologies" (The Glossary of Education Reform). It is obvious that the nature of the learning environment, which is more complex than the mere physical space of the process, has a profound impact on the transfer of information and on how it is processed and acquired by learners. A learning environment consists of a series of factors whose degree of convergence governs its efficiency. These factors refer to teaching objectives and learning motivation, the activities meant to enhance learning, the assessment methods and techniques necessary to quantify the registered progress and stimulate further learning efforts, but also the culture that governs the learning context.

As we have already seen, the Internet can be used successfully in the traditional learning environment, with formal classrooms and unmediated teacher-student interaction. However, the Internet is an essential tool in the context of virtual learning. In the current global environment, when information is needed fast in various professional contexts, people's mobility cannot keep pace with the transfer of data. That is why access to education services via the online environment is a great opportunity for institutions and learners alike. E-learning environments have become increasingly accessible due to the convenience they provide in terms of the location and time restrictions bridged through this form of teaching/learning. E-learning refers to a "software communication environment which provides technological means to conduct the educational process, its information support and documentation in the Internet to any number of educational institutions, regardless of their professional expertise and level of education"1. E-learning is increasingly resorted to both in companies and organisations, and in universities and other higher education establishments. In the higher education environment, e-learning provides a number of advantages: it ensures instruction that matches the expectations of the newer generations of digital natives; it facilitates communication between similar institutions around the globe; it facilitates the pooling of resources and materials that may overlap for distinct courses; it makes it possible to reach a larger number of students, who cannot attend the physical classes; it represents a means to save time for the training staff and costs for the institutions involved; and it is convenient in terms of flexibility for those learners who are subject to various time and location limitations.

Web-based resources can be successfully used to improve all language skills. Writing skills and vocabulary can be enhanced with writing assignments that can be shared on one of the above-mentioned applications, through collaborative projects, and even made available for peers and people from all around the world to access. Vocabulary can also be built online with the help of web pages that offer quizzes, word games and other drills free of charge. Listening skills can be improved through the use of videos, music clips, interviews and conversations, which expose learners to a wide range of dialects and accents, this being a great opportunity for them to get accustomed to the characteristics of spoken English discourse. Based on these materials, teachers can design various types of exercises that aim at checking comprehension. Public broadcaster web pages, such as that of the BBC, are an excellent source of audio materials, both due to the variety of topics approached and to the high technical quality of the recordings. Reading skills can be improved by accessing authentic material such as newspapers, magazines or blogs. Learners can be guided towards various types of web pages (of public institutions, governments, commercial sites etc.) and asked to comment upon the content in terms of reliability, cohesion and coherence of information presentation, identification of various genres and assessment of content presented in a variety of ways. The development of speaking skills can be encouraged through the production of videos, interviews and real-time conversations with peers and native speakers.

3. ADVANTAGES OF USING THE INTERNET FOR FOREIGN LANGUAGE ACQUISITION

The increased level of accessibility of the Internet at institution level has triggered a heightened acknowledgment of the usefulness of this resource in the teaching/learning activity. Institutions providing educational services became aware that they can thus reach a larger number of learners, with the additional benefit of reducing costs. In turn, teachers realized the manifold advantages of this resource in terms of increased cohesion and communication with their students – most of them already proficient in the use of technological devices –, higher accessibility and availability of materials and faster communication with peers from all over the world. The Internet also offers students and teachers alike the opportunity to access databases and libraries around the world.

1 https://www.igi-global.com/dictionary/explore-the-possibility-of-recourses-and-elements-of-online-teachers-training-program/8794


One of the above-mentioned advantages is the ready availability of the information, which fosters out-of-classroom instruction. Students can access data from their own homes and with no restriction of time, outside school hours. They can read information on topics that interest them most in the language they study and can thus increase their linguistic proficiency while doing extra work. This is particularly relevant in the case of English as a foreign language (EFL), since most information available on the Internet is in English. Apart from reading authentic materials in English, they can also access a variety of web pages where they can practice grammar, pronunciation or listening skills online. The main outcome is faster second language acquisition and language use in and outside the formal environment.

Via the Internet, learners may also come into contact with native speakers, which motivates them to use the language in real-life situations, thus improving their communicative and linguistic experience, because when they communicate with natives, learners have to ask for clarifications, express their own views, try to persuade their interlocutors and request information. Research has also shown that in the e-learning environment student participation is higher; this form of learning context is particularly effective in the case of learners who find it difficult to integrate into a conventional classroom environment. Also in the case of e-learning, there is a shift of the traditional focus on the teacher towards the student, who is thus empowered to search for information and be more actively involved in the production of content. The teacher becomes a facilitator, and the learner turns into a more independent researcher and user of the data s/he needs.

As the mechanisms and manifestations of languages are deeply embedded in culture, it would be impossible to teach a foreign language while disregarding the cultural component. Since unmediated contact is subject to all sorts of limitations (having to do mostly with the availability of time and financial means), the Internet provides the opportunity to access cultural information from home or from school. Learners can gain insight into a given culture by watching documentaries and videos about its customs, traditions, gastronomy and arts, they can read informative materials and they can engage in communication with natives and do their own cultural investigation. An efficient means to use the web is to consider it a platform for sharing learners' own pieces of writing or multimedia productions in their native tongue and/or in the foreign language they are studying. In this way, they may receive feedback from peers and other learners with similar interests and thus feel motivated and encouraged to improve and continue practicing the skills that can eventually lead to a high level of foreign language proficiency.

Sayers, one of the researchers who analysed the impact of the WWW and technological devices in education, suggested the following benefits provided by web-based technology in the teaching/learning context (1993):
• Increased motivation. Computers and applications are highly popular among students and, due to their use in class, learning activities become more entertaining. At the same time, students feel more independent in the use of resources to improve their skills.


• Better learning outcomes. Learners improve their linguistic skills through their network activities, which consolidates their self-esteem and assists them in identifying the learning strategies that are most suitable to their particular situations.
• Authentic materials. Students can access authentic reading and listening materials which are available to them round the clock, at low or no cost.
• Customisation. Learners with adaptability issues find collaborative learning more beneficial and stimulating. On the other hand, highly ambitious ones can take on additional tasks which they can perform at their own pace.
• Multiple sources of information. Textbooks are still used in the classroom, but the Internet presents learners with an almost infinite number of other sources of information. An additional advantage is the opportunity for interdisciplinarity and a heightened awareness of the rich cultural tapestry of the world.
• Increased communication and interaction. Through the use of services and features such as online bulletin groups and collaborative applications, learners have the opportunity to interact both with their peers and with other persons from around the world who share an interest in the same topics and disciplines. Learners also receive real-time feedback when solving exercises online, which are automatically corrected.

4. DISADVANTAGES/PROBLEMS OF USING THE INTERNET FOR LEARNING/TEACHING

Despite the numerous and clear benefits provided by Internet use in teaching and learning, tapping into this otherwise valuable resource is not devoid of obstacles and challenges. On the one hand, there are extra-pedagogical problems. These have to do with Internet accessibility, which is nonexistent or limited in many parts of the world, and with the availability of computers and of other technical devices. Another problem is the limited familiarity of both teachers and students with the use of a computer and/or of the Internet. In students' case, this particularly applies to very young learners or to learners of a certain age, who do not have any technological training and skills (this is especially valid for organization-level training and higher education systems, which learners attend in order to complete their education). This usually generates some sort of computer anxiety, which may be a real obstacle to achieving an effective and positive learning outcome.

The drawbacks of using the Internet in the teaching/learning context also depend on learners' age. In the case of young learners, there are serious issues related to inadequate content (violent materials, crime-related content etc.), online harassment (with extremely serious and lasting effects upon the emotional development of the victim) and cheating (the copy-paste 'solution'). In the case of more mature learners, there is still the issue of the vast amount of material to be found online, much of it inaccurate, misleading or downright harmful. In this respect, teachers must also train learners on how to use the Internet responsibly and in a manner which is useful and helpful, without wasting time and with the discernment to pick out the relevant information. This is the reason why teachers and trainers have to select the materials themselves for use in class and provide learners with suggestions of web sites and content that they can access with priority at home.


5. CONCLUSIONS

The pedagogy of foreign language teaching has undergone a significant number of changes in the last decades under the influence of technological advances and the pressure exerted by the dynamics of learner profiles and learning interests and objectives. In this era of digitalisation, the main components of the teaching/learning process – the people, the tools and the environment – need to adjust to the constantly changing context of education. Electronic devices and the Internet have been acknowledged as valuable resources for language training, with multiple advantages for both trainers and learners. In teachers' case, the Internet can be used successfully as a rich source of authentic materials and inspiration for course design, lesson plans and practical assignments, but also as a platform for professional collaboration and communication. For learners, ICTs and the web used in learning settings translate into classes that are more entertaining, more varied sources of information and the opportunity to communicate globally with persons sharing the same learning interests.

Although the benefits of the above-mentioned resources are numerous, they are counterbalanced by a number of difficulties, most of which are external to the instruction process per se (Internet connection limitations, availability of electronic devices, teachers' and learners' level of computer literacy etc.). Other obstacles to the effective use of these resources are connected with the variety and reliability of the information presented online and with the efficient time management of tasks and assignments involving web-based content and electronic devices. However, such obstacles can be overcome through teachers' careful planning of activities and monitoring of assignment performance, the ultimate objective of this teaching/learning approach being to foster learner autonomy in the use and generation of content.

REFERENCES

[1] Aydin, S. The use of Internet in ESL Learning: Problems, advantages and disadvantages. Retrieved in November 2018 from https://www.researchgate.net/publication/274734328_The_use_of_the_Internet_in_ESL_learning_Problems_advantages_and_disadvantages.
[2] Geladze, D. (2015). Using the Internet and Computer Technologies in Learning/Teaching Process. In Journal of Education and Practice (pp. 67-69). Volume 6, No. 2.
[3] Luckin, R., Bligh, B., Manches, A., Ainsworth, S., Crook, C. and Noss, R. (2012). Decoding Learning: The Proof, Promise and Potential of Digital Education. London: Nesta.
[4] Perveen, A. (2016). Synchronous and Asynchronous E-language Learning: A Case Study of Virtual University of Pakistan. In Open Praxis (pp. 21-39). Volume 8, Issue 1, January-March.
[5] Sayers, D. (1993). Distance team teaching and computer learning networks. In TESOL Journal (pp. 19-23). Volume 3(1).
[6] Warschauer, M. and Healey, D. (1998). Computers and language learning: an overview. In Language Teaching (pp. 57-71). Volume 31, Issue 2, April.
[7] Wagle, U. The Use of Internet in Language Classroom. Retrieved on November 10, 2018 from https://neltachoutari.wordpress.com/2011/09/01/the-use-of-internet-in-language-classroom/

Online resources
https://en.oxforddictionaries.com/definition/electronic
https://www.edglossary.org/learner/
https://opentextbc.ca/teachinginadigitalage/chapter/5-2-what-is-a-learning-environment/


ROMANIA’S ENERGY SYSTEM

Andreea BARBU 1* Bogdan TIGANOAIA 2

ABSTRACT

Through this paper, the authors propose to offer an overall picture of the current state of Romania's energy sector, with emphasis on energy production and the targets to be reached in this field in the future. Other concepts are also taken into consideration, such as: the analysis of energy demand by branch, the analysis of the structure of the primary energy mix, the consumption, the production and its composition in Romania in 2016, and the composition of the energy production mix on a regular day in 2017.

KEYWORDS: energy system, production mix, energy objectives

INTRODUCTION

Romania's energy system began to take shape in 1882, when the first two power stations were put into use in Bucharest, providing the necessary energy for the Cotroceni Palace and the Royal Palace on Calea Victoriei [1]. Around 1900, several hydroelectric power plants were put into use, and the energy system has been evolving ever since.

As shown in a report by Citigroup in 2014 [2], Romania is a medium-sized market, with almost 20 GW of installed capacity, ranking second by market size in Central and Eastern Europe. Although the energy demand in Romania is driven mainly by industrial players, the average consumption per capita in 2014 was less than half of the European average.

During the 1990s, electricity production, transport and distribution were handled by the state-owned company RENEL [3]. After successive restructurings, in 1998 the company was divided into:
- CONEL – the national electricity company
- Nuclearelectrica – the nuclear energy company
- RAAN – the Autonomous Administration for Nuclear Activities

1* corresponding author, Engineer, Ph.D. Student, University POLITEHNICA of Bucharest, [email protected] 2 Lecturer Engineer, Ph.D., University POLITEHNICA of Bucharest, [email protected]


Subsequently, in the year 2000, CONEL [2] was split into four different companies, each with specific responsibilities:
- Termoelectrica S.A. – fossil fuel supply company
- Hidroelectrica S.A. – hydro energy supply company
- Electrica S.A. – distribution company, with eight subsidiaries for distribution and supply
- Transelectrica S.A. – transmission company

The electricity distribution system in Romania consists of around 375,000 km of distribution network, connecting more than 9 million customers. The network is operated by eight licensed distribution companies, three of them being part of Enel, three of Electrica, one of CEZ and one of E.ON.

Figure 1. The energy distribution map (Source: Electrica)

From a legislative point of view, the liberalization of the Romanian gas market began in 2007, when all consumers became eligible to change their natural gas supplier. Although the gas market was officially liberalized, natural gas continued to be supplied to final consumers under a regulated regime. At present, according to the Electricity and Natural Gas Law (Law no. 123/2012), any natural gas consumer who has exercised the right to eligibility may no longer return to the regulated market [4].

In general, the consumers on the competitive market are the industrial ones and the electricity producers, who benefited after the market liberalization by obtaining a lower price for natural gas than the one on the regulated market. The regulated market represents 45.8% of the annual natural gas consumption, the growth rate of the number of consumers who exercised their right of eligibility being very low.

Within Romania's Energy Strategy, presented at the beginning of 2017 [4], the Ministry of Environment shows the evolution of the natural gas market liberalization process in Romania, focusing both on the quantity of natural gas supplied to final consumers on the regulated market as a share of total consumption and on the total number of final consumers who exercised their right to eligibility (i.e. changed their supplier).

EXPERIMENTAL WORK

Regarding its contents, this paper presents aspects related to the current state of Romania's energy system and the EU objectives regarding electricity in Romania's region. In order to present Romania's current state, the country's Energy Strategy has been analyzed. The analysis was complemented with data taken from and interpreted based on the official Transelectrica website, from which the consumption, the production and the production's components were determined, both for a regular day in 2017 and for the full year 2016. For this analysis, we selected the last day of each month and compared the data registered between 1 p.m. and 2 p.m. The data are shown in Table 2 and Figure 4. This study is limited to the data provided by the Ministry of Environment through its official websites, the online bibliographic sources being the only relevant ones for this field at the moment.
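
The sampling step can be sketched as follows; the file name and column names are assumptions made purely for illustration, since the exact format in which the Transelectrica data were collected is not described in the paper.

```python
# Minimal sketch of the selection described above (assumed CSV layout with
# 'timestamp', 'consumption_mw' and 'production_mw' columns; the real
# Transelectrica export format may differ).
import pandas as pd

df = pd.read_csv("transelectrica_2016.csv", parse_dates=["timestamp"])

# Keep only readings taken on the last day of each month, between 13:00 and 14:00.
mask = df["timestamp"].dt.is_month_end & (df["timestamp"].dt.hour == 13)
sample = df.loc[mask].copy()

# Average the 13:00-14:00 readings of each selected day into one monthly value.
monthly = (
    sample.set_index("timestamp")
          .resample("M")[["consumption_mw", "production_mw"]]
          .mean()
          .round(0)
)

# Positive balance = production exceeds consumption (energy available for export).
monthly["balance_mw"] = monthly["production_mw"] - monthly["consumption_mw"]
print(monthly)
```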

RESULTS AND DISCUSSIONS

Electricity Consumption

According to Romania's Energy Strategy, electricity consumption has fluctuated over the years, the registered values lying between 40 TWh and 60 TWh. While the economic crisis of 2008-2009 caused a decrease in electricity consumption, it gradually returned to 47.5 TWh in 2015, the 2016 values being close to those registered the year before. Although in 2015 Romania ranked sixth in the European Union for the lowest average household electricity price (Eurostat), purchasing power is relatively low, which makes the price a major problem and leads to an elevated level of energy poverty. Households prefer using gas as an energy resource, since its price per kWh is three times lower than that of electricity. Electricity consumption also needs to be considered from the point of view of electromobility, i.e. the use of electricity in public transport – which decreased in the last year – and the acquisition of electric vehicles, whose price is still high.

Romania's gross energy consumption decreased considerably between 1990 and 2015, reaching 377 TWh in 2015 [4]. Romania's Energy Strategy shows that, following the modeling of the Optimum Scenario, in 2030 gross consumption will reach 394 TWh (a 4% increase), while final energy demand will reach 269 TWh (a 6% increase). The same scenario modeling provides, however, for the year 2050, a 7% decrease in primary energy demand compared to 2030 (from 394 TWh to 365 TWh).

Figure 2. The final energy demand by activity branch, 2015-2050 (%)

According to the data in Romania's Energy Strategy and the ones presented in Figure 2, we can observe the following:
- The household sector remains the biggest consumer of final energy, although its share decreases by 2% in 2030 compared to 2015 and by 3% in 2050 compared to 2015.
- The smallest sector, by final energy consumption, remains agriculture and services, which used 11% of the final energy in 2015 and decreases to 10% in 2030 and 2050.
- The transport sector is estimated to increase from 26% in 2015 to 29% in 2050.
- The other industrial sectors are also estimated to increase their final energy usage by 4% from 2015 to 2050.

The Energy Mix

Regarding the structure of Romania's energy mix, we can observe that, according to the data provided by Romania's Energy Strategy for 2016-2030, in 2015 the highest share in the mix belonged to natural gas – 29% (111 TWh) – closely followed by petroleum – 26% (100 TWh). Renewable energy accounts for 19% of the mix (72 TWh), while coal has 17% (65 TWh). The smallest share belongs to nuclear energy – 9% (34 TWh) – caused by the inefficiency of the uranium resource exploitation and the closing of the uranium mines. The difference between the gross consumption of energy and the primary energy mix is represented by the net export of electricity (6 TWh).

The desired composition of the energy mix by 2030 mainly involves an 8% increase of nuclear energy (from 9% to 17%), a 3% increase of renewable energy (from 19% to 22%), maintaining petroleum consumption at the same level, a 7% decrease of coal consumption (down to 10%) and a 3% decrease of natural gas consumption (down to 26%).
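
As an editorial consistency check (not part of the Strategy itself), the 2015 shares quoted above can be recovered from the TWh figures: adding the 6 TWh net electricity export to the 377 TWh gross consumption gives a primary energy mix of roughly 383 TWh, so that

\[
\frac{111}{383}\approx 29\%, \quad
\frac{100}{383}\approx 26\%, \quad
\frac{72}{383}\approx 19\%, \quad
\frac{65}{383}\approx 17\%, \quad
\frac{34}{383}\approx 9\%,
\]

which matches the reported percentages for natural gas, petroleum, renewable energy, coal and nuclear energy.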

Figure 3. The structure of the primary energy mix in 2015 and 2030 (Source: PRIMES)

The simulation of different scenarios yielded information regarding a better usage of primary energy resources. Regarding petroleum, the decrease of the oil price in recent years, as well as the investments needed in this field, will lead to a halving of the domestic petroleum production, which will reach 2 million tonnes in 2030. The results of the natural gas production modeling are different, at least regarding the price evolution, the production depending greatly on the development of the recently discovered reserves in the Black Sea. It is, however, expected that the maximum level of natural gas production coming from the Black Sea will be reached by 2030 [6].

The situation of brown and hard coal production depends greatly on the national raw material demand, as well as on the competitiveness of the raw material's price. Although the closing of 5 coal quarries is foreseen between 2017 and 2023, 15 quarries will be kept open and active until 2030. The evolution of biomass is being watched with great interest, since firewood remains an important share of the biomass used for energy purposes. Nevertheless, a 20% decrease in firewood usage is foreseen by 2030, together with an increase in the potential for producing biofuels and biogas.

Regarding the import of energy resources, Romania currently has the third-lowest level of primary energy imports in the European Union. While in 2015 net imports represented 16% of primary energy, it is estimated that, even if the hydrocarbon resources are depleted, imports will not exceed 25% of internal usage before 2030 [4].


Electricity Production

Regarding the production of electricity, it can be said that Romania's primary energy resources are the indigenous ones. The energy mix is diverse, and an increase in the renewable energy component is desired. If we analyze the period between 2015 and 2016, we can observe how the energy mix underwent certain changes in the coal, hydro, natural gas and wind components. Thus, an increase of 2% is noticed for the hydro and natural gas components, and a decrease of 3% for coal and of 1% for wind. As can be seen in Table 1, the rest of the components remain at the same share. The biggest problem Romania is facing is still the age of its electricity generation capacities, close to 30 years, which means that their technical lifetime is coming to an end; parts of them are often shut down for repairs. [7]

Table 1. The electricity production – the energy mix (data provided by Romania's Energy Strategy)

Electricity Mix   2015 (%)   2016 (%)
Coal                 28         25
Hydro                27         29
Nuclear              18         18
Natural Gas          13         15
Aeolian              11         10
Photovoltaic          2          2
Biomass               1          1

In 2015, 40% of the annual electricity production was delivered by the thermoelectric capacities, which are based on coal and natural gas. The table above also shows that Romania continues to use nuclear energy, our country being one of the 14 member states of the European Union that produce a part of their electricity this way. This type of electricity production could increase by 10% if two additional reactors were commissioned at Cernavoda in the future. Such a step would also increase the pressure on the competitiveness of the natural gas and coal producers. Among the projects to be implemented by 2020, we can mention the combined-cycle natural gas plant at Iernut, a project carried out by Romgaz, as well as the investments made by the hydroelectric company, which aim at completing 200 MW of new capacity and at upgrading the existing one. In addition, over the past 5 years, new aeolian and photovoltaic capacities (4500 MW) were commissioned, leading to a decrease of the electricity price and of imports, as well as to more ambitious greenhouse gas reduction targets for 2020. It should be noted that the aeolian stations produce more energy during the evening and in winter, while the photovoltaic ones produce more energy during the day and in summer.


According to the live data provided by Transelectrica [8] on 02/03/2017, the largest component of the electricity production was coal, with 30.55%, followed by hydrocarbons with 21.74%, the hydro component with 17.18%, the nuclear component with 16.55%, aeolian with 9.6%, photovoltaic with 3.74% and only 0.63% for biomass.

Figure 4. The components of the electricity production mix, February 2017

Looking at the data provided by Transelectrica on their official website [9], we determined the consumption, the production and its components for the year 2016. For this analysis, we selected the last day of each month of the year, comparing the data registered between 1 p.m. and 2 p.m. The data is presented in Table 2 and Figure 5. It can be observed that the highest consumption was registered in October 2016 (7554 MW), closely followed by February (7504 MW) and June (7443 MW). The largest electricity productions in 2016 were registered in October (8835 MW), June (8800 MW), August (8491 MW) and November (8328 MW); the differences between consumption and production appear in the balance column, a negative balance representing energy export. The only months in which the consumption exceeded the internal production were September (a difference of 314 MW), March (259 MW) and May (9 MW). Analyzing the energy mix in 2016, we can observe that a large part of the internal production was covered by the following components: water, coal, nuclear and hydrocarbons. The smallest share belongs to biomass, followed by the photovoltaic and aeolian components, the last two depending strongly on the season and weather conditions. At a first glance over Table 2, the values for May appear as "intruders" in the regular mix, with a consumption that exceeds the internal electricity production by only 9 MW and a composition dominated by water and coal. The values for the hydrocarbons, nuclear, aeolian, photovoltaic and biomass components remain among the lowest throughout the year, with an average production of 366 MW.


Table 2. The electricity consumption, production and its composition, 2016 (all values in MW)

Month       Consumption  Avg. consumption  Production  Coal  Hydrocarbons  Water  Nuclear  Aeolian  Photovoltaic  Biomass  Balance
January        6542           6553            7894     1809      1149      1248    1325     2159        155          48      -1352
February       7504           7501            8012     1726       959      2167    1339     1443        328          51       -508
March          6797           6837            6538     1439       824      1879    1361      295        697          42        259
April          5721           5727            6295     1023       609      2553    1404       56        601          48       -574
May            6941           6923            6932     2087       362      3068     704       77        640          45          9
June           7443           7437            8800     1849       868      2841    1328     1169        707          38      -1357
July           6071           6110            6832     1746       804      2051    1381       84        718          48       -762
August         6954           6995            8491     1614      1326      2160    1325     1393        633          41      -1537
September      6541           6624            6226     1718       619      1693    1377       32        720          67        314
October        7554           7610            8835     2217      1218      2423    1382     1183        357          55      -1281
November       7283           7279            8328     1864      1707      2861    1326      152        355          61      -1045
December       6908           6927            7762     1829      1657      1741    1407      677        435          43       -854
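As a quick cross-check of Table 2, the balance column can be recomputed from the consumption and production values. The sketch below uses pandas, with the monthly values transcribed from the table; the column names are illustrative choices, not part of the Transelectrica data format.

```python
import pandas as pd

# Monthly values transcribed from Table 2 (MW, registered 13:00-14:00 on the
# last day of each month of 2016); only the columns needed here are included.
data = {
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "consumption": [6542, 7504, 6797, 5721, 6941, 7443,
                    6071, 6954, 6541, 7554, 7283, 6908],
    "production": [7894, 8012, 6538, 6295, 6932, 8800,
                   6832, 8491, 6226, 8835, 8328, 7762],
}
df = pd.DataFrame(data)

# A negative balance means production exceeded consumption (net export).
df["balance"] = df["consumption"] - df["production"]

# Months in which consumption exceeded the internal production (net import).
print(df[df["balance"] > 0][["month", "balance"]])
```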

Figure 5. Electricity production and consumption, 2016

If we refer to the evolution of the primary energy production in Romania by energy source, in 2015 the largest share of the primary energy production was registered for natural gas, with 34%. After the modelling of the Optimal Scenario, this value is expected to drop by 3 percentage points by 2030 and by 12 percentage points by 2050. A large decrease of the share is also foreseen for the energy coming from petroleum, which will fall from 15% in 2015 to 7% in 2030 and to only 5% in 2050. Coal also registers a decrease, from 18% in 2015 to only 4% in 2050. On the other side of the evolution of the primary energy production stands geothermal energy [5]: although its share was insignificant in 2015, it is expected to reach 1% by 2050. Another significant increase is given by solar energy, which will grow from 1% in 2015 to 2% in 2030 and to 5% in 2050. Aeolian energy will be exploited even more, doubling its share by 2030 and reaching 9% of the total energy produced by 2050. Biomass and waste also register an increase over the next years, from 14% in 2015 to 16% in 2030 and 24% in 2050. Nuclear energy registers a spectacular increase as well, doubling its share by 2030 (from 11% to 22%) and reaching 24% in 2050. Hydro energy keeps almost the same share in the energy production mix, with values of almost 18% in 2030 and 2050.

Figure 6. The evolution of the primary energy production in Romania, by energy source (Source: PRIMES)

The investments made in the energy sector remain a very important factor for the study and the assurance of energy efficiency. For the period between 2030 and 2050, the level of energy investment expenses reaches 5.7% in 2030, 5% in 2035, 5.3% in 2044 and 4.9% in 2050. On average, between 2030 and 2050, the investments in the energy sector will be around 665 million euros (at the 2013 exchange rate). Following the PRIMES modelling results, the Ministry of Energy foresees the following indicative decarbonisation targets for the next years:

Table 3. Indicative decarbonisation targets for the years 2020, 2030 and 2050

Indicator                                                U.M.              2015   2020   2030   2050
Reduction of greenhouse gas emissions compared to 1990   %                 54     55     62     75
Ratio of energy from renewable sources                   %                 26.3   24     27     47
The intensity of energy in the economy                   tep/mil. €2013    218    190    155    105


Although the reduction of greenhouse gas emissions should be around 80% compared to 1990, after the modelling of the Optimal Scenario this value is only 75% in 2050 for Romania. The 80% target could still be reached by 2050, but only with a huge investment effort. In any case, these models already take into consideration the desired energy-efficiency improvements of homes, an acceleration in transport, and the support of Romania's Energy Strategy through subsidies. This makes the remaining investments needed to cover the additional 5 percentage points of greenhouse gas emission reduction unjustifiable from a financial point of view, their value being far too high. By 2020 the share of renewable energy sources will reach 24%, a value of 27% is targeted for 2030, while for 2050 the value will be 47%. As can be seen in Table 3, the energy intensity of the economy tends to decrease between 2030 and 2050.

CONCLUSIONS

This paper offers a general image of the status of Romania's energy sector compared to the European Union average and of the targets to be reached in the future. One of the most important aspects addressed in this paper is the structure of the primary energy mix, the paper including an analysis of the data registered in 2015 and of the targets to be reached by 2030. The biggest difference registered is the increase of nuclear energy as a percentage of the energy mix (8 percentage points), together with the decrease of coal consumption (7 percentage points). Another analysis presented in this paper refers to the demand of final energy by activity sector, based on the data from 2015 and on the targets for 2030 and 2050. The highest share was registered in the household sector (34%), on the opposite side being agriculture and services, with only 11% in 2015. The ranking of the targeted sectors is kept the same until 2050, the increases or decreases in the energy demand evolution being of only a few percentage points throughout the 35 years analyzed. Regarding the production of electricity, a case study built on the data taken from the Transelectrica website raises a warning regarding the components of the energy production mix at the beginning of 2017. The highest share in the mix belongs, unfortunately, to coal (30.55%), on the opposite side being the renewable sources: aeolian, photovoltaic and biomass. Going back to the study comparing the situation in 2015 with the targets to be reached by 2030 and 2050 respectively, the primary energy production in Romania shows, first of all, a massive decrease of the energy produced from coal (from 55% in 2015 to 12% in 2050) and an increase of the energy coming from alternative sources. Another interesting target refers to the usage of nuclear energy, which is not used at full capacity in our country.


In the future, the authors aim to take this study further, examining the current status of the liberalization of the energy market in the member states of the European Union, emphasizing the existing situation in Romania and the way in which the liberalization affects the country economically.

REFERENCES

[1] V. Vaida, Thermoelectric and Heating Stations in Romania: Past, Present, Future, http://tineret.sier.ro/istorie/Documente/termo.pdf [Online – Date of access: 01/10/2017]
[2] Citi Research, Electrica SA (ROEL.BX), 7th August 2014
[3] https://ro.wikipedia.org/wiki/Energia_electric%C4%83_%C3%AEn_Rom%C3%A2nia [Online – Date of access: 01/10/2017]
[4] Romania Energy Strategy, http://www.mmediu.ro/app/webroot/uploads/files/2017-03-02_Analiza-stadiului-actual.pdf [Online – Date of access: 01/30/2017]
[5] Romanian Energy Strategy 2016-2030, With An Outlook To 2050, http://www.solarthermalworld.org/sites/gstec/files/news/file/2016-12-30/romanian-energy-strategy-2016-2030-executive-summary3.pdf [Online – Date of access: 01/30/2017]
[6] http://www.economica.net/iulian-iancu-romania-dispune-de-rezerve-uriase-de-gaze-naturale-în-marea-neagra-270-miliarde-mc_110635.html#n [Online – Date of access: 01/30/2017]
[7] http://jurnalul.ro/bani-afaceri/economia/ge-infrastructura-energetică-a-romaniei-este-foarte-imbatranita-şi-are-nevoie-de-investitii-de-miliarde-de-euro-711517.html [Online – Date of access: 01/30/2017]
[8] http://www.transelectrica.ro/web/tel/sistemul-energetic-national [Online – Date of access: 02/03/2017]
[9] http://www.transelectrica.ro/widget/web/tel/sen-grafic/-/SENGrafic_WAR_SENGraficportlet [Online – Date of access: 02/03/2017]


IMAGE RECOLORING FOR COLOR-DEFICIENT VIEWERS

Elena BANTAŞ 1* Costin-Anton BOIANGIU 2

ABSTRACT

Color is relied heavily upon when conveying information in a visual manner, in many, if not all, socio-cultural groups, day-to-day activities, and fields of work. Any misrepresentation or misperception of colors can cause issues stemming from the incorrect interpretation, and in some cases complete absence of information, with impact ranging from negligible inconveniences, such as an unpleasant aesthetic impression, to more serious consequences, especially in cases where information is primarily coded by color (e.g. warning labels, or traffic lights). This, in conjunction with the current spread of digital imagery, has led to the study and development of methods of recoloring images so as to allow color deficient viewers to perceive them as accurately as possible, and deliver a more complete visual experience.

KEYWORDS: Image Recoloring, Color Deficient Viewer, Color Perception, Trichromacy, Dichromacy, Monochromacy, Protanomalous, Deuteranomalous, Tritanomalous, Protanopic, Deuteranopic, Tritanopic.

INTRODUCTION

Studies[1]–[4] estimate that around 3% of the population, specifically between 2% and 10% of males and between 0.1% and 1.5% of females, experience a form of color deficiency, commonly referred to as color blindness, amounting to 200 million persons worldwide. While color blindness can be acquired as a result of an illness or injury, in most cases it is congenital, and affects individuals of all demographics.

COLOR SPACES

In broad terms, processing colored imagery for the use of deficient viewers can be described as remapping to a gamut of lower dimension, while aiming to preserve, and in certain cases enhance, the perceptual differences between colors. As such, many papers in this area of research make use of the CIE-L*a*b* color space to facilitate the assessment of perception, as well as to ensure a device independent representation. CIE-L*a*b* also allows the use of Euclidean distances between points to represent relative perceptual differences, and proves particularly relevant when discussing dichromacy.

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


The CIE-L*a*b* standard, a successor to Hunter's Lab color space [5], is a simplified mathematical approximation of a uniform color space [6], where lightness or luminance is handled by the L* component, a* handles the red–green axis, and b* the blue–yellow axis. Of the three axes, only L* is standardized to span from 0 to 100; the remaining two are theoretically unbounded, though in practice dependent on the medium on which the colors will be displayed. In order to better replicate the perception of a real eye, the CIE-L*a*b* system has nonlinear relations between its components, which means conversion to and from other color spaces isn't always straightforward. Generally speaking, the transformation from CIE XYZ can be formalized as:

L* = 116 · f(Y/Yn) - 16
a* = 500 · [f(X/Xn) - f(Y/Yn)]
b* = 200 · [f(Y/Yn) - f(Z/Zn)]

where f(t) = t^(1/3) for t above a small threshold ((6/29)^3), with a linear segment used below it.

For the values Xn, Yn, and Zn, a reference white point is needed [6], which is directly related to the (theoretical) light source taken into account. Values for certain lighting conditions have been determined by researchers [7],[8].
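As an illustration, the forward transform can be written directly from the formulas above. The following is a minimal sketch that assumes CIE XYZ input and uses the D65 reference white; the function names are our own, not part of any standard library.

```python
import numpy as np

# D65 reference white (2-degree observer), a common choice; other illuminants
# simply swap these three constants.
XN, YN, ZN = 95.047, 100.0, 108.883

def _f(t):
    """Nonlinearity used by CIE-L*a*b*: cube root above a small threshold,
    linear segment below it (threshold = (6/29)^3)."""
    delta = 6.0 / 29.0
    return np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)

def xyz_to_lab(X, Y, Z):
    """Convert CIE XYZ tristimulus values to CIE-L*a*b*."""
    fx, fy, fz = _f(X / XN), _f(Y / YN), _f(Z / ZN)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b

# Example: the reference white itself maps to L* = 100, a* = b* = 0.
print(xyz_to_lab(XN, YN, ZN))
```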

COLOR PERCEPTION FOR DEFICIENT VIEWERS

Figure 1. Spectral sensitivity of rods and the three cell types: Top, normal viewer; Bottom, from left: protanopes, deuteranopes, tritanopes. Images taken from [9]

Normal color perception requires that three types of photoreceptors are present in the eye and function correctly. These cells contain pigments sensitive to three portions of the visible spectrum, and are usually known as S cone cells (short wavelength, responsible for perceiving blue), M cone cells (medium wavelength, green), and L cone cells (long wavelength, red). Since an accurate perception of a color depends on all three pigments involved, variation in the composition of these pigments (and therefore in the cells' sensitivities) will manifest as a distortion of an individual's vision. Viewers that experience abnormalities in color vision can be classified as exhibiting anomalous trichromacy (all three color pigments are present, but one of them is anomalous), dichromacy (only two of the pigments are present), and monochromacy (a single pigment or none at all; the rarest of the three) [9]. In reference to the pigments, protan is used if the red pigment is affected, deutan for green, and tritan for blue. Thus, anomalous trichromats can be either protanomalous, deuteranomalous, or tritanomalous, while dichromats can be either protanopic, deuteranopic, or tritanopic.

Figure 2. Simulated color perception: From left to right: normal spectrum, protanopes, deuteranopes, and tritanopes. Image taken from [9].

A simulation of color perception with the full spectrum as reference is presented in figure 2, while figure 3 provides an estimation of how a real image might be perceived by viewers with perception anomalies.

Figure 3. Simulated perception of a natural image: From left to right: normal viewers, protanopes, deuteranopes, and tritanopes. Image taken from [10]


Figure 4. Dichromat color planes: From left, protanope (θp = −11.48◦), deuteranope (θd = −8.11◦), tritanope (θt = 46.37◦). Image taken from [9]

Making use of the CIE-L*a*b* standard, dichromat perception can be approximated [9] through a set of three planes defined within the color space. Each anomalous perception is expressed as a deviation between the L*b* plane and a plane rotated around the L* axis, measured with reference to the b* axis. For each of the three types of dichromat, the corresponding angle has been determined by Kuhn et al. [11], as illustrated in figure 4.
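A very rough sketch of how such a plane can be used: keep L* unchanged and orthogonally project the (a*, b*) pair onto the line through the origin that makes the given angle with the b* axis. The angles below are the ones reported in Figure 4; the projection itself is our own simplification for illustration, not the exact model of [9] or [11].

```python
import numpy as np

# Rotation angles (degrees) of the dichromat color planes relative to the
# b* axis, as reported by Kuhn et al. [11] (see Figure 4).
THETA = {"protanope": -11.48, "deuteranope": -8.11, "tritanope": 46.37}

def project_to_dichromat_plane(L, a, b, deficiency):
    """Approximate how a dichromat perceives an L*a*b* color by projecting
    (a*, b*) onto the plane containing the L* axis at angle theta."""
    theta = np.radians(THETA[deficiency])
    # Unit direction of the plane's trace in the a*-b* plane,
    # measured from the b* axis toward the a* axis.
    d = np.array([np.sin(theta), np.cos(theta)])
    ab = np.array([a, b])
    ab_proj = np.dot(ab, d) * d          # orthogonal projection onto the line
    return L, ab_proj[0], ab_proj[1]

# Example: a saturated reddish color collapses toward the protanope plane.
print(project_to_dichromat_plane(50.0, 60.0, 20.0, "protanope"))
```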

RECOLORING METHODS

With regard to the different types of color vision deficiency, research in the field has been largely focused on dichromacy, and prevalently on red–green insensitivity and blindness, as these forms of deficiency are the most commonly occurring [1]. Rasche et al. [12][13] proposed an algorithm focused on transforming color images into grayscale images, which can be extended to accommodate color blindness. While the results are very promising, this method can have a high computational cost, and does not always deliver a natural-looking image.

Figure 5. Rasche et al [12] proposed algorithm: From left to right: a natural image and the natural image as seen by a tritanope; a recolored image using the Rasche et al. algorithm and the recolored image as seen by a tritanope. Image taken from [12]

The paper defines an objective function incorporating two main goals, preserving contrast and maintaining luminance consistency, and ultimately reduces the task to a linear optimization problem. The proposed procedure can be expressed as follows:

• apply quantization to select N colors / select reference points
• solve the linear programming problem using the reference points
• apply interpolation using the transformed reference points

The linear problem to be solved in this method is the minimization of the per-component error (between the ideal color values and the real values), with the goal of maintaining relative contrast differences between colors, by employing a constrained multidimensional scaling technique. This type of algorithm does not scale well, hence the importance of the quantization step and the long run times reported by the authors. One technique that aims to preserve the naturalness of images recolored for dichromat viewers is proposed by Kuhn et al. [11]. The algorithm is deterministic, hence its results should be consistent and reproducible.

Figure 6. Rasche et al. [12] and Kuhn et al. [11]: From left to right: a natural image and the natural image as seen by protanopes; a recolored image using Rasche et al. and a recolored image using Kuhn et al. Image taken from [11]

The proposed algorithm consists of a quantization step followed by a mass-spring system optimization step, and a final reconstruction step, as such:

• apply quantization to select N colors / select reference points
• optimize the mass-spring system
• apply interpolation using the transformed reference points

The mass-spring optimization allows both a short computation time and a more natural-looking result, due to the fact that this method does not affect colors that are perceived the same by trichromats and dichromats. A mass-spring model contains a number of particles, each with a mass and a position in space, connected to one another through springs with a certain rest length. Particles naturally push and pull each other; based on the rest length of the spring that connects them, they move as much as their mass allows (i.e. a heavy particle will move less than a lighter one). Eventually, a point is reached where the particles have arranged themselves into an optimal configuration, which can be used as an optimization heuristic if a problem can be formulated to fit the model. In this case, the particles are the quantized color reference points, their mass is inversely proportional to how problematic they are for viewers with a certain deficiency, and each spring's rest length is the perceptual difference between the two colors it connects; a toy sketch of this relaxation is given below. Another method that places great importance on naturalness is introduced by Huang et al. [14], who also leave the luminance component untouched, only manipulating the a* and b* components, and use a remapping heuristic that modifies colors proportionally to their relevance to the examined dichromacy. Results show improvement over the Rasche et al. [12] method, both in perceived contrast and naturalness.
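Returning to the mass-spring step described above, the following toy relaxation only illustrates the idea; how Kuhn et al. [11] actually derive the masses and rest lengths, and the performance of their optimizer, are not reproduced here.

```python
import numpy as np

def relax(points, masses, rest, iters=500, k=0.01):
    """Toy mass-spring relaxation.

    points : (N, 2) array of quantized colors in the a*-b* plane
    masses : (N,) array; heavier points (colors already distinguishable by
             the target dichromat) move less
    rest   : (N, N) matrix of spring rest lengths (target perceptual
             distances between color pairs)
    """
    pts = points.astype(float).copy()
    n = len(pts)
    for _ in range(iters):
        forces = np.zeros_like(pts)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                delta = pts[j] - pts[i]
                dist = np.linalg.norm(delta) + 1e-9
                # Spring force proportional to the deviation from rest length.
                forces[i] += k * (dist - rest[i, j]) * delta / dist
        # Heavier particles move less (displacement scaled by inverse mass).
        pts += forces / masses[:, None]
    return pts

# Tiny example with three quantized colors and arbitrary target distances.
pts = np.array([[10.0, 0.0], [12.0, 1.0], [-30.0, 5.0]])
masses = np.array([1.0, 5.0, 1.0])
rest = np.full((3, 3), 25.0); np.fill_diagonal(rest, 0.0)
print(relax(pts, masses, rest))
```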


Figure 7. Rasche et al. [12] and Huang et al. [14]: From left to right: a natural image and the natural image as seen by a tritanope; a recolored image using Rasche et al. and a recolored image using Huang et al. Image taken from [14]

The driving principle behind this technique's optimization phase is a series of steps, applied to each selected color in order of its perceived anomaly for deficient viewers, each step minimizing the deviation between the contrast of a pair of original colors and the contrast of the corresponding pair of remapped colors.

• apply quantization to select N colors / select reference points
• prioritize the reference points
• for each reference point, starting with the highest priority, optimize the transformed point using the previously transformed points
• apply interpolation using the transformed reference points

As in Kuhn et al. [11], clustering is employed in order to reduce the computational load, and interpolation is used to build the final result.

CONCLUSIONS

Given the importance of visual imagery, as well as its widespread use in digital media, methods of transforming images to suit color-deficient viewers can be very valuable, and research in the field has shown potential. The first method discussed in this paper, proposed by Rasche et al. [12], provides a good reference point and produces good results, but falls short first in efficiency, due to very high processing times, and secondly in naturalness. Images produced by Rasche et al. often do not preserve the perceived aesthetic of an image, focusing mainly on removing confusing color pairings. A more optimized implementation of the algorithm, aimed at increasing computation speed, may prove useful. Kuhn et al. [11] seek to improve both the processing times and the naturalness of the resulting image. By using a mass-spring system in the optimization step, not only is the algorithm's speed superior to that of Rasche et al., but since certain colors, namely those that do not affect a deficient viewer's perception, are not modified, the final product looks noticeably more natural. Similarly, Huang et al. [14] also obtain an increase in naturalness, by prioritizing colors based on the effect they have on a dichromat individual's perception. The introduction of this processing order also reduces the computational load, with a single pair of original and transformed colors being treated at a time.


Although very productive, studies so far concern static imagery. While these methods would provide good transformations for single frames, if applied sequentially to each frame of a video, the end result may suffer from inconsistent color changes. The non-uniformity introduced by the lack of inter-frame consistency may produce unpleasant results, or even further undermine an already anomalous perception. Machado and Oliveira [15] present a temporally coherent method for contrast enhancement, though their technique is aimed at data visualization and therefore does not focus on preserving naturalness. Possible future work in this area can focus on three aspects: optimizing the performance of existing techniques, addressing the scarcity of publicly available software applications that can perform color transformations for dichromats, and developing a procedure suitable for naturalness-preserving processing of video content. As far as multi-frame content is concerned, using the existing single-frame methods as a starting point, a proposed algorithm might follow these steps:

• for each frame in the video, apply the color transformation
• build a look-up table using the recolored frame
• for each color pair in the look-up tables, considering its values along the frames, apply a smoothing filter
• for each frame in the video, recolor it using the smoothed look-up table / interpolation

The above-mentioned look-up tables are meant to provide an approximation of the recoloring of each frame, by cataloguing how each original color was transformed in each context. By smoothing the transformed values over time, a better overall consistency could be achieved, although the initial selection of reference points might prove problematic, particularly since skipping the quantization step would be very costly in terms of computational effort, especially considering that video formats are often high-resolution. A sketch of the smoothing step is given after this paragraph. Another potential approach might be the batch-processing of frames: selecting a set of frames, stitching them together to provide the input for a single-frame transformation method, and then separating the output back into the original frames. This, however, relies heavily on the premise of a high-performance, time-efficient recoloring method, and would still cause inconsistency issues in videos longer than the algorithm's supported input size.
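A minimal sketch of the smoothing idea for the per-frame look-up tables is shown below; the LUT layout and the recolor_frame / apply_lut names are hypothetical placeholders standing in for a single-frame recoloring method, not an existing API.

```python
import numpy as np

def smooth_luts(luts, window=5):
    """Temporally smooth per-frame look-up tables.

    luts : (T, K, 3) array -- for each of T frames, the recolored value of
           each of the K quantized reference colors (e.g. in L*a*b*).
    Returns an array of the same shape with a moving average applied along
    the time axis, so that the same original color is mapped consistently
    across neighbouring frames.
    """
    T = luts.shape[0]
    smoothed = np.empty_like(luts, dtype=float)
    half = window // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        smoothed[t] = luts[lo:hi].mean(axis=0)
    return smoothed

# Usage sketch: per-frame recoloring produces one LUT per frame, the LUTs are
# smoothed, and each frame is then recolored with its smoothed LUT
# (recolor_frame / apply_lut are placeholders for a single-frame method).
# luts = np.stack([recolor_frame(f) for f in frames])
# for frame, lut in zip(frames, smooth_luts(luts)):
#     output.append(apply_lut(frame, lut))
```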

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] W. R. Miles, “Color Blindness in Eleven Thousand Museum Visitors,” Yale Journal of Biology and Medicine, vol. 16, no. 1, pp. 59–76, 1943.


[2] H. B. Kim, S. Y. Lee, J. K. Choe, J. H. Lee, and B. H. Ahn, "The incidence of congenital color deficiency among Koreans," Journal of Korean Medical Science, vol. 4, no. 3, pp. 117–120, 1989.
[3] J. Scanlon and J. Roberts, "Color Vision Deficiencies in Children," Rockville, MD: National Center for Health Statistics, vol. 11, no. 118, pp. 1–34, 1972.
[4] J. Z. Xie et al., "Color vision deficiency in preschool children: the multi-ethnic pediatric eye disease study," Ophthalmology, vol. 121, no. 7, pp. 1469–74, Jul. 2014.
[5] R. S. Hunter, "Photoelectric Color Difference Meter," Journal of the Optical Society of America, vol. 48, no. 12, p. 985, 1958.
[6] B. Hill, T. Roger, and F. W. Vorhagen, "Comparative Analysis of the Quantization of Color Spaces on the Basis of the CIELAB Color-Difference Formula," ACM Transactions on Graphics, vol. 16, no. 2, pp. 109–154, 1997.
[7] K. M. M. Krishna Prasad, S. Raheem, P. Vijayalekshmi, and C. Kamala Sastri, "Basic aspects and applications of tristimulus colorimetry," Talanta, vol. 43, no. 8, pp. 1187–1206, 1996.
[8] A. Shams-Nateri, "Estimation of CIE tristimulus values under various illuminants," Color Research and Application, vol. 34, no. 2, pp. 100–107, 2009.
[9] G. M. Machado, "A model for simulation of color vision deficiency and a color contrast enhancement technique for dichromats," Universidade Federal do Rio Grande do Sul, 2010.
[10] L. T. Sharpe, A. Stockman, H. Jägle, and J. Nathans, "Opsin genes, cone photopigments, color vision, and color blindness," in Color Vision: From Genes to Perception, K. R. Gegenfurtner and L. T. Sharpe, Eds. Cambridge University Press, 2001, pp. 3–51.
[11] G. R. Kuhn, M. M. Oliveira, and L. A. F. Fernandes, "An Efficient Naturalness-Preserving Image-Recoloring Method for Dichromats," IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1747–1754, 2008.
[12] K. Rasche, R. Geist, and J. Westall, "Re-coloring images for gamuts of lower dimension," Computer Graphics Forum, vol. 24, no. 3, pp. 423–432, 2005.
[13] K. Rasche, R. Geist, and J. Westall, "Detail preserving reproduction of color images for monochromats and dichromats," IEEE Computer Graphics and Applications, vol. 25, no. 3, pp. 22–30, 2005.
[14] C. R. Huang, K. C. Chiu, and C. S. Chen, "Key color priority based image recoloring for dichromats," Lecture Notes in Computer Science, vol. 6298, part 2, pp. 637–647, 2010.
[15] G. M. Machado and M. M. Oliveira, "Real-time temporal-coherent color contrast enhancement for dichromats," Computer Graphics Forum, vol. 29, no. 3, pp. 933–942, 2010.


TAKING DECISION BASED ON THE REGRESSION MODEL USING EXCEL 2016

Ana-Maria Mihaela IORDACHE 1*

ABSTRACT

The management decision is represented by the process of choosing a line of action in order to achieve goals, the application of which affects the activity of at least one other person than the decision-maker. The decision making and the possibilities of assisting it depend on the context in which this process takes place and on the typology of decisions. This paper discusses how to make a decision about the choice of the linear regression model. The case study is focused on the observations made on a production company for a period of 30 years. At the end of the study the regression model was chosen in such a way that maximizes the possibility of making the right decision on the production plan over the following periods.

KEYWORDS: decision, Excel 2016, regression, information, linear, knowledge

INTRODUCTION

The decision is the result of conscious activities and means choosing an optimal solution to solve a problem, with implications for the activity of at least one other person than the decision-maker. The decision involves choosing a course of action and committing to it, which usually involves the allocation of resources. The decision belongs to a person or group of persons who have professional competence, have the necessary authority and are responsible for the use of resources in certain given situations.

The decision is the focus of management activity, because it is found in all of its functions. The success of a decision depends on a few factors, such as: the quality of the decision, the way the decision is implemented, and the degree to which the objectives proposed through its implementation have been achieved. Moreover, the integration of the business within the business environment depends on the quality of the decision. The quality of the decision refers to its compatibility with the existing restrictions. The essential attribute which characterizes the decision is that of choosing from several alternatives. The alternatives can either be identified from an existing offer (for example, the list of job placement companies or of products on the market) or invented by building scenarios (for example, establishing the variants of the production program). When companies make decisions, they must take into account the following factors that influence their quality:

1* Lecturer, Phd, Romanian-American University, Bucharest, [email protected]


Decision = f[(fc, fin); (V, M, R)], V = f(C, Q), where:
- fc are the known factors (information, restrictions, influences);
- fin are the factors of uncertainty and risk determined by the environment;
- V is the value of the human factor;
- C represents the knowledge and experience of the person who makes the decision;
- Q represents the adaptability of the person who makes the decision;
- M is the motivation of the decision;
- R is the responsibility of the person who makes the decision.

Among the criteria that every manager should take into account when making a decision are: avoiding improvisation and subjectivism in the decision-making process; the decision must be taken only by persons who are legally invested and authorized to do so; the decision must include all the elements necessary for its proper understanding and implementation; a good decision taken in time is preferable to a very good decision taken too late; the aim is to obtain the best effect for a certain effort; and the decisions of the various departments of the organization should be compatible with each other and lead to the accomplishment of the overall objective of the enterprise.

Taking a decision can be treated from two perspectives: classic and knowledge-based. From the classic perspective, a decision is treated as the choice of a possible course of action from several pre-identified ones. In general, the number of alternatives identified and taken into account in the decision-making process is high. The issues that arise here relate to the number of alternatives, how to identify the hidden alternatives, and how to avoid forgetting or omitting alternatives. Then there is the problem of selecting one of them, following an analysis of the alternatives which highlights the implications of each and its impact with respect to the proposed objectives. The knowledge perspective treats the decision as knowledge of how the action is carried out. When a decision is treated as knowledge, the decision-making process becomes a process of creating new knowledge through the use of the existing one. From this perspective, decision making is seen as a knowledge-intensive activity.

Microsoft Excel is an outstanding component of MS Office. It has wide applications in statistical analysis in every field where statistical evaluation is needed, for assessing the significance of research results or for quality control. Microsoft Excel has built-in support for designing all sorts of graphics for the presentation of data. The data can be arranged in tables in ascending or descending order; we can compute the average (mean) and the standard deviation, perform statistical analyses and verify the results of analyses done by manual methods. Calculations can be done quickly and accurately using an Excel worksheet. There are built-in functions to calculate percentage points or p-values for many distributions of interest in statistical models.

THE REGRESSION MODEL

Regression is a method of statistical analysis by which, given observed values of a dependent variable (y) and of one or more independent variables xj (j = 1, ..., m), we seek a simple expression of the function that describes the connection between them. By extrapolation, based on this expression and on other values of the independent variables, we can look for the corresponding values of the dependent variable. If the statistical data express an evolution over time, then the calculation of future values of the dependent variable y is called prognosis, and that of past values, retrognosis. In the regression analysis, the dependent variable y is considered to be a function of one or more independent variables, as shown below:

y = f(x1, x2, ..., xn) + ε, where ε represents the deviations of the theoretical variable from the experimental variable. In the regression analysis, it is desirable to determine the function f so that ε is a variable with zero mean and minimum dispersion. If this pattern is not satisfactory, you can use the multi-linear regression that has the form:

y = a1·x + a2·x^2 + a3·x^3 + ... + am·x^m + b

In multi-linear regression, the coefficients aj (j = 1, ..., m) and b must be determined. When m = 1 the dependence is linear and the regression function is represented by a straight line, whose slope is given by a1, while b gives the ordinate of the intersection point of the line with the Oy axis. Various other dependency functions can be reduced to the multi-linear form by suitable transformations. For regression, the least squares method is most often used, minimizing the sum of the squared differences between the given values of y and those calculated with the regression function.

CHOOSING THE RIGHT REGRESSION EQUATION

Various regression functions, such as polynomial functions, can be reduced by appropriate transformations to the multi-linear regression function. The regression can also be performed with the Regression tool in Data Analysis. The question is whether a regression function is good or not, how good it is and to what extent it is better than another. These assessments can be made using statistical indicators that refer either to the function as a whole or to its components. In the calculations and interpretation we used the statistical indicators R², F and t (Student). The correlation coefficient R is between -1 and 1. If R = 0, there is no correlation between the respective variables in the regression model, and if R = ±1, the correlation is perfect. If all the values lie on the regression curve, then R² = 1 and the data match the calculated regression curve perfectly. So, the closer R² is to 1, the more appropriate the regression equation is. The F test indicates whether the regression equation as a whole is significant or not. In Excel we get a calculated value of F, which we denote by Fc. The theoretical value of the inverse F distribution is denoted by Ft and has values given in the statistical tables of specialized books. If Fc > Ft, the regression equation is significant.

The t (Student) test, applied to all the coefficients aj (j = 1, 2, ..., m) of the variables and to the free term b in the regression equation, indicates whether the respective variables or the free term are significant or not.
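The same three indicators can also be computed outside Excel. The sketch below uses statsmodels and scipy on illustrative data (not the values of the paper's case study), with scipy's ppf functions playing the role of Excel's FINV and TINV.

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.arange(1, 31, dtype=float)            # 30 yearly observations
y = 300 * x + 600 + rng.normal(0, 400, 30)   # illustrative data only

X = sm.add_constant(x)                       # adds the free term b
model = sm.OLS(y, X).fit()

r2 = model.rsquared                          # R^2
fc = model.fvalue                            # calculated F (Fc)
t_stats = model.tvalues                      # t for b and for each a_j

# Theoretical thresholds, the scipy equivalents of Excel's FINV / TINV.
df1, df2 = int(model.df_model), int(model.df_resid)
ft = st.f.ppf(0.95, df1, df2)                # ~ FINV(0.05, df1, df2)
tt = st.t.ppf(0.975, df2)                    # ~ TINV(0.05, df2), two-tailed

print(r2, fc > ft, np.abs(t_stats) > tt)
```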


It starts with the linear regression. If at least one of the three indicators does not indicate a good correlation, a quadratic regression (a regression equation of degree 2) is tried. If the quadratic regression is good, we will also try a cubic regression and determine which one is the most significant. The effective comparison of the regressions is done in 4 stages: data preparation for the polynomial regressions; linear regression; quadratic regression; cubic regression.

1. Preparing data for polynomial regression

Consider the case of the evolution of an organization over a period of 30 years. The annual production value in € billions represents the independent variable x, and the production costs in € billions represent the dependent variable y in the model. In the worksheet we first type the years and the production x. In the adjacent columns we calculate x² and x³, and then type the costs y in the next column (Table 1). This column order is mandatory, because in Regression the columns with data for the independent variables must be contiguous.

Table 1. Statistical data of the company
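The same data layout can be prepared programmatically. The sketch below uses pandas; the values are placeholders standing in for the 30 rows of Table 1, and the column names are illustrative.

```python
import pandas as pd

# x = annual production, y = production cost (billions of euros); the values
# here are placeholders standing in for the 30 observations of Table 1.
df = pd.DataFrame({"year": range(1, 31),
                   "x": [float(i) for i in range(1, 31)],
                   "y": [600.0 + 300.0 * i for i in range(1, 31)]})

# The powers of x must sit in adjacent columns, mirroring the worksheet
# layout required by Excel's Regression tool.
df["x2"] = df["x"] ** 2
df["x3"] = df["x"] ** 3
df = df[["year", "x", "x2", "x3", "y"]]
print(df.head())
```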


2. Linear regression

The regression for the equation y = ax + b is run as follows: choose the Data menu, then Data Analysis, and finally press Regression. The Regression window will appear and the user must set some parameters of the regression model, such as: the input y range, the input x range and the place where the output results should be delivered (in a range, in another worksheet or in another workbook).

Figure 1.

From the Summary Output report generated in the worksheet it results that:
- a = 298.48, b = 584.98;
- n = 30, from Observations;
- k = 2, in the first line of the df column.
It is noted that R² = 0.602 (R Square line), Fc = 42.42 (F), ta = 6.299, tb = 1.546 (from t-Stat). From df1 = k-1 = 1 and df = n-k = df2 = 28, it results that FT = FINV(0.05,1,30) = 4.17 and tT = TINV(0.05,30) = 2.04. The F and t tests are good, but R² does not indicate a good correlation. Next we will try a degree 2 (quadratic) regression equation.


Summary Output and ANOVA for linear regression

3. Squared regression

The regression equation is y = a1·x + a2·x² + b. The results are obtained in the sheet to which we assign the name SQUARED.

Figure 2.

In the report from the SQUARED sheet we obtain: a1 = 181.40, a2 = 28.18, b = 2269.039, R² = 0.846, Fc = 34.017, t1 = -1.198, t2 = 3.28, tb = 3.73.

From df1 = k-1 = 2 and df = n-k = df2 = 27 it results that FT = 3.35 and tT = 2.05. It is found that R² is much better than for the linear regression, and the F and t tests are good.


It results that the second order regression is good, but we will try cubic regression too.

Summary Output and ANOVA for the second order regression

4. Cubic regression

The cubic regression equation has the following form: y = a1·x + a2·x² + a3·x³ + b. The regression results are obtained in the sheet we named Cubic.

Figure 3.


From the obtained report we get the following values: a1 = 239.58, a2 = 36.07, a3 = 0.303, b = 2384.014, R² = 0.84, Fc = 21.86, t1 = 0.55, t2 = 0.65, t3 = 0.145, tb = 2.37. Because df1 = 3 and df = df2 = 26, it results that FT = 2.97 and tT = 2.055. R² is much better than for the linear regression and comparable to that of the second-order regression, and the F test is good, but the t test is not good for a3. Consequently, we can drop the cubic equation.

Summary Output and ANOVA for cubic regression

After performing the regression analysis, the following operations can be done in Excel: the calculation of retrognosis and prognosis, as well as the graphical representation of the initial data and of the obtained results. These operations can be done in three steps: writing, generating and arranging the data in the proper order; generating the chart using the Chart Wizard; interactively adapting the chart elements with the appropriate dialog boxes.

Figure 4. The graphical representation of the cost of production y and the predicted cost of production y

CONCLUSIONS

The successful implementation of a decision is about avoiding conflicts of interest and about the decision being understood by everyone who has to execute it. The criteria that every manager should take into account when making a decision are: the scientific substantiation, the legality, the completeness, the opportunity and the efficiency of the decision. The context in which the decision is made may influence the nature of the decision-making process, and therefore the way in which a decision-support system can act. It can be analyzed from several points of view, such as: the decision level, the degree of maturity, the degree of competition and the structure of the organization. The decision-making process is based on information. The information is necessary to: define and structure the problem, explore and choose between alternative solutions, review the effects of the implemented choice and solve the problems. To make good decisions, enterprises have a permanent desire to understand the reasons behind certain values of the business measures.

REFERENCES

[1] Virgil Chichernea, Sisteme suport de decizie, Ed. PrintGroup, 2010;
[2] Vasilica Voicu, Sisteme suport pentru adoptarea deciziei, Editura Universitară, Bucureşti, 2010;
[3] Wayne Winston, Microsoft Excel Data Analysis and Business Modeling (5th edition), Microsoft Press, 2016, ISBN: 978-1-509-30422-6;
[4] Thomas J. Quirk, Excel 2016 for Engineering Statistics, Springer, Cham, Switzerland, 2016, ISBN: 978-3-319-39181-6;
[5] Michael Alexander, John Walkenbach, Richard Kusleika, Excel 2016 Formulas (Mr. Spreadsheet's Bookshelf), John Wiley & Sons, 2016, ISBN: 978-1-119-06786-3;
[6] Jeff Sauro, James R. Lewis, Quantifying the User Experience: Practical Statistics for User Research, Morgan Kaufmann, 2016, ISBN: 978-0-128-02548-2;
[7] Rayat C.S., Applications of Microsoft Excel in Statistical Methods, in: Statistical Methods in Medical Research, Springer, Singapore, 2018, ISBN: 978-981-13-0827-7;
[8] Andre Petermann, Graph Pattern Mining for Business Decision Support, Database Research Group, University of Leipzig, https://dbs.uni-leipzig.de/file/paper03.pdf;
[9] Steven Weikler, Office 2016 for Beginners – The Perfect Guide on Microsoft Office, 2016, ISBN: 978-1537205755;
[10] Bill Jelen, Michael Alexander, Excel 2016 Pivot Table Data Crunching, Que, 2015, ISBN: 978-0-7897-5629-9.


FULLY CONVOLUTIONAL NEURAL NETWORKS FOR IMAGE SEGMENTATION

Andrei LEICA 1* Mihai Bogdan VOICESCU 2 Răzvan-Ştefan BRÎNZEA 3 Costin-Anton BOIANGIU 4

ABSTRACT

Image segmentation is a Computer Vision process in which an input image is split into different and fully-disjoint parts, each considered to possess a certain characteristic of interest (almost the same color, a resembling texture, belonging to the same object inside a scene, etc.). In most scenarios, the key to a successful image analysis, one which requires a high-level interpretation of the image content, may be found in a correct segmentation, but, unfortunately, in most real-life cases this is a very difficult task. Our method is based on deep learning neural network architectures, which hold state-of-the-art accuracy for pixel-wise segmentation on various challenges. We will design and train different architectures and use all of them together as a voting-based image segmentation system.

KEYWORDS: Image Segmentation, Convolutional Neural Network, Machine Learning, Deep Semantic Segmentation, Thresholding, Region Growing, Split and Merge

1. INTRODUCTION

FCNs, or Fully Convolutional Networks [1], are learning models which can output a dense, pixel-wise prediction, and are widely used for various image segmentation tasks. A Fully Convolutional Neural Network is similar to a usual Convolutional Neural Network (CNN), with multiple convolutional layers stacked on top of each other, mixed with nonlinearities (ReLU) and max-pooling layers, but differentiates itself by replacing the final fully connected layer with another convolutional layer having a large "receptive field". The purpose is to analyze the scene in its entirety (the objects found in the image and a rough estimate of their location), in order to get an accurate prediction for each individual pixel.

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 3 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 4 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


This model architecture differs from traditional models because it uses no fully-connected layers, instead relying completely on convolution and upsampling operations. This is because the last fully-connected layers are usually used for classification, thus eliminating any spatial information, which is very important in dense prediction tasks such as pixel-wise image segmentation. As shown in Figure 1, the FCN model first performs many layers of convolution on the image to extract a multiscale feature representation of the image, with the dimension (Hi, Wi, Ci), where Ci is the number of channels or kernels.

Figure 1. A fully convolutional neural network architecture

The architecture begins with a series of "shallow" layers, which are progressively replaced by "deep" layers. Shallow layers apply a low number of small filters to the image, preserving most of the image's dimensions, while the deep layers are much smaller in height and width, but contain more channels of information. Finally, the last layer performs a transposed convolution that increases the dimensions to (H, W, C0), whose height and width are the same as the input image, with the depth in each pixel being the likelihood that the pixel belongs to each of the C0 classes. Figure 2 illustrates the actual number of convolutions and max-pooling operations that each layer has, together with the upsampling convolutional layers. The concept of deconvolution (upsampling or transposed convolution) first appeared in [11] and is a special case of convolution capable of producing larger feature maps from smaller ones, which is very useful in generative models and dense prediction tasks. This idea was first introduced for segmentation in [12]. Starting with this architecture, we can easily extend our models with different CNN architectures and various types of layers and training procedures, which can be used in a voting manner to get the best accuracy on an evaluation dataset. A minimal sketch of such an encoder-upsampling layout is given after Figure 2.


Figure 2. Upsample Convolutions
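The sketch below shows a tiny encoder-upsampling network of this kind in TensorFlow/Keras; the layer counts and filter sizes are illustrative choices, not the architecture of [1].

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_fcn(num_classes, input_shape=(None, None, 3)):
    """Minimal FCN: convolution/pooling encoder, then transposed
    convolutions that upsample back to the input resolution."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)                       # 1/2 resolution
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                       # 1/4 resolution
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    # A 1x1 convolution plays the role of the removed fully-connected layer.
    x = layers.Conv2D(num_classes, 1)(x)
    # Transposed convolutions restore the (H, W) of the input; the result is
    # a per-pixel score map of depth num_classes.
    x = layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(x)
    outputs = layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(x)
    return tf.keras.Model(inputs, outputs)

model = tiny_fcn(num_classes=2)   # e.g. road / non-road
model.summary()
```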

2. RELATED WORK – CLASSICAL METHODS

In the following paragraphs we briefly describe some classical segmentation methods, outlined in [5]. Thresholding is the simplest and, perhaps, the most used segmentation technique. The main idea is to find one (or more) thresholds and to build different segments from the pixels that fall into different ranges when compared with the thresholds; a minimal sketch of this approach is given at the end of this section. Contextual segmentation: region growing. Thresholding is a somewhat suboptimal technique, because it groups pixels based on their individual properties, without using their vicinity at all. To correct this, two properties may be taken into account: discontinuity and similarity. A discontinuity-oriented technique will try to find boundaries (as closed as possible) for the objects, using the variations of intensity in the image across the previously detected edges. A similarity-based technique will work the other way around, trying to group connected pixels that satisfy both individual and group criteria. Split-and-merge segmentation. The process is presented in Figure 3, and the main data structure employed is the quad tree. The input image is split successively into quadrants until "homogeneous" regions are obtained, and the resulting segments are then merged together if the same criterion is met by their combined result. The "homogeneity" measure may be defined and used according to the application's needs: one may choose to measure the variance of the data in the candidate region, another may find it more useful to enforce a maximum allowable difference between the maximum and minimum pixel values, and another may choose to compare some region statistics against the same statistics computed on the whole image.


Figure 3. An illustration of the Split and Merge method. The resulting regions can be stored in a quad tree.

Other recent developments include compression-based methods [6, 7], which hypothesize that an optimal segmentation of an image also minimizes the coding length of the data, histogram-based methods, which have been found useful in video tracking [8], due to their efficiency compared to other image segmentation methods, and graph partitioning methods [9, 10], which model the image as a weighted, undirected graph.
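As a concrete reference for the simplest of these classical methods, global thresholding amounts to a single comparison per pixel. The sketch below uses numpy with a fixed threshold; in practice the threshold would be chosen from the histogram, for example with Otsu's method.

```python
import numpy as np

def threshold_segment(gray, t=128):
    """Split a grayscale image into two segments using a single threshold."""
    return (gray >= t).astype(np.uint8)   # 1 = foreground, 0 = background

# Toy image: a bright square on a dark background.
img = np.zeros((64, 64), dtype=np.uint8)
img[16:48, 16:48] = 200
mask = threshold_segment(img, t=100)
print(mask.sum(), "foreground pixels")    # 32 * 32 = 1024
```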

3. RELATED WORK - DEEP SEMANTIC SEGMENTATION

Almost all models developed to date, including those mentioned above, rely on pixel-level linear classification or other methods with limited predictive power, such as decision trees. Neural networks, by contrast, have the potential to learn highly nonlinear decision boundaries and eliminate tedious human feature engineering. Current state-of-the-art methods for semantic image segmentation include basic deep convolutional networks (employed by Chen et al. [2] in conjunction with conditional random fields), fully convolutional networks (Long et al. [1]), and deconvolutional networks (Noh et al. [3]). Of great interest are also models originally intended for object detection (which are frequently applicable to segmentation to some extent), such as the R-CNN model developed by Girshick et al. [4].

Figure 4. Images from the KITTI Road dataset, with the annotated ground-truth


4. FULLY CONVOLUTIONAL MODEL

Firstly, we focus on an existing fully convolutional neural network architecture, described by Long et al. in [1] and used successfully in image segmentation tasks. We analyze how this model can be applied to solve the difficult task of road and lane detection. This is a very important task for autonomous driving; it is also used in pedestrian detection and in offering relevant driving assistance. We trained a model using the KITTI Road dataset [16]. For the training process, the ground-truth labels consist of images with the same size as the original input images (1280*380 for the KITTI dataset), where each pixel is 1 for road and 0 for non-road area. The network learns to predict a label for each individual pixel, by outputting a probability distribution over the possible pixel values (0/1 in our binary classification task). The network is trained until it converges, i.e. when the loss (the softmax cross-entropy function) no longer decreases. There are 289 + 290 (training + test) images in the urban data set, classified as below:
• Unmarked (98 + 100, training + test)
• Marked (95 + 96, training + test)
• Multiple marked lanes (96 + 94, training + test)
• Combined (unmarked, marked, multiple marked lanes)
The ground truth data has been obtained through a manual annotation process. It is available for the road area (the collection of all lanes) and for the lane of the driving vehicle (marked). Manual annotation is necessary because ground truth is provided for the training images only. A minimal sketch of this training setup is given after this list.
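The sketch below shows how such a binary, per-pixel classifier can be trained with TensorFlow/Keras using softmax cross-entropy; the tiny stand-in model, the image shapes and the random arrays are illustrative placeholders, not the architecture of [1] or a real KITTI data loader.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for KITTI images and per-pixel labels
# (1 = road, 0 = non-road); real inputs would be loaded from disk.
images = np.random.rand(4, 384, 1280, 3).astype("float32")
labels = np.random.randint(0, 2, size=(4, 384, 1280)).astype("int32")

# Tiny stand-in model that outputs two per-pixel logits (road / non-road).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                           input_shape=(384, 1280, 3)),
    tf.keras.layers.Conv2D(2, 1),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    # Softmax cross-entropy over the two classes, computed per pixel.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(images, labels, batch_size=2, epochs=1)
```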

5. RESULTS

Figure 5. Road sections detected in test images


We extended the KITTI training data set by also including mirrored versions of the training images. Consequently, while the original KITTI Road dataset consists of 289 training images, our dataset uses 580 training images. For evaluation, 34 images are used. The FCN architecture is implemented using the TensorFlow open-source machine learning framework. The following measures were used for the evaluation: Precision, Recall and their harmonic mean (F1-measure, β = 1); they can be computed from the predicted and ground-truth masks as sketched below. The best accuracy obtained with the FCN architecture was achieved after 300 epochs, with Precision = 0.967, Recall = 0.920, F1_measure = 0.943. A sample result is shown in figure 5.
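A plain numpy sketch of these evaluation measures for binary road masks:

```python
import numpy as np

def road_metrics(pred, truth):
    """Precision, recall and F1 for binary road masks (1 = road)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example on 2x2 masks.
print(road_metrics(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]])))
```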

6. FURTHER WORK ON FCN

For future work, we will focus our attention on the PASCAL VOC 2011 segmentation training set. A bigger, and perhaps more relevant, collection of training images (Hariharan et al. [13]) will be used to train our models on the 90 provided classes, thus moving beyond binary classification (as in the road segmentation problem). Different convolutional neural network architectures must be tested, in order to determine which one provides the most satisfying results.

Figure 6. Sample images from the PASCAL VOC 2011 training/validation data

7. INSTANCE OBJECT DETECTION AND SEGMENTATION

In the following, we propose another use of semantic segmentation, by augmenting an Object Detection neural network pipeline with segmentation for individual instances. One of its most popular and useful functions is that of extracting semantic features from high dimensional data, such as 2D streams of data, as is the case with video streams. In the past, people have made intensive, unsuccessful efforts of manually extracting features from 2D images using various techniques. Convolutional Neural Networks (CNN) offer a way of automatically extracting semantically useful features from high dimensional data by enforcing some constraints like the fact that a filter applied to an image should be invariant to its position on the image. The CNN is a Neural Network that contains multiple convolutional layers. A convolution operation is achieved by sliding a kernel over the input and computing the dot product between the input and the filter. The convolved feature is the result of this operation. The use of convolutional networks for


The use of convolutional networks for extracting features led to their extensive usage in other areas as well, such as classifying, detecting and segmenting salient objects. These qualities make them essential to our task of detecting objects, with the end goal of driver-assistance technologies and self-driving cars. In the following we give an overview of each component of the system. A comprehensive and self-contained presentation of each topic is, however, beyond the scope of this paper, so we redirect the interested reader in search of a more detailed explanation to the large amount of information available online, in books and in research papers.

Figure 7. High-level overview of the proposed image segmentation system architecture

The Mask R-CNN [14] network builds upon the Faster R-CNN [15] network with the addition of a segmentation branch, whose purpose is to predict the object mask. The Faster R-CNN network is critical to the system since it is the most accurate at detecting obstacles and classifying them into classes (cars, buses, pedestrians, etc.). We refer specifically to Faster R-CNN as the variant used inside the Mask R-CNN system. The system's architecture consists of an FCN (fully convolutional network) which acts as the backbone of the system and maps an input image into a lower-dimensional representation. This feature map is passed to the RPN (region proposal network).


The RPN's job is to propose ROIs (regions of interest) for the network to analyze as potential candidates for the presence of objects. These candidates are scored based on their projection quality onto the original ground truth bounding boxes. The network then crops the feature map according to these proposals using the RoIAlign layer, a quantization-free layer that accurately keeps the spatial locations of the features. The crops are then resized to fixed dimensions and passed to the 3 parallel branches that form the head of the network: the bounding box regression branch, the bounding box classification branch and the mask segmentation branch. The roles of these branches are the following:
• The bounding box regression branch - outputs deviations from the original positions of the regions of interest detected by the RPN, fine-tuning the exact location of the box such that it faithfully covers the object of interest.
• The bounding box classification branch - predicts the class score of the object in question with respect to the existing classes the network is trained on.
• The mask segmentation branch - produces individual instances and pixel-wise segmentation masks for each object instance.
If the tasks are complementary, then sharing the encoder can be beneficial, allowing the tasks to achieve a higher score than without sharing the convolutional encoder backbone. At inference time, each of the tasks outputs its corresponding results, which are subsequently overlaid on the original image for visualization purposes.

7.1. Design details

The system architecture is based upon the following elements:
• Fully Convolutional Neural Network (FCN) - the shared encoder between the three branches of the architecture (presented as the first architecture). It is responsible for transforming a high dimensional input, i.e. an image, into a lower dimensional condensed representation whose space is better suited for class separation and boundary detection. The result of this stage is a feature map with dimensions [w, h, c], w being the width, h the height and c the number of filters. The following image illustrates the output of this stage.


Figure 8. An example feature map derived by a Convolutional Neural Network

• Region Proposal Network (RPN) - employs two convolutional layers with a 3x3 kernel, whose purpose is to detect those regions of the feature map in which it is possible to find an object. For each convolution position on the image, a set of 9 anchors is generated, having three aspect ratios, three scales and sharing a common center; the process is depicted below. For each position in the image and for each anchor box, a regressor outputs the predicted bounding box (x, y, w, h) and a classifier outputs the probability p that an object is inside the predicted box. Both parallel heads are trained using the original ground truth boxes, and candidate boxes are matched to the ground truth using the IoU (intersection over union) metric.

Figure 9. Anchors generated by the Region Proposal Network
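As an illustration of the anchor mechanism and of the IoU metric mentioned above, a small sketch is given below; the scale and aspect-ratio values are assumptions, not the ones used by the actual RPN implementation.

import numpy as np

def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    # 9 anchor boxes (3 scales x 3 aspect ratios) sharing the center (cx, cy),
    # returned as (x1, y1, x2, y2)
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

def iou(box_a, box_b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)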

• The RoIAlign layer - is used to crop a region of interest from the feature map, using bilinear interpolation to compute the pixel values without any rounding, thus achieving an exact alignment between the feature map resulting from the crop_and_resize operation and the actual position in the original image. This process is illustrated in greater detail below.


Figure 10. Bilinear interpolation performed by the RoIAlign layer
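The bilinear sampling performed by RoIAlign can be sketched as follows (a simplified, single-channel illustration, not the implementation used in the system).

import numpy as np

def bilinear_sample(feature_map, x, y):
    # sample a 2D feature map at a real-valued position (x, y) without rounding,
    # weighting the four surrounding cells by their distance to (x, y)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom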

• Bounding Box Regression Branch - uses fully connected layers to output 4 bounding box regression coordinates (x, y, w, h) for each class.
• Bounding Box Classification Branch - uses fully connected layers to output a class score for each of the possible classes of the network.
• Mask Segmentation Branch - uses convolutional layers to output, for each class, a dense prediction over the extracted feature map, thus labeling each pixel according to its predicted membership to that class.
The system is constructed using the TensorFlow deep learning framework developed by Google, with Python and NumPy as additional tools. TensorFlow offers automatic gradient computation and backpropagation, so manually implementing the training procedure is unnecessary. Training and testing require computational resources supplied by the use of multiple GPUs, specifically NVIDIA GTX 1080 Ti cards. The computations are sped up by using NVIDIA's CUDA toolkit and the associated cuDNN library.

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] E. Shelhamer, J. Long, and T. Darrell. “Fully Convolutional Networks for Semantic Segmentation”. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), May 2016.
[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. “Semantic image segmentation with deep convolutional nets and fully connected CRFs”. arXiv preprint arXiv:1412.7062, 2014.


[3] H. Noh, S. Hong, and B. Han. “Learning deconvolution network for semantic segmentation”. In Proceedings of the IEEE International Conference on Computer Vision, pages 1520–1528, 2015.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[5] Nick Efford. “Digital Image Processing: A Practical Introduction Using Java”. Pearson Education, 2000.
[6] Hossein Mobahi, Shankar Rao, Allen Yang, Shankar Sastry and Yi Ma. “Segmentation of Natural Images by Texture and Boundary Compression”. International Journal of Computer Vision, volume 95, pages 86–98, 2011.
[7] Shankar Rao, Hossein Mobahi, Allen Yang, Shankar Sastry and Yi Ma. “Natural Image Segmentation with Adaptive Texture and Boundary Encoding”. In Proceedings of the 9th Asian Conference on Computer Vision (ACCV'09), Hongbin Zha, Rin-ichiro Taniguchi, and Stephen Maybank (Eds.), Vol. Part I, Springer-Verlag, Berlin, Heidelberg, pages 135–146, 2011.
[8] P. Dunne and B. J. Matuszewski. “Histogram Based Detection of Moving Objects for Tracker Initialization in Surveillance Video”. International Journal of Grid and Distributed Computing, vol. 4, no. 3, pages 71–78, 2011.
[9] Jianbo Shi and Jitendra Malik. “Normalized Cuts and Image Segmentation”. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 22, no. 8, pages 888–905, August 2000. DOI: http://dx.doi.org/10.1109/34.868688.
[10] Leo Grady. “Random Walks for Image Segmentation”. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 28, no. 11, pages 1768–1783, 2006.
[11] Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus. “Deconvolutional networks”. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2528–2535. IEEE, 2010.
[12] F. Yu and V. Koltun. “Multi-scale context aggregation by dilated convolutions”. Published as a conference paper at ICLR 2016, arXiv:1511.07122, 2016.
[13] B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik. “Semantic contours from inverse detectors”. 2011 IEEE International Conference on Computer Vision (ICCV).
[14] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick. “Mask R-CNN”. CoRR, abs/1703.06870, 2017.
[15] Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. CoRR, abs/1506.01497, 2015.
[16] KITTI Road dataset, URL: http://www.cvlibs.net/datasets/kitti/eval_road.php, accessed March 1, 2018.


MULTI-ALGORITHM IMAGE DENOISING

Georgiana-Rodica CHELU 1* Marius-Adrian GHIDEL 2 Denisa-Gabriela OLTEANU 3 Costin-Anton BOIANGIU 4 Ion BUCUR 5

ABSTRACT

In spite of the thorough research that has been done in the field of image denoising, a generic algorithm able to preserve the details of an image at an acceptable level has not yet been discovered. Most methods account for a specific class of noise and provide suitable results only if the implicitly-determined control parameters of the image correspond to the method's assumptions. Furthermore, many such methods rest on the presumption that noise is spatially invariant and do not treat the other case. The purpose of this paper is to analyze the classical methods used in image denoising, to observe their limitations in order to decide how mixing different algorithms might correct their undesired behaviors, and to set the scene for a new denoising method that would yield better results on a more varied set of images.

KEYWORDS: Image Denoising, Image Processing, Merging Technologies, Random Noise, Fixed Pattern Noise, Banding Noise, Salt-and-Pepper Noise.

INTRODUCTION

Images are 2-dimensional representations of the visible light spectrum and are stored on computers as multi-dimensional arrays, where the number of dimensions depends on whether the image is colored (fig. 1) or black and white (fig. 2). For simplicity, the third dimension is encoded as a tuple of red, green and blue components, corresponding to the RGB color space. A pixel is represented by a pair (i, v(i)), where v(i) is the value at i, the pixel's position; it is the result of measuring the light intensity by using a charge-coupled device (CCD) matrix and a focus system (the lens).

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 2 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 3 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 4 Professor PhD Eng.,”Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 5 Associate Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, [email protected],


In the case of black and white images, v(i) is a real value representing a shade of grey.

Figure 1. Color image. Image taken from [9]
Figure 2. Grayscale image. Image taken from [9]

Each element of the CCD matrix counts the number of photons which are incident on it during the exposure time. In the case of a constant light source, it has been proven that the number of photons incident on each captor varies according to the central limit theorem, that is, for n incident photons the fluctuation is on the order of √n. This accounts for random noise, which is characterized by intensity and color fluctuations, is determined by the ISO speed and is always present at any exposure length. Apart from random noise, there are two other types of noise: fixed pattern noise and banding noise. Fixed pattern noise, commonly known as hot pixels, occurs when inadequately cooled captors register spurious, heat-generated photons [10]. What makes this type of noise easy to remove is that it shows almost the same distribution of hot pixels provided that the image is taken under the same conditions; for this reason, it will not be the subject of this paper. Banding noise is camera-dependent, is introduced when data is read from the camera sensor [10] or after significant noise reduction, and will also not be the subject of this research.

Figure 3. Types of noise: (3.1) hot pixels, (3.2) random noise, (3.3) banding noise, (3.4) salt-and-pepper. Images taken from [10]


Salt-and-pepper is another type of noise, caused either by camera sensors, memory loss or faulty conversions between analog and digital images. It is characterized by the fact that salt-and-pepper pixels can only take the maximum or the minimum value in the dynamic range. Random noise is the most difficult to extract, as it is often hard to differentiate it from fine textures such as dirt, and therefore removing the noise results in removing those details as well (fig. 4). Denoising thus often leads to other undesired effects such as blur, the staircase effect, the checkerboard effect, wavelet outliers, etc., depending on the denoising algorithm applied. The difficulty of the problem is also increased by the uneven distribution of random noise. This does not come as a surprise when using algorithms that are based either on one of the noise models presented above or on a generic image smoothness model.

Figure 4. (4.1) Image taken from [9]; (4.2) blur effect inflicted by denoising using NLM

As argued before, the purpose of image denoising is to reconstruct an image that has been subject to noise. One can mathematically represent the noisy image y, at a pixel i, as the sum of the original image x and the noise n:

y(i) = x(i) + n(i)    (1)

The aim of any denoising algorithm is to reduce or completely remove the noise, n(i), in order to obtain the original signal, x(i).

RELATED WORK

One possible class of algorithms filters sets of pixels whose members have a certain degree of similarity. Such an algorithm was developed by Perona and Malik and is related to anisotropic filtering. Another category of denoising methods uses training sets to derive image statistics about certain coefficients (wavelet filtering) and attempts to modify these coefficients in order to diminish the noise. Other methods, such as the Non-Local Means algorithm, use sampling techniques to evaluate the areas in an image that appear to be similar with respect to the structure, but not to the quantity of noise. No matter the approach, the aim is to obtain noise-free images without altering the original image content.


Classical methods such as Gaussian or Wiener filtering (Yaroslavsky) work by separating the image into two parts, the smooth part and the oscillatory part [8]. The drawback is that this might lead to losing the fine edges present in the original image, which is an undesired outcome. Another algorithm that yields similar unwanted results when it comes to preserving the fine edges of an image is Perona-Malik, which filters the noise by using anisotropic diffusion; an improved approach could be robust anisotropic image smoothing. A newer and improved approach uses local adaptive filters to analyze the image in a moving window, computing its spectrum for each position but only using the value at the central pixel of the window [2]. The wavelet thresholding approach assumes that noise is represented by small wavelet coefficients that should be discarded if under a certain threshold. The drawback of this method is that it outputs images in which important edge coefficients are also cancelled, which again leads to a loss of fine details and the appearance of spurious pixels [2].

Figure 5. Non-local means applied on two images: first row, original image; second row, noisy image (σ = 25); third row, image denoised using the non-local means algorithm


Among the promising methods for a certain class of images is the non-local means approach proposed by Antoni Buades, Bartomeu Coll and Jean-Michel Morel, as it is fairly easy to implement and yields qualitative results. The method suggests that one should choose a pixel, look for neighborhoods similar to the one surrounding that pixel, and replace the pixel by the average of the centers of those neighborhoods. This is an algorithm based on exploiting the self-similarity that can be seen in most natural images. Total variation denoising techniques assume that images affected by noise have high total variation, i.e. the integral of the absolute gradient is high, and therefore attempt to reduce the total variation. These algorithms yield impressive results, as they remove the noise without affecting the edges the way most of the algorithms presented above do [12].

Figure 6: Examples of similar neighborhoods. Image taken from [12]

ALGORITHMS’ ANALYSIS

No matter the method that is being used, the final goal is to obtain an image as free of noise as possible, without loss of details and without side effects such as blurring. But each and every algorithm presented above has its own specific limitations. For example, anisotropic smoothing methods can preserve strong edges but cannot be used for smooth patterns and textures, while methods based on wavelet coefficient statistics yield the expected results only for limited types of input images. The non-local means algorithm works well for repetitive image patches but underperforms when it comes to preserving details in non-repetitive areas, as it treats the smooth regions and the edges in the same manner. Furthermore, it is worth observing what type of noise each algorithm removes best. Strategies that perform best for low noise levels may not perform as well for high noise levels, and the same happens for uniformly or non-uniformly distributed types of noise.


As a conclusion, there is a strong need for simple, unsupervised, efficient denoising algorithms that perform well on all kinds of input images. A great number of algorithms perform similarly despite using different approaches, and thus merging seems to be a suitable solution for improving the already existing methods. In what follows, a method based on mixing approaches that could account for some drawbacks of the current state-of-the-art denoising algorithms will be presented.

THE PROPOSED SOLUTION

A solution is proposed in which five of the previously described algorithms are selected and applied on various images: black and white images, color images, and images with different levels of brightness/color uniformity. The algorithms that perform better on each of these image categories can be identified and assigned weights. Having these weights, the image can be split according to some criteria:
1. Highlights/shadows and non-highlights/shadows
2. Uniform and non-uniform color
The average value at each point will be given by the weight of each algorithm on each region. The average can be tweaked by removing extreme values or values that are outside of a calculated interval. The expectation is to get a better image, with each algorithm having a greater impact on the type of surface it handles best: e.g. on the sky, where all pixels have the same value, the average of the neighboring pixels will give the best approximation (this could also apply to shadows). On the other hand, if the neighboring pixels have very different values, this approach will fail.

Figure 7: Sample images from the proposed dataset


Five algorithms have been chosen as input for further research and mixing-based improvements (a usage sketch is given after the lists below):
• the Non-Local Means implementation from OpenCV
• the TV-L1 (total variation denoising) implementation from OpenCV
• the wavelet thresholding implementation from scikit-image
• the bilateral filtering implementation from scikit-image
• the median filtering implementation from SciPy
The image dataset was divided into 4 categories in order to better assess the performance of the algorithms, each subset being characterized by:
• dominant colors
• few colors
• many colors
• images containing people
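For orientation only, a minimal Python sketch of how such a set of denoisers could be invoked is given below. It is not the code used in this research; parameter values are illustrative and scikit-image's TV-Chambolle filter is used here as a stand-in for OpenCV's TV-L1 implementation.

import cv2
import numpy as np
from scipy.ndimage import median_filter
from skimage.restoration import denoise_wavelet, denoise_bilateral, denoise_tv_chambolle

def run_denoisers(img_u8):
    # img_u8: 2D uint8 grayscale image; returns one denoised image per method
    img_f = img_u8.astype(np.float64) / 255.0
    return {
        "nlm": cv2.fastNlMeansDenoising(img_u8, None, h=10),
        "tv": denoise_tv_chambolle(img_f, weight=0.1),       # stand-in for TV-L1
        "wavelet": denoise_wavelet(img_f),
        "bilateral": denoise_bilateral(img_f),
        "median": median_filter(img_u8, size=3),
    }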

THE PREPROCESSING STAGE

Two types of noise (salt-and-pepper and Gaussian) were added to each image. An algorithm that underperforms on a given image will be assigned a lower weight, or even be left out of the average.
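The two noise types can be synthesized roughly as in the following sketch (illustrative parameters, not the exact preprocessing code used here).

import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # zero-mean Gaussian noise with standard deviation sigma, on a uint8 image
    noisy = img.astype(np.float64) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.02):
    # set a fraction `amount` of the pixels to the extreme values 0 or 255
    noisy = img.copy()
    mask = np.random.rand(*img.shape[:2])
    noisy[mask < amount / 2] = 0
    noisy[(mask >= amount / 2) & (mask < amount)] = 255
    return noisy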

Edge detection

Edge detection is a method used for identifying those points in an image at which the brightness changes sharply. These points are grouped into a set of curves named edges. The OpenCV implementation of the Canny edge detection algorithm, developed by John F. Canny, was used. The first step of this approach is noise reduction, since edge detection is susceptible to noise; to reduce this susceptibility, the noise is removed by applying a 5x5 Gaussian filter. Then the algorithm computes the intensity gradient of the image: a Sobel kernel is applied in the horizontal and vertical directions to obtain the first derivatives in the horizontal (Gx) and vertical (Gy) directions. After obtaining the gradient magnitude and direction, the program scans the whole image in order to remove any unwanted pixels which may not be part of an edge, checking for each pixel whether it is a local maximum in its neighborhood in the direction of the gradient.
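In OpenCV, the steps described above reduce to a blur followed by a Canny call, roughly as sketched below (the threshold values are illustrative, not the ones used in the experiments).

import cv2

def edge_map(noisy_gray, low=50, high=150):
    # 5x5 Gaussian blur to reduce noise, then Canny with hysteresis thresholds (low, high)
    blurred = cv2.GaussianBlur(noisy_gray, (5, 5), 0)
    return cv2.Canny(blurred, low, high)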


Figure 8. Original noisy image, obtained edges and obtained mask

The next step is called hysteresis thresholding and it decides which edges are kept. For this, two threshold values are needed, min and max. Any edge having an intensity gradient greater than max is definitely an edge; those smaller than min are definitely not edges, so they can be discarded. Those that remain obey the following rule: if they are connected to a "real" edge, they are part of the edge; otherwise, they are discarded. Based on the edges obtained by applying this method, a mask was created that the program uses to apply the denoising algorithms to certain areas of the image.

RESULTS AND CONCLUSIONS

Figure 9. The image with salt-and-pepper noise
Figure 10. Image denoised with the total variation algorithm
Figure 11. Image denoised with the Non-Local Means algorithm
Figure 12. Denoised with the total variation and Non-Local Means algorithms
Figure 13. Denoised with the bilateral filtering algorithm
Figure 14. Denoised with the wavelet thresholding algorithm

To assess the results of applying the algorithm, the RMSE was used to compare the original image (before adding the noise) with the resulting image (the one obtained after applying the method to the noisy image).
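The RMSE used for this comparison is the square root of the mean squared pixel difference, as in the following sketch (not the exact evaluation code used here).

import numpy as np

def rmse(original, result):
    # root-mean-square error between the clean image and the processed result
    diff = original.astype(np.float64) - result.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))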

Figure 15. Processing results interpretation (Ox: Number of windows; Oy: RMSE)

The standard deviation is computed between the original image and the result image. A higher number of windows leads to better denoising; the drawback is the computation time, which increases exponentially with the number of windows. The two denoising algorithms used are Non-Local Means and TV-L1 (total variation denoising). Several voting-based processing methods have been tried over time [13][14][15][16]. The presented approach, despite the fact that it does not make use of a real voting system in order to select the final result, relies on the same mechanisms found in voting-based applications: the use of totally different sub-optimal approaches to solve a specific problem and, in the end, the intelligent merging of every output into the final result.


ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] K. Sivaramakrishnan and T. Weissman, “Universal denoising of discrete-time continuous-amplitude signals”, in Proceedings of the IEEE International Symposium on Information Theory.
[2] A. Buades, B. Coll and J.-M. Morel, “Image Denoising Algorithms, With A New One”.
[3] Antoni Buades, Bartomeu Coll, Jean-Michel Morel, “On image denoising methods”.
[4] A. Buades, B. Coll, and J. Morel, “A non-local algorithm for image denoising”.
[5] Kamakshi Sivaramakrishnan, Tsachy Weissman, “A Context Quantization Approach to Universal Denoising”.
[6] G. Motta, E. Ordentlich, I. Ramirez, G. Seroussi, and M. J. Weinberger, “The DUDE framework for continuous tone image denoising”.
[7] Nima Khademi Kalantari, Pradeep Sen, “Removing the Noise in Monte Carlo Rendering with General Image Denoising Algorithms”.
[8] Hyuntaek Oh, “Bayesian ensemble learning for image denoising”.
[9] Rajeev Ratan, “Mastering Computer Vision with OpenCV in Python”, Udemy, Inc., Web, November 2017.
[10] “Digital Camera Image Noise: Concept and Types”, available at: www.cambridgeincolour.com/tutorials/image-noise.htm, accessed on: 1 March 2018.
[11] Francisco Estrada, David Fleet, Allan Jepson, “Stochastic image denoising”.
[12] Daniel Glasner, Shai Bagon, Michal Irani, “Super-Resolution from a Single Image”, ICCV, 2009.
[13] Costin-Anton Boiangiu, Radu Ioanitescu, Razvan-Costin Dragomir, “Voting-Based OCR System”, The Proceedings of Journal ISOM, Vol. 10 No. 2 / December 2016 (Journal of Information Systems, Operations Management), pp. 470-486, ISSN 1843-4711.
[14] Costin-Anton Boiangiu, Mihai Simion, Vlad Lionte, Mihai Zaharescu, “Voting-Based Image Binarization”, The Proceedings of Journal ISOM Vol. 8 No. 2 / December 2014 (Journal of Information Systems, Operations Management), pp. 343-351, ISSN 1843-4711.


[15] Costin-Anton Boiangiu, Paul Boglis, Georgiana Simion, Radu Ioanitescu, “Voting-Based Layout Analysis”, The Proceedings of Journal ISOM Vol. 8 No. 1 / June 2014 (Journal of Information Systems, Operations Management), pp. 39-47, ISSN 1843-4711.
[16] Costin-Anton Boiangiu, Radu Ioanitescu, “Voting-Based Image Segmentation”, The Proceedings of Journal ISOM Vol. 7 No. 2 / December 2013 (Journal of Information Systems, Operations Management), pp. 211-220, ISSN 1843-4711.


MAJORITY VOTING IMAGE BINARIZATION

Alexandru PRUNCU 1* Cezar GHIMBAS 2 Radu BOERU 3 Vlad NECULAE 4 Costin-Anton BOIANGIU 5

ABSTRACT

This paper presents a new binarization technique for text-based images. The proposed method combines several state-of-the-art binarization algorithms through a majority voting scheme and applies a post-processing step to improve the results: more specifically, the edge map of the grayscale image is used in combination with the image resulting from the voting process in order to ensure a more accurate determination of the image's characters. Compared individually to each algorithm used, the binarization result proves to be quite promising, surpassing every other algorithm for certain images containing machine-written characters.

KEYWORDS: image binarization, image processing, image segmentation, voting technique, edge map, Otsu, Riddler-Calvard, Niblack, Sauvola, Wolf

1. INTRODUCTION

Image binarization refers to the process of applying thresholding to an image in order to determine which of two different color levels (usually denoted as black and white) a certain pixel will be associated with. Such an algorithm could be used in the process of finding background and foreground pixels in an image, proving itself particularly useful when dealing with text documents, and is usually one of the first processing steps in any good Optical Character Recognition (OCR) software. Classical image binarization techniques are usually separated into local and global methods. Global thresholding techniques use a single value for the whole image and are therefore faster than local ones, but only perform well in cases where there is a good separation between background and foreground.

1* corresponding author, Engineer, “Politehnica” University of Bucharest, Bucharest, Romania [email protected] 2 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 3 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 4 Engineer, “Politehnica” University of Bucharest, Bucharest, Romania, [email protected] 5 Professor PhD Eng., “Politehnica” University of Bucharest, Bucharest, Romania, [email protected]


Local thresholding algorithms compute a value for every pixel, providing better results for images with uneven lighting, but performing worse in the case of a noisy image. On one hand, as every algorithm produces different results, combining them can provide a new binarization that exploits the best parts of every individual algorithm. On the other hand, this combination can also propagate the less attractive aspects of the various algorithms, so care should be taken when deploying such a technique. The paper is structured as follows: Section 2 presents the state of the art in terms of binarization and of combining several algorithms to obtain a better result, Section 3 details the proposed algorithm that uses majority voting and a post-processing step based on the edge map of the grayscale image, Section 4 presents the results that this method produced and Section 5 describes the presented approach's conclusions.

2. RELATED WORK

An interesting approach to binarization through voting can be seen in [1]. The algorithm starts by applying a 5x5 Wiener filter [2] to the grayscale image. Using the filtered image, an odd number of binarized images are determined using various algorithms and a majority vote is applied to obtain a new binarized image. The edge map of the filtered grayscale image is determined, preserving only the edge values that are above a certain threshold. A new edge map is determined by only keeping the connected components that overlap with the voted binarized image (in a 3x3 neighborhood). The resulting shapes in the edge map are then filled in using an extension to the Run-Length Smoothing Algorithm [3] (turning successive white pixels into successive black pixels). The foreground pixels from both the resulting image and the binarized image are cumulated and a final conditional dilation is applied. This method usually produces a better output than any of the input algorithms taken individually. Voting-based approaches have previously been shown to lead to promising results in other fields related to Image Processing and Computer Vision, such as OCR Systems [4], Layout Analysis [5] and Image Segmentation [6].

2.1. Local methods

Local image binarization methods compute a threshold for each “region” of the image by sliding a rectangular window over the input image. In Niblack's approach [7], the average and dispersion of the neighbors from the corresponding window are calculated for each pixel. This method can excellently recognize the foreground, but it is also less resistant to noise. For Niblack, the chosen threshold is computed using the following formula:

T = m + k · s

where m is the average, s the standard deviation and k a constant, usually chosen (empirically) as 0.2 to balance signal with noise. Sauvola [8] improves the previous method by using the dynamic range of the standard deviation, R:

T = m · (1 + k · (s / R - 1))

This works well on a light background texture and when the foreground and background pixels lie near the lower and, respectively, the upper end of the color range.


Performance declines quickly when these values are close to each other. Wolf's algorithm [9] addresses Sauvola's drawbacks by also using a globally computed minimum gray value, M:

T = m - k · (1 - s / R) · (m - M)

Using a globally calculated value can also represent a disadvantage, as it can be influenced by noisy “regions” of the image.
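A minimal sketch of a Niblack-style local threshold (T = m + k·s over a sliding window) is given below for illustration; it is not the implementation evaluated in this paper, and the window size is an assumption.

import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, window=25, k=0.2):
    # local mean m and standard deviation s over a sliding window, threshold T = m + k*s
    img = gray.astype(np.float64)
    m = uniform_filter(img, window)
    s = np.sqrt(np.maximum(uniform_filter(img ** 2, window) - m ** 2, 0.0))
    # pixels above the threshold become white (background), the rest black (foreground)
    return np.where(img > m + k * s, 255, 0).astype(np.uint8)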

2.2. Automatic thresholding

These methods start either from a random or from a specific value of the threshold and, as the algorithm executes, the value is recomputed using several techniques in order to obtain a better result. A bimodal distribution is a continuous probability function with two different local maximum values. Otsu's method [10] is a global thresholding approach which offers good results if the input image presents this type of distribution (a low minimum value between two peaks). The target image is divided into two classes (background and foreground) by choosing a value for the threshold. Then, the class mean and deviation are computed, as well as the normalized histogram. Each class has a weight associated to it, computed from the histogram's bins. Using the previous values, the method tries to minimize the intra-class variance and maximize the inter-class variance:

σb²(t) = ω0(t) · ω1(t) · [μ0(t) - μ1(t)]²

where μi(t) is the class mean and ωi(t) is the class probability. The threshold t is repeatedly recomputed until the desired result is achieved. Riddler-Calvard [11] starts from the presumption that an image is the sum of 2 distributions (background and foreground). If the 2 distributions are Gaussian (normal distributions) and their deviations are equal, then a threshold can be computed as the arithmetic mean of the distributions' expected values. This new value is considered the new threshold value and the 2 new distributions are computed. The steps are repeated until the threshold value no longer changes (i.e. the chosen minimum error is not exceeded).
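The Riddler-Calvard iteration described above can be sketched as follows (illustrative only; the starting value and convergence tolerance are assumptions).

import numpy as np

def riddler_calvard_threshold(gray, eps=0.5):
    # start from the global mean, then repeatedly set the threshold to the average of
    # the two class means until it stops changing by more than eps
    t = float(gray.mean())
    while True:
        fg, bg = gray[gray > t], gray[gray <= t]
        if fg.size == 0 or bg.size == 0:
            return t
        new_t = 0.5 * (fg.mean() + bg.mean())
        if abs(new_t - t) < eps:
            return new_t
        t = new_t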

3. PROPOSED METHOD

The proposed algorithms obtain quality binarizations for input images that respect certain properties, each offering the best results for its characteristic segment. The 3 algorithms are described below.

3.1. Majority voting (unweighted)

Unweighted Majority Voting [12] consists of using multiple binarization techniques in order to generate a number of resulting images and combining them into a single one, setting each pixel to the value that most of the resulting images agree on.


Each algorithm is executed and the results are compared per pixel as follows: if the majority of the methods consider the pixel as part of the foreground, then in the final result it will also be part of the foreground; the same logic applies to background pixels. An advantage of this method is that a large number of results close to the ground truth will always produce a final result close to the ground truth. Unfortunately, this also means that a false positive (or negative) majority will produce a wrong final image.
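Per-pixel unweighted voting can be sketched as below (assuming, for illustration only, binary images where 255 marks the foreground).

import numpy as np

def majority_vote(binarized_images):
    # binarized_images: list (preferably of odd length) of 2D arrays with values 0/255
    stack = np.stack([(img > 0).astype(np.uint8) for img in binarized_images])
    votes = stack.sum(axis=0)
    # a pixel is foreground in the result if more than half of the inputs mark it so
    return np.where(votes > len(binarized_images) / 2, 255, 0).astype(np.uint8)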

3.2. Majority voting (weighted)

Unlike the previous method, each algorithm's result is weighted. The value of the weight is either chosen randomly or based on the overall results after thresholding each image in the dataset. For example, if the Wolf method's results are the closest to the ground truth, then its vote will carry more weight. To avoid vote manipulation, the maximum weight will not exceed the value of Number_of_total_methods - 2. The main advantage of this method is the avoidance of false positives/negatives. The biggest problem is choosing the value of the weight: random values tend to produce inconsistent results, while a specific value is highly dependent on the algorithm's implementation and the image database.

3.3. Edge based post processing

After the majority vote is done, a post-processing operation is applied. This algorithm comprises the following steps:
• A new background-filled image is created, in order to be used as a destination.
• The edge map of the original image is determined using the Canny algorithm (after applying a Gaussian blur to ensure better results).
• Every resulting pixel in the edge map is inserted into the destination image.
• Starting from the newly inserted edge pixel, the following procedure is applied in order to find other pixels corresponding to the current shape: the edge map is traversed to the right as long as the image resulting from voting still contains foreground pixels, or until a new edge pixel is detected. During this scan, every corresponding pixel in the destination image is written as a foreground pixel.
• The previous step is also applied vertically, starting from the same edge pixel as mentioned before.

4. RESULTS

Our test database consists of images used at the Document Image Binarization Competition (DIBCO) in 2013 [13] and 2016 [14]. Based on the value of a pixel (white or black) in the ground truth image and in the resulting binarized image, pixels can be grouped as [15]:
• TP (true positive) - a pixel which is on in both the ground truth (GT) and the binarization result images


• FP (false positive) - a pixel which is on only in the resulting image
• FN (false negative) - a pixel which is on only in the GT image
Using this classification, the following metrics can be used to compute the binarization quality:

Recall = TP / (TP + FN), Precision = TP / (TP + FP), F-measure = 2 · Recall · Precision / (Recall + Precision)

The complete results can be found in Table 1, whilst Table 2 presents the best, worst and Edge results for several images.

Table 1. Results of the experiments. Vote is the result of a weighted majority voting process

Image / Type of degradation             Method             F-measure   PSNR    DRD
10 (DIBCO 2016) / Old, shades           Niblack            64.64       10.79    7.4
                                        Otsu               66.04       10.24    9.13
                                        Riddler-Calvard    68.24       11.1     7.15
                                        Sauvola            62.27       10.8     7.43
                                        Wolf               64.16        9.85    9.97
                                        Vote               66.04       10.1     8.81
                                        Edge               57.9         8.8    13.3
1 (DIBCO 2016) / Wet, stamp             Niblack            61.20       12.53   30
                                        Otsu               78          15.81   13.08
                                        Riddler-Calvard    78.17       15.8    13.13
                                        Sauvola            79          15.97   12.63
                                        Wolf               78.83       15.78   12.92
                                        Vote               76.16       15.56   14
                                        Edge               76.74       15.4    14.41
7 (DIBCO 2016) / Slim faded paper       Niblack            65          12.58    9
                                        Otsu               57.72       12.05   10.14
                                        Riddler-Calvard    35          10.82   13.71
                                        Sauvola            39.98       11      13
                                        Wolf               53.34       11.75   11
                                        Vote               57.7        12      10.14
                                        Edge               71.9        13       8.11
PR03 (DIBCO 2013) / Noisy paper         Niblack            69.58       13.88   15.73
                                        Otsu               78.11       15.8     8.9
                                        Riddler-Calvard    65.31       14.23   12.77
                                        Sauvola            69.13       14.62   11.63
                                        Wolf               79.6        16       8.52
                                        Vote               77.4        15.68    9.16
                                        Edge               82.27       16.29    8.42
HW05 (DIBCO 2013) / Back page text      Niblack            31.39       10.75   66
visible                                 Otsu               49.96       14      28.92
                                        Riddler-Calvard    67.25       17.36   12
                                        Sauvola            62.53       16.32   15.8
                                        Wolf               55.37       14.93   22.44
                                        Vote               50.7        14.29   27.35
                                        Edge               26.86        9.8    82.64


Table 2. Best, worst and Edge results for several images. The Best column refers to the highest F-measure result different from the proposed ones.

Image                  Best          Worst
10 (DIBCO 2016)        Riddler-C     Sauvola
1 (DIBCO 2016)         Sauvola       Niblack
7 (DIBCO 2016)         Niblack       Riddler-C
PR03 (DIBCO 2013)      Wolf          Riddler-C

5. CONCLUSION

Regarding the input algorithms, it is useful to note that each of them has specific advantages and drawbacks. Riddler-Calvard tends to thin the characters, while producing good results when the image presents noise or when back-page text is visible (an example being the image “HW05”). However, when the text is very thin, it can misinterpret it as being part of the background (as in image “7”, where a full row of text was removed).


Otsu follows the same pattern, except that the letters are thicker. Niblack's main advantage lies in the solid quality of the slim and low-contrast text, while its main disadvantage remains the excessive amount of noise introduced in the output. The aforementioned Riddler-Calvard and Otsu methods, taking a full global thresholding approach instead of a full local one, suffer from the opposite types of defects. By combining the properties of these algorithms through voting and further improving the output by filling in the gaps using edge detection, the resulting binarization images are further improved. The results are heavily dependent on the edge-detection algorithm, more precisely on the noise of the edge map. If the Canny edge detector result is very noisy, the post-processing step will choose a lot of irrelevant pixels and will try to fill in unimportant shapes. The presented binarization technique, consisting of a voting-based approach, usually leads to better results than each of its composing algorithms. Adding the post-processing step on top of the voting binarization further enhances the results in the majority of tests, with the exception of pictures containing old shades or where the text bleeds through from the back page. When the images to be binarized do not present these types of degradation, the Edge method should be considered, as it provides better results than the Vote one; when it is not certain whether the images contain these faults or not, the Vote method should be considered. The problem with the latter category of degraded images lies not in the binarization method that is used, but in the way the images are interpreted. One cannot be certain whether a line of text is part of the analyzed page or part of the back page. The only clue that can help one make an educated guess is the fact that the text is written from left to right; were the image to be reversed, the main-page text would be considered background, and the bleed-through text would become the foreground. Such exceptions should not be considered when discussing a binarization technique.

ACKNOWLEDGEMENT

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0689 / „Lib2Life- Revitalizarea bibliotecilor si a patrimoniului cultural prin tehnologii avansate” / "Revitalizing Libraries and Cultural Heritage through Advanced Technologies", within PNCDI III.

REFERENCES

[1] B. Gatos, I. Pratikakis, and S. Perantonis, “Improved document image binarization by using a combination of multiple binarization techniques and adapted edge information”, International Conference on Pattern Recognition, pp. 1-4, 2008.
[2] N. Wiener, “The interpolation, extrapolation and smoothing of stationary time series”, Report of the Services 19, Research Project DIC-6037, MIT, February 1942.


[3] F.M. Wahl, K.Y. Wong, R.G. Casey, “Block Segmentation and Text Extraction in Mixed Text/Image Documents”, Computer Graphics and Image Processing, pp. 375-390, 1982.
[4] Costin-Anton Boiangiu, Radu Ioanitescu, Razvan-Costin Dragomir, “Voting-Based OCR System”, The Proceedings of Journal ISOM, Vol. 10 No. 2 / December 2016 (Journal of Information Systems, Operations Management), pp. 470-486, ISSN 1843-4711.
[5] Costin-Anton Boiangiu, Paul Boglis, Georgiana Simion, Radu Ioanitescu, “Voting-Based Layout Analysis”, The Proceedings of Journal ISOM Vol. 8 No. 1 / June 2014 (Journal of Information Systems, Operations Management), pp. 39-47, ISSN 1843-4711.
[6] Costin-Anton Boiangiu, Radu Ioanitescu, “Voting-Based Image Segmentation”, The Proceedings of Journal ISOM Vol. 7 No. 2 / December 2013 (Journal of Information Systems, Operations Management), pp. 211-220, ISSN 1843-4711.
[7] W. Niblack, “An Introduction to Digital Image Processing”, Prentice Hall, Englewood Cliffs, 1986.
[8] J. Sauvola, T. Seppanen, S. Haapakoski, M. Pietikainen, “Adaptive Document Binarization”, 4th Int. Conf. on Document Analysis and Recognition, Ulm, Germany, pp. 147-152, 1997.
[9] C. Wolf, J.-M. Jolion, “Extraction and Recognition of Artificial Text in Multimedia Documents”, Pattern Analysis and Applications, 6(4):309-326, 2003.
[10] N. Otsu, “A threshold selection method from grey level histogram”, IEEE Trans. Syst. Man Cybern., vol. 9, no. 1, pp. 62-66, 1979.
[11] T. Ridler and S. Calvard, “Picture Thresholding Using an Iterative Selection Method”, IEEE Transactions on Systems, Man and Cybernetics, vol. 8, no. 8, August 1978.
[12] Nabendu Chaki, Soharab Hossain Shaikh, Khalid Saeed, “Exploring Image Binarization Techniques”, Springer, 2014.
[13] I. Pratikakis, B. Gatos and K. Ntirogiannis, “ICDAR 2013 Document Image Binarization Contest (DIBCO 2013)”, 12th International Conference on Document Analysis and Recognition (ICDAR 2013), pp. 1471-1476, Washington, DC, USA, 2013.
[14] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, “ICFHR 2016 Handwritten Document Image Binarization Contest (H-DIBCO 2016)”, in Frontiers in Handwriting Recognition, International Conference on, IEEE, 2016, pp. 619-623.
[15] B. Gatos, K. Ntirogiannis and I. Pratikakis, “ICDAR 2009 Document Image Binarization Contest (DIBCO 2009)”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), Jul. 26-29, 2009, Barcelona, Spain, pp. 1375-1382.


DEVELOPING VIRTUAL REALITY AND AUGMENTED REALITY PROJECTS WITH UNITY3D

Laura SAVU 1*

ABSTRACT

In the current article we present how we can develop games with Unity for Augmented Reality and Virtual Reality. The devices that we have been working with are a HoloLens and the Acer Immersive Headset. Unity is currently the most used game development platform. The scripts are developed in C# programming language. This article shows step-by-step how to create a project in Unity, configure the settings for building a Universal Windows Platform Application that will run on HoloLens, run the project on the device directly from Unity, generate the UWP build and open it in Visual Studio, install the App on the device by running the project on HoloLens using direct device option or remote machine.

KEYWORDS: Augmented Reality, Virtual Reality, Mixed Reality, HoloLens, Unity, Visual Studio, Physics, 3D, Holograms.

INTRODUCTION

This paper presents how to develop for Augmented Reality. The device that was used is the HoloLens, produced by Microsoft. It is a device running Windows 10 and is itself a PC, since there is no need to plug it into a computer, as we do with the usual virtual reality headsets. Regarding products and tools, we created the projects in Unity and generated the build to be opened in Visual Studio, Microsoft's Integrated Development Environment. This article walks the reader step by step through all the phases needed to create, test, deploy and run a Unity project on a HoloLens device.

1. Create a Unity Project

(Requirements: Visual Studio 2017, Unity3D, Windows 10; devices: HoloLens and/or a Mixed Reality headset). In the next section we will use the Unity game development platform to create our project that will run on HoloLens. Unity is a cross-platform game engine developed by Unity Technologies, first announced and released in June 2005 at Apple Inc.'s Worldwide Developers Conference as an OS X-exclusive game engine. As of 2018, the engine has been extended to support 27 platforms. The engine can be used to create both three-dimensional and two-dimensional games, as well as simulations, for its many platforms.

1* corresponding author, PhD, Microsoft Bucharest, [email protected]


Several major versions of Unity have been released since its launch, with the latest stable version being Unity 2018.2.18, released on November 30, 2018. We implement Gaze and Air Tap to act on holograms. Open Unity and create a new 3D Project.

Now let’s download Mixed Reality Toolkit. The Mixed Reality Toolkit is a collection of scripts and components intended to accelerate development of applications targeting Microsoft HoloLens and Windows Mixed Reality headsets. Download MRTK: https://github.com/Microsoft/MixedRealityToolkit-Unity/releases

Drag and drop HoloToolkit in the Project Panel. You can see all the resources in the package. Click on Import button.


You have the HoloToolkit in your Project:

Delete Main Camera from your Scene. Drag and drop the HoloLens Camera from HoloToolkit\Input\Prefabs. Drag and drop the InputManager prefabs as well. Also drag and drop the Cursor prefab.


2. Enable Virtual Reality Support

Click on Button “Add Open Scenes”

Select the Platform to UWP


Click on Button Switch Platform. The Unity icon will be on the selected platform.

Click on the Player Settings button and you will see in the Inspector the settings for the Player. Click on the Windows Store green icon and check the option Virtual Reality Support. If there is nothing selected, click on the little plus sign and add Windows Holographic.


Close Build Settings. Under Hierarchy, click on Create and choose 3D Object => Cube.

Select the Cube GO in Hierarchy. You will see in Inspector all the properties of the Cube. Rename it to Floor and set the following values for the Transform properties:


Add one more 3D GO, Cube and set the following values for Transform properties.

Create a material for the cube. In the Project panel, click Create => Folder and name it “Materials”, for example. Right-click on the folder and choose Create => Material. Name it cubMat.


Select cubMat and, in the Inspector, click on the color picker control and choose Red.

Select the Cube GO in the Hierarchy, look at the Inspector and, in the Mesh Renderer section, expand Materials.

Drag & Drop the cubMat material from Project panel, on Element 0 field in Inspector.


In Project panel, create a new Folder, “Scripts”.

Create a new C# script into the Scripts folder and name it “DropCube”.

Double-click on the file to open it in VS. Replace the code of your script with the following one:

using UnityEngine;
using HoloToolkit.Unity.InputModule;

public class DropCube : MonoBehaviour, IInputClickHandler
{
    // Called by GazeGestureManager when the user performs a Select gesture
    public void OnInputClicked(InputClickedEventData eventData)
    {
        if (!this.GetComponent<Rigidbody>())
        {
            var rigidbody = this.gameObject.AddComponent<Rigidbody>();
            rigidbody.collisionDetectionMode = CollisionDetectionMode.Continuous;
        }
    }
}

Save the file and come back to Unity. In the Hierarchy, select the Cube GO and drag and drop the DropCube script on it.

In Menu, go to Windows => Holographic Emulation.


Choose Remote to Device as the Emulation Mode, provide the IP address of the HoloLens and click Connect.

On the HoloLens, you need to have the Holographic Remoting app open. In Unity, click Play to run the project on the HoloLens. Tap on the red cube: the Rigidbody component will be added to it and it will fall onto the plane.

Go back to the script in Visual Studio and add a public property named createPrefab, plus the code to create a new cube every time you tap on the cube.

public GameObject createPrefab;

GameObject createdCubs = Instantiate(createPrefab, this.gameObject.transform.position,
    this.gameObject.transform.rotation) as GameObject;

Your .cs file will look like this:

using UnityEngine;
using HoloToolkit.Unity.InputModule;

public class DropCube : MonoBehaviour, IInputClickHandler
{
    // Called by GazeGestureManager when the user performs a Select gesture
    public GameObject createPrefab;

    public void OnInputClicked(InputClickedEventData eventData)
    {
        if (!this.GetComponent<Rigidbody>())
        {
            var rigidbody = this.gameObject.AddComponent<Rigidbody>();
            rigidbody.collisionDetectionMode = CollisionDetectionMode.Continuous;
        }
        GameObject createdCubs = Instantiate(createPrefab, this.gameObject.transform.position,
            this.gameObject.transform.rotation) as GameObject;
    }
}

3. Run the project on HoloLens

Let's add some color to the cubes. Add a public array of colors and assign each newly created cube a random color from the defined ones.

public Color[] cubColorArray;

Your .cs file will look like this:

using UnityEngine;
using HoloToolkit.Unity.InputModule;

public class DropCube : MonoBehaviour, IInputClickHandler
{
    // Called by GazeGestureManager when the user performs a Select gesture
    public GameObject createPrefab;
    public Color[] cubColorArray;

    public void OnInputClicked(InputClickedEventData eventData)
    {
        if (!this.GetComponent<Rigidbody>())
        {
            var rigidbody = this.gameObject.AddComponent<Rigidbody>();
            rigidbody.collisionDetectionMode = CollisionDetectionMode.Continuous;
        }
        GameObject createdCubs = Instantiate(createPrefab, this.gameObject.transform.position,
            this.gameObject.transform.rotation) as GameObject;
        int randomInt = Random.Range(0, cubColorArray.Length);
        createdCubs.GetComponent<Renderer>().material.color = cubColorArray[randomInt];
    }
}

Save the file and come back to Unity. Select the Cube and see in the Inspector the public properties declared in the script.

Click on the little circle which is on the right of the Create prefab field.

Select the Cube prefab.


Set the size of the array to 4, for example, and provide some colors

Test the project on the HoloLens:


4. The project was successfully deployed and run in a Virtual Reality environment

Because the project is a Universal Windows Platform application, it can be deployed on any device running Windows 10, so it can be installed on a PC as a Virtual Reality application and run using an immersive headset from one of Microsoft's partners: Acer, HP, Asus, Dell, Lenovo, Samsung. For our tests, we used an Acer immersive headset device; we had the VR headset and the motion controllers.

We used the same package that was generated for HoloLens, to run it in Mixed Reality Portal. This is the result:

CONCLUSIONS

This article demonstrates how to create an application for Augmented Reality, for the HoloLens device, using Unity. It goes through all the steps needed to create the project, test it, generate the build for the specific platform and install it on the device.


BIBLIOGRAPHY

[1] Mixed Reality Toolkit: https://github.com/Microsoft/MixedRealityToolkit-Unity
[2] Design Labs: https://github.com/Microsoft/MixedRealityDesignLabs_Unity
[3] Microsoft Academy: https://docs.microsoft.com/en-us/windows/mixed-reality/academy
[4] Unity tutorials: https://unity3d.com/learn/tutorials


JOURNAL of Information Systems & Operations Management

ISSN: 1843-4711
Romanian American University
No. 1B, Expozitiei Avenue, Bucharest, Sector 1, ROMANIA
http://JISOM.RAU.RO
[email protected]

