GOCE DELCEV UNIVERSITY - STIP, REPUBLIC OF MACEDONIA FACULTY OF COMPUTER SCIENCE

ISSN 2545-479X print ISSN 2545-4803 on line

ISSN 2545-479X print ISSN 2545-4803 on line

BALKAN JOURNAL OF APPLIED MATHEMATICS AND INFORMATICS

(BJAMI) GOCE DELCEV UNIVERSITY - STIP, REPUBLIC OF MACEDONIA FACULTY OF COMPUTER SCIENCE

ISSN 2545-479X print ISSN 2545-4803 on line

BALKAN JOURNAL OF APPLIED MATHEMATICS AND INFORMATICS

(BJAMI) AIMS AND SCOPE: BJAMI publishes original research articles in the areas of applied mathematics and informatics.

Topics: 1. Computer science; 2. Computer and software engineering; 3. Information technology; 4. Computer security; 5. Electrical engineering; 6. Telecommunication; 7. Mathematics and its applications; 8. Articles of interdisciplinary of computer and information sciences with education, economics, environmental, health, and engineering.

Managing editor Biljana Zlatanovska Ph.D.

Editor in chief Zoran Zdravev Ph.D.

Technical editor Slave Dimitrov

Address of the editorial office Goce Delcev University – Stip Faculty of philology Krste Misirkov 10-A PO box 201, 2000 Štip, R. of Macedonia

BALKAN JOURNAL OF APPLIED MATHEMATICS AND INFORMATICS (BJAMI), Vol 1

ISSN 2545-479X print ISSN 2545-4803 on line Vol. 1, No. 1, Year 2018 EDITORIAL BOARD

Adelina Plamenova Aleksieva-Petrova, Technical University – Sofia, Faculty of Computer Systems and Control, Sofia, Bulgaria Lyudmila Stoyanova, Technical University - Sofia , Faculty of computer systems and control, Department – Programming and computer technologies, Bulgaria Zlatko Georgiev Varbanov, Department of Mathematics and Informatics, Veliko Tarnovo University, Bulgaria Snezana Scepanovic, Faculty for Information Technology, University “Mediterranean”, Podgorica, Montenegro Daniela Veleva Minkovska, Faculty of Computer Systems and Technologies, Technical University, Sofia, Bulgaria Stefka Hristova Bouyuklieva, Department of Algebra and Geometry, Faculty of Mathematics and Informatics, Veliko Tarnovo University, Bulgaria Vesselin Velichkov, University of Luxembourg, Faculty of Sciences, Technology and Communication (FSTC), Luxembourg Isabel Maria Baltazar Simões de Carvalho, Instituto Superior Técnico, Technical University of Lisbon, Portugal Predrag S. Stanimirović, University of Niš, Faculty of Sciences and Mathematics, Department of Mathematics and Informatics, Niš, Serbia Shcherbacov Victor, Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Moldova Pedro Ricardo Morais Inácio, Department of Computer Science, Universidade da Beira Interior, Portugal Sanja Panovska, GFZ German Research Centre for Geosciences, Germany Georgi Tuparov, Technical University of Sofia Bulgaria Dijana Karuovic, Tehnical Faculty “Mihajlo Pupin”, Zrenjanin, Serbia Ivanka Georgieva, South-West University, Blagoevgrad, Bulgaria Georgi Stojanov, Computer Science, Mathematics, and Environmental Science Department The American University of Paris, France Iliya Guerguiev Bouyukliev, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Bulgaria Riste Škrekovski, FAMNIT, University of Primorska, Koper, Slovenia Stela Zhelezova, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Bulgaria Katerina Taskova, and Data Mining Group, Faculty of Biology, Johannes Gutenberg-Universität Mainz (JGU), Mainz, Germany. Dragana Glušac, Tehnical Faculty “Mihajlo Pupin”, Zrenjanin, Serbia Cveta Martinovska-Bande, Faculty of Computer Science, UGD, Macedonia Blagoj Delipetrov, Faculty of Computer Science, UGD, Macedonia Zoran Zdravev, Faculty of Computer Science, UGD, Macedonia Aleksandra Mileva, Faculty of Computer Science, UGD, Macedonia Igor Stojanovik, Faculty of Computer Science, UGD, Macedonia Saso Koceski, Faculty of Computer Science, UGD, Macedonia Natasa Koceska, Faculty of Computer Science, UGD, Macedonia Aleksandar Krstev, Faculty of Computer Science, UGD, Macedonia Biljana Zlatanovska, Faculty of Computer Science, UGD, Macedonia Natasa Stojkovik, Faculty of Computer Science, UGD, Macedonia Done Stojanov, Faculty of Computer Science, UGD, Macedonia Limonka Koceva Lazarova, Faculty of Computer Science, UGD, Macedonia Tatjana Atanasova Pacemska, Faculty of Electrical Engineering, UGD, Macedonia 4 C O N T E N T

Aleksandar, Velinov, Vlado, Gicev PRACTICAL APPLICATION OF SIMPLEX METHOD FOR SOLVING LINEAR PROGRAMMING PROBLEMS ...... 7

Biserka Petrovska , Igor Stojanovic , Tatjana Atanasova Pachemska CLASSIFICATION OF SMALL DATA SETS OF IMAGES WITH TRANSFER LEARNING IN CONVOLUTIONAL NEURAL NETWORKS ...... 17

Done Stojanov WEB SERVICE BASED GENOMIC DATA RETRIEVAL...... 25

Aleksandra Mileva, Vesna Dimitrova SOME GENERALIZATIONS OF RECURSIVE DERIVATES OF k-ary OPERATIONS...... 31

Diana Kirilova Nedelcheva SOME FIXED POINT RESULTS FOR CONTRACTION SET - VALUED MAPPINGS IN CONE METRIC SPACES...... 39

Aleksandar Krstev, Dejan Krstev, Boris Krstev, Sladzana Velinovska DATA ANALYSIS AND STRUCTURAL EQUATION MODELLING FOR DIRECT FOREIGN INVESTMENT FROM LOCAL POPULATION...... 49

Maja Srebrenova Miteva, Limonka Koceva Lazarova NOTION FOR CONNECTEDNESS AND PATH CONNECTEDNESS IN SOME TYPE OF TOPOLOGICAL SPACES...... 55

The Appendix

Aleksandra Stojanova , Mirjana Kocaleva , Natasha Stojkovikj , Dusan Bikov , Marija Ljubenovska , Savetka Zdravevska , Biljana Zlatanovska , Marija Miteva , Limonka Koceva Lazarova OPTIMIZATION MODELS FOR SHEDULING IN KINDERGARTEN AND HEALTHCARE CENTES...... 65

Maja Kukuseva Paneva, Biljana Citkuseva Dimitrovska, Jasmina Veta Buralieva, Elena Karamazova, Tatjana Atanasova Pacemska PROPOSED QUEUING MODEL M/M/3 WITH INFINITE WAITING LINE IN A SUPERMARKET...... 73

Maja Mijajlovikj1, Sara Srebrenkoska, Marija Chekerovska, Svetlana Risteska, Vineta Srebrenkoska APPLICATION OF TAGUCHI METHOD IN PRODUCTION OF SAMPLES PREDICTING PROPERTIES OF POLYMER COMPOSITES ...... 79

Sara Srebrenkoska, Silvana Zhezhova, Sanja Risteski, Marija Chekerovska Vineta Srebrenkoska Svetlana Risteska APPLICATION OF FACTORIAL EXPERIMENTAL DESIGN IN PREDICTING PROPERTIES OF POLYMER COMPOSITES...... 85

Igor Dimovski, Ice Gjumandeloski, Filip Kochoski, Mahendra Paipuri, Milena Veneva , Aleksandra Risteska COMPUTER AIDED (FILAMENT WINDING) TAPE PLACEMENT FOR ELBOWS. PRACTICALLY ORIENTATED ALGORITHM...... 89

5 PRACTICAL APPLICATION OF SIMPLEX METHOD FOR SOLVING LINEAR PROGRAMMING PROBLEMS

Aleksandar, Velinov1, Vlado, Gicev 1

1Faculty of Computer Science, Goce Delcev University, Stip, Macedonia [email protected] [email protected]

Abstract: In this paper we consider application of linear programming in solving optimization problems with constraints. We used the simplex method for finding a maximum of an objective function. This method is applied to a real example. We used the “linprog” function in MatLab for problem solving. We have shown, how to apply simplex method on a real world problem, and to solve it using linear programming. Finally we investigate the complexity of the method via variation of the computer time versus the number of control variables. Keywords: simplex method, linear programming, objective function, complexity.

1. Introduction Linear programming was developed during World War II, when a system with which to maximize the efficiency of resources was of utmost importance. New war-related projects demanded optimization of constrained resources. “Programming” was used as a military term that referred to activities such as planning schedules efficiently or deploying men optimally [1].

Mathematical programming is that branch of mathematics dealing with techniques for maximizing or minimizing an objective function subject to linear, nonlinear, and integer constraints on the variables. Special case of mathematical programming is a linear programming. Linear programming is concerned with the maximization or minimization of a linear objective function with many variables subject to linear equality and inequality constraints [2]. Linear programming can be viewed as a part of a great revolutionary development. It has the ability to define general goals and to find detailed decisions in order to achieve that goals. It can be faced with practical situations of great complexity. To formulate real-world problems, linear programming uses mathematical terms (models), techniques for solving the models (algorithms), and engines for executing the steps of algorithms (computers and software) [3].

Optimization principles have important aspect in modern engineering design and system operations in various areas. Computers capable of solving large-scale problems contribute to the recent development of new optimization techniques. The main goal of these techniques is to optimize (maximize or minimize) some function f. This functions are called objective functions. As a case study we used the objective function f that represent the revenue of the production of electronic elements, more precisely graphics cards. We used methods for maximizing the revenue of the company. Using linear programming, we can model wide variety of objective functions as: yield per minute in a chemical process, revenue in a production of cars, the hourly number of customers served in a bank, the mileage per gallon of a certain type of car, the production of computers on monthly basis and so on. Sometimes we may want to minimize f if f is the cost per unit of producing certain graphics cards (opposite of our example where we maximize the revenue of production), the operating cost of some power plant, the time needed to produce a new type of car, the daily loss of heat in a heating system, the costs for IT infrastructure in some company and so on.

In most optimization problems the objective function f depends on several variables:

x1, x2,…..xn

These variables are called “control variables” because we can control them, that is, we can choose their values. For example the production of some plant may depend on temperature x1, moisture content x2, nitrogen in the soil x3. The efficiency of a certain air-conditioning system may depend on air pressure x1, temperature x2, cross-sectional area of outlet x3, moisture content x4, and so on. The optimization theory develops methods for optimal choices of x1,…,xn, UDK: 004.777:004.657]:575.22

[9] Zhou JT, Pan S, Tsang IW, Yan Y, “Hybrid Heterogeneous Transfer Learning Through Deep Learning”, Proceedings of WEB SERVICE BASED GENOMIC DATA RETRIEVAL the national conference on artificial intelligence, vol. 3. 2014. p. 2213–20. [10] Pan SJ, Yang Q, “A Survey on Transfer Learning” IEEE Trans Knowl Data Eng 2010; 22(10):1345–59. [11] Bergstra, J. and Bengio, Y. (2012), “Random Search for Hyper-Parameter Optimization”, J. Machine Learning Res., 13, Done Stojanov 281–305. [12] Courville, A., Bergstra, J., and Bengio, Y. (2011), “Unsupervised Models of Images by Spike-And-Slab RBMs”, In ICML’2011 Faculty of computer science, Goce Delcev University, Stip, Macedonia [13] Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, 2009 [email protected] [14] www.analyticsvidhya.com,https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre- trained-model/ Abstract: An application of web service for genomic data retrieval is considered. Records for nucleotide [15] www.innoarchitech.com, https://www.innoarchitech.com/artificial-intelligence-deep-learning-neural-networks-explained/ sequences are retrieved from the European Nucleotide Archive and preprocessed locally in order to be able [16] Bengio, Y. (2012), “Practical Recommendations for Gradient-Based Training of Deep Architectures”, arXiv: 1206.5533V2 to apply local host based analysis. This analysis is required because many of the built-in EMBL-EBI services [cs.LG] 16 Sep 2012 [17] Zeiler, M. D., and Fergus, R. “Visualizing and understanding convolutional networks”, CoRR, abs/1311.2901, 2013. pose restrictions upon the metrics with which will operate the user, what in some cases may also affect the Published in Proc. ECCV, 2014 structure of the solution, such as when performing pairwise or multiple DNA or mRNA comparison. [18] Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. OverFeat: “Integrated Recognition, Localization and Detection using Convolutional Networks”. In Proc. ICLR, 2014. Keywords: web, service, data, retrieval, ENA [19] Simonyan, K. and Zisserman, A. “Two-stream convolutional networks for action recognition in videos”. CoRR, abs/1406.2199, 2014. Published in Proc. NIPS, 2014 [20] Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., Le, Q. V., and Ng, A. Y. “Large scale distributed deep networks”. In NIPS, pp. 1232–1240, 2012 1. Introduction [21] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., In recent years, a huge volume of genomic data has been accessed. This was possible due to the advances in the Berg, A. C., and Fei-Fei, L. “ImageNet large scale visual recognition challenge”. CoRR, abs/1409.0575, 2014. DNA techniques, such as shotgun sequencing and bridge PCR technique. The value of the collected [22] Perronnin, F., Sa´nchez, J., and Mensink, T. “Improving the Fisher kernel for large-scale image classification”. In Proc. DNA information increases by sharing in the scientific community. This idea drove the construction of the central ECCV, 2010. repository of nucleotide data: ENA or The European Nucleotide Archive [1].

ENA or The European Nucleotide Archive (web access: https://www.ebi.ac.uk/ena) [1] is a public database hosted under the European Institute (EMBL-EBI) [2] that contains records for nucleotide sequences. ENA allows upload/access/download of nucleotide data derived from different sources (organisms) applying different sequencing techniques.

ENA evolved from the EMBL Data Library which was officially released in 1982 [3] and it contained 568 records with total size of 500,000 base pairs [4]. Since then the volume of the data in this repository exponentially grows [5].

Data stored in this repository is functionally annotated. This means that in spite of the exact order of nucleotides in the sequence, data linkage details are also available. For instance, when accessing a whole genome, each CDS (coding sequence) is differentiated from other coding frames, as well as from non-coding frames. By knowing the start and the end of each CDS, the structure of the mRNA being transcribed and the protein being translated can be exactly determined.

Regardless record’s functional association, there is metadata common for all. This means that regardless the record is associated to a whole genome, chromosome or partial CDS (partial coding region or partial mRNA) there are descriptors which are common for all records.

For each record there is information that describes: the source (organism) of nucleotide sequence, type and topology of the molecule, taxonomic division or taxonomic rank (ex. HUM (Human), PRO (Prokaryote)…etc.), length of the sequence or the number of base pairs, sequence version, date when the sequence was made public and the date of last update of the sequence. Keywords that describe the sequence and secondary accession information are also available.

However, the most important feature of each record in ENA is that there is no restriction in the accession of exact structure of nucleotide sequence. This information can be accessed in three data formats: TEXT, FASTA and XML.

The European Bioinformatics Institute (EMBL-EBI) also hosts web implementations of the most popular algorithms for genomic data analysis, such as: EMBOSS Needle – an implementation of the Needleman-Wunsch algorithm [6], EMBOSS Water – an implementation of the Smith-Waterman algorithm [7], EMBOSS Matcher – an

25 Done Stojanov

implementation of the LALIGN algorithm [8], EMBOSS backtranambig – reverses protein into nucleotide complete genome or chromosome or for human, virus, bacteria…samples. The search against specific pattern sequence…etc. Most of these services work with data form the European Nucleotide Archive under some identifies all ENA sequences that contain the referred pattern. programming constrains. For instance when comparing two nucleotide sequences using EMBOSS Water, only 8 values for gap opening are available: {1,5,10,15,20,25,50,100} and there is no option to chose value out of the list. Once the record has been accessed, data views in: TXT, FASTA and XML are available, Figure 2. FASTA is the But since the structure of the solution depends of the metrics being chosen, especially in terms of accuracy and most suitable data format for easy separation of metadata and genomic data. Given FASTA view for a specific comprehensiveness, there is a real need to develop a program which would be able to retrieve and filter ENA data record, only the first line contains metadata regarding the name of the sequence and ID, while the rest is the locally in order to apply algorithms that overcome the limitations posed by EMBL-EBI web service structure of the nucleotide sequence, Figure 3. implementations.

In this paper a web service based application that retrieves and filters data from the European Nucleotide Archive is considered.

2. Materials and methods The user interface of the European Nucleotide Archive provides search services for nucleotide sequences based Figure 2. Options to view record on: sequence ID, descriptor and it also provides search against a specific pattern, Figure 1.

Figure 3. Record view in FASTA preview

In order to retrieve genomic data locally, a web client application in C# was developed (we work with object derived from the WebClient() class), Figure 4. This application exploits FASTA data format.

Figure 1. ENA user interface

Each sequence in ENA has a unique ID which is a combination of letters and numbers. For instance, CP025048 Figure 4. Web client application is ENA ID of Escherichia coli strain SC516 chromosome, complete genome (bacteria). Unlike the use of ENA ID which provides search for specific record, descriptors provide general search. Descriptors are keywords associated First, the user must provide correct ENA ID of sequence. This parameter is provided in TextBox control, Figure to the structure, function or taxonomic division of the nucleotide sequence. In this way, we can search ENA for 4. Since the ENA URL of each record in FASTA data format follows this pattern:

26 WEB SERVICE BASED GENOMIC DATA RETRIEVAL

implementation of the LALIGN algorithm [8], EMBOSS backtranambig – reverses protein into nucleotide complete genome or chromosome or for human, virus, bacteria…samples. The search against specific pattern sequence…etc. Most of these services work with data form the European Nucleotide Archive under some identifies all ENA sequences that contain the referred pattern. programming constrains. For instance when comparing two nucleotide sequences using EMBOSS Water, only 8 values for gap opening are available: {1,5,10,15,20,25,50,100} and there is no option to chose value out of the list. Once the record has been accessed, data views in: TXT, FASTA and XML are available, Figure 2. FASTA is the But since the structure of the solution depends of the metrics being chosen, especially in terms of accuracy and most suitable data format for easy separation of metadata and genomic data. Given FASTA view for a specific comprehensiveness, there is a real need to develop a program which would be able to retrieve and filter ENA data record, only the first line contains metadata regarding the name of the sequence and ID, while the rest is the locally in order to apply algorithms that overcome the limitations posed by EMBL-EBI web service structure of the nucleotide sequence, Figure 3. implementations.

In this paper a web service based application that retrieves and filters data from the European Nucleotide Archive is considered.

2. Materials and methods The user interface of the European Nucleotide Archive provides search services for nucleotide sequences based Figure 2. Options to view record on: sequence ID, descriptor and it also provides search against a specific pattern, Figure 1.

Figure 3. Record view in FASTA preview

In order to retrieve genomic data locally, a web client application in C# was developed (we work with object derived from the WebClient() class), Figure 4. This application exploits FASTA data format.

Figure 1. ENA user interface

Each sequence in ENA has a unique ID which is a combination of letters and numbers. For instance, CP025048 Figure 4. Web client application is ENA ID of Escherichia coli strain SC516 chromosome, complete genome (bacteria). Unlike the use of ENA ID which provides search for specific record, descriptors provide general search. Descriptors are keywords associated First, the user must provide correct ENA ID of sequence. This parameter is provided in TextBox control, Figure to the structure, function or taxonomic division of the nucleotide sequence. In this way, we can search ENA for 4. Since the ENA URL of each record in FASTA data format follows this pattern:

27 Done Stojanov

"http://www.ebi.ac.uk/ena/data/view/" + ENA_ID_of_Record + "&display=fasta", this value has to be appended Table 1: Average Retrieval Time for different size samples between "http://www.ebi.ac.uk/ena/data/view/" and "&display=fasta". RAT (Retrieval To retrieve the genomic data in local .txt file, we call the DownloadFile() function. This function is called for Length Test 1 Test 2 Test 3 Average Time) the web client object and it has two parameters. The first one is the URL of the data being retrieved, while the Seqeunce (bp) ENA ID (ms) (ms) (ms) (ms) second is the location in the host where the data is written (@”location”). Escherichia coli plasmid pCOV1 clone COV1_c1 16537 MG648836 3536 4301 3952 3929,666667 The code lines bellow summarize this discussion. Escherichia coli plasmid pCOV6 clone COV6_c2 26897 MG648862 4686 3550 3962 4066 WebClient Object = new WebClient(); Escherichia coli plasmid pCOV26 clone string URL_String = "http://www.ebi.ac.uk/ena/data/view/" + textBox.Text + COV26_c1 32494 MG649013 3594 3820 5004 4139,333333 "&display=fasta"; Escherichia coli plasmid pCOV13 clone COV13_c1 53034 MG648915 3916 3864 4778 4186 Object.DownloadFile(new Uri(URL_String), @"location"); Escherichia coli plasmid pCOV9 clone After sequence retrieval, we face another task. If we want to process genomic data we must separate genomic COV9_c1 79483 MG648907 5471 4906 5043 5140 data from metadata. If we analyze the structure of ENA record, Figure 3, we can notice that ‘ ‘ (single space) is used as a metadata inner separator, Figure 3. The end of metadata information (details regarding the name of the sequence and its ID ) is followed by only one ‘ ‘ (single space) which is followed by genomic data without any interruption, RAT (Retrieval Figure 3. y = 0.0179xAverage + 3544.6 Time) ms, RAT (Retrieval RAT (RetrievalR² = 0.856579483, 5140 So if we want to extract genomic data, first we must read the content from the local .txt file. The Split() RAT (Retrieval RAT (Retrieval Average Time) ms, Average Time) ms, procedure with delimiter ‘ ‘ (single space) is called. This procedure splits record and stores into an array. The last Average Time) msAverage, Time) ms, 32494, 4139.333333 53034, 4186 element in this array (array[array.Count() - 1]) contains only genomic data. 16537, 3929.66666726897, 4066 The code lines bellow summarize the previous discussion. RAT (ms)

string string_name = File.ReadAllText(@"location");

string[] array = string_name.Split(' ');

string genomic_data = array[array.Count() - 1]; Length (bp) 3. Results and discussion Applying the web client application from the previous section, the average retrieval time was analyzed on Figure 5. Best fitting line and linear trend line for RAT samples in Table 1 Escherichia coli plasmid pCOV_ clone COV_ samples, Table 1, with lengths in range between 16.5 kbp and 80 kbp (kilo base pairs). Each sample was retrieved three times and the corresponding retrieval times were measured accordingly in: Test 1, Test 2 and Test 3, Table 1. Retrieval average time (Test 1+Test 2+Test 3)/3 was commuted Concluding remarks for each sample. Tests were performed on Acer Aspire 5570Z with Genuine Intel T2080 @ 1.73 GHz and 2 GB A web client application that retrieves and extracts genomic data directly from the world-famous repository of RAM with stable net connection. nucleotide sequences – the European Nucleotide Archive was considered. This solution provides genomic data In general the retrieval time is impacted by: the network speed, host performance and the size of the sequence retrieval for local host-based analysis and that is necessary in order to overcome the limitations posed by EMBL- being retrieved. Since the first two parameters can be considered to be static (tests are performed on the same EBI web services, such as limitations upon the value of the gap penalty when aligning two samples, what is some machine with constant download speed), the size of the sequence would have greatest impact on retrieval time. cases may affect the structure of the solution. Obtained results showed that longer sequence has been retrieved, more time to retrieve the sequence was spent, References Table 1, Figure 5. In order to retrieve the shortest Escherichia coli plasmid pCOV1 clone COV1_c1 sequence of 16.5 [1] The European Nucleotide Archive. https://www.ebi.ac.uk/ena kbp 3.9 seconds were required in average, while 5.1 seconds were required to retrieve the longest Escherichia coli [2] The European Bioinformatics Institute. https://www.ebi.ac.uk/ plasmid pCOV9 clone COV9_c1 sample of 80 kbp, Table 1, Figure 5. Linear trend line between the length of the [3] Hamm, G.H., Cameron, G.N. (1986). "The EMBL data library": Oxford University Press, Nucleic Acids sequence and the retrieval time can be also established, Figure 5. Research. 14(1): 5–9. [4] Kneale, G., Kennard, O. (1984). "The EMBL nucleotide sequence data library": Portland Press, Biochemical Society Transactions. 12(6): 1011–1014. [5] Cochrane, G., Alako, B., Amid, C., Bower, L., Cerdeno-Tarraga, A., Cleland, I., Gibson, R., Goodgame, N., Jang, M. (2012). "Facing growth in the European Nucleotide Archive": Oxford University Press, Nucleic Acids Research. 41(D1): D30–D35.

28 WEB SERVICE BASED GENOMIC DATA RETRIEVAL

"http://www.ebi.ac.uk/ena/data/view/" + ENA_ID_of_Record + "&display=fasta", this value has to be appended Table 1: Average Retrieval Time for different size samples between "http://www.ebi.ac.uk/ena/data/view/" and "&display=fasta". RAT (Retrieval To retrieve the genomic data in local .txt file, we call the DownloadFile() function. This function is called for Length Test 1 Test 2 Test 3 Average Time) the web client object and it has two parameters. The first one is the URL of the data being retrieved, while the Seqeunce (bp) ENA ID (ms) (ms) (ms) (ms) second is the location in the host where the data is written (@”location”). Escherichia coli plasmid pCOV1 clone COV1_c1 16537 MG648836 3536 4301 3952 3929,666667 The code lines bellow summarize this discussion. Escherichia coli plasmid pCOV6 clone COV6_c2 26897 MG648862 4686 3550 3962 4066 WebClient Object = new WebClient(); Escherichia coli plasmid pCOV26 clone string URL_String = "http://www.ebi.ac.uk/ena/data/view/" + textBox.Text + COV26_c1 32494 MG649013 3594 3820 5004 4139,333333 "&display=fasta"; Escherichia coli plasmid pCOV13 clone COV13_c1 53034 MG648915 3916 3864 4778 4186 Object.DownloadFile(new Uri(URL_String), @"location"); Escherichia coli plasmid pCOV9 clone After sequence retrieval, we face another task. If we want to process genomic data we must separate genomic COV9_c1 79483 MG648907 5471 4906 5043 5140 data from metadata. If we analyze the structure of ENA record, Figure 3, we can notice that ‘ ‘ (single space) is used as a metadata inner separator, Figure 3. The end of metadata information (details regarding the name of the sequence and its ID ) is followed by only one ‘ ‘ (single space) which is followed by genomic data without any interruption, RAT (Retrieval Figure 3. y = 0.0179xAverage + 3544.6 Time) ms, RAT (Retrieval RAT (RetrievalR² = 0.856579483, 5140 So if we want to extract genomic data, first we must read the content from the local .txt file. The Split() RAT (Retrieval RAT (Retrieval Average Time) ms, Average Time) ms, procedure with delimiter ‘ ‘ (single space) is called. This procedure splits record and stores into an array. The last Average Time) msAverage, Time) ms, 32494, 4139.333333 53034, 4186 element in this array (array[array.Count() - 1]) contains only genomic data. 16537, 3929.66666726897, 4066 The code lines bellow summarize the previous discussion. RAT (ms)

string string_name = File.ReadAllText(@"location");

string[] array = string_name.Split(' ');

string genomic_data = array[array.Count() - 1]; Length (bp) 3. Results and discussion Applying the web client application from the previous section, the average retrieval time was analyzed on Figure 5. Best fitting line and linear trend line for RAT samples in Table 1 Escherichia coli plasmid pCOV_ clone COV_ samples, Table 1, with lengths in range between 16.5 kbp and 80 kbp (kilo base pairs). Each sample was retrieved three times and the corresponding retrieval times were measured accordingly in: Test 1, Test 2 and Test 3, Table 1. Retrieval average time (Test 1+Test 2+Test 3)/3 was commuted Concluding remarks for each sample. Tests were performed on Acer Aspire 5570Z with Genuine Intel T2080 @ 1.73 GHz and 2 GB A web client application that retrieves and extracts genomic data directly from the world-famous repository of RAM with stable net connection. nucleotide sequences – the European Nucleotide Archive was considered. This solution provides genomic data In general the retrieval time is impacted by: the network speed, host performance and the size of the sequence retrieval for local host-based analysis and that is necessary in order to overcome the limitations posed by EMBL- being retrieved. Since the first two parameters can be considered to be static (tests are performed on the same EBI web services, such as limitations upon the value of the gap penalty when aligning two samples, what is some machine with constant download speed), the size of the sequence would have greatest impact on retrieval time. cases may affect the structure of the solution. Obtained results showed that longer sequence has been retrieved, more time to retrieve the sequence was spent, References Table 1, Figure 5. In order to retrieve the shortest Escherichia coli plasmid pCOV1 clone COV1_c1 sequence of 16.5 [1] The European Nucleotide Archive. https://www.ebi.ac.uk/ena kbp 3.9 seconds were required in average, while 5.1 seconds were required to retrieve the longest Escherichia coli [2] The European Bioinformatics Institute. https://www.ebi.ac.uk/ plasmid pCOV9 clone COV9_c1 sample of 80 kbp, Table 1, Figure 5. Linear trend line between the length of the [3] Hamm, G.H., Cameron, G.N. (1986). "The EMBL data library": Oxford University Press, Nucleic Acids sequence and the retrieval time can be also established, Figure 5. Research. 14(1): 5–9. [4] Kneale, G., Kennard, O. (1984). "The EMBL nucleotide sequence data library": Portland Press, Biochemical Society Transactions. 12(6): 1011–1014. [5] Cochrane, G., Alako, B., Amid, C., Bower, L., Cerdeno-Tarraga, A., Cleland, I., Gibson, R., Goodgame, N., Jang, M. (2012). "Facing growth in the European Nucleotide Archive": Oxford University Press, Nucleic Acids Research. 41(D1): D30–D35.

29 Done Stojanov

Some Generalizations Of Recursive Derivates of [6] Needleman, S.B., Wunsch, C.D. (1970). "A general method applicable to the search for similarities in the k-ary Operations amino acid sequence of two proteins": Elsevier, Journal of Molecular Biology. 48(3): 443–453. [7] Smith, T.F., Waterman, M.S. (1981). "Identification of Common Molecular Subsequences": Elsevier, Journal of Molecular Biology. 147: 195–197. [8] Pearson, W.R. (1994). "Using the FASTA program to search protein and DNA sequence databases": Aleksandra Mileva, Vesna Dimitrova Humana Press, Computer Analysis of Sequence Data. 307-331. 1Faculty of Computer Science, University “Goce Delˇcev”, Stip,ˇ Republic of Macedonia [email protected] 2 Faculty of Computer Science and Engineering, University “Ss Cyril and Methodius”, Skopje, Republic of Macedonia [email protected]

Abstract. We present several results about recursive derivates of k- ary operations defined on finite set Q. They are generalizations of some binary cases given by Larionova-Cojocaru and Syrbu [7]. Also, we present several experimental results about recursive differentiability of ternary quasigroups of order 4. We also prove that the multiplication group of k-ary quasigroups obtained by recursive differentiability of a given k-ary quasigroup (Q, f) is a subgroup of the multiplication group of the (Q, f).

Keywords: Recursively t-differentiable quasigroups, k-ary operations 2010 Mathematics Subject Classification. 20N05, 20N15, 05B15, 94B60.

1 Introduction

Let Q be a nonempty set and let i and k be a positive integers, i k. We will ≤ k (k i+1) k use (xi ) to denote the (k i + 1)-tuple (xi,...,xk) Q − , and x to denote the k-tuple (x,...,x) Q−k. A k ary operation f ∈on the set Q is a mapping k ∈ k − k f : Q Q defined by f :(x1 ) xk+1, for which we write f(x1 )=xk+1.Ak- ary groupoid→ (k 1) is an algebra→ (Q, f) on a nonempty set Q as its universe and with one k-ary operation≥ f.Ak-ary groupoid (Q, f) is called a k-ary quasigroup (of order Q = q) if any k of the elements a1,a2,...,ak+1 Q, satisfying the equality | | ∈ k f(a1 )=ak+1, uniquely specifies the remaining one. The k ary operations f1,f2,...,fd, 1 d k, defined on a set Q are or- − k d ≤ ≤ k d thogonal if the system fi(x1 )=ai i=1 has exactly q − solutions for any a ,...,a Q, where q ={Q [4, 3]. There} is one-to-one correspondence between 1 d ∈ | | the set of all k-tuples of orthogonal k-ary operations defined on a set Q and the set of all permutations θ : Qk Qk ([4]), given by → θ(xk) (f (xk),f (xk),...,f (xk)). 1 → 1 1 2 1 d 1

30