<<

Feature-Based Comparison of Language Transformation Tools

1Muhammad Ilyas, 2Saad Razzaq, 3Fahad Maqbool, 4Qaiser Abbas, 5Wakeel Ahmad, 6Syed M Adnan 1,2,3,4Department of Computer Science and IT, University of Sargodha, Sargodha, Pakistan 5,6Department of Computer Science, University of Wah, Wah Cantt, Pakistan [email protected],

Abstract Code transformation is the best option while switching from farmer to next technology. Our paper presents a comparative analysis of code transformation tools based on 18 different factors. These factors are Classes, pointers, Access Specifiers, Functions and Exceptions, etc. For this purpose, we have selected varyCode, Telerik, Multi-online converter, and InstantVB. Source Language considered for this purpose is C sharp (C#) and the target language is Visual Basics (VB). Results show that VaryCode is best among the four tools as its converted programs throw fewer errors and require minor changes while running the program. Key words: Code transformation, Multiconverter, Stylistic Feature, Code Smell, Author style.

more readable for the developer. Due to the 1. INTRODUCTION rapid improvement in programming languages, companies want to migrate their Source to source compilation is a process of software to new updated/upgraded code. converting source code written in one high- Developing code from scratch each time is level language to itself or other high-level difficult because it is a complex and languages [1]. It is a refactoring process that cumbersome activity. The automatic is helpful when programs that have to transformation of the code is always very refactor are outside the control of the favorable and speeds up the work. original implementer. The purpose of Several techniques have been used for transpilation is to convert legacy code to a this purpose such that automatic source to newer version of a particular language. Such source error compensation of floating-point as converting a program from one dialect to programs [17], generating database access another in the same code from domain models [18], and [2, 25]. Translation of code from one to generating pseudo-code from source code another language is also necessary for [19], etc. understanding code according to expertise in

a particular language. This makes code much

The objectives of feature-based comparison third is Language Conversion that converts are to help in making newer versions of tools from one programming language into efficient and more accurate as well as to another. highlight the areas where a particular tool is There are many purposes for code lagging. For this purpose, we have selected transformation such as performance four source to source compilers: Telerik improvements [11], memory optimization [20], VaryCode [21], Multi-online Converter [12], parallelization, and vectorization. Due [22], and Instant VB [23]. We have chosen to the rapid growth of the Internet, one of the C# as the source language and Visual Basic main reasons for code transformation is the (VB) has the target language. We have migration of a legacy system into a web- focused on eighteen different features for enabled environment [3]. Code comparison, some of these features are transformation [13] also helps advisory tools Classes, Functions, Access Specifiers and that guide the developers to parallelize the Exceptions, etc. The paper is organized as real-world problems. These tools have faced follows: related work is described in Section problems due to the large size of programs 2, feature-based comparison of language and high code complexity. That’s why these transformation tools are presented in Section tools are unable to provide meaningful 3, and the conclusion is described in Section parallelization hint to the developers. Then 4. code transformation overcomes this problem by simplifying the code. 2. RELATED WORK A source to source compilation is a process Cfir Aguston et al. Proposed the approach to of translating a high-level programming overcome the complexity of code issue language to itself or another high-level called Skeletonization which automatically programming language [1]. One of the transforms the complex code into a much purposes of the source-to-source compilation simpler structure. The proposed algorithm is translating legacy code to use the transforms the constructs such as covers upcoming version of the underlying pointers, nested conditional statements, programming language. It is a refactoring nested loops, etc. This algorithm transforms process that is helpful when the programs to pointers into integer indexes and replaces C refactor are outside the control of the struct references with references to arrays. original implementer such as to convert a Source level compiler performs analysis on program from legacy API to the new API, or skeletonized code for parallelizing the code. when the program size makes it impossible The generated code is not equivalent to the to refactor it by hand. original code, it suggests possible parallelization patterns to the developers. Malton [2] defines three conversion tasks: First is Dialect Conversion which converts a Another tool that is concerned with program from one dialect to another in the performance improvements is GPU S2S same programming language. This is used which automatically transforms the C when a new version of the compiler is used. sequential code into CUDA (Compute The second is API Migration used to convert Unified Device Architecture) code [11]. This the program into a new set of APIs. The contains three modules: Directive recognition, Parsing module, and CUDA

code generation. GPU S2S takes C code The concept of source-to-source Translator with directives as input and translates it into can be viewed as a generation of Database the CUDA code. Experiments showed that Access code from Domain Models. N. Y. generated code has significant improvements Khelifi et al. presented his Idea on Database as compared to C original code, which leads Access code generation from Domain to performance enhancement. Models. As a source, they used RSL (Requirements ). The G. Dimitroulakos and C. Lezos et al. process of code generation consists of two presented a source to source compiler main steps. The first step is related to MEMSCOPT for Dynamic code analysis transformation rules having translational and loop transformations and assisting the semantics for domain vocabulary constructs optimization of the memory hierarchy of a of RSL. The second step constitutes of digital hardware system. MEMSCOPT is a MOLA ( Language) command-line application and is extended algorithms for implementing by providing a Graphical User Interface transformations. The validity of results is (GUI) in C#. It takes a C code file as input assured by using the framework of the and performs an analysis such that ReDSeeDS (Requirements-Driven Software monitoring the loop. It also performs a Development System) tool suite. The dynamic code translation. MEMSCOPT resultant transformations produced applies 7 types of transformations such that consistent and quality code that can be loop extends, loop shift, loop reversal, loop utilized directly for the implementation of interchange, loop fusion, loop fission, loop the data access layer [18]. normalization, loop reorder, loop switching, loop scope move forward and loop scope A Pseudogene is a tool for converting move backward [12]. pseudo code from Source code using SMT (Statistical Machine Translation), worked on J. Cronsioe et al proposed another source to by H. Fudaba et al. The tool performs the source compiler that performs automatic transformation of Python code to English or transformations such that optimization of Japanese. The proposed tool uses T2SMT loop structures on multi-core platforms. The (Tree-to-String machine translation) method. proposed research is concerned with a Input to this method is a parse tree through scientific application written in which Tokenization and parsing are named BigDFT (Density Functional performed. The tree is broken up for Theory). BigDFT is defined by its heavy use translation using translation patterns for of convolution operators on large arrays. conversion into pseudo-code. Proposed PIPs and BOAST the two S2S applications techniques are one of the best application of have been used with MagicFilter to optimize SMT for translating the code into natural BigDFT. It is found that PIP’s generic language [19]. transformation logic is not always suitable for adapted optimizations there is another B. S. K. Vorobyov et al worked on source to technique BOAST for more suitable source Translator from Datalog to SQL for transformations over multi-core platforms Static Program Analysis. The process starts [15]. with the processing of facts and rules by a lexer and parser. An intermediate

representation of the Datalog program is are inserted in this step. In step 2 type constructed. The intermediate representation conversion and data types are analyzed, as a is translated to SQL queries by using simple result, Java incompatible types are removed. syntactic translation schemes. The proposed In step 3 C/C++ code is translated into Java work cannot be matched with the code, compiled with any Java compiler, and performance of industrial-strength tools [16]. verified. Also, it doesn’t support many L. Thévenoux et al presented his work on features such as external libraries, assembly automatic source to source Error code, and goto statements. Experiments have compensation of floating-point programs. shown that Ephedra demands source code Numerical programs may suffer inaccuracies need to be manually altered for processable as finite precision arithmetics is an [5]. C2Eiff [14], [10] is a tool that performs approximation of real arithmetic. The automatic translation of complete C code research considers IEEE 754 floating-point into Eiffel, an object-oriented programming arithmetic and used Error Free language. It supports the complete entire C Transformations that are lossless of basic language such that (Function pointers, floating-point operations. The proposed pointer arithmetic, unrestricted branch approach resulted in many accurate values; instructions). It compiles GNU C compiler moreover, it is the first step to automatic extensions and ANSI with the help of CIL generation of multi-criteria program (C Intermediate Language) framework, also optimizations [17]. support native libraries. The generated code is functionally equivalent to the C code. The C2J [7] is a C to Java translator that completeness of code is evaluated by an translates large volumes of C code correctly attested set of programs to which translation to Java. The translated code is difficult to was performed. C2Eif automatically fully read which circumvents run-time checking translated over 900,000 lines of C code to system and Java types which make it hard to functionally equivalent 2 Eiffel code. This interface with mainstream Java programs. It translation introduces contracts that help in requires a lot of modifications in the detecting errors such that null pointer translated code to be run correctly. Java referencing etc. This will improve the Backend for GCC also translates C code to readability of the code. There are two Java and has the same disadvantages as methods to reuse the source code written in a C2J[8]. Some translators only focus on the foreign language into the host language: extension of original language like migration First is wrapping foreign code that uses the from legacy code into object-oriented foreign language implementation by API of languages, such as C to C++ language [3]. bridge libraries. The second is Translating Ephedra [4] is a tool that translates the foreign code. C2Eiff uses a wrapping foreign legacy C code to Java. It does not translate code approach to translate only assembly fully C code but supports a heavy subset of code and external functions. C. Ephedra [9] defines three steps to conversion C/C++ code into Java code. In Martin et al. Presented automatic source step 1, conversion of K&R style C code code transformation between octave and R which doesn’t contain function prototypes. It fourth-generation languages [6]. The main limits the capability of the compiler to goal is to convert octave algorithms into R perform checking. All function prototypes for scientists to use in their applications.

TXL programming language was used for There are numerous online and offline tools analysis and transformation. TXL was available for performing source to source created for the transformation of source and transformation. These tools transform source target languages that somehow similar. TXL code of different languages into different does not give conversion code into the target languages such that C++ to C# correct format, for this purpose authors used converter, C# to VB and VB to java, etc. PERL scripts for correcting the format of [23]. We have taken C# as a source language code. As a result, the author evaluated its and VB (Visual Basic) as the target language effect on the performance and readability of to have a comparative study of different converted code. tools. The tools we have selected for this purpose are Telerik [20], VaryCode [21], Multi-online Converter [22] and Instant VB 1) on which we have tested auto- [23]. Blanker, another tool that searches and transformation of code from C# to visual unifies equivalent statements available in the basic. Later on, we have checked the code language before feeding the source to an manually and alter it accordingly. We also existing code clone detector limited to type- checked for the areas that are not covered by 2 clones. Gabriel Sebastián has explained A any of these tools as well as the validity of comprehensive approach to Model Driven programs converted. Architecture (MDA), from the definition of We have presented the result of our testing the computational independent model (CIM in Table 1 considering 18 different features. layer) to the implementation-specific model Some of the features are covered by all of (ISM layer) and the process of these tools such that Loops, Data Types, transformations required for automatic Conditions, Access Specifiers, Modifiers source code generation (in HTML and and Read and Write methods but the other JavaScript) from Language Learning Apps features vary. These tools do not produce a [26]. full running code as we need to amend some of the factors such that placement of 3. METHODOLOGY Namespaces and Libraries etc. A common Our proposed methodology is presented in problem in all of these tools is that the figure 1. We took a dataset of 50 C# converted code contains a shared main programs from [1] and tested their auto- function within the class that is not conversion into Visual Basic on Telerik, supported by the VB compiler so we need to Varycode, Multi-online converter, and add another main function outside the class InstantVB. These tools transform C# code and place the main code over there. into visual basic code automatically. We 4. CONCLUSION have taken a list of features (as given in table

Figure 1: Source Code Language Transformation Model

Due to rapid improvement in the 2001. APSEC 2001. Eighth Asia- programming languages, organizations want Pacific. IEEE, 2001. to shift their software to the new [4] J. Martin and H. A. Muller, "Strategies updated/upgraded code. Every time, it’s hard for migration from C to Java." Software to develop code from scratch because it’s a Maintenance and Reengineering, 2001. complex, time-taking activity. So automatic Fifth European Conference on. IEEE, code transformation always very supportive 2001. and expedite the work. In this paper, we have performed a [5] T. Marco, et al., "C to OO translation: feature-based comparison of four different Beyond the easy stuff." Reverse tools that can transform code automatically Engineering (WCRE), 2012 19th from C# to VB. Results of comparative Working Conference on. IEEE, 2012. analysis conclude that VaryCode is best [6] J. Martin and J. Gutenberg, "Automated among the four tools as its converted source code transformations on fourth- programs throw fewer errors and require generation languages." Software minor changes while running the program. Maintenance and Reengineering, 2004. Moreover, it also exhibits that none of the CSMR 2004. Proceedings. Eighth tested tools support all features and European Conference on. IEEE, 2004. appropriate modifications need to be [7] J. Martin and H. A. Muller, "C to java performed in our converted code to take full migration experiences." Software benefit. Maintenance and Reengineering, 2002. As future work, our comparative analysis Proceedings. Sixth European could guide the developers to improve the Conference on. IEEE, 2002. transformation tools by covering the features that are not supported yet. Also, a mature [8] T. Waddington, Java Backend for GCC. practice for such transformation could http://archive.csee.uq.edu.au/ ̃csmweb/u support the shifting of legacy code into the qbt.html#gcc- jvm, November 2000. code of modern language. Such [9] K. Cashion, S. Ravindran, N. Powar and transformation will also support software J. Gold, "IOT device code translators refactoring with better reusability also. using LSTM networks," 2017 IEEE Along with all such options, we will extend National Aerospace and Electronics this work to find out its financial impact Conference (NAECON), Dayton, OH, 2017, pp. 88-90. REFERENCES [10] M. Trudel, C. A. Furia, and M. [1] "Types of compilers". Compilers.net. Nordio, "Automatic C to OO translation 1997–2005. Retrieved 28 October 2010. with C2Eiffel." Reverse Engineering [2] A. J. Malton, "The software migration (WCRE), 2012 19th Working barbell." ASERC Workshop on Conference on. IEEE, 2012. Software Architecture. 2001. [11] Li, Dan, et al., "Gpu-s2s: a compiler [3] Z. Ying and K. Kontogiannis, "A for source-to-source translation on gpu." framework for migrating procedural Parallel Architectures, Algorithms and code to object-oriented platforms." Programming (PAAP), 2010 Third Software Engineering Conference,

International Symposium on. IEEE, accuracy and time." Concurrency and 2010. Computation: Practice and Experience [12] D. Grigoris, C. Lezos, and K. 29.7 (2017). Masselos, "MEMSCOPT: A source-to- [18] Khelifi, N. Yamouni, M. Śmiałek, and source compiler for dynamic code R. Mekki, "Generating database access analysis and loop transformations." code from domain models." Computer Design and Architectures for Signal and Science and Information Systems Image Processing (DASIP), 2012 (FedCSIS), 2015 Federated Conference Conference on. IEEE, 2012. on. IEEE, 2015. [13] A. Cfir, Y. B. Asher, and G. Haber, [19] F. Hiroyuki, et al., "Pseudogen: A "Parallelization Hints via Code Tool to Automatically Generate Skeletonization." IEEE Transactions on Pseudo-Code from Source Code." Parallel and Distributed Systems 26.11 Automated Software Engineering (ASE), (2015): 3099-3107. 2015 30th IEEE/ACM International [14] S. Ribic, "Concept and Conference on. IEEE, 2015. implementation of the programming [20] “Code Converter”. Telerik Code language and translator, for embedded Converter by systems, based on machine code progress.Converter.telerik.com/. Web. decompilation and equivalence between Accessed 10 Jan 2018. source and executable code," 2006 13th [21] “Online Code Converter: C#, VB, Working Conference on Reverse JAVA, C++, Ruby, Python, Boo”. Engineering, Benevento, 2006, pp. 307- VARYCODE. www.varycode.com/. 308. Web. Accessed. 18 Jan 2018.

[15] C. Johan, B. Videau, and V. [22] ”code Translator: Code Translation Marangozova-Martin, "BOAST: from VB.NET <-> C# <->TypeScript <- Bringing optimization through > Java”. Carlosag.net. automatic source-to-source www.carlosag.net/tools/codetranslator/. transformations." Embedded Multicore Accessed. 21 Jan 2018. Socs (MCSoC), 2013 IEEE 7th international symposium on, IEEE, [23] “Source Code Converter”. Tangible 2013 Software Solutions. www.tangiblesoftwaresolutions.com/pr

[16] S. Bernhard, et al., "A datalog oduct_details/csharp-to-vb- source-to-source translator for static converter.html. Web. Accessed. 25 Jan program analysis: an experience report." 2018. Software Engineering Conference (ASWEC), 2015 24th Australasian. [24] “1000 C# Programs with Example IEEE, 2015. Code and output-Sanfoundry”. Sanfoundry Technology Education

[17] T. Laurent, P. Langlois, and Blog. M.Martel, "Automatic source‐to‐source error compensation of floating‐point www.sanfoundry.com/csharp- programs: code synthesis to optimize programming-examples/. Web. Accessed. 12 Jan 2020.

[25] G. Sebastián, R. Tesoriero and J. A. [26] D. Pizzolotto and K. Inoue, "Blanker: Gallud, "Automatic Code Generation for A Refactor-Oriented Cloned Source Language-Learning Applications," in Code Normalizer," 2020 IEEE 14th IEEE Latin America Transactions, vol. International Workshop on Software 18, no. 08, pp. 1433-1440, August 2020, Clones (IWSC), London, ON, Canada, doi: 10.1109/TLA.2020.9111679. 2020, pp. 22-25.

Table 1. Feature based Comparison

Features: Telerik Varycode Multi online Instant VB converter Comments: Do not convert Convert comments Do not convert Convert comments comments comments Namespaces Predefined Predefined Predefined Predefined + Name Name Name Spaces: Name Spaces: Libraries: Spaces:(Error) Spaces:(Error) Converted user- Converted in converted in converted defined user-defined code user- code user- NameSpaces: NameSpaces: defined Name defined name Do not Converted Spaces: Do not convert Spaces: Do not convert convert Loops: Supporting loops Supporting loops Supporting loops Supporting loops

Data type: Primitive Data Primitive Data Primitive Built-in Data Types Types (Converted: Types (Converted: DataTypes (Converted: Int, Date, Int, Date, Unsigned Int, Date, Unsigned (Converted: Int, char, Double, Arrays, int, char, Double, int, char, Double, Date, char, Strings, Nullable String, Nullable String, Nullable Double, Arrays, datatypes) datatypes) datatypes) Strings, Nullable User-Defined Data User-Defined User-Defined datatypes) Types DataTypes Datatypes User-Defined Converted Converted Converted Datatypes Converted Conditional Supports conditions Supports conditions Supports Supports Statements: if+ switch if+ switch if+ switch if+ switch Access specifier: Converted Converted Converted Converted

Modifiers: Converted Converted Converted Converted Shift Not supported Supports Shift Not supported Converts but throws Operation: Operation Exception Read and Converted Converted Converted Converted write Input: Classes: Multilevel Multilevel Multilevel Multilevel inheritance inheritance inheritance inheritance converted converted converted converted Single level Single level Single level Single level inheritance inheritance inheritance inheritance converted converted converted converted Abstract class Abstract Abstract Abstract Inherited classes classes classes classes require Must Inherit in (Converted (Converted (Converted) VB but not ) ) available in converted code Macros: Handles i.e. By Takes macros in Takes macros in Macros Converted making functions converted code as they converted code as they are in are in original program so original program so error error

Exceptions: Converted Converted Converted Converted (IndexedOutOfRange, (IndexedOutOfRange (IndexedOutOfRange, (IndexedOutOfRa DividingbyZero(datatype (no proper syntax), DividingbyZero(dividec nge, conflict), DividingbyZero(dividec on DividingbyZero, InvalidTypeCasting on flict),invalidTypeCasti invalidTypeCastin flict) , ng, g, InvalidTypeCasting, nullReference) StackOverflow(Error), , MultipleException, StackOverflow, null reference) StackOverflow nullReference) (converted but not working due to specifier with main function), nullReference) Typecasting: Typecasting in VB Typecasting in VB Typecasting in VB Typecasting in VB converted but do not converted but do not converted but do not converted. follow syntax in some follow syntax in some follow syntax in some programs. programs. programs. i.e. case of alphabets in i.e. case of alphabets i.e. case of alphabets statement in statement in statement Functions: Built-in functions Built-in functions Built-in functions Built-in functions converted (toString, converted (toString, converted (toString, converted GetType, Copy, Sort, GetType, Copy, Sort, GetType, Copy, Sort, (toString, Copy, Split, Split, Reverse, Split, Reverse, Sort, Split, Concat,Substring,Index IndexOf, Concat, substring, Reverse, Concat, O F,Replace,Reverse, BinarySearch, Trim, IndexOf, Replace substring, Trim, BinarySearch,Trim) Concat, SubString) , BinarySearch, Trim) IndexOf, Replace, User-defined functions User-defined User-defined BinarySearch) converted functions functions User-defined Function return type converted converted functions Virtual conversion Function return type Function return type function converted Virtual function conversion conversion Method Hiding converted Method Virtual function Virtual function (Converted) Static Hiding (Converted) converted Method converted Method method Pass by Static method Pass by Hiding converted Hiding (Converted) reference reference (Error: Static method Pass by (public/private (public/private shared overriding demands reference shared main error) main error) over-ridable function (public/private shared that is not in main error) conversion) Static method pass by reference not converted Online/ Online Online Online Downloadable Downloadable

Syntax Sometimes do not Sometimes do not Sometimes do not Follows the follow the syntax follow the syntax follow the syntax syntax Pointers Not Converted Converted but error as Converted Converted but not able to create a error as not able to pointer class (No create a pointers in VB) pointer class (No

pointers in VB)

Jump Converted (goto) Converted (goto) Converted (goto) Converted (goto) statements: but do not follow the syntax