Agile Parsing to Transform Web Applications

Thomas Dean¹ and Mykyta Synytskyy²

¹ Electrical and Computer Engineering, Queen's University, [email protected]
² Amazon.com, [email protected]

Abstract. Syntactic analysis lies at the heart of many transformation tools. Grammars are used to provide a structure to guide the application of transformations. Agile parsing is a technique in which grammars are adapted on a transformation by transformation basis to simplify transformation tasks. This paper gives an overview of agile parsing techniques, and how they may be applied to Web Applications. We give examples from several transformations that have been used in the Web application domain.

1 Introduction

Many program transformation tools [3,4,6,7,8,21,25] are syntactic in nature. That is, they parse the input into an abstract syntax tree (AST) or abstract syntax graph (ASG) as part of the transformation process. There is good reason for this: syntax is the framework on which the semantics of most modern programming languages is defined. One example is denotational semantics [24], which maps the syntax of a language to a formal mathematical domain. The syntax of the language, and thus the structure of the tree or graph used by these tools, is determined by a grammar. The grammar is typically some form of context-free grammar, although it might be augmented with other information [1,2]. In most cases, the transformation rules that may be applied to the input are constrained by the resulting data structure.

Grammars are used by transformation languages to impose a structure on the input that can be used by the rules. These grammars are general grammars: they represent a compromise, the best grammar that is suitable for a wide variety of tasks. However, for some tasks, minor changes to the grammar can make significant differences in the performance of the transformations. We call changing the grammar on a transformation by transformation basis agile parsing [13].

One particular area of analysis for which agile parsing is useful is web applications. Web applications are written in a variety of languages, and are the subject of a variety of analysis and transformation tasks. We have used agile parsing techniques on several web application transformation projects [11,14,16,22,23,26,27]. We do not discuss the details of the transformations as they are covered elsewhere. Instead we focus on how grammars can be crafted for web applications and how they are customized for each of the particular transformation tasks.

R. Lämmel, J. Saraiva, and J. Visser (Eds.): GTTSE 2005, LNCS 4143, pp. 312–326, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 Agile Parsing

Agile parsing means customizing the grammar to the task. It is a collection of techniques from a variety of sources that we have found useful for transformation tasks. In this section, we give an introduction to agile parsing, how it is accomplished in TXL, and a quick overview of some of the techniques.

The techniques generally serve several needs. The first is to change the way the input is parsed. For example, in the C reference grammar the keyword typedef is simply given as a storage specifier, eliminating any syntactic difference between variable and type declarations. That is, there is a single non-terminal, declaration, that consists of a sequence of declaration specifiers followed by the declarator list. A declaration of a global variable and a type definition using typedef are parsed as the same non-terminal. For many transformations, this is a reasonable choice. However, a transformation that targets only type definitions must include conditions in the rule to match only those declarations that include the typedef keyword. Splitting the declaration production into two non-terminals, one which requires a typedef keyword and one that does not, lets the parser make the distinction (a sketch of this override appears below). Rule sets that depend on the distinction become much simpler since they no longer have to encode it as conditions in the rule; they can simply target the appropriate non-terminal. The change to the grammar is local and only applies for the given transformation task.

The second need is to modify the grammar to allow information to be stored within the tree. An example is to modify the grammar to allow us to add XML-style markup to various non-terminals in the grammar. While some transformation systems permit metadata to be added to the parse tree, sometimes data more complex than can be handled as metadata is more appropriate for the problem. The markup can then be used to store temporary information part way through the transformation. Markup can also be used as a vehicle for communicating the results of an analysis to the user.

The third need is to robustly parse a mixture of languages, such as embedded SQL in COBOL [18] or, in this case, dynamic scripting languages in HTML. The last need is to unify input and output grammars so that the rules may translate from one to the other while maintaining a consistent and well typed parse tree.
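As an illustrative sketch of how the typedef example above might be expressed as a grammar override in TXL (the non-terminal names declaration, decl_specifier and declarator are assumed here, and are not taken from any particular C grammar):

    redefine declaration
        [type_definition]
        | [variable_declaration]
    end redefine

    define type_definition
        'typedef [repeat decl_specifier] [list declarator] ';
    end define

    define variable_declaration
        [repeat decl_specifier] [list declarator] ';
    end define

A rule that targets only type definitions can then simply match the type_definition non-terminal, with no guarding condition on the typedef keyword.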

2.1 Agile Parsing in TXL

TXL is a pure functional language particularly designed to support rule-based source-to-source transformation [7,8,10]. Each TXL program has two parts: a structure specification of the input to be transformed, which is expressed as an unrestricted context-free grammar, and a set of one or more transformation rules, which are specified as pattern/replacement pairs that define what actions will be performed on the input. Each pattern/replacement pair in a transformation rule is specified by example, and may be arbitrarily parameterized for more rule flexibility.

Figure 1 shows a simple TXL grammar for expressions. Square brackets denote the use of a non-terminal. Prefixing a non-terminal symbol with the keyword repeat denotes a sequence of the non-terminal, and the vertical bar is used to indicate alternate productions. The terminal symbols in the grammar are either literal values, such as the operator symbols in Fig. 1, or general token classes, which are also referenced using square brackets ([id] for identifier in Fig. 1).


    define expression
        [term] [repeat addop_term]
    end define

    define addop_term
        [add_op] [term]
    end define

    define add_op
        + | -
    end define

    define term
        [factor] [repeat mulop_factor]
    end define

    define mulop_factor
        [mul_op] [factor]
    end define

    define factor
        [id]
    end define

Fig. 1. Simple Expression Grammar in TXL
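The transformation itself is expressed as rules over this grammar. As a small illustration (not taken from the paper), the following parameterized rule renames every occurrence of one identifier to another; applied to input parsed with the grammar of Fig. 1, it could be invoked as, for example, [renameId 'x 'y]:

    rule renameId OldId [id] NewId [id]
        replace [id]
            OldId
        by
            NewId
    end rule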

Agile parsing is supported in TXL through the use of the redefine keyword, which is used to change a grammar. Figure 2 shows an example. In this example, the factor grammar production is changed to include XML markup on the identifier. The ellipses ('...') in the factor redefinition indicate the previous productions for the factor non-terminal. Thus a factor is whatever it was before, or it may be an identifier annotated by XML markup. Subsequent transformation programs may choose to insist that all such identifiers be marked up when parsing the input by removing the first line and the vertical bar from the factor redefinition.

The TXL processor first tokenizes the input using a standard set of tokens, which may be extended by the grammar definition, then parses the input using the grammar, and finally applies the rules. In TXL, the rules are constrained by the grammar to keep the tree well formed. Thus the grammar includes not only the productions for parsing the input, but also the productions for the output and intermediate results. This may lead to ambiguities, which are resolved through the use of an ordered parse: the order of the alternatives in the grammar productions determines which parse is chosen.
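For instance, a later transformation phase that expects the markup of Fig. 2 to already be present could replace the redefinition with the stricter form below (a sketch of the change just described); any input in which an identifier factor is not marked up would then simply fail to parse:

    redefine factor
        [XmlStart] [id] [XMLEnd]
    end redefine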

3 Agile Parsing in Web Applications

Web applications deliver services over the HTTP protocol, usually to a browser of some sort. Early web applications provided dynamic content to standard web browsers using server side scripting such as CGI scripts, servlets, JSP and ASP. Recent innovations include Really Simple Syndication (RSS), SOAP and AJAX.

    redefine factor
        ...
        | [XmlStart] [id] [XMLEnd]
    end redefine

    define XmlStart
        < [id] [repeat XMLParm] >
    end define

    define XMLEnd
        </ [id] >
    end define

    define XMLParm
        [id] = [stringlit]
    end define

Fig. 2. Adding XML Markup to Identifiers in Expressions


Web applications present a particular challenge for parsing and for transformation. While some of the files in a web application contain conventional languages such as Java, the core files of the application (the web pages) are typically comprised of a mixture of several languages. The web pages typically contain HTML, augmented with JavaScript and server side languages such as Java, Visual Basic, or PHP. Other components of the application may be in XML, such as various configuration files or the XML transfer schemas used by SOAP. For some web transformation tasks it may be possible to convert to a single language [12], but if we wish to translate between client side and server side technologies, we must be able to represent all of the languages in a single parse. A grammar for web applications must also be robust: most browsers accept malformed HTML, and our approach must also accept and deal with the same malformed HTML.

3.1 Island Grammars and Robust Parsing

The first agile parsing technique is island grammars [5,16,18,19,23]. Island grammars, originally introduced by van Deursen et al. [5], are a technique in which the input is divided into interesting sequences of tokens, called islands, and uninteresting tokens, called water. Grammar productions that recognize the islands are given precedence over the water productions, which typically match any token or keyword. The productions can be nested, with islands containing lakes (sequences of tokens within the island that are uninteresting), which may in turn contain nested islands. Island grammars provide a natural way to handle mixed languages and to handle unexpected input in a robust way [18]. They can also be used to identify interesting sequences of tokens without performing a detailed parse of the entire input [19]. Island grammars must be crafted with care since errors in the grammar may cause interesting token sequences to be parsed as water.

In the context of web applications, we use island grammars to find and parse application elements within the natural language on the web page and to handle erroneous input in a reasonable way. Island grammars provide us with a mechanism to identify the interesting elements. However, what constitutes an interesting element depends on the task. In some contexts we are only interested in the Java or Visual Basic code that is embedded within the web page and the links from that code to other components such as other code modules or databases. In other contexts we may be interested in some of the structural components of the web pages such as tables, forms, anchors and links. Agile parsing permits us to change which sequences of tokens are considered islands based on the task.

Island grammars also provide robustness. General access to web publishing by almost anyone means that browsers have to deal with errors in the HTML markup of pages. Two common errors are that style markup tags in web documents may be improperly nested and that closing tags for some constructs may be missing. An example of the first, although technically illegal, is accepted by browsers: "<b>bold <i>italic</b> text</i>". An example of the second is the closing tags for tables, table elements and forms. In some cases the tags are implicitly closed when a surrounding markup is closed (table rows closed by the table, nested tables closed by surrounding tables, tables closed by the end of the document).
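Schematically, the nesting of islands and lakes can be expressed directly in the grammar. The following sketch uses purely illustrative non-terminal names (island_start and island_end stand for whatever delimits an island in a particular language, and token_or_key for the any-token water production):

    define island
        [island_start]
        [repeat lake_element]
        [island_end]
    end define

    define lake_element
        [island]                            % a nested island inside the lake
        | [not island_end] [token_or_key]   % otherwise, uninteresting water
    end define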


    define program
        [repeat html_document_element]
    end define

    define html_document_element
        [interesting_element]
        | [uninteresting_element]
    end define

    define interesting_element
        [html_interesting_element]
    end define

    define uninteresting_element
        [html_uninteresting_element]
        | [token_or_key]
    end define

Fig. 3. Core HTML Grammar in TXL

Since browsers allow these errors, any analysis or transformation must also allow them and, more importantly, interpret them in the same way as browsers do. We have developed an HTML island grammar that is the basis of our approach to analyzing and transforming web applications. Figure 3 shows the core of our HTML grammar. The predefined non-terminal program is the goal symbol of the grammar. In this case, it is a sequence (i.e. repeat) of document elements (html_document_element). Each document element is either an interesting element or an uninteresting element. Since TXL uses an ordered parse, only input that cannot be parsed as an interesting element is parsed as an uninteresting element. The base grammar defines interesting elements as interesting HTML elements (the non-terminal html_interesting_element). In our case interesting elements include anchors (including links), tables and forms. The main purpose of the interesting_element non-terminal is to act as an extension point when adding other languages to the grammar.

Uninteresting elements use the grammar production token_or_key, which is defined as any token or keyword. The html_uninteresting_element production called from uninteresting_element is used to handle formatting issues. TXL uses a generalized unparser to write out the results of the transformation: each token is written out separated from the previous one by a space. This formatting is occasionally inappropriate for HTML text. The html_uninteresting_element production allows the parser to recognize those cases and provide formatting cues in the grammar to handle them.

Figure 4 shows the grammar productions used to recognize form elements (one of the interesting elements). The production html_form_tag (which is one of the alternatives of html_interesting_element) is rather straightforward: it is a form tag with parameters (html_any_tag_param) followed by form content and an optional end element. The end element must be optional since non-terminated forms are accepted by most browsers. The SPOFF, SP, SPON, and NL symbols are formatting cues. The SPOFF cue turns off spacing between tokens as they are written to the output until the SPON cue is encountered. The SP cue inserts a space and the NL cue inserts a newline into the output. These elements exist in the grammar only; they are not included in the parse tree, and are used only when the parse tree is written out. They have no effect on the construction of the parse tree from the input. The content of a form is either legitimate form content, which includes form elements, or other content consisting of html_document_elements such as arbitrary text and other islands such as tables. The TXL not modifier is used to guard the other content so that it does not consume the tokens that terminate the form.


    define html_form_tag
        [SPOFF] <form [repeat html_any_tag_param] > [SPON] [NL]
        [repeat html_form_content]
        [opt html_form_tag_closing]
    end define

    define html_form_tag_closing
        [SPOFF] </form> [SPON] [NL]
    end define

    define html_form_content
        [html_form_element]
        | [html_form_other_content]
    end define

    define html_form_other_content
        [not html_form_tag_stop] [html_document_element]
    end define

    define html_form_tag_stop
        [html_form_tag]
        | [html_form_tag_closing]
    end define

Fig. 4. Grammar for HTML Forms

    redefine interesting_element
        ...
        | [jsp_interesting_element]
    end redefine

    define jsp_delimiter
        [jsp_start] | [jsp_end]
    end define

    define jsp_interesting_element
        [jsp_start] [jsp_expression] [jsp_end]
        | [jsp_start] [repeat jsp_scriptlet] [jsp_end]
        | [jsp_useBean]
        | [jsp_formal_declaration]
        | [jsp_include_directive]
        | [jsp_include_action]
        | [jsp_forward]
    end define

Fig. 5. Grammar for JSP Elements

As before, the interesting_element production is extended to include the JSP elements. The tokens jsp_start and jsp_end represent the "<%" and "%>" tokens respectively. Thus the JSP interesting elements are Java expressions and scriptlets enclosed between jsp_start and jsp_end tokens, jsp bean declarations (jsp_useBean), formal declarations, include directives, include actions and forward actions. The other side of the extension is the ability of Java statements to include HTML. For example:

    <table>
    <% for (i = 0; i < 10; i++) { %>
        <tr> <td> <%= i %> </td> </tr>
    <% } %>
    </table>

In this case we have a Java for loop that generates the first 10 integers in a table. With the grammar additions shown in Figure 6, the HTML that follows the end of the first scriptlet (the one containing the start of the for statement) is treated as if it were a Java statement. That statement consists of HTML elements which in turn contain a nested Java expression. The jsp_html_segment is the reverse of a jsp_interesting_element: it starts with a jsp_end token and ends with a jsp_start token. Its content is similar to regular HTML content, but does not include scriptlets. Opening another scriptlet thus results in a statement at the same scope level as the HTML text.

One final wrinkle is that Java code may also be invoked within the attribute values of tags, where it is usually enclosed in double quotes. This causes some difficulty in TXL since string literals are primitive tokens in TXL. We handle this with a simple lexical preprocessor that translates double-quoted attribute values containing Java scriptlets into double square brackets. The grammar for attributes in HTML tags is then extended to permit double square brackets as well as identifiers and string literals.
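For illustration, the preprocessing and the corresponding grammar extension might look like the following (the attribute shown is hypothetical, bracketed_scriptlet is an assumed token class for the double square bracket form, and stringlit_or_id is the attribute value production that also appears in Fig. 11):

    % an attribute such as      value="<%= cart.getTotal() %>"
    % is rewritten by the lexical preprocessor to
    %                           value=[[<%= cart.getTotal() %>]]
    % and the attribute value production is extended to accept it:

    redefine stringlit_or_id
        ...
        | [bracketed_scriptlet]
    end redefine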

3.2 Markup Grammars

Another agile parsing technique we use for analyzing and transforming web applications is markup grammars. Markup grammars extend the original grammar for some language to recognize markup that has been applied to the input.

    define jsp_element_or_html_seg
        [jsp_declaration_or_statement]
        | [jsp_expression]
        | [jsp_html_segment]
    end define

    define jsp_html_segment
        [jsp_end]
        [repeat jsp_html_segment_content]
        [jsp_start]
    end define

    define jsp_html_segment_content
        [jsp_useBean]
        | [jsp_formal_declaration]
        | [jsp_include_directive]
        | [jsp_include_action]
        | [jsp_forward]
        | [html_only_interesting_element]
        | [not jsp_delimiter] [html_element]
    end define

Fig. 6. Grammar for JSP Islands


Sometimes the markup is used as part of a transformation. For example, the use of a variable may be marked up with a unique identifier linking the declaration and all uses of the variable [14]. Markup can also be used to store details about a transformation. Some transformations can be complex to identify, but rather simple to carry out once the instances have been identified. The complex identification problem can be broken down into multiple simple transformations, each of which analyses some part of the application and adds markup to encode the information that has been deduced. Subsequent transformations can combine markup to produce more sophisticated markup [9,12,17]. Once the information has been identified and the appropriate elements annotated, the final transformation is straightforward.

Markup can also be used to convey information to further transformations or back to the user. For example, a slice is the minimal subset of a program that can affect the slicing criterion [16]. The slicing criterion is the value of one or more variables at a particular point in the program. Markup can be used to show both the slice and how the slice was computed in the context of the larger program. Figure 7 shows a small example of a slice in the Java portion of a JSP application using markup. The slicing criterion is identified using the slice tag, which annotates the stm.setString method call near the end of the code sequence. The set of variables active at each statement is calculated by a transformation and stored using the varset tag, and finally the elements of the slice are identified using the backslice tag. This markup is intended for further transformations and is not very readable. One of the transformations [16] used to present the result for human consumption wraps the entire JSP page in <pre> tags, translates all of the angle brackets of the HTML tags to ampersand notation (i.e. &lt; and &gt;) and translates all of the slicing markup to font tags with color (e.g. <font color="red">). This allows the code to be viewed in a standard web browser.

    if (!rs.next()) {
        PreparedStatement stm = con.prepareStatement("INSERT INTO Employee" +
            "(Username,Password,FirstName,LastName) VALUES(?,?,?,?)");
        stm.setString(1, request.getParameter("userName"));

Fig. 7. Implementing Slicing in Web Applications Using Markup
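As a sketch of what the grammar support for such markup might look like (statement stands for whichever Java statement non-terminal the base grammar uses; only the slice and backslice forms are shown, and the exact tag syntax used in [16] may differ):

    redefine statement
        ...
        | <backslice> [statement] </backslice>
        | <slice> [statement] </slice>
    end redefine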

4 Tasks

We have used the island based web application grammar as a core for several tasks. These include identical and near miss clone detection and conversion of classic Java Server Pages (JSP) to JSP with custom tags. For each task, we make small modifications to the grammar using TXL’s grammar redefinition mechanism.


    include "HTML.Grm"
    include "ASP.Grm"

    function main
        export Seq [number]
            1
        replace [program]
            P [program]
        by
            P [exportEachIsland]
    end function

    rule exportEachIsland
        replace $ [interesting_element]
            I [interesting_element]
        import Seq [number]
        construct FileName [id]
            CloneCandidate_ [+ Seq]
        export Seq
            Seq [+ 1]
        by
            I [write FileName]
    end rule

Fig. 8. Island Extraction Transform

4.1 Clone Detection

The first task is clone detection. One may wish to limit clone detection to significant syntactic units. This is particularly important if you are looking for structural clones in web sites: cloned text in the body of the page is not as interesting, except to the extent that it participates in the cloned structure. An example is a menu bar on the top or side of a web page that is implemented directly in HTML. Such a menu bar might be implemented using tables. Since islands represent the significant structural elements, they form an ideal basis for clone detection and resolution.

Our approach [11,22] to clone detection is to write each island out to a separate file. This is accomplished with a very simple transformation, shown in Figure 8. As explained previously, TXL grammars contain standard formatting cues, so each island is written in a standard format. The detection of identical clones is accomplished by comparing the files; this essentially turns a structural comparison into a lexical comparison. Figure 9 shows an example table clone candidate. Since we are dealing with identical clones, we only show a single instance of the clone. While this is essentially the same as pretty printing the code and using lexical analysis to find the clones, the difference is that the use of the island grammar limits the output to individual islands. Thus the clone candidates are limited to the structural elements of the page identified by the islands.

File CloneCandidate_3:

    <table>
    <tr> <td> foo </td> <td> bar </td> </tr>
    <tr> <td> abc </td> <td> xyz </td> </tr>
    </table>
Fig. 9. Formatted Table Island


    define div_table_tag
        [SPOFF] <div [repeat html_any_tag_param] > [SPON] [NL]
        [repeat html_div_content]
        [opt div_table_tag_close]
    end define

    define div_table_tag_close
        [SPOFF] </div> [SPON] [NL]
    end define

    define html_div_content
        [html_document_element]
    end define

    redefine interesting_element
        [div_table_tag]
        | ...
    end redefine

Fig. 10. Adding Div Islands

To change the structural elements that are considered for the clone analysis, you need only change the grammar to identify which elements are possible clones. This is analogous to the typedef example earlier, where we changed the grammar to simplify the rules. In our system, islands were tables, forms and links; thus only tables, forms and links are potential clones. One of the latest trends in web sites is to use the HTML div tag, rather than HTML tables, to build navigation menus. Adding the definitions from Figure 10 to the program in Figure 8 adds the div tag island to the grammar (by redefining interesting_element to include div_table_tag) and thus makes div elements clone candidates. Once identified, clones in web applications can be resolved using a variety of techniques [22]. The multilingual island grammar provides a framework for such transformations.

A similar approach can be used for locating identical clones in conventional programming languages. Rather than parse the entire language, the elements that you wish to consider as clones can be isolated using an island grammar. For example, one might construct an island grammar which selects Java method headers and a water grammar that balances braces (i.e. '{' and '}'), inserting newlines at all semicolons (';'); a sketch of such a grammar is given below. This would permit the identification of identical method clones. One could also add control statements as islands. Since islands nest, this would both make control statements candidates for clones and change the formatting of control statements within method islands. The change of formatting would change the way in which control statements contribute to the identification of identical method clones.

This method can also easily be extended to handle near-miss clones. Near-miss clones are clones that differ only by a small amount. A web application example is the same menu bar, but with the current page highlighted: the menu bar on the home page has the home entry shown in a different color, while the contact page has the contact entry highlighted. We extend our approach to handle near-miss clones by modifying the grammar to include different formatting cues. Figure 11 shows the redefinitions to the table grammar that handle near-miss clone detection. The only difference is the addition of [NL] tokens in the appropriate places. In this case, we add newlines after the tag identifiers table, tr and td, after each attribute, and after the closing angle bracket. Figure 12 shows two near-miss table clone islands formatted by this grammar. The arrows identify the differing lines: one is an attribute of the table, the other is the contents of one of the table cells.
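A minimal sketch of such a brace-balancing water grammar in TXL (the non-terminal names are illustrative, and token_or_key is the any-token production used in our HTML grammar):

    define balanced_block
        '{ [NL]
            [repeat block_element]
        '} [NL]
    end define

    define block_element
        [balanced_block]
        | '; [NL]
        | [not close_brace] [token_or_key]
    end define

    define close_brace
        '}
    end define

Method headers recognized as islands immediately before a balanced_block would then be the candidate clones; everything else remains water.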


    redefine html_table_tag
        <table [NL]
            [repeat html_tag_param]
        > [NL]
        [repeat html_table_content]
        [opt html_table_tag_closing]
    end redefine

    redefine html_table_row
        <tr [NL]
            [repeat html_tag_param]
        > [NL]
        [repeat html_th_or_td]
        [NL]
    end redefine

    redefine html_tag_param
        [id] = [stringlit_or_id] [NL]
    end redefine

    redefine td
        <td [NL]
            [repeat html_tag_param]
        > [NL]
        [repeat html_document_element]
        [NL]
    end redefine

    redefine html_table_tag_closing
        </table> [NL]
    end redefine

Fig. 11. Near Miss Clone Islands

Near-miss clones are then identified using a threshold on the ratio of the number of changed lines to the number of identical lines. For example, under a hypothetical threshold of 30%, two islands that differ on 2 of their 20 formatted lines would be reported as near-miss clones, while two islands that differ on half of their lines would not. The formatting control allows us to control the balance between the structural and the water elements of the islands. Most islands have some water in them; tables usually have some content in each of the cells. By formatting the tag and HTML attributes on separate lines and leaving the water on a single line, we control the balance between the weight of the structural elements and the water elements. A single difference in a tag (e.g. TH instead of TD) or attribute produces an additional line difference, while multiple changes in the same section of water will only produce a single line change.

[Two table islands formatted by the grammar of Fig. 11; the islands differ in one attribute line of the table tag and in the contents of one cell (xyz versus xyzzy).]

Fig. 12. Near Miss Clone Islands


    define custom_tag
        [ctag_start]
        [repeat ctag_element]
        [ctag_end]
    end define

    define ctag_start
        < [tag_name] [repeat tag_attr] >
    end define

    define ctag_end
        </ [tag_name] >
    end define

    define tag_name
        [id] : [id]
    end define

    define ctag_element
        [not ctag_end] [html_document_element]
    end define

    redefine interesting_element
        ...
        | [custom_tag]
    end redefine

Fig. 13. Custom Tag Grammar Module

4.2 JSP Custom Tag Translation

JavaServer Pages (JSP) is one of the popular technologies for building web applications that serve dynamic content [1]. As indicated by the name, the scripting language in JSP is, by default, Java. In particular, scriptlets, which take the form "<% Java code %>", are used to embed Java code within HTML. The explicit use of scriptlets facilitates rapid prototyping but introduces complexity into the implementation: the mixture of HTML and Java blurs the distinction between presentation and business logic that is needed in larger applications. Such a separation not only makes a web application easier to maintain and evolve, but also allows individual developers with different skills to cooperate more efficiently.

The transformations in our approach to migrating from embedded Java to custom tags [26,11] require support from the grammar. This support takes several forms. We extend the HTML island grammar to handle custom tags (as well as Java). Figure 13 shows the grammar for the custom tags. It is very similar to the other markup grammars; the major addition is that tag names are now two identifiers separated by a colon. We also require unique-naming markup for the Java code to keep track of variables during control and data flow analysis. Each of the scriptlets and each Java variable is identified with a unique identifier. For example:

    String[] items = cart.getItems();

This information is supplemented with information about the custom tag to which each statement will belong. The markup is used by the transformation that removes the scriptlets from the JSP pages, replaces them with custom markup, and generates the Java classes that implement the custom tags. Since the variables may end up in different classes (for different custom tags), the unique naming is also used to perform dependency analysis to ensure that all variables accessed between different custom tags have appropriate get and set methods.

5 Conclusions

This paper has presented some of the ways agile parsing can be used as a basis for analyzing and transforming web applications. Web applications present a particular challenge for transformation since they often contain multiple languages, some of which are executed on the server side, while others are executed on the client side. We have examined two techniques in particular: robust island grammars, which uniformly represent all of the languages in a single parse, and markup grammars, which allow us to add annotations to web applications as part of various analysis and transformation tasks.

We have also examined how agile parsing has been used in several transformation tasks for web applications: structural clone detection and JSP migration. In both cases the general island grammar used to parse and transform the complex HTML pages was modified as part of the transformation task. In the clone identification task, the island grammar was used to identify clone candidates based on the syntactic structure provided by the islands; changing which features are parsed as islands changes what may be considered clones. The JSP migration task needed two changes to the grammar: the first is a markup that associates uses of Java variables with the declarations of those variables; the other is the addition of the custom tag grammar productions for the result of the transformation.

References

1. Badros, G., “JavaML: a Markup Language for Java Source Code”, Computer Networks, Vol. 33, No. 6 (June 2000), pp. 159-177.
2. Bell Canada, Datrix Abstract Semantic Graph: Reference Manual, version 1.4, Bell Canada Inc., Montreal, Canada, May 01, 2000.
3. van den Brand, M., van Deursen, A., Heering, J., de Jong, H., de Jonge, M., Kuipers, T., Klint, P., Moonen, L., Olivier, P., Scheerder, J., Vinju, J., Visser, C., and Visser, J., “The ASF+SDF Meta-Environment: a Component-Based Language Development Environment”, Proc. Compiler Construction 2001 (CC 2001), Lecture Notes in Computer Science, R. Wilhelm, ed., Vol. 1827, Springer-Verlag, 2001, pp. 365-370.
4. van den Brand, M., Heering, J., Klint, P., and Olivier, P., “Compiling Rewrite Systems: The ASF+SDF Compiler”, ACM Transactions on Programming Languages and Systems, Vol. 24, No. 4, July 2002, pp. 334-368.
5. van Deursen, A., and Kuipers, T., “Building Documentation Generators”, Proc. International Conference on Software Maintenance (ICSM 99), Oxford, England, 1999, pp. 40-49.
6. Baxter, I.D., and Pidgeon, C.W., “Software Change Through Design Maintenance”, Proc. 1997 International Conference on Software Maintenance, Bari, Italy, 1997, pp. 250-259.


7. Cordy, J.R., Halpern, C.D., and Promislow, E., “TXL: A Rapid Prototyping System for Programming Language Dialects”, Computer Languages, Vol. 16, No. 1, 1991, pp. 97-107.
8. Cordy, J.R., “TXL - A Language for Programming Language Tools and Applications”, Proc. LDTA 2004, ACM 4th International Workshop on Language Descriptions, Tools and Applications, Edinburgh, Scotland, January 2005, pp. 3-31.
9. Cordy, J.R., Schneider, K., Dean, T., and Malton, A., “HSML: Design Directed Source Code Hot Spots”, Proc. 9th International Workshop on Program Comprehension (IWPC 01), Toronto, Canada, 2001, pp. 145-154.
10. Cordy, J., Dean, T., Malton, A., and Schneider, K., “Source Transformation in Software Engineering using the TXL Transformation System”, Special Issue on Source Code Analysis and Manipulation, Journal of Information and Software Technology, Vol. 44, No. 13, 2002, pp. 827-837.
11. Cordy, J.R., Dean, T., and Synytskyy, N., “Practical Language-Independent Detection of Near-Miss Clones”, Proc. 14th IBM Center for Advanced Studies Conference, Toronto, Canada, October 2004, pp. 29-40.
12. Dean, T., Cordy, J., Schneider, K., and Malton, A., “Experience Using Design Recovery Techniques to Transform Legacy Systems”, Proc. International Conference on Software Maintenance (ICSM 2001), Florence, Italy, 2001, pp. 622-631.
13. Dean, T.R., Cordy, J.R., Malton, A.J., and Schneider, K.A., “Agile Parsing in TXL”, Journal of Automated Software Engineering, Vol. 10, No. 4, October 2003, pp. 311-336.
14. Guo, X., Cordy, J., and Dean, T., “Unique Renaming in Java”, Proc. 3rd International Workshop on Source Code Analysis and Manipulation, Amsterdam, Netherlands, September 2003.
15. Hassan, A.E., and Holt, R.C., “Migrating Web Frameworks Using Water Transformations”, Proc. COMPSAC 2003: International Computer Software and Applications Conference, Dallas, Texas, USA, November 2003, pp. 296-303.
16. Li, X., Defining and Visualizing Web Application Slices, M.Sc. Thesis, School of Computing, Queen’s University, 2004.
17. Malton, A.J., Schneider, K.A., Cordy, J.R., Dean, T.R., Cousineau, D., and Reynolds, J., “Processing Software Source Text in Automated Design Recovery and Transformation”, Proc. 9th International Workshop on Program Comprehension (IWPC 2001), Toronto, Canada, 2001, pp. 127-134.
18. Moonen, L., “Generating Robust Parsers using Island Grammars”, Proc. 8th Working Conference on Reverse Engineering (WCRE 01), Stuttgart, Germany, 2001, pp. 13-22.
19. Moonen, L., “Lightweight Impact Analysis using Island Grammars”, Proc. 10th International Workshop on Program Comprehension (IWPC 02), Paris, France, 2002, pp. 343-352.
20. Neighbors, J., “The Draco Approach to Constructing Software from Reusable Components”, IEEE Transactions on Software Engineering, Vol. 10, No. 5, 1984, pp. 564-574.
21. Reasoning Systems, Refine User’s Manual, Palo Alto, California, 1992.
22. Synytskyy, N., Cordy, J.R., and Dean, T., “Resolution of Static Clones in Dynamic Web Pages”, Proc. IEEE 5th International Workshop on Web Site Evolution, Amsterdam, September 2003, pp. 49-58.
23. Synytskyy, N., Cordy, J., and Dean, T., “Robust Multilingual Parsing Using Island Grammars”, Proc. IBM Center for Advanced Studies Conference, Toronto, Canada, November 2003.
24. Tennent, R.D., Semantics of Programming Languages, Prentice Hall, 1990.


25. Visser, E., “Stratego: A Language for Program Transformation Based on Rewriting Strategies. System Description of Stratego 0.5”, Rewriting Techniques and Applications (RTA ’01), Lecture Notes in Computer Science, A. Middeldorp, ed., Springer-Verlag, 2001, pp. 357-361.
26. Xu, S., and Dean, T., “Transforming Embedded Java Code into Custom Tags”, Proc. 5th International Workshop on Source Code Analysis and Manipulation, Budapest, Hungary, October 2005, pp. 173-182.
27. Xu, S., Modernizing JavaServer Pages by Transformation, M.Sc. Thesis, School of Computing, Queen’s University, 2005.
