UNIVERSIDAD POLITÉCNICA DE MADRID

ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN

CONTRIBUTION TO THE AUTOMATION OF QUALITY CONTROL OF WEB APPLICATIONS

TESIS DOCTORAL

BONIFACIO GARCÍA GUTIÉRREZ Ingeniero de Telecomunicación 2011

DEPARTAMENTO DE INGENIERÍA DE SISTEMAS TELEMÁTICOS

ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN

UNIVERSIDAD POLITÉCNICA DE MADRID

CONTRIBUTION TO THE AUTOMATION OF SOFTWARE QUALITY CONTROL OF WEB APPLICATIONS

Autor: BONIFACIO GARCÍA GUTIÉRREZ Ingeniero de Telecomunicación

Director: JUAN CARLOS DUEÑAS LÓPEZ Doctor Ingeniero de Telecomunicación

2011

Tribunal nombrado por el Magfco. y Excmo. Sr. Rector de la Universidad Politécnica de Madrid, el día 26 de julio de 2011.

Presidente: ______

Vocal: ______

Vocal: ______

Vocal: ______

Secretario: ______

Suplente: ______

Suplente: ______

Realizado el acto de defensa y lectura de la Tesis el día 9 de septiembre de 2011 en la E.T.S.I.T. habiendo obtenido la calificación de ______

EL PRESIDENTE LOS VOCALES

EL SECRETARIO

A mis padres

Agradecimientos

Uno de los conceptos clave de los que va a tratar esta tesis doctoral es la búsqueda de los diferentes caminos que se pueden recorrer para lograr un determinado objetivo. Esta idea no dista mucho de las vivencias de las personas. En estos últimos años he tenido que recorrer diferentes caminos, muchos buenos, otros no tanto, e incluso alguno terriblemente duro. Es por ello que, llegados al punto en el que se cierra este ciclo, es el momento de echar la vista atrás y acordarme de todas las personas que han hecho posible este viaje.

Quiero en primer lugar expresar mi agradecimiento más sincero a Juan Carlos Dueñas por hacer posible esta tesis. Hace casi 5 años me brindaste la oportunidad de trabajar en la universidad como investigador. Transcurrido este tiempo, me gustaría agradecer una y mil veces todo el apoyo y confianza que has depositado en mí. Aparte de tu brillantez para dirigir esta tesis, por encima de todo quiero destacar tu calidad humana, tu cercanía y comprensión, que me han ayudado siempre a superar los momentos difíciles. Puedo afirmar sin duda alguna que el mayor éxito que he conseguido en mi vida profesional es haber trabajado a tu lado.

En segundo lugar, quiero expresar mi gratitud a todos los compañeros de laboratorio con los que he compartido tanto tiempo: Álvaro, Antonio, Bea, Chema, Marta, Félix, Freakant, Hugo, José Ignacio, José Luis, Laura, Lorena, Mar, Rodrigo, Rubén, Samuel y Sandra. Muchas gracias también a July, que es el auténtico motor del laboratorio.

I would like to thank the European partners who have made possible my research stay at VTT-Espoo (Finland) during summer 2010. Thank you very much to Juha Pärssinen and Hannu Honka for making this journey possible. Special thanks to Arto Laikari and the rest of the group: Janne, Juha, Julia, Vesa, Jukla, Kari. Thank you to Anne Kontula for helping us during the stay.

En esta ronda de agradecimientos no me puedo olvidar de los amigos de siempre. Aquellos con los que siempre puedes contar para echarte unas risas, sin las cuales muchas veces no valdría la pena el esfuerzo: Álvaro, Amalia, Ana, Aurora, Barrix, Chechu, Fátima, Gari, Iván, Jesús, Kike, Laura, María, Marta, Miky, Riky, Santos, Tomate y Vanesa.

El agradecimiento más especial es para mi chica, Vero. Muchas gracias por el cariño que me demuestras día a día, por tu apoyo y ayuda incondicional, y por compartir tantos momentos juntos.

El agradecimiento más especial quiero que sea para mis hermanas: Yoly e Inma. Solo vosotras sabéis bien por todo lo que hemos pasado. Sólo me gustaría expresar el orgullo que tengo de ser vuestro hermano y espero estar ahí siempre para vosotras. Además, habéis traído al mundo (sin la inestimable ayuda de Mario y Rubén respectivamente) a las personillas más importantes que puede haber. Me refiero a Andrea, Silvia y la recién llegada Laura. Su alegría es el medio más poderoso que conozco para encarar el futuro con optimismo. Por supuesto quiero acordarme también del resto de mi familia: abuelos, tíos y primos.

Por último, pero en el primer lugar de mi corazón, quiero acordarme de mis padres. No habría cosa que más me gustase en el mundo que el que me hubieseis podido ver culminando esta etapa de mi vida. Seguro que os sentiríais muy orgullosos de mí, tanto como yo lo soy de ser hijo vuestro. Quiero daros las gracias por todo lo que luchasteis en vuestra vida por nosotros. Siempre os llevo conmigo; tened por seguro que de lo poco que puedo presumir es de ser hijo de Pablo y Dolores.


Abstract

The Web has become one of the most influential instruments in the history of mankind. Therefore, web application development is a hot topic in the Software Engineering domain. In this context, software quality is a key concept, since it determines the degree to which a system meets its requirements and the expectations of its customers and/or users. Quality control (also known as verification and validation) is the set of activities designed to assess a software system in order to ensure its quality. Therefore, the quality control process ensures that applications meet their requirements while reducing the number of defects. The two core activities in quality control are testing and analysis. On the one hand, testing is a dynamic method, i.e., it assesses the responses of a running system. On the other hand, analysis is static, i.e., it assesses the software artefacts (e.g., source code, models, and so on) without executing them.

The current web applications market is defined by fierce global competition. This market can be characterized along three dimensions: quality, cost, and time to market. In order to minimize costs and time to market in the development of web applications, it is a very common practice to reduce or eliminate quality control processes. This fact has a direct impact on the low quality of such applications. The automation of quality control activities helps to improve the overall quality of the software developed while reducing development time and costs.

This PhD dissertation proposes a set of techniques to automate quality control (testing and analysis) for web applications. The heterogeneous nature of web applications makes quality control activities complex. Web applications are based on a client-server architecture. This dissertation focuses on the client side of web systems, since it is the differentiating factor of such applications. According to the ISO-9126 standard, quality in use is the quality perceived by the users of an application during its operation and maintenance phases. This type of quality is determined by the external quality (properties of the system during its execution) and the internal quality (static properties of the system). Thus, the quality in use of web applications is always perceived from the client side.

The quality control process proposed in this dissertation is based on the automation of the navigation of web applications. The functional and non-functional requirements of the system under test guide the process. Regarding non-functional requirements, testing and analysis address the quality attributes considered the most important for web applications: performance, security, compatibility, usability, and accessibility. The first step in this automation is defining the structure of the navigation. To achieve this aim, existing software artefacts from the analysis and design phases of the web applications under test are reused as far as possible. Then, as the navigation is automated, different kinds of tests and analyses are performed in the various states of the navigation. The aggregation of the verdicts of this evaluation is stored in an automatically generated report that contains the defects and potential issues found.

The processes and methods proposed in this dissertation have been implemented by means of a reference architecture. In addition, several experiments and case studies have been conducted in order to assess the proposal. This work has been carried out in different national and international research projects, mainly ICT-ROMULUS, ITEA-MOSIS, and Factur@.



Resumen

La Web se ha convertido en uno de los instrumentos más influyentes de la humanidad. El desarrollo de aplicaciones web es por tanto un tema de capital importancia en el mundo de la Ingeniería de Software. En este ámbito, la calidad de software es un concepto clave ya que determina el grado en el que un sistema cumple sus requisitos y satisface las expectativas de sus clientes y/o usuarios. El control de calidad (también conocido como verificación y validación) es el conjunto de actividades dirigidas a evaluar un sistema software con el objetivo de asegurar la calidad del mismo. El control de calidad es por tanto el proceso encargado de asegurar que se cumplen los requisitos de las aplicaciones al tiempo que se elimina (o se reduce al máximo) el número de defectos en las mismas. Las dos actividades básicas del control de calidad son las pruebas y el análisis. Las pruebas son de naturaleza dinámica, esto es, se evalúan las respuestas de un sistema en ejecución. Por el contrario, el análisis es de naturaleza estática, es decir, se evalúan los artefactos que componen el software en cuestión (por ejemplo, su código fuente, modelos, etc.) sin ejecutarlo.

El mercado de las aplicaciones web está determinado por una competencia global dirigida por tres ejes: calidad, costes y tiempo de salida al mercado. Para minimizar costes y tiempo de salida al mercado, es una práctica muy común en el desarrollo de aplicaciones web la reducción o eliminación de los procesos de control de calidad, aminorando por tanto la calidad final de las aplicaciones web. La automatización de las actividades de control de calidad ayuda a mejorar la calidad global del software desarrollado al tiempo que se reducen los tiempos de desarrollo y los costes.

Esta tesis doctoral propone un conjunto de técnicas para automatizar el control de calidad (pruebas y análisis) de aplicaciones web. La naturaleza heterogénea de las aplicaciones web hace que las actividades de control de calidad sean complejas. Las aplicaciones web están basadas en una arquitectura cliente-servidor. Esta tesis está centrada en la parte cliente de los sistemas web, ya que es el factor diferenciador de este tipo de aplicaciones. Según el estándar ISO-9126, la calidad en uso es la calidad percibida por los usuarios de las aplicaciones durante las fases de operación y mantenimiento de dichas aplicaciones. Este tipo de calidad está determinada por la calidad externa (propiedades del sistema durante su ejecución) e interna (propiedades estáticas del sistema) del sistema en cuestión. Así pues, la calidad en uso de las aplicaciones web se percibe siempre desde el lado cliente de dichas aplicaciones.

El proceso de control de calidad propuesto en esta tesis doctoral está basado en la automatización de la navegación de las aplicaciones web. Los requisitos funcionales y no funcionales del sistema bajo pruebas guiarán el proceso. Respecto a los requisitos no funcionales, se realizarán pruebas y análisis para los atributos de calidad considerados como los más importantes para aplicaciones web: rendimiento, seguridad, compatibilidad, usabilidad y accesibilidad. El primer paso en esta automatización consistirá en definir la estructura de navegación de la aplicación. Para ello se usarán (y reutilizarán en la medida de lo posible) artefactos software existentes en las fases de análisis y diseño de las aplicaciones web bajo prueba. A continuación, según se lleve a cabo la navegación de forma automática, se realizarán diferentes tipos de pruebas y análisis en los diferentes estados por los que va pasando el sistema según avanza la navegación. La agregación de los veredictos de dicha evaluación será almacenada en un informe generado automáticamente que contendrá los diferentes tipos de defectos encontrados, así como problemas potenciales en los atributos de calidad previamente seleccionados.

Los procesos y métodos propuestos en esta tesis han sido puestos en marcha mediante una arquitectura e implementación de referencia. Además, se han llevado a cabo diferentes experimentos y casos de estudio para evaluar la validez de la propuesta. Este trabajo ha sido llevado a cabo en diferentes proyectos nacionales e internacionales de investigación, principalmente en los proyectos ICT-ROMULUS, ITEA-MOSIS y Factur@.


Table of Contents

Chapter 1. Motivation ...... 1
  1.1. Research Methodology ...... 4
  1.2. Structure of the document ...... 5

Chapter 2. State of the Art ...... 7
  2.1. Software Quality ...... 7
    2.1.1. Quality Engineering ...... 7
    2.1.2. Quality Assurance ...... 9
    2.1.3. Verification and Validation ...... 12
  2.2. Static Analysis ...... 14
    2.2.1. Inspections ...... 14
    2.2.2. Review ...... 15
    2.2.3. Automated Software Analysis ...... 15
    2.2.4. Formal Methods ...... 15
  2.3. Software Testing ...... 17
    2.3.1. Testing Levels ...... 18
    2.3.2. Testing Methods ...... 21
  2.4. Testing of Web Applications ...... 24
    2.4.1. Web Testing Levels ...... 24
    2.4.2. Web Testing Strategies ...... 26
    2.4.3. Non-Functional Web Testing ...... 27
    2.4.4. Web Testing Tools ...... 28
  2.5. Automated Software Testing ...... 29
    2.5.1. Test Case Generation ...... 29
    2.5.2. Test Data Generation ...... 31
    2.5.3. Automated Test Oracles ...... 35
    2.5.4. AST Frameworks ...... 36
    2.5.5. AST Frameworks for Web Applications ...... 37
  2.6. Summary ...... 41

Chapter 3. Objectives ...... 43

Chapter 4. Methodology Foundations ...... 47
  4.1. Web Applications ...... 47
  4.2. Automated Quality Control Activities ...... 49
    4.2.1. Automated Software Testing ...... 49
    4.2.2. Automated Software Analysis ...... 52
  4.3. Quality Views ...... 53
    4.3.1. Functionality ...... 53
    4.3.2. Performance ...... 54


    4.3.3. Security ...... 54
    4.3.4. Compatibility ...... 54
    4.3.5. Usability ...... 55
    4.3.6. Accessibility ...... 55
  4.4. Test Process ...... 56
  4.5. Summary ...... 59

Chapter 5. Automated Functional Testing ...... 61
  5.1. Scope of the Dissertation ...... 62
  5.2. Approach ...... 65
  5.3. Modelling Web Navigation ...... 68
    5.3.1. UML Models ...... 68
    5.3.2. XML Files ...... 74
    5.3.3. R&P Approach ...... 76
  5.4. Finding the Paths in a Multidigraph ...... 77
  5.5. Summary ...... 82

Chapter 6. Automated Non-Functional Assessment ...... 85
  6.1. Approach ...... 85
  6.2. Automated Non-Functional Testing ...... 87
    6.2.1. Performance ...... 87
    6.2.2. Security ...... 91
  6.3. Automated Non-Functional Analysis ...... 94
    6.3.1. Compatibility ...... 94
    6.3.2. Usability ...... 95
    6.3.3. Accessibility ...... 97
  6.4. Summary ...... 97

Chapter 7. Architecture ...... 99
  7.1. Tools Integration ...... 99
  7.2. Tool Survey ...... 100
    7.2.1. Functionality ...... 100
    7.2.2. Performance ...... 104
    7.2.3. Security ...... 106
    7.2.4. Compatibility ...... 107
    7.2.5. Usability ...... 111
    7.2.6. Accessibility ...... 112
  7.3. Automatic Testing Platform ...... 114
    7.3.1. Test Cases ...... 116
    7.3.2. ATP Extension ...... 117
    7.3.3. Web Site Java Modelling ...... 118


  7.4. ATP4Romulus ...... 119
  7.5. Summary ...... 121

Chapter 8. Validation ...... 123
  8.1. Research Questions ...... 123
  8.2. Factur@ ...... 124
    8.2.1. System Description ...... 124
    8.2.2. Pre-Automation ...... 126
    8.2.3. Configuration ...... 131
    8.2.4. Generation ...... 133
    8.2.5. Post-Automation ...... 137
    8.2.6. Execution ...... 137
    8.2.7. Reports ...... 138
  8.3. Romulus Demonstrators ...... 142
  8.4. Conclusions ...... 147

Chapter 9. Conclusions ...... 151
  9.1. Main Contributions ...... 151
  9.2. Future work ...... 154

References ...... 157

Annex I: Navigation XSD Schema ...... 165

Annex II: Example of ATP Report ...... 169

Annex III: Curriculum Vitae ...... 181



List of Figures

Figure 1. Verification & Validation in Context ...... 2
Figure 2. Web Evolution ...... 3
Figure 3. Market Dimensions for Web Applications ...... 3
Figure 4. Software Quality Engineering Process ...... 8
Figure 5. ISO/IEC-9126 Quality Lifecycle ...... 10
Figure 6. ISO/IEC-9126 Quality Model (External and Internal Quality) ...... 10
Figure 7. ISO/IEC-9126 Quality Model (Quality in Use) ...... 11
Figure 8. Verification & Validation Schema ...... 13
Figure 9. Unit Testing ...... 19
Figure 10. Test Data Generation Techniques ...... 32
Figure 11. Generic Search Based Test Input Generation Scheme ...... 34
Figure 12. Software Engineering Layers ...... 44
Figure 13. Typical Web Applications Architecture ...... 47
Figure 14. Software Defects in Context ...... 49
Figure 15. Fault Origin/Detection Distribution and Cost ...... 49
Figure 16. Generic Testing Activities ...... 51
Figure 17. Generic Analysis Activities ...... 52
Figure 18. Transition-based Coverage Criteria ...... 59
Figure 19. Methodology Levels ...... 59
Figure 20. Methodology Quality Dimensions ...... 60
Figure 21. Methodology Process ...... 60
Figure 22. Web Site and Quality Control Metamodel ...... 64
Figure 23. Automated Functional Testing Schematic Diagram ...... 67
Figure 24. Use Case Diagram Example ...... 71
Figure 25. Activity Diagram Example ...... 71
Figure 26. Activity Diagram with Complex Transition ...... 73
Figure 27. Presentation Diagram Example ...... 73
Figure 28. XSD Graphic Representation for a Web Site ...... 74
Figure 29. XSD Graphic Representation for a ...... 75
Figure 30. XSD Graphic Representation for a Web Transition ...... 75
Figure 31. Digraph Example ...... 79
Figure 32. Node Reduction Example ...... 80
Figure 33. Node Reduction vs. CPP Costs ...... 81
Figure 34. Node Reduction vs. CPP Time ...... 81
Figure 35. MBT Taxonomy ...... 82
Figure 36. Automated Non-Functional Testing and Analysis Schematic Diagram ...... 87
Figure 37. Response Time Latency ...... 90
Figure 38. Browser Use Evolution since 2002 ...... 94
Figure 39. Browser Use on March 2011 ...... 95
Figure 40. IDE ...... 102
Figure 41. Recorded Script in HTML ...... 102
Figure 42. ATP Process ...... 115
Figure 43. ATP Architecture ...... 116


Figure 44. 3-Tier ATP Methodology ...... 117
Figure 45. 3- Method for Adding New Generators in ATP ...... 118
Figure 46. Web Application Model ...... 119
Figure 47. 3- ATP4Romulus Architecture ...... 120
Figure 48. Factur@ Architecture ...... 125
Figure 49. Factur@ Architecture ...... 125
Figure 50. Factur@ Administration Module Screenshot ...... 126
Figure 51. Factur@ Use Case Diagram ...... 127
Figure 52. Login ...... 128
Figure 53. New Company ...... 128
Figure 54. New Administrator ...... 128
Figure 55. Search Company ...... 128
Figure 56. Search Administrator ...... 128
Figure 57. Presentation Diagrams ...... 129
Figure 58. Input Folder ...... 132
Figure 59. Generated Project ...... 133
Figure 60. Test Data for Login state in Path 1 ...... 136
Figure 61. Test Oracles for Init state in Path 1 ...... 136
Figure 62. Test Data and Oracles for Login state in Path 2 ...... 136
Figure 63. Paths Found by CPP for Search Company Activity Diagram ...... 136
Figure 64. Ant Script ...... 138
Figure 65. ATP Report ...... 138
Figure 66. ATP Report ...... 139
Figure 67. Average Traffic in Iteration 1 of 02-login-ko ...... 140
Figure 68. Factur@ Error 500 ...... 140
Figure 69. EUProjectManager Screenshot ...... 143
Figure 70. Cornelius Screenshot ...... 143
Figure 71. Scrooge Screenshot ...... 143
Figure 72. Scrooge Web Performance Charts ...... 146
Figure 73. Cornelius Database Performance Charts ...... 147
Figure 74. Dissertation Summary ...... 154


List of Tables

Table 1. Black-Box Vs. White-Box Testing ...... 22
Table 2. Web Application Non-Functional Testing ...... 27
Table 3. Decision Table Template ...... 35
Table 4. Automated Software Testing Frameworks for Web Applications ...... 37
Table 5. Selenium Projects ...... 39
Table 6. Graph Types ...... 57
Table 7. Literals for Actions in Web Transitions ...... 63
Table 8. Test Data and Expected Outcome Template ...... 65
Table 9. UML 2.0 Diagrams ...... 68
Table 10. UML-Based Web Modelling Technologies Comparison ...... 70
Table 11. Techniques and Algorithms for Decomposing a Graph into Paths ...... 77
Table 12. Open-source Java Graphs Libraries ...... 81
Table 13. Functional Web Tools ...... 101
Table 14. Functional Web Tools Comparison ...... 101
Table 15. Browser Compatibility of Selenium ...... 102
Table 16. Selenium Commands Subset ...... 103
Table 17. Web Performance Tools ...... 104
Table 18. Performance Web Tools Comparison ...... 105
Table 19. Web Application Scanners ...... 106
Table 20. Web Application Scanners Comparison ...... 107
Table 21. Snapshots Compatibility Tools ...... 107
Table 22. Performance Web Tools Comparison ...... 109
Table 23. HTML Checkers ...... 110
Table 24. HTML Checkers Comparison ...... 110
Table 25. CSS Checkers ...... 110
Table 26. HTML Checkers Comparison ...... 111
Table 27. Broken Tools ...... 111
Table 28. Usability Guideline Inspection Tools ...... 112
Table 29. Accessibility Guideline Tools ...... 112
Table 30. ATP Components ...... 115
Table 31. Factur@ Metrics Summary ...... 126
Table 32. Factur@ Administration Testing Figures ...... 126
Table 33. New Administrator in HTML Format ...... 130
Table 34. Summary Report for 01-login-ok.html ...... 139
Table 35. Summary Report for 02-login-ko.html ...... 139
Table 36. Summary Report for 03-new_company.html ...... 140
Table 37. Summary Report for 04-edit_company.html ...... 141
Table 38. Summary Report for 05-new_admin.html ...... 141
Table 39. Summary Report for 06-edit_admin.html ...... 142
Table 40. Summary Report for Factur@ ...... 142
Table 41. ATP4Romulus Results ...... 144
Table 42. EUProjectManager Test Results ...... 145
Table 43. Pros and Cons of XMI, XML and HTML ...... 148



List of Snippets

Snippet 1. Watir Script Example ...... 40
Snippet 2. Procedure to Translate Guards into HTML Elements ...... 72
Snippet 3. XML-based Navigation Example ...... 76
Snippet 4. Path Expression using CPP ...... 79
Snippet 5. Path Expression using Node Reduction ...... 80
Snippet 6. Procedure to Translate Guards into HTML Elements ...... 103
Snippet 7. ATP in the Shell ...... 116
Snippet 8. Generators in ATP ...... 117
Snippet 9. Roma Metaframework ...... 120
Snippet 10. Login in XML Format ...... 129
Snippet 11. Setting root in ATP ...... 131
Snippet 12. Setting Navigation Folder in ATP ...... 131
Snippet 13. Configuration Parameters Listing in ATP ...... 132
Snippet 14. Test Case Generation in ATP ...... 133
Snippet 15. JUnit Test Case Generated for Login (XML) ...... 134
Snippet 16. Execution of Test Cases in ATP ...... 137
Snippet 17. Visualization of Reports in ATP ...... 138
Snippet 18. Installing ATP4Romulus in Roma Metaframework ...... 144
Snippet 19. GraphML Input for ATP4Romulus (EUProjectManager) ...... 144
Snippet 20. Test Case Generation in ATP4Romulus ...... 144
Snippet 21. Test Case Execution in ATP4Romulus ...... 144



Glossary

ACM Association for Computing Machinery
AI Artificial Intelligence
AJAX Asynchronous JavaScript and XML
ANN Artificial Neural Network
API Application Programming Interface
ASA Automated Software Analysis
ASP Active Server Pages
AST Automated Software Testing
ASVS Application Security Verification Standard
ATDG Automated Test Data Generation
ATI Automated Testing Institute
ATP Automatic Testing Platform
BCET Best-Case Execution Time
BFS Breadth-First Search
BNF Backus-Naur Form
BSD Berkeley Software Distribution
CASE Computer-Aided Software Engineering
CERN European Organization for Nuclear Research
CFG Control Flow Graph
COTS Commercial Off-The-Shelf
CPL Common Public License
CPP Chinese Postman Problem
CPT Chinese Postman Tour
CPU Central Processing Unit
CRS Customer Requirements Specification
CS Computer Science
CSS Cascading Style Sheets
CSV Comma Separated Value
DDBB Database
DDD Domain Driven Design
DDR Dynamic Domain Reduction
DFS Depth-First Search
DIT Departamento de Ingeniería de Sistemas Telemáticos
DOC Depended-On Component
DOM Document Object Model
DSL Domain-Specific Languages
EA Enterprise Architect
EDI Electronic Data Interchange
EFG Event-Flow Graph
EIG Event Interaction Graph
EPL Eclipse Public License
ER Entity-Relationship
ESIG Event Semantic Interaction Graph
ETSIT Escuela Técnica Superior de Ingenieros de Telecomunicación


EU European Union
EUPL European Union Public Licence
FIFO First-In First-Out
FSM Finite-State Machines
FTL FreeMarker Template Language
FWPTT Fast Web Performance Test Tool
GB Gigabyte
GDL Graph Description Language
GIF Graphics Interchange Format
GML Graph Modelling Language
GNU GNU's Not Unix
GPL GNU General Public License
GUI Graphical User Interface
HDM Hypermedia Design Model
HFPM Hypermedia Flexible Process Modelling
HP Hewlett-Packard
HTML HyperText Markup Language
HTTP Hypertext Transfer Protocol
HTTPS Hypertext Transfer Protocol Secure
IANA Internet Assigned Numbers Authority
I/O Input/Output
IBM International Business Machines
ICT Information and Communication Technologies
IDE Integrated Development Environment
IE Internet Explorer
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
IFN Info Fuzzy Network
IP Internet Protocol
IS Information Systems
ISBN International Standard Book Number
ISO International Standards Organization
IT Information Technology
JDBC Java DataBase Connectivity
JML Java Modelling Language
JPEG Joint Photographic Experts Group
JSP Java Server Pages
JTC Joint Technical Committee
JUNG Java Universal Network/Graph Framework
KB Kilobytes
LGPL Lesser General Public License
LLBB Local Bodies
LOC Lines Of Code
LTS Labelled Transition Systems
MBT Model-Based Testing
MIME Multipurpose Internet Mail Extensions


MDD Model-Driven Development
MDE Model-Driven Engineering
MIT Massachusetts Institute of Technology
MOF Meta-Object Facility
NATO North Atlantic Treaty Organization
NDT Navigational Development Techniques
NIST National Institute of Standards and Technology
NP Nondeterministic Polynomial
OCL Object Constraint Language
OOHDM Object Oriented Hypermedia Design Model
ORM Object Relational Mapping
OS Operating System
OWASP Open Web Application Security Project
PAS Publicly Available Specifications
PC Personal Computer
PHP Hypertext Preprocessor
PKCS Public-Key Cryptography Standards
PM Person-Month
PNG Portable Network Graphics
PV Visualization Prototype
QA Quality Assurance
QE Quality Engineering
RAM Random-Access Memory
RAVEN Rule-Based Accessibility Validation Environment
RC Remote Control
RE Requirement Engineering
RM&E Requirements Management and Engineering
RNA Relationship-Navigational Analysis
RQ Research Question
SBSE Search-Based Software Engineering
SC Subcommittee
SDL Specification and Description Language
SE Software Engineering
SEO Search Engine Optimization
SHDM Semantic Hypermedia Design Method
SME Small and Medium Enterprises
SOA Service-Oriented Architecture
SOHDM Scenario-based Object-Oriented Hypermedia Design Methodology
SPP Shortest Path Problem
SQL Structured Query Language
SRS Software Requirements Specification
STAF Software Testing Automation Framework
STL Software Testing Lifecycle
SUT System Under Test
SWEBOK Software Engineering Body of Knowledge
TCP Transmission Control Protocol


TDG Test Data Generation
TR Technical Report
TSP Traveling Salesman Problem
UBL Universal Business Language
UI User Interface
UML Unified Modelling Language
UPM Universidad Politécnica de Madrid
URL Uniform Resource Locator
UWE UML-based Web Engineering
VCG Visualizing Compiler Graphs
VDM Vienna Definition Method
W3
W3C World Wide Web Consortium
WAI Web Accessibility Initiative
WATIR Web Application Testing in Ruby
WCAG Web Content Accessibility Guidelines
WCET Worst-Case Execution Time
WE Web Engineering
WG Working Group
WP Work Package
WS Web Service
WSDL Web Services Description Language
WSDM Web Site Design Method
WWW World Wide Web
XHTML eXtensible HyperText Markup Language
XMI XML Metadata Interchange
XML Extensible Markup Language
XSD XML Schema
XSS Cross-site Scripting


Chapter 1. Motivation

Untested code is the dark matter of the software.

‐ Robert C. Martin

Software is the collection of computer programs, related data, and associated documentation developed for a particular customer or for a general market. Software is an essential part of the modern world, and it has become pervasive in telecommunications, utilities, commerce, culture, entertainment, and so on. The term was coined in contrast to the term hardware, i.e. the physical devices of a computer. The activity of using software and hardware is known as computing. Glass et al. break the computing field down into three main subdivisions [54], namely Computer Science (CS), Information Systems (IS), and Software Engineering (SE). According to Sommerville [132], CS focuses on the theory and fundamentals of information and computation; IS (sometimes referred to as systems engineering) is concerned with all aspects of computer-based systems development, including hardware, software, and processes; finally, SE is an engineering discipline concerned with all aspects of software production. The notion of SE was first proposed in 1968 at a NATO conference held to discuss the "software crisis", i.e. the unreliability problems in software caused by individual approaches that did not scale up to large and complex software systems [118].

The Software Engineering Body of Knowledge (SWEBOK) [1] establishes a boundary for SE, dividing this discipline into different knowledge areas, namely: software requirements, software design, software construction, software testing, software maintenance, software configuration management, software engineering management, software engineering process, software engineering tools and methods, software quality, measurement, and security. According to the Standard Glossary of Software Engineering Terminology [70], software quality is "the degree to which a system, component, or process meets specified requirements, and customer or user needs or expectations". Software Quality Engineering (QE) -sometimes referred to as Quality Management- is a discipline concerned with the improvement of software quality. The overall QE process includes three essential stages [138]: i) quality planning; ii) execution of selected Quality Assurance (QA) activities; iii) measurement and analysis to provide convincing evidence to demonstrate software quality (post-QA). Quality Assurance (QA) is the process of defining how software quality can be achieved and how the development organisation knows that the software has the required level of quality [47].


The most common QA activities are Verification and Validation (V&V) -also known as quality control-, which can be seen as a disciplined approach to assessing software products and services. As depicted in Figure 1, V&V can be divided into two big categories: testing (evaluating software by observing its execution, also known as dynamic analysis) [5] and static analysis (evaluating software without executing the code) [132]. Static analysis and testing are often confused, and both are mistakenly grouped under the term testing [1]. In this piece of research, static and dynamic techniques are treated separately but grouped into V&V (software quality control), which is the major topic of this dissertation.
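To make the distinction concrete, the following minimal sketch (illustrative code written for this explanation, not taken from the dissertation) uses JUnit 4 to show the dynamic side of quality control: the verdict is obtained by actually executing the unit under test. A static analysis tool, by contrast, would inspect the source text of the same method without running it and could, for instance, flag the missing validation of its arguments.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Unit under test (hypothetical example, not part of the dissertation's code base)
    class PriceCalculator {
        // Returns the price after applying a discount percentage (expected range: 0-100)
        static double discountedPrice(double price, int discountPercent) {
            // A static analyser could warn here: discountPercent is never validated
            return price - (price * discountPercent / 100.0);
        }
    }

    public class PriceCalculatorTest {
        @Test
        public void discountOfTenPercentIsApplied() {
            // Dynamic quality control: the verdict comes from executing the code
            assertEquals(90.0, PriceCalculator.discountedPrice(100.0, 10), 0.001);
        }
    }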

Figure 1. Verification & Validation in Context (quality engineering encompasses quality assurance, which encompasses verification & validation, in turn divided into testing and static analysis)

On the one hand, V&V (both static and dynamic techniques) plays a crucial role in the software development process, since it is necessary to meet the quality requirements of every software project [68]. On the other hand, both testing and static analysis are usually hard and time-consuming activities. Some studies have shown that testing is one of the most costly development processes, sometimes exceeding fifty per cent of the total development cost [13]. Therefore, V&V activities are often poorly performed or skipped by practitioners, creating an industry-wide deficiency in software quality control.

The advent of the Internet has brought new opportunities and challenges for SE. The Internet can be defined as a global system of interconnected networks using the Internet protocol suite TCP/IP (Transmission Control Protocol/Internet Protocol). This suite represents a synthesis of several standards developed mainly in the 1960s and 1970s. The TCP and IP protocols were created in 1974 by Vint Cerf and Bob Kahn -the so-called "Fathers of the Internet"- [112]. One of the most important services on the Internet is the World Wide Web (WWW, or simply the Web). The first web site was created in 1990 by Tim Berners-Lee and Robert Cailliau at CERN (the European Organization for Nuclear Research) in Geneva (Switzerland). That first web site consisted of a collection of documents with static content, encoded in the HyperText Markup Language (HTML). The basic element on which the Web is founded is the Hypertext Transfer Protocol (HTTP). HTTP is a client-server application protocol which defines a standard format to request resources on the Web [66].

Nowadays, the Web is not only an environment hosting simple and static documents, since several technologies have enhanced the web model, turning the Web into a multi-domain infrastructure for the execution of web applications and services. Current web applications comprise large-scale enterprise platforms, e-commerce systems, collaborative distributed environments, social networks, and so on. Web applications have continued to evolve as more and more technologies become available. As illustrated in Figure 2, we are now in the "web 2" era, with rich and dynamic web pages [12]. To address this growth of web systems and to ensure their quality, the discipline of Web Engineering (WE) has been defined [127]. All in all, the Web has become one of the most influential instruments not only in computing but in the history of mankind. Hence, the target of this dissertation is web applications.
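As a hedged illustration of HTTP's request/response model (hypothetical code, not part of the dissertation's tooling), the sketch below performs a plain HTTP GET with the standard java.net API and prints the status code and content type returned by the server; the URL is only a placeholder.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SimpleHttpGet {
        public static void main(String[] args) throws Exception {
            // Placeholder URL: replace with the resource to be requested
            URL url = new URL("http://www.example.org/index.html");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");

            // The server answers with a status code, headers and (optionally) a body
            System.out.println("Status: " + connection.getResponseCode());
            System.out.println("Content-Type: " + connection.getHeaderField("Content-Type"));

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                System.out.println("First line of the body: " + reader.readLine());
            } finally {
                connection.disconnect();
            }
        }
    }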

Figure 2. Web Evolution (WWW era, 199x-2003: HTTP, HTML, JavaScript, DHTML, CSS; Web 2 era, 2004-today: XML, AJAX, SOAP, SOA; Web 3 era, future: Semantic Web)

The lack of V&V described before is especially significant for web applications. Large and complex web applications with a growing number of potential users are more and more in demand nowadays. Hence, the current web applications market is defined by fierce global competition. This market can be characterized along three different dimensions: quality, cost, and time to market. As illustrated in Figure 3, producing quality web applications requires better, cheaper, and faster development processes [12].

Figure 3. Market Dimensions for Web Applications (quality - better; cost - cheaper; time to market - faster)

Nevertheless, the development of web applications has been in general ad hoc, and V&V is often neglected in web development, resulting in poor-quality web applications. Quality control of a web application may be expensive, but the impact of defects resulting from lack of testing and static analysis could be more costly. Therefore, in order to produce better web applications, there is an increasing need for methodologies and guidelines to develop such applications delivered on time (faster), within budget (cheaper), and with a high level of quality (better).


In the 1980s, software quality was the key SE problem. In this first part of the 21st century, time to market is usually more critical (especially for web applications). Quality is still an important factor, but it must be achieved in the context of rapid delivery. In order to reconcile the need for developing quality web applications with the urgency of the market while saving costs, the automation of software quality control activities might be the key [38][7][83][63]. The automation of software development processes (such as V&V) has a profound impact on the speed, quality, and cost of releasing software. There are some actions that are impossible to automate, for example some kinds of testing and analysis techniques that rely on experienced testers. However, the list of issues that can be automated is long [23]. All in all, the main problem I face in this dissertation is to find effective ways to improve the quality of web applications by means of the automation of their quality control (V&V) activities, i.e. software testing and analysis. This way, the problem of costly and time-consuming V&V activities could be alleviated, and therefore web applications would reach better quality levels.

1.1. Research Methodology

The automation of software testing and analysis is becoming a hot research topic in SE. Web applications are more and more difficult to assess due to their peculiarities. Therefore, the automation of quality control activities presents important research challenges. The work carried out in the ICT-ROMULUS project, a research project within the European Union Seventh Framework Programme for Research and Technology Development, was crucial to identify these challenges. In particular, the participation in Work Package (WP) 5 of this project, which studied the enhancement of the software quality of web applications from the conception of the software by means of automatic code and test generation techniques, was very useful to detect major drawbacks of current solutions and to point towards areas for improvement. On top of that, methods for automated functional and non-functional testing and static analysis were proposed. As a result of this work, the Automatic Testing Platform (ATP), a proof-of-concept system, was created and integrated into the open-source Romulus framework.

This initial work was continued in the ITEA-MOSIS project, a research project focused on managing variability and Domain-Specific Languages (DSL) for Model-Driven Development (MDD) of software-intensive systems. In this context, state-of-the-art modelling methods and technologies for web applications were studied and included in the automated quality control approach previously developed. As a result, Model-Based Testing (MBT) techniques were proposed in order to enhance the proposed automation of quality control. Finally, the electronic invoicing application developed using MDD in the context of the Factur@ project (an innovation project funded by Telvent and Comunidad de Madrid) was employed to perform a case study that validates the final work.

All in all, the process followed in this research has the following steps: i) identification of the problem at hand; ii) literature review (state of the art); iii) proposal of a solution; iv) validation of the proposed approach by means of experiments and case studies, and also by dissemination of the results in journals, conferences, project deliverables, and so on; v) synthesis of findings (conclusions) and definition of possible future work.


1.2. Structure of the document

After introducing the motivation and context of the research work, the document continues with a detailed chapter on the state of the art of software quality control. This chapter first introduces V&V in the context of software quality. After that, static analysis and traditional software testing are depicted. Finally, it describes in detail the testing of web applications and automated software testing. Then, chapter 3 establishes a more detailed description of the objectives of this research work. Chapter 4 describes the high-level decisions taken in this piece of research in order to achieve the stated goals. The next two chapters provide a fine-grained description of the main contributions of this dissertation: automated functional testing (chapter 5) and automated non-functional assessment (chapter 6) for web applications. Chapter 7 finalizes the description of the original contributions by thoroughly explaining the reference architecture proposed to perform the automation of quality control activities. Chapter 8 provides an extended summary of all the validation activities carried out in order to ensure the correctness of the results presented in this dissertation. Finally, chapter 9 establishes the main conclusions of this work, as well as a description of possible future research activities.



Chapter 2. State of the Art

In order to make an apple pie from scratch, you must first invent the universe.

‐ Carl Sagan

Software engineering is concerned with the practicalities of developing and delivering useful and quality software. This chapter presents the state of the art and practice of software quality control, i.e. V&V. First, I introduce the key concepts of software quality, quality assurance, and V&V in section 2.1. Then, the main techniques and methods for static analysis are presented in section 2.2. Software testing is described in section 2.3. Testing of web applications and automated testing are depicted in sections 2.4 and 2.5, respectively.

2.1. Software Quality

The question "What is software quality?" can generate different answers, depending on the role of the practitioners involved or the kind of software system [138]. Regarding people, there are different views and expectations based on their roles and responsibilities. There are two main groups of people involved in a software product or service. On the one hand, there are consumers, i.e. customers (responsible for the acquisition of software products or services) and users (people who use the software products or services for various purposes); nevertheless, the dual role of customer and user is quite common. On the other hand, producers are the people involved with the development, management, maintenance, marketing, and service of software products. The quality expectation of consumers is that a software system performs useful functions as specified. For software producers, the fundamental quality question is fulfilling their contractual obligations by producing software products that conform to the Service Level Agreement (SLA). Pressman's definition of software quality comprises both points of view [122]: "An effective software process applied in a manner that creates a useful product that provides measurable value for those who produce it and those who use it".

2.1.1. Quality Engineering

Quality Engineering (QE) -also known as Quality Management- is a process that evaluates, assesses, and improves the quality of software. There are three major groups of activities in the QE process, as depicted in Figure 4:


1. Quality planning (pre-QA activities). This stage establishes the overall quality goal by managing customer expectations under the project cost and budgetary constraints. The quality plan also includes the QA strategy, i.e. the selection of the QA activities to perform and the appropriate quality measurements to provide feedback and assessment.

2. Quality Assurance (in-QA activities). This stage guarantees that software products and processes in the project life cycle meet their specified requirements by planning and performing a set of activities that provide adequate confidence that quality is being built into the software. The main QA activity is V&V, but there are others such as software quality metrics, quality standards, configuration management, documentation management, or experts' opinion.

3. Quality quantification and improvement (post-QA activities): measurement, analysis, feedback, and follow-up activities. These analyses provide a quantitative assessment of product quality and the identification of improvement opportunities.

Figure 4. Software Quality Engineering Process (quality plan: quality goal, quality strategy; quality assurance: V&V, quality metrics, quality standards, etc.; post-QA: measurement, analysis, feedback, follow-up)

2.1.1.1. Requirements and Specification

Requirements are a key topic in the QE domain. A requirement is a statement identifying a capability, physical characteristic, or quality factor that bounds a product or process need for which a solution will be pursued. Requirements development (also known as requirements engineering) is the process of producing and analysing customer, product, and product-component requirements. The set of procedures that support the development of requirements, including planning, traceability, impact analysis, change management, and so on, is known as requirements management. Requirements Management and Engineering (RM&E) is the overall term used to include all requirements-related processes [67]. There are two kinds of software requirements [126]:

- Functional requirements are actions that the product must do to be useful to its users. They arise from the work that stakeholders need to do. Almost any action (inspect, publish, or most other active verbs) can be a functional requirement.
- Non-functional requirements are properties, or qualities, that the product must have. For example, they can describe such properties as look and feel, usability, or security. They are often called quality attributes.

Another important topic strongly linked with requirements is the specification, which is a document that specifies, in a complete, precise, verifiable manner, the requirements, design, behaviour, or other characteristics of a system and, often, the procedures for determining whether these provisions have been satisfied [140]. For example, a (non-functional) requirement might be "the product response time shall be less than 0.25 second". The specification for this requirement would include technical information about specific design aspects. It is important to distinguish between the specification supplied by a customer, known as a Customer Requirements Specification (CRS), and the specification created by the developers, known as a Software Requirements Specification (SRS). This second kind of specification is a complete description of the behaviour of the system to be developed, and includes the definition of the system use cases. A use case can be seen as a way of documenting functional requirements, describing the interactions between the users and the system.
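As an illustration of how such a non-functional requirement can be made checkable, the following sketch (hypothetical code, assuming JUnit 4 and a locally deployed system under test at a placeholder URL) turns the 0.25-second response-time requirement into an executable test that fails when the measured time exceeds the threshold.

    import static org.junit.Assert.assertTrue;

    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.junit.Test;

    public class ResponseTimeRequirementTest {

        // Threshold taken from the example requirement: response time < 0.25 s
        private static final long MAX_RESPONSE_TIME_MS = 250;

        @Test
        public void homePageRespondsWithinLimit() throws Exception {
            long start = System.currentTimeMillis();

            // Placeholder URL of the system under test
            URL url = new URL("http://localhost:8080/");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            int status = connection.getResponseCode(); // forces the request to complete
            connection.disconnect();

            long elapsed = System.currentTimeMillis() - start;
            assertTrue("HTTP status was " + status, status == 200);
            assertTrue("Response took " + elapsed + " ms", elapsed <= MAX_RESPONSE_TIME_MS);
        }
    }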

2.1.2. Quality Assurance

Quality Assurance (QA) is a "systematic, planned set of actions necessary to provide adequate confidence that the software development and maintenance process of a software system product conforms to established specification as well as with the managerial requirements of keeping the schedule and operating within the budgetary confines" [47]. QA is primarily concerned with defining or selecting the standards that should be applied to the software development process or software product. Moreover, the QA process selects the V&V activities, tools, and methods to support these standards [132]. V&V is a set of activities carried out with the main objective of withholding products from shipment if they do not qualify. In contrast, QA is meant to minimize the costs of quality by introducing a variety of activities throughout the development and maintenance process in order to prevent the causes of errors, detect them, and correct them in the early stages of development. As a result, QA substantially reduces the rates of non-qualifying products. All in all, V&V activities are only a part of the total range of QA activities [47].

2.1.2.1. Quality Standards

Various quality standards have been proposed to accommodate these different quality views and expectations. This section describes ISO/IEC-9126 (maybe the most influential in the SE community to date) and its successor, ISO/IEC-25000.

2.1.2.1.1. ISO/IEC-9126

ISO/IEC-9000 is a family of standards for quality management systems. In 1991, ISO published its first international consensus on the terminology for the quality characteristics for software product evaluation (ISO 9126 on Software Product Quality Characteristics and Guidelines for their Use) [77]. Afterwards, from 2001 to 2004, ISO published an expanded four-part version, containing both a hierarchical framework of quality models and metrics for these models. The current version of the ISO/IEC-9126 series consists of one International Standard (IS) [73] and three Technical Reports (TR) [74][75][76]. The ISO/IEC-9126 quality model distinguishes three different views on software product quality:

- Internal quality: concerns the properties of the system that can be measured without executing it.
- External quality: concerns the properties of the system that can be observed during its execution.
- Quality in use: concerns the properties experienced by its users/customers during operation and maintenance of the system.

Ideally, internal quality determines external quality, and external quality determines quality in use, as depicted in the following figure:


Figure 5. ISO/IEC-9126 Quality Lifecycle (internal quality influences external quality, which in turn influences quality in use)

The first document of the ISO/IEC-9126 series (quality model) contains a two-part quality model for software product quality [5]: i) internal and external quality model; ii) quality-in-use model. The first part of this two-part model determines six characteristics, which are subdivided into twenty-seven sub-characteristics for internal and external quality [73]. Measures for estimating external, internal, and quality-in-use characteristics are listed in three technical reports accompanying the standard quality model: ISO/IEC 9126-2 [74], ISO/IEC 9126-3 [75], and ISO/IEC 9126-4 [76] define external, internal, and quality-in-use metrics, respectively. The quality model of ISO/IEC-9126 divides internal and external software product quality into six top-level quality characteristics:

Figure 6. ISO/IEC-9126 Quality Model (External and Internal Quality): functionality (suitability, accuracy, interoperability, security, functionality compliance), reliability (maturity, fault tolerance, recoverability, reliability compliance), usability (understandability, learnability, operability, attractiveness, usability compliance), efficiency (time behaviour, resource utilisation, efficiency compliance), maintainability (analysability, changeability, stability, testability, maintainability compliance), and portability (adaptability, installability, co-existence, replaceability, portability compliance).

The following definitions have been extracted directly from the norm ISO/IEC-9126-1 [73]:

- Functionality: "The capability of the software product to provide functions which meet stated and implied needs when the software is used under specified conditions". The sub-characteristics include: suitability, accuracy, interoperability, security, and functionality compliance.
- Reliability: "The capability of the software product to maintain a specified level of performance when used under specified conditions". The sub-characteristics include: maturity, fault tolerance, recoverability, and reliability compliance.
- Usability: "The capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions". The sub-characteristics include: understandability, learnability, operability, attractiveness, and usability compliance.
- Efficiency: "The capability of the software product to provide appropriate performance, relative to the amount of resources used, under stated conditions". The sub-characteristics include: time behaviour, resource utilisation, and efficiency compliance.
- Maintainability: "The capability of the software product to be modified. Modifications may include corrections, improvements or adaptation of the software to changes in environment, and in requirements and functional specifications". The sub-characteristics include: analysability, changeability, stability, testability, and maintainability compliance.
- Portability: "The capability of the software product to be transferred from one environment to another". The sub-characteristics include: adaptability, installability, co-existence, replaceability, and portability compliance.

The attributes of quality in use are categorised into the following four characteristics:

Figure 7. ISO/IEC-9126 Quality Model (Quality in Use): effectiveness, productivity, safety, and satisfaction.

- Effectiveness: "The capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use".
- Productivity: "The capability of the software product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use".
- Safety: "The capability of the software product to achieve acceptable levels of risk of harm to people, business, software, property or the environment in a specified context of use".
- Satisfaction: "The capability of the software product to satisfy users in a specified context of use".

2.1.2.1.2. ISO/IEC-25000

ISO/IEC-9126 presents some weaknesses found by researchers and practitioners [4]. Since 2005, ISO has been updating the current ISO/IEC-9126 international standard on software product quality measurement. This current standard will be superseded by the upcoming ISO/IEC-25000 series of international standards on Software product Quality Requirements and Evaluation (SQuaRE). One of the objectives of this new standard series is the harmonization of its contents with the software measurement terminology of ISO/IEC-15939 (software measurement process). The ISO/IEC-25000 series will replace the series of standards ISO/IEC-9126 (software product quality) and also ISO/IEC-14598 (software product evaluation). The work on the ISO/IEC-25000 series is unfinished at this time. It is being carried out by Working Group 6 (WG6) of the software and system engineering subcommittee (SC7) of the ISO/IEC Joint Technical Committee (JTC1) on Information Technology (ISO/IEC JTC1/SC7). SQuaRE consists of the following five divisions:

- ISO/IEC-2500n: Quality Management Division. The standards in this division define all the common models, terms, and definitions referred to by all other standards of the SQuaRE series.
- ISO/IEC-2501n: Quality Model Division. SQuaRE employs the same quality model proposed by ISO/IEC-9126, dividing quality into characteristics for internal quality, external quality, and quality in use. This division details this quality model, decomposing the internal and external software quality characteristics into sub-characteristics.
- ISO/IEC-2502n: Quality Measurement Division. Software product quality measurement reference model, mathematical definitions of quality measures, and practical guidance for their application.
- ISO/IEC-2503n: Quality Requirements Division. This division helps to specify quality requirements. These quality requirements can be used in the process of quality requirements elicitation for a software product to be developed or as input for an evaluation process.
- ISO/IEC-2504n: Quality Evaluation Division. Requirements, recommendations, and guidelines for software product evaluation, whether performed by evaluators, acquirers, or developers.
- ISO/IEC 25050 to 25099 are reserved to be used for SQuaRE extension International Standards, Technical Specifications, Publicly Available Specifications (PAS) and/or Technical Reports; ISO/IEC 25051 and ISO/IEC 25062 are already published.

2.1.3. Verification and Validation
Verification and Validation (V&V) ‐also known as Software Quality Control‐ is concerned with checking that the software being developed meets its specification and delivers the functionality expected by its consumers. These checking processes start as soon as requirements become available and continue through all stages of the development process [54]. Verification is different from validation, although they are often confused. Barry Boehm expressed the difference between them [19]:

‐ Verification: are we building the product right? The aim of verification is to check that the software meets its stated functional and non‐functional requirements (i.e. the specification).
‐ Validation: are we building the right product? The aim of validation is to ensure that the software meets consumer’s expectations. It is a more general process than verification, because specifications do not always reflect the real wishes or needs of consumers (i.e., users and customers).
V&V includes a wide array of QA activities. Although software testing plays an extremely important role in V&V, other activities are also necessary. Within the V&V process, two broad groups of system checking and analysis techniques may be used [111]:

‐ Software testing. It is the most commonly performed activity within QA. Given a piece of code, software testing (or simply testing) consists of observing a sample of executions (test cases), and giving a verdict over them [16]. Hence testing is an execution‐based QA activity, so a prerequisite is the existence of the implemented software units, components, or system to be tested. For that reason, it is sometimes called dynamic analysis. Software testing is a broad term encompassing a wide spectrum of different concepts, such as testing levels (unit, integration, system, user testing, and so on), testing strategies (black‐box, white‐box, grey‐box, and non‐functional testing), and testing processes (manual, model‐based, automated testing, and so on). On one hand, testing establishes the existence of defects. On the other hand, debugging is concerned with locating and


correcting these defects [132]. As major parts of this dissertation, testing is covered in section 2.3 and automated testing in section 2.4.
‐ Static analysis. It is a form of V&V that does not require execution of the software. Static analysis works on a source representation of the software: either a model of the specification or design, or the source code of the program [3]. Perhaps the most commonly used techniques are inspection and review, where a specification, design or program is checked by a group of people. Additional static analysis techniques may be used, such as automated program analysis (the source code of a program is checked for patterns that are known to be potentially erroneous) and formal methods (mathematical arguments that a program conforms to its specification) [132].
Nowadays, the executable artefact par excellence is code (although some executable specification and design languages exist, they are not widespread). Thus, any product during development can be evaluated using static analysis, including of course code. However, testing (dynamic analysis) almost exclusively executes code. It should be noted that there is a strong divergence of opinion about what types of testing constitute validation or verification. Some authors believe that all testing is verification and that validation is conducted when requirements are reviewed and approved. Other authors view unit and integration testing as verification and higher‐order testing (e.g. system or user testing) as validation [122]. To solve this divergence, V&V can be treated as a single topic rather than as two separate topics [1]. Therefore, V&V can be seen as a disciplined approach to assessing software products throughout the product life cycle. All in all, V&V activities can be summarized in the following picture:

Figure 8. Verification & Validation Schema

2.1.3.1. Defects Key to the correctness aspect of V&V is the concept of software defect. The term defect generally refers to some problem with the software, either with its external or internal


behaviour. Software problems or defects are also commonly referred to as “bugs”. The IEEE Standard 610.12 defines the following terms related to defects [82]:
‐ Error: A human action that produces an incorrect result. Errors can be classified into two categories: i) Syntax error (program statement that violates one or more rules of the language in which it is written); ii) Logic error (incorrect data fields, out‐of‐range terms, or invalid combinations).
‐ Fault: An incorrect step, process, or data definition in a computer program. It is a condition that causes a system to fail in performing its required function.
‐ Failure: The inability of a system or component to perform its required functions within specified performance requirements.
In addition to this level of granularity for defects, it is interesting to also consider incidents, i.e. the symptoms associated with a failure that alert the user to its occurrence. All in all, errors, faults, failures, and incidents are different aspects of software defects. A causal relation exists among these four aspects of defects [138]. Errors may cause faults to be injected into the software, and faults may cause failures when the software is executed.

2.2. Static Analysis

Static analysis of a software piece is performed without executing the code. There are three advantages of static analysis over testing [132]:

1. During testing, errors can hide other errors. This situation does not happen with static analysis, because it is not concerned with interactions between errors.
2. Incomplete versions of a system can be statically analysed without additional cost. In testing, if a program is incomplete, test harnesses have to be developed.
3. Static analysis can consider broader quality attributes of a System Under Test (SUT) than defect detection, such as compliance with standards, portability, and maintainability.

2.2.1. Inspections
Inspections are critical examinations of software artefacts by human inspectors aimed at discovering and fixing faults in software systems. All kinds of software artefacts are subject to inspection. This is the primary reason for the existence of inspection: there is no need to wait for the availability of executable programs (as in testing) before starting to inspect [138]. The original Fagan inspection process included six steps [41]: i) Planning: deciding what to inspect, who should be involved, and in what role. ii) Overview meeting: the author assigns the individual inspection tasks to the inspectors. iii) Preparation: each inspector performs an individual inspection. iv) Inspection meeting: individual inspection results are collected and consolidated. v) Rework: the author fixes the identified problems or provides other responses. vi) Follow‐up: the inspection process is closed by a final validation. The Gilb inspection is a variation of the Fagan inspection in which an additional step (called “process brainstorming”) is added right after the inspection meeting [52]. This step is aimed at preventive actions and process improvement in the form of reduced defect injections for future development activities.


2.2.2. Review
Review is the process in which a group of people examine the software and its associated documentation, looking for non‐conformance with standards and other potential problems or omissions. The review team makes an informed judgement about the level of quality of the system under review. This review process is based on documents produced during the software development process, such as specifications, designs, code, models, test plans, configuration management procedures, or user manuals [54].

A special form of review is called walkthrough, a more organized review typically applied to software design and code. It is considered to be an informal type of review. According to IEEE Standard for Software Reviews, a walkthrough is a form of software peer review “in which a designer or programmer leads members of the development team and other interested parties through a software product, and the participants ask questions and make comments about possible errors, violation of development standards, and other problems” [69].

2.2.3. Automated Software Analysis
Automated Software Analysis (ASA) assesses the source code using patterns that are known to be potentially dangerous [54]. ASA technologies are usually delivered as commercial or open source tools and services. These tools can locate many common programming faults, analysing the source code before it is tested and identifying potential problems in order to re‐code them before they manifest themselves as failures [83]. The intention of this analysis is to draw a code reader’s attention to faults in the program, such as:
‐ Data faults. For example, variables used before initialization, variables declared but never used, variables assigned twice but never used between assignments, and so on.
‐ Control faults. For example, unreachable code or unconditional branches into loops.
‐ Input/output faults. For example, variables output twice with no intervening assignment.
‐ Interface faults. For example, parameter‐type mismatches, parameter number mismatches, non‐usage of the results of functions, uncalled functions and procedures, etc.
‐ Storage management faults. For example, unassigned pointers, pointer arithmetic, or memory leaks.
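To illustrate the kind of faults that ASA tools report, the following hypothetical Java fragment (written for this illustration, not taken from any particular tool) contains some of the fault patterns listed above; a static analyser could flag them without executing the code.

// Hypothetical example of code containing faults that an ASA tool would report.
public class AsaExample {

    private int neverUsed;                        // data fault: variable declared but never used

    public int divide(int dividend, int divisor) {
        int result = 0;                           // data fault: value assigned but never read
        result = dividend / divisor;              // potential fault: divisor is not checked against zero
        return result;
    }

    public int absolute(int value) {
        if (value >= 0) {
            return value;
        }
        return -value;
        // any statement added after this point would be unreachable (control fault)
    }
}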

2.2.4. Formal Methods
The term “formal methods” is used to refer to any activities that rely on mathematical representations of software, including formal specification and verification. In the 1980s, many software engineering researchers proposed that using formal development methods was the best way to improve software quality. They predicted that by the 21st century a large proportion of software would be developed using formal methods. This has not happened, for the following reasons [54]:
‐ Successful SE techniques. The use of other SE methods, such as structured methods or configuration management, has resulted in improvements in software quality.
‐ Limited scope of formal methods. Formal methods are not well suited to specifying user interfaces and user interaction, and nowadays the user interface component has become a greater and greater part of most systems.


‐ Limited scalability of formal methods. Projects that have used these techniques have mostly been concerned with relatively small systems, such as critical kernels. As systems grow, the effort required to develop a formal specification grows excessively.
Formal methods comprise a set of mathematically‐based techniques for the specification and verification of software systems. Both formal specification techniques and formal verification techniques are widely referred to collectively as formal methods in the literature. The existence of formal specifications is a prerequisite for formal verification.

2.2.4.1. Formal Specifications
Formal specifications produce an unambiguous set of product specifications, such as customer requirements, environmental constraints and design intentions. They can be produced in several different forms [132].
Firstly, descriptive specifications focus on the properties or conditions associated with software products and their components. There are several kinds of descriptive specifications. Entity‐Relationship (ER) diagrams are commonly used to describe product components and connections; these diagrams show data entities, their associated attributes and the relations between them [132]. Logical specifications focus on the formal properties associated with different product components or the product as a whole. They are logical statements or conditions associated with the states, or program states, of programs or program segments. The basic elements of these logical specifications are the pre‐conditions, post‐conditions, and invariants, which are generally associated with program code. Some examples are the Z specification [133] and VDM (Vienna Definition Method) [81] languages. Contracts are also a kind of logical specification. In addition, the Object Constraint Language (OCL) was initially designed as a logical specification language for UML [144]; nowadays it is part of the Meta‐Object Facility (MOF) standard by the Object Management Group (OMG). Algebraic specifications focus on the functional computation carried out by a program or program segment and its related properties; examples are Larch [57] or the OBJ family [46]. Syntactic specifications are used to describe languages used in computing, such as programming languages. The BNF (Backus–Naur Form) notation is universally used for syntactic specifications.
Secondly, operational specifications focus on the required behaviour of the software systems, for example: Data Flow Diagrams (DFDs) specify information flow among the major functional units; they are used to show how data flows through a sequence of processing steps. The Unified Modelling Language (UML) provides a general‐purpose visual modelling language used to specify, visualize, construct and document software artefacts. A Finite‐State Machine (FSM) is a behavioural model composed of a finite number of states, transitions between those states, and actions. A Labelled Transition System (LTS) is also a state machine representation; the main difference between an FSM and an LTS is that while an FSM has transitions labelled with pairs of (input, output), an LTS specifies interactions, which can, but need not, be interpreted as input or output. Similarly, Petri Nets are considered a special kind of FSM with two distinct types of nodes called places and transitions [119]. Graph‐based models are abstract representations of a set of points (vertices or nodes) connected by lines (edges or links). Graphs or digraphs (i.e. directed graphs) are sometimes used in the literature to model the behaviour of a software system. For instance, an Event‐Flow Graph (EFG) is used in GUI testing since it represents all the possible interactions among the events in a GUI. Similarly, an


Event Interaction Graph (EIG) contains nodes, one for each system‐interaction event in the GUI. An Event Semantic Interaction Graph (ESIG) contains nodes that represent events; a directed edge between two nodes shows that there is a semantic relationship between the events they represent. The Specification and Description Language (SDL) specifies the description and the behaviour of reactive and distributed systems. It provides both a Graphic Representation (SDL/GR) and a textual Phrase Representation (SDL/PR) [123]. A system is specified as a set of interconnected abstract machines which are extensions of FSMs.
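As a minimal illustration of a logical specification attached to program code, the hypothetical Java class below expresses the pre‐condition, post‐condition, and invariant of an operation with plain assertions; the same constraints could also be written in a dedicated notation such as OCL or JML.

// Hypothetical account class whose contract is expressed with a pre-condition,
// a post-condition and an invariant, checked here with plain Java assertions.
public class Account {

    private int balance;                         // invariant: balance >= 0

    public void withdraw(int amount) {
        assert amount > 0 && amount <= balance : "pre-condition violated";

        int oldBalance = balance;
        balance = balance - amount;

        assert balance == oldBalance - amount : "post-condition violated";
        assert balance >= 0 : "invariant violated";
    }
}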

2.2.4.2. Formal Verification
Formal verification checks the conformance of software design or code to the formal specifications, ensuring that the software is fault‐free with respect to its formal specifications. Axiomatic correctness (Hoare logic) [65][147] works with the logical specifications of programs or formal designs by associating each type of program or design element with an axiom that prescribes the logical transformation of the program state before and after the execution of that element type. The weakest pre‐condition approach [36][56] focuses on the goal or the computational result that is captured by the final state of the execution sequence. A series of backward chaining operations, through the use of the so‐called weakest pre‐conditions, transforms this final state and its properties into an initial state and its properties. Functional correctness (program calculus) [107] is similar to the axiomatic approach in the sense that some basic axioms or meanings of program elements are prescribed. Symbolic execution is used to connect these elements in a program. It involves executing (interpreting) a program symbolically: during the execution, the values of variables are held in algebraic form and the outcome of the program is represented as one or more expressions. Decisions are handled by following both outcomes while remembering the condition value corresponding to each of the paths now being followed separately. At the end of the evaluation, there will be two facts about each path through the program: a list of the decision outcomes made along the path and the final expression of all the variables, expressed algebraically. Together these define the function of the program and can be compared with the required function. Semi‐formal techniques check certain properties instead of proving the full correctness of software. An example is model checking, an approach to automatically or algorithmically check certain properties of software systems [138]. In model checking, a software system is modelled as an FSM, with some property of interest expressed as a suitable formula, or a proposition, defined with respect to the FSM. After that, the model checker runs an algorithm to check the validity of the proposition.
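For illustration purposes (the example is not drawn from the cited references), the Hoare triple below states that if x >= 0 holds before the assignment y := x + 1, then y >= 1 holds afterwards; computing the weakest pre‐condition of the same assignment with respect to that post‐condition yields exactly the initial condition:

    { x >= 0 }   y := x + 1   { y >= 1 }

    wp( y := x + 1 , y >= 1 )  =  ( x + 1 >= 1 )  =  ( x >= 0 )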

2.3. Software Testing

Software testing consists of the dynamic evaluation of the behaviour of a program on a finite set of test cases, suitably selected from the usually infinite executions domain, against the expected behaviour [1]. The key concepts of this definition are depicted as follows:


‐ Dynamic: The SUT is executed with specific input values to find failures in its behaviour. Thus, executing the actual SUT checks not only that the design and code are correct, but also the environment, such as the libraries, the operating system, the network support, and so on.
‐ Finite: Exhaustive testing is not possible or practical for most real programs. They usually have a large number of allowable inputs to each operation, plus even more invalid or unexpected inputs, and the possible sequences of operations are usually infinite as well. Testers must choose a number of tests that can be run in the available time.
‐ Selected: Since there is a huge or infinite set of possible tests and only a small fraction of them can be run, the key challenge of testing is how to select the tests that are most likely to expose failures in the system.
‐ Expected: After each test execution, it must be decided whether the observed behaviour of the system was a failure or not.

2.3.1. Testing Levels
Typically, a commercial software system has to go through three stages of testing [132]:
1. Development testing, where the SUT is tested during development to discover defects. This stage is performed by software engineers (i.e. programmers, testers, system designers, and so on).
2. Release testing, where a separate testing team tests a complete version of the system before it is released to users. The aim of this stage is to check that the SUT meets its requirements.
3. User testing, in which potential or real users of the system test it in their own environment.

2.3.1.1. Development Testing
Development testing includes all testing activities that are performed by the team developing the system. In this stage, testing may be carried out at three levels of granularity [132]:
1. Unit testing, where individual program units are tested. Unit testing should focus on the functionality of objects or methods.
2. Integration testing, where units are combined to create composite components. Integration testing should focus on testing component interfaces.
3. System testing, where all of the components are integrated and the system is tested as a whole. System testing should focus on testing component interactions.

2.3.1.1.1. Unit Testing
Unit testing is a method by which individual pieces of source code are tested to verify that the design for that unit has been correctly implemented. There are four phases executed in sequence in a unit test case [101], illustrated in Figure 9 and described as follows:
‐ Setup. The test case initialises the test fixture, that is, the “before” picture required for the SUT to exhibit the expected behaviour.
‐ Exercise. The test case interacts with the unit or component under test. The unit to be tested usually queries another component, named Depended‐On Component (DOC).
‐ Verify. The test case determines whether the expected outcome has been obtained using test oracles.


‐ Teardown. The test case tears down the test fixture to put the SUT back into its initial state.

Figure 9. Unit Testing
Unit testing should be done with the unit under test in isolation, i.e., without interacting with its DOCs. For that aim, test doubles are employed to replace any components on which the unit under test depends. There are the following kinds of test doubles [101]:
‐ A dummy object is a placeholder object that is passed to the SUT as an argument (or an attribute of an argument) but is never actually used.
‐ A test stub is an object that replaces a real component on which the SUT depends so that the test can control the indirect inputs of the SUT. It allows the test to force the SUT down paths it might not otherwise exercise. A Test Spy, which is a more capable version of a Test Stub, can be used to verify the indirect outputs of the SUT by giving the test a way to inspect them after exercising the SUT.
‐ A mock object is an object that replaces a real component on which the SUT depends so that the test can verify its indirect outputs.
‐ A fake object is an object that replaces the functionality of the real DOC with an alternative implementation of the same functionality.
Testers should write two kinds of unit test cases. The first kind should reflect normal operation of a program and should show that the components work. The other kind should be based on testing experience of where common problems arise; it should use abnormal inputs to check that these are properly processed and do not crash the unit under test [132].
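The following JUnit‐style sketch (the class and method names are hypothetical, introduced only for this example) shows the four phases of a unit test case and how a Test Stub replaces the DOC so that the unit under test is exercised in isolation.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical unit under test and its Depended-On Component (DOC).
interface TaxRateService {
    double currentRate();
}

class PriceCalculator {
    private final TaxRateService taxRates;

    PriceCalculator(TaxRateService taxRates) {
        this.taxRates = taxRates;
    }

    double grossPrice(double netPrice) {
        return netPrice * (1 + taxRates.currentRate());
    }
}

public class PriceCalculatorTest {

    // Test stub replacing the real DOC, so the test controls its indirect inputs.
    static class TaxRateStub implements TaxRateService {
        public double currentRate() {
            return 0.10;
        }
    }

    @Test
    public void grossPriceAddsTax() {
        // Setup: build the test fixture with the stubbed DOC.
        PriceCalculator calculator = new PriceCalculator(new TaxRateStub());

        // Exercise: interact with the unit under test.
        double gross = calculator.grossPrice(100.0);

        // Verify: compare the outcome with the expected value (the test oracle).
        assertEquals(110.0, gross, 0.001);

        // Teardown: nothing to release here; the fixture is simply discarded.
    }
}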

2.3.1.1.2. Integration Testing
Integration testing should expose defects in the interfaces and interaction between integrated components or modules [132]. There are different strategies to perform integration testing. First, the decomposition‐based strategy describes the order in which units are to be integrated, presuming that the units have been separately tested. There are four integration strategies based on the functional decomposition of the SUT [82]:
‐ Top‐down integration. This strategy starts with the main unit (module), i.e. the root of the procedural tree. Any lower‐level module that is called by the main unit should be substituted by a test double (e.g. a Test Stub). Once testers are convinced that the main unit logic is correct, the stubs are gradually replaced with the actual code. This process is repeated for the rest of the lower‐level units in the procedural tree. The main advantage of this approach is that defects are more easily found.
‐ Bottom‐up integration. This strategy is a mirror image of the top‐down order, with the difference that test double modules (e.g. a Fake Object) emulate units at the next level up


in the procedural tree. In this case, the test double module is known as a driver. With this approach it is easier to find a missing branch link.
‐ Sandwich integration. This strategy is a combination of top‐down and bottom‐up integration.
‐ Big‐Bang integration. All or most of the units are integrated at the same time. This method is very effective for saving time in the integration testing process; however, it makes the entire integration process more complicated.
The second integration testing strategy is call graph‐based, in which the basic idea is to use the call graph instead of the functional decomposition tree. A call graph is a directed labelled graph which represents the SUT. There are two types of call graph‐based integration testing:
‐ Pairwise integration. The idea behind this approach is to eliminate the need for developing test doubles, using the actual code. Integration is restricted to a pair of units in the graph.
‐ Neighbourhood integration. The neighbourhood of a node in a graph is the set of nodes that are one edge away from the given node. Neighbourhood integration testing reduces the number of test sessions and avoids the use of test doubles.
Finally, in the path‐based approach the motivation is to combine structural and behavioural methods of testing for integration testing, focusing on interactions among units [82].

2.3.1.1.3. System Testing
System testing during development involves integrating components to create a version of the system and then testing the integrated system. It verifies that the components are compatible, interact correctly and transfer the right data at the right time across their interfaces [54]. It obviously overlaps with integration testing, but the difference here is that system testing should involve all the components developed. When the testing process is performed to determine whether the system meets its specification, it is known as conformance testing. When a new feature or functionality is introduced to a system (a build), testing this new feature is known as progression testing. In addition, the existing test cases are exercised to check that the newly introduced changes do not affect the correctness of the rest of the system; this approach is commonly known as regression testing [82]. When the system interacts with external or third‐party systems, another kind of testing, known as system integration testing, verifies that the system is properly integrated with those external systems.

2.3.1.2. Release Testing
Release testing is the process of testing a particular release of a system, performed by a separate team outside the development team. While development system testing should focus on discovering defects in the system (defect testing), the aim of release testing is to check that the system meets its requirements (validation testing) [132]. The primary goal of the release testing process is to convince the supplier of the system that it is good enough for use. If so, it can be released as a product or delivered to the consumer. Release testing is usually a black‐box testing process where tests are derived from the specification.


2.3.1.3. User Testing
User or customer testing is a stage in the testing process in which users or customers provide input and advice on system testing. There are different types of user testing [132]:
‐ Alpha testing takes place at developers’ sites, working together with the software consumers, before the system is released to external users or customers.
‐ Beta testing takes place at customers’ sites, and involves testing by a group of customers who use the system at their own locations and provide feedback, before the system is released to other customers.
‐ Acceptance testing, where consumers decide whether or not the system is ready to be deployed in the consumer environment. It can be seen as black‐box (functional) testing performed at system level by final users or customers.
‐ Operational testing is performed by the end user in the normal operating environment.

2.3.2. Testing Methods
Testing methods (or strategies) define the approaches for designing test cases. They can be responsibility‐based (black‐box), implementation‐based (white‐box), or hybrid (grey‐box) [120]. Black‐box techniques design test cases on the basis of the specified functionality of the item to be tested. White‐box techniques rely on source code analysis to develop test cases. Grey‐box testing designs test cases using both responsibility‐based and implementation‐based approaches.

2.3.2.1. Black‐Box Testing
Black‐box testing (also known as functional or behavioural testing) is based on requirements, with no knowledge of the internal program structure or data. Black‐box testing relies on the specification of the system or the component that is being tested to derive test cases. The system is a black box whose behaviour can only be determined by studying its inputs and the related outputs [82]. There are many specific black‐box testing techniques; some of the most well‐known ones are described below. Systematic testing refers to a complete testing approach in which the SUT is shown to conform exhaustively to a specification, up to the testing assumptions. It generates test cases only in the limiting sense that each domain point is a singleton sub‐domain [82]. This category includes, for example, pairwise (all‐pairs) testing, a combinatorial testing method that, for each pair of input parameters of a SUT, tests all possible discrete combinations of those parameters. Other systematic black‐box testing techniques are equivalence partitioning and boundary value analysis (described in section 2.5.2). Random testing is literally the antithesis of systematic testing: the sampling is over the entire input domain. Duran and Ntafos [37] have demonstrated, with both theoretical and empirical evidence, that random test case selection criteria can be as effective at defect detection as partitioning methods. This means of testing seems a better choice than systematic testing in two general situations [58]: i) sparse sampling, for a large, unstructured input domain; ii) persistent state, since the usual theoretical assumption is that software is reset between tests, so that results are repeatable. Fuzz testing is a form of black‐box random testing which randomly mutates well‐formed inputs and tests the program on the resulting data [54]. It delivers randomly sequenced and/or structurally bad data to a system to see if failures occur.
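A minimal sketch of random testing is shown below; the SUT here is simply a library sorting routine, chosen only for illustration. Inputs are sampled from the whole input domain and a simple oracle checks every observed output.

import java.util.Arrays;
import java.util.Random;

public class RandomTesting {

    public static void main(String[] args) {
        Random random = new Random();

        // Run a sample of randomly generated test cases over the whole input domain.
        for (int testCase = 0; testCase < 1000; testCase++) {
            int[] input = new int[random.nextInt(20)];
            for (int i = 0; i < input.length; i++) {
                input[i] = random.nextInt();
            }

            int[] output = input.clone();
            Arrays.sort(output);                 // the SUT: here, a library sort

            // Oracle: every element must be <= its successor after sorting.
            for (int i = 1; i < output.length; i++) {
                if (output[i - 1] > output[i]) {
                    System.out.println("Failure found for input " + Arrays.toString(input));
                }
            }
        }
        System.out.println("Random test run finished");
    }
}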


Graphical User Interface (GUI) testing is the process of checking that software with a graphical interface interacts with the user according to its specification. GUI testing is event driven (e.g. mouse movements or menu selections) and provides a front end to the underlying application code through messages or method calls [98]. GUI testing at unit level is typically used at the button level. GUI testing at system level exercises the event‐driven nature of the SUT. GUI applications offer a small benefit for testers: there is little need for integration testing. GUI testing is mainly used for ensuring the correctness of the entire system’s functionality, safety, robustness, and usability [82]. Smoke testing is the process of checking the main functionality of the SUT. A smoke test case is the first to be run by testers before accepting a build for further testing; failure of a smoke test case means that the build is refused by testers. The name “smoke testing” derives from electrical system testing, whereby the first test was to switch on and see if it smoked [42]. Sanity testing determines whether or not it is reasonable to proceed with further testing. The difference with smoke testing is that if a smoke test fails, it is impossible to conduct a sanity test; in contrast, if the sanity test fails, it is not reasonable to attempt more rigorous testing. Both sanity and smoke testing are ways to avoid wasting time and effort on more rigorous testing. The typical example of a sanity test for a development environment is the “Hello world” program.

2.3.2.2. White‐Box Testing
White‐box testing (also known as structural testing) is based on knowledge of the internal logic of an application’s code. It determines whether the program‐code structure and logic are faulty. White‐box test cases are accurate only if the tester knows what the program is supposed to do. White‐box testing does not account for errors caused by omission [60]. Black‐box testing uses only the specification to identify use cases, while white‐box testing uses the program source code (implementation) as the basis for test case identification. Both approaches should be used in conjunction in order to select a good set of test cases for the SUT [60]. Hence, the following table summarizes the main differences between black‐box and white‐box testing approaches:

Table 1. Black‐Box vs. White‐Box Testing
‐ Tester visibility: black‐box: specification/requirements (inputs and outputs); white‐box: code.
‐ Defect type: black‐box: failures; white‐box: faults.
‐ Defect identification: black‐box: no, debugging is needed to find the fault which causes the failure; white‐box: yes, test cases identify the specific LOC involved.
‐ Usually done by: black‐box: an independent tester team; white‐box: developers.
Some of the most significant white‐box techniques are described as follows. Code coverage defines the degree of source code which has been tested, for example in terms of percentage of Lines of Code (LOC). There are several criteria for code coverage:
‐ Statement Coverage. Line of code coverage granularity.


‐ Decision (branch) Coverage. Control structure (e.g. if‐else) coverage granularity.
‐ Condition coverage. Boolean expression (true‐false) coverage granularity.
‐ Paths coverage. Every possible route coverage granularity.
‐ Function coverage. Program functions coverage granularity.
‐ Entry/exit coverage. Call and return coverage granularity.
Fault injection is the process of injecting faults into software to determine how well (or badly) some SUT behaves [42]. Defects can be said to propagate in that their effects are visible in program states beyond the state in which the error existed (a fault became a failure). Mutation analysis validates tests and their data by running them against many copies of the SUT containing different, single, and deliberately inserted changes. Mutation analysis helps to identify omissions in the code [42].
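The difference between the first two criteria can be seen with a small hypothetical method: a single test case with amount = 2000 already executes every statement (full statement coverage), but decision coverage additionally requires a case such as amount = 100, so that the false outcome of the if decision is also exercised.

public class FeeCalculator {

    // Hypothetical rule: orders above 1000 are delivered free of charge.
    public int deliveryFee(int amount) {
        int fee = 2;
        if (amount > 1000) {    // decision: both outcomes must be covered for branch coverage
            fee = 0;
        }
        return fee;
    }
    // deliveryFee(2000) alone reaches every statement (statement coverage);
    // deliveryFee(2000) and deliveryFee(100) together achieve decision (branch) coverage.
}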

2.3.2.3. Grey‐Box Testing Grey‐box testing is the technique that uses a combination of black‐box and white‐box testing. Grey‐box testing is not black box testing, because the tester does know some of the internal workings of the SUT. In grey‐box testing, the tester applies a limited number of test cases to the internal workings of the software under test. In the remaining part of the grey‐box testing, one takes a black‐box approach in applying inputs to the SUT and observing the outputs.

2.3.2.4. Non‐Functional Testing
The non‐functional aspects of a system can require considerable effort to test and perfect. Within this group different means of testing can be found, for example performance testing, conducted to evaluate the compliance of a SUT with specified performance requirements [42]. These requirements usually include constraints about the time behaviour (capability of the software product to provide appropriate response and processing times and throughput rates when performing its function, under stated conditions) and resource utilization (capability of the software product to use appropriate amounts and types of resources when the software performs its function under stated conditions). Performance testing may measure response time with a single user exercising the system or with multiple users exercising the system. Load testing is focused on increasing the load on the system to some stated or implied maximum load, to verify that the system can handle the defined system boundaries. Volume testing is often considered synonymous with load testing, yet volume testing focuses on data. Stress testing exercises the system beyond its normal operational capacity to the extent that the system fails, identifying the actual boundaries at which the system breaks. The aim of stress testing is to observe how the system fails and where the bottlenecks are [132]. Security testing tries to ensure the following concepts: confidentiality (protection against the disclosure of information), integrity (ensuring the correctness of the information), authentication (ensuring the identity of the user), authorisation (determining that a user is allowed to receive a service or perform an operation), availability (ensuring that the system performs its functionality when required) and non‐repudiation (preventing the denial that an action happened). Usability testing focuses on finding user interface problems, which may make the software difficult to use or may cause users to misinterpret the output.


Accessibility testing is the technique of making sure that the product complies with accessibility requirements (the ability to access the system functionality).
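A minimal sketch of a performance (load) test along these lines is given below; the target URL is a placeholder and the number of simulated users is arbitrary. Several concurrent threads exercise the SUT and the individual response times are measured, so that the observed values can be compared against the stated performance requirements.

import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleLoadTest {

    public static void main(String[] args) throws Exception {
        final String target = "http://localhost:8080/app";  // placeholder SUT endpoint
        final int concurrentUsers = 10;                      // arbitrary load level
        Thread[] users = new Thread[concurrentUsers];

        for (int i = 0; i < concurrentUsers; i++) {
            users[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        long start = System.currentTimeMillis();
                        HttpURLConnection connection =
                                (HttpURLConnection) new URL(target).openConnection();
                        int status = connection.getResponseCode();   // issue the request
                        long elapsed = System.currentTimeMillis() - start;
                        System.out.println("Status " + status + " in " + elapsed + " ms");
                    } catch (Exception e) {
                        System.out.println("Request failed: " + e.getMessage());
                    }
                }
            });
            users[i].start();
        }
        for (Thread user : users) {
            user.join();                                      // wait for every simulated user
        }
    }
}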

2.4. Testing of Web Applications

Testing web‐based applications (or simply web applications) shares the same objectives as traditional application testing, i.e. to ensure quality and find defects in the required functionality and services. A web application can be viewed as a client‐server distributed system, with the following main characteristics [34]:
‐ A large number of users distributed all over the world accessing concurrently.
‐ Heterogeneous execution environments (different hardware, network connections, operating systems, web servers and browsers).
‐ A heterogeneous nature, because of different technologies (programming languages and models), and different involved components (generated from scratch, legacy ones, hypermedia components, Commercial Off‐The‐Shelf ‐COTS‐ and so on).
‐ Dynamic nature. Web pages can be generated at run time according to user inputs and server status.
The aim of web testing consists of executing the application using combinations of input and state to reveal failures. These failures are mainly caused by faults in the running environment or in the web application itself. The running environment mainly affects the non‐functional requirements of a web application (e.g. performance, stability, or compatibility), while the web application is responsible for the functional requirements. Therefore, web testing has to be considered from these two distinct perspectives (functional and non‐functional), since they are complementary and not mutually exclusive. All in all, different types of testing have to be executed to reveal these diverse types of failures [34].

2.4.1. Web Testing Levels
Compared with traditional software, the definition of the development testing levels (i.e., unit, integration, and system testing) for a web application requires greater attention.

2.4.1.1. Unit Web Testing
Different types of unit may be identified in a web application, such as web pages, scripting modules, forms, applets, servlets, or other web objects. In any case, the basic unit that can actually be tested is a web page. There are some differences between testing a client and a server page.

2.4.1.1.1. Client Page Testing
Client pages show textual or hyper‐textual information to users, accept user input, and allow user navigation throughout the application. A client page may include scripting code modules that perform simple functions, such as input validation or simple computations. Testing of dynamically generated client pages is a particular case of client page testing. The basic problem of this kind of testing is the availability of the built pages, which depends on the capability of identifying and reproducing the conditions from which the pages are built. A second problem is state explosion, since the number of dynamic pages that can be generated may be considerable,


depending on the large number of possible combinations of application states and user inputs. Equivalence class partitioning criteria should be used to approach this question. The typical failures that the testing of a client page would identify are the following:
‐ Differences between the content displayed by the page and the one specified and expected by a user.
‐ Wrong destination of links towards other pages.
‐ Existence of broken links (links towards non‐existing pages).
‐ Wrong actions performed when a button, or any other active object, is selected by a user.
‐ Script failures in the client page.
Unit testing of client pages can be carried out by white‐box, black‐box, or grey‐box techniques. The typical criteria for client page white‐box test coverage are:
‐ HTML statement coverage.
‐ Web objects coverage (e.g., each image or applet has to be exercised at least once).
‐ Script blocks coverage (e.g. each block of scripting code has to be executed at least once).
‐ Statement/branch/path coverage for each script module.
‐ Hyper‐textual link coverage.
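A sketch of an automated check for one of the failures listed above (broken links) is shown below; the page URL is a placeholder and the link extraction is deliberately naive (a real tool would use an HTML parser). Every absolute link found in the page is requested, and any link answered with a 404 status is reported as broken.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrokenLinkChecker {

    public static void main(String[] args) throws Exception {
        // Placeholder client page under test.
        URL page = new URL("http://localhost:8080/app/index.html");

        // Read the HTML of the page.
        StringBuilder html = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(page.openStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            html.append(line);
        }
        reader.close();

        // Naive extraction of absolute links.
        Matcher links = Pattern.compile("href=\"(http[^\"]+)\"").matcher(html);
        while (links.find()) {
            String link = links.group(1);
            HttpURLConnection connection = (HttpURLConnection) new URL(link).openConnection();
            connection.setRequestMethod("HEAD");
            if (connection.getResponseCode() == 404) {
                System.out.println("Broken link: " + link);
            }
        }
    }
}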

2.4.1.1.2. Server Page Testing
Server pages have the main responsibility for implementing the business logic of the application, managing the storing and retrieving of data into/from a database. Server pages are usually implemented with scripting technologies, such as JSP (Java Server Pages), Servlets, ASP (Active Server Pages), COTS, or PHP (Hypertext Preprocessor), among others. The typical failures detectable by server page testing are:
‐ Failures in the execution of servlets or other technologies.
‐ Incorrect execution of data storing into a database.
‐ Failures due to the existence of incorrect links between pages.
‐ Defects in dynamically generated pages.
Like client page testing, server page testing can be carried out by white‐box, black‐box, or grey‐box techniques. The coverage criteria for server page white‐box testing could be:
‐ Statement/branch/path coverage in script modules.
‐ HTML statement coverage.
‐ Servlet, COTS, and other web objects coverage.
‐ Hyper‐textual link coverage.
‐ Coverage of dynamically generated pages.

2.4.1.2. Integration Web Testing Web application integration testing considers sets of related web pages in order to assess how they work together, and identify failures due to their coupling [34]. The web application use cases (or any other description of the functional requirements) can drive the process of page integration. The identification of such web pages can be made by analysing the development documentation or by reverse engineering the application code.


At the integration testing level, the knowledge of both the structure (the set of pages to be integrated) and the behaviour of the web application has to be considered. Therefore, grey‐box techniques will be more suitable than pure black‐box or white‐box ones to carry out integration testing.

2.4.1.3. System Web Testing
On one hand, black‐box techniques are usually employed to accomplish system testing of the externally visible behaviour of the application. On the other hand, in order to discover web application failures due to incorrect navigation links among pages, grey‐box testing techniques are more suitable. The errors due to incorrect navigation include:
‐ Links reaching a web page different from the specified one.
‐ Pending links to unreachable pages (broken links).
Typical coverage criteria for system testing include:
‐ User function/use case coverage (black‐box approach).
‐ Page (both client and server) coverage (white‐box or grey‐box approaches).
‐ Link coverage (white‐box or grey‐box approaches).

2.4.2. Web Testing Strategies
The following sub‐sections describe white‐box (structural), black‐box (functional) and non‐functional testing for web applications.

2.4.2.1. White‐Box Web Testing
The design of test cases using a white‐box strategy is made using two artefacts:
‐ The test model, i.e. the code representation of the component under test.
‐ The coverage model, which specifies the parts of the representation that must be exercised by the test cases.
Regarding the test model, there are two families mainly adopted in the literature for white‐box web testing:
‐ One focuses on the level of abstraction of single statements of code components, representing information about their control flow or data flow.
‐ A second family considers the coarser granularity of the navigation structure between pages of the application, with some possible additional details.
Regarding the coverage criteria, traditional ones (such as those involving nodes, edges, or paths) have been applied to both families of test models.

2.4.2.2. Black‐Box Web Testing Black‐box (functional) testing should find the failures of the web applications that are due to faults in the implementation of the specified functional requirements, rather than to the execution environment. Most of the methods and approaches used to test the functional requirements of traditional software can be used for web applications too. The main issue with black‐box testing of web applications is the choice of a suitable model for specifying the behaviour of the SUT and deriving test cases. This behaviour may be significantly


dependent on data managed by the application or user input, with the consequence of a state explosion problem. To solve this problem, some solutions are presented in the literature. The approach proposed by Di Lucca et al. [90] exploits decision tables as a combinatorial model for representing the behaviour of the web application and producing test cases. This approach provides a method for both unit and integration testing. Another approach, provided by Andrews et al. [6], proposes Finite State Machines (FSMs) to model state‐dependent behaviour of web applications and to design test cases. This approach mainly addresses integration and system testing.

2.4.2.3. Grey‐Box Web Testing
Grey‐box testing is well suited for web testing because it evaluates high‐level design, environment, and interoperability conditions. It can reveal issues on end‐to‐end information flow and system configuration and compatibility [113]. Strategies based on the collection of user session data can be classified as grey‐box. These strategies use the collected data to test the behaviour of the application in a black‐box fashion, but they also aim at verifying the coverage of the internal components of the application (e.g. page or link coverage). The data to be captured include clients’ requests expressed in the form of URLs and name‐value pairs. Captured data about user sessions can be transformed into a set of HTTP requests, each one providing a separate test case.
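The sketch below illustrates this idea under simple assumptions: one captured user‐session entry (a URL plus its name‐value pairs, both placeholders here) is transformed into an HTTP request that constitutes a test case, and the response status code provides a coarse verdict; page or link coverage would then be measured on the server side.

import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class UserSessionReplay {

    public static void main(String[] args) throws Exception {
        // One captured user-session entry: a URL plus its name-value pairs (placeholders).
        String capturedUrl = "http://localhost:8080/app/search";
        String[][] nameValuePairs = { { "query", "testing" }, { "page", "1" } };

        // Transform the captured data into an HTTP request (one test case).
        StringBuilder query = new StringBuilder();
        for (String[] pair : nameValuePairs) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(URLEncoder.encode(pair[0], "UTF-8"))
                 .append('=')
                 .append(URLEncoder.encode(pair[1], "UTF-8"));
        }
        URL request = new URL(capturedUrl + "?" + query);

        HttpURLConnection connection = (HttpURLConnection) request.openConnection();
        int status = connection.getResponseCode();

        // Coarse verdict: any server error is reported as a failure of this test case.
        System.out.println(status >= 500 ? "FAIL: " + request : "PASS: " + request);
    }
}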

2.4.3. Non‐Functional Web Testing The main non‐functional requirements for web applications are the following: performance, scalability, compatibility, accessibility, usability, and security [34]. The following table presents a description of these non‐functional requirements and a list of verification activities that can be executed for web applications:

Table 2. Web Application Non‐Functional Testing
‐ Performance testing: It verifies the specified system performances, such as response time or service availability. It is executed by simulating many concurrent users accessing over a defined time interval. Failures revealed by performance testing are mainly due to running environment faults, such as scarce resources or not well deployed resources. There are two special cases of performance testing:
  ‐ Load testing (sometimes called volume testing): it requires that system performance is evaluated under some predefined conditions, such as the minimum and maximum activity levels of the running application.
  ‐ Stress testing: it is executed to evaluate a system or component at or beyond the limits of its specified requirements. It is used to evaluate system responses at activity peaks that can exceed system limitations, and to verify whether the system crashes or is able to recover from such conditions.
‐ Compatibility testing: It has to uncover failures due to the usage of different web server platforms or client browsers. Therefore, both the application and the running environment are responsible for compatibility failures.
‐ Usability testing: It aims at verifying to what extent an application is easy to use. Usability testing is mainly centred on the User Interface (UI): issues concerning the correct rendering of the contents (e.g. graphics, text editing format, etc.) as well as the clearness of messages, prompts and commands are to be considered and verified. Web usability testing is about the completeness, correctness and conciseness of the navigation. The application is mainly responsible for usability failures.
‐ Accessibility testing: It aims to verify that access to the content of the application is allowed even in presence of reduced hardware or software on the client side (such as browser configurations disabling graphical visualization or scripting execution), or of users with physical disabilities (such as blind people). The application is mainly responsible for accessibility failures.
‐ Security testing: It aims at verifying the effectiveness of the web defences against undesired access of unauthorized users or improper uses, and to grant access to authorized users to authorized services and resources. Both the running environment and the application can be responsible for security failures.

2.4.4. Web Testing Tools
The effectiveness of a testing process can significantly depend on the tools used to support the process. Testing tools usually automate some tasks required by the process, such as test case generation, test case execution, or result evaluation. A list of more than 400 testing tools (commercial and open source) is presented in http://www.softwareqatest.com/qatweb1.html. Web application testing tools can be classified using the following main categories [100]:
1. Supporting non‐functional requirements:
a. Load, performance and stress test tools.
b. Web security test tools.
c. HTML/XML validators.
2. Supporting conformance testing:
d. Link checkers.
e. Usability and accessibility test tools.
3. Supporting functional testing:
f. Web functional/regression test tools.
Regarding functional testing, existing tools’ main contribution is limited to managing test case suites created manually, and to matching the test case results with respect to an oracle created manually. The usage of browser elements ‐such as the back/forward or reload buttons‐ may negatively affect the navigation, because they might introduce inconsistencies or violate functional/non‐functional requirements of the application [34]. To avoid this situation, such features are worth considering when testing the behaviour of the web application. For example, [91] proposes a model and an approach to test the interaction between a web application and the browser buttons: the browser is modelled by a state chart diagram, each state is defined by the page displayed and by the state of the Back/Forward buttons, and user actions on page links or browser buttons determine the state transitions.


2.5. Automated Software Testing

Dustin et al. define Automated Software Testing (AST) as the “Application and implementation of software technology throughout the entire Software Testing Lifecycle (STL) with the goal to improve efficiencies and effectiveness” [38]. One of the software testing research dreams described by Bertolino in [16] is to achieve 100% AST. This dream is divided into: i) developing advanced techniques for generating the test inputs; ii) finding innovative support procedures to automate the testing process. Many surveys have highlighted the lack of AST tasks in most software organizations [128][125][50][53][104]. The main benefits of AST are [47]: anticipated cost savings, shortened test duration, heightened thoroughness of the tests performed, improvement of test accuracy, improvement of result reporting, as well as statistical processing and subsequent reporting. AST at system level is usually more difficult than at unit or integration level. Automated unit testing relies on predicting the outputs and encoding these predictions, which are then compared with the real outputs. At system level, the outputs are larger and cannot be easily predicted [132]. All in all, AST must provide tools that address test planning, test design, test construction, test execution, test results verification, and test reporting [109]. Hence, AST would be implemented by means of a powerful integrated test framework which takes care of generating or recovering the needed test case data, generating the most suitable test cases, executing them and finally issuing a test report. The following subsections present the following topics on AST, namely: i) Test case generation; ii) Test data generation; iii) Automated test oracle; iv) AST frameworks; v) AST frameworks for web applications.

2.5.1. Test Case Generation
Several approaches have been proposed for test case generation. In Model‐Based Testing (MBT), test cases are derived in whole or in part from a model that describes some (if not all) aspects of the SUT [8]. MBT is a form of black‐box testing because tests are generated from a model, which is derived from the requirements documentation. It can be done at unit, integration or system level. The difference from the usual black‐box testing is that, rather than manually writing tests based on the requirements documentation, a model of the expected SUT behaviour is created which captures some of the requirements. Then MBT tools are used to automatically generate tests from that model [141]. The main use of MBT is to generate functional tests, but it can also be used for some kinds of non‐functional tests, such as robustness or performance testing (under development). Specification‐based test case generation relies on a formal specification written in a given language, from which test cases are automatically generated for an implementation of that specification. Some examples of these languages are SDL [123] or the Z specification [133]. An important kind of specification‐based test generation uses contracts. A contract can be seen as a collection of the following constraints: pre‐conditions, post‐conditions, and invariants. A pre‐condition is a condition or a logic predicate that must be met just before the execution of a portion of code. A post‐condition is a condition or a logical predicate that must always be met just after the execution of a portion of code. An invariant is a condition or logical predicate that must always be met.


Along the same lines, OCL (Object Constraint Language) is a language for defining constraints in UML models [144], and JML (Java Modelling Language) combines the Design by Contract (DbC) approach [105] and the model‐based specification approach of the Larch family [57]. Golden software defines a correct version of a software artefact [134][143]. It has been employed in software testing to derive test cases by comparing a software component against its golden version. For example, in [39] golden test cases are used to compare other test cases in order to perform test suite selection/reduction, or even to generate Differential Unit Tests (DUT), which are a hybrid of unit and system tests. In the intelligent approach, test cases are identified by selecting goals such as a statement or branch. This approach employs computational intelligence and/or Artificial Intelligence (AI) techniques. Pedrycz and Vukovich presented in [88] a fuzzy approach to cause‐effect software modelling as a basis for designing test cases in black‐box testing. Last and Friedman demonstrate the potential use of data mining algorithms for automated induction of functional requirements [139]. The Record&Playback approach is carried out by first recording linear scripts corresponding to actions performed in the system (record stage). This script can be parameterized and, after that, the automation is done by repeating the recorded script and exercising the SUT (playback stage) [24].
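A minimal model‐based sketch is given below; the two‐state login model is invented for illustration. The expected behaviour of the SUT is captured as an FSM and abstract test cases are derived by covering every transition of the model at least once; each abstract case would afterwards be concretised into an executable test.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FsmTestGenerator {

    public static void main(String[] args) {
        // A tiny behavioural model: state -> (event -> next state).
        Map<String, Map<String, String>> model = new LinkedHashMap<String, Map<String, String>>();
        model.put("LoggedOut", new LinkedHashMap<String, String>());
        model.put("LoggedIn", new LinkedHashMap<String, String>());
        model.get("LoggedOut").put("validLogin", "LoggedIn");
        model.get("LoggedOut").put("invalidLogin", "LoggedOut");
        model.get("LoggedIn").put("logout", "LoggedOut");

        // Derive abstract test cases: every transition is covered at least once.
        List<String> testCases = new ArrayList<String>();
        for (String state : model.keySet()) {
            for (Map.Entry<String, String> transition : model.get(state).entrySet()) {
                testCases.add("from " + state + " apply " + transition.getKey()
                        + " expect " + transition.getValue());
            }
        }

        // Each abstract sequence would be turned into an executable test afterwards.
        for (String testCase : testCases) {
            System.out.println(testCase);
        }
    }
}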

2.5.1.1. Source Code Generation

An important aspect in test case generation is source code generation, which is the act of producing source code automatically. It is about writing programs that write programs, i.e. code generators. Code generators are separated into two high‐level categories: active and passive. Passive generators build a set of code, which the software engineer is then free to edit and alter at will. The passive generator maintains no responsibility for the code either in the short or the long term. The typical example of a passive generator is the “wizards” in Integrated Development Environments (IDEs). Active generators maintain responsibility for the code in the long term by allowing the generator to be run multiple times over the same output. As changes to the code become necessary, team members can input parameters to the generator and run it again to update the code. There are several types of active generators:
‐ Code Munging1. Given some input code, the munger picks out important features and uses them to create one or more output files of varying types. It usually employs regular expressions or simple source parsing, and then uses built‐in or external templates to build output files. It can be used to create documentation or to read constants or function prototypes from a file.
‐ Mixed‐code generator. It works just like the inline‐code expander described below, i.e. by using marks or specially formatted comments in order to embed additional code at these predefined positions. The difference from the previous type is that it saves the output to the same file that was used as input.

1 Munging is slang for twisting and shaping something from one form into another form.


‐ Inline‐code expander. This kind of generator accepts source code as input and expands it by replacing or embedding code at selected or marked points, producing the output source code. They are commonly used to embed Structured Query Language (SQL) into a source code file. The engine reads the file and finds the appropriate marks in the source code. When it finds them, it replaces them with (or embeds) the expanded source code. The purpose of this type is to keep the development code free of the infrastructure required to manage the SQL queries.
‐ Partial‐class generator. It accepts two inputs: i) a definition file, which keeps metadata written in a definition language such as XMI or any other free‐form language such as XML mark‐up, and describes the classes to be generated; ii) templates, which in conjunction with the definition file generate the source code of the output classes. These kinds of generators are often used for Object Relational Mapping (ORM) to build the data access tier of an application.
‐ Tier generator. In contrast to partial‐class generators, these generators are responsible for building the full tier of an application. The input files are the same as for the previous code generation type, i.e. a definition file which describes the classes and one or more templates which are processed by the engine to produce the final source code for the application tier. While this type seems to have the advantage of generating the full tier, partial‐class generators offer faster development and greater developer flexibility, because tier generators rely more on generic solutions and are very difficult to design for special cases.
‐ Template Metaprogramming. Metaprogramming is the name given to computer programs that manipulate other programs (or themselves) at runtime [11]. Template metaprogramming is a technique in which a template processor (also known as a template engine or a template parser) combines one or more templates with a data model to produce one or more result documents.
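To make the template‐based generation idea more tangible, the following small Java sketch (with hypothetical class and field names of my own) plays the role of a simple partial‐class/template generator: it combines a minimal class definition (the metadata) with a string template and prints the resulting source code.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal template-style code generator: a class definition (metadata)
// is combined with a template to produce Java source code.
public class SimpleClassGenerator {

    public static void main(String[] args) {
        // Definition (would normally come from an external XML/XMI definitions file).
        String className = "Customer";
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("name", "String");

        System.out.println(generate(className, fields));
    }

    static String generate(String className, Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(className).append(" {\n");
        // One field plus getter/setter per entry in the definition.
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey();
            String type = f.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            out.append("    private ").append(type).append(" ").append(name).append(";\n");
            out.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            out.append("    public void set").append(cap).append("(").append(type)
               .append(" value) { this.").append(name).append(" = value; }\n");
        }
        out.append("}\n");
        return out.toString();
    }
}

In a real generator the definition would be read from an external file and the templates processed by a template engine, but the overall flow (definition plus template producing generated source) is the same.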

2.5.2. Test Data Generation

Test data is the input needed for executing a test case. Test Data Generation (TDG) is a crucial software testing activity because test data is one of the key factors for determining the quality of the testing process. Automated Test Data Generation (ATDG) is an activity that automatically tries to create effective test data (i.e., test input values) for the SUT. TDG is an expensive and error‐prone process if it is done manually; ATDG, however, can curtail testing expenses while increasing the reliability of testing as a whole [97]. The values generated by ATDG must match two different criteria [115]:
1. Syntax criterion. It depends on the test level: i) Unit: test data for method parameters and non‐local variables. ii) System: test data for user‐level interaction.
2. Semantic criterion. It could be one of the following: i) To satisfy some test condition. ii) Special or invalid values. iii) Random values.
Test data is typically used in the following testing areas [96]:
‐ The coverage of specific program structures (white‐box testing).
‐ The exercising of some specific program feature (black‐box testing).


‐ Attempting to automatically disprove certain grey‐box properties regarding the operation of a piece of software, for example trying to stimulate error conditions.
‐ The verification of non‐functional properties, such as the Worst‐Case Execution Time (WCET) or the Best‐Case Execution Time (BCET) of a SUT.
The general ATDG problem is formally unsolvable [115]. TDG is an old research topic with many contributors. The following picture shows the main TDG/ATDG techniques that have appeared in the literature, from the traditional techniques (equivalence class partitioning, boundary value analysis, cause‐effect graphing, and random data generation) to more recent ones (path‐oriented, constraint‐based, goal‐oriented, and search‐based).

Figure 10. Test Data Generation Techniques (equivalence partitioning, boundary value analysis, cause‐effect graphing, random and anti‐random generation, path‐oriented and constraint‐based generation with dynamic domain reduction, goal‐oriented generation with its assertion‐oriented and chaining variants, and search‐based generation)

2.5.2.1. Equivalence Partitioning

Data generation for equivalence class partitioning (partition testing) was defined by Myers [110] in 1978 as “a technique that partitions the input domain of a program into a finite number of classes [sets], it then identifies a minimal set of well selected test cases to represent these classes. There are two types of input equivalence classes, valid and invalid”. The equivalence partitioning testing theory ensures that only one test case of each partition is needed to evaluate the behaviour of the program for the related partition. Boundary value analysis is a method which complements equivalence partitioning by looking at the boundaries of the input equivalence classes. NIST defined it in 1981 as “a selection technique in which test data are chosen to lie along ‘boundaries’ of the input domain [or output range] classes, data structures, procedure parameters” [2]. Boundary value test data usually includes the following values: min‐1, min, min+1, max‐1, max, and max+1.
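For instance, assuming an input specified as an integer range [min, max] (a hypothetical example of my own), the boundary values listed above can be derived mechanically:

import java.util.Arrays;
import java.util.List;

// Boundary value analysis for an integer input range [min, max]:
// the classic candidates are min-1, min, min+1, max-1, max, max+1.
public class BoundaryValues {

    static List<Integer> boundaries(int min, int max) {
        return Arrays.asList(min - 1, min, min + 1, max - 1, max, max + 1);
    }

    public static void main(String[] args) {
        // Example: a form field accepting an age between 18 and 65.
        System.out.println(boundaries(18, 65)); // prints [17, 18, 19, 64, 65, 66]
    }
}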

2.5.2.2. Cause–Effect Graphing

Cause‐effect graphing is an old technique which can be defined as either test case generation [105] or test case selection [2], besides test data generation.


The aim of cause‐effect graphing is to select the correct inputs to cover an entire effect set, and as such it deals with the selection of test data. Cause–effect graphing exercises the different combinations of inputs from the equivalence classes. A cause‐effect graph is a directed graph that maps a set of causes to a set of effects. It is useful for generating a reduced decision table, from which test cases are derived.

2.5.2.3. Random Data Generation

Random data generation consists of generating inputs at random until a useful input is found. This approach is quick and simple but might be a poor choice, since the probability of selecting an adequate input by chance could be low [92]. In the main derivative of random testing, namely anti‐random testing, each datum is chosen so as to maximize its distance from the previously generated test data.
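A minimal sketch of both flavours for a single integer input is shown below (my own illustrative code, using the absolute difference as a simple stand‐in for the distance metric):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Random vs. anti-random data generation for one integer input in [0, 100].
public class RandomDataGeneration {

    public static void main(String[] args) {
        Random random = new Random();

        // Purely random: each value is drawn independently.
        List<Integer> randomData = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            randomData.add(random.nextInt(101));
        }

        // Anti-random: each new value maximizes its minimum distance
        // to all previously chosen values.
        List<Integer> antiRandomData = new ArrayList<>();
        antiRandomData.add(random.nextInt(101)); // seed value
        for (int i = 1; i < 5; i++) {
            int best = 0;
            int bestDistance = -1;
            for (int candidate = 0; candidate <= 100; candidate++) {
                int minDistance = Integer.MAX_VALUE;
                for (int previous : antiRandomData) {
                    minDistance = Math.min(minDistance, Math.abs(candidate - previous));
                }
                if (minDistance > bestDistance) {
                    bestDistance = minDistance;
                    best = candidate;
                }
            }
            antiRandomData.add(best);
        }

        System.out.println("Random:      " + randomData);
        System.out.println("Anti-random: " + antiRandomData);
    }
}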

2.5.2.4. Path‐Oriented Data Generation

The path‐oriented data generation technique first transforms the source code of the program under test into a Control Flow Graph (CFG), which is a directed graph that represents its control structure. Then, the CFG is used to determine the paths to cover. Finally, test data for these paths is generated. The path‐oriented approach is usually supported by symbolic execution (also known as symbolic evaluation), an automatic static analysis technique that allows the derivation of symbolic expressions encapsulating the entire semantics of programs. It extracts information from the source code by abstracting inputs and sub‐routine parameters as symbols rather than by using actual values as during actual program execution [92]. In symbolic execution, variables are used instead of actual values while traversing the path.

Constraint‐based data generation is based on the path‐oriented techniques. It uses algebraic constraints over the input variables to describe the conditions necessary for the traversal of a given path. Constraint satisfaction problems are in general NP‐complete [96]. Dynamic Domain Reduction (DDR) is a TDG technique that was originally employed as part of constraint‐based testing, developed by DeMillo and Offutt [33]. DDR creates a set of values that executes a specific path. The DDR process is the following [115]:
1. Definition of an initial symbolic domain for each input variable.
2. Selection of a test path through the program.
3. Symbolic evaluation of the path, reducing the input domains at each branch.
4. Evaluation of the expressions with domain‐symbolic algorithms.
5. After walking the path, the values remaining in the input variables’ domains ensure execution of the path.
6. If a domain is empty, the path is re‐evaluated with different decisions at branches.
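As a very small illustration of these steps (my own example, not taken from [33] or [115]), consider the path through the fragment "if (x > 10) { if (x < 20) { target(); } }" in which both conditions are true; the domain of x is reduced at each branch until only values that execute the path remain:

// Dynamic Domain Reduction sketch for one integer input x and the path
// taken when both conditions of "if (x > 10) { if (x < 20) { target(); } }" hold.
public class DomainReductionSketch {

    public static void main(String[] args) {
        // Step 1: initial symbolic domain for the input variable x.
        int low = -1000;
        int high = 1000;

        // Step 3: evaluate the path, reducing the domain at each branch.
        low = Math.max(low, 11);  // branch 1: x > 10  =>  low becomes 11
        high = Math.min(high, 19); // branch 2: x < 20  =>  high becomes 19

        // Steps 5/6: if the domain is non-empty, any value in it covers the path;
        // otherwise the path is infeasible and other branch decisions must be tried.
        if (low <= high) {
            int testInput = (low + high) / 2; // e.g. pick the middle of the reduced domain
            System.out.println("Reduced domain: [" + low + ", " + high
                    + "], generated test input: " + testInput);
        } else {
            System.out.println("Empty domain: infeasible path, try other branch decisions");
        }
    }
}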

2.5.2.5. Goal‐Oriented Approach

In his paper published in 1992, Korel developed what became known as the goal‐oriented approach [86]. It can be described as a family of data generation techniques which aim to find a program input, and an execution sequence, on which a selected statement is executed. Test data is selected from the available pool of candidate test data to execute the selected goal, such as a statement, irrespective of the path taken. This approach involves two basic steps: to identify a set of statements (respectively, branches) the covering of which implies covering the criterion; and to generate input test data that execute every selected statement (respectively, branch).


Two typical approaches, the assertion‐based approach and the chaining approach, are known as goal‐oriented. The assertion‐based approach, proposed by Korel and Al‐Yami [87], attempts to find test cases that violate assertion conditions, which are embedded by the programmer into the program code. The chaining approach [85][43] uses the concept of an event sequence as an intermediate means of deciding the type of path required for execution up to the target node.

2.5.2.6. Search‐Based Data Generation

Search‐Based Software Engineering (SBSE) is an approach that applies metaheuristic search techniques to automate the construction of solutions to SE problems [60]. Metaheuristic search techniques are a set of high‐level optimization algorithms which utilise heuristics (i.e. experience‐based methods) to find solutions to combinatorial problems at a reasonable computational cost. These problems may have been classified as NP‐complete or NP‐hard, or be problems for which a polynomial time algorithm is known to exist but is not practical. Metaheuristic search techniques are not standalone algorithms in themselves, but rather strategies ready for adaptation to specific problems [59]. The term SBSE was first coined in 2001 [62], since which time a rapidly developing community has been working on this area. SBSE has been applied to problems throughout the SE lifecycle, such as requirements engineering, project planning, maintenance, reengineering, and testing [61]. Some metaheuristic techniques have been used in ATDG, such as hill climbing, simulated annealing, evolutionary algorithms (such as genetic algorithms), or tabu search. The general idea behind search‐based ATDG is that the set of possible inputs to the program forms a search space and the test adequacy criterion is coded as a fitness function. For example, in order to achieve branch coverage, the fitness function assesses how close a test input comes to executing an uncovered branch; in order to find the worst‐case execution time, the fitness is simply the duration of execution for the test case in question [60]. This function has to be designed by a human. Once a fitness function has been defined for a test adequacy criterion C, the generation of C‐adequate test inputs can be automated. This process is outlined in Figure 11 [61].

Figure 11. Generic Search Based Test Input Generation Scheme
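The following toy sketch of my own (not the generic scheme of Figure 11 itself) illustrates this idea for branch coverage: the fitness is a branch distance that measures how far an input is from taking the target branch, and a simple hill climber with random restarts minimizes it.

import java.util.Random;

// Search-based test data generation: hill climbing guided by a branch-distance
// fitness function, trying to cover the "then" branch of the condition below.
public class HillClimbingTestDataGenerator {

    // Code fragment under test: we want an input that makes this return true.
    static boolean coversTargetBranch(int x) {
        return x * x == 625;
    }

    // Branch distance: 0 when the branch is taken, larger when "further away".
    static long fitness(int x) {
        return Math.abs((long) x * x - 625);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int current = random.nextInt(20001) - 10000; // random starting point

        while (fitness(current) > 0) {
            int left = current - 1;
            int right = current + 1;
            // Move to the better neighbour; restart if neither improves (local optimum).
            if (fitness(left) < fitness(current)) {
                current = left;
            } else if (fitness(right) < fitness(current)) {
                current = right;
            } else {
                current = random.nextInt(20001) - 10000;
            }
        }

        System.out.println("Covering input found: x = " + current
                + ", branch taken = " + coversTargetBranch(current));
    }
}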


2.5.3. Automated Test Oracles

A test oracle is a reliable source of expected outputs. The oracle problem is the name given to one of the biggest challenges in software testing: how do we know that the software did what it was supposed to do when we ran a given test case? [145]. Generally, expected outputs are manually generated based on specifications or on developers’ knowledge of how the software should behave [5]. These manual oracles are costly and unreliable. Hence, automated test oracles are required to ensure testing quality while reducing costs. Complete automated test oracles can be expensive and sometimes impossible to provide. Several research efforts have been made to provide automated test oracles, but none of them could completely automate all test oracle activities in all circumstances [131]. The most important challenge in developing a complete automated test oracle is the output generation. In order to provide a reliable oracle, it is suggested that there should be a simulated model that behaves like the SUT and automatically generates expected outputs for every possible input defined in the specification. The survey on automated test oracles carried out by Shahamiri in [131] describes the following methods:
‐ N‐Version diverse systems and M‐Model program (M‐mp) testing.
‐ Decision tables.
‐ Info Fuzzy Network (IFN) regression tester.
‐ Artificial Intelligence (AI) planner test oracle.
‐ Artificial Neural Network (ANN) based test oracle.
‐ Input/Output (I/O) analysis based automatic expected output generator.
N‐Version diverse testing is a method, presented in [93], based on various implementations of a program providing the same functionality. A gold version (i.e. a trusted implementation of the SUT) is used to automate the oracle. This method is very expensive, so the authors reduce the cost by using M‐mp testing, which increases the reliability of the process by providing a more precise oracle. A decision table is a requirements representation model used wherever there are many conditions affecting responses. A decision table consists of a condition section (combinations of inputs) and an action section (combinations of outputs for when the conditions are satisfied). Each row in the decision table presents a unique combination of conditions. Di Luca et al. applied decision tables in unit and integration web testing for both client and server pages in [90]. The following table shows a decision table template:

Table 3. Decision Table Template
Input Section: Input variable sections | Input action | State before test
Output Section: Expected result | Expected output | Expected state after test
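A decision table of this kind can be turned directly into an executable oracle. The sketch below (a hypothetical login example of my own, unrelated to [90]) encodes three rows of such a table and compares the actual output of the SUT against the expected result of each row.

import java.util.ArrayList;
import java.util.List;

// Decision table used as a test oracle: each row maps a combination of input
// conditions to the expected output of the SUT.
public class DecisionTableOracle {

    // Hypothetical SUT: a trivial login check.
    static String login(boolean validUser, boolean validPassword) {
        return (validUser && validPassword) ? "WELCOME" : "ACCESS_DENIED";
    }

    // One row of the decision table: input section + output section.
    static class Row {
        final boolean validUser;
        final boolean validPassword;
        final String expectedOutput;

        Row(boolean validUser, boolean validPassword, String expectedOutput) {
            this.validUser = validUser;
            this.validPassword = validPassword;
            this.expectedOutput = expectedOutput;
        }
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row(true, true, "WELCOME"));
        table.add(new Row(true, false, "ACCESS_DENIED"));
        table.add(new Row(false, true, "ACCESS_DENIED"));

        for (Row row : table) {
            String actual = login(row.validUser, row.validPassword);
            String verdict = actual.equals(row.expectedOutput) ? "PASS" : "FAIL";
            System.out.println("inputs=(" + row.validUser + ", " + row.validPassword
                    + ") expected=" + row.expectedOutput + " actual=" + actual
                    + " [" + verdict + "]");
        }
    }
}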

IFN regression testing is based on an approach developed for knowledge discovery and data mining. It uses AI methods to simulate the SUT behaviour and employs the resulting model as a test oracle [88].


An IFN presents the functional requirements as a tree‐like structure, where each input attribute is associated with a single layer and the leaf nodes correspond to combinations of input values. AI planning is applied as an automated GUI test oracle in [142], modelling the internal behaviour of the GUI using a representation of GUI elements and actions. A formal model composed of GUI objects and their specifications is applied as oracle. GUI actions are defined by their preconditions and effects. Expected states are automatically generated using the model. Using an ANN as a test oracle requires training a neural network to simulate the software behaviour by means of I/O pairs used as training patterns. Since ANNs can memorize or learn from I/O pairs, it is possible to apply them as test oracles. This approach has been applied in [142]. Several approaches to semi‐automated expected output generation have also been presented, which find I/O relationships by changing the input values and executing the program while observing the outputs [121][30][28][130]. The drawback of these methods is that incomplete I/O relationship detection may result in an imperfect test oracle.

2.5.4. AST Frameworks

AST is most effective when implemented within a framework. Testing frameworks may be defined as a set of abstract concepts, processes, procedures, and an environment in which automated tests will be designed, created, and implemented. This framework definition includes the physical structures used for test creation and implementation, as well as the logical interactions among those components. A powerful AST framework must provide tools that address test planning, test design, test construction, test execution, test results verification, and test reporting [109]. According to the Automated Testing Institute2 (ATI), there are three different kinds of AST frameworks: 1st, 2nd, and 3rd generation frameworks.

The 1st generation framework is primarily comprised of the linear approach to automated test development. This approach typically yields a one‐dimensional set of automated tests in which each automated test is treated simply as an extension of its manual counterpart. Driven mostly by the use of Record & Playback (R&P), all components that are executed by a linear script largely exist within the body of that script. There is little to no modularity, reusability, or any other quality attribute considered in the creation of linear scripts. Linear scripts may be useful in environments with a very small scope. There are no calls to external modules or external data in a linear script.

The 2nd generation comprises two kinds of frameworks: the data‐driven framework and the functional decomposition framework. Frameworks built on data‐driven scripting are similar to linear scripts. The difference is how the data is handled. The data used in data‐driven scripts is typically stored in a database or file external to the script. Functional decomposition refers to the process of producing modular components (user‐defined functions) in such a way that automated test scripts can be constructed to achieve a testing objective largely by combining these existing components.

The 3rd generation frameworks are the most elaborate ones. They require proficiency in the automation method being used to develop the framework.

2 http://www.automatedtestinginstitute.com/


The two frameworks that fit into this generation are the keyword‐driven and the model‐based frameworks. The keyword‐driven frameworks (often called “table‐driven”) process automated tests that are developed in data tables with a vocabulary of keywords (“action” words) that are independent of the automated test tool used to execute them. The keywords are associated with application‐specific and application‐independent functions and scripts that interpret the keyword data tables along with their application‐specific data parameters. The automated scripts execute the interpreted statements on the SUT. The model‐based frameworks (often called “intelligent frameworks”) go beyond creating automated tests that are executed by the tool. These frameworks are typically “given” information about the application, and the framework “creates” and executes tests in a semi‐intelligent manner. Test automators describe the features of an application, typically through state models that depict the basic actions that may be performed on the application, as well as the broad expected reactions. Armed with this information, the framework dynamically implements tests on the application.
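As an illustration of the keyword‐driven style just described (a deliberately tiny interpreter of my own, not the design of any commercial tool), each test step is a data row of (keyword, target, value) that a small dispatcher maps onto tool‐independent actions:

// Minimal keyword-driven framework sketch: test steps are data rows
// (keyword, target, value) interpreted by a small dispatcher.
public class KeywordDrivenRunner {

    public static void main(String[] args) {
        String[][] testTable = {
                {"open",   "http://example.com/login", ""},
                {"type",   "username",                 "alice"},
                {"type",   "password",                 "secret"},
                {"click",  "loginButton",              ""},
                {"verify", "title",                    "Home"}
        };

        for (String[] step : testTable) {
            execute(step[0], step[1], step[2]);
        }
    }

    // The dispatcher maps keywords to actions; in a real framework each action
    // would drive a browser or another test tool instead of printing.
    static void execute(String keyword, String target, String value) {
        switch (keyword) {
            case "open":
                System.out.println("Opening " + target);
                break;
            case "type":
                System.out.println("Typing '" + value + "' into " + target);
                break;
            case "click":
                System.out.println("Clicking " + target);
                break;
            case "verify":
                System.out.println("Verifying that " + target + " equals '" + value + "'");
                break;
            default:
                throw new IllegalArgumentException("Unknown keyword: " + keyword);
        }
    }
}

In a real framework the table would live in an external spreadsheet or database and would be maintainable by non‐programmers, which is the main appeal of this generation.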

2.5.5. AST Frameworks for Web Applications

Following the framework classification depicted before, the following table summarizes some of the most significant AST frameworks for web applications to date. Each of these frameworks is analysed in the following sub‐sections.

Table 4. Automated Software Testing Frameworks for Web Applications
Framework | Generation | Creator | License | Operating System
SOATest3 | 2nd and 3rd | Parasoft | Proprietary | Windows, Linux, Solaris
HP Quality Center4 | 2nd and 3rd | HP | Proprietary | Windows, Linux, Solaris, AIX, HP‐UX
IBM Software Quality Management5 | 2nd and 3rd | IBM | Proprietary | Windows, Linux, Solaris, AIX, z/OS
Selenium6 | 1st | ThoughtWorks | Apache | Cross‐Platform
Silk7 | 2nd and 3rd | Micro Focus | Proprietary | Windows, Red Hat Enterprise, Solaris
STAF8 | 3rd | IBM | EPL | Cross‐Platform
TestComplete9 | 3rd | AutomatedQA | Proprietary | Windows
WATIR10 | 1st | Bret Pettichord and Paul Rogers | BSD | Cross‐Platform

3 http://www.parasoft.com/jsp/products/soatest.jsp?itemId=101
4 ://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=bto&cp=1‐11‐127‐24_4000_100__
5 http://www‐142.ibm.com/software/products/us/en/subcategory/rational/SW730
6 http://seleniumhq.org/
7 http://microfocus.com/products/silk/index.aspx
8 http://staf.sourceforge.net/
9 http://www.automatedqa.com/testcomplete/
10 http://watir.com/


2.5.5.1. SOATest

Parasoft SOAtest is a quality platform that automates web application testing, message/protocol testing, cloud testing, security testing, and behaviour virtualization. Parasoft SOAtest is packaged together with Parasoft Load Test, and they can be integrated with Parasoft language products such as JTest, to help teams prevent and detect application‐layer defects from the start of the Systems Development Life Cycle (SDLC).

2.5.5.2. HP Quality Center

HP Quality Center is a software quality tool suite. Many of the tools of the suite were acquired from Mercury Interactive Corporation. It offers QA support, including requirements management, test management and business process testing for IT and application environments. The products of this suite are the following:
‐ HP Business Process Testing for Oracle/SAP: System for defining and executing business‐centric tests.
‐ HP Center Management for Quality Center: Project management of QA workflows.
‐ HP Change Impact Testing for SAP Application: Recommendations on SAP testing priorities.
‐ HP Functional Testing: Complete automated testing solution for functional, GUI and regression testing.
‐ HP Quality Center: Web‐based application that supports all aspects of test management.
‐ HP QuickTest Professional: Provides functional and regression testing automation for major software application environments.
‐ HP Requirements Management: Captures, manages and tracks requirements at every step of the application development and testing process.
‐ HP Service Test Management: Automatic QA and test assets for any application component or service in Service‐Oriented Architectures (SOA).
‐ HP Service Test: Simplifies the automated functional testing of SOA services.

2.5.5.3. IBM Software Quality Management

IBM Rational Quality Management is a family of products which helps to deliver enduring quality throughout the product and application lifecycle. The tools in this suite are the following:
‐ Rational Application Performance Analyzer: Pinpoints and helps understand the root cause of actual bottlenecks in the application.
‐ Rational AppScan Product line: Static and dynamic security testing in all stages of application development.
‐ Rational Functional Tester: Automated functional testing of Java, Web and VS.NET WinForm‐based applications.
‐ Rational Functional Tester Plus: Functional and regression testing solution covering a wide variety of software applications.
‐ Rational Performance Tester: Verifies acceptable application response time and scalability under variable multi‐user loads.
‐ Rational Policy Tester OnDemand Privacy, Quality and Accessibility Edition: Web‐based, multi‐user solution providing centralized scanning of Web content for accessibility, privacy, and quality compliance respectively.


‐ Rational Professional Bundle: Provides enterprise desktop tools to design, construct, and test J2EE/Portal/Service‐oriented applications.
‐ Rational Purify: Dynamic software analysis tool for Windows/Linux/UNIX application development.
‐ Rational Quality Manager Product line: Web‐based centralized test management environment.
‐ Rational Robot: General purpose test automation tool for client/server applications.
‐ Rational Service Tester for SOA Quality: A regression and functional testing solution for testing GUI‐less services.
‐ Rational Software Analyzer: Provides capabilities to ensure reliable, quality code.
‐ Rational Test Lab Manager: Builds and configures test environments, providing inventory control and analytics.
‐ Rational Test RealTime: Helps identify and resolve issues early in the development cycle.

2.5.5.4. Selenium

Selenium is a testing framework for web applications. Selenium was first developed by a team of programmers at ThoughtWorks (an IT consultancy). It has been released under the Apache 2.0 license. Selenium is composed of different projects, which are summarized in the following table:

Table 5. Selenium Projects
‐ Selenium IDE: Add‐on that allows recording and playing back tests in Firefox 2+.
‐ Selenium Remote Control: Selenium RC is a client/server system that allows you to control web browsers locally or on other computers, using almost any programming language and testing framework.
‐ Selenium Grid: Allows Selenium RC to run tests on many servers at the same time.
‐ Selenium Core: The original JavaScript‐based testing system.
‐ Selenium on Rails: Provides a suite to run Selenium tests for Rails applications.
‐ Selenium on Ruby: The hub for newer Ruby‐related Selenium projects (work in progress).
‐ CubicTest: Graphical Eclipse plug‐in to write Selenium and Watir (Web Application Testing in Ruby) tests.
‐ Bromine: Web‐based QA tool that enables running and reporting Selenium tests.

Selenium Core tests run directly in a browser, just as real users do. They run in Internet Explorer, Mozilla and Firefox on Windows, Linux, and Macintosh. Written in pure JavaScript/DHTML, Selenium Core allows the tests to run in any supported browser on the client‐side, which is the mechanism that allows it to run on so many platforms. Selenium Remote Control (RC) is a web server written in Java that accepts HTTP commands. RC makes it possible to write automated tests for a web application in any programming language, which allows a better integration of Selenium in existing unit test frameworks. To make writing tests easier, the Selenium project currently provides client drivers for Python, Ruby, .NET, Perl, Java, and PHP. The Java driver can also be used with JavaScript (via the Rhino engine).
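A hedged example of a Selenium RC client test written in Java is shown below; it assumes the Selenium RC Java client driver on the classpath, a Selenium Server listening on localhost:4444, and hypothetical locators (q, btnSearch) for a search page, so the concrete values would have to be adapted to a real SUT.

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

// Selenium RC client example: the test talks to a Selenium Server (localhost:4444),
// which launches and remotely controls a Firefox browser.
public class SeleniumRcExample {

    public static void main(String[] args) {
        Selenium selenium = new DefaultSelenium(
                "localhost", 4444, "*firefox", "http://www.example.com/");
        selenium.start();

        selenium.open("/");                       // navigate to the base URL
        selenium.type("q", "software testing");   // hypothetical search field locator
        selenium.click("btnSearch");              // hypothetical button locator
        selenium.waitForPageToLoad("30000");

        // Simple oracle: check that the results page contains the expected text.
        if (selenium.isTextPresent("software testing")) {
            System.out.println("Test passed");
        } else {
            System.out.println("Test failed");
        }

        selenium.stop();
    }
}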


2.5.5.5. Silk

Silk is an automated software quality management solution. It ensures that developed applications are reliable and meet the needs of business users by automating the testing process. It prevents or discovers quality issues early in the development cycle. The following products compose the Silk tool suite:
‐ SilkPerformer: Automated software load, stress and performance testing.
‐ SilkPerformer Diagnostics: Accelerates the resolution of found performance problems.
‐ SilkCentral Test Manager: Automated test management solution that can manage agile or traditional test cycles.
‐ SilkTest: Automation tool for testing the functionality of enterprise applications. It also has support for Web 2.0 applications.
‐ Silk4Net and Silk4J: IDE support for Eclipse or Visual Studio.
‐ TestPartner: Automated functional and GUI testing.
‐ DataExpress: Automated test data generation and management.

2.5.5.6. STAF

STAF (Software Testing Automation Framework) creates and manages automated test cases and test environments. It externalizes its capabilities through services. STAF Proc is the process that runs on a machine, called a STAF Client, which accepts requests and routes them to the appropriate service. These requests may come from the local machine or from another STAF Client. Thus, STAF works in a peer environment, where machines may make requests of services on other machines.

2.5.5.7. TestComplete

TestComplete is an automated testing tool in which tests can be recorded, scripted or created manually with keyword operations, and then used for automated playback and error logging. It is used for testing applications such as web, Windows, Flash, .NET and Java. It automates front‐end UI/functional testing as well as back‐end testing such as database and HTTP load testing.

2.5.5.8. WATIR

Watir (pronounced water) stands for “Web Application Testing in Ruby”. It is an automated test tool that uses the Ruby scripting language to drive the web browser. Watir is a toolkit for automated tests to be developed and run against a web browser. The following example is a very simple Watir script that navigates to google.com, performs a search, and validates the results:

Snippet 1. Watir Script Example
require 'watir'
test_site = 'http://www.google.com'
b = Watir::Browser.new
b.goto(test_site)
b.text_field(:name, "q").set("pickaxe")
b.button(:name, "btnG").click
if b.text.include?("Programming Ruby")
  puts "Test Passed. Found the test string: 'Programming Ruby'."
else
  puts "Test Failed! Could not find: 'Programming Ruby'"
end


2.6. Summary

Quality control (V&V) is an important topic within SE. It can be divided into two important groups of activities: testing and analysis. On the one hand, the nature of testing is dynamic, since it involves exercising an application and observing its outcomes. On the other hand, the nature of analysis is static, since it involves the evaluation of software artefacts (typically source code, but it is also possible to analyse specifications, designs, models, and so on) without executing them. Both activities are very important to ensure the quality of a software product.

Regarding (static) analysis, there are several techniques reported in the literature. Inspections are examinations of software artefacts by human inspectors aimed at discovering faults in software systems. Review is the process in which a group of people examine the software looking for potential problems. Automated Software Analysis (ASA) assesses the source code using patterns that are known to be potentially dangerous. Finally, formal methods refer to any activities that rely on mathematical representations of software, including formal specification and verification.

Regarding (dynamic) testing, it covers a wide spectrum of different concepts, such as testing levels (unit, integration, system, and so on), testing strategies (black‐box, white‐box, grey‐box, and non‐functional testing), and testing processes (manual, model‐based, automated testing, and so on). Web testing aims to find defects in web applications. Automated Software Testing (AST) can be seen as the application of software technology to the STL with the goal of improving the effectiveness of testing. AST involves several aspects, such as test case generation, test data derivation, and automated oracles. AST is most effective when implemented within a framework.

By analysing the existing proposals to perform automated assessment of web applications, it is clear that a lot of work has been done in this field. Nevertheless, these achievements are usually scattered or incomplete, since they do not involve AST and ASA at the same time. Therefore, I conclude that there is still room for improvement, and this dissertation will make original contributions in this domain.


Chapter 3. Objectives

We can only see a short distance ahead, but we can see plenty there that needs to be done.

‐ Alan Turing

As is clear from the study of the state‐of‐the‐art, quality control activities are crucial in software development but also time‐consuming. Nowadays, the software business must be responsive, i.e. it should change and adapt very quickly to external demands. The delivery challenge looks for shortening delivery times for large and complex systems without compromising software quality. The automation of quality control (which is the major topic of this PhD dissertation) has been proposed as a solution to this challenge. After the analysis of the state‐of‐the‐art in this area, I concluded that there was still room for improvement in this field, since there is not a complete solution which addresses at once the automation of testing and analysis activities for web applications. Therefore, the overall objective that this dissertation proposes is to investigate and improve the current processes and mechanisms that support the automation of quality control (testing and analysis) for web applications. The outcome of the work to be done will facilitate the improvement of software quality for web applications while reducing the time to market and saving total development costs.

In order to divide the main aim of this dissertation into several specific objectives, I am going to rely on Pressman’s definition of SE. Pressman proposed a four‐layer approach to define any engineering approach [122], such as SE. This approach is illustrated in Figure 12 in the form of a pyramid. At the bottom of the structure, SE must rest on an organizational commitment to quality. According to Pressman, “The bedrock that supports software engineering is a quality focus”. The next foundation for SE is the process layer, which is a collection of activities, actions and tasks that are performed when some work product is created. Process defines a framework established for the effective delivery of a SE technology. The next layer is SE methods, which provide the technical how‐to for building software. In other words, methods establish the way of solving a problem. Finally, SE tools provide the practical support for the processes and methods.



Figure 12. Software Engineering Layers (pyramid with Quality at the bottom, followed by Processes, Methods, and Tools)

Therefore, using the SE layer decomposition described before, the main objective of this dissertation may be divided into the following set of specific goals:
1. To propose a complete methodology for the automation of software quality control for web applications. This objective has to do with the design of a complete approach to achieve the automation of software testing and static analysis within the development lifecycle of web applications. It corresponds to the bottom layers of the SE pyramid depicted before, i.e. quality and processes. To achieve this goal, the high‐level quality attributes to be ensured first have to be defined in order to guide the V&V processes. For each of these quality attributes, specific software testing and/or static analysis techniques will be selected. The next step in this methodology will be to define the process for performing these quality control activities in an automated way. The automated quality control to be proposed will be carried out during the development lifecycle of the web application in question. Hence, the methodology should reuse software development artefacts from the analysis and design phases (such as requirements or models) in order to guide the automation of testing and analysis activities as far as possible.
2. To analyse the challenges and potential problems of automated software testing for web applications. This goal should establish how the automation of software testing will be performed, i.e. the method. As presented in the state‐of‐the‐art section, software testing is a broad term encompassing a wide spectrum of different activities. The methods to be proposed should define how testing activities will be carried out in order to achieve greater automation. Therefore, the definition and election of the testing levels (unit, integration, and system) and strategies (functional, non‐functional, and structural testing) will be carried out to achieve this goal. Software testing is an important topic in this dissertation since it is the most commonly performed activity within V&V. Once the high‐level processes for automated software testing have been defined in the methodology, specific automated software testing methods have to be defined.


Due to their peculiarities, web applications are difficult to test. Therefore, specific contributions have to be made in order to achieve automated software testing for web applications, concretely in test case (test data, fixture, and oracle) generation, test execution, and test reporting.
3. To propose a detailed model to perform automated analysis for web applications. Many authors have highlighted that software testing should be done in conjunction with static analysis for good quality control assurance. Therefore, this dissertation should study and select the most suitable automated static analysis techniques to be carried out for web applications. Moreover, static analysis presents several advantages over software testing, since it considers broader quality attributes and it is not concerned with the interaction between errors. Thus, this contribution should be aware of this situation, making the most of automated static analysis for specific features of web applications. Therefore, and following the guidelines depicted in the methodology, this contribution should define precise methods to achieve this kind of V&V specifically for web applications.
4. To validate the feasibility of the research approach by means of developing a reference architecture for the proposed methodology. The reference architecture provides further details on the elements included in the methodology. This architecture will define or select specific tools which implement the proposed methods for testing and analysis. A working prototype will be developed and validated against a set of representative case studies to verify that it addresses the objectives presented in this dissertation. These case studies will be performed using real web applications. In addition, as a major part of the validation, this dissertation contributes to international research projects as well as open‐source communities aligned with the context and objectives of this work.


Chapter 4. Methodology Foundations

The real voyage of discovery consists not in seeking new lands but seeing with new eyes.

‐ Marcel Proust

This section describes the high‐level view of the proposal to automate quality control for web applications. First, I am going to describe precisely how the target of the proposed approach (i.e. web applications) is understood in the scope of the dissertation. Second, the generic approach to automate quality control (testing and analysis) for web applications will be depicted. Third, the quality dimensions to be assessed with V&V activities will be selected and explained. Finally, the process to guide the automation of quality control will be described.

4.1. Web Applications

Web applications follow a client‐server application protocol. The web client (using a web browser, such as Internet Explorer, Firefox or Chrome) sends an HTTP request through a TCP/IP network (typically the Internet) to a web server. The server receives this request and determines the requested page, which usually contains some scripting language to connect with a database server. A middleware component connects the web server with the database to issue the query and get the requested data. This data is used to generate an HTML page, which is sent back to the client in the form of an HTTP response. This typical architecture for web applications is illustrated in the following picture:

Figure 13. Typical Web Applications Architecture


Nowadays, more and more web applications are not limited to the synchronous interaction of HTTP requests and responses. Using the group of technologies called AJAX (Asynchronous JavaScript and XML), web applications can send requests and retrieve responses asynchronously. This interaction with the web server is done without interfering with the displayed web page [22]. The XMLHttpRequest object is the core of AJAX. It is an API that can be used by JavaScript, JScript, VBScript and any other scripting language to transfer and manipulate data between the server and the client using HTTP. The original XMLHttpRequest concept (called XMLHTTP) was initially developed by Microsoft as part of Outlook Web Access 2000. In 2010, the W3C released the final draft specification for the XMLHttpRequest object to create an official web standard11.

Therefore, web applications involve heterogeneous technologies, components (browsers, servers, and databases), programming languages, networking aspects, and so on. Thus, the global quality control of these kinds of applications is a very complex endeavour, and the automation of such activities is even harder. All in all, it is not possible to cover every aspect of the web chain in a single PhD dissertation. Hence, the focus of this piece of research will be web applications from the client‐side view. This choice is based on the fact that the client‐side view of a web application is the real key differentiator for such applications. If the client‐side and the HTTP communication are dropped from the picture, the resulting system is essentially a normal application with some business logic, a database, and so on. These kinds of applications can be tested and analysed using traditional testing approaches. In addition, the quality in use of web applications is perceived on the client‐side. According to ISO 9126, quality in use can be considered as the highest level of quality, since it is experienced by the consumers during operation and maintenance, and it is influenced by external and internal quality (see Figure 5 in section 2.1.2.1.1). A key aspect of the quality in use of web applications is the number of defects detected on the client‐side. Therefore, the final aim of quality control should be minimizing such defects (defect reduction). Faults in web applications are caused by errors made by human developers. Automated fault detection is intended to minimize the number of failures in a web system, and therefore the number of incidents that final consumers (users/customers) notice in the application. This defect chain is illustrated in the following picture:

11 http://www.w3.org/TR/XMLHttpRequest/


Figure 14. Software Defects in Context

These defects are mainly injected in the design and coding phases of the overall software development lifecycle, comprising 90% of the inserted faults [31]. The cost of correcting faults when the system is in operation increases exponentially [18][122]. These facts are illustrated in the following chart:

Figure 15. Fault Origin/Detection Distribution and Cost

4.2. Automated Quality Control Activities

Quality control activities are an important part of any Software Development Lifecycle (SDLC). As depicted in the state‐of‐the‐art section, quality control activities can be divided into two big groups: (dynamic) testing and (static) analysis. This section describes how these activities will be automated for the client‐side of web applications, which is the target of the generic approach proposed in this dissertation.

4.2.1. Automated Software Testing

Planning, design and execution of testing activities are carried out throughout the software development process. These activities are divided into phases, forming the testing procedure illustrated in Figure 16 and described as follows [111][47]:
1. Test requirements. The aim of this step is to define the features of the software artefact under test that the tests must satisfy or cover.


These requirements are sometimes compiled in a document named test plan (sometimes known as a test specification). A test plan provides a set of ideas about how the tests will be conducted, and includes the following features:
‐ Quality views or attributes are constraints on the services or functions offered by the system, that is, non‐functional requirements. It is not possible for any system to be optimized for all of these attributes. For example, improving robustness may lead to a loss of performance. The test plan should therefore define the most important quality attributes for the software that is being developed [132].
‐ The test goal or objective is the intention or purpose of the testing activities. There are two distinct testing objectives [132]: i) To demonstrate to the developer and the customer that the software meets its requirements (verification). ii) To discover situations in which the behaviour of the software is incorrect or does not conform to its specification (defect testing). There is no defined boundary between these two approaches: during verification testing defects can be found, and during defect testing some of the tests can show whether or not the SUT meets its requirements.
‐ Test process is the description of the steps in the lifecycle performed according to procedures that have been approved conforming to the QA plan adopted by the developing organization.
2. Test design. This stage produces the description of test cases according to the test plan. In some cases, the test design is described in a document called test model, although the test design can also be depicted in the test plan. A test case (or simply a test) is a procedure, whether manually executed or automated, that can be used to verify that the SUT is behaving as expected [101]. A collection of test cases running together is known as a test suite. A test case prepared in a form ready to be executed on the SUT and produce a report is known as a test script [5]. The following features should be defined for each test case:
‐ The first part of the test design should be identifying the elements to be assessed. The test level sets the scale of the piece of code under test, or where test cases are added in the software development process, for example unit, integration, or system testing.
‐ The test strategy (also known as method or approach) is the point of view of the test case, namely: black‐box (functional), white‐box (structural), or non‐functional (e.g. performance, security, usability, or reliability testing, among others, depending on the non‐functional requirement to be checked). The combination of the black‐box and white‐box approaches is usually known as grey‐box testing.
‐ Testing activities should be driven by Computer‐Aided Software Engineering (CASE) programs, i.e. software testing tools. In the test design phase the test tools to be employed should be selected.
3. Test implementation. This stage instantiates each of the designed test cases. There are two possible strategies to select test cases [132]: i) Partition testing, where testers identify groups of inputs that have common characteristics. ii) Guideline‐based testing, where experience is used to choose test cases. Each selected test case should include:
‐ Test logic, the part of the test case that implements the test design using a programming language. The test fixture (also known as test context) is the part of the test logic which ensures that there is a controlled, stable environment in which tests are run so that results are repeatable. A test fixture first creates the desired state for the test prior to its execution (setup) and then cleans up after the test execution (teardown).


‐ Test data, i.e. the inputs for test cases. It can be based on the requirements specification, the source code, or the tester’s expectations. Test inputs are selected depending on the test goal.
‐ Test oracle, which is a mechanism used to determine whether a test has passed or failed. It could be an entity‐program within the test case, a process, or a human expert. The test oracle within a test case is the code that decides success or failure for that test data [145]. A test oracle has two different parts, namely the oracle information (expected output of the program for the selected input) and the oracle procedure (comparator to verify actual results) [146]. The expected output is a complex entity that may include the following: i) Values produced by the program, such as outputs for local observation (integer, text, audio, image) or messages for remote storage, manipulation, or observation. ii) State changes, such as a state change of the program or of the database (due to add, delete, and update operations). The complexity of the comparison depends on the complexity of the data to be observed.
4. Test execution, i.e. performing the actual tests.
‐ Test cases exercise the SUT, i.e. the program is executed and its actual outcome is observed.
‐ Using the test oracle, a test verdict is assigned to the test case execution. There are three major kinds of test verdicts: i) Pass: the program produces the expected outcome and the purpose of the test case is satisfied. ii) Fail: the program does not produce the expected outcome. iii) Inconclusive: in some cases it may not be possible to assign a clear pass or fail verdict. For example, if a timeout occurs while executing a test case on a distributed application, it is not possible to assign a clear pass or fail verdict. An inconclusive verdict means that further tests are needed to refine the verdict.
‐ A test report must be written after analysing the test results. The motivation for writing a test report is to get the found defects fixed.
A minimal code sketch of the implementation elements of a test case (fixture, test data, and oracle) is given right after this procedure.
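The sketch below uses JUnit 4; the class under test (a trivial calculator) and its values are hypothetical and only serve to locate the fixture, the test data, and the oracle in actual code.

import static org.junit.Assert.assertEquals;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

// Minimal JUnit 4 test case showing fixture (setup/teardown), test data and oracle.
public class CalculatorTest {

    // Hypothetical class under test.
    static class Calculator {
        int add(int a, int b) {
            return a + b;
        }
    }

    private Calculator calculator;

    @Before
    public void setUp() {
        // Fixture setup: create the desired state before the test runs.
        calculator = new Calculator();
    }

    @After
    public void tearDown() {
        // Fixture teardown: clean up after the test execution.
        calculator = null;
    }

    @Test
    public void additionOfTwoPositiveNumbers() {
        int actual = calculator.add(2, 3); // test data: inputs 2 and 3
        assertEquals(5, actual);           // oracle: expected output is 5; the framework
                                           // derives the pass/fail verdict from this check
    }
}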

Figure 16. Generic Testing Activities

Once the testing procedure has been established, I describe how to automate these steps. The aim of the requirements stage is to define the test plan, and it is essentially manual. The testing requirements depend directly on the system requirements (functional and non‐functional). In order to achieve automation, I propose to reuse the development requirements as testing requirements.


Nevertheless, the specific quality views, test goal and process for testing web applications should be defined. Regarding the test goal, it is twofold: verification testing (ensuring requirements) and defect testing (seeking faults). The following sub‐sections describe the quality views to be assessed and the generic process to achieve this aim. The design step has to do with the test model which supports the test plan, and it also requires the intervention of human testers. They should state the test requirements, the test strategy and the test levels to be carried out. Once the test plan and model are defined, the instantiation of the test model will create test cases (implementation). This part can be effectively automated. This derivation can be divided into the following steps: i) Test logic generation; ii) Test data generation; iii) Test oracle generation. Finally, the execution and report steps are usually automated, since these steps are carried out by testing tools. The definition of the automation in test design, implementation and execution is depicted in sections 5, 6 and 7 of this dissertation.

4.2.2. Automated Software Analysis

Besides AST, the second activity to achieve automated quality control is Automated Software Analysis (ASA). The difference is that ASA does not involve the execution of the SUT, but instead examines the source code statically. ASA is based on the use of a set of rules to guide the analysis of the source code. These rules can be divided into the following groups:
‐ Best practices are generally accepted techniques, methods, or processes that have proven their value over time.
‐ Patterns are reusable solutions to recurring problems.
‐ Assumptions are conjectures about the correct way of working of a software component.
‐ Bad smells are undesirable symptoms within the source code.
‐ Fault descriptions are representations of problematic issues within the source code.
Basic static analysers run simple text‐based searches for strings and patterns in source code files, recursively analysing the code base for faults and then generating an analysis report. More modern static‐analysis tools trace the data’s path through the code to provide a more complete and accurate analysis [71]. This behaviour is illustrated in the following picture:

Figure 17. Generic Analysis Activities

This process is automated by definition: the analyser employs pre‐established rules which guide the analysis, performed by a code scanner which examines the source code looking for violations of these rules. As a result, an analysis report is generated with the faults found in the source code.
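A deliberately basic example of this text‐based flavour of analysis is sketched below (my own illustration): it walks a source tree and flags every line matching two simple “bad smell” rules (debugging output left in the code and empty catch blocks), reporting file and line number. Real analysers use far richer rule sets and data‐flow tracing.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Very basic static analyser: scans .java files for two textual "bad smell" rules
// and prints an analysis report with file name and line number.
public class SimpleStaticAnalyser {

    private static final Pattern[] RULES = {
            Pattern.compile("System\\.out\\.println"),            // debugging output left in code
            Pattern.compile("catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}") // empty catch block on one line
    };

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .forEach(SimpleStaticAnalyser::analyse);
        }
    }

    static void analyse(Path file) {
        try {
            List<String> lines = Files.readAllLines(file);
            for (int i = 0; i < lines.size(); i++) {
                for (Pattern rule : RULES) {
                    if (rule.matcher(lines.get(i)).find()) {
                        System.out.println(file + ":" + (i + 1) + " violates rule " + rule.pattern());
                    }
                }
            }
        } catch (IOException e) {
            System.err.println("Could not read " + file + ": " + e.getMessage());
        }
    }
}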


4.3. Quality Views

Industry is going through a revolution in what software quality means for the success of software products [115]. Quality is incorporated into a web application as a consequence of good methodology, design and implementation. As depicted in the state‐of‐the‐art section, ensuring the quality of a web application involves the assessment of the required functionality, but also the assessment of the non‐functional requirements (quality attributes). The most important non‐functional requirements for web applications have been identified in [34] and are summarized in Table 2 in section 2.4.3. Therefore, the quality dimensions for web applications to be ensured in this dissertation are: functionality (functional requirements), performance, security, compatibility, usability, and accessibility (non‐functional requirements). Both automated testing and analysis can be used to assess these quality attributes. The following sub‐sections identify the most suitable quality control activities (testing and/or analysis) to assess each of the selected quality attributes.

4.3.1. Functionality

The functionality is evaluated to ensure conformance to customer requirements. In the case of the client‐side of web applications, the key aspect to ensure the functionality is the web navigation. Web navigation plays a key role in the overall web experience. The act of navigating from one page to another by means of web links is known as browsing. To automate the web navigation, that is, the web browsing, the structure of the web application should be identified. In web navigation, users typically interact with data forms. These forms are used to submit the input data to the server for processing. Forms are composed of the following kinds of fields:
‐ Text fields. These elements allow the user to input a single line of text.
‐ Textarea fields. These elements allow the user to input multiple rows of text data.
‐ Checkbox buttons, multiple selection elements. These buttons are usually shown on screen as square boxes that can contain a white space (unselected) or a tick mark or square (selected).
‐ Radio buttons, single selection elements. These buttons are usually shown on screen as circular holes that can contain a white space (unselected) or a dot (selected).
‐ Select fields. These elements allow the user to choose one or more values from a list.
‐ File fields. These controls allow the user to select a local file and upload it to the web server.
‐ Buttons. These controls provide a way to trigger events. Some special types of buttons in web forms are the reset button (used to clear the form) and the submit button (used to take an action, typically sending the form to the web server).
The input data is processed by the server and, as a result, some output data is returned to the client. This output information can be shown to the user using forms too, although this option is not very common. It is usually displayed using the HTML capabilities to render information using document body elements. In the communication between clients and web servers, the concept of web session is very important.


clients during multiple requests. HTTP cookies are employed to implement web sessions. A cookie is an object created by a server‐side program and stored at the client (typically, in the disk cache of the browser) [127]. Cookies are used by the server‐side program to store and retrieve state information associated with the client, i.e. to keep the web session. All in all, the automation of the web navigation is carried out by exercising the SUT using real web browsers, in which the input/output data and the session are managed. Therefore, automated testing will be carried out in this dissertation to assess the functionality of web applications.
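To make this concrete, the following minimal sketch shows what such browser‐driven automation can look like. It assumes the Selenium WebDriver API and a hypothetical login page whose field and button names (username, password, frmDatos_0) are placeholders; it is only an illustration, not the tool selection of this dissertation, which is discussed later.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class LoginNavigationSketch {
    public static void main(String[] args) {
        // A real browser instance: cookies, and therefore the web session, are handled as for a human user
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://example.com/login");                            // hypothetical entry URL
            driver.findElement(By.name("username")).sendKeys("Administrador"); // test data (input)
            driver.findElement(By.name("password")).sendKeys("admin");
            driver.findElement(By.name("frmDatos_0")).click();                 // transition: submit the form
            // Simple oracle: the expected outcome should appear in the target state
            boolean ok = driver.getPageSource().contains("Welcome");
            System.out.println(ok ? "Functional check passed" : "Functional check failed");
        } finally {
            driver.quit();
        }
    }
}

Because the navigation is exercised through a real browser, the input/output data and the session are managed exactly as in normal use, which is the point of the approach described above.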

4.3.2. Performance
Web performance is critical because users do not like to wait too long for a response to their requests. Web performance testing should be considered a continuous activity to be carried out in order to tune the system adequately [34]. Effective performance testing cannot be carried out without automated test processes: there is no practical way to provide reliable, repeatable performance tests without some form of automation. Therefore, performance testing should be part of this dissertation. The aim of automated performance testing is to simplify the performance testing process. This is normally achieved by providing the ability to record end‐user activity and to render this data as scripts. After that, these scripts are used to create load testing scenarios which perform the actual performance tests. Therefore, the testing R&P process can be used to automate performance testing, since recorded scripts can easily be rerun on demand (playback).
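As a rough illustration of replaying recorded navigation under load (an assumption for exposition only, using plain JDK classes rather than any particular load‐testing tool), the sketch below re‐issues a list of hypothetical recorded URLs from several concurrent virtual users and reports the response time of each request.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SimpleLoadReplay {
    public static void main(String[] args) throws Exception {
        // Hypothetical "recorded" navigation, e.g. the URLs extracted from an R&P script
        List<String> recordedUrls = Arrays.asList("http://example.com/login", "http://example.com/init");
        int virtualUsers = 10;                                   // concurrent simulated users
        ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);
        for (int u = 0; u < virtualUsers; u++) {
            pool.submit(() -> {
                for (String url : recordedUrls) {
                    try {
                        long start = System.currentTimeMillis();
                        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
                        int code = con.getResponseCode();        // blocks until the server answers
                        long elapsed = System.currentTimeMillis() - start;
                        System.out.println(url + " -> " + code + " in " + elapsed + " ms");
                    } catch (Exception e) {
                        System.out.println(url + " -> error: " + e.getMessage());
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}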

4.3.3. Security
Web security assessment should provide evidence that a web application is protected against hostile attacks and malicious inputs. The heterogeneous nature of the web, together with the very large number of possible users, makes web applications more vulnerable than traditional ones and security assessment more difficult to accomplish. There are several approaches to carry out security assessment:
‐ Black‐box testing takes an external perspective of the tested object. The tester only knows the inputs and outputs of the application. It is a good approach to anticipate attacks, because it probes the application from the attacker's perspective.
‐ White‐box testing is a more exhaustive approach because it needs to look into the application code to find security weaknesses.
‐ Static analysis runs simple text‐based searches for vulnerability patterns in source code, recursively analysing the code base for security defects and then generating a report.
This dissertation is focused on the client‐side of web applications. Therefore, the business logic code is not available to assess, and thus white‐box testing and static analysis are not suitable for automation here. Hence, the automation of black‐box security testing will be the technique employed.
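The following sketch illustrates the black‐box idea in its simplest form: it submits a couple of typical malicious inputs (illustrative payloads only) through a hypothetical form field named q and inspects the response for naive symptoms such as a reflected script or a leaked database error. A real security assessment would, of course, use far richer payload catalogues and oracles.

import java.util.Arrays;
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class BlackBoxSecurityProbe {
    public static void main(String[] args) {
        // Illustrative payloads for reflected XSS and SQL injection probing
        List<String> payloads = Arrays.asList("<script>alert(1)</script>", "' OR '1'='1");
        WebDriver driver = new FirefoxDriver();
        try {
            for (String payload : payloads) {
                driver.get("http://example.com/search");                 // hypothetical page with a form
                driver.findElement(By.name("q")).sendKeys(payload);      // malicious input
                driver.findElement(By.name("submit")).click();
                String page = driver.getPageSource();
                // Very naive oracles: payload echoed back unescaped, or a database error leaked
                if (page.contains(payload) || page.toLowerCase().contains("sql syntax")) {
                    System.out.println("Possible vulnerability with payload: " + payload);
                }
            }
        } finally {
            driver.quit();
        }
    }
}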

4.3.4. Compatibility
As depicted in the state‐of‐the‐art section, compatibility assessment tries to uncover failures due to the usage of different web server platforms or client browsers. Since this dissertation is focused on the client‐side of web applications, compatibility assessment will be focused on client browsers.


Web compatibility on the client‐side is achieved by writing standard HTML/CSS which can be rendered by any browser. Therefore, the automation of compatibility assessment in this dissertation will be achieved by evaluating the conformance of these client‐side elements (HTML, CSS) to the standards by means of automated static analysis.
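A minimal sketch of such a static check is given below; it assumes the jsoup HTML parser and uses a deliberately small, illustrative list of deprecated elements, so it should not be read as the actual rule set of this dissertation.

import java.util.Arrays;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlCompatibilityCheck {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://example.com").get();   // hypothetical page under analysis
        // Elements deprecated in standard HTML that browsers render inconsistently (illustrative list)
        List<String> deprecated = Arrays.asList("font", "center", "marquee", "blink", "applet");
        for (String tag : deprecated) {
            for (Element e : doc.select(tag)) {
                System.out.println("Non-standard/deprecated element <" + tag + ">: " + e.outerHtml());
            }
        }
    }
}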

4.3.5. Usability
Usability is defined as the degree to which users can perform a set of required tasks. Web applications have become a standard and cross‐platform means for people to communicate and do business on the Internet. Brink et al. claim that "high usability is a key factor in achieving maximum return on information technology investments" [9]. Therefore, web usability may determine the success of the application. As a consequence, the application front end and the way users interact with it need greater attention along the quality control process. There are different ways to assess web usability:
‐ Usability inspection involves a designer (or a group of them) evaluating the user interface of a web site based on general design principles or specific lists of guidelines.
‐ Group walkthroughs involve a group of stakeholders walking through common tasks on the web site. At each step of the task, the group identifies any issues in the design and tracks fixes that need to be made. It is very similar to a usability inspection, but it is task‐oriented and often involves non‐designers.
‐ User testing involves observing users performing specific activities with the web site to identify what problems they have as they use the site. A special case of user testing is "hallway testing"12, which uses a group of users to test the usability of a web application.
‐ Static analysis employs rules for good design and heuristics to find potential usability issues.
Inspections, walkthroughs and user testing are manual processes that cannot be completely automated by definition. Therefore, static analysis is selected to perform the automated usability assessment of the client‐side of web applications in this dissertation.

4.3.6. Accessibility
Web accessibility assessment evaluates how well web applications can be used by people with disabilities. Web accessibility evaluation combines different disciplines and skills. There are several scopes of evaluation: individual web pages, collections of web pages, whole web sites, or just specific parts of web pages such as tables or images. Web accessibility assessment is closely related to the development process, since it is carried out with the purpose of improving or maintaining the web content. There are three main types of web accessibility assessment techniques [102]:
‐ Manual testing, which is carried out by human testers. The types of manual testing are the following:
o Non‐technical evaluations, which are carried out by non‐technical evaluators such as content authors, e.g. to determine whether the ALT‐attributes describe the purpose of the images appropriately or whether the transcriptions of the multimedia content are correct.

12 http://www.useit.com/alertbox/20000319.html


o Technical checks, which are usually carried out by web developers, evaluating markup code and document structure as well as compatibility with specific technologies.
o Expert checks (walkthroughs), which are carried out by evaluators who have knowledge of how people with disabilities use the web and who can identify issues related to the user interaction.
‐ User testing, which is carried out by real end‐users in informal or formal settings. In general, there are two modes of user testing:
o Informal checks, which can be carried out by non‐experts, for example by asking individual persons such as friends or colleagues for their opinions.
o Formal checks, which are usually carried out by professionals who follow well‐established usability procedures.
‐ Automated evaluation, which is carried out without the need for human intervention. There are the following types of automated evaluation:
o Syntactic checks. They consist in analysing the web application to ensure the correctness of the web content, such as checking the existence of ALT‐attributes in IMG elements or LANG‐attributes in the root HTML elements, among others.
o Heuristic checks. They examine some of the semantics of the web content, such as the layout and markup or the natural language of the information.
o Indicative checks. They use statistical metrics and profiling techniques to estimate the performance of whole web sites or large collections of web content. These techniques are useful for large‐scale surveys, for example to monitor the overall developments in the public sector of a country.
As with usability, the accessibility assessment techniques based on user and manual testing are predominantly manual. Therefore, automated accessibility evaluation based on static analysis is the best choice to be included in this dissertation.
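The syntactic checks listed above translate almost directly into code. The following minimal sketch (assuming the jsoup HTML parser and covering only two illustrative rules, ALT attributes on IMG elements and the LANG attribute on the root HTML element) shows the idea:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AccessibilitySyntacticCheck {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://example.com").get();   // hypothetical page under analysis
        // Rule 1: every IMG element should carry an ALT attribute
        for (Element img : doc.select("img:not([alt])")) {
            System.out.println("IMG without ALT attribute: " + img.outerHtml());
        }
        // Rule 2: the root HTML element should declare the natural language via LANG
        if (doc.select("html[lang]").isEmpty()) {
            System.out.println("Root HTML element without LANG attribute");
        }
    }
}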

4.4. Test Process

Once the quality dimensions are defined, and following the objectives described in chapter 3, this section establishes the generic process to automate the quality control activities for web applications on the client‐side. Following the guidelines described to assess the functional requirements (see section 4.3.1), the proposed process is based on the automated browsing of web applications. To perform this automation, it is necessary to model the navigation of a web site, and then divide that navigation into independent paths. Automation in testing and analysis demands not only time but also resources in terms of preparation and planning. Therefore, automated quality control should be an assisted process: in order to automate these activities, some human interaction is required [94]. Thus, the first step in the generic process proposed in this dissertation is to establish the correct navigation structure of the web under test. This step should be done by the human testers or developers in charge of the quality control of the SUT. It will guide the automation process, since the right way of traversing a web application must be known beforehand. Quality control activities are primarily based on comparisons. As depicted in Figure 16, the test oracles must know the expected outcome prior to exercising the web system. This also applies to the proposed process


to automate the quality control of web applications based on the navigation: the correct navigation should be established in order to know what is right and what is not. I propose the following ways of modelling the navigation (these models will be described in detail in the next section):
‐ UML models. UML is the de‐facto standard for modelling and design. Reusing such models for quality control is a way of saving time.
‐ XML files. I will propose a self‐defined XSD schema to model the navigation. Such models are useful in analysis and design, but these kinds of files will be richer than UML in the sense that they can contain test data and oracles.
‐ R&P scripts. This kind of input will be useful for finished web applications, or at least when the web application can be executed. Therefore, this kind of input is useful for the operation and maintenance stages of the SDLC.
Once the navigation structure is defined, I use graph theory to represent the defined web site navigation. Graph theory is the study of graphs in mathematics and computer science. A graph is the abstract representation of a set of vertices (nodes) connected by arcs (edges or links). A graph is a pair G=(V,E) of sets such that E ⊆ [V]²; the elements of V are the vertices/nodes and the elements of E are the edges/links. The usual way to picture a graph is by drawing a dot for each vertex and joining two of these dots by a line if the corresponding two vertices form an edge [15]. The following table shows definitions useful to understand the process to be defined.

Table 6. Graph Types
‐ Undirected graph: a graph in which the edges have no orientation.
‐ Mixed graph: a graph in which some edges may be directed and some may be undirected.
‐ Directed graph (or digraph): a graph in which the edges have orientation [9]. A digraph is acyclic if it has no cycle. A digraph is strongly connected (or, just, strong) if every vertex is reachable from every other vertex, i.e. there is a path from each vertex in the graph to every other vertex.
‐ Weighted graph: a graph in which a number (weight) is assigned to each edge. This weight could represent costs, lengths, and so on. A weighted digraph is known as a network. A flow is a network where each node has a capacity and each edge receives flow [20].
‐ Multigraph: a graph in which multiple edges (two or more edges that are incident to the same two vertices) and/or loops (an edge that connects a vertex to itself) are permitted. If the multigraph is directed, it is known as a multidigraph.
‐ Path: a graph P=(V,E) with V={v0, v1, …, vk} and E={e1, e2, …, ek} in which all the nodes vi are distinct. If the start node is the same as the end node, the path is known as a cycle. A walk is a path in which nodes or links may be repeated. A circuit is a closed walk. A trail is a path in which all the edges are distinct. In a Hamiltonian path each node is visited exactly once. In an Eulerian trail each edge is visited exactly once. An Eulerian circuit (or tour) is an Eulerian trail which starts and ends on the same vertex [44]. A Hamiltonian circuit (or tour) is a Hamiltonian path which starts and ends on the same vertex.
‐ Tree: a graph in which any two nodes are connected by exactly one path; in other words, a connected graph with no cycles. A forest is a graph with no cycles, in other words, a disjoint union of trees.

All in all, web navigation can be modelled by means of a finite multidigraph, that is, a finite directed graph (a finite set of web pages as nodes) in which multiple edges and/or loops are allowed. Any web page of the SUT will correspond to a single node within the graph. The following step in the automation process is to define the structural model coverage criteria. Due to the fact that a multidigraph is a transition‐based model, the following coverage criteria can be applied [141]:
‐ All‐paths: every path must be traversed at least once.
‐ All‐states: every state of the model is visited at least once.
‐ All‐configurations: every configuration of the graph is visited at least once. This coverage criterion applies to systems with parallel execution. If a snapshot is taken of such a parallel system during its execution, two or more active states can be found. Each of these snapshots is called a configuration. For systems that contain no parallelism, this coverage criterion is the same as all‐states coverage.
‐ All‐transitions: every transition of the model must be traversed at least once.
‐ All‐transition‐pairs: every pair of adjacent transitions in the model must be traversed at least once.
‐ All‐loop‐free‐paths: every loop‐free path must be traversed at least once.
‐ All‐one‐loop‐paths: every path containing at most two repetitions of one (and only one) configuration must be traversed at least once.
‐ All‐round‐trips: similar to the all‐one‐loop‐paths criterion because it requires a test for each loop in the model, but that test only performs a single iteration around the loop.

The hierarchy of these criteria types is illustrated in the figure below:


Figure 18. Transition‐based Coverage Criteria
Due to the fact that the graph represents the web navigation of the SUT, the selected coverage criterion belongs to the all‐paths family, and concretely it is the all‐transitions type. This criterion establishes that each edge (web transition) is traversed at least once. As Figure 18 shows, this criterion also implies that each vertex is visited at least once (all‐states). Therefore, given a multidigraph, it is necessary to be able to select the different paths within it. Once the independent paths are found, the automation of the navigation will be performed. While travelling along these paths, testing and static analysis are carried out on each page in order to assess the selected quality attributes (functionality, performance, security, compatibility, usability and accessibility). In testing terms, the model of a web application using a multidigraph corresponds to the system testing level. The evaluation of each independent path can be considered integration testing. Finally, the assessment of each single page is the lowest level, i.e. unit testing. This approach is illustrated as follows:

Figure 19. Methodology Levels
Thus, the result of the automated assessment of the SUT will be the composition of the quality control results for such attributes:

(I) R = ∑ Ri , i ∈ {F, P, S, C, U, A}

Where:
‐ R: quality control results.
‐ RF: functionality results.
‐ RP: performance results.
‐ RS: security results.
‐ RC: compatibility results.
‐ RU: usability results.
‐ RA: accessibility results.

4.5. Summary

This section has presented the methodological basis of this thesis, which is composed of two parts: on the one hand, the quality dimensions to be covered; on the other hand, the


generic process to automate the quality control of web applications on the client‐side. The quality goals to be covered are summarized in Figure 20. In this picture, the quality dimensions to be assessed with software testing are illustrated with a red colour background (functionality, performance, and security). The quality attributes to be evaluated using static analysis are illustrated with a green colour background, i.e. compatibility, usability, and accessibility.

[Figure 20 groups the quality dimensions under V&V: functional (functionality) and non‐functional (performance, security, compatibility, usability, and accessibility).]

Figure 20. Methodology Quality Dimensions
Finally, the methodology establishes the generic process to perform the automation of quality control of web applications. This process has four steps. First, the web site under study is modelled using a multidigraph, i.e. a finite directed graph in which multiple edges and/or loops are allowed. Second, some method should be used in order to find the independent paths within the multidigraph. Third, each found path is traversed by automatically browsing the web application from the client‐side. Finally, for each page within each path, testing and static analysis (i.e. software quality control) are carried out in order to assess the selected quality factors. This process is summarized as follows:

1. Testers define the correct navigation structure.
2. The navigation is modelled using a multidigraph.
3. Each path in the navigation is automatically traversed.
4. Automated testing and analysis are performed in each state of the paths.
5. Quality control results are aggregated in a unified report.

Figure 21. Methodology Process
Each step of this process is detailed in chapters 5 (automated functional) and 6 (automated non‐functional) of this PhD dissertation.


Chapter 5. Automated Functional Testing

Anyone who has never made a mistake has never tried anything new.

‐ Albert Einstein

Testing is the main activity performed for evaluating the quality of software‐intensive systems, and for improving it, by identifying defects and problems [1]. This chapter is focused on the automation of functional testing of web applications on the client‐side. Web testing is a difficult task due to the peculiarities of such applications. A significant conclusion has been reached in the survey of web testing presented in [34]: "further research efforts should be spent to define and assess the effectiveness of testing models, methods, techniques and tools that combine traditional testing approaches with new and specific ones". In line with this statement, and following the guidelines explained in the methodology, this contribution presents specific methods to perform automated functional testing for web applications on the client‐side. Functional testing has the responsibility of uncovering failures of the application that are due to faults in the implementation of the specified functional requirements. Di Lucca and Fasolino draw an important conclusion about functional testing for web applications [34]: "As to the functional testing, existing tools main contribution is limited to manage test case suites manually created, and to match the test case results with respect to a manually created oracle. Therefore, greater support to automatic test case generation would be needed to enhance the practice of testing Web applications". This piece of research presents a method to perform functional testing for web applications by automating their navigation using a real browser. On the one hand, web navigation is the process of traversing a web application using a browser. On the other hand, as depicted in the state‐of‐the‐art section, functional requirements are actions that an application must do [126]. Therefore, the evaluation of the correct navigation of web applications results in the assessment of the specified functional requirements. The method proposed to perform this automated functional testing can be seen as the basis of this dissertation. As depicted in the methodology section, the automation will be led by the correct navigation structure, which can be defined in several ways: UML, XML, and R&P scripts. Moreover, this automation of the web navigation will also be used to guide the assessment of the selected non‐functional attributes, namely performance, security, compatibility, usability and accessibility (chapter 6).


The remainder of this chapter is organized as follows. First, I present the concept and metamodel used to represent web applications in this dissertation. Second, the automation of the functional testing approach is depicted. Third, a thorough description of the different ways of modelling web applications for developers and testers is presented, i.e. UML, XML, and R&P scripts. Fourth, a survey and a laboratory experiment are presented in order to look for the most suitable way to find the independent paths within a multidigraph representing the navigation. Finally, a summary of this contribution is provided.

5.1. Scope of the Dissertation

Web applications are accessed by navigation mechanisms implemented by hyper‐links. Focusing on the client‐side of a web application, the interaction is reduced to a web browser communicating with a remote server using HTTP. Thus, focusing on the navigational nature of the Web, such applications can be seen as a set of web states:

(II) W = {s1, s2, …, sn}

Each web state si is composed of a set of elements that can be accessed through the Document Object Model (DOM) API. The nature of such elements is heterogeneous, and it is identified by the Internet media type, originally called MIME (Multipurpose Internet Mail Extensions) type. A complete list of the different kinds of MIME types can be found on the W3Schools web site13. IANA (Internet Assigned Numbers Authority) manages a registry of these types. The list of such types is available on its web site14, and it is summarized as follows:

‐ Application: for multipurpose elements, for example application/json, application/zip, and so on.
‐ Text: for example text/html, and so on.
‐ Image: for example image/gif, image/jpg, and so on.
‐ Audio: for example audio/mpeg, audio/ogg, and so on.
‐ Video: for example video/mpeg, video/mp4, and so on.
‐ Message: for example message/http, message/rfc822, and so on.
‐ Model: for 3D models, such as model/vrml, model/iges, and so on.
‐ Multipart: for archives and other objects made of more than one part, for example multipart/mixed, multipart/encrypted, and so on.
‐ Vnd: for vendor‐specific files, for example application/msword.
‐ X: for non‐standard files, such as application/x-latex.
‐ X‐PKCS: for PKCS (Public‐Key Cryptography Standards) files, for example application/x-pkcs7-mime.
The following equation represents a state composed of a set of elements:

(III) si = {e1, e2, …, em}
The most important kind of elements will be those based on text, since these elements contain the HTML elements. HTML elements can contain web forms, which are the elements that contain the data to be submitted to the server.

13 http://www.w3schools.com/media/media_mimeref.asp 14 http://www.iana.org/assignments/media‐types/index.html


Given a web application, its navigation always has an entry state. This state is identified by its URL, and the following states are connected by means of transitions. Therefore, in order to know the navigation of a web application, the entry point (a URL) and the sequence of transitions among the following states should be known. In other words, it is not necessary to know the URL of each state after the first one to model the navigation of a web application. Each web transition tij is composed of a sequence of atomic actions α. Examples of atomic actions could be clicking a link, moving the mouse over some HTML element, and so on. The factor that distinguishes a transition is the fact that, when the set of atomic actions in tij is performed, an HTTP request from the client to the server is triggered as a result. This HTTP request will result in an HTTP response that will change the state si to sj. All this information is summarized in the following equation:

(IV) tij = {α1, α2, …, αk} | si → sj
These actions are based on the DOM events specified by the W3C15. The following table summarizes such events:

Table 7. Literals for Actions in Web Transitions
‐ blur: an element loses focus
‐ change: the content of a field changes
‐ click: mouse clicks an object
‐ dblclick: mouse double‐clicks an object
‐ error: an error occurs when loading a document or an image
‐ focus: an element gets focus
‐ keypress: a keyboard key is pressed
‐ keydown: a keyboard key is pressed or held down
‐ keyup: a keyboard key is released
‐ load: a page or image has finished loading
‐ mousedown: a mouse button is pressed
‐ mousemove: the mouse is moved
‐ mouseout: the mouse is moved off an element
‐ mouseover: the mouse is moved over an element
‐ mouseup: a mouse button is released
‐ resize: a window or frame is resized
‐ select: text is selected
‐ unload: the user exits the page
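How these literals can be mapped onto real browser interactions is sketched below, assuming the Selenium WebDriver Actions API and covering only a subset of the events in Table 7; the mapping is illustrative rather than the final implementation.

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.Actions;

public class DomEventDispatcher {
    // Executes one atomic action (event literal) on an already located HTML element
    public static void perform(WebDriver driver, WebElement element, String event, String key) {
        Actions actions = new Actions(driver);
        switch (event) {
            case "click":     actions.click(element).perform(); break;
            case "dblclick":  actions.doubleClick(element).perform(); break;
            case "mouseover": actions.moveToElement(element).perform(); break;
            case "mousedown": actions.clickAndHold(element).perform(); break;
            case "mouseup":   actions.release(element).perform(); break;
            case "keypress":  element.sendKeys(key); break;   // the key field is only used for keyboard events
            default: throw new IllegalArgumentException("Unsupported event literal: " + event);
        }
    }
}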

All in all, in this dissertation I am going to employ the following metamodel to represent web applications. This metamodel implements the concepts depicted in equations II, III, and IV:

15 http://www.w3.org/TR/DOM‐Level‐2‐Events/events.html


Figure 22. Web Site and Quality Control Metamodel


5.2. Approach

The approach I propose to automate the functional testing of web applications can be seen as an aggregation of the following automated methods (depicted in the state‐of‐the‐art, section 2.5):
‐ R&P. Linear scripts created with a record and playback method are used. This approach is considered the 1st generation of AST frameworks.
‐ Data‐driven approach (2nd generation). This testing approach means using a single test case that drives the test with input and expected values taken from an external data source, instead of using the same hard‐coded values each time the test runs.
‐ MBT (3rd generation). UML (use case, activity, and presentation diagrams) and XML models from the analysis and design phases of the SDLC will be reused in order to guide the automation.
In order to achieve the data‐driven approach, the automation implies separating the test case generation from the test data/expected outcome generation. Test cases will be generated using a programming language to be defined in section 7. In order to store the test data and expected outcome, a tabular data file will be used. The template of this file is as follows:

Table 8. Test Data and Expected Outcome Template
Test data (input): data_1 | data_2 | … | data_n
Expected outcome (output): outcome_1 | outcome_2 | … | outcome_m
Row 1: data1_1, data2_1, …, datan_1 | outcome1_1, outcome2_1, …, outcomem_1
Row 2: data1_2, data2_2, …, datan_2 | outcome1_2, outcome2_2, …, outcomem_2
…

This template can be seen as the implementation of a decision table, which is one of the methods described in the state‐of‐the‐art section regarding automated test oracles (see Table 3 in section 2.5.3). Each input and output element (data_i and outcome_j) is located in the web pages using the pseudocode depicted in Snippet 2. Each field of the table will be automatically filled during the process of test case generation. All in all, the method I propose to perform automated functional testing for web applications has one strong prerequisite: there should be a model of the navigation behaviour of the web under test. As depicted before, this navigational model uses one of three notations: UML (using NDT), XML, or R&P. This requisite is illustrated in the red box labelled as pre‐automation in Figure 23. Once the test cases for the navigation paths are generated, additional input and output data can be manually added to drive more test cases with the same test logic. These data (input and output) can be stored as new rows in the tabular file depicted in Table 8. This process is shown schematically in the yellow box labelled as post‐automation in Figure 23. Regarding test case generation, the automation is done within the test implementation level of the testing process depicted in the methodology (see chapter 4, Figure 16). Therefore, the


automatic test case generation implies three different stages: i) test logic generation; ii) test data generation; iii) test oracle generation.
Test logic generation is illustrated in the green box (labelled as "logic") in Figure 23. This step takes as input the model from the pre‐automation stage, i.e. a model in UML, XML, or R&P. This logic generation passes through the following steps:
‐ White‐box parser. This entity is in charge of translating the different models used by testers (UML, XML, or R&P) into the internal way of modelling web applications.
‐ As depicted in the methodology, graph theory is employed to handle the navigation of the web under test by means of multidigraphs.
‐ Paths. Some algorithm or method (to be depicted in section 5.4) should be used to find the set of independent paths in the navigation. These paths correspond to the sequences of web pages that should be exercised against the SUT to ensure the navigation requirements.
Test data generation is illustrated in the purple box (labelled as "data") in Figure 23. This stage is also fed with the navigation model from the pre‐automation stage. The process is as follows:
‐ Black‐box parser. This module extracts test data and expected outcomes from the input model. As depicted in section 5.2, XML models can include test data, and R&P models can include test data and expected outcomes. Regarding test data (input), this black‐box parser should extract the value and the data type.
‐ The data type feeds a test data dictionary. This dictionary contains a collection of data that can be used as input for test cases. For the selection of a specific value, besides the type of data, a module that generates a random pointer is used (randomizer).
‐ Therefore, the data required for the test cases consist of the aggregation of three different sources: i) data from the XML and R&P models; ii) data randomly generated from the test data dictionary; iii) manual data included as new rows in the tabulated file (post‐automation).
Test oracle generation is illustrated in the green box (labelled as "oracle") in Figure 23. This module has the following parts:
‐ Outcome analyser. This module collects data from the response of the SUT and extracts the following information: i) the navigation state; ii) the actual data returned by the application. In order to find out the real navigation state, the aggregation of data fields is used.
‐ White‐box oracle. This module establishes verdicts by comparing the expected state to the actual state. The expected state is set by the navigation path previously extracted in the test logic module. The real state is extracted from the SUT's response by the outcome analyser using the procedure described before (aggregation of data fields).
‐ Black‐box oracle. This module establishes verdicts by comparing expected with actual data. The expected data comes from the black‐box parser of the test data module. In addition, further expected data can be added in the post‐automation stage by adding new information to the tabular file depicted in Table 8.
‐ Verdicts from the white and black‐box oracles become the test report.
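As an illustration of the test data dictionary and randomizer mentioned in the data generation step above, the following minimal sketch keeps a collection of values per stereotype (the stereotypes match those of the XML notation described later; the concrete values are invented for the example) and picks one at random:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class TestDataDictionary {
    private final Map<String, List<String>> dictionary = new HashMap<>();
    private final Random randomizer = new Random();

    public TestDataDictionary() {
        // One entry per data stereotype; the values are illustrative
        dictionary.put("email",   Arrays.asList("user@example.com", "qa@example.org"));
        dictionary.put("name",    Arrays.asList("Ana", "Luis", "Marta"));
        dictionary.put("integer", Arrays.asList("0", "1", "42", "-1"));
        dictionary.put("date",    Arrays.asList("2011-01-01", "31/12/2010"));
        dictionary.put("string",  Arrays.asList("foo", "bar", ""));
    }

    // The randomizer selects one concrete value for the requested stereotype
    public String randomValue(String stereotype) {
        List<String> values = dictionary.get(stereotype);
        if (values == null) {
            throw new IllegalArgumentException("Unknown stereotype: " + stereotype);
        }
        return values.get(randomizer.nextInt(values.size()));
    }

    public static void main(String[] args) {
        TestDataDictionary dict = new TestDataDictionary();
        System.out.println("Generated email: " + dict.randomValue("email"));
    }
}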


Figure 23. Automated Functional Testing Schematic Diagram
The key piece in the presented approach is a block which has been named Browser. This entity can be seen as the conductor of the automation. It takes as input the test data and the information about the paths, i.e. the set of states to be traversed and the transitions between these states. Thus, this block is in charge of performing the navigation using a real browser, exercising the client‐side of the web under test with the path information and test data. As a result, the web under test returns the real value of the navigation in terms of functionality (output data) and structure (navigation states). Therefore, the automated verification performed by the browser element is summarized as follows:
‐ State verification. This verification is two‐fold. On the one hand, the navigation is ensured (white‐box, i.e. structural testing). To achieve that aim, each state is validated as the aggregation of the defined test data; for this reason, the existence of each defined test data item is ensured. Moreover, as depicted in section 5.1, a web state is composed of a set of elements (e.g. frames, images, and so on); therefore, each element in the DOM of each state is checked. On the other hand, the functionality is ensured (black‐box, i.e. functional testing). Thus, each test oracle is assessed by checking the expected values against the real data.
‐ Transition verification. As depicted in section 5.1, each transition is composed of a set of actions (e.g. mouse over an element and then click on a link). These actions are composed of locators and events, which are assessed and exercised. In addition, before executing the transition, JavaScript notifications may appear. These warnings are captured in order to complement the report to be generated.
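To make the state verification performed by the Browser block more tangible, the sketch below (assuming Selenium WebDriver; locators and expected values are placeholders) separates the white‐box check, i.e. the expected elements exist in the DOM, from the black‐box check, i.e. the oracles hold on the actual data:

import java.util.List;
import java.util.Map;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class StateVerifier {
    // White-box check: every locator that defines the expected state must exist in the DOM
    public static boolean verifyStructure(WebDriver driver, List<By> expectedLocators) {
        for (By locator : expectedLocators) {
            if (driver.findElements(locator).isEmpty()) {
                System.out.println("Missing element for locator: " + locator);
                return false;
            }
        }
        return true;
    }

    // Black-box check: each oracle compares the expected value with the value actually rendered
    public static boolean verifyFunctionality(WebDriver driver, Map<By, String> oracles) {
        boolean verdict = true;
        for (Map.Entry<By, String> oracle : oracles.entrySet()) {
            String actual = driver.findElement(oracle.getKey()).getText();
            if (!actual.contains(oracle.getValue())) {
                System.out.println("Oracle failed: expected '" + oracle.getValue() + "' but found '" + actual + "'");
                verdict = false;
            }
        }
        return verdict;
    }
}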


5.3. Modelling Web Navigation

Modelling can help to understand the growing complexity of web applications. Regarding the modelling of web applications, in some cases new models have been proposed, while in other cases existing modelling techniques have been adapted from other software domains. As explained in the previous section, the structure of the web is represented by means of a multidigraph using the JUNG library. The next step is to establish a way of modelling the web application for developers. As depicted in the methodology, software development artefacts from the analysis and design phases should be reused in order to guide the automation of testing. To achieve automated functional testing, requirements should be described in a form that can be understood by software programs. Using an MBT approach, test cases are derived from a given model of the SUT [27]. Therefore, UML diagrams can be used for describing requirements and driving the test automation process. The R&P approach is a useful way to represent the structure of a web application by recording interactions with the application through the browser. This method is more agile than UML, since the application can be developed avoiding the formal design phase. Halfway between the UML models and R&P, I propose a syntax‐neutral way of modelling the navigation using a specifically created XML notation. XML (Extensible Markup Language) provides an easy way to store and share information. To provide the formal declaration of this XML format, the XML Schema language (also known as XML Schema Definition, XSD) will be employed to formalize the navigation constraints, expressed as rules. The following subsections describe each of these options (UML, XML, and R&P).

5.3.1. UML Models The de‐facto notation standard widely employed for modelling is UML (Unified Modelling Language). UML 2.0 models of the SUT can be used to define the web navigation. The standard diagrams in UML 2.0 are the following [84]:

Table 9. UML 2.0 Diagrams
‐ Use Case: functionality from the user's viewpoint
‐ Activity: the flow within a use case or the system
‐ Class: classes, entities, business domain, database
‐ Sequence: interactions between objects
‐ Interaction overview: overview of interactions at a general high level
‐ Communication: interactions between objects
‐ Object: objects and their links
‐ State machine: the run‐time life cycle of an object
‐ Composite structure: component or object behaviour at run‐time


‐ Component: executables, linkable libraries, etc.
‐ Deployment: hardware nodes and processors
‐ Package: subsystems, organizational units
‐ Timing: time concept during object interactions

For MBT, use case models and activity diagrams are particularly effective since they bridge requirements to system tests, and they can be created early in the development life cycle [10]. Use case diagrams offer a perspective of the functional requirements of the application and its interaction with the actors. The use cases for testing can be developed early in the project life cycle, as soon as the requirements are available. Activity diagrams are used to show details of each use case: these diagrams describe the flow within a use case. Activity diagrams for web applications describe the navigation structure. In order to model the data which is handled by the web application, some kind of presentation diagram is needed. UML 2.0 does not provide in a standard way any diagrams (see Table 12) to model some specific aspects of web applications, such as the presentation. For that reason, specific UML extensions have been developed:
- Web Site Design Method16 (WSDM) is a user‐centred approach for the development of web sites that models the application based on the information of the users' groups [32].
- Scenario‐based Object‐Oriented Hypermedia Design Methodology (SOHDM) proposes a requirement specification based on scenarios [135].
- Relationship‐Navigational Analysis (RNA) is a methodology that offers a sequence of steps to develop web applications focusing mainly on analysis [29].
- Hypermedia Flexible Process Modelling (HFPM) is a wide engineering‐based approach, which includes analysis‐oriented descriptive and prescriptive process modelling strategies [17].
- Object Oriented Hypermedia Design Model (OOHDM) postulates the separation of concerns that defines its various models (requirements, conceptual, navigation, abstract interface and implementation). OOHDM, and its successor, SHDM (Semantic Hypermedia Design Method, which uses Semantic Web models), are supported by an open source, freely available environment, HyperDE17.
- UML‐based Web Engineering18 (UWE) is a software engineering approach aiming to cover the whole life‐cycle of web application development [32].
- W2000 is an approach that also extends the UML notation to model multimedia elements. These multimedia elements are inherited from HDM (Hypermedia Design Model) [89].

16 http://wsdm.vub.ac.be/ 17 http://www.tecweb.inf.puc‐rio.br/hyperde 18 http://uwe.pst.ifi.lmu.de/


- Web Modeling Language (WebML) is a high‐level specification language for designing complex web applications. It offers a visual notation both in Entity‐Relationship and UML styles, although UML is preferred by the authors [17].
- Navigational Development Techniques19 (NDT) is a methodological approach that offers specific models and techniques to deal with the requirements of web applications. It is based upon the use of UML [40].
Recent research shows that software project success is directly tied to requirement quality [132]. Requirement Engineering (RE) involves all lifecycle activities devoted to the identification and analysis of user requirements, the documentation of the requirements as a specification, and the validation of the documented requirements against user needs. The following table presents a comparison of the studied UML‐based technologies according to which types of requirements are handled by each approach [54]. Each column shows whether or not the technology manages a given type of requirement: i) data requirements (how information is stored and administrated); ii) user interface (interaction requirements); iii) navigation (users' navigation needs); iv) personalization (how requirements are dynamically adaptable); v) transactional requirements; vi) non‐functional requirements.

Table 10. UML‐Based Web Modelling Technologies Comparison
[Comparison matrix of WSDM, SOHDM, RNA, HFPM, OOHDM, UWE, W2000, WebML, and NDT against the requirement types data, UI, navigation, personalization, transactional and non‐functional; the individual check marks are not reproduced here. NDT covers all of the studied requirement types.]

Having seen these results, NDT seems to be a good choice to model web applications since it covers each kind of the requirements studied. NDT proposes dividing the treatment of requirements in accordance with the idea of concept separation, which is followed by most of the other web approaches in the design phase. Therefore, the UML models to guide the MBT approach of this dissertation will be based on NDT. Next, an example of the three kinds of UML diagrams selected to model the navigation in NDT is shown. First, a simple use case is illustrated in Figure 24. This use case implements the typical formal requirement 'Log on to the website'.

19 http://www.iwt2.org/


[Use case diagram: the actor AC-01.User is linked to the use case «RF» RF-01.Login.]

Figure 24. Use Case Diagram Example
NDT links each use case with its corresponding activity diagram (notice the link symbol ∞ in the diagram above). Apart from this link, there is a direct relationship between each use case and its activity diagram, since both should have the same name (in this example, RF-01.Login). The activity diagram for this use case is illustrated in Figure 25. This diagram has two different activities called LoginPage and Init, representing two independent web pages. The first one is LoginPage, due to the fact that the Start node points to this activity. In this page the user is in charge of introducing the credentials (login and password) and then submitting the information by clicking the button named frmDatos_0. The business logic of the application will ensure that the credentials are valid, typically by checking a database. In that case, the navigation will go on to the next page, called Init. Otherwise, the navigation will return to the first page again, i.e. LoginPage.

Figure 25. Activity Diagram Example
The guard in the links from one activity to the next (annotated as frmDatos_0 in Figure 25) describes how a web state changes to another. This guard is interpreted using the following procedure:


Snippet 2. Procedure to Translate Guards into HTML Elements
Function LookForHTMLElement(Guard)
    Found = nothing
    For each frame in the frameset (if frames exist)
        For each HTML element in the frame
            Found = Look for Guard in the id/name/title/value tag
            If Not Found
                Found = Look for Guard as text
                If Not Found
                    Found = Execute Guard as XPath expression
                End If
            End If
        End For
    End For
    Return Found
End Function

In the example above, the transition from LoginPage to Init is the most common way of browsing web applications: transitions from page to page by clicking HTML elements. Nevertheless, this way of modelling may be insufficient for all types of web applications. For example, if the event that triggers a transition is not a click but a double click or a mouse movement, the UML model is not enough. For that reason, and following equation (IV) depicted in section 5.1, a transition should be composed of several atomic actions. In order to achieve this aim in UML diagrams, the guard of a transition between two activities will follow this notation:

(V) locator1,event1,key1 ; locator2,event2,key2 ; … ; locatorn,eventn,keyn

Each group of three (locator, event, key) is separated using the semicolon symbol. For each locator, the procedure depicted in Snippet 2 is applied to find its corresponding HTML element. Each event can be one of the literals listed in Table 7. These literals are based on the DOM events specified by the W3C20. Finally, each key is optional. This field is only used when a keyboard event is being described (keypress, keydown and keyup). In this case, this third element will be the key that triggers the event, i.e. a character. Figure 26 illustrates a UML activity diagram with a transition composed of two atomic actions. In this figure, first a mouseover action is performed on the element frmElement_1. Second, a click action is performed on the element frmElement_2:

20 http://www.w3.org/TR/DOM‐Level‐2‐Events/events.html


Figure 26. Activity Diagram with Complex Transition
The last kind of diagram needed to model the navigation completely is the presentation diagram. In NDT, the main kind of presentation diagram is called Visualization Prototype (PV). Such diagrams represent the screens that correspond to each state of the web navigation. The elements of these screens are the following:
- Text boxes allow the user to input text information to be used by the application.
- Check boxes allow the user to make multiple selections from a number of options.
- Radio buttons allow the user to choose only one of a predefined set of options.
- Combo boxes allow the user to choose one or more items from a list of existing options.
Figure 27 illustrates the screens of the presented example. This diagram contains two screens, for the states LoginPage and Init. These kinds of diagrams present in a graphical way the web forms of each page. These forms are used to submit information from the client to the server. These diagrams are needed to identify such forms, since their fields will be interpreted as the test data (input) of the test cases to be generated.

[Presentation diagram with two «PV» screens: LoginPage, with the fields username and password, and Init, with the registration form fields (razonSocial, nombreComercial, identificadorFiscal, idFormatoFactura, receptora, idTipoEntidad, idUsuario, password, confPassword, nombre, dni, primerApellido, segundoApellido, fax, telefono, email, idioma).]
Figure 27. Presentation Diagram Example


5.3.2. XML Files
In order to enhance the way of modelling a web application, I have created a specific XML‐based notation to support the event‐driven nature of web applications. This notation is based on an XSD schema, and it can be seen as an XML implementation of the web site meta‐model proposed in Figure 22. The complete content of this schema can be found in Annex I. This XSD schema defines a web site as a collection of states and transitions. The initial state is always unique, since it identifies the entry point to the navigation. In addition, there is a finite number of web states connected by transitions, as depicted in Figure 28, which represents the XSD type for a web site. There is a mandatory XML attribute in the definition of a web site named base. This attribute is the starting URL for the navigation. The automation of the browsing will be carried out from this URL.

Figure 28. XSD Graphic Representation for a Web Site Each state is recognised by a unique identifier. Moreover, each state can contain a set of data fields and oracles. Each data field contains the following information (see Figure 29):

‐ Locator: data field identifier.
‐ Ref: optional reference to a transition (attribute id in transitions).
‐ Type: data type. It corresponds to the HTML input types, i.e. text, password, checkbox, radio, submit, reset, file, hidden, image, button.
‐ Required: Boolean value that indicates whether or not the data field is mandatory.
‐ Stereotype: one of the following types: email, date, name, surname, address, string, integer.
‐ Value: collection of values of the data field.
Each oracle (assertion) contains the following information:

‐ Locator: oracle identifier.
‐ Ref: optional reference to a transition (attribute id in transitions).
‐ Type: oracle type. It can be: text (assertion for a text in the element identified by the locator), notText (the opposite of text), textPresent (assertion for a text present in any element of the web page), textNotPresent (the opposite of textPresent), value (assertion for a value in the element identified by the locator), notValue (the opposite of value).


Figure 29. XSD Graphic Representation for a Web Page
Finally, web links (see Figure 30) are composed of an attribute called from (which is the identifier of the source web page) and a collection of actions and web targets (attribute to). The action attribute is composed of the following fields:
‐ Target: identifier (id field) of the destination web page.
‐ Event: literal that describes the action performed. This field follows the notation described in Table 7.
‐ Key: optional field containing the key that triggers the events keypress, keydown and keyup.
Each transition changes the navigation from a source state to a target state. This destination is defined in the attribute to, which has the following properties:
‐ State: target state identification (id).
‐ Id: optional identification for the transition. This id is used to condition the data fields and oracles tagged with the attribute ref.
‐ Weight: integer value which determines the importance of the transition. By default, each transition is assigned a numeric weight of 5. When the CPP algorithm is applied, it uses this weight to decide which is the following transition in the navigation. If there are several alternatives, the transition with the highest weight is selected first.

Figure 30. XSD Graphic Representation for a Web Transition
A simple example of navigation based on this XSD schema is shown in Snippet 3. This piece of code illustrates the same example depicted in the previous section for UML in Figure 24 (use case), Figure 25 (activity diagram), and Figure 27 (presentation diagram). As shown in this example, the way of modelling the selection from one web state to another (see Figure 25) is by means of different outputs (several to tags inside a link).


Unlike UML, the XML notation allows modelling the test data for the data fields that make up the web pages. For example, Snippet 3 shows one value for the data fields username and password (Administrator and admin respectively).

Snippet 3. XML‐based Navigation Example

[XML content of Snippet 3 (the markup itself is not reproduced here): data field values Administrador, admin, bad-login and bad-password, and the oracle texts "Login or password invalid" and "Welcome.".]
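Reading such an XML file, the oracles can be evaluated against the loaded page as sketched below. The sketch assumes Selenium WebDriver and, for simplicity, treats the locator as an element id; an actual implementation would reuse the locator strategy of Snippet 2.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class OracleEvaluator {
    // Returns the verdict for one oracle of the XML notation
    public static boolean evaluate(WebDriver driver, String type, String locator, String expected) {
        switch (type) {
            case "text":           return driver.findElement(By.id(locator)).getText().contains(expected);
            case "notText":        return !driver.findElement(By.id(locator)).getText().contains(expected);
            case "textPresent":    return driver.getPageSource().contains(expected);
            case "textNotPresent": return !driver.getPageSource().contains(expected);
            case "value":          return expected.equals(driver.findElement(By.id(locator)).getAttribute("value"));
            case "notValue":       return !expected.equals(driver.findElement(By.id(locator)).getAttribute("value"));
            default: throw new IllegalArgumentException("Unknown oracle type: " + type);
        }
    }
}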

5.3.3. R&P Approach
In the R&P approach, during the recording stage the user interacts with the system manually via the web UI while a tool records the interactions. During the playback stage, the tool interacts with the system via the UI to replay the original session. This approach is very useful to automate the navigation, since a recorded script contains in itself the navigation, the data introduced, and the expected outcomes.


R&P is more comprehensive than UML and XML modelling, since the recording stage is carried out against the real application. UML models the navigation behaviour of the SUT. XML enhances this model by adding the capability of inserting test data in the navigation. Finally, R&P enhances both UML and XML because it introduces the possibility of adding the expected outcome (i.e. the oracle) in the navigation model. The specific tool to perform R&P will be studied in chapter 7 of this dissertation.

5.4. Finding the Paths in a Multidigraph

A path is a sequence of pages designed within an overall structure to provide a linear experience for the user, i.e. a single coherent narrative within a larger collection of information [9]. As introduced in the methodology, I employ graph theory to represent and work with the web navigation by using a finite multidigraph. Thus, I need a method or algorithm to break this multidigraph into non‐Hamiltonian paths [64]; in a non‐Hamiltonian path each node can be visited more than once. The decomposition of such a graph into its paths can be seen as the browsing of the navigation from the initial node to the rest of the nodes. As depicted in the methodology, the navigation has a strong condition: each edge (transition) has to be visited at least once. This condition implies that each node (state) is also visited at least once. In order to find these paths, the following table presents a survey of the possible algorithms and methods. Some discussion and experiments are provided below in order to select the best option to solve the problem.

Table 11. Techniques and Algorithms for Decomposing a Graph into Paths
‐ Graph Traversal. Graph traversal is the facility to move through a graph visiting every vertex once. There are two possible traversal methods for a graph: Breadth‐First Search (BFS) and Depth‐First Search (DFS) [21]. BFS visits all the vertices, beginning with a specified start. No vertex is visited more than once. BFS makes use of a queue data structure which holds a list of vertices which have not been visited yet but which should be visited soon. Since a queue is a First‐In First‐Out (FIFO) structure, vertices are visited in the order in which they are added to the queue. Visiting a vertex involves outputting the data stored in that vertex, and also adding its neighbours to the queue. DFS works in a similar way, except that the neighbours of each visited vertex are added to a stack data structure. Vertices are visited in the order in which they are popped from the stack, i.e. Last‐In, First‐Out (LIFO).
‐ Traveling Salesman Problem. A Hamiltonian tour is a cycle that visits each vertex exactly once. Given a graph, finding a Hamiltonian tour within it is an NP‐hard problem. The Traveling Salesman Problem (TSP) tries to find the most efficient (i.e., least total distance) Hamiltonian tour through each vertex of a graph [44]. TSP is a variation of the Hamiltonian tour problem, and it also belongs to the class of NP‐hard problems.
‐ Shortest Path Problem. The Shortest Path Problem (SPP) is the problem of finding a path between two nodes within a graph such that the sum of the weights of its constituent edges is minimized. This problem is sometimes called the single‐pair shortest path problem, to distinguish it from [21]: i) the single‐source shortest path problem, which finds the shortest paths from a source vertex to all other vertices; ii) the single‐destination shortest path problem, which finds the shortest paths from all vertices to a single destination; iii) the all‐pairs shortest path problem, which finds the shortest paths between every pair of vertices v, v' in the graph. The main algorithms employed in the different categories of SPP are: Dijkstra, Bellman‐Ford (in which edge weights may be negative), A* (pronounced "A star", which uses heuristics to speed up the search), and Floyd‐Warshall.
‐ Chinese Postman Problem. The Chinese Postman Problem (CPP), also known as the postman tour or route inspection problem, is the problem of finding a shortest circuit that visits every edge of a graph at least once, i.e. the Chinese Postman Tour (CPT). Finding an optimal solution of this problem is NP‐complete. The term CPP was first coined by Alan Goldman of the U.S. National Bureau of Standards, as the problem was originally studied by the Chinese mathematician Mei‐Ku Kuan in 1962 [114]. Thimbleby proposes a solution for CPP in the form of a deterministic algorithm in [137], providing an executable Java implementation to solve this problem. The constraint imposed by this algorithm is that the input digraph has to be strongly connected with no negative weight cycles. It considers a graph as a collection of arcs (label, i, j, c), where label is an identifier for an arc from vertex i to j, and c is the cost associated with it.
‐ Node Reduction. The node reduction algorithm finds the path between two nodes, typically the entry and exit nodes, by reducing the rest of the graph connecting these nodes [13]. It employs graph algebra to achieve this goal. The multiplicative operator in graph algebra means concatenation: if edge a is followed by edge b, their product is a∙b (path product). The additive operator is selection: if either edge a or edge b can be taken, their sum is a + b. A path expression contains path products and zero or more additive operators, and is usually represented by upper case letters (e.g. A = a∙b) [5]. In graph algebra the graph matrix representation is usually employed, which is a square array with one row and one column for every node in the graph. Each row‐column combination corresponds to a relation between the node corresponding to the row and the node corresponding to the column.

None of these methods and algorithms fits exactly the problem at hand: finding the different paths within a graph. The BFS and DFS algorithms traverse each vertex within a graph, but they do not ensure that each edge is visited at least once. Applied to web navigation this is not acceptable, because every web transition needs to be visited. TSP presents the same problem: Hamiltonian tours are about vertex coverage, not edge coverage. The different SPP algorithms are not useful in this domain because they look for the shortest path between two nodes. CPP and node reduction do fit the objective of 100% edge coverage, but they have a strong constraint that cannot be guaranteed for an arbitrary multidigraph modelling a web navigation: the graph must be strongly connected. Therefore, in order to use CPP or node reduction in this approach, the input multidigraph has to be converted into a strongly connected one. In order to increase connectivity, digraph theory offers several alternatives [9]: i) reversing arcs; ii) deorienting arcs; iii) adding arcs. This last option is the most suitable: virtual links can be added from the leaf nodes (those with no outgoing links) back to the start node of the navigation, as sketched below.
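As an illustration only (the Arc type, method names and node labels below are hypothetical assumptions, not the reference implementation), the following Java sketch adds such virtual reset arcs to a navigation multidigraph:

    import java.util.*;

    // Minimal sketch: balance a web-navigation multidigraph by adding virtual "R" (reset)
    // arcs from every leaf node back to the start node, so that CPP or node reduction
    // can later be applied to a strongly connected graph.
    public class DigraphBalancer {

        // An arc of the navigation multidigraph; the label identifies the web transition.
        record Arc(String label, String from, String to) {}

        public static List<Arc> addResetArcs(List<Arc> arcs, String startNode) {
            Map<String, Integer> outDegree = new HashMap<>();
            Set<String> vertices = new HashSet<>();
            for (Arc a : arcs) {
                vertices.add(a.from());
                vertices.add(a.to());
                outDegree.merge(a.from(), 1, Integer::sum);
            }
            List<Arc> balanced = new ArrayList<>(arcs);
            for (String v : vertices) {
                // Leaf nodes (out-degree 0) get a virtual reset arc back to the start node.
                if (outDegree.getOrDefault(v, 0) == 0 && !v.equals(startNode)) {
                    balanced.add(new Arc("R", v, startNode));
                }
            }
            return balanced;
        }

        public static void main(String[] args) {
            List<Arc> navigation = List.of(
                    new Arc("E1", "home", "search"),
                    new Arc("E2", "search", "results"),
                    new Arc("E3", "results", "detail"));
            // Adds an R arc from the leaf node "detail" back to "home".
            addResetArcs(navigation, "home").forEach(System.out::println);
        }
    }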


Considering $\deg^-(v)$ as the in-degree of a vertex (i.e. the number of arcs going into the vertex) and $\deg^+(v)$ as the out-degree (i.e. the number of arcs pointing out of the vertex), the difference of these values is represented as follows:

(VI) $\delta(v) = \deg^+(v) - \deg^-(v)$

If $\delta(v) = 0$, vertex v is balanced (same number of arcs going in and pointing out). Hence, the set of unbalanced vertices with an excess of out-going arcs is denoted as follows:

(VII) $S^+ = \{ v \mid \delta(v) > 0 \}$

On the other hand, the set of unbalanced vertices with an excess of in-going arcs is the following:

(VIII) $S^- = \{ v \mid \delta(v) < 0 \}$

Adding arcs from the leaf nodes to the root is a simple way of balancing the digraph, and as a result it becomes strongly connected [9]. The least number of arcs that need to be added to make the graph strongly connected is called the deficiency [137]. Consider the digraph labelled as "i) Original" in Figure 31. The added virtual links are labelled with "R", which means "reset". These links are substituted by the additive operator (i.e. a new path) when reducing the graph into its paths. The new equivalent digraph is also shown in Figure 31, labelled as "ii) Strongly connected".

Figure 31. Digraph Example

In this new situation, CPP can be applied to find the optimal CPT, i.e. the shortest tour that covers every edge in the graph with as few repeated visits as possible. The cost of a CPT is defined as the total arc weight, summed along the circuit. Assigning a weight of 1 to each link and applying the CPP algorithm given in [137] to the proposed example, the resulting path expression is the following:

Snippet 4. Path Expression using CPP

E1·E3·E4·E5·R·E1·E7·R·E1·E2·E6·R·E0 =
= E1·E3·E4·E5 + E1·E7 + E1·E2·E6 + E0

Thus, CPP has found four different paths with a total cost of 10 links. Moreover, node reduction can also be applied to the strongly connected graph in order to reduce the equivalent graph matrix. The complete explanation of how this process is done can be found in [14]. In short, the node reduction algorithm has two steps: i) remove self-loops (any node n that has an edge to itself); ii) eliminate intermediate nodes and replace them with a set of equivalent links. This method has been employed for web navigation by Ricca and Tonella [124].
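The decomposition of a CPT into independent paths can be illustrated with the following Java sketch, which splits the tour of Snippet 4 at the virtual "R" links; class and method names are hypothetical:

    import java.util.*;

    // Minimal sketch: turn a Chinese Postman Tour (a list of edge labels that may
    // include the virtual "R" reset links) into the independent navigation paths,
    // by treating each "R" as a path separator, as in Snippet 4.
    public class CptSplitter {

        public static List<List<String>> splitAtResets(List<String> tour) {
            List<List<String>> paths = new ArrayList<>();
            List<String> current = new ArrayList<>();
            for (String edge : tour) {
                if ("R".equals(edge)) {          // a reset link marks the end of one path
                    if (!current.isEmpty()) {
                        paths.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(edge);
                }
            }
            if (!current.isEmpty()) {
                paths.add(current);
            }
            return paths;
        }

        public static void main(String[] args) {
            // The CPT of Snippet 4: E1·E3·E4·E5·R·E1·E7·R·E1·E2·E6·R·E0
            List<String> cpt = List.of("E1", "E3", "E4", "E5", "R",
                                       "E1", "E7", "R",
                                       "E1", "E2", "E6", "R", "E0");
            // Prints [E1, E3, E4, E5], [E1, E7], [E1, E2, E6], [E0]
            splitAtResets(cpt).forEach(System.out::println);
        }
    }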


The application of the node reduction algorithm to the strongly connected digraph above is illustrated in Figure 32.

Figure 32. Node Reduction Example

Therefore, the resulting path expression of applying the node reduction algorithm to the proposed example is:

Snippet 5. Path Expression using Node Reduction

(E1·E7·R + E1·E2·E6·R)(E0+E1·E2·E3)·E5·E6·R =
= E1·E7 + E0·E5·E6 + E1·E3·E4·E5·E6 + E1·E2·E6

That is, four different paths with a total cost of 13 links. This solution is quite similar to the one provided by CPP (13 vs. 10 links), but it suggests that CPP gives better results than node reduction. In order to confirm this statement, I have carried out a laboratory experiment. The experiment consists of comparing node reduction and CPP using random multidigraphs (i.e., with loops and multiple edges). These graphs have been created using an incremental number of links (from 1 to 50). For each digraph, node reduction and CPP are executed, comparing their cost (number of links employed in the resulting set of paths) and also their computation time (milliseconds needed to reach the solution). The Java language has been employed in this experiment. In order to work with graphs, I have analysed the following open-source Java libraries: JUNG21, JGraph22, JGraphT23. The aspects to be taken into account for selecting one of these libraries are basically two: the input/output formats and the layout/rendering capabilities. On the one hand, the output capability is the ability of the library to export graphs to a file in a graph-specific open format, such as GraphML24, Pajek (Slovene word for spider), GML (Graph Modelling Language), VCG (Visualizing Compiler Graphs) or GDL (Graph Description Language). The input capability is the opposite process, i.e., importing these formats into a Java graph. On the other hand, an optional desirable capability of the graph library is exporting the graph to an image, such as JPEG or PNG. This capability is divided into two: the first one is the capability to render the graph, and the other one is the

21 http://jung.sourceforge.net/ 22 http://www.jgraph.com/ 23 http://www.jgrapht.org/ 24 http://graphml.graphdrawing.org/


capability to layout, i.e., ordering the graph's elements following an algorithm such as circular or hierarchical organization. The following table summarizes the features of the studied libraries:

Table 12. Open-source Java Graph Libraries

Library  | License  | Input           | Output          | Render | Layout
JUNG     | BSD      | Pajek & GraphML | Pajek & GraphML | yes    | yes
JGraph   | Open     | XML             | XML             | yes    | no
JGraphT  | LGPL 2.1 | (none)          | GraphML         | yes    | no

Looking at the results presented in the table above, JUNG is the best choice, since it is the only library that covers all the expected capabilities. The experiment has been carried out on a PC with an Intel Core2 Quad processor (2.66 GHz) and 4 GB of RAM. It has been repeated 100 times, and the mean of the values (cost and time) is shown in the following charts:

Figure 33. Node Reduction vs. CPP Costs

Figure 34. Node Reduction vs. CPP Time

CPP behaves better than node reduction, since it is more linear both in cost and in time. It always produces a lower-cost solution than node reduction (see Figure 33). In addition, while the resolution time of CPP is always a few milliseconds, the resolution time of node reduction grows exponentially as the number of links increases (see Figure 34).
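A minimal sketch of the measurement loop used for this kind of comparison is shown below; the Arc type, the decomposer interface and the placeholder implementation are assumptions made only for illustration, while the actual experiment relied on JUNG and on Thimbleby's CPP code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Function;

    // Sketch of the cost/time comparison over random multidigraphs (loops and parallel
    // edges allowed) with 1 to 50 links; a decomposer maps a graph to its set of paths.
    public class DecompositionBenchmark {

        record Arc(int from, int to) {}

        static List<Arc> randomMultidigraph(int links, int vertices, Random rnd) {
            List<Arc> arcs = new ArrayList<>();
            for (int i = 0; i < links; i++) {
                arcs.add(new Arc(rnd.nextInt(vertices), rnd.nextInt(vertices)));
            }
            return arcs;
        }

        static void benchmark(String name, Function<List<Arc>, List<List<Arc>>> decomposer) {
            Random rnd = new Random(42);
            for (int links = 1; links <= 50; links++) {
                List<Arc> graph = randomMultidigraph(links, 5, rnd);
                long start = System.nanoTime();
                List<List<Arc>> paths = decomposer.apply(graph);
                long millis = (System.nanoTime() - start) / 1_000_000;
                int cost = paths.stream().mapToInt(List::size).sum();  // links used in all resulting paths
                System.out.printf("%s: links=%d cost=%d time=%dms%n", name, links, cost, millis);
            }
        }

        public static void main(String[] args) {
            // Placeholder decomposer (one path per arc) just to exercise the harness;
            // the real experiment plugs in the CPP and node reduction implementations here.
            benchmark("trivial", graph -> graph.stream().map(List::of).toList());
        }
    }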


All in all, the deterministic CPP Java algorithm created by Thimbleby [137] is the selected solution to find the set of independent paths within a multidigraph representing a web site. This algorithm establishes the optimal CPT by looking for the tour that minimizes the total cost, expressed as:

(IX) $\min \sum_{i,j} c_{ij} \, n_{ij}$

Where $c_{ij}$ is the cost of the least-cost path from vertex $i$ to vertex $j$, and $n_{ij}$ is the number of times that the path from $i$ to $j$ must be taken. Therefore, this algorithm searches for the Eulerian circuit by minimizing the cost of the paths and repeating each path $n_{ij} \geq 0$ times.

5.5. Summary

This contribution has presented a method for the automated functional testing of web applications based on their navigation. The basic idea behind this approach is to exercise the SUT using a real browser, performing the navigation from state to state. Each web state corresponds to a step in the navigation model, which is checked to be correct during the web browsing. The first kind of input for this method can be UML models of the SUT. To be specific, three kinds of UML diagrams are needed: use case, activity, and presentation diagrams. Since presentation diagrams are not standard in UML 2.0, I have carried out a study on web modelling, and I have decided to use NDT to model web applications using UML. The second alternative in this method is using an XML-based file to model the navigation. This XML file is a simple way of structuring the navigation following an XSD schema. Third and last, this method can be fed with a recorded script (R&P approach). Therefore, in order to summarize the automation of this approach, I am going to rely on the MBT taxonomy depicted in [141], which can be summarized in the following schema:

Figure 35. MBT Taxonomy

Regarding the SUT, it consists of web applications. The proposed approach can be seen as a grey-box testing method, since it is a combination of the black-box and white-box approaches [14][5]:
- Black-box (functional) testing: This method ensures the correctness of the functional requirements by exercising the SUT with test data (input) and observing the correctness of the results (output).
- White-box (structural) testing: This method ensures the correctness of the structure of the web application in terms of navigation.


Model independence reflects the source of the test model. In this approach, UML development analysis models are reused for testing purposes. The required models are: use case, activity, and presentation diagrams. Some features are enhanced in the analysis models in order to achieve real testability of the SUT. Regarding state and transition information, it is a must to annotate every state (web page) and transition (link) of the activity diagram with its corresponding title and link identifier. Regarding model characteristics, this approach is deterministic, since every state and transition in the navigation is different. The selected model paradigm is transition-based, i.e. the model describes the transitions (web transitions) between the different states (web states) of the system. The test selection criterion is multiple:
- Structural: Path coverage is used, since the test suite must execute every path through the multidigraph.
- Data: A pseudo-random approach is used, i.e. existing data values classified by types.
For the test technology, a graph search algorithm is used, namely the CPP, which finds the shortest path through a graph that visits every edge (100% transition and state coverage). Regarding test execution, it is based on offline testing, in which the test generation and execution phases are decoupled. This improves test traceability, since each path is explicitly transcribed into unit test cases. Regarding test data, a tabular file with inputs and outputs will be employed for each path in the navigation. Each column holds the data for one field of one page: the first row holds the name of each data field, and the remaining rows hold the data themselves, as in the hypothetical example below.
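Purely as an illustration (the field names and values are invented), such a tabular data file for a login page of one navigation path could look as follows:

    username | password | expected_title
    alice    | secret1  | Welcome alice
    bob      | secret2  | Welcome bob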


Chapter 6. Automated Non‐Functional Assessment

The best way to have a good idea is to have many of them.

‐ Linus Pauling

This section is focused on the automation of the testing and analysis activities with the aim of ensuring the non-functional quality attributes selected in the methodology, namely performance, security, compatibility, usability, and accessibility. On the one hand, automated testing will be used to evaluate the performance and security of web applications in the client-side. On the other hand, automated analysis will be employed to assess compatibility, usability, and accessibility of such applications. The remainder of this section is structured as follows. First, the description of the non-functional assessment proposed in this dissertation is presented. Second, I focus on how automated testing is carried out to assess performance and security. Third, I explain how automated static analysis is carried out to assess compatibility, usability and accessibility. Finally, some conclusions are described.

6.1. Approach

The non-functional assessment approach proposed in this dissertation is a continuation of the functional approach described in the previous chapter. The automation of the navigation is carried out by an entity called Browser. This entity will be in charge of the orchestration of the non-functional assessment. Browser sends HTTP requests to the web under test following the description of the navigation, i.e. the paths. For each state in each path, an HTTP response is received from the SUT. The following information is extracted from these responses:
- State information. Each state is formed by the aggregation of the defined data fields.
- Out data. This information is useful for the functional validation.
- Source code. For the HTML and CSS elements within each state, Browser extracts their source code. This information is needed to perform static analysis, which is the selected quality control activity to assess compatibility, usability and accessibility (as depicted in the methodology in section 4.3).
- Web session. Browser is responsible for performing the automation of the navigation. While this automation is performed, a web session can be established between the web


server and client. Browser must identify this session, since this artefact is used to carry out performance and security testing.

All in all, the schema of the non-functional assessment is depicted in Figure 36. As this picture shows, Browser is the core element which performs the navigation. The information handled by Browser is reported to the rest of the quality control entities (labelled as Performance, Security, Compatibility, Usability and Accessibility in the picture). Moreover, this picture introduces another piece called "Configuration" into the approach. This entity is responsible for the customization of the elements in the approach. In the picture, the following components can be configured:
- Browser. The configuration element should customize the way the automated navigation works. For example, it should be possible to choose the real browser to be employed in the navigation (Firefox, Explorer, and so on).
- Performance. In order to assess this kind of testing, the performance specification should be defined. In other words, some kind of maximum or minimum performance figures should be identified. The aim of this configuration element is to establish this information. The following sub-section studies which performance parameters are to be assessed, and also a general estimation for them.
- Security. The selected way to assess security will be using web application scanners. Therefore, this configuration element should identify the vulnerability types to be used in the analysis.
- Compatibility, usability, and accessibility. According to the methodology, these quality attributes will be assessed by means of static analysis. Therefore, in the configuration step, the set of compatibility, usability and accessibility rules should be customized.
Finally, as a result of the assessment, a set of verdicts is generated for each element (functionality, performance, security, compatibility, usability, and accessibility). These verdicts will be aggregated in a final report. This report must contain the defects found in the automation of the navigation, ordered following the paths and states of the navigation.
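A minimal sketch of the Browser role, written against the Selenium WebDriver Java API that chapter 7 selects for the reference implementation, is shown below; the traversal method, the report hook and the representation of paths as URL lists are illustrative assumptions:

    import java.util.List;
    import java.util.Set;
    import org.openqa.selenium.Cookie;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    // Sketch of the Browser entity: it walks the states of one navigation path and
    // hands the extracted information to the non-functional assessment entities.
    public class BrowserOrchestrator {

        public void traverse(List<String> pathUrls) {
            WebDriver driver = new FirefoxDriver();   // the Configuration element could select another browser
            try {
                for (String url : pathUrls) {
                    driver.get(url);                          // navigate to the next web state
                    String title = driver.getTitle();         // state information (page title)
                    String source = driver.getPageSource();   // source code for static analysis
                    Set<Cookie> session = driver.manage().getCookies();  // web session for performance/security testing
                    report(title, source, session);
                }
            } finally {
                driver.quit();
            }
        }

        private void report(String title, String source, Set<Cookie> session) {
            // Hand the data over to the Performance, Security, Compatibility,
            // Usability and Accessibility entities (omitted in this sketch).
            System.out.printf("state '%s': %d bytes of HTML, %d cookies%n",
                    title, source.length(), session.size());
        }
    }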


Figure 36. Automated Non-Functional Testing and Analysis Schematic Diagram

6.2. Automated Non-Functional Testing

The automated non‐functional testing (i.e. performance and security testing) approach proposed in this dissertation follows the guidelines depicted in the methodology: ensure the quality attributes by the division of the navigation in paths, understanding a path as an aggregation of web states (pages). Therefore, the automation of performance and security testing for web applications can be seen as a continuation of the automated functional testing approach depicted in chapter 5, which is summarized in Figure 23 (section 5.4).

6.2.1. Performance

As depicted in the state-of-the-art chapter (see Table 2 in section 2.4.3), performance testing for web applications verifies the system performance specification. There are two special cases


of performance testing, namely load and stress testing. Load testing is clearly out of the scope of this dissertation, since it evaluates the performance of a system under some predefined conditions which are mainly related to the state of the web server, and this dissertation is focused on the client-side. Stress testing evaluates a system beyond its normal limits; therefore, it can be considered a kind of performance testing with a huge workload. Thus, I am going to focus on performance testing, and stress testing can be exercised by tuning the workload (described in the next subsection). Focusing on automated performance testing, it typically has the following components [35]:
- Scripting module. Enables recording of end-user activity and may support many different middleware protocols.
- Test management module. Allows the creation and execution of load test sessions or scenarios that represent different mixes of end-user activity.
- Load injector(s). Generates the load, normally from multiple workstations or servers, depending on the amount of load required.
- Analysis module. Provides the ability to analyse the data collected from each test execution.
In the approach presented in this dissertation, the scripting module is equivalent to the pre-automation stage, in which UML, R&P and XML describe the navigation of the web under test. The test management module is implemented in the test logic component. The load injector is not covered, so this component is added (as depicted in Figure 36). Finally, the analysis module is represented as a black-box oracle in Figure 36, since the expected performance figures (defined using the configuration element) are compared to the real performance figures obtained from the web under test. The following sub-sections describe these elements, i.e. the load injector and the performance oracle modules.

6.2.1.1. Load Injector

A load injector is a computer or part of an automated performance testing tool used to simulate real end-user activity. The load injector module included in the method proposed in this dissertation will simulate concurrent users browsing the different paths of the navigation. In other words, performance testing will be similar to functional testing in the sense that the paths are the same, but this time there will be many concurrent users performing HTTP requests to the web server. Automated performance test tools use one or more machines as load injectors to simulate real user activity. It is important to ensure that none of the injectors is overloaded in terms of CPU or memory utilization, since this situation can introduce inaccuracy into the performance test results. Calculating the number of concurrent connections for a web application is in general a difficult task. Sun published a guide with an overview of a sizing process that can be applied to most web applications [136]. The maximum number of concurrent sessions defines how many connected users a web site can handle. This figure can be calculated using this formula:

(X) $C_s = p_{online} \cdot U_{potential}$


Where:

- $C_s$: Maximum number of concurrent sessions.
- $p_{online}$: Expected percentage of users online.
- $U_{potential}$: Potential users of the web application under test.
Calculating the potential users of the web is a specific operation for each single web application. Anyway, some advice can be followed to obtain this figure:
- Identify only users who are active.
- Estimate a finite figure for the user base conservatively.
- Study the access logs.
- Identify the geographic locations of the web users.
- Review the business plan (if it exists) regarding who the potential users are.
After calculating the maximum number of concurrent sessions, the maximum number of concurrent users should be calculated. A concurrent user is one connected to a web browser submitting requests to or receiving responses from a web application. The maximum number of concurrent users is the highest possible number of concurrent users within a predefined period of time. To calculate the maximum number of concurrent users, I use this formula:

(XI)

Where:

- Maximum number of concurrent users.
- Maximum number of concurrent sessions.
- Average time between page requests.
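Whatever the exact sizing figures, the behaviour of the load injector can be sketched in Java as follows; the thread-pool strategy, the use of equation (X) and the navigateAllPaths() hook are illustrative assumptions rather than the reference implementation, which (as chapter 7 explains) integrates an existing load testing tool:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Minimal sketch of a load injector: the number of virtual users is derived from
    // equation (X), and each virtual user replays the navigation paths in parallel.
    public class LoadInjector {

        public static void inject(int potentialUsers, double expectedPctOnline,
                                  Runnable navigateAllPaths) throws InterruptedException {
            // Equation (X): concurrent sessions = expected percentage online x potential users
            int concurrentSessions = (int) Math.ceil(expectedPctOnline * potentialUsers);

            ExecutorService pool = Executors.newFixedThreadPool(concurrentSessions);
            for (int user = 0; user < concurrentSessions; user++) {
                pool.submit(navigateAllPaths);   // each virtual user traverses every path
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }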

6.2.1.2. Performance Oracles

In order to accurately measure performance, there are a number of indicators that must be taken into account [35]:
- Availability. The amount of time an application is available to the end user.
- Response time. The amount of time it takes the application to respond to a user request.
- Throughput. The rate at which application events occur; for example, the number of hits on a web page within a given period of time.
- Utilization. The percentage of the theoretical capacity of a resource such as the network or server.
This dissertation is focused on the client-side of web applications. Therefore, availability and utilization are out of the scope, since these measurements should be taken in the server-side. All in all, the performance measurements to be ensured will be response time and throughput.

6.2.1.2.1. Response Time

As depicted in section 4.1, a web application is a distributed system composed of a client (browser) that generates HTTP requests to a web server through a TCP/IP network. Web servers, in turn, issue database queries to obtain the data needed for generating the HTTP response (see Figure 37).


Figure 37. Response Time Latency

Therefore, the response time observed from the user perspective (client-side) is the aggregation of several latency times, described in the following equation:

(XII) $t_r = t_{request} + t_{response} + \sum t_{server} + \sum t_{db}$

Where:
- $t_r$: Response time.
- $t_{request}$: Time used by the HTTP request to reach the web server.
- $t_{response}$: Time used by the HTTP response to reach the web client.
- $\sum t_{server}$: Aggregation of the time used by the web server (and web application server, if it exists) to process the request.
- $\sum t_{db}$: Aggregation of the time used to access the database (if it exists).

In order to perform complete performance testing, performance oracles should be able to compare real measurements with the expected outcome. This leads me to ask the following question: what should be the expected response time for any web application? This question has no simple answer, since there is no generic industry standard for good and bad performance. Some research has attempted to map user productivity to response time [136], as follows:
- Greater than 15 seconds. If such delays can occur, the system should be designed so that the user can turn to other activities and request the response at some later time.
- Greater than 4 seconds. These delays are generally too long for a conversation requiring the end-user to retain information in short-term memory.
- From 2 to 4 seconds. A delay in this range may sometimes be acceptable. For example, it may be acceptable to make a purchaser wait 2 to 4 seconds after typing in her address and credit card number, but not at an earlier stage when she is comparing various product features.
- Less than 2 seconds. For complex activities, 2 seconds represents an important response-time limit. When the user has to remember information throughout several responses, the response time must be short. The more detailed the information to be remembered, the greater the need for responses of less than 2 seconds.
- Sub-second response time. Certain types of thought-intensive work (e.g. writing a book) require very short response times to maintain the user's interest and attention for long periods of time.
- Deci-second response time. This response time is required when instantaneous responses are expected, such as seeing a character displayed on the screen after pressing a key, or the reaction to clicking a screen object with a mouse.


The critical response time barrier seems to be around 4 seconds. Response times greater than this have a definite impact on productivity for the average user. This fact is confirmed by several surveys [80][103] which have shown that high latencies drive web users away: response times above 4 seconds interrupt the user experience, causing the user to leave the web site. All in all, the figure employed in this dissertation to assert the maximum response time (i.e., as a performance oracle) will be 4 seconds.
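A minimal sketch of such a response-time oracle is given below; the class and method names are hypothetical, and the elapsed time it measures aggregates all the latency terms of equation (XII) as perceived from the client:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal sketch of a response-time oracle: download one web state and flag a
    // defect when the 4-second threshold discussed above is exceeded.
    public class ResponseTimeOracle {

        private static final long MAX_RESPONSE_TIME_MS = 4000;  // 4-second barrier

        public static boolean withinLimit(String url) throws IOException {
            long start = System.currentTimeMillis();
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // consume the whole HTTP response body
                }
            }
            conn.disconnect();
            long elapsed = System.currentTimeMillis() - start;
            return elapsed <= MAX_RESPONSE_TIME_MS;
        }
    }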

6.2.1.2.2. Throughput

In broad terms, throughput measures the amount of work performed by the web server. It can be defined as the number of requests processed per minute per server instance, i.e. the volume of session data stored per minute. To calculate the throughput per web page, I use the following formula [129]:

(XIII) $T = user\_rate \cdot t$

Where:
- $T$: Throughput per web page.
- $user\_rate$: User arrival rate (new users per second).
- $t$: Average pages per visit.

6.2.2. Security

Web application security is difficult because these applications are, by definition, exposed to the general public, including malicious users. Additionally, input to web applications comes from within HTTP requests, and incorrect or missing input validation causes most of the vulnerabilities in web applications [72]. As depicted in the methodology section, security will be assessed by means of black-box testing. This type of security testing is carried out by means of web application scanners, i.e. automated programs that examine web applications for security vulnerabilities by simulating attacks on them. This involves the generation of malicious inputs and the subsequent evaluation of the application's response. Vulnerabilities are flaws or weaknesses in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy [79]. New security vulnerabilities are discovered every day in commonly used applications. The Open Web Application Security Project (OWASP)25 publishes the list of the most critical web application vulnerabilities. Some common vulnerabilities and attacks are:
- Cross-site scripting (XSS) vulnerabilities. The vulnerability occurs when an attacker submits malicious data to a web application.
- Injection vulnerabilities. This includes data injection, command injection, resource injection, and SQL injection. SQL injection occurs when a web application does not properly filter user input and places it directly into a SQL statement.

25 https://www.owasp.org/index.php/Main_Page


- Cookie poisoning. A technique mainly employed for achieving impersonation and breach of privacy through the manipulation of session cookies, which maintain the identity of the client.
- Unvalidated input. This includes tainted data and forms, improper use of hidden fields, and the use of unvalidated data in array indexes, in function calls, in format strings, in loop conditions, and in memory and array allocation.
- Authentication, authorization and access control vulnerabilities, which could allow a malicious user to gain control of the application or backend servers.
- Incorrect error handling and reporting, which may reveal information and thus open doors for malicious users to guess sensitive information.
The OWASP initiative has also created the Application Security Verification Standard (ASVS), whose primary aim is to normalize the range in the coverage and level of rigor available in the market when it comes to performing web application security assessment [78]. The standard provides a basis for testing application technical security controls, as well as any technical security controls in the environment that are relied on to protect against typical vulnerabilities. Therefore, this standard can be used to establish a level of confidence in the security of web applications. The ASVS defines four levels of verification that increase in both breadth and depth when moving up the levels:
- Level 1, "Automated Verification". It is typically appropriate for applications where some confidence in the correct use of security controls is required. There are two constituent components for Level 1. Level 1A is the use of automated application vulnerability scanning (dynamic analysis) tools to detect vulnerabilities in the application's security controls. Level 1B is the use of automated source code scanning (static analysis) tools to search through the application source code for patterns that represent vulnerabilities.
- Level 2, "Manual Verification". It provides some confidence in the correct use of security controls and confidence that the security controls are working correctly. Level 2A is manual application security testing, which consists of creating dynamic tests to verify an application's proper design, implementation, and use of security controls. Level 2B is manual code review, which consists of human searching and analysis of the application source code in order to verify the application's design, implementation, and proper use of security controls.
- Level 3, "Design Verification". It is typically appropriate for applications that handle significant business-to-business transactions, including those that process healthcare information, implement business-critical or sensitive functions, or process other sensitive assets.
- Level 4, "Internal Verification". It is typically appropriate for critical applications that protect life and safety, critical infrastructure, or defence functions.
The proposed method to evaluate security in this dissertation corresponds to Level 1A, since 1B requires access to the source code in the server-side. It is not the most thorough procedure to assess the security of a web site, but it requires minimum effort from developers because it is completely automatic.


Similarly to performance testing, security testing will be assessed by adding a web application scanner to the functional approach. This scanner will attack each page of each path in the navigation. This module will also be in charge of the security oracle, i.e. checking whether or not each attack is effective. When an attack discovers a security hole, it should be documented in the test report as usual. A similar way of classifying web application security assessment tools was proposed by Curphey and Araujo [71], establishing eight categories of such tools: source code analysers, web application (black-box) scanners, database scanners, binary analysis tools, runtime analysis tools, configuration management tools, HTTP proxies, and miscellaneous tools, as follows:
- Source code analysers. These tools search for patterns in source code files to detect security defects. Their main benefit is that they integrate tightly into the development process and early in the construction process. Their main drawback is that they are limited in what they actually find and are prone to reporting false positives.
- Web application scanners (also known as black-box scanners) use a web browser to mimic an attack. These tools attempt to inject malicious payloads into HTTP requests and watch for indications of success in the resulting HTTP response. The benefit of black-box scanners is that they require little skill, and they are also effective at finding many configuration management issues. The drawback is that these tools perform poorly in terms of detection, i.e., they find few vulnerabilities.
- Database scanners act as an SQL client and perform various database queries to analyse the database's security configuration. These tools are easy to use but rarely find problems outside the database configuration.
- Binary analysis tools. These tools attempt to fuzz the input parameters of public functions (mainly C and C++) looking for signs of an application crash, common vulnerability signatures, and other improper behaviour. Binary analysis tools are easy to use but produce very complex results, so a highly skilled analyst is required to understand and use the results effectively.
- Runtime analysis tools essentially act like profilers and intercept function calls as they occur. Instead of identifying application bugs, these tools give reviewers and testers a variety of critical information, letting them make the decision themselves.
- Configuration analysis tools. These tools usually perform static analysis, but rather than examining code, they typically operate against the application configuration files, host settings, or server configuration.
- Proxies trap the HTTP request and response, allowing testers to view and modify different parts of the request, ranging from cookies, HTTP headers, and GET and POST parameters to HTML content.
- Miscellaneous. Other tools do not directly fit under any category, such as: i) white-box tools for performing security-related activities such as unit testing and web UI testing; ii) black-box tools that attempt brute-force authentication against the web application or check for search-engine information leakage (leading to what is often termed "Google hacking").


All in all, web application scanners (which correspond to Level 1A of security verification according to OWASP) will be employed in this dissertation to assess security in the client-side of web applications.
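To make the idea of black-box probing concrete, the following deliberately naive Java sketch injects a marker payload into a query parameter of a state under test and checks whether it is reflected back unescaped (a hint of reflected XSS); it is only an illustration with hypothetical names, and the scanner integrated in chapter 7 performs far more thorough attacks:

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    // Naive sketch of a black-box probe in the spirit of a web application scanner:
    // a marker payload is sent in a query parameter and the response body is inspected
    // for an unescaped reflection of that payload.
    public class ReflectionProbe {

        private static final String PAYLOAD = "<script>alert('atp-probe')</script>";

        public static boolean looksVulnerable(String pageUrl, String parameter) throws IOException {
            String probeUrl = pageUrl + "?" + parameter + "="
                    + URLEncoder.encode(PAYLOAD, StandardCharsets.UTF_8);
            HttpURLConnection conn = (HttpURLConnection) new URL(probeUrl).openConnection();
            String body = new String(conn.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            conn.disconnect();
            // If the payload comes back verbatim, the input is not being sanitised.
            return body.contains(PAYLOAD);
        }
    }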

6.3. Automated Non‐Functional Analysis

This section provides the method to assess the selected non‐functional quality attributes to be ensured by means of automated static analysis, i.e. compatibility, usability and accessibility.

6.3.1. Compatibility

The compatibility assessment is based on ensuring that each page of each path is rendered the same in every supported web browser. According to the W3C statistics26, Firefox27, Internet Explorer28, Chrome29, Safari30, and Opera31 are the most popular browsers today. The following chart presents the evolution of the percentage usage of the different browsers since May 2002. As this diagram shows, Internet Explorer was the majority browser in the early 21st century. Nowadays this situation has changed, since browsers such as Firefox have been eroding its dominant position. Chrome, launched by Google in 2008, has grown rapidly in the last years, being the only browser whose usage keeps increasing. Other browsers such as Safari or Opera have a minor but constant usage ratio.

(Chart: percentage usage of Explorer, Firefox, Chrome, Safari, Opera and Mozilla, from May 2002 to 2011.)

Figure 38. Browser Use Evolution since 2002

26 http://www.w3schools.com/browsers/browsers_stats.asp 27 http://www.mozilla‐europe.org/ 28 http://windows.microsoft.com/en‐US/internet‐explorer/products/ie/home 29 http://www.google.com/chrome 30 http://www.apple.com/safari/ 31 http://www.opera.com/


All in all, the W3C statistics corresponding to May 2011 establish the following classification of browser usage: 1) Firefox (42%); 2) Internet Explorer (26%); 3) Chrome (25%); 4) Safari (4%); 5) Opera (3%). This distribution is shown as a pie chart in the following picture:


Figure 39. Browser Use on March 2011

The methods I am going to employ in this dissertation to assess the compatibility of web sites are the following:
- HTML validation (HTML 2.0, 3.0, 4.0, 4.01, XHTML 1.0, 1.1). The use of static analysis to check the syntactical correctness of the client-side HTML code helps to produce more compatible web pages.
- CSS validation. Using static code checkers to look for errors and potential problems in Cascading Style Sheets.
- Taking snapshots. While traversing the navigation paths, a snapshot of each page will be taken, and these snapshots will be added to the report. That way, testers will have the possibility of rapidly checking how each page has been rendered in the selected browser.
HTML and CSS validation should be done according to the guidelines proposed by the W3C32.
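The snapshot step can be sketched with the Selenium WebDriver screenshot API as follows; the class name and the file naming scheme are illustrative assumptions:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import org.openqa.selenium.OutputType;
    import org.openqa.selenium.TakesScreenshot;
    import org.openqa.selenium.WebDriver;

    // Minimal sketch of the snapshot step: while Browser traverses a path, every web
    // state is captured to an image that is later attached to the compatibility report.
    public class SnapshotTaker {

        public static Path capture(WebDriver driver, String stateName, Path reportDir) throws IOException {
            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            Path target = reportDir.resolve(stateName + ".png");
            Files.copy(screenshot.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }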

32 http://www.w3.org/

6.3.2. Usability

Although any web application is based on a relatively simple interface consisting of links, buttons, menus, text fields, text, and graphics, usability problems are common. The main areas that contribute to these problems are the following [9]:
- Human perception. These issues can arise when pages are designed without regard to how the information can best meet the needs of the user. The style of the pages can also contribute to these problems; e.g. poor contrast and layout can contribute to perceptual usability problems.
- Navigation. Disorientation is one of the biggest frustrations for web application users. To avoid disorientation, there should be sufficient indicators of the user's current location. For example, it is considered good usability practice to include some "you are here" indicator. Another common navigation issue comes from the use of ambiguous links that may cause the user to go to the wrong page.
- Human memory. Web applications that require users to remember items from one page to another are likely to cause usability problems.
- Database integration. A common problem is that information within web pages can be out of sync with information in the serving database. Another usability issue stemming from web technology involves the caching systems in most web browsers.
The possible methods available to assess usability are the following:
- User interaction tracking allows monitoring the user interaction (e.g. clicks, mouse movements, and so on) by means of a heat map over a web site. The tools implementing this method usually work by adding some JavaScript code to the page and compiling the data in the server-side using a language such as PHP, Java, and so on.
- Heat map attention methods use advanced artificial intelligence algorithms to simulate human visual processing and attention. This predictive eye tracking method is principally based on research by two famous neuroscientists, Koch and Itti [78]. In visual neuroscience, their concept of algorithmic attention prediction is known as a Saliency Map33.
- User testing methods involve final users assessing the usability of a web site. There are two kinds of usability user testing: i) Moderated usability tests, in which participants interact with a web site while a moderator follows their interaction; this moderator could be a human or a software program that asks the participant to perform both unassisted and directed tasks. ii) Unmoderated usability tests, in which participants can be either intercepted off a web site or recruited based on specific needs.
- Broken link checker methods look for links pointing to web pages that are no longer available.
- Usability guideline inspectors. These guidelines include most factors to consider during a usability evaluation of a web site, for example the "Research-Based Web Design & Usability Guidelines"34 [129] developed by the U.S. Department of Health and Human Services (HHS). These guidelines have been created with the following aims: 1) to create better and more usable health and human service web sites; 2) to provide quantified, peer-reviewed web site design guidelines; 3) to stimulate research into areas that will have the greatest influence on the creation of usable web sites.
As depicted in the methodology, I use static analysis in the client-side to assess usability. This kind of evaluation is based on the comparison of a set of rules (best practices, patterns, assumptions, bad smells, and fault descriptions) with the source code under test; in this case this source is in the client-side, i.e. the HTML code obtained with each HTTP response. Therefore, the user interaction tracking and user testing methods are out of the scope of this dissertation, since they involve respectively the use of server-side technologies and assessment by testing rather than by analysis. Hence, I am going to use broken link checkers and usability guideline inspectors to perform automated usability analysis.
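A broken link checker of this kind can be sketched in a few lines of Java; the HEAD-request strategy and the method names are illustrative assumptions, and link extraction from the HTML source is assumed to be done elsewhere (e.g. by Browser):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of a broken link checker for one web state: every extracted link
    // is requested with an HTTP HEAD, and any status of 400 or above is reported.
    public class BrokenLinkChecker {

        public static List<String> findBrokenLinks(List<String> links) {
            List<String> broken = new ArrayList<>();
            for (String link : links) {
                try {
                    HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
                    conn.setRequestMethod("HEAD");
                    if (conn.getResponseCode() >= 400) {
                        broken.add(link);
                    }
                    conn.disconnect();
                } catch (IOException e) {
                    broken.add(link);   // unreachable hosts also count as broken links
                }
            }
            return broken;
        }
    }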

33 http://www.scholarpedia.org/article/Saliency_map 34 http://www.usability.gov/guidelines/


6.3.3. Accessibility

Web accessibility rules such as those provided by the Web Content Accessibility Guidelines (WCAG)35 have been established, so the accessibility assessment will have to verify compliance with such rules. These guidelines explain how to make web content accessible to people with disabilities. WCAG is part of a series of accessibility guidelines published by the Web Accessibility Initiative (WAI)36, which is a group within the W3C with the aim of improving the accessibility of the WWW. Following the general process depicted in this dissertation, the WCAG guidelines will be used to assess accessibility by checking the conformance of each web state in each navigation path. As a result, the accessibility issues found by this evaluation will be added to the final report.

6.4. Summary

This chapter has presented a method to perform automated non-functional testing and analysis for web applications in the client-side. Regarding testing, this method is a continuation of the functional approach presented in chapter 5. Based on the division of the navigation into paths, performance and security testing is performed on each of the web pages. Therefore, the testing method has been extended with a load injector and an attack injector, to check performance and security respectively. In addition, new test oracles have to be added, since performance (response time and throughput) and security (protection against typical vulnerabilities) should be ensured. On the other hand, compatibility, usability and accessibility are assessed by means of well-known guidelines and rules. Compatibility is evaluated by means of HTML and CSS validation according to the W3C guidelines. Usability is checked using the Research-Based Web Design & Usability Guidelines developed by the U.S. Department of Health and Human Services. Finally, accessibility is evaluated using the Web Content Accessibility Guidelines (WCAG) by the Web Accessibility Initiative (WAI).

35 http://www.w3.org/TR/WCAG20/ 36 http://www.w3.org/WAI/


Chapter 7. Architecture

Simplicity is the ultimate sophistication.

‐ Leonardo da Vinci

According to chapter 3 of this dissertation, Pressman's four-layer definition of SE [122] has been employed to find the specific objectives (see Figure 12). The lowest layers of this definition are quality and processes, which have been addressed in chapter 4 (methodology). The next layer is SE methods, which have been described in chapters 5 and 6. Finally, the top layer of Pressman's approach is tools, which provide the practical support for the process and methods. This chapter covers this last layer. To that aim, first a summary of the needed tools is provided. Then, a survey of the existing tools is presented. This survey looks for tools to support the described methods to ensure the selected quality attributes: functionality, performance, security, compatibility, usability, and accessibility. As a result of this survey, a set of tools will be selected. These tools will be integrated in a reference implementation of this dissertation, which has been named Automatic Testing Platform (ATP). An extension of this tool was created under the grant of the ICT-ROMULUS37 project. This tool was called ATP4Romulus (Automatic Testing Platform for Romulus).

7.1. Tools Integration

Chapters 5 and 6 presented the methods to automate quality control for web applications following the methodology presented in chapter 4. The summary of these methods can be seen in Figure 23 and Figure 36. In these pictures, it can be noticed that there are some components still to be defined. I am going to cover the functionality of such elements by integrating existing open-source tools. These elements are listed as follows:
- Web engine. This piece is responsible for automating the navigation by traversing the paths of the web under test.
- Load injector. This component should simulate concurrent users exercising the web.
- Web scanner (attack injector). This element is in charge of generating malicious URLs to attack the web under test and observing its response in order to find vulnerabilities.
- HTML/CSS checker. This piece is a static checker to assess compatibility.
- Usability and accessibility checkers. Static analysers.

37 http://www.ict‐romulus.eu/web/romulus


The following sub‐sections describe a tool survey carried out in order to find the most suitable existing tools to be integrated in the reference architecture and implementation.

7.2. Tool Survey

The quality goals selected in this dissertation have been established in the methodology chapter and are summarized in Figure 20 (section 4.5). The quality dimensions to be assessed with software testing are functionality, performance, and security. The quality attributes to be evaluated using static analysis are compatibility, usability, and accessibility. This section presents a survey of the possible tools to support the methods proposed in this dissertation. It should be noticed that the number of existing testing tools nowadays is huge. There are several web pages that compile the most significant testing tools (open source, commercial products, and so on). For example, opensourcetesting.org38 lists only open source testing tools; at writing time, this web site counts 466 tools. Another important web site is SoftwareQATest.com39, which compiles more than 500 tools listed in 13 categories. The Automated Testing Institute40 gathers 716 testing tools at writing time. Due to the fact that there are so many tools, this survey does not intend to compare every existing tool. Instead, I compare the most significant tools against the stated aims of the proposed methods (chapters 5 and 6) to assess automatically the functionality, performance, security, compatibility, usability and accessibility of web applications in the client-side. That way, the number of tools to be studied is somewhat reduced. The aim of this study is to select a group of tools to be integrated in a complete automated quality control framework for web applications in the client-side. Therefore, the requirements to be addressed by the tools to be selected are the following:
1. The tool should have an API to access its functionality. By using the API, I will assess the quality goals (functionality, performance, security, compatibility, usability and accessibility) and customize the navigation according to the paths.
2. The tool should preferably be open source. The fact that a tool/library is open source provides freedom to change and redistribute the resulting product, which is very interesting in research.
3. The tool should preferably be cross-platform (multi-platform). A cross-platform application may run on as many as all existing platforms, such as Windows, Linux, Mac OS X, PowerPC, and so on.

7.2.1. Functionality

In order to assess functionality, the entity called Browser, which performs automated interaction with the web application, is needed. Therefore, I need to find a web engine able to perform this automation. These kinds of tools are sometimes called headless web browsers (also known as GUI-less browsers). The additional requirements the tool needs to fulfil for the proposed method are the following:
1. It should be compatible with the use of real browsers.

38 http://www.opensourcetesting.org/ 39 http://www.softwareqatest.com/qatweb1.html 40 http://www.automatedtestinginstitute.com/


2. It should provide an R&P facility.
The candidate tools are summarized in the following table:

Table 13. Functional Web Tools

Tool | Description | License
HtmlUnit41 | It provides a Java API to invoke pages, fill out forms, click links, and so on, just like with a normal browser. | Apache 2
HttpUnit42 | Testing framework used to perform testing of web sites without the need for a real web browser. | BSD
Selenium43 | Portable testing framework for web applications. It provides an R&P module called Selenium IDE (see section 2.5.5.4). | Apache 2
Watir44 | Ruby libraries for automating web browsers (see section 2.5.5.8). | BSD
JWebUnit45 | Java-based testing framework for web applications which wraps HtmlUnit and Selenium with a unified, simple API. | LGPL
Canoo WebTest46 | Open source tool for automated testing of web applications. It writes the scripts using XML or Groovy. | Apache 2
IEUnit47 | Simple framework to test logical behaviors of web pages. | MIT
soapUI48 | Open source functional testing tool, mainly used for SOA. | LGPL
SOAtest | Testing platform with R&P scripts run in real web browsers. | Proprietary
Sahi49 | Automation testing tool for web applications, with R&P scripts. | Apache 2
TestWise50 | Testing toolset that automates test and maintenance of web applications. | Freeshare

The following table shows a comparison of such tools with the requirements listed above:

Table 14. Functional Web Tools Comparison

Tool          | 1. API | 2. Open Source | 3. Cross-Platform | 4. Real Browsers | 5. R&P
HtmlUnit      | yes    | yes            | yes               | no               | no
HttpUnit      | yes    | yes            | yes               | no               | no
Selenium      | yes    | yes            | yes               | yes              | yes
Watir         | yes    | yes            | yes               | yes              | no
JWebUnit      | yes    | yes            | yes               | yes              | yes
Canoo WebTest | yes    | yes            | yes               | yes              | yes
IEUnit        | yes    | yes            | no                | yes              | yes
soapUI        | yes    | yes            | yes               | no               | yes
SOAtest       | yes    | no             | no                | yes              | yes
Sahi          | yes    | yes            | yes               | yes              | yes
TestWise      | yes    | no             | yes               | yes              | yes

41 http://htmlunit.sourceforge.net/ 42 http://httpunit.sourceforge.net/ 43 http://seleniumhq.org/ 44 http://watir.com/ 45 http://jwebunit.sourceforge.net/ 46 http://webtest.canoo.com/webtest/manual/WebTestHome.html 47 http://code.google.com/p/ieunit/ 48 http://www.soapui.org/ 49 http://sahi.co.in/w/ 50 http://itest2.com/en/testwise

In view of the results, there are several useful alternatives: Selenium, JWebUnit, Canoo WebTest, and Sahi. I have chosen Selenium, since it is a versatile tool with different components (see Table 5 in section 2.5.5.4). In addition, and as shown in the following table, it supports the automation of the navigation of web pages using the main current web browsers (as depicted in Figure 39 in section 6.3.1), namely Explorer, Firefox, Safari, Opera, and Chrome:

Table 15. Browser Compatibility of Selenium

OS       | Explorer | Firefox | Safari | Opera  | Chrome
Windows  | 6,7,8    | 2,3     |        | 8,9,10 | 6,7
Linux    |          | 2,3     |        | 8,9,10 |
Mac OS X |          | 2,3     | 2,3,4  | 8,9,10 |

In addition, Selenium IDE is the Selenium component that will support R&P. Selenium IDE is implemented as a Firefox plugin, and the recorded interaction is stored as HTML. The following pictures show an example of the UI of Selenium IDE (Figure 40) and an example of a recorded script in HTML (Figure 41):

Figure 40. Selenium IDE

Figure 41. Recorded Script in HTML


The following snippet corresponds to the HTML script recorded by Selenium IDE in Figure 40 and illustrated in Figure 41:

Snippet 6. Procedure to Translate Guards into HTML Elements

google_selenium
open          /
type          q      selenium
click         btnG
clickAndWait  link=Selenium web application testing system
Due to the fact that the number of Selenium commands is large, I have chosen a subset of such commands to be integrated in the approach. These commands are summarized in the following table:

Table 16. Selenium Commands Subset

Type             | Command              | Description
Initial opening  | open                 | This command establishes the SUT's URL.
Mouse buttons    | click                | Click on an HTML element.
                 | doubleClick          | Double click on an HTML element.
                 | clickAt              | Click at a position X,Y.
Change of state  | clickAndWait         | Click and wait for a web page to load.
                 | doubleClickAndWait   | Double click and wait for a page to load.
                 | clickAtAndWait       | Click at a position and wait for a page to load.
Mouse movements  | mouseDown            | The user depresses the mouse button.
                 | mouseMove            | Mouse pointer moves inside an element.
                 | mouseOut             | Mouse pointer leaves an element.
                 | mouseOver            | Mouse pointer enters an element.
                 | mouseUp              | The mouse button is released over an element.
Keyboard actions | type                 | Sets the value of an input field.
                 | select               | Select an option from a drop-down.
                 | keyDown              | Pressing a key (without releasing it yet).
                 | keyPress             | Pressing and releasing a key.
                 | keyUp                | Releasing a key.
Assertions       | assertText           | Assert that an element text is present.
                 | assertNotText        | Assert that an element text is not present.
                 | assertTextPresent    | Assert that a string is present.
                 | assertTextNotPresent | Assert that a string is not present.
                 | assertValue          | Assert that an element value is present.
                 | assertNotValue       | Assert that an element value is not present.
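To illustrate how this command subset can be replayed during the automated navigation, the following Java sketch maps a few of the recorded commands onto Selenium WebDriver calls; the locator handling is simplified and the class and method names are hypothetical:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;

    // Sketch of replaying a subset of the recorded Selenium IDE commands of Table 16
    // through the Selenium WebDriver Java API (only "name" and "link=" locators shown).
    public class CommandReplayer {

        private final WebDriver driver;
        private final String baseUrl;

        public CommandReplayer(WebDriver driver, String baseUrl) {
            this.driver = driver;
            this.baseUrl = baseUrl;
        }

        public void replay(String command, String target, String value) {
            switch (command) {
                case "open"                 -> driver.get(baseUrl + target);
                case "type"                 -> driver.findElement(locator(target)).sendKeys(value);
                case "click", "clickAndWait" -> driver.findElement(locator(target)).click();
                default -> throw new IllegalArgumentException("Unsupported command: " + command);
            }
        }

        private By locator(String target) {
            return target.startsWith("link=")
                    ? By.linkText(target.substring("link=".length()))
                    : By.name(target);
        }
    }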

7.2.2. Performance

As depicted in section 6.2.1, I need to incorporate a load injector that can be configured to measure and assert the response time and throughput perceived in the client-side of the web applications under test. Therefore, the additional requirement for choosing a performance tool is that response time and throughput must be measured and assessed. Hence, the performance testing tool candidates are summarized in the following table:

Table 17. Web Performance Tools

Tool | Description | License
curl-loader51 | Tool written in C, simulating the application load and behaviour of thousands and tens of thousands of HTTP/HTTPS and FTP/FTPS clients. | GPL
FunkLoad52 | Functional and load web tester, written in Python, for Linux systems. | GPL
FWPTT53 | Web application tester program for load testing web applications. | GPL
Grinder54 | Java load testing framework that makes it easy to run a distributed test using many load injector machines. | BSD
Hammerora55 | Load generation tool for the Oracle Database and web applications. | GPL
Httperf | It generates various HTTP workloads to measure server performance. | GPL
JMeter56 | Java application designed to load test functional behavior and measure performance. | Apache 2
http_load57 | It runs multiple HTTP fetches in parallel, to test the throughput of a web server. | Free
JCrawler58 | Stress tool for web applications. It comes with a crawling/exploratory feature. | CPL
JUnitPerf59 | It measures the performance and scalability of functionality contained within existing JUnit tests. | BSD
loadUI60 | Tool integrated with soapUI for load testing web services, REST, AMF, JMS, JDBC, as well as web sites. | EUPL
Multi-Mechanize61 | Framework for web performance and load testing. It generates load against a web site or web service. | LGPL
OpenSTA62 | Distributed software testing architecture designed around CORBA. It tests performance from Windows platforms. | GPL
Pylot63 | It runs HTTP load tests, which are useful for capacity planning, benchmarking, analysis, and system tuning. | GPL
TestMaker64 | Platform for functional, regression, load and performance testing, and business service monitoring. | GPL
WebLOAD65 | Load generation engine for stress testing and performance testing of web applications. | GPL/Proprietary

51 http://curl‐loader.sourceforge.net/ 52 http://funkload.nuxeo.org/ 53 http://fwptt.sourceforge.net/ 54 http://grinder.sourceforge.net/ 55 http://hammerora.sourceforge.net/

The next step is to compare these tools according to the stated requirements, as follows:

Table 18. Performance Web Tools Comparison Tool 1. API 2. Open Source 3. Cross-Platform 4. Response Time & Throughput

curl‐loader    FunkLoad   FWPTT  Grinder    

56 http://jakarta.apache.org/jmeter/ 57 http://www.acme.com/software/http_load/ 58 http://jcrawler.sourceforge.net/ 59 http://www.clarkware.com/software/JUnitPerf.html 60 http://www.loadui.org/ 61 http://code.google.com/p/multi‐mechanize/ 62 http://opensta.org/ 63 http://www.pylot.org/ 64 http://www.pushtotest.com/index.php 65 http://www.webload.org/


Hammerora   Httperf     JMeter     http_load  JCrawler   JUnitPerf   loadUI    Multi‐    Mechanize OpenSTA    Pylot    TestMaker     WebLOAD   

A priori there are several tools that meet the preset expectations: Grinder, Httperf, JMeter, or TestMaker. I have selected JMeter, since it is a mature open source framework based on Java and totally cross-platform.
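In ATP the load injection itself is delegated to JMeter (see section 7.3), but the essence of the check is simply to measure response times, derive the throughput, and compare both values against configured thresholds. The following plain-Java sketch only illustrates that idea and is not the JMeter-based implementation; the target URL is an assumption, and the thresholds mirror the ATP defaults shown later in Snippet 13.

import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseTimeAndThroughputCheck {

    public static void main(String[] args) throws Exception {
        final String target = "http://localhost:8080/WebAdmin/"; // assumed SUT URL
        final int samples = 10;              // sequential samples; a real load test uses threads
        final long maxResponseTime = 2000;   // ms, threshold
        final double minThroughput = 10;     // samples per second, threshold

        long totalMs = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
            conn.getResponseCode();          // forces the request and waits for the response
            conn.disconnect();
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            totalMs += elapsedMs;
            if (elapsedMs > maxResponseTime) {
                System.out.println("Response time exceeded: " + elapsedMs + " ms");
            }
        }
        double throughput = samples / (totalMs / 1000.0); // samples per second
        if (throughput < minThroughput) {
            System.out.println("Throughput below threshold: " + throughput + " samples/s");
        }
    }
}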

7.2.3. Security
As depicted in section 6.2.2, I need to incorporate an attack injector (web application scanner) that can be configured to assess the security along the paths of the web application under test.

Table 19. Web Application Scanners

HP WebInspect66 (Proprietary): It performs web application security testing and assessment for complex web applications.
Rational AppScan67 (Proprietary): Commercial vulnerability scanner which can detect many common server misconfigurations as well as vulnerabilities.
OWASP WebScarab68 (GPL): Framework for analysing applications that communicate using the HTTP and HTTPS protocols.
Nikto69 (GPL): Web scanner which performs comprehensive tests against web servers for multiple items.
Wikto70 (Free): Windows tool for analyzing vulnerabilities in web applications.
Wapiti71 (GPL): Web application scanner which performs different types of attack, submitting random inputs of various sizes to the application.

66 https://www.fortify.com/products/web_inspect.html 67 http://www‐01.ibm.com/software/awdtools/appscan/ 68 https://www.owasp.org/index.php/Category:OWASP_WebScarab_Project 69 http://cirt.net/nikto2 70 http://www.sensepost.com/labs/tools/pentest/wikto 71 http://wapiti.sourceforge.net/


NTOspider72 (Proprietary): It scans and analyses complex web sites and identifies application vulnerabilities as well as site exposure risk.

As in other sections, I am going to compare these web application scanners with the stated requirements, as follows:

Table 20. Web Application Scanners Comparison Tool 1. API 2. Open Source 3. Cross‐Platform HP WebInspect  Rational AppScan  OWASP WebScarab   Nikto   Wikto Wapiti    NTOspider 

All in all, I have selected Wapiti as the web application scanner to be integrated in the proposed approach.
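Unlike Selenium or JMeter, Wapiti is a command-line tool rather than a Java library, so it has to be launched as an external process and its output (or generated report) parsed afterwards. The following sketch only illustrates such an invocation from Java; the Wapiti options vary between versions and are indicative here, and this is not the literal ATP integration code.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class WapitiRunner {

    public static void main(String[] args) throws Exception {
        // Assumed command line; check "wapiti --help" for the options of the installed version
        ProcessBuilder pb = new ProcessBuilder(
                "wapiti", "http://localhost:8080/WebAdmin/", "-m", "exec,sql,xss");
        pb.redirectErrorStream(true);
        Process process = pb.start();

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // the scanner output would be parsed for verdicts
        }
        System.out.println("Wapiti finished with exit code " + process.waitFor());
    }
}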

7.2.4. Compatibility
The methods selected in section 6.3.1 to assess compatibility are the following: HTML validation, CSS validation, and making snapshots of the different web pages along the navigation paths. Therefore, I am going to survey these kinds of assessment by seeking different existing tools:

Table 21. Snapshots Compatibility Tools

Browser Photo73 (Proprietary): It shows actual photos (not emulations) of a web page taken on different browsers (Explorer, Opera, Firefox and Safari) and operating systems (Windows, Mac, and Linux).
IE NetRenderer74 (Free online service): It checks how a URL is rendered by Internet Explorer 9 and lower versions.
AnyBrowser.com75 (Free online service): It takes a snapshot using different HTML levels (3.2, 4.0, and so on).
Lynx Viewer76 (Free online service): It shows a web page as in the Lynx text-only browser.
Browsershots77 (Free online service): It makes screenshots of a web page in different browsers.

72 http://www.ntobjectives.com/ntospider 73 http://www.netmechanic.com/products/browser‐index.shtml 74 http://ipinfo.info/netrenderer/index.php?page=1 75 http://www.anybrowser.com/siteviewer.html 76 http://www.delorie.com/web/lynxview.html


BrowserCam78 (Proprietary online service): It tests a web site on different operating systems, creating screenshots.
url2png79 (Free online service): It takes a snapshot of a web page in Internet Explorer.
Browsrcamp.com80 (Free online service): It takes a snapshot of a web page in Safari using a Mac OS X system.
Adobe Browser Lab81 (Freeware): It shows screenshots of a website as seen in several different environments, such as Firefox on Windows and OS X, Explorer on Windows, and Safari on Mac OS X.
Microsoft SuperPreview82 (Freeware): It checks a web site in multiple versions of Explorer, Firefox, and Safari (Windows and Mac OS X platforms).
Cross Browser Testing83 (Freeware): It allows selecting an operating system (Windows, Mac, Ubuntu) and a browser (desktop and mobile versions of Explorer, Firefox, Chrome, Opera, and Safari) and then takes screenshots of a web site.
Spoon Browser Sandbox84 (Freeware): It is a web-based application virtualization service that allows users to run a variety of different Windows applications without installing them. The Spoon plugin supports multiple web browsers, such as Internet Explorer 6, 7, and 8, Firefox 3 and 3.5, Safari 2, 3, and 4, Opera 9 and 10, and Chrome.
Litmus85 (Freeware): It shows screenshots of a website across all major web browsers.
IE Tab86 (Free plugin): Firefox plugin that enables loading webpages in Explorer.
IETester87 (Free): It is a browser that allows previewing the rendering and JavaScript engines of Explorer (9, 8, 7, 6, and 5.5) on Windows 7, Vista and XP.
IECapt88 (GPL): Command-line utility to capture Internet Explorer's rendering of a web page into a BMP, JPEG or PNG image file.
Multi-Safari89 (Free): Special versions of the Safari browser for Mac OS X that use the original WebKit framework in order to run different versions of Safari on the same machine.

77 http://browsershots.org/ 78 http://www.browsercam.com/ 79 http://www.iecapture.com/ 80 http://browsrcamp.com/app/screenshots 81 https://browserlab.adobe.com/ 82 http://www.microsoft.com/expression/products/SuperPreview_Overview.aspx 83 http://crossbrowsertesting.com/ 84 http://spoon.net/browsers/ 85 http://litmusapp.com/ 86 https://addons.mozilla.org/en‐US/firefox/addon/ie‐tab/ 87 http://www.my‐debugbar.com/wiki/IETester/HomePage 88 http://iecapt.sourceforge.net/ 89 http://michelf.com/projects/multi‐safari/


Cloud Testing90 (Freeware): It uses R&P to verify both the look and feel and the functionality using IE 6, 7, 8, Firefox, Apple Safari, and Opera browsers.

The additional requirement for the tool to be selected is that the main browsers should be supported (see Figure 39 in section 6.3.1):

Table 22. Snapshot Compatibility Tools Comparison Tool 1. API 2. Open Source 3. Cross-Platform 4. Real Browsers

Browser Photo   IE NetRenderer  AnyBrowser.com   Lynx Viewer  Browsershots   BrowserCam   url2png  Browsrcamp.com  Adobe Browser Lab   Microsoft SuperPreview   Cross Browser Testing    Spoon Browser    Sandbox Litmus  IE Tab IETester IECapt   Multi‐Safari Cloud Testing 

Therefore, none of the studied tools addresses all the stated requirements. To solve this situation, I am going to use the same tool selected to perform functional testing: Selenium. This tool has the feature of taking snapshots of the analysed web page. In addition, it is open source, cross-platform, and it supports the main browsers (Explorer, Firefox, Chrome, Opera, and Safari). The second type of compatibility tool needed is an HTML checker. The candidates found are summarized in the following table:

90 http://www.cloudtesting.com/functional‐testing/


Table 23. HTML Checkers

W3C Markup Validator Service91 (Free online service): It checks web documents either by giving their URL or by uploading the document directly to the validator.
HTML Tidy92 (BSD-style license): HTML syntax checker (program and library).
Tidy online93 (Free online service): Online service version of HTML Tidy.
JTidy94 (Open95): Java port of HTML Tidy.
Web Page Purifier96 (Free online service): It checks HTML pages for compliance with various standards (4.0 Transitional, 4.0 Strict, and so on).
Web Page Backward Compatibility Viewer97 (Free online service): It checks web pages to see if they are compatible with browsers that lack certain features.
SortSite Professional98 (Shareware): It checks a URL with HTML 2.0, 3.0, 4.0, 4.01, XHTML 1.0, 1.1 and CSS validation.

Therefore, the comparison of these tools is the following:

Table 24. HTML Checkers Comparison Tool 1. API 2. Open Source 3. Cross‐Platform W3C Markup Validator Service  HTML Tidy   Tidy online  JTidy    Web Page Purifier  Web Page Backward Compatibility Viewer   SortSite Professional 

Therefore, the best candidate is JTidy, since it covers all the requirements. Regarding CSS checkers, I am going to use the same requirements (replacing HTML validation by CSS validation). The studied options are the following:

Table 25. CSS Checkers

91 http://validator.w3.org/ 92 http://tidy.sourceforge.net/ 93 http://services.w3.org/tidy/tidy 94 http://jtidy.sourceforge.net/ 95 http://jtidy.sourceforge.net/license.html 96 http://www.delorie.com/web/purify.html 97 http://www.delorie.com/web/wpbcv.html 98 http://www.fileheap.com/software‐sortsite‐professional‐download‐26787.html


W3C CSSValidator99 (Free / Free online service): W3 Consortium's CSS validator. It can be downloaded and used on a local machine, a CSS file can be uploaded to their online service, or a URL can be supplied for their spider to visit the web site.
CSSCheck100 (Free online service): Online CSS validator.
CSSTidy101 (GPL): Open-source CSS parser and optimiser.

And the comparison of these tools is summarized in the following table, which shows that CSSValidator is the most suitable tool:

Table 26. CSS Checkers Comparison Tool 1. API 2. Open Source 3. Cross-Platform W3C CSSValidator CSSCheck CSSTidy
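Since JTidy is a Java library, it can be embedded directly in the generated test cases. The following is a minimal sketch of counting the HTML errors and warnings of a page; the URL is an assumption, and the actual ATP wrapper may configure Tidy differently (for instance with the html401strict profile used later in the case study).

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;

import org.w3c.tidy.Tidy;

public class HtmlCompatibilityCheck {

    public static void main(String[] args) throws Exception {
        InputStream page = new URL("http://localhost:8080/WebAdmin/").openStream();

        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(true);
        tidy.parse(page, new ByteArrayOutputStream()); // the cleaned-up output is discarded here

        System.out.println("HTML errors:   " + tidy.getParseErrors());
        System.out.println("HTML warnings: " + tidy.getParseWarnings());
    }
}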

7.2.5. Usability
As depicted in section 6.3.2, I am going to employ broken link checkers and usability guideline inspectors to perform automated usability analysis. Therefore, in this section I describe the survey made to select a tool in each of these fields. Regarding broken link checkers, there are many tools performing such an assessment. Some of them are summarized in the following table.

Table 27. Broken Links Tools

W3C Link Checker102 (Free online service): Online validator from the W3 Consortium that recursively checks a website for dead links.
Xenu Link Sleuth103 (Freeware): Windows program that checks a web site for broken links.
Broken Link Checker104 (Free online service): Online checker for broken links on a web site.
LinkChecker105 (GPL): Recursive and multithreaded web link checker.

Nevertheless, I am not going to use any of these tools, since with Selenium (the selected tool to perform functional testing) I have the capability of performing broken link checking.
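In practice this means that, once Selenium has opened a page, the links it contains can be extracted from the page source and their HTTP status codes verified. The following sketch only illustrates that idea (the regular expression and error handling are deliberately naive, and this is not the exact ATP routine):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

public class BrokenLinkCheck {

    public static void main(String[] args) throws Exception {
        Selenium selenium = new DefaultSelenium("localhost", 4444, "*firefox",
                "http://localhost:8080/");
        selenium.start();
        selenium.open("/WebAdmin");

        // Extract absolute links from the page source (naive pattern, for illustration only)
        Matcher m = Pattern.compile("href=\"(http[^\"]+)\"").matcher(selenium.getHtmlSource());
        while (m.find()) {
            String link = m.group(1);
            HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setRequestMethod("HEAD");
            if (conn.getResponseCode() >= 400) {
                System.out.println("Broken link: " + link + " (" + conn.getResponseCode() + ")");
            }
            conn.disconnect();
        }
        selenium.stop();
    }
}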

99 http://jigsaw.w3.org/css‐validator/ 100 http://www.htmlhelp.com/tools/csscheck/ 101 http://csstidy.sourceforge.net/ 102 http://validator.w3.org/checklink 103 http://home.snafu.de/tilman/xenulink.html 104 http://www.iwebtool.com/broken_link_checker 105 http://linkchecker.sourceforge.net/


The usability guideline inspection tools are summarized in the following table. The only tool that is completely open source and cross-platform seems to be WebSAT, so this tool is the selected one.

Table 28. Usability Guideline Inspection Tools

Usability checklist106 (Free online service): Usability and accessibility guidelines about functionality, user control, language and content, consistency, and so on. It is designed to be printed and used for surveys.
SortSite Professional107 (Shareware): It checks a URL against the Usability.gov guidelines.
Website Grader108 (Free online service): Free SEO (Search Engine Optimization) tool that identifies key usability issues.
A-Prompt109 (Free): Windows program that checks a web page for accessibility issues.
WebSAT110 (Open): It checks web page HTML against typical usability guidelines.

7.2.6. Accessibility
A complete list of accessibility tools is maintained by WAI111. A summary of the tools that check for conformance to accessibility guidelines is shown below.

Table 29. Accessibility Guideline Tools

A-Checker112 (Open/Free online): Online accessibility checker that tests web pages for conformance to various accessibility guidelines.
A-Prompt113 (Free): Accessibility evaluation and repair tool.
Acc114 (Free): Firefox extension capable of evaluating and reporting some accessibility criteria.
Accessibility Check115 (Free online service): It evaluates a web page against a subset of the WAI guidelines.
AccessValet116 (Proprietary): Analyses HTML and XHTML pages. Reports deprecated and invalid markup, and violations of accessibility guidelines.
aDesigner117 (Proprietary): Disability simulator to help web designers ensure that their pages are accessible and usable by the visually impaired.

106 http://ist.mit.edu/services/consulting/usability/guidelines 107 http://www.fileheap.com/software‐sortsite‐professional‐download‐26787.html 108 http://websitegrader.com/ 109 http://aprompt.snow.utoronto.ca/ 110 http://zing.ncsl.nist.gov/WebTools/WebSAT/overview.html 111 http://www.w3.org/WAI/ER/tools/complete.html 112 http://achecker.ca/checker/index.php 113 http://www.aprompt.ca/ 114 http://appro.mit.jyu.fi/tools/acc/ 115 http://www.etre.com/tools/accessibilitycheck/ 116 http://valet.webthing.com/access/ 117 http://www.alphaworks.ibm.com/tech/adesigner


ART Guide (Free online service): It reviews sites for compliance with the international accessibility standards.
Bobby118 (Proprietary): Web accessibility desktop testing tool designed to help expose barriers to accessibility and encourage compliance with existing accessibility guidelines.
Cynthia Says Portal119 (Free online service): It checks a web page against the US Section 508 standards and the Web Content Accessibility Guidelines (WCAG).
EvalAccess120 (Free online service): On-line web accessibility evaluation tool which has been developed using Web Service technology.
Functional Accessibility Evaluator121 (Free online service): It analyses web resources for markup that is consistent with the use of DRES/CITES HTML best practices for accessibility.
Hera122 (Free online service): A web-based system that performs some automated WCAG 1.0 testing.
Hermish123 (Free online service): Online HTML accessibility checker.
IBM Rule-based Accessibility Validation Environment124 (Proprietary): The IBM Rule-Based Accessibility Validation Environment (RAVEN) is an innovative suite of tools for inspecting Java and web rich-client graphical user interfaces and validating them for accessibility.
Accessibility Tools Framework (ACTF)125 (EPL): Framework that serves as an extensible infrastructure upon which developers can build a variety of utilities that help to evaluate and enhance the accessibility of applications and content for people with disabilities.
imergo126 (Proprietary): Standards compliance and quality assurance tool targeted to industrial Internet portals.
Ocawa127 (Proprietary): It runs accessibility tests based on the W3C WCAG using a built-in expert system.
Readability index calculator (Free online service): It calculates a readability index score for a text.
SiteCheck128 (Freeware): It checks an entire site for errors in such areas as spelling, linking and accessibility.
SortSite Professional129 (Shareware): It checks a URL against the W3C WCAG 1.0 guidelines.

118 http://www.w3.org/WAI/ER/tools/complete.html 119 http://www.cynthiasays.com/ 120 http://sipt07.si.ehu.es/evalaccess2/index.html 121 http://fae.cita.uiuc.edu/ 122 http://www.sidar.org/hera/ 123 http://hermish.com/ 124 http://www.alphaworks.ibm.com/tech/raven 125 http://www.eclipse.org/actf/ 126 http://imergo.com/home 127 http://www.ocawa.com/ 128 http://siteimprove.com/


WAVE Accessibility Tool130 (Free online service): It checks a web page for compliance with various accessibility standards.

As this table shows, most of these tools are proprietary or are provided only as online services. This situation is a problem, since I am looking for a tool with an API and preferably open source and cross-platform. The only real choice seems to be A-Checker, since it provides a free Web Service API131 to assess the WCAG guidelines.
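The A-Checker web service is invoked over plain HTTP: the page URL, the web service ID of a registered user and the desired guideline are passed as request parameters, and an XML report with known, likely and potential problems is returned. A minimal sketch of such a call is shown below; the parameter names follow the AChecker web service documentation (footnote 131) but should be double-checked against the installed version, and the URL and ID reuse the ATP configuration values shown later in Snippet 13.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class AccessibilityCheck {

    public static void main(String[] args) throws Exception {
        String checkerUrl = "http://localhost/AChecker/checkacc.php";
        String webServiceId = "70f7795f8ca4d06a82fdcbe5537e8ae645cda9b0";
        String pageToCheck = "http://localhost:8080/WebAdmin/";

        String request = checkerUrl
                + "?uri=" + URLEncoder.encode(pageToCheck, "UTF-8")
                + "&id=" + webServiceId
                + "&guide=WCAG1-AA"
                + "&output=rest";

        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(request).openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // XML report with the accessibility verdicts
        }
        in.close();
    }
}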

7.3. Automatic Testing Platform

The framework which implements the AST method proposed in this dissertation has been named Automatic Testing Platform (ATP)132 and has been released as open source under the terms of the Apache 2.0 license. The first decision I have made to implement ATP is the programming language. In order to be cross-platform, I have chosen Java as the language in which ATP is developed. This decision has a direct impact on the unit test cases: JUnit, which is the de facto standard for unit testing in Java, will be used in ATP. As depicted in sections 5.4 and 7.2.1, ATP accepts three kinds of inputs: i) XML navigation files; ii) NDT files; iii) Selenium scripts in HTML format. Regarding the NDT approach, it uses Enterprise Architect (EA) to build its models [40]; ATP accepts these EA models in XMI (XML Metadata Interchange) format. From the navigation structure in one of these formats, ATP creates a Java Eclipse project from scratch as output. This project contains the following components:
- JUnit test cases, one per activity diagram/XML file/Selenium script. Inside these test cases there is one test method per found path. For each web page in each path the selected quality attributes are assessed: functionality, performance, security, compatibility, usability and accessibility. In order to create the JUnit test cases, I use templates.
- Test data collection. As depicted in section 5.4, the data-driven approach is implemented by separating the test case generation from the test data/expected outcome generation. The tabular file with test data (input) and expected outcome (output) is stored as an Excel spreadsheet per path.
- Script runner. The generated JUnit test cases are run by means of an Apache Ant script. This script starts the Selenium server before running the unit test cases.
All in all, the schematic way of working of ATP is represented in the following picture:

129 http://www.fileheap.com/software‐sortsite‐professional‐download‐26787.html 130 http://wave.webaim.org/ 131 http://achecker.ca/documentation/web_service_api.php 132 http://atestingp.sourceforge.net/


Figure 42. ATP Process

Therefore, ATP has been built using existing open-source components, summarized in the following table:

Table 30. ATP Components

Unit framework: JUnit (http://www.junit.org/)
Test logic generation: Freemarker (http://freemarker.sourceforge.net/)
Test case execution: Ant (http://ant.apache.org/)
Test case reporting: iText (http://itextpdf.com/)
Graph manipulation: JUNG (http://jung.sourceforge.net/)
Charts creation: JFreeChart (http://www.jfree.org/jfreechart/)
XML parsing: JDOM (http://www.jdom.org/)
Spread-sheet access: JExcelAPI (http://jexcelapi.sourceforge.net/)
Functionality: Selenium (http://seleniumhq.org/)
Performance: JMeter (http://jakarta.apache.org/jmeter/)
Security: Wapiti (http://wapiti.sourceforge.net/)
HTML Compatibility: JTidy (http://jtidy.sourceforge.net/)
CSS Compatibility: CSSValidator (http://jigsaw.w3.org/css-validator/)
Usability Guidelines: WebSAT (http://zing.ncsl.nist.gov/WebTools/WebSAT/overview.html)
Accessibility Guidelines: A-Checker (http://achecker.ca/checker/index.php)

The following picture shows this information in a graphical way:


Figure 43. ATP Architecture

ATP has been implemented as a command-line tool. Typing atp in the shell shows the following help:

Snippet 7. ATP in the Shell

> atp
[INFO] ATP (Automatic Testing Platform) v2.0
[INFO] [http://atestingp.sourceforge.net]
[INFO] Copyright (c) 2011 UPM. Apache 2.0 license.
[INFO]
[INFO] Use one of these options:
[INFO]   atp create
[INFO]   atp run
[INFO]   atp clean
[INFO]   atp list
[INFO]   atp set
[INFO]   atp report

Where:

- atp create: this command creates the test cases and data for each path. The Eclipse project which contains these artefacts is also created with this command.
- atp run: this command executes the previously created test cases, using the Ant script executor already created. Before the execution, a Selenium server is launched. As a result, test reports in different formats are created (XML, HTML, and PDF).
- atp clean: this command drops the Eclipse project previously created.
- atp list: this command shows the configuration parameters of ATP. The most important parameters are: root, the folder where the Eclipse project with all the output artefacts will be created; navigation_type, the type of input (xml, xmi or html); navigation_dir, the path to the input file(s).
- atp set: this command sets the value of a configuration parameter.
- atp report: this command opens the HTML reports previously generated.

7.3.1. Test Cases
The inner architecture of ATP is based on entities called "generators" for the test case generation. These entities are in charge of gathering information, transforming it, and generating each single type of test case. To achieve this goal, the following three-tier process is followed:


1. Collection of input data. This stage gathers the information needed for the test case generation.
2. Transformation. This stage reads the information from the sources and prepares it for writing the test cases.
3. Test case generation. This stage generates the output, i.e., writes the test cases.

Figure 44. 3-Tier ATP Methodology

The first stage is performed by entities named collectors. The aim of a collector is to compile the locations of the input sources for the test cases. It examines the code looking for the sources in which the information is stored. The second stage is implemented by transformers. A transformer reads the proper information for a test case from the sources found by the collectors. This information is passed to the third stage, which is implemented by writers. The aim of a writer is to generate the test case, i.e., to create the test case file. Each writer is linked to a testing tool. The aggregation of these three entities (collector, transformer and writer) is known as a generator. This cascade process results in the generation of the Java unit test cases. The way of working of a generator is implemented in the following code snippet:

Snippet 8. Generators in ATP

public void generate() throws Exception {
    // 1) collect the input sources, 2) transform them into key-value structures,
    // 3) write the resulting test cases
    Collection<?> in = (this.getCollector() != null)
            ? this.getCollector().collect()
            : null;
    Collection<Map<String, String>> tc = this.getTransformer().transform(in);
    if (!tc.isEmpty()) {
        this.getWriter().write(tc);
    }
}

7.3.2. ATP Extension
ATP provides a template-based platform for the automatic generation of unit tests for web applications. The tool is extensible and based on a plug-in design model, so the amount of generated test cases depends on the number of generators registered in ATP. A generator is linked to a specific unit testing technology because of its writer. This technology is JUnit by default, but it could be another one, such as TestNG133. The method for extending ATP by registering new unit generators in the platform is shown in the following picture:

133 http://testng.org/


Figure 45. Method for Adding New Generators in ATP

1. Select unit under test. This first stage looks for the component of the SUT to be tested.
2. Select tool. This step consists of the selection of a unit testing Java framework, i.e. JUnit, TestNG, and so on.
3. Write test case. The test case should be written once according to the selected tool. This part is performed by hand, and it will be the pattern for the test cases generated automatically by ATP.
4. Identify variable elements. The test case produced in the previous step should be generalised: its variable elements should be identified and replaced with FreeMarker tags, e.g. ${element}. This stage produces the FreeMarker template, i.e., an FTL (FreeMarker Template Language) file (see the sketch after this list).
5. Identify the source of information. According to the selected tags, the source of the information should be located. This is handled by the collectors.
6. Identify transformation. Transformers must be able to pick up the information in the sources found by the collector and transform it into a map composed of key-value pairs. The set of keys must match the template tags, and the values are found in the sources located by the collector. The writer then creates the test cases using this map as input to the FTL template.
7. Registry in tool. When a generator (collector, transformer, and writer) is created, it must be registered in ATP, which is done by adding its information to a CSV (Comma Separated Value) file which is processed by ATP to register all the generators.
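Since the variable parts of a test case are replaced with FreeMarker tags, writing a new generator essentially means preparing an FTL template and feeding it the key-value map produced by a transformer. The following sketch shows the mechanism with a tiny inline template; the template text and the keys are invented for illustration and are much simpler than the real ATP templates.

import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

import freemarker.template.Configuration;
import freemarker.template.Template;

public class TemplateWriterExample {

    public static void main(String[] args) throws Exception {
        // Tiny inline FTL template; real templates would live in .ftl files
        String ftl = "@Test\n"
                + "public void ${testMethod}() throws Exception {\n"
                + "    testContext.processState(\"${state}\");\n"
                + "    testContext.click(\"${locator}\");\n"
                + "}\n";

        Configuration cfg = new Configuration();
        Template template = new Template("testMethod", new StringReader(ftl), cfg);

        // Key-value map produced by a transformer; the keys must match the template tags
        Map<String, Object> model = new HashMap<String, Object>();
        model.put("testMethod", "testPath1");
        model.put("state", "login");
        model.put("locator", "frmDatos_0");

        StringWriter out = new StringWriter();
        template.process(model, out); // a writer would dump this into the generated .java file
        System.out.println(out);
    }
}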

7.3.3. Web Site Java Modelling
To work with a web site, ATP has its own set of classes which implement the meta-model presented in Figure 21 (section 5.1). The part of this metamodel corresponding to the web application has been implemented in Java as follows:


Figure 46. Web Application Model

7.4. ATP4Romulus

An extension of ATP has been created in the ICT-ROMULUS project. This tool has been called ATP4Romulus (Automatic Testing Platform for Romulus)134, and it performs testing for Roma Framework135 web applications. Roma Framework allows developing enterprise-level Java applications with a Domain Driven Design (DDD) approach. ATP4Romulus has specific generators (see 7.3.1) to generate test cases for Roma-based web applications. These generators have been added using the extension method depicted in section 7.3.2. ATP4Romulus has one big difference with respect to ATP: its generators have been designed to check server-side components. In Roma, these components are Plain Old Java Objects (POJOs). Following the ATP approach, ATP4Romulus is completed with the usage of different pluggable open source testing frameworks: JUnit v3, JUnit v4, TestNG, Selenium, JUnitPerf, JMeter, and DBUnit, as follows:

134 http://www.ict‐romulus.eu/web/atp4romulus 135 http://www.romaframework.org/


Figure 47. ATP4Romulus Architecture

ATP4Romulus has been released as a Roma wizard. The integration of ATP4Romulus with Roma is very simple: ATP4Romulus should be installed as a wizard inside $ROMA_HOME/modules. Once it is installed, the ATP options can be called by means of the Roma console, using the roma project test command:

Snippet 9. Roma Metaframework

$ROMA_HOME>roma
ROMA Framework CONSOLE v.2.1.0 [http://www.romaframework.org]
Copyrights (c) 2006-2009 Luca Garulli. Apache 2.0 license. Free to use!

Please specify the wizard to use between the following wizards discovered in the classpath:

- get []
- module add [-p]
- module check [new]
- module info
- module install []
- module uninstall
- module upgrade [ []]
- project create []
- project crud [] [-p]
- project info [-p]
- project list
- project remove [-p]
- project switch []
- project test

Example: roma project create webready blog org.test.blog C:/temp


7.5. Summary

This section closes the approach of this dissertation by selecting the tools which compose the proposed reference architecture. According to sections 5 and 6, the components I need to integrate are the following: i) a web engine to automate the navigation, also known as a headless browser; ii) a load injector, to carry out automated performance testing; iii) a web application scanner, i.e. an attack generator to perform black-box security testing; iv) some static checkers to assess compatibility (HTML and CSS compliance) and usability/accessibility guidelines. In order to select these tools, a complete survey has been performed. This study has been driven by the following conditions: a) the selected tool should have an API; b) the selected tool should preferably be open source; c) the selected tool should preferably be cross-platform. All in all, the selected tools have been the following:
- Selenium, as headless browser.
- JMeter, as load injector.
- Wapiti, as web application scanner.
- JTidy and CSSValidator, to carry out compatibility analysis (HTML and CSS respectively).
- WebSAT, to assess usability by means of static analysis of the HTML elements.
- A-Checker, to assess accessibility by means of static analysis of the HTML elements.
These elements have been integrated in a framework named Automatic Testing Platform (ATP), licensed as open source under the Apache 2 license. ATP performs automated quality control (testing and analysis) for web applications on the client side following the approaches presented in this dissertation. Moreover, an extension of this tool has been developed within the ICT-Romulus project. This new tool has been named ATP4Romulus (ATP for Romulus), and the target of this framework are web applications developed using the Roma Framework.


Chapter 8. Validation

Program testing can be used to show the presence of bugs, but never to show their absence!

‐ Edsger Dijkstra

The aim of this chapter is to present the experiments that have been carried out to validate the feasibility of the proposed process and methods to automate the quality control of web applications. First, section 8.1 presents the research questions which will drive these experiments. Second, section 8.2 describes an industrial case study conducted to validate the test and analysis automation by using the reference implementation of this dissertation, namely ATP. For this validation, the web application "Factur@" has been chosen as the system under test and analysis. This application was developed in the context of the IT project Factur@. Third, the extension of ATP implemented for Romulus applications described in section 7.4 (i.e. ATP4Romulus) has also been validated. This work was performed in the context of the ICT Romulus project by means of three different demonstrators: EU Project Managers, Cornelius and Scrooge. These demonstrators are web applications built using the Roma Framework combined with the different technologies created in the Romulus project. Therefore, section 8.3 provides information about the automated test and analysis of these demonstrators by using ATP4Romulus. Finally, section 8.4 draws some conclusions about the described experiments by answering the formulated research questions.

8.1. Research Questions

Research Questions (RQs) are formal statements of the aim of a study. RQs should state clearly what the study will investigate or attempt to prove [49]. In order to accomplish the validation in this dissertation, the reference implementations ATP (and its version for Romulus, i.e. ATP4Romulus) described in section 7 have been employed. Moreover, some web applications will be selected and used as SUT. All in all, the RQs driving this case study are the following:
- RQ1: Does the SUT accomplish its functional requirements?
- RQ2: Does the SUT have an acceptable behaviour in terms of non-functional requirements?


- RQ3: Is ATP able to reveal defects of different types in web applications?
- RQ4: What are the advantages and disadvantages of the different types of input (UML, XML, and R&P) to ATP?
- RQ5: Does ATP provide any advantages in testing and analysis (defects revealed, reduction of effort, and so on)?

8.2. Factur@

In order to carry out the first part of the validation of this dissertation, an industrial case study has been performed. Therefore, first I needed a target web application on which to carry out the automated testing and analysis processes. The selected SUT is named "Factur@". This system has been developed by UPM in the context of the IT Factur@ innovation project for the Spanish company Telvent136.

8.2.1. System Description
Factur@ is an electronic invoice web management system which has been developed using a Model-Driven Engineering (MDE) approach [48]. Factur@ consists of a series of independent but interconnected modules. These modules are illustrated in Figure 48 and described as follows:
- Administration: This module is in charge of the management of entities and users of the platform. It provides a web interface allowing authenticated users to configure the parameters of the platform.
- WS (Web Services): This module is the interface to the platform. It publishes a set of web services, allowing Enterprise Resource Planning (ERP) systems, Small and Medium Enterprises (SMEs) and Local Bodies (LLBB) to use electronic billing capabilities, such as interoperability between formats, generation, validation, signature and electronic invoice custody.
- Web: This module is intended to be a web front-end to meet the requirements of SMEs and LLBB that lack sophisticated ERPs and need an interoperability solution for the generation, validation, custody and signature of electronic invoices.
- Interoperability: This module allows performing translations between the electronic invoice formats supported by the signature platform, such as UBL (Universal Business Language) or EDI (Electronic Data Interchange).
- Generation and Validation: This module allows the creation of electronic invoices in various formats and their compliance validation against the relevant standard.
- Custody: This module encapsulates the database (DDBB) logic, as well as the persistence layer and the operations of collection and storage of electronic invoices from the DDBB.
- Audit: This module enables the other modules to store information on transactions among them. It provides transactional recovery and also generates reports, statistics and logs.
- Signature: It provides functionality for generation and validation of electronic signatures in various formats and modes.

136 http://www.telvent.com/


Figure 48. Factur@ Architecture

Factur@ has been developed using Java technology with the Spring Framework137. The presentation layer is based on Apache Struts138 and JSPs. Web services have been implemented using Apache CXF139. Security management is handled by Spring Security140. Database access is performed using Hibernate141 with C3P0142 (which is in charge of managing the connections to the database). The databases used are MySQL143 and Oracle144. The application server is JBoss145. These components are illustrated in the following diagram:

Figure 49. Factur@ Architecture

137 http://www.springsource.org/ 138 http://struts.apache.org/ 139 http://cxf.apache.org/ 140 http://static.springsource.org/spring‐security/site/ 141 http://www.hibernate.org/ 142 http://sourceforge.net/projects/c3p0/ 143 http://www.mysql.com/ 144 http://www.oracle.com/ 145 http://www.jboss.org/


A summary of the Factur@ metrics in terms of number of Java packages, Java classes, JSPs (Java Server Pages) and Lines of Code (LOC) is depicted in the table below:

Table 31. Factur@ Metrics Summary
Packages: 59
Classes: 395
JSPs: 150
LOC: 27227

All in all, Factur@ is a complex enterprise system composed of several modules and developed with Java technologies. Therefore, this system is suitable to become the SUT for a case study in the validation of this dissertation. In fact, the complete Factur@ system is too big, since it comprises many parts. For this reason, I am going to select one web-based module as SUT. Concretely, the Administration module will be the one tested and analysed with ATP v2.0. Figure 50 shows a screenshot of the Factur@ administration web application:

Figure 50. Factur@ Administration Module Screenshot

Thus, Factur@ was a finished web system when it was selected as SUT for this case study. Regarding the assessment of this application, only functional testing was carried out during its development phase. The following table summarizes the figures of effort and the number of test cases performed:

Table 32. Factur@ Administration Testing Figures
Number of Test Cases: 13
Passed: 3
Failed: 7
Inconclusive: 3
Effort: 1 PM (Person-Month)

8.2.2. Pre-Automation
In order to perform the case study, I first need some input models for ATP. The creation or reuse of these models corresponds to the phase called "pre-automation" in the functional


automation approach depicted in Figure 23 (section 5.4). These models can be of one of these three types:
1. UML models created using EA following the NDT approach. The format of these diagrams is XMI, and to build them the existing analysis model of Factur@ has been reused according to the method described in section 5.3.1.
2. XML files describing the navigation, data, and oracles. These files have been created using the XSD schema described in section 5.3.2.
3. R&P scripts in HTML format, recorded using Selenium IDE with the subset of commands described in section 7.2.1.
ATP must use one of the three types of input presented earlier, i.e. XML, XMI, or R&P (see Figure 23 in section 5.4). Nevertheless, in order to perform a complete case study, the three kinds of input will be used. These inputs are described in the following subsections.

8.2.2.1. UML Models
First I proceed to detail the NDT models. To carry out the automation described in chapter 5, three types of UML models should be created: use case, activity, and presentation diagrams. The use case diagram of Factur@ is shown in Figure 51. As can be seen in this picture, there are five use cases in the application: logging into the application, creating a new company, searching for a company, creating a new administrator, and searching for an administrator.

Figure 51. Factur@ Use Case Diagram Each use case is linked to an activity diagram. These activity diagrams show the navigation structure of the web application. These diagrams have been enhanced according to the notation described in section 5.3.1. The five activity diagrams for each use case are shown respectively in Figure 52, Figure 53, Figure 54, Figure 55, and Figure 56.


Figure 52. Login Figure 53. New Company Figure 54. New Administrator

Figure 55. Search Company Figure 56. Search Administrator

Finally, the presentation diagrams are created using the Visualization Prototype (PV) stereotype in NDT. These diagrams are shown in the figure below. Each of these PVs corresponds to a state in the above activity diagrams.


Figure 57. Presentation Diagrams

8.2.2.2. XML Models
The use cases described in the previous section have also been coded as XML using the XSD schema created in this dissertation. Because the verbatim transcript of each XML file would be too long, I have selected one of these XML files as an example, shown in the following snippet. It corresponds to the use case "Login":

Snippet 10. Login in XML Format

Administrador admin bad-login


bad-password

Nombre de usuario o contraseña inválidos

No hay elementos pendientes.

As this snippet shows, with XML it is possible to describe the web navigation (tags state and transition), test data (tag data) and oracles (tag assert).
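ATP parses these navigation files with JDOM (see Table 30 in section 7.3). As a hedged illustration of how such a file could be traversed, the following sketch uses the element names mentioned above (state, transition); the attribute names, the file name and the exact nesting are assumptions, since they depend on the XSD schema of section 5.3.2.

import java.io.File;
import java.util.List;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class NavigationXmlReader {

    public static void main(String[] args) throws Exception {
        Document doc = new SAXBuilder().build(new File("01login.xml")); // assumed file name
        Element root = doc.getRootElement();

        List states = root.getChildren("state");
        for (Object s : states) {
            Element state = (Element) s;
            System.out.println("State: " + state.getAttributeValue("name")); // assumed attribute
            for (Object t : state.getChildren("transition")) {
                Element transition = (Element) t;
                System.out.println("  -> " + transition.getAttributeValue("target")); // assumed
            }
        }
    }
}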

8.2.2.3. R&P Models
Due to the same reasons described in the previous sub-section, I only show one recorded script, created using Selenium IDE and coded in HTML. Specifically, this script corresponds to the use case "New Administrator". The recorded script is basically an HTML table with three values per row: command, target and value. The following table represents this script:

Table 33. New Administrator in HTML Format (test case: new_admin)

Command                Target                                     Value
open                   /WebAdmin
type                   username                                   Administrador
type                   password                                   admin
clickAndWait           frmDatos_0
assertText             texto-entrada                              No hay elementos pendientes.
assertTextNotPresent   Nombre de usuario o contraseña inválidos
click                  link=Gestión de Administradores
clickAndWait           link=Nuevo Administrador
type                   login                                      username01
type                   password                                   pass
type                   confPassword                               pass
type                   nombre                                     name
type                   dni                                        1111
type                   primerApellido                             surname
type                   segundoApellido                            surname 2
type                   fax                                        2222
type                   telefono                                   3333
type                   email                                      [email protected]
select                 idioma                                     label=es
clickAndWait           //input[@value='Aceptar']

The creation of the input files (XMI, XML, and HTML) finishes the pre-automation step. At this point, the generation of test cases can be carried out to conduct the quality control automatically. To this aim, I use the reference implementation ATP. Therefore, the first activity is setting up ATP properly. As described in Chapter 7, ATP has been implemented as a command-line tool, so the configuration phase is described below.

8.2.3. Configuration
Regarding configuration, there are three mandatory parameters, namely root, navigation_dir and navigation_type. The first one (root) determines the folder in which the Eclipse project containing the test cases, data, libraries, and the other artefacts necessary for the automation of testing and analysis will be generated. Therefore, to tell ATP that this folder will be named factura and located one level above where ATP is stored, the command is as follows:

Snippet 11. Setting root in ATP

>atp set root ../factura

The other two mandatory parameters (navigation_dir and navigation_type) correspond to the folder in which the input files are stored and their type (xml, xmi, or html). In this case study, a directory called casestudy has been created in the same folder as ATP, so the command is:

Snippet 12. Setting Navigation Folder in ATP

>atp set navigation_dir casestudy

The content of this folder is illustrated in the following picture. There is one XMI file containing the models described in section 8.2.2.1, five XML files (see an example in section 8.2.2.2) and six scripts in HTML recorded with Selenium IDE (see an example in section 8.2.2.3).


Figure 58. Input Folder

The rest of the settings keep their default values, that is, full automation of functionality, performance, security, compatibility, accessibility, and usability. To show these parameters, the list command of ATP should be used as follows:

Snippet 13. Configuration Parameters Listing in ATP

>atp list
[INFO] root = ../factura
[INFO] navigation_dir = casestudy
[INFO] navigation_type = xml
[INFO] browser = firefox
[INFO] tool = junit4
[INFO] timeout = 3000
[INFO] source = false
[INFO] seleniumPort = 5555
[INFO] functional = true

[INFO] performance = true
[INFO] jmeterPort = 4445
[INFO] minThroughput = 10
[INFO] concurrentUsers = 10
[INFO] minBitrate = 10
[INFO] maxTraffic = 5
[INFO] maxResponseTime = 2000

[INFO] security = true
[INFO] vulnerabilities = crlf,exec,sql,xss,backup,htaccess,blindsql
[INFO] method = get,post

[INFO] compatibility = true
[INFO] htmlProfile = html401strict
[INFO] cssProfile = css21

[INFO] accessibility = true
[INFO] guidelines = WCAG1-AA
[INFO] acheckerUrl = http://localhost/AChecker/checkacc.php
[INFO] acheckerId = 70f7795f8ca4d06a82fdcbe5537e8ae645cda9b0


[INFO] usability = true
[INFO] usabilitMethods = form,maintain,navigate,read,headinfo,bodyinfo

8.2.4. Generation
In order to perform the test case generation according to the configuration, the create command of ATP should be used.

Snippet 14. Test Case Generation in ATP

>atp create
[INFO] ATP (Automatic Testing Platform) v2.0
[INFO] [http://atestingp.sourceforge.net]
[INFO] Copyright (c) 2011 UPM. Apache 2.0 license.
[INFO]
[INFO] Initializing tool ... ok
[INFO] Creating Eclipse project (factura) ... ok
[INFO] Copying libraries ... ok
[INFO] Creating test case (AntTransformer/ant) ... ok
[INFO] Checking navigation input folder (casestudy) ... ok
[INFO] Creating test case (NavigationTransformer/junit4) ... ok

The operation should be repeated for XMI and HTML, since XML is the default format. The result of this command is the generation of an Eclipse project with test cases, data, and so on. It is not necessary to import the Eclipse project to run the test cases, since ATP has the option to do so automatically (run command). Nevertheless, this project can be imported and modified within the Eclipse IDE. The following image shows the contents of the project once imported into Eclipse:

Figure 59. Generated Eclipse Project

Below, the content of one of the generated test cases is shown. This test case corresponds to the login use case, whose XML file was shown in Snippet 10 and which is equivalent to the activity diagram shown in Figure 52. The preamble of the test case (setUp) basically sets up the browser with the configuration parameters of ATP (see Snippet 13). The test methods are marked with the @Test annotation of JUnit v4. These methods correspond to the paths found in the navigation using CPP. In this example there are two independent paths, called testPath1 and testPath2.

Snippet 15. JUnit Test Case Generated for Login (XML)

public class Test_01login {

    private Browser testContext;

    @Before
    public void setUp() throws Exception {
        testContext = new Browser();
        testContext.setTestCaseName("01login");
        testContext.setWebUnderTest("http://localhost:8080/WebAdmin/");
        testContext.setRoot("C:/Users/bgarcia/Thesis/dev/atp/factura");
        testContext.setSeleniumPort(5555);
        testContext.setBrowser("firefox");
        testContext.setTimeout(3000);

        // Functionality
        testContext.performFunctionality();

        // Performance
        final int jmeterPort = 4445;
        final int concurrentUsers = 10;
        final int maxResponseTime = 2000; // ms
        final int minThroughput = 10; // Samples/s
        final int minBitrate = 10; // KB/s
        final int maxTraffic = 5; // KB
        testContext.performPerformance(jmeterPort, concurrentUsers,
                maxResponseTime, minThroughput, minBitrate, maxTraffic);

        // Security
        final String vulnerabilities = "crlf,exec,sql,xss,backup,htaccess,blindsql";
        final String method = "get,post";
        testContext.performSecurity(vulnerabilities, method);

        // Compatibility
        final String htmlProfile = "html401strict";
        final String cssProfile = "css21";
        testContext.performCompatibility(htmlProfile, cssProfile);

        // Usability
        final String usabilitMethods = "form,maintain,navigate,read,headinfo,bodyinfo";
        testContext.performUsability(usabilitMethods);

        // Accessibility
        final String guidelines = "WCAG1-AA";
        final String acheckerUrl = "http://localhost/AChecker/checkacc.php";
        final String acheckerId = "70f7795f8ca4d06a82fdcbe5537e8ae645cda9b0";
        testContext.performAccessibility(guidelines, acheckerUrl, acheckerId);

        testContext.start();
    }

    @Test
    public void testPath1() throws Exception {
        final int pathNumber = 1;

        while (testContext.hasNextIteration(pathNumber)) {
            // State 1
            testContext.processState("login");
            testContext.click("frmDatos_0");

            // State 2
            testContext.processState("init");
        }
    }

    @Test
    public void testPath2() throws Exception {
        final int pathNumber = 2;

        while (testContext.hasNextIteration(pathNumber)) {
            // State 1
            testContext.processState("login");
            testContext.click("frmDatos_0");

            // State 2
            testContext.processState("login");
        }
    }

    @After
    public void tearDown() throws Exception {
        testContext.stop();
    }
}

The test data and oracles for each path are stored in a single Excel spreadsheet. In the login example there are two spreadsheets (see Figure 59): data_01login_path_1.xls and data_01login_path_2.xls, one per path. The first of these files has two sheets, one per state in the path, while the second one has only one (since this path has only one state).

The content of data_01login_path_1.xls is illustrated in Figure 60 and Figure 61. In the first picture, the data for the login state is shown. The first row contains the HTML locators for the data fields. To find these elements ATP follows the procedure depicted in Snippet 2.


One empty column is left between data and oracles. They can also be distinguished visually, since the data header has a yellow background while the oracle header background is green. Under these headers the actual values are located, both test data and oracles.
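ATP accesses these spreadsheets through JExcelAPI (see Table 30). A hedged sketch of how a data value might be read from the file above is shown below; the sheet name and cell positions are assumptions about the layout, not the exact ATP code.

import java.io.File;

import jxl.Sheet;
import jxl.Workbook;

public class TestDataReader {

    public static void main(String[] args) throws Exception {
        Workbook workbook = Workbook.getWorkbook(new File("data_01login_path_1.xls"));
        Sheet loginSheet = workbook.getSheet("login"); // one sheet per state in the path

        // Row 0 holds the HTML locators (header); row 1 holds the first data iteration
        String locator = loginSheet.getCell(0, 0).getContents();
        String value = loginSheet.getCell(0, 1).getContents();
        System.out.println(locator + " = " + value);

        workbook.close();
    }
}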

Figure 60. Test Data for Login state in Path 1 Figure 61. Test Oracles for Init state in Path 1

The content of data_01login_path_2.xls is illustrated in Figure 62. In this path there are both data and oracles in the same state:

Figure 62. Test Data and Oracles for Login state in Path 2

In order to show a more complex example regarding possible paths, I am going to illustrate the paths found for the "search company" use case, shown as an activity diagram in Figure 55. In this case, CPP finds two independent paths by browsing the transitions. These paths are illustrated as follows:

Figure 63. Paths Found by CPP for Search Company Activity Diagram


As this picture shows, CPP has found two paths (illustrated in blue and red in the diagram). Therefore, two test methods will be created in the JUnit test case following the same structure as Snippet 15. The sequence of states for each path is the following:
- Path 1: Login -> SearchCompany -> EditCompany -> EditManager -> EditPassword -> EditPassword2 -> EditManager -> EditPassword -> EditManager -> EditCompany -> EditManager -> EditCompany -> SearchCompany -> EditCompany -> NewManager -> EditCompany -> SearchCompany -> EditCompany -> NewManager -> EditCompany.
- Path 2: Login -> SearchCompany.
At this point, it should be remembered that CPP looks for the paths by collecting links in the resulting digraph. Each link in the graph has the same weight (except reset links, see Figure 31). Therefore, it is possible that the found paths could be optimized in terms of functionality. In the example above, and depending on how the application is implemented, it might be preferable for the "Buscar" transition (in the SearchCompany state) to be triggered before "Editar Empresa". In particular, this supposition does not apply to this SUT, but if it did, the solution would be to customize the generated JUnit test case (e.g. Snippet 15) according to the specific needs. This issue arises because the path resolution has been done from a UML model; it can be avoided using XML files, since the weight of each link can be customized there (see Figure 30).
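ATP relies on JUNG for this graph manipulation (see Table 30). The CPP computation itself is more involved, but the sketch below illustrates how a weighted navigation digraph can be represented and a path between two states extracted; the vertex and edge names reuse the login example, and this is not the actual ATP path-finding code.

import java.util.List;

import org.apache.commons.collections15.Transformer;

import edu.uci.ics.jung.algorithms.shortestpath.DijkstraShortestPath;
import edu.uci.ics.jung.graph.DirectedSparseGraph;

public class NavigationGraphExample {

    public static void main(String[] args) {
        // States as vertices, transitions as edges
        DirectedSparseGraph<String, String> graph = new DirectedSparseGraph<String, String>();
        graph.addVertex("login");
        graph.addVertex("init");
        graph.addEdge("login-ok", "login", "init");
        graph.addEdge("login-ko", "login", "login");

        // Uniform weights, as in the UML case; XML input would allow per-link weights
        Transformer<String, Double> weights = new Transformer<String, Double>() {
            public Double transform(String edge) {
                return 1.0;
            }
        };

        DijkstraShortestPath<String, String> dijkstra =
                new DijkstraShortestPath<String, String>(graph, weights);
        List<String> path = dijkstra.getPath("login", "init");
        System.out.println(path); // [login-ok]
    }
}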

8.2.5. Post-Automation
Once ATP has generated the test cases, data, and other artefacts within the Eclipse project, it is possible to begin the post-automation phase. This phase was described in Figure 23 (section 5.4), and aims to incorporate new test data and oracles by adding new information to the generated Excel sheets for the navigation paths, as new rows/columns of data and oracles. Thus, this phase can be considered the data-driven part of the approach. This phase is optional when using XML and HTML as input, because in these types of input the test data and oracles are incorporated in the navigation model. However, the post-automation phase is mandatory when the input is UML (NDT models in XMI), since those models include neither test data nor oracles.

8.2.6. Execution
Once the test cases are created and the test data are ready, the execution step can be performed. To achieve this, the ATP run command should be used, as follows:

Snippet 16. Execution of Test Cases in ATP

>atp run
[INFO] ATP (Automatic Testing Platform) v2.0
[INFO] [http://atestingp.sourceforge.net]
[INFO] Copyright (c) 2011 UPM. Apache 2.0 license.
[INFO]
[INFO] Initializing tool ... ok
[INFO] Running and reporting test cases ...
[INFO] This could take a few minutes, please wait ......
[INFO] Look for the test results in the 'reports' folder within your application or type 'atp report'


If the project has been imported to Eclipse (see Figure 59), it can also be executed using the Ant script runAllTests.xml:

Figure 64. Ant Script

The final step in the execution is the generation of the reports. These reports contain the different test/analysis verdicts and should be analysed by a human developer/tester in order to complete the assessment of the web under test.

8.2.7. Reports
The reports show a summary of the defects found in the web under test while ATP automates its navigation. To show this summary (in HTML), the ATP report command should be used:

Snippet 17. Visualization of Reports in ATP

>atp report

This command shows the HTML report using the default viewer for that kind of file on the tester's computer. The following screenshot shows an example of a report. A complete example of a report can be found in Annex II.

Figure 65. ATP Report


The following paragraphs detail a summary of the results in the reports generated when the R&P script files in HTML (see Figure 59) are used as input to ATP, i.e. 01-login-ok.html, 02-login-ko.html, 03-new_company.html, 04-edit_company.html, 05-new_admin.html and 06-edit_admin.html. Regarding the login-ok file, Table 34 details the summary about functionality, performance, security, compatibility, usability, and accessibility.

Table 34. Summary Report for 01-login-ok.html

Iteration #1:
  Functionality: 0 warnings, 1 error
  Performance: 1 warning, 1 error
  Security: 23 low / 0 medium / 0 high errors
  Compatibility: 39 warnings, 20 errors
  Usability: 88 warnings
  Accessibility: 27 likely errors, 54 potential errors, 13 errors

Regarding functionality, this table shows that 1 error has been found. This error is due to an HTTP 404 error in the main page (see Figure 65). To discover the HTML line causing this error, the tester should check the sources. ATP captures each HTML and CSS source and stores a local copy within the report. Therefore, the source of the component causing this error is the following:

Figure 66. ATP Report

Regarding the login-ko file, Table 35 details the summary of this report. Focusing on performance, I notice there are 2 warnings and 1 error in this test case. ATP captures the performance data and also creates some charts with that information.

Table 35. Summary Report for 02-login-ko.html

Iteration #1:
  Functionality: 0 warnings, 1 error
  Performance: 2 warnings, 1 error
  Security: 20 low / 0 medium / 0 high errors
  Compatibility: 26 warnings, 20 errors
  Usability: 62 warnings
  Accessibility: 15 likely errors, 33 potential errors, 7 errors

Examining the report, the cause of the warnings is the average traffic of both state1 and state2 (8.58 KB/sec and 8.77 KB/sec). According to the configuration (see Snippet 13 or Snippet 15), the maximum traffic should be 5 KB. The chart for this value (included in the report) is shown as follows:


Figure 67. Average Traffic in Iteration 1 of 02-login-ko.html

Regarding the error found, the web application throws an exception when the URL http://localhost:8080/WebAdmin/login/LoginAction.action is requested twice. This situation can be reproduced even using a web browser, as illustrated in the following screenshot:

Figure 68. Factur@ Error 500

Regarding the new_company file, the following table details the summary of this report. Focusing on security, ATP found 29 low errors in this path:

Table 36. Summary Report for 03-new_company.html

Iteration #1:
  Functionality: 2 warnings, 1 error
  Performance: 2 warnings, 1 error
  Security: 29 low / 0 medium / 0 high errors
  Compatibility: 74 warnings, 40 errors
  Usability: 130 warnings
  Accessibility: 29 likely errors, 60 potential errors, 14 errors

In this example the detected low errors can be grouped into different categories:
- Backup files. The analysis found backup copies of scripts on the web server that the web administrator left there to save a previous version. These copies may reveal information like source code or credentials. This situation happens, for example, in http://localhost:8080/WebAdmin/~


- Potentially dangerous files. Some scripts are known to be vulnerable or dangerous. Databases of such files exist, and attackers often scan websites to find and exploit such vulnerabilities. For example: http://localhost:8080/status?full=true
- SQL Injection. It is a technique that exploits a vulnerability occurring in the database, for example in http://localhost:8080/WebAdmin/login/LoginAction.action?%BF%27%22%28
- Blind SQL Injection. This kind of vulnerability is harder to detect than basic SQL injections because no error message is displayed on the web page. For example, the following URL: http://localhost:8080/WebAdmin/login/LoginAction.action?sleep%287%29%23
- Command execution. This attack consists in executing system commands on the server. The attacker tries to inject these commands in the request parameters. One example of this vulnerability is http://localhost:8080/WebAdmin/login/LoginAction.action?a%3Benv

Regarding the edit_company file, Table 37 shows the summary of the found defects. Focusing on compatibility this time, 112 warnings and 60 errors have been found. These defects are basically HTML and CSS problems. A few examples are shown below:

- Warning in http://localhost:8080/WebAdmin/.
- Cross Site Scripting (XSS) via &op=browse. References: CA-2000-02, http://www.cert.org/advisories/CA-2000-02.html
- Potentially dangerous file in http://localhost:8080/WebAdmin/: http://localhost:8080/JUNK(223)DEFACED