UNIVERSIDAD POLITÉCNICA DE MADRID
ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN
CONTRIBUTION TO THE AUTOMATION OF SOFTWARE QUALITY CONTROL OF WEB APPLICATIONS
DOCTORAL THESIS
BONIFACIO GARCÍA GUTIÉRREZ Ingeniero de Telecomunicación 2011
DEPARTAMENTO DE INGENIERÍA DE SISTEMAS TELEMÁTICOS
ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN
UNIVERSIDAD POLITÉCNICA DE MADRID
CONTRIBUTION TO THE AUTOMATION OF SOFTWARE QUALITY CONTROL OF WEB APPLICATIONS
Author: BONIFACIO GARCÍA GUTIÉRREZ, Ingeniero de Telecomunicación
Supervisor: JUAN CARLOS DUEÑAS LÓPEZ, Doctor Ingeniero de Telecomunicación
2011
Tribunal appointed by the Rector Magnificus of the Universidad Politécnica de Madrid on 26 July 2011.
Chair: ______
Member: ______
Member: ______
Member: ______
Secretary: ______
Substitute: ______
Substitute: ______
The defence and reading of the Thesis took place on 9 September 2011 at the E.T.S.I.T., with the grade of ______
THE CHAIR      THE MEMBERS
THE SECRETARY
To my parents
Acknowledgements
One of the key concepts this doctoral thesis deals with is the search for the different paths that can be followed to reach a given goal. This idea is not far removed from people's lives. In recent years I have had to travel many different paths: many good ones, others not so much, and even some terribly hard ones. That is why, now that this cycle comes to a close, it is time to look back and remember all the people who made this journey possible.

First of all, I want to express my most sincere gratitude to Juan Carlos Dueñas for making this thesis possible. Almost five years ago you gave me the opportunity to work at the university as a researcher. After all this time, I would like to thank you a thousand times over for all the support and trust you have placed in me. Beyond your brilliance in supervising this thesis, above all I want to highlight your human quality, your closeness, and your understanding, which have always helped me overcome the difficult moments. I can say without any doubt that the greatest success of my professional life has been working at your side.

Secondly, I want to express my gratitude to all the lab colleagues with whom I have shared so much time: Álvaro, Antonio, Bea, Chema, Marta, Félix, Freakant, Hugo, José Ignacio, José Luis, Laura, Lorena, Mar, Rodrigo, Rubén, Samuel and Sandra. Many thanks also to July, who is the real engine of the laboratory.

I would like to thank the European partners who made my research stay at VTT-Espoo (Finland) during the summer of 2010 possible. Thank you very much to Juha Pärssinen and Hannu Honka for making this journey possible. Special thanks to Arto Laikari and the rest of the group: Janne, Juha, Julia, Vesa, Jukla, Kari. Thank you to Anne Kontula for helping us during the stay.

In this round of thanks I cannot forget my lifelong friends, those you can always count on for a good laugh, without which the effort would often not be worth it: Álvaro, Amalia, Ana, Aurora, Barrix, Chechu, Fátima, Gari, Iván, Jesús, Kike, Laura, María, Marta, Miky, Riky, Santos, Tomate and Vanesa.

The most special thanks go to my girl, Vero. Thank you so much for the affection you show me day after day, for your unconditional support and help, and for sharing so many moments together.

I also want the most special thanks to go to my sisters, Yoly and Inma. Only you know everything we have been through. I just want to say how proud I am to be your brother, and I hope to always be there for you. Moreover, you have brought into the world (not without the invaluable help of Mario and Rubén, respectively) the most important little people there can be: Andrea, Silvia, and the newly arrived Laura. Their joy is the most powerful means I know to face the future with optimism. Of course I also want to remember the rest of my family: grandparents, uncles and aunts, and cousins.

Last, but first in my heart, I want to remember my parents. There is nothing in the world I would have liked more than for you to have seen me completing this stage of my life. I am sure you would feel very proud of me, as proud as I am of being your son. I want to thank you for everything you fought for in your lives for our sake. I always carry you with me; rest assured that one of the few things I can boast of is being the son of Pablo and Dolores.
PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
Abstract
The Web has become one of the most influential instruments in the history of mankind. Web application development is therefore a hot topic in the Software Engineering domain. In this context, software quality is a key concept, since it determines the degree to which a system meets its requirements and the expectations of its customers and/or users. Quality control (also known as verification and validation) is the set of activities designed to assess a software system in order to ensure its quality. The quality control process thus ensures that the requirements of an application are met while reducing the number of defects. The two core activities in quality control are testing and analysis. On the one hand, testing is a dynamic method, i.e., it assesses the responses of a running system. On the other hand, analysis is static, i.e., it assesses software artefacts (e.g., source code, models, and so on) without executing them.

The current web application market is defined by fierce global competition along three axes: quality, cost, and time to market. In order to minimize costs and time to market, reducing or eliminating quality control processes is a very common practice in web application development. This practice has a direct impact on the low quality of such applications. Automating quality control activities helps to improve the overall quality of the software developed while reducing development time and costs.

This PhD dissertation proposes a set of techniques to automate quality control (testing and analysis) for web applications. The heterogeneous nature of web applications makes quality control activities complex. Web applications are based on a client-server architecture. This dissertation focuses on the client side of web systems, since it is the differentiating factor of such applications.

According to the ISO/IEC-9126 standard, quality in use is the quality perceived by the users of an application during its operation and maintenance phases. This type of quality is determined by the external quality (properties of the system during its execution) and the internal quality (properties of the system assessed statically). Thus, the quality in use of web applications is always perceived from the client side.

The quality control process proposed in this dissertation is based on automating the navigation of web applications, guided by the functional and non-functional requirements of the system under test. Regarding non-functional requirements, testing and analysis address the quality attributes considered most important for web applications: performance, security, compatibility, usability, and accessibility. The first step in this automation is defining the structure of the navigation. To this end, software artefacts already produced in the analysis and design phases of the web application under test are reused as far as possible. Then, as the navigation is carried out automatically, different kinds of tests and analyses are performed in the various states of the navigation. The aggregated verdicts of this evaluation are stored in an automatically generated report containing the defects and potential issues found.

The processes and methods proposed in this dissertation have been implemented by means of a reference architecture. In addition, several experiments and case studies have been conducted in order to assess the proposal. This work has been carried out in different national and international research projects, mainly ICT-ROMULUS, ITEA-MOSIS, and Factur@.
Resumen
The Web has become one of the most influential instruments of mankind. Web application development is therefore a matter of capital importance in the world of Software Engineering. In this field, software quality is a key concept, since it determines the degree to which a system fulfils its requirements and satisfies the expectations of its clients and/or users. Quality control (also known as verification and validation) is the set of activities aimed at evaluating a software system in order to ensure its quality. Quality control is thus the process in charge of ensuring that the requirements of an application are fulfilled while the number of defects is eliminated or reduced as much as possible. The two basic activities of quality control are testing and analysis. Testing is dynamic in nature, that is, it evaluates the responses of a running system. In contrast, analysis is static in nature, that is, it evaluates the artefacts that make up the software (for example, its source code, models, etc.) without executing it.

The web application market is determined by global competition driven along three axes: quality, cost, and time to market. To minimize costs and time to market, reducing or eliminating quality control processes is a very common practice in web application development, thereby lowering the final quality of the resulting applications. Automating quality control activities helps to improve the overall quality of the software developed while reducing development times and costs.

This doctoral thesis proposes a set of techniques to automate quality control (testing and analysis) for web applications. The heterogeneous nature of web applications makes quality control activities complex. Web applications are based on a client-server architecture. This thesis focuses on the client side of web systems, since it is the differentiating factor of this kind of application.

According to the ISO/IEC-9126 standard, quality in use is the quality perceived by the users of an application during its operation and maintenance phases. This type of quality is determined by the external quality (properties of the system during its execution) and the internal quality (properties of the system assessed statically) of the system in question. Hence, the quality in use of web applications is always perceived from the client side.

The quality control process proposed in this thesis is based on automating the navigation of web applications. The functional and non-functional requirements of the system under test guide the process. Regarding non-functional requirements, tests and analyses are carried out for the quality attributes considered most important for web applications: performance, security, compatibility, usability, and accessibility. The first step in this automation consists of defining the navigation structure of the application. To this end, software artefacts already existing from the analysis and design phases of the web application under test are used, and reused as far as possible. Then, as the navigation is carried out automatically, different kinds of tests and analyses are performed in the different states the system passes through as the navigation progresses. The aggregated verdicts of this evaluation are stored in an automatically generated report containing the different kinds of defects found, as well as potential issues in the previously selected quality attributes.

The processes and methods proposed in this thesis have been put into practice by means of a reference architecture and implementation. In addition, different experiments and case studies have been carried out to evaluate the validity of the proposal. This work has been carried out within different national and international research projects, mainly ICT-ROMULUS, ITEA-MOSIS, and Factur@.
Table of Contents
Chapter 1. Motivation
    1.1. Research Methodology
    1.2. Structure of the document
Chapter 2. State of the Art
    2.1. Software Quality
        2.1.1. Quality Engineering
        2.1.2. Quality Assurance
        2.1.3. Verification and Validation
    2.2. Static Analysis
        2.2.1. Inspections
        2.2.2. Review
        2.2.3. Automated Software Analysis
        2.2.4. Formal Methods
    2.3. Software Testing
        2.3.1. Testing Levels
        2.3.2. Testing Methods
    2.4. Testing of Web Applications
        2.4.1. Web Testing Levels
        2.4.2. Web Testing Strategies
        2.4.3. Non-Functional Web Testing
        2.4.4. Web Testing Tools
    2.5. Automated Software Testing
        2.5.1. Test Case Generation
        2.5.2. Test Data Generation
        2.5.3. Automated Test Oracles
        2.5.4. AST Frameworks
        2.5.5. AST Frameworks for Web Applications
    2.6. Summary
Chapter 3. Objectives
Chapter 4. Methodology Foundations
    4.1. Web Applications
    4.2. Automated Quality Control Activities
        4.2.1. Automated Software Testing
        4.2.2. Automated Software Analysis
    4.3. Quality Views
        4.3.1. Functionality
        4.3.2. Performance
        4.3.3. Security
        4.3.4. Compatibility
        4.3.5. Usability
        4.3.6. Accessibility
    4.4. Test Process
    4.5. Summary
Chapter 5. Automated Functional Testing
    5.1. Scope of the Dissertation
    5.2. Approach
    5.3. Modelling Web Navigation
        5.3.1. UML Models
        5.3.2. XML Files
        5.3.3. R&P Approach
    5.4. Finding the Paths in a Multidigraph
    5.5. Summary
Chapter 6. Automated Non-Functional Assessment
    6.1. Approach
    6.2. Automated Non-Functional Testing
        6.2.1. Performance
        6.2.2. Security
    6.3. Automated Non-Functional Analysis
        6.3.1. Compatibility
        6.3.2. Usability
        6.3.3. Accessibility
    6.4. Summary
Chapter 7. Architecture
    7.1. Tools Integration
    7.2. Tool Survey
        7.2.1. Functionality
        7.2.2. Performance
        7.2.3. Security
        7.2.4. Compatibility
        7.2.5. Usability
        7.2.6. Accessibility
    7.3. Automatic Testing Platform
        7.3.1. Test Cases
        7.3.2. ATP Extension
        7.3.3. Web Site Java Modelling
    7.4. ATP4Romulus
    7.5. Summary
Chapter 8. Validation
    8.1. Research Questions
    8.2. Factur@
        8.2.1. System Description
        8.2.2. Pre-Automation
        8.2.3. Configuration
        8.2.4. Generation
        8.2.5. Post-Automation
        8.2.6. Execution
        8.2.7. Reports
    8.3. Romulus Demonstrators
    8.4. Conclusions
Chapter 9. Conclusions
    9.1. Main Contributions
    9.2. Future work
References
Annex I: Navigation XSD Schema
Annex II: Example of ATP Report
Annex III: Curriculum Vitae
List of Figures
Figure 1. Verification & Validation in Context
Figure 2. Web Evolution
Figure 3. Market Dimensions for Web Applications
Figure 4. Software Quality Engineering Process
Figure 5. ISO/IEC-9126 Quality Lifecycle
Figure 6. ISO/IEC-9126 Quality Model (External and Internal Quality)
Figure 7. ISO/IEC-9126 Quality Model (Quality in Use)
Figure 8. Verification & Validation Schema
Figure 9. Unit Testing
Figure 10. Test Data Generation Techniques
Figure 11. Generic Search-Based Test Input Generation Scheme
Figure 12. Software Engineering Layers
Figure 13. Typical Web Application Architecture
Figure 14. Software Defects in Context
Figure 15. Fault Origin/Detection Distribution and Cost
Figure 16. Generic Testing Activities
Figure 17. Generic Analysis Activities
Figure 18. Transition-Based Coverage Criteria
Figure 19. Methodology Levels
Figure 20. Methodology Quality Dimensions
Figure 21. Methodology Process
Figure 22. Web Site and Quality Control Metamodel
Figure 23. Automated Functional Testing Schematic Diagram
Figure 24. Use Case Diagram Example
Figure 25. Activity Diagram Example
Figure 26. Activity Diagram with Complex Transition
Figure 27. Presentation Diagram Example
Figure 28. XSD Graphic Representation for a Web Site
Figure 29. XSD Graphic Representation for a Web Page
Figure 30. XSD Graphic Representation for a Web Transition
Figure 31. Digraph Example
Figure 32. Node Reduction Example
Figure 33. Node Reduction vs. CPP Costs
Figure 34. Node Reduction vs. CPP Time
Figure 35. MBT Taxonomy
Figure 36. Automated Non-Functional Testing and Analysis Schematic Diagram
Figure 37. Response Time Latency
Figure 38. Browser Use Evolution since 2002
Figure 39. Browser Use in March 2011
Figure 40. Selenium IDE
Figure 41. Recorded Script in HTML
Figure 42. ATP Process
Figure 43. ATP Architecture
Figure 44. 3-Tier ATP Methodology
Figure 45. Method for Adding New Generators in ATP
Figure 46. Web Application Model
Figure 47. ATP4Romulus Architecture
Figure 48. Factur@ Architecture
Figure 49. Factur@ Architecture
Figure 50. Factur@ Administration Module Screenshot
Figure 51. Factur@ Use Case Diagram
Figure 52. Login
Figure 53. New Company
Figure 54. New Administrator
Figure 55. Search Company
Figure 56. Search Administrator
Figure 57. Presentation Diagrams
Figure 58. Input Folder
Figure 59. Generated Eclipse Project
Figure 60. Test Data for Login State in Path 1
Figure 61. Test Oracles for Init State in Path 1
Figure 62. Test Data and Oracles for Login State in Path 2
Figure 63. Paths Found by CPP for Search Company Activity Diagram
Figure 64. Ant Script
Figure 65. ATP Report
Figure 66. ATP Report
Figure 67. Average Traffic in Iteration 1 of 02-login-ko.html
Figure 68. Factur@ Error 500
Figure 69. EUProjectManager Screenshot
Figure 70. Cornelius Screenshot
Figure 71. Scrooge Screenshot
Figure 72. Scrooge Web Performance Charts
Figure 73. Cornelius Database Performance Charts
Figure 74. Dissertation Summary
List of Tables
Table 1. Black-Box vs. White-Box Testing
Table 2. Web Application Non-Functional Testing
Table 3. Decision Table Template
Table 4. Automated Software Testing Frameworks for Web Applications
Table 5. Selenium Projects
Table 6. Graph Types
Table 7. Literals for Actions in Web Transitions
Table 8. Test Data and Expected Outcome Template
Table 9. UML 2.0 Diagrams
Table 10. UML-Based Web Modelling Technologies Comparison
Table 11. Techniques and Algorithms for Decomposing a Graph into Paths
Table 12. Open-Source Java Graph Libraries
Table 13. Functional Web Tools
Table 14. Functional Web Tools Comparison
Table 15. Browser Compatibility of Selenium
Table 16. Selenium Commands Subset
Table 17. Web Performance Tools
Table 18. Performance Web Tools Comparison
Table 19. Web Application Scanners
Table 20. Web Application Scanners Comparison
Table 21. Snapshot Compatibility Tools
Table 22. Performance Web Tools Comparison
Table 23. HTML Checkers
Table 24. HTML Checkers Comparison
Table 25. CSS Checkers
Table 26. HTML Checkers Comparison
Table 27. Broken Links Tools
Table 28. Usability Guideline Inspection Tools
Table 29. Accessibility Guideline Tools
Table 30. ATP Components
Table 31. Factur@ Metrics Summary
Table 32. Factur@ Administration Testing Figures
Table 33. New Administrator in HTML Format
Table 34. Summary Report for 01-login-ok.html
Table 35. Summary Report for 02-login-ko.html
Table 36. Summary Report for 03-new_company.html
Table 37. Summary Report for 04-edit_company.html
Table 38. Summary Report for 05-new_admin.html
Table 39. Summary Report for 06-edit_admin.html
Table 40. Summary Report for Factur@
Table 41. ATP4Romulus Results
Table 42. EUProjectManager Test Results
Table 43. Pros and Cons of XMI, XML and HTML
List of Snippets
Snippet 1. Watir Script Example
Snippet 2. Procedure to Translate Guards into HTML Elements
Snippet 3. XML-based Navigation Example
Snippet 4. Path Expression using CPP
Snippet 5. Path Expression using Node Reduction
Snippet 6. Procedure to Translate Guards into HTML Elements
Snippet 7. ATP in the Shell
Snippet 8. Generators in ATP
Snippet 9. Roma Metaframework
Snippet 10. Login in XML Format
Snippet 11. Setting root in ATP
Snippet 12. Setting Navigation Folder in ATP
Snippet 13. Configuration Parameters Listing in ATP
Snippet 14. Test Case Generation in ATP
Snippet 15. JUnit Test Case Generated for Login (XML)
Snippet 16. Execution of Test Cases in ATP
Snippet 17. Visualization of Reports in ATP
Snippet 18. Installing ATP4Romulus in Roma Metaframework
Snippet 19. GraphML Input for ATP4Romulus (EUProjectManager)
Snippet 20. Test Case Generation in ATP4Romulus
Snippet 21. Test Case Execution in ATP4Romulus
Glossary
ACM: Association for Computing Machinery
AI: Artificial Intelligence
AJAX: Asynchronous JavaScript and XML
ANN: Artificial Neural Network
API: Application Programming Interface
ASA: Automated Software Analysis
ASP: Active Server Pages
AST: Automated Software Testing
ASVS: Application Security Verification Standard
ATDG: Automated Test Data Generation
ATI: Automated Testing Institute
ATP: Automatic Testing Platform
BCET: Best-Case Execution Time
BFS: Breadth-First Search
BNF: Backus-Naur Form
BSD: Berkeley Software Distribution
CASE: Computer-Aided Software Engineering
CERN: European Organization for Nuclear Research
CFG: Control Flow Graph
COTS: Commercial Off-The-Shelf
CPL: Common Public License
CPP: Chinese Postman Problem
CPT: Chinese Postman Tour
CPU: Central Processing Unit
CRS: Customer Requirements Specification
CS: Computer Science
CSS: Cascading Style Sheets
CSV: Comma Separated Value
DDBB: Database
DDD: Domain Driven Design
DDR: Dynamic Domain Reduction
DFS: Depth-First Search
DIT: Departamento de Ingeniería de Sistemas Telemáticos
DOC: Depended-On Component
DOM: Document Object Model
DSL: Domain-Specific Languages
EA: Enterprise Architect
EDI: Electronic Data Interchange
EFG: Event-Flow Graph
EIG: Event Interaction Graph
EPL: Eclipse Public License
ER: Entity-Relationship
ESIG: Event Semantic Interaction Graph
ETSIT: Escuela Técnica Superior de Ingenieros de Telecomunicación
EU: European Union
EUPL: European Union Public Licence
FIFO: First-In First-Out
FSM: Finite-State Machines
FTL: FreeMarker Template Language
FWPTT: Fast Web Performance Test Tool
GB: Gigabyte
GDL: Graph Description Language
GIF: Graphics Interchange Format
GML: Graph Modelling Language
GNU: GNU's Not Unix
GPL: GNU General Public License
GUI: Graphical User Interface
HDM: Hypermedia Design Model
HFPM: Hypermedia Flexible Process Modelling
HP: Hewlett-Packard
HTML: HyperText Markup Language
HTTP: Hypertext Transfer Protocol
HTTPS: Hypertext Transfer Protocol Secure
IANA: Internet Assigned Numbers Authority
I/O: Input/Output
IBM: International Business Machines
ICT: Information and Communication Technologies
IDE: Integrated Development Environment
IE: Internet Explorer
IEC: International Electrotechnical Commission
IEEE: Institute of Electrical and Electronics Engineers
IFN: Info Fuzzy Network
IP: Internet Protocol
IS: Information Systems
ISBN: International Standard Book Number
ISO: International Organization for Standardization
IT: Information Technology
JDBC: Java DataBase Connectivity
JML: Java Modelling Language
JPEG: Joint Photographic Experts Group
JSP: Java Server Pages
JTC: Joint Technical Committee
JUNG: Java Universal Network/Graph Framework
KB: Kilobytes
LGPL: Lesser General Public License
LLBB: Local Bodies
LOC: Lines Of Code
LTS: Labelled Transition Systems
MBT: Model-Based Testing
MIME: Multipurpose Internet Mail Extensions
MDD: Model-Driven Development
MDE: Model-Driven Engineering
MIT: Massachusetts Institute of Technology
MOF: Meta-Object Facility
NATO: North Atlantic Treaty Organization
NDT: Navigational Development Techniques
NIST: National Institute of Standards and Technology
NP: Nondeterministic Polynomial
OCL: Object Constraint Language
OOHDM: Object Oriented Hypermedia Design Model
ORM: Object Relational Mapping
OS: Operating System
OWASP: Open Web Application Security Project
PAS: Publicly Available Specifications
PC: Personal Computer
PHP: Hypertext Preprocessor
PKCS: Public-Key Cryptography Standards
PM: Person-Month
PNG: Portable Network Graphics
PV: Visualization Prototype
QA: Quality Assurance
QE: Quality Engineering
RAM: Random-Access Memory
RAVEN: Rule-Based Accessibility Validation Environment
RC: Remote Control
RE: Requirement Engineering
RM&E: Requirements Management and Engineering
RNA: Relationship-Navigational Analysis
RQ: Research Question
SBSE: Search-Based Software Engineering
SC: Subcommittee
SDL: Specification and Description Language
SE: Software Engineering
SEO: Search Engine Optimization
SHDM: Semantic Hypermedia Design Method
SME: Small and Medium Enterprises
SOA: Service-Oriented Architecture
SOHDM: Scenario-based Object-Oriented Hypermedia Design Methodology
SPP: Shortest Path Problem
SQL: Structured Query Language
SRS: Software Requirements Specification
STAF: Software Testing Automation Framework
STL: Software Testing Lifecycle
SUT: System Under Test
SWEBOK: Software Engineering Body of Knowledge
TCP: Transmission Control Protocol
TDG: Test Data Generation
TR: Technical Report
TSP: Traveling Salesman Problem
UBL: Universal Business Language
UI: User Interface
UML: Unified Modelling Language
UPM: Universidad Politécnica de Madrid
URL: Uniform Resource Locator
UWE: UML-based Web Engineering
VCG: Visualizing Compiler Graphs
VDM: Vienna Definition Method
W3: World Wide Web
W3C: World Wide Web Consortium
WAI: Web Accessibility Initiative
WATIR: Web Application Testing in Ruby
WCAG: Web Content Accessibility Guidelines
WCET: Worst-Case Execution Time
WE: Web Engineering
WG: Working Group
WP: Work Package
WS: Web Service
WSDL: Web Services Description Language
WSDM: Web Site Design Method
WWW: World Wide Web
XHTML: eXtensible HyperText Markup Language
XMI: XML Metadata Interchange
XML: Extensible Markup Language
XSD: XML Schema
XSS: Cross-site Scripting
Chapter 1. Motivation
Untested code is the dark matter of software.
‐ Robert C. Martin
Software is the collection of computer programs, related data, and associated documentation developed for a particular customer or for a general market. Software is an essential part of the modern world, and it has become pervasive in telecommunications, utilities, commerce, culture, entertainment, and so on. The term was coined in contrast to the term hardware, i.e. the physical devices of a computer. The activity of using software and hardware is known as computing. Glass et al. break the computing field down into three main subdivisions [54], namely Computer Science (CS), Information Systems (IS), and Software Engineering (SE). According to Sommerville [132], CS focuses on the theory and fundamentals of information and computation; IS (sometimes referred to as system engineering) is concerned with all aspects of computer-based systems development, including hardware, software, and processes; finally, SE is an engineering discipline concerned with all aspects of software production.

The notion of SE was first proposed in 1968 at a NATO conference held to discuss the "software crisis", i.e. reliability problems in software that arose because individual approaches did not scale up to large and complex software systems [118]. The Software Engineering Body of Knowledge (SWEBOK) [1] establishes a boundary for SE, dividing the discipline into different knowledge areas, namely: software requirements, software design, software construction, software testing, software maintenance, software configuration management, software engineering management, software engineering process, software engineering tools and methods, software quality, measurement, and security. According to the Standard Glossary of Software Engineering Terminology [70], software quality is "the degree to which a system, component, or process meets specified requirements, and customer or user needs or expectations".

Software Quality Engineering (QE), sometimes referred to as Quality Management, is a discipline concerned with improving software quality. The overall QE process includes three essential stages [138]: i) quality planning; ii) execution of selected Quality Assurance (QA) activities; and iii) measurement and analysis to provide convincing evidence of software quality (post-QA). Quality Assurance is the process of defining how software quality can be achieved and how the development organisation knows that the software has the required level of quality [47].
‐ 1 ‐ PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
The most common QA activities are Verification and Validation (V&V) ‐also known as quality control‐, which can be seen as a disciplined approach to assessing software products and services. As depicted in Figure 1, V&V can be divided into two big categories: testing (evaluating software by observing its execution, also known as dynamic analysis) [5] and static analysis (evaluating software without executing the code) [132]. Static analysis and testing are often confused, and both are mistakenly grouped under the term testing [1]. In this piece of research, static and dynamic techniques are treated separately but grouped into V&V (software quality control), which is the major topic of this dissertation.
(Figure: Verification & Validation ‐divided into Testing and Static Analysis‐ nested within Quality Assurance, which in turn lies within Quality Engineering.)
Figure 1. Verification & Validation in Context

On the one hand, V&V (both static and dynamic techniques) plays a crucial role in the software development process, since it is necessary to meet the quality requirements of every software project [68]. On the other hand, both testing and static analysis are usually hard and time‐consuming activities. Some studies have shown that testing is one of the most costly development processes, sometimes exceeding fifty per cent of the total development cost [13]. Therefore, V&V activities are often poorly performed or skipped by practitioners, creating an industry‐wide deficiency in software quality control.

The advent of the Internet has brought new opportunities and challenges for SE. The Internet can be defined as a global system of interconnected networks using the Internet protocol suite TCP/IP (Transmission Control Protocol/Internet Protocol). This suite represents a synthesis of several standards developed mainly in the 1960s and 1970s. The TCP and IP protocols were created in 1974 by Vint Cerf and Bob Kahn ‐the so‐called “Fathers of the Internet”‐ [112]. One of the most important services on the Internet is the World Wide Web (WWW, or simply the Web). The first web site was created in 1990 by Tim Berners‐Lee and Robert Cailliau at CERN (European Nuclear Research Centre) in Geneva (Switzerland). That first web site consisted of a collection of documents with static content, encoded in the HyperText Markup Language (HTML). The basic element on which the Web is founded is the Hypertext Transfer Protocol (HTTP). HTTP is a client‐server application protocol which defines a standard format to specify requests for resources on the Web [66]. Nowadays, the Web is not only an environment hosting simple and static documents, since several technologies have enhanced the web model, turning the Web into a multi‐domain infrastructure for the execution of web applications and services.
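The request/response exchange just described can be sketched by constructing and parsing the raw HTTP messages themselves. This is a minimal illustration of the message framing only; the host and path are placeholder values, not taken from any real deployment:

```python
def build_get_request(host, path):
    """Build a raw HTTP/1.1 GET request message for the given host and path."""
    return (
        "GET " + path + " HTTP/1.1\r\n"
        "Host: " + host + "\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

def parse_status_line(status_line):
    """Split a response status line into protocol version, code and reason."""
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

request = build_get_request("www.example.org", "/index.html")
version, code, reason = parse_status_line("HTTP/1.1 200 OK")
```

The request line, the headers and the blank line terminator are exactly the "standard format to specify requests" that the protocol defines; everything else in the Web stack builds on this simple textual exchange.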
The current web applications comprise large‐scale enterprise platforms, e‐commerce systems, collaborative distributed
environments, social networks, and so on. Web applications have continued to evolve as more and more technologies become available. As illustrated in Figure 2, we are now in the “web 2” era, with rich and dynamic web pages [12]. To address this growth of web systems and to ensure their quality, the discipline of Web Engineering (WE) has been defined [127]. All in all, the Web has become one of the most influential instruments not only in computing but in the history of mankind. Hence the target of this dissertation is web applications.
(Figure: 199x‐2003, WWW: HTTP, HTML, JavaScript, DHTML, CSS · 2004‐today, Web 2: XML, AJAX, SOAP, SOA · future, Web 3: Semantic web.)
Figure 2. Web Evolution

The lack of V&V described before is especially significant for web applications. Large and complex web applications with a growing number of potential users are more and more required nowadays. Hence, the current web application market is defined by fierce global competition. This market can be divided along three dimensions: quality, cost, and time to market. As illustrated in Figure 3, producing quality web applications requires better, cheaper, and faster development processes [12].
(Figure: a web application positioned along three market dimensions ‐quality (better), cost (cheaper), and time to market (faster).)
Figure 3. Market Dimensions for Web Applications

Nevertheless, the development of web applications has in general been ad hoc, and V&V is often neglected in web development, resulting in poor‐quality web applications. Quality control of a web application may be expensive, but the impact of defects resulting from lack of testing and static analysis could be more costly. Therefore, in order to produce better web applications, there is an increasing need for methodologies and guidelines to develop such applications delivered on time (faster), within budget (cheaper), and with a high level of quality (better).
In the 1980s software quality was the key SE problem. In this first part of the 21st century, time to market is usually more critical (especially for web applications). Quality is still an important factor, but it must be achieved in the context of rapid delivery. In order to reconcile the need to develop quality web applications with the urgency of the market while saving costs, the automation of software quality control activities might be the key [38][7][83][63]. The automation of software development processes (such as V&V) has a profound impact on the speed, quality, and cost of releasing software. Some actions are impossible to automate, for example certain testing and analysis techniques that rely on experienced testers. However, the list of issues that can be automated is long [23]. All in all, the main problem I face in this dissertation is to find effective ways to improve the quality of web applications by means of the automation of their quality control (V&V) activities, i.e. software testing and analysis. This way, the problem of costly and time‐consuming V&V activities could be alleviated, and web applications would therefore reach better quality levels.
1.1. Research Methodology
The automation of software testing and analysis is becoming a hot research topic in SE. Web applications are more and more difficult to assess, due to their peculiarities. Therefore, the automation of quality control activities presents important research challenges. The work carried out in the ICT‐ROMULUS project, a research project within the European Union Seventh Framework Programme for Research and Technology Development, was crucial to identify these challenges. In particular, the participation in Work Package (WP) 5 of this project, which studied the enhancement of the software quality of web applications from the conception of the software by means of automatic code and test generation techniques, was very useful to detect major drawbacks of current solutions and to point towards areas for improvement. On top of that, methods for automated functional and non‐functional testing and static analysis were proposed. As a result of this work, the Automatic Testing Platform (ATP), a proof‐of‐concept deployment system, was created and integrated in the open‐source Romulus Framework. This initial work was continued in the ITEA‐MOSIS project, a research project focused on managing variability and Domain‐Specific Languages (DSL) for Model‐Driven Development (MDD) of software‐intensive systems. In this context, state‐of‐the‐art modelling methods and technologies for web applications were studied and included in the automated quality control approach previously developed. As a result, Model‐Based Testing (MBT) techniques were proposed in order to enhance the automation of the quality control previously proposed. Finally, the electronic invoice application developed using MDD in the context of the project Factur@ (an innovation project funded by Telvent and Comunidad de Madrid) was employed in a case study that performed the validation of the final work.
All in all, the process followed in this work has the following steps: i) identification of the problem at hand; ii) literature review (state‐of‐the‐art); iii) proposal of a solution; iv) validation of the proposed approach by means of experiments and case studies, and also by dissemination of the results in journals, conferences, project deliverables, and so on; v) synthesis of findings (conclusions) and definition of possible future work.
1.2. Structure of the document
After introducing the motivation and context of the research work, the document follows with a detailed chapter on the state of the art of software quality control. That chapter first introduces V&V in the context of software quality. After that, static analysis and traditional software testing are depicted. Finally, it describes in detail testing for web applications and automated software testing. Then, chapter 3 establishes a more detailed description of the objectives of this research work. Chapter 4 describes the high‐level decisions taken in this piece of research in order to achieve the stated goals. The next two chapters give a fine‐grained description of the main contributions of this dissertation: automated functional testing (chapter 5) and automated non‐functional assessment (chapter 6) for web applications. Chapter 7 finalizes the description of the original contributions by thoroughly explaining the reference architecture proposed to perform the automation of quality control activities. Chapter 8 provides an extended summary of all the validation activities carried out in order to ensure the correctness of the results presented in this dissertation. Finally, chapter 9 establishes the main conclusions of this work, as well as a description of possible future research activities.
Chapter 2. State of the Art
In order to make an apple pie from scratch, you must first invent the universe.
‐ Carl Sagan
Software engineering is concerned with the practicalities of developing and delivering useful and quality software. This chapter presents the state of the art and practice of software quality control, i.e. V&V. First, I introduce the key concepts of software quality, quality assurance, and V&V in section 2.1. Then the main techniques and methods for static analysis are presented in section 2.2. Software testing is described in section 2.3. Testing of web applications and automated testing are depicted in sections 2.4 and 2.5 respectively.
2.1. Software Quality
The question “What is software quality?” can generate different answers, depending on the roles of the practitioners involved or the kind of software system [138]. Regarding people, there are different views and expectations based on their roles and responsibilities. There are two main groups of people involved in a software product or service. On one hand, there are consumers, i.e. customers (responsible for the acquisition of software products or services) and users (people who use the software products or services for various purposes); the dual role of customer and user is quite common. On the other hand, producers are the people involved with the development, management, maintenance, marketing, and service of software products. The quality expectation of consumers is that a software system performs useful functions as specified. For software producers, the fundamental quality question is fulfilling their contractual obligations by producing software products that conform to the Service Level Agreement (SLA). Pressman’s definition of software quality comprises both points of view [122]: “An effective software process applied in a manner that creates a useful product that provides measurable value for those who produce it and those who use it”.
2.1.1. Quality Engineering

Quality Engineering (QE) ‐also known as Quality Management‐ is a process that evaluates, assesses, and improves the quality of software. There are three major groups of activities in the QE process, as depicted in Figure 4:
1. Quality planning (pre‐QA activities). This stage establishes the overall quality goal by managing customer expectations under the project cost and budgetary constraints. The quality plan also includes the QA strategy, i.e. the selection of QA activities to perform and the appropriate quality measurements to provide feedback and assessment.
2. Quality Assurance (in‐QA activities). This stage guarantees that software products and processes in the project life cycle meet their specified requirements by planning and performing a set of activities that provide adequate confidence that quality is being built into the software. The main QA activity is V&V, but there are others such as software quality metrics, quality standards, configuration management, documentation management, or experts’ opinion.
3. Quality quantification and improvement (post‐QA activities): measurement, analysis, feedback, and follow‐up activities. These analyses provide a quantitative assessment of product quality and the identification of improvement opportunities.
(Figure: the three stages ‐Quality Plan (quality goal, quality strategy), Quality Assurance (V&V, quality metrics, quality standards, ...), and post‐QA (measurement, analysis, feedback, follow‐up).)
Figure 4. Software Quality Engineering Process
2.1.1.1. Requirements and Specification

Requirements are a key topic in the QE domain. A requirement is a statement identifying a capability, physical characteristic, or quality factor that bounds a product or process need for which a solution will be pursued. Requirements development (also known as requirements engineering) is the process of producing and analysing customer, product, and product‐component requirements. The set of procedures that support the development of requirements, including planning, traceability, impact analysis, change management and so on, is known as requirements management. Requirements Management and Engineering (RM&E) is the overall term used to include all requirements‐related processes [67]. There are two kinds of software requirements [126]:
‐ Functional requirements are actions that the product must do to be useful to its users. They arise from the work that stakeholders need to do. Almost any active verb ‐inspect, publish, and so on‐ can introduce a functional requirement.
‐ Non‐functional requirements are properties, or qualities, that the product must have. For example, they can describe such properties as look and feel, usability, or security. They are often called quality attributes.
Another important topic strongly linked with the requirements is the specification, which is a document that specifies in a complete, precise, verifiable manner the requirements, design, behaviour, or other characteristics of a system, and often, the procedures for determining whether these provisions have been satisfied [140]. For example, a (non‐functional) requirement could be “the product response time shall be less than 0.25 second”. The specification for this requirement would include technical information about specific design
aspects. It is important to distinguish between the specification supplied by a customer, known as a Customer Requirements Specification (CRS), and the specification created by the developers, known as a Software Requirements Specification (SRS). This second kind of specification is a complete description of the behaviour of the system to be developed, and includes the definition of the system use cases. A use case can be seen as a way of documenting functional requirements by describing the interactions between the users and the system.
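The example requirement above is verifiable precisely because it can be turned into an executable check. A minimal sketch follows; the measured operation is a hypothetical stand‐in for a real system function, not part of any specification discussed here:

```python
import time

RESPONSE_TIME_LIMIT = 0.25  # seconds, taken from the example requirement above

def measure_response_time(operation):
    """Time a single invocation of the operation under test."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start

def meets_requirement(operation):
    """Verdict for the non-functional requirement 'response time < 0.25 s'."""
    return measure_response_time(operation) < RESPONSE_TIME_LIMIT

def sample_operation():
    # Hypothetical placeholder for the real system operation.
    return sum(range(1000))

verdict = meets_requirement(sample_operation)
```

A specification would additionally pin down the measurement procedure (load conditions, number of repetitions, percentiles), which is exactly the "procedures for determining whether these provisions have been satisfied" mentioned above.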
2.1.2. Quality Assurance

Quality Assurance (QA) is a “systematic, planned set of actions necessary to provide adequate confidence that the software development and maintenance process of a software system product conforms to established specification as well as with the managerial requirements of keeping the schedule and operating within the budgetary confines” [47]. QA is primarily concerned with defining or selecting standards that should be applied to the software development process or software product. Moreover, the QA process selects the V&V activities, tools and methods to support these standards [132]. V&V is a set of activities carried out with the main objective of withholding products from shipment if they do not qualify. In contrast, QA is meant to minimize the costs of quality by introducing a variety of activities throughout the development and maintenance process in order to prevent the causes of errors, detect them, and correct them in the early stages of development. As a result, QA substantially reduces the rates of non‐qualifying products. All in all, V&V activities are only a part of the total range of QA activities [47].
2.1.2.1. Quality Standards

Various quality standards have been proposed to accommodate these different quality views and expectations. This section describes ISO/IEC‐9126 (perhaps the most influential in the SE community to date) and its successor, ISO/IEC‐25000.
2.1.2.1.1. ISO/IEC‐9126

ISO/IEC‐9000 is a family of standards for quality management systems. In 1991, ISO published its first international consensus on the terminology for the quality characteristics for software product evaluation (ISO 9126 on Software Product Quality Characteristics and Guidelines for their Use) [77]. Afterwards, from 2001 to 2004, ISO published an expanded four‐part version, containing both a hierarchical framework for quality models and metrics for these models. The current version of the ISO/IEC‐9126 series consists of one International Standard (IS) [73] and three Technical Reports (TR) [74][75][76]. The ISO/IEC‐9126 quality model distinguishes three different views on software product quality:
‐ Internal quality: concerns the properties of the system that can be measured without executing it.
‐ External quality: concerns the properties of the system that can be observed during its execution.
‐ Quality in use: concerns the properties experienced by its users/customers during operation and maintenance of the system.
Ideally, the internal quality determines the external quality, and the external quality determines the quality in use, as depicted in the following picture:
(Figure: internal quality influences external quality, which in turn influences quality in use; each level depends on the previous one.)
Figure 5. ISO/IEC‐9126 Quality Lifecycle

The first document of the ISO/IEC 9126 series (quality model) contains a two‐part quality model for software product quality [5]: i) internal and external quality model; ii) quality in use model. The first part determines six characteristics, which are subdivided into twenty‐seven sub‐characteristics for internal and external quality [73]. Measures for estimating external, internal, and quality‐in‐use characteristics are listed in three technical reports accompanying the standard quality model: ISO/IEC 9126‐2 [74], ISO/IEC 9126‐3 [75], and ISO/IEC 9126‐4 [76] define external, internal, and quality in use metrics respectively. The quality model of ISO/IEC‐9126 divides the internal and external software product quality into six top‐level quality characteristics:
External and Internal Quality:
‐ Functionality: Suitability, Accuracy, Interoperability, Security, Functionality Compliance.
‐ Reliability: Maturity, Fault Tolerance, Recoverability, Reliability Compliance.
‐ Usability: Understandability, Learnability, Operability, Attractiveness, Usability Compliance.
‐ Efficiency: Time Behaviour, Resource Utilisation, Efficiency Compliance.
‐ Maintainability: Analysability, Changeability, Stability, Testability, Maintainability Compliance.
‐ Portability: Adaptability, Installability, Co‐Existence, Replaceability, Portability Compliance.
Figure 6. ISO/IEC‐9126 Quality Model (External and Internal Quality)

The following definitions have been extracted directly from the norm ISO/IEC‐9126‐1 [73]:
‐ Functionality: “The capability of the software product to provide functions which meet stated and implied needs when the software is used under specified conditions”. The sub‐characteristics include suitability, accuracy, interoperability, and security.
‐ Reliability: “The capability of the software product to maintain a specified level of performance when used under specified conditions”. The sub‐characteristics include maturity, fault tolerance, and recoverability.
‐ Usability: “The capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions”. The sub‐characteristics include understandability, learnability, operability, and attractiveness.
‐ Efficiency: “The capability of the software product to provide appropriate performance, relative to the amount of resources used, under stated conditions”. The sub‐characteristics include time behaviour and resource utilisation.
‐ Maintainability: “The capability of the software product to be modified. Modifications may include corrections, improvements or adaptation of the software to changes in
environment, and in requirements and functional specifications”. The sub‐characteristics include analysability, changeability, stability, and testability.
‐ Portability: “The capability of the software product to be transferred from one environment to another”. The sub‐characteristics include adaptability, installability, co‐existence, and replaceability.
The attributes of quality in use are categorised into the following four characteristics:
(Figure: quality in use comprises Effectiveness, Productivity, Safety, and Satisfaction.)
Figure 7. ISO/IEC‐9126 Quality Model (Quality in Use)

‐ Effectiveness: “The capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use”.
‐ Productivity: “The capability of the software product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use”.
‐ Safety: “The capability of the software product to achieve acceptable levels of risk of harm to people, business, software, property or the environment in a specified context of use”.
‐ Satisfaction: “The capability of the software product to satisfy users in a specified context of use”.
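The two‐part quality model above can be captured as a simple data structure, which is handy when tooling needs to map measured attributes back to their top‐level characteristics. The sketch below encodes the characteristics and sub‐characteristics listed in this section; the “compliance” sub‐characteristics are omitted for brevity:

```python
# ISO/IEC-9126 external/internal characteristics and their sub-characteristics
# (compliance sub-characteristics omitted for brevity).
ISO_9126_CHARACTERISTICS = {
    "Functionality": ["Suitability", "Accuracy", "Interoperability", "Security"],
    "Reliability": ["Maturity", "Fault Tolerance", "Recoverability"],
    "Usability": ["Understandability", "Learnability", "Operability",
                  "Attractiveness"],
    "Efficiency": ["Time Behaviour", "Resource Utilisation"],
    "Maintainability": ["Analysability", "Changeability", "Stability",
                        "Testability"],
    "Portability": ["Adaptability", "Installability", "Co-Existence",
                    "Replaceability"],
}

QUALITY_IN_USE = ["Effectiveness", "Productivity", "Safety", "Satisfaction"]

def characteristic_of(sub_characteristic):
    """Return the top-level characteristic a sub-characteristic belongs to."""
    for characteristic, subs in ISO_9126_CHARACTERISTICS.items():
        if sub_characteristic in subs:
            return characteristic
    return None
```

For instance, `characteristic_of("Testability")` resolves to "Maintainability", which is the kind of lookup an automated quality assessment tool performs when aggregating metric results by characteristic.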
2.1.2.1.2. ISO/IEC‐25000

ISO/IEC‐9126 presents some weaknesses found by researchers and practitioners [4]. Since 2005 and up to date, the ISO has been updating the current ISO/IEC‐9126 international standard on software product quality measurement. This standard will be superseded by the upcoming ISO/IEC‐25000 series of international standards on Software product Quality Requirements and Evaluation (SQuaRE). One of the objectives of this new standard series is the harmonization of its contents with the software measurement terminology of ISO/IEC‐15939 (software measurement process). The ISO/IEC‐25000 series will replace the series of standards ISO/IEC‐9126 (software product quality) and also ISO/IEC‐14598 (software product evaluation). The work on the ISO/IEC‐25000 series is unfinished at this time. It is being carried out by Working Group 6 (WG6) of the software and system engineering subcommittee (SC7) of the ISO/IEC Joint Technical Committee (JTC1) on Information Technology (ISO/IEC JTC1/SC7). SQuaRE consists of the following five divisions:
‐ ISO/IEC‐2500n: Quality Management Division. The standards from this division define all the common models, terms and definitions referred to by all other standards of the SQuaRE series.
‐ ISO/IEC‐2501n: Quality Model Division. SQuaRE employs the same quality model proposed by ISO/IEC‐9126, dividing quality into characteristics for internal quality, external quality, and quality in use.
This division details the quality model, decomposing the internal and external software quality characteristics into sub‐characteristics.
‐ ISO/IEC‐2502n: Quality Measurement Division. It includes a software product quality measurement reference model, mathematical definitions of quality measures, and practical guidance for their application.
‐ ISO/IEC‐2503n: Quality Requirements Division. This division helps to specify quality requirements. These quality requirements can be used in the process of quality requirements elicitation for a software product to be developed or as input for an evaluation process.
‐ ISO/IEC‐2504n: Quality Evaluation Division. Requirements, recommendations and guidelines for software product evaluation, whether performed by evaluators, acquirers or developers.
ISO/IEC 25050 to 25099 are reserved for SQuaRE extension International Standards, Technical Specifications, Publicly Available Specifications (PAS) and/or Technical Reports; ISO/IEC 25051 and ISO/IEC 25062 are already published.
2.1.3. Verification and Validation

Verification and Validation (V&V) ‐also known as Software Quality Control‐ is concerned with evaluating whether the software being developed meets its specification and delivers the functionality expected by the consumers. These checking processes start as soon as requirements become available and continue through all stages of the development process [54]. Verification is different from validation, although they are often confused. Barry Boehm expressed the difference between them [19]:
‐ Verification: are we building the product right? The aim of verification is to check that the software meets its stated functional and non‐functional requirements (i.e. the specification).
‐ Validation: are we building the right product? The aim of validation is to ensure that the software meets the consumers’ expectations. It is a more general process than verification, due to the fact that specifications do not always reflect the real wishes or needs of the consumers (i.e., users and customers).
V&V activities include a wide array of QA activities. Although software testing plays an extremely important role in V&V, other activities are also necessary. Within the V&V process, two big groups of system checking and analysis techniques may be used [111]:
‐ Software testing. It is the most commonly performed activity within QA. Given a piece of code, software testing (or simply testing) consists of observing a sample of executions (test cases) and giving a verdict over them [16]. Hence testing is an execution‐based QA activity, so a prerequisite is the existence of the implemented software units, components, or system to be tested. For that reason, it is sometimes called dynamic analysis. Software testing is a broad term encompassing a wide spectrum of different concepts, such as testing levels (unit, integration, system, user testing, and so on), testing strategies (black‐box, white‐box, grey‐box, and non‐functional testing), and testing processes (manual, model‐based, automated testing, and so on). On one hand, testing establishes the existence of defects. On the other hand, debugging is concerned with locating and
correcting these defects [132]. As major parts of this dissertation, testing is covered in section 2.3 and automated testing in section 2.5.
‐ Static analysis. It is a form of V&V that does not require execution of the software. Static analysis works on a source representation of the software: either a model of the specification or design, or the source code of the program [3]. Perhaps the most commonly used techniques are inspection and review, where a specification, design or program is checked by a group of people. Additional static analysis techniques may be used, such as automated program analysis (the source code of a program is checked for patterns that are known to be potentially erroneous) and formal methods (mathematical arguments that a program conforms to its specification) [132].
Nowadays, the executable artefact par excellence is code (although some executable specification and design languages exist, they are not widespread). Thus, any product during development can be evaluated using static analysis, including of course code. However, testing (dynamic analysis) almost exclusively executes code. It should be noted that there is a strong divergence of opinion about what types of testing constitute validation or verification. Some authors believe that all testing is verification and that validation is conducted when requirements are reviewed and approved. Other authors view unit and integration testing as verification and higher‐order testing (e.g. system or user testing) as validation [122]. To solve this divergence, V&V can be treated as a single topic rather than as two separate topics [1]. Therefore, V&V can be seen as a disciplined approach to assessing software products throughout the product life cycle. All in all, V&V activities can be summarized in the following picture:
(Figure: V&V divided into Testing ‐with its levels, strategies, and processes‐ and Static Analysis ‐comprising inspection, review, automated analysis, and formal methods.)
Figure 8. Verification & Validation Schema
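The notion of testing introduced above ‐observing a sample of executions and giving a verdict over them‐ can be sketched as a black‐box loop over test cases. The unit under test and its test cases below are purely illustrative, not taken from any system discussed in this dissertation:

```python
def leap_year(year):
    """Unit under test (illustrative example)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Each test case pairs an input with the expected output from the specification.
test_cases = [
    (2000, True),   # divisible by 400
    (1900, False),  # divisible by 100 but not by 400
    (2012, True),   # divisible by 4
    (2011, False),  # not divisible by 4
]

def run_tests(unit, cases):
    """Observe a sample of executions and give a verdict over each of them."""
    return ["pass" if unit(argument) == expected else "fail"
            for argument, expected in cases]

verdicts = run_tests(leap_year, test_cases)
```

Note that the verdicts only cover the sampled executions: as in any dynamic analysis, passing all four cases establishes nothing about inputs outside the sample.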
2.1.3.1. Defects

Key to the correctness aspect of V&V is the concept of software defect. The term defect generally refers to some problem with the software, either with its external or internal
behaviour. Software problems or defects are also commonly referred to as “bugs”. The IEEE Standard 610.12 defines the following terms related to defects [82]:
‐ Error: a human action that produces an incorrect result. Errors can be classified into two categories: i) syntax errors (a program statement that violates one or more rules of the language in which it is written); ii) logic errors (incorrect data fields, out‐of‐range terms, or invalid combinations).
‐ Fault: an incorrect step, process, or data definition in a computer program. It is a condition that causes a system to fail in performing its required function.
‐ Failure: the inability of a system or component to perform its required functions within specified performance requirements.
In addition to this level of granularity for defects, it is interesting to also contemplate incidents, i.e. the symptoms associated with a failure that alert the user to its occurrence. All in all, errors, faults, failures, and incidents are different aspects of software defects. A causal relation exists among these four aspects of defects [138]: errors may cause faults to be injected into the software, and faults may cause failures when the software is executed.
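This causal chain can be illustrated with a deliberately injected defect. The snippet below is a hypothetical example, not drawn from any real system: a programmer's error produces a fault in the code, and the fault surfaces as a failure only when the faulty path is actually executed:

```python
def average(values):
    # Fault: the programmer's error was hard-coding the divisor instead of
    # using len(values); the incorrect step lies dormant in the code.
    return sum(values) / 2

# No failure yet: the fault is masked for inputs of length two.
masked = average([4, 6])       # coincidentally correct result

# Failure: executing the faulty statement with another input yields an
# observably wrong result, i.e. the required function is not performed.
wrong = average([1, 2, 3])     # differs from the correct average of 2.0
```

This also shows why testing can only establish the existence of defects for the executions it observes: a test suite that exercised `average` solely with two‐element lists would never reveal the fault.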
2.2. Static Analysis
Static analysis of a piece of software is performed without executing the code. There are three advantages of static analysis over testing [132]:
1. During testing, errors can hide other errors. This situation does not happen with static analysis, because it is not concerned with interactions between errors.
2. Incomplete versions of a system can be statically analysed without additional cost. In testing, if a program is incomplete, test harnesses have to be developed.
3. Static analysis can consider broader quality attributes of a System Under Test (SUT) than searching for defects, such as compliance with standards, portability, and maintainability.
2.2.1. Inspections

Inspections are critical examinations of software artefacts by human inspectors aimed at discovering and fixing faults in the software systems. All kinds of software artefacts are subject to inspection. This is the primary reason for the existence of inspection: there is no need to wait for the availability of executable programs (as in testing) before starting to inspect [138]. The original Fagan inspection process included six steps [41]:
i) Planning: deciding what to inspect, who should be involved, and in what role.
ii) Overview meeting: the author presents the artefact and assigns the individual inspection tasks to the inspectors.
iii) Preparation: each inspector performs an individual inspection.
iv) Inspection meeting: collecting and consolidating the individual inspection results.
v) Rework: the author fixes the identified problems or provides other responses.
vi) Follow‐up: closing the inspection process by a final validation.
The Gilb inspection is a variation of the Fagan inspection in which an additional step (called “process brainstorming”) is added right after the inspection meeting [52]. This step is aimed at preventive actions and process improvement in the form of reduced defect injections in future development activities.
‐ 14 ‐ PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
2.2.2. Review
A review is a process in which a group of people examine the software and its associated documentation, looking for potential problems, non‐conformance with standards, and omissions. The review team makes an informed judgment about the level of quality of the system under review. The review process is based on documents produced during the software development process, such as specifications, designs, code, models, test plans, configuration management procedures, or user manuals [54].
A special form of review is the walkthrough, a more organized review typically applied to software design and code, although it is still considered an informal type of review. According to the IEEE Standard for Software Reviews, a walkthrough is a form of software peer review “in which a designer or programmer leads members of the development team and other interested parties through a software product, and the participants ask questions and make comments about possible errors, violation of development standards, and other problems” [69].
2.2.3. Automated Software Analysis
Automated Software Analysis (ASA) assesses the source code using patterns that are known to be potentially dangerous [54]. ASA technologies are usually delivered as commercial or open source tools and services. These tools can locate many common programming faults, analysing the source code before it is tested and identifying potential problems so that they can be re‐coded before they manifest themselves as failures [83]. The intention of this analysis is to draw a code reader’s attention to faults in the program, such as:
‐ Data faults. For example, variables used before initialization, variables declared but never used, variables assigned twice but never used between assignments, and so on.
‐ Control faults. For example, unreachable code or unconditional branches into loops.
‐ Input/output faults. For example, variables output twice with no intervening assignment.
‐ Interface faults. For example, parameter‐type mismatches, parameter number mismatches, non‐usage of the results of functions, uncalled functions and procedures, etc.
‐ Storage management faults. For example, unassigned pointers, pointer arithmetic, or memory leaks.
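As a sketch of the first fault class above (a hypothetical example), a variable read before initialization can be flagged statically, e.g. by tools such as pyflakes, without ever running the program; at run time, the same fault only manifests itself once the faulty path is exercised.

```python
def describe(code):
    # Data fault: on any path where code != 0, the local variable
    # 'label' is read before it has been assigned. A static
    # analyser can flag this without executing the function.
    if code == 0:
        label = "ok"
    return label  # fault: 'label' may be unbound here

print(describe(0))  # "ok" -- the faulty path is not exercised
try:
    describe(1)     # exercising the faulty path turns the fault...
except UnboundLocalError as e:
    failure = str(e)  # ...into an observable failure
```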
2.2.4. Formal Methods
The term “formal methods” refers to any activity that relies on mathematical representations of software, including formal specification and verification. In the 1980s, many software engineering researchers proposed that using formal development methods was the best way to improve software quality. They predicted that by the 21st century a large proportion of software would be developed using formal methods. This prediction has not come true, for the following reasons [54]:
‐ Successful SE techniques. The use of other SE methods, such as structured methods or configuration management, has resulted in improvements in software quality.
‐ Limited scope of formal methods. Formal methods are not well suited to specifying user interfaces and user interaction, and nowadays the user interface component has become a greater and greater part of most systems.
‐ Limited scalability of formal methods. Projects that have used these techniques have mostly been concerned with relatively small systems, such as critical kernels. As systems grow, the effort required to develop a formal specification grows excessively.
Formal methods comprise a set of mathematically‐based techniques for the specification and verification of software systems. Both formal specification techniques and formal verification techniques are collectively referred to as formal methods in the literature. The existence of formal specifications is a prerequisite for formal verification.
2.2.4.1. Formal Specifications
Formal specifications produce an unambiguous set of product specifications covering customer requirements, environmental constraints, and design intentions. They can be produced in several different forms [132].
Firstly, descriptive specifications focus on the properties or conditions associated with software products and their components. There are several kinds of descriptive specifications. Entity‐Relationship (ER) diagrams are commonly used to describe product components and connections; these diagrams show data entities, their associated attributes, and the relations between them [132]. Logical specifications focus on the formal properties associated with different product components or with the product as a whole. They are logical statements or conditions associated with the states, or program states, of programs or program segments. The basic elements of these logical specifications are pre‐conditions, post‐conditions, and invariants, which are generally associated with program code. Some examples are the Z specification [133] and VDM (Vienna Definition Method) [81] languages. Contracts are also a kind of logical specification. In addition, the Object Constraint Language (OCL) was initially designed as a logical specification for UML [144]; nowadays it is part of the Meta‐Object Facility (MOF) standard by the Object Management Group (OMG). Algebraic specifications focus on the functional computation carried out by a program or program segment and its related properties; examples are Larch [57] and the OBJ family [46]. Syntactic specifications are used to describe languages used in computing, such as programming languages. The BNF (Backus–Naur Form) notation is universally used for syntactic specifications.
Secondly, operational specifications focus on the required behaviour of the software systems. For example, Data Flow Diagrams (DFDs) specify information flow among the major functional units.
They are used to show how data flows through a sequence of processing steps. The Unified Modelling Language (UML) provides a general‐purpose visual modelling language used to specify, visualize, construct and document software artefacts. A Finite‐State Machine (FSM) is a behavioural model composed of a finite number of states, transitions between those states, and actions. A Labelled Transition System (LTS) is also a state machine representation; the main difference between an FSM and an LTS is that while an FSM has transitions labelled with pairs of (input, output), an LTS specifies interactions, which can, but need not, be interpreted as input or as output. Similarly, Petri Nets can be considered a special kind of FSM with two distinct types of nodes, called places and transitions [119]. Graph‐based models are abstract representations of a set of points (vertices or nodes) connected by lines (edges or links). Graphs or digraphs (i.e. directed graphs) are sometimes used in the literature to model the behaviour of a software system. For instance, an Event‐Flow Graph (EFG) is used in GUI testing, since it represents all the possible interactions among the events in a GUI. Similarly, an
Event Interaction Graph (EIG) contains one node for each system‐interaction event in the GUI. An Event Semantic Interaction Graph (ESIG) contains nodes that represent events; a directed edge between two nodes indicates a semantic relationship between the events they represent. The Specification and Description Language (SDL) specifies the behaviour of reactive and distributed systems. It provides both a graphical representation (SDL/GR) and a textual phrase representation (SDL/PR) [123]. A system is specified as a set of interconnected abstract machines which are extensions of the FSM.
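As an illustration of an operational specification, a minimal FSM can be written down directly as a transition table (a hypothetical sketch of a login dialogue, not tied to any of the notations above):

```python
# A tiny FSM modelling a hypothetical login dialogue:
# states and transitions are given explicitly as a table.
TRANSITIONS = {
    ("logged_out", "login_ok"):   "logged_in",
    ("logged_out", "login_fail"): "logged_out",
    ("logged_in",  "logout"):     "logged_out",
}

def run(events, state="logged_out"):
    """Consume a sequence of events and return the final state.
    Undefined (state, event) pairs are rejected, which makes the
    specification's allowed behaviour explicit."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For example, the event sequence login_fail, login_ok, logout is accepted and ends in the logged_out state, whereas logout in the initial state is rejected.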
2.2.4.2. Formal Verification
Formal verification checks the conformance of software design or code to its formal specifications, ensuring that the software is fault‐free with respect to those specifications. Axiomatic correctness (Hoare logic) [65][147] works with the logical specifications of programs or formal designs by associating each type of program or design element with an axiom that prescribes the logical transformation of the program state before and after the execution of that element type. The weakest pre‐condition approach [36][56] focuses on the goal, or computational result, captured by the final state of the execution sequence. A series of backward chaining operations, through the use of the so‐called weakest pre‐conditions, transforms this final state and its properties into an initial state and its properties. Functional correctness (program calculus) [107] is similar to the axiomatic approach in the sense that some basic axioms or meanings of program elements are prescribed. Symbolic execution is used to connect these elements in a program. It involves executing (interpreting) a program symbolically: during the execution, the values of variables are held in algebraic form and the outcome of the program is represented as one or more expressions. Decisions are handled by following both outcomes while remembering the condition value corresponding to each of the paths now being followed separately. At the end of the evaluation, two facts are known about each path through the program: the list of decision outcomes made along the path and the final expression of all the variables, expressed algebraically. Together these define the function of the program and can be compared with the required function. Semi‐formal techniques check certain properties instead of proving the full correctness of software. An example is model checking, an approach to automatically or algorithmically check certain properties of some software systems [138].
In model checking, a software system is modelled as an FSM, with some property of interest expressed as a suitable formula, or proposition, defined with respect to the FSM. After that, the model checker runs an algorithm to check the validity of the proposition.
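The essence of this procedure can be sketched as a reachability check over an FSM: the safety property “a bad state is never reached” is verified by exhaustively exploring the transition graph. This is a heavily simplified illustration (hypothetical model and states); real model checkers handle temporal‐logic formulas and vastly larger state spaces.

```python
from collections import deque

def check_never_reaches(transitions, initial, bad_states):
    """Explore all states reachable from 'initial' via breadth-first
    search and report whether any state in 'bad_states' is reachable.
    Returns (holds, visited_states)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state in bad_states:
            return False, seen            # property violated
        for (src, _event), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return True, seen                      # property holds

# Hypothetical model: a resource must never be freed twice.
model = {
    ("allocated", "free"): "freed",
    ("freed", "free"):     "double_free",  # faulty transition
}
holds, visited = check_never_reaches(model, "allocated", {"double_free"})
print(holds)  # False: the checker finds the violation
```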
2.3. Software Testing
Software testing consists of the dynamic evaluation of the behaviour of a program on a finite set of test cases, suitably selected from the usually infinite execution domain, against the expected behaviour [1]. The key concepts of this definition are the following:
‐ Dynamic: The SUT is executed with specific input values to find failures in its behaviour. Testing the actual SUT thus checks not only that the design and code are correct, but also the environment, such as the libraries, the operating system, network support, and so on.
‐ Finite: Exhaustive testing is not possible or practical for most real programs. They usually have a large number of allowable inputs to each operation, plus even more invalid or unexpected inputs, and the possible sequences of operations are usually infinite as well. Testers must choose a number of tests that can be run in the available time.
‐ Selected: Since there is a huge or infinite set of possible tests, but only a small fraction of them can be afforded, the key challenge of testing is how to select the tests that are most likely to expose failures in the system.
‐ Expected: After each test execution, it must be decided whether the observed behaviour of the system constitutes a failure or not.
2.3.1. Testing Levels
Typically, a commercial software system has to go through three stages of testing [132]:
1. Development testing, where the SUT is tested during development to discover defects. This stage is performed by software engineers (i.e. programmers, testers, system designers, and so on).
2. Release testing, where a separate testing team tests a complete version of the system before it is released to users. The aim of this stage is to check that the SUT meets its requirements.
3. User testing, in which potential or real users of the system test it in their own environment.
2.3.1.1. Development Testing
Development testing includes all testing activities that are performed by the team developing the system. In this stage, testing may be carried out at three levels of granularity [132]:
1. Unit testing, where individual program units are tested. Unit testing should focus on the functionality of objects or methods.
2. Integration testing, where units are combined to create composite components. Integration testing should focus on testing component interfaces.
3. System testing, where all of the components are integrated and the system is tested as a whole. System testing should focus on testing component interactions.
2.3.1.1.1. Unit Testing
Unit testing is a method by which individual pieces of source code are tested to verify that each unit has been correctly designed and implemented. There are four phases executed in sequence in a unit test case [101], illustrated in Figure 9 and described as follows:
‐ Setup. The test case initialises the test fixture, that is, the “before” picture required for the SUT to exhibit the expected behaviour.
‐ Exercise. The test case interacts with the unit or component under test. The unit to be tested usually queries another component, named the Depended‐On Component (DOC).
‐ Verify. The test case determines whether the expected outcome has been obtained, using test oracles.
‐ Teardown. The test case tears down the test fixture to put the SUT back into its initial state.
Figure 9. Unit Testing
Unit testing should be done with the unit under test in isolation, i.e., without interacting with its DOCs. To that aim, test doubles are employed to replace any components on which the unit under test depends. There are the following kinds of test doubles [101]:
‐ A dummy object is a placeholder object that is passed to the SUT as an argument (or an attribute of an argument) but is never actually used.
‐ A test stub is an object that replaces a real component on which the SUT depends so that the test can control the indirect inputs of the SUT. It allows the test to force the SUT down paths it might not otherwise exercise. A test spy, which is a more capable version of a test stub, can be used to verify the indirect outputs of the SUT by giving the test a way to inspect them after exercising the SUT.
‐ A mock object is an object that replaces a real component on which the SUT depends so that the test can verify its indirect outputs.
‐ A fake object is an object that replaces the functionality of the real DOC with an alternative implementation of the same functionality.
Testers should write two kinds of unit test cases. The first kind should reflect the normal operation of a program and should show that the components work. The other kind should be based on testing experience of where common problems arise; it should use abnormal inputs to check that these are properly processed and do not crash the unit under test [132].
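The four phases and a test double can be sketched with Python's built‐in unittest and unittest.mock modules (the unit and its DOC below are hypothetical; the terminology follows [101]):

```python
import unittest
from unittest.mock import Mock

class PriceService:
    """Unit under test: depends on a rate provider (the DOC)."""
    def __init__(self, rate_provider):
        self.rate_provider = rate_provider

    def price_in_eur(self, usd):
        return usd * self.rate_provider.get_rate("USD", "EUR")

class PriceServiceTest(unittest.TestCase):
    def setUp(self):
        # Setup: build the fixture, replacing the real DOC with a
        # test double that controls the SUT's indirect input.
        self.double = Mock()
        self.double.get_rate.return_value = 0.5
        self.sut = PriceService(self.double)

    def test_conversion(self):
        # Exercise: interact with the unit under test.
        result = self.sut.price_in_eur(10)
        # Verify: compare against the oracle, and (mock-style)
        # check the indirect output sent to the DOC.
        self.assertEqual(result, 5.0)
        self.double.get_rate.assert_called_once_with("USD", "EUR")

    # Teardown: unittest discards the fixture after each test.
```

The test suite can then be run with unittest.main() or any compatible test runner.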
2.3.1.1.2. Integration Testing
Integration testing should expose defects in the interfaces and interaction between integrated components or modules [132]. There are different strategies to perform integration testing. First, decomposition‐based integration describes the order in which units are to be integrated, presuming that the units have been separately tested. There are four integration strategies based on the functional decomposition of the SUT [82]:
‐ Top‐down integration. This strategy starts with the main unit (module), i.e. the root of the procedural tree. Any lower‐level module that is called by the main unit is substituted by a test double (e.g. a test stub). Once testers are convinced that the main unit logic is correct, the stubs are gradually replaced with the actual code. This process is repeated for the rest of the lower‐level units in the procedural tree. The main advantage of this approach is that defects are more easily found.
‐ Bottom‐up integration. This strategy is a mirror image of the top‐down order, with the difference that test double modules (e.g. a fake object) emulate units at the next level up
in the procedural tree. In this case, the test double module is known as a driver. With this approach it is easier to find a missing branch link.
‐ Sandwich integration. This strategy is a combination of top‐down and bottom‐up integration.
‐ Big‐Bang integration. All or most of the units are integrated at the same time. This method is very effective for saving time in the integration testing process; however, it makes the entire integration process more complicated.
The second integration testing strategy is call‐graph‐based, in which the basic idea is to use the call graph instead of the functional decomposition tree. A call graph is a directed labelled graph which represents the SUT. There are two types of call‐graph‐based integration testing:
‐ Pairwise integration. The idea behind this approach is to eliminate the need for developing test doubles, using the actual code. Integration is restricted to a pair of units in the graph.
‐ Neighbourhood integration. The neighbourhood of a node in a graph is the set of nodes that are one edge away from the given node. Neighbourhood integration testing reduces the number of test sessions and avoids the use of test doubles.
Finally, in the path‐based approach the motivation is to combine structural and behavioural methods of testing for integration testing, focusing on interactions among units [82].
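Top‐down integration can be sketched as follows (the modules are hypothetical): the main unit is first exercised against a stub for a lower‐level module, and the stub is later replaced by the real code.

```python
# Lower-level unit, integrated later.
def real_tax(amount):
    return round(amount * 0.21, 2)

# Test double (stub) standing in for the lower-level unit
# while the main unit's own logic is checked first.
def stub_tax(amount):
    return 0.0  # canned answer: the stub controls the indirect input

def total(amount, tax_fn):
    """Main unit (root of the procedural tree)."""
    return amount + tax_fn(amount)

# Step 1 (top-down): verify the main-unit logic against the stub.
assert total(100, stub_tax) == 100.0
# Step 2: gradually replace the stub with the actual lower-level code.
assert total(100, real_tax) == 121.0
```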
2.3.1.1.3. System Testing
System testing during development involves integrating components to create a version of the system and then testing the integrated system. It verifies that the components are compatible, interact correctly and transfer the right data at the right time across their interfaces [54]. It obviously overlaps with integration testing, but the difference here is that system testing should involve all the components developed. When the testing process is performed to determine whether the system meets its specification, it is known as conformance testing. When a new feature or functionality is introduced to a system (producing what can be called a build), the way of testing this new feature is known as progression testing. In addition, to check that the newly introduced changes do not affect the correctness of the rest of the system, the existing test cases are exercised; this approach is commonly known as regression testing [82]. When the system interacts with any external or third‐party system, another kind of testing can be done, known as system integration testing, which verifies that the system is properly integrated with any external systems.
2.3.1.2. Release Testing
Release testing is the process of testing a particular release of a system, performed by a separate team outside the development team. While development system testing should focus on discovering defects in the system (defect testing), the aim of release testing is to check that the system meets its requirements (validation testing) [132]. The primary goal of the release testing process is to convince the supplier of the system that it is good enough for use. If so, it can be released as a product or delivered to the consumer. Release testing is usually a black‐box testing process where tests are derived from the specification.
2.3.1.3. User Testing
User or customer testing is a stage in the testing process in which users or customers provide input and advice on system testing. There are different types of user testing [132]:
‐ Alpha testing takes place at the developers' sites, working together with the software consumers, before the software is released to external users or customers.
‐ Beta testing takes place at customers' sites and involves testing by a group of customers who use the system at their own locations and provide feedback, before the system is released to other customers.
‐ Acceptance testing, where consumers decide whether or not the system is ready to be deployed in the consumer environment. It can be seen as black‐box (functional) testing performed at system level by final users or customers.
‐ Operational testing is performed by the end users in their normal operating environment.
2.3.2. Testing Methods
Testing methods (or strategies) define the approaches for designing test cases. They can be responsibility‐based (black‐box), implementation‐based (white‐box), or hybrid (grey‐box) [120]. Black‐box techniques design test cases on the basis of the specified functionality of the item to be tested. White‐box techniques rely on source code analysis to develop test cases. Grey‐box testing designs test cases using both responsibility‐based and implementation‐based approaches.
2.3.2.1. Black‐Box Testing
Black‐box testing (also known as functional or behavioural testing) is based on requirements, with no knowledge of the internal program structure or data. Black‐box testing relies on the specification of the system or the component that is being tested to derive test cases. The system is a black box whose behaviour can only be determined by studying its inputs and the related outputs [82]. There are a lot of specific black‐box testing techniques; some of the most well‐known ones are described below.
Systematic testing refers to a complete testing approach in which the SUT is shown to conform exhaustively to a specification, up to the testing assumptions. It generates test cases only in the limiting sense that each domain point is a singleton sub‐domain [82]. Inside this category can be found, for example, pairwise (all‐pairs) testing, a combinatorial testing method that, for each pair of input parameters to a SUT, tests all possible discrete combinations of those parameters. Other systematic black‐box testing techniques are equivalence partitioning and boundary value analysis (described in section 2.5.2).
Random testing is literally the antithesis of systematic testing: the sampling is over the entire input domain. Duran and Ntafos [37] have demonstrated, with both theoretical and empirical evidence, that random test case selection criteria can be as effective at defect detection as partitioning methods. This means of testing seems a better choice than systematic testing in two general situations [58]: i) sparse sampling, for a large, unstructured input domain; ii) persistent state, since the usual theoretical assumption is that software is reset between tests so that results are repeatable.
Fuzz testing is a form of black‐box random testing which randomly mutates well‐formed inputs and tests the program on the resulting data [54]. It delivers randomly sequenced and/or structurally bad data to a system to see if failures occur.
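A naive all‐pairs selection, as mentioned above, can be sketched with a greedy pass over the full cartesian product of parameter values (illustrative only; the parameter space is hypothetical, and dedicated pairwise tools produce considerably smaller suites):

```python
from itertools import combinations, product

def pairwise_suite(parameters):
    """Greedy all-pairs selection: keep a candidate test case only
    if it covers at least one not-yet-covered pair of parameter
    values. 'parameters' maps parameter name -> list of values."""
    names = list(parameters)

    def pairs_of(case):
        return {((names[i], case[i]), (names[j], case[j]))
                for i, j in combinations(range(len(names)), 2)}

    # All value pairs that must be covered at least once.
    uncovered = set()
    for case in product(*parameters.values()):
        uncovered |= pairs_of(case)

    suite = []
    for case in product(*parameters.values()):
        new = pairs_of(case) & uncovered
        if new:
            suite.append(case)
            uncovered -= new
    return suite

# Hypothetical SUT configuration space: 3 binary parameters.
params = {"browser": ["firefox", "chrome"],
          "os": ["linux", "windows"],
          "locale": ["en", "es"]}
suite = pairwise_suite(params)
print(len(suite), "of", 2 * 2 * 2, "combinations")
```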
Graphical User Interface (GUI) testing is the process of ensuring that software with a graphical interface meets its specification while interacting with the user. GUI testing is event‐driven (e.g. mouse movements or menu selections) and provides a front end to the underlying application code through messages or method calls [98]. GUI testing at unit level is typically used at the button level, while GUI testing at system level exercises the event‐driven nature of the SUT. GUI applications offer a small benefit for testers: there is little need for integration testing. GUI testing is mainly used for ensuring the correctness of the entire system’s functionality, safety, robustness, and usability [82].
Smoke testing is the process of ensuring the main functionality of the SUT. A smoke test case is the first to be run by testers before accepting a build for further testing; failure of a smoke test case means that the build is refused by testers. The name “smoke testing” derives from electrical system testing, whereby the first test was to switch the system on and see if it smoked [42].
Sanity testing determines whether or not it is reasonable to proceed with further testing. The difference with smoke testing is that if a smoke test fails, it is impossible to conduct a sanity test, whereas if the sanity test fails, it is not reasonable to attempt more rigorous testing. Both sanity tests and smoke tests are ways to avoid wasting time and effort in more rigorous testing. The typical example of sanity testing for a development environment is the “Hello world” program.
2.3.2.2. White‐Box Testing
White‐box testing (also known as structural testing) is based on knowledge of the internal logic of an application's code. It determines whether the program‐code structure and logic are faulty. White‐box test cases are accurate only if the tester knows what the program is supposed to do. White‐box testing does not account for errors caused by omission [60]. Black‐box testing uses only the specification to identify test cases, while white‐box testing uses the program source code (implementation) as the basis for test case identification. Both approaches should be used in conjunction in order to select a good set of test cases for the SUT [60]. The following table summarizes the main differences between the black‐box and white‐box testing approaches:
Table 1. Black‐Box vs. White‐Box Testing

Feature                 Black‐box                             White‐box
Tester visibility       Specification/requirements            Code
                        (input & output)
Defect type             Failures                              Faults
Defect identification   No; debugging is needed to find       Yes; test cases identify the
                        the fault which causes the failure    specific LOC involved
Usually done by         Independent tester team               Developers

Some of the most significant white‐box techniques are described as follows. Code coverage defines the degree of source code which has been tested, for example in terms of percentage of Lines of Code (LOC). There are several criteria for code coverage:
‐ Statement coverage. Line‐of‐code coverage granularity.
‐ Decision (branch) coverage. Control structure (e.g. if‐else) coverage granularity.
‐ Condition coverage. Boolean expression (true‐false) coverage granularity.
‐ Path coverage. Every‐possible‐route coverage granularity.
‐ Function coverage. Program function coverage granularity.
‐ Entry/exit coverage. Call and return coverage granularity.
Fault injection is the process of injecting faults into software to determine how well (or badly) some SUT behaves [42]. Defects can be said to propagate in that their effects are visible in program states beyond the state in which the error existed (a fault became a failure). Mutation analysis validates tests and their data by running them against many copies of the SUT, each containing a different, single, deliberately inserted change. Mutation analysis helps to identify omissions in the code [42].
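Statement coverage can be measured directly with Python's tracing hook (a minimal sketch with a hypothetical function under test; real tools such as coverage.py are far more complete):

```python
import dis
import sys

def absolute(x):
    if x < 0:
        return -x
    return x

def statement_coverage(func, inputs):
    """Run 'func' on each input and report the fraction of its
    source lines that were executed at least once."""
    code = func.__code__
    # Lines of the function body that carry executable statements.
    all_lines = {line for _, line in dis.findlinestarts(code)
                 if line is not None and line != code.co_firstlineno}
    executed = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(None)
    return len(executed & all_lines) / len(all_lines)

# One test input misses the negative branch; adding a second
# input raises statement coverage to 100%.
partial = statement_coverage(absolute, [5])
full = statement_coverage(absolute, [5, -5])
```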
2.3.2.3. Grey‐Box Testing Grey‐box testing is the technique that uses a combination of black‐box and white‐box testing. Grey‐box testing is not black box testing, because the tester does know some of the internal workings of the SUT. In grey‐box testing, the tester applies a limited number of test cases to the internal workings of the software under test. In the remaining part of the grey‐box testing, one takes a black‐box approach in applying inputs to the SUT and observing the outputs.
2.3.2.4. Non‐Functional Testing
The non‐functional aspects of a system can require considerable effort to test and perfect. Within this group, different kinds of testing can be found. Performance testing is conducted to evaluate the compliance of a SUT with specified performance requirements [42]. These requirements usually include constraints about time behaviour (the capability of the software product to provide appropriate response and processing times and throughput rates when performing its function, under stated conditions) and resource utilization (the capability of the software product to use appropriate amounts and types of resources when the software performs its function under stated conditions). Performance testing may measure response time with a single user exercising the system or with multiple users exercising the system. Load testing is focused on increasing the load on the system up to some stated or implied maximum, to verify that the system can handle the defined system boundaries. Volume testing is often considered synonymous with load testing, yet volume testing focuses on data. Stress testing exercises the system beyond its normal operational capacity, to the extent that the system fails, identifying the actual boundaries at which the system breaks. The aim of stress testing is to observe how the system fails and where the bottlenecks are [132]. Security testing tries to ensure the following concepts: confidentiality (protection against the disclosure of information), integrity (ensuring the correctness of the information), authentication (ensuring the identity of the user), authorisation (determining that a user is allowed to receive a service or perform an operation), availability (ensuring that the system performs its functionality when required) and non‐repudiation (ensuring that an action that happened cannot be denied).
Usability testing focuses on finding user interface problems, which may make the software difficult to use or may cause users to misinterpret the output.
Accessibility testing is the technique of making sure that the product complies with accessibility requirements, i.e. the ability of users to access the system functionality.
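The response‐time measurements that performance testing relies on can be sketched with a minimal harness (the operation and the 50 ms threshold below are hypothetical; real performance and load testing uses dedicated load generators):

```python
import statistics
import time

def measure(operation, repetitions=100):
    """Invoke 'operation' repeatedly and collect response times,
    the raw material for checking time-behaviour requirements."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {"mean": statistics.mean(samples),
            "max": max(samples),
            "p95": sorted(samples)[int(0.95 * len(samples))]}

# Hypothetical SUT operation and requirement: the mean response
# time of the operation must stay below 50 ms.
stats = measure(lambda: sum(range(10_000)))
requirement_met = stats["mean"] < 0.050
```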
2.4. Testing of Web Applications
Testing of web‐based applications (or simply web applications) shares the same objectives as traditional application testing, i.e. ensuring quality and finding defects in the required functionality and services. A web application can be viewed as a client‐server distributed system with the following main characteristics [34]:
‐ A wide number of users distributed all over the world accessing it concurrently.
‐ Heterogeneous execution environments (different hardware, network connections, operating systems, web servers and browsers).
‐ A heterogeneous nature, because of the different technologies (programming languages and models) and the different components involved (generated from scratch, legacy ones, hypermedia components, Commercial Off‐The‐Shelf ‐COTS‐ components, and so on).
‐ A dynamic nature: web pages can be generated at run time according to user inputs and server status.
The aim of web testing is to execute the application using combinations of input and state to reveal failures. These failures are mainly caused by faults in the running environment or in the web application itself. The running environment mainly affects the non‐functional requirements of a web application (e.g. performance, stability, or compatibility), while the web application is responsible for the functional requirements. Therefore, web testing has to be considered from these two distinct perspectives (functional and non‐functional), since they are complementary and not mutually exclusive. All in all, different types of testing have to be executed to reveal these diverse types of failures [34].
2.4.1. Web Testing Levels Compared with traditional software, the definition of the development testing levels (i.e., unit, integration, and system testing) for a web application requires a greater attention.
2.4.1.1. Unit Web Testing
Different types of units may be identified in a web application, such as web pages, scripting modules, forms, applets, servlets, or other web objects. In any case, the basic unit that can actually be tested is a web page. There are some differences between testing a client page and a server page.
2.4.1.1.1. Client Page Testing
Client pages show textual or hyper‐textual information to users, accept user input, and allow user navigation throughout the application. A client page may include scripting code modules that perform simple functions, such as input validation or simple computations. Testing of dynamically generated client pages is a particular case of client page testing. Its basic problem is the availability of the built pages, which depends on the capability of identifying and reproducing the conditions from which pages are built. A second problem is state explosion, since the number of dynamic pages that can be generated may be considerable,
‐ 24 ‐ PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
depending on the large number of possible combinations of application states and user inputs. Equivalence class partitioning criteria should be used to approach this question. The typical failures that the testing of a client page would identify are the following:
‐ Differences between the content displayed by the page and the one specified and expected by a user.
‐ Wrong destination of links towards other pages.
‐ Existence of broken links (links towards non‐existing pages).
‐ Wrong actions performed when a button, or any other active object, is selected by a user.
‐ Script failures in the client page.
Unit testing of client pages can be carried out by white‐box, black‐box, or grey‐box techniques. The typical coverage criteria for client page white‐box testing are:
‐ HTML statement coverage.
‐ Web object coverage (e.g. each image or applet has to be exercised at least once).
‐ Script block coverage (e.g. each block of scripting code has to be executed at least once).
‐ Statement/branch/path coverage for each script module.
‐ Hyper‐textual link coverage.
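One of the failures listed above, the existence of broken links, can be checked mechanically by extracting the link targets of a client page and comparing them against the known set of site pages. The following sketch illustrates the idea; the page content and site map are invented examples, not part of any real application.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of every <a> tag in a client page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def find_broken_links(page_html, existing_pages):
    """Returns the links pointing to pages not present in the site map."""
    extractor = LinkExtractor()
    extractor.feed(page_html)
    return [link for link in extractor.links if link not in existing_pages]

# Hypothetical client page and site map used only for illustration.
page = '<html><body><a href="home.html">Home</a><a href="old.html">Old</a></body></html>'
site_map = {"home.html", "about.html"}
print(find_broken_links(page, site_map))  # -> ['old.html']
```

A real link checker would also follow external URLs over HTTP, but the core idea (hyper‐textual link coverage plus a check on each destination) is the same.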
2.4.1.1.2. Server Page Testing
Server pages have the main responsibility for implementing the business logic of the application, managing the storing and retrieval of data into/from a database. Server pages are usually implemented with scripting technologies, such as JSP (Java Server Pages), Servlets, ASP (Active Server Pages), or PHP (Hypertext Preprocessor), among others, as well as COTS components. The typical failures detectable by server page testing are:
‐ Failures in the execution of servlets or other technologies.
‐ Incorrect execution of data storing into a database.
‐ Failures due to the existence of incorrect links between pages.
‐ Defects in dynamically generated pages.
As with client page testing, server page testing can be carried out by white‐box, black‐box, or grey‐box techniques. The coverage criteria for server page white‐box testing could be:
‐ Statement/branch/path coverage in script modules.
‐ HTML statement coverage.
‐ Servlet, COTS, and other web object coverage.
‐ Hyper‐textual link coverage.
‐ Coverage of dynamically generated pages.
2.4.1.2. Integration Web Testing Web application integration testing considers sets of related web pages in order to assess how they work together, and identify failures due to their coupling [34]. The web application use cases (or any other description of the functional requirements) can drive the process of page integration. The identification of such web pages can be made by analysing the development documentation or by reverse engineering the application code.
At the integration testing level, the knowledge of both the structure (the set of pages to be integrated) and the behaviour of the web application has to be considered. Therefore, grey‐box techniques are more suitable than pure black‐box or white‐box ones to carry out integration testing.
2.4.1.3. System Web Testing
On the one hand, black‐box techniques are usually employed to accomplish system testing of the externally visible behaviour of the application. On the other hand, grey‐box testing techniques are more suitable to discover web application failures due to incorrect navigation links among pages. The errors due to incorrect navigation include:
‐ Links reaching a web page different from the specified one.
‐ Pending links to unreachable pages (broken links).
Typical coverage criteria for system testing include:
‐ User function/use case coverage (black‐box approach).
‐ Page (both client and server) coverage (white‐box or grey‐box approaches).
‐ Link coverage (white‐box or grey‐box approaches).
2.4.2. Web Testing Strategies
The following sub‐sections describe white‐box (structural), black‐box (functional), and grey‐box testing strategies for web applications.
2.4.2.1. White‐Box Web Testing
The design of test cases using a white‐box strategy relies on two artefacts:
‐ The test model, i.e. the code representation of the component under test.
‐ The coverage model, which specifies the parts of that representation that must be exercised by the test cases.
Regarding the test model, two families are mainly adopted in the literature for white‐box web testing:
‐ The first focuses on the level of abstraction of single statements of code components, representing information about their control flow or data flow.
‐ The second considers the coarser granularity of the navigation structure between the pages of the application, possibly with additional details.
Regarding the coverage criteria, traditional ones (such as those involving nodes, edges, or paths) have been applied to both families of test models.
2.4.2.2. Black‐Box Web Testing
Black‐box (functional) testing should find the failures of web applications that are due to faults in the implementation of the specified functional requirements, rather than to the execution environment. Most of the methods and approaches used to test the functional requirements of traditional software can be used for web applications too. The main issue with black‐box testing of web applications is the choice of a suitable model for specifying the behaviour of the SUT and deriving test cases. This behaviour may be significantly
dependent on the data managed by the application or on user input, with the consequence of a state explosion problem. Some solutions to this problem have been presented in the literature. The approach proposed by Di Lucca et al. [90] exploits decision tables as a combinatorial model for representing the behaviour of the web application and producing test cases. This approach provides a method for both unit and integration testing. Another approach, by Andrews et al. [6], proposes Finite State Machines (FSMs) to model the state‐dependent behaviour of web applications and to design test cases. This approach mainly addresses integration and system testing.
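As a minimal sketch of the FSM idea (not the exact method of Andrews et al. [6]), the navigation of a web application can be modelled as states and transitions, and test sequences derived so that every transition is exercised at least once. The small web shop model below is a hypothetical example.

```python
from collections import deque

def transition_covering_paths(fsm, start):
    """BFS over states; emits one input sequence covering each transition."""
    paths, seen, queue = [], {start}, deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        for action, target in fsm.get(state, {}).items():
            paths.append(path + [action])      # test sequence covering this transition
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [action]))
    return paths

# Hypothetical navigation model: state -> {user action: next state}.
fsm = {
    "home":    {"login": "account", "browse": "catalog"},
    "catalog": {"add_to_cart": "cart"},
    "cart":    {"checkout": "home"},
}
for seq in transition_covering_paths(fsm, "home"):
    print(seq)
```

Each emitted sequence of user actions is the skeleton of one test case; a tester would still have to supply concrete input data and expected pages for every step.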
2.4.2.3. Grey‐Box Web Testing
Grey‐box testing is well suited for web testing because it evaluates high‐level design, environment, and interoperability conditions. It can reveal issues in end‐to‐end information flow, system configuration, and compatibility [113]. Strategies based on the collection of user session data can be classified as grey‐box. These strategies use the collected data to test the behaviour of the application in a black‐box fashion, but they also aim at verifying the coverage of the internal components of the application (e.g. page or link coverage). The data to be captured include client requests expressed in the form of URLs and name‐value pairs. Captured data about user sessions can be transformed into a set of HTTP requests, each one providing a separate test case.
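The transformation of captured session data into test cases can be sketched as follows: each logged URL is split into its path and name‐value pairs, yielding one replayable request per entry. The log lines below are invented examples.

```python
from urllib.parse import urlsplit, parse_qsl

def session_to_test_cases(logged_urls):
    """Each captured request becomes one (path, parameters) test case."""
    cases = []
    for url in logged_urls:
        parts = urlsplit(url)
        cases.append((parts.path, dict(parse_qsl(parts.query))))
    return cases

# Hypothetical user-session log captured at the server.
log = [
    "http://example.com/search?q=books&page=2",
    "http://example.com/login?user=alice",
]
print(session_to_test_cases(log))
```

In a full framework, each resulting pair would be issued as an HTTP request against the SUT while instrumentation records which pages and links the replay exercises.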
2.4.3. Non‐Functional Web Testing The main non‐functional requirements for web applications are the following: performance, scalability, compatibility, accessibility, usability, and security [34]. The following table presents a description of these non‐functional requirements and a list of verification activities that can be executed for web applications:
Table 2. Web Application Non‐Functional Testing

Performance testing: It verifies the specified system performances, such as response time or service availability. It is executed by simulating many concurrent users accessing over a defined time interval. Failures revealed by performance testing are mainly due to running environment faults, such as scarce or badly deployed resources. There are two special cases of performance testing:
‐ Load testing (sometimes called volume testing): It requires that system performance is evaluated under some predefined conditions, such as the minimum and maximum activity levels of the running application.
‐ Stress testing: It is executed to evaluate a system or component at or beyond the limits of its specified requirements. It is used to evaluate system responses at activity peaks that can exceed system limitations, and to verify whether the system crashes or is able to recover from such conditions.

Compatibility testing: It has to uncover failures due to the usage of different web server platforms or client browsers. Therefore, both the application and the running environment are responsible for compatibility failures.

Usability testing: It aims at verifying to what extent an application is easy to use. Usability testing is mainly centred on testing the User Interface (UI): issues concerning the correct rendering of the contents (e.g. graphics, text editing format, etc.) as well as the clearness of messages, prompts, and commands are to be considered and verified. Web usability testing is about the completeness, correctness, and conciseness of the navigation. The application is mainly responsible for usability failures.

Accessibility testing: It aims to verify that access to the content of the application is possible even in the presence of reduced hardware or software on the client side (such as browser configurations disabling graphical visualization or scripting execution), or of users with physical disabilities (such as blind people). The application is the main responsible for accessibility failures.

Security testing: It aims at verifying the effectiveness of the web defences against undesired access by unauthorized users or improper uses, and at granting access for authorized users to authorized services and resources. Both the running environment and the application can be responsible for security failures.
2.4.4. Web Testing Tools
The effectiveness of a testing process can significantly depend on the tools used to support it. Testing tools usually automate some tasks required by the process, such as test case generation, test case execution, or result evaluation. A list of more than 400 testing tools (commercial and open source) is presented in http://www.softwareqatest.com/qatweb1.html. Web application testing tools can be classified using the following main categories [100]:
1. Supporting non‐functional requirements:
a. Load, performance and stress test tools.
b. Web security test tools.
c. HTML/XML validators.
2. Supporting conformance testing:
d. Link checkers.
e. Usability and accessibility test tools.
3. Supporting functional testing:
f. Web functional/regression test tools.
Regarding functional testing, the main contribution of existing tools is limited to managing manually created test case suites, and to matching the test case results against a manually created oracle. The usage of browser elements ‐such as the back/forward or reload buttons‐ may negatively affect the navigation, because they might introduce inconsistencies or violate functional/non‐functional requirements of the application [34]. Hence, such features are worth considering when testing the behaviour of the web application. For example, [91] proposes a model and an approach to test the interaction between a web application and the browser buttons, where the browser is modelled by a state chart diagram: each state is defined by the page displayed and by the state of the Back/Forward buttons, while user actions on page links or browser buttons determine the state transitions.
2.5. Automated Software Testing
Dustin et al. define Automated Software Testing (AST) as the “Application and implementation of software technology throughout the entire Software Testing Lifecycle (STL) with the goal to improve efficiencies and effectiveness” [38]. One of the software testing research dreams described by Bertolino in [16] is to achieve 100% AST. This dream is divided into: i) developing advanced techniques for generating the test inputs; ii) finding innovative support procedures to automate the testing process. Many surveys have highlighted the lack of AST tasks in most software organizations [128][125][50][53][104]. The main benefits of AST are [47]: anticipated cost savings, shortened test duration, heightened thoroughness of the tests performed, improved test accuracy, and improved result reporting as well as statistical processing and subsequent reporting. AST at system level is usually more difficult than at unit or integration level. Automated unit testing relies on predicting the outputs and then encoding these predictions, which are compared with the real outputs. At system level, the outputs are larger and cannot be easily predicted [132]. All in all, AST must provide tools that address test planning, test design, test construction, test execution, test results verification, and test reporting [109]. Hence, AST would be implemented by means of a powerful integrated test framework which takes care of generating or recovering the needed test case data, generating the most suitable test cases, executing them, and finally issuing a test report. The following subsections present the following topics on AST: i) test case generation; ii) test data generation; iii) automated test oracles; iv) AST frameworks; v) AST frameworks for web applications.
2.5.1. Test Case Generation
Several approaches have been proposed for test case generation. In Model‐Based Testing (MBT), test cases are derived in whole or in part from a model that describes some (if not all) aspects of the SUT [8]. MBT is a form of black‐box testing because tests are generated from a model, which is derived from the requirements documentation. It can be done at unit, integration, or system level. The difference from usual black‐box testing is that, rather than manually writing tests based on the requirements documentation, a model of the expected SUT behaviour is created, which captures some of the requirements. Then the MBT tools are used to automatically generate tests from that model [141]. The main use of MBT is to generate functional tests, but it can also be used for some kinds of non‐functional tests, such as robustness or performance testing (still under development). Specification‐based test case generation starts from a formal specification written in a given language and automatically generates test cases for an implementation of that specification. Some examples of these languages are SDL [123] or Z [133]. An important kind of specification‐based test generation uses contracts. A contract can be seen as a collection of the following constraints: pre‐conditions, post‐conditions, and invariants. A pre‐condition is a condition or logic predicate that must be met just before the execution of a portion of code. A post‐condition is a condition or logical predicate that must be met just after the execution of the code. An invariant is a condition or logical predicate that must always be met.
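A minimal sketch of contract checking in the DbC spirit: pre‐ and post‐conditions are attached to a function as plain predicates, and a violation is reported at the call boundary. The decorator and the function under contract are illustrative, not taken from any of the cited tools.

```python
# Illustrative contract enforcement: 'pre' guards the inputs, 'post' the result.
def contract(pre, post):
    def decorate(fn):
        def wrapper(*args):
            assert pre(*args), "pre-condition violated"
            result = fn(*args)
            assert post(result), "post-condition violated"
            return result
        return wrapper
    return decorate

@contract(pre=lambda x: x >= 0, post=lambda r: r >= 0)
def integer_sqrt(x):
    """Largest n with n*n <= x."""
    n = 0
    while (n + 1) * (n + 1) <= x:
        n += 1
    return n

print(integer_sqrt(10))  # -> 3
```

A contract‐based test generator can exploit exactly these predicates: inputs satisfying the pre‐condition are valid test data, and the post‐condition doubles as an automated oracle for the result.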
Along the same lines, OCL (Object Constraint Language) is a language for defining constraints in UML models [144], and JML (Java Modelling Language) combines the design by contract (DbC) approach [105] with the model‐based specification approach of the Larch family [57]. Golden software defines a correct version of a software artefact [134][143]. Golden software has been employed in software testing to derive test cases by comparing a software component with its golden version. For example, in [39] golden test cases are used to compare other test cases in order to perform test suite selection/reduction, or even to generate Differential Unit Tests (DUTs), which are a hybrid of unit and system tests. In the intelligent approach, test cases are identified by selecting goals such as a statement or a branch. This approach employs computational intelligence and/or Artificial Intelligence (AI) techniques. Pedrycz and Vukovich presented in [88] a fuzzy approach to cause‐effect software modelling as a basis for designing test cases in black‐box testing. Last and Friedman demonstrate the potential use of data mining algorithms for automated induction of functional requirements [139]. The Record&Playback approach is carried out by first recording linear scripts corresponding to actions performed on the system (record stage). These scripts can be parameterized and, after that, the automation is done by repeating the recorded script and exercising the SUT (playback stage) [24].
2.5.1.1. Source Code Generation
An important aspect of test case generation is source code generation, which is the act of producing source code automatically. It is about writing programs that write programs, i.e. code generators. Code generators are separated into two high‐level categories: active and passive. Passive generators build a set of code which the software engineer is then free to edit and alter at will. The passive generator maintains no responsibility for the code, either in the short or the long term. The typical examples of passive generators are the “wizards” in Integrated Development Environments (IDEs). Active generators maintain responsibility for the code in the long term by allowing the generator to be run multiple times over the same output. As changes to the code become necessary, team members can input parameters to the generator and run it again to update the code. There are several types of active generators:
‐ Code munging1. Given some input code, the munger picks out important features and uses them to create one or more output files of varying types. It usually employs regular expressions or simple source parsing, and then uses built‐in or external templates to build the output files. It can be used to create documentation or to read constants or function prototypes from a file.
‐ Mixed‐code generator. It works just like the inline‐code expander, i.e. by using marks or specially formatted comments in order to embed additional code at these predefined
1 Munging is slang for twisting and shaping something from one form into another form.
positions. The difference from the previous type is that it saves the output to the same file that it used as input.
‐ Inline‐code expander. This kind of generator accepts source code as input and expands it by replacing or embedding code at selected or marked points, producing the output source code. It is commonly used to embed Structured Query Language (SQL) into a source code file. The engine reads the file and finds the appropriate marks in the source code; when it finds them, it replaces them with, or embeds, the expanded source code. The purpose of this type is to keep the development code free of the infrastructure required to manage the SQL queries.
‐ Partial‐class generator. It accepts two inputs: i) a definitions file, which keeps metadata written in a definition language such as XMI or any other free‐form language such as XML mark‐up, and describes the classes to be generated; ii) templates, which in conjunction with the definitions file generate the source code of the output classes. These kinds of generators are often used for Object Relational Mapping (ORM) to build the data access tier of an application.
‐ Tier generator. In contrast to the partial‐class type, these generators are responsible for building a full tier of an application. The input files are the same as in the previous code generation type: a definitions file which describes the classes, and one or more templates which are processed by the engine to produce the final source code for the application tier. While this type seems to have the advantage of generating the full tier, the partial‐class generator provides faster development and increases developer flexibility, because tier generators are based more on generics and it is very difficult to design them for special cases.
‐ Template metaprogramming. Metaprogramming is the name given to computer programs that manipulate other programs (or themselves) at runtime [11].
Template metaprogramming is a technique in which a template processor (also known as a template engine or template parser) combines one or more templates with a data model to produce one or more result documents.
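The template‐processor idea can be sketched in a few lines: a template is combined with a data model (here, a class name and its fields, both invented for the example) to emit source code for a class stub.

```python
from string import Template

# The template: placeholders ($name, $fields, $assignments) are filled
# from the data model by the template processor.
CLASS_TEMPLATE = Template(
    "class $name:\n"
    "    def __init__(self, $fields):\n"
    "$assignments"
)

def generate_class(model):
    """'model' is the data model: a class name plus its field names."""
    fields = model["fields"]
    assignments = "".join(f"        self.{f} = {f}\n" for f in fields)
    return CLASS_TEMPLATE.substitute(
        name=model["name"], fields=", ".join(fields), assignments=assignments
    )

print(generate_class({"name": "Customer", "fields": ["id", "email"]}))
```

Partial‐class and tier generators follow the same pattern, only with richer definitions files and template sets.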
2.5.2. Test Data Generation
Test data is the input needed for executing a test case. Test Data Generation (TDG) is a crucial software testing activity because test data is one of the key factors determining the quality of the testing process. Automated Test Data Generation (ATDG) is an activity that automatically tries to create effective test data (i.e., test input values) for the SUT. TDG is an expensive and error‐prone process if it is done manually, whereas ATDG can curtail testing expenses while increasing the reliability of testing as a whole [97]. The values generated by ATDG must match two different criteria [115]:
1. Syntax criterion. It depends on the test level: i) unit: test data for method parameters and non‐local variables; ii) system: test data for user‐level interaction.
2. Semantic criterion. It could be one of the following: i) satisfying some test condition; ii) special or invalid values; iii) random values.
Test data is typically used in the following testing areas [96]:
‐ The coverage of specific program structures (white‐box testing).
‐ The exercising of some specific program feature (black‐box testing).
‐ Attempting to automatically disprove certain grey‐box properties regarding the operation of a piece of software, for example trying to stimulate error conditions.
‐ The verification of non‐functional properties, such as the Worst‐Case Execution Time (WCET) or the Best‐Case Execution Time (BCET) of a SUT.
The general ATDG problem is formally unsolvable [115]. TDG is an old research topic with many contributors. The following figure shows the main TDG/ATDG techniques that have appeared in the literature, from the traditional ones (equivalence class partitioning, boundary value analysis, cause‐effect graphing, and random data generation) to more recent ones (path‐oriented, constraint‐based, goal‐oriented, and search‐based).
[Figure 10 depicts a taxonomy of test data generation techniques: equivalence partitioning, boundary value analysis, cause‐effect graphing, random and anti‐random generation, path‐oriented generation (with dynamic domain reduction and constraint‐based variants), goal‐oriented generation (with assertion‐oriented and chaining variants), and search‐based generation.]
Figure 10. Test Data Generation Techniques
2.5.2.1. Equivalence Partitioning
Data generation for equivalence class partitioning (partition testing) was defined by Myers [110] in 1978 as “a technique that partitions the input domain of a program into a finite number of classes [sets], it then identifies a minimal set of well selected test cases to represent these classes. There are two types of input equivalence classes, valid and invalid”. The equivalence partitioning testing theory ensures that only one test case of each partition is needed to evaluate the behaviour of the program for the related partition. Boundary value analysis is a method which complements equivalence partitioning by looking at the boundaries of the input equivalence classes. NIST defined it in 1981 as “a selection technique in which test data are chosen to lie along ‘boundaries’ of the input domain [or output range] classes, data structures, procedure parameters” [2]. Boundary value test data usually include the following values: min‐1, min, min+1, max‐1, max, and max+1.
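The six classic boundary values listed above can be generated mechanically for any numeric input range; the age range used below is a hypothetical example.

```python
# Generates the standard boundary values for an inclusive numeric range.
def boundary_values(minimum, maximum):
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]

# Hypothetical input field accepting ages from 18 to 65.
print(boundary_values(18, 65))  # -> [17, 18, 19, 64, 65, 66]
```

Note that min‐1 and max+1 fall into the invalid equivalence classes, so the same six values exercise both valid and invalid partitions at their boundaries.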
2.5.2.2. Cause–Effect Graphing
Cause‐effect graphing is an old technique which can be defined as test case generation [105] or test case selection [2], besides test data generation. The aim of cause‐effect graphing is to
select the correct inputs to cover an entire effect set, and as such it deals with the selection of test data. Cause–effect graphing exercises the different combinations of inputs from the equivalence classes. A cause‐effect graph is a directed graph that maps a set of causes to a set of effects. It is useful for generating a reduced decision table, and test cases are then derived from that decision table.
2.5.2.3. Random Data Generation
Random data generation consists of generating inputs at random until a useful input is found. This approach is quick and simple but might be a poor choice, since the probability of selecting an adequate input by chance could be low [92]. In the main derivative of random testing, namely anti‐random testing, each datum is chosen so as to maximize its distance from the previously chosen test data.
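A small sketch of the anti‐random idea: each new datum is the candidate that maximizes its total distance from all previously chosen test data, here using Hamming distance over binary input vectors (a common choice in the anti‐random literature; the candidate set is an invented example).

```python
def hamming(a, b):
    """Number of positions in which two input vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def anti_random(candidates, count):
    """Greedily picks 'count' inputs, each as far as possible from the rest."""
    chosen = [candidates[0]]               # arbitrary starting datum
    remaining = list(candidates[1:])
    while len(chosen) < count and remaining:
        best = max(remaining, key=lambda c: sum(hamming(c, p) for p in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

inputs = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(anti_random(inputs, 3))
```

Starting from (0, 0, 0), the farthest candidate (1, 1, 1) is picked next, spreading the test data across the input space instead of clustering it as pure random selection might.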
2.5.2.4. Path‐Oriented Data Generation
The path‐oriented data generation technique first transforms the source code of the program under test into a Control Flow Graph (CFG), a directed graph that represents its control structure. Then, the CFG is used to determine the paths to cover. Finally, test data for these paths is generated. The path‐oriented approach is usually applied with the help of symbolic execution (also known as symbolic evaluation), an automatic static analysis technique that allows the derivation of symbolic expressions encapsulating the entire semantics of programs. It extracts information from the source code of programs by abstracting inputs and subroutine parameters as symbols, rather than by using actual values as during actual program execution [92]. In symbolic execution, variables are used instead of actual values while traversing the path. Constraint‐based data generation builds on the path‐oriented techniques. It uses algebraic constraints over the input variables to describe the conditions necessary for the traversal of a given path. Constraint satisfaction problems are in general NP‐complete [96]. Dynamic Domain Reduction (DDR) is a TDG technique, developed by DeMillo and Offutt [33], that was originally employed as part of constraint‐based testing. DDR creates a set of values that executes a specific path. The DDR process is the following [115]:
1. Definition of an initial symbolic domain for each input variable.
2. Selection of a test path through the program.
3. Symbolic evaluation of the path, reducing the input domains at each branch.
4. Evaluation of the expressions with domain‐symbolic algorithms.
5. After walking the path, the values in the input variables’ domains ensure execution of the path.
6. If a domain is empty, the path is re‐evaluated with different decisions at the branches.
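The domain‐reduction step (3) above can be sketched for a single integer input: each branch condition on the chosen path shrinks the variable's symbolic domain, kept here as an inclusive interval. This is a deliberately simplified illustration of DDR, not the full domain‐symbolic algorithm of [33].

```python
# Reduces the symbolic domain of one input variable 'x' along a path.
def reduce_domain(domain, path_conditions):
    low, high = domain
    for op, bound in path_conditions:
        if op == "<":      # branch requires x < bound
            high = min(high, bound - 1)
        elif op == ">=":   # branch requires x >= bound
            low = max(low, bound)
        if low > high:
            return None    # empty domain: path infeasible, re-select branches
    return (low, high)

# Path taking the branches x >= 10 and x < 20 over an initial domain [0, 100].
print(reduce_domain((0, 100), [(">=", 10), ("<", 20)]))  # -> (10, 19)
```

Any value picked from the resulting interval is guaranteed to drive execution down the selected path; a `None` result corresponds to step 6, where the path must be re‐evaluated with different branch decisions.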
2.5.2.5. Goal‐Oriented Approach
In a paper published in 1992, Korel developed what became known as the goal‐oriented approach [86]. It can be described as a family of data generation techniques which aim to find a program input on which a selected statement is executed. Test data is selected from the available pool of candidate test data to execute the selected goal, such as a statement, irrespective of the path taken. This approach involves two basic steps: identifying a set of statements (or branches) whose coverage implies covering the criterion; and generating input test data that execute
every selected statement (or branch). Two typical goal‐oriented approaches are the assertion‐based and the chaining approaches. The assertion‐based approach, proposed by Korel and Al‐Yami [87], attempts to find test cases that violate assertion conditions, which are embedded by the programmer into the program code. The chaining approach [85][43] uses the concept of an event sequence as an intermediate means of deciding the type of path required for execution up to the target node.
2.5.2.6. Search‐Based Data Generation
Search‐Based Software Engineering (SBSE) is an approach that applies metaheuristic search techniques to automate the construction of solutions to SE problems [60]. Metaheuristic search techniques are a set of high‐level optimization algorithms which utilise heuristics (i.e. experience‐based methods) to find solutions to combinatorial problems at a reasonable computational cost. These problems may have been classified as NP‐complete or NP‐hard, or may be problems for which a polynomial time algorithm is known to exist but is not practical. Metaheuristic search techniques are not standalone algorithms in themselves, but rather strategies ready for adaptation to specific problems [59]. The term SBSE was first coined in 2001 [62], since which time a rapidly developing community has been working in this area. SBSE has been applied to problems throughout the SE lifecycle, such as requirements engineering, project planning, maintenance, reengineering, and testing [61]. Some metaheuristic techniques have been used in ATDG, such as hill climbing, simulated annealing, evolutionary algorithms (such as genetic algorithms), or tabu search. The general idea behind search‐based ATDG is that the set of possible inputs to the program forms a search space and the test adequacy criterion is coded as a fitness function. For example, in order to achieve branch coverage, the fitness function assesses how close a test input comes to executing an uncovered branch; in order to find the worst‐case execution time, the fitness is simply the duration of execution for the test case in question [60]. This function has to be designed by a human. Once a fitness function has been defined for a test adequacy criterion C, the generation of C‐adequate test inputs can be automated. This process is outlined in Figure 11 [61].
Figure 11. Generic Search Based Test Input Generation Scheme
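The scheme can be sketched with the simplest metaheuristic, hill climbing: the fitness is a branch distance telling how close an input comes to taking the uncovered branch `x == 42` of a hypothetical function under test, and the search moves to whichever neighbour reduces that distance.

```python
# Branch distance for the (hypothetical) uncovered branch 'x == 42':
# zero means the branch is taken, so the search minimizes this value.
def branch_distance(x, target=42):
    return abs(x - target)

def hill_climb(start, fitness, max_steps=1000):
    """Moves to the fitter neighbour until the branch is covered or stuck."""
    current = start
    for _ in range(max_steps):
        if fitness(current) == 0:
            return current
        neighbours = [current - 1, current + 1]
        best = min(neighbours, key=fitness)
        if fitness(best) >= fitness(current):
            return current          # local optimum reached
        current = best
    return current

print(hill_climb(0, branch_distance))  # -> 42
```

Real search‐based tools use richer fitness functions (branch distance combined with an approach level) and metaheuristics that can escape local optima, but the loop above is the core of the scheme in Figure 11.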
2.5.3. Automated Test Oracles
A test oracle is a reliable source of expected outputs. The oracle problem is the name given to one of the biggest challenges in software testing: how do we know that the software did what it was supposed to do when we ran a given test case? [145]. Generally, expected outputs are manually generated based on specifications or on developers’ knowledge of how the software should behave [5]. These manual oracles are costly and unreliable. Hence, automated test oracles are required to ensure testing quality while reducing costs. Complete automated test oracles can be expensive and sometimes impossible to provide. Several research efforts have tried to provide automated test oracles, but none of them could completely automate all test oracle activities in all circumstances [131]. The most important challenge in developing a complete automated test oracle is the output generation. In order to provide a reliable oracle, it is suggested that there should be a simulated model behaving like the SUT that automatically generates the expected outputs for every possible input specified in the specification. The survey on automated test oracles carried out by Shahamiri in [131] describes the following methods:
‐ N‐Version diverse systems and M‐Model program (M‐mp) testing.
‐ Decision tables.
‐ Info Fuzzy Network (IFN) regression tester.
‐ Artificial Intelligence (AI) planner test oracle.
‐ Artificial Neural Network (ANN) based test oracle.
‐ Input/Output (I/O) analysis based automatic expected output generator.
N‐Version diverse testing is a method based on various implementations of a program providing the same functionalities, presented in [93]. A gold version (i.e. a trusted implementation of the SUT) is used to automate the oracle. This method is expensive, so the authors reduce the cost by using M‐mp testing, which increases the reliability of the process by providing a more precise oracle.
A decision table is a requirements representation model used wherever there are many conditions affecting responses. A decision table consists of a condition section (combinations of inputs) and an action section (the combination of outputs produced when the conditions are satisfied). Each row in the decision table presents a unique combination of conditions. Di Luca et al. applied decision tables in unit and integration web testing for both client and server pages in [90]. The following table shows a template of a decision table:
Table 3. Decision Table Template

Input Section:  Input variable sections | Input action | State before test
Output Section: Expected result | Expected output | Expected state after test
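A decision table can serve directly as an automated oracle: the expected output for each input combination is looked up in the table rather than written by hand for every test case. The following Ruby sketch illustrates the idea; the table contents, inputs, and actions are hypothetical, not taken from [90].

```ruby
# Decision-table oracle sketch. Each row maps a combination of input
# conditions to the expected action, so expected outputs are looked up
# automatically instead of being written by hand for every test case.
DECISION_TABLE = [
  { conditions: { logged_in: true,  cart_empty: false }, action: :show_checkout },
  { conditions: { logged_in: true,  cart_empty: true  }, action: :show_empty_cart },
  { conditions: { logged_in: false, cart_empty: false }, action: :redirect_login },
  { conditions: { logged_in: false, cart_empty: true  }, action: :redirect_login }
].freeze

# Oracle information: the expected output for the given inputs.
def expected_action(inputs)
  row = DECISION_TABLE.find { |r| r[:conditions] == inputs }
  raise ArgumentError, "no decision-table row for #{inputs}" unless row
  row[:action]
end

# Oracle procedure: compare the SUT's actual output against the table.
def table_verdict(inputs, actual)
  expected_action(inputs) == actual ? :pass : :fail
end
```

For instance, `table_verdict({ logged_in: false, cart_empty: true }, :redirect_login)` yields `:pass`, while any other actual output for those inputs yields `:fail`.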
IFN regression testing is an approach developed for knowledge discovery and data mining. It uses AI methods to simulate the SUT behaviour, employing the simulation as test oracle [88]. An IFN represents the functional requirements by a tree‐like structure, where each input attribute is associated with a single layer and the leaf nodes correspond to combinations of input values. AI planning is applied as an automated GUI test oracle in [142], modelling the internal behaviour of the GUI using a representation of GUI elements and actions. A formal model composed of GUI objects and their specifications is applied as oracle. GUI actions are defined by their preconditions and effects. Expected states are automatically generated using the model. An ANN used as test oracle requires training a neural network to simulate the software behaviour by means of I/O pairs as training patterns. Since ANNs can memorize or learn from I/O pairs, it is possible to apply them as test oracles. This approach has been applied in [142]. Several approaches have been presented for semi‐automated expected output generation that find I/O relationships by changing the input values and executing the program while observing the outputs [121][30][28][130]. The drawback of these methods is that incomplete I/O relationship detection may result in an imperfect test oracle.
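The gold-version idea behind N-version diverse testing can be sketched in a few lines of Ruby: a trusted implementation generates the expected output for any input, and the SUT is judged by comparison. Both implementations below (sorting functions) are illustrative stand-ins, not from any of the cited works.

```ruby
# Gold-version oracle sketch (N-version idea): a trusted implementation
# of the same functionality generates the expected output for any input.

# Trusted ("gold") implementation.
def gold_sort(list)
  list.sort
end

# SUT: an independently written insertion sort.
def sut_sort(list)
  list.each_with_object([]) do |x, sorted|
    idx = sorted.index { |y| y > x } || sorted.size
    sorted.insert(idx, x)
  end
end

# Fully automated oracle: any input can be checked against the gold version.
def gold_oracle_pass?(input)
  sut_sort(input) == gold_sort(input)
end
```

The cost noted above comes from having to build and maintain the trusted version itself; the comparison step is the cheap part.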
2.5.4. AST Frameworks

AST is most effective when implemented within a framework. A testing framework may be defined as a set of abstract concepts, processes, procedures and environments in which automated tests will be designed, created and implemented. This framework definition includes the physical structures used for test creation and implementation, as well as the logical interactions among those components. A powerful AST framework must provide tools that address test planning, test design, test construction, test execution, test results verification, and test reporting [109]. According to the Automated Testing Institute2 (ATI), there are three different kinds of AST frameworks: 1st, 2nd, and 3rd generation frameworks.
The 1st generation framework is primarily comprised of the linear approach to automated test development. This approach typically yields a one‐dimensional set of automated tests in which each automated test is treated simply as an extension of its manual counterpart. Driven mostly by the use of Record & Playback (R&P), all components executed by a linear script largely exist within the body of that script. There is little to no modularity, reusability, or any other quality attribute considered in the creation of linear scripts, and there are no calls to external modules or external data. Linear scripts may nevertheless be useful in environments with a very small scope.
Two kinds of frameworks fit into the 2nd generation: the data‐driven framework and the functional decomposition framework. Frameworks built on data‐driven scripting are similar to linear scripts; the difference is how the data is handled. The data used in data‐driven scripts is typically stored in a database or file external to the script.
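The contrast with a linear script can be illustrated with a short Ruby sketch: the test logic is written once, and the test data lives outside it (inlined here as CSV only for brevity; a real data-driven framework would read a file or database). The SUT, a hypothetical discount function, is purely illustrative.

```ruby
require 'csv'

# Hypothetical SUT: orders of 100 or more get a 10% discount.
def discounted_price(total)
  total >= 100 ? total * 0.9 : total
end

# External data source (inlined for the sketch; normally a file or DB).
TEST_DATA = <<~CSV
  input,expected
  50,50
  100,90
  200,180
CSV

# One generic script iterates over all data rows: adding a new test
# means adding a data row, not writing another script.
def run_data_driven_tests
  CSV.parse(TEST_DATA, headers: true).map do |row|
    discounted_price(row['input'].to_f) == row['expected'].to_f ? :pass : :fail
  end
end
```

Running `run_data_driven_tests` exercises the SUT once per data row and returns one verdict per row.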
Functional decomposition refers to the process of producing modular components (user‐defined functions) in such a way that automated test scripts can be constructed to achieve a testing objective largely by combining these existing components.

2 http://www.automatedtestinginstitute.com/

The 3rd generation frameworks are the most elaborate ones. They require proficiency in the automation method being used to develop the framework. The two frameworks that fit into this generation are the Keyword‐driven and Model‐based frameworks. Keyword‐driven frameworks (often called "table‐driven") process automated tests that are developed in data tables with a vocabulary of keywords ("action" words) that are independent of the automated test tool used to execute them. The keywords are associated with application‐specific and application‐independent functions and scripts that interpret the keyword data tables along with their application‐specific data parameters. The automated scripts execute the interpreted statements on the SUT. Model‐based frameworks (often called "intelligent frameworks") go beyond creating automated tests that are executed by the tool. These frameworks are typically "given" information about the application, and the framework "creates" and executes tests in a semi‐intelligent manner. Test automators describe the features of an application, typically through state models that depict the basic actions that may be performed on the application, as well as the broad expected reactions. Armed with this information, the framework dynamically implements tests on the application.
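A keyword-driven framework can be sketched as a small interpreter that maps action words to driver code, so that the test itself is pure data. Everything in the following Ruby sketch (the keywords, the engine, the in-memory "form" standing in for a GUI) is hypothetical; a real framework would dispatch these keywords to a GUI- or web-driving tool.

```ruby
# Keyword-driven sketch: tests are data tables whose rows use a
# vocabulary of action words, interpreted by a small engine.
class KeywordEngine
  def initialize
    @fields = {}     # in-memory "form" standing in for the SUT's GUI
    @results = []
  end

  # Map each keyword to application-independent driver code.
  def run(table)
    table.each do |keyword, *args|
      case keyword
      when 'enter'  then @fields[args[0]] = args[1]
      when 'clear'  then @fields.delete(args[0])
      when 'verify' then @results << (@fields[args[0]] == args[1] ? :pass : :fail)
      else raise "Unknown keyword: #{keyword}"
      end
    end
    @results
  end
end

# The test itself is pure data: no scripting knowledge needed to write it.
TEST_TABLE = [
  ['enter',  'username', 'boni'],
  ['verify', 'username', 'boni'],
  ['clear',  'username'],
  ['verify', 'username', 'boni']
]
```

Here `KeywordEngine.new.run(TEST_TABLE)` produces `[:pass, :fail]`: the first verification succeeds, and the second fails because the field was cleared.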
2.5.5. AST Frameworks for Web Applications

Following the classification of frameworks depicted before, the following table summarizes some of the most significant AST frameworks for web applications to date. Each of these frameworks is analysed in the following sub‐sections.
Table 4. Automated Software Testing Frameworks for Web Applications

Framework | Generation | Creator | License | Operative System
SOATest3 | 2nd and 3rd | Parasoft | Proprietary | Windows, Linux, Solaris
HP Quality Center4 | 2nd and 3rd | HP | Proprietary | Windows, Linux, Solaris, AIX, HP‐UX
IBM Software Quality Management5 | 2nd and 3rd | IBM | Proprietary | Windows, Linux, Solaris, AIX, Z/OS
Selenium6 | 1st | ThoughtWorks | Apache | Cross‐Platform
Silk7 | 2nd and 3rd | Micro Focus | Proprietary | Windows, Red Hat Enterprise, Solaris
STAF8 | 3rd | IBM | EPL | Cross‐Platform
TestComplete9 | 3rd | AutomatedQA | Proprietary | Windows
WATIR10 | 1st | Bret Pettichord and Paul Rogers | BSD | Cross‐Platform
3 http://www.parasoft.com/jsp/products/soatest.jsp?itemId=101
4 https://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=bto&cp=1‐11‐127‐24_4000_100__
5 http://www‐142.ibm.com/software/products/us/en/subcategory/rational/SW730
6 http://seleniumhq.org/
7 http://microfocus.com/products/silk/index.aspx
8 http://staf.sourceforge.net/
9 http://www.automatedqa.com/testcomplete/
10 http://watir.com/
2.5.5.1. SOATest Parasoft SOAtest is a quality platform that automates web application testing, message/protocol testing, cloud testing, security testing, and behaviour virtualization. Parasoft SOAtest is packaged together with Parasoft Load Test, and they can be integrated with Parasoft language products such as JTest, to help teams prevent and detect application‐layer defects from the start of the Systems Development Life Cycle (SDLC).
2.5.5.2. HP Quality Center

HP Quality Center is a software quality tool suite. Many of the tools of the suite were acquired from Mercury Interactive Corporation. It offers QA capabilities including requirements management, test management and business process testing for IT and application environments. The products of this suite are the following:
‐ HP Business Process Testing for Oracle/SAP: System for defining and executing business‐centric test automation.
‐ HP Center Management for Quality Center: Project management of QA workflows.
‐ HP Change Impact Testing for SAP Applications: Recommendations on SAP testing priorities.
‐ HP Functional Testing: Complete automated testing solution for functional, GUI and regression testing.
‐ HP Quality Center: Web‐based application that supports all aspects of test management.
‐ HP QuickTest Professional: Provides functional and regression testing automation for major software application environments.
‐ HP Requirements Management: Captures, manages and tracks requirements at every step of the application development and testing process.
‐ HP Service Test Management: Automates QA and test assets for any application component or service in Service‐Oriented Architectures (SOA).
‐ HP Service Test: Simplifies and automates the functional testing of SOA services.
2.5.5.3. IBM Software Quality Management

IBM Rational Quality Management is a family of products which helps to deliver enduring quality throughout the product and application lifecycle. The tools in this suite are the following:
‐ Rational Application Performance Analyzer: Pinpoints and helps understand the root cause of actual bottlenecks in the application.
‐ Rational AppScan product line: Static and dynamic security testing in all stages of application development.
‐ Rational Functional Tester: Automated functional testing of Java, Web and VS.NET WinForm‐based applications.
‐ Rational Functional Tester Plus: Functional and regression testing solution covering a wide variety of software applications.
‐ Rational Performance Tester: Verifies acceptable application response time and scalability under variable multi‐user loads.
‐ Rational Policy Tester OnDemand Privacy, Quality and Accessibility Edition: Web‐based, multi‐user solution providing centralized scanning of web content for accessibility, privacy, and quality compliance.
‐ Rational Professional Bundle: Provides enterprise desktop tools to design, construct, and test J2EE/Portal/Service‐oriented applications.
‐ Rational Purify: Dynamic software analysis tool for Windows/Linux/UNIX application development.
‐ Rational Quality Manager product line: Web‐based centralized test management environment.
‐ Rational Robot: General purpose test automation tool for client/server applications.
‐ Rational Service Tester for SOA Quality: A regression and functional testing solution for testing GUI‐less services.
‐ Rational Software Analyzer: Provides capabilities to ensure reliable, quality code.
‐ Rational Test Lab Manager: Builds and configures test environments, providing inventory control and analytics.
‐ Rational Test RealTime: Helps identify and resolve issues early in the development cycle.
2.5.5.4. Selenium

Selenium is a testing framework for web applications. Selenium was first developed by a team of programmers at ThoughtWorks (an IT consultancy). It has been released under the Apache 2.0 license. According to the Selenium website, Selenium is composed of different projects. The following table summarizes these projects:
Table 5. Selenium Projects

Component | Description
Selenium IDE | Firefox add‐on that makes it possible to record and play back tests in Firefox 2+.
Selenium Remote Control | Selenium RC is a client/server system that allows you to control web browsers locally or on other computers, using almost any programming language and testing framework.
Selenium Grid | It allows Selenium RC to run tests on many servers at the same time.
Selenium Core | It is the original JavaScript‐based testing system.
Selenium on Rails | It provides a suite to run Selenium tests for Rails applications.
Selenium on Ruby | It is the hub for newer Ruby‐related Selenium projects (work in progress).
CubicTest | Graphical Eclipse plug‐in to write Selenium and Watir (Web Application Testing in Ruby) tests.
Bromine | Web‐based QA tool that enables running and reporting Selenium tests.
Selenium Core tests run directly in a browser, just as real users do. They run in Internet Explorer, Mozilla and Firefox on Windows, Linux, and Macintosh. Written in pure JavaScript/DHTML, Selenium Core allows the tests to run in any supported browser on the client side, which is the mechanism that enables it to run on so many platforms. Selenium Remote Control (RC) is a web server written in Java that accepts HTTP commands. RC makes it possible to write automated tests for a web application in any programming language, which allows a better integration of Selenium into existing unit test frameworks. To make writing tests easier, the Selenium project currently provides client drivers for Python, Ruby, .NET, Perl, Java, and PHP. The Java driver can also be used with JavaScript (via the Rhino engine).
2.5.5.5. Silk

Silk is an automated software quality management solution. It ensures that developed applications are reliable and meet the needs of business users by automating the testing process, preventing or discovering quality issues early in the development cycle. The following products compose the Silk tool suite:
‐ SilkPerformer: Automated software load, stress and performance testing.
‐ SilkPerformer Diagnostics: Accelerates the resolution of performance problems found.
‐ SilkCentral Test Manager: Automated test management solution that can manage agile or traditional test cycles.
‐ SilkTest: Automation tool for testing the functionality of enterprise applications, with support for Web 2.0 applications.
‐ Silk4Net and Silk4J: IDE support for Eclipse or Visual Studio.
‐ TestPartner: Automated functional and GUI testing.
‐ DataExpress: Automated test data generation and management.
2.5.5.6. STAF STAF (Software Testing Automation Framework) creates and manages automated test cases and test environments. It externalizes its capabilities through services. STAF Proc is the process that runs on a machine, called a STAF Client, which accepts requests and routes them to the appropriate service. These requests may come from the local machine or from another STAF Client. Thus, STAF works in a peer environment, where machines may make requests of services on other machines.
2.5.5.7. TestComplete

TestComplete is an automated testing tool in which tests can be recorded, manually scripted or created with keyword operations, and then used for automated playback and error logging. It is used for testing applications such as web, Windows, Flash, .NET and Java applications. It automates front‐end UI/functional testing as well as back‐end testing such as database and HTTP load testing.
2.5.5.8. WATIR

Watir (pronounced water) stands for "Web Application Testing in Ruby". It is an automated test tool that uses the Ruby scripting language to drive the web browser. Watir is a toolkit for developing and running automated tests against a web browser. The following example is a very simple Watir script that navigates to google.com, performs a search, and validates the results:
Snippet 1. Watir Script Example

require 'watir'

test_site = 'http://www.google.com'
b = Watir::Browser.new
b.goto(test_site)
b.text_field(:name, "q").set("pickaxe")
b.button(:name, "btnG").click
if b.text.include?("Programming Ruby")
  puts "Test Passed. Found the test string: 'Programming Ruby'."
else
  puts "Test Failed! Could not find: 'Programming Ruby'"
end
2.6. Summary
Quality control (V&V) is an important topic within SE. It can be divided into two important groups of activities: testing and analysis. On one hand, the nature of testing is dynamic, since it involves exercising an application and observing its outcomes. On the other hand, the nature of analysis is static, since it involves the evaluation of software artefacts (typically source code, but it is also possible to analyse specifications, designs, models, and so on) without executing them. Both activities are very important to ensure the quality of a software product.
Regarding (static) analysis, there are several techniques reported in the literature. Inspections are examinations of software artefacts by human inspectors aimed at discovering faults in software systems. Review is the process in which a group of people examine the software looking for potential problems. Automated Software Analysis (ASA) assesses the source code using patterns that are known to be potentially dangerous. Finally, formal methods refer to any activities that rely on mathematical representations of software, including formal specification and verification.
Regarding (dynamic) testing, it covers a wide spectrum of different concepts, such as testing levels (unit, integration, system, and so on), testing strategies (black‐box, white‐box, grey‐box, and non‐functional testing), and testing processes (manual, model‐based, automated testing, and so on). Web testing aims to find defects in web applications. Automated Software Testing (AST) can be seen as the application of software technology to the STL with the goal of improving the effectiveness of testing. AST involves several aspects, such as test case generation, test data derivation, and automated oracles. AST is most effective when implemented within a framework. By analysing the existing proposals to perform automated assessment for web applications, it is clear that a lot of work has been done in this field.
Nevertheless, these achievements are usually scattered or incomplete, since they do not involve AST and ASA at the same time. Therefore, I conclude that there is still room for improvement, and this dissertation is going to make original contributions in this domain.
Chapter 3. Objectives
We can only see a short distance ahead, but we can see plenty there that needs to be done.
‐ Alan Turing
As is clear from the study of the state‐of‐the‐art, quality control activities are crucial in software development but also time‐consuming. Nowadays, the software business must be responsive, i.e. it should change and adapt very quickly to external demands. The delivery challenge looks for shortening delivery times for large and complex systems without compromising software quality. The automation of quality control (which is the major topic of this PhD dissertation) has been proposed as a solution to this challenge.
After the analysis of the state‐of‐the‐art in this area, I concluded that there was still room for improvement in this field, since there is no complete solution which addresses at once the automation of testing and analysis activities for web applications. Therefore, the overall objective of this dissertation is to investigate and improve the current processes and mechanisms that support the automation of quality control (testing and analysis) for web applications. The outcome of the work to be done will facilitate the improvement of software quality for web applications while reducing the time to market and saving total costs of development.
In order to divide the main aim of this dissertation into several specific objectives, I am going to rely on Pressman's definition of SE. Pressman proposed a four‐layer approach to define any engineering approach [122], such as SE. This approach is illustrated in Figure 12 in the form of a pyramid. At the bottom of the structure, SE must rest on an organizational commitment to quality. According to Pressman, "The bedrock that supports software engineering is a quality focus". The following foundation for SE is the process layer, which is a collection of activities, actions and tasks that are performed when some work product is created. Process defines a framework established for the effective delivery of SE technology. The next layer is SE methods, which provide the technical how‐to for building software.
In other words, methods establish the way of solving a problem. Finally, SE tools provide the practical support for the process and methods.
(Figure: pyramid with layers, from top to bottom: Tools, Methods, Processes, resting on a Quality base)
Figure 12. Software Engineering Layers

Therefore, using the SE layer decomposition described before, the main objective of this dissertation may be divided into the following set of specific goals:

1. To propose a complete methodology for the automation of software quality control for web applications. This objective has to do with the design of a complete approach to achieve the automation of software testing and static analysis within the development lifecycle of web applications. It corresponds to the bottom layers of the SE pyramid depicted before, i.e. quality and processes. To achieve this goal, firstly the high‐level quality attributes to be ensured have to be defined in order to guide the V&V processes. For each of these quality attributes, specific software testing and/or static analysis techniques will be selected. The next step in this methodology will be to define the process to perform these quality control activities in an automated way. The automated quality control to be proposed will be carried out during the development lifecycle of the web application in question. Hence, the methodology should reuse software development artefacts from the analysis and design phases (such as requirements or models) in order to guide the automation of testing and analysis activities as far as possible.

2. To analyse the challenges and potential problems of automated software testing for web applications. This goal should establish how the automation of software testing will be performed, i.e. the method. As presented in the state‐of‐the‐art section, software testing is a broad term encompassing a wide spectrum of different activities. The methods to be proposed should define how testing activities will be carried out in order to achieve greater automation. Therefore, the definition and selection of the testing levels (unit, integration, and system) and strategies (functional, non‐functional, and structural testing) will be carried out to achieve this goal.
Software testing is an important topic in this dissertation since it is the most commonly performed activity within V&V. Once the high‐level processes for automated software testing have been defined in the methodology, specific automated software testing methods have to be defined.
Due to their peculiarities, web applications are difficult to test. Therefore, specific contributions have to be made in order to achieve automated software testing for web applications, concretely in test case (test data, fixture, and oracle) generation, test execution, and test reporting.

3. To propose a detailed model to perform automated analysis for web applications. Many authors have highlighted that software testing should be done in conjunction with static analysis for good quality control assurance. Therefore, this dissertation should study and select the most suitable automated static analysis techniques to be carried out for web applications. Moreover, static analysis presents several advantages over software testing, since it considers broader quality attributes and it is not concerned with interaction between errors. Thus, this contribution should be aware of this situation, making the most of automated static analysis for specific features of web applications. Therefore, and following the guidelines depicted in the methodology, this contribution should define precise methods to achieve this kind of V&V specifically for web applications.

4. To validate the feasibility of the research approach by developing a reference architecture for the proposed methodology. The reference architecture provides further details on the elements included in the methodology. This architecture will define or select specific tools which implement the proposed methods for testing and analysis. A working prototype will be developed and validated against a set of representative case studies to verify that it addresses the objectives presented in this dissertation. These case studies will be performed using real web applications. In addition, as a major part of the validation, this dissertation contributes to international research projects as well as open‐source communities aligned with the context and objectives of this work.
Chapter 4. Methodology Foundations
The real voyage of discovery consists not in seeking new lands but seeing with new eyes.
‐ Marcel Proust
This section describes the high‐level view of the proposal to automate quality control for web applications. First, I am going to describe precisely how the target of the proposed approach (i.e. web applications) is understood in the scope of the dissertation. Second, the generic approach to automate quality control (testing and analysis) for web applications will be depicted. Third, the quality dimensions to be assessed with V&V activities will be selected and explained. Finally, the process to guide the automation of quality control will be described.
4.1. Web Applications

Web applications follow a client‐server application protocol. The web client (using a web browser, such as Internet Explorer, Opera, Safari, Firefox or Chrome) sends an HTTP request through a TCP/IP network (typically the Internet) to a web server. The server receives this request and determines the page, which usually contains some scripting language to connect with a database server. A middleware component connects the web server with the database to issue the query and get the requested data. This data is used to generate an HTML page, which is sent back to the client in the form of an HTTP response. This typical architecture for web applications is illustrated in the following picture:
Figure 13. Typical Web Applications Architecture
Nowadays, more and more web applications are not limited to the synchronous interaction of HTTP requests and responses. Using the group of technologies called AJAX (Asynchronous JavaScript and XML), web applications can send requests and retrieve responses asynchronously. This interaction with the web server is done without interfering with the displayed web page [22]. The XMLHttpRequest object is the core of AJAX. It is an API that can be used by JavaScript, JScript, VBScript and any other scripting language to transfer and manipulate data between the server and the client using HTTP. The original concept behind XMLHttpRequest (called XMLHTTP) was initially developed by Microsoft as part of Outlook Web Access 2000. In 2010, the W3C released the final draft specification for the XMLHttpRequest object to create an official web standard11.
Therefore, web applications involve heterogeneous technologies, components (browsers, servers, and databases), programming languages, networking aspects, and so on. Thus, global quality control for these kinds of applications is a very complex endeavour, and the automation of such activities is even harder. All in all, it is not possible to cover every aspect of the web chain in a single PhD dissertation. Hence, the focus of this piece of research will be web applications from the client‐side view. This choice is based on the fact that the client‐side view of a web application is the real key differentiator for such applications. If the client side and the HTTP communication are dropped from the picture, the resulting system is essentially a normal application with some business logic, a database, and so on. These kinds of applications can be tested and analysed using traditional testing approaches. In addition, the quality in use of web applications is perceived on the client side.
According to ISO 9126, quality in use can be considered the highest level of quality, since it is experienced by the consumers during operation and maintenance, and it is influenced by external and internal quality (see Figure 5 in section 2.1.2.1.1). A key aspect of the quality in use of web applications is the number of defects detected on the client side. Therefore, the final aim of quality control should be minimizing such defects (defect reduction). Faults in web applications are caused by errors made by human developers. Automated fault detection is intended to minimize the number of failures in a web system, and therefore the number of incidences that final consumers (users/customers) notice in the application. This defect chain is illustrated in the following picture:
11 http://www.w3.org/TR/XMLHttpRequest/
Figure 14. Software Defects in Context

These defects are mainly injected in the design and coding phases of the overall software development lifecycle, which account for 90% of the inserted faults [31]. The cost of correcting faults when the system is in operation increases exponentially [18][122]. These facts are illustrated in the following chart:
Figure 15. Fault Origin/Detection Distribution and Cost

4.2. Automated Quality Control Activities
Quality control activities are an important part of any Software Development Lifecycle (SDLC). As depicted in the state‐of‐the‐art section, quality control activities can be divided into two big groups: (dynamic) testing and (static) analysis. This section describes how these activities will be automated for (client‐side) web applications, the target of the generic approach proposed in this dissertation.
4.2.1. Automated Software Testing

Planning, design and execution of testing activities are carried out throughout the software development process. These activities are divided into phases as a testing procedure, illustrated in Figure 16 and described as follows [111][47]:

1. Test requirements. The aim of this step is to define the features that the software artefact performing the tests must satisfy or cover. These requirements are sometimes compiled in a document named test plan (sometimes known as test specification). A test plan provides the set of ideas according to which the tests will be conducted, and includes the following features:
‐ Quality views or attributes are constraints on the services or functions offered by the system, that is, non‐functional requirements. It is not possible for any system to be optimized for all of these attributes; for example, improving robustness may lead to a loss of performance. The test plan should therefore define the most important quality attributes for the software that is being developed [132].
‐ The test goal or objective is the intention or purpose of the testing activities. There are two distinct testing objectives [132]: i) to demonstrate to the developer and the customer that the software meets its requirements (verification); ii) to discover situations in which the behaviour of the software is incorrect or does not conform to its specification (defect testing). There is no defined boundary between these two approaches: during verification testing defects can be found, and during defect testing some of the tests can show whether or not the SUT meets its requirements.
‐ The test process is the description of the steps in the lifecycle, performed according to procedures that have been approved in conformance with the QA plan adopted by the developing organization.

2. Test design. This stage produces the description of test cases according to the test plan. In some cases, the test design is described in a document called a test model, although it can also be depicted in the test plan. A test case (or simply test) is a procedure, whether manually executed or automated, that can be used to verify that the SUT is behaving as expected [101]. A collection of test cases running together is known as a test suite. A test case prepared in a form to be executed on the SUT and produce a report is known as a test script [5].
The following features should be defined for each test case:
‐ The first part of the test design should be identifying the elements to be assessed. The test level sets the scale of the piece of code under test, or where test cases are added in the software development process, for example unit, integration, or system testing.
‐ The test strategy (also known as method or approach) is the point of view of the test case, namely: black‐box (functional), white‐box (structural), or non‐functional (e.g. performance, security, usability, reliability testing, among others, depending on the non‐functional requirement to be checked). The combination of the black‐box and white‐box approaches is usually known as grey‐box testing.
‐ Testing activities should be driven by Computer‐Aided Software Engineering (CASE) programs, i.e. software testing tools. In the test design phase the test tools to be employed should be selected.
3. Test implementation. This stage instantiates each of the designed test cases. There are two possible strategies to select test cases [132]: i) Partition testing, where testers identify groups of inputs that have common characteristics. ii) Guideline‐based testing, where experience is used to choose test cases. Each selected test case should include:
‐ Test logic, the part of the test case that implements the test design using a programming language. The test fixture (also known as test context) is the part of the test logic which ensures that there is a controlled, stable environment in which tests are run so that results are repeatable. A test fixture first creates the desired state prior to test execution (setup) and then cleans up after the test execution (teardown).
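As an illustration of the fixture concept, the following sketch (using Python's unittest, with hypothetical file contents not taken from the dissertation) shows a setup that creates a controlled state before each test and a teardown that removes it, so results are repeatable:

```python
import os
import tempfile
import unittest

class TemperatureLogTest(unittest.TestCase):
    """A test with a fixture: setUp builds a controlled state, tearDown cleans it."""

    def setUp(self):
        # Setup: create a fresh, known log file before every test (repeatable state).
        self.directory = tempfile.mkdtemp()
        self.path = os.path.join(self.directory, "log.txt")
        with open(self.path, "w") as f:
            f.write("20\n21\n19\n")

    def tearDown(self):
        # Teardown: remove the state so the next test starts clean.
        os.remove(self.path)
        os.rmdir(self.directory)

    def test_average(self):
        # Test logic: exercise the code under test against the fixture data.
        with open(self.path) as f:
            values = [int(line) for line in f]
        self.assertEqual(sum(values) / len(values), 20)
```

Because each test gets its own fresh state, tests may run in any order without interfering with each other.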
‐ 50 ‐ PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
‐ Test data, i.e. inputs for test cases. It can be based on the requirements specification, the source code, or the tester’s expectations. Test inputs are selected depending on the test goal.
‐ Test oracle, which is a mechanism used to determine whether a test has passed or failed. It could be a program entity within the test case, a process, or a human expert. The test oracle within a test case is the code that decides success or failure for that test data [145]. A test oracle has two different parts, namely oracle information (the expected output of the program for the selected input) and oracle procedure (a comparator to verify actual results) [146]. The expected output is a complex entity that may include the following: i) Values produced by the program, such as outputs for local observation (integer, text, audio, image) or messages for remote storage, manipulation, or observation. ii) State changes, such as a state change of the program or of the database (due to add, delete, and update operations). The complexity of the comparison depends on the complexity of the data to be observed.
4. Test execution, i.e. performing the actual tests.
‐ Test cases exercise the SUT, i.e. the program is executed and its actual outcome is observed.
‐ Using the test oracle, a test verdict is assigned to the test case execution. There are three major kinds of test verdicts: i) Pass: the program produces the expected outcome and the purpose of the test case is satisfied. ii) Fail: the program does not produce the expected outcome. iii) Inconclusive: in some cases it may not be possible to assign a clear pass or fail verdict, for example if a timeout occurs while executing a test case on a distributed application. An inconclusive verdict means that further tests are needed to refine the verdict.
‐ A test report must be written after analysing the test results. The motivation for writing a test report is to get the found defects fixed.
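The split between oracle information and oracle procedure, and the three verdicts, can be sketched as follows (the names and example outputs are illustrative, not taken from the dissertation's tooling):

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"

def oracle_procedure(actual, expected, timed_out=False):
    """Comparator: checks the actual output against the oracle information."""
    if timed_out:
        # e.g. a timeout on a distributed application: no clear pass/fail verdict.
        return Verdict.INCONCLUSIVE
    return Verdict.PASS if actual == expected else Verdict.FAIL

# Oracle information: the expected output for the selected test data.
expected_output = "Welcome, Alice"
print(oracle_procedure("Welcome, Alice", expected_output))      # Verdict.PASS
print(oracle_procedure("Error 500", expected_output))           # Verdict.FAIL
print(oracle_procedure(None, expected_output, timed_out=True))  # Verdict.INCONCLUSIVE
```

For complex outputs (images, database state) the comparator would be correspondingly more elaborate, but the verdict structure stays the same.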
Figure 16. Generic Testing Activities
Having established the testing activities, I now describe how to automate these steps. The aim of the requirements stage is to define the test plan, and it is essentially manual. The testing requirements depend directly on the system requirements (functional and non‐functional). In order to achieve automation, I propose to reuse the development requirements as testing requirements. Nevertheless, the specific quality views, test goal and process for testing web
applications should be defined. The test goal is two‐fold: verification testing (ensuring requirements are met) and defect testing (seeking faults). The following sub‐sections describe the quality views to be assessed and the generic process to achieve this aim. The design step has to do with the test model which supports the test plan, and it also requires the intervention of human testers, who should specify the test requirements, the test strategy and the test levels to be carried out. Once the test plan and model are defined, the instantiation of the test model will create test cases (implementation). This part can be effectively automated. This derivation can be divided into the following steps: i) test logic generation; ii) test data generation; iii) test oracle generation. Finally, the execution and report steps are usually automated, since these steps are carried out by testing tools. The automation of test design, implementation and execution is depicted in sections 5, 6 and 7 of this dissertation.
4.2.2. Automated Software Analysis
Besides AST, the second activity to achieve automated quality control is Automated Software Analysis (ASA). The difference is that ASA does not involve the execution of the SUT; instead, it examines the source code statically. ASA is based on the use of a set of rules to guide the analysis of the source code. These rules can be divided into the following groups:
‐ Best practices are generally accepted techniques, methods, or processes that have proven to find faults over time.
‐ Patterns are reusable solutions to recurring problems.
‐ Assumptions are conjectures about the correct behaviour of a software component.
‐ Bad smells are undesirable symptoms within the source code.
‐ Fault descriptions are representations of problematic issues within the source code.
Basic static analysers run simple text‐based searches for strings and patterns in source code files, recursively analysing the code base for faults and then generating an analysis report. More modern static analysis tools trace the data’s path through the code to provide a more complete and accurate analysis [71]. This behaviour is illustrated in the following picture:
Figure 17. Generic Analysis Activities
This process is automated by definition: the analyser employs pre‐established rules which guide the analysis, performed by a code scanner which examines the source code looking for violations of these rules. As a result, an analysis report is generated with the faults found in the source code.
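A basic text-based analyser of the kind just described can be sketched in a few lines; the two rules below (an empty catch block and a leftover TODO) are hypothetical stand-ins for the bad smells and fault descriptions discussed above:

```python
import re

# Hypothetical rule set: each rule is a (name, pattern) pair standing in for
# the bad smells / fault descriptions that guide the analysis.
RULES = [
    ("empty-catch", re.compile(r"catch\s*\([^)]*\)\s*\{\s*\}")),
    ("todo-left-in-code", re.compile(r"//\s*TODO")),
]

def analyse(source):
    """Basic text-based scanner: reports (line number, rule name) per match."""
    report = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES:
            if pattern.search(line):
                report.append((number, name))
    return report

code = 'try { run(); } catch (Exception e) {}\n// TODO fix later\n'
print(analyse(code))  # [(1, 'empty-catch'), (2, 'todo-left-in-code')]
```

A dataflow-aware analyser would replace the per-line regular expressions with an analysis over the parsed program, but the rule-driven scan-and-report structure is the same.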
4.3. Quality Views
Industry is going through a revolution in what software quality means for the success of software products [115]. Quality is incorporated into a web application as a consequence of good methodology, design and implementation. As depicted in the state‐of‐the‐art section, ensuring the quality of a web application involves the assessment of the required functionality, but also the assessment of the non‐functional requirements (quality attributes). The most important non‐functional requirements for web applications have been identified in [34] and are summarized in Table 2 in section 2.4.3. Therefore, the quality dimensions for web applications to be ensured in this dissertation are: functionality (functional requirements), and performance, security, compatibility, usability, and accessibility (non‐functional requirements). Both automated testing and analysis can be used to assess these quality attributes. The following sub‐sections identify the most suitable quality control activities (testing and/or analysis) to assess each of the selected quality attributes.
4.3.1. Functionality
Functionality is evaluated to ensure conformance to customer requirements. In the case of the client‐side of web applications, the key aspect to ensure functionality is web navigation, which plays a key role in the overall web experience. The act of navigating from one page to another by means of web links is known as browsing. To automate web navigation, that is, web browsing, the structure of the web should be identified. In web navigation, users typically interact with data forms. These forms are used to submit input data to the server for processing. Forms are composed of the following kinds of fields:
‐ Text fields. These elements allow the user to input a single line of text.
‐ Textarea fields. These elements allow the user to input multiple rows of text data.
‐ Checkbox buttons, multiple selection elements. These buttons are usually shown on screen as square boxes that can contain a white space (unselected) or a tick mark or square (selected).
‐ Radio buttons, single selection elements. These buttons are usually shown on screen as circular holes that can contain a white space (unselected) or a dot (selected).
‐ Select fields. These elements allow the user to choose one or more values from a list.
‐ File fields. These controls allow the user to select a local file and upload it to the web server.
‐ Buttons. These controls provide a way to trigger events. Some special types of buttons in web forms are the reset button (used to clear the form) and the submit button (used to take an action, typically sending the form to the web server).
The input data is processed by the server, and as a result, some output data is returned to the client. This output information can be shown to the user using forms too, although this option is not very common. It is usually displayed using the HTML capabilities to render information using document body elements.
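To make the form concept concrete, the following sketch shows how input values for these field types would be encoded into an HTTP request body before submission; all field names and values are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical values for the field types described above.
form_data = {
    "name": "Alice",              # text field
    "comments": "first\nsecond",  # textarea (multiple rows)
    "newsletter": "on",           # checkbox, selected
    "gender": "female",           # radio button, single selection
    "country": "ES",              # select field
}

# A browser encodes the form like this before submitting it to the server.
body = urlencode(form_data)
print(body)
```

An automated test that exercises a form ultimately produces a request body of this shape, whether it is built by hand or by driving a real browser.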
In the communication between clients and web servers, the concept of the web session is very important. HTTP is a stateless protocol, so the server does not retain information about the status of the
clients during multiple requests. HTTP cookies are employed to implement web sessions. A cookie is an object created by a server‐side program and stored at the client (typically, in the disk cache of the browser) [127]. Cookies are used by the server‐side program to store and retrieve state information associated with the client, i.e. to keep the web session. All in all, the automation of web navigation is carried out by exercising the SUT using real web browsers, where input/output data and the session are managed. Therefore, automated testing will be carried out to assess the functionality of web applications in this dissertation.
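A minimal sketch of the session mechanism, using Python's standard cookie parser (the cookie name and value are hypothetical):

```python
from http.cookies import SimpleCookie

# The server emits a Set-Cookie header to create the session...
header = "sessionid=abc123; Path=/; HttpOnly"
cookie = SimpleCookie()
cookie.load(header)

# ...and the client stores the value and sends it back on every later request,
# letting the stateless HTTP server recognise the same session.
session_id = cookie["sessionid"].value
print(session_id)  # abc123
```

An automated browsing tool must preserve this round trip; otherwise every request looks like a new, anonymous visitor to the server.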
4.3.2. Performance
Web performance is critical because users do not like to wait too long for a response to their requests. Web performance testing should be considered a continuous activity to be carried out in order to tune the system adequately [34]. Effective performance testing cannot be carried out without automated test processes: there is no practical way to provide reliable, repeatable performance tests without some form of automation. Therefore, performance testing should be part of this dissertation. The aim of automated performance testing is to simplify the performance testing process. This is normally achieved by providing the ability to record end‐user activity and to render this data as scripts. After that, these scripts are used to create load testing scenarios which perform the actual performance tests. Therefore, the testing R&P process can be used to automate the performance testing process, since recorded scripts can easily be rerun on demand (playback).
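The record-and-playback idea for performance can be sketched as replaying a recorded script while timing each step; in this sketch the "recorded" steps are stand-in callables rather than real browser actions or HTTP requests:

```python
import time

def measure(action):
    """Time a single playback step; returns (result, elapsed seconds)."""
    start = time.monotonic()
    result = action()
    return result, time.monotonic() - start

def playback(script):
    """Replay a recorded script: a list of (step name, action) pairs, where each
    action would drive a browser or issue an HTTP request. Returns timings."""
    report = {}
    for name, action in script:
        _, elapsed = measure(action)
        report[name] = elapsed
    return report

# Hypothetical recorded session; real steps would exercise the web application.
script = [("home", lambda: time.sleep(0.01)), ("login", lambda: time.sleep(0.02))]
report = playback(script)
print(sorted(report))  # ['home', 'login']
```

A load testing scenario is then just many concurrent playbacks of such scripts, with the per-step timings aggregated into response-time statistics.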
4.3.3. Security
Web security assessment should provide evidence that a web application is protected against hostile attacks and malicious inputs. The heterogeneous nature of the Web, together with the very large number of possible users, makes web applications more vulnerable than traditional ones and security assessment more difficult to accomplish. There are several approaches to carry out security assessment:
‐ Black‐box testing takes an external perspective of the tested object: the tester only knows the inputs and outputs of the application. It is a good approach to anticipate attacks, because it probes the application from the attacker's perspective.
‐ White‐box testing is a more exhaustive approach because it needs to look into the application code to find security weaknesses.
‐ Static analysis runs simple text‐based searches for vulnerability patterns in source code, recursively analysing the code base for security defects and then generating a report.
This dissertation is focused on the client‐side of web applications. Therefore, the business logic code is not available for assessment, so white‐box testing and static analysis are not suitable for automation in this context. Hence, the automation of black‐box security testing will be the technique employed.
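A black-box security probe can be sketched as submitting malicious inputs and scanning responses for tell-tale signatures; the payloads, signatures and the reflecting "server" below are illustrative only, not a real scanner:

```python
# Classic malicious inputs used as black-box probes (illustrative, not exhaustive).
PAYLOADS = ["' OR '1'='1", "<script>alert(1)</script>"]

# Response signatures whose presence suggests a vulnerability.
SIGNATURES = ["SQL syntax", "<script>alert(1)</script>"]

def probe(send_request):
    """Submit each payload through a request function and scan the responses."""
    findings = []
    for payload in PAYLOADS:
        response = send_request(payload)
        for signature in SIGNATURES:
            if signature in response:
                findings.append((payload, signature))
    return findings

# Hypothetical vulnerable server: it reflects input verbatim (an XSS symptom).
def fake_server(query):
    return "<html>Results for " + query + "</html>"

print(probe(fake_server))
```

Because `send_request` is a parameter, the same probe logic can be wired to a function that drives a real browser against the SUT, which is exactly the external, attacker's-eye perspective described above.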
4.3.4. Compatibility
As depicted in the state‐of‐the‐art section, compatibility assessment tries to uncover failures due to the usage of different web server platforms or client browsers. Since this dissertation is focused on the client‐side of web applications, compatibility assessment will be focused on client browsers.
Web compatibility on the client‐side is achieved by writing standard HTML/CSS which can be rendered by any browser. Therefore, the automation of compatibility assessment in this dissertation will be achieved by evaluating the standards compliance of these client‐side elements (HTML, CSS) by means of automated static analysis.
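One such automated static check can be sketched with Python's built-in HTML parser, flagging mismatched tags, a common source of cross-browser rendering differences; the rule set here is deliberately minimal and not a full validator:

```python
from html.parser import HTMLParser

# Elements that are legitimately left unclosed in HTML ("void" elements).
VOID = {"br", "img", "hr", "input", "meta", "link"}

class BalanceChecker(HTMLParser):
    """Static check for one compatibility symptom: mismatched start/end tags."""
    def __init__(self):
        super().__init__()
        self.stack, self.problems = [], []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)

def check_balance(html):
    checker = BalanceChecker()
    checker.feed(html)
    return checker.problems + ["unclosed <%s>" % t for t in checker.stack]

print(check_balance("<div><p>hello</div>"))
```

Real browsers silently repair such markup, each in its own way, which is precisely why flagging it statically helps cross-browser compatibility.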
4.3.5. Usability
Usability is defined as the degree to which users can perform a set of required tasks. Web applications have become a standard and cross‐platform means to connect people and do business on the Internet. Brink et al. claim that “high usability is a key factor in achieving maximum return on information technology investments” [9]. Therefore, web usability may determine the success of the application. As a consequence, the application front end and the way users interact with it need greater attention along the quality control process. There are different ways to assess web usability:
‐ Usability inspection involves a designer (or group of designers) evaluating the user interface of a web site based on general design principles or specific lists of guidelines.
‐ Group walkthroughs involve a group of stakeholders walking through common tasks on the web site. At each step of the task, the group identifies any issues in the design and tracks fixes that need to be made. It is very similar to a usability inspection but it is task‐oriented, and it often involves non‐designers.
‐ User testing involves observing users performing specific activities with the web site to identify what problems they have as they use the site. A special case of user testing is “hallway testing”12, which uses a group of users to test the usability of a web application.
‐ Static analysis employs rules for good design and heuristics to find potential usability issues.
Inspections, walkthroughs and user testing are manual processes that cannot be completely automated by definition. Therefore, static analysis is selected to perform automated usability assessment on the client‐side of web applications in this dissertation.
4.3.6. Accessibility
Web accessibility assessment evaluates how well web applications can be used by people with disabilities. Web accessibility evaluation combines different disciplines and skills. There are several scopes of evaluation, from individual web pages, collections of web pages, or whole web sites, to specific parts of web pages such as tables or images. Web accessibility assessment is closely related to the development process, since it is carried out with the purpose of improving or maintaining the web content. There are three main types of web accessibility assessment techniques [102]:
‐ Manual testing, which is carried out by human testers. The types of manual testing are as follows:
o Non‐technical evaluations, which are carried out by non‐technical evaluators such as content authors, e.g. to determine if the ALT‐attributes describe the purpose of the images appropriately or if the transcriptions for the multimedia content are correct.
12 http://www.useit.com/alertbox/20000319.html
o Technical checks, which are usually carried out by web developers, evaluating markup code and document structure as well as compatibility with specific technologies.
o Expert checks (walkthroughs), which are carried out by evaluators who have knowledge of how people with disabilities use the web and who can identify issues that relate to user interaction.
‐ User testing, which is carried out by real end‐users in informal or formal settings. In general, there are two modes of user testing:
o Informal checks, which can be carried out by non‐experts, for example by asking individual persons such as friends or colleagues for their opinions.
o Formal checks, which are usually carried out by professionals who follow well‐established usability procedures.
‐ Automated evaluation, which is carried out without the need for human intervention. There are the following types of automated evaluation:
o Syntactic checks. These consist of analysing the web application to ensure the correctness of the web content, such as checking the existence of ALT‐attributes in IMG elements or LANG‐attributes in the root HTML elements.
o Heuristic checks. These examine some of the semantics in the web content, such as the layout and markup or the natural language of the information.
o Indicative checks. These use statistical metrics and profiling techniques to estimate the performance of whole web sites or large collections of web content. These techniques are useful for large‐scale surveys, for example to monitor overall developments in the public sector of a country.
As with usability, the accessibility assessment techniques based on user and manual testing are predominantly manual. Therefore, automated accessibility evaluation based on static analysis is the best choice to be included in this dissertation.
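The syntactic checks named above (ALT on images, LANG on the root element) can be sketched directly with Python's built-in HTML parser; the message strings and sample page are illustrative:

```python
from html.parser import HTMLParser

class AccessibilityChecker(HTMLParser):
    """Syntactic accessibility checks: ALT on IMG, LANG on the HTML root."""
    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and not attrs.get("lang"):
            self.issues.append("HTML root without LANG attribute")
        if tag == "img" and not attrs.get("alt"):
            self.issues.append("IMG without ALT attribute")

def check_accessibility(page):
    checker = AccessibilityChecker()
    checker.feed(page)
    return checker.issues

print(check_accessibility('<html><body><img src="logo.png"></body></html>'))
```

Note that such a check can only establish that an ALT attribute exists; whether its text actually describes the image remains a manual, non-technical evaluation.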
4.4. Test Process
Once the quality dimensions are defined, and following the objectives described in chapter 3, this section establishes the generic process to automate the quality control activities for web applications on the client‐side. Following the guidelines described to assess the functional requirements (see section 4.3.1), the proposed process is based on the automated browsing of web applications. To perform this automation, it is necessary to model the navigation of a web site, and then divide that navigation into independent paths. Automation in testing and analysis demands not only time but also resources in terms of preparation and planning. Therefore, automated quality control should be an assisted process: in order to automate these activities some human interaction is required [94]. Thus, the first step in the generic process proposed in this dissertation is to establish the correct navigation structure of the web under test. This step should be done by the human testers or developers in charge of the quality control of the SUT. It will guide the automation process, since the right way of traversing a web application must be known beforehand. Quality control activities are primarily based on comparisons. As depicted in Figure 16, the test oracles must know the expected outcome prior to exercising the web system. This also applies to the proposed process
to automate the quality control of web applications based on navigation: the correct navigation should be established in order to know what is right and what is not. I propose the following ways of modelling the navigation (these models will be described in detail in the next section):
‐ UML models. UML is the de‐facto standard for modelling and design. Reusing such models for quality control is a way of saving time.
‐ XML files. I will propose a self‐defined XSD schema to model the navigation. Such models are useful in analysis and design, and these files will be richer than UML in the sense that they can contain test data and oracles.
‐ R&P scripts. This kind of input will be useful for finished web applications, or at least when the web application can be executed. Therefore, this kind of input is useful for the operation and maintenance stages of the SDLC.
Once the navigation structure is defined, I use graph theory to represent the defined web site navigation. Graph theory is the study of graphs in mathematics and computer science. A graph is the abstract representation of a set of vertices (nodes) connected by arcs (edges or links). A graph is a pair G=(V,E) of sets such that E ⊆ [V]², where the elements of V are the vertices/nodes and the elements of E are the edges/links. The usual way to picture a graph is by drawing a dot for each vertex and joining two of these dots by a line if the corresponding two vertices form an edge [15]. The following table shows definitions useful to understand the process to be defined.
Table 6. Graph Types
‐ A graph in which the edges have no orientation is known as an undirected graph.
‐ In a mixed graph some edges may be directed and some may be undirected.
‐ If the edges have orientation, the graph is known as a directed graph (or digraph) [9]. A digraph is acyclic if it has no cycle. A digraph is strongly connected (or, simply, strong) if every vertex is reachable from every other vertex, i.e. there is a path from each vertex in the graph to every other vertex.
‐ In a weighted graph a number is assigned to each edge. This number (weight) could represent costs, lengths and so on. A weighted digraph is known as a network. A flow is a network where each node has a capacity and each edge receives flow [20].
‐ A multigraph is a graph in which multiple edges (two or more edges that are incident to the same two vertices) and/or loops (edges that connect a vertex to itself) are permitted. If the multigraph is directed, then it is known as a multidigraph.
‐ A path P=(V,E) is a graph of the form V={v0,v1,…,vk}, E={e0,e1,…,ek−1} where the vertices vi are all distinct. If the start node is the same as the end node, then the path is known as a cycle. A walk is a path in which nodes or links may be repeated. A circuit is a closed walk. A trail is a path in which all the edges are distinct. In a Hamiltonian path each node is visited exactly once. In an Eulerian trail each edge is visited exactly once. An Eulerian circuit (or tour) is an Eulerian trail which starts and ends on the same vertex [44]. A Hamiltonian circuit (or tour) is a Hamiltonian path which starts and ends on the same vertex.
‐ A tree is a graph in which any two nodes are connected by exactly one path. In other words, a tree is a connected graph with no cycles. A forest is a graph with no cycles; in other words, a forest is a disjoint union of trees.
All in all, web navigation can be modelled by means of a finite multidigraph, that is, a finite directed graph (a finite set of web pages as nodes) in which multiple edges and/or loops are allowed. Any web page of the SUT will correspond to a single node within the graph. The following step in the automation process is to define the structural model coverage criteria. Due to the fact that a multidigraph is a transition‐based model, the following coverage criteria can be applied [141]:
‐ All‐paths: every path must be traversed at least once.
‐ All‐states: every state of the model is visited at least once.
‐ All‐configurations: every configuration of a graph is visited at least once. This coverage criterion applies to systems with parallel execution. If a snapshot is taken of such a parallel system during its execution, two or more active states can be found; each of these snapshots is called a configuration. For systems that contain no parallelism, this coverage criterion is the same as all‐states coverage.
‐ All‐transitions: every transition of the model must be traversed at least once.
‐ All‐transition‐pairs: every pair of adjacent transitions in the model must be traversed at least once.
‐ All‐loop‐free‐paths: every loop‐free path must be traversed at least once.
‐ All‐one‐loop‐paths: every path containing at most two repetitions of one (and only one) configuration must be traversed at least once.
‐ All‐round‐trips: similar to the all‐one‐loop‐paths criterion because it requires a test for each loop in the model, but that test only performs a single iteration around the loop.
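The multidigraph model of navigation and the all-transitions criterion can be sketched as follows; the page names are hypothetical, and parallel edges between the same pair of pages would need distinguishing labels in a full implementation:

```python
# Web navigation as a finite multidigraph: nodes are pages, edges are
# transitions (links, form submissions); self-loops are allowed.
navigation = {
    "home": [("home", "login"), ("home", "catalog")],
    "login": [("login", "catalog")],
    "catalog": [("catalog", "home"), ("catalog", "catalog")],  # self-loop
}

def all_transitions(graph):
    """All-transitions coverage target: every edge traversed at least once."""
    return [edge for edges in graph.values() for edge in edges]

def covered(paths, graph):
    """Does this set of paths (edge sequences) achieve all-transitions coverage?"""
    traversed = {edge for path in paths for edge in path}
    return traversed >= set(all_transitions(graph))

paths = [
    [("home", "login"), ("login", "catalog"), ("catalog", "home")],
    [("home", "catalog"), ("catalog", "catalog")],
]
print(covered(paths, navigation))  # True
```

Dropping either path leaves some edge untraversed, so the criterion would no longer be satisfied.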
The hierarchy of these criteria types is illustrated in the figure below:
Figure 18. Transition‐based Coverage Criteria
Since the graph represents the web navigation of the SUT, the selected coverage criterion will be the all‐transitions type. This criterion establishes that each edge (web transition) is traversed at least once. As Figure 18 shows, this criterion also implies that each vertex is visited at least once (all‐states). Therefore, given a multidigraph, it is necessary to be able to select the different paths within it. Once the independent paths are found, the automation of the navigation is performed. While travelling along these paths, testing and static analysis are carried out on each page in order to assess the selected quality attributes (functionality, performance, security, compatibility, usability and accessibility). In testing terms, the model of a web application using a multidigraph corresponds to the system testing level. The evaluation of each independent path can be considered integration testing. Finally, the assessment of each single page is the lowest level, i.e. unit testing. This approach is illustrated as follows:
Figure 19. Methodology Levels
Thus, the result of the automated assessment of the SUT will be the composition of the quality control results for these attributes:

(I) R_QC = Σ {R_F, R_P, R_S, R_C, R_U, R_A}

Where:
‐ R_QC: Quality control results.
‐ R_F: Functionality results.
‐ R_P: Performance results.
‐ R_S: Security results.
‐ R_C: Compatibility results.
‐ R_U: Usability results.
‐ R_A: Accessibility results.
4.5. Summary
This section has presented the methodological basis of this thesis, which is basically composed of two parts: on the one hand, the quality dimensions to be covered; on the other hand, the
generic process to automate the quality control of web applications on the client‐side. The quality goals to be covered are summarized in Figure 20. In this picture, the quality dimensions to be assessed with software testing are illustrated with a red background (functionality, performance, and security), while the quality attributes to be evaluated using static analysis are illustrated with a green background, i.e. compatibility, usability, and accessibility.
[Figure 20 diagram: V&V covers the functional dimension (functionality) and the non‐functional dimensions (performance, security, compatibility, usability, accessibility).]
Figure 20. Methodology Quality Dimensions
Finally, the methodology establishes the generic process to perform the automation of quality control of web applications. This process has four steps. First, the web site under study is modelled using a multidigraph, i.e. a finite directed graph in which multiple edges and/or loops are allowed. Second, some method should be used in order to find the independent paths within the multidigraph. Third, each found path is traversed by automatically browsing the web application from the client‐side. Finally, for each page within each path, testing and static analysis (i.e. software quality control) are carried out in order to assess the selected quality factors. This process is summarized as follows:
1. Testers define the correct navigation structure.
2. Navigation is modelled using a multidigraph.
3. Each path in the navigation is traversed automatically.
4. Automated testing and analysis is performed in each state of the paths.
5. Quality control results are aggregated in a unified report.
Figure 21. Methodology Process
Each step of this process is detailed in chapters 5 (automated functional) and 6 (automated non‐functional) of this PhD dissertation.
Chapter 5. Automated Functional Testing
Anyone who has never made a mistake has never tried anything new.
‐ Albert Einstein
Testing is the main activity performed for evaluating software‐intensive systems quality, and for improving it, by identifying defects and problems [1]. This section is focused on the automation of functional testing of web applications on the client‐side. Web testing is a difficult task, due to the peculiarities of such applications. A significant conclusion has been reached in the survey of web testing depicted in [34]: “further research efforts should be spent to define and assess the effectiveness of testing models, methods, techniques and tools that combine traditional testing approaches with new and specific ones”. In line with this statement, and following the guidelines explained in the methodology, this contribution presents specific methods to perform automated functional testing for web applications on the client‐side. Functional testing has the responsibility of uncovering failures of applications that are due to faults in the implementation of the specified functional requirements. Di Lucca and Fasolino draw an important conclusion about functional testing for web applications [34]: “As to the functional testing, existing tools main contribution is limited to manage test case suites manually created, and to match the test case results with respect to a manually created oracle. Therefore, greater support to automatic test case generation would be needed to enhance the practice of testing Web applications”. This piece of research presents a method to perform functional testing for web applications by automating their navigation using a real browser. On the one hand, web navigation is the process of traversing a web application using a browser. On the other hand, as depicted in the state‐of‐the‐art section, functional requirements are actions that an application must perform [126]. Therefore, the evaluation of the correct navigation of web applications results in the assessment of the specified functional requirements.
The method proposed to perform this automated functional testing can be seen as the basis of this dissertation. As depicted in the methodology section, the automation will be led by the correct navigation structure, defined in one of several ways: UML, XML, and R&P scripts. Moreover, this automation of the web navigation will also be used to guide the assessment of the selected non‐functional attributes, namely performance, security, compatibility, usability and accessibility (chapter 6).
‐ 61 ‐ PhD Dissertation Boni García Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
The remainder of this chapter is organized as follows. First, I present the concept and metamodel used in this dissertation to represent web applications. Second, the automated functional testing approach is described. Third, a thorough description of the different ways of modelling web applications for developers and testers is presented, i.e. UML, XML, and R&P scripts. Fourth, a survey and a laboratory experiment are presented in order to find the most suitable way to obtain the independent paths within a multidigraph representing the navigation. Finally, a summary of this contribution is provided.
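A navigation represented as a multidigraph, and the extraction of paths through it, can be sketched in a few lines of pure Python. The graph, its state names, and the transition labels below are illustrative only; this simple depth-first enumeration of simple paths is not the specific algorithm selected in the experiment.

```python
# Sketch: a web navigation as a directed multigraph, where parallel edges
# represent alternative transitions between the same pair of states.
# States and transition labels are illustrative only.

def simple_paths(graph, start, end, visited=None):
    """Yield every simple path (no repeated states) from start to end,
    as a list of transition labels.

    graph: dict mapping state -> list of (transition_label, next_state).
    """
    if visited is None:
        visited = [start]
    if start == end:
        yield []
        return
    for label, nxt in graph.get(start, []):
        if nxt not in visited:
            for rest in simple_paths(graph, nxt, end, visited + [nxt]):
                yield [label] + rest

nav = {
    # Two parallel edges from "home" to "login" make this a multidigraph.
    "home":  [("click_login", "login"), ("press_enter", "login"),
              ("click_about", "about")],
    "login": [("submit_ok", "dashboard"), ("submit_bad", "login_error")],
    "about": [("back", "home")],
}
paths = list(simple_paths(nav, "home", "dashboard"))
# paths == [["click_login", "submit_ok"], ["press_enter", "submit_ok"]]
```

Each enumerated path corresponds to one way of traversing the application, which is the raw material for deriving test cases from the navigation model.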
5.1. Scope of the Dissertation
Web applications are accessed through navigation mechanisms implemented by hyperlinks. Focusing on the client side of a web application, the interaction is reduced to a web browser communicating with a remote server over HTTP. Thus, focusing on the navigational nature of the Web, a web application WA can be seen as a set of web states:
(II) WA = {s1, s2, …, sn}
Each web state si is composed of a set of elements that can be accessed through the Document Object Model (DOM) API. The nature of such elements is heterogeneous, and each element is identified by its Internet media type, originally called MIME (Multipurpose Internet Mail Extensions) type. A complete list of the different kinds of MIME types can be found on the W3Schools web site13. IANA (Internet Assigned Numbers Authority) manages a registry of these types. The list of such types is available on its web site14, and it is summarized as follows:
‐ Application: For multipurpose elements, for example application/javascript, application/json, application/zip, and so on.
‐ Text: For example, text/html, text/css, and so on.
‐ Image: For example, image/gif, image/jpg, and so on.
‐ Audio: For example, audio/mpeg, audio/ogg, and so on.
‐ Video: For example, video/mpeg, video/mp4, and so on.
‐ Message: For example, message/http, message/rfc822, and so on.
‐ Model: For 3D models, such as model/vrml, model/iges, and so on.
‐ Multipart: For archives and other objects made of more than one part, for example multipart/mixed, multipart/encrypted, and so on.
‐ Vnd: For vendor‐specific files, for example application/msword.
‐ X: For non‐standard files, such as application/x-latex.
‐ X‐PKCS: For PKCS (Public‐Key Cryptography Standards) files, for example application/x-pkcs7-mime.

The following equation represents a state composed of a set of elements:
(III) si = {e1, e2, …, em}

The most important elements are the text‐based ones, since they contain the HTML elements. HTML elements can in turn contain web forms, which are the elements that hold the data to be submitted to the server.
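The classification of state elements by media type can be sketched with Python's standard `mimetypes` registry. The file names below are illustrative; in practice the elements would be obtained from the DOM of the page rather than from file names.

```python
import mimetypes

# Sketch: classifying the elements of a web state by their Internet media
# (MIME) type. File names are illustrative stand-ins for DOM elements.
elements = ["index.html", "style.css", "logo.gif", "song.mp3", "intro.mp4"]

def top_level_type(filename):
    """Return the top-level media type (e.g. 'text', 'image'), or None
    when the type cannot be guessed."""
    mime, _ = mimetypes.guess_type(filename)
    return mime.split("/")[0] if mime else None

# Group the elements of the state by top-level media type.
groups = {}
for name in elements:
    groups.setdefault(top_level_type(name), []).append(name)
```

The text group is the one of interest for functional testing, since it is where the HTML (and therefore the web forms) of the state resides.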
13 http://www.w3schools.com/media/media_mimeref.asp 14 http://www.iana.org/assignments/media‐types/index.html
Given a web application, its navigation always has an entry state. This state is identified by its URL, and the following states are connected by means of transitions. Therefore, in order to know the navigation of a web application, only the entry point (a URL) and the sequence of transitions among the following states need to be known. In other words, it is not necessary to know the URL of each state after the first one to model the navigation of a web application. Each web transition ti is composed of a sequence of atomic actions αj. Examples of atomic actions are clicking a link, moving the mouse over some HTML element, and so on. The factor that distinguishes a transition is that, when the set of atomic actions in ti is performed, an HTTP request from the client to the server is triggered as a result. This HTTP request will result in an HTTP response that changes the state from si to si+1. All this information is summarized in the following equation: