Otto-von-Guericke-Universität Magdeburg
School of Computer Science Department of Technical and Business Information Systems
Master Thesis
Feature-Oriented Decomposition of SQL:2003
Author: Sunkle Sagar
October 24, 2007
Advisor: Prof. Dr. rer. nat. habil. Gunter Saake, Dipl.-Inform. Marko Rosenmüller, Dipl.-Inform. Martin Kuhlemann
University of Magdeburg, School of Computer Science, Department of Technical and Business Information Systems, P.O. Box 4120, D–39016 Magdeburg, Germany
Sunkle, Sagar: Feature-Oriented Decomposition of SQL:2003. Master Thesis, Otto-von-Guericke-Universität Magdeburg, 2007.
Acknowledgements
I would like to thank my master thesis advisors for their support and reviews. I thank Prof. Dr. Gunter Saake for his support, for his confidence in me, and for allowing me to work on my master thesis in his group. I thank Marko Rosenmüller, Martin Kuhlemann and Norbert Siegmund for their helpful reviews and suggestions. Marko and Martin reviewed this thesis down to the smallest details. I would especially like to thank Marko, who was specifically assigned to me for the thesis. Without our lengthy and very interesting discussions about various related topics, in which he patiently tried to answer my innumerable questions in person or by email, this thesis would not have been possible. Finally, I would like to thank my family, who made this education possible for me.
Contents
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 Motivation
1.2 Goals
1.3 Structure of the Thesis
2 Background
2.1 SQL
2.1.1 History
2.1.2 Standardization and Evolution
2.2 Software Product Line Concepts
2.2.1 Software Product Line Engineering
2.2.2 Domain Engineering
2.2.3 Application Engineering
2.3 Feature-Oriented Decomposition
2.3.1 Features
2.3.2 Feature Diagrams
2.4 Feature-Oriented Programming
2.4.1 GenVoca
2.4.2 AHEAD
2.4.3 Using GenVoca/AHEAD and Related Tools
2.5 Separation of Concerns
2.6 Summary
3 Feature-Oriented Decomposition of SQL:2003
3.1 Feature Modeling Technique for SQL:2003
3.1.1 Basis for Modeling Features in SQL:2003
3.2 Feature Diagrams for SQL:2003
3.3 Sub-grammars Based on Feature Diagrams
3.4 Summary
4 Issues in Feature-Oriented Decomposition of SQL:2003
4.1 Other Implementation Models
4.1.1 Superimposed Variants
4.1.2 Hyperspaces
4.1.3 Comparison of Different Implementation Models
4.2 SQL:2003 Specific Issues
4.3 Related Work
4.4 Summary
5 Conclusion
5.1 Further Work
Bibliography
Appendices
A SQL:2003 Feature Diagrams
B Taxonomy of SQL Non-Framework Optional Features
B.1 Java Routines and Types Using the Java Programming Language (SQL/JRT)
B.2 SQL Object Language Bindings (SQL/OLB)
B.3 SQL Persistent Stored Modules (SQL/PSM)
B.4 SQL Management of External Data (SQL/MED)
B.5 SQL XML-Related Specifications (SQL/XML)
C SQL Platform Support
List of Figures
2.1 Structure of the SEI Framework for Product Line Practice [CE00]
2.2 Domain Engineering and Application Engineering as parallel processes [CE00]
2.3 Feature Diagram with a concept node and three features.
2.4 Feature Diagram with mandatory and optional features.
2.5 Alternative and OR features.
2.6 AND features.
3.1 Parent child relationships in feature diagrams as grammar rules [Bat05].
3.2 Main Feature Diagram of SQL:2003
3.3 Domain Definition Feature Diagram
3.4 Table Definition Feature Diagram
3.5 View Definition Feature Diagram
3.6 Schema Routine Feature Diagram
3.7 Insert statement Feature Diagram
3.8 Merge statement Feature Diagram
3.9 Query Expression Feature Diagram
3.10 Query Specification Feature Diagram
3.11 Table Expression Feature Diagram
4.1 Overview of Superimposed Variants Approach [CA05]
4.2 Hyperspace matrix with two relevant dimensions; Classes and Features [PRB03]
A.1 SQL/Foundation Feature Diagram
A.2 SQL schema statement Feature Diagram
A.3 Schema Definition Feature Diagram
A.4 Column Definition Feature Diagram
A.5 Sequence Generator Feature Diagram
A.6 Trigger Definition Feature Diagram
A.7 User-Defined Type Definition Feature Diagram
A.8 Grant Privilege Feature Diagram
A.9 Privilege Feature Diagram
A.10 Alter Statements Feature Diagram
A.11 SQL Data Statements Feature Diagram
A.12 Cursor Feature Diagram
A.13 SQL Data Change Statements Feature Diagram
A.14 Delete statement Feature Diagram
A.15 Update statement Feature Diagram
A.16 SQL Transaction statements Feature Diagram
A.17 SQL Control statements Feature Diagram
A.18 SQL Connection statements Feature Diagram
A.19 SQL Session statements Feature Diagram
A.20 SQL Dynamic Statements Feature Diagram
A.21 SQL Diagnostic Statements Feature Diagram
A.22 Scalar Expressions Feature Diagram
A.23 Data Type Feature Diagram
A.24 Window Function Feature Diagram
A.25 Function Specification Feature Diagram
A.26 Search Cycle Clause Feature Diagram
A.27 Table Reference Feature Diagram
A.28 Group By Clause Feature Diagram
A.29 Window Clause Feature Diagram
A.30 Predicate Feature Diagram
List of Tables
B.1 Number of Features enlisted in the SQL:2003 Specification Draft
B.2 SQL/JRT features
B.3 SQL/OLB features
B.4 SQL/PSM features
B.5 SQL/MED features
B.6 SQL/XML features
C.1 SQL Platform Support -1
C.2 SQL Platform Support -2
List of Abbreviations
AHEAD Algebraic Hierarchical Equations for Application Design
BNF Backus-Naur Form
CLI Call Level Interface
DSL Domain Specific Language
FOD Feature-Oriented Decomposition
FODA Feature-Oriented Domain Analysis
FOP Feature-Oriented Programming
FOR Feature-Oriented Refactoring
FOSD Feature-Oriented Software Development
JTS Jakarta Tool Suite
JRT Routines and Types for Java Programming Language
MED Management of External Data
MBSE Model Based Software Engineering
OLB Object Language Bindings
PSM Persistent Stored Modules
SPL Software Product Line
SPLE Software Product Line Engineering
SQL Structured Query Language
Chapter 1
Introduction
1.1 Motivation
Databases have come a long way since Codd proposed the relational data model in 1970, and they are now a vital component of information technology. Database technology is at the core of a multitude of software applications: business transaction applications of varying size, digital libraries, web applications such as online banking and online shopping, scientific projects such as the human genome mapping project and NASA’s earth observation system, Enterprise Resource Planning (ERP) systems, data warehouses, business intelligence applications such as data mining and Online Analytical Processing (OLAP), and embedded systems for personal information management. The basic structure of databases has evolved from the relational model to encompass other conceptual models such as the entity-relationship model, the object-relational model, and the object role model¹. Databases are now also available as semi-structured, object-oriented, multi-dimensional, distributed, and parallel databases. It can be argued that database technology will continue to evolve rapidly as it makes forays into new domains, with new domain-specific techniques being invented and merged into current database technology.
It has been observed that most popular database vendors tend to provide bits and pieces of support for every kind of database functionality, be it indexes, XML support, or special kinds of queries [CW00]. The many small database vendors each cater only to a specific type of database technology, but large companies inevitably offer a jumble of features packed into one product. In a way, the big players have treated features of database products as a sales and marketing issue, without much consideration of how bloated such a product becomes for the simplest database applications. Every new release of a database product comes with the tag ‘Feature Rich’ and claims to be better than its competitors.
Ironically, an enterprise of modest size uses only a tiny part of the mammoth functionality provided by database products, and the features it uses tend to be the low-end ones [CW00]. Most of the high-end features are rarely or never used. On the one hand, it is good to have a large array of features that could give enhanced performance and other advantages; on the other hand, one can also say that database products are overloaded with features [CW00]. Database vendors have ‘learned’ to adopt a one-size-fits-all approach and maintain a single code line with all the database management services they provide. Stonebraker et al. [SC05] identify the reasons for this as the maintenance and compatibility cost of the code base and sales and marketing problems. They conclude that this approach is no longer applicable and that “the commercial world (of databases) will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser”.
¹ http://www.orm.net/
Like databases, which are getting caught in the universality trap [CW00], the Structured Query Language (SQL), which is the basis for interaction between database technology and its users, has grown enormously. As Chaudhuri et al. [CW00] claim, the core of SQL (selection-projection-join queries and aggregation) is extremely useful, but beyond those conceptually simple queries SQL now contains an ever-growing number of additional constructs such as nested subqueries, recursive views, joins with added functionality, null value support, data structures like multisets and arrays, and the XML type. This asks an ordinary developer to learn all the esoteric syntax and to know the correct application of each construct. Although every added feature is useful in a specific context, the sheer number of choices increases complexity and confusion for the developer.
Software Product Line Engineering (SPLE) is a software engineering approach that addresses such issues for products of a similar kind that are made for a specific market segment and differ in features; such a family of products is called a Software Product Line (SPL). We propose that software product line research is capable of providing answers to the problems of features in products, especially database products, and that SPLs should be developed using decomposition mechanisms appropriate for the products concerned. This should apply to all artifacts involved in software product line development. SPLE considers software artifacts as reusable assets with a predictable reuse in the products of the product line. Therefore, engineering database technology as a software product line using Feature-Oriented Software Development (FOSD) concepts is favorable in terms of predictive reuse, as against merely opportunistic reuse², and can be applied to database management systems in a variety of domains. We propose that feature-oriented concepts are also applicable to SQL:2003, the latest ISO/ANSI standard for SQL.
Decomposing SQL:2003 using feature-oriented concepts can be beneficial and insightful not only for managing the features of SQL itself but also for database technology in embedded and real-time systems and for software generators, where these concepts are immediately applicable, as explained in the following:
• Database Systems for Smartcards and other Embedded Systems
Embedded systems, such as music systems, mobile phones, and personal digital assistants, contain both hardware and software, with varying influence on the software part. Smartcards are a kind of embedded system that contains a software chip which can store and manipulate data. Bobineau et al. [BBPV00] have identified the need to scale down database functionality for smartcards. According to them, smartcards are being used to store personal data in healthcare, banking and insurance applications, and in each case sophisticated queries are run against this data. Kersten et al. [KWF+03] have similarly identified the need for embedding database management systems in various hardware appliances. Peer-to-peer and stream-based architectures for embedded devices also require declarative query processing for resource discovery, caching, archiving, etc. [KWF+03]. Structured Card Query Language (SCQL), an ISO standard, defines inter-industry commands for use in smart cards [Int99]. A feature decomposition of SQL:2003 can be used to create a ‘scaled down’ version of SQL appropriate for such applications by establishing a product line architecture for SQL variants.
² http://www.softwareproductlines.com/introduction/introduction.html
• Database Systems for Real Time Applications
Real-time applications such as automotive control systems demand management of large amounts of data with temporal properties, against which pre-compiled and ad-hoc queries are executed [NTN+04]. The Electronic Control Units in automotive systems, which are responsible for processing the collected data, require different kinds of database functionality as well as different kinds of queries, resulting in different configurations of the database system. Nyström et al. [NTN+04] suggest that different database configurations, including the query processing component for these configurations, can be generated from preconfigured database components. This configuration activity can benefit from a well-defined SQL product line in which reusable assets for the most common configurations are considered.
• Software Generators for Programming Languages
In current software design and development, the view of real-world entities in terms of objects, classes, and methods is prevalent. According to Batory et al. [BBGN01], this makes it difficult to create low-level specifications of applications that can be used to automate software generation. To make this possible, they argue that the focus of software engineering approaches must shift to the key concepts of Domain Specific Languages (DSLs) and features. Programming support must be provided not just through generic languages but through domain-specific languages, so that domain- and task-specific notations can be used to produce superior software designs. Instead of low-level, code-centric components, reusable units that implement orthogonal features must be the building blocks of this support [BBGN01]. They assert that the rationale for this is that customers demand features in products and do not care about the code contained in the software. An effort in this direction was made with GenVoca [BO92], [BJMvH00], [BLHM02], in which languages and language extensions are viewed as reusable units that encapsulate features rather than as code modules. Bali is a related tool that is used to compose grammars of a programming language in order to extend it [BLS98], reflecting the feature view of programming language extensions. We base our feature decomposition of SQL:2003 mainly on the grammar specification for SQL:2003 in the various ISO/ANSI standards and thereby gain further insights into feature extension of a declarative query language, and of programming languages in general, by composing extension grammars.
Thus, a variety of software applications stand to gain from the application of product line concepts to database technology. We take a step in this direction by focusing on the feature-oriented decomposition of SQL:2003.
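The idea of deriving a scaled-down SQL variant by composing grammars can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the feature names, the simplified BNF-like rules, and the helper functions `compose` and `render` are hypothetical and are taken neither from the SQL:2003 standard nor from Bali.

```python
# Hypothetical sub-grammars: each feature contributes alternatives
# for one or more nonterminals (simplified, not the SQL:2003 BNF).
SUB_GRAMMARS = {
    "Select": {
        "statement": ["select_stmt"],
        "select_stmt": ["SELECT column_list FROM table_name"],
    },
    "Where": {
        "select_stmt": ["SELECT column_list FROM table_name WHERE condition"],
    },
    "Insert": {
        "statement": ["insert_stmt"],
        "insert_stmt": ["INSERT INTO table_name VALUES ( value_list )"],
    },
    "Merge": {
        "statement": ["merge_stmt"],
        "merge_stmt": ["MERGE INTO table_name USING table_name ON condition"],
    },
}


def compose(selected_features):
    """Union the alternatives of all selected features, per nonterminal."""
    grammar = {}
    for feature in selected_features:
        for nonterminal, alternatives in SUB_GRAMMARS[feature].items():
            rule = grammar.setdefault(nonterminal, [])
            for alternative in alternatives:
                if alternative not in rule:
                    rule.append(alternative)
    return grammar


def render(grammar):
    """Print-friendly form: one 'nonterminal ::= alt | alt' line per rule."""
    return "\n".join(
        f"{nonterminal} ::= {' | '.join(alternatives)}"
        for nonterminal, alternatives in grammar.items()
    )


# A query-only variant, e.g. for a resource-constrained device.
smartcard_sql = compose(["Select", "Where"])
print(render(smartcard_sql))
```

In this sketch, the composed grammar for the query-only variant contains no rule for the Merge or Insert statements, because those features were not selected.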
1.2 Goals
In this thesis we explain how features and feature modeling form the analysis phase of Software Product Line Engineering and show that SQL:2003 can be decomposed on the basis of the feature concept. Specifically, we intend to:
1. Decompose SQL:2003 into features using various SQL:2003 ISO/ANSI standards and present these features in terms of feature diagrams.
2. Present an example of how to construct sub-grammars of SQL:2003 based on decomposed features and show how to compose them to obtain customized parsability for SQL:2003.
3. Review related feature implementation models and compare them within the context of SQL:2003 features.
The detailed structure of the thesis is explained in the next section.
1.3 Structure of the Thesis
Chapter 2 In chapter 2 we set the background for the feature-oriented decomposition of SQL:2003. We first review the history of SQL and its standardization. The standardization process is seen in the light of the features added in the form of packages to the various SQL standards. We then review Software Product Line Engineering and its two sub-processes, Domain Engineering and Application Engineering. The core concepts of features, feature diagrams and feature-oriented decomposition are then discussed, followed by a section on the related discipline of feature-oriented programming. Finally, we conclude the chapter with an account of the concepts of Separation of Concerns and Multi-dimensional Separation of Concerns.
Chapter 3 In chapter 3 we begin by explaining the modeling techniques and establishing the basis for the feature-oriented decomposition of SQL:2003. Some of the important feature diagrams are presented to illustrate the modeling technique used. We follow the decomposition by explaining how features in the feature diagrams can be associated with sub-grammars of SQL:2003 and by comparing the Bali approach with our own approach.
Chapter 4 In chapter 4 we present other implementation models and compare them in the context of mapping SQL:2003 features. Various SQL:2003 specific issues are then discussed. We conclude the chapter with a review of the related work.
Chapter 5 Finally, we present the conclusions in chapter 5 and discuss further work.
Chapter 2
Background
In this chapter we explore various topics required as background for proper understanding of the feature-oriented decomposition of SQL:2003. We review the history of SQL and its standardization. We explore the product line concepts further. Then we talk about feature-oriented decomposition and feature-oriented programming. Finally we discuss separation of concerns in the light of SQL:2003 features.
2.1 SQL
The following definition of SQL is given in [vdL06]: “Structured Query Language (SQL) is a database language used for formulating statements that are processed by database server.” A database is assumed to be “a collection of persistent data” [Dat95], and a database server or database management system is “a collection of programs that enable users to create and maintain a database” [EN03].
SQL is based on a formal theory known as Codd’s Relational Model [Cod70]. Although in the beginning it was labeled a declarative, non-procedural database language, SQL has since become a hybrid language with both non-procedural and procedural constructs; triggers and stored procedures are examples of procedural constructs. SQL can be used interactively, where a user issues SQL statements to manipulate data, or in a pre-programmed manner, which is common in most business applications and in which the user need not be aware of SQL statement syntax [vdL06].
The following account is based on the history of SQL presented in [vdL06].
2.1.1 History
The history of SQL is closely related to that of an IBM project called System R, whose purpose was to create a relational database server. A language called SEQUEL was developed as the database language for System R by its designers R. F. Boyce and D. D. Chamberlin; it was later renamed SQL [vdL06].
In ‘phase zero’ of the System R project (1974-1975), only a part of SQL, without JOIN queries, was implemented. In ‘phase one’ (1976-1977), SQL was implemented afresh with multi-user capability and support for JOIN queries. In ‘phase two’, the final phase (1978-1979), System R was installed and evaluated at various client sites. The knowledge gained from this implementation was useful for further advances in the capabilities of SQL, and IBM started developing commercial products based on System R. Finally, SQL was standardized for the first time by the American National Standards Institute (ANSI) in 1986 [vdL06].
2.1.2 Standardization and Evolution
The first ANSI edition of the SQL standard, unofficially called SQL1, was created in 1986 [vdL06]. In 1987, the ISO edition, ISO 9075-1987 ‘Database Language SQL’, was completed. SQL1 had a very small set of integrity mechanisms, which were extended in 1989 by adding support for primary and foreign keys [vdL06]. This standard is known as SQL89, and the corresponding document is ISO 9075-1989, ‘Database Language SQL with Integrity Enhancements’. The successor to the 1989 standard was given the name SQL2. Many new statements and extensions were added to the 1989 standard to create the SQL92 standard [vdL06]. After SQL92 was published, SQL/CLI (Call Level Interface) was added in 1995, SQL/PSM (Persistent Stored Modules) in 1996, and SQL/OLB (Object Language Bindings) in 1998. SQL3, the SQL 1999 standard, ultimately comprised five parts: SQL/Framework, SQL/Foundation, SQL/CLI, SQL/PSM and SQL/OLB. At the same time, plans for further additions were made. In 2003, the newest edition of the SQL standard, referred to as ‘SQL:2003’, was created [vdL06]. It consists of SQL/JRT (Routines and Types for Java Programming Language), SQL/XML and SQL/MED (Management of External Data) along with the original five parts of the SQL 1999 standard, with additional improvements. The part of SQL/Foundation that dealt with schemas was moved into SQL/Schemata in SQL:2003. For a description of the various SQL packages, refer to Appendix B.
The SQL/CLI of 1995 was based on a report by the ‘SQL Access Group’, a committee set up by the database vendors Informix, Ingres and Oracle that attempted to define a standard for interoperability between applications created using different specifications. Eventually, Microsoft developed ODBC (Open Database Connectivity) based on SQL/CLI [vdL06].
The next standard, called SQL 2007¹, is in the making. It will add features like regular expression support, binary and floating decimal data types, materialized views, streaming data support, XQuery support, further enhancements to SQL/XML, and support for RDF and the semantic web.
The standardization process clearly shows how the core of SQL has remained more or less constant from 1992 onward, with additional features added to support the foray of database technology into other areas of computing. This discussion of features becomes relevant when we see how product line concepts are immediately applicable to database technology and its base standards.
¹ http://www.standards.org.au/downloads/ABS-2005-12.pdf
As we have seen, SQL:2003 added SQL/XML and made some modifications to the other parts of SQL 1999. The following features are specific to SQL:2003 [EMK+04]:
• New data types BIGINT, MULTISET, and XML. Types BIT and BIT VARYING were removed.
• Improved SQL-invoked routines (especially table functions that return a ‘table’)
• New Create Table AS and Create Table Like statements, which are extensions to the Create Table statement
• New Merge statement, which combines the facilities provided by the INSERT and UPDATE statements
• New Sequence Generators, which can automatically generate unique values for columns
• New Identity and Generated Columns, which automatically generate next values for specified columns based on evaluation of associated scalar expression
• New Window clause in Query Expression, which can be used to define a window of rows against which window functions can be executed
• Support for sample data (Tablesample) for improved performance
• Improved Savepoint handling
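To illustrate the Window clause listed above, the following sketch emulates in plain Python what a running-sum window function computes. The sample rows, the function name `running_sum`, and the choice of a frame that runs from the start of each partition to the current row are assumptions made for illustration; they are not taken from the standard text.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (department, employee, salary).
rows = [
    ("sales", "ann", 300),
    ("sales", "bob", 200),
    ("it", "cy", 500),
    ("it", "dee", 100),
]


def running_sum(rows, partition_key, order_key, value_key):
    """Emulate a running SUM over a window partitioned and ordered by column index.

    Every input row is kept and extended by one value, which is the
    difference to GROUP BY, where each group collapses into one row.
    """
    result = []
    ordered = sorted(rows, key=lambda row: (row[partition_key], row[order_key]))
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in partition:
            total += row[value_key]
            result.append(row + (total,))
    return result


# Partition by department (index 0), order by salary (index 2), sum salary.
windowed = running_sum(rows, partition_key=0, order_key=2, value_key=2)
for row in windowed:
    print(row)
```

The point of the sketch is that a window function attaches a computed value to each row of its partition while leaving the number of rows unchanged.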
2.2 Software Product Line Concepts
Software Product Line Engineering is a methodology for developing a variety of quality software systems in a short time [PBvdL98]. SPLE differs from other methodologies in its stress on capturing and managing variability. It contains two distinct development processes: Domain Engineering and Application Engineering. We first review SPLE and then Domain Engineering and Application Engineering in turn.
2.2.1 Software Product Line Engineering
Pohl et al. [PBvdL98] give the following definition: “Software Product Line Engineering is a paradigm to develop software applications (software intensive systems and software products) using platforms and mass customization.” This definition covers both standalone software and software embedded in a system that integrates hardware and software (embedded systems). Developing applications using platforms means planning for reuse and building reusable assets. Building applications for mass customization means employing the concept of managed variability, that is, anticipating adaptations and accomplishing them in a controlled and reproducible manner. Domain Engineering and Application Engineering are regarded as sub-processes of SPLE. Pohl et al. [PBvdL98] assert that there is a separation of two concerns here, namely “to build a robust platform and to build customer-specific applications in a short time”. The first refers to Domain Engineering and the second to Application Engineering. Figure 2.1 shows this relationship between Domain and Application Engineering in terms of reusable asset development and product development, respectively. The concept of ‘separation of concerns’ is described further in Section 2.5.
Figure 2.1: Structure of the SEI Framework for Product Line Practice [CE00]
2.2.2 Domain Engineering
Czarnecki et al. [CE00] give the following definition of Domain Engineering:
Domain Engineering is the activity of collecting, organizing, and storing past experience in building systems or parts of systems in a particular domain in the form of reusable assets (i.e. reusable work products), as well as providing an adequate means for reusing these assets (i.e., retrieval, qualification, dissemination, adaptation, assembly, and so on) when building new systems.
A common element among different Domain Engineering definitions is support for reuse in a family of similar applications.
Like the analysis, design and implementation phases of software engineering, Domain Engineering consists of a Domain Analysis phase, a Domain Design phase and a Domain Implementation phase. The phases of Application Engineering parallel those of Domain Engineering, with a Requirements Analysis phase, a Product Configuration phase and an Integration and Testing phase, respectively [CE00]. Customer needs are assessed during the Requirements Analysis of Application Engineering, while in Domain Analysis useful knowledge about the domain is gathered. In a feature-based methodology, requirements are presented as features, and the domain model is built as the Product Configuration and Domain Design phases go hand in hand. The results of these phases are used to establish the product line architecture, in which different product configurations yield different products of the product line. The Domain Implementation phase may use Domain Specific Languages (DSLs) and other generator tools during the integration of similar products. The entire process is repeated as new requirements arise, which in some cases signals the need for further Domain Analysis.
The phases of Domain Engineering are explained further:
• Domain Analysis
Domain Analysis is used to define a specific domain and establish its scope [CE00]. It draws on information from current systems (if available), from different stakeholders, from earlier experiments and prototypes, from standards documents (such as the various SQL:2003 standards documents used in this work), and from any other related information available in any form. Domain Analysis is not mere bookkeeping of all domain-related information; rather, it is used to gain as extensive a knowledge of a given domain as possible, so that the scope of the domain can be established and insights about reuse can be obtained.
With the knowledge about reusable assets in the system [CE00], a domain analyst can represent the common and variable parts of a system. The domain model contains information about the relationships between common and variable parts as well as any accompanying constraints. Feature models represent a set of reusable and configurable requirements, treated as features, and consist of feature diagrams and additional information.
Czarnecki et al. [CE00] introduce two kinds of domain scope with respect to software systems in a domain: Horizontal or System Category Scope and Vertical or Per System Scope. These consider, respectively, how different systems are formed in the domain and which parts of these systems are in the domain. In this way, Domain Analysis involves Domain Scoping and Domain Modeling.
• Domain Design
Domain Design is used to create a product line architecture [CE00]. For this, different functional and non-functional requirements, such as performance, adaptability, and extensibility, are considered [CE00]. System components are arranged in architectural patterns. One of these is the ‘layers pattern’, which arranges system components in groups of subtasks at a particular level of abstraction; another is the ‘microkernel pattern’, which represents a minimal functional core that can be extended with customer-specific parts of the system [CE00]. The architecture also establishes how variability is represented and how products can be configured; Czarnecki et al. [CE00] maintain that configuration languages can be used for the configurable or variable parts of the system.
• Domain Implementation
In the final phase of Domain Engineering, the architecture established during Domain Design is implemented along with the production plan [CE00]. Various generator tools, configuration and other domain-specific languages, GUIs, etc., may be used during Domain Implementation to realize product-specific production plans. If products delivered to customers need to be augmented with more features, custom development may be carried out with these tools [CE00]. This is a customer-specific addition of features, as opposed to creating a basic product variant based on configuration.
Figure 2.2: Domain Engineering and Application Engineering as parallel processes [CE00]
2.2.3 Application Engineering
Czarnecki et al. [CE00] define Application Engineering as “the process of building systems based on the results of Domain Engineering”. The phases of Application Engineering run in parallel with the phases of Domain Engineering. The two processes differ in that Domain Engineering considers all possible systems within the restricted domain, using the scope defined during domain scoping, whereas Application Engineering considers a concrete application based on customer requirements [PBvdL98]. Thus, different applications may be engineered at different times, making use of the knowledge acquired during previous Domain Engineering phases. If requirements change or additional requirements arise, these can be supported by the Domain Design specification and by product configuration tools for application ordering (cf. Figure 2.2). As such, Domain Engineering can be characterized as design-for-reuse, while the basic principle of Application Engineering can be characterized as design-with-reuse² [CE00].
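The distinction between design-for-reuse and design-with-reuse can be made concrete with a small sketch. It is only an illustration under stated assumptions: the asset names, the dictionary `DOMAIN_ASSETS`, and the function `derive_application` are hypothetical and do not come from [CE00] or [PBvdL98].

```python
# Hypothetical reusable assets, as produced by Domain Engineering
# (design-for-reuse). The keys also define the scope of the domain.
DOMAIN_ASSETS = {
    "query": "parser for SELECT statements",
    "update": "parser for UPDATE statements",
    "window": "parser for window functions",
}


def derive_application(requirements):
    """Application Engineering (design-with-reuse): select assets for one customer."""
    missing = sorted(set(requirements) - set(DOMAIN_ASSETS))
    if missing:
        # A requirement outside the domain scope cannot be served by
        # configuration alone and would call for further Domain Analysis.
        raise LookupError(f"outside domain scope: {missing}")
    return {name: DOMAIN_ASSETS[name] for name in requirements}


# Two concrete applications derived at different times from the same assets.
embedded_product = derive_application(["query"])
reporting_product = derive_application(["query", "window"])
print(sorted(embedded_product))
print(sorted(reporting_product))
```

Both products reuse the same `query` asset, while a requirement that lies outside the scoped domain is rejected instead of being silently ignored.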
How the products of a product line relate
There are two ways in which the products of a product line may relate to each other [SB00]. In the first, all products of a product line have common functionality, while the remaining features are mutually exclusive. In the second, different products may have different core functionality, and the remaining features complement each other. With respect to the first, a product line offers customers products that provide the same basic functionality, and customers can buy extra features as required in addition to the core functionality. With respect to the second, a customer can review product specifications and combine various features; in other words, the customer customizes the product, including the core and optional components, as required.
² http://www.mpi-inf.mpg.de/~kettner/courses/lib_design_03/notes/intro.html
2.3 Feature-Oriented Decomposition
We first review the concept of a feature. We then discuss feature diagrams, the standard diagramming notation for organizing features hierarchically. Next, we review other pieces of information generally associated with feature diagrams, and finally we present a definition of feature-oriented decomposition.
2.3.1 Features
Definitions
Different definitions of features can be found in the related literature. Czarnecki et al. [CE00] give two definitions of features, as found in the Domain Engineering literature.
An end-user-visible characteristic of a system.
A distinguishable characteristic of a concept (e.g., system, component, and so on) that is relevant to some stakeholder of the concept.
Svahnberg et al. [SvGB01] define a feature as a “set of functional and non-functional requirements”. This follows from their assumption that “there is an order of magnitude difference between number of stated requirements and features encapsulating those requirements” and that a feature is used to “group related requirements”; in other words, they regard features as an abstraction from requirements. Batory et al. [BLHM02] define a feature as “an increment in program functionality” and also state that “it is a product characteristic that is used in distinguishing programs within a family of related programs”. Czarnecki et al. [CHE04], in their cardinality-based feature modeling, extend the definition of features from “end-user-visible and distinguishable characteristic” to “any functional and non-functional characteristic at requirements, architectural, component, platform or any other level”.
In modeling the features of SQL:2003, we take the view of features as end-user-visible and distinguishable characteristics of a concept.
2.3.2 Feature Diagrams
Feature diagrams [KCH+90], [CE00] are used to model features hierarchically as a tree, the root of which represents a concept. A feature diagram together with some additional information constitutes a feature model. The general contents of the additional information are given in Section 3.1.
The root of a feature diagram is called a concept node as it represents a concept, shown as the node CN in Figure 2.3. Other nodes are feature nodes. The hierarchical structure of a feature diagram indicates that there is a parent-child relationship between feature nodes. In Figure 2.3, A, B, and C are features of the concept represented by node CN. Additionally, A is the parent node of B, and B is the parent node of C. A is a direct feature of CN, whereas B and C are indirect features of CN [CE00]. B is also called a direct subfeature of A, while C is an indirect subfeature of A [CE00]. The CN node can be a feature itself as well as a concept.
Figure 2.3: Feature Diagram with a concept node and three features.
Feature diagrams contain various types of features, such as mandatory, optional, AND, alternative, and OR features. A feature instance is described by starting at the concept node of the feature diagram and traversing the diagram from there; depending on its type, each node visited may become part of the instance description [CE00].
Mandatory Features
These are the features that identify the product. A mandatory feature is always included in the instance description, except when its own parent is optional and not included in the instance description. Consider Figure 2.4. In any instance description of this feature diagram, CN and C are always included.
Optional Features
Optional features may or may not be included in the instance description of a feature diagram. They add value or extra functionality to the core features [SvGB01]. In Figure 2.4, A and D are optional features. B is a mandatory feature, but it is included in the feature instance description only when A is included too.
Figure 2.4: Feature Diagram with mandatory and optional features.
Alternative Features
These are a set of features, only one of which can be included in the instance description, provided that their parent is included too. Alternative features are drawn with an arc joining the edges of the alternatives; such arcs are called edge decorations [CE00]. In Figure 2.5, CN has direct alternative features A, B, and C, only one of which can be selected at a time.
Figure 2.5: Alternative and OR features.
OR Features
These are a set of features from which any non-empty subset can be included in the instance description, provided that their parent is also included. In Figure 2.5, feature C has three OR features D, E, and F. Any non-empty subset of these can be included in the feature instance description when feature C is selected among the alternatives.
AND Features
These are a set of features all of which are included in the instance description, subject to the type of each feature node. In Figure 2.6, two instance descriptions are possible, one with feature B and one without feature B, while all other features are included in both.
Figure 2.6: AND features.
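The rules for the feature types described above can be sketched as a small validity check on instance descriptions. This is a minimal illustration, not part of the FODA notation: the `Feature` class, the `group` labels, and the example diagram (which only loosely combines the shapes of Figures 2.4 and 2.5) are assumptions made for this sketch, and alternative groups are read here as "exactly one when the parent is included".

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Feature:
    """A node of a feature diagram (hypothetical representation)."""
    name: str
    optional: bool = False      # solitary feature: mandatory unless optional
    group: str = "and"          # grouping of the children: "and", "alternative", "or"
    children: List["Feature"] = field(default_factory=list)


def is_valid_instance(root: Feature, selected: Set[str]) -> bool:
    """Return True if `selected` is an instance description of the diagram."""
    visited: Set[str] = set()

    def check(node: Feature) -> bool:
        # check() is only ever called for nodes that are selected.
        visited.add(node.name)
        chosen = [c for c in node.children if c.name in selected]
        if node.group == "alternative" and len(chosen) != 1:
            return False        # exactly one alternative must be chosen
        if node.group == "or" and not chosen:
            return False        # any non-empty subset is allowed
        if node.group == "and":
            # every mandatory child of a selected parent must be selected
            if any(not c.optional and c.name not in selected
                   for c in node.children):
                return False
        return all(check(c) for c in chosen)

    if root.name not in selected:
        return False            # the concept node is always part of an instance
    # Features selected without their parent are never visited, so reject them.
    return check(root) and selected.issubset(visited)


# Hypothetical diagram: A is optional with mandatory child B,
# C carries an OR group, G carries an alternative group.
diagram = Feature("CN", children=[
    Feature("A", optional=True, children=[Feature("B")]),
    Feature("C", group="or", children=[Feature("D"), Feature("E"), Feature("F")]),
    Feature("G", group="alternative", children=[Feature("H"), Feature("I")]),
])

print(is_valid_instance(diagram, {"CN", "C", "D", "G", "H"}))       # True
print(is_valid_instance(diagram, {"CN", "B", "C", "D", "G", "H"}))  # False: B without A
```

The mandatory feature B is only required once its optional parent A is selected, which mirrors the rule stated for Figure 2.4.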
Cardinality Based Feature Modeling
Czarnecki et al. [CHE04], [CK05] have proposed cardinality-based extensions to the original feature model by Kang et al. [KCH+90].
The OR features of the original feature model were extended to feature groups with a group cardinality (n-m) that specifies the minimum (n) and the maximum (m) number of features to be selected from the group. The original model allows ‘one or more’ features to be selected, without any facility for specifying bounds. If no group cardinality is given for a group, then (1-1) is the default cardinality. The cardinality of a solitary feature (i.e. a feature that is not part of a group) determines how many times the feature can be cloned, i.e. how many times the subtree (if any) emanating from this feature can be copied. Accordingly, a mandatory solitary feature has cardinality [1..1] and an optional solitary feature has cardinality [0..1]. Another addition to the original model is attributes. A feature can have at most one attribute, which itself can have a type associated with it.
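As a brief sketch of how a group cardinality (n-m) constrains a selection, the check below counts the selected members of a group and compares the count with the bounds. The function name and the feature names are assumptions for illustration only; the default of (1, 1) follows the default cardinality stated above.

```python
from typing import Iterable, Set, Tuple


def group_satisfied(members: Iterable[str], selected: Set[str],
                    cardinality: Tuple[int, int] = (1, 1)) -> bool:
    """Check a feature group against its group cardinality (n-m)."""
    n, m = cardinality
    count = sum(1 for name in members if name in selected)
    return count in range(n, m + 1)   # n and m are both inclusive


group = ["D", "E", "F"]               # hypothetical group members

# With bounds (1-3) the group behaves like an OR group of three features.
print(group_satisfied(group, {"D", "F"}, (1, 3)))   # True
# With the default (1-1) it behaves like a set of alternative features.
print(group_satisfied(group, {"D", "F"}))           # False
```

Under this reading, an alternative group is the special case (1-1) and an OR group of k features is the special case (1-k).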
Feature Variability
Optional, alternative, optional alternative, and OR features are called variable features, and the nodes to which these features are attached are called variation points [CE00].
Bosch et al. [BFG+02] identify the following as the most important issues about feature variability:
• Mechanism A mechanism is a way of generating or configuring different products of the product line, and is used at various times during the development life cycle. Inheritance, preprocessor directives, make and build files, and feature configuration templates are some examples of mechanisms that can be used to configure and generate different products.
• Phase Variable features may be introduced and bound at different phases of the product life cycle. Introducing a variable feature later may cause some restructuring of the domain model, but if feature analysis had anticipated a variation point, then it is easier to accommodate a new variable feature. Since modeling variability and commonality is the hallmark of feature-oriented domain analysis, it is better suited than other formalisms to cope with such variable features.
• Representation As methodologies, both object-orientation and feature-orientation contain formalisms and diagramming notations. Object-oriented notation is concerned with classes, subclasses, interactions, etc., but contains no notation for features, while in feature-orientation the diagramming notation (feature diagrams and related extensions to them) gives precedence to features over internal details.
• Dependency Dependencies capture relationships between variation points and other features.
• Tool support Tool support is the presence of proper software tools to manage variability in products and assemble or generate different products of a product line.
Bosch et al. [BFG+02] found that, in any software engineering approach, variability is first addressed at the architectural level. They assert that variability amounts to delaying design decisions, and that at the architectural level variability analysis is devoted to the abstractions of variation points, without regard to how these are actually incorporated in the products.
A final observation made about variability by Bosch et al. [BFG+02] is that variability is not ‘fixed’ in time. That is, variation points themselves may evolve, so variability needs to be managed not only in space but also in time. Along the space axis, it should be possible to create different products at the same time; along the time axis, individual products may evolve. This is important in creating a complete product line architecture for SQL:2003 using decomposed features. The implementation models for SQL:2003 should consider variability in space and time, so that the evolution of the SQL:2003 product line and that of individual implementations can be managed efficiently.
Other information associated with feature diagrams
A feature model represents the common and the variable features of a system under consideration and consists of feature diagrams and some additional information. The following information is generally associated with feature diagrams [CE00]:
• Semantic description The semantic description contains a short description of the feature, which the developer can use during the implementation phase as a quick reference for what a feature means. Any information that is useful in understanding more about features can be given in the semantic description, including additional diagramming notation based on a given formalism.
• Rationale A feature diagram may also contain information about why a specific feature was included, i.e. the intent of a feature.
• Stakeholders and client programs Different stakeholders are interested in different features. This information can be attached in addition to the semantic description and rationale, and may be used by developers to segment the system by stakeholder type and treat the features separately (i.e. provide different functionality) based on stakeholder type.
• Exemplar systems If the feature exists in another system, then information about how it is used there, how it was developed, how it was integrated into the overall system, what formalisms were used in the various phases to represent system-related issues, and finally what kind of mapping method was used to create an executable version of the system can help the developer of the current system gain insight and direction. Therefore details about such implementations can be added as descriptions to feature diagrams.
• Constraints and default dependency rules Two types of rules are most important for instantiating a feature mapped to models: ‘requires’ and ‘excludes’. These conditions can span feature diagrams, often involving features from different feature diagrams. Czarnecki et al. [CE00] assert that default dependency rules are used to assign default values to feature attributes, which can be used as is or overwritten during model configuration. Together, constraints and default dependency rules can be used to establish a configuration of the feature model.
• Availability sites, binding sites, binding modes Availability site indicates which feature is available to which stakeholder, including specific part of the system itself. Binding site and binding mode determine where and when a feature is bound and whether statically or dynamically.
• Priorities Priorities specify importance of a feature for inclusion in the overall system. Features of higher priority are implemented before features of lower priority.
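The ‘requires’ and ‘excludes’ rules listed above can be sketched as a simple check over a set of selected features. This is only an illustration under stated assumptions: the tuple encoding, the function name, and the SQL-flavoured feature names are hypothetical and are not taken from the feature diagrams of this thesis.

```python
from typing import Iterable, List, Set, Tuple

# (feature, "requires" | "excludes", other feature); hypothetical encoding
Constraint = Tuple[str, str, str]


def violated(constraints: Iterable[Constraint],
             selected: Set[str]) -> List[Constraint]:
    """Return the constraints that the given selection violates."""
    broken: List[Constraint] = []
    for feature, kind, other in constraints:
        if kind not in ("requires", "excludes"):
            raise ValueError(f"unknown constraint kind: {kind}")
        if feature not in selected:
            continue                      # rule only applies to selected features
        if kind == "requires" and other not in selected:
            broken.append((feature, kind, other))
        if kind == "excludes" and other in selected:
            broken.append((feature, kind, other))
    return broken


# Hypothetical rules that may span several feature diagrams.
rules = [
    ("GroupBy", "requires", "Select"),
    ("All", "excludes", "Distinct"),
]

print(violated(rules, {"Select", "GroupBy", "All"}))   # []
print(violated(rules, {"GroupBy", "All", "Distinct"}))
```

A configuration is acceptable with respect to these rules when the returned list is empty.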
Having established the concept of features and feature diagrams, we present the definition of Feature-Oriented Decomposition.
Feature-Oriented Decomposition is a feature modeling activity, carried out in Domain Analysis to capture, in terms of features, the commonalities and variabilities of systems in a domain. The concepts of features and feature modeling were originally developed by Kang et al. [KCH+90] as part of Feature-Oriented Domain Analysis (FODA). FODA was further developed at the Software Engineering Institute (SEI) [CE00]. According to Czarnecki et al. [CE00], FODA later became part of Model Based Software Engineering (MBSE), which encompasses both Domain Engineering and Application Engineering. As such, FODA is the Domain Analysis component of MBSE³.
As seen in Section 2.2.2 on Domain Engineering and Domain Analysis, FODA consists of phases that set up the scope of the domain and produce a model (which is a feature model). In FODA, these are called the Context Analysis and Domain Modeling phases, respectively. Context Analysis is used to study the domain scope. Domain Modeling itself consists of Information Analysis, Feature Analysis and Operational Analysis [CE00]. Information Analysis “captures domain knowledge about domain entities and relationship between them” and Feature Analysis “captures customer’s or end user’s understanding of the general capabilities of applications in domain” [CE00]. The last phase, Operational Analysis, establishes how the application works and how the features in the feature model relate to the corresponding entities in the model to which it is mapped. In carrying out the feature-oriented decomposition of SQL:2003, we consider the first two phases of Domain Modeling, especially Feature Analysis.
³ http://www.sei.cmu.edu/mbse/
2.4 Feature-Oriented Programming
Feature-Oriented Programming (FOP)⁴ [Pre97] is the study of feature modularity and how to use it in program synthesis [Bat03a].
FOP is based on the Stepwise Development methodology [Wir71], which is concerned with constructing complex programs by adding incremental details to a simple program [Bat03a]. In FOP, the incremental details are features.
The following are basic premises of FOP [BBGN01]:
• Algebraically, programs are values and extensions or refinements are functions, and their composition is an expression that maps programs as values.
• A domain model can be represented as a set of algebraic operations, i.e. in terms of values and functions. The compositional expressions of these algebraic operations define a space of programs that can be synthesized over the domain model. Based on experience with the relational algebra, algebraic entities could similarly be used in an algebraic representation of a domain model to optimize refinement expressions of programs.
• In terms of its effect on program design, a feature addition to a program incurs significant changes. Therefore it is a large scale program extension.
• This large scale extension would indicate altering definitions of existing classes by adding member variables and functions and also adding extra classes to the base definition.
• Treating features of a feature model as a set of values and functions, algebraic composition can be applied to synthesize customized programs.
Salient ideas of FOP are expressed by two models: GenVoca and its successor AHEAD.
2.4.1 GenVoca
The basic ideas of FOP as stated above were first implemented in GenVoca [BO92].
⁴ http://www.cs.utexas.edu/users/schwartz/Started.html
Let f and g represent base programs with specific features.
As stated before, a program extension is a function that maps programs as values.
The • is the composition operator. ‘a•x’ indicates that feature a is added to program x. Similarly, the equation ‘App1 = a•f’ indicates that feature a is added to program f to obtain application App1. A family of applications is treated as a set of named expressions consisting of composition equations.
Given a base program, therefore, an application can be identified in terms of the features that were added to the base program. The addition of features to programs can be implemented in different ways.
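One possible way to read the equation ‘App1 = a•f’ is sketched below, with programs as plain values and features as functions on those values. This is not GenVoca's implementation: the dictionary representation of a program, the helper `compose`, and all class and member names are assumptions chosen for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

Program = Dict[str, List[str]]            # class name -> member names (assumed model)
Refinement = Callable[[Program], Program]


def copy_program(program: Program) -> Program:
    """Copy a program value so that refinements never mutate their input."""
    return {cls: list(members) for cls, members in program.items()}


# Base program f: a value.
f: Program = {"Parser": ["parse"]}


def a(program: Program) -> Program:
    """Feature a: extends an existing class and adds a new class."""
    result = copy_program(program)
    result.setdefault("Parser", []).append("parse_a")
    result["AHelper"] = ["run"]
    return result


def b(program: Program) -> Program:
    """Feature b: adds a member to an existing class."""
    result = copy_program(program)
    result.setdefault("Parser", []).append("parse_b")
    return result


def compose(*layers):
    """compose(a, b, f) evaluates a•b•f, i.e. a(b(f))."""
    *refinements, base = layers
    return reduce(lambda prog, refine: refine(prog), reversed(refinements), base)


App1 = compose(a, f)       # App1 = a•f
App2 = compose(a, b, f)    # App2 = a•b•f

print(App1)  # {'Parser': ['parse', 'parse_a'], 'AHelper': ['run']}
print(App2)  # {'Parser': ['parse', 'parse_b', 'parse_a'], 'AHelper': ['run']}
```

A family of applications then corresponds to a set of such named expressions over the same base programs and features.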
In relational query optimization, the basic query is optimized by optimizing the relational algebra expression that represents the query. Similarly, program implementation optimization can be considered as an optimization of application expressions over the space of semantically equivalent programs [Bat03b].
Constraints over programs and refinements represented as values and expressions are called design rules [Bat03a]. Since GenVoca aspires to represent a domain model as a set of algebraic operations, the constraints on the operations and values are domain-specific. Another set of constraints is FODA-specific, known as requires and excludes constraints.
2.4.2 AHEAD
Algebraic Hierarchical Equations for Application Design (AHEAD) [BSR04] is a generalization of GenVoca. The purpose of AHEAD is to show that various concepts of GenVoca need to be generalized in order to scale the ‘feature’ concept to a large number of programs and their representations.
The theory of AHEAD can be explained as follows:
• System analysts and developers use different kinds of knowledge representations [Bat04] throughout the analysis, design and implementation phases of software development to identify important domain entities, the relationships between them, and how they can be implemented. Accordingly, diagramming notations used to denote data flows, processes and states of entities, UML notations, make and build files, specifications for domain-specific languages, and finally whatever implementation languages and platforms are used in mapping the features to models are all knowledge representations of one kind or another. There was a need to be able to encapsulate representations of all kinds.
• Adding a new feature to a program that has multiple representations affects any or all of these representations; e.g., adding a new feature to a program changes its source, related documentation and build properties, possibly adding or refining related UML diagrams, and so on. For such transformations to scale across all affected representations, AHEAD must have mechanisms that generalize transforms [Bat04].
• Transformation follows from composition. Composing a feature with a program that has multiple representations implies composition not only between the feature and the program, but also between any or all of their corresponding representations, and requires a composition mechanism to be present for each kind of representation. In this way AHEAD allows distributing composition over encapsulation [Bat04].
• Batory et al. [Bat03a] define a module as “containment hierarchy of the related artifacts”. A class is a containment hierarchy with a first level of classes and a second level of members and methods. Similarly, a ‘package’ is a three-level hierarchy. Thus representations of features and programs can have modules of varying depths. AHEAD needed to generalize modularity.
• AHEAD generalizes GenVoca in terms of a hierarchy of artifacts, such that extension artifacts can be added at various points in the hierarchy of artifacts representing the base program. In AHEAD, module hierarchies are implemented as directory hierarchies in which related artifacts are kept in specific directories, and the contents of the directories of both base programs and features are composed together.
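The idea that composition distributes over a containment hierarchy can be sketched with nested dictionaries standing in for directories. This is a simplified illustration and not the behaviour of the AHEAD tools: leaf artifacts are modelled as lists of lines that are simply concatenated, whereas real artifact types need their own composition mechanisms, and all directory and file names are hypothetical.

```python
from typing import Dict, List, Union

# A module is a directory-like hierarchy; leaves are artifacts (lists of lines).
Artifact = List[str]
Module = Dict[str, Union["Module", Artifact]]


def compose(extension: Module, base: Module) -> Module:
    """Compose a feature module onto a base module, level by level."""
    result: Module = dict(base)
    for name, part in extension.items():
        if name not in result:
            result[name] = part                       # new artifact or subdirectory
        elif isinstance(result[name], dict) and isinstance(part, dict):
            result[name] = compose(part, result[name])  # recurse into the hierarchy
        elif isinstance(result[name], list) and isinstance(part, list):
            result[name] = result[name] + part        # assumed leaf rule: append
        else:
            raise TypeError(f"cannot compose artifact and directory at {name!r}")
    return result


base_program: Module = {
    "src": {"Parser.jak": ["class Parser { }"]},
    "doc": {"Parser.html": ["Parser: base documentation"]},
}

feature: Module = {
    "src": {"Parser.jak": ["refines class Parser { }"],
            "Helper.jak": ["class Helper { }"]},
    "doc": {"Parser.html": ["Parser: feature documentation"]},
}

program = compose(feature, base_program)
print(sorted(program["src"]))         # ['Helper.jak', 'Parser.jak']
print(program["doc"]["Parser.html"])
```

One composition call thus updates the source and the documentation representations together, which is the effect described in the list above.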
Thus, Batory et al. [Bat04] generalized various concepts from GenVoca to AHEAD. In the next section we give an overview of GenVoca/AHEAD implementation.
2.4.3 Using GenVoca/AHEAD and Related Tools
We have already seen the basic premises of GenVoca and its generalization AHEAD. The Jakarta Tool Suite (JTS) is the related collection of domain-independent generator tools [BLS98]. The generator tools are also known as GenVoca generators and are used for creating domain-specific languages. JTS includes an extended version of Java called ‘Jak’, which is capable of meta-programming. The tool suite related to AHEAD also contains tools to extend programming languages. In these tools, both the language and the language extensions are treated as reusable components. While JTS is used for language extension and meta-programming, AHEAD can be used for language extension and, in general, for scalable composition of features of any kind specified in an AHEAD-specific format. Different combinations of a language and language extensions yield different variants of the given language. As stated in [BLS98], “Bali and Jak work cooperatively” to compose a language and language extensions and to create a parser, with the possibility of adding semantic actions in Jak code as well as embedding semantic actions in the corresponding Javacc implementation used in Bali.
In AHEAD, a language and a language extension are defined in two layers⁵. The first is the Syntax layer, which contains the grammars of both the language and the language extensions, written in Bali grammar notation. The Bali2jak tool is used to transform the grammar files composed using the Balicomposer tool into Java parser files. The second layer is the Semantic layer, which is generated using the Bali2layer tool. This layer is used to add semantic actions to the jak files thus generated. The modified jak files are composed with the files generated in the syntax layer via the refinement addition mechanism provided by tools like Mixin and Jampack. The Bali2javacc tool is used in the syntax layer to convert the composed grammar specification to a Javacc grammar specification. Grammars are arranged in directories, and the composition sequence (which is particularly important in composing language extensions due to Bali-specific grammar composition rules) is specified in equation files. The Bali2jak tool takes an equation file, which contains paths to grammars, and generates the inheritance lattice and parse tree classes. The jak2java tool converts all jak files to Java files, thus producing a preprocessor for the given composition [BLS98].
⁵ AHEAD Documentation: http://www.cs.utexas.edu/users/schwartz/
During our work on a customizable parser for SQL:2003, we found that the simple composition rules of Balicomposer limit the ability of the Bali approach to capture the complexity of the declarative SQL:2003 specification. Allowing language-specific or grammar-specific composition rules by extending the original Bali grammar specification can solve this problem to a large extent.
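To illustrate why the composition sequence matters when grammars are composed, the following toy sketch merges sub-grammars in a given order. It does not reproduce Bali's grammar notation or the composition rules of Balicomposer; the dictionary model of a grammar, the merge rule (append unseen alternatives), and the SQL-like productions are all assumptions made for this example.

```python
from typing import Dict, List

Grammar = Dict[str, List[str]]     # nonterminal -> list of alternatives (toy model)


def compose_grammars(sequence: List[Grammar]) -> Grammar:
    """Merge sub-grammars in the order in which they are listed."""
    composed: Grammar = {}
    for grammar in sequence:
        for nonterminal, alternatives in grammar.items():
            target = composed.setdefault(nonterminal, [])
            for alternative in alternatives:
                if alternative not in target:
                    target.append(alternative)   # later grammars add alternatives
    return composed


# Hypothetical base grammar and one language extension.
base = {"query": ["SELECT select_list FROM table_ref"]}
where_ext = {
    "query": ["SELECT select_list FROM table_ref where_clause"],
    "where_clause": ["WHERE condition"],
}

variant_1 = compose_grammars([base, where_ext])
variant_2 = compose_grammars([where_ext, base])

print(variant_1["query"])
print(variant_2["query"])
print(variant_1["query"] == variant_2["query"])   # False: the order of alternatives differs
```

Both sequences yield the same set of productions, but in a different order of alternatives, which is one simple way in which the sequence given in an equation file can influence the resulting parser.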
2.5 Separation of Concerns
In software engineering and software development, new methodologies (such as feature-oriented programming and aspect-oriented programming) are invented to tackle software issues like complexity and reuse, each based on its own formalism. Each methodology assumes that its specific viewpoint reduces software complexity and increases software comprehensibility, provided that it also contains valid decomposition and composition techniques. Object-oriented methodology assumes that the real world scenario is well captured by viewing real world entities as objects and classes and the interactions among entities as messages; it thus encourages decomposition of software into objects and provides mechanisms for encapsulating and manipulating objects. Feature-oriented methodology holds that creating products based on features, and viewing features as an abstraction from requirements [SvGB01], helps manage complexity and evolution and increases comprehensibility. The reasoning is that customers are basically concerned with the features of a product: they believe that a product differs from other products only in the features it provides, and that a product evolves by adding, modifying, or removing features. Each methodology also establishes how software based on a specific concern is organized in manageable pieces (like objects and features), how the formalism in the methodology enforces low coupling in order to minimize the impact of changes [TOHJ99], how to trace which parts have been affected by changes and manage them without invalidating other dependent parts of the system, and how to promote reuse of already existing components.
Nevertheless, it is found that evolution and maintenance activities result in increased coupling between software artifacts. They may also incur invasive modifications that affect other software artifacts in unexpected ways [TOHJ99]. Unforeseen requirements, or changes of requirements based on one concern while the software was modeled along another, considerably restrict the reuse of artifacts. The mapping from a software model to its implementation tends to become obscured as project size grows, thus reducing traceability. These problems concerning impact of change, reuse and traceability can be attributed to limitations and unfulfilled requirements related to separation of concerns [TOHJ99].
A problem has different important characteristics, and it is better to think of one facet at a time rather than thinking about the complex relations between all of them simultaneously [Dij76]. Each issue handled correctly in isolation will lead to the solution of the complete problem. This is known as the principle of separation of concerns, which originated with Edsger W. Dijkstra [Dij76]. Czarnecki et al. [CE00] assert that, in order to foster good qualities of a program (understandability, adaptability, reusability, etc.), the principle of separation of concerns should be used, and that issues should be handled in such a way that intentionality and localization are also adhered to, so that a programmer’s intention about a specific issue (i.e. what the problem was and how the programmer planned to solve it) is well identified in the overall solution.
Ossher and Tarr [OT00] identify three distinct components of separation of concerns: Identification, in which software is decomposed according to a given formalism along a specific dimension; Encapsulation, which provides mechanisms to manipulate the concerns as first-class entities; and Integration, which is the composition mechanism in the given formalism for integrating the concerns, represented as first-class entities, into software based on those concerns.
Tyranny of the Dominant Decomposition
Given that all methodologies support decomposition based on a specific concern, and that all life cycle phases in a given methodology provide ways of decomposing and composing software artifacts, the separation of concerns becomes biased toward one kind of concern over others. This side effect is known as the tyranny of the dominant decomposition [OT00].
All activities within a methodology revolve around a specific kind of concern (also called the dominant dimension), and therefore other concerns are hardly given any thought. The related formalism is generally all about addressing one specific concern (the object in object-oriented software development and the feature in feature-oriented software development). Programming languages that support different formalisms (like C++, which can be used for both procedural and object-oriented programming and has recently been extended for feature-oriented programming [ALRS05]) are ultimately used for one dominant concern. From a programmer’s point of view, programs created with a specific concern in mind are more comprehensible than a ‘mix’ of two concerns. This means that, depending on past experience, liking for, or mastery of a specific way of programming (some programmers are good at procedural programming, some at object-oriented programming), a programmer or developer may choose one kind of formalism over others, even if there were a way to implement them all simultaneously. In effect, the overall modular structure evolves with this dominant concern in the developer’s mind.
Multi-Dimensional Separation of Concerns
Ossher and Tarr [OT00] refer to a kind of concern as a dimension of concern. Often, more than one concern is important in achieving the hypothesized advantages of separation of concerns. That is, different concerns may be useful in different contexts, and one would like to have a formalism that makes it possible to express different concerns within a system and that provides decomposition and composition mechanisms for all kinds of concerns, while preserving the essential element of separation between them.
Different dimensions and their corresponding formalisms address specific properties of software engineering that should be respected in creating good quality software [OT00]. In object-oriented development methodology, data abstraction isolates the details of representing a real world entity, while encapsulation localizes entity-specific interaction details. In this way, everything about a real world entity is effectively in one place, which makes future changes specific to it easier to handle. A formalism within a software methodology considers only one specific concern; ways to handle concerns of another dimension are not considered or integrated in it. Therefore a specific way of modeling artifacts, although able in itself to achieve desirable software engineering properties, may not be able to do so when other concerns have to be modeled alongside it. What is good for one kind of concern may threaten other concerns with unmanageability and complexity. In other words, unforeseen software engineering characteristics may emerge if two different concerns are implemented simultaneously, seriously affecting the basic advantages of each kind of concern [TOHJ99]. Object-oriented decomposition may result in what Ossher and Tarr [OT00] identify as two negative phenomena with respect to the feature dimension: scattering and tangling. Features may be “scattered across multiple classes”, and methods supporting one feature are “tangled with methods supporting other features within the same class”. Scattering and tangling imply that a change in requirements in terms of features affects multiple classes, and that modifications made to these classes may in fact undermine the original object structure.
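Scattering and tangling can be made concrete with a small sketch that starts from a mapping of methods to the features they support. The mapping, the class and feature names, and the two helper functions are hypothetical and serve only to illustrate the two phenomena; they are not a metric proposed by Ossher and Tarr.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (class name, method name) -> feature supported by that method (hypothetical data)
MethodFeatures = Dict[Tuple[str, str], str]


def scattering(methods: MethodFeatures) -> Dict[str, Set[str]]:
    """For each feature, the classes over which it is scattered."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for (cls, _method), feature in methods.items():
        result[feature].add(cls)
    return dict(result)


def tangling(methods: MethodFeatures) -> Dict[str, Set[str]]:
    """For each class, the features whose methods are tangled within it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for (cls, _method), feature in methods.items():
        result[cls].add(feature)
    return dict(result)


methods: MethodFeatures = {
    ("Table", "insert"): "DataManipulation",
    ("Table", "beginTransaction"): "Transactions",
    ("Query", "execute"): "DataManipulation",
    ("Query", "rollback"): "Transactions",
}

print(sorted(scattering(methods)["Transactions"]))   # ['Query', 'Table']
print(sorted(tangling(methods)["Table"]))            # ['DataManipulation', 'Transactions']
```

In this toy data, a change to the Transactions feature touches both classes, and each class mixes methods of two features.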
Ossher and Tarr [OT00] assert that “Different dimensions are useful for different reasons, at different times”. They find that “set of dimensions of concern and the set of concerns within those dimensions vary over time”. Design patterns, refactoring, and object serialization/deserialization are concerns within the object dimension. But not all of the concerns in each dimension are considered from the beginning of modeling along that dimension; they become relevant over time [OT00]. If software was decomposed into features, adding another feature is intuitively straightforward and minimally invasive, provided the software was also modeled and implemented along the feature dimension, since mechanisms for handling feature interactions, dependencies, etc., are provided in it. Adding a feature to software that was decomposed and implemented along the object dimension, on the other hand, poses immediate obstacles, as the mechanisms of change belong to the object dimension rather than the feature dimension. Similarly, introducing a new real world entity and managing its interaction with pre-existing entities is intuitive in software based on the object dimension. Therefore a methodology may provide advantages in one area while posing problems in another.
The hallmark of reuse is expecting changes before they occur [OT00]. But not all kinds of changes can be anticipated well. Provisions made in the modular structure in anticipation of changes may never be used, because those changes never happened or happened in some other way than the provisions assumed. Such provisions may add only to the complexity of the overall software, as their main intent may be lost over time if the changes do not take place at all.
Sometimes, even though a developer wants to favor one methodology over another, the software specifications may be delivered in content and vocabulary heavily biased toward some other methodology [OT00]. The simplest example is when requirements are stated in terms of features to a developer who uses object-orientation for creating software. The developer has to translate the requirements from features into object-oriented vocabulary and proceed to create classes, etc. If the developer is versed in only one kind of methodology, this process is bound to complicate matters.
The bottom line of the above discussion is that concerns from multiple dimensions unavoidably have to be considered in different phases of the product life cycle. This also applies when creating a product line architecture of SQL:2003 based on its feature-oriented decomposition, as such an architecture will invariably have to deal with multiple dimensions, one of which is the feature dimension.
2.6 Summary
Since its inception SQL has been standardized five times. SQL:2003 is the current ISO/ANSI SQL standard. We apply Software Product Line Engineering concepts to SQL:2003. Separation of concerns and stepwise development are basic principles of software engineering; at the same time, many different kinds of concerns unavoidably have to be considered in different phases of software development. We focus on modeling and implementation of the feature concern. Feature-oriented decomposition is part of the Domain Analysis phase of Domain Engineering, one of the two sub-processes of Software Product Line Engineering, the other being Application Engineering. Features of SQL:2003 are distinguishable characteristics or constructs of SQL:2003. In order to implement the features thus obtained, we intend to borrow the feature-oriented programming approach of languages and language extensions from Bali and the related GenVoca/AHEAD family of tools. In the next chapter we present the feature diagrams of SQL:2003 and show how customized parsability of SQL:2003 may be achieved using sub-grammar composition and a parser generator.
Chapter 3
Feature-Oriented Decomposition of SQL:2003
We restrict the scope of the thesis to modeling the features of SQL:2003, particularly of SQL/Foundation [Mel03a]. We also give an example of how the features can be used in implementing a customizable parser. The complete implementation of the various rules in the standard is beyond the scope of this thesis; we only review various implementation models that may be used to do so.
3.1 Feature Modeling Technique for SQL:2003
We have seen in the last chapter that feature diagrams are accompanied by additional pieces of information (cf. Section 2.3.2). We consider these again as applied to the feature modeling for SQL:2003.
• Semantic description In our work, we include a short description of each feature. References to sections in the ISO/ANSI standards are given for further reference; we do not repeat the explanations from the standard. Developers should have access to the SQL standard documents for additional reference and implementation-related help, as many SQL constructs have a complex set of rules that must be considered in further phases of development and cannot be covered either in the feature diagram or in the semantic description.
• Rationale We base the feature decomposition of SQL mainly on the BNF grammar of various SQL statements and their constituents and on the corresponding specifications in the SQL:2003 standard (particularly SQL/Foundation [Mel03a] and SQL/Framework [Mel03b]). That the grammar and specification should be used in the feature decomposition of a language is intuitive: no programming language can be decomposed in terms of features without attending to the syntax of its important language constructs. Accordingly, any prominent part of the production rules for a specific statement is considered a feature. Cardinality-based modeling notations are used to denote multiple occurrences of a particular construct within a statement. The qualifying characteristic for any part of the grammar to be a feature is therefore that it represents an important SQL language construct.
• Stakeholders and client programs There are no particular stakeholders involved. In a larger case study, other departments, universities, research institutes and private vendors may become stakeholders.
• Exemplar systems There is no known case study of decomposing, in a feature-oriented way, either an entire programming language (or parts of it) from scratch (extending a language is not the same) or SQL:2003 itself. Bali and related tools from the AHEAD tool suite [BSR04], as well as Czarnecki’s Eclipse plug-in for feature modeling called ‘fmp’ [CK05], were useful in gaining insights for our work.
• Constraints and default dependency rules A ‘requires’ condition between features in the same feature diagram is shown with a labeled arc. If it spans other feature diagrams, the details are given in the ‘Requires’ section accompanying the feature diagram. We have not used the concept of attributes, as there are not many instances of SQL constructs that need to be assigned default values, and in general it would not add to a better understanding of the decomposition. Each statement specification in the SQL standard, though, contains complex syntax conditions that cannot be represented merely by stating ‘requires’ or ‘excludes’; these have not been covered.
• Availability sites, binding sites, binding modes We have omitted this part, since our scope is restricted to examining whether a feature decomposition of SQL:2003 can be done; we do not concentrate on implementation, for which availability sites, binding sites and binding modes would be important. The accompanying implementation of a customizable parser only gives an example of parsability based on the sub-grammars derived from SQL:2003 features.
• Priorities We have not considered the priority concept in our work.
3.1.1 Basis for Modeling Features in SQL:2003
The sources of features can include existing and potential stakeholders, domain experts and domain literature, existing systems, pre-existing models, and models created during development. The main source for our work is the set of SQL:2003 standard documents, ISO/IEC 9075-(n):2003 [Mel03b], which define the SQL language. The parts SQL/Framework, SQL/Foundation and SQL/Schemata encompass the minimum requirements of the language. Other parts define extensions [Mel03b].
We base our feature diagrams on the BNF grammar specification of SQL:2003 and other information given in SQL/Foundation [Mel03a]. The idea of a similarity between features represented as feature diagrams and a BNF grammar representation was put forward by De Jonge et al. [dJV02]. Batory et al. [Bat05] have used iterative grammars for this purpose, though they assert that more general grammars can be used. The correspondence is shown in Figure 3.1:
• Figure 3.1(a) is the production A: B C D ; assuming all features are mandatory. If a feature is optional (as is C), it is surrounded by [brackets]. Thus, the production for 3.1(a) is A: B [C] D ;
• Figure 3.1(b) is the production A: B | C | D ;
• Figure 3.1(c) corresponds to a pair of rules, A: t+ ; and t: B | C | D ; meaning that one or more of B, C, D are to be selected.
(a) AND features (b) Alternative features (c) OR features
Figure 3.1: Parent child relationships in feature diagrams as grammar rules [Bat05].
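The correspondence listed for Figure 3.1 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Feature` class, the `group` values "and", "alt" and "or", and the generated helper non-terminal name (`a_choice`, standing in for `t`) are hypothetical and not taken from [Bat05] or from any tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """A node of a feature diagram (hypothetical structure for illustration)."""
    name: str
    optional: bool = False   # only meaningful for a child of an AND parent
    group: str = "and"       # relation of this node to its children: "and", "alt" or "or"
    children: List["Feature"] = field(default_factory=list)


def productions(f: Feature) -> List[str]:
    """Return grammar rules for f and its descendants, in the style of Figure 3.1."""
    if not f.children:
        return []
    rules = []
    if f.group == "and":
        # Mandatory children appear as-is, optional children in [brackets].
        body = " ".join(f"[{c.name}]" if c.optional else c.name for c in f.children)
        rules.append(f"{f.name}: {body} ;")
    elif f.group == "alt":
        # Exactly one child is chosen.
        rules.append(f"{f.name}: " + " | ".join(c.name for c in f.children) + " ;")
    elif f.group == "or":
        # One or more children are chosen: a pair of rules with a helper non-terminal.
        helper = f"{f.name.lower()}_choice"  # hypothetical name, plays the role of t
        rules.append(f"{f.name}: {helper}+ ;")
        rules.append(f"{helper}: " + " | ".join(c.name for c in f.children) + " ;")
    else:
        raise ValueError(f"unknown group kind: {f.group}")
    for c in f.children:
        rules.extend(productions(c))
    return rules


and_diagram = Feature("A", children=[Feature("B"), Feature("C", optional=True), Feature("D")])
for rule in productions(and_diagram):
    print(rule)
```

Changing `group` to "alt" or "or" on the root yields the rules of cases (b) and (c) respectively.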
The similarity between feature diagrams and grammars considers a very general form of grammar. Having obtained feature diagrams using the SQL:2003 BNF grammar, we wish to convert the feature diagrams to LL(k) sub-grammars (as required by the ANTLR parser generator). The BNF grammar of SQL is used in constructing the feature diagrams based on the following assumptions:
• A complete SQL:2003 BNF grammar represents a product line, in which various sub-grammars represent features which when composed together give products of this product line, namely different variants of SQL:2003.
• A non-terminal may be considered as a feature only when the non-terminal clearly expresses an SQL construct; placeholder non-terminals are not considered.
• Mandatory non-terminals are represented as mandatory features.
• Optional non-terminals are represented as optional features.
• The choices in a production rule are represented as or-features (instead of alternative features). Consider A: B | C | D ; if such a production rule appears in the SQL grammar, we use 3.1(c) instead of 3.1(b) to represent the corresponding feature diagram. This is required because we want a product configuration to be able to select one or more of the choices, rather than exactly one alternative among them.
• A terminal symbol is considered only if it represents an important characteristic of the feature under consideration apart from the syntax.
• The notation ‘...’ is used in the standard to show multiple occurrences of a construct, see Section 6.2 of [Mel03b]. We use the cardinality notation to depict this fact.
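The assumptions above can be made concrete with a small Python sketch that derives feature information from a single production. It assumes a deliberately simplified, whitespace-separated rule format of our own (`[X]` for an optional element, `X...` for repetition, `|` between choices); the function name, the result layout and the `placeholders` parameter are hypothetical, and the real SQL:2003 BNF is considerably richer than this.

```python
def features_from_production(rule, placeholders=()):
    """Derive a feature description from one simplified production.

    Assumed rule format (not the full notation of the standard):
      lhs ::= item item ...      items separated by whitespace
      [X]      optional element  -> optional feature
      X...     repeated element  -> cardinality with upper bound *
      |        choice            -> children become or-features
    Non-terminals listed in `placeholders` are skipped, since placeholder
    non-terminals are not considered features.
    """
    lhs, rhs = (part.strip() for part in rule.split("::=", 1))
    tokens = rhs.split()
    group = "or" if "|" in tokens else "and"
    children = []
    for tok in tokens:
        if tok == "|":
            continue
        optional = tok.startswith("[") and tok.endswith("]")
        if optional:
            tok = tok[1:-1]
        repeated = tok.endswith("...")
        if repeated:
            tok = tok[:-3]
        if tok in placeholders:
            continue
        if repeated:
            cardinality = "[0..*]" if optional else "[1..*]"
        else:
            cardinality = "[0..1]" if optional else "[1..1]"
        children.append({"name": tok, "cardinality": cardinality})
    return {"feature": lhs, "group": group, "children": children}


print(features_from_production("A ::= B [C] D..."))
print(features_from_production("A ::= B | C | D"))
```

Deciding which non-terminals are placeholders, and which terminals are important enough to become features, remains a manual modeling decision; the sketch only takes such a list as input.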
The grammar given in SQL/Foundation [Mel03a] is useful in understanding the overall structure of an SQL construct, that is, which SQL constructs constitute the larger SQL construct. We have found that this approach may also be useful in general for carrying out a feature decomposition of any programming language, as the grammar establishes the basic building blocks of the language.
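The first assumption, that sub-grammars composed together give a variant of SQL:2003, can be illustrated with a minimal Python sketch. The composition rule used here (an ordered union of the alternatives of each non-terminal) is our own simplification and is not the composition mechanism of Bali or AHEAD. The two sub-grammars and all rule names in them are hypothetical and heavily abbreviated.

```python
from typing import Dict, List

# A grammar maps each non-terminal to its list of alternatives.
Grammar = Dict[str, List[str]]


def compose(*sub_grammars: Grammar) -> Grammar:
    """Compose sub-grammars by an ordered union of alternatives per non-terminal.

    Simplifying assumption: a feature only adds rules or alternatives and never
    removes or rewrites existing ones. The inputs are not modified.
    """
    composed: Grammar = {}
    for sub in sub_grammars:
        for nonterminal, alternatives in sub.items():
            target = composed.setdefault(nonterminal, [])
            for alt in alternatives:
                if alt not in target:
                    target.append(alt)
    return composed


def render(grammar: Grammar) -> str:
    """Print the grammar in the 'A : B | C ;' style used in this chapter."""
    return "\n".join(f"{nt} : {' | '.join(alts)} ;" for nt, alts in grammar.items())


# Hypothetical, abbreviated sub-grammars, one per selected feature.
QUERY = {
    "statement": ["query_specification"],
    "query_specification": ["SELECT select_list FROM table_name"],
}
INSERT = {
    "statement": ["insert_statement"],
    "insert_statement": ["INSERT INTO table_name VALUES row_value"],
}

print(render(compose(QUERY, INSERT)))
```

Selecting only the QUERY feature gives a grammar without insert statements; selecting both gives a `statement` rule with two alternatives. Whether a composed grammar is still LL(k) has to be checked separately by the parser generator.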
We use the cardinality concept from cardinality-based feature modeling (cf. Section 2.3.2) in many feature diagrams, e.g., the feature diagrams for Domain Definition (Figure 3.3), Table Definition (Figure 3.4) and Schema Routine (Figure 3.6). The cardinality indicates that a particular feature, such as Table Element in the Table Definition feature diagram, may be cloned with its subtree, and that parts of the subtree may be configured differently in each clone, e.g., with varying syntax and choices of non-terminals along the Table Element subtree.
The cardinality concept is not absolutely necessary [CE00]. Cardinalities about features can be expressed even without using a cardinality notation, e.g., by directly mentioning the cardinality information in the feature, or creating another feature that expresses the number information about the main feature and so on [CE00]. We use the cardinality notation because it expresses the modeling intent more succinctly without adding complexity or overloading feature diagrams with extra features. The cardinality notation also expresses a closer relationship to the SQL BNF grammar on which we base the feature diagrams.
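Cloning a feature with its subtree can be sketched as follows. The class `CardFeature`, the function `clone`, the child feature names and the cardinality [1..*] chosen for Table Element are assumptions made for illustration only; they are not read off Figure 3.4.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CardFeature:
    """Feature with a cardinality [min_card..max_card]; max_card None stands for '*'."""
    name: str
    min_card: int = 1
    max_card: Optional[int] = 1
    selected: bool = True
    children: List["CardFeature"] = field(default_factory=list)


def clone(feature: CardFeature, count: int) -> List[CardFeature]:
    """Return `count` independent copies of the feature together with its subtree."""
    too_few = count < feature.min_card
    too_many = feature.max_card is not None and count > feature.max_card
    if too_few or too_many:
        raise ValueError(f"{count} clones violate the cardinality of {feature.name}")
    return [copy.deepcopy(feature) for _ in range(count)]


# Hypothetical subtree: names and cardinality are illustrative assumptions.
table_element = CardFeature(
    "Table Element", min_card=1, max_card=None,
    children=[CardFeature("Column Definition"), CardFeature("Table Constraint Definition")],
)

first, second = clone(table_element, 2)
# Each clone can be configured differently without affecting the other.
first.children[1].selected = False
print(first.children[1].selected, second.children[1].selected)
```

Because each clone is a deep copy, deselecting a child in one clone leaves the other clone and the original feature untouched, which is the behavior the cardinality notation is meant to express.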
Regarding the tree nature and large size of feature diagrams, Czarnecki et al. [CE00] make the following observations:
In some cases, representing a feature diagram using a more general directed graph would be certainly useful. For example, we might want to allow for multiple references to one subgraph, in order to avoid its duplication within the diagram. In the following discussion, however, we assume that the diagram is a tree. A practical approach to avoiding the duplication of feature subtrees in a larger feature diagram is to only include the roots of the subtrees in the larger diagram and to show the duplicated subtree in one separate diagram.
and
Sometimes, when we draw a large feature diagram, it is convenient to split it into a number of smaller diagrams. In this case, the roots of the smaller subdiagrams are features of the concept represented by the root of the original diagram rather than concepts.
Where the leaf features of a larger feature diagram are expanded further, the expansions are shown in separate feature diagrams, and references to those feature diagrams are given.
When a feature diagram contains features that require each other, this is shown by a dashed arc with an arrow pointing to the required feature. When a feature in a feature diagram requires a feature from another feature diagram, the requires conditions are given under ‘Requires’ in the semantic description for that feature diagram.
3.2 Feature Diagrams for SQL:2003
We present here 10 feature diagrams that are representative of the modeling technique used and that cover some of the most important SQL:2003 constructs. We explain the modeling technique as applied to the grammar specification in the SQL:2003 standards and other information. The remaining 30 feature diagrams are decomposed in the same manner as the 10 shown here; the reader is referred to Appendix A for these.
Feature ID - SQL:2003
Figure 3.2: Main Feature Diagram of SQL:2003
Semantic Description Figure 3.2 shows the main feature diagram of SQL:2003. It also represents the most coarse-grained decomposition. We have chosen to decompose SQL/Foundation further, as the core of SQL:2003 is defined in SQL/Foundation. A customer may be presented with this feature tree for selecting SQL/Foundation and other packages. If the customer wishes to select specific features from within the extension packages, then these can be decomposed further in a manner similar to the decomposition of SQL/Foundation. SQL/Foundation [Mel03a] defines the basic operations of SQL.
Figure 3.3: Domain Definition Feature Diagram
Feature ID - Domain Definition
Semantic Description - Figure 3.3 shows the feature diagram for Domain Definition. This feature shows the use of cardinality notation [0..*]. A domain is used to define a set of valid values of a data type by specifying 0 or more domain constraints, denoted in the grammar specification as ‘[
Requires - The Check Constraint Definition feature requires the Search Condition feature of the Predicate feature (Figure A.30). The Domain Definition feature optionally requires the Predefined Types feature of the Data Type feature (Figure A.23).
This diagram also shows the use of the ‘Requires’ condition, which arises from the tree nature of feature diagrams and the parent-child relationship between features, in which a child can have only one parent. The grammar specification indicates that the Check Constraint Definition feature has Search Condition as its child feature. Because the Search Condition feature repeats in many other feature diagrams, we moved it to a single parent, the Predicate feature, and we state the relationship of Check Constraint Definition to Search Condition in the Requires part of the semantic description.
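A ‘Requires’ condition that spans feature diagrams can be checked mechanically once a configuration has been selected. The following Python sketch is a minimal illustration; the function name and the dictionary layout are hypothetical, and only the mandatory requirement stated above for Check Constraint Definition is encoded (the optional requirement on Predefined Types is left out).

```python
def missing_requirements(selected, requires):
    """Return {feature: [missing required features]} for a set of selected features."""
    missing = {}
    for feature in selected:
        lacking = sorted(r for r in requires.get(feature, ()) if r not in selected)
        if lacking:
            missing[feature] = lacking
    return missing


# Requires condition taken from the semantic description of Domain Definition.
REQUIRES = {"Check Constraint Definition": {"Search Condition"}}

config = {"Domain Definition", "Check Constraint Definition"}
print(missing_requirements(config, REQUIRES))
print(missing_requirements(config | {"Search Condition"}, REQUIRES))
```

An empty result means that every selected feature has its required features in the configuration; a non-empty result names the features that still have to be selected from other feature diagrams.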
Figure 3.4: Table Definition Feature Diagram
Feature ID - Table Definition
Semantic Description - Figure 3.4 shows the feature diagram for Table Definition. A table is a collection of rows having one or more columns. A table definition is specified by the ‘CREATE TABLE’ statement. A table can be created in three different ways. The most general way is to specify the table name and the columns with their data types and constraints. Second, CREATE TABLE AS (the ‘AS subquery’ feature) creates a table from the result of a SELECT statement. Finally, the LIKE clause creates a table that looks like another table, that is, one that includes all of the column definitions from an existing table.
As seen above the Table Definition feature contains SQL:2003 specific features such as Table As and Table Like statements. The grammar specifications for Table Element and Typed Table Element are stated as ‘