Otto-von-Guericke-Universität Magdeburg

School of Computer Science Department of Technical and Business Information Systems

Master Thesis

Feature-Oriented Decomposition of SQL:2003

Author: Sunkle Sagar

October 24, 2007

Advisor: Prof. Dr. rer. nat. habil. Gunter Saake, Dipl.-Inform. Marko Rosenmüller, Dipl.-Inform. Martin Kuhlemann

University of Magdeburg
School of Computer Science
Department of Technical and Business Information Systems
P.O. Box 4120, D–39016 Magdeburg
Germany

Sunkle, Sagar: Feature-Oriented Decomposition of SQL:2003. Master Thesis, Otto-von-Guericke-Universität Magdeburg, 2007.

Acknowledgements

I would like to thank my master thesis advisors for their support and reviews. I would like to thank Prof. Dr. Gunter Saake for his support and confidence in me and for allowing me to work on my master thesis in his group. I would like to thank Marko Rosenmüller, Martin Kuhlemann and Norbert Siegmund for their helpful reviews and suggestions. Marko and Martin reviewed this thesis down to the smallest details. I would especially like to thank Marko, who was specifically assigned to me for the thesis. Without our lengthy and very interesting discussions about various related topics, and without his patient answers to my innumerable queries in person or by email, this thesis would not have been possible. Finally, I would like to thank my family, who made this education possible for me.

Contents

List of Figures

List of Tables

List of Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Structure of the Thesis

2 Background
  2.1 SQL
    2.1.1 History
    2.1.2 Standardization and Evolution
  2.2 Software Product Line Concepts
    2.2.1 Software Product Line Engineering
    2.2.2 Domain Engineering
    2.2.3 Application Engineering
  2.3 Feature-Oriented Decomposition
    2.3.1 Features
    2.3.2 Feature Diagrams
  2.4 Feature-Oriented Programming
    2.4.1 GenVoca
    2.4.2 AHEAD
    2.4.3 Using GenVoca/AHEAD and Related Tools
  2.5 Separation of Concerns
  2.6 Summary

3 Feature-Oriented Decomposition of SQL:2003
  3.1 Feature Modeling Technique for SQL:2003
    3.1.1 Basis for Modeling Features in SQL:2003
  3.2 Feature Diagrams for SQL:2003
  3.3 Sub-grammars Based on Feature Diagrams
  3.4 Summary

4 Issues in Feature-Oriented Decomposition of SQL:2003
  4.1 Other Implementation Models
    4.1.1 Superimposed Variants
    4.1.2 Hyperspaces
    4.1.3 Comparison of Different Implementation Models
  4.2 SQL:2003 Specific Issues
  4.3 Related Work
  4.4 Summary

5 Conclusion
  5.1 Further Work

Bibliography

Appendices

A SQL:2003 Feature Diagrams

B Taxonomy of SQL Non-Framework Optional Features
  B.1 Java Routines and Types Using the Java Programming Language (SQL/JRT)
  B.2 SQL Object Language Bindings (SQL/OLB)
  B.3 SQL Persistent Stored Modules (SQL/PSM)
  B.4 SQL Management of External Data (SQL/MED)
  B.5 SQL XML-Related Specifications (SQL/XML)

C SQL Platform Support

List of Figures

2.1 Structure of the SEI Framework for Product Line Practice [CE00]
2.2 Domain Engineering and Application Engineering as parallel processes [CE00]
2.3 Feature Diagram with a concept node and three features
2.4 Feature Diagram with mandatory and optional features
2.5 Alternative and OR features
2.6 AND features

3.1 Parent child relationships in feature diagrams as grammar rules [Bat05]
3.2 Main Feature Diagram of SQL:2003
3.3 Domain Definition Feature Diagram
3.4 Table Definition Feature Diagram
3.5 View Definition Feature Diagram
3.6 Schema Routine Feature Diagram
3.7 Insert statement Feature Diagram
3.8 Merge statement Feature Diagram
3.9 Query Expression Feature Diagram
3.10 Query Specification Feature Diagram
3.11 Table Expression Feature Diagram

4.1 Overview of Superimposed Variants Approach [CA05]
4.2 Hyperspace matrix with two relevant dimensions; Classes and Features [PRB03]

A.1 SQL/Foundation Feature Diagram
A.2 SQL schema statement Feature Diagram
A.3 Schema Definition Feature Diagram
A.4 Column Definition Feature Diagram
A.5 Sequence Generator Feature Diagram
A.6 Trigger Definition Feature Diagram
A.7 User-Defined Type Definition Feature Diagram
A.8 Grant Privilege Feature Diagram
A.9 Privilege Feature Diagram
A.10 Alter Statements Feature Diagram
A.11 SQL Data Statements Feature Diagram
A.12 Cursor Feature Diagram
A.13 SQL Data Change Statements Feature Diagram
A.14 Delete statement Feature Diagram
A.15 Update statement Feature Diagram
A.16 SQL Transaction statements Feature Diagram
A.17 SQL Control statements Feature Diagram
A.18 SQL Connection statements Feature Diagram
A.19 SQL Session statements Feature Diagram
A.20 SQL Dynamic Statements Feature Diagram
A.21 SQL Diagnostic Statements Feature Diagram
A.22 Scalar Expressions Feature Diagram
A.23 Data Type Feature Diagram
A.24 Window Function Feature Diagram
A.25 Function Specification Feature Diagram
A.26 Search Cycle Clause Feature Diagram
A.27 Table Reference Feature Diagram
A.28 Group By Clause Feature Diagram
A.29 Window Clause Feature Diagram
A.30 Predicate Feature Diagram

List of Tables

B.1 Number of Features enlisted in the SQL:2003 Specification Draft
B.2 SQL/JRT features
B.3 SQL/OLB features
B.4 SQL/PSM features
B.5 SQL/MED features
B.6 SQL/XML features

C.1 SQL Platform Support -1
C.2 SQL Platform Support -2

List of Abbreviations

AHEAD  Algebraic Hierarchical Equations for Application Design
BNF    Backus-Naur Form
CLI    Call Level Interface
DSL    Domain Specific Language
FOD    Feature-Oriented Decomposition
FODA   Feature-Oriented Domain Analysis
FOP    Feature-Oriented Programming
FOR    Feature-Oriented Refactoring
FOSD   Feature-Oriented Software Development
JTS    Jakarta Tool Suite
JRT    Routines and Types for Java Programming Language
MED    Management of External Data
MBSE   Model Based Software Engineering
OLB    Object Language Bindings
PSM    Persistent Stored Modules
SPL    Software Product Line
SPLE   Software Product Line Engineering
SQL    Structured Query Language

Chapter 1

Introduction

1.1 Motivation

Databases have come a long way since Codd introduced the relational data model in 1970 and are now arguably the most vital component of information technology. Database technology is at the core of a multitude of software applications, such as business transaction applications of varying size, digital libraries, web applications like online banking and online shopping, scientific projects like the human genome mapping project and NASA’s earth observation system, Enterprise Resource Planning (ERP) systems, data warehouses, business intelligence applications like data mining and Online Analytical Processing (OLAP), and embedded systems for personal information management. The basic structure of databases has evolved beyond the relational model to encompass other conceptual models such as the entity-relationship model, the object-relational model, and the object role model¹. Databases are now also available as semi-structured databases, object-oriented databases, multi-dimensional databases, distributed and parallel databases, etc. It can be argued that database technology will continue to evolve rapidly as it makes forays into new domains, with new domain-specific techniques being invented and merged into current database technology.

It has been observed that the most popular database vendors tend to provide bits and pieces of support for every kind of database functionality, be it indexes, XML support, or special kinds of queries [CW00]. The many small database vendors each cater only to a specific type of database technology, but large companies inevitably offer a jumble of features packed into one product. In a way, big players have treated the features of database products as a sales and marketing issue, without much consideration of how bloated such a product becomes for the simplest of database applications. Every new release of a database product comes with the tag ‘Feature Rich’, claiming to be better than its competitors.

¹ http://www.orm.net/

Ironically, any enterprise of modest size makes use of only a tiny part of the mammoth functionality provided by database products, and this part tends to consist of low-end features [CW00]. Most of the high-end features are rarely or never used. On the one hand, it is good to have a large array of features that could give enhanced performance and other advantages; on the other hand, one can also say that database products are overloaded with features [CW00]. Database vendors have ‘learned’ to adopt a one-size-fits-all approach and maintain a single code line with all the database management services they provide. Stonebraker et al. [SC05] identify the reasons for this as the maintenance and compatibility cost of the code base and sales and marketing problems. They conclude that this approach is no longer applicable and that “the commercial world (of databases) will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser”.

Like databases, which are getting caught in the universality trap [CW00], the Structured Query Language (SQL), which is the basis for interaction between database technology and its users, has grown enormously. As Chaudhuri et al. [CW00] claim, the core of SQL, selection-projection queries and aggregation, is extremely useful. Beyond those conceptually simple queries, however, SQL now contains an ever-growing number of additional constructs like nested subqueries, recursive views, joins with added functionality, value support, data structures like multi-sets and arrays, the XML type and so on, asking an ordinary developer to learn all the esoteric syntax and know the correct application of each construct. Although every added feature is useful in a specific context, the sheer number of choices increases complexity and confusion for the developer.

Software Product Line Engineering (SPLE) is a software engineering approach that addresses such issues for a set of similar products that are made for a specific market segment and differ in features; such a set is called a Software Product Line (SPL). We propose that software product line research is capable of providing answers to the problems of features in products, especially database products, and that SPLs should be developed using appropriate decomposition mechanisms for the products concerned. This should apply to all artifacts involved in software product line development. SPLE considers software artifacts in terms of reusable assets with predictable reuse in the products of the product line. Therefore, engineering database technology as a software product line using Feature-Oriented Software Development (FOSD) concepts is favorable in terms of predictive reuse, as against merely opportunistic reuse², and can be applied to database management systems in a variety of domains. We propose that feature-oriented concepts are applicable to SQL:2003 as well, which is the latest ISO/ANSI standard for SQL.

Decomposing SQL:2003 using feature-oriented concepts can be beneficial and insightful not only in managing features of SQL itself but also in database technology of embedded and real time systems as well as in the area of software generators where these concepts are immediately applicable as explained in the following:

• Database Systems for Smartcards and other Embedded Systems. Embedded systems contain both hardware and software with varying influence on the software part, e.g., music systems, mobile phones, personal digital assistants. Smartcards are a kind of embedded system that contains a software chip which can store and manipulate data. Bobineau et al. [BBPV00] have identified the need to scale down database functionality for smartcards. According to them, smartcards are being used to store personal data in healthcare, banking and insurance applications, and in each case sophisticated queries are run against this data. Kersten et al. [KWF+03] have similarly identified the need for embedding database management systems in various hardware appliances. Peer-to-peer and stream based architectures for embedded devices also require declarative query processing for resource discovery, caching, archiving, etc. [KWF+03]. An ISO standard called Structured Card Query Language (SCQL) considers inter-industry commands for use in smart cards [Int99]. A feature decomposition of SQL:2003 can be used to create a ‘scaled down’ version of SQL appropriate for such applications, by establishing a product line architecture for SQL variants.

² http://www.softwareproductlines.com/introduction/introduction.html

• Database Systems for Real Time Applications. Real time applications such as automotive control systems demand management of large amounts of data with temporal properties, against which pre-compiled and ad-hoc queries are executed [NTN+04]. The Electronic Control Units in automotive systems, which are responsible for processing the collected data, require different kinds of database functionality as well as different kinds of queries, resulting in different configurations of the database system. Nyström et al. [NTN+04] suggest that different database configurations, including the query processing component for these configurations, can be generated from preconfigured database components. This configuration activity can benefit from a well defined SQL product line in which reusable assets for the most common configurations are considered.

• Software Generators for Programming Languages. In current software design and development, the view of real world entities in terms of objects, classes, and methods is quite prevalent. According to Batory et al. [BBGN01], this makes it difficult to create low level specifications of applications that can be used in automating software generation. To make this happen, they argue that the focus in software engineering approaches must shift to the key concepts of Domain Specific Language (DSL) and Features. There must be programming support not just in terms of generic languages but also in terms of domain specific languages, so that domain and task specific notations can be used to produce superior software design. Instead of low level code centric components, reusable units that implement orthogonal features must be the building blocks of this support [BBGN01]. They assert that the rationale for this is that customers demand features in products and do not care about the code contained in the software. An effort in this direction was made with GenVoca [BO92], [BJMvH00], [BLHM02], in which a language and its extensions are viewed as reusable units that encapsulate features rather than code modules. Bali is a related tool that is used to compose grammars of a programming language in order to extend it [BLS98], signifying the feature view of programming language extensions. We base our feature decomposition of SQL:2003 mainly on the grammar specification for SQL:2003 in various ISO/ANSI standards, and thereby gain further insights into feature extension of a declarative query language, and of programming languages in general, by composing extension grammars.

Thus, a variety of software applications stand to gain by application of product line concepts to database technology. We take a step in this direction by focusing on the feature-oriented decomposition of SQL:2003.

1.2 Goals

In this thesis we explain how features and feature modeling form the analysis phase of Software Product Line Engineering and show that SQL:2003 can be decomposed on the basis of the feature concept. Specifically, we intend to:

1. Decompose SQL:2003 into features using various SQL:2003 ISO/ANSI standards and present these features in terms of feature diagrams.

2. Present an example of how to construct sub-grammars of SQL:2003 based on decomposed features and show how to compose them to obtain customized parsability for SQL:2003.

3. Review related feature implementation models and compare them within the context of SQL:2003 features.

The detailed structure of the thesis is explained in the next section.
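As an informal illustration of the second goal, composing sub-grammars can be pictured as merging sets of production rules, one set per feature. The following Python sketch is only a toy under stated assumptions: the dictionary representation, the rule names, and the `compose` helper are hypothetical and stand in neither for the SQL:2003 grammar nor for the composition mechanism used later in this thesis.

```python
from typing import Dict, List

# Assumed representation: a grammar maps each nonterminal to its alternatives.
Grammar = Dict[str, List[str]]

# Hypothetical sub-grammars, one per feature; rule names are illustrative only.
BASE_QUERY: Grammar = {
    "query_specification": ["SELECT select_list table_expression"],
    "table_expression": ["from_clause"],
}
WHERE_FEATURE: Grammar = {
    "table_expression": ["from_clause where_clause"],
    "where_clause": ["WHERE search_condition"],
}


def compose(*grammars: Grammar) -> Grammar:
    """Merge sub-grammars by unioning the alternatives of each nonterminal.

    The order of first appearance is kept and the inputs are not modified.
    """
    result: Grammar = {}
    for grammar in grammars:
        for nonterminal, alternatives in grammar.items():
            merged = result.setdefault(nonterminal, [])
            for alternative in alternatives:
                if alternative not in merged:
                    merged.append(alternative)
    return result


# Selecting the (hypothetical) WHERE feature extends the base query grammar.
composed = compose(BASE_QUERY, WHERE_FEATURE)
```

In this sketch, leaving a feature's sub-grammar out of the call to `compose` yields a grammar that does not accept the corresponding construct, which is the intuition behind customized parsability.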

1.3 Structure of the Thesis

Chapter 2 In chapter 2 we set the background for the feature-oriented decomposition of SQL:2003. We first review the history of SQL and its standardization. The standardization process is seen in the light of features added in the form of packages to the various SQL standards. We then review Software Product Line Engineering and its two sub-processes, Domain Engineering and Application Engineering. The core concepts of features, feature diagrams and feature-oriented decomposition are then discussed, followed by a section on the related discipline of feature-oriented programming. Finally, we conclude the chapter with an elaborate account of the concepts of Separation of Concerns and Multi-dimensional Separation of Concerns.

Chapter 3 In chapter 3 we begin by explaining the modeling techniques and establishing the basis for the feature-oriented decomposition of SQL:2003. Some of the important feature diagrams are presented to explain the modeling technique used. We follow the decomposition by explaining how features in the feature diagrams can be associated with sub-grammars of SQL:2003, and we compare the Bali approach with our own approach.

Chapter 4 In chapter 4 we present other implementation models and compare them in the context of mapping SQL:2003 features. Various SQL:2003 specific issues are then discussed. We conclude the chapter with a review of the related work.

Chapter 5 Finally, we present the conclusions in chapter 5 and discuss further work.

Chapter 2

Background

In this chapter we explore various topics required as background for proper understanding of the feature-oriented decomposition of SQL:2003. We review the history of SQL and its standardization. We explore the product line concepts further. Then we talk about feature-oriented decomposition and feature-oriented programming. Finally we discuss separation of concerns in the light of SQL:2003 features.

2.1 SQL

The following definition of SQL is given in [vdL06]: “Structured Query Language (SQL) is a database language used for formulating statements that are processed by database server.” A database is assumed to be “a collection of persistent data” [Dat95], and a database server or database management system is “a collection of programs that enable users to create and maintain a database” [EN03].

SQL is based on a formal theory known as Codd’s Relational Model [Cod70]. Although it was initially labeled a declarative, non-procedural database language, SQL has since become a hybrid language with both non-procedural and procedural constructs. Triggers and stored procedures are examples of procedural constructs. SQL can be used interactively, where a user issues SQL statements to manipulate data, or in a pre-programmed manner, in which case the user need not be aware of SQL statement syntax; the latter is common in most business applications [vdL06].

The following account is based on history of SQL presented in [vdL06].

2.1.1 History

The history of SQL is closely related to the history of an IBM project called System R, the purpose of which was to create a relational database server. A language called SEQUEL was developed as a database language for System R by designers R. F. Boyce and D. D. Chamberlin; it was later renamed SQL [vdL06].

In ‘phase zero’ of the System R project (1974-1975), only a part of SQL, without JOIN queries, was implemented. In ‘phase one’ of the project (1976-1977), SQL was implemented afresh with multi-user capability and support for JOIN queries. In ‘phase two’, the final phase (1978-1979), System R was installed and evaluated at various client sites. Knowledge gained in this implementation was useful in further advancing the capabilities of SQL, and IBM started developing commercial products based on System R. Finally, SQL was standardized for the first time by the American National Standards Institute (ANSI) in 1986 [vdL06].

2.1.2 Standardization and Evolution

The first ANSI edition of the SQL standard was created in 1986 and is unofficially called SQL1 [vdL06]. In 1987, the ISO edition called ISO 9075-1987, ‘Database Language SQL’, was completed. SQL1 had a very small set of integrity mechanisms. They were extended by adding support for primary and foreign keys in 1989 [vdL06]. This standard is known as SQL89, and the corresponding SQL document is called ISO 9075-1989, ‘Database Language SQL with Integrity Enhancements’. The successor to the 1989 standard was given the name SQL2. Many new statements and extensions were added to the 1989 standard to create the SQL92 standard [vdL06]. After SQL92 was published, SQL/CLI (Call Level Interface) was added in 1995. In 1996, SQL/PSM (Persistent Stored Modules) was added. In 1998, SQL/OLB (Object Language Bindings) was published as well. SQL3, or the SQL 1999 standard, ultimately comprised five parts: SQL/Framework, SQL/Foundation, SQL/CLI, SQL/PSM and SQL/OLB. At the same time, plans for further additions were made. In 2003, the newest edition of the SQL standard, referred to as ‘SQL:2003’, was created [vdL06]. It consists of SQL/JRT (Routines and Types for Java Programming Language), SQL/XML and SQL/MED (Management of External Data), along with the original five parts of the SQL 1999 standard with additional improvements. The part of SQL/Foundation that dealt with schema was taken out of it and put into SQL/Schemata in SQL:2003. For a description of the various SQL packages, refer to Appendix B.

The SQL/CLI of 1995 was created based on a report by ‘SQL Access Group’, a committee set up by database vendors Informix, Ingres and Oracle, that attempted to define a standard for interoperability between applications created using different specifications. Finally Microsoft developed ODBC (Open Database Connectivity) based on SQL/CLI [vdL06].

The next standard, called SQL 2007¹, is in the making. It will add features like regular expression support, binary and floating decimal data types, materialized views, streaming data support, XQuery support and further enhancements to SQL/XML, and support for RDF and the semantic web.

The standardization process clearly shows how the core of SQL has remained more or less constant from 1992 onward, with additional features added to cover the foray of database technology into other areas of computing. This discussion of features becomes relevant when we see how product line concepts are immediately applicable to database technology and its base standards.

¹ http://www.standards.org.au/downloads/ABS-2005-12.pdf

SQL:2003, as we have seen, added SQL/XML and made some modifications to other parts of SQL 1999. The following features are specific to SQL:2003 [EMK+04]:

• New data types BIGINT, MULTISET, and XML. Types BIT and BIT VARYING were removed.

• Improved SQL-invoked routines (especially table functions that return a ’table’)

• New Create Table AS and Create Table Like statements, which are extensions to the Create Table statement

• New Merge statement, which combines the facility provided by INSERT and UPDATE statements

• New Sequence Generators, which can automatically generate unique values for columns

• New Identity and Generated Columns, which automatically generate next values for specified columns based on evaluation of associated scalar expression

• New Window clause in Query Expression, which can be used to define window of rows against which window functions can be executed

• Support for sample data (Tablesample) for improved performance

• Improved Savepoint handling

2.2 Software Product Line Concepts

Software Product Line Engineering is a methodology for developing a variety of quality software systems in a short time [PBvdL98]. SPLE differs from other methodologies in its stress on capturing and managing variability. SPLE contains two distinct development processes: Domain Engineering and Application Engineering. We first review SPLE and then Domain Engineering and Application Engineering in turn.

2.2.1 Software Product Line Engineering

Pohl et al. [PBvdL98] give the following definition: “Software Product Line Engineering is a paradigm to develop software applications (software intensive systems and software products) using platforms and mass customization.” This definition covers both standalone software and software embedded into a system that integrates hardware and software (embedded systems). Developing applications using platforms means planning for reuse and building reusable assets. Building applications for mass customization means employing the concept of managed variability. Managed variability means that adaptations need to be anticipated and accomplished in a controlled and reproducible manner. Domain Engineering and Application Engineering are deemed sub-processes of SPLE. Pohl et al. [PBvdL98] assert that there is a separation of two concerns here, namely “to build a robust platform and to build customer-specific applications in a short time”. The first refers to Domain Engineering and the second to Application Engineering. Figure 2.1 shows this relationship between Domain and Application Engineering in terms of reusable assets development and product development respectively. Further description of the concept of ‘separation of concerns’ is given in Section 2.5.

Figure 2.1: Structure of the SEI Framework for Product Line Practice [CE00]

2.2.2 Domain Engineering

Czarnecki et al. [CE00] give the following definition of Domain Engineering.

Domain Engineering is the activity of collecting, organizing, and storing past experience in building systems or parts of systems in a particular domain in the form of reusable assets (i.e. reusable work products), as well as providing an adequate means for reusing these assets (i.e., retrieval, qualification, dissemination, adaptation, assembly, and so on) when building new systems.

A common element among different Domain Engineering definitions is support for reuse in a family of similar applications.

Like the analysis, design and implementation phases of software engineering, Domain Engineering consists of a Domain Analysis phase, a Domain Design phase and a Domain Implementation phase. The phases of Application Engineering parallel the phases of Domain Engineering with a Requirements Analysis phase, a Product Configuration phase and an Integration and Testing phase respectively [CE00]. Customer needs are assessed during Requirements Analysis of Application Engineering, while in Domain Analysis useful knowledge about the domain is gathered. In feature based methodology, requirements are presented as features and the domain model is built as the Product Configuration and Domain Design phases go hand in hand. The results of these phases are used in establishing the product line architecture, where different product configurations yield different products of the product line. The Domain Implementation phase may use Domain Specific Languages (DSLs) and other generator tools during the integration of similar products. The entire process is repeated as new requirements arise, which in some cases calls for further Domain Analysis.

The phases of Domain Engineering are explained further:

• Domain Analysis. Domain Analysis is used to define a specific domain and establish its scope [CE00]. Information from current systems (if available), different stakeholders, experiments and prototypes created before, standards documents (such as the various SQL:2003 standards documents used in this work) and any other related information available in any form is used during Domain Analysis. This is not mere bookkeeping of all domain related information; rather, it is used to gain as extensive a knowledge as possible about a given domain, so that the scope of the domain can be established and insights about reuse are obtained.

With the knowledge about reusable assets in the system [CE00], a domain analyst can represent the common and variable parts of the system. The domain model contains information about relationships between common and variable parts as well as any accompanying constraints. Feature models are used to represent the set of reusable and configurable requirements, treated as features, and consist of feature diagrams and additional information.

Czarnecki et al. [CE00] introduce two kinds of domain scope with respect to software systems in a domain: Horizontal or System Category Scope and Vertical or Per System Scope, which consider how different systems are formed in the domain and what parts of these systems are in domain respectively. In this way, Domain Analysis involves Domain Scoping and Domain Modeling.

• Domain Design. Domain Design is used to create a product line architecture [CE00]. For this, different functional and non-functional requirements such as performance, adaptability, and extensibility are considered [CE00]. System components are arranged in architectural patterns. One of these is the ‘layers pattern’, which arranges system components in groups of subtasks at a particular level of abstraction; another is the ‘microkernel pattern’, which represents a minimal functional core that can be extended with customer specific parts of the system [CE00]. The architecture also establishes how variability is represented and how products can be configured. Czarnecki et al. [CE00] maintain that configuration languages can be used for the configurable or variable parts of the system.

• Domain Implementation. In the final phase of Domain Engineering, the architecture established during Domain Design is implemented along with the production plan [CE00]. Various generator tools, configuration and other domain specific languages, GUIs, etc., may be used during Domain Implementation for realizing product specific production plans. If products delivered to customers need to be augmented with more features, custom development may be carried out with these tools [CE00]. This is a customer specific addition of features, as opposed to creating a basic product variant based on configuration.

Figure 2.2: Domain Engineering and Application Engineering as parallel processes [CE00]
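The domain model described under Domain Analysis, with its common parts, variable parts and accompanying constraints, can be pictured with a small data structure. The Python sketch below is a minimal illustration under stated assumptions: the `FeatureModel` class, its `check` method, the feature names and the single "requires" constraint are all hypothetical and are not taken from [CE00] or from the SQL:2003 feature diagrams.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class FeatureModel:
    """Hypothetical domain model: common parts, variable parts, constraints."""

    common: FrozenSet[str]    # features present in every product
    variable: FrozenSet[str]  # features selectable per product
    requires: Dict[str, Set[str]] = field(default_factory=dict)  # constraints

    def check(self, selected: Set[str]) -> List[str]:
        """Return the violations of a configuration; empty means valid."""
        problems: List[str] = []
        for feature in sorted(self.common - selected):
            problems.append(f"missing common feature: {feature}")
        for feature in sorted(selected - self.common - self.variable):
            problems.append(f"unknown feature: {feature}")
        for feature in sorted(selected):
            missing = self.requires.get(feature, set()) - selected
            for dependency in sorted(missing):
                problems.append(f"{feature} requires {dependency}")
        return problems


# Illustrative model; the feature names and the constraint are assumptions.
model = FeatureModel(
    common=frozenset({"query_specification", "table_expression"}),
    variable=frozenset({"window_clause", "window_function", "merge_statement"}),
    requires={"window_function": {"window_clause"}},
)

valid = model.check(
    {"query_specification", "table_expression", "window_clause", "window_function"}
)
invalid = model.check({"query_specification", "window_function"})
```

Each product of the product line then corresponds to one selection of features for which `check` reports no violations.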

2.2.3 Application Engineering

Czarnecki et al. [CE00] define Application Engineering as “the process of building systems based on the results of Domain Engineering”. The phases of Application Engineering operate in parallel with the phases of Domain Engineering. The two processes can be differentiated by the fact that Domain Engineering considers all possible systems within the restricted domain, using the scope defined during domain scoping, whereas Application Engineering considers a concrete application based on customer requirements [PBvdL98]. Thus, different applications may be engineered at different times, making use of knowledge acquired during previous Domain Engineering phases. If requirements change or additional requirements are made, these can be supported by the Domain Design specification and by product configuration tools for application ordering (cf. Figure 2.2). Domain Engineering can thus be characterized as design-for-reuse, while the basic principle of Application Engineering can be designated as design-with-reuse² [CE00].

How the products of a product line relate

There are two ways in which the products of a product line may relate to each other [SB00]. In the first, all products of a product line have common functionality, while the remaining features are mutually exclusive. In the second, different products may have different core functionality, and the remaining features complement each other. With respect to the first, a product line offers customers products that provide the same basic functionality, and customers can buy extra features as required in addition to the core functionality. With respect to the second, a customer can review product specifications and combine various features, in other words customize the product, including the core and optional components, as required.

² http://www.mpi-inf.mpg.de/~kettner/courses/lib design 03/notes/intro.html

2.3 Feature-Oriented Decomposition

We first review the concept of a feature. We then discuss feature diagrams, the standard diagramming notation for organizing features hierarchically. Next, we review other information generally associated with feature diagrams, and finally we present a definition of feature-oriented decomposition.

2.3.1 Features

Definitions

Different definitions of features can be found in the related literature. Czarnecki et al. [CE00] give two definitions of features, as found in the Domain Engineering literature.

An end-user-visible characteristic of a system.

A distinguishable characteristic of a concept (e.g., system, component, and so on) that is relevant to some stakeholder of the concept.

Svahnberg et al. [SvGB01] define a feature as a “set of functional and non-functional requirements”. This follows from their assumption that “there is an order of magnitude difference between number of stated requirements and features encapsulating those requirements” and that a feature is used to “group related requirements”, which in turn follows from their notion of features as an abstraction from requirements. Batory et al. [BLHM02] define a feature as “an increment in program functionality” and also state that “it is a product characteristic that is used in distinguishing programs within a family of related programs”. Czarnecki et al. [CHE04], in their cardinality-based feature modeling, extend the definition of features from “end-user-visible and distinguishable characteristic” to “any functional and non-functional characteristic at requirements, architectural, component, platform or any other level”.

In modeling the features of SQL:2003, we take the view of features as end-user-visible and distinguishable characteristics of a concept.

2.3.2 Feature Diagrams

Feature diagrams [KCH+90], [CE00] are used to model features hierarchically as a tree, the root of which represents a concept. A feature diagram together with some additional information constitutes a feature model. The general contents of the additional information are given in Section 3.1.

The root of a feature diagram is called a concept node as it represents a concept, shown as the node CN in Figure 2.3. Other nodes are feature nodes. The hierarchical structure of a feature diagram indicates that there is a parent-child relationship between feature nodes. In Figure 2.3, A, B, and C are features of the concept represented by node CN. Additionally, A is the parent node of B, and B is the parent node of C. A is a direct feature of CN, while B and C are indirect features of CN [CE00]. B is also called a direct subfeature of A, while C is an indirect subfeature of A [CE00]. The CN node can be a feature itself as well as a concept.

Figure 2.3: Feature Diagram with a concept node and three features.
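The parent-child terminology can be made concrete with a small sketch. The Python fragment below is only an illustration: it encodes the chain described for Figure 2.3 as a child-to-parent mapping and derives direct and indirect features from it. The dictionary `PARENT` and the two helper functions are names assumed here for illustration; they are not part of [CE00].

```python
# Chain described for Figure 2.3: CN is the concept node, A is a child of CN,
# B is a child of A, and C is a child of B.
PARENT = {"A": "CN", "B": "A", "C": "B"}


def direct_features(node):
    """Features whose parent is `node` (direct features or subfeatures)."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def indirect_features(node):
    """Features below `node` that are not its direct children."""
    result = []
    frontier = direct_features(node)
    while frontier:
        next_frontier = []
        for feature in frontier:
            next_frontier.extend(direct_features(feature))
        result.extend(next_frontier)
        frontier = next_frontier
    return sorted(result)


print(direct_features("CN"))    # ['A']
print(indirect_features("CN"))  # ['B', 'C']
print(direct_features("A"))     # ['B']
print(indirect_features("A"))   # ['C']
```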

Feature diagrams contain various types of features, such as mandatory features, optional features, AND features, alternative features, and OR features. An instance description is obtained by including the concept node of the feature diagram and traversing the diagram from it; whether a visited node becomes part of the instance description depends on the type of the node [CE00].

Mandatory Features

These are the features that identify the product. A mandatory feature is always included in the instance description, except when its own parent is optional and not included in the instance description. Consider Figure 2.4. In any instance description of this feature diagram, CN and C are always included.

Optional Features

Optional features may or may not be included in the instance description of a feature diagram. They add value or extra functionality to the core features [SvGB01]. In Figure 2.4, A and D are optional features. B is a mandatory feature, but it is included in the instance description only when A is included too.

Figure 2.4: Feature Diagram with mandatory and optional features.

Alternative Features

These form a set of features, only one of which can be included in the instance description, provided that their parent is included too. The edges of the alternatives are joined by an arc; such arcs are called edge decorations [CE00]. In Figure 2.5, CN has the direct alternative features A, B, and C, only one of which can be selected at a time.

Figure 2.5: Alternative and OR features.

OR Features

These form a set of features from which any non-empty subset can be included in the instance description, provided that their parent is also included. In Figure 2.5, feature C has three OR features D, E, and F. Any non-empty subset of these can be included in the instance description when feature C is selected among the alternatives.

AND Features

These form a set of features all of which are considered for the instance description when their parent is included; whether each one is actually included depends on the type of that feature node. In Figure 2.6, two instance descriptions are possible, one with feature B and one without feature B, while all other features are included in both.

Figure 2.6: AND features.
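The rules for mandatory, optional, alternative, and OR features can be checked by enumerating instance descriptions mechanically. The following Python sketch is one possible encoding, not the notation of [CE00]: the `Feature` class, its field names, and the function `subtree_instances` are assumptions made for illustration. The two example diagrams follow the textual descriptions of Figures 2.4 and 2.5; since the figures themselves are not reproduced here, placing D directly under CN in Figure 2.4 is also an assumption.

```python
from dataclasses import dataclass, field
from itertools import combinations, product


@dataclass
class Feature:
    name: str
    optional: bool = False                        # solitary feature: optional or mandatory
    children: list = field(default_factory=list)  # solitary (AND-decomposed) subfeatures
    groups: list = field(default_factory=list)    # (kind, [Feature, ...]) pairs


def subtree_instances(feature):
    """All instance descriptions of `feature`, given that it is selected."""
    options = []
    for child in feature.children:
        picks = subtree_instances(child)
        if child.optional:
            picks = picks + [frozenset()]         # an optional feature may be left out
        options.append(picks)
    for kind, members in feature.groups:
        if kind == "alternative":                 # exactly one member
            subsets = [(member,) for member in members]
        else:                                     # "or": any non-empty subset
            subsets = [combo for size in range(1, len(members) + 1)
                       for combo in combinations(members, size)]
        picks = []
        for subset in subsets:
            for combo in product(*(subtree_instances(m) for m in subset)):
                picks.append(frozenset().union(*combo))
        options.append(picks)
    return [frozenset({feature.name}).union(*combo)
            for combo in product(*options)]


# Figure 2.4 as described in the text (D under CN is an assumption).
fig_2_4 = Feature("CN", children=[
    Feature("A", optional=True, children=[Feature("B")]),
    Feature("C"),
    Feature("D", optional=True),
])

# Figure 2.5: CN has alternatives A, B, C; C has OR features D, E, F.
fig_2_5 = Feature("CN", groups=[("alternative", [
    Feature("A"),
    Feature("B"),
    Feature("C", groups=[("or", [Feature("D"), Feature("E"), Feature("F")])]),
])])

print(len(subtree_instances(fig_2_4)))  # 4
print(len(subtree_instances(fig_2_5)))  # 9
```

Under these assumptions, the first diagram yields four instance descriptions, all of which contain CN and C, and the second yields nine: one each for A and B, and seven for C combined with a non-empty subset of D, E, and F.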

Cardinality Based Feature Modeling

Czarnecki et al. [CHE04], [CK05] have proposed cardinality-based extensions to the original feature model by Kang et al. [KCH+90].

The OR features of the original feature model were extended to feature groups with a group cardinality (n-m) that specifies the minimum (n) and the maximum (m) number of features to be selected from the group. The original model allows ‘one-or-more’ features to be selected, without the facility to specify bounds. If no group cardinality is given for a group, then (1-1) is the default cardinality. The cardinality of a solitary feature (i.e. a feature that is not part of a group) determines how many times the feature can be cloned, i.e. how many times the subtree (if any) emanating from this feature can be copied. Accordingly, a mandatory solitary feature has cardinality [1..1] and an optional solitary feature has cardinality [0..1]. Another addition to the original model is attributes. A feature can have at most one attribute, which itself can have a type associated with it.
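As a rough illustration of group cardinality, the sketch below checks a selection against the bounds (n-m) of a group. All function and variable names are hypothetical. Reading the original ‘one-or-more’ OR group as the bounds (1-k), where k is the size of the group, is the interpretation assumed here.

```python
def within_group_cardinality(selected, group, n, m):
    """True if at least n and at most m members of `group` are selected."""
    count = len(set(selected) & set(group))
    return n <= count <= m


def or_group_bounds(group):
    """Bounds assumed here for an original OR group: one or more members."""
    return 1, len(group)


DEFAULT_GROUP_BOUNDS = (1, 1)  # default when no group cardinality is given

# Cardinalities of solitary features as stated in the text.
SOLITARY_CARDINALITY = {"mandatory": (1, 1), "optional": (0, 1)}

group = ["D", "E", "F"]
print(within_group_cardinality({"D", "E"}, group, *or_group_bounds(group)))  # True
print(within_group_cardinality({"D", "E"}, group, *DEFAULT_GROUP_BOUNDS))    # False
print(within_group_cardinality({"D"}, group, 2, 3))                          # False
```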

Feature Variability

Optional, alternative, optional alternative, and OR features are called variable features, and the nodes to which these features are attached are called variation points [CE00].

Bosch et al. [BFG+02] identify the following as the most important issues about feature variability:

• Mechanism It is a way of generating or configuring different products of the product line, and is used at various times during the development life cycle. Inheritance, preprocessor directives, make and build files, and feature configuration templates are some examples of mechanisms that can be used to configure and generate different products.

• Phase Variable features may be introduced and bound at different phases of the product life cycle. Introducing a variable feature later may cause some restructuring of the domain model, but if feature analysis had anticipated a variation point, then it is easier to accommodate a new variable feature. Since modeling variability and commonality is the hallmark of feature-oriented domain analysis, it is better suited than other formalisms to cope with such variable features.

• Representation As methodologies, both object-orientation and feature-orientation contain formalisms and diagramming notations. Object-oriented notation is concerned with classes, subclasses, interactions, etc., but contains no notation for features, while in feature-orientation the diagramming notation (feature diagrams and their extensions) gives precedence to features over internal details.

• Dependency Dependencies capture relationships between variation points and other features.

• Tool support Tool support is the presence of proper software tools to manage variability in products and assemble or generate different products of a product line.

Bosch et al. [BFG+02] found that, in any software engineering approach, variability is addressed for the first time at the architectural level. They assert that variability amounts to delaying design decisions, and that at the architectural level variability analysis is devoted to the abstractions of variation points, without considering how these are actually incorporated in the products.

A final observation made about variability by Bosch et al. [BFG+02] is that variability is not ‘fixed’ in time. That is, variation points themselves may evolve, so variability needs to be managed not only in space but also in time. Along the space axis, it should be possible to create different products at the same time, and along the time axis, individual products may evolve. This is important in creating a complete product line architecture for SQL:2003 using decomposed features. The implementation models for SQL:2003 should consider variability in space and time, so that the evolution of the SQL:2003 product line and that of individual implementations can be managed efficiently.

Other information associated with feature diagrams

A feature model represents the common and the variable features of a system under consideration and consists of feature diagrams and some additional information. The following information is generally associated with feature diagrams [CE00]:

• Semantic description The semantic description contains a short description of the feature, to be used by the developer during the implementation phase as a quick reference for what a feature means. Any information that is useful in understanding more about features can be given in the semantic description, including additional diagramming notation based on a given formalism.

• Rationale A feature diagram may also contain information about why a specific feature was included, i.e. the intent of a feature.

• Stakeholders and client programs Different stakeholders are interested in different features. This information can be attached in addition to the semantic description and rationale, and may be used by developers to segment the system by stakeholder type and treat the features separately (i.e. provide different functionality) based on stakeholder type.

• Exemplar systems If the feature exists in another system, then information about how it is used there, how it was developed, how it was integrated into the overall system, what formalisms were used in various phases to represent system-related issues, and what kind of mapping method was used to create an executable version of the system can give the developer of the current system insights and direction. Therefore, details about such implementations can be added as descriptions to feature diagrams.

• Constraints and default dependency rules Two types of rules are most important for instantiating a feature mapped to models: ‘requires’ and ‘excludes’. These conditions can span feature diagrams, often involving features from different feature diagrams. Czarnecki et al. [CE00] assert that default dependency rules are used to assign default values to feature attributes, which can be used as is or overwritten during model configuration. Together, constraints and default dependency rules can be used to establish a configuration of a feature model.

• Availability sites, binding sites, binding modes Availability site indicates which feature is available to which stakeholder, including specific part of the system itself. Binding site and binding mode determine where and when a feature is bound and whether statically or dynamically.

• Priorities Priorities specify importance of a feature for inclusion in the overall system. Features of higher priority are implemented before features of lower priority.
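The ‘requires’ and ‘excludes’ constraints and the default dependency rules listed above lend themselves to a simple mechanical check. The sketch below is a minimal illustration under stated assumptions: the feature names A to D, the attribute `buffer_size`, and both function names are hypothetical and serve only to show how a configuration could be validated.

```python
def violated_constraints(selected, requires=(), excludes=()):
    """Return a description of every constraint that `selected` violates.

    `requires` holds pairs (x, y) meaning "x requires y";
    `excludes` holds pairs (x, y) meaning "x excludes y".
    """
    selected = set(selected)
    problems = []
    for feature, needed in requires:
        if feature in selected and needed not in selected:
            problems.append(f"{feature} requires {needed}")
    for first, second in excludes:
        if first in selected and second in selected:
            problems.append(f"{first} excludes {second}")
    return problems


def apply_defaults(defaults, overrides):
    """Default attribute values are used as is unless overwritten."""
    return {**defaults, **overrides}


requires = [("A", "B")]   # hypothetical: selecting A requires B
excludes = [("C", "D")]   # hypothetical: C and D cannot be combined

print(violated_constraints({"A", "B", "C"}, requires, excludes))  # []
print(violated_constraints({"A", "C", "D"}, requires, excludes))
# ['A requires B', 'C excludes D']
print(apply_defaults({"buffer_size": 16}, {}))                    # {'buffer_size': 16}
print(apply_defaults({"buffer_size": 16}, {"buffer_size": 64}))   # {'buffer_size': 64}
```

Because the constraints are plain pairs of feature names, the same check applies when the two features come from different feature diagrams.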

Having established the concept of features and feature diagrams, we present the definition of Feature-Oriented Decomposition.

Feature-Oriented Decomposition is a feature modeling activity, carried out in Domain Analysis, to capture the commonalities and variabilities of systems in a domain in terms of features. The concepts of features and feature modeling were originally developed by Kang et al. [KCH+90] as part of Feature-Oriented Domain Analysis (FODA). FODA was further developed at the Software Engineering Institute (SEI) [CE00]. According to Czarnecki et al. [CE00], FODA later became part of Model Based Software Engineering (MBSE), which encompasses both Domain Engineering and Application Engineering. As such, FODA is the Domain Analysis component of MBSE³.

As seen in Section 2.2.2 on Domain Engineering and Domain Analysis, FODA consists of phases which set up the scope of the domain and produce a model (which is a feature model). In FODA, these are called the Context Analysis and Domain Modeling phases, respectively. Context Analysis is used to study the domain scope. Domain Modeling itself consists of Information Analysis, Feature Analysis, and Operational Analysis [CE00]. Information Analysis “captures domain knowledge about domain entities and relationship between them” and Feature Analysis “captures customer’s or end user’s understanding of the general capabilities of applications in domain” [CE00]. The last phase, Operational Analysis, establishes how the application works and how the features in the feature model relate to the corresponding entities in the model to which it is mapped. In carrying out the feature-oriented decomposition of SQL:2003, we consider the first two phases of Domain Modeling, especially Feature Analysis.

³ http://www.sei.cmu.edu/mbse/

2.4 Feature-Oriented Programming

Feature-Oriented Programming (FOP)⁴ [Pre97] is the study of feature modularity and how to use it in program synthesis [Bat03a].

FOP is based on the stepwise development methodology [Wir71], which is concerned with constructing complex programs by adding incremental details to a simple program [Bat03a]. In FOP, the incremental details are features.

The following are basic premises of FOP [BBGN01]:

• Algebraically, programs are values and extensions or refinements are functions, and their composition is an expression that maps programs as values.

• A domain model can be represented as a set of algebraic operations, i.e. in terms of values and functions. The compositional expressions of these algebraic operations define a space of programs that can be synthesized over the domain model. Based on experience with the relational algebra, algebraic entities could similarly be used in algebraic representation of a domain model to optimize refinement expressions of programs.

• In terms of its effect on program design, a feature addition to a program incurs significant changes. Therefore it is a large scale program extension.

• This large scale extension would indicate altering definitions of existing classes by adding member variables and functions and also adding extra classes to the base definition.

• Treating features of a feature model as set of values and functions, algebraic composition can be applied to synthesize customized programs.

Salient ideas of FOP are expressed by two models: GenVoca and its successor AHEAD.
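The premise that a feature is a large-scale extension, one that alters existing classes and adds new ones, can be illustrated with a small sketch. The Python code below is not how GenVoca or AHEAD implement refinement; it merely mimics the idea with subclassing. The class names `Stack` and `Report`, the feature name `counting`, and the representation of a program as a dictionary of classes are all assumptions made for illustration.

```python
def base():
    """Base program: a dictionary containing a single class."""
    class Stack:
        def __init__(self):
            self.items = []

        def push(self, item):
            self.items.append(item)

    return {"Stack": Stack}


def counting(program):
    """Feature: refines the existing Stack class and adds a new class."""
    parent = program["Stack"]

    class Stack(parent):
        def __init__(self):
            super().__init__()
            self.pushes = 0          # added member variable

        def push(self, item):
            self.pushes += 1         # added behaviour
            super().push(item)       # then delegate to the refined class

    class Report:                    # class added by the feature
        def summary(self, stack):
            return f"{stack.pushes} pushes"

    return {**program, "Stack": Stack, "Report": Report}


program = counting(base())           # the feature applied to the base program
stack = program["Stack"]()
stack.push("x")
stack.push("y")
print(stack.items)                         # ['x', 'y']
print(program["Report"]().summary(stack))  # 2 pushes
```

Here the feature is a function from programs to programs, so it can be applied to any base program that provides a `Stack` class.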

2.4.1 GenVoca

The basic ideas of FOP as stated above were first implemented in GenVoca [BO92].

⁴ http://www.cs.utexas.edu/users/schwartz/Started.html

Let f and g represent base programs with specific features.

As stated before, a program extension is a function that maps programs as values.

The • is the composition operator. ‘a•x’ indicates that feature a is added to program x. Similarly, the equation ‘App1 = a•f’ indicates that feature a is added to program f to obtain application App1. A family of applications is treated as a set of named expressions consisting of composition equations.

Given a base program, therefore, an application can be identified in terms of the features that were added to the base program. Addition of features to programs can be implemented in different ways.
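The algebraic reading of these equations can be sketched in a few lines of Python. This is a toy model under stated assumptions, not GenVoca itself: a program is represented only by the tuple of feature names it contains, and the helpers `constant`, `feature`, and `compose` are hypothetical. The names f, g, a, and App1 come from the text, while b, App2, and App3 are added purely for illustration.

```python
from functools import reduce


def constant(name):
    """A base program (a value), represented by the features it contains."""
    return (name,)


def feature(name):
    """A feature is a function that maps a program to an extended program."""
    def extend(program):
        return program + (name,)
    return extend


def compose(*terms):
    """Evaluate a•b•...•f: the last term is the base program, and the
    features before it are applied from right to left."""
    *features, program = terms
    return reduce(lambda prog, feat: feat(prog), reversed(features), program)


f = constant("f")
g = constant("g")
a = feature("a")
b = feature("b")   # hypothetical second feature

# A family of applications as a set of named composition equations.
family = {
    "App1": compose(a, f),      # App1 = a•f
    "App2": compose(b, a, f),   # hypothetical: App2 = b•a•f
    "App3": compose(a, g),      # hypothetical: App3 = a•g
}

print(family["App1"])  # ('f', 'a')
print(family["App2"])  # ('f', 'a', 'b')
print(family["App3"])  # ('g', 'a')
```

Each application is identified by its base program and the features added to it, which is what the resulting tuples record.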

In relational query optimization, a query is optimized by optimizing the relational algebra expression that represents it. Similarly, program implementation optimization can be considered as an optimization of application expressions over the space of semantically equivalent programs [Bat03b].

Constraints over programs and refinements represented as values and expressions are called design rules [Bat03a]. Since GenVoca aspires to represent a domain model as a set of algebraic operations, the constraints on the operations and values are domain-specific. Another set of constraints is FODA-specific, known as requires and excludes constraints.

2.4.2 AHEAD

Algebraic Hierarchical Equations for Application Design (AHEAD) [BSR04] is a generalization of GenVoca. The purpose of AHEAD is to show that various concepts of GenVoca need to be generalized in order to scale the ‘feature’ concept to a large number of programs and their representations.

The theory of AHEAD can be summarized as follows:

• System analysts and developers use different kinds of knowledge representations [Bat04] throughout the analysis, design, and implementation phases of software development to identify important domain entities, the relationships between them, and how they can be implemented. Accordingly, the various diagramming notations used to denote data flows, processes, and states of entities, UML notations, make and build files, specifications for domain-specific languages, and finally whatever implementation languages and platforms are used in mapping the features to models are all knowledge representations of one kind or another. There was a need to be able to encapsulate representations of all kinds.

• Adding a new feature to a program that has multiple representations affects any or all of these representations; e.g., adding a new feature to a program changes its source, related documentation, and build properties, and possibly adds or refines related UML diagrams, and so on. In order for such transformations to scale across all affected representations, AHEAD must have mechanisms that generalize transforms [Bat04].

• Transformation follows from composition. Composing a feature with a program that has multiple representations implies composition not only between the feature and the program, but also between any or all of their corresponding representations, and requires a composition mechanism to be present for each kind of representation. In this way AHEAD allows distributing composition over encapsulation [Bat04].

• Batory et al. [Bat03a] define a module as a “containment hierarchy of the related artifacts”. A class is a containment hierarchy with a first level of classes and a second level of members and methods. Similarly, a ‘package’ is a three-level hierarchy. Thus representations of features and programs can have modules of varying depths. AHEAD needed to generalize modularity.

• AHEAD generalizes GenVoca in terms of the hierarchy of artifacts, such that extension artifacts can be added at various points in the hierarchy of artifacts representing the base program. In AHEAD, module hierarchies are implemented as directory hierarchies in which related artifacts are kept in specific directories, and the contents of the directories of both base programs and features are composed together.
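The points above, in particular distributing composition over encapsulation and composing directory hierarchies, can be sketched as a recursive merge of nested dictionaries. This is a simplified illustration and not the AHEAD tools: the function names, the directory and artifact names, and the toy composer that appends text are all assumptions. A real tool chain would use a dedicated composer for each kind of artifact.

```python
def compose(extension, base, compose_artifact):
    """Compose two containment hierarchies given as nested dicts.

    Sub-hierarchies with the same name are composed recursively; leaf
    artifacts with the same name are handed to `compose_artifact`.
    """
    result = dict(base)
    for name, ext_part in extension.items():
        if name not in base:
            result[name] = ext_part                      # artifact added by the feature
        elif isinstance(ext_part, dict) and isinstance(base[name], dict):
            result[name] = compose(ext_part, base[name], compose_artifact)
        else:
            result[name] = compose_artifact(name, ext_part, base[name])
    return result


def append_text(name, ext_text, base_text):
    """Toy artifact composer: the refinement is appended to the base artifact."""
    return base_text + "\n" + ext_text


# Hypothetical base program and feature, each with several representations.
base_program = {
    "src": {"Stack": "base code"},
    "doc": {"Stack.html": "base doc"},
}
feature = {
    "src": {"Stack": "refinement", "Counter": "new class"},
    "doc": {"Stack.html": "extra doc"},
    "build": {"rules": "counter rule"},
}

composed = compose(feature, base_program, append_text)
print(composed["src"]["Stack"])   # base code, then refinement on the next line
print(sorted(composed))           # ['build', 'doc', 'src']
```

Composing the two hierarchies composes each pair of corresponding representations (source with source, documentation with documentation), while artifacts that exist only in the feature are simply added.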

Thus, Batory et al. [Bat04] generalized various concepts from GenVoca to AHEAD. In the next section we give an overview of the GenVoca/AHEAD implementation.

2.4.3 Using GenVoca/AHEAD and Related Tools

We have already seen the basic premises of GenVoca and its generalization AHEAD. The Jakarta Tool Suite (JTS) is the related collection of domain-independent generator tools [BLS98]. The generator tools are also known as GenVoca generators and are used for creating domain-specific languages. JTS includes an extended version of Java called ‘Jak’, which is capable of meta-programming. The tool suite related to AHEAD also contains tools to extend programming languages. In these tools, both the language and the language extensions are treated as reusable components. While JTS is used for language extension and meta-programming, AHEAD can be used for language extension and, in general, for scalable composition of features of any kind specified in an AHEAD-specific format. Different combinations of a language and language extensions yield different variants of the given language. As stated in [BLS98], “Bali and Jak work cooperatively” to compose a language and language extensions and to create a parser, with the possibility of adding semantic actions in Jak code as well as embedding semantic actions in the corresponding Javacc implementation used in Bali.

In AHEAD, a language and a language extension are defined in two layers⁵. The first is the Syntax layer, which contains grammars specific to both the language and the language extensions, written in Bali grammar notation. The Bali2jak tool is used to transform the grammar files composed using the Balicomposer tool into Java parser files. The second layer is the Semantic layer, which is generated using the Bali2layer tool. This layer is used to add semantic actions to the jak files thus generated. The modified jak files are composed with the files generated in the syntax layer via the refinement addition mechanism provided by tools like Mixin and Jampack. The Bali2javacc tool is used in the syntax layer to convert the composed grammar specification to a Javacc grammar specification. Grammars are arranged in directories, and the composition sequence (which is particularly important in composing language extensions due to Bali-specific grammar composition rules) is specified in equation files. The Bali2jak tool takes an equation file which contains paths to grammars, and generates inheritance lattice and parse tree classes. The jak2java tool converts all jak files to java files, thus producing a preprocessor for the given composition [BLS98].

⁵ AHEAD Documentation: http://www.cs.utexas.edu/users/schwartz/

During our work on a customizable parser for SQL:2003, we found that the simple composition rules of balicomposer limit the application of the Bali approach in capturing the complexity of the declarative nature of the SQL:2003 specification. Allowing language- or grammar-specific composition rules by extending the original Bali grammar specification can solve this problem to a large extent.

2.5 Separation of Concerns

In software engineering and software development, new methodologies (such as feature-oriented programming and aspect-oriented programming) are invented to tackle software issues like complexity and reuse, each based on its own formalism. Each methodology assumes that, from its specific viewpoint, it succeeds in reducing software complexity and increasing software comprehensibility, provided that it also contains valid decomposition and composition techniques. Object-oriented methodology assumes that by viewing real-world entities as objects and classes, and interactions among entities as messages, the real-world scenario is well captured; it thus encourages decomposition of software into objects and provides mechanisms for encapsulating and manipulating objects. Feature-oriented methodology holds that creating products based on features, and viewing features as an abstraction from requirements [SvGB01], can help manage complexity and evolution and increase comprehensibility. The reasoning is that customers are basically concerned with the features of a product, believe that a product differs from other products only in terms of the features provided, and see a product as evolving through the addition, modification, and removal of features. Each methodology also establishes how software based on a specific concern is organized into manageable pieces (like objects and features), how the formalism in the methodology enforces low coupling in order to minimize the impact of changes [TOHJ99], how to trace which parts have been affected by changes and manage them without invalidating other dependent parts of the system, and how to promote reuse of already existing components.

Nevertheless, it is found that evolution and maintenance activities result in increased coupling between software artifacts. They may also incur invasive modifications that affect other software artifacts in unexpected ways [TOHJ99]. Unforeseen requirements, or changes of requirements based on one concern while the software was modeled in another, considerably restrict reuse of artifacts. Mapping from a software model to an implementation tends to obscure perspective as project size grows, thus reducing traceability. These problems of impact of change, reuse, and traceability can be attributed to the limitations and unfulfilled requirements related to separation of concerns [TOHJ99].

A problem has different important characteristics, and it is better to think about one facet at a time than to think about the complex relations between all of them simultaneously [Dij76]. Each issue handled correctly in isolation will lead to the solution of the complete problem. This is known as the principle of separation of concerns, as originated by Edsger W. Dijkstra [Dij76]. Czarnecki et al. [CE00] assert that, in order to promote good qualities of a program such as understandability, adaptability, and reusability, the principle of separation of concerns should be used, and that issues should be handled in such a way that intentionality and localization are also adhered to, so that a programmer’s intention about a specific issue (i.e. what the problem was and how the programmer planned to solve it) is well identified in the overall solution.

Ossher and Tarr [OT00] identify three distinct components of separation of concerns: Identification, in which software is decomposed according to a given formalism along a specific dimension; Encapsulation, which provides mechanisms to manipulate the concerns as first-class entities; and Integration, which is the composition mechanism in the given formalism for integrating the concerns, represented as first-class entities, into software based on those concerns.

Tyranny of the Dominant Decomposition

Given that all methodologies support decomposition based on a specific concern, and that all life cycle phases in a given methodology provide ways of decomposing and composing software artifacts, the separation of concerns becomes biased toward one kind of concern over others. This side effect is known as the tyranny of the dominant decomposition [OT00].

All activities within a methodology revolve around a specific kind of concern (also called the dominant dimension), and therefore other concerns are hardly given any thought. The related formalism is generally all about addressing one specific concern (objects in object-oriented software development and features in feature-oriented software development). Programming languages that support different formalisms (like C++, which can be used for both procedural and object-oriented programming and has recently been extended for feature-oriented programming [ALRS05]) are ultimately used for one dominant concern. From a programmer’s point of view, programs that were created with a specific concern in mind are more comprehensible than programs that are a ‘mix’ of two concerns. This means that, given past experience with, liking for, or mastery of a specific way of programming (some programmers are good at procedural programming, some at object-oriented programming), a programmer or developer may choose one kind of formalism over others, even if there were a way to use them all simultaneously. In effect, the overall modular structure evolves with this dominant concern in the developer’s mind.

Multi-Dimensional Separation of Concerns

Ossher and Tarr [OT00] refer to a kind of concern as a dimension of concern. Often, more than one concern is important in achieving the hypothesized advantages of separation of concerns. That is, different concerns may be useful in different contexts, and one would like to have a formalism that makes it possible to express different concerns within a system and that provides decomposition and composition mechanisms for all kinds of concerns, while preserving the essential separation between them.

Different dimensions and corresponding formalisms address specific properties of software engineering that should be respected in creating good quality software [OT00]. In object-oriented development methodology, data abstraction isolates the details of representing a real-world entity, while encapsulation localizes entity-specific interaction details. In this way, everything about a real-world entity is effectively in one place, which makes future changes specific to it easier to handle. A formalism within a software methodology considers only one specific concern, and ways to handle concerns of another dimension are not considered or integrated in it. Therefore a specific way of modeling artifacts, although able in itself to achieve desirable software engineering properties, may not be able to do so when other concerns have to be modeled alongside it. What is good for one kind of concern may threaten other concerns with unmanageability and complexity. In other words, unforeseen software engineering characteristics may emerge if two different concerns are implemented simultaneously, seriously affecting the basic advantages of each kind of concern [TOHJ99]. Object-oriented decomposition may result in what Ossher and Tarr [OT00] identify as two negative phenomena with respect to the feature dimension: scattering and tangling. Features may be “scattered across multiple classes”, and methods supporting one feature are “tangled with methods supporting other features within the same class”. Scattering and tangling imply that a change in requirements in terms of features affects multiple classes, and that modifications made to these classes may in fact undermine the original object structure.
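Scattering and tangling can be made tangible with a small bookkeeping sketch. In the Python fragment below, every class, method, and feature name is invented for illustration and is not taken from [OT00]. The mapping records which feature each method of an object-oriented design supports, and the two functions report the features that are scattered and the classes that are tangled.

```python
from collections import defaultdict

# Hypothetical object-oriented design: (class, method) -> supported feature.
METHOD_FEATURES = {
    ("Parser", "parse_select"): "Query",
    ("Parser", "parse_insert"): "Update",
    ("Planner", "plan_select"): "Query",
    ("Planner", "plan_insert"): "Update",
    ("Catalog", "lookup"): "Query",
}


def scattered_features(mapping):
    """Features whose supporting methods are spread over several classes."""
    classes_per_feature = defaultdict(set)
    for (cls, _method), feat in mapping.items():
        classes_per_feature[feat].add(cls)
    return {feat: sorted(classes)
            for feat, classes in classes_per_feature.items() if len(classes) > 1}


def tangled_classes(mapping):
    """Classes whose methods support more than one feature."""
    features_per_class = defaultdict(set)
    for (cls, _method), feat in mapping.items():
        features_per_class[cls].add(feat)
    return {cls: sorted(feats)
            for cls, feats in features_per_class.items() if len(feats) > 1}


print(scattered_features(METHOD_FEATURES))
# {'Query': ['Catalog', 'Parser', 'Planner'], 'Update': ['Parser', 'Planner']}
print(tangled_classes(METHOD_FEATURES))
# {'Parser': ['Query', 'Update'], 'Planner': ['Query', 'Update']}
```

In this invented design, changing the requirements for one feature touches up to three classes, and two of those classes also carry methods of the other feature.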

Ossher and Tarr [OT00] assert that “Different dimensions are useful for different reasons, at different times”. They find that “set of dimensions of concern and the set of concerns within those dimensions vary over time”. Design patterns, refactoring, and object serialization/deserialization are concerns within the object dimension. But not all of the concerns in each dimension are considered from the beginning of modeling along that dimension. They become relevant over time [OT00]. If software was decomposed along the feature dimension and implemented accordingly, adding another feature is intuitively straightforward and minimally invasive, since mechanisms for handling feature interactions, dependencies, etc., are provided in it. Adding a feature to software that was decomposed and implemented along the object dimension, on the other hand, poses immediate obstacles, as the mechanisms of change belong to the object dimension rather than the feature dimension. Similarly, introducing a new real-world entity and managing its interaction with pre-existing entities is intuitive in software based on the object dimension. Therefore a methodology may provide advantages in one area while posing problems in another.

The hallmark of reuse is expecting changes before they occur [OT00]. But not all kinds of changes can be anticipated well. Provisions made in the modular structure in anticipation of changes may never really be used, because those changes never happened or happened in a different way than the one provided for. Such provisions may then only add to the complexity of the overall software, as their main intent may be lost over time.

Sometimes, even though a developer wants to favor one methodology over another, the software specifications may be delivered in content and vocabulary grossly biased towards some other kind of methodology [OT00]. The simplest example is when requirements are stated in terms of features to a developer who uses object-orientation for creating software. The developer has to translate the requirements from features into object-oriented vocabulary and proceed to create classes, etc. If the developer is versed in only one kind of methodology, this process is bound to complicate matters.

The bottom line of the above discussion is that concerns from multiple dimensions unavoidably have to be considered in different phases of the product life cycle. This also applies when creating a product line architecture of SQL:2003 based on its feature-oriented decomposition, as such an architecture will invariably have to deal with multiple dimensions, one of which is the feature dimension.

2.6 Summary

Since its inception SQL has been standardized five times. SQL:2003 is the current ISO/ANSI SQL standard. We apply Software Product Line Engineering concepts to SQL:2003. Separation of concerns and stepwise development are basic principles of software engineering; at the same time, many different kinds of concerns unavoidably have to be considered in different phases of software development. We focus on modeling and implementing the feature concern. Feature-oriented decomposition is part of the Domain Analysis phase of Domain Engineering, one of the two sub-processes of Software Product Line Engineering, the other being Application Engineering. Features of SQL:2003 are distinguishable characteristics or constructs of SQL:2003. To implement the features thus obtained, we intend to borrow the feature-oriented programming approach of languages and language extensions from Bali and the related GenVoca/AHEAD family of tools. In the next chapter we present the feature diagrams of SQL:2003 and show how customized parsability of SQL:2003 may be achieved using sub-grammar composition and a parser generator.

Chapter 3

Feature-Oriented Decomposition of SQL:2003

We restrict the scope of the thesis to modeling the features of SQL:2003, particularly of SQL/Foundation [Mel03a]. We also give an example of how the features can be used in implementing a customizable parser. The complete implementation of the various rules in the standard is beyond the scope of this thesis. We only review various implementation models that may be used to do so.

3.1 Feature Modeling Technique for SQL:2003

We have seen in the last chapter that feature diagrams are accompanied by additional pieces of information (cf. Section 2.3.2). We consider these again as applied to the feature modeling for SQL:2003.

• Semantic description In our work, we include a short description of each feature. References to sections in the ISO/ANSI standards are given for further reference. We do not repeat the explanations from the standard. It is suggested that developers have access to the SQL standard documents for additional reference and implementation-related help, as many SQL constructs have a complex set of rules that must be considered in further phases of development and cannot be covered either in the feature diagram or in the semantic description.

• Rationale We base the feature decomposition of SQL mainly on the BNF grammar of the various SQL statements and their constituents and on the corresponding specifications in the SQL:2003 standard (particularly SQL/Foundation [Mel03a] and SQL/Framework [Mel03b]). That the grammar and specification should be used in the feature decomposition of a language is intuitive. No programming language can be decomposed in terms of features without attending to the syntax of its important language constructs. Accordingly, any prominent part of the production rules for a specific statement is considered a feature. Cardinality-based modeling notations are used to denote multiple occurrences of a particular construct within a statement. Therefore, the qualifying characteristic for any part of the grammar to be a feature is that it represents an important SQL language construct.

• Stakeholders and client programs There are no particular stakeholders involved. In a larger case study undertaking, other departments, universities, research institutes and private vendors may become stakeholders.

• Exemplar systems There is no known case study of decomposing, in a feature-oriented way, either an entire programming language (or parts of it) from scratch (extending a language is not the same) or SQL:2003 itself. Bali and related tools from the AHEAD tool suite [BSR04], as well as Czarnecki’s Eclipse plug-in for feature modeling called ‘fmp’ [CK05], were useful in gaining insights for our work.

• Constraints and default dependency rules A ‘requires’ constraint between features of the same feature diagram is shown with a labeled arc. If it spans other feature diagrams, the details are given in the ‘Requires’ section accompanying the feature diagram. We have not used the concept of attributes, as there are not many SQL constructs that need to be assigned default values and attributes would in general not add to a better understanding of the decomposition. Each statement specification in the SQL standard, however, contains complex syntax conditions which cannot be represented merely by stating ‘requires’ or ‘excludes’; these have not been covered.

• Availability sites, binding sites, binding modes We have omitted this part, since the scope of this work is restricted to determining whether a feature decomposition of SQL:2003 can be done; we do not address a full implementation, for which availability sites, binding sites and binding modes would be important. The accompanying implementation of a customizable parser only gives an example of parsability based on the sub-grammars derived from SQL:2003 features.

• Priorities We have not considered the priority concept in our work.

3.1.1 Basis for Modeling Features in SQL:2003

The sources of features can include existing and potential stakeholders, domain experts and domain literature, existing systems, pre-existing models, and models created during development. The main source for our work is the set of SQL:2003 standard documents, ISO/IEC 9075-(n):2003 [Mel03b], which define the SQL language. The parts SQL/Framework, SQL/Foundation and SQL/Schemata encompass the minimum requirements of the language. Other parts define extensions [Mel03b].

We base our feature diagrams on the BNF grammar specification of SQL:2003 and other information given in SQL/Foundation [Mel03a]. The idea of a similarity between features represented as feature diagrams and a BNF grammar representation was put forward by De Jonge et al. [dJV02]. Batory et al. [Bat05] have used iterative grammars for this purpose, though they assert that more general grammars can be used. Figure 3.1 illustrates the correspondence:

• Figure 3.1(a) corresponds to the production A : B C D ; assuming all features are mandatory. If a feature is optional (as is C), it is surrounded by [brackets]. Thus, the production for Figure 3.1(a) is A : B [C] D ;

• Figure 3.1(b) is the production: A: B | C | D;

• Figure 3.1(c) corresponds to a pair of rules: A : t+ ; and t : B | C | D ; meaning that one or more of B, C, D are to be selected.

(a) AND features (b) Alternative features (c) OR features

Figure 3.1: Parent child relationships in feature diagrams as grammar rules [Bat05].
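The three correspondences in Figure 3.1 can be illustrated with a small Python sketch. It is only an illustration under assumptions: the `Feature` class, the `productions` function and the helper non-terminal name `t` are hypothetical names introduced here, not part of Bali, AHEAD, or the thesis prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature node: a name, an optional flag, and a child group kind."""
    name: str
    optional: bool = False
    group: str = "and"  # "and", "alternative", or "or"
    children: list = field(default_factory=list)


def productions(feature, helper="t"):
    """Return grammar rules for one parent feature and its direct children."""
    if not feature.children:
        return []
    names = [child.name for child in feature.children]
    if feature.group == "and":
        # Optional children are written in [brackets], as in Figure 3.1(a).
        body = " ".join(f"[{c.name}]" if c.optional else c.name
                        for c in feature.children)
        return [f"{feature.name} : {body} ;"]
    if feature.group == "alternative":
        # Exactly one child is chosen, as in Figure 3.1(b).
        return [f"{feature.name} : {' | '.join(names)} ;"]
    if feature.group == "or":
        # One or more children are chosen, as in Figure 3.1(c).
        return [f"{feature.name} : {helper}+ ;",
                f"{helper} : {' | '.join(names)} ;"]
    raise ValueError(f"unknown group kind: {feature.group}")


b, c, d = Feature("B"), Feature("C", optional=True), Feature("D")
print(productions(Feature("A", children=[b, c, d])))
print(productions(Feature("A", group="or", children=[b, c, d])))
```

The sketch handles a single parent and its direct children; a full feature diagram would apply the same mapping recursively to every inner node.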

The similarity between feature diagrams and grammars considers a very general form of grammar. Having obtained feature diagrams from the SQL:2003 BNF grammar, we wish to convert the feature diagrams to LL(k) sub-grammars (as required by the ANTLR parser generator). The BNF grammar of SQL is used in constructing the feature diagrams based on the following assumptions:

• A complete SQL:2003 BNF grammar represents a product line, in which various sub-grammars represent features which when composed together give products of this product line, namely different variants of SQL:2003.

• A non-terminal may be considered as a feature only when the non-terminal clearly expresses an SQL construct; placeholder non-terminals are not considered.

• Mandatory non-terminals are represented as mandatory features.

• Optional non-terminals are represented as optional features.

• The choices in a production rule are represented as or-features (instead of alternative features). Consider A : B | C | D ; if such a production rule appears in the SQL grammar, we use Figure 3.1(c) instead of Figure 3.1(b) to represent the corresponding feature diagram. This is required because we want the product configuration to be able to include all choices, instead of exactly one alternative among them.

• A terminal symbol is considered only if it presents an important characteristic of feature under consideration apart from the syntax.

• The notation ‘...’ is used in the standard to show multiple occurrences of a construct, see Section 6.2 of [Mel03b]. We use the cardinality notation to depict this fact.
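The assumptions above can be sketched as a small Python routine that reads the right-hand side of a production written in the standard's notation and proposes child features with cardinalities. This is a simplified illustration, not the procedure used for the diagrams in this thesis: the function name `features_from_rhs` is hypothetical, nested groups such as `[ { ... }... ]` are not handled, all keywords are skipped, and deciding whether a non-terminal is a mere placeholder is left to the modeler. The example production is written in the style of the Domain Definition grammar and should be checked against Section 11.24 of SQL/Foundation before being relied on.

```python
import re

# One alternative per element kind: "[ <name> ]" or "[ <name>... ]",
# then "<name>" or "<name>...", then a bare upper-case keyword.
ELEMENT = re.compile(
    r"\[\s*<([^>]+)>\s*(\.\.\.)?\s*\]"   # optional non-terminal
    r"|<([^>]+)>\s*(\.\.\.)?"            # mandatory non-terminal
    r"|([A-Z]+)"                         # keyword (terminal)
)


def features_from_rhs(rhs):
    """Return (feature name, cardinality) pairs for a simplified BNF right-hand side."""
    features = []
    for match in ELEMENT.finditer(rhs):
        opt_name, opt_repeat, name, repeat, keyword = match.groups()
        if keyword:
            continue  # terminals are only modeled when they carry meaning
        if opt_name:
            features.append((opt_name, "[0..*]" if opt_repeat else "[0..1]"))
        else:
            features.append((name, "[1..*]" if repeat else "[1..1]"))
    return features


rhs = ("CREATE DOMAIN <domain name> [ AS ] <predefined type> "
       "[ <default clause> ] [ <domain constraint>... ] [ <collate clause> ]")
for feature, cardinality in features_from_rhs(rhs):
    print(feature, cardinality)
```

Mandatory non-terminals come out with cardinality [1..1], optional ones with [0..1], and the ‘...’ notation turns the upper bound into ‘*’.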

The grammar given in SQL/Foundation [Mel03a] is useful in understanding the overall structure of an SQL construct, that is, which SQL constructs constitute a larger SQL construct. We have found that this approach may also be useful in general for carrying out a feature decomposition of any programming language, as the grammar establishes the basic building blocks of any programming language.

We use the cardinality concept from cardinality-based feature modeling (cf. Section 2.3.2) in many feature diagrams, e.g., the feature diagrams for Domain Definition (Figure 3.3), Table Definition (Figure 3.4) and Schema Routine (Figure 3.6). The cardinality indicates that a particular feature, such as Table Element in the Table Definition feature diagram, may be cloned together with its subtree, and that parts of the subtree may be configured differently, i.e., with varying syntax and choices of non-terminals along the Table Element subtree.

The cardinality concept is not absolutely necessary [CE00]. Cardinalities about features can be expressed even without using a cardinality notation, e.g., by directly mentioning the cardinality information in the feature, or creating another feature that expresses the number information about the main feature and so on [CE00]. We use the cardinality notation because it expresses the modeling intent more succinctly without adding complexity or overloading feature diagrams with extra features. The cardinality notation also expresses a closer relationship to the SQL BNF grammar on which we base the feature diagrams.

Regarding the tree nature and large size of feature diagrams Czarnecki et al. [CE00] make the following observations.

In some cases, representing a feature diagram using a more general directed graph would be certainly useful. For example, we might want to allow for multiple references to one subgraph, in order to avoid its duplication within the diagram. In the following discussion, however, we assume that the diagram is a tree. A practical approach to avoiding the duplication of feature subtrees in a larger feature diagram is to only include the roots of the subtrees in the larger diagram and to show the duplicated subtree in one separate diagram.

and

Sometimes, when we draw a large feature diagram, it is convenient to split it into a number of smaller diagrams. In this case, the roots of the smaller subdiagrams are features of the concept represented by the root of the original diagram rather than concepts.

Where the leaf features of a larger feature diagram are expanded further, the expansions are shown in separate feature diagrams, and references to those feature diagrams are given.

When a feature diagram contains features that require each other, this is shown by a dashed arc with an arrow pointing to the required feature. When a feature in a feature diagram requires a feature from another feature diagram, the requires conditions are given under ‘Requires’ in the semantic description for that feature diagram.
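How such ‘requires’ conditions could be checked against a feature selection can be sketched in Python. The table below lists only three of the conditions stated in the ‘Requires’ parts of Section 3.2; the names `REQUIRES` and `missing_requirements` are hypothetical, and the check is deliberately not transitive.

```python
# A few 'requires' conditions taken from the semantic descriptions in Section 3.2.
REQUIRES = {
    "Check Constraint Definition": {"Search Condition"},
    "AS Subquery": {"Query Specification"},
    "View Definition": {"Query Expression"},
}


def missing_requirements(selected, requires):
    """Map each selected feature to the required features that are not selected."""
    missing = {}
    for feature in selected:
        lacking = requires.get(feature, set()) - set(selected)
        if lacking:
            missing[feature] = lacking
    return missing


print(missing_requirements({"View Definition"}, REQUIRES))
print(missing_requirements({"View Definition", "Query Expression"}, REQUIRES))
```

A selection is consistent with respect to these conditions when the returned mapping is empty; conditions of the standard that go beyond ‘requires’ and ‘excludes’ are outside what such a table can express.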

3.2 Feature Diagrams for SQL:2003

We present here 10 feature diagrams that are representative of the modeling technique used and that cover some of the most important SQL:2003 constructs. For each, we explain the modeling technique as applied to the grammar specification in the SQL:2003 standard and other information. The rest of the 30 feature diagrams are decomposed in the same manner as the 10 presented here; the reader is referred to Appendix A for these feature diagrams.

Feature ID - SQL:2003

Figure 3.2: Main Feature Diagram of SQL:2003

Semantic Description - Figure 3.2 shows the main feature diagram of SQL:2003. It also represents the most coarse-grained decomposition. We have chosen to decompose SQL/Foundation further, as the core of SQL:2003 is defined in SQL/Foundation. A customer may be presented with this feature tree for selecting SQL/Foundation and other packages. If the customer wishes to select specific features from within the extension packages, then these can be decomposed further in a manner similar to the decomposition of SQL/Foundation. SQL/Foundation [Mel03a] defines the basic operations of SQL.

Figure 3.3: Domain Definition Feature Diagram

Feature ID - Domain Definition

Semantic Description - Figure 3.3 shows the feature diagram for Domain Definition. This feature shows the use of the cardinality notation [0..*]. A domain is used to define a set of valid values of a data type by specifying 0 or more domain constraints, denoted in the grammar specification as ‘[ <domain constraint>... ]’. The syntax notation is described in SQL/Framework [Mel03b]. The presence of a feature ‘0 or more times’ is depicted in the feature diagram with the cardinality notation [0..*]. For the grammar specification of Domain Definition, refer to Section 11.24 of SQL/Foundation [Mel03a].

Requires - The Check Constraint Definition feature requires the Search Condition feature of the Predicate feature (Figure A.30). The Domain Definition feature optionally requires the Predefined Types feature of the Data Type feature (Figure A.23).

This diagram also shows the use of the ‘Requires’ condition, which arises from the tree nature of feature diagrams and the parent-child relationship between features, in which a child can have only one parent. The grammar specification indicates that the Check Constraint Definition feature has Search Condition as its child feature. Because the Search Condition feature repeats in many other feature diagrams, we moved it to a single parent, the Predicate feature, and we state the relationship of Check Constraint Definition to Search Condition in the Requires part of the semantic description.

Figure 3.4: Table Definition Feature Diagram

Feature ID - Table Definition

Semantic Description - Figure 3.4 shows the feature diagram for Table Definition. A table is a collection of rows having one or more columns. A table definition is specified by the ‘CREATE TABLE’ statement. A table can be created in three different ways. The most general way is to specify the table name and the columns with their data types and constraints. Alternatively, a table can be created using CREATE TABLE AS (the ‘AS subquery’ feature), which creates a table from the result of a SELECT statement. Finally, a table can be created using the LIKE clause, which creates a table that includes all of the column definitions of an existing table.

As seen above, the Table Definition feature contains SQL:2003-specific features such as the Table As and Table Like statements. The grammar specifications for Table Element and Typed Table Element are stated as ‘<table element> [ { <comma> <table element> }... ]’ and ‘<typed table element> [ { <comma> <typed table element> }... ]’ respectively (cf. Section 11.3 of SQL/Foundation [Mel03a]). This is depicted in the feature diagram with the cardinality notation [1..*]. For general information about tables refer to Section 4.14 of SQL/Foundation [Mel03a]; for the grammar specification of Table Definition refer to Section 11.3 of SQL/Foundation [Mel03a].

Requires - The AS Subquery feature requires the Query Specification feature (Figure 3.10).

Figure 3.5: View Definition Feature Diagram

Feature ID - View Definition

Semantic Description - Figure 3.5 shows the feature diagram for View Definition. A view is a query evaluated and named so that it can be used as a table in other queries. If the view contains a reference to itself in its own definition, then it is a recursive view. This feature diagram shows the peculiar case of an optional feature having an optional child. The grammar specification for the feature Check Option is ‘[ WITH [ <levels clause> ] CHECK OPTION ]’. The Level feature can be selected only if the Check Option feature is selected, and even then it remains optional (cf. Section 11.22 of SQL/Foundation [Mel03a]).

Requires - The View Definition feature requires the Query Expression feature (Figure 3.9).

Figure 3.6: Schema Routine Feature Diagram

Feature ID - Schema Routine

Semantic Description - Figure 3.6 shows the feature diagram for Schema Routine. A schema routine or SQL-invoked routine is either an SQL-invoked procedure, invoked from an SQL call statement, or an SQL-invoked function, the invocation of which returns a value.

The complex set of ‘requires’ arcs arises because the features Routine Body, Routine Characteristic, and SQL Parameter are required by both the Schema Procedure and Schema Function features (cf. Section 11.50 of SQL/Foundation [Mel03a] for the grammar specification).

The Returns Type feature shows the SQL:2003-specific feature in which an SQL-invoked function can return a table. The cardinality notation [1..*] for the SQL Parameter feature indicates that both Schema Procedure and Schema Function may take 1 or more SQL Parameters, with the same or different Parameter Mode.

For further information regarding SQL-invoked routines refer to Section 4.27 of SQL/Foundation [Mel03a].

Figure 3.7: Insert statement Feature Diagram

Feature ID - Insert statement

Semantic Description - Figure 3.7 shows the feature diagram for Insert Statement. The statement adds rows to a table. The values for the various columns are obtained from a subquery, from the Values Clause, or from default values.

The Override Clause is an optional feature of both From Subquery and From Constructor features. Therefore it has been moved to the concept feature Insert Statement with corresponding ‘requires’ arcs.

Requires - The From Subquery feature requires the Query Expression feature (Figure 3.9), the Column Definition feature (Figure A.4) and the Sequence Generator feature (Figure A.5).

Figure 3.8: Merge statement Feature Diagram

Feature ID - Merge statement

Semantic Description - Figure 3.8 shows the feature diagram for Merge Statement. The Merge Statement feature is an SQL:2003-specific feature. A merge statement combines the functionality of update and insert statements, so that whenever both operations are required to be executed together, they can be replaced by a merge statement (e.g., when transferring rows from a transaction table to a master table) [EMK+04]. The update is carried out when rows match and the insert is carried out when they do not. This is depicted in the feature diagram for Merge Statement (cf. Section 14.9 of SQL/Foundation [Mel03a]).

Requires - The Merge Statement feature requires the Table Reference feature (Figure A.27) and the Insert Statement feature (Figure 3.7), the On Clause feature requires the Search Condition feature of the Predicate feature (Figure A.30), and the Merge Insert Specification feature requires the Contextually Typed Value Specification of the Scalar Expression feature (Figure A.22).

Figure 3.9: Query Expression Feature Diagram

Feature ID - Query Expression

Semantic Description - Figure 3.9 shows the feature diagram for Query Expression. The Query Expression feature is an important feature required by many other features.

A query expression is used to specify a table. The With Clause feature is used to define query expressions that can be named and treated as tables. The ‘requires’ arc from the With List Element feature to its grandparent Query Expression indicates a recursive relationship when accompanied by the RECURSIVE keyword. This is depicted in the feature diagram for Query Expression with the Recursive Query feature. For further information on features like Except Operator, Intersect Operator, and Union Operator refer to Section 7.13 of SQL/Foundation [Mel03a]. These features are represented by keywords in the grammar specification and are treated as features because they denote important functionality related to the feature Query Expression.

Figure 3.10: Query Specification Feature Diagram

Feature ID - Query Specification

Semantic Description - Figure 3.10 shows the feature diagram for Query Specification. The Query Specification feature specifies a SELECT statement. The [1..*] cardinality notation for the Select Sublist feature indicates that one or more columns can be selected. The Set Quantifier feature tests for duplicate rows and returns all rows or distinct rows depending on the features ALL and DISTINCT. The Asterisk feature indicates that all columns of the specified table are selected. The AS Clause is used to name or rename result columns.

Figure 3.11: Table Expression Feature Diagram

Feature ID - Table Expression

Semantic Description - Figure 3.11 shows the feature diagram for Table Expression. The Table Expression feature specifies a table or a grouped table. The From Clause feature is mandatory. The Where Clause feature can be used to apply conditions to columns in a table expression. The Group By feature can be used to group column values when using aggregate and other grouping functions. The Having Clause feature is intended to be used along with Group By to apply conditions to groupings of columns. If it is used without the Group By Clause feature, then it is applied to all rows that satisfy the given condition. Since Having Clause can be used independently of the Group By Clause feature, no ‘requires’ arc is shown between the two.

The Window Clause feature is SQL:2003-specific. It is used to define a window (i.e., a set of rows) which can be used with the Window Functions.

Requires - The Having clause can be used with and without the Group By clause, although by making the Having Clause feature require the Group By Clause feature we can enforce that both are used together whenever Having is used. The From Clause feature requires the Table Reference feature (Figure A.27).

The rest of the feature diagrams for features of SQL:2003 can be found in Appendix A. These diagrams are modeled with the same modeling techniques and in a manner similar to the feature diagrams presented in this section. In the next section we present an example of how these features can be used to construct sub-grammars of SQL:2003 that can be composed to obtain customized parsability.

3.3 Sub-grammars Based on Feature Diagrams

The goals of the thesis are to model feature diagrams for features of SQL:2003 and to review the related techniques. While stating the case for feature-oriented decomposition of SQL:2003, we also mentioned the tool Bali, which can compose extension grammars (sub-grammars) of a programming language to create language extensions. We present here an example of how sub-grammars based on SQL:2003 features may be composed so that precisely those SQL statements can be parsed which are represented as features in the feature diagrams and which are selected in the feature instance description.

The Approach

We are working on a prototype parser for SQL:2003 which borrows from the Bali approach the idea of composing sub-grammars to add extra functionality to a language. We introduced Bali and related tools in Section 2.4.3 of the last chapter. In Bali and related tools, a language and its extensions are defined in two layers: a syntax layer and a semantic layer. In the syntax layer, given the base grammar of a language (such as Java) and extension grammars (Java language features with which we want to extend the syntax of Java), various tools of the AHEAD tool suite can be used to compose the base grammar and the extension grammars into a grammar that includes the modified syntax. When semantic actions are included in the semantic layer, the language is extended with different features. From the perspective of features and feature diagrams, this process can be summarized as follows:

1. We model the original specification of a language and also the specification of a language extension in terms of feature diagrams.

2. Given the feature diagram of the particular part of the language that needs to be extended (e.g., the if conditional statement in Java), we create a grammar (LL(k) or any other type of grammar, depending on which parser generator we wish to use) that captures the feature instance description, so that it includes the specific feature which we want to add to the original specification.

3. Having obtained LL(k) grammars for the base and extension features, we compose them to obtain LL(k) grammar which contains the syntax for both the base and extension features.

4. With a suitable parser generator, we use the composed grammar to obtain a parser which can effectively parse the original as well as extension specification.

5. Using Jak and other feature-oriented programming tools, we add semantic actions to the parser code thus generated, effectively creating a preprocessor that extends the original language.

We wanted to check whether the features and feature diagrams of SQL:2003 can be used to create a customized parser with these steps. We found that we can create a customized parser for SQL:2003 that selectively parses precisely those SQL:2003 statements which are represented as features in the feature diagrams under consideration. We have not covered the 5th step of adding semantic actions using Jak and other feature-oriented programming tools, since adding semantic actions for any SQL:2003 feature must observe complex and nontrivial rules and is beyond the scope of this thesis. The following section presents an example based on the steps above. In our approach we use the ANTLR parser generator along with our own grammar composer instead of Bali and related tools; the reasons for this are discussed in the comparison between the Bali approach and our implementation.

The Implementation

Suppose that we want to implement the SELECT statement in SQL:2003 represented by the Query Specification feature (cf. Figure 3.10). Specifically, we want to implement a feature instance description of {Query Specification, Select List, Select Sublist (with cardinality 1), Table Expression} with the Table Expression feature instance description {Table Expression, From Clause, Table Reference (with cardinality 1)}. We would proceed as follows:

1. We refer to the feature diagram for the feature Query Specification (cf. Figure 3.10) obtained by feature-oriented decomposition of SQL:2003.

2. Based on the Query Specification feature diagram, we create LL(k) grammars for each feature in the feature instance description using the original SQL:2003 BNF specification for Query Specification (cf. Section 7.12 in SQL/Foundation [Mel03a]). Listings 3.1, 3.3 and 3.5 are LL(k) sub-grammars for the features in the feature instance description.

3. We compose these sub-grammars and the corresponding tokens into one LL(k) grammar as given in Listing 3.7. Since the different features of a particular SQL:2003 statement all represent variations of production rules, we need to use a lookahead value large enough for the generated parser code to take accurate parsing decisions. We have found that a lookahead of 3 is sufficient for the various sub-grammars of SQL:2003 that we have used.

4. Using the ANTLR parser generator, we create the parser from the composed grammar. The parser code thus generated is specific to the features we considered. That is, it is capable of parsing precisely the features in the feature instance description for which we created LL(k) grammars. It parses a SELECT statement with an optional Set Quantifier and an optional Where Clause, and nothing that was not included in the feature instance description.
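The effect of step 4 can be illustrated with a hand-written recognizer in Python that accepts only what the selected features allow. This is not the ANTLR-generated parser; it is a minimal sketch under several assumptions: the names `accepts` and `KEYWORDS` are hypothetical, a derived column and a table reference are each reduced to a single identifier, a search condition is reduced to `identifier = identifier`, and keywords are matched case-insensitively.

```python
KEYWORDS = {"select", "from", "where", "distinct", "all"}


def accepts(statement, features):
    """Return True if the statement uses only the selected optional features."""
    tokens = statement.split()
    pos = 0

    def take(predicate):
        # Consume the next token if it satisfies the predicate.
        nonlocal pos
        if pos < len(tokens) and predicate(tokens[pos]):
            pos += 1
            return True
        return False

    def keyword(word):
        return lambda tok: tok.lower() == word

    def identifier(tok):
        return tok.isalpha() and tok.lower() not in KEYWORDS

    if not take(keyword("select")):
        return False
    if "Set Quantifier" in features:
        take(lambda tok: tok.lower() in ("distinct", "all"))  # optional
    if not (take(identifier) and take(keyword("from")) and take(identifier)):
        return False
    if "Where Clause" in features and take(keyword("where")):
        if not (take(identifier) and take(keyword("=")) and take(identifier)):
            return False
    return pos == len(tokens)  # no unparsed tokens may remain


print(accepts("select name from person", set()))
print(accepts("select DISTINCT name from person", set()))
print(accepts("select DISTINCT name from person", {"Set Quantifier"}))
```

A statement that uses a construct whose feature was not selected is rejected, which is the behavior the customized parser is meant to show.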

The sub-grammar for the Query Specification feature with a single Select Sublist feature (single column) (Listing 3.1) contains only production rules that represent the feature instance description of {Query Specification, Select List, Select Sublist (with cardinality 1), Table Expression} with the Table Expression feature instance description {Table Expression, From Clause, Table Reference (with cardinality 1)}. The sub-grammar for the Query Specification feature with the optional Set Quantifier feature (Listing 3.3) contains only production rules that represent the feature instance description of {Query Specification, Set Quantifier}. The sub-grammar for the Table Expression feature with the optional Where Clause feature (Listing 3.5) contains only production rules that represent the feature instance description of {Table Expression, Where Clause}. Listings 3.2, 3.4 and 3.6 show the tokens for the above sub-grammars, respectively.

Listing 3.1: Sub-grammar for Query Specification with single column (cardinality 1 of Select Sublist feature)

    startproduction : queryspecification ;
    queryspecification : SELECT selectlist tableexpression ;
    selectlist : selectsublist ;
    selectsublist : derivedcolumn ;
    tableexpression : fromclause ;
    fromclause : FROM tablereferencelist ;

Listing 3.2: Tokens for Query Specification

    SELECT : 'select' ;
    FROM : 'from' ;
    PERIOD : '.' ;
    ASTERISK : '*' ;

Listing 3.3: Sub-grammar for Query Specification with Set Quantifier feature

    startproduction : queryspecification ;
    queryspecification : SELECT SETQUANTIFIER? selectlist tableexpression ;

Listing 3.4: Tokens for Set Quantifier feature

    SETQUANTIFIER : 'DISTINCT' | 'ALL' ;

Listing 3.5: Sub-grammar for Table Expression with the Where Clause feature

    tableexpression : fromclause whereclause? ;
    whereclause : WHERE searchcondition ;

Listing 3.6: Tokens for Where Clause feature

    WHERE : 'where' ;
    EqualsOperator : '=' ;

Listing 3.7: Grammar obtained by composing sub-grammars

     1  grammar GC ;
     2  options { k=3; } // lookahead of 3
     3
     4  startproduction : queryspecification ;
     5  fromclause : FROM tablereferencelist ;
     6  whereclause : WHERE searchcondition ;
     7  selectsublist : derivedcolumn ;
     8  tableexpression : fromclause whereclause? ;
     9  selectlist : selectsublist ;
    10  queryspecification : SELECT SETQUANTIFIER? selectlist tableexpression ;
    11
    12  ASTERISK : '*' ;
    13  SELECT : 'select' ;
    14  SETQUANTIFIER : 'DISTINCT' | 'ALL' ;
    15  EqualsOperator : '=' ;
    16  PERIOD : '.' ;
    17  FROM : 'from' ;
    18  WHERE : 'where' ;
    19
    20  identifier : ID ;
    21  ID : ('a'..'z'|'A'..'Z')+ ;
    22  INT : '0'..'9'+ ;
    23  NEWLINE : '\r'? '\n' ;
    24  WS : (' ' | '\t' | '\n' | '\r')+ {skip();} ;

In Listing 3.7, lines 1 and 2 declare the grammar and set the ANTLR lookahead option to 3. Lines 4-10 show the production rules composed from the various sub-grammars. Lines 12-18 show the tokens, and lines 20-24 show ANTLR-specific helper rules and tokens. Composing the sub-grammars for the Query Specification feature, the optional Set Quantifier feature of Query Specification, and the optional Where Clause feature of the Table Expression feature, which is itself a mandatory feature of Query Specification (Figure 3.10), gives a grammar that can parse a SELECT statement with a single column from a single table, with an optional set quantifier (DISTINCT or ALL) and an optional where clause. This procedure can be extended to other features of SQL:2003, first mapping features to sub-grammars and then composing them to obtain a customizable parser.
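To make the accepted language concrete, the following is a minimal hand-written recognizer in Python that mirrors the composed rules of Listing 3.7. It is only a sketch and not the ANTLR-generated parser: the listing does not define derivedcolumn, tablereferencelist or searchcondition, so they are assumed here to be a single identifier, a single identifier, and a comparison of the form ID = (ID | INT) respectively. The function names are hypothetical.

```python
import re

# Keywords as spelled in the token rules of Listing 3.7 (case-sensitive).
KEYWORDS = {"select", "from", "where", "DISTINCT", "ALL"}
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|=")


def is_identifier(token):
    return token.isalpha() and token not in KEYWORDS


def accepts(text):
    """Return True if text is a sentence of the composed grammar (sketch)."""
    tokens = TOKEN_RE.findall(text)
    if "".join(tokens) != "".join(text.split()):
        return False  # input contains characters outside the token set
    pos = 0

    def take(predicate):
        nonlocal pos
        if pos < len(tokens) and predicate(tokens[pos]):
            pos += 1
            return True
        return False

    def literal(expected):
        return lambda token: token == expected

    # queryspecification : SELECT SETQUANTIFIER? selectlist tableexpression ;
    if not take(literal("select")):
        return False
    take(lambda t: t in ("DISTINCT", "ALL"))  # optional Set Quantifier
    if not take(is_identifier):  # selectlist -> selectsublist -> derivedcolumn (assumed ID)
        return False
    # tableexpression : fromclause whereclause? ;
    if not (take(literal("from")) and take(is_identifier)):
        return False
    if take(literal("where")):  # optional Where Clause
        # searchcondition is assumed to be: ID '=' (ID | INT)
        if not (take(is_identifier) and take(literal("="))
                and take(lambda t: t.isdigit() or is_identifier(t))):
            return False
    return pos == len(tokens)
```

Under these assumptions, a statement with two columns is rejected, because only the Select Sublist feature with cardinality 1 was composed into the grammar.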

Comparison of the Bali approach and our Implementation

The Bali approach requires a Bali-specific notation, which is converted to JavaCC LL(k) grammars. We instead use LL(k) grammars with additional ANTLR options in our prototype. Although we took the idea of composing sub-grammars from Bali, we found that the Balicomposer tool’s composition of Bali grammar rules is too restrictive to express the complex structure of SQL rules. Therefore we implemented our own grammar-composition mechanism. In composing LL(k) grammars we must consider the treatment of non-terminals, terminals, and other specific syntactical constructs. The differences between the two approaches are summarized below.

• Non-terminal symbols. Bali considers different kinds of non-terminal symbols, such as named productions and sub-productions. A named production assigns a name to a non-terminal production rule. When composing named productions, the Balicomposer overrides all named productions with the last production in the composition sequence. The problem with this is that replacing the original productions results in erroneous and lossy composition when many sub-grammars containing various optional features are composed: only the production rule that is last in the sequence is retained. Sub-productions are choices for the same non-terminal, e.g., A : B | C | D ; The order of the merged sub-productions is arbitrary in Balicomposer when various choices for the same left-hand side are composed. The composition of sub-productions considers only singleton non-terminals, i.e. non-terminals that are not surrounded by other non-terminals, optional or mandatory (e.g., in A : B ; B is a singleton non-terminal, while in A : C B ; or A : B [C] ; B is non-singleton). However, we may have cases where sub-productions of a non-terminal expand to non-singleton productions, in which case they have to be represented as named productions, with the same disadvantage again.

In the composer used for the prototype, productions are not named. The defining criterion for a production is its Left Hand Side (LHS). A grammar repository contains a map in which the LHS of a rule is stored as the key and each Right Hand Side (RHS) for that LHS is stored as one of the values in an array list.

Production rules with the same LHS are composed as follows:

– If the new RHS production contains the old one, then the old RHS is replaced with the new one.
– If the new RHS production is contained in the old one, then the old RHS production is kept.
– If the new and old RHS productions differ, then the new one is appended.

Any optional specification within an RHS should be composed after the corresponding non-optional specification. Similarly, for the sub-productions of the same non-terminal, both singleton and non-singleton RHS are considered. Accordingly, if an RHS production differs from all the RHS productions already present, it is appended. The order of composition is important only from the point of view of incorporating the optional non-terminals in the RHS.
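The composition rules above can be sketched in Python as follows. This is a minimal illustration, not the prototype’s actual code: the names `compose` and `is_subsequence` are hypothetical, and "contains" is assumed to mean that the symbols of one RHS appear, in order, within the other RHS.

```python
def is_subsequence(short, long):
    """True if every symbol of short occurs in long, in the same order."""
    remaining = iter(long)
    return all(symbol in remaining for symbol in short)


def compose(repository, lhs, rhs):
    """Add the rule 'lhs : rhs' to the repository (a map from LHS to a list of RHS)."""
    new = rhs.split()
    alternatives = repository.setdefault(lhs, [])
    for index, old in enumerate(alternatives):
        if is_subsequence(old, new):
            alternatives[index] = new  # new RHS contains the old one: replace
            return
        if is_subsequence(new, old):
            return                     # new RHS is contained in the old one: keep old
    alternatives.append(new)           # differs from all present RHS: append


repository = {}
compose(repository, "tableexpression", "fromclause")
compose(repository, "tableexpression", "fromclause whereclause?")
```

After these two calls the repository holds the single rule `tableexpression : fromclause whereclause?`, which matches the composition of Listing 3.1 and Listing 3.5 shown in Listing 3.7.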

• Complex lists. Complex lists are of the form ‘ [ ... ]’. In the Balicomposer every new RHS replaces the old one, as we have seen in the composition of named productions. This makes it difficult to incorporate ‘one or more’ of a feature when the complex list is surrounded by other non-terminals and there are also sub-productions for the same LHS non-terminal. In the composer for the prototype, if a feature contains a complex list and its sublist (like the Select Sublist feature), they simply have to be composed sequentially, with the sublist composed ahead of the complex list, and no information is lost in composition.

• Terminal symbols. There is no separate treatment of tokens in Bali. In the composer for the prototype each grammar file has a corresponding tokens file, in which we must list the tokens used in that grammar file. The token definitions are persisted, which means that each token has a clear meaning and duplicate definitions are not considered.
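The treatment of tokens described in the last item can be sketched as a small registry that merges the tokens files of the composed sub-grammars. This is a sketch under assumptions: the function name `merge_tokens` is hypothetical, tokens files are assumed to hold one `NAME : definition ;` entry per line, and "duplicate definitions are not considered" is read as "the first definition of a token name is kept".

```python
def merge_tokens(*token_files):
    """Merge token definitions; the first definition of each name is kept."""
    registry = {}
    for text in token_files:
        for line in text.splitlines():
            line = line.strip().rstrip(";").strip()
            if not line:
                continue
            name, definition = (part.strip() for part in line.split(":", 1))
            registry.setdefault(name, definition)  # later duplicates are ignored
    return registry


query_tokens = "SELECT : 'select' ;\nFROM : 'from' ;"
where_tokens = "WHERE : 'where' ;\nFROM : 'from' ;"
tokens = merge_tokens(query_tokens, where_tokens)
```

Here `FROM` appears in both files but is stored only once in the merged registry.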

Production rules common to different sub-grammars representing features of a concept are kept in a separate file and composed later. Maintaining tokens and maintaining common rules is a nontrivial task and we are exploring other ways of managing them.

Given that Balicomposer was the piece of code most essential to this task and that it was limited in its functionality, we started working on our own implementation of a grammar composer. We found that in mapping SQL:2003 features to an implementation model, the feature constraints must be placed on features instead of on non-terminals of the individual grammars representing the features. Additionally, the composition sequence in Balicomposer does not really reflect the parent-child relationship of features in terms of the inclusion of features in the feature instance description. We are exploring ways of using XML to represent features with their parent-child relationship, as in the base feature model, along with ‘require’ and ‘exclude’ conditions, if any. In creating a product, a feature will be included in the instance description in a manner similar to traversing the feature tree and including features that adhere to the rules of feature modeling.
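One possible shape of such an XML representation is sketched below in Python. The element and attribute names (`feature`, `optional`, `requires`), the feature names, and the ‘requires’ link from Where Clause to Search Condition are illustrative assumptions and do not describe a format defined in this thesis. The traversal includes every mandatory feature and every selected optional feature, and reports required features that are missing from the resulting instance description.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a small part of the feature tree.
FEATURE_MODEL = """
<feature name="QuerySpecification">
  <feature name="SetQuantifier" optional="true"/>
  <feature name="SelectList"/>
  <feature name="TableExpression">
    <feature name="FromClause"/>
    <feature name="WhereClause" optional="true" requires="SearchCondition"/>
  </feature>
</feature>
"""


def instance_description(root, selected):
    """Walk the tree top-down; return (included features, unmet 'requires')."""
    included, required = [], set()

    def visit(node):
        name = node.get("name")
        if node.get("optional") == "true" and name not in selected:
            return  # an unselected optional feature prunes its whole subtree
        included.append(name)
        required.update(filter(None, (node.get("requires") or "").split(",")))
        for child in node:
            visit(child)

    visit(root)
    return included, required - set(included)


root = ET.fromstring(FEATURE_MODEL.strip())
included, missing = instance_description(root, {"WhereClause"})
```

In this example `missing` contains Search Condition, signalling that the sub-grammar of another feature diagram has to be composed as well.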

The Bali approach is still important because, in order to create an SQL engine (even for a moderate number of SQL statements), we need the capability provided by feature-oriented programming, which is already present in the tool suite in the form of the Jak language and related tools such as Mixin and Jampack. In the next chapter we present other implementation models to which SQL:2003 features can be mapped.

3.4 Summary

We use the idea of the similarity between feature diagrams and grammar representations proposed by De Jong et al. [dJV02] and Batory [Bat05] in our feature decomposition and modeling technique. We present 40 feature diagrams obtained by the feature-oriented decomposition of SQL/Foundation of SQL:2003. Other extension packages of SQL:2003 can be decomposed similarly. A software product line may be represented as a feature diagram and a corresponding grammar representation. A valid feature instance description is equivalent to a sentence of this grammar and represents a specific product of the product line. We apply this to the SQL:2003 BNF grammar specification and SQL:2003 features so that, using feature diagrams of SQL:2003, we create LL(k) grammars corresponding to feature instance descriptions, which represent products of this product line, namely valid statements in SQL:2003. We found the original Bali approach restrictive and therefore implemented our own grammar composer. Bali and related tools are still important from the perspective of adding semantic actions to generated parser code using feature-oriented programming. In this way, we decompose SQL:2003 into features and present an example of how to use these features to obtain a customized parsing capability.

Chapter 4

Issues in Feature-Oriented Decomposition of SQL:2003

In the last chapter, we gave examples of the feature decomposition of SQL:2003 in the form of various feature diagrams and showed how the sub-grammars based on feature diagrams can be composed to obtain a grammar that parses only the selected SQL constructs. We also presented the differences between the Bali tool from the GenVoca/AHEAD family of tools and our own approach. In this chapter, we first present other implementation models and then compare these approaches in the context of SQL:2003 features. Various issues specific to the feature-oriented decomposition of SQL:2003 are then discussed.

4.1 Other Implementation Models

Although features represent commonalities and variabilities in a concise taxonomic form, they only indicate the concepts and characteristics of a given system to be implemented, without any hint to implementation details; as such features are only symbols [CA05]. Mapping features to implementation models ascribes meaning to the concept hierarchy so to speak, in a workable context.

From the many possible approaches, we discuss only the superimposed variants and hyperspace approaches. As our work proceeds, our approach increasingly resembles the structure of the superimposed variants approach by Czarnecki et al. [CA05]. The HyperJ approach by Ossher and Tarr [OT00] is a direct result of their research on multi-dimensional separation of concerns.

4.1.1 Superimposed Variants

Model Concept

The superimposed variants approach [CA05] uses a model template and a feature model. A configurable feature model mimics the parent-child relationship of a feature diagram and presents the superimposition of all variants of a product line represented by various feature diagrams. The model template parallels the structure of the feature model (see Figure 4.1). Czarnecki et al. [CA05] use the concepts of presence conditions and meta-expressions. The information contained in a feature diagram about mandatory and optional features, as well as conditions such as ‘requires’ and ‘excludes’, is captured in the feature model of superimposed variants. A presence condition states whether the feature it is attached to is to be included in the instance of the model template. Meta-expressions are used to encapsulate the cardinality-based feature modeling concepts of attributes and return type (refer to Section 2.3.2).

Figure 4.1: Overview of Superimposed Variants Approach [CA05]

Model Implementation

Creating a product from a product line represented as a feature model requires the user first to express everything about the product line in terms of features. Feature diagrams can be used to create the feature model and to configure it with presence conditions and meta-expressions. Once all information about the features has been encoded in the feature model, the user selects the features for the desired product. Based on the feature model and the selected features, the corresponding model template is instantiated. Presence conditions are evaluated for valid feature inclusion. If a feature was selected but its parent was not, the feature is discarded, just as in a feature diagram.
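The instantiation step can be illustrated with a small Python sketch. It is not the implementation of [CA05]: the feature names, the template elements and the functions `prune` and `instantiate` are hypothetical, and presence conditions are modeled simply as predicates over the set of selected features.

```python
# Hypothetical feature model: child -> parent (None marks the root).
PARENT = {
    "QuerySpecification": None,
    "SetQuantifier": "QuerySpecification",
    "TableExpression": "QuerySpecification",
    "WhereClause": "TableExpression",
}

# Hypothetical model template: (element, presence condition).
TEMPLATE = [
    ("queryspecification", lambda s: "QuerySpecification" in s),
    ("SETQUANTIFIER?", lambda s: "SetQuantifier" in s),
    ("tableexpression", lambda s: "TableExpression" in s),
    ("whereclause?", lambda s: "WhereClause" in s),
]


def prune(selected):
    """Discard selected features that have an unselected ancestor."""
    def ancestors_selected(feature):
        parent = PARENT[feature]
        while parent is not None:
            if parent not in selected:
                return False
            parent = PARENT[parent]
        return True
    return {feature for feature in selected if ancestors_selected(feature)}


def instantiate(template, selected):
    """Keep the template elements whose presence condition holds."""
    valid = prune(selected)
    return [element for element, present in template if present(valid)]
```

Selecting Where Clause without its parent Table Expression leaves only the root element in the instantiated template, which reflects the discarding rule stated above.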

Czarnecki et al. [CA05] discuss the use of XPath¹ to express complex presence conditions that may include cardinality-based feature modeling information such as the number of instances of a particular feature (called clones [CHE04]) and feature attributes. They found XPath to be effective for evaluating conditions and for user-defined functions over conditions as required. Once the model template has been instantiated, the information in it can be used to process the feature model in a target notation (for example, to create a product instance in Java) with product-specific conditions.

1 http://www.w3.org/TR/xpath20/

4.1.2 Hyperspaces

Model Concept

The hyperspace approach [TOHJ99], [OT00] is a methodology for decomposing and composing software systems according to multiple concerns, such as Objects or Classes (from object-orientation) and Features (from feature-orientation). A hyperspace denotes the concerns in a system as a multi-dimensional matrix (see Figure 4.2). Each dimension signifies the kind or type of concern in the hyperspace matrix. For feature models to be mapped to an object-oriented implementation, two kinds of concerns prevail: Classes and Features. Decomposition and composition are carried out in a feature-oriented way but implemented in an object-oriented manner. A hyperslice encapsulates concerns along a specific dimension. Composing hyperslices of one kind of concern yields a hypermodule. By composing hypermodules with concern-specific composition rules, a software product family can be obtained. Hyperslices must be declaratively complete [OT00], i.e. all members must declare the functions and variables they use. Concerns of different dimensions may be related to each other either in a context-sensitive way, such that they affect each other only in a certain context, or in a context-insensitive way, such that they affect each other regardless of any particular context [OT00].

Figure 4.2: Hyperspace matrix with two relevant dimensions; Classes and Features [PRB03]

Model Implementation

HyperJ

HyperJ² is a tool that supports the hyperspace approach in Java [OT00]. In HyperJ, a program can be decomposed into classes and other concerns. Concerns of a kind are encapsulated into hyperslices and hypermodules by extracting the code and information relevant to a concern from the Java programs. New concerns of the same kind, as well as concerns of other kinds, can be added at any stage in product family development. Interactions between concerns of different dimensions can be adequately expressed and handled by HyperJ. Concerns of the same and of different kinds can be composed and integrated to obtain software based on the hyperspace approach.

2 http://www.alphaworks.ibm.com/tech/hyperj

Hyper/UML

Philippow et al. [PRB03] discuss the hyperspace approach as applied to the Unified Modeling Language (UML). Hyper/UML was created as an extension to UML. The process of using Hyper/UML involves obtaining a feature model, creating design components using Hyper/UML, and using these design components to create HyperJ hyperslices, which are composed into hypermodules and integrated into a complete system [PRB03]. According to Philippow et al. [PRB03], applying the hyperspace approach to UML and using it as a bridge between feature models and HyperJ components achieves a high degree of automation.

4.1.3 Comparison of Different Implementation Models

We have seen in the last chapter how SQL:2003 features may be decomposed and how sub-grammars for the feature diagrams representing the features can be used to obtain a parser for specific features. During our work on the prototype parser generation we found that we require a mix of the capabilities provided by the various implementation models discussed so far. We identify two development requirements:

1. The Bali approach gave us the first idea about composing sub-grammars of features to obtain a grammar that combines the functionality of these features. We surmise that Bali and related tools are more focused on the feature-oriented programming concept, while the superimposed variants approach more closely reflects the feature modeling concept. We found that we require a way of addressing the feature diagrams obtained in the feature decomposition process in the form of a feature tree (like a product line), where selecting some features gives us the product we want (which in this case is a parser). We also need a way to address the parent-child relationship of features and to create feature instances that include only those features that satisfy the constraints. Using the superimposed variants approach for language extension by composing grammars implies that, apart from the general architecture, the only difference is in feature composition, which in the case of SQL:2003 features consists of the rules for composing the sub-grammars corresponding to the various SQL:2003 features. Therefore, as far as a parser is concerned, we can use the superimposed variants approach or any similar approach, with the common denominator being the grammar composer. We began with Bali because we were aware of its language extension capabilities before coming across other approaches.

2. In order to create a full-fledged SQL engine, though, we require a way of addressing features as first-class entities and support for manipulating features. The Jak language of the GenVoca/AHEAD family of tools is the most prominent example of a feature-oriented programming language. Expressing the intricate syntax, access and general rules of the SQL:2003 standard is extremely nontrivial, yet possible using Jak or a Jak-like language.

Thus we require two things: an easier and more intuitive way of mapping features to implementation models that remains close to feature-oriented development concepts, which is the functionality present in the superimposed variants approach; and a robust feature-oriented programming capability, an example of which is the Jak language of the GenVoca/AHEAD family of tools.

4.2 SQL:2003 Specific Issues

In this section, we discuss various issues relevant to the feature-oriented decomposition of SQL:2003 and the implications for feature-oriented implementations of a general programming language.

Feature analysis of a programming language

Any programming language has features that are mainly categorized by the paradigm it supports, i.e. whether it is an object-oriented, functional or procedural language, or some mix of these and other programming paradigms. Languages are also distinguished by whether they support a pure or impure form of that paradigm, such as pure or impure functional programming languages and pure or impure object-oriented programming languages. Each programming language supports some kinds of features while discarding others. Some languages do support language extension and meta-programming facilities, but our need is not the same as adding extra syntax to a language using, say, LISP, or carrying out template meta-programming with C++. Almost anything can be done in a sufficiently powerful programming language in one way or another, and if there is no first-class support for a task, some workaround can be designed. In composing features of SQL:2003, we require a way to compose sub-grammars of the SQL:2003 BNF specification selectively.

The Bali tool of the AHEAD tool suite [TBKC07] provides the functionality to compose sub-grammars and to create a parser for the resulting grammar. Parser generators like JavaCC and ANTLR take a complete grammar specification to produce a parser (we implemented Bali-like functionality using ANTLR after we found that ‘grammar composition’ was the common denominator). Bali takes an LL(k) grammar specification. Grammars are written in a Bali-specific format and are composed to obtain different variants of a language. The original Balicomposer’s composition rules proved to be restrictive in expressing the complex declarative syntax of SQL:2003, as seen in Section 3.3, and problems with feature optionality caused by inappropriate composition led us to create our own composer for the prototype of a customizable parser. We have not yet explored the option of extending Bali’s composition mechanism by using Bali itself; this will probably be part of our ongoing work.

Almost no case study on composing sub-grammars for a programming language can be found, except for research in natural language processing, such as the study by Meng et al. [MLXW02] on GLR (Generalized LR) parsing with multiple grammars for natural language queries. Traditionally, the extension of programming language features was tackled most prominently with macros in LISP-like languages. Many other meta-programming paradigms exist, but the idea of treating sub-grammars of a language as features of that language did not come up, perhaps because, like feature-oriented programming, it requires a shift in focus from viewing a language as having a single grammar and a single parser to thinking of it as a collection and combination of features, each of which is ascribed to a sub-grammar, thus obtaining a multi-grammar, multi-parser paradigm for variants of the same language. This would allow creating a customizable parser (we are currently creating a prototype for SQL:2003) and, in the longer run, a customizable compiler or interpreter.

Another strand of research has been devoted to domain-specific languages (DSLs). A study by van Deursen and Klint [vDK01] combined the feature concept with DSLs. They created a DSL for a commercial document generator based on a feature description language which is capable of creating feature-based variants. Nevertheless, no extensive guidelines about feature selection and the analysis of feature optionality within a programming language are available. We hope to gain insights into these questions in our ongoing work.

Difference between Feature-Oriented Decomposition and Feature-Oriented Refactoring

As seen in Section 2.3, Feature-Oriented Decomposition (FOD) is a feature modeling activity to find commonalities and variabilities in terms of features, while Feature-Oriented Refactoring (FOR) is the “process of decomposing a program into features” [LBL06]. On the surface both seem alike, but in feature-oriented decomposition we do not have an existing application which is non-feature based. The ‘feature finding’, so to speak, begins from scratch. Refactoring, on the other hand, as Martin Fowler defines it [Fow99], is the “process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure”. It is a process of “improving design after it has been written”. Feature refactoring changes a software system so that it is feature based. The feature refactoring process consists of choosing a feature expression for a legacy application to find out what features exist in it, finding the difference between base and derivative features (derivatives contain only method refinements, while the base contains classes and inter-type declarations [LBL06]), refactoring base and derivative features, and finally reconstituting the program. Our work is the feature-oriented decomposition of SQL:2003, but it is also possible to feature-refactor the SQL engine code of a particular implementation of SQL:2003. This would give a different degree of insight into the feature model of SQL:2003, since we would have the implementation components available in addition to the SQL standard.

Influence of Language Grammar on Feature Analysis

As stated in the basis of the feature-oriented decomposition of SQL:2003, the BNF grammar of SQL, along with the specifications in the standard drafts, was the main source for the features found. Features can be extracted from any kind of programming language grammar, since the inherent structure of a feature diagram is itself very simple.

The most prominent non-terminals in the grammar were made features in the feature diagram. The originating non-terminal, which corresponds to the main SQL construct, became the root feature, and the subsequent important non-terminals in the production rules of the originating non-terminal, which correspond to parts or specific properties of the SQL construct, became child features. In most cases the mandatory or optional nature of a feature was derived from the mandatory or optional presence of the corresponding non-terminal.

There was uncertainty about whether major keywords should be treated as features, since to comply with the ISO/ANSI standard, SQL variants from database vendors must support “the same major keywords in a similar manner”³. But since keywords are part of the syntax (which is implementation specific, whereas we are concerned with the feature decomposition of a language standard), it was decided that keywords are not to be treated as features (e.g., the SELECT keyword is not a feature in Query Specification). Only in certain cases where keywords address very important properties or behavior of the parent SQL construct were they included in the feature diagrams as features, indicating that the property or behavior is important for the correct functioning of the given SQL construct.

3 http://www.w3schools.com/sql/sql_intro.asp
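The mapping from a production rule to a root feature with mandatory and optional child features can be sketched as follows. This Python fragment is illustrative only: the function name `features_from_rule` is hypothetical, rules are assumed to use the ANTLR-style notation of Listing 3.7 (upper-case names are tokens, a trailing `?` marks an optional symbol), and the `keyword_features` parameter models the exceptional keywords that were promoted to features.

```python
def features_from_rule(rule, keyword_features=()):
    """Derive (root feature, [(child feature, kind), ...]) from one rule."""
    lhs, rhs = rule.rstrip(" ;").split(":", 1)
    children = []
    for symbol in rhs.split():
        name = symbol.rstrip("?")
        if name.isupper() and name not in keyword_features:
            continue  # plain keywords/tokens are syntax, not features
        kind = "optional" if symbol.endswith("?") else "mandatory"
        children.append((name, kind))
    return lhs.strip(), children


root, children = features_from_rule(
    "queryspecification : SELECT SETQUANTIFIER? selectlist tableexpression ;",
    keyword_features=("SETQUANTIFIER",),
)
```

In this example SELECT is skipped as a plain keyword, while the set quantifier is kept as an optional feature.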

Duplicate Features

A feature analysis based on a language grammar is bound to contain duplicate features, i.e. features that occur in many language constructs and end up having different parent features. This is especially likely for the complex grammar of a declarative query language like SQL, as various non-terminals appear at different places in different production rules. The problem of a feature with many parents (incorrect due to the tree nature of feature diagrams) was alleviated by moving the occurrences of a duplicate feature to a single parent. For example, Search Condition and Query Expression, which initially appeared in different feature diagrams, were moved to the Predicate and SQL Foundation feature diagrams respectively. A ‘requires’ condition was included, indicating that the removed feature is in fact required by the former parent feature in composition. Consequently, the order of features in feature diagrams has no importance from the point of view of the grammar. Moving a feature to one specific parent removes it from the original feature diagram where it appeared, and thus one may not be able to guess the grammatical structure of the SQL construct from the refined feature diagram, although all such features are put in the ‘Requires’ part of the semantic description. This also means that a developer needs access to the SQL BNF grammar and the standard’s specification, apart from the feature diagrams, to map the feature models to implementation models.
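The refinement described in this paragraph can be sketched in Python as follows. The function `single_parent`, the data layout, and the parent names Where Clause and Having Clause are hypothetical and serve only to illustrate moving a duplicate feature to one parent while recording ‘requires’ conditions for its former parents.

```python
def single_parent(diagrams, keep_under):
    """Resolve duplicate features.

    diagrams:   parent feature -> list of child features (may contain duplicates)
    keep_under: duplicate feature -> the one parent it should stay under
    Returns (tree, requires), where requires maps former parents to the
    features that were moved away from them.
    """
    tree, requires = {}, {}
    for parent, children in diagrams.items():
        for child in children:
            if keep_under.get(child, parent) == parent:
                tree.setdefault(parent, []).append(child)
            else:
                requires.setdefault(parent, []).append(child)
    for child, home in keep_under.items():
        if child not in tree.get(home, []):
            tree.setdefault(home, []).append(child)  # attach to the chosen parent
    return tree, requires


diagrams = {
    "WhereClause": ["SearchCondition"],
    "HavingClause": ["SearchCondition"],
    "Predicate": ["ComparisonPredicate"],
}
tree, requires = single_parent(diagrams, {"SearchCondition": "Predicate"})
```

The resulting tree no longer shows Search Condition below Where Clause, which is why the grammar is still needed to recover the syntactic structure.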

Design Rules and Constraints For Features

The following conventions for the specification of syntactic elements are reproduced from Section 6.3.1 of [Mel03b]. Each specification consists of 6 elements:

1. Function: A short statement of the purpose of the element.

2. Format: A BNF definition of the syntax of the element.

3. Syntax Rules: A specification in English of the syntactic properties of the element, or of additional syntactic constraints, not expressed in BNF, that the element shall satisfy, or both.

4. Access Rules: A specification in English of rules governing the accessibility of schema objects that shall hold before the General Rules may be successfully applied.

5. General Rules: A specification in English of the run-time effect of the element. Where more than one General Rule is used to specify the effect of an element, the required effect is that which would be obtained by beginning with the first General Rule and applying the Rules in numeric sequence unless a Rule is applied that specifies or implies a change in sequence or termination of the application of the Rules. Unless otherwise specified or implied by a specific Rule that is applied, application of General Rules terminates when the last in the sequence has been applied.

6. Conformance Rules: A specification of how the element shall be supported for conformance to SQL.

A complete coverage of the syntax, access, general and conformance rules in terms of ‘requires’ and ‘excludes’ is very difficult to achieve and is beyond the scope of this thesis. Coupled with other important features of a database management system that uses SQL as its interface to the database, such as large database support, indexes, backup and restore, persistence, logging, concurrency and transactions, the interactions within the features of SQL and between SQL and database system features become much more complex. Unlike other programming languages, SQL is a kind of DSL used for the retrieval and management of data in a database. In other words, interacting with the database is its core function. Mapping the decomposed features of a language such as SQL:2003 to an implementation would indeed be a very complex task, because feature interactions occur not just among the various parts of the programming language but also with the implementation components of the database system. It is reasonable to believe that it would be easier to map the features of a general programming language to an implementation model than those of SQL.

More than in any other domain to which feature-oriented domain analysis is applied, the number of constraints of various kinds is large for a general programming language and staggering for a database language like SQL, because of its natural-language-like syntax and the database context within which it operates. General applications contain features that capture high-level functionality, and therefore a limited number of ‘requires’ and ‘excludes’ constraints may prove more than enough. On the other hand, the more features a programming language has, the more low-level functionality enters the picture, increasing the number of constraints of various kinds. We found that there are generally three distinct types of constraints in mapping feature models to implementation models:

• Feature constraints, such as ‘requires’ and ‘excludes’, which can be captured in feature diagrams.

• Implementation model constraints, which depend on the specific implementation model. For example, the ‘fmp’ plug-in by Czarnecki et al. [AC04], based on cardinality-based feature modeling concepts, requires tree-oriented navigation and query facilities and other operators for resolving constraints on feature sets and feature attributes. Hyperslices in hyperspaces must be declaratively complete [OT00]. GenVoca-style feature-oriented programming considers feature derivatives for resolving constraints on optional features [Liu04]. Similarly, the use of propositional formulas to validate compositions is also considered [Bat05]. All of these can be categorized as constraints on implementation models that arise from the specific semantics of these models.

• Implementation constraints, which are constraints on runtime behavior. In the case of SQL:2003 these involve maintaining ACID properties and maintaining default states and default values of variables, descriptors, diagnostics areas, etc. These kinds of constraints cannot be represented either in feature diagrams or in the implementation model. Many of SQL:2003’s general, access and syntax rules are of this kind and have to be taken care of in whatever programming language is used to implement the features.
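Of the three types listed above, only feature constraints lend themselves to a simple check on a feature selection. The following Python sketch shows such a check; the function name `violations` and the example constraints are hypothetical, and the other two constraint types are deliberately not covered.

```python
def violations(selected, requires, excludes):
    """List the feature constraints violated by a set of selected features."""
    problems = []
    for feature in sorted(selected):
        for needed in requires.get(feature, ()):
            if needed not in selected:
                problems.append(f"{feature} requires {needed}")
        for forbidden in excludes.get(feature, ()):
            if forbidden in selected:
                problems.append(f"{feature} excludes {forbidden}")
    return problems


# Hypothetical constraint: Where Clause requires Search Condition.
requires = {"WhereClause": ["SearchCondition"]}
report = violations({"QuerySpecification", "WhereClause"}, requires, {})
```

A non-empty report indicates that the selection is not a valid feature instance description with respect to the feature constraints.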

None of the feature diagrams contains an ‘excludes’ constraint. This does not mean that exclusion conditions do not exist at all, only that they do not exist in the feature modeling sense and at the level of feature constraints. The exclusion constraints are present in the form of what we have called implementation constraints. At runtime, exclusion conditions arise under specific conditions of the database and specific values of various descriptors and locators (for such conditions refer to the specifications of the different SQL constructs in SQL/Foundation [Mel03a]), but there is no way to represent this information either in the feature diagram or in the semantic description, because at the abstraction level of features this information is not under consideration. Ideally, everything about a feature in the standard would have to be considered in order to talk about exclusion conditions. At the statement level and in terms of feature constraints, ‘requires’ constraints are more prominent than ‘excludes’ constraints, and therefore only ‘requires’ constraints are presented.

Alternate Ways of Modeling Top Level Features in SQL:2003

In this thesis, the top-level features have been arranged at different levels of granularity, with the basic decomposition guided by the classification of SQL statements by function (cf. Section 4.33.2 of SQL/Foundation [Mel03a]). An alternative would be to arrange statements according to the schema elements they are specific to, such as schemas, domains, tables, views, roles, privileges, triggers and assertions, and according to data types such as arrays, multisets, the row data type and the XML data type. Such a choice would alter the structure of the main feature tree describing the complete SQL:2003. This would consequently change the hierarchically arranged sub-grammars and, depending on the mapping formalism used, could influence the final software modeled in terms of the features offered to the customer. Since SQL statement classes are clearly defined in SQL/Foundation [Mel03a], we chose to follow that decomposition instead of a decomposition based on schema elements. It can be argued that a decomposition based on statement classes is more general than one based on schema elements, since the implementation of schema elements could be very vendor specific, and an initial decomposition based on schema elements might have to be tuned further, changing the modeled features to suit each vendor’s implementation needs.

4.3 Related Work

The GenVoca model was inspired by the work of Don Batory [BO92] on Genesis [BBG+88] and of O’Malley et al. [PHOA89] on AVOCA/x-kernel, which are generators for database management systems and network protocols, respectively [CE00]. Batory and other researchers have refined the ideas behind GenVoca in subsequent years. Thaker et al. [TBKC07] have also proposed a method for validating composition which is relevant to our work. They identified several kinds of constraints apart from general feature inclusion and exclusion constraints, such as refinement constraints, superclass constraints, reference constraints, single introduction constraints, abstract class constraints, and interface constraints, all of which stem from the specific implementation constructs of the Jak language. Of the three kinds of constraints we have identified, these belong to the category of ‘implementation model constraints’ (cf. Section 4.2).

Liu et al. [LBL06] have applied feature-oriented refactoring to ‘Prevayler’, an open source Java application with an in-memory database and object persistence. They obtained 7 base modules and 9 derivatives, which were used to create different variants. They also made the interesting observation that features can have different implementations in different products of the product line. This is important when implementing the feature decomposed SQL:2003, as it can affect the ‘implementation model constraints’ in subtle ways.

Blair and Batory [BB04] compared GenVoca and XVCL generators. XVCL (XML-based Variant Configuration Language) generators are based on the idea of frames, which are “parameterized functions that return a text string which is interpreted as source”. They found that GenVoca considers compositional programming based on feature modularity and program objectification, while XVCL’s frame based design does not need to consider such modularity; only the changes to be made need to be defined. Maintenance was found to be simpler in GenVoca generators than in XVCL, and GenVoca scaled more readily than XVCL.

We have already seen research on domain specific languages⁴ and features, such as [vDK01], [BBGN01], and [BLS98]. Czarnecki et al. [CE00] list different types of DSLs: fixed DSLs with separate translators, embedded DSLs hosted in general programming languages, and modularly composable DSLs like the ‘Intentional Programming’ system [Sim95], [Sim96]. Czarnecki et al. [CE00] also name the problems of an application specific language that contains all required language features: the parsing problem, the cost of specialized compilers and programming environments, the distribution of language extensions, and the evolution problem. The parsing problem is evident for feature-rich, domain-specific, text-based languages (like feature decomposed SQL) in two ways: adding more features may make the language unparsable, and the requirement of parsability restricts domain specific notation. This is quite relevant to our work, as managing grammars with a vast number of tokens is no problem in the single-grammar, single-parser paradigm but is problematic in a multi-grammar, multi-parser scenario. (The SQL:2003 grammar is large in terms of non-terminal and terminal symbols, tokens, keywords, etc., and when small grammars corresponding to features are considered, managing these grammars and their corresponding tokens is quite involved.) The fixed grammar point of view restricts evolution, and the feature concept may provide a solution. The costs of specialized compilers and programming environments are estimated to be very high, and distributing extensions is difficult. Both of these problems are relevant too: database technology has expanded uncontrollably, and to coordinate database feature evolution and put it into a feature context, a product line architecture with the feature concept as its basis must be established for databases, which is going to be very costly.

⁴ An annotated bibliography of domain specific languages by Deursen, Klint, and Visser is available at http://homepages.cwi.nl/~arie/papers/dslbib/dslbib.html.

A study by Lopez-Herrejon et al. [LHBC05] suggests that there are certain technology-independent properties of feature modularity that must be addressed by implementation models. They evaluated support for features in AspectJ, HyperJ, Jiazzi, Scala and AHEAD and concluded that none of these technologies provides a satisfactory solution to the problem of building the products of a product line. They also found that feature modularity properties are algebraic in nature, and proposed that different modularization and implementation models need to be consolidated within the context of algebraic feature modularity properties. This complements our conclusion regarding the implementation of SQL:2003, namely that different capabilities provided by different approaches are required (cf. Section 4.1.3).

Our group at the Institute for Technical and Business Information Systems of the University of Magdeburg is involved in two related projects: the FAME-DBMS⁵ project, which considers highly configurable database families for embedded systems, and the Tailor-made Data Management Software⁶ workshop, which is about including only specific features in products, thereby limiting the size of the codebase, using concepts of feature-oriented software development.

4.4 Summary

Apart from Bali and related tools and our implementation of a grammar composer, we reviewed other related implementation models. We propose that, in order to use the features of SQL:2003 to create a complete SQL engine, two development requirements must be met: one addresses the creation and validation of feature instance descriptions from feature diagrams by resolving feature constraints, and the other supports feature-oriented programming for adding semantic actions to the generated parser code. We propose that a feature decomposition similar to that of SQL:2003 may be applied to other programming languages in general, to build and extend them in a feature-oriented way. Implementing an SQL:2003 engine is more difficult than implementing a general programming language due to the nontrivial nature of the various rules governing SQL syntax and execution. We identify three different kinds of constraints, namely feature constraints, implementation model constraints, and implementation constraints, which must be addressed when using feature modeling and feature-oriented programming together with an implementation model. Feature refactoring is more apparent than feature-oriented decomposition, since implementation components are available in addition to domain knowledge. We find various instances of related work in other implementation models, domain specific languages and software generators.

⁵ http://wwwiti.cs.uni-magdeburg.de/iti db/forschung/FAME-DBMS/index.htm

⁶ http://wwwiti.cs.uni-magdeburg.de/iti db/workshops/BTW-07/eng.html

Chapter 5

Conclusion

In this thesis, we showed how feature-oriented decomposition can be applied to SQL:2003 and how to implement the features thus obtained to create a customizable parser. In this chapter, we present the conclusions we reached while carrying out the feature-oriented decomposition of SQL:2003.

We started by stating the case for feature-oriented decomposition of SQL:2003. We observed in the motivation that applying Software Product Line Engineering concepts to SQL:2003 can be beneficial in many areas such as embedded and real time systems, software generators and database technology in general as we would achieve the capacity to customize and select only the required features of SQL:2003.

SQL standardization clearly depicts the ‘core + features’ development of SQL. We sought to apply the feature-orientation concepts to decomposition of SQL:2003 along the feature concern. Features of SQL:2003 are user-visible and distinguishable characteristics of SQL:2003. We explained how the feature concept is situated in Software Product Line Engineering and also established the feature modeling technique as applicable to SQL:2003.

No specific implementation model is best at achieving separation of concerns [LHBC05]. We explained the Bali approach, our implementation of grammar composition, and other implementation models like superimposed variants and hyperspaces. We established that, in order to use feature modeling and feature-oriented programming together, we need a mechanism for each of them, and that both mechanisms need to be present in the model in order to decompose SQL:2003 into features, to create a customizable parser for SQL:2003 and, in the future, to create a customizable SQL engine.

We have also seen that feature analysis can be applied to programming languages in general from the standpoint of multiple grammars and multiple parsers. If feature analysis and feature-oriented programming were bootstrapped with features as first-class entities (Bali comes close, but manipulation is still along the class concern), then extending such a language with new features would be intuitive and along the lines of generative and automatic programming.¹

¹ “Generative programming occurs when mapping between levels of abstraction and automatic programming occurs when optimizing within a level of abstraction.” [Bat03b]

An interesting observation made by Liu et al. [LBL06] is that features may be implemented differently in different products of a product line. When we consider this observation in the light of ‘scaled down’ DBMS with feature decomposed SQL:2003, we see that not only SQL but the respective DBMS must also be feature decomposed and these two must be aligned and integrated into an overall feature based database system. Considering the number of domains in which database technology is active, we require a product line architecture of SQL:2003 and database systems in order to really gain from application of feature-oriented software development concepts to database systems.

The feature diagrams of SQL:2003 show features at varying levels of granularity. Including a feature node (optional or mandatory) implies that its parent has to be included in the feature instance description too, according to the definitions (cf. Section 2.3.2), and with the parent feature all of its mandatory features come into the picture, given that they are all AND features. Depending on the requirements of a system, e.g., an embedded system with limited resources and storage capacity, features may have to be repositioned lest they end up including more functionality than is required or possible for the system. Another concern regarding granularity as a level of detail is that feature models are more abstract than other models. In general applications, features may map to real world entities and capture sufficient details of the entity, but in SQL:2003 we have seen that each part of the standard contains a set of syntax, access, general and conformance rules, etc., which cannot be expressed easily in feature models, no matter what granularity is introduced. Inspecting the code of an SQL engine, and perhaps feature refactoring it, may give more insights for fine-tuning the feature decomposition of SQL.
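The effect of parent inclusion and mandatory (AND) features on a feature instance description can be sketched as a closure computation. The tree fragment below is hypothetical: the feature names follow Figures A.1 to A.3, but which children are marked mandatory is an assumption made for illustration, as are the names `TREE` and `close_selection`.

```python
# Hypothetical fragment of a feature tree: child -> (parent, is_mandatory).
# The mandatory/optional flags are illustrative, not read from the diagrams.
TREE = {
    "SQL Schema Statements": ("SQL/Foundation", False),
    "Schema Definition": ("SQL Schema Statements", False),
    "Schema Name Clause": ("Schema Definition", True),
    "Authorization": ("Schema Definition", False),
    "Trigger Definition": ("SQL Schema Statements", False),
}


def close_selection(selected):
    """Add every ancestor of a selected feature, plus the mandatory
    children of every feature that ends up in the selection."""
    result = set()
    worklist = list(selected)
    while worklist:
        feature = worklist.pop()
        if feature in result:
            continue
        result.add(feature)
        parent = TREE.get(feature, (None, False))[0]
        if parent is not None:
            worklist.append(parent)
        for child, (child_parent, mandatory) in TREE.items():
            if child_parent == feature and mandatory:
                worklist.append(child)
    return result
```

Under these assumptions, selecting only the optional Authorization feature pulls in Schema Definition, its mandatory Schema Name Clause, and all ancestors up to SQL/Foundation, which illustrates how a single selection can include more functionality than a constrained system needs.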

We summarize the conclusions in the light of the goals we established at the beginning of the thesis.

1. Software Product Line Engineering concepts can be successfully applied to SQL:2003. User-visible and distinguishable characteristics of SQL:2003 are the features of SQL:2003. Feature-oriented decomposition of SQL:2003 is carried out with 40 feature diagrams denoting various features of SQL/Foundation. Other extension packages of SQL:2003 can be decomposed similarly.

2. The complete SQL:2003 grammar can be considered a product line where sub-grammars denote features. These sub-grammars can be composed like features, and together with a parser generator, customized parsability can be achieved.

3. The following capabilities are required for the implementation model: a) accessing feature diagrams and generating feature instances by resolving feature constraints, and b) applying feature-oriented programming concepts to add semantic actions to parser code thus generated.
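Conclusion 2 can be illustrated with a small sketch of sub-grammar composition. This is not Bali and not our grammar composer: the representation (a dictionary from non-terminal to a list of alternatives), the merge rule (union of alternatives in composition order), and the two toy sub-grammars are assumptions chosen to keep the example short.

```python
def compose(*sub_grammars):
    """Compose sub-grammars by merging the alternatives of each non-terminal.

    Assumed merge rule: alternatives are appended in composition order and
    duplicates are dropped. The inputs are not modified.
    """
    composed = {}
    for grammar in sub_grammars:
        for nonterminal, alternatives in grammar.items():
            target = composed.setdefault(nonterminal, [])
            for alternative in alternatives:
                if alternative not in target:
                    target.append(alternative)
    return composed


def to_bnf(grammar):
    """Render a composed grammar as text for a parser generator."""
    lines = []
    for nonterminal, alternatives in grammar.items():
        rhs = " | ".join(" ".join(alt) for alt in alternatives)
        lines.append(f"{nonterminal} : {rhs} ;")
    return "\n".join(lines)


# Toy sub-grammars standing in for a base feature and an optional feature.
BASE = {
    "query_specification": [("SELECT", "select_list", "table_expression")],
    "table_expression": [("from_clause",)],
}
WHERE = {
    "table_expression": [("from_clause", "where_clause")],
    "where_clause": [("WHERE", "search_condition")],
}
```

Calling `to_bnf(compose(BASE, WHERE))` yields a grammar in which `table_expression` accepts both the plain and the WHERE-extended form; whether a refining feature should add to or replace an existing production is a design decision that this sketch leaves open.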

5.1 Further Work

We have been working on the prototype of a parser for the feature decomposed SQL:2003. It is currently in its initial stage. We have already discussed a variety of related issues. We would like to continue working on this prototype. At the same time, we need to work on a product line architecture of SQL:2003 and database systems in general in order to gain insight into how SQL features interact with database features in general. This information is necessary to create a complete engine for SQL:2003 that is feature customizable.

We need to find out how to express complex and nontrivial constraints, based on the various kinds of syntax, access, and general rules, in feature models or in some other formalism alongside feature models, and how to integrate them. Given that SQL syntax is effectively open-ended, the grammar and the corresponding set of rules will change too (variability in space and variability in time). We must be able to create an architecture that can cope with variability in space and in time in relation to the enormous SQL grammar (considering not only the grammar from SQL/Foundation but also from the extension packages) and with the interaction of new features with each other and with old features (from older decompositions). Feature-oriented decomposition is a modeling activity in Domain Analysis; we need to carry out the work on the other phases of Domain Engineering, such as Domain Design and Domain Implementation, and also on the parallel phases of Application Engineering, to create a complete product line architecture for SQL:2003. In Domain Design we need to explore various system architecture patterns for the proposed product line architecture, such as the layers and microkernel patterns described earlier. Similarly, various configuration and application ordering tools, etc., need to be tested in the phases of Application Engineering.

It is also possible to use a variety of techniques as stated in Chapters 3 and 4 for mapping feature models to implementation models. Using different implementation models on a subset of SQL:2003 will give us insights about what kind of components and paradigms we require to create an efficient and customizable SQL:2003 engine.

We propose the following tasks for further work in sequential order:

1. Find a way to model complex and nontrivial constraints of different kinds related to SQL:2003 and database systems.

2. Explore various architectural patterns, other software generators, implementation models, configuration languages, etc., for implementing various phases of Software Product Line Engineering as applicable to SQL:2003.

3. Establish a complete product line architecture for SQL:2003 by sequentially exploring various phases of Software Product Line Engineering. Prepare a proof of concept implementation based on such architecture.

In this way, it would be possible to create a completely customizable SQL:2003 engine.

Bibliography

[AC04] Antkiewicz, M.; Czarnecki, K.: FeaturePlugin: Feature Modeling plug-in for Eclipse. In Proceedings of the 2004 OOPSLA workshop on eclipse technology eXchange (Eclipse’04), S. 67–72. ACM Press, New York, NY, USA, 2004.

[ALRS05] Apel, S.; Leich, T.; Rosenmüller, M.; Saake, G.: FeatureC++: On the Symbiosis of Feature-Oriented and Aspect-Oriented Programming. In Proceedings of Fourth International Conference on Generative Programming and Component Engineering (GPCE’05), S. 125–140, 2005.

[Bat03a] Batory, D.: A Tutorial on Feature-oriented Programming and Product-lines. In Proceedings of the 25th International Conference on Software Engineering (ICSE’03), S. 753–754. IEEE Computer Society, Washington, DC, USA, 2003.

[Bat03b] Batory, D. S.: The Road to Utopia: A Future for Generative Programming. In Domain-Specific Program Generation, S. 1–18, 2003.

[Bat04] Batory, D. S.: Program Comprehension in Generative Programming: A History of Grand Challenges. In Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC’04), S. 2. IEEE Computer Society, Washington, DC, USA, 2004.

[Bat05] Batory, D.: Feature Models, Grammars, and Propositional Formulas. In Software Product Line Conference, 2005.

[BB04] Blair, J.; Batory, D.: A Comparison of Generative Approaches: XVCL and GenVoca. Technical report, Department of Computer Sciences, University of Texas at Austin, 2004.

[BBG+88] Batory, D. S.; Barnett, J. R.; Garza, J. F.; Smith, K. P.; Tsukuda, K.; Twichell, B. C.; Wise, T. E.: GENESIS: An Extensible Database Management System. IEEE Trans. Software Eng., Band 14, Nr. 11, S. 1711–1730, 1988.

[BBGN01] Batory, D.; Brant, D.; Gibson, M.; Nolen, M.: ExCIS: An Integration of Domain-Specific Languages and Feature-Oriented Programming. www.isis.vanderbilt.edu/sdp, 2001.

[BBPV00] Bobineau, C.; Bouganim, L.; Pucheral, P.; Valduriez, P.: PicoDBMS: Scaling down Database Techniques for the Smartcard, 2000.

[BFG+02] Bosch, J.; Florijn, G.; Greefhorst, D.; Kuusela, J.; Obbink, J. H.; Pohl, K.: Variability Issues in Software Product Lines. In Revised Papers from the 4th International Workshop on Software Product-Family Engineering (PFE’01), S. 13–21. Springer-Verlag, London, UK, 2002.

[BJMvH00] Batory, D. S.; Johnson, C.; MacDonald, B.; Heeder, D. v.: Achieving Extensibility Through Product-Lines and Domain-Specific Languages: A Case Study. In International Conference on Software Reuse, S. 117–136, 2000.

[BLHM02] Batory, D.; Lopez-Herrejon, R. E.; Martin, J.-P.: Generating Product-Lines of Product-Families. Automated Software Engineering, 2002.

[BLS98] Batory, D.; Lofaso, B.; Smaragdakis, Y.: JTS: Tools For Implementing Domain Specific Languages. In Proceedings Fifth International Conference on Software Reuse (ICSR’98), S. 143–153. IEEE, Victoria, BC, Canada, 1998.

[BO92] Batory, D. S.; O’Malley, S. W.: The design and implementation of hierarchical software systems with reusable components. ACM Transactions on Software Engineering Methodology, Band 1, Nr. 4, S. 355–398, 1992.

[BSR04] Batory, D.; Sarvela, J.; Rauschmayer, A.: Scaling StepWise Refinement. IEEE Transactions on Software Engineering, Band 30, S. 1278–1295, 2004.

[CA05] Czarnecki, K.; Antkiewicz, M.: Mapping Features to Models: A Template Approach Based on Superimposed Variants. In Proceedings of Fourth International Conference on Generative Programming and Component Engineering (GPCE’05), S. 422–437, 2005.

[CE00] Czarnecki, K.; Eisenecker, U. W.: Generative Programming - Methods, Tools, and Applications. Addison-Wesley, 2000.

[CHE04] Czarnecki, K.; Helsen, S.; Eisenecker, U.: Formalizing Cardinality-based Feature Models and Their Specialization. Technical Report Nr. 4-11, Department of Electrical and Computer Engineering, University of Waterloo, Canada, 2004.

[CK05] Czarnecki, K.; Kim, P.: Cardinality-Based Feature Modeling and Constraints: A Progress Report. In Proceedings of the International Workshop on Software Factories (OOPSLA’05), 2005.

[Cod70] Codd, E. F.: A Relational Model of Data for Large Shared Data Banks. Commun. ACM, Band 13, Nr. 6, S. 377–387, 1970.

[CW00] Chaudhuri, S.; Weikum, G.: Rethinking Database System Architecture: Towards a Self-Tuning RISC-Style Database System. In Abbadi, A. E.; Brodie, M. L.; Chakravarthy, S.; Dayal, U.; Kamel, N.; Schlageter, G.; Whang, K.-Y. (Hrsg.): Proceedings of 26th International Conference on Very Large Data Bases (VLDB’00), S. 1–10. Morgan Kaufmann, 2000.

[Dat95] Date, C. J.: An Introduction To Database Systems Volume I, Sixth Edition. Addison-Wesley, 1995.

[Dij76] Dijkstra, E. W.: A Discipline of Programming. Prentice Hall, Englewood Cliffs, NJ, 1976.

[dJV02] Jonge, M. d.; Visser, J.: Grammars as Feature Diagrams. CWI, Amsterdam, 2002.

[EMK+04] Eisenberg, A.; Melton, J.; Kulkarni, K.; Michels, J.-E.; Zemke, F.: SQL:2003 has been published. SIGMOD Rec., Band 33, Nr. 1, S. 119–126, 2004.

[EN03] Elmasri, R.; Navathe, S. B.: Fundamentals of Database Systems, Fourth Edition. Addison-Wesley, 2003.

[Fow99] Fowler, M.: Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston, MA, USA, 1999.

[Int99] International Organization for Standardization (ISO): Part 7: Interindustry Commands for Structured Card Query Language (SCQL). In Identification Cards – Integrated Circuit(s) Cards with Contacts, ISO/IEC 7816-7, 1999.

[KCH+90] Kang, K.; Cohen, S.; Hess, J.; Novak, W.; Peterson, A.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report Nr. CMU/SEI-90-TR-21, Software Engineering Institute, Carnegie Mellon University, 1990.

[Kli04] Kline, K. E.: SQL in a Nutshell, 2nd Edition. O’Reilly, 2004.

[KWF+03] Kersten, M. L.; Weikum, G.; Franklin, M. J.; Keim, D. A.; Buchmann, A. P.; Chaudhuri, S.: A Database Striptease or How to Manage Your Personal Databases. In Freytag, J. C.; Lockemann, P. C.; Abiteboul, S.; Carey, M. J.; Selinger, P. G.; Heuer, A. (Hrsg.): Proceedings of the 29th International Conference on Very Large Data Bases (VLDB-03), S. 1043–1044. VLDB, Morgan Kaufmann, Berlin, Germany, 2003.

[LBL06] Liu, J.; Batory, D.; Lengauer, C.: Feature Oriented Refactoring of Legacy Applications. In Proceedings of 28th International Conference on Software Engineering (ICSE’06). ACM Press, 2006.

[LHBC05] Lopez-Herrejon, R. E.; Batory, D. S.; Cook, W. R.: Evaluating Support for Features in Advanced Modularization Technologies. In Proceedings of European Conference on Object-Oriented Programming (ECOOP’05), S. 169–194, 2005.

[Liu04] Liu, J.: Feature Interactions and Software Derivatives. Journal of Object Technology, Band 4, Nr. 3, S. 13–19, 2004.

[Mel03a] Melton, J.: Working Draft: SQL Foundation. ISO/IEC 9075-2:2003 (E) Nr. 5WD-02-Foundation-2003-09, ISO/ANSI, 2003.

[Mel03b] Melton, J.: Working Draft: SQL Framework. ISO/IEC 9075-1:2003 (E) Nr. 5WD-01-Framework-2003-09, ISO/ANSI, 2003.

[Mel03c] Melton, J.: Working Draft: SQL Object Language Bindings. ISO/IEC 9075-10:2003 (E) Nr. 5WD-10-OLB-2003-09, ISO/ANSI, 2003.

[Mel03d] Melton, J.: Working Draft: SQL Routines and Types for the Java Programming Language. ISO/IEC 9075-13:2003 (E) Nr. 5WD-13-JRT-2003-09, ISO/ANSI, 2003.

[Mel03e] Melton, J.: Working Draft: XML-Related Specifications. ISO/IEC 9075-14:2003 (E) Nr. 5WD-14-XML-2003-09, ISO/ANSI, 2003.

[Mel03f] Melton, J.: Working Draft: SQL Management of External Data. ISO/IEC 9075-9:2003 (E) Nr. 5WD-09-MED-2003-09, ISO/ANSI, 2003.

[Mel03g] Melton, J.: Working Draft: SQL Persistent Stored Modules. ISO/IEC 9075-4:2003 (E) Nr. 5WD-04-PSM-2003-09, ISO/ANSI, 2003.

[MLXW02] Meng, H.; Luk, P.-C.; Xu, K.; Weng, F.: GLR Parsing with Multiple Grammars for Natural Language Queries. ACM Transactions on Asian Language Information Processing (TALIP), Band 1, Nr. 2, S. 123–144, 2002.

[NTN+04] Nyström, D.; Tesanovic, A.; Nolin, M.; Norström, C.; Hansson, J.: COMET: A Component-Based Real-Time Database for Automotive Systems. In Workshop on Software Engineering for Automotive Systems, S. 1–8. The IEE, Edinburgh, Scotland, 2004.

[OT00] Ossher, H.; Tarr, P.: Multi-Dimensional Separation of Concerns and the Hyperspace Approach. In Proceedings of the Symposium on Software Architectures and Component Technology: The State of the Art in Software Development. Kluwer, 2000.

[PBvdL98] Pohl, K.; Böckle, G.; Linden, F. v. d.: Software Product Line Engineering: Foundations, Principles, and Techniques. Springer, 1998.

[PHOA89] Peterson, L.; Hutchinson, N.; O’Malley, S.; Abbott, M.: RPC in the x-Kernel: Evaluating New Design Techniques. SIGOPS Oper. Syst. Rev., Band 23, Nr. 5, S. 91–101, 1989.

[PRB03] Philippow, I.; Riebisch, M.; Boellert, K.: The Hyper/UML Approach for Feature Based Software Design. In Third International Workshop on Aspect-oriented modeling (AOM’03), 2003.

[Pre97] Prehofer, C.: Feature-Oriented Programming: A Fresh Look at Objects. In Proceedings of European Conference on Object-Oriented Programming (ECOOP’97), S. 419–443, 1997.

[SB00] Svahnberg, M.; Bengtsson, P.: Software Product Lines from Customer to Code, 2000.

[SC05] Stonebraker, M.; Cetintemel, U.: “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), S. 2–11. IEEE Computer Society, Washington, DC, USA, 2005.

[Sim95] Simonyi, C.: The Death of Computer Languages, the Birth of Intentional Programming, 1995.

[Sim96] Simonyi, C.: Intentional Programming: Innovation in the Legacy Age, 1996.

[SvGB01] Svahnberg, M.; Gurp, J. v.; Bosch, J.: On the Notion of Variability in Software Product Lines. In Proceedings of 2nd Working IEEE / IFIP Conference on Software Architecture (WICSA’01), S. 45–54. IEEE Computer Society, 2001.

[TBKC07] Thaker, S.; Batory, D.; Kitchin, D.; Cook, W.: Safe composition of product lines. In Proceedings of the 6th international conference on Generative programming and component engineering (GPCE’07), S. 95–104. ACM Press, New York, NY, USA, 2007.

[TOHJ99] Tarr, P. L.; Ossher, H.; Harrison, W. H.; Jr., S. M. S.: N Degrees of Separation: Multi-Dimensional Separation of Concerns. In Proceedings of International Conference on Software Engineering (ICSE’99), S. 107–119, 1999.

[vDK01] Deursen, A. v.; Klint, P.: Domain-Specific Language Design Requires Feature Descriptions. Journal of Computing and Information Technology, 2001.

[vdL06] Lans, R. F. v. d.: Introduction to SQL: Mastering the Relational Database Language, Fourth Edition/20th Anniversary Edition. Addison-Wesley Professional, 2006.

[Wir71] Wirth, N.: Program Development By Stepwise Refinement. Communications ACM, Band 14, Nr. 4, S. 221–227, 1971.

Appendix A

SQL:2003 Feature Diagrams

Figure A.1: SQL/Foundation Feature Diagram

Feature ID - SQL/Foundation

Semantic Description - Figure A.1 shows the feature diagram for SQL/Foundation. SQL/Foundation can be expressed as a concept feature for the classes of all SQL statements, which can be classified according to their function. An SQL-statement is a string of characters that conforms to the ‘Format and Syntax Rules’ specified in the parts of ISO/IEC 9075 [Mel03b]. The main classes of SQL statements specified in ISO/IEC 9075 are [Mel03a]:

SQL-schema statements (covered in Figure A.2), which may have a persistent effect on the set of schemas,

SQL-data statements (covered in Figure A.11), which consist of table and cursor declarations and operations on them; some of these, the SQL-data change statements (covered in Figure A.13), may have a persistent effect on SQL data and consist of insert, delete, update, merge and their dynamic versions,

SQL-transaction statements (covered in Figure A.16), which control transactions in the database,

SQL-control statements (covered in Figure A.17), which allow SQL to be used in a manner similar to writing a program in a structured programming language,

SQL-connection statements (covered in Figure A.18), which allow establishing a connection with a specific database such that SQL statements may be executed and results returned within the context of a connection,

SQL-session statements (covered in Figure A.19), which allow setting session specific characteristics,

SQL-diagnostics statements (covered in Figure A.21), which provide diagnostic information, and

SQL-dynamic statements, which allow users to build SQL statements dynamically at runtime.

The features Scalar Expression (Figure A.22), Predicate (Figure A.30) and Query Expression (Figure 3.9) contain features which are required by features appearing in other feature diagrams. This is indicated by the ‘Requires’ part of a feature description.

Figure A.2: SQL schema statement Feature Diagram

Feature ID - SQL Schema Statements

Semantic Description - Figure A.2 shows the feature diagram of SQL schema statements. An SQL-schema is a persistent, named collection of object descriptors such as schema, domain, table, view, role, assertion, transliteration, sequence generators, collation, trigger, character set, user-defined ordering, transforms and schema routines.

The semantic descriptions of schema, domain, table, view, sequence generator, trigger and schema routine are given with Figures A.3, 3.3, 3.4, 3.5, A.5, A.6 and 3.6 respectively. The Alter statements alter the definitions of given schema elements by modifying the constituting SQL constructs (Figure A.10).

Requires - An Assertion feature requires the Search Condition feature of the Predicate feature (Figure A.30).

Figure A.3: Schema Definition Feature Diagram

Feature ID - Schema Definition

Semantic Description - Figure A.3 shows the feature diagram for Schema Definition. A Schema Definition is specified by the CREATE SCHEMA statement. A new schema is created with the name specified in the schema-name-clause. A schema-element is a CREATE or GRANT statement that is specified using the normal syntax for such a statement and which is executed by the CREATE SCHEMA statement [Mel03a]. The privileges necessary to execute the schema definition (denoted by the Authorization feature) are implementation-defined.

Refer to Section 11.1 (grammar specification) of SQL/Foundation [Mel03a] for the Schema Definition feature.

Requires - A schema element can be a table (Figure 3.4), view (Figure 3.5), trigger (Figure A.6), domain (Figure 3.3), sequence generator (Figure A.5), a schema routine (Figure 3.6), and character set, collation, transliteration, transform, user-defined-ordering, user-defined-cast, assertion (all Figure A.2). All these features are required by the schema element feature.

Figure A.4: Column Definition Feature Diagram

Feature ID - Column Definition

Semantic Description - Figure A.4 shows the feature diagram for Column Definition. A table is defined on one or more columns and consists of zero or more rows. A column has a name and a declared type. The collate clause can be applied to a column definition to define the collation, or to a character string expression to apply a collation cast [Mel03a].

Refer to Section 4.13 (definitions of columns, fields and attributes) and Section 11.4 (grammar specification) of SQL/Foundation [Mel03a] for Column Definition feature.

Requires - The Column Definition feature optionally requires the Data Type feature of the Scalar Expression feature (Figure A.22).

Figure A.5: Sequence Generator Feature Diagram

Feature ID - Sequence Generator

Semantic Description - Figure A.5 shows the feature diagram for Sequence Generator. A sequence generator is used to generate successive exact numeric values, one at a time. The sequence generator is an SQL:2003-specific feature [EMK+04]. The specification of a sequence generator can optionally include the specification of a data type, a minimum value, a maximum value, a start value, an increment, and a cycle option [Mel03a].

Refer to Section 4.21 (definition of Sequence Generator and related information) and 11.62 (grammar specification) of SQL/Foundation [Mel03a] for Sequence Generator feature.

Requires - The Sequence Generator Datatype Option feature requires the Data Type feature of the Scalar Expression feature (Figure A.22).
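To make the optional parts of a sequence generator concrete, the following Python sketch simulates the start value, increment, minimum value, maximum value and cycle option. It is an illustration only: the class name, the default values, and the restart rule on cycling (restart at the minimum for a positive increment, at the maximum for a negative one) are assumptions and should be checked against Section 4.21 of SQL/Foundation [Mel03a].

```python
class SequenceGenerator:
    """Simulates the optional parts of a sequence generator (assumed semantics)."""

    def __init__(self, start=1, increment=1, minvalue=1,
                 maxvalue=2**63 - 1, cycle=False):
        if increment == 0:
            raise ValueError("increment must be non-zero")
        if minvalue > maxvalue:
            raise ValueError("minvalue must not exceed maxvalue")
        if not minvalue <= start <= maxvalue:
            raise ValueError("start must lie between minvalue and maxvalue")
        self.increment = increment
        self.minvalue = minvalue
        self.maxvalue = maxvalue
        self.cycle = cycle
        self._next = start
        self._exhausted = False

    def next_value(self):
        """Return the current value and advance the generator by one step."""
        if self._exhausted:
            raise RuntimeError("sequence generator is exhausted")
        value = self._next
        candidate = value + self.increment
        if candidate > self.maxvalue or candidate < self.minvalue:
            if self.cycle:
                # Assumed restart rule; see the lead-in.
                self._next = self.minvalue if self.increment > 0 else self.maxvalue
            else:
                self._exhausted = True
        else:
            self._next = candidate
        return value
```

Each constructor parameter corresponds to one optional subfeature in Figure A.5, so a product that omits, for example, the cycle option would simply never set `cycle=True`.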

Figure A.6: Trigger Definition Feature Diagram

Feature ID - Trigger Definition

Semantic Description - Figure A.6 shows the feature diagram for Trigger Definition. A trigger event takes place as a result of executing some SQL-data change statement. A trigger is a specification for a given action (known as a triggered action) to take place every time a given operation (known as a trigger event) takes place on a given object (known as the subject table) [Mel03a].

Refer to Section 4.38 (definition and general description of triggers) and 11.39 (grammar specification) of SQL/Foundation [Mel03a] for Trigger Definition feature.

Requires - Since the trigger arises as a consequence of executing SQL-data change statements and are permitted to include these statements, the feature SQL Data Change Statements (Figure A.13) and all its subfeatures are required by the trigger. Appendix A. SQL:2003 Feature Diagrams 67
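The relationship between trigger event, triggered action and subject table can be tried out with a small example. The sketch below uses Python's sqlite3 module; SQLite's trigger dialect is a variation of the SQL:2003 grammar, and the table and trigger names (account, audit_log, log_balance_change) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Trigger event: UPDATE on the subject table "account".
    -- Triggered action: an INSERT, itself an SQL-data change statement.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
    UPDATE account SET balance = 150 WHERE id = 1;
""")
# The UPDATE fired the trigger once, so exactly one audit row exists.
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
conn.close()
```

The triggered action is itself a data change statement, which illustrates why the trigger requires the SQL Data Change Statements feature.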

Figure A.7: User-Defined Type Definition Feature Diagram

Feature ID - User-Defined Type Definition

Semantic Description - Figure A.7 shows the feature diagram for User-Defined Type Definition. A user can create custom types using the CREATE TYPE statement. The definition of a user-defined type may include a method specification list consisting of one or more method specifications [Mel03a].

Refer to Section 4.7 (definition of user-defined types and related information) and Section 11.41 (grammar specification) of SQL/Foundation [Mel03a] for the User-Defined Type Definition feature.

Requires - The Distinct Type feature requires the Predefined Types feature of the Data Type feature (Figure A.23), while the Attribute Definition feature requires the Data Type feature (Figure A.23).

Figure A.8: Grant Privilege Feature Diagram

Feature ID - Grant Privilege

Semantic Description - Figure A.8 shows the feature diagram for Grant Privilege. The Grant Privilege statement defines privileges. Privileges can be granted by the current user or the currently defined role, and the grantee can be public or a specific user with authorization [Mel03a]. The granted privileges apply to objects defined in the current schema. The Privilege feature is shown in Figure A.9.

Refer to Section 12.2 (grammar specification) of SQL/Foundation [Mel03a] for the Grant Privilege feature.

Figure A.9: Privilege Feature Diagram

Feature ID - Privilege

Semantic Description - Figure A.9 shows the feature diagram for Privilege. A privilege authorizes a given category of action to be performed on an object, which can be a specified base table, a view, a column, a domain, a character set, a collation, a transliteration, a user-defined type, a trigger, an SQL-invoked routine, or a sequence generator [Mel03a].

Refer to Section 4.34.2 (definition of privileges and related information) and Section 12.3 (grammar specification) of SQL/Foundation [Mel03a] for the Privilege feature.

Figure A.10: Alter Statements Feature Diagram

Feature ID - Alter statements

Semantic Description - Figure A.10 shows the feature diagram for Alter Statements. The Alter statements are applicable to various schema elements such as table, domain, type, transform, routine, and sequence generator. They change the schema element's definition by adding, removing or modifying constituents. For the feature diagrams of Table Definition, Domain Definition, User-Defined Type Definition, Schema Routine, Sequence Generator and Transform Definition refer to Figures 3.4, 3.3, A.7, 3.6, A.5, and A.2 respectively.

Refer to the grammar specification in SQL/Foundation [Mel03a]: Section 11.10 for the Alter Table feature, Section 11.25 for Alter Domain, Section 11.43 for Alter Type, Section 11.51 for Alter Routine, Section 11.58 for Alter Transform and Section 11.63 for Alter Sequence Generator.

Requires - The features Table Definition, Domain Definition, User-Defined Type Definition, Schema Routine, Sequence Generator and Transform Definition are required by their counterpart alter statements.
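As a small illustration of an alter statement adding a constituent to an existing schema element, the following sketch runs an ALTER TABLE through Python's sqlite3 module. SQLite implements only a subset of the Alter Table grammar, and the table and column names (person, email) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table definition must exist before it can be altered, which mirrors
# the requires relation between Alter Table and Table Definition.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
conn.close()
```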

Figure A.11: SQL Data Statements Feature Diagram

Feature ID - SQL Data statements

Semantic Description - Figure A.11 shows the feature diagram for SQL Data Statements. SQL-data statements query and modify database tables, cursors and locators. Refer to Section 4.33.2.2 of SQL/Foundation [Mel03a] for the various SQL Data statements.

Figure A.12: Cursor Feature Diagram

Feature ID - Cursor

Semantic Description - Figure A.12 shows the feature diagram for Cursor. A cursor allows operating on rows of a table one at a time. Various Cursor characteristics are explained in Section 4.32 of SQL/Foundation [Mel03a]. For the grammar specification refer to Section 14.1.

Requires - The Cursor Specification feature requires the Query Expression feature (Figure 3.9).
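Row-at-a-time processing over the result of a query expression can be sketched as follows. The example uses the cursor object of Python's sqlite3 module, which is only an analogy to an SQL:2003 cursor declared with DECLARE CURSOR; the table emp and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 50), ("Bob", 40), ("Cy", 60)],
)
# The cursor is defined over a query expression.
cursor = conn.execute("SELECT name FROM emp ORDER BY salary")
fetched = []
while True:
    row = cursor.fetchone()  # advance the cursor by one row
    if row is None:          # no more rows
        break
    fetched.append(row[0])
conn.close()
```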

Figure A.13: SQL Data Change Statements Feature Diagram

Feature ID - SQL Data Change statements

Semantic Description - Figure A.13 shows the feature diagram for SQL Data Change Statements. SQL data change statements in SQL-data statements are insert, delete, merge, update and their dynamic counterparts. Refer to Section 4.33.2.3 of SQL/Foundation [Mel03a] for the various SQL Data Change statements.

Figure A.14: Delete statement Feature Diagram

Feature ID - Delete statement

Semantic Description - Figure A.14 shows the feature diagram for Delete Statement. There are two forms of DELETE statement:

The positioned DELETE form specifies that one or more rows corresponding to the current cursor position are to be deleted. The searched DELETE form is used to delete one or more rows, optionally determined by a search condition. Refer to Sections 14.6 and 14.7 of SQL/Foundation [Mel03a] for these statements.

For a description of the select statement and other clauses refer to Query Specification (Figure 3.10) and Table Expression (Figure 3.11).

Requires - The Positioned feature requires the Cursor feature (Figure A.12), and the Searched feature requires the Search Condition feature of Predicate (Figure A.30).

Figure A.15: Update statement Feature Diagram

Feature ID - Update statement

Semantic Description - Figure A.15 shows the feature diagram for Update Statement. There are two forms of UPDATE statement. The positioned UPDATE form (UPositioned feature) specifies that one or more rows corresponding to the current cursor position are to be updated. The searched UPDATE form (USearched feature) is used to update one or more rows, optionally determined by a search condition. Refer to Sections 14.10 and 14.11 of SQL/Foundation [Mel03a] for these statements.

Requires - The UPositioned feature requires the Cursor feature (Figure A.12), and the USearched feature requires the Search Condition feature of Predicate (Figure A.30).
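The requires relations stated for the Delete and Update statements can be checked mechanically against a feature selection. The following Python sketch is an illustration only: the dictionary REQUIRES and the function missing_requirements are hypothetical names, and the dictionary holds just the four constraints given in the two Requires paragraphs above.

```python
# Requires constraints of the Delete and Update statement features.
REQUIRES = {
    "Positioned": {"Cursor"},
    "Searched": {"Search Condition"},
    "UPositioned": {"Cursor"},
    "USearched": {"Search Condition"},
}


def missing_requirements(selected):
    """Map each selected feature to the required features the selection lacks."""
    chosen = set(selected)
    missing = {}
    for feature in chosen:
        absent = REQUIRES.get(feature, set()) - chosen
        if absent:
            missing[feature] = sorted(absent)
    return missing
```

A selection containing Positioned, USearched and Search Condition is reported as invalid because Positioned lacks Cursor, whereas Searched together with Search Condition passes the check.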

Figure A.16: SQL Transaction statements Feature Diagram

Feature ID - SQL Transaction Statements

Semantic Description - Figure A.16 shows the feature diagram for SQL Transaction Statements. SQL-transaction statements control transactions in a database. An SQL-transaction is a sequence of executions of SQL-statements. An SQL-transaction may be partially rolled back by using a savepoint. Refer to Section 4.33.2.4 of SQL/Foundation [Mel03a] for these SQL Transaction statements. For general information about SQL-transactions refer to Section 4.35.
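A partial rollback to a savepoint can be sketched as follows. The example uses Python's sqlite3 module, whose transaction statements differ in detail from the SQL:2003 grammar; the table t and the savepoint name sp1 are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so that
# BEGIN, SAVEPOINT and COMMIT are issued explicitly by this script.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO SAVEPOINT sp1")  # undo only the work done after sp1
conn.execute("COMMIT")
rows = conn.execute("SELECT x FROM t").fetchall()
conn.close()
```

Only the first insert survives the commit, since the second one was made after the savepoint and rolled back.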

Figure A.17: SQL Control statements Feature Diagram

Feature ID - SQL Control Statements

Semantic Description - Figure A.17 shows the feature diagram for SQL Control Statements. The CALL statement is used for invoking an SQL-invoked routine. The RETURN statement is used to return a value from an SQL-invoked function. Refer to Section 4.33.2.6 of SQL/Foundation [Mel03a] for SQL Control statements.

Figure A.18: SQL Connection statements Feature Diagram

Feature ID - SQL Connection Statements

Semantic Description - Figure A.18 shows the feature diagram for SQL Connection Statements. Refer to Section 4.33.2.5 of SQL/Foundation [Mel03a] for SQL Connection statements.

Figure A.19: SQL Session statements Feature Diagram

Feature ID - SQL Session Statements

Semantic Description - Figure A.19 shows the feature diagram for SQL Session Statements. An SQL-session spans the execution of a sequence of consecutive SQL-statements invoked either by a single user from a single SQL-agent or by the direct invocation of SQL [Mel03a]. Refer to Section 4.33.2.7 of SQL/Foundation [Mel03a] for SQL Session statements.

Figure A.20: SQL Dynamic Statements Feature Diagram

Feature ID - SQL Dynamic statements

Semantic Description - Figure A.20 shows the feature diagram for SQL Dynamic Statements. SQL Dynamic statements support the preparation and execution of dynamically generated SQL-statements. Refer to Section 4.33.2.9 of SQL/Foundation [Mel03a] for SQL Dynamic statements.

Figure A.21: SQL Diagnostic Statements Feature Diagram

Feature ID - SQL Diagnostic Statement

Semantic Description - Figure A.21 shows the feature diagram for SQL Diagnostic Statement. SQL Diagnostics statements get diagnostics information from the diagnostics area. A diagnostics area is a place in which conditions are recorded as they arise by executing various statements [Mel03a]. Statement Information describes the overall result of the SQL statement, in particular the number of rows that it has modified and the number of exceptions that result, while Condition Information describes individual exceptions. Refer to Section 4.33.2.8 of SQL/Foundation [Mel03a] for SQL Diagnostics statements.

Figure A.22: Scalar Expressions Feature Diagram

Feature ID - Scalar Expressions

Semantic Description - Figure A.22 shows the feature diagram for Scalar Expressions. Scalar expressions contain various value expressions, functions and data type definitions. Refer to Chapter 6 of SQL/Foundation [Mel03a] for the Scalar Expressions feature.

Figure A.23: Data Type Feature Diagram

Feature ID - Data Type

Semantic Description - Figure A.23 shows the feature diagram for Data Type. SQL:2003 supports three kinds of data types: predefined data types, constructed types, and user-defined types. Refer to Section 6.1 of SQL/Foundation [Mel03a] for the Data Type feature. For general information about the various data types refer to Chapter 4.

Requires - The Data Type feature requires the User-Defined Type Definition feature (Figure A.7).

Figure A.24: Window Function Feature Diagram

Feature ID - Window Function

Semantic Description - Figure A.24 shows the feature diagram for Window Function. Window functions are functions that can be executed on a set of rows (a window). Refer to Section 4.15.3 (definition and general information about window functions) and Section 6.10 (grammar specification) of SQL/Foundation [Mel03a] for window functions.

Requires - A Window Function may only appear in the Select List of a Query Specification or of the Select Statement: Single Row feature of SQL Data Statements; consequently, the window function requires these features (Figures 3.10 and A.11 respectively). The Ranking Aggregate functions require the Window Clause feature (Figure A.29).

Figure A.25: Function Specification Feature Diagram

Feature ID - Function Specification

Semantic Description - Figure A.25 shows the feature diagram for Function Specification. Refer to Section 6.9 of SQL/Foundation [Mel03a] for the Function Specification feature.

Requires - Both the Aggregate Function and Grouping Operation features require the Select List feature of a Query Specification, the Select Statement: Single Row feature of SQL Data Statements, and Cursor (Figures 3.10, A.11 and A.12 respectively).

Figure A.26: Search Cycle Clause Feature Diagram

Feature ID - Search Cycle Clause

Semantic Description - Figure A.26 shows the feature diagram for Search Cycle Clause. A search or cycle clause is used to specify ordering and cycle detection information during the execution of recursive query expressions. Refer to Section 7.14 of SQL/Foundation [Mel03a] for the Search Cycle Clause feature.

Figure A.27: Table Reference Feature Diagram

Feature ID - Table Reference

Semantic Description - Figure A.27 shows the feature diagram for Table Reference. A table reference is used to refer to different types of tables. Refer to Section 7.6 (grammar specification) of SQL/Foundation [Mel03a] for the Table Reference feature. For general information about the various types of tables refer to Section 4.14.

Requires - The Table Subquery feature of the Derived Table feature requires the Query Expression feature (Figure 3.9).

Figure A.28: Group By Clause Feature Diagram

Feature ID - Group By Clause

Semantic Description - Figure A.28 shows the feature diagram for Group By Clause. The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows. Refer to Section 7.9 of SQL/Foundation [Mel03a] for the Group By Clause feature.
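The grouping of rows into an intermediate result table can be seen in a minimal query. The sketch below uses Python's sqlite3 module and covers only the ordinary grouping form, not the full Group By Clause grammar of SQL:2003; the table sales and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 5), ("north", 7), ("south", 1)],
)
# Rows are grouped by region; SUM is evaluated once per group.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```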

Figure A.29: Window Clause Feature Diagram

Feature ID - Window Clause

Semantic Description - Figure A.29 shows the feature diagram for Window Clause. The WINDOW clause is used in a SELECT statement (Query Specification) to define all or part of a window (set of rows) for use with window functions. Refer to Section 7.11 (grammar specification) of SQL/Foundation [Mel03a] for the Window Clause feature.
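How a WINDOW clause defines a named window that a ranking function then uses can be sketched with a short query. The example runs through Python's sqlite3 module and assumes an underlying SQLite library of version 3.25 or later, which is when window functions were introduced there; the table emp, the window name w and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "dev", 50), ("Bob", "dev", 60), ("Cy", "ops", 40), ("Di", "dev", 50)],
)
# The WINDOW clause names the window w; RANK() refers to it in the select list.
ranked = conn.execute(
    """
    SELECT name, dept, salary, RANK() OVER w AS rnk
    FROM emp
    WINDOW w AS (PARTITION BY dept ORDER BY salary DESC)
    ORDER BY dept, rnk, name
    """
).fetchall()
conn.close()
```

The ranking function appears in the select list and depends on the window definition, which matches the requires relation between ranking functions and the Window Clause feature.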

Figure A.30: Predicate Feature Diagram

Feature ID - Predicate

Semantic Description - Figure A.30 shows the feature diagram for Predicate. Predicates are used to specify a condition that is evaluated to return a boolean value. Refer to Chapter 8 of SQL/Foundation [Mel03a] for the grammar specifications of the various predicates.

This completes the feature-oriented decomposition of SQL:2003 (SQL/Foundation). The extension packages of SQL:2003 can be decomposed in a similar way. Refer to Appendix B for features of these packages.

Appendix B

Taxonomy of SQL Non-Framework Optional Features

The SQL:2003 standard provides mandatory and optional features for conformance by vendors (cf. Section 6.3.7 of [Mel03b]). For each SQL standard, they are listed in the annex “SQL feature taxonomy”. Only SQL/Foundation [Mel03a] contains mandatory features and their definitions (cf. Annex F, Table 35 of [Mel03a]). The SQL feature taxonomy in all other standards contains only optional features without definitions. The definitions of mandatory features, given in terms of subclauses of the grammar specification, do not always match our features, nor does their mandatory or optional nature. No hierarchical structure is evident in the definitions of mandatory features or in the nomenclature of mandatory and optional features. In this sense these features differ from the features obtained in the feature decomposition of SQL:2003. Table B.1 lists the number of mandatory and optional features in the various standards. The large number of features listed in Table B.1 indicates the complexity of implementing a product line architecture for SQL:2003. The remaining tables in this appendix list the optional features of the extension packages.

| Part of SQL:2003 Specification | No. of Mandatory Features | No. of Optional Features |
|---|---|---|
| SQL Foundation | 164 | 256 |
| SQL PSM | - | 30 |
| SQL MED | - | 25 |
| SQL OLB | - | 9 |
| SQL JRT | - | 17 |
| SQL XML | - | 53 |

Table B.1: Number of Features listed in the SQL:2003 Specification Draft
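To make the size of the feature space concrete, the counts of Table B.1 can be summed. The short Python sketch below is an illustration only; the name FEATURE_COUNTS is hypothetical, and treating the "-" entries as zero mandatory features is an assumption based on the statement that only SQL/Foundation contains mandatory features.

```python
# (mandatory, optional) feature counts per part, copied from Table B.1.
# Assumption: "-" in the table is read as 0 mandatory features.
FEATURE_COUNTS = {
    "SQL Foundation": (164, 256),
    "SQL PSM": (0, 30),
    "SQL MED": (0, 25),
    "SQL OLB": (0, 9),
    "SQL JRT": (0, 17),
    "SQL XML": (0, 53),
}

total_mandatory = sum(mandatory for mandatory, _ in FEATURE_COUNTS.values())
total_optional = sum(optional for _, optional in FEATURE_COUNTS.values())
total_features = total_mandatory + total_optional
```

Under this reading the parts together list 164 mandatory and 390 optional features, 554 in total.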

B.1 Java Routines and Types Using the Java Programming Language (SQL/JRT)

ISO/IEC 9075-13 [Mel03d] defines extensions to Database Language SQL to enable invocations of static methods written in the Java programming language as SQL-invoked routines, and to use classes defined in the Java programming language as SQL structured types. (cf. Table B.2)

| Feature ID | Feature Name |
|---|---|
| J511 | Commands |
| J521 | JDBC data types |
| J531 | Deployment |
| J541 | SERIALIZABLE |
| J551 | SQLDATA |
| J561 | JAR privileges |
| J571 | NEW operator |
| J581 | Output parameters |
| J611 | References |
| J621 | external Java routines |
| J622 | external Java types |
| J631 | Java signatures |
| J641 | Static fields |
| J651 | SQL/JRT Information Schema |
| J652 | SQL/JRT Usage tables |

Table B.2: SQL/JRT features

B.2 SQL Object Language Bindings (SQL/OLB)

ISO/IEC 9075-10 [Mel03c] defines facilities for the embedding of SQL statements in Java programs. (cf. Table B.3)

| Feature ID | Feature Name |
|---|---|
| J001 | Embedded Java |
| J002 | JResultSetIterator access to JDBC ResultSet |
| J003 | Execution control |
| J004 | Batch update |
| J005 | Call statement |
| J006 | Assignment Function statement |
| J007 | Compound statement |
| J008 | Datalinks via SQL language |
| J009 | Multiple Open ResultSets |

Table B.3: SQL/OLB features

B.3 SQL Persistent Stored Modules (SQL/PSM)

ISO/IEC 9075-4 [Mel03g] makes SQL computationally complete by specifying the syntax and semantics of additional SQL-statements. (cf. Table B.4)

| Feature ID | Feature Name |
|---|---|
| P001 | Stored modules |
| P001-01 | SQL-server module definition |
| P001-02 | drop module statement |
| P002 | Computational completeness |
| P002-01 | compound statement |
| P002-02 | handler declaration |
| P002-03 | condition declaration |
| P002-04 | SQL variable declaration |
| P002-05 | assignment statement |
| P002-06 | case statement |
| P002-09 | leave statement |
| P002-10 | loop statement |
| P002-11 | repeat statement |
| P002-12 | while statement |
| P002-13 | for statement |
| P002-14 | signal statement |
| P002-15 | resignal statement |
| P002-16 | control statement |
| P003 | Information Schema views |
| P003-01 | MODULES view |
| P003-02 | MODULE_TABLE_USAGE view |
| P003-03 | MODULE_COLUMN_USAGE view |
| P003-04 | MODULE_PRIVILEGES view |
| P004 | Extended CASE statement |
| P005 | Qualified SQL variable references |
| P006 | Multiple assignment |
| P007 | Enhanced diagnostics management |
| P008 | Comma-separated predicates in a CASE statement |

Table B.4: SQL/PSM features

B.4 SQL Management of External Data (SQL/MED)

ISO/IEC 9075-9 [Mel03f] defines extensions to Database Language SQL to support management of external data through the use of foreign tables and datalink data types. (cf. Table B.5)

B.5 SQL XML-Related Specifications (SQL/XML)

ISO/IEC 9075-14 [Mel03e] defines extensions to database language SQL to enable creation and manipulation of XML documents. (cf. Table B.6)

| Feature ID | Feature Name |
|---|---|
| M001 | Datalinks |
| M002 | Datalinks via SQL/CLI |
| M003 | Datalinks via Embedded SQL |
| M004 | Foreign data support |
| M005 | Foreign schema support |
| M006 | GetSQLString routine |
| M007 | TransmitRequest |
| M009 | GetOpts and GetStatistics routines |
| M010 | Foreign data wrapper support |
| M011 | Datalinks via Ada |
| M012 | Datalinks via C |
| M013 | Datalinks via COBOL |
| M014 | Datalinks via Fortran |
| M015 | Datalinks via M |
| M016 | Datalinks via Pascal |
| M017 | Datalinks via PL/I |
| M018 | Foreign-data wrapper interface routines in Ada |
| M019 | Foreign-data wrapper interface routines in C |
| M020 | Foreign-data wrapper interface routines in COBOL |
| M021 | Foreign-data wrapper interface routines in Fortran |
| M022 | Foreign-data wrapper interface routines in MUMPS |
| M023 | Foreign-data wrapper interface routines in Pascal |
| M024 | Foreign-data wrapper interface routines in PL/I |
| M030 | SQL-server foreign data support |
| M031 | Foreign data wrapper general routines |

Table B.5: SQL/MED features

| Feature ID | Feature Name |
|---|---|
| X010 | XML type |
| X011 | Arrays of XML type |
| X012 | Multisets of XML type |
| X013 | Distinct types of XML |
| X014 | Attributes of XML type |
| X015 | Fields of XML type |
| X016 | Persistent XML values |
| X020 | XML Concatenation |
| X031 | XMLElement |
| X032 | XMLForest |
| X033 | XMLRoot |
| X034 | XMLAgg |
| X035 | XMLAgg: ORDER BY option |
| X041 | Basic table mapping: null absent |
| X042 | Basic table mapping: null as nil |
| X043 | Basic table mapping: table as forest |
| X044 | Basic table mapping: table as element |
| X045 | Basic table mapping: with target namespace |
| X046 | Basic table mapping: data mapping |
| X047 | Basic table mapping: metadata mapping |
| X048 | Basic table mapping: base64 encoding of binary strings |
| X049 | Basic table mapping: hex encoding of binary strings |
| X051 | Advanced table mapping: null absent |
| X052 | Advanced table mapping: null as nil |
| X053 | Advanced table mapping: table as forest |
| X054 | Advanced table mapping: table as element |
| X055 | Advanced table mapping: with target namespace |
| X056 | Advanced table mapping: data mapping |
| X057 | Advanced table mapping: metadata mapping |
| X058 | Advanced table mapping: base64 encoding of binary strings |
| X059 | Advanced table mapping: hex encoding of binary strings |
| X060 | XMLParse: CONTENT option |
| X061 | XMLParse: DOCUMENT option |
| X062 | XMLParse: explicit WHITESPACE option |
| X070 | XMLSerialize: CONTENT option |
| X071 | XMLSerialize: DOCUMENT option |
| X080 | Namespaces in XML publishing |
| X081 | Query-level XML namespace declarations |
| X082 | XML namespace declarations in DML |
| X083 | XML namespace declarations in DDL |
| X084 | XML namespace declarations in compound statements |
| X090 | XML document predicate |
| X100 | Host language support for XML: CONTENT option |
| X101 | Host language support for XML: DOCUMENT option |
| X110 | Host language support for XML: VARCHAR mapping |
| X111 | Host language support for XML: CLOB mapping |
| X120 | XML parameters in SQL routines |
| X121 | XML parameters in external routines |
| X131 | Query-level XMLBINARY clause |
| X132 | XMLBINARY clause in DML |
| X133 | XMLBINARY clause in DDL |
| X134 | XMLBINARY clause in compound statements |
| X135 | XMLBINARY clause in subqueries |

Table B.6: SQL/XML features

Appendix C

SQL Platform Support

| SQL Feature | DB2 v8.1 | MySQL v4.8 | Oracle v10g | PostgreSQL v7.2 | SQL Server 200 | Feature ID |
|---|---|---|---|---|---|---|
| Role Definition | NS | NS | SWV | NS | NS | Figure A.2 |
| Schema Definition | SWL | NS | SWV | NS | NS | Figure A.3 |
| Domain Definition | NS | NS | NS | NS | NS | Figure 3.3 |
| Table Definition | SWV | SWV | SWV | SWV | SWV | Figure 3.4 |
| View Definition | SWV | NS | SWV | SWV | SWV | Figure 3.5 |
| Trigger Definition | SWV | NS | SWV | SWV | SWV | Figure A.6 |
| Schema Routine - Functions | SWV | SWL | SWV | SWL | SWV | Figure 3.6 |
| Schema Routine - Methods | SWV | NS | NS | NS | NS | Figure 3.6 |
| Schema Routine - Procedures | S | NS | S | NS | S | Figure 3.6 |
| User Defined Type Definition | SWV | NS | SWL | SWV | NS | Figure A.7 |
| Grant Privilege | SWV | SWV | SWV | SWV | SWV | Figure A.8 |
| Alter Routine (schema procedure) | SWV | SWL | SWV | SWL | SWV | Figure A.10 |
| Alter Routine (SQL function) | SWV | SWL | SWV | SWL | SWV | Figure A.10 |
| Alter Routine (SQL method) | SWV | NS | NS | NS | NS | Figure A.10 |
| Alter Table | SWV | SWV | SWV | SWV | SWV | Figure A.10 |
| Alter Type | SWV | NS | SWV | NS | NS | Figure A.10 |
| Alter Domain | NS | NS | NS | NS | NS | Figure A.10 |
| Cursor | SWL | NS | SWL | SWL | SWL | Figure A.12 |

Table C.1: SQL Platform Support -1

Tables C.1 and C.2 list various features and the level at which different platforms support them (adapted from [Kli04]).

| SQL Feature | DB2 v8.1 | MySQL v4.8 | Oracle v10g | PostgreSQL v7.2 | SQL Server 200 | Feature ID |
|---|---|---|---|---|---|---|
| INSERT statement | SWV | SWV | SWV | S | SWV | Figure 3.7 |
| MERGE statement | S | NS | S | NS | NS | Figure 3.8 |
| DELETE statement | SWV | SWV | SWV | S | SWL | Figure A.14 |
| Commit statement | S | SWV | SWV | SWV | SWV | Figure A.16 |
| Savepoint | SWV | NS | S | NS | SWL | Figure A.16 |
| Call statement | SWV | NS | SWV | NS | NS | Figure A.17 |
| Return statement | SWV | NS | S | SWL | S | Figure A.17 |
| Disconnect statement | SWL | NS | SWL | NS | SWL | Figure A.18 |
| EXCEPT in Query Expression | SWL | NS | SWL | SWL | NS | Figure 3.9 |
| INTERSECT in Query Expression | SWL | NS | SWL | SWL | NS | Figure 3.9 |
| Select statement (Query Specification and related features) | SWV (ANSI join supported) | SWV (ANSI join partially supported) | SWV (ANSI join supported) | SWV (ANSI join partially supported) | SWV (ANSI join supported) | Figure 3.10, Figure 3.11, Figure A.27 |

Table C.2: SQL Platform Support -2

Meaning of level of support:

Supported (S) - The platform supports the given feature in the SQL:2003 standard.

Supported, with variations (SWV) - The platform supports the given feature with different syntax and code execution than stated in the SQL:2003 standard.

Supported, with limitations (SWL) - The given feature is supported only to some extent as specified by the SQL:2003 standard.

Not supported (NS) - The platform does not support the given feature according to the SQL:2003 standard.
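The support levels lend themselves to simple queries over the tables, for example to find the platforms on which a feature is available at all. The following Python sketch copies four rows from Table C.1; the names PLATFORMS, SUPPORT and platforms_supporting are hypothetical, and counting every level other than NS as support is an assumption about how the levels would be used.

```python
PLATFORMS = ["DB2", "MySQL", "Oracle", "PostgreSQL", "SQL Server"]

# A few rows copied from Table C.1; values follow the PLATFORMS order.
SUPPORT = {
    "Role Definition": ["NS", "NS", "SWV", "NS", "NS"],
    "Domain Definition": ["NS", "NS", "NS", "NS", "NS"],
    "Table Definition": ["SWV", "SWV", "SWV", "SWV", "SWV"],
    "Trigger Definition": ["SWV", "NS", "SWV", "SWV", "SWV"],
}


def platforms_supporting(feature):
    """Return the platforms whose support level for the feature is not NS."""
    levels = SUPPORT[feature]
    return [p for p, level in zip(PLATFORMS, levels) if level != "NS"]
```

For these rows, Table Definition is available on all five platforms, Role Definition only on Oracle, and Domain Definition on none.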

Declaration of Independent Work

I hereby declare that I have prepared this thesis independently and using only permitted aids.

Magdeburg, October 24, 2007 Sagar Sunkle