Journal of Universal Computer Science, vol. 5, no. 4 (1999), 243-286

submitted: 12/4/99, accepted: 23/4/99, appeared: 28/4/99  Springer Pub. Co.

Automatic Data Restructuring

Seymour Ginsburg

घComputer Science Department, University of Southern California,

Los Angeles, California, 90089

e-mail: ginsburg@p ollux.usc.eduङ

Nan C. Shu

घIBM Los Angeles Scienti c Center,

Los Angeles,Californiaङ

Dan A. Simovici

घUniversity of Massachusetts at Boston,

Department of Mathematics and Computer Science,

Boston, Massachusetts 02125

e-mail: [email protected]

Abstract: Data restructuring is often an integral but non-trivial part of informa-

tion pro cessing, esp ecially when the data structures are fairly complicated. This pap er

describ es the underpinnings of a program, called the Restructurer, that relieves the

user of the \thinking and co ding" pro cess normally asso ciated with writing pro cedural

programs for data restructuring. The pro cess is accomplished by the Restructurer in

two stages. In the rst, the di erences in the input and output data structures are

recognized and the applicabilityofvarious transformation rules analyzed. The result

is a plan for mapping the sp eci ed input to the desired output. In the second stage,

the plan is executed using emb edded knowledge ab out b oth the target language and

run-time eciency considerations.

The emphasis of this pap er is on the planning stage. The restructuring op erations

and the mapping strategies are informally describ ed and explained with mathematical

formalism. The notion of solution of a set of instantiated forms with resp ect to an

output form is then intro duced. Finally,itisshown that such a solution exists if and

only if the Restructurer pro duces one.

Key Words: non- rst normal form , data restructuring, instantiated forms,

hierarchical structures, solution of a set of instantiated forms.

Category: H.2, E.1

1 Intro duction

The rapid decline of computing costs and the desire for pro ductivity improve-

ment has created increased demands for computerized information pro cessing by

end users [22]. The challenge for computer professionals is to bring computing

capabilities - usefully and simply - to p eople without sp ecial computer training.

The emergence of visual programming [36, 8, 24, 38, 6] and the resurgence of

automatic programming [5] represent two signi cant approaches to meet this

challenge. FORMAL [34], a visual-directed and forms-oriented application de-

velopment system, is a prototyp e that emb o dies the spirits of b oth. The visual

programming asp ects of FORMAL have b een describ ed elsewhere [34,36, 37].

The purp ose of this pap er is to present the automatic asp ects of data restruc-

244 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

1

turing in FORMAL and a theoretical foundation for it. However, the abstract

considerations presented here are not dep endent on the particular implementa-

tion solutions adopted in FORMAL.

Data restructuring is often done in an ad ho c manner. What sets our metho d

apart from other data restructuring e orts is the automatic mechanism's ability

to takeover the \thinking and co ding" pro cess normally asso ciated with writ-

ing algorithms for data restructuring. For convenience this mechanism is called

\the Automatic Restructurer" घor simply the Restructurerङ. The most imp ortant

pieces of information given to the Automatic Restructurer are an unambiguous

description of each input and an unambiguous structural description of the de-

sired output. If there are two or more inputs, then the match elds used to tie the

inputs together must also b e given. Based on these descriptions, an executable

program घusing the restructuring op erations originally prop osed in CONVERT

[31]ङ is generated byatwo-stage pro cess. In the rst stage, the di erences b e-

tween the input and output data structures are recognized and the applicability

of various transformational rules analyzed. The result is a plan for mapping the

sp eci ed inputघsङ to the desired output. In the second stage, runtime ecien-

cies are taken into consideration, and an executable program is implemented to

carry out the plan. The result is a reasonably ecient program for the restruc-

turing task at hand. A ma jor contribution of this pap er is a detailed explanation

घexpressed b oth formally and informallyङ of the underpinnings of the automatic

restructuring mechanism. Since the eciency considerations and implementation

details are not the primary interests of this pap er, the concerns of the second

घi.e., construction or co dingङ stage are included only to complete the exp osition.

The emphasis is on the mechanism underlying the rst घi.e., planningङ stage,

namely, on the derivation of a sequence of restructuring op erations aimed at

pro ducing a user-sp eci ed output from the given input.

The pap er itself is organized into eight sections, the rst b eing this intro duc-

tion. Section 2 explains the notion of \data restructuring." Section 3 presents the

underlying data mo del. Section 4 discusses the restructuring problem. Section 5

de nes four typ es of basic op erations designed sp eci cally for data restructuring.

Section 6 considers the strategies for mapping sp eci ed inputs to desired out-

puts, using sequences of these basic op erations. Section 7 discusses the eciency

considerations. Section 8 presents a summary of the results obtained.

2 Data Restructuring

As mentioned in the Intro duction, the purp ose of this pap er is to present a

metho d for automatically p erforming data restructuring tasks and provide a

theoretical basis for it. This section explains and elab orates on the meaning of

\data restructuring," why it is an imp ortant part of information pro cessing, and

what is the signi cance of an automatic programming approach.

Historically, reorganization is de ned as \changing some asp ect of

the way in which a database is arranged logically and/or physically" [39] - a

1

The facilities provided byFORMAL are a sup erset of those describ ed in this pap er.

Here, wecho ose to discuss only those directly relevant to the restructuring of the

data structures. Capabilities suchaschanging comp onent names, sp ecifying criteria

for selections, case-by-case assignments, arithmetic and string op erations, handling

of exceptions, etc., are not germane to structural transformations, and are therefore not included in the discussion.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 245

generic term that covers b oth data restructuring घchanging logical structuresङ

and reformating घchanging physical structuresङ of database systems. With that

p ersp ective, data restructuring fell primarily in the domain of data-pro cessing

op erations p ersonnel and database researchers. Our view of data restructur-

ing has a much broader scop e. Here, the term data restructuring is to mean

a sequence of op erations aimed at pro ducing an output with structure di erent

from that of the corresp onding input. Data restructuring may b e p erformed for a

variety of reasons, for example, as a desirable database function घto improve p er-

formance or storage utilizationङ, or as a useful application घto supp ort decision

making or data pro cessingङ.

घPERSONङ

ENO DNO NAME PHONE JC घKIDSङ घSCHOOLङ SEX LOC

KNAME AGE SNAME घENROLLङ

YEARIN YEAROUT

05 D1 SMITH 5555 05 JOHN 02 PRINCETON 1966 1970 F SF

MARY 04 1972 1976

07 D1 JONES 5555 05 DICK 07 SJS 1960 1965 F SF

JANE 04

BERKEPRY 1965 1969

11 D1 ENGEL 2568 05 RITA 04 UCLA 1970 1974 F LA

12 D1 DURAN 7610 05 MARY 08 M SF

BOB 10

19 D1 HOPE 3150 07 MARYLOU 10 M SJ

MARYANN 07

02 D2 GREEN 1111 01 DAVID 04 M SF

20 D2 CHU 3348 10 CHARLIE 06 HONGKONG 1962 1966 F LA

CHRIS 09

BONNIE 04 STANFORD 1967 1969

1972 1975

21 D2 DWAN 3535 12 USC 1970 1974 F SJ

43 D2 JACOB 4643 09 PAULA 07 BERKEPRY 1962 1966 M SJ

Figure 1: घPERSONङ le

Data restructuring plays an imp ortant role in information pro cessing applica-

tions. When data is extracted from sources or new elds are created and placed

in the output, the structure of the resulting output seldom resembles that of

the input. To illustrate, consider the p ersonnel information of a given company

shown in the घPERSONङ le describ ed in Figure 1. Supp ose a Christmas party

is b eing planned. All employees' children will receive gifts at the party. Di erent

gifts are planned for each age group. The organizer of the party wishes to list,

for each lo cation घLOCङ, by age of the children घAGEङ, the name of each kid

घKNAMEङ and his/her parent name घNAMEङ. For this seemingly simple exam-

ple, all information required to pro duce the desired घXMASLISTङ le घFigure

2ङ is contained in the घPERSONङ le. Yet, the actual pro cess of pro ducing the

desired घXMASLISTङ involves data restructuring of a non-trivial nature घsee

Figure 3ङ.

Besides showing that data restructuring is an integral part of many informa-

tion pro cessing applications, the ab ove example also shows that query facilities

घdesigned for simple retrievalङ are not adequate to handle many of the real world

situations: programming is required. Traditionally, programming is considered to

246 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घXMASLISTङ

LOC घRECIPIENTङ

AGE घKIDङ

KNAME NAME

LA 04 BONNIE CHU

RITA ENGEL

06 CHARLIE CHU

09 CHRIS CHU

SF 02 JOHN SMITH

04 MARY SMITH

JANE JONES

DAVID GREEN

07 DICK JONES

08 MARY DURAN

10 BOB DURAN

SJ 07 PAULA JACOB

MARYANN HOPE

10 MARYLOU HOPE

Figure 2: Desired घXMASLISTङ

b e in the professional programmers' province. But the reliance on a professional

programmer often means placing a formal request to the data pro cessing p erson-

nel and waiting on a priority list for programmers to take action. The alternative

is for the end users to learn to program.

Learning to program is generally a time-consuming and often frustrating

endeavor. Moreover, even after the skill is learned, writing and testing a pro-

gram is still tedious and lab or intensive. Writing programs to p erform data

transformation is no exception. To ease the task, sp ecial languages with high

level op erations for data restructuring have b een rep orted in the literature. घSee

[1, 11,17,20,21,25,30,31,32,40,42] for example.ङ These languages are high

level, but they are still \pro cedural" and programming training is necessary.In

other words, users of these languages must write out the algorithms of restruc-

turing in a step by step manner. घSee App endix A for an example.ङ

In the rapidly growing end-user computing environment, many non-DP pro-

fessionals simply do not have the time, interest, or skill to program. Indeed, as

noted in a survey of end-user computing घ [27]ङ, two thirds of the end-user ap-

plications were develop ed by supp ort p ersonnel, programmers, and consultants.

It is our contention that unless an essential part of tedious, low level program-

ming can be made automatic, a large community of p otential end users will

not develop their applications. In the remainder of this pap er, we present an

automatic restructuring mechanism which is aimed at decreasing the tedium

normally asso ciated with writing pro cedural programs. For example, b ecause of

the Automatic Restructurer implemented in FORMAL, घXMASLISTङ can be

created from घPERSONङ with the sp eci cation shown in Figure 4. In essence,

the user describ es the structure of the desired output घXMASLISTङ and tells

the system to nd the required information in the source form घPERSONङ. The system do es the rest.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 247

घPERSONङ

ENO DNO NAME JC घKIDSङ घSCHOOLङ SEX LOC

KNAME AGE SNAME घENROLLङ

YEARIN YEAROUT

Restructuring

घXMASLISTङ ?

LOC घRECIPIENTङ

AGE घKIDङ

KNAME NAME

Figure 3: Restructuring of घPERSONङ to घXMASLISTङ

3 The Underlying Data Mo del

As mentioned in the Intro duction, the most imp ortant information given to

the Restructurer is an unambiguous description of the input and output data

structures. In this section, we discuss the data structures supp orted by the Re-

structurer, and indicate several ways to describ e them.

Brie y, the Automatic Restructurer handles all data structures as hierar-

chies. Relational tables are treated as degenerate hierarchies घi.e., hierarchies

of one levelङ, and networks are assumed to have b een decomp osed into fami-

lies of hierarchies b efore the Automatic Restructurer is invoked. No conceptual

limitations are placed on the depth or width of the hierarchies.

There are many di erentways to describ e hierarchical data structures. In the

database community, hierarchy graphs are commonly used when data structures

are discussed. As an example, Figure 3 shows the hierarchy graphs of घPERSONङ

and घXMASLISTङ. But hierarchy graphs b ecome cumb ersome when the display

248 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घXMASLISTङ

घRECIPIENTङ

LOC

AGE घKIDङ

KNAME NAME

SOURCE घPERSONङ

Figure 4: A sp eci cation for creating घXMASLISTङ from घPERSONङ

of instances is desired. The diculty is that one \instance graph" is required to

represent each o ccurrence, and consequently many \instance graphs" are needed

to showmultiple o ccurrences.

Data Pro cessing p ersonnel mayormay not care ab out instance graphs, but

for users who do not have very much data-pro cessing training, visualization

of o ccurrences often enhances the understanding of a problem. A \form" data

mo del was therefore suggested in [31] as a two-dimensional representation of

hierarchical data where multiple o ccurrences can be displayed conveniently. It

was intended at that time as a visual aid to assist the users in understanding the

high level op erations p erformed on their data. The \form op eration by example"

[23], the nested table घNTङ mo del [20], and more recently, the \nested normal

forms", and \non rst normal form relational databases" घsee [1, 9, 11,12,18,

19, 28, 29, 30, 41], for exampleङ use similar representations as visual aids. In

these representations, hierarchical structures are implied by exhibiting sample

instances.

Visual aids enhance human understanding. But, unless the representations

are precise, they cannot b e pro cessed by a computer program घdata restructurer

includedङ. In order to provide unambiguous descriptions घwithout dep ending

on the display of sample instancesङ to the Restructurer, the hierarchical struc-

tures must b e made explicit. A straightforward approach is to re ne the \form

headings" used in the form mo del so that the one-many relationships are dis-

tinguished from the one-one relationships [33,10]. Before presenting the re ned

form heading as an unambiguous description of a hierarchical structure, a few

de nitions are in order.

A eld is the smallest unit of data that can b e referenced in a user's appli-

cation. A group is a sequence of one or more elds and a numb er of sub ordinate

groups. For example, ENO घEmployee numb erङ, DNO घDepartment numb erङ,

NAME घEmployee Nameङ, PHONE, JC घJob Co deङ, ..., are elds of the घPER-

SONङ le घFigure 1ङ, while घKIDSङ, घSCHOOLङ, and घENROLLङ are groups.

Groups are either rep eating or non-rep eating. A non-repeating group is a

convenientway to refer to a sequence of elds as a unit घe.g., DATE is a non-

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 249

rep eating group over MONTH, DAY and YEARङ. A repeating group, on the

other hand, represents the one-many relationship and thus exhibits the hierar-

chical relationship. Take घPERSONङ घFigure 1ङ for example. An employee may

have manychildren, and mayhave attended a number of scho ols. Thus, for an

employee, घKIDSङ and घSCHOOLङ are rep eating groups. Rep eating groups can

b e nested घrepresenting several levels of a hierarchy, e.g., घENROLLङ is a sub or-

dinate rep eating group, nested within घSCHOOLङङ, or in parallel घrepresenting

several branches of a hierarchy, e.g., घKIDSङ and घSCHOOLङ representtwo di er-

ent branchesङ. To simplify the discussion, we do not deal with the non-rep eating

groups in the rest of this pap er.

In data pro cessing applications, a le is a named collection of homogeneously

structured data. The structure of a le is generally de ned in terms of its com-

p onents घwhere a component is either a eld or a groupङ. Comp onent names are

assumed to b e unique for each le, i.e., there are no duplicate names among the

comp onents.

In FORMAL, a form heading assumes a role that is commonly known as the

data structure de nition in conventional programming languages. Its purp ose is

to unambiguously de ne the name of the asso ciated le, its comp onents, and

structural relationships among the comp onents. Parentheses are used to denote

the existence of multiple घi.e. zero or moreङ o ccurrences in the asso ciated le or

group. Thus, names of the le and its repeating groups घalso cal led subformsङ

are enclosed in parentheses. More precisely, in a form heading, the le name is

placed on the top line. The names of the ro ot or top level comp onents are shown

in columns under the le name. The names of comp onents contained in a group,

in turn, are placed in columns under the name of the containing group. A double

line signals the end of a form heading. घSee the lines ab ove the double line in

Figure 1ङ.

In a form heading, the nesting and/or parallel parenthesized names serveto

de ne explicitly a hierarchical structure. The level of each comp onent is indicated

by its row-p osition under the le name. To illustrate, the level numb ers are

explicitly shown in Figure 5 on the outside of the घPERSONङ form heading. For

example, ENO, घKIDSङ, and घSCHOOLङ are at level 1, SNAME is at level 2 and

YEARIN level 3.

The Restructurer classi es data structures घor \forms"ङ into three distinct

shapes : \ at", \branch", and \tree." The algorithm for determining the shap e

is quite simple. It is implemented by traversing a given two-dimensional form

heading घor its equivalent \comp onent description table", discussed laterङ. Dur-

ing the traversal, each comp onent in the form heading is assigned a level number

according to its row p osition. At the end of the traversal, the shap e of the data

structure is determined. If the numb er of parenthesized names equals one, then

none of the comp onents is a rep eating group घor subformङ. The structure is at

shap ed. If the numb er of parenthesized names घsay pङ is greater than one, then

that number घpङ is compared with the largest level numb er घsay nङ. Obviously,

if p is larger than n, then at least one level must have more than one subform,

and the shap e of the form is a tree. घSee Figure 5 for an example.ङ Otherwise,

the structure is a branch. घSee घXMASLISTङ in Figure 4 for an example.ङ

The shap es of at, branch, and tree structures are ranked 1, 2 and 3, re-

sp ectively. The relative shap e rankings of the input and output structures have

a b earing on how the Restructurer maps the strategy for transformation घdis- cussed in Section 6ङ. In the rest of the pap er, when the shap e of the data structure

250 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घPERSONङ level

ENO DNO NAME PHONE JC घKIDSङ घSCHOOLङ SEX LOC 1

KNAME AGE SNAME घENROLLङ 2

YEARIN YEAROUT 3

Index IDENTIFIER NEXTI CHILDI PARENTI PRVEL GROUP

0 PERSON 0 1 0 0 1

1 ENO 2 0 0 1 0

2 DNO 3 0 0 1 0

3 NAME 4 0 0 1 0

4 PHONE 5 0 0 1 0

5 JC 6 0 0 1 0

6 KIDS 7 10 0 1 1

7 SCHOOL 8 12 0 1 1

8 SEX 9 0 0 1 0

9 LOC 0 0 0 1 0

10 KNAME 11 0 6 2 0

11 AGE 0 0 6 2 0

12 SNAME 13 0 7 2 0

13 ENROLL 0 14 7 2 1

14 YEARIN 15 0 13 3 0

15 YEAROUT 0 0 13 3 0

Note: NEXTI, CHILDI and PARENTI are indexes to entries in the comp onent description

table for next comp onent, child comp onent,and parent comp onent, resp ectively.

Figure 5: Comp onent description table of घPERSONङ form

is not the center of attention, the generic term \tree" is used interchangeably

with the term \hierarchy" घdiscussed in the intro duction to the present sectionङ.

To view the data, o ccurrences are displayed under the form heading. The

compactness of the form heading enables the visualization of many o ccurrences

at one time घsee Figure 1ङ.

Since the form heading represents a concrete means to unambiguously de-

ne a hierarchical structure of arbitrary depth and width, it is used as an in-

put/output description byFORMAL. An input is de ned once घregardless of the

numb er of di erent applications utilizing itङ by a user or a systems p erson, and

the description is immediately available in the system catalog. There is no need

for another user to de ne the same le again. Description of the desired output,

on the other hand, is provided by the user as part of his application sp eci -

cation. For example, in Figure 4, the description of घXMASLISTङ is provided

in the form heading of the FORMAL program that pro duces घXMASLISTङ. As

discussed later in Section 6, intermediate les may b e needed during the restruc-

turing pro cess. Structures of all intermediate les are de ned by the Restructurer

and are of no concern to the end users.

As mentioned earlier, it is necessary that the unambiguous descriptions of

input and output data structures are available to the automatic restructuring

mechanism, but it is immaterial how the required information is captured. In the

FORMAL implementation, a parser translates the two-dimensional form head-

ings into component description tables which are used as input to the Automatic

Restructurer. As an example, Figure 5 shows the comp onent description table

corresp onding to the घPERSONङ form heading. The comp onent description ta-

bles are easily translatable from the form headings, and vice versa. They are

really two representations of the same information, one b eing more convenient

for the user, the other more suitable for machine manipulation.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 251

In the formal treatment which follows, we use the term form to denote, in

an abstracted form, the unambiguous description of a data structure. In the

examples, we use form heading as the concrete representation of the abstracted

notion.

In order to discuss the data restructuring pro cess in a more abstract manner,

a formal description of the hierarchical data structures is now presented.

Let BA b e an in nite set of abstract elements called basic attributes. घBasic

1

attributes corresp ond to the elds mentioned earlier.ङ For each basic attribute

A, let

{ DomघAङ घcalled the domain of Aङ b e a nonempty set of at least two abstract

elements, exactly one of which is denoted ? घthe null symb olङ;

A

{ the basic attributes of A, denoted BAघAङ, b e the set fAg; and

{ the attributes of A, denoted Attrib घAङ, b e the set fAg.

Continuing recursively, let B ;:::;B be r ऋ 1 attributes such that BA घB ङ \

1 r i

BAघB ङ = ; for all i 6= j , and at least one of them a basic attribute. Let

j

B = hB ;:::;B i, where \h" and \i" are two sp ecial symb ols. Then

1 r

{ DomघB ङ is the family of nite subsets of the घorderedङ Cartesian pro duct

DomघB ङ ࣾ ::: ࣾ DomघB ङ;

1 r

{ B is called a form घattributeङ and a parent of each B , and each B is an

i i

o spring or an immediate descendent of B ;

S

{ BAघB ङ= BAघB ङ, and is called the set of basic attributes of B ; and

i

i

S

{ Attrib घB ङ=fB g[ AttribघB ङ, and is called the set of attributes of B .

i

i

Each attribute in Attrib घB ङ fB g is called a proper attribute of B . Each B

i

which is a form is called a subform of B . Recursively, each subform of a subform

B is also called a subform of B . If each B is a basic attribute, then B is said

i i

to b e a at form.IfB has at least one subform and B and each subform of B is

0 0 0 0

is a form, then B is said i, where at most one B ;:::;B of the shap e B = hB

s 1

i

to b e a branch.

To illustrate, in Figure 1, ENO, DNO, NAME, PHONE, JC, KNAME, AGE,

SNAME, YEARIN, YEAROUT, SEX and LOC are basic attributes.

घENROLLङ=hYEARIN ; YEAROUTi;

घSCHOOLङ = hSNAME ; ENROLLi;

घKIDS ङ = hKNAME; AGEi; and

घPERSONङ = hENO;:::;JC; घKIDSङ; घSCHOOLङ; SEX; LOCi

are forms.

घPERSONङ is the parent of ENO, DNO, ..., and LOC. Each attribute except

घPERSONङ is a prop er attribute of घPERSONङ.

Note that each prop er attribute C of B has exactly one parent, denoted

Par घC ङ, or Par घC ङ when B is understo o d. Also, if D is a prop er attribute of

B

C and C is an attribute of B , then Par घD ङ=Par घD ङ.

C B

0 0

C is an ancestor of C if either C = Par घC ङ, or C = Par घD ङ and D is

0 0

an ancestor of C घalternatively, C = Par घC ङ, or C is an ancestor of D and

0 0 0

D = Par घC ङङ. C is said to b e a descendent of C if C is an ancestor of C . C

0 0

and C are said to b e siblings if Par घC ङ=Par घC ङ.

If C is an ancestor of B , then C is a form.

252 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

Let C be an attribute in Attrib घF ङ. If D ;:::;D are the immediate de-

1 n

scendents of C in F , then Free घC ङ = fD j1 ऊ i ऊ n and D in BAघF ङg. If

F i i

C in BAघF ङ, then Free घC ङ=fC g. The subscript F is omitted when F is clear

F

from the context. Note that Free घC ङ 6= ; for each C in AttribघF ङ. Thus

F

FreeघघPERSONङङ = fENO; DNO; NAME; PHONE; JC; SEX; LOCg;

FreeघघSCHOOLङङ = fSNAMEg and

FreeघघENROLLङङ = fYEARIN; YEAROUTg:

0 0 0

Let B; B be two attributes. B equals B घdenoted by B = B ङif

0

{ B and B are the same basic attribute, or

0 0 0 0

{ B = hB ;:::;B i, B = hB ;:::;B i and B = B for 1 ऊ i ऊ r .

1 r i

1 r

i

This means that two forms that have the same direct descendents are considered

identical.

Referring to the level number discussed earlier, let F = hF ;:::;F i be a

1 r

form. The level of each attribute F with resp ect to F is de ned to b e 1. Con-

i

0

tinuing by induction, supp ose F is an attribute in F of level l with resp ect to

0 0 0 0

;:::;F i the level of each F with resp ect to F is de ned to b e F .For F = hF

s

1

j

l + 1. Each attribute of level 1 is said to b e at root level. The level of an attribute

A in Attrib घF ङ is denoted by level घAङ, or by levelघAङ when F is understo o d.

F

Wenow turn to the instance of a form. Using recursion, an instance of a form

F = hF ;:::;F i is a nite set I of functions

1 r

f : FreeघF ङ [fF jF a form g!

j j

S

fDomघAङjA in FreeघF ङg[fInsघF ङjF a formg;

j j

such that f घAङ is in DomघAङ for each A in FreeघF ङ and f घF ङisinInsघF ङ for

j j

each form F . Let InsघF ङ b e the set of all instances of F .

j

Figure 1 illustrates an instance I of the घPERSONङ form. The set I consists

of nine functions f ;:::;f , representing the information in the compartments 1

1 9

through 9 under the double line. f is the function over

1

fENO, DNO, NAME, PHONE, JC, घKIDSङ, घSCHOOLङ, SEX, LOCg

de ned by

f घENOङ = 05; f घDNOङ = D 1; f घNAME ङ = SM I T H;

1 1 1

f घPHONEङ = 5555; f घJCङ = 05; f घSEX ङ = F;

1 1 1

f घLOCङ = SF; f घघKIDSङ ङ = I f घघSCHOOLङ ङ = I :

1 1 11 1 12

Here, I is an instance of घKIDSङ that consists of the functions f ;f de ned

11 111 112

by

f घKNAMEङ = JOHN; f घAGEङ = 02;

111 111

f घKNAMEङ = MARY; f घAGEङ = 04:

112 112

I is the instance of घSCHOOLङ consisting of the function f , where

12 121

f घSNAME ङ = PRINCETON;f घघENROLLङङ = I :

121 121 1211

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 253

I is the instance of the form घENROLLङ consisting of the functions f

1211 12111

and f over fYEARIN,YEAROUTg, where

12112

f घYEARIN ङ = 1966;f घYEAROUTङ = 1970;

12111 12111

f घYEARIN ङ = 1972;f घYEAROUTङ = 1976:

12112 12112

The remaining functions f ;:::;f are de ned similarly.

2 9

S

For each form F , let Ins घF ङ= fI jI in InsघF ङg, that is, Ins घF ङ is the

1 1

set of all functions which are in at least one instance of F .Aninstantiated form

is a pair घI ;Fङ, where F is a form and I is an instance of F .

A set of instantiated forms B = fघI ;FङjF in Fg will be referred to as an

instantation of F .

4 The Restructuring Problem

2

Wenow turn to a precise statement of the basic problem . The actual formal-

ization is rather complicated and requires a numb er of concepts. A central role

will b e played by the notions of \direct solution" and of \solution" intro duced

in Section 7.

We start by recalling some notions from relational database theory. Let X be

a nite nonempty set of basic attributes. A tuple घover X ङ is a function f from

S

X into DomघAङ such that f घAङ is in DomघAङ for all A in X .A relation

A in X

is a pair घI; Xङ, or I when X is understo o d, where X is a nonempty set of basic

attributes and I is a nite set of tuples over X . If घI ; hB ;:::;B iङ is a at

1 n

instantiated form, then B ;:::;B are basic attributes and घI ; fB ;:::;B gङis

1 n 1 n

a relation.

For each nonempty subset Y of X , the projection over Y is the function ँ

Y

from the set of tuples over X to the set of tuples over Y de ned by ँ घf ङ=g ,

Y

where f is a tuple over X and g is the tuple over Y for which g घAङ=f घAङ for

all A in Y .

Our interest in tuples over a set of basic attributes stems from the following

concept. By recursion, for each form F = hF :::;F i and each f 2 Ins घF ङ, let

1 r 1

K घf ङ, called the kernel of f , b e the set of all tuples g over BAघF ङ such that

घiङ g घAङ=f घAङ for each A in FreeघF ङ and

घiiङ g over BAघF ङ, F a form, is a tuple in K घf घF ङङ.

j j j

S

For each I in InsघF ङ, the set fK घf ङjf in Ig denoted by K घI ;Fङ, is the kernel

of घI ;Fङ.

Obviously, each kernel is a nite set of tuples. If F is a at form, then

K घI ;Fङ=I . When F is clear from the context we shall write K घI ङ instead of

K घI ;Fङ.

Given a form F and a nite set R of tuples over BA घF ङ there may b e more

than one instantiated form घI ;Fङ such that K घI ;Fङ=R .For example, consider

the form F = hA; F ;F i, where F = hB i and F = hC i, and the set R of tuples

1 2 1 2

over BAघF ङis

2

The authors are indebted to Dr. Marc Gyssens for helping to clarify the original

incorrect formulation of the basic problem.

254 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

A B C

a b c

1

a b c

2

a b c

1 1

a b c

2

Both instantiated forms घI ;Fङ and घI ;Fङ given b elowhave their kernels equal

1 2

to R .

F F

A F F A F F

1 2 1 2

B C B C

घI ;Fङ= घI ;Fङ=

a b c a b c

2 1

1 1

b c

2 1

a b c a b c

1 1 2

a b c a b c

2 2

The set of paths of F , denoted by Paths घF ङ, is de ned inductively on the

number k of subforms of F . Sp eci cally, if k = 0, then F is a at form and

Paths घF ङ=fF g. Continuing by induction, let F b e the form F = hF ;:::;F i.

1 r

If F ;:::;F are the subforms of F at the ro ot level, then each of them has

i i

1 m

fewer than k subforms. This allows us to de ne Paths घF ङas

Paths घF ङ=ffH g jH = F and

j 0ऊj ऊ` 0

a form in F for 1 ऊ p ऊ mg: ङ;F fH g in Paths घF

i j 1ऊj ऊ` i

p p

Each path ए in Paths घF ङ generates a घuniqueङ branch F obtained by removing

from F all form attributes, together with their descendants, that do not o ccur

in ए .We shall use the notation BAघए ङ for BAघF ङ.

5 Basic Typ es of Restructuring Op erators

As mentioned in Section 3, the data structures of interest to us are basically hi-

erarchical structures of arbitrary depth or width. Relational tables are treated as

hierarchies of one level, and networks are assumed to have b een decomp osed into

families of trees. In this section, restructuring op erations sp eci cally designed to

work on hierarchical structures are discussed.

In transforming a hierarchical structure to a di erent one, the Automatic Re-

structurer makes use of four basic typ es of tree op erations: trimming, attening,

stretching, and grafting. When applied to a particular situation, eachof these

four typ es of op erations causes a restructuring to take place - an op eration that

pro duces one output le from either one घin the case of trimming, attening and

stretchingङ or two घin the case of graftingङ input les.

These four typ es of op erations were called SELECT, SLICE, CONSOL-

IDATE and GRAFT in the CONVERT system [32]. घThe names have b een

changed here for mnemonic purp oses.ङ Exp erience has shown that the data ex-

traction, manipulation and restructuring needs of the real world can b e handled

byacombination of these four typ es of op erations. Similar op erations were later

prop osed in [20,42,13]. Query languages designed for \nested tables" or \non

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 255

rst normal form relational databases" generally have capabilities to p erform

functions similar to the basic op erations discussed here. घSee [2, 3,10,14,26,28]

for example.ङ

The following examines the functions of the four basic typ es of op erations.

5.1 Trimming

A trimming op eration removes unwanted comp onents from a form or a subform.

Data is extracted from the input and placed in the output without anychange in

the hierarchical relationships. An example of trimming is presented in Figures 6

and 7. Figure 6 shows the structural transformation, and 7 the instances involved.

Wenow consider \trimming" in a formal manner.

A form G = hG :::G i is a trimming of the form F = hF ;:::;F i if G 6= F

1 s 1 r

and there is a one-to-one partial function h, called a trimming operation, from

Attrib घF ङonto AttribघGङ with the following prop erties:

1

घTr1ङ h घB ङ=B for each basic attribute B in Attrib घGङ.

1 1

घTr2ङ h घPar घB ङङ = Par घh घB ङङ for each prop er attribute B of G.

1

घTr3ङ h घGङ=F .

Symb olically,we write hघF ङ=G.

Notice that each p ermutation of siblings at the ro ot level of a form F ,orat

the ro ot level of some subform of F , is a trimming of F . Also, the comp osition

of two trimmings is a trimming; that is, if G is a trimming of F घby h ङ and H

1

is a trimming of G घby h ङ, then H is a trimming of F घby h h , where h h is

2 2 1 2 1

the function obtained by rst applying h and then h ङ.

1 2

We note the following result, whose pro of is given in App endix B.

Prop osition 1. घTrimming Prop ositionङ Let G = hG :::G i be a trimming

1 s

of F = hF ;:::;F i by h.Let j be such that hघF ङ exists. Then

1 r j

घaङ There exists k घj ङ such that hघF ङ=G .

j

k घj ङ

घbङ G is a basic attribute if and only if F is a basic attribute, in which

j

k घj ङ

case G = F .

j

k घj ङ

घcङ hघAttrib घF ङङ = Attrib घG ङ.

j

k घj ङ

घdङ G is a trimming of F by h.

j

k घj ङ

Now let G = hG ;:::;G i be a trimming of F = hF ;:::;F i by h. Using

1 s 1 r

InsघF ङ

घaङ, घbङ and घdङ of the Trimming Prop osition, let h b e the mapping from

InsघF ङ

Ins घF ङtoIns घGङ where, for each f in Ins घF ङ, h घf ङ=g , g de ned as

1 1 1

follows:

घ1ङ g घAङ=f घAङ for all A in FreeघGङ=fG jG a basic attributeg.

k k

घ2ङ For each subform F such that hघF ङ exists, hघF ङ=G is a subform of

j j j

k घj ङ

InsघF ङ

j

G and G is a trimming of F by h. Recursively, h maps Ins घF ङinto

j 1 j

k घj ङ

InsघF ङ

j

Ins घG ङ. Let g घG ङ=h घf घF ङङ. Clearly, g घG ङisinInsघG ङ.

1 j

k घj ङ k घj ङ k घj ङ k घj ङ

The following result is also shown in App endix B.

Theorem 2. घTrimming Theoremङ Let G be a trimming of F by h. Then

InsघF ङ

K घh घI ङङ = ँ घK घI ङङ for each I in InsघF ङ.

BAघGङ

256 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घPERSONङ

ENO DNO NAME PHONE JC घKIDSङ घSCHOOLङ SEX LOC

KNAME AGE SNAME घENROLLङ

YEARIN YEAROUT

TRIMMING

?

घPERSON5ङ

DNO ENO NAME LOC घSCHOOL5ङ

SNAME घENROLL5ङ

YEAROUT

Figure 6: Structural transformation by TRIMMING

5.2 Flattening

A attening op eration pro duces a at form from a branch of an input by ex-

tracting all information at the level of the values of the basic attributes of the

branch. The relative p ositioning of these attributes in the output need not be

the same as that in the input. An example of attening is shown in Figures 8

and 9.

From a theoretical standp oint, it is simpler to restrict the shap e of the input

to b e a branch. In case the input has the shap e of a tree, it is not dicult to

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 257

घPERSONङ

ENO DNO NAME PHONE JC घKIDSङ SCHOOLङ SEX LOC

KNAME AGE SNAME घENROLLङ

YEARIN YEAROUT

05 D1 SMITH 5555 05 JOHN 02 PRINCETON 1966 1970 F SF

MARY 04 1972 1976

07 D1 JONES 5555 05 DICK 07 SJS 1960 1965 F SF

JANE 04

BERKELEY 1965 1969

11 D1 ENGEL 2568 05 RITA 04 UCLA 1970 1974 F LA

.

.

.

TRIMMING

?

घPERSON5ङ

DNO ENO NAME LOC घSCHOOL5ङ

SNAME घENROLL5ङ

YEAROUT

D1 05 SMITH SF PRINCETON 1970

1976

D1 07 JONES SF SJS 1965

BERKELEY 1969

D1 11 ENGEL LA UCLA 1974

.

.

.

Figure 7: TRIMMING with instance displayed

envisage a two-step pro cess: First apply a trimming op eration to obtain an in-

termediate form containing only the relevant branch with the desired attributes.

Then apply a attening op eration on the intermediate form to pro duce the de-

3

sired at output .Thus, in the formal treatment that follows, we assume that

the input form has the shap e of a branch, and a trimming op eration has already

b een applied if necessary.

A form G = hG ;:::;G i is a attening of the form F if G 6= F , F is a

1 s

branch and there is a one-to-one partial function h घcalled a attening operation

ङ from Attrib घF ङonto Attrib घGङ with the following prop erties:

घFl 1ङ Each G is a basic attribute, i.e., G is a at form.

i

घFl 2ङ BAघGङ=BAघF ङ.

1

घFl 3ङ h घG ङ=G for each G .

i i i

1

घFl 4ङ h घGङ=F .

By घFl 3ङ and घFl 4ङ, hघC ङ is de ned only for C in fF g[ BAघGङ.

Theorem 3. घFlattening Theoremङ If G is a attening of F by h and I is

InsघF ङ

an instanceof F , then K घh घI ङ;Gङ=K घI ;Fङ.

3

From the software p oint of view, however, this two-step pro cess is not as ecient

as directly applying a attening op eration to a branch of a tree घi.e., without rst

trimming the treeङ. Therefore, the implementation enables the direct application of a

attening op eration to a branch regardless of whether the input form is tree-shap ed

or branch-shap ed.

258 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घP1ङ

ENO DNO NAME PHONE JC घSCHOOLङ SEX LOC

SNAME घENROLLङ

YEARIN YEAROUT

FLATTENING

?

घP2ङ

DNO ENO NAME PHONE JC SNAME YEARIN YEAROUT SEX LOC

Figure 8: Structural transformation by attening

घP1ङ

ENO DNO NAME PHONE JC घSCHOOLङ SEX LOC

SNAME घENROLLङ

YEARIN YEAROUT

05 D1 SMITH 5555 05 PRINCETON 1966 1970 F SF

1972 1976

07 D1 JONES 5555 05 SJS 1960 1965 F SF

BERKELEY 1965 1969

11 D1 ENGEL 2568 05 UCLA 1970 1974 F LA

.

.

.

FLATTENING

?

घP2ङ

DNO ENO NAME PHONE JC SNAME YEARIN YEAROUT SEX LOC

D1 05 SMITH 5555 05 PRINCETON 1966 1970 F SF

D1 05 SMITH 5555 05 PRINCETON 1972 1976 F SF

D1 07 JONES 5555 05 SJS 1960 1965 F SF

D1 07 JONES 5555 05 BERKELEY 1965 1969 F SF

D1 11 ENGEL 5555 05 UCLA 1970 1974 F LA

.

.

.

Figure 9: Flattening with instance displayed

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 259

InsघF ङ

Proof. Let G b e a attening of F by h.For each f in Ins घF ङ, let h घf ङ=

1

S

InsघF ङ InsघF ङ

ँ घK घf ङङ. For each I in InsघF ङ, let h घI ङ= fh घf ङjf is in Ig.

BAघGङ

InsघF ङ InsघF ङ

Thus, we obtain h घI ङ=ँ घK घI ङङ. Clearly, h घI ङisinInsघGङ.

BA घGङ

Since G is a at form, K घJ ङ=J for each J in InsघGङ. Hence,

InsघF ङ InsघF ङ

K घh घI ङङ = h घI ङ=ँ घK घI ङङ:

BAघGङ

5.3 Stretching

A stretching op eration pro duces an output with more hierarchical levels than

the input. The e ect is to factor out the common values in the o ccurrences. In

terms of घhierarchicalङ trees, this op eration forms a taller tree. The additional

hierarchical levels must b e formed from attributes existing at the root level of the

input. The names of the new subforms are sp eci ed by the user घin the description

of the output structure, i.e., the output form headingङ or by the Restructurer

घwhen an intermediate form is requiredङ. An example of stretching is shown in

Figure 10 where the output घPERSON7ङ has two more levels than the input

घPERSONङ. The new levels are formed by factoring out the common values of

DNO and, within each unique DNO, the common values of LOC. घBRANCHESङ

and घEMPLOYEEङ are new subforms.

In more detail, a form G = hG ;:::;G i is said to be a stretching of the

1 s

0 0 0

form F = hF ;:::;F i if there exists a subform G = hG ;:::;G i of G with the

1 r

1 q

following prop erties:

घSt 1ङ The forms in the set fF ;:::;F g coincide with the forms in the set

1 r

0 0

fG ;:::;G g.

1 q

0 0

घSt 2ङ The siblings of G , and the siblings of each ancestor of G , together with

0 0

the basic attributes in the set fG ;:::;G g, coincide with the basic attributes

1 q

in the set fF ;:::;F g, i.e., coincide with FreeघF ङ.

1 r

From घSt 1ङ and घSt 2ङ, it follows that BA घF ङ=BAघGङ.

The transformation of F into G is said to b e a stretching operation. Expressed

otherwise, G is said to b e a stretching of F by a stretching op eration.

To simplify the discussion of instances, we present a slightly di erent formu-

lation of stretching.

Prop osition 4. घStretching Prop ositionङ A form G is a stretching of the

l l l

i, form F = hF ;:::;F i if and only if there exists a sequence G = hG ;:::;G

1 r

1

sघlङ

0

0 ऊ l ऊ t, where t>0 and G = G, with the fol lowing properties:

t t

g coincide with the forms in fF ;:::;F g. ;:::;G घaङ The forms in fG

1 r

1

sघtङ

l l l

= , and G is a form, say G घbङ For each l , 0 ऊ l

i

iघlङ iघlङ

l+1

G .

घcङ The basic attributes in fF ;:::;F g coincide with the basic attributes in

1 r

i

fG j for al l i; j g.

j

0 0 0

i is a stretching Proof. Indeed, supp ose hG ;:::;G i = G = G = hG ;:::;G

1 s

1

sघ0ङ

0 0 0

of F . Then there exists a subform G = hG ;:::;G i satisfying घSt 1ङ and घSt

1 q

0 0 1 1 1

g such that either ;:::;G i be the form in fG ;:::;G 2ङ. Let G = hG

1 1

sघ0ङ sघ1ङ

1 0 1 0 1

G = G or G is an ancestor of G . By घSt 2ङ, all the siblings of G are

260 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घPERSONङ

ENO DNO NAME PHONE JC घKIDSङ घSCHOOLङ SEX LOC

@

@

@

@

KNAME AGE SNAME घENROLLङ

YEARIN YEAROUT

STRETCHING

घPERSON7ङ

?

DNO घBRANCHESङ

LOC घEMPLOYEEङ

ENO NAME JC घSCHOOLङ घKIDSङ PHONE SEX

SNAME घENROLLङ KNAME AGE

YEARIN YEAROUT

Figure 10: Structural transformation by stretching

basic attributes in fF ;:::;F g.Now rep eat the pro cedure for l ऋ 1, obtaining

1 r

l l l l1 l 0

G = hG ;:::;G i from G ,until G = G . Let t b e the value of l for which

1

sघlङ

l 0 l l l

G = G . Clearly, t>0 and the sequence G = hG ;:::;G i घwhere 0 ऊ l ऊ tङ

1

sघlङ

satis es घaङ - घcङ.

l 0 l l

;:::;G i घwhere 0 ऊ l ऊ t, t > 0 and G = Now supp ose G = hG

1

sघlङ

0 0 0 t t t

Gङ satis es घaङ - घcङ. Let G = hG ;:::;G i = G = hG ;:::;G i. Clearly,

1 q 1

sघtङ

conditions घSt 1ङ and घSt 2ङ hold. Hence, G is a stretching of F .

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 261

Let G be a stretching of F = hF ;:::;F i by a stretching op eration h and

1 r

l l l

G = hG ;:::;G i घwhere 0 ऊ l ऊ tङ a sequence of forms satisfying घaङ - घcङ of

1

sघlङ

Ins घF ङ

1

the Stretching Prop osition. For each function f in Ins घF ङ, let h घf ङ=

1

0 0 0 0 0

g , where g is the function on FreeघG ङ [fG g de ned by g घAङ=f घAङ for

iघ0ङ

0 0 0 1

each A in FreeघG ङ ई FreeघF ङ and g घG ङ=fg g. By induction, for each l ,

iघ0ङ

l l l l

1 ऊ l

iघlङ

l l l l+1 t

for each A in FreeघG ङ ई FreeघF ङ and g घG ङ=fg g. Let g b e the function

iघlङ

on

t t t

a form g jG FreeघG ङ [fG

j j

t

= FreeघG ङ [fF jF a form g;

j j

by घaङ of the Stretching Prop osition

ई FreeघF ङ [fF jF a formg

j j

t t t

de ned by g घAङ=f घAङ for each A in FreeघG ङ ई FreeघF ङ and g घF ङ=f घF ङ

j j

for each F in fF jF a form g. It is readily seen that :

j i i

0

घiङ g is in Ins घGङ and

1

0

घiiङ K घg ङ is the set of all functions gऌ over BAघGङ = BAघF ङ such that

S

t

l

FreeघG ङ andg ऌघF ङ is a tuple in K घf घF ङङ for gऌघAङ=f घAङ for each A in

j j

l=0

each form F .

j

ऌ ऌ

By de nition, K घf ङ is the set of all functions f over BAघF ङ such that f घAङ=

f घAङ for each A in FreeघF ङ, and f घF ङ is a tuple in K घf घF ङङ for each form F .

j j j

0 Ins घF ङ

1

By घcङ of the Stretching Prop osition, K घf ङ=K घg ङ=K घh घf ङङ. From

this, we immediately get:

Theorem 5. घStretching Theoremङ If G is a stretching of F by h and I is

InsघF ङ

an instanceof F , then K घh घI ङ;Gङ=K घI ;Fङ.

5.4 Grafting

A grafting op eration combines two hierarchies horizontally to form a wider hi-

erarchyby matching the values of the common elds at the root घtopङ level of

4

the two inputs. These common basic attributes are used as linkages between

two inputs, and are called the match attributes for convenience of discussion. In

terms of trees, one can view the op eration as \grafting" one घhierarchicalङ tree

to another in order to form a new one. An example is shown in Figure 11.

A formal description of grafting is now given. A form G = hG ;:::;G i is

1 s

said to b e a grafting of the forms E = hE ;:::;E i and F = hF ;:::;F i if

1 q 1 r

घGr 1ङ There exists some E and F such that BAघE ङ \ BAघF ङ 6= ;.

i j i j

घGr 2ङ For each E and F such that BAघE ङ \ BAघF ङ 6= ;, E = F and is

i j i j i j

a basic attribute.

घGr 3ङ

hG ;:::;G i = hE ;:::;E ;F ;:::;F i;

1 s 1 q

uघ1ङ uघsq ङ

4

If some of the common basic attributes are not at the ro ot level, then additional

op erations must b e p erformed b efore grafting can take place.

262 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घDEPTङ घPERSON7Aङ

DNO DNAME घPROJECTङ DNO घEMPLOYEEङ

? ?

PJNO BUDGET ENO NAME LOC घSCHOOLङ

?

SNAME घENROLLङ

?

YEARIN YEAROUT

GRAFTING घmatch on DNOङ

घDEPTMENTङ

?

DNO DNAME घPROJECTङ घEMPLOYEEङ

? ?

PJNO BUDGET ENO NAME LOC घSCHOOLङ

?

SNAME घENROLLङ

?

YEARIN YEAROUT

Figure 11: Structural transformation by grafting

where F ;:::;F is the subsequence of F ;:::;F consisting of all F not

1 r i

uघ1ङ uघsq ङ

in the set fE ;:::;E g.

1 q

The transformation of E and F into G is said to b e a grafting operation. Ex-

pressed otherwise, G is said to b e a grafting of E and F by a grafting op eration.

We denote G by E 1 F .

gr

Let G = hG ;:::;G i b e a grafting of

1 s

E = hE ;:::;E i and F = hF ;:::;F i

1 q 1 r

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 263

by a grafting op eration h.For each e in Ins घE ङ and f in Ins घF ङ such that

1 1

InsघE;F ङ

eघAङ=f घAङ for all A in FreeघE ङ \ FreeघF ङ, let h घe; f ङ= g , where g is

the function in Ins घGङ de ned as follows:

1

{ g घAङ=eघAङ for each A in FreeघE ङ.

{ g घAङ=f घAङ for each A in FreeघF ङ.

{ g घG ङ=eघG ङifG is a form in fE ;:::;E g.

i i i 1 q

{ g घG ङ=f घG ङifG is a form in fF ;:::;F g.

i i i 1 r

For each e in Ins घE ङ and f in Ins घF ङ such that eघAङ 6= f घAङ for some A in

1 1

InsघE;F ङ

FreeघE ङ \ FreeघF ङ, let h घe; f ङ b e unde ned.

Consider the instance

InsघE;F ङ 1 2 InsघE;F ङ 1 2

h घI ; I ङ=fh घe; f ङje is in I ;f is in I g;

1 2

denoted by I 1 I . The following results are established in App endix C.

gr

1

Theorem 6. घGrafting Theoremङ Let G be a grafting of E and F , and I

2 1 2

and I be instances of E and F , respectively. Then K घI 1 I ;E 1 F ङ=

gr gr

1 2

K घI ;Eङ 1 K घI ;Fङ

6 Mapping Strategy

In Section 5, we discussed four typ es of restructuring op erations: trimming,

attening, stretching, and grafting. Compared to traditional programming lan-

5

guages, they are indeed \very high level". Nevertheless, while some business

applications need only one of these op erations, many require sequences of them.

In the following, we discuss how the restructuring programs are develop ed and

generated by the Automatic Restructurer. In essence, the Restructurer takes

over the \thinking and co ding" pro cess by घaङ establishing a plan for mapping

the sp eci ed inputघsङ to the desired output, and घbङ implementing the plan for

reasonably ecient execution. The घaङ part is discussed in the remainder of this

section, and the घbङ part is addressed in Section 7.

It is imp ortanttokeep in mind that the Restructurer is an automatic data

restructuring facility for data pro cessing applications, and not a database man-

agement system. Some concepts of theoretical interest to database management

घe.g., key attributes, p otential information loss or recoverability, access rights,

etc.ङ are not necessarily imp ortant for end users' applications. For example, the

ability to distinguish each entity in an entity set घi.e. the key conceptङ is im-

p ortant for database research, but not for statistical applications. As another

example, the recoverability of information is an imp ortant issue to database re-

searchers and maintainers, but not to end users. The les created by the end

users' applications are generally used to pro duce rep orts, to accommo date di er-

ent p oints of view, and to aid in the decision-making pro cess. They are created

for private use and are not part of the centralized database. These and simi-

lar considerations motivated the underlying philosophy of the Restructurer: The

5

Expansion ratios ranging from 1:18 to 1:175 have b een rep orted [32 ] for a CONVERT

घsee the intro duction to Section 5ङ statement to PL/I statements.

264 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

user knows what he/she wants. As long as the descriptions of the data struc-

tures given to the Restructurer are unambiguous, the Restructurer honors the

request. If it is p ossible to pro duce an output as the user-speci ed, the Restruc-

turer pro ceeds to pro duce it. If no way is found, the Restructurer sends the user

a message.

We are now ready to present the mapping strategy. The material is organized

into three subsections, dealing resp ectively with the sp eci cs of the mapping

strategy for one, two, or more inputs.

6.1 Transformation Strategy When There Is Only One Input

To determine the strategy for transforming one input to one output, the Re-

structurer rst examines the p ossibility of applying the following op erationघsङ,

6

in the order shown. For convenience, we shall refer to these six cases as the rst

order operations.

1. trimming

2. attening

3. at-trim घi.e., attening followed by trimmingङ

4. stretching

5. at-stretch घi.e., attening followed by stretchingङ

6. at-trim-stretch घi.e., attening followed by trimming, then stretchingङ

In other words, if the desired output can b e pro duced by applying trimming,

then trimming is chosen to b e the op eration. Otherwise, the Restructurer exam-

ines the next p ossibility, i.e. attening. If attening also fails the applicability

test, then the third p ossibility is tried, and so forth. If a transformation can-

not b e accomplished byany of the rst order op erations, then \hybridization"

घdiscussed laterङ is applied.

The ab ove op erations are now discussed in more detail.

Consider घ1ङ. To b e \trimmable", the hierarchical relationships b etween the

comp onents must not b e disturb ed during restructuring. In other words, there

must b e no change of \heritage", even though p ositioning of comp onents within

a subform may b e altered, and some of the leaves or branches of a hierarchymay

b e omitted. To determine whether an output is trimmable from a given source,

the tests shown in Figure 12 are applied.

Now consider घ2ङ and घ3ङ. If the output cannot be pro duced by trimming

the given source, then the applicability of attening is examined. When the

desired output do es not have a at shap e, it obviously cannot b e pro duced by

applying attening to the input. Otherwise, the attening op eration is a natural

candidate. However, the following two additional conditions must b e satis ed in

order for attening to b e applicable:

घaङ The source comp onents required for the creation of the output must

b elong to a single branch. This source branch can be either a branch-shap ed

input, or a branch-shap ed subform within a tree-shap ed input.

घbङ At least one attribute must b e extracted from each level along the branch.

6

There may b e more than one way to map the given input to the desired output.

The ordering of the examination discussed here makes sure that the eciency of the

generated program is taken into consideration. घSee Section 7.ङ

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 265

Shap e ranking of yes

आ उ

output > that of - Not trimmable

input ?

ई इ

? no

For all output comp onents:

no Hierarchical level of

उ आ

- Not trimmable each output comp onent

= that of corresp onding

ई इ

input comp onent?

? yes

For all output comp onents:

Parent of output comp onent no

उ आ

=Parent of corresp onding - Not trimmable

input comp onent?

ई इ

? yes Note: Wesay that comp onents X

and Y corresp ond घwith resp ect

to a mapping hङifhघX ङ= Y .

उ आ

Trimmable

ई इ

Figure 12: Determining the applicability of trimming

More sp eci cally, the rules in Figure 13 are followed to determine whether

case घ2ङ or case घ3ङ is applicable. Figure 14 gives an example for case घ3ङ, i.e.

FLAT-TRIM.

Consider घ4ङ. When the output is neither a at form nor trimmable from the

source, the Restructurer explores the p ossibility of using stretching. To apply

this op eration, two rules must b e observed:

घaङ The expansion of new hierarchical levels o ccurs only from basic attributes

at the ro ot level of the source form.

266 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

' $

no Cannot apply

- FLATTENING

Output at?

or FLAT-TRIM

? yes

& क

All comp onents in no

output b elong to one -

branch of input?

आ उ

? yes Send message घsee Noteङ

ई इ

Any skipping of no

- hierarchical levels

among the required

comp onents of input?

? yes

उ आ

Apply FLATTENING

उ आ ई इ

Apply FLAT-TRIM: -

$

ई इ

Apply FLATTENING to pro duce a temp orary le which

includes, in addition to the user-sp eci ed output

attributes, one \holding" attribute from each missing

level along the hierarchical path in the input,

followed by

TRIMMING o the holding attributeघsङ from the

temp orary le to pro duce the desired output.

Note: The data contained in each branch of a tree is

indep endent of the data contained in other branches of that tree

Therefore, simultaneous attening of two or more branches of a

tree into a at form is not allowed.

Figure 13: The applicability of attening and of at-trim

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 267

घINङ

A B C X घGDङ घGFङ

घINङ is the input

D E F घGGङ

G H

TRIMMING out B and घGDङ

followed by FLATTENING

?

घTEMPङ

A C F G H

TRIMMING

?

घOUTङ is the user-sp eci ed output

घOUTङ

A C G H

Figure 14: Example of at-trim

घbङ Except for those comp onents participating in घaङ, corresponding pairs of

the output and input comp onents must maintain the same hierarchical relation-

ships घi.e. if A is the parentof B in the input, then A must b e the parentof B in

the outputङ. Furthermore, if a subform is carried from the input to the output,

it must b e carried in its entirety.

The metho d to determine whether an output is \stretchable" from the input

is shown in Figure 15.

Finally, consider घ5ङ and घ6ङ. When all four cases discussed ab ove fail, the

shap es of the data structures again come into consideration. Supp ose the output

is a branch and the source is either a branch form or all the comp onents necessary

for the creation of the output are in a single branch of the source tree. Then re-

structuring can b e accomplished by attening followed by stretching घsee Figure

16 for an exampleङ, or at-trim followed by stretching. Otherwise, hybridization

is necessary.

The strategy discussed so far is summarized in Figure 17.

In general, if the shap e of the output structure is ranked lower than that of

the input, one of the rst order op erations can b e applied. Hybridization may

b e required when b oth the output and the input are tree-shap ed. An example

requiring hybridization is given in Figure 18.

Hybridization is carried out in two stages, decomposition and synthesis.

At the decomp osition stage, the output data structure is decomp osed into its

268 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

Is output a branch and घeither the

yes input is at or all comp onents

आ उ

- Stretchable required for pro ducing output are

at ro ot level of inputङ?

ई इ

? no

Do es expansion of hierarchical levels no

उ आ

Not Stretchable in output o ccur from and only from the -

ro ot level attributes of the input?

ई इ

? yes

For all comp onents in the output

that are not at the input ro ot level:

no घ1ङ Is the hierarchical relationship

among output comp onents the same

उ आ

Not Stretchable - as those of corresp onding input

comp onents?

ई इ

? yes

no

घ2ङ Comp onents within any subform in

उ आ

Not Stretchable - output the same as those in the

corresp onding subform in input?

ई इ

? yes

उ आ

Stretchable

ई इ

Figure 15: Determining the applicability of stretching

subtrees घi.e., subformsङ. Each of the subtrees concatenated with the attributes

along the hierarchical path is then treated as an intermediate le and compared

with its counterpart in the input. These subtrees are comp onents of the desired

output, and thus have simpler structures घthan that of the outputङ. The data

structures of these intermediate les may also b e simpler than the input, in which

case the rst order op erations can b e applied. If hybridization is again required

for the pro duction of any of the intermediate les, then decomp osition is applied

recursively. Eventually, the Restructurer succeeds in marking all intermediate

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 269

घBRANCH1ङ

A B C घGDङ

Source Form

D E

FLATTENING

?

घTEMP1ङ

A B C D E

STRETCHING

?

Output Form

घBRANCH2ङ

A D घGBङ

B C E

Figure 16: Restructuring of one branch form to another by FLAT-STRETCH

les with rst order op erations घunless there is no solution, in which case the

user is given a message and the pro cess terminatesङ.

The synthesis stage of hybridization b egins when the decomp ositional anal-

ysis is complete. Essentially, the marked op erations are applied to pro duce the

intermediate les from the input, in a top-to-b ottom, left-to-right order. The

intermediate les are then grafted pairwise, in a b ottom-up, right-to-left order,

until the desired output is pro duced. Grafting twointermediate les is the same

as grafting two source घinputङ les. Details of the grafting op eration are discussed

in the following subsection.

270 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

OUTPUT FORM INPUT FORM

Flat Branch Tree

Flat TRIMMING TRIMMING if all required input comp onents

are at the ro ot level;

otherwise FLATTENING or FLAT-TRIMघsee Figure 13ङ

Branch STRETCHING TRIMMING if trimmable by Figure 12;

otherwise STRETCHING if stretchable by Figure 15

otherwise FLATTENING or FLAT-TRIM घsee

Figure 13ङ followed by STRETCHING

घi.e. FLAT-STRETCH or FLAT-TRIM-STRETCHङ

Tree HYBRIDIZATION TRIMMING if trimmable by Figure 12;

otherwise STRETCHING

if stretchable by Figure 15;

otherwise HYBRIDIZATION

Figure 17: Summary of mapping strategy for one input

6.2 Transformation Strategy When There Are Exactly Two Inputs

Whenever there are two inputs, a grafting is required. A grafting op eration

ties two hierarchies side by side by matching the values of the corresp onding

match attributes of the inputs. Except for the match attributes of the second

input, the structure of each input is carried in its entiretyinto the output. No

restructuring of any part is allowed during the grafting op eration. Thus, in case

the comp onents from two source forms are intertwined in the desired output, it is

necessary to use an intermediate form as the output of the grafting. A trimming

is then applied to the intermediate form to rearrange the juxtap osition of the

comp onents according to the user's sp eci cation.

Aside from the fact that grafting may require the creation of an intermediate

form as discussed ab ove, there are two rules that the grafting op eration must

observe:

घ1ङ The match attributes must b e at the ro ot level of b oth inputs.

घ2ङ The match attributes must app ear at the ro ot level of the output.

Consider घ1ङ. If a match attribute is not at the ro ot level of an input, then a

attening op eration must b e p erformed b efore the grafting.

Now, consider घ2ङ. Two cases arise:

घ2aङ All matching attributes app ear at the ro ot level of the output.

घ2bङ Some matching attributes are not at the ro ot level of the output.

In the case of घ2aङ, the Restructurer pro ceeds according to Figure 19. Brie y,

the output data structure is decomp osed into two subparts घeach corresp ond-

ing to one of the input formsङ, with match attributes included in b oth of the

subparts. Each of the subparts is then compared with its corresp onding source

le. If the structures are the same, then the source le is used directly as in-

put घcalled the Graft-source in Figure 19ङ to the grafting op eration. Otherwise,

the subpart is treated as the appropriate input to the grafting op eration, and a

transformation from the given source into this subpart is necessary. The prob-

lem is reduced to creating an intermediate le from only one input, and the

metho d discussed earlier घin Section 6.1ङ is followed. A grafting is p erformed

when the two Graft-sources are ready. An example is shown in Figure 20 with

DIRECTRY and DEP as the given input and RESDIR as the desired output.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 271

घINङ

घTEMP4ङ

A A A A घF ङ घF ङ

1 2 3 4 1 2



A A A A B B C घF ङ

1 2 3 4 1 2 1 3

C C

2 3

TRIMMING

STRETCHING TRIMMING

?

?

घTEMP3ङ

घTEMP2ङ

A A घF ङ घF ङ

1 2 1 2

A A घF ङ

1 2 4

B B C घF ङ

1 2 1 3

A A

3 4

C C

2 3

घTEMP1ङ

A A घF ङ घF ङ घF ङ

1 2 4 1 2

GRAFTING -

A A B B C घF ङ

3 4 1 2 1 3

C C

घmatchon A ;A ङ

2 3

1 2

STRETCHING

?

घOUTङ

A घF ङ

1 5

A घF ङ घF ङ घF ङ

2 4 1 2

A A B B C घF ङ

3 4 1 2 1 3

C C

2 3

Figure 18: Restructuring of घINङ to घOUTङ byhybridization

In the case of घ2bङ, when some match attributes are not at the ro ot level

of the output, it is necessary to construct an intermediate le with all match

attributes app earing at the ro ot level so that a grafting op eration can b e applied.

Take Figure 21 for example. In order to pro duce NEWDIR घthe desired outputङ

from DEP and DIRECTRY les, an intermediate le घTEMP2ङ is constructed as

the output of grafting. Notice that the match eld ENO in the intermediate le

घTEMP2ङ is at the ro ot level. The problem is now reduced to grafting where all

match attributes app ear at the ro ot level of the output. The metho d describ ed

earlier घFigure 19ङ is followed. A stretching is then applied to transform the

output of the grafting घi.e., the intermediate leङ into the nal desired form.

272 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घ1ङ Decomp ose output structure

घaccording to the sourcesङ into two

subparts, with match attributeघsङ

included in b oth subparts.

?

घ2ङFor i = 1 to 2:

yes

Is Subpart.i identical with

its corresp onding Source.i ?

? no

?

Follow Section 5.1 to

transform Source.i to Subpart.i

?

?

Graft-source.i = Subpart.i

?

घ3ङ Do es p ositioning of

yes

comp onents in targeted Graft-source.1 Graft-source.2

-

output agree with p ositioning

in corresp onding sources ?

no ?

apply De ne a temp orary le, TEMP,

GRAFTING by concatenating structures of

Graft-source.1 and Graft-source.2

Graft-source.1 Graft-source.2 ?

apply GRAFTING ?

?

apply TRIMMING

-

TEMP Desired output

to alter p ositioning

Figure 19: Grafting where all matching attributes app ear at ro ot level

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 273

घDEPङ

घDIRECTRYङ

DNO MGR DIV घEMPङ

ENO NAME PHONE

ENO JC

Note that the match attribute ENO is not

at the ro ot level of the input घDEPङ, but FLAT-TRIM

do es app ear at the ro ot level of घRESDIRङ,

the desired output. ?

घTEMP1ङ

DNO ENO

GRAFTING घmatch on ENOङ

?

घTEMP2ङ

ENO NAME PHONE DNO

? TRIMMING

घRESDIRङ

ENO NAME DNO PHONE

Figure 20: Pro ducing घRESDIRङ from घDIRECTRYङ and घDEPङ

6.3 Transformation Strategy When There Are More Than Two

Inputs

The preceding subsection घ6.2ङ discussed the transformation strategy when there

are exactly two source les. In case there are more than two, a gradual pairwise

build-up can b e employed. For example, TFORM can b e created from S1, S2,

S3, S4 and S5 as shown in Figure 22.

Since the basic concepts underlying this approachhave already b een covered,

no further discussion is necessary.

274 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

घDIRECTRYङ

घDEPङ

ENO NAME PHONE

DNO MGR DIV घEMPङ

ENO JC

FLATTENING

?

घTEMP1ङ

DNO MGR DIV ENO JC

GRAFTING घmatch on ENOङ

?

घTEMP2ङ

DNO MGR DIV ENO JC NAME PHONE

STRETCHING

?

घNEWDIRङ

DNO MGR DIV घEMPङ

ENO JC NAME PHONE

Note that the match attribute ENO is not at the

ro ot level of घNEWDIRङ, the desired output.

Figure 21: Pro ducing घNEWDIRङ from घDIRECTRYङ and घDEPङ

7 Implementing The Plan

As so on as the strategy for transforming the designated input into the desired

output is completed, the second stage of automatic restructuring b egins. The

main function of the second stage is to pro duce an executable program to carry

out the sequence of basic op erations determined in the rst stage. In other words,

the main function of the rst stage is planning and of the second stage con-

structing.

The program constructed by the Restructurer consists of घ1ङ a declarative

part which de nes in target language the data structures of all the les involved

in the op eration, and घ2ङ an imp erative part which sp ells out the co de sequence

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 275

r r r r r S1 S2 S3 S4 S5

@ @

@

@ r r S12 S34

@

@

@

@ r S1234

H

H

H

H r TFORM

Figure 22: The computation of TFORM

necessary to p erform the particular op erations at hand. The declarative part of

the pro cedure is pro duced from information contained in the comp onent descrip-

tion tables. घThe comp onent description tables describing inputs are fetched from

a catalog of prede ned forms and those describing intermediate and output forms

are prepared and placed in the catalog at the end of the co de generation phase,

i.e., the constructing stageङ. The imp erative part is patterned after program

mo dels designed for the basic op eration typ es trimming, attening, stretching,

and grafting.

The goal of the Automatic Restructurer is to pro duce an ecient program

tailored for the particular situation at hand. Some eciency considerations are

taken by the Restructure at the planning stage. For example, among the rst

order op erations discussed in Section 6.1, the p ossibility of applying a trimming

op eration is considered b efore all others. Having a trimming op eration as the

rst of a sequence of op erations serves two purp oses: घ1ङ unwanted instances

are ltered out as early as p ossible घthus reducing the numb er of instances to b e

pro cessed by subsequent op erationsङ and घ2ङ unwanted comp onents are trimmed

o as early as p ossible घthus reducing the width of the forms to b e pro cessed by

subsequent op erationsङ.

The main concern at the constructing stage is to make sure that, regardless

of the op eration typ e, the generated co de घi.e., the procedure ङ for each op eration

7

passes over data once and only once. The goal is to p erform necessary functions

with minimum scanning and movement of data. The criterion of passing over

data only once for each op eration causes no problem for trimming and atten-

ing. For these op erations, one can always fetch a record, pro cess it, and go back

to fetch another one. For the other twotyp es of op erations, one pass over the

data is p ossible only when a certain sort order घor simply orderङ requirementis

8

enforced. . In the case of stretching, input must b e sorted according to the elds

which form the extended hierarchical level in the output. In the case of grafting,

b oth inputs must be sorted on the match attributes घin the same ordering se-

quence and the same ascending/descending directionsङ. Thus, b efore pro ducing

7

This concern was discussed in [32 ]

8

Referring to the rst order op erations discussed in Section 6.1, since attening do es

not require a sorted order while stretching do es, the applicability of attening is

examined b efore the applicability of stretching

276 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

a pro cedure for a stretching or grafting op eration, the order requirement for the

op eration is examined and compared with the sequencing order of the input घif

knownङ. If they are not compatible, or the order information on the input is not

available, then a sort op eration is inserted to enforce the order requirement.

As mentioned ab ove, the unavailability of the order information could mean

an unnecessary insertion of a SORT op eration घe.g., an intermediate le may

already be in the required sort order for the next op erationङ. Therefore, the

order information for all intermediate forms is recorded whenever p ossible. It

is easy to see that the trimming and attening op erations are order preserving

if घ1ङ the input to the op eration is sorted and घ2ङ all the order elds in the

input are extracted and placed in the output directly घwithout the side e ects

of arithmetic or string functionsङ. The Restructurer detects these situations and

propagates the order information accordingly. In the case of the stretching and

grafting op erations, the resulting order of the output is determinable once the

order requirement on the input is enforced. Thus, the Restructurer is able to

record the order information for the output of these op erations.

When the desired ordering of the user-sp eci ed output is known, the Restruc-

turer makes the enforced ordering compatible with the desired ordering whenever

p ossible. If that is not feasible, then a nal SORT घand/or a sort within par-

entङ is included in the generated program. In this manner, the transformation

strategy determined in the rst stage is implemented in the second stage as an

executable program घconsisting of a set of prop erly sequenced pro cedures and

SORT commandsङ to carry out the plan.

Let B be a nite set of instantiated forms. The notion of B -computation

intro duced next is intended as a formalization of the automatic restructuring

pro cess.

A sequence of instantiated forms घI ;F ङ;:::;घI ;F ङ is said to be a B -

1 1 n n

computation of length n if for every i, 1 ऊ i ऊ n, one of the following cases

o ccurs:

1. घI ;F ङisinB ;

i i

2. there exists an integer j ,1 ऊ j

i i j j

by stretching, attening, or trimming; or

3. there exist integers j ;j ,1ऊ j 6= j

1 2 1 2 i i

घI ;F ङ and घI ;F ङby grafting.

j j j j

1 1 2 2

If घI ;F ङ;:::;घI ;F ङisaB -computation and घI ;Fङ=घI ;F ङ, then the in-

1 1 n n n n

stantiated form घI ;Fङ is said to b e B -computable.

Let F be a nite set of forms. A form G is said to be F -compatible if for

every ए in Paths घGङ, there is a subset fF ;:::;F g of F and a set of paths

1 n

fए ;:::;ए g such that ए in Paths घF ङ for 1 ऊ i ऊ n and

1 n i i

[

fBAघए ङj1 ऊ i ऊ ng: BAघए ङ ई

i

If G = F 1 F , then G is fF ;F g-compatible by the de nition of grafting.

1 gr 2 1 2

Similarly, direct applications of the de nitions of the remaining basic restructur-

ing op erations show that if G is obtained from F by either stretching, attening,

1

or trimming, then G is fF g-compatible.

1

Tointro duce the notion of solution we need the following concept. A nite

0 0

set of forms F is directly solvable if BAघF ङ \ BAघF ङ=FreeघF ङ \ FreeघF ङ 6= ;

0 0

for all distinct forms F and F , F 6= F in F .

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 277

Note that every set F with exactly one form is directly solvable. Furthermore,

F is directly solvable if F = fF ;F g and F 1 F is de ned.

1 2 1 gr 2

Prop osition 7. Let F = fF ;:::;F g, n ऋ 2,be a directly solvable set of forms.

1 n

S

Then G =घघF 1 F ङ 1 ࣽࣽࣽङ 1 F is de ned, BAघGङ= fBAघF ङj1 ऊ

1 gr 2 gr gr n i

S

i ऊ ng, and FreeघGङ= fFreeघF ङj1 ऊ i ऊ ng.

i

Proof. The argument is by induction on n. The case n = 2 is obvious. Now

supp ose that the statement holds for directly solvable sets of forms with fewer

than n elements. Let F = fF ;:::;F g be a directly solvable set of forms.

1 n

0 0

Clearly, F = F fF g is a directly solvable set of forms, G = घघF 1

n 1 gr

S

0

F ङ 1 ࣽࣽࣽङ 1 F is de ned, BAघG ङ = fBAघF ङj1 ऊ i ऊ n 1g, and

2 gr gr n1 i

S

0

FreeघG ङ= fFreeघF ङj1 ऊ i ऊ n 1g. Then

i

[

0

BAघF ङ \ BAघG ङ=BAघF ङ \ fBAघF ङj1 ऊ i ऊ n 1g

n n i

[

= fBAघF ङ \ BAघF ङj1 ऊ i ऊ n 1g

n i

[

= fFreeघF ङ \ FreeघF ङj1 ऊ i ऊ n 1g

n i

[

= FreeघF ङ \ fFreeघF ङj1 ऊ i ऊ n 1g

n i

0

= FreeघF ङ \ FreeघG ङ:

n

0

Thus, the grafting b etween G and F is de ned and the desired form G = F 1

n n gr

0 0

G can be constructed. Since BAघGङ = BAघF ङ [ BAघG ङ and FreeघGङ =

n

0

FreeघF ङ [ FreeघG ङ, the inductivehyp othesis implies the desired conclusion.

n

Theorem 8. Let F = fF j1 ऊ i ऊ ng bea nite, directly solvable set of forms

i

and F beanF -compatible form. Then for every instantiation B = fघI ;F ङj1 ऊ

i i

i ऊ ng of F there exists a B -computable instance घI ;Fङ of F such that

K घI ;Fङ=ँ घ1 fK घI ;F ङj1 ऊ i ऊ ngङ:

i i

BAघF ङ

Proof. Supp ose that F is a branch. Let F be the form obtained from F by

attening. Since BAघF ङ = BAघF ङ, the F -compatibility of F implies the F -

compatibilityof F .

Let G = घघF 1 F ङ 1 ࣽࣽࣽङ 1 F be the form whose existence was

1 gr 2 gr gr n

established in Prop osition 7. Let B = fघI ;F ङ;:::;घI ;F ङg b e an instantiation

1 1 n n

of F . The corresp onding instance घJ ;Gङ is given by

घJ ;Gङ=घघI ;F ङ 1 घI ;F ङ 1 ࣽࣽࣽङ 1 घI ;F ङ:

1 1 gr 2 2 gr gr n n

Therefore

K घJ ;Gङ=K घI ;F ङ 1 ࣽࣽࣽ 1 K घI ;F ङ

1 1 n n

F by a rep eated application of the Grafting Theorem. The F -compatibilityof

implies BAघF ङ ई BAघGङ, so there is an instance घI ; F ङofF such that

I = K घI ; F ङ=ँ घK घI ;F ङ 1 ࣽࣽࣽ 1 K घI ;F ङङ:

1 1 n n

BAघF ङ

278 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

The branch घI ;Fङ can be obtained from घI ; F ङ by stretching. The Stretching

Theorem implies

K घI ;Fङ=ँ घK घI ;F ङ 1 ࣽࣽࣽ 1 K घI ;F ङङ;

1 1 n n

BAघF ङ

as needed. The argument also shows that घI ;FङisB -computable by a sequence

of restructuring op erations involving attening, grafting and stretching.

Now supp ose that F is a tree. The branches G ;:::;G of F can b e extracted

1 `

by trimming. The F -compatibilityofF implies the F -compatibilityofeachof

its branches. By the previous argument, for every instantiation B of F and for

every integer k ,1ऊ k ऊ `, there is an instance घJ ;G ङofG such that

k k k

K घJ ;G ङ=ँ घ1 fK घI ;F ङj1 ऊ i ऊ ngङ:

k k i i

BA घG ङ

k

The grafting of the instances घJ ;G ङ yields an instance of घI ;Fङ. By the Graft-

k k

ing Theorem we then have

K घI ;Fङ=K घJ ;G ङ 1 ࣽࣽࣽ 1 K घJ ;G ङ

1 1 ` `

= ँ घ1 fK घI ;F ङj1 ऊ i ऊ ngङ 1 ࣽࣽࣽ

i i

BAघG ङ

1

1 ँ घ1 fK घI ;F ङj1 ऊ i ऊ ngङ

i i

BAघG ङ

`

= ँ घ1 fK घI ;F ङj1 ऊ i ऊ ngङ;

i i

BAघF ङ

which gives the desired equality for trees.

The existence of the B -computation for घI ;Fङ is implicit in this argument.

Sp eci cally, this computation extends the computations of the instances of the

branches of F and grafts these instances to obtain an instance of F .

The instantiated form घI ;Fङ in Theorem 8 will b e referred to as an F -direct

solution of B , or simply as a direct solution of B when F is clear from the context.

The previous theorem shows that if F is a directly solvable set of forms and F

is an F -compatible form, then for every instantiation B of F , the Restructures

can pro duce a direct solution.

We now extend the notion of direct solution. Let B be a nite set of in-

stantiated forms and F a form. An F -solution of B is de ned inductively as

follows:

1. Every instantiated form घI ;FङinB is a solution of B .

2. Every direct solution of B is a solution of B .

3. If घI ;F ङ;:::;घI ;F ङ are solutions of B , fF ;:::;F g is a directly solvable

1 1 n n 1 n

set of forms and F is fF ;:::;F g-compatible, then every F -direct solution

1 n

घI ;FङoffF ;:::;F g is a solution of B .

1 n

We shall refer to an F -solution of B as a solution of B when F is understo o d

from the context.

The next theorem shows that the Restructurer pro duces a solution if and

only if a solution exists.

Theorem 9. Let B be a nite set of instantiated forms.

If घI ;Fङ is a B -computable form, then घI ;Fङ is a solution of B .

Conversely, if B is a nite set of forms and F is a form such that thereisan

F -solution of B , then there exists a B -computation that generates an instantiated

form घI ;Fङ.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 279

Proof. Supp ose that B is a nite set of instantiated forms and घI ;Fङ is a B -

computable form. Then there exists a B -computation घI ;F ङ;:::;घI ;F ङ such

1 1 n n

that घI ;Fङ=घI ;F ङ. We proveby induction that घI ;Fङ is a solution of B .

n n

If n = 1 then घI ;Fङ b elongs to B by de nition. Supp ose that the theorem

holds for sequences of length less than n. Several sub cases of the inductive case

need to b e considered, dep ending on the basic restructuring op eration applied

in constructing घI ;F ङ.

n n

Supp ose that घI ;F ङ =घI ;F ङ 1 घI ;F ङ, where j ;j

n n j j gr j j 1 2

1 1 2 2

ङ are solutions of B . The de nition ;F ङ and घI ;F inductivehyp othesis, घI

j j j j

2 2 1 1

of the grafting op eration implies that fF ;F g is directly solvable, and that

j j

1 2

घI ;F ङisanF -direct solution of fघI ;F ङ; घI ;F ङg.We leave to the reader

n n n j j j j

1 1 2 2

the argument of the remaining sub cases when घI ;F ङ is obtained from घI ;F ङ,

n n j j

j

Conversely, supp ose that घI ;Fङ is a solution of B . If घI ;Fङ b elongs to B ,

then घI ;Fङ is clearly B -computable. Also, if घI ;Fङ is a direct solution of B , then

Theorem 8 implies that घI ;Fङ is B -computable. Finally, supp ose that घI ;Fङ

0

is a direct solution of the set B = fघI ;F ङ;:::;घI ;F ङg, where घI ;F ङ is a

1 1 n n i i

solution of B for 1 ऊ i ऊ n, fF ;:::;F g is a directly solvable set of forms,

1 n

and F is fF ;:::;F g-compatible. Supp ose that each instantiated form घI ;F ङ

1 n i i

0

is B -computable. Theorem 8 implies that घI ;Fङ is B -computable. Therefore,

0

by concatenating the n B -computations of घI ;F ङ;:::;घI ;F ङ with the B -

1 1 n n

computation of घI ;Fङ one obtains a B -computation of घI ;Fङ, so घI ;Fङ is B -

computable.

8 Summary

Most information-pro cessing applications involve data restructuring of various

degrees. The e ort of writing application programs can b e substantially reduced

when the e ort of pro ducing data restructuring co de is carried out automatically.

This pap er discusses the Automatic Restructurer, a mechanism that relieves the

user from the tedium of writing a substantial amount of co de.

The Restructurer was implemented at the IBM Los Angeles Scienti c Center

as part of an application development system called FORMAL [34] The capabil-

ity of generating an executable program for the desired transformation dep ends

mainly on the unambiguous descriptions of input and output data structures.

The pro cess of generating co de is accomplished in two stages. In the rst, the

goal of an application is recognized, the di erences b etween the input and output

data structures examined, and the applicabilities of the rules and metho ds for

transformation analyzed. The result is a plan for converting the input into the

output. In the second stage, construction b egins. Emb edded knowledge of the

target system is utilized to implement the plan eciently. The result is a tailored

executable program capable of transforming the input to the desired output.

Given this automatic data restructuring capability, complicated information-

pro cessing applications can be sp eci ed in a simple and truly non-pro cedural

fashion [34, 36] and the non-trivial task of data restructuring can be accom-

plished in a systematic manner.

We intro duce the notion of a solution of a set of instantiated forms with

resp ect to an output form. This allows us to characterize those instantiated

forms that can b e obtained as a result of recasting the information contained in

280 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

a set of instantiated forms. Furthermore, we prove that if such a solution exists,

then the Restructurer will always pro duce a solution. घIf the Restructurer states

that it is unable to restructure data contained by the set of forms, then no

solution exists for that set of forms with resp ect to the desired output form.ङ

9 Acknowledgements

S. Ginsburg's contribution was supp orted in part as a consultant to the IBM

Los Angeles Scienti c Center and in part by the National Science Foundation

under grant CCR-8618907.

John Kepler and Jim Jordan's management supp ort is very much appreci-

ated. The authors are indebted to Dr. Marc Gyssens for helping to clarify the

original incorrect formulation of the basic problem.

References

[1]. Abiteb oul, S. and Bidoit, N.: Non First Normal Form Relations to Represent Hi-

erarchical ly Organized Data, Pro c. ACM SIGACT-SIGMOD Symp osium on Prin-

ciples of Database Systems, April 1984, pp. 191-200.

[2]. Abiteb oul, S. and Bidoit, N.: Non First Normal Form Relations: AnAlgebraAl-

lowing Data Restructuring, Journal of Computer and System Sciences, 1986, vol.

33, pp. 361-393.

[3]. Abiteb oul, S. and Hull, R.: Restructuring Hierarchical Database Objects, INRIA

Rapp orts de Recherche No. 615, 1987, To app ear in Theoretical Computer Science.

[4]. Atkinson, M.P., and Buneman, O.P.: Types and Persistence in Database Program-

ming Languages,ACM Computing Surveys, 1987 vol. 19 घ2ङ, pp. 105-190.

[5]. Barr, A. and Feigenbaum, E.A.: The Handbook of Arti cial Intel ligence vol.2, 1982.

William Kaufmann, Inc.

[6]. Burnett M. B., Ambler, A. L.: Interactive Visual Data Abstraction in a Declarative

Programming Language, Journal of Visual Languages and Computing, 1994, vol.

5, pp. 29{60.

[7]. Carter, D.A., Collins, J.W. and Khandewal, V.K.: Data Conversion and Restruc-

turing at Australian Iron and Steel PTY LTD using XPRS, Australasian Share

Guide Meeting, July 1982.

[8]. Chang, S. K.: A Visual Language Compiler for Information Retrieval by Visual

Reasoning, IEEE Transactions on Software Engineering, 1990, v. 16, pp. 1136-

1149.

[9]. Colby,L.S.:ARecursive Algebra and Query Optimization for NestedRelations, Pro-

ceedings of 1989 ACM SIGMOD Conference on Management of Data, 1989, pp.

273-283.

[10]. Dadam, P., Kuesp ert, F., Andersen, H., Blanken, H., Erb e, R., Guenauer, J.,

2

Lum., V., Pistor, P. and Walch, G.: A DBMS Prototype to Support Extended NF

Relations: An Integrated View on Flat Tables and Hierarchies, Pro ceedings of the

1986 ACM-SIGMOD Conference on Management of Data, pp. 356-357.

[11]. Deshpande, A., and Van Gucht, D.:An implementation for Nested Relational

Databases, Pro ceedings of the 14th VLDB Conference, 1988, pp. 76-87.

[12]. Fisher, P., Thomas, S.: Operators for Non-First-Normal-Form Relations, Pro-

ceedings of COMPSAC, 1983, pp. 464-475.

[13]. Guting, R.H., Zicari, R., and Chopy, D.M.: An Algebra for Structured Oce

Documents, IBM Research Rep ort RJ5559, 1987.

[14]. Hull, R. A Survey of Theoretical Research on Typed Complex Database Objects in

Databases घedited byJ.Paredaensङ, Academic Press, 1987, London, pp. 193-256.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 281

[15]. Hull, R. and Su, J.: On the Expressive Power of Database Queries with Inter-

mediate Types, Pro ceedings of the ACM Symp osium on Principles of Database

Systems, 1988, pp. 39-51.

[16]. Data Extraction, Processing and Restructuring System: De ne and CONVERT

Reference Manual, SH20-2178, Program Numb er 5796-PLH, 1979, IBM Corp ora-

tion.

[17]. Jaeschke, G. and Schek, H.J.: Remarks on the Algebra of Non First Normal Form

Relations, Pro c. ACM SIGACT-SIGMOD Symp osium on Principles of Database

Systems, 1982, pp. 124-138.

[18]. Jaeschke, G.: Nonrecursive Algebra for Relations with Relation-ValuedAttributes,

IBM Heidelb erg Scienti c Center Technical Rep ort TR 85.03.001, March 1985.

[19]. Jaeschke, G.: Recursive Algebra for Relations with Relation-Valued Attributes,

IBM Heidelb erg Scienti c Center Technical Rep ort, TR 85.03.002, March 1985.

[20]. Kitagawa, H., Kunii, T. and Ishii, Y., Design and Implementation of a Form

Management System APAD Using ADABAS INQ DBMS, Pro c. COMPSAC 81,

Nov.1981, pp. 324-334.

[21]. Kitagawa, H., Gotoh, M., Misaki, S. and Azuma, M.: Form Document Man-

agement System SPECDOQ - Its Architecture and Implementation, Pro c. of the

Second ACM Conf. on Oce Information Systems, June 1984, pp. 132-142.

[22]. Leitheiser, R.L., and Wetherb e, J.C.: Approaches to End-User Computing: Service

May Spel l Success, Journal of Information Systems Management, Winter 1986,

pp.9-14.

[23]. Luo, D., and Yao, S.B.: Form Operation by Example - A Language for Oce

Information Processing, Pro ceedings of SIGMOD Conference, 1981, pp. 213-223.

[24]. Mohan L., Kashyap, R. L.: A Visual Language for Graphical Interaction with

Schema-Intensive Databases, IEEE Transactions on Knowledge and Data Engi-

neering, 1993, 5, pp. 843-858.

[25]. Navathe, S.B. and Fry, J.P.: Restructuring for Large Databases: ThreeLevels of

Abstraction,ACM Trans. Database Systems, 1976, vol. 1 घ2ङ, pp. 138-158.

2

[26]. Pistor, P., and Andersen, F.: Designing a Generalized NF Model with an SQL-

TypeLanguage Interface, Pro ceedings of 12th Conf. on VLDB, 1986, pp. 278-288.

[27]. Ro ckart, J.F., and Flannery, L.S.: The Management of End User Computing,

Communications of the ACM, Oct 1983, vol.26 घ10ङ, pp. 776-784.

[28]. Roth, M.A., Korth, H.F., and Batory, D.S.: SQL/NF: A Query Language for

:1NF Relational Databases, Information Systems, 1987, vol. 12 घ1ङ, pp. 99-114.

[29]. Roth, M.A. and Korth, H.F., The Design of :1NF Relational Databases into

Nested Normal Form, Dept. of Computer Sciences, Univ. of Texas at Austin, Tech-

nical Rep ort TR-86-27, Dec 1986.

[30]. Roth, M.A., Korth, H.F., and Silb erschatz, A.: Extended Algebra and Calculus

for Nested Relational Databases, ACM Transactions on Database Systems, Dec.

1988, vol. 13 घ4ङ, pp. 389-417.

[31]. Shu, N.C., Housel, B.C. and Lum, V.Y.: CONVERT: A High Level Translation

De nition Language for Data Conversion, Communications of ACM, Oct 1975,

vol. 18 घ10ङ, pp 557-567.

[32]. Shu, N.C., Housel, B.C., Taylor, R.W., Ghosh, S.P., and Lum, V.Y.: EXPRESS:

A Data Extraction, Processing, and Restructuring System,ACM Trans. Database

Systems, June 1977, vol. 2 घ2ङ, pp. 134-174.

[33]. Shu, N.C., Lum, V.Y., Tung, F.C., and Chang, C.L.: Speci cation of Forms Pro-

cessing and Business Procedures for OceAutomation, IEEE Trans. on Software

Engineering, Sept. 1982, vol. SE-8 घ5ङ, pp. 499-512.

[34]. Shu, N.C.: FORMAL: A Forms-Oriented, Visual-Directed Application Develop-

ment System, IEEE COMPUTER, Aug. 1985, vol. 18 घ8ङ, pp. 38-49.

[35]. Shu, N.C.: Automatic Data Transformation and Restructuring, Pro ceedings of

IEEE Third Data Engineering Conference, Feb., 1987, pp. 173-180.

282 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

[36]. Shu, N.C.: Visual Programming, Van Nostrand Reinhold Company, Inc. 1988,

NewYork.

[37]. Shu, N.C.: A Visual Designed for Automatic Program-

ming, Pro c. of 21st Annual Hawaii International Conf. on Systems Sciences, Jan.

1988, pp. 662-671.

[38]. Siau, K. L., Chan, H.C., Tan, K. P.: Visual Know ledge Query Language, IEICE

Transactions on Information and Systems, 1992, E75-D, pp. 697{703.

[39]. So ckut, G.H. and Goldb erg, R.P.: Database Reorganization - Principles and Prac-

tice,ACM Computing Surveys, Dec 1979, vol.11 घ4ङ, pp. 371-395.

[40]. Swartwout, D.E., Depp e, M.E. and Fry, J.P.: Operational Software for Restruc-

turing Network Databases, Pro c. National Computer Conference, June 1977, pp.

499-508.

[41]. Tansel, A.U. and Garnett, L.:Nested Historical Relations Pro c. 1989 ACM SIG-

MOD Conference, 1989, pp. 284-293.

[42]. Thomas, S.J. and Fischer, P.C.: Operators for Non First Normal Form Relations,

Pro c. COMPSAC, 1983, pp. 464-475.

[43]. Ullman, J.D.: Principles of Database and Know ledge-Base Systems. Vol.I, Com-

puter Science Press, 1988, Maryland.

[44]. Valduriez, P.:Complex Objects in Relational Database Systems, Technology and

Science of Informatics, 1987, vol. 6घ8ङ, pp. 597-608.

A A Program to Pro duce घXMASLISTङ from घPERSONङ

The following is a CONVERT program [32] which pro duces घXMASLISTङ shown

in Figure 2 from घPERSONङ shown in Figure 1. The Automatic Restructurer is

able to generate this program from the FORMAL sp eci cation shown in Figure

4. घRecall that the SLICE and CONSOLIDATE op erations used in CONVERT

are equivalent to the attening and stretching op erations discussed in Section

4.ङ

/** PROCESS NAME = XMASLIST **/

FORM PERSON घ

ENO CHघ3ङ,

DNO CHघ2ङ,

NAME CHघ8ङ,

PHONE CHघ4ङ,

JC CHघ2ङ,

KIDSघ

KNAME CHघ8ङ,

AGE CHघ2ङ

ङ RG,

SCHOOLघ

SNAME CHघ9ङ,

ENROLLघ

YEARIN CHघ4ङ,

YEAROUT CHघ2ङ

ङ RG

घORDERED ON घENROLL.YEARIN ASCङङ

ङ RG,

SEX CHघ1ङ,

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 283

LOC CHघ2ङ

:ORDERED ON घDNO ASC,ENO ASCङ;

$P1F2 = SLICE घ

NAME

,LOC

,KNAME

,AGE

FROM PERSONङ;

FORM $P1F2 घ

NAME CHघ8ङ,

LOC CHघ2ङ,

KIDSघ

KNAME CHघ8ङ,

AGE CHघ2ङ

ङ NRG

ङ;

/* IT IS NECESSARY TO SORT $P1F2 ON CONCATENATED KEYS

OF OUTPUT FOR CONSOLIDATE OPERATION. */

FORM $P1F3 घ

NAME CHघ8ङ,

LOC CHघ2ङ,

KIDSघ

KNAME CHघ8ङ,

AGE CHघ2ङ

ङ NRG

:ORDERED ON घLOC ASC,KIDS.AGE ASCङ;

$P1F3 = SORT घ$P1F2 BY

LOC ASC

,AGE ASC

ङ;

XMASLIST = CONSOLIDATE घ$P1F3ङ;

FORM XMASLIST घ

LOC CHघ2ङ,

RECEIVERघ

AGE CHघ2ङ,

284 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

KIDघ

KNAME CHघ8ङ,

NAME CHघ8ङ

ङ RG

ङ RG

घKEY IS घRECEIVER.AGEङ,

ORDERED ON घRECEIVER.AGE ASCङङ

:KEY IS घLOCङ,

ORDERED ON घLOC ASCङ,

DISPOSITION IS घINTERNALङ;

END;

B Pro ofs of the Trim Prop osition and Theorem

In order to establish the Trim Prop osition, we rst prove:

Prop osition 10. Let G be a trimming of F by h.

घaङ Suppose B and C areinAttrib घF ङ, B = Par घC ङ, and hघB ङ and hघC ङ

exist. Then hघB ङ=Par घhघC ङङ, i.e., hघC ङ is an o spring of hघB ङ.

घbङ Suppose hघC ङ exists. Then hघB ङ exists for every ancestor B of C , and is

an ancestor of hघC ङ.

घcङ Suppose hघB ङ and hघC ङ exist, and C is a descendent of B . Then hघC ङ is

a descendent of hघB ङ.

घdङ Suppose hघC ङ is a descendent of hघB ङ. Then C is a descendent of B .

घAlternatively, if hघB ङ is an ancestor of hघC ङ, then B is an ancestor of C ङ.

Proof. Let G = hG ;:::;G i and F = hF ;:::;F i.

1 s 1 r

1 1

घaङ Supp ose hघC ङ=G. Then C = h घhघC ङङ = h घGङ=F घbyघTr 3 ङङ, so

B = Par घC ङ=ParघF ङ. This is a contradiction, since F has no parent. Thus,

hघC ङ 6= G. Therefore, Par घhघC ङङ exists. Hence,

1 1

h घPar घhघC ङङङ = Par घh घhघC ङङङघbyघTr 2ङङ

1

= Par घC ङ=B = h घhघB ङङ:

1

Since h is one-to-one, thus h is one-to-one, Par घhघC ङङ = hघB ङ.

घbङ Using induction and घaङ, it suces to show that hघB ङ exists for B =

Par घC ङ. Supp ose Par घhघC ङङ do es not exist. Then hघC ङ=G, since G is a form.

1 1

Hence, C = h घhघC ङङ = h घGङ=F . This contradicts the fact that C has a

parent. Thus, hघB ङ=hघPar घC ङङ = Par घhघC ङङ exists.

घcङ By घbङ, hघB ङ is an ancestor of hघC ङ, i.e., hघC ङ is a descendentofhघB ङ.

1

घdङ This statement follows by induction from the fact that h घPar घD ङङ =

1

Par घh घD ङङ for each prop er attribute D of G.

Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring 285

Trimming Prop osition. Let

G = hG ;:::;G i

1 s

b e a trimming of F = hF ;:::;F i by h. Let j b e such that hघF ङ exists. Then

1 r j

घaङ there exists k घj ङ such that hघF ङ=G .

j

k घj ङ

घbङ G is a basic attribute if and only if F is a basic attribute, in which

j

k घj ङ

case G = F .

j

k घj ङ

घcङ hघAttrib घF ङङ = Attrib घG ङ.

j

k घj ङ

घdङ G is a trimming of F by h घrestricted to AttribघF ङङ.

j j

k घj ङ

Proof. घaङ. Since F = Par घF ङ, hघF ङ=G = Par घhघF ङङ by घaङ of Prop osition

j j

10. By de nition, each o spring of G is one of the G . Hence, F = G for

i j

k घj ङ

some k घj ङ.

1

घbङ. By घTr 1ङ, G = h घG ङ=F if G is a basic attribute. Supp ose

j

k घj ङ k घj ङ k घj ङ

G is a form. Then there is a descendent C of G which is a basic attribute.

k घj ङ k घj ङ

1 1

By घdङ of Prop osition 10, C = h घC ङ is a descendentofh घG ङ=F . Since

j

k घj ङ

every ancestor of a basic attribute is a form, F is a form.

j

घcङ This follows immediately from घcङ and घdङ of Prop osition 10.

घdङ This follows from घcङ and घTr1ङ-घTr 3ङ.

Wenow show:

InsघF ङ

Trimming Theorem Let G b e a trimming of F by h. Then K घh घI ङङ =

ँ घK घI ङङ for each I in InsघF ङ.

BAघGङ

Proof. Supp ose F is a at form. Then for each f in I , f is a tuple over FreeघF ङ=

InsघF ङ

BAघF ङ and h घf ङ=ँ घf ङ. Hence,

BAघGङ

InsघF ङ InsघF ङ

K घh घI ङङ = K घfh घf ङjf is in Igङ

= K घfँ घf ङjf is in Igङ

BAघGङ

[

= ँ घ fK घf ङjf is in Igङ; since f is a tuple over BAघF ङ

BAघGङ

= ँ घK घI ङङ:

BAघGङ

InsघF ङ 0 0

0

Continuing by induction supp ose that K घh घI ङङ = ँ घK घI ङङ for

BAघG ङ

0 0 0 0

each subform F of F and each instance I of F , where G is the trimming of

0

F by h as guaranteed by घdङ of the Trim Prop osition. To extend the induction,

InsघF ङ

it suces to show that K घh घf ङङ = ँ घK घf ङङ for each f in Ins घF ङ.

1

BAघGङ

Let G = hG ;:::;G i and F = hF ;:::;F i.Two cases arise.

1 s 1 r

InsघF ङ InsघF ङ

घ ङ Supp ose e is in K घh घf ङङ. Now h घf ङ=g , where g is the func-

tion in Ins घGङ de ned by g घAङ=f घAङ for each A in FreeघGङ and g घG ङ=

1

k घj ङ

InsघhF iङ

j

h घf घF ङङ for each form F such that hघF ङ exists. By de nition, eघAङ=

j j j

286 Ginsburg S., Shu N.C., Simovici D.A.: Automatic Data Restructuring

g घAङ = f घAङ for each A in FreeघGङ and e over BAघG ङ, where G is a

k घj ङ k घj ङ

form, is a tuple in

K घg घG ङङ = ँ घK घf घF ङङङ

j

k घj ङ BA घG ङ

k घj ङ

InsघF ङ

j

= K घh घf घF ङङङ; by induction :

j

Thus, e is in ँ घK घf ङङ.

BAघGङ

घ ङ Supp ose e is in ँ घK घf ङङ. Then there exists d in K घf ङ such that

BA घGङ

e = ँ घdङ. By de nition, dघAङ=f घAङ for each A in FreeघF ङ and d over

BA घGङ

BAघF ङ, where F a form, is a tuple in K घf घF ङङ. Then eघAङ=dघAङ=f घAङ for

j j j

each A in FreeघGङ and, by induction, e over BAघG ङisin

k घj ङ

InsघF ङ

ँ K घf घF ङङ = K घh घf घF ङङङ

j j

BAघG ङ

k घj ङ

InsघF ङ

for each k घj ङ such that G is a form. Hence e is in K घh घf ङङ.

k घj ङ

C Pro of of Grafting Theorem

1 2

Grafting Theorem. Let G b e a grafting of E and F , and I and I b e instances

1 2 1 2

of E and F , resp ectively. Then K घI 1 I ;E 1 F ङ=K घI ;Eङ 1 K घI ;Fङ

gr gr

Proof. Let E = hE ;:::;E i, F = hF ;:::;F i, and let G = hG ;:::;G i b e the

1 q 1 r 1 s

1 2 1

grafting of the forms E and F .If ` in K घI 1 I ;E 1 F ङ, there is g in I 1

gr gr gr

2

I such that ` b elongs to K घg ङ. By the de nition of the grafting op eration, there

is e in I and f in I such that g घAङ = eघAङ = f घAङ for all A in FreeघE ङ=

1 2

0 0 0

FreeघF ङ, g घE ङ=eघE ङ for every direct descendent E of E that is a form and

0 0 0

g घF ङ=f घF ङ for every direct descendent F of F that is a form. We claim that

ँ घK घg ङङ = ँ घK घeङङ and that ँ घK घg ङङ = ँ घK घf ङङ.

BAघE ङ BAघE ङ BA घF ङ BA घF ङ

Indeed, K घg ङ is the set of all functions k over BAघGङ such that k घAङ=g घAङ for

all A in FreeघGङ and k over BAघG ङ घwhere G is a formङ is a tuple in K घg घG ङङ.

j j j

If G is a form that is a direct descendentofE ,say E , then k over BA घG ङisa

j j

iघj ङ

tuple in K घg घE ङङ. Thus, ँ घK घg ङङ is the set of all functions over BAघE ङ

iघj ङ BAघE ङ

such that k घAङ = eघAङ for all A in FreeघE ङ ई FreeघGङ and k over BAघG ङ

j

घwhere G is a direct descendent of E ङ is a tuple in K घg घE ङङ. Therefore,

j

iघj ङ

ँ घK घg ङङ = K घeङ. Note that K घg ङ=ँ घK घg ङङ 1 ँ घK घg ङङ =

BAघE ङ BA घE ङ BAघF ङ

1 2

ँ घK घeङङ 1 ँ घK घf ङङ, which implies ` in K घI ;Eङ 1 K घI ;Fङ.

BAघE ङ BAघF ङ

1 2 1 2

Conversely, let ` in K घI ;Eङ 1 K घI ;Fङ. There are e; f in I ; I , resp ec-

tively such that `घAङ=e घAङ for all A in BAघE ङ and `घB ङ=f घB ङ for all B

1 1

in BAघF ङ for some e in K घeङ and f in K घf ङ. Let g be de ned by g घAङ =

1 1

0 0

eघAङ = f घAङ for every A in FreeघGङ = FreeघE ङ = FreeघF ङ, g घE ङ = eघE ङ

0 0

for all direct descendents of E and g घF ङ = f घF ङ for all direct descendents

0 1 2

F of F . Clearly, g b elongs to I 1 I and K घg ङ = K घeङ 1 K घf ङ. Since ` in

1 2 1 2

K घeङ 1 K घf ङ= K घg ङ ई K घI ङ 1 K घI ङ, we obtain ` in K घI 1 I ;E 1 F ङ.

gr gr