<<

Recognition of via

Computer Using Baseline Structure

Richard Zanibbi

zanibbicsqueensuca

August

External Technical Rep ort

ISSN

Department of Computing and Information Science

Queens University

Kingston Ontario Canada KL N

Do cument prepared August

c

Copyright Richard Zanibbi

Abstract

The spatial structure of mathematical expressions may b e represented using

the structure of baselines in an This baseline structure may b e

represented as a tree and then rewritten to represent dierent interpretations

of spatial structure in an expression This structure tree may also b e trans

A

lated into mathematical string languages such as L T X Maple etc A parsing

E

algorithm which extracts baseline structure in mathematics expressions from

a list of attributed symb ols is presented along with an implementation of

the algorithm The implementation of the parsing algorithm has b een in

terfaced with an online mathematics entry system The resulting integrated

system allowed testing of the parser on hundreds of input expressions with

encouraging results Future work in the extension of the current mathematics

recognition technique and application of directionbased recognition to other

such as music notation and circuit diagrams are also discussed

Acknowledgments

Thanks to Dorothea Blostein who was a fun knowledgeable and supp ortive

sup ervisor Ed Lank for all his help with the design of DRACULAE and

the ideas in this thesis Nick Willan for his implementation of the user in

terface for DRACULAE and patiently listening to me work out many of

the ideas in this thesis at the white b oard for the record Nick named the

parsing application DRACULA and I simply added an E Ken Whelan

for his fantastic pro ofreading Gary Anderson who saved my life on many

an o ccasion and was consistently an amazing source of information useful

and entertaining Ho da Fahmy whose exp erience and insight have b een very

helpful and who raised some of the issues discussed at the end

of this thesis Jianping Wu for his help with course work and things C

Talib Hussein Laurie Ricker and all the other students and professors who

help ed me survive my MSc in various ways

Thanks to Genarro Costagliola who provided what for me was a highly

inuential draft pap er on recognizing a subset of mathematics notation using

a p ositional grammar and some valuable comments

Thanks to Steve Smithies Kevin Novins and Jim Arvo for letting me use

FFES to develop DRACULAE Without FFES it would have b een much

harder to develop and test DRACULAE

Thanks to Debby Linda Irene Sandra Tom Bradshaw and Gary Powley

for all the help with the hard thankless stu that keeps things going and

gets them done

Finally thanks to Katey for listening to my rambles ab out this stu over

and over again and for b eing a fabulous pro ofreader wife and friend

Chapter

Intro duction

In certain domains the ability to use visual languages to communicate with

computers allows simpler and more succinct expression of a users intentions

than is p ossible with a string language For example it is easier for a user

to draw a complicated mathematical expression in mathematics notation

a visual language than to write the same expression in a string language

A

such as L T X or Maple Due to the sometimes more succinct and intuitive

E

expressions available in visual languages in comparison to string languages

disciplines such as music architecture and engineering commonly use visual

languages to convey information in the form of diagrams

Informally a visual language is comprised of sets of symb ols laid out

in two or three dimensions diagrams which comprise the of legal

expressions under a syntactic and semantic denition of the visual language

The principal dierence b etween a visual language and a string language is

that the syntax of a visual language is of higher dimensionality As a result

b o okkeeping is required for spatial relationships b etween the terminals and

nonterminals of the language when parsing

A visual language may b e dened as in the following

Graphical Primitive A graphical primitive is a line or dot in

an image

Symb ol A symb ol is dened by a nonempty set of spatial rela

tions on graphical primitives which are intended to b e p er

ceived as a single unit

Diagram A diagram is a two or three dimensional collection

set of symb ols

Intro duction

Visual Language A visual language is a set of diagrams which

are considered valid expressions in the dened language

Both the syntax and semantics of a visual language are de

termined through the spatial relationships b etween symb ols

in a diagram based on description in

Currently there is ongoing research into the design and development of

visual programming languages including compilercompilers for these lan

guages The diagrams of a visual programming language may b e

translated to an executable form using a visual language compiler Spread

sheets are the most common and p erhaps also the most successful typ e of

visual programming language to date

Some visual languages such as math notation music notation and engi

neering drawings are not formally dened and have dialects ie notational

practices which are not standard but accepted We call these visual lan

guages natural visual languages and dene them as in the following

Dialect A dialect of a visual language is a variant of a visual

language which employs symb olic andor spatial conventions

which are not standard in all other variants of the visual

language

Natural Visual Language A natural visual language is a vi

sual language for which no formal description exists andor

dialects of the visual language exist

Mathematics notation is the natural visual language on which we fo cus

in this thesis For the remainder of this thesis when mathematics nota

tion is used it is the Western standard read lefttoright which is b eing

indicated for an interesting comparison see for a discussion of recogniz

ing Arabic mathematical notation which is read righttoleft Mathematics

notation is not formally dened we usually assume that mathematics no

tation includes notations for algebra calculus and a numb er of other

mathematical disciplines Mathematical discourse also often involves the use

of dened notation and symb ols often have dierent semantic interpreta

tions eg fb may b e interpreted as application or as implied

or may b e dened notation pro ducing dialects

Natural visual languages such as mathematics notation have b oth hard

and soft conventions Hard conventions are those asp ects of a visual lan

guage that are consistent across dialects Soft conventions may or may not

Intro duction

apply to a particular diagram dep ending on the dialect used when a diagram

was created An example of a hard convention in mathematics notation is

the lefttoright direction of interpretation Soft conventions in mathematics

notation include layout of symb ols

The lack of formal denition for natural visual languages necessitates

identication of hard and soft constraints b efore a language denition may

b e constructed Without formal denitions it is dicult and p erhaps im

p ossible to pro duce a completely accurate list of the two sets of conventions

for any natural visual language as these have to b e obtained largely through

observation and introsp ection Once the conventions of the language are felt

to b e reasonably well understo o d and classied as hard or soft one may set

ab out dening a syntax sp ecifying legal symb ol placement in diagrams and

the semantic interpretation of this set of legal diagrams Again the lack of

formal denition prevents this language denition from b eing complete in

many cases

In the case of mathematics notation some general descriptions of the

structure of the notation are available from typ esetting pro cedures

a history of the evolution of mathematics notation and a history of

mathematics Ca joris history of notation unfortunately fo cuses on the

evolution of symb ols rather than on the use of space b etween symb ols In

addition the typ esetting references describ e notation from the p ersp ective

of generating mathematics notation and do not explicitly state the hard and

soft conventions of the notation

Dialects complicate syntactic and semantic denition due to the resulting

multiple interpretations that are p ossible For example in mathematics no

tation it is imp ossible to determine whether cos should b e interpreted as a

function or implied multiplication of variables without a prior decision ab out

how to interpret this letter grouping ie cho osing a soft convention This

is an example of how interpreting visual language dialects requires a priori

information ab out the intended dialect of interpretation b efore the intended

syntax and semantics can b e obtained from a diagram If this information

is available the appropriate soft conventions may b e adopted to prop erly

recognize a diagram

Intro duction

The Visual Language Recognition Pro

cess

Figure shows the pro cess of recognizing visual languages by computer

and the four general typ es of diagram representation usable by a computer

lab eled Typ es IIV Each of these diagrammatic representations is explained

further b elow

Visual Language Editor - Allows direct input of symbols TYPE II: Attributed List via mouse, data tablet and keyboard. TYPE IV: Semantic Representation Parsing

TYPE I: Pixel Map Symbol Recognition a. Handwritten, Noisy Semantic Evaluation b. Handwritten, Noise-Free TYPE III:Structural Representation c. Typeset, Noisy * d. Typeset, Noise-Free

Visual Language Definition Natural Visual Language Conventions - Syntax - Soft Conventions (dialect dependent) - Semantics - Hard Conventions (dialect independent)

LEGEND: Conversion of Diagram Representation Definitions Influencing a Representation or other Definitions User-Specified Symbol List * Structure Analysis of Pixel Map and Partitioning of Graphical Primitives

in Pixel Map by Structural Representation

Figure The Visual Language Recognition Pro cess

Diagram Representation Typ e I Pixel Map

Initial input is obtained either via a pixel map or an online visual language

editor An online editor has the advantage of eliminating symb ol

as a symb ol recognition step is unnecessary A user may either directly

select symb ols from a list or a facility is provided for correction of symb ol

recognition errors by the user b efore passing the resulting attributed symb ol

list for parsing as done in

Two issues concerning the use of pixel map representations are noise and

manner of pro duction Noise in a pixel map is any additional pixel informa

The Visual Language Recognition Pro cess

tion which is not part of the diagram itself For instance pixels in a pixel

map added through smudges andor spurious dots present on scanned hard

copy or added by the scanning pro cess are considered noise Also typ eset

input is more consistent in terms of b oth layout and symb ol comp osition

than handwritten input Noisefree handwritten and typ eset inputs are gen

erally pro duced online that is a user creates the pixel map on the computer

itself Two examples of noisefree typ eset pixel map are p ostscript images

A

pro duced using L T X and pixel maps created using a mouse or data tablet

E

in a drawing application

Diagram Representation Typ e I I Attributed Sym

b ol List

An attributed symb ol list gives for each symb ol a name and set of additional

attributes which may contain co ordinates indicating the relative p osition

of the symb ol in the diagram Often this co ordinate information is given

in the form of b ounding b ox co ordinates for twodimensional notations A

b ounding b ox is the minimal rectangular region which contains all the pixels

of a symb ol in a digitized image of a diagram

Conversion from Diagram Representation Typ e

I to Typ e II

A pixel map may b e converted to an attributed symb ol list through the use of

a symb ol recognizer Generally symb ol recognition involves two stages First

graphical primitives are lo cated in the pixel map Second these graphical

primitives are partitioned into sets based on their relative p ositions and then

lab eled using a pro cess that maps these sets to symb ol lab els

Both noise and the manner of pro duction inuence the certainty of symb ol

recognition Pixel maps from typ eset information are more consistent and

thus it is less dicult to recognize symb ols than when handwritten input

is given Similarly noise reduces the certainty of a symb ol recognizer in

b oth typ eset and handwritten pixel maps Generally statistical metho ds are

used to rank certainty of symb ol identication and then pro duce the most

statistically likely symb ol list according to the statistical mo del employed

In some instances multiple lists of symb ols ordered by statistical likeliness

are maintained to allow analysis of dierent symb ol sets

Intro duction

In some cases symb ol recognition is p erformed directly on the pixel map

while in others such as pro jection prole cutting the pixel map is divided

into regions b efore the application of a symb ol recognizer obtaining spatial

structure concurrently this is shown via the lab eled A symb ol list

may b e pro duced iteratively using feedback or after a complete partitioning

of the pixel map

Diagram Representation Typ e I I I Structural Rep

resentation

A structural representation describ es the spatial structure of symb ols in a

A

diagram For example a L T X string is an example of a structural repre

E

sentation of a mathematics expression Another less explicit example is a

A

parse tree pro duced by a visual language parser from which a L T X string

E

may have b een translated

An attributed symb ol list may b e converted to a structural representation

through a visual language parser As mentioned earlier in some recognition

metho ds a structural representation is obtained directly from a pixel map

Diagram Representation Typ e IV Semantic Rep

resentation

A semantic representation of a diagram contains the information content of

a diagram A Maple string is a semantic representation of a mathematical

expression diagram Semantic representations may in some cases b e used

for evaluation In a visual programming language this is the execution of

the program In the case of a Maple string this is the evaluation of the

mathematical expression by Maple

For a given visual language denition a semantic representation may b e

obtained from a syntactically valid structural representation

Denition of the Visual Language

The visual language syntax and semantics are needed for structural and

semantic analysis In the case of natural visual languages signicant work

is required to identify and co dify the syntax and semantics

Thesis and Contributions

Thesis and Contributions

The research presented in this do cument investigates the following thesis

Through separating spatial structure from semantics in mathe

matics notation a general and exible recognition of mathematics

expressions may b e obtained

By general we mean that dialects of mathematics notation may b e conve

niently handled Flexible refers to the ability to handle a large range of

symb ol placements

The following contributions are made in supp ort of this thesis

A mo del for the structure of mathematics notation

A novel mo del called the baseline structure tree is intro duced Base

line in mathematics notation is used to refer to two dierent things

rst an imaginary line running through symb ols which are horizontally

adjacent and secondly the set of horizontally adjacent symb ols them

selves Baseline structure trees represent the spatial structure b etween

sets of horizontally adjacent symb ols in mathematics notation This

mo del of baseline structure is consistent with all common dialects of

mathematics notation

A visual language parsing algorithm

This algorithm creates baseline structure trees from attributed symb ol

lists The algorithm is more general than existing mathematics nota

tion parsers b ecause the spatial relationships used to obtain baseline

structure may b e redened and tree rewriting may b e used to handle

soft conventions ie dialects after an initial baseline structure tree

has b een obtained

An implementation

The visual language parsing algorithm was implemented and integrated

into a complete recognition system for handwritten mathematics no

tation The symb ol recognition and user interface comp onents of this

system were created by Steve Smithies Jim Arvo and Kevin Novins

The system has b een tested on hundreds of handwritten expres

sions with excellent results The system was demonstrated at CASCON

Intro duction

and was enthusiastically received Public distribution of the system

is planned

Chapter

Review of Mathematics

Notation Recognition

In this chapter the existing mathematics notation recognition literature is

examined A survey of mathematics recognition research may b e found in

and a more general survey of diagram recognition in

General Issues

Blostein notes in that

any recognition metho d including pro cedurallyco ded rules im

plicitly or explicitly denes the syntax of recognizable expres

sions

Each of the systems describ ed in this chapter thus make some assumptions

ab out the syntactic structure of mathematics notation Unfortunately in

many cases the reasons for cho osing a particular structural denition andor

syntax for mathematics notation are not describ ed in detail

Martin cites this typ e of analysis of mathematics notation syntax as a

key step in the pro duction of an eective recognition system More

sp ecically Martin indicates that three things are necessary b efore a usable

mathematics notation recognition system may b e created First a study of

the structure of mathematical notation Second the creation of recognition

systems which are extendible Third a userfriendly means for extending a

recognition system

Review of Mathematics Notation Recognition

Towards analyzing the structure of mathematics notation Martin pro

vides some simple syntactic information on the notation in the form of

spatial relationships and a numb er of examples of ambiguous expressions

in He notes that

Mathematical notation is designed to b e unambiguous However

if the expressions are not carefully written or certain conventions

are not observed they may app ear ambiguous

By conventions Martin is referring to layout of symb ols and the choice of

relative size of symb ols

Martin also states that a precedence parser may b e built for recognizing

mathematics notation if all symb ols which indicate vertical displacement

from the centre line are leftmost in their sub expressions In this case an

algorithm may b e constructed which scans an attributed symb ol list left

to right pro ducing an appropriate structural representation The following

expression with overlapping limits would not b e recognized by such a parser

P

as the symb ol indicating vertical displacement from the centre line is

not leftmost

X

i

i

Symb ol Recognition in Mathematics No

tation

The recognition of even typ eset mathematical symb ols has proven more di

cult in the context of existing OCR systems that one might exp ect Benjamin

Berman and Richard Fateman have done some research in this regard to

develop character recognition systems b etter suited to the math notation

recognition context

Berman and Fateman observed that commercial optical character recog

nition systems with recognition rates of or higher were falling to

or less once tried on p erfectly formed characters in mathematical

contexts They indicate that this is b ecause the heuristics employed which

work well on straight text multicolumn printing such as in newspap ers

and tables fails with math notation b ecause of the following

Existing Mathematics Notation Parsers and Structural

Analyzers

Variations in font size

Multiple baselines

Sp ecial characters

Dierent sp elling digraph frequencies ie statistical frequency of hor

izontally adjacent symb ols

Symb ol Recognition Techniques Employed

The techniques that have b een used to recognize symb ols in mathematics no

tation are more or less standard in the pattern recognition literature Tem

plate matching is used in nearest neighb or classication in neural

networks in hidden Markov mo dels in and chain co de recognition in

Winkler and Lang discuss a hidden Markov mo del approach which

employs softdecision making in order to generate multiple interpretations of

an input expression Miller and Viola discuss the resolution of symb ol

ambiguity using higher level context Recognition of Arabic mathematical

symb ols is addressed in

The ma jority of symb ol recognition research in mathematics notation

fo cuses on handwritten input pro duced online via data tablet or scanned im

ages of typ eset A smaller amount of research discusses recognizing

typ eset pixel maps pro duced online

Existing Mathematics Notation Parsers

and Structural Analyzers

Existing mathematics recognition systems fall into two main categories First

there are metho ds which obtain a structural representation directly from pixel

maps which we call Structural Analyzers In the other category analysis of

structure is p erformed after symb ol recognition has o ccurred ie these meth

o ds obtain a structural representation from an attributed symb ol list We

call these systems Mathematics Notation Parsers

Below is a categorization of existing techniques

Structural Analyzers

Review of Mathematics Notation Recognition

a Pro jection Prole Cutting

Mathematics Notation Parsers

a Attributed String Grammars

b Structure Sp ecication Schemes

c Graph Grammars

d Sto chastic Grammars

e Pro cedural Translation

We discuss the ab ove metho ds in greater detail in the following sections

Pro jection Prole Cutting

Any mathematical expression can b e considered as a collection

of a numb er of comp onents single symb ols or sub expressions

arranged horizontally each of which may contain smaller comp o

nents arranged vertically

Okamoto et al in outline a metho d of obtaining a structural

representation of scanned images of mathematics notation using recursive

pro jection prole cutting A pro jection corresp onds to pro jecting pixels onto

the x and y axes of the image The cutting pro cess separates horizontally

adjacent sub expressions using a vertical pro jection followed by horizontal

pro jection to separate baselines This pro cess is applied recursively

Figure shows an expression and the structural representation ob

tained after pro jection prole cutting has o ccurred Horizontally adjacent

no des in the tree corresp ond to pixel regions obtained from a vertical pro jec

tion while vertically adjacent no des in the tree corresp ond to pixel regions

separated by a horizontal pro jection Connected pixel regions which cannot

b e further separated by pro jection prole cuts are then recognized shown as

symb ols in b oxes at the leaves

This approach is not able to detect sup erscripts subscripts matrices

limit expressions eg or expressions within square ro ots each

of which requires additional pro cessing These additional pro cesses rewrite

the structure tree created by the pro jection prole cutting pro cess A T X

E

string is then translated from the tree

Pro jection Prole Cutting

Figure Example of Pro jection Prole Cutting

The general approach taken here is interesting The following steps are

employed in this system

A structural representation is obtained using a structural feature of

mathematics notation ie the structure of horizontally and vertically

adjacent symb ols in a mathematics expression

The structural representation is altered so that it is syntactically valid

In this case a valid syntactic representation is considered to b e a tree

which may b e translated into valid T X output E

Review of Mathematics Notation Recognition

The structural representation is mapp ed to a string ie T X

E

In pro jection prole cutting is augmented by b ottomup pro cess

ing Essentially character recognition is p erformed rst and the problematic

structures for pro jection prole cutting are lo cated analyzed and group ed

b efore pro jection prole cutting is applied

This research is closely related to that of Wang and Faure In

this research pro jections are used in conjunction with a more complicated

analysis of spatial relationships b etween handwritten symb ols created online

Separate metho ds are describ ed for lab eling sup erscripts and subscripts and

a routine for handling square ro ot expressions is mentioned As in the system

by Okamoto et al an initial structure is obtained and rewritten to obtain

a syntactically valid structure

The simplicity and sp eed of pro jection prole cutting is app ealing How

ever in addition to diculty with square ro ots and sub and sup erscripts

a strictly pro jectionbased metho d may improp erly segment characters with

broken lines and skew in an input image may alter the necessary horizontal

and vertical relationships which are used for later analysis

Mathematics Notation Parsers

Attributed String Grammars

Parsing mathematics notation using an attributed string grammar is p er

haps the most common typ e of mathematics recognition system

Generally these systems involve slightly mo difying ex

isting parsing metho ds to obtain a structure representation andor semantic

interpretation from attributed symb ol lists

The earliest rep orted mathematics recognition systems are those of Anderson

and Martin Both Anderson and Martin used systems which they called

coordinate grammars A co ordinate grammar is essentially an attributed

string grammar with constraints on pro duction application based on the

relative p ositions of symb ols Unlike a string grammar the order of symb ols

in the input is unimp ortant

Anderson prop oses representing symb ols through b oth a b ounding b ox

and a single co ordinate intended to approximate the p osition of the centre

line or baseline through horizontally adjacent symb ols Anderson denes the

centre of a symb ol based on where the symb ol lies relative to the baseline

Mathematics Notation Parsers

as shown in Figure In his system all characters have a horizontal centre

co ordinate corresp onding to the middle horizontal p oint of the symb ol shown

as xcentre for x in Figure

Middle Line x p A j Centre Line / Baseline Writing Line

xcentre

Figure Typ ographic Centre of Symb ols

Andersons system accepts an attributed symb ol list obtained from a data

tablet as input and assumes that symb ol recognition errors have b een re

solved b efore the input is received The symb ol list contains symb ol identity

and b ounding b ox co ordinates which dene the rectangular region contain

ing all the pixels of a symb ol for each symb ol Consider the following rule

from Andersons grammar for initially lo cating a term A nontermi

nal DIVTERM is matched if a horizontal line is found with symb ols ab ove

and b elow which do not extend past the horizontal line on either side or

overlap the horizontal line vertically The parse tree resulting from a match

of this rule lo oks like Figure

Andersons parser works topdown attempting all p ossible partitions of

symb ols until a valid parse is obtained some heuristics to improve p erfor

mance are given in After the parse tree has b een built the semantics of

the expression are obtained by propagating a meaning string attribute from

the leaves to the ro ot of the parse tree For example in the attribute syn

thesis step after initially building the parse tree the nonterminal DIVTERM

is assigned a ycentre co ordinate equivalent to the ycentre co ordinate of the

horizontal line and a meaning attribute based on the meaning attributes of

S and S

Andersons grammar was p ossibly the most complete to date including

a separate grammar for handling simple matrices However the system was

very computationally exp ensive Fateman noted that

Andersons parserhandled a very large domain of mathematical

expressions However at the time of his work his rec

Review of Mathematics Notation Recognition

DIVTERM

EXPRESSION - (horiztonal line) EXPRESSION (S1) (S2) (S3)

Spatial constraint: DIVTERM attributes: -all symbols in S1 and S3 have meaning: (meaning(S1))/(meaning(S3)) y centre coordinate: ycentre(S2) bounding boxes neither left nor right of S2 -all symbols in S1 are above S2

-all symbols in S3 are below S2

Figure Partial Parse Tree for Andersons Co ordinate Grammar

ognizer app ears to have b een tested to its limits with equations

having ab out eight symb ols

Later research in recognition of mathematics expression using attribute

grammars tends to fo cus more on symb ol recognition and ambiguity issues

than pro ducing a more expressive grammar

Structure Sp ecication Schemes

Chang in presents another system for obtaining a tree structure rep

resenting an expression based on denitions of op erator range precedence

and dominance Op erator range refers to the syntactically valid spatial lo

cations of an op erators Op erator precedence denes the order

of application of op erators represented as symb ols in an expression eg

has greater precedence than Op erator dominance is dened through a

partial ordering of the op erators in an input expression based on the op erator

precedence and range

Mathematics Notation Parsers

A structure specication scheme assigns to each op erator a way in which

an input pattern is partitioned into regions for the op erator and its op erands

Along with the ordering on op erators given by op erator dominance a struc

ture sp ecication scheme may b e used to obtain the structure of a mathe

matics expression

Input to Changs metho d is an attributed symb ol list The symb ols in

the original input are then recursively partitioned into op erator and op erand

regions using the structure sp ecication scheme each time partitioning based

on the least dominant op erator

The parsing algorithm provided by Chang is fast b eing of O n time

complexity The system describ ed do es not app ear to handle subscripts or

sup erscripts as these op erators are implicit represented through relative

p osition of symb ols rather than symb ols in the input

Sto chastic Grammars

In a sto chastic grammar probabilities are asso ciated with pro ductions of the

grammar Chou presents a sto chastic grammar which is capable of recogniz

ing expressions in the context of a signicant amount of noise pro duced by

ipping bits in a typ eset pixel map Symb ol recognition is p erformed

using exhaustive template matching A dynamic programming algorithm is

then used to nd the most likely parse for an input expression The results

are impressive for the given test examples but the expressions themselves

are quite small

Miller and Viola in discuss their work with a sto chastic grammar

based on Chous with some p erformance improvements The inputs they

examined were noisefree pixel maps pro duced online

Pro cedural Translation

Lee and Lee describ e a set of pro cedural metho ds used to translate an

attributed symb ol list into a formatted string eg in EQN format Symb ols

which deviate from the typ ographical centre of an expression are group ed into

units called symbol groups and then the symb ol groups and the remaining

symb ols in the input are ordered lefttoright based on the yco ordinate of

their center p oints

An output string is obtained by applying this grouporder pro cedure re

cursively In a later pap er this algorithm is mo died so that symb ol groups

Review of Mathematics Notation Recognition

are lo cated recursively rst and a structural representation in the form of a

tree is built Some additional discussion is given ab out correcting recog

nition errors through the use of semantic analysis

A similar approach to Lee and Lees is describ ed by Winkler who uses

a directed acyclic graph representation Winklers pro cess has the ad

ditional asp ect of generating multiple interpretations using a probabilistic

mo del of symb ol layout

The advantage of a pro cedural recognition metho d is sp eed The ma jor

disadvantage of a pro cedural translation approach is the diculty in under

standing maintaining and extending such a system as it is represented solely

in terms of pro cedural co de

Graph Grammars

Bunke in demonstrated how attributed graph rewriting is a useful ap

proach for recognizing schematic diagrams Graph grammar approaches to

diagram recognition have b een used by Dori for recognizing dimensions

in machine drawings Fahmy and Baumann for music notation

and Grbavec and Lavirotte and Pottier for mathematics notation

In graphgrammar based diagram recognition a graph is initially built

from input symb ols containing lab eled edges representing spatial

structure This graph is then constrained andor collapsed using a graph

grammar parser which propagates attributes b etween no des during pro duc

tion application

Individual rules in a graph grammar are relatively intuitive due to the

visual representation via graphs Graph grammars are a very general formal

ism but are computationally exp ensive to parse as graph matching is NP

complete Lavirotte and Pottier attempt to alleviate this problem through

making their graph grammar deterministic but this app ears dicult to do

without restricting the expressivity of the graph grammar Smithies in

prop oses using an A searching algorithm to attempt to improve p erformance

through heuristic means

Summary

It is worth noting that in the systems describ ed trees are used to represent

the structure of mathematics expressions either explicitly as in the case

Summary

of pro jection prole cutting or implicitly in the parse trees pro duced by

mathematical notation parsers other than graph grammar parsers This

pattern of representation inuenced the choice of structural representation

to b e explored in the next chapter

The desirability of linearizing an attributed symb ol list where p ossible is

also a common theme Where p ossible linearization improves p erformance

dramatically as noted rst by Anderson in

The problem of pro ducing a system which is b oth ecient and suciently

exible to recognize complex spatial relationships remains an op en problem

The area would probably b enet greatly if the kind of study of mathemat

ics syntax prop osed by Martin were to b e undertaken The current lack

of information characterizing the problem domain makes it necessary to ex

p end a considerable amount of eort identifying conventions of mathematics

notation b efore a recognition technique may even b e designed

Review of Mathematics Notation Recognition

Chapter

A Mo del for the Structure of

Mathematics Notation

A ma jor ob jective for mathematics notation recognition is to de

ne the syntax cleanly to provide a unifying framework for han

dling the myriad details and exceptions that arise during mathe

matics recognition

Mathematics notation is a natural visual language and as a result there

is no existing formal denition and dialects exist However in order to

recognize at least one or more of the dialects of mathematics notation one

needs to pro duce a language denition

A study of the hard and soft conventions of the notation may b e made

through examining conventions observed in the literature and through in

trosp ection and observation A visual language denition for mathematics

notation may then b e sp ecied

Esp ecially in light of the existence of dened notation the p ossibility of

creating a complete visual language denition for mathematics notation is un

likely However rather than attempt to pro duce an ascompleteasp ossible

language denition for a recognition system it may b e more advantageous to

identify those notational conventions that seem to b e present across dialects

ie hard conventions and then pro duce a representation which can b e easily

manipulated for syntactic analysis and semantic interpretation under various

language denitions

In this chapter using the hard conventions of direction of interpretation

and op erator dominance a mo del of spatial structure in mathematics nota

A Mo del for the Structure of Mathematics Notation

tion is provided This is the baseline structure tree BST a tree representa

tion which makes baselines and spatial relations b etween baselines explicit

An example is provided which demonstrates how a baseline structure tree

may act as the starting p oint for a more detailed syntactic analysis under a

visual language denition

Hard Conventions of Mathematics Nota

tion

As indicated by Okamoto any mathematical expression may b e consid

ered as a collection of symb ols and sub expressions arranged horizontally

This horizontal adjacency of symb ols or sub expressions is referred to infor

mally as a baseline Through observation the direction of interpretation

of symb ols andor sub expressions along a baseline is always lefttoright

corresp onding to the reading direction in Germanic languages We prop ose

that this lefttoright direction of interpretation along baselines is a hard

convention of mathematics notation

Anderson observed the usefulness of the directedness of baselines in math

ematics notation in indicating that pro ceeding lefttoright along a

baseline may obtain the desired syntactic structure of linear expressions such

as

a b cx

The linear structure allows the relationship b etween the ab ove symb ols to

b e represented in a string using only concatenation unlike other spatial re

lationships which require a more explicit representation of twodimensional

spatial structure as will b e shown later

Another hard convention of mathematics notation is the lo cation of the

symb ol from which interpretation b egins in unambiguous expressions Fate

man states that the subset of T X used for computer algebra systems eg

E

Maple Mathematica is strictly lefttoright parseable if some heuristics are

employed Sp ecically he states that in a given sub expression

the leftmost glyph governs the meaning of the expression In the

few exceptions to this rule we have tried by manipulating the

expression glyphs to expand the key op erator to the left to assert

R

by a virtual this truth For example we consider extending

bar extending to its left

Hard Conventions of Mathematics Notation

In other words expressions are generally interpreted b eginning with their

leftmost symb ol Common exceptions include the following

Horizontal lines not b eing leftmost in their asso ciated sub expression

as in

where the four extends as far to the left as the horizontal line

Limit symb ols with overlapping limits This o ccurs when symb ols in

one of the limits extends to the left of the limit symb ol as in

X

i

In these cases where the symb ol from which interpretation b egins is not

leftmost op erator dominance as dened by Chang may b e used to lo cate

the dominant op erator in the leftmost sub expression from where interpreta

tion b egins For example the dominant op erator in the rst example ab ove

is the horizontal line and this horizontal line is the symb ol from which to

b egin interpretation In the case of symb ols which are not in the scop e of an

op erator and are leftmost we simply b egin interpretation from this symb ol

Martin in provides a numb er of ambiguous cases where it is imp os

sible to determine the starting symb ol of an expression b ecause either it is

imp ossible to determine op erator dominance eg in the case where a frac

tion such as abc is displayed vertically with equal length horizontal lines

or the range of op erators is unclear such as when a horizontal line repre

senting a division overlaps a symb ol slightly making it unclear whether that

symb ol is an of the division or an adjacent term Resolving such

requires a decision on the part of the reader concerning the range

of op erators andor the dominant op erator in the leftmost sub expression

In the case then where an expression has an unambiguous op erator dom

inance in the leftmost sub expression interpretation b egins from the dom

inant op erator in that expression or the leftmost symb ol if that symb ol

alone constitutes the leftmost sub expression This is a hard convention The

interpretation of ambiguous examples involves soft conventions as these am

biguities may b e resolved using a numb er of dierent approaches none b eing

necessarily correct

A Mo del for the Structure of Mathematics Notation

This pro cess of nding the starting symb ol of an expression may b e rep

resented using the function START START is sp ecied in the following

START START is a function which takes an attributed sym

b ol list representing an unambiguous expression l ist and

s

returns the starting symb ol in l ist or if l ist is empty

s s

Symb ol Layout as a Soft Convention

In mathematics notation spatial relations are used to group symb ols into

units and to sp ecify op erators For instance horizontal adjacency and

the distance b etween horizontally adjacent symb ols whitespace are used

to group symb ols into syntactic units as in cos x The binary op erator

for exp onentiation is represented using the spatial relation b etween base and

exp onent as in x

Horizontal adjacency subscripting and sup erscripting are hard conven

tions of mathematics notation spatial structure When p erceived as b eing

clearly present by a reader they are unambiguous spatial relationships b e

tween symb ols However the set of layouts which dene each spatial relation

is a soft convention for instance consider Figure a

XXXa a a X

a. b.c. d.

Figure Example Spacing Between Adjacent Symb ols

It is imp ossible to state with certainty exactly where the p osition of the

a would stop b eing a subscript as in Figure a and start b eing hori

zontally adjacent as in Figure b Likewise the same problem o ccurs when

trying to decide where b etween the p ositions of a in Figure b and Fig

ure c the a starts b eing in the sup erscripted p osition Figure d is

Symb ol Layout as a Soft Convention

itself either syntactically invalid or an exp onent dep ending on the syntactic

denition adopted

From observation sup erscripts horizontal adjacency ab ove b elow and

containing eg by a square ro ot spatial relationships are hard notational

conventions They are commonly used in the interpretation of mathematics

notation However given the soft convention of symb ol layout any character

ization of spatial relations in mathematics notation will require the adoption

of a set of arbitrarily chosen thresholds to dene regions for each relation

Thresholds may b e dened for what are felt to b e ambiguous regions but

this in itself will b e a decision on the part of the visual language designer as

what constitutes a spatially ambiguous diagram is not formally dened

Mathematics Notation as a ContextSensitive Vi

sual Language

Many of the spatial relations employed in mathematics notation may b e

describ ed through a series of interacting spatial relations b etween symb ols

Consider Figure 2 100000 X i

i = 1

Figure Exp onent and

The reader do es not simply lo ok directly ab ove the limit to nd the

upp er limit of the summation this pro duces the wrong limit We

can informally describ e a pro cess of lo cating the upp er limit in Figure in

the following

The is closer to the x than the the in the upp er limit closer to

the

Though horizontally adjacent the is visibly separated from the by

whitespace the is thus sup erscripted from the x and the is part of

the upp er limit of the

A Mo del for the Structure of Mathematics Notation

The zero which ends the upp er limit is much closer to the than the

next symb ol which is horizontally adjacent to the i and so is part

of the upp er limit

is directly ab ove the

Concatenating the and we have found to b e part of the upp er limit

with the ab ove the we obtain an upp er limit of

The upp er limit of the summation in Figure may b e obtained through

the pro cess ab ove or another similar pro cess involving the examination of the

relative p ositions of symb ols in the expression

The kind of complex spatial interaction present in Figure is common

in expressions with multiple baselines obtaining the spatial relationship b e

tween two symb ols may require knowledge of the relationship b etween those

symb ols to other symb ols in their asso ciated expression This demonstrates

how mathematics notation is a contextsensitive visual language

Spatial Relations in Mathematics Notation

In this section a numb er of binary relations are informally dened in order

to describ e spatial structure in mathematics notation These denitions are

based in part on those discussed by Genarro Costagliola

The sp ecications are in Table For each S is a single symb ol

We discuss the spatial relations in greater detail in the following

HOR indicates horizontal adjacency b etween two symb ols on a baseline

For example in Figure b HORXa is a valid relation

ABOVE and BELOW corresp ond intuitively to baseline symb ol sets

which are ab ove or b elow a symb ol but directly ie only those symb ols

whose center horizontally overlaps the width of the domain symb ol

p

CONTAINS indicates square ro ot sub expressions For example x

p

could b e represented as CONTAINS x

BLEFT and TLEFT represent spatial relations b etween a limit symb ol

R

such as and symb ols in limits which extend to the left of the limit

symb ol when the limit symb ol starts a baseline For instance in

X

i

i

Symb ol Layout as a Soft Convention

Spatial Relation Denition

HORS S S is the next symb ol horizontally adjacent

to S

SUPERS S S is the set of symb ols sup erscripted from

S

SUBSCS S S is the set of symb ols subscripted from S

ABOVES S S is the set of symb ols directly ab ove S

BELOWS S S is the set of symb ols directly b elow S

CONTAINSS S S is the set of symb ols contained by the

b ounding b ox of S

TLEFTS S S is the set of symb ols to the top left of a

limit symb ol which starts a baseline S

BLEFTS S S is the set of symb ols to the b ottom left of

a limit symb ol which starts a baseline S

Table Spatial Relations

The i and of the lower limit and the and b eginning the upp er

limit are in BLEFT and TLEFT relations with the limit symb ol ie

P P

TLEFT fg and BLEFT fig

In the next chapter each of these relations are dened in greater detail

using co ordinates of symb ols For now these general descriptions will suce

for explanation

The previously dened spatial relations along with the hard conventions

of starting symb ol and direction of interpretation along baselines may b e used

to create a structural representation for mathematics notation the baseline

structure tree

We need rst to dene baseline symbol set and main baseline symbol set

as in the following

Baseline Symb ol Set A baseline symb ol set is a set of attributed

symb ols for which the relation HOR holds b etween all s s

i i

for i n where n is the numb er of symb ols in the

set

Main Baseline Symb ol Set Given a list of symb ols l ist the

s

main baseline symb ol set is the baseline bl ine asso ciated main

A Mo del for the Structure of Mathematics Notation

with the symb ol in l ist from which interpretation b egins

s

ie a nonempty set symb ol returned by STARTl ist

s

Using the spatial relations discussed along with the idea of baseline sym

b ol set the structure of a large numb er of mathematics notation diagrams

may b e characterized using a tree and parenthesized strings translated from

the tree

Consider the expression

X

i i

i

The spatial structure of the symb ols in this expression can b e represented

via a baseline expression tree as given in Figure For the purp oses of the

following examples assume that the sp ecied spatial relations are valid Expression

i + 2 7 i + 2

BELOW TLEFTABOVE SUPER SUPER

i = 1 1 0 0 0 0 0 2

Figure A Baseline Structure Tree

The ro ot is lab eled Expression and along with the no des with spatial

relation lab els has children ordered left to right ie a baseline symb ol set

is represented as children of the ro ot and each relation This eliminates the

need in the tree to show HOR explicitly as any two adjacent siblings b elow

the ro ot or a relation are in a HOR relationship Symb ols such as the i in i

P

and may have spatial relationlab eled children indicating a set of symb ols

in a region relative to the symb ol

The main baseline of the expression is represented b elow the ro ot while

the main baseline of each sub expression app ears under an asso ciated relation no de

Symb ol Layout as a Soft Convention

Prop erties of Baseline Structure Trees

There are a numb er of advantages to structural representation via a baseline

structure tree These include

The grouping and left to right order of baseline symb ol sets is explicit

All spatial relations are explicit

A depthrst traversal of this tree will pro duce a linearized string rep

resenting the structure if bracketing is used For instance the lineariza

tion via depth rst traversal of Figure gives

P

Expression BELOW fig TLEFT fg ABOVE fg SUPER fg

i SUPERfgi

If the ro ot Expression is eliminated the ab ove string p ossesses the

A

typ e of syntactic structure used in L T X strings

E

The tree may b e restructured in order to represent more complicated

structures via lab eled relation no des

As a demonstration of restructuring a baseline structure tree consider

Figure Here a treerewriting rule has b een applied to Figure This

rule can b e stated as starting from the ro ot collect all TLEFT ABOVE

and SUPER no des asso ciated with a Sigma and replace them with a single

relation no de lab eled ULIMIT placing all children of the TLEFT ABOVE

and SUPER no des under ULIMIT Then do the same for BLEFT BELOW

and SUBSC The tree now has a structure which contains a syntactically

valid representation of the limits of a summation This is a very simple rule

in the next chapter a more complicated rewrite is describ ed which allows

almost arbitrary sub expressions as limits

The original expression represented in Figures and is ambiguous

it is not clear what the scop e of the summation is ie do es it end with i

P

with i or do es it encompass all sub expressions adjacent to the

This may b e resolved through the use of a soft convention For instance

sp ecifying either a certain distance of white space indicating memb ership

P

in the asso ciated sub expression of the or restricting scop e to the rst

sub expression in the summations scop e if bracketing is not used

Once having chosen a soft convention the convention may b e represented

in the tree using another tree rewrite For instance supp ose a syntax deni

tion where only the rst unbracketed term is considered to b e in the scop e of

A Mo del for the Structure of Mathematics Notation

Expression

i + 2 7 i + 2

BLIMIT ULIMIT SUPER

i = 1 1 0 0 0 0 0 2

Figure Baseline Structure Tree with Rewritten Limits

the summation is used Another typ e of relation no de may b e added to the

tree which we will call SCOPE The main baseline may then b e scanned for

symb ols up to an op erator and then a rewrite may b e applied to pro duce

Figure

In this thesis only a small numb er of rewrites are dened and only for rel

atively hard notational conventions A design for a more complete system

capable of handling many dialects is presented in the last chapter

Symb ol Layout as a Soft Convention

Expression

+ 2 7 i + 2

BLIMIT ULIMIT SCOPE i = 1 1 0 0 0 0 0 i

SUPER

2

Figure Baseline Structure Tree with Rewritten Summation Scop e

A Mo del for the Structure of Mathematics Notation

Chapter

A Mathematics Notation

Parser

As discussed in the last chapter a baseline structure tree is a exible struc

tural representation of a mathematics notation diagram In this chapter a

parsing algorithm for obtaining baseline structure trees from an attributed

symb ol list is presented Examples of syntactic analysis p erformed through

tree rewriting and contextsensitive translation to string languages are also

discussed along with relevant issues

Prepro cessing and Symb ol Attributes

For the purp oses of this algorithm symb ols in the input list must have iden

tity and b ounding b ox attributes A b ounding b ox is dened as a pair of

co ordinates used to sp ecify the minimum and maximum xy co ordinates b e

tween which all the pixels of the symb ol are lo cated in the p ositive Cartesian

plane An example is shown in Figure

As a prepro cessing step each symb ol is given additional attributes cal

culated using the identity and b ounding b ox attributes These additional at

tributes are typ e centroid class centroid co ordinate and wall attributes

which sp ecify a partitioned region in the input

The typ e attribute is used to group structurally similar symb ols for in

R

P S T

stance limit symb ols op en f and close brackets

g Typ e attributes simplify pro cessing based on structural function

Each symb ol is assigned a centroid attribute which is used to represent

A Mathematics Notation Parser

(maxX, maxY)

(minX, minY)

Figure A Bounding Box

Middle Line x p A j Centre Line / Baseline Writing Line centred centred ascending descending

Character Centroid Class

Figure Centroid Class for Dierent Letters

the p osition of each symb ol using a single co ordinate The value of this

co ordinate is calculated by examining the normal p osition of the b ounding

b ox of the symb ol in relation to a baselinecentre line

Figure demonstrates dierent centroid classes for a numb er of letters

Ascending characters extend ab ove the middle line descending characters

fall b elow the writing line and what we term centred characters either fall

b etween the writing line and middle line or extend past b oth writing line and

middle line such as the j in Figure

Table is based closely on Grabavecs list of centroid classes given

in It shows the centroid classes for a numb er of characters used in

Prepro cessing and Symb ol Attributes

mathematics notation Symb ols which do not app ear in this list are in the

centred centroid class

Characters Alignment

Ascending

AZ Ascending

bdfhiklt Ascending

gpqy Descending

acejmnorsuvwxz Centred

Ascending

Descending

o Centred

Limit Symb ols Centred

Table Centroid Classes for Symb ols

With the exception of brackets the x co ordinate of the centroid is always

the middle horizontal p oint ie minX maxX minX In the case

of op en brackets the minX co ordinate is assigned to the x co ordinate of the

centroid and the maxX co ordinate is assigned to the x co ordinate for close

brackets

The y co ordinate of the centroid is then assigned using the centroid

class Ascending characters are assigned a y centroid co ordinate at minY

maxYminY Descending characters are assigned a y centroid co

ordinate at minY maxY minY and centred class characters are

assigned their ycentre ie minY maxYminY

These yco ordinate assignments are intended to reect the approximate

lo cation of the baseline through typ eset characters This corresp onds to a

rough normalization of the input symb ol In particular with handwrit

ten input this may b e a very crude approximation essentially replacing the

contents of the b ounding b ox by a single p oint based on a mo del of typ eset

characters

Finally each symb ol p ossesses four wall attributes top b ottom left

and right which sp ecify a region in the input pattern sp ecied by b ot

tomleft and topright Initially the b ottom and left walls of all symb ols

are set to and top and right to an arbitrarily large p ositive eg

These wall attributes will b e used in the parsing algorithm to partition

A Mathematics Notation Parser

the input when examining spatial relationships b etween symb ols

Syntax Directed Scanning

In this section we discuss syntax directed scanning which is a key comp onent

of the parsing algorithm describ ed in the following sections

In Genarro Costagliola and ShiKuo Chang describ e how syntax di

rected scanning of the input can allow conventional LR parser generators such

as YACC to b e used for building parsers for a class of visual languages in

cluding a simple mathematic language which may b e describ ed using what

they term positional grammars The ability to build a positional LR parser

requires the use of syntax directed scanning of the input

In this technique functions corresp onding to the spatial relations b e

tween symb ols are used to describ e the syntax of a visual language These

functions are then used to drive the scanning of the input pattern during a

parse Normally in an LR parser linear retrieval is employed where the next

unexamined symb ol in the input array is returned The spatial functions

order the input pattern resulting in a very ecient ie On parser for a

restricted but useful class of visual languages

The input to a p ositional LR parser is assumed to b e a list of attributed

symb ols ie containing centroid co ordinates on which the spatial functions

may op erate Each spatial function takes the index of a symb ol in the input

array and returns the index of a token matching the asso ciated relation or

indicates that no symb ol matching the spatial relation was found A symb ol

matched by a spatial function during a parse is marked in the input allowing

that symb ol to b e removed from consideration during later analysis

In App endix A a p ositional grammar is given which corresp onds to the

syntax of baseline structure trees However the creation of a generalized LR

parser as describ ed by Costagliola is not p ossible

see App endix A for further discussion

Spatial Functions

In this section the spatial relations intro duced in Chapter are dened as a

set of functions For the remainder of this discussion the term input list will

refer to a prepro cessed list of attributed symb ols ie with typ e centroid

Spatial Functions

y (maxX, maxY)

R

(minX, minY)

0 x

Figure An Example Region

class centroid and wall attributes An input list is assumed to have b een

sorted by leftmost b ounding b ox xco ordinate in ascending order

A region R is dened as an axis parallel rectangular area in the p ositive

integer Cartesian plane dened by a pair of xy co ordinates A co ordinate

C is said to b e in a region R if it lies in the region dened by R and C do es

not lie on the maximum Y or maximum X co ordinate b oundaries Figure

demonstrates this The maximum X and Y b oundaries are shown with

dotted lines to indicate that a p oint C is not considered within region R on

those b oundaries

Horizontal overlap is dened as in the following

Horizontal Overlap A symb ol S is horizontal ly overlapped by

a symb ol S when the minX b ounding b ox co ordinate of S

is less than or equal to the centroid x co ordinate of S

START OVERLAP and SP Functions

START is a function which returns the starting symb ol for a partitioned

region of an input list The algorithm for START provided in App endix

A Mathematics Notation Parser

B contains a simplied analysis of op erator dominance ie it returns the

leftmost limit symb ol if one is present rather than analyzing the range of

limit symb ols but for the purp oses of this research was found to b e adequate

for a large numb er of test cases

START calls a function OVERLAP b efore returning a symb ol OVER

LAP is a function which determines whether a symb ol is overlapp ed by a

horizontal line this is part of the op erator dominance analysis OVERLAP

takes an integer a pair of yco ordinates and an input list The function then

returns either the passed integer if the symb ol at that index is not overlapp ed

by a horizontal line or the index of the largest overlapping line see App endix

B for the OVERLAP algorithm

START scans the input to lo cate the leftmost symb ol in R If no symb ol

is found is returned If a symb ol is found the input is scanned further

lo oking for the leftmost limit symb ol If no limit symb ol is found or is the

same as the leftmost symb ol the leftmost symb ol is checked for overlap with

horizontal lines and then the result is returned If a limit symb ol is found the

scan reverses checking all symb ols to see if they are horizontally adjacent to

the limit symb ol If no such symb ol is found the leftmost symb ol is part of

a limit and the limit symb ol is checked for overlap and the result of overlap

check is returned Otherwise the leftmost symb ol is not part of a limit and is

checked for overlap The result of the overlap examination is then returned

Another function SP may b e dened corresp onding to the typ e of start

symb ol function for visual languages describ ed in SP gives the start

symb ol of the entire expression using R dened as Assuming

l ist is nonempty SPl ist returns the rst symb ol of the main baseline

in in

symb ol set of the entire expression ie in the region dened ab ove

HOR Function

The extension of baselines can b e summarized using the following spatial

function denition of HOR HOR takes a symb ol and an input list l ist and

in

returns either a symb ol of l ist or

in

Given an input list l ist and a symb ol s HORl ist s a where

in in

a is if no unmarked symb ol is horizontally adjacent to s in the region

dened by ss wall attributes

a is an unmarked symb ol in the region dened by ss wall attributes

Spatial Functions

which is the next horizontally adjacent symb ol on the baseline symb ol

set of which s is a memb er

If a horizontally overlaps a horizontal line hl ine let a b e the longest

horizontal line which overlaps hl ine or hl ine if no horizontal line over

laps hl ine

Horizontal adjacency is dened using the wall attributes of s ie a

region and the centroid co ordinates of a HOR pro duces a dened result for

HORs for each of the cases in Table

The algorithm for HOR is very simple If s is a horizontal line or

START is called on the region dened by the wall attributes of s and the

maxX b ounding b ox co ordinate of s ie replacing leftWall The remaining

cases simply require a scan of the input list p erforming tests on co ordinate

p ositions of s and the identity and co ordinates of each symb ol encountered

The scan stops when a symb ol meeting the criteria for HOR is met A call

is then made to OVERLAP again to check with overlap with horizontal

lines

As established in the last section START is O n in the worst case For

the other HOR cases a scan of the input with O comparisons for each

element O n is followed by a call to OVERLAP O n Thus HOR is of

O n time complexity in the worst case Due to the input list b eing sorted

in many cases only a single set of comparisons is made ie O b est case

Figure shows the HOR region searched in the general case In this

research the convention of assigning the Upp er Threshold to of the b ound

ing b ox height and the Lower Threshold to the height of the b ounding

b ox was used as in and others TLEFT and BLEFT are bracketed as

they are only searched if the A was a limit symb ol TLEFT and BLEFT

regions are only examined when a limit symb ol starts a baseline or follows

a bracket or horizontal line on a baseline

The sp ecial cases for binary op erator and calling START reect the typ e

of deviations from the centre line that o ccur and which are still p erceived as

adjacent In the example in Table if the addition sign moves up or down

in the binary op erator case the baseline structure is still clear

In the general case the leftmost symb ol on the right of s in the HOR

region is returned as a if it is present and not overlapp ed by a horizontal

line Note that in this denition of region symb ols may have the same minX

co ordinate and b e pro cessed as b eing horizontally adjacent in the order in

l ist in

A Mathematics Notation Parser

s a

EXAMPLE

Binary Op erator Horizontal Line to

not Horizontal right of binary op

Line erator in prop er

region

Any Symb ol Next leftmost symb ol

in region which has a

Z

b ounding b ox that ex

x

tends b oth ab ove and

b elow that of the do

main symb ol

Horizontal Line STARTwall at

Op en Bracket tributes let leftWall

maxX of sl ist

in

X

General Case Leftmost symb ol with

R

center in HOR region

x

Table Horizontal Adjacency of Symb ols Under HOR

Spatial Functions

minX maxX topWall

(TLEFT) ABOVE SUPER maxY Upper Threshold HOR ( Lower Threshold A minY (BLEFT) BELOW SUBSC

bottomWall leftWall* leftWall rightWall

(limit symbols)

Figure General Regions for Spatial Functions

A Mathematics Notation Parser

For horizontal lines and op en brackets START is applied to the region

dened by the maxX b ounding b ox co ordinate of the symb ol and the re

maining three wall attributes

These denitions of HOR START and the remaining spatial regions

clearly adopt some soft conventions of spacing eg a denition of regions

overlap and a series of thresholds The thresholds adopted are designed to

b e as exible as p ossible allowing for the widely separated symb ols such as

those in Figure d to b e recognized as spatially related in this particular

case as a sup erscript

Secondary Baseline Symb ol Sets

A given baseline symb ol set can b e divided in several ways ie by the

o ccurance of other baselines in the regions shown in Figure We call

these secondary baseline symb ol sets as their p osition is obtained relative

to the main baseline symb ol set in a given region The regions p ertinent to

secondary baseline symb ol sets may b e examined using the following metho d

Let B b e a baseline symb ol set in region R RminXRminYRmaxXRmaxY

For in where n is the numb er of symb ols in B s s are the

n

symb ols in B

set the rightWall attribute of s to the minX co ordinate of s

i i

set the leftWall attribute of s to the symb ols minX co ordinate

i

unless this is a limit symb ol If the limit symb ol follows an op en

bracket or horizontal line assign maxX of s to leftWall If the

i

limit symb ol starts a baseline assign RminX to leftWall Other

wise set leftWall to the minX co ordinate of the limit symb ol

set the topWall attribute of s to RmaxY

i

set the b ottomWall attribute of s to RminY

i

In any order examine the following two disjoint regions for each of the

symb ols in B see Figure

ABOVE fminXUpp er ThresholdmaxXtopWallg

BELOW fminXb ottomWallmaxXLower Thresholdg

A Mathematics Notation Parsing Algorithm

Additionally the following also disjoint regions may b e examined

SUPER fmaxXUpp er ThresholdrightWalltopWallg

SUBSC fmaxXLower ThresholdrightWallb ottomWallg

TLEFT fleftWallUpp er ThresholdminXtopWallg

BLEFT fleftWallb ottomWallminXLower Thresholdg

CONTAINS fminXminYmaxXmaxYg

For each region R ab ove STARTRl ist is applied which returns the

in

starting symb ol of the main baseline symb ol set in that region

Symb ol identity of the symb ols in B determines which regions are exam

ined

TLEFT and BLEFT are examined only for limit symb ols which either

start a baseline rst symb ol in B or directly follows a horizontal line

or an op en bracket b oth of which indicate the end of a sub expression

CONTAINS is used only for square ro ots

SUPER is not used for horizontal lines or op en brackets

SUBSC is not used for horizontal lines or op en brackets

A Mathematics Notation Parsing Algo

rithm

parsing algorithm is given which extracts a baseline In App endix B an On

structure tree from an input list Essentially the algorithm recursively lo cates

symb ols which start a baseline extracts the asso ciated baselines and then

records the observed baseline structure in a baseline structure tree In this

section we discuss the algorithm in only a very general way This discussion

is simplied symb ols are describ ed as b eing pushed on and o the stack

when it is really the index to the symb ol and an asso ciated tree no de that is

pushed onto either data structure and a numb er of other details are ignored

Please consult the app endix for the more detailed description

While lo cating baselines essentially using START in dierent regions

symb ols which start a baseline are pushed on a queue In the extraction

A Mathematics Notation Parser

stage symb ols are removed from the queue one at a time After removing

a symb ol from the queue it is pushed on a stack and the function HOR is

then used in a lo op to lo cate the asso ciated baseline symb ols which are then

also pushed on the stack When HOR returns eg the end of a baseline

is foundEOBL is pushed on the stack to act as a separator The next

symb ol in the queue is then removed and the same pro cess rep eated until

the queue is empty

When the queue is empty the algorithm reverts back to lo cating base

lines by using START on all the appropriate regions for secondary baselines

relative to the symb ols on the stack Any symb ols which start a secondary

baseline are placed in the queue The algorithm stops when either all sym

b ols have b een added to the baseline structure tree or no new baselines are

found in which case an error is indicated The baseline structure tree is

then returned

Some alterations may b e made to the algorithm For instance the al

gorithm has b een constructed so that the tree is built one level at a time

though this is not necessary An algorithm using a single stack which builds

the baseline structure tree depthrst may also b e created The paired data

structures of a stack and a queue are remnants of earlier research b efore

START was used recursively

Tree Rewriting

Graph rewriting is a p owerful and exible formalism As discussed in Chapter

it has b een used for parsing many dierent typ es of visual languages Tree

rewriting a subset of graph rewriting has the advantage of b eing tractable

Tree rewriting has proven a very p owerful technique for translating program

ming languages to dierent versionsdialects

The context present in a baseline structure tree may b e used to manip

ulate the tree in order to represent higher orderspatial relations directly

andor subtrees may b e regroup ed and reparsed In this way the initial

baseline structure tree pro duced by the algorithm ab ove may b e used for

structural analysis using dierent visual language syntax denitions ie di

alects This is why the baseline structure tree represents a metasyntax the

original structure is a subset of the syntax of existing dialects of which we

are aware

As an example recall Figure In this instance the baseline structure

Tree Rewriting

tree while valid in terms of representing the dened spatial relations was

not descriptive enough to represent the common syntactic structure present

in the context of a summation

The simple rule provided in Chapter for rewriting Figure as Figure

is adequate for limits comprised of a single baseline This rule represents

a higherorder spatial relation ie the new ULIMIT relation is dened in

terms of binary spatial relations in the tree

However it may b e p ossible for andor other limits to b e present

To address this new p ossibility one can simply remove and then concatenate

the three regions SUPER ABOVE and TLEFT found as children of a limit

symb ol Place the concatenated regions in a separate input to the Baseline

Structure Tree parsing algorithm Then place the resulting baseline structure

tree and place the result as a child of the limit symb ol with the relation lab el

ULIMIT

As a last example consider Figure The initial baseline structure

tree Figure br shows the from as part of the sup erscript

P

region of the x This is b ecause the do es not start the main baseline

However the sup erscript region o of the x may b e divided into two regions

partitioned by the rightmost symb ol which has a centroid that overlaps the

P

horizontally in this case the j All symb ols to the left of and including

the partition symb ol remain in the SUPER region the remaining symb ols

are placed in a new region lab eled ULIMIT The symb ols in the SUPER and

P

ABOVE regions relative to the are also placed in ULIMIT and the subtrees

ro oted at SUPER and ABOVE removed The symb ols in the ULIMIT and

SUPER regions SUPER relative to the x are then reparsed pro ducing

the tree in Figure c

It is worth p ointing out that using whitespace analysis may have simpli

ed this problem greatly The algorithm describ ed in this thesis uses neither

white space or p ointsize information The expression in Figure would

require this typ e of analysis as the last tree rewrite provided would not alter

the baseline structure tree leaving part of the limit the in the SUPER

region of the x

The visual language syntax dened by the parsing algorithm and the

ab ove rewrite rules is very simple However it is descriptive enough to obtain

the structure of a large numb er of expressions as demonstrated in the next

chapter Future work stemming from more complex syntax denitions is

discussed in the last chapter

A Mathematics Notation Parser

Translation to Output Strings

A linear representation of a baseline structure tree called a positional sen

tence may now b e obtained by p erforming a depthrst traversal of the

tree The structure of an expression in a p ositional sentence translated from

a baseline structure tree is a recursive structure of the form

s ABOVEfSB g BELOWfSB g SUBSCfSB g SUPERfSB g

ULIMITfSB g BLIMITfSB g s

All items in square brackets are optional and each nested baseline SB

p osesses the same structure as ab ove

A

A L T X string can b e obtained in a similar fashion mapping the ap

E

propriate symb ol names andor contexts in the tree to the corresp onding

LaTeX symb ol Alternately the baseline structure tree may b e rewritten

A

into a L T X compatible form ie replacing symb ols and contexts and

E

then directly translated to a string using a depthrst traversal In either

case the ro ot Expression needs to b e removed in order to create a legal

A

L T X string

E

In theory this typ e of translation is also p ossible for Maple and Mathemat

ica or other computer algebra system languages but this involves additional

semantic issues discussed in Chapter and is b eyond the scop e of this thesis

Translation to Output Strings

Expression aj 10000 x i x i i = 1 SUPER ABOVE BELOW a 1 0000 i = 1 a. Input Expression SUBSC j b. Initial Baseline Structure Tree

Expression

x i

SUPER ULIMIT BELOW a 10000 i = 1 SUBSC j

c. Rewritten Baseline Structure Tree

Figure Tree Rewriting

A Mathematics Notation Parser

Chapter

Implementation and Test

Results

Based on the general parser outlined in Chapter a parsing application was

built the Diagram Recognition Application for Computer Understanding

of Large Algebraic Expressions DRACULAE DRACULAE was develop ed

using an existing mathematics entry system called FFES develop ed by Steve

Smithies at the University of Otago New Zealand FFES had a parser of its

own which was based on existing graph grammar techniques but which

was very restricted in terms of numb er of symb ols and layout that the system

could handle DRACULAE was develop ed with the intention to augment

andor replace the graph grammarbased parser The FFES graphical user

interface is shown in Figure

DRACULAE was written in Java and FFES was implemented in TclTk

and C DRACULAE has b een interfaced with FFES through alteration

of the FFES TclTk co de Both DRACULAE and FFES have b een built

on Linux platforms and DRACULAE has also b een succesfully built under

Solaris

At present only LaTeX and p ositional sentences are available as output

ie structural representations In the last chapter we discuss issues in obtain

ing mappings to semantic representations such as Maple and Mathematica

strings

In the following a general overview of FFES and DRACULAE are pro

vided followed by a discussion of test results for DRACULAE

Implementation and Test Results

Figure The Freehand Entry System

The Freehand Formula Entry System

In this section we briey outline the FFES system For greater detail please

see

The Freehand Formula Entry System is designed to allow an individual

to input math expressions to a computer using a mouse or data tablet The

system is comprised of a graphic user interface a graphgrammar parser and

a nearestneighb our symb ol recognizer created by Jim Arvo at Caltech

The user draws a numb er strokes which FFES tracks and groups for

recognition in the background After a sp eciable time delay or after four

strokes have b een made the maximum numb er of strokes needed for the

set of symb ols recognized the symb ol recognizer is called on all p ossible

partitions of the strokes sent to the recognizer Note that Jim Arvo has

indicated that a newer version sends only O n sets of stroke partitions to

The Freehand Formula Entry System

the recognizer

The system allows easy correction of any stroke grouping or symb ol iden

tity recognition errors The Group button allows the user to select strokes

to b e joined by drawing a line through the strokes to b e group ed The

program then p erforms recognition on the selected group and any strokes

separated by the grouping

If a symb ol lab el is incorrect the Replace button allows the user to click

the mouse on the misrecognized symb ol and then select from a pulldown

list of lab els in order of condence from the symb ol recognizer or typ e a

lab elling string in from the keyb oard

In addition the spatial p osition of symb ols may b e moved using the

Select button The user can select individual symb ols or groups of symb ols

and move them in the image This is useful for interaction b etween the

parser output and the user inputs which have their structure misparsed can

b e easily altered

The original graphgrammar based parser may b e called on the recog

nized symb ols using the FFES parser button while the DRACULAE

parser calls the implementation of the baseline structuretree based parsing

algorithm describ ed in the last chapter The FFES parser was kept in the

application for comparison only and in the future will b e removed

The errorcorrection facilities of FFES proved invaluable during devel

opment of DRACULAE b ecause this allowed our research to assume the

presence of a p erfect symb ol recognizer Because of the clear lab elling in

FFES any mislab elled symb ols in the image input can b e quickly lo cated

and corrected While the user may o ccassionally miss misclassied symb ols

these errors are easily detected Likewise if the layout of symb ols confuses

the parser this can b e easily corrected through directly manipulating the

lo cation of the symb ols Currently DRACULAE parses in under two seconds

on average on a PentiumI I I MhZ Linux system with a two to ten sec

ond delay to create the graphical user interface window This is due to the

implementation language Java rather than the complexity of the parser

FFES was tested with human users and was found to b e a promising way

to input mathematical expressions to a computer The largest drawback

in the system time and accuracy wise was the existing parser

Implementation and Test Results

DRACULAE implementation

DRACULAE is comprised of a parser and a graphic user interface DRAC

ULAE takes a list of symb ols as input from any source providing b ounding

b ox and symb ol identity information and passes it to the parser Then

the baseline structure tree output by the parser is passed to the graphic

user interface which displays the tree and calls translators to display the

A

L T X and p ositional sentence translations as shown in Figure Tabs

E

are used to switch b etween the Image Viewer which displays an image

p

ab

A

of the L T X pro cessed string shown in the Output String line eg

E

x

and the Tree Viewer which shows the baseline structure tree and symb ol

attributes

Figure DRACULAE Graphic User Interface Image and Tree Views

Shown

Java was an ideal prototyping language for the following reasons First

it creates a theoretically platform indep endent prototyp e Second the

javado c facility was app ealing from a program understanding and analysis

p oint of view Third the Swing library allowed automated visualization

of the tree structure which was crucial to the program design

Test Results

DRACULAE is relatively fast parsing in under a second for many large

expressions In Linux the largest delay seems to result b ecause the version

of Java which was available did not p osess a justintime compiler resulting

in purely interpreted execution Window creation and display is particularly

slow taking up to seconds at times

In addition to b eing relatively fast DRACULAE is robust If symb ols

are missed in the input a message is sent to the terminal along with the

symb ols missed The part of the expression that was parsed is passed along

If no input is passed a single no de lab elled Expression the ro ot of the

tree is returned

Test Results

Through FFES DRACULAE has b een tested on hundreds of mathematical

expressions There are approximately one hundred test cases which act as a

test suite Examples of inputs which have b een prop erly recognized are pre

sented in Figures to Figures and are taken from a Calculus

textb o ok while Figure is presented as an example of limit handling

in DRACULAE Figure shows an expression with accents prop erly han

dled by DRACULAE and nally Figure demonstrates a large complex

expression recognized by DRACULAE

General Results

The following are some general results obtained during testing

On average the time from requesting a parse from DRACULAE to

viewing the graphical user interface is under ten seconds

We have tested on expressions with more than symb ols in complex

layouts and noticed little or no p erformance discrepancy compared

with smaller expressions

DRACULAE is robust If a misparse o ccurs the generated p ositional

A A

sentence and L T X strings are returned and a L T X image created In

E E

this case the structure of the baseline structure tree may b e observed

A

if the image do es not oer enough information The L T X string

E

will often have the spatial relations explicitly in the output image in

Implementation and Test Results

this case eg ABOVE SUPER etc app ear directly in the generated

image Empty input is handled by returning a baseline structure tree

A

with a single ro ot and the string Expression as L T X output

E

As DRACULAE p erforms no semantic analysis syntactically invalid

inputs eg with unmatched parentheses are handled without di

culty provided the baseline structure is clear to DRACULAE

The thresholds used for regions work reasonably well if there is little

skew in the input expression However slanted expressions result in

improp er subscripting or sup erscripting

Horizontal lines are mapp ed to division lines subtraction symb ols over

line eg b o olean negation or underline dep ending on the context in

the baseline structure tree For instance if a horiztonal line has sym

A

b ols ab ove and b elow L T X is instructed to create a If the

E

line has an argument ab ove or b elow underline or overline strings are

created Finally if no symb ols are ab ove or b elow the line the line is

translated as a subtraction symb ol Horizontal lines are thus allowed

to interact in complex ways with little ambiguity

When horizontal lines overlap less than the required thresholded amount

for overlap detection unusual outputs are created

R

P

Limit symb ols eg may have overlapping limits see Figure

and these limits may b e of almost arbitrary complexity as shown in

Figure However due to the current denition of START the dom

inant limit symb ol must b e leftmost in the case of an expression such as

that given in Figure This could b e resolved through incorp orating

size information into STARTs denition

There is a rewrite rule to collect separated signs using context in

the tree The particular version that has b een implemented do es not

work if the is in a limit but otherwise p erforms well This could

b e resolved by a constraint on the length that a line must b e to b e

considered overlapping a limit symb ol

A

Many expressions which are not translated to L T X appropriately are

E

nontheless recognized accurately For instance choice notation created

using single andor variables have the relative p osition of the

Test Results

( 1 ) n 1 − n t − i Sigma Sigma omega u v j t j − j = 0 t 0

=

Figure Test Input Example

symb ols correctly represented in the baseline structure tree A tree

A

rewrite to group the expression into an appropriate L T X string simply

E

hasnt b een created yet

Implementation and Test Results

7

Sigma j

j = 4 Sigma 2 ( ) i + 7

i

= 6

Figure Test Input Example

sqrt 1 4 1 1 x integral integral integral − 1 d d d y x − z 2 ) 1/ 2 ( 2 sqrt + y 1 2 x 1 x 0 −

− −

Figure Test Input Example

Test Results

( ) − − b a

disjunctionc disjunction

Figure Test Input Example

3 6 4

2 + − 2 sqrt

4 2

x

− sqrt 2 2 a sqrt 2 4 b + b + a − c −

− − 2 a ( integral5 ) 3 2 x d

2 x − x 4 + 6

0

Figure Test Input Example

Implementation and Test Results

Chapter

Future Work and Conclusion

In this chapter future work arising from this research is discussed Improve

ments which may to b e made to DRACULAE are outlined along with more

general issues such as reimplementation of DRACULAE in TXL semantic

issues which arise in mapping from baseline structure trees to semantic in

terpretations and diagrammatic notations other than mathematics notation

which may b enet from a recognition metho d similar to baseline extraction

Limitations of DRACULAE

Starting Symb ol Denition

As noted in the last chapter Figure would not b e successfully parsed if the

P P

in the upp er limit were moved to the left of the lower This is b ecause

the op erator dominance analysis currently p erformed by START do es not

take symb ol size into account It also was not originally clear that this was

an analysis of op erator dominance rather than simply a spatial analysis

With this new information DRACULAE would b enet greatly from a more

generally dened START function with a b etter dened op erator dominance

analysis

Whitespace

The current implementation of DRACULAE p erforms no tokenization of

input and do es not provide rewrite rules to group integers separate function

Future Work and Conclusion

names etc This would not b e dicult to add to the current system however

analysis of whitespace is necessary for creating a more mature system

Futher neither matrices or multiple line expressions are recognized prop

erly using DRACULAE It is unclear whether simple rewrites could b e cre

ated to handle these cases though it seems more likely that segmenting the

input and sending singleline expressions to DRACULAE would result in

b etter p erformance due to the p otential complexity of analysis

Sensitivity to Symb ol Size

Very large symb ols have very large regions and small symb ols viceversa It

is unclear whether representation of symb ols by single p oints in handwritten

expressions is reasonable or whether some additional b ounding b ox andor

pixel information would result in b etter p erformace particularly in situations

where skew is present

Pixel Level Information

DRACULAE cannot nor will it b e able to handle structures which require

pixellevel information As an example consider nthro ots Without knowing

where the line dividing the expression into inner and outer parts of the square

ro ot is it is imp ossible to dierentiate a value in the scop e of a ro ot from a

value indicating the degree of a ro ot

An arbitrary threshold could b e used but this would restrict the typ es

of expressions that could b e accepted by DRACULAE

In order to lo cate the degree of a ro ot without uncertainty DRACULAE

would need access to the p osition of the dividing line Kerned symb ols are

another example where pixel level information would result in improved p er

formance for instance in the case of T which DRACULAE would currently

n

recognize as T BELOW fng if the T b ounding b ox overlaps the centroid of

the n

TXL Implementation of DRACULAE

Java was a convenient language for prototyping DRACULAE abstracting a

great deal of low level details However it is not easy to express tree rewriting

rules in Java requiring a fairly large amount of detailed co de for even a single

TXL Implementation of DRACULAE

rewrite Additionally while the Java implementation of DRACULAE is fast

it remains to b e seen how rapidly a version in a compiled language would

execute

TXL is a programming language which p erforms a pro cess identical to

that used for DRACULAE a tree is built using a grammar the tree is rewrit

ten using a set of pro ductions and then the tree is translated into an output

string TXL allows attributed no des in its trees making the translation

from Java to TXL relatively easy TXL represents tree pro ductions using

a compact and simple syntax this makes extension and maintenance of a

system like DRACULAE easier than if the system is programmed in pro

gramming language such as C C or Java To extend DRACULAE as

it currently is to a real system capable of handling dialects Figure is prop osed

Type II Representation Baseline Structure Tree Builder Type III Representation (Attributed Symbol List) (TXL) (Structural Representation: Positional Sentence)

Type III Representation (Structural Representation, TXL Dialect Translators i.e. LaTeX) Translation Program(s) for Dialect 1 Dialect 1 Translation Program(s) for Dialect 2 Dialect 2 ...

... Translation Program(s) for Dialect n Dialect n

Type IV Representation (Semantic Representation, i.e. Maple)

Dialect 1 Dialect 2 ...

Dialect n

Figure Prop osed TXL Reimplementation of DRACULAE

Future Work and Conclusion

Issues in Semantics of Mathematics No

tation

As stated earlier mathematics notation is a natural visual language and as a

result do es not p ossess a xed semantic interpretation Therefore it would b e

worth making a study to determine which parts of mathematical semantics

if any are xed ie are hard conventions and which uctuate by dialect

ie are soft conventions

In order for a semantic interpretation to b e made the baseline structure

obtained using a system such as DRACULAE must b e brokenrewritten into

tokens integer values must b e distinguished from decimal values exp onential

numb ers function names names and so on The current system

makes no such analysis though this is clearly necessary for b oth displaying

A

more complicated expressions ie to indicate to L T X where whitespace is

E

to app ear and for semantic interpretation andor evaluation of mathematical

expressions that have had their baseline structure extracted This would not

b e dicult to add directly to DRACULAE in the form of tree rewrite rules

In the case of simple arithmetic a semantic interpretation may obtained

through mapping from the baseline structure tree to the op erator tree for

the expression as shown in Figure

2 + (2 * 3) Expression + 2 () 2 +() 2* 3 * 2 3

a. Expression b. Baseline Structure Tree c. Operator Tree

Figure Simple Arithmetic Expression and Representations

The op erator tree may b e obtained from the baseline structure tree using

op erator dominance The op erator tree indicates the order of application

of the op erators b ottomup

For more complex mathematical expressions a computer algebra system

language such as Maple or Mathematica may b e used for representation For

this typ e of representation to b e constructed a priori semantic information is

Use of Directed Recognition Algorithms in Other Notations

required For instance to pro duce appropriate output for a Maple program

that uses a function p there must b e a facility to indicate that pa is in fact

function applciation and not implied multiplication of terms

It would b e worth studying what other typ es of a priori semantic in

formation are needed to pro duce valid semantic interpretations for dierent

dialects of mathematics notation With such information DRACULAE or a

similar system could b e extended to pro duce strings in a computer algebra

system format eg Maple for dierent dialects of mathematics notation as

shown in Figure

Use of Directed Recognition Algorithms

in Other Notations

The algorithm used in this thesis exploits the direction inherent in baselines

Given the ability to lo cate the b eginning of a baseline symb ol set an al

gorithm may b e dened in order to progress lefttoright to nd the other

symb ols on a given baseline In a divide and conquer fashion baseline symb ol

sets are lo cated recursively using binary spatial relations

Other notations b esides mathematics notation have a clear element of

direction and it may b e p ossible to recognize the structure of these notations

in a similar fashion Music notation for instance has direction inherent in

the progression of notes lefttoright across a sta The analogy b etween

baselines in mathematics notation and a voice in music notation is strong in

fact a single musical voice can b e viewed as a typ e of symb ol set similar to

a baseline symb ol set A complicating factor in music notation is that unlike

baselines voices in music notation can overlap and as a result it is not trivial

to determine which voice a note b elongs to It is worth examining whether

a partially or completely dialectneutral representation of the structure of

music notation may b e obtained There are many dialects of music notation

so it is unclear whether this is p ossible

Circuit diagrams also have an element of direction the direction of cur

rent ow If p ower sources can b e lo cated in a diagram it may b e p ossible

to employ the direction of circuit ow directly to aid recognition

Future Work and Conclusion

Conclusion

The research presented in this do cument has established the following thesis

Through separating spatial structure from semantics in mathe

matics notation a general and exible recognition of mathematics

expressions may b e obtained

As a reminder general is used to indicate that dialects of mathematics no

tation may b e conveniently handled Flexible refers to the ability to handle

a large range of symb ol placements

The following contributions which supp ort this thesis have b een discussed

A mo del for the structure of mathematics notation

A novel mo del called the baseline structure tree was intro duced in

Chapter This mo del represents spatial structure b etween baselines

in mathematics expressions It is demonstrated in Chapter how a

baseline structure tree may b e rewritten in order to p erform syntactic

analysis for dierent dialects of mathematics notation

A visual language parsing algorithm

In Chapter an algorithm which creates baseline structure trees from

attributed symb ol lists is describ ed The parsing algorithm presented is

more exible than exisiting mathematics notation recognition systems

b ecause the spatial relationships used to obtain baseline structure may

b e redened and tree rewriting may b e used to handle soft conventions

ie dialects

An implementation

An implementation of the visual language parsing algorithm was pre

sented in Chapter The implementation is named the Diagram Recog

nition Application for Computer Understanding of Large Algebraic Ex

pressions DRACULAE Chapter outlines how DRACULAE has

b een integrated into a complete recognition system for handwritten

mathematics notation the Freehand Formula Entry System FFES

The symb ol recognition and user interface comp onents of FFES were

created by Steve Smithies Jim Arvo and Kevin Novins The system

has b een tested on hundreds of handwritten expressions with excellent

Conclusion

results FFES and DRACULAE were demonstrated at CASCON

and were enthusiastically received Public distribution of the system is planned

Future Work and Conclusion

Bibliography

RH Anderson SyntaxDirected Recognition of HandPrinted Two

Dimensional Equations PhD thesis Harvard University Cambridge

MA January

RH Anderson Twodimensional mathematical notation In KS Fu

editor Syntactic Pattern Recognition SpringerVerlag New York

Stephan Baumann A simplied attributed graph grammar for highlevel

music recognition In Proc Third Intl Conf on Document Analysis and

Recognition Montreal Canada August

Ab delwaheb Belaid and JeanPaul Haton A Syntactic Approach for

Handwritten Mathematical Formula Recognition IEEE Transactions

on Pattern Analysis and Machine Intel ligence January

Benjamin P Berman and Richard J Fateman Optical character recog

nition for typ eset mathematics In Proceedings of the International

Syposium on Symbolic and Algebraic Computation pages July

Dorothea Blostein General diagramrecognition metho dologies In Lec

ture Notes in volume pages Springer

Verlag New York

Dorothea Blostein and Ann Grbavec Recognition of mathematical no

tation In Handbook of Character Recognition and Document Image

Analysis pages World Scientic Publishing Company

BIBLIOGRAPHY

H Bunke Attributed programmed graph grammars and their applica

tion to schematic diagram interpretation IEEE Transactions on Pattern

Analysis and Machine Intel ligence Novemb er

Florian Ca jori A Chelsea Publishing Company

New York

Florian Ca jori A History of Mathematical Notations The Op en Court

Publishing Company Chicago Illinois vols

G Castagliola A De Lucia S Orece and G Tortora A framework

of syntactic mo dels for the implementation of visual languages In Proc

Symposium on Visual Languages pages

ShiKuo Chang A metho d for the structural analysis of twodimensional

mathematical expressions Information Sciences

TW Chaundy PR Barrett and Charles Batey The Printing of Math

ematics Oxford University Press London

LingHwei Chen and PengYeng Yin A system for online recognition of

handwritten mathematical expressions Computer Processing of Chinese

and Oriental Languages June

P A Chou Recognition of equations using a twodimensional sto chastic

contextfree grammar In W A Pearlman editor Visual Communica

tions and Image Processing IV volume of SPIE Proceedings Series

pages

JR Cordy CD Halp ern and E Promislow Txl A rapid prototyp

ing system for programming language dialects Computer Languages

Jan

Genarro Costagliola and ShiKuo Chang Parsing linear pictorial lan

guages by syntaxdirected scanning Languages of Design

Genarro Costagliola Andrea De Lucia and Sergio Orece Towards

ecient parsing of diagrammatic languages In Proceedings of Advanced

Visual Interfaces pages ACM Press

BIBLIOGRAPHY

Genarro Costagliola Andrea De Lucia and Sergio Orece A parsing

metho dology for the implementation of visual systems IEEE Transac

tions on Software Engineering Decemb er

Genarro Costagliola Andrea De Lucia Sergio Orece and Genny Tor

tora Positional grammars A formalism for LRlike parsing of visual

languages In Visual Language Theory pages SpringerVerlag

New York

Gennaro Costagliola and ShiKuo Chang Using linear p ositional gram

mars for the LR parsing of d symb olic languages draft pap er for copy

contact gencosdiaunisait

Gennaro Costagliola Andrea De Lucia and Sergio Orece Towards

ecient parsing of diagrammatic languages In Proceedings of Advanced

Visual Interfaces pages ACM Press

Yannis A Dimitriadis and Juan Lop ez Coronado Towards an art based

mathematical editor that uses online handwritten symb ol recognition

Pattern Recognition

Dov Dori and Amir Pneuli The grammar of dimensions in machine

drawings Computer Vision Graphics and Image Processing

Talaat Salem ElSheikh Recognition of handwritten arabic mathemat

ical In United Kingdom Information Technology Conference

March

Ho da Fahmy and Dorothea Blostein A graph grammar programming

style for recognition of music notation Machine Vision and Applica

tions

Ho da Fahmy and Dorothea Blostein A graphrewriting paradigm for

discrete relaxation Application to sheet music recognition Interna

tional Journal of Pattern Recognition and Articial Intel ligence August

Richard J Fateman and Taku Tokuyasu Progress in recognizing typ eset

mathematics In Proceedings of the International Society for Optical

Engineering volume

BIBLIOGRAPHY

Claudie Faure and Zi Xiong Wang Automatic p erception of the struc

ture of handwritten mathematical expressions In R Plamondon and

C G Leedham editors Computer Processing of Handwriting pages

World Scientic Publishing Co

KS Fu Syntactic Pattern Recognition SpringerVerlag New York

Ann Grbavec Recognition of mathematics notation using graph rewrit

ing Masters thesis Queens University Kingston Ontario Canada

January

Ann Grbavec and Dorothea Blostein Mathematics recognition using

graph rewriting In Third Intl Conf on Document Analysis and Recog

nition Montreal August

Nicholas J Higham Handbook of Writing for the Mathematical Sciences

So ciety for Industrial and Applied Mathematics Philadelphia

Donald E Knuth TeX and METAFONT New Directions in Typeset

ting Digital Press Crosby Drive Bedford MA USA

Andreas Kosmala and Gerhard Rigoll Online handwritten formula

recognition using statistical metho ds In Proceedings of the Fourteenth

International Conference on Pattern Recognition pages Au

gust

Stephane Lavirotte and Loic Pottier Optical Formula Recognition In

Proc th International Conference on Document Analysis and Recogni

tion volume pages Ulm Germany

HsiJian Lee and JiumnShine Wang Design of a mathematical ex

pression recognition system In Proceedings of the third International

Conference on Document Analusis and Recognition pages

HsiJuan Lee and MinChou Lee Understanding Mathematical Expres

sions Using Pro cedureOriented Transformation Pattern Recognition

BIBLIOGRAPHY

Kim Marriott Bernd Meyer and Kent D Wittenburg A survey of vi

sual language sp ecication and recognition In Visual Language Theory

pages SpringerVerlag New York

William A Martin Computer InputOutput of Mathematical Expres

sions In Proceedings of the Second Symposium on Symbolic and Alge

braic Manipulation pages March

William G McCallum Deb orah HughesHallett Andrew M Gleason

and et al Multivariable Calculus Draft Version John Wiley and Sons

Inc

Eric G Miller and Paul A Viola Ambiguity and constraint in math

ematical expression recognition In Proceedings of the th National

Conference of Articial Intel ligence Madison Wisconsin July

American Asso ciation of Articial Intelligence

Brad A Myers Taxonomies of visual programming and program visu

alization Journal of Visual Languages and Computing

Masayuki Okamoto and Akira Miyazawa An exp erimental implemen

tation of a do cument recognition system for pap ers containing math

ematical expressions In HS Baird H Bunke and K Yamamoto

editors Structured Document Image Analysis pages Springer

Verlag New York

Nasayuki Okamoto and Bin Miao Recognition of Mathematical Expres

sions by Using the Layout Structures of Symb ols In Proceedings of the

First International Conference on Document Analysis and Recognition

volume pages SaintMalo France

Giulia M Pagallo Constrained attribute grammars for recognition of

multidimensional ob jects In Advances in Pattern Recognition pages

SpringerVerlag

Steve Smithies Freehand formula entry system Masters thesis Uni

versity of Otago Dunedin New Zealand May

Steve Smithies Kevin Novins and James Arvo A HandwritingBased

Equation Editor In Proc Graphics Interface Kingston Ontario

Canada June

BIBLIOGRAPHY

Hashim M Twaakyondo and Masayuki Okamoto Structure Analysis

and Recognition of Mathematical Expressions In Proceedings of the

Third International Conference on Document Analysis and Recognition

volume Montreal Canada

ZiXiong Wang and Claudie Faure Structural Analysis of Handwritten

Mathematical Expressions In Proceedings of the Ninth International

Conference on Pattern Recognition pages

HJ Winkler H Fahrner and M Lang A Soft Decision Approach

for Structural Analysis of Handwritten Mathematical Expressions In

International Conference on Acoustics Speech and Signal Processing

pages IEEE

HansJurgen Winkler and Manfred Lang Symb ol segmentation and

recognition for understanding handwritten mathematical expressions In

Progress in Handwriting Recognition World Scientic Singap ore

Yanjie Zhao Tetsuya Sakurai Hiroshi Sugiura and Tatsuo Torii A

metho dology of parsing mathematical notation for mathematical com

putation In Proceedings of the International Symposium on Symbolic

and Algebraic Computation July

App endix A

A Positional Grammar for

Baseline Structure

In this app endix a positional grammar is presented which represents the

structure of baselines in mathematics notation ie a syntax of baseline

structure trees The strings derived from the grammar are p ositional sen

tences which corresp ond to baseline structure trees

A p ositional grammar is an attributed contextfree grammar augmented

by a set of p ositional relations which are explicit in the derived parse string

The output of a positional parser for a p ositional grammar is a string called

a p ositional sentence which alternates terminals of the grammar with p osi

tional relations see for more details There is

a class of p ositional grammars for which fast ie On LR parsers with

actions may b e constructed using an automatic parser generator such as

YACC

In a p ositional grammar multiple p ositional relations on one symb ol are

presented using enumeration of relations in the p ositional sentence For ex

ample in

aH O R fcgSUPER fgSUBSC fig

the numeral n after each relation indicate that it applies to the symb ol on the

immediate right and the nth last symb ol in the string The ab ove example

represents a c

i

Figure A denes a p ositional grammar for the structure of baselines in a

mathematics expression Note that SYMBOL is a terminal of the grammar

A Positional Grammar for Baseline Structure

E SP BLSS

BLSS BLS HOR BLSS j

BLS SYMBOL B C D E F G H

B TLEFT fBLSSg j

C BLEFT fBLSSg j

D ABOVE fBLSSg j

E BELOW fBLSSg j

F SUPER fBLSSg j

G SUBSC fBLSSg j

H CONTAINS fBLSSg j

Figure A Positional Grammar of Baseline Structure

representing an attributed symb ol Attributes are propagated from the left

hand side to the right hand side This is done in the following way

Rule will assign the wall attributes of BLSS to the pair

In Rule the wall attributes of the lefthand BLSS baseline symb ol

set are passed to b oth of the nonterminals on the right hand side The

rightWall attribute of BLS baseline symb ol is then set to the mini

mum b ounding b ox co ordinate for the symb ol represented by the BLSS

nonterminal on the right side of the rule this provides the necessary

partition for SUBSC and SUPER regions

In Rule the wall attributes of BLS are passed to SYMBOL and all

nonterminals B to H

In Rules if a symb ol is lo cated the BLSS on the right hand side

inherits wall attributes equivalent to the region within which the symb ol

was found

A Positional Grammar for Baseline Structure

A problem with the grammar ab ove is that the grammar presents context

through bracketing which is not present in the original p ositional grammar

formalism As a result the manner in which to cleanly construct the bracket

ing in a parser for the grammar ab ove has not presented itself The algorithm

in Chapter is essentially equivalent to a topdown parser based on the gram

mar ab ove For that parsing algorithm the bracketing issue is resolved by

constructing a baseline structure tree during the parse and then bracketing

based on context while translating the tree into a string

Attributes need to b e synthesized topdown for the grammar ab ove and

as a result an LR parser could not b e constructed for the grammar even if the

bracketing issue were resolved More research is needed to see whether the

techniques used by Costagliola et al to create LR parsers with actions may

b e generalized to create LL parsers with actions The advantage of such a

generalization is that it would allow the use of automatic parser generators to

build LL as well as LR parsers for visual languages describable by p ositional grammars

A Positional Grammar for Baseline Structure

App endix B

Algorithms

B START and OVERLAP

In the following we dene spatial functions STARTRl ist where R is a

in

region and l ist an input list STARTRl ist returns the starting symb ol

in in

s of input list l ist in region R We assume that l ist is a list sorted by

star t in in

leftmost co ordinate indexed from to the numb er of elements in the list

START Let l ist b e the passed input list

in

Let l ef tW al l bottomW al l r ig htW al l topW al l b e the passed values den

ing a region R

Let l ef tmostI ndex

Let l imitI ndex

Let l istI ndex

Let ov er l apI ndex

Let n b e the numb er of items in l ist

in

While l ef tmostI ndex and l istI ndex n

If l ist l istI ndex is an unmarked symb ol with its centroid in

in

region R let l ef tmostI ndex l istI ndex

else let l istI ndex l istI ndex

If l ef tmostI ndex then return l ef tmostI ndex

else

While l istI ndex n and l imitI ndex

Algorithms

If l ist l istI ndex is an unmarked limit symb ol in region R

in

then let l imitI ndex l istI ndex

else l istI ndex l istI ndex

If l imitI ndex or l imitI ndex l ef tmostI ndex

return OVERLAPl ef tmostI ndextopW al l bottomW al l l ist

in

else

Let upper T hr eshol d b e the maximum y b ounding b ox co or

dinate value at l ist l imitI ndex

in

Let l ow er T hr eshol d b e the minimum y co ordinate b ounding

b ox value at l ist l imitI ndex

in

While l istI ndex l ef tmostI ndex

l istI ndex l istI ndex

If the centroid y co ordinate of the symb ol at l ist l istI ndex

in

upper T hr eshol d and the same y centroid co ordinate

l ow er T hr eshol d then let ov er l apI ndex l istI ndex

If ov er l apI ndex l imitI ndex then

return OVERLAPl ef tmostI ndextopW al l bottomW al l l ist

in

else return OVERLAPl imitI ndextopW al l bottomW al l l ist

in

END OF ALGORITHM

Next we describ e OVERLAPsymb olIndextopWallb ottomWall l ist

in

OVERLAP Let sy mbol I ndex b e the passed index to l ist

in

Let topW al l and bottomW al l b e passed yco ordinates

Let l ist b e the passed input list

in

Let l istI ndex sy mbol I ndex

Let stop false

Let n b e the numb er of items in l ist

in

If l ist sy mbol I ndex is a line then let maxLeng th b e maxX minX of

in

that symb ol

else let maxLeng th

Let mainLine

While l istI ndex and stop false

If l ist l istI ndex contains a symb ol which has a minX b ound

in

ing b ox co ordinate less than that of the symb ol at l ist sy mbol I ndex

in

then stop true

B Main Parsing Algorithm

else l istI ndex l istI ndex

While l istI ndex n and minX b ounding b ox co ordinate of the symb ol at

l ist l istI ndex is less than maxX of l ist sy mbol I ndex

in in

If l ist l istI ndex is

in

An unmarked horizontal line with y centroid co ordinate

topWall and b ottomWall and

Has a minX b ounding b ox co ordinate minX of l ist sy mbol I ndex

in

and

Is longer than maxLeng th

then let maxLeng th b e the length of this line maxXminX and

let mainLine l istI ndex

If mainLine return sy mbol I ndex

else return mainLine

END OF ALGORITHM

The worstcase time complexity for START and OVERLAP are O n In

the worst case OVERLAP scans the entire list rst forward and then back

ward p erforming O op erations for each element giving O n START

in the worst case will rst scan the entire input forward and backward in the

case of a limit symb ol which is rightmost in the input with the start of one of

the limits b eing leftmost again O n Assuming that all symb ols horizon

tally overlap the subsequent call to OVERLAP then requires an additional

O n steps n n O n Therefore b oth algorithms are of linear time

complexity in the worst case

B Main Parsing Algorithm

In this section we present the full algorithm describ ed in chapter

Let l ist b e a sorted list of prepro cessed symb ols

in

Let T b e a tree with a single no de at the ro ot T

r oot

Let S b e a stack

Let Q b e a queue

Let ParentNo de Symb olNo de and RelationNo de b e tree no des

Algorithms

Let Temp and Temp and s b e integers

star t

Let region R fg

Obtain the index of the start symb ol s SPR

star t

If s set the wall attributes of s to R enqueue s T in Q

star t star t star t r oot

and mark the symb ol at l ist s

in star t

While Q is not empty

EXTRACT BASELINES

While Q is not empty

TempParentNo de dequeueQ

Assign to Symb olNo de a new tree no de with all the attributes of

the symb ol at Temp in l ist

in

Push TempSymb olNo de on S

R wallAttributesTemp

Temp HORl ist Temp

in

While Temp

Mark the symb ol at l ist Temp

in

wallAttributesTemp wallAttributesTemp

Let Symb olNo de b e a new tree no de containing all attributes

asso ciated with l ist Temp

in

Add Symb olNo de as the last child of ParentNo de

Push TempSymb olNo de

rightWallTemp minXTemp

If Temp is a limit symb ol and Temp is a horizontal line or

op en bracket assign leftWallTemp maxXTemp

Temp Temp

Temp HORl ist Temp

in

Push EOBL on the stack

LOCATE SECONDARY BASELINES

While S is not empty

If topStack EOBL then PopS

TempSymb olNo de Pops

B Main Parsing Algorithm

Check all of the relevant secondary baseline regions for the symb ol

at index Temp in l ist TLEFT BLEFT ABOVE BELOW

in

SUBSC SUPER CONTAINS each calls STARTRl ist for

in

R dened for the region b eing examined see Figure For

each region examined which returns an result ie Temp

STARTRl ist Temp and Temp do the following

in

Mark the symb ol at l ist Temp

in

wallAttributesTemp R

Let RelationNo de b e a new tree no de lab eled with the name

of the matching relation

Add RelationNo de as a child of Symb olNo de

Enqueue Temp RelationNo de in Q

Scan the input If any unmarked tokens remain output an error indicating

symb ols in the input were not added to the baseline structure tree this

corresp onds to Genarro Costagliolas ANY function

Return T

END OF ALGORITHM

The algorithm is of time complexity O n At most n symb ols are con

sidered for each of the extract and lo cate pro cesses indicated in upp er case

letters ie the inner lo ops execute O n times each HOR and START are

b oth O n as established earlier This eciency is due to the determinism

of the parser

An adjacency list representation may b e used for the tree T With an

adjacency list creating no des is an O n op eration in the worst case We

never need to examine the tree during the pro cess of the algorithm

The maximum size of the tree is n where n is the numb er of symb ols in

the input list The worst case o ccurs when no baseline symb ol set has more

than one symb ol as an element

Algorithms

Vita

Name Richard Zanibbi

Place and year of birth Sudbury Ontario

Education Queens University

Exp erience Teaching assistant Department of Computing and

Information Science Queens University

Research assistant Department of Computing

and Information Science Queens University

Software Develop er Legasys Corp oration

Kingston Ontario Canada

Awards Wilfred Laurier University Scholarshipdeclined

Queens Graduate Award

University of Otago Optical Music Recognition

Research Scholarshipdeclined