Robust Meeting Request Extraction from Emails, Using a Collocational Semantic Grammar


Robust Identification and Extraction of Meeting Requests from Email Using a Collocational Semantic Grammar

Hugo Liu
MIT Media Laboratory
20 Ames St., Bldg. E15
Cambridge, MA 02139 USA
[email protected]

Hassan Alam, Rachmat Hartono, Timotius Tjahjadi
Natural Language R&D Division
990 Linden Dr., Suite 203
Santa Clara, CA 95050
{hassana, rachmat, timmyt}@bcltechnologies.com

Abstract

Meeting Runner is a software agent that acts as a user’s personal secretary by observing the user’s incoming emails, identifying requests for meetings, and interacting with the person making the request to schedule the meeting into the user’s calendar, on behalf of the user. Two important subtasks are being able to robustly identify emails containing meeting requests, and being able to extract relevant meeting details. Statistical approaches to meeting request identification are inappropriate because they generate a large number of false positive classifications (fallout) that can be annoying to users. A full parsing approach has low fallout, and can assist in the extraction of relevant meeting details, but exhibits poor recall because deep parsing methods break over very noisy emails.

In this paper, we demonstrate how a broad-coverage partial parsing approach using a collocational semantic grammar can be combined with lightweight semantic recognition and information extraction techniques to robustly identify and extract meeting requests from emails. Using a relatively small collocation-based semantic grammar, we are able to demonstrate a good 74.5% recall with a low 0.4% fallout, yielding a precision of 93.8%. We situate these processes in the context of our overall software agent architecture and the email meeting scheduling task domain.

1 The Task: Email Meeting Scheduling via Natural Language

When we think of computers of the future, what comes to mind for many are personal software agents that help us to manage our daily lives, taking on responsibilities such as booking dinner reservations, and ordering groceries to restock the refrigerator. One of the more useful tasks that a personal software agent might do for us is to help manage our schedules: booking an appointment requested by a client, or arranging a movie date with a friend. Because today we rely on email to accomplish much of our social and work-related communication, and because emails are in some senses less invasive to a busy person than a phone call, people generally prefer to request meetings and get-togethers with co-workers and friends by sending them an email message. The sender might then receive a response confirming, declining, or rescheduling the meeting. This back-and-forth interaction may continue many times over until something is agreed upon. Such a task model is referred to as asynchronous email meeting scheduling.

Previous approaches to software-assisted asynchronous email meeting scheduling either require all involved parties to possess common software, such as with Microsoft Outlook and Lotus Notes, or require explicit user action, such as with web-based meeting invitation systems like evite.com and meetingwizard.com.

In the former approach taken by Microsoft Outlook and Lotus Notes, users can directly add meeting items to the calendars of other users, and the software can automatically identify times when all parties are available. This is very effective, and can be very useful within companies where all workers have common software; however, such an approach is inadequate as a universal solution to email meeting scheduling because a user of the software cannot use the system to automatically schedule meetings with non-users of the software, and vice versa.

The latter approach exemplified by evite.com and meetingwizard.com moves the meeting scheduling task to a centralized web server, and all involved parties communicate with that server to schedule meetings. Because all that is required is a web browser, the second approach circumvents the software-dependency limitations of the former. However, a drawback is that this system for meeting scheduling is not automated; it requires users to read the email with the invite, open the URL to the meeting item, and check some boxes. If the meeting details were to change, the whole process would have to repeat itself. It is evident that this approach is not amenable to automation.

1.1 Why via Natural Language?

The approach we have taken is to build a personal software agent called Meeting Runner that can automatically interact with co-workers, clients, and friends to schedule meetings through emails by having the interaction take place in plain and common everyday language, or natural language as it is generally called. Figure 1 depicts how the system automatically recognizes a meeting request from incoming email and extracts the relevant meeting details.

Figure 1. Screenshot of the Meeting Runner agent recognizing a meeting request in an incoming email. Email is not a typical example and is simple for illustrative purposes.

In contrast to existing approaches to meeting scheduling, we have chosen natural language as the communication “protocol”. Natural language is arguably the most common format a software program can communicate in, because humans are already proficient in this. By specifying natural language as the format of emails that can be understood and generated by our software agent, we can overcome the problem of required common software (a person who has installed our agent can automatically receive meeting requests and schedule meetings with someone who does not have our agent installed), and the problem of required user action (our agent can interact with the person who requested the meeting by further emails, never requiring the intervention of the user).

1.2 Identifying Meeting Requests and Extracting Meeting Details

The task model of the software agent has the following steps: 1) observe the user’s incoming emails and from them, identify emails containing meeting requests; 2) from meeting request emails, extract partial meeting details; 3) through natural language dialog over email, interact with the person making the request to negotiate the details of the meeting; 4) schedule the meeting in the user’s calendar.

In this paper, we focus on the first two steps, which are themselves very challenging. In section 2, we motivate our approach by first reviewing how two very common identification and extraction strategies fail to address the needs of our task model. In section 3, we discuss how shallow partial parsing with a collocational semantic grammar is applied to the task of meeting request identification. Section 4 presents how lightweight semantic recognition agents and information extraction techniques extract relevant meeting details from identified emails. Section 5 gives an evaluation of the performance of the identification and extraction tasks. We conclude by discussing some of the methodological gains of our approach, and give directions for future work.

2 Existing Strategies

Two existing strategies for the identification of meeting requests and the extraction of meeting details are considered in this section. First, statistical model-based classification can be coupled with statistical extraction of meeting details. Second, full parsing can be combined with extraction of meeting details from parse trees. In the following subsections we argue that neither strategy addresses the requirements of the application’s task model.

2.1 Strategy #1: Statistical

Statistical machine learning approaches are popular in the information filtering literature, especially with regard to the task of email classification. Finn et al. (2002) applied statistical machine learning approaches to genre classification of emails based on handcrafted rules, part-of-speech, and bag-of-words features. In one of their experiments, they sought to classify an email as either subjective or fact, within a single domain of football, politics, or finance. They reported accuracy of 85-88%. While these results might at first glance suggest that a similar approach is promising in the identification of meeting request emails, there are problems in the details.

First, error (12-15%) was equally attributable to false positives and false negatives (the distribution of false positives versus false negatives is hard to control in statistical classifiers). This would imply a false positive (fallout) rate of 6-7%. In our meeting request scheduling application, the system would take an action (e.g. reply to the sender, or notify the user) each time it detected a meeting. User attention is very expensive. While the system can tolerate missing some true meeting request emails, since the user can still discover the meeting request manually, the system cannot tolerate many false meeting request identifications, as they waste the user’s attention. Therefore, our task model requires a very low fallout rate, and statistical classification would seem inappropriate.

Second, there are further reasons to believe that meeting request classification is a far harder problem for statistical classifiers than genre classification. In genre classification, vocabulary and word choice are surface features that are fairly evenly spread across the large input. However, email meeting requests can be as short as “let’s do lunch”, with no hints that can be gleaned from the rest of the email. In this sense, statistical classifiers would have trouble because they are semantically weaker methods that require large input with cues scattered throughout. Though we have not explicitly experimented with statistical classifiers for our task, we anticipate that such characteristics of the input would make machine learning and classification very difficult.

Third, even if we assumed that a statistical classifier could do a fair job of identifying emails containing meeting requests, it still would not be able to identify the salient sentence(s) in the email explicitly containing the request. Explicit identification of salient sentences would provide valuable and necessary cues to the meeting detail extraction mechanism. Without this information, the extraction of such details would prove difficult, especially if the email contains multiple dates, times, people, places, and occasions. Also, in such a case, statistical extraction of meeting details would prove nearly impossible.

2.2 Strategy #2: Full Parsing

Now that we have examined some of the reasons why statistical methods might not be appropriate to our task, we examine the possibility of applying a full parsing approach. In this approach, we perform a full syntactic constituent parse of each email, and from the resulting parse trees, we perform semantic interpretation into thematic role frames. We could then use rule-based heuristics to determine which semantic interpretations are meeting requests and which are not. Similarly, we can extract meeting details from our semantic interpretations.

On some levels this method is more appropriate to our task than statistical methods. It is much easier to prevent false positives using rule-based heuristics over a parse tree than via statistical methods, which are less amenable to this kind of control. Also, the full parsing approach would yield the exact location of salient sentences and therefore facilitate the extraction of meeting details in the close proximity of the salient meeting request sentences.

However, from pilot work, we found this approach to be extremely brittle and impractical. Using the constituent output from the Link Grammar Parser of English (Sleator and Temperley, 1993) bundled with some rule-based heuristics for semantic interpretation, we parsed a test corpus of email. While fallout was held low, recall was extremely poor (< 30%). Upon closer examination of the reasons for the poor performance, we found that the email domain was too noisy for the syntactic parser to handle. Sources of noise in our corpus included improper capitalization, improper or lack of punctuation, misspellings, run-on sentences, short sentence fragments, and disfluencies resulting from English as a Second Language authors. And the problem is not limited to our Link Grammar Parser, as most chart parsers are also generally not very tolerant of noise. Such poor performance was disappointing, but it helped to inspire another approach, one which exhibits characteristics of parsing without its brittleness.

3 Robust Meeting Request Identification

Unlike the relatively “clean” text found in the Wall Street Journal corpus, text found in emails can be notoriously “dirty”. As previously mentioned, email texts often lack proper punctuation and capitalization, tend to have sentence fragments, omit words with little semantic content, use abbreviations and shorthand, and sometimes contain mildly ill-formed grammar. Therefore, many of the full parsers that can parse clean text well would have a tough time with a dirty text, and are generally not robust enough for this type of input. Thankfully, we do not need such a deep level of understanding for meeting request extraction. In fact, this is purely an information extraction task. As with most information extraction problems, the desired knowledge, which in our case is the meeting request details, can be described by a semantic frame with slots similar to the following:

• Meeting Request Type: (new meeting request, cancellation, rescheduling, confirmation, irrelevant)
• Date/Time interval proposed: (e.g., next week, next month)
• Location/Duration/Attendees
• Activity/occasion: (e.g., birthday party, conference call)

As previously defined, the task of identifying and extracting meeting request details from emails can be decomposed into 1) classifying the request type of the email as shown in the frame above, and 2) filling in the remaining slots in the frame. In our system, the second task can be solved with help from the solution to the first problem. We approach the classification of email into request type classes in the following manner: Each request type class is treated as a language described by a grammar. Membership in a language determines the classification. Membership in multiple languages requires disambiguation by a decision tree. If an email is not a member of any of the languages, then it is deemed an irrelevant email not containing a meeting request. We will now describe the properties of the grammar.

3.1 A Collocational Semantic Grammar

Semantic grammars were originally developed for the domains of question answering and intelligent tutoring (Brown and Burton, 1975). A property of these grammars is that the constituents of the grammar correspond to concepts specific to the domain being discussed. An example of a semantic grammar rule is as follows:

MeetingRequest → Can we get together DateType for ActivityType

In the above example, DateType and ActivityType can be satisfied by any word or phrase that falls under that semantic category. Semantic grammars are a practical approach to parsing emails for request type because they allow information to be extracted in stages. That is, semantic recognizers first label words and phrases with the semantic types they belong to, then rules are applied to sentences to test for membership in the language. Semantic grammars also have the advantage of being very intuitive, and so extending the grammar is simple to understand. Examples of successful applications of semantic grammars in information extraction can be found in entrants to the U.S. government sponsored MUC conferences, including the FASTUS system (Hobbs et al., 1997), CIRCUS (Lehnert et al., 1991), and SCISOR (Jacobs and Rau, 1990).

The type of semantic grammar shown in the above example is still somewhat narrow in coverage because the productions generated by such rules are too specific to certain syntactic realizations. For example, the previous example can generate the first production listed below, but not the next two, which are slight variations.

• Can we get together tomorrow for a movie
• *Can we get together tomorrow to catch a movie
• *Can we get together sometime tomorrow and check out a movie

We could arguably create additional rules to handle the second and third productions, but that comes at the expense of a much larger grammar in which all syntactic realizations must be mapped. We need a way to keep the grammar small and the coverage of each rule broad, and at the same time, the grammar we choose must be robust to all the aforementioned problems that plague email texts, like omission of words and sentence fragments. To meet all of these goals, we add the idea of collocation to our semantic grammars. Collocation is generally defined as the proximity of two words within some fixed “window” size. This technique has been used in a variety of natural language tasks including word-sense disambiguation (Yarowsky, 1993) and information extraction (Lin, 1998). Applying the idea of collocations to our semantic grammar, we eliminate all except the three or four most salient features from each of our rules, which generally happen to be the following atom types: subjectType, verbType, and objectType. For example, we can rewrite our example rule as the following (for clarification, we also show the expansions of some semantic types):

MeetingRequest → ProposalType SecondPersonType GatherVerbType DateType ActivityType
ProposalType → can | could | may | might
SecondPersonType → we | us
GatherVerbType → get together | meet | …

In our new rules, it is implied that the right-hand side contains a collocation of atoms. That is to say, between each of the atoms in our new rule, there can be any string of text. An additional constraint of collocations is that in applying the rules, we constrain the rule to match the text only within a specified window of words, for example, ten words. Our rewritten rule has improved coverage, now generating all the productions mentioned earlier, plus many more. In addition, the rule becomes more robust to ill-formed grammar, omitted words, etc. Another observation that can be made is that our grammar size is significantly reduced because each rule is capable of more productions.

3.2 Negative Collocates

There are, however, limitations associated with collocation-based semantic grammars: namely, the words not specified, which fall between the atoms in our rules, can have an unforeseen impact on the meaning of the text surrounding them. Two major concerns relating to meeting requests are the appearance of “not” to modify the main verb, and the presence of a sentence break in the middle of a matching rule. For example, our rule incorrectly accepts the following productions:

• *Could we please not get together tomorrow for that movie? (presence of the word “not”)
• *How can we meet tomorrow? I have to go to a movie with Fred. (presence of an inappropriate sentence break)

To overcome these occurrences of false positives, we introduce the novel notion of negative collocates into our grammar, which we will denote as atoms surrounded by # signs. A negative collocate between two regular collocates means that the negative collocate may not fall between the two regular collocates. We also introduce an empty collocate into the grammar, represented by 0. An empty collocate between two regular collocates means that nothing except a space can fall between the two regular collocates. We can now modify our rule to restrict the false positives it produces as follows:

MeetingRequest → ProposalType 0 SecondPersonType #SentenceBreak# #not# GatherVerbType #SentenceBreak# DateType #SentenceBreak# ActivityType

Using this latest rule, most plausible false positives are restricted. Though it is still possible for such a rule to generate false positives, pragmatics make further false positives unlikely.

3.3 Implications of Pragmatics

Pragmatics is the study of how language is used in practice. A basic underlying principle of how language is used is that it is relevant and economical. According to Grice’s maxims (Grice, 1975), language is used cooperatively and with relevance to communicate ideas. The work of Sperber and Wilson (1986) adds that language is used economically, without intention to confuse the listener/reader.

The implication of pragmatics for our grammar is that although there exist words that could be added between collocates to create false positives, the user will not do so if it makes the language more expensive or less relevant. This important implication is largely validated by the fallout metric in performance evaluations which we will present later in this paper.

3.4 Constraints on Email Classification from Dialog Context

One additional piece of context which can be leveraged to improve the accuracy of the meeting request classifier is the constraint imposed by the dialog context. In our system, emails may be classified into the request types of: new meeting request, cancellation, rescheduling, confirmation, or irrelevant. Using information gleaned from email headers, we are able to track whether or not an email is following up on a previous email which contained a meeting request. This constitutes what we call a “dialog context”. We briefly explain its implications for the classifier below.

3.4.1 Dialog Context Frames

The dialog management step examines the incoming email header to determine if that email belongs to a thread of emails in which a meeting is being scheduled. To accomplish this, each email determined to contain a meeting request is added to a repository of “dialog context frames” (DCF). A DCF serves to link the unique ID of the email, which appears in the email header, to the meeting request frame that contains the details of the meeting. The slots of the DCF are nearly identical to those of the meeting request frame, except that it also contains information about the email thread it belongs to. DCFs are passed on to later processing steps, and provide the historical context of the current email, which can be useful in disambiguation tasks. In addition, the most recent request type, as specified in a DCF, helps to determine the allowable next request type states. In other words, a finite state automaton dictates the allowable transitions between consecutive email requests. Figure 2 shows a simplified version of the finite state automaton that does this.

Figure 2. The nodes in this simple finite state automaton represent request types, and the directed edges represent allowable transitions between the states.

So in the above example, an email whose DCF has the request type “meeting declined” cannot be classified as “meeting confirmed” at the present step, and a meeting cannot be rescheduled if it has not first been requested.

3.5 “Parsing” with a Collocational Semantic Grammar

Our collocational semantic grammar contains no more than 100 rules for each of the request types. “Parsing” the text occurs with the following constraint: the distance from the first token in the fired rule to the last token in the fired rule must fall within a fixed window size, which is usually ten words. When the language of more than one request type accepts the email text, disambiguation techniques are applied, such as the aforementioned finite state automaton of allowable request type transitions (Figure 2).

Now that we have discussed how the collocational semantic grammar enables robust meeting request identification, we discuss how linguistic preprocessing, semantic recognition agents, and information extraction techniques are applied to fill in the rest of the meeting request details.

4 Extracting Meeting Details

In our previous discussion of the collocational semantic grammar, we took for granted that we could recognize certain classes of named entities such as dates, activities, times, and so forth. In this section, we give an account of how linguistic preprocessing and semantic recognition agents perform this recognition. We then explain how, given an identified meeting request sentence within an email, the rest of the meeting request frame can be filled out. As we have thus far described processes out of order, we wish to refer the reader to Figure 3 for the overall processing steps of the system architecture to regain some context.

In cases where multiple qualifying tokens can fill a slot, the distance from a token to the parts of the email that were accepted by the grammar is deemed inversely proportional to the relevance of that token. Therefore, we can disambiguate the attachment of frame slot tokens by word distance. Sanity check rules and truth maintenance rules verify that the chosen frame details are consistent with the request type and the user’s calendar.

When the semantic frame is filled, it is sent to other components of the system that execute the necessary actions associated with each request type.
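The negative and empty collocates of Section 3.2, the window constraint of Section 3.5, and the word-distance heuristic described above can be sketched together in a few lines. This is only an illustrative sketch under stated assumptions, not the Meeting Runner implementation: the names `Rule`, `EMPTY`, `LEXICON`, `label`, `match`, and `nearest_filler` are hypothetical, the lexicon is a toy single-word dictionary, tokenization is plain whitespace splitting (so punctuation must be space-separated), and the window is interpreted as the maximum number of tokens a match may span.

```python
from dataclasses import dataclass

EMPTY = "0"  # empty collocate: nothing may fall between the two atoms


@dataclass
class Rule:
    atoms: list  # semantic types that must occur in this order
    gaps: list   # per adjacent atom pair: EMPTY, or a set of negative collocates


# MeetingRequest -> ProposalType 0 SecondPersonType #SentenceBreak# #not#
#     GatherVerbType #SentenceBreak# DateType #SentenceBreak# ActivityType
MEETING_REQUEST = Rule(
    atoms=["ProposalType", "SecondPersonType", "GatherVerbType",
           "DateType", "ActivityType"],
    gaps=[EMPTY, {"SentenceBreak", "not"}, {"SentenceBreak"}, {"SentenceBreak"}],
)

# Toy lexicon (assumption); real recognizers would also handle phrases.
LEXICON = {
    "can": "ProposalType", "could": "ProposalType",
    "we": "SecondPersonType", "meet": "GatherVerbType",
    "tomorrow": "DateType", "friday": "DateType",
    "movie": "ActivityType", "lunch": "ActivityType",
    "not": "not", ".": "SentenceBreak", "?": "SentenceBreak",
}


def label(text):
    """Lower-case, split on whitespace, look up each token's semantic type."""
    return [LEXICON.get(tok) for tok in text.lower().split()]


def gap_ok(types, i, j, gap):
    """Check the tokens strictly between positions i and j against a gap constraint."""
    between = types[i + 1:j]
    if gap == EMPTY:
        return not between
    return not any(t in gap for t in between)


def match(rule, types, window=10):
    """Return (first, last) token indices of the first match, or None."""
    def extend(k, prev, start):
        if k == len(rule.atoms):
            return (start, prev)
        # The whole match may span at most `window` tokens.
        for j in range(prev + 1, min(len(types), start + window)):
            if types[j] == rule.atoms[k] and gap_ok(types, prev, j, rule.gaps[k - 1]):
                found = extend(k + 1, j, start)
                if found:
                    return found
        return None

    for s, t in enumerate(types):
        if t == rule.atoms[0]:
            found = extend(1, s, s)
            if found:
                return found
    return None


def nearest_filler(types, slot_type, span):
    """Proximity heuristic: index of the slot_type token closest to the matched span."""
    first, last = span

    def distance(i):
        return 0 if first <= i <= last else min(abs(i - first), abs(i - last))

    candidates = [i for i, t in enumerate(types) if t == slot_type]
    return min(candidates, key=distance, default=None)


if __name__ == "__main__":
    types = label("friday was fun . can we meet tomorrow for lunch")
    span = match(MEETING_REQUEST, types)
    print(span, nearest_filler(types, "DateType", span))
```

In this sketch the sentence "could we please not meet tomorrow for lunch" is rejected because `not` falls between SecondPersonType and GatherVerbType, and a date mentioned outside the matched span loses to the one inside it. A backtracking search was chosen for brevity; a rule set of the size the paper describes could equally be compiled into finite-state patterns.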

Figure 3. Flowchart of the system’s processing steps.

4.1 Normalization

Because of the dirty nature of email text, it is necessary to clean, or normalize, as much of the text as possible by fixing spelling and abbreviations, and regulating spacing and punctuation. We apply an unsupervised automatic spelling correction routine to the text, recognize and tag abbreviations, and tokenize paragraphs and sentences.

4.2 Semantic Recognition Agents

Each semantic recognition agent extracts a specific type of information from the email text that becomes relevant to the task of filling the meeting request frames. These include semantic recognizers for dates, times, durations, date and time intervals, holidays, activities, and action verbs.

The resulting semantic types are also used by the parser to determine the meeting request type. For many of the semantic types used by our system, matching against a dictionary of concepts which belong to a semantic type is sufficient to constitute a recognition agent. But other semantic types such as temporal expressions require more elaborate recognition agents, which may use a generative grammar to describe instances of a semantic type.

As with other heuristic and dictionary-based approaches, not all of the temporal and semantic expressions will be recognized, but we demonstrate in the evaluation that most are recognized. In the task of semantic recognition in a practical and real domain such as email, our experience is that the 80-20 rule applies: 80% recognition performance can be obtained by handling only 20% of the conceivable cases. Again, the implications of pragmatics can be felt.

4.3 Filling in Meeting Details

After the request type of the email has been identified, the semantic frame associated with that request type must be filled as much as possible from the text. Because each slot in the semantic frame has a semantic type associated with it, slot fillers are guaranteed to be atoms. Filling the frame is only a matter of finding the right tokens. Because we know the location of the salient meeting request sentence(s) in the email, we use a proximity heuristic to extract the relevant meeting details.

5 Evaluation

The performance of the natural language component of our system was evaluated against a real-world corpus of 5680 emails containing 670 meeting request emails. The corpus was compiled from the email collections of several people who reported that they commonly schedule meetings using emails. Emails were directed to the particular person in the “To:” line, and not to any mailing list that the person belongs to, and the corpus does not include spam mail. Emails were judged by two separate evaluators as to whether or not they contained meeting requests. Emails over which evaluators disagreed were discarded from the corpus. The standard evaluation metrics of recall, precision, and fallout (false positives) were used.

An email is marked as a true positive if it meets ALL of the following conditions:
1. the email contained a meeting request
2. the request type was correctly identified by our system
3. the request frame was filled out correctly

Likewise, false positives meet ANY of the following conditions:
1. the email does not contain a meeting request but was classified as containing a meeting request
2. the request frame was filled out incorrectly

In the test system, a collocation window size of 10 was used by the collocational semantic grammar’s pattern matcher. Of the 5680 emails in our test corpus, 670 contain meeting requests. Our system discovered 499 true positives and 21 false positives, missing 171 meeting requests. Table 1 summarizes the findings of the evaluation.

Table 1: Evaluation summary

Metric       Score
Recall       74.5%
Precision    96.0%
Fallout      0.4%

Evaluation based on 5680 emails, 670 of which requested meetings.

Upon examination of the results, we suggest that two main factors account for the false positives. First, our use of synonym sets like “VerbType” as described in section 3.1 led to several overgeneralizations. For example, the words belonging to the GatherVerbType may include “get together,” “meet,” and “hook up.” Unintended meanings and syntactic usages of these phrases (e.g. “a high school track meet”) do occur, and occasionally, these word sense ambiguities cause false classifications, though most are caught by fail-safes such as the dialog context frame. However, these overgeneralizations were expected, given that we did not employ word-sense disambiguation techniques such as part-of-speech tagging or chunking, because these mechanisms themselves generate error. Second, email layout was a source of errors. In some emails, spacing and indentation were substituted for proper end-of-sentence punctuation. This sometimes caused consecutive sections to be concatenated together, becoming a source of noise for the semantic grammar’s pattern matcher.

Approximately 25% of the actual meeting request emails were missed by the system. A variety of factors contributed to this. The largest contributor was lack of vocabulary of specific named entities. For example, our semantic recognizers can recognize the sentence “Do you want to see a movie with me tonight?” as a MeetingRequest because they recognize seeing a movie as an Activity. However, they do not recognize the sentence “Do you want to see the Matrix with me tonight?” because they have no vocabulary for the names of movies. Other meeting requests were often missed because there was no appropriate rule in the grammar or because their recognition required a larger window size. Upon experimenting with window sizes, however, we found that increasing the window size by just 2 tokens substantially increased the occurrence of false positives. The tradeoff between varying window sizes will be a future point of exploration. Other less prominent contributors included the inability to recognize some date and activity phrases, the often subtle and implicit nature of meeting requests (e.g. “Bill was hoping to learn more about your work”), and the fact that many meeting requests were temporally unspecified, i.e., there was no particular date or time proposed.

In spite of the lower-than-expected 74.5% recall, we see the evaluation results as positive and encouraging. The most encouraging result is that fallout is minimal at 0.4%. The fact that our collocation-based semantic grammar did not create more false positives provides some validation of the collocation approach, and of the implications of pragmatics for the reliability of positive classifications produced through such an approach.

The recall statistics can be improved by broadening the coverage of our grammar, and by acquiring more specific vocabulary, such as movie names. As of now, each request type has fewer than 100 grammar rules, and currently there are only seven semantic types. There are no semantic types with a level of granularity that would cover a movie name. We believe that investing more resources to expand the grammar and vocabulary can boost our recall by 10%. Growing the grammar is fairly easy to do because rules are simple to add, and each rule can generate a large number of productions. Growing vocabulary is dependent on the availability of specific knowledge bases. We plan to do more extensive evaluations once we have compiled a corpus from our beta testing.

6 Conclusion

We have built a personal software agent that can automatically read the user’s emails, detect meeting requests, and interact with the person making the request to schedule the meeting. Unlike previous approaches taken to asynchronous email meeting scheduling, our system receives meeting requests and generates meeting dialog email all in natural language, which is arguably the most portable representation. This paper focused on two tasks in our system: identifying emails containing meeting requests, and extracting details of the proposed meeting.

We examined how two prominent approaches for email classification, statistical and full parsing, failed to address the needs of our task model. Statistical classifiers generate too many false positives, which are an expensive proposition because they annoy and distract users. Full parsing is too brittle over the noisy email corpora we tested, and produces too low a recall. By leveraging collocational semantic grammars with two unique collocation operators, the negative collocate and empty collocate, we were able to more flexibly identify meeting requests, even over ill-formed text. This recognition of salient meeting request sentences worked synergistically to help the extraction of remaining meeting details. Our preliminary evaluation of the first-generation implementation, while small, shows that the approach has promise in demonstrating high recall and very low fallout.

6.1 Limitations

There are several limitations associated with the approach taken. One issue is scalability. Unlike systems that use machine learning techniques to automatically learn rules from a training corpus, our grammar must be manually extended. Luckily, the email meeting request domain is fairly small and contained, and the ease of inputting new rules makes this limitation bearable.

Another issue is portability. Our grammar and recognition agents were developed specifically for the email meeting request domain, so it is highly unlikely that they will be reusable or portable to other problem domains, though we feel that the general methodology of using collocational semantic grammars for low-fallout classification will find uses in many other domains.

Despite these limitations, the availability of an agent to assist email users in the identification and management of meeting requests is arguably invaluable in the commercial domain, thus justifying the work needed to build a domain-specific grammar. Performance of the evaluated system suggests how the application can best be structured. Meeting requests can be identified and meeting details extracted with 74.5% recall, with only a 0.4% occurrence of false positives. In the Meeting Runner software agent, meeting scheduling is currently semi-automatic. The system opportunistically identifies incoming emails that contain meeting requests. Because of the low fallout rate, costly user attention is not squandered on incorrect classifications. Thus, in measuring the benefit of the system to the user, we make a fail-soft argument. The system helps the user identify and schedule the vast majority of meeting requests automatically, while in the remaining cases, the user does the scheduling manually, which is what he/she would do anyway in the absence of any system.

6.2 Future Work

In the near future, we plan to extend the coverage of the grammar to include an expanded notion of what a “meeting” can be. For example, many errands such as “pick up the laundry” could constitute meeting requests.

Grice, H. P. (1975). Logic and conversation. In Cole, P., and Morgan, J. L. (Eds.), Speech Acts: Syntax and Semantics Volume 3, pp. 41-58. Academic Press, New York.

Hobbs, J. R., Appelt, D., Bear, J., Israel, D., Kameyama, M., Stickel, M. E., and Tyson, M. (1997). FASTUS: A cascaded finite-state transducer for extracting information from natural-language text. In Roche, E., and Schabes, Y. (Eds.), Finite-State Devices for Natural Language Processing, pp. 383-406. MIT Press, Cambridge, MA.

Jacobs, P. and Rau, L. (1990). SCISOR: A system for extracting information from online news. Communications of the ACM, 33(11), 88-97.
We would also like to increase the power of the recogni- Jurafsky, D. and Martin, J. (2000). Speech and Language tion agents by supplying them with more world semantic re- Processing: An Introduction to Natural Language Process- sources, or information about the world. For example, our ing, Computational Linguistics and Speech Recognition, pp. current system will understand, “Do you want to see a 501-661. Prentice Hall. movie tonight?” but it will not be able to understand, “Do you want to see Lord of the Rings tonight?” We can envi- Lehnert, W. G., Cardie, C., Fisher, D., Riloff, E., and sion providing our recognition agents with abundant seman- Williams, R. (1991). Description of the CIRCUS system as tic resources such as movie names, all of which can be used for MUC-3. In Sundheim, B. (Ed.), Proceedings of the mined from databases on the Web. We have already begun Third Message Understanding Conference, pp. 223-233. to supply our recognition agents with world knowledge Morgan Kaufmann. mined out of the Open Mind Commonsense knowledge base (Singh, 2002), an open-source database of approximately Levin, B. (1993). English Verb Classes and Alternations. 500,000 commonsense facts. By growing the dictionary of University of Chicago Press, Chicago. everyday concepts our system understands, we can hope to Lin, D. (1998). Using collocation statistics in information improve the recall of our system. extraction. In Proc. of the Seventh Message Understanding Conference (MUC-7). Acknowledgements Lotus Notes Web Site. Available at http://www.lotus.com/home.nsf/welcome/notes We thank our colleagues at the MIT Media Lab, MIT AI Lab, and BCL Technologies. This project was funded by MeetingWizard.com Web Site. Available at the U.S. Dept. of Commerce ATP contract # 70NANB9H3025 http://www.meetingwizard.com Microsoft Outlook Web Site. Available at http://www.mi- References crosoft.com/outlook/ Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D. A., Singh, P. (2002). 
The public acquisition of commonsense Thompson, H., and Winograd, T. (1977). Gus, a frame driv- knowledge. In Proceedings of AAAI Spring Symposium: en dialog system. Artificial Intelligence, 8, 155-173. Acquiring (and Using) Linguistic (and World) Knowledge Brown, J. S. and Burton, R. R. (1975). Multiple representa- for Information Access. Palo Alto, CA, AAAI. tions of knowledge for tutorial reasoning. In Bobrow, D. G. Sleator, D. and Temperley, D. Parsing English with a Link and Collins, A. (Eds.), Representation and Understanding, Grammar, Third International Workshop on Parsing Tech- pp. 311-350. Academic Press, New York. nologies, August 1993. Available at: http://www.link.cs.c- Evite.com Web Site. Available at http://www.evite.com mu.edu/link/papers Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Sperber, D., and D. Wilson. (1986). Relevance: Communi- Database. MIT Press, Cambridge, MA. cation and Cognition. Oxford: Blackwell. Finn, A., Kushmerick, N. & Smyth, B. (2002) Genre classi- Woods, W. A. (1977). Lunar rocks in natural English: Ex- fication and domain transfer for information filtering. In plorations in natural language question answering. In Zam- Proc. European Colloquium on Information Retrieval Re- polli, A. (Ed.), Linguistic Structures Processing, pp. 521- search (Glasgow). 569. North Holland, Amsterdam. Gaizauskas, R., Wakao, T., Humphreys, K., Cunningham, Yarowsky, D. (1993). One sense per collocation. In Pro- H., and Wilks, Y. (1995). University of Sheffield: Descrip- ceedings of the ARPA Workshop on Human Language tion of the LaSIE system as used for MUC-6. In Proceed- Technology, pages 266-271. ings of the Sixth Message Understanding Conference (MUC-6), San Francisco, pp. 207-220. Morgan Kaufmann.
