Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research

Denis Newman-Griffis1 Jill Fain Lehman2 Carolyn Rosé3 Harry Hochheiser1 1Department of Biomedical Informatics, University of Pittsburgh, USA 2Human-Computer Interaction Institute, Carnegie Mellon University, USA 3Language Technologies Institute, Carnegie Mellon University, USA {dnewmangriffis, harryh}@pitt.edu, {jfl, cprose}@cs.cmu.edu

Abstract Applications Enables application

Natural language processing (NLP) research needs andDrive constraints modeling goals combines the study of universal principles, Translational through basic science, with applied science tar- questions NLP geting specific use cases and settings. How- Identifies new research phenomena ever, the process of exchange between ba- Discovers underlyingProvides evidence for linguistic theory sic NLP and applications is often assumed Linguistics Modeling to emerge naturally, resulting in many inno- vations going unapplied and many important Guides model questions left unstudied. We describe a new development paradigm of Translational NLP, which aims to Figure 1: Interactions between linguistic theory, model structure and facilitate the processes by which development, and applications in NLP research. Solid basic and applied NLP research inform one lines indicate moving from basic research to applica- another. Translational NLP thus presents a tions, and dashed lines indicate how applied research third research paradigm, focused on under- feeds back into basic study. Translational NLP devel- standing the challenges posed by application ops processes to realize this exchange. needs and how these challenges can drive in- novation in basic science and technology de- sign. We show that many significant advances in NLP research have emerged from the in- impact. Meanwhile, real-world applications may tersection of basic principles with application rely on regular expressions (Anzaldi et al., 2017) needs, and present a conceptual framework or unigram frequencies (Slater et al., 2017) when outlining the stakeholders and key questions more sophisticated methods would yield deeper in- in translational research. Our framework pro- sight. When successful translations of basic NLP vides a roadmap for developing Translational insights into practical applied technologies do oc- NLP as a dedicated research area, and identi- fies general translational principles to facilitate cur, the factors contributing to this success are exchange between basic and applied research. rarely analyzed, limiting our ability to learn how to enable the next project and the next technology. 1 Introduction We argue for a third kind of NLP research, which Natural language processing (NLP) lies at the in- we call Translational NLP. Translational NLP re-

arXiv:2104.07874v1 [cs.CL] 16 Apr 2021 tersection of basic science and applied technolo- search aims to understand why one translation suc- gies. However, translating innovations in basic ceeds while another fails, and to develop general, NLP methods to successful applications remains a reusable processes to facilitate more (and easier) difficult task in which failure points often appear translation between basic NLP advances and real- late in the development process, delaying or pre- world application settings. Much NLP research venting potential impact in research and industry. already includes translational insights, but often Application challenges range widely, from changes considers them properties of a specific application in data distributions (Elsahar and Gallé, 2019) to rather than generalizable findings that can advance computational bottlenecks (Desai et al., 2020) and the field. This paper illustrates why general prin- integration with domain expertise (Rahman et al., ciples of the translational process enhance mutual 2020). When unanticipated, such challenges can exchange between linguistic inquiry, model devel- be fatal to applications of new NLP methodologies, opment, and application research (illustrated in Fig- leaving exciting innovations with minimal practical ure1), and are key drivers of NLP advances. We present a conceptual framework for Trans- on one problem at a time, and frequently leverages lational NLP, with specific elements of the trans- established datasets to provide a well-controlled lational process that are key to successful appli- environment for varying model design. Basic NLP cations, each of which presents distinct areas for research is intended to take the long view: it takes research. Our framework provides a concrete path the time to investigate fundamental questions that for designing use-inspired basic research so that may yield rewards for years to come. research products can effectively be turned into Applied research Applied NLP research studies practical technologies, and provides the tools to the intersection of universal principles with specific understand why a technology translation succeeds settings; it is responsive to the needs of commer- or fails. A translational perspective further enables cial applications or researchers in other domains. factorizing “grand challenge” research questions Applied research utilizes real-world datasets, often into clearly-defined pieces, producing intermediate specialized, and involves sources of noise and un- results and driving new basic research questions. reliability that complicate capturing linguistic regu- Our paper makes the following contributions: larities of interest. Applications often involve tack- ling multiple interrelated problems, and demand • We characterize the stakeholders involved in complex combinations of tools (e.g. using OCR the process of translating basic NLP advances followed by NLP to analyze scanned documents). to applications, and identify the roles they play Applied research is concrete and immediate, but in identifying new research problems (§3.1). may also be reactive and have a limited scope. • We present a general-purpose checklist to use Translational research The term translational as a starting point for the translational pro- is used in medicine to describe research that aims to cess, to help integrate basic NLP innovations transform advances in basic knowledge (biological into applications and to identify basic research or clinical) to applications to human health (Butte, opportunities arising from application needs 2008; Rubio et al., 2010). Translational research is (§3.2). a distinct discipline bridging basic science and ap- plications (Pober et al., 2001; Reis et al., 2010). We • We present a case study in the medical domain adopt the term Translational NLP to describe re- illustrating how the elements of our Transla- search bridging the gap between basic and applied tional NLP framework can lead to new chal- NLP research, and aiming to understand the pro- lenges for basic, applied, and translational cesses by which each informs the other. Section4 NLP research (§4). presents one in-depth example; other salient ex- amples include comparing the efficacy of domain 2 Defining Translational NLP adaptation methods for different application do- 2.1 A third type of research mains (Naik et al., 2019) and developing reusable software for processing specific text genres (Neu- A long history of distinguishing between basic and mann et al., 2019). Translational research occupies applied research (Bush, 1945; Shneiderman, 2016) a middle ground in the timeframe and complex- has noted that these terms are often relative; one ity of solutions: it develops processes to rapidly researcher’s basic study is the application of an- and effectively integrate new innovations into appli- other’s theory. In practice, basic and applied re- cations to address emerging needs, and facilitates search in NLP are endpoints of a spectrum, rather integration between pipelines of NLP tools. than discrete categories. As use-inspired research, most NLP studies incorporate elements of both ba- 2.2 Translation is bidirectional sic and applied research. We therefore define our key terms for this paper as follows: In addition to “forward” motion of basic inno- Basic research Basic NLP research is focused vations into practical applications, the needs of on universal principles: linguistically-motivated real-world applications also provide significant op- study that guides model design (e.g., Recasens and portunities for new fundamental research. Shnei- Hovy(2009) for coreference, Kouloumpis et al. derman’s model of “two parents, three children” (2011) for sentiment analysis), or modeling tech- (Shneiderman, 2016) provides an informative pic- niques designed for general use across different ture: combining a practical problem and a theo- settings and genres. Basic research tends to focus retical model yields (1) a solution to the problem, (2) a refinement of the theory, and (3) guidance for and applied research has in enriching scientific en- future research. Tight links between basic research deavor (Stokes, 1997; Branscomb, 1999; Narayana- and applications have driven many major advances murti et al., 2013; Shneiderman, 2016). An inte- in NLP, from machine translation and dialog sys- grated approach has been cited by both tems to search engines and question answering. (Spector et al., 2012) and IBM (McQueeney, 2003) Designing research with application needs in mind as central to their successes in both business and is a key impact criterion for both funding agencies research. The aim of our paper is to facilitate this (Christianson et al., 2018) and industry (Spector integration in NLP more broadly, through present- et al., 2012), and helps to identify new, high-impact ing a rubric for studying and facilitating the process research problems (Shneiderman, 2018). of getting both to and back from application.

2.3 NLP as a translational field: a historical 2.4 A practical definition perspective For an operational definition of Translational NLP, it is instructive to consider four phases of a generic The NLP field has always lain at the nexus of ba- workflow for tackling a novel NLP problem using sic and applied research. Application needs have supervised machine learning.1 First, a team of NLP driven some of the most fundamental developments experts works with subject matter experts (SMEs) in the field, leading to explosions in basic research to identify appropriate corpora, define concepts in new topics and on long-standing challenges. to be extracted, and construct annotation guide- The need to automatically translate Russian sci- lines for the target task. Second, SMEs use these entific papers in the early years of the Cold War led guidelines to annotate natural language data, us- to some of the earliest NLP research, creating the ing iterative evaluation, revision of guidelines, and still-thriving field of machine translation (Slocum, re-annotation to converge on a high-quality gold 1985). Machine translation has since helped drive standard set of annotations. Third, NLP experts use many significant advances in basic NLP research, these annotations to train and evaluate candidate from the adoption of statistical models in the 1980s models of the task, joined with SMEs in a feed- (Dorr et al., 1999) to neural sequence-to-sequence back loop to discuss results and needed revisions modeling (Sutskever et al., 2014) and attention of goals, guidelines, and gold standards. Finally, mechanisms (Bahdanau et al., 2015). buy-in is sought from SMEs and practitioners in Similarly, the rapid growth of the World Wide the target domain, in a dialogue informed by empir- Web in the 1990s created an acute need for tech- ical results and conceptual training. NLP adoption nologies to search the growing sea of information, in practice identifies failure cases and new informa- leading to the development of NLP-based search tion needs, and the process begins again. engines such as Lycos (Mauldin, 1997), followed This laborious process is needed because of the by PageRank (Page et al., 1999) and the growth of gaps between expertise in NLP technology and ex- Google. The need to index and monetize vast quan- pertise in use cases where NLP is applied. NLP tities of textual information led to an explosion in expertise is needed to properly formulate problems, information retrieval research, and the NLP field and subsequently to develop sound and generaliz- and ever-growing web data continue to co-develop. able solutions to those problems. However, for up- In a more recent example, IBM identified auto- take (and therefore impact) to occur, these solutions mated question answering (QA) as a new business must be based in deep expertise in the use case do- opportunity in a high-information world, and de- main, reified in a computable manner through anno- veloped the Watson project (Ferrucci et al., 2010). tation or knowledge resource development. These Watson’s early successes catapulted QA into the distinct forms of expertise are generally found in center of NLP research, where it has continued different groups of individuals with complementary to drive both novel technology development and perspectives (see e.g. Kruschwitz and Hull(2017)). benchmark evaluation datasets used in hundreds of Given this gap, we define Translational NLP as basic NLP studies (Rajpurkar et al., 2016). the development of theories, tools, and processes These and other examples illustrate the key role to enable the direct application of advanced NLP that application needs have played in driving inno- 1While workflows will vary for different classes of NLP vation in NLP research. This reflects not only the problems, dialogue between NLP experts and subject matter history of the field but the role that integrating basic experts is at the heart of developing almost all NLP solutions. tools in specific use cases. Implementing these ate NLP solutions. Our framework codifies fun- tools and processes, and engaging with basic NLP damental variables in this process, providing a experts and SMEs in their use, is the role of the roadmap for negotiating the design of methodolog- Translational NLP scientist. Although every use ical innovations with an eye towards potential ap- case has unique characteristics, there are shared plications. Although it is certainly not the case principles in designing NLP solutions that under- that every basic research advance must be tied to gird the whole of the research and application pro- a downstream application need, designing founda- cess. These shared translational principles can be tional technologies for potential application from adopted by basic researchers to increase the im- the beginning produces more robust technologies pact of NLP methods innovations, and guide the that are easier to transfer to practical settings, in- translational researcher in developing novel efforts creasing the impact of basic research. By defining targeting fundamental gaps between basic research common variables, our framework also provides and applications. The framework presented in this a structure for aligning application needs to basic paper identifies common variables and asks specific technologies, helping to identify potential failure questions that can drive this research. points and new research needs early for faster adop- For examples of this process in practice, it is tion of basic NLP advances. valuable to examine NLP development in the medi- Our framework has two components: cal domain. Use-inspired NLP research has a long 1. A definition of broad classes of stakeholders history in medicine (Sager et al., 1982; Ranum, in translating basic NLP innovations into ap- 1989), frequently with an eye towards practical ap- plications, including the roles that each stake- plications in research and care. Chapman et al. holder plays in defining and guiding research; (2011) highlight shared tasks as a key step towards addressing numerous barriers to application of NLP 2. A checklist of fundamental questions to struc- on clinical notes, including lack of shared datasets, ture the Translational NLP process, and to insufficient conventions and standards, limited re- guide identification of basic research opportu- producibility, and lack of user-centered design (all nities in specific application cases. factors presenting basic research opportunities, in 3.1 Stakeholders addition to NLP task improvement). Several ef- forts have explored the development of graphical NLP applications involve three broad categories user interfaces for conducting NLP tasks, including of stakeholders, illustrated in Figure2. Each con- creation and execution of pipelines (Cunningham, tributes differently to technology implementation 2002; D’Avolio et al., 2010, 2011; Soysal et al., and identifying new research challenges. 2018), although these efforts generally do not re- NLP Experts NLP researchers bring key ana- port on evaluation of usability by non-NLP experts. lytic skills to enable achieving the goals of an ap- Usability has been investigated by other studies in- plied system. NLP experts provide methodological volving more focused tools aimed at specific NLP sophistication in models and paradigms for analyz- tasks, including concept searching (Hultman et al., ing language, and an understanding of the nature 2018), annotation (Gobbel et al., 2014b,a), and in- of language and how it captures information. NLP teractive review of and update of text classification researchers provide much-needed data expertise, models (Trivedi et al., 2018, 2019; Savelka et al., including skills in obtaining, cleaning, and format- 2015). Recent research has utilized interactive NLP ting data for machine learning and evaluation, as tools for processing cancer research (Deng et al., well as conceptual models for representing informa- 2019) and care (Yala et al., 2017) documents. By tion needs. NLP scientists identify research oppor- constructing, designing, and evaluating tools de- tunities in modeling information needs, bringing signed to simplify specific NLP processes, these linguistic knowledge into the equation, and devel- efforts present examples of Translational NLP. oping appropriate tools for application and reuse. Subject Matter Experts Subject matter experts 3 The Translational NLP framework (SMEs) provide the context that helps to determine what information is important to analyze and what We present a conceptual framework for Transla- the outputs of applied NLP systems mean for the tional NLP, to formalize shared principles describ- application setting. SMEs, from medical practition- ing how basic and applied research interact to cre- ers to legal scholars and financial experts, bring NLP Experts End Users Subject Matter Experts

Tractable analytic methods Computing constraints Research context Data expertise Data availability Information context Conceptual models of information Organizational priorities NLP consumers

Translational NLP Researchers

Figure 2: Attributes of key stakeholders in the translational process for NLP. an understanding of where relevant information use NLP technologies to address their own infor- can be found (e.g., document sources (Fisher et al., mation needs according to the priorities of their 2016) and sections (Afzal et al., 2018)), which can organizations. These organizational priorities may help identify new types of language for basic re- conflict with existing modeling assumptions, high- searchers to study (Burstein, 2009; Crossley et al., lighting new opportunities for basic research to 2014) and new challenges such as sparse complex expand model capabilities. For example, Shah et al. information (Newman-Griffis and Fosler-Lussier, (2019) highlight the conceptual gap between pre- 2019) and higher-level structure in complex docu- dictive model performance in medicine and clinical ments (Naik et al., 2019). In addition, the context utility to call for new research on utility-driven that domain experts offer in terms of the needs of model evaluation. Spector et al.(2012) make a sim- target applications feeds back into evaluation meth- ilar point about Google’s mission-driven research ods in the basic research setting (Graham, 2015). identifying unseen gaps for new basic research. SMEs are also the consumers of NLP solutions, The role of the Translational NLP researcher as tools for their own research and applications. is to interface with each of these stakeholders, to Thus, SMEs must also be consultants regarding connect their goals, constraints, and contributions the trustworthiness and reliability of proposed so- into a single applied system, and to identify new lutions, and can identify key application-specific research opportunities where parts of this system concerns such as security requirements. conflict with one another. Notably, this creates an End Users The end users of NLP solutions span opportunity for valuable study of SME and end user a range of roles, environmental contexts, and goals, research practices, and for participatory design of each of which guides implementation factors of NLP research (Lazar et al., 2017). Our checklist, NLP applications. For example, collecting patient introduced in the next section, provides a structured language in a lab setting, in a clinic, or at home will framework for this translational process. pose different challenges in each setting, which can inform the development of basic NLP methods. Ap- 3.2 Translational NLP Checklist plication settings may have limited computational The path between basic research and applications is resources, motivating the development of efficient often nebulous in NLP, limiting the downstream im- alternatives to high-resource models (e.g. Wang pact of modeling innovations and obscuring basic et al.(2020)), and have different human factors research challenges found in application settings. affecting information collection and use. We present a general-purpose checklist covering End users have different constraints on data fundamental variables in translating basic research availability, in terms of how much data of what into applications, which breaks down the transla- types can be obtained from whom; the extensive tional process into discrete pieces for negotiation, work funded by DARPA’s Low Resource Lan- measurement, and identification of new research guages for Emergent Incidents (LORELEI) initia- opportunities. Our checklist, illustrated in Figure3, tive (Christianson et al., 2018) is a testament to the is loosely ordered from initial design to application basic research arising from these constraints. details. In practice, these items reflect different ele- Beyond the individual domain expert, end users ments of the application process and are constantly ample, a clinician interested in the documented What is the goal, and what Design Information Need are the outputs? reasoning behind a series of laboratory test orders needs: (1) the orders themselves (text spans); (2) Data Which genres and linguistic Characteristics communities are involved? the temporal sequence of the orders; and (3) a text

Which existing NLP tasks are span containing the justification for each order. Task Paradigms involved? Ex. 1: type, severity, history of symptoms. Available What knowledge sources and Ex. 2: clinical findings, logical criteria. Resources infrastructure is available? Data Characteristics A clear description of the What models and methods NLP Technologies are appropriate? language data to be analyzed is key to identifying appropriate NLP technologies. Data characteris- How is model performance Evaluation evaluated? tics include the natural language(s) used (e.g., En- glish, Chinese), the genre(s) of language to analyze How can model decisions be Interpretation interpreted for accountability? (e.g., scientific abstracts, quarterly earnings reports,

Application How can this tool be made tweets, conversations), and the type(s) of linguis- Engineering consumable and reusable? Application tic community that produced them (e.g., medical practitioners, educators, policy experts). This infor- Figure 3: The eight items in our Translational NLP mation identifies the sublanguage(s) of interest (Gr- checklist, with key questions for each. Items are ishman and Kittredge, 1986), which determine the loosely ordered from initial design to application de- availability and development of appropriate NLP tails, but should be regularly revisited in a feedback tools (Grishman, 2001). Corporate disclosures, fi- loop between application stakeholders. nancial news reports, and tweets all require differ- ent processing strategies (Xing et al., 2018), as do re-evaluated via a feedback loop between the ap- tweets written by different communities (Blodgett plication stakeholders. While many of these items et al., 2016; Groenwold et al., 2020). will be familiar to NLP researchers, each represents Ex. 1: clinical texts, lab reports. potential points of failure in translation. Designing Ex. 2: clinical texts, legal guidelines. the research process with these variables in mind Task Paradigms To address the overall goal will produce basic innovations that are more eas- with an NLP solution, it must be formulated in ily adopted for application and more directly con- terms of one or more well-defined NLP problems. nected to the challenges of real-world use cases. Many real-world application needs do not clearly We illustrate our items for two example cases: correspond to a single benchmark task formulation. Ex. 1: Analysis of multimodal clinical data For example, our earlier example of the sequence (scanned text, tables, images) for patient diagnosis. of lab order justifications can be formulated as a Ex. 2: Comparison of medical observations to gov- sequence of: (1) Named Entity Recognition (treat- ernment treatment and billing guidelines. ing the order types as named entities in a medical Information Need The initial step that guides knowledge base); (2) time expression extraction an application is defining inputs and outputs, at two and normalization; (3) event ordering; and (4) evi- levels: (1) the overall problem to address with NLP dence identification. Breaking the application need (led by the subject matter expert), and (2) the for- into well-studied subproblems at design time en- mal representation of that problem (led by the NLP ables faster identification and development of rele- expert). The overall goal (e.g., “extract informa- vant NLP technologies, and highlights any portions tion on cancer from clinical notes”) determines the of the goal that do not correspond with a known requirements of the solution, and is central to iden- problem, requiring novel basic research. tifying a measurement of its effectiveness. Once Ex. 1: document type classification, OCR, informa- the overall goal is determined, the next step is a tion extraction (IE), patient classification. formal representation of that goal in terms of text Ex. 2: IE, natural language inference. units (documents, spans) to analyze and what the Available Resources The question of resources analysis should produce (class labels, sequence to support an NLP solution includes two distinct annotations, document rankings, etc.). These re- concerns: (1) knowledge sources available to rep- quirements are tailored to specific applications and resent salient aspects of the target task; and (2) may not reflect standardized NLP tasks. For ex- compute infrastructure for NLP system execution and deployment. Knowledge sources may be sym- joint or multi-task learning (Caruana, 1997). bolic, such as knowledge graphs or gazetteers, or Ex. 1: large language models, dictionary matching, representational, such as representative corpora or OCR, multi-task learning. pretrained language models. For some applica- Ex. 2: dictionary matching, small neural models. tions, powerful knowledge sources may be avail- Evaluation Once a solution has been designed, able (such as the UMLS (Bodenreider, 2004) for it must be evaluated in terms of both the specific biomedical reasoning), while others are severely NLP problem(s) and the overall goal of the applica- under-resourced (such as emerging geopolitical tion. Standardized NLP task formulations typically events, which may lack even relevant social me- define benchmark metrics which can be used for dia text). These resources in turn affect the kinds evaluating the NLP components: F-1 and AUC for of technologies that are appropriate to use. information extraction, MRR and NDCG for infor- In terms of infrastructure, NLP technologies are mation retrieval, etc. The design of these metrics deployed on a wide variety of systems, from com- is its own extensive area of research (Jones and mercial data centers to mobile devices. Each set- Galliers, 1996; Hirschman and Thompson, 1997; ting presents constraints of limited resources and Graham, 2015), and even established evaluation throughput requirements (Nityasya et al., 2020). methods may be constantly revised (Grishman and An application environment with a high maxi- Sundheim, 1995). Critically for the translational mum resource load but low median availability researcher, some metrics may be preferred over is amenable to batch processing architectures or others (e.g., precision over recall), and standard- approaches with high pretraining cost and low ized evaluation metrics may not reflect the goals test-time cost. Pretrained word representstions and needs of applications (Friedman and Hripcsak, (Mikolov et al., 2013; Pennington et al., 2014) and 1998). Improvements on standardized evaluation language models (Peters et al., 2018; Devlin et al., metrics (such as increased AUC) may even obscure 2019) are one example of fundamental technolo- degradations in application-relevant performance gies that address such a need. Throughput require- measures (such as decreased process efficiency). ments, i.e., how much language input needs to be Translational researchers thus have the opportunity analyzed in a fixed amount of time, often require to work with NLP experts and SMEs to identify engineering optimization for specific environments or develop metrics that capture both the effective- (Afshar et al., 2019), but the need for faster runtime ness of the NLP system and its contribution to the computation has led to many advances in machine application’s overall goal. learning for NLP, such as variational autoencoders Ex. 1: F-1, patient outcomes. (Kingma and Welling, 2014) and the Transformer Ex. 2: F-1, billing rates. architecture (Vaswani et al., 2017). Interpretation Interpretability and analysis of Ex. 1: UMLS, high GPU compute. NLP and other machine learning systems has been Ex. 2: UMLS, guideline criteria, low compute. the focus of extensive research in recent years NLP Technologies The interaction between task (Gilpin et al., 2018; Belinkov and Glass, 2019), paradigms, data characteristics, and available re- with debate over what constitutes an interpretation sources helps to determine what types of implemen- (Rudin, 2019; Wiegreffe and Pinter, 2019) and de- tations are appropriate to the task. Implementa- velopment of broad-coverage software packages tions can be further broken down into representa- for ease of use (Nori et al., 2019). For the trans- tion technologies, for mathematically representing lational researcher, the first step is to engage with the language units to be analyzed; modeling ar- SMEs to determine what constitutes an acceptable chitectures, for capturing regularities within that interpretation of an NLP system’s output in the ap- language; and optimization strategies (when us- plication domain (which may be subject to specific ing machine learning), for efficiently estimating legal or ethical requirements around accountability model parameters from data. In low-resource set- in decision-making processes). This leads to an tings, highly parameterized models such as BERT iterative process, working with SMEs and NLP ex- may not be appropriate, while large-scale GPU perts to identify appropriately interpretable models, server farms enable highly complex model archi- or to identify the need for new basic research on tectures. When the overall goal is factorized into interpretability within the target domain. multiple NLP tasks, optimization often involves Ex. 1: Evidence identification, model audits. Ex. 2: Criteria visualization, model audits. methods); subject matter experts in disability and Application Engineering Last but not least, the rehabilitation; and SSA end users (limited com- translational process must also be concerned with puting, large data but strictly controlled, overall the implementation of NLP solutions, both in terms priorities of efficiency and accuracy). of the specific technologies used and how they can The Translational NLP checklist for this setting fit in to broader information processing pipelines. is shown in Table1. This combination of factors The development of general-purpose NLP architec- has led to several translational studies, including: tures such as the Stanford CoreNLP Toolkit (Man- ning et al., 2014), spaCy (Honnibal and Montani, • Newman-Griffis et al.(2018) developed a low- 2017), and AllenNLP (Gardner et al., 2018), as resource entity embedding method for do- well as more targeted architectures such as the mains with minimal knowledge sources (lack clinical NLP framework presented by Wen et al. of Available Resources). (2019), provide well-engineered frameworks for • Newman-Griffis and Zirikly(2018) analyzed implementing new technologies in a way that is the data size and representativeness tradeoff easy for others to both adopt and adapt for use in for information extraction in domains lacking their own pipelines. Standardized data exchange large corpora (Available Resources). frameworks such as UIMA (Ferrucci and Lally, • Newman-Griffis and Fosler-Lussier(2019) 2004) and JSON make implementations more mod- developed a flexible method for identifying ular and easier to wire together. Leveraging tools sparse health information that is syntactically and frameworks like these, together with good soft- complex (challenging Data Characteristics). ware design principles, makes NLP tools both eas- ier to apply downstream and easier for other re- • Newman-Griffis and Fosler-Lussier(2021) searchers to incorporate into their own work. compared the Task Paradigms of classification Ex. 1: Multiple interoperable technologies. and candidate selection paradigms for medical Ex. 2: Single decision support tool. coding in a new domain.

3.2.1 Translating methodology advances into While these studies do not systematically ex- existing applications plore Evaluation, Interpretation, or Application En- While the checklist items can guide initial design gineering, they illustrate how the characteristics of of a new NLP solution, they are equally applica- one application setting can lead to a line of Trans- ble for incorporating new basic NLP innovations lational NLP research with broader implications. into existing solutions. Any new innovation can be Several further challenges of this application area reviewed in terms of our checklist items to iden- remain unstudied: for example, representing and tify new requirements or constraints (e.g., higher modeling the complex timelines of persons with computational cost, more intuitive interpretability chronic health conditions and intermittent health measures). The translational researcher can then care and adapting NLP systems to highly variable work with NLP experts, SMEs, and the end users to medical language from practitioners and patients determine how to incorporate the new innovation around the US. These present intriguing challenges into the existing solution. for basic NLP research that can inform many other applications beyond this case study. 4 Case Study: NLP for Disability Review Of course, these studies are far from the only We illustrate our Translational NLP framework us- examples of Translational NLP research. Many ing our recent line of research on developing NLP studies tackle translational questions, from domain tools to assist US Social Security Administration adaptation (shifts in Data Characteristics) and low- (SSA) officials in reviewing applications for dis- resource learning (limited Available Resources), ability benefits (Desmet et al., 2020). The goal of and the growing NLP literature in domain-specific this effort was to help identify relevant pieces of venues such as medical research, law, finance, and medical evidence for making decisions about dis- more involves all aspects of the translational pro- ability benefits, analyzing vast quantities of medi- cess. Rather, this case study is simply one illustra- cal records collected during the review process. tion of how an explicitly translational perspective The stakeholders in this setting included: NLP in study design can identify and connect broad op- researchers (interested in developing generalizable portunities for contributions to NLP research. Information Need Overall goal: Improve disability benefits review process by highlighting relevant information Formal representation: Spans of evidence, with attributes for activity type and level of limitation Data Characteristics Medical records and administrative forms from USA, mostly English Task Paradigms Information extraction (spans), information retrieval (documents), span classification (activity and limitations) Available Resources Minimal knowledge sources for function and disability, no public corpora; US government computing systems; high throughput requirements (thousands of records/day) NLP Technologies Low-latency, low-compute sequence models; rule-based systems Evaluation Standard metrics (F-1, accuracy). Information retrieval metrics reported for use case prototypes. Interpretation Interpretation needs primarily around human decision-making; NLP tools highlight and organize information in context. No ML interpretability reported in published results. Application Open-source implementations using standardized frameworks for preprocessing. No data ex- Engineering change reported.

Table 1: Translational NLP checklist items for Disability Review case study, including notes on published results.

5 Discussion for every project. The specifics of different applica- tions will expand our initial questions in different Our paradigm of Translational NLP defines and ways (e.g., “Data Characteristics” may involve mul- gives structure to a valuable area of research not timodal data, or different language styles), and the explicitly represented in the ACL community. We dynamics of collaborations will shift answers over note that translational research is not meant to re- time (e.g., a change in evaluation criteria may mo- place either basic or applied research, nor do we in- tivate different model training approaches). Our tend to say that all basic NLP studies must be tied to checklist provides a minimal set of common ques- specific application needs. Rather we aim to high- tions, and can function as a touchstone for discus- light the value of studying the processes of turn- sions throughout the research process, but it can ing basic innovations into successful applications. and should be tailored to the nature of each project. These processes, from scaling model computation Our framework is itself a preliminary characteriza- to redesigning tools to meet changing application tion of Translational NLP research, and will evolve needs, can inform new research in model design, over time as the field continues to develop. domain adaptation, etc., and can help us understand why some tools succeed in application while oth- 6 Conclusion ers fail. In addition to helping more innovations successfully translate, the principles outlined in We have outlined a new model of NLP research, this paper can be of use to basic and applied NLP Translational NLP, which aims to bridge the gap researchers as well as translational ones, in identi- between basic and applied NLP research with gen- fying common variables and concerns to connect eralizable principles, tools, and processes. We iden- new work to the broader community. tified key types of stakeholders in NLP applica- Translational research is equally at home in in- tions and how they inform the translational process, dustry and academia, and already occurring in both. and presented a checklist of common variables and While resource disparities between industrial and translational principles to consider in basic, transla- academic research increasingly push large-scale tional, or applied NLP research. The translational modeling efforts out of reach of academic teams, a framework reflects the central role that integrating translational lens can help to identify rich areas of basic and applied research has played in the devel- knowledge-driven study that do not require exas- opment of the NLP field, and is illustrated by both cale data or computing resources. The general prin- the broad successes of machine translation, speech ciples and interdisciplinary nature of translational processing, and web search, as well as many indi- research make it a natural fit for public knowledge- vidual studies in the ACL community and beyond. driven academic settings, while its applicability to Acknowledgments commercial needs is highly relevant to industry. Our framework provides a starting point for the This work was supported by the National Library translational process, which will evolve differently of Medicine of the National Institutes of Health under award number T15 LM007059, and National Atul J Butte. 2008. Translational Bioinformatics: Com- Science Foundation grant 1822831. ing of Age. Journal of the American Medical Infor- matics Association, 15(6):709–714. Rich Caruana. 1997. Multitask learning. Machine References learning, 28(1):41–75. Majid Afshar, Dmitriy Dligach, Brihat Sharma, Xi- Wendy W Chapman, Prakash M Nadkarni, Lynette aoyuan Cai, Jason Boyda, Steven Birch, Daniel Hirschman, Leonard W D’Avolio, Guergana K Valdez, Suzan Zelisko, Cara Joyce, François Mo- Savova, and Ozlem Uzuner. 2011. Overcoming bar- dave, and Ron Price. 2019. Development and ap- riers to NLP for clinical text: the role of shared tasks plication of a high throughput natural language pro- and the need for additional creative solutions. Jour- cessing architecture to convert all clinical documents nal of the American Medical Informatics Associa- in a clinical data warehouse into standardized medi- tion, 18(5):540–543. cal vocabularies. Journal of the American Medical Informatics Association, 26(11):1364–1369. Caitlin Christianson, Jason Duncan, and Boyan Onyshkevych. 2018. Overview of the DARPA Naveed Afzal, Vishnu Priya Mallipeddi, Sunghwan LORELEI Program. Machine Translation, 32(1):3– Sohn, Hongfang Liu, Rajeev Chaudhry, Christo- 9. pher G Scott, Iftikhar J Kullo, and Adelaide M Scott A Crossley, Laura K Allen, Kristopher Kyle, and Arruda-Olson. 2018. Natural language processing Danielle S McNamara. 2014. Analyzing Discourse of clinical notes for identification of critical limb is- Processing Using a Simple Natural Language Pro- International Journal of Medical Informat- chemia. cessing Tool. Discourse Processes, 51(5-6):511– ics , 111:83–89. 534. Laura J Anzaldi, Ashwini Davison, Cynthia M Boyd, Hamish Cunningham. 2002. GATE, a General Archi- Bruce Leff, and Hadi Kharrazi. 2017. Comparing tecture for Text Engineering. Computers and the clinician descriptions of frailty and geriatric syn- Humanities, 36(2):223–254. dromes using electronic health records: a retrospec- tive cohort study. BMC Geriatrics, 17(1):248. Leonard W. D’Avolio, Thien M. Nguyen, Wildon R. Farwell, Yongming Chen, Felicia Fitzmeyer, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Ben- Owen M. Harris, and Louis D. Fiore. 2010. Eval- gio. 2015. Neural Machine Translation By Jointly uation of a generalizable approach to clinical Learning To Align and Translate. In ICLR 2015. information retrieval using the automated retrieval console (ARC). Journal of the American Medical Yonatan Belinkov and James Glass. 2019. Analysis Informatics Association, 17(4):375–382. methods in neural language processing: A survey. Transactions of the Association for Computational Leonard W. D’Avolio, Thien M. Nguyen, Sergey Gory- Linguistics, 7:49–72. achev, and Louis D. Fiore. 2011. Automated concept-level information extraction to reduce the Su Lin Blodgett, Lisa Green, and Brendan O’Connor. need for custom software and rules development. 2016. Demographic Dialectal Variation in Social Journal of the American Medical Informatics Asso- Media: A Case Study of African-American English. ciation, 18(5):607–613. In Proceedings of the 2016 Conference on Empiri- Zhengyi Deng, Kanhua Yin, Yujia Bao, Victor Diego cal Methods in Natural Language Processing, pages Armengol, Cathy Wang, Ankur Tiwari, Regina 1119–1130, Austin, Texas. Association for Compu- Barzilay, Giovanni Parmigiani, Danielle Braun, and tational Linguistics. Kevin S. Hughes. 2019. Validation of a semiauto- mated natural language processing–based procedure Olivier Bodenreider. 2004. The Unified Med- for meta-analysis of cancer susceptibility gene pene- ical Language System (UMLS): integrating trance. JCO Clinical Cancer Informatics, (3):1–9. biomedical terminology. Nucleic Acids Research, 32(90001):D267–D270. Shrey Desai, Geoffrey Goh, Arun Babu, and Ahmed Aly. 2020. Lightweight convolutional represen- Lewis M Branscomb. 1999. The false dichotomy: Sci- tations for on-device natural language processing. entific creativity and utility. Issues in Science and arXiv preprint arXiv:2002.01535. Technology, 16(1):66–72. Bart Desmet, Julia Porcino, Ayah Zirikly, Denis Jill Burstein. 2009. Opportunities for Natural Lan- Newman-Griffis, Guy Divita, and Elizabeth Rasch. guage Processing Research in Education. In Com- 2020. Development of natural language process- putational Linguistics and Intelligent Text Process- ing tools to support determination of federal dis- ing. CICLing 2009, pages 6–27, Berlin, Heidelberg. ability benefits in the U.S. In Proceedings of the Springer Berlin Heidelberg. 1st Workshop on Language Technologies for Govern- ment and Public Administration (LT4Gov), pages 1– Vannevar Bush. 1945. Science, The Endless Frontier. 6, Marseille, France. European Language Resources U.S. Government Printing Office. Association. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Dario Giuse, Theodore Speroff, Steven H. Brown, Kristina Toutanova. 2019. BERT: Pre-training of Hua Xu, and Michael E. Matheny. 2014a. Assisted deep bidirectional transformers for language under- annotation of medical free text using RapTAT. Jour- standing. In Proceedings of the 2019 Conference nal of the American Medical Informatics Associa- of the North American Chapter of the Association tion, 21(5):833–841. for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Glenn T. Gobbel, Ruth Reeves, Shrimalini Jayara- pages 4171–4186, Minneapolis, Minnesota. Associ- maraja, Dario Giuse, Theodore Speroff, Steven H. ation for Computational Linguistics. Brown, Peter L. Elkin, and Michael E. Matheny. 2014b. Development and evaluation of RapTAT: Bonnie J Dorr, Pamela W Jordan, and John W Benoit. a machine learning system for concept mapping of 1999. A Survey of Current Paradigms in Machine phrases from medical narratives. Journal of Biomed- Translation. volume 49, pages 1–68. Elsevier. ical Informatics, 48:54–65.

Hady Elsahar and Matthias Gallé. 2019. To annotate Yvette Graham. 2015. Re-evaluating Automatic Sum- or not? predicting performance drop under domain marization with BLEU and 192 Shades of ROUGE. shift. In Proceedings of the 2019 Conference on In Proceedings of the 2015 Conference on Empiri- Empirical Methods in Natural Language Processing cal Methods in Natural Language Processing, pages and the 9th International Joint Conference on Natu- 128–137, Lisbon, Portugal. Association for Compu- ral Language Processing (EMNLP-IJCNLP), pages tational Linguistics. 2163–2173, Hong Kong, China. Association for Computational Linguistics. Ralph Grishman. 2001. Adaptive information extrac- tion and sublanguage analysis. In Proceedings of the David Ferrucci, Eric Brown, Jennifer Chu-Carroll, Workshop on Adaptive Text Extraction and Mining, James Fan, David Gondek, Aditya A Kalyanpur, Seventeenth International Joint Conference on Arti- Adam Lally, J William Murdock, Eric Nyberg, John ficial Intelligence (IJCAI-2001), pages 1–4, Seattle, Prager, and Others. 2010. Building Watson: An Washington, USA. overview of the DeepQA project. AI magazine, 31(3):59–79. Ralph Grishman and R Kittredge. 1986. Analyzing lan- guage in restricted domains: Sublanguage descrip- David Ferrucci and Adam Lally. 2004. UIMA: an tion and processing. Lawrence Erlbaum Associates. architectural approach to unstructured information processing in the corporate research environment. Ralph Grishman and Beth Sundheim. 1995. Design of Natural Language Engineering, 10:1–26. the MUC-6 evaluation. In Proceedings of the 6th Conference on Message Understanding Ingrid E Fisher, Margaret R Garnsey, and Mark E , pages 1–11. Hughes. 2016. Natural Language Processing in Ac- Association for Computational Linguistics. counting, Auditing and Finance: A Synthesis of the Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Literature with a Roadmap for Future Research. In- telligent Systems in Accounting, Finance and Man- Honnavalli, Sharon Levy, Diba Mirza, and William Yang Wang. 2020. Investigating African- agement, 23(3):157–214. American Vernacular English in Transformer-Based C Friedman and G Hripcsak. 1998. Evaluating natural Text Generation. In Proceedings of the 2020 Con- language processors in the clinical domain. Methods ference on Empirical Methods in Natural Language of information in medicine, 37(4-5):334. Processing (EMNLP), pages 5877–5883, Online. Association for Computational Linguistics. Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Pe- Lynette Hirschman and Henry S. Thompson. 1997. ters, Michael Schmitz, and Luke Zettlemoyer. 2018. Overview of evaluation in speech and natural lan- AllenNLP: A deep semantic natural language pro- guage processing. In Ronald A. Cole, editor, Survey cessing platform. In Proceedings of Workshop for of the State of the Art in Human Language Technol- NLP Open Source Software (NLP-OSS), pages 1– ogy. Cambridge University Press. 6, Melbourne, Australia. Association for Computa- tional Linguistics. Matthew Honnibal and Ines Montani. 2017. spacy 2: Natural language understanding with bloom embed- L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, dings, convolutional neural networks and incremen- and L. Kagal. 2018. Explaining explanations: An tal parsing. overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Gretchen Hultman, Reed McEwan, Serguei Pakho- Science and Advanced Analytics (DSAA), pages 80– mov, Elizabeth Lindemann, Steven Skube, and 89. Genevieve B. Melton. 2018. Usability Evaluation of an Unstructured Clinical Document Query Tool Glenn T. Gobbel, Jennifer Garvin, Ruth Reeves, for Researchers. AMIA Joint Summits on Transla- Robert M. Cronin, Julia Heavirland, Jenifer tional Science proceedings. AMIA Joint Summits on Williams, Allison Weaver, Shrimalini Jayaramaraja, Translational Science, 2017:84–93. Karen Spärck Jones and Julia R. Galliers. 1996. Eval- Joint Conference on Natural Language Processing uating Natural Language Processing Systems: An (EMNLP-IJCNLP): System Demonstrations, pages Analysis and Review. Springer-Verlag, Berlin, Hei- 85–90, Hong Kong, China. Association for Compu- delberg. tational Linguistics.

Diederik P Kingma and Max Welling. 2014. Auto- Denis Newman-Griffis and Eric Fosler-Lussier. 2021. encoding variational bayes. ICLR 2014. Automated Coding of Under-Studied Medical Con- cept Domains: Linking Physical Activity Reports to Efthymios Kouloumpis, Theresa Wilson, and Johanna the International Classification of Functioning, Dis- Moore. 2011. Twitter sentiment analysis: The good ability, and Health. Frontiers in Digital Health, the bad and the omg! In Fifth International AAAI 3:620828. conference on weblogs and social media. Citeseer. Denis Newman-Griffis, Albert M Lai, and Eric Fosler- Udo Kruschwitz and Charlie Hull. 2017. Searching the Lussier. 2018. Jointly embedding entities and text Enterprise. Foundations and Trends in Information with distant supervision. In Proceedings of The Retrieval, 11(1):18. Third Workshop on Representation Learning for NLP, pages 195–206, Melbourne, Australia. Asso- Jonathan Lazar, Jinjuan Heidi Feng, and Harry ciation for Computational Linguistics. Hochheiser. 2017. Research methods in human- computer interaction. Morgan Kaufmann. Denis Newman-Griffis and Ayah Zirikly. 2018. Embed- ding transfer for low-resource medical named entity Christopher Manning, Mihai Surdeanu, John Bauer, recognition: A case study on patient mobility. In Jenny Finkel, Steven Bethard, and David McClosky. Proceedings of the BioNLP 2018 workshop, pages 2014. The Stanford CoreNLP natural language pro- 1–11, Melbourne, Australia. Association for Compu- cessing toolkit. In Proceedings of 52nd Annual tational Linguistics. Meeting of the Association for Computational Lin- guistics: System Demonstrations, pages 55–60, Bal- Made Nindyatama Nityasya, Haryo Akbarianto Wi- timore, Maryland. Association for Computational bowo, Radityo Eko Prasojo, and Alham Fikri Aji. Linguistics. 2020. No Budget? Don’t Flex! Cost Considera- tion when Planning to Adopt NLP for Your Business. M I Mauldin. 1997. Lycos: design choices in an Inter- arXiv preprint arXiv:2012.08958. net search service. IEEE Expert, 12(1):8–11. Harsha Nori, Samuel Jenkins, Paul Koch, and Rich David F McQueeney. 2003. IBM’s evolving re- Caruana. 2019. InterpretML: A unified framework search strategy. Research-Technology Management, for machine learning interpretability. arXiv preprint 46(4):20–27. arXiv:1909.09223. Lawrence Page, Sergey Brin, , and Tomas Mikolov, Kai Chen, Greg Corrado, and Jef- Terry Winograd. 1999. The PageRank citation rank- frey Dean. 2013. Efficient Estimation of Word ing: Bringing order to the web. Technical report, Representations in Vector Space. arXiv preprint Stanford InfoLab. arXiv:1301.3781, pages 1–12. Jeffrey Pennington, Richard Socher, and Christopher Aakanksha Naik, Luke Breitfeller, and Carolyn Rose. Manning. 2014. GloVe: Global vectors for word 2019. TDDiscourse: A Dataset for Discourse-Level representation. In Proceedings of the 2014 Confer- Temporal Ordering of Events. In Proceedings of the ence on Empirical Methods in Natural Language 20th Annual SIGdial Meeting on Discourse and Dia- Processing (EMNLP), pages 1532–1543, Doha, logue, pages 239–249, Stockholm, Sweden. Associ- Qatar. Association for Computational Linguistics. ation for Computational Linguistics. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Venkatesh Narayanamurti, Tolu Odumosu, and Lee Gardner, Christopher Clark, Kenton Lee, and Luke Vinsel. 2013. RIP: The basic/applied research Zettlemoyer. 2018. Deep contextualized word rep- dichotomy. Issues in Science and Technology, resentations. In Proceedings of the 2018 Confer- 29(2):31–36. ence of the North American Chapter of the Associ- ation for Computational Linguistics: Human Lan- Mark Neumann, Daniel King, Iz Beltagy, and Waleed guage Technologies, Volume 1 (Long Papers), pages Ammar. 2019. ScispaCy: Fast and robust models 2227–2237, New Orleans, Louisiana. Association Pro- for biomedical natural language processing. In for Computational Linguistics. ceedings of the 18th BioNLP Workshop and Shared Task, pages 319–327, Florence, Italy. Association Jordan S Pober, Crystal S Neuhauser, and Jeremy M for Computational Linguistics. Pober. 2001. Obstacles facing translational research in academic medical centers. The FASEB Journal, Denis Newman-Griffis and Eric Fosler-Lussier. 2019. 15(13):2303–2313. HARE: a Flexible Highlighting Annotator for Rank- ing and Exploration. In Proceedings of the Protiva Rahman, Arnab Nandi, and Courtney Hebert. 2019 Conference on Empirical Methods in Natu- 2020. Amplifying domain expertise in clinical data ral Language Processing and the 9th International pipelines. JMIR Med Inform, 8(11):e19612. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Jonathan Slocum. 1985. A Survey of Machine Trans- Percy Liang. 2016. SQuAD: 100,000+ Questions lation: Its History, Current Status, and Future for Machine Comprehension of Text. In Proceed- Prospects. Comput. Linguist., 11(1):1–17. ings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392. Ergin Soysal, Jingqi Wang, Min Jiang, Yonghui Wu, Serguei Pakhomov, Hongfang Liu, and Hua Xu. David L Ranum. 1989. Knowledge-based understand- 2018. CLAMP – a toolkit for efficiently build- ing of radiology text. Computer methods and pro- ing customized clinical natural language processing grams in biomedicine, 30(2-3):209–215. pipelines. Journal of the American Medical Infor- matics Association, 25(3):331–336. Publisher: Ox- Marta Recasens and Eduard Hovy. 2009. A deeper ford Academic. look into features for coreference resolution. In Proceedings of the 7th Discourse Anaphora and Alfred Spector, , and Slav Petrov. 2012. Anaphor Resolution Colloquium on Anaphora Pro- Google’s hybrid approach to research. Communica- cessing and Applications, DAARC ’09, page 29–42, tions of the ACM, 55(7):34–37. Berlin, Heidelberg. Springer-Verlag. Donald E Stokes. 1997. Pasteur’s quadrant: Basic sci- Steven E Reis, Lars Berglund, Gordon R Bernard, ence and technological innovation. Brookings Insti- Robert M Califf, Garret A Fitzgerald, Peter C John- tution Press. son, National Clinical Consortium, and Transla- tional Science Awards. 2010. Reengineering the na- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. tional clinical and translational research enterprise: Sequence to sequence learning with neural networks. the strategic plan of the National Clinical and Trans- In Advances in neural information processing sys- lational Science Awards Consortium. Academic tems, pages 3104–3112. Medicine, 85(3):463–469. Gaurav Trivedi, Esmaeel R Dadashzadeh, Robert M Doris McGartland Rubio, Ellie E Schoenbaum, Linda S Handzel, Wendy W Chapman, Shyam Visweswaran, Lee, David E Schteingart, Paul R Marantz, Karl E and Harry Hochheiser. 2019. Interactive NLP in Anderson, Lauren Dewey Platt, Adriana Baez, Clinical Care: Identifying Incidental Findings in Ra- and Karin Esposito. 2010. Defining translational diology Reports. Appl Clin Inform, 10(04):655– research: implications for training. Academic 669. Medicine, 85(3):470–475. Gaurav Trivedi, Phuong Pham, Wendy W. Chapman, Cynthia Rudin. 2019. Stop explaining black box ma- Rebecca Hwa, Janyce Wiebe, and Harry Hochheiser. chine learning models for high stakes decisions and 2018. NLPReViz: an interactive tool for natu- use interpretable models instead. Nature Machine ral language processing on clinical text. Journal Intelligence, 1(5):206–215. of the American Medical Informatics Association, 25(1):81–87. N Sager, IDJ Bross, G Story, P Bastedo, E Marsh, and D Shedd. 1982. Automatic encoding of clini- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob cal narrative. Computers in Biology and Medicine, Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz 12(1):43–56. Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information pro- Jaromir Savelka, Gaurav Trivedi, and Kevin Ashley. cessing systems, pages 5998–6008. 2015. Applying an interactive machine learning ap- proach to statutory analysis. In JURIX 2015 - the Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han 28th International Conference on Legal Knowledge Cai, Ligeng Zhu, Chuang Gan, and Song Han. and Information Systems. 2020. Hat: Hardware-aware transformers for effi- cient natural language processing. arXiv preprint Nigam H Shah, Arnold Milstein, and PhD Bagley arXiv:2005.14187. Steven C. 2019. Making Machine Learning Models Clinically Useful. JAMA, 322(14):1351–1352. Andrew Wen, Sunyang Fu, Sungrim Moon, Mohamed El Wazir, Andrew Rosenbaum, Vinod C Kaggal, Si- Ben Shneiderman. 2016. The new ABCs of research: jia Liu, Sunghwan Sohn, Hongfang Liu, and Jung- Achieving breakthrough collaborations. Oxford wei Fan. 2019. Desiderata for delivering nlp to University Press. accelerate healthcare ai advancement and a mayo clinic nlp-as-a-service implementation. NPJ Digital Ben Shneiderman. 2018. Twin-Win Model: A human- Medicine, 2(1):1–7. centered approach to research success. Proceedings of the National Academy of Sciences of the United Sarah Wiegreffe and Yuval Pinter. 2019. Attention is States of America, 115(50):12590–12594. not not explanation. In Proceedings of the 2019 Con- ference on Empirical Methods in Natural Language Stefan Slater, Srecko´ Joksimovic,´ Vitomir Kovanovic, Processing and the 9th International Joint Confer- Ryan S Baker, and Dragan Gasevic. 2017. Tools for ence on Natural Language Processing (EMNLP- educational data mining: A review. Journal of Edu- IJCNLP), pages 11–20, Hong Kong, China. Associ- cational and Behavioral Statistics, 42(1):85–106. ation for Computational Linguistics. Frank Z Xing, Erik Cambria, and Roy E Welsch. 2018. Natural language based financial forecasting: a sur- vey. Artificial Intelligence Review, 50(1):49–73. Adam Yala, Regina Barzilay, Laura Salama, Molly Griffin, Grace Sollender, Aditya Bardia, Constance Lehman, Julliette M Buckley, Suzanne B Coopey, Fernanda Polubriaginof, et al. 2017. Using machine learning to parse breast pathology reports. Breast Cancer Research and Treatment, 161(2):203–211.