CAREER: Structured Output Models of Recommendations, Activities, and Behavior: project summary

Predictive models of human behavior underlie many of the most important computing systems in science and industry. Examples abound that seek to suggest friendship links, recommend content, estimate reactions, and forecast activities. Across years of research, and dozens of high-profile applications, these systems almost invariably follow a common structure: (a) collect large volumes of historical activity traces; (b) identify some target variable to be predicted (what will be clicked? what will be liked? who will be befriended? etc.); and (c) train supervised learning approaches to predict this output as accurately as possible. In this proposal, we seek to investigate a fundamentally new modality of recommender systems and personalized models that are capable of estimating rich, structured outputs. That is, rather than predicting clicks, likes, or star ratings, we can build systems that are capable of answering complex questions about content, estimating nuanced reactions in the form of text, or even designing the content itself in order to achieve a specific reaction.

Our plan to achieve these goals builds upon traditional recommender systems approaches (that estimate models of users and content from huge datasets of historical activities), as well as newly emerging methods for generative and structured output modeling. At the intersection of these ideas shall be a new suite of generative models whose outputs can be personalized to individual users, and equivalently, new forms of recommender systems that can generate rich, structured outputs.

We realize this contribution via three research thrusts. In each, we seek to extend work on generative and structured output modeling to allow for personalized prediction and recommendation. In our first thrust, we consider knowledge extraction and Q/A systems, where our goal is to answer questions that depend on subjective, personal experience.
Next, we focus on personalized generative models of sequences and text, in order to estimate individuals' nuanced reactions to content. Finally, we consider personalized Generative Adversarial models, that allow us to address the problem of personalized content creation and design, i.e., to suggest new content that would elicit a certain response, if it were created.

Intellectual merit. Methodologically, the above contribution extends each of the research areas upon which it builds: predictive models of human behavior (and in particular recommender systems), and generative models/structured output prediction (in particular, GANs, LSTMs, etc.). Closing the research gap between these areas enables a host of exciting new applications dealing with structured knowledge, text, and images, and places us at the forefront of a newly emerging research area.

Broader impact. Scientific: Our research can be used to develop models for a range of structured output tasks where the 'user' explains substantial variability in the data. Such applications are ubiquitous in domains ranging from linguistics, social science, personalized healthcare (etc.). Specifically, our research shall be supported through collaborations with the UCSD School of Medicine, to investigate systems for automated patient intake and triage, by predicting the severity of symptoms from automated Q/A dialogs (Thrust 1), and with the UCSD Cognitive Science Department to develop models of heartrate data, to build systems capable of designing personalized exercise routines (Thrust 2). Industrial: Personalized, predictive models of human behavior are of broad industrial interest, as drivers for applications in e-Commerce, social media, and recommender systems (etc.). This proposal is supported by collaborations with , to study personalized models of artistic preferences, and , to study systems for personalized Visual Question Answering (Thrust 3).
Educational: We are developing the curriculum for UCSD's new Data Science major and minor, and are creating new coursework that aims to expose graduate and undergraduate students to cutting-edge research. We also regularly organize community workshops, and are involved in mentorship activities aimed at fostering the retention and involvement of groups including URMs, high-schoolers, and engineering professionals.

Summary

We seek to build the next generation of personalized recommender systems that support predictions in the form of rich, structured outputs. We seek systems that are more interactive, by responding to complex queries, more contextual, by estimating detailed reactions to content, and more creative, by suggesting what content should be created, in order to elicit a specific response. We combine ideas from personalized and context-aware recommender systems (the main research area of the PI's lab [1–27]), with recent advances on generative and structured output modeling. Doing so allows us to build behavioral models and recommender systems whose outputs include answers to complex questions, text, or images—representing a fundamental advance over existing models that are limited to predicting simple numerical or categorical values. Building these models helps to advance traditional applications of recommender systems, and has broader applications wherever individuals interact with complex content, including medicine, e-commerce, and personalized health.

1 Introduction

Every day we interact with predictive systems that seek to model our behavior, monitor our activities, and make recommendations: Whom will we befriend [25]? What articles will we like [26]? Who influences us in our social network [19]? And do our activities change over time [23]? Models that answer such questions drive important real-world systems, and at the same time are of basic scientific interest to economists, linguists, and social scientists, among others.

Despite their success in a field with many important applications, the fundamental structure of recommender systems has not changed in many years: all of the above questions boil down to regression, classification, and retrieval. This limits the type of queries that can be addressed, and several important problems remain open. For example, methods based on regression and retrieval cannot estimate how a user would react to content (e.g.
what would they say?), or what type of content should be created in order to achieve a certain reaction. Such questions are fundamentally more challenging, as they depend on generating complex, structured outputs and cannot be solved by existing predictive models.

In this proposal, we seek to develop a new area of rich-input, rich-output recommender systems. Our research seeks to (a) facilitate finer-grained modeling of behavior; (b) generalize to a larger class of queries (both in terms of inputs and outputs); and (c) support the generation of content, closing a gap in the capability of existing approaches. This research is at the confluence of several emerging and established research areas (Fig. 1), including recommender systems (especially those that leverage structured inputs), as well as recent developments on generative modeling and structured output learning. Our research aims to be at the forefront of a newly emerging area that combines personalization with modern machine learning techniques, for data including structured knowledge [28–30], text [31–39], and images [40–43].

Figure 1: We seek to leverage recent work on generative and structured output modeling (O) with recommender systems techniques (R), including those that make use of structured and multimodal inputs (I). [The figure situates this proposal (rich-input, rich-output; T1: interactivity, T2: context, T3: creativity) at the intersection of (I) structured inputs (classifiers such as CNNs; context-aware recommender systems), (O) structured outputs ((conditional) GANs; generative and encoder-decoder RNNs; Q/A and dialog systems), and (R) behavioral models and recommender systems ('traditional' RecSys: CF & MF, BPR, etc.).]

Research Plan. We propose an ambitious research program consisting of three thrusts to realize this contribution. In each thrust our goal is to show how notions like personalization and
subjectivity can be combined with complex, structured output models, in order to build new models and drive transformative new applications. In Thrust 1, we focus on knowledge extraction and Q/A, where the notions of personalization and subjectivity allow us to answer questions that depend on subjective personal experience, and to develop systems that are more empathetic to people's individual needs. In Thrust 2, we focus on personalized generative models of sequences and text (and in particular LSTMs), which allow us to estimate more nuanced reactions, by predicting people's responses (in terms of the language they would use) to previously unseen content. And in Thrust 3, we consider personalized Generative Adversarial Networks (GANs), allowing us to address the problem of personalized content creation and design, i.e., to suggest new content that would elicit a certain response, if it were created. Each thrust is associated with specific broader impacts, to be realized through established academic and industrial collaborations: In Thrust 1, we will study 'subjective symptoms' from clinical Q/A dialogs with the UCSD School of Medicine. In Thrust 2, we will study personalized models of heartrate and sequence data with the UCSD Cognitive Science Department. And in Thrust 3, we will study artistic preference models with , and personalized Visual Question Answering (VQA) with .

2 Background

Our lab's research mission is to develop predictive models of human behavior, and as such we have experience applying machine learning to semantically rich data, including textual, relational, visual, and temporal models. Examples include recommender systems for art, fashion, and consumer goods [6–10, 13, 14]; wines and beers [3–5]; friendship links and social communities [25]; reddit submissions [26]; and question-answering systems [12, 15].

Traditionally, behavioral models and recommender systems seek to predict simple signals (e.g. purchases, clicks, or ratings), by training on historical data or retrieval benchmarks. This includes classical methods like collaborative filtering (e.g. of the same form as the Netflix prize), as well as a swathe of modern techniques that optimize regression and ranking objectives (BPR, Factorization Machines, etc.). This line of work has been extended to incorporate rich side-information in the form of text, sequences, images, graphs (etc.). Such side-information provides context that can be used to improve prediction accuracy (especially in cold-start scenarios), and to increase model interpretability. A great deal of effort (including within our own lab) has been devoted to building such models: to understand how people express their preferences and opinions, and to use sequential and image data to understand the evolution of fashion trends and artistic preferences. Note however that while the inputs to such models are complex and high-dimensional, their outputs are still simple and low-dimensional.

In contrast, generative models of text, sequences, and images have recently gained wide popularity, for applications such as image synthesis [40–43], machine translation [54], and text generation and captioning [44, 45]. These systems have generated state-of-the-art results for a variety of tasks, and also suggest new applications dealing with high-dimensional content. Variants of these models can be trained to generate outputs that are conditioned on context or an auxiliary input signal, though the gap of using such models within personalized recommender systems remains open.

Research gap and opportunity. Between these two lines of inquiry we see an opportunity to develop an exciting new research area, in which we can model activities, behavior, and make recommendations in the form of rich, structured outputs: personalized answers to complex questions (Thrust 1), personalized textual reactions (Thrust 2), and personalized content (Thrust 3). Our strategy to close this gap is depicted in Table 1: like existing recommender systems, we seek to model complex interactions between 'users' and 'content,' while leveraging the power of newly emerging generative models to produce structured outputs. By changing modeling approaches, we can fundamentally change the types of questions that can be addressed. Making this conceptually simple—though methodologically complex—change fundamentally alters the way of thinking about the relationships between 'users' and 'content,' and drives a suite of transformative new applications.

Table 1: Relationship between prior work, our preliminary work, and the current proposal (for a hypothetical fashion recommendation scenario). U = user, I = item, S* = sequence/text.

  Prior work: 'unstructured' recommendation (simple outputs)
    Regression: predicting opinions (ratings, clicks, etc.)                  f : U x I -> R
    Retrieval: predicting next actions, likely purchases, etc.               f : U -> I
  Preliminary work: multiple-output and heteroscedastic recommendation
    Multiple-output regression: bundle generation, barter and trade recommendation, etc. [16, 18]
    Heteroscedastic regression: modeling uncertainty, etc. [22]
  This proposal (complex outputs)
    Personalized Q/A and dialog systems (Thrust 1)                           f : U x I x S* -> S*
    Estimating likely reactions and personalized text models (Thrust 2)      f : U x I -> S*
    Personalized content generation and design (Thrust 3)                    f : U x S* -> I

Table 2: Short term goals, long term goals, broader impacts, and collaborators (preliminary work: last 1–3 years; short term goals: next 2–4 years; long term goals: next 3–6 years).

  Thrust 1. Mission: make behavioral models more interactive by answering complex queries. Enabling technologies: subjective knowledge extraction from crowds [29], MoE-based Q/A systems [12]. Preliminary work: [12, 15]. Short term goals: develop personalized and subjective 'opinion bases' and Q/A systems. Long term goals and broader impacts: neurotological Q/A and dialog analysis for patient intake and triage (w/ UCSD School of Medicine).
  Thrust 2. Mission: make behavioral models more contextual by estimating nuanced reactions. Enabling technologies: LSTMs, encoder-decoder architectures, sequence-to-sequence models [54]. Preliminary work: [24]. Short term goals: develop personalized LSTMs and text models to estimate nuanced reactions to unseen content. Long term goals and broader impacts: personalized sequence models for heartrate monitoring and urban sensor data (w/ UCSD CogSci Department).
  Thrust 3. Mission: make behavioral models more creative by generating new content. Enabling technologies: Siamese Nets [61], GANs [62, 63], conditional GANs [40–43, 57]. Preliminary work: [9, 47]. Short term goals: develop personalized Generative Adversarial Networks for content synthesis and design. Long term goals and broader impacts: personalized VQA and artistic preference modeling (with and ).

In the past year, we have begun investigating several forms of 'structured output' recommender systems, preliminary to the main thrusts of this proposal. For example, a simple form of structured output is a heteroscedastic regressor, i.e., a regressor that models both an output as well as its uncertainty about that output. In [22], we used this form of regressor to suggest more efficient surgery
schedules, showing that both duration as well as our uncertainty about the duration vary due to the characteristics of the operation and the individual surgeon. We have also recently considered other forms of 'simple' structured recommendation, such as multiple-output recommendation and set recommendation. Examples include recommending trading partners in online bartering platforms (where we must recommend trading partners, as well as a pair of items they can mutually trade) [16], or recommending bundles of products to be purchased together [18] (where one has to consider issues such as compatibility and diversity). Although simpler than the types of complex outputs considered in this proposal, what our preliminary work demonstrates is: (1) there is a genuine research gap to be filled, in the sense that there are a variety of open problems and new applications at the intersection of structured output and predictive behavioral models, ranging from surgery scheduling [22] to DDR step-chart generation [20]; and (2) in addition to driving new applications, even 'traditional' objectives (like duration prediction, a regression task) can be quantitatively improved by considering structure in the output space.

3 Research trajectory

Our overall research trajectory is depicted in Table 2. As mentioned, each thrust is motivated by a specific application that extends structured output modeling with personalization. Each builds upon enabling technologies from prior work on knowledge extraction systems, generative text models such as LSTMs, and generative image models such as GANs. Our short term goals consist of developing specific extensions to these technologies to explain effects that arise due to user variance. Building on established technologies and metrics ensures the feasibility of our evaluation plan (Section 6).
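To make the heteroscedastic idea from our preliminary work above concrete, the following is a minimal sketch of such a regressor: it predicts both a mean and an input-dependent log-variance, fit by gradient descent on the Gaussian negative log-likelihood. The linear parameterization, the synthetic data, and all variable names are our own illustration, not the surgery-scheduling model of [22].

```python
import numpy as np

# Minimal heteroscedastic regressor: predict a mean (X @ w) and a
# log-variance (X @ v), fit by full-batch gradient descent on the
# Gaussian negative log-likelihood. Purely illustrative.
rng = np.random.default_rng(0)
n, d = 2000, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d - 1))])
true_w = np.array([1.0, 2.0, -1.0])   # mean weights
true_v = np.array([-1.0, 0.5, 0.0])   # noise level depends on the inputs
y = X @ true_w + rng.normal(size=n) * np.exp(0.5 * (X @ true_v))

w, v, lr = np.zeros(d), np.zeros(d), 0.05
for _ in range(5000):
    resid = y - X @ w
    inv_var = np.exp(-(X @ v))
    # Per-point NLL is 0.5 * (log_var + resid**2 / var); updates below
    # follow its gradients with respect to w and v.
    w -= lr * -(X * (resid * inv_var)[:, None]).mean(axis=0)
    v -= lr * (X * (0.5 * (1.0 - resid**2 * inv_var))[:, None]).mean(axis=0)

print("mean weights:", np.round(w, 2))
print("log-variance weights:", np.round(v, 2))
```

The point of the exercise is that the model recovers not only the signal but also how the noise level varies with the inputs, which is exactly the quantity a scheduler needs in order to trade off expected duration against its variability.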
While the 'obvious' applications of our research are to domains like recommender systems, opinion mining, and e-commerce, our focus on these applications is largely opportunistic, as these are domains for which large, labeled datasets are widely available. Our long term goals are concerned with exploring related applications where 'users' also exhibit substantial variance, such as clinical notes, heartrate data, etc. Realizing these goals is a multi-year process, due to the need to manage interdisciplinary collaborators, collect sensitive data, survey human subjects, etc.

Challenges, Feasibility, and Qualifications. Developing practical approaches to modeling human behavior from large scale, high-dimensional datasets presents a range of challenges: methods must be scalable, as many of the datasets we consider (Table 9) span several terabytes of actions, text, and images. Moreover, such datasets are sparse and long-tailed when we consider the problem of modeling individuals. Thus we need models that trade off complexity with parsimony: generative models of text and images already require millions of parameters and lengthy training schedules, thus we must extend them while adding only a modest number of parameters and computational overhead.

Our approach to these challenges incorporates ideas from recommender systems—where scalability and sparsity issues are rife—including dimensionality reduction and regularization techniques, into end-to-end learning systems for Q/A, text, and image data. Our lab has significant experience dealing with each of these domains (e.g. Q/A [12, 15], text [5, 24], and images [6–10, 13, 14]), including some of the largest-scale studies to date, spanning hundreds of millions of activities mined from sources including Amazon [5], Facebook [25], and Google [21].
Although we acknowledge the complexity of a research program consisting of several challenging tasks, each with their own academic and industrial partners, this is exactly the kind of cross-disciplinary, collaborative research at which our lab excels, having collaborated with 33 authors across 10 departments and institutions during the last two years alone.

4 Contributions and Impact

4.1 Intellectual Merit

Methodologically, our research contributes to each of the areas upon which it builds. To the area of generative modeling and structured output learning, our contribution is a new suite of techniques and approaches ('personalized' knowledge extraction, personalized text generation, personalized image generation, etc.) that can be used to model applications where the 'user' explains substantial variance in the data, due to personal preferences, biases, subjectivity (etc.). Such applications are ubiquitous within recommender systems and social media mining, but also emerge when modeling (for example) clinical notes (Thrust 1), heartrate data (Thrust 2), etc.

Our research also drives new applications in recommender systems and personalized modeling. Our goal is to transform the way systems are currently trained and evaluated, by re-imagining the notion of 'content' as more than just features to augment a system's performance, but as desirable outputs that enhance interactivity, context-awareness, and creativity. The relationship between our contributions, in terms of methodology, applications, and broader impacts, is depicted in Fig. 2.

Figure 2: Thrusts, applications, and impact.

4.2 Broader Impacts

Health. Our collaboration with the UCSD School of Medicine shall investigate systems for patient intake and triage, by predicting the severity of symptoms from automated Q/A dialogs. The primary challenge to overcome is the subjectivity with which symptoms are reported, which we expect to benefit from personalized Q/A systems (Thrust 1). And with the UCSD Cognitive Science department we shall investigate personalized models of heartrate data, which we cast as a structured output task for the purpose of recommending personalized exercise routines (Thrust 2).
Broadly, for applications like personalized health, it is critical to deal with structured data, while simultaneously accounting for variance that arises due to individual behavioral patterns [22].

Commercialization. Broadly, work on personalized recommender systems has direct impact on countless industrial applications, such as e-Commerce [5], social media mining [26], and even video gaming [20]. Our lab has ongoing collaborations with industrial partners, to develop personalized visual models of artistic preferences (with ) and personalized Visual Question Answering systems (with ). These collaborations are described in Thrust 3. Note that while our work is clearly of commercial interest, the goals of our proposal are basic-science contributions that differ from what is typically funded via industrial grants; rather, our industrial collaborators provide complementary expertise, and help to ensure that the potential impact of our work is realized. Letters of Collaboration from the UCSD School of Medicine, , and are attached.

Education and Outreach. In the coming year we shall create an additional research-oriented class that closely follows the thrusts in this proposal, and develop the intro-level curriculum for UCSD's upcoming Data Science major and minor. We are also involved in mentorship programs aimed at increasing the retention of URMs in CS, and other activities targeting high-schoolers and engineering professionals. We regularly organize community workshops, including the SoCal Machine Learning Symposium, and the Workshop on Mining and Learning with Graphs.

Data. The dissemination of new datasets is an ongoing effort and major contribution of our lab. Although we have collected a wide selection of datasets that can already support the proposed work (Table 9), new data will be collected throughout this proposal.
The PI has a track record of making all data and code publicly available; many of the datasets we have collected have become de facto standards in academia and industry, having been used in dozens of ML and data mining classes worldwide (including our own), and receiving thousands of downloads per month.

Figure 3: Examples of community questions showing dependence on context (left), dependence on the querier (middle), and the inconsistency of opinions (right). From our prior work [12, 15].

product: Mommy's Helper Kid Keeper
Question: "I have a big two year old (30 lbs) who is very active and pretty strong. Will this harness fit him? Will there be any room to grow?"
sample responses: "So if you have big babies, this may not fit very long."; "They fit my boys okay for now, but I was really hoping they would fit around their torso for longer."; "I have a very active almost three year old who is huge."

product: Sanford EXPLORERV Vista Explorer 60" Tripod
Question: "Is this tripod better then the AmazonBasics 60-Inch Lightweight Tripod with Bag one?"
sample responses: "However, if you are looking for a steady tripod, this product is not the product that you are looking for"; "If you need a tripod for a camera or camcorder and are on a tight budget, this is the one for you."; "This would probably work as a door stop at a gas station, but for any camera or spotting scope work I'd rather just lean over the hood of my pickup."

product: Thermos 16 Oz Stainless Steel
Question: "how many hours does it keep hot and cold?"
sample responses: "Does keep the coffee very hot for several hours."; "Keeps hot Beverages hot for a long time."; "I bought this to replace an aging one which was nearly identical to it on the outside, but which kept hot liquids hot for over 6 hours."; "Simple, sleek design, keeps the coffee hot for hours, and that's all I need."; "I tested it by placing boiling hot water in it and it did not keep it hot for 10 hrs."; "Overall, I found that it kept the water hot for about 3-4 hrs."

Thrust 1: Answering Complex Questions: Personalized Knowledge Extraction and Q/A Systems

Introduction and problem statement. Extracting useful and actionable knowledge from massive volumes of unstructured data is a fundamental problem with applications in a variety of domains. Systems range from IBM's Watson—used to answer Jeopardy questions [30], to Microsoft's system that analyzes search logs to predict adverse drug reactions [64], to academic systems that discover relationships in literature ranging from pharmacogenomics to paleontology [65].

Building such systems means addressing two basic problems: One consists of condensing huge volumes of unstructured or semi-structured data—especially text—into codified form, i.e., to build 'knowledge bases' of facts or evidence. The second problem consists of querying the knowledge base in a natural and interactive way, whether to identify the entities being referred to in a Jeopardy question, or to predict a drug reaction from patterns in search logs.

Existing work on knowledge base extraction generally assumes that there is some 'truth' to be discovered, and that deviation from this truth is due to noise or misinformation that must be corrected. However, this may be too simple an assumption in the context of recommender systems, where questions are subjective and require personalized answers. Consider for example the questions in Fig. 3 (taken from our corpus of questions from Amazon [12]). In the first example (left), the user provides context describing their specific use case. In the second (center), the user asks a question that is highly subjective (asking which item is 'better'); furthermore, a good answer to this question should be personalized to the querier (are they an expert? on a budget? etc.). In the third (right), a seemingly objective question is asked, yet responses are remarkably inconsistent, reflecting a wide variety of personal experiences.
What the above suggests is the need to develop knowledge extraction and Q/A systems that account for issues such as user-dependent context, personalization, and subjectivity. Through this thrust we hope to achieve these goals by combining knowledge extraction and personalization techniques, via two research questions: first to deal with the ‘subjective knowledge extraction’ problem (RQ1.1), and second to use this knowledge to provide personalized answers to complex subjective questions (RQ1.2). Thrust 1. How can the notions of personalization and subjectivity be used to improve the perfor- mance and interactivity of knowledge extraction and question/answering systems?

Prior and preliminary work. In [12, 15] we collected a dataset of 1.4 million questions (and answers), which we used to study the problem of question answering using consumer reviews. Our research raised several interesting attributes of real-world Q/A communities:

1. Real Q/A communities surface multiple inconsistent answers (we collected around four answers per question, on average) [15].
2. Even questions identified as being 'objective' by crowd workers yield inconsistent answers, based on differing personal experiences [12].
3. Traditional Q/A and information retrieval techniques perform poorly when confronted with nuanced and subjective questions [12, 15].

Table 3: Prior work and motivation (from [12]). Performance on simple Q/A tasks can be improved with 'subjective' knowledge, such as that found in product reviews. Numbers shown are the fraction of times a correct answer is selected over an alternative on two Q/A corpora.

                     home & kitchen   health
  Okapi BM25+            84.0%        81.7%
  'Factual' Text         86.3%        84.2%
  Subjective Text        90.7%        88.0%

Most importantly, we found that substantial performance improvements can be achieved by making use of subjective information, such as product reviews and opinion data, rather than relying on sources of 'factual' information alone (Table 3). While our prior work demonstrates the need for personalization and subjectivity within Q/A systems, the predictors were limited to binary Q/A problems or multi-class

answer selection. Given our fundamental research goal, here we plan to extend such methods to extract structured knowledge and to answer complex questions interactively.

RQ1.1. Can we generalize knowledge bases and entity-relationship models by extracting 'opinion bases' that account for the range and subjectivity of personal experiences?

First we will design entity-relationship models that account for subjective relationships, as well as the characteristics of the opinion holder (Table 4). This requires both (a) defining ontologies to describe the types of relations to be extracted, followed by (b) a crowdsourcing effort to extract ground-truth labels for training and evaluation. For the former we will adopt techniques from existing work on entity-relationship modeling [66, 67], while for the latter we will use techniques to resolve ambiguity [68] and mine subjective knowledge [29] from crowds. The next task consists of developing techniques to automatically extract further entity relationships from our complete corpus, using supervised or semi-supervised knowledge-graph completion techniques.

Table 4: Examples of the type of entity relationship to be extracted (from reviews of 'Soylent Meal Replacement Drink'). Relationships may capture comparative judgments, and be associated with measurements reflecting their subjectivity.

  ⟨ attribute, operator, comparison, preference score, subjectivity score, according to ⟩
  "doesn't taste as good as the new flavors"  →  ⟨ 'taste', ≤, 'new flavors', 4.0, 1.0, Kevin HG ⟩
  "the chocolate is by far better"  →  ⟨ 'taste', ≤, 'chocolate', 3.0, 1.0, Enner ⟩
  "tastes like elmers glue with a fruit loop aftertaste"  →  ⟨ 'taste', ∼, 'elmers glue', 2.0, 1.0, NC ⟩
  "I had really bad hives-inducing allergic reactions to this product...I have since switched to Vega One powder, which is soy free, and have had no reactions"  →  ⟨ 'allergenic', ≥, 'Vega One', 1.0, 0.5, Cembalista ⟩
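The opinion tuples described above lend themselves to a simple typed representation. The following is a hypothetical sketch: the field names mirror the tuple schema, but the class itself and the example instance are our own illustration, not part of an implemented system.

```python
from dataclasses import dataclass

# Sketch of the opinion-tuple schema from Table 4. Illustrative only:
# the class and example instance are ours, not an implemented system.
@dataclass(frozen=True)
class OpinionTuple:
    attribute: str             # e.g. 'taste'
    operator: str              # comparative relation: '<=', '>=', '~'
    comparison: str            # the entity being compared against
    preference_score: float    # the opinion holder's overall preference
    subjectivity_score: float  # 1.0 = fully subjective, 0.0 = factual
    according_to: str          # the opinion holder

ex = OpinionTuple('taste', '<=', 'new flavors', 4.0, 1.0, 'Kevin HG')
print(ex.attribute, ex.operator, ex.comparison)  # -> taste <= new flavors
```

Making the subjectivity score and opinion holder first-class fields, rather than metadata, is what distinguishes such an 'opinion base' from a conventional knowledge-base triple.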
We plan to make use of embedding techniques for relationship extraction, following our previous work [12, 14, 21] and prior work for knowledge graph completion [66]. Here entities to be linked (e.g. two items being compared) are related via embeddings; using multiple embeddings [14] or translation vectors [21, 66] allows for complex non-metric relationships to be learned between entities (Fig. 4):

Figure 4: We plan to extract relationships using embedding techniques, where entities (h, t) are related via embeddings Mr and translation vectors (see e.g. [66]).

p(relation) = σ(h Mr t)                                      bilinear model [12, 15, 69]
p(relation) = σ(Mr,0 h + Mr,1 t)  or  σ(Mr h + Mr t + b)     translation-based models [14, 21, 66]
p(relation) = σ(Mr(u, s, p) h + Mr(u, s, p) t + b)           personalized translation-based models (RQ1.1)

We have previously used similar embedding techniques to establish item-to-item relationships for recommendation scenarios [5, 14, 21]. Although these techniques can be applied as-is as baselines for entity extraction, our main contribution here shall be to extend these methods with parameterized embeddings Mr(u, s, p) that account for user latent factors (u), the subjectivity (s) of the statement in question, and other available features such as preference scores (p). We anticipate that accounting for such personalization and subjectivity factors shall lead to improved performance for entity relationship extraction, just as it did for simple Q/A tasks in [12, 15].

RQ1.2. How can the subjective and personalized knowledge bases developed in RQ1.1 be used to build more personalized and interactive Q/A systems?

Having built a labeled dataset and used it to extract complex and subjective relationships, we must next use the extracted information to build structured question answering systems. In our prior work we built systems for answering binary questions and multiclass answer selection [12, 15]. To do so we made use of mixtures-of-experts type frameworks to answer (binary) questions using bilinear models:

p(answer to q is ‘yes’) = ∑_{f ∈ text fragments} p(f is relevant | q) · p(y | f is relevant, q) = ∑_f σ(f M qᵀ) · σ(f M′ qᵀ).  (1)

Note that this framework from our prior work ignores the knowledge extraction task (RQ1.1) altogether, and instead tries to identify significant fragments via a latent relevance model; this proves effective for simple questions (Eq. (1) is a binary classifier), but does not generalize to generate structured outputs.
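For reference, the fragment-level scorer of Eq. (1) can be sketched in a few lines; the vectors, dimensions, and matrices M, M′ below are illustrative, and the explicit normalization of the relevance scores (which makes the sum a proper mixture) is our own addition:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal sketch of the mixture-of-experts scorer in Eq. (1).
d = 6
rng = np.random.default_rng(1)
q = rng.normal(size=d)                 # question representation
fragments = rng.normal(size=(5, d))    # candidate text fragments f
M = rng.normal(size=(d, d))            # latent relevance model
M2 = rng.normal(size=(d, d))           # answer prediction model

relevance = sigmoid(fragments @ M @ q)     # p(f is relevant | q)
relevance = relevance / relevance.sum()    # normalize: a proper mixture
answer = sigmoid(fragments @ M2 @ q)       # p(yes | f is relevant, q)
p_yes = float(np.sum(relevance * answer))  # p(answer to q is 'yes')
```

With normalized relevance weights, p_yes is a convex combination of per-fragment answer probabilities and therefore a valid probability.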
Here we plan (a) to improve the performance of these systems using the knowledge bases extracted in RQ1.1; (b) to use structured knowledge from RQ1.1 to answer complex and open-ended questions; and (c) to adapt Eq. (1) so that answers are tailored to the preferences of the querier. For (a) we can adapt the framework from Eq. (1) to make use of structured knowledge (Table 4) rather than raw text; here our goal is to demonstrate that Q/A performance for subjective questions can be improved by making use of structured knowledge rather than free text alone. For (b) we shall make use of question analysis and question classification techniques similar to those already developed [70]. This consists of converting a question into a query against our knowledge base. We will initially start with simple and predefined sets of queries (e.g. to mine comparisons [71] and compatibility relationships [72]), before considering more complex and arbitrary relationships. For (c) we need to further parameterize the relevance models (M) to include characteristics of the user entering the query (i.e., to discover which relations are true for you).
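The relation scorers from RQ1.1, including the personalized parameterization Mr(u, s, p) to be developed here, can be sketched with stand-in embeddings; the scalar reduction of the translation-style form (via a relation vector w_r) and the additive form of Mr(u, s, p) are our own illustrative assumptions, not a finalized design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8
rng = np.random.default_rng(0)
h, t = rng.normal(size=d), rng.normal(size=d)  # head/tail entity embeddings

# Bilinear model: p(relation) = sigma(h Mr t)
M_r = rng.normal(size=(d, d))
p_bilinear = sigmoid(h @ M_r @ t)

# Translation-style model: sigma(Mr h + Mr t + b), reduced to a scalar via an
# assumed relation vector w_r (one possible reading of the proposal's form).
w_r, b = rng.normal(size=d), 0.1
p_translation = sigmoid(w_r @ (M_r @ h + M_r @ t) + b)

# Personalized variant: Mr becomes a function of user factors u, subjectivity s
# and preference score p -- here a simple additive sketch.
def M_personalized(M_r, u, s, p):
    return M_r + s * np.outer(u, u) + p * np.eye(M_r.shape[0])

u = rng.normal(size=d)
p_personalized = sigmoid(h @ M_personalized(M_r, u, s=0.5, p=4.0) @ t)
```

The personalized form lets the same (h, t) pair receive different relation probabilities for different users, which is exactly the behavior RQ1.1 targets.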

Broader impacts: Q/A Systems for Patient Intake and Triage

Although datasets such as the one we have collected from Amazon can already be used for training and evaluation, to increase the broader impacts of this proposal we are also developing structured Q/A systems for the specific problem of automated patient intake and triage (Fig. 5). Together with (Professor of Medicine, UCSD) and (a practicing physician), we are working with the UCSD Clinical and Translational Research Institute (CTRI) and the Institute of Engineering in Medicine (IEM) on a pilot study that will collect Q/A dialogs for neurotology patient intake (neurological disorders of the ear, Fig. 5). During the next 18 months, this study will collect Q/A data and medical notes from real patients, and develop initial prototypes. Note that the goal here is not to automatically answer questions, but rather to estimate whether a particular Q/A dialog corresponds to a high-priority case (e.g. whether it is benign or refers to an emergency situation), which can be used for automated triage. We expect that this type of analysis will benefit from systems that analyze text in a personalized way, given the subjectivity of how symptoms are reported.

Figure 5: (Synthetic) clinical Q/A dialog.

Evaluation. Detailed evaluation plans for all thrusts are included in Section 6.

Thrust 2: Predicting Complex Reactions: Personalized Generative Models of Text and Sequences

Introduction and problem statement. While existing personalized models focus on predicting what content users like or dislike, a more personalized and contextual system might tell users why they like some content, for example by estimating what they would say (i.e., the text they would use) in reaction to it. Such a system could be used to make more nuanced recommendations (by telling users both what they will like and why), or be used by content creators and marketers to forecast reactions in detail.
Traditionally, ‘context’ is added to personalized models by training systems that are ‘context-aware’ [73–76] or ‘aspect-aware’ [77–81], e.g. to disentangle opinions into multiple dimensions. Again though, such systems treat context as input to the model, and still focus on predicting simple outputs. In this thrust we consider whether it is possible to estimate detailed reactions to content, in the form of text that a user might write in response to a given stimulus. For example, we might estimate what review a user would write for a product they haven’t reviewed yet (Fig. 6), or what specific aspects they might focus on, etc. This is a challenging language modeling task, as we must estimate (a) what types of language are used to describe the item; (b) what types of words the user tends to favor; and (c) what words describe the interaction between the user and the item (e.g. the user’s preference). These issues are further exacerbated by the need to work with incredibly sparse datasets, and the need to generate long sequences, a particularly challenging problem for generative language models [39].

Thrust 2. How can we adapt generative models of text and sequences to explain the variance that arises due to subjective and personal factors, by predicting how an individual would respond to a certain stimulus?

Prior and preliminary work. Our work builds generally upon the areas of content and context-aware recommender systems [73–76], and more specifically on ‘aspect-aware’ sentiment analysis, itself a simple form of ‘structured output’ recommendation that aims to estimate fine-grained reactions in terms of a fixed set of aspects [77–81]. Generating text has long been cast in terms of trying to predict the next token (a word or character) given those seen so far [82]. Potential models to build upon include [83] (which seeks to generate Wikipedia articles), as well as sequence-to-sequence models and encoder-decoder architectures [54]. Such models have been used to model reviews (and ‘tips’), though typically not with a focus on personalized generation [31, 32, 36, 38, 39].

Our own work on this topic dates back to [24], where we showed that recommendation performance can be improved by combining traditional recommender systems with topic models. This has been extended to use various language models, including CNNs [38] and LSTMs [32, 36, 84] (as we propose to use here), though again these models struggle to generate outputs in a personalized way.

Figure 6: Preliminary experiments. Example of a generated (i.e., predicted) review for a particular user/item combination (using a Generative Concatenative Network, Fig. 7e), along with its groundtruth from the test set. Mosstrooper’s predicted review for Shocktop Belgian White: “Poured from 12oz bottle into half-liter Pilsner Urquell branded pilsner glass. Appearance: Pours a cloudy golden-orange color with a small, quickly dissipating white head that leaves a bit of lace behind. Smell: Smells HEAVILY of citrus. By heavily, I mean that this smells like kitchen cleaner with added wheat (contd.)...” Actual review from test set: “Poured from a 12oz bottle into a 16oz Samuel Adams Perfect Pint glass. Appearance: Very pale golden color with a thin, white head that leaves little lacing. Smell: Very mild and inoffensive aromas of citrus. Taste: Starts with the same tastes of the citrus and fruit flavors of orange and lemon and the orange taste is all there (contd.)...”

RQ2.1. How can generative text models be adapted to capture user-, content-, and interaction-specific language, in order to estimate detailed textual reactions to content?

Two of the main challenges in generating personalized responses are (a) sparsity, i.e., learning the variance across peoples’ writing styles based on a limited number of observations, and (b) generating long and structured sequences. For the former it would seem that the framework of encoder-decoder RNNs would be appropriate (Fig. 7c,d) [54]; this type of model has been used (for example) to caption images by first learning an image representation which is passed as input to an LSTM [53]; similarly, we might learn user and item representations as input to an RNN (Fig. 7d). However preliminary experiments showed this to be ineffective when generating long sequences [46, 85], as the model quickly ‘forgets’ its input and saturates to a background (i.e., non-personalized) model. An effective way to encourage the model to ‘remember’ its input/context is simply to replicate user and item representations at each step (via one-hot encodings), from which the model can learn interactions between the two (which we term a ‘Generative Concatenative Network,’ or GCN; Fig. 7e). This technique was used to generate the example in Fig. 6: this is a promising initial result, as the model is able to capture both the high-level structure of the review and the user’s subjective response to its attributes.

Table 5: Preliminary experiments. Combining language models with recommendation techniques leads to more accurate and personalized models of language. (Perplexity; lower is better.)

                         Mean   Median
Unsupervised (Fig. 7b)   4.23   2.22
Personalized (Fig. 7e)   2.25   1.98
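The per-step input construction behind the GCN idea (replicating user and item encodings at every timestep) can be sketched as follows; all dimensions and encodings are illustrative:

```python
import numpy as np

# Sketch of the 'Generative Concatenative Network' input construction: the
# user/item one-hot encodings are concatenated onto the token embedding at
# every timestep, so the recurrent model cannot 'forget' its context.
n_users, n_items, vocab, d_tok = 4, 3, 10, 5

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def gcn_inputs(user, item, token_ids, token_emb):
    """Build the per-step input sequence [token ; user one-hot ; item one-hot]."""
    u, it = one_hot(user, n_users), one_hot(item, n_items)
    return np.stack([np.concatenate([token_emb[t], u, it]) for t in token_ids])

token_emb = np.random.default_rng(2).normal(size=(vocab, d_tok))
X = gcn_inputs(user=1, item=2, token_ids=[3, 7, 7, 0], token_emb=token_emb)
print(X.shape)  # → (4, 12): 4 timesteps, 5 + 4 + 3 input dims per step
```

The resulting sequence X would then be fed to the recurrent model; the factorized variants (Fig. 7f) would replace the one-hot blocks with low-dimensional learned representations.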
This method also captures language more accurately in a quantitative sense (Table 5). However there are three limitations that we seek to overcome to make such models effective: (a) The model requires several days of training (given a modest corpus of ∼200,000 documents); this is largely an issue of dimensionality that arises due to one-hot encodings of users and items (which cannot scale beyond a few thousand users/items); we plan to address this by investigating models that learn low-dimensional user/item representations that are passed as input to the LSTM at each step (Fig. 7f). (b) The model requires substantial amounts of observations per user and item (on the order of 100 documents) to learn effective personalized representations; this is not always realistic in sparse datasets. We plan to address this by investigating ‘semi-supervised’ learning approaches that learn user/item representations from more abundant sources of data (ratings, clicks, etc.) in addition to a limited number of documents (Fig. 7g). And (c) a generative language model alone is not yet effective as a means of making predictions and recommendations; we address this in RQ2.2. A selection of possible architectures to be compared to achieve these goals is shown in Fig. 7.

Figure 7: Selection of model architectures to be developed and points of comparison to be considered. Top left to bottom right: (a) Latent factor recommender system; (b) LSTM; (c) Encoder-Decoder LSTM; (d) Encoder-Decoder architecture for reaction estimation; (e) Generative Concatenative Network (GCN) [46]; Bottom row: (f) Factorized GCN; (g) Semi-Supervised Factorized GCN; (h) Conditional GCN for sequence modeling.

RQ2.2. How can the personalized language models from RQ2.1 be extended to make effective predictions and recommendations, and applied to other forms of sequence data?
Our prior work [24] showed that combining language models with recommender systems leads to more accurate performance (in terms of predicting ratings), as we can learn users’ preferences more efficiently from review content than we could from preference data alone (which makes sense, as reviews are intended to ‘explain’ the dimensions of people’s preferences). Our first goal with this research question is to show that the same is true for the Generative Concatenative models developed in RQ2.1, which we plan to achieve by investigating joint training schemes for ‘BPR-like’ activity prediction models and GCNs. For example, a trivial formulation might follow the training approach we introduced in [24]:

∑_{u,i,i′} log σ( x_{u,i}(γu, γi) − x_{u,i′}(γu, γi′) ) + µ log perplexity(corpus | γu, γi).  (2)

Here the first term is the activity model (following the BPR training scheme), and the second is the language model (GCN or similar).
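A minimal numeric sketch of this joint objective, with stand-in scores (the names and the sign convention for the language term are our own illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch of the joint objective in Eq. (2). x_ui / x_uj stand in for the
# activity model's scores for an observed item i and an unobserved item j;
# log_ppl stands in for the language model's log perplexity. Since lower
# perplexity is better, the weight mu effectively carries the appropriate
# sign when the whole objective is maximized.
def joint_objective(x_ui, x_uj, log_ppl, mu=0.1):
    activity = np.log(sigmoid(x_ui - x_uj))  # BPR-style pairwise term
    language = mu * log_ppl                  # GCN (or similar) term
    return float(activity + language)
```

The activity term is largest when the observed item i is scored far above the unobserved item j, which is the ranking behavior BPR training encourages.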

Second we will evaluate our models in terms of novel recommendation objectives. In particular we are interested in showing that GCN language models can be used for personalized search and retrieval, and to more effectively navigate existing content. For example a user might describe the desirable attributes of an item (e.g. ‘good suit to wear to a wedding’), and the system would surface the items about which the user would be most likely to make such a statement; this connects to our goals from Thrust 1, as doing so provides an additional method to solve retrieval tasks in terms of personalized and subjective language. We also plan to investigate how our approach can be used for personalized review summarization and aspect-based discovery, following ideas from [86–92].

Broader impacts: Personalized Models of Heartrate Data for Exercise Recommendation

Beyond learning personalized generative models of text, we plan to apply the same technology to other forms of sequential data. We will collaborate with (through a co-advised student, ) to use Generative Concatenative Networks to build personalized models of heartrate data, using a large dataset of workout histories (including GPS, heartrate logs, weather, etc.) collected from EndoMondo. We cast this as a sequence-to-sequence prediction task (see Fig. 7h) that aims to model the heartrate sequence a user will exhibit given (GPS data of) a workout to be undertaken. We will use models of this data to build recommender systems that can suggest activities and design exercise schedules that will help individuals to achieve personalized fitness goals.

Thrust 3: Generating Complex Content: Personalized Design and Generative Adversarial Recommendation

Introduction and problem statement. Our final thrust of research consists of studying how personalization and recommendation can be combined with Generative Adversarial Models.
Our main goal with this thrust is to design new systems that can facilitate the personalized generation of new content, rather than merely retrieving content that already exists. We consider this problem within the context of ‘visually-aware’ recommendation. This allows us to combine our own prior work on visually-aware recommender systems [7–11, 14, 19] with recent developments on comparative image models [7, 61] and generative adversarial models of images [62, 63]. Again, our goal is to show that an important class of existing structured output models can be extended and improved by incorporating personalization and behavioral modeling techniques. Such methods have immediate applications to domains like fashion recommendation and design (for example), but more broadly to any domain where subjective visual features guide people’s decisions. This research has the potential (a) to lead to more accurate personalized retrieval techniques, by better understanding the visual dimensions that guide people’s decisions; (b) to develop more expressive generative models that can be conditioned on subjective factors; and (c) to take a first step toward building systems that can be used to aid the (personalized) design of new content.

Thrust 3. How can personalization be used in the generative design of content, i.e., to suggest the characteristics of content that users would interact with, if it were created?

Prior and preliminary work. Recently, we have shown that recommender systems can perform significantly more accurately when made to be ‘visually aware,’ especially in sparse datasets or ‘cold-start’ settings [9, 10] (VBPR, Table 6). So far, we have built systems that incorporate visual features ‘off the shelf,’ in concert with content-aware recommendation approaches [9].
We have shown that such techniques can be used to recommend clothing and outfits [6, 9], to understand the evolution of styles over time [10], and to understand the visual and social factors that influence people’s artistic preferences [13]. In parallel, there are two lines of research from computer vision that we seek to extend via behavioral modeling techniques. The first is Siamese CNNs [61]: these are discriminative models that can be trained to make comparisons between images using two ‘copies’ of the same CNN. This can be used (for example) for face verification [93], or to determine whether two items belong to the same style [7], though not in a personalized way. Two recent works suggest that such models could be used to develop better personalized recommender systems: in our own work [7] and in [94], Siamese CNNs are used to identify visually compatible items, and in [95] comparative judgments between images are modeled, which is a similar problem to personalized recommendation.

Table 6: Prior and preliminary results (in terms of AUC). In [9] we showed that improved results can be obtained by incorporating visual features into content-aware recommenders (VBPR); in [10] we showed that the same techniques can be used to study fashion trends over time.

                         BPR [45]   VBPR [9]   DVBPR (RQ3.1)
Amazon fashion           0.628      0.748      0.796
Amazon fashion (cold)    0.551      0.732      0.772
Tradesy.com              0.586      0.750      0.786
Tradesy.com (cold)       0.542      0.753      0.779
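As background for the VBPR-style models referenced above, a stand-in sketch of the scoring function from [9] together with a simple personalized retrieval step; all parameters, dimensions and names are illustrative:

```python
import numpy as np

# Stand-in sketch of a VBPR-style score [9]: latent preference plus a visual
# term, and personalized retrieval over learned item representations.
rng = np.random.default_rng(3)
K, d_feat, n_items = 8, 16, 50
alpha, beta_u, beta_i = 0.0, 0.1, -0.2
gamma_u, gamma_i = rng.normal(size=K), rng.normal(size=K)  # latent factors
theta_u = rng.normal(size=K)                               # visual factors
E = rng.normal(size=(K, d_feat))                           # feature embedding
f_i = rng.normal(size=d_feat)                              # pretrained CNN features

# VBPR score: x_ui = alpha + beta_u + beta_i + gamma_u.gamma_i + theta_u.(E f_i)
x_ui = alpha + beta_u + beta_i + gamma_u @ gamma_i + theta_u @ (E @ f_i)

def retrieve(theta_u, Phi_items):
    """Personalized retrieval: the item whose representation best matches theta_u."""
    return int(np.argmax(Phi_items @ theta_u))

Phi_items = rng.normal(size=(n_items, K))  # learned item representations
best = retrieve(theta_u, Phi_items)
```

In the end-to-end variants proposed in RQ3.1, the pretrained features f_i and the fixed embedding E would be replaced by a learned representation trained jointly with the preference factors.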


Figure 8: Models to be developed and evaluated in Thrust 3. We seek to design personalized models for content creation and design. A selection of possible architectures (including from prior work) are shown above. From left to right: (a) (non-personalized) Siamese network [61]; (b) Pre-trained network within content-aware RecSys (VBPR, [97]); (c) Deep VBPR (RQ3.1); (d) retrieval with DVBPR; (e) Discriminator; (f) Personalized GAN (RQ3.2); (g) Joint GAN/recommender.

A second line of work consists of models for image generation, and in particular Generative Adversarial Networks (GANs). These are unsupervised learning frameworks in which two components ‘compete’ to generate realistic-looking outputs [62, 63]: a generator is trained to generate images, while a discriminator is trained to distinguish real versus generated images. Thus the generated images look ‘realistic’ in the sense that they are indistinguishable from those in the training set. Such systems can also be conditioned on additional inputs, in order to sample outputs with certain characteristics [40–43, 57].

We seek to combine these lines of work through two technical contributions. First, we shall investigate recommender systems that can handle high-dimensional image content in an end-to-end fashion, following a recent trend of incorporating representation learning techniques into recommender systems [96]. Having developed these representations, we seek to use our models generatively, so that rich content is not just an input to the system but also an output.

RQ3.1. How can image representation and generation techniques be personalized, and conversely how can we extend systems for content-aware recommendation to make use of end-to-end image representation techniques?

Our first task is to investigate end-to-end visually-aware models for personalized recommendation, extending our prior work that used features ‘off-the-shelf’ within content-aware recommender systems.
Learning such personalized representations is necessary before we attempt to synthesize new items in RQ3.2, and we expect that doing so will also lead to improved predictive performance. Our prior work made use of the BPR framework [45], in which comparative judgments are modeled between users and content by fitting a function x_{u,i} such that items i the user u prefers (likes, purchases, etc.) are associated with a higher score than other items i′. Training consists of maximizing pairwise differences (Table 7, row 1). x_{u,i} can be any function; e.g. our prior work made use of compatibility functions that incorporate image features from a (pre-trained) CNN (Table 7, row 2) [9].

Table 7: Selection of formulations to be developed and compared for Thrust 3.

∑_{u,i,i′} ℓ_σ(x_{u,i} − x_{u,i′})                         BPR [45]
x_{u,i} = α + β_u + β_i + γ_uᵀγ_i + θ_uᵀ(E f_i)            VBPR [9], Fig. 8b (latent + visual user/item preference)
x_{u,i} = α + β_u + θ_uᵀ Φ(X_i)                            ‘Deep’ VBPR, RQ3.1, Fig. 8c
argmax_{e ∈ X} θ_uᵀ Φ(e)                                   retrieval, RQ3.1, Fig. 8d
argmax_{e ∈ G(·)} θ_uᵀ Φ(e) − η[D(e) − 1]²                 synthesis, RQ3.2, Fig. 8e–g

We shall investigate a variety of end-to-end representations (including those in Fig. 8) in order to determine which are best for preference learning, and which can most effectively be adapted to recommendation tasks. The idea here is that both image representations (Φ(·)) and users’ preferences toward them (θ_u) are learned jointly (Table 7, row 3). Our preliminary results (Table 6) show that even a trivial architecture (a ‘vanilla’ Siamese CNN, with a user preference term, Fig. 8c) already improves results over pre-trained models. Following this, RQ3.2 seeks methods for GAN training that are capable of synthesizing items that look realistic while targeting the preferences of specific users.

RQ3.2. How can recommender systems be used generatively for the purposes of content design, and how can users (or designers) effectively navigate such systems?

Existing systems, including those to be developed in RQ3.1, are concerned with predicting which existing items and content a user will interact with (purchase, click, view, etc.). By building personalized models capable of synthesizing new content, we hope to develop tools that might be used by content creators or designers in order to generate, or to suggest attributes of, content that might be desirable to a user, population, or group. An example of this type of contribution is shown in Figure 9. Essentially our goal is to develop methods that can efficiently sample arbitrary ‘realistic’ items from a prior distribution, and later to condition samples on the preferences or attributes desired by individual users. Generative Adversarial Networks (GANs) [98] offer an effective way to capture such a distribution and have demonstrated promising results in a variety of applications [99, 100]; in particular, conditional GANs [40] can generate images (or other content) according to semantic inputs like class labels [41], textual descriptions [57], etc.

Figure 9: We seek not just to recommend existing content, but to use our system to design new content that may be desirable to specific users or groups (e.g. items with desirable or fashionable attributes, shown alongside real items from the dataset and items generated from a GAN). To do so we seek to estimate a continuous manifold of plausible items that can be explored in a personalized way.

The overall ideas of such a system and potential architectures are shown in Figure 8. In a traditional GAN model (Fig. 8e and f), a discriminator D and a generator G are trained to optimize a value function V(G, D) [98]:

min_G max_D V(D, G) = E_{e ∼ p_data(e)}[log D(e)] + E_{z ∼ p_z(z)}[log(1 − D(G(z)))].

A trivial way to use such a model to generate personalized items is to maximize an objective that balances a user’s preference score θ_uᵀΦ(e) with the likelihood of the image being ‘real,’ penalized via [D(e) − 1]² (Table 7, row 5); we used this simple objective to generate the preliminary experiments in Figs. 9 and 10. These preliminary results are promising in the sense that they look ‘plausible,’ and have higher personalized objectives θ_uᵀΦ(e) compared to existing items in the corpus.

Figure 10: Preliminary results. Examples of personalized GAN-generated items (Fig. 8f).

Following these results we seek to address the following problems: (a) evaluate, extend, and improve the results using deeper conditional networks modified from [63, 100]; (b) investigate joint training schemes that simultaneously optimize the recommendation and generation objectives (e.g. following the model in Fig. 8g); (c) conduct user studies to evaluate the quality of generated items. For the latter, much as we did in Thrust 2, we aim to show that the model is not merely useful as a means of synthesizing new content, but can also be used to better navigate and retrieve existing content in a quantitative sense [101].

Broader impacts: Artistic Recommendation and Personalized VQA

This thrust is connected to work with two of our lab’s collaborators. With , we have built models for artistic recommendation that incorporate visual and social features, using both public and proprietary data from the artistic community Behance [13], and more recently we have begun preliminary work developing models that incorporate both Siamese Networks and GANs (RQ3.1) [47]. Our lab will also collaborate with to build systems to answer personalized queries regarding the visual appearance of items. Even simple queries (e.g.
finding ‘good shoes to wear to a black-tie wedding’) are challenging due to issues of ambiguity, subjectivity, and the dependence on visual appearance. Solving this problem is connected to Thrust 1, due to our need to answer subjective queries, and Thrust 3, due to our need to understand preferences within image models. This line of research extends recent developments on Visual Question Answering (VQA) [102] to allow for personalization and subjectivity (Fig. 11).

Figure 11: With XXXXXX we are working on systems for personalized Visual Question Answering.

Our research in this thrust focuses largely on applications in art and fashion. While these applications are important in and of themselves [103–118], again our focus on them is largely due to the fact that these are domains for which the best data is available, i.e., large image corpora, with associated preference judgments (reviews, likes, etc.). If successful, we plan to apply the same technology to other areas where visual factors influence people’s decisions, as well as to non-visual domains where Generative Adversarial training can be used to design content (e.g. text [33, 119–121]).
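The personalized synthesis objective discussed in this thrust can be sketched as a simple selection over GAN samples; Φ and D below are stand-ins for the learned representation and discriminator, and all values are illustrative:

```python
import numpy as np

# Among candidate GAN samples, choose the one balancing the user's preference
# theta_u^T Phi(e) against a realism penalty eta * (D(e) - 1)^2.
rng = np.random.default_rng(4)
K, n_samples = 8, 100
theta_u = rng.normal(size=K)
Phi = rng.normal(size=(n_samples, K))  # representations of sampled items
D = rng.uniform(size=n_samples)        # discriminator scores in [0, 1]

def personalized_pick(theta_u, Phi, D, eta=1.0):
    """Maximize theta_u.Phi(e) - eta*(D(e)-1)^2 over candidate samples."""
    scores = Phi @ theta_u - eta * (D - 1.0) ** 2
    return int(np.argmax(scores))

best = personalized_pick(theta_u, Phi, D)
```

In the joint architectures proposed above, this selection step would be replaced by optimizing the generator itself against the combined objective, rather than filtering samples after the fact.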

task  | metrics                                  | established in     | comparison points/baselines                               | datasets (Table 9)
RQ1.1 | Precision/Recall (hit rate), AUC         | [12, 15, 66]       | linear/bilinear models, translation-based embeddings      | Extracted entity relations from Amazon Q/A and neurotological dialogs
RQ1.2 | AUC, MAP, Precision@k                    | [12, 15, 161, 163] | Relevance ranking techniques (e.g. Okapi BM25+)           | (as RQ1.1)
RQ2.1 | Perplexity, per-token accuracy, F-score  | [20, 46]           | char-LSTM, GCN, Fig. 7b–g                                 | BeerAdvocate, Amazon, Yelp, EndoMondo, Strava, DDR, etc.
RQ2.2 | RMSE, AUC, subjective preference         | [45, 47]           | BPR and MF-like methods, Fig. 7h                          | (as RQ2.1)
RQ3.1 | RMSE, AUC                                | [9, 10]            | VBPR, TVBPR [9, 10], Fig. 8b,c                            | Behance, Amazon
RQ3.2 | Inception score, SSIM, diversity         | [41, 62, 175]      | GANs and Conditional GANs (e.g. [40, 41, 174]), Fig. 8d–g | (as RQ3.1)

Table 8: Selection of metrics, baselines, and datasets to be used for evaluation.

5 Further Related Work The overall themes of this proposal are grounded in traditional approaches to behavioral modeling, i.e., methods that establish latent user preferences and item properties from historical activity traces [45, 122–128]. A variety of methods extend traditional approaches by making use of content and context [50, 129–131], including temporally-aware [132–142], and socially-aware [2, 19, 76, 131, 143–146] models; such methods have also been extended to incorporate modern (i.e., ‘deep’) learning techniques [96, 147–149]. These techniques—which combine rich and heterogeneous forms of content to improve prediction—are also related to ideas from multimedia and information fusion [150–158]. Although we build on such technology (as well as our own work on content-aware behavioral models [1–25]), the main novelty of our work is the focus on ‘content’ in the output space, rather than as input to the model. Our proposed work on subjective knowledge discovery and Q/A (Thrust 1) fits within a large body of work on Q/A and information retrieval. Our work is somewhat related to ‘general purpose’ techniques for Q/A, retrieval, and knowledge graph completion [30, 66, 67, 159–163], though these differ widely in formulation, and most are quite different from our own take on the topic [12, 15]. More closely related are methods that mine knowledge specifically from reviews, either for the purpose of summarization [86–90], or aspect-based opinion mining [91, 92]. A variety of works build Q/A systems upon opinion data [71, 159, 164–170], though most are limited to specific forms of query (e.g. mining compatibility relationships [72]). Part of this limitation of existing work owes to the lack of adequate benchmark data for general-purpose subjective Q/A, an issue our own work partly addressed through the introduction of a large subjective Q/A dataset [12]. 
Our work is also related to ideas of knowledge acquisition from crowds, especially methods that deal with subjective information [29].

Our work on personalized text generation (Thrust 2) again builds on ideas from aspect-aware recommendation and summarization [78–81, 171], as well as a growing body of work that uses text as a form of side information to improve the performance on traditional predictive tasks (e.g. rating prediction) [37, 172, 173], including methods based on CNNs and LSTMs [34–36]. We also build on standard models for text generation [33, 52, 55, 56, 84], and more specifically review generation [31, 32], though most existing models consider the generative task without personalizing results to individuals.

Finally, our work on personalized generative models (Thrust 3) was initially motivated by the growing body of work on fashion image modeling and recommendation [103–118]. Besides fashion, there is a growing interest in making recommender systems ‘visually aware,’ by exploiting content derived from images [51, 101], including our own work [6–10, 13, 14, 60]. Our work also builds upon image generation approaches based on GANs [62, 63] and related evaluation protocols [174]. Such models have been adapted to take conditional inputs (e.g. attributes, or textual descriptions) [40–43, 57]. In essence our contribution shall be a form of conditional GAN that allows for conditioning on individual users.

6 Evaluation Plan

6.1 Metrics and Baselines

All research in this proposal shall be measured in terms of baselines, metrics, and datasets established in prior work, as described in Table 8. In essence, our plan is to demonstrate that accounting for subjectivity/personalization yields improved performance over models that fail to do so.
Knowledge base completion and Q/A (Thrust 1): For knowledge extraction tasks (RQ1.1), our goal is to extract entity relationships that match labeled groundtruth; we plan to crowdsource a dataset of at least 10,000 labeled relationships. Groundtruth relations should be identified (recall), and identified relationships should be valid (precision). This follows the protocol used elsewhere for knowledge-base completion [66]. We will initially make use of a sample of reviews and answers to questions from our existing Amazon corpus; this corpus contains 1.4 million answered questions, meaning that we can use

it as a testbed for an end-to-end evaluation of our relation extraction systems, in terms of their ability to resolve queries correctly. Later we will use knowledge extracted from neurotology dialogs following our collaboration with the UCSD School of Medicine. For Q/A tasks (RQ1.2), we shall evaluate methods in terms of the AUC, as in our previous work [12, 15].

Category | Type | Description | Source | No. of Records | Thrust | Refs.
Reviews | General reviews | Product reviews, metadata, images | e-Commerce | 100M | T2, T3 | [7–11, 14, 19]
Reviews | Beer & wine reviews | | Beeradvocate, RateBeer, Cellartracker | 4.5M | T2 | [24, 177]
Reviews | Local business reviews | | Google | 11.4M | T2 | [21]
Reviews | Activities and images | Clicks, likes, etc. | Behance | 11.8M actions, 980k images | T3 | [13]
Q/A | Q/A data | Product-related queries and answers | Amazon | 1.4M questions | T1 | [12, 16]
Q/A | Entity relations* | Extracted entity relationships | RQ1.1 | TBD | T1 |
Q/A | Medical dialogs* | Medical Q/A and conversations | UCSD | TBD | T1 |
Sequences | Dance choreography | DDR step charts | StepMania | 350k dance moves, 35 hours of audio | T2 | [20]
Sequences | Workout logs | Heart-rate and GPS data (running and cycling) | EndoMondo, Strava | 1M workouts | T2 |
Sequences | Metropolitan sensors | Road and building sensors | UCSD (NSF-IIS-1636879) | 40M sensor readings | T2 |
Sequences | Clinical procedures | Procedure durations and characteristics | UCSD | 117K | T2 | [22]
Other structured RecSys data | Peer-to-peer trades | Bartering transactions | Ratebeer, Tradesy | 125k peer-to-peer transactions | | [16]
Other structured RecSys data | Grocery shopping | Grocery shopping transactions (baskets) | Anonymous Seattle-based grocery store (via collaboration w/ Microsoft) | 152k transactions across 53k trips | | [17]
Other structured RecSys data | Product bundles | Bundled purchases | Steam | 902K | | [18]

Table 9: Datasets to be used in this proposal, and associated thrusts and references. All are data either collected by the PI's lab, or obtained via an industrial or academic partnership. After publication, all (non-private) data are released publicly. (*to be collected)
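As a concrete illustration of the relation-extraction protocol described above (matching extracted relations against crowdsourced ground truth, measuring recall and precision), a minimal sketch; the entities and relations shown are hypothetical examples, not drawn from our corpus:

```python
# Set-based precision/recall over extracted (head, relation, tail) triples,
# as in standard knowledge-base completion evaluations. All triples below
# are hypothetical examples.

def precision_recall(extracted, groundtruth):
    """Precision: fraction of extracted relations that are valid.
    Recall: fraction of ground-truth relations that were identified."""
    extracted, groundtruth = set(extracted), set(groundtruth)
    true_pos = len(extracted & groundtruth)  # valid extracted relations
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(groundtruth) if groundtruth else 0.0
    return precision, recall

groundtruth = {("tent_x", "fits_in", "car_y"),
               ("lens_a", "compatible_with", "camera_b")}
extracted = {("tent_x", "fits_in", "car_y"),
             ("tent_x", "fits_in", "truck_z")}

precision, recall = precision_recall(extracted, groundtruth)  # 0.5, 0.5
```

The same set-based comparison applies whether the labeled relations come from crowdsourcing (RQ1.1) or from an end-to-end query-resolution test.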
Generative models of text (Thrust 2): Initial evaluation of generative text models (RQ2.1) shall be in terms of perplexity and per-token accuracy measurements (i.e., how 'surprised' the model is to see held-out data), as we have previously used for generative sequence modeling tasks [20]. Given that such measures do not adequately characterize the quality (or 'plausibility') of generated text, we will perform further qualitative evaluation via user studies; these will consist of pairwise preference comparisons between text generated via different approaches (see e.g. the protocol we used in [176]). We will also consider quantitative retrieval and recommendation measures (RQ2.2), i.e., demonstrating that a richer text model leads to improved MSE/AUC/Hit Rate when retrieving items (see e.g. [21]). For other sequence modeling tasks, like heart-rate modeling, we are also interested in measures like the earth-mover's distance, to compare groundtruth and predicted heart-rate sequences.

Generative image models and design (Thrust 3): As with Thrust 2 above, our initial quantitative evaluation (RQ3.1) shall be in terms of whether richer image models improve performance on recommendation and retrieval benchmarks. To evaluate the quality of generated images (RQ3.2), we shall make use of the Inception Score [62], as well as structural similarity (SSIM) [175] when measuring image diversity [41, 174]. Since measuring image quality is inherently subjective, we shall also evaluate the generated images in terms of how well they match user preference models, and through user studies (which will again consist of pairwise preference comparisons) [12, 47].

6.2 Datasets and Reproducibility

A selection of datasets to be used in this proposal (and their associated thrusts) is shown in Table 9. These data range from clicks, purchases, text, images, time-series, and social networks, to clinical procedures and dance choreography.
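To make the Thrust 2 measures above concrete, a minimal sketch of per-token perplexity and the earth-mover's distance; it assumes the model supplies per-token log-probabilities, and compares heart-rate sequences as equal-length 1-D samples (in practice a library routine such as scipy.stats.wasserstein_distance would be used):

```python
import math

def perplexity(token_log_probs):
    """Perplexity of held-out text: exp of the average negative
    log-probability per token (lower = model is less 'surprised')."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def emd_1d(a, b):
    """Earth-mover's (Wasserstein-1) distance between two equal-length
    1-D samples: mean absolute difference of the sorted values."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# A model assigning each of 4 held-out tokens probability 0.25
# has perplexity ~4 (it is as 'surprised' as a uniform 4-way guess).
ppl = perplexity([math.log(0.25)] * 4)

# Hypothetical groundtruth vs. predicted heart-rate (bpm) sequences.
dist = emd_1d([120, 135, 150], [122, 133, 154])
```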
Note that all are either collected by the PI’s lab, or obtained via an industrial or academic partnership. Additional sources of public benchmark data (e.g. Yelp [178], MovieLens [179], Epinions [180], Dunnhumby [181], etc.) are also regularly used in our work. Collecting data is an ongoing effort, with additional data to be collected throughout this project, including medical (Thrust 1) and commercial (Thrust 3) data that will contribute to the broader impacts of our research. The PI has a track record of releasing all datasets immediately after publication, along with open-source implementations of all research artifacts. Similarly, all (non-sensitive) data, methods, and code from this proposal will be released publicly. This guarantees that all methods and results produced here shall be reproducible.

1.1a Specify 'opinion-base' ontologies / relations
1.1b Collect labeled relations via crowdsourcing
1.1c Investigate supervised techniques for personalized ER extraction
1.1d Evaluate and compare against baselines for RQ1.1
1.2a Investigate extensions of MoE models from [12] using KB data
1.2b Investigate models for subjective question analysis/classification
1.2c Evaluate and compare against baselines for RQ1.2
Dissemination of RQ1 results
Broader impacts: Systems for neurotological dialog and Q/A
2.1a Investigate fully-supervised GCN models (Fig. 7e and f)
2.1b Investigate semi-supervised GCN models (Fig. 7g)
2.1c Evaluate and compare against baselines for RQ2.1
2.2a Investigate joint GCNs/recommender systems (Eq. (2))
2.2b Evaluate GCNs for personalized retrieval (incl. user studies)
Dissemination of RQ2 results
Broader impacts: (user × sequence → sequence) models
3.1a Investigate methods for personalized comparative learning (Fig. 8c)
3.1b Evaluate DVBPR and VBPR against alternatives
3.2a Investigate GAN methods for personalized generation (Fig. 8e and f)
3.2b Investigate joint training schemes for RecSys+GAN training (Fig. 8g)
3.2c Evaluate RQ3.2 methods (incl. user studies)
Dissemination of RQ3 results
Broader impacts: Development of personalized VQA models

Table 10: Five-year timeline of tasks (Years 1–5) and collaborators (Table 11). (*Student is expected to graduate during this period.)

6.3 Advising and Collaboration Plan

Research in this proposal will be conducted by the Ph.D. students in the PI's lab (Table 11), as well as our academic and industrial collaborators (see letters of collaboration attached). Thrust 1 will be conducted by , building on her prior work on Opinion Q/A systems [15]. Thrust 2 will initially be conducted by ; following his graduation this research will be conducted by . Thrust 3 builds upon prior work by [9]; following his graduation, work will be conducted by . developed the preliminary experiments reported in this proposal [46, 47, 85]. Broader impacts will be achieved through our industrial Ph.D. students and academic partnerships. Our work on Q/A for patient intake will be conducted in collaboration with and ; this work is connected with an ongoing pilot study that will collect data and develop initial prototypes over the next 18 months; our use of UCSD's clinical record data is covered by IRB (April 2016), with additional approvals to be obtained as needed. Our work on sequence modeling for heart-rate data will be conducted by . Finally, our work on artistic preferences and personalized VQA will be conducted with and .

Timeline and funding allocation. Our thrusts are staggered to coincide with the graduation of senior students and the training of new students; we aim to complete Thrusts 1, 2, and 3 in years 3, 4, and 5. is funded by an MSR fellowship during years 1-2; will graduate in 2018 and will help to train ; and are new students partially funded by departmental scholarships during years 1-2. Thus we expect funding to be allocated between during year 3, and during years 1, 2, 4, and 5. An approximate timeline for these activities is shown in Table 10. Although our timeline involves multiple tasks in parallel, our advising plan guarantees that students in our lab are encouraged to pursue interdisciplinary and collaborative research, exposing them to ideas from departments across campus, and putting their ideas into practice via industrial partnerships. Each of the collaborations described above is an established working relationship, involving regular meetings between the PI, collaborators, and advisees.

Ph.D. students (graduation): PhD, 2017; PhD, 2018; † PhD, 2020; † PhD, 2021; PhD, 2021; PhD, 2021; ∗ PhD, 2019; ∗ PhD, 2019; ∗† PhD, 2019. (∗ co-advised; CD with ; LM with ; IS with .)
Other student collaborators: Undergraduate, UCSD; † Undergraduate, UCSD; † Undergraduate, UCSD; † Undergraduate, UCSD; Undergraduate, UCSD; Undergraduate/MS, UCSD; MS, UCSD; MS, UCSD; MS, UCSD
Other academic and industrial collaborators: UCSD (School of Medicine); Neurotology Clinician; Stanford; UCSD; UCSD

Table 11: PI's lab. († Female / URM)

7 Educational Plan and Outreach Activities

Mentorship. In addition to my Ph.D. advising activities, I am actively involved in mentorship of students from diverse backgrounds, ranging from high-schoolers to engineering professionals. I am an advisor in UCSD's Early Research Scholars Program (ERSP), a mentorship program aimed at increasing retention of women and underrepresented minorities in CS by directly involving them in undergraduate research. I have advised nine undergraduates through this program across three research projects (SL, ZQ, SH during 2017). I am also an advisor in UCSD's MAS program, a graduate program for working engineering professionals. Several of my advisees are women and underrepresented minorities (Table 11).

Outreach. Recruitment and retention: I regularly engage in activities aimed at fostering interest in computer science amongst people of all ages: I have participated in events with the UC High Coding Club (high-schoolers), with the UC Village (transfer students), the UCSD CSE SPIS program (new admits), and as an advisor in the ERSP (aimed at increasing retention of URMs). Community Workshops: I am actively involved in organizing community workshops on machine learning and data mining. In 2016/17 I organized the Workshop on Mining and Learning with Graphs (MLG); the Workshop on Big Graphs, Theory and Practice; the Workshop on Machine Learning meets Fashion; and the Southern California Machine Learning Symposium. The latter was a standalone event that attracted 43 papers, 7 corporate sponsors, and around 300 attendees. SDM Doctoral Forum: In 2018 I will serve as Chair of the SDM Doctoral Forum, an NSF-sponsored event that provides opportunities for Ph.D. students to discuss and receive feedback on their research from peers and senior practitioners in the field.

Integrating Research and Teaching.
My teaching focuses on a 'hands-on' approach to machine learning, data mining, and recommender systems, allowing students to work with large-scale datasets and solve predictive tasks that come directly from real-world applications. For example, my students compete on Kaggle to develop recommendation engines, using large review datasets from my own research (Table 9). Each week I cover case-studies from the academic literature; this integration between fundamental concepts and cutting-edge research has proven to be among the most positively evaluated aspects of my classes, especially among undergraduates. In 2016/17 I developed new (graduate and undergraduate) classes on Web Mining and Recommender Systems, which have quickly grown to become the most popular electives (by enrollment) in our department; following the research in this proposal I plan to incorporate sections on generative and structured output modeling into my existing classes. In 2017/18 I will develop another course that focuses on large-scale implementation of modern recommender systems research.

Recommender Systems for Student Admissions. One of my roles in our department is as Chair of our MS recruiting committee. This is a demanding role, given the need to review over 3,000 applications (and around 10,000 letters of reference) per year. In the last year I have been analyzing our historical admissions data with the goal of building recommender systems to (a) increase the efficiency of our admissions process; (b) identify the strongest files early, as well as 'outliers' that need special attention; and (c) reduce bias in our admissions process. For the latter, even simple analytics (e.g. what percentile is a 3.8 GPA at this university? How often does this letter writer say 'top 1%'? etc.) can drastically increase the efficiency of the process, while also increasing its fairness.
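The kind of 'simple analytics' described above can be sketched as follows; the applicant records and field names here are hypothetical stand-ins, not the department's actual admissions schema:

```python
# Contextualize a GPA as a within-school percentile, and measure how often a
# given letter writer uses a superlative such as 'top 1%'. All records below
# are hypothetical examples.
applicants = [
    {"school": "Univ. A", "gpa": 3.9},
    {"school": "Univ. A", "gpa": 3.5},
    {"school": "Univ. A", "gpa": 3.8},
    {"school": "Univ. B", "gpa": 3.8},
]

def gpa_percentile(school, gpa, pool):
    """Percent of applicants from the same school with a strictly lower GPA."""
    peers = [a["gpa"] for a in pool if a["school"] == school]
    return 100.0 * sum(g < gpa for g in peers) / len(peers)

pct = gpa_percentile("Univ. A", 3.8, applicants)  # 3.8 beats 1 of 3 Univ. A GPAs

letters = [
    {"writer": "W1", "text": "She is easily in the top 1% of students."},
    {"writer": "W1", "text": "A solid, dependable student."},
]
w1_letters = [l for l in letters if l["writer"] == "W1"]
# Fraction of this writer's letters containing the phrase 'top 1%'.
top1_rate = sum("top 1%" in l["text"].lower() for l in w1_letters) / len(w1_letters)
```

Such per-school and per-writer baselines are what allow a '3.8 GPA' or a 'top 1%' claim to be interpreted relative to its source rather than at face value.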
I am also collaborating with Ndapa Nakashole (a new faculty member in our department) to analyze the use of gendered language in CS letters of recommendation, and the effect that it has on admissions decisions; this shall benefit from our results in Thrust 2, as this task inherently requires the analysis of subjective language.

Developing UCSD's Data Science Program. I am working to develop UCSD's new Data Science undergraduate major and minor programs, which are launching in 2017/18. My course will be incorporated into the new curriculum, and I am also working on the development of new introductory courses (DS10/20/30) to be offered in the program's lower division.

Results from Prior NSF Support

The PI is supported by NSF-IIS-1636879 ("Knowledge discovery and real-time interventions from sensory data flows in urban spaces," $517,917, with Altintas, I., Gadh, R., Gupta, R., Shutters, S., and Srivastava, M.). This project is currently in its first year of funding, and has so far led to two publications (with RH and RP) [10, 21], as well as the collection of large datasets of urban sensor measurements (Table 9). This project is concerned with building predictive models of urban sensors (e.g. building energy usage, weather, parking availability, EV charging, etc.), and as such connects to our broader impacts in Thrust 2. This project supports the collection of new heterogeneous sequence datasets, and conversely shall benefit from the new structured modeling techniques to be developed here.

8 References

[1] Jaewon Yang, Julian McAuley, and Jure Leskovec. Detecting cohesive and 2-mode communities in directed and undirected networks. In Web Search and Data Mining, 2014.

[2] T. Zhao, J. J. McAuley, and I. King. Leveraging social connections to improve personalized ranking for collaborative filtering. In Conference on Information and Knowledge Management, 2014.

[3] T. Zhao, J. J. McAuley, and I. King. Improving latent factor models via personalized feature projection for one-class recommendation. In Conference on Information and Knowledge Management, 2015.

[4] D. Lim, J. McAuley, and G. Lanckriet. Top-n recommendation with missing implicit feedback. In ACM Conference on Recommender Systems, 2015.

[5] Julian McAuley, Rahul Pandey, and Jure Leskovec. Inferring networks of substitutable and complementary products. In Knowledge Discovery and Data Mining, 2015.

[6] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on style and substitutes. In SIGIR, 2015.

[7] Andreas Veit, Balazs Kovacs, Sean Bell, Julian McAuley, Kavita Bala, and Serge Belongie. Learning visual clothing style with heterogeneous dyadic co-occurrences. In ICCV, 2015.

[8] R. He, C. Lin, J. Wang, and J. McAuley. Sparse hierarchical embeddings for visually-aware one-class collaborative filtering. In International Joint Conference on Artificial Intelligence, 2016.

[9] R. He and J. McAuley. VBPR: Visual Bayesian personalized ranking from implicit feedback. In AAAI, 2016.

[10] R. He and J. McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In World Wide Web, 2016.

[11] R. He and J. McAuley. Fusing similarity models with Markov chains for sparse sequential recommendation. In International Conference on Data Mining, 2016.

[12] Julian McAuley and Alex Yang. Addressing complex and subjective product-related queries with customer reviews. In World Wide Web, 2016.

[13] R. He, C. Fang, Z. Wang, and J. McAuley. Vista: A visually, socially, and temporally-aware model for artistic recommendation. In RecSys, 2016.

[14] R. He, C. Packer, and J. McAuley. Learning compatibility across categories for heterogeneous item recommendation. In International Conference on Data Mining, 2016.

[15] M. Wan and J. McAuley. Modeling ambiguity, subjectivity, and diverging viewpoints in opinion question answering systems. In ICDM, 2016.

[16] J. Rappaz, M.-L. Vladarean, J. McAuley, and M. Catasta. Bartering books to beers: A recommender system for exchange platforms. In Web Search and Data Mining, 2017.

[17] M. Wan, D. Wang, M. Goldman, M. Taddy, J. Rao, J. Liu, D. Lymberopoulos, and J. McAuley. Modeling consumer preferences and price sensitivities from large-scale grocery shopping transaction logs. In World Wide Web, 2017.

[18] A. Pathak, K. Gupta, and J. McAuley. Generating and personalizing bundle recommendations on Steam. In SIGIR, 2017.

[19] C. Cai, R. He, and J. McAuley. SPMC: Socially-aware personalized Markov chains for sparse sequential recommendation. In International Joint Conference on Artificial Intelligence, 2017.

[20] C. Donahue, Z. Lipton, and J. McAuley. Dance dance convolution. In International Conference on Machine Learning, 2017.

[21] R. He, W.-C. Kang, and J. McAuley. Translation-based recommendation. In RecSys, 2017.

[22] N. Ng, R. Gabriel, J. McAuley, C. Elkan, and Z. Lipton. Predicting surgery duration with neural heteroscedastic regression. In Machine Learning in Health Care, 2017.

[23] Julian McAuley and Jure Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In World Wide Web, 2013.

[24] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM Conference on Recommender Systems, 2013.

[25] Julian McAuley and Jure Leskovec. Learning to discover social circles in networks. In Neural Information Processing Systems, 2012.

[26] Himabindu Lakkaraju, Julian McAuley, and Jure Leskovec. What’s in a name? understanding the interplay between titles, content, and communities in social media. In International Conference on Weblogs and Social Media, 2013.

[27] Julian McAuley, Jure Leskovec, and Dan Jurafsky. Learning attitudes and attributes from multi-aspect reviews. In International Conference on Data Mining, 2012.

[28] Dawei Chen, Lexing Xie, Aditya Krishna Menon, and Cheng Soon Ong. Structured recommen- dation. arXiv preprint arXiv:1706.09067, 2017.

[29] Rui Meng, Hao Xin, Lei Chen, and Yangqiu Song. Subjective knowledge acquisition and enrich- ment powered by crowdsourcing. arXiv preprint arXiv:1705.05720, 2017.

[30] David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. Building Watson: An overview of the DeepQA project. In AI Magazine, 2010.

[31] Alec Radford, Rafal Józefowicz, and Ilya Sutskever. Learning to generate reviews and discovering sentiment. CoRR, abs/1704.01444, 2017.

[32] Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, and Ke Xu. Learning to generate product reviews from attributes. In Association for Computational Linguistics, 2017.

[33] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. Controllable text generation. CoRR, abs/1703.00955, 2017.

[34] Lei Zheng, Vahid Noroozi, and Philip S. Yu. Joint deep modeling of users and items using reviews for recommendation. In Web Search and Data Mining, 2017.

[35] Rose Catherine and William W. Cohen. Transnets: Learning to transform for recommendation. CoRR, abs/1704.02298, 2017.

[36] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, and Alexander J. Smola. Joint training of ratings and reviews with recurrent recommender networks. In ICLR Workshops, 2017.

[37] Guang-Neng Hu. Integrating reviews into personalized ranking for cold start recommendation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017.

[38] Xiang Zhang and Yann LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.

[39] Piji Li, Zihao Wang, Zhaochun Ren, Lidong Bing, and Wai Lam. Neural rating regression with abstractive tips generation for recommendation. In SIGIR, 2017.

[40] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

[41] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In ICML, 2017.

[42] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2image: Conditional image generation from visual attributes. In ECCV, 2016.

[43] Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In NIPS, 2016.

[44] Yehuda Koren and Robert Bell. Advances in collaborative filtering. In Recommender Systems Handbook. Springer, 2011.

[45] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 2009.

[46] Zachary Lipton, Sharad Vikram, and Julian McAuley. Generative concatenative nets jointly learn to write and classify reviews. in preparation.

[47] Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian McAuley. Visually-aware fashion recommendation and design with generative image models. in submission.

[48] Steffen Rendle. Factorization machines. In ICDM, 2010.

[49] Ke Zhou, Shuang-Hong Yang, and Hongyuan Zha. Functional matrix factorizations for cold-start recommendation. In SIGIR, 2011.

[50] Trung V Nguyen, Alexandros Karatzoglou, and Linas Baltrunas. Gaussian process factorization machines for context-aware recommendations. In SIGIR, 2014.

[51] Chun-Che Wu, Tao Mei, Winston H Hsu, and Yong Rui. Learning to personalize trending image search suggestion. In SIGIR, 2014.

[52] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[53] Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.

[54] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.

[55] Rafal Józefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. CoRR, abs/1602.02410, 2016.

[56] Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, and Stefan Scherer. Affect-LM: A neural language model for customizable affective text generation. CoRR, abs/1704.06851, 2017.

[57] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. In ICML, 2016.

[58] Jaewon Yang, Julian McAuley, Jure Leskovec, Paea LePendu, and Nigam Shah. Finding progression stages in time-evolving event sequences. In World Wide Web, 2014.

[59] Jaewon Yang, Julian McAuley, and Jure Leskovec. Community detection in networks with node attributes. In International Conference on Data Mining, 2013.

[60] J. J. McAuley and J. Leskovec. Image labeling on a network: using social-network metadata for image classification. In European Conference on Computer Vision, 2012.

[61] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.

[62] Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In NIPS, 2016.

[63] Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076, 2016.

[64] Ryen White, Sheng Wang, Apurv Pant, Rave Harpaz, Pushpraj Shukla, Walter Sun, William DuMouchel, and Eric Horvitz. Early identification of adverse drug reactions from search log data. Journal of Biomedical Informatics, 2016.

[65] Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. Incremental knowledge base construction using DeepDive. In VLDB, 2015.

[66] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2015.

[67] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In KDD, 2008.

[68] Vikas Raykar, Shipeng Yu, Linda Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, and Linda Moy. Learning from crowds. JMLR, 2010.

[69] Aditya Menon and Charles Elkan. Link prediction via matrix factorization. Machine Learning and Knowledge Discovery in Databases, pages 437–452, 2011.

[70] Junwei Bao, Nan Duan, Ming Zhou, and Tiejun Zhao. Knowledge-based question answering as machine translation. In Association for Computational Linguistics, 2014.

[71] Bushra Anjum and Chaman L Sabharwal. An entropy based product ranking algorithm using reviews and q&a data. In DMS, pages 69–76, 2016.

[72] Hu Xu, Lei Shu, Jingyuan Zhang, and Philip S Yu. Mining compatible/incompatible entities from question and answering via yes/no answer classification using distant label expansion. arXiv preprint arXiv:1612.04499, 2016.

[73] Yang Bao, Hui Fang, and Jie Zhang. TopicMF: Simultaneously exploiting ratings and reviews for recommendation. In AAAI, 2014.

[74] Zhongqi Lu, Zhicheng Dou, Jianxun Lian, Xing Xie, and Qiang Yang. Content-based collaborative filtering for news topic recommendation. In AAAI, 2015.

[75] Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Jeff Yuan, and Lluis Garcia-Pueyo. Supercharging recommender systems using taxonomies for learning user purchase behavior. In VLDB, 2012.

[76] Zhi Qiao, Peng Zhang, Yanan Cao, Chuan Zhou, Li Guo, and Binxing Fang. Combining heterogeneous social and geographical information for event recommendation. In AAAI, 2014.

[77] J. McAuley, J. Leskovec, and D. Jurafsky. Learning attitudes and attributes from multi-aspect reviews. In ICDM, 2012.

[78] S. Blair-Goldensohn, T. Neylon, K. Hannan, G. Reis, R. McDonald, and J. Reynar. Building a sentiment summarizer for local service reviews. In NLP in the Information Explosion Era, 2008.

[79] S. Brody and N. Elhadad. An unsupervised aspect-sentiment model for online reviews. In ACL, 2010.

[80] G. Ganu, N. Elhadad, and A. Marian. Beyond the stars: Improving rating predictions using review text content. In WebDB, 2009.

[81] N. Gupta, G. Di Fabbrizio, and P. Haffner. Capturing the stars: predicting ratings for service and product reviews. In HLT Workshops, 2010.

[82] Jeffrey L. Elman. Finding structure in time. Cognitive science, 14(2):179–211, 1990.

[83] Ilya Sutskever, James Martens, and Geoffrey E. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024, 2011.

[84] Alex Graves. Generating sequences with recurrent neural networks. CoRR, abs/1308.0850, 2013.

[85] Jianmo Ni, Zachary Lipton, Sharad Vikram, and Julian McAuley. Estimating reactions and rec- ommending products with generative models of reviews. in submission.

[86] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In SIGKDD, 2004.

[87] Giuseppe Carenini, Jackie Chi Kit Cheung, and Adam Pauls. Multi-document summarization of evaluative text. Computational Intelligence, 2013.

[88] Rahul Katragadda and Vasudeva Varma. Query-focused summaries or query-biased summaries? In ACL-IJCNLP, 2009.

[89] Xinfan Meng and Houfeng Wang. Mining user reviews: from specification to summarization. In ACL-IJCNLP, 2009.

[90] Jian Liu, Gengfeng Wu, and Jianxin Yao. Opinion searching in multi-product reviews. In CIT, 2006.

[91] Samaneh Moghaddam and Martin Ester. AQA: aspect-based opinion question answering. In ICDMW, 2011.

[92] Hong Yu and Vasileios Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In EMNLP, 2003.

[93] Junlin Hu, Jiwen Lu, and Yap-Peng Tan. Discriminative deep metric learning for face verification in the wild. In CVPR, 2014.

[94] Sean Bell and Kavita Bala. Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics (TOG), 34(4):98, 2015.

[95] Chenyi Lei, Dong Liu, Weiping Li, Zheng-Jun Zha, and Houqiang Li. Comparative deep learning of hybrid representations for image recommendations. In CVPR, 2016.

[96] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. Collaborative deep learning for recommender systems. In SIGKDD, 2015.

[97] Ruining He and Julian McAuley. VBPR: Visual Bayesian personalized ranking from implicit feedback. In AAAI, 2016.

[98] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.

[99] Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. Deep generative image models using a laplacian pyramid of adversarial networks. In NIPS, 2015.

[100] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[101] Devashish Shankar, Sujay Narumanchi, H. A. Ananya, Pramod Kompalli, and Krishnendu Chaudhury. Deep learning based large scale visual recommendation and search for e-commerce. arXiv preprint arXiv:1703.02344, 2017.

[102] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In ICCV, 2015.

[103] Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. What makes paris look like paris? SIGGRAPH, 2012.

[104] Diane J. Hu, Rob Hall, and Josh Attenberg. Style in the long tail: Discovering unique interests with latent variable models in large scale social e-commerce. In KDD, 2014.

[105] Wei Di, Catherine Wah, Anurag Bhardwaj, Robinson Piramuthu, and Neel Sundaresan. Style finder: Fine-grained clothing style detection and retrieval. In CVPR Workshops, 2013.

[106] Vignesh Jagadeesh, Robinson Piramuthu, Anurag Bhardwaj, Wei Di, and Neel Sundaresan. Large scale visual recommendations from street fashion images. arXiv:1401.1778, 2014.

[107] Yannis Kalantidis, Lyndon Kennedy, and Li-Jia Li. Getting the look: Clothing recognition and segmentation for automatic product suggestions in everyday photos. In ICMR, 2013.

[108] Si Liu, Jiashi Feng, Zheng Song, Tianzhu Zhang, Hanqing Lu, Changsheng Xu, and Shuicheng Yan. Hi, magic closet, tell me what to wear! In ACM Conference on Multimedia, 2012.

[109] Edgar Simo-Serra and Hiroshi Ishikawa. Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In CVPR, 2016.

[110] Kota Yamaguchi, M Hadi Kiapour, and Tamara L Berg. Paper doll parsing: Retrieving similar styles to parse clothing items. In ICCV, 2013.

[111] Wei Yang, Ping Luo, and Liang Lin. Clothing co-parsing by joint image segmentation and labeling. In CVPR, 2014.

[112] Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, and Raquel Urtasun. Neuroaesthetics in fashion: Modeling the perception of fashionability. In CVPR, 2015.

[113] Iljung Sam Kwak, Ana C. Murillo, Peter Belhumeur, Serge Belongie, and David Kriegman. From bikers to surfers: Visual recognition of urban tribes. In BMVC, 2013.

[114] Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW, 2016.

[115] Ana C Murillo, Iljung S Kwak, Lubomir Bourdev, David Kriegman, and Serge Belongie. Urban tribes: Analyzing group photos from a social perspective. In CVPR Workshops, 2012.

[116] Sirion Vittayakorn, Kota Yamaguchi, Alexander C Berg, and Tamara L Berg. Runway to realway: Visual analysis of fashion. In WACV, 2015.

[117] Lukas Bossard, Matthias Dantone, Christian Leistner, Christian Wengert, Till Quack, and Luc Van Gool. Apparel classification with style. In ACCV, 2013.

[118] Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. Material recognition in the wild with the materials in context database. In CVPR, 2015.

[119] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI, 2017.

[120] Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547, 2017.

[121] Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. Improving neural machine translation with conditional sequence generative adversarial nets. arXiv preprint arXiv:1703.04887, 2017.

[122] Steffen Rendle. Factorization machines. In ICDM, 2010.

[123] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR, 2016.

[124] Zeno Gantner, Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. MyMediaLite: A free recommender system library. In RecSys, 2011.

[125] Robert M. Bell, Yehuda Koren, and Chris Volinsky. The BellKor solution to the Netflix Prize, 2007.

[126] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, 2008.

[127] James Bennett and Stan Lanning. The Netflix Prize. In KDDCup, 2007.

[128] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul Kantor, editors. Recommender Systems Handbook. Springer, 2011.

[129] Defu Lian, Yong Ge, Fuzheng Zhang, Nicholas Jing Yuan, Xing Xie, Tao Zhou, and Yong Rui. Content-aware collaborative filtering for location recommendation based on human mobility data. In ICDM, 2015.

[130] Maciej Kula. Metadata embeddings for user and item cold-start recommendations. arXiv preprint arXiv:1507.08439, 2015.

[131] Hao Ma, Haixuan Yang, Michael R Lyu, and Irwin King. SoRec: Social recommendation using probabilistic matrix factorization. In CIKM, 2008.

[132] Zijun Yao, Yanjie Fu, Bin Liu, Yanchi Liu, and Hui Xiong. POI recommendation: A temporal matching between POI popularity and user regularity. In ICDM, 2016.

[133] Subhabrata Mukherjee, Hemank Lamba, and Gerhard Weikum. Experience-aware item recommendation in evolving review communities. In ICDM, 2015.

[134] Jingyuan Yang, Chuanren Liu, Mingfei Teng, Hui Xiong, March Liao, and Vivian Zhu. Exploiting temporal and social factors for B2B marketing campaign recommendations. In ICDM, 2015.

[135] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. Factorizing personalized markov chains for next-basket recommendation. In WWW, 2010.

[136] Andrew Zimdars, David Maxwell Chickering, and Christopher Meek. Using temporal data for making recommendations. In UAI, 2001.

[137] Guy Shani, Ronen I Brafman, and David Heckerman. An MDP-based recommender system. In UAI, 2002.

[138] Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. Using sequential and non- sequential patterns in predictive web usage mining tasks. In ICDM, 2002.

[139] Haixun Wang, Wei Fan, Philip S Yu, and Jiawei Han. Mining concept-drifting data streams using ensemble classifiers. In SIGKDD, 2003.

[140] Ralf Klinkenberg. Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis, 2004.

[141] Yi Ding and Xue Li. Time weight collaborative filtering. In CIKM, 2005.

[142] Yehuda Koren. Collaborative filtering with temporal dynamics. Communications of the ACM, 2010.

[143] Artus Krohn-Grimberghe, Lucas Drumond, Christoph Freudenthaler, and Lars Schmidt-Thieme. Multi-relational matrix factorization using Bayesian personalized ranking for social network data. In WSDM, 2012.

[144] Weike Pan and Li Chen. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In IJCAI, 2013.

[145] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. TrustSVD: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In AAAI, 2015.

[146] Allison JB Chaney, David M Blei, and Tina Eliassi-Rad. A probabilistic model for using social networks in personalized item recommendation. In RecSys, 2015.

[147] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. Wide & deep learning for recommender systems. In Workshop on Deep Learning for Recommender Systems, 2016.

[148] Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep content-based music recommendation. In NIPS, 2013.

[149] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In RecSys, 2016.

[150] Milind Naphade, John R Smith, Jelena Tesic, Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Alexander Hauptmann, and Jon Curtis. Large-scale concept ontology for multimedia. IEEE MultiMedia, 2006.

[151] Tao Chen, Felix Yu, Jiawei Chen, Yin Cui, Yan-Ying Chen, and Shih-Fu Chang. Object-based visual sentiment concept analysis and application. In ACM Conference on Multimedia, 2014.

[152] Nikhil Rasiwasia, Pedro Moreno, and Nuno Vasconcelos. Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia, 2007.

[153] John Smith and Shih-Fu Chang. VisualSEEk: a fully automated content-based image query system. In ACM Conference on Multimedia, 1996.

[154] Michael S Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain. Content-based multimedia information retrieval: State of the art and challenges. ACM TOMCCAP, 2006.

[155] Dong Liu, Guangnan Ye, Ching-Ting Chen, Shuicheng Yan, and Shih-Fu Chang. Hybrid social media network. In ACM Conference on Multimedia, 2012.

[156] Malcolm Slaney. Web-scale multimedia analysis: Does content matter? http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5754643, 2011.

[157] Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert Lanckriet, Roger Levy, and Nuno Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM Conference on Multimedia, 2010.

[158] Cees Snoek and Marcel Worring. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 2005.

[159] Jing He and Decheng Dai. Summarization of yes/no questions using a feature function model. JMLR, 2011.

[160] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. Learning structured embeddings of knowledge bases. In AAAI, 2011.

[161] Yuanhua Lv and ChengXiang Zhai. Lower-bounding term frequency normalization. In CIKM, 2011.

[162] Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In ACL Workshop on Text Summarization Branches Out, 2004.

[163] Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 2000.

[164] Jianxing Yu, Zheng-Jun Zha, and Tat-Seng Chua. Answering opinion questions on products by exploiting hierarchical organization of consumer reviews. In EMNLP-CoNLL, 2012.

[165] Alexandra Balahur, Ester Boldrini, Andrés Montoyo, and Patricio Martínez-Barco. Going beyond traditional QA systems: challenges and keys in opinion question answering. In COLING, 2010.

[166] Alexandra Balahur, Ester Boldrini, Andrés Montoyo, and Patricio Martínez-Barco. Opinion question answering: Towards a unified approach. In ECAI, 2010.

[167] Alexandra Balahur, Ester Boldrini, Andrés Montoyo, and Patricio Martínez-Barco. Opinion and generic question answering systems: a performance analysis. In ACL-IJCNLP, 2009.

[168] Fangtao Li, Yang Tang, Minlie Huang, and Xiaoyan Zhu. Answering opinion questions with random walks on graphs. In ACL-IJCNLP, 2009.

[169] Veselin Stoyanov, Claire Cardie, and Janyce Wiebe. Multi-perspective question answering using the OpQA corpus. In HLT/EMNLP, 2005.

[170] Lun-Wei Ku, Yu-Ting Liang, and Hsin-Hsi Chen. Question analysis and answer passage retrieval for opinion question answering systems. In ROCLING, 2007.

[171] Ivan Titov and Ryan McDonald. A joint model of text and aspect ratings for sentiment summarization. In ACL, 2008.

[172] Guang Ling, Michael R Lyu, and Irwin King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.

[173] Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J Smola, Jing Jiang, and Chong Wang. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In KDD, 2014.

[174] Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. In ICLR, 2016.

[175] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity. TIP, 2004.

[176] Chris Donahue, Akshay Balsubramani, Julian McAuley, and Zachary C Lipton. Semantically de- composing the latent spaces of generative adversarial networks. arXiv preprint arXiv:1705.07904, 2017.

[177] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM Conference on Recommender Systems, 2013.

[178] Yelp. Yelp Dataset Challenge. https://www.yelp.com/dataset_challenge, 2013.

[179] F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19, 2016.

[180] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, 2015.

[181] Dunnhumby. Dunnhumby Dataset. https://www.dunnhumby.com/sourcefiles.

9 Data Management Plan

Expected Data to be Managed

Data to be used in this proposal are listed below (see Table 9 in the Narrative). Data span records including clicks, purchases, co-purchases, trades, likes, reviews, images, questions-and-answers, heart-rate logs, metropolitan sensors, clinical records, and dance choreography:

| Category | Type | Description | Source | No. of Records | Students & Collabs. | Thrust | Refs. |
|---|---|---|---|---|---|---|---|
| e-Commerce | General reviews | Product reviews, metadata, images | Amazon | 100M | MW, WK | T2, T3 | [7–11, 14, 19] |
| Reviews | Beer & wine reviews | | BeerAdvocate, RateBeer, CellarTracker | 4.5M | ZL, JN | T2 | [24, 177] |
| Reviews | Local business reviews | | Google | 11.4M | RH, WK | T2 | [21] |
| | Activities and images | Clicks, likes, etc. | Adobe (Behance) | 11.8M actions, 980k images | CF, WK | T3 | [13] |
| Q/A | Q/A data | Product-related queries and answers | Amazon | 1.4M questions | MW | T1 | [12, 16] |
| | Entity relations* | Extracted entity relationships | RQ1.1 | TBD | MW | T1 | |
| | Medical dialogs* | Medical Q/A and conversations | UCSD | TBD | EV, CH | T1 | |
| Sequences | Dance choreography | DDR step charts | StepMania | 350k dance moves, 35 hours of audio | CD, ZL | T2 | [20] |
| | Workout logs | Heart-rate and GPS data (running and cycling) | EndoMondo, Strava | 1M workouts | MM, LM | T2 | |
| | Metropolitan sensors | Road and building sensors | UCSD (NSF-IIS-1636879) | 40M sensor readings | RG, RP | T2 | |
| | Clinical procedures | Procedure durations and characteristics | UCSD | 117K | ZL, NN | T2 | [22] |
| Other structured RecSys data | Peer-to-peer trades | Bartering transactions | RateBeer, Tradesy | 125k peer-to-peer transactions | MC | | [16] |
| | Grocery shopping | Grocery shopping transactions (baskets) | Anonymous Seattle-based grocery store (via collaboration w/ Microsoft) | 152k transactions across 53k trips | MW, DW | | [17] |
| | Product bundles | Bundled purchases | Steam | 902K | AP, KG | | [18] |

The majority of data required for initial evaluation has already been collected.
Our lab has a track record of providing reference implementations and complete data from all academic publications to date;1 similarly, wherever possible, data collected in this proposal will be made available. We note the following exceptions:

1. Proprietary datasets, i.e., those collected from industrial collaborations, will not be released, and shall be maintained securely. Where possible, we will release (anonymized) public subsets of the data, as we have done previously with our Behance data during our collaboration with Adobe [13]. Doing so ensures that our results are verifiable and extensible, even where private data are used.

2. Sensitive data (e.g., clinical notes) will not be released, and shall be maintained securely. Our use of UCSD's clinical records data is covered under IRB #160410, with additional approvals to be obtained as needed.

3. Collection of additional data for neurotology patient intake and triage (Thrust 1) will be done in collaboration with Chun-Nan Hsu (Professor of Medicine, UCSD) and Erik Viirre (a practicing physician). Data collection for this project is expected to be completed by Q2 2019.

Collecting additional datasets is a substantial and ongoing effort by our lab, and is necessary for the advancement of research on behavioral models and recommender systems. Many of the datasets collected by our lab have become de facto standards in industry and academia, and are widely used in data mining and machine learning classes worldwide.

Dissemination

As per the NSF policy on the dissemination and sharing of research results, all research data shall be made available at the time of publication, and shall remain available for a minimum of three years.
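The anonymized-subset release described in exception (1) can be sketched concretely. The snippet below is an illustrative sketch only, not our actual release pipeline: all function and field names (`pseudonymize`, `release_subset`, `user_id`, etc.) are hypothetical, and it assumes user identifiers are replaced with keyed-hash pseudonyms (the key is kept private so pseudonyms cannot be recomputed or reversed by third parties) and that only a whitelist of non-sensitive fields is exported.

```python
import hashlib
import hmac

# Assumption: this key is stored securely by the lab and never released;
# without it, third parties cannot link pseudonyms back to raw user IDs.
SECRET_KEY = b"replace-with-a-private-key"

def pseudonymize(user_id: str) -> str:
    """Map a raw user ID to a stable, non-reversible pseudonym (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def release_subset(records, fields=("item_id", "rating", "timestamp")):
    """Keep only whitelisted fields and replace user IDs with pseudonyms."""
    return [
        {"user": pseudonymize(r["user_id"]), **{f: r[f] for f in fields if f in r}}
        for r in records
    ]
```

Because the same user always maps to the same pseudonym, released subsets remain usable for collaborative-filtering research while withholding the raw identifiers.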

1 http://cseweb.ucsd.edu/~jmcauley/
