Microsoft Research New Faculty Fellowship Nomination

Microsoft Research Faculty Fellowship Nomination Application Questions

Innovative Merit

Social, healthcare, and research tasks that were traditionally done individually and offline are now accomplished much better collaboratively in online communities.

Mei’s work represents an early attempt at integrative and in situ information retrieval and mining that targets the data emerging from those communities. His unique vision is to connect the content, the context, the crowd, and the cloud, which leads to the next generation of community-based information systems that he calls the Foreseer (“4-C-er”). He has conducted the first formal study of the influence of context in text mining, including environmental, emotional, and social context. This new paradigm of text mining, contextual text mining, has helped him win two best paper awards from the top data mining conference, KDD, and a dissertation award from SIGKDD.

Mei’s research offers insights into many other fundamental challenges in community-based information analysis, such as how to jointly model user-generated content and user behaviors, how to develop efficient algorithms for web-scale data mining, how to incorporate user interactions and domain knowledge such as social theories into the mining process, and how to utilize mining results to support decision-making.

Mei’s work has a strong interdisciplinary flavor. He leverages techniques and findings in multiple fields, including computer science, linguistics, sociology, network science, and biomedical science, to facilitate his own research in social computing and health informatics. He aims to develop an advanced set of data analysis tools and end-user-oriented information systems that are appreciated by social network users, researchers, and healthcare providers and consumers (i.e., physicians and patients).

Potential to Advance the State of the Art

Qiaozhu Mei, University of Michigan School of Information

In existing text mining systems, the context and behaviors of users are usually isolated from the mining process. Content and user behaviors tend to be modeled separately, and context information is usually neglected. Mei’s work advances the state of the art by bridging the gaps among these isolated components in a unified mining process: text content, context information, user behaviors, and social networks. Mei’s dissertation has already made a pioneering contribution to integrating context information into text mining by opening a new direction called contextual text mining. It contains a general framework of contextual text mining as well as many instantiations of contextual language models that capture the effect of environmental, emotional, and social context on the generative process of text data. This new paradigm not only leads to a number of significantly improved methods for information retrieval and topic modeling, but also provides solutions to many novel and challenging research problems, such as modeling the evolution of topics in scientific research, personalized search, generating faceted summaries of public opinions, and discovering topical communities from social networks.
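To make the idea of a contextual language model concrete, the following is a minimal sketch and not one of the models from Mei’s dissertation. It assumes the simplest possible setting: each document carries a single context label (here a year), and a word’s probability under a context is a linear interpolation between a context-specific unigram model and a background model over all text. The function name `train_contextual_unigrams`, the weight `lam`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def train_contextual_unigrams(docs, lam=0.7):
    """Build a context-conditioned unigram model.

    docs: iterable of (context_label, list_of_tokens) pairs.
    Returns prob(word, context), which interpolates
    lam * p(word | context) + (1 - lam) * p(word | all text).
    """
    background = Counter()
    per_context = defaultdict(Counter)
    for context, tokens in docs:
        background.update(tokens)
        per_context[context].update(tokens)
    total = sum(background.values())

    def prob(word, context):
        p_bg = background[word] / total if total else 0.0
        counts = per_context.get(context)
        if not counts:
            # Unseen context: fall back to the background model alone.
            return p_bg
        p_ctx = counts[word] / sum(counts.values())
        return lam * p_ctx + (1 - lam) * p_bg

    return prob


# Toy, made-up corpus; the context label is a year (an assumption).
toy_docs = [
    ("2005", ["storm", "flood", "storm"]),
    ("2008", ["election", "vote", "storm"]),
]
prob = train_contextual_unigrams(toy_docs, lam=0.5)
print(prob("storm", "2005"), prob("storm", "2008"))
```

In this toy setting the same word receives a higher probability in the context where it is used more often, which is the basic effect that richer contextual models (with latent topics and multiple context types) are designed to capture.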

Mei’s recent work has proceeded beyond contextual text mining toward the next generation of community-based data mining techniques that connect the content, the context, the crowd, and the cloud. The Foreseer system in Mei’s vision is the first attempt at integrative and in situ analysis of information generated in online communities. The core of this new information system consists of a family of general probabilistic community models, which advance the state of the art of community analysis by modeling the dependencies among user-generated content, user actions, social network structure, and latent context information. Such models provide a better understanding of content generation, information diffusion, and network evolution in online communities, and of the correlations among them. The proposed models unify many existing models and lead to a number of powerful new instantiations for community-based information analysis.

In existing data mining systems, the mining results are usually useful only to data analysts and not directly useful to the end users who generated the data. There is also a lack of user participation and interaction in the mining process. Mei’s research will develop text mining systems that are compatible with the incentives of users, actively interact with users, adjust to the guidance and feedback from users, and learn from the collective wisdom of users. The new text mining systems aim to use the mining results to directly benefit end users by enhancing the computational thinking and smart decision-making of users in different domains. One example is Mei’s work on assessing the credibility of controversial and widespread rumors that arise in online social communities. Instead of relying on the endorsement of authoritative sources, the project will provide personalized recommendations to users based on text mining and social reputation systems. Another example is Mei’s work on a novel query suggestion service for searching electronic health records (EHRs). The system will suggest medical search queries to physicians and medical researchers through automatic text mining of EHRs and user query logs, as well as through a social recommendation mechanism.

In summary, Mei’s work will lead to an advanced set of models, tools, and systems for community-based information analysis across disciplines. The portfolio includes contextualized probabilistic models for text and user behaviors, scalable learning algorithms, toolkits, and user-centered information systems for users in social communities, researchers, and healthcare providers and consumers. Mei’s findings will also be useful for the improvement of web information systems such as search engines. Besides being evaluated using controlled datasets and controlled user studies, the new text mining techniques will also be evaluated through prototype systems to understand how effective they are in influencing user behaviors in the community.

University’s Current Support of the Nominee’s Work

The University of Michigan supports the work of Qiaozhu Mei in several ways. First, a quarter of Qiaozhu's academic year salary is allocated for his research activity. Second, we provide office and lab space for Qiaozhu. Third, when we hired Qiaozhu he received a generous start-up package of over $310,000 that consists of research funds and student support. Finally, the University allows Qiaozhu to recover a small percentage of the indirect costs on his grants, which is paid into his discretionary account.

Current Funding from Other Grant Sources

* PI: NSF IIS-0968489. “Assessing information credibility without authoritative sources.”
* Co-PI: NIH HHSN276201000032C. “Developing an Intelligent and Socially Oriented Search Query Recommendation Service for Facilitating Information Retrieval in Electronic Health Records.” (PI: Zheng)
* Co-PI: NIH 3-U54-DA-021519-05-S2. “NCIBI Bridge Supplement.” (PI: Athey)

Statement of Hypothesis, Objectives, and Methodology of Current Work and Plans

I. Research Objectives:

The rapid evolution of novel internet applications, especially Web 2.0 applications, has fundamentally changed people’s lives. It has created a huge opportunity for “the PeopleWeb” by bringing the individuals behind the screens into the crowd of online communities. Tasks that were traditionally done individually and offline are now done collaboratively and interactively, which leads users to better social experiences, healthier lives, and accelerated scientific research. A new generation of information systems has emerged along with this trend. These community-based information systems provide their users with a brand-new experience characterized by rich user-system interactions as well as rich user-user interactions. At the same time, this trend has created a brand-new opportunity and challenge for data miners: users who previously played a passive role as data creators have now become part of the data themselves. Indeed, an unprecedented volume of data is generated by these community-based information systems, consisting of rich user-generated content, rich context information, and rich user behaviors and interactions. How to cope with this huge volume of dynamic data, how to model the influence of environmental and social context on user behaviors, how to infer the correlation between the behaviors of users and the content they generate, and how to use the knowledge discovered to influence the decision-making and enhance the experiences of end users in these communities are all important research questions, which can lead to innovative technologies and have a large positive impact on our society.

The objective of Mei’s research is to develop theories, models, and systems for the next generation of information retrieval and data mining techniques with broad and influential applications to online communities. The goal of such a community-based information analysis system is not only to generate useful results for analysts and researchers, but also to influence both the individual and the collective behaviors of users in the communities, for example by enhancing information seeking, social communication, scientific innovation, and decision-making in health-related problems. Mei proposes to develop a new paradigm of community-based information analysis that integrates the content, the context, the crowd, and the cloud, and that generates interpretable mining results to facilitate computational thinking and decision-making for social actors, scientific researchers, and healthcare providers and consumers. Mei uses an interdisciplinary research approach in which he works closely with experts in multiple fields in order to integrate the theories and findings of social behaviors and social context into the text mining process, and to evaluate the effectiveness of a mining system based on how well the system influences behaviors in the communities. The interdisciplinary research model of the School of Information and the University of Michigan has created an ideal environment for Mei’s research, with ample opportunities for him to collaborate with world-class experts in social computing, sociology, network science, and biomedical research.

II. Hypothesis:

The key hypothesis is that many traditionally individual tasks are now done in communities, and that an integrated community-based information retrieval and mining system can be more effective than existing information systems that isolate the content from the users and contexts in the community. Unlike traditional mining systems, which mainly target data analysts, such an integrative and in situ information analysis system is expected to directly influence the behaviors of end users in the community by facilitating computational thinking and decision-making in their personal and collective activities. The people who generate the data can also be the ones who benefit from the system. Mei believes that the user behaviors and the content generated in those communities are closely correlated with each other, and that both are influenced by various types of context, including environmental context, emotional context, and social context. Mei believes that community-based mining systems can be significantly enhanced through integrative modeling of text content and user activities conditional on context information, where users themselves become a special type of context. The community-based mining process can benefit from interaction between the system and the users as well as interaction among the users themselves. The mining of large-scale datasets can be facilitated by leveraging the cloud. Mei believes that mining results can be presented to end users in an understandable way and can be utilized to influence user behaviors in the communities. The effectiveness of such a system should be evaluated by how well it enhances the tasks and decision-making of end users in those communities. Mei believes that techniques built from this new generation of information retrieval and mining can facilitate many interdisciplinary research areas, such as social computing and health informatics. These techniques will also enhance existing information systems, such as Web search engines.

III. Methodology and Research Plan:

Mei has made a pioneering contribution to integrating context and content in text mining. He has proposed a general framework of contextual text mining that consists of a family of probabilistic language models that explain the generative process of text depending on context variables, including environmental context such as time and geographic location, emotional context such as sentiments, and social context such as social networks. The framework also includes fast inference algorithms, a mechanism to steer the mining process with users’ guidance and personal preferences, and a labeling module that generates interpretable annotations for mining results. Mei has proposed a novel technique to regularize probabilistic language models based on the structure of social and information networks. This regularization framework provides a general way of incorporating structural and social context into text mining models and can be flexibly extended to capture other assumptions and constraints about social behavior and social influence.

The next generation of community-based information retrieval and mining system that Mei proposes, the Foreseer system, will consist of a new family of probabilistic community models. These models integrate the generative processes of text content and user activities and their dependency on heterogeneous context information. Mei also plans to develop efficient inference and mining algorithms for the collection and analysis of very large data sets, especially by utilizing cloud infrastructures. Mei expects to work closely with domain experts to design computational models for social communities that reflect the intuitions, findings, and theories of social behaviors, social influence, and social context, and models for health informatics that reflect the domain knowledge in healthcare and biomedical research. Mei plans to integrate such prior knowledge into the model development by regularizing the community models with important objectives and constraints on how people behave in the community. Mei plans to apply the developed community-based mining techniques to various communities, including social networking communities, online health communities, and the community of physicians and medical researchers.
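The intuition behind regularizing a model with network structure, namely that connected users should have similar topic distributions, can be illustrated with a small sketch. This is not Mei’s regularization framework; it substitutes a standard graph-smoothing iteration (the same kind of update used in label propagation) applied to topic distributions that are assumed to have been estimated already. The function name `smooth_over_network`, the parameter `alpha`, and the toy network are hypothetical.

```python
def smooth_over_network(theta0, edges, alpha=0.5, iterations=20):
    """Pull each node's topic distribution toward its neighbors'.

    theta0: {node: [p_topic_0, p_topic_1, ...]} initial distributions.
    edges: iterable of (u, v, weight) undirected edges; both endpoints
           must be keys of theta0.
    alpha: weight on the neighborhood average (0 means no smoothing).
    """
    neighbors = {u: [] for u in theta0}
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    theta = {u: list(p) for u, p in theta0.items()}
    for _ in range(iterations):
        updated = {}
        for u, p0 in theta0.items():
            total_w = sum(w for _, w in neighbors[u])
            if total_w == 0:
                # Isolated node: nothing to smooth against.
                updated[u] = list(p0)
                continue
            # Weighted average of the neighbors' current distributions.
            avg = [
                sum(w * theta[v][j] for v, w in neighbors[u]) / total_w
                for j in range(len(p0))
            ]
            # Convex combination, so the result is still a distribution.
            updated[u] = [
                (1 - alpha) * p0[j] + alpha * avg[j] for j in range(len(p0))
            ]
        theta = updated
    return theta


# Toy, made-up network: users a and b are connected, c is isolated.
initial = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
smoothed = smooth_over_network(initial, [("a", "b", 1.0)], alpha=0.5)
print(smoothed)
```

Connected users move toward each other while keeping some of their own initial estimate, and an isolated user is left unchanged. A full network-regularized model would instead fold such a smoothness term into the model’s training objective rather than apply it as a post-processing step.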

If funded, one portion of the funds will be used to support one Ph.D. student for three years. Two nodes (12 cores each) will be added to the current SI Hadoop Cluster. The funds will also be used to cover other research costs such as system development, controlled user studies for system evaluation, and conference travel.
